
EC 720 - Math for Economists

Lecture Notes

Samson Alva∗

Department of Economics, Boston College

Fall 2011

∗e-mail: [email protected].


1 Overview of Course

This is a course on optimization, with an emphasis on applications. We will study the important theorems, sketching heuristic proofs where they are illuminating; you should understand these results and be comfortable using them. The relevant mathematical tools will be developed as needed.

Optimization underpins modern economic theory. Actors in economic models are

assumed to be optimizers, and the canonical approach to solving models involves formulating

a concept of equilibrium, whereby the actors solve their particular optimization problems in

a mutually consistent manner as prescribed by the equilibrium concept.

The essential element of an optimization problem is the objective function f , which maps

a domain U to a range V , where the range space has associated with it some notion of

an ordering so that the values of the function f at different points in the domain can be

compared. Most commonly, this range space V is a subset of the real line R, which I will

assume henceforth. I will couch results in terms of a maximization problem, but the results

for minimization problems can easily be extracted from these. There is little we can say at

this level of generality. Perhaps the most famous result is the Weierstrass Theorem, which

ensures that global extrema exist if f is continuous and U is compact.

In the context of a differentiable f with U ⊂ R and V ≡ R, the quintessential

result from optimization theory is the First Order Necessary Condition (FONC) for local

extrema: the value of the derivative at any (interior) local extremum must be zero. This

condition generalizes to differentiable finite-dimensional real-valued functions, as one would

have seen in a course on multivariable calculus. Other important conditions include the First

Order Sufficient Condition (FOSC), the Second Order Necessary Condition (SONC), and the

Second Order Sufficient Condition (SOSC) for extrema (see Simon and Blume for details).

These introductory optimization problems are called unconstrained optimization prob-

lems, because the choice variables are independent of each other. However, most interesting

optimization problems involve one or more of the following: constraints, implying that the

choice variables are not independent; time, implying that the optimum sought involves a

sequence of interlinked choices; uncertainty, implying that the outcome from a particular

choice is not determined by that choice. As we will see, all three of these wrinkles can be handled by the Karush-Kuhn-Tucker (KKT) approach¹ to constrained optimization, which is a

very general theory that we shall begin with in this course. However, optimization problems

with time and with uncertainty often have a great deal of structure that can be used to

obtain sharper (and more useful) results; we will explore these later in the course.

Another important set of theorems (other than the First/Second Order Necessary/Sufficient Conditions) are the various Envelope Theorems, which provide machinery for the comparative statics² of equilibrium. The distinction between endogenous and exogenous variables in an economic model is represented by the distinction between choice variables and parameters in optimization theory. One could think of there being two main questions for any optimization problem: 1) Given a point in the parameter space, what are the optima of the problem? 2) What are the properties of the set of optima/policy function? Envelope Theorems are answers to the second question, while FONC, etc. are answers to the first question.

In summary, as we study optimization, it is useful to understand the assumptions that are used for results under each of the following categories:

• Existence: Does the optimization problem have a solution?

• Uniqueness: Does the optimization problem have exactly one solution or multiple solutions?

• Characterization: Are there conditions that solutions satisfy that are both necessary and sufficient?

• Regularity: Do the solutions depend continuously on the parameters of the problem?

• Sensitivity: Does the optimal value (and the optimal choices) depend smoothly on the parameters of the problem, i.e. what is the effect of a marginal change in the parameters?

¹ You may have heard of the Kuhn-Tucker Theorem, but the necessary condition described by the theorem was actually discovered by William Karush many years before (see the Wikipedia article: http://en.wikipedia.org/wiki/Karush-Kuhn-Tucker_conditions). Given the customary omission of Karush’s name, this could be considered an example of Stephen Stigler’s Law of Eponymy: “No scientific discovery is named after its original discoverer.” (see http://en.wikipedia.org/wiki/Stigler's_law_of_eponymy).

² Paul Samuelson was an early proponent of the argument that the meaningfulness of a model comes from its comparative statics properties.


Part I

Mathematical Preliminaries

Notation

⊂ - subset (or set inclusion)

⊊ - strict subset (or strict set inclusion)

⊃ - superset

⊋ - strict superset

∈ - element of

∋ - contains

N - the set of natural numbers

Z - the set of integers

R - the set of real numbers

2 Sets, Functions, and Proofs

2.1 Sets and Functions

A set is a collection of objects. We will work with this intuitive definition of a set without

limiting what a set may contain³.

A set that contains no elements is called the empty set and is denoted by ∅ or {}. The union of two sets A and B, denoted A ∪ B, is the set that contains all elements that are contained in A or in B (or in both), without repetition. The intersection of two sets A and B, denoted A ∩ B, is the set that contains only those elements that are contained in both A and B. A set C is a subset of another set A if every element in C is also an element in A. The universal set U is the set containing all elements of every possible set. Note well that the universal set can be different depending on the context, and that the universal universal set is an ill-defined concept that leads to paradoxes (see previous footnote). Given two sets A and B, we can define the set difference A − B or A \ B to be the set of all elements in A that are not in B. Then, A = (A − B) ∪ (A ∩ B). The complement of a set A, denoted $A^c$, is the set of all elements not in A, that is, the set of all elements that are in the universal set but not in A. Then, $A^c = U − A$. The power set of a set A, denoted P(A), is the collection of all subsets of A. Notice that if the cardinality (see below for definition) of the set A is finite (and equal to a), then the number of subsets of A, i.e. the cardinality of the power set of A, is $2^a$.

³ This is naïve set theory, and suffers from the (Bertrand) Russell paradox: consider the set of all sets that do not contain themselves. Now does this set contain itself?
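The set operations above can be tried out on small finite sets. The following is a minimal Python sketch, not part of the original notes; the sets `U`, `A`, `B` and the helper `power_set` are hypothetical choices made only for illustration.

```python
from itertools import chain, combinations

def power_set(s):
    """Return the collection of all subsets of s, each as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

# Hypothetical example sets; U plays the role of a context-specific universal set.
U = set(range(1, 9))
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
difference = A - B       # A − B
complement_A = U - A     # complement of A relative to U

# The identity A = (A − B) ∪ (A ∩ B) stated in the text.
identity_holds = A == (A - B) | (A & B)

# The power set of A has 2^a elements when A has a elements.
P_A = power_set(A)
```

Here `len(P_A)` equals `2 ** len(A)`, in line with the cardinality remark above.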

Next, we (intuitively) define a map from one source set (the domain) to another target

set (the codomain). Imagine the two sets written as a list, with one list written above the

other. Now, imagine that there are some element-to-element links between the two sets but

not within a set, where links are unique. This is the intuitive definition of a map. There are

four types of maps, which describe the nature of the links between the two sets. A map is

one-to-one if every element in the domain and every element in the codomain has no more than one link. A map is many-to-one if the domain has the property that every element of that set has no more than one

link while the codomain contains at least one element that has more than one link. A map

is one-to-many if every element in the codomain has no more than one link, but there is at

least one element in the domain that has more than one link. A map is many-to-many if

both domain and codomain contain at least one element that has more than one link. Given

some map f , with domain A, the range of f is the set of all elements in the codomain that

have a link to some element of the domain. Thus, the range is a subset of the codomain.

A function is a map from one set, called the domain of the function, to another set,

called the codomain of the function, with the restriction that each element in the domain

be mapped to exactly one element in the codomain. A function can be one-to-one or many-

to-one (but not many-to-many or one-to-many).
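For finite sets, the four types of map can be told apart by counting links. The sketch below is an illustration under stated assumptions: a map is represented as a set of (domain element, codomain element) pairs, "one-to-one" is read as no element on either side having more than one link, and the names `classify_map` and `is_function` are hypothetical.

```python
from collections import Counter

def classify_map(links):
    """Classify a finite map given as a collection of (domain, codomain) links."""
    links = set(links)  # links are unique
    out_degree = Counter(x for x, _ in links)  # number of links per domain element
    in_degree = Counter(y for _, y in links)   # number of links per codomain element
    domain_multi = any(n > 1 for n in out_degree.values())
    codomain_multi = any(n > 1 for n in in_degree.values())
    if domain_multi and codomain_multi:
        return "many-to-many"
    if domain_multi:
        return "one-to-many"
    if codomain_multi:
        return "many-to-one"
    return "one-to-one"

def is_function(links, domain):
    """A map is a function if every domain element has exactly one link."""
    out_degree = Counter(x for x, _ in set(links))
    return all(out_degree[x] == 1 for x in domain)
```

For example, `classify_map({(1, "a"), (2, "a")})` is many-to-one and is a function on the domain `{1, 2}`, while `{(1, "a"), (1, "b")}` is one-to-many and is not.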

The cardinality of a set is the mathematical equivalent of the “size” of a set. For finite

sets (sets containing a finite number of elements) the cardinality is the same as the number

of elements. For non-finite sets, the notion of size has to be extended to the realm of infinite

numbers. In order to do this, we need to have a formal way of counting the elements in a

set.

Consider pairing an element from the set of unknown cardinality with an element of the

set of natural numbers (the counting numbers 1, 2, 3...) denoted by N, following the natural

order of the counting numbers. For example, if I had a basket of apples and I wished to

know how many I had, I could have written down in ascending order the natural numbers

and placed one and only one apple on each number starting from 1. The number of apples

(that is, the cardinality of the set of apples) would then be given by the last number upon

which I placed an apple. This pairing of an element from one set to another is captured by

the definition of a function.

Finally, we can formalize the notion of counting, and thus of cardinality. For any set A,

we can define a counting function as a one-to-one function with domain A and codomain N, with the restriction that for any element of the natural numbers to which there is a link, every lesser natural number has a link. Now, if the counting function is surjective, in addition to

being injective, i.e. bijective, then we say that the set is countably infinite. If the counting

function has a range with a least upper bound, then the set is finite and has cardinality

equal to the least upper bound. The empty set is considered to have cardinality 0. If there is

no way to construct a counting function, because no one-to-one function exists, then the set

is uncountably infinite. A set is countable if it is finite or countably infinite. Otherwise

it is uncountable.

Suppose A and B are subsets of some set X. Denote as A−B the set of all points in A

that are not contained in B i.e. A − B ≡ {a ∈ A : a ∉ B}.

Definition 2.1. Suppose X is the universal set. If A ⊂ X, the complement of A, denoted

$A^c$, is the set X − A.

2.2 Proofs

In mathematics, a statement is a sentence that is either true or false. A proof is a

sound argument for the truth of a particular statement expressed in mathematical language.

An implication is a statement of the form “If A is true, then B is true”, where A and B

are statements. Here, A is the hypothesis and B is the conclusion. Implications are often

written as “If A, then B” or “A implies B”. There is a frequently used symbolic notation

for the implication: A =⇒ B.

A proposition is a true statement of interest to be proved – the proof would accept the

truth of some number of statements (the premises) and logically and cogently argue for

the truth of the proposition. A truth table is a useful method for determining the truth

value of complex statements. A theorem is a proposition that is subjectively considered

to be of great import or value. Sometimes, because of the length of an argument for a

theorem, the proof is broken into stages, with each linking proposition being proved as a

lemma. So, lemmata (plural of lemma) are propositions whose subjective import derives not necessarily from their statements but from their role as a stage in the overarching construction

of a proof of a theorem. However, occasionally a lemma has importance independent of

the theorem for which it was constructed. Lastly, corollaries are propositions that follow

almost immediately from a theorem; the proof of such a statement is usually trivial, but the

subjective value of the knowledge of its truth is not.

A little reflection will reveal that to be able to employ mathematical logic fruitfully

one needs to know the truth value of some statements. Logic describes the relationships

between statements, and describes the rules by which the truth value of a statement can be

ascertained given a particular set of premises. However, to ground a particular systematic


body of knowledge one needs axioms. An axiom is a statement whose truth value is accepted

without formal proof. The defense for the choice of a particular axiom, and the consequences

of acceptance or rejection of a particular statement as axiomatically true is the bread and

butter of theoretical economics.⁴ Axioms are the atoms of a particular knowledge system,

just as certain mathematical concepts that are without formal definition (see for example

the definition of a set above) are atoms of a mathematical system.

By definition, every statement is either true or false. Then, we can define logical operators

on statements, analogous to the arithmetic operators plus and multiply. The operator AND

denoted ∧ is a binary operator such that A ∧ B is true if and only if A is true and B is

true. The operator OR denoted ∨ is a binary operator such that A ∨ B is true if and only

if at least one of A and B is true. The operator NOT denoted ¬ is a unary operator such

that ¬A is true if and only if A is false. Using the language of operators, we can see now

that implies, denoted =⇒ , is a binary logical operator. What is the truth table for this

operator?
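One way to explore the question is to tabulate the operators mechanically. The sketch below assumes the standard convention of material implication, under which A =⇒ B is false only when A is true and B is false; the function name `implies` is a hypothetical helper.

```python
from itertools import product

def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

# One row per assignment of truth values to A and B.
rows = []
for a, b in product([True, False], repeat=2):
    rows.append((a, b, a and b, a or b, not a, implies(a, b)))

print("A      B      A∧B    A∨B    ¬A     A⟹B")
for row in rows:
    print("  ".join(f"{str(v):5}" for v in row))
```

Note that an implication with a false hypothesis is counted as true, which is why the last column is true in three of the four rows.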

Now, consider an arbitrary implication A =⇒ B. We can define three operations that

take an implication and produce another implication. The converse of A =⇒ B denoted

CONV is B =⇒ A. The inverse of A =⇒ B denoted INV is ¬A =⇒ ¬B. The

contrapositive of A =⇒ B denoted CONTR is ¬B =⇒ ¬A. What is the converse of

the inverse of an implication? What is the inverse of the contrapositive? What is the inverse

of the inverse of an implication?

Suppose we have to prove that the implication A =⇒ B is true. We could attempt a

direct proof, where we would assume A holds true and produce a chain of implications ending

with the desired outcome i.e. A =⇒ C =⇒ D =⇒ . . . =⇒ E =⇒ B. Alternatively

we could attempt an indirect proof, which comes in two varieties. First, we could directly

prove the contrapositive ¬B =⇒ ¬A, which is equivalent to A =⇒ B. Second, we could

assume that A =⇒ B is false i.e. A ∧ ¬B is true, and then show that this assumption

leads to a contradiction of a previously proved (or assumed) statement, a technique known

as reductio ad absurdum or proof by contradiction.
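The two equivalences that justify the indirect proof methods can be checked by brute force over all truth assignments. This is only a sanity check under the material-implication convention, with hypothetical variable names; it is not a substitute for the logical argument.

```python
from itertools import product

def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

cases = list(product([True, False], repeat=2))

# The contrapositive ¬B ⟹ ¬A has the same truth value as A ⟹ B in every case.
contrapositive_equiv = all(
    implies(a, b) == implies(not b, not a) for a, b in cases
)

# The negation of A ⟹ B is A ∧ ¬B, the starting assumption of a proof by contradiction.
negation_is_a_and_not_b = all(
    (not implies(a, b)) == (a and not b) for a, b in cases
)

# The converse B ⟹ A is not equivalent to A ⟹ B.
converse_equiv = all(implies(a, b) == implies(b, a) for a, b in cases)
```

The first two checks come out true and the third false, so proving the converse does not prove the original implication.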

A statement that is true or false conditional on the value of one or more variables is a

conditional statement. E.g. $x^2 + 3y = 5$. Most statements one encounters are conditional

statements. It is important to note that a conditional statement has a determinate truth

value, conditional on the values of each of the variables upon which it depends.

⁴ John von Neumann is often credited with introducing the axiomatic method in economic theory (for example, expected utility theory), particularly due to his previous work on the foundations of logic and set theory in mathematics and on the foundations of quantum mechanics in physics. Kenneth Arrow was an early (and successful) proponent of this approach, made especially famous in his work on social choice theory.

Intimately connected with conditional statements and implications are quantifiers, which delineate the scope or domain in which the truth of a conditional statement or implication holds. There are two types of quantifiers: existential and universal. The existential quantifier can be recognized by the use of words such as “there exists” or “there is/are”, and can be denoted by ∃. When such a quantifier is present, the truth of the (sub-)statement to which it is attached is determined by the possibility of constructing or otherwise proving the existence of at least one object satisfying the conditions of the statement. E.g. (There exist x, y such that $x^2 + 3y = 5$) is a statement (and not a conditional one), and furthermore is a true statement, since x = 1, y = 4/3 allows for the truth of the conditional statement. Therefore, the only way for a statement with a single, existential quantifier to be false is for there to be no object that satisfies the conditional statement. Notice that there is a hidden assumption in the previous example. I argued that the statement is true by constructing an example (proof by construction). However, I assumed that x, y ∈ R. This is neither allowed nor disallowed by the statement, which implies that the truth value of the statement is itself conditional on the domain of the variables x and y. Suppose the statement read “There exist x, y such that $x^2 + 3y = 5$ and x, y ∈ N”. Then, the statement would be false, since there is no pair of values for the variables that would satisfy the expression and the domain restrictions.

The universal quantifier can be recognized by the use of words such as “for all/every/any”

and can be denoted by ∀. N.B. “For some” is not a universal quantifier, but an existential

quantifier, even though the word “for” appears. When a universal quantifier is present,

the truth of the (sub-)statement to which it is attached is determined by the possibility

of constructing or otherwise proving the existence of at least one object not satisfying the

conditions of the statement. Any such object would prove the statement false. E.g. (For all x, y, $x^2 + 3y = 5$) is false since x = 1, y = 1 yields a conditionally false statement.

Sometimes, theorems or other statements involve the negation of quantifiers. Any state-

ment of the form “¬(∀x,A(x))”, where A(x) is a conditional statement, can be written as

“∃x, such that ¬A(x)”. Thus, when negated, the universal quantifier becomes an existential

quantifier, with the attached conditional statement becoming negated. A similar algorithm

allows for the negation of an existential quantifier: “¬(∃x, such that A(x))” is equivalent to

“∀x,¬A(x)”.
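On a finite domain, Python's `any` and `all` behave like ∃ and ∀, so the negation rules can be checked directly on the running example $x^2 + 3y = 5$. The sketch below is illustrative only: the finite range `naturals` is a hypothetical stand-in for N (it suffices here because 3y already exceeds 5 once y ≥ 2, and y = 1 would require $x^2 = 2$).

```python
from fractions import Fraction

def A(x, y):
    """The conditional statement x^2 + 3y = 5."""
    return x * x + 3 * y == 5

# Hypothetical finite search domain standing in for the natural numbers.
naturals = range(1, 21)

exists_in_naturals = any(A(x, y) for x in naturals for y in naturals)
forall_in_naturals = all(A(x, y) for x in naturals for y in naturals)

# ¬(∀x, A(x)) has the same truth value as (∃x such that ¬A(x)).
neg_forall = not forall_in_naturals
exists_counterexample = any(not A(x, y) for x in naturals for y in naturals)

# ¬(∃x such that A(x)) has the same truth value as (∀x, ¬A(x)).
neg_exists = not exists_in_naturals
forall_not = all(not A(x, y) for x in naturals for y in naturals)

# With rational (hence real) values allowed, x = 1, y = 4/3 is a witness.
witness_ok = A(Fraction(1), Fraction(4, 3))
```

The same statement is therefore false over this natural-number domain and true once the domain is enlarged, which is the point about domains made earlier.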

One common method of proof that may be employable is the proof by induction. The fundamental principle behind induction is that if S ⊆ N such that (1 ∈ S) ∧ (n ∈ S =⇒ (n + 1) ∈ S), then S = N. Thus, proof by induction can be used whenever the statement to be proved has a universal quantifier with domain N. The proof requires two steps. The first step (the base case) is to show the truth of the conditional statement for some particular m ∈ N, usually m = 1. The second step (the inductive step) is to show that, for an arbitrary n ∈ N with n > m, the conditional statement is true for n if it is true for n − 1 (or for every k ∈ N with m ≤ k < n).
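As a worked illustration of the two steps, here is a standard example that is not part of the original notes: the formula for the sum of the first n natural numbers, with base case m = 1.

```latex
\textbf{Claim.} For all $n \in \mathbb{N}$, $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$.

\textbf{Base case} ($n = 1$): $\sum_{k=1}^{1} k = 1 = \frac{1 \cdot 2}{2}$.

\textbf{Inductive step.} Let $n \geq 2$ and suppose the claim holds for $n - 1$,
i.e. $\sum_{k=1}^{n-1} k = \frac{(n-1)n}{2}$. Then
\[
  \sum_{k=1}^{n} k
  = \sum_{k=1}^{n-1} k + n
  = \frac{(n-1)n}{2} + n
  = \frac{n^2 - n + 2n}{2}
  = \frac{n(n+1)}{2}.
\]
Hence the set $S$ of natural numbers for which the claim holds contains $1$ and
is closed under $n \mapsto n + 1$, so $S = \mathbb{N}$.
```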

2.3 Useful Facts

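Theorem 2.1 (DeMorgan’s Laws). Let X be some set, and suppose $V_a \subset X$ for every a ∈ A, where A is some index set. Then,

1. $\left(\bigcup_{a \in A} V_a\right)^c \equiv \bigcap_{a \in A} V_a^c$

2. $\left(\bigcap_{a \in A} V_a\right)^c \equiv \bigcup_{a \in A} V_a^c$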
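DeMorgan's Laws are easy to check on a small finite example. The following sketch uses a hypothetical set `X` and a hypothetical indexed family `family` of three subsets; it verifies the two identities for this one instance and is not a proof.

```python
# Hypothetical ambient set X and indexed family V_a of subsets of X.
X = set(range(10))
family = {
    "a1": {0, 1, 2, 3},
    "a2": {2, 3, 4, 5},
    "a3": {3, 5, 7},
}

def complement(S):
    """Complement of S relative to X."""
    return X - S

union_all = set().union(*family.values())
intersection_all = set(X).intersection(*family.values())

# Law 1: the complement of the union is the intersection of the complements.
law_1 = complement(union_all) == set(X).intersection(
    *(complement(V) for V in family.values())
)

# Law 2: the complement of the intersection is the union of the complements.
law_2 = complement(intersection_all) == set().union(
    *(complement(V) for V in family.values())
)
```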

Definition 2.2. Let f : X → Y .

1. The function f is injective if given x, x′ ∈ X, f(x) = f(x′) implies x = x′.

2. The function f is surjective if for all y ∈ Y , there exists an x ∈ X such that f(x) = y.

3. The function f is bijective if it is injective and surjective.

4. If A ⊂ X, denote by f(A) the set of all images of points in A i.e. f(A) ≡ {y ∈ Y : f(a) = y for some a ∈ A}; we call f(A) the image of A.

5. If B ⊂ Y , denote by $f^{-1}(B)$ the set of all points in X whose images are in B i.e. $f^{-1}(B) \equiv \{x \in X : f(x) \in B\}$; we call $f^{-1}(B)$ the preimage of B.

Do not be confused by the notation $f^{-1}(B)$. In particular, it is not a function like an inverse function. However, if f is bijective, then the preimage notation can be interpreted as an inverse function. It is best to think of $f^{-1}$ as an operator acting on subsets of the range space.
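To see why the preimage is an operator on sets rather than an inverse function, consider a map that is neither injective nor surjective. The sketch below is a minimal illustration; the finite sets `X`, `Y` and the squaring map are hypothetical choices, and `image` and `preimage` are helper names introduced here.

```python
def image(f, A):
    """f(A): the set of images of points in A."""
    return {f(a) for a in A}

def preimage(f, X, B):
    """f^{-1}(B): all points of the domain X whose images lie in B."""
    return {x for x in X if f(x) in B}

# Hypothetical finite domain and codomain.
X = {-2, -1, 0, 1, 2}
Y = {0, 1, 2, 3, 4}

def square(x):
    """x -> x^2, neither injective nor surjective as a map from X to Y."""
    return x * x

pre_of_4 = preimage(square, X, {4})   # two points share the image 4
pre_of_3 = preimage(square, X, {3})   # no point of X maps to 3

A = {1, 2}
round_trip = preimage(square, X, image(square, A))  # contains A
```

Here `pre_of_4` has two elements and `pre_of_3` is empty, so no inverse function from Y to X exists, and `round_trip` is strictly larger than `A` because `square` is not injective.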

Proposition 2.2. Let f : X → Y , A, A′ ⊂ X, and B, B′ ⊂ Y . Also, let $\mathcal{A}$ be an arbitrary collection of subsets of X and $\mathcal{B}$ be an arbitrary collection of subsets of Y. Then

1. $f^{-1}$ satisfies the following:

(a) B ⊂ B′ implies $f^{-1}(B) \subset f^{-1}(B')$.

(b) $f^{-1}(B - B') = f^{-1}(B) - f^{-1}(B')$.

(c) $f^{-1}\left(\bigcup_{B \in \mathcal{B}} B\right) = \bigcup_{B \in \mathcal{B}} f^{-1}(B)$.

(d) $f^{-1}\left(\bigcap_{B \in \mathcal{B}} B\right) = \bigcap_{B \in \mathcal{B}} f^{-1}(B)$.

2. Also, f satisfies the following:

(a) A ⊂ A′ implies f(A) ⊂ f(A′).

(b) f(A − A′) ⊃ f(A) − f(A′); equality obtains if f is injective.

(c) $f\left(\bigcup_{A \in \mathcal{A}} A\right) = \bigcup_{A \in \mathcal{A}} f(A)$.

(d) $f\left(\bigcap_{A \in \mathcal{A}} A\right) \subset \bigcap_{A \in \mathcal{A}} f(A)$; equality obtains if f is injective.

3. Finally, f and $f^{-1}$ satisfy:

(a) $A \subset f^{-1}(f(A))$; equality holds if f is injective.

(b) $B \supset f(f^{-1}(B))$; equality holds if f is surjective.

Proof. We will use an element argument to demonstrate that B ⊂ B′ =⇒ $f^{-1}(B) \subset f^{-1}(B')$. Suppose B ⊂ B′ and let $x \in f^{-1}(B)$ be some arbitrary element. Then f(x) ∈ B, by definition of $f^{-1}$, and so f(x) ∈ B′, since B ⊂ B′. Thus, $x \in f^{-1}(B')$, by definition of $f^{-1}$.

The rest of the proof is left as an exercise. See Exercise 2.1.

Exercise 2.1. Prove all the items in Proposition 2.2. For those statements that don’t hold

with equality (items 2b, 2d, 3a, 3b), provide examples to show why equality fails to

hold.

3 Real Vector Spaces

Let V be a nonempty set.

Definition 3.1. A real vector space is a set V together with two binary operators (vector addition + and scalar multiplication .) that satisfy the following axioms (where u, v, w ∈ V and α, β ∈ R):

1. Associativity of vector addition: u + (v + w) = (u + v) + w

2. Commutativity of vector addition: u + v = v + u

3. Identity element of vector addition: There exists an element o ∈ V , the zero vector, such that v + o = v

4. Inverse element of vector addition: For all v there exists an element w, the additive inverse, such that v + w = o

5. Distributivity of scalar multiplication over vector addition: α.(u + v) = (α.u) + (α.v)

6. Distributivity of scalar multiplication over field addition: (α + β).u = (α.u) + (β.u)

7. Consistency of scalar multiplication with field multiplication: α.(β.u) = (αβ).u

8. Identity element of scalar multiplication: 1.u = u where 1 ∈ R is the identity element of the field R

Vector spaces are also called linear spaces. The most familiar real vector space is R,

which is also a field. The Euclidean spaces $R^n$ for finite n are also frequently encountered

real vector spaces. A perhaps less familiar real vector space is the space of all real-valued

continuous functions on the interval [0, 1].

Exercise 3.1. Show that C([0, 1]), the set of all real-valued continuous functions on the interval [0, 1], is a real vector space.

Definition 3.2. A normed vector space is a vector space V together with a function

ν : V → R, called a norm, such that for all u, v ∈ V , α ∈ R:

1. ν(u) ≥ 0 and ν(u) = 0 ⇐⇒ u = o

2. ν(α.u) = |α|ν(u)

3. ν(u+ v) ≤ ν(u) + ν(v)

The norm formalizes the notion of a “length” of a vector. Consider the Euclidean space $R^n$. The adjective “Euclidean” derives from the use of the Euclidean norm, which for a vector $v \in R^n$ is defined to be $\sqrt{\sum_{i=1}^{n} v_i^2}$, where $v_i$ is the i-th component of v when written as a linear combination of the standard orthonormal basis vectors.

Definition 3.3. A real inner product space is a real vector space V together with a

function 〈·, ·〉 : V × V → R, called the inner product or dot product, such that for all

u, v, w ∈ V , α ∈ R:

1. Positive-definiteness: 〈u, u〉 ≥ 0 and 〈u, u〉 = 0 ⇐⇒ u = o

2. Symmetry: 〈u, v〉 = 〈v, u〉

3. Linearity: 〈αu, v〉 = α〈u, v〉 and 〈u+ v, w〉 = 〈u,w〉+ 〈v, w〉

For ease of notation, I will generally denote 〈u, v〉 as u · v.

Inner products formalize the notion of “angles” between vectors. For our spaces of choice, the Euclidean spaces $R^n$, for n ∈ N, the usual inner product is defined by $u \cdot v \equiv \sum_{i=1}^{n} u_i v_i$. Thus, $\nu(u) = \sqrt{u \cdot u}$. In fact, if $\langle \cdot, \cdot \rangle$ is an inner product, then $\nu(\cdot) \equiv \sqrt{\langle \cdot, \cdot \rangle}$ is a norm.
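The relationship between the usual inner product and the Euclidean norm can be explored numerically. The sketch below is a sanity check on random vectors, not a proof; the dimension `n`, the sampling ranges, and the tolerance `tol` (which absorbs floating-point rounding) are arbitrary assumptions, and the helper names are hypothetical.

```python
import math
import random

def dot(u, v):
    """Usual inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean norm induced by the inner product: sqrt(u . u)."""
    return math.sqrt(dot(u, u))

def scale(alpha, u):
    """Scalar multiplication alpha.u."""
    return [alpha * ui for ui in u]

def add(u, v):
    """Vector addition u + v."""
    return [ui + vi for ui, vi in zip(u, v)]

random.seed(0)
n = 4        # arbitrary dimension for the experiment
tol = 1e-9   # tolerance for floating-point rounding
checks = []
for _ in range(1000):
    u = [random.uniform(-5, 5) for _ in range(n)]
    v = [random.uniform(-5, 5) for _ in range(n)]
    alpha = random.uniform(-3, 3)
    homogeneous = abs(norm(scale(alpha, u)) - abs(alpha) * norm(u)) <= tol
    triangle = norm(add(u, v)) <= norm(u) + norm(v) + tol
    cauchy_schwarz = abs(dot(u, v)) <= norm(u) * norm(v) + tol
    checks.append(homogeneous and triangle and cauchy_schwarz)

all_checks_pass = all(checks)
```

The Cauchy-Schwarz inequality is included because it is the usual ingredient in proving the triangle inequality for a norm derived from an inner product.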


4 Metric Spaces

Let X be a nonempty set.

Definition 4.1. A function ρ : X ×X → R is a metric if, for any x, y, z ∈ X,

1. Non-negativity and properness: ρ(x, y) ≥ 0 and ρ(x, y) = 0 ⇐⇒ x = y

2. Symmetry: ρ(x, y) = ρ(y, x)

3. Triangle Inequality: ρ(x, y) ≤ ρ(x, z) + ρ(z, y)

A metric space is a nonempty set X together with a metric ρ, denoted (X, ρ).

Metrics formalize the notion of “distance” between points or elements of a set, and metric

spaces allow for notions of convergence and of continuity, as we shall see soon, though they

are not the most basic way to formalize these notions.

The (finite-dimensional) Euclidean spaces $R^n$ are metric spaces (in fact, they are complete

metric spaces, as we shall soon see), with the metric being derived from the Euclidean norm

as follows: ρ(x, y) ≡ ν(x−y). These spaces are the workhorse for our exploration of classical

and nonlinear programming.

Exercise 4.1. Suppose that (X, ν) is a normed vector space. Show that (X, ρ) is a metric

space when ∀x, y ∈ X, ρ(x, y) ≡ ν(x− y).

The same set could be associated with many different metrics. An example of a different

metric for $R^n$ is the taxicab or Manhattan metric, which is defined for x, y in $R^n$ by $\rho(x, y) \equiv \sum_{i=1}^{n} |x_i - y_i|$. In general, the metric used with a particular set can alter the mathematical

properties of the associated metric space, but for finite-dimensional Euclidean spaces, the

choice of metric does not affect the continuity properties of functions on the space (for the

set of metrics derived from p-norms).

Definition 4.2. Two metrics $\rho_1$ and $\rho_2$ defined for some set X are strongly equivalent if there exist positive constants α, β ∈ R such that for all x, y ∈ X,

$\alpha \rho_1(x, y) \leq \rho_2(x, y) \leq \beta \rho_1(x, y)$

Exercise 4.2. Show that the standard Euclidean metric $\rho(x, y) \equiv \sqrt{\sum_i (x_i - y_i)^2}$ is strongly equivalent to the taxicab metric.
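Before attempting a proof, it can help to test candidate constants numerically. The sketch below takes $\rho_1$ to be the Euclidean metric and $\rho_2$ the taxicab metric and tries α = 1 and β = √n; these constants are candidates suggested here, not taken from the notes, and a random experiment of this kind is only evidence, not a proof.

```python
import math
import random

def euclidean(x, y):
    """Standard Euclidean metric on R^n."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def taxicab(x, y):
    """Taxicab (Manhattan) metric on R^n."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

random.seed(1)
n = 5                              # arbitrary dimension for the experiment
alpha, beta = 1.0, math.sqrt(n)    # candidate constants, to be justified by a proof
tol = 1e-9                         # tolerance for floating-point rounding

bounds_hold = True
for _ in range(2000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    d_e, d_t = euclidean(x, y), taxicab(x, y)
    # Check alpha * rho_1 <= rho_2 <= beta * rho_1 for this pair of points.
    if not (alpha * d_e <= d_t + tol and d_t <= beta * d_e + tol):
        bounds_hold = False
```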


5 Analysis and Topology of Metric Spaces

5.1 Open Sets and Topology

Let (X, ρ) be a metric space.

Definition 5.1. An ε-ball about x, denoted $B_\varepsilon(x; \rho)$, is the set {y ∈ X : ρ(x, y) < ε}, where ε is a positive real number.

When understandable by context, the notation for the metric will be suppressed.

Definition 5.2. A set U ⊂ X is open if for all x ∈ U , there exists ε > 0 such that

$B_\varepsilon(x) \subset U$.

It should be clear that ε-balls are open sets.

Definition 5.3. An open neighborhood of a point x ∈ X is an open set U ⊂ X such that

x ∈ U .

Definition 5.4. A point x ∈ X is an interior point of a set A ⊂ X if there exists ε > 0 such that $B_\varepsilon(x) \subset A$. The interior of a set A, denoted intA, is the set of all interior points of A. The point x is a boundary point of the set A if for all ε > 0, $B_\varepsilon(x) \cap A \neq \emptyset$ and $B_\varepsilon(x) \cap (X - A) \neq \emptyset$. The boundary of a set A, denoted bdA, is the set of all points x ∈ X that are boundary points of A.

Exercise 5.1. Show that the interior of a set A ⊂ X is equal to the union of all open

subsets of X that are also subsets of A.

Definition 5.5. A set A ⊂ X is closed if bdA ⊂ A.

Note that a set could be neither open nor closed. Also, a set could be both open and

closed. This is amusingly depicted in the following web comic: http://abstrusegoose.com/394.

Remark 1. A set A is open if and only if every point is an interior point i.e. A = intA.

Definition 5.6 (Closure). The closure of a set A ⊂ X, denoted clA, is the intersection of

all closed sets containing A. Note that clA is a closed set.

Definition 5.7 (Bounded Set). Let (X, ρ) be a metric space. A set Y ⊂ X is bounded if

there exist x ∈ Y and r ∈ $R_+$ such that $Y \subset B_r(x; \rho)$.

Definition 5.8 (Totally Bounded Set). Let (X, ρ) be a metric space. A set Y ⊂ X is totally

bounded if for all ε ∈ $R_+$, there exists a finite subset Z ⊂ Y such that $Y \subset \bigcup_{z \in Z} B_\varepsilon(z; \rho)$

i.e. the set Y can be covered by finitely many ε-balls, for any ε > 0.


Exercise 5.2. Show that a totally bounded set must also be bounded. Demonstrate with

an example that the converse is not true.

Corollary 5.1. For the Euclidean spaces $R^n$, every bounded set is totally bounded. Hence,

the definitions are equivalent for these spaces.

Definition 5.9 (Metric Topology). Given a metric space (X, ρ), the metric topology

induced by ρ is the set τ of all open subsets of X, where the open sets are defined as in

Definition 5.2.

Exercise 5.3. Let (X, ρ) be a metric space with the metric topology τ . Show that

1. the sets X and ∅ are both open and closed

2. an arbitrary union of open sets is open

3. the finite intersection of open sets is open

4. the complement of an open set is closed (and vice-versa)

The first three items in Exercise 5.3 could be taken as the axioms for an arbitrary set

X together with a collection τ of subsets of X to define a topological space (X, τ). Therefore,

while every metric space has an associated topology, one could study a space with a topology

without a metric or with a topology other than the one induced by the metric.

Definition 5.10 (Topological Space). Given an arbitrary set X, a topology τ on X is a

collection of subsets of X that satisfies the following conditions:

1. X and ∅ are elements of τ

2. τ is closed under arbitrary unions i.e. for any subcollection τ′ ⊂ τ , $\bigcup_{U \in \tau'} U \in \tau$

3. τ is closed under finite intersections i.e. for any finite subcollection τ′ ⊂ τ , $\bigcap_{U \in \tau'} U \in \tau$

Members of τ are called open sets, and complements of open sets are called closed.
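For a finite set X the three conditions can be checked mechanically. The sketch below is an illustration with hypothetical names (`is_topology`, `tau_good`, `tau_bad`); it relies on the fact that, for a finite collection containing ∅, closure under pairwise unions and intersections already gives closure under arbitrary unions and finite intersections.

```python
from itertools import combinations

def is_topology(X, tau):
    """Check the topology axioms for a collection tau of subsets of a finite set X."""
    X = frozenset(X)
    tau = {frozenset(U) for U in tau}
    # Every member of tau must be a subset of X.
    if any(not U <= X for U in tau):
        return False
    # Condition 1: X and the empty set belong to tau.
    if X not in tau or frozenset() not in tau:
        return False
    # Conditions 2 and 3: pairwise closure suffices for a finite collection.
    for U, V in combinations(tau, 2):
        if (U | V) not in tau or (U & V) not in tau:
            return False
    return True

X = {1, 2, 3}
tau_good = [set(), {1}, {1, 2}, {1, 2, 3}]
tau_bad = [set(), {1}, {2}, {1, 2, 3}]   # the union {1} ∪ {2} = {1, 2} is missing
```

With these inputs `is_topology(X, tau_good)` holds and `is_topology(X, tau_bad)` fails, and the two-element collection consisting of ∅ and X always passes.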

Definition 5.11 (Topological Base). Let X be a space. Suppose $\mathcal{B}$ is a collection of subsets of X such that:

1. $\bigcup_{B \in \mathcal{B}} B = X$

2. For every $x \in B_1 \cap B_2$, where $B_1, B_2 \in \mathcal{B}$, there exists $B_3 \in \mathcal{B}$ such that $x \in B_3 \subset B_1 \cap B_2$.

Then $\mathcal{B}$ is a base (or basis) for the topology τ , and τ is generated by $\mathcal{B}$ as follows: a set U ⊂ X is open (U ∈ τ) if for any x ∈ U , there exists $B \in \mathcal{B}$ such that x ∈ B ⊂ U .

Notice that if one has a topology τ then a collection $\mathcal{B}$ of open sets is a base if every open set is the

union of base elements. Moreover, every union of base elements is open.

Example 5.1. The intervals (a, b), a < b form a base for the standard topology on R.

Definition 5.12 (Subspace Topology). Let (X, τ) be a topological space. For Y ⊂ X, the

subspace topology (or relative topology or induced topology) of Y is the collection

τY ≡ {U ∩ Y : U ∈ τ} i.e. the restriction of open sets in X to the set Y .

Definition 5.13 (Box Topology). Let $(X_a, \tau_a)$, a ∈ A be a family of topological spaces, indexed by A. The box topology of the Cartesian product $X \equiv \prod_{a} X_a$ is the topology generated by the base $\mathcal{B}_{box} \equiv \{\prod_{a} U_a : U_a \in \tau_a\}$ i.e. every open set in X is the union of sets formed by the Cartesian product of sets open in $X_a$.

Definition 5.14 (Product Topology). Let $(X_a, \tau_a)$, a ∈ A be a family of topological spaces, indexed by A. The product topology of the Cartesian product $X \equiv \prod_{a} X_a$ is the topology generated by the base $\mathcal{B}_{product}$, every element of which is formed by the cartesian product of sets $\prod_{a \in A} Y_a$, where $Y_{a'} = X_{a'}$ for all a′ ∈ A′, A − A′ is a finite set, and $Y_{a''} \in \tau_{a''}$ for all a′′ ∈ A − A′. Thus, the base consists of the cartesian product of entire spaces except for a finite number of the indices, for which the entire space is replaced with some set open in that particular space.

Notice that if the index set A in the above definitions is finite, then the two bases are

the same, and thus the two topologies are equivalent. Generally, when considering cartesian

products of topological spaces, we will assume unless otherwise stated that the topology of

the product space is the product topology.

Definition 5.15 (Projection Mapping). Let $(X_a, \tau_a)$ be a family of topological spaces indexed by the set A. The projection mapping associated with index b ∈ A is the function $\pi_b : X \to X_b$, where $X \equiv \prod_{a \in A} X_a$, such that $\pi_b((x_a)_{a \in A}) = x_b$. Thus, the projection mapping of an index b associates a point in the cartesian product with its b-th coordinate.

Definition 5.16 (Bounded Metric). Let (X, ρ) be a metric space. The metric is bounded

if there exists M such that ρ(x, y) ≤M for all x, y ∈ X. Thus, we have a bounded metric if

the metric space is itself a bounded set.

A metric need not be bounded, but given any metric space (X, ρ), we can construct a bounded metric ρ′ ≡ ρ/(1 + ρ), which takes values in [0, 1). Notice that ρ′ preserves the ordering of distances between points i.e. ρ(x, y) ≥ ρ(u, v) ⇐⇒ ρ′(x, y) ≥ ρ′(u, v). In fact, there are other bounded metrics one could define that preserve the ordering of ρ.
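The construction ρ′ = ρ/(1 + ρ) can be checked numerically. The following is a minimal sketch, assuming the Euclidean metric on R and a small sample of integer points (both choices are illustrative, not part of the notes); it verifies on the sample that ρ′ stays below 1, satisfies the triangle inequality, and ranks distances exactly as ρ does.

```python
import itertools

def rho(x, y):
    """Euclidean metric on R (an unbounded metric)."""
    return abs(x - y)

def rho_bounded(x, y):
    """The bounded metric rho' = rho / (1 + rho), with values in [0, 1)."""
    d = rho(x, y)
    return d / (1.0 + d)

# Illustrative sample: the integers -10, ..., 10 (an assumption made for this sketch).
points = [float(k) for k in range(-10, 11)]

# rho' stays strictly below 1 on the sample.
max_bounded = max(rho_bounded(x, y) for x in points for y in points)

# Triangle inequality for rho' on all sampled triples (small tolerance for rounding).
triangle_ok = all(
    rho_bounded(x, z) <= rho_bounded(x, y) + rho_bounded(y, z) + 1e-12
    for x, y, z in itertools.product(points, repeat=3)
)

# The ordering of distances is preserved because t -> t / (1 + t) is strictly increasing.
pairs = list(itertools.combinations(points, 2))
order_ok = all(
    (rho(*p) >= rho(*q)) == (rho_bounded(*p) >= rho_bounded(*q))
    for p in pairs for q in pairs
)
```

A check on finitely many points is of course not a proof; it only illustrates what the general argument has to establish.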

Definition 5.17 (Ordinal Equivalence). Let (X, ρ) be a metric space. A metric ρ′ is ordi-

nally equivalent to ρ if for all x, y, u, v ∈ X, ρ(x, y) ≥ ρ(u, v) ⇐⇒ ρ′(x, y) ≥ ρ′(u, v).

Definition 5.18 (Equivalence of metrics). Two metrics ρ1 and ρ2 defined for some set X

are (topologically) equivalent if they generate the same topology. In particular, the two

metrics are equivalent if for all x ∈ X and any ε > 0 there exist ε′ > 0 and ε′′ > 0 such

that

Bε′(x; ρ1) ⊂ Bε(x; ρ2) and Bε′′(x; ρ2) ⊂ Bε(x; ρ1)

Exercise 5.4. Suppose ρ is a metric on X. Show that for any strictly increasing, continuous,

subadditive function f : R+ → R+, where f(0) = 0, the function ρ′ ≡ f ◦ ρ defines a metric

that is equivalent to ρ; a function f is subadditive if for any x, y, f(x + y) ≤ f(x) + f(y).

Don’t forget to prove that ρ′ satisfies the conditions to be a metric. Conclude that the metric ρ′ ≡ ρ/(1 + ρ) is equivalent to ρ.

Exercise 5.5. Show that ordinally equivalent metrics are equivalent metrics.

Exercise 5.6. Show that strong equivalence of two metrics for a space X implies the metrics

are equivalent. Note that the converse is not true for arbitrary metric spaces, because a bounded

metric can be equivalent to an unbounded metric, but cannot be strongly equivalent to it,

because strong equivalence preserves the boundedness property (can you see why?).

Definition 5.19 (Limit Point). For some topological space (X, τ), a point x is a limit point

of a set A ⊂ X if every open neighborhood of x intersects A at some point other than x itself.

Notice that a limit point of a set need not be in the set. Closed sets exhibit the property

that they contain all their limit points, which is a corollary to the following exercise.

Exercise 5.7. Show that closure of a set A ⊂ X is the union of A with the set of limit

points of A.

Definition 5.20 (Denseness). Let (X, τ) be a topological space and let Z ⊂ X. A subset Y ⊂ Z is dense in Z if the closure of Y contains Z.

Definition 5.21 (Separable). A metric space (X, ρ) is separable if there is a dense subset

Y ⊂ X that is countable.

Remark 2 (Density of Rationals). The space of real numbers R is separable, because the

rational numbers are a countable set that is dense in R.

Exercise 5.8. Prove Remark 2.


5.2 Sequences

Definition 5.22. A sequence in X is a function x from N to X. A sequence is usually denoted (xn), where xn ≡ x(n), n ∈ N.

Definition 5.23 (Convergence: Metric). Let (X, ρ) be a metric space. A sequence (xn) in

X converges to x if, for every ε > 0, there exists an N ∈ N such that whenever n ≥ N ,

ρ(xn, x) < ε. We say that such an x is the limit of the sequence, with the notation being

limxn = x. A sequence that does not converge is said to diverge.

This definition of convergence will not work for a topological space without a metric. We

can define convergence of a sequence more generally for such spaces.

Definition 5.24 (Convergence: Topological). Let (X, τ) be a topological space. A sequence

(xn) in X converges to x if for every open set U ∋ x, the sequence is eventually contained

in the set U i.e. there exists an N such that xn ∈ U whenever n ≥ N .

Exercise 5.9. Show that the two definitions of convergence of a sequence (xn) are equivalent

for metric spaces.

Definition 5.25 (Sequentially Closed). Let (X, τ) be a topological space. A set Y ⊂ X is se-

quentially closed if for every convergent sequence (xn) contained in Y (where convergence

is relative to the topology of X) the limit of the sequence is in Y .

Theorem 5.2. If (X, ρ) is a metric space, then a set is closed if and only if it is sequentially

closed.

For general topological spaces, it is only true that a closed set is sequentially closed. The

converse does not hold for an arbitrary topological space.

Definition 5.26 (Subsequence). For some space X, let (xn) be a sequence in X and consider a strictly increasing sequence of natural numbers (mi). This increasing sequence (mi) produces a unique subsequence, (x_{m_i}), of the original sequence, whose ith term is the term of (xn) with index mi. Note that the generated subsequence is itself a sequence.

Definition 5.27 (Cauchy Sequence). Let (X, ρ) be a metric space. A sequence (xn) is a

Cauchy sequence if, for all ε > 0, there exists N ∈ N such that for all m,n ≥ N ,

ρ(xn, xm) < ε.

Notice that Cauchy sequences can only be defined for metric spaces, and not for topo-

logical spaces in general. Thus, completeness is not a topological property, because two

equivalent metrics could yield different completeness properties.


Definition 5.28 (Completeness). Let (X, ρ) be a metric space. A set Y ⊂ X is complete

if every Cauchy sequence in Y has a limit in Y .

Remark 3. Our favorite space Rn is a complete metric space.
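To see concretely why completeness depends on the metric and not only on the topology, consider the following sketch. It assumes, as an illustrative example not taken from the notes, the metric ρ′(x, y) = |arctan x − arctan y| on R, which generates the usual topology because arctan is a homeomorphism of R onto (−π/2, π/2). Under ρ′ the sequence xn = n is Cauchy yet has no limit in R, so (R, ρ′) is not complete even though R with the standard metric is.

```python
import math

def rho(x, y):
    """Standard metric on R."""
    return abs(x - y)

def rho_arctan(x, y):
    """Assumed example metric |arctan x - arctan y|; topologically equivalent to rho."""
    return abs(math.atan(x) - math.atan(y))

def tail_spread(metric, N, horizon=10_000):
    """Largest distance from x_N to later terms of x_n = n, up to a finite horizon."""
    return max(metric(N, m) for m in range(N, horizon))

# Under rho_arctan the tails of x_n = n shrink, as the Cauchy property requires ...
arctan_tails = [tail_spread(rho_arctan, N) for N in (10, 100, 1000)]
# ... while under the standard metric they stay large.
standard_tails = [tail_spread(rho, N) for N in (10, 100, 1000)]
```

The finite horizon is only a computational stand-in for "all later terms"; the exact tail spread under ρ′ is π/2 − arctan N, which tends to zero.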

Definition 5.29 (Bounded Sequence). Let (X, ρ) be a metric space. A sequence (xn) is a

bounded sequence if the set {xn} is a bounded set.

1. Every convergent sequence is bounded

2. Let lim an = a and lim bn = b, where (an) and (bn) are sequences in R. Then,

(a) lim can = ca,∀c ∈ R

(b) lim(an + bn) = a+ b

(c) lim(anbn) = ab

(d) lim(an/bn) = a/b, b ≠ 0

(e) (an ≥ 0,∀n) =⇒ a ≥ 0

(f) (an ≤ bn, ∀n) =⇒ a ≤ b

(g) (∃c ∈ R, c ≤ bn,∀n) =⇒ c ≤ b. A similar statement with the inequalities reversed also holds.

3. Every monotone and bounded sequence converges.

4. Subsequences of a convergent sequence converge to the same limit as the original se-

quence.

Exercise 5.10 (Closed Sets Inherit Completeness). Let (X, ρ) be a complete metric space.

Show that any closed subset Y ⊂ X is also complete.

5.3 Continuity

Definition 5.30 (Continuity at a point: Topological definition). Let (X, τX) and (Y, τY ) be

topological spaces. A function f : X → Y is continuous at x, if for all open neighborhoods

V of f(x), there exists an open neighborhood U of x such that f(U) ⊂ V i.e. the pre-image of every open neighborhood of f(x) contains an open neighborhood of x.

Definition 5.31 (Continuity at a point: Cauchy-Weierstrass definition). Let (X, ρX) and

(Y, ρY ) be metric spaces. A function f : X → Y is continuous at x, if for all ε > 0, there

exists δ > 0 such that for all x′ ∈ X, ρX(x, x′) < δ implies ρY(f(x), f(x′)) < ε.


Definition 5.32 (Sequential Continuity at a point: Heine definition). Let (X, τX) and

(Y, τY ) be topological spaces. A function f : X → Y is sequentially continuous at

x, if for all sequences (xn) in X that converge to x, the sequence (f(xn)) converges to f(x)

i.e. sequentially continuous functions preserve limits.

Definition 5.33. A function f : X → Y is continuous if it is continuous at every point

x ∈ X. A function f : X → Y is sequentially continuous if it is sequentially continuous

at every point x ∈ X.

For metric spaces, definitions 5.30 and 5.31 of continuity are equivalent. Moreover, for metric spaces continuity and sequential continuity are equivalent. However, in more general topological spaces, sequential continuity does not imply continuity, but the converse is still true.

Exercise 5.11. For arbitrary topological spaces X and Y , show that any continuous function

f : X → Y is sequentially continuous.

Exercise 5.12. Suppose f : Rn → R is continuous under the Euclidean metric. Show that

function f is continuous under any metric on Rn that is equivalent to the Euclidean metric.

Exercise 5.13. Show that the composition of two continuous functions is continuous.

Definition 5.34 (Uniform Continuity). Let (X, ρ) be a metric space. A function f : X → R is uniformly continuous if for all ε > 0 there exists a δ > 0 such that for any x, y ∈ X, if ρ(x, y) < δ, then |f(x) − f(y)| < ε.

Notice the slight change in the order of the quantifiers in the definition of uniform con-

tinuity from the Cauchy-Weierstrass definition of continuity.
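The effect of moving the quantifier can be seen in a small numerical sketch. It assumes the illustrative function f(x) = x² (not an example from the notes): with δ fixed, the gap |f(x + δ/2) − f(x)| = xδ + δ²/4 grows without bound along the real line, so no single δ serves every x, whereas on [0, 1] the same δ keeps the gap uniformly small.

```python
def gap(f, x, delta):
    """|f(x + delta/2) - f(x)|: the two points are always within delta of each other."""
    return abs(f(x + delta / 2.0) - f(x))

def square(x):
    """Illustrative function f(x) = x^2 (an assumption made for this sketch)."""
    return x * x

delta = 0.01

# On R the gap equals x*delta + delta^2/4, which grows with x, so for epsilon = 1
# this delta eventually fails; the same happens for any other fixed delta.
gaps_on_line = [gap(square, x, delta) for x in (1.0, 10.0, 100.0, 1000.0)]

# On sampled points of [0, 1] the same delta keeps the gap uniformly small.
gaps_on_unit = [gap(square, k / 100.0, delta) for k in range(100)]
```

Pointwise continuity lets δ shrink as x grows; uniform continuity forbids exactly that adjustment.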

Exercise 5.14. Suppose f : Rn → R is uniformly continuous under the Euclidean metric.

Show that function f is uniformly continuous under any metric on Rn that is strongly

equivalent to the Euclidean metric.

Proposition 5.3 (Continuity of Projection Mappings). Let (Xa, τa) be a family of topological spaces indexed by the set A, and define X ≡ ∏a∈A Xa. The projection mappings are continuous in both the product and box topologies.

Proof. Suppose Ub ⊂ Xb is an open set. Then the preimage of Ub under the projection mapping πb is the set ∏a∈A Ya where Ya = Xa for all a ≠ b and Yb = Ub. But this set is an element of both the base Bbox and Bproduct and is therefore open in both the box and the product topology. Thus, the preimage of every open set is open for any projection mapping, and so these mappings are continuous.


Theorem 5.4. Let (Xa, τa) be a family of topological spaces indexed by the set A, and define X ≡ ∏a∈A Xa with the product topology. Suppose f : Y → X is defined by f(y) ≡ (fa(y))a∈A, where Y is a topological space and fa : Y → Xa for every a. Then f is continuous if and only if fa is continuous for all a ∈ A.

The previous theorem is not true for infinite cartesian products with the box topology.

The following provides a simple counterexample.

Example 5.2. Suppose f : R → R∞, where R∞ denotes the countably infinite product of copies of R, and suppose that fn : R → R is defined by fn(t) = t for all n ∈ N. Thus, f(t) = (t, t, t, . . .). For each n, fn is continuous in the standard topology of R. However, f is not continuous when R∞ has the box topology. Consider the set U = (−1, 1) × (−1/2, 1/2) × (−1/3, 1/3) × · · · . It is clear that U is open in R∞ under the box topology, since (−1/n, 1/n) is open in R for any n ∈ N. However, f−1(U) is not open in R. To demonstrate this, suppose to the contrary that f−1(U) were open. Since 0 ∈ f−1(U), it would have to contain an interval around 0, say (−ε, ε), which implies that f((−ε, ε)) ⊂ U. Applying the projection mapping πn to the left side of the previous inclusion yields πn(f((−ε, ε))) = fn((−ε, ε)) = (−ε, ε), and to the right side yields πn(U) = (−1/n, 1/n). Thus, (−ε, ε) ⊂ (−1/n, 1/n) for all n, which yields a contradiction, since no ε-interval could satisfy this: in fact f−1(U) = {0}, and the set {0} is not open.

5.4 Compactness

The most general definition, one that works for an arbitrary topological space, involves

the notion of covers.

Definition 5.35 (Open Cover). Let (X, τ) be a topological space, and F = {Uα ∈ τ : α ∈ A} be an indexed family of open sets, where A is an index set. Then, F is an open cover of X if X ⊂ ⋃α∈A Uα.

Definition 5.36 (Finite Subcover). Given an open cover F of X, a finite subcover is a

finite subcollection of sets from the original open cover F whose union still contains X.

Definition 5.37 (Compact Set: Heine-Borel (Topological) definition). A set S ⊂ X is

compact if every open cover of S has a finite subcover.

The topological definition is quite abstract and at this stage obscure; I include it for the

sake of completeness of exposition. A somewhat more useful but still abstract definition

involves the finite intersection property.

Definition 5.38 (Finite Intersection Property). A collection of sets 𝒜 has the finite intersection property if every finite subcollection {A1, . . . , Am} has a nonempty intersection i.e. ⋂_{i=1}^{m} Ai ≠ ∅.


Theorem 5.5. A set S ⊂ X is compact if and only if every collection 𝒜 of closed subsets of S with the finite intersection property has a nonempty intersection i.e. ⋂A∈𝒜 A ≠ ∅.
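A small worked illustration of how Theorem 5.5 detects a failure of compactness (the example is chosen here and is not taken from the notes): take S = (0, 1] with the subspace topology from R. Each set A_n below is closed in S, being the restriction to S of a closed set in R. Every finite subcollection has a nonempty intersection, yet the whole collection has an empty intersection, so S is not compact. This agrees with the fact that (0, 1] is not closed in R.

```latex
\[
S = (0,1], \qquad
A_n = \bigl(0, \tfrac{1}{n}\bigr] = S \cap \bigl[0, \tfrac{1}{n}\bigr] \quad (n \in \mathbb{N}),
\]
\[
\bigcap_{i=1}^{m} A_{n_i} = \Bigl(0, \tfrac{1}{\max_i n_i}\Bigr] \neq \emptyset
\qquad\text{but}\qquad
\bigcap_{n \in \mathbb{N}} A_n = \emptyset .
\]
```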

For metric spaces, the following notion, sequential compactness, is equivalent to compactness, and is for us a more useful definition.

Definition 5.39 (Sequential Compactness). For a topological space (X, τ), a set S ⊂ X

is sequentially compact if every sequence in S has a subsequence that converges to a

limit that is also in S. If X is a metric space, then sequential compactness is equivalent to

compactness.

Theorem 5.6 (Heine-Borel Theorem). A nonempty set S ⊆ Rn (with the Euclidean metric)

is compact if and only if it is closed and bounded.

The Heine-Borel Theorem allows us an easy characterization of compact sets in Rn.

Compact sets are useful because they behave as though they are finite sets (hence the word

compact). In economics, we often assume that the sets we are working with are compact,

particularly because of the Weierstrass Theorem. The theorem makes it easy to identify

whether a given set from a Euclidean space is compact or not. There is a generalization of

this theorem to metric spaces that requires some strengthening of the conditions.

Theorem 5.7. Let (X, ρ) be a metric space. A set Y ⊂ X is compact if and only if it is

complete and totally bounded.

Theorem 5.8 (Bolzano-Weierstrass Theorem). Every bounded sequence (xn) in Rn has a

convergent subsequence. Equivalently, a subset of Rn is sequentially compact (hence compact)

if and only if it is closed and bounded.

For metric spaces, the Bolzano-Weierstrass Theorem is essentially the same as the Heine-

Borel Theorem because of the equivalence of the compactness and sequential compactness.

The following is a crucial theorem from which the Weierstrass Extreme Value Theorem

follows quite simply.

Theorem 5.9 (Continuous Mappings Preserve Compactness). Let (X, τX) and (Y, τY ) be

topological spaces, and suppose f : X → Y is a continuous function. Then for any K ⊂ X

that is compact, f(K) is compact.

Proof. Let AY ≡ {Va : a ∈ A} be an open covering of the image f(K) of a compact set K ⊂ X. Now, since f is continuous, the pre-image of every member of the collection AY is open in X, so we have an open covering AX ≡ {f−1(V) : V ∈ AY} of K. Since K is compact, every open cover has a finite subcover, and so there exists a finite subset A′ ⊂ A such that K ⊂ ⋃a∈A′ f−1(Va). Applying f to both sides gives f(K) ⊂ ⋃a∈A′ f(f−1(Va)) ⊂ ⋃a∈A′ Va, so the sets {Va : a ∈ A′} form a subcover of f(K) relative to the collection AY. But A′ is finite and so we have a finite subcover for f(K), proving compactness of f(K).

Theorem 5.10 (Uniform Continuity Theorem). Let (X, ρX) and (Y, ρY ) be metric spaces.

If f : X → Y is a continuous function, and X is a compact space, then f is uniformly

continuous.

The following theorem is a very useful result that says the Cartesian product of compact

spaces is compact.

Theorem 5.11 (Tychonoff Theorem). Suppose (Xa, τa) is a compact space for every a ∈ A, where A is some index set. Then ∏a∈A Xa is compact in the product topology.

Proposition 5.12. The following are some useful results about bounded sets and compact

sets:

1. The union of an arbitrary collection of bounded sets is not necessarily bounded.

2. The union of a finite collection of bounded sets is bounded.

3. The intersection of an arbitrary collection of bounded sets is bounded.

4. The sum of two bounded sets is bounded.

5. The union of an arbitrary collection of compact sets is not necessarily compact.

6. The union of a finite collection of compact sets is compact.

7. The sum of two compact sets is compact.

8. Closed subsets of compact spaces are compact.

Exercise 5.15. Prove item 8 in Proposition 5.12.

5.5 Connectedness

Definition 5.40 (Connectedness). A space (X, τ) is connected if there do not exist two

nonempty open disjoint sets U and V such that X = U ∪ V . A subset S ⊂ X is connected

in X if it is a connected space under the subspace topology.

Proposition 5.13. A space (X, τ) is connected if and only if the only sets that are both

open and closed are X and ∅.


Proof. Assume connectedness of X. Suppose U ⊊ X is nonempty and open. Then U^c is nonempty and closed. If U^c were also open, then U and U^c would be two nonempty disjoint open sets with X = U ∪ U^c, contradicting connectedness; hence U^c is not open, and so U is not closed. Since U was an arbitrary nonempty open strict subset of X, no nonempty strict subset of X is both open and closed.

The proof of the other direction is trivial.

Proposition 5.14 (Results about Connected Sets). Some results involving connected sets:

1. Continuous maps preserve connectedness i.e. the image of a connected set under a

continuous function is connected.

2. Finite Cartesian products of connected sets are connected. Arbitrary Cartesian products of connected sets are connected under the product topology, but not necessarily under the box topology.

3. The real line R is connected, as are intervals and rays (intervals that are unbounded

on one side).

As you continue your study of economics, you will find fixed point theorems pop up every-

where in microeconomics, because of their usefulness in proving equilibrium existence, from

Walrasian equilibrium in the Arrow-Debreu-McKenzie-Nikaido general equilibrium model to

Nash equilibrium in game theory. Fixed point theorems generally state that for some map-

ping ψ : Y → Y , for some space Y , there exists a solution to the equation ψ(y) = y. You

may not realize this, but you are probably already familiar with a fixed point theorem, just not by

that name. Acemoglu argues, convincingly, that the Intermediate Value Theorem has the

quality of a fixed point theorem. Let us first see a statement of the theorem.

Theorem 5.15 (Intermediate Value Theorem). Let (X, τ) be a connected topological space.

Suppose f : X → Y is a continuous function, where Y ⊂ R endowed with the standard

(subspace) topology⁵. If a, b ∈ X and there exists z ∈ R such that f(a) ≤ z ≤ f(b), then

there exists c ∈ X such that f(c) = z.

To see why the Intermediate Value Theorem resembles a fixed point theorem consider a

function f : X → X, where X is a compact, connected subset of R i.e. X = [a, b], a ≤ b.

Then, if f is continuous there exists c ∈ [a, b] such that f(c) = c. This follows quite simply

from an application of the Intermediate Value Theorem.
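The argument behind this claim applies the Intermediate Value Theorem to g(x) ≡ f(x) − x: since f maps [a, b] into itself, g(a) ≥ 0 ≥ g(b), so g has a zero, which is a fixed point of f. The same sign change also drives a bisection search. The sketch below is illustrative only; the function name `fixed_point` and the choice f = cos on [0, 1] are assumptions made for the example.

```python
import math

def fixed_point(f, a, b, tol=1e-10, max_iter=200):
    """Bisection on g(x) = f(x) - x; assumes f is continuous and maps [a, b] into itself."""
    g = lambda x: f(x) - x
    lo, hi = a, b
    # Because f(a) >= a and f(b) <= b, we keep g(lo) >= 0 >= g(hi) throughout.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1], so a fixed point exists.
c = fixed_point(math.cos, 0.0, 1.0)
```

Bisection only locates one fixed point; the theorem guarantees existence, not uniqueness.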

⁵ The theorem could be generalized by taking the space Y to be any ordered space endowed with the order topology.


5.6 Sequences of Functions

Suppose (fn) is a sequence of functions from some set X to a metric space (Y, ρ).

Definition 5.41 (Pointwise Convergence). The sequence (fn) converges pointwise to a

function f if for all x ∈ X, the sequence (fn(x)) converges to f(x). In notation, for all

x ∈ X, for all ε > 0 there exists N such that for all n ≥ N , ρ(fn(x), f(x)) < ε.

Definition 5.42 (Uniform Convergence). The sequence (fn) converges uniformly to a

function f if for all ε > 0 there exists N such that for any x ∈ X and for all n ≥ N ,

ρ(fn(x), f(x)) < ε. Equivalently, we have uniform convergence if lim_{n→∞} sup{ρ(fn(x), f(x)) : x ∈ X} = 0.

Notice the change in the order of quantifiers that is similar to the swap for uniform

continuity. Pointwise convergence looks at the convergence of the function at a point, treating

each point as a sequence by itself, whereas uniform convergence ties together the “rate of

convergence” of sequences at each point x by requiring the same threshold N for all points

x.

Theorem 5.16 (Uniform Convergence Theorem). Suppose (fn) is a sequence of functions

fn : X → Y , where X is a topological space and Y is a metric space. If (fn) is a sequence

of continuous functions that converges uniformly to a function f , then f is continuous.

Example 5.3. Let X ≡ [0, 1] and Y ≡ R. Suppose fn : X → Y is defined by fn(x) ≡ x^n,

where n ∈ N. Notice that (fn) converges pointwise to f , where f(x) = 0 for all x ∈ [0, 1)

and f(x) = 1, x = 1. Thus, a sequence of continuous functions converges to a discontinuous

function. However, this sequence of functions does not converge uniformly.

Proof. We shall demonstrate the pointwise convergence of (fn) to f. Choose some x ∈ (0, 1) and define (an) by an ≡ fn(x) = x^n. Let ε > 0; without loss of generality, ε < 1. Then, for all n > N ≡ log ε / log x (a positive number, since both logarithms are negative), |x^n − 0| < ε, and thus (an) converges to 0. For x = 0 and x = 1 it is clear that (fn(x)) converges to f(0) and f(1), respectively.
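The failure of uniform convergence in Example 5.3 can also be made concrete. In the sketch below (the helper names are illustrative), the point x_n = 2^(−1/n) lies in (0, 1) and satisfies fn(x_n) = 1/2, so sup{|fn(x) − f(x)| : x ∈ [0, 1]} ≥ 1/2 for every n, even though the error at any fixed x < 1 vanishes.

```python
def f_n(x, n):
    """f_n(x) = x**n on [0, 1]."""
    return x ** n

def f_limit(x):
    """Pointwise limit: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def witness(n):
    """A point x_n in (0, 1) with f_n(x_n) = 1/2 up to rounding: x_n = 2**(-1/n)."""
    return 0.5 ** (1.0 / n)

ns = (1, 10, 100, 1000)

# At a fixed point the error vanishes as n grows (pointwise convergence) ...
errors_at_fixed_x = [abs(f_n(0.9, n) - f_limit(0.9)) for n in ns]
# ... but at the moving witness x_n the error stays near 1/2, so the sup cannot tend to 0.
errors_at_witness = [abs(f_n(witness(n), n) - f_limit(witness(n))) for n in ns]
```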

6 Acknowledgements

These notes greatly benefited from the notes of Kim Border at http://www.hss.caltech.edu/~kcb/Notes.shtml and from Appendix A of Daron Acemoglu’s “Introduction to Modern Economic Growth”. I have also consulted James Munkres’ “Topology”.


7 References

Simon, Blume: Ch. 12, 29

Acemoglu: Appendix A.1–5

Abbott, Stephen. Understanding Analysis. Springer-Verlag, New York. 2001.

Simon, Carl P., Lawrence Blume. Mathematics for Economists. Norton, New York. 1994.

Solow, Daniel. How to Read and Do Proofs. 3.ed. Wiley, New York. 2002.


Part II

Static Optimization

We will study here techniques that fall under the category of (static) nonlinear program-

ming. While these techniques still apply for the subdomain of linear programming, there

exist stronger results for that domain that we will not explore. Good references for linear

programming include Dantzig and Intriligator.

8 Statement of the Problem

The general optimization problem (for our purposes) consists of an objective function, as-

sumed to be real-valued, together with a set of inequality constraints and equality constraints.

Given θ ∈ Θ, the problem is to find x ∈ X that solves

max f(x, θ) (8.1)

subject to the inequality constraints

gj(x, θ) ≤ 0, 1 ≤ j ≤ J (8.2)

and the equality constraints

hk(x, θ) = 0, 1 ≤ k ≤ K (8.3)

If we define the constraint set as

C(θ) ≡ X ∩ (⋂_{1≤j≤J} {x : gj(x, θ) ≤ 0}) ∩ (⋂_{1≤k≤K} {x : hk(x, θ) = 0}) (8.4)

then the problem can be more compactly written as

max_{x∈C(θ)} f(x, θ). (8.5)

A solution to problem (8.5) is a global maximizer.

Definition 8.1. A point x∗ ∈ C(θ) is a global maximizer (or just maximizer) for the

maximization problem (8.5) if for all x ∈ C(θ), f(x∗, θ) ≥ f(x, θ). It is a strict global

maximizer if for all x ∈ C(θ), x ≠ x∗, f(x∗, θ) > f(x, θ). The definition of (strict) global

minimizer has the inequality reversed.


While not necessarily a solution to the maximization problem, local maximizers are in-

teresting candidates for solutions since global maximizers are necessarily local maximizers.

Definition 8.2. A point x∗ ∈ C(θ) is a local maximizer for the maximization problem (8.5) if there exists an open⁶ neighborhood U ⊂ C(θ) of x∗ such that for all x ∈ U, f(x∗, θ) ≥ f(x, θ). It is a strict local maximizer if for all x ∈ U, x ≠ x∗, f(x∗, θ) > f(x, θ). The definition of (strict) local minimizer has the inequality reversed.

Suppose x∗ ∈ C(θ) is a solution of problem (8.5) (where the notation of the dependence of x∗ on θ has been suppressed). Then the value function can be defined as follows: V(θ) ≡ max_{x∈C(θ)} f(x, θ) = f(x∗, θ).
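To make the objects C(θ), x∗ and V(θ) concrete, here is a minimal brute-force sketch. Everything specific in it is a hypothetical choice for illustration: the objective f(x, θ) = −(x − θ)², the two inequality constraints describing 0 ≤ x ≤ 1, the absence of equality constraints, and the finite grid standing in for the choice space X.

```python
def f(x, theta):
    """Hypothetical objective: penalize squared distance from the parameter theta."""
    return -(x - theta) ** 2

# Hypothetical inequality constraints g_j(x, theta) <= 0, describing 0 <= x <= 1.
inequality_constraints = [
    lambda x, theta: -x,
    lambda x, theta: x - 1.0,
]

# A finite grid standing in for the choice space X (an approximation of the real line).
X_GRID = [k / 1000.0 for k in range(-2000, 3001)]

def constraint_set(theta):
    """C(theta): grid points satisfying every inequality constraint."""
    return [x for x in X_GRID
            if all(g(x, theta) <= 0 for g in inequality_constraints)]

def solve(theta):
    """Return (x_star, V(theta)) by brute-force search over C(theta)."""
    x_star = max(constraint_set(theta), key=lambda x: f(x, theta))
    return x_star, f(x_star, theta)
```

For instance, `solve(0.5)` picks the interior point x∗ = 0.5 with V = 0, while `solve(2.0)` is pushed to the boundary x∗ = 1 with V = −1, which shows how the maximizer and the value function move with θ.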

Consider a statement of the following sort: “If x∗ ∈ C(θ) solves (8.5), then condition

A”, where A is a mathematical statement. This type of statement describes a necessary

condition for a maximizer. Suppose instead we have a statement of the following sort: “If

condition A, then x∗ ∈ C(θ) solves (8.5)”. This type of statement describes a sufficient

condition for a maximizer. A necessary condition furnishes a set of potential solutions and

guarantees that any solution is a member of this set. A sufficient condition furnishes a set

of guaranteed solutions but potentially excludes some solutions. Sufficient conditions can be

viewed as existence theorems.

9 Existence of Optima

Theorem 9.1 (Finite Constraint Set). If the constraint set C(θ) is nonempty and finite,

then the objective function f has both a maximizer and a minimizer in the constraint set.

Theorem 9.2 (Weierstrass Extreme Value Theorem). If the constraint set C(θ) is nonempty

and compact and f is continuous, then f has both a maximizer and a minimizer in the

constraint set.

Proof. Since continuous functions map compact sets to compact sets (see Theorem 5.9),

V ≡ f(C(θ)) ⊂ R is a compact set. By the Heine-Borel Theorem, V is closed and bounded.

But any closed and bounded subset of R contains its least upper bound, and thus has a

maximal value. Then, there exists some x ∈ C(θ) that maps to the maximal member of the

set V , and so x is a maximizer. The proof for the case of the existence of a minimizer is

analogous.

⁶ The relevant topology for the problem is the relative topology of C(θ) derived from that of X, both of which can be generated from the metric of the space X.


Corollary 9.3. If X ≡ R^N and the constraint set C(θ) is nonempty, closed and bounded,

and if f is continuous, then f has both a maximizer and a minimizer.

Example 9.1. The following are examples illustrating the role of the assumptions of the

Weierstrass Theorem.

1. Suppose C(θ) = R, which is nonempty but not compact (since it is not bounded).

Then the continuous function f(x) = x has no maximizer. Also if C(θ) = (0, 1) which

is bounded but not closed and so not compact, then f has no maximizer. But if

C(θ) = (0, 1], which is also nonempty and not compact, then our previously defined

continuous function has a maximizer.

2. Suppose C(θ) = [−1, 1], which is nonempty and compact. Then the discontinuous function f defined by f(x) = 1 − |x| for x ≠ 0 and f(0) = 0 has no maximizer: its supremum on C(θ) is 1, which is approached as x → 0 but never attained.

3. Suppose we have a discontinuous function f on R such that f(x) = 1 when x is rational,

and f(x) = 0 when x is irrational. Then f has a maximizer when C(θ) ≡ R, which is

a noncompact set.
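The first two items can be illustrated numerically. The sketch below uses hypothetical helper names and shows values climbing toward the supremum without reaching it: along x_k = 1 − 10^(−k) in (0, 1) for f(x) = x, and along x_k = 10^(−k) in [−1, 1] for the discontinuous function of item 2.

```python
def f_identity(x):
    """Item 1: the continuous function f(x) = x."""
    return x

def f_tent_with_hole(x):
    """Item 2: 1 - |x| away from 0, but 0 at x = 0 (discontinuous at 0)."""
    return 0.0 if x == 0 else 1.0 - abs(x)

# Item 1 on C = (0, 1): values approach the supremum 1 along x_k = 1 - 10**(-k),
# but every x_k lies strictly inside the interval, so 1 is never attained.
approach_open = [f_identity(1.0 - 10.0 ** (-k)) for k in range(1, 8)]

# Item 2 on C = [-1, 1]: values approach 1 along x_k = 10**(-k), yet f(0) = 0.
approach_hole = [f_tent_with_hole(10.0 ** (-k)) for k in range(1, 8)]
value_at_zero = f_tent_with_hole(0.0)
```

In item 1 the constraint set is missing the point where the supremum would be attained; in item 2 the set contains that point, but the function drops there.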

The condition of continuity of the objective function can be weakened to yield a gener-

alization of the Weierstrass Theorem.

Definition 9.1 (Level Sets). Let f : X → R be a function on some space X.

The level set of f at α (also termed the contour or isoquant) is the set I(α; f) ≡ {x ∈ X : f(x) = α}.

The upper level set of f at α (also termed the upper contour) is the set U(α; f) ≡ {x ∈ X : f(x) ≥ α}.

The lower level set of f at α (also termed the lower contour) is the set L(α; f) ≡ {x ∈ X : f(x) ≤ α}.

When clear from context, I will denote upper and lower sets of a function without refer-

ence to the function.

Definition 9.2 (Semicontinuity). Let (X, ρ) be a metric space. A function f : X → R is

upper semicontinuous if for all α ∈ R, the upper level set U(α) is closed. It is lower

semicontinuous if for all α ∈ R, the lower level set L(α) is closed. A function f is

continuous if and only if it is both upper and lower semicontinuous.


Theorem 9.4 (Generalized Weierstrass Theorem). Suppose the constraint set C(θ) is nonempty and compact. If f is upper semicontinuous, then it has a maximizer in the constraint set. If f is lower semicontinuous, then it has a minimizer in the constraint set.
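A short worked example may help separate the two halves of the theorem (the function is chosen here for illustration and is not from the notes). On the compact set [0, 1], define f as below. Every upper level set is closed, so f is upper semicontinuous and attains its maximum value 1 (at x = 0 and x = 1). The lower level sets L(α) for 0 < α < 1 are not closed, so f is not lower semicontinuous, and indeed its infimum 0 is not attained.

```latex
\[
f(x) = \begin{cases} x & \text{if } 0 < x \le 1, \\ 1 & \text{if } x = 0, \end{cases}
\qquad
U(\alpha) = \begin{cases}
[0,1] & \text{if } \alpha \le 0, \\
\{0\} \cup [\alpha, 1] & \text{if } 0 < \alpha \le 1, \\
\emptyset & \text{if } \alpha > 1,
\end{cases}
\qquad
L(\alpha) = (0, \alpha] \ \text{ for } 0 < \alpha < 1 .
\]
```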

10 Convex Sets and Functions on Convex Sets

Before we dig deeper into necessary or sufficient conditions for optimizers, we will define

and understand the properties of four special classes of functions, quasiconcave, concave,

quasiconvex, and convex functions.

Suppose V is a vector space, for example R^N.

Definition 10.1 (Convex Set). A set S ⊂ V is convex if for all x, y ∈ S, λx + (1 − λ)y ∈ S for all λ ∈ (0, 1). A set is strictly convex if for all distinct x, y ∈ S, λx + (1 − λ)y ∈ int S for all λ ∈ (0, 1).

The empty set is assumed to be convex.

Proposition 10.1. The intersection of an arbitrary family of convex sets is convex.

Definition 10.2 (Convex Hull). The convex hull of a set S ⊂ V, denoted cvx S, is the smallest

(under the set inclusion order) convex set that contains S. Equivalently, it is the intersection

of all convex sets that contain S.

Definition 10.3 (Concavity and Convexity). Let f be a real-valued function on a convex

subset S of a vector space V .

The function f is concave if for all distinct x, y ∈ S, f(λx+ (1− λ)y) ≥ λf(x) + (1−λ)f(y) for all λ ∈ (0, 1).

The function f is strictly concave if for all distinct x, y ∈ S, f(λx + (1 − λ)y) >

λf(x) + (1− λ)f(y) for all λ ∈ (0, 1).

The function f is convex if for all distinct x, y ∈ S, f(λx+(1−λ)y) ≤ λf(x)+(1−λ)f(y)

for all λ ∈ (0, 1).

The function f is strictly convex if for all distinct x, y ∈ S, f(λx + (1 − λ)y) <

λf(x) + (1− λ)f(y) for all λ ∈ (0, 1).

Definition 10.4 (Quasiconcavity and Quasiconvexity). Let f be a real-valued function on

a convex subset S of a vector space V .

The function f is quasiconcave if for all distinct x, y ∈ S, f(λx + (1 − λ)y) ≥ min{f(x), f(y)} for all λ ∈ (0, 1).


The function f is strictly quasiconcave if for all distinct x, y ∈ S, f(λx+ (1−λ)y) >

min{f(x), f(y)} for all λ ∈ (0, 1).

The function f is quasiconvex if for all distinct x, y ∈ S, f(λx + (1 − λ)y) ≤ max{f(x), f(y)} for all λ ∈ (0, 1).

The function f is strictly quasiconvex if for all distinct x, y ∈ S, f(λx+ (1− λ)y) <

max{f(x), f(y)} for all λ ∈ (0, 1).

Concavity and convexity could also be defined in terms of hypographs and epigraphs.

Definition 10.5 (Graph, Hypograph, Epigraph). The graph of a function f : X → R,

where X is a convex set, is the set G(f) ≡ {(x, α) : f(x) = α} ⊂ X ×R. The hypograph is

H(f) ≡ {(x, α) : f(x) ≥ α} ⊂ X×R. The epigraph is E(f) ≡ {(x, α) : f(x) ≤ α} ⊂ X×R.

Proposition 10.2. Suppose we have a function f : X → R, where X is a convex set.

The function f is concave if its hypograph H(f) is convex, and is strictly concave if its

hypograph is strictly convex.

The function is convex if its epigraph E(f) is convex, and is strictly convex if its epigraph

is strictly convex.

Quasiconcavity and quasiconvexity could also be defined in terms of level sets⁷.

Proposition 10.3. A function f is quasiconcave if and only if for all α ∈ R, U(α) is convex.

A function f is strictly quasiconcave if and only if for all α ∈ R, U(α) is strictly convex.

A function f is quasiconvex if and only if for all α ∈ R, L(α) is convex.

A function f is strictly quasiconvex if and only if for all α ∈ R, L(α) is strictly convex.

Notice that the definitions of these properties do not require continuity or differentiability.

In fact, the weak versions of these properties do not require a topology on the space. The

strict version of these properties (for example, strict quasiconcavity) does require the vector

space to have a norm, however, because our definition of a strictly convex set makes reference

to the interior of the set, which is a topological concept. If we strengthen the assumptions to

include differentiability (of varying degrees), we can obtain alternative conditions that are

necessary or sufficient for these properties. We shall see this below.

It is straightforward to show that every concave function is quasiconcave and every convex

function is quasiconvex. The converse is not true. For example, any monotonic function on R is both quasiconcave and quasiconvex, but only affine functions are both concave and convex.

Positive monotonic transformations of a concave (convex) function do not preserve concavity

(convexity) necessarily, but they do preserve quasiconcavity (quasiconvexity).

⁷ The definition in terms of level sets is the one put forth by Arrow, Enthoven (Econometrica 1961).


Proposition 10.4. Let f : S → R be a quasiconcave (quasiconvex) function. Then, for any

nondecreasing function g : R→ R, g ◦ f is quasiconcave (quasiconvex).

Example 10.1. Suppose f : R+ → R is defined by f(x) = √x. This function is strictly concave. Now, suppose g : R+ → R is defined by g(x) = x^4, which is a nondecreasing function on R+, a set that contains the range of f. Notice that g ◦ f(x) = x^2, which is a strictly convex function. Thus, concavity is not preserved. However, both f and g ◦ f are quasiconcave.
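The claims in Example 10.1 can be spot-checked by testing the defining inequalities on a finite sample. This is a sketch under stated assumptions: the helper names, the sample of points in [0, 10], the weights λ, and the rounding tolerance are all illustrative choices, and passing on a sample is evidence, not a proof.

```python
import itertools
import math

def is_concave_on(f, points, lambdas):
    """Check f(l*x + (1-l)*y) >= l*f(x) + (1-l)*f(y) on the sampled points only."""
    return all(
        f(l * x + (1 - l) * y) >= l * f(x) + (1 - l) * f(y) - 1e-12
        for x, y in itertools.combinations(points, 2) for l in lambdas
    )

def is_quasiconcave_on(f, points, lambdas):
    """Check f(l*x + (1-l)*y) >= min(f(x), f(y)) on the sampled points only."""
    return all(
        f(l * x + (1 - l) * y) >= min(f(x), f(y)) - 1e-12
        for x, y in itertools.combinations(points, 2) for l in lambdas
    )

f = math.sqrt                      # strictly concave on R+
g_of_f = lambda x: f(x) ** 4       # equals x**2 on R+, strictly convex

points = [0.25 * k for k in range(0, 41)]   # sample of R+ in [0, 10]
lambdas = [0.1, 0.25, 0.5, 0.75, 0.9]

f_concave = is_concave_on(f, points, lambdas)
gf_concave = is_concave_on(g_of_f, points, lambdas)
f_quasi = is_quasiconcave_on(f, points, lambdas)
gf_quasi = is_quasiconcave_on(g_of_f, points, lambdas)
```

On this sample the concavity test passes for f and fails for g ◦ f, while the quasiconcavity test passes for both, matching the example.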

A natural question to ask is whether every quasiconcave function is just a non-negative

monotonic transformation of some concave function. The answer is no; the following is an

example from Arrow, Enthoven (Econometrica 1961).

Example 10.2. Suppose f(x, y) = (x − 1) + ((x− 1)2 + 4(x+ y))12 . The level sets of f

are nonparallel straight lines (a Grapher file of the function and level sets is available here:

https://www2.bc.edu/samson-alva/ec720f11/arrowQCexample.gcx).

Another example is originally from Aumann (Econometrica 1975). Suppose $f(x, y) = y + \sqrt{x + y^2}$. Again, the level sets of f are nonparallel straight lines (a Grapher file of the function and level sets is available here: https://www2.bc.edu/samson-alva/ec720f11/aumannQCexample.gcx). Notice that f is strictly concave when restricted to either the first or the second dimension, but linearity of the level sets implies that it is only weakly quasiconcave.
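The claim that the level sets are nonparallel straight lines can be verified by hand. The following sketch assumes the two formulas are read as $f(x, y) = (x - 1) + \sqrt{(x - 1)^2 + 4(x + y)}$ and $f(x, y) = y + \sqrt{x + y^2}$, and fixes a level c in each case. For the Arrow and Enthoven example, isolate the square root and square both sides (valid where $c - (x - 1) \ge 0$):

\[
(x - 1)^2 + 4(x + y) = \big(c - (x - 1)\big)^2
\;\Longrightarrow\;
(4 + 2c)\,x + 4y = c^2 + 2c .
\]

For the Aumann example (valid where $c - y \ge 0$):

\[
x + y^2 = (c - y)^2
\;\Longrightarrow\;
x + 2c\,y = c^2 .
\]

In both cases the level set is a segment of a straight line whose slope depends on c, so level sets for different values of c are not parallel.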

Philip Reny (2010) proves that a continuous quasiconcave function cannot be transformed

by a strictly increasing function into a concave function unless it has parallel level sets (his

result is actually even stronger than this). The two examples above are such continuous

quasiconcave functions.

Afriat’s Theorem states that for any finite set of choices satisfying the Generalized Axiom

of Revealed Preference there exists a continuous strictly increasing concave utility function

that would generate those choices.

For more details on concavifiability of quasiconcave functions, see the extensive discussion

in Connell, Rasmusen (2011).

Proposition 10.5. Here are some useful results about quasiconcave and concave functions:

1. If f is strictly concave, and h is strictly increasing, then h ◦ f is strictly quasiconcave.

2. If f is strictly quasiconcave and h is strictly increasing, then h ◦ f is strictly quasicon-

cave.

3. If f is strictly quasiconcave and h is nondecreasing, then h ◦ f is weakly quasiconcave.

4. If f is weakly but not strictly quasiconcave and h is nondecreasing, then h◦ f is weakly

quasiconcave.

5. If f is weakly but not strictly quasiconcave and h is strictly increasing, then h ◦ f is

NOT necessarily strictly quasiconcave.

Proposition 10.6. Here are some useful results about quasiconvex and convex functions:

1. If $f_i$ is quasiconvex and $w_i \ge 0$ for each i, then $f \equiv \max_i \{w_i f_i\}$ is quasiconvex.

2. If $f_i$ is convex for each i, then $\max_i \{f_i\}$ is convex.

3. If f, g are convex, and g is nondecreasing, then g(f) is convex.

4. If f, g are concave, and g is nondecreasing, then g(f) is concave.

Now, let’s make some assumptions about differentiability.

Theorem 10.7. Let $X \subset \mathbb{R}^N$, and suppose f : X → R, $f \in C^1$.

1. f is concave if and only if, for all x, y ∈ X, f(y) − f(x) ≤ Df(x)(y − x).

2. f is strictly concave if and only if, for all x, y ∈ X, y ≠ x, f(y) − f(x) < Df(x)(y − x).

3. f is convex if and only if, for all x, y ∈ X, f(y) − f(x) ≥ Df(x)(y − x).

4. f is strictly convex if and only if, for all x, y ∈ X, y ≠ x, f(y) − f(x) > Df(x)(y − x).

5. f is quasiconcave if and only if, for all x, y ∈ X, f(y) ≥ f(x) implies Df(x)(y − x) ≥ 0.

6. If, for all x, y ∈ X, y ≠ x, f(y) ≥ f(x) implies Df(x)(y − x) > 0, then f is strictly quasiconcave. The converse is not true, as discussed below.

7. f is quasiconvex if and only if, for all x, y ∈ X, f(y) ≤ f(x) implies Df(x)(y − x) ≤ 0.

8. If, for all x, y ∈ X, y ≠ x, f(y) ≤ f(x) implies Df(x)(y − x) < 0, then f is strictly quasiconvex. The converse is not true, as discussed below.
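The first-order characterizations in Theorem 10.7 are easy to probe numerically. The sketch below is an illustration only: the two test functions (log utility and the Cobb-Douglas product x1·x2 on the positive orthant), the grid and the helper names are assumptions of this example, and a finite grid can refute but never prove the inequalities.

```python
import math
from itertools import product

# Hypothetical grid of points in the strictly positive orthant of R^2.
grid = [0.5 * i for i in range(1, 7)]          # 0.5, 1.0, ..., 3.0
pts = list(product(grid, grid))
TOL = 1e-12

def log_utility(x):
    return math.log(x[0]) + math.log(x[1])

def d_log_utility(x):
    return (1.0 / x[0], 1.0 / x[1])

def cobb_douglas(x):
    return x[0] * x[1]

def d_cobb_douglas(x):
    return (x[1], x[0])

def directional(grad, x, y):
    """Df(x)(y - x) for a gradient given as a tuple."""
    return sum(g_i * (y_i - x_i) for g_i, x_i, y_i in zip(grad, x, y))

def concavity_inequality_holds(f, df):
    # Item 1: f(y) - f(x) <= Df(x)(y - x) for all x, y.
    return all(f(y) - f(x) <= directional(df(x), x, y) + TOL
               for x in pts for y in pts)

def quasiconcavity_implication_holds(f, df):
    # Item 5: f(y) >= f(x) implies Df(x)(y - x) >= 0.
    return all(directional(df(x), x, y) >= -TOL
               for x in pts for y in pts if f(y) >= f(x))

log_is_concave = concavity_inequality_holds(log_utility, d_log_utility)
cd_is_concave = concavity_inequality_holds(cobb_douglas, d_cobb_douglas)
cd_is_quasiconcave = quasiconcavity_implication_holds(cobb_douglas, d_cobb_douglas)
print(log_is_concave, cd_is_concave, cd_is_quasiconcave)
```

The product x1·x2 fails the concavity inequality (take x = (1, 1) and y = (2, 2)) but passes the quasiconcavity implication, which illustrates that item 5 is strictly weaker than item 1.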

Theorem 10.8. Let $X \subset \mathbb{R}^N$, and suppose f : X → R, $f \in C^2$.

1. f is concave if and only if for all x, $D^2 f(x)$ is negative semidefinite.

2. If, for all x, $D^2 f(x)$ is negative definite, then f is strictly concave.

3. f is convex if and only if for all x, $D^2 f(x)$ is positive semidefinite.

4. If, for all x, $D^2 f(x)$ is positive definite, then f is strictly convex.

5. f is quasiconcave if and only if for all x, $D^2 f(x)$ is negative semidefinite on the nullspace⁸ of Df(x).

6. If, for all x, $D^2 f(x)$ is negative definite on the nullspace of Df(x), then f is strictly quasiconcave.

7. f is quasiconvex if and only if for all x, $D^2 f(x)$ is positive semidefinite on the nullspace of Df(x).

8. If, for all x, $D^2 f(x)$ is positive definite on the nullspace of Df(x), then f is strictly quasiconvex.

There is a characterization of (semi)definite matrices involving determinants.

Definition 10.6 (Principal Minors). Let A be a real-valued, symmetric N×N matrix. Then

a principal minor of order m of the matrix A is a submatrix of A where all but m rows

and corresponding (by index) columns are deleted. There are $\frac{N!}{m!\,(N-m)!}$ principal minors of order m.

The leading principal minor of order m is the principal minor of order m with the

last N −m rows and columns deleted.

The following theorems characterize (semi)definiteness of a symmetric matrix.

Theorem 10.9 (Characterization of Definiteness). Suppose A is a real-valued, symmetric

N ×N matrix.

1. A is negative definite if and only if the determinant of the leading principal minor of order m is nonzero and has the sign $(-1)^m$, for all 1 ≤ m ≤ N.

2. A is positive definite if and only if the determinant of the leading principal minor of order m is strictly positive, for all 1 ≤ m ≤ N.

3. A is negative semidefinite if and only if the determinant of every principal minor of order m is zero or has the sign $(-1)^m$, for all 1 ≤ m ≤ N, i.e. odd-ordered principal minors have nonpositive determinants and even-ordered principal minors have nonnegative determinants.

4. A is positive semidefinite if and only if the determinant of every principal minor of order m is nonnegative, for all 1 ≤ m ≤ N.
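The tests in Theorem 10.9 translate directly into a few lines of code. The following is a minimal sketch using NumPy (the function names and the tolerance are assumptions of this example, not part of the notes). It also shows why the semidefinite tests need every principal minor: a matrix whose leading principal minors are all zero can still fail to be positive semidefinite.

```python
from itertools import combinations
import numpy as np

def leading_principal_minors(A):
    """Determinants of the top-left m x m blocks, m = 1, ..., N."""
    n = A.shape[0]
    return [np.linalg.det(A[:m, :m]) for m in range(1, n + 1)]

def principal_minors(A):
    """Map each order m to the determinants of all principal submatrices of order m."""
    n = A.shape[0]
    return {m: [np.linalg.det(A[np.ix_(idx, idx)])
                for idx in combinations(range(n), m)]
            for m in range(1, n + 1)}

def is_negative_definite(A, tol=1e-10):
    # Leading principal minor of order m must have the sign (-1)^m.
    return all((-1) ** m * d > tol
               for m, d in enumerate(leading_principal_minors(A), start=1))

def is_positive_semidefinite(A, tol=1e-10):
    # Every principal minor must be nonnegative.
    return all(d >= -tol for dets in principal_minors(A).values() for d in dets)

def is_negative_semidefinite(A, tol=1e-10):
    # Every principal minor of order m must be zero or have the sign (-1)^m.
    return all((-1) ** m * d >= -tol
               for m, dets in principal_minors(A).items() for d in dets)

A_nd = np.array([[-2.0, 1.0], [1.0, -2.0]])
A_tricky = np.array([[0.0, 0.0], [0.0, -1.0]])

nd_result = is_negative_definite(A_nd)
# Looking only at leading minors would wrongly suggest A_tricky is positive semidefinite.
leading_only_says_psd = all(d >= -1e-10 for d in leading_principal_minors(A_tricky))
tricky_is_psd = is_positive_semidefinite(A_tricky)
tricky_is_nsd = is_negative_semidefinite(A_tricky)
print(nd_result, leading_only_says_psd, tricky_is_psd, tricky_is_nsd)
```

For large matrices the number of principal minors grows quickly, so in practice one would more likely inspect eigenvalues; the determinant tests are mainly useful for small symbolic problems.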

⁸ The nullspace of a vector is the set of all vectors that are orthogonal to it. The nullspace of a matrix is the set of all vectors that the matrix maps to the zero vector.

Checking definiteness of a matrix on a subspace requires using the Bordered Matrix test, where the matrix in question is bordered on the top and on the left by the constraints.

Definition 10.7 (Bordered Matrix). Let A be a real-valued symmetric N × N matrix and $b_k \in \mathbb{R}^N$ for $k \in \{1, \ldots, K\}$, a set of linearly independent vectors. Let B be the N × K matrix $(b_1 \ \cdots \ b_K)$, and denote by B′ the transpose of B. Then, the bordered matrix of A with respect to B is

\[ H \equiv \begin{pmatrix} 0 & B' \\ B & A \end{pmatrix}. \]

Definition 10.8 (Border-Respecting Principal Minors). Let A be a real-valued, symmetric

N×N matrix, and B be a real-valued N×K matrix of full rank, and denote by H the bordered

matrix of A with respect to B. Then a border-respecting principal minor of order m

of the bordered matrix H is a submatrix of H where all but m rows and corresponding (by

index) columns are deleted, with the restriction that the index of a deleted row (and column)

be greater than K.

The leading border-respecting principal minor of order m is the principal minor

of order m with the last N +K −m rows and columns deleted.

Theorem 10.10 (Characterization of Definiteness on a Linear Constraint Set). Suppose A is a real-valued symmetric N × N matrix and $b_k \in \mathbb{R}^N$ for $k \in \{1, \ldots, K\}$, a set of linearly independent vectors. Let B be the N × K matrix $(b_1 \ \cdots \ b_K)$, and denote by B′ the transpose of B. Define the bordered matrix

\[ H \equiv \begin{pmatrix} 0 & B' \\ B & A \end{pmatrix}. \]

1. A is negative definite on the subspace {v : B′v = 0} if and only if for each $m \in \{2K + 1, \ldots, N + K\}$, the determinant of the leading border-respecting principal minor of order m of matrix H is nonzero and has the sign $(-1)^{m-K}$, i.e. the determinant of H has the sign $(-1)^N$ and the determinants of the last (largest) N − K leading border-respecting principal minors alternate in sign.

2. A is positive definite on the subspace {v : B′v = 0} if and only if for each $m \in \{2K + 1, \ldots, N + K\}$, the determinant of the leading border-respecting principal minor of order m of matrix H is nonzero and has the sign $(-1)^K$.
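The bordered test can be cross-checked against a direct computation on the constraint subspace. The sketch below uses NumPy; the function names, the tolerance and the 2 × 2 example are assumptions made for illustration. It builds H, checks the signs of the leading border-respecting minors, and compares the verdict with the eigenvalues of A restricted to {v : B′v = 0}.

```python
import numpy as np

def bordered_matrix(A, B):
    """Build H = [[0, B'], [B, A]] for a symmetric N x N A and an N x K B."""
    K = B.shape[1]
    return np.block([[np.zeros((K, K)), B.T], [B, A]])

def is_negative_definite_on_constraint(A, B, tol=1e-10):
    N, K = B.shape
    H = bordered_matrix(A, B)
    # Leading minors of order m = 2K+1, ..., N+K must have the sign (-1)^(m-K).
    return all((-1) ** (m - K) * np.linalg.det(H[:m, :m]) > tol
               for m in range(2 * K + 1, N + K + 1))

def is_positive_definite_on_constraint(A, B, tol=1e-10):
    N, K = B.shape
    H = bordered_matrix(A, B)
    # Leading minors of order m = 2K+1, ..., N+K must have the sign (-1)^K.
    return all((-1) ** K * np.linalg.det(H[:m, :m]) > tol
               for m in range(2 * K + 1, N + K + 1))

def reduced_eigenvalues(A, B):
    """Eigenvalues of A restricted to {v : B'v = 0}, via an SVD basis of the nullspace."""
    N, K = B.shape
    _, _, Vt = np.linalg.svd(B.T)   # B.T is K x N; rows K..N-1 of Vt span its nullspace
    Z = Vt[K:].T
    return np.linalg.eigvalsh(Z.T @ A @ Z)

# A is indefinite on R^2 but negative definite on {v : v1 + v2 = 0}.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[1.0], [1.0]])

nd_by_minors = is_negative_definite_on_constraint(A, B)
nd_by_eigen = bool(np.all(reduced_eigenvalues(A, B) < 0))
pd_of_negated = is_positive_definite_on_constraint(-A, B)
print(nd_by_minors, nd_by_eigen, pd_of_negated)
```

Here N = 2 and K = 1, so the only order to check is m = 3, the determinant of H itself, which equals 2 and has the required sign $(-1)^{m-K} = +1$.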

The characterization of semidefiniteness on a linear constraint set involves testing every

border-respecting principal minor of every order m ∈ {2K + 1, . . . , N + K}, and not just

the border-respecting leading principal minors, analogous to the characterization of semidef-

initeness of an unconstrained symmetric matrix.

Theorem 10.11 (Characterization of Semidefiniteness on a Linear Constraint Set). Suppose we have A, $b_k$, and B as in Theorem 10.10.

1. A is negative semidefinite on the subspace {v : B′v = 0} if and only if for each $m \in \{2K + 1, \ldots, N + K\}$, the determinant of every border-respecting principal minor of order m alternates in sign or is equal to zero, with the sign of the determinant of H being $(-1)^N$ or equal to zero, i.e. every border-respecting principal minor of order m has a nonpositive determinant if m − K is odd and has a nonnegative determinant if m − K is even.

2. A is positive semidefinite on the subspace {v : B′v = 0} if and only if for each $m \in \{2K + 1, \ldots, N + K\}$, the determinant of every border-respecting principal minor of order m is nonnegative if K is even and is nonpositive if K is odd.

Therefore, to test for, say, quasiconcavity of a twice continuously differentiable function

f in the neighborhood of a point x, we need to find the Hessian of f evaluated at x, which

is a real-valued symmetric matrix, and check whether this Hessian, when bordered by the

Jacobian of f evaluated at x, passes the test of negative semidefiniteness of a matrix on a

linear subspace described in Theorem 10.11.
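As a small worked illustration of this recipe (the function is chosen for this example and is not from the notes), take $f(x, y) = xy$ on the strictly positive orthant. Then $Df(x, y) = (y, x)$ and the Hessian bordered by the gradient is

\[
H = \begin{pmatrix} 0 & y & x \\ y & 0 & 1 \\ x & 1 & 0 \end{pmatrix},
\qquad
\det H = -y\,(0 - x) + x\,(y - 0) = 2xy > 0 .
\]

With N = 2 and K = 1 the only order to check is m = 3, and the required sign for negative definiteness on the nullspace of Df is $(-1)^{m-K} = (-1)^2 = +1$. The strict version of the test (Theorem 10.10) therefore passes at every point with x, y > 0, so by item 6 of Theorem 10.8 the function is strictly quasiconcave there, even though its Hessian is indefinite and f is not concave.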

10.1 References

Afriat. The Construction of Utility Functions from Expenditure Data. (International

Economic Review 1967)

Arrow, Enthoven. Quasiconcave Programming. (Econometrica 1961)

Aumann. (Econometrica 1975)

Connell, Rasmusen. Concavifying the Quasiconcave. (Working Paper 2011)

Reny. A Simple Proof of the Nonconcavifiability of Functions with Linear Not-All-Parallel

Contour Sets. (Working Paper 2010)

11 Unconstrained Optimization With a Differentiable

Objective Function

11.1 Overview

- FONC can be derived using a first-order Taylor expansion, which requires the objective

function to be continuously differentiable

- SONC can be derived using a second-order Taylor expansion, which requires the objec-

tive function to be twice continuously differentiable

- SOSC can be derived using a second-order Taylor expansion, which requires the objec-

tive function to be twice continuously differentiable

Also, see Section 12.4 below for more on second-order conditions for unconstrained problems.

11.2 References

See Kim Border's notes on the calculus of one variable: http://www.hss.caltech.edu/~kcb/Notes/Max1.pdf

12 Classical Programming: Optimization with Equal-

ity Constraints

Let us focus on optimization problems where the domain of the objective and constraint functions is an open subset X of $\mathbb{R}^N$, with only equality constraints, i.e. J = 0 in equation (8.2).

12.1 Overview

- Introduce auxiliary variables, called multipliers, for each equality constraint, thereby

converting a constrained optimization problem to an unconstrained optimization problem

with a larger set of choice variables

- Show that the necessary conditions for maxima of the Lagrangian problem yield neces-

sary conditions for maxima of the original problem

- Intuition based on the gradients of the objective function and the constraint function

- Interpretation of the multipliers. Nice article on Wikipedia: http://en.wikipedia.org/wiki/Lagrange_multiplier

- Explanation of the constraint qualification and the failure of the theory to find optima

when the CQ is violated

12.2 First Order Conditions

For illustrative purposes consider the problem with one equality constraint:

\[ \max_{x_1, x_2} f(x_1, x_2) \quad \text{subject to} \quad h(x_1, x_2) = c \]

Figure 1: Graphical Depiction of a Constrained Optimization Problem

A geometric visualization of this problem is given in Figure 1. Note from Figure 1 that at the point $x^* = (x_1^*, x_2^*)$ the level curve of f and the constraint are tangent to each other, i.e. both have a common slope. We will explore this observation in order to find a characterization of the solution for this class of problems and its generalization to multiple constraints.

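In order to find the derivative of the level curve at the optimum point x∗, recall from the implicit function theorem that for a function $G(y_1, y_2) = c$,

\[ \frac{dy_2}{dy_1}(y) = -\left(\frac{\partial G}{\partial y_2}(y)\right)^{-1} \frac{\partial G}{\partial y_1}(y), \]

where y is some point in the domain, if, on an open neighborhood of y, G is continuously differentiable and $\frac{\partial G}{\partial y_2}(y)$ is nonzero. In particular, according to Figure 1, $\frac{dx_2}{dx_1}(x)$ defined implicitly by f(x) ≡ f(x∗), and $\frac{dx_2}{dx_1}(x)$ defined implicitly by h(x) ≡ h(x∗), must be the same at x∗:

\[ \frac{dx_2}{dx_1}(x^*) = -\frac{\frac{\partial f}{\partial x_1}(x^*)}{\frac{\partial f}{\partial x_2}(x^*)} = -\frac{\frac{\partial h}{\partial x_1}(x^*)}{\frac{\partial h}{\partial x_2}(x^*)} = \frac{dx_2}{dx_1}(x^*), \]

which after some rearrangement yields

\[ \lambda^* \equiv \frac{\frac{\partial f}{\partial x_1}(x^*)}{\frac{\partial h}{\partial x_1}(x^*)} = \frac{\frac{\partial f}{\partial x_2}(x^*)}{\frac{\partial h}{\partial x_2}(x^*)}, \tag{12.1} \]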

where λ∗ ∈ R is the common value of these ratios at x∗. We are assuming that the ratios above do not have zero denominators, the assurance of which is the motivation for the constraint qualifications discussed later.

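Now, rewrite equation (12.1) as two equations

\[ \frac{\partial f}{\partial x_1}(x^*) - \lambda^* \frac{\partial h}{\partial x_1}(x^*) = 0 \tag{12.2} \]

and

\[ \frac{\partial f}{\partial x_2}(x^*) - \lambda^* \frac{\partial h}{\partial x_2}(x^*) = 0. \tag{12.3} \]

Together with the constraint equation

\[ h(x_1, x_2) = c, \tag{12.4} \]

we have a system of three equations (12.2), (12.3), and (12.4) with three unknowns: $(x_1^*, x_2^*, \lambda^*)$.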

This system is equivalent to the first-order conditions for stationary points of the following

function

L(x1, x2, λ) ≡ f(x1, x2)− λ(h(x1, x2)− c), (12.5)

which we call the Lagrangian; we also call the term λ the Lagrange multiplier. Thus, for the case with two choice variables and one constraint, a maximizer x∗ (subject to a qualification) satisfies $\frac{\partial L}{\partial x_1}(x_1^*, x_2^*, \lambda^*) = 0$, $\frac{\partial L}{\partial x_2}(x_1^*, x_2^*, \lambda^*) = 0$, and $\frac{\partial L}{\partial \lambda}(x_1^*, x_2^*, \lambda^*) = 0$, for some λ∗.
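To see the three stationarity conditions at work, consider a problem for which they happen to be linear. The sketch below is an illustration under assumed data (the quadratic objective $f(x) = -(x_1 - 1)^2 - (x_2 - 2)^2$, the constraint $x_1 + x_2 = c$ with c = 1, and the variable names are choices of this example). It solves the system for $(x_1, x_2, \lambda)$ with NumPy and checks the tangency ratios of equation (12.1).

```python
import numpy as np

c = 1.0  # right-hand side of the constraint h(x1, x2) = x1 + x2 = c

# Lagrangian: L = -(x1 - 1)^2 - (x2 - 2)^2 - lam * (x1 + x2 - c).
# dL/dx1 = 0  ->  -2*x1        - lam = -2
# dL/dx2 = 0  ->         -2*x2 - lam = -4
# dL/dlam = 0 ->   x1 +   x2         =  c
M = np.array([[-2.0, 0.0, -1.0],
              [0.0, -2.0, -1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([-2.0, -4.0, c])

x1, x2, lam = np.linalg.solve(M, rhs)

# Tangency check from (12.1): (df/dx1)/(dh/dx1) = (df/dx2)/(dh/dx2) = lam.
df_dx1 = -2.0 * (x1 - 1.0)
df_dx2 = -2.0 * (x2 - 2.0)
dh_dx1, dh_dx2 = 1.0, 1.0
ratio_1 = df_dx1 / dh_dx1
ratio_2 = df_dx2 / dh_dx2
print(x1, x2, lam, ratio_1, ratio_2)
```

For a general nonlinear f or h the same three equations would have to be solved with a nonlinear root finder or by hand; the linear case is used here only because it keeps the example self-contained.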

The Lagrange method transforms a constrained problem into an unconstrained problem via the formulation of the Lagrangian. The transformation introduces a Lagrange multiplier for every constraint. It is important to note that the transformation is valid only if at least one of $\frac{\partial h}{\partial x_1}(x^*)$ and $\frac{\partial h}{\partial x_2}(x^*)$ is nonzero. If not, then there is no way to define a multiplier, as should be clear from examining equation (12.1). This is called the non-degenerate constraint qualification. If the constraint is linear (and nonconstant), this qualification will automatically be satisfied.

We can mimic the steps above for an arbitrary problem with N choice variables and K constraints, where K < N. Suppose we have a solution to the general constrained maximization problem with only equality constraints; denote it $x^* \in \mathbb{R}^N$. Then it must be the case that h(x∗) = 0, where h is the K-dimensional vector function of constraints. Consider a linear approximation of this constraint function at x∗: the derivative (the Jacobian) of this linear approximation will be $D_x h(x^*)$, according to the Taylor Approximation Theorem for a first-order approximation, assuming h is continuously differentiable at x∗. This Jacobian matrix is a K × N matrix, where a generic term is $\frac{\partial h_k}{\partial x_n}$, and when each row is viewed as

a vector, the K vectors span a subspace of RN . Each row of the Jacobian is a vector (the

transpose of the gradient vector of the associated constraint) that has an associated (N−1)-

dimensional nullspace (as long as this vector is non-degenerate), which is the tangent plane

to the associated constraint at x∗, the linear approximation of the constraint function at x∗.

Then, the K rows define K such subspaces, and for a vector to satisfy all these constraints,

the vector must be in every one of these subspaces. If the row vectors of the Jacobian are

linearly independent, then the nullspace of the Jacobian is exactly the subspace of vectors

that are in the tangent plane of each constraint, a subspace of (N −K) dimensions.

Now, given the gradient vector of the objective function $D_x f(x^*)$, consider the subspace orthogonal to the gradient. This hyperplane is a linear approximation of the level set of the objective function at the point x∗, i.e. any local movement from x∗ within this hyperplane will not change the value of the objective function to first order. So, if x∗ is a maximum, it must be the case that any local move from x∗ that does not locally violate the constraints, i.e. movement within the nullspace of the full-rank Jacobian $D_x h(x^*)$, does not change the value of the objective, and thus we can conclude that the nullspace of $D_x h(x^*)$ must be contained in the nullspace of $D_x f(x^*)$. But this means that the vector $D_x f(x^*)$ is orthogonal to the nullspace of $D_x h(x^*)$, so it lies in the row space of this matrix and can be expressed as a linear combination of its rows. Thus, we can conclude that at a maximum, there exist K constants, $\lambda_k^*$, 1 ≤ k ≤ K, such that

\[ D_x f(x^*) = \sum_k \lambda_k^* D_x h_k(x^*). \]

Combined with the K constraints h(x∗) = 0, we have N + K equations that must be satisfied by the N + K unknowns, namely the maximizer x∗ and the multipliers λ∗, under the qualification that $D_x h(x^*)$ has full rank. Notice, too, that the argument above is exactly the same if x∗ is a minimum. Thus, these necessary conditions hold for both maxima and minima. The results are formally stated below.

Definition 12.1 (Nondegenerate Constraint Qualification). The functions hk, 1 ≤ k ≤ K

satisfy the Nondegenerate Constraint Qualification (NDCQ) at x∗ if the rank of the

Jacobian Dxh(x∗) is K i.e. the Jacobian has full rank.

Theorem 12.1 (First Order Necessary Conditions). Let f and h be continuously differ-

entiable functions. Suppose x∗ is a constrained maximum or minimum of f , where the

constraint set is defined by {x ∈ X : h(x) = 0}, and suppose that the constraints satisfy the

NDCQ at x∗. Define L(x, λ) ≡ f(x) − λh(x) to be the associated Lagrangian. Then there

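exists $\lambda^* \in \mathbb{R}^K$ such that:

\[ \frac{\partial L}{\partial x_n}(x^*, \lambda^*) = \frac{\partial f}{\partial x_n}(x^*) - \sum_k \lambda_k^* \frac{\partial h_k}{\partial x_n}(x^*) = 0 \]

and

\[ \frac{\partial L}{\partial \lambda_k}(x^*, \lambda^*) = -h_k(x^*) = 0 \]

for every n ∈ {1, . . . , N}, k ∈ {1, . . . , K}, which equivalently means that (x∗, λ∗) is a stationary point of the Lagrangian.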
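The statement that the gradient of f is a linear combination of the rows of the constraint Jacobian can be checked numerically at a known optimum. The sketch below is an assumed example with N = 3 and K = 2: maximize $x_1 + x_2 + x_3$ subject to $x_1^2 + x_2^2 + x_3^2 = 3$ and $x_1 = x_2$, whose maximizer is (1, 1, 1). The multipliers are recovered by least squares with NumPy, and the NDCQ is checked through the rank of the Jacobian.

```python
import numpy as np

x_star = np.array([1.0, 1.0, 1.0])  # known maximizer of the assumed example

def grad_f(x):
    # f(x) = x1 + x2 + x3
    return np.ones(3)

def jac_h(x):
    # h1(x) = x1^2 + x2^2 + x3^2 - 3,  h2(x) = x1 - x2
    return np.vstack([2.0 * x, np.array([1.0, -1.0, 0.0])])

J = jac_h(x_star)                 # K x N Jacobian of the constraints
g = grad_f(x_star)

# Solve J' lam = Df(x*)' in the least-squares sense; an exact fit means that
# Df(x*) lies in the row space of the Jacobian.
lam, _, rank, _ = np.linalg.lstsq(J.T, g, rcond=None)

ndcq_holds = (rank == J.shape[0])
stationarity_residual = np.linalg.norm(J.T @ lam - g)
print(lam, ndcq_holds, stationarity_residual)
```

The second multiplier comes out as zero because the constraint $x_1 = x_2$ does not bind in the economic sense: the maximizer on the sphere already satisfies it.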

12.3 Meaning of the Multiplier

Theorem 12.2 (Meaning of the Multiplier). Let f, $h_k$ be $C^1$ functions with domain in $\mathbb{R}^N$ and let $\theta_k \in \mathbb{R}$. Consider the maximization problem: max f(x) subject to $h_k(x) = \theta_k$, k ∈ {1, . . . , K}. Suppose x∗ is a constrained maximizer, and λ∗ the associated Lagrange multipliers, and suppose x∗ and λ∗ are $C^1$ functions of θ. Then,

\[ \frac{\partial f(x^*(\theta))}{\partial \theta_k} = \lambda_k^*(\theta). \]

From Theorem 12.2 we conclude that the Lagrange multiplier can be interpreted as the

change in the maximum achieved by the objective function if we slightly relax (tighten) the

corresponding constraint. Consider for example the case in which f is a utility function with

two arguments x1 and x2, with h(x1, x2) = I representing the budget constraint given an

income of I. The Lagrange multiplier associated with the budget constraint is equivalent to

the marginal utility of income.
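The marginal-utility-of-income interpretation can be checked on a textbook case. The sketch below assumes log utility $u(x) = \log x_1 + \log x_2$ and a linear budget $p_1 x_1 + p_2 x_2 = I$ (the functional form, prices and income are choices of this example), for which the demands are known in closed form. It compares the multiplier with a finite-difference derivative of the maximized utility with respect to income.

```python
import math

def demand(p1, p2, income):
    """Closed-form maximizer of log(x1) + log(x2) s.t. p1*x1 + p2*x2 = income."""
    return income / (2 * p1), income / (2 * p2)

def multiplier(p1, p2, income):
    # From the first-order condition 1/x1 = lam * p1.
    x1, _ = demand(p1, p2, income)
    return 1.0 / (p1 * x1)

def indirect_utility(p1, p2, income):
    """Maximized value of the objective as a function of the parameters."""
    x1, x2 = demand(p1, p2, income)
    return math.log(x1) + math.log(x2)

p1, p2, income = 2.0, 5.0, 100.0
eps = 1e-6

# Central finite difference of the value function with respect to income.
marginal_utility_of_income = (
    indirect_utility(p1, p2, income + eps) - indirect_utility(p1, p2, income - eps)
) / (2 * eps)
lam = multiplier(p1, p2, income)
print(lam, marginal_utility_of_income)
```

Both numbers equal 2/I = 0.02 up to finite-difference error, as Theorem 12.2 predicts.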

The principle behind this result is the envelope theorem, which states that only the direct

effect of an exogenous parameter on the objective function matters when studying the total

effect of a change on the optimal value. The change in the parameter also induces a change

in the endogenous choice variables, but optimality requires that the first-order effect of a

change in the endogenous variables will have no effect on the value of the objective.

12.4 Second Order Conditions

12.4.1 Unconstrained Optimization

As discussed in Section 10, a concave function f, when twice continuously differentiable, is characterized by a negative semidefinite Hessian, i.e. $D_x^2 f$ is

negative semidefinite at every point on the domain. There is no analogous characterization of a strictly concave function, but a negative definite Hessian on the domain of the function is a sufficient condition for strict concavity.

We can easily define local version of these concepts at a point x, by weakening the

requirement that the property of the Hessian holds on the whole domain to holding for some

open neighborhood of the point x. Then, if x is a local maximum i.e. x is a maximum of f

in an open neighborhood of x, then f(x) ≥ f(x′) for all x′ in the neighborhood, and with a

continuously differentiable function f , this yields Df(x)(x′ − x) ≥ f(x′)− f(x). But this is

exactly the condition for a continuously differentiable function to be concave, and so local

concavity of f is a necessary condition for a local maximum. If we also know f is twice

continuously differentiable, then we can conclude that the Hessian of f must be negative

semidefinite on the neighborhood, given that f must be locally concave.

If f is locally strictly concave on this open neighborhood, then x is a local strict maximum.⁹ Moreover, if f is twice continuously differentiable, then negative definiteness of $D_x^2 f$ on this neighborhood implies strict local concavity, and thus serves as a sufficient condition for a local maximum when combined with some necessary conditions for a maximum, such as the standard first-order conditions.

Definition 12.2 (Regular Maximum). For some twice continuously differentiable function f : X → R, we call a local maximizer x∗ regular if $D_x^2 f(x^*)$ is negative definite on an open neighborhood of x∗.

Notice that from the arguments above, a regular maximum is a strict local maximum.

However, the converse need not be true, as should be clear from the following example.

Example 12.1 (A strict local maximizer that is not regular). Suppose $f(x) = -x^4$. Then, f is twice continuously differentiable, with a strict (local) maximum at x = 0. However, $f''(x) = -12x^2$ evaluates to 0 at x = 0, and so f′′ is not negative definite at 0. However, f is strictly concave, and so we see that strict concavity does not imply the Hessian is negative definite, and as a consequence, a strict local maximum need not be regular.

12.4.2 Constrained Optimization

For constrained maximization problems, the intuition for second order conditions is sim-

ilar to the unconstrained case. However, the constraints imply that local (strict) concavity

⁹ It may seem that the converse should also be true, but notice that the function f(x) = −|x|, which has a strict maximum at x = 0, is only concave, and not strictly concave, even locally at 0.

at the optimum need only be tested on the tangent space of the constraint set at the optimum. Moreover, the relevant function is no longer the objective function, but the associated Lagrangian function, which is the function whose stationary points we actually compute.

Theorem 12.3 (Second Order Sufficient Conditions). Suppose the functions f : X → R and $h : X \to \mathbb{R}^K$ are twice continuously differentiable. Let L ≡ f − λh be the Lagrangian function. Suppose $x^* \in \mathbb{R}^N$, $\lambda^* \in \mathbb{R}^K$, such that x∗ and λ∗ satisfy the first order conditions of Theorem 12.1 and the constraint h(x∗) = 0, and x∗ satisfies the NDCQ.

If $D_x^2 L(x^*, \lambda^*)$ is negative definite on the subspace $\{v : D_x h(x^*) v = 0\}$, then x∗ is a strict local constrained maximum.

If $D_x^2 L(x^*, \lambda^*)$ is positive definite on the subspace $\{v : D_x h(x^*) v = 0\}$, then x∗ is a strict local constrained minimum.
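A small worked example may help fix ideas (the problem is chosen for illustration and is not from the notes): maximize $f(x, y) = xy$ subject to $h(x, y) = x + y - 2 = 0$. With $L = xy - \lambda (x + y - 2)$, the first order conditions $y = \lambda$, $x = \lambda$, $x + y = 2$ give $(x^*, y^*, \lambda^*) = (1, 1, 1)$, and $D_x h = (1, 1)$ has rank 1, so the NDCQ holds. The Hessian of the Lagrangian in the choice variables and the tangent space are

\[
D_x^2 L(x^*, \lambda^*) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad
\{v : D_x h(x^*) v = 0\} = \{(t, -t) : t \in \mathbb{R}\},
\]

so that for any nonzero tangent vector

\[
v' D_x^2 L \, v = 2 \cdot t \cdot (-t) = -2t^2 < 0 .
\]

The Hessian is indefinite on the whole of $\mathbb{R}^2$ but negative definite on the tangent space, so the sufficient condition for a strict local constrained maximum is met at (1, 1).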

Theorem 12.4 (Second Order Necessary Conditions). Suppose the functions f : X → R and $h : X \to \mathbb{R}^K$ are twice continuously differentiable. Let L ≡ f − λh be the associated Lagrangian function, and suppose that $x^* \in \mathbb{R}^N$ satisfies the NDCQ.

If x∗ is a local constrained maximum, then there exists $\lambda^* \in \mathbb{R}^K$ such that $D_x^2 L(x^*, \lambda^*)$ is negative semidefinite on the subspace $\{v : D_x h(x^*) v = 0\}$.

If x∗ is a local constrained minimum, then there exists $\lambda^* \in \mathbb{R}^K$ such that $D_x^2 L(x^*, \lambda^*)$ is positive semidefinite on the subspace $\{v : D_x h(x^*) v = 0\}$.

13 Nonlinear Programming: The Karush-Kuhn-

Tucker Approach

13.1 Overview

- The KKT method of dealing with inequality constraints: complementary slackness

- Show that the necessary conditions for maxima of the KKT problem yield necessary

conditions for maxima of the original problem

- Intuition based on gradients of the objective function and constraint function, and an

explanation of the constraint qualification, paying attention to the difference in the meaning

of the sign of the gradient of the constraint function, and hence the difference from the case

with equality constraints

- Failure of CQ is problematic

13.2 First Order Conditions

Theorem 13.1 (Karush-Kuhn-Tucker Theorem). Let f and g be continuously differentiable functions. Suppose x∗ is a constrained maximum of f, where the constraint set is defined by {x ∈ X : g(x) ≤ 0} (for a minimum, apply the theorem to −f). Denote by $J_B$ the subset of indices of the constraints that bind (hold with equality) at x∗, and suppose that these binding constraints satisfy the NDCQ at x∗. Define L(x, λ) ≡ f(x) − λg(x) to be the associated Lagrangian. Then there exists $\lambda^* \in \mathbb{R}^J_+$ such that:

\[ \frac{\partial L}{\partial x_n}(x^*, \lambda^*) = \frac{\partial f}{\partial x_n}(x^*) - \sum_j \lambda_j^* \frac{\partial g_j}{\partial x_n}(x^*) = 0 \]

and

\[ \frac{\partial L}{\partial \lambda_j}(x^*, \lambda^*) = -g_j(x^*) \ge 0, \quad \lambda_j^* \ge 0, \quad \lambda_j^* \frac{\partial L}{\partial \lambda_j}(x^*, \lambda^*) = 0 \]

for every n ∈ {1, . . . , N}, j ∈ {1, . . . , J}.
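A one-dimensional illustration of these conditions, with a problem chosen for this example: maximize $f(x) = -(x - 2)^2$ subject to $g(x) = x - 1 \le 0$. The Lagrangian is $L(x, \lambda) = -(x - 2)^2 - \lambda (x - 1)$ and the conditions read

\[
-2(x - 2) - \lambda = 0, \qquad -(x - 1) \ge 0, \qquad \lambda \ge 0, \qquad \lambda (x - 1) = 0 .
\]

Complementary slackness leaves two cases. If $\lambda = 0$, stationarity gives $x = 2$, which violates the constraint. If the constraint binds, $x = 1$ and stationarity gives $\lambda = 2 \ge 0$. Since $Dg(x) = 1 \ne 0$, the NDCQ holds, and the unique candidate is $(x^*, \lambda^*) = (1, 2)$. The positive multiplier reflects that relaxing the constraint would raise the attainable value of f.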

Suppose we have nonnegativity constraints on the choice variables. We can treat these nonnegativity constraints differently, as is done in the original Kuhn-Tucker Theorem.

Theorem 13.2 (Original Kuhn-Tucker Theorem). Let f and g be continuously differentiable functions. Suppose x∗ is a constrained maximum of f, where the constraint set is defined by {x ∈ X : g(x) ≤ 0, x ≥ 0}. Denote by $J_B$ the subset of indices of the constraints that bind (hold with equality) at x∗, and suppose that these binding constraints satisfy the NDCQ at x∗. Define L(x, λ) ≡ f(x) − λg(x) to be the associated Lagrangian. Then there exists $\lambda^* \in \mathbb{R}^J_+$ such that:

\[ \frac{\partial L}{\partial x_n}(x^*, \lambda^*) = \frac{\partial f}{\partial x_n}(x^*) - \sum_j \lambda_j^* \frac{\partial g_j}{\partial x_n}(x^*) \le 0, \quad x_n^* \ge 0, \quad x_n^* \frac{\partial L}{\partial x_n}(x^*, \lambda^*) = 0 \]

and

\[ \frac{\partial L}{\partial \lambda_j}(x^*, \lambda^*) = -g_j(x^*) \ge 0, \quad \lambda_j^* \ge 0, \quad \lambda_j^* \frac{\partial L}{\partial \lambda_j}(x^*, \lambda^*) = 0 \]

for every n ∈ {1, . . . , N}, j ∈ {1, . . . , J}.

13.3 Second Order Conditions

The second-order theorems for the case of equality constraints in the classical program-

ming framework hold here, with binding constraints and any equality constraints of the

general nonlinear problem treated as the equality constraints in the classical programming

framework and the nonbinding constraints just ignored.

13.4 The Fritz John Theorem

The following theorem does not require a constraint qualification, which seems good,

but in some cases introduces many candidates, even when the corresponding KKT theorem

would yield sufficient conditions, such as the case of a concave objective.

Theorem 13.3 (Fritz John). Suppose f and g are continuously differentiable, as in the KKT Theorem 13.1. Suppose x∗ is a constrained maximizer of f subject to the constraints g(x) ≤ 0. Then there exist $\lambda^* \in \mathbb{R}^J_+$ and $\gamma^* \in \mathbb{R}$, with at least one of $\gamma^*, \lambda_1^*, \ldots, \lambda_J^*$ not equal to 0, such that

\[ \gamma^* Df(x^*) - \lambda^* Dg(x^*) = 0. \]

For more on this theorem, see Simon, Blume pg 475.

14 The Saddle Point Theorem

Definition 14.1 (Saddle Point). Let f : X × Y → R. (x∗, y∗) ∈ X × Y is a saddle point of f if f(x, y∗) ≤ f(x∗, y∗) ≤ f(x∗, y) for all x ∈ X, y ∈ Y.

Lemma 14.1 (Interchangeability). Let f : X × Y → R, and let (x1, y1) ∈ X × Y and (x2, y2) ∈ X × Y be saddle points. Then (x1, y2) and (x2, y1) are also saddle points. Also, all saddle points have the same value.

Proof. We know that

f(x, y1) ≤ f(x1, y1) ≤ f(x1, y), x ∈ X , y ∈ Y

and

f(x, y2) ≤ f(x2, y2) ≤ f(x2, y), x ∈ X , y ∈ Y

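Then, taking y = y2 in the first chain and x = x1 in the second,

f(x1, y1) ≤ f(x1, y2) ≤ f(x2, y2)

Also, taking y = y1 in the second chain and x = x2 in the first,

f(x2, y2) ≤ f(x2, y1) ≤ f(x1, y1)

Thus all four values are equal, so all saddle points have the same value, and

f(x, y2) ≤ f(x2, y2) = f(x1, y2) = f(x1, y1) ≤ f(x1, y), x ∈ X, y ∈ Y

which shows that (x1, y2) is a saddle point.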

Similarly, we can show

f(x, y1) ≤ f(x2, y1) ≤ f(x2, y), x ∈ X , y ∈ Y

Suppose we have the Lagrangian:

L(x, λ) ≡ f(x) − λg(x)

where $g : \mathbb{R}^N \to \mathbb{R}^J$.

Theorem 14.2 (Saddle Point Theorem). For any $X \subset \mathbb{R}^N$, and any $f, g_j : X \to \mathbb{R}$, if (x∗, λ∗) is a saddle point of L (with λ ranging over $\mathbb{R}^J_+$), then x∗ maximizes f over X s.t. $g_j(x) \le 0$ and moreover $\lambda_j^* g_j(x^*) = 0$, ∀j ∈ J.

Proof. Since (x∗, λ∗) is a saddle point, L(x∗, λ∗) ≤ L(x∗, λ), and so f(x∗) − λ∗g(x∗) ≤ f(x∗) − λg(x∗), which implies λ∗g(x∗) ≥ λg(x∗) for all λ ≥ 0. Thus, g(x∗) ≤ 0. If this were not true, then there would exist j such that $g_j(x^*) > 0$. Now choose $\lambda_j \ge 0$ with $\lambda_j > \frac{\lambda^* g(x^*)}{g_j(x^*)}$ and $\lambda_{j'} = 0$ for all j′ ≠ j; then $\lambda g(x^*) = \lambda_j g_j(x^*) > \lambda^* g(x^*)$, violating the saddle point condition. Thus, x∗ satisfies the constraints. Also, taking λ = 0 gives λ∗g(x∗) ≥ 0. But λ∗ ≥ 0 and g(x∗) ≤ 0 imply λ∗g(x∗) ≤ 0, and so λ∗g(x∗) = 0. In fact, since every term $\lambda_j^* g_j(x^*)$ is nonpositive and the terms sum to zero, we have $\lambda_j^* g_j(x^*) = 0$ for each j. Note that L(x∗, λ∗) ≥ L(x, λ∗), and so f(x∗) − λ∗g(x∗) ≥ f(x) − λ∗g(x). But λ∗g(x∗) = 0, so f(x∗) ≥ f(x) − λ∗g(x). If x satisfies g(x) ≤ 0, then λ∗g(x) ≤ 0, which implies f(x) − λ∗g(x) ≥ f(x). Thus, f(x∗) ≥ f(x).
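The saddle point property can be verified directly on a small example. The sketch below assumes the one-dimensional problem of maximizing $f(x) = -(x - 2)^2$ subject to $g(x) = x - 1 \le 0$, with candidate $(x^*, \lambda^*) = (1, 2)$; the problem, the grids and the tolerance are choices of this illustration. It checks that the Lagrangian is maximized in x and minimized in λ ≥ 0 at the candidate.

```python
def lagrangian(x, lam):
    """L(x, lam) = f(x) - lam * g(x) with f(x) = -(x - 2)^2 and g(x) = x - 1."""
    return -(x - 2.0) ** 2 - lam * (x - 1.0)

x_star, lam_star = 1.0, 2.0
TOL = 1e-12

# Hypothetical finite grids standing in for X and for the nonnegative multipliers.
xs = [-3.0 + 0.01 * i for i in range(801)]     # grid on [-3, 5]
lams = [0.01 * i for i in range(1001)]         # grid on [0, 10]

value = lagrangian(x_star, lam_star)

# Saddle point: L(x, lam*) <= L(x*, lam*) <= L(x*, lam) for all x and all lam >= 0.
max_in_x = all(lagrangian(x, lam_star) <= value + TOL for x in xs)
min_in_lam = all(value <= lagrangian(x_star, lam) + TOL for lam in lams)

# Complementary slackness at the candidate.
complementary_slackness = lam_star * (x_star - 1.0)
print(value, max_in_x, min_in_lam, complementary_slackness)
```

Here $L(x, 2) = -(x - 1)^2 - 1$, which peaks at x = 1, and $L(1, \lambda) = -1$ for every λ, so both inequalities hold and the constrained maximizer is x∗ = 1, as the theorem asserts.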

The converse of the Saddle Point Theorem 14.2 isn’t true in general. But there is a

partial converse result.

Theorem 14.3. Let X be a convex subset of $\mathbb{R}^N$. Let f : X → R be quasiconcave and $g_j : X \to \mathbb{R}$ be convex. Suppose ∃x ∈ X s.t. $g_j(x) < 0$, ∀j ∈ J, a condition known as the Slater Constraint Qualification.

If x∗ is a constrained maximizer of f subject to g(x) ≤ 0, then $\exists \lambda^* \in \mathbb{R}^J_+$ s.t. (x∗, λ∗) is a saddle point of $L : X \times \mathbb{R}^J_+ \to \mathbb{R}$, L(x, λ) ≡ f(x) − λg(x).

15 The Hyperplane Theorems and the Farkas Lemma

Theorem 15.1 (Strictly Separating Hyperplane Theorem). Let $X \subset \mathbb{R}^N$ be nonempty, closed and convex. Let $y \in \mathbb{R}^N \setminus X$. Then $\exists a \in \mathbb{R}^N$ and c ∈ R s.t. ax < c < ay, ∀x ∈ X.

Importance of assumptions:

1. X is closed: otherwise y could be a boundary point and then there are points arbitrarily

close to y, yielding a failure of the strict inequality (though of course we can still find

a ∈ RN , s.t. ax ≤ ay)

2. X is convex: otherwise the plane will intersect X for some choice of y /∈ X

Theorem 15.2. Let $X, Y \subset \mathbb{R}^N$ be nonempty, convex and disjoint. Then $\exists a \in \mathbb{R}^N$ s.t. ∀x ∈ X, y ∈ Y, ax ≤ ay, and ∃x′ ∈ X, y′ ∈ Y s.t. ax′ < ay′.

Theorem 15.3 (Separating Hyperplane Theorem). Let $X, Y \subset Z \subset \mathbb{R}^N$ be nonempty and convex, with int X ∩ Y = ∅. Then $\exists a \in \mathbb{R}^N$ and c ∈ R s.t. ax ≤ c ≤ ay, ∀x ∈ X, y ∈ Y.

Theorem 15.4 (Supporting Hyperplane Theorem). Let $X \subset \mathbb{R}^N$ be convex and x′ ∈ X − int X. Then $\exists a \in \mathbb{R}^N$, a ≠ 0, s.t. ax ≤ ax′ for all x ∈ X.

Note: x′ needs to be a boundary point of X, otherwise ∃x′′ ∈ X, x′′ ≠ x′, s.t. ax′′ > ax′.

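Theorem 15.5 (The Farkas Lemma). Let $a_1, \ldots, a_m$ be nonzero vectors in $\mathbb{R}^N$, let $b \in \mathbb{R}^N$, and let A be the m × N matrix whose rows are these vectors:

\[ A \equiv \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}. \]

Then exactly one of the following is true:

1) $\exists \lambda \in \mathbb{R}^m_+$ s.t. b = λA;

2) $\exists x \in \mathbb{R}^N$ s.t. bx > 0 and Ax ≤ 0.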
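A two-dimensional illustration, reading the first alternative as $b = \lambda A$ with $\lambda \ge 0$ (the numbers are chosen for this example): let $a_1 = (1, 0)$ and $a_2 = (0, 1)$, so that A is the identity matrix and $\{\lambda A : \lambda \ge 0\}$ is the nonnegative quadrant.

\[
b = (1, 2): \quad b = \lambda A \text{ with } \lambda = (1, 2) \ge 0,
\qquad \text{while } Ax \le 0 \text{ forces } bx = x_1 + 2x_2 \le 0 .
\]

\[
b = (-1, 2): \quad b = \lambda A \text{ would need } \lambda_1 = -1 < 0,
\qquad \text{while } x = (-1, 0) \text{ gives } Ax = (-1, 0) \le 0 \text{ and } bx = 1 > 0 .
\]

In the first case only alternative 1 holds, and in the second only alternative 2: either b lies in the cone generated by the rows of A, or a hyperplane through the origin separates b from that cone.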

16 Solving Constrained Optimization Problems

1. Find all points that violate NDCQ (or other constraint qualification). These points

are candidate optima.

2. Determine the KKT first order conditions.

3. Find all points that satisfy the KKT conditions: for every subset of constraints, as-

sume these have nonzero multipliers and try to find any points that satisfy the KKT

conditions. All such points that pass NDCQ are candidate optima.

4. If the objective is concave (convex) and the constraint function quasiconvex, then every

solution of the KKT conditions is a global constrained maximum. However, we need

to ensure that we haven’t missed a solution that violates the NDCQ.

5. Evaluate all candidate points to find global optima, or use second order conditions to

discriminate between candidate points.
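The enumeration over subsets of binding constraints in step 3 can be sketched in code for a case where each subproblem is linear. The example below is an assumption of this illustration, not the author's procedure: a concave quadratic objective $f(x) = -\|x - a\|^2$ with affine constraints $Gx \le h$, so that stationarity plus the binding constraints form a linear system that NumPy can solve. Each candidate is kept only if it is feasible and its multipliers are nonnegative.

```python
from itertools import combinations
import numpy as np

a = np.array([2.0, 1.0])                    # f(x) = -|x - a|^2 (concave)
G = np.array([[1.0, 1.0], [-1.0, 0.0]])     # constraints G x <= h (affine)
h = np.array([2.0, 0.0])

def kkt_candidates(a, G, h, tol=1e-9):
    n, J = a.size, h.size
    found = []
    for k in range(J + 1):
        for active in combinations(range(J), k):
            idx = np.array(active, dtype=int)
            Gs, hs = G[idx], h[idx]          # constraints assumed to bind
            # Stationarity -2(x - a) - Gs' lam = 0 together with Gs x = hs.
            M = np.zeros((n + k, n + k))
            M[:n, :n] = -2.0 * np.eye(n)
            M[:n, n:] = -Gs.T
            M[n:, :n] = Gs
            rhs = np.concatenate([-2.0 * a, hs])
            try:
                sol = np.linalg.solve(M, rhs)
            except np.linalg.LinAlgError:
                continue  # rank condition fails for this subset; inspect separately
            x, lam_s = sol[:n], sol[n:]
            # Keep the point only if it is feasible with nonnegative multipliers.
            if np.all(G @ x <= h + tol) and np.all(lam_s >= -tol):
                lam = np.zeros(J)
                lam[idx] = lam_s
                found.append((x, lam))
    return found

candidates = kkt_candidates(a, G, h)
for x, lam in candidates:
    print(x, lam)
```

In this example only the subset in which the first constraint binds survives, giving x = (1.5, 0.5) with multipliers (1, 0); because the objective is concave and the constraints are affine, step 4 identifies this point as the global constrained maximum.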

17 Summary of Optimization Theorems

17.1 Unconstrained

Let f : X → R be a continuous function, where X is an open subset of RN . Note that

assuming X is open means that any local optimum is an interior optimum. Henceforth, I

will assume the problem is to find maxima. The results can be easily translated for minima.

Also, keep in mind that an open domain implies a maximum may not exist.

1. If $f$ is continuously differentiable, then a necessary condition for a local maximum $x^*$ is that $Df(x^*) = 0$.

2. If $f$ is twice continuously differentiable, then a necessary condition for a local maximum $x^*$ is that $D^2 f(x^*)$ is negative semidefinite.

3. If $f$ is twice continuously differentiable, then the first order necessary condition is also a sufficient condition for a local maximum $x^*$ if $D^2 f(x^*)$ is negative definite.

4. If $f$ is continuously differentiable and concave, then the first order necessary condition for a local maximum is also a sufficient condition for a global maximum.

5. If $f$ is continuously differentiable and strictly concave, then $x^*$ is the unique maximizer of $f$ if $Df(x^*) = 0$.

6. If $f$ is twice continuously differentiable and concave, then, if $x^*$ solves $Df(x^*) = 0$ and $D^2 f(x^*)$ is negative definite, it is the unique maximizer.
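The second order conditions in items 2 and 3 can be made concrete for two variables. The sketch below is an illustration rather than part of the notes: it assumes the Hessian at a stationary point is the symmetric matrix [[a, b], [b, c]] and tests definiteness through principal minors. The function names are hypothetical.

```python
def is_negative_definite(a, b, c):
    """[[a, b], [b, c]] is negative definite iff a < 0 and the determinant is > 0."""
    return a < 0 and a * c - b * b > 0


def is_negative_semidefinite(a, b, c):
    """Negative semidefinite iff both diagonal entries are <= 0 and the determinant is >= 0."""
    return a <= 0 and c <= 0 and a * c - b * b >= 0


def classify_stationary_point(a, b, c):
    """Apply items 2 and 3 at a point where Df = 0 and D2f = [[a, b], [b, c]]."""
    if is_negative_definite(a, b, c):
        return "local maximum"          # item 3: sufficient condition holds
    if is_negative_semidefinite(a, b, c):
        return "inconclusive"           # item 2 holds, item 3 does not apply
    return "not a local maximum"        # item 2: necessary condition fails


# Hessians at the origin for three example functions (chosen for illustration):
quadratic = classify_stationary_point(-2, 1, -2)   # f = -x^2 + x*y - y^2
quartic = classify_stationary_point(0, 0, 0)       # f = -x^4 - y^4
saddle = classify_stationary_point(2, 0, -2)       # f = x^2 - y^2
```

The quartic case shows why item 2 is only necessary: the origin does maximize -x^4 - y^4, but the zero Hessian cannot confirm it.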

17.2 Constrained

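Theorem 17.1 (Local-Global Theorem). Let $f : X \to \mathbb{R}$ be a continuous function, where $X$ is an open subset of $\mathbb{R}^N$. Suppose $C \subset X$ is convex and compact, and $f$ is quasiconcave. Then every strict local maximum of $f$ on $C$ is a global maximum on $C$. If $f$ is strictly quasiconcave (or concave), then every local maximum of $f$ on $C$ is a global maximum on $C$.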

17.2.1 Classic KKT

Let $f : \mathbb{R}^N_+ \to \mathbb{R}$ and $g : \mathbb{R}^N_+ \to \mathbb{R}^J$ be continuous functions, and suppose $f, g$ are continuously differentiable.

Consider the following problem:

\[
\max_{x \in \mathbb{R}^N_+} f(x) \quad \text{subject to} \quad g(x) \le 0.
\]

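Notice that $x$ comes from $\mathbb{R}^N_+$, and so implicitly we have nonnegativity constraints.

The classic KKT conditions are:

\begin{gather}
D_x f(x^*) - \lambda^* D_x g(x^*) \le 0, \quad x^* \ge 0, \tag{17.1} \\
g(x^*) \le 0, \quad \lambda^* \ge 0, \tag{17.2} \\
x^* \bigl(D_x f(x^*) - \lambda^* D_x g(x^*)\bigr) = 0, \tag{17.3} \\
\lambda^* g(x^*) = 0. \tag{17.4}
\end{gather}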
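The four conditions can be checked mechanically at a candidate point. The sketch below is illustrative only: the function `satisfies_classic_kkt`, its argument layout and the example problem (maximize 2 x1 + x2 subject to x1 + x2 - 1 ≤ 0 and x ≥ 0) are assumptions made here, not part of the notes.

```python
def satisfies_classic_kkt(x, lam, grad_f, g_val, jac_g, tol=1e-9):
    """Check (17.1)-(17.4) at x with multipliers lam.

    grad_f: list of the N partial derivatives of f at x
    g_val:  list of the J constraint values g_j(x)
    jac_g:  J x N list of lists, row j is the gradient of g_j at x
    """
    n, J = len(x), len(lam)
    # Entry i of D_x f - lam D_x g.
    reduced = [grad_f[i] - sum(lam[j] * jac_g[j][i] for j in range(J)) for i in range(n)]
    c1 = all(r <= tol for r in reduced) and all(xi >= -tol for xi in x)    # (17.1)
    c2 = all(gj <= tol for gj in g_val) and all(l >= -tol for l in lam)    # (17.2)
    c3 = abs(sum(xi * r for xi, r in zip(x, reduced))) <= tol              # (17.3)
    c4 = abs(sum(l * gj for l, gj in zip(lam, g_val))) <= tol              # (17.4)
    return c1 and c2 and c3 and c4


# Hypothetical problem: maximize 2*x1 + x2 subject to x1 + x2 - 1 <= 0, x >= 0.
grad_f = [2, 1]      # f is linear, so its gradient is the same at every x
jac_g = [[1, 1]]

corner = satisfies_classic_kkt([1, 0], [2], grad_f, [0], jac_g)
split = satisfies_classic_kkt([0.5, 0.5], [2], grad_f, [0], jac_g)
```

At the corner (1, 0) with multiplier 2 the second coordinate has a strictly negative entry in (17.1), which (17.3) allows because x2 = 0. At (0.5, 0.5) the same multiplier violates (17.3), since x2 is positive there.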

Suppose x∗ is a solution to the maximization problem. Then the KKT conditions are

necessary conditions (and there exists an associated λ∗) if any one of the following is true:

1. The Jacobian of the constraints that bind at x∗ has full rank (NDCQ).

2. The constraints are affine.[10]

3. The constraint functions $g_j$ are convex, and there exists an interior point of the constraint set, i.e., there exists $x \in \mathbb{R}^N_+$ such that $g_j(x) < 0$ for all $j$. This is the Slater condition.

4. The constraint functions $g_j$ are quasiconvex, the constraint set has a nonempty interior (the Slater condition), and for every $j$ such that $g_j$ is not convex, $D_x g_j(x) \ne 0$ for all $x \in \mathbb{R}^N_+$. This weakens the convexity requirement of the previous item, at the cost of ruling out pesky stationary points of the constraint functions.

Suppose $(x^*, \lambda^*)$ is a solution of the KKT conditions. Then $x^*$ is a maximizer, that is, the KKT conditions are sufficient conditions, if $g_j$ is quasiconvex for every $j$ and any one of the following is true:

1. $f$ is concave.

2. $f$ is twice continuously differentiable, quasiconcave, and $D_x f(x^*) \ne 0$.

3. $f$ is quasiconcave, and one of the following holds:

(a) $\frac{\partial f}{\partial x_i}(x^*) < 0$ for some $i$.

(b) $\frac{\partial f}{\partial x_i}(x^*) > 0$ for some $i$ such that there exists a feasible $x \in \mathbb{R}^N_+$ with $x_i > 0$.

[10] A function F is linear if F(ax + by) = aF(x) + bF(y). An affine function is a linear function with an added constant. For example, F(x) = 12x is linear (and affine), but F(x) = 12x + 3 is not linear, though still affine.

17.2.2 Modern KKT

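Let $f : X \to \mathbb{R}$ and $g : X \to \mathbb{R}^J$ be continuous functions, where $X \subset \mathbb{R}^N$ is open, and suppose $f, g$ are continuously differentiable.

Consider the following problem:

\[
\max_{x \in X} f(x) \quad \text{subject to} \quad g(x) \le 0.
\]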

Any nonnegativity constraints should be included in the set of inequality constraints explicitly. Define the constraint set by $C \equiv \{x \in X : g(x) \le 0\}$.

The modern KKT conditions are:

\begin{gather}
D_x f(x^*) - \lambda^* D_x g(x^*) = 0, \tag{17.6} \\
g(x^*) \le 0, \quad \lambda^* \ge 0, \tag{17.7} \\
\lambda^* g(x^*) = 0. \tag{17.8}
\end{gather}

If every reference to classic KKT is replaced with modern KKT in the section on Classic

KKT, then the results there apply here.
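One way to see how the two formulations relate is to write the nonnegativity constraints of the classic problem as explicit inequality constraints $-x \le 0$ in the modern problem. The multiplier $\nu^*$ below is notation introduced here for illustration and does not appear in the notes, and the sketch assumes $f$ and $g$ extend to an open set containing $\mathbb{R}^N_+$.

```latex
\begin{align*}
&D_x f(x^*) - \lambda^* D_x g(x^*) + \nu^* = 0
  && \text{(17.6), with multiplier } \nu^* \text{ on } -x \le 0 \\
&\nu^* \ge 0, \quad x^* \ge 0, \quad \nu^* x^* = 0
  && \text{(17.7) and (17.8) for } -x \le 0 \\
\Longrightarrow\quad
&D_x f(x^*) - \lambda^* D_x g(x^*) = -\nu^* \le 0
  && \text{which is (17.1)} \\
&x^* \bigl(D_x f(x^*) - \lambda^* D_x g(x^*)\bigr) = -\nu^* x^* = 0
  && \text{which is (17.3)}
\end{align*}
```

Read in the other direction, a solution of the classic conditions yields a solution of the modern ones by setting $\nu^* = -\bigl(D_x f(x^*) - \lambda^* D_x g(x^*)\bigr)$.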

17.2.3 KKT - mixed constraints

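Let $f : \mathbb{R}^N \to \mathbb{R}$, $g : \mathbb{R}^N \to \mathbb{R}^J$, $h : \mathbb{R}^N \to \mathbb{R}^K$ be continuous functions, and suppose $f, g, h$ are continuously differentiable.

Consider the following problem:

\[
\max_{x \in \mathbb{R}^N} f(x) \quad \text{subject to} \quad g_j(x) \le 0 \ (j = 1, \dots, J), \quad h_k(x) = 0 \ (k = 1, \dots, K).
\]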

Note also that an equality constraint $h_k(x) = 0$ could be replaced by two inequality constraints $g_j(x) \le 0$ and $g_{j'}(x) \le 0$, where $g_j = h_k$ and $g_{j'} = -h_k$. Thus, the results below are just appropriate restatements of the results in the classic KKT section.

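The mixed KKT conditions are:

\begin{gather}
D_x f(x^*) - \lambda^* D_x g(x^*) - \mu^* D_x h(x^*) = 0, \tag{17.10} \\
g(x^*) \le 0, \quad \lambda^* \ge 0, \quad \lambda^* g(x^*) = 0, \tag{17.11} \\
h(x^*) = 0. \tag{17.12}
\end{gather}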

Suppose $x^*$ is a solution to the maximization problem. Then the mixed KKT conditions are necessary conditions (and there exist associated $\lambda^*$ and $\mu^*$) if any one of the following is true:

1. The Jacobian of the equality constraints and the binding inequality constraints at x∗

has full rank (NDCQ).

2. The constraints are affine.

Suppose $(x^*, \lambda^*, \mu^*)$ is a solution of the mixed KKT conditions. Then $x^*$ is a maximizer, that is, the KKT conditions are sufficient conditions, if $g_j$ is quasiconvex for every $j$, $h_k$ is linear for every $k$, and any one of the following is true:

1. $f$ is concave.

2. $f$ is twice continuously differentiable, quasiconcave, and $D_x f(x^*) \ne 0$.

3. $f$ is quasiconcave, and one of the following holds:

(a) $\frac{\partial f}{\partial x_i}(x^*) < 0$ for some $i$.

(b) $\frac{\partial f}{\partial x_i}(x^*) > 0$ for some $i$ such that there exists a feasible $x \in \mathbb{R}^N_+$ with $x_i > 0$.
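A small worked example may help fix ideas. The problem below is chosen here purely for illustration and is not from the notes. It also shows that the multiplier on an equality constraint carries no sign restriction, which is why (17.10) to (17.12) impose none on $\mu^*$.

```latex
\begin{align*}
&\max_{x \in \mathbb{R}^2} \; -x_1^2 - x_2^2
  \quad \text{subject to} \quad
  g(x) = x_1 - \tfrac{1}{4} \le 0, \quad h(x) = x_1 + x_2 - 1 = 0. \\
&\text{Ignoring } g \text{ gives } x = (\tfrac{1}{2}, \tfrac{1}{2}),
  \text{ which violates } g(x) \le 0, \text{ so suppose } g \text{ binds: }
  x^* = (\tfrac{1}{4}, \tfrac{3}{4}). \\
&\text{(17.10):} \quad -2x_1^* - \lambda^* - \mu^* = 0, \qquad -2x_2^* - \mu^* = 0
  \;\Longrightarrow\; \mu^* = -\tfrac{3}{2}, \quad \lambda^* = 1 \ge 0. \\
&\text{(17.11), (17.12):} \quad g(x^*) = 0, \quad \lambda^* g(x^*) = 0, \quad h(x^*) = 0.
\end{align*}
```

The Jacobian of $(h, g)$ at $x^*$ has rows $(1, 1)$ and $(1, 0)$ and so has full rank, and $f$ is concave with affine constraints, so the sufficiency result above identifies $x^*$ as the maximizer.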

17.2.4 Saddlepoint Theorem

Let $f : X \to \mathbb{R}$ and $g : X \to \mathbb{R}^J$ be functions on an arbitrary set $X$. Define the Lagrangian function $L(x, \lambda) \equiv f(x) - \lambda g(x)$, where $\lambda \in \mathbb{R}^J_+$.

1. If $(x^*, \lambda^*)$ is a saddlepoint of $L$, then $x^*$ is a constrained maximizer, and $\lambda^* g(x^*) = 0$.

2. Suppose $X \subset \mathbb{R}^N$ is convex, $f$ is concave, and $g_j$ is convex for each $j$. If $x^*$ is a constrained maximizer of $f$ subject to $g \le 0$ and there exists $x$ such that $g(x) < 0$ (the Slater condition), then there exists $\lambda^* \in \mathbb{R}^J_+$ such that $(x^*, \lambda^*)$ is a saddlepoint of $L$.
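The saddlepoint property can be explored numerically. The sketch below takes the usual definition, which this section does not restate: $(x^*, \lambda^*)$ is a saddlepoint when $L(x, \lambda^*) \le L(x^*, \lambda^*) \le L(x^*, \lambda)$ for all $x \in X$ and $\lambda \ge 0$. The one-dimensional problem and all function names are assumptions made for illustration, and a grid check of this kind is evidence, not a proof.

```python
from fractions import Fraction

# Hypothetical problem: maximize f(x) = -(x - 2)^2 subject to g(x) = x - 1 <= 0.
def f(x):
    return -(x - 2) ** 2


def g(x):
    return x - 1


def lagrangian(x, lam):
    return f(x) - lam * g(x)


def is_saddlepoint_on_grid(x_star, lam_star, xs, lams):
    """Check L(x, lam*) <= L(x*, lam*) <= L(x*, lam) at every grid point."""
    value = lagrangian(x_star, lam_star)
    max_in_x = all(lagrangian(x, lam_star) <= value for x in xs)
    min_in_lam = all(value <= lagrangian(x_star, lam) for lam in lams)
    return max_in_x and min_in_lam


# Exact rational grids avoid floating point noise near the saddlepoint.
xs = [Fraction(k, 10) for k in range(-50, 51)]    # x in [-5, 5]
lams = [Fraction(k, 10) for k in range(0, 101)]   # lambda in [0, 10]

ok = is_saddlepoint_on_grid(Fraction(1), Fraction(2), xs, lams)
bad = is_saddlepoint_on_grid(Fraction(1), Fraction(1), xs, lams)
```

With the multiplier 2 the Lagrangian simplifies to -(x - 1)^2 - 1, which peaks at x = 1, and the constraint binds there so that the product of multiplier and constraint value is zero, as item 1 states. With the multiplier 1 the Lagrangian is larger at x = 1.5 than at x = 1, so the pair fails the check.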

18 References

George Dantzig. Linear Programming and Extensions. 1963.[11]

Michael Intriligator. Mathematical Optimization and Economic Theory. 1971.

[11] A PDF version is available for free from the RAND Corporation. See http://www.rand.org/pubs/reports/R366.html
