
LONDON SCHOOL OF ECONOMICS Professor Leonardo Felli

Department of Economics S.478; x7525

EC400 2010/11

Math for Microeconomics

September Course, Part II

Lecture Notes

Course Outline

Lecture 1: Tools for optimization (Quadratic forms).

Lecture 2: Tools for optimization (Taylor’s expansion) and Unconstrained optimization.

Lecture 3: Concavity, convexity, quasi-concavity and economic applications.

Lecture 4: Constrained Optimization I: Equality Constraints, Lagrange Theorem.

Lecture 5: Constrained Optimization II: Inequality Constraints, Kuhn-Tucker Theorem.

Lecture 6: Constrained Optimization III: The Maximum Value Function, Envelope Theorem, Implicit Function Theorem and Comparative Statics.


Lecture 1: Tools for optimization: Quadratic Forms and Taylor’s formulation

What is a quadratic form?

• Quadratic forms are useful because: (i) they are the simplest functions after linear ones; (ii) the conditions for optimization techniques are stated in terms of quadratic forms; (iii) many economic optimization problems have a quadratic objective function, such as risk minimization problems in finance, where riskiness is measured by the (quadratic) variance of the returns from investments.

• Among the functions of one variable, the simplest functions with a unique global extremum are the pure quadratics: y = x^2 and y = −x^2. The level curve of a general quadratic form in R^2 is

a11 x1^2 + a12 x1 x2 + a22 x2^2 = b

and can take the form of an ellipse, a hyperbola, a pair of lines, or possibly the empty set.

• Definition: A quadratic form on R^n is a real-valued function

Q(x1, x2, ..., xn) = Σ_{i≤j} aij xi xj

• The general quadratic form

a11 x1^2 + a12 x1 x2 + a22 x2^2

can be written as

( x1  x2 ) [ a11  a12 ] ( x1 )
           [ 0    a22 ] ( x2 ).


• Each quadratic form can be represented as

Q(x) = x^T A x

where A is a (unique) symmetric matrix:

[ a11    a12/2  ...  a1n/2 ]
[ a21/2  a22    ...  a2n/2 ]
[ ...    ...    ...  ...   ]
[ an1/2  an2/2  ...  ann   ]

• Conversely, if A is a symmetric matrix, then the real-valued function Q(x) = x^T A x is a quadratic form.
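As a quick numerical sanity check (my own illustration, not part of the notes), the symmetric-matrix representation can be verified with NumPy for arbitrary coefficients, here a11 = 1, a12 = 4, a22 = 2:

```python
import numpy as np

# Quadratic form Q(x) = a11*x1^2 + a12*x1*x2 + a22*x2^2,
# with arbitrary illustrative coefficients.
a11, a12, a22 = 1.0, 4.0, 2.0

# The unique symmetric matrix puts half of the cross coefficient
# on each side of the diagonal.
A = np.array([[a11,     a12 / 2],
              [a12 / 2, a22    ]])

x = np.array([3.0, -1.0])  # an arbitrary test point
direct = a11 * x[0]**2 + a12 * x[0] * x[1] + a22 * x[1]**2
via_matrix = x @ A @ x     # x^T A x

print(direct, via_matrix)  # both are -1.0
```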


Definiteness of quadratic forms

• The function always takes the value 0 when x = 0.

• We focus on the question of whether x = 0 is a max, a min, or neither. For example, when y = ax^2, if a > 0 then ax^2 is non-negative and equals 0 only when x = 0: the form is positive definite, and x = 0 is a global minimizer. If a < 0, then the function is negative definite.

• In two dimensions, x1^2 + x2^2 is positive definite, −x1^2 − x2^2 is negative definite, and x1^2 − x2^2 is indefinite, since it can take both positive and negative values.

• There are two intermediate cases: a quadratic form that is always non-negative but equals 0 for some non-zero x’s is positive semidefinite, such as (x1 + x2)^2, which is 0 at points where x1 = −x2. A quadratic form that is never positive but can be zero at points other than the origin is called negative semidefinite.

• We apply the same terminology to the symmetric matrix A: the matrix A is positive semidefinite if Q(x) = x^T A x is positive semidefinite, and so on.


• Definition: let A be an (n × n) symmetric matrix. Then A is:

(a) positive definite if x^T A x > 0 for all x ≠ 0 in R^n,

(b) positive semidefinite if x^T A x ≥ 0 for all x ≠ 0 in R^n,

(c) negative definite if x^T A x < 0 for all x ≠ 0 in R^n,

(d) negative semidefinite if x^T A x ≤ 0 for all x ≠ 0 in R^n,

(e) indefinite if x^T A x > 0 for some x ≠ 0 in R^n and x^T A x < 0 for some x ≠ 0 in R^n.
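As a small illustration of case (e) (my own example, not from the notes), the indefinite form x1^2 − x2^2 corresponds to A = diag(1, −1) and takes both signs:

```python
def Q(x1, x2):
    # The quadratic form x1^2 - x2^2, i.e. x^T A x with A = diag(1, -1).
    return x1**2 - x2**2

pos = Q(1, 0)    # > 0 at x = (1, 0)
neg = Q(0, 1)    # < 0 at x = (0, 1)
print(pos, neg)  # 1 -1, so A is indefinite
```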

• Application (later this week): a function y = f(x) of one variable is concave on an interval if its second derivative f′′(x) ≤ 0 on that interval. The generalization of this result to higher dimensions states that a function is concave on some region if its second derivative matrix (the Hessian) is negative semidefinite for all x in the region.

Testing the definiteness of a matrix:

• Definition: The determinant of a matrix is a unique scalar associated with the matrix.

• Computing the determinant of a matrix:

– For a (2 × 2) matrix

A = [ a11  a12 ]
    [ a21  a22 ]

the determinant, det A or |A|, is

a11 a22 − a12 a21.

– For

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ]

the determinant is:

a11 det [ a22  a23 ]  −  a12 det [ a21  a23 ]  +  a13 det [ a21  a22 ]
        [ a32  a33 ]             [ a31  a33 ]             [ a31  a32 ].
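The cofactor expansion can be transcribed directly into code; the matrix below is an arbitrary illustrative example:

```python
def det2(m):
    # Determinant of a 2x2 matrix [[a11, a12], [a21, a22]].
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    # Expansion along the first row, exactly as in the formula above.
    return (m[0][0] * det2([[m[1][1], m[1][2]], [m[2][1], m[2][2]]])
          - m[0][1] * det2([[m[1][0], m[1][2]], [m[2][0], m[2][2]]])
          + m[0][2] * det2([[m[1][0], m[1][1]], [m[2][0], m[2][1]]]))

A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
print(det3(A))  # 2*(12-1) - 1*(4-0) + 0*(1-0) = 18
```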


• Definition: Let A be an (n × n) matrix. A (k × k) submatrix of A formed by deleting (n − k) columns, say columns (i1, i2, ..., i_{n−k}), and the same (n − k) rows from A, is called a kth order principal submatrix of A. The determinant of a (k × k) principal submatrix is called a kth order principal minor of A.

• Example: for a general (3 × 3) matrix

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ]

there is one third order principal minor, namely det(A). There are three second order principal minors and three first order principal minors.

• Definition: Let A be an (n × n) matrix. The kth order principal submatrix of A obtained by deleting the last (n − k) rows and columns from A is called the kth order leading principal submatrix of A, denoted by Ak. Its determinant is called the kth order leading principal minor of A, denoted by |Ak|.

• Let A be an (n × n) symmetric matrix. Then:

– A is positive definite if and only if all its n leading principal minors are strictly positive.

– A is negative definite if and only if its n leading principal minors alternate in sign as follows:

|A1| < 0, |A2| > 0, |A3| < 0, etc.

The kth order leading principal minor should have the same sign as (−1)^k.

– A is positive semidefinite if and only if every principal minor of A is non-negative.


– A is negative semidefinite if and only if every principal minor of odd order is non-positive and every principal minor of even order is non-negative.
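The leading-principal-minor tests for the two definite cases can be sketched as follows (a minimal implementation of the criterion above; the semidefinite cases would require checking every principal minor, not only the leading ones):

```python
import numpy as np

def classify(A):
    """Classify a symmetric matrix via its leading principal minors |A1|, ..., |An|.
    Covers only the definite cases; semidefiniteness needs all principal minors."""
    n = A.shape[0]
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
    if all(m > 0 for m in minors):
        return "positive definite"
    # Negative definite: the kth minor has the sign of (-1)^k.
    if all((m < 0) if k % 2 == 1 else (m > 0)
           for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "indefinite or semidefinite"

A = np.array([[2.0, 3.0], [3.0, 7.0]])  # |A1| = 2, |A2| = 5
B = np.array([[2.0, 4.0], [4.0, 7.0]])  # |B1| = 2, |B2| = -2
print(classify(A))  # positive definite
print(classify(B))  # indefinite or semidefinite
```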

• Diagonal matrices:

A = [ a1  0   0  ]
    [ 0   a2  0  ]
    [ 0   0   a3 ]

These also correspond to the simplest quadratic forms:

x^T A x = a1 x1^2 + a2 x2^2 + a3 x3^2.

This quadratic form will be positive (negative) definite if and only if all the ai’s are positive (negative). It will be positive semidefinite if and only if all the ai’s are non-negative, and negative semidefinite if and only if all the ai’s are non-positive. If two of the ai’s have opposite signs, it will be indefinite.

• Let A be a (2 × 2) symmetric matrix. Then:

Q(x1, x2) = (x1, x2) [ a  b ] ( x1 )  =  a x1^2 + 2b x1 x2 + c x2^2
                     [ b  c ] ( x2 )

If a = 0, then Q cannot be negative or positive definite, since Q(1, 0) = 0. So assume that a ≠ 0, and add and subtract b^2 x2^2 / a to get:

Q(x1, x2) = a x1^2 + 2b x1 x2 + c x2^2 + (b^2/a) x2^2 − (b^2/a) x2^2

          = a ( x1^2 + (2b/a) x1 x2 + (b^2/a^2) x2^2 ) − (b^2/a) x2^2 + c x2^2

          = a ( x1 + (b/a) x2 )^2 + ((ac − b^2)/a) x2^2


• If both coefficients, a and (ac − b^2)/a, are positive, then Q will never be negative. It will equal 0 only when x1 + (b/a) x2 = 0 and x2 = 0, in other words, when x1 = 0 and x2 = 0. Thus, if

a > 0 and det A = | a  b |
                  | b  c | > 0

then Q is positive definite. Conversely, if Q is positive definite, then both a and det A = ac − b^2 are positive.

Similarly, Q is negative definite if and only if both coefficients are negative, which occurs if and only if a < 0 and ac − b^2 > 0, that is, when the leading principal minors alternate in sign. If ac − b^2 < 0, then the two coefficients have opposite signs and Q is indefinite.

• Examples of (2 × 2) matrices:

– Consider A = [ 2  3 ]
               [ 3  7 ].

Since |A1| = 2 and |A2| = 5, A is positive definite.

– Consider B = [ 2  4 ]
               [ 4  7 ].

Since |B1| = 2 and |B2| = −2, B is indefinite.

Taylor’s formulation:

• The second tool that we need for maximization is Taylor’s series.

• For functions from R^1 to R^1, the first order Taylor approximation is

f(a + h) ≈ f(a) + f′(a) h

The approximate equality holds in the following sense. Write f(a + h) as

f(a + h) = f(a) + f′(a) h + R(h; a)

where R(h; a) is the difference between the two sides of the approximation; by the definition of the derivative f′(a), we have R(h; a)/h → 0 as h → 0.

• Geometrically, this is the formalization of the approximation of the graph of f by its tangent line at (a, f(a)). Analytically, it describes the best approximation of f by a polynomial of degree 1.

• Definition: the kth order Taylor polynomial of f at x = a is

Pk(a + h) = f(a) + f′(a) h + (f′′(a)/2!) h^2 + ... + (f^[k](a)/k!) h^k

where

f(a + h) − Pk(a + h) = Rk(h; a) with lim_{h→0} Rk(h; a)/h^k = 0

• Example: we compute the first and second order Taylor polynomials of the exponential function f(x) = e^x at x = 0. All the derivatives of f at x = 0 equal 1. Then:

P1(h) = 1 + h

P2(h) = 1 + h + h^2/2

For h = 0.2, P1(0.2) = 1.2 and P2(0.2) = 1.22, compared with the actual value of e^0.2, which is 1.22140.
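These numbers are easy to reproduce (a small check, not part of the original notes):

```python
import math

h = 0.2
P1 = 1 + h             # first order Taylor polynomial of e^x at x = 0
P2 = 1 + h + h**2 / 2  # second order Taylor polynomial

# Compare with the actual value e^0.2 = 1.22140...
print(round(P1, 5), round(P2, 5), round(math.exp(h), 5))  # 1.2 1.22 1.2214
```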

• For functions of several variables:

F(a + h) ≈ F(a) + (∂F/∂x1)(a) h1 + ... + (∂F/∂xn)(a) hn

where R1(h; a)/||h|| → 0 as h → 0. This is the approximation of order 1. Alternatively,

F(a + h) = F(a) + DFa · h + R1(h; a)

where DFa = ( (∂F/∂x1)(a), ..., (∂F/∂xn)(a) ).

For order two, the analogue of (f′′(a)/2!) h^2 is

(1/2) h^T D²Fa h,

where D²Fa is the Hessian matrix:

D²Fa = [ ∂²F/∂x1² |x=a     ...  ∂²F/∂xn∂x1 |x=a ]
       [ ...               ...  ...             ]
       [ ∂²F/∂x1∂xn |x=a   ...  ∂²F/∂xn² |x=a   ]

• The extension for order k then trivially follows.


Lecture 2: Unconstrained optimization.

Optimization plays a crucial role in economic problems. We start with unconstrained optimization problems.

Definition of extreme points

• Definition: The ball B(x, r) centred at x of radius r is the set of all vectors y in R^n whose distance from x is less than r, that is

B(x, r) = { y ∈ R^n : ||y − x|| < r }.

• Definition: suppose that f(x) is a real-valued function defined on a subset C of R^n. A point x* in C is:

1. A global maximizer for f(x) on C if f(x*) ≥ f(x) for all x ∈ C.

2. A strict global maximizer for f(x) on C if f(x*) > f(x) for all x ∈ C such that x ≠ x*.

3. A local maximizer for f(x) if there is a strictly positive number δ such that f(x*) ≥ f(x) for all x ∈ C with x ∈ B(x*, δ).

4. A strict local maximizer for f(x) if there is a strictly positive number δ such that f(x*) > f(x) for all x ∈ C with x ∈ B(x*, δ) and x ≠ x*.

5. A critical point for f(x) if the first partial derivatives of f(x) exist at x* and

∂f(x*)/∂xi = 0 for i = 1, 2, ..., n.

• Example: find the critical points of F(x, y) = x^3 − y^3 + 9xy. We set

∂F/∂x = 3x^2 + 9y = 0;   ∂F/∂y = −3y^2 + 9x = 0

and find that the critical points are (0, 0) and (3, −3).
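A quick check (my own, not in the notes) that the gradient of F vanishes at both points:

```python
def grad_F(x, y):
    # Gradient of F(x, y) = x^3 - y^3 + 9xy.
    return (3 * x**2 + 9 * y, -3 * y**2 + 9 * x)

print(grad_F(0, 0))   # (0, 0)
print(grad_F(3, -3))  # (0, 0)
```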

Do extreme points exist?

• Theorem (Extreme Value Theorem): Suppose that f(x) is a continuous function defined on C, which is compact (closed and bounded) in R^n. Then there exists a point x* in C at which f has a maximum, and there exists a point x_* in C at which f has a minimum. Thus,

f(x_*) ≤ f(x) ≤ f(x*)

for all x ∈ C.

Functions of one variable

• Necessary condition for a maximum in R:

Suppose that f(x) is a differentiable function on an interval I. If x* is a local maximizer of f(x), then either x* is an end point of I or f′(x*) = 0.

• Second order sufficient conditions for a maximum in R:

Suppose that f(x), f′(x), f′′(x) are all continuous on an interval I and that x* is a critical point of f(x). Then:

1. If f′′(x) ≤ 0 for all x ∈ I, then x* is a global maximizer of f(x) on I.

2. If f′′(x) < 0 for all x ∈ I with x ≠ x*, then x* is a strict global maximizer of f(x) on I.

3. If f′′(x*) < 0, then x* is a strict local maximizer of f(x) on I.


Functions of several variables

• First order necessary conditions for a maximum in R^n:

Suppose that f(x) is a real-valued function for which all first partial derivatives of f(x) exist on a subset C ⊂ R^n. If x* is an interior point of C that is a local maximizer of f(x), then x* is a critical point of f(x), that is,

∂f(x*)/∂xi = 0 for i = 1, 2, ..., n.

Can we then say whether (0, 0) or (3, −3) is a local maximum or a local minimum? For this we have to consider the Hessian, the matrix of second order partial derivatives. Note that this is a symmetric matrix, since cross-partial derivatives are equal whenever the function has continuous second order partial derivatives (Clairaut’s / Schwarz’s theorem).

• Second order sufficient conditions for a local maximum in R^n:

Suppose that f(x) is a real-valued function for which all first and second partial derivatives of f(x) exist on a subset C ⊂ R^n, and suppose that x* is a critical point of f. If D²f(x*) is negative (positive) definite, then x* is a strict local maximizer (minimizer) of f(x).

It is also true that if x* is an interior point and a maximum (minimum) of f, then D²f(x*) is negative (positive) semidefinite.

But it is not true that if x* is a critical point and D²f(x*) is negative (positive) semidefinite, then x* is a local maximum (minimum). A counterexample is f(x) = x^3, which has the property that D²f(0) is semidefinite, but x = 0 is neither a maximum nor a minimum.


• Back to the example of F(x, y) = x^3 − y^3 + 9xy. Compute the Hessian:

D²F(x, y) = [ 6x   9  ]
            [ 9   −6y ]

The first order leading principal minor is 6x and the second order leading principal minor is det(D²F(x, y)) = −36xy − 81. At (0, 0) these two minors are 0 and −81, hence the matrix is indefinite and this point is neither a local min nor a local max (it is a saddle point). At (3, −3) these two minors are positive, and hence it is a strict local minimum of F. Note that it is not a global minimum (why?).
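The two leading principal minors at each critical point can be evaluated directly (a small check, not part of the notes):

```python
def leading_minors(x, y):
    # Leading principal minors of the Hessian [[6x, 9], [9, -6y]]
    # of F(x, y) = x^3 - y^3 + 9xy.
    first = 6 * x
    second = (6 * x) * (-6 * y) - 81  # det = -36xy - 81
    return first, second

print(leading_minors(0, 0))   # (0, -81): indefinite, so (0, 0) is a saddle point
print(leading_minors(3, -3))  # (18, 243): positive definite, strict local minimum
```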

• Sketch of proof:

F(x* + h) = F(x*) + DF(x*) h + (1/2) h^T D²F(x*) h + R(h)

Ignore R(h) and set DF(x*) = 0. Then

F(x* + h) − F(x*) ≈ (1/2) h^T D²F(x*) h

If D²F(x*) is negative definite, then for all small enough h ≠ 0 the right hand side is negative. Then

F(x* + h) < F(x*)

for small enough h; in other words, x* is a strict local maximizer of F.

Concavity and convexity

• Definition: A real-valued function f defined on a convex subset U of R^n is concave if, for all x, y in U and for all t ∈ [0, 1]:

f(tx + (1 − t)y) ≥ t f(x) + (1 − t) f(y)

A real-valued function g defined on a convex subset U of R^n is convex if, for all x, y in U and for all t ∈ [0, 1]:

g(tx + (1 − t)y) ≤ t g(x) + (1 − t) g(y)

• Notice: f is concave if and only if −f is convex.

• Notice: linear functions are both convex and concave.

• A convex set:

Definition: A set U is a convex set if for all x ∈ U and y ∈ U and for all t ∈ [0, 1]:

tx + (1 − t)y ∈ U

• Concave and convex functions need to have convex sets as their domain. Otherwise, the conditions above fail.

• Let f be a continuous and differentiable function on a convex subset U of R^n. Then f is concave on U if and only if for all x, y in U:

f(y) − f(x) ≤ Df(x)(y − x) = (∂f(x)/∂x1)(y1 − x1) + ... + (∂f(x)/∂xn)(yn − xn)

• Proof on R^1: since f is concave,

t f(y) + (1 − t) f(x) ≤ f(ty + (1 − t)x)  ⇔

t (f(y) − f(x)) + f(x) ≤ f(x + t(y − x))  ⇔

f(y) − f(x) ≤ [ f(x + t(y − x)) − f(x) ] / t  ⇔

f(y) − f(x) ≤ [ (f(x + h) − f(x)) / h ] (y − x)

for h = t(y − x). Taking limits as h → 0, this becomes

f(y) − f(x) ≤ f′(x)(y − x).


• If f is a continuous and differentiable concave function on a convex set U and x0 ∈ U, then

Df(x0)(y − x0) ≤ 0

implies f(y) ≤ f(x0); if this holds for all y ∈ U, then x0 is a global maximizer of f.

• Proof: we know that

f(y) − f(x0) ≤ Df(x0)(y − x0) ≤ 0.

Hence also f(y) − f(x0) ≤ 0.

• Let f be a continuous, twice differentiable function whose domain is a convex open subset U of R^n. If f is a concave function on U and Df(x0) = 0 for some x0, then x0 is a global maximum of f on U.

• A continuous, twice differentiable function f on an open convex subset U of R^n is concave on U if and only if the Hessian D²f(x) is negative semidefinite for all x in U. The function f is a convex function if and only if D²f(x) is positive semidefinite for all x in U.

• Second order sufficient conditions for a global maximum (minimum) in R^n:

Suppose that x* is a critical point of a function f(x) with continuous first and second order partial derivatives on R^n. Then x* is:

1. a global maximizer (minimizer) for f(x) if D²f(x) is negative (positive) semidefinite on R^n;

2. a strict global maximizer (minimizer) for f(x) if D²f(x) is negative (positive) definite on R^n.


The property that critical points of concave functions are global maximizers is an important one in economic theory. For example, many economic principles, such as "marginal rate of substitution equals the price ratio" or "marginal revenue equals marginal cost", are simply the first order necessary conditions of the corresponding maximization problem, as we will see. Ideally, an economist would like such a rule also to be a sufficient condition guaranteeing that utility or profit is being maximized, so that it can provide a guideline for economic behaviour. This situation does indeed occur when the objective function is concave.


Lecture 3: Concavity, convexity, quasi-concavity and economic applications

• Recall:

Definition: A set U is a convex set if for all x ∈ U and y ∈ U and for all t ∈ [0, 1]:

tx + (1 − t)y ∈ U

• Concave and convex functions need to have convex sets as their domain.

• Recall: A real-valued function f defined on a convex subset U of R^n is concave if, for all x, y in U and for all t ∈ [0, 1]:

f(tx + (1 − t)y) ≥ t f(x) + (1 − t) f(y)

Why are concave functions so useful in economics?

• Let f1, ..., fk be concave functions, each defined on the same convex subset U of R^n, and let a1, a2, ..., ak be positive numbers. Then a1 f1 + a2 f2 + ... + ak fk is a concave function on U.

(Proof: in class.)

Consider the problem of maximizing profit for a firm whose production function is y = g(x), where y denotes output and x denotes the input bundle. If p denotes the price of output and wi is the cost per unit of input i, then the firm’s profit function is

Π(x) = p g(x) − (w1 x1 + w2 x2 + ... + wn xn)

The profit function is concave if the production function is concave: −(w1 x1 + w2 x2 + ... + wn xn) is linear, hence concave, g is concave, and the result above applies.


The first order conditions:

p ∂g/∂xi = wi for i = 1, 2, ..., n

are both necessary and sufficient for an interior profit maximizer.

Quasiconcave and quasiconvex functions

• Definition: a level set of a function f defined on U in R^n is:

X_a^f = { x ∈ U | f(x) = a }

This could be a point, a curve, or a plane.

• Definition: a function f defined on a convex subset U of R^n is quasiconcave if for every real number a,

C_a^+ = { x ∈ U | f(x) ≥ a }

is a convex set.

Thus, the level sets of the function bound convex subsets from below.

• Definition: a function f is quasiconvex if for every real number a,

C_a^− = { x ∈ U | f(x) ≤ a }

is a convex set.

Thus, the level sets of the function bound convex subsets from above.

• Every concave function is quasiconcave and every convex function is quasiconvex.


Proof: Let x and y be two points in C_a^+, so that f(x) ≥ a and f(y) ≥ a. Then

f(tx + (1 − t)y) ≥ t f(x) + (1 − t) f(y) ≥ ta + (1 − t)a = a

So tx + (1 − t)y is in C_a^+ and hence this set is convex. We have shown that if f is concave, it is also quasiconcave. Try to show that every convex function is quasiconvex.

• This is the second advantage of concave functions in economics. Concave functions are quasiconcave, and quasiconcavity is a desirable property of economic objective functions such as preferences (why?).

• The property that the set above any level set of a concave function is a convex set is a natural requirement for utility and production functions. For example, consider an indifference curve C of the concave utility function U. Take two bundles on this indifference curve. The set of bundles which are preferred to them is a convex set; in particular, the bundles that mix their contents are in this preferred set. Then, given any two bundles, a consumer with a concave utility function will always prefer a mixture of the bundles to either of them.

• A more important advantage of this shape of indifference curve is that it displays a diminishing marginal rate of substitution. As one moves left to right along the indifference curve C, increasing consumption of good 1, the consumer is willing to give up more and more units of good 1 to gain an additional unit of good 2. This is a property of concave utility functions because each level set forms the boundary of a convex region.

• Any (positive) monotonic transformation of a concave function is quasiconcave.

• Let y = f(x) be an increasing function on R^1. It is easy to see graphically that the function is both quasiconcave and quasiconvex. The same applies to a decreasing function.

• A single peaked function is quasiconcave.

• Consider the following utility function Q(x, y) = min{x, y}.

• The region above and to the right of any of this function’s level sets is a convex set, and hence Q is quasiconcave.

• Let f be a function defined on a convex set U in R^n. Then the following statements are equivalent:

(i) f is a quasiconcave function on U.

(ii) For all x, y ∈ U and t ∈ [0, 1],

f(x) ≥ f(y) implies f(tx + (1 − t)y) ≥ f(y)

(iii) For all x, y ∈ U and t ∈ [0, 1],

f(tx + (1 − t)y) ≥ min{f(x), f(y)}

You will prove this in class.
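Condition (iii) can be probed numerically for Q(x, y) = min{x, y} (a rough random-sampling sanity check of my own, not a proof):

```python
import random

def Q(x, y):
    return min(x, y)

random.seed(0)
violated = False
for _ in range(1000):
    # Draw two points p, q and a weight t, and test condition (iii):
    # Q(t*p + (1-t)*q) >= min(Q(p), Q(q)).
    p = (random.uniform(-5, 5), random.uniform(-5, 5))
    q = (random.uniform(-5, 5), random.uniform(-5, 5))
    t = random.random()
    mix = (t * p[0] + (1 - t) * q[0], t * p[1] + (1 - t) * q[1])
    if Q(*mix) < min(Q(*p), Q(*q)) - 1e-12:
        violated = True
print(violated)  # False: consistent with Q being quasiconcave
```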


Lecture 4: Constrained Optimization I: The Lagrangian

• We now analyze optimal allocation in the presence of scarce resources; after all, this is what economics is all about.

• Consider the following problem:

max_{x1, x2, ..., xn} f(x1, x2, ..., xn)

where (x1, x2, ..., xn) ∈ R^n must satisfy:

g1(x1, x2, ..., xn) ≤ b1, ..., gk(x1, x2, ..., xn) ≤ bk

and

h1(x1, x2, ..., xn) = c1, ..., hm(x1, x2, ..., xn) = cm.

• The function f is called the objective function, while the g and h functions are the constraint functions: inequality constraints (g) and equality constraints (h).

• An example: utility maximization:

max_{x1, x2, ..., xn} U(x1, x2, ..., xn)

subject to

p1 x1 + p2 x2 + ... + pn xn ≤ I

x1 ≥ 0, x2 ≥ 0, ..., xn ≥ 0

In this case we can treat the latter constraints as −xi ≤ 0.


Equality constraints:

• The simple case of two variables and one equality constraint:

max_{x1, x2} f(x1, x2)

subject to

p1 x1 + p2 x2 = I

• Geometrical representation: draw the constraint on the (x1, x2) plane, and draw a representative sample of level curves of the objective function f. The goal is to find the highest-valued level curve of f which meets the constraint set. It cannot cross the constraint set; it therefore must be tangent to it.

We need to find the slope of the level set of f:

f(x1, x2) = a

Using total differentiation:

(∂f(x1, x2)/∂x1) dx1 + (∂f(x1, x2)/∂x2) dx2 = 0

Then:

dx2/dx1 = − (∂f(x1, x2)/∂x1) / (∂f(x1, x2)/∂x2)

So, the slope of the level set of f at x* is

− (∂f/∂x1)(x*) / (∂f/∂x2)(x*)


The slope of the constraint h(x1, x2) = c at x* is

− (∂h/∂x1)(x*) / (∂h/∂x2)(x*)

and hence x* satisfies:

(∂f/∂x1)(x*) / (∂f/∂x2)(x*) = (∂h/∂x1)(x*) / (∂h/∂x2)(x*)

or:

(∂f/∂x1)(x*) / (∂h/∂x1)(x*) = (∂f/∂x2)(x*) / (∂h/∂x2)(x*)

Let us denote this common value by µ:

(∂f/∂x1)(x*) / (∂h/∂x1)(x*) = (∂f/∂x2)(x*) / (∂h/∂x2)(x*) = µ

and then we can rewrite these two equations as:

(∂f/∂x1)(x*) − µ (∂h/∂x1)(x*) = 0

(∂f/∂x2)(x*) − µ (∂h/∂x2)(x*) = 0

We therefore have three equations in three unknowns:

(∂f/∂x1)(x*) − µ (∂h/∂x1)(x*) = 0

(∂f/∂x2)(x*) − µ (∂h/∂x2)(x*) = 0

h(x1*, x2*) = c

We can then form the Lagrangian function:

L(x1, x2, µ) = f(x1, x2) − µ (h(x1, x2) − c)


and then find the critical points of L by setting:

∂L/∂x1 = 0

∂L/∂x2 = 0

∂L/∂µ = 0

and this gives us the same equations as above.

• The variable µ is called the Lagrange multiplier.

• We have reduced a constrained problem in two variables to an unconstrained problem in three variables.

A caveat: it cannot be that (∂h/∂x1)(x*) = (∂h/∂x2)(x*) = 0. Thus, the constraint qualification is that x* is not a critical point of h.

• Formally, let f and h be continuously differentiable functions of two variables. Suppose that x* = (x1*, x2*) is a solution to max f(x1, x2) subject to h(x1, x2) = c, and that x* is not a critical point of h. Then there is a real number µ* such that (x1*, x2*, µ*) is a critical point of the Lagrange function

L(x1, x2, µ) = f(x1, x2) − µ (h(x1, x2) − c).

An example:

max_{x1, x2} x1 x2

subject to

x1 + 4x2 = 16

The constraint qualification is satisfied. The Lagrangian is

L(x1, x2, µ) = x1 x2 − µ (x1 + 4x2 − 16)

and the first order conditions are:

x2 − µ = 0

x1 − 4µ = 0

x1 + 4x2 − 16 = 0

and the only solution is x1 = 8, x2 = 2, µ = 2.

A similar analysis easily extends to the case of several equality constraints.
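The three first order conditions of this example form a linear system in (x1, x2, µ), which can be solved mechanically as a check (my own verification, not part of the notes):

```python
import numpy as np

# First order conditions of L = x1*x2 - mu*(x1 + 4*x2 - 16):
#   x2 -   mu = 0
#   x1 - 4*mu = 0
#   x1 + 4*x2 = 16
M = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, -4.0],
              [1.0, 4.0,  0.0]])
rhs = np.array([0.0, 0.0, 16.0])

x1, x2, mu = np.linalg.solve(M, rhs)
print(x1, x2, mu)  # approximately 8.0 2.0 2.0
```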


Inequality constraints:

With equality constraints, we had the following equations:

(∂f/∂x1)(x*) − µ (∂h/∂x1)(x*) = 0

(∂f/∂x2)(x*) − µ (∂h/∂x2)(x*) = 0

Or:

( (∂f/∂x1)(x*), (∂f/∂x2)(x*) ) = µ ( (∂h/∂x1)(x*), (∂h/∂x2)(x*) )

Or:

∇f(x*) = µ ∇h(x*).

And we had no restrictions on µ.

• The simple case of two variables and one inequality constraint:

max_{x1, x2} f(x1, x2)

subject to

g(x1, x2) ≤ b

Graphical representation: in the graph, the solution is where a level curve of f meets the boundary of the constraint set. This means that the constraint is binding; there is a tangency at the solution.

• So when the constraint is binding, is it the same as an equality constraint?

• But now when we look graphically at the constrained optimization problem, even when the constraint is binding, we have a restriction on the Lagrange multiplier. The gradients are again in line, so that one is a multiple of the other:

∇f(x*) = λ ∇g(x*).

But now the sign of λ is important: the gradients must point in the same direction, because otherwise we could increase f and still satisfy the constraint. This means that λ ≥ 0. This is the main difference between inequality and equality constraints. We still form the Lagrangian:

L(x1, x2, λ) = f(x1, x2) − λ (g(x1, x2) − b)

and then find the critical point of L, by setting:

∂L

∂x1=

∂f

∂x1− λ ∂g

∂x1= 0

∂L

∂x2=

∂f

∂x2− λ ∂g

∂x2= 0

But what about ∂L∂λ

?

Suppose that the optimal solution satisfies g(x1, x2) < b. At this point, the constraint is not binding, as the optimal solution is in the interior. The point x∗ of the

optimal solution is a local maximum (it is an unconstrained maximum). Thus:

∂f/∂x1(x∗) = ∂f/∂x2(x∗) = 0

We can still use the Lagrangian, provided that we set λ = 0!

In other words, either the constraint is binding, so that g(x1, x2) − b = 0, or it is not binding, and then λ = 0. In short, the following complementary slackness

condition has to be satisfied:

λ(g(x1, x2)− b) = 0.


Lecture 5: Constrained Optimization II: Inequality Constraints

We describe formally the constrained optimization problem with inequality con-

straints:

Let f and g be continuously differentiable functions of two variables. Suppose that x∗ = (x∗1, x∗2)

is a solution to max f(x1, x2) subject to g(x1, x2) ≤ b and that x∗ is not a critical

point of g if g(x∗1, x∗2) = b. Then given the Lagrange function

L(x1, x2, λ) = f(x1, x2)− λ(g(x1, x2)− b),

there is a real number λ∗ such that:

∂L(x∗, λ∗)/∂x1 = 0

∂L(x∗, λ∗)/∂x2 = 0

λ∗(g(x∗1, x∗2)− b) = 0

λ∗ ≥ 0

g(x∗1, x∗2) ≤ b

An example:

ABC is a perfectly competitive, profit-maximizing firm, producing output y from input x according to y = x^0.5. The price of output is 2 and the price of input is 1. Negative levels of x

are impossible. Also, the firm cannot buy more than a > 0 units of input. The firm’s

maximization problem is therefore

max f(x) = 2x^0.5 − x

subject to g(x) = x ≤ a (and x ≥ 0, which we will ignore for now). The Lagrangian is:

L(x, λ) = 2x^0.5 − x − λ[x − a]


The first order condition is:

x^−0.5 − 1 − λ = 0

Let us write all the information that we have:

x^−0.5 − 1 − λ = 0

λ(x− a) = 0

λ ≥ 0

x ≤ a

And solve the system of equations.

It is easiest to divide it into two cases: when λ > 0 and when λ = 0.

Suppose that λ > 0. This means that the constraint is binding. Then we know

that x = a. The full solution is therefore:

x = a, λ = 1/√a − 1

When is this solution viable? We need to keep consistency: if we assume that λ > 0, then we need to ensure it: 1/√a − 1 > 0 ⇔ a < 1

What if λ = 0? This means that the constraint is not binding. From the first order

condition:

x^−0.5 − 1 = 0 ⇔ x = 1

The full solution is therefore:

x = 1, λ = 0

and this solution holds for all a ≥ 1.
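The two-regime case analysis above can be checked numerically. The following is a minimal sketch (the function names solve_firm and profit are ours, not part of the notes) that reproduces both regimes and verifies the Kuhn-Tucker conditions by brute force:

```python
import math

def solve_firm(a):
    """Solve max 2*x**0.5 - x subject to x <= a via the two KKT cases.

    Case 1 (constraint binds): x = a, lambda = 1/sqrt(a) - 1, valid for a < 1.
    Case 2 (constraint slack): x = 1, lambda = 0, valid for a >= 1.
    """
    if a < 1:
        return a, 1 / math.sqrt(a) - 1   # binding: lambda > 0
    return 1.0, 0.0                      # slack: interior optimum

def profit(x):
    return 2 * math.sqrt(x) - x

# Sanity check against a grid of feasible points in each regime.
for a in (0.25, 4.0):
    x_star, lam = solve_firm(a)
    assert lam >= 0 and x_star <= a + 1e-12          # KKT: sign and feasibility
    assert abs(lam * (x_star - a)) < 1e-12           # complementary slackness
    grid = [a * k / 1000 for k in range(1001)]       # feasible x in [0, a]
    assert all(profit(x_star) >= profit(x) - 1e-9 for x in grid)
```

For a = 0.25 the constraint binds with λ = 1; for a = 4 the unconstrained optimum x = 1 is feasible and λ = 0.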


Several Inequality constraints:

The generalization is straightforward; however, now some constraints may be binding and some may not be.

An example:

We have to maximize f(x, y, z) = (xyz) subject to the constraints that x+y+z ≤ 1

and that x ≥ 0, y ≥ 0 and z ≥ 0. The Lagrangian is

xyz − λ1(x+ y + z − 1) + λ2x+ λ3y + λ4z

Solving the Lagrange problem will give us a set of critical points. The optimal

solution will be a subset of this. But we can already restrict this set of critical points

because it is obvious that λ2 = λ3 = λ4 = 0. If one of these were positive, for example λ2 > 0, then it would mean, by complementary slackness, that x = 0. But then the

value of xyz is 0, and obviously we can do better than that (for example, when

x = y = z = .1).

Thus, the non-negativity conditions cannot bind. This leaves us with a problem

with one constraint, and we have to decide whether λ1 > 0 or λ1 = 0. But obviously,

the constraint must bind. If x+y+z < 1 we can increase one of the variables, satisfy

the constraint, and increase the value of the function. From the first order conditions:

xy − λ1 = 0

zy − λ1 = 0

xz − λ1 = 0

we then find that xy = yz = zx and hence it follows that x = y = z = 1/3 at the optimal solution.
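As a quick plausibility check (a brute-force sketch of our own, not part of the notes), a coarse grid search over the simplex confirms that xyz is largest at x = y = z = 1/3. Since we argued the constraint must bind, the sketch sets z = 1 − x − y:

```python
def best_on_simplex(n=60):
    """Grid-search max of x*y*z over x + y + z <= 1, x, y, z >= 0."""
    best, arg = -1.0, None
    for i in range(n + 1):
        for j in range(n + 1 - i):
            x, y = i / n, j / n
            z = 1 - x - y          # the constraint binds at the optimum
            if x * y * z > best:
                best, arg = x * y * z, (x, y, z)
    return best, arg

val, (x, y, z) = best_on_simplex()
# val is close to (1/3)**3 = 1/27, attained near (1/3, 1/3, 1/3)
```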


We have looked at: max f(x, y) subject to g(x, y) ≤ b.

We have characterized necessary conditions for a maximum: if x∗ is a solution to a constrained optimization problem (it maximizes f subject to some constraints), it is also a critical point of the Lagrangian. We therefore find the critical points of the Lagrangian.

• Can we then say that these are the solutions for the constrained optimization

problem? In other words:

• Can we say that these are maximizers of the Lagrangian, and if these are max-

imizers of the Lagrangian, are these also maximizers of f (subject to the con-

straint)?

To determine the answer, let (x′, y′, λ) satisfy all the necessary conditions for a maximum. We claim that if (x′, y′) is a maximizer of the Lagrangian, it also maximizes f subject to the constraint.

To see this, note that λ[g(x′, y′) − b] = 0. Thus, f(x′, y′) = f(x′, y′) − λ(g(x′, y′) − b). Since λ ≥ 0 and g(x, y) ≤ b for all feasible (x, y), we also have f(x, y) − λ(g(x, y) − b) ≥ f(x, y) for all such (x, y).

Since (x′, y′) maximizes the Lagrangian, for all other (x, y):

f(x′, y′)− λ(g(x′, y′)− b) ≥ f(x, y)− λ(g(x, y)− b)

which implies that

f(x′, y′) ≥ f(x, y)

So that if x′, y′ maximizes the Lagrangian, it also maximizes f(x, y) subject to

g(x, y) ≤ b.
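This chain of inequalities can be spot-checked on the earlier firm example (our own sketch, with the relevant values hard-coded): with a = 1/4 and λ = 1, the point x′ = 1/4 maximizes the Lagrangian over x ≥ 0, and consequently maximizes f over the feasible set x ≤ a:

```python
import math

a, lam, x_prime = 0.25, 1.0, 0.25

def f(x):
    return 2 * math.sqrt(x) - x

def L(x):
    return f(x) - lam * (x - a)   # Lagrangian at the fixed multiplier

xs = [k / 1000 for k in range(2001)]                     # grid on [0, 2]
assert all(L(x_prime) >= L(x) - 1e-9 for x in xs)        # x' maximizes the Lagrangian
feasible = [x for x in xs if x <= a]
assert all(f(x_prime) >= f(x) - 1e-9 for x in feasible)  # hence x' maximizes f
```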

• Recall the main results from unconstrained optimization:


• If f is a concave function defined on a convex subset X of Rn and x0 is an interior point at which Df(x0) = 0, then x0 maximizes f on X, that is, f(x) ≤ f(x0) for all x ∈ X.

• You have shown in class that in the constrained optimization problem, if f is

concave and g is convex, then the Lagrangian function is also concave. This

means that we can use first order conditions.


The Kuhn-Tucker Theorem:

Consider the problem of maximizing f(x) subject to the constraint that g(x) ≤ b.

Assume that f and g are differentiable, f is concave, g is convex, and that the

constraint qualification holds. Then x∗ solves this problem if and only if there is a

scalar λ such that

∂L(x∗, λ)/∂xi = ∂f/∂xi(x∗) − λ ∂g/∂xi(x∗) = 0 for all i

λ ≥ 0

g(x∗) ≤ b

λ[b− g(x∗)] = 0

Mechanically (that is, without thinking...), one can solve constrained optimization

problems in the following way:

• Form the Lagrangian L(x, λ) = f(x)− λ(g(x)− b).

• Suppose that there exists λ∗ such that the first order conditions are satisfied,

that is:

∂L(x∗, λ∗)/∂xi = 0 for all i

λ∗j ≥ 0 and λ∗j(gj(x∗) − bj) = 0 for every constraint j

• Assume that g1 to ge are binding and that ge+1 to gm are not binding. Write

(g1, .., ge) as gE. Assume also that the Hessian of L with respect to x at x∗, λ∗

is negative definite on the linear constraint set {v : DgE(x∗)v = 0}, that is:

v ≠ 0, DgE(x∗)v = 0 ⇒ vT(D2xL(x∗, λ∗))v < 0,


• Then x∗ is a strict local constrained max of f on the constraint set.

• To check this condition, we form the bordered Hessian:

Q = ( 0            DgE(x∗)
      DgE(x∗)T     D2xL(x∗, λ∗) )

If the last n − e leading principal minors of Q alternate in sign, with the sign of the determinant of the largest matrix the same as the sign of (−1)^n, then

sufficient second order conditions hold for a candidate point to be a solution of

a constrained maximization problem.
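To illustrate the second-order check, consider the hypothetical problem (our own example, not from the notes) max xy subject to x + y ≤ 2: at the candidate x∗ = y∗ = 1, λ∗ = 1, the border is Dg = (1, 1) and D2xL = [[0, 1], [1, 0]]. With n = 2 variables and e = 1 binding constraint, we check the last n − e = 1 leading principal minor, i.e. det Q itself, whose sign must equal that of (−1)^n = +1:

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Bordered Hessian for max xy s.t. x + y <= 2 at x = y = 1, lambda = 1:
Q = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

n, e = 2, 1
# det Q = 2 > 0, matching the sign of (-1)**n: sufficient conditions hold,
# so (1, 1) is a strict local constrained maximum.
assert det3(Q) > 0
```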


Lecture 6: Constrained Optimization III: Maximum value functions

Profit functions and indirect utility functions are examples of maximum value func-

tions, whereas cost functions and expenditure functions are minimum value functions.

• Maximum value function, a definition:

If x(b) solves the problem of maximizing f(x) subject to g(x) ≤ b, the maximum

value function is v(b) = f(x(b)).

• The maximum value function is non-decreasing.

Maximum value functions and the interpretation of the Lagrange multiplier

• Consider the problem of maximizing f(x1, x2, ..., xn) subject to the k inequality

constraints

g1(x1, x2, ..., xn) ≤ b∗1, ..., gk(x1, x2, ..., xn) ≤ b∗k

where b∗ = (b∗1, ..., b∗k). Let x∗1(b∗), ..., x∗n(b∗) denote the optimal solution and let λ1(b∗), ..., λk(b∗) be the corresponding Lagrange multipliers. Suppose that as b varies near b∗, x∗1(b), ..., x∗n(b) and λ1(b), ..., λk(b) are differentiable functions and that x∗(b∗) satisfies the constraint qualification. Then for each j = 1, 2, ..., k:

λj(b∗) = ∂/∂bj f(x∗(b∗))

• Proof: For simplicity, we do here the case of a single equality constraint, and

with f and h being functions of two variables. The Lagrangian is

L(x, y, λ; b) = f(x, y)− λ(h(x, y)− b)


The solution satisfies:

0 = ∂L/∂x(x∗(b), y∗(b), λ∗(b); b) = ∂f/∂x(x∗(b), y∗(b)) − λ∗(b) ∂h/∂x(x∗(b), y∗(b)),

0 = ∂L/∂y(x∗(b), y∗(b), λ∗(b); b) = ∂f/∂y(x∗(b), y∗(b)) − λ∗(b) ∂h/∂y(x∗(b), y∗(b)),

for all b. Furthermore, since h(x∗(b), y∗(b)) = b for all b,

∂h/∂x(x∗, y∗) ∂x∗(b)/∂b + ∂h/∂y(x∗, y∗) ∂y∗(b)/∂b = 1

for every b. Therefore, using the chain rule, we have:

df(x∗(b), y∗(b))/db = ∂f/∂x(x∗, y∗) ∂x∗(b)/∂b + ∂f/∂y(x∗, y∗) ∂y∗(b)/∂b

 = λ∗(b) [ ∂h/∂x(x∗, y∗) ∂x∗(b)/∂b + ∂h/∂y(x∗, y∗) ∂y∗(b)/∂b ]

 = λ∗(b).

• The economic interpretation of the multiplier as a ‘shadow price’: For example,

in the application for a firm maximizing profits, it tells us how valuable another

unit of input would be to the firm’s profits, or how much the maximum value

changes for the firm when the constraint is relaxed. In other words, it is the

maximum amount the firm would be willing to pay to acquire another unit of

input.
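The shadow-price interpretation can be verified numerically on the earlier firm example (the helper names v and lam below are ours): in the binding regime a < 1 the maximum value is v(a) = 2√a − a and the multiplier is λ(a) = 1/√a − 1, so a finite-difference derivative of v should match λ:

```python
import math

def v(a):
    """Maximum value of 2*sqrt(x) - x s.t. x <= a, in the binding case a < 1."""
    return 2 * math.sqrt(a) - a

def lam(a):
    """Lagrange multiplier in the binding case a < 1."""
    return 1 / math.sqrt(a) - 1

a, h = 0.25, 1e-6
dv_db = (v(a + h) - v(a - h)) / (2 * h)   # central-difference derivative of v
assert abs(dv_db - lam(a)) < 1e-6         # dv/db equals the multiplier
```

This matches the calculus directly: v′(a) = 1/√a − 1 = λ(a).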

• Recall that

L(x, y, λ) = f(x, y) − λ(g(x, y) − b),

so that

d/db f(x(b), y(b)) = λ(b) = ∂/∂b L(x(b), y(b), λ(b); b)


Hence, what we have found above is simply a particular case of the envelope theorem, which says that

d/db f(x(b), y(b)) = ∂/∂b L(x(b), y(b), λ(b); b)

Maximum value functions and Envelope theorem:

• Consider the problem of maximizing f(x1, x2, ..., xn) subject to the k equality

constraints

h1(x1, x2, ..., xn, c) = 0, ..., hk(x1, x2, ..., xn, c) = 0

Let x∗1(c), ..., x∗n(c) denote the optimal solution and let µ1(c), ..., µk(c) be the

corresponding Lagrange multipliers. Suppose that x∗1(c), ..., x∗n(c) and µ1(c), ..., µk(c) are differentiable functions and that x∗(c) satisfies the constraint qualification. Then:

d/dc f(x∗(c); c) = ∂/∂c L(x∗(c), µ(c); c)

• Note: if hi(x1, x2, ..., xn, c) = 0 can be expressed as some h′i(x1, x2, ..., xn) − c = 0, then we are back in the previous case, in which we found that

d/dc f(x∗(c); c) = ∂/∂c L(x∗(c), µ(c); c) = µi(c)

But the statement is more general.

• We will prove this for the simple case of an unconstrained problem. Let φ(x; a) be a continuous function of x ∈ Rn and the scalar a. For any a, consider the maximization problem max_x φ(x; a). Let x∗(a) be the solution of this problem, assumed to be a continuous and differentiable function of a. We will show that

d/da φ(x∗(a); a) = ∂/∂a φ(x∗(a); a)


We compute via the chain rule that

d/da φ(x∗(a); a) = Σi ∂φ/∂xi(x∗(a); a) ∂x∗i/∂a(a) + ∂φ/∂a(x∗(a); a) = ∂φ/∂a(x∗(a); a)

since ∂φ/∂xi(x∗(a); a) = 0 for all i by the first order conditions.

• Intuitively, when we are already at a maximum, slightly changing the parameters of the problem or the constraints does not affect the value through changes in the solution x∗(a), because ∂φ/∂xi(x∗(a); a) = 0.

• When we use the envelope theorem, though, we have to make sure that the solution does not jump to another solution in a discrete manner.
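A one-line numerical illustration of the unconstrained case (a toy objective of our own, not from the notes): take φ(x; a) = −x² + 2ax, so x∗(a) = a and the maximum value is φ(x∗(a); a) = a². The total derivative d/da a² = 2a should equal the partial ∂φ/∂a = 2x evaluated at x∗(a):

```python
def phi(x, a):
    """Toy objective phi(x; a) = -x**2 + 2*a*x, maximized at x*(a) = a."""
    return -x * x + 2 * a * x

def x_star(a):
    return a  # first order condition: -2x + 2a = 0

a, h = 1.5, 1e-6
value = lambda b: phi(x_star(b), b)                 # maximum value function, = b**2
total = (value(a + h) - value(a - h)) / (2 * h)     # d/da phi(x*(a); a)
partial = 2 * x_star(a)                             # d phi/d a holding x = x*(a) fixed
assert abs(total - partial) < 1e-6                  # envelope theorem
```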


Comparative Statics

More generally in economic theory, once we pin down an equilibrium or a solution

to an optimization problem, we are interested in how the exogenous variables change

the value of the endogenous variables.

We have been using the Implicit Function Theorem (IFT) throughout without

stating and explaining why we can use it. The IFT allows us to be assured that a set

of simultaneous equations:

F1(y1, ..., yn; x1, ..., xm) = 0
F2(y1, ..., yn; x1, ..., xm) = 0
...
Fn(y1, ..., yn; x1, ..., xm) = 0

will define a set of implicit functions:

y1 = f1(x1, ..., xm)
y2 = f2(x1, ..., xm)
...
yn = fn(x1, ..., xm)

In other words, what the conditions of the IFT serve to do is to assure that the n

equations can in principle be solved for the n variables, y1, ..., yn, even if we may not

be able to obtain the solution in an explicit form.


• Given the set of simultaneous equations above, if the functions F 1, .., F n all

have continuous partial derivatives with respect to all x and y variables, and if

at a point (y′,x′) that solves the set of simultaneous equations the determinant

of the (n× n) Jacobian w.r.t. the y-variables is not 0:

        | ∂F1/∂y1  ∂F1/∂y2  ...  ∂F1/∂yn |
 |J| =  | ∂F2/∂y1  ∂F2/∂y2  ...  ∂F2/∂yn |  ≠ 0
        |   ...      ...     ...    ...   |
        | ∂Fn/∂y1  ∂Fn/∂y2  ...  ∂Fn/∂yn |

then there exists an m-dimensional neighbourhood of x′ in which the variables y1, ..., yn are functions of x1, ..., xm according to the f i functions defined above. These functions are satisfied at x′ and y′ (that is, y′ = f(x′)), and they also satisfy the set of simultaneous equations for every vector x in the neighbourhood, thereby giving the set of simultaneous equations above the status of a set of identities in this

neighbourhood. Moreover, the implicit functions f i are continuous and have

continuous partial derivatives with respect to all the x variables.

• It is then possible to find the partial derivatives of the implicit functions without

having to solve them for the y variables. Taking advantage of the fact that in

the neighbourhood of the solution the set of equations has the status of identities, we can take the total differential of each equation and write dF j = 0. When considering only dx1 ≠ 0 and setting the rest dxi = 0, the result, in matrix

notation, is (we will go through an example later in class):

| ∂F1/∂y1  ∂F1/∂y2  ...  ∂F1/∂yn | | ∂y1/∂x1 |     | ∂F1/∂x1 |
| ∂F2/∂y1  ∂F2/∂y2  ...  ∂F2/∂yn | | ∂y2/∂x1 |  = −| ∂F2/∂x1 |
|   ...      ...     ...    ...   | |   ...   |     |   ...   |
| ∂Fn/∂y1  ∂Fn/∂y2  ...  ∂Fn/∂yn | | ∂yn/∂x1 |     | ∂Fn/∂x1 |


• Finally, since |J| is nonzero, there is a unique solution to this linear system, which by Cramer's rule can be identified in the following way:

∂yj/∂x1 = |Jj| / |J|

where Jj is the matrix J with its j-th column replaced by the right-hand-side vector.

This is for general problems. Optimization problems have a useful feature: the condition |J| ≠ 0 does indeed hold. (What is J? It is simply the matrix of second partial derivatives of L, or what we call the bordered Hessian.) We will see that

later on.

This means that indeed we can take the maximum value function, or set of

equilibrium conditions, totally differentiate them and find how the endogenous

variables change with the exogenous ones in the neighbourhood of the solution.

For example, for the case of optimization with one equality constraint:

F 1(λ, x, y; b) = 0

F 2(λ, x, y; b) = 0

F 3(λ, x, y; b) = 0

is given by

b− g(x, y) = 0

fx − λgx = 0

fy − λgy = 0

We need to ensure that the Jacobian determinant is not zero, and then we can use total differentiation.


Coming back to the condition about the Jacobian, we need to ensure that:

        | ∂F1/∂λ  ∂F1/∂x  ∂F1/∂y |
 |J| =  | ∂F2/∂λ  ∂F2/∂x  ∂F2/∂y |  ≠ 0
        | ∂F3/∂λ  ∂F3/∂x  ∂F3/∂y |

or:

 |  0    −gx           −gy         |
 | −gx   fxx − λgxx    fxy − λgxy  |  ≠ 0
 | −gy   fxy − λgxy    fyy − λgyy  |

but the determinant of J is that of the bordered Hessian H̄. Whenever sufficient

second order conditions are satisfied, we know that the determinant of the bordered

Hessian is not zero (in fact it is positive).

Now we can totally differentiate the equations:

gx dx + gy dy − db = 0

(fxx − λgxx) dx + (fxy − λgxy) dy − gx dλ = 0

(fyx − λgyx) dx + (fyy − λgyy) dy − gy dλ = 0

where at the equilibrium solution, one can then solve for ∂x/∂b, ∂y/∂b, ∂λ/∂b.
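As a concrete instance (a hypothetical example of our own, not from the notes), take f(x, y) = ln x + ln y and g(x, y) = x + y with b = 2, so that x∗ = y∗ = 1 and λ∗ = 1. Solving the linearized system by Cramer's rule yields ∂x/∂b = ∂y/∂b = 1/2 and ∂λ/∂b = −1/2, matching the closed form x = y = b/2, λ = 2/b. A sketch:

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(J, rhs):
    """Solve the 3x3 linear system J v = rhs via Cramer's rule."""
    D = det3(J)
    sols = []
    for col in range(3):
        Jc = [row[:] for row in J]       # copy J ...
        for r in range(3):
            Jc[r][col] = rhs[r]          # ... and swap in the rhs column
        sols.append(det3(Jc) / D)
    return sols

# F1 = b - x - y, F2 = 1/x - lam, F3 = 1/y - lam; at b = 2: x = y = 1, lam = 1.
# Jacobian w.r.t. (lam, x, y), and rhs = -dF/db:
J = [[ 0.0, -1.0, -1.0],
     [-1.0, -1.0,  0.0],
     [-1.0,  0.0, -1.0]]
rhs = [-1.0, 0.0, 0.0]
dlam_db, dx_db, dy_db = cramer3(J, rhs)
assert abs(dx_db - 0.5) < 1e-12 and abs(dlam_db + 0.5) < 1e-12
```

Note that J here is exactly the bordered Hessian of the problem, with |J| = 2 > 0 as the sufficient second order conditions require.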
