
September Math Course: Multivariate Calculus

Arina Nikandrova∗

1 Functions

A function y = f(x), where x is either a scalar or a vector of several variables (x1, ..., xn), can be thought of as a “rule” which converts an input (typically denoted by x) into an output (typically denoted by y):

• y is a function of x if you can draw it from left to right without “doubling back,” i.e., only one value of y should correspond to each value of x.

• y is a continuous function of x if you can draw it without removing your pencil from the page.

• y is a differentiable function of x if it is continuous and contains no kinks.

In this part of the course we will focus on functions where the input consists of many variables. Such functions are common in economics.

Example 1. A consumer’s utility is a function of all the goods he consumes. So if there are n goods, then his utility is a function of the quantities (x1, x2, ..., xn) he consumes. We represent this by writing u(x1, x2, ..., xn).

A firm’s production is a function of the quantities of all the inputs it uses. So, if (x1, x2, ..., xn) are the quantities of the inputs used by the firm and y is the level of output produced, then we have y = f(x1, x2, ..., xn), where f(·) is the production function.

∗e-mail: [email protected]


Figure 1: Slope of y = 2x

2 First Order Derivative

2.1 First Order Derivative of Univariate Functions

Consider a function of one variable, f(x). If this function is differentiable at a given point x0, it has both a value (its “height”), y0 = f(x0), and a slope. The slope tells us the rate of change: how much y changes when x changes by a given amount.

Example 2. (Linear Function) The simplest function to consider is a linear function of the form y = ax + b. Start at any point (x0, y0) on the line and move along the line so that the x-coordinate increases by one unit. The corresponding change in the y-coordinate is called the slope of the line. The defining characteristic of a line is that this rate of change is constant:

$$\frac{\Delta y}{\Delta x} = \frac{a(x_0 + 1) + b - (a x_0 + b)}{x_0 + 1 - x_0} = a.$$

For non-linear functions the same change in x leads to different changes in y, depending on the starting point x0.

Example 3. Consider the quadratic function $y = x^2$. If we start at x0 = 1 and increase x by 1, then y changes by 3 (i.e., 4 − 1). If we start at x0 = 2, however, then increasing x by 1 changes y by 5 (i.e., 9 − 4). Thus the same change in x leads to different changes in y.

Consequently, for non-linear functions we cannot define a global notion of the slope. However, it is possible to define a notion of the slope which is valid when the change in x is “small.”

Figure 2: Change in $y = x^2$ when x increases by 1 starting from x0 = 1 and x0 = 2

Example 4. Consider the quadratic function $y = x^2$. The line $y = 4x - 4$ just touches the curve $y = x^2$ at the point (x, y) = (2, 4). This follows as $4 = y = x^2 = 2^2$ and $4 = y = 4x - 4 = 4 \times 2 - 4$. Such a line is called a tangent line. The tangent line has the property that it “looks the same as the function around the point at which it just touches the function.” The tangent line shows the rate of change in y at a point for small changes in x. The slope of the tangent line at point x0 is called the derivative at the point x0.

The derivative of a function f(x) at the point x0 is denoted by f′(x0). The total differential of f(x) at x0 represents the principal part of the change in a function y = f(x) with respect to changes in x and is defined by the following:

$$dy = f'(x_0)\,dx.$$

The total differential is a way of understanding the local rate of change of the function f(x) around the point x0. That is, it is an algebraic way of denoting the slope of a function (hence the alternative notations for a derivative: $y'$, $\frac{dy}{dx}$, $\frac{df}{dx}$).

Example 5. (Production Costs) Imagine that y = c(x) represents the costs of production in £ and x the quantity produced by a firm. The derivative of c(·) at a given x0 tells us how costs change in response to a change in quantity, provided the change is small. For example, if we know that the derivative at x0 = 2 is 4, this tells us that if the quantity produced changes by a small amount dx, then the impact on cost is given approximately by the total differential dy = 4dx. Economists have a special name for the derivative of the cost function: it is called marginal cost.

Figure 3: Tangent of $y = x^2$ at the point (x, y) = (2, 4).

Figure 4: The total differential of f(x) at x0 represents the main part of the change in f(x) with respect to any (not necessarily small) changes in x.

Since the rate of change along a curve is changing constantly, the derivative has to be computed separately at each possible value of x. The derivative is thus a local phenomenon: it tells us something about the rate of change in the neighborhood of a point, but it gives no information about the rate of change globally.

Example 6. The information that the derivative of $y = x^2$ (i.e., $dy = 2x\,dx$) at x = 2 is 4 tells us that the rate of change in y is 4 when x is “close” to 2. It does not give any information about the rate of change at x = 10, and so on.

Formally, the derivative can be thought of as a separate function of x, a slope or derivative function given by:

$$f'(x) \equiv \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

Given a function y = f(x), the derivative function simply associates to every x the slope of the tangent line at x. Typically, when we talk about the derivative, we mean the derivative as a function. So when we want to talk about the value of the derivative at a point x0, we shall say so explicitly: “the derivative at x0 is ... .”
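The limit in the definition can be illustrated numerically. The following is a minimal sketch, not part of the original notes: the helper name `difference_quotient` and the chosen step sizes are assumptions made for illustration. It evaluates the difference quotient of $y = x^2$ at x = 2 for shrinking values of h.

```python
def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h, the slope of the secant line."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x ** 2


# Illustrative step sizes; shrinking h moves the secant slope
# toward the tangent slope f'(2) = 4.
steps = (1.0, 0.1, 0.001, 1e-6)
slopes = [difference_quotient(square, 2.0, h) for h in steps]
for h, slope in zip(steps, slopes):
    print(f"h = {h:g}: slope = {slope:.6f}")
```

For this particular function the quotient equals 4 + h exactly, which is why the printed slopes approach 4 as h shrinks.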

2.2 Rules of Differentiation

• Differentiation is linear: For any functions f and g and any real numbers a and b, the derivative of the function h(x) = a f(x) + b g(x) with respect to x is h′(x) = a f′(x) + b g′(x).

• Power function rule: The derivative of the power function $h(x) = x^n$ is

$$h'(x) = n x^{n-1}.$$

Special cases include:

– Constant rule: if f is the constant function f(x) = c, for any number c, then for all x, f′(x) = 0.

– If f(x) = x, then f′(x) = 1.

These special cases imply that the derivative of an affine function is constant, i.e., if f(x) = ax + b, then f′(x) = a. This makes sense as shifting a function doesn’t change its slope and so additive constants disappear.

• The product rule: For any functions f and g the derivative of the function h(x) = f(x) g(x) with respect to x is

$$h'(x) = f'(x)\, g(x) + f(x)\, g'(x).$$

• Quotient rule: The derivative of the function $h(x) = f(x)/g(x)$, where $g(x) \neq 0$, is:

$$h'(x) = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g^2(x)}.$$

• The chain rule: The derivative of a composite function h(x) = f(g(x)) with respect to x is

$$h'(x) = f'(g(x))\, g'(x).$$

• The inverse function rule: If the function f has an inverse function g, meaning that g(f(x)) = x and f(g(y)) = y, then

$$g'(y) = \frac{1}{f'(g(y))}.$$

• The basic rules for differentiating exponential and logarithmic functions:

– The derivative of $f(x) = e^x$ is $f'(x) = e^x$, where e ≈ 2.71828 is Euler’s number.

– The derivative of f(x) = ln x is f′(x) = 1/x, where ln is the natural logarithm, i.e., the logarithm with base e.
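The rules above can be sanity-checked numerically. The sketch below is an illustration only: the test functions $f(x) = x^2$ and $g(x) = e^x$, the evaluation point x = 1.5 and the helper `numeric_derivative` (a central-difference approximation) are all assumptions chosen for the example, not part of the notes.

```python
import math


def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return x ** 2


def df(x):
    return 2 * x          # power rule


g = math.exp              # g(x) = e**x
dg = math.exp             # exponential rule: g'(x) = e**x

x = 1.5
checks = {
    "product": (numeric_derivative(lambda t: f(t) * g(t), x),
                df(x) * g(x) + f(x) * dg(x)),
    "quotient": (numeric_derivative(lambda t: f(t) / g(t), x),
                 (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2),
    "chain": (numeric_derivative(lambda t: f(g(t)), x),
              df(g(x)) * dg(x)),
}
for rule, (numeric, formula) in checks.items():
    print(f"{rule}: numeric = {numeric:.6f}, formula = {formula:.6f}")
```

In each row the finite-difference value and the value given by the corresponding rule should agree to several decimal places.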

Intuition for Chain Rule: Let demand be a function of price, q(p) = a − bp, and let price vary with time, so that $p(t) = t^2$. Then demand is a composite function that also depends on time, q(p(t)). How does demand vary with time?

• What is $\frac{dq}{dp}$? Omit the influence of t and imagine that we can vary p directly; then $\frac{dq}{dp} = -b$.

• What is $\frac{dp}{dt}$? By the power rule, $\frac{dp}{dt} = 2t$.

• Overall, we need to consider the chain reaction as a change in t leads to a change in p, in turn changing q:

$$\frac{d}{dt}\, q(p(t)) = \frac{dq}{dp}\,\frac{dp}{dt} = q'(p)\, p'(t) = -2bt,$$

where $\frac{dp}{dt}$ is the small change in p brought about by a small change in t, and $\frac{dq}{dp}$ is the small change in q brought about by a small change in p.

To verify the validity, note that by substituting p(t) into q(p), we get quantity as a function of time, $q(t) = a - bt^2$, and $q'(t) = -2bt$.
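The same verification can be done numerically. This is a small sketch under stated assumptions: the parameter values a = 10 and b = 2 and the evaluation time t = 3 are arbitrary illustrative choices, and `numeric_derivative` is a hypothetical helper.

```python
def demand(p, a=10.0, b=2.0):
    """Linear demand q(p) = a - b*p; a and b are illustrative values."""
    return a - b * p


def price(t):
    """Price path p(t) = t**2 from the text."""
    return t ** 2


def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


t, b = 3.0, 2.0
chain_rule_value = -b * 2 * t    # (dq/dp) * (dp/dt) = -2bt
numeric_value = numeric_derivative(lambda s: demand(price(s)), t)
print(chain_rule_value, numeric_value)
```

The two printed numbers should coincide up to rounding error, since differentiating the composite function directly and multiplying the two separate rates of change are the same calculation.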

Intuition for Inverse Function Rule: If y = f(x) is a strictly monotonic (or 1:1) function, its inverse, $x = f^{-1}(y)$, is also a function. Formally:

$$f^{-1}(y) = \{x : y = f(x)\}.$$

Thus the inverse is a function if to each value of y there corresponds only one value of x; e.g., a parabola is ruled out (why?).

Example 7. The inverse of y = f(x) = ax + b (with a ≠ 0) is the function $g(y) = \frac{y - b}{a}$. The inverse of $y = f(x) = x^2$, where x > 0, is the function $g(y) = \sqrt{y}$. In each case f and g are inverses of one another: if we take x as the input, apply f to it and then pass this output through the function g, we get back x. Computationally, we just solve the equation y = f(x) for x to obtain $x = f^{-1}(y) \equiv g(y)$.

The derivatives of inverse functions are related to each other. Applying the chain rule to both sides of x = g(f(x)), where $g(\cdot) \equiv f^{-1}(\cdot)$, gives $1 = g'(f(x))\, f'(x)$ and hence, with y = f(x),

$$g'(y) = \frac{1}{f'(x)}.$$

However, for the above display to make sense we need to express x in terms of y on the right-hand side.

Example 8. If $y = f(x) = x^2$, where x > 0, then the derivative of its inverse is

$$g'(y) = \frac{1}{f'(x)} = \frac{1}{2x} = \frac{1}{2\sqrt{y}},$$

where the last equality follows as, by the definition of the inverse, $x = f^{-1}(y) = \sqrt{y}$.
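The result of Example 8 can be compared with a direct numerical derivative of the inverse. The sketch below is illustrative only; the evaluation point y = 9 and the function name `inverse_derivative` are assumptions.

```python
import math


def inverse_derivative(y):
    """g'(y) = 1 / (2*sqrt(y)) from the inverse function rule, for y > 0."""
    return 1.0 / (2.0 * math.sqrt(y))


y, h = 9.0, 1e-6
# Differentiate g(y) = sqrt(y) directly with a central difference.
direct = (math.sqrt(y + h) - math.sqrt(y - h)) / (2 * h)
print(inverse_derivative(y), direct)
```

At y = 9 we have x = 3 and f′(x) = 6, so both numbers should be close to 1/6.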

2.3 First Order Derivative of Multivariate Functions

We have considered functions of a single variable until now. Most economic problems involve more than one variable, so consider a function y = f(x1, x2, ..., xn).

Partial Derivatives

The partial derivative of f with respect to xi is the derivative of f with respect to xi treating all other variables as constants and is denoted by ∂f/∂xi or fi:

$$\frac{\partial}{\partial x_i} f(x_1, x_2, \dots, x_n) \equiv \lim_{h \to 0} \frac{f(x_1, \dots, x_{i-1}, x_i + h, x_{i+1}, \dots, x_n) - f(x_1, x_2, \dots, x_n)}{h}.$$

In order to calculate partial derivatives, we can apply the usual rules of differentiation.

Figure 5: Cobb-Douglas production function $f(K, L) = K^{0.5} L^{0.5}$: (a) 3D graph; (b) cross section when L = 1.

Example 9. Consider a Cobb-Douglas production function $f(K, L) = K^{\alpha} L^{\beta}$, where K > 0 is capital input, L > 0 is labour input and 0 < α, β < 1 are some constants. Then,

$$\frac{\partial f}{\partial K} = \alpha K^{\alpha - 1} L^{\beta} > 0, \qquad \frac{\partial f}{\partial L} = \beta K^{\alpha} L^{\beta - 1} > 0.$$

So, for a given labour input, more capital raises output and, for a given capital input, more labour raises output.
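As a rough numerical illustration of Example 9, the sketch below compares the analytic partial derivatives with finite differences in which one input is varied while the other is held fixed. The parameter values α = β = 0.5 and the point (K, L) = (4, 9) are assumptions chosen so that the numbers are easy to check by hand.

```python
def cobb_douglas(K, L, alpha=0.5, beta=0.5):
    """f(K, L) = K**alpha * L**beta with illustrative exponents."""
    return K ** alpha * L ** beta


def marginal_products(K, L, alpha=0.5, beta=0.5):
    """Analytic partial derivatives (df/dK, df/dL) from Example 9."""
    mpk = alpha * K ** (alpha - 1) * L ** beta
    mpl = beta * K ** alpha * L ** (beta - 1)
    return mpk, mpl


K, L, h = 4.0, 9.0, 1e-6
mpk, mpl = marginal_products(K, L)
# Vary one input at a time while holding the other fixed.
mpk_numeric = (cobb_douglas(K + h, L) - cobb_douglas(K - h, L)) / (2 * h)
mpl_numeric = (cobb_douglas(K, L + h) - cobb_douglas(K, L - h)) / (2 * h)
print(mpk, mpk_numeric)
print(mpl, mpl_numeric)
```

Both marginal products are positive at this point, in line with the signs derived in the example.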

Mathematically, the partial derivative of f with respect to xi tells us the rate of change when only the variable xi is allowed to change. Economically, the partial derivatives give us useful information:

• With a production function, the partial derivative with respect to the input xi tells us the marginal productivity of that factor, or the rate at which additional output can be produced by increasing xi, holding other factors constant.

• With a utility function, the partial derivative with respect to good xi tells us the rate at which the consumer’s well-being increases when she consumes additional amounts of xi, holding constant her consumption of other goods, i.e., the marginal utility of that good.

Total Differentials

Partial derivatives are multivariate extensions of derivatives; total differentials are multivariate extensions of differentials. For functions of more than one independent variable, y = f(x1, x2, ..., xn), the partial differential of y with respect to any one of the variables xi is the principal part of the change in y resulting from a change dxi in that one variable. The partial differential is therefore $\frac{\partial y}{\partial x_i} dx_i$, involving the partial derivative of y with respect to xi. The sum of the partial differentials with respect to all of the independent variables is the total differential

$$dy = \frac{\partial y}{\partial x_1} dx_1 + \cdots + \frac{\partial y}{\partial x_n} dx_n,$$

which is the principal part of the change in y resulting from changes in all independent variables.

To gain some intuition about total differentials,¹ suppose there are two variables and consider the plane y = a0 + a1x1 + a2x2. How does the function behave when we change x1 and x2? Clearly, if dx1 and dx2 are the amounts by which we change x1 and x2, we have dy = a1 dx1 + a2 dx2. Note furthermore that the partials are $\frac{\partial y}{\partial x_1} = a_1$ and $\frac{\partial y}{\partial x_2} = a_2$. We can then write the total change in y as:

$$dy = \frac{\partial y}{\partial x_1} dx_1 + \frac{\partial y}{\partial x_2} dx_2.$$

Rewriting this in matrix notation:

$$dy = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \end{bmatrix}.$$

In the case of the plane, the vector of all partial derivatives is given by $\begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \end{bmatrix}$. This vector tells us the rates of change in the directions x1 and x2.

Now consider a more general two variable function, y = f(x1, x2). With a general function, the idea is to find a plane which looks locally like the function around the point (x1, x2). Since the partial derivatives give the rates of change in x1 and x2, it makes sense to pick the appropriate plane which passes through the point (x1, x2) and has slopes ∂y/∂x1 and ∂y/∂x2 in the two directions. The derivative of the function f(x1, x2) at (x1, x2) is simply the vector $\begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} \end{bmatrix}$, where the partial derivatives are evaluated at the point (x1, x2). We can interpret the derivative as the slopes in the two directions of the plane which looks “like the function” around the point (x1, x2).

Figure 6: Function $f(x, y) = -x^2 - y^2$ and its derivative. (a) The function (blue surface) and the tangent plane at the point (4, 5) (red surface). The tangent plane, given by $z = -8(x - 4) - 10(y - 5) - 41$, looks like $f(x, y) = -x^2 - y^2$ around (4, 5). The derivative of f is given by the slopes in the two directions of the tangent plane. (b) Cross-section when y = 5: the slope of the red line represents the partial derivative of f with respect to x at the point (4, 5).

For a general function of n variables, y = f(x1, x2, ..., xn), the derivative of f at the point (x1, x2, ..., xn) is the vector of partial derivatives $\begin{bmatrix} \frac{\partial y}{\partial x_1} & \dots & \frac{\partial y}{\partial x_n} \end{bmatrix}$. This vector defines a linear map, which is the best linear approximation of the function f near the point (x1, x2, ..., xn). This linear map is thus the generalization of the usual notion of derivative.

¹ Recall that we motivated the notion of a derivative by saying that it was the slope of the line which “looked like the function around the point x0.” When we have n variables, the natural notion of a “line” is given by the following linear function:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n. \tag{1}$$

In general, the function (1) is referred to as the equation of a plane (it certainly is the equation of a plane when there are two variables, x1 and x2).

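Example 10. For the function $f(K, L) = K^{\alpha} L^{\beta}$, the vector of partial derivatives is

$$\begin{bmatrix} \frac{\partial f}{\partial K} & \frac{\partial f}{\partial L} \end{bmatrix} = \begin{bmatrix} \alpha K^{\alpha - 1} L^{\beta} & \beta K^{\alpha} L^{\beta - 1} \end{bmatrix}.$$

Then the total differential of f is:

$$df = \begin{bmatrix} \alpha K^{\alpha - 1} L^{\beta} & \beta K^{\alpha} L^{\beta - 1} \end{bmatrix} \begin{bmatrix} dK \\ dL \end{bmatrix}.$$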
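To see how well the total differential approximates an actual change in output, one can compare the two for small changes in the inputs. The following sketch assumes α = β = 0.5, the point (K, L) = (4, 9) and the changes dK = 0.01, dL = −0.02; these numbers are illustrative and not taken from the notes.

```python
def cobb_douglas(K, L, alpha=0.5, beta=0.5):
    """f(K, L) = K**alpha * L**beta with illustrative exponents."""
    return K ** alpha * L ** beta


def total_differential(K, L, dK, dL, alpha=0.5, beta=0.5):
    """df = (df/dK) dK + (df/dL) dL, evaluated at (K, L)."""
    f_K = alpha * K ** (alpha - 1) * L ** beta
    f_L = beta * K ** alpha * L ** (beta - 1)
    return f_K * dK + f_L * dL


K, L, dK, dL = 4.0, 9.0, 0.01, -0.02
approx_change = total_differential(K, L, dK, dL)
actual_change = cobb_douglas(K + dK, L + dL) - cobb_douglas(K, L)
print(approx_change, actual_change)
```

The two values differ only by terms of second order in (dK, dL), which is what "principal part of the change" means.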


Total Derivatives

While the partial derivative of f with respect to xi treats all other arguments of f as constants, the total derivative of f acknowledges that other arguments of f may also vary with xi due to some postulated relationship. Finding the total derivative relies on the chain rule.

Definition. Consider a function f(x, y, z, t), where x, y, and z depend on t. Then the total derivative of f with respect to t is given by the chain rule:

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} + \frac{\partial f}{\partial t}.$$

In particular, notice that in general

$$\frac{df}{dt} \neq \frac{\partial f}{\partial t},$$

as t has a direct effect on f, given by $\frac{\partial f}{\partial t}$, and an indirect effect through its effect on x, y and z.

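Example 11. Consider the function

$$y = 3x - w^2,$$

where

$$x = 2w^2 + w + 4.$$

Here w has a direct effect on y, given by $\frac{\partial y}{\partial w}$, and an indirect effect through its effect on x. Hence, the total derivative of y with respect to w is

$$\frac{dy}{dw} = \frac{\partial y}{\partial x}\frac{dx}{dw} + \frac{\partial y}{\partial w} = 3(4w + 1) - 2w.$$

Note that unless w = −1/4,

$$\frac{dy}{dw} \neq \frac{\partial y}{\partial w}.$$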
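The distinction between the total and the partial derivative in Example 11 can be checked with a short sketch. The function names below are hypothetical, and the evaluation point w = 1 is an arbitrary choice.

```python
def x_of_w(w):
    return 2 * w ** 2 + w + 4


def y_of(x, w):
    return 3 * x - w ** 2


def total_derivative(w):
    """dy/dw = (dy/dx)(dx/dw) + direct effect = 3*(4w + 1) - 2w."""
    return 3 * (4 * w + 1) - 2 * w


def partial_derivative(w):
    """Direct effect only: x is held fixed, so the derivative is -2w."""
    return -2 * w


w, h = 1.0, 1e-6
# Let x adjust with w, as the total derivative requires.
numeric_total = (y_of(x_of_w(w + h), w + h)
                 - y_of(x_of_w(w - h), w - h)) / (2 * h)
print(total_derivative(w), partial_derivative(w), numeric_total)
```

The numerical derivative that lets x move with w matches the total derivative, not the partial one. At w = −1/4 the indirect effect vanishes and the two formulas coincide.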

A more complicated example.

Example 12. Consider a function

$$y = f(x_1, x_2, w),$$

where

$$x_1 = g(w)$$

and

$$x_2 = h(w).$$

Here w has a direct effect on y, given by $\frac{\partial y}{\partial w}$, and an indirect effect through its effect on x1 and x2. Hence, the total derivative of y with respect to w is

$$\frac{dy}{dw} = \frac{\partial y}{\partial x_1}\frac{dx_1}{dw} + \frac{\partial y}{\partial x_2}\frac{dx_2}{dw} + \frac{\partial y}{\partial w} = f_1(x_1, x_2, w)\, g'(w) + f_2(x_1, x_2, w)\, h'(w) + f_3(x_1, x_2, w).$$

Problem 1. Consider the function

$$z = x^2 y - 10x - \frac{1}{t^3},$$

where $x = e^{1-y}$ and $t = 3y$.

1. Find the partial derivative of z with respect to y.

2. Find the total derivative of z with respect to y, dz/dy.

3 Unconstrained Optimization

3.1 Univariate Case

We will consider the following maximization problem

$$\max_{x} f(x)$$

or minimization problem

$$\min_{x} f(x).$$

First Order Conditions: Necessary Conditions for Local Extrema

If a differentiable function f(x) reaches its maximum or minimum at point x∗, then f′(x∗) = 0. To see this, consider the total differential:

$$dy = f'(x^*)\,dx.$$

If the function reaches a maximum or minimum at x∗ then it must be impossible to increase or decrease the value of the function by small changes in x. However, if f′(x∗) ≠ 0, then it is always possible to make y larger or smaller by making appropriate (small) changes in x. Therefore, we must have f′(x∗) = 0 at a maximum or a minimum.

Any point satisfying the condition f′(x∗) = 0 may be referred to as a stationary point; when a point satisfying f′(x∗) = 0 is a minimum or a maximum, it is referred to as a critical value or extremum.

We need to distinguish between local (or relative) extrema and global extrema. Figure 7a illustrates the difference, which is also explained in Definition 1.

Definition 1. A point x∗ is called a global maximum of the function f(x) if f(x∗) ≥ f(x) for all x in the domain of f. A point x∗ is called a local maximum of the function f(x) if there is a “small interval” centered at x∗ such that f(x∗) ≥ f(x) for all x in this small interval.

A point x∗ is called a global minimum of the function f(x) if f(x∗) ≤ f(x) for all x in the domain of f. A point x∗ is called a local minimum of the function f(x) if there is a “small interval” centered at x∗ such that f(x∗) ≤ f(x) for all x in this small interval.

The condition f′(x∗) = 0 at a maximum or minimum is valid only if x∗ is in the “interior” of the domain of the function. This is because the argument for showing that f′(x∗) = 0 is a necessary condition for x∗ to be a maximum or a minimum relies on the ability to make small changes in x around x∗. However, at a “boundary point” we cannot make certain changes. For instance, if the function is defined for all x in the interval [a, b], then at a we can only increase x, while at b we can only decrease x. Hence, it is possible that the maximum (or minimum) occurs at a or b and yet this boundary point does not satisfy the necessary condition for maximization (or minimization). For example, in Figure 7a the global minimum of a function defined for x ∈ [0, 6] occurs at point x = 0 and the global maximum occurs at point x = 6, neither of which satisfies the first order condition f′(x∗) = 0.

Condition f′(x∗) = 0 is called a necessary condition because it cannot guarantee that x∗ is indeed a maximum or minimum. It is entirely possible that f′(x∗) = 0 but x∗ is neither a maximum nor a minimum.

Example 13. Consider the function $f(x) = (x + 2)^3 + 5$. Note that f′(−2) = 0, but point x = −2 is neither a maximum nor a minimum (see Figure 7b).
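Example 13 can be checked directly. The sketch below is only an illustration; the comparison points −2.1 and −1.9 are arbitrary choices on either side of the stationary point.

```python
def f(x):
    return (x + 2) ** 3 + 5


def f_prime(x):
    return 3 * (x + 2) ** 2


stationary_point = -2.0
# The first order condition holds at x = -2 ...
print(f_prime(stationary_point))
# ... yet f keeps rising through the point, so it is not an extremum.
left, centre, right = f(-2.1), f(stationary_point), f(-1.9)
print(left < centre < right)
```

Because f′(x) = 3(x + 2)² is never negative, the function is increasing on both sides of x = −2, which is why the first order condition alone is not sufficient.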

Second Order Conditions: Sufficient Conditions for Local Extrema

Figure 7: The first order condition f′(x) = 0 is a necessary, but not sufficient, condition for local minima and maxima. (a) Function f(x) defined for x ∈ [0, 6]: each point where f′(x) = 0 corresponds to either a local minimum or a local maximum, but the condition f′(x) = 0 does not identify the global minimum or maximum. Moreover, the condition f′(x) = 0 on its own does not distinguish a local maximum from a local minimum. (b) The point where f′(x) = 0 is a point of inflection.

Figure 8: The second order conditions, i.e., the conditions on the sign of f″(x), are sufficient for determining local minima and maxima. (a) $f(x) = -(x - 3)^2 + 4$: point x = 3 is a maximum as f′(x) is decreasing (changes sign from positive to negative) in the neighborhood of x = 3. (b) $f(x) = (x - 3)^2 + 4$: point x = 3 is a minimum as f′(x) is increasing (changes sign from negative to positive) in the neighborhood of x = 3.

Condition f′(x∗) = 0 on its own does not distinguish local maxima from local minima. To tell whether point x∗ is a local maximum or a local minimum, we need to look at the sign of the function f′(x) in the immediate neighborhood of x∗, where the neighborhood is defined as the points immediately to the left and immediately to the right of x∗:

• Point x = x∗ is a local maximum if in the neighborhood of x∗, f′(x) is positive for x < x∗ and is negative for x > x∗;

• Point x = x∗ is a local minimum if in the neighborhood of x∗, f′(x) is negative for x < x∗ and is positive for x > x∗;

• Point x = x∗ is neither a local maximum nor a local minimum if in the neighborhood of x∗, f′(x) does not change sign.

An equivalent way to express the above conditions is to say that

• Point x = x∗ is a local maximum if in the neighborhood of x∗, f′(x) is a decreasing function;

• Point x = x∗ is a local minimum if in the neighborhood of x∗, f′(x) is an increasing function;

• Point x = x∗ is neither a local maximum nor a local minimum if in the neighborhood of x∗, f′(x) is neither increasing nor decreasing.

This last set of conditions can be expressed more succinctly in terms of second order derivatives, but it requires a few new definitions. Recall that the function

$$f'(x) \equiv \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

is the first derivative of the function f. The first derivative indicates whether a function is increasing or decreasing. A function f(x) is weakly decreasing at point x if f′(x) ≤ 0; a function f(x) is weakly increasing at point x if f′(x) ≥ 0. If the inequalities are strict, then the function is strictly decreasing or strictly increasing.

Since the derivative itself is a function, we can take its derivative. This is called the second derivative and denoted $d^2 f/dx^2$ or f″(x). Formally,

$$\frac{d^2 f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right).$$

The second derivative indicates whether the first derivative of a function is increasing or decreasing, thereby describing the curvature of the function.

Definition 2. A function f(x) is called concave if f″(x) ≤ 0 at all points of its domain; a function f(x) is called convex if f″(x) ≥ 0 at all points of its domain. If the inequalities are strict, then the function is called strictly concave or strictly convex.

Example 14. The function $f(x) = x^2$ is convex on its domain; the function g(x) = ln x is concave on the domain x > 0.

A function may be neither concave nor convex on its entire domain.

Example 15. Consider $f(x) = -2x^3/3 + 10x^2 + 5$ defined for x ≥ 0. In this case, f″(x) = −4x + 20 and thus:

• for 0 < x ≤ 5, f″(x) ≥ 0 and the function is convex;

• for x > 5, f″(x) < 0 and the function is concave.

Definition 3. A function f(x) is called concave at x∗ if f″(x∗) ≤ 0; a function f(x) is called convex at x∗ if f″(x∗) ≥ 0.

Recall Figure 7b where f′(−2) = 0, but point x = −2 is neither a maximum nor a minimum. This point is called an inflection point.

Definition 4. The point where a function changes its curvature is called an inflection point.

Figure 9: An example of (a) a strictly convex and (b) a strictly concave function. (a) $f(x) = x^2$ is convex for all x ∈ (−∞, ∞); (b) g(x) = ln x is concave for all x ∈ (0, ∞).

Figure 10: Function $f(x) = -2x^3/3 + 10x^2 + 5$: point x = 5 is an inflection point where the function changes its curvature from convex (for 0 < x < 5) to concave (for x > 5).

As an aside, note that since the second derivative is also a function, we can also take its derivative. This is called the third derivative and denoted f‴(x) to indicate that this function is found by three successive operations of differentiation, starting with the function f. One can continue this process, but we will typically not go beyond the second derivative.

Example 16. Suppose that $f(x) = x^5$. Then $f'(x) = 5x^4$, $f''(x) = 20x^3$ and $f'''(x) = 60x^2$.

The observation that the second derivative indicates whether the first derivative of a function is increasing or decreasing leads to the following set of necessary and sufficient conditions for identifying maxima and minima:

• If f′(x∗) = 0 and f″(x∗) < 0, then x∗ is a local maximum of f(x);

• If f′(x∗) = 0 and f″(x∗) > 0, then x∗ is a local minimum of f(x).

These conditions identify only a local maximum or minimum, not a global maximum or minimum. However, the local maxima of a function that is concave on its entire domain are also global maxima. Similarly, the local minima of globally convex functions are also global minima. That is:

• If f′(x∗) = 0 and f″(x) < 0 for all x in the domain of f, then x∗ is a global maximum of f(x);

• If f′(x∗) = 0 and f″(x) > 0 for all x in the domain of f, then x∗ is a global minimum of f(x).

The function depicted in Figure 8a is strictly concave on its entire domain and thus point x = 3 is a global maximum; the function depicted in Figure 8b is strictly convex on its entire domain and thus point x = 3 is a global minimum.

The necessary and sufficient conditions for local extrema require f″(x∗) ≠ 0. When f″(x∗) = 0, point x∗ can be either a minimum or a maximum or neither of the two. In this case we need to use an N-th derivative test.

• If f′(x∗) = 0, f″(x∗) = 0, ..., $f^{(N-1)}(x^*) = 0$ and $f^{(N)}(x^*) < 0$, where N is even, then point x∗ is a maximum;

• If f′(x∗) = 0, f″(x∗) = 0, ..., $f^{(N-1)}(x^*) = 0$ and $f^{(N)}(x^*) > 0$, where N is even, then point x∗ is a minimum;

• If f′(x∗) = 0, f″(x∗) = 0, ..., $f^{(N-1)}(x^*) = 0$ and $f^{(N)}(x^*) \neq 0$, where N is odd, then point x∗ is a point of inflection.
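The N-th derivative test is mechanical enough to be written as a small procedure. The sketch below is one possible way to express it, under stated assumptions: the function name `classify_stationary_point` is hypothetical, the caller supplies the derivative values at x∗ already computed by hand, and the example $f(x) = x^4$ is an illustration not taken from the notes.

```python
def classify_stationary_point(derivatives):
    """Classify x* from the list [f'(x*), f''(x*), ..., f^(N)(x*)].

    N is taken to be the order of the first non-zero derivative at x*.
    """
    if derivatives[0] != 0:
        return "not a stationary point"
    for order, value in enumerate(derivatives, start=1):
        if value != 0:
            if order % 2 == 1:
                return "inflection point"
            return "local maximum" if value < 0 else "local minimum"
    return "inconclusive"


# Illustrative case f(x) = x**4 at x* = 0: f' = f'' = f''' = 0, f'''' = 24.
print(classify_stationary_point([0, 0, 0, 24]))
# Example 13, f(x) = (x + 2)**3 + 5 at x* = -2: f' = f'' = 0, f''' = 6.
print(classify_stationary_point([0, 0, 6]))
```

When the second derivative is already non-zero the procedure reduces to the ordinary second order conditions, since then N = 2.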

Solved Examples

Example 17. Suppose the monopolist’s profit function is given by

$$\Pi(q) = pq - c(q) = (100 - q)q - q^2.$$

The monopolist aims to maximize profit and thus solves:

$$\max_{q} \; (100 - q)q - q^2.$$

From the necessary first order conditions it follows that

$$\Pi'(q) = 100 - 4q = 0.$$

So q∗ = 25 is a candidate for a maximum. To check that this is indeed the maximum, we need to check the second order conditions for optimization. The second derivative

$$\Pi''(q) = -4 < 0$$

for all q and, in particular, for q = 25. Hence q = 25 is a global maximum.

Another economic example.

Example 18. Suppose that a firm minimizes its average cost, which is defined for q > 0 and is given by:

$$C(q) = 100/q + q.$$

Then the first order conditions imply:

$$C'(q) = -100/q^2 + 1 = 0.$$

Therefore, q∗ = 10 (negative output is not allowed). Since

$$C''(q) = 200/q^3 > 0$$

for all q > 0, q∗ = 10 is a global minimum.
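Both solved examples can be cross-checked by brute force. The sketch below evaluates each objective on a coarse grid of quantities and picks the best grid point; the grid (q = 0.5, 1.0, ..., 50.0) is an assumption made for illustration, and a grid search is only a sanity check, not a substitute for the first and second order conditions.

```python
def profit(q):
    """Monopolist's profit from Example 17."""
    return (100 - q) * q - q ** 2


def average_cost(q):
    """Average cost from Example 18, defined for q > 0."""
    return 100 / q + q


# Coarse grid over q = 0.5, 1.0, ..., 50.0 (the grid itself is an assumption).
grid = [k / 2 for k in range(1, 101)]
best_profit_q = max(grid, key=profit)
lowest_cost_q = min(grid, key=average_cost)
print(best_profit_q, profit(best_profit_q))
print(lowest_cost_q, average_cost(lowest_cost_q))
```

The grid search lands on the same quantities as the calculus, q∗ = 25 and q∗ = 10 respectively.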

3.2 Multivariate Case

Consider the general maximization problem:

$$\max_{x_1, \dots, x_n} f(x_1, x_2, \dots, x_n).$$

The first order conditions for maximization require the first order differential to be zero at the optimal point. That is, a vector of small changes (dx1, dx2, ..., dxn) should not change the value of the function. We thus have

$$df = \frac{\partial f}{\partial x_1} dx_1 + \cdots + \frac{\partial f}{\partial x_n} dx_n = 0.$$

This can be satisfied if

$$\frac{\partial f}{\partial x_1} = 0, \quad \frac{\partial f}{\partial x_2} = 0, \quad \dots, \quad \frac{\partial f}{\partial x_n} = 0.$$

These conditions are necessary conditions and they must also hold for minimization problems.

As in the single variable case, we are really after maxima and minima. The first order conditions alone cannot distinguish between local maxima and local minima. Likewise, the first order conditions cannot identify whether a candidate solution is a local or a global maximum. We thus need second order conditions to help us. For a point to be a (local) maximum, we must have $d^2 f < 0$ for any vector of (small) changes (dx1, dx2, ..., dxn); that is, f needs to be a (locally) strictly concave function. Similarly, for a point to be a (local) minimum, we must have $d^2 f > 0$ for any vector of (small) changes (dx1, dx2, ..., dxn); that is, f needs to be a (locally) strictly convex function.²

Definition 5. Suppose that at the point $x^* = (x_1^*, x_2^*, \dots, x_n^*)$ the first order conditions hold, i.e., for all i

$$\left.\frac{\partial f}{\partial x_i}\right|_{x_1 = x_1^*, \dots, x_n = x_n^*} = 0.$$

Then:

• Point $x^*$ is a local maximum if the function f(x1, x2, ..., xn) is concave at $x^*$;

• Point $x^*$ is a local minimum if the function f(x1, x2, ..., xn) is convex at $x^*$;

• Point $x^*$ is a global maximum if the function f(x1, x2, ..., xn) is concave for all (x1, x2, ..., xn) (see Figure 11a);

• Point $x^*$ is a global minimum if the function f(x1, x2, ..., xn) is convex for all (x1, x2, ..., xn) (see Figure 11b).

If at a point $x^*$ where the first order conditions hold the function f is neither convex nor concave, then $x^*$ is a saddle point (see Figure 12).

Now we need tools for identifying whether a multivariate function is concave or convex.

² In Definition 5, the notation $\left.\frac{\partial f}{\partial x_i}\right|_{x_1 = x_1^*, \dots, x_n = x_n^*}$ should be understood as the partial derivative of f with respect to xi evaluated at the point $x^* = (x_1^*, x_2^*, \dots, x_n^*)$.

Higher-Order Derivatives of Multivariate Functions

A single variable function f(x) is strictly concave if f″(x) < 0 and is strictly convex if f″(x) > 0. Notice that in the single-variable case, the second-order total differential is:

$$d^2 y = f''(x)\,(dx)^2.$$

Hence, we can (equivalently) define a function of one variable to be strictly concave if $d^2 y < 0$ and strictly convex if $d^2 y > 0$. The advantage of writing it in this way is that we can extend this definition to functions of many variables.

A multivariate function f(x1, x2, ..., xn) is strictly concave if $d^2 y < 0$ and strictly convex if $d^2 y > 0$. This imposes certain restrictions on its second-order partial derivatives.

Second-order partial derivatives

Given a function f(x1, x2, ..., xn), the second-order derivative $\partial^2 f / \partial x_i \partial x_j$ is the partial derivative of ∂f/∂xi with respect to xj. The above may suggest that the order in which the derivatives are taken matters and that the partial derivative of ∂f/∂xi with respect to xj is different from the partial derivative of ∂f/∂xj with respect to xi. While this can happen, it turns out that if the function f(x1, x2, ..., xn) is well-behaved then the order of differentiation does not matter. This result is called Young’s Theorem. We will be dealing with well-behaved functions for which Young’s Theorem holds.

Figure 11: An example of (a) a strictly concave and (b) a strictly convex function of two variables. (a) $f(x_1, x_2) = -2x_1^2 - 2x_2^2$: the function is concave for all (x1, x2), hence point (x1, x2) = (0, 0) is a global maximum. (b) $f(x_1, x_2) = 2x_1^2 + 2x_2^2$: the function is convex for all (x1, x2), hence point (x1, x2) = (0, 0) is a global minimum.

Figure 12: $f(x_1, x_2) = 2x_1^2 - 2x_2^2$: the function is convex in the direction of x1 and concave in the direction of x2, hence point (x1, x2) = (0, 0) is a saddle point.

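Example 19. Consider a Cobb-Douglas production function $f(K, L) = K^{\alpha} L^{\beta}$, where K > 0 is capital input, L > 0 is labor input and 0 < α, β < 1 are some constants. For this function we can evaluate the second-order partial derivative, $\partial^2 f / \partial L \partial K$, in two different ways. First, since

$$\frac{\partial f}{\partial L} = \beta K^{\alpha} L^{\beta - 1},$$

taking the partial derivative of this with respect to K, we get

$$\frac{\partial^2 f}{\partial L \partial K} = \frac{\partial}{\partial K}\left(\frac{\partial f}{\partial L}\right) = \alpha\beta K^{\alpha - 1} L^{\beta - 1}.$$

Alternatively, since

$$\frac{\partial f}{\partial K} = \alpha K^{\alpha - 1} L^{\beta},$$

we get

$$\frac{\partial^2 f}{\partial K \partial L} = \frac{\partial}{\partial L}\left(\frac{\partial f}{\partial K}\right) = \alpha\beta K^{\alpha - 1} L^{\beta - 1}.$$

This illustrates Young’s Theorem: no matter in which order we differentiate, we get the same answer.

Note that if K > 0 and L > 0,

$$\frac{\partial^2 f}{\partial K \partial L} > 0,$$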

23

which means that the marginal productivity of labor (capital) increases as we add morecapital (labor). At the same time, if α < 0 and K > 0, L > 0,

∂2 f∂K2 = α (α− 1)Kα−1Lβ < 0.

This means that the marginal productivity of capital decreases as we add more capital.
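The claims of Example 19 can be checked numerically. The following is a minimal sketch using nested central finite differences; the parameter values, the evaluation point and the helper names `d_dK` and `d_dL` are illustrative assumptions, not part of the notes.

```python
# Illustrative parameter values (assumptions, not taken from the notes).
ALPHA, BETA = 0.3, 0.5

def f(K, L):
    """Cobb-Douglas production function f(K, L) = K^alpha * L^beta."""
    return K**ALPHA * L**BETA

def d_dK(g, h=1e-3):
    """Return a function approximating the partial derivative of g w.r.t. K."""
    return lambda K, L: (g(K + h, L) - g(K - h, L)) / (2 * h)

def d_dL(g, h=1e-3):
    """Return a function approximating the partial derivative of g w.r.t. L."""
    return lambda K, L: (g(K, L + h) - g(K, L - h)) / (2 * h)

K0, L0 = 2.0, 3.0
f_LK = d_dK(d_dL(f))(K0, L0)   # differentiate w.r.t. L first, then K
f_KL = d_dL(d_dK(f))(K0, L0)   # differentiate w.r.t. K first, then L
f_KK = d_dK(d_dK(f))(K0, L0)   # own second-order partial w.r.t. K

# Closed-form expressions from Example 19 for comparison.
exact_cross = ALPHA * BETA * K0**(ALPHA - 1) * L0**(BETA - 1)
exact_KK = ALPHA * (ALPHA - 1) * K0**(ALPHA - 2) * L0**BETA

print(f_LK, f_KL, exact_cross)  # both orders agree with the formula
print(f_KK, exact_KK)           # negative: diminishing marginal product
```

Both orders of differentiation give approximately the same number, and the own second-order partial is negative, in line with the text.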

Concavity and convexity of a multivariate function

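We say that a function is concave if $d^2y \le 0$ for all $x$ and convex if $d^2y \ge 0$ for all $x$. If the function satisfies a stronger condition, $d^2y < 0$ for all $x$, then it is strictly concave. Analogously, if $d^2y > 0$ for all $x$, it is strictly convex.

Consider a two-variable function, $y = f(x_1, x_2)$. Its differential is:

$$dy = \frac{\partial y}{\partial x_1}dx_1 + \frac{\partial y}{\partial x_2}dx_2,$$

which again can be viewed as a function of $x_1$ and $x_2$. Taking a differential, we obtain

$$d(dy) = \left[\frac{\partial^2 y}{\partial x_1^2}dx_1 + \frac{\partial^2 y}{\partial x_1 \partial x_2}dx_2\right]dx_1 + \left[\frac{\partial^2 y}{\partial x_2 \partial x_1}dx_1 + \frac{\partial^2 y}{\partial x_2^2}dx_2\right]dx_2,$$

which after collecting terms yields:

$$d^2y = \frac{\partial^2 y}{\partial x_1^2}(dx_1)^2 + 2\frac{\partial^2 y}{\partial x_1 \partial x_2}dx_2 dx_1 + \frac{\partial^2 y}{\partial x_2^2}(dx_2)^2.$$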

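Thus, the second-order total differential depends on the second-order partial derivatives of $f(x_1, x_2)$. For a general function $y = f(x_1, x_2, ..., x_n)$, one can use a similar procedure to get the formula for the second-order total differential. This is a little more complicated but it can be written compactly as follows:

$$d^2y = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 y}{\partial x_i \partial x_j}dx_i dx_j.$$

As things stand, it is not clear how to go about verifying that the second-order total differential of a function of $n$ variables is never positive or never negative. However, notice that we can write the second-order differential of a function of two variables,

$$d^2y = \frac{\partial^2 y}{\partial x_1^2}(dx_1)^2 + 2\frac{\partial^2 y}{\partial x_1 \partial x_2}dx_2 dx_1 + \frac{\partial^2 y}{\partial x_2^2}(dx_2)^2,$$

in matrix form in the following way:

$$d^2y = \begin{bmatrix} dx_1 & dx_2 \end{bmatrix} \begin{bmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} \\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \end{bmatrix}.$$

The matrix of second-order partial derivatives

$$H \equiv \begin{bmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} \\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} \end{bmatrix}$$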

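is called the Hessian matrix, which is symmetric by Young's Theorem. For a general function $y = f(x_1, x_2, ..., x_n)$,

$$d^2y = \begin{bmatrix} dx_1 & dx_2 & \cdots & dx_n \end{bmatrix} \begin{bmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_1 \partial x_n} \\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} & \cdots & \frac{\partial^2 y}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 y}{\partial x_n \partial x_1} & \frac{\partial^2 y}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_n^2} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \\ \vdots \\ dx_n \end{bmatrix}$$

and thus

$$H \equiv \begin{bmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_1 \partial x_n} \\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} & \cdots & \frac{\partial^2 y}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 y}{\partial x_n \partial x_1} & \frac{\partial^2 y}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_n^2} \end{bmatrix}.$$

Now it is clear that to determine whether a multivariate function is concave or convex, we need to know the sign of $d^2y$, i.e., we are interested in the sign of the quadratic form:

$$\underbrace{d^2y}_{\text{scalar}} = \underbrace{dx'}_{(1 \times n)}\,\underbrace{H}_{(n \times n)}\,\underbrace{dx}_{(n \times 1)} = \begin{bmatrix} dx_1 & dx_2 & \cdots & dx_n \end{bmatrix} \begin{bmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_1 \partial x_n} \\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} & \cdots & \frac{\partial^2 y}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 y}{\partial x_n \partial x_1} & \frac{\partial^2 y}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_n^2} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \\ \vdots \\ dx_n \end{bmatrix}.$$

For a given symmetric matrix $H$ and for any $x \in \mathbb{R}^n$ five situations may arise: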

Definition 6. An $(n \times n)$ matrix $H$ is:

• positive definite if $x'Hx > 0$ for any $(n \times 1)$ vector $x \in \mathbb{R}^n$, $x \ne 0_n$ (note that $x \ne 0_n$ means that at least one element of $x$ is not equal to 0).

• positive semidefinite if $x'Hx \ge 0$ for any $(n \times 1)$ vector $x \in \mathbb{R}^n$, $x \ne 0_n$.

• negative definite if $x'Hx < 0$ for any $(n \times 1)$ vector $x \in \mathbb{R}^n$, $x \ne 0_n$.

• negative semidefinite if $x'Hx \le 0$ for any $(n \times 1)$ vector $x \in \mathbb{R}^n$, $x \ne 0_n$.

• indefinite if $x'Hx > 0$ for at least one vector $x \ne 0_n$ and $x'Hx < 0$ for at least one vector $x \ne 0_n$.

From the discussion above, if the Hessian is negative definite for all $(x_1, ..., x_n)$, the function is strictly concave. If the Hessian is positive definite for all $(x_1, ..., x_n)$, the function is strictly convex. So to determine whether a function is concave or convex, we need to be able to determine whether the Hessian matrix is negative definite or positive definite.

We can classify a symmetric matrix $H$ in one of the above categories using either the eigenvalue test or the principal minor test.

Eigenvalue Test

The quadratic form $x'Hx$ is:

• positive (semi)definite if and only if all the eigenvalues of $H$ are strictly positive (non-negative);

• negative (semi)definite if and only if all the eigenvalues of $H$ are strictly negative (non-positive).

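Example 20. Consider matrix

$$A = \begin{bmatrix} 1 & 4 & 6 \\ 4 & 2 & 1 \\ 6 & 1 & 6 \end{bmatrix}.$$

The characteristic equation, obtained by expanding along the first row, is

$$\det(A - \lambda I) = (1 - \lambda)\left(\lambda^2 - 8\lambda + 11\right) - 4(18 - 4\lambda) + 6(6\lambda - 8) = 0.$$

This equation of order three with no obvious factorization seems difficult to solve!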
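Even without a factorization, the signs of the eigenvalues can be located numerically. The sketch below evaluates $\det(A - \lambda I)$ directly from the matrix of Example 20 and brackets two roots by bisection; the bracketing intervals and the helper names are assumptions chosen for illustration.

```python
A = [[1, 4, 6],
     [4, 2, 1],
     [6, 1, 6]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def char_poly(lam):
    """Evaluate det(A - lam * I)."""
    shifted = [[A[i][j] - (lam if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)

def bisect(g, lo, hi, tol=1e-12):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

# The characteristic polynomial changes sign on each of these intervals.
negative_root = bisect(char_poly, -10.0, 0.0)
positive_root = bisect(char_poly, 0.0, 5.0)
print(negative_root, positive_root)
```

Finding one negative and one positive eigenvalue is already enough to conclude that the matrix is indefinite, without solving the cubic exactly.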

Principal Minor Test

Definition 7. Let $H$ be an $n \times n$ matrix. An $i$-th order principal minor of $H$ is the determinant of a submatrix of $H$ obtained by deleting $n - i$ rows and the $n - i$ columns with the same index. The $i$-th (order) leading principal minor of $H$ is the determinant of the submatrix obtained from $H$ by deleting the last $n - i$ rows and columns.

Example 21. Let $A$ be a $3 \times 3$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Principal Minors

There is one third order principal minor of $A$, $\det(A)$. There are three second order principal minors:

• $\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, where the submatrix in the minor's calculation is obtained by deleting the third row and third column of $A$.

• $\det\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}$, where the submatrix in the minor's calculation is obtained by deleting the second row and second column of $A$.

• $\det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}$, where the submatrix in the minor's calculation is obtained by deleting the first row and first column of $A$.

There are also three first order principal minors: $a_{11}$ formed by deleting the last two rows and columns; $a_{22}$ formed by deleting the first and last rows and columns; and $a_{33}$ formed by deleting the first two rows and columns.

Leading Principal Minors

The $i$-th leading principal minor is the determinant of the submatrix obtained from $A$ by deleting all columns and all rows after the $i$-th. Thus

$$\text{first l.p.m.} = a_{11}, \qquad \text{second l.p.m.} = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad \text{third l.p.m.} = \det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Principal Minor Test:

• The quadratic form $x'Hx$ is positive definite if and only if all leading principal minors of $H$ are positive.

• The quadratic form $x'Hx$ is negative definite if and only if the leading principal minors of $H$ alternate in sign, the first being negative (i.e., the first is negative, the second is positive, the third is negative and so on, that is, the $i$-th order leading principal minor has the sign of $(-1)^i$).

• The quadratic form $x'Hx$ is positive semidefinite if and only if every principal minor of $H$ is $\ge 0$.

• The quadratic form $x'Hx$ is negative semidefinite if and only if every principal minor of $H$ of odd order is $\le 0$ and every principal minor of even order is $\ge 0$.

Note that in the first two cases, it is enough to check the inequality for all the leading principal minors (i.e., for $1 \le i \le n$). In the last two cases, we must check all principal minors (i.e., for each $i$ with $1 \le i \le n$ and for each of the $\binom{n}{i}$ principal minors of order $i$).

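Example 22. Matrix $\begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}$ is positive definite.

Matrix $\begin{bmatrix} -1 & 1 \\ 1 & -4 \end{bmatrix}$ is negative definite.

Matrix $\begin{bmatrix} -1 & 1 \\ 1 & 4 \end{bmatrix}$ is neither positive definite nor negative definite.

Matrix $\begin{bmatrix} 1 & 4 & 6 \\ 4 & 2 & 1 \\ 6 & 1 & 6 \end{bmatrix}$ is indefinite.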
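The leading principal minor test is mechanical enough to automate. The following is a small sketch for symmetric matrices given as nested lists; the function names `det`, `leading_principal_minors` and `classify` are hypothetical helpers introduced only for this illustration, and the semidefinite cases (which need all principal minors) are deliberately not covered.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Delete the first row and the j-th column to form the minor.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def leading_principal_minors(H):
    """Determinants of the top-left k x k blocks, for k = 1, ..., n."""
    return [det([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]

def classify(H):
    """Apply the leading principal minor test to a symmetric matrix H."""
    minors = leading_principal_minors(H)
    if all(m > 0 for m in minors):
        return "positive definite"
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "neither positive definite nor negative definite"

print(classify([[1, 1], [1, 4]]))      # leading minors 1, 3
print(classify([[-1, 1], [1, -4]]))    # leading minors -1, 3
print(classify([[-1, 1], [1, 4]]))     # leading minors -1, -5
print(leading_principal_minors([[1, 4, 6], [4, 2, 1], [6, 1, 6]]))
```

For the $3 \times 3$ matrix the second leading principal minor is negative, which rules out both definiteness and semidefiniteness, so the matrix is indefinite.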

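In the case of a function of two variables, $y = f(x_1, x_2)$:

• $d^2y$ is positive definite (and thus the function is convex) if

$$\frac{\partial^2 y}{\partial x_1^2} > 0 \quad \text{and} \quad |H| = \begin{vmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} \\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} \end{vmatrix} = \frac{\partial^2 y}{\partial x_1^2} \times \frac{\partial^2 y}{\partial x_2^2} - \left(\frac{\partial^2 y}{\partial x_1 \partial x_2}\right)^2 > 0;$$

• $d^2y$ is negative definite (and thus the function is concave) if

$$\frac{\partial^2 y}{\partial x_1^2} < 0 \quad \text{and} \quad |H| = \begin{vmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} \\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} \end{vmatrix} = \frac{\partial^2 y}{\partial x_1^2} \times \frac{\partial^2 y}{\partial x_2^2} - \left(\frac{\partial^2 y}{\partial x_1 \partial x_2}\right)^2 > 0.$$

Note that the condition $|H| > 0$ implies that $\frac{\partial^2 y}{\partial x_1^2}$ and $\frac{\partial^2 y}{\partial x_2^2}$ should have the same sign for both positive definite and negative definite $H$.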

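Conditions for stationary point of $y = f(x_1, x_2)$

|     | Maximum | Minimum | Saddle Point |
|-----|---------|---------|--------------|
| FOC | $\frac{\partial y}{\partial x_1} = 0$, $\frac{\partial y}{\partial x_2} = 0$ | $\frac{\partial y}{\partial x_1} = 0$, $\frac{\partial y}{\partial x_2} = 0$ | $\frac{\partial y}{\partial x_1} = 0$, $\frac{\partial y}{\partial x_2} = 0$ |
| SOC | $\frac{\partial^2 y}{\partial x_1^2}, \frac{\partial^2 y}{\partial x_2^2} < 0$ | $\frac{\partial^2 y}{\partial x_1^2}, \frac{\partial^2 y}{\partial x_2^2} > 0$ | |
|     | $\frac{\partial^2 y}{\partial x_1^2}\frac{\partial^2 y}{\partial x_2^2} - \left(\frac{\partial^2 y}{\partial x_1 \partial x_2}\right)^2 > 0$ | $\frac{\partial^2 y}{\partial x_1^2}\frac{\partial^2 y}{\partial x_2^2} - \left(\frac{\partial^2 y}{\partial x_1 \partial x_2}\right)^2 > 0$ | $\frac{\partial^2 y}{\partial x_1^2}\frac{\partial^2 y}{\partial x_2^2} - \left(\frac{\partial^2 y}{\partial x_1 \partial x_2}\right)^2 < 0$ |

If

$$\frac{\partial^2 y}{\partial x_1^2}\frac{\partial^2 y}{\partial x_2^2} - \left(\frac{\partial^2 y}{\partial x_1 \partial x_2}\right)^2 = 0,$$

the test fails and we need to check the other principal minors to determine whether the stationary point is a maximum, a minimum or a saddle point.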
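The two-variable conditions can be written as a short decision rule. The sketch below takes the three second-order partial derivatives evaluated at a stationary point; the function name `classify_stationary_point` is a hypothetical helper, and the inputs are the constant second derivatives of the quadratic functions shown in Figures 11 and 12.

```python
def classify_stationary_point(f11, f22, f12):
    """Second-order test for a stationary point of y = f(x1, x2).

    f11, f22 are the own second-order partials and f12 the cross partial,
    all evaluated at the stationary point.
    """
    det_h = f11 * f22 - f12 ** 2
    if det_h > 0 and f11 < 0:
        return "maximum"
    if det_h > 0 and f11 > 0:
        return "minimum"
    if det_h < 0:
        return "saddle point"
    return "test fails"

# f = -2*x1^2 - 2*x2^2  (Figure 11a): second partials -4, -4, cross partial 0
print(classify_stationary_point(-4, -4, 0))
# f =  2*x1^2 + 2*x2^2  (Figure 11b)
print(classify_stationary_point(4, 4, 0))
# f =  2*x1^2 - 2*x2^2  (Figure 12)
print(classify_stationary_point(4, -4, 0))
```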

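Extended Example: Firm's Profit Maximization

Suppose a firm can sell its output at $p$ per unit and that its production function is given by $y = AK^{\alpha}L^{\beta}$. What combination of capital and labor should the firm use so as to maximize profits, assuming that capital costs $r$ per unit and labor $w$ per unit?

The firm's profits are given by revenue minus costs:

$$\tilde{\pi}(K, L) = pAK^{\alpha}L^{\beta} - rK - wL.$$

The firm aims to maximize profits, i.e., it solves the following unconstrained optimization problem with multiple variables:

$$\max_{K, L} \tilde{\pi}(K, L) = \max_{K, L}\; pAK^{\alpha}L^{\beta} - rK - wL.$$

We can use the first order conditions to obtain potential candidates for the optimum. The first order conditions (FOC) are:

$$\frac{\partial \tilde{\pi}}{\partial K} = \alpha pAK^{\alpha-1}L^{\beta} - r = 0$$

$$\frac{\partial \tilde{\pi}}{\partial L} = \beta pAK^{\alpha}L^{\beta-1} - w = 0.$$

A point where the FOC are satisfied is guaranteed to be a maximum if the objective function is concave. In a multivariate setting a function is strictly concave if the matrix of its second order derivatives, called the Hessian, is negative definite. In this problem the Hessian is given by

$$H = \begin{pmatrix} \partial^2\tilde{\pi}/\partial K^2 & \partial^2\tilde{\pi}/\partial K \partial L \\ \partial^2\tilde{\pi}/\partial K \partial L & \partial^2\tilde{\pi}/\partial L^2 \end{pmatrix} = \begin{pmatrix} \alpha(\alpha - 1)ApK^{\alpha-2}L^{\beta} & \alpha\beta ApK^{\alpha-1}L^{\beta-1} \\ \alpha\beta ApK^{\alpha-1}L^{\beta-1} & \beta(\beta - 1)ApK^{\alpha}L^{\beta-2} \end{pmatrix}.$$

To verify whether the matrix is negative definite one can look at the leading principal minors and check whether they alternate in sign, with odd order principal minors being negative and even order principal minors being positive. In this problem this requirement reduces to the following set of inequalities:

$$\alpha(\alpha - 1)ApK^{\alpha-2}L^{\beta} < 0$$

$$\beta(\beta - 1)ApK^{\alpha}L^{\beta-2} < 0$$

$$\det(H) > 0.$$

Note that

$$\det(H) = \frac{\partial^2\tilde{\pi}}{\partial K^2}\frac{\partial^2\tilde{\pi}}{\partial L^2} - \left(\frac{\partial^2\tilde{\pi}}{\partial K \partial L}\right)^2 = \left(\alpha\beta(\alpha - 1)(\beta - 1) - \alpha^2\beta^2\right)\left(ApK^{\alpha-1}L^{\beta-1}\right)^2.$$

Thus the SOC are given by

$$\alpha(\alpha - 1)ApK^{\alpha-2}L^{\beta} < 0$$

$$\beta(\beta - 1)ApK^{\alpha}L^{\beta-2} < 0$$

$$\left(\alpha\beta(\alpha - 1)(\beta - 1) - \alpha^2\beta^2\right)\left(ApK^{\alpha-1}L^{\beta-1}\right)^2 > 0.$$

The SOC inequalities are satisfied if

$$\alpha - 1 < 0$$

$$\beta - 1 < 0$$

$$\alpha\beta(\alpha - 1)(\beta - 1) - \alpha^2\beta^2 > 0,$$

where the last inequality is satisfied if $\alpha + \beta < 1$. To see this, expand the product: $\alpha\beta(\alpha - 1)(\beta - 1) - \alpha^2\beta^2 = \alpha\beta\left[(\alpha - 1)(\beta - 1) - \alpha\beta\right] = \alpha\beta(1 - \alpha - \beta)$, which is positive when $\alpha + \beta < 1$, remembering that $\alpha > 0$ and $\beta > 0$.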
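As a numerical illustration, the first order conditions of the profit maximization problem can be solved in closed form and the second order conditions checked at the solution. This is a sketch under assumed parameter values with $\alpha + \beta < 1$; the function names and numbers are illustrative and not part of the notes.

```python
def solve_foc(p, A, alpha, beta, r, w):
    """Solve the two first order conditions for (K, L).

    Dividing the FOCs gives L = ratio * K with ratio = beta*r / (alpha*w);
    substituting back into the FOC for K pins down K.
    """
    ratio = beta * r / (alpha * w)
    K = (r / (alpha * p * A * ratio ** beta)) ** (1.0 / (alpha + beta - 1.0))
    return K, ratio * K

def hessian(K, L, p, A, alpha, beta):
    """Hessian of the profit function at (K, L)."""
    h_KK = alpha * (alpha - 1) * A * p * K ** (alpha - 2) * L ** beta
    h_LL = beta * (beta - 1) * A * p * K ** alpha * L ** (beta - 2)
    h_KL = alpha * beta * A * p * K ** (alpha - 1) * L ** (beta - 1)
    return [[h_KK, h_KL], [h_KL, h_LL]]

# Assumed parameter values for illustration (alpha + beta = 0.8 < 1).
p, A, alpha, beta, r, w = 10.0, 1.0, 0.3, 0.5, 1.0, 2.0

K_star, L_star = solve_foc(p, A, alpha, beta, r, w)

# Residuals of the first order conditions at the candidate point.
foc_K = alpha * p * A * K_star ** (alpha - 1) * L_star ** beta - r
foc_L = beta * p * A * K_star ** alpha * L_star ** (beta - 1) - w

H = hessian(K_star, L_star, p, A, alpha, beta)
det_H = H[0][0] * H[1][1] - H[0][1] ** 2

print(K_star, L_star, foc_K, foc_L)
print(H[0][0] < 0, det_H > 0)  # negative definite Hessian at the solution
```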

4 Constrained Optimization

Until now, we have considered unconstrained problems. Usually, economic agents face natural constraints.

Example 23. Consumer's Problem: Suppose that a consumer has a utility function $U(x_1, x_2) = x_1^{1/2}x_2^{1/2}$, the price of $x_1$ is $p_1$, the price of $x_2$ is $p_2$ and the consumer has $m$ in income. How much of the two goods should the consumer purchase to maximize her utility?

In producer theory we are frequently interested in the following minimization problem:

Example 24. Firm's Problem: Suppose that a firm's production function is given by $f(K, L) = K^{1/3}L^{2/3}$, the price of capital is $r$ and the price of labor is $w$. What is the least cost way for the firm to produce $Q$ units of output?

Both of the above problems have a common mathematical structure:

$$\max_{x_1, ..., x_n} f(x_1, x_2, ..., x_n) \quad \text{subject to} \quad g(x_1, x_2, ..., x_n) = 0.$$

We say that $f(x_1, x_2, ..., x_n)$ is the objective function, $g(x_1, x_2, ..., x_n) = 0$ is the constraint and $x_1, x_2, ..., x_n$ are the choice variables. We are interested in finding a solution to this problem

$$x^* = \begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{bmatrix}.$$

The value function for this problem is derived by substituting $x^*$ into the objective function to obtain $f(x_1^*, x_2^*, ..., x_n^*)$.

It is also possible that instead of maximizing $f(x_1, x_2, ..., x_n)$ we could be minimizing $f(x_1, x_2, ..., x_n)$.

Example 25. (Example 23 continued) The utility maximization problem can be written as:

$$\max_{x_1, x_2} x_1^{1/2}x_2^{1/2} \quad \text{subject to} \quad p_1x_1 + p_2x_2 = m.$$

The solution to the problem is a Marshallian demand as a function of prices and income, i.e., $x_1^* = x_1(p_1, p_2, m)$ and $x_2^* = x_2(p_1, p_2, m)$, while the objective function evaluated at the optimum is an indirect utility function:

$$v(p_1, p_2, m) = (x_1^*)^{1/2}(x_2^*)^{1/2}.$$

Similarly,

Example 26. (Example 24 continued) The firm's cost minimization problem can be stated as:

$$\min_{K, L}\; rK + wL \quad \text{s.t.} \quad Q = K^{1/3}L^{2/3}.$$

The solution to the problem is a conditional input demand as a function of $r$, $w$ and $Q$, i.e., $K^c = K(r, w, Q)$ and $L^c = L(r, w, Q)$, while the objective function evaluated at the optimum is a cost function that gives the cost of producing the required level of output $Q$:

$$c(r, w, Q) = rK^c + wL^c.$$

4.1 Direct Substitution

When the constraint(s) are equalities, we can convert the problem from a constrained optimization to an unconstrained optimization problem by substituting for some of the variables.

Example 27. (Example 23 continued) In the consumer's utility maximization problem $p_1x_1 + p_2x_2 = m$. Hence,

$$x_1 = \frac{1}{p_1}m - \frac{p_2}{p_1}x_2.$$

Substituting this into the objective function,

$$\max_{x_2} \left(\frac{1}{p_1}m - \frac{p_2}{p_1}x_2\right)^{1/2} x_2^{1/2}.$$

This is a function of just $x_2$ and we can now maximize this function with respect to $x_2$. By incorporating the constraint into the objective function, we transformed the constrained optimization problem into an unconstrained optimization problem, which we know how to solve. The first order condition gives:

$$\frac{1}{2}x_2^{-1/2}\left(\frac{1}{p_1}m - \frac{p_2}{p_1}x_2\right)^{1/2} - \frac{1}{2}\frac{p_2}{p_1}\left(\frac{1}{p_1}m - \frac{p_2}{p_1}x_2\right)^{-1/2}x_2^{1/2} = 0.$$

Solving for $x_2$:

$$\frac{1}{p_1}m - \frac{p_2}{p_1}x_2 = \frac{p_2}{p_1}x_2 \implies x_2 = \frac{1}{2}\frac{m}{p_2} \implies x_1 = \frac{1}{p_1}m - \frac{p_2}{p_1}x_2 = \frac{1}{2}\frac{m}{p_1}.$$

The firm's problem can be solved similarly.

4.2 The Lagrangian Approach

The substitution technique has serious limitations:

• In some cases, we cannot use substitution easily: for instance, suppose the constraint is $x^4 + 5x^3y + y^2x + x^6 + 5 = 0$. Here, it is not straightforward to solve this equation to get $x$ as a function of $y$ or vice versa.

• Moreover, in many cases, the economic constraints are written in the form $g(x_1, x_2, ..., x_n) \ge 0$ or $g(x_1, x_2, ..., x_n) \le 0$. While the Lagrangian technique can be modified to take care of such cases, the substitution technique cannot be modified, or can be modified only with some difficulty.

Given a problem

$$\max_{x_1, ..., x_n} f(x_1, x_2, ..., x_n) \quad \text{subject to} \quad g(x_1, x_2, ..., x_n) = 0$$

write down the Lagrangian function

$$\mathcal{L}(x_1, x_2, ..., x_n, \lambda) = f(x_1, x_2, ..., x_n) + \lambda g(x_1, x_2, ..., x_n).$$

Note that the Lagrangian is a function of $n + 1$ variables: $(x_1, x_2, ..., x_n, \lambda)$. We then look for the stationary points of the Lagrangian, that is, points where all the partial derivatives of the Lagrangian are zero. Using a Lagrangian, we get $n + 1$ first order conditions:

$$\frac{\partial \mathcal{L}}{\partial x_i} = 0, \quad (i = 1, ..., n)$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = 0.$$

Solving these equations will give us candidate solutions for the constrained optimization problem. Candidate solutions still need to be checked using the second-order conditions.

Example 28. (Example 23 continued) In the consumer's utility maximization problem:

$$\mathcal{L}(x_1, x_2, \lambda) = x_1^{1/2}x_2^{1/2} + \lambda(m - p_1x_1 - p_2x_2).$$

The first order conditions are given by:

$$\frac{\partial \mathcal{L}}{\partial x_1} = \frac{1}{2}x_1^{-1/2}x_2^{1/2} - \lambda p_1 = 0$$

$$\frac{\partial \mathcal{L}}{\partial x_2} = \frac{1}{2}x_2^{-1/2}x_1^{1/2} - \lambda p_2 = 0$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = m - p_1x_1 - p_2x_2 = 0.$$

Interpretation of FOC: If we divide the first two conditions, we get that

$$MRS_{12} = \frac{U_1}{U_2} = \frac{p_1}{p_2}.$$

This says that at the optimum point, the slope of the indifference curve must be equal to the slope of the budget line.

To solve the problem, note that from the first two conditions it follows that

$$\frac{1}{2p_1}x_1^{-1/2}x_2^{1/2} = \lambda = \frac{1}{2p_2}x_2^{-1/2}x_1^{1/2}$$

or

$$x_2 = \frac{p_1}{p_2}x_1. \qquad (2)$$

Substituting this into the budget constraint yields:

$$x_1 = \frac{1}{2}\frac{m}{p_1}.$$

Substituting $x_1$ back into (2) and solving for $x_2$ yields:

$$x_2 = \frac{1}{2}\frac{m}{p_2}.$$
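The demands $x_1 = m/(2p_1)$ and $x_2 = m/(2p_2)$ can be cross-checked by brute force. The sketch below searches a fine grid of bundles on the budget line and compares the best one with the closed-form solution; the prices, income and grid size are illustrative assumptions.

```python
def utility(x1, x2):
    """Utility function U(x1, x2) = x1^(1/2) * x2^(1/2)."""
    return x1 ** 0.5 * x2 ** 0.5

def best_bundle_on_budget_line(p1, p2, m, steps=100_000):
    """Grid search over bundles that exhaust the budget p1*x1 + p2*x2 = m."""
    best_x1, best_u = 0.0, 0.0
    for i in range(1, steps):
        x1 = (m / p1) * i / steps
        x2 = (m - p1 * x1) / p2      # spend the remaining income on good 2
        u = utility(x1, x2)
        if u > best_u:
            best_x1, best_u = x1, u
    return best_x1, (m - p1 * best_x1) / p2, best_u

# Assumed prices and income, for illustration only.
p1, p2, m = 2.0, 5.0, 100.0
x1_grid, x2_grid, u_grid = best_bundle_on_budget_line(p1, p2, m)

print(x1_grid, m / (2 * p1))  # grid optimum vs. closed-form demand for good 1
print(x2_grid, m / (2 * p2))  # grid optimum vs. closed-form demand for good 2
```

A grid search is not how one would solve such problems in practice, but it is a simple sanity check on a hand-derived solution.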

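The firm's problem can be solved similarly:

Example 29. (Example 24 continued) The Lagrangian for the firm's problem is:

$$\mathcal{L} = rK + wL - \lambda\left(K^{1/3}L^{2/3} - Q\right).$$

First order conditions:

$$\frac{\partial \mathcal{L}}{\partial K} = r - \frac{\lambda}{3}K^{-2/3}L^{2/3} = 0 \qquad (3)$$

$$\frac{\partial \mathcal{L}}{\partial L} = w - \frac{2\lambda}{3}K^{1/3}L^{-1/3} = 0 \qquad (4)$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = Q - K^{1/3}L^{2/3} = 0 \qquad (5)$$

Taking the ratio of (3) and (4) one obtains:

$$\frac{r}{w} = \frac{L}{2K} \qquad (6)$$

Substituting for $K$ in (5):

$$Q = \left(\frac{w}{2r}L\right)^{1/3}L^{2/3}.$$

Solving this for $L$ gives:

$$L^c = \left(\frac{2r}{w}\right)^{1/3}Q.$$

Substituting $L^c$ into the ratio of first order conditions (6) to express $K$ in terms of parameters, one obtains:

$$K^c = \left(\frac{w}{2r}\right)^{2/3}Q.$$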

Note that the technique has been identical for both maximization and minimization problems. This means that the first order conditions identified so far are only necessary conditions and not sufficient conditions. We shall look at sufficient, or second order, conditions later.

The Lagrangian approach amounts to searching for points where:

• The constraint is satisfied.

• The constraint and the level curve of the objective function are tangent to one another.

If we have more than two variables, then the same intuition can be extended. For instance, with three variables, the Lagrangian conditions will say:

• The rate of substitution between any two variables along the objective function must equal the rate of substitution along the constraint.

• The optimum point must be on the constraint.

Intuition for the Lagrangian Method

Consider the simplest case of the maximization of a function of two variables subject to one constraint:

$$\max_{x_1, x_2} f(x_1, x_2) \quad \text{subject to} \quad g(x_1, x_2) = 0.$$

Suppose that point

$$x^* = \begin{pmatrix} x_1^* \\ x_2^* \end{pmatrix}$$

is a constrained maximum. Therefore any small feasible change in $x$ from this point, that is, a small movement along the constraint, should not be able to improve the value of the objective function. We represent small changes in $x = (x_1, x_2)^T$ by differential notation

$$dx = \begin{pmatrix} dx_1 \\ dx_2 \end{pmatrix}.$$

Then the first-order necessary conditions may be stated as follows:

$$f_{x_1}dx_1 + f_{x_2}dx_2 = 0 \qquad (7)$$

However, a feasible change in $x$ does not change the value of the constraint. That is, the constraint $g(x_1, x_2) = 0$ implies that

$$g_{x_1}dx_1 + g_{x_2}dx_2 = 0 \qquad (8)$$

and so $dx_1$ and $dx_2$ are no longer both arbitrary. We can take, e.g., $dx_1$ as arbitrary, but then $dx_2$ must be chosen to satisfy (8).

Taking the ratio of (7) and (8), it is clear that at the optimum

$$\frac{f_{x_1}}{g_{x_1}} = \frac{f_{x_2}}{g_{x_2}} \equiv \lambda.$$

The Lagrange-multiplier method yields the same first-order necessary condition and the Lagrange multiplier $\lambda$ makes sure that both (7) and (8) are simultaneously satisfied.

Economic Interpretation of the Lagrangian Multiplier

Note that we did not compute $\lambda$ in either the consumer's problem or the firm's problem. This is because our interest is in the values of $x_1$ and $x_2$ (or $K$ and $L$). However, in some instances, it is useful to compute $\lambda$: this has an economic interpretation in terms of the shadow price of the constraint.

Suppose we have the problem

$$\max_{x_1, ..., x_n} f(x_1, x_2, ..., x_n) \quad \text{subject to} \quad g(x_1, x_2, ..., x_n) = 0.$$

Suppose we now relax this constraint: instead of requiring $g(x_1, x_2, ..., x_n) = 0$, we require $g(x_1, x_2, ..., x_n) = \delta$ where $\delta$ is a small positive number. Clearly, since the constraint has been changed, the value of the objective function must change. The question is: by how much? The answer to this question is given by $\lambda$. For this reason, $\lambda$ is referred to as the shadow price of the constraint. It tells us the rate at which the objective function increases if the constraint is changed by a small amount.

Example 30. (Example 23 continued) In the consumer's utility maximization problem, we can compute

$$\lambda = \frac{1}{2p_1}x_1^{-1/2}x_2^{1/2} = \frac{1}{2p_1}\left(\frac{1}{2}\frac{m}{p_1}\right)^{-1/2}\left(\frac{1}{2}\frac{m}{p_2}\right)^{1/2} = \frac{1}{2(p_1p_2)^{1/2}}.$$

Thus, the shadow price of the constraint tells us that if we give a small amount of additional income to the consumer, then his utility will go up at the rate

$$\lambda = \frac{1}{2(p_1p_2)^{1/2}}$$

per unit of additional income. Thus $\lambda$ represents the marginal utility of income.
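The shadow-price interpretation can be checked numerically: differentiating the indirect utility function with respect to income should reproduce $\lambda$. This is a minimal sketch assuming illustrative prices and income; the step size `h` is also an assumption.

```python
def indirect_utility(p1, p2, m):
    """Utility at the optimal bundle x1 = m/(2*p1), x2 = m/(2*p2)."""
    x1 = m / (2 * p1)
    x2 = m / (2 * p2)
    return x1 ** 0.5 * x2 ** 0.5

# Assumed prices and income, for illustration only.
p1, p2, m = 2.0, 5.0, 100.0

# Lagrange multiplier from Example 30.
lam = 1.0 / (2.0 * (p1 * p2) ** 0.5)

# Central finite-difference estimate of dv/dm.
h = 1e-4
dv_dm = (indirect_utility(p1, p2, m + h) - indirect_utility(p1, p2, m - h)) / (2 * h)

print(dv_dm, lam)  # the two numbers should agree closely
```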

4.3 Second Order Conditions

As with the unconstrained case, we need to check the second-order conditions to ensure we have an optimum. As before, the second-order sufficient condition for a maximum is $d^2f < 0$ and for a minimum is $d^2f > 0$. However, because of the constraint, it is no longer sufficient to look at the Hessian of $f$ to verify these conditions.

Suppose we have a two-variable constrained optimization problem

$$\max_{x_1, x_2} f(x_1, x_2) \quad \text{or} \quad \min_{x_1, x_2} f(x_1, x_2) \quad \text{subject to} \quad g(x_1, x_2) = 0.$$

The second order conditions for this problem differ slightly from the usual conditions because of the constraint $g(x_1, x_2) = 0$, which implies that $dx_1$ and $dx_2$ must be chosen to satisfy (8). Thus the second-order sufficient condition for a maximum is that $d^2f < 0$ subject to (8) and the second-order sufficient condition for a minimum is that $d^2f > 0$ subject to (8).

In practice, to check the second-order sufficient conditions we need to compute the bordered Hessian matrix of the Lagrangian at the critical point that we want to check. The Lagrangian of the two-variable constrained optimization problem is

$$\mathcal{L}(x_1, x_2, \lambda) = f(x_1, x_2) + \lambda g(x_1, x_2).$$

The bordered Hessian is the 'usual Hessian', bordered by the derivatives of the constraint with respect to the endogenous variables, here $x_1$ and $x_2$. That is,

$$H_B = \begin{bmatrix} 0 & g_1 & g_2 \\ g_1 & \mathcal{L}_{11} & \mathcal{L}_{12} \\ g_2 & \mathcal{L}_{21} & \mathcal{L}_{22} \end{bmatrix}.$$

The second order conditions state:

• If $(x_1^*, x_2^*, \lambda^*)$ corresponds to a constrained maximum, then $|H_B|$ evaluated at $(x_1^*, x_2^*, \lambda^*)$ must be positive.

• If $(x_1^*, x_2^*, \lambda^*)$ corresponds to a constrained minimum, then $|H_B|$ evaluated at $(x_1^*, x_2^*, \lambda^*)$ must be negative.

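Example 31. (Example 23 continued) In the consumer's utility maximization problem,

$$H_B = \begin{bmatrix} 0 & p_1 & p_2 \\ p_1 & -\frac{1}{4}x_1^{-3/2}x_2^{1/2} & \frac{1}{4}x_1^{-1/2}x_2^{-1/2} \\ p_2 & \frac{1}{4}x_1^{-1/2}x_2^{-1/2} & -\frac{1}{4}x_1^{1/2}x_2^{-3/2} \end{bmatrix}.$$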
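To see the second order condition at work, the determinant of the bordered Hessian can be evaluated at the optimal bundle. The sketch below uses the second-order partials of $x_1^{1/2}x_2^{1/2}$ (the cross partial is $+\frac{1}{4}x_1^{-1/2}x_2^{-1/2}$) and assumed values for prices and income.

```python
def bordered_hessian(x1, x2, p1, p2):
    """Bordered Hessian for max x1^(1/2) x2^(1/2) s.t. p1*x1 + p2*x2 = m."""
    L11 = -0.25 * x1 ** -1.5 * x2 ** 0.5
    L22 = -0.25 * x1 ** 0.5 * x2 ** -1.5
    L12 = 0.25 * x1 ** -0.5 * x2 ** -0.5
    return [[0.0, p1, p2],
            [p1, L11, L12],
            [p2, L12, L22]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Assumed prices and income, for illustration only.
p1, p2, m = 2.0, 5.0, 100.0
x1_star, x2_star = m / (2 * p1), m / (2 * p2)   # demands from Example 28

det_HB = det3(bordered_hessian(x1_star, x2_star, p1, p2))
print(det_HB)  # positive, consistent with a constrained maximum
```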

In a general $n$-variable problem with $m$ ($m < n$) constraints, $(x^*, \lambda^*)$ that satisfies the first-order conditions is

• a local maximum if the last $(n - m)$ leading principal minors of $H_B$ alternate in sign beginning with that of $(-1)^{m+1}$;

• a local minimum if the last $(n - m)$ leading principal minors of $H_B$ are of the same sign as $(-1)^m$.

In both cases, $H_B$ must be evaluated at $(x^*, \lambda^*)$.

There are also some global results for equality-constrained problems:

• If $f(x_1, ..., x_n)$ is concave and all constraints are linear in $(x_1, ..., x_n)$, then a solution to the constrained maximization problem is a global maximum.

• If $f(x_1, ..., x_n)$ is convex and all constraints are linear in $(x_1, ..., x_n)$, then a solution to the constrained minimization problem is a global minimum.

5 The Envelope Theorem

We are interested in studying how the value function of an optimization problem changes when one of the parameters of the problem changes. A very powerful tool for such investigations is the envelope theorem.

5.1 The Envelope Theorem for Unconstrained Optimization

Suppose we have the unconstrained optimization problem

$$\max_{x_1, x_2} f(x_1, x_2; \alpha)$$

where $\alpha$ is some exogenous parameter. Suppose that $(x_1^*(\alpha), x_2^*(\alpha))$ solves this optimization problem. Note that the solution will depend upon $\alpha$. The value function for this problem is derived by substituting $(x_1^*(\alpha), x_2^*(\alpha))$ into the objective function:

$$V(\alpha) = f(x_1^*(\alpha), x_2^*(\alpha); \alpha).$$

Notice that the value function is a function of the parameter $\alpha$. Notice also that the value function depends on $\alpha$ in two different ways:

1. Direct dependence.

2. Indirect dependence through $x_1^*(\alpha)$ and $x_2^*(\alpha)$.

We are interested in knowing how the value function changes when $\alpha$ changes. When we differentiate the value function, we get:

$$\frac{dV}{d\alpha} = \frac{\partial f}{\partial x_1}\frac{\partial x_1^*}{\partial \alpha} + \frac{\partial f}{\partial x_2}\frac{\partial x_2^*}{\partial \alpha} + \frac{\partial f}{\partial \alpha},$$

where the partial derivatives of $f$ are evaluated at the solution $(x_1^*(\alpha), x_2^*(\alpha))$. Now note that at the optimum (assuming we have an interior solution), it must be the case that

$$\left.\frac{\partial f}{\partial x_1}\right|_{x_1 = x_1^*(\alpha),\; x_2 = x_2^*(\alpha)} = 0 \quad \text{and} \quad \left.\frac{\partial f}{\partial x_2}\right|_{x_1 = x_1^*(\alpha),\; x_2 = x_2^*(\alpha)} = 0.$$

Hence, the first two terms drop out and we have

$$\frac{dV}{d\alpha} = \frac{\partial f}{\partial \alpha},$$

where the partial derivative is evaluated at the point $(x_1^*(\alpha), x_2^*(\alpha))$. This result, which is called the Envelope Theorem, says in words: The total derivative of the value function with respect to the parameter $\alpha$ is the same as the partial derivative of the objective function evaluated at the optimal point.

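Example 32. Consider the unconstrained problem:

$$\max_{x_1, x_2}\; 4x_1 + \alpha x_2 - x_1^2 - x_2^2 + x_1x_2.$$

The first order conditions:

$$4 - 2x_1 + x_2 = 0$$

$$\alpha - 2x_2 + x_1 = 0.$$

Solving:

$$x_1^* = \frac{8 + \alpha}{3}, \qquad x_2^* = \frac{2\alpha + 4}{3}.$$

(We also need to check the second order conditions.)

Substituting $x_1^*$ and $x_2^*$ into the objective function, we obtain the value function:

$$V(\alpha) = 4\,\frac{8 + \alpha}{3} + \alpha\,\frac{2\alpha + 4}{3} - \frac{(8 + \alpha)^2}{9} - \frac{(2\alpha + 4)^2}{9} + \frac{(8 + \alpha)(2\alpha + 4)}{9}.$$

By the Envelope Theorem:

$$\frac{dV}{d\alpha} = x_2^* = \frac{2\alpha + 4}{3}.$$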
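The Envelope Theorem result of Example 32 can be confirmed numerically by differentiating the value function directly. This is a minimal sketch; the evaluation point `a = 1.0` and the step size `h` are arbitrary assumptions.

```python
def objective(x1, x2, a):
    """Objective function of Example 32 with parameter a (alpha)."""
    return 4 * x1 + a * x2 - x1 ** 2 - x2 ** 2 + x1 * x2

def solution(a):
    """Solution of the first order conditions as a function of a."""
    return (8 + a) / 3, (2 * a + 4) / 3

def value(a):
    """Value function: the objective evaluated at the optimal point."""
    x1, x2 = solution(a)
    return objective(x1, x2, a)

a, h = 1.0, 1e-5
dV_da = (value(a + h) - value(a - h)) / (2 * h)  # total derivative of V
x1_star, x2_star = solution(a)

# The partial derivative of the objective w.r.t. a is x2, evaluated at x2*.
print(dV_da, x2_star)
```

The total derivative of $V$ matches $x_2^*$ even though $x_1^*$ and $x_2^*$ both move with $\alpha$, which is exactly what the theorem asserts.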

5.2 The Envelope Theorem for Constrained Optimization

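Now consider the constrained case. We can basically do the same as before. Consider the problem,

$$\max_{x_1, x_2} f(x_1, x_2; \alpha) \quad \text{subject to} \quad g(x_1, x_2; \alpha) = 0.$$

The Lagrangian for this problem is,

$$\mathcal{L}(x_1, x_2, \lambda; \alpha) = f(x_1, x_2; \alpha) + \lambda g(x_1, x_2; \alpha).$$

Suppose that $(x_1^*(\alpha), x_2^*(\alpha), \lambda^*(\alpha))$ solves the constrained optimization problem. The value function for this problem is defined as:

$$V(\alpha) = f(x_1^*(\alpha), x_2^*(\alpha); \alpha).$$

Since the constraint holds at the solution, $g(x_1^*(\alpha), x_2^*(\alpha); \alpha) = 0$, we can write the value function as:

$$V(\alpha) = f(x_1^*(\alpha), x_2^*(\alpha); \alpha) + \lambda^*(\alpha)\, g(x_1^*(\alpha), x_2^*(\alpha); \alpha).$$

Differentiating with respect to $\alpha$:

$$\frac{dV}{d\alpha} = \frac{\partial f}{\partial x_1}\frac{\partial x_1^*}{\partial \alpha} + \frac{\partial f}{\partial x_2}\frac{\partial x_2^*}{\partial \alpha} + \frac{\partial f}{\partial \alpha} + \frac{d\lambda^*}{d\alpha}\, g(x_1^*(\alpha), x_2^*(\alpha); \alpha) + \lambda^*(\alpha)\left[\frac{\partial g}{\partial x_1}\frac{\partial x_1^*}{\partial \alpha} + \frac{\partial g}{\partial x_2}\frac{\partial x_2^*}{\partial \alpha} + \frac{\partial g}{\partial \alpha}\right],$$

where again all partial derivatives are evaluated at the solution $(x_1^*(\alpha), x_2^*(\alpha), \lambda^*(\alpha))$. This can be written as

$$\frac{dV}{d\alpha} = \left[\frac{\partial f}{\partial x_1} + \lambda^*\frac{\partial g}{\partial x_1}\right]\frac{\partial x_1^*}{\partial \alpha} + \left[\frac{\partial f}{\partial x_2} + \lambda^*\frac{\partial g}{\partial x_2}\right]\frac{\partial x_2^*}{\partial \alpha} + \frac{d\lambda^*}{d\alpha}\, g(x_1^*(\alpha), x_2^*(\alpha); \alpha) + \frac{\partial f}{\partial \alpha} + \lambda^*(\alpha)\frac{\partial g}{\partial \alpha}.$$

Note that the first two terms on the right hand side drop out because $(x_1^*, x_2^*, \lambda^*)$ must satisfy the necessary conditions for constrained optimization. The third term drops out because $g(x_1^*(\alpha), x_2^*(\alpha); \alpha) = 0$. We are left with the following:

$$\frac{dV}{d\alpha} = \frac{\partial f}{\partial \alpha} + \lambda^*\frac{\partial g}{\partial \alpha} = \frac{\partial \mathcal{L}}{\partial \alpha}(x_1^*, x_2^*, \lambda^*).$$

In words: The derivative of the value function with respect to the parameter $\alpha$ is the partial derivative of the Lagrangian function with respect to $\alpha$ evaluated at the solution $(x_1^*, x_2^*, \lambda^*)$.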

5.3 Extended Example: Firm’s Cost Minimization Problem

Suppose that a firm's production function is given by $f(K, L) = K^{1/3}L^{2/3}$, the price of capital is $r$ and the price of labor is $w$. What is the least cost way for the firm to produce $Q$ units of output?

The firm's cost minimization problem can be stated as:

$$\min_{K, L}\; rK + wL \quad \text{s.t.} \quad Q = K^{1/3}L^{2/3}.$$

The solution to the problem is a conditional input demand as a function of $r$, $w$ and $Q$, i.e., $K^c = K(r, w, Q)$ and $L^c = L(r, w, Q)$, while the objective function evaluated at the optimum is a cost function that gives the cost of producing the required level of output $Q$:

$$c(r, w, Q) = rK^c + wL^c.$$

The Lagrangian for the firm's problem is:

$$\mathcal{L} = rK + wL - \lambda\left(K^{1/3}L^{2/3} - Q\right).$$

First order conditions:

$$\frac{\partial \mathcal{L}}{\partial K} = r - \frac{\lambda}{3}K^{-2/3}L^{2/3} = 0 \qquad (9)$$

$$\frac{\partial \mathcal{L}}{\partial L} = w - \frac{2\lambda}{3}K^{1/3}L^{-1/3} = 0 \qquad (10)$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = Q - K^{1/3}L^{2/3} = 0 \qquad (11)$$

Taking the ratio of (9) and (10) one obtains:

$$\frac{r}{w} = \frac{L}{2K} \qquad (12)$$

Substituting for $K$ in (11):

$$Q = \left(\frac{w}{2r}L\right)^{1/3}L^{2/3}.$$

Solving this for $L$ gives:

$$L^c = \left(\frac{2r}{w}\right)^{1/3}Q.$$

Substituting $L^c$ into the ratio of first order conditions (12) to express $K$ in terms of parameters, one obtains:

$$K^c = \left(\frac{w}{2r}\right)^{2/3}Q.$$

The value function of the firm's cost minimization problem is called the cost function:

$$c(r, w, Q) = rK^c + wL^c = \left[r\left(\frac{w}{2r}\right)^{2/3} + w\left(\frac{2r}{w}\right)^{1/3}\right]Q.$$

The value function in this case depends on two price parameters, $(r, w)$. However, the Envelope Theorem is still applicable. For instance, if we want to know how the value function changes when $w$ changes, we simply treat $r$ as a constant. Thus, by the Envelope Theorem, differentiating $c(r, w, Q)$ with respect to $r$ and $w$ yields the conditional input demands:

$$\begin{pmatrix} \frac{\partial c}{\partial r} \\ \frac{\partial c}{\partial w} \end{pmatrix} = \begin{pmatrix} K^c \\ L^c \end{pmatrix}.$$

In producer theory this result is referred to as Shephard's Lemma. You can confirm that the above is exactly what you will get if you differentiate the value function directly.
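The final claim can be confirmed numerically. The sketch below implements the conditional input demands and the cost function from this example and compares finite-difference derivatives of the cost function with $K^c$ and $L^c$; the input prices, output level and step size are illustrative assumptions.

```python
def conditional_demands(r, w, Q):
    """Conditional input demands K^c and L^c derived in the example."""
    K = (w / (2 * r)) ** (2 / 3) * Q
    L = (2 * r / w) ** (1 / 3) * Q
    return K, L

def cost(r, w, Q):
    """Cost function c(r, w, Q) = r*K^c + w*L^c."""
    K, L = conditional_demands(r, w, Q)
    return r * K + w * L

# Assumed input prices and output level, for illustration only.
r, w, Q = 1.5, 2.0, 10.0
h = 1e-5

# Central finite differences of the cost function in each input price.
dc_dr = (cost(r + h, w, Q) - cost(r - h, w, Q)) / (2 * h)
dc_dw = (cost(r, w + h, Q) - cost(r, w - h, Q)) / (2 * h)

K_c, L_c = conditional_demands(r, w, Q)
print(dc_dr, K_c)  # derivative w.r.t. r vs. conditional demand for capital
print(dc_dw, L_c)  # derivative w.r.t. w vs. conditional demand for labor
```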

6 Integration

The fundamental theorem of calculus is a theorem that links the concept of the derivative of a function with the concept of the integral.

• The first part of the theorem, sometimes called the first fundamental theorem of calculus, is that an indefinite integration can be reversed by a differentiation. This part of the theorem is also important because it guarantees the existence of anti-derivatives for continuous functions.

• The second part, sometimes called the second fundamental theorem of calculus, is that the definite integral of a function can be computed by using any one of its infinitely many anti-derivatives. This part of the theorem has key practical applications because it markedly simplifies the computation of definite integrals.

Integration is useful in economics:

• In microeconomics, consumer surplus, i.e., the difference between what a consumer is willing to pay and what he actually pays, is an integral.

• In macroeconomics, a stock variable (e.g., capital) is an integral of a flow variable (e.g., investment).

• In finance, stock price or net present value is an integral of a dividend flow.

• In probability and statistics, moments of random variables are integrals.

There are two types of integrals:

• Indefinite integrals can be seen as "anti-derivatives" that recover the original function from the first derivative.

• Definite integrals calculate the area under a graph. In this form it is very similar to a sum, but of infinitely many, small parts.

6.1 Indefinite Integrals

We want to find a function $F(x)$ that differentiates to $f(x)$.

Example 33. Consider

$$f(x) = 3x^2.$$

In differentiation the Power Rule implies that if $F(x) = x^n$, then $F'(x) = nx^{n-1}$. So guess: $F(x) = x^3$, then $F'(x) = 3x^2 = f(x)$. Hence $F(x) = x^3$ is the

• anti-derivative

• primitive

• integral of $f(x)$

The first fundamental theorem of calculus: Let $f$ be a continuous real-valued function defined on a closed interval $[a, b]$. Let $F$ be the function defined, for all $x$ in $[a, b]$, by

$$F(x) = \int_a^x f(\tilde{x})\,d\tilde{x}.$$

Then, $F$ is continuous on $[a, b]$, differentiable on the open interval $(a, b)$, and

$$F'(x) = f(x),$$

for all $x$ in $(a, b)$.

In

$$F(x) = \int f(x)\,dx,$$

$f(x)$ is known as the integrand.

Is $F(x) = x^3$ the only anti-derivative of $f(x) = 3x^2$? No, as $\frac{d}{dx}(F(x) + c) = f(x)$ for any constant $c$. This arbitrary constant is called the constant of integration.

6.2 Rules of Integration

• Integration is linear:

$$\int (f(x) + g(x))\,dx = \int f(x)\,dx + \int g(x)\,dx.$$

• Power function rule: For $n \ne -1$

$$f(x) = ax^n \Rightarrow F(x) = \int f(x)\,dx = \frac{a}{n + 1}x^{n+1} + c.$$

Example 34.

$$f(x) = 3x^2 \Rightarrow F(x) = \int f(x)\,dx = x^3 + c$$

$$f(x) = 5 = 5x^0 \Rightarrow F(x) = \int f(x)\,dx = 5x + c$$

• There is no general product rule, but

$$\int k f(x)\,dx = k\int f(x)\,dx.$$

• Exponential Rule: Recall that $\frac{d}{dx}e^{kx} = ke^{kx}$. Then,

$$f(x) = ae^{kx} \Rightarrow F(x) = \int f(x)\,dx = \frac{a}{k}e^{kx} + c.$$

Example 35. $f(x) = 6e^{2x} \Rightarrow F(x) = \int f(x)\,dx = 3e^{2x} + c$

• Log Rule: Recall that $\frac{d}{dx}\ln(x) = \frac{1}{x}$. Then,

$$f(x) = \frac{1}{x} \Rightarrow F(x) = \int f(x)\,dx = \ln(x) + c.$$

Example 36. $f(x) = \frac{5}{x + 2} \Rightarrow F(x) = \int f(x)\,dx = 5\ln(x + 2) + c$

• The Substitution Rule: This technique operates through a "change of variable" which converts an intractable integral into a form where it can be solved.

$$\int f(u)\frac{du}{dx}\,dx = \int f(u)\,du = F(u) + c.$$

This is the "inverse" of the chain rule of differentiation.

Example 37. Find $\int 3x^2(x^3 + 1)\,dx$. Let $u = x^3$, so that $\frac{du}{dx} = 3x^2$; then

$$\int 3x^2(x^3 + 1)\,dx = \int \frac{du}{dx}(u + 1)\,dx = \int (u + 1)\,du \;\text{(by substitution rule)}\; = \frac{u^2}{2} + u + c = \frac{x^6}{2} + x^3 + c.$$

• Integration by Parts:

$$\int v\,du = uv - \int u\,dv.$$

This is a direct consequence of the product rule of differentiation. Recall that

$$(uv)' = u'v + uv'.$$

Integrating both sides of the above expression gives

$$\int (uv)'\,dx = \int u'v\,dx + \int uv'\,dx.$$

Since by definition of an integral $\int (uv)'\,dx = uv$,

$$\int u'v\,dx = uv - \int uv'\,dx.$$

The first term on the RHS is the product of the integral of $u'$ (that is, $u$) and $v$, and the second term is the integral of a product function which consists of the integral of $u'$ and the derivative of $v$.

Figure 13: A definite integral of f (x) over the interval [a, b] as an area under the curve.

Example 38. Find $\int \ln(x)\,dx$. Let $v = \ln(x)$, $u = x \Rightarrow dv = \frac{1}{x}dx$, $du = dx$, then

$$\int \ln(x)\,dx = \int v\,du = uv - \int u\,dv \;\text{(by integration by parts)}\; = x\ln(x) - \int 1\,dx = x\ln(x) - x + c.$$

6.3 Definite Integral

Let $f(x)$ be a continuous function on the interval $[a, b]$, where $a$ and $b$ are real numbers with $a < b$. A definite integral of $f(x)$ over the interval $[a, b]$ gives the area underneath the graph of the function between $a$ and $b$, where the parts below the x-axis are subtracted.

What is the area bounded by the curve $y = f(x)$, the vertical lines $x = a$ and $x = b$ and the x-axis? A first approximation of this area can be obtained by cutting the x-axis between $a$ and $b$ into intervals of equal length and thus creating rectangles of equal width, where the top right-hand corner touches the curve $y = f(x)$ (see yellow rectangles in Figure 14). Thus, if we split the interval $[a, b]$ into 5 subintervals $\{[x_0, x_1], [x_1, x_2], ..., [x_4, x_5]\}$, where $x_0 = a$ and $x_5 = b$, the sum of the rectangle areas is

$$(x_1 - x_0)f(x_1) + (x_2 - x_1)f(x_2) + ... + (x_5 - x_4)f(x_5) = \sum_{i=1}^{5}(x_i - x_{i-1})f(x_i).$$

Figure 14: A definite integral of f (x) over the interval [a, b] as a sum.

However, this method of estimating the area leads to some errors, whereby we either overestimate (as in Figure 14) or underestimate the area (in Figure 14 green rectangles underestimate the area as the height of the rectangle is the value of the function at the left-hand boundary of the subinterval). We can reduce these errors by creating many sub-intervals (see green rectangles in Figure 14). This suggests that a definite integral of $f(x)$ over the interval $[a, b]$ can be viewed as the limit of the sum of the areas of the rectangles as the size of each rectangle gets infinitesimally small and the number infinitely large.

The intuition that the integral is the area under the graph is sufficient for (almost) all economics and finance. For example, consumer surplus is an area under the demand curve in price/quantity space. In the macroeconomics and finance examples from Section 6, flow variables are graphed against time (i.e., with time on the x-axis).
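The rectangle approximation described above can be sketched in a few lines. The function `riemann_sum` below is a hypothetical helper written for this illustration: it uses equal-width subintervals and evaluates the function at either the right-hand or the left-hand boundary of each subinterval.

```python
import math

def riemann_sum(f, a, b, n, endpoint="right"):
    """Sum of n rectangle areas of equal width approximating the area under f.

    endpoint="right" uses f at the right boundary of each subinterval,
    endpoint="left" uses f at the left boundary.
    """
    width = (b - a) / n
    offset = 1 if endpoint == "right" else 0
    return sum(f(a + (i + offset) * width) * width for i in range(n))

# For the increasing function f(x) = x on [0, 1] the right-endpoint sum
# overestimates and the left-endpoint sum underestimates the area 1/2.
for n in (5, 50, 500):
    left = riemann_sum(lambda x: x, 0.0, 1.0, n, endpoint="left")
    right = riemann_sum(lambda x: x, 0.0, 1.0, n, endpoint="right")
    print(n, left, right)

# With many subintervals the sum for ln(x) on [1, e] is close to 1.
approx_log_area = riemann_sum(math.log, 1.0, math.e, 100_000)
print(approx_log_area)
```

As the number of subintervals grows, the two sums squeeze together, which is the limiting process the text describes.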

The second fundamental theorem of calculus: Let f and F be real-valued functions defined on a closed interval [a, b] such that the derivative of F is f, i.e., f and F are such that for all x in [a, b],

$$F'(x) = f(x).$$

Then,

$$\int_a^b f(x) \, dx = [F(x)]_a^b = F(b) - F(a).$$

As discussed earlier, if F(x) is an anti-derivative of f, then G(x) := F(x) + c is also an anti-derivative of f for any constant c. However, the value of the definite integral does not depend on the choice of the anti-derivative, as

$$G(b) - G(a) = F(b) + c - F(a) - c = F(b) - F(a).$$

So, in practical terms, we can ignore the constant term when evaluating definite integrals.

Process of calculating a definite integral:

1. Determine the indefinite integral F(x)

2. Evaluate F at the boundaries a and b

3. Subtract F(a) from F(b)

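Example 39. Find $\int_0^1 x \, dx$.

1. $F(x) = \int x \, dx = \frac{x^2}{2}$

2. $F(0) = \frac{0^2}{2} = 0$, $F(1) = \frac{1^2}{2} = \frac{1}{2}$

3. $\Rightarrow \int_0^1 x \, dx = \left[ \frac{x^2}{2} \right]_0^1 = \frac{1}{2} - 0 = \frac{1}{2}$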
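The three-step process can be mirrored in a few lines of Python. This is a minimal sketch, assuming the anti-derivative is supplied by hand; the names `antiderivative`, `shifted` and `definite_integral` are hypothetical and introduced only for illustration. Exact fractions are used so that the result matches the worked example, and a shifted anti-derivative F(x) + 7 shows that the constant cancels.

```python
from fractions import Fraction


def antiderivative(x):
    # Step 1: F(x) = x^2 / 2 is an anti-derivative of f(x) = x.
    return Fraction(x) ** 2 / 2


def shifted(x):
    # Another anti-derivative, G(x) = F(x) + 7 (the constant 7 is arbitrary).
    return antiderivative(x) + 7


def definite_integral(F, a, b):
    # Steps 2 and 3: evaluate F at both boundaries and subtract F(a) from F(b).
    return F(b) - F(a)


result = definite_integral(antiderivative, 0, 1)
result_shifted = definite_integral(shifted, 0, 1)
print(result, result_shifted)  # 1/2 1/2
```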

Properties of definite integral

1. $\int_b^a f(x) \, dx = F(a) - F(b) = -(F(b) - F(a)) = -\int_a^b f(x) \, dx$

2. $\int_a^a f(x) \, dx = F(a) - F(a) = 0$

3. $\int_a^c f(x) \, dx = \int_a^b f(x) \, dx + \int_b^c f(x) \, dx$ $(a < b < c)$

4. $\int_a^b k f(x) \, dx = k \int_a^b f(x) \, dx$

5. $\int_a^b [f(x) + g(x)] \, dx = \int_a^b f(x) \, dx + \int_a^b g(x) \, dx$
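These properties can be checked numerically. The sketch below is illustrative only: it uses a midpoint-rule approximation (a technique swapped in for the exact anti-derivative), and the test functions, limits and constant are assumptions. The properties hold exactly for the integral itself; the numerical version agrees up to a small tolerance.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f from a to b.

    The step is signed, so integrating from b to a flips the sign.
    """
    step = (b - a) / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step


def f(x):
    return math.exp(x)  # hypothetical test function


def g(x):
    return x ** 2  # hypothetical test function


a, b, c, k = 0.0, 1.0, 2.5, 3.0  # hypothetical limits and constant

checks = {
    "1. swapping the limits flips the sign": math.isclose(
        midpoint_integral(f, b, a), -midpoint_integral(f, a, b), rel_tol=1e-9
    ),
    "2. equal limits give zero": midpoint_integral(f, a, a) == 0.0,
    "3. adjacent intervals add up": math.isclose(
        midpoint_integral(f, a, c),
        midpoint_integral(f, a, b) + midpoint_integral(f, b, c),
        rel_tol=1e-6,
    ),
    "4. constants factor out": math.isclose(
        midpoint_integral(lambda x: k * f(x), a, b),
        k * midpoint_integral(f, a, b),
        rel_tol=1e-9,
    ),
    "5. sums split": math.isclose(
        midpoint_integral(lambda x: f(x) + g(x), a, b),
        midpoint_integral(f, a, b) + midpoint_integral(g, a, b),
        rel_tol=1e-9,
    ),
}

for name, ok in checks.items():
    print(name, ok)
```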

Example 40. Calculate $\int_9^4 \frac{1}{2\sqrt{x}} \, dx$:

$$\int_9^4 \frac{1}{2\sqrt{x}} \, dx = \left[ \sqrt{x} \right]_9^4 = \sqrt{4} - \sqrt{9} = 2 - 3 = -1.$$


Calculate $\int_4^9 \frac{1}{2\sqrt{x}} \, dx$:

$$\int_4^9 \frac{1}{2\sqrt{x}} \, dx = \left[ \sqrt{x} \right]_4^9 = \sqrt{9} - \sqrt{4} = 3 - 2 = 1.$$

Another example:

Example 41. Calculate $\int_1^e \ln(x) \, dx$:

$$\int_1^e \ln(x) \, dx = [x \ln(x) - x]_1^e = e \ln(e) - e - (1 \ln(1) - 1) = e - e - 0 + 1 = 1.$$
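As a sanity check on this example, the short Python sketch below first confirms numerically that F(x) = x ln(x) - x behaves like an anti-derivative of ln(x), then evaluates F(e) - F(1). The helper `central_difference` and its step size are assumptions made for illustration, not part of the notes.

```python
import math


def F(x):
    # Anti-derivative of ln(x) obtained by integration by parts.
    return x * math.log(x) - x


def central_difference(func, x, h=1e-6):
    # Numerical derivative, used only to check that F'(x) is close to ln(x).
    return (func(x + h) - func(x - h)) / (2 * h)


derivative_matches = all(
    math.isclose(central_difference(F, x), math.log(x), abs_tol=1e-6)
    for x in (1.0, 2.0, math.e)
)

value = F(math.e) - F(1.0)
print(derivative_matches, value)
```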

Sometimes we need to take integrals when the interval is not bounded. For example:

• Evaluating the present value of an infinite stream of benefits of a financial asset.

• Evaluating the consumer surplus of a constant elasticity demand function $q = a p^{\varepsilon}$, as this demand curve does not hit the y-axis.

In this case,

$$\int_a^{\infty} f(x) \, dx = \lim_{y \to \infty} F(y) - F(a).$$
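The limit can be illustrated by evaluating F(y) - F(a) for ever larger y. The sketch below uses the integrand f(x) = 1/x^2 on [1, ∞), which is an assumption chosen for illustration and does not appear in the notes; its anti-derivative is F(x) = -1/x, so the values should approach 1.

```python
def F(x):
    # Anti-derivative of the illustrative integrand f(x) = 1 / x**2.
    return -1.0 / x


def integral_up_to(y, a=1.0):
    # Definite integral of f from a to y, computed as F(y) - F(a).
    return F(y) - F(a)


for y in (10.0, 1_000.0, 1_000_000.0):
    print(y, integral_up_to(y))
```

Not every integrand behaves this way: the limit has to exist for the unbounded integral to be well defined.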

6.4 An Application: Continuous Compounding

In finance, the present value of an asset can be approximated as a definite integral. Consider a continuous stream of income c for T years. Since having a pound today is not the same as having it a year from now, we discount future income. If the discount rate is r, then the income c received t years into the future is worth

$$\frac{c}{(1 + r)^t}$$

in today’s terms. Thus, the present value of an asset paying c every year into the future is

$$PV = \frac{c}{(1 + r)^0} + \frac{c}{(1 + r)^1} + \dots + \frac{c}{(1 + r)^T}.$$
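The discrete sum is easy to compute directly. The following Python sketch assumes the standard annual discount factor 1/(1 + r)^t and payments at t = 0, 1, ..., T; the function names and the numbers c = 100, r = 0.05 are hypothetical. The closed form used for comparison is the usual geometric-series formula.

```python
import math


def present_value_discrete(c, r, T):
    """Present value of payments c at t = 0, 1, ..., T, each discounted by 1 / (1 + r)**t."""
    return sum(c / (1 + r) ** t for t in range(T + 1))


def present_value_closed_form(c, r, T):
    """Same sum via the geometric-series formula with ratio q = 1 / (1 + r)."""
    q = 1.0 / (1.0 + r)
    return c * (1.0 - q ** (T + 1)) / (1.0 - q)


c, r = 100.0, 0.05  # hypothetical yearly payment and discount rate
for T in (1, 10, 30):
    print(T, present_value_discrete(c, r, T), present_value_closed_form(c, r, T))
```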


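When time becomes ‘continuous’, it can be shown that the present value of an asset paying an amount c at time t into the future is $c e^{-rt}$. In this case, the present value of the asset is

$$\begin{aligned}
PV &= \int_0^T c e^{-rt} \, dt \\
   &= c \int_0^T e^{-rt} \, dt \\
   &= c \left[ -\frac{1}{r} e^{-rt} \right]_0^T \\
   &= \frac{c}{r} \left( 1 - e^{-rT} \right).
\end{aligned}$$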

Note that for an infinitely lived asset

$$PV = \int_0^{\infty} c e^{-rt} \, dt = \lim_{T \to \infty} \frac{c}{r} \left( 1 - e^{-rT} \right) = \frac{c}{r}.$$

This follows because, for r > 0, $e^{-rT}$ goes to zero as T becomes very large.
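The closed form and its limit can be checked numerically. The sketch below is a minimal illustration under assumed values c = 100 and r = 0.05; the function names are hypothetical, and the midpoint rule stands in for the exact integral. As T grows, the present value approaches c / r.

```python
import math


def pv_continuous(c, r, T):
    """Closed form (c / r) * (1 - exp(-r * T)) for a constant flow c over [0, T]."""
    return c / r * (1.0 - math.exp(-r * T))


def pv_numeric(c, r, T, n=100_000):
    """Midpoint-rule approximation of the integral of c * exp(-r * t) over [0, T]."""
    step = T / n
    return sum(c * math.exp(-r * (i + 0.5) * step) for i in range(n)) * step


c, r = 100.0, 0.05  # hypothetical income flow and discount rate

print("T = 10:", pv_continuous(c, r, 10), pv_numeric(c, r, 10))
for T in (10, 100, 1_000):
    print(T, pv_continuous(c, r, T))
print("limit c / r:", c / r)
```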
