
    The Kuhn-Tucker and Envelope Theorems

    Peter Ireland

EC720.01 - Math for Economists
Boston College, Department of Economics

    Fall 2012

The Kuhn-Tucker and envelope theorems can be used to characterize the solution to a wide range of constrained optimization problems: static or dynamic, and under perfect foresight or featuring randomness and uncertainty. In addition, these same two results provide foundations for the work on the maximum principle and dynamic programming that we will do later on. For both of these reasons, the Kuhn-Tucker and envelope theorems provide the starting point for our analysis. Let's consider each in turn, first in fairly general or abstract settings and then applied to some economic examples.

    1 The Kuhn-Tucker Theorem

    References:

Dixit, Chapters 2 and 3.

Simon-Blume, Chapters 18 and 19.

    Acemoglu, Appendix A.

    Consider a simple constrained optimization problem:

$x \in \mathbf{R}$: choice variable

$F: \mathbf{R} \to \mathbf{R}$: objective function, continuously differentiable

$c \geq G(x)$: constraint, with $c \in \mathbf{R}$ and $G: \mathbf{R} \to \mathbf{R}$, also continuously differentiable.

    The problem can be stated as:

$\max_x F(x)$ subject to $c \geq G(x)$

Copyright © 2012 by Peter Ireland. Redistribution is permitted for educational and research purposes, so long as no changes are made. All copies must be provided free of charge and must include this copyright notice.


This problem is simple because it is static and contains no random or stochastic elements that would force decisions to be made under uncertainty. This problem is also simple because it has a single choice variable and a single constraint. All these simplifications will make our statement and proof of the Kuhn-Tucker theorem as clean and intuitive as possible. But the results can be generalized along all of these dimensions and, throughout the semester, we will work through examples that do so.

Probably the easiest way to solve this problem is via the method of Lagrange multipliers. The mathematical foundations that allow for the application of this method are given to us by Lagrange's Theorem or, in its most general form, the Kuhn-Tucker Theorem.

    To prove this theorem, begin by defining the Lagrangian:

$L(x, \lambda) = F(x) + \lambda[c - G(x)]$

for any $x \in \mathbf{R}$ and $\lambda \in \mathbf{R}$.

Theorem (Kuhn-Tucker) Suppose that $x^*$ maximizes $F(x)$ subject to $c \geq G(x)$, where $F$ and $G$ are both continuously differentiable, and suppose that $G'(x^*) \neq 0$. Then there exists a value $\lambda^*$ of $\lambda$ such that $x^*$ and $\lambda^*$ satisfy the following four conditions:

$L_1(x^*, \lambda^*) = F'(x^*) - \lambda^* G'(x^*) = 0$, (1)

$L_2(x^*, \lambda^*) = c - G(x^*) \geq 0$, (2)

$\lambda^* \geq 0$, (3)

and

$\lambda^*[c - G(x^*)] = 0$. (4)

Proof: Consider two possible cases, depending on whether or not the constraint is binding at $x^*$.

    Case 1: Nonbinding Constraint.

If $c > G(x^*)$, then let $\lambda^* = 0$. Clearly, (2)-(4) are satisfied, so it only remains to show that (1) must hold. With $\lambda^* = 0$, (1) holds if and only if

$F'(x^*) = 0$. (5)

We can show that (5) must hold using a proof by contradiction. Suppose that instead of (5), it turns out that

$F'(x^*) < 0$.

Then, by the continuity of $F$ and $G$, there must exist an $\varepsilon > 0$ such that

$F(x^* - \varepsilon) > F(x^*)$ and $c > G(x^* - \varepsilon)$.


But this result contradicts the assumption that $x^*$ maximizes $F(x)$ subject to $c \geq G(x)$. Similarly, if it turns out that

$F'(x^*) > 0$,

then by the continuity of $F$ and $G$ there must exist an $\varepsilon > 0$ such that

$F(x^* + \varepsilon) > F(x^*)$ and $c > G(x^* + \varepsilon)$.

But, again, this result contradicts the assumption that $x^*$ maximizes $F(x)$ subject to $c \geq G(x)$. This establishes that (5) must hold, completing the proof for case 1.

    Case 2: Binding Constraint.

If $c = G(x^*)$, then let $\lambda^* = F'(x^*)/G'(x^*)$. This is possible, given the assumption that $G'(x^*) \neq 0$. Clearly, (1), (2), and (4) are satisfied, so it only remains to show that (3) must hold. With $\lambda^* = F'(x^*)/G'(x^*)$, (3) holds if and only if

$F'(x^*)/G'(x^*) \geq 0$. (6)

We can show that (6) must hold using a proof by contradiction. Suppose that instead of (6), it turns out that

$F'(x^*)/G'(x^*) < 0$.

One way that this can happen is if $F'(x^*) > 0$ and $G'(x^*) < 0$. But if these conditions hold, then the continuity of $F$ and $G$ implies the existence of an $\varepsilon > 0$ such that

$F(x^* + \varepsilon) > F(x^*)$ and $c = G(x^*) > G(x^* + \varepsilon)$,

which contradicts the assumption that $x^*$ maximizes $F(x)$ subject to $c \geq G(x)$. And if, instead, $F'(x^*)/G'(x^*) < 0$ because $F'(x^*) < 0$ and $G'(x^*) > 0$, then the continuity of $F$ and $G$ implies the existence of an $\varepsilon > 0$ such that

$F(x^* - \varepsilon) > F(x^*)$ and $c = G(x^*) > G(x^* - \varepsilon)$,

which again contradicts the assumption that $x^*$ maximizes $F(x)$ subject to $c \geq G(x)$. This establishes that (6) must hold, completing the proof for case 2.

    Notes:

a) The theorem can be extended to handle cases with more than one choice variable and more than one constraint: see Dixit, Simon-Blume, Acemoglu, or section 4.1 of the notes below.

b) Equations (1)-(4) are necessary conditions: if $x^*$ is a solution to the optimization problem, then there exists a $\lambda^*$ such that (1)-(4) must hold. But (1)-(4) are not sufficient conditions: if $x^*$ and $\lambda^*$ satisfy (1)-(4), it does not follow automatically that $x^*$ is a solution to the optimization problem.


Despite point (b) listed above, the Kuhn-Tucker theorem is extremely useful in practice. Suppose that we are looking for the solution $x^*$ to the constrained optimization problem

$\max_x F(x)$ subject to $c \geq G(x)$.

The theorem tells us that if we form the Lagrangian

$L(x, \lambda) = F(x) + \lambda[c - G(x)]$,

then $x^*$ and the associated $\lambda^*$ must satisfy the first-order condition (FOC) obtained by differentiating $L$ by $x$ and setting the result equal to zero:

$L_1(x^*, \lambda^*) = F'(x^*) - \lambda^* G'(x^*) = 0$. (1)

In addition, we know that $x^*$ must satisfy the constraint:

$c \geq G(x^*)$. (2)

We know that the Lagrange multiplier $\lambda^*$ must be nonnegative:

$\lambda^* \geq 0$. (3)

And finally, we know that the complementary slackness condition

$\lambda^*[c - G(x^*)] = 0$ (4)

must hold: if $\lambda^* > 0$, then the constraint must bind; if the constraint does not bind, then $\lambda^* = 0$.

In searching for the value of $x$ that solves the constrained optimization problem, we only need to consider values of $x$ that satisfy (1)-(4).

    Two pieces of terminology:

a) The extra assumption that $G'(x^*) \neq 0$ is needed to guarantee the existence of a multiplier $\lambda^*$ satisfying (1)-(4). This extra assumption is called the constraint qualification, and almost always holds in practice.

b) Note that (1) is a FOC for $x$, while (2) is like a FOC for $\lambda$. In most applications, the second-order conditions (SOC) will imply that $x^*$ maximizes $L(x, \lambda^*)$, while $\lambda^*$ minimizes $L(x^*, \lambda)$. For this reason, $(x^*, \lambda^*)$ is typically a saddle-point of $L(x, \lambda)$.

Thus, in solving the problem in this way, we are using the Lagrangian to turn a constrained optimization problem into an unconstrained optimization problem, where we seek to maximize $L(x, \lambda)$ rather than simply $F(x)$.
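As a concrete illustration of how conditions (1)-(4) pin down candidate solutions, here is a minimal numerical sketch (not part of the original notes; it assumes Python with numpy and scipy available) for the illustrative choices $F(x) = \ln x$, $c = 1$, and $G(x) = x$, where the constraint binds and the proof's construction gives $\lambda^* = F'(x^*)/G'(x^*) = 1$:

import numpy as np
from scipy.optimize import minimize

# Illustrative problem (my own choice of functional forms, not from the notes):
# maximize F(x) = ln(x) subject to c >= G(x), with c = 1 and G(x) = x.
c = 1.0
res = minimize(lambda x: -np.log(x[0]),          # scipy minimizes, so pass -F
               x0=[0.5], bounds=[(1e-6, None)],
               constraints=[{'type': 'ineq', 'fun': lambda x: c - x[0]}])
x_star = res.x[0]

lam = (1.0 / x_star) / 1.0                       # lambda* = F'(x*)/G'(x*)
print(x_star, lam)                               # both approximately 1
print(1.0 / x_star - lam * 1.0)                  # condition (1): F' - lam*G' = 0
print(c - x_star >= -1e-8, lam >= 0.0)           # conditions (2) and (3)
print(lam * (c - x_star))                        # condition (4): approximately 0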

    One final note:


Our general constraint, $c \geq G(x)$, nests as a special case the nonnegativity constraint $x \geq 0$, obtained by setting $c = 0$ and $G(x) = -x$.

So nonnegativity constraints can be introduced into the Lagrangian in the same way as all other constraints. If we consider, for example, the extended problem

$\max_x F(x)$ subject to $c \geq G(x)$ and $x \geq 0$,

then we can introduce a second multiplier $\mu$, form the Lagrangian as

$L(x, \lambda, \mu) = F(x) + \lambda[c - G(x)] + \mu x$,

and write the first-order condition for the optimal $x^*$ as

$L_1(x^*, \lambda^*, \mu^*) = F'(x^*) - \lambda^* G'(x^*) + \mu^* = 0$. (1′)

In addition, analogs to our earlier conditions (2)-(4) must also hold for the second constraint: $x^* \geq 0$, $\mu^* \geq 0$, and $\mu^* x^* = 0$.

Kuhn and Tucker's original statement of the theorem, however, does not incorporate nonnegativity constraints into the Lagrangian. Instead, even with the additional nonnegativity constraint $x \geq 0$, they continue to define the Lagrangian as

$L(x, \lambda) = F(x) + \lambda[c - G(x)]$.

In this case, the first-order condition for $x^*$ must be modified to read

$L_1(x^*, \lambda^*) = F'(x^*) - \lambda^* G'(x^*) \leq 0$, with equality if $x^* > 0$. (1′′)

Of course, in (1′), $\mu^* \geq 0$ in general and $\mu^* = 0$ if $x^* > 0$. So a close inspection reveals that these two approaches to handling nonnegativity constraints lead in the end to the same results.
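To see that the two treatments agree in a concrete case, consider this toy numerical sketch (my own example, assuming Python with scipy): maximize $F(x) = -(x+1)^2$ subject to $x \geq 0$, so the constraint binds at $x^* = 0$ with $F'(x^*) = -2 \leq 0$, as (1′′) requires, and the Lagrangian treatment recovers $\mu^* = 2$:

from scipy.optimize import minimize

# Toy check (my own example) that the two treatments of x >= 0 agree:
# maximize F(x) = -(x + 1)**2 subject to x >= 0. The unconstrained optimum
# is x = -1, so the nonnegativity constraint binds and x* = 0.
res = minimize(lambda x: (x[0] + 1.0)**2, x0=[1.0], bounds=[(0.0, None)])
x_star = res.x[0]
Fp = -2.0 * (x_star + 1.0)        # F'(x*) = -2

# Kuhn-Tucker's original form (1''): F'(x*) <= 0, with equality if x* > 0.
print(x_star, Fp <= 1e-8)
# Lagrangian form (1') with no other constraint: F'(x*) + mu* = 0,
# together with mu* >= 0 and mu* x* = 0.
mu = -Fp
print(mu >= 0.0, mu * x_star)     # mu* = 2 and the slackness product is 0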

    2 The Envelope Theorem

    References:

    Dixit, Chapter 5.

    Simon-Blume, Chapter 19.

    Acemoglu, Appendix A.

In our discussion of the Kuhn-Tucker theorem, we considered an optimization problem of the form

$\max_x F(x)$ subject to $c \geq G(x)$.

Now, let's generalize the problem by allowing the functions $F$ and $G$ to depend on a parameter $\theta \in \mathbf{R}$. The problem can now be stated as

$\max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$


For this problem, define the maximum value function $V: \mathbf{R} \to \mathbf{R}$ as

$V(\theta) = \max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$

Note that evaluating $V$ requires a two-step procedure:

First, given $\theta$, find the value of $x$ that solves the constrained optimization problem.

Second, substitute this value of $x$, together with the given value of $\theta$, into the objective function to obtain

$V(\theta) = F(x^*, \theta)$

Now suppose that we want to investigate the properties of this function $V$. Suppose, in particular, that we want to take the derivative of $V$ with respect to its argument $\theta$.

As the first step in evaluating $V'(\theta)$, consider solving the constrained optimization problem for any given value of $\theta$ by setting up the Lagrangian

$L(x, \lambda) = F(x, \theta) + \lambda[c - G(x, \theta)]$

We know from the Kuhn-Tucker theorem that the solution $x^*$ to the optimization problem and the associated value $\lambda^*$ of the multiplier must satisfy the complementary slackness condition:

$\lambda^*[c - G(x^*, \theta)] = 0$

Use this last result to rewrite the expression for $V$ as

$V(\theta) = F(x^*, \theta) = F(x^*, \theta) + \lambda^*[c - G(x^*, \theta)]$

So suppose that we tried to calculate $V'(\theta)$ simply by differentiating both sides of this equation with respect to $\theta$:

$V'(\theta) = F_2(x^*, \theta) - \lambda^* G_2(x^*, \theta)$.

But, in principle, this formula may not be correct. The reason is that $x^*$ and $\lambda^*$ will themselves depend on the parameter $\theta$, and we must take this dependence into account when differentiating $V$ with respect to $\theta$.

However, the envelope theorem tells us that our formula for $V'(\theta)$ is, in fact, correct. That is, the envelope theorem tells us that we can ignore the dependence of $x^*$ and $\lambda^*$ on $\theta$ in calculating $V'(\theta)$.

To see why, for any $\theta$, let $x^*(\theta)$ denote the solution to the problem: $\max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$, and let $\lambda^*(\theta)$ be the associated Lagrange multiplier.


Theorem (Envelope) Let $F$ and $G$ be continuously differentiable functions of $x$ and $\theta$. For any given $\theta$, let $x^*(\theta)$ maximize $F(x, \theta)$ subject to $c \geq G(x, \theta)$, and let $\lambda^*(\theta)$ be the associated value of the Lagrange multiplier. Suppose, further, that $x^*(\theta)$ and $\lambda^*(\theta)$ are also continuously differentiable functions, and that the constraint qualification $G_1[x^*(\theta), \theta] \neq 0$ holds for all values of $\theta$. Then the maximum value function defined by

$V(\theta) = \max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$

satisfies

$V'(\theta) = F_2[x^*(\theta), \theta] - \lambda^*(\theta) G_2[x^*(\theta), \theta]$. (7)

Proof: The Kuhn-Tucker theorem tells us that for any given value of $\theta$, $x^*(\theta)$ and $\lambda^*(\theta)$ must satisfy

$L_1[x^*(\theta), \lambda^*(\theta)] = F_1[x^*(\theta), \theta] - \lambda^*(\theta) G_1[x^*(\theta), \theta] = 0$ (1)

and

$\lambda^*(\theta)\{c - G[x^*(\theta), \theta]\} = 0$. (4)

In light of (4),

$V(\theta) = F[x^*(\theta), \theta] = F[x^*(\theta), \theta] + \lambda^*(\theta)\{c - G[x^*(\theta), \theta]\}$

Differentiating both sides of this expression with respect to $\theta$ yields

$V'(\theta) = F_1[x^*(\theta), \theta]\, x^{*\prime}(\theta) + F_2[x^*(\theta), \theta] + \lambda^{*\prime}(\theta)\{c - G[x^*(\theta), \theta]\} - \lambda^*(\theta) G_1[x^*(\theta), \theta]\, x^{*\prime}(\theta) - \lambda^*(\theta) G_2[x^*(\theta), \theta]$,

which shows that, in principle, we must take the dependence of $x^*$ and $\lambda^*$ on $\theta$ into account when calculating $V'(\theta)$.

Note, however, that

$V'(\theta) = \{F_1[x^*(\theta), \theta] - \lambda^*(\theta) G_1[x^*(\theta), \theta]\}\, x^{*\prime}(\theta) + F_2[x^*(\theta), \theta] + \lambda^{*\prime}(\theta)\{c - G[x^*(\theta), \theta]\} - \lambda^*(\theta) G_2[x^*(\theta), \theta]$,

which by (1) reduces to

$V'(\theta) = F_2[x^*(\theta), \theta] + \lambda^{*\prime}(\theta)\{c - G[x^*(\theta), \theta]\} - \lambda^*(\theta) G_2[x^*(\theta), \theta]$

Thus, it only remains to show that

$\lambda^{*\prime}(\theta)\{c - G[x^*(\theta), \theta]\} = 0$ (8)

Clearly, (8) holds for any $\theta$ such that the constraint is binding.


For $\theta$ such that the constraint is not binding, (4) implies that $\lambda^*(\theta)$ must equal zero. Furthermore, by the continuity of $G$ and $x^*$, if the constraint does not bind at $\theta$, there exists an $\bar{\varepsilon} > 0$ such that the constraint does not bind for all $\theta + \varepsilon$ with $\bar{\varepsilon} > |\varepsilon|$. Hence, (4) also implies that $\lambda^*(\theta + \varepsilon) = 0$ for all $\bar{\varepsilon} > |\varepsilon|$. Using the definition of the derivative,

$\lambda^{*\prime}(\theta) = \lim_{\varepsilon \to 0} \dfrac{\lambda^*(\theta + \varepsilon) - \lambda^*(\theta)}{\varepsilon} = \lim_{\varepsilon \to 0} \dfrac{0}{\varepsilon} = 0$,

it once again becomes apparent that (8) must hold.

Thus,

$V'(\theta) = F_2[x^*(\theta), \theta] - \lambda^*(\theta) G_2[x^*(\theta), \theta]$,

as claimed in the theorem.
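Here is a small numerical check of (7), using illustrative functional forms of my own choosing (they do not appear in the notes): $F(x, \theta) = \ln x$ and $G(x, \theta) = \theta x$ with $c = 1$, so that analytically $x^*(\theta) = 1/\theta$, $\lambda^*(\theta) = 1$, and $V(\theta) = -\ln \theta$:

import numpy as np
from scipy.optimize import minimize

# Numerical check of (7) under illustrative assumptions (my own choices):
# F(x, theta) = ln(x), G(x, theta) = theta * x, c = 1. Analytically,
# x*(theta) = 1/theta, lambda*(theta) = F_1/G_1 = 1, V(theta) = -ln(theta).
def x_star(theta):
    res = minimize(lambda x: -np.log(x[0]), x0=[0.1], bounds=[(1e-6, None)],
                   constraints=[{'type': 'ineq',
                                 'fun': lambda x: 1.0 - theta * x[0]}])
    return res.x[0]

V = lambda t: np.log(x_star(t))            # the maximum value function

theta, h = 2.0, 1e-4
dV_numeric = (V(theta + h) - V(theta - h)) / (2.0 * h)
dV_envelope = 0.0 - 1.0 * x_star(theta)    # F_2 = 0, lambda* = 1, G_2 = x*
print(dV_numeric, dV_envelope)             # both approximately -0.5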

Once again, this theorem is useful because it tells us that we can ignore the dependence of $x^*$ and $\lambda^*$ on $\theta$ in calculating $V'(\theta)$.

And once again, the theorem can be extended to apply in more general settings: see Dixit, Simon-Blume, Acemoglu, or section 4.2 of the notes below.

But what is the intuition for why the envelope theorem holds? To obtain some intuition, begin by considering the simpler, unconstrained optimization problem:

$\max_x F(x, \theta)$,

where $x$ is the choice variable and $\theta$ is the parameter.

Associated with this unconstrained problem, define the maximum value function in the same way as before:

$V(\theta) = \max_x F(x, \theta)$.

To evaluate $V$ for any given value of $\theta$, use the same two-step procedure as before. First, find the value $x^*(\theta)$ that solves the unconstrained maximization problem for that value of $\theta$. Second, substitute that value of $x$ back into the objective function to obtain

$V(\theta) = F[x^*(\theta), \theta]$.

Now differentiate both sides of this expression through by $\theta$, carefully taking the dependence of $x^*$ on $\theta$ into account:

$V'(\theta) = F_1[x^*(\theta), \theta]\, x^{*\prime}(\theta) + F_2[x^*(\theta), \theta]$.

But, if $x^*(\theta)$ is the value of $x$ that maximizes $F$ given $\theta$, we know that $x^*(\theta)$ must be a critical point of $F$:

$F_1[x^*(\theta), \theta] = 0$.


Hence, for the unconstrained problem, the envelope theorem implies that

$V'(\theta) = F_2[x^*(\theta), \theta]$,

so that, again, we can ignore the dependence of $x^*$ on $\theta$ in differentiating the maximum value function. And this result holds not because $x^*$ fails to depend on $\theta$: to the contrary, in fact, $x^*$ will typically depend on $\theta$ through the function $x^*(\theta)$. Instead, the result holds because, since $x$ is chosen optimally, $x^*(\theta)$ is a critical point of $F$ given $\theta$.

Now return to the constrained optimization problem

$\max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$

and define the maximum value function as before:

$V(\theta) = \max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$.

The envelope theorem for this constrained problem tells us that we can also ignore the dependence of $x^*$ on $\theta$ when differentiating $V$ with respect to $\theta$, but only if we start by adding the complementary slackness condition to the maximized objective function to first obtain

$V(\theta) = F[x^*(\theta), \theta] + \lambda^*(\theta)\{c - G[x^*(\theta), \theta]\}$.

In taking this first step, we are actually evaluating the entire Lagrangian at the optimum, instead of just the objective function. We need to take this first step because, for the constrained problem, the Kuhn-Tucker condition (1) tells us that $x^*(\theta)$ is a critical point, not of the objective function by itself, but of the entire Lagrangian formed by adding the product of the multiplier and the constraint to the objective function.

And what gives the envelope theorem its name? The envelope theorem refers to a geometrical presentation of the same result that we've just worked through.

To see where that geometrical interpretation comes from, consider again the simpler, unconstrained optimization problem:

$\max_x F(x, \theta)$,

where $x$ is the choice variable and $\theta$ is a parameter.

Following along with our previous notation, let $x^*(\theta)$ denote the solution to this problem for any given value of $\theta$, so that the function $x^*(\theta)$ tells us how the optimal choice of $x$ depends on the parameter $\theta$.

Also, continue to define the maximum value function $V$ in the same way as before:

$V(\theta) = \max_x F(x, \theta)$.


Now let $\theta_1$ denote a particular value of $\theta$, and let $x_1$ denote the optimal value of $x$ associated with this particular value $\theta_1$. That is, let

$x_1 = x^*(\theta_1)$.

After substituting this value $x_1$ into the function $F$, we can think about how $F(x_1, \theta)$ varies as $\theta$ varies; that is, we can think about $F(x_1, \theta)$ as a function of $\theta$, holding $x_1$ fixed.

In the same way, let $\theta_2$ denote another particular value of $\theta$, with $\theta_2 > \theta_1$, let's say. And following the same steps as above, let $x_2$ denote the optimal value of $x$ associated with this particular value $\theta_2$, so that

$x_2 = x^*(\theta_2)$.

Once again, we can hold $x_2$ fixed and consider $F(x_2, \theta)$ as a function of $\theta$.

The geometrical presentation of the envelope theorem can be derived by thinking about the properties of these three functions of $\theta$: $V(\theta)$, $F(x_1, \theta)$, and $F(x_2, \theta)$.

One thing that we know about these three functions is that for $\theta = \theta_1$:

$V(\theta_1) = F(x_1, \theta_1) > F(x_2, \theta_1)$,

where the first equality and the second inequality both follow from the fact that, by definition, $x_1$ maximizes $F(x, \theta_1)$ by choice of $x$.

Another thing that we know about these three functions is that for $\theta = \theta_2$:

$V(\theta_2) = F(x_2, \theta_2) > F(x_1, \theta_2)$,

because again, by definition, $x_2$ maximizes $F(x, \theta_2)$ by choice of $x$.

On a graph, these relationships imply that:

At $\theta_1$, $V(\theta)$ coincides with $F(x_1, \theta)$, which lies above $F(x_2, \theta)$.

At $\theta_2$, $V(\theta)$ coincides with $F(x_2, \theta)$, which lies above $F(x_1, \theta)$.

And we could find more and more values of $V$ by repeating this procedure for more and more specific values of $\theta_i$, $i = 1, 2, 3, \ldots$.

    In other words:

$V(\theta)$ traces out the upper envelope of the collection of functions $F(x_i, \theta)$, formed by holding $x_i = x^*(\theta_i)$ fixed and varying $\theta$.

Moreover, $V(\theta)$ is tangent to each individual function $F(x_i, \theta)$ at the value $\theta_i$ of $\theta$ for which $x_i$ is optimal, or equivalently:

$V'(\theta) = F_2[x^*(\theta), \theta]$,

which is the same analytical result that we derived earlier for the unconstrained optimization problem.


[Figure: The Envelope Theorem. $V(\theta)$ traces out the upper envelope of the curves $F(x_1, \theta)$ and $F(x_2, \theta)$, coinciding with $F(x_1, \theta)$ at $\theta_1$ and with $F(x_2, \theta)$ at $\theta_2$.]
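The same picture can be checked numerically; the sketch below uses an illustrative objective of my own choosing, $F(x, \theta) = -(x - \theta)^2 + \theta^2/2$, for which $x^*(\theta) = \theta$ and $V(\theta) = \theta^2/2$, and verifies that $V$ lies weakly above each fixed-$x_i$ curve and touches it exactly at $\theta_i = x_i$:

import numpy as np

# A numerical version of the envelope picture, with the illustrative (my own)
# objective F(x, theta) = -(x - theta)**2 + theta**2 / 2, whose unconstrained
# maximizer is x*(theta) = theta, so V(theta) = theta**2 / 2.
F = lambda x, theta: -(x - theta)**2 + theta**2 / 2.0
V = lambda theta: theta**2 / 2.0

thetas = np.linspace(0.0, 2.0, 201)
for x_i in (0.5, 1.0, 1.5):            # hold x_i = x*(theta_i) fixed, vary theta
    gap = V(thetas) - F(x_i, thetas)   # V lies weakly above every F(x_i, .)
    assert gap.min() >= -1e-12
    print(x_i, thetas[np.argmin(gap)]) # the curves touch exactly at theta_i = x_i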


    To generalize these arguments so that they apply to the constrained optimization problem

$\max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$,

simply use the fact that in most cases (where the appropriate second-order conditions hold) the value $x^*(\theta)$ that solves the constrained optimization problem for any given value of $\theta$ also maximizes the Lagrangian function

$L(x, \lambda, \theta) = F(x, \theta) + \lambda[c - G(x, \theta)]$,

    so that

$V(\theta) = \max_x F(x, \theta)$ subject to $c \geq G(x, \theta)$

$\qquad = \max_x L(x, \lambda^*(\theta), \theta)$

Now just replace the function $F$ with the function $L$ in working through the arguments from above to conclude that

$V'(\theta) = L_3[x^*(\theta), \lambda^*(\theta), \theta] = F_2[x^*(\theta), \theta] - \lambda^*(\theta) G_2[x^*(\theta), \theta]$,

which is again the same result that we derived before for the constrained optimization problem.

    3 Two Examples

    3.1 Utility Maximization

A consumer has a utility function defined over consumption of two goods: $U(c_1, c_2)$

Prices: $p_1$ and $p_2$

Income: $I$

Budget constraint: $I \geq p_1 c_1 + p_2 c_2 = G(c_1, c_2)$

The consumer's problem is:

$\max_{c_1, c_2} U(c_1, c_2)$ subject to $I \geq p_1 c_1 + p_2 c_2$

The Kuhn-Tucker theorem tells us that if we set up the Lagrangian:

$L(c_1, c_2, \lambda) = U(c_1, c_2) + \lambda(I - p_1 c_1 - p_2 c_2)$

then the optimal consumptions $c_1^*$ and $c_2^*$ and the associated multiplier $\lambda^*$ must satisfy the FOC:

$L_1(c_1^*, c_2^*, \lambda^*) = U_1(c_1^*, c_2^*) - \lambda^* p_1 = 0$

and

$L_2(c_1^*, c_2^*, \lambda^*) = U_2(c_1^*, c_2^*) - \lambda^* p_2 = 0$


Move the terms with minus signs to the other side, and divide the first of these FOC by the second to obtain

$\dfrac{U_1(c_1^*, c_2^*)}{U_2(c_1^*, c_2^*)} = \dfrac{p_1}{p_2}$,

which is just the familiar condition that says that the optimizing consumer should set the slope of his or her indifference curve, the marginal rate of substitution, equal to the slope of his or her budget constraint, the ratio of prices.

Now consider $I$ as one of the model's parameters, and let the functions $c_1^*(I)$, $c_2^*(I)$, and $\lambda^*(I)$ describe how the optimal choices $c_1^*$ and $c_2^*$ and the associated value $\lambda^*$ of the multiplier depend on $I$.

In addition, define the maximum value function as

$V(I) = \max_{c_1, c_2} U(c_1, c_2)$ subject to $I \geq p_1 c_1 + p_2 c_2$

The Kuhn-Tucker theorem tells us that

$\lambda^*(I)[I - p_1 c_1^*(I) - p_2 c_2^*(I)] = 0$

and hence

$V(I) = U[c_1^*(I), c_2^*(I)] = U[c_1^*(I), c_2^*(I)] + \lambda^*(I)[I - p_1 c_1^*(I) - p_2 c_2^*(I)]$.

The envelope theorem tells us that we can ignore the dependence of $c_1^*$ and $c_2^*$ on $I$ in calculating

$V'(I) = \lambda^*(I)$,

which gives us an interpretation of the multiplier $\lambda^*$ as the marginal utility of income.
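The interpretation of $\lambda^*$ as the marginal utility of income is easy to verify numerically. The sketch below assumes an illustrative Cobb-Douglas utility of my own choosing, $U(c_1, c_2) = a \ln c_1 + (1-a) \ln c_2$, for which $c_1^*(I) = aI/p_1$ and $\lambda^*(I) = 1/I$ can be computed by hand:

import numpy as np
from scipy.optimize import minimize

# Numerical check that V'(I) = lambda*(I), using an illustrative Cobb-Douglas
# utility (my own choice): U(c1, c2) = a ln(c1) + (1 - a) ln(c2).
# Analytically, c1*(I) = a I / p1, c2*(I) = (1 - a) I / p2, lambda*(I) = 1/I.
a, p1, p2 = 0.3, 2.0, 5.0

def demands(I):
    res = minimize(lambda c: -(a * np.log(c[0]) + (1 - a) * np.log(c[1])),
                   x0=[I / (2 * p1), I / (2 * p2)],
                   bounds=[(1e-6, None)] * 2,
                   constraints=[{'type': 'ineq',
                                 'fun': lambda c: I - p1 * c[0] - p2 * c[1]}])
    return res.x

I, h = 10.0, 1e-3
c1, c2 = demands(I)
print(c1, a * I / p1)                        # demand matches a*I/p1 = 1.5
print((a / c1) / p1)                         # lambda* = U_1/p1 = 1/I = 0.1

V = lambda I: a * np.log(demands(I)[0]) + (1 - a) * np.log(demands(I)[1])
print((V(I + h) - V(I - h)) / (2 * h))       # V'(I), also approximately 0.1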

    3.2 Cost Minimization

The Kuhn-Tucker and envelope conditions can also be used to study constrained minimization problems.

Consider a firm that produces output $y$ using capital $k$ and labor $l$, according to the technology described by

$f(k, l) \geq y$.

$r$ = rental rate for capital

$w$ = wage rate

Suppose that the firm takes its output $y$ as given, and chooses inputs $k$ and $l$ to minimize costs. Then the firm solves

$\min_{k, l} rk + wl$ subject to $f(k, l) \geq y$


If we set up the Lagrangian as

$L(k, l, \lambda) = rk + wl - \lambda[f(k, l) - y]$,

where the term involving the multiplier $\lambda$ is subtracted rather than added in the case of a minimization problem, the Kuhn-Tucker conditions (1)-(4) continue to apply, exactly as before.

Thus, according to the Kuhn-Tucker theorem, the optimal choices $k^*$ and $l^*$ and the associated multiplier $\lambda^*$ must satisfy the FOC:

$L_1(k^*, l^*, \lambda^*) = r - \lambda^* f_1(k^*, l^*) = 0$ (9)

and

$L_2(k^*, l^*, \lambda^*) = w - \lambda^* f_2(k^*, l^*) = 0$ (10)

Move the terms with minus signs over to the other side, and divide the first FOC by the second to obtain

$\dfrac{f_1(k^*, l^*)}{f_2(k^*, l^*)} = \dfrac{r}{w}$,

which is another familiar condition that says that the optimizing firm chooses factor inputs so that the marginal rate of substitution between inputs in production equals the ratio of factor prices.

Now suppose that the constraint binds, as it usually will:

$y = f(k^*, l^*)$ (11)

Then (9)-(11) represent three equations that determine the three unknowns $k^*$, $l^*$, and $\lambda^*$ as functions of the model's parameters $r$, $w$, and $y$. In particular, we can think of the functions

$k^* = k^*(r, w, y)$

and

$l^* = l^*(r, w, y)$

as demand curves for capital and labor: strictly speaking, they are conditional (on $y$) factor demand functions.

Now define the minimum cost function as

$C(r, w, y) = \min_{k, l} rk + wl$ subject to $f(k, l) \geq y$

$\qquad = r k^*(r, w, y) + w l^*(r, w, y)$

$\qquad = r k^*(r, w, y) + w l^*(r, w, y) - \lambda^*(r, w, y)\{f[k^*(r, w, y), l^*(r, w, y)] - y\}$

The envelope theorem tells us that in calculating the derivatives of the cost function, we can ignore the dependence of $k^*$, $l^*$, and $\lambda^*$ on $r$, $w$, and $y$.


Hence:

$C_1(r, w, y) = k^*(r, w, y)$,

$C_2(r, w, y) = l^*(r, w, y)$,

and

$C_3(r, w, y) = \lambda^*(r, w, y)$.

The first two of these equations are statements of Shephard's lemma; they tell us that the derivatives of the cost function with respect to factor prices coincide with the conditional factor demand curves. The third equation gives us an interpretation of the multiplier $\lambda^*$ as a measure of the marginal cost of increasing output.
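Shephard's lemma and the marginal-cost interpretation of $\lambda^*$ can likewise be checked numerically; the sketch below uses an illustrative Cobb-Douglas technology of my own choosing, $f(k, l) = k^{0.3} l^{0.7}$ (none of these numbers come from the notes):

import numpy as np
from scipy.optimize import minimize

# Numerical check of Shephard's lemma and of C_3 = lambda*, with an
# illustrative (my own) Cobb-Douglas technology f(k, l) = k**0.3 * l**0.7.
alpha, r, w, y = 0.3, 1.0, 2.0, 5.0
f = lambda k, l: k**alpha * l**(1 - alpha)

def cost(r, w, y):
    res = minimize(lambda z: r * z[0] + w * z[1], x0=[y, y],
                   bounds=[(1e-6, None)] * 2,
                   constraints=[{'type': 'ineq',
                                 'fun': lambda z: f(z[0], z[1]) - y}])
    return res.fun, res.x

h = 1e-3
C, (k_star, l_star) = cost(r, w, y)
print((cost(r + h, w, y)[0] - cost(r - h, w, y)[0]) / (2 * h), k_star)  # C_1 = k*
print((cost(r, w + h, y)[0] - cost(r, w - h, y)[0]) / (2 * h), l_star)  # C_2 = l*

f1 = alpha * k_star**(alpha - 1) * l_star**(1 - alpha)  # lambda* = r/f_1 from (9)
print((cost(r, w, y + h)[0] - cost(r, w, y - h)[0]) / (2 * h), r / f1)  # C_3 = lambda*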

Thus, our two examples illustrate how we can apply the Kuhn-Tucker and envelope theorems in specific economic problems.

The two examples also show how, in the context of specific economic problems, it is often possible to attach an economic interpretation to the multiplier $\lambda^*$.

    4 Generalizing the Basic Results

    4.1 The Kuhn-Tucker Theorem

Our simple version of the Kuhn-Tucker theorem applies to a problem with only one choice variable and one constraint.

Section 19.6 of Simon and Blume's book develops a proof for the more general case, with $n$ choice variables and $m$ constraints. Their proof makes repeated, clever use of the implicit function theorem, which makes the arguments surprisingly short but also works to obscure some of the intuition provided by the analysis of the simplest case.

Nevertheless, having gained the intuition from working through the simple case, it is useful to see how the result extends.

Simon and Blume (Chapter 15) and Acemoglu (Appendix A) both present fairly general statements of the implicit function theorem. The special case or application of their results that we will need works as follows.

Consider a system of $n$ equations in $n$ variables:

$H^1(y_1, y_2, \ldots, y_n) = c_1$,

$H^2(y_1, y_2, \ldots, y_n) = c_2$,

$\vdots$

$H^n(y_1, y_2, \ldots, y_n) = c_n$.

The functions may have other arguments (exogenous variables), but since these will be held fixed, notation referring to them can be suppressed.


Now evaluate these equations at a specific set of values $y_1^*, y_2^*, \ldots, y_n^*$ to obtain

$H^1(y_1^*, y_2^*, \ldots, y_n^*) = c_1^*$,

$H^2(y_1^*, y_2^*, \ldots, y_n^*) = c_2^*$,

$\vdots$

$H^n(y_1^*, y_2^*, \ldots, y_n^*) = c_n^*$.

Suppose that each function $H^i$, $i = 1, 2, \ldots, n$, is continuously differentiable and that the $n \times n$ matrix of derivatives

$\begin{bmatrix} \partial H^1/\partial y_1 & \cdots & \partial H^1/\partial y_n \\ \partial H^2/\partial y_1 & \cdots & \partial H^2/\partial y_n \\ \vdots & \ddots & \vdots \\ \partial H^n/\partial y_1 & \cdots & \partial H^n/\partial y_n \end{bmatrix}$

is nonsingular at $y_1^*, y_2^*, \ldots, y_n^*$.

Then there exist continuously differentiable functions

$y_1(c_1, c_2, \ldots, c_n)$,

$y_2(c_1, c_2, \ldots, c_n)$,

$\vdots$

$y_n(c_1, c_2, \ldots, c_n)$,

defined in an open subset $C$ of $\mathbf{R}^n$ containing $(c_1^*, c_2^*, \ldots, c_n^*)$, such that

$H^1(y_1(c_1, \ldots, c_n), y_2(c_1, \ldots, c_n), \ldots, y_n(c_1, \ldots, c_n)) = c_1$,

$H^2(y_1(c_1, \ldots, c_n), y_2(c_1, \ldots, c_n), \ldots, y_n(c_1, \ldots, c_n)) = c_2$,

$\vdots$

$H^n(y_1(c_1, \ldots, c_n), y_2(c_1, \ldots, c_n), \ldots, y_n(c_1, \ldots, c_n)) = c_n$,

for all $(c_1, c_2, \ldots, c_n) \in C$.
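A small numerical illustration of this special case (entirely my own example, with scipy's root-finder standing in for "solve the system"): take $H^1(y_1, y_2) = y_1 + y_2^2$ and $H^2(y_1, y_2) = y_1 y_2$, whose matrix of derivatives at $(y_1^*, y_2^*) = (1, 1)$ is nonsingular, so the solution functions $y_i(c_1, c_2)$ exist and vary smoothly near $(c_1^*, c_2^*) = (2, 1)$:

import numpy as np
from scipy.optimize import fsolve

# Two-equation illustration (my own example) of the implicit function theorem:
# H1(y1, y2) = y1 + y2**2 = c1 and H2(y1, y2) = y1 * y2 = c2.
def H(y, c):
    return [y[0] + y[1]**2 - c[0], y[0] * y[1] - c[1]]

y_star = np.array([1.0, 1.0])     # solves the system at c* = (2, 1)
c_star = np.array([2.0, 1.0])

# The matrix of derivatives in y at y* is [[1, 2], [1, 1]], nonsingular, so
# y1(c1, c2) and y2(c1, c2) exist near c*; trace one of them out numerically:
for dc in (0.0, 0.05, 0.10):
    y = fsolve(H, y_star, args=(c_star + np.array([dc, 0.0]),))
    print(c_star[0] + dc, y)      # the solution varies continuously with c1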

With this result in hand, consider the following generalized version of the Kuhn-Tucker theorem we proved earlier. Let there be $n$ choice variables, $x_1, x_2, \ldots, x_n$. The objective function $F: \mathbf{R}^n \to \mathbf{R}$ is continuously differentiable, as are the $m$ functions $G_j: \mathbf{R}^n \to \mathbf{R}$, $j = 1, 2, \ldots, m$, that enter into the constraints

$c_j \geq G_j(x_1, x_2, \ldots, x_n)$,

where $c_j \in \mathbf{R}$ for all $j = 1, 2, \ldots, m$.

The problem can be stated as:

$\max_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n)$ subject to $c_j \geq G_j(x_1, x_2, \ldots, x_n)$ for all $j = 1, 2, \ldots, m$.

Note that, typically, $m \leq n$ will have to hold so that there is a set of values for the choice variables that satisfies all of the constraints.


To define the Lagrangian, introduce the multipliers $\lambda_j$, $j = 1, 2, \ldots, m$, one for each constraint. Then

$L(x_1, x_2, \ldots, x_n, \lambda_1, \lambda_2, \ldots, \lambda_m) = F(x_1, x_2, \ldots, x_n) + \sum_{j=1}^m \lambda_j [c_j - G_j(x_1, x_2, \ldots, x_n)]$.

Theorem (Kuhn-Tucker) Suppose that $x_1^*, x_2^*, \ldots, x_n^*$ maximize $F(x_1, x_2, \ldots, x_n)$ subject to $c_j \geq G_j(x_1, x_2, \ldots, x_n)$ for all $j = 1, 2, \ldots, m$, where $F$ and the $G_j$'s are all continuously differentiable. Suppose (without loss of generality) that the first $\bar{m} \leq m$ constraints bind at the optimum and that the remaining $m - \bar{m} \geq 0$ constraints are nonbinding, and assume that the $\bar{m} \times n$ matrix of derivatives

$\begin{bmatrix} G_{1,1}(x_1^*, x_2^*, \ldots, x_n^*) & \cdots & G_{1,n}(x_1^*, x_2^*, \ldots, x_n^*) \\ G_{2,1}(x_1^*, x_2^*, \ldots, x_n^*) & \cdots & G_{2,n}(x_1^*, x_2^*, \ldots, x_n^*) \\ \vdots & \ddots & \vdots \\ G_{\bar{m},1}(x_1^*, x_2^*, \ldots, x_n^*) & \cdots & G_{\bar{m},n}(x_1^*, x_2^*, \ldots, x_n^*) \end{bmatrix}$, (12)

where $G_{j,i} = \partial G_j/\partial x_i$, has rank $\bar{m}$. Then there exist values $\lambda_1^*, \lambda_2^*, \ldots, \lambda_m^*$ that, together with $x_1^*, x_2^*, \ldots, x_n^*$, satisfy:

$L_i(x_1^*, \ldots, x_n^*, \lambda_1^*, \ldots, \lambda_m^*) = F_i(x_1^*, \ldots, x_n^*) - \sum_{j=1}^m \lambda_j^* G_{j,i}(x_1^*, \ldots, x_n^*) = 0$ (13)

for $i = 1, 2, \ldots, n$,

$L_{n+j}(x_1^*, \ldots, x_n^*, \lambda_1^*, \ldots, \lambda_m^*) = c_j - G_j(x_1^*, \ldots, x_n^*) \geq 0$, (14)

for $j = 1, 2, \ldots, m$,

$\lambda_j^* \geq 0$, (15)

for $j = 1, 2, \ldots, m$, and

$\lambda_j^*[c_j - G_j(x_1^*, \ldots, x_n^*)] = 0$, (16)

for $j = 1, 2, \ldots, m$.

Proof: To begin, set the multipliers $\lambda_{\bar{m}+1}^*, \lambda_{\bar{m}+2}^*, \ldots, \lambda_m^*$ associated with the nonbinding constraints equal to zero. Since each of the functions $G_j$, $j = \bar{m}+1, \bar{m}+2, \ldots, m$, is continuously differentiable, sufficiently small adjustments in the choice variables can be made without causing these $m - \bar{m}$ constraints to become binding.

Next, note that the $(\bar{m}+1) \times n$ matrix

$\begin{bmatrix} F_1(x_1^*, x_2^*, \ldots, x_n^*) & \cdots & F_n(x_1^*, x_2^*, \ldots, x_n^*) \\ G_{1,1}(x_1^*, x_2^*, \ldots, x_n^*) & \cdots & G_{1,n}(x_1^*, x_2^*, \ldots, x_n^*) \\ G_{2,1}(x_1^*, x_2^*, \ldots, x_n^*) & \cdots & G_{2,n}(x_1^*, x_2^*, \ldots, x_n^*) \\ \vdots & \ddots & \vdots \\ G_{\bar{m},1}(x_1^*, x_2^*, \ldots, x_n^*) & \cdots & G_{\bar{m},n}(x_1^*, x_2^*, \ldots, x_n^*) \end{bmatrix}$ (17)


must have rank $\bar{m} < \bar{m} + 1$. To see why, consider the system of equations

$F(x_1, x_2, \ldots, x_n) = y$

$G_1(x_1, x_2, \ldots, x_n) = c_1$

$G_2(x_1, x_2, \ldots, x_n) = c_2$

$\vdots$

$G_{\bar{m}}(x_1, x_2, \ldots, x_n) = c_{\bar{m}}$.

With $y$ set equal to the maximized value of the objective function,

$y^* = F(x_1^*, x_2^*, \ldots, x_n^*)$,

each of these $\bar{m} + 1$ equations holds when the functions are evaluated at $x_1^*, x_2^*, \ldots, x_n^*$. If the matrix in (17) had rank $\bar{m} + 1$, the implicit function theorem would imply that it is possible to adjust the values of $\bar{m} + 1$ of the choice variables so as to find a new set of values $x_1^{**}, x_2^{**}, \ldots, x_n^{**}$ such that

$F(x_1^{**}, x_2^{**}, \ldots, x_n^{**}) = y^* + \varepsilon$

$G_1(x_1^{**}, x_2^{**}, \ldots, x_n^{**}) = c_1$

$G_2(x_1^{**}, x_2^{**}, \ldots, x_n^{**}) = c_2$

$\vdots$

$G_{\bar{m}}(x_1^{**}, x_2^{**}, \ldots, x_n^{**}) = c_{\bar{m}}$

for a strictly positive but sufficiently small value of $\varepsilon$. But this contradicts the assumption that $x_1^*, x_2^*, \ldots, x_n^*$ solves the constrained optimization problem.

Since the matrix in (17) has rank $\bar{m} < \bar{m} + 1$, its $\bar{m} + 1$ rows must be linearly dependent. Hence, there exist scalars $\mu_0, \mu_1, \ldots, \mu_{\bar{m}}$, at least one of which is nonzero, such that

$0 = \mu_0 \begin{bmatrix} F_1(x_1^*, \ldots, x_n^*) \\ \vdots \\ F_n(x_1^*, \ldots, x_n^*) \end{bmatrix} + \mu_1 \begin{bmatrix} G_{1,1}(x_1^*, \ldots, x_n^*) \\ \vdots \\ G_{1,n}(x_1^*, \ldots, x_n^*) \end{bmatrix} + \cdots + \mu_{\bar{m}} \begin{bmatrix} G_{\bar{m},1}(x_1^*, \ldots, x_n^*) \\ \vdots \\ G_{\bar{m},n}(x_1^*, \ldots, x_n^*) \end{bmatrix}$. (18)

Moreover, in (18), $\mu_0 \neq 0$, since otherwise, the matrix in (12) would have rank less than $\bar{m}$.

Thus, for $j = 1, 2, \ldots, \bar{m}$, set $\lambda_j^* = -\mu_j/\mu_0$. With these settings for $\lambda_1^*, \lambda_2^*, \ldots, \lambda_{\bar{m}}^*$, plus the settings $\lambda_{\bar{m}+1}^* = \lambda_{\bar{m}+2}^* = \cdots = \lambda_m^* = 0$ chosen earlier, (18) implies that (13) must hold for all $i = 1, 2, \ldots, n$. Clearly, (14) and (16) are satisfied for all $j = 1, 2, \ldots, m$, and (15) holds for all $j = \bar{m}+1, \bar{m}+2, \ldots, m$. So it only remains to show that (15) holds for $j = 1, 2, \ldots, \bar{m}$.


To see that these last conditions must hold, consider the system of equations

$G_1(x_1, x_2, \ldots, x_n) = c_1 - \varepsilon$

$G_2(x_1, x_2, \ldots, x_n) = c_2$

$\vdots$

$G_{\bar{m}}(x_1, x_2, \ldots, x_n) = c_{\bar{m}}$, (19)

where $\varepsilon \geq 0$. These equations hold, with $\varepsilon = 0$, at $x_1^*, x_2^*, \ldots, x_n^*$. And since the matrix in (12) has rank $\bar{m}$, the implicit function theorem implies that there are functions $x_1(\varepsilon), x_2(\varepsilon), \ldots, x_n(\varepsilon)$ such that the same equations hold for all sufficiently small values of $\varepsilon$.

Since $c_1 - \varepsilon \leq c_1$, the choices $x_1(\varepsilon), x_2(\varepsilon), \ldots, x_n(\varepsilon)$ satisfy all of the constraints from the original optimization problem. And since, by assumption, $x_1(0) = x_1^*, x_2(0) = x_2^*, \ldots, x_n(0) = x_n^*$ maximizes the objective function subject to the constraints, it must be that

$\left. \dfrac{dF(x_1(\varepsilon), x_2(\varepsilon), \ldots, x_n(\varepsilon))}{d\varepsilon} \right|_{\varepsilon=0} = \sum_{i=1}^n F_i(x_1^*, \ldots, x_n^*)\, x_i'(0) \leq 0$. (20)

In addition, the equations in (19) implicitly defining $x_1(\varepsilon), x_2(\varepsilon), \ldots, x_n(\varepsilon)$ imply

$\left. \dfrac{dG_1(x_1(\varepsilon), \ldots, x_n(\varepsilon))}{d\varepsilon} \right|_{\varepsilon=0} = \sum_{i=1}^n G_{1,i}(x_1^*, \ldots, x_n^*)\, x_i'(0) = -1$ (21)

and

$\left. \dfrac{dG_j(x_1(\varepsilon), \ldots, x_n(\varepsilon))}{d\varepsilon} \right|_{\varepsilon=0} = \sum_{i=1}^n G_{j,i}(x_1^*, \ldots, x_n^*)\, x_i'(0) = 0$ (22)

for $j = 2, 3, \ldots, \bar{m}$.

Putting all these results together, (13) implies

$0 = F_i(x_1^*, \ldots, x_n^*) - \sum_{j=1}^{\bar{m}} \lambda_j^* G_{j,i}(x_1^*, \ldots, x_n^*)$

for all $i = 1, 2, \ldots, n$. Multiplying each of these equations by $x_i'(0)$ and summing over all $i$ yields

$0 = \sum_{i=1}^n F_i(x_1^*, \ldots, x_n^*)\, x_i'(0) - \sum_{i=1}^n \sum_{j=1}^{\bar{m}} \lambda_j^* G_{j,i}(x_1^*, \ldots, x_n^*)\, x_i'(0)$

or

$0 = \sum_{i=1}^n F_i(x_1^*, \ldots, x_n^*)\, x_i'(0) - \sum_{j=1}^{\bar{m}} \lambda_j^* \sum_{i=1}^n G_{j,i}(x_1^*, \ldots, x_n^*)\, x_i'(0)$


In light of (21) and (22), this last equation simplifies to

$0 = \sum_{i=1}^n F_i(x_1^*, \ldots, x_n^*)\, x_i'(0) + \lambda_1^*$.

And hence, in light of (20),

$\lambda_1^* \geq 0$.

Analogous arguments show that $\lambda_j^* \geq 0$ for $j = 2, 3, \ldots, \bar{m}$ as well, completing the proof.
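Before turning to the envelope theorem, conditions (13)-(16) can be checked numerically in a small example of my own choosing (not from the notes): maximize $\ln x_1 + \ln x_2$ subject to $4 \geq x_1 + x_2$ and $1 \geq x_1$; both constraints bind at $(x_1^*, x_2^*) = (1, 3)$, and solving (13) by hand gives $\lambda_1^* = 1/3$ and $\lambda_2^* = 2/3$:

import numpy as np
from scipy.optimize import minimize

# Numerical check of (13)-(16) in a two-constraint example (my own choice):
# maximize ln(x1) + ln(x2) subject to 4 >= x1 + x2 and 1 >= x1.
res = minimize(lambda x: -(np.log(x[0]) + np.log(x[1])), x0=[0.5, 0.5],
               bounds=[(1e-6, None)] * 2,
               constraints=[{'type': 'ineq', 'fun': lambda x: 4.0 - x[0] - x[1]},
                            {'type': 'ineq', 'fun': lambda x: 1.0 - x[0]}])
x1, x2 = res.x                       # approximately (1, 3)

# Recover the multipliers from (13): F_1 = 1/x1 = lam1 + lam2 and
# F_2 = 1/x2 = lam1 (the second constraint does not involve x2).
lam1 = 1.0 / x2
lam2 = 1.0 / x1 - lam1
print(x1, x2, lam1, lam2)                        # multipliers nonnegative, (15)
print(lam1 * (4.0 - x1 - x2), lam2 * (1.0 - x1)) # slackness (16): both ~0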

    4.2 The Envelope Theorem

Proving a generalized version of the envelope theorem requires no new ideas, just repeated applications of the previous ones.

Consider, again, the constrained optimization problem with $n$ choice variables and $m$ constraints:

$\max_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n)$ subject to $c_j \geq G_j(x_1, x_2, \ldots, x_n)$ for all $j = 1, 2, \ldots, m$.

Now extend this problem by allowing the functions $F$ and $G_j$, $j = 1, 2, \ldots, m$, to depend on a parameter $\theta \in \mathbf{R}$:

$\max_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n, \theta)$ subject to $c_j \geq G_j(x_1, x_2, \ldots, x_n, \theta)$ for all $j = 1, 2, \ldots, m$.

Just as before, define the maximum value function $V: \mathbf{R} \to \mathbf{R}$ as

$V(\theta) = \max_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n, \theta)$ subject to $c_j \geq G_j(x_1, x_2, \ldots, x_n, \theta)$ for all $j = 1, 2, \ldots, m$.

Note that $V$ is still a function of the single parameter $\theta$, since the $n$ choice variables are optimized out. Put another way, evaluating $V$ requires the same two-step procedure as before:

First, given $\theta$, find the values $x_1^*(\theta), x_2^*(\theta), \ldots, x_n^*(\theta)$ that solve the constrained optimization problem.

Second, substitute these values $x_1^*(\theta), x_2^*(\theta), \ldots, x_n^*(\theta)$, together with the given value of $\theta$, into the objective function to obtain

$V(\theta) = F(x_1^*(\theta), x_2^*(\theta), \ldots, x_n^*(\theta), \theta)$.


And just as before, the envelope theorem tells us that we can calculate the derivative $V'(\theta)$ of the maximum value function while ignoring the dependence of $x_1^*, x_2^*, \ldots, x_n^*$ and $\lambda_1^*, \lambda_2^*, \ldots, \lambda_m^*$ on $\theta$, provided we invoke the complementary slackness conditions (16) to add the sum of all of the multipliers times all of the constraints to the objective function before differentiating through by $\theta$.

Theorem (Envelope) Let $F$ and $G_j$, $j = 1, 2, \ldots, m$, be continuously differentiable functions of $x_1, x_2, \ldots, x_n$ and $\theta$. For any value of $\theta$, let $x_1^*(\theta), x_2^*(\theta), \ldots, x_n^*(\theta)$ maximize $F(x_1, x_2, \ldots, x_n, \theta)$ subject to $c_j \geq G_j(x_1, x_2, \ldots, x_n, \theta)$ for all $j = 1, 2, \ldots, m$, and let $\lambda_1^*(\theta), \lambda_2^*(\theta), \ldots, \lambda_m^*(\theta)$ be the associated values of the Lagrange multipliers. Suppose, further, that $x_1^*(\theta), x_2^*(\theta), \ldots, x_n^*(\theta)$ and $\lambda_1^*(\theta), \lambda_2^*(\theta), \ldots, \lambda_m^*(\theta)$ are all continuously differentiable functions, and that the $\bar{m}(\theta) \times n$ matrix of derivatives

$\begin{bmatrix} G_{1,1}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) & \cdots & G_{1,n}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) \\ G_{2,1}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) & \cdots & G_{2,n}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) \\ \vdots & \ddots & \vdots \\ G_{\bar{m}(\theta),1}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) & \cdots & G_{\bar{m}(\theta),n}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) \end{bmatrix}$

associated with the $\bar{m}(\theta) \leq m$ binding constraints has rank $\bar{m}(\theta)$ for each value of $\theta$. Then the maximum value function defined by

$V(\theta) = \max_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n, \theta)$ subject to $c_j \geq G_j(x_1, x_2, \ldots, x_n, \theta)$ for all $j = 1, 2, \ldots, m$

satisfies

$V'(\theta) = F_{n+1}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) - \sum_{j=1}^m \lambda_j^*(\theta)\, G_{j,n+1}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)$. (23)

Proof: The Kuhn-Tucker theorem implies that for any given value of $\theta$,

$F_i(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) - \sum_{j=1}^m \lambda_j^*(\theta)\, G_{j,i}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) = 0$ (13)

for $i = 1, 2, \ldots, n$, and

$\lambda_j^*(\theta)[c_j - G_j(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)] = 0$, (16)

for $j = 1, 2, \ldots, m$, must hold.

In light of (16),

$V(\theta) = F(x_1^*(\theta), \ldots, x_n^*(\theta), \theta) + \sum_{j=1}^m \lambda_j^*(\theta)[c_j - G_j(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)]$.


Differentiating both sides of this expression by $\theta$ yields

$V'(\theta) = \sum_{i=1}^n F_i(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)\, x_i^{*\prime}(\theta)$

$\quad + F_{n+1}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)$

$\quad + \sum_{j=1}^m \lambda_j^{*\prime}(\theta)[c_j - G_j(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)]$

$\quad - \sum_{i=1}^n \sum_{j=1}^m \lambda_j^*(\theta)\, G_{j,i}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)\, x_i^{*\prime}(\theta)$

$\quad - \sum_{j=1}^m \lambda_j^*(\theta)\, G_{j,n+1}(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)$,

which shows that, in principle, we must take the dependence of $x_1^*(\theta), \ldots, x_n^*(\theta)$ and $\lambda_1^*(\theta), \ldots, \lambda_m^*(\theta)$ on $\theta$ into account when calculating $V'(\theta)$.

Note, however, that (13) implies that the sums in the first and fourth lines of this last expression together equal zero. Hence, to show that (23) holds, it only remains to show that

$\sum_{j=1}^m \lambda_j^{*\prime}(\theta)[c_j - G_j(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)] = 0$,

and this is true if

$\lambda_j^{*\prime}(\theta)[c_j - G_j(x_1^*(\theta), \ldots, x_n^*(\theta), \theta)] = 0$ (24)

for all $j = 1, 2, \ldots, m$.

Clearly, (24) holds for any $\theta$ such that constraint $j$ is binding.

For $\theta$ such that constraint $j$ is not binding, (16) implies that $\lambda_j^*(\theta) = 0$. Furthermore, by the continuity of $G_j$ and $x_i^*(\theta)$, $i = 1, 2, \ldots, n$, if constraint $j$ does not bind at $\theta$, there exists an $\bar{\varepsilon} > 0$ such that constraint $j$ does not bind for all $\theta + \varepsilon$ with $\bar{\varepsilon} > |\varepsilon|$. Hence,

$\lambda_j^{*\prime}(\theta) = \lim_{\varepsilon \to 0} \dfrac{\lambda_j^*(\theta + \varepsilon) - \lambda_j^*(\theta)}{\varepsilon} = \lim_{\varepsilon \to 0} \dfrac{0}{\varepsilon} = 0$,

and once again it becomes apparent that (24) must hold. Hence, (23) must hold as well.
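Finally, here is a numerical check of (23), reusing the illustrative two-constraint problem from the sketch at the end of section 4.1 but letting the first constraint depend on a parameter $\theta$: with $G_1(x_1, x_2, \theta) = x_1 + x_2 - \theta$ and $c_1 = 0$, $F$ does not depend on $\theta$ and $G_{1,n+1} = -1$, so (23) predicts $V'(\theta) = \lambda_1^*(\theta)$, which equals $1/3$ at $\theta = 4$:

import numpy as np
from scipy.optimize import minimize

# Numerical check of (23), with my own illustrative parametrized problem:
# maximize ln(x1) + ln(x2) subject to 0 >= x1 + x2 - theta and 1 >= x1.
# Since F does not depend on theta and G_{1,n+1} = -1, formula (23) gives
# V'(theta) = lambda1*(theta) = 1/3 at theta = 4.
def V(theta):
    res = minimize(lambda x: -(np.log(x[0]) + np.log(x[1])), x0=[0.5, 0.5],
                   bounds=[(1e-6, None)] * 2,
                   constraints=[{'type': 'ineq',
                                 'fun': lambda x: theta - x[0] - x[1]},
                                {'type': 'ineq', 'fun': lambda x: 1.0 - x[0]}])
    return -res.fun

theta, h = 4.0, 1e-4
print((V(theta + h) - V(theta - h)) / (2.0 * h))  # approximately 1/3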