
Applied Dynamic Programming

    In these notes, we will deal with a fundamental tool of dynamic macroeconomics: dynamic

    programming. Dynamic programming is a very convenient way of writing a large set of dynamic

    problems in economic analysis as most of the properties of this tool are now well established and

    understood.1 In these lectures, we will not deal with all the theoretical properties attached to

    this tool, but will rather give some recipes to solve economic problems using the tool of dynamic

    programming. In order to understand the problem, we will first deal with deterministic models,

    before extending the analysis to stochastic ones. However, we shall give some preliminary

definitions and theorems that justify this approach.

    1 The Bellman Equation and Associated Theorems

    1.1 Heuristic Derivation of the Bellman Equation

Let us consider the case of an agent that has to decide on the path of a set of control variables, {y_t}_{t=0}^∞, in order to maximize the discounted sum of its future payoffs, u(y_t, x_t), where x_t is a state variable assumed to evolve according to

x_{t+1} = h(x_t, y_t),  x_0 given.

We finally make the assumption that the model is Markovian.

The optimal value our agent can derive from this maximization process is given by

V(x_t) = \max_{\{y_{t+s} \in D(x_{t+s})\}_{s=0}^{\infty}} \sum_{s=0}^{\infty} \beta^s u(y_{t+s}, x_{t+s})   (1)

¹ For a mathematical exposition of the problem see Bertsekas [1976], for a more economic approach see Lucas, Stokey and Prescott [1989].


where D is the set of all feasible decisions for the variables of choice. The function V(x_t) is the value function and corresponds to the payoff of the agent when she adopts her optimal plan. Note that the value function is a function of the state variable only. Indeed, since the model is Markovian, the current state summarizes all the information from the past that is needed to take decisions. In other words, the whole path can be predicted once the state variable is observed. Therefore, the value at t is only a function of x_t. Equation (1) may now be rewritten as

V(x_t) = \max_{\{y_t \in D(x_t), \{y_{t+s} \in D(x_{t+s})\}_{s=1}^{\infty}\}} u(y_t, x_t) + \sum_{s=1}^{\infty} \beta^s u(y_{t+s}, x_{t+s})   (2)

Making the change of variable k = s - 1, (2) rewrites

V(x_t) = \max_{\{y_t \in D(x_t), \{y_{t+1+k} \in D(x_{t+1+k})\}_{k=0}^{\infty}\}} u(y_t, x_t) + \sum_{k=0}^{\infty} \beta^{k+1} u(y_{t+1+k}, x_{t+1+k})

or

V(x_t) = \max_{y_t \in D(x_t)} u(y_t, x_t) + \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k})\}_{k=0}^{\infty}} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k})   (3)

Note that, by definition, we have

V(x_{t+1}) = \max_{\{y_{t+1+k} \in D(x_{t+1+k})\}_{k=0}^{\infty}} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k})

such that (3) rewrites as

V(x_t) = \max_{y_t \in D(x_t)} u(y_t, x_t) + \beta V(x_{t+1})   (4)

This is the so-called Bellman equation that lies at the core of dynamic programming theory. This equation is associated, in each and every period t, with a set of optimal policy functions for y and x, which are defined by

\{y_t, x_{t+1}\} \in \arg\max_{y_t \in D(x_t)} u(y_t, x_t) + \beta V(x_{t+1})   (5)

    Our problem is now to solve (4) for the function V (xt). This problem is particularly complicated

    as we are not solving for just a point that would satisfy the equation, but we are interested in

    finding a function that satisfies the equation. We are therefore leaving the realm of real analysis

    for functional analysis.

    A simple procedure to find a solution would be the following

    1. Make an initial guess on the form of the value function V0(xt)

    2. Update the guess using the Bellman equation such that

V_{i+1}(x_t) = \max_{y_t \in D(x_t)} u(y_t, x_t) + \beta V_i(h(x_t, y_t))


    3. If Vi+1(xt) = Vi(xt), then a fixed point has been found and the problem is solved, if not

    we go back to 2, and iterate on the process until convergence.

    Therefore, solving the Bellman equation just amounts to finding a fixed point of an operator T

    Vi+1 = TVi

    where T stands for the list of operations involved in the computation of the Bellman equation.

The problem is then that of the existence and the uniqueness of this fixed point. Mathematical

    theory has provided conditions for the existence and uniqueness of a solution.
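Before turning to these conditions, the mechanics of the iteration V_{i+1} = T V_i can be illustrated on a trivial scalar contraction. The following sketch is a toy illustration added here (it is not part of the economic problem): the operator T(v) = βv + 1 is applied until two successive iterates are close enough.

Matlab Code: Fixed point iteration on a toy contraction (illustration)

beta = 0.5;              % modulus of the toy contraction T(v) = beta*v+1
v    = 0;                % arbitrary initial guess V0
crit = 1;                % initial convergence criterion
while crit>1e-8
    tv   = beta*v+1;     % apply the operator: Vi+1 = T Vi
    crit = abs(tv-v);    % distance between two successive iterates
    v    = tv;           % update the guess
end
% v now approximates the unique fixed point 1/(1-beta) = 2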

    1.2 Existence and uniqueness of a solution

Definition. A metric space is a set S, together with a metric ρ: S × S → R₊, such that for all x, y, z ∈ S:

1. ρ(x, y) ≥ 0, with ρ(x, y) = 0 if and only if x = y,

2. ρ(x, y) = ρ(y, x),

3. ρ(x, z) ≤ ρ(x, y) + ρ(y, z).

Definition. A sequence {x_n}_{n=0}^∞ in S converges to x ∈ S if for each ε > 0 there exists an integer N such that ρ(x_n, x) < ε for all n ≥ N.

Definition. A sequence {x_n}_{n=0}^∞ in S is a Cauchy sequence if for each ε > 0 there exists an integer N such that ρ(x_n, x_m) < ε for all n, m ≥ N.

Definition. A metric space (S, ρ) is complete if every Cauchy sequence in S converges to a point in S.

Definition. Let (S, ρ) be a metric space and T: S → S be a function mapping S into itself. T is a contraction mapping (with modulus β) if for some β ∈ (0, 1),

ρ(Tx, Ty) ≤ β ρ(x, y), for all x, y ∈ S.

We then have the following remarkable theorem that establishes the existence and uniqueness of the fixed point of a contraction mapping.

Theorem (Contraction Mapping Theorem). If (S, ρ) is a complete metric space and T: S → S is a contraction mapping with modulus β ∈ (0, 1), then

1. T has exactly one fixed point V ∈ S such that V = TV,

2. for any V_0 ∈ S, ρ(T^n V_0, V) ≤ β^n ρ(V_0, V), with n = 0, 1, 2, ...

Since we are endowed with all the tools we need to prove the theorem, we shall do it.

Proof: In order to prove 1, we shall first prove that if we select any sequence {V_n}_{n=0}^∞ such that, for each n, V_n ∈ S and

V_{n+1} = T V_n

this sequence converges, and that it converges to V ∈ S. In order to show convergence of {V_n}_{n=0}^∞, we shall prove that {V_n}_{n=0}^∞ is a Cauchy sequence. First of all, note that the contraction property of T implies that

ρ(V_2, V_1) = ρ(TV_1, TV_0) ≤ β ρ(V_1, V_0)

and therefore

ρ(V_{n+1}, V_n) = ρ(TV_n, TV_{n-1}) ≤ β ρ(V_n, V_{n-1}) ≤ ... ≤ β^n ρ(V_1, V_0)

Now consider two terms of the sequence, V_m and V_n, m > n. The triangle inequality implies that

ρ(V_m, V_n) ≤ ρ(V_m, V_{m-1}) + ρ(V_{m-1}, V_{m-2}) + ... + ρ(V_{n+1}, V_n)

therefore, making use of the previous result, we have

ρ(V_m, V_n) ≤ (β^{m-1} + β^{m-2} + ... + β^n) ρ(V_1, V_0) ≤ \frac{β^n}{1-β} ρ(V_1, V_0)

Since β ∈ (0, 1), β^n → 0 as n → ∞, so that for each ε > 0 there exists N such that ρ(V_m, V_n) < ε. Hence {V_n}_{n=0}^∞ is a Cauchy sequence and it therefore converges. Further, since we have assumed that S is complete, V_n converges to V ∈ S.

We now have to show that V = TV in order to complete the proof of the first part. Note that, for each ε > 0, and for V_0 ∈ S, the triangle inequality implies

ρ(V, TV) ≤ ρ(V, V_n) + ρ(V_n, TV)

But since {V_n}_{n=0}^∞ is a Cauchy sequence, we have

ρ(V, TV) ≤ ρ(V, V_n) + ρ(V_n, TV) ≤ ε/2 + ε/2

for large enough n, therefore V = TV.

Hence, we have proven that T possesses a fixed point and therefore have established its existence. We now have to prove uniqueness. This can be obtained by contradiction. Suppose there exists another function, say W ∈ S, that satisfies W = TW. Then the definition of the fixed point implies

ρ(V, W) = ρ(TV, TW)

but the contraction property implies

ρ(V, W) = ρ(TV, TW) ≤ β ρ(V, W)

which, since β ∈ (0, 1), implies ρ(V, W) = 0 and so V = W. The limit is then unique.

Proving 2 is straightforward, as

ρ(T^n V_0, V) = ρ(T^n V_0, TV) ≤ β ρ(T^{n-1} V_0, V)

but we have ρ(T^{n-1} V_0, V) = ρ(T^{n-1} V_0, TV), such that

ρ(T^n V_0, V) = ρ(T^n V_0, TV) ≤ β ρ(T^{n-1} V_0, V) ≤ β² ρ(T^{n-2} V_0, V) ≤ ... ≤ β^n ρ(V_0, V)

which completes the proof.

This theorem is of great importance as it establishes that any operator that possesses the contraction property will exhibit a unique fixed point, which therefore provides some rationale for the algorithm we were designing in the previous section. It also ensures that, if the value function satisfies a contraction property, simple iterations will deliver the solution for any initial condition. It therefore remains to provide conditions for the value function to be a contraction. These are provided by the next theorem.


Theorem (Blackwell's Sufficient Conditions). Let X ⊆ R^ℓ and B(X) be the space of bounded functions V: X → R with the uniform metric. Let T: B(X) → B(X) be an operator satisfying

1. (Monotonicity) Let V, W ∈ B(X); if V(x) ≤ W(x) for all x ∈ X, then TV(x) ≤ TW(x).

2. (Discounting) There exists some constant β ∈ (0, 1) such that for all V ∈ B(X) and a ≥ 0, we have

T(V + a) ≤ TV + βa

Then T is a contraction with modulus β.

Proof: Let us consider two functions V, W ∈ B(X) satisfying 1 and 2, and such that

V ≤ W + ρ(V, W)

Monotonicity first implies that

TV ≤ T(W + ρ(V, W))

and discounting then implies

TV ≤ TW + β ρ(V, W)

We therefore get

TV − TW ≤ β ρ(V, W)

Likewise, if we now consider that W ≤ V + ρ(V, W), we end up with

TW − TV ≤ β ρ(V, W)

Consequently, we have

|TV − TW| ≤ β ρ(V, W)

so that

ρ(TV, TW) ≤ β ρ(V, W)

which defines a contraction. This completes the proof.

This theorem is extremely useful as it gives us simple tools to check whether a problem is a contraction, and therefore permits us to check whether the simple algorithm we defined earlier is appropriate for the problem at hand.

As an example, let us consider the optimal growth model, for which the Bellman equation writes

V(k_t) = \max_{c_t \in C} u(c_t) + \beta V(k_{t+1})

with k_{t+1} = F(k_t) − c_t. In order to save on notation, let us drop the time subscript and denote the next period capital stock by k'. Plugging the law of motion of capital in the utility function, the Bellman equation rewrites

V(k) = \max_{k' \in K} u(F(k) - k') + \beta V(k')


where we now have to find the optimal next period capital stock k', i.e. the optimal investment policy. Let us now define the operator T as

(TV)(k) = \max_{k' \in K} u(F(k) - k') + \beta V(k')

We would like to know if T is a contraction and therefore if there exists a unique function V such that

V(k) = (TV)(k)

    In order to achieve this task, we just have to check whether T is monotonic and satisfies the

    discounting property.

1. Monotonicity: Let us consider two candidate value functions, V and W, such that V(k) ≤ W(k) for all k ∈ K. What we want to show is that (TV)(k) ≤ (TW)(k). In order to do that, let us denote by \widetilde{k}' the optimal next period capital stock, that is

(TV)(k) = u(F(k) - \widetilde{k}') + \beta V(\widetilde{k}')

But now, since V(k) ≤ W(k) for all k ∈ K, we have V(\widetilde{k}') ≤ W(\widetilde{k}'), such that it should be clear that

(TV)(k) ≤ u(F(k) - \widetilde{k}') + \beta W(\widetilde{k}') ≤ \max_{k' \in K} u(F(k) - k') + \beta W(k') = (TW)(k)

Hence we have shown that V(k) ≤ W(k) implies (TV)(k) ≤ (TW)(k) and therefore established monotonicity.

2. Discounting: Let us consider a candidate value function, V, and a positive constant a.

(T(V + a))(k) = \max_{k' \in K} u(F(k) - k') + \beta (V(k') + a)
             = \max_{k' \in K} u(F(k) - k') + \beta V(k') + \beta a
             = (TV)(k) + \beta a

Therefore, the Bellman equation satisfies discounting in the case of the optimal growth model.

Hence, the optimal growth model satisfies Blackwell's sufficient conditions for a contraction mapping, and therefore the value function exists and is unique. We are now in a position to design a numerical algorithm to solve the Bellman equation.


    2 Deterministic Dynamic Programming

    2.1 Value Function Iteration

    The contraction mapping theorem gives us a straightforward way to compute the solution to

    the Bellman equation: iterate on the operator T , such that Vi+1 = TVi up to the point where

    the distance between two successive value functions is small enough. Basically this amounts to

applying the following algorithm:

1. Decide on a grid, X, of admissible values for the state variable x

   X = {x_1, ..., x_N}

   and formulate an initial guess for the value function V_0(x). Finally, choose a stopping criterion ε > 0.

2. For each x_ℓ ∈ X, ℓ = 1, ..., N, compute

   V_{i+1}(x_\ell) = \max_{x' \in X} u(y(x_\ell, x'), x_\ell) + \beta V_i(x')

3. If ‖V_{i+1}(x) − V_i(x)‖ < ε go to the next step, otherwise go back to 2.

4. Compute the final solution as

   y^\star(x) = y(x, x')

In order to better understand the algorithm, let us consider a simple example and go back to the optimal growth model, with

u(c) = \frac{c^{1-\sigma} - 1}{1-\sigma}  and  k' = k^\alpha - c + (1-\delta)k

Then the Bellman equation writes

V(k) = \max_{0 \le c \le k^\alpha + (1-\delta)k} \frac{c^{1-\sigma} - 1}{1-\sigma} + \beta V(k')

From the law of motion of capital we can determine consumption as

c = k^\alpha + (1-\delta)k - k'

such that, plugging this result in the Bellman equation, we have

V(k) = \max_{0 \le k' \le k^\alpha + (1-\delta)k} \frac{(k^\alpha + (1-\delta)k - k')^{1-\sigma} - 1}{1-\sigma} + \beta V(k')


Now, let us define a grid of N feasible values for k such that we have

K = {k_1, ..., k_N}

and an initial value function V_0(k), that is a vector of N numbers that relates each k_ℓ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge, but if we want it to converge fast enough it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion ε.

Then, for each ℓ = 1, ..., N, we compute the feasible values that can be taken by the quantity on the right hand side of the value function

Q_{\ell,h} \equiv \frac{(k_\ell^\alpha + (1-\delta)k_\ell - k_h)^{1-\sigma} - 1}{1-\sigma} + \beta V(k_h)  for h feasible

It is important to understand what "h feasible" means. Indeed, we only compute consumption when it is positive, which restricts the number of possible values for k'. Namely, we want k' to satisfy

0 ≤ k' ≤ k^\alpha + (1-\delta)k

which puts an upper bound on the index h. When the grid of values is uniform, that is when k_h = \underline{k} + (h-1)\Delta_k, this bound can be computed as

\overline{h} = E\left(\frac{k_\ell^\alpha + (1-\delta)k_\ell - \underline{k}}{\Delta_k}\right) + 1

where E(x) denotes the integer part of x.

Then we find

Q_\ell^\star = \max_{h=1,\ldots,\overline{h}} Q_{\ell,h}

and set

V_{i+1}(k_\ell) = Q_\ell^\star

and keep in memory the index h^\star = \arg\max_{h=1,\ldots,\overline{h}} Q_{\ell,h}, such that we have

k'(k_\ell) = k_{h^\star}

In Figure 1, we report the value function and the decision rules obtained from the deterministic optimal growth model with α = 0.3, β = 0.96, δ = 0.1 and σ = 2. The grid for the capital stock is composed of 1000 data points ranging from (1 − dev) k* to (1 + dev) k*, where k* denotes the steady state and dev = 0.75. The algorithm² is then as follows.

² The codes reported in these notes and those that follow are not efficient from a computational point of view; they are intended to help you understand the method without adding coding complications. Much faster implementations can be obtained by vectorizing the code.
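For completeness, note that the steady state k* used to center the grid (the variable ks in the codes below) solves the steady-state version of the Euler equation of the optimal growth model, u'(c*) = β u'(c*)(α k*^{α−1} + 1 − δ), that is

1 = \beta\,(\alpha k^{*\,\alpha-1} + 1 - \delta) \quad\Longleftrightarrow\quad k^* = \left(\frac{1-\beta(1-\delta)}{\alpha\beta}\right)^{\frac{1}{\alpha-1}}

which is exactly the expression used in the Matlab codes.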


    Matlab Code: Value iteration (OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    % parameters of the algorithm

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    dev = 0.75; % maximal deviation from steady state

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,1); % value function

    tv = zeros(nbk,1); % value function

    dr = zeros(nbk,1); % decision rule (will contain indices)

    % Main loop

    iter = 1;

while crit>tol;
   for i=1:nbk
      %
      % compute indexes for which consumption is positive
      %
      imax = sum(kgrid<=kgrid(i)^alpha+(1-delta)*kgrid(i));
      %
      % consumption and utility for all feasible next period capital stocks
      %
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';
      util = (c.^(1-sigma)-1)/(1-sigma);
      %
      % find the value function and the decision rule
      %
      [tv(i),dr(i)] = max(util+beta*v(1:imax));
   end
   crit = max(abs(tv-v)); % Compute convergence criterion
   v    = tv;             % Update the value function
   iter = iter+1;
end
% Final decision rules
kp = kgrid(dr); kp = kp(:);               % next period capital stock
c  = kgrid(:).^alpha+(1-delta)*kgrid(:)-kp; % consumption


Figure 1: Deterministic OGM (Decision rules, Value iteration). The figure plots, against the capital stock k_t, the value function (left panel), next period capital (middle panel) and consumption (right panel).

    2.2 Taking advantage of interpolation

    A possible improvement of the method is to have a much looser grid on the capital stock but have

    a more dense grid on the control variable (consumption in the optimal growth model). Then the

    next period value of the state variable can be computed much more precisely. However, because

of this precision and the fact that the capital grid is coarser, it may be the case that the computed

    optimal value for the next period state variable does not lie in the grid. This implies that the

    value function is unknown at this particular value. Therefore, we need to use an interpolation

    scheme to get an approximation of the value function at this value. One advantage of this

    approach is that it involves less function evaluations and is usually less costly in terms of CPU

    time. The algorithm is then as follows:

1. Decide on a grid, X, of admissible values for the state variable x

   X = {x_1, ..., x_N}

   Decide on a grid, Y, of admissible values for the control variable y

   Y = {y_1, ..., y_M} with M ≫ N

   Formulate an initial guess for the value function V_0(x) and choose a stopping criterion ε > 0.

2. For each x_ℓ ∈ X, ℓ = 1, ..., N, compute

   x'_{\ell,j} = h(y_j, x_\ell),  j = 1, ..., M

   Compute an interpolated value function at each x'_{\ell,j} = h(y_j, x_\ell): \widetilde{V}_i(x'_{\ell,j}), and then

   V_{i+1}(x_\ell) = \max_{y_j \in Y} u(y_j, x_\ell) + \beta \widetilde{V}_i(x'_{\ell,j})

3. If ‖V_{i+1}(x) − V_i(x)‖ < ε go to the next step, otherwise go back to 2.

    The matlab code for this approach is reported next. Cubic spline interpolation method is used

    for the value function, with 20 nodes for the capital stock and 10000 nodes for consumption.

The algorithm converges starting from the initial condition for the value function

v_0(k) = \frac{((c^\star/y^\star)\,k^\alpha)^{1-\sigma} - 1}{(1-\sigma)(1-\beta)}

    Matlab Code: Value iteration with interpolation (OGM)

    clear all

    clc

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 20; % number of grid points in k

    nbc = 10000; % number of grid points in c

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

method = 'linear'; % interpolation method

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,1); % value function


    tv = zeros(nbk,1); % value function

    kpgrid = zeros(nbk,1); % decision rule for k(t+1)

    ygrid = kgrid.^alpha; % output

    % Main loop

    iter = 1;

    while crit>tol;

    for i=1:nbk

    % consumption, next period capital and utility

    c = linspace(0,ygrid(i),nbc);

    k1 = kgrid(i)^alpha+(1-delta)*kgrid(i)-c;

    util = (c.^(1-sigma)-1)/(1-sigma);

    % find value function

    vint = interp1(kgrid,v,k1,method);

    [v1,dr] = max(util+beta*vint);

    tv(i) = v1;

    kpgrid(i) = k1(dr);

    end;

    crit = max(max(abs(tv-v))); % Compute convergence criterion

    v = tv; % Update the value function

fprintf('Iteration: %3d   Crit: %g\n',iter,crit)

    iter = iter+1;

    end

    % Output

    ygrid = kgrid.^alpha;

    cgrid = ygrid+(1-delta)*kgrid-kpgrid;

    2.3 Policy iterations: Howard Improvement

    The simple value iteration algorithm has the attractive feature of being particularly simple to

implement. However, it is a slow procedure, especially for infinite horizon problems, since it can be shown that this procedure converges at the rate β, which is usually close to 1! Furthermore,

    it computes unnecessary quantities during the algorithm which slows down convergence. Often,

    computation speed is really important, for instance when one wants to perform a sensitivity

    analysis of the results obtained in a model using different parameterizations. Hence, we would

like to be able to speed up convergence. This can be achieved by applying the so-called Howard

improvement method. The idea of this method is to iterate on policy functions rather than on

    the value function. The algorithm may be described as follows

1. Set i = 0 and guess an initial feasible decision rule for the control variable y = f_i(x) and compute the value associated to this guess, assuming that this rule is operative forever:

   V(x_t) = \sum_{s=0}^{\infty} \beta^s u(f_i(x_{t+s}), x_{t+s})

   taking care of the fact that x_{t+1} = h(x_t, y_t) = h(x_t, f_i(x_t)). Set a stopping criterion ε > 0.


2. Find a new policy rule y = f_{i+1}(x) such that

   f_{i+1}(x) \in \arg\max_{y} u(y, x) + \beta V(x')

   with x' = h(x, y).

3. Check if ‖f_{i+1}(x) − f_i(x)‖ < ε; if yes then stop, otherwise go back to 2.

Note that this method differs fundamentally from the value iteration algorithm in at least two dimensions:

(i) one iterates on the policy function rather than on the value function;

(ii) the decision rule is used forever, whereas it is assumed to be used for only two consecutive periods in the value iteration algorithm. It is precisely this last feature that accelerates convergence.

Note that when computing the value function we actually have to solve a linear system of the form

V_{i+1}(x_\ell) = u(f_{i+1}(x_\ell), x_\ell) + \beta V_{i+1}(h(x_\ell, f_{i+1}(x_\ell))),  \ell = 1, ..., N

for V_{i+1}(x_\ell), which may be rewritten as

V_{i+1}(x_\ell) = u(f_{i+1}(x_\ell), x_\ell) + \beta Q V_{i+1}(x_\ell),  \ell = 1, ..., N

where Q is an (N × N) matrix

Q_{\ell j} = 1 if x_j = h(x_\ell, f_{i+1}(x_\ell)), 0 otherwise.

Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system, to get

V_{i+1}(x) = (I - \beta Q)^{-1} u(f_{i+1}(x), x)

    Matlab Code: Policy iteration (OGM)

    clear all

    clc

    % Parameters of the economy

    sigma = 2; % utility parameter

    delta = 0.1; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    % parameters of the algorithm


    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    epsi = 1e-6; % Tolerance parameter

    % Setup the grid

    dev = 0.5; % maximal deviation from steady state

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,1); % value function

    v1 = zeros(nbk,1); % value function

    kp0 = zeros(nbk,1); % value function

    dr = zeros(nbk,1); % decision rule (will contain indices)

% Main loop
while crit>epsi;
   for i=1:nbk
      % compute indexes for which consumption is positive
      imax = sum(kgrid<=kgrid(i)^alpha+(1-delta)*kgrid(i));
      % consumption and utility for feasible next period capital stocks
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';
      util = (c.^(1-sigma)-1)/(1-sigma);
      % improvement step: new decision rule
      [v1(i),dr(i)] = max(util+beta*v(1:imax));
   end;
   % evaluation step: value of using the new rule forever
   kp   = kgrid(dr); kp = kp(:);                 % next period capital implied by the rule
   c    = kgrid(:).^alpha+(1-delta)*kgrid(:)-kp; % consumption
   util = (c.^(1-sigma)-1)/(1-sigma);
   Q    = sparse(1:nbk,dr',1,nbk,nbk);           % transition matrix implied by the rule
   v    = (speye(nbk)-beta*Q)\util;              % solves the linear system
   crit = max(abs(kp-kp0));                      % Compute convergence criterion
   kp0  = kp;
end


    which may be particularly costly when the number of grid points is important. Therefore, it

was proposed to replace the matrix inversion by an additional iteration step, leading to the so-called modified policy iteration with k steps, which replaces the linear problem by the following iteration scheme:

1. Set J_0 = V_i

2. Iterate k times on

   J_{n+1} = u(y, x) + \beta Q J_n,  n = 0, ..., k-1

3. Set V_{i+1} = J_k.

When k → ∞, J_k tends toward the solution of the linear system. In that case, the main loop in the matlab code is slightly different.

    Matlab Code: Modified Policy iteration (OGM), main loop only

    K = 100; % Number of iterations

% Main loop: the improvement step (the loop over i yielding dr, kp, util and Q)
% is identical to the policy iteration code above; only the evaluation step
% changes, the linear solve being replaced by K iterations on the value function:
J = v;                   % J0 = Vi
for j=1:K                % iterate K times on J = u + beta*Q*J
   J = util+beta*Q*J;
end
v = J;                   % Vi+1 = JK


    2.4 Endogenous Grid Method

The endogenous grid method was initially proposed by Carroll [2006], who suggested formulating a grid for the next period state variable rather than for its current value. One merit of this approach is that it allows one to take advantage of the first order conditions and thereby accelerate the maximization step. The version which is presented here differs from that proposed in Carroll [2006] in that it does not take advantage of some redefinition of variables that would accelerate the algorithm even further. This is done (i) to keep notations and intuition simple and (ii) to show that the algorithm can be combined with a root finding problem relatively easily.

Assume the problem to be solved takes the form

V(x_t) = \max_{y_t} u(y_t, x_t) + \beta V(x_{t+1})

with x_{t+1} = h(x_t, y_t). The algorithm then works as follows:

1. Set a grid for x_{t+1} and an initial value function V_i(x_{t+1}) (i = 0 in the first iteration).

2. Use the first order condition to get y_t^\star:

   \frac{\partial u(y_t, x_t)}{\partial y_t} + \beta \frac{\partial V_i(x_{t+1})}{\partial x_{t+1}} \frac{\partial h(x_t, y_t)}{\partial y_t} = 0 \;\Longrightarrow\; y_t^\star = \varphi(x_{t+1}, x_t)

3. Use the transition equation to find the optimal x_t^\star = x(x_{t+1}):

   x_{t+1} = h(x_t, \varphi(x_{t+1}, x_t)) \;\Longrightarrow\; x_t^\star = x(x_{t+1})

   and update y_t such that

   y_t^\star = \varphi(x_{t+1}, x(x_{t+1})) = y(x_{t+1})

4. Then compute

   V_{i+1}(x_t^\star) = u(y_t^\star, x_t^\star) + \beta V_i(x_{t+1})

5. Interpolate V_{i+1}(x_t^\star) on the grid for x_{t+1} and update the value function.

6. If ‖V_{i+1}(x_{t+1}) − V_i(x_{t+1})‖ < ε then stop, else go back to 2.

Note that the initial guess for the value function now plays a much more important role than before, as its partial derivative is needed to compute the optimal decision rule for the control variable y_t. The computation of the partial derivative can be achieved by mixing interpolation and numerical differentiation in a rather standard way. Also note how step 4 does not involve any maximization, as the optimal decision rule has already been obtained from step 2.


In the optimal growth model case, things are rather simple as the utility is only a function of c_t and the capital stock can be solved for very simply. From the first order condition on consumption, and given a grid for the next period capital stock k_{t+1}, it is readily obtained that

c_t = \left(\beta V'(k_{t+1})\right)^{-1/\sigma}

which can then be used in the capital accumulation equation to obtain the current period capital stock k_t by solving³

k_t^\alpha + (1-\delta)k_t - c_t - k_{t+1} = 0

We denote by k_t^\star the solution to this equation. This process defines a grid for the current period capital stock which is endogenously determined, hence the name of the method. The current value function is then given by

V(k_t^\star) = \frac{c_t^{1-\sigma} - 1}{1-\sigma} + \beta V(k_{t+1})

Since the new value function is evaluated at nodes that do not necessarily lie on the grid for next period capital stock, the updated value function has to be interpolated on these nodes.

³ In the version proposed by Carroll [2006] this step is further accelerated by rewriting the problem in terms of available resources Y_t = k_t^\alpha + (1-\delta)k_t.

    Matlab Code: Endogenous Grid Method (OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    % parameters of the algorithm

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

tol = 1e-6; % Tolerance parameter

method = 'linear'; % Interpolation method

    % Setup the grid

    dev = 0.5; % maximal deviation from steady state

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

kpgrid = linspace(kmin,kmax,nbk)'; % builds the grid (for next period capital)

    % Initial values

    c0 = 1.2*(kpgrid.^alpha+(1-delta)*kpgrid);

    v0 = (c0.^(1-sigma)-1)/((1-sigma)*(1-beta));

kgrid = ks*ones(nbk,1); % initial guess for the endogenous grid (current capital)

    while crit>tol

    dv0 = diff_value(kpgrid,v0,method);

    c = (beta*dv0).^(-1/sigma);

    tmp = (c.^(1-sigma)-1)/(1-sigma)+beta*v0;



    crk = 1;

    while crk>tol; % Gauss-Newton Algorithm

    f0 = kgrid.^alpha+(1-delta)*kgrid-kpgrid-c;

    df0 = alpha*kgrid.^(alpha-1)+1-delta;

    k1 = kgrid-f0./df0;

    crk = max(abs(kgrid-k1));

    kgrid= k1;

    end

    v1 = interp1(kgrid,tmp,kpgrid,method);

    crit = max(max(abs(v0-v1)));

    v0 = v1;

    end

    % Output

    cgrid = kgrid.^alpha+(1-delta)*kgrid-kpgrid;

    vgrid = interp1(kpgrid,v1,kgrid,method);
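The helper diff_value called in the loop above is not reproduced in these notes. A minimal sketch, assuming it simply returns a finite-difference approximation of the derivative of the value function on the grid (stored column-wise, as in the codes here), could be:

Matlab Code: A possible diff_value routine (illustration)

function dv=diff_value(kgrid,v,method);
% kgrid  : grid for next period capital (column vector)
% v      : value function on the grid (one column per shock, if any)
% method : kept only for compatibility with the calls above (unused here)
kgrid = kgrid(:);
n     = size(v,1);
dv    = zeros(size(v));
dv(1,:)     = (v(2,:)-v(1,:))/(kgrid(2)-kgrid(1));         % forward difference
dv(n,:)     = (v(n,:)-v(n-1,:))/(kgrid(n)-kgrid(n-1));     % backward difference
dv(2:n-1,:) = (v(3:n,:)-v(1:n-2,:))./ ...
              repmat(kgrid(3:n)-kgrid(1:n-2),1,size(v,2)); % centered differences

Any smoother alternative (for instance differentiating an interpolant of the value function) works equally well; what matters is that the derivative is evaluated on the grid for k_{t+1}.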

    2.5 Parametric dynamic programming

    The last technique we will describe borrows from approximation theory using either orthogonal

    polynomials or spline functions. The idea is actually to make a guess for the functional form of

    the value function and iterate on the parameters of this functional form. The algorithm then

    works as follows

1. Choose a functional form for the value function \widetilde{V}(x; \Theta), a grid of interpolating nodes X = {x_1, ..., x_N}, a stopping criterion ε > 0 and an initial vector of parameters \Theta_0.

2. Using the conjecture for the value function, perform the maximization step in the Bellman equation, that is compute w_\ell = T(\widetilde{V}(x_\ell, \Theta_i)):

   w_\ell = T(\widetilde{V}(x_\ell, \Theta_i)) = \max_{y} u(y, x_\ell) + \beta \widetilde{V}(x', \Theta_i)
   s.t. x' = h(y, x_\ell), for \ell = 1, ..., N

3. Using the approximation method you have chosen, compute a new vector of parameters \Theta_{i+1} such that \widetilde{V}(x, \Theta_{i+1}) approximates the data (x_\ell, w_\ell).

4. If ‖\widetilde{V}(x, \Theta_{i+1}) − \widetilde{V}(x, \Theta_i)‖ < ε then stop, otherwise go back to 2.

First note that for this method to be implementable, we need the payoff function and the value function to be continuous. The approximation function may be a combination of polynomials, neural networks or splines. Note that during the optimization problem, we may have to rely on a numerical maximization algorithm, and the approximation method may involve numerical minimization in order to solve a nonlinear least-squares problem of the form:

\Theta_{i+1} \in \arg\min_{\Theta} \sum_{\ell=1}^{N} \left(w_\ell - \widetilde{V}(x_\ell; \Theta)\right)^2

This algorithm is usually much faster than value iteration as it does not require iterating on a large grid. As an example, I will once again focus on the optimal growth problem we have been dealing with so far, and I will approximate the value function by

\widetilde{V}(k; \Theta) = \sum_{i=0}^{p} \theta_i \varphi_i\!\left(2\,\frac{\log(k) - \underline{k}}{\overline{k} - \underline{k}} - 1\right)

where \{\varphi_i(\cdot)\}_{i=0}^{p} is a set of Chebychev polynomials and \underline{k}, \overline{k} denote the lower and upper bounds of the grid for log(k). In the example, I set p = 5 and used 20 nodes. Figure 2 reports the decision rule and the value function in this case, and Table 1 reports the parameters of the approximation function. The algorithm converged in 242 iterations, but it took much less time than value iteration.

    Matlab Code: Parametric dynamic programming (OGM)

    sigma = 1.50; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.95; % discount factor

    alpha = 0.30; % capital elasticity of output

    nbk = 20; % number of points in the grid

    p = 10; % order of polynomials

    crit = 1; % convergence criterion

    iter = 1; % iteration

    epsi = 1e-6; % convergence parameter

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    dev = 0.9; % maximal deviation from k*

    kmin = log((1-dev)*ks); % lower bound on the grid

    kmax = log((1+dev)*ks); % upper bound on the grid

rk = -cos((2*[1:nbk]'-1)*pi/(2*nbk)); % Interpolating nodes

kgrid = exp(kmin+(rk+1)*(kmax-kmin)/2); % mapping

%

% Initial guess for the approximation

%

v = ((kgrid.^alpha).^(1-sigma)-1)/((1-sigma)*(1-beta));

X = chebychev(rk,p);

th0 = X\v;

    Tv = zeros(nbk,1);

    kp = zeros(nbk,1);

    %

    % Main loop

    %

options = optimset('Display','off','MaxFunEvals',1e9);


    while crit>epsi;

    k0 = kgrid(1);

    for i=1:nbk

param = [alpha beta delta sigma kmin kmax p kgrid(i)];

kp(i) = fminunc(@(x) tv(x,param,th0),k0,options);

    k0 = kp(i);

    Tv(i) = -tv(kp(i),param,th0);

    end;

    theta= X\Tv;

    crit = max(abs(Tv-v));

    v = Tv;

    th0 = theta;

    iter= iter+1;

    end

    Matlab Code: Parametric dynamic programming (extra functions)

    function res=tv(kp,param,theta);

    alpha = param(1);

    beta = param(2);

    delta = param(3);

    sigma = param(4);

    kmin = param(5);

    kmax = param(6);

    n = param(7);

    k = param(8);

    kp = sqrt(kp.^2); % insures positivity of k

    v = value(kp,[kmin kmax n],theta); % computes the value function

    c = k.^alpha+(1-delta)*k-kp; % computes consumption

d = find(c<=0);                      % finds negative consumption values
c(d) = NaN;                          % removes them from the utility computation
util = (c.^(1-sigma)-1)/(1-sigma);   % computes utility
util(d) = -1e12;                     % penalizes negative consumption
res = -(util+beta*v);                % minus the value (fminunc minimizes)

function Tx=chebychev(X,n);
X = X(:);                            % makes sure X is a column vector
lx = size(X,1);
if n<0; error('n should be a positive integer'); end
switch n;
case 0;
Tx=ones(lx,1);


    case 1;

    Tx=[ones(lx,1) X];

    otherwise

    Tx=[ones(lx,1) X];

    for i=3:n+1;

    Tx=[Tx 2*X.*Tx(:,i-1)-Tx(:,i-2)];

    end

    end
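The routine value used in tv above is not reproduced in the notes either. A minimal sketch, assuming it simply maps the capital stock back onto the Chebychev domain (on a log scale, consistently with the construction of the grid) and evaluates the approximation, could be:

Matlab Code: A possible value routine (illustration)

function v=value(k,param,theta);
kmin = param(1);                       % lower bound of the grid (in logs)
kmax = param(2);                       % upper bound of the grid (in logs)
n    = param(3);                       % order of the Chebychev approximation
zk   = 2*(log(k)-kmin)/(kmax-kmin)-1;  % maps log(k) into [-1,1]
v    = chebychev(zk,n)*theta;          % evaluates the approximation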

Figure 2: Deterministic OGM (Decision rules, Parametric DP). The figure plots, against the capital stock k_t, the value function (left panel), next period capital stock (middle panel) and consumption (right panel).

Table 1: Value function approximation

θ_0 = -0.2334, θ_1 = 3.0686, θ_2 = 0.2543, θ_3 = 0.0153, θ_4 = -0.0011, θ_5 = -0.0002

A potential problem with the use of Chebychev polynomials is that they do not put any assumption on the shape of the value function, which we know to be concave and strictly increasing in this case. This is why Judd [1998] recommends using shape-preserving methods such as Schumaker approximation. Judd and Solnick [1994] have successfully applied this latter technique to the optimal growth model and found that the approximation was very good and dominated other methods (they actually get the same precision with 12 nodes as that achieved with a 1200-point grid using a value iteration technique).

    3 Stochastic Dynamic Programming

In a large number of problems, we have to deal with stochastic shocks (just think of a standard RBC model); the dynamic programming technique can obviously be extended to deal with such problems. This section will first show how we can obtain the Bellman equation before addressing some important issues concerning the discretization of shocks. Then it will describe the implementation of the value iteration and policy iteration techniques for the stochastic case. The code for the endogenous grid method is reported at the end of the text.

    3.1 The Bellman Equation

The stochastic problem differs from the deterministic problem in that we now have to take expectations. The problem then defines a value function which has as arguments the state variable x_t but also the stationary shock s_t, whose sequence {s_t}_{t=0}^{+∞} satisfies

s_{t+1} = \Phi(s_t, \varepsilon_{t+1})   (6)

where ε is a white noise process. The value function is therefore given by

V(x_t, s_t) = \max_{\{y_{t+\tau} \in D(x_{t+\tau}, s_{t+\tau})\}_{\tau=0}^{\infty}} E_t \sum_{\tau=0}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})   (7)

subject to (6) and

x_{t+1} = h(x_t, y_t, s_t)   (8)

Since y_t, x_t and s_t are either perfectly observed or decided upon in period t, they are known in t, such that we may rewrite the value function as

V(x_t, s_t) = \max_{\{y_t \in D(x_t,s_t), \{y_{t+\tau} \in D(x_{t+\tau},s_{t+\tau})\}_{\tau=1}^{\infty}\}} u(y_t, x_t, s_t) + E_t \sum_{\tau=1}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})

or

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t) + \max_{\{y_{t+\tau} \in D(x_{t+\tau},s_{t+\tau})\}_{\tau=1}^{\infty}} E_t \sum_{\tau=1}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})

Using the change of variable k = τ − 1, this rewrites

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} E_t \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})


It is important at this point to recall that

E_t(X(\varepsilon_{t+\tau})) = \int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_t)\, d\varepsilon_{t+\tau}
= \int\!\!\int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_{t+1}) f(\varepsilon_{t+1}|\varepsilon_t)\, d\varepsilon_{t+\tau}\, d\varepsilon_{t+1}
= \int \ldots \int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_{t+\tau-1}) \ldots f(\varepsilon_{t+1}|\varepsilon_t)\, d\varepsilon_{t+\tau} \ldots d\varepsilon_{t+1}

which is a corollary of the law of iterated projections, such that the value function rewrites

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} E_t E_{t+1} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})

or

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} \int E_{t+1} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})\, f(s_{t+1}|s_t)\, ds_{t+1}

Note that, because each value of the shock defines a particular continuation problem, the maximization of the integral corresponds to the integral of the maximizations; the max operator and the integral are therefore interchangeable, so that we get

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \int \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} E_{t+1} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})\, f(s_{t+1}|s_t)\, ds_{t+1}

By definition, the term under the integral corresponds to V(x_{t+1}, s_{t+1}), such that the value function rewrites

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t) + \beta \int V(x_{t+1}, s_{t+1}) f(s_{t+1}|s_t)\, ds_{t+1}

or equivalently

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t) + \beta E_t V(x_{t+1}, s_{t+1})

which is precisely the Bellman equation for the stochastic dynamic programming problem.

    3.2 Discretization of the Shocks

    A very important problem that arises whenever we deal with value iteration or policy iteration in

    a stochastic environment is that of the discretization of the space spanned by the shocks. Indeed,


    the use of a continuous support for the stochastic shocks is unfeasible for a computer that can

    only deal with discrete supports. We therefore have to transform the continuous problem into

    a discrete one with the constraint that the asymptotic properties of the continuous and the

    discrete processes should be the same. The question we therefore face is: does there exist a

    discrete representation for s which is equivalent to its continuous original representation? The

    answer to this question is yes. In particular as soon as we deal with (V)AR processes, we can

    use a very powerful tool: Markov chains.

Markov Chains: A Markov chain is a sequence of random values whose probability distribution at a given date depends only upon the value attained at the previous date. We will restrict ourselves to discrete-time Markov chains, in which the state changes at certain discrete time instants, indexed by an integer variable t. At each time step t, the Markov chain is in a state, denoted by s ∈ S ≡ {s_1, ..., s_M}. S is called the state space.

The Markov chain is described in terms of transition probabilities π_{ij}. These transition probabilities should be interpreted as follows:

If the economy is in state s_i in period t, the probability that the next state is equal to s_j is π_{ij}.

We therefore get the following definition.

Definition. A Markov chain is a stochastic process with a discrete state space S, such that the conditional distribution of s_{t+1} is independent of all previously attained states given s_t:

π_{ij} = Prob(s_{t+1} = s_j | s_t = s_i),  s_i, s_j ∈ S.

The important assumption we shall make concerning Markov processes is that the transition probabilities π_{ij} apply as soon as state s_i is reached, no matter the history of the shocks nor how the system got to state s_i. In other words, there is no hysteresis. From a mathematical point of view, this corresponds to the so-called Markov property

P(s_{t+1} = s_j | s_t = s_i, s_{t-1} = s_{i_{t-1}}, ..., s_0 = s_{i_0}) = P(s_{t+1} = s_j | s_t = s_i) = π_{ij}

for all periods t, all states s_i, s_j ∈ S, and all sequences {s_{i_n}}_{n=0}^{t-1} of earlier states. Thus, the probability of the next state s_{t+1} depends only on the current realization of s.
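As an illustration of the Markov property (this example is not part of the original notes), a chain with an arbitrary transition matrix PI can be simulated by drawing, in each period, the next state from the row of PI associated with the current state:

Matlab Code: Simulating a Markov chain (illustration)

PI   = [0.9 0.1;0.2 0.8];              % an arbitrary 2-state transition matrix
T    = 1000;                           % length of the simulated sample
s    = zeros(T,1);                     % will contain the state indices
s(1) = 1;                              % initial state
cPI  = cumsum(PI,2);                   % cumulative transition probabilities
for t=2:T
    s(t) = 1+sum(rand>cPI(s(t-1),:));  % next state only depends on s(t-1)
end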


The transition probabilities π_{ij} must of course satisfy

1. π_{ij} ≥ 0 for all i, j = 1, ..., M

2. \sum_{j=1}^{M} π_{ij} = 1 for all i = 1, ..., M

All of the elements of a Markov chain model can then be encoded in a transition probability matrix

\Pi = \begin{pmatrix} \pi_{11} & \ldots & \pi_{1M} \\ \vdots & \ddots & \vdots \\ \pi_{M1} & \ldots & \pi_{MM} \end{pmatrix}

Note that the (i, j) element of \Pi^k then gives the probability that s_{t+k} = s_j given that s_t = s_i. In the long run, we obtain the steady state equivalent for a Markov chain: the invariant distribution or stationary distribution.

Definition. A stationary distribution for a Markov chain is a distribution π* such that

1. π*_j ≥ 0 for all j = 1, ..., M

2. \sum_{j=1}^{M} π*_j = 1

3. π* = π* Π
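Numerically, the stationary distribution associated with a given transition matrix can be obtained by iterating on π* = π*Π from any initial distribution; a minimal sketch (again an illustration, not part of the original notes):

Matlab Code: Stationary distribution of a Markov chain (illustration)

PI   = [0.9 0.1;0.2 0.8];             % an arbitrary 2-state transition matrix
pi0  = ones(1,size(PI,1))/size(PI,1); % start from the uniform distribution
crit = 1;
while crit>1e-12
    pi1  = pi0*PI;                    % apply the transition matrix
    crit = max(abs(pi1-pi0));
    pi0  = pi1;
end
% pi0 now approximates the stationary distribution (here [2/3 1/3])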

    Moment approach to discretization: A first simple way to tackle this problem is to rely on a

    method of moments. The idea is that the continuous process and its discrete approximation

    should possess the same asymptotic properties in terms of conditional first and second order

    moments. We will apply it later to the optimal growth model.

Nevertheless, while this approach is straightforward when we restrict ourselves to a 2-state representation of the process, it becomes cumbersome when we want to deal with more states. We can then rely on a quadrature-based method proposed by Tauchen and Hussey [1991].

Gaussian-quadrature approach to discretization: Tauchen and Hussey [1991] provide a simple way to discretize VAR processes relying on Gaussian quadrature. This note will only present the case of an AR(1) process of the form⁴

s_{t+1} = \rho s_t + (1-\rho)\bar{s} + \varepsilon_{t+1}

where \varepsilon_{t+1} \sim N(0, \sigma^2). This implies that

\int \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{s_{t+1} - \rho s_t - (1-\rho)\bar{s}}{\sigma}\right)^2\right\} ds_{t+1} = \int f(s_{t+1}|s_t)\, ds_{t+1} = 1

which illustrates the fact that s is a continuous random variable. Tauchen and Hussey [1991] propose to replace the integral by

\int \Phi(s_{t+1}; s_t, \bar{s})\, f(s_{t+1}|\bar{s})\, ds_{t+1} \equiv \int \frac{f(s_{t+1}|s_t)}{f(s_{t+1}|\bar{s})}\, f(s_{t+1}|\bar{s})\, ds_{t+1} = 1

where f(s_{t+1}|\bar{s}) denotes the density of s_{t+1} conditional on the fact that s_t = \bar{s} (in fact the unconditional density function), which in our case implies that

\Phi(s_{t+1}; s_t, \bar{s}) \equiv \frac{f(s_{t+1}|s_t)}{f(s_{t+1}|\bar{s})} = \exp\left\{-\frac{1}{2}\left[\left(\frac{s_{t+1} - \rho s_t - (1-\rho)\bar{s}}{\sigma}\right)^2 - \left(\frac{s_{t+1} - \bar{s}}{\sigma}\right)^2\right]\right\}

Then we can use the standard linear transformation and impose z_t = (s_t - \bar{s})/(\sigma\sqrt{2}) to get

\frac{1}{\sqrt{\pi}} \int \exp\left\{-\left((z_{t+1} - \rho z_t)^2 - z_{t+1}^2\right)\right\} \exp\left(-z_{t+1}^2\right) dz_{t+1}

for which we can use a Gauss-Hermite quadrature. Assume then that we have the quadrature nodes z_j and weights ω_j, j = 1, ..., n; the quadrature leads to the formula

\frac{1}{\sqrt{\pi}} \sum_{j=1}^{n} \omega_j \Phi(z_j; z_i, \bar{s}) \simeq 1

In other words, we might interpret the quantity \omega_j \Phi(z_j; z_i, \bar{s})/\sqrt{\pi} as an estimate \widehat{\pi}_{ij} \equiv Prob(s_{t+1} = s_j | s_t = s_i) of the transition probability from state i to state j. However, it is important to remember that the quadrature is just an approximation, such that it will generally be the case that \sum_{j=1}^{n} \widehat{\pi}_{ij} = 1 does not hold exactly. Tauchen and Hussey therefore propose the following modification:

\pi_{ij} = \frac{\omega_j \Phi(z_j; z_i, \bar{s})}{\sqrt{\pi}\,\widehat{s}_i}  where  \widehat{s}_i = \frac{1}{\sqrt{\pi}} \sum_{j=1}^{n} \omega_j \Phi(z_j; z_i, \bar{s})

⁴ In their article, Tauchen and Hussey [1991] consider more general processes.

    Matlab Code: TauchenHusseys procedure

    function [s,p]=tauch_hussey(xbar,rho,sigma,n)

    % xbar : mean of the x process



    % rho : persistence parameter

    % sigma : volatility

    % n : number of nodes

    %

    % returns the states s and the transition probabilities p

    %

[xx,wx] = gauss_herm(n);         % nodes and weights for s

s = sqrt(2)*sigma*xx+xbar;       % discrete states

x = xx(:,ones(n,1));             % current state (one column per future state)

z = x';                          % future state

w = wx(:,ones(n,1))';            % quadrature weights

%

% computation

%

p = (exp(z.*z-(z-rho*x).*(z-rho*x)).*w)./sqrt(pi);

sx = sum(p,2);                   % normalization: each row must sum to one

p = p./sx(:,ones(n,1));
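The routine gauss_herm that returns the Gauss-Hermite nodes and weights is not reproduced in the notes. A minimal sketch, assuming one is willing to compute them from the eigenvalues and eigenvectors of the Jacobi matrix associated with the Hermite polynomials (the Golub-Welsch algorithm), could be:

Matlab Code: A possible gauss_herm routine (illustration)

function [x,w]=gauss_herm(n);
% Gauss-Hermite nodes and weights for the weighting function exp(-x^2)
b     = sqrt((1:n-1)/2);          % off-diagonal of the Jacobi matrix
J     = diag(b,1)+diag(b,-1);     % symmetric tridiagonal Jacobi matrix
[V,D] = eig(J);
[x,i] = sort(diag(D));            % nodes are the eigenvalues
w     = sqrt(pi)*V(1,i)'.^2;      % weights from the first component of the eigenvectors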

    3.3 Value Iteration

    As in the deterministic case, the convergence of the simple value function iteration procedure is

    insured by the contraction mapping theorem. This is however a bit more subtle than that as we

    have to deal with the convergence of a probability measure, which goes far beyond this intro-

    duction to dynamic programming.5 The algorithm is basically the same as in the deterministic

    case up to the delicate point that expectations have to be computed at each iteration. It writes

    as follows

1. Decide on a grid, X, of admissible values for the state variable x

   X = {x_1, ..., x_N}

   and a grid, S, of admissible values for the shock s

   S = {s_1, ..., s_M}, together with the transition matrix \Pi = (\pi_{ij})

   Formulate an initial guess for the value function V_0(x) and choose a stopping criterion ε > 0.

2. For each x_ℓ ∈ X, ℓ = 1, ..., N, and s_k ∈ S, k = 1, ..., M, compute

   V_{i+1}(x_\ell, s_k) = \max_{x' \in X} u(y(x_\ell, s_k, x'), x_\ell, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V_i(x', s_j)

    5The interested reader should then refer to Lucas et al. [1989] chapter 9.


3. If ‖V_{i+1}(x, s) − V_i(x, s)‖ < ε go to the next step, otherwise go back to 2.

4. Compute the final solution as

   y^\star(x, s) = y(x, x'(x, s), s)

As in the deterministic case, we will illustrate the method relying on the optimal growth model, with

u(c) = \frac{c^{1-\sigma} - 1}{1-\sigma}  and  k' = \exp(a) k^\alpha - c + (1-\delta)k

where a' = \rho a + \varepsilon'. Then the Bellman equation writes

V(k, a) = \max_{c} \frac{c^{1-\sigma} - 1}{1-\sigma} + \beta \int V(k', a') f(a'|a)\, da'

From the law of motion of capital we can determine consumption as

c = \exp(a) k^\alpha + (1-\delta)k - k'

such that, plugging this result in the Bellman equation, we have

V(k, a) = \max_{k'} \frac{(\exp(a) k^\alpha + (1-\delta)k - k')^{1-\sigma} - 1}{1-\sigma} + \beta \int V(k', a') f(a'|a)\, da'

A first problem that we encounter is that we would like to be able to evaluate the integral involved by the rational expectation. We therefore have to discretize the shock. Here, we will consider that the technology shock can be accurately approximated by a 2-state Markov chain, such that a can take on 2 values a_1 and a_2 (a_1 < a_2). We will also assume that the transition matrix is symmetric, such that

\Pi = \begin{pmatrix} \pi & 1-\pi \\ 1-\pi & \pi \end{pmatrix}

a_1, a_2 and π are selected such that the process reproduces the conditional first and second order moments of the AR(1) process:

First order moments:

\pi a_1 + (1-\pi) a_2 = \rho a_1
(1-\pi) a_1 + \pi a_2 = \rho a_2

Second order moments:

\pi a_1^2 + (1-\pi) a_2^2 - (\rho a_1)^2 = \sigma_\varepsilon^2
(1-\pi) a_1^2 + \pi a_2^2 - (\rho a_2)^2 = \sigma_\varepsilon^2

From the first two equations we get a_1 = -a_2 and π = (1+ρ)/2. Plugging these two results in the two last equations, we get a_1 = -\sigma_\varepsilon/\sqrt{1-\rho^2}. Hence, we will actually work with a value function of the form

V(k, a_i) = \max_{c} \frac{c^{1-\sigma} - 1}{1-\sigma} + \beta \sum_{j=1}^{2} \pi_{ij} V(k', a_j)

Now, let us define a grid of N feasible values for k such that we have

K = {k_1, ..., k_N}

and an initial value function V_0(k), that is a vector of N numbers that relates each k_ℓ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge, but if we want it to converge fast enough it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion ε.

In Figure 3, we report the value function and the decision rules obtained for the stochastic optimal growth model with α = 0.3, β = 0.96, δ = 0.1, σ = 2, ρ = 0.8 and σ_ε = 0.05. The grid for the capital stock is composed of 2000 data points ranging from 25% below to 25% above the deterministic steady state.

    Matlab Code: Value iteration (Stochastic OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

    % Shocks

    p = (1+rho)/2;

    a = se/sqrt(1-rho*rho);

    PI = [p 1-p;1-p p];

    agrid = [-a a];

    nba = length(agrid);

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid


    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,nba); % value function

    tv = zeros(nbk,nba); % value function

    dr = zeros(nbk,nba); % decision rule (will contain indices)

    % Main loop

    iter = 1;

while crit>tol;
   for j=1:nba
      for i=1:nbk
         % compute indexes for which consumption is positive
         imax = sum(kgrid<=exp(agrid(j))*kgrid(i)^alpha+(1-delta)*kgrid(i));
         % consumption and utility for feasible next period capital stocks
         c    = exp(agrid(j))*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';
         util = (c.^(1-sigma)-1)/(1-sigma);
         % expected continuation value and maximization
         EV   = v(1:imax,:)*PI(j,:)';
         [tv(i,j),dr(i,j)] = max(util+beta*EV);
      end
   end
   crit = max(max(abs(tv-v))); % Compute convergence criterion
   v    = tv;                  % Update the value function
   iter = iter+1;
end
% Decision rules
kp = kgrid(dr);               % next period capital stock
c  = exp(repmat(agrid,nbk,1)).*repmat(kgrid',1,nba).^alpha+(1-delta)*repmat(kgrid',1,nba)-kp;


Figure 3: Stochastic OGM (Decision rules, Value iteration). The figure plots, against the capital stock k_t, the value function (left panel), next period capital (middle panel) and consumption (right panel).

2. Find a new policy rule y = f_{i+1}(x, s_k), k = 1, ..., M, such that

   f_{i+1}(x, s_k) \in \arg\max_{y} u(y, x, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V(x', s_j)

   with x' = h(x, y, s_k).

3. Check if ‖f_{i+1}(x, s) − f_i(x, s)‖ < ε; if yes then stop, otherwise go back to 2.

When computing the value function we actually have to solve a linear system of the form

V_{i+1}(x_\ell, s_k) = u(f_{i+1}(x_\ell, s_k), x_\ell, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V_{i+1}(h(x_\ell, f_{i+1}(x_\ell, s_k), s_k), s_j)

for V_{i+1}(x_\ell, s_k) (for all x_\ell ∈ X and s_k ∈ S), which may be rewritten as

V_{i+1}(x_\ell, s_k) = u(f_{i+1}(x_\ell, s_k), x_\ell, s_k) + \beta \Pi_{k\cdot} Q V_{i+1}(x_\ell, \cdot),  x_\ell ∈ X

where Q is an (N × N) matrix

Q_{\ell j} = 1 if x_j = h(x_\ell, f_{i+1}(x_\ell, s_k), s_k), 0 otherwise.

Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system, to get

V_{i+1}(x, s) = (I - \beta Q)^{-1} u(f_{i+1}(x, s), x, s)

We apply this algorithm to the same optimal growth model as in the previous section.

    Matlab Code: Policy iteration (Stochastic OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    epsi = 1e-6; % Tolerance parameter

    % Shocks

    p = (1+rho)/2;

    a = se/sqrt(1-rho*rho);

    PI = [p 1-p;1-p p];

    agrid = exp([-a a]);

    nba = length(agrid);

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,nba); % value function

    v1 = zeros(nbk,nba); % value function

    kp0 = zeros(nbk,nba); % value function

    dr = zeros(nbk,nba); % decision rule (will contain indices)

    S = 100;

    % Main loop

    while crit>epsi;

    for j=1:nba

    for i=1:nbk

    % compute indexes for which consumption is positive

imax = sum(kgrid<=agrid(j)*kgrid(i)^alpha+(1-delta)*kgrid(i));

% consumption and utility for feasible next period capital stocks

c    = agrid(j)*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';

util = (c.^(1-sigma)-1)/(1-sigma);


EV = v(1:imax,:)*PI(j,:)';

    [v1(i,j),dr(i,j)]= max(util+beta*EV);

    end;

    end

    % decision rules

    kp = kgrid(dr);

c = kron(agrid,kgrid'.^alpha)+(1-delta)*repmat(kgrid',1,nba)-kp;

    util= (c.^(1-sigma)-1)/(1-sigma);

    Q = sparse(nbk*nba,nbk*nba);

    for j=1:nba;

    Q0 = sparse(nbk,nbk);

J = sub2ind([nbk nbk],(1:nbk)',dr(:,j));

    Q0(J)= 1;

    Q((j-1)*nbk+1:j*nbk,:) = kron(PI(j,:),Q0);

    end

    Tv = (speye(nbk*nba)-beta*Q)\util(:);

    v = reshape(Tv,nbk,nba);

    crit= max(max(abs(kp-kp0)))

    kp0 = kp;

    end

As for the deterministic case, the modified k-step iterations can be used to update the value function rather than solving the big linear system.

The endogenous grid method extends as easily to the stochastic case as the previous algorithms; only the matlab code is reported.

    Matlab Code: Endogenous Grid Method (Stochastic OGM)

    clear all

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

method = 'linear'; % Interpolation method

    %

    % Shocks

    %

    a = se/(sqrt(1-rho*rho));

    agrid = exp([-a a]);

    nba = length(agrid);


    p = (1+rho)/2;

    PI = [p 1-p;1-p p];

    % Setup the grid

    kss = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*kss; % lower bound on the grid

    kmax = (1+dev)*kss; % upper bound on the grid

kpgrid = linspace(kmin,kmax,nbk)'; % builds the grid (for next period capital)

    %

    % Initial values

    %

    kgrid = kss*ones(nbk,nba);

    v0 = 1.5*(kpgrid.^alpha+(1-delta)*kpgrid);

    v0 = repmat((v0.^(1-sigma)-1)/((1-sigma)*(1-beta)),1,nba);

    v1 = v0;

    while crit>tol

    dv0 = diff_value(kpgrid,v0,method);

c = (beta*(dv0*PI')).^(-1/sigma);

    u = (c.^(1-sigma)-1)/(1-sigma);

tmp = u+beta*(v0*PI');

    for i=1:nba

    crk = 1;

    k0 = kgrid(:,i);

    while crk>tol;

    f0 = agrid(i)*k0.^alpha+(1-delta)*k0-kpgrid-c(:,i);

    df0 = alpha*agrid(i)*k0.^(alpha-1)+(1-delta);

    k1 = k0-f0./df0;

    crk = max(abs(k0-k1));

    k0 = k1;

    end

    kgrid(:,i) = k0;

    v1(:,i) = interp1(k0,tmp(:,i),kpgrid,method);

    end

    crit = max(max(abs(v0-v1)));

    v0 = v1;

    end

    %%

    ygrid = kron(agrid,ones(nbk,1)).*kgrid.^alpha;

    cgrid = ygrid+(1-delta)*kgrid-repmat(kpgrid,1,nba);

    igrid = ygrid-cgrid;


    References

Bertsekas, D., Dynamic Programming and Stochastic Control, New York: Academic Press, 1976.

Carroll, C.D., The Method of Endogenous Gridpoints for Solving Dynamic Stochastic Optimization Problems, Economics Letters, 2006, 91 (3), 312-320.

Judd, K. and A. Solnick, Numerical Dynamic Programming with Shape-Preserving Splines, Manuscript, Hoover Institution, 1994.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Lucas, R., N. Stokey, and E. Prescott, Recursive Methods in Economic Dynamics, Cambridge (MA): Harvard University Press, 1989.

Tauchen, G. and R. Hussey, Quadrature-Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371-396.