
Applied Dynamic Programming

    In these notes, we will deal with a fundamental tool of dynamic macroeconomics: dynamic

    programming. Dynamic programming is a very convenient way of writing a large set of dynamic

    problems in economic analysis as most of the properties of this tool are now well established and

    understood.1 In these lectures, we will not deal with all the theoretical properties attached to

    this tool, but will rather give some recipes to solve economic problems using the tool of dynamic

    programming. In order to understand the problem, we will first deal with deterministic models,

    before extending the analysis to stochastic ones. However, we shall give some preliminary

definitions and theorems that justify this approach.

    1 The Bellman Equation and Associated Theorems

    1.1 Heuristic Derivation of the Bellman Equation

Let us consider the case of an agent that has to decide on the path of a set of control variables, {y_t}_{t=0}^∞, in order to maximize the discounted sum of its future payoffs, u(y_t, x_t), where x_t is a state variable assumed to evolve according to

x_{t+1} = h(x_t, y_t),  x_0 given.

We finally make the assumption that the model is Markovian.

The optimal value our agent can derive from this maximization process is given by

V(x_t) = \max_{\{y_{t+s} \in D(x_{t+s})\}_{s=0}^{\infty}} \sum_{s=0}^{\infty} \beta^s u(y_{t+s}, x_{t+s})   (1)

¹ For a mathematical exposition of the problem see Bertsekas [1976], for a more economic approach see Lucas, Stokey and Prescott [1989].


where D is the set of all feasible decisions for the variables of choice. The function V(x_t) is the value function and corresponds to the payoff of the agent when she adopts her optimal plan. Note that the value function is a function of the state variable only. Indeed, since the model is Markovian, the current state summarizes all the information from the past that is needed to take decisions. In other words, the whole path can be predicted once the state variable is observed. Therefore, the value at t is only a function of x_t. Equation (1) may now be rewritten as

V(x_t) = \max_{\{y_t \in D(x_t), \{y_{t+s} \in D(x_{t+s})\}_{s=1}^{\infty}\}} u(y_t, x_t) + \sum_{s=1}^{\infty} \beta^s u(y_{t+s}, x_{t+s})   (2)

Making the change of variable k = s - 1, (2) rewrites

V(x_t) = \max_{\{y_t \in D(x_t), \{y_{t+1+k} \in D(x_{t+1+k})\}_{k=0}^{\infty}\}} u(y_t, x_t) + \sum_{k=0}^{\infty} \beta^{k+1} u(y_{t+1+k}, x_{t+1+k})

or

V(x_t) = \max_{y_t \in D(x_t)} u(y_t, x_t) + \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k})\}_{k=0}^{\infty}} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k})   (3)

Note that, by definition, we have

V(x_{t+1}) = \max_{\{y_{t+1+k} \in D(x_{t+1+k})\}_{k=0}^{\infty}} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k})

such that (3) rewrites as

V(x_t) = \max_{y_t \in D(x_t)} u(y_t, x_t) + \beta V(x_{t+1})   (4)

This is the so-called Bellman equation that lies at the core of dynamic programming theory. This equation is associated, in each and every period t, with a set of optimal policy functions for y and x, which are defined by

\{y_t, x_{t+1}\} \in \arg\max_{y_t \in D(x_t)} u(y_t, x_t) + \beta V(x_{t+1})   (5)

    Our problem is now to solve (4) for the function V (xt). This problem is particularly complicated

    as we are not solving for just a point that would satisfy the equation, but we are interested in

    finding a function that satisfies the equation. We are therefore leaving the realm of real analysis

    for functional analysis.

    A simple procedure to find a solution would be the following

    1. Make an initial guess on the form of the value function V0(xt)

    2. Update the guess using the Bellman equation such that

V_{i+1}(x_t) = \max_{y_t \in D(x_t)} u(y_t, x_t) + \beta V_i(h(x_t, y_t))


    3. If Vi+1(xt) = Vi(xt), then a fixed point has been found and the problem is solved, if not

    we go back to 2, and iterate on the process until convergence.

    Therefore, solving the Bellman equation just amounts to finding a fixed point of an operator T

    Vi+1 = TVi

    where T stands for the list of operations involved in the computation of the Bellman equation.

The problem is then that of the existence and the uniqueness of this fixed point. Mathematical

    theory has provided conditions for the existence and uniqueness of a solution.
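Before turning to these conditions, the mechanics of the iteration V_{i+1} = T V_i can be illustrated on a trivial scalar contraction. The following sketch is a toy illustration added here (it is not part of the economic problem): the operator T(v) = βv + 1 is applied until two successive iterates are close enough.

Matlab Code: Fixed point iteration on a toy contraction (illustration)

beta = 0.5;              % modulus of the toy contraction T(v) = beta*v+1
v    = 0;                % arbitrary initial guess V0
crit = 1;                % initial convergence criterion
while crit>1e-8
    tv   = beta*v+1;     % apply the operator: Vi+1 = T Vi
    crit = abs(tv-v);    % distance between two successive iterates
    v    = tv;           % update the guess
end
% v now approximates the unique fixed point 1/(1-beta) = 2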

    1.2 Existence and uniqueness of a solution

Definition. A metric space is a set S, together with a metric ρ: S × S → R₊, such that for all x, y, z ∈ S:

1. ρ(x, y) ≥ 0, with ρ(x, y) = 0 if and only if x = y,

2. ρ(x, y) = ρ(y, x),

3. ρ(x, z) ≤ ρ(x, y) + ρ(y, z).

Definition. A sequence {x_n}_{n=0}^∞ in S converges to x ∈ S if for each ε > 0 there exists an integer N such that ρ(x_n, x) < ε for all n ≥ N.

Definition. A sequence {x_n}_{n=0}^∞ in S is a Cauchy sequence if for each ε > 0 there exists an integer N such that ρ(x_n, x_m) < ε for all n, m ≥ N.

Definition. A metric space (S, ρ) is complete if every Cauchy sequence in S converges to a point in S.

Definition. Let (S, ρ) be a metric space and T: S → S be a function mapping S into itself. T is a contraction mapping (with modulus β) if for some β ∈ (0, 1),

ρ(Tx, Ty) ≤ β ρ(x, y), for all x, y ∈ S.

We then have the following remarkable theorem that establishes the existence and uniqueness of the fixed point of a contraction mapping.

Theorem (Contraction Mapping Theorem). If (S, ρ) is a complete metric space and T: S → S is a contraction mapping with modulus β ∈ (0, 1), then

1. T has exactly one fixed point V ∈ S such that V = TV,

2. for any V_0 ∈ S, ρ(T^n V_0, V) ≤ β^n ρ(V_0, V), with n = 0, 1, 2, ...

Since we are endowed with all the tools we need to prove the theorem, we shall do it.

Proof: In order to prove 1, we shall first prove that if we select any sequence {V_n}_{n=0}^∞ such that, for each n, V_n ∈ S and

V_{n+1} = T V_n

this sequence converges, and that it converges to V ∈ S. In order to show convergence of {V_n}_{n=0}^∞, we shall prove that {V_n}_{n=0}^∞ is a Cauchy sequence. First of all, note that the contraction property of T implies that

ρ(V_2, V_1) = ρ(TV_1, TV_0) ≤ β ρ(V_1, V_0)

and therefore

ρ(V_{n+1}, V_n) = ρ(TV_n, TV_{n-1}) ≤ β ρ(V_n, V_{n-1}) ≤ ... ≤ β^n ρ(V_1, V_0)

Now consider two terms of the sequence, V_m and V_n, m > n. The triangle inequality implies that

ρ(V_m, V_n) ≤ ρ(V_m, V_{m-1}) + ρ(V_{m-1}, V_{m-2}) + ... + ρ(V_{n+1}, V_n)

therefore, making use of the previous result, we have

ρ(V_m, V_n) ≤ (β^{m-1} + β^{m-2} + ... + β^n) ρ(V_1, V_0) ≤ \frac{β^n}{1-β} ρ(V_1, V_0)

Since β ∈ (0, 1), β^n → 0 as n → ∞, so that for each ε > 0 there exists N such that ρ(V_m, V_n) < ε. Hence {V_n}_{n=0}^∞ is a Cauchy sequence and it therefore converges. Further, since we have assumed that S is complete, V_n converges to V ∈ S.

We now have to show that V = TV in order to complete the proof of the first part. Note that, for each ε > 0, and for V_0 ∈ S, the triangle inequality implies

ρ(V, TV) ≤ ρ(V, V_n) + ρ(V_n, TV)

But since {V_n}_{n=0}^∞ is a Cauchy sequence, we have

ρ(V, TV) ≤ ρ(V, V_n) + ρ(V_n, TV) ≤ ε/2 + ε/2

for large enough n, therefore V = TV.

Hence, we have proven that T possesses a fixed point and therefore have established its existence. We now have to prove uniqueness. This can be obtained by contradiction. Suppose there exists another function, say W ∈ S, that satisfies W = TW. Then the definition of the fixed point implies

ρ(V, W) = ρ(TV, TW)

but the contraction property implies

ρ(V, W) = ρ(TV, TW) ≤ β ρ(V, W)

which, since β ∈ (0, 1), implies ρ(V, W) = 0 and so V = W. The limit is then unique.

Proving 2 is straightforward, as

ρ(T^n V_0, V) = ρ(T^n V_0, TV) ≤ β ρ(T^{n-1} V_0, V)

but we have ρ(T^{n-1} V_0, V) = ρ(T^{n-1} V_0, TV), such that

ρ(T^n V_0, V) = ρ(T^n V_0, TV) ≤ β ρ(T^{n-1} V_0, V) ≤ β² ρ(T^{n-2} V_0, V) ≤ ... ≤ β^n ρ(V_0, V)

which completes the proof.

This theorem is of great importance as it establishes that any operator that possesses the contraction property will exhibit a unique fixed point, which therefore provides some rationale for the algorithm we were designing in the previous section. It also ensures that, if the value function satisfies a contraction property, simple iterations will deliver the solution for any initial condition. It therefore remains to provide conditions for the value function to be a contraction. These are provided by the next theorem.


Theorem (Blackwell's Sufficient Conditions). Let X ⊆ R^ℓ and B(X) be the space of bounded functions V: X → R with the uniform metric. Let T: B(X) → B(X) be an operator satisfying

1. (Monotonicity) Let V, W ∈ B(X); if V(x) ≤ W(x) for all x ∈ X, then TV(x) ≤ TW(x).

2. (Discounting) There exists some constant β ∈ (0, 1) such that for all V ∈ B(X) and a ≥ 0, we have

T(V + a) ≤ TV + βa

Then T is a contraction with modulus β.

Proof: Let us consider two functions V, W ∈ B(X) satisfying 1 and 2, and such that

V ≤ W + ρ(V, W)

Monotonicity first implies that

TV ≤ T(W + ρ(V, W))

and discounting then implies

TV ≤ TW + β ρ(V, W)

We therefore get

TV − TW ≤ β ρ(V, W)

Likewise, if we now consider that W ≤ V + ρ(V, W), we end up with

TW − TV ≤ β ρ(V, W)

Consequently, we have

|TV − TW| ≤ β ρ(V, W)

so that

ρ(TV, TW) ≤ β ρ(V, W)

which defines a contraction. This completes the proof.

This theorem is extremely useful as it gives us simple tools to check whether a problem is a contraction, and therefore permits us to check whether the simple algorithm we defined earlier is appropriate for the problem at hand.

As an example, let us consider the optimal growth model, for which the Bellman equation writes

V(k_t) = \max_{c_t \in C} u(c_t) + \beta V(k_{t+1})

with k_{t+1} = F(k_t) − c_t. In order to save on notation, let us drop the time subscript and denote the next period capital stock by k'. Plugging the law of motion of capital in the utility function, the Bellman equation rewrites

V(k) = \max_{k' \in K} u(F(k) - k') + \beta V(k')


where we now have to find the optimal next period capital stock k', i.e. the optimal investment policy. Let us now define the operator T as

(TV)(k) = \max_{k' \in K} u(F(k) - k') + \beta V(k')

We would like to know if T is a contraction and therefore if there exists a unique function V such that

V(k) = (TV)(k)

    In order to achieve this task, we just have to check whether T is monotonic and satisfies the

    discounting property.

1. Monotonicity: Let us consider two candidate value functions, V and W, such that V(k) ≤ W(k) for all k ∈ K. What we want to show is that (TV)(k) ≤ (TW)(k). In order to do that, let us denote by \widetilde{k}' the optimal next period capital stock, that is

(TV)(k) = u(F(k) - \widetilde{k}') + \beta V(\widetilde{k}')

But now, since V(k) ≤ W(k) for all k ∈ K, we have V(\widetilde{k}') ≤ W(\widetilde{k}'), such that it should be clear that

(TV)(k) ≤ u(F(k) - \widetilde{k}') + \beta W(\widetilde{k}') ≤ \max_{k' \in K} u(F(k) - k') + \beta W(k') = (TW)(k)

Hence we have shown that V(k) ≤ W(k) implies (TV)(k) ≤ (TW)(k) and therefore established monotonicity.

2. Discounting: Let us consider a candidate value function, V, and a positive constant a.

(T(V + a))(k) = \max_{k' \in K} u(F(k) - k') + \beta (V(k') + a)
             = \max_{k' \in K} u(F(k) - k') + \beta V(k') + \beta a
             = (TV)(k) + \beta a

Therefore, the Bellman equation satisfies discounting in the case of the optimal growth model.

Hence, the optimal growth model satisfies Blackwell's sufficient conditions for a contraction mapping, and therefore the value function exists and is unique. We are now in a position to design a numerical algorithm to solve the Bellman equation.


    2 Deterministic Dynamic Programming

    2.1 Value Function Iteration

    The contraction mapping theorem gives us a straightforward way to compute the solution to

    the Bellman equation: iterate on the operator T , such that Vi+1 = TVi up to the point where

    the distance between two successive value functions is small enough. Basically this amounts to

applying the following algorithm:

1. Decide on a grid, X, of admissible values for the state variable x

   X = {x_1, ..., x_N}

   and formulate an initial guess for the value function V_0(x). Finally, choose a stopping criterion ε > 0.

2. For each x_ℓ ∈ X, ℓ = 1, ..., N, compute

   V_{i+1}(x_\ell) = \max_{x' \in X} u(y(x_\ell, x'), x_\ell) + \beta V_i(x')

3. If ‖V_{i+1}(x) − V_i(x)‖ < ε go to the next step, otherwise go back to 2.

4. Compute the final solution as

   y^\star(x) = y(x, x')

In order to better understand the algorithm, let us consider a simple example and go back to the optimal growth model, with

u(c) = \frac{c^{1-\sigma} - 1}{1-\sigma}  and  k' = k^\alpha - c + (1-\delta)k

Then the Bellman equation writes

V(k) = \max_{0 \le c \le k^\alpha + (1-\delta)k} \frac{c^{1-\sigma} - 1}{1-\sigma} + \beta V(k')

From the law of motion of capital we can determine consumption as

c = k^\alpha + (1-\delta)k - k'

such that, plugging this result in the Bellman equation, we have

V(k) = \max_{0 \le k' \le k^\alpha + (1-\delta)k} \frac{(k^\alpha + (1-\delta)k - k')^{1-\sigma} - 1}{1-\sigma} + \beta V(k')


Now, let us define a grid of N feasible values for k such that we have

K = {k_1, ..., k_N}

and an initial value function V_0(k), that is a vector of N numbers that relates each k_ℓ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge, but if we want it to converge fast enough it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion ε.

Then, for each ℓ = 1, ..., N, we compute the feasible values that can be taken by the quantity on the right hand side of the value function

Q_{\ell,h} \equiv \frac{(k_\ell^\alpha + (1-\delta)k_\ell - k_h)^{1-\sigma} - 1}{1-\sigma} + \beta V(k_h)  for h feasible

It is important to understand what "h feasible" means. Indeed, we only compute consumption when it is positive, which restricts the number of possible values for k'. Namely, we want k' to satisfy

0 ≤ k' ≤ k^\alpha + (1-\delta)k

which puts an upper bound on the index h. When the grid of values is uniform, that is when k_h = \underline{k} + (h-1)\Delta_k, this bound can be computed as

\overline{h} = E\left(\frac{k_\ell^\alpha + (1-\delta)k_\ell - \underline{k}}{\Delta_k}\right) + 1

where E(x) denotes the integer part of x.

Then we find

Q_\ell^\star = \max_{h=1,\ldots,\overline{h}} Q_{\ell,h}

and set

V_{i+1}(k_\ell) = Q_\ell^\star

and keep in memory the index h^\star = \arg\max_{h=1,\ldots,\overline{h}} Q_{\ell,h}, such that we have

k'(k_\ell) = k_{h^\star}

In Figure 1, we report the value function and the decision rules obtained from the deterministic optimal growth model with α = 0.3, β = 0.96, δ = 0.1 and σ = 2. The grid for the capital stock is composed of 1000 data points ranging from (1 − dev) k* to (1 + dev) k*, where k* denotes the steady state and dev = 0.75. The algorithm² is then as follows.

² The codes reported in these notes and those that follow are not efficient from a computational point of view; they are intended to help you understand the method without adding coding complications. Much faster implementations can be obtained by vectorizing the code.
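For completeness, note that the steady state k* used to center the grid (the variable ks in the codes below) solves the steady-state version of the Euler equation of the optimal growth model, u'(c*) = β u'(c*)(α k*^{α−1} + 1 − δ), that is

1 = \beta\,(\alpha k^{*\,\alpha-1} + 1 - \delta) \quad\Longleftrightarrow\quad k^* = \left(\frac{1-\beta(1-\delta)}{\alpha\beta}\right)^{\frac{1}{\alpha-1}}

which is exactly the expression used in the Matlab codes.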


    Matlab Code: Value iteration (OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    % parameters of the algorithm

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    dev = 0.75; % maximal deviation from steady state

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,1); % value function

    tv = zeros(nbk,1); % value function

    dr = zeros(nbk,1); % decision rule (will contain indices)

    % Main loop

    iter = 1;

while crit>tol;
   for i=1:nbk
      %
      % compute indexes for which consumption is positive
      %
      imax = sum(kgrid<=kgrid(i)^alpha+(1-delta)*kgrid(i));
      %
      % consumption and utility for all feasible next period capital stocks
      %
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';
      util = (c.^(1-sigma)-1)/(1-sigma);
      %
      % find the value function and the decision rule
      %
      [tv(i),dr(i)] = max(util+beta*v(1:imax));
   end
   crit = max(abs(tv-v)); % Compute convergence criterion
   v    = tv;             % Update the value function
   iter = iter+1;
end
% Final decision rules
kp = kgrid(dr); kp = kp(:);               % next period capital stock
c  = kgrid(:).^alpha+(1-delta)*kgrid(:)-kp; % consumption


Figure 1: Deterministic OGM (Decision rules, Value iteration). The figure plots, against the capital stock k_t, the value function (left panel), next period capital (middle panel) and consumption (right panel).

    2.2 Taking advantage of interpolation

    A possible improvement of the method is to have a much looser grid on the capital stock but have

    a more dense grid on the control variable (consumption in the optimal growth model). Then the

    next period value of the state variable can be computed much more precisely. However, because

of this precision and the fact that the capital grid is coarser, it may be the case that the computed

    optimal value for the next period state variable does not lie in the grid. This implies that the

    value function is unknown at this particular value. Therefore, we need to use an interpolation

    scheme to get an approximation of the value function at this value. One advantage of this

    approach is that it involves less function evaluations and is usually less costly in terms of CPU

    time. The algorithm is then as follows:

1. Decide on a grid, X, of admissible values for the state variable x

   X = {x_1, ..., x_N}

   Decide on a grid, Y, of admissible values for the control variable y

   Y = {y_1, ..., y_M} with M ≫ N

   Formulate an initial guess for the value function V_0(x) and choose a stopping criterion ε > 0.

2. For each x_ℓ ∈ X, ℓ = 1, ..., N, compute

   x'_{\ell,j} = h(y_j, x_\ell),  j = 1, ..., M

   Compute an interpolated value function at each x'_{\ell,j} = h(y_j, x_\ell): \widetilde{V}_i(x'_{\ell,j}), and then

   V_{i+1}(x_\ell) = \max_{y_j \in Y} u(y_j, x_\ell) + \beta \widetilde{V}_i(x'_{\ell,j})

3. If ‖V_{i+1}(x) − V_i(x)‖ < ε go to the next step, otherwise go back to 2.

    The matlab code for this approach is reported next. Cubic spline interpolation method is used

    for the value function, with 20 nodes for the capital stock and 10000 nodes for consumption.

The algorithm converges starting from the initial condition for the value function

v_0(k) = \frac{((c^\star/y^\star)\,k^\alpha)^{1-\sigma} - 1}{(1-\sigma)(1-\beta)}

    Matlab Code: Value iteration with interpolation (OGM)

    clear all

    clc

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 20; % number of grid points in k

    nbc = 10000; % number of grid points in c

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

method = 'linear'; % interpolation method

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,1); % value function


    tv = zeros(nbk,1); % value function

    kpgrid = zeros(nbk,1); % decision rule for k(t+1)

    ygrid = kgrid.^alpha; % output

    % Main loop

    iter = 1;

    while crit>tol;

    for i=1:nbk

    % consumption, next period capital and utility

    c = linspace(0,ygrid(i),nbc);

    k1 = kgrid(i)^alpha+(1-delta)*kgrid(i)-c;

    util = (c.^(1-sigma)-1)/(1-sigma);

    % find value function

    vint = interp1(kgrid,v,k1,method);

    [v1,dr] = max(util+beta*vint);

    tv(i) = v1;

    kpgrid(i) = k1(dr);

    end;

    crit = max(max(abs(tv-v))); % Compute convergence criterion

    v = tv; % Update the value function

fprintf('Iteration: %3d   Crit: %g\n',iter,crit)

    iter = iter+1;

    end

    % Output

    ygrid = kgrid.^alpha;

    cgrid = ygrid+(1-delta)*kgrid-kpgrid;

    2.3 Policy iterations: Howard Improvement

    The simple value iteration algorithm has the attractive feature of being particularly simple to

implement. However, it is a slow procedure, especially for infinite horizon problems, since it can be shown that this procedure converges at the rate β, which is usually close to 1! Furthermore,

    it computes unnecessary quantities during the algorithm which slows down convergence. Often,

    computation speed is really important, for instance when one wants to perform a sensitivity

    analysis of the results obtained in a model using different parameterizations. Hence, we would

like to be able to speed up convergence. This can be achieved by applying the so-called Howard

improvement method. The idea of this method is to iterate on policy functions rather than on

    the value function. The algorithm may be described as follows

1. Set i = 0 and guess an initial feasible decision rule for the control variable y = f_i(x) and compute the value associated to this guess, assuming that this rule is operative forever:

   V(x_t) = \sum_{s=0}^{\infty} \beta^s u(f_i(x_{t+s}), x_{t+s})

   taking care of the fact that x_{t+1} = h(x_t, y_t) = h(x_t, f_i(x_t)). Set a stopping criterion ε > 0.


2. Find a new policy rule y = f_{i+1}(x) such that

   f_{i+1}(x) \in \arg\max_{y} u(y, x) + \beta V(x')

   with x' = h(x, y).

3. Check if ‖f_{i+1}(x) − f_i(x)‖ < ε; if yes then stop, otherwise go back to 2.

Note that this method differs fundamentally from the value iteration algorithm in at least two dimensions:

(i) one iterates on the policy function rather than on the value function;

(ii) the decision rule is used forever, whereas it is assumed to be used for only two consecutive periods in the value iteration algorithm. It is precisely this last feature that accelerates convergence.

Note that when computing the value function we actually have to solve a linear system of the form

V_{i+1}(x_\ell) = u(f_{i+1}(x_\ell), x_\ell) + \beta V_{i+1}(h(x_\ell, f_{i+1}(x_\ell))),  \ell = 1, ..., N

for V_{i+1}(x_\ell), which may be rewritten as

V_{i+1}(x_\ell) = u(f_{i+1}(x_\ell), x_\ell) + \beta Q V_{i+1}(x_\ell),  \ell = 1, ..., N

where Q is an (N × N) matrix

Q_{\ell j} = 1 if x_j = h(x_\ell, f_{i+1}(x_\ell)), 0 otherwise.

Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system, to get

V_{i+1}(x) = (I - \beta Q)^{-1} u(f_{i+1}(x), x)

    Matlab Code: Policy iteration (OGM)

    clear all

    clc

    % Parameters of the economy

    sigma = 2; % utility parameter

    delta = 0.1; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    % parameters of the algorithm


    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    epsi = 1e-6; % Tolerance parameter

    % Setup the grid

    dev = 0.5; % maximal deviation from steady state

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,1); % value function

    v1 = zeros(nbk,1); % value function

    kp0 = zeros(nbk,1); % value function

    dr = zeros(nbk,1); % decision rule (will contain indices)

% Main loop
while crit>epsi;
   for i=1:nbk
      % compute indexes for which consumption is positive
      imax = sum(kgrid<=kgrid(i)^alpha+(1-delta)*kgrid(i));
      % consumption and utility for feasible next period capital stocks
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';
      util = (c.^(1-sigma)-1)/(1-sigma);
      % improvement step: new decision rule
      [v1(i),dr(i)] = max(util+beta*v(1:imax));
   end;
   % evaluation step: value of using the new rule forever
   kp   = kgrid(dr); kp = kp(:);                 % next period capital implied by the rule
   c    = kgrid(:).^alpha+(1-delta)*kgrid(:)-kp; % consumption
   util = (c.^(1-sigma)-1)/(1-sigma);
   Q    = sparse(1:nbk,dr',1,nbk,nbk);           % transition matrix implied by the rule
   v    = (speye(nbk)-beta*Q)\util;              % solves the linear system
   crit = max(abs(kp-kp0));                      % Compute convergence criterion
   kp0  = kp;
end


    which may be particularly costly when the number of grid points is important. Therefore, it

was proposed to replace the matrix inversion by an additional iteration step, leading to the so-called modified policy iteration with k steps, which replaces the linear problem by the following iteration scheme:

1. Set J_0 = V_i

2. Iterate k times on

   J_{n+1} = u(y, x) + \beta Q J_n,  n = 0, ..., k-1

3. Set V_{i+1} = J_k.

When k → ∞, J_k tends toward the solution of the linear system. In that case, the main loop in the matlab code is slightly different.

    Matlab Code: Modified Policy iteration (OGM), main loop only

    K = 100; % Number of iterations

% Main loop: the improvement step (the loop over i yielding dr, kp, util and Q)
% is identical to the policy iteration code above; only the evaluation step
% changes, the linear solve being replaced by K iterations on the value function:
J = v;                   % J0 = Vi
for j=1:K                % iterate K times on J = u + beta*Q*J
   J = util+beta*Q*J;
end
v = J;                   % Vi+1 = JK


    2.4 Endogenous Grid Method

The endogenous grid method was initially proposed by Carroll [2006], who suggested formulating a grid for the next period state variable rather than for its current value. One merit of this approach is that it allows one to take advantage of the first order conditions and thereby accelerate the maximization step. The version which is presented here differs from that proposed in Carroll [2006] in that it does not take advantage of some redefinition of variables that would accelerate the algorithm even further. This is done (i) to keep notations and intuition simple and (ii) to show that the algorithm can be combined with a root finding problem relatively easily.

Assume the problem to be solved takes the form

V(x_t) = \max_{y_t} u(y_t, x_t) + \beta V(x_{t+1})

with x_{t+1} = h(x_t, y_t). The algorithm then works as follows:

1. Set a grid for x_{t+1} and an initial value function V_i(x_{t+1}) (i = 0 in the first iteration).

2. Use the first order condition to get y_t^\star:

   \frac{\partial u(y_t, x_t)}{\partial y_t} + \beta \frac{\partial V_i(x_{t+1})}{\partial x_{t+1}} \frac{\partial h(x_t, y_t)}{\partial y_t} = 0 \;\Longrightarrow\; y_t^\star = \varphi(x_{t+1}, x_t)

3. Use the transition equation to find the optimal x_t^\star = x(x_{t+1}):

   x_{t+1} = h(x_t, \varphi(x_{t+1}, x_t)) \;\Longrightarrow\; x_t^\star = x(x_{t+1})

   and update y_t such that

   y_t^\star = \varphi(x_{t+1}, x(x_{t+1})) = y(x_{t+1})

4. Then compute

   V_{i+1}(x_t^\star) = u(y_t^\star, x_t^\star) + \beta V_i(x_{t+1})

5. Interpolate V_{i+1}(x_t^\star) on the grid for x_{t+1} and update the value function.

6. If ‖V_{i+1}(x_{t+1}) − V_i(x_{t+1})‖ < ε then stop, else go back to 2.

Note that the initial guess for the value function now plays a much more important role than before, as its partial derivative is needed to compute the optimal decision rule for the control variable y_t. The computation of the partial derivative can be achieved by mixing interpolation and numerical differentiation in a rather standard way. Also note how step 4 does not involve any maximization, as the optimal decision rule has already been obtained from step 2.


In the optimal growth model case, things are rather simple as the utility is only a function of c_t and the capital stock can be solved for very simply. From the first order condition on consumption, and given a grid for the next period capital stock k_{t+1}, it is readily obtained that

c_t = \left(\beta V'(k_{t+1})\right)^{-1/\sigma}

which can then be used in the capital accumulation equation to obtain the current period capital stock k_t by solving³

k_t^\alpha + (1-\delta)k_t - c_t - k_{t+1} = 0

We denote by k_t^\star the solution to this equation. This process defines a grid for the current period capital stock which is endogenously determined, hence the name of the method. The current value function is then given by

V(k_t^\star) = \frac{c_t^{1-\sigma} - 1}{1-\sigma} + \beta V(k_{t+1})

Since the new value function is evaluated at nodes that do not necessarily lie on the grid for next period capital stock, the updated value function has to be interpolated on these nodes.

³ In the version proposed by Carroll [2006] this step is further accelerated by rewriting the problem in terms of available resources Y_t = k_t^\alpha + (1-\delta)k_t.

    Matlab Code: Endogenous Grid Method (OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    % parameters of the algorithm

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

tol = 1e-6; % Tolerance parameter

method = 'linear'; % Interpolation method

    % Setup the grid

    dev = 0.5; % maximal deviation from steady state

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

kpgrid = linspace(kmin,kmax,nbk)'; % builds the grid (for next period capital)

    % Initial values

    c0 = 1.2*(kpgrid.^alpha+(1-delta)*kpgrid);

    v0 = (c0.^(1-sigma)-1)/((1-sigma)*(1-beta));

kgrid = ks*ones(nbk,1); % initial guess for the endogenous grid (current capital)

    while crit>tol

    dv0 = diff_value(kpgrid,v0,method);

    c = (beta*dv0).^(-1/sigma);

    tmp = (c.^(1-sigma)-1)/(1-sigma)+beta*v0;



    crk = 1;

    while crk>tol; % Gauss-Newton Algorithm

    f0 = kgrid.^alpha+(1-delta)*kgrid-kpgrid-c;

    df0 = alpha*kgrid.^(alpha-1)+1-delta;

    k1 = kgrid-f0./df0;

    crk = max(abs(kgrid-k1));

    kgrid= k1;

    end

    v1 = interp1(kgrid,tmp,kpgrid,method);

    crit = max(max(abs(v0-v1)));

    v0 = v1;

    end

    % Output

    cgrid = kgrid.^alpha+(1-delta)*kgrid-kpgrid;

    vgrid = interp1(kpgrid,v1,kgrid,method);
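The helper diff_value called in the loop above is not reproduced in these notes. A minimal sketch, assuming it simply returns a finite-difference approximation of the derivative of the value function on the grid (stored column-wise, as in the codes here), could be:

Matlab Code: A possible diff_value routine (illustration)

function dv=diff_value(kgrid,v,method);
% kgrid  : grid for next period capital (column vector)
% v      : value function on the grid (one column per shock, if any)
% method : kept only for compatibility with the calls above (unused here)
kgrid = kgrid(:);
n     = size(v,1);
dv    = zeros(size(v));
dv(1,:)     = (v(2,:)-v(1,:))/(kgrid(2)-kgrid(1));         % forward difference
dv(n,:)     = (v(n,:)-v(n-1,:))/(kgrid(n)-kgrid(n-1));     % backward difference
dv(2:n-1,:) = (v(3:n,:)-v(1:n-2,:))./ ...
              repmat(kgrid(3:n)-kgrid(1:n-2),1,size(v,2)); % centered differences

Any smoother alternative (for instance differentiating an interpolant of the value function) works equally well; what matters is that the derivative is evaluated on the grid for k_{t+1}.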

    2.5 Parametric dynamic programming

    The last technique we will describe borrows from approximation theory using either orthogonal

    polynomials or spline functions. The idea is actually to make a guess for the functional form of

    the value function and iterate on the parameters of this functional form. The algorithm then

    works as follows

1. Choose a functional form for the value function \widetilde{V}(x; \Theta), a grid of interpolating nodes X = {x_1, ..., x_N}, a stopping criterion ε > 0 and an initial vector of parameters \Theta_0.

2. Using the conjecture for the value function, perform the maximization step in the Bellman equation, that is compute w_\ell = T(\widetilde{V}(x_\ell, \Theta_i)):

   w_\ell = T(\widetilde{V}(x_\ell, \Theta_i)) = \max_{y} u(y, x_\ell) + \beta \widetilde{V}(x', \Theta_i)
   s.t. x' = h(y, x_\ell), for \ell = 1, ..., N

3. Using the approximation method you have chosen, compute a new vector of parameters \Theta_{i+1} such that \widetilde{V}(x, \Theta_{i+1}) approximates the data (x_\ell, w_\ell).

4. If ‖\widetilde{V}(x, \Theta_{i+1}) − \widetilde{V}(x, \Theta_i)‖ < ε then stop, otherwise go back to 2.

First note that for this method to be implementable, we need the payoff function and the value function to be continuous. The approximation function may be a combination of polynomials, neural networks or splines. Note that during the optimization problem, we may have to rely on a numerical maximization algorithm, and the approximation method may involve numerical minimization in order to solve a nonlinear least-squares problem of the form:

\Theta_{i+1} \in \arg\min_{\Theta} \sum_{\ell=1}^{N} \left(w_\ell - \widetilde{V}(x_\ell; \Theta)\right)^2

This algorithm is usually much faster than value iteration as it does not require iterating on a large grid. As an example, I will once again focus on the optimal growth problem we have been dealing with so far, and I will approximate the value function by

\widetilde{V}(k; \Theta) = \sum_{i=0}^{p} \theta_i \varphi_i\!\left(2\,\frac{\log(k) - \underline{k}}{\overline{k} - \underline{k}} - 1\right)

where \{\varphi_i(\cdot)\}_{i=0}^{p} is a set of Chebychev polynomials and \underline{k}, \overline{k} denote the lower and upper bounds of the grid for log(k). In the example, I set p = 5 and used 20 nodes. Figure 2 reports the decision rule and the value function in this case, and Table 1 reports the parameters of the approximation function. The algorithm converged in 242 iterations, but it took much less time than value iteration.

    Matlab Code: Parametric dynamic programming (OGM)

    sigma = 1.50; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.95; % discount factor

    alpha = 0.30; % capital elasticity of output

    nbk = 20; % number of points in the grid

    p = 10; % order of polynomials

    crit = 1; % convergence criterion

    iter = 1; % iteration

    epsi = 1e-6; % convergence parameter

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    dev = 0.9; % maximal deviation from k*

    kmin = log((1-dev)*ks); % lower bound on the grid

    kmax = log((1+dev)*ks); % upper bound on the grid

rk = -cos((2*[1:nbk]'-1)*pi/(2*nbk)); % Interpolating nodes

kgrid = exp(kmin+(rk+1)*(kmax-kmin)/2); % mapping

%

% Initial guess for the approximation

%

v = ((kgrid.^alpha).^(1-sigma)-1)/((1-sigma)*(1-beta));

X = chebychev(rk,p);

th0 = X\v;

    Tv = zeros(nbk,1);

    kp = zeros(nbk,1);

    %

    % Main loop

    %

options = optimset('Display','off','MaxFunEvals',1e9);


    while crit>epsi;

    k0 = kgrid(1);

    for i=1:nbk

param = [alpha beta delta sigma kmin kmax p kgrid(i)];

kp(i) = fminunc(@(x) tv(x,param,th0),k0,options);

    k0 = kp(i);

    Tv(i) = -tv(kp(i),param,th0);

    end;

    theta= X\Tv;

    crit = max(abs(Tv-v));

    v = Tv;

    th0 = theta;

    iter= iter+1;

    end

    Matlab Code: Parametric dynamic programming (extra functions)

    function res=tv(kp,param,theta);

    alpha = param(1);

    beta = param(2);

    delta = param(3);

    sigma = param(4);

    kmin = param(5);

    kmax = param(6);

    n = param(7);

    k = param(8);

    kp = sqrt(kp.^2); % insures positivity of k

    v = value(kp,[kmin kmax n],theta); % computes the value function

    c = k.^alpha+(1-delta)*k-kp; % computes consumption

d = find(c<=0);                      % finds negative consumption values
c(d) = NaN;                          % removes them from the utility computation
util = (c.^(1-sigma)-1)/(1-sigma);   % computes utility
util(d) = -1e12;                     % penalizes negative consumption
res = -(util+beta*v);                % minus the value (fminunc minimizes)

function Tx=chebychev(X,n);
X = X(:);                            % makes sure X is a column vector
lx = size(X,1);
if n<0; error('n should be a positive integer'); end
switch n;
case 0;
Tx=ones(lx,1);


    case 1;

    Tx=[ones(lx,1) X];

    otherwise

    Tx=[ones(lx,1) X];

    for i=3:n+1;

    Tx=[Tx 2*X.*Tx(:,i-1)-Tx(:,i-2)];

    end

    end
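The routine value used in tv above is not reproduced in the notes either. A minimal sketch, assuming it simply maps the capital stock back onto the Chebychev domain (on a log scale, consistently with the construction of the grid) and evaluates the approximation, could be:

Matlab Code: A possible value routine (illustration)

function v=value(k,param,theta);
kmin = param(1);                       % lower bound of the grid (in logs)
kmax = param(2);                       % upper bound of the grid (in logs)
n    = param(3);                       % order of the Chebychev approximation
zk   = 2*(log(k)-kmin)/(kmax-kmin)-1;  % maps log(k) into [-1,1]
v    = chebychev(zk,n)*theta;          % evaluates the approximation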

Figure 2: Deterministic OGM (Decision rules, Parametric DP). The figure plots, against the capital stock k_t, the value function (left panel), next period capital stock (middle panel) and consumption (right panel).

Table 1: Value function approximation

θ_0 = -0.2334, θ_1 = 3.0686, θ_2 = 0.2543, θ_3 = 0.0153, θ_4 = -0.0011, θ_5 = -0.0002

A potential problem with the use of Chebychev polynomials is that they do not put any assumption on the shape of the value function, which we know to be concave and strictly increasing in this case. This is why Judd [1998] recommends using shape-preserving methods such as Schumaker approximation. Judd and Solnick [1994] have successfully applied this latter technique to the optimal growth model and found that the approximation was very good and dominated other methods (they actually get the same precision with 12 nodes as that achieved with a 1200-point grid using a value iteration technique).

    3 Stochastic Dynamic Programming

In a large number of problems, we have to deal with stochastic shocks (just think of a standard RBC model); the dynamic programming technique can obviously be extended to deal with such problems. This section will first show how we can obtain the Bellman equation before addressing some important issues concerning the discretization of shocks. Then it will describe the implementation of the value iteration and policy iteration techniques for the stochastic case. The code for the endogenous grid method is reported at the end of the text.

    3.1 The Bellman Equation

The stochastic problem differs from the deterministic problem in that we now have to take expectations. The problem then defines a value function which has as arguments the state variable x_t but also the stationary shock s_t, whose sequence {s_t}_{t=0}^{+∞} satisfies

s_{t+1} = \Phi(s_t, \varepsilon_{t+1})   (6)

where ε is a white noise process. The value function is therefore given by

V(x_t, s_t) = \max_{\{y_{t+\tau} \in D(x_{t+\tau}, s_{t+\tau})\}_{\tau=0}^{\infty}} E_t \sum_{\tau=0}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})   (7)

subject to (6) and

x_{t+1} = h(x_t, y_t, s_t)   (8)

Since y_t, x_t and s_t are either perfectly observed or decided upon in period t, they are known in t, such that we may rewrite the value function as

V(x_t, s_t) = \max_{\{y_t \in D(x_t,s_t), \{y_{t+\tau} \in D(x_{t+\tau},s_{t+\tau})\}_{\tau=1}^{\infty}\}} u(y_t, x_t, s_t) + E_t \sum_{\tau=1}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})

or

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t) + \max_{\{y_{t+\tau} \in D(x_{t+\tau},s_{t+\tau})\}_{\tau=1}^{\infty}} E_t \sum_{\tau=1}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})

Using the change of variable k = τ − 1, this rewrites

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} E_t \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})


It is important at this point to recall that

E_t(X(\varepsilon_{t+\tau})) = \int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_t)\, d\varepsilon_{t+\tau}
= \int\!\!\int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_{t+1}) f(\varepsilon_{t+1}|\varepsilon_t)\, d\varepsilon_{t+\tau}\, d\varepsilon_{t+1}
= \int \ldots \int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_{t+\tau-1}) \ldots f(\varepsilon_{t+1}|\varepsilon_t)\, d\varepsilon_{t+\tau} \ldots d\varepsilon_{t+1}

which is a corollary of the law of iterated projections, such that the value function rewrites

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} E_t E_{t+1} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})

or

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} \int E_{t+1} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})\, f(s_{t+1}|s_t)\, ds_{t+1}

Note that, because each value of the shock defines a particular continuation problem, the maximization of the integral corresponds to the integral of the maximizations; the max operator and the integral are therefore interchangeable, so that we get

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t)
+ \beta \int \max_{\{y_{t+1+k} \in D(x_{t+1+k},s_{t+1+k})\}_{k=0}^{\infty}} E_{t+1} \sum_{k=0}^{\infty} \beta^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})\, f(s_{t+1}|s_t)\, ds_{t+1}

By definition, the term under the integral corresponds to V(x_{t+1}, s_{t+1}), such that the value function rewrites

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t) + \beta \int V(x_{t+1}, s_{t+1}) f(s_{t+1}|s_t)\, ds_{t+1}

or equivalently

V(x_t, s_t) = \max_{y_t \in D(x_t,s_t)} u(y_t, x_t, s_t) + \beta E_t V(x_{t+1}, s_{t+1})

which is precisely the Bellman equation for the stochastic dynamic programming problem.

    3.2 Discretization of the Shocks

    A very important problem that arises whenever we deal with value iteration or policy iteration in

    a stochastic environment is that of the discretization of the space spanned by the shocks. Indeed,


    the use of a continuous support for the stochastic shocks is unfeasible for a computer that can

    only deal with discrete supports. We therefore have to transform the continuous problem into

    a discrete one with the constraint that the asymptotic properties of the continuous and the

    discrete processes should be the same. The question we therefore face is: does there exist a

    discrete representation for s which is equivalent to its continuous original representation? The

    answer to this question is yes. In particular as soon as we deal with (V)AR processes, we can

    use a very powerful tool: Markov chains.

Markov Chains: A Markov chain is a sequence of random values whose probability distribution at a given date depends only upon the value attained at the previous date. We will restrict ourselves to discrete-time Markov chains, in which the state changes at certain discrete time instants, indexed by an integer variable t. At each time step t, the Markov chain is in a state, denoted by s ∈ S ≡ {s_1, ..., s_M}. S is called the state space.

The Markov chain is described in terms of transition probabilities π_{ij}. These transition probabilities should be interpreted as follows:

If the economy is in state s_i in period t, the probability that the next state is equal to s_j is π_{ij}.

We therefore get the following definition.

Definition. A Markov chain is a stochastic process with a discrete state space S, such that the conditional distribution of s_{t+1} is independent of all previously attained states given s_t:

π_{ij} = Prob(s_{t+1} = s_j | s_t = s_i),  s_i, s_j ∈ S.

The important assumption we shall make concerning Markov processes is that the transition probabilities π_{ij} apply as soon as state s_i is reached, no matter the history of the shocks nor how the system got to state s_i. In other words, there is no hysteresis. From a mathematical point of view, this corresponds to the so-called Markov property

P(s_{t+1} = s_j | s_t = s_i, s_{t-1} = s_{i_{t-1}}, ..., s_0 = s_{i_0}) = P(s_{t+1} = s_j | s_t = s_i) = π_{ij}

for all periods t, all states s_i, s_j ∈ S, and all sequences {s_{i_n}}_{n=0}^{t-1} of earlier states. Thus, the probability of the next state s_{t+1} depends only on the current realization of s.
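As an illustration of the Markov property (this example is not part of the original notes), a chain with an arbitrary transition matrix PI can be simulated by drawing, in each period, the next state from the row of PI associated with the current state:

Matlab Code: Simulating a Markov chain (illustration)

PI   = [0.9 0.1;0.2 0.8];              % an arbitrary 2-state transition matrix
T    = 1000;                           % length of the simulated sample
s    = zeros(T,1);                     % will contain the state indices
s(1) = 1;                              % initial state
cPI  = cumsum(PI,2);                   % cumulative transition probabilities
for t=2:T
    s(t) = 1+sum(rand>cPI(s(t-1),:));  % next state only depends on s(t-1)
end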


The transition probabilities π_{ij} must of course satisfy

1. π_{ij} ≥ 0 for all i, j = 1, ..., M

2. \sum_{j=1}^{M} π_{ij} = 1 for all i = 1, ..., M

All of the elements of a Markov chain model can then be encoded in a transition probability matrix

\Pi = \begin{pmatrix} \pi_{11} & \ldots & \pi_{1M} \\ \vdots & \ddots & \vdots \\ \pi_{M1} & \ldots & \pi_{MM} \end{pmatrix}

Note that the (i, j) element of \Pi^k then gives the probability that s_{t+k} = s_j given that s_t = s_i. In the long run, we obtain the steady state equivalent for a Markov chain: the invariant distribution or stationary distribution.

Definition. A stationary distribution for a Markov chain is a distribution π* such that

1. π*_j ≥ 0 for all j = 1, ..., M

2. \sum_{j=1}^{M} π*_j = 1

3. π* = π* Π
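Numerically, the stationary distribution associated with a given transition matrix can be obtained by iterating on π* = π*Π from any initial distribution; a minimal sketch (again an illustration, not part of the original notes):

Matlab Code: Stationary distribution of a Markov chain (illustration)

PI   = [0.9 0.1;0.2 0.8];             % an arbitrary 2-state transition matrix
pi0  = ones(1,size(PI,1))/size(PI,1); % start from the uniform distribution
crit = 1;
while crit>1e-12
    pi1  = pi0*PI;                    % apply the transition matrix
    crit = max(abs(pi1-pi0));
    pi0  = pi1;
end
% pi0 now approximates the stationary distribution (here [2/3 1/3])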

    Moment approach to discretization: A first simple way to tackle this problem is to rely on a

    method of moments. The idea is that the continuous process and its discrete approximation

    should possess the same asymptotic properties in terms of conditional first and second order

    moments. We will apply it later to the optimal growth model.

Nevertheless, while this approach is straightforward when we restrict ourselves to a 2-state representation of the process, it becomes cumbersome when we want to deal with more states. We can then rely on a quadrature-based method proposed by Tauchen and Hussey [1991].

Gaussian-quadrature approach to discretization: Tauchen and Hussey [1991] provide a simple way to discretize VAR processes relying on Gaussian quadrature. This note will only present the case of an AR(1) process of the form⁴

s_{t+1} = \rho s_t + (1-\rho)\bar{s} + \varepsilon_{t+1}

where \varepsilon_{t+1} \sim N(0, \sigma^2). This implies that

\int \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{s_{t+1} - \rho s_t - (1-\rho)\bar{s}}{\sigma}\right)^2\right\} ds_{t+1} = \int f(s_{t+1}|s_t)\, ds_{t+1} = 1

which illustrates the fact that s is a continuous random variable. Tauchen and Hussey [1991] propose to replace the integral by

\int \Phi(s_{t+1}; s_t, \bar{s})\, f(s_{t+1}|\bar{s})\, ds_{t+1} \equiv \int \frac{f(s_{t+1}|s_t)}{f(s_{t+1}|\bar{s})}\, f(s_{t+1}|\bar{s})\, ds_{t+1} = 1

where f(s_{t+1}|\bar{s}) denotes the density of s_{t+1} conditional on the fact that s_t = \bar{s} (in fact the unconditional density function), which in our case implies that

\Phi(s_{t+1}; s_t, \bar{s}) \equiv \frac{f(s_{t+1}|s_t)}{f(s_{t+1}|\bar{s})} = \exp\left\{-\frac{1}{2}\left[\left(\frac{s_{t+1} - \rho s_t - (1-\rho)\bar{s}}{\sigma}\right)^2 - \left(\frac{s_{t+1} - \bar{s}}{\sigma}\right)^2\right]\right\}

Then we can use the standard linear transformation and impose z_t = (s_t - \bar{s})/(\sigma\sqrt{2}) to get

\frac{1}{\sqrt{\pi}} \int \exp\left\{-\left((z_{t+1} - \rho z_t)^2 - z_{t+1}^2\right)\right\} \exp\left(-z_{t+1}^2\right) dz_{t+1}

for which we can use a Gauss-Hermite quadrature. Assume then that we have the quadrature nodes z_j and weights ω_j, j = 1, ..., n; the quadrature leads to the formula

\frac{1}{\sqrt{\pi}} \sum_{j=1}^{n} \omega_j \Phi(z_j; z_i, \bar{s}) \simeq 1

In other words, we might interpret the quantity \omega_j \Phi(z_j; z_i, \bar{s})/\sqrt{\pi} as an estimate \widehat{\pi}_{ij} \equiv Prob(s_{t+1} = s_j | s_t = s_i) of the transition probability from state i to state j. However, it is important to remember that the quadrature is just an approximation, such that it will generally be the case that \sum_{j=1}^{n} \widehat{\pi}_{ij} = 1 does not hold exactly. Tauchen and Hussey therefore propose the following modification:

\pi_{ij} = \frac{\omega_j \Phi(z_j; z_i, \bar{s})}{\sqrt{\pi}\,\widehat{s}_i}  where  \widehat{s}_i = \frac{1}{\sqrt{\pi}} \sum_{j=1}^{n} \omega_j \Phi(z_j; z_i, \bar{s})

⁴ In their article, Tauchen and Hussey [1991] consider more general processes.

    Matlab Code: TauchenHusseys procedure

    function [s,p]=tauch_hussey(xbar,rho,sigma,n)

    % xbar : mean of the x process



    % rho : persistence parameter

    % sigma : volatility

    % n : number of nodes

    %

    % returns the states s and the transition probabilities p

    %

[xx,wx] = gauss_herm(n);         % nodes and weights for s

s = sqrt(2)*sigma*xx+xbar;       % discrete states

x = xx(:,ones(n,1));             % current state (one column per future state)

z = x';                          % future state

w = wx(:,ones(n,1))';            % quadrature weights

%

% computation

%

p = (exp(z.*z-(z-rho*x).*(z-rho*x)).*w)./sqrt(pi);

sx = sum(p,2);                   % normalization: each row must sum to one

p = p./sx(:,ones(n,1));
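The routine gauss_herm that returns the Gauss-Hermite nodes and weights is not reproduced in the notes. A minimal sketch, assuming one is willing to compute them from the eigenvalues and eigenvectors of the Jacobi matrix associated with the Hermite polynomials (the Golub-Welsch algorithm), could be:

Matlab Code: A possible gauss_herm routine (illustration)

function [x,w]=gauss_herm(n);
% Gauss-Hermite nodes and weights for the weighting function exp(-x^2)
b     = sqrt((1:n-1)/2);          % off-diagonal of the Jacobi matrix
J     = diag(b,1)+diag(b,-1);     % symmetric tridiagonal Jacobi matrix
[V,D] = eig(J);
[x,i] = sort(diag(D));            % nodes are the eigenvalues
w     = sqrt(pi)*V(1,i)'.^2;      % weights from the first component of the eigenvectors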

    3.3 Value Iteration

    As in the deterministic case, the convergence of the simple value function iteration procedure is

    insured by the contraction mapping theorem. This is however a bit more subtle than that as we

    have to deal with the convergence of a probability measure, which goes far beyond this intro-

    duction to dynamic programming.5 The algorithm is basically the same as in the deterministic

    case up to the delicate point that expectations have to be computed at each iteration. It writes

    as follows

1. Decide on a grid, X, of admissible values for the state variable x

   X = {x_1, ..., x_N}

   and a grid, S, of admissible values for the shock s

   S = {s_1, ..., s_M}, together with the transition matrix \Pi = (\pi_{ij})

   Formulate an initial guess for the value function V_0(x) and choose a stopping criterion ε > 0.

2. For each x_ℓ ∈ X, ℓ = 1, ..., N, and s_k ∈ S, k = 1, ..., M, compute

   V_{i+1}(x_\ell, s_k) = \max_{x' \in X} u(y(x_\ell, s_k, x'), x_\ell, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V_i(x', s_j)

    5The interested reader should then refer to Lucas et al. [1989] chapter 9.


3. If ‖V_{i+1}(x, s) − V_i(x, s)‖ < ε go to the next step, otherwise go back to 2.

4. Compute the final solution as

   y^\star(x, s) = y(x, x'(x, s), s)

As in the deterministic case, we will illustrate the method relying on the optimal growth model, with

u(c) = \frac{c^{1-\sigma} - 1}{1-\sigma}  and  k' = \exp(a) k^\alpha - c + (1-\delta)k

where a' = \rho a + \varepsilon'. Then the Bellman equation writes

V(k, a) = \max_{c} \frac{c^{1-\sigma} - 1}{1-\sigma} + \beta \int V(k', a') f(a'|a)\, da'

From the law of motion of capital we can determine consumption as

c = \exp(a) k^\alpha + (1-\delta)k - k'

such that, plugging this result in the Bellman equation, we have

V(k, a) = \max_{k'} \frac{(\exp(a) k^\alpha + (1-\delta)k - k')^{1-\sigma} - 1}{1-\sigma} + \beta \int V(k', a') f(a'|a)\, da'

A first problem that we encounter is that we would like to be able to evaluate the integral involved by the rational expectation. We therefore have to discretize the shock. Here, we will consider that the technology shock can be accurately approximated by a 2-state Markov chain, such that a can take on 2 values a_1 and a_2 (a_1 < a_2). We will also assume that the transition matrix is symmetric, such that

\Pi = \begin{pmatrix} \pi & 1-\pi \\ 1-\pi & \pi \end{pmatrix}

a_1, a_2 and π are selected such that the process reproduces the conditional first and second order moments of the AR(1) process:

First order moments:

\pi a_1 + (1-\pi) a_2 = \rho a_1
(1-\pi) a_1 + \pi a_2 = \rho a_2

Second order moments:

\pi a_1^2 + (1-\pi) a_2^2 - (\rho a_1)^2 = \sigma_\varepsilon^2
(1-\pi) a_1^2 + \pi a_2^2 - (\rho a_2)^2 = \sigma_\varepsilon^2

From the first two equations we get a_1 = -a_2 and π = (1+ρ)/2. Plugging these two results in the two last equations, we get a_1 = -\sigma_\varepsilon/\sqrt{1-\rho^2}. Hence, we will actually work with a value function of the form

V(k, a_i) = \max_{c} \frac{c^{1-\sigma} - 1}{1-\sigma} + \beta \sum_{j=1}^{2} \pi_{ij} V(k', a_j)

Now, let us define a grid of N feasible values for k such that we have

K = {k_1, ..., k_N}

and an initial value function V_0(k), that is a vector of N numbers that relates each k_ℓ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge, but if we want it to converge fast enough it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion ε.

In Figure 3, we report the value function and the decision rules obtained for the stochastic optimal growth model with α = 0.3, β = 0.96, δ = 0.1, σ = 2, ρ = 0.8 and σ_ε = 0.05. The grid for the capital stock is composed of 2000 data points ranging from 25% below to 25% above the deterministic steady state.

    Matlab Code: Value iteration (Stochastic OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

    % Shocks

    p = (1+rho)/2;

    a = se/sqrt(1-rho*rho);

    PI = [p 1-p;1-p p];

    agrid = [-a a];

    nba = length(agrid);

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid


    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,nba); % value function

    tv = zeros(nbk,nba); % value function

    dr = zeros(nbk,nba); % decision rule (will contain indices)

    % Main loop

    iter = 1;

while crit>tol;
   for j=1:nba
      for i=1:nbk
         % compute indexes for which consumption is positive
         imax = sum(kgrid<=exp(agrid(j))*kgrid(i)^alpha+(1-delta)*kgrid(i));
         % consumption and utility for feasible next period capital stocks
         c    = exp(agrid(j))*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';
         util = (c.^(1-sigma)-1)/(1-sigma);
         % expected continuation value and maximization
         EV   = v(1:imax,:)*PI(j,:)';
         [tv(i,j),dr(i,j)] = max(util+beta*EV);
      end
   end
   crit = max(max(abs(tv-v))); % Compute convergence criterion
   v    = tv;                  % Update the value function
   iter = iter+1;
end
% Decision rules
kp = kgrid(dr);               % next period capital stock
c  = exp(repmat(agrid,nbk,1)).*repmat(kgrid',1,nba).^alpha+(1-delta)*repmat(kgrid',1,nba)-kp;


Figure 3: Stochastic OGM (Decision rules, Value iteration). The figure plots, against the capital stock k_t, the value function (left panel), next period capital (middle panel) and consumption (right panel).

2. Find a new policy rule y = f_{i+1}(x, s_k), k = 1, ..., M, such that

   f_{i+1}(x, s_k) \in \arg\max_{y} u(y, x, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V(x', s_j)

   with x' = h(x, y, s_k).

3. Check if ‖f_{i+1}(x, s) − f_i(x, s)‖ < ε; if yes then stop, otherwise go back to 2.

When computing the value function we actually have to solve a linear system of the form

V_{i+1}(x_\ell, s_k) = u(f_{i+1}(x_\ell, s_k), x_\ell, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V_{i+1}(h(x_\ell, f_{i+1}(x_\ell, s_k), s_k), s_j)

for V_{i+1}(x_\ell, s_k) (for all x_\ell ∈ X and s_k ∈ S), which may be rewritten as

V_{i+1}(x_\ell, s_k) = u(f_{i+1}(x_\ell, s_k), x_\ell, s_k) + \beta \Pi_{k\cdot} Q V_{i+1}(x_\ell, \cdot),  x_\ell ∈ X

where Q is an (N × N) matrix

Q_{\ell j} = 1 if x_j = h(x_\ell, f_{i+1}(x_\ell, s_k), s_k), 0 otherwise.

Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system, to get

V_{i+1}(x, s) = (I - \beta Q)^{-1} u(f_{i+1}(x, s), x, s)

We apply this algorithm to the same optimal growth model as in the previous section.

    Matlab Code: Policy iteration (Stochastic OGM)

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    epsi = 1e-6; % Tolerance parameter

    % Shocks

    p = (1+rho)/2;

    a = se/sqrt(1-rho*rho);

    PI = [p 1-p;1-p p];

    agrid = exp([-a a]);

    nba = length(agrid);

    % Setup the grid

    ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*ks; % lower bound on the grid

    kmax = (1+dev)*ks; % upper bound on the grid

    kgrid = linspace(kmin,kmax,nbk); % builds the grid

    % Initial conditions

    v = zeros(nbk,nba); % value function

    v1 = zeros(nbk,nba); % value function

    kp0 = zeros(nbk,nba); % value function

    dr = zeros(nbk,nba); % decision rule (will contain indices)

    S = 100;

    % Main loop

    while crit>epsi;

    for j=1:nba

    for i=1:nbk

    % compute indexes for which consumption is positive

imax = sum(kgrid<=agrid(j)*kgrid(i)^alpha+(1-delta)*kgrid(i));

% consumption and utility for feasible next period capital stocks

c    = agrid(j)*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax)';

util = (c.^(1-sigma)-1)/(1-sigma);


EV = v(1:imax,:)*PI(j,:)';

    [v1(i,j),dr(i,j)]= max(util+beta*EV);

    end;

    end

    % decision rules

    kp = kgrid(dr);

c = kron(agrid,kgrid'.^alpha)+(1-delta)*repmat(kgrid',1,nba)-kp;

    util= (c.^(1-sigma)-1)/(1-sigma);

    Q = sparse(nbk*nba,nbk*nba);

    for j=1:nba;

    Q0 = sparse(nbk,nbk);

J = sub2ind([nbk nbk],(1:nbk)',dr(:,j));

    Q0(J)= 1;

    Q((j-1)*nbk+1:j*nbk,:) = kron(PI(j,:),Q0);

    end

    Tv = (speye(nbk*nba)-beta*Q)\util(:);

    v = reshape(Tv,nbk,nba);

    crit= max(max(abs(kp-kp0)))

    kp0 = kp;

    end

As for the deterministic case, the modified k-step iterations can be used to update the value function rather than solving the big linear system.

The endogenous grid method extends as easily to the stochastic case as the previous algorithms; only the matlab code is reported.

    Matlab Code: Endogenous Grid Method (Stochastic OGM)

    clear all

    % Parameters of the economy

    sigma = 2.00; % utility parameter

    delta = 0.10; % depreciation rate

    beta = 0.96; % discount factor

    alpha = 0.30; % capital elasticity of output

    rho = 0.80;

    se = 0.05;

    % parameters of the algorithm

    dev = 0.75; % maximal deviation from steady state

    nbk = 2000; % number of data points in the grid

    crit = 1; % inital convergence criterion

    tol = 1e-6; % Tolerance parameter

method = 'linear'; % Interpolation method

    %

    % Shocks

    %

    a = se/(sqrt(1-rho*rho));

    agrid = exp([-a a]);

    nba = length(agrid);


    p = (1+rho)/2;

    PI = [p 1-p;1-p p];

    % Setup the grid

    kss = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

    kmin = (1-dev)*kss; % lower bound on the grid

    kmax = (1+dev)*kss; % upper bound on the grid

kpgrid = linspace(kmin,kmax,nbk)'; % builds the grid (for next period capital)

    %

    % Initial values

    %

    kgrid = kss*ones(nbk,nba);

    v0 = 1.5*(kpgrid.^alpha+(1-delta)*kpgrid);

    v0 = repmat((v0.^(1-sigma)-1)/((1-sigma)*(1-beta)),1,nba);

    v1 = v0;

    while crit>tol

    dv0 = diff_value(kpgrid,v0,method);

c = (beta*(dv0*PI')).^(-1/sigma);

    u = (c.^(1-sigma)-1)/(1-sigma);

tmp = u+beta*(v0*PI');

    for i=1:nba

    crk = 1;

    k0 = kgrid(:,i);

    while crk>tol;

    f0 = agrid(i)*k0.^alpha+(1-delta)*k0-kpgrid-c(:,i);

    df0 = alpha*agrid(i)*k0.^(alpha-1)+(1-delta);

    k1 = k0-f0./df0;

    crk = max(abs(k0-k1));

    k0 = k1;

    end

    kgrid(:,i) = k0;

    v1(:,i) = interp1(k0,tmp(:,i),kpgrid,method);

    end

    crit = max(max(abs(v0-v1)));

    v0 = v1;

    end

    %%

    ygrid = kron(agrid,ones(nbk,1)).*kgrid.^alpha;

    cgrid = ygrid+(1-delta)*kgrid-repmat(kpgrid,1,nba);

    igrid = ygrid-cgrid;


    References

Bertsekas, D., Dynamic Programming and Stochastic Control, New York: Academic Press, 1976.

Carroll, C.D., The Method of Endogenous Gridpoints for Solving Dynamic Stochastic Optimization Problems, Economics Letters, 2006, 91 (3), 312-320.

Judd, K. and A. Solnick, Numerical Dynamic Programming with Shape-Preserving Splines, Manuscript, Hoover Institution, 1994.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Lucas, R., N. Stokey, and E. Prescott, Recursive Methods in Economic Dynamics, Cambridge (MA): Harvard University Press, 1989.

Tauchen, G. and R. Hussey, Quadrature-Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371-396.