dynamic programming value iteration
DESCRIPTION
dynamic programming, DSGE, MAcroeconomics, New Keynesian Methods, value iterationTRANSCRIPT
-
Applied Dynamic Programming
In these notes, we will deal with a fundamental tool of dynamic macroeconomics: dynamic
programming. Dynamic programming is a very convenient way of writing a large set of dynamic
problems in economic analysis as most of the properties of this tool are now well established and
understood.1 In these lectures, we will not deal with all the theoretical properties attached to
this tool, but will rather give some recipes to solve economic problems using the tool of dynamic
programming. In order to understand the problem, we will first deal with deterministic models,
before extending the analysis to stochastic ones. However, we shall give some preliminary
definitions and theorems that justify this approach
1 The Bellman Equation and Associated Theorems
1.1 Heuristic Derivation of the Bellman Equation
Let us consider the case of an agent that has to decide on the path of a set of control variables,
{yt}t=0 in order to maximize the discounted sum of its future payoffs, u(yt, xt) where xt is statevariables assumed to evolve according to
xt+1 = h(xt, yt), x0 given.
We finally make the assumption that the model is Markovian.
The optimal value our agent can derive from this maximization process is given by
V (xt) = max{yt+sD(xt+s)}s=0
s=0
su(yt+s, xt+s) (1)
1For a mathematical exposition of the problem see Bertsekas [1976], for a more economic approach see Lucas,Stokey and Prescott [1989].
-
Applied Dynamic Programming 2
where D is the set of all feasible decisions for the variables of choice. The function V (xt) is the
value function and corresponds to the payoff of the agent when she adopts her optimal plan.
Note that the value function is a function of the state variable only. Indeed, since the model is
Markovian, only the knowledge of the past is necessary to take decisions. In other words, the
whole path can be predicted once the state variable is observed. Therefore, the value at t is only
a function of xt. Equation (1) may now be rewritten as
V (xt) = max{ytD(xt),{yt+sD(xt+s)}s=1}u(yt, xt) +
s=1
su(yt+s, xt+s) (2)
making the change of variable k = s 1, (2) rewrites
V (xt) = max{ytD(xt),{yt+1+kD(xt+1+k)}k=0}u(yt, xt) +
k=0
k+1u(yt+1+k, xt+1+k)
or
V (xt) = maxytD(xt)
u(yt, xt) + max{yt+1+kD(xt+1+k)}k=0
k=0
ku(yt+1+k, xt+1+k) (3)
Note that, by definition, we have
V (xt+1) = max{yt+1+kD(xt+1+k)}k=0
k=0
ku(yt+1+k, xt+1+k)
such that (3) rewrites as
V (xt) = maxytD(xt)
u(yt, xt) + V (xt+1) (4)
This is the socalled Bellman equation that lies at the core of the dynamic programming theory.
This equation is associated, in each and every period t, with a set of optimal policy functions
for y and x, which are defined by
{yt, xt+1} argmaxytD(xt)
u(yt, xt) + V (xt+1) (5)
Our problem is now to solve (4) for the function V (xt). This problem is particularly complicated
as we are not solving for just a point that would satisfy the equation, but we are interested in
finding a function that satisfies the equation. We are therefore leaving the realm of real analysis
for functional analysis.
A simple procedure to find a solution would be the following
1. Make an initial guess on the form of the value function V0(xt)
2. Update the guess using the Bellman equation such that
Vi+1(xt) = maxytD(xt)
u(yt, xt) + Vi(h(yt, xt))
-
Applied Dynamic Programming 3
3. If Vi+1(xt) = Vi(xt), then a fixed point has been found and the problem is solved, if not
we go back to 2, and iterate on the process until convergence.
Therefore, solving the Bellman equation just amounts to finding a fixed point of an operator T
Vi+1 = TVi
where T stands for the list of operations involved in the computation of the Bellman equation.
The problem is then that of the existence and the uniqueness of this fixedpoint. Mathematical
theory has provided conditions for the existence and uniqueness of a solution.
1.2 Existence and uniqueness of a solution
A metric space is a set S, together with a metric : S S R+, such that for allx, y, z S
1. (x, y) > 0, with (x, y) = 0 if and only if x = y,
2. (x, y) = (y, x),
3. (x, z) 6 (x, y) + (y, z).
Definition
A sequence {xn}n=0 in S converges to x S, if for each > 0 there exists an integerN such that
(xn, x) < for all n > N
Definition
A sequence {xn}n=0 in S is a Cauchy sequence if for each > 0 there exists an integerN such that
(xn, xm) < for all n,m > N
Definition
-
Applied Dynamic Programming 4
A metric space (S, ) is complete if every Cauchy sequence in S converges to a point
in S.
Definition
Let (S, ) be a metric space and T : S S be function mapping S into itself. T is acontraction mapping (with modulus ) if for (0, 1),
(Tx, Ty) 6 (x, y), for all x, y S.
Definition
We then have the following remarkable theorem that establishes the existence and uniqueness
of the fixed point of a contraction mapping.
If (S, ) is a complete metric space and T : S S is a contraction mapping withmodulus (0, 1), then
1. T has exactly one fixed point V S such that V = TV ,
2. for any V S, (TnV0, V ) < n(V 0, V ), with n = 0, 1, 2 . . .
Theorem (Contraction Mapping Theorem)
Since we are endowed with all the tools we need to prove the theorem, we shall do it.
Proof: In order to prove 1, we shall first prove that if we select any sequence {Vn}n=0, such that for each n,Vn S and
Vn = TVn+1
this sequence converges and that it converges to V S. In order to show convergence of {Vn}n=0, we shall provethat {Vn}n=0 is a Cauchy sequence. First of all, note that the contraction property of T implies that
(V2, V1) = (TV1, TV0) 6 (V1, V0)
and therefore(Vn+1, Vn) = (TVn, TVn1) 6 (Vn, Vn1) 6 . . . 6 n(V1, V0)
Now consider two terms of the sequence, Vm and Vn, m > n. The triangle inequality implies that
(Vm, Vn) 6 (Vm, Vm1) + (Vm1, Vm2) + . . .+ (Vn+1, Vn)
-
Applied Dynamic Programming 5
therefore, making use of the previous result, we have
(Vm, Vn) 6(m1 + m2 + . . .+ n
)(V1, V0) 6
n
1 (V1, V0)
Since (0, 1), n 0 as n , we have that for each > 0, there exists N N such that (Vm, Vn) < .Hence {Vn}n=0 is a Cauchy sequence and it therefore converges. Further, since we have assume that S is complete,Vn converges to V S.We now have to show that V = TV in order to complete the proof of the first part. Note that, for each > 0,and for V0 S, the triangular inequality implies
(V, TV ) 6 (V, Vn) + (Vn, TV )
But since {Vn}n=0 is a Cauchy sequence, we have
(V, TV ) 6 (V, Vn) + (Vn, TV ) 6
2+
2
for large enough n, therefore V = TV .
Hence, we have proven that T possesses a fixed point and therefore have established its existence. We now haveto prove uniqueness. This can be obtained by contradiction. Suppose, there exists another function, say W Sthat satisfies W = TW . Then, the definition of the fixed point implies
(V,W ) = (TV, TW )
but the contraction property implies
(V,W ) = (TV, TW ) 6 (V,W )
which, as > 0 implies (V,W ) = 0 and so V = W . The limit is then unique.
Proving 2. is straightforward as
(TnV0, V ) = (TnV0, TV ) 6 (Tn1V0, V )
but we have (Tn1V0, V ) = (Tn1V0, TV ) such that
(TnV0, V ) = (TnV0, TV ) 6 (Tn1V0, V ) 6 2(Tn2V0, V ) 6 . . . 6 n(V0, V )
which completes the proof. 2
This theorem is of great importance as it establishes that any operator that possesses the
contraction property will exhibit a unique fixedpoint, which therefore provides some rationale
to the algorithm we were designing in the previous section. It also insures that, if the value
function satisfies a contraction property, simple iterations will deliver the solution for any initial
conditions. It therefore remains to provide conditions for the value function to be a contraction.
These are provided by the next theorem.
-
Applied Dynamic Programming 6
Let X R` and B(X) be the space of bounded functions V : X R with theuniform metric. Let T : B(X) B(X) be an operator satisfying
1. (Monotonicity) Let V,W B(X), if V (x) 6 W (x) for all x X, then TV (x) 6TW (x)
2. (Discounting) There exists some constant (0, 1) such that for all V B(X)and a > 0, we have
T (V + a) 6 TV + a
then T is a contraction with modulus .
Theorem (Blackwells Sufficient Conditions)
Proof: Let us consider two functions V,W B(X) satisfying 1. and 2., and such thatV 6W + (V,W )
Monotonicity first implies thatTV 6 T (W + (V,W ))
and discounting then impliesTV 6 TW + (V,W ))
We therefore getTV TW 6 (V,W )
Likewise, if we now consider that W 6 V + (V,W ), we end up withTW TV 6 (V,W )
Consequently, we have|TV TW | 6 (V,W )
so that(TV TW ) 6 (V,W )
which defines a contraction. This completes the proof. 2
This theorem is extremely useful as it gives us simple tools to check whether a problem is a
contraction and therefore permits to check whether the simple algorithm we were defined is
appropriate for the problem we have in hand.
As an example, let us consider the optimal growth model, for which the Bellman equation writes
V (kt) = maxctC
u(ct) + V (kt+1)
with kt+1 = F (kt) ct. In order to save on notations, let us drop the time subscript and denotethe next period capital stock by k. Plugging the law of motion of capital in the utility function,
the Bellman equation rewrites
V (k) = maxkK
u(F (k) k) + V (k)
-
Applied Dynamic Programming 7
where we now have to find the optimal next period capital stock i.e. the optimal investment
policy. Let us now define the operator T as
(TV )(k) = maxkK
u(F (k) k) + V (k)
We would like to know if T is a contraction and therefore if there exists a unique function V
such that
V (k) = (TV )(k)
In order to achieve this task, we just have to check whether T is monotonic and satisfies the
discounting property.
1. Monotonicity: Let us consider two candidate value functions, V and W , such that V (k) 6W (k) for all k K . What we want to show is that (TV )(k) 6 (TW )(k). In order to dothat, let us denote by k the optimal next period capital stock, that is
(TV )(k) = u(F (k) k) + V (k)
But now, since V (k) 6 W (k) for all k K , we have V (k) 6 W (k), such that it shouldbe clear that
(TV )(k) 6 u(F (k) k) + W (k)6 max
kKu(F (k) k) + W (k) = (TW (k))
Hence we have shown that V (k) 6 W (k) implies (TV )(k) 6 (TW )(k) and thereforeestablished monotonicity.
2. Discounting: Let us consider a candidate value function, V , and a positive constant a.
(T (V + a))(k) = maxkK
u(F (k) k) + (V (k) + a)= max
kKu(F (k) k) + V (k) + a
= (TV )(k) + a
Therefore, the Bellman equation satisfies discounting in the case of optimal growth model.
Hence, the optimal growth model satisfies the Blackwells sufficient conditions for a contraction
mapping, and therefore the value function exists and is unique. We are now in position to design
a numerical algorithm to solve the Bellman equation.
-
Applied Dynamic Programming 8
2 Deterministic Dynamic Programming
2.1 Value Function Iteration
The contraction mapping theorem gives us a straightforward way to compute the solution to
the Bellman equation: iterate on the operator T , such that Vi+1 = TVi up to the point where
the distance between two successive value functions is small enough. Basically this amounts to
apply the following algorithm
1. Decide on a grid, X , of admissible values for the state variable x
X = {x1, . . . , xN}
and formulate an initial guess for the value function V0(x). Finally, choose a stopping
criterion > 0.
2. For each x` X , ` = 1, . . . , N , compute
Vi+1(x`) = max{xX }u(y(x`, x
), x`) + Vi(x)
3. If Vi+1(x) Vi(x) < go to the next step, otherwise go back to 2.
4. Compute the final solution as
y?(x) = y(x, x)
In order to better understand the algorithm, let us consider a simple example and go back to
the optimal growth model, with
u(c) =c1 1
1 and
k = k c+ (1 )k > 0Then the Bellman equation writes
V (k) = max06c6k+(1)k
c1 11 + V (k
)
From the law of motion of capital we can determine consumption as
c = k + (1 )k k
such that plugging this results in the Bellman equation, we have
V (k) = max06k6k+(1)k
(k + (1 )k k)1 11 + V (k
)
-
Applied Dynamic Programming 9
Now, let us define a grid of N feasible values for k such that we have
K = {k1, . . . , kN}
and an initial value function V0(k) that is a vector of N numbers that relate each k` to a
value. Note that this may be anything we want as we know by the contraction mapping
theorem that the algorithm will converge. But, if we want it to converge fast enough it may
be a good idea to impose a good initial guess. Finally, we need a stopping criterion.
Then, for each ` = 1, . . . , N , we compute the feasible values that can be taken by the quantity
in left hand side of the value function
Q`,h (k` + (1 )k` kh)1 1
1 + V (kh) for h feasible
It is important to understand what h feasible means. Indeed, we only compute consumption
when it is positive, which restricts the number of possible values for k. Namely, we want k to
satisfy
0 6 k 6 k + (1 )kwhich puts an upper bound on the index h. When the grid of values is uniform that is when
kh = k + (h 1)k, this bound can be computed as
h = E
(ki + (1 )ki k
k
)+ 1
where E(x) denotes the integer part of x.
Then we find
Q?`,h = maxh=h,...,h
Q`,h
and set
Vi+1(k`) = Q?`,h
and keep in memory the index h? = argmaxh=1,...,N Q`,h, such that we have
k(k`) = kh?
In Figure 1, we report the value function and the decision rules obtained from the deterministic
optimal growth model with = 0.3, = 0.96, = 0.1 and = 2. The grid for the capital stock
is composed of 1000 data points ranging from (1k)k? to (1 + k)k?, where k? denotes thesteady state and k = 0.75. The algorithm
2 is then as follows.
2The code reported in these notes and those that follow are not efficient from a computational point of view,as they are intended to have you understand the method without adding any coding complications. Much fasterimplementations can be obtained by vectorizing the code.
-
Applied Dynamic Programming 10
Matlab Code: Value iteration (OGM)
% Parameters of the economy
sigma = 2.00; % utility parameter
delta = 0.10; % depreciation rate
beta = 0.96; % discount factor
alpha = 0.30; % capital elasticity of output
% parameters of the algorithm
nbk = 2000; % number of data points in the grid
crit = 1; % inital convergence criterion
tol = 1e-6; % Tolerance parameter
% Setup the grid
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev = 0.75; % maximal deviation from steady state
kmin = (1-dev)*ks; % lower bound on the grid
kmax = (1+dev)*ks; % upper bound on the grid
kgrid = linspace(kmin,kmax,nbk); % builds the grid
% Initial conditions
v = zeros(nbk,1); % value function
tv = zeros(nbk,1); % value function
dr = zeros(nbk,1); % decision rule (will contain indices)
% Main loop
iter = 1;
while crit>tol;
for i=1:nbk
%
% compute indexes for which consumption is positive
%
imax = sum(kgrid
-
Applied Dynamic Programming 11
Figure 1: Deterministic OGM (Decision rules, Value iteration)
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.52
1
0
1
2
3
4
capital (kt)
Value Function
0 2 4 60
1
2
3
4
5
6
capital (kt)
Next period capital
0 2 4 6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
capital (kt)
Consumption
2.2 Taking advantage of interpolation
A possible improvement of the method is to have a much looser grid on the capital stock but have
a more dense grid on the control variable (consumption in the optimal growth model). Then the
next period value of the state variable can be computed much more precisely. However, because
of this precision and the fact the capital grid is coarser, it may be the case that the computed
optimal value for the next period state variable does not lie in the grid. This implies that the
value function is unknown at this particular value. Therefore, we need to use an interpolation
scheme to get an approximation of the value function at this value. One advantage of this
approach is that it involves less function evaluations and is usually less costly in terms of CPU
time. The algorithm is then as follows:
1. Decide on a grid, X , of admissible values for the state variable x
X = {x1, . . . , xN}
Decide on a grid, Y , of admissible values for the control variable y
Y = {y1, . . . , yM} with M N
-
Applied Dynamic Programming 12
formulate an initial guess for the value function V0(x) and choose a stopping criterion
> 0.
2. For each x` X , ` = 1, . . . , N , compute
x`,j = h(yj , x`)j = 1, . . . ,M
Compute an interpolated value function at each x`,j = h(yj , x`): Vi(x`,j)
Vi+1(x`) = max{yY }u(y, x`) + Vi(x
`,j)
3. If Vi+1(x) Vi(x) < go to the next step, otherwise go back to 2.
The matlab code for this approach is reported next. Cubic spline interpolation method is used
for the value function, with 20 nodes for the capital stock and 10000 nodes for consumption.
The algorithm converges starting from initial condition for the value function
v0(k) =((c?/y?) k)1 1
(1 )
Matlab Code: Value iteration with interpolation (OGM)
clear all
clc
% Parameters of the economy
sigma = 2.00; % utility parameter
delta = 0.10; % depreciation rate
beta = 0.96; % discount factor
alpha = 0.30; % capital elasticity of output
rho = 0.80;
se = 0.05;
% parameters of the algorithm
dev = 0.75; % maximal deviation from steady state
nbk = 20; % number of grid points in k
nbc = 10000; % number of grid points in c
crit = 1; % inital convergence criterion
tol = 1e-6; % Tolerance parameter
method = linear; % interpolation method
% Setup the grid
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin = (1-dev)*ks; % lower bound on the grid
kmax = (1+dev)*ks; % upper bound on the grid
kgrid = linspace(kmin,kmax,nbk); % builds the grid
% Initial conditions
v = zeros(nbk,1); % value function
-
Applied Dynamic Programming 13
tv = zeros(nbk,1); % value function
kpgrid = zeros(nbk,1); % decision rule for k(t+1)
ygrid = kgrid.^alpha; % output
% Main loop
iter = 1;
while crit>tol;
for i=1:nbk
% consumption, next period capital and utility
c = linspace(0,ygrid(i),nbc);
k1 = kgrid(i)^alpha+(1-delta)*kgrid(i)-c;
util = (c.^(1-sigma)-1)/(1-sigma);
% find value function
vint = interp1(kgrid,v,k1,method);
[v1,dr] = max(util+beta*vint);
tv(i) = v1;
kpgrid(i) = k1(dr);
end;
crit = max(max(abs(tv-v))); % Compute convergence criterion
v = tv; % Update the value function
fprintf(Iteration: %3d Crit: %g\n,iter,crit)
iter = iter+1;
end
% Output
ygrid = kgrid.^alpha;
cgrid = ygrid+(1-delta)*kgrid-kpgrid;
2.3 Policy iterations: Howard Improvement
The simple value iteration algorithm has the attractive feature of being particularly simple to
implement. However, it is a slow procedure, especially for infinite horizon problems, since it can
be shown that this procedure converges at the rate , which is usually close to 1! Furthermore,
it computes unnecessary quantities during the algorithm which slows down convergence. Often,
computation speed is really important, for instance when one wants to perform a sensitivity
analysis of the results obtained in a model using different parameterizations. Hence, we would
like to be able to speed up convergence. This can be achieved by applying the socalled Howard
improvement method. This idea of this method is to iterate on policy functions rather than on
the value function. The algorithm may be described as follows
1. Set i = 0 and guess an initial feasible decision rule for the control variable y = fi(x) and
compute the value associated to this guess, assuming that this rule is operative forever
V (xt) =s=0
su(fi(xt+s), xt+s)
taking care of the fact that xt+1 = h(xt, yt) = h(xt, fi(xt)). Set a stopping criterion > 0.
-
Applied Dynamic Programming 14
2. Find a new policy rule y = fi+1(x) such that
fi+1(x) argmaxy
u(y, x) + V (x)
with x = h(x, fi(x))
3. check if fi+1(x) fi(x) < , if yes then stop, otherwise go back to 2.
Note that this method differs fundamentally from the value iteration algorithm in at least two
dimensions
(i) one iterates on the policy function rather than on the value function;
(ii) the decision rule is used forever whereas it is assumed that it is used only two consecutive
periods in the value iteration algorithm. This is precisely this last feature that accelerates
convergence.
Note that when computing the value function we actually have to solve a linear system of the
form
Vi+1(x`) = u(fi+1(x`), x`) + Vi+1(h(x`, fi+1(x`))) ` = 1, . . . , N
for Vi+1(x`), which may be rewritten as
Vi+1(x`) = u(fi+1(x`), x`) + QVi+1(x`) ` = 1, . . . , N
where Q is an (N N) matrix
Q`j =
{1 if x h(fi+1(x`), x`) = x`0 otherwise
Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system,
to get
Vi+1(x) = (I Q)1u(fi+1(x), x)
Matlab Code: Policy iteration (OGM)
clear all
clc
% Parameters of the economy
sigma = 2; % utility parameter
delta = 0.1; % depreciation rate
beta = 0.96; % discount factor
alpha = 0.30; % capital elasticity of output
% parameters of the algorithm
-
Applied Dynamic Programming 15
nbk = 2000; % number of data points in the grid
crit = 1; % inital convergence criterion
epsi = 1e-6; % Tolerance parameter
% Setup the grid
dev = 0.5; % maximal deviation from steady state
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin = (1-dev)*ks; % lower bound on the grid
kmax = (1+dev)*ks; % upper bound on the grid
kgrid = linspace(kmin,kmax,nbk); % builds the grid
% Initial conditions
v = zeros(nbk,1); % value function
v1 = zeros(nbk,1); % value function
kp0 = zeros(nbk,1); % value function
dr = zeros(nbk,1); % decision rule (will contain indices)
% Main loop
while crit>epsi;
for i=1:nbk
% compute indexes for which consumption is positive
imax = sum(kgrid
-
Applied Dynamic Programming 16
which may be particularly costly when the number of grid points is important. Therefore, it
was proposed to replace the matrix inversion by an additional iteration step, leading to the so
called modified policy iteration with k steps, which replaces the linear problem by the following
iteration scheme
1. Set J0 = Vi
2. iterate k times on
Ji+1 = u(y, x) + QJi, i = 0, . . . , k
3. set Vi+1 = Jk.
When k , Jk tends toward the solution of the linear system. In that case, the main loopin the matlab code is slightly different.
Matlab Code: Modified Policy iteration (OGM), main loop only
K = 100; % Number of iterations
% Main loop
while crit>epsi;
for i=1:nbk
% compute indexes for which consumption is positive
imax = sum(kgrid
-
Applied Dynamic Programming 17
2.4 Endogenous Grid Method
The endogenous grid method was initially proposed by Carroll [2006] who proposed to formulate
a grid for the next period state variable rather than its current value. One merit of this approach
is that it allows to take advantage of some first order conditions and thereby accelerate the
maximization step. The version which is presented here differs from that proposed in Carroll
[2006] in that it does not take advantage of some redefinition of variables that would accelerate
even further the algorithm. This is done (i) to keep notations and intuition simple and (ii) to
show that the algorithm can be combined with a root finding problem relatively easily.
Assume the problem to be solved takes the form
V (xt) = maxu(yt, xt) + V (xt+1)
with xt+1 = h(xt, yt). The algorithm then works as follows
1. Set a grid for xt+1 and an initial value function Vi(xt+1). (i = 0 in the first iteration)
2. Use the first order condition to get y?t = y(xt+1, xt)
u(yt, xt)
yt+
Vi(xt+1)
xt+1
h(xt, yt)
yt= 0 y?t = (xt+1, xt)
3. Use the transition equation to find the optimal x?t = x(xt+1):
xt+1 = h(xt, yt) = h(xt,(xt+1, xt) x?t = x(xt+1)
and update yt such that
y?t = y(xt+1,
x(xt+1)) = y(xt+1)
4. Then compute
Vi+1(x?t ) = u(y
?t , x
?t ) + Vi(xt+1)
5. Interpolate Vi+1(x?t ) on the grid for xt+1 and update the value.
6. If Vi+1(xt+1) Vi(xt+1) < then stop, else go back to 2.
Note that the initial guess for the value function now plays a much more important role than
before as its partial derivative is needed to compute the optimal decision rule for the control
variable yt. The computation of the partial derivative can be achieved by mixing interpolation
and numerical differentiation in a rather standard way. Also note how step 4 does not involve
maximization as the optimal decision rules has already been obtained from step 2.
-
Applied Dynamic Programming 18
In the optimal growth model case, things are rather simple as the utility is only a function
of ct and the capital stock can be solved for very simply. From the first order conditions on
consumption, and given a grid for the next period capital stock kt+1, it is readily obtained
ct = (Vk(kt+1)) 1
which can then be used in the capital accumulation equation to obtain the current period capital
stock kt by solving3
kt + (1 )kt ct kt+1 = 0We denote by k?t the solution to this equation. This process defines a grid for the current period
capital stock which is endogenously determined, and so the name. The current value function
is then given by
V (k?t ) =c1t 1
1 + V (kt+1)Since the new value function is evaluated at nodes that do not necessarily lie on the grid for
next period capital stock, the updated value function has to be interpolated on these nodes.
Matlab Code: Endogenous Grid Method (OGM)
% Parameters of the economy
sigma = 2.00; % utility parameter
delta = 0.10; % depreciation rate
beta = 0.96; % discount factor
alpha = 0.30; % capital elasticity of output
% parameters of the algorithm
nbk = 2000; % number of data points in the grid
crit = 1; % inital convergence criterion
epsi = 1e-6; % Tolerance parameter
method = linear; % Interpolation method
% Setup the grid
dev = 0.5; % maximal deviation from steady state
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin = (1-dev)*ks; % lower bound on the grid
kmax = (1+dev)*ks; % upper bound on the grid
kpgrid = linspace(kmin,kmax,nbk); % builds the grid
% Initial values
c0 = 1.2*(kpgrid.^alpha+(1-delta)*kpgrid);
v0 = (c0.^(1-sigma)-1)/((1-sigma)*(1-beta));
kgrid = kss*ones(nk,1);
while crit>tol
dv0 = diff_value(kpgrid,v0,method);
c = (beta*dv0).^(-1/sigma);
tmp = (c.^(1-sigma)-1)/(1-sigma)+beta*v0;
3In the version proposed by Carroll [2006] this step is further accelerated by rewriting the problem in termsof available resources Yt = k
t + (1 )kt.
-
Applied Dynamic Programming 19
crk = 1;
while crk>tol; % Gauss-Newton Algorithm
f0 = kgrid.^alpha+(1-delta)*kgrid-kpgrid-c;
df0 = alpha*kgrid.^(alpha-1)+1-delta;
k1 = kgrid-f0./df0;
crk = max(abs(kgrid-k1));
kgrid= k1;
end
v1 = interp1(kgrid,tmp,kpgrid,method);
crit = max(max(abs(v0-v1)));
v0 = v1;
end
% Output
cgrid = kgrid.^alpha+(1-delta)*kgrid-kpgrid;
vgrid = interp1(kpgrid,v1,kgrid,method);
2.5 Parametric dynamic programming
The last technique we will describe borrows from approximation theory using either orthogonal
polynomials or spline functions. The idea is actually to make a guess for the functional form of
the value function and iterate on the parameters of this functional form. The algorithm then
works as follows
1. Choose a functional form for the value function V (x; ), a grid of interpolating nodes
X = {x1, . . . , xN}, a stopping criterion > 0 and an initial vector of parameters 0.
2. Using the conjecture for the value function, perform the maximization step in the Bellman
equation, that is compute w` = T (V (x,i))
w` = T (V (x`,i)) = maxyu(y, x`) + V (x
,i)
s.t. x = h(y, x`) for ` = 1, . . . , N
3. Using the approximation method you have chosen, compute a new vector of parameters
i+1 such that V (x,i+1) approximates the data (x`, w`).
4. If V (x,i+1) V (x,i) < then stop, otherwise go back to 2.
First note that for this method to be implementable, we need the payoff function and the
value function to be continuous. The approximation function may be either a combination of
polynomials, neural networks, splines. Note that during the optimization problem, we may have
-
Applied Dynamic Programming 20
to rely on a numerical maximization algorithm, and the approximation method may involve
numerical minimization in order to solve a nonlinear leastsquare problem of the form:
i+1 Argmin
N`=1
(w` V (x`; ))2
This algorithm is usually much faster than value iteration as it does not require iterating on a
large grid. As an example, I will once again focus on the optimal growth problem we have been
dealing with so far, and I will approximate the value function by
V (k; ) =
pi=1
ii
(2
log(k) kk k 1
)where {i(.)}pi=0 is a set of Chebychev polynomials. In the example, I set p = 5 and used 20nodes. Figure 2 reports the decision rule and the value function in this case, and table 1 reports
the parameters of the approximation function. The algorithm converged in 242 iterations, but
it took much less time than value iterations.
Matlab Code: Parametric dynamic programming (OGM)
sigma = 1.50; % utility parameter
delta = 0.10; % depreciation rate
beta = 0.95; % discount factor
alpha = 0.30; % capital elasticity of output
nbk = 20; % number of points in the grid
p = 10; % order of polynomials
crit = 1; % convergence criterion
iter = 1; % iteration
epsi = 1e-6; % convergence parameter
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev = 0.9; % maximal deviation from k*
kmin = log((1-dev)*ks); % lower bound on the grid
kmax = log((1+dev)*ks); % upper bound on the grid
rk = -cos((2*[1:nbk]-1)*pi/(2*nnk)); % Interpolating nodes
kgrid = exp(kmin+(rk+1)*(kmax-kmin)/2); % mapping
%
% Initial guess for the approximation
%
v = (((kgrid.^alpha).^(1-sigma)-1)/((1-sigma)*(1-beta)));;
X = chebychev(rk,n);
th0 = X\v
Tv = zeros(nbk,1);
kp = zeros(nbk,1);
%
% Main loop
%
options=foptions;
options(14)=1e9;
-
Applied Dynamic Programming 21
while crit>epsi;
k0 = kgrid(1);
for i=1:nbk
param = [alpha beta delta sigma kmin kmax n kgrid(i)];
kp(i) = fminunc(@tv,k0,options,[],param,th0);
k0 = kp(i);
Tv(i) = -tv(kp(i),param,th0);
end;
theta= X\Tv;
crit = max(abs(Tv-v));
v = Tv;
th0 = theta;
iter= iter+1;
end
Matlab Code: Parametric dynamic programming (extra functions)
function res=tv(kp,param,theta);
alpha = param(1);
beta = param(2);
delta = param(3);
sigma = param(4);
kmin = param(5);
kmax = param(6);
n = param(7);
k = param(8);
kp = sqrt(kp.^2); % insures positivity of k
v = value(kp,[kmin kmax n],theta); % computes the value function
c = k.^alpha+(1-delta)*k-kp; % computes consumption
d = find(c
-
Applied Dynamic Programming 22
case 1;
Tx=[ones(lx,1) X];
otherwise
Tx=[ones(lx,1) X];
for i=3:n+1;
Tx=[Tx 2*X.*Tx(:,i-1)-Tx(:,i-2)];
end
end
Figure 2: Deterministic OGM (Decision rules, Parametric DP)
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 54
2
0
2
4
kt
Value Function
0 1 2 3 4 50
1
2
3
4
5
kt
Next period capital stock
0 1 2 3 4 50
0.5
1
1.5
2
kt
Consumption
Table 1: Value function approximation
0 1 2 3 4 5-0.2334 3.0686 0.2543 0.0153 -0.0011 -0.0002
A potential problem with the use of Chebychev polynomials is that they do not put any assump-
tion on the shape of the value function, which we know to be concave and strictly increasing in
this case. This is why Judd [1998] recommends to use shapepreserving methods such as Schu-
maker approximation. Judd and Solnick [1994] have successfully applied this latter technique
to the optimal growth model and found that the approximation was very good and dominated
-
Applied Dynamic Programming 23
other methods (they actually get the same precision with 12 nodes as the one achieved with a
1200 data points grid using a value iteration technique).
3 Stochastic Dynamic Programming
In a large number of problems, we have to deal with stochastic shocks just think of a stan-
dard RBC model dynamic programming technique can obviously be extended to deal with
such problems. This section will first show how we can obtain the Bellman equation before
addressing some important issues concerning discretization of shocks. Then it will describe the
implementation of the value iteration and policy iteration technique for the stochastic case. The
code for the endogenous grid method is reported at the end of the text.
3.1 The Bellman Equation
The stochastic problem differs from the deterministic problem in that we now have to take
expectations. The problem then defines a value function which has as argument the state
variable xt but also the stationary shock st, whose sequence {st}+t=0 satisfies
st+1 = (st, t+1) (6)
where is a white noise process. The value function is therefore given by
V (xt, st) = max{yt+D(xt+ ,st+ )}=0Et
=0
u(yt+ , xt+ , st+ ) (7)
subject to (6) and
xt+1 = h(xt, yt, st) (8)
Since both yt, xt and st are either perfectly observed or decided in period t they are known in
t, such that we may rewrite the value function as
V (xt, st) = max{ytD(xt,st),{yt+D(xt+ ,st+ )}=1}u(yt, xt, st) + Et
=1
u(yt+ , xt+ , st+ )
or
V (xt, st) = maxytD(xt,st)
u(yt, xt, st) + max{yt+D(xt+ ,st+ )}=1Et
=1
u(yt+ , xt+ , st+ )
Using the change of variable k = 1, this rewrites
V (xt, st) = maxytD(xt,st)
u(yt, xt, st)
+ max{yt+1+kD(xt+1+k,st+1+k)}k=0
Et
k=0
ku(yt+1+k, xt+1+k, st+1+k)
-
Applied Dynamic Programming 24
It is important at this point to recall that
Et(X(t+ )) =
X(t+ )f(t+ |t)dt+
=
X(t+ )f(t+ |t+1)f(t+1|t)dt+dt+1
=
. . .
X(t+ )f(t+ |t+1) . . . f(t+1|t)dt+ . . . dt+1
which is a corollary of the law of iterated projections, such that the value function rewrites
V (xt, st) = maxytD(xt,st)
u(yt, xt, st)
+ max{yt+1+kD(xt+1+k,st+1+k)}k=0
EtEt+1
k=0
ku(yt+1+k, xt+1+k, st+1+k)
or
V (xt, st) = maxytD(xt,st)
u(yt, xt, st)
+ max{yt+1+kD(xt+1+k,st+1+k)}k=0
Et+1
k=0
ku(yt+1+k, xt+1+k, st+1+k)f(st+1|st)dst+1
Note that because each value for the shock defines a particular mathematical object the max-
imization of the integral corresponds to the integral of the maximization, therefore the max
operator and the summation are interchangeable, so that we get
V (xt, st) = maxytD(xt,st)
u(yt, xt, st)
+
max
{yt+1+kD(xt+1+k,st+1+k)}k=0Et+1
k=0
ku(yt+1+k, xt+1+k, st+1+k)f(st+1|st)dst+1
By definition, the term under the integral corresponds to V (xt+1, st+1), such that the value
rewrites
V (xt, st) = maxytD(xt,st)
u(yt, xt, st) +
V (xt+1, st+1)f(st+1|st)dst+1
or equivalently
V (xt, st) = maxytD(xt,st)
u(yt, xt, st) + EtV (xt+1, st+1)
which is precisely the Bellman equation for the stochastic dynamic programming problem.
3.2 Discretization of the Shocks
A very important problem that arises whenever we deal with value iteration or policy iteration in
a stochastic environment is that of the discretization of the space spanned by the shocks. Indeed,
-
Applied Dynamic Programming 25
the use of a continuous support for the stochastic shocks is unfeasible for a computer that can
only deal with discrete supports. We therefore have to transform the continuous problem into
a discrete one with the constraint that the asymptotic properties of the continuous and the
discrete processes should be the same. The question we therefore face is: does there exist a
discrete representation for s which is equivalent to its continuous original representation? The
answer to this question is yes. In particular as soon as we deal with (V)AR processes, we can
use a very powerful tool: Markov chains.
Markov Chains: A Markov chain is a sequence of random values whose probabilities at a time
interval depends upon the value of the number at the previous time. We will restrict ourselves
to discrete-time Markov chains, in which the state changes at certain discrete time instants,
indexed by an integer variable t. At each time step t, the Markov chain is in a state, denoted
by s S {s1, . . . , sM}. S is called the state space.
The Markov chain is described in terms of transition probabilities piij . This transition probability
should be interpreted as follows
If the economy is in state si in period t, the probability that the next state is equal to
sj is piij.
We therefore get the following definition
A Markov chain is a stochastic process with a discrete indexing S , such that the
conditional distribution of st+1 is independent of all previously attained states given
st:
piij = Prob(st + 1 = sj |st = si), si, sj S .
Definition
The important assumption we shall make concerning Markov processes is that the transition
probabilities, piij , apply as soon as state si is activated no matter the history of the shocks nor
how the system got to the state si. In other words there is no hysteresis. From a mathematical
point of view, this corresponds to the socalled Markov property
P (st+1 = sj |st = si, st1 = sin1 , . . . s0 = si0) = P (st+1 = sj |st = si) = piijfor all period t, all states si, sj S , and all sequences {sin}t1n=0 of earlier states. Thus, theprobability the next state st+1 does only depend on the current realization of s.
-
Applied Dynamic Programming 26
The transition probabilities piij must of course satisfy
1. piij > 0 for all i, j = 1, . . . ,M
2.M
j=1 piij = 1 for all i = 1, . . . ,M
All of the elements of a Markov chain model can then be encoded in a transition probability
matrix
=
pi11 . . . pi1M... . . . ...piM1 . . . piMM
Note that kij then gives the probability that st+k = sj given that st = si. In the long run, we
obtain the steady state equivalent for a Markov chain: the invariant distribution or stationary
distribution
A stationary distribution for a Markov chain is a distribution pi such that
1. pij > 0 for all j = 1, . . . ,M
2.M
j=1 pij = 1
3. pi = pi
Definition
Moment approach to discretization: A first simple way to tackle this problem is to rely on a
method of moments. The idea is that the continuous process and its discrete approximation
should possess the same asymptotic properties in terms of conditional first and second order
moments. We will apply it later to the optimal growth model.
Nevertheless, if as illustrated this approach is straightforward when we restrict ourselves
to a 2 states representation of the process, it becomes cumbersome when we want to deal with
more states. We then can rely on a quadrature based method proposed by Tauchen and Hussey
[1991].
Gaussianquadrature approach to discretization: Tauchen and Hussey [1991] provide a simple
way to discretize VAR processes relying on Gaussian quadrature. This note will only present
-
Applied Dynamic Programming 27
the case of an AR(1) process of the form4
st+1 = st + (1 )s+ t+1
where t+1 ; N (0, 2). This implies that
1
2pi
exp
{1
2
(st+1 st (1 )s
)2}dst+1 =
f(st+1|st)dst+1 = 1
which illustrates the fact that s is a continuous random variable. Tauchen and Hussey [1991]
propose to replace the integral by(st+1; st, s)f(st+1|s)dst+1
f(st+1|st)f(st+1|s) f(st+1|s)dst+1 = 1
where f(st+1|s) denotes the density of st+1 conditional on the fact that st = s (in fact theunconditional density function), which in our case implies that
(st+1; st, s) f(st+1|st)f(st+1|s) = exp
{1
2
[(st+1 st (1 )s
)2(st+1 s
)2]}
then we can use the standard linear transformation and impose zt = (st s)/(
2) to get
1pi
exp{ ((zt+1 zt)2 z2t+1)} exp (z2t+1) dzt+1
for which we can use a GaussHermite quadrature. Assume then that we have the quadrature
nodes zi and weights i, i = 1, . . . , n, the quadrature leads to the formula
1pi
nj=1
j(zj ; zi;x) ' 1
in other words we might interpret the quantity j(zj ; zi;x)/pi as an estimate piij Prob(st+1 =
sj |st = si) of the transition probability from state i to state j. However, it is important to re-member that the quadrature is just an approximation such that it will generally be the case
thatn
j=1 piij = 1 does not hold exactly. Tauchen and Hussey therefore propose the following
modification:
piij =j(zj ; zi; s)
pii
where i =1pi
nj=1 j(zj ; zi;x).
Matlab Code: TauchenHusseys procedure
function [s,p]=tauch_hussey(xbar,rho,sigma,n)
% xbar : mean of the x process
4In their article, Tauchen and Hussey [1991] consider more general processes.
-
Applied Dynamic Programming 28
% rho : persistence parameter
% sigma : volatility
% n : number of nodes
%
% returns the states s and the transition probabilities p
%
[xx,wx] = gauss_herm(n); % nodes and weights for s
s = sqrt(2)*s*xx+mx; % discrete states
x = xx(:,ones(n,1));
z = x;
w = wx(:,ones(n,1));
%
% computation
%
p = (exp(z.*z-(z-rx*x).*(z-rx*x)).*w)./sqrt(pi);
sx = sum(p);
p = p./sx(:,ones(n,1));
3.3 Value Iteration
As in the deterministic case, the convergence of the simple value function iteration procedure is
insured by the contraction mapping theorem. This is however a bit more subtle than that as we
have to deal with the convergence of a probability measure, which goes far beyond this intro-
duction to dynamic programming.5 The algorithm is basically the same as in the deterministic
case up to the delicate point that expectations have to be computed at each iteration. It writes
as follows
1. Decide on a grid, X , of admissible values for the state variable x
X = {x1, . . . , xN}
and the shocks, s
S = {s1, . . . , sM} together with the transition matrix = (piij)
formulate an initial guess for the value function V0(x) and choose a stopping criterion
> 0.
2. For each x` X , ` = 1, . . . , N , and sk S , k = 1, . . . ,M compute
Vi+1(x`, sk) = max{xX }u(y(x`, sk, x
), x`, sk) + Mj=1
pikjVi(x, sj)
5The interested reader should then refer to Lucas et al. [1989] chapter 9.
-
Applied Dynamic Programming 29
3. If Vi+1(x, s) Vi(x, s) < go to the next step, otherwise go back to 2.
4. Compute the final solution as
y?(x, s) = y(x, x(s, s), s)
As in the deterministic case, we will illustrate the method relying on the optimal growth model,
with
u(c) =c1 1
1 and
k = exp(a)k c+ (1 )k
where a = a+ . Then the Bellman equation writes
V (k, a) = maxc
c1 11 +
V (k, a)d(a|a)
From the law of motion of capital we can determine consumption as
c = exp(a)k + (1 )k k
such that plugging this results in the Bellman equation, we have
V (k, a) = maxk
(exp(a)k + (1 )k k)1 11 +
V (k, a)d(a|a)
A first problem that we encounter is that we would like to be able to evaluate the integral
involved by the rational expectation. We therefore have to discretize the shock. Here, we will
consider that the technology shock can be accurately approximated by a 2 state Markov chain,
such that a can take on 2 values a1 and a2 (a1 < a2). We will also assume that the transition
matrix is symmetric, such that
=
(pi 1 pi
1 pi pi)
a1, a2 and pi are selected such that the process reproduces the conditional first and second order
moments of the AR(1) process
First order moments
pia1 + (1 pi)a2 = a1(1 pi)a1 + pia2 = a2
Second order moments
pia21 + (1 pi)a22 (a1)2 = 2(1 pi)a21 + pia22 (a2)2 = 2
-
Applied Dynamic Programming 30
From the first two equations we get a1 = a2 and pi = (1 +)/2. Plugging these last two resultsin the two last equations, we get a1 =
2/(1 2). Hence, we will actually work with a value
function of the form
V (k, ak) = maxc
c1 11 +
2j=1
pikjV (k, aj)
Now, let us define a grid of N feasible values for k such that we have
K = {k1, . . . , kN}
and an initial value function V0(k) that is a vector of N numbers that relate each k` to a
value. Note that this may be anything we want as we know by the contraction mapping
theorem that the algorithm will converge. But, if we want it to converge fast enough it may
be a good idea to impose a good initial guess. Finally, we need a stopping criterion.
In Figure 3, we report the value function and the decision rules obtained for the stochastic
optimal growth model with = 0.3, = 0.96, = 0.1, = 2, = 0.8 and = 0.05. The grid
for the capital stock is composed of 2000 data points ranging from 25% below to 25% above the
deterministic steady state.
Matlab Code: Value iteration (Stochastic OGM)
% Parameters of the economy
sigma = 2.00; % utility parameter
delta = 0.10; % depreciation rate
beta = 0.96; % discount factor
alpha = 0.30; % capital elasticity of output
rho = 0.80;
se = 0.05;
% parameters of the algorithm
dev = 0.75; % maximal deviation from steady state
nbk = 2000; % number of data points in the grid
crit = 1; % inital convergence criterion
tol = 1e-6; % Tolerance parameter
% Shocks
p = (1+rho)/2;
a = se/sqrt(1-rho*rho);
PI = [p 1-p;1-p p];
agrid = [-a a];
nba = length(agrid);
% Setup the grid
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin = (1-dev)*ks; % lower bound on the grid
kmax = (1+dev)*ks; % upper bound on the grid
-
Applied Dynamic Programming 31
kgrid = linspace(kmin,kmax,nbk); % builds the grid
% Initial conditions
v = zeros(nbk,nba); % value function
tv = zeros(nbk,nba); % value function
dr = zeros(nbk,nba); % decision rule (will contain indices)
% Main loop
iter = 1;
while crit>tol;
for j=1:nba
for i=1:nbk
% compute indexes for which consumption is positive
imax= sum(kgrid 0.
-
Applied Dynamic Programming 32
Figure 3: Stochastic OGM (Decision rules, Value iteration)
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.52
1
0
1
2
3
4
capital (kt)
Value Function
0 2 4 60
1
2
3
4
5
6
capital (kt)
Next period capital
0 2 4 6
0.8
1
1.2
1.4
1.6
capital (kt)
Consumption
2. Find a new policy rule y = fi+1(x, sk), k = 1, . . . ,M , such that
fi+1(x, sk) argmaxy
u(y, x, sk) +
Mj=1
pikjV (x, sj)
with x = h(x, fi(x, sk), sk)
3. check if fi+1(x, s) fi(x, s) < , if yes then stop, otherwise go back to 2.
When computing the value function we actually have to solve a linear system of the form
Vi+1(x`, sk) = u(fi+1(x`, sk), x`, sk) +
Mj=1
pikjVi+1(h(x`, fi+1(x`, sk), sk), sj)
for Vi+1(x`, sk) (for all x` X and sk S ), which may be rewritten as
Vi+1(x`, sk) = u(fi+1(x`, sk), x`, sk) + k.QVi+1(x`, .) x` X
where Q is an (N N) matrix
Q`j =
{1 if x h(fi+1(x`), x`) = xj0 otherwise
-
Applied Dynamic Programming 33
Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system,
to get
Vi+1(x, s) = (I Q)1u(fi+1(x, s), x, s)We apply this algorithm to the same optimal growth model as in the previous section.
Matlab Code: Policy iteration (Stochastic OGM)
% Parameters of the economy
sigma = 2.00; % utility parameter
delta = 0.10; % depreciation rate
beta = 0.96; % discount factor
alpha = 0.30; % capital elasticity of output
rho = 0.80;
se = 0.05;
% parameters of the algorithm
dev = 0.75; % maximal deviation from steady state
nbk = 2000; % number of data points in the grid
crit = 1; % inital convergence criterion
epsi = 1e-6; % Tolerance parameter
% Shocks
p = (1+rho)/2;
a = se/sqrt(1-rho*rho);
PI = [p 1-p;1-p p];
agrid = exp([-a a]);
nba = length(agrid);
% Setup the grid
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin = (1-dev)*ks; % lower bound on the grid
kmax = (1+dev)*ks; % upper bound on the grid
kgrid = linspace(kmin,kmax,nbk); % builds the grid
% Initial conditions
v = zeros(nbk,nba); % value function
v1 = zeros(nbk,nba); % value function
kp0 = zeros(nbk,nba); % value function
dr = zeros(nbk,nba); % decision rule (will contain indices)
S = 100;
% Main loop
while crit>epsi;
for j=1:nba
for i=1:nbk
% compute indexes for which consumption is positive
imax = sum(kgrid
-
Applied Dynamic Programming 34
EV = (PI(j,:)*v(1:imax,:));
[v1(i,j),dr(i,j)]= max(util+beta*EV);
end;
end
% decision rules
kp = kgrid(dr);
c = kron(agrid,kgrid.^alpha)+(1-delta)*repmat(kgrid,1,nba)-kp;
util= (c.^(1-sigma)-1)/(1-sigma);
Q = sparse(nbk*nba,nbk*nba);
for j=1:nba;
Q0 = sparse(nbk,nbk);
J = sub2ind([nbk nbk],(1:nbk),dr(:,j));
Q0(J)= 1;
Q((j-1)*nbk+1:j*nbk,:) = kron(PI(j,:),Q0);
end
Tv = (speye(nbk*nba)-beta*Q)\util(:);
v = reshape(Tv,nbk,nba);
crit= max(max(abs(kp-kp0)))
kp0 = kp;
end
As for the deterministic case, the modified kstep iterations can be used to update the value
function rather than solving the big linear system.
The endogenous grid method algorithm is as easily amendable to the stochastic case as the
previous algorithms. Only the matlab code is reported.
Matlab Code: Endogenous Grid Method (Stochastic OGM)
clear all
% Parameters of the economy
sigma = 2.00; % utility parameter
delta = 0.10; % depreciation rate
beta = 0.96; % discount factor
alpha = 0.30; % capital elasticity of output
rho = 0.80;
se = 0.05;
% parameters of the algorithm
dev = 0.75; % maximal deviation from steady state
nbk = 2000; % number of data points in the grid
crit = 1; % inital convergence criterion
tol = 1e-6; % Tolerance parameter
method = linear; % Interpolation method
%
% Shocks
%
a = se/(sqrt(1-rho*rho));
agrid = exp([-a a]);
nba = length(agrid);
-
Applied Dynamic Programming 35
p = (1+rho)/2;
PI = [p 1-p;1-p p];
% Setup the grid
kss = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin = (1-dev)*kss; % lower bound on the grid
kmax = (1+dev)*kss; % upper bound on the grid
kpgrid = linspace(kmin,kmax,nbk); % builds the grid
%
% Initial values
%
kgrid = kss*ones(nbk,nba);
v0 = 1.5*(kpgrid.^alpha+(1-delta)*kpgrid);
v0 = repmat((v0.^(1-sigma)-1)/((1-sigma)*(1-beta)),1,nba);
v1 = v0;
while crit>tol
dv0 = diff_value(kpgrid,v0,method);
c = (beta*(PI*dv0)).^(-1/sigma);
u = (c.^(1-sigma)-1)/(1-sigma);
tmp = u+beta*(PI*v0);
for i=1:nba
crk = 1;
k0 = kgrid(:,i);
while crk>tol;
f0 = agrid(i)*k0.^alpha+(1-delta)*k0-kpgrid-c(:,i);
df0 = alpha*agrid(i)*k0.^(alpha-1)+(1-delta);
k1 = k0-f0./df0;
crk = max(abs(k0-k1));
k0 = k1;
end
kgrid(:,i) = k0;
v1(:,i) = interp1(k0,tmp(:,i),kpgrid,method);
end
crit = max(max(abs(v0-v1)));
v0 = v1;
end
%%
ygrid = kron(agrid,ones(nbk,1)).*kgrid.^alpha;
cgrid = ygrid+(1-delta)*kgrid-repmat(kpgrid,1,nba);
igrid = ygrid-cgrid;
-
Applied Dynamic Programming 36
References
Bertsekas, D., Dynamic Programming and Stochastic Control, NewYork: Academic Press,
1976.
Carroll, C.D., The Method of Endogenous Gridpoints for Solving Dynamic Stochastic Opti-
mization Problems, Economics Letters, 2006, 91 (3), 312320880.
Judd, K. and A. Solnick, Numerical Dynamic Programming with ShapePreserving Splines,
Manuscript, Hoover Institution 1994.
Judd, K.L., Numerical methods in economics, Cambridge, Massachussets: MIT Press, 1998.
Lucas, R., N. Stokey, and E. Prescott, Recursive Methods in Economic Dynamics, Cambridge
(MA): Harvard University Press, 1989.
Tauchen, G. and R. Hussey, Quadrature Based Methods for Obtaining Approximate Solutions
to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371396.