Publisher: Routledge
The Journal of Economic Education
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t914957642

Dynamic Programming: An Introduction by Example
Joachim Zietz, Middle Tennessee State University
Online publication date: 07 August 2010

To cite this article: Zietz, Joachim (2007) 'Dynamic Programming: An Introduction by Example', The Journal of Economic Education, 38: 2, 165-186.
To link to this article: DOI: 10.3200/JECE.38.2.165-186; URL: http://dx.doi.org/10.3200/JECE.38.2.165-186
Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf
Dynamic Programming: An Introduction by Example
Joachim Zietz
Abstract: The author introduces some basic dynamic programming techniques, using examples, with the help of the computer algebra system Maple. The emphasis is on building confidence and intuition for the solution of dynamic problems in economics. To integrate the material better, the same examples are used to introduce different techniques. One covers the optimal extraction of a natural resource, another uses consumer utility maximization, and the final example solves a simple real business cycle model. Every example is accompanied by Maple computer code to allow for replication.

Keywords: dynamic programming, learning by example, Maple

JEL codes: C610, A230
Dynamic programming should emphasize intuition over mathematical rigor when it is introduced. Readers interested in more formal mathematical treatments are urged to read Sargent (1987), Stokey and Lucas (1989), or Ljungqvist and Sargent (2000). I build the discussion in this article around a set of examples that covers a number of key problems from economics. In contrast to other brief introductions to dynamic programming, such as King (2002), I integrate the use of a computer algebra system (Maple). Program code is provided for every example to encourage replication and experimentation.1 On the basis of classroom experience, I found that this allows readers to understand more easily the structure of dynamic programming and to move faster to their own applications.2 The problems in this article should be accessible to undergraduates who are familiar with the Lagrangian multiplier method for solving constrained optimization problems.
Dynamic programming has strong similarities to optimal control, a competing approach to dynamic optimization. Dynamic programming has its roots in the work of Bellman (1957), and optimal control techniques rest on the work of the Russian mathematician Pontryagin and his coworkers in the late 1950s.3 Although both dynamic programming and optimal control can be applied to discrete time and continuous time problems, most current applications in economics appear to favor dynamic programming for discrete time problems and optimal control for continuous time problems. To keep the discussion reasonably simple,
Joachim Zietz is a professor of economics at Middle Tennessee State University (e-mail: [email protected]). The author thanks two anonymous referees for helpful comments. Maple input code and output for all example programs are available from the author on request. Copyright 2007 Heldref Publications
I only deal with discrete time dynamic programming problems, which have
become popular in macroeconomics. However, I do not limit the discussion to
macroeconomics but try to convey the idea that dynamic programming has applications in other settings as well.
I provide some motivation for the use of dynamic programming techniques and
then discuss finite horizon problems. I treat numerical and nonnumerical problems. I next cover infinite horizon problems, again using both numerical and nonnumerical examples. Finally, I solve a simple stochastic infinite horizon problem
of the real business cycle variety.
A MOTIVATING EXAMPLE
The basic principle of dynamic programming is best illustrated with an example. Consider the problem of an oil company that wants to maximize profits from an oil well. Revenue at time t is given as

R_t = p_t u_t,

where p is the price of oil, and u is the amount of oil that is extracted and sold. In dynamic programming applications, u is typically called a control variable. The company's cost function is quadratic in the amount of oil that is extracted,

C_t = 0.05u_t².

The amount of oil remaining in the well follows the recursion or transition equation

x_{t+1} = x_t - u_t,

where x is known as a state variable in dynamic programming language. Because oil is a nonrenewable resource, pumping out oil in the amount of u at time t means that exactly that much less oil is left in the oil well at time t + 1. I assume that the company applies the discount factor β = 0.9 for profit streams that occur in the future. I also assume that the company intends to have the oil well depleted in four years, which means that x_4 = 0. This is known as a boundary condition in dynamic programming. Given these assumptions, the central question to be solved is how much oil should be pumped at time t, t = 0, ..., 3, to maximize the discounted profit stream.

If one had never heard of dynamic programming or optimal control, the natural way to solve this problem would be to set up a Lagrangian multiplier problem along the following lines:

max L(x, u, λ) = Σ_{t=0}^{3} β^t (R_t - C_t) + Σ_{t=1}^{4} λ_t (x_t - x_{t-1} + u_{t-1}),

where x = (x_1, x_2, x_3), with x_0 given exogenously, and x_4 = 0, by design, where u = (u_0, u_1, u_2, u_3), λ = (λ_1, λ_2, λ_3, λ_4), and where the constraints specify that the amount of oil left at period t is equal to the amount in the previous period minus the amount pumped in the previous period. Using standard techniques, the above
Lagrangian problem can be solved for the decision variables x and u and the Lagrangian multipliers λ in terms of the exogenous variables (p_0, p_1, p_2, p_3, x_0).

The Maple commands to solve this problem are given as follows:4

restart: with(linalg): Digits:=3:
# memory is cleared, the linear algebra package
# called and the number of digits for floating
# point calculations is set; a colon at the end
# prevents all output
f:=sum(0.9^t*(p[t]*u[t]-0.05*u[t]^2),t=0..3);
# the objective function is defined
x[0]:=x0: x[4]:=0:
g:=sum(lambda[t]*(x[t]-x[t-1]+u[t-1]),t=1..4);
# the constraints are defined
L:=f+g;
# the Lagrangian function is defined
L_grad:=grad(L,[seq(x[t],t=1..3),seq(u[t],t=0..3),
seq(lambda[t],t=1..4)]);
# the gradient of the Lagrangian is derived;
# the seq command greatly simplifies the input lines
solve({seq(L_grad[i],i=1..11)},{seq(x[t],t=1..3),
seq(u[t],t=0..3),seq(lambda[t],t=1..4)});
# the gradient is solved for all x, u, and lambda
assign(%):
# all x, u, and lambda are assigned their solution
# values; this is required to do further calculations
# with the solution values
For example, the optimal amounts of oil to pump in periods zero to three are given as

u_0 = 0.212x_0 + 7.88p_0 - 2.12p_1 - 2.12p_2 - 2.12p_3
u_1 = 0.235x_0 - 2.35p_0 + 7.65p_1 - 2.35p_2 - 2.36p_3
u_2 = 0.262x_0 - 2.62p_0 - 2.62p_1 + 7.38p_2 - 2.62p_3
u_3 = 0.291x_0 - 2.91p_0 - 2.91p_1 - 2.91p_2 + 7.10p_3.

If one now specifies a particular sequence of prices and the value of x_0, then a complete numerical solution results. For example, assuming (p_0 = 20, p_1 = 22, p_2 = 30, p_3 = 25, x_0 = 1000), one can append the following Maple commands:

x0:=1000: p[0]:=20: p[1]:=22: p[2]:=30: p[3]:=25:
evalf([seq(x[t],t=0..4)]); evalf([seq(u[t],t=0..3)]);
evalf([seq(lambda[t],t=1..4)]);

to generate the solution for [x_0; x_1, x_2, x_3; x_4], [u], [λ] as [1000; 794, 566, 260; 0], [206, 227, 308, 260], [0.60, 0.60, 0.60, 0.60].
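These numbers are easy to cross-check without Maple. The sketch below is an addition of this edition in Python rather than the article's Maple: the first-order conditions β^t(p_t - 0.1u_t) + λ = 0 (one common multiplier, since all λ_t are equal) together with the depletion constraint u_0 + u_1 + u_2 + u_3 = x_0 can be solved by hand for λ and then for each u_t.

```python
beta, x0 = 0.9, 1000.0
p = [20.0, 22.0, 30.0, 25.0]

# FOC for each u_t: beta**t * (p[t] - 0.1*u[t]) + lam = 0, so
# u[t] = 10*(p[t] + lam/beta**t); the budget sum(u) = x0 pins down lam
lam = (x0 - 10.0 * sum(p)) / sum(10.0 / beta**t for t in range(4))
u = [10.0 * (p[t] + lam / beta**t) for t in range(4)]

print([round(v, 1) for v in u])  # close to the article's [206, 227, 308, 260]
```

The article's Maple run rounds to three digits (Digits:=3), so its printed path differs from this full-precision one in the last digit.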
Given the relative simplicity with which this problem can be solved with standard techniques, students may wonder where there is room for a different approach. The clue to this puzzle lies in the simple fact that one cannot at time 0 pretend to know all future prices or actual amounts of oil that are pumped. For example, it may be that oil rigs break down or workers are on strike so that the optimal amount of oil simply cannot be pumped in a given year. Yet it should be clear that a change in any of the future p or x means that the above sequence of x and u is no longer optimal. The whole problem needs to be recalculated subject to the new p and the known values of x. To avoid recalculating the problem whenever something unexpected happens to either p or x, economists try to find a decision rule or policy function for the control variable u that can be followed regardless of random shocks to price or state variable x. That is the basic idea of dynamic programming.
FINITE HORIZON PROBLEMS
The Motivating Example Again
I now recalculate the previous problem using dynamic programming techniques. The objective is, as before, to maximize

Σ_{t=0}^{3} β^t (p_t u_t - 0.05u_t²),

subject to the transition equations

x_{t+1} = x_t - u_t,  t = 0, 1, 2, 3,

and the boundary condition

x_4 = 0.

Again, I assume that the company applies the discount factor β = 0.9.

The problem is now solved with the help of the Bellman equations

(1)  V_t = max_{u_t} (p_t u_t - 0.05u_t²) + 0.9V_{t+1},  t = 0, 1, 2,

where V_t is known as the value function in dynamic programming terminology. The value function V appears on both the left and the right side of the Bellman equations, although with a different time subscript. The different time subscripts suggest that the dynamic programming problem is solved in a recursive manner, with the solution for V derived first for time t + 1 and then for time t. The lack of a summation sign in equation (1) underscores the fact that the optimization proceeds one period at a time. In line with this recursive approach, the discount factor that is applied to V_{t+1} has an exponent of one. V_{t+1} is discounted by just one period so it can be summed to the value that is provided by the objective function at time t. The argument max in equation (1) implies that the control variable u_t is set to maximize the right-hand side of the Bellman equation for time period t, given the optimal value function for period t + 1 (V_{t+1}).5 Note that both the optimal value function (V_{t+1}) and the optimal policy function for the control
variable at time t + 1 (u_{t+1}) are expressed in terms of the state variable x_{t+1}.6 Because the transition equations link x_{t+1} to x_t and u_t, the right-hand side of the Bellman equation can be maximized with respect to u_t only after all occurrences of x_{t+1} in V_{t+1} are replaced with the term (x_t - u_t). In short, there are more u_t terms in equation (1) than meet the eye.

In applying the Bellman equations of equation (1), I start with the boundary condition and the value function for t = 3 and then solve the problem recursively in the backward direction toward period 0.

To obtain the value function for t = 3, I first need to identify the optimal policy function for u for that period. Because x_4 = 0, the transition equation for period 3 is given as

x_4 = 0 = x_3 - u_3.

Thus, the transition equation for t = 3 conveniently provides the optimal decision rule for the control variable u in period 3,

(2)  u_3 = x_3.

Equation (2) simply says to extract and sell all the oil that remains at the beginning of t = 3 in that same period. Compared with the rule obtained from applying the Lagrange multiplier method in the previous section, equation (2) is much simpler because it is independent of any price variables or any other variables dated t = 3. It is an application of the principle of optimality that underlies dynamic programming: One can do no better than to follow the derived optimal policy or decision rule for the control variable, regardless of what happened to control and state variables in earlier periods.

Now that the optimal policy function for u_3 is known, the value function for t = 3 can be derived as7

(3)  V_3 = p_3 x_3 - 0.05x_3².

Note the absence of a term V_4 in equation (3). This derives from the fact that no value is added anymore in period 4 because the oil well is depleted. Equation (3) completes the optimization process for period t = 3. The processes for periods 2, 1, and 0 still need to be done.

To solve the optimization problem for t = 2, I substitute equation (3) into the Bellman equation for period 2,

V_2 = max_{u_2} (p_2 u_2 - 0.05u_2²) + 0.9V_3,

to obtain

(4)  V_2 = max_{u_2} (p_2 u_2 - 0.05u_2²) + 0.9(p_3 x_3 - 0.05x_3²).

The right-hand side of equation (4) can be maximized with respect to u_2 only after considering that x_3 is connected with u_2 via the transition equation

(5)  x_3 = x_2 - u_2.

Substituting equation (5) into equation (4) gives
(6)  V_2 = max_{u_2} (p_2 u_2 - 0.05u_2²) + 0.9[p_3(x_2 - u_2) - 0.05(x_2 - u_2)²].

Now, only the decision variable u_2 and variables that are known or exogenous at time t = 2 are in equation (6). The right-hand side of equation (6) can be maximized with respect to the control variable u_2,

d{(p_2 u_2 - 0.05u_2²) + 0.9[p_3(x_2 - u_2) - 0.05(x_2 - u_2)²]}/du_2 = 0

p_2 - 0.19u_2 - 0.9p_3 + 0.09x_2 = 0.

Solving the above first-order condition for u_2 gives the optimal policy or decision rule for period 2,

(7)  u_2 = 0.4737x_2 + 5.263p_2 - 4.737p_3.

Similar to the decision rule in equation (2), this policy rule for the control variable is simpler than the corresponding rule from the Lagrangian multiplier approach in the last section. There is also no need to revise this rule in the light of unexpected shocks to price or the state variable x prior to t = 2. This completes the calculations for t = 2.

To find the optimal policy rule for period 1, insert equation (7) into the value function for period 2 to get an expression similar to that in equation (3). After some simplifications, the value function for period 2 can be written as

V_2 = 0.474(p_2 + p_3)x_2 - 0.02368x_2² + 2.636p_2² - 4.734p_2 p_3 + 2.132p_3².

The value function for period 2 needs to be substituted into the Bellman equation for period 1,

V_1 = max_{u_1} (p_1 u_1 - 0.05u_1²) + 0.9V_2.

This gives

(8)  V_1 = max_{u_1} (p_1 u_1 - 0.05u_1²) + 0.9[0.474(p_2 + p_3)x_2 - 0.02368x_2² + 2.636p_2² - 4.734p_2 p_3 + 2.132p_3²].

Making use of the transition equation

(9)  x_2 = x_1 - u_1

in equation (8), one can maximize its right-hand side with respect to u_1,

d{(p_1 u_1 - 0.05u_1²) + 0.9[0.474(p_2 + p_3)(x_1 - u_1) - 0.02368(x_1 - u_1)² + 2.636p_2² - 4.734p_2 p_3 + 2.132p_3²]}/du_1 = 0.

Solving the resulting first-order condition for u_1 yields

u_1 = 0.2989x_1 + 7.013p_1 - 2.989p_2 - 2.989p_3.
This is the optimal policy or decision rule for period 1. It has some similarities with the equation that determines the optimal value of u_1 from the Lagrangian multiplier approach, but it has fewer terms.

The last step in the solution of the optimal control problem is to obtain the decision or policy rule for period 0. The procedure is the same as before; insert the optimal decision rule for period 1 into the Bellman equation for period 1 and simplify to obtain

V_1 = -0.01494x_1² + 0.2989(p_1 + p_2 + p_3)x_1 + 3.506p_1² - 2.989p_1(p_2 + p_3) + 3.006p_2² - 2.989p_2 p_3 + 2.556p_3².

Next, this value function is inserted into the Bellman equation for period 0, which is given as

(10)  V_0 = max_{u_0} (p_0 u_0 - 0.05u_0²) + 0.9V_1.

After substituting all occurrences of x_1 in this equation with (x_0 - u_0), the right-hand side of equation (10) is maximized with respect to u_0. This yields

u_0 = 0.2120x_0 + 7.880p_0 - 2.119p_1 - 2.120p_2 - 2.120p_3.

This decision rule for period 0 is the same as the one derived in the previous section using Lagrangian multiplier techniques.

These results can be calculated by Maple with the help of the following code:

restart: N:=3: Digits:=4:
V[N]:=unapply(p[N]*x-0.05*x^2,x);
# the objective function is expressed as a function
# of the state variable x
for t from N-1 by -1 to 0 do;
# a do loop is entered for a backward recursion
V[t]:=p[t]*u-0.05*u^2+0.9*V[t+1](x-u);
# the Bellman equation is specified; the last term
# says to replace all values of x in V[t+1], which
# are dated t+1, with (x-u), which is dated t
deriv[t]:=diff(V[t],u);
# the right side of the Bellman equation is differen-
# tiated with respect to the control variable
u_opt[t]:=evalf(solve(deriv[t],u));
# the first derivative is set to zero and solved for
# the control variable; the optimal policy function
# (u_opt) is evaluated numerically
V[t]:=unapply(evalf(simplify(subs(u=u_opt[t],
V[t]))),x);
# u_opt is substituted into the Bellman equation; the
# result is simplified, evaluated numerically, and
# expressed as a function of the state variable x
od;
# the do loop ends.
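The same backward recursion can be replicated outside Maple. The sketch below is my own Python translation, not part of the article: it is purely numerical rather than symbolic, carries the value function backward as a quadratic V_t(x) = a + bx + cx², reads the linear policy u_t(x) off the first-order condition, and then simulates the pumping path from x_0 = 1000 with the prices of the earlier example.

```python
beta = 0.9
p = [20.0, 22.0, 30.0, 25.0]

# terminal period: u3 = x (pump everything), so V3(x) = p3*x - 0.05*x**2
V = (0.0, p[3], -0.05)             # (a, b, c) for V(x) = a + b*x + c*x**2
policies = []                      # linear rules u_t(x) = g + h*x, built backward
for t in (2, 1, 0):
    a, b, c = V
    # FOC of p_t*u - 0.05*u**2 + beta*V(x - u) with respect to u:
    # p_t - 0.1*u - beta*b - 2*beta*c*(x - u) = 0
    denom = 0.1 - 2.0 * beta * c
    g = (p[t] - beta * b) / denom
    h = -2.0 * beta * c / denom
    policies.append((g, h))

    def rhs(x, g=g, h=h, a=a, b=b, c=c, pt=p[t]):
        u = g + h * x
        xn = x - u
        return pt * u - 0.05 * u**2 + beta * (a + b * xn + c * xn**2)

    # the maximized right-hand side is again quadratic in x: refit it
    v0, v1, v2 = rhs(0.0), rhs(1.0), rhs(2.0)
    cc = (v2 - 2.0 * v1 + v0) / 2.0
    V = (v0, v1 - v0 - cc, cc)

policies.reverse()                 # index 0 now holds the period-0 rule
x, u_path = 1000.0, []
for g, h in policies:
    u = g + h * x
    u_path.append(u)
    x -= u
u_path.append(x)                   # period 3: deplete the well (u3 = x3)

print([round(v, 1) for v in u_path])
```

At t = 2 the rule comes out as u_2 = 39.47 + 0.4737x_2, which is equation (7) evaluated at p_2 = 30, p_3 = 25; the simulated path matches the Lagrangian solution up to the article's three-digit rounding.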
It should be apparent that the recursive structure of the dynamic programming
problem makes it easy in principle to extend the optimization to a larger number
of periods.
A Nonnumerical Example
Much of the work in economics is about solving optimization problems without specific numerical entries. The next example applies the dynamic programming framework in this environment.
Consider the maximization of utility of a consumer who does not work but lives off a wealth endowment a. Assume that utility depends on the discounted logged value of consumption c from period zero to two,

max U = Σ_{t=0}^{2} β^t ln c_t,

where the discount factor is

β = 1/(1 + r),

and where r is the relevant interest rate. Consumption is the model's control variable, and the transition equations are given by

a_{t+1} = (1 + r)a_t - c_t,  t = 0, 1, 2.

The model's boundary condition is

a_3 = 0,

which says that wealth is supposed to be depleted at the end of period 2. There are no bequests. The task is to find the optimal consumption sequence.

The solution process starts in the final period and works backward in time. Applying the boundary condition to the transition equation for period t = 2 results in

0 = (1 + r)a_2 - c_2.

This can be solved for c_2, the control variable for t = 2, to get

(11)  c_2 = (1 + r)a_2.

This is the decision rule for period t = 2. To find the decision rules for periods t = 1 and t = 0, apply the Bellman equation

V_t = max_{c_t} ln c_t + βV_{t+1}.

The Bellman equation for period 1 requires the value function of period 2. The latter is equal to the above Bellman equation for t = 2, with c_2 replaced by its maximized value from equation (11) and with V_3 = 0 because wealth is exhausted at the end of t = 2,

V_2 = ln c_2 = ln[(1 + r)a_2].

This value function can now be inserted into the Bellman equation for period 1,

V_1 = max_{c_1} ln c_1 + βV_2,
to obtain

V_1 = max_{c_1} ln c_1 + β ln[(1 + r)a_2].

Before the right-hand side of the Bellman equation is maximized with respect to c_1, a_2 has to be replaced using the transition equation for t = 1,

a_2 = (1 + r)a_1 - c_1.

The Bellman equation for period 1 is now of the form

V_1 = max_{c_1} ln c_1 + β ln{(1 + r)[(1 + r)a_1 - c_1]},

and its right-hand side can be maximized with respect to the control variable c_1. The first-order condition and c_1 are given as

d{ln c_1 + β ln(1 + r) + β ln[(1 + r)a_1 - c_1]}/dc_1 = [a_1(1 + r) - c_1(1 + β)] / {c_1[a_1(1 + r) - c_1]} = 0,

(12)  c_1 = (1 + r)a_1 / (1 + β).

The decision rule for t = 0 still needs calculating. It requires the value function for t = 1. This value function is equal to the Bellman equation for period 1, with c_1 replaced by the optimal decision rule, equation (12),

V_1 = ln[a_1(1 + r)/(1 + β)] + β ln[a_1(1 + r)² β/(1 + β)].

Hence, the Bellman equation for t = 0 is given as

V_0 = max_{c_0} ln c_0 + βV_1 = max_{c_0} ln c_0 + β ln[a_1(1 + r)/(1 + β)] + β² ln[a_1(1 + r)² β/(1 + β)].

Substituting out a_1 with

a_1 = (1 + r)a_0 - c_0

gives

V_0 = max_{c_0} ln c_0 + β ln{(1 + r)[(1 + r)a_0 - c_0]/(1 + β)} + β² ln{β(1 + r)²[(1 + r)a_0 - c_0]/(1 + β)}.

Maximizing the right-hand side with respect to the control variable c_0 results in the following first-order condition and optimal value for c_0:

[a_0(1 + r) - c_0(1 + β + β²)] / {c_0[a_0(1 + r) - c_0]} = 0
(13)  c_0 = (1 + r)a_0 / (1 + β + β²).

A comparison of the consumption decision rules for t = 0, 1, 2 reveals the following pattern:

c_t = a_t(1 + r) / Σ_{i=0}^{2-t} β^i,  t = 0, 1, 2,

or, more generally, for t = 0, ..., n,

(14)  c_t = a_t(1 + r) / Σ_{i=0}^{n-t} β^i,  t = 0, ..., n.

The ability to recognize and concisely describe such a pattern plays a key role in applications of dynamic programming in economics, in particular those applications that deal with infinite horizon problems. I discuss them next.

The following Maple program solves the above consumption problem:

restart: N:=2: Digits:=4:
V[N]:=unapply(log((1+r)*a),a);
# this initial value function follows from the bound-
# ary condition
for t from N-1 by -1 to 0 do
V[t]:=log(c)+beta*V[t+1]((1+r)*a-c);
# since V is a function of a, the term in parenthe-
# ses after V[t+1] is substituted for a in V[t+1]
deriv[t]:=diff(V[t],c);
c_opt[t]:=solve(deriv[t],c);
V[t]:=unapply(simplify(subs(c=c_opt[t],V[t])),a);
od;

INFINITE HORIZON PROBLEMS

A problem often posed in economics is to find the optimal decision rule if the planning horizon does not consist of a fixed number of n periods but of an infinite number of periods. Examining the limiting case n → ∞ may also be of interest if n is not strictly infinite but merely large because taking the limit leads to a decision rule that is both simplified and the same for every period. Most often that is preferred when compared with a situation where one has to deal with a distinct and rather complex rule for each period from t = 0 to t = n. I illustrate this point with the consumption problem of the last section.

A Nonnumerical Problem

The optimal consumption decision rule given in equation (14) can be modified so it applies for an infinite time horizon. For taking the limit, the first part of
equation (14) is most useful,

lim_{n→∞} c_t = lim_{n→∞} a_t(1 + r) / Σ_{i=0}^{n-t} β^i.

Making use of the geometric series theorem,

Σ_{i=0}^{∞} β^i = 1/(1 - β)  for 0 < β < 1,

yields

lim_{n→∞} c_t = a_t(1 + r)(1 - β) = a_t r.

The decision rule for consumption in period t simplifies to the intuitively obvious: Consume in t only what you receive in interest from your endowment in t. The associated transition function for wealth is given as

a_{t+1} = (1 + r)a_t - c_t = (1 + r)a_t - a_t r = a_t,

that is, wealth will remain constant over time.

This example illustrates one way of arriving at optimal decision rules that apply to an infinite planning horizon: simply solve the problem for a finite number of periods, identify and write down the pattern for the evolution of the control variable, and take the limit n → ∞. It should be obvious that the same procedure can be applied to identify the value function for time period t. To illustrate this point, recall the sequence of value functions for t = 1, 2,

(15)  V_2 = ln[a_2(1 + r)],

(16)  V_1 = ln[a_1(1 + r)/(1 + β)] + β ln[a_1(1 + r)² β/(1 + β)].

The value function for t = 0 is given as

(17)  V_0 = ln[a_0(1 + r)/(1 + β + β²)] + β ln[a_0(1 + r)² β/(1 + β + β²)] + β² ln[a_0(1 + r)³ β²/(1 + β + β²)].

Proceeding now as in the case of the consumption rule, the following pattern emerges from equations (15) to (17):

(18)  V_t = Σ_{i=0}^{n-t} β^i ln[a_t(1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i].

It is possible to check the correctness of the pattern described by equation (18) by letting Maple calculate the implied sequence of the V_t. The Maple code to replicate equations (15) through (17) from equation (18) is given as

restart: n:=2:
seq(V[t]=sum(beta^i*log(a[t]*(1+r)^(i+1)*beta^i/
(sum(beta^(i),i=0..n-t))),i=0..n-t),t=0..n);
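Both the finite-horizon rule (14) and its n → ∞ limit can be sanity-checked numerically. The Python sketch below is my own addition; the values β = 0.95 and a_0 = 100 are illustrative assumptions, not values taken from the article's symbolic derivation.

```python
beta = 0.95
r = 1.0 / beta - 1.0              # since beta = 1/(1+r)
a0 = 100.0                        # hypothetical endowment (assumption)

# rule (14) with n = 2: c_t = a_t*(1+r) / sum(beta**i, i = 0..n-t)
a, c_path = a0, []
for t in range(3):
    share = (1.0 + r) / sum(beta**i for i in range(3 - t))
    c_path.append(share * a)
    a = (1.0 + r) * a - c_path[-1]
print(round(a, 12))               # boundary condition a_3 = 0: wealth exhausted

# as the horizon n grows, the period-0 consumption share falls toward r
shares = [(1.0 + r) / sum(beta**i for i in range(n + 1)) for n in (2, 10, 50, 200)]
print([round(s, 5) for s in shares], round(r, 5))
```

The shares decline monotonically toward the infinite-horizon rule c_t = r·a_t derived above.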
After verifying that the formula is correct, the limit n → ∞ is taken,

lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n-t} β^i ln[a_t(1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i].

Because interest centers on finding an expression involving the variable a_t, the idea is to isolate a_t. This can be done as follows:

lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n-t} β^i [ln a_t + ln((1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i)]

or

lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n-t} β^i ln a_t + lim_{n→∞} Σ_{i=0}^{n-t} β^i ln((1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i).

The last expression can be rewritten as

(19)  lim_{n→∞} V_t = lim_{n→∞} (Σ_{i=0}^{n-t} β^i) ln a_t + η,

where η is an expression independent of a. Taking the limit leads to

(20)  lim_{n→∞} V_t = [1/(1 - β)] ln a_t + η = (1 + 1/r) ln a_t + η.

With the help of a computer algebra program, students could circumvent equation (18) and the following algebra altogether by going from equation (17) immediately to equation (19) for n = 2 and t = 0. The following sequence of Maple commands takes equation (17), expands it, and isolates the terms in a_0:

restart:
V[0]:=ln(a[0]*(1+r)/(1+beta+beta^2))+
beta*ln(a[0]*(1+r)^2*beta/(1+beta+beta^2))+
beta^2*ln(a[0]*(1+r)^3*beta^2/(1+beta+beta^2));
# the above is equation 17
assume(beta>0,r>0);
assume(a[0],real);
# the assumptions exclude economically senseless
# cases that prevent the program from finding a
# solution; variables with assumptions attached
# are indicated with a ~ in the Maple output
V[0]:=collect(expand(V[0]),log(a[0]));
# the collect command factors the specified term,
# i.e. log(a[0])
The output of these commands is

(21)  V_0 = (1 + β + β²) ln a_0 + ζ,

where ζ consists of a number of terms in β and r.

A Numerically Specified Problem

Infinite horizon problems that are numerically specified can be solved by iterating on the value function. I demonstrate this for the above consumption optimization problem. The task is to find the optimal decision rule for consumption regardless of time t. I employ the Bellman equation without time subscripts,

V(a) = max_c ln c + βV(a'),

where a and a' denote current and future value of the state variable, respectively. The time subscripts are eliminated in the Bellman equation to reflect the idea that the value function V(a) needs to satisfy the Bellman equation for any a. The iteration method starts with an initial guess of the solution V(a), which is identified as V^(0)(a). Guessing a possible solution is not as difficult as it sounds; each class of problems typically has associated with it a general form of the solution. For example, in the current case, it is known from solving the consumption problem for t = 2 that the value function will somehow depend on the log of a,

V^(0)(a) = A + B ln a.

To simplify the algebra, assume that β = 0.95.

A numerically specified initial guess. I convert the initial guess of the value function into numerical form by assuming A = 0 and B = 1,

V^(0)(a) = ln a.

The initial guess V^(0)(a) is substituted on the right-hand side of the Bellman equation,

V(a) = max_c ln c + 0.95 ln a'.

Next, making use of the transition equation,

a' = (1 + 0.05263)a - c,

to substitute out a',

V(a) = max_c ln c + 0.95 ln(1.05263a - c).

The right-hand side of V(a) is maximized with respect to the control variable c,

dV(a)/dc = 1/c - 0.95/(1.05263a - c) = 0,
which gives c = 0.53981a. Substituting the solution for c back into V(a) yields

V^(1)(a) = ln(0.53981a) + 0.95 ln(1.05263a - 0.53981a)

and, after simplifying,

V^(1)(a) = -1.251 + 1.95 ln a.

Because

V^(1)(a) ≠ V^(0)(a),

the value function requires further iterations. For that purpose, I substitute the new guess, V^(1)(a), into the Bellman equation and get

V(a) = max_c ln c + 0.95(-1.251 + 1.95 ln a').

Substitution of the transition equation for a' gives

V(a) = max_c ln c + 0.95[-1.251 + 1.95 ln(1.05263a - c)],

which simplifies to

V(a) = max_c ln c - 1.1884 + 1.8525 ln(1.05263a - c).

Maximizing the right-hand side of V(a) with respect to c yields c = 0.36902a. Substituting the optimal policy rule for c back into V(a) gives

V^(2)(a) = ln(0.36902a) - 1.1884 + 1.8525 ln(1.05263a - 0.36902a),

which simplifies to

V^(2)(a) = -2.8899 + 2.8525 ln a.

Because

V^(2)(a) ≠ V^(1)(a),

another iteration is needed. Instead of continuing the above process of iterating on the value function by hand, I make use of the following Maple program to iterate until convergence is achieved:

restart: Digits:=8: beta:=0.95: V[0]:=unapply(ln(a),a):
for n from 0 to 200 do
V[n]:=unapply(ln(c)+beta*V[n]((1+((1/beta)-1))*a-
c),(a,c));
# V is a mathematical function of both a and c; in
# the text V is expressed as a function of only the
# state variable a
deriv:=diff(V[n](a,c),c);
c_opt:=expand(evalf(solve(deriv,c)));
# the command expand often simplifies the calcula-
# tions that follow especially when used in combi-
# nation with simplify or evalf
V[n+1]:=unapply(evalf(expand(simplify(subs(c=c_opt,V
[n](a,c))))),a);
# an example for an interlocked command structure:
# the substitute command is executed first, followed
# by simplify, expand, evalf, and unapply
od;
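The same iteration can be run without Maple by tracking only the two coefficients of the guess V(a) = A + B ln a, since the Bellman operator maps that family into itself. The Python sketch below is my own rearrangement of the first-order condition into a closed-form update, mirroring the article's β = 0.95 setup:

```python
import math

beta = 0.95
grow = 1.0 / beta                  # 1 + r = 1/beta = 1.05263...

A, B = 0.0, 1.0                    # initial guess V(a) = ln a
for _ in range(400):
    s = grow / (1.0 + beta * B)    # optimal consumption share c/a from the FOC
    # updated guess: V(a) = ln(s*a) + beta*(A + B*ln((grow - s)*a))
    A, B = math.log(s) + beta * (A + B * math.log(grow - s)), 1.0 + beta * B

c_share = grow / (1.0 + beta * B)
print(round(A, 3), round(B, 4), round(c_share, 5))
```

The first pass reproduces the hand iteration above (c = 0.53981a and V^(1)(a) = -1.251 + 1.95 ln a), and the iteration settles near V(a) = -58.89 + 20 ln a with c = 0.05263a.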
After about 200 iterations with eight digits of accuracy, the value function reaches

V(a) = -58.886 + 19.9994 ln a

and the corresponding policy function for the control variable

c = 0.05263a.

These results are effectively identical to the ones that can be obtained from equation (20) and the corresponding decision rule for c.8 Specifically, equation (20) predicts the value function to be

V(a) = 20 ln a + η,

and the corresponding decision rule for c is identical to the one above.

A numerically unspecified initial guess. I start with a numerically unspecified guess of the value function,

(22)  V^(0)(a) = A + B ln a,

where A and B are constants to be determined. In this case, the value function will not be iterated on. Instead, the idea is to algebraically derive values for A and B that equalize V(a) on the left and V(a) on the right side of the Bellman equation. The solution process begins with the Bellman equation

V(a) = max_c ln c + 0.95V^(0)(a')

and the substitution of the initial guess of the value function,

V(a) = max_c ln c + 0.95(A + B ln a').

Next, the transition equation is used to substitute out the state variable a',

(23)  V(a) = max_c ln c + 0.95[A + B ln(1.05263a - c)].

The right-hand side of equation (23) is now maximized with respect to the control variable c,

d{ln c + 0.95[A + B ln(1.05263a - c)]}/dc = 0,

which results in

c = 21.0526a / (20 + 19B).
The optimal c is substituted back into equation (23), which, upon expansion,
results in

V(a) = 3.047 + ln a − ln(20 + 19B) + 0.95A + 2.846B + 0.95B ln a + 0.95B ln B − 0.95B ln(20 + 19B).     (24)

If the initial guess of the value function, equation (22), had only one unknown
parameter, say A, I could substitute this initial guess for V(a) on the left side
of equation (24) and then solve for the unknown value of A. With two unknown
parameters in the value function, A and B, this method is not useful. An alternative
solution procedure rests on a comparison of coefficients. In particular,
the coefficient of ln a implied by equation (24) is set equal to B, which is the
coefficient of ln a in equation (22). The resulting equation is solved for B.
Once B is known, the right-hand side of equation (24) is set equal to the
right-hand side of equation (22). The resulting equation determines A.

To proceed with the coefficient comparison, I factor the terms in ln a in
equation (24),

V(a) = 3.047 + (1 + 0.95B) ln a − ln(20 + 19B) + 0.95A + 2.846B + 0.95B ln B − 0.95B ln(20 + 19B),     (25)

set the coefficient of ln a in equation (25) equal to B,

1 + 0.95B = B,

and solve for B. This results in B = 20. To derive parameter A, substitute B = 20
into equations (22) and (25), set the two equal,

−2.9444 + 0.95A + 20 ln a = A + 20 ln a,

and solve for A. The solution is A = −58.888. With the parameters A and B
known, the policy function for c is given as

c = 0.05263a,

and the value function is

V(a) = −58.888 + 20.0 ln a.

Both functions match the ones obtained in earlier sections with other solution
methods.

The Maple commands to replicate the above solution method are given as

restart: Digits:=8:
assume(a>0): beta:=0.95:
V[0]:=unapply(A+B*log(a),a);
# this is the initial guess of the value function
V:=unapply(ln(c)+beta*V[0]((1+((1/beta)-1))*a-c),(a,c));
180 JOURNAL OF ECONOMIC EDUCATION
# the initial guess is substituted into the Bellman
# equation; V has to be expressed as a mathematical
# function of both a and c; in the text, V is ex-
# pressed as a function of only the state variable a
deriv:=diff(V(a,c),c);
# take the first derivative of the right-hand side of
# V
c_opt:=expand(evalf(solve(deriv,c)));
# solve for the optimal c
V:=expand(simplify(subs(c=c_opt,V(a,c))));
V:=collect(V,ln(a));
# substitute the optimal c into the Bellman equation
# and factor (ln a)
B:=solve(diff(V,a)*a=B,B);
# isolate the coefficient of (ln a), set it equal to
# B, and solve for B; to isolate the coefficient of
# (ln a) in V[1], apply the chain rule: dV[1]/d(ln a)
# = (dV[1]/da)*(da/d(ln a))
V0:=A + B*log(a):
A:=solve(V = V0,A);
# solve for the coefficient A
`c `=c_opt; `V(a)`=V0;
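For readers without Maple, the guess-and-verify result can also be checked numerically. The following plain-Python sketch (the function names are mine, not from the article) confirms that A = −58.888, B = 20 make the trial value function a fixed point of the Bellman equation, and that the implied consumption rate is 0.05263:

```python
import math

# Coefficients derived in the text: V(a) = A + B*ln(a) and c = 0.05263a.
beta = 0.95
A, B = -58.888, 20.0

def V(a):
    return A + B * math.log(a)

def bellman_rhs(a):
    # The first-order condition gives c = (a/beta)/(1 + beta*B);
    # with B = 20 this is a/19 = 0.05263a.
    c = (a / beta) / (1 + beta * B)
    return math.log(c) + beta * V(a / beta - c)

# At the fixed point, the Bellman equation must reproduce V(a).
for a in (1.0, 5.0, 50.0):
    assert abs(bellman_rhs(a) - V(a)) < 1e-3   # equal up to rounding in A

print(round((1 / beta) / (1 + beta * B), 5))   # 0.05263, the consumption rate
```

The small tolerance absorbs the rounding of A to three decimals; with the exact value A = −20 ln 19 the two sides agree to machine precision.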
A STOCHASTIC INFINITE HORIZON EXAMPLE
Dynamic programming problems can be made stochastic by letting one or
more state variables be influenced by a stochastic disturbance term. Most of the
applications of dynamic programming in macroeconomics follow this path. I pro-
vide one example.
The basic real business cycle (RBC) model (discussed in King 2002) is very
much related to that of Kydland and Prescott (1982) and Long and Plosser (1983).
Many RBC models are variations of this basic model.
The problem is to maximize the expected discounted sum of present and future
utility, where utility is assumed to be a logarithmic function of consumption (C)
and leisure (1 − n),

max E_0 Σ_{t=0}^∞ β^t [ln C_t + δ ln(1 − n_t)],

where n represents labor input and available time is set at unity. Maximization is
subject to the market clearing constraint that output y equals the sum of the two
demand components, consumption and investment,

y_t = C_t + k_t,

where investment equals the capital stock k because capital is assumed to depreciate
fully in one period. There are two transition equations, one for each of the
two state variables, output and the state of technology (A),

y_{t+1} = A_{t+1} k_t^α n_t^(1−α),     (26)

A_{t+1} = A_t^ρ e^(ε_{t+1}).     (27)

The transition equation of y is a dynamic Cobb-Douglas production function, and
the transition equation of A follows an autoregressive process subject to a disturbance
term ε, which is assumed to be normally distributed. C, n, and k are the
control variables. To make the problem easier to follow, I use numbers for the
parameters, β = 0.9, α = 0.3, δ = 1.0, and ρ = 0.8.
The Bellman equation for this problem is given as

V(y,A) = max_{C,k,n} { ln C_t + ln(1 − n_t) + 0.9 E V^(0)(y,A) },

where E is the expectations operator. The Bellman equation can be simplified
by substituting for C and, hence, reducing the problem to two control variables,

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9 E V^(0)(y,A) }.     (28)

The next step in the solution process is to find an initial value function. In the earlier
consumption problem, which was similar in terms of the objective function
but had only one state variable a, the value function was of the type

V(a) = A + B ln a.

Because there are two state variables in the present problem, I try the analogous
solution

V^(0)(y,A) = Z + G ln y + H ln A.     (29)

From here on, the problem follows the method used for the previous section.
Substituting the trial solution from equation (29) into equation (28) gives

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9 E(Z + G ln y + H ln A) }.     (30)

Before the right-hand side of equation (30) can be maximized, y and A that appear in
the expectations term are substituted out by their transition equations (26) and (27),

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9 E[Z + G ln(A^0.8 e^ε k^0.3 n^0.7) + H ln(A^0.8 e^ε)] }.

The above equation is expanded to

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9Z + 0.27G ln k + 0.63G ln n + 0.9G Eε + 0.72(G + H) ln A + 0.9H Eε },     (31)

and then maximized with respect to both k and n. This results in two first-order
conditions,

−1/(y − k) + 0.27G/k = 0, and −1/(1 − n) + 0.63G/n = 0,
which can be solved for the two control variables to yield

k = 0.27Gy/(1 + 0.27G), n = 0.63G/(1 + 0.63G).

These solutions are substituted back into equation (31), and the expectations
operator is employed to remove the two disturbance terms. These operations
yield

V(y,A) = ln(y − 0.27Gy/(1 + 0.27G)) + ln(1 − 0.63G/(1 + 0.63G)) + 0.9Z + 0.27G ln[0.27Gy/(1 + 0.27G)] + 0.63G ln[0.63G/(1 + 0.63G)] + 0.72(G + H) ln A.     (32)

As in the consumption example of the last section, the implied coefficients of
ln y and ln A and the remainder term of equation (32) have to be compared with G,
H, and Z of equation (29). To make such a coefficient comparison feasible, factor
the ln y and ln A terms on the right-hand side of equation (32) to obtain

V(y,A) = θ + (1 + 0.27G) ln y + 0.72(G + H) ln A,     (33)

where

θ = 9.21 + 0.9G ln G − 0.27G ln(100 + 27G) − ln(100 + 27G) − ln(100 + 63G) + 0.9Z − 0.63G ln(100 + 63G) + 3.5G.     (34)

Note that equation (33) is now in the same format as equation (29) so that the
coefficients of these two equations can be compared. Setting the coefficient of
ln y in equation (33) equal to G yields the determining equation

1 + 0.27G = G,

which results in G = 1.37. Employing this value of G, set the coefficient of ln A
in equation (33) equal to H,

0.72(1.37 + H) = H,

and obtain H = 3.52. Next, substitute the solutions for G and H into equation
(34), set the resulting expression for θ equal to Z, and solve for Z. This results in
Z = −20.853. With the values of Z, G, and H known, the value function that
solves the problem is given as

V(y,A) = −20.853 + 1.37 ln y + 3.52 ln A,

and the policy functions or decision rules for k, n, and C are

k = 0.27y, n = 0.463, C = 0.73y.

Note that the addition of a stochastic term to variable A has had no effect on the
optimal decision rules for k, n, and C. This result is not an accident but, as discussed
at the beginning of this article, the key reason why dynamic programming
is preferred over a traditional Lagrangian multiplier method in dynamic
optimization problems: unpredictable shocks to the state variables do not change
the optimal decision rules for the control variables.9
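The coefficient-comparison equations for G, H, and Z are simple enough to verify numerically. The following Python sketch (all variable names are mine, not from the article) solves them and recovers the decision rules:

```python
import math

# Determining equations from the coefficient comparison:
#   1 + 0.27G = G,  0.72(G + H) = H,  theta = Z.
G = 1 / 0.73                       # solves 1 + 0.27G = G
H = 0.72 * G / 0.28                # solves 0.72(G + H) = H

# Decision rules implied by G:
kc = 0.27 * G / (1 + 0.27 * G)     # k = kc*y
n = 0.63 * G / (1 + 0.63 * G)      # constant labor supply

# Constant term of the value function: setting theta = Z gives
# Z = const/(1 - 0.9), with const collected from equation (32).
const = (math.log(1 - kc) + math.log(1 - n)
         + 0.27 * G * math.log(kc) + 0.63 * G * math.log(n))
Z = const / (1 - 0.9)

print(round(G, 2), round(H, 2), round(Z, 3))   # 1.37 3.52 -20.853
print(round(kc, 2), round(n, 3))               # 0.27 0.463
```

With exact arithmetic the capital rule simplifies neatly: 1 + 0.27G = G implies k = 0.27Gy/G = 0.27y, which is why investment is a constant 27 percent of output.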
The above solution process can be automated and processed with other parameters
if one uses the following Maple commands:
restart:
C:=y-k; yy:=AA*k^alpha*n^(1-alpha); AA:=A^rho*exp(epsilon);
# basic definitions
V0:=ln(C)+delta*ln(1-n)+beta*(Z+G*ln(yy)+H*ln(AA));
# the trial solution for the value function is
# inserted into the Bellman equation
delta:=1.; beta:=0.9; rho:=0.8; alpha:=0.3;
assume(y>0,A>0,k>0,n>0):
# exclude economically senseless variable ranges
V:=expand(V0); k_opt:=diff(V,k)=0; n_opt:=diff(V,n)=0;
solve({k_opt,n_opt},{n,k});
# the right-hand side of the Bellman equation is maxi-
# mized with respect to the control variables n and
# k;
assign(%): `V `=V;
# k and n are set equal to their optimal values
epsilon:=0: `V `=V;
# in applying the expectations operator all epsilon
# terms drop out
simplify(expand(V)): collect(%,ln(y)): V:=collect(%,ln(A));
# the value function is factored with respect to (ln
# y) and (ln A)
G:=solve(diff(V,y)*y=G,G);
H:=solve(diff(V,A)*A=H,H);
Z:=solve(V-G*ln(y)-H*ln(A)=Z);
# the values for G, H, and Z are derived
V:=Z+G*ln(y)+H*ln(A); `k `=k; `n `=n; `C `=C;
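The role of the stochastic term can be illustrated by simulating the solved model under the derived decision rules. The Python sketch below is mine, not the article's; in particular, the shock standard deviation of 0.05 is an assumed value, because the derivation above does not depend on it:

```python
import math
import random

random.seed(0)
rho, alpha = 0.8, 0.3      # model parameters from the text
sigma = 0.05               # assumed std dev of the shock epsilon
n = 0.463                  # labor is constant under the decision rule

A, y = 1.0, 1.0
log_y = []
for t in range(200):
    k = 0.27 * y                                         # decision rule k = 0.27y
    A = A ** rho * math.exp(random.gauss(0.0, sigma))    # transition (27)
    y = A * k ** alpha * n ** (1 - alpha)                # transition (26)
    log_y.append(math.log(y))

# Different shock sequences produce different paths, but the decision
# rules applied at every date are identical on every path.
mean = sum(log_y) / len(log_y)
sd = math.sqrt(sum((x - mean) ** 2 for x in log_y) / len(log_y))
print(round(mean, 3), round(sd, 3))
```

Moments such as the standard deviation computed here are the statistics that RBC researchers compare with those of observed data.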
CONCLUSION
I introduced dynamic programming techniques by way of example, with the
help of the computer algebra system Maple. The emphasis of this introduction
was to build confidence and intuition. I solved the examples one step at a time,
with sufficient repetition of key steps to enhance the learning process. I limited
the coverage of solution techniques to avoid confusion at the initial stages of
learning dynamic programming. To integrate the material better, I used the same
examples to introduce different techniques. One example was on the optimal
extraction of a natural resource, another on consumer utility maximization, and
the final example covered a simple real business cycle model, which could be
considered an extension of the utility maximization example.
Every example was accompanied by a Maple computer code to make it possible
for readers not only to replicate a given example but also to modify it in various
ways and, ultimately, to encourage application of the techniques to other
areas in economics. As the next step in the process of learning dynamic programming
techniques, I recommend that students study and solve the economic
examples discussed in Adda and Cooper (2003). Some of them will seem immediately
familiar because of the coverage in this article. Others will open up new
avenues of applying dynamic programming techniques.
NOTES
1. I found the brief overview of discrete time dynamic programming in Parlar (2000) helpful because of its focus on coding in Maple.
2. In trying to modify the examples, the reader may want to start with very simple changes to minimize program syntax or rounding errors and to avoid situations where no solution is found at all. The latter can occur easily, even for seemingly minor modifications. In fact, it is for precisely this reason that a large literature has developed in recent years on more powerful numerical solution methods. Judd (1998) provided an excellent starting point. Miranda and Fackler (2002) also discussed numerical techniques to solve dynamic programming problems. They focused on the software package MATLAB.
3. Kamien and Schwartz (1981) contained one of the most thorough treatments of optimal control. Most major textbooks in mathematical methods for economists also contain chapters on optimal control.
4. Comments are identified with a pound sign (#). They are not executed by Maple. Comments refer to the line(s) of Maple code right above.
5. As seen later, knowledge of the optimal value function for time period t + 1 implies also knowing the optimal policy function for the control variable at time t + 1.
6. As suggested at the end of the previous section, this may be regarded as the key innovation of the dynamic programming approach over the classical Lagrangian multiplier method.
7. The value function is equal to the Bellman equation in equation (1), except that all occurrences of u3 are replaced by the optimal policy function for u3, and the max argument is removed.
8. If the calculations are done with fewer digits of accuracy, there is more of a discrepancy in results regardless of the number of iterations.
9. The stochastic nature of state variables affects how the state variables move through time. Different sequences of shocks to the state variables produce different time paths, even with the same decision rules for the control variables. From a large number of such stochastic time paths, RBC researchers calculate standard deviations or other moments of the state and control variables and compare them with the moments of actually observed variables to check their models. Compare Adda and Cooper (2003) on this.
REFERENCES
Adda, J., and R. Cooper. 2003. Dynamic economics: Quantitative methods and applications.Cambridge, MA: MIT Press.
Bellman, R. 1957.Dynamic programming. Princeton, NJ: Princeton University Press.Judd, K. L. 1998.Numerical methods in economics. Cambridge, MA: MIT Press.Kamien, M. I., and N. L. Schwartz. 1981.Dynamic optimization: The calculus of variations and opti-
mal control in economics and management. New York: Elsevier North-Holland.King, I. P. 2002. A simple introduction to dynamic programming in macroeconomic models.
http://www.business.auckland.ac.nz/Departments/econ/workingpapers/full/Text230.pdf (last
accessed August 21, 2005).Kydland, F. E., and E. C. Prescott. 1982. Time to build and aggregate fluctuations.Econometrica 50
(6): 134571.Ljungqvist, L., and T. J. Sargent. 2000. Recursive macroeconomic theory. Cambridge, MA: MIT
Press.
Long, J. B., and C. I. Plosser. 1983. Real business cycles. Journal of Political Economy 91 (1): 39–69.
Miranda, M. J., and P. L. Fackler. 2002. Applied computational economics and finance. Cambridge, MA: MIT Press.
Parlar, M. 2000. Interactive operations research with Maple: Methods and models. Boston: Birkhäuser.
Sargent, T. J. 1987. Dynamic macroeconomic theory. Cambridge, MA: Harvard University Press.
Stokey, N., and R. E. Lucas, with E. C. Prescott. 1989. Recursive methods in economic dynamics. Cambridge, MA: Harvard University Press.