Publisher: Routledge
The Journal of Economic Education
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t914957642

Dynamic Programming: An Introduction by Example
Joachim Zietz, Middle Tennessee State University
Online publication date: 07 August 2010

To cite this article: Zietz, Joachim (2007) 'Dynamic Programming: An Introduction by Example', The Journal of Economic Education, 38: 2, 165-186.
To link to this article: DOI: 10.3200/JECE.38.2.165-186; URL: http://dx.doi.org/10.3200/JECE.38.2.165-186
Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf
Dynamic Programming: An Introduction by Example
Joachim Zietz
Abstract: The author introduces some basic dynamic programming techniques, using examples, with the help of the computer algebra system Maple. The emphasis is on building confidence and intuition for the solution of dynamic problems in economics. To integrate the material better, the same examples are used to introduce different techniques. One covers the optimal extraction of a natural resource, another uses consumer utility maximization, and the final example solves a simple real business cycle model. Every example is accompanied by Maple computer code to allow for replication.

Keywords: dynamic programming, learning by example, Maple

JEL codes: C610, A230
Dynamic programming should emphasize intuition over mathematical rigor when it is introduced. Readers interested in more formal mathematical treatments are urged to read Sargent (1987), Stokey and Lucas (1989), or Ljungqvist and Sargent (2000). I build the discussion in this article around a set of examples that covers a number of key problems from economics. In contrast to other brief introductions to dynamic programming, such as King (2002), I integrate the use of a computer algebra system (Maple). Program code is provided for every example to encourage replication and experimentation.1 On the basis of classroom experience, I found that this allows readers to understand more easily the structure of dynamic programming and to move faster to their own applications.2 The problems in this article should be accessible to undergraduates who are familiar with the Lagrangian multiplier method for solving constrained optimization problems.
Dynamic programming has strong similarities to optimal control, a competing approach to dynamic optimization. Dynamic programming has its roots in the work of Bellman (1957), and optimal control techniques rest on the work of the Russian mathematician Pontryagin and his coworkers in the late 1950s.3 Although both dynamic programming and optimal control can be applied to discrete time and continuous time problems, most current applications in economics appear to favor dynamic programming for discrete time problems and optimal control for continuous time problems. To keep the discussion reasonably simple,
Joachim Zietz is a professor of economics at Middle Tennessee State University (e-mail: [email protected]). The author thanks two anonymous referees for helpful comments. Maple input code and output for all example programs are available from the author on request. Copyright 2007 Heldref Publications
I only deal with discrete time dynamic programming problems, which have
become popular in macroeconomics. However, I do not limit the discussion to
macroeconomics but try to convey the idea that dynamic programming has applications in other settings as well.
I provide some motivation for the use of dynamic programming techniques and
then discuss finite horizon problems. I treat numerical and nonnumerical problems. I next cover infinite horizon problems, again using both numerical and nonnumerical examples. Finally, I solve a simple stochastic infinite horizon problem
of the real business cycle variety.
A MOTIVATING EXAMPLE
The basic principle of dynamic programming is best illustrated with an example. Consider the problem of an oil company that wants to maximize profits from an oil well. Revenue at time t is given as

R_t = p_t u_t,

where p is the price of oil, and u is the amount of oil that is extracted and sold. In dynamic programming applications, u is typically called a control variable. The company's cost function is quadratic in the amount of oil that is extracted,

C_t = 0.05u_t².

The amount of oil remaining in the well follows the recursion or transition equation

x_{t+1} = x_t - u_t,

where x is known as a state variable in dynamic programming language. Because oil is a nonrenewable resource, pumping out oil in the amount of u at time t means that exactly that much less oil is left in the oil well at time t + 1. I assume that the company applies the discount factor β = 0.9 for profit streams that occur in the future. I also assume that the company intends to have the oil well depleted in four years, which means that x_4 = 0. This is known as a boundary condition in dynamic programming. Given these assumptions, the central question to be solved is how much oil should be pumped at time t, t = 0, ..., 3, to maximize the discounted profit stream.

If one had never heard of dynamic programming or optimal control, the natural way to solve this problem would be to set up a Lagrangian multiplier problem along the following lines:

max L(x, u, λ) = Σ_{t=0}^{3} β^t (R_t - C_t) + Σ_{t=1}^{4} λ_t (x_t - x_{t-1} + u_{t-1}),

where x = (x_1, x_2, x_3), with x_0 given exogenously, and x_4 = 0, by design, where u = (u_0, u_1, u_2, u_3), λ = (λ_1, λ_2, λ_3, λ_4), and where the constraints specify that the amount of oil left at period t is equal to the amount in the previous period minus the amount pumped in the previous period. Using standard techniques, the above
Lagrangian problem can be solved for the decision variables x and u and the Lagrangian multipliers λ in terms of the exogenous variables (p_0, p_1, p_2, p_3, x_0).

The Maple commands to solve this problem are given as follows:4

restart: with(linalg): Digits:=3:
# memory is cleared, the linear algebra package
# called and the number of digits for floating
# point calculations is set; a colon at the end
# prevents all output
f:=sum(0.9^t*(p[t]*u[t]-0.05*u[t]^2),t=0..3);
# the objective function is defined
x[0]:=x0: x[4]:=0:
g:=sum(lambda[t]*(x[t]-x[t-1]+u[t-1]),t=1..4);
# the constraints are defined
L:=f+g;
# the Lagrangian function is defined
L_grad:=grad(L,[seq(x[t],t=1..3),seq(u[t],t=0..3),
seq(lambda[t],t=1..4)]);
# the gradient of the Lagrangian is derived;
# the seq command greatly simplifies the input lines
solve({seq(L_grad[i],i=1..11)},{seq(x[t],t=1..3),
seq(u[t],t=0..3),seq(lambda[t],t=1..4)});
# the gradient is solved for all x, u, and lambda
assign(%):
# all x, u, and lambda are assigned their solution
# values; this is required to do further calculations
# with the solution values
For example, the optimal amounts of oil to pump in periods zero to three are given as

u_0 = 0.212x_0 + 7.88p_0 - 2.12p_1 - 2.12p_2 - 2.12p_3
u_1 = 0.235x_0 - 2.35p_0 + 7.65p_1 - 2.35p_2 - 2.36p_3
u_2 = 0.262x_0 - 2.62p_0 - 2.62p_1 + 7.38p_2 - 2.62p_3
u_3 = 0.291x_0 - 2.91p_0 - 2.91p_1 - 2.91p_2 + 7.10p_3.

If one now specifies a particular sequence of prices and the value of x_0, then a complete numerical solution results. For example, assuming (p_0 = 20, p_1 = 22, p_2 = 30, p_3 = 25, x_0 = 1000), one can append the following Maple commands:

x0:=1000: p[0]:=20: p[1]:=22: p[2]:=30: p[3]:=25:
evalf([seq(x[t],t=0..4)]); evalf([seq(u[t],t=0..3)]);
evalf([seq(lambda[t],t=1..4)]);

to generate the solution for [x_0; x_1, x_2, x_3; x_4], [u], [λ] as [1000; 794, 566, 260; 0], [206, 227, 308, 260], [0.60, 0.60, 0.60, 0.60].
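These numbers are easy to cross-check without Maple. The sketch below is an addition of this edition in Python rather than the article's Maple: the first-order conditions β^t(p_t - 0.1u_t) + λ = 0 (one common multiplier, since all λ_t are equal) together with the depletion constraint u_0 + u_1 + u_2 + u_3 = x_0 can be solved by hand for λ and then for each u_t.

```python
beta, x0 = 0.9, 1000.0
p = [20.0, 22.0, 30.0, 25.0]

# FOC for each u_t: beta**t * (p[t] - 0.1*u[t]) + lam = 0, so
# u[t] = 10*(p[t] + lam/beta**t); the budget sum(u) = x0 pins down lam
lam = (x0 - 10.0 * sum(p)) / sum(10.0 / beta**t for t in range(4))
u = [10.0 * (p[t] + lam / beta**t) for t in range(4)]

print([round(v, 1) for v in u])  # close to the article's [206, 227, 308, 260]
```

The article's Maple run rounds to three digits (Digits:=3), so its printed path differs from this full-precision one in the last digit.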
Given the relative simplicity with which this problem can be solved with standard techniques, students may wonder where there is room for a different approach. The clue to this puzzle lies in the simple fact that one cannot at time 0 pretend to know all future prices or actual amounts of oil that are pumped. For example, it may be that oil rigs break down or workers are on strike so that the optimal amount of oil simply cannot be pumped in a given year. Yet it should be clear that a change in any of the future p or x means that the above sequence of x and u is no longer optimal. The whole problem needs to be recalculated subject to the new p and the known values of x. To avoid recalculating the problem whenever something unexpected happens to either p or x, economists try to find a decision rule or policy function for the control variable u that can be followed regardless of random shocks to price or state variable x. That is the basic idea of dynamic programming.
FINITE HORIZON PROBLEMS
The Motivating Example Again
I now recalculate the previous problem using dynamic programming techniques. The objective is, as before, to maximize

Σ_{t=0}^{3} β^t (p_t u_t - 0.05u_t²),

subject to the transition equations

x_{t+1} = x_t - u_t,  t = 0, 1, 2, 3,

and the boundary condition

x_4 = 0.

Again, I assume that the company applies the discount factor β = 0.9.

The problem is now solved with the help of the Bellman equations

(1)  V_t = max_{u_t} (p_t u_t - 0.05u_t²) + 0.9V_{t+1},  t = 0, 1, 2,

where V_t is known as the value function in dynamic programming terminology. The value function V appears on both the left and the right side of the Bellman equations, although with a different time subscript. The different time subscripts suggest that the dynamic programming problem is solved in a recursive manner, with the solution for V derived first for time t + 1 and then for time t. The lack of a summation sign in equation (1) underscores the fact that the optimization proceeds one period at a time. In line with this recursive approach, the discount factor that is applied to V_{t+1} has an exponent of one. V_{t+1} is discounted by just one period so it can be summed to the value that is provided by the objective function at time t. The argument max in equation (1) implies that the control variable u_t is set to maximize the right-hand side of the Bellman equation for time period t, given the optimal value function for period t + 1 (V_{t+1}).5 Note that both the optimal value function (V_{t+1}) and the optimal policy function for the control
variable at time t + 1 (u_{t+1}) are expressed in terms of the state variable x_{t+1}.6 Because the transition equations link x_{t+1} to x_t and u_t, the right-hand side of the Bellman equation can be maximized with respect to u_t only after all occurrences of x_{t+1} in V_{t+1} are replaced with the term (x_t - u_t). In short, there are more u_t terms in equation (1) than meet the eye.

In applying the Bellman equations of equation (1), I start with the boundary condition and the value function for t = 3 and then solve the problem recursively in the backward direction toward period 0.

To obtain the value function for t = 3, I first need to identify the optimal policy function for u for that period. Because x_4 = 0, the transition equation for period 3 is given as

x_4 = 0 = x_3 - u_3.

Thus, the transition equation for t = 3 conveniently provides the optimal decision rule for the control variable u in period 3,

(2)  u_3 = x_3.

Equation (2) simply says to extract and sell all the oil that remains at the beginning of t = 3 in that same period. Compared with the rule obtained from applying the Lagrange multiplier method in the previous section, equation (2) is much simpler because it is independent of any price variables or any other variables dated t = 3. It is an application of the principle of optimality that underlies dynamic programming: One can do no better than to follow the derived optimal policy or decision rule for the control variable, regardless of what happened to control and state variables in earlier periods.

Now that the optimal policy function for u_3 is known, the value function for t = 3 can be derived as7

(3)  V_3 = p_3 x_3 - 0.05x_3².

Note the absence of a term V_4 in equation (3). This derives from the fact that no value is added anymore in period 4 because the oil well is depleted. Equation (3) completes the optimization process for period t = 3. The processes for periods 2, 1, and 0 still need to be done.

To solve the optimization problem for t = 2, I substitute equation (3) into the Bellman equation for period 2,

V_2 = max_{u_2} (p_2 u_2 - 0.05u_2²) + 0.9V_3,

to obtain

(4)  V_2 = max_{u_2} (p_2 u_2 - 0.05u_2²) + 0.9(p_3 x_3 - 0.05x_3²).

The right-hand side of equation (4) can be maximized with respect to u_2 only after considering that x_3 is connected with u_2 via the transition equation

(5)  x_3 = x_2 - u_2.

Substituting equation (5) into equation (4) gives
(6)  V_2 = max_{u_2} (p_2 u_2 - 0.05u_2²) + 0.9[p_3(x_2 - u_2) - 0.05(x_2 - u_2)²].

Now, only the decision variable u_2 and variables that are known or exogenous at time t = 2 are in equation (6). The right-hand side of equation (6) can be maximized with respect to the control variable u_2,

d{(p_2 u_2 - 0.05u_2²) + 0.9[p_3(x_2 - u_2) - 0.05(x_2 - u_2)²]}/du_2 = 0

p_2 - 0.19u_2 - 0.9p_3 + 0.09x_2 = 0.

Solving the above first-order condition for u_2 gives the optimal policy or decision rule for period 2,

(7)  u_2 = 0.4737x_2 + 5.263p_2 - 4.737p_3.

Similar to the decision rule in equation (2), this policy rule for the control variable is simpler than the corresponding rule from the Lagrangian multiplier approach in the last section. There is also no need to revise this rule in the light of unexpected shocks to price or the state variable x prior to t = 2. This completes the calculations for t = 2.

To find the optimal policy rule for period 1, insert equation (7) into the value function for period 2 to get an expression similar to that in equation (3). After some simplifications, the value function for period 2 can be written as

V_2 = 0.474(p_2 + p_3)x_2 - 0.02368x_2² + 2.636p_2² - 4.734p_2 p_3 + 2.132p_3².

The value function for period 2 needs to be substituted into the Bellman equation for period 1,

V_1 = max_{u_1} (p_1 u_1 - 0.05u_1²) + 0.9V_2.

This gives

(8)  V_1 = max_{u_1} (p_1 u_1 - 0.05u_1²) + 0.9[0.474(p_2 + p_3)x_2 - 0.02368x_2² + 2.636p_2² - 4.734p_2 p_3 + 2.132p_3²].

Making use of the transition equation

(9)  x_2 = x_1 - u_1

in equation (8), one can maximize its right-hand side with respect to u_1,

d{(p_1 u_1 - 0.05u_1²) + 0.9[0.474(p_2 + p_3)(x_1 - u_1) - 0.02368(x_1 - u_1)² + 2.636p_2² - 4.734p_2 p_3 + 2.132p_3²]}/du_1 = 0.

Solving the resulting first-order condition for u_1 yields

u_1 = 0.2989x_1 + 7.013p_1 - 2.989p_2 - 2.989p_3.
This is the optimal policy or decision rule for period 1. It has some similarities with the equation that determines the optimal value of u_1 from the Lagrangian multiplier approach, but it has fewer terms.

The last step in the solution of the optimal control problem is to obtain the decision or policy rule for period 0. The procedure is the same as before; insert the optimal decision rule for period 1 into the Bellman equation for period 1 and simplify to obtain

V_1 = -0.01494x_1² + 0.2989(p_1 + p_2 + p_3)x_1 + 3.506p_1² - 2.989p_1(p_2 + p_3) + 3.006p_2² - 2.989p_2 p_3 + 2.556p_3².

Next, this value function is inserted into the Bellman equation for period 0, which is given as

(10)  V_0 = max_{u_0} (p_0 u_0 - 0.05u_0²) + 0.9V_1.

After substituting all occurrences of x_1 in this equation with (x_0 - u_0), the right-hand side of equation (10) is maximized with respect to u_0. This yields

u_0 = 0.2120x_0 + 7.880p_0 - 2.119p_1 - 2.120p_2 - 2.120p_3.

This decision rule for period 0 is the same as the one derived in the previous section using Lagrangian multiplier techniques.

These results can be calculated by Maple with the help of the following code:

restart: N:=3: Digits:=4:
V[N]:=unapply(p[N]*x-0.05*x^2,x);
# the objective function is expressed as a function
# of the state variable x
for t from N-1 by -1 to 0 do;
# a do loop is entered for a backward recursion
V[t]:=p[t]*u-0.05*u^2+0.9*V[t+1](x-u);
# the Bellman equation is specified; the last term
# says to replace all values of x in V[t+1], which
# are dated t+1, with (x-u), which is dated t
deriv[t]:=diff(V[t],u);
# the right side of the Bellman equation is differen-
# tiated with respect to the control variable
u_opt[t]:=evalf(solve(deriv[t],u));
# the first derivative is set to zero and solved for
# the control variable; the optimal policy function
# (u_opt) is evaluated numerically
V[t]:=unapply(evalf(simplify(subs(u=u_opt[t],
V[t]))),x);
# u_opt is substituted into the Bellman equation; the
# result is simplified, evaluated numerically, and
# expressed as a function of the state variable x
od;
# the do loop ends.
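The same backward recursion can be replicated outside Maple. The sketch below is my own Python translation, not part of the article: it is purely numerical rather than symbolic, carries the value function backward as a quadratic V_t(x) = a + bx + cx², reads the linear policy u_t(x) off the first-order condition, and then simulates the pumping path from x_0 = 1000 with the prices of the earlier example.

```python
beta = 0.9
p = [20.0, 22.0, 30.0, 25.0]

# terminal period: u3 = x (pump everything), so V3(x) = p3*x - 0.05*x**2
V = (0.0, p[3], -0.05)             # (a, b, c) for V(x) = a + b*x + c*x**2
policies = []                      # linear rules u_t(x) = g + h*x, built backward
for t in (2, 1, 0):
    a, b, c = V
    # FOC of p_t*u - 0.05*u**2 + beta*V(x - u) with respect to u:
    # p_t - 0.1*u - beta*b - 2*beta*c*(x - u) = 0
    denom = 0.1 - 2.0 * beta * c
    g = (p[t] - beta * b) / denom
    h = -2.0 * beta * c / denom
    policies.append((g, h))

    def rhs(x, g=g, h=h, a=a, b=b, c=c, pt=p[t]):
        u = g + h * x
        xn = x - u
        return pt * u - 0.05 * u**2 + beta * (a + b * xn + c * xn**2)

    # the maximized right-hand side is again quadratic in x: refit it
    v0, v1, v2 = rhs(0.0), rhs(1.0), rhs(2.0)
    cc = (v2 - 2.0 * v1 + v0) / 2.0
    V = (v0, v1 - v0 - cc, cc)

policies.reverse()                 # index 0 now holds the period-0 rule
x, u_path = 1000.0, []
for g, h in policies:
    u = g + h * x
    u_path.append(u)
    x -= u
u_path.append(x)                   # period 3: deplete the well (u3 = x3)

print([round(v, 1) for v in u_path])
```

At t = 2 the rule comes out as u_2 = 39.47 + 0.4737x_2, which is equation (7) evaluated at p_2 = 30, p_3 = 25; the simulated path matches the Lagrangian solution up to the article's three-digit rounding.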
It should be apparent that the recursive structure of the dynamic programming
problem makes it easy in principle to extend the optimization to a larger number
of periods.
A Nonnumerical Example
Much of the work in economics is about solving optimization problems without specific numerical entries. The next example applies the dynamic programming framework in this environment.
Consider the maximization of utility of a consumer who does not work but lives off a wealth endowment a. Assume that utility depends on the discounted logged value of consumption c from period zero to two,

max U = Σ_{t=0}^{2} β^t ln c_t,

where the discount factor is

β = 1/(1 + r),

and where r is the relevant interest rate. Consumption is the model's control variable, and the transition equations are given by

a_{t+1} = (1 + r)a_t - c_t,  t = 0, 1, 2.

The model's boundary condition is

a_3 = 0,

which says that wealth is supposed to be depleted at the end of period 2. There are no bequests. The task is to find the optimal consumption sequence.

The solution process starts in the final period and works backward in time. Applying the boundary condition to the transition equation for period t = 2 results in

0 = (1 + r)a_2 - c_2.

This can be solved for c_2, the control variable for t = 2, to get

(11)  c_2 = (1 + r)a_2.

This is the decision rule for period t = 2. To find the decision rules for periods t = 1 and t = 0, apply the Bellman equation

V_t = max_{c_t} ln c_t + βV_{t+1}.

The Bellman equation for period 1 requires the value function of period 2. The latter is equal to the above Bellman equation for t = 2, with c_2 replaced by its maximized value from equation (11) and with V_3 = 0 because wealth is exhausted at the end of t = 2,

V_2 = ln c_2 = ln[(1 + r)a_2].

This value function can now be inserted into the Bellman equation for period 1,

V_1 = max_{c_1} ln c_1 + βV_2,
to obtain

V_1 = max_{c_1} ln c_1 + β ln[(1 + r)a_2].

Before the right-hand side of the Bellman equation is maximized with respect to c_1, a_2 has to be replaced using the transition equation for t = 1,

a_2 = (1 + r)a_1 - c_1.

The Bellman equation for period 1 is now of the form

V_1 = max_{c_1} ln c_1 + β ln{(1 + r)[(1 + r)a_1 - c_1]},

and its right-hand side can be maximized with respect to the control variable c_1. The first-order condition and c_1 are given as

d{ln c_1 + β ln(1 + r) + β ln[(1 + r)a_1 - c_1]}/dc_1 = [a_1(1 + r) - c_1(1 + β)] / {c_1[a_1(1 + r) - c_1]} = 0,

(12)  c_1 = (1 + r)a_1 / (1 + β).

The decision rule for t = 0 still needs calculating. It requires the value function for t = 1. This value function is equal to the Bellman equation for period 1, with c_1 replaced by the optimal decision rule, equation (12),

V_1 = ln[a_1(1 + r)/(1 + β)] + β ln[a_1(1 + r)² β/(1 + β)].

Hence, the Bellman equation for t = 0 is given as

V_0 = max_{c_0} ln c_0 + βV_1 = max_{c_0} ln c_0 + β ln[a_1(1 + r)/(1 + β)] + β² ln[a_1(1 + r)² β/(1 + β)].

Substituting out a_1 with

a_1 = (1 + r)a_0 - c_0

gives

V_0 = max_{c_0} ln c_0 + β ln{(1 + r)[(1 + r)a_0 - c_0]/(1 + β)} + β² ln{β(1 + r)²[(1 + r)a_0 - c_0]/(1 + β)}.

Maximizing the right-hand side with respect to the control variable c_0 results in the following first-order condition and optimal value for c_0:

[a_0(1 + r) - c_0(1 + β + β²)] / {c_0[a_0(1 + r) - c_0]} = 0
(13)  c_0 = (1 + r)a_0 / (1 + β + β²).

A comparison of the consumption decision rules for t = 0, 1, 2 reveals the following pattern:

c_t = a_t(1 + r) / Σ_{i=0}^{2-t} β^i,  t = 0, 1, 2,

or, more generally, for t = 0, ..., n,

(14)  c_t = a_t(1 + r) / Σ_{i=0}^{n-t} β^i,  t = 0, ..., n.

The ability to recognize and concisely describe such a pattern plays a key role in applications of dynamic programming in economics, in particular those applications that deal with infinite horizon problems. I discuss them next.

The following Maple program solves the above consumption problem:

restart: N:=2: Digits:=4:
V[N]:=unapply(log((1+r)*a),a);
# this initial value function follows from the bound-
# ary condition
for t from N-1 by -1 to 0 do
V[t]:=log(c)+beta*V[t+1]((1+r)*a-c);
# since V is a function of a, the term in parenthe-
# ses after V[t+1] is substituted for a in V[t+1]
deriv[t]:=diff(V[t],c);
c_opt[t]:=solve(deriv[t],c);
V[t]:=unapply(simplify(subs(c=c_opt[t],V[t])),a);
od;

INFINITE HORIZON PROBLEMS

A problem often posed in economics is to find the optimal decision rule if the planning horizon does not consist of a fixed number of n periods but of an infinite number of periods. Examining the limiting case n → ∞ may also be of interest if n is not strictly infinite but merely large because taking the limit leads to a decision rule that is both simplified and the same for every period. Most often that is preferred when compared with a situation where one has to deal with a distinct and rather complex rule for each period from t = 0 to t = n. I illustrate this point with the consumption problem of the last section.

A Nonnumerical Problem

The optimal consumption decision rule given in equation (14) can be modified so it applies for an infinite time horizon. For taking the limit, the first part of
equation (14) is most useful,

lim_{n→∞} c_t = lim_{n→∞} a_t(1 + r) / Σ_{i=0}^{n-t} β^i.

Making use of the geometric series theorem,

Σ_{i=0}^{∞} β^i = 1/(1 - β)  for 0 < β < 1,

yields

lim_{n→∞} c_t = a_t(1 + r)(1 - β) = a_t r.

The decision rule for consumption in period t simplifies to the intuitively obvious: Consume in t only what you receive in interest from your endowment in t. The associated transition function for wealth is given as

a_{t+1} = (1 + r)a_t - c_t = (1 + r)a_t - a_t r = a_t,

that is, wealth will remain constant over time.

This example illustrates one way of arriving at optimal decision rules that apply to an infinite planning horizon: simply solve the problem for a finite number of periods, identify and write down the pattern for the evolution of the control variable, and take the limit n → ∞. It should be obvious that the same procedure can be applied to identify the value function for time period t. To illustrate this point, recall the sequence of value functions for t = 1, 2,

(15)  V_2 = ln[a_2(1 + r)],

(16)  V_1 = ln[a_1(1 + r)/(1 + β)] + β ln[a_1(1 + r)² β/(1 + β)].

The value function for t = 0 is given as

(17)  V_0 = ln[a_0(1 + r)/(1 + β + β²)] + β ln[a_0(1 + r)² β/(1 + β + β²)] + β² ln[a_0(1 + r)³ β²/(1 + β + β²)].

Proceeding now as in the case of the consumption rule, the following pattern emerges from equations (15) to (17):

(18)  V_t = Σ_{i=0}^{n-t} β^i ln[a_t(1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i].

It is possible to check the correctness of the pattern described by equation (18) by letting Maple calculate the implied sequence of the V_t. The Maple code to replicate equations (15) through (17) from equation (18) is given as

restart: n:=2:
seq(V[t]=sum(beta^i*log(a[t]*(1+r)^(i+1)*beta^i/
(sum(beta^(i),i=0..n-t))),i=0..n-t),t=0..n);
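Both the finite-horizon rule (14) and its n → ∞ limit can be sanity-checked numerically. The Python sketch below is my own addition; the values β = 0.95 and a_0 = 100 are illustrative assumptions, not values taken from the article's symbolic derivation.

```python
beta = 0.95
r = 1.0 / beta - 1.0              # since beta = 1/(1+r)
a0 = 100.0                        # hypothetical endowment (assumption)

# rule (14) with n = 2: c_t = a_t*(1+r) / sum(beta**i, i = 0..n-t)
a, c_path = a0, []
for t in range(3):
    share = (1.0 + r) / sum(beta**i for i in range(3 - t))
    c_path.append(share * a)
    a = (1.0 + r) * a - c_path[-1]
print(round(a, 12))               # boundary condition a_3 = 0: wealth exhausted

# as the horizon n grows, the period-0 consumption share falls toward r
shares = [(1.0 + r) / sum(beta**i for i in range(n + 1)) for n in (2, 10, 50, 200)]
print([round(s, 5) for s in shares], round(r, 5))
```

The shares decline monotonically toward the infinite-horizon rule c_t = r·a_t derived above.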
After verifying that the formula is correct, the limit n → ∞ is taken,

lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n-t} β^i ln[a_t(1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i].

Because interest centers on finding an expression involving the variable a_t, the idea is to isolate a_t. This can be done as follows:

lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n-t} β^i [ln a_t + ln((1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i)]

or

lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n-t} β^i ln a_t + lim_{n→∞} Σ_{i=0}^{n-t} β^i ln((1 + r)^{i+1} β^i / Σ_{i=0}^{n-t} β^i).

The last expression can be rewritten as

(19)  lim_{n→∞} V_t = lim_{n→∞} (Σ_{i=0}^{n-t} β^i) ln a_t + η,

where η is an expression independent of a. Taking the limit leads to

(20)  lim_{n→∞} V_t = [1/(1 - β)] ln a_t + η = (1 + 1/r) ln a_t + η.

With the help of a computer algebra program, students could circumvent equation (18) and the following algebra altogether by going from equation (17) immediately to equation (19) for n = 2 and t = 0. The following sequence of Maple commands takes equation (17), expands it, and isolates the terms in a_0:

restart:
V[0]:=ln(a[0]*(1+r)/(1+beta+beta^2))+
beta*ln(a[0]*(1+r)^2*beta/(1+beta+beta^2))+
beta^2*ln(a[0]*(1+r)^3*beta^2/(1+beta+beta^2));
# the above is equation 17
assume(beta>0,r>0);
assume(a[0],real);
# the assumptions exclude economically senseless
# cases that prevent the program from finding a
# solution; variables with assumptions attached
# are indicated with a ~ in the Maple output
V[0]:=collect(expand(V[0]),log(a[0]));
# the collect command factors the specified term,
# i.e. log(a[0])
The output of these commands is

(21)  V_0 = (1 + β + β²) ln a_0 + ζ,

where ζ consists of a number of terms in β and r.

A Numerically Specified Problem

Infinite horizon problems that are numerically specified can be solved by iterating on the value function. I demonstrate this for the above consumption optimization problem. The task is to find the optimal decision rule for consumption regardless of time t. I employ the Bellman equation without time subscripts,

V(a) = max_c ln c + βV(a'),

where a and a' denote current and future value of the state variable, respectively. The time subscripts are eliminated in the Bellman equation to reflect the idea that the value function V(a) needs to satisfy the Bellman equation for any a. The iteration method starts with an initial guess of the solution V(a), which is identified as V^(0)(a). Guessing a possible solution is not as difficult as it sounds; each class of problems typically has associated with it a general form of the solution. For example, in the current case, it is known from solving the consumption problem for t = 2 that the value function will somehow depend on the log of a,

V^(0)(a) = A + B ln a.

To simplify the algebra, assume that β = 0.95.

A numerically specified initial guess. I convert the initial guess of the value function into numerical form by assuming A = 0 and B = 1,

V^(0)(a) = ln a.

The initial guess V^(0)(a) is substituted on the right-hand side of the Bellman equation,

V(a) = max_c ln c + 0.95 ln a'.

Next, making use of the transition equation,

a' = (1 + 0.05263)a - c,

to substitute out a',

V(a) = max_c ln c + 0.95 ln(1.05263a - c).

The right-hand side of V(a) is maximized with respect to the control variable c,

dV(a)/dc = 1/c - 0.95/(1.05263a - c) = 0,
which gives c = 0.53981a. Substituting the solution for c back into V(a) yields

V^(1)(a) = ln(0.53981a) + 0.95 ln(1.05263a - 0.53981a)

and, after simplifying,

V^(1)(a) = -1.251 + 1.95 ln a.

Because

V^(1)(a) ≠ V^(0)(a),

the value function requires further iterations. For that purpose, I substitute the new guess, V^(1)(a), into the Bellman equation and get

V(a) = max_c ln c + 0.95(-1.251 + 1.95 ln a').

Substitution of the transition equation for a' gives

V(a) = max_c ln c + 0.95[-1.251 + 1.95 ln(1.05263a - c)],

which simplifies to

V(a) = max_c ln c - 1.1884 + 1.8525 ln(1.05263a - c).

Maximizing the right-hand side of V(a) with respect to c yields c = 0.36902a. Substituting the optimal policy rule for c back into V(a) gives

V^(2)(a) = ln(0.36902a) - 1.1884 + 1.8525 ln(1.05263a - 0.36902a),

which simplifies to

V^(2)(a) = -2.8899 + 2.8525 ln a.

Because

V^(2)(a) ≠ V^(1)(a),

another iteration is needed. Instead of continuing the above process of iterating on the value function by hand, I make use of the following Maple program to iterate until convergence is achieved:

restart: Digits:=8: beta:=0.95: V[0]:=unapply(ln(a),a):
for n from 0 to 200 do
V[n]:=unapply(ln(c)+beta*V[n]((1+((1/beta)-1))*a-
c),(a,c));
# V is a mathematical function of both a and c; in
# the text V is expressed as a function of only the
# state variable a
deriv:=diff(V[n](a,c),c);
c_opt:=expand(evalf(solve(deriv,c)));
# the command expand often simplifies the calcula-
# tions that follow especially when used in combi-
# nation with simplify or evalf
V[n+1]:=unapply(evalf(expand(simplify(subs(c=c_opt,V
[n](a,c))))),a);
# an example for an interlocked command structure:
# the substitute command is executed first, followed
# by simplify, expand, evalf, and unapply
od;
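The same iteration can be run without Maple by tracking only the two coefficients of the guess V(a) = A + B ln a, since the Bellman operator maps that family into itself. The Python sketch below is my own rearrangement of the first-order condition into a closed-form update, mirroring the article's β = 0.95 setup:

```python
import math

beta = 0.95
grow = 1.0 / beta                  # 1 + r = 1/beta = 1.05263...

A, B = 0.0, 1.0                    # initial guess V(a) = ln a
for _ in range(400):
    s = grow / (1.0 + beta * B)    # optimal consumption share c/a from the FOC
    # updated guess: V(a) = ln(s*a) + beta*(A + B*ln((grow - s)*a))
    A, B = math.log(s) + beta * (A + B * math.log(grow - s)), 1.0 + beta * B

c_share = grow / (1.0 + beta * B)
print(round(A, 3), round(B, 4), round(c_share, 5))
```

The first pass reproduces the hand iteration above (c = 0.53981a and V^(1)(a) = -1.251 + 1.95 ln a), and the iteration settles near V(a) = -58.89 + 20 ln a with c = 0.05263a.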
After about 200 iterations with eight digits of accuracy, the value function reaches

V(a) = -58.886 + 19.9994 ln a

and the corresponding policy function for the control variable

c = 0.05263a.

These results are effectively identical to the ones that can be obtained from equation (20) and the corresponding decision rule for c.8 Specifically, equation (20) predicts the value function to be

V(a) = 20 ln a + η,

and the corresponding decision rule for c is identical to the one above.

A numerically unspecified initial guess. I start with a numerically unspecified guess of the value function,

(22)  V^(0)(a) = A + B ln a,

where A and B are constants to be determined. In this case, the value function will not be iterated on. Instead, the idea is to algebraically derive values for A and B that equalize V(a) on the left and V(a) on the right side of the Bellman equation. The solution process begins with the Bellman equation

V(a) = max_c ln c + 0.95V^(0)(a')

and the substitution of the initial guess of the value function,

V(a) = max_c ln c + 0.95(A + B ln a').

Next, the transition equation is used to substitute out the state variable a',

(23)  V(a) = max_c ln c + 0.95[A + B ln(1.05263a - c)].

The right-hand side of equation (23) is now maximized with respect to the control variable c,

d{ln c + 0.95[A + B ln(1.05263a - c)]}/dc = 0,

which results in

c = 21.0526a / (20 + 19B).
The optimal c is substituted back into equation (23), which, upon expansion,
results in

V(a) = 3.047 + ln a − ln(20 + 19B) + 0.95A + 2.846B + 0.95B ln a + 0.95B ln B − 0.95B ln(20 + 19B).     (24)

If the initial guess of the value function, equation (22), had only one unknown
parameter, say A, I could substitute this initial guess for V(a) on the left side
of equation (24) and then solve for the unknown value of A. With two unknown
parameters in the value function, A and B, this method is not useful. An alternative
solution procedure rests on a comparison of coefficients. In particular,
the coefficient of ln a implied by equation (24) is set equal to B, which is the
coefficient of ln a in equation (22). The resulting equation is solved for B.
Once B is known, the right-hand side of equation (24) is set equal to the
right-hand side of equation (22). The resulting equation determines A.

To proceed with the coefficient comparison, I factor the terms in ln a in
equation (24),

V(a) = 3.047 + (1 + 0.95B) ln a − ln(20 + 19B) + 0.95A + 2.846B + 0.95B ln B − 0.95B ln(20 + 19B),     (25)

set the coefficient of ln a in equation (25) equal to B,

1 + 0.95B = B,

and solve for B. This results in B = 20. To derive parameter A, substitute B = 20
into equations (22) and (25), set the two equal,

−2.9444 + 0.95A + 20 ln a = A + 20 ln a,

and solve for A. The solution is A = −58.888. With the parameters A and B
known, the policy function for c is given as

c = 0.05263a,

and the value function is

V(a) = −58.888 + 20.0 ln a.

Both functions match the ones obtained in earlier sections with other solution
methods.

The Maple commands to replicate the above solution method are given as

restart: Digits:=8:
assume(a>0): beta:=0.95:
V[0]:=unapply(A+B*log(a),a);
# this is the initial guess of the value function
V:=unapply(ln(c)+beta*V[0]((1+((1/beta)-1))*a-c),(a,c));
180 JOURNAL OF ECONOMIC EDUCATION
# the initial guess is substituted into the Bellman
# equation; V has to be expressed as a mathematical
# function of both a and c; in the text, V is ex-
# pressed as a function of only the state variable a
deriv:=diff(V(a,c),c);
# take the first derivative of the right-hand side of
# V
c_opt:=expand(evalf(solve(deriv,c)));
# solve for the optimal c
V:=expand(simplify(subs(c=c_opt,V(a,c))));
V:=collect(V,ln(a));
# substitute the optimal c into the Bellman equation
# and factor (ln a)
B:=solve(diff(V,a)*a=B,B);
# isolate the coefficient of (ln a), set it equal to
# B, and solve for B; to isolate the coefficient of
# (ln a) in V[1], apply the chain rule: dV[1]/d(ln a)
# = (dV[1]/da)*(da/d(ln a))
V0:=A + B*log(a):
A:=solve(V = V0,A);
# solve for the coefficient A
`c `=c_opt; `V(a)`=V0;
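For readers without Maple, the guess-and-verify result can also be checked numerically. The following plain-Python sketch (the function names are mine, not from the article) confirms that A = −58.888, B = 20 make the trial value function a fixed point of the Bellman equation, and that the implied consumption rate is 0.05263:

```python
import math

# Coefficients derived in the text: V(a) = A + B*ln(a) and c = 0.05263a.
beta = 0.95
A, B = -58.888, 20.0

def V(a):
    return A + B * math.log(a)

def bellman_rhs(a):
    # The first-order condition gives c = (a/beta)/(1 + beta*B);
    # with B = 20 this is a/19 = 0.05263a.
    c = (a / beta) / (1 + beta * B)
    return math.log(c) + beta * V(a / beta - c)

# At the fixed point, the Bellman equation must reproduce V(a).
for a in (1.0, 5.0, 50.0):
    assert abs(bellman_rhs(a) - V(a)) < 1e-3   # equal up to rounding in A

print(round((1 / beta) / (1 + beta * B), 5))   # 0.05263, the consumption rate
```

The small tolerance absorbs the rounding of A to three decimals; with the exact value A = −20 ln 19 the two sides agree to machine precision.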
A STOCHASTIC INFINITE HORIZON EXAMPLE
Dynamic programming problems can be made stochastic by letting one or
more state variables be influenced by a stochastic disturbance term. Most of the
applications of dynamic programming in macroeconomics follow this path. I pro-
vide one example.
The basic real business cycle (RBC) model (discussed in King 2002) is very
much related to that of Kydland and Prescott (1982) and Long and Plosser (1983).
Many RBC models are variations of this basic model.
The problem is to maximize the expected discounted sum of present and future
utility, where utility is assumed to be a logarithmic function of consumption (C)
and leisure (1 − n),

max E_0 Σ_{t=0}^∞ β^t [ln C_t + δ ln(1 − n_t)],

where n represents labor input and available time is set at unity. Maximization is
subject to the market clearing constraint that output y equals the sum of the two
demand components, consumption and investment,

y_t = C_t + k_t,

where investment equals the capital stock k because capital is assumed to depreciate
fully in one period. There are two transition equations, one for each of the
two state variables, output and the state of technology (A),

y_{t+1} = A_{t+1} k_t^α n_t^(1−α),     (26)

A_{t+1} = A_t^ρ e^(ε_{t+1}).     (27)

The transition equation of y is a dynamic Cobb-Douglas production function, and
the transition equation of A follows an autoregressive process subject to a disturbance
term ε, which is assumed to be normally distributed. C, n, and k are the
control variables. To make the problem easier to follow, I use numbers for the
parameters, β = 0.9, α = 0.3, δ = 1.0, and ρ = 0.8.
The Bellman equation for this problem is given as

V(y,A) = max_{C,k,n} { ln C_t + ln(1 − n_t) + 0.9 E V^(0)(y,A) },

where E is the expectations operator. The Bellman equation can be simplified
by substituting for C and, hence, reducing the problem to two control variables,

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9 E V^(0)(y,A) }.     (28)

The next step in the solution process is to find an initial value function. In the earlier
consumption problem, which was similar in terms of the objective function
but had only one state variable a, the value function was of the type

V(a) = A + B ln a.

Because there are two state variables in the present problem, I try the analogous
solution

V^(0)(y,A) = Z + G ln y + H ln A.     (29)

From here on, the problem follows the method used for the previous section.
Substituting the trial solution from equation (29) into equation (28) gives

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9 E(Z + G ln y + H ln A) }.     (30)

Before the right-hand side of equation (30) can be maximized, y and A that appear in
the expectations term are substituted out by their transition equations (26) and (27),

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9 E[Z + G ln(A^0.8 e^ε k^0.3 n^0.7) + H ln(A^0.8 e^ε)] }.

The above equation is expanded to

V(y,A) = max_{k,n} { ln(y − k) + ln(1 − n) + 0.9Z + 0.27G ln k + 0.63G ln n + 0.9G Eε + 0.72(G + H) ln A + 0.9H Eε },     (31)

and then maximized with respect to both k and n. This results in two first-order
conditions,

−1/(y − k) + 0.27G/k = 0, and −1/(1 − n) + 0.63G/n = 0,
which can be solved for the two control variables to yield

k = 0.27Gy/(1 + 0.27G), n = 0.63G/(1 + 0.63G).

These solutions are substituted back into equation (31), and the expectations
operator is employed to remove the two disturbance terms. These operations
yield

V(y,A) = ln(y − 0.27Gy/(1 + 0.27G)) + ln(1 − 0.63G/(1 + 0.63G)) + 0.9Z + 0.27G ln[0.27Gy/(1 + 0.27G)] + 0.63G ln[0.63G/(1 + 0.63G)] + 0.72(G + H) ln A.     (32)

As in the consumption example of the last section, the implied coefficients of
ln y and ln A and the remainder term of equation (32) have to be compared with G,
H, and Z of equation (29). To make such a coefficient comparison feasible, factor
the ln y and ln A terms on the right-hand side of equation (32) to obtain

V(y,A) = θ + (1 + 0.27G) ln y + 0.72(G + H) ln A,     (33)

where

θ = 9.21 + 0.9G ln G − 0.27G ln(100 + 27G) − ln(100 + 27G) − ln(100 + 63G) + 0.9Z − 0.63G ln(100 + 63G) + 3.5G.     (34)

Note that equation (33) is now in the same format as equation (29) so that the
coefficients of these two equations can be compared. Setting the coefficient of
ln y in equation (33) equal to G yields the determining equation

1 + 0.27G = G,

which results in G = 1.37. Employing this value of G, set the coefficient of ln A
in equation (33) equal to H,

0.72(1.37 + H) = H,

and obtain H = 3.52. Next, substitute the solutions for G and H into equation
(34), set the resulting expression for θ equal to Z, and solve for Z. This results in
Z = −20.853. With the values of Z, G, and H known, the value function that
solves the problem is given as

V(y,A) = −20.853 + 1.37 ln y + 3.52 ln A,

and the policy functions or decision rules for k, n, and C are

k = 0.27y, n = 0.463, C = 0.73y.

Note that the addition of a stochastic term to variable A has had no effect on the
optimal decision rules for k, n, and C. This result is not an accident but, as discussed
at the beginning of this article, the key reason why dynamic programming
is preferred over a traditional Lagrangian multiplier method in dynamic
optimization problems: unpredictable shocks to the state variables do not change
the optimal decision rules for the control variables.9
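The coefficient-comparison equations for G, H, and Z are simple enough to verify numerically. The following Python sketch (all variable names are mine, not from the article) solves them and recovers the decision rules:

```python
import math

# Determining equations from the coefficient comparison:
#   1 + 0.27G = G,  0.72(G + H) = H,  theta = Z.
G = 1 / 0.73                       # solves 1 + 0.27G = G
H = 0.72 * G / 0.28                # solves 0.72(G + H) = H

# Decision rules implied by G:
kc = 0.27 * G / (1 + 0.27 * G)     # k = kc*y
n = 0.63 * G / (1 + 0.63 * G)      # constant labor supply

# Constant term of the value function: setting theta = Z gives
# Z = const/(1 - 0.9), with const collected from equation (32).
const = (math.log(1 - kc) + math.log(1 - n)
         + 0.27 * G * math.log(kc) + 0.63 * G * math.log(n))
Z = const / (1 - 0.9)

print(round(G, 2), round(H, 2), round(Z, 3))   # 1.37 3.52 -20.853
print(round(kc, 2), round(n, 3))               # 0.27 0.463
```

With exact arithmetic the capital rule simplifies neatly: 1 + 0.27G = G implies k = 0.27Gy/G = 0.27y, which is why investment is a constant 27 percent of output.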
The above solution process can be automated and processed with other parameters
if one uses the following Maple commands:
restart:
C:=y-k; yy:=AA*k^alpha*n^(1-alpha); AA:=A^rho*exp(epsilon);
# basic definitions
V0:=ln(C)+delta*ln(1-n)+beta*(Z+G*ln(yy)+H*ln(AA));
# the trial solution for the value function is
# inserted into the Bellman equation
delta:=1.; beta:=0.9; rho:=0.8; alpha:=0.3;
assume(y>0,A>0,k>0,n>0):
# exclude economically senseless variable ranges
V:=expand(V0); k_opt:=diff(V,k)=0; n_opt:=diff(V,n)=0;
solve({k_opt,n_opt},{n,k});
# the right-hand side of the Bellman equation is maxi-
# mized with respect to the control variables n and
# k;
assign(%): `V `=V;
# k and n are set equal to their optimal values
epsilon:=0: `V `=V;
# in applying the expectations operator all epsilon
# terms drop out
simplify(expand(V)): collect(%,ln(y)): V:=collect(%,ln(A));
# the value function is factored with respect to (ln
# y) and (ln A)
G:=solve(diff(V,y)*y=G,G);
H:=solve(diff(V,A)*A=H,H);
Z:=solve(V-G*ln(y)-H*ln(A)=Z);
# the values for G, H, and Z are derived
V:=Z+G*ln(y)+H*ln(A); `k `=k; `n `=n; `C `=C;
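The role of the stochastic term can be illustrated by simulating the solved model under the derived decision rules. The Python sketch below is mine, not the article's; in particular, the shock standard deviation of 0.05 is an assumed value, because the derivation above does not depend on it:

```python
import math
import random

random.seed(0)
rho, alpha = 0.8, 0.3      # model parameters from the text
sigma = 0.05               # assumed std dev of the shock epsilon
n = 0.463                  # labor is constant under the decision rule

A, y = 1.0, 1.0
log_y = []
for t in range(200):
    k = 0.27 * y                                         # decision rule k = 0.27y
    A = A ** rho * math.exp(random.gauss(0.0, sigma))    # transition (27)
    y = A * k ** alpha * n ** (1 - alpha)                # transition (26)
    log_y.append(math.log(y))

# Different shock sequences produce different paths, but the decision
# rules applied at every date are identical on every path.
mean = sum(log_y) / len(log_y)
sd = math.sqrt(sum((x - mean) ** 2 for x in log_y) / len(log_y))
print(round(mean, 3), round(sd, 3))
```

Moments such as the standard deviation computed here are the statistics that RBC researchers compare with those of observed data.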
CONCLUSION
I introduced dynamic programming techniques by way of example, with the
help of the computer algebra system Maple. The emphasis of this introduction
was to build confidence and intuition. I solved the examples one step at a time,
with sufficient repetition of key steps to enhance the learning process. I limited
the coverage of solution techniques to avoid confusion at the initial stages of
learning dynamic programming. To integrate the material better, I used the same
examples to introduce different techniques. One example was on the optimal
extraction of a natural resource, another on consumer utility maximization, and
the final example covered a simple real business cycle model, which could be
considered an extension of the utility maximization example.
Every example was accompanied by a Maple computer code to make it possible
for readers not only to replicate a given example but also to modify it in various
ways and, ultimately, to encourage application of the techniques to other
areas in economics. As the next step in the process of learning dynamic programming
techniques, I recommend that students study and solve the economic
examples discussed in Adda and Cooper (2003). Some of them will seem immediately
familiar because of the coverage in this article. Others will open up new
avenues of applying dynamic programming techniques.
NOTES
1. I found the brief overview of discrete time dynamic programming in Parlar (2000) helpful because of its focus on coding in Maple.
2. In trying to modify the examples, the reader may want to start with very simple changes to minimize program syntax or rounding errors and to avoid situations where no solution is found at all. The latter can occur easily, even for seemingly minor modifications. In fact, it is for precisely this reason that a large literature has developed in recent years on more powerful numerical solution methods. Judd (1998) provided an excellent starting point. Miranda and Fackler (2002) also discussed numerical techniques to solve dynamic programming problems. They focused on the software package MATLAB.
3. Kamien and Schwartz (1981) contained one of the most thorough treatments of optimal control. Most major textbooks in mathematical methods for economists also contain chapters on optimal control.
4. Comments are identified with a pound sign (#). They are not executed by Maple. Comments refer to the line(s) of Maple code right above.
5. As seen later, knowledge of the optimal value function for time period t + 1 implies also knowing the optimal policy function for the control variable at time t + 1.
6. As suggested at the end of the previous section, this may be regarded as the key innovation of the dynamic programming approach over the classical Lagrangian multiplier method.
7. The value function is equal to the Bellman equation in equation (1), except that all occurrences of u3 are replaced by the optimal policy function for u3, and the max argument is removed.
8. If the calculations are done with fewer digits of accuracy, there is more of a discrepancy in results regardless of the number of iterations.
9. The stochastic nature of state variables affects how the state variables move through time. Different sequences of shocks to the state variables produce different time paths, even with the same decision rules for the control variables. From a large number of such stochastic time paths, RBC researchers calculate standard deviations or other moments of the state and control variables and compare them with the moments of actually observed variables to check their models. Compare Adda and Cooper (2003) on this.
REFERENCES
Adda, J., and R. Cooper. 2003. Dynamic economics: Quantitative methods and applications.Cambridge, MA: MIT Press.
Bellman, R. 1957.Dynamic programming. Princeton, NJ: Princeton University Press.Judd, K. L. 1998.Numerical methods in economics. Cambridge, MA: MIT Press.Kamien, M. I., and N. L. Schwartz. 1981.Dynamic optimization: The calculus of variations and opti-
mal control in economics and management. New York: Elsevier North-Holland.King, I. P. 2002. A simple introduction to dynamic programming in macroeconomic models.
http://www.business.auckland.ac.nz/Departments/econ/workingpapers/full/Text230.pdf (last
accessed August 21, 2005).Kydland, F. E., and E. C. Prescott. 1982. Time to build and aggregate fluctuations.Econometrica 50
(6): 134571.Ljungqvist, L., and T. J. Sargent. 2000. Recursive macroeconomic theory. Cambridge, MA: MIT
Press.
Long, J. B., and C. I. Plosser. 1983. Real business cycles. Journal of Political Economy 91 (1): 39–69.
Miranda, M. J., and P. L. Fackler. 2002. Applied computational economics and finance. Cambridge, MA: MIT Press.
Parlar, M. 2000. Interactive operations research with Maple: Methods and models. Boston: Birkhäuser.
Sargent, T. J. 1987. Dynamic macroeconomic theory. Cambridge, MA: Harvard University Press.
Stokey, N., and R. E. Lucas, with E. C. Prescott. 1989. Recursive methods in economic dynamics. Cambridge, MA: Harvard University Press.