od dynamic programming 2010

1

DYNAMICPROGRAMMING

Dynamic Programming

It is a useful mathematical technique for making asequence of interrelated decisions.

Systematic procedure for determining the optimalcombinations of decisions.

There is no standard mathematical formulation of“the” Dynamic Programming problem.

Knowing when to apply dynamic programmingdepends largely on experience with its generalstructure.

206

2

Prototype example

Stagecoach problem

Fortune seeker that wants to go from Missouri toCalifornia in the mid-19th century.

Travel has 4 stages. A is Missouri and J is California.

Cost is the life insurance of a specific route; lowestcost is equivalent to a safest trip.

207

Costs

Cost cij of going from state i to state j is:

Problem: which route minimizes the total cost of thepolicy?

208

B C D E F G H I J

A 2 4 3 B 7 4 6 E 1 4 H 3

C 3 2 4 F 6 3 I 4

D 4 1 5 G 3 3

3

Solving the problem

Note that greedy approach does not work.Solution A B F I J has total cost of 13.

However, e.g. A D F is cheaper than A B F.

Other possibility: trial-and-error. Too much effort evenfor this simple problem.

Dynamic programming is much more efficient thanexhaustive enumeration, especially for large problems.

Usually starts from the last stage of the problem, andit enlarges it one stage at a time.

209

Formulation

Decision variables xn (n = 1, 2, 3, 4) are the immediatedestination of stage n.

Route is A x1 x2 x3 x4, where x4 = J.

Total cost of the best overall policy for the remainingstages is fn(s, xn).

Actual state is s, ready to start stage n, selecting xn asthe immediate destination.

xn* minimizes fn(s, xn) and fn

*(s, xn) is the minimumvalue of fn(s, xn):

210

* *( ) min ( , ) ( , )n

n n n n nxf s f s x f s x

4

Formulation

where

value of csxnis given by cij where i = s (current state)

and j = xn (immediate destination).

Objective: find f1*(A) and the corresponding route.

Dynamic programming finds successively f4*(s), f3

*(s),f2

*(s) and finally f1*(A).

211

1

( , ) immediate cost (stage ) + minimum future cost (stages +1 onward)( )

n

n n

sx n n

f s x n nc f x

Solution procedure

When n = 4, the route is determined by its currentstate s (H or I) and its final destination J.

Since f4*(s) = f4

*(s, J) = csJ, the solution for n = 4 is

212

s f4*(s) x4

*

H 3 J

I 4 J

5

Stage n = 3

Needs a few calculations. If fortune seeker is in state F,he can go to either H or I with costs cF,H = 6 or cF,I = 3.

Choosing H, the minimum additional cost is f4*(H) = 3.

Total cost is 6 + 3 = 9.

Choosing I, the total cost is 3 + 4 = 7. This is smaller, andit is the optimal choice for state F.

213

F

H

I

6

3

3

4

Stage n = 3

Similar calculations can be made for the two possiblestates s = E and s = G, resulting in the table for n = 3:

214

s x3

f3*(s, x3) = csx3 + f4

*(x3)

H I f3*(s) x3

*

E 4 8 4 H

F 9 7 7 I

G 6 7 6 H

6

Stage n = 2

In this case, f2*(s, x2) = csx2

+ f4*(x3).

Example for node C:x2 = E: f2

*(C, E) = cC,E + f3*(E) = 3 + 4 = 7 optimal

x2 = F: f2*(C, F) = cC,F + f3

*(F) = 2 + 7 = 9.x2 = G: f2

*(C, G) = cC,G + f3*(G) = 4 + 6 = 10.

215

C

E

G

3

4

4

6

F7

2

Stage n = 2

Similar calculations can be made for the two possiblestates s = B and s = D, resulting in the table for n = 2:

216

s x2

f2*(s, x2) = csx2 + f3

*(x2)

E F G f2*(s) x2

*

B 11 11 12 11 E or F

C 7 9 10 7 E

D 8 8 11 8 E or F

7

Stage n = 1

Just one possible starting state: A.x1 = B: f2

*(A, B) = cA,B + f2*(B) = 2 + 11 = 13.

x1 = C: f2*(A, C) = cA,C + f2

*(C) = 4 + 7 = 11 optimal

x1 = D: f2*(A, D) = cA,D + f2

*(D) = 3 + 8 = 11 optimal

Resulting in the table:

217

s x1f1

*(s, x1) = csx1 + f2*(x1)

B C D f1*(s) x1

*

A 13 11 11 11 C or D

Optimal solution

Three optimal solutions, all with f1*(A) = 11:

218

8

Characteristics of DP

1. The problem can be divided into stages, with a policydecision required at each stage.

Example: 4 stages and life insurance policy to choose.

Dynamic programming problems require making asequence of interrelated decisions.

2. Each stage has a number of states associated withthe beginning of each stage.

Example: states are the possible territories where thefortune seeker could be located.

States are possible conditions in which the systemmight be.

219


3. Policy decision transforms the current state to a stateassociated with the beginning of the next stage.

Example: fortune seeker’s decision led him from hiscurrent state to the next state on his journey.

DP problems can be interpreted in terms of networks:each node correspond to a state.

Value assigned to each link is the immediatecontribution to the objective function from makingthat policy decision.

Objective corresponds to finding the shortest or thelongest path.

220

9


4. The solution procedure finds an optimal policy forthe overall problem. Finds a prescription of theoptimal policy decision at each stage for each of thepossible states.

Example: solution procedure constructed a table foreach stage, n, that prescribed the optimal decision, xn

*,for each possible state s.

In addition to identifying optimal solutions, DPprovides a policy prescription of what to do underevery possible circumstance (why a decision is calledpolicy decision).

221


5. Given the current state, an optimal policy for theremaining stages is independent of the policydecisions adopted in previous stages.

Optimal immediate decision depends only on thecurrent state: this is the principle of optimality for DP.

Example: at any state, the insurance policy isindependent on how the fortune seeker got there.

Knowledge of the current state conveys all informationnecessary for determining the optimal policyhenceforth (Markovian property).

222

10


6. Solution procedure begins by finding the optimalpolicy for the last stage. Solution is usually trivial.

7. A recursive relationship that identifies optimal policyfor stage n, given optimal policy for stage n + 1, isavailable.

Example: recursive relationship was

Recursive relationship differs somewhat amongdynamic programming problems.

223

* *1( ) min ( )

nn

n sx n nxf s c f x


7. (cont.) Notation:

224

*

number of stages.label for current stage ( 1, 2, , ).current for stage .decision variable for stage .

optimal value of (given ).( , ) contribution of stages , 1, , to o

n

n

n n n

n n n

Nn n Ns state nx n

x x sf s x n n N

* *

bjective function,system starts in state at stage and immediate decision is .

( , )n n

n n n n

s n x

f f s x

11


7. (cont.) recursive relationship:

where fn(sn, xn) is written in terms of sn, xn and .

8. Using recursive relationship, solution procedurestarts at the end and moves backward stage bystage.

It stops when finds optimal policy starting at initial stage.

The optimal policy for the entire problem was found.

Example: the tables for the stages shown this procedure.

225

*1 1( )n nf s

* *( ) max ( , ) or ( ) min ( , )nn

n n n n n n n n n nxxf s f s x f s f s x

Deterministic dynamic programming

Deterministic problems: the state at the next stage iscompletely determined by the state and the policydecision at the current stage.

Form of the objective function: minimize or maximizethe sum, product, etc. of the contributions from theindividual stages.

Set of states: may be discrete or continuous, or a statevector. Decision variables can also be discrete orcontinuous.

226

12

Example: medical teams

The World Health Council has five medical teams to allocate tothree underdeveloped countries.

Measure of performance: additional person-years of life, i.e.,increased life expectancy (in years) times country’s population.

227

Thousands of additional person-years of lifeCountry

Medical teams 1 2 30 0 0 01 45 20 502 70 45 703 90 75 804 105 110 1005 120 150 130

Formulation of the problem

Problem requires three interrelated decisions: howmany teams to allocate to the three countries (stages).

xn is the number of teams to allocate to stage n.

What are the states? What changes from one stage toanother?

sn = number of medical teams still available forremaining countries.

Thus: s1 = 5, s2 = 5 – x1, s3 = s2 – x2.

228

13

States to be considered

229

Thousands of additionalperson-years of life

Country

Medicalteams

1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

Overall problem

pi(xi): measure of performance for allocating ximedical teams to country i.

230

3

1

Maximize ( ),i ii

p x

3

1

subject to

5,ii

x

and are nonnegative integers.ix

14

Policy

Recursive relationship relating functions:

231

* *10,1, ,

( ) max ( ) ( ) , for 1, 2n n

n n n n n n nx sf s p x f s x n

3

*3 3 3 30,1, ,

( ) max ( ), for 3nx s

f s p x n

Solution procedure

For last stage n = 3, values of p3(x3) are the last columnof table. Here, x3

* = s3 and f3*(s3) = p3(s3).

232

n = 3: s3 f3*(s3) x3

*

0 0 0

1 50 12 70 2

3 80 34 100 4

5 130 5


Country

Medicalteams

1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

15

Stage n = 2

Here, finding x2* requires calculating f2(s2, x2) for the

values of x2 = 0, 1, …, s2. Example for s2 = 2:

233

2

0

2

45

0

0

70

150

20State:


Country

Medicalteams

1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

Stage n = 2

Similar calculations can be made for the other values of s2:

234

n = 2 : s2 x2

f2*(s2, x2) = p2(x2) + f3

*(s2 – x2)

0 1 2 3 4 5 f2*(s2) x2

*

0 0 0 01 50 20 50 02 70 70 45 70 0 or 13 80 90 95 75 95 24 100 100 115 125 110 125 35 130 120 125 145 160 150 160 4

16

Stage n = 1

Only state is the startingstate s1 = 5:

235

n = 1 : s1 x1f1

*(s1, x1) = p1(x1) + f2*(s1 – x1)

0 1 2 3 4 5 f1*(s1) x1

*

5 160 170 165 160 155 120 170 1

5

0

5

120

0

0

160

4125

45State:

...


Country

Medicalteams

1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

Optimal policy decision

236

17

Distribution of effort problem

One kind of resource is allocated to a number ofactivities.

In DP involves only one (or few) resources, while in LPcan deal with thousands of resources.

The assumptions of LP: proportionality, divisibility andcertainty can be violated by DP. Only additivity isnecessary because of the principle of optimality.

World Health Council problem violates proportionalityand divisibility (only integers are allowed).

237

Formulation of distribution of effort

When system starts at stage n in state sn, choice xnresults in the next state at stage s + 1:

238

Stage activity ( 1, 2, , ),decision variable for stage ,

State amount of resource still available for allocatingto remaining activities ( , , )

n

n

n n n nx ns

n N

Stage: n

State: snxn sn – xn

n + 1

18

Example

Distributing scientists to research teams3 teams are solving engineering problem to safely flypeople to Mars.

2 extra scientists reduce the probability of failure.

239

Probability of failure

Team

New scientists 1 2 3

0 0.40 0.60 0.80

1 0.20 0.40 0.50

2 0.15 0.20 0.30

Continuous dynamic programming

Previous examples had a discrete state variable sn, ateach stage.

They all have been reversible; the solution procedurecould have moved backward or forward stage bystage.

Next example is continuous. As sn can take any valuesin certain intervals, the solutions fn

*(sn) and xn* must

be expressed as functions of sn.

Stages in the next example will correspond to timeperiods, so the solution must proceed backwards.

240

19

Example: scheduling jobs

The company Local Job Shop needs to scheduleemployment jobs due to seasonal fluctuations.

Machine operators are difficult to hire and costly to train.Peak season payroll should not be maintained afterwards.Overtime work on a regular basis should be avoided.

Requirements in near future:

241

Season Spring Summer Autumn Winter Spring

Requirements 255 220 240 200 255

Example: scheduling jobs

Employment above level in the table costs 2000€ perperson per season.

Total cost of changing level of employment from oneseason to the other is 200€ times the square of thedifference in employment levels.

Fractional levels are possible due to part-timeemployees.

242

20

Formulation

From data, maximum employment should be 255(spring). It is necessary to find the level of employmentfor other seasons. Seasons are stages.

One cycle of four seasons, where stage 1 is summerand stage 4 is spring.

xn = employment level for stage n (n =1,2,3,4); x4=255rn = minimum employment level for stage n: r1=220,r2=240, r3=200, r4=255. Thus:

rn xn 255

243

Formulation

Cost for stage n:

n = 200(xn – xn–1)2 + 2000(xn – rn)State sn is the employment in the preceding season xn–1

sn = xn–1 (n=1: s1 = x0 = x4 = 255)

Problem:

244

42

11

minimize 2000( ) 200( ) ,i i i ii

x x x r

subject to 255, for 1, 2,3,4i ir x i

21

Detailed formulation

Choose x1, x2 and x3 so as to minimize the cost:

245

n rn Feasible xn Possible sn = xn-1 Cost

1 220 220 £ x1 £ 255 s1 = 255 200(x1 - 255)2 + 2000(x1 - 220)

2 240 240 £ x2 £ 255 220 £ s2 £ 255 200(x2 - x1)2 + 2000(x2 - 240)

3 200 200 £ x3 £ 255 240 £ s3 £ 255 200(x3 - x2)2 + 2000(x3 - 200)

4 255 x4 = 255 200 £ s4 £ 255 200(255 - x3)2

Formulation

Recursive relationship:

Basic structure of the problem:

246

* 2 *1255

( ) min 200( ) 2000( ) ( )n n

n n n n n n n nr xf s x s x r f x

22

Solution procedure

Stage 4: the solution is known to be x4* = 255.

Stage 3: possible values of 240 £ s3 £ 255:

graphical solution:

247

s4 f4* (s4) x4

*

200 £ s4 £ 255 200(255 - s4)2 255

* 2 *3 3 3 3 3 4 3200 255

2 23 3 3 3200 255

( ) min 200( ) 2000( 200) ( )

min 200( ) 2000( 200) 200(255 )n

n

x

x

f s x s x f x

x s x x

Graphical solution for f3*(x3)

248

23

Calculus solution for f3*(x3)

Using calculus:

249

3 3 3 3 3 33

3 3

* 33

( , ) 400( ) 2000 400(255 )

400(2 250) 0250

2

f s x x s xx

x ssx

s3 f3* (s3) x3

*

240 £ s3 £ 255 50(250- s3)2+50(260- s3)2

+1000(s3-150) (s3+250)/2

Stage 2

Solved in a similar fashion:

for 220£ s2 £255 (possible values) and 240£ x2 £ 255(feasible values).

Calculating / x2[f2(s2, x2)] = 0, yields:

250

2 *2 2 2 2 2 2 2 3 3

2 *2 2 2 3 3

2 22 2 2

( , ) 200( ) 2000( ) ( )

200( ) 2000( 240) ( )

50(250 ) 50(260 ) 1000( 150)

f s x x s x r f x

x s x f xx x x

* 22

2 2403

sx

24

Stage 2

The solution has to be feasible for 220 £ s2 £ 255 (i.e.,240 £ x2 £ 255 for 220 £ s2 £ 255)!

is only feasible for 240 £ s2 £ 255.

Need to solve for feasible value of x2 that minimizesf2(s2, x2) when 220 £ s2 < 240.

When s2 < 240,

so x2* = 240.

251

* 22

2 2403

sx

2 2 2 22

( , ) 0 for 240 255f s x xx

Stage 2 and Stage 1

Stage 1: procedure is similar.

Solution:x1

* = 247.5; x2* = 245; x3

* = 247.5; x4* = 255

Total cost of 185 000 €

252

s2 f2* (s2) x2

*

220 £ s2 £ 240 200(240-s2)2+115000 240

240 £ s2 £ 255 200/9[(240- s2)2+(255- s2)2 (270-s2)2 ]+2000(s2-195) (2s2+240)/3

s1 f1* (s1) x1

*

255 185000 247.5

25

Probabilistic dynamic programming

Next stage is not completely determined by state andpolicy decision at the current one

There is a probability distribution for determining thenext state, see figure.

S = number of possible states at stage n + 1.

System goes to i (i = 1,2,…,S) with probability pi givenstate sn and decision xn at stage n.

Ci = contribution of stage n to the objective function.

If figure is expanded to all possible states anddecisions at all stages, it is a decision tree.

253

Basic structure

254

26

Probabilistic dynamic programming

Relation between fn(sn, xn) and f*n+1 (sn+1) depends on

the form of the objective function.

Example: minimize the expected sum of thecontributions from individual stages.

fn(sn, xn) is the minimum expected sum from stage nonward, given state sn, and policy decision xn, at stage n:

with

255

*1

1

( , ) ( )S

n n n i i ni

f s x p C f i

1

*1 1 1( ) min ( , )

nn n nx

f i f i x

Example: determining reject allowances

The Hit-and-Mass Manufacturing Company receivedan order to supply one item of a particular type.

Customer requires specified stringent quality requirements.

Manufacturer has to produce more than one to achieve oneacceptable. Number of extra items is the reject allowance.

Probability of acceptable or defective is ½.

Number of acceptable items in a lot size of L has a binomialdistribution: probability of not acceptable item is (1/2)L.

Setup cost is 300, cost per item is 100. Maximum productionruns is 3. Cost of no acceptable item after 3 runs: 1600.

256

27

Formulation

Objective: determine policy regarding lot size(1 + reject allowance) for required production run(s)that minimizes total expected cost.

Stage n = production run n (n = 1,2,3),xn = lot size for stage n,

State sn = number of acceptable items still needed (1or 0) at the beginning of stage n.

At stage 1, state s1 = 1.

257

Formulation

fn(sn, xn) = total expected costs for stages n,…,3 andoptimal decisions are made thereafter:

where (fn*(0) = 0).

Consider monetary unit = 100. Contribution to costfrom stage n is [K(xn) + xn], with

Note that f4*(1) = 16.

258

*

0,1,( ) min ( , )

nn n n n nx

f s f s x

0, if 0( )

3, if 0n

nn

xK x

x

28

Basic structure of the problem

Recursive relationship:

259

* *10,1,2,

1(1) min ( ) (1)2

for 1, 2,3

n

n

x

n n n nxf K x x f

n

Solution procedure

260

s3 x3f3

*(1, x3) = K(x3) + x3 +16(1/2)x3

n = 3: 0 1 2 3 4 5 f3*(s3) x3

*

0 0 0

1 16 12 9 8 8 8 1/2 8 3 or 4

s2 x2f2

*(1, x2) = K(x2) + x2 +16(1/2)x2 f3*(1)

n = 2: 0 1 2 3 4 f2*(s2) x2

*

0 0 0

1 8 8 7 7 7 1/2 7 2 or 3

s1 x1f1

*(1, x1) = K(x1) + x1 +16(1/2)x1 f2*(1)

n = 1: 0 1 2 3 4 f1*(s1) x1

*

0 0

1 7 7 1/2 6 3/4 6 7/8 7 7/16 6 3/4 2

29

Solution

Optimal policy: produce two items on the firstproduction run;

if none is acceptable, then produce either two or threeitems on the second production run;

if none is acceptable, then produce either three or fouritems on the third production run.

The total expected cost for this policy is 675.

261

Conclusions

Dynamic programming: very useful technique formaking a sequence of interrelated decisions. Itrequires formulating an appropriate recursiverelationship for each individual problem.

Example: problem has 10 stages with 10 states and 10possible decisions at each stage.

Exhaustive enumeration must consider up to 10 billioncombinations.

Dynamic programming need make no more than athousand calculations (10 for each state at each stage).

262

od dynamic programming 2010

Documents