od dynamic programming large 2010

7/24/2019 OD Dynamic Programming LARGE 2010

http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 1/58

DYNAMIC

PROGRAMMING



Dynamic Programming

It is a useful mathematical technique for making a

sequence of interrelated decisions.

Systematic procedure for determining the optimal

combinations of decisions.

There is no standard mathematical formulation of

“the” Dynamic Programming problem.

Knowing when to apply dynamic programming

depends largely on experience with its general

structure.

206



Prototype example

Stagecoach problem

Fortune seeker that wants to go from Missouri to

California in the mid-19th century.

Travel has 4 stages. A is Missouri and J is California. Cost is the life insurance of a specific route; lowest

cost is equivalent to a safest trip.

207



Costs

Cost cij of going from state i to state j is:

Problem: which route minimizes the total cost of thepolicy?

208

B C D E F G H I J

A 2 4 3 B 7 4 6 E 1 4 H 3

C 3 2 4 F 6 3 I 4 D 4 1 5 G 3 3



Solving the problem

Note that greedy approach does not work.

Solution A B F I J has total cost of 13.

However, e.g. A D F is cheaper than A B F .

Other possibility: trial-and-error. Too much effort even

for this simple problem.

Dynamic programming is much more efficient than

exhaustive enumeration, especially for large problems.

Usually starts from the last stage of the problem, and

it enlarges it one stage at a time.

209



Formulation

Decision variables xn (n = 1, 2, 3, 4) are the immediate

destination of stage n. Route is A x1 x2 x3 x4, where x4 = J .

Total cost of the best overall policy for the remaining

stages is f n(s, xn).

Actual state is s, ready to start stage n, selecting xn as

the immediate destination.

xn* minimizes f n(s, xn) and f n

*(s, xn) is the minimum

value of f n(s, xn):

210

* *( ) min ( , ) ( , )n

n n n n n x

f s f s x f s x



Formulation

where

value of csxnis given by cij where i = s (current state)

and j = xn (immediate destination).

Objective: find f 1*( A) and the corresponding route.

Dynamic programming finds successively f 4*(s), f 3

*(s),

f 2*(s) and finally f 1

*( A).

211

1

( , ) immediate cost (stage ) + minimum future cost (stages +1 onward)

( )n

n n

sx n n

f s x n n

c f x



Solution procedure

When n = 4, the route is determined by its current

state s ( H or I ) and its final destination J .

Since f 4*(s) = f 4

*(s, J ) = csJ , the solution for n = 4 is

212

s f 4* (s ) x 4

*

H 3 J

I 4 J



Stage n = 3

Needs a few calculations. If fortune seeker is in state F ,

he can go to either H or I with costs cF,H = 6 or cF,I = 3. Choosing H , the minimum additional cost is f 4

*( H ) = 3.

Total cost is 6 + 3 = 9.

Choosing I , the total cost is 3 + 4 = 7. This is smaller, and

it is the optimal choice for state F .

213

F

H

I

6

3

3

4



Stage n = 3

Similar calculations can be made for the two possible

states s = E and s = G, resulting in the table for n = 3:

214

s x

3

f

3

s, x

3

) =

c

sx

3

+

f

4

x

3

)

H I f

3

s

)

x

3

E 4 8 4 H

F 9 7 7 I

G 6 7 6 H



Stage n = 2

In this case, f 2*(s, x2) = csx2

+ f 4*( x3).

Example for node C :

x2 = E : f 2*(C, E ) = cC,E + f 3

*( E ) = 3 + 4 = 7 optimal

x2 = F : f 2*(C, F ) = cC,F + f 3

*(F ) = 2 + 7 = 9.

x2 = G: f 2*

(C, G) = cC,G+ f 3*

(G) = 4 + 6 = 10.

215

C

E

G

3

4

4

6

F

7

2



Stage n = 2

Similar calculations can be made for the two possible

states s = B and s = D, resulting in the table for n = 2:

216

s x

2

f

2

s, x

2

) =

c

sx

2

+

f

3

x

2

)

E F G f

2

s

)

x

2

B 11 11 12 11 E or F

C 7 9 10 7 E

D 8 8 11 8 E or F



Stage n = 1

Just one possible starting state: A.

x1 = B: f 2*( A, B) = c A,B + f 2*( B) = 2 + 11 = 13.

x1 = C : f 2*( A, C ) = c A,C + f 2

*(C ) = 4 + 7 = 11 optimal

x1 = D: f 2*( A, D) = c A,D + f 2

*( D) = 3 + 8 = 11 optimal

Resulting in the table:

217

s x

1

f

1

s, x

1

) =

c

sx

1

+

f

2

x

1

)

B C D f

1

s

)

x

1

A 13 11 11 11 C or D



Optimal solution

Three optimal solutions, all with f 1*( A) = 11:

218



Characteristics of DP

1. The problem can be divided into stages, with a policy

decision required at each stage. Example: 4 stages and life insurance policy to choose.

Dynamic programming problems require making a

sequence of interrelated decisions .

2. Each stage has a number of states associated with

the beginning of each stage.

Example: states are the possible territories where the

fortune seeker could be located.

States are possible condit ions in which the system

might be.

219




3. Policy decision transforms the current state to a state

associated with the beginning of the next stage. Example: fortune seeker’s decision led him from his

current state to the next state on his journey.

DP problems can be interpreted in terms of networks :

each node correspond to a state .

Value assigned to each link is the immediate

contribution to the objective function from making

that policy decision.

Objective corresponds to finding the shortest or the

longest path .

220




4. The solution procedure finds an optimal policy for

the overall problem. Finds a prescription of theoptimal policy decision at each stage for each of the

possible states.

Example: solution procedure constructed a table for

each stage, n , that prescribed the optimal decision, xn*,for each possible state s.

In addition to identifying optimal solut ions , DP

provides a policy prescription of what to do under

every possible circumstance (why a decision is calledpolicy decision).

221




5. Given the current state, an optimal policy for the

remaining stages is independent of the policydecisions adopted in previous stages .

Optimal immediate decision depends only on the

current state: this is the principle of optimality for DP.

Example: at any state, the insurance policy is

independent on how the fortune seeker got there.

Knowledge of the current state conveys all information

necessary for determining the optimal policy

henceforth (Markovian property ).

222




6. Solution procedure begins by finding the optimal

policy for the last stage . Solution is usually trivial.7. A recursive relationship that identifies optimal policy

for stage n, given optimal policy for stage n + 1, is

available.

Example: recursive relationship was

Recursive relationship differs somewhat among

dynamic programming problems.

223

* *

1( ) min ( )n

n

n sx n n x

f s c f x




7. (cont.) Notation:

224

*

number of stages.

label for current stage ( 1, 2, , ).

current for stage .

decision variable for stage .

optimal value of (given ).

( , ) contribution of stages , 1, , to o

n

n

n n n

n n n

N

n n N

s state n

x n

x x s

f s x n n N

* *

bjective function,

system starts in state at stage and immediate decision is .

( , )

n n

n n n n

s n x

f f s x




7. (cont.) recursive relationship:

where f n(sn, xn) is written in terms of sn, xn and .

8. Using recursive relationship, solution procedure

starts at the end and moves backward stage by

stage.

It stops when finds optimal policy starting at initial stage.

The optimal policy for the entire problem was found. Example: the tables for the stages shown this procedure.

225

*

1 1( )n n f s

* *( ) max ( , ) or ( ) min ( , )nn

n n n n n n n n n n x x

f s f s x f s f s x



Deterministic dynamic programming

Determinist ic problems : the state at the next stage is

completely determined by the state and the policy decision at the current stage.

Form of the objective function: minimize or maximize

the sum, product , etc. of the contributions from the

individual stages.

Set of states: may be discrete or continuous, or a state

vector . Decision variables can also be discrete or

continuous.

226

l d l



Example: medical teams

The World Health Council has five medical teams to allocate to

three underdeveloped countries. Measure of performance: additional person-years of life, i.e.,

increased life expectancy (in years) times country’s population.

227

Thousands of additional person-years of life

Country

Medical teams 1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

l i f h bl



Formulation of the problem

Problem requires three int errelated decisions : how

many teams to allocate to the three countries (stages). xn is the number of teams to allocate to stage n.

What are the states? What changes from one stage to

another?

sn = number of medical teams still available for

remaining countries.

Thus: s1 = 5, s2 = 5 – x1, s3 = s2 – x2.

228

S b id d



States to be considered

229

Thousands of additional

person-years of life

Country

Medical

teams

1 2 3

0 0 0 01 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

O ll bl



Overall problem

pi( xi): measure of performance for allocating xi

medical teams to country i.

230

3

1

Maximize ( ),i i

i

p x

3

1

subject to

5,i

i

x

and are nonnegative integers.i x

P li



Policy

Recursive relat ionship relating functions:

231

* *

10,1, ,

( ) max ( ) ( ) , for 1,2n n

n n n n n n n x s

f s p x f s x n

3

*

3 3 3 30,1, ,

( ) max ( ), for 3n x s

f s p x n

S l ti d



Solution procedure

For last stage n = 3, values of p3( x3) are the last column

of table. Here, x3* = s3 and f 3*(s3) = p3(s3).

232

n

= 3:

s

3

f

3

*

s

3

)

x

3

*

0 0 0

1 50 1

2 70 2

3 80 3

4 100 4

5 130 5



Country

Medical

teams

1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

St 2



Stage n = 2

Here, finding x2* requires calculating f 2(s2, x2) for the

values of x2 = 0, 1, …, s2. Example for s2 = 2:

233

2

0

2

45

0

0

70

1

5020

State:



Country

Medical

teams

1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

St 2



Stage n = 2

Similar calculations can be made for the other values of s2:

234

n = 2 : s

2

x

2

f

2

s

2

, x

2

) =

p

2

x

2

) +

f

3

s

2

–x

2

)

0 1 2 3 4 5 f

2

s

2

) x

2

0 0 0 0

1 50 20 50 0

2 70 70 45 70 0 or 1

3 80 90 95 75 95 2

4 100 100 115 125 110 125 3

5 130 120 125 145 160 150 160 4

St 1



Stage n = 1

Only state is the starting

state s1 = 5:

235

n = 1 : s

1

x

1

f

1

s

1

, x

1

) =p

1

x

1

) + f

2

s

1

–x

1

)

0 1 2 3 4 5 f

1

s

1

) x

1

5 160 170 165 160 155 120 170 1

5

0

5

120

0

0

160

4

12545State:

.

.

.



Country

Medical

teams

1 2 3

0 0 0 0

1 45 20 50

2 70 45 70

3 90 75 80

4 105 110 100

5 120 150 130

O ti l li d i i



Optimal policy decision

236

Di t ib ti f ff t bl



Distribution of effort problem

One kind of resource is allocated to a number of

activities . In DP involves only one (or few) resources, while in LP

can deal with thousands of resources.

The assumptions of LP: proportionality , divisibility and

certainty can be violated by DP. Only additivity is

necessary because of the principle of opt imalit y .

World Health Council problem violates proportionality

and divisibility (only integers are allowed).

237

F l ti f di t ib ti f ff t



Formulation of distribution of effort

When system starts at stage n in state sn, choice xn

results in the next state at stage s + 1:

238

Stage activity ( 1,2, , ),

decision variable for stage ,

State amount of resource still available for allocating

to remaining activities ( , , )

n

n

n n n n

x n

s

n N

Stage: n

State: sn xn

sn – xn

n + 1

Example



Example

Distributing scientists to research teams

3 teams are solving engineering problem to safely flypeople to Mars.

2 extra scientists reduce the probability of failure.

239

Probability of failure

Team

New scientists 1 2 3

0 0.40 0.60 0.80

1 0.20 0.40 0.502 0.15 0.20 0.30

Continuous dynamic programming



Continuous dynamic programming

Previous examples had a discrete state variable sn, at

each stage. They all have been reversible; the solution procedure

could have moved backward or forward stage by

stage.

Next example is continuous. As sn can take any values

in certain intervals, the solutions f n*(sn) and xn

* must

be expressed as functions of sn.

Stages in the next example will correspond to time

periods , so the solution must proceed backwards.

240

Example: scheduling jobs




The company Local Job Shop needs to schedule

employment jobs due to seasonal fluctuations. Machine operators are difficult to hire and costly to train.

Peak season payroll should not be maintained afterwards.

Overtime work on a regular basis should be avoided.

Requirements in near future:

241

Season Spring Summer Autumn Winter Spring

Requirements 255 220 240 200 255





Employment above level in the table costs 2000€ per

person per season. Total cost of changing level of employment from one

season to the other is 200€ times the square of the

difference in employment levels.

Fractional levels are possible due to part-time

employees.

242

Formulation



Formulation

From data, maximum employment should be 255

(spring). It is necessary to find the level of employmentfor other seasons. Seasons are stages .

One cycle of four seasons, where stage 1 is summer

and stage 4 is spring.

xn = employment level for stage n (n =1,2,3,4); x4=255

r n = minimum employment level for stage n: r 1=220,

r 2=240, r 3=200, r 4=255. Thus:

r n xn 255

243

Formulation



Formulation

Cost for stage n:

n = 200( xn – xn –1)2 + 2000( xn – r n)

State sn is the employment in the preceding season xn –1

sn = xn –1 (n=1: s1 = x0 = x4 = 255)

Problem:

244

42

1

1

minimize 2000( ) 200( ) ,i i i i

i

x x x r

subject to 255, for 1,2,3,4i i

r x i

Detailed formulation



Detailed formulation

Choose x1, x2 and x3 so as to minimize the cost:

245

n r

n

Feasiblex

n

Possibles

n

= x

n-1

Cost

1 220 220£ x1 £ 255 s

1 =255 200(x 1 - 255)2+2000(x 1 - 220)

2 240 240£ x2 £ 255 220£ s

2 £ 255 200(x 2 - x 1)

2 +2000(x 2

- 240)

3 200 200£ x3 £ 255 240£ s

3 £ 255 200(x 3 - x 2)

2 +2000(x 3

- 200)

4 255 x4 =255 200£ s

4 £ 255 200(255 - x 3)

2

Formulation



Formulation

Recursive relationship:

Basic structure of the problem:

246

* 2 *

1255

( ) min 200( ) 2000( ) ( )n n

n n n n n n n nr x

f s x s x r f x

Solution procedure



Solution procedure

Stage 4: the solution is known to be x 4* = 255.

Stage 3: possible values of 240£

s3£

255:

graphical solution:

247

s

4

f

4

s

4

) x

4

200 £ s4 £ 255 200(255 - s

4 )2 255

* 2 *

3 3 3 3 3 4 3200 255

2 2

3 3 3 3200 255

( ) min 200( ) 2000( 200) ( )

min 200( ) 2000( 200) 200(255 )

n

n

x

x

f s x s x f x

x s x x

Graphical solution for f *(x )



Graphical solution for f 3 ( x3)

248

Calculus solution for f *(x )



Calculus solution for f 3 ( x3)

Using calculus:

249

3 3 3 3 3 3

3

3 3

* 33

( , ) 400( ) 2000 400(255 )

400(2 250) 0

250

2

f s x x s x x

x s

s x

s

3

f

3

s

3

)

x

3

240 £ s3 £ 255

50(250- s3 )2+50(260- s

3 )2

+1000(s3 - 150)

(s3 +250)/2

Stage 2



Stage 2

Solved in a similar fashion:

for 220£ s2 £255 (possible values) and 240£ x2 £ 255

(feasible values).

Calculating / x2[ f 2(s2, x2)] = 0, yields:

250

2 *

2 2 2 2 2 2 2 3 3

2 *

2 2 2 3 3

2 2

2 2 2

( , ) 200( ) 2000( ) ( )

200( ) 2000( 240) ( )

50(250 ) 50(260 ) 1000( 150)

f s x x s x r f x

x s x f x

x x x

* 22

2 240

3

s x

Stage 2



Stage 2

The solution has to be feasible for 220 £ s2 £ 255 (i.e.,

240 £ x2 £ 255 for 220 £ s2 £ 255)!

is only feasible for 240 £ s2 £ 255.

Need to solve for feasible value of x2

that minimizes

f 2(s2, x2) when 220 £ s2 < 240.

When s2 < 240,

so x2*= 240.

251

* 22

2 240

3

s x

2 2 2 2

2

( , ) 0 for 240 255 f s x x x

Stage 2 and Stage 1



Stage 2 and Stage 1

Stage 1: procedure is similar.

Solution:

x1

* = 247.5; x2

* = 245; x3

* = 247.5; x4

* = 255

Total cost of 185 000 €

252

s

2

f

2

s

2

)

x

2

220 £ s 2 £ 240 200(240- s

2 )2+115000 240

240 £ s 2 £ 255

200/9[(240- s 2 )2+(255- s

2 )2 (270-

s 2 )2 ]+2000(s

2 - 195)

(2s 2 +240)/3

s

1

f

1

s

1

)

x

1

255 185000 247.5

Probabilistic dynamic programming




Next stage is not completely determined by state and

policy decision at the current one There is a probability distribution for determining the

next state, see figure.

S = number of possible states at stage n + 1.

System goes to i (i = 1,2,…,S ) with probability pi given

state sn and decision xn at stage n.

C i = contribution of stage n to the objective function.

If figure is expanded to all possible states and

decisions at all stages, it is a decision tree.

253

Basic structure



Basic structure

254





Relation between f n(sn, xn) and f *n+1 (sn+1) depends on

the form of the objective function. Example: minimize the expected sum of the

contributions from individual stages.

f n(sn, xn) is the minimum expected sum from stage n

onward, given state sn, and policy decision xn, at stage n:

with

255

*

1

1

( , ) ( )S

n n n i i n

i

f s x p C f i

1

*

1 1 1( ) min ( , )n

n n n x

f i f i x

Example: determining reject allowances



Example: determining reject allowances

The Hit-and-Mass Manufacturing Company received

an order to supply one item of a particular type. Customer requires specified stringent quality requirements.

Manufacturer has to produce more than one to achieve one

acceptable. Number of extra items is the reject allowance.

Probability of acceptable or defective is ½. Number of acceptable items in a lot size of L has a binomial

distribution: probability of not acceptable item is (1/2) L.

Setup cost is 300, cost per item is 100. Maximum production

runs is 3. Cost of no acceptable item after 3 runs: 1600.

256

Formulation



Formulation

Objective: determine policy regarding lot size

(1 + reject allowance) for required production run(s)that minimizes total expected cost.

Stage n = production run n (n = 1,2,3),

xn = lot size for stage n,

State sn = number of acceptable items still needed (1

or 0) at the beginning of stage n.

At stage 1, state s1 = 1.

257

Formulation



Formulation

f n(sn, xn) = total expected costs for stages n,…,3 and

optimal decisions are made thereafter:

where ( f n*(0) = 0).

Consider monetary unit = 100. Contribution to costfrom stage n is [K ( xn) + xn], with

Note that f 4*(1) = 16.

258

*

0,1,( ) min ( , )

nn n n n n

x f s f s x

0, if 0( )

3, if 0

n

n

n

xK x

x

Basic structure of the problem



Basic structure of the problem

Recursive relationship:

259

* *

10,1,2,

1(1) min ( ) (1)

2

for 1,2,3

n

n

x

n n n n x

f K x x f

n

Solution procedure



Solution procedure

260

s

3

x

3

f

3

1,x

3

) =K x

3

) +x

3

+16 1/2)

x

3

n

= 3: 0 1 2 3 4 5

f

3

s

3

)

x

3

0 0 0

1 16 12 9 8 8 8 1/2 8 3 or 4

s

2

x

2

f

2

1,x

2

) =K x

2

) +x

2

+16 1/2)

x

2

f

3

1)

n

= 2: 0 1 2 3 4

f

2

s

2

)

x

2

0 0 0

1 8 8 7 7 7 1/2 7 2 or 3

s

1

x

1

f

1

1,x

1

) =K x

1

) +x

1

+16 1/2)

x

1

f

2

1)

n = 1: 0 1 2 3 4 f

1

s

1

) x

1

0 0

1 7 7 1/2 6 3/4 6 7/8 7 7/16 6 3/4 2

Solution



Solution

Optimal policy : produce two items on the first

production run; if none is acceptable, then produce either two or three

items on the second production run;

if none is acceptable, then produce either three or four

items on the third production run.

The total expected cost for this policy is 675.

261

Conclusions



Conclusions

Dynamic programming: very useful technique for

making a sequence of int errelated decisions . It requires formulat ing an appropriate recursive

relat ionship for each individual problem.

Example: problem has 10 stages with 10 states and 10

possible decisions at each stage. Exhaustive enumeration must consider up to 10 billion

combinations.

Dynamic programming need make no more than a

thousand calculations (10 for each state at each stage).

od dynamic programming large 2010

Documents