od dynamic programming large 2010

58
DYNAMIC PROGRAMMING

Upload: carolinarvsocn

Post on 23-Feb-2018


Page 1: OD Dynamic Programming LARGE 2010

7/24/2019 OD Dynamic Programming LARGE 2010

http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 1/58

DYNAMIC

PROGRAMMING


Dynamic Programming

It is a useful mathematical technique for making a

sequence of interrelated decisions.

Systematic procedure for determining the optimal

combinations of decisions.

There is no standard mathematical formulation of 

“the” Dynamic Programming problem.

Knowing when to apply dynamic programming

depends largely on experience with its general

structure.


Prototype example

Stagecoach problem

A fortune seeker wants to go from Missouri to California in the mid-19th century.

The journey has 4 stages. A is Missouri and J is California.

The cost of a route is the cost of its life insurance policy; the lowest cost corresponds to the safest trip.


Costs

Cost $c_{ij}$ of going from state $i$ to state $j$:

From   B   C   D
A      2   4   3

From   E   F   G
B      7   4   6
C      3   2   4
D      4   1   5

From   H   I
E      1   4
F      6   3
G      3   3

From   J
H      3
I      4

Problem: which route minimizes the total cost of the policy?


Solving the problem

Note that the greedy approach (always choosing the cheapest next link) does not work.

The greedy solution A → B → F → I → J has a total cost of 13.

However, e.g., A → D → F is cheaper than A → B → F.

Another possibility is trial and error, which takes too much effort even for this simple problem.

Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.

It usually starts from the last stage of the problem and enlarges it one stage at a time.
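The comparison between the greedy route and exhaustive enumeration can be checked with a short sketch. The dictionary `COST` encodes the link costs of the stagecoach example; the function names `route_cost`, `greedy_route` and `all_routes` are illustrative choices, not part of the slides.

```python
from itertools import product

# Link costs c_ij of the stagecoach example.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def route_cost(route):
    """Sum the link costs along a route given as a sequence of states."""
    return sum(COST[a][b] for a, b in zip(route, route[1:]))


def greedy_route(start="A", goal="J"):
    """Always take the cheapest link out of the current state."""
    route = [start]
    while route[-1] != goal:
        options = COST[route[-1]]
        route.append(min(options, key=options.get))
    return route


def all_routes():
    """Enumerate every route A -> x1 -> x2 -> x3 -> J."""
    return [("A", x1, x2, x3, "J") for x1, x2, x3 in product("BCD", "EFG", "HI")]


greedy = greedy_route()
best = min(all_routes(), key=route_cost)
print(greedy, route_cost(greedy))            # greedy route and its cost
print(len(all_routes()), route_cost(best))   # number of routes and the best cost
```

Even for this small network the enumeration has to evaluate 18 complete routes, which is the effort the stage-by-stage procedure avoids.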


Formulation

Decision variables $x_n$ ($n = 1, 2, 3, 4$) are the immediate destination at stage $n$. The route is $A \to x_1 \to x_2 \to x_3 \to x_4$, where $x_4 = J$.

$f_n(s, x_n)$ is the total cost of the best overall policy for the remaining stages, given that the fortune seeker is in state $s$, ready to start stage $n$, and selects $x_n$ as the immediate destination.

$x_n^*$ minimizes $f_n(s, x_n)$, and $f_n^*(s)$ is the minimum value of $f_n(s, x_n)$:

$$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$$


Formulation

where

$$f_n(s, x_n) = \text{immediate cost (stage } n\text{)} + \text{minimum future cost (stages } n+1 \text{ onward)} = c_{s x_n} + f_{n+1}^*(x_n)$$

The value of $c_{s x_n}$ is given by $c_{ij}$ with $i = s$ (current state) and $j = x_n$ (immediate destination).

Objective: find $f_1^*(A)$ and the corresponding route.

Dynamic programming successively finds $f_4^*(s)$, $f_3^*(s)$, $f_2^*(s)$ and finally $f_1^*(A)$.
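The backward recursion can be sketched in a few lines of Python. This is a minimal illustration, assuming the boundary value $f_5^*(J) = 0$ (no cost remains once J is reached); the names `COST`, `STAGES` and `solve_stagecoach` are hypothetical.

```python
# Link costs c_ij of the stagecoach example.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}

# Possible states at the beginning of stages 1, 2, 3, 4.
STAGES = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I"]]


def solve_stagecoach(cost, stages):
    """Backward recursion f_n*(s) = min over x_n of c_{s,x_n} + f_{n+1}*(x_n)."""
    f_star = {"J": 0}   # assumed boundary condition: nothing left to pay at J
    x_star = {}
    for states in reversed(stages):          # stage 4 first, stage 1 last
        for s in states:
            totals = {x: c + f_star[x] for x, c in cost[s].items()}
            f_star[s] = min(totals.values())
            # keep every minimizing destination, so ties stay visible
            x_star[s] = sorted(x for x, t in totals.items() if t == f_star[s])
    return f_star, x_star


f_star, x_star = solve_stagecoach(COST, STAGES)
print(f_star["A"], x_star["A"])
```

Each state is evaluated once per outgoing link, so the work grows with the number of links rather than with the number of complete routes.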


Solution procedure

When $n = 4$, the route is determined by the current state $s$ ($H$ or $I$) and the final destination $J$.

Since $f_4^*(s) = f_4(s, J) = c_{sJ}$, the solution for $n = 4$ is

$s$    $f_4^*(s)$    $x_4^*$
H      3             J
I      4             J


Stage n = 3

This stage needs a few calculations. If the fortune seeker is in state $F$, he can go to either $H$ or $I$, with costs $c_{F,H} = 6$ or $c_{F,I} = 3$.

Choosing $H$, the minimum additional cost is $f_4^*(H) = 3$. The total cost is $6 + 3 = 9$.

Choosing $I$, the total cost is $3 + 4 = 7$. This is smaller, so $I$ is the optimal choice for state $F$.


Stage n = 3

Similar calculations can be made for the two possible

states s = E and s = G, resulting in the table for n = 3:

$f_3(s, x_3) = c_{s x_3} + f_4^*(x_3)$

$s$    $x_3 = H$    $x_3 = I$    $f_3^*(s)$    $x_3^*$
E      4            8            4             H
F      9            7            7             I
G      6            7            6             H


Stage n = 2

In this case, $f_2(s, x_2) = c_{s x_2} + f_3^*(x_2)$.

Example for node $C$:

$x_2 = E$: $f_2(C, E) = c_{C,E} + f_3^*(E) = 3 + 4 = 7$ (optimal)

$x_2 = F$: $f_2(C, F) = c_{C,F} + f_3^*(F) = 2 + 7 = 9$.

$x_2 = G$: $f_2(C, G) = c_{C,G} + f_3^*(G) = 4 + 6 = 10$.


Stage n = 2

Similar calculations can be made for the two possible

states s = B and s = D, resulting in the table for n = 2:

$f_2(s, x_2) = c_{s x_2} + f_3^*(x_2)$

$s$    $x_2 = E$    $x_2 = F$    $x_2 = G$    $f_2^*(s)$    $x_2^*$
B      11           11           12           11            E or F
C      7            9            10           7             E
D      8            8            11           8             E or F


Stage n = 1

Just one possible starting state: A.

$x_1 = B$: $f_1(A, B) = c_{A,B} + f_2^*(B) = 2 + 11 = 13$.

$x_1 = C$: $f_1(A, C) = c_{A,C} + f_2^*(C) = 4 + 7 = 11$ (optimal)

$x_1 = D$: $f_1(A, D) = c_{A,D} + f_2^*(D) = 3 + 8 = 11$ (optimal)

Resulting in the table:

$f_1(s, x_1) = c_{s x_1} + f_2^*(x_1)$

$s$    $x_1 = B$    $x_1 = C$    $x_1 = D$    $f_1^*(s)$    $x_1^*$
A      13           11           11           11            C or D


Optimal solution

Three optimal solutions, all with $f_1^*(A) = 11$: A → C → E → H → J, A → D → E → H → J, and A → D → F → I → J.
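The optimal routes can be recovered by following, from A, every destination that attains the minimum at each stage. The sketch below is one way to do this; `f_star` and `optimal_routes` are illustrative names, and memoization with `functools.lru_cache` is a convenience choice rather than something the slides prescribe.

```python
from functools import lru_cache

# Link costs c_ij of the stagecoach example.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


@lru_cache(maxsize=None)
def f_star(s):
    """Minimum cost from state s to the destination J."""
    if s == "J":
        return 0
    return min(c + f_star(x) for x, c in COST[s].items())


def optimal_routes(s="A"):
    """List every route from s to J whose cost equals f_star(s)."""
    if s == "J":
        return [["J"]]
    routes = []
    for x, c in COST[s].items():
        if c + f_star(x) == f_star(s):        # x is an optimal decision at s
            routes += [[s] + tail for tail in optimal_routes(x)]
    return routes


for route in optimal_routes():
    print(" -> ".join(route))
```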


Characteristics of DP

1. The problem can be divided into stages, with a policy decision required at each stage.

Example: 4 stages, with a life insurance policy to choose at each stage.

Dynamic programming problems require making a sequence of interrelated decisions.

2. Each stage has a number of states associated with the beginning of that stage.

Example: states are the possible territories where the fortune seeker could be located.

States are the possible conditions in which the system might be.


Characteristics of DP

3. The policy decision transforms the current state into a state associated with the beginning of the next stage.

Example: the fortune seeker's decision led him from his current state to the next state on his journey.

DP problems can be interpreted in terms of networks: each node corresponds to a state.

The value assigned to each link is the immediate contribution to the objective function from making that policy decision.

The objective corresponds to finding the shortest or the longest path.


Characteristics of DP

4. The solution procedure finds an optimal policy for the overall problem. It finds a prescription of the optimal policy decision at each stage for each of the possible states.

Example: the solution procedure constructed a table for each stage $n$ that prescribed the optimal decision, $x_n^*$, for each possible state $s$.

In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision).


Characteristics of DP

5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.

The optimal immediate decision depends only on the current state: this is the principle of optimality for DP.

Example: at any state, the insurance policy is independent of how the fortune seeker got there.

Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property).


Characteristics of DP

6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.

7. A recursive relationship is available that identifies the optimal policy for stage $n$, given the optimal policy for stage $n + 1$.

Example: the recursive relationship was

$$f_n^*(s) = \min_{x_n} \{ c_{s x_n} + f_{n+1}^*(x_n) \}$$

The recursive relationship differs somewhat among dynamic programming problems.


Characteristics of DP

7. (cont.) Notation:

$N$ = number of stages.

$n$ = label for current stage ($n = 1, 2, \ldots, N$).

$s_n$ = current state for stage $n$.

$x_n$ = decision variable for stage $n$.

$x_n^*$ = optimal value of $x_n$ (given $s_n$).

$f_n(s_n, x_n)$ = contribution of stages $n, n+1, \ldots, N$ to the objective function if the system starts in state $s_n$ at stage $n$ and the immediate decision is $x_n$.

$f_n^*(s_n) = f_n(s_n, x_n^*)$


Characteristics of DP

7. (cont.) Recursive relationship:

$$f_n^*(s_n) = \max_{x_n} \{ f_n(s_n, x_n) \} \quad \text{or} \quad f_n^*(s_n) = \min_{x_n} \{ f_n(s_n, x_n) \}$$

where $f_n(s_n, x_n)$ is written in terms of $s_n$, $x_n$ and $f_{n+1}^*(s_{n+1})$.

8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.

It stops when it finds the optimal policy starting at the initial stage.

At that point the optimal policy for the entire problem has been found. Example: the tables for the stages show this procedure.


Deterministic dynamic programming

Deterministic problems: the state at the next stage is completely determined by the state and the policy decision at the current stage.

Form of the objective function: minimize or maximize the sum, product, etc. of the contributions from the individual stages.

Set of states: may be discrete or continuous, or a state vector. Decision variables can also be discrete or continuous.


Example: medical teams

The World Health Council has five medical teams to allocate to

three underdeveloped countries. Measure of performance: additional person-years of life, i.e.,

increased life expectancy (in years) times country’s population.

Thousands of additional person-years of life

Medical teams   Country 1   Country 2   Country 3
0               0           0           0
1               45          20          50
2               70          45          70
3               90          75          80
4               105         110         100
5               120         150         130


Formulation of the problem

The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (stages).

$x_n$ is the number of teams to allocate to stage $n$.

What are the states? What changes from one stage to another?

$s_n$ = number of medical teams still available for the remaining countries.

Thus: $s_1 = 5$, $s_2 = 5 - x_1$, $s_3 = s_2 - x_2$.


States to be considered

[Figure: states to be considered at each stage. From the state definition, $s_1 = 5$, while $s_2$ and $s_3$ can each take the values 0, 1, ..., 5.]


Overall problem

$p_i(x_i)$: measure of performance for allocating $x_i$ medical teams to country $i$.

$$\text{Maximize } \sum_{i=1}^{3} p_i(x_i),$$

$$\text{subject to } \sum_{i=1}^{3} x_i = 5,$$

and $x_i$ are nonnegative integers.


Policy

Recursive relationship relating the functions:

$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \}, \quad \text{for } n = 1, 2$$

$$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3), \quad \text{for } n = 3$$
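The recursion for the medical teams problem can be sketched as follows. The table `P` holds the values $p_n(x_n)$ from the example; the function name `allocate` and the convention $f_4^*(s) = 0$ (used so that one loop covers all three stages) are assumptions made for this illustration. On ties the smallest optimal $x_n$ is kept.

```python
# p_n(x_n): thousands of additional person-years of life
# when x_n teams (list index) go to country n (dictionary key).
P = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}


def allocate(p, total):
    """Backward recursion f_n*(s) = max over x of p_n(x) + f_{n+1}*(s - x)."""
    n_stages = len(p)
    f_next = [0] * (total + 1)          # assumed boundary: f_{N+1}*(s) = 0
    policy = {}
    for n in range(n_stages, 0, -1):
        f_cur, x_cur = [], []
        for s in range(total + 1):
            values = [p[n][x] + f_next[s - x] for x in range(s + 1)]
            best = max(values)
            f_cur.append(best)
            x_cur.append(values.index(best))   # smallest optimal x on ties
        policy[n] = x_cur
        f_next = f_cur
    # Forward pass: read the optimal decisions starting from s_1 = total.
    s, plan = total, []
    for n in range(1, n_stages + 1):
        plan.append(policy[n][s])
        s -= plan[-1]
    return f_next[total], plan


value, plan = allocate(P, 5)
print(value, plan)
```

Because $p_3$ is increasing in this data, maximizing $p_3(x_3)$ over $x_3 \le s_3$ gives $x_3^* = s_3$, which is what the zero boundary condition reproduces.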


Solution procedure

For the last stage, $n = 3$, the values of $p_3(x_3)$ are given in the last column of the table. Here, $x_3^* = s_3$ and $f_3^*(s_3) = p_3(s_3)$.

n = 3:

$s_3$    $f_3^*(s_3)$    $x_3^*$
0        0               0
1        50              1
2        70              2
3        80              3
4        100             4
5        130             5


Stage n = 2

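Here, finding $x_2^*$ requires calculating $f_2(s_2, x_2)$ for the values $x_2 = 0, 1, \ldots, s_2$. Example for state $s_2 = 2$:

$f_2(2, 0) = p_2(0) + f_3^*(2) = 0 + 70 = 70$

$f_2(2, 1) = p_2(1) + f_3^*(1) = 20 + 50 = 70$

$f_2(2, 2) = p_2(2) + f_3^*(0) = 45 + 0 = 45$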


Stage n = 2

Similar calculations can be made for the other values of  s2:

n = 2: $f_2(s_2, x_2) = p_2(x_2) + f_3^*(s_2 - x_2)$

$s_2$   $x_2=0$   $x_2=1$   $x_2=2$   $x_2=3$   $x_2=4$   $x_2=5$   $f_2^*(s_2)$   $x_2^*$
0       0                                                           0              0
1       50        20                                                50             0
2       70        70        45                                      70             0 or 1
3       80        90        95        75                            95             2
4       100       100       115       125       110                 125            3
5       130       120       125       145       160       150       160            4


Stage n = 1

Only state is the starting

state s1 = 5:

n = 1: $f_1(s_1, x_1) = p_1(x_1) + f_2^*(s_1 - x_1)$

$s_1$   $x_1=0$   $x_1=1$   $x_1=2$   $x_1=3$   $x_1=4$   $x_1=5$   $f_1^*(s_1)$   $x_1^*$
5       160       170       165       160       155       120       170            1


Optimal policy decision
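The figure for this slide did not survive extraction. Assuming it traced the optimal decisions through the stage tables, the chain can be written as follows (a reconstruction from the tables, not the original figure; `\xrightarrow` needs the `amsmath` package):

```latex
s_1 = 5 \;\xrightarrow{\;x_1^* = 1\;}\; s_2 = 4 \;\xrightarrow{\;x_2^* = 3\;}\; s_3 = 1 \;\xrightarrow{\;x_3^* = 1\;}\; 0,
\qquad
f_1^*(5) = p_1(1) + p_2(3) + p_3(1) = 45 + 75 + 50 = 170 .
```

That is, one team goes to country 1, three teams to country 2 and one team to country 3, for 170 thousand additional person-years of life.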


Distribution of effort problem

One kind of resource is allocated to a number of activities.

DP involves only one (or a few) resources, whereas LP can deal with thousands of resources.

Of the assumptions of LP, proportionality, divisibility and certainty can be violated by DP. Only additivity is necessary, because of the principle of optimality.

The World Health Council problem violates proportionality and divisibility (only integers are allowed).


Formulation of distribution of effort

Stage $n$ = activity $n$ ($n = 1, 2, \ldots, N$),

$x_n$ = decision variable for stage $n$ (amount of resource allocated to activity $n$),

State $s_n$ = amount of resource still available for allocating to the remaining activities ($n, \ldots, N$).

When the system starts at stage $n$ in state $s_n$, choice $x_n$ results in the next state at stage $n + 1$ being $s_{n+1} = s_n - x_n$:

Stage:   n                 n + 1
State:   s_n  --(x_n)-->   s_n - x_n


Example

Distributing scientists to research teams

3 teams are working on an engineering problem: how to safely fly people to Mars.

2 extra scientists are available; assigning them to the teams reduces the probability of failure.

Probability of failure

New scientists   Team 1   Team 2   Team 3
0                0.40     0.60     0.80
1                0.20     0.40     0.50
2                0.15     0.20     0.30
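The slide does not state the objective explicitly. A minimal sketch, assuming the goal is to minimize the probability that all three teams fail (the product of the individual failure probabilities), shows how the same backward recursion handles a product instead of a sum. The names `FAIL` and `assign_scientists` are hypothetical.

```python
# FAIL[n][x]: probability that team n fails when it receives x new scientists.
FAIL = {
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}


def assign_scientists(fail, extra):
    """Backward recursion f_n*(s) = min over x of fail_n(x) * f_{n+1}*(s - x)."""
    f_next = [1.0] * (extra + 1)        # assumed boundary: empty product is 1
    policy = {}
    for n in sorted(fail, reverse=True):
        f_cur, x_cur = [], []
        for s in range(extra + 1):
            values = [fail[n][x] * f_next[s - x] for x in range(s + 1)]
            best = min(values)
            f_cur.append(best)
            x_cur.append(values.index(best))
        policy[n] = x_cur
        f_next = f_cur
    # Forward pass from the initial state (all extra scientists available).
    s, plan = extra, []
    for n in sorted(fail):
        plan.append(policy[n][s])
        s -= plan[-1]
    return f_next[extra], plan


prob, plan = assign_scientists(FAIL, 2)
print(round(prob, 3), plan)
```

Under this assumption the additivity requirement is replaced by a multiplicative structure, which the recursion accommodates by swapping the `+` for a `*`.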


Continuous dynamic programming

Previous examples had a discrete state variable $s_n$ at each stage.

They have all been reversible; the solution procedure could have moved backward or forward stage by stage.

The next example is continuous. As $s_n$ can take any value in certain intervals, the solutions $f_n^*(s_n)$ and $x_n^*$ must be expressed as functions of $s_n$.

Stages in the next example correspond to time periods, so the solution must proceed backward.


Example: scheduling jobs

The Local Job Shop company needs to plan its employment levels because of seasonal fluctuations.

Machine operators are difficult to hire and costly to train.

Peak season payroll should not be maintained afterwards.

Overtime work on a regular basis should be avoided.

Requirements in the near future:

Season         Spring   Summer   Autumn   Winter   Spring
Requirements   255      220      240      200      255


Example: scheduling jobs

Employment above the required level in the table costs 2000 € per person per season.

The total cost of changing the level of employment from one season to the next is 200 € times the square of the difference in employment levels.

Fractional levels are possible due to part-time employees.


Formulation

From the data, maximum employment should be 255 (spring). It is necessary to find the level of employment for the other seasons. Seasons are stages.

One cycle of four seasons, where stage 1 is summer and stage 4 is spring.

$x_n$ = employment level for stage $n$ ($n = 1, 2, 3, 4$); $x_4 = 255$.

$r_n$ = minimum employment level for stage $n$: $r_1 = 220$, $r_2 = 240$, $r_3 = 200$, $r_4 = 255$. Thus:

$$r_n \le x_n \le 255$$


Formulation

Cost for stage $n$:

$$200(x_n - x_{n-1})^2 + 2000(x_n - r_n)$$

State $s_n$ is the employment level in the preceding season, $x_{n-1}$:

$$s_n = x_{n-1} \quad (n = 1: \; s_1 = x_0 = x_4 = 255)$$

Problem:

$$\text{minimize } \sum_{i=1}^{4} \left[ 200(x_i - x_{i-1})^2 + 2000(x_i - r_i) \right],$$

$$\text{subject to } r_i \le x_i \le 255, \quad \text{for } i = 1, 2, 3, 4$$


Detailed formulation

Choose x1, x2 and x3 so as to minimize the cost:

$n$   $r_n$   Feasible $x_n$            Possible $s_n = x_{n-1}$   Cost
1     220     $220 \le x_1 \le 255$     $s_1 = 255$                $200(x_1 - 255)^2 + 2000(x_1 - 220)$
2     240     $240 \le x_2 \le 255$     $220 \le s_2 \le 255$      $200(x_2 - x_1)^2 + 2000(x_2 - 240)$
3     200     $200 \le x_3 \le 255$     $240 \le s_3 \le 255$      $200(x_3 - x_2)^2 + 2000(x_3 - 200)$
4     255     $x_4 = 255$               $200 \le s_4 \le 255$      $200(255 - x_3)^2$


Formulation

Recursive relationship:

$$f_n^*(s_n) = \min_{r_n \le x_n \le 255} \left\{ 200(x_n - s_n)^2 + 2000(x_n - r_n) + f_{n+1}^*(x_n) \right\}$$

Basic structure of the problem:

[Figure: basic structure of the problem.]


Solution procedure

Stage 4: the solution is known to be $x_4^* = 255$.

$s_4$                    $f_4^*(s_4)$          $x_4^*$
$200 \le s_4 \le 255$    $200(255 - s_4)^2$    255

Stage 3: possible values are $240 \le s_3 \le 255$:

$$f_3^*(s_3) = \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + f_4^*(x_3) \right\}$$

$$= \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + 200(255 - x_3)^2 \right\}$$

Graphical solution:


Graphical solution for $f_3^*(x_3)$


Calculus solution for $f_3^*(x_3)$

Using calculus:

$$\frac{\partial}{\partial x_3} f_3(s_3, x_3) = 400(x_3 - s_3) + 2000 - 400(255 - x_3) = 400(2 x_3 - s_3 - 250) = 0$$

$$\Rightarrow \quad x_3^* = \frac{s_3 + 250}{2}$$

$s_3$                    $f_3^*(s_3)$                                            $x_3^*$
$240 \le s_3 \le 255$    $50(250 - s_3)^2 + 50(260 - s_3)^2 + 1000(s_3 - 150)$   $(s_3 + 250)/2$


Stage 2

Solved in a similar fashion:

$$f_2(s_2, x_2) = 200(x_2 - s_2)^2 + 2000(x_2 - r_2) + f_3^*(x_2)$$

$$= 200(x_2 - s_2)^2 + 2000(x_2 - 240) + f_3^*(x_2)$$

$$= 200(x_2 - s_2)^2 + 2000(x_2 - 240) + 50(250 - x_2)^2 + 50(260 - x_2)^2 + 1000(x_2 - 150)$$

for $220 \le s_2 \le 255$ (possible values) and $240 \le x_2 \le 255$ (feasible values).

Setting $\frac{\partial}{\partial x_2} f_2(s_2, x_2) = 0$ yields:

$$x_2^* = \frac{2 s_2 + 240}{3}$$


Stage 2

The solution has to be feasible for $220 \le s_2 \le 255$ (i.e., $240 \le x_2 \le 255$ for $220 \le s_2 \le 255$)!

$$x_2^* = \frac{2 s_2 + 240}{3}$$

is only feasible for $240 \le s_2 \le 255$.

We need to solve for the feasible value of $x_2$ that minimizes $f_2(s_2, x_2)$ when $220 \le s_2 < 240$.

When $s_2 < 240$,

$$\frac{\partial}{\partial x_2} f_2(s_2, x_2) > 0 \quad \text{for } 240 \le x_2 \le 255,$$

so $x_2^* = 240$.


Stage 2 and Stage 1

Stage 2 results:

$s_2$                    $f_2^*(s_2)$                                                                        $x_2^*$
$220 \le s_2 \le 240$    $200(240 - s_2)^2 + 115000$                                                         $240$
$240 \le s_2 \le 255$    $\frac{200}{9}[(240 - s_2)^2 + (255 - s_2)^2 + (270 - s_2)^2] + 2000(s_2 - 195)$    $(2 s_2 + 240)/3$

Stage 1: procedure is similar.

$s_1$    $f_1^*(s_1)$    $x_1^*$
255      185000          247.5

Solution:

$x_1^* = 247.5$; $x_2^* = 245$; $x_3^* = 247.5$; $x_4^* = 255$

Total cost of 185 000 €
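The analytical result can be cross-checked numerically. The sketch below swaps the calculus for a discretized version of the same recursion, restricting employment levels to a grid of 0.5 (an assumption made for this check; the slides solve the continuous problem exactly). The names `R`, `STEP`, `grid`, `stage_cost` and `solve_schedule` are illustrative.

```python
R = [220, 240, 200, 255]   # r_1..r_4: summer, autumn, winter, spring
X_MAX = 255.0
STEP = 0.5                 # grid spacing, an assumption for this numerical check


def grid(lo, hi, step=STEP):
    """Employment levels lo, lo + step, ..., hi."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def stage_cost(n, s, x):
    """Cost of stage n with previous level s and current level x."""
    return 200 * (x - s) ** 2 + 2000 * (x - R[n - 1])


def solve_schedule():
    # f_next[s] holds f_{n+1}*(s); after spring nothing remains to be paid.
    f_next = {s: 0.0 for s in grid(200, X_MAX)}
    policy = {}
    for n in (4, 3, 2, 1):
        feasible_x = [X_MAX] if n == 4 else grid(R[n - 1], X_MAX)
        states = [X_MAX] if n == 1 else grid(R[n - 2], X_MAX)
        f_cur, x_cur = {}, {}
        for s in states:
            best_x = min(feasible_x, key=lambda x: stage_cost(n, s, x) + f_next[x])
            f_cur[s] = stage_cost(n, s, best_x) + f_next[best_x]
            x_cur[s] = best_x
        policy[n] = x_cur
        f_next = f_cur
    # Forward pass from s_1 = 255.
    s, plan = X_MAX, []
    for n in (1, 2, 3, 4):
        plan.append(policy[n][s])
        s = plan[-1]
    return f_next[X_MAX], plan


cost, plan = solve_schedule()
print(cost, plan)
```

The grid happens to contain the continuous optimum here, so the discretized recursion reproduces the analytical cost; with a coarser or misaligned grid it would only give an upper bound.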


Probabilistic dynamic programming

The state at the next stage is not completely determined by the state and policy decision at the current stage.

There is a probability distribution for determining the next state (see figure).

$S$ = number of possible states at stage $n + 1$.

The system goes to state $i$ ($i = 1, 2, \ldots, S$) with probability $p_i$, given state $s_n$ and decision $x_n$ at stage $n$.

$C_i$ = contribution of stage $n$ to the objective function if the system goes to state $i$.

If the figure is expanded to include all possible states and decisions at all stages, it is a decision tree.


Basic structure


Probabilistic dynamic programming

The relation between $f_n(s_n, x_n)$ and $f_{n+1}^*(s_{n+1})$ depends on the form of the objective function.

Example: minimize the expected sum of the contributions from the individual stages.

$f_n(s_n, x_n)$ is the minimum expected sum from stage $n$ onward, given state $s_n$ and policy decision $x_n$ at stage $n$:

$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right]$$

with

$$f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1})$$


Example: determining reject allowances

The Hit-and-Miss Manufacturing Company received an order to supply one item of a particular type.

The customer has specified stringent quality requirements.

The manufacturer may have to produce more than one item to obtain one that is acceptable. The number of extra items is the reject allowance.

Each item is acceptable or defective with probability 1/2. The number of acceptable items in a lot of size $L$ has a binomial distribution: the probability of producing no acceptable item in the lot is $(1/2)^L$.

Setup cost is 300 and the cost per item is 100. The maximum number of production runs is 3. Cost of having no acceptable item after 3 runs: 1600.


Formulation

Objective: determine the policy regarding lot size (1 + reject allowance) for the required production run(s) that minimizes total expected cost.

Stage $n$ = production run $n$ ($n = 1, 2, 3$),

$x_n$ = lot size for stage $n$,

State $s_n$ = number of acceptable items still needed (1 or 0) at the beginning of stage $n$.

At stage 1, state $s_1 = 1$.


Formulation

$f_n(s_n, x_n)$ = total expected cost for stages $n, \ldots, 3$, given that optimal decisions are made thereafter:

$$f_n^*(s_n) = \min_{x_n = 0, 1, \ldots} f_n(s_n, x_n)$$

where $f_n^*(0) = 0$.

Consider monetary unit = 100. The contribution to cost from stage $n$ is $[K(x_n) + x_n]$, with

$$K(x_n) = \begin{cases} 0, & \text{if } x_n = 0 \\ 3, & \text{if } x_n > 0 \end{cases}$$

Note that $f_4^*(1) = 16$.


Basic structure of the problem

Recursive relationship:

$$f_n^*(1) = \min_{x_n = 0, 1, 2, \ldots} \left\{ K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\}, \quad \text{for } n = 1, 2, 3$$


Solution procedure

n = 3: $f_3(1, x_3) = K(x_3) + x_3 + 16 (1/2)^{x_3}$

$s_3$   $x_3=0$   $x_3=1$   $x_3=2$   $x_3=3$   $x_3=4$   $x_3=5$   $f_3^*(s_3)$   $x_3^*$
0       0                                                           0              0
1       16        12        9         8         8         8 1/2     8              3 or 4

n = 2: $f_2(1, x_2) = K(x_2) + x_2 + (1/2)^{x_2} f_3^*(1)$

$s_2$   $x_2=0$   $x_2=1$   $x_2=2$   $x_2=3$   $x_2=4$   $f_2^*(s_2)$   $x_2^*$
0       0                                                 0              0
1       8         8         7         7         7 1/2     7              2 or 3

n = 1: $f_1(1, x_1) = K(x_1) + x_1 + (1/2)^{x_1} f_2^*(1)$

$s_1$   $x_1=0$   $x_1=1$   $x_1=2$   $x_1=3$   $x_1=4$   $f_1^*(s_1)$   $x_1^*$
1       7         7 1/2     6 3/4     6 7/8     7 7/16    6 3/4          2
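The three tables can be reproduced with a short sketch of the recursion for state 1. Exact fractions are used so that ties such as "3 or 4" are detected reliably. The cap `MAX_LOT` on the lot sizes searched is an assumption for this illustration, and the names `K` and `solve_reject_allowance` follow the slide notation only loosely.

```python
from fractions import Fraction

SETUP, PENALTY, RUNS = 3, 16, 3   # costs in units of 100; three production runs
MAX_LOT = 6                       # assumed search cap on the lot size


def K(x):
    """Setup cost: incurred only if something is produced."""
    return SETUP if x > 0 else 0


def solve_reject_allowance():
    """f_n*(1) = min over x of K(x) + x + (1/2)^x * f_{n+1}*(1), with f_4*(1) = 16."""
    f_next = Fraction(PENALTY)
    table = {}
    for n in range(RUNS, 0, -1):
        values = {
            x: K(x) + x + Fraction(1, 2) ** x * f_next
            for x in range(MAX_LOT + 1)
        }
        best = min(values.values())
        # store f_n*(1) together with every optimal lot size
        table[n] = (best, [x for x, v in values.items() if v == best])
        f_next = best
    return table


table = solve_reject_allowance()
for n in (3, 2, 1):
    print(n, table[n])
```

Multiplying $f_1^*(1)$ by the monetary unit of 100 gives the total expected cost of the policy.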


Solution

Optimal policy: produce two items on the first production run;

if none is acceptable, then produce either two or three items on the second production run;

if none is acceptable, then produce either three or four items on the third production run.

The total expected cost for this policy is 675.


Conclusions

Dynamic programming: a very useful technique for making a sequence of interrelated decisions.

It requires formulating an appropriate recursive relationship for each individual problem.

Example: a problem has 10 stages, with 10 states and 10 possible decisions at each stage.

Exhaustive enumeration must consider up to 10 billion combinations.

Dynamic programming needs to make no more than a thousand calculations (10 for each state at each stage).
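The two counts in this example follow from simple arithmetic, sketched below. The function names `enumeration_count` and `dp_count` are illustrative.

```python
def enumeration_count(stages, decisions):
    """One decision per stage, all combinations: decisions ** stages."""
    return decisions ** stages


def dp_count(stages, states, decisions):
    """One evaluation per (stage, state, decision) triple."""
    return stages * states * decisions


print(enumeration_count(10, 10))   # 10000000000
print(dp_count(10, 10, 10))        # 1000
```

The enumeration count grows exponentially with the number of stages, while the DP count grows only linearly in it.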