od dynamic programming large 2010
TRANSCRIPT
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 1/58
DYNAMIC
PROGRAMMING
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 2/58
Dynamic Programming
It is a useful mathematical technique for making a
sequence of interrelated decisions.
Systematic procedure for determining the optimal
combinations of decisions.
There is no standard mathematical formulation of
“the” Dynamic Programming problem.
Knowing when to apply dynamic programming
depends largely on experience with its general
structure.
206
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 3/58
Prototype example
Stagecoach problem
Fortune seeker that wants to go from Missouri to
California in the mid-19th century.
Travel has 4 stages. A is Missouri and J is California. Cost is the life insurance of a specific route; lowest
cost is equivalent to a safest trip.
207
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 4/58
Costs
Cost cij of going from state i to state j is:
Problem: which route minimizes the total cost of thepolicy?
208
B C D E F G H I J
A 2 4 3 B 7 4 6 E 1 4 H 3
C 3 2 4 F 6 3 I 4 D 4 1 5 G 3 3
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 5/58
Solving the problem
Note that greedy approach does not work.
Solution A B F I J has total cost of 13.
However, e.g. A D F is cheaper than A B F .
Other possibility: trial-and-error. Too much effort even
for this simple problem.
Dynamic programming is much more efficient than
exhaustive enumeration, especially for large problems.
Usually starts from the last stage of the problem, and
it enlarges it one stage at a time.
209
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 6/58
Formulation
Decision variables xn (n = 1, 2, 3, 4) are the immediate
destination of stage n. Route is A x1 x2 x3 x4, where x4 = J .
Total cost of the best overall policy for the remaining
stages is f n(s, xn).
Actual state is s, ready to start stage n, selecting xn as
the immediate destination.
xn* minimizes f n(s, xn) and f n
*(s, xn) is the minimum
value of f n(s, xn):
210
* *( ) min ( , ) ( , )n
n n n n n x
f s f s x f s x
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 7/58
Formulation
where
value of csxnis given by cij where i = s (current state)
and j = xn (immediate destination).
Objective: find f 1*( A) and the corresponding route.
Dynamic programming finds successively f 4*(s), f 3
*(s),
f 2*(s) and finally f 1
*( A).
211
1
( , ) immediate cost (stage ) + minimum future cost (stages +1 onward)
( )n
n n
sx n n
f s x n n
c f x
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 8/58
Solution procedure
When n = 4, the route is determined by its current
state s ( H or I ) and its final destination J .
Since f 4*(s) = f 4
*(s, J ) = csJ , the solution for n = 4 is
212
s f 4* (s ) x 4
*
H 3 J
I 4 J
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 9/58
Stage n = 3
Needs a few calculations. If fortune seeker is in state F ,
he can go to either H or I with costs cF,H = 6 or cF,I = 3. Choosing H , the minimum additional cost is f 4
*( H ) = 3.
Total cost is 6 + 3 = 9.
Choosing I , the total cost is 3 + 4 = 7. This is smaller, and
it is the optimal choice for state F .
213
F
H
I
6
3
3
4
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 10/58
Stage n = 3
Similar calculations can be made for the two possible
states s = E and s = G, resulting in the table for n = 3:
214
s x
3
f
3
s, x
3
) =
c
sx
3
+
f
4
x
3
)
H I f
3
s
)
x
3
E 4 8 4 H
F 9 7 7 I
G 6 7 6 H
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 11/58
Stage n = 2
In this case, f 2*(s, x2) = csx2
+ f 4*( x3).
Example for node C :
x2 = E : f 2*(C, E ) = cC,E + f 3
*( E ) = 3 + 4 = 7 optimal
x2 = F : f 2*(C, F ) = cC,F + f 3
*(F ) = 2 + 7 = 9.
x2 = G: f 2*
(C, G) = cC,G+ f 3*
(G) = 4 + 6 = 10.
215
C
E
G
3
4
4
6
F
7
2
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 12/58
Stage n = 2
Similar calculations can be made for the two possible
states s = B and s = D, resulting in the table for n = 2:
216
s x
2
f
2
s, x
2
) =
c
sx
2
+
f
3
x
2
)
E F G f
2
s
)
x
2
B 11 11 12 11 E or F
C 7 9 10 7 E
D 8 8 11 8 E or F
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 13/58
Stage n = 1
Just one possible starting state: A.
x1 = B: f 2*( A, B) = c A,B + f 2*( B) = 2 + 11 = 13.
x1 = C : f 2*( A, C ) = c A,C + f 2
*(C ) = 4 + 7 = 11 optimal
x1 = D: f 2*( A, D) = c A,D + f 2
*( D) = 3 + 8 = 11 optimal
Resulting in the table:
217
s x
1
f
1
s, x
1
) =
c
sx
1
+
f
2
x
1
)
B C D f
1
s
)
x
1
A 13 11 11 11 C or D
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 14/58
Optimal solution
Three optimal solutions, all with f 1*( A) = 11:
218
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 15/58
Characteristics of DP
1. The problem can be divided into stages, with a policy
decision required at each stage. Example: 4 stages and life insurance policy to choose.
Dynamic programming problems require making a
sequence of interrelated decisions .
2. Each stage has a number of states associated with
the beginning of each stage.
Example: states are the possible territories where the
fortune seeker could be located.
States are possible condit ions in which the system
might be.
219
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 16/58
Characteristics of DP
3. Policy decision transforms the current state to a state
associated with the beginning of the next stage. Example: fortune seeker’s decision led him from his
current state to the next state on his journey.
DP problems can be interpreted in terms of networks :
each node correspond to a state .
Value assigned to each link is the immediate
contribution to the objective function from making
that policy decision.
Objective corresponds to finding the shortest or the
longest path .
220
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 17/58
Characteristics of DP
4. The solution procedure finds an optimal policy for
the overall problem. Finds a prescription of theoptimal policy decision at each stage for each of the
possible states.
Example: solution procedure constructed a table for
each stage, n , that prescribed the optimal decision, xn*,for each possible state s.
In addition to identifying optimal solut ions , DP
provides a policy prescription of what to do under
every possible circumstance (why a decision is calledpolicy decision).
221
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 18/58
Characteristics of DP
5. Given the current state, an optimal policy for the
remaining stages is independent of the policydecisions adopted in previous stages .
Optimal immediate decision depends only on the
current state: this is the principle of optimality for DP.
Example: at any state, the insurance policy is
independent on how the fortune seeker got there.
Knowledge of the current state conveys all information
necessary for determining the optimal policy
henceforth (Markovian property ).
222
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 19/58
Characteristics of DP
6. Solution procedure begins by finding the optimal
policy for the last stage . Solution is usually trivial.7. A recursive relationship that identifies optimal policy
for stage n, given optimal policy for stage n + 1, is
available.
Example: recursive relationship was
Recursive relationship differs somewhat among
dynamic programming problems.
223
* *
1( ) min ( )n
n
n sx n n x
f s c f x
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 20/58
Characteristics of DP
7. (cont.) Notation:
224
*
number of stages.
label for current stage ( 1, 2, , ).
current for stage .
decision variable for stage .
optimal value of (given ).
( , ) contribution of stages , 1, , to o
n
n
n n n
n n n
N
n n N
s state n
x n
x x s
f s x n n N
* *
bjective function,
system starts in state at stage and immediate decision is .
( , )
n n
n n n n
s n x
f f s x
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 21/58
Characteristics of DP
7. (cont.) recursive relationship:
where f n(sn, xn) is written in terms of sn, xn and .
8. Using recursive relationship, solution procedure
starts at the end and moves backward stage by
stage.
It stops when finds optimal policy starting at initial stage.
The optimal policy for the entire problem was found. Example: the tables for the stages shown this procedure.
225
*
1 1( )n n f s
* *( ) max ( , ) or ( ) min ( , )nn
n n n n n n n n n n x x
f s f s x f s f s x
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 22/58
Deterministic dynamic programming
Determinist ic problems : the state at the next stage is
completely determined by the state and the policy decision at the current stage.
Form of the objective function: minimize or maximize
the sum, product , etc. of the contributions from the
individual stages.
Set of states: may be discrete or continuous, or a state
vector . Decision variables can also be discrete or
continuous.
226
l d l
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 23/58
Example: medical teams
The World Health Council has five medical teams to allocate to
three underdeveloped countries. Measure of performance: additional person-years of life, i.e.,
increased life expectancy (in years) times country’s population.
227
Thousands of additional person-years of life
Country
Medical teams 1 2 3
0 0 0 0
1 45 20 50
2 70 45 70
3 90 75 80
4 105 110 100
5 120 150 130
l i f h bl
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 24/58
Formulation of the problem
Problem requires three int errelated decisions : how
many teams to allocate to the three countries (stages). xn is the number of teams to allocate to stage n.
What are the states? What changes from one stage to
another?
sn = number of medical teams still available for
remaining countries.
Thus: s1 = 5, s2 = 5 – x1, s3 = s2 – x2.
228
S b id d
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 25/58
States to be considered
229
Thousands of additional
person-years of life
Country
Medical
teams
1 2 3
0 0 0 01 45 20 50
2 70 45 70
3 90 75 80
4 105 110 100
5 120 150 130
O ll bl
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 26/58
Overall problem
pi( xi): measure of performance for allocating xi
medical teams to country i.
230
3
1
Maximize ( ),i i
i
p x
3
1
subject to
5,i
i
x
and are nonnegative integers.i x
P li
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 27/58
Policy
Recursive relat ionship relating functions:
231
* *
10,1, ,
( ) max ( ) ( ) , for 1,2n n
n n n n n n n x s
f s p x f s x n
3
*
3 3 3 30,1, ,
( ) max ( ), for 3n x s
f s p x n
S l ti d
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 28/58
Solution procedure
For last stage n = 3, values of p3( x3) are the last column
of table. Here, x3* = s3 and f 3*(s3) = p3(s3).
232
n
= 3:
s
3
f
3
*
s
3
)
x
3
*
0 0 0
1 50 1
2 70 2
3 80 3
4 100 4
5 130 5
Thousands of additional
person-years of life
Country
Medical
teams
1 2 3
0 0 0 0
1 45 20 50
2 70 45 70
3 90 75 80
4 105 110 100
5 120 150 130
St 2
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 29/58
Stage n = 2
Here, finding x2* requires calculating f 2(s2, x2) for the
values of x2 = 0, 1, …, s2. Example for s2 = 2:
233
2
0
2
45
0
0
70
1
5020
State:
Thousands of additional
person-years of life
Country
Medical
teams
1 2 3
0 0 0 0
1 45 20 50
2 70 45 70
3 90 75 80
4 105 110 100
5 120 150 130
St 2
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 30/58
Stage n = 2
Similar calculations can be made for the other values of s2:
234
n = 2 : s
2
x
2
f
2
s
2
, x
2
) =
p
2
x
2
) +
f
3
s
2
–x
2
)
0 1 2 3 4 5 f
2
s
2
) x
2
0 0 0 0
1 50 20 50 0
2 70 70 45 70 0 or 1
3 80 90 95 75 95 2
4 100 100 115 125 110 125 3
5 130 120 125 145 160 150 160 4
St 1
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 31/58
Stage n = 1
Only state is the starting
state s1 = 5:
235
n = 1 : s
1
x
1
f
1
s
1
, x
1
) =p
1
x
1
) + f
2
s
1
–x
1
)
0 1 2 3 4 5 f
1
s
1
) x
1
5 160 170 165 160 155 120 170 1
5
0
5
120
0
0
160
4
12545State:
.
.
.
Thousands of additional
person-years of life
Country
Medical
teams
1 2 3
0 0 0 0
1 45 20 50
2 70 45 70
3 90 75 80
4 105 110 100
5 120 150 130
O ti l li d i i
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 32/58
Optimal policy decision
236
Di t ib ti f ff t bl
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 33/58
Distribution of effort problem
One kind of resource is allocated to a number of
activities . In DP involves only one (or few) resources, while in LP
can deal with thousands of resources.
The assumptions of LP: proportionality , divisibility and
certainty can be violated by DP. Only additivity is
necessary because of the principle of opt imalit y .
World Health Council problem violates proportionality
and divisibility (only integers are allowed).
237
F l ti f di t ib ti f ff t
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 34/58
Formulation of distribution of effort
When system starts at stage n in state sn, choice xn
results in the next state at stage s + 1:
238
Stage activity ( 1,2, , ),
decision variable for stage ,
State amount of resource still available for allocating
to remaining activities ( , , )
n
n
n n n n
x n
s
n N
Stage: n
State: sn xn
sn – xn
n + 1
Example
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 35/58
Example
Distributing scientists to research teams
3 teams are solving engineering problem to safely flypeople to Mars.
2 extra scientists reduce the probability of failure.
239
Probability of failure
Team
New scientists 1 2 3
0 0.40 0.60 0.80
1 0.20 0.40 0.502 0.15 0.20 0.30
Continuous dynamic programming
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 36/58
Continuous dynamic programming
Previous examples had a discrete state variable sn, at
each stage. They all have been reversible; the solution procedure
could have moved backward or forward stage by
stage.
Next example is continuous. As sn can take any values
in certain intervals, the solutions f n*(sn) and xn
* must
be expressed as functions of sn.
Stages in the next example will correspond to time
periods , so the solution must proceed backwards.
240
Example: scheduling jobs
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 37/58
Example: scheduling jobs
The company Local Job Shop needs to schedule
employment jobs due to seasonal fluctuations. Machine operators are difficult to hire and costly to train.
Peak season payroll should not be maintained afterwards.
Overtime work on a regular basis should be avoided.
Requirements in near future:
241
Season Spring Summer Autumn Winter Spring
Requirements 255 220 240 200 255
Example: scheduling jobs
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 38/58
Example: scheduling jobs
Employment above level in the table costs 2000€ per
person per season. Total cost of changing level of employment from one
season to the other is 200€ times the square of the
difference in employment levels.
Fractional levels are possible due to part-time
employees.
242
Formulation
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 39/58
Formulation
From data, maximum employment should be 255
(spring). It is necessary to find the level of employmentfor other seasons. Seasons are stages .
One cycle of four seasons, where stage 1 is summer
and stage 4 is spring.
xn = employment level for stage n (n =1,2,3,4); x4=255
r n = minimum employment level for stage n: r 1=220,
r 2=240, r 3=200, r 4=255. Thus:
r n xn 255
243
Formulation
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 40/58
Formulation
Cost for stage n:
n = 200( xn – xn –1)2 + 2000( xn – r n)
State sn is the employment in the preceding season xn –1
sn = xn –1 (n=1: s1 = x0 = x4 = 255)
Problem:
244
42
1
1
minimize 2000( ) 200( ) ,i i i i
i
x x x r
subject to 255, for 1,2,3,4i i
r x i
Detailed formulation
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 41/58
Detailed formulation
Choose x1, x2 and x3 so as to minimize the cost:
245
n r
n
Feasiblex
n
Possibles
n
= x
n-1
Cost
1 220 220£ x1 £ 255 s
1 =255 200(x 1 - 255)2+2000(x 1 - 220)
2 240 240£ x2 £ 255 220£ s
2 £ 255 200(x 2 - x 1)
2 +2000(x 2
- 240)
3 200 200£ x3 £ 255 240£ s
3 £ 255 200(x 3 - x 2)
2 +2000(x 3
- 200)
4 255 x4 =255 200£ s
4 £ 255 200(255 - x 3)
2
Formulation
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 42/58
Formulation
Recursive relationship:
Basic structure of the problem:
246
* 2 *
1255
( ) min 200( ) 2000( ) ( )n n
n n n n n n n nr x
f s x s x r f x
Solution procedure
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 43/58
Solution procedure
Stage 4: the solution is known to be x 4* = 255.
Stage 3: possible values of 240£
s3£
255:
graphical solution:
247
s
4
f
4
s
4
) x
4
200 £ s4 £ 255 200(255 - s
4 )2 255
* 2 *
3 3 3 3 3 4 3200 255
2 2
3 3 3 3200 255
( ) min 200( ) 2000( 200) ( )
min 200( ) 2000( 200) 200(255 )
n
n
x
x
f s x s x f x
x s x x
Graphical solution for f *(x )
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 44/58
Graphical solution for f 3 ( x3)
248
Calculus solution for f *(x )
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 45/58
Calculus solution for f 3 ( x3)
Using calculus:
249
3 3 3 3 3 3
3
3 3
* 33
( , ) 400( ) 2000 400(255 )
400(2 250) 0
250
2
f s x x s x x
x s
s x
s
3
f
3
s
3
)
x
3
240 £ s3 £ 255
50(250- s3 )2+50(260- s
3 )2
+1000(s3 - 150)
(s3 +250)/2
Stage 2
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 46/58
Stage 2
Solved in a similar fashion:
for 220£ s2 £255 (possible values) and 240£ x2 £ 255
(feasible values).
Calculating / x2[ f 2(s2, x2)] = 0, yields:
250
2 *
2 2 2 2 2 2 2 3 3
2 *
2 2 2 3 3
2 2
2 2 2
( , ) 200( ) 2000( ) ( )
200( ) 2000( 240) ( )
50(250 ) 50(260 ) 1000( 150)
f s x x s x r f x
x s x f x
x x x
* 22
2 240
3
s x
Stage 2
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 47/58
Stage 2
The solution has to be feasible for 220 £ s2 £ 255 (i.e.,
240 £ x2 £ 255 for 220 £ s2 £ 255)!
is only feasible for 240 £ s2 £ 255.
Need to solve for feasible value of x2
that minimizes
f 2(s2, x2) when 220 £ s2 < 240.
When s2 < 240,
so x2*= 240.
251
* 22
2 240
3
s x
2 2 2 2
2
( , ) 0 for 240 255 f s x x x
Stage 2 and Stage 1
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 48/58
Stage 2 and Stage 1
Stage 1: procedure is similar.
Solution:
x1
* = 247.5; x2
* = 245; x3
* = 247.5; x4
* = 255
Total cost of 185 000 €
252
s
2
f
2
s
2
)
x
2
220 £ s 2 £ 240 200(240- s
2 )2+115000 240
240 £ s 2 £ 255
200/9[(240- s 2 )2+(255- s
2 )2 (270-
s 2 )2 ]+2000(s
2 - 195)
(2s 2 +240)/3
s
1
f
1
s
1
)
x
1
255 185000 247.5
Probabilistic dynamic programming
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 49/58
Probabilistic dynamic programming
Next stage is not completely determined by state and
policy decision at the current one There is a probability distribution for determining the
next state, see figure.
S = number of possible states at stage n + 1.
System goes to i (i = 1,2,…,S ) with probability pi given
state sn and decision xn at stage n.
C i = contribution of stage n to the objective function.
If figure is expanded to all possible states and
decisions at all stages, it is a decision tree.
253
Basic structure
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 50/58
Basic structure
254
Probabilistic dynamic programming
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 51/58
Probabilistic dynamic programming
Relation between f n(sn, xn) and f *n+1 (sn+1) depends on
the form of the objective function. Example: minimize the expected sum of the
contributions from individual stages.
f n(sn, xn) is the minimum expected sum from stage n
onward, given state sn, and policy decision xn, at stage n:
with
255
*
1
1
( , ) ( )S
n n n i i n
i
f s x p C f i
1
*
1 1 1( ) min ( , )n
n n n x
f i f i x
Example: determining reject allowances
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 52/58
Example: determining reject allowances
The Hit-and-Mass Manufacturing Company received
an order to supply one item of a particular type. Customer requires specified stringent quality requirements.
Manufacturer has to produce more than one to achieve one
acceptable. Number of extra items is the reject allowance.
Probability of acceptable or defective is ½. Number of acceptable items in a lot size of L has a binomial
distribution: probability of not acceptable item is (1/2) L.
Setup cost is 300, cost per item is 100. Maximum production
runs is 3. Cost of no acceptable item after 3 runs: 1600.
256
Formulation
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 53/58
Formulation
Objective: determine policy regarding lot size
(1 + reject allowance) for required production run(s)that minimizes total expected cost.
Stage n = production run n (n = 1,2,3),
xn = lot size for stage n,
State sn = number of acceptable items still needed (1
or 0) at the beginning of stage n.
At stage 1, state s1 = 1.
257
Formulation
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 54/58
Formulation
f n(sn, xn) = total expected costs for stages n,…,3 and
optimal decisions are made thereafter:
where ( f n*(0) = 0).
Consider monetary unit = 100. Contribution to costfrom stage n is [K ( xn) + xn], with
Note that f 4*(1) = 16.
258
*
0,1,( ) min ( , )
nn n n n n
x f s f s x
0, if 0( )
3, if 0
n
n
n
xK x
x
Basic structure of the problem
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 55/58
Basic structure of the problem
Recursive relationship:
259
* *
10,1,2,
1(1) min ( ) (1)
2
for 1,2,3
n
n
x
n n n n x
f K x x f
n
Solution procedure
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 56/58
Solution procedure
260
s
3
x
3
f
3
1,x
3
) =K x
3
) +x
3
+16 1/2)
x
3
n
= 3: 0 1 2 3 4 5
f
3
s
3
)
x
3
0 0 0
1 16 12 9 8 8 8 1/2 8 3 or 4
s
2
x
2
f
2
1,x
2
) =K x
2
) +x
2
+16 1/2)
x
2
f
3
1)
n
= 2: 0 1 2 3 4
f
2
s
2
)
x
2
0 0 0
1 8 8 7 7 7 1/2 7 2 or 3
s
1
x
1
f
1
1,x
1
) =K x
1
) +x
1
+16 1/2)
x
1
f
2
1)
n = 1: 0 1 2 3 4 f
1
s
1
) x
1
0 0
1 7 7 1/2 6 3/4 6 7/8 7 7/16 6 3/4 2
Solution
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 57/58
Solution
Optimal policy : produce two items on the first
production run; if none is acceptable, then produce either two or three
items on the second production run;
if none is acceptable, then produce either three or four
items on the third production run.
The total expected cost for this policy is 675.
261
Conclusions
7/24/2019 OD Dynamic Programming LARGE 2010
http://slidepdf.com/reader/full/od-dynamic-programming-large-2010 58/58
Conclusions
Dynamic programming: very useful technique for
making a sequence of int errelated decisions . It requires formulat ing an appropriate recursive
relat ionship for each individual problem.
Example: problem has 10 stages with 10 states and 10
possible decisions at each stage. Exhaustive enumeration must consider up to 10 billion
combinations.
Dynamic programming need make no more than a
thousand calculations (10 for each state at each stage).