od dynamic programming 2010
Dynamic Programming
It is a useful mathematical technique for making a sequence of interrelated decisions.
It provides a systematic procedure for determining the optimal combination of decisions.
There is no standard mathematical formulation of “the” dynamic programming problem.
Knowing when to apply dynamic programming depends largely on experience with its general structure.
Prototype example
Stagecoach problem
A fortune seeker wants to go from Missouri to California in the mid-19th century.
The journey has 4 stages. A is Missouri and J is California.
The cost of a route is the cost of its life insurance policy; the lowest cost corresponds to the safest trip.
Costs
Cost $c_{ij}$ of going from state $i$ to state $j$ is:

      B  C  D            E  F  G            H  I            J
  A   2  4  3        B   7  4  6        E   1  4        H   3
                     C   3  2  4        F   6  3        I   4
                     D   4  1  5        G   3  3

Problem: which route minimizes the total cost of the policy?
Solving the problem
Note that the greedy approach does not work. The greedy solution A → B → F → I → J has a total cost of 13.
However, e.g., A → D → F (cost 4) is cheaper than A → B → F (cost 6).
Another possibility is trial and error, but this takes too much effort even for this simple problem.
Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
It usually starts from the last stage of the problem and enlarges the problem one stage at a time.
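The comparison between the greedy route and exhaustive enumeration can be checked with a short sketch. The dictionary layout and the function names (`route_cost`, `greedy_route`, `all_routes`) are assumptions made for illustration; only the link costs come from the cost table.

```python
from itertools import product

# Link costs c[i][j] copied from the stagecoach cost table.
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def route_cost(route):
    """Sum the link costs along a route given as a list of states."""
    return sum(COSTS[a][b] for a, b in zip(route, route[1:]))


def greedy_route(start="A", goal="J"):
    """Always take the cheapest outgoing link (first one wins on ties)."""
    route = [start]
    while route[-1] != goal:
        links = COSTS[route[-1]]
        route.append(min(links, key=links.get))
    return route


def all_routes():
    """Enumerate every A -> J route through the four stages."""
    for b, e, h in product("BCD", "EFG", "HI"):
        yield ["A", b, e, h, "J"]


greedy = greedy_route()
best = min(all_routes(), key=route_cost)
print("greedy:", greedy, route_cost(greedy))
print("routes enumerated:", len(list(all_routes())))
print("cheapest:", best, route_cost(best))
```

Even this small network has 18 complete routes to enumerate, which is why a stage-by-stage procedure becomes attractive as the problem grows.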
Formulation
Decision variables $x_n$ ($n = 1, 2, 3, 4$) are the immediate destination at stage $n$.
The route is A → $x_1$ → $x_2$ → $x_3$ → $x_4$, where $x_4$ = J.
$f_n(s, x_n)$ is the total cost of the best overall policy for the remaining stages, given that the fortune seeker is in state $s$, ready to start stage $n$, and selects $x_n$ as the immediate destination.
$x_n^*$ minimizes $f_n(s, x_n)$, and $f_n^*(s)$ is the corresponding minimum value of $f_n(s, x_n)$:

$$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$$
Formulation
$$f_n(s, x_n) = \text{immediate cost (stage } n\text{)} + \text{minimum future cost (stages } n+1 \text{ onward)} = c_{s x_n} + f_{n+1}^*(x_n)$$

where the value of $c_{s x_n}$ is given by $c_{ij}$ with $i = s$ (current state) and $j = x_n$ (immediate destination).
Objective: find $f_1^*(A)$ and the corresponding route.
Dynamic programming successively finds $f_4^*(s)$, $f_3^*(s)$, $f_2^*(s)$ and finally $f_1^*(A)$.
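A minimal sketch of this backward recursion for the stagecoach problem is given below. The names `solve_stagecoach` and `optimal_routes` are hypothetical, and the boundary value $f^*(J) = 0$ is a convenience assumed here; it is equivalent to setting $f_4^*(s) = c_{sJ}$ directly.

```python
# Link costs c[i][j] copied from the stagecoach cost table.
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}
# States in which the traveller can be at the start of stages 1..4.
STAGES = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I"]]


def solve_stagecoach():
    """Backward recursion: f_star[s] = min over x of c[s][x] + f_star[x]."""
    f_star = {"J": 0}  # assumed boundary condition: nothing left to pay at J
    x_star = {}
    for states in reversed(STAGES):  # stage 4 first, then 3, 2, 1
        for s in states:
            totals = {x: cost + f_star[x] for x, cost in COSTS[s].items()}
            f_star[s] = min(totals.values())
            # Keep every minimizing destination so ties are not lost.
            x_star[s] = [x for x, t in totals.items() if t == f_star[s]]
    return f_star, x_star


def optimal_routes(x_star, s="A"):
    """Follow the optimal decisions forward and list every optimal route."""
    if s == "J":
        return [["J"]]
    return [[s] + rest for x in x_star[s] for rest in optimal_routes(x_star, x)]


f_star, x_star = solve_stagecoach()
print("f1*(A) =", f_star["A"])
for route in optimal_routes(x_star):
    print(" -> ".join(route))
```

Each state is evaluated once, using only the already computed values of the following stage, which is the point of the recursion.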
Solution procedure
When n = 4, the route is determined entirely by the current state $s$ (H or I) and the final destination J.
Since $f_4^*(s) = f_4(s, J) = c_{sJ}$, the solution for n = 4 is

  s    f4*(s)   x4*
  H      3       J
  I      4       J
Stage n = 3
This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs $c_{F,H} = 6$ or $c_{F,I} = 3$.
Choosing H, the minimum additional cost is $f_4^*(H) = 3$, so the total cost is 6 + 3 = 9.
Choosing I, the total cost is 3 + 4 = 7. This is smaller, so I is the optimal choice for state F.
[Figure: node F with links to H (cost 6, then f4*(H) = 3) and to I (cost 3, then f4*(I) = 4).]
Stage n = 3
Similar calculations can be made for the two other possible states, s = E and s = G, resulting in the table for n = 3:

        f3(s, x3) = c_{s x3} + f4*(x3)
  s     x3 = H    x3 = I     f3*(s)   x3*
  E       4         8          4       H
  F       9         7          7       I
  G       6         7          6       H
Stage n = 2
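In this case, $f_2(s, x_2) = c_{s x_2} + f_3^*(x_2)$.
Example for node C:
$x_2 = E$: $f_2(C, E) = c_{C,E} + f_3^*(E) = 3 + 4 = 7$ (optimal).
$x_2 = F$: $f_2(C, F) = c_{C,F} + f_3^*(F) = 2 + 7 = 9$.
$x_2 = G$: $f_2(C, G) = c_{C,G} + f_3^*(G) = 4 + 6 = 10$.
[Figure: node C with links to E (cost 3, then f3*(E) = 4), F (cost 2, then f3*(F) = 7) and G (cost 4, then f3*(G) = 6).]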
Stage n = 2
Similar calculations can be made for the two other possible states, s = B and s = D, resulting in the table for n = 2:

        f2(s, x2) = c_{s x2} + f3*(x2)
  s     x2 = E    x2 = F    x2 = G     f2*(s)   x2*
  B       11        11        12         11     E or F
  C        7         9        10          7     E
  D        8         8        11          8     E or F
Stage n = 1
Just one possible starting state: A.
$x_1 = B$: $f_1(A, B) = c_{A,B} + f_2^*(B) = 2 + 11 = 13$.
$x_1 = C$: $f_1(A, C) = c_{A,C} + f_2^*(C) = 4 + 7 = 11$ (optimal).
$x_1 = D$: $f_1(A, D) = c_{A,D} + f_2^*(D) = 3 + 8 = 11$ (optimal).
Resulting in the table:

        f1(s, x1) = c_{s x1} + f2*(x1)
  s     x1 = B    x1 = C    x1 = D     f1*(s)   x1*
  A       13        11        11         11     C or D
Optimal solution
Three optimal solutions, all with $f_1^*(A) = 11$:
A → C → E → H → J
A → D → E → H → J
A → D → F → I → J
Characteristics of DP
1. The problem can be divided into stages, with a policy decision required at each stage.
Example: 4 stages, with a life insurance policy to choose at each stage.
Dynamic programming problems require making a sequence of interrelated decisions.
2. Each stage has a number of states associated with the beginning of that stage.
Example: states are the possible territories where the fortune seeker could be located.
States are the possible conditions in which the system might be.
Characteristics of DP
3. The policy decision transforms the current state into a state associated with the beginning of the next stage.
Example: the fortune seeker’s decision led him from his current state to the next state on his journey.
DP problems can be interpreted in terms of networks: each node corresponds to a state.
The value assigned to each link is the immediate contribution to the objective function from making that policy decision.
The objective corresponds to finding the shortest or the longest path.
Characteristics of DP
4. The solution procedure finds an optimal policy for the overall problem. It finds a prescription of the optimal policy decision at each stage for each of the possible states.
Example: the solution procedure constructed a table for each stage, $n$, that prescribed the optimal decision, $x_n^*$, for each possible state $s$.
In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (this is why a decision is called a policy decision).
Characteristics of DP
5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.
The optimal immediate decision depends only on the current state: this is the principle of optimality for DP.
Example: at any state, the best insurance policy from there onward is independent of how the fortune seeker got there.
Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property).
Characteristics of DP
6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.
7. A recursive relationship that identifies the optimal policy for stage $n$, given the optimal policy for stage $n + 1$, is available.
Example: recursive relationship was

$$f_n^*(s) = \min_{x_n} \left\{ c_{s x_n} + f_{n+1}^*(x_n) \right\}$$

The recursive relationship differs somewhat among dynamic programming problems.
Characteristics of DP
7. (cont.) Notation:
$N$ = number of stages.
$n$ = label for current stage ($n = 1, 2, \dots, N$).
$s_n$ = current state for stage $n$.
$x_n$ = decision variable for stage $n$.
$x_n^*$ = optimal value of $x_n$ (given $s_n$).
$f_n(s_n, x_n)$ = contribution of stages $n, n+1, \dots, N$ to the objective function if the system starts in state $s_n$ at stage $n$ and the immediate decision is $x_n$.
$f_n^*(s_n) = f_n(s_n, x_n^*)$.
Characteristics of DP
7. (cont.) recursive relationship:

$$f_n^*(s_n) = \max_{x_n} \left\{ f_n(s_n, x_n) \right\} \quad \text{or} \quad f_n^*(s_n) = \min_{x_n} \left\{ f_n(s_n, x_n) \right\}$$

where $f_n(s_n, x_n)$ is written in terms of $s_n$, $x_n$ and $f_{n+1}^*(s_{n+1})$.
8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
It stops when it finds the optimal policy starting at the initial stage.
At that point the optimal policy for the entire problem has been found.
Example: the tables for the stages show this procedure.
Deterministic dynamic programming
Deterministic problems: the state at the next stage is completely determined by the state and the policy decision at the current stage.
Form of the objective function: minimize or maximize the sum, product, etc. of the contributions from the individual stages.
Set of states: may be discrete or continuous, or a state vector. Decision variables can also be discrete or continuous.
Example: medical teams
The World Health Council has five medical teams to allocate to three underdeveloped countries.
Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country’s population.

Thousands of additional person-years of life

                       Country
  Medical teams     1      2      3
        0           0      0      0
        1          45     20     50
        2          70     45     70
        3          90     75     80
        4         105    110    100
        5         120    150    130
Formulation of the problem
The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (stages).
$x_n$ is the number of teams to allocate to stage (country) $n$.
What are the states? What changes from one stage to another?
$s_n$ = number of medical teams still available for the remaining countries.
Thus: $s_1 = 5$, $s_2 = 5 - x_1$, $s_3 = s_2 - x_2$.
States to be considered
Stage 1: $s_1 = 5$. Stages 2 and 3: $s_2$ and $s_3$ can each be 0, 1, 2, 3, 4 or 5, depending on the earlier allocations.
Overall problem
$p_i(x_i)$: measure of performance for allocating $x_i$ medical teams to country $i$.

$$\text{Maximize } \sum_{i=1}^{3} p_i(x_i),$$

$$\text{subject to } \sum_{i=1}^{3} x_i = 5,$$

and $x_i$ are nonnegative integers.
Policy
Recursive relationship relating functions:

$$f_n^*(s_n) = \max_{x_n = 0, 1, \dots, s_n} \left\{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \right\}, \quad \text{for } n = 1, 2$$

$$f_3^*(s_3) = \max_{x_3 = 0, 1, \dots, s_3} p_3(x_3), \quad \text{for } n = 3$$
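The recursion for the medical teams problem can be sketched in a few lines. The function name `allocate`, the list layout of `P` and the extra boundary row `f[3][s] = 0` are assumptions of this sketch; with that boundary row, stage 3 reduces to $f_3^*(s_3) = p_3(s_3)$ because $p_3$ is increasing.

```python
# P[n][x]: thousands of additional person-years of life when country n+1
# receives x medical teams (the columns of the table in the example).
P = [
    [0, 45, 70, 90, 105, 120],   # country 1
    [0, 20, 45, 75, 110, 150],   # country 2
    [0, 50, 70, 80, 100, 130],   # country 3
]
TEAMS = 5


def allocate(p=P, teams=TEAMS):
    """Backward recursion f_n*(s) = max over x of p_n(x) + f_{n+1}*(s - x)."""
    n_stages = len(p)
    # f[n][s]: best value from stage n onward with s teams still available.
    # The extra row f[n_stages][s] = 0 is the assumed boundary condition.
    f = [[0] * (teams + 1) for _ in range(n_stages + 1)]
    best_x = [[0] * (teams + 1) for _ in range(n_stages)]
    for n in reversed(range(n_stages)):
        for s in range(teams + 1):
            values = [p[n][x] + f[n + 1][s - x] for x in range(s + 1)]
            f[n][s] = max(values)
            best_x[n][s] = values.index(f[n][s])  # smallest maximizer on ties
    # Forward pass: read off one optimal allocation starting from s1 = teams.
    plan, s = [], teams
    for n in range(n_stages):
        plan.append(best_x[n][s])
        s -= best_x[n][s]
    return f, plan


f, plan = allocate()
print("f1*(5) =", f[0][TEAMS], "with allocation", plan)
```

The rows `f[1]` and `f[2]` hold the stage 2 and stage 3 values, so they can be compared against the hand-computed tables of the solution procedure.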
Solution procedure
For the last stage, n = 3, the values of $p_3(x_3)$ are the last column of the table. Here, $x_3^* = s_3$ and $f_3^*(s_3) = p_3(s_3)$.

  n = 3:
  s3    f3*(s3)   x3*
  0        0       0
  1       50       1
  2       70       2
  3       80       3
  4      100       4
  5      130       5
Stage n = 2
Here, finding $x_2^*$ requires calculating $f_2(s_2, x_2)$ for the values $x_2 = 0, 1, \dots, s_2$. Example for $s_2 = 2$:
$x_2 = 0$: $f_2(2, 0) = p_2(0) + f_3^*(2) = 0 + 70 = 70$.
$x_2 = 1$: $f_2(2, 1) = p_2(1) + f_3^*(1) = 20 + 50 = 70$.
$x_2 = 2$: $f_2(2, 2) = p_2(2) + f_3^*(0) = 45 + 0 = 45$.
Stage n = 2
Similar calculations can be made for the other values of $s_2$:

  n = 2:      f2(s2, x2) = p2(x2) + f3*(s2 - x2)
  s2    x2=0   x2=1   x2=2   x2=3   x2=4   x2=5    f2*(s2)   x2*
  0       0                                           0       0
  1      50     20                                   50       0
  2      70     70     45                            70       0 or 1
  3      80     90     95     75                     95       2
  4     100    100    115    125    110             125       3
  5     130    120    125    145    160    150      160       4
Stage n = 1
The only state is the starting state $s_1 = 5$:

  n = 1:      f1(s1, x1) = p1(x1) + f2*(s1 - x1)
  s1    x1=0   x1=1   x1=2   x1=3   x1=4   x1=5    f1*(s1)   x1*
  5     160    170    165    160    155    120      170       1
Optimal policy decision
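$x_1^* = 1$ leaves $s_2 = 4$; $x_2^* = 3$ leaves $s_3 = 1$; $x_3^* = 1$.
The optimal allocation is therefore 1, 3 and 1 teams to countries 1, 2 and 3, for a total of 170 thousand additional person-years of life.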
Distribution of effort problem
One kind of resource is allocated to a number of activities.
DP involves only one (or a few) resources, whereas LP can deal with thousands of resources.
The LP assumptions of proportionality, divisibility and certainty can be violated in DP. Only additivity is necessary, because of the principle of optimality.
The World Health Council problem violates proportionality and divisibility (only integers are allowed).
Formulation of distribution of effort
Stage $n$ = activity $n$ ($n = 1, 2, \dots, N$).
$x_n$ = decision variable for stage $n$ (amount of resource allocated to activity $n$).
State $s_n$ = amount of resource still available for allocating to the remaining activities ($n, \dots, N$).
When the system starts at stage $n$ in state $s_n$, choice $x_n$ results in the next state $s_{n+1} = s_n - x_n$ at stage $n + 1$.
Example
Distributing scientists to research teams.
3 teams are solving an engineering problem to safely fly people to Mars.
2 extra scientists can be assigned to reduce the probability of failure.

Probability of failure

                        Team
  New scientists     1       2       3
        0          0.40    0.60    0.80
        1          0.20    0.40    0.50
        2          0.15    0.20    0.30
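The slides give only the data for this example. The sketch below assumes, as is not stated on the slide, that the objective is to minimize the probability that all three teams fail, i.e. the product of the three failure probabilities; this is a distribution of effort problem with a product instead of a sum. The name `assign` and the boundary value 1 (the empty product) are likewise assumptions.

```python
# FAIL[n][x]: probability that team n+1 fails when given x new scientists.
FAIL = [
    [0.40, 0.20, 0.15],   # team 1
    [0.60, 0.40, 0.20],   # team 2
    [0.80, 0.50, 0.30],   # team 3
]
SCIENTISTS = 2


def assign(fail=FAIL, total=SCIENTISTS):
    """Minimize the product of failure probabilities (assumed objective)."""
    n_teams = len(fail)
    # f[n][s]: smallest product for teams n.. with s scientists left.
    # f[n_teams][s] = 1.0 is the empty product (assumed boundary condition).
    f = [[1.0] * (total + 1) for _ in range(n_teams + 1)]
    choice = [[0] * (total + 1) for _ in range(n_teams)]
    for n in reversed(range(n_teams)):
        for s in range(total + 1):
            values = [fail[n][x] * f[n + 1][s - x] for x in range(s + 1)]
            f[n][s] = min(values)
            choice[n][s] = values.index(f[n][s])
    # Forward pass to recover one optimal assignment.
    plan, s = [], total
    for n in range(n_teams):
        plan.append(choice[n][s])
        s -= choice[n][s]
    return f[0][total], plan


prob, plan = assign()
print("probability that all teams fail:", round(prob, 3), "assignment:", plan)
```

Under this assumed objective the recursion has the same shape as in the medical teams example, with multiplication replacing addition and min replacing max.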
Continuous dynamic programming
Previous examples had a discrete state variable $s_n$ at each stage.
They have all been reversible; the solution procedure could have moved backward or forward stage by stage.
The next example is continuous. As $s_n$ can take any value in certain intervals, the solutions $f_n^*(s_n)$ and $x_n^*$ must be expressed as functions of $s_n$.
Stages in the next example will correspond to time periods, so the solution must proceed backwards.
Example: scheduling jobs
The company Local Job Shop needs to schedule its employment levels because of seasonal fluctuations.
Machine operators are difficult to hire and costly to train.
Peak season payroll should not be maintained afterwards.
Overtime work on a regular basis should be avoided.
Requirements in near future:

  Season         Spring   Summer   Autumn   Winter   Spring
  Requirements     255      220      240      200      255
Example: scheduling jobs
Employment above the level in the table costs 2000 € per person per season.
The total cost of changing the level of employment from one season to the next is 200 € times the square of the difference in employment levels.
Fractional levels are possible due to part-time employees.
Formulation
From the data, the maximum employment should be 255 (spring). It is necessary to find the level of employment for the other seasons. Seasons are stages.
One cycle of four seasons, where stage 1 is summer and stage 4 is spring.
$x_n$ = employment level for stage $n$ ($n = 1, 2, 3, 4$); $x_4 = 255$.
$r_n$ = minimum employment level for stage $n$: $r_1 = 220$, $r_2 = 240$, $r_3 = 200$, $r_4 = 255$. Thus:

$$r_n \le x_n \le 255$$
Formulation
Cost for stage $n$:

$$200(x_n - x_{n-1})^2 + 2000(x_n - r_n)$$

State $s_n$ is the employment in the preceding season, $x_{n-1}$:
$s_n = x_{n-1}$ (for $n = 1$: $s_1 = x_0 = x_4 = 255$).
Problem:

$$\text{minimize } \sum_{i=1}^{4} \left[ 200(x_i - x_{i-1})^2 + 2000(x_i - r_i) \right],$$

$$\text{subject to } r_i \le x_i \le 255, \quad \text{for } i = 1, 2, 3, 4$$
Detailed formulation
Choose $x_1$, $x_2$ and $x_3$ so as to minimize the cost:

  n   r_n   Feasible x_n        Possible s_n = x_{n-1}   Cost
  1   220   220 ≤ x1 ≤ 255      s1 = 255                 200(x1 - 255)^2 + 2000(x1 - 220)
  2   240   240 ≤ x2 ≤ 255      220 ≤ s2 ≤ 255           200(x2 - x1)^2 + 2000(x2 - 240)
  3   200   200 ≤ x3 ≤ 255      240 ≤ s3 ≤ 255           200(x3 - x2)^2 + 2000(x3 - 200)
  4   255   x4 = 255            200 ≤ s4 ≤ 255           200(255 - x3)^2
Formulation
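Recursive relationship:

$$f_n^*(s_n) = \min_{r_n \le x_n \le 255} \left\{ 200(x_n - s_n)^2 + 2000(x_n - r_n) + f_{n+1}^*(x_n) \right\}$$

Basic structure of the problem: at stage $n$ in state $s_n$, decision $x_n$ incurs the cost $200(x_n - s_n)^2 + 2000(x_n - r_n)$ and leads to state $s_{n+1} = x_n$ at stage $n + 1$.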
Solution procedure
Stage 4: the solution is known to be $x_4^* = 255$.

  s4                  f4*(s4)             x4*
  200 ≤ s4 ≤ 255      200(255 - s4)^2     255

Stage 3: for the possible values $240 \le s_3 \le 255$:

$$\begin{aligned}
f_3^*(s_3) &= \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + f_4^*(x_3) \right\} \\
&= \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + 200(255 - x_3)^2 \right\}
\end{aligned}$$
Graphical solution for f3*(x3)
Calculus solution for f3*(x3)
Using calculus:

$$\frac{\partial}{\partial x_3} f_3(s_3, x_3) = 400(x_3 - s_3) + 2000 - 400(255 - x_3) = 400(2x_3 - s_3 - 250) = 0$$

$$x_3^* = \frac{s_3 + 250}{2}$$

  s3                 f3*(s3)                                               x3*
  240 ≤ s3 ≤ 255     50(250 - s3)^2 + 50(260 - s3)^2 + 1000(s3 - 150)      (s3 + 250)/2
Stage 2
Solved in a similar fashion:

$$\begin{aligned}
f_2(s_2, x_2) &= 200(x_2 - s_2)^2 + 2000(x_2 - r_2) + f_3^*(x_2) \\
&= 200(x_2 - s_2)^2 + 2000(x_2 - 240) + 50(250 - x_2)^2 + 50(260 - x_2)^2 + 1000(x_2 - 150)
\end{aligned}$$

for $220 \le s_2 \le 255$ (possible values) and $240 \le x_2 \le 255$ (feasible values).
Setting $\frac{\partial}{\partial x_2} f_2(s_2, x_2) = 0$ yields:

$$x_2^* = \frac{2 s_2 + 240}{3}$$
Stage 2
The solution has to be feasible for $220 \le s_2 \le 255$ (i.e., $240 \le x_2 \le 255$ for $220 \le s_2 \le 255$)!

$$x_2^* = \frac{2 s_2 + 240}{3}$$

is only feasible for $240 \le s_2 \le 255$.
We still need the feasible value of $x_2$ that minimizes $f_2(s_2, x_2)$ when $220 \le s_2 < 240$.
When $s_2 < 240$,

$$\frac{\partial}{\partial x_2} f_2(s_2, x_2) > 0 \quad \text{for } 240 \le x_2 \le 255$$

so $x_2^* = 240$.
Stage 2 and Stage 1
  s2                 f2*(s2)                                                                  x2*
  220 ≤ s2 ≤ 240     200(240 - s2)^2 + 115000                                                 240
  240 ≤ s2 ≤ 255     (200/9)[(240 - s2)^2 + (255 - s2)^2 + (270 - s2)^2] + 2000(s2 - 195)     (2 s2 + 240)/3

Stage 1: the procedure is similar.

  s1      f1*(s1)    x1*
  255     185000     247.5

Solution: $x_1^* = 247.5$; $x_2^* = 245$; $x_3^* = 247.5$; $x_4^* = 255$.
Total cost of 185 000 €.
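The closed-form result can be cross-checked numerically. The sketch below is not the calculus-based procedure of the slides: it swaps in a brute-force backward recursion on a grid of employment levels. The grid step of 0.5 and the names `stage_cost`, `grid` and `solve` are assumptions made for illustration; the step happens to contain the optimal levels, so the grid minimum coincides with the continuous one here.

```python
R = [220, 240, 200, 255]   # minimum employment r_n: summer, autumn, winter, spring
PEAK = 255
STEP = 0.5                 # assumed grid spacing for employment levels


def stage_cost(n, s, x):
    """Cost of stage n (0-based): change of level plus excess employment."""
    return 200 * (x - s) ** 2 + 2000 * (x - R[n])


def grid(lo, hi, step=STEP):
    """Evenly spaced levels from lo to hi inclusive."""
    count = int(round((hi - lo) / step))
    return [lo + i * step for i in range(count + 1)]


def solve():
    states = grid(200, PEAK)   # every level a previous season could have had
    # Stage 4 is forced: x4 = 255, so f4*(s) = 200 (255 - s)^2.
    f_next = {s: stage_cost(3, s, PEAK) for s in states}
    policy = [None, None, None, {s: PEAK for s in states}]
    for n in (2, 1, 0):        # stages 3, 2, 1 in backward order
        f_cur, pol = {}, {}
        candidates = grid(R[n], PEAK)
        for s in states:
            x_best = min(candidates,
                         key=lambda x: stage_cost(n, s, x) + f_next[x])
            f_cur[s] = stage_cost(n, s, x_best) + f_next[x_best]
            pol[s] = x_best
        f_next, policy[n] = f_cur, pol
    # Forward pass from s1 = 255 (employment in the preceding spring).
    s, plan = PEAK, []
    for n in range(4):
        s = policy[n][s]
        plan.append(s)
    return f_next[PEAK], plan


total, plan = solve()
print("total cost:", total, "employment levels:", plan)
```

A finer or coarser grid would in general only approximate the continuous optimum, which is why the analytic treatment expresses $f_n^*(s_n)$ and $x_n^*$ as functions of $s_n$.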
Probabilistic dynamic programming
The state at the next stage is not completely determined by the state and policy decision at the current stage.
Instead, there is a probability distribution for determining the next state.
$S$ = number of possible states at stage $n + 1$.
The system goes to state $i$ ($i = 1, 2, \dots, S$) with probability $p_i$, given state $s_n$ and decision $x_n$ at stage $n$.
$C_i$ = contribution of stage $n$ to the objective function if the system goes to state $i$.
If this structure is expanded to include all possible states and decisions at all stages, it is a decision tree.
Basic structure
Probabilistic dynamic programming
The relation between $f_n(s_n, x_n)$ and $f_{n+1}^*(s_{n+1})$ depends on the form of the objective function.
Example: minimize the expected sum of the contributions from individual stages.
$f_n(s_n, x_n)$ is the minimum expected sum from stage $n$ onward, given state $s_n$ and policy decision $x_n$ at stage $n$:

$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right]$$

with

$$f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1})$$
Example: determining reject allowances
The Hit-and-Miss Manufacturing Company received an order to supply one item of a particular type.
The customer has specified stringent quality requirements.
The manufacturer may have to produce more than one item to obtain one that is acceptable. The number of extra items is the reject allowance.
Each item is acceptable or defective with probability ½.
The number of acceptable items in a lot of size $L$ has a binomial distribution: the probability of producing no acceptable item is $(1/2)^L$.
Setup cost is 300, cost per item is 100. The maximum number of production runs is 3. Cost of having no acceptable item after 3 runs: 1600.
Formulation
Objective: determine the policy regarding lot size (1 + reject allowance) for the required production run(s) that minimizes total expected cost.
Stage $n$ = production run $n$ ($n = 1, 2, 3$).
$x_n$ = lot size for stage $n$.
State $s_n$ = number of acceptable items still needed (1 or 0) at the beginning of stage $n$.
At stage 1, state $s_1 = 1$.
Formulation
$f_n(s_n, x_n)$ = total expected cost for stages $n, \dots, 3$ if optimal decisions are made thereafter:

$$f_n^*(s_n) = \min_{x_n = 0, 1, \dots} f_n(s_n, x_n)$$

where $f_n^*(0) = 0$.
Consider monetary unit = 100. The contribution to cost from stage $n$ is $K(x_n) + x_n$, with

$$K(x_n) = \begin{cases} 0, & \text{if } x_n = 0 \\ 3, & \text{if } x_n > 0 \end{cases}$$

Note that $f_4^*(1) = 16$.
Basic structure of the problem
Recursive relationship:

$$f_n^*(1) = \min_{x_n = 0, 1, 2, \dots} \left\{ K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\}, \quad \text{for } n = 1, 2, 3$$
Solution procedure
  n = 3:     f3(1, x3) = K(x3) + x3 + 16 (1/2)^x3
  s3    x3=0   x3=1   x3=2   x3=3   x3=4   x3=5      f3*(s3)   x3*
  0       0                                             0       0
  1      16     12      9      8      8     8 1/2       8       3 or 4

  n = 2:     f2(1, x2) = K(x2) + x2 + (1/2)^x2 f3*(1)
  s2    x2=0   x2=1   x2=2   x2=3   x2=4      f2*(s2)   x2*
  0       0                                      0       0
  1       8      8      7      7    7 1/2        7       2 or 3

  n = 1:     f1(1, x1) = K(x1) + x1 + (1/2)^x1 f2*(1)
  s1    x1=0   x1=1    x1=2    x1=3    x1=4       f1*(s1)   x1*
  1       7    7 1/2   6 3/4   6 7/8   7 7/16      6 3/4     2
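The three tables can be reproduced with exact fractions. In this sketch the function name `reject_allowance` and the cap `MAX_LOT = 5` on the lot sizes tried are assumptions; the cost data (setup 3, unit cost 1, penalty 16, in units of 100) come from the problem statement.

```python
from fractions import Fraction

SETUP, UNIT, PENALTY = 3, 1, 16   # in monetary units of 100
RUNS = 3
MAX_LOT = 5                        # assumed largest lot size to evaluate


def reject_allowance():
    """Backward recursion for state s_n = 1 (one acceptable item still needed)."""
    f_next = Fraction(PENALTY)     # f4*(1) = 16
    results = {}
    for n in range(RUNS, 0, -1):   # stages 3, 2, 1
        costs = {}
        for x in range(MAX_LOT + 1):
            setup = SETUP if x > 0 else 0
            # With probability (1/2)^x no item is acceptable and the
            # future expected cost f_{n+1}*(1) is incurred.
            costs[x] = setup + UNIT * x + Fraction(1, 2) ** x * f_next
        best = min(costs.values())
        results[n] = (best, [x for x, c in costs.items() if c == best])
        f_next = best
    return results


for n, (value, lots) in sorted(reject_allowance().items()):
    print(f"n = {n}: f*(1) = {value} (x{n}* = {lots})")
```

Multiplying $f_1^*(1) = 27/4$ by the monetary unit of 100 gives the expected cost of 675.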
Solution
Optimal policy: produce two items on the first production run;
if none is acceptable, then produce either two or three items on the second production run;
if none is acceptable, then produce either three or four items on the third production run.
The total expected cost for this policy is 675.
Conclusions
Dynamic programming is a very useful technique for making a sequence of interrelated decisions. It requires formulating an appropriate recursive relationship for each individual problem.
Example: a problem has 10 stages, with 10 states and 10 possible decisions at each stage.
Exhaustive enumeration must consider up to 10 billion combinations ($10^{10}$).
Dynamic programming needs no more than a thousand calculations (10 for each state at each stage).