Dynamic Programming
Textbook: Principles of Operations Research for Management, by Frank S. Budnick, Dennis McLeavey, and Richard Mojena
Dynamic programming
Dynamic programming (DP) is a useful mathematical technique for making a sequence of interrelated decisions. It provides a systematic procedure for determining the optimal combination of decisions.
DP vs LP
LP is iterative, i.e., each step represents a complete solution that is non-optimal until the final iteration.
DP is recursive, i.e., it optimizes on a step-by-step basis using information from the preceding step. A single step is sequentially related to the preceding steps and is not by itself a solution to the problem.
Puzzle: River crossing
A farmer went to the market and purchased a fox, a goose, and a bag of beans. On his way home, the farmer came to the bank of a river and hired a boat. But in crossing the river by boat, the farmer could carry only himself and a single one of his purchases. If left alone together, the goose would eat the beans and the fox would eat the goose. How can the farmer carry them all across safely?
Approach of DP
The fundamental approach of DP involves:
1. Breaking a multistage problem down into its subparts or single stages, a process called DECOMPOSITION.
2. RECURSIVE decision making, i.e., one decision at each stage, according to a specific optimization objective of that stage.
3. Combining the results of the stages to solve the entire problem, a process called COMPOSITION.
Computational approaches: Forward Recursion and Backward Recursion
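The two recursions can be contrasted on a small hypothetical staged network (the nodes, arcs and costs below are illustrative, not from the text). Backward recursion folds costs from the destination toward the origin; forward recursion accumulates best costs outward from the origin; both reach the same optimal value:

```python
# Hypothetical 3-stage network: each stage maps (state_in, state_out) -> cost.
cost = [
    {("s", "a"): 2, ("s", "b"): 5},                                 # first stage
    {("a", "c"): 4, ("a", "d"): 1, ("b", "c"): 1, ("b", "d"): 3},   # middle stage
    {("c", "t"): 3, ("d", "t"): 6},                                 # last stage
]

def backward(cost):
    """Fold costs from the destination 't' back toward the source 's'."""
    f = {"t": 0}
    for stage in reversed(cost):
        g = {}
        for (i, j), c in stage.items():
            if j in f:
                g[i] = min(g.get(i, float("inf")), c + f[j])
        f = g
    return f["s"]

def forward(cost):
    """Accumulate the best cost to reach each state, starting from 's'."""
    f = {"s": 0}
    for stage in cost:
        g = {}
        for (i, j), c in stage.items():
            if i in f:
                g[j] = min(g.get(j, float("inf")), f[i] + c)
        f = g
    return f["t"]

print(backward(cost), forward(cost))   # -> 9 9: both recursions agree
```

Either direction works for problems like these; backward recursion is the convention followed in the rest of these notes.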
Symbolic representation of a stage of analysis (model elements and notation):

    input state Si --> [ Stage i : decision xi, return ri ] --> output state Si-1

Symbolic representation of n stages of analysis using backward recursion (a sequence of n decisions):

    Sn --> [ Stage n : xn, rn(Sn, xn), fn(Sn, xn) ] --> Sn-1 --> [ Stage n-1 : xn-1, rn-1(Sn-1, xn-1), fn-1(Sn-1, xn-1) ] --> Sn-2 --> ... --> S1 --> [ Stage 1 : x1, r1(S1, x1), f1(S1, x1) ] --> S0
Si - state of the system prior to stage i
ri(Si, xi) - direct criterion return from stage i
fi(Si, xi) - cumulative criterion return from stages 1 through i
Bellman’s Principle of Optimality
An optimal set of decision rules has the property that, regardless of the ith decision, the remaining decisions must be optimal with respect to the outcome that results from the ith decision.
Shortest Route problem
The objective is to determine the path from the origin to the destination that minimizes the sum of the numbers along the directed arcs of the path. Typically, the number associated with each arc represents the distance, cost, or time of travelling along that particular segment of the journey.
[Network diagram: nodes 1 to 8, with node 1 the origin and node 8 the destination. Arc distances: 1-2: 5, 1-3: 3, 1-4: 4; 2-5: 6, 2-6: 2, 3-5: 7, 3-7: 4, 4-7: 5; 5-8: 4, 6-8: 1, 7-8: 3.]
Shortest Route problem
DP decomposes this problem into three stages, one for each leg of the journey.
[The same network marked by legs: Leg 1 (node 1 to nodes 2, 3, 4) is Stage 3; Leg 2 (to nodes 5, 6, 7) is Stage 2; Leg 3 (to node 8) is Stage 1.]

Input nodes by stage:
Stage 3: node 1
Stage 2: nodes 2, 3, 4
Stage 1: nodes 5, 6, 7
Backward recursion - Stage 1
Entering state S1    Decision x1 (travel to 8)    Optimal Policy
(travel from)        f1 = r1 + f0*                Decision x1*    Cumulative return f1*
5                    4+0=4                        8               4
6                    1+0=1                        8               1
7                    3+0=3                        8               3
Stage 2
Entering state S2    Decision x2 (travel to): f2 = r2 + f1*    Optimal Policy
(travel from)        x2=5       x2=6       x2=7                Decision x2*    Cumulative return f2*
2                    6+4=10     2+1=3      -                   6               3
3                    7+4=11     -          4+3=7               7               7
4                    -          -          5+3=8               7               8

Stage 3
Entering state S3    Decision x3 (travel to): f3 = r3 + f2*    Optimal Policy
(travel from)        x3=2       x3=3       x3=4                Decision x3*    Cumulative return f3*
1                    5+3=8      3+7=10     4+8=12              2               8
Solution
[Network diagram with the optimal path highlighted.]

Path: 1 - 2 - 6 - 8; Cost: 5 + 2 + 1 = 8
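The backward recursion above can be sketched in a few lines of Python (a minimal sketch; the dictionary layout and variable names are my own, while the arc distances are those of the network):

```python
# Backward-recursion sketch for the shortest-route network.
arcs = {
    (1, 2): 5, (1, 3): 3, (1, 4): 4,                         # leg 1 (stage 3)
    (2, 5): 6, (2, 6): 2, (3, 5): 7, (3, 7): 4, (4, 7): 5,   # leg 2 (stage 2)
    (5, 8): 4, (6, 8): 1, (7, 8): 3,                         # leg 3 (stage 1)
}
stages = [[5, 6, 7], [2, 3, 4], [1]]   # states, processed from stage 1 back to stage 3

f = {8: (0, None)}   # f*[node] = (cumulative return, best next node); f0* = 0
for nodes in stages:
    for i in nodes:
        f[i] = min((d + f[j][0], j) for (a, j), d in arcs.items() if a == i)

path, node = [1], 1          # read the optimal policy off from node 1
while node != 8:
    node = f[node][1]
    path.append(node)
print(path, f[1][0])         # -> [1, 2, 6, 8] 8
```

Each entry of `f` is exactly one row of the stage tables: the cumulative return fi* and the optimal decision xi* for that entering state.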
Exercises to do
1. Change the distance of arc 2-6 to 5 and completely solve for the shortest route using backward recursion.
2. Using DP, determine the longest route for the same problem.
Resource allocation problem
A company has 5 salesmen to be allocated to 3 marketing zones. The return or profit depends upon the number of salesmen working in the zone. The expected returns for different numbers of salesmen in the different zones, as estimated from past records, are shown below. Determine the optimal allocation policy.
No. of salesmen    Zone 1    Zone 2    Zone 3
0                  45        30        35
1                  58        45        45
2                  70        60        52
3                  82        70        64
4                  93        79        72
5                  101       90        82
Let s be the number of salesmen available,
xj be the number of salesmen allocated to zone j, and
Pj(xj) be the return from zone j when xj salesmen are allocated to it.
Formulation
Stage 1 = Zone 3: input state S1, decision x1, return r1, output state S0 = 0
Stage 2 = Zone 2: input state S2, decision x2, return r2, output state S1
Stage 3 = Zone 1: input state S3 = 5, decision x3, return r3, output state S2

Max z = P1(x1) + P2(x2) + P3(x3)
subject to x1 + x2 + x3 <= 5, x1, x2, x3 >= 0
Stage 1
Entering state S1    Decision return    Optimal policy
                                        x1*    f1*
0                    35                 0      35
1                    45                 1      45
2                    52                 2      52
3                    64                 3      64
4                    72                 4      72
5                    82                 5      82

Recursion equation: f2(s2, x2) = r2(x2) + f1*(s1)
f2*(s2) = opt over x2 of { r2(x2) + f1*(s2 - x2) }
Transformation equation: s1 = s2 - x2
Stage 2
Entering      Decision x2: r2(x2) + f1*(s2 - x2)                                    Optimal policy
state S2      x2=0       x2=1       x2=2        x2=3        x2=4       x2=5        x2*    f2*
0             30+35=65                                                              0      65
1             30+45=75   45+35=80                                                   1      80
2             30+52=82   45+45=90   60+35=95                                        2      95
3             30+64=94   45+52=97   60+45=105   70+35=105                           2,3    105
4             30+72=102  45+64=109  60+52=112   70+45=115   79+35=114               3      115
5             30+82=112  45+72=117  60+64=124   70+52=122   79+45=124  90+35=125    5      125
Stage 3
Entering      Decision x3: r3(x3) + f2*(s3 - x3)                                           Optimal policy
state S3      x3=0         x3=1         x3=2         x3=3        x3=4        x3=5          x3*    f3*
5             45+125=170   58+115=173   70+105=175   82+95=177   93+80=173   101+65=166    3      177
Solution (tracing back through the stages):
Stage 3 (Zone 1): S3 = 5, x3 = 3, r3 = 82; S2 = 5 - 3 = 2
Stage 2 (Zone 2): S2 = 2, x2 = 2, r2 = 60; S1 = 2 - 2 = 0
Stage 1 (Zone 3): S1 = 0, x1 = 0, r1 = 35; S0 = 0

Total return = 82 + 60 + 35 = 177
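The same stage-by-stage recursion can be sketched in Python (variable names are my own; the returns P[zone][x] are the table values above):

```python
# DP sketch for allocating 5 salesmen across 3 zones.
P = {
    1: [45, 58, 70, 82, 93, 101],   # zone 1 returns for x = 0..5 salesmen
    2: [30, 45, 60, 70, 79, 90],    # zone 2
    3: [35, 45, 52, 64, 72, 82],    # zone 3
}

# Stage 1 (zone 3): with s salesmen left, allot all of them (returns rise with x).
f = {s: P[3][s] for s in range(6)}
# Stages 2 and 3 (zones 2, then 1): f*(s) = max over x of P(x) + f*(s - x)
for zone in (2, 1):
    f = {s: max(P[zone][x] + f[s - x] for x in range(s + 1)) for s in range(6)}

print(f[5])   # -> 177, matching f3* in the stage 3 table
```

Keeping the intermediate `f` dictionaries (f1*, f2*) would let you trace back the allocation 3, 2, 0 exactly as done in the solution above.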
Cargo Loading Problem
A truck can carry cargo up to 10 tonnes. There are three different items available to be loaded in the truck, each with a different utility value. The objective is to find the number of units of each item to load in the truck so as to maximize the total utility. The details are given in the table below.

Item    Weight    Utility or benefit
A       4         11
B       3         7
C       5         5
Stage 1 = Item C: input state S1, decision x1, return r1, output state S0 = 0
Stage 2 = Item B: input state S2, decision x2, return r2, output state S1
Stage 3 = Item A: input state S3 = 10, decision x3, return r3, output state S2

Each state Si is the available space entering the stage, xi is the loading decision for that stage's item, and ri is the utility due to loading that item.
Stage 1 (Item C: weight 5, utility 5)
Entering state S1    Decision x1    f1 = r1 + f0*    Optimal Policy
                                                     Decision x1*    Cumulative return f1*
0, 1, 2, 3, 4        0              0                0               0
5, 6, 7, 8, 9        1              5                1               5
10                   2              10               2               10
Stage 2 (Item B: weight 3, utility 7)
Entering      Decision x2: f2 = r2 + f1*                   Optimal policy
state S2      x2=0       x2=1       x2=2       x2=3        x2*    f2*
0, 1, 2       0                                            0      0
3             0          7+0=7                             1      7
4             0          7+0=7                             1      7
5             0+5=5      7+0=7                             1      7
6             0+5=5      7+0=7      14+0=14                2      14
7             0+5=5      7+0=7      14+0=14                2      14
8             0+5=5      7+5=12     14+0=14                2      14
9             0+5=5      7+5=12     14+0=14    21+0=21     3      21
10            0+10=10    7+5=12     14+0=14    21+0=21     3      21
Stage 3 (Item A: weight 4, utility 11)
Entering      Decision x3: f3 = r3 + f2*          Optimal policy
state S3      x3=0       x3=1        x3=2         x3*    f3*
10            0+21=21    11+14=25    22+0=22      1      25
Solution (tracing back through the stages):
Stage 3 (Item A): S3 = 10, x3 = 1, r3 = 11; S2 = 10 - 4 = 6
Stage 2 (Item B): S2 = 6, x2 = 2, r2 = 14; S1 = 6 - 6 = 0
Stage 1 (Item C): S1 = 0, x1 = 0, r1 = 0; S0 = 0

Maximum utility = 11 + 14 + 0 = 25
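This is an unbounded-knapsack recursion, and the three stage tables collapse into a short Python sketch (the layout is my own; the weights, utilities and capacity are the problem data):

```python
# Knapsack-style sketch of the cargo-loading DP: capacity 10 tonnes,
# items A(4 t, utility 11), B(3 t, 7), C(5 t, 5), any number of each.
items = [("C", 5, 5), ("B", 3, 7), ("A", 4, 11)]   # stage 1 first, as above

f = {s: 0 for s in range(11)}   # f0*(s) = 0: no item considered yet
for name, weight, utility in items:
    # f*(s) = max over x of utility*x + f_prev*(s - weight*x)
    f = {s: max(utility * x + f[s - weight * x] for x in range(s // weight + 1))
         for s in f}

print(f[10])   # -> 25, matching f3* in the stage 3 table
```

After each pass of the loop, `f` reproduces one of the stage tables above (f1*, then f2*, then f3*).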
Production Schedule Problem
A company has to meet the following demand for an item in the months May, June, July and August.

Month     May    June    July    August
Demand    30     40      20      30

The item is to be delivered at the end of each month. The production cost associated with manufacturing the item depends on the number of units produced:

No. of units produced    0    10      20      30       40
Production cost (Rs.)    0    7000    9000    10000    11000

The maximum storage capacity is 30 units. Items that are not delivered in the same month may be stored in inventory at a cost of Rs. 100/unit/month. The beginning inventory in May is 20 units, and the ending inventory in August must be zero. For practical purposes, items can be produced, stored and delivered in batches of 10 units. Determine the production-inventory schedule that minimizes the total production and inventory cost while meeting the demand requirements.
Formulation (backward recursion)
Stage 1 = August: input state S1, decision x1, return r1, output state S0 = 0
Stage 2 = July: input state S2, decision x2, return r2, output state S1
Stage 3 = June: input state S3, decision x3, return r3, output state S2
Stage 4 = May: input state S4 = 20, decision x4, return r4, output state S3

Let Si be the initial inventory in month i,
xi be the number of units produced in month i,
Di be the demand in month i, and
C(xi) be the cost of producing xi units in month i.

Stage transformation equation: Si-1 = Si + xi - Di
ri(Si, xi) = C(xi) + 100 Si
fi*(Si) = min over xi of { C(xi) + 100 Si + fi-1*(Si-1) }
Stage 1 (August, demand 30; the ending inventory S0 must be 0, so x1 = 30 - S1)
Entering state S1    Feasible decision x1    f1 = C(x1) + 100 S1    x1*    f1*
0                    30                      10000 + 0 = 10000      30     10000
10                   20                      9000 + 1000 = 10000    20     10000
20                   10                      7000 + 2000 = 9000     10     9000
30                   0                       0 + 3000 = 3000        0      3000
Stage 2 (July, demand 20): f2 = C(x2) + 100 S2 + f1*(S2 + x2 - 20)
S2 = 0:  x2=20: 9000+0+10000=19000; x2=30: 10000+0+10000=20000; x2=40: 11000+0+9000=20000  ->  x2* = 20, f2* = 19000
S2 = 10: x2=10: 7000+1000+10000=18000; x2=20: 9000+1000+10000=20000; x2=30: 10000+1000+9000=20000; x2=40: 11000+1000+3000=15000  ->  x2* = 40, f2* = 15000
S2 = 20: x2=0: 0+2000+10000=12000; x2=10: 7000+2000+10000=19000; x2=20: 9000+2000+9000=20000; x2=30: 10000+2000+3000=15000  ->  x2* = 0, f2* = 12000
S2 = 30: x2=0: 0+3000+10000=13000; x2=10: 7000+3000+9000=19000; x2=20: 9000+3000+3000=15000  ->  x2* = 0, f2* = 13000
Stage 3 (June, demand 40): f3 = C(x3) + 100 S3 + f2*(S3 + x3 - 40)
S3 = 0:  x3=40: 11000+0+19000=30000  ->  x3* = 40, f3* = 30000
S3 = 10: x3=30: 10000+1000+19000=30000; x3=40: 11000+1000+15000=27000  ->  x3* = 40, f3* = 27000
S3 = 20: x3=20: 30000; x3=30: 27000; x3=40: 25000  ->  x3* = 40, f3* = 25000
S3 = 30: x3=10: 29000; x3=20: 27000; x3=30: 25000; x3=40: 27000  ->  x3* = 30, f3* = 25000
S3 = 40: x3=0: 23000; x3=10: 26000; x3=20: 25000; x3=30: 27000  ->  x3* = 0, f3* = 23000
Stage 4 (May, demand 30): f4 = C(x4) + 100 S4 + f3*(S4 + x4 - 30)
S4 = 20: x4=10: 7000+2000+30000=39000; x4=20: 9000+2000+27000=38000; x4=30: 10000+2000+25000=37000; x4=40: 11000+2000+25000=38000  ->  x4* = 30, f4* = 37000
Solution (tracing back through the stages):
Stage 4 (May): S4 = 20, x4 = 30, production cost 10000; S3 = 20 + 30 - 30 = 20
Stage 3 (June): S3 = 20, x3 = 40, production cost 11000; S2 = 20 + 40 - 40 = 20
Stage 2 (July): S2 = 20, x2 = 0, production cost 0; S1 = 20 + 0 - 20 = 0
Stage 1 (August): S1 = 0, x1 = 30, production cost 10000; S0 = 0

Total cost = production (10000 + 11000 + 0 + 10000) + inventory (2000 + 2000 + 2000 + 0) = 37000
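The four stage tables can also be generated by a short backward-recursion sketch in Python (the variable names and dictionary layout are my own; the costs, demands and the Rs. 100/unit/month holding charge are the problem data):

```python
# Sketch of the production-inventory DP, in batches of 10 units.
C = {0: 0, 10: 7000, 20: 9000, 30: 10000, 40: 11000}   # production cost C(x)
demand = [30, 40, 20, 30]                              # May, June, July, August

f = {0: 0}                      # ending inventory in August must be zero
for D in reversed(demand):      # backward recursion: August is stage 1
    g = {}
    for S in range(0, 40, 10):  # entering inventory; storage capacity is 30
        costs = [C[x] + 100 * S + f[S + x - D]
                 for x in C if S + x - D in f]   # keep feasible decisions only
        if costs:
            g[S] = min(costs)
    f = g

print(f[20])   # -> 37000, the minimum total cost starting May with 20 units
```

Each pass of the outer loop produces one stage table (f1* for August first, f4* for May last); the membership test `S + x - D in f` enforces both demand satisfaction and the 30-unit storage limit.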
Thank You