
© 2005 Warren B. Powell

The Optimizing-Simulator: Merging Optimization and Simulation Using Approximate Dynamic Programming

Winter Simulation Conference
December 5, 2005

Warren Powell
CASTLE Laboratory
Princeton University

http://www.castlelab.princeton.edu

© 2004 Warren B. Powell, Princeton University


Yellow Freight System



The fractional jet ownership industry


Schneider National


Air Mobility Command

[Figure: Air Mobility Command airbase resources: fuel, cargo handling, ramp space, maintenance, cargo holding]


The challenges

Needs for simulation:
» Are we using the right mix of people and equipment?
» What is the effect of new policies regarding the management of people and equipment?
» What is the marginal contribution from serving customers?
» What is the effect of last-minute demands on the system?


The challenges

We need simulation technology that accomplishes the following:
» Decisions have to handle high-dimensional states and actions (assigning different types of resources to different types of tasks).
» The simulator has to produce behaviors that are “good” not just at a point in time, but over time (decisions have to think about the future).
» Performance statistics must match historical performance.


Outline

Modeling and problem representation


Modeling

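Resources can have a number of attributes. Four examples of the attribute vector $a$, from simple to complex:

$$a = \begin{bmatrix} \text{Location} \\ \text{Equipment type} \end{bmatrix}$$

$$a = \begin{bmatrix} \text{Location} \\ \text{ETA} \\ \text{Equipment type} \\ \text{Train priority} \\ \text{Pool} \\ \text{Due for maint} \\ \text{Home shop} \end{bmatrix}$$

$$a = \begin{bmatrix} \text{Location} \\ \text{ETA} \\ \text{A/C type} \\ \text{Fuel level} \\ \text{Home shop} \\ \text{Crew} \\ \text{Eqpt1} \\ \vdots \\ \text{Eqpt100} \end{bmatrix}$$

$$a = \begin{bmatrix} \text{Location} \\ \text{ETA} \\ \text{Bus. segment} \\ \text{Single/team} \\ \text{Domicile} \\ \text{Drive hours} \\ \text{Duty hours} \\ \text{8 day history} \\ \text{Days from home} \end{bmatrix}$$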


Modeling

The attribute vector:

$$a_t = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$

The resource state variable:

$$R_{ta} = \text{Number of resources with attribute } a \text{ at time } t$$

$$R_t = (R_{ta})_{a \in \mathcal{A}} = \text{Resource state variable}$$


Modeling

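Decision set function:

$$\mathcal{D}_a = \text{Set of decision types we can use to act on a resource with attribute } a$$

[Figure: a decision $d$ acts on a resource with attribute vector $a_t = (a_1, a_2, \ldots, a_n)^T$ and produces the modified resource label $a_{t+1}$]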


Modeling

The “modify” function:

$$M(a_{t-1}, W_t, d_t) = (a_t, c)$$

The information process:

$$W_t = \text{Vector of information arriving during time interval } t$$

Ex: new customer requests, equipment failures, weather delays.


Modeling

Decisions:

$$x_{tad} = \text{Number of resources with attribute } a \text{ that we can act on with decision } d \text{ using the information available at time } t$$

$$x_t = (x_{tad})_{a \in \mathcal{A},\, d \in \mathcal{D}}$$

The decision function:

$$x_t = X_t^\pi(I_t)$$

$$\pi \in \Pi = \text{Set of decision functions (policies)}$$

$$I_t = \text{Information available for making a decision}$$


Approximate dynamic programming

Information and decision processes:

[Figure: timeline in which the decisions $x_0, x_1, \ldots, x_6$, determined by a policy, alternate with the exogenous information process $W_1, W_2, \ldots, W_6$]


Modeling

System dynamics (classical view):

Given a decision function (policy) $X_t^\pi(S_t)$ and exogenous information process $W_t$, we can model the evolution of the state of our system using:

$$S_{t+1} = f(S_t, X_t^\pi(S_t), W_{t+1})$$
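The state transition above can be sketched as a simulation loop. This is a minimal illustration rather than the CASTLE Lab implementation: the function names (`simulate`, `dispatch_up_to_two`, `depot_transition`, `returning_trucks`) and the toy depot model are assumptions introduced here.

```python
import random

def simulate(policy, transition, sample_info, s0, horizon, seed=0):
    """Roll the system forward with S_{t+1} = f(S_t, X^pi(S_t), W_{t+1})."""
    rng = random.Random(seed)
    state, path = s0, [s0]
    for t in range(horizon):
        x = policy(state)                # decision X^pi(S_t)
        w = sample_info(rng, t + 1)      # exogenous information W_{t+1}
        state = transition(state, x, w)  # next state S_{t+1}
        path.append(state)
    return path

# Hypothetical toy model: the state is the number of trucks at a depot.
def dispatch_up_to_two(s):
    return min(s, 2)                     # dispatch at most two trucks

def depot_transition(s, x, w):
    return s - x + w                     # trucks leave, returning trucks arrive

def returning_trucks(rng, t):
    return rng.randint(0, 2)             # random number of returning trucks

path = simulate(dispatch_up_to_two, depot_transition, returning_trucks,
                s0=5, horizon=6)
print(path)
```

Swapping in a different `policy` function changes the decisions without touching the model of the physical system, which is the separation the slides draw between what the user provides and the decision function.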


Modeling

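[Figure: from the state $S_t$, the decision $X_t^\pi(S_t)$ leads to $S_t^x$; the exogenous information $W_{t+1}$ then produces $S_{t+1}$]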


Modeling

User provides: Model of physical system
» Data:
  • Resource vector $R_t$
  • Information process $W_t$
» Software:
  • Decision set function $\mathcal{D}_a$
  • Modify function $M(a_t, d_t, W_{t+1})$

Our research goal: The decision function
» Decision functions $X_t^\pi(I_t)$


Outline

The optimizing simulator


Optimizing over time

[Figure: resources and tasks at times t, t+1, and t+2, with each assignment problem labeled “Optimizing at a point in time”]




Classical simulation:

[Flowchart: t = 0 → make decision at time t → update system state at t+1 → t = t + 1 → repeat while t < T]

» Simple
» Extremely flexible

But . . .
» Limited solution quality
» Often requires extensive user-defined tables to guide the simulation.
» Can respond to changes in inputs in an unpredictable way.



Optimization
» Intelligent
» Responds naturally to new datasets.

But . . .
» Struggles to handle complexity of real operations.
» Does not model evolution of information.
» Might be “too intelligent”?

$$\min \sum_t c_t x_t$$

$$A_t x_t - B_{t-1} x_{t-1} = b_t$$

$$D_t x_t \le u_t$$

$$x_t \ge 0$$


Multicommodity flow

[Figure: network with axes Time, Space, and Type]



Simulation
» Strengths
  • Extremely flexible
  • High level of detail
» Weaknesses
  • Low level of “intelligence”
  • Lower solution quality
  • May have difficulty “behaving” properly with new scenarios.
  • Difficulty adapting to random outcomes.

Optimization
» Strengths
  • High level of intelligence
  • System behaves “optimally” even with new datasets
  • Reduces data set preparation.
» Weaknesses
  • Strict rules on problem structure
  • Low level of detail
  • Inflexible!

To simulate or to optimize . . .

. . . Why are we asking this question?


Decision-making technologies

Cost-based
» The standard assumption of math programming.
» Easily handles tradeoffs.
» Easily handles high dimensions.
» Can be difficult to tune to get the right behavior.

Rule-based
» Typically associated with AI.
» Very flexible.
» Difficult coding tradeoffs.
» Struggles with higher dimensional states.


The four information classes
» Knowledge $K_t$
» Forecasts of exogenous events $\Omega_t$
» Forecasts of impacts on others $V_t$
» Expert knowledge $\rho$


The four information classes

Knowledge $K_t$


Knowledge

Rule-based: one aircraft and one requirement

[Figure: aircraft and requirements located in California, Germany, New Jersey, Colorado, Taiwan, and England]


Knowledge

Cost based: one requirement and multiple aircraft

[Figure: aircraft and requirements located in California, Germany, New Jersey, Colorado, Taiwan, and England]



Knowledge

Costs allow you to make tradeoffs:

[Figure: California and Germany]

Issue                             “cost”/“bonus”
Repositioning cost                -$17,000
Appropriate a/c type              +5000
Utilization                       +8000
Requires modifications            -3000
Special maintenance at airbase    -1000
Total “cost”                      -8000
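The tradeoff arithmetic can be checked with a few lines of Python. The first set of terms is taken from the slide; the second candidate ("aircraft B") and its numbers are hypothetical, added only to show how a cost-based rule ranks alternatives.

```python
# Cost ("-") and bonus ("+") terms from the slide for one aircraft.
slide_terms = {
    "Repositioning cost": -17000,
    "Appropriate a/c type": 5000,
    "Utilization": 8000,
    "Requires modifications": -3000,
    "Special maintenance at airbase": -1000,
}

# Hypothetical second candidate, not from the slides.
aircraft_b_terms = {
    "Repositioning cost": -6000,
    "Utilization": 2000,
}

def total_contribution(terms):
    """Add up all costs and bonuses for one candidate assignment."""
    return sum(terms.values())

candidates = {"aircraft A (slide values)": slide_terms,
              "aircraft B (hypothetical)": aircraft_b_terms}
totals = {name: total_contribution(t) for name, t in candidates.items()}
best = max(totals, key=totals.get)  # least negative total wins

print(totals["aircraft A (slide values)"])  # -8000, matching the slide
print(best)
```

Because every issue is expressed in the same units, adding a new consideration only means adding one more term, which is the sense in which costs "allow you to make tradeoffs".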


Knowledge

Cost based: multiple requirements and aircraft

[Figure: aircraft and requirements located in California, Germany, New Jersey, Colorado, Taiwan, and England]



The information classes


Knowledge $K_t$


Forecasts of exogenous information

[Figure: aircraft and requirements located in California, Germany, New Jersey, Colorado, Taiwan, and England]

$X^\pi(I)$ involves solving a linear program/network model.
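To see why assigning a list of requirements to a list of aircraft jointly can beat handling them one at a time, here is a small sketch. It enumerates all assignments by brute force instead of solving a linear program or network model, so it only works for tiny instances; the cost matrix and function names are hypothetical.

```python
from itertools import permutations

def best_assignment(cost):
    """Jointly assign aircraft i to requirement perm[i] at minimum total cost.

    Brute-force enumeration, standing in for the LP/network solver.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

def greedy_assignment(cost):
    """Assign aircraft one at a time, each taking its cheapest free requirement."""
    free = set(range(len(cost)))
    total = 0
    for i in range(len(cost)):
        j = min(free, key=lambda k: cost[i][k])
        free.remove(j)
        total += cost[i][j]
    return total

# Hypothetical costs: cost[i][j] = cost of sending aircraft i to requirement j.
cost = [[17, 9, 12],
        [8, 14, 10],
        [11, 7, 15]]

print(best_assignment(cost))    # ((2, 0, 1), 27)
print(greedy_assignment(cost))  # 32
```

A real model would use a network or LP solver, which scales to thousands of resources; the enumeration here grows factorially and is only meant to make the comparison concrete.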


Resources that are known now…




[Figure: aircraft and requirements located in California, Germany, New Jersey, Colorado, Taiwan, and England]

$X^\pi(I)$ involves solving a linear program/network model.


Resources that are known now…



[Figure: aircraft and requirements located in California, Germany, New Jersey, Colorado, Taiwan, and England; the resources known now form $R_t$, and the resources forecasted for the future form $(R_{tt'})_{t' > t}$]

… and are forecasted for the future.



The Information classes



Knowledge $K_t$



Decisions now may need to know the impact on future decisions:
» What is the cost of assigning this type of aircraft to move a requirement?
» What is the value of having a certain number of aircraft in a region?
» Should this requirement be satisfied now? Later? Never?

For these questions, it is important that we optimize over time.

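[Figure: at time t, acting on a resource with attribute $a$ leads to downstream values $V(a')$ and $V(a'')$; with two resources $a_1$ and $a_2$, the downstream values are $V(a'_1)$ and $V(a'_2)$]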


The optimization challenge

?


State variables

Systems evolve through a cycle of exogenous and endogenous information.

[Figure: timeline in which the decisions $x_0, x_1, \ldots, x_6$ alternate with the exogenous information $\omega = (\hat{R}_1, \hat{R}_2, \ldots, \hat{R}_6)$, producing the resource states $R_0, R_1, \ldots, R_6$]



Using this state variable, we obtain the optimality equations:

$$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \} \right)$$

Problem: Curse of dimensionality

Three curses
» State space
» Outcome space
» Action space (feasible region)
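A quick count shows how fast the state space grows. If $N$ identical resources are spread over $|\mathcal{A}|$ attribute values, the number of distinct resource state vectors $R_t$ is the standard stars-and-bars count $\binom{N + |\mathcal{A}| - 1}{|\mathcal{A}| - 1}$. The function name and the example sizes below are illustrative assumptions, not figures from the talk.

```python
from math import comb

def num_resource_states(num_resources, num_attributes):
    """Number of vectors (R_ta) with nonnegative integer entries summing to N."""
    return comb(num_resources + num_attributes - 1, num_attributes - 1)

print(num_resource_states(2, 3))     # 6
print(num_resource_states(10, 100))  # more than 10**13 states
```

Even ten resources over one hundred attribute values give more states than could be enumerated in a lookup table, before the outcome and action spaces are considered.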



The computational challenge:

$$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \} \right)$$

How do we find $V_{t+1}(R_{t+1})$?

How do we compute the expectation?

How do we find the optimal solution?



A possible approximation strategy:

We start with:

$$V_t(R_t) = \max_{x_t} \left( C_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \} \right)$$

Can’t compute this!!!

We solve this for a sample realization:

$$V_t(R_t, \omega) = \max_{x_t} \left( C_t(R_t, x_t) + V_{t+1}(R_{t+1}(\omega)) \right)$$

Now substitute in function approximations:

$$\tilde{V}_t(R_t, \omega) = \max_{x_t} \left( C_t(R_t, x_t) + \bar{V}_{t+1}(R_{t+1}(\omega)) \right)$$

Don’t know what this is!

Need to approximate V



One big problem….

$$\tilde{V}_t(R_t, \omega) = \max_{x_t} \left( C_t(R_t, x_t) + \bar{V}_{t+1}(R_{t+1}(\omega)) \right)$$

Seeing $R_{t+1}$ is cheating!



Alternative: Change the definition of the state variable:

[Figure: timeline of the decisions $x_0, x_1, \ldots, x_6$ and the exogenous information $\hat{R}_1, \hat{R}_2, \ldots, \hat{R}_6$, with the resource states $R_0, R_1, \ldots$ repositioned relative to the decisions and the information]


Approximate dynamic programming

Now our optimality equation looks like:

$$V_{t-1}(R_{t-1}^x) = E\left\{ \max_{x_t \in \mathcal{X}_t} \left( C_t(R_t, x_t) + V_t(R_t^x(\omega, x_t)) \right) \mid R_{t-1}^x \right\}$$

We drop the expectation and solve the conditional problem:

$$V_{t-1}(R_{t-1}^x, \hat{R}_t(\omega)) = \max_{x_t \in \mathcal{X}_t(\omega)} \left( C_t(R_t(\omega), x_t(\omega)) + V_t(R_t^x(\omega, x_t)) \right)$$

Finally, we substitute in our approximation:

$$\tilde{V}_{t-1}(R_{t-1}^x, \hat{R}_t(\omega)) = \max_{x_t \in \mathcal{X}_t(\omega)} \left( C_t(R_t(\omega), x_t(\omega)) + \bar{V}_t(R_t^x(\omega, x_t)) \right)$$

» Expectation outside of the “max” operator.
» Post-decision state variable
» “Convenient” value function approximation.



Approximating the value function:
» We choose approximations of the form:

Linear (in the resource state):

$$\bar{V}_t(R_t) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} \cdot R_{ta}$$

Best when assets are complex, which means that $R_{ta}$ is small (typically 0 or 1).

Piecewise linear, separable:

$$\bar{V}_t(R_t) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta})$$

Best when assets are simple, which means that $R_{ta}$ may be larger.
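The two forms can be compared on a small example. In this sketch the dictionaries, attribute names, and slope values are all hypothetical; a piecewise linear function on the integers is stored as a list of slopes, where `slopes[a][k]` is the marginal value of the (k+1)-th resource with attribute `a`.

```python
def linear_value(v_bar, R):
    """Linear approximation: sum over a of v_bar[a] * R[a]."""
    return sum(v_bar[a] * R[a] for a in R)

def separable_pwl_value(slopes, R):
    """Separable piecewise linear approximation: sum over a of V_a(R[a]).

    V_a(r) is the sum of the first r slopes for attribute a.
    """
    return sum(sum(slopes[a][:R[a]]) for a in R)

# Hypothetical resource state: number of resources per attribute.
R = {"NJ": 2, "CA": 1}

v_bar = {"NJ": 10.0, "CA": 6.0}                            # one slope each
slopes = {"NJ": [10.0, 7.0, 3.0], "CA": [6.0, 4.0, 1.0]}   # declining slopes

print(linear_value(v_bar, R))          # 26.0
print(separable_pwl_value(slopes, R))  # 23.0
```

The linear form values the second New Jersey resource as highly as the first; the piecewise linear form captures the declining marginal value, which matters when $R_{ta}$ can exceed 1.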



A myopic decision rule (policy):

$$x_t^n(\omega) = \arg\max_{x \in \mathcal{X}(\omega)} C_t(R_t(\omega), x_t(\omega))$$

A decision rule that looks into the future:

$$x_t^n(\omega) = \arg\max_{x \in \mathcal{X}(\omega)} \left( C_t(R_t(\omega), x_t(\omega)) + \bar{V}_t(R_t^x(\omega, x_t)) \right)$$
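The difference between the two rules is a single added term. The sketch below works over a small finite decision set, which is a simplification (the slides solve a linear program over $\mathcal{X}$), and every name and number in it is hypothetical.

```python
def myopic_decision(decisions, contribution):
    """Pick the decision with the largest immediate contribution C_t."""
    return max(decisions, key=contribution)

def lookahead_decision(decisions, contribution, post_state, v_bar):
    """Pick the decision maximizing C_t plus the approximate value of the
    post-decision state it leads to."""
    return max(decisions, key=lambda x: contribution(x) + v_bar(post_state(x)))

# Hypothetical single-aircraft example.
immediate = {"hold": 0, "short_haul": 5, "long_haul": 8}             # C_t
lands_in = {"hold": "NJ", "short_haul": "CO", "long_haul": "TW"}     # R_t^x
value_of_location = {"NJ": 4, "CO": 6, "TW": 1}                      # V-bar

decisions = list(immediate)
x_myopic = myopic_decision(decisions, immediate.get)
x_lookahead = lookahead_decision(decisions, immediate.get,
                                 lands_in.get, value_of_location.get)

print(x_myopic)     # long_haul
print(x_lookahead)  # short_haul
```

The long haul pays more now but strands the aircraft where it is worth little afterwards, so the rule with the value function term prefers the short haul.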



Simulating a myopic policy:

[Figure: assignments made at times t, t+1, and t+2]






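[Figure: resources with attributes $a_1$ and $a_2$ and downstream values $V(a'_1)$ and $V(a'_2)$; cars flow to customers, regional depots, or classification yards]

» Option 1: Send directly to customers
» Option 2: Send to regional depots
» Option 3: Send to classification yards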


Approximate dynamic programming

Two-stage resource allocation under uncertainty


Approximate dynamic programming

We obtain piecewise linear recourse functions for each region.


Approximate dynamic programming

The function is piecewise linear on the integers.

We approximate the value of cars in the future using a separable approximation.

[Figure: profits as a function of the number of vehicles at a location (0 to 5)]


Approximate dynamic programming

To capture nonlinear behavior:

Each link captures the marginal reward of an additional car.

[Figure: parallel links leaving the supplies $R_1^n, R_2^n, \ldots, R_5^n$]

Approximate dynamic programming

We estimate the functions by sampling from our distributions.

[Figure: supplies $R_1^n, \ldots, R_5^n$, sampled demands $D_1^n(\omega), D_2^n(\omega), D_3^n(\omega), \ldots, D_C^n(\omega)$, and the resulting marginal values $v_1^n(\omega), \ldots, v_5^n(\omega)$]



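The time t subproblem:

[Figure: resources $R_{1t}, R_{2t}, R_{3t}$ at time t flow to the future nodes (i-1, t+3), (i, t+1), and (i+1, t+5), valued by $V_{ta}^n(R_{1t}, R_{2t}, R_{3t})$]

Gradients:

$$(\hat{v}_{1t}^{n-}, \hat{v}_{1t}^{n+}), \quad (\hat{v}_{2t}^{n-}, \hat{v}_{2t}^{n+}), \quad (\hat{v}_{3t}^{n-}, \hat{v}_{3t}^{n+})$$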



Left and right gradients are found by solving flow augmenting path problems.

[Figure: the time t subproblem for $V_{ta}^n(R_{1t}, R_{2t}, R_{3t})$, highlighting resource $R_{3t}$ and the future node (i-1, t+3)]

Gradients: $\hat{v}_{3t}^{n+}$

The right derivative (the value of one more unit of that resource) is a flow augmenting path from that node to the supersink.



Left and right derivatives are used to build up a nonlinear approximation of the subproblem.

[Figure: the approximation $V_{it}^k(R_{1t})$ plotted against $R_{1t}$; at the point $R_{1t}^k$, the left derivative $v_t^{k-}$ and the right derivative $v_t^{k+}$ define new segments]

Each iteration adds new segments, as well as refining old ones.

[Figure: at the point $R_{1t}^{k+1}$, the derivatives $v_t^{(k+1)-}$ and $v_t^{(k+1)+}$ refine the approximation $V_{it}^k(R_{1t})$]
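One simple way to refine a piecewise linear approximation from sampled derivatives is sketched below. It is an assumption about how such an update could look, not the specific algorithm used in the talk: it smooths a single observed marginal value into one slope with a stepsize `alpha`, then levels neighboring slopes so they stay non-increasing (concave). The slides use both left and right derivatives; this sketch uses one sample per update to stay short.

```python
def update_slopes(slopes, r, v_hat, alpha):
    """Blend the sampled marginal value v_hat into segment r, then restore
    concavity by leveling any slope that violates monotonicity."""
    new = list(slopes)
    new[r] = (1 - alpha) * new[r] + alpha * v_hat
    for k in range(r):                    # segments to the left must be >= new[r]
        new[k] = max(new[k], new[r])
    for k in range(r + 1, len(new)):      # segments to the right must be <= new[r]
        new[k] = min(new[k], new[r])
    return new

slopes = [0.0, 0.0, 0.0, 0.0]                        # start with a flat function
step1 = update_slopes(slopes, r=1, v_hat=10.0, alpha=0.5)
step2 = update_slopes(step1, r=0, v_hat=9.0, alpha=0.5)

print(step1)  # [5.0, 5.0, 0.0, 0.0]
print(step2)  # [7.0, 5.0, 0.0, 0.0]
```

Keeping the slopes non-increasing preserves concavity, so the approximation can still be embedded in a linear program or network subproblem as a set of parallel links.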



[Figure: approximate value function vs. number of resources (0 to 10); the exact function $f(s) = \ln(1 + s)$ compared with the approximations after 1, 2, 5, 10, 15, and 20 iterations]


Simulating a myopic policy

[Figure: myopic assignments at time t]


Using value functions to anticipate the future

[Figure: decisions at time t, split into “here and now” and downstream impacts]


[Figure: % of objective upper bound (80 to 100) vs. iteration no. (1 to 100) for Agg_PWLinear_1, Agg_PWLinear_2, Agg_PWLinear_3, DisAgg_Linear, DisAgg_PWLinear, and Decomp_Location, compared with the mathematical optimum]


Approximate DP vs. LP


Downloadable at www.castlelab.princeton.edu



Expert knowledge $\rho$

Knowledge $K_t$


Low dimensional patterns

Old modeling approach: Engineering costs

$$x^* = \arg\min\, cx$$

$$\text{Subject to: } Ax = b, \quad x \ge 0$$

Objectives

“Physics”

“Behavior”


Flows from history


Flows from history

Flows from the model



Bottom up/top down modeling:

Specify the behaviorsyou want at a general

level.

Patterns

Specify costs,driver availability,work rules, routing

preferences, load avail.

Engineering



Pattern matching

$$x^* = \arg\min\, cx + \theta H(x, \rho)$$

Cost function

“Behavior”

The “happiness” function measures the degree to which model behavior agrees with a knowledgeable expert.

$$H(x, \rho) = \| G(x) - \rho \| \quad \text{where } G(x) \text{ is an aggregation function}$$
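The pattern-matching objective can be evaluated directly for a candidate solution. In the sketch below, the aggregation $G(x)$ is taken to be the share of total flow in each pattern group and the norm is Euclidean; both choices, along with all names and numbers, are assumptions made for illustration.

```python
from math import sqrt

def aggregate(x, group_of):
    """G(x): fraction of total flow falling in each pattern group."""
    total = sum(x.values())
    shares = {}
    for decision, flow in x.items():
        g = group_of[decision]
        shares[g] = shares.get(g, 0.0) + flow / total
    return shares

def happiness(x, rho, group_of):
    """H(x, rho) = ||G(x) - rho||, using the Euclidean norm."""
    g = aggregate(x, group_of)
    return sqrt(sum((g.get(k, 0.0) - rho[k]) ** 2 for k in rho))

def objective(x, cost, rho, group_of, theta):
    """cx + theta * H(x, rho): engineering cost plus the pattern penalty."""
    cx = sum(cost[d] * x[d] for d in x)
    return cx + theta * happiness(x, rho, group_of)

# Hypothetical flows by decision, aggregated into long vs. short hauls.
x = {"team_long": 30, "team_short": 10, "solo_long": 20, "solo_short": 40}
group_of = {"team_long": "long", "solo_long": "long",
            "team_short": "short", "solo_short": "short"}
cost = {d: 1.0 for d in x}
rho = {"long": 0.6, "short": 0.4}   # pattern from history or an expert

print(happiness(x, rho, group_of))
print(objective(x, cost, rho, group_of, theta=50.0))
```

Raising `theta` pushes the solution toward the pattern $\rho$ at the expense of engineering cost; setting it to zero recovers the pure cost model.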



Patterns and aggregation:
» What we do:
  • We define patterns based on an aggregation of the attributes of a single vehicle.
  • Patterns indicate the desirability of a single decision.
» Patterns can be expressed at different levels of aggregation, simultaneously.
  • Don’t send C-5’s into Saudi Arabia
  • Don’t send C-5’s needing maintenance into Saudi Arabia
  • Don’t send C-5’s needing maintenance loaded with freight to southeast Asia into Saudi Arabia.
» Patterns are not hard rules; they express desirable or undesirable patterns of behavior.


Flows from history




Length of haul calibration: teams

[Figure: length of haul (600 to 850) over iterations 1 to 10; series: Min, Solo w/ pattern, Solo w/o pattern, Max; the curves are labeled “Without pattern” and “With pattern”]



Patterns can come from history:


Low dimensional patterns

… or an expert:



Expert knowledge $\rho$

Knowledge $K_t$


The military airlift problem


Policy: decision function; information classes (listed in order of increasing information sets)

» (RB:R-A) Rule-based; $I_t = R_{tt}$
» (MP:R-AL/KNAN) Myopic cost-based, one requirement to a list of aircraft, known now and actionable now; $I_t = (R_{tt}, c_t)$
» (MP:R-AL/KNAF) Myopic cost-based, one requirement to a list of aircraft, known now and actionable in the future; $I_t = ((R_{tt'})_{t' \ge t}, c_t)$
» (MP:RL-AL/KNAN) Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable now; $I_t = (R_{tt}, c_t)$
» (MP:RL-AL/KNAF) Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable in the future; $I_t = ((R_{tt'})_{t' \ge t}, c_t)$
» (RH) Rolling horizon; $I_t = \{ (R_{t't''})_{t'' \ge t'}, c_{t'} \mid t' \in \mathcal{T}_t^{ph} \}$
» (ADP) Approximate Dynamic Programming; $I_t = \{ (R_{tt'})_{t' \ge t}, c_{t'}, V_{t'} \mid t' \in \mathcal{T}_t^{ph} \}$
» (EK) Expert knowledge; $I_t = \{ (R_{tt'})_{t' \ge t}, c_{t'}, V_{t'}, \rho \mid t' \in \mathcal{T}_t^{ph} \}$

Optimizing simulator


Costs of different policies

[Figure: bar chart of cost in million dollars (0 to 250) by policy, from (RB:R-A) through (MP:RL-AL/KNAN) to (ADP); series: transportation cost, late delivery cost, repair cost, total cost; annotations: rule based, choice of aircraft, actionable now, actionable future, value functions]


Throughput curves of policies

[Figure: throughput in millions of pounds (0 to 50) over time periods 0 to 210; series: cumulative expected thruput, (RB:R-A), (MP:R-AL/KNAN), (MP:RL-AL/KNAN), (MP:RL-AL/KNAF), (ADP)]




Areas between the cumulative expected thruput curve and different policy thruput curves

[Figure: bar chart in millions of pound-days (0 to 400) by policy: (RB:R-A), (MP:R-AL/KNAN), (MP:RL-AL/KNAN), (MP:RL-AL/KNAF), (ADP)]




Outline

Recent experiments with modeling airlift operations


Random demands and equipment failures


Pilots

Aircraft

Customers


Case study

Questions:

» What is the effect of uncertain demands on a military airlift schedule?

» What is the effect of equipment failures?

» How does adaptive learning change the effect of randomness on the performance of the simulation?

» What is the effect of advance information?


Effect of advance booking

[Figure: “Effect of advance notice”: percent coverage (86 to 100) when prebooking 0 hours, 2 hours, and 6 hours, without learning and with learning]


Midair refueling: initial solution



Path followed by tanker (moves up and down Atlantic).



Second plane crashes

First plane refuels

Green: full of fuel
Yellow to red: nearing empty
Black: empty (plane crashes)


Midair refueling: exploration

Learning over many iterations.


Planes learn to meet in the middle so both can refuel.

Midair refueling: final solution


Outline

Calibrating a model for a major truckload motor carrier


Schneider National


Truckload trucking

Questions for the model:
» What types of drivers should they hire?
  • Domicile?
  • Single drivers vs. teams?
» What is the value of knowing about customer requests farther in the future?
» What is the profitability of different customers?
» What is the value of increasing terminal capacity?


Truckload trucking

[Figure: LOH (0 to 1600) by capacity category (US_SOLO, US_IC, US_TEAM); series: historical maximum, simulation, historical minimum]


Truckload trucking

[Figure: two panels by capacity category, revenue per WU (0 to 1400) and utilization (0 to 1200); series: historical maximum, simulation, historical minimum]


Truckload trucking

Challenge
» We want to know the marginal value of each type of driver.
» A driver type is determined by:

$$a = \begin{bmatrix} \text{Location} \\ \text{Domicile} \\ \text{Driver type} \end{bmatrix} = \begin{bmatrix} 100 \\ 100 \\ 3 \end{bmatrix}$$

» There are 30,000 driver “types”!!!
» We need to take the “derivative” of our simulation for each type.
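The most direct way to take this “derivative” is by finite differences: add one driver of a given type, rerun the simulation, and record the change in the objective. The sketch below does this for a stand-in objective; the function names, the two driver types, and all numbers are hypothetical, and a real simulation would replace `toy_objective`.

```python
def marginal_values(objective, fleet):
    """Finite-difference marginal value of one more driver of each type."""
    base = objective(fleet)
    values = {}
    for a in fleet:
        bumped = dict(fleet)
        bumped[a] += 1                       # add one driver of type a
        values[a] = objective(bumped) - base
    return values

# Stand-in for a full simulation run: revenue from covering demand by type.
demand = {"solo": 3, "team": 1}
revenue = {"solo": 100, "team": 250}

def toy_objective(fleet):
    return sum(min(fleet[a], demand[a]) * revenue[a] for a in fleet)

fleet = {"solo": 2, "team": 1}
print(marginal_values(toy_objective, fleet))  # {'solo': 100, 'team': 0}
```

This approach needs one extra simulation run per driver type, so 30,000 types would need 30,000 additional runs, which presumably is why gradients obtained within a single run are attractive.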


Multistage problems

[Figure: forward pass over resource state-type and time (t, t+1, t+2); the decision functions $X_t^\pi(R_t)$, $X_{t+1}^\pi(R_{t+1})$, and $X_{t+2}^\pi(R_{t+2})$ are solved in turn, producing the gradients $\hat{v}_{t,1}^n, \hat{v}_{t,2}^n, \hat{v}_{t,3}^n$, then $\hat{v}_{t+1,1}^n, \hat{v}_{t+1,2}^n, \hat{v}_{t+1,3}^n$, then $\hat{v}_{t+2,1}^n, \hat{v}_{t+2,2}^n, \hat{v}_{t+2,3}^n$]


Backward pass

[Figure: after the decision functions $X_t^\pi(R_t)$, $X_{t+1}^\pi(R_{t+1})$, and $X_{t+2}^\pi(R_{t+2})$ have been solved, gradients are passed backward over resource state-type and time: $\hat{v}_{t+2,1}^n$ at t+2, then $\hat{v}_{t+1,2}^n$ at t+1, then $\hat{v}_{t,3}^n$ at t]


Driver fleet optimization

[Figure: simulation objective function (1,800,000 to 1,900,000) vs. # of drivers (580 to 650) for samples s1 to s10, their average (avg), and the prediction (pred); points run from the base case through +5, +10, +20, +30, +40, +50, and +60 resources]



[Figure: bar chart over driver types 1 to 20, with values from -500 to 3500; annotations: add drivers, reduce drivers]

Questions?
