© 2003 Warren B. Powell Slide 1
Approximate Dynamic Programming for High Dimensional Resource Allocation
NSF Electric Power workshop
November 3, 2003
Warren Powell
CASTLE Laboratory
Princeton University
http://www.castlelab.princeton.edu
© 2003 Warren B. Powell, Princeton University
© 2003 Warren B. Powell Slide 2
Schneider National
© 2003 Warren B. Powell Slide 3
Schneider National
© 2003 Warren B. Powell Slide 4
© 2003 Warren B. Powell Slide 5
© 2003 Warren B. Powell Slide 7
Air Mobility Command
[Figure: Air Mobility Command resources: fuel, cargo handling, ramp space, maintenance, cargo holding.]
© 2003 Warren B. Powell Slide 8
The optimization challenge
[Figure: five columns of nodes numbered 1 to 6, each node labeled with an attribute $a$. Label: Special equipment.]
© 2003 Warren B. Powell Slide 11
State variables
Modeling the military airlift problem:
» State variables:
$a$ = the attributes of the aircraft
$\mathcal{A}$ = the attribute space
$R_{ta}$ = 1 if the aircraft has attribute $a$
$R_t = (R_{ta})_{a \in \mathcal{A}}$ = the resource state vector
$\mathcal{R}$ = the resource state space
» Control variables:
$x_t$ = vector representing what we can do with the aircraft
© 2003 Warren B. Powell Slide 12
State variables
We can formulate the problem of determining what to do with our aircraft as a dynamic program:
$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} \left( c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \} \right)$
So just how big is our state space $\mathcal{R}$?
© 2003 Warren B. Powell Slide 13
State variables
If we only have N=1 aircraft:
$|\mathcal{R}| = |\mathcal{A}|$ = the number of potential attributes an aircraft may have.
If the attribute vector has one dimension:
$a$ = (Location), $|\mathcal{A}|$ = 100 - 1000 locations
The attribute space grows with the number of dimensions:
$a$ = (Location, Aircraft type): 500 locations, 10 aircraft types, $|\mathcal{A}|$ = 5000
$a$ = (Location, Aircraft type, Loaded/empty): 500 locations, 10 aircraft types, 2 states, $|\mathcal{A}|$ = 10,000
© 2003 Warren B. Powell Slide 14
State variables
What if we have N>1 aircraft?
$|\mathcal{R}| = \binom{N + |\mathcal{A}| - 1}{|\mathcal{A}| - 1}$
Number of resources   Attribute space   State space
1                     1                 1
1                     100               100
1                     1000              1,000
5                     10                2,002
5                     100               91,962,520
5                     1000              8,416,958,750,200
50                    10                12,565,671,261
50                    100               13,419,107,273,154,600,000,000,000,000,000,000,000,000
50                    1000              109,740,941,767,311,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
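The smaller entries in the state-space table can be checked directly. The sketch below assumes the count is the standard "stars and bars" binomial, the number of ways to spread N identical resources over |A| attributes; the function name `state_space_size` is illustrative and not from the slides.

```python
from math import comb


def state_space_size(num_resources: int, num_attributes: int) -> int:
    """Number of resource state vectors for N identical resources over |A| attributes.

    Assumes the stars-and-bars count C(N + |A| - 1, |A| - 1).
    """
    return comb(num_resources + num_attributes - 1, num_attributes - 1)


# Reproduce a few rows of the table (resources, attribute space).
for n, a in [(1, 100), (5, 10), (5, 100), (5, 1000), (50, 10)]:
    print(f"N={n:>2}, |A|={a:>4}: {state_space_size(n, a):,}")
```

Python integers have arbitrary precision, so the same function also evaluates the N=50 rows exactly, where the table shows rounded values.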
© 2003 Warren B. Powell Slide 15
State variables
[Figure: 3-D surface plot of the number of states (log scale: number of zeroes in the size of the state space) against the number of resources and the number of attributes.]
© 2003 Warren B. Powell Slide 16
Outline
An algorithmic strategy for high-dimensional asset allocation problems
© 2003 Warren B. Powell Slide 17
Approximate dynamic programming
Systems evolve through a cycle of exogenous and endogenous information
[Timeline: over time, decisions $x_0, x_1, \dots, x_6$ alternate with exogenous information $\hat{R}_1, \hat{R}_2, \dots, \hat{R}_6$.]
© 2003 Warren B. Powell Slide 18
Approximate dynamic programming
Systems evolve through a cycle of exogenous and endogenous information
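[Timeline: over time, decisions $x_0, x_1, \dots, x_6$ alternate with exogenous information $\hat{R}_1, \hat{R}_2, \dots, \hat{R}_6$; the resource states $R_0, R_1, \dots, R_6$ are marked along the same axis.]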
© 2003 Warren B. Powell Slide 19
Approximate dynamic programming
Using this state variable, we obtain the optimality equations:
$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} \left( c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \} \right)$
Problem: Curse of dimensionality
Three curses:
» State space
» Outcome space
» Action space (feasible region)
© 2003 Warren B. Powell Slide 20
Approximate dynamic programming
The computational challenge:
$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} \left( c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \} \right)$
How do we find $V_{t+1}(R_{t+1})$?
How do we compute the expectation?
How do we find the optimal solution?
© 2003 Warren B. Powell Slide 21
Approximate dynamic programming
Approximation methodology:
We start with:
$V_t(R_t) = \max_{x_t} \left( c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \} \right)$
Can’t compute this!!!
We solve this for a sample realization:
$V_t(R_t, \omega) = \max_{x_t} \left( c_t(R_t, x_t) + V_{t+1}(R_{t+1}(\omega)) \right)$
Now substitute in function approximations:
$\hat{V}_t(R_t, \omega) = \max_{x_t} \left( c_t(R_t, x_t) + \hat{V}_{t+1}(R_{t+1}(\omega)) \right)$
Don’t know what this is!
Seeing $\hat{R}_{t+1}$ is cheating!
© 2003 Warren B. Powell Slide 22
Adaptive dynamic programming
Alternative: Change the definition of the state variable:
[Timeline: decisions $x_0, x_1, \dots, x_6$, exogenous information $\hat{R}_1, \hat{R}_2, \dots, \hat{R}_6$, and resource states $R_0, R_1, \dots, R_6$ along the time axis.]
© 2003 Warren B. Powell Slide 23
Adaptive dynamic programming
Now our optimality equation looks like:
$V_{t-1}(R^x_{t-1}) = E\left\{ \max_{x_t \in \mathcal{X}_t} \left( c_t(R_t, x_t) + V_t(R^x_t(R_t, x_t)) \right) \mid R^x_{t-1} \right\}$
We drop the expectation and solve the conditional problem:
$V_{t-1}(R^x_{t-1}, \hat{R}_t(\omega)) = \max_{x_t \in \mathcal{X}_t(\omega)} \left( c_t(R^x_{t-1}, \hat{R}_t(\omega), x_t) + V_t(R^x_t(x_t)) \right)$
Finally, we substitute in our approximation:
$\hat{V}_{t-1}(R^x_{t-1}, \hat{R}_t(\omega)) = \max_{x_t \in \mathcal{X}_t(\omega)} \left( c_t(R^x_{t-1}, \hat{R}_t(\omega), x_t) + \hat{V}_t(R^x_t(x_t)) \right)$
© 2003 Warren B. Powell Slide 24
Adaptive dynamic programming
Approximating the value function:
» We choose approximations of the form:
Linear (in the resource state):
$\hat{V}_t(R_t) = \sum_{a \in \mathcal{A}} \hat{v}_{ta} R_{ta}$
Nonlinear, separable:
$\hat{V}_t(R_t) = \sum_{a \in \mathcal{A}} \hat{V}_{ta}(R_{ta})$
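A minimal sketch of how the two approximation forms could be evaluated, assuming the nonlinear separable functions are piecewise linear and stored as lists of slopes. The attribute names, slope values, and function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def linear_value(v_hat: Dict[str, float], R: Dict[str, int]) -> float:
    """Linear form: sum over attributes of slope times resource count."""
    return sum(v_hat.get(a, 0.0) * count for a, count in R.items())


def separable_value(slopes: Dict[str, List[float]], R: Dict[str, int]) -> float:
    """Separable piecewise-linear form (an assumed representation).

    slopes[a][k] is the marginal value of the (k+1)-th resource with
    attribute a. Resources beyond the listed slopes add zero value.
    """
    return sum(sum(slopes.get(a, [])[:count]) for a, count in R.items())


# Hypothetical resource state: two resources in TX, one in NY.
R_t = {"TX": 2, "NY": 1}
v_hat = {"TX": 450.0, "NY": 300.0}
slopes = {"TX": [450.0, 200.0, 50.0], "NY": [300.0, 100.0]}

print(linear_value(v_hat, R_t))      # 1200.0
print(separable_value(slopes, R_t))  # 950.0
```

The linear form values every additional resource at the same rate, while the separable form lets the marginal value fall as resources accumulate at one attribute.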
© 2003 Warren B. Powell Slide 25
Approximate dynamic programming
Multistage problems are typically solved as sequences of two-stage problems of the general form:
$\max_{x_t \in \mathcal{X}} \; C_t x_t + E\{ V_t(x_t, \omega) \}$
This period: $C_t x_t$
Future: $E\{ V_t(x_t, \omega) \}$
© 2003 Warren B. Powell Slide 26
Approximate dynamic programming
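Our basic strategy:
$x^n_t = \arg\max_{x_t \in \mathcal{X}} \; C_t x_t + \sum_{a \in \mathcal{A}} \hat{V}^{n-1}_{ta}(R_{ta}(x_t))$
where
$R_t(x_t) = A_t x_t$ (resource vector)
Separable approximation: $\hat{V}_{ta}(R_{ta}(x))$
[Figure: a separable approximation plotted over resource levels 0 to 5.]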
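A small sketch of the decision step under a strong simplification: the feasible region is reduced to "assign exactly n identical resources to destinations", and each separable approximation is concave and piecewise linear. Under those assumptions a greedy rule that repeatedly takes the largest marginal gain is optimal. The slides' actual subproblem is a larger network problem; the names and numbers here are hypothetical.

```python
from typing import Dict, List


def greedy_allocate(n: int, c: Dict[str, float],
                    slopes: Dict[str, List[float]]) -> Dict[str, int]:
    """Assign n resources one at a time to the attribute with the best marginal gain.

    c[a] is the immediate contribution per resource sent to a (assumed).
    slopes[a][k] is the approximate future value of the (k+1)-th resource at a
    and must be nonincreasing in k for the greedy rule to be optimal.
    """
    x = {a: 0 for a in c}

    def marginal_gain(a: str) -> float:
        k = x[a]
        a_slopes = slopes.get(a, [])
        future = a_slopes[k] if k < len(a_slopes) else 0.0
        return c[a] + future

    for _ in range(n):
        best = max(c, key=marginal_gain)  # ties go to the first attribute listed
        x[best] += 1
    return x


c = {"TX": 10.0, "NY": 0.0}
slopes = {"TX": [100.0, 40.0, 5.0], "NY": [80.0, 60.0, 20.0]}
print(greedy_allocate(3, c, slopes))  # {'TX': 1, 'NY': 2}
```

With general flow-conservation constraints the same objective would be handed to a linear or network solver instead of a greedy loop.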
© 2003 Warren B. Powell Slide 27
Research questions in electric power
Special equipment
© 2003 Warren B. Powell Slide 28
Research questions in electric power
Two-stage resource allocation under uncertainty
© 2003 Warren B. Powell Slide 29
Approximate dynamic programming
© 2003 Warren B. Powell Slide 30
Approximate dynamic programming
© 2003 Warren B. Powell Slide 31
Approximate dynamic programming
© 2003 Warren B. Powell Slide 32
Approximate dynamic programming
We estimate the functions by sampling from our distributions.
[Figure: network with resource levels $R^n_1, \dots, R^n_5$, sampled quantities $D_1(\omega^n), D_2(\omega^n), D_3(\omega^n), \dots, D_C(\omega^n)$, and marginal values $v_1(\omega^n), \dots, v_5(\omega^n)$.]
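One common way to turn sampled marginal values into value function estimates is exponential smoothing. The sketch below assumes that update rule and a linear approximation with one slope per attribute; the stepsizes, attribute names, and sampled values are hypothetical, and the slides do not state which update was used.

```python
from typing import Dict


def update_slopes(v_bar: Dict[str, float], v_sample: Dict[str, float],
                  alpha: float) -> Dict[str, float]:
    """Smooth sampled marginal values into the current slope estimates.

    v_bar[a] is the current estimate and v_sample[a] the marginal value
    observed at this iteration. Attributes not sampled keep their estimate.
    """
    updated = dict(v_bar)
    for a, v in v_sample.items():
        updated[a] = (1.0 - alpha) * v_bar.get(a, 0.0) + alpha * v
    return updated


v_bar = {"TX": 0.0, "NY": 0.0}
# Iteration 1: a stepsize of 1 replaces the initial estimate outright.
v_bar = update_slopes(v_bar, {"TX": 450.0}, alpha=1.0)
# Iteration 2: later samples are averaged with what we already believe.
v_bar = update_slopes(v_bar, {"TX": 350.0, "NY": 200.0}, alpha=0.5)
print(v_bar)  # {'TX': 400.0, 'NY': 100.0}
```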
© 2003 Warren B. Powell Slide 33
Approximate dynamic programming
A dynamic network:
© 2003 Warren B. Powell Slide 34
Approximate dynamic programming
Stepping through time:
© 2003 Warren B. Powell Slide 35
Approximate dynamic programming
Iterative learning:
© 2003 Warren B. Powell Slide 36
Nonlinear approximations
[Figure: approximate value function against number of resources (variable value $s$ from 0 to 10, functional value from 0.0 to 2.5). The exact function $f(s) = \ln(1+s)$ is compared with approximations after 1, 2, 5, 10, 15, and 20 iterations.]
© 2003 Warren B. Powell Slide 37
Competing algorithmic strategies
Competing optimal algorithms:
» Discrete dynamic programming
• Cannot handle even small problems
• Numerical comparisons are meaningless
» Stochastic programming
• Benders decomposition is optimal for this problem class
[Equation: $x^n_t = \arg\max \; c_t x_t + z_{t+1}$, subject to a family of cuts bounding $z_{t+1}$ in terms of $x_t$, and to $x_t \in \mathcal{X}_t$.]
© 2003 Warren B. Powell Slide 38
Benders decomposition
[Figure: results (vertical axis from 0 to 120,000) against iterations (25, 50, 100, 250, 500, 1000, 2500, 5000) for variations on Benders decomposition, the SPAR algorithm, and a deterministic approximation.]
© 2003 Warren B. Powell Slide 39
Conclusions:
» Using sequences of separable, nonlinear approximations conquers the explosive growth with the number of resources.
» We are now solving problems with thousands of resources.
» But what about the attribute space?
• Complex equipment and people are typically described by vectors of attributes.
• We require multidimensional attributes to capture complex assets such as equipment and people.
• The size of the attribute space grows exponentially in the number of dimensions.
© 2003 Warren B. Powell Slide 40
Benders decomposition
Percent from optimal, 100 iterations
[Figure: bar chart of percent over optimal (0 to 45) for SD, L-shaped, and CUPPS (variations on Benders decomposition) and for SPAR, with attribute spaces of size 10, 25, 50, and 100.]
© 2003 Warren B. Powell Slide 41
Benders decomposition
Percent from optimal, 100 iterations
[Figure: bar chart of percent over optimal (0 to 45) for SD, L-shaped, and CUPPS (variations on Benders decomposition) and for SPAR.]
Increasing problem size makes the solution much worse.
With SPAR, the solution gets better.
© 2003 Warren B. Powell Slide 42
Multidimensional attribute spaces
Resource attribute:
$a$ = "State" that the trucker is currently in
[Figure: a decision $d$ applied to a resource with attribute $a$.]
© 2003 Warren B. Powell Slide 43
Multidimensional attribute spaces
[Figure: value estimates $v^0_{TX} = 0$, $v^0_{NY} = 0$, and $v^1_{TX} = 450$. Label: \$450.]
© 2003 Warren B. Powell Slide 44
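Multidimensional attribute spaces
[Figure: map showing TX, NY, PA, and the NE region. $v_{PA} = ?$ Candidate estimates: $v_{PA} = v_{NY}$, or $v_{PA} = v_{NE}$ using the NE region value $v_{NE}$.]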
© 2003 Warren B. Powell Slide 45
Hierarchical Aggregation
We can use a family of aggregation functions:
$G^0$: Driver domicile, Sleeper type, Capacity type, Current location, DOT hours (nearest hour)
$G^1$: Driver domicile, Current location, DOT hours (nearest 4 hours)
$G^2$: Driver domicile, Current location
$G^3$: Driver domicile, Current region
[Figure labels on the attribute vector: $a_{Driver}$, $a_{Truck}$.]
We can use different levels of aggregation to capture the value of an asset:
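The family of aggregation functions can be sketched as a function that maps a full attribute vector to a coarser key at each level. The field names, the sample attribute values, and the location-to-region map below are hypothetical; only the list of attributes kept at each level follows the slide.

```python
from typing import Any, Dict, Tuple

# Hypothetical location-to-region map, for illustration only.
REGION = {"PA": "NE", "NY": "NE", "TX": "SW"}


def aggregate(a: Dict[str, Any], level: int) -> Tuple:
    """Map an attribute vector to its key at aggregation level 0 (finest) to 3 (coarsest)."""
    hours = a["dot_hours"]
    if level == 0:
        # DOT hours rounded to the nearest hour (hours assumed nonnegative).
        return (a["domicile"], a["sleeper"], a["capacity"],
                a["location"], int(hours + 0.5))
    if level == 1:
        # DOT hours rounded to the nearest 4 hours.
        return (a["domicile"], a["location"], 4 * int((hours + 2) // 4))
    if level == 2:
        return (a["domicile"], a["location"])
    if level == 3:
        return (a["domicile"], REGION[a["location"]])
    raise ValueError(f"unknown aggregation level: {level}")


driver = {"domicile": "PA", "sleeper": "single", "capacity": "53ft",
          "location": "NY", "dot_hours": 7.3}
for g in range(4):
    print(g, aggregate(driver, g))
```

Value estimates can then be stored in one dictionary per level, keyed by these tuples, so that a rarely visited fine-grained attribute still has a coarser estimate to fall back on.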
© 2003 Warren B. Powell Slide 47
Hierarchical aggregation
Alternative:
» Use multiple levels of aggregation at the same time:
$\bar{v}_a = \sum_g w^{(g)}_a \bar{v}^{(g)}_a$
$\bar{v}^{(g)}_a$ = estimate at the $g$th level of aggregation
$w^{(g)}_a$ = weight on the $g$th level of aggregation
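A minimal sketch of the weighted combination across aggregation levels. The weighting rule shown, weights proportional to the inverse variance of each level's estimate, is an assumption made for illustration; it ignores the bias that aggregation introduces, and the slides do not specify the weights this way. All numbers are hypothetical.

```python
from typing import List


def inverse_variance_weights(variances: List[float]) -> List[float]:
    """Weights proportional to 1/variance, normalized to sum to one (assumed rule)."""
    inverse = [1.0 / s for s in variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(estimates: List[float], weights: List[float]) -> float:
    """Weighted sum of the estimates from each aggregation level."""
    return sum(w * v for w, v in zip(weights, estimates))


# Hypothetical estimates of one attribute's value at three levels,
# from most disaggregate (noisy) to most aggregate (stable).
estimates = [500.0, 420.0, 400.0]
variances = [400.0, 100.0, 50.0]

weights = inverse_variance_weights(variances)
print(weights)
print(combine(estimates, weights))
```

Early in the algorithm the disaggregate estimates rest on few observations, so a rule of this kind puts most of the weight on the aggregate levels and shifts it as observations accumulate.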
© 2003 Warren B. Powell Slide 48
Hierarchical aggregation
[Figure: plot of $f(x)$ against $x$.]
© 2003 Warren B. Powell Slide 49
Hierarchical aggregation
[Figure: plots of $f(x)$ against $x$ for three cases: high structure, moderate structure, and zero structure.]
© 2003 Warren B. Powell Slide 50
Hierarchical aggregation
[Figure: weight on the disaggregate level, comparing Bayesian weights with optimal weights.]
© 2003 Warren B. Powell Slide 51
© 2003 Warren B. Powell Slide 52
Hierarchical aggregation
[Figure: objective function (1,400,000 to 1,900,000) against iteration (0 to 1000) for three value estimates: aggregate, disaggregate, and weighted combination.]
© 2003 Warren B. Powell Slide 53
Hierarchical aggregation
Optimal weights change as the algorithm progresses:
[Figure: weights (0 to 0.35) against iterations (0 to 1200) for aggregation levels 1 to 7, with annotations marking the weight on the most disaggregate level and the weight on the most aggregate levels.]
© 2003 Warren B. Powell Slide 54
Conclusions
» Hierarchical aggregation offers a powerful mechanism for handling high dimensional, arbitrary attribute spaces
» Combined with the use of separable approximations for handling large numbers of assets, we have a powerful approach for large-scale resource allocation problems.
© 2003 Warren B. Powell Slide 55
Research questions
Algorithmic questions:
» Stepsizes and rate of convergence
[Figure: two convergence plots with vertical axes from 1,500,000 to 2,000,000, one over iterations 0 to 1000 and one over iterations 0 to 100.]
We need to improve our understanding of adaptive stepsizes.
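For context, the sketch below compares three standard deterministic stepsize rules on a short sequence of observations. These are baselines, not the adaptive rules the slide calls for, and the parameter values and observations are hypothetical.

```python
from typing import Callable, List


def constant(n: int, alpha: float = 0.1) -> float:
    """Constant stepsize: tracks change quickly but never averages out noise."""
    return alpha


def one_over_n(n: int) -> float:
    """1/n stepsize: the smoothed estimate equals the sample mean."""
    return 1.0 / n


def harmonic(n: int, a: float = 10.0) -> float:
    """Generalized harmonic stepsize a/(a+n-1): declines more slowly than 1/n."""
    return a / (a + n - 1)


def smooth(observations: List[float], rule: Callable[[int], float]) -> float:
    """Apply estimate <- (1 - alpha_n) * estimate + alpha_n * observation."""
    estimate = 0.0
    for n, obs in enumerate(observations, start=1):
        alpha = rule(n)
        estimate = (1.0 - alpha) * estimate + alpha * obs
    return estimate


observations = [10.0, 20.0, 30.0, 40.0]
for rule in (constant, one_over_n, harmonic):
    print(rule.__name__, smooth(observations, rule))
```

When the observations trend upward, as they do while value estimates are still being learned, the 1/n rule lags behind and the harmonic rule stays closer to the most recent values.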
© 2003 Warren B. Powell Slide 57
Research questions in electric power
Application to electric power:
» Fuel optimization (continuous assets):
• What fuel to purchase when we can switch between fuels
• Design of fuel contracts
• Determining prices of forward contracts
• How much and where to store fuel
» Asset management problems (discrete assets):
• Unit commitment problems
– Control of hydro units
• Positioning of assets for emergency response
– Special equipment
– People with specialized training
© 2003 Warren B. Powell Slide 58
Research questions in electric power
Special equipment