© 2003 warren b. powell slide 1 approximate dynamic programming for high dimensional resource...

© 2003 Warren B. Powell Slide 1

Approximate Dynamic Programming for High Dimensional Resource Allocation

NSF Electric Power workshop

November 3, 2003

Warren Powell

CASTLE Laboratory

Princeton University

http://www.castlelab.princeton.edu

© 2003 Warren B. Powell, Princeton University


Schneider National


Air Mobility Command

[Figure: Air Mobility Command, with labels for fuel, cargo handling, ramp space, maintenance, and cargo holding.]


The optimization challenge

[Figure: network diagram with five repeated groups of nodes numbered 1 to 6, each node labeled with an attribute a; one label reads "Special equipment".]


State variables

Modeling the military airlift problem:

» State variables:

$a$ = the attributes of the aircraft

$\mathcal{A}$ = the attribute space

$R_{ta}$ = 1 if the aircraft has attribute $a$

$R_t = (R_{ta})_{a \in \mathcal{A}}$ = the resource state vector

$\mathcal{R}$ = the resource state space

» Control variables:

$x_t$ = vector representing what we can do with the aircraft


State variables

We can formulate the problem of determining what to do with our aircraft as a dynamic program:

$$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \}$$

So just how big is our state space $\mathcal{R}$?


State variables

If we only have N=1 aircraft:

$|\mathcal{R}| = |\mathcal{A}|$ = the number of potential attributes an aircraft may have.

If the attribute vector has one dimension:

$a_{\text{Aircraft}}$ = (Location): $|\mathcal{A}|$ = 100 - 1000 locations

The attribute space grows with the number of dimensions:

$a_{\text{Aircraft}}$ = (Location, Aircraft type): 500 locations, 10 aircraft types, $|\mathcal{A}|$ = 5000

$a_{\text{Aircraft}}$ = (Location, Aircraft type, Loaded/empty): 500 locations, 10 aircraft types, 2 states, $|\mathcal{A}|$ = 10,000


State variables

What if we have N>1 aircraft?

$$|\mathcal{R}| = \binom{N + |\mathcal{A}| - 1}{|\mathcal{A}| - 1}$$

Number of resources   Attribute space   State space
1                     1                 1
1                     100               100
1                     1000              1,000
5                     10                2,002
5                     100               91,962,520
5                     1000              8,416,958,750,200
50                    10                12,565,671,261
50                    100               13,419,107,273,154,600,000,000,000,000,000,000,000,000
50                    1000              109,740,941,767,311,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
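The state-space counts can be reproduced with a few lines of Python. This is a minimal sketch, assuming the count is the number of ways to spread N identical resources over $|\mathcal{A}|$ attributes (the binomial coefficient $\binom{N + |\mathcal{A}| - 1}{|\mathcal{A}| - 1}$); the function name `state_space_size` is hypothetical.

```python
from math import comb


def state_space_size(n_resources: int, n_attributes: int) -> int:
    """Count the ways to distribute identical resources over attributes.

    Assumes resources are indistinguishable, so the count is the
    "stars and bars" binomial coefficient C(N + |A| - 1, |A| - 1).
    """
    return comb(n_resources + n_attributes - 1, n_attributes - 1)


if __name__ == "__main__":
    # Print the count for a few (resources, attributes) pairs.
    for n in (1, 5, 50):
        for a in (10, 100, 1000):
            size = state_space_size(n, a)
            print(f"N={n:>2}  |A|={a:>4}  |R|={size:.3e}")
```

Because Python integers are arbitrary precision, the function returns exact values even for the largest cases; the very large entries on the slide appear to be rounded.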


State variables

[Figure: surface plot of the number of states on a log scale (number of zeroes in the size of the state space) against the number of resources and the number of attributes.]


Outline

An algorithmic strategy for high-dimensional asset allocation problems


Approximate dynamic programming

Systems evolve through a cycle of exogenous and endogenous information

[Figure: timeline in which exogenous information $\hat{R}_1, \dots, \hat{R}_6$ alternates with decisions $x_0, x_1, \dots, x_6$.]



Systems evolve through a cycle of exogenous and endogenous information

[Figure: timeline of exogenous information $\hat{R}_1, \dots, \hat{R}_6$, decisions $x_0, \dots, x_6$, and resource states $R_0, \dots, R_6$.]



Using this state variable, we obtain the optimality equations:

$$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \}$$

Problem: Curse of dimensionality

Three curses:

» State space

» Outcome space

» Action space (feasible region)



The computational challenge:

$$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \}$$

How do we find $V_{t+1}(R_{t+1})$?

How do we compute the expectation?

How do we find the optimal solution?



Approximation methodology:

We start with:

$$V_t(R_t) = \max_{x_t \in \mathcal{X}_t} c_t(R_t, x_t) + E\{ V_{t+1}(R_{t+1}) \mid R_t \}$$

Can’t compute this!!!

We solve this for a sample realization:

$$V_t(R_t, \omega) = \max_{x_t \in \mathcal{X}_t} c_t(R_t, x_t) + V_{t+1}(R_{t+1}(\omega))$$

Now substitute in function approximations:

$$V_t(R_t, \omega) = \max_{x_t \in \mathcal{X}_t} c_t(R_t, x_t) + \hat{V}_{t+1}(R_{t+1}(\omega))$$

Don’t know what this is!

Seeing $\hat{R}_{t+1}$ is cheating!


Adaptive dynamic programming

Alternative: Change the definition of the state variable:

[Figure: timeline of exogenous information $\hat{R}_1, \dots, \hat{R}_6$, decisions $x_0, \dots, x_6$, and resource states $R_0, R_1, \dots$ (animation frames repeated).]



Now our optimality equation looks like:

$$V_{t-1}(R^x_{t-1}) = E\Big\{ \max_{x_t \in \mathcal{X}_t} c_t(R_t, x_t) + V_t(R^x_t(\omega, x_t)) \,\Big|\, R^x_{t-1} \Big\}$$

We drop the expectation and solve the conditional problem:

$$V_{t-1}(R^x_{t-1}, \hat{R}_t(\omega)) = \max_{x_t \in \mathcal{X}_t(\omega)} c_t(R^x_{t-1}, \hat{R}_t(\omega), x_t) + V_t(R^x_t(\omega, x_t))$$

Finally, we substitute in our approximation:

$$V_{t-1}(R^x_{t-1}, \hat{R}_t(\omega)) = \max_{x_t \in \mathcal{X}_t(\omega)} c_t(R^x_{t-1}, \hat{R}_t(\omega), x_t) + \hat{V}_t(R^x_t(\omega, x_t))$$



Approximating the value function:

» We choose approximations of the form:

Linear (in the resource state):

$$\hat{V}_t(R_t) = \sum_{a \in \mathcal{A}} \hat{v}_{ta} R_{ta}$$

Nonlinear, separable:

$$\hat{V}_t(R_t) = \sum_{a \in \mathcal{A}} \hat{V}_{ta}(R_{ta})$$



Multistage problems are typically solved as sequences of two-stage problems of the general form:

$$\max_{x_t \in \mathcal{X}} \; C_t(x_t) + E\, V_t(x_t)$$

The first term covers this period; the expectation covers the future.



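Our basic strategy:

$$x_t^n = \arg\max_{x_t \in \mathcal{X}_t} \; C_t(x_t) + \sum_{a \in \mathcal{A}} \hat{V}^{n-1}_{ta}(R_{ta}(x_t))$$

where

$R_t(x_t) = A_t x_t$ = Resource vector

[Figure: separable approximation $\hat{V}_{ta}(R_{ta}(x))$ plotted for resource levels 0 to 5.]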
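The arg max in the basic strategy becomes easy when the approximation is separable and concave. The following is a minimal sketch under strong simplifying assumptions that are not from the talk: a single resource type, a decision that only chooses how many resources to send to each attribute, a linear contribution per unit, and piecewise-linear concave functions given by decreasing slopes. The names `allocate`, `contribution`, and `slopes`, and the TX/NY numbers, are hypothetical.

```python
import heapq


def allocate(n_resources, contribution, slopes):
    """Greedy arg max of sum_a c[a]*R_a + sum_a V_a(R_a).

    slopes[a][k] is the marginal value of the (k+1)-th unit at attribute a.
    The slopes are assumed to be decreasing (concave V_a), which is what
    makes the greedy choice optimal under a single budget constraint.
    """
    alloc = {a: 0 for a in slopes}
    # Max-heap (via negated gains) of the next marginal gain per attribute.
    heap = [(-(contribution[a] + s[0]), a) for a, s in slopes.items() if s]
    heapq.heapify(heap)
    value = 0.0
    for _ in range(n_resources):
        if not heap:
            break  # every attribute is at capacity
        neg_gain, a = heapq.heappop(heap)
        value -= neg_gain
        alloc[a] += 1
        k = alloc[a]
        if k < len(slopes[a]):
            heapq.heappush(heap, (-(contribution[a] + slopes[a][k]), a))
    return alloc, value


if __name__ == "__main__":
    c = {"TX": 1.0, "NY": 0.5}                          # hypothetical contributions
    v = {"TX": [5.0, 3.0, 1.0], "NY": [4.0, 2.0, 0.5]}  # hypothetical slopes
    print(allocate(3, c, v))
```

In a real resource allocation model the feasible region is a network or linear program, so the greedy step would be replaced by a solver; the sketch only shows why separability keeps the per-period problem tractable.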


Research questions in electric power

Special equipment



Two-stage resource allocation under uncertainty


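Approximate dynamic programming

We estimate the functions by sampling from our distributions.

[Figure: network with resources $R^n_1, \dots, R^n_5$, sampled demands $D^n_1(\omega), D^n_2(\omega), D^n_3(\omega), \dots, D^n_C(\omega)$, and the resulting marginal values $v^n_1(\omega), \dots, v^n_5(\omega)$.]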
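Estimating a value function from sampled marginal values can be illustrated on a toy problem. This is a sketch under assumptions that are not from the talk: one resource type, a reward of `price` per unit of demand served, and demand drawn uniformly from 0 to `max_units`. It uses plain sample averaging with a 1/n stepsize; it is not the SPAR algorithm named elsewhere in these slides. All names are hypothetical.

```python
import random


def learn_slopes(max_units, price, n_iters, seed=0):
    """Estimate the marginal value of the k-th resource unit by sampling.

    For a sampled demand D, the k-th unit earns `price` if D >= k and
    nothing otherwise. Each sample is smoothed into the running estimate
    with stepsize 1/n, which reproduces the sample mean.
    """
    rng = random.Random(seed)
    slopes = [0.0] * max_units
    for n in range(1, n_iters + 1):
        demand = rng.randint(0, max_units)  # hypothetical demand distribution
        alpha = 1.0 / n
        for k in range(max_units):
            sample = price if demand >= k + 1 else 0.0
            slopes[k] = (1 - alpha) * slopes[k] + alpha * sample
    return slopes


if __name__ == "__main__":
    for k, s in enumerate(learn_slopes(5, 10.0, 1000), start=1):
        print(f"unit {k}: estimated marginal value {s:.2f}")
```

Every unit is updated from the same sample at every iteration, so the estimated slopes stay non-increasing and the implied value function stays concave.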


A dynamic network:





Stepping through time:



Iterative learning:


Nonlinear approximations

[Figure: the exact function f(s) = ln(1+s) for s from 0 to 10 compared with approximations after 1, 2, 5, 10, 15, and 20 iterations. Axes: variable value s; functional value f(s). A second set of axis labels reads: number of resources; approximate value function.]


Competing algorithmic strategies

Competing optimal algorithms:

» Discrete dynamic programming
• Cannot handle even small problems
• Numerical comparisons are meaningless

» Stochastic programming
• Benders decomposition is optimal for this problem class

[Equation: Benders master problem, $x_t^n = \arg\max \; c_t x_t + z_{t+1}$, subject to $x_t \in \mathcal{X}_t$ and cut constraints on $z_{t+1}$ that hold for all sampled outcomes.]


Benders decomposition

[Figure: results by number of iterations (25, 50, 100, 250, 500, 1000, 2500, 5000) on a vertical scale from 0 to 120,000. Series: variations on Benders decomposition, SPAR algorithm, deterministic approximation.]


Conclusions:

» Using sequences of separable, nonlinear approximations conquers the explosive growth with the number of resources.

» We are now solving problems with thousands of resources.

» But what about the attribute space?
• Complex equipment and people are typically described by vectors of attributes.
• We require multidimensional attributes to capture complex assets such as equipment and people.
• The size of the attribute space grows exponentially in the number of dimensions.


Benders decomposition

[Figure: percent over optimal after 100 iterations (scale 0 to 45) for SD, L-shaped, and CUPPS (variations on Benders decomposition) and for SPAR. Attribute space = 10.]





Benders decomposition

[Figure: percent over optimal after 100 iterations (scale 0 to 45) for SD, L-shaped, and CUPPS (variations on Benders decomposition) and for SPAR.]

Increasing problem size makes the solution much worse.

With SPAR, the solution gets better.


Multidimensional attribute spaces

Resource attribute:

$a$ = "State" that the trucker is currently in

decision d

da



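[Figure: map of TX, NY, PA, and the NE region, with value estimates $v^0_{TX} = 0$, $v^0_{NY} = 0$, and $v^1_{TX} = 450$ after a $450 move; the unknown $v_{PA}$ is related to $v_{NY}$ and to the regional value $v_{NE}$.]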



Hierarchical Aggregation

We can use a family of aggregation functions:

Level   Attributes retained
$G^0$   Driver domicile, sleeper type, capacity type, current location, DOT hours (nearest hour)
$G^1$   Driver domicile, current location, DOT hours (nearest 4 hours)
$G^2$   Driver domicile, current location
$G^3$   Driver domicile, current region

(The figure labels the attribute vectors $a_{\text{Driver}}$ and $a_{\text{Truck}}$.)

We can use different levels of aggregation to capture the value of an asset:


Hierarchical aggregation

Alternative:» Use multiple levels of aggregation at the same time

$$v_a = \sum_g w_a^{(g)} \, v_a^{(g)}$$

$v_a^{(g)}$ = estimate at the gth level of aggregation

$w_a^{(g)}$ = weight on the gth level of aggregation
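A weighted combination of estimates from several aggregation levels can be sketched as follows. The slides do not state a weight formula here, so this example assumes one simple choice, weights proportional to the inverse of each level's variance. That choice ignores the bias that aggregation introduces, which a more careful scheme would have to account for. The function name and the numbers are hypothetical.

```python
def combine_levels(estimates, variances):
    """Combine per-level estimates using inverse-variance weights.

    estimates[g] and variances[g] describe the value estimate at
    aggregation level g. The weights are normalized to sum to one.
    This weighting rule is an assumption made for illustration.
    """
    inverse = [1.0 / var for var in variances]
    total = sum(inverse)
    weights = [inv / total for inv in inverse]
    combined = sum(w * est for w, est in zip(weights, estimates))
    return weights, combined


if __name__ == "__main__":
    # Hypothetical estimates at three aggregation levels.
    weights, value = combine_levels([400.0, 450.0, 500.0], [100.0, 25.0, 100.0])
    print(weights, value)
```

A level with few observations has a high variance and therefore a small weight, so the combined estimate leans on aggregate levels early and can shift toward disaggregate levels as data accumulates.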


[Figure: plots of f(x) against x; panel labels: high structure, moderate structure, zero structure.]



[Figure: weight on the disaggregate level under Bayesian weights and under optimal weights.]




[Figure: objective function (1,400,000 to 1,900,000) against iteration (0 to 1000) for the aggregate, disaggregate, and weighted combination estimates.]



Optimal weights change as the algorithm progresses:

[Figure: weights (0 to 0.35) against iterations (0 to 1200) for aggregation levels 1 to 7, with the weight on the most disaggregate level and the weights on the most aggregate levels marked.]


Conclusions

» Hierarchical aggregation offers a powerful mechanism for handling high dimensional, arbitrary attribute spaces

» Combined with the use of separable approximations for handling large numbers of assets, we have a powerful approach for large-scale resource allocation problems.


Research questions

Algorithmic questions:

» Stepsizes and rate of convergence

[Figure: two plots on a vertical scale from 1,500,000 to 2,000,000, one over 1000 iterations and one over 100 iterations.]

We need to improve our understanding of adaptive stepsizes.
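The role of the stepsize can be seen in a small smoothing experiment. This is a sketch, not a method from the talk: it compares two standard deterministic rules, a generalized harmonic stepsize a/(a+n-1) and a constant stepsize, and does not implement any adaptive rule. All function names are hypothetical.

```python
def smooth(observations, stepsize):
    """Smooth a sequence: est_n = (1 - alpha_n) * est_{n-1} + alpha_n * obs_n."""
    estimate = 0.0
    for n, obs in enumerate(observations, start=1):
        alpha = stepsize(n)
        estimate = (1 - alpha) * estimate + alpha * obs
    return estimate


def harmonic(a=1.0):
    """Generalized harmonic rule a / (a + n - 1); a = 1 gives the sample mean."""
    return lambda n: a / (a + n - 1)


def constant(alpha=0.1):
    """Constant rule: never stops responding, so it never fully converges."""
    return lambda n: alpha


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]
    print(smooth(data, harmonic(1.0)))   # sample mean of the data
    print(smooth(data, constant(0.1)))   # still biased toward the initial 0.0
```

The harmonic rule with a = 1 weights all observations equally, which is slow to track a value that keeps changing as the approximation improves; a constant rule tracks changes but keeps the noise. That trade-off is what adaptive stepsizes try to manage.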



Application to electric power:

» Fuel optimization (continuous assets):
• What fuel to purchase when we can switch between fuels
• Design of fuel contracts
• Determining prices of forward contracts
• How much and where to store fuel

» Asset management problems (discrete assets):
• Unit commitment problems
– Control of hydro units
• Positioning of assets for emergency response
– Special equipment
– People with specialized training



Special equipment
