tutorial: a unified framework for optimization under uncertainty...
TRANSCRIPT
![Page 1: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/1.jpg)
Tutorial: A Unified Framework for
Optimization under Uncertainty
Enterprisewide Optimization GroupCarnegie Mellon University
September 11, 2018
Warren B. Powell
Princeton UniversityDepartment of Operations Research
and Financial Engineering
© 2018 Warren B. Powell, Princeton University
![Page 2: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/2.jpg)
Ad-click bidding
Roomsage.com
![Page 3: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/3.jpg)
Revenue management
Earning vs. learning
» You want to maximize
revenues, but you do not
know how demand
responds to price.
You earn the most with
prices near the middle, but
you do not learn anything.
You learn the most by sampling
endpoints, but then you do not
earn anything.
![Page 4: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/4.jpg)
Learning problems
Health sciences
» Sequential design of
experiments for drug discovery
» Drug delivery – Optimizing the
design of protective
membranes to control drug
release
» Medical decision making –
Optimal learning for medical
treatments.
![Page 5: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/5.jpg)
Drug discovery
Designing molecules
» X and Y are sites where we can hang substituents to change the
behavior of the molecule. We approximate the performance using
a linear belief model:
0
ij ij
sites i substituents j
Y X
» How to sequence experiments to
learn the best molecule as quickly
as possible?
Reg
ret
![Page 6: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/6.jpg)
Ride sharing
Uber/Lyft
» Provides real-time, on-demand
transportation.
» Drivers are encouraged to enter or leave
the system using pricing signals and
informational guidance.
Decisions:
» How to price to get the right balance of
drivers relative to customers.
» Real-time management of drivers.
» Policies (rules for managing drivers,
customers, …)
![Page 7: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/7.jpg)
Matching buyers with sellers
Now we have a logistic curve for
each origin-destination pair (i,j)
Number of offers for each (i,j) pair
is relatively small.
Need to generalize the learning
across hundreds to thousands of
markets.
0
0( , | )
1
aij ij ij
aij ij ij
p a
Y
p a
eP p a
e
Buyer Seller
Offered price
![Page 8: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/8.jpg)
Emergency storm response
Hurricane Sandy
» Once in 100 years?
» Rare convergence of events
» But, meteorologists did an
amazing job of forecasting
the storm.
The power grid
» Loss of power creates
cascading failures (lack of
fuel, inability to pump water)
» How to plan?
» How to react?
![Page 9: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/9.jpg)
Meeting variability with portfolios of generation
with mixtures of dispatchability
![Page 10: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/10.jpg)
Storage applications
How much energy to store in a battery to handle the
volatility of wind and spot prices to meet demands?
![Page 11: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/11.jpg)
Modeling
Before we can solve complex problems, we have
to know how to think about them.
The biggest challenge when making decisions
under uncertainty is modeling.
Min E {cx}Ax = b
x > 0
Mathematician
Software
Organize class
libraries, and set up
communications and
databases
![Page 12: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/12.jpg)
Modeling
For deterministic problems, we speak the language
of mathematical programming
» Linear programming:
» For time-staged problems
min x cx
0
Ax b
x
1 1
0
t t t t t
t t t
t
A x B x b
D x u
x
0 ,...,
0
minT
T
x x t t
t
c x
Arguably Dantzig’s biggest
contribution, more so than the
simplex algorithm, was his
articulation of optimization
problems in a standard format,
which has given algorithmic
researchers a common
language.
![Page 13: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/13.jpg)
Stochastic
programming
Markov
decision
processes
Reinforcement
learning
Optimal
control
Model
predictive
control
Robust
optimization
Approximate
dynamic
programming
Online
computation
Simulation
optimization
Stochastic
search
Decision
analysis
Stochastic
control
Simulation
optimizationDynamic
Programming
and
control
Optimal
learning
Bandit
problems
![Page 14: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/14.jpg)
Stochastic
programming
Markov
decision
processes
Reinforcement
learning
Optimal
control
Model
predictive
control
Robust
optimization
Approximate
dynamic
programming
Online
computation
Simulation
optimization
Stochastic
search
Decision
analysis
Stochastic
control
Simulation
optimizationDynamic
Programming
and
control
Optimal
learning
Bandit
problems
![Page 15: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/15.jpg)
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
![Page 16: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/16.jpg)
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
![Page 17: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/17.jpg)
Modeling dynamic systems
All sequential decision problems can be modeled
using five core components:
» State variables• What do we need to know at time t?
» Decision variables• What are our decisions?
» Exogenous information• What do we learn for the first time between t and t+1?
» Transition function• How do the state variables evolve over time?.
» Objective function• What are our performance metrics?
![Page 18: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/18.jpg)
Modeling dynamic problems
The state variable:
Controls community
"Information state"
Operations research/MDP/Computer science
, , System state, where:
Resource state (physical state)
Loca
t
t t t t
t
x
S R I B
R
tion/status of truck/train/plane
Energy in storage
Information state
Prices
Weather
Belief state ("state of knowl
t
t
I
B
edge")
Belief about performance of a drug or catalyst
Belief about the status of equipment
![Page 19: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/19.jpg)
The state variable
My definition of a state variable:
» The first depends on a policy. The second depends
only on the problem (and includes the constraints).
» Using either definition, all properly modeled problems
are Markovian!
![Page 20: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/20.jpg)
Modeling dynamic problems
Decisions:
Markov decision processes/Computer science
Discrete action
Control theory
Low-dimensional continuous vector
Operations research
Usually a discrete or continuous but high-dimensional
t
t
t
a
u
x
vector of decisions.
At this point, we do not specify to make a decision.
Instead, we define the function ( ) (or ( ) or ( )),
where specifies the type of policy. " " carries information
about the type of functi
how
X s A s U s
.on , and any tunable parameters ff
![Page 21: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/21.jpg)
The decision variables
Styles of decisions
» Binary
» Finite
» Continuous scalar
» Continuous vector
» Discrete vector
» Categorical
0,1x X
1,2,...,x X M
,x X a b
1( ,..., ), K kx x x x R
1( ,..., ), K kx x x x Z
1( ,..., ), is a category (e.g. red/green/blue)I ix a a a
![Page 22: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/22.jpg)
Modeling dynamic problems
Exogenous information:
New information that first became known at time
ˆ ˆ ˆˆ = , , ,
ˆ Equipment failures, delays, new arrivals
New drivers being hired to the network
ˆ New customer demands
t
t t t t
t
t
W t
R D p E
R
D
ˆ Changes in prices
ˆ Information about the environment (temperature, ...)
t
t
p
E
Note: Any variable indexed by t is known at time t. This convention,
which is not standard in control theory, dramatically simplifies the
modeling of information.
1 2Below, we let represent a sequence of actual observations , ,....
refers to a sample realization of the random variable .t t
W W
W W
![Page 23: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/23.jpg)
Modeling dynamic problems
The transition function
1 1
1 1
1 1
1 1
( , , )
ˆ Inventories
ˆ Spot prices
ˆ Market demands
M
t t t t
t t t t
t t t
t t t
S S S x W
R R x R
p p p
D D D
Also known as the:
“System model”
“State transition model”
“Plant model”
“Plant equation”
“Transition law”
“State equation”
“Transfer function”
“Transformation function”
“Law of motion”
“Model”
For many applications, these equations are unknown. This
is known as “model-free” dynamic programming.
![Page 24: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/24.jpg)
Objective functions
» Cumulative reward (“online learning”)
• Policies have to work well over time.
» Final reward (“offline learning”)
• We only care about how well the final decision 𝑥𝜋,𝑁 works.
» Risk
1 0
0
max , ( ), |T
t t t t t
t
C S X S W S
E
Modeling stochastic, dynamic problems
,
0ˆmax ( , ) |NF x W S
E
0 0 0 1 1 1 0max ( , ( )), ( , ( )),..., ( , ( )) |T T TC S X S C S X S C S X S S
![Page 25: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/25.jpg)
The complete model:
» Objective function
• Cumulative reward (“online learning”)
• Final reward (“offline learning”)
• Risk:
» Transition function:
» Exogenous information:
1 0
0
max , ( ), |T
t t t t t
t
C S X S W S
E
Modeling stochastic, dynamic problems
1 1, , ( )M
t t t tS S S x W
0 1 2, , ,..., TS W W W
,
0ˆmax ( , ) |NF x W S
E
0 0 0 1 1 1 0max ( , ( )), ( , ( )),..., ( , ( )) |T T TC S X S C S X S C S X S S
![Page 26: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/26.jpg)
The modeling process
Modeling real applications
» I conduct a conversation with a domain expert to fill in
the elements of a problem:
State
variables
Decision
variables
New
information
Transition
function
Objective
function
What we need to know
(and only what we need)
What we control
What we didn’t know
when we made our decision
How the state variables evolve
Performance metrics
![Page 27: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/27.jpg)
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
![Page 28: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/28.jpg)
Modeling uncertainty
Classes of uncertainty
» Observational uncertainty
» Prognostic uncertainty (forecasting)
» Experimental noise/variability
» Transitional uncertainty
» Inferential uncertainty
» Model uncertainty
» Systematic exogenous uncertainty
» Control/implementation uncertainty
» Algorithmic noise
» Goal uncertainty
Modeling uncertainty in the context of stochastic optimization is a
relatively untapped area of research.
![Page 29: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/29.jpg)
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
![Page 30: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/30.jpg)
Designing policies
We have to start by describing what we mean by a
policy.
» Definition:
A policy is a mapping from a state to an action.
… any mapping.
How do we search over an arbitrary space of
policies?
![Page 31: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/31.jpg)
Designing policies
“Policies” and the English language
Behavior Habit Procedure
Belief Laws/bylaws Process
Bias Manner Protocols
Commandment Method Recipe
Conduct Mode Ritual
Convention Mores Rule
Culture Patterns Style
Customs Plans Technique
Dogma Policies Tenet
Etiquette Practice Tradition
Fashion Prejudice Way of life
Formula Principle
![Page 32: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/32.jpg)
Designing policies
Two fundamental strategies for finding policies:
1) Policy search – Search over a class of functions for
making decisions to optimize some metric.
2) Lookahead approximations – Approximate the impact
of a decision now on the future.
0( , )0
max , ( | ) |f f
T
t t t tf Ft
E C S X S S
*
' ' ' ' 1
' 1
( ) arg max ( , ) max ( , ( )) | | ,t
T
t t x t t t t t t t t t
t t
X S C S x C S X S S S x
E E
![Page 33: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/33.jpg)
Designing policies
Policy search:
1a) Policy function approximations (PFAs)• Lookup tables
– “when in this state, take this action”
• Parametric functions
– Order-up-to policies: if inventory is less than s, order up to S.
– Affine policies -
– Neural networks
• Locally/semi/non parametric
– Requires optimizing over local regions
1b) Cost function approximations (CFAs)• Optimizing a deterministic model modified to handle uncertainty
(buffer stocks, schedule slack)
( )( | ) arg max ( , | )
t t
CFA
t t txX S C S x
X
( | )PFA
t tx X S
( | ) ( )PFA
t t f f t
f F
x X S S
![Page 34: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/34.jpg)
Designing policies
Lookahead approximations – Approximate the impact of a
decision now on the future:
2a) Approximating the value of being in a state (VFA):
2b) Direct lookahead (DLA)
Optimal policy:
Approximate policy – solve an approximate lookahead model:
*
1 1
1 1
( ) arg max ( , ) ( ) | ,
( ) arg max ( , ) ( ) | ,
arg max ( , ) ( )
t
t
t
t t x t t t t t t
VFA
t t x t t t t t t
x x
x t t t t
X S C S x V S S x
X S C S x V S S x
C S x V S
E
E
' ' ' , 1
' 1
( ) arg max ( , ) max ( , ( )) | | ,t
t HDLA
t t x t t tt t tt t t tt t
t t
X S C S x C S X S S S x
E E
*
' ' ' 1
' 1
( ) arg max ( , ) max ( , ( )) | | ,
t
T
t t x t t t t t t t t
t t
X S C S x C S X S S S xE E
![Page 35: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/35.jpg)
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximation (CFAs)
»
3) Policies based on value function approximations (VFAs)
»
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control
» Chance constrained programming
» Stochastic lookahead /stochastic prog/Monte Carlo tree search
» “Robust optimization”
Four (meta)classes of policies
( )( | ) arg max ( , | )
t t
CFA
t t txX S C S x
X
( ) arg max ( , ) ( , )t
VFA x x
t t x t t t t t tX S C S x V S S x
' '
' 1
( ) arg max ( , ) ( ) ( ( ), ( ))t
TLA S
t t tt tt tt tt
t t
X S C S x p C S x
xtt,xt ,t+1
,...,xt ,t+T
,
' '( ),...,
' 1
( ) arg max min ( , ) ( ( ), ( ))ttt t t H
TLA RO
t t tt tt tt ttw Wx x
t t
X S C S x C S w x w
[ ( )] 1t tP A x f W
Poli
cy s
earc
h
,
' ',...,
' 1
( ) arg max ( , ) ( , )
tt t t H
TLA D
t t tt tt tt ttx x
t t
X S C S x C S x
Look
ahea
dap
pro
xim
atio
ns
![Page 36: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/36.jpg)
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximation (CFAs)
»
3) Policies based on value function approximations (VFAs)
»
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control
» Chance constrained programming
» Stochastic lookahead /stochastic prog/Monte Carlo tree search
» “Robust optimization”
Four (meta)classes of policies
( )( | ) arg max ( , | )
t t
CFA
t t txX S C S x
X
,
' ',...,
' 1
( ) arg max ( , ) ( , )
tt t t H
TLA D
t t tt tt tt ttx x
t t
X S C S x C S x
( ) arg max ( , ) ( , )t
VFA x x
t t x t t t t t tX S C S x V S S x
' '
' 1
( ) arg max ( , ) ( ) ( ( ), ( ))t
TLA S
t t tt tt tt tt
t t
X S C S x p C S x
xtt,xt ,t+1
,...,xt ,t+T
,
' '( ),...,
' 1
( ) arg max min ( , ) ( ( ), ( ))ttt t t H
TLA RO
t t tt tt tt ttw Wx x
t t
X S C S x C S w x w
[ ( )] 1t tP A x f W
Funct
ion a
ppro
x.
![Page 37: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/37.jpg)
1) Policy function approximations (PFAs)
» Lookup tables, rules, parametric/nonparametric functions
2) Cost function approximation (CFAs)
»
3) Policies based on value function approximations (VFAs)
»
4) Direct lookahead policies (DLAs)
» Deterministic lookahead/rolling horizon proc./model predictive control
» Chance constrained programming
» Stochastic lookahead /stochastic prog/Monte Carlo tree search
» “Robust optimization”
Four (meta)classes of policies
( )( | ) arg max ( , | )
t t
CFA
t t txX S C S x
X
,
' ',...,
' 1
( ) arg max ( , ) ( , )
tt t t H
TLA D
t t tt tt tt ttx x
t t
X S C S x C S x
( ) arg max ( , ) ( , )t
VFA x x
t t x t t t t t tX S C S x V S S x
' '
' 1
( ) arg max ( , ) ( ) ( ( ), ( ))t
TLA S
t t tt tt tt tt
t t
X S C S x p C S x
xtt,xt ,t+1
,...,xt ,t+T
,
' '( ),...,
' 1
( ) arg max min ( , ) ( ( ), ( ))ttt t t H
TLA RO
t t tt tt tt ttw Wx x
t t
X S C S x C S w x w
[ ( )] 1t tP A x f W
Imb
edd
ed o
pti
miz
atio
n
![Page 38: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/38.jpg)
Learning problems
Classes of learning problems in stochastic
optimization
1) Approximating the objective
𝐹(𝑥|𝜃) ≈ 𝔼𝐹(𝑥,𝑊).
2) Designing a policy 𝑋𝜋 𝑆 𝜃 .
3) A value function approximation
𝑉𝑡(𝑆𝑡|𝜃) ≈ 𝑉𝑡(𝑆𝑡).
4) Designing a cost function approximation:• The objective function 𝐶𝜋 𝑆𝑡 , 𝑥𝑡|𝜃 .
• The constraints 𝑋𝜋(𝑆𝑡|𝜃)
5) Approximating the transition function
𝑆𝑀(𝑆𝑡 , 𝑥𝑡 ,𝑊𝑡+1|𝜃) ≈ 𝑆𝑀(𝑆𝑡 , 𝑥𝑡 ,𝑊𝑡+1)
![Page 39: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/39.jpg)
Approximation strategies
Approximation strategies» Lookup tables
• Independent beliefs
• Correlated beliefs
» Linear parametric models• Linear models
• Sparse-linear
• Tree regression
» Nonlinear parametric models• Logistic regression
• Neural networks
» Nonparametric models• Gaussian process regression
• Kernel regression
• Support vector machines
• Deep neural networks
![Page 40: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/40.jpg)
Learning challenges
The learning challenge
From big (batch) data… … to recursive learning
![Page 41: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/41.jpg)
Learning challenges
Variable-dimensional learning
![Page 42: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/42.jpg)
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
![Page 43: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/43.jpg)
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
![Page 44: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/44.jpg)
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
![Page 45: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/45.jpg)
Policy function approximations
Battery arbitrage – When to charge, when to
discharge, given volatile LMPs
![Page 46: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/46.jpg)
0.00
20.00
40.00
60.00
80.00
100.00
120.00
140.00
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71
Grid operators require that batteries bid charge and
discharge prices, an hour in advance.
We have to search for the best values for the policy
parameters
DischargeCharge
Charge Discharge and .
Policy function approximations
![Page 47: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/47.jpg)
Policy function approximations
Our policy function might be the parametric
model (this is nonlinear in the parameters):charge
charge discharge
charge
1 if
( | ) 0 if
1 if
t
t t
t
p
X S p
p
Energy in storage:
Price of electricity:
![Page 48: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/48.jpg)
Policy function approximations
Finding the best policy
» We need to maximize
» We cannot compute the expectation, so we run simulations:
DischargeCharge
0
max ( ) , ( | )T
t
t t t
t
F C S X S
E
![Page 49: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/49.jpg)
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
![Page 50: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/50.jpg)
Cost function approximations
Lookup table
» We can organize potential catalysts into groups
» Scientists using domain knowledge can estimate
correlations in experiments between similar catalysts.
![Page 51: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/51.jpg)
51
Cost function approximations
Correlated beliefs: Testing one material teaches us about other
materials
1 2 3 4 4 5
![Page 52: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/52.jpg)
Cost function approximations
Cost function approximations (CFA)
» Upper confidence bounding
» Interval estimation
» Boltzmann exploration (“soft max”)• Choose x with probability:
log( | ) arg max
UCB n UCB n UCB
x x n
x
nX S
N
( | ) arg maxIE n IE n IE n
x x xX S
0
n
xz
'
'
( )
nx
nx
n
x
x
eP
e
![Page 53: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/53.jpg)
53
Cost function approximations
Picking 𝜃𝐼𝐸 = 0 means we are evaluating each choice
at the mean.
1 2 3 4 4 5
![Page 54: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/54.jpg)
54
Cost function approximations
Picking 𝜃𝐼𝐸 = 2 means we are evaluating each choice
at the 95th percentile.
1 2 3 4 4 5
![Page 55: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/55.jpg)
Cost function approximations
Optimizing the policy
» We optimize 𝜃𝐼𝐸 to maximize:
where
Notes:
» This can handle any belief model,
including correlated beliefs, nonlinear
belief models.
» All we require is that we be able to
simulate a policy.
( | ) arg max ( , )n IE n IE n IE n n n n
x x x x xx X S S
,max ( ) ,IE
IE NF F x W
E
Reg
ret
![Page 56: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/56.jpg)
Cost function approximations
Other applications
» Airlines optimizing schedules with schedule slack to
handle weather uncertainty.
» Manufacturers using buffer stocks to hedge against
production delays and quality problems.
» Grid operators scheduling extra generation capacity in
case of outages.
» Adding time to a trip planned by Google maps to
account for uncertain congestion.
![Page 57: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/57.jpg)
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
![Page 58: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/58.jpg)
Value function approximations
Q-learning (for discrete actions)
» But what if the action a is a vector?
1
'
1
1 1
ˆ ( , ) ( , ) max ( ', ')
ˆ( , ) (1 ) ( , ) ( , )
n n n n n n
a
n n n n n n n n n
n n
q s a r s a Q s a
Q s a Q s a q s a
g
a a
-
-
- -
= +
= - +
![Page 59: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/59.jpg)
Blood management
Managing blood inventories
![Page 60: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/60.jpg)
Blood management
Managing blood inventories over time
t=0
0S
1 1ˆ ˆ,R D
1S
Week 1
1x
2 2ˆ ˆ,R D
2S
Week 2
2x
2
xS
3 3ˆ ˆ,R D
3S3x
Week 3
3
xS
t=1 t=2 t=3
Week 0
0x
![Page 61: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/61.jpg)
O-,1
O-,2
O-,3
AB+,2
AB+,3
O-,0
,ˆ
t ABD
AB+,0
AB+,1
AB+,2
O-,0
O-,1
O-,2
,( ,0)t ABR
,( ,1)t ABR
,( ,2)t ABR
,( ,0)t OR
,( ,1)t OR
,( ,2)t OR
,ˆ
t ABD
,ˆ
t AD
,ˆ
t ABD
,ˆ
t ABD
,ˆ
t ABD
,ˆ
t ABD
AB+
AB-
A+
A-
B+
B-
O+
O-
x
tR
AB+,0
AB+,1
,ˆ
t ABD
Satisfy a demand Hold
tS ˆ , t tR D
![Page 62: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/62.jpg)
AB+,0
AB+,1
AB+,2
tR
O-,0
O-,1
O-,2
x
tR
AB+,0
AB+,1
AB+,2
AB+,3
O-,0
O-,1
O-,2
O-,3
AB+,0
AB+,1
AB+,2
AB+,3
O-,0
O-,1
O-,2
O-,3
1,ˆ
t ABR
1tR
1,ˆ
t OR
ˆtD
,( ,0)t ABR
,( ,1)t ABR
,( ,2)t ABR
,( ,0)t OR
,( ,1)t OR
,( ,2)t OR
![Page 63: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/63.jpg)
AB+,0
AB+,1
AB+,2
tR x
tR
O-,0
O-,1
O-,2
AB+,0
AB+,1
AB+,2
AB+,3
O-,0
O-,1
O-,2
O-,3
ˆtD
,( ,0)t ABR
,( ,1)t ABR
,( ,2)t ABR
,( ,0)t OR
,( ,1)t OR
,( ,2)t OR
![Page 64: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/64.jpg)
( )tF R
AB+,0
AB+,1
AB+,2
tR x
tR
O-,0
O-,1
O-,2
AB+,0
AB+,1
AB+,2
AB+,3
O-,0
O-,1
O-,2
O-,3
ˆtD
,( ,0)t ABR
,( ,1)t ABR
,( ,2)t ABR
,( ,0)t OR
,( ,1)t OR
,( ,2)t OR
Solve this as a
linear program.
![Page 65: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/65.jpg)
( )tF R
AB+,0
AB+,1
AB+,2
tR x
tR
O-,0
O-,1
O-,2
AB+,0
AB+,1
AB+,2
AB+,3
O-,0
O-,1
O-,2
O-,3
ˆtD
Dual variables give
value additional
unit of blood..
Duals
,( ,0)t̂ AB
,( ,1)t̂ AB
,( ,2)t̂ AB
,( ,0)t̂ O
,( ,1)t̂ O
,( ,2)t̂ O
![Page 66: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/66.jpg)
Updating the value function approximation
Estimate the gradient at
,( ,2)
n
t ABR
,( ,2)ˆn
t AB
n
tR
( )tF R
![Page 67: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/67.jpg)
Updating the value function approximation
Update the value function at
,
1
x n
tR
1
1 1( )n x
t tV R
,
1
x n
tR
,( ,2)ˆn
t AB
( )tF R
,( ,2)
n
t ABR
![Page 68: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/68.jpg)
Updating the value function approximation
Update the value function at ,
1
x n
tR
,( ,2)ˆn
t AB
,
1
x n
tR
1
1 1( )n x
t tV R
![Page 69: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/69.jpg)
Updating the value function approximation
Update the value function at ,
1
x n
tR
,
1
x n
tR
1
1 1( )n x
t tV R
1 1( )n x
t tV R
![Page 70: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/70.jpg)
Exploiting concavity
Derivatives are used to estimate a piecewise linear approximation
( )t tV R
tR
![Page 71: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/71.jpg)
Iterative learning
t
![Page 72: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/72.jpg)
Iterative learning
![Page 73: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/73.jpg)
Iterative learning
![Page 74: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/74.jpg)
Iterative learning
![Page 75: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/75.jpg)
1200000
1300000
1400000
1500000
1600000
1700000
1800000
1900000
0 100 200 300 400 500 600 700 800 900 1000
Ob
jecti
ve fu
ncti
on
Iterations
Approximate dynamic programming
… a typical performance graph.
![Page 76: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/76.jpg)
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
![Page 77: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/77.jpg)
Lookahead policies
Planning your next chess move:
» You put your finger on the piece while you think about
moves into the future. This is a lookahead policy,
illustrated for a problem with discrete actions.
![Page 78: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/78.jpg)
![Page 79: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/79.jpg)
Lookahead policies
Decision trees:
![Page 80: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/80.jpg)
Lookahead policies
Modeling lookahead policies
» Lookahead policies solve a lookahead model, which is an
approximation of the future.
» It is important to understand the difference between the:
• Base model – this is the model we are trying to solve by finding
the best policy. This is usually some form of simulator.
• The lookahead model, which is our approximation of the future
to help us make better decisions now.
» The base model is typically a simulator, or it might be the
real world.
![Page 81: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/81.jpg)
XX
X
X
X
X
callcall
call
call
call
call
call
0.26 0.38 0.78 0.6 0.6
0
0
0
0.02
0.03
0.04
0.04
0.05
0.064
0.0640.064 0.064
0.502 0.502
0.502 0.502 0.502
0.603 0.603
0.670.670.05
0.05
0.5030.5
0.76
0.5030.503
0.540.54
0.503
0.5030.503
Emergency storm response
![Page 82: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/82.jpg)
XX
X
X
X
X
0.0 0.0 0.0 0.0 0.0
0
0
00.032
0.048
0.05
60.056
0.08
0.0604
0.06040.0604 0.0604
0.51 0.51
0.51 0.51 0
0.62 0.62
00.990.08
0.08
0.5030.5
0.76
0.5030.503
0.540.54
0.503
0.5030.503
Emergency storm response
callcall
call
call
call
call
call
![Page 83: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/83.jpg)
XX
X
X
X
X
0.0 0.0 0.0 0.0 0.0
0
0
00.0
0.0
0.0
0.0
0.06
0.060.06 0.06
0 0
0 0
0 0
000.0
0.0
0.5030.5
0.76
0.5030.503
0.540.54
0.503
0.5030.503
00.0
Emergency storm response
callcall
call
call
call
call
call
![Page 84: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/84.jpg)
Decision Outcome Decision Outcome Decision
![Page 85: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/85.jpg)
Lookahead policies
Monte Carlo tree search:
C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S.
Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games,
vol. 4, no. 1, pp. 1–49, March 2012.
![Page 86: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/86.jpg)
![Page 87: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/87.jpg)
Outline
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Direct lookahead policies (DLAs)
» A hybrid lookahead/CFA
![Page 88: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/88.jpg)
Parametric cost function approximation
An energy storage problem:
![Page 89: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/89.jpg)
Parametric cost function approximation
Forecasts evolve over time as new information arrives:
Actual
Rolling forecasts,
updated each
hour.
Forecast made at
midnight:
![Page 90: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/90.jpg)
Parametric cost function approximation
Benchmark policy – Deterministic lookahead
![Page 91: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/91.jpg)
Parametric cost function approximation
Parametric cost function approximations
» Replace the constraint
with:
» Lookup table modified forecasts (one adjustment term for
each time in the future):
» Exponential function for adjustments (just two parameters)
» Constant adjustment (one parameter)
't t
'
wr
ttx '
wd
ttx
![Page 92: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/92.jpg)
0fs = 10fs =
20fs = 30fs =
Lookup table
Constant parameter
Exponential function
𝜃
𝜃
𝜃
𝜃
![Page 93: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/93.jpg)
Parametric cost function approximation
Improvement over deterministic benchmark:
Lookup tableExponential
Constant
![Page 94: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/94.jpg)
An energy storage problem
Consider a basic energy storage problem:
» We are going to show that with minor variations in the
characteristics of this problem, we can make each class
of policy work best.
![Page 95: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/95.jpg)
An energy storage problem
We can create distinct flavors of this problem:
» Problem class 1 – Best for PFAs• Highly stochastic (heavy tailed) electricity prices
• Stationary data
» Problem class 2 – Best for CFAs• Stochastic prices and wind (but not heavy tailed)
• Stationary data
» Problem class 3 - Best for VFAs• Stochastic wind and prices (but not too random)
• Time varying loads, but inaccurate wind forecasts
» Problem class 4 – Best for deterministic lookaheads• Relatively low noise problem with accurate forecasts
» Problem class 5 – A hybrid policy worked best here• Stochastic prices and wind, nonstationary data, noisy forecasts.
![Page 96: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/96.jpg)
An energy storage problem
The policies
» The PFA:• Charge battery when price is below p1
• Discharge when price is above p2
» The CFA• Optimize over a horizon H; maintain upper and lower bounds (u, l)
for every time period except the first (note that this is a hybrid with a
lookahead).
» The VFA• Piecewise linear, concave value function in terms of energy, indexed
by time.
» The lookahead (deterministic)• Optimize over a horizon H (only tunable parameter) using forecasts of
demand, prices and wind energy
» The lookahead CFA• Use a lookahead policy (deterministic), but with a tunable parameter
that improves robustness.
![Page 97: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/97.jpg)
An energy storage problem
Each policy is best on certain problems
» Results are percent of posterior optimal solution
» … any policy might be best depending on the data.
Joint research with Prof. Stephan Meisel, University of Muenster, Germany.
![Page 98: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/98.jpg)
Outline
Elements of a dynamic model
Modeling uncertainty
Designing policies
The four classes of policies
From deterministic to stochastic optimization
![Page 99: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/99.jpg)
From deterministic to stochastic
Imagine that you would like to solve the time-dependent
linear program:
» subject to
We can convert this to a proper stochastic model by
replacing with and taking an expectation:
The policy has to satisfy with transition function:
0 ,...,
0
minT
T
x x t t
t
c x
0 0 0
1 1 , 1.t t t t t
A x b
A x B x b t
tx ( )t tX S
0
min ( )T
t t t
t
c X S
E
( )t tX St t tA x R
1 1, ,M
t t t tS S S x W
![Page 100: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/100.jpg)
Modeling
Deterministic
» Objective function
» Decision variables:
» Constraints:
• at time t
• Transition function
Stochastic
» Objective function
» Policy
» Constraints at time t
» Transition function
» Exogenous information
1 0
0
max , ( ), |
T
t t t t t
t
E C S X S W S0 ,...,
0
minT
T
t tx x
t
c x
( )t t t tx X S X
1 1t t t tR b B x 1 1, ,M
t t t tS S S x W
0 1 2( , , ,..., )TS W W W
0
t t t
t
A x R
x
t
X
0 ,..., Tx x :X S X
![Page 101: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/101.jpg)
From deterministic to stochastic
Deterministic problems
» Modeling is important, but not
central.
» Algorithms are the most
important, and hardest part.
» Huh?
» Just add up the costs!!
Stochastic problems
» Modeling is the most
important, and hardest, aspect
of stochastic optimization
» Searching for policies is
important, but less critical.
» Modeling uncertainty is often
overlooked, but is of central
importance.
» Evaluating a policy is
important, and difficult. In a
simulator? In the field?
![Page 102: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/102.jpg)
The universal objective function
with
You next need to develop a stochastic model:
» Model uncertainty about parameters in 𝑆0» Model the stochastic process 𝑊1,𝑊2, … ,𝑊𝑁 (for training)
» Model the random variable 𝑊 (for testing, if necessary)
Then search for policies:
» Policy search:• PFAs, CFAs
» Lookahead policies:• VFAs, DLAs
1 0
0
max , ( ), |
T
t t t t t
t
E C S X S W S
Modeling stochastic, dynamic problems
1 1, , ( )M
t t t tS S S x W
![Page 103: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/103.jpg)
Thank you!
For more information, go to
http://www.castlelab.princeton.edu/jungle/
Scroll to “Educational materials”
![Page 104: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/104.jpg)
![Page 105: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/105.jpg)
![Page 106: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/106.jpg)
![Page 107: Tutorial: A Unified Framework for Optimization under Uncertainty …egon.cheme.cmu.edu/ewo/docs/EWO_Seminar_09_11_2018.pdf · » Cumulative reward (“online learning”) • Policies](https://reader036.vdocuments.us/reader036/viewer/2022071212/6026b0a746ca4272c37fb39d/html5/thumbnails/107.jpg)
Theory
Applications
Computation
Modeling