From Perturbation Analysis
to a New Paradigm of Optimization
Xi-Ren Cao, Shanghai Jiao Tong University
The Problem in Optimization
Policy Space D: Best Policy?
Policy space too large for exhaustive search (100 states, 2 actions: $2^{100} \approx 10^{30}$ policies; counting them at 10 GHz takes about $10^{12}$ years)
State space too large; we cannot analyze every policy
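The size estimate above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming one policy is counted per clock cycle at 10 GHz and a 365.25-day year; both are assumptions made only for illustration.

```python
import math

states, actions = 100, 2
# One action chosen independently in each state gives actions**states policies.
policies = actions ** states

rate_per_second = 10e9                  # assumption: one policy counted per cycle at 10 GHz
seconds_per_year = 365.25 * 24 * 3600   # assumption: Julian year
years = policies / rate_per_second / seconds_per_year

print(f"number of policies ~ 10^{math.log10(policies):.1f}")
print(f"years needed to count them ~ 10^{math.log10(years):.1f}")
```

The exponents come out on the order of 30 and 12, matching the orders of magnitude quoted on the slide.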
Perturbation Analysis (PA): gradient-based approach
With special structure, by analyzing one policy, obtain the performance of its neighboring policies → performance gradient
Queuing networks, Markov processes
θ → θ + Δθ: gradient hill climbing
Policies in Distance?
With special structure, by analyzing one policy, find a better policy in the distance
Policy Iteration (PI): discrete version of PA
Continuous: Perturbation Analysis (PA)
➢ Performance derivatives $d\eta/d\theta$
➢ Find the best direction
➢ Hill climbing
➢ Gradient <= 0 → local optimal
Discrete: Relative Optimization (RO)
➢ Performance difference $\eta' - \eta$ (PDF)
➢ Find a better policy
➢ Policy Iteration
➢ No better policy → global optimal
PA and RO are probably the only way to overcome the difficulties in exhaustive search.
A Sensitivity-Based View of Optimization
▪ Continuous Space: θ → θ + Δθ (perturbation analysis)
▪ Discrete Policy Space (policy iteration)
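Dynamic Programming
Working locally in time and states: optimal policy at k+1 → optimal policy at k
$$\eta^d_k(x) = \sum_{i=k}^{K-1} f(X^d_i) + F(X^d_K)$$
[Figure: states X(k) = 1, ..., 6 versus time k = 0, 1, 2, 3, with actions a1, a2, a3 and optimal values $\eta^*_2(5)$, $\eta^*_2(2)$, $\eta^*_2(4)$]
Problem: under-selectivity for the time non-homogeneous long-run average:
$$\eta(x) = \lim_{K\to\infty} \frac{1}{K} E\Big\{\sum_{k=0}^{K-1} f_k(X_k) \,\Big|\, X_0 = x\Big\}$$
Does not depend on policies in any finite period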
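The under-selectivity of the long-run average can be illustrated numerically. The sketch below is a deterministic stand-in, not a Markov chain: the reward sequence, the transient length `N`, and the two transient reward levels are all hypothetical. It only shows that changing the rewards in a finite period leaves the limit of the time average unchanged.

```python
def time_average(rewards):
    """Average reward over the whole horizon, (1/K) * sum of f_k."""
    return sum(rewards) / len(rewards)

def reward_sequence(K, transient_reward, N=100):
    """Hypothetical rewards: `transient_reward` in the first N steps,
    then an alternating 1, 3 pattern whose long-run average is 2."""
    return [transient_reward if k < N else (1.0 if k % 2 == 0 else 3.0)
            for k in range(K)]

gaps = {}
for K in (1_000, 100_000, 1_000_000):
    poor_start = time_average(reward_sequence(K, 0.0))
    good_start = time_average(reward_sequence(K, 10.0))
    gaps[K] = abs(good_start - poor_start)
    print(K, poor_start, good_start, gaps[K])
```

The gap between the two averages is bounded by (transient length × reward difference) / K, so it vanishes as K grows: actions in any finite period cannot be distinguished by the long-run average.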
Stochastic Control
$$dX(t) = \mu^d[X(t)]\,dt + \sigma^d[X(t)]\,dW(t)$$
[Figure: sample path X(t) over the interval from t to t + Δt]
Local property leads to a differential equation
Problem: non-smooth value function. The differential equation does not work for a non-smooth value function (viscosity solution)
Dynamic Programming:
➢ Works backwards in time: t + Δt → t
➢ Local information
Any Weakness?
➢ Not convenient for long-run average
➢ Not for non-smooth value functions (viscosity solution)
➢ Degenerate processes not well explored
➢ Not necessary: under-selectivity issue (long-run average does not depend on transient actions)
Sensitivity Based – PA and RO
PA: Given a sample path X → perturbed sample path X_d
[Figure: sample path X and perturbed sample path X_d on [0, T]]
Relative Optimization: Comparing two policies
Two policies: d and d', with $X^d(0) = X^{d'}(0) = x$
Two performance measures:
$\eta^d_0(x)$: total reward on ABCDEF
$\eta^{d'}_0(x)$: total reward on AGHIJK
[Figure: sample paths ABCDEF under policy d and AGHIJK under policy d', k = 0, 1, ..., 5]
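Relative Optimization
Time non-homogeneous Markov chains, k = 0, 1, 2, ...:
Trans. prob. matrices: $P_k = [P_k(j|i)]_{i,j \in S}$
Reward: $f_k(i)$, $f_k = (f_k(1), \ldots, f_k(S))^T$
Long-run average: $\eta$
Two policies:
$(P_1, P_2, \ldots, P_k, \ldots;\ f_1, f_2, \ldots, f_k, \ldots)$, with value function $g_1, g_2, \ldots, g_k, \ldots$ and long-run average $\eta$
$(P'_1, P'_2, \ldots, P'_k, \ldots;\ f'_1, f'_2, \ldots, f'_k, \ldots)$, with value function $g'_1, g'_2, \ldots, g'_k, \ldots$ and long-run average $\eta'$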
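Relative Optimization
Performance difference formula:
$$\eta' - \eta = \lim_{K\to\infty} \frac{1}{K} E'\Big\{\sum_{k=0}^{K-1} [(P'_k - I)g_k + f'_k](X'_k) \,\Big|\, X'_0 = x\Big\} - \lim_{K\to\infty} \frac{1}{K} E'\Big\{\sum_{k=0}^{K-1} [(P_k - I)g_k + f_k](X'_k) \,\Big|\, X'_0 = x\Big\}$$
$\eta' \ge \eta$ if
$$[(P'_k - I)g_k + f'_k](x) \ge [(P_k - I)g_k + f_k](x)$$
for all x in S and all k = 0, 1, 2, …, except for a finite period, or on a subsequence $k_1, k_2, \ldots$ with $\lim_{n\to\infty} n/k_n = 0$.
→ HJB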
Stochastic Control
$$dX(t) = \mu^d[X(t)]\,dt + \sigma^d[X(t)]\,dW(t)$$
Finite horizon optimization problem (stationary):
$$\eta^d(x) = E^d\Big\{\int_0^T f(X(s))\,ds + F(X(T)) \,\Big|\, X(0) = x\Big\}$$
Goal: $\eta^*(x) = \max_d \{\eta^d(x)\}$ for all x.
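Ito formula: for a smooth function h(x),
$$\frac{d}{dt} E\{h[X(t)] \mid X(t) = x\} = \mu(x)\dot h(x) + \tfrac{1}{2}\sigma^2(x)\ddot h(x)$$
Dynamic programming → HJB equation
Ito-Tanaka formula: for a non-smooth function h(x), at smooth points x,
$$E\{dh[X(t)] \mid X(t) = x\} = [\mu(x)\dot h(x) + \tfrac{1}{2}\sigma^2(x)\ddot h(x)]\,dt$$
and at the non-smooth point z there is an additional term
$$[\dot h(z+) - \dot h(z-)]\,E[dL^X_z(t) \mid X(t) = z]$$
where z is the non-smooth point, $\dot h(z+)$ and $\dot h(z-)$ are the right-sided and left-sided derivatives, and $L^X_z(t)$ is the local time of X at z, with $E[dL^X_z(t) \mid X(t) = z] \propto \sqrt{dt}$ and
$$\lim_{dt\to 0} \frac{\sqrt{dt}}{dt} = \infty$$
Derivatives??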
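Relative Optimization: Based on Comparison
PDF for a non-smooth value function $\eta(x)$ (dots denote derivatives in x, primes denote policy d'):
$$\eta'(x) - \eta(x) = E'\Big\{\int_0^T [\mu'\dot\eta + \tfrac{1}{2}\sigma'^2\ddot\eta + f'](X'(t))\,dt \,\Big|\, X'(0) = x\Big\} + [\dot\eta(z+) - \dot\eta(z-)]\,E'\{L^{X'}_z(T) \mid X'(0) = x\}$$
In addition to the HJB equation at smooth points, we need at the non-smooth points
$$\dot\eta(z+) \le \dot\eta(z-)$$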
1. No viscosity solution is needed!
2. The order in dt is $\sqrt{dt}$: $E[dL^X_z(t) \mid X(t) = z] \propto \sqrt{dt}$, and $\lim_{dt\to 0} \sqrt{dt}/dt = \infty$
3. X(t) hits the non-smooth point z rarely, but each time it hits it, the effect in dt is infinite. This cannot be captured by derivatives.
Example: X(t) = W(t)
1. h(x) = x: $E[dh(X(t)) \mid X(0) = 0] = E[dW(t) \mid X(0) = 0] = 0$.
2. h(x) = |x|: $E[dh(X(t)) \mid X(0) = 0] = E[d|W(t)| \mid X(0) = 0] = \sqrt{2\,dt/\pi}$.
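The |x| example can be checked numerically. The sketch below uses the standard fact that W(dt) is normal with mean 0 and variance dt, so E|W(dt)| = sqrt(2 dt / π); the function names, sample size, and seed are illustrative assumptions. It compares this value with a Monte Carlo estimate and prints the ratio E|W(dt)| / dt, which grows without bound as dt shrinks.

```python
import math
import random

def exact_abs_increment(dt):
    """E|W(dt)| for standard Brownian motion started at 0: sqrt(2*dt/pi)."""
    return math.sqrt(2.0 * dt / math.pi)

def sampled_abs_increment(dt, samples=100_000, seed=0):
    """Monte Carlo estimate of E|W(dt)| using N(0, dt) increments."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    return sum(abs(rng.gauss(0.0, sd)) for _ in range(samples)) / samples

ratios = []
for dt in (1e-2, 1e-4, 1e-6):
    exact = exact_abs_increment(dt)
    estimate = sampled_abs_increment(dt)
    ratios.append(exact / dt)          # "derivative-like" rate of change
    print(dt, exact, estimate, exact / dt)
```

Each hundredfold reduction of dt multiplies the ratio by ten, which is the sqrt(dt) order stated in item 2: no finite derivative describes the effect at the non-smooth point.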
Other Applications???
Relative Optimization (based on comparison of the performance of any two policies):
Global information in the entire [0, T] or [0, ∞)
Under-selectivity and non-smoothness
No viscosity solution needed
Degenerate processes explored in detail
Long-run average
➢ State classification
➢ Bias optimality
➢ Multi-class optimization
Insights for further research on control and stochastic processes
➢ Local times on curves
Performance Optimization
Dynamic Prog.
Relative Opt.
HJB,etc.
Solutions
THANKS!
Example: Long-run Average
Two policies with finite states
Transition prob. matrices P, P': n × n
Long-run average η, η'
Steady-state prob. π, π': n-row vectors
Reward function f: n-column vector
Poisson Equation: (I − P)g + ηe = f (1)
g: potential, n-column vector; e = (1, 1, …, 1)^T, n-column vector
Noting η' = π'f, π'e = 1, and π'P' = π', left-multiplying (1) with π' yields the PDF:
$$\eta' - \eta = \pi'(P' - P)g$$
➢ η' > η if P'g > Pg → Policy iteration
➢ P* is optimal if P*g* >= Pg* for all P → HJB eqn!
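The derivation above can be verified numerically on a small example. This is a sketch under stated assumptions: the two 3-state transition matrices and the reward vector are made up for illustration, the helper names (`solve`, `stationary`, `potential`) are hypothetical, and the potential is normalized by solving (I − P + eπ)g = f, which is one of several valid normalizations of the Poisson equation (1).

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def stationary(P):
    """Steady-state row vector pi with pi P = pi and pi e = 1."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n                      # replace one balance equation by pi e = 1
    return solve(A, [0.0] * (n - 1) + [1.0])

def potential(P, f, pi):
    """Potential g solving (I - P + e pi) g = f, hence (I - P) g + eta e = f."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)] for i in range(n)]
    return solve(A, f)

# Hypothetical example data: two ergodic chains on 3 states, common reward f.
P  = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
P2 = [[0.1, 0.2, 0.7], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6]]
f = [1.0, 2.0, 5.0]

pi, pi2 = stationary(P), stationary(P2)
eta, eta2 = dot(pi, f), dot(pi2, f)
g = potential(P, f, pi)

# Right-hand side of the PDF: pi' (P' - P) g
pdf_rhs = dot(pi2, [dot(r2, g) - dot(r, g) for r2, r in zip(P2, P)])
print(eta2 - eta, pdf_rhs)
```

The two printed numbers agree up to rounding, using only the potential g of the original policy and the steady-state vector of the new one.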
Markov Decision Processes (MDPs) & Policy Iteration
$$\eta' - \eta = \pi' Q g = \pi'(P' - P)g$$
1. η' > η if P'g >= Pg, with > for at least one component
2. Policy iteration: at any state, find a policy P' with P'g >= Pg
3. Improve performance iteratively; stop when no improvement can be made
4. Optimality equations: $\hat P$ is optimal ⟺ $\hat P \hat g \ge P \hat g$ for all P
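The four steps can be sketched as a small policy iteration loop. Everything concrete here is an assumption for illustration: a 3-state problem with two candidate transition rows per state (`ROWS`), a reward `f` that depends on the state only (as in the PDF on this slide), and ergodic chains under every policy. The result is compared with exhaustive search over all 2^3 policies.

```python
from itertools import product

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def evaluate(P, f):
    """Return (eta, g): long-run average and potential of the chain P."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    pi = solve(A, [0.0] * (n - 1) + [1.0])
    B = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)] for i in range(n)]
    return dot(pi, f), solve(B, f)

# Hypothetical MDP: ROWS[s][a] is the transition row of state s under action a.
ROWS = [
    [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]],
    [[0.5, 0.4, 0.1], [0.2, 0.3, 0.5]],
    [[0.4, 0.4, 0.2], [0.1, 0.1, 0.8]],
]
f = [1.0, 2.0, 5.0]

def matrix(policy):
    return [ROWS[s][a] for s, a in enumerate(policy)]

def policy_iteration(policy):
    policy = list(policy)
    while True:
        eta, g = evaluate(matrix(policy), f)
        improved = list(policy)
        for s in range(len(ROWS)):
            current = dot(ROWS[s][policy[s]], g)
            for a, row in enumerate(ROWS[s]):
                if dot(row, g) > current + 1e-12:   # strict improvement of (P'g)(s)
                    improved[s], current = a, dot(row, g)
        if improved == policy:                      # no better policy: stop
            return tuple(policy), eta
        policy = improved

best_policy, best_eta = policy_iteration((0, 0, 0))
brute_eta = max(evaluate(matrix(p), f)[0] for p in product(range(2), repeat=3))
print(best_policy, best_eta, brute_eta)
```

The tolerance in the improvement test keeps the current action on ties, so the loop cannot cycle between policies with equal performance.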
The Optimization Problem
Action: a in {a1, a2, ..., aN}
Deterministic transition: $X(k+1) = \phi^a[k, X(k)]$
Stochastic: distribution of X(k+1)
Reward: $f^a(k, x)$
Terminating: $F(x)$
A policy d: a = d(k, X(k))
Under policy d: reward $f^d(k, x)$, transition $X(k+1) = \phi^d[k, X(k)]$
[Figure: states 1 to 8 versus time, with actions a1, a2, a3, a4]
A (Deterministic) Policy d
Sample paths: ABCDEF, RSUV; states: $X^d(k)$, k = 0, 1, ..., 5
Total rewards from $X^d(k) = x$ to $X^d(K)$:
$$\eta^d_k(x) = \sum_{i=k}^{K-1} f^d(i, X^d(i)) + F[X^d(K)], \quad X^d(k) = x$$
Optimization: $\eta^*_k(x) = \max\{\eta^d_k(x),\ d \in D\}$ for all k and x.
[Figure: sample paths ABCDEF and RSUV over k = 0, 1, ..., 5]
Dynamic Programming
Working backwards in time horizontally: optimal policy at k+1 → optimal policy at k
[Figure: states X(k) = 1, ..., 8 versus k = 0, 1, ..., 5, with terminal rewards F(6), F(5), F(3) and policies d1, d2, d3]
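The Optimality Condition:
$$f^{d^*}(k, x) + \eta^*_{k+1}(\phi^{d^*}(k, x)) \ge f^a(k, x) + \eta^*_{k+1}(\phi^a(k, x))$$
for all a, x, and k = 0, 1, ..., K−1, where $\phi^a(k, x)$ is the next state reached from x at time k under action a. (**)
For stochastic systems, the mapping is replaced by a transition probability $P^a(y|x)$ and performance is replaced by its mean:
$$f^{d^*}(k, x) + \sum_y P^{d^*}(y|x)\,\eta^*_{k+1}(y) \ge f^a(k, x) + \sum_y P^a(y|x)\,\eta^*_{k+1}(y)$$
for all a, x, and k = 0, 1, ..., K−1.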
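The optimality condition (**) is what backward induction enforces step by step. The sketch below works on a small deterministic problem in which the mapping `phi`, the reward `f`, the terminal reward `F`, the horizon `K`, and the state and action sets are all hypothetical. It computes the optimal values backwards in time and checks them against brute-force enumeration of all action sequences.

```python
from itertools import product

K = 4                     # horizon (assumed)
STATES = range(3)
ACTIONS = range(2)

def phi(k, x, a):
    """Hypothetical deterministic mapping: next state from x at time k under action a."""
    return (x + a + k) % 3

def f(k, x, a):
    """Hypothetical running reward."""
    return float((x + 1) * (a + 1) - k * a)

def F(x):
    """Hypothetical terminal reward."""
    return float(2 * x)

def backward_induction():
    """eta[k][x] = max over a of f(k, x, a) + eta[k+1][phi(k, x, a)], with eta[K] = F."""
    eta = [[0.0] * len(STATES) for _ in range(K + 1)]
    policy = [[0] * len(STATES) for _ in range(K)]
    for x in STATES:
        eta[K][x] = F(x)
    for k in range(K - 1, -1, -1):
        for x in STATES:
            values = [f(k, x, a) + eta[k + 1][phi(k, x, a)] for a in ACTIONS]
            policy[k][x] = max(ACTIONS, key=lambda a: values[a])
            eta[k][x] = values[policy[k][x]]
    return eta, policy

def brute_force(x0):
    """Best total reward over all action sequences starting from x0 at time 0."""
    best = float("-inf")
    for seq in product(ACTIONS, repeat=K):
        x, total = x0, 0.0
        for k, a in enumerate(seq):
            total += f(k, x, a)
            x = phi(k, x, a)
        best = max(best, total + F(x))
    return best

eta_star, d_star = backward_induction()
for x in STATES:
    print(x, eta_star[0][x], brute_force(x))
```

For a deterministic system an open-loop action sequence is as good as a feedback policy, which is why enumerating sequences is a fair comparison here.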
Adding auxiliary paths starting from sample path d' at each time k, but following policy d
Every sample path has a total reward $\eta^d_k[X^{d'}(k)]$.
[Figure: sample paths ABCDEF under policy d and AGHIJK under policy d' over k = 0, 1, ..., 5, with auxiliary paths that follow d from each state on the d' path]
[Figure: sample paths ABCDEF under policy d and AGHIJK under policy d' over k = 0, 1, ..., 5, with auxiliary paths through L, M, O, P, Q, R, S, U, V]
$\eta^{d'} - \eta^d$ = AGHIJK − ABCDEF
= (AGHIJK − AGHIJM) + (AGHIJM − AGHILQ) + (AGHILQ − AGHOPQ) + (AGHOPQ − AGRSUV) + (AGRSUV − ABCDEF)
= (JK − JM) + (IJM − ILQ) + (HILQ − HOPQ) + (GHOPQ − GRSUV) + (AGRSUV − ABCDEF)
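[Figure: sample paths ABCDEF under policy d and AGHIJK under policy d' over k = 0, 1, ..., 5, with auxiliary paths through L, M, O, P, Q, R, S, U, V]
$\eta^{d'} - \eta^d$ = (JK − JM) + (IJM − ILQ) + (HILQ − HOPQ) + (GHOPQ − GRSUV) + (AGRSUV − ABCDEF)
We have, with $\phi$ and $\phi'$ the state mappings under d and d', and X'(k) the state on the d' path,
$$JK - JM = [f'(4, J) + F(K)] - [f(4, J) + F(M)] = \{f'(4, J) + \eta^d_K[\phi'(4, J)]\} - \{f(4, J) + \eta^d_K[\phi(4, J)]\}$$
$$GHOPQ - GRSUV = \{GH + HOPQ\} - \{GR + RSUV\} = \{f'(1, X'(1)) + \eta^d_2[\phi'(1, X'(1))]\} - \{f(1, X'(1)) + \eta^d_2[\phi(1, X'(1))]\}$$
……
$$JK - JM = \{f'(4, X'(4)) + \eta^d_K[\phi'(4, X'(4))]\} - \{f(4, X'(4)) + \eta^d_K[\phi(4, X'(4))]\}$$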
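Thus, we get the Performance difference formula (PDF), with $\phi$ and $\phi'$ the state mappings under d and d':
$$\eta^{d'}_0(x) - \eta^d_0(x) = AGHIJK - ABCDEF = \sum_{k=0}^{K-1} \Big[\{f'(k, X'(k)) + \eta^d_{k+1}[\phi'(k, X'(k))]\} - \{f(k, X'(k)) + \eta^d_{k+1}[\phi(k, X'(k))]\}\Big]$$
Optimality Condition:
$$f(k, x) + \eta^d_{k+1}[\phi(k, x)] \ge f'(k, x) + \eta^d_{k+1}[\phi'(k, x)]$$
for all d', x, and k = 0, 1, ..., K−1. (**)
This is the same as the DP equation.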
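The telescoping argument behind the PDF can be checked on a toy deterministic problem. In this sketch the mapping `phi`, rewards `f` and `F`, horizon `K`, and the two policies `d` and `d_prime` are all hypothetical. The sum is evaluated along the path of d' using only the values of policy d, and compared with the directly computed difference of total rewards.

```python
K = 5                     # horizon (assumed)
STATES = range(4)

def phi(k, x, a):
    """Hypothetical deterministic mapping to the next state."""
    return (2 * x + a + k) % 4

def f(k, x, a):
    """Hypothetical running reward."""
    return float(x * (a + 1) + k * (1 - a))

def F(x):
    """Hypothetical terminal reward."""
    return float(x * x)

def d(k, x):
    """Policy d (hypothetical): always action 0."""
    return 0

def d_prime(k, x):
    """Policy d' (hypothetical): action depends on time and state."""
    return (x + k) % 2

def values(policy):
    """eta[k][x]: total reward from state x at time k to the end under `policy`."""
    eta = [[0.0] * len(STATES) for _ in range(K + 1)]
    for x in STATES:
        eta[K][x] = F(x)
    for k in range(K - 1, -1, -1):
        for x in STATES:
            a = policy(k, x)
            eta[k][x] = f(k, x, a) + eta[k + 1][phi(k, x, a)]
    return eta

def pdf_sum(x0):
    """Right-hand side of the PDF, summed along the sample path of d'."""
    eta = values(d)                       # only the values of policy d are needed
    x, total = x0, 0.0
    for k in range(K):
        a_new, a_old = d_prime(k, x), d(k, x)
        total += (f(k, x, a_new) + eta[k + 1][phi(k, x, a_new)]) \
               - (f(k, x, a_old) + eta[k + 1][phi(k, x, a_old)])
        x = phi(k, x, a_new)              # stay on the d' path
    return total

eta_d, eta_dp = values(d), values(d_prime)
for x0 in STATES:
    print(x0, eta_dp[0][x0] - eta_d[0][x0], pdf_sum(x0))
```

Each term of the sum is the gain from following d' for one step at X'(k) and then switching to d, which is exactly the difference between two neighboring auxiliary paths in the figure.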
Comparison
DP: ~ Riemann integration; local information at time k; derivative in continuous time
DC (PDF): ~ Lebesgue integration; global information in the entire horizon [0, K]; more than derivative
Application to Stochastic Control