Static and Dynamic Optimization (42111)
Section for Dynamical Systems, Build. 303b, room 048
Dept. of Applied Mathematics and Computer Science, The Technical University of Denmark
Email: [email protected], phone: +45 4525 3356, mobile: +45 9351 1161
2019-11-24 14:37
Lecture 12: Stochastic Dynamic Programming
Outline of lecture
Recap: L11 Deterministic Dynamic Programming (D)
Dynamic Programming (C)
Stochastics (random variables)
Stochastic Dynamic Programming
Booking profiles
Stochastic Bellman
Stochastic optimal stepping (SDD)
Reading guidance: DO p. 83-92.
Dynamic Programming (D)
Find a sequence of decisions $u_i$, $i = 0, 1, \ldots, N-1$, which takes the system

$$x_{i+1} = f_i(x_i, u_i) \qquad x_0 = x^0$$

along a trajectory, such that the cost function

$$J = \phi(x_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i)$$

is minimized.
Dynamic Programming
The Bellman function (the optimal cost to go) is defined as

$$V_i(x_i) = \min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})$$

and is a function of the present state, $x_i$, and index, $i$. In particular,

$$V_N(x_N) = \phi_N(x_N)$$

Theorem
The Bellman function $V_i$ is given by the backwards recursion

$$V_i(x_i) = \min_{u_i}\left[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \right] \qquad x_{i+1} = f_i(x_i, u_i) \quad x_0 = x^0$$

with the boundary condition $V_N(x_N) = \phi_N(x_N)$.

The Bellman equation is a functional equation; it gives a sufficient condition, and $V_0(x^0) = J^*$.
Dynamic programming
$$u_i = \arg\min_{u_i} \Big[\, \underbrace{L_i(x_i, u_i) + V_{i+1}\big(\underbrace{f_i(x_i, u_i)}_{x_{i+1}}\big)}_{W_i(x_i, u_i)} \,\Big]$$

If a maximization problem: min → max.
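The backwards recursion and the pointwise minimization over $W_i(x_i, u_i)$ can be sketched in a few lines for finite grids. The grids, dynamics and costs below are illustrative assumptions (a clipped integrator with quadratic costs), not an example taken from the slides:

```python
# Backward dynamic programming over finite grids.
# Illustrative problem (assumed): x_{i+1} = x_i + u_i on the grid {-2,...,2},
# stage cost L(x,u) = x^2 + u^2, terminal cost phi(x) = x^2, horizon N = 4.
N = 4
X = range(-2, 3)                      # state grid
U = (-1, 0, 1)                        # decision grid

def f(x, u):                          # dynamics
    return x + u

def L(x, u):                          # stage cost
    return x * x + u * u

V = {x: x * x for x in X}             # V_N(x) = phi(x)
policy = []                           # policy[i][x] = optimal u_i(x)
for i in reversed(range(N)):
    Vn, pol = {}, {}
    for x in X:
        # only decisions that keep the next state on the grid are feasible
        cands = [(L(x, u) + V[f(x, u)], u) for u in U if f(x, u) in V]
        Vn[x], pol[x] = min(cands)
    V, policy = Vn, [pol] + policy
```

After the loop `V` holds $V_0$ and `policy[i][x]` is the feedback law $u_i^*(x)$; ties in the minimization are broken toward the smaller decision.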
Type of solutions
[Figure: the Bellman function $V_t(x)$ as a surface over the state $x$ and time $i$, with an optimal trajectory $x_t$]
Fish bone method (Graphical method)
Schematic method (tables) → programming
Analytical (e.g. separation of variables)

Analytical:
Guess the type of functionality in $V_i(x)$, i.e. up to a number of parameters. Check whether it satisfies the Bellman equation. This results in a (number of) recursion(s) for the parameter(s).
Continuous Dynamic Programming
Find the input function $u_t$, $t \in \mathbb{R}$ (more precisely $\{u\}_0^T$), that takes the system

$$\dot{x}_t = f_t(x_t, u_t) \qquad x_0 = x^0 \quad t \in [0, T] \tag{1}$$

such that the cost function

$$J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t)\, dt \tag{2}$$

is minimized. Define the truncated performance index (cost to go)

$$J_t(x_t, \{u\}_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s)\, ds$$

The Bellman function (optimal cost to go) is defined by

$$V_t(x_t) = \min_{\{u\}_t^T} \left[ J_t(x_t, \{u\}_t^T) \right]$$

We have the following theorem, which states a sufficient condition.

Theorem
The Bellman function $V_t(x_t)$ satisfies the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right] \tag{3}$$

This is a PDE with boundary condition

$$V_T(x_T) = \phi_T(x_T)$$
Continuous Dynamic Programming
Proof.
In discrete time we have the Bellman equation

$$V_i(x_i) = \min_{u_i}\left[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \right]$$

with the boundary condition $V_N(x_N) = \phi_N(x_N)$. Identify $t \leftrightarrow i$ and $t + \Delta t \leftrightarrow i + 1$. Then

$$V_t(x_t) = \min_{u_t}\left[ \int_t^{t+\Delta t} L_t(x_t, u_t)\, dt + V_{t+\Delta t}(x_{t+\Delta t}) \right]$$

Apply a Taylor expansion on $V_{t+\Delta t}(x_{t+\Delta t})$:

$$V_t(x_t) = \min_{u_t}\left[ L_t(x_t, u_t)\,\Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t\, \Delta t + \frac{\partial V_t(x_t)}{\partial t}\, \Delta t + o(|\Delta t|) \right]$$
Continuous Dynamic Programming
Proof.

$$V_t(x_t) = \min_{u_t}\left[ L_t(x_t, u_t)\,\Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t\, \Delta t + \frac{\partial V_t(x_t)}{\partial t}\, \Delta t + o(|\Delta t|) \right]$$

(just a copy)

Collect the terms which do not depend on the decision ($u_t$):

$$V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t}\, \Delta t + \min_{u_t}\left[ L_t(x_t, u_t)\,\Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t)\,\Delta t \right] + o(|\Delta t|)$$

In the limit $\Delta t \to 0$ (and after dividing by $\Delta t$):

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t}\left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$
The HJB equation:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t}\left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

(just a copy)

The Hamiltonian function

$$H_t(x_t, u_t, \lambda_t) = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)$$

The HJB equation can also be formulated as

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\!\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)$$

Link to Pontryagin's maximum principle: with

$$\lambda_t^T = \frac{\partial V_t(x_t)}{\partial x}$$

$$\dot{x}_t = f_t(x_t, u_t) \qquad \text{state equation}$$

$$-\dot{\lambda}_t^T = \frac{\partial}{\partial x_t} H_t \qquad \text{costate equation}$$

$$u_t = \arg\min_{u_t} \left[ H_t \right] \qquad \text{optimality condition}$$
Motion control
Consider the system

$$\dot{x}_t = u_t \qquad x_0 = x^0$$

and the performance index

$$J = \frac{1}{2} p x_T^2 + \int_0^T \frac{1}{2} u_t^2 \, dt$$

The HJB equation, (3), gives:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t}\left[ \frac{1}{2} u_t^2 + \frac{\partial V_t(x_t)}{\partial x} u_t \right] \qquad V_T(x_T) = \frac{1}{2} p x_T^2$$

The minimization can be carried out and gives a solution w.r.t. $u_t$, which is

$$u_t = -\frac{\partial V_t(x_t)}{\partial x}$$

So if the Bellman function is known, the control action (the decision) can be determined from this. If the result above is inserted in the HJB equation, we get

$$-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2}\left[\frac{\partial V_t(x_t)}{\partial x}\right]^2 - \left[\frac{\partial V_t(x_t)}{\partial x}\right]^2 = -\frac{1}{2}\left[\frac{\partial V_t(x_t)}{\partial x}\right]^2$$

which is a partial differential equation with the boundary condition

$$V_T(x_T) = \frac{1}{2} p x_T^2$$
PDE:

$$-\frac{\partial V_t(x_t)}{\partial t} = -\frac{1}{2}\left[\frac{\partial V_t(x_t)}{\partial x}\right]^2$$

(just a copy)

Inspired by the boundary condition, we guess a candidate function of the type

$$V_t(x) = \frac{1}{2} s_t x^2$$

where the time dependence is in the function $s_t$. Since

$$\frac{\partial V}{\partial x} = s_t x \qquad \frac{\partial V}{\partial t} = \frac{1}{2} \dot{s}_t x^2$$

the following equation

$$-\frac{1}{2} \dot{s}_t x^2 = -\frac{1}{2} (s_t x)^2$$

must be valid for any $x$, i.e. we can find $s_t$ by solving the ODE

$$\dot{s}_t = s_t^2 \qquad s_T = p$$

backwards. This is actually (a simple version of) the continuous-time Riccati equation. The solution can be found analytically or by means of numerical methods. Knowing the function $s_t$, we can find the control input

$$u_t = -\frac{\partial V_t(x_t)}{\partial x} = -s_t x_t$$
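For this scalar Riccati equation the backwards solution is also available in closed form, $s_t = p/(1 + p(T - t))$, which satisfies $\dot{s}_t = s_t^2$ and $s_T = p$. A quick numerical cross-check with explicit Euler (the values of $p$ and $T$ are arbitrary assumptions for illustration):

```python
# Integrate s' = s^2 backwards from s_T = p with explicit Euler and
# compare with the closed form s_t = p / (1 + p*(T - t)) at t = 0.
p, T, n = 1.0, 1.0, 100_000
dt = T / n
s = p                               # s at t = T
for _ in range(n):                  # step from t = T down to t = 0
    s -= dt * s * s                 # s(t - dt) ~ s(t) - dt * s(t)^2
s_closed = p / (1.0 + p * T)        # closed form at t = 0
```

The Euler iterate agrees with the closed form up to the first-order discretization error.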
Stochastic Dynamic Programming
The Bank loan
Deterministic:

$$x_{i+1} = (1 + r) x_i - u_i \qquad x_0 = x^0$$

Stochastic:

$$x_{i+1} = (1 + r_i) x_i - u_i \qquad x_0 = x^0$$

[Figure: rate of interest (%) over time (months)]
[Figure: simulated bank balance versus time (years), two panels]
Discrete Random Variable
$$X \in \{x^1, x^2, \ldots, x^m\} \subset \mathbb{R}^n$$

$$p_k = P\{X = x^k\} \geq 0 \qquad \sum_{k=1}^{m} p_k = 1$$

[Figure: a probability mass function]

$$E\{X\} = \sum_{k=1}^{m} p_k x^k \qquad E\{g(X)\} = \sum_{k=1}^{m} p_k\, g(x^k)$$
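Both expectations are plain weighted sums, e.g. (using a fair die as an assumed illustrative distribution, with $g(x) = x^2$):

```python
# E{X} and E{g(X)} for a discrete random variable as weighted sums.
xs = [1, 2, 3, 4, 5, 6]                        # outcomes x^k (fair die, assumed)
ps = [1 / 6] * 6                               # probabilities p_k, summing to 1

EX = sum(p * x for p, x in zip(ps, xs))        # E{X}
Eg = sum(p * x**2 for p, x in zip(ps, xs))     # E{g(X)} with g(x) = x^2
```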
Stochastic Dynamic Programming
Consider the problem of minimizing (in some sense):

$$J = \phi_N(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i)$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = x^0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

$e_i$ might be vectors reflecting model errors or direct stochastic effects.
Ranking performance indexes
When $e_i$ and others are stochastic variables, what do we mean by one strategy being better than another?

In a deterministic situation we mean that

$$J_1 > J_2$$

($J_1$ and $J_2$ being the objective functions for strategies 1 and 2).

In a stochastic situation we can choose the definition

$$E\{J_1\} > E\{J_2\}$$

but others do exist. This choice reflects some kind of average consideration.
Example: Booking profiles
Normally a plane is overbooked, i.e. more tickets are sold than the number of seats, $\bar{x}_N$. Let $x_i$ be the number of sold tickets at the beginning of day $i$.

[Timeline: days $0, 1, 2, \ldots, N$]

If $x_N < \bar{x}_N$ we have empty seats - money out the window. If $x_N > \bar{x}_N$ we have to pay compensations - also money out the window.

So we want to find a strategy such that we are minimizing:

$$E\big\{\phi(x_N - \bar{x}_N)\big\}$$

Let $w_i$ be the number of ticket requests on day $i$ (with probability $P\{w_i = k\} = p_k$) and let $v_i$ be the number of cancellations on day $i$ (with probability $P\{v_i = k\} = q_k$).

Dynamics:

$$x_{i+1} = x_i + \min(u_i, w_i) - v_i \qquad e_i = \begin{bmatrix} w_i \\ v_i \end{bmatrix}$$

Decision information: $u_i(x_i)$.
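A minimal forward simulation of these booking dynamics (the request and cancellation distributions, the capacity, and the naive fixed overbooking policy are all assumptions for illustration; finding the optimal $u_i(x_i)$ is exactly the stochastic DP problem):

```python
import random

random.seed(0)
seats, N = 100, 30                    # capacity and booking horizon (assumed)
x = 0                                 # sold tickets at the start of day 0
for i in range(N):
    u = max(0, seats + 10 - x)        # naive policy: accept up to 10 over capacity
    w = random.randint(0, 8)          # ticket requests w_i on day i (assumed dist.)
    v = random.randint(0, min(2, x))  # cancellations v_i, at most x (assumed dist.)
    x = x + min(u, w) - v             # booking dynamics from the slide
overbooked = max(0, x - seats)        # passengers to compensate at departure
```

The policy cap keeps $0 \le x \le \text{seats} + 10$ throughout the horizon.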
Stochastic Bellman Equation
Consider the problem of minimizing:

$$J = E\Big\{ \phi(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i) \Big\}$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = x^0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

Theorem
The Bellman function (optimal cost to go), $V_i(x_i)$, is given by the (backward) recursion:

$$V_i(x_i) = \min_{u_i} E\big\{ L_i(x_i, u_i, e_i) + V_{i+1}(x_{i+1}) \big\} \qquad x_{i+1} = f_i(x_i, u_i, e_i)$$

$$V_N(x_N) = E\big\{ \phi_N(x_N, e_N) \big\}$$

where the optimization is subject to the constraints and the available information.
Discrete (SDD) case
If $e_i$ is discrete, i.e.

$$e_i \in \{e_i^1, e_i^2, \ldots, e_i^m\} \qquad p_i^k = P\{e_i = e_i^k\} \quad k = 1, 2, \ldots, m$$

then the stochastic Bellman equation can be expressed as

$$V_i(x_i) = \min_{u_i} \underbrace{\sum_{k=1}^{m} p_i^k \Big[ L_i(x_i, u_i, e_i^k) + V_{i+1}\big(\underbrace{f_i(x_i, u_i, e_i^k)}_{x_{i+1}}\big) \Big]}_{W_i(x_i, u_i)}$$

with boundary condition

$$V_N(x_N) = \sum_{k=1}^{m} p_N^k\, \phi_N(x_N, e_N^k)$$

The entries in the scheme below are now expected values (i.e. weighted sums).

[Table scheme: $W_i(x_i, u_i)$ with a row per state $x_i = 0, \ldots, 4$ and a column per decision $u_i = 0, \ldots, 3$, plus columns for $V_i(x_i)$ and $u_i^*(x_i)$]
Optimal stochastic stepping (SDD)
Consider the system

$$x_{i+1} = x_i + u_i + e_i \qquad x_0 = 2,$$

where

$$e_i \in \{-1, 0, 1\} \qquad u_i \in \{-1, 0, 1\}^* \qquad x_i \in \{-2, -1, 0, 1, 2\}$$

and the state-dependent disturbance probabilities $p_i^k = P\{e_i = e^k\}$ are

    x_i \ e_i    -1     0     1
      -2          0    1/2   1/2
      -1          0    1/2   1/2
       0         1/2    0    1/2
       1         1/2   1/2    0
       2         1/2   1/2    0

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

Notice: no stochastic components in the cost.
Optimal stochastic stepping (SDD)
Firstly, from

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

(no stochastics in the cost) we establish $V_4(x_4) = x_4^2$. We are assuming perfect state information.

    x_4 :  -2  -1   0   1   2
    V_4 :   4   1   0   1   4
Optimal stochastic stepping (SDD)
Then we establish the $W_3(x_3, u_3)$ function (the cost to go):

$$W_3(x_3, u_3) = \sum_{k=1}^{m} p_3^k \Big[ L_3(x_3, u_3, e_3^k) + V_4\big(f_3(x_3, u_3, e_3^k)\big) \Big]$$

Written out term by term (one term per outcome $e^k$ with probability $p_3^k$):

$$W_3(x_3, u_3) = p_3^1\big[x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^1)\big] + p_3^2\big[x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^2)\big] + p_3^3\big[x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^3)\big]$$

where $x_3^2 + u_3^2 = L_3(x_3, u_3, e_3^k)$ and $x_3 + u_3 + e^k = f_3(x_3, u_3, e_3^k)$. Or more compactly:

$$W_3(x_3, u_3) = x_3^2 + u_3^2 + p_3^1 V_4(x_3 + u_3 + e^1) + p_3^2 V_4(x_3 + u_3 + e^2) + p_3^3 V_4(x_3 + u_3 + e^3)$$
Optimal stochastic stepping (SDD)
$$W_3(x_3, u_3) = \sum_{k=1}^{3} p^k \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^k) \big]$$

For example, for $x_3 = 0$ and $u_3 = -1$ (each line annotated with $(e^k, p^k)$):

$$W_3(0, -1) = \tfrac{1}{2}\big[0^2 + (-1)^2 + V_4(0 - 1 - 1)\big] \qquad (-1, \tfrac{1}{2})$$
$$\phantom{W_3(0, -1) =} + 0\,\big[0^2 + (-1)^2 + V_4(0 - 1 + 0)\big] \qquad (0, 0)$$
$$\phantom{W_3(0, -1) =} + \tfrac{1}{2}\big[0^2 + (-1)^2 + V_4(0 - 1 + 1)\big] \qquad (1, \tfrac{1}{2})$$
$$= \tfrac{1}{2}(1 + 4) + 0 + \tfrac{1}{2}(1 + 0) = 3$$

    W_3        u_3 = -1     0      1
    x_3 = -2      ∞        6.5    5.5
          -1      4.5      1.5    2.5
           0      3        1      3
           1      2.5      1.5    4.5
           2      5.5      6.5     ∞

    x_4 :  -2  -1   0   1   2
    V_4 :   4   1   0   1   4     (just for reference)
Optimal stochastic stepping (SDD)
    W_3        u_3 = -1     0      1      V_3(x_3)   u_3*(x_3)
    x_3 = -2      ∞        6.5    5.5       5.5          1
          -1      4.5      1.5    2.5       1.5          0
           0      3        1      3         1            0
           1      2.5      1.5    4.5       1.5          0
           2      5.5      6.5     ∞        5.5         -1

    W_2        u_2 = -1     0      1      V_2(x_2)   u_2*(x_2)
    x_2 = -2      ∞        7.5    6.25      6.25         1
          -1      5.5      2.25   3.25      2.25         0
           0      4.25     1.5    4.25      1.5          0
           1      3.25     2.25   5.5       2.25         0
           2      6.25     7.5     ∞        6.25        -1
Optimal stochastic stepping (SDD)
    W_1        u_1 = -1     0      1      V_1(x_1)   u_1*(x_1)
    x_1 = -2      ∞        8.25   6.88      6.88         1
          -1      6.25     2.88   3.88      2.88         0
           0      4.88     2.25   4.88      2.25         0
           1      3.88     2.88   6.25      2.88         0
           2      6.88     8.25    ∞        6.88        -1

    W_0        u_0 = -1     0      1      V_0(x_0)   u_0*(x_0)
    x_0 =  2      7.56     8.88    ∞        7.56        -1

Trace back: $u_i(x_i)$. A feedback solution, not a time function.
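The table computations above can be reproduced mechanically. A sketch of the stochastic backward recursion for exactly this example, with the state-dependent disturbance probabilities from the table, and decisions that could leave the state grid treated as infeasible ($W = \infty$):

```python
import math

states = (-2, -1, 0, 1, 2)
controls = (-1, 0, 1)
N = 4
# P{e_i = e | x_i = x}, rows copied from the probability table above
p = {
    -2: {-1: 0.0, 0: 0.5, 1: 0.5},
    -1: {-1: 0.0, 0: 0.5, 1: 0.5},
     0: {-1: 0.5, 0: 0.0, 1: 0.5},
     1: {-1: 0.5, 0: 0.5, 1: 0.0},
     2: {-1: 0.5, 0: 0.5, 1: 0.0},
}

V = {N: {x: float(x * x) for x in states}}       # V_4(x) = x^2
policy = {}
for i in range(N - 1, -1, -1):
    V[i], policy[i] = {}, {}
    for x in states:
        best_w, best_u = math.inf, None
        for u in controls:
            w = x * x + u * u                    # stage cost x^2 + u^2
            for e, pe in p[x].items():
                if pe == 0.0:
                    continue                     # zero-probability outcome
                xn = x + u + e
                if xn not in states:             # state constraint -> u infeasible
                    w = math.inf
                    break
                w += pe * V[i + 1][xn]           # expected cost to go
            if w < best_w:
                best_w, best_u = w, u
        V[i][x], policy[i][x] = best_w, best_u
```

Afterwards `V[0][2]` equals 7.5625 (the 7.56 in the $W_0$ table) and `policy[0][2]` is $-1$.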
Deterministic setting ($x_{i+1} = x_i + u_i$, $i = 0, \ldots, 3$):

    i    :   0   1   2   3
    u_i* :  -1   0   0   0

Stochastic setting ($x_{i+1} = x_i + u_i + e_i$, $i = 0, \ldots, 3$):

    x_i    u_0*   u_1*   u_2*   u_3*
    -2       -      1      1      1
    -1       -      0      0      0
     0       -      0      0      0
     1       -      0      0      0
     2      -1     -1     -1     -1

(only $x_0 = 2$ occurs at stage 0)
Concluding remarks
Discrete state and decision space.
Approximation: grid covering the state and decision space.
Curse of dimensionality - combinatorial explosion.