
A short presentation of dynamic programming

Michel De Lara

CERMICS, École nationale des ponts et chaussées, ParisTech

7 June 2006

EDF course, May-June 2006


Outline of the presentation

1 Deterministic dynamic programming

2 Stochastic dynamic programming


State equation

x(t + 1) = F(t, x(t), u(t)) ,   t = 0, . . . , T − 1, with x(0) = x0

where

x(t) ∈ X = R^n represents the system's state vector at time t;

x0 ∈ X is the initial condition;

u(t) ∈ U = R^p represents the decision (or control) vector;

F : N × X × U → X is the so-called dynamics function, representing the system's evolution;

the horizon T ∈ N, or T = +∞, stands for the term.
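To fix ideas, here is a minimal Python sketch (not part of the original slides) of how the state equation generates a trajectory once a dynamics F, an initial state x0 and a control sequence are given; all three are hypothetical placeholders.

    # Forward simulation of x(t+1) = F(t, x(t), u(t)), t = 0, ..., T-1.
    # F, x0 and u_seq are illustrative placeholders.
    def simulate(F, x0, u_seq):
        """Return the state trajectory (x(0), ..., x(T)) generated by the controls u_seq."""
        x = [x0]
        for t, u in enumerate(u_seq):  # t = 0, ..., T-1
            x.append(F(t, x[-1], u))
        return x

    # Example: scalar linear dynamics x(t+1) = 0.9 x(t) + u(t), with T = 5.
    trajectory = simulate(lambda t, x, u: 0.9 * x + u, x0=1.0, u_seq=[0.1] * 5)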


Constraints

the state constraints are respected at any time

x(t) ∈ D(t) ⊂ X ;

the control constraints are respected at any time

u(t) ∈ B(t, x(t)) ⊂ U ;

the final state achieves a fixed target C ⊂ X

x(T ) ∈ C = D(T ) .
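As a small illustrative check (assuming hypothetical membership tests in_D, in_B and in_C for the sets D(t), B(t, x) and the target C), the three requirements can be verified on a candidate trajectory:

    # Check that a state/control trajectory respects the state constraints,
    # the control constraints and the final target. The membership tests are placeholders.
    def is_admissible(x_traj, u_traj, in_D, in_B, in_C):
        T = len(u_traj)  # x_traj has length T+1
        states_ok = all(in_D(t, x_traj[t]) for t in range(T + 1))
        controls_ok = all(in_B(t, x_traj[t], u_traj[t]) for t in range(T))
        return states_ok and controls_ok and in_C(x_traj[T])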


Criterion

The trajectory space is the product space X^(T+1) × U^T (to be understood as X^N × U^N in the infinite horizon case T = +∞).

A generic element, a state and control trajectory, is denoted by

(x(·), u(·)) = (x(0), . . . , x(T), u(0), . . . , u(T − 1))

(to be understood as (x(·), u(·)) = ((x(t))_{t ∈ N}, (u(t))_{t ∈ N}) in the infinite horizon case T = +∞).

A criterion I is a function

I : X^(T+1) × U^T → R

which assigns a real number to a state and control trajectory.


Additive criterion (finite horizon)

It is the most usual criterion, defined in the finite horizon case by the sum

I(x(·), u(·)) = ∑_{t=0}^{T−1} L(t, x(t), u(t)) + M(T, x(T)) .

Function L is referred to as the system's instantaneous utility (or gain, profit, benefit, payoff, etc.) or instantaneous cost (or loss, disutility, etc., according to the situation), while function M is known as the final utility or the final cost.
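A minimal sketch of how this additive criterion is evaluated on a given trajectory; the instantaneous payoff L and the final payoff M are hypothetical callables:

    # Evaluate I(x(.), u(.)) = sum_{t=0}^{T-1} L(t, x(t), u(t)) + M(T, x(T)).
    def additive_criterion(L, M, x_traj, u_traj):
        T = len(u_traj)
        running = sum(L(t, x_traj[t], u_traj[t]) for t in range(T))
        return running + M(T, x_traj[T])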


Additive criterion (infinite horizon)

In the infinite horizon case, we consider

I(x(·), u(·)) = ∑_{t=0}^{+∞} L(t, x(t), u(t)) .

In economics, the usual present value (PV) approach corresponds to the time-separable case with a discounted criterion of the form

I(x(·), u(·)) = ∑_{t=0}^{+∞} ρ^t L(x(t), u(t))

where ρ stands for a discount factor (0 ≤ ρ ≤ 1).


Quadratic case

The quadratic case corresponds to the situation where L and M are quadratic, in the sense that L(t, x, u) = x′R(t)x + u′Q(t)u and M(T, x) = x′R(T)x, where R(t) and Q(t) are positive matrices, giving

I(x(·), u(·)) = ∑_{t=0}^{T−1} [ x(t)′R(t)x(t) + u(t)′Q(t)u(t) ] + x(T)′R(T)x(T) .
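As a hedged numerical sketch (using NumPy, with hypothetical weight matrices R(t) and Q(t) given as callables), the quadratic criterion reads:

    import numpy as np

    # Quadratic criterion: sum of x(t)'R(t)x(t) + u(t)'Q(t)u(t), plus the final term x(T)'R(T)x(T).
    # x_traj and u_traj are lists of NumPy vectors; R and Q return positive matrices.
    def quadratic_criterion(R, Q, x_traj, u_traj):
        T = len(u_traj)
        running = sum(x_traj[t] @ R(t) @ x_traj[t] + u_traj[t] @ Q(t) @ u_traj[t]
                      for t in range(T))
        return running + x_traj[T] @ R(T) @ x_traj[T]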


The Maximin

The Rawlsian or maximin form in the finite horizon is

I(x(·), u(·)) = min( min_{t=1,...,T−1} L(t, x(t), u(t)) , M(T, x(T)) ) .

In the infinite horizon, we obtain

I(x(·), u(·)) = min_{t=0,...,+∞} L(t, x(t), u(t)) .


Maximal intertemporal utility

We focus on the maximization problem, in additive and separable form, in finite horizon:

I* = sup_{(x(·),u(·)) ∈ Tad(0,x0)} ∑_{t=0}^{T−1} L(t, x(t), u(t)) + M(T, x(T)) ,

where the set of admissible trajectories Tad(0, x0) is defined as follows.


Admissible trajectories

Definition

Let Tad(t, x) ⊂ X^(T+1) × U^T be defined by

(x(·), u(·)) ∈ Tad(t, x) ⇐⇒

x(t) = x ,
x(s + 1) = F(s, x(s), u(s)) ,
u(s) ∈ B(s, x(s)) ,
x(s) ∈ D(s) ,   for all s ≥ t.

Tad(t, x) is the set of trajectories which visit x at time t while respecting both the constraints and the dynamics after time t.


Viability kernel

Definition

The viability kernel at time s ∈ {0, . . . , T} is defined by:

Viab(s) := { x ∈ D(s) | there exist decisions u(·) and states x(·), starting from x at time s, satisfying, for any time t ∈ {s, . . . , T}, the dynamics x(t + 1) = F(t, x(t), u(t)) and the constraints u(t) ∈ B(t, x(t)) , x(t) ∈ D(t) } .

Notice that the viability kernel at horizon T is the target:

Viab(T) = D(T) = C .


Dynamic programming equation for viability kernels

Proposition

The viability kernel Viab(t) satisfies the backward induction

Viab(t) = { x ∈ D(t) | ∃ u ∈ B(t, x) , F(t, x, u) ∈ Viab(t + 1) } ,

Viab(T) = D(T) .
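On finite state and control sets, this backward induction can be carried out exhaustively. The sketch below is purely illustrative: D(t) and B(t, x) are assumed to be given as small finite collections, and F is assumed to return elements of the state set.

    # Backward computation of Viab(T), Viab(T-1), ..., Viab(0) on finite sets.
    # D(t): iterable of states, B(t, x): iterable of controls, F(t, x, u): next state.
    def viability_kernels(T, D, B, F):
        viab = {T: set(D(T))}  # Viab(T) = D(T) = C
        for t in range(T - 1, -1, -1):
            viab[t] = {x for x in D(t)
                       if any(F(t, x, u) in viab[t + 1] for u in B(t, x))}
        return viab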


Viable controls

For every point x inside the corridor Viab(t), there exists a control which yields a solution x(t + 1) belonging to Viab(t + 1) and, consequently, to D(t + 1).

Definition

Viable controls are

Bviab(t, x) := { u ∈ B(t, x) | F(t, x, u) ∈ Viab(t + 1) } .


Value function

The value function V(t, x) at time t and for state x represents the optimal value of the criterion over T − t periods, given that the state of the system x(t) at time t is x. In particular, V(0, x0) = I*.

Definition

V(T, x) := M(T, x) if x ∈ D(T), and −∞ otherwise,

and, for t = 0, . . . , T − 1,

V(t, x) := sup_{(x(·),u(·)) ∈ Tad(t,x)} ( ∑_{s=t}^{T−1} L(s, x(s), u(s)) + M(T, x(T)) ) .

We also set V(t, x) = −∞ whenever no feasibility occurs, i.e. Tad(t, x) = ∅ or, equivalently, x ∉ Viab(t).


Dynamic programming equation in finite horizon

Proposition

Assume no state constraint, namely D(t) = X. The value function is the solution of the following backward dynamic programming equation (or Bellman equation), for t = T − 1, . . . , 0:

V(T, x) = M(T, x)

V(t, x) = sup_{u ∈ B(t,x)} ( L(t, x, u) + V(t + 1, F(t, x, u)) ) .
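For finite state and control sets, the Bellman equation translates directly into a double backward loop. The sketch below is a hypothetical illustration: X_grid is a finite state set, B(t, x) a finite control set, and the dynamics F is assumed to map grid points to grid points.

    # Backward dynamic programming without state constraints:
    # V[t][x] = max over u of L(t, x, u) + V[t+1][F(t, x, u)], with V[T][x] = M(T, x).
    def bellman_backward(T, X_grid, B, F, L, M):
        V = {T: {x: M(T, x) for x in X_grid}}
        for t in range(T - 1, -1, -1):
            V[t] = {x: max(L(t, x, u) + V[t + 1][F(t, x, u)] for u in B(t, x))
                    for x in X_grid}
        return V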


Thus, to evaluate the value V(t, ·) at each time, we start from the final value V(T, ·) = M(T, ·) and then compute V(T − 1, ·), and so on by backward induction.

Notice that the essence of dynamic programming is to replace one optimization problem over a trajectory space X^(T+1) × U^T by a sequence of T optimization problems over the primitive space U.


Dynamic programming equation in finite horizon with viability constraints

Proposition

V(T, x) = M(T, x) ,   ∀x ∈ Viab(T) ,

V(t, x) = sup_{u ∈ Bviab(t,x)} ( L(t, x, u) + V(t + 1, F(t, x, u)) ) ,   ∀x ∈ Viab(t) .


Optimal feedback

Definition

An optimal feedback is any mapping υ* : {0, . . . , T − 1} × X → U such that any trajectory (x*(·), u*(·)) generated by

x*(0) = x0 ,   x*(t + 1) = F(t, x*(t), u*(t)) ,   u*(t) = υ*(t, x*(t)) ,

for t = 0, . . . , T − 1, for any initial condition x0 ∈ D(0), belongs to Tad(0, x0) and is an optimal feasible trajectory, that is

max_{(x(·),u(·)) ∈ Tad(0,x0)} I(x(·), u(·)) = I(x*(·), u*(·)) .


Note that υ (the Greek letter upsilon) denotes a mapping from {0, . . . , T − 1} × X to U, while u denotes a variable (u ∈ U).

Proposition

For any time t and state x ∈ Viab(t), assume the existence of the following feedback decision

υ*(t, x) ∈ arg max_{u ∈ Bviab(t,x)} ( L(t, x, u) + V(t + 1, F(t, x, u)) ) .

Then υ* is an optimal feasible feedback.
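Once the value function has been computed by backward induction, the arg max above gives a feedback, which can then be played forward. The sketch below reuses the hypothetical V, F, L and Bviab of the previous sketches.

    # Read an optimal feedback off a tabulated value function V (a dict of dicts),
    # then generate the corresponding trajectory from x0.
    def greedy_feedback(V, F, L, B_viab):
        def upsilon(t, x):
            return max(B_viab(t, x), key=lambda u: L(t, x, u) + V[t + 1][F(t, x, u)])
        return upsilon

    def play_feedback(T, x0, upsilon, F):
        x, u = [x0], []
        for t in range(T):
            u.append(upsilon(t, x[-1]))   # u*(t) = upsilon(t, x*(t))
            x.append(F(t, x[-1], u[-1]))  # x*(t+1) = F(t, x*(t), u*(t))
        return x, u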


Proof

Recall that

(x(·), u(·)) ∈ Tad(t, x) ⇐⇒

x(t) = x ,
x(s + 1) = F(s, x(s), u(s)) ,
u(s) ∈ B(s, x(s)) ,
x(s) ∈ D(s) ,   for all s ≥ t.

For any x ∈ Viab(t), the admissible set Tad(t, x) is not empty and we have


V(t, x) = sup_{(x(·),u(·)) ∈ Tad(t,x)} ( ∑_{s=t}^{T−1} L(s, x(s), u(s)) + M(T, x(T)) )

= sup_{u(t) ∈ B(t,x)}  sup_{ u(t+1), . . . , u(T−1) : x(t+1) = F(t, x, u(t)), x(s+1) = F(s, x(s), u(s)), x(s) ∈ D(s), u(s) ∈ B(s, x(s)), s ≥ t+1 }  ( L(t, x(t), u(t)) + ∑_{s=t+1}^{T−1} L(s, x(s), u(s)) + M(T, x(T)) )

= sup_{u ∈ Bviab(t,x)} ( L(t, x, u) + sup_{(x(·),u(·)) ∈ Tad(t+1, F(t,x,u))} [ ∑_{s=t+1}^{T−1} L(s, x(s), u(s)) + M(T, x(T)) ] )

= sup_{u ∈ Bviab(t,x)} ( L(t, x, u) + V(t + 1, F(t, x, u)) ) .


Extension to Whittle criterion

Let us call a criterion I in Whittle form whenever it is given by the following backward induction

I(x(·), u(·)) = C(0)

C(t) = ψ(t, x(t), u(t), C(t + 1)) ,   t = 0, . . . , T − 1

C(T) = M(T, x(T)) ,

where ψ is either strictly increasing or continuously increasing in its last argument. This form is adapted to maximin dynamic programming, and includes the additive case, for which ψ(t, x, u, C) = L(t, x, u) + C. The dynamic programming equation then reads

V(T, x) := M(T, x),

V(t, x) := sup_{u ∈ B(t,x)} ψ(t, x, u, V(t + 1, F(t, x, u))) .
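The same backward loop as before applies, with L(t, x, u) + V(t+1, ·) replaced by ψ(t, x, u, V(t+1, ·)). As an illustration (with a hypothetical L), the additive and maximin cases of this slide correspond to the following aggregators:

    # Aggregators psi(t, x, u, C) for the Whittle recursion
    # V(t, x) = sup over u of psi(t, x, u, V(t+1, F(t, x, u))).
    def psi_additive(L):
        return lambda t, x, u, C: L(t, x, u) + C      # additive case

    def psi_maximin(L):
        return lambda t, x, u, C: min(L(t, x, u), C)  # maximin (Rawlsian) case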


Dynamic programming equation in infinite horizon

x(t + 1) = F(x(t), u(t)) ,   t ∈ N

I(x(·), u(·)) = ∑_{t=0}^{+∞} ρ^t L(x(t), u(t))

Proposition

V(x) = sup_{u ∈ B(x)} ( L(x, u) + ρ V(F(x, u)) ) .
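On a finite state space, and for a discount factor ρ < 1, this fixed-point equation can be approximated by value iteration. The sketch below is hypothetical: X_grid, B, F and L play the same illustrative roles as before.

    # Value iteration for V(x) = sup over u of ( L(x, u) + rho * V(F(x, u)) ).
    # With rho < 1 the Bellman operator is a contraction, so the iterates converge.
    def value_iteration(X_grid, B, F, L, rho, tol=1e-8, max_iter=10_000):
        V = {x: 0.0 for x in X_grid}
        for _ in range(max_iter):
            V_new = {x: max(L(x, u) + rho * V[F(x, u)] for u in B(x)) for x in X_grid}
            if max(abs(V_new[x] - V[x]) for x in X_grid) < tol:
                break
            V = V_new
        return V_new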


STOCHASTIC DYNAMIC PROGRAMMING


State equation with random inputs

The uncertain dynamic model in discrete time is described by a state equation,

x(t + 1) = F(t, x(t), u(t), w(t)) ,   t = 0, . . . , T − 1, with x(0) = x0

where

the horizon T ∈ N, or T = +∞, stands for the term;

x(t) ∈ X = R^n represents the system's state vector at time t;

x0 ∈ X is the initial condition;

u(t) ∈ U = R^p represents the decision (or control) vector;

w(t) stands for the uncertain variable (or disturbance, or noise), taking its values in a set W = R^q;

F : N × X × U × W → X is the so-called dynamics function, representing the system's evolution.


Constraints and viability

As in the certain case, we may require state and decision constraints to be satisfied. However, since state trajectories are no longer unique, the following requirements depend upon the scenarios w(·) = (w(0), w(1), . . . , w(T − 1)) ∈ W^T in a way that we shall specify later. The assertions below are thus to be taken in a loose sense at this stage.

The state constraints are respected at any time t

x(t) ∈ D(t) ⊂ X .

The control constraints are respected at any time t

u(t) ∈ B(t, x(t)) ⊂ U .

The final state achieves a fixed target C ⊂ X

x(T ) ∈ C .


Admissible feedbacks

Solutions are no longer trajectories, as in the deterministic case, but are feedbacks.

Definition

Γ = { γ : N × X → U }

Γad = { γ ∈ Γ | γ(t, x) ∈ B(t, x) ,   ∀(t, x) ∈ {0, . . . , T − 1} × X } .


Solution maps

For γ ∈ Γ, let F^γ denote the mapping F^γ : N × X × W → X defined by

F^γ(t, x, w) := F(t, x, γ(t, x), w) .

Definition

The state map and the control map are defined, for any time t0 ∈ {0, . . . , T}, by xF[t0, x0, γ, w(·)](t) = x(t) and uF[t0, x0, γ, w(·)](t) = u(t) = γ(t, x(t)) respectively, where x(·) satisfies the dynamics

x(t + 1) = F^γ(t, x(t), w(t)) ,   t = t0, . . . , T − 1,

and the initial condition x(t0) = x0.
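A minimal sketch of these maps: given a hypothetical feedback gamma, an initial time and state, and a scenario w_seq = (w(t0), ..., w(T-1)), closing the loop produces the state and control trajectories.

    # State map xF[t0, x0, gamma, w(.)] and control map uF[t0, x0, gamma, w(.)],
    # obtained by iterating x(t+1) = F(t, x(t), gamma(t, x(t)), w(t)).
    def solution_maps(F, gamma, t0, x0, w_seq):
        x, u = [x0], []
        for k, w in enumerate(w_seq):    # w_seq = (w(t0), ..., w(T-1))
            t = t0 + k
            u.append(gamma(t, x[-1]))    # u(t) = gamma(t, x(t))
            x.append(F(t, x[-1], u[-1], w))
        return x, u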


Causality

It should be noticed that, with straightforward notations,

xF[t0, x0, γ, w(·)](t0) = x0

xF[t0, x0, γ, w(·)](t) = xF[t0, x0, γ, (w(t0), . . . , w(t − 1))](t)   for t ≥ t0 + 1,

expressing thus a causality property: the state at time t only depends upon the disturbances w(t0), . . . , w(t − 1) prior to t.


Criteria to optimize

The criterion I now depends upon the scenarios: this raises the question of how to turn this family of values (one per scenario) into a single one to be optimized.


Additive criterion (finite time)

The additive and separable form in finite horizon is

I(x(·), u(·), w(·)) = ∑_{t=0}^{T−1} L(t, x(t), u(t), w(t)) + M(T, x(T))

in which

L : N × X × U × W → R specifies the instantaneous cost (or loss, disutility, etc., according to the situation) when the criterion I is minimized, and the instantaneous utility (or gain, profit, benefit, payoff, etc.) when the criterion I is maximized;

M : N × X → R represents the final cost when the criterion I is minimized, and the final utility otherwise.


Additive criterion (infinite time)

The additive and separable form in the infinite horizon is

I(x(·), u(·), w(·)) = ∑_{t=0}^{+∞} L(t, x(t), u(t), w(t)) .


Multiplicative form

The multiplicative form is

I(x(·), u(·), w(·)) = ∏_{t=0}^{T−1} L(t, x(t), u(t), w(t)) × M(T, x(T)) .


Probabilistic assumptions

Probabilistic assumptions on the uncertainty w(·) may be added, giving the problem a stochastic nature.

Mathematically speaking, w(·) = (w(0), w(1), . . . , w(T − 1)) is a sequence of random variables defined over a measurable space (Ω, F) equipped with a probability P. When T = +∞, one rather speaks of a stochastic process.

The notation E refers to the mathematical expectation under the probability P. Recall that a random variable is a measurable function on (Ω, F).


Measurability assumptions

To be able to take mathematical expectations, we are led to consider measurability assumptions. The sets X and U are now assumed to be equipped with σ-fields X and U respectively, the dynamics is supposed to be measurable and, by a feedback, we now implicitly mean a measurable feedback. From now on,

Γ := { γ : N × X → U | γ measurable } .

Once a feedback γ is picked in Γ, all the variables x(·), u(·) and w(·) become random variables defined over (Ω, F, P), by means of the relations

x(t) = xF[0, x0, γ, w(·)](t)   and   u(t) = γ(t, x(t)) .

Thus, any quantity depending upon states, controls and disturbances is now a random variable, and hence, when bounded or nonnegative, admits an integral with respect to P.


The i.i.d. case

Following a common hypothesis, we shall, for the sake of simplicity, assume that the random variables w(·) are independent and identically distributed (i.i.d.) under P.

In such a probabilistic context, we use the notation

E[a(w)] for the expected value of any integrable random variable a : W → R,

and E[A(w(·))] for any random variable A : W^T → R.


The discrete i.i.d. case

For discrete probability laws, this means that

E[a(w)] = ∑_{w ∈ W} µ(w) a(w)

with µ the common discrete law on W of the random variables w(t) (thus, we can choose Ω = W^T and P the product of T copies of µ), and

E[A(w(·))] = ∑_{w0 ∈ W} · · · ∑_{wT−1 ∈ W} A(w0, . . . , wT−1) µ(w0) · · · µ(wT−1) .


The continuous i.i.d. case

For continuous probability laws on W = R^q, this gives

E[a(w)] = ∫_W a(w) f(w) dw

with f the common density on W of the random variables w(t) (thus, we can choose Ω = W^T and P the product of T copies of f(w) dw), and

E[A(w(·))] = ∫_W · · · ∫_W A(w0, . . . , wT−1) f(w0) · · · f(wT−1) dw0 · · · dwT−1 .


Minimal mean cost

Definition

For any admissible feedback strategy γ ∈ Γad and initial condition x0 ∈ X, let us consider the expected criterion, or mean cost,

I(x0, γ) := E[ I( xF[0, x0, γ, w(·)](·), uF[0, x0, γ, w(·)](·), w(·) ) ] .

The stochastic optimization problem corresponds to

I*(x0) = inf_{γ ∈ Γad} I(x0, γ) = inf_{γ ∈ Γad} E[ I( xF[0, x0, γ, w(·)](·), uF[0, x0, γ, w(·)](·), w(·) ) ] .


Optimal feedback

Definition

Any γ* ∈ Γad such that

I*(x0) = min_{γ ∈ Γad} I(x0, γ) = I(x0, γ*)

is an optimal feedback.


Stochastic dynamic programming equation in finite horizon

Definition

In the absence of state constraints (D(t) = X for t = 0, . . . , T), the value function V(t, x) is defined by the following backward induction:

V(T, x) := M(T, x),

V(t, x) := inf_{u ∈ B(t,x)} E[ L(t, x, u, w(t)) + V(t + 1, F(t, x, u, w(t))) ] .

Contrary to the deterministic case, the value function is here defined by a backward induction relation, and one can then prove that it coincides with the optimal cost.
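When the noise has a finite support with i.i.d. law µ, the expectation becomes a weighted sum and the backward induction (together with the minimizing feedback of the next slide) can be sketched as follows; X_grid, B, F, L, M and mu are hypothetical, and F is assumed to map grid points to grid points.

    # Stochastic dynamic programming with discrete i.i.d. noise: mu is a dict {w: probability}.
    # Returns the value functions V[t][x] and the minimizing feedback gamma*(t, x).
    def stochastic_bellman(T, X_grid, B, F, L, M, mu):
        V = {T: {x: M(T, x) for x in X_grid}}
        feedback = {}
        for t in range(T - 1, -1, -1):
            V[t] = {}
            for x in X_grid:
                def expected_cost(u):
                    # E[ L(t, x, u, w) + V(t+1, F(t, x, u, w)) ] under the law mu
                    return sum(p * (L(t, x, u, w) + V[t + 1][F(t, x, u, w)])
                               for w, p in mu.items())
                best_u = min(B(t, x), key=expected_cost)
                feedback[(t, x)] = best_u
                V[t][x] = expected_cost(best_u)
        return V, feedback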


Optimal feedback

Assume no state constraint, namely D(t) = X. For any time t and state x, assume the existence of the following feedback decision

γ*(t, x) ∈ arg min_{u ∈ B(t,x)} E[ L(t, x, u, w(t)) + V(t + 1, F(t, x, u, w(t))) ] .

Then γ* : (t, x) ↦ γ*(t, x) is an optimal strategy and, for any x0 ∈ X, the optimal expected cost is given by

V(0, x0) = I*(x0) = I(x0, γ*) .


Extension to Whittle criterion

Let us call a criterion I in strong Whittle form whenever it is given by the following backward induction

I(x(·), u(·), w(·)) = C(0)

C(t) = g(t, x(t), u(t), w(t)) + β(t, x(t), u(t), w(t)) C(t + 1) ,   t = 0, . . . , T − 1

C(T) = M(T, x(T)) ,

where β(t, x(t), u(t), w(t)) > 0. Equivalently,


I(x(·), u(·), w(·)) = ∑_{t=0}^{T} β0 β1 · · · β_{t−1} g_t

with

β_t = β(t, x(t), u(t), w(t)) > 0 ,   t = 0, . . . , T − 1,
g_t = g(t, x(t), u(t), w(t)) ,   t = 0, . . . , T − 1,
g_T = M(T, x(T)) .

This form happens to be adapted to stochastic dynamic programming, and to include both the additive and multiplicative cases, with, respectively, g(t, x, u, w) = L(t, x, u, w), β(t, x, u, w) = 1, and g(t, x, u, w) = 0, β(t, x, u, w) = L(t, x, u, w).


Value function

In the absence of state constraints (D(t) = X for t = 0, . . . , T), the value function V(t, x) is defined by the following backward induction:

V(T, x) := M(T, x),

V(t, x) := inf_{u ∈ B(t,x)} E[ g(t, x, u, w(t)) + β(t, x, u, w(t)) V(t + 1, F(t, x, u, w(t))) ] .
