

Information patterns and optimal stochastic control problems

    Michel De Lara

CERMICS, École nationale des ponts et chaussées, ParisTech

May 31, 2006

Cours EDF, May-June 2006


    A linear control-oriented stochastic model

Controls: u0 ∈ U0 = R and u1 ∈ U1 = R

Random issue: ω = (x0, w) ∈ Ω = R × R

State equations:

x1 = x0 + u0
x2 = x1 − u1

Output equations:

y0 = x0
y1 = x1 + w

H. S. Witsenhausen. A counterexample in stochastic optimal control. SIAM J. Control, 6(1):131–147, 1968.


An LQG problem with a linear solution

x0 and w are independent Gaussian random variables.

inf E(k² u0² + x2²)

subject to the dynamics

x1 = x0 + u0
x2 = x1 − u1

and the information constraints

u0 measurable w.r.t. y0 = x0
u1 measurable w.r.t. (y0, y1) = (x0, x1 + w).

The solution u0 = υ0(y0), u1 = υ1(y0, y1) is such that υ0 and υ1 are affine functions: u0 = υ0(y0) = 0 and u1 = υ1(y0, y1) = y0.

This is the so-called classical information pattern.


Still LQG, but... a nonlinear solution!

x0 and w are independent Gaussian random variables.

inf E(k² u0² + x2²)

subject to the dynamics

x1 = x0 + u0
x2 = x1 − u1

and the information constraints

u0 measurable w.r.t. y0 = x0
u1 measurable w.r.t. y1 = x1 + w.

A solution u0 = υ0(y0), u1 = υ1(y1) is known to exist and to be (highly) nonlinear!

This is the Witsenhausen counterexample.
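
The gap between the two patterns can be felt numerically. The following Python sketch (an illustration, not part of the original lecture) assumes the benchmark values k = 0.2, X0 ~ N(0, 5²), W ~ N(0, 1) often used for this problem, and compares the best affine design against a two-point signaling design of the kind Witsenhausen exhibited.

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma0, n = 0.2, 5.0, 1_000_000   # assumed benchmark parameters
x0 = sigma0 * rng.normal(size=n)     # X0 ~ N(0, sigma0^2)
w = rng.normal(size=n)               # W ~ N(0, 1)

def cost(u0, u1):
    """Monte Carlo estimate of E(k^2 u0^2 + (x1 - u1)^2), with x1 = x0 + u0."""
    x1 = x0 + u0
    return np.mean(k**2 * u0**2 + (x1 - u1)**2)

def affine_cost(lam):
    """u0 = lam * x0, followed by the best linear estimator u1 = c * y1 of x1."""
    u0 = lam * x0
    x1 = x0 + u0
    s = np.var(x1)
    c = s / (s + 1.0)                # optimal linear gain for y1 = x1 + w
    return cost(u0, c * (x1 + w))

best_affine = min(affine_cost(lam) for lam in np.linspace(-1.0, 0.5, 151))

# Two-point "signaling" design: force x1 = a * sign(x0); the optimal second
# stage is then u1 = E(X1 | Y1) = a * tanh(a * y1).
a = sigma0
u0 = a * np.sign(x0) - x0
signaling = cost(u0, a * np.tanh(a * (x0 + u0 + w)))

print(best_affine, signaling)        # the nonlinear design wins here
```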


    Classical information pattern

Sequential, with memory of past knowledge:
1. agent 0 observes y0;
2. agent 1 observes y0 and y1.

Accumulation of information makes possible the "usual" stochastic dynamic programming, with
- value function parameterized by the state (usually finite-dimensional);
- infimum taken over the control variables.


    Nonclassical information patterns

In the sequential case, with no memory of past knowledge: agent 1 observes only y1.

Signaling/dual effect of control: direct cost minimization versus indirect effect on the output available for control.

Interaction between information and control.

Stochastic dynamic programming still applies, thanks to sequentiality, but with
- value function parameterized by the distribution of the state (infinite-dimensional);
- infimum taken over control mappings (infinite-dimensional as well).

H. S. Witsenhausen. A standard form for sequential stochastic control. Mathematical Systems Theory, 7(1):5–11, 1973.


    Outline of the presentation

1. The Witsenhausen counterexample

2. Introductory remarks on information and "dependency"

3. On variables and random variables

4. Information patterns in the linear-quadratic Gaussian example

5. A general dynamic programming equation

6. From dynamic to static problems


INTRODUCTORY REMARKS ON INFORMATION AND "DEPENDENCY"


    Measurability versus functionality

Let f : X × Y → R be a function of two variables x and y. How can we express that f does not depend upon y?

Either by measurability considerations:
- f(x, y) = f(x, y′), ∀(y, y′) ∈ Y² (∂f/∂y = 0 in the smooth case);
- the level curves of f are "vertical";
- f is measurable with respect to the σ-field X ⊗ {∅, Y} generated by the canonical projection π_X : X × Y → X, (x, y) ↦ x.

Or by functional arguments:
- there exists a measurable g : X → R such that f(x, y) = g(x), ∀(x, y) ∈ X × Y;
- f is a measurable function of the projection π_X : X × Y → X.


We make implicit reference to σ-fields X and Y attached to X and Y.

f(x, y) = f(x, y′), ∀(x, y, y′) ∈ X × Y²  ⟺  ∃ g : X → R, f(x, y) = g(x), ∀(x, y) ∈ X × Y

σ(f) ⊂ X ⊗ {∅, Y}  ⟺  ∃ g : X → R measurable, f = g ∘ π_X
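
On finite sets, where σ-fields amount to partitions, this equivalence can be checked mechanically. A small Python sketch (the sets and the function f are illustrative choices):

```python
from itertools import product

X, Y = [0, 1, 2], ["a", "b"]
f = {(x, y): x**2 + 1 for x, y in product(X, Y)}    # f does not depend upon y

# Measurability side: f(x, y) = f(x, y') for all y, y'.
constant_in_y = all(f[(x, y)] == f[(x, yp)] for x in X for y in Y for yp in Y)

# Functional side: exhibit g with f = g o pi_X.
if constant_in_y:
    g = {x: f[(x, Y[0])] for x in X}
    assert all(f[(x, y)] == g[x] for x, y in product(X, Y))
```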


    Defining information

Let Ω be a set. Any ω ∈ Ω is an issue, and one of these issues, ω0, governs a phenomenon; it is however unknown. Having information about ω0 means being able to localize this issue in Ω. Suppose therefore that we know that ω0 belongs to a subset G of Ω. Two polar cases are the following:

- if G = Ω, there is no information;
- while, if G is a singleton, there is full information.

In between, all we know is that we cannot distinguish ω0 from any other ω ∈ G.


In all generality, an information structure may thus be defined as a collection G of subsets of Ω (G ⊂ 2^Ω) called events. Any event contains issues indistinguishable to the observer. The more events in G, the more information.


- G is an algebra: nonempty, stable under complementation and finite union. This is a weak requirement, shared by a large number of subsets of 2^Ω.

- G is a partition field (or π-field): nonempty, stable under complementation and arbitrary union. This is a strong requirement, met by few subsets of 2^Ω. Partition fields are in one-to-one correspondence with partitions.

- G is a σ-algebra (or σ-field, or even a field): nonempty, stable under complementation and countable union. Being one of the basic concepts of measure theory, it is adapted to the introduction of general probabilities on Ω.


Functional dependence of E(X | Y) on the conditioning random variable Y

Let X and Y be two random variables on a probability space (Ω, F, P). Assume that X is integrable.

Since E(X | Y) is σ(Y)-measurable, there exists a measurable function f : R → R such that

E(X | Y) = f(Y), P-a.s.


For instance, if Ω is a finite probability space, with P({ω}) > 0 for all ω ∈ Ω, the function f is given, on the range Y(Ω), by

f(y) = E(X | Y = y)
     = Σ_{x′ ∈ X(Ω)} x′ P(X = x′ | Y = y)
     = Σ_{x′ ∈ X(Ω)} x′ [ ∫_Ω 1_{x′}(X(ω)) 1_{y}(Y(ω)) P(dω) ] / [ ∫_Ω 1_{y}(Y(ω)) P(dω) ]
     = Σ_{x′ ∈ X(Ω)} x′ [ Σ_{ω∈Ω} 1_{x′}(X(ω)) 1_{y}(Y(ω)) P({ω}) ] / [ Σ_{ω∈Ω} 1_{y}(Y(ω)) P({ω}) ]
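
The formula is directly computable. A small Python sketch on a toy finite probability space (two fair dice; the choices of X and Y are illustrative):

```python
from itertools import product

Omega = list(product(range(1, 7), repeat=2))   # two fair dice
P = {omega: 1 / 36 for omega in Omega}
X = lambda omega: omega[0] + omega[1]          # sum of the dice
Y = lambda omega: omega[0]                     # first die

def f(y):
    """f(y) = E(X | Y = y), exactly as in the displayed formula."""
    num = sum(X(omega) * P[omega] for omega in Omega if Y(omega) == y)
    den = sum(P[omega] for omega in Omega if Y(omega) == y)
    return num / den

print([f(y) for y in range(1, 7)])             # [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
```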


    Double dependency upon Y

To stress the functional dependencies, we shall write

f_{P,X,Y}(y) = Σ_{x′ ∈ X(Ω)} x′ [ ∫_Ω 1_{x′}(X(ω)) 1_{y}(Y(ω)) P(dω) ] / [ ∫_Ω 1_{y}(Y(ω)) P(dω) ]

so that

E(X | Y)(ω) = f_{P,X,Y}(Y(ω)).

There is thus a double (dual) dependency of E(X | Y) upon Y:

- one is pointwise, in the sense that E(X | Y)(ω) depends upon Y(ω), so that whenever Y(ω) = Y(ω′), we have E(X | Y)(ω) = E(X | Y)(ω′);
- the other is functional, in the sense that E(X | Y = ·) = f_{P,X,Y}(·) depends upon P, X, Y.


    Infimum over variables versus over mappings

Let X and Y be two finite sets, and J : U × X × Y → R a criterion. The formula

inf_{U : X → U} Σ_{x∈X} Σ_{y∈Y} J(U(x), x, y) = Σ_{x∈X} inf_{u∈U} Σ_{y∈Y} J(u, x, y)

is the elementary version of measurable selection theorems like

inf_{U ⪯ Z} E[J(·, U(·))] = E[ inf_{u∈U} E[J(·, u) | Z] ]

where U ⪯ Z means that the mapping U : Ω → U is measurable with respect to the random variable Z, that is, σ(U) ⊂ σ(Z).
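
The elementary formula can be verified by brute force on small finite sets. A Python sketch (the random criterion J is an illustrative choice):

```python
import itertools
import random

random.seed(0)
U, X, Y = [0, 1, 2], [0, 1], [0, 1, 2]
J = {(u, x, y): random.random() for u in U for x in X for y in Y}

# Left-hand side: infimum over all mappings U(.) : X -> U.
lhs = min(
    sum(J[(pol[i], x, y)] for i, x in enumerate(X) for y in Y)
    for pol in itertools.product(U, repeat=len(X))
)

# Right-hand side: the infimum passes inside the sum over x.
rhs = sum(min(sum(J[(u, x, y)] for y in Y) for u in U) for x in X)

assert abs(lhs - rhs) < 1e-12
```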


ON VARIABLES AND RANDOM VARIABLES


    Ordinary variables

    State and output equations

x1 = x0 + u0
x2 = x1 − u1
y0 = x0
y1 = x1 + w

    are, at this stage, no more than relations between simple variables.


    Primitive sample space

We endow X0 = R and W = R, with respective σ-fields X0 = B(R) and W = B(R), with two probabilities P_{X0} and P_W with finite second moments, and put

Ω := X0 × W = R²

with the product probability P = P_{X0} ⊗ P_W on the σ-field

F = X0 ⊗ W = B(R²).

The primitive probability space is

(Ω, F, P) = (X0 × W, X0 ⊗ W, P_{X0} ⊗ P_W).


    Primitive random variables

Let us introduce the coordinate mappings

X0 : Ω := X0 × W → X0, (x0, w) ↦ x0

and

W : Ω := X0 × W → W, (x0, w) ↦ w.

Under the product probability P = P_{X0} ⊗ P_W, X0 and W are independent random variables.

We follow the tradition of labeling random variables, here X0 : Ω → X0 and W : Ω → W, by capital letters, while ordinary variables are x0, w.


    Policies, control designs, information patterns

At this stage, only X0 and W are random variables.

u0, u1, x1, x2, y0, y1 are "ordinary variables" belonging to the spaces U0 = U1 = X1 = X2 = Y0 = Y1 = R.

They have to be turned into random variables U0, U1, X1, X2, Y0, Y1 (capital letters) to give a meaning to E(k² U0² + X2²).

There are two ways to do this:
- either by policies,
- or by control designs.


Define the history space

H := Ω × U0 × U1 = X0 × W × U0 × U1.

[Diagram: policies U0, U1 map Ω to U0 and U1, while control designs υ0, υ1 map H = Ω × U0 × U1 to U0 and U1.]


    Policies (measurable point of view)

A policy is a random variable (U0, U1) : (Ω, F) → U0 × U1.

Given a policy (U0, U1), we define the state random variables by

X1 = X0 + U0
X2 = X1 − U1.

There remains to express the information pattern, namely how U0 and U1 may depend upon the other random variables, be they X0, W, X1, and possibly upon themselves.


    σ-fields and signals on the sample space Ω

Policies are measurable mappings over the space Ω = X0 × W with σ-field

F = X0 ⊗ W.

Expressing information patterns, or dependencies, boils down to considering

- either sub-σ-fields G of F,
- or signals, that is, measurable mappings Y : (Ω, F) → (Y, Y).

To any signal is associated the subfield σ(Y) = Y⁻¹(Y) generated by Y.

Not all σ-fields of Ω may be obtained as σ(Y), where Y takes values in a Borel space. A counterexample is given by the σ-field consisting of countable subsets and their complements.


    Information patterns and policies

Let G0 and G1 be two subfields of F = X0 ⊗ W.

Consider the optimization problem

inf_{U0 ⪯ G0, U1 ⪯ G1} E(k² U0² + (X1 − U1)²)

where we restrict the class of policies (U0, U1) to those such that U0 is measurable with respect to G0, and U1 with respect to G1.

It may happen that the measurability constraints U0 ⪯ G0, U1 ⪯ G1 lead to an empty set of policies.


For instance, the Witsenhausen counterexample is the case

G0 = σ(Y0) and G1 = σ(Y1)

where the observations Y0 and Y1 are the random variables

Y0 = X0 and Y1 = X1 + W = X0 + U0 + W.

inf_{U0 ⪯ Y0, U1 ⪯ Y1} E(k² U0² + (X1 − U1)²)

means that the first decision U0 may only depend upon Y0, and U1 upon Y1.


    Control design (functional point of view)

Define the history space

H := Ω × U0 × U1 = X0 × W × U0 × U1,

together with the σ-field

H = X0 ⊗ W ⊗ U0 ⊗ U1.

A control design is a couple of measurable mappings υ0 : H → U0 and υ1 : H → U1.


Given a control design (υ0, υ1), let random variables U0, X1, U1 and X2 be such that

U0 = υ0(X0, W, U0, U1)
X1 = X0 + U0
U1 = υ1(X0, W, U0, U1)
X2 = X1 − U1.

Notice that the equations giving U0 and U1 are implicit ones and may not admit solutions!


    Sequential control design (functional point of view)

Define the sequential history spaces

H0 := Ω = X0 × W (initial "state")
H1 := Ω × U0 = X0 × W × U0 (first "state")
H2 := Ω × U0 × U1 = X0 × W × U0 × U1 (final "state")

so that H2 = H, together with the σ-fields

H0 := F = X0 ⊗ W
H1 := F ⊗ U0 = X0 ⊗ W ⊗ U0
H2 := F ⊗ U0 ⊗ U1 = X0 ⊗ W ⊗ U0 ⊗ U1.


A sequential control design is a couple of measurable mappings υ0 : H0 = X0 × W → U0 and υ1 : H1 = X0 × W × U0 → U1.

Given a sequential control design (υ0, υ1), we define four random variables sequentially by

U0 = υ0(X0, W)
X1 = X0 + U0
U1 = υ1(X0, W, U0)
X2 = X1 − U1.
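
Such a sequential design is straightforward to simulate. A minimal Python Monte Carlo evaluator (assuming, for illustration, standard Gaussian X0 and W and the cost E(k² U0² + X2²) with k = 0.2):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 0.2   # illustrative value

def rollout(upsilon0, upsilon1, n=100_000):
    """Monte Carlo estimate of E(k^2 U0^2 + X2^2) for a sequential design."""
    X0 = rng.normal(size=n)
    W = rng.normal(size=n)
    U0 = upsilon0(X0, W)
    X1 = X0 + U0
    U1 = upsilon1(X0, W, U0)
    X2 = X1 - U1
    return np.mean(k**2 * U0**2 + X2**2)

# Example: the classical-pattern solution upsilon0 = 0, upsilon1 = x0 + u0.
print(rollout(lambda x0, w: np.zeros_like(x0),
              lambda x0, w, u0: x0 + u0))   # 0.0
```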


These equations already express part of the information pattern, namely causality and sequentiality, through the way U0 and U1 depend functionally upon the other random variables.

Indeed, the random variables are defined in the order above in such a way that the controls U0 and U1 may depend only upon past random variables.

However, there remains to express the rest of the information pattern, beyond sequentiality.


    σ-fields and signals on the history space H

Control designs are mappings over the history space H = X0 × W × U0 × U1. Expressing information patterns, or dependencies, boils down to considering

- either σ-fields I ⊂ H = X0 ⊗ W ⊗ U0 ⊗ U1,
- or signals on H, that is, measurable mappings y : (H, H) → (Y, Y).

To any signal is associated the subfield σ(y) = y⁻¹(Y) ⊂ H generated by y.


Let I0 ⊂ X0 ⊗ W and I1 ⊂ X0 ⊗ W ⊗ U0.

Consider the following optimization problem

inf_{υ0 ⪯ I0, υ1 ⪯ I1} E(k² U0² + (X1 − U1)²)

where we recall that

U0 = υ0(X0, W) and U1 = υ1(X0, W, U0).

Here, the set of control designs υ0 ⪯ I0, υ1 ⪯ I1 is not empty.


Define the signals

y0 := x0 : H = X0 × W × U0 × U1 → R, (x0, w, u0, u1) ↦ x0

and

y1 : H = X0 × W × U0 × U1 → R, (x0, w, u0, u1) ↦ x0 + u0 + w.

The Witsenhausen counterexample is

inf_{υ0 ⪯ y0, υ1 ⪯ y1} E(k² U0² + (X1 − U1)²)

with

U0 = υ0(X0, W), X1 = X0 + U0, U1 = υ1(X0, W, U0),

where υ0 depends only upon y0, and υ1 depends only upon y1.


ω ↦ ω, h0 ↦ (h0, υ0(h0)), h1 ↦ (h1, υ1(h1))

[Diagram: the history maps H∅ : Ω → H0, Hυ0 : H0 → H1, Hυ1 : H1 → H2 extend histories, while υ0 : H0 → U0 and υ1 : H1 → U1 read off the controls; pulling the σ-fields back, F ⊃ H∅⁻¹(H0), with I0 ⊂ H0 containing υ0⁻¹(U0) and I1 ⊂ H1 containing υ1⁻¹(U1).]


    From fixed to moving σ-fields

Let

S∅ : Ω = X0 × W → H0 = X0 × W, (x0, w) ↦ (x0, w).

We have

G0 = S∅⁻¹(I0)
   = S∅⁻¹(σ(x0))
   = S∅⁻¹(X0 ⊗ {∅, W} ⊗ {∅, U0})
   = X0 ⊗ {∅, W}.


For a control design

υ0 : X0 × W → U0,

we define

Hυ0 : X0 × W → H1 = X0 × W × U0, (x0, w) ↦ (x0, w, υ0(x0, w)).

With U0 = υ0(X0, W), we have

G1^U0 = Hυ0⁻¹(I1).


[Diagram: Ω → (H0, H0) → (H1, H1) → H2 via H∅, Hυ0, Hυ1; the fixed fields I0 ⊂ H0 and I1 ⊂ H1 pull back to the moving fields G0 = S∅⁻¹(I0) and G1 = Hυ0⁻¹(I1).]


INFORMATION PATTERNS IN THE LINEAR-QUADRATIC GAUSSIAN EXAMPLE


    Information patterns: policy case

Recall that

Y0 = X0
X1 = X0 + U0
Y1 = X1 + W = X0 + U0 + W.

Let

G0 = σ(Y0) = σ(X0) ⊂ F = X0 ⊗ W


and let G1 ⊂ F = X0 ⊗ W be one of the following:

G1 =
- σ(X0, W) (noises)
- σ(X1) (Markovian)
- σ(Y0, U0, Y1) (full memory)
- σ(Y0, Y1) (partial memory)
- σ(U0, Y1) (Bismut)
- σ(Y1) (Witsenhausen)

each a subfield of F.


Dual effect

To stress the dependency on the initial policy U0, denote

G1^U0 =
- σ(X0, W) = F
- σ(X1) = σ(X0 + U0)
- σ(Y0, U0, Y1) = σ(X0, U0, X0 + U0 + W)
- σ(Y0, Y1) = σ(X0, X0 + U0 + W)
- σ(U0, Y1) = σ(U0, X0 + U0 + W)
- σ(Y1) = σ(X0 + U0 + W)


and consider the optimization problem

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²],

which depends in two ways upon U0:

- through the cost k² U0² + (X1 − U1)²;
- through the information G1^U0.


    Exploiting sequentiality

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²]
= inf_{U0 ⪯ G0} ( k² E[U0²] + inf_{U1 ⪯ G1^U0} E[(X1 − U1)²] )

because U0 does not depend on U1.

Thus, by sequentiality, we are led to consider the problem

inf_{U1 ⪯ G1^U0} E[(X1 − U1)²].


A candidate for the value function

inf_{U1 ⪯ G1^U0} E[(X1 − U1)²]
= E[ inf_{u1 ∈ U1} E((X1 − u1)² | G1^U0) ]   (by a measurable selection theorem)
= E[ (X1 − E(X1 | G1^U0))² ]   (by definition of E(X1 | G1^U0))
= E[ var[X1 | G1^U0] ].


The dual effect of U0

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²]
= inf_{U0 ⪯ G0} E[ k² U0² + var[X1 | G1^U0] ]

where the term k² U0² depends pointwise on U0, while the term var[X1 | G1^U0] depends functionally on U0.


The control variable enters a conditioning term: this is an example of the so-called dual effect, where the decision has an impact on future decisions by providing more or less information, in addition to contributing to cost minimization.

K. Barty, P. Carpentier, J.-P. Chancelier, G. Cohen, M. De Lara, and T. Guilbaud. Dual effect free stochastic controls. Annals of Operations Research, 142(1):41–62, February 2006.


    Full noise observation pattern

Suppose that the noises are sequentially observed, in the sense that

G0 = σ(Y0) = σ(X0) and G1^U0 = σ(X0, W) = F.

Notice the absence of dual effect, since G1^U0 does not depend upon U0.


Thus, since X1 ⪯ G1^U0 = σ(X0, W), we have

var[X1 | G1^U0] = var[X1 | σ(X0, W)] = 0.

We obtain

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²] = k² inf_{U0 ⪯ X0} E[U0²]

with the obvious solution

U0♯ = 0 and U1♯ = E(X1♯ | G1^{U0♯}) = X0♯ + U0♯ = X0♯ = Y0♯.


    Markovian pattern

The Markovian pattern is the case where the state is sequentially observed:

G0 = σ(Y0) = σ(X0) and G1^U0 = σ(X1) = σ(X0 + U0).

Dual effect holds true, in the sense that G1^U0 indeed depends upon U0. However, here again, since X1 ⪯ G1^U0, we have

var[X1 | G1^U0] = var[X1 | X1] = 0,

giving the solution

U0♯ = 0 and U1♯ = E(X1♯ | X1♯) = X1♯ = X0♯ + U0♯ = X0♯ = Y0♯.


    Policy independence of conditional expectation

G1^U0 depends upon U0, but var[X1 | G1^U0] does not.

H. S. Witsenhausen. On policy independence of conditional expectations. Information and Control, 28(1):65–75, 1975.

    “If an observer of a stochastic control system observes both thedecision taken by an agent in the system and the data that wasavailable for this decision, then the conclusions that the observercan draw do not depend on the functional relation (policy, controllaw) used by this agent to reach his decision. ”


Let

x1 : H = X0 × W × U0 × U1 → R, (x0, w, u0, u1) ↦ x0 + u0.

Recall that

Hυ0(x0, w) = (x0, w, υ0(x0, w)),

so that

X1 = x1 ∘ Hυ0.


We have

E[(X1 − u1)² | G1^U0] = E_P[(X1 − u1)² | G1^U0]
= E_P[(X1 − u1)² | Hυ0⁻¹(I1)]
= E_P[(x1 ∘ Hυ0 − u1)² | Hυ0⁻¹(I1)]
= E_{(Hυ0)⋆(P)}[(x1 − u1)² | I1].

Policy independence of conditional expectation holds true when E_{(Hυ0)⋆(P)}[(x1 − u1)² | I1] does not depend upon υ0.


Strictly classical pattern: perfect recall plus control transmission

G0 = σ(Y0)

and

G1^U0 = σ(X0, U0, X0 + U0 + W) = σ(X0, W).

Notice that G1^U0, which seemed to depend upon the policy U0 (or upon the control design υ0), is in fact policy independent: absence of dual effect.


We obtain

var[X1 | G1^U0] = var[X1 | σ(X0, W)] = 0

with the solution

U0♯ = 0 and U1♯ = E(X1♯ | G1^{U0♯}) = X0♯ + U0♯ = X0♯ = Y0♯.


Classical pattern: perfect recall

G0 = σ(Y0) and G1^U0 = σ(Y0, Y1) = σ(X0, U0 + W).

Since U0 ⪯ σ(X0), there exists a measurable υ0 : X0 → U0 such that U0 = υ0(X0). Thus, U0 + W − υ0(X0) is σ(X0, U0 + W)-measurable, and we can write

G1^U0 = σ(Y0, Y1) = σ(X0, U0 + W) = σ(X0, W).

This absence of dual effect is related to the linear structure of the problem. We obtain

var[X1 | G1^U0] = var[X1 | σ(X0, W)] = 0

with the same solution as above.


Nonclassical pattern: control recall

In this nonclassical pattern,

G0 = σ(Y0)
G1^U0 = σ(U0, Y1) = σ(U0, X0 + U0 + W) = σ(U0, X0 + W),

and thus dual effect holds true.


ε-optimal policies

J.-M. Bismut. An example of interaction between information and control: the transparency of a game. IEEE Transactions on Automatic Control, 18(5):518–522, October 1973.

For ε > 0, take

U0 = ε Y0 = ε X0 and U1 = U0/ε = X0,

which yields the cost

k² U0² + (X0 + U0 − U1)² = ε²(k² + 1) X0².


Nonclassical pattern: no recall

G0 = σ(Y0) and G1^U0 = σ(Y1) = σ(X0 + U0 + W).

Notice that, since U0 ⪯ X0, we may take X1 = X0 + U0 instead of U0 as the policy:

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²]
= inf_{X1 ⪯ X0} E[k²(X1 − X0)² + var[X1 | X1 + W]].


Signaling

inf_{X1 ⪯ X0} E[k²(X1 − X0)² + var[X1 | X1 + W]]

Setting X1 = 0 kills the var[X1 | X1 + W] term, but at a cost k² X0².

Setting X1 = X0 kills the (X1 − X0)² term, but at a cost var[X0 | X0 + W].

In between...


For θ : Y0 = X0 → X1, we can show that

var[θ(X0) | θ(X0) + W] = F_θ(θ(X0) + W)

with

F_θ(y1) = ∫ θ(y0)² e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0 / ∫ e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0
        − ( ∫ θ(y0) e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0 / ∫ e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0 )²

(all integrals over (−∞, +∞)).
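
F_θ can be evaluated by one-dimensional quadrature. A Python sketch (the two-point choice θ(y0) = a · sign(y0) is illustrative; the even number of grid points keeps y0 = 0 off the grid):

```python
import numpy as np

a = 5.0
theta = lambda y0: a * np.sign(y0)   # illustrative two-point design

def F_theta(y1, grid=np.linspace(-12.0, 12.0, 4000)):
    """var[theta(X0) | theta(X0) + W = y1], via the integrals displayed above.
    Riemann sums on a uniform grid; the common step cancels in the ratios."""
    t = theta(grid)
    kernel = np.exp(-(grid**2 + t**2 - 2.0 * y1 * t) / 2.0)
    m1 = (kernel * t).sum() / kernel.sum()
    m2 = (kernel * t**2).sum() / kernel.sum()
    return m2 - m1**2

print(F_theta(0.0), F_theta(5.0))    # large between the two atoms, tiny at one
```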


A GENERAL DYNAMIC PROGRAMMING EQUATION

H. S. Witsenhausen. A standard form for sequential stochastic control. Mathematical Systems Theory, 7(1):5–11, 1973.


    Standard form

[Diagram: the history spaces Ω → H0 → H1 → H2 (via H∅, Hυ0, Hυ1) are mapped by S0, S1, S2 onto the state-observation spaces Z0 → Z1 → Z2 (via F0^{υ0}, F1^{υ1}), from which the controls υ0 : Z0 → U0 and υ1 : Z1 → U1 are computed.]


    State-observation spaces and dynamics

The initial state-observation space Z0 contains all the random issues:

Z0 = H0 = Ω = X0 × W
I0 = σ(x0) ⊂ Z0

The first state-observation space Z1 is designed in such a way that
- it contains the image of the dynamics x0 ↦ x0 + u0;
- it supports the observation field;
- it contains the variables needed to propagate the subsequent dynamics.

F0 : Z0 × U0 → Z1 = X1 × Y1 × U0
(x0, w, u0) ↦ (x0 + u0, x0 + u0 + w, u0)

I1 = σ(y1) ⊂ Z1


In the same way,

F1 : Z1 × U1 → Z2 = X2 × U0
(x1, y1, u0, u1) ↦ (x1 − u1, u0).


∆0 = {ζ0 : Ω → U0, ζ0 ⪯ I0},  ∆1 = {ζ1 : X1 × Y1 × U0 → U1, ζ1 ⪯ I1}

Define, for any ζ = (ζ0, ζ1) ∈ ∆0 × ∆1,

F0^{ζ0}(x0, w) := F0(x0, w, ζ0(x0, w))
Z1^{ζ}(x0, w) = F0^{ζ0}(x0, w)
F1^{ζ1}(x1, y1, u0) := F1(x1, y1, u0, ζ1(x1, y1, u0)).


Z2^{ζ}(x0, w) = F1^{ζ1}(Z1^{ζ}(x0, w)).

Let

Φ : Z2 → R, Φ(x2, u0) = k² u0² + x2²,

and consider

inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(Z2^{ζ}(ω)).


Probability measures as states

In the standard form, the so-called value functions are defined on different spaces of probability measures P(Zt), t = 0, 1, 2.

The mappings F0^{ζ} : Z0 → Z1 and F1^{ζ} : Z1 → Z2 induce mappings

(F0^{ζ})⋆ : P(Z0) → P(Z1),  (F1^{ζ})⋆ : P(Z1) → P(Z2),

so that (Ft^{ζ})⋆(πt) is a probability on Zt+1, the image of πt by the mapping Ft^{ζ}.


Value functions

The value functions Vt : P(Zt) → R are defined by backward induction:

V2(π2) := ⟨π2, Φ⟩ = ∫_{Z2} Φ(z2) π2(dz2)
V1(π1) := inf_{ζ1 ⪯ I1} V2( (F1^{ζ})⋆(π1) )
V0(π0) := inf_{ζ0 ⪯ I0} V1( (F0^{ζ})⋆(π0) ).


Optimal cost is V0(P)

inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(Z2^{ζ}(ω))
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(F1^{ζ1}(Z1^{ζ}(ω)))
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(F1^{ζ1}(F0^{ζ0}(ω)))
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ⟨P, Φ ∘ F1^{ζ1} ∘ F0^{ζ0}⟩
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ⟨(F1^{ζ1})⋆(F0^{ζ0})⋆P, Φ⟩
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} V2( (F1^{ζ1})⋆(F0^{ζ0})⋆P )
= inf_{ζ0 ∈ ∆0} V1( (F0^{ζ0})⋆P )
= V0(P).


    Partial information dynamic programming equation

V2( (F1^{ζ})⋆(π1) ) = ∫_{Z2} Φ(z2) (F1^{ζ})⋆(π1)(dz2)
= ∫_{Z1} Φ(F1^{ζ}(z1)) π1(dz1)
= ∫_{Z1} Φ(F1(z1, ζ(z1))) π1(dz1)
= E_{π1}[ Φ(F1(·, ζ(·))) ]
= E_{π1}[ E_{π1}( Φ(F1(·, ζ(·))) | I1 ) ]


so that

V1(π1) = inf_{ζ1 ⪯ I1} E_{π1}[ E_{π1}( Φ(F1(·, ζ(·))) | I1 ) ]
= E_{π1}[ inf_{u1 ∈ U1} E_{π1}( Φ(F1(·, u1)) | I1 ) ]
= E_{π1}[ inf_{u1 ∈ U1} ∫_{Z1} Φ(F1(z1, u1)) dπ1^{I1} ]

where π1^{I1} is the conditional law (the filter when information is nested).
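
On a finite toy model, the interchange between the infimum over I1-measurable mappings and the cell-by-cell infimum can be checked by enumeration. A Python sketch (all model data are illustrative):

```python
import itertools

Z1, U1 = [0, 1, 2, 3], [0, 1]
pi1 = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}   # probability pi1 on Z1
F1 = lambda z1, u1: (z1 - u1) % 4        # dynamics Z1 x U1 -> Z2 = Z1
Phi = {0: 0.0, 1: 2.0, 2: 1.0, 3: 5.0}   # final cost on Z2
cells = [[0, 1], [2, 3]]                 # partition generating I1

# V1(pi1) as an infimum over I1-measurable mappings (constant on each cell).
lhs = min(
    sum(pi1[z] * Phi[F1(z, us[i])] for i, C in enumerate(cells) for z in C)
    for us in itertools.product(U1, repeat=len(cells))
)

# V1(pi1) as E_pi1[ inf_u E_pi1(Phi(F1(., u)) | I1) ]: cell by cell.
rhs = sum(min(sum(pi1[z] * Phi[F1(z, u)] for z in C) for u in U1) for C in cells)

assert abs(lhs - rhs) < 1e-12
```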


    Dynamic programming with perfect information

Notice that

V2(π2) = ⟨π2, Φ⟩ = ⟨π2, V̂2⟩, where V̂2 = Φ.

Define

V̂1(z1) = inf_{u1 ∈ U1} V̂2(F1(z1, u1)).


When I1 = σ(z1), we have

V1(π1) = inf_{ζ1 ⪯ I1} V2( (F1^{ζ})⋆(π1) )
= inf_{ζ1 ⪯ z1} ∫_{Z1} V̂2( F1(z1, ζ(z1)) ) π1(dz1)
= ∫_{Z1} π1(dz1) inf_{u1 ∈ U1} V̂2( F1(z1, u1) )
= ∫_{Z1} π1(dz1) V̂1(z1)
= ⟨π1, V̂1⟩.

Thus, in the case of perfect information, we recover the "usual" dynamic programming, where the value function is indexed by the state z, and not by probability measures in P(Z).
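
In the same finite spirit, a short Python check that, with singleton information cells, the measure-indexed V1 reduces to the state-indexed V̂1 (same illustrative toy data as above):

```python
import itertools

Z1, U1 = [0, 1, 2, 3], [0, 1]
pi1 = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}
F1 = lambda z1, u1: (z1 - u1) % 4
Phi = {0: 0.0, 1: 2.0, 2: 1.0, 3: 5.0}   # Vhat2 = Phi

# State-indexed value function: Vhat1(z1) = inf_u Vhat2(F1(z1, u1)).
Vhat1 = {z: min(Phi[F1(z, u)] for u in U1) for z in Z1}
V1_state = sum(pi1[z] * Vhat1[z] for z in Z1)

# Measure-indexed value with I1 = sigma(z1): every mapping Z1 -> U1 is admissible.
V1_measure = min(
    sum(pi1[z] * Phi[F1(z, zeta1[i])] for i, z in enumerate(Z1))
    for zeta1 in itertools.product(U1, repeat=len(Z1))
)

assert abs(V1_state - V1_measure) < 1e-12
```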


EQUIVALENT STOCHASTIC CONTROL PROBLEMS: FROM DYNAMIC TO STATIC PROBLEMS

H. S. Witsenhausen. Equivalent stochastic control problems. Mathematics of Control, Signals, and Systems, 1(1):3–7, 1988.


    Criterion and observations

Notice that the cost may be written as

k² u0² + x2² = k² u0² + (x1 − u1)² = k² u0² + (x0 + u0 − u1)² = k² u0² + (y0 + u0 − u1)²,

that is, as a function of y0, u0 and u1. Hence, let us define

J : Y0 × U0 × U1 → R, (y0, u0, u1) ↦ k² u0² + (y0 + u0 − u1)².


    Probabilities

P denotes the standard Gaussian probability on the probability space Ω = X0 × W, that is,

P(dx0 dw) = (1/2π) exp(−(x0² + w²)/2) dx0 dw.

Q0(dy0) and Q1(dy1) are standard Gaussian probabilities on Y0 and Y1, and

Q(dy0 dy1) = (1/2π) exp(−(y0² + y1²)/2) dy0 dy1 = Q0(dy0) ⊗ Q1(dy1).


    Control design

Let γ0 : Y0 → U0 and γ1 : Y0 × Y1 → U1 be two measurable mappings, and denote γ := (γ0, γ1).

With X0 and W the coordinate mappings on X0 × W, we define the policies and the closed-loop observations by

Y0 = X0, U0 = γ0(Y0)
Y1 = X0 + U0 + W = X0 + γ0(Y0) + W, U1 = γ1(Y0, Y1).

Thus defined, Y0 and Y1 are random variables which depend upon the control design γ0. We put

Y^{γ0} := (Y0, Y1) = (Y0, X0 + γ0(Y0) + W) : X0 × W → Y0 × Y1.


    Cost criterion

The random cost is J(Y0, U0, U1), the expectation of which we try to minimize over all designs γ0 and γ1.

Noticing that

J(Y0, U0, U1) = J(Y0, γ0(Y0), γ1(Y0, Y1)),

we define

J^{γ} : Y0 × Y1 → R, (y0, y1) ↦ J(y0, γ0(y0), γ1(y0, y1)).


    Radon-Nikodym density

Let us compute the law P_(Y0,Y1) of (Y0, Y1) under P, that is, the image Y^{γ0}⋆(P) of P by the mapping Y^{γ0}:

Y^{γ0}⋆(P)(dy0 dy1) = (1/2π) exp(−(y0² + (y1 − y0 − γ0(y0))²)/2) dy0 dy1.

Hence Y^{γ0}⋆(P) is equivalent to the Lebesgue measure dy0 dy1, and thus also to the Gaussian probability Q. We have

Y^{γ0}⋆(P) = T^{γ0} Q,

with

T^{γ0} : Y0 × Y1 → R₊, (y0, y1) ↦ exp( (y1² − (y1 − y0 − γ0(y0))²)/2 ).


Reduction to an equivalent static problem

E_P( J(Y0, U0, U1) )
= E_P( J^{γ}(Y0, Y1) )   (by definition of J^{γ})
= E_{Y^{γ0}⋆(P)}( J^{γ} )   (by definition of the image probability)
= E_Q( T^{γ0} J^{γ} )   (by Y^{γ0}⋆(P) = T^{γ0} Q)
= ∫_{Y0×Y1} J(y0, γ0(y0), γ1(y0, y1)) T^{γ0}(y0, y1) Q0(dy0) Q1(dy1),

where the factor J · T^{γ0} is the new cost.


A static problem

Introducing the new cost

J̃(u0, u1, y0, y1) := J(y0, u0, u1) T^{u0}(y0, y1)
= [k² u0² + (y0 + u0 − u1)²] exp( (y1² − (y1 − y0 − u0)²)/2 ),

the original optimization problem becomes a static problem:

inf_{γ0, γ1} ∫_{Y0×Y1} J̃(γ0(y0), γ1(y0, y1), y0, y1) Q0(dy0) Q1(dy1).

Indeed, the observations are now pure noises y0 and y1, not dynamical quantities affected by the controls.
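
The change of measure behind this reduction can be checked numerically. A Python sketch by two-dimensional quadrature (the designs γ0, γ1 and the value of k are illustrative choices); the two integrands coincide pointwise, which is exactly the identity Y^{γ0}⋆(P) = T^{γ0} Q:

```python
import numpy as np

k = 0.2
gamma0 = lambda y0: np.tanh(y0)            # illustrative design
gamma1 = lambda y0, y1: 0.5 * (y0 + y1)    # illustrative design
J = lambda y0, u0, u1: k**2 * u0**2 + (y0 + u0 - u1)**2

g = np.linspace(-8.0, 8.0, 801)
y0, y1 = np.meshgrid(g, g, indexing="ij")
dA = (g[1] - g[0]) ** 2

# Dynamic form: expectation under the law of (Y0, Y1) = Y^gamma0(X0, W).
pP = np.exp(-(y0**2 + (y1 - y0 - gamma0(y0))**2) / 2) / (2 * np.pi)
dynamic = (J(y0, gamma0(y0), gamma1(y0, y1)) * pP).sum() * dA

# Static form: expectation under Q, with the density T^gamma0 as a cost factor.
q = np.exp(-(y0**2 + y1**2) / 2) / (2 * np.pi)
T = np.exp((y1**2 - (y1 - y0 - gamma0(y0))**2) / 2)
static = (T * J(y0, gamma0(y0), gamma1(y0, y1)) * q).sum() * dA

print(dynamic, static)   # equal up to quadrature error
```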
