

Information patterns and optimal stochastic control problems

    Michel De Lara

CERMICS, École nationale des ponts et chaussées, ParisTech

May 31, 2006

Cours EDF, May-June 2006


    A linear control-oriented stochastic model

Controls: u0 ∈ U0 = R and u1 ∈ U1 = R

Random issue: ω = (x0, w) ∈ Ω = R × R

State equations:

x1 = x0 + u0
x2 = x1 − u1

Output equations:

y0 = x0
y1 = x1 + w

H. S. Witsenhausen. A counterexample in stochastic optimal control. SIAM J. Control, 6(1):131–147, 1968.


An LQG problem with a linear solution

x0 and w are independent Gaussian random variables.

inf E(k² u0² + x2²)

subject to the dynamics

x1 = x0 + u0
x2 = x1 − u1

and the information constraints

u0 measurable w.r.t. y0 = x0
u1 measurable w.r.t. (y0, y1) = (x0, x1 + w).

The solution u0 = υ0(y0), u1 = υ1(y0, y1) is such that υ0 and υ1 are affine functions: u0 = υ0(y0) = 0 and u1 = υ1(y0, y1) = y0.

This is the so-called classical information pattern.


Still LQG, but... a nonlinear solution!

x0 and w are independent Gaussian random variables.

inf E(k² u0² + x2²)

subject to the dynamics

x1 = x0 + u0
x2 = x1 − u1

and the information constraints

u0 measurable w.r.t. y0 = x0
u1 measurable w.r.t. y1 = x1 + w.

A solution u0 = υ0(y0), u1 = υ1(y1) is known to exist and to be (highly) nonlinear!

This is the Witsenhausen counterexample.
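
The gap between the two patterns can be felt numerically. The following Python sketch (an illustration, not part of the original lecture) assumes the benchmark values k = 0.2, X0 ~ N(0, 5²), W ~ N(0, 1) often used for this problem, and compares the best affine design against a two-point signaling design of the kind Witsenhausen exhibited.

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma0, n = 0.2, 5.0, 1_000_000   # assumed benchmark parameters
x0 = sigma0 * rng.normal(size=n)     # X0 ~ N(0, sigma0^2)
w = rng.normal(size=n)               # W ~ N(0, 1)

def cost(u0, u1):
    """Monte Carlo estimate of E(k^2 u0^2 + (x1 - u1)^2), with x1 = x0 + u0."""
    x1 = x0 + u0
    return np.mean(k**2 * u0**2 + (x1 - u1)**2)

def affine_cost(lam):
    """u0 = lam * x0, followed by the best linear estimator u1 = c * y1 of x1."""
    u0 = lam * x0
    x1 = x0 + u0
    s = np.var(x1)
    c = s / (s + 1.0)                # optimal linear gain for y1 = x1 + w
    return cost(u0, c * (x1 + w))

best_affine = min(affine_cost(lam) for lam in np.linspace(-1.0, 0.5, 151))

# Two-point "signaling" design: force x1 = a * sign(x0); the optimal second
# stage is then u1 = E(X1 | Y1) = a * tanh(a * y1).
a = sigma0
u0 = a * np.sign(x0) - x0
signaling = cost(u0, a * np.tanh(a * (x0 + u0 + w)))

print(best_affine, signaling)        # the nonlinear design wins here
```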


    Classical information pattern

Sequential, with memory of past knowledge:
1. agent 0 observes y0;
2. agent 1 observes y0 and y1.

Accumulation of information makes possible the "usual" stochastic dynamic programming, with
- value function parameterized by the state (usually finite-dimensional);
- infimum taken over the control variables.


    Nonclassical information patterns

In the sequential case, with no memory of past knowledge: agent 1 observes only y1.

Signaling/dual effect of control: direct cost minimization versus indirect effect on the output available for control.

Interaction between information and control.

Stochastic dynamic programming still applies, thanks to sequentiality, but with
- value function parameterized by the distribution of the state (infinite-dimensional);
- infimum taken over control mappings (infinite-dimensional as well).

H. S. Witsenhausen. A standard form for sequential stochastic control. Mathematical Systems Theory, 7(1):5–11, 1973.


    Outline of the presentation

1. The Witsenhausen counterexample

2. Introductory remarks on information and "dependency"

3. On variables and random variables

4. Information patterns in the linear-quadratic Gaussian example

5. A general dynamic programming equation

6. From dynamic to static problems


INTRODUCTORY REMARKS ON INFORMATION AND "DEPENDENCY"


    Measurability versus functionality

Let f : X × Y → R be a function of two variables x and y. How can we express that f does not depend upon y?

Either by measurability considerations:
- f(x, y) = f(x, y′), ∀(y, y′) ∈ Y² (∂f/∂y = 0 in the smooth case);
- the level curves of f are "vertical";
- f is measurable with respect to the σ-field X ⊗ {∅, Y} generated by the canonical projection π_X : X × Y → X, (x, y) ↦ x.

Or by functional arguments:
- there exists a measurable g : X → R such that f(x, y) = g(x), ∀(x, y) ∈ X × Y;
- f is a measurable function of the projection π_X : X × Y → X.


We make implicit reference to σ-fields X and Y attached to X and Y.

f(x, y) = f(x, y′), ∀(x, y, y′) ∈ X × Y²  ⟺  ∃ g : X → R, f(x, y) = g(x), ∀(x, y) ∈ X × Y

σ(f) ⊂ X ⊗ {∅, Y}  ⟺  ∃ g : X → R measurable, f = g ∘ π_X
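
On finite sets, where σ-fields amount to partitions, this equivalence can be checked mechanically. A small Python sketch (the sets and the function f are illustrative choices):

```python
from itertools import product

X, Y = [0, 1, 2], ["a", "b"]
f = {(x, y): x**2 + 1 for x, y in product(X, Y)}    # f does not depend upon y

# Measurability side: f(x, y) = f(x, y') for all y, y'.
constant_in_y = all(f[(x, y)] == f[(x, yp)] for x in X for y in Y for yp in Y)

# Functional side: exhibit g with f = g o pi_X.
if constant_in_y:
    g = {x: f[(x, Y[0])] for x in X}
    assert all(f[(x, y)] == g[x] for x, y in product(X, Y))
```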


    Defining information

Let Ω be a set. Any ω ∈ Ω is an issue, and one of these issues, ω0, governs a phenomenon; it is however unknown. Having information about ω0 means being able to localize this issue in Ω. Suppose therefore that we know that ω0 belongs to a subset G of Ω. Two polar cases are the following:

- if G = Ω, there is no information;
- while, if G is a singleton, there is full information.

In between, all we know is that we cannot distinguish ω0 from any other ω ∈ G.


In all generality, an information structure may thus be defined as a collection G of subsets of Ω (G ⊂ 2^Ω) called events. Any event contains issues indistinguishable to the observer. The more events in G, the more information.


- G is an algebra: nonempty, stable under complementation and finite union. This is a weak requirement, shared by a large number of subsets of 2^Ω.

- G is a partition field (or π-field): nonempty, stable under complementation and arbitrary union. This is a strong requirement, met by few subsets of 2^Ω. Partition fields are in one-to-one correspondence with partitions.

- G is a σ-algebra (or σ-field, or even a field): nonempty, stable under complementation and countable union. Being one of the basic concepts of measure theory, it is adapted to the introduction of general probabilities on Ω.


Functional dependence of E(X | Y) on the conditioning random variable Y

Let X and Y be two random variables on a probability space (Ω, F, P). Assume that X is integrable.

Since E(X | Y) is σ(Y)-measurable, there exists a measurable function f : R → R such that

E(X | Y) = f(Y), P-a.s.


For instance, if Ω is a finite probability space, with P({ω}) > 0 for all ω ∈ Ω, the function f is given, on the range Y(Ω), by

f(y) = E(X | Y = y)
     = Σ_{x′ ∈ X(Ω)} x′ P(X = x′ | Y = y)
     = Σ_{x′ ∈ X(Ω)} x′ [ ∫_Ω 1_{x′}(X(ω)) 1_{y}(Y(ω)) P(dω) ] / [ ∫_Ω 1_{y}(Y(ω)) P(dω) ]
     = Σ_{x′ ∈ X(Ω)} x′ [ Σ_{ω∈Ω} 1_{x′}(X(ω)) 1_{y}(Y(ω)) P({ω}) ] / [ Σ_{ω∈Ω} 1_{y}(Y(ω)) P({ω}) ]
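
The formula is directly computable. A small Python sketch on a toy finite probability space (two fair dice; the choices of X and Y are illustrative):

```python
from itertools import product

Omega = list(product(range(1, 7), repeat=2))   # two fair dice
P = {omega: 1 / 36 for omega in Omega}
X = lambda omega: omega[0] + omega[1]          # sum of the dice
Y = lambda omega: omega[0]                     # first die

def f(y):
    """f(y) = E(X | Y = y), exactly as in the displayed formula."""
    num = sum(X(omega) * P[omega] for omega in Omega if Y(omega) == y)
    den = sum(P[omega] for omega in Omega if Y(omega) == y)
    return num / den

print([f(y) for y in range(1, 7)])             # [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
```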


    Double dependency upon Y

To stress the functional dependencies, we shall write

f_{P,X,Y}(y) = Σ_{x′ ∈ X(Ω)} x′ [ ∫_Ω 1_{x′}(X(ω)) 1_{y}(Y(ω)) P(dω) ] / [ ∫_Ω 1_{y}(Y(ω)) P(dω) ]

so that

E(X | Y)(ω) = f_{P,X,Y}(Y(ω)).

There is thus a double (dual) dependency of E(X | Y) upon Y:

- one is pointwise, in the sense that E(X | Y)(ω) depends upon Y(ω), so that whenever Y(ω) = Y(ω′), we have E(X | Y)(ω) = E(X | Y)(ω′);
- the other is functional, in the sense that E(X | Y = ·) = f_{P,X,Y}(·) depends upon P, X, Y.


    Infimum over variables versus over mappings

Let X and Y be two finite sets, and J : U × X × Y → R a criterion. The formula

inf_{U : X → U} Σ_{x∈X} Σ_{y∈Y} J(U(x), x, y) = Σ_{x∈X} inf_{u∈U} Σ_{y∈Y} J(u, x, y)

is the elementary version of measurable selection theorems like

inf_{U ⪯ Z} E[J(·, U(·))] = E[ inf_{u∈U} E[J(·, u) | Z] ]

where U ⪯ Z means that the mapping U : Ω → U is measurable with respect to the random variable Z, that is, σ(U) ⊂ σ(Z).
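
The elementary formula can be verified by brute force on small finite sets. A Python sketch (the random criterion J is an illustrative choice):

```python
import itertools
import random

random.seed(0)
U, X, Y = [0, 1, 2], [0, 1], [0, 1, 2]
J = {(u, x, y): random.random() for u in U for x in X for y in Y}

# Left-hand side: infimum over all mappings U(.) : X -> U.
lhs = min(
    sum(J[(pol[i], x, y)] for i, x in enumerate(X) for y in Y)
    for pol in itertools.product(U, repeat=len(X))
)

# Right-hand side: the infimum passes inside the sum over x.
rhs = sum(min(sum(J[(u, x, y)] for y in Y) for u in U) for x in X)

assert abs(lhs - rhs) < 1e-12
```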


ON VARIABLES AND RANDOM VARIABLES


    Ordinary variables

    State and output equations

x1 = x0 + u0
x2 = x1 − u1
y0 = x0
y1 = x1 + w

    are, at this stage, no more than relations between simple variables.


    Primitive sample space

We endow X0 = R and W = R, with respective σ-fields X0 = B(R) and W = B(R), with two probabilities P_{X0} and P_W with finite second moments, and put

Ω := X0 × W = R²

with the product probability P = P_{X0} ⊗ P_W on the σ-field

F = X0 ⊗ W = B(R²).

The primitive probability space is

(Ω, F, P) = (X0 × W, X0 ⊗ W, P_{X0} ⊗ P_W).


    Primitive random variables

Let us introduce the coordinate mappings

X0 : Ω := X0 × W → X0, (x0, w) ↦ x0

and

W : Ω := X0 × W → W, (x0, w) ↦ w.

Under the product probability P = P_{X0} ⊗ P_W, X0 and W are independent random variables.

We follow the tradition of labeling random variables, here X0 : Ω → X0 and W : Ω → W, by capital letters, while ordinary variables are x0, w.


    Policies, control designs, information patterns

At this stage, only X0 and W are random variables.

u0, u1, x1, x2, y0, y1 are "ordinary variables" belonging to the spaces U0 = U1 = X1 = X2 = Y0 = Y1 = R.

They have to be turned into random variables U0, U1, X1, X2, Y0, Y1 (capital letters) to give a meaning to E(k² U0² + X2²).

There are two ways to do this:
- either by policies,
- or by control designs.


Define the history space

H := Ω × U0 × U1 = X0 × W × U0 × U1.

[Diagram: policies U0, U1 map Ω to U0 and U1, while control designs υ0, υ1 map H = Ω × U0 × U1 to U0 and U1.]


    Policies (measurable point of view)

A policy is a random variable (U0, U1) : (Ω, F) → U0 × U1.

Given a policy (U0, U1), we define the state random variables by

X1 = X0 + U0
X2 = X1 − U1.

There remains to express the information pattern, namely how U0 and U1 may depend upon the other random variables, be they X0, W, X1, and possibly upon themselves.


    σ-fields and signals on the sample space Ω

Policies are measurable mappings over the space Ω = X0 × W with σ-field

F = X0 ⊗ W.

Expressing information patterns, or dependencies, boils down to considering

- either sub-σ-fields G of F,
- or signals, that is, measurable mappings Y : (Ω, F) → (Y, Y).

To any signal is associated the subfield σ(Y) = Y⁻¹(Y) generated by Y.

Not all σ-fields of Ω may be obtained as σ(Y), where Y takes values in a Borel space. A counterexample is given by the σ-field consisting of countable subsets and their complements.


    Information patterns and policies

Let G0 and G1 be two subfields of F = X0 ⊗ W.

Consider the optimization problem

inf_{U0 ⪯ G0, U1 ⪯ G1} E(k² U0² + (X1 − U1)²)

where we restrict the class of policies (U0, U1) to those such that U0 is measurable with respect to G0, and U1 with respect to G1.

It may happen that the measurability constraints U0 ⪯ G0, U1 ⪯ G1 lead to an empty set of policies.


For instance, the Witsenhausen counterexample is the case

G0 = σ(Y0) and G1 = σ(Y1)

where the observations Y0 and Y1 are the random variables

Y0 = X0 and Y1 = X1 + W = X0 + U0 + W.

inf_{U0 ⪯ Y0, U1 ⪯ Y1} E(k² U0² + (X1 − U1)²)

means that the first decision U0 may only depend upon Y0, and U1 upon Y1.


    Control design (functional point of view)

Define the history space

H := Ω × U0 × U1 = X0 × W × U0 × U1,

together with the σ-field

H = X0 ⊗ W ⊗ U0 ⊗ U1.

A control design is a couple of measurable mappings υ0 : H → U0 and υ1 : H → U1.


Given a control design (υ0, υ1), let random variables U0, X1, U1 and X2 be such that

U0 = υ0(X0, W, U0, U1)
X1 = X0 + U0
U1 = υ1(X0, W, U0, U1)
X2 = X1 − U1.

Notice that the equations giving U0 and U1 are implicit ones and may not admit solutions!


    Sequential control design (functional point of view)

Define the sequential history spaces

H0 := Ω = X0 × W (initial "state")
H1 := Ω × U0 = X0 × W × U0 (first "state")
H2 := Ω × U0 × U1 = X0 × W × U0 × U1 (final "state")

so that H2 = H, together with the σ-fields

H0 := F = X0 ⊗ W
H1 := F ⊗ U0 = X0 ⊗ W ⊗ U0
H2 := F ⊗ U0 ⊗ U1 = X0 ⊗ W ⊗ U0 ⊗ U1.


A sequential control design is a couple of measurable mappings υ0 : H0 = X0 × W → U0 and υ1 : H1 = X0 × W × U0 → U1.

Given a sequential control design (υ0, υ1), we define four random variables sequentially by

U0 = υ0(X0, W)
X1 = X0 + U0
U1 = υ1(X0, W, U0)
X2 = X1 − U1.
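
Such a sequential design is straightforward to simulate. A minimal Python Monte Carlo evaluator (assuming, for illustration, standard Gaussian X0 and W and the cost E(k² U0² + X2²) with k = 0.2):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 0.2   # illustrative value

def rollout(upsilon0, upsilon1, n=100_000):
    """Monte Carlo estimate of E(k^2 U0^2 + X2^2) for a sequential design."""
    X0 = rng.normal(size=n)
    W = rng.normal(size=n)
    U0 = upsilon0(X0, W)
    X1 = X0 + U0
    U1 = upsilon1(X0, W, U0)
    X2 = X1 - U1
    return np.mean(k**2 * U0**2 + X2**2)

# Example: the classical-pattern solution upsilon0 = 0, upsilon1 = x0 + u0.
print(rollout(lambda x0, w: np.zeros_like(x0),
              lambda x0, w, u0: x0 + u0))   # 0.0
```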


These equations already express part of the information pattern, namely causality and sequentiality, through the way U0 and U1 depend functionally upon the other random variables.

Indeed, the random variables are defined in the order above in such a way that the controls U0 and U1 may depend only upon past random variables.

However, there remains to express the rest of the information pattern, beyond sequentiality.


    σ-fields and signals on the history space H

Control designs are mappings over the history space H = X0 × W × U0 × U1. Expressing information patterns, or dependencies, boils down to considering

- either σ-fields I ⊂ H = X0 ⊗ W ⊗ U0 ⊗ U1,
- or signals on H, that is, measurable mappings y : (H, H) → (Y, Y).

To any signal is associated the subfield σ(y) = y⁻¹(Y) ⊂ H generated by y.


Let I0 ⊂ X0 ⊗ W and I1 ⊂ X0 ⊗ W ⊗ U0.

Consider the following optimization problem

inf_{υ0 ⪯ I0, υ1 ⪯ I1} E(k² U0² + (X1 − U1)²)

where we recall that

U0 = υ0(X0, W) and U1 = υ1(X0, W, U0).

Here, the set of control designs υ0 ⪯ I0, υ1 ⪯ I1 is not empty.


Define the signals

y0 := x0 : H = X0 × W × U0 × U1 → R, (x0, w, u0, u1) ↦ x0

and

y1 : H = X0 × W × U0 × U1 → R, (x0, w, u0, u1) ↦ x0 + u0 + w.

The Witsenhausen counterexample is

inf_{υ0 ⪯ y0, υ1 ⪯ y1} E(k² U0² + (X1 − U1)²)

with

U0 = υ0(X0, W), X1 = X0 + U0, U1 = υ1(X0, W, U0),

where υ0 depends only upon y0, and υ1 depends only upon y1.


ω ↦ ω, h0 ↦ (h0, υ0(h0)), h1 ↦ (h1, υ1(h1))

[Diagram: the history maps H∅ : Ω → H0, Hυ0 : H0 → H1, Hυ1 : H1 → H2 extend histories, while υ0 : H0 → U0 and υ1 : H1 → U1 read off the controls; pulling the σ-fields back, F ⊃ H∅⁻¹(H0), with I0 ⊂ H0 containing υ0⁻¹(U0) and I1 ⊂ H1 containing υ1⁻¹(U1).]


    From fixed to moving σ-fields

Let

S∅ : Ω = X0 × W → H0 = X0 × W, (x0, w) ↦ (x0, w).

We have

G0 = S∅⁻¹(I0)
   = S∅⁻¹(σ(x0))
   = S∅⁻¹(X0 ⊗ {∅, W} ⊗ {∅, U0})
   = X0 ⊗ {∅, W}.


For a control design

υ0 : X0 × W → U0,

we define

Hυ0 : X0 × W → H1 = X0 × W × U0, (x0, w) ↦ (x0, w, υ0(x0, w)).

With U0 = υ0(X0, W), we have

G1^U0 = Hυ0⁻¹(I1).


[Diagram: Ω → (H0, H0) → (H1, H1) → H2 via H∅, Hυ0, Hυ1; the fixed fields I0 ⊂ H0 and I1 ⊂ H1 pull back to the moving fields G0 = S∅⁻¹(I0) and G1 = Hυ0⁻¹(I1).]


INFORMATION PATTERNS IN THE LINEAR-QUADRATIC GAUSSIAN EXAMPLE


    Information patterns: policy case

Recall that

Y0 = X0
X1 = X0 + U0
Y1 = X1 + W = X0 + U0 + W.

Let

G0 = σ(Y0) = σ(X0) ⊂ F = X0 ⊗ W


and let G1 ⊂ F = X0 ⊗ W be one of the following:

G1 =
- σ(X0, W) (noises)
- σ(X1) (Markovian)
- σ(Y0, U0, Y1) (full memory)
- σ(Y0, Y1) (partial memory)
- σ(U0, Y1) (Bismut)
- σ(Y1) (Witsenhausen)

each a subfield of F.


Dual effect

To stress the dependency on the initial policy U0, denote

G1^U0 =
- σ(X0, W) = F
- σ(X1) = σ(X0 + U0)
- σ(Y0, U0, Y1) = σ(X0, U0, X0 + U0 + W)
- σ(Y0, Y1) = σ(X0, X0 + U0 + W)
- σ(U0, Y1) = σ(U0, X0 + U0 + W)
- σ(Y1) = σ(X0 + U0 + W)


and consider the optimization problem

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²],

which depends in two ways upon U0:

- through the cost k² U0² + (X1 − U1)²;
- through the information G1^U0.


    Exploiting sequentiality

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²]
= inf_{U0 ⪯ G0} ( k² E[U0²] + inf_{U1 ⪯ G1^U0} E[(X1 − U1)²] )

because U0 does not depend on U1.

Thus, by sequentiality, we are led to consider the problem

inf_{U1 ⪯ G1^U0} E[(X1 − U1)²].


A candidate for the value function

inf_{U1 ⪯ G1^U0} E[(X1 − U1)²]
= E[ inf_{u1 ∈ U1} E((X1 − u1)² | G1^U0) ]   (by a measurable selection theorem)
= E[ (X1 − E(X1 | G1^U0))² ]   (by definition of E(X1 | G1^U0))
= E[ var[X1 | G1^U0] ].


The dual effect of U0

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²]
= inf_{U0 ⪯ G0} E[ k² U0² + var[X1 | G1^U0] ]

where the term k² U0² depends pointwise on U0, while the term var[X1 | G1^U0] depends functionally on U0.


The control variable enters a conditioning term: this is an example of the so-called dual effect, where the decision has an impact on future decisions by providing more or less information, in addition to contributing to cost minimization.

K. Barty, P. Carpentier, J.-P. Chancelier, G. Cohen, M. De Lara, and T. Guilbaud. Dual effect free stochastic controls. Annals of Operations Research, 142(1):41–62, February 2006.


    Full noise observation pattern

Suppose that the noises are sequentially observed, in the sense that

G0 = σ(Y0) = σ(X0) and G1^U0 = σ(X0, W) = F.

Notice the absence of dual effect, since G1^U0 does not depend upon U0.


Thus, since X1 ⪯ G1^U0 = σ(X0, W), we have

var[X1 | G1^U0] = var[X1 | σ(X0, W)] = 0.

We obtain

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²] = k² inf_{U0 ⪯ X0} E[U0²]

with the obvious solution

U0♯ = 0 and U1♯ = E(X1♯ | G1^{U0♯}) = X0♯ + U0♯ = X0♯ = Y0♯.


    Markovian pattern

The Markovian pattern is the case where the state is sequentially observed:

G0 = σ(Y0) = σ(X0) and G1^U0 = σ(X1) = σ(X0 + U0).

Dual effect holds true, in the sense that G1^U0 indeed depends upon U0. However, here again, since X1 ⪯ G1^U0, we have

var[X1 | G1^U0] = var[X1 | X1] = 0,

giving the solution

U0♯ = 0 and U1♯ = E(X1♯ | X1♯) = X1♯ = X0♯ + U0♯ = X0♯ = Y0♯.


    Policy independence of conditional expectation

G1^U0 depends upon U0, but var[X1 | G1^U0] does not.

H. S. Witsenhausen. On policy independence of conditional expectations. Information and Control, 28(1):65–75, 1975.

    “If an observer of a stochastic control system observes both thedecision taken by an agent in the system and the data that wasavailable for this decision, then the conclusions that the observercan draw do not depend on the functional relation (policy, controllaw) used by this agent to reach his decision. ”


Let

x1 : H = X0 × W × U0 × U1 → R, (x0, w, u0, u1) ↦ x0 + u0.

Recall that

Hυ0(x0, w) = (x0, w, υ0(x0, w)),

so that

X1 = x1 ∘ Hυ0.


We have

E[(X1 − u1)² | G1^U0] = E_P[(X1 − u1)² | G1^U0]
= E_P[(X1 − u1)² | Hυ0⁻¹(I1)]
= E_P[(x1 ∘ Hυ0 − u1)² | Hυ0⁻¹(I1)]
= E_{(Hυ0)⋆(P)}[(x1 − u1)² | I1].

Policy independence of conditional expectation holds true when E_{(Hυ0)⋆(P)}[(x1 − u1)² | I1] does not depend upon υ0.


Strictly classical pattern: perfect recall plus control transmission

G0 = σ(Y0)

and

G1^U0 = σ(X0, U0, X0 + U0 + W) = σ(X0, W).

Notice that G1^U0, which seemed to depend upon the policy U0 (or upon the control design υ0), is in fact policy independent: absence of dual effect.


We obtain

var[X1 | G1^U0] = var[X1 | σ(X0, W)] = 0

with the solution

U0♯ = 0 and U1♯ = E(X1♯ | G1^{U0♯}) = X0♯ + U0♯ = X0♯ = Y0♯.


Classical pattern: perfect recall

G0 = σ(Y0) and G1^U0 = σ(Y0, Y1) = σ(X0, U0 + W).

Since U0 ⪯ σ(X0), there exists a measurable υ0 : X0 → U0 such that U0 = υ0(X0). Thus, U0 + W − υ0(X0) is σ(X0, U0 + W)-measurable, and we can write

G1^U0 = σ(Y0, Y1) = σ(X0, U0 + W) = σ(X0, W).

This absence of dual effect is related to the linear structure of the problem. We obtain

var[X1 | G1^U0] = var[X1 | σ(X0, W)] = 0

with the same solution as above.


Nonclassical pattern: control recall

In this nonclassical pattern,

G0 = σ(Y0)
G1^U0 = σ(U0, Y1) = σ(U0, X0 + U0 + W) = σ(U0, X0 + W),

and thus dual effect holds true.


ε-optimal policies

J.-M. Bismut. An example of interaction between information and control: the transparency of a game. IEEE Transactions on Automatic Control, 18(5):518–522, October 1973.

For ε > 0, take

U0 = ε Y0 = ε X0 and U1 = U0/ε = X0,

which yields the cost

k² U0² + (X0 + U0 − U1)² = ε²(k² + 1) X0².


Nonclassical pattern: no recall

G0 = σ(Y0) and G1^U0 = σ(Y1) = σ(X0 + U0 + W).

Notice that, since U0 ⪯ X0, we may take X1 = X0 + U0 instead of U0 as the policy:

inf_{U0 ⪯ G0, U1 ⪯ G1^U0} E[k² U0² + (X1 − U1)²]
= inf_{X1 ⪯ X0} E[k²(X1 − X0)² + var[X1 | X1 + W]].


Signaling

inf_{X1 ⪯ X0} E[k²(X1 − X0)² + var[X1 | X1 + W]]

Setting X1 = 0 kills the var[X1 | X1 + W] term, but at a cost k² X0².

Setting X1 = X0 kills the (X1 − X0)² term, but at a cost var[X0 | X0 + W].

In between...


For θ : Y0 = X0 → X1, we can show that

var[θ(X0) | θ(X0) + W] = F_θ(θ(X0) + W)

with

F_θ(y1) = ∫ θ(y0)² e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0 / ∫ e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0
        − ( ∫ θ(y0) e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0 / ∫ e^{−(y0² + θ(y0)² − 2 y1 θ(y0))/2} dy0 )²

(all integrals over (−∞, +∞)).
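
F_θ can be evaluated by one-dimensional quadrature. A Python sketch (the two-point choice θ(y0) = a · sign(y0) is illustrative; the even number of grid points keeps y0 = 0 off the grid):

```python
import numpy as np

a = 5.0
theta = lambda y0: a * np.sign(y0)   # illustrative two-point design

def F_theta(y1, grid=np.linspace(-12.0, 12.0, 4000)):
    """var[theta(X0) | theta(X0) + W = y1], via the integrals displayed above.
    Riemann sums on a uniform grid; the common step cancels in the ratios."""
    t = theta(grid)
    kernel = np.exp(-(grid**2 + t**2 - 2.0 * y1 * t) / 2.0)
    m1 = (kernel * t).sum() / kernel.sum()
    m2 = (kernel * t**2).sum() / kernel.sum()
    return m2 - m1**2

print(F_theta(0.0), F_theta(5.0))    # large between the two atoms, tiny at one
```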


A GENERAL DYNAMIC PROGRAMMING EQUATION

H. S. Witsenhausen. A standard form for sequential stochastic control. Mathematical Systems Theory, 7(1):5–11, 1973.


    Standard form

[Diagram: the history spaces Ω → H0 → H1 → H2 (via H∅, Hυ0, Hυ1) are mapped by S0, S1, S2 onto the state-observation spaces Z0 → Z1 → Z2 (via F0^{υ0}, F1^{υ1}), from which the controls υ0 : Z0 → U0 and υ1 : Z1 → U1 are computed.]


    State-observation spaces and dynamics

The initial state-observation space Z0 contains all the random issues:

Z0 = H0 = Ω = X0 × W
I0 = σ(x0) ⊂ Z0

The first state-observation space Z1 is designed in such a way that
- it contains the image of the dynamics x0 ↦ x0 + u0;
- it supports the observation field;
- it contains the variables needed to propagate the subsequent dynamics.

F0 : Z0 × U0 → Z1 = X1 × Y1 × U0
(x0, w, u0) ↦ (x0 + u0, x0 + u0 + w, u0)

I1 = σ(y1) ⊂ Z1


In the same way,

F1 : Z1 × U1 → Z2 = X2 × U0
(x1, y1, u0, u1) ↦ (x1 − u1, u0).


∆0 = {ζ0 : Ω → U0, ζ0 ⪯ I0},  ∆1 = {ζ1 : X1 × Y1 × U0 → U1, ζ1 ⪯ I1}

Define, for any ζ = (ζ0, ζ1) ∈ ∆0 × ∆1,

F0^{ζ0}(x0, w) := F0(x0, w, ζ0(x0, w))
Z1^{ζ}(x0, w) = F0^{ζ0}(x0, w)
F1^{ζ1}(x1, y1, u0) := F1(x1, y1, u0, ζ1(x1, y1, u0)).


Z2^{ζ}(x0, w) = F1^{ζ1}(Z1^{ζ}(x0, w)).

Let

Φ : Z2 → R, Φ(x2, u0) = k² u0² + x2²,

and consider

inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(Z2^{ζ}(ω)).


Probability measures as states

In the standard form, the so-called value functions are defined on different spaces of probability measures P(Zt), t = 0, 1, 2.

The mappings F0^{ζ} : Z0 → Z1 and F1^{ζ} : Z1 → Z2 induce mappings

(F0^{ζ})⋆ : P(Z0) → P(Z1),  (F1^{ζ})⋆ : P(Z1) → P(Z2),

so that (Ft^{ζ})⋆(πt) is a probability on Zt+1, the image of πt by the mapping Ft^{ζ}.


Value functions

The value functions Vt : P(Zt) → R are defined by backward induction:

V2(π2) := ⟨π2, Φ⟩ = ∫_{Z2} Φ(z2) π2(dz2)
V1(π1) := inf_{ζ1 ⪯ I1} V2( (F1^{ζ})⋆(π1) )
V0(π0) := inf_{ζ0 ⪯ I0} V1( (F0^{ζ})⋆(π0) ).


Optimal cost is V0(P)

inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(Z2^{ζ}(ω))
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(F1^{ζ1}(Z1^{ζ}(ω)))
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ∫_Ω P(dω) Φ(F1^{ζ1}(F0^{ζ0}(ω)))
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ⟨P, Φ ∘ F1^{ζ1} ∘ F0^{ζ0}⟩
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} ⟨(F1^{ζ1})⋆(F0^{ζ0})⋆P, Φ⟩
= inf_{ζ0 ∈ ∆0} inf_{ζ1 ∈ ∆1} V2( (F1^{ζ1})⋆(F0^{ζ0})⋆P )
= inf_{ζ0 ∈ ∆0} V1( (F0^{ζ0})⋆P )
= V0(P).


    Partial information dynamic programming equation

V2( (F1^{ζ})⋆(π1) ) = ∫_{Z2} Φ(z2) (F1^{ζ})⋆(π1)(dz2)
= ∫_{Z1} Φ(F1^{ζ}(z1)) π1(dz1)
= ∫_{Z1} Φ(F1(z1, ζ(z1))) π1(dz1)
= E_{π1}[ Φ(F1(·, ζ(·))) ]
= E_{π1}[ E_{π1}( Φ(F1(·, ζ(·))) | I1 ) ]


so that

V1(π1) = inf_{ζ1 ⪯ I1} E_{π1}[ E_{π1}( Φ(F1(·, ζ(·))) | I1 ) ]
= E_{π1}[ inf_{u1 ∈ U1} E_{π1}( Φ(F1(·, u1)) | I1 ) ]
= E_{π1}[ inf_{u1 ∈ U1} ∫_{Z1} Φ(F1(z1, u1)) dπ1^{I1} ]

where π1^{I1} is the conditional law (the filter when information is nested).
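
On a finite toy model, the interchange between the infimum over I1-measurable mappings and the cell-by-cell infimum can be checked by enumeration. A Python sketch (all model data are illustrative):

```python
import itertools

Z1, U1 = [0, 1, 2, 3], [0, 1]
pi1 = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}   # probability pi1 on Z1
F1 = lambda z1, u1: (z1 - u1) % 4        # dynamics Z1 x U1 -> Z2 = Z1
Phi = {0: 0.0, 1: 2.0, 2: 1.0, 3: 5.0}   # final cost on Z2
cells = [[0, 1], [2, 3]]                 # partition generating I1

# V1(pi1) as an infimum over I1-measurable mappings (constant on each cell).
lhs = min(
    sum(pi1[z] * Phi[F1(z, us[i])] for i, C in enumerate(cells) for z in C)
    for us in itertools.product(U1, repeat=len(cells))
)

# V1(pi1) as E_pi1[ inf_u E_pi1(Phi(F1(., u)) | I1) ]: cell by cell.
rhs = sum(min(sum(pi1[z] * Phi[F1(z, u)] for z in C) for u in U1) for C in cells)

assert abs(lhs - rhs) < 1e-12
```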


    Dynamic programming with perfect information

Notice that

V2(π2) = ⟨π2, Φ⟩ = ⟨π2, V̂2⟩, where V̂2 = Φ.

Define

V̂1(z1) = inf_{u1 ∈ U1} V̂2(F1(z1, u1)).


When I1 = σ(z1), we have

V1(π1) = inf_{ζ1 ⪯ I1} V2( (F1^{ζ})⋆(π1) )
= inf_{ζ1 ⪯ z1} ∫_{Z1} V̂2( F1(z1, ζ(z1)) ) π1(dz1)
= ∫_{Z1} π1(dz1) inf_{u1 ∈ U1} V̂2( F1(z1, u1) )
= ∫_{Z1} π1(dz1) V̂1(z1)
= ⟨π1, V̂1⟩.

Thus, in the case of perfect information, we recover the "usual" dynamic programming, where the value function is indexed by the state z, and not by probability measures in P(Z).
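
In the same finite spirit, a short Python check that, with singleton information cells, the measure-indexed V1 reduces to the state-indexed V̂1 (same illustrative toy data as above):

```python
import itertools

Z1, U1 = [0, 1, 2, 3], [0, 1]
pi1 = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}
F1 = lambda z1, u1: (z1 - u1) % 4
Phi = {0: 0.0, 1: 2.0, 2: 1.0, 3: 5.0}   # Vhat2 = Phi

# State-indexed value function: Vhat1(z1) = inf_u Vhat2(F1(z1, u1)).
Vhat1 = {z: min(Phi[F1(z, u)] for u in U1) for z in Z1}
V1_state = sum(pi1[z] * Vhat1[z] for z in Z1)

# Measure-indexed value with I1 = sigma(z1): every mapping Z1 -> U1 is admissible.
V1_measure = min(
    sum(pi1[z] * Phi[F1(z, zeta1[i])] for i, z in enumerate(Z1))
    for zeta1 in itertools.product(U1, repeat=len(Z1))
)

assert abs(V1_state - V1_measure) < 1e-12
```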


EQUIVALENT STOCHASTIC CONTROL PROBLEMS: FROM DYNAMIC TO STATIC PROBLEMS

H. S. Witsenhausen. Equivalent stochastic control problems. Mathematics of Control, Signals, and Systems, 1(1):3–7, 1988.


    Criterion and observations

Notice that the cost may be written as

k² u0² + x2² = k² u0² + (x1 − u1)² = k² u0² + (x0 + u0 − u1)² = k² u0² + (y0 + u0 − u1)²,

that is, as a function of y0, u0 and u1. Hence, let us define

J : Y0 × U0 × U1 → R, (y0, u0, u1) ↦ k² u0² + (y0 + u0 − u1)².


    Probabilities

P denotes the standard Gaussian probability on the probability space Ω = X0 × W, that is,

P(dx0 dw) = (1/2π) exp(−(x0² + w²)/2) dx0 dw.

Q0(dy0) and Q1(dy1) are standard Gaussian probabilities on Y0 and Y1, and

Q(dy0 dy1) = (1/2π) exp(−(y0² + y1²)/2) dy0 dy1 = Q0(dy0) ⊗ Q1(dy1).


    Control design

Let γ0 : Y0 → U0 and γ1 : Y0 × Y1 → U1 be two measurable mappings, and denote γ := (γ0, γ1).

With X0 and W the coordinate mappings on X0 × W, we define the policies and the closed-loop observations by

Y0 = X0, U0 = γ0(Y0)
Y1 = X0 + U0 + W = X0 + γ0(Y0) + W, U1 = γ1(Y0, Y1).

Thus defined, Y0 and Y1 are random variables which depend upon the control design γ0. We put

Y^{γ0} := (Y0, Y1) = (Y0, X0 + γ0(Y0) + W) : X0 × W → Y0 × Y1.


    Cost criterion

The random cost is J(Y0, U0, U1), the expectation of which we try to minimize over all designs γ0 and γ1.

Noticing that

J(Y0, U0, U1) = J(Y0, γ0(Y0), γ1(Y0, Y1)),

we define

J^{γ} : Y0 × Y1 → R, (y0, y1) ↦ J(y0, γ0(y0), γ1(y0, y1)).


    Radon-Nikodym density

Let us compute the law P_(Y0,Y1) of (Y0, Y1) under P, that is, the image Y^{γ0}⋆(P) of P by the mapping Y^{γ0}:

Y^{γ0}⋆(P)(dy0 dy1) = (1/2π) exp(−(y0² + (y1 − y0 − γ0(y0))²)/2) dy0 dy1.

Hence Y^{γ0}⋆(P) is equivalent to the Lebesgue measure dy0 dy1, and thus also to the Gaussian probability Q. We have

Y^{γ0}⋆(P) = T^{γ0} Q,

with

T^{γ0} : Y0 × Y1 → R₊, (y0, y1) ↦ exp( (y1² − (y1 − y0 − γ0(y0))²)/2 ).


Reduction to an equivalent static problem

E_P( J(Y0, U0, U1) )
= E_P( J^{γ}(Y0, Y1) )   (by definition of J^{γ})
= E_{Y^{γ0}⋆(P)}( J^{γ} )   (by definition of the image probability)
= E_Q( T^{γ0} J^{γ} )   (by Y^{γ0}⋆(P) = T^{γ0} Q)
= ∫_{Y0×Y1} J(y0, γ0(y0), γ1(y0, y1)) T^{γ0}(y0, y1) Q0(dy0) Q1(dy1),

where the factor J · T^{γ0} is the new cost.


A static problem

Introducing the new cost

J̃(u0, u1, y0, y1) := J(y0, u0, u1) T^{u0}(y0, y1)
= [k² u0² + (y0 + u0 − u1)²] exp( (y1² − (y1 − y0 − u0)²)/2 ),

the original optimization problem becomes a static problem:

inf_{γ0, γ1} ∫_{Y0×Y1} J̃(γ0(y0), γ1(y0, y1), y0, y1) Q0(dy0) Q1(dy1).

Indeed, the observations are now pure noises y0 and y1, not dynamical quantities affected by the controls.
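
The change of measure behind this reduction can be checked numerically. A Python sketch by two-dimensional quadrature (the designs γ0, γ1 and the value of k are illustrative choices); the two integrands coincide pointwise, which is exactly the identity Y^{γ0}⋆(P) = T^{γ0} Q:

```python
import numpy as np

k = 0.2
gamma0 = lambda y0: np.tanh(y0)            # illustrative design
gamma1 = lambda y0, y1: 0.5 * (y0 + y1)    # illustrative design
J = lambda y0, u0, u1: k**2 * u0**2 + (y0 + u0 - u1)**2

g = np.linspace(-8.0, 8.0, 801)
y0, y1 = np.meshgrid(g, g, indexing="ij")
dA = (g[1] - g[0]) ** 2

# Dynamic form: expectation under the law of (Y0, Y1) = Y^gamma0(X0, W).
pP = np.exp(-(y0**2 + (y1 - y0 - gamma0(y0))**2) / 2) / (2 * np.pi)
dynamic = (J(y0, gamma0(y0), gamma1(y0, y1)) * pP).sum() * dA

# Static form: expectation under Q, with the density T^gamma0 as a cost factor.
q = np.exp(-(y0**2 + y1**2) / 2) / (2 * np.pi)
T = np.exp((y1**2 - (y1 - y0 - gamma0(y0))**2) / 2)
static = (T * J(y0, gamma0(y0), gamma1(y0, y1)) * q).sum() * dA

print(dynamic, static)   # equal up to quadrature error
```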
