Stochastic Dynamic Teams and Games with Asymmetric Information
TAMER BAȘAR (basar1@illinois.edu)
ECE, CAS, CSL, ITI and MechSE, University of Illinois at Urbana-Champaign
September 29, 2015
IMA Workshop: Distributed Control and Decision Making over Networks
Sept 28 – Oct 2, 2015, University of Minnesota, Minneapolis
OUTLINE
• Introduction to decision making in stochastic environments
• Dynamic teams, games, role of information, asymmetry, non-classical information structures (IS)
• Some caveats and counter-examples on existence and characterization of solutions
• Existence and characterization of team-optimal policies with non-classical IS
• Existence and characterization of Nash equilibrium policies under asymmetry
• Conclusions
General Framework
INGREDIENTS
• Uncertainty (uncertain environment)
• Decision makers (DMs) (players, agents)
• Perceptions of DMs on uncertainty
• Objective(s)
• Description of interactions (underlying network)
• Common information / Private information
• DMs pick policies (decision laws, strategies) leading to actions that evolve over time
• Policies are constructed based on information received (active as well as passive) and guided by individual utility or cost functions over the DM horizon
– Single DM => stochastic control
– Single objective => stochastic teams
– Otherwise zero-sum (ZS) or nonzero-sum (NZS) games, with Nash equilibrium (NE)
Coupling of Information and Actions
• Is the quality of active and relevant information received by a player affected by actions of other players?
– If no => underlying game (team) is generally “simple”
– If yes => it is generally “difficult”
Asymmetric Information
A stochastic decision problem is one with asymmetric information if different decision units (players, agents, decision makers) acting at the same time instant do not have access to the same information (of relevance).
Non-classical Information
A stochastic decision problem is one with non-classical information if a decision unit, B, that “follows” another one, A, does not have all the information acquired and used by A.
[Figure: three decision units A, B, C with measurements $y_A$, $y_B$, $y_C$ and cost functions $Q_i(x, u_A, u_B, u_C)$, $i = A, B, C$. Two arrangements are contrasted: A placed ahead of B and C, versus A, B, C side by side.]
Questions / Challenges
• Existence of team-optimal (T-O) solutions
• Existence of Nash equilibrium (NE) policies (in dynamic stochastic games)
• Characterization of T-O solutions
• Characterization of NE policies
• Challenges
– Possibility of non-existence with asymmetry
– Triple role (caution, probing, signaling)
– Signaling (deception/threat) through action
– Tension between signaling and optimization
NEXT: An example exhibiting the subtleties, such as the significance of the failure of continuity of the information content of channels in the limit of a sequence of discretized problems
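Non-classical (limited memory)
[Figure: two agents in series, A with policy $\gamma_0$ and B with policy $\gamma_1$: $x \to \gamma_0 \to u_0$; $y = u_0 + w$; $y \to \gamma_1 \to u_1$]
$x$, $w$: independent random variables
$u_0$, $u_1$ are real-valued actions
$J(\gamma_0, \gamma_1) = E[\,Q(x, u_0, u_1) \mid \gamma_0, \gamma_1\,]$
$J^* = \min_{\gamma_0} \min_{\gamma_1} J(\gamma_0, \gamma_1)$
Non-classical (2-pt pmf’s)
$x = \pm 1$ with equal probability; likewise $w$
$J(\gamma_0, \gamma_1) = E[\,k (u_0 - x)^2 + (u_0 - u_1)^2 \mid \gamma_0, \gamma_1\,]$, $k > 0$
$J^* = \inf_{\gamma_0} \inf_{\gamma_1} J(\gamma_0, \gamma_1)$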
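Infimizing policies
$J^* = \inf_{\gamma_0} \inf_{\gamma_1} J(\gamma_0, \gamma_1) = 0$, the lowest possible value
Consider the policies:
$\gamma_0(x) = x + \varepsilon\,\mathrm{sgn}(x)$
$\gamma_1(y) = 1 + \varepsilon$ if $y = 2 + \varepsilon$ or $y = \varepsilon$; $\gamma_1(y) = -1 - \varepsilon$ if $y = -2 - \varepsilon$ or $y = -\varepsilon$
=> $J = k \varepsilon^2 \to 0$ as $\varepsilon \to 0$
But, as $\varepsilon \to 0$, the limiting policies do not lead to $J = 0$! Loss of crucial information in the limit!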
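The loss of information in the limit can be checked by direct enumeration of the four equally likely $(x, w)$ pairs. The sketch below is only an illustration: the function name `team_cost` is hypothetical, and the second agent is assumed to use the conditional mean $E[u_0 \mid y]$, which coincides with the $\gamma_1$ above whenever $\varepsilon > 0$.

```python
import math
from itertools import product

def team_cost(eps, k=1.0):
    """Exact J for x, w uniform on {-1, +1}, gamma0(x) = x + eps*sgn(x), gamma1(y) = E[u0 | y]."""
    outcomes = []
    for x, w in product((-1.0, 1.0), repeat=2):
        u0 = x + eps * math.copysign(1.0, x)
        y = u0 + w                         # what the second agent observes
        outcomes.append((x, u0, round(y, 12)))
    # Group the first agent's actions by the observation they produce.
    by_obs = {}
    for _, u0, y in outcomes:
        by_obs.setdefault(y, []).append(u0)
    cost = 0.0
    for x, u0, y in outcomes:
        u1 = sum(by_obs[y]) / len(by_obs[y])   # conditional mean of u0 given y
        cost += 0.25 * (k * (u0 - x) ** 2 + (u0 - u1) ** 2)
    return cost

for eps in (0.5, 0.1, 0.01, 0.0):
    print(eps, team_cost(eps))
```

For every $\varepsilon > 0$ the four observations are distinct and the cost is $k\varepsilon^2$. At $\varepsilon = 0$ the observations $y = 0$ from $(x, w) = (1, -1)$ and $(-1, 1)$ collide, so the second agent can no longer recover $u_0$ and the cost does not go to zero.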
Another Example (Bismut, TAC’72)
$J(\gamma_0, \gamma_1) = E[\,(u_0 - x)^2 + \alpha (u_0 + u_1)^2 + (u_1)^2 \mid \gamma_0, \gamma_1\,]$
$\alpha$ takes values 1 and 2 with equal probability
$x \sim$ uniform on $[-\sqrt{3}, \sqrt{3}]$
$u_0 = \gamma_0(\alpha, x)$ and $u_1 = \gamma_1(u_0)$
If $u_1$ also knew $\alpha$, then the optimal cost would be $E[\tfrac{1}{2+\alpha} x^2] = 7/24$, with $u_0^* = \tfrac{1}{2+\alpha} x$.
With the original IS, agent 0 can truncate $u_0^*$ to some $n$ decimal places and use the $(n+1)$-th decimal to transmit the true value of $\alpha$, coming to within $\varepsilon$ of 7/24. Again, the limit is not well defined.
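The digit-embedding trick can be sketched in a few lines. This is a minimal illustration, not Bismut's construction: the helper names `embed_alpha` and `recover_alpha` are hypothetical, and exact decimal arithmetic is used so that the embedded digit is not disturbed by floating-point rounding.

```python
from decimal import Decimal, ROUND_DOWN

def embed_alpha(u0_star: Decimal, alpha: int, n: int) -> Decimal:
    """Keep n decimals of u0_star and store alpha (1 or 2) in decimal place n+1."""
    step = Decimal(1).scaleb(-n)                      # 10**(-n)
    truncated = u0_star.quantize(step, rounding=ROUND_DOWN)
    digit = Decimal(alpha).scaleb(-(n + 1))           # alpha * 10**(-(n+1))
    return truncated + digit if u0_star >= 0 else truncated - digit

def recover_alpha(u0: Decimal, n: int) -> int:
    """What agent 1 does: read decimal place n+1 of the observed action."""
    return int(abs(u0).scaleb(n + 1)) % 10

x = Decimal("0.7319")                                 # illustrative sample of x
for alpha in (1, 2):
    u0_star = x / (2 + alpha)
    u0 = embed_alpha(u0_star, alpha, n=6)
    print(alpha, u0, recover_alpha(u0, n=6), abs(u0 - u0_star))
```

The transmitted action stays within $10^{-n}$ of $u_0^*$ while still revealing $\alpha$, which is why the cost can be pushed arbitrarily close to 7/24 as $n$ grows.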
The “simplest” “difficult” problem (Witsenhausen, 1968)
[Figure: $x \to \gamma_0 \to u_0$; $y = u_0 + w$; $y \to \gamma_1 \to u_1$]
$x \sim N(0, \sigma_x^2)$, $w \sim N(0, \sigma_w^2)$
$Q_W(x, u_0, u_1) = k (u_0 - x)^2 + (u_0 - u_1)^2$
• An optimal team solution exists, but its structure is not known.
• Affine policies are not optimal.
• Note that if $u_0 = \gamma_0(x)$, the best $\gamma_1$ is $u_1 = \gamma_1(y) = E[\gamma_0(x) \mid y]$. Hence the optimization is with respect to the probability distribution of $u_0$.
• The original existence proof given by Witsenhausen involved calculus-of-variations type arguments and does not extend to more general scenarios.
The “simplest” difficult problem
A policy pair that beats the best linear one:
$u_0 = \gamma_0(x) = \varepsilon\,\mathrm{sgn}(x) + \lambda x$
$u_1 = \gamma_1(y) = E[\,\varepsilon\,\mathrm{sgn}(x) + \lambda x \mid y\,]$
Optimize with respect to $\varepsilon$ and $\lambda$.
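A rough numerical comparison shows how large the gap can be. The sketch below makes several assumptions that are not on the slides: the parameter values $k = 0.04$, $\sigma_x = 5$, $\sigma_w = 1$ are illustrative, the signaling policy is the special case $\lambda = 0$, $\varepsilon = \sigma_x$, and the second agent uses the simple decoder $u_1 = \sigma_x\,\mathrm{sgn}(y)$ instead of the conditional mean, so the second number is only an upper bound on what the policy pair above achieves.

```python
import math

def affine_cost(lam, k, sx, sw):
    """Cost of u0 = lam*x with u1 = E[u0 | y], where y = u0 + w."""
    s0 = (lam * sx) ** 2                                   # variance of u0
    return k * (lam - 1.0) ** 2 * sx ** 2 + s0 * sw ** 2 / (s0 + sw ** 2)

def sign_policy_upper_bound(k, sx, sw):
    """Cost of u0 = sx*sgn(x) paired with the suboptimal decoder u1 = sx*sgn(y)."""
    first = 2.0 * k * sx ** 2 * (1.0 - math.sqrt(2.0 / math.pi))   # k * E[(u0 - x)^2]
    p_err = 0.5 * math.erfc(sx / (sw * math.sqrt(2.0)))            # P(sgn(y) != sgn(u0))
    return first + 4.0 * sx ** 2 * p_err                           # (u0 - u1)^2 = 4*sx^2 on error

k, sx, sw = 0.04, 5.0, 1.0            # illustrative values, not taken from the slides
best_affine = min(affine_cost(i / 1000.0, k, sx, sw) for i in range(0, 2001))
print("best linear (grid search):", best_affine)
print("signaling upper bound:    ", sign_policy_upper_bound(k, sx, sw))
```

With these values the best linear policy costs roughly 0.96, while the two-point signaling policy stays below 0.41.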
The “simplest” difficult problem
A more general one with improved performance:
$u_0 = \gamma_0(x) = \varepsilon\,\mathrm{Quant}(x) + \lambda x$
$u_1 = \gamma_1(y) = E[\,\varepsilon\,\mathrm{Quant}(x) + \lambda x \mid y\,]$
Optimize with respect to $\varepsilon$ and $\lambda$ for different Quant(ization) schemes (Bansal, TB ‘87).
Best performance under linear policies can be arbitrarily bad relative to the best performance under Quant:
=> $\sup_{k,\sigma} \left[ Q_W^{\mathrm{linear}*} / Q_W^{\mathrm{Quant}*} \right] = \infty$ (Mitter, Sahai ‘99)
Digression: However, with Conflicting Objectives
[Figure: $x \to \gamma_0 \to u_0$; $y = u_0 + w$; $y \to \gamma_1 \to u_1$]
$Q_G(x, u_0, u_1) = -k_0 (u_0 - x)^2 + (u_0 - u_1)^2$
$J^* = \min_{\gamma_1} \max_{\gamma_0} J(\gamma_0, \gamma_1)$
Unique saddle-point solution; control laws are linear:
$\gamma_0^*(x) = \dfrac{k_0}{k_0 - (\lambda^* - 1)^2}\, x$, $\gamma_1^*(y) = \lambda^* y$
where $\lambda^*$ uniquely solves the polynomial equation
$f(\lambda) = \dfrac{\sigma_w^2}{\sigma_x^2}\, \lambda\, [k_0 - (\lambda - 1)^2]^2 - k_0^2 (1 - \lambda) = 0$
in the open interval $(\max(0, 1 - \sqrt{k_0}),\, 1)$.
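Since $f$ is negative at the left end of the interval and positive at $\lambda = 1$, the root $\lambda^*$ can be located by bisection. The following is a minimal sketch; the function name `saddle_lambda` and the sample parameter values are illustrative assumptions.

```python
def saddle_lambda(k0, sx, sw, tol=1e-12):
    """Bisection for the root of f on (max(0, 1 - sqrt(k0)), 1)."""
    r = (sw / sx) ** 2

    def f(lam):
        return r * lam * (k0 - (lam - 1.0) ** 2) ** 2 - k0 ** 2 * (1.0 - lam)

    lo, hi = max(0.0, 1.0 - k0 ** 0.5), 1.0    # f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_star = saddle_lambda(k0=1.0, sx=1.0, sw=1.0)   # illustrative parameter values
print(lam_star)
```

Bisection is chosen here only for robustness; any scalar root finder would do, given that the slides state the root in this interval is unique.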
Coming back to Witsenhausen (1968)
Is the solution piecewise constant (or rather piecewise affine)?
NO
It is strictly increasing with a real analytic left inverse (Wu, Verdú, CDC’11).
Uses optimal transport theory: given probability measures $P$ and $Q$ and a cost function $c: \mathbb{R}^2 \to \mathbb{R}$, find the infimum, over all joint distributions of $X$ and $Y$ with marginals $P$ and $Q$, of $E[c(X, Y)]$ (Monge-Kantorovich).
For the Witsenhausen problem: $E[c(x, u_0)] = E[Q_W] = k\, W_2^2(P, Q) + \mathrm{mmse}(Q)$, where $P$ and $Q$ are the PDFs of $x$ and $u_0$, and $W_2$ is the quadratic Wasserstein distance between $P$ and $Q$. For any $P$, there exists a minimizing $Q$, with “smooth” $\gamma_0$.
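A message to take, e.g., on a variation to LQG
[Figure: feedback loop of plant P and controller C / $\mu$, with plant uncertainty $w$, sensor noise $v$, measurement $y$ and control $u$]
$dx/dt = Ax + Bu + Dw$, $y = Hx + v$, with $w$, $v$ independent GWN
Cost: $\frac{1}{T} \int \left( |z|^2 + |u|_R^2 \right) dt$
$u(t) = \mu(y_t)$: memoryless controller; a difficult problem, solution not known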
Implications for Distributed Control
[Figure: plant P, with disturbance $w$ and noise $v$, in feedback with a controller C made up of C1, C2, ..., CM]
In any optimal operation, controllers with decentralized information have to talk to (signal) each other through the plant.
Earlier general set-up with a different Q
[Figure: two agents in series, A with policy $\gamma_0$ and B with policy $\gamma_1$: $x \to \gamma_0 \to u_0$; $y = u_0 + w$; $y \to \gamma_1 \to u_1$]
$x \sim N(0, \sigma_x^2)$, $w \sim N(0, \sigma_w^2)$
$u_0$, $u_1$ are real-valued actions
$J(\gamma_0, \gamma_1) = E[\,Q(x, u_0, u_1) \mid \gamma_0, \gamma_1\,]$
$Q(x, u_0, u_1) = k (u_0)^2 + (u_1 - x)^2$
Gaussian Test Channel
$Q(x, u_0, u_1) = k (u_0)^2 + (u_1 - x)^2$
An optimal control law (encoder/decoder) exists, and is linear.
Generalized Gaussian Test Channel
$Q(x, u_0, u_1) = k (u_0)^2 + (u_1 - x)^2 + b_0 u_0 x$
An optimal control law (encoder/decoder) exists, and is linear.
Generalized Gaussian Test Channel (Proof of existence)
$Q(x, u_0, u_1) = k (u_0)^2 + (u_1 - x)^2 + b_0 u_0 x$
With $\alpha = E[u_0^2]$ and $\beta = E[(u_1 - x)^2]$:
$E[Q] = F(\gamma_0, \gamma_1) \ge k\alpha + \beta + \inf_{\gamma_0} b_0 E[\gamma_0(x)\, x] \ge k\alpha + \beta - |b_0|\, \sigma_x \sqrt{\alpha}$
Data processing theorem (DPT): $I(X; Y) \ge I(X; U_1)$, so that
$C(\alpha) = \tfrac{1}{2} \log\!\left(1 + \alpha / \sigma_w^2\right) \ge I(X; Y) \ge I(X; U_1) \ge \tfrac{1}{2} \log\!\left(\sigma_x^2 / \beta\right) = R(\beta)$
=> $\beta \ge \sigma_w^2 \sigma_x^2 / (\sigma_w^2 + \alpha)$
The inequality is tight with $\gamma_0(x) = -\mathrm{sgn}(b_0)\, (\sqrt{\alpha} / \sigma_x)\, x$.
Hence $E[Q] \ge k\alpha + \sigma_w^2 \sigma_x^2 / (\sigma_w^2 + \alpha) - |b_0|\, \sigma_x \sqrt{\alpha}$.
Obtain the $\alpha$ that minimizes the bound: $\alpha^*$.
Then $\gamma_0^*(x) = -\mathrm{sgn}(b_0)\, (\sqrt{\alpha^*} / \sigma_x)\, x$, $\gamma_1^*(y) = E[x \mid y]$.
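The tightness claim can be checked numerically: for any power level $\alpha$, the linear encoder together with the conditional-mean decoder attains the lower bound exactly. The sketch below assumes $y = u_0 + w$ and uses illustrative parameter values; the function names are hypothetical.

```python
import math

def lower_bound(alpha, k, b0, sx, sw):
    """k*alpha + sw^2*sx^2/(sw^2 + alpha) - |b0|*sx*sqrt(alpha)."""
    return k * alpha + (sw * sx) ** 2 / (sw ** 2 + alpha) - abs(b0) * sx * math.sqrt(alpha)

def linear_policy_cost(alpha, k, b0, sx, sw):
    """E[Q] for u0 = g*x with g = -sgn(b0)*sqrt(alpha)/sx, y = u0 + w, u1 = E[x | y]."""
    g = -math.copysign(1.0, b0) * math.sqrt(alpha) / sx
    power = (g * sx) ** 2                                  # E[u0^2]
    cross = b0 * g * sx ** 2                               # b0 * E[u0 x]
    mmse = sx ** 2 - (g * sx ** 2) ** 2 / (power + sw ** 2)   # E[(x - E[x | y])^2]
    return k * power + mmse + cross

k, b0, sx, sw = 1.0, 1.0, 1.0, 1.0          # illustrative values, not from the slides
grid = [i * 1e-3 for i in range(0, 5001)]
alpha_star = min(grid, key=lambda a: lower_bound(a, k, b0, sx, sw))
print(alpha_star, lower_bound(alpha_star, k, b0, sx, sw),
      linear_policy_cost(alpha_star, k, b0, sx, sw))
```

A grid search is used for $\alpha^*$ only to keep the sketch dependency-free; the two printed costs agree up to rounding.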
Generalized Gaussian Test Channel
$Q(x, u_0, u_1) = k (u_0)^2 + (u_1 - x)^2 + b_0 u_0 x$
One of the few instances when static/causal coding (and linear in this case) leads to attainment of equality in $C(\alpha) \ge R(\beta)$.
Recap
$Q_W = k_0 (u_0 - x)^2 + (u_0 - u_1)^2$: conflicting roles
$Q_G = -k_0 (u_0 - x)^2 + (u_0 - u_1)^2$: aligned roles
$Q_{TC} = k_0 (u_0)^2 + (u_1 - x)^2$: aligned roles
Not only the information structure but also the cost function is a determining factor.
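Noise Corrupted Source
[Figure: $x + v \to \gamma_0 \to u_0$; $y = u_0 + w$; $y \to \gamma_1 \to u_1$]
$x \sim N(0, \sigma_x^2)$, $w \sim N(0, \sigma_w^2)$, $v \sim N(0, \sigma_v^2)$
$J(\gamma_0, \gamma_1) = E[\,Q(x, u_0, u_1) \mid \gamma_0, \gamma_1\,]$
=> Similar structural results: linear policies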
Jammer Corrupted Channel
[Figure: $x + v \to \gamma_0 \to u_0$; the channel from $u_0$ to $y$ is corrupted by the noise $w_N$ and the jammer input $w_J$; $y \to \gamma_1 \to u_1$]
$x \sim N(0, \sigma_x^2)$, $w_N \sim N(0, \sigma_w^2)$, $v \sim N(0, \sigma_v^2)$
$w_J$ unknown, controlled by the jammer
$J(\gamma_0, \gamma_1; \gamma_J) = E[\,Q(x, u_0, u_1; w_J) \mid \gamma_0, \gamma_1, \gamma_J\,]$
=> Saddle-point (SP) policies are linear (TB, TIT’83)
Even with $x$ and the channel noise non-Gaussian, under some matching conditions on spectral densities, SP policies are linear (Akyol-Rose-TB, TIT, Aug’15)
Relay Channels: Multiple Serial Decision Units
[Figure: serial chain $x + v \to \gamma_0 \to u_0 \to \dots \to \gamma_i \to \dots \to \gamma_m \to u_m$, with channel noises $w_0, \dots, w_{m-1}$ added between consecutive units and $y_m$ the input to $\gamma_m$]
$x \sim N(0, \sigma_x^2)$, $w_i \sim N(0, \sigma_w^2)$, $v \sim N(0, \sigma_v^2)$
$J(\gamma_0, \gamma_{[1,m]}) = E[\,Q(x, u_0, u_{[1,m]}) \mid \gamma_0, \gamma_{[1,m]}\,]$
$Q(x, u_0, u_{[1,m]}) = (u_m - x)^2 + \sum_{i=1}^{m} (u_{i-1})^2$
=> For $m > 3$, affine laws are no longer optimal for this extended GTC model (Lipsa-Martins’10)
=> Also for $m = 3$, affine laws are not optimal for this extended GTC model (Yüksel et al 2011)
Does a general unifying existence proof exist?
One that would apply to
• Witsenhausen counter-example
• Gaussian test channel
• their multi-dimensional versions
• relay channels
• stochastic control problems with no memory or limited memory
• LQG teams with asymmetric information
• etc
Answer is YES
(Gupta, Yüksel, TB, CDC’14)
(Gupta, Yüksel, TB & Langbort, SICON’15)
Assumptions on the dynamic M-agent, T-stage stochastic team problem:
• Sequential decision problem: at time t, Agent i observes and recalls y(i,t), and constructs u(i,t) = γ(i,t)(y(i,t))
• Inf J(γ) is finite (e.g. use zero control)
• Witsenhausen’s static reduction holds
Static reduction (Wits’88):
• Convert the M-agent team to an MT-agent one where each agent acts only once
• Measure and cost transformation that turns the dynamic problem into a static one, with independent measurements (measurements now enter into the cost)
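Static Reduction for the C-example
[Figure: $x \to \gamma_0 \to u_0$; $y = u_0 + w$; $y \to \gamma_1 \to u_1$]
$x \sim N(0, \sigma_x^2)$, $w \sim N(0, \sigma_w^2)$
$Q_W(x, u_0, u_1) = k (u_0 - x)^2 + (u_0 - u_1)^2$
Reduced static cost: $\int P(dx)\, P(dy)\, Q_W(x, u_0, u_1) \exp\!\left( u_0 (2y - u_0) / 2 \right)$, with $x \sim N(0, \sigma_x^2)$ and $y \sim N(0, 1)$ independent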
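The exponential factor is the likelihood ratio between the actual law of $y$ given $u_0$ and the reference law $N(0, 1)$. A short derivation, under the assumption $\sigma_w = 1$ (which is what the reference law $y \sim N(0, 1)$ suggests) and writing $\varphi$ for the standard normal density, with $u_0 = \gamma_0(x)$ and $u_1 = \gamma_1(y)$:

```latex
\begin{aligned}
\frac{\varphi(y - u_0)}{\varphi(y)}
  &= \exp\!\Big(-\tfrac{1}{2}(y - u_0)^2 + \tfrac{1}{2}y^2\Big)
   = \exp\!\Big(\tfrac{u_0 (2y - u_0)}{2}\Big), \\[4pt]
E[Q_W]
  &= \int P(dx) \int \varphi(y - u_0)\, Q_W(x, u_0, u_1)\, dy \\
  &= \int P(dx)\, P(dy)\, Q_W(x, u_0, u_1)\,
     \exp\!\Big(\tfrac{u_0 (2y - u_0)}{2}\Big),
  \qquad P(dy) = \varphi(y)\, dy .
\end{aligned}
```

Under the new measure $x$ and $y$ are independent, and the dependence of $y$ on $u_0$ has moved into the cost.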
Challenges in the Proof
• Extracting a convergent subsequence of the infimizing sequence
• Showing lower semi-continuity of J
• Making sure that informational constraints are preserved in the limit
Approach to Overcome Challenges in the Proof
• With static reduction, redefine optimization over randomized strategies
• Because of independence of measurements, information constraints are preserved
• Identify a compact set that contains the optimal solution (Markov’s inequality)
• Use Blackwell’s irrelevant information theorem to go to deterministic policies
• Invoke equivalence of reduced static and dynamic teams
Assumptions & Approach Apply to Various Settings
• Multi-dimensional Witsenhausen counter-example (C-E)
• Multi-dimensional Gaussian Test Channel
• Relay channel and multi-dimensional version
• LQG with static output feedback (solution is generally nonlinear)
• Countable/quantized observation spaces in static teams with observation sharing IS (uses a lifting technique)
Next Question
Does there exist a computational scheme that would generate policies with costs arbitrarily close to optimum?
Selected Prior Work with no Performance Guarantees
• Lee-Lao-Ho, TAC’01
• Baglietto-Parisini-Zoppoli, TAC’01
• Li-Marden-Shamma, CDC’09
• Gnecco-Sanguinetti, INOC’09, OL’12
• Gnecco-Sanguinetti-Gaggero, SICON’12
• McEneaney-Han, Automatica’15
NOTE: As optimization problems, these are NP-complete (Papadimitriou-Tsitsiklis, SICON’86)
Recent Answer
Quantized policies are asymptotically optimal: those that use uniformly quantized measurements (restricted to a compact domain). Saldı-Yüksel-Linder (Sept 2015)
Recent Answer (Specifics)
• Cost is bounded and continuous in control
• Action space of each agent is a convex subset of a locally convex vector space
• Measurement space of each agent is compact
• Static reduction holds, satisfying the above
=> On each ε-net of uniformly quantized measurements, there exists a team-optimal policy, leading to an asymptotically optimal value for the team cost as the net gets arbitrarily dense
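To make “uniformly quantized measurements restricted to a compact domain” concrete, here is a generic scalar uniform quantizer. It is only an illustration of the idea, not the construction used by Saldı, Yüksel and Linder; the function name and its parameters `bound` and `levels` are assumptions.

```python
def uniform_quantize(y, bound, levels):
    """Clip y to [-bound, bound] and return the midpoint of its cell among `levels` equal cells."""
    width = 2.0 * bound / levels
    clipped = min(max(y, -bound), bound)
    index = min(int((clipped + bound) / width), levels - 1)   # right edge falls in the last cell
    return -bound + (index + 0.5) * width

# A policy that acts on the quantized measurement only sees finitely many values.
for y in (-5.0, -0.2, 0.3, 5.0):
    print(y, uniform_quantize(y, bound=1.0, levels=4))
```

Increasing `levels` (and `bound`, for unbounded measurements) refines the net on which the quantized policy is defined.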
Comprehensive coverage of analysis, optimization and performance of stochastic networked systems in: Yüksel & TB, Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints, Birkhäuser, 2013
Stochastic Dynamic Games with Asymmetric Information
“Lifting” to a symmetric game utilizing the common information of the players
Dynamic Stochastic Games with Asymmetric Information (A. Nayyar & TB, CDC’12)
• Game G1: Local state process ($x^i$) for Pi & global state process ($z$):
$z_{t+1} = f^0_t(z_t, u_t, w^0_t)$, $x^i_{t+1} = f^i_t(z_t, u_t, w^i_t)$
• $u^i_t = \mu^i_t(x^i_{[1,t]}, z_{[1,t]}, u_{[1,t-1]})$, $i = 1, \dots, n$
• $J^i(\mu^1, \dots, \mu^n) = E\left[\sum_{t=1}^{T} c^i(x^i_t, z_t, u_t) \mid \mu\right]$
• Interested in Nash equilibrium $\mu^*$
Game G2: $u^i_t = \mu^i_t(x^i_t, z_{[t-1,t]}, u_{t-1})$
=> If $\mu^*$ is a NE for G2, it is also a NE for G1
Game G3: Replace Pi with FPi, who has access to $z_{[t-1,t]}$, $u_{t-1}$, and selects $\Gamma^i_t: X^i \to U^i$ according to $\varphi^i_t$ (his strategy): $\Gamma^i_t = \varphi^i_t(z_{[t-1,t]}, u_{t-1})$
Then, the control action is: $u^i_t = \Gamma^i_t(x^i_t)$
Cost: $L^i(\varphi^1, \dots, \varphi^n) = E\left[\sum_{t=1}^{T} c^i(x^i_t, z_t, u_t) \mid \varphi\right]$
Theorem: Let $\mu$ be a NE for G2. Define $\varphi^i_t(z_{[t-1,t]}, u_{t-1}) = \mu^i_t(\,\cdot\,, z_{[t-1,t]}, u_{t-1})$. Then, $\varphi$ is a NE for G3. Conversely, if $\varphi$ is a NE for G3, $\mu$ as above is a NE for G2.
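The correspondence in the theorem is partial application: fixing the common information turns a G2 policy into a map from the private state to the action. A minimal sketch, with scalar stand-ins for the states and actions and hypothetical names `lift` and `unlift`:

```python
from typing import Callable, Tuple

Common = Tuple[float, float, float]      # (z_{t-1}, z_t, u_{t-1}), scalars for illustration

def lift(mu: Callable[[float, Common], float]) -> Callable[[Common], Callable[[float], float]]:
    """phi(common) = mu(., common): from a G2 policy to a G3 strategy."""
    def phi(common: Common) -> Callable[[float], float]:
        def prescription(x: float) -> float:      # plays the role of Gamma: X^i -> U^i
            return mu(x, common)
        return prescription
    return phi

def unlift(phi: Callable[[Common], Callable[[float], float]]) -> Callable[[float, Common], float]:
    """Recover the G2 policy: mu(x, common) = phi(common)(x)."""
    return lambda x, common: phi(common)(x)

mu = lambda x, c: x + c[1] - c[0] + 0.5 * c[2]    # an arbitrary example policy
gamma = lift(mu)((1.0, 2.0, 4.0))                 # FPi picks the prescription from common info
print(gamma(3.0), unlift(lift(mu))(3.0, (1.0, 2.0, 4.0)))
```

The two printed actions coincide, which reflects that the strategy spaces of G2 and G3 are in one-to-one correspondence; the equilibrium claim itself is the content of the theorem and is not checked by this sketch.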
What does this lead to
• Since G3 is a symmetric full information game, its NE can be obtained by backward induction (strongly time-consistent, sub-game perfect, etc), albeit with optimization on a function space $\{\Gamma^i_t\}$
• For the ZS version, $(z_{[t-1,t]}, u_{t-1})$ can be replaced with $z_{[t-1,t]}$, with some caveats regarding existence of SPE
• Extends to teams against teams in the ZS context
• Generalization to noisy channels (next)
• Other contexts where a stochastic dynamic game (SDG) with asymmetric information can be transformed to one with symmetric information (through “lifting”)
Two recent papers:
• Nayyar, Gupta, Langbort, TB (TAC, 59(3), 2014), “Common information based Markov perfect equilibria for stochastic games with asymmetric information: Finite games”
• Gupta, Nayyar, Langbort, TB (SICON, 52(5), 2014), “Common information based Markov perfect equilibria for Linear-Gaussian games with asymmetric information”
LQG Games with Asymmetric Information
• Standard set-up with independent measurement(s) and system noises
• Some sharing of past measurements and control actions by the (two) players
• Player i’s policy at time t: g(i,t): I(i,t) -> U(i,t)
• Standard stage-additive T-horizon cost
• Decomposition into common and private information: Ct = I(1,t) ∩ I(2,t), P(i,t) = I(i,t) \ Ct
[Figure: Relationship between G1 & G2. Game G1: Controller 1 and Controller 2, with private information $P^1_t$, $P^2_t$ and actions $U^1_t$, $U^2_t$. Game G2: virtual players VP 1 and VP 2, acting on the common information $C_t$, select maps $P^1_t \to U^1_t$ and $P^2_t \to U^2_t$. NE(G1) <=> NE(G2) (bijection).]
Assumptions / Approach / Result
Assumptions:
• Common information Ct does not contract
• Common information based common belief is independent of the policies of the players
Approach:
• Introduce common information based game (G2) on lifted strategy spaces as before
• Markov perfect equilibrium (MPE) of G2 is a common information based MPE (CIMPE) for G1
Result:
=> CIMPE is unique & CIMPE strategies are affine; computation involves solving linear equations
Recap
• We now have a general theory of existence of team-optimal solutions in stochastic dynamic teams with asymmetric and non-classical IS
• Caveats in computations based on discretization/quantization, but feasible with uniform quantization of measurements
• Lifting of SGs with asymmetric information to ones with symmetric information at the expense of increase in complexity
• Common information based MPE offers computational advantages
THANKS !