Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
Rosemary Emery-Montemerlo
joint work with
Geoff Gordon, Jeff Schneider and Sebastian Thrun
July 21, 2004 AAMAS 2004
Robot Teams
With limited communication, existing paradigms for decentralized robot control are not sufficient
Game theoretic methods are necessary for multi-robot coordination under these conditions
Decentralized Decision Making
A robot cannot choose actions based only on joint observations consistent with its own sensor readings
It must consider all joint observations that are consistent with its possible sensor readings
Relationship Between Decision Theoretic Models
[Diagram relating MDP (state space), POMDP (belief space), and "?" (distribution over belief space)]
Models of Multi-Agent Systems
Partially observable stochastic games
  Generalization of stochastic games to partially observable worlds
Related models
  DEC-POMDP [Bernstein et al., 2000]
  MTDP [Pynadath and Tambe, 2002]
  I-POMDP [Gmytrasiewicz and Doshi, 2004]
  POIPSG [Peshkin et al., 2000]
Partially Observable Stochastic Games
$POSG = \{I, S, A, Z, T, R, O\}$
I is the set of agents, $I = \{1, \dots, n\}$
S is the set of states
A is the set of actions, $A = A_1 \times \dots \times A_n$
Z is the set of observations, $Z = Z_1 \times \dots \times Z_n$
T is the transition function, $T: S \times A \to S$
R is the reward function, $R: S \times A \to \mathbb{R}$
O are the observation emission probabilities, $O: S \times Z \times A \to [0,1]$
Solving POSGs
Solving POSGs exactly is computationally infeasible for all but the smallest problems
[Diagram: the full POSG alongside a one-step lookahead game at time t (a Bayesian game)]
We can approximate a POSG as a series of smaller Bayesian games
Bayesian Games
Private information relevant to game
  Uncertainty in utility
Type
  Encapsulates private information
  Will limit selves to games with finite number of types
In robot example
  Type 1: Robot doesn’t see anything
  Type 2: Robot sees intruder at location x
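Bayesian Games
$BG = \{I, \Theta, A, p(\Theta), u\}$
$\Theta$ is the joint type space, $\Theta = \Theta_1 \times \dots \times \Theta_n$
  $\theta$ is a specific joint type, $\theta = \{\theta_1, \dots, \theta_n\}$
$p(\Theta)$ is the common prior on the distribution over $\Theta$
$u$ is the utility function, $u = \{u_1, \dots, u_n\}$
  $u_i(a_i, a_{-i}, (\theta_i, \theta_{-i}))$
$\sigma_i$ is a strategy for player i
  Defines what player i does for each of its possible types
  Actions are individual actions, not joint actions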
Bayesian-Nash Equilibrium
Set of best response strategies
Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents’ types, $p(\Theta)$
Each agent has a policy $\sigma_i$ that, given $\sigma_{-i}$, maximizes $u_i(\sigma_i, \sigma_{-i}, \theta_{-i})$
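When all agents share one utility function, one common way to reach a set of mutual best responses is alternating maximization: hold one agent's strategy fixed, compute the other agent's best action for each of its types, and repeat until nothing changes. The sketch below assumes two agents, finite type sets, and a toy prior and payoff invented purely for illustration; the names `best_response`, `alternating_maximization`, and the "patrol"/"respond" actions are hypothetical and are not taken from the talk.

```python
def best_response(player, other_strategy, types, actions, prior, utility):
    """Best action for each type of `player` (0 or 1), other agent's strategy fixed."""
    other = 1 - player
    response = {}
    for my_type in types[player]:
        def expected_utility(action):
            total = 0.0
            for other_type in types[other]:
                other_action = other_strategy[other_type]
                if player == 0:
                    joint_type = (my_type, other_type)
                    joint_action = (action, other_action)
                else:
                    joint_type = (other_type, my_type)
                    joint_action = (other_action, action)
                # Weighting by the joint prior is proportional to weighting by
                # p(other_type | my_type), so the maximizing action is the same.
                total += prior.get(joint_type, 0.0) * utility(joint_action, joint_type)
            return total
        response[my_type] = max(actions[player], key=expected_utility)
    return response


def alternating_maximization(types, actions, prior, utility, max_iters=100):
    """Iterate best responses until neither agent changes its strategy."""
    strategies = [{t: actions[p][0] for t in types[p]} for p in (0, 1)]
    for _ in range(max_iters):
        changed = False
        for player in (0, 1):
            new = best_response(player, strategies[1 - player],
                                types, actions, prior, utility)
            if new != strategies[player]:
                strategies[player] = new
                changed = True
        if not changed:
            break
    return strategies


# Toy common-payoff game (assumed for illustration only).
types = [["clear", "intruder"], ["clear", "intruder"]]
actions = [["patrol", "respond"], ["patrol", "respond"]]
prior = {(a, b): 0.25 for a in types[0] for b in types[1]}


def utility(joint_action, joint_type):
    value = 0.0
    for action, seen in zip(joint_action, joint_type):
        if action == "respond":
            value += 2.0 if seen == "intruder" else -1.0
    if joint_action == ("respond", "respond"):
        value -= 1.0  # small shared cost when both leave their patrol
    return value


strategies = alternating_maximization(types, actions, prior, utility)
```

Because the payoff is shared, each best-response step cannot lower the team's expected utility, which is why the loop settles; it may settle on a local optimum rather than the best joint strategy.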
POSG to Bayesian Game Approximation
$\{I, S, A, Z, T, R, O\}$ to $\{I, \Theta, A, p(\Theta), u\}^t$
I = I
A = A
Type space $\Theta_i^t$ = all possible histories of agent i’s actions and observations up to time t
$p(\Theta)^t$ calculated from $S_0$, A, T, Z, O, $\Theta^{t-1}$
  Prune low probability types
  Each joint type maps to a joint belief
u given by heuristic and $u_i = u_j$
  QMDP
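The QMDP heuristic scores an action under a belief by averaging the Q-values of the underlying fully observable MDP, weighted by the belief. A minimal sketch follows, assuming a tiny single-agent MDP with dictionary tables made up for illustration; in the multi-robot setting the belief would be the joint belief attached to a joint type and the action a joint action. The function names `mdp_q_values` and `qmdp_utility` are hypothetical.

```python
def mdp_q_values(states, actions, transition, reward, gamma=0.95, sweeps=100):
    """Value iteration on the fully observable MDP; returns q[state][action]."""
    value = {s: 0.0 for s in states}
    q = {}
    for _ in range(sweeps):
        q = {
            s: {
                a: reward[s][a]
                + gamma * sum(p * value[s2] for s2, p in transition[s][a].items())
                for a in actions
            }
            for s in states
        }
        value = {s: max(q[s].values()) for s in states}
    return q


def qmdp_utility(belief, q, action):
    """Belief-weighted average of the MDP Q-values for one action."""
    return sum(prob * q[s][action] for s, prob in belief.items())


# Toy problem (assumed): a pursuer is "far" from or "near" its target.
states = ["far", "near", "done"]
actions = ["move", "tag"]
transition = {
    "far": {"move": {"near": 1.0}, "tag": {"far": 1.0}},
    "near": {"move": {"near": 1.0}, "tag": {"done": 1.0}},
    "done": {"move": {"done": 1.0}, "tag": {"done": 1.0}},
}
reward = {
    "far": {"move": -1.0, "tag": -10.0},
    "near": {"move": -1.0, "tag": 0.0},
    "done": {"move": 0.0, "tag": 0.0},
}

q = mdp_q_values(states, actions, transition, reward)
uncertain = {"far": 0.5, "near": 0.5}
certain = {"near": 1.0}
best_when_uncertain = max(actions, key=lambda a: qmdp_utility(uncertain, q, a))
best_when_certain = max(actions, key=lambda a: qmdp_utility(certain, q, a))
```

Note that QMDP treats the state as becoming fully observable after one step, so it never values information-gathering actions; that is a property of the heuristic itself, not of this toy example.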
Algorithm
Each agent runs the same loop in parallel; the steps for agent i are listed below, and agent j follows them with j in place of i.
Initialize: $t = 0$, $h_i = \{\}$, $p(\Theta^0)$, $\sigma^0 = \text{solveGame}(\Theta^0, p(\Theta^0))$
Make Observation: $h_i = \text{obs}_i^t \cup a_i^{t-1} \cup h_i$
Determine Type: $\theta_i^t = \text{bestMatch}(h_i, \Theta_i^t)$
Execute Action: $a_i^t = \sigma_i^t(\theta_i^t)$
Propagate Forward: $\Theta^{t+1}$, $p(\Theta^{t+1})$
Find Policy for t+1: $\sigma^{t+1} = \text{solveGame}(\Theta^{t+1}, p(\Theta^{t+1}))$, then $t = t + 1$
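The per-agent loop can be sketched as a small driver that receives the game-specific pieces as callables. This is only a skeleton under stated assumptions: the parameter names mirror the step names on the slide, but their signatures, the tuple-based history, and the stub implementations below are all invented for illustration and are not the authors' code.

```python
def run_agent(agent, horizon, initial_game, solve_game, propagate,
              best_match, observe, execute):
    """Run the six-step loop for one agent and return the actions it took."""
    type_space, prior = initial_game           # joint type space and prior at t = 0
    policy = solve_game(type_space, prior)     # policy[agent][type] -> action
    history = ()                               # local history starts empty
    last_action = None
    trace = []
    for t in range(horizon):
        # Make Observation: append (previous action, new observation).
        history = history + ((last_action, observe(agent, t)),)
        # Determine Type: map the local history onto a retained type.
        my_type = best_match(history, type_space[agent])
        # Execute Action: look up this agent's action for its type.
        last_action = policy[agent][my_type]
        execute(agent, last_action)
        trace.append(last_action)
        # Propagate Forward: build the next type space and prior.
        type_space, prior = propagate(type_space, prior, policy)
        # Find Policy for t+1.
        policy = solve_game(type_space, prior)
    return trace


# --- Stubs, purely illustrative ---
TYPES = {0: ["clear", "intruder"], 1: ["clear", "intruder"]}
FIXED_POLICY = {a: {"clear": "patrol", "intruder": "respond"} for a in TYPES}
SENSOR_LOG = ["clear", "intruder", "clear"]
executed = []


def solve_game(type_space, prior):
    return FIXED_POLICY                        # a real solver would optimize here


def propagate(type_space, prior, policy):
    return type_space, prior                   # a real version would grow and prune types


def best_match(history, agent_types):
    _, latest_observation = history[-1]
    return latest_observation if latest_observation in agent_types else agent_types[0]


def observe(agent, t):
    return SENSOR_LOG[t]


def execute(agent, action):
    executed.append((agent, action))


trace = run_agent(0, 3, (TYPES, None), solve_game, propagate,
                  best_match, observe, execute)
```

Since every agent propagates the same joint type space and solves the same game, agents can arrive at matching policies without exchanging messages, provided their solvers are deterministic.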
Robotic Team Tag
Version of Team Tag
  Environment is portion of Gates Hall
  Full teammate observability
  Opponent can be captured by a single robot in any state
QMDP used as heuristic
Two Pioneer-class robots
Robot Policies
Lady And The Tiger [Nair et al. 2003]
[Chart: Computation Time. x-axis: Horizon (3 to 10); y-axis: Time (ms), 0 to 1,000,000; series: Full POSG, Bayesian Game Approximation]
Contributions
Algorithm for finding approximate solutions to POSGs with common payoffs
  Tractability achieved by modeling POSG as a sequence of Bayesian games
Performs comparably to the full POSG for a small finite-horizon problem
Improved performance over ‘blind’ application of utility heuristic in more complex problems
Successful real-time game-theoretic controller for indoor robots
Questions?
[email protected] www.cs.cmu.edu/~remery
Back-Up Slides
Policy Performance
[Chart: Lady And The Tiger [Nair et al. 2003]. x-axis: Horizon (3 to 10); y-axis: Expected or Average Reward (-80 to 20); series: Full POSG, Bayesian Game Approximation, Selfish Policy]
Robotic Team Tag
$I = \{1, 2\}$
$S = S_1 \times S_2 \times S_{opponent}$
  $S_i = \{s_0, \dots, s_{28}\}$, $S_{opponent} = \{s_0, \dots, s_{28}, s_{tagged}\}$
  $|S| = 25230$
$A_i = \{N, S, E, W, Tag\}$
$Z_i = [\{s_i, -1\}, s_{-i}, a_{-i}]$
T: adjacent cells
O: see opponent if on same cell
R: minimize capture time
Modified from [Pineau et al. 2003]
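The state count follows directly from the factored state space: each robot occupies one of 29 cells, and the opponent occupies one of the same 29 cells or is in the tagged state.

```latex
|S| = |S_1| \cdot |S_2| \cdot |S_{opponent}| = 29 \cdot 29 \cdot 30 = 25230
```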
Environment
Performance
[Chart: Robotic Team Tag Results. y-axis: Average Discounted Value (-60 to 0); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]
Performance
[Chart: Robotic Team Tag Results. y-axis: Average Timesteps (0 to 100); groups: Full Observability of Teammate's Position, Without Full Observability of Teammate's Position; series: Full Observability, Most Likely State, QMDP, Bayesian Approximation]