Winning concurrent reachability games requires doubly-exponential patience
Michal Koucký, IM AS CR, Prague
Kristoffer Arnsfelt Hansen, Peter Bro Miltersen
Aarhus U., Denmark
Example
Player 1 chooses A ∈ {t,h}.
Player 2 chooses B ∈ {t,h}.
If A = B then move one level up, A ≠ B = t then move to 1st level, A ≠ B = h then Player 1 loses.
Entrance fee: $15
Win: $20
[Figure: ladder of levels 1 through 7, with the winning state W above level 7.]
Entrance fee: $15, Win: $20
Observation: To break even, you need at least ¾ probability to win.
Good news: you can win with probability arbitrarily close to 1.
Bad news: the expected time to win the game with probability at least ¾ is 10^25 years (one move per day).
… the age of the universe: 10^11 years
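The ¾ threshold in the observation follows from a one-line expected-value calculation. This is a sketch assuming the $15 fee is paid up front and a win pays out $20; the symbol p (the probability of winning) is notation introduced here, not taken from the slide.

```latex
\mathbb{E}[\text{profit}] \;=\; 20\,p - 15 \;\ge\; 0
\quad\Longleftrightarrow\quad
p \;\ge\; \frac{15}{20} \;=\; \frac{3}{4}
```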
Concurrent reachability games [de Alfaro, Henzinger, Kupferman ’98, Everett ’57]
Two players play on a graph of states. At each step they simultaneously (independently) pick one of the possible actions each, and based on a transition table move to the next state.
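A single step of such a game can be sketched as follows. This is a minimal illustration, not the authors' formalism: the names `sample_action` and `step`, and the dictionary encoding of the transition table and of the strategies, are assumptions made here.

```python
import random

def sample_action(distribution, rng):
    """Draw one action from a dict {action: probability}."""
    actions = list(distribution)
    weights = [distribution[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def step(state, transition, strategy1, strategy2, rng):
    """Play one move of a concurrent game.

    transition: dict mapping (state, action1, action2) -> next state
    strategy1, strategy2: dicts mapping state -> {action: probability}
    Both players draw their actions independently, then the table is consulted.
    """
    a1 = sample_action(strategy1[state], rng)
    a2 = sample_action(strategy2[state], rng)
    return transition[(state, a1, a2)]
```

The two draws are made before the table lookup, which mirrors the requirement that the players choose simultaneously and independently.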
Goals: Player 1 wants to reach a specific state or states.
Player 2 wants to prevent Player 1 from reaching these states.
Strategy of a player: Memory-less (non-adaptive) – π : states → actions. Adaptive – π : history → actions.
Probabilistic strategy: π gives a probability distribution over the possible actions.
Patience of a memory-less strategy π = 1/(min non-zero prob. in π) … [Everett ’57]
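The patience definition translates directly into code. The sketch below assumes a memory-less strategy is stored as a dict from state to an {action: probability} dict; that encoding and the function name `patience` are choices made here for illustration.

```python
from fractions import Fraction

def patience(strategy):
    """Return 1 / (smallest non-zero probability used anywhere in the strategy)."""
    nonzero = [p for dist in strategy.values() for p in dist.values() if p > 0]
    return 1 / min(nonzero)

# A strategy that plays "t" rarely in state 1 has high patience.
example = {
    1: {"t": Fraction(1, 1000), "h": Fraction(999, 1000)},
    2: {"t": Fraction(1, 10), "h": Fraction(9, 10)},
}
print(patience(example))  # 1000
```

Exact fractions are used so that very small probabilities are not lost to floating-point rounding.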
Winning starting states:
Sure – Player 1 has a winning strategy that never fails.
Almost-Sure – Player 1 has a randomized strategy that reaches the goal with probability 1.
Limit-Sure – For every ε > 0, Player 1 has a strategy that reaches the goal with probability at least 1 – ε.
Purgatory_n
Player 1 chooses A ∈ {t,h}.
Player 2 chooses B ∈ {t,h}.
If A = B then move one level up, A ≠ B = t then move to 1st level, A ≠ B = h then move to state H.
[Figure: ladder of levels 1, 2, 3, …, n-1, n with state P on top, and a separate state H.]
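The rules of Purgatory_n fit in a few lines of code. This is a sketch under one assumption read off the figure rather than stated in the text: moving up from level n leads to the state P. The function name `purgatory_step` is hypothetical.

```python
def purgatory_step(level, a, b, n):
    """Next state of Purgatory_n from `level` (1..n) when Player 1 plays a and Player 2 plays b."""
    if a not in ("t", "h") or b not in ("t", "h"):
        raise ValueError("actions must be 't' or 'h'")
    if not 1 <= level <= n:
        raise ValueError("level must be between 1 and n")
    if a == b:
        # A = B: one level up (assumption: the level above n is the state P)
        return "P" if level == n else level + 1
    if b == "t":
        # A != B = t: back to the 1st level
        return 1
    # A != B = h: move to state H
    return "H"

# Replaying a fixed sequence of moves in Purgatory_3, starting at level 1.
state = 1
for a, b in [("h", "h"), ("t", "t"), ("h", "t")]:
    state = purgatory_step(state, a, b, n=3)
print(state)  # 1
```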
Our results
Thm: 1) For every 0 < ε < ½, any ε-optimal strategy of Player 1 in Purgatory_n is of patience > 1/ε^{2^{n-2}}.
2) For every l < n/2, any (1 – 2^{-l})-optimal strategy of Player 1 in Purgatory_n is of patience > 2^{2^{n-l-2}}.
Thm: For every 0 < ε < ½ and every concurrent reachability game with m > 61 actions in total, both players have ε-optimal strategies with patience < 1/ε^{2^{42m}}.
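To get a feel for how quickly the two lower bounds grow, they can be evaluated with exact integer arithmetic. The sketch below only evaluates the formulas as stated in the first theorem; the function names are hypothetical, and l is assumed to be a positive integer.

```python
from fractions import Fraction

def patience_bound_eps(eps, n):
    """Lower bound (1/eps)^(2^(n-2)) for eps-optimal strategies in Purgatory_n."""
    eps = Fraction(eps)
    if not 0 < eps < Fraction(1, 2):
        raise ValueError("need 0 < eps < 1/2")
    if n < 2:
        raise ValueError("need n >= 2")
    return (1 / eps) ** (2 ** (n - 2))

def patience_bound_l(l, n):
    """Lower bound 2^(2^(n-l-2)) for (1 - 2^-l)-optimal strategies in Purgatory_n."""
    if not 1 <= l < n / 2:
        raise ValueError("need 1 <= l < n/2")
    return 2 ** (2 ** (n - l - 2))

print(patience_bound_eps(Fraction(1, 4), 7))  # 18446744073709551616
print(patience_bound_l(2, 7))                 # 256
```

Adding a single level squares the first bound, which is what makes the growth doubly exponential in n.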
Thm: For every 0 < ε < ε’, if every ε-optimal strategy of Player 1 is of patience > t then the expected time to win the game by any ε’-optimal strategy of Player 1 can be forced to be Ω(t).
patience ~ expected time to win
All the results essentially hold also for adaptive strategies.
Recall: the expected time to win Purgatory_7 with probability at least ¾ is 10^25 years (one move per day).
Algorithmic consequences
Three algorithmic questions:
1. What are *-SURE states? PTIME [dAHK]
2. What are the winning probabilities of different states? PSPACE [EY]
3. What is the (ε-)optimal strategy? EXP-EXP-TIME upper-bound [CdAH,…]
EXP-SPACE lower-bound [our results]
Cor: Any algorithm that manipulates winning strategies in explicit representation must use exponential space.
… explicit representation: integer fractions
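One way to read the corollary: if a strategy must use some non-zero probability a/b below ε^{2^{n-2}}, then its denominator b exceeds (1/ε)^{2^{n-2}}, so writing b down takes a number of bits exponential in n. The sketch below counts those bits for Purgatory_n; the function name is hypothetical and the argument is an illustration of the corollary, not the authors' proof.

```python
from fractions import Fraction

def min_denominator_bits(eps, n):
    """Bits needed for the smallest integer exceeding the bound (1/eps)^(2^(n-2))."""
    threshold = (1 / Fraction(eps)) ** (2 ** (n - 2))
    # Smallest integer strictly greater than the threshold.
    smallest_denominator = threshold.numerator // threshold.denominator + 1
    return smallest_denominator.bit_length()

for n in (3, 4, 5, 7, 10):
    print(n, min_denominator_bits(Fraction(1, 4), n))
# 3 5
# 4 9
# 5 17
# 7 65
# 10 513
```

For ε = ¼ the count roughly doubles with every added level, so no polynomial number of bits in n suffices.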
Purgatory_n
p_i – probability of playing t in state i in an ε-optimal strategy of Player 1.
Claim: 1) 0 < p_i < 1, for all i.
2) p_i < ε, for all i.
3) p_1 ≤ p_2 · p_3 ⋯ p_n
4) p_i ≤ p_{i+1} · p_{i+2} ⋯ p_n
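The claims combine into the doubly-exponential bound by a short induction from the top level down. The sketch below uses only claims 1, 2 and 4; the index k (the distance from level n) is notation introduced here, and the induction itself is not spelled out on the slide.

```latex
\begin{align*}
p_n &< \varepsilon, \\
p_{n-1} &\le p_n < \varepsilon = \varepsilon^{2^{0}}, \\
p_{n-2} &\le p_{n-1}\, p_n < \varepsilon \cdot \varepsilon = \varepsilon^{2^{1}}, \\
p_{n-k} &\le \prod_{j=0}^{k-1} p_{n-j}
         < \varepsilon \cdot \prod_{j=1}^{k-1} \varepsilon^{2^{j-1}}
         = \varepsilon^{\,1 + (2^{k-1} - 1)}
         = \varepsilon^{2^{k-1}}
         \qquad (1 \le k \le n-1), \\
p_1 &< \varepsilon^{2^{n-2}}
\quad\Longrightarrow\quad
\text{patience} \;\ge\; \frac{1}{p_1} \;>\; \frac{1}{\varepsilon^{2^{n-2}}}.
\end{align*}
```

The last line takes k = n - 1 and uses claim 1 (p_1 > 0), so p_1 counts as a non-zero probability in the definition of patience.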
[Figure: ladder of states 1, 2, 3, …, n-1, n with P on top, with labels p_1, p_2, p_3, …, p_{n-1}, p_n, t, "Player 2 plays h" and "Player 2 plays t".]
Transition table (rows: Player 1's action; columns: Player 2's action):
1\2   t          h
t     level+1    loss
h     level=1    level+1
Open problems
Generic algorithm for ε-optimal strategy with symbolic representation?
How to redefine the game to be more realistic?