csr2011 june14 16_30_ibsen-jensen
Post on 21-Oct-2014
The complexity of solving reachability games using value and strategy iteration
Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen
Aarhus University, Denmark
CSR 2011, 14th June
Overview
What are concurrent reachability games?
Two standard algorithms for solving concurrent reachability games:
  The value iteration algorithm
  The strategy iteration algorithm
Examples illustrating important facts used in the proofs of the time lower bounds for both algorithms
Matrix games (von Neumann 1928)
 0 -1  1
 1  0 -1
-1  1  0
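As a small illustration of what solving such a matrix game means, the sketch below checks that the uniform mixed strategy secures payoff 0 for both players in the 3x3 matrix above. It assumes the usual convention, which the slide does not state, that the row player maximizes and the column player minimizes; the helper names are made up for this example.

```python
from fractions import Fraction

# Payoff matrix from the slide. Assumed convention: entry [i][j] is the payoff
# to the (maximizing) row player when row i meets column j.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def payoff_against_each_column(matrix, row_strategy):
    """Expected payoff of a mixed row strategy against every pure column."""
    n_cols = len(matrix[0])
    return [sum(p * row[j] for p, row in zip(row_strategy, matrix))
            for j in range(n_cols)]

def payoff_against_each_row(matrix, col_strategy):
    """Expected payoff of every pure row against a mixed column strategy."""
    return [sum(q * a for q, a in zip(col_strategy, row)) for row in matrix]

uniform = [Fraction(1, 3)] * 3
# The row player can secure at least this much by playing uniformly ...
row_guarantee = min(payoff_against_each_column(A, uniform))
# ... and the column player can hold the row player to at most this much.
col_guarantee = max(payoff_against_each_row(A, uniform))
print(row_guarantee, col_guarantee)
```

Because the two guarantees coincide, that common number is the value of the matrix game under the stated convention.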
Concurrent reachability games (Everett 1957 / de Alfaro, Henzinger, Kupferman 1998)
Dante* vs. Lucifer*
Each entry can be either 0, 1 or a pointer
[Figure: the running example, four 3x3 matrices; S marks the start matrix. Pointers are drawn as arrows, so only the entries below survive in the text.]
Matrix containing the 1 entries:
1 S S
0 1 S
0 0 1
Each of the other three matrices (remaining entries are arrows in the figure):
S S
0 S
0 0
* Naming convention from Hansen, Koucky and Miltersen, 2009
Histories and strategies
History: A sequence of positions, together with the choices made by each player in each position.
Strategy: A map from histories to probability distributions over the choices available in the position reached after the history.
S_1: Set of strategies for Dante
S_2: Set of strategies for Lucifer
H_1/H_2: Sets of stationary strategies for Dante/Lucifer (strategies that depend only on the position reached after the history)
Payoffs
v(i,σ,π): The probability of eventually reaching a 1 from position i, if Dante plays strategy σ and Lucifer plays strategy π.
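Everett 1957
Value of position i:
$$\forall i:\quad v_i = \sup_{\sigma \in S_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi)$$
$$\forall i:\quad v_i = \sup_{\sigma \in H_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in H_1} v(i,\sigma,\pi)$$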
Algorithmic problems
Quantitatively solving a game: Given the game, compute the value of every position.
Strategically solving a game: Given the game and ε > 0, compute a strategy σ such that for all π and i: v(i,σ,π) > v_i − ε.
Value iteration (Shapley 1953)
G_t: A modified version of G in which Dante loses after t moves.
Value iteration computes the value of each position in G_t in iteration t, on the basis of the values of the positions in G_{t-1}.
Our results: Lower bound for value iteration
There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, such that:
val(G) = 1 and val(G_t) ≤ 3m^{-N/2} for t = 2^{m^{N/2}}
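To get a feel for the size of these quantities, the sketch below evaluates the lower bound for a few small parameters. It reads the bound as val(G_t) ≤ 3·m^(-N/2) at t = 2^(m^(N/2)), and it assumes N is even so that the exponent is an integer; the function name is hypothetical.

```python
from fractions import Fraction

def value_iteration_lower_bound(N, m):
    """Return (t, bound) with t = 2**(m**(N/2)) and bound = 3 * m**(-N/2).

    Assumption for this sketch: N is even, so m**(N/2) is an integer.
    """
    if N % 2 != 0:
        raise ValueError("this sketch assumes an even N")
    root = m ** (N // 2)
    return 2 ** root, Fraction(3, root)

for N, m in [(4, 3), (6, 2), (8, 2)]:
    t, bound = value_iteration_lower_bound(N, m)
    print(f"N={N}, m={m}: t={t}, value bound={bound}")
```

Even for these tiny games the iteration count is in the hundreds or tens of thousands while the bound on the computed value stays far below the true value 1.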
Our results: Upper bound for value iteration
For any concurrent reachability game G: val(G) − val(G_t) < ε for t = (1/ε)^{m^{O(N)}}
Value iteration example: G_0 to G_9
[Figure: the four-matrix example game, annotated in each step with the value of every matrix in G_t.]
Values shown on the slides, ordered from the matrix containing the 1 entries to the start matrix S:
t    1-matrix   next      next      start S
0    0          0         0         0
1    0.33333    0         0         0
2    0.33333    0.11111   0         0
3    0.33333    0.11111   0.03704   0
4    0.33333    0.11111   0.03704   0.01235
5    0.33748    0.11533   0.04147   0.01754
6    0.33925    0.11855   0.04493   0.02172
7    0.34068    0.12064   0.04772   0.02519
8    0.34187    0.12388   0.04991   0.02815
9    0.34378    0.12517   0.05129   0.03070
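The first steps of the example can be reproduced with a short sketch. It rests on assumptions that the slides do not spell out at this point: the example game is taken to be Generalized Purgatory with N = 4 matrices and m = 3, where a correct guess moves to the next matrix (or wins in the last one), an undershoot returns to the start S, and an overshoot loses. Under that reading every matrix has the next matrix's value a on the diagonal, the start value s above it and 0 below it, and equalizing the columns gives the closed form used in `upper_triangular_game_value`, which is valid when 0 ≤ s ≤ a. All function names are hypothetical.

```python
def upper_triangular_game_value(a, s, m):
    """Value of the m x m matrix game with a on the diagonal, s above the
    diagonal and 0 below it, assuming 0 <= s <= a.

    Equalizing the columns (and, symmetrically, the rows) gives
    value = a / sum_{k < m} (1 - s/a)**k.
    """
    if a == 0:
        return 0.0
    r = 1.0 - s / a
    return a / sum(r ** k for k in range(m))

def value_iteration(N, m, iterations):
    """Return the value vectors of G_0, ..., G_iterations.

    Index 0 is the start matrix S, index N-1 the matrix containing the 1s.
    """
    values = [0.0] * N          # G_0: Dante has no moves left, so he loses
    history = [values[:]]
    for _ in range(iterations):
        start = values[0]
        new_values = []
        for k in range(N):
            # A correct guess leads to the next matrix, or to a win (1)
            # from the last matrix.
            nxt = 1.0 if k == N - 1 else values[k + 1]
            new_values.append(upper_triangular_game_value(nxt, start, m))
        values = new_values
        history.append(values[:])
    return history

history = value_iteration(N=4, m=3, iterations=5)
for t, vals in enumerate(history):
    # Print from the 1-matrix down to the start matrix, as on the slides.
    print(t, [round(v, 5) for v in reversed(vals)])
```

The values stay at 1/3, 1/9, 1/27, 1/81 until the start matrix first gets a nonzero value in G_4; only from G_5 on do all four values begin to creep upward.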
Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06)
Was conjectured to be fast
Our results: Upper bound for strategy iteration
An ε-optimal strategy is computed after t = (1/ε)^{m^{O(N)}} iterations of strategy iteration.
This follows from the corresponding result for value iteration.
Our results: Lower bound for strategy iteration
There exists a concurrent reachability game G, with N matrices, for large N, and m rows and columns in each matrix, such that:
val(G) = 1 and the strategy obtained by strategy iteration guarantees winning probability at most 4m^{-N/2} for t = 2^{m^{N/4}}
Strategy iteration, m=2
N    Number of iterations needed to get over 1/2
7    18446744073709551617
8    340282366920938463463374607431768211457
9    115792089237316195423570985008687907853269984665640564039457584007913129639937
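The three iteration counts in the table happen to coincide with 2^(2^(N-1)) + 1. This is only an observation about the listed numbers, not a formula stated on the slide; the check below uses Python's arbitrary-precision integers.

```python
# Iteration counts copied from the table (m = 2).
table = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}

for N, iterations in table.items():
    # Compare each entry with the doubly exponential expression 2^(2^(N-1)) + 1.
    matches = iterations == 2 ** (2 ** (N - 1)) + 1
    print(N, matches)
```

Each step from N to N+1 roughly squares the count, which is the doubly exponential growth the lower bound describes.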
Strategy iteration: Before iteration 1
1. Start strategy for Dante := Uniform
[Figure: the four-matrix example game; every row of every matrix is played with probability 0.33333.]
Strategy iteration: Iteration 1
1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante
[Figure: the example game under Lucifer's best response. The numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used.]
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.33333, 0.11111, 0.03704, 0.01235
Dante's strategy after step 3, one matrix per line in the same order:
0.33748 0.33332 0.32920
0.34599 0.33317 0.32084
0.37327 0.33180 0.29493
0.47368 0.31579 0.21053
Strategy iteration: Iteration 2
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.33748, 0.11677, 0.04359, 0.02065
Dante's strategy after step 3, one matrix per line in the same order:
0.34031 0.33329 0.32640
0.35458 0.33289 0.31253
0.39987 0.33180 0.32917
0.55453 0.29186 0.15361
Strategy iteration: Iteration 3
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.34031, 0.12067, 0.04825, 0.02676
Dante's strategy after step 3, one matrix per line in the same order:
0.34241 0.33325 0.32434
0.36097 0.33259 0.30644
0.41947 0.32646 0.25407
0.60831 0.27098 0.12071
Strategy iteration: Iteration 4
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.34241, 0.12360, 0.05185, 0.03154
Dante's strategy after step 3, one matrix per line in the same order:
0.34407 0.33322 0.32271
0.36601 0.33230 0.30169
0.43486 0.32390 0.24125
0.64720 0.25350 0.09930
Strategy iteration: Iteration 5
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.34407, 0.12593, 0.05476, 0.03544
Dante's strategy after step 3, one matrix per line in the same order:
0.34543 0.33319 0.32138
0.37015 0.33202 0.29783
0.44745 0.32152 0.23103
0.67692 0.23882 0.08426
Strategy iteration: Iteration 6
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.34543, 0.12786, 0.05721, 0.03873
Dante's strategy after step 3, one matrix per line in the same order:
0.34658 0.33316 0.32026
0.37366 0.33177 0.29457
0.45807 0.31933 0.22260
0.70055 0.22633 0.07312
Strategy iteration: Iteration 7
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.34658, 0.12950, 0.05932, 0.04156
Dante's strategy after step 3, one matrix per line in the same order:
0.34758 0.33313 0.31929
0.37670 0.33153 0.29177
0.46723 0.31730 0.21547
0.71988 0.21557 0.06455
Strategy iteration: Iteration 8
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.34758, 0.13093, 0.06118, 0.04404
Dante's strategy after step 3, one matrix per line in the same order:
0.34845 0.33311 0.31844
0.37937 0.33130 0.28933
0.47527 0.31541 0.20932
0.73606 0.20618 0.05776
Strategy iteration: Iteration 9
Values calculated in step 2, from the matrix containing the 1 entries to the start matrix S: 0.34845, 0.13219, 0.06283, 0.04624
Dante's strategy after step 3, one matrix per line in the same order:
0.34923 0.33309 0.31768
0.38176 0.33109 0.28715
0.48241 0.31366 0.20393
0.74985 0.19791 0.05224
Generalized Purgatory P(N,m)
Lucifer repeatedly hides a number between 1 and m.
Dante must try to guess the number.
If he guesses correctly N times in a row, he goes to heaven.
If he ever guesses incorrectly, overshooting Lucifer’s number, he goes to hell.
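The rules above can be turned into the matrix form used earlier in the talk. The sketch below is one possible encoding, under assumptions the slide leaves implicit: rows are Dante's guesses, columns are Lucifer's hidden numbers, positions are numbered 1 (start) to N, and an undershoot sends Dante back to the start S (the slide only states the overshoot case; the restart follows from needing N correct guesses in a row). The function name and the `->k` pointer notation are made up for this example.

```python
def purgatory_matrix(position, N, m):
    """Symbolic m x m matrix for one position of P(N, m).

    Entries: "1" (heaven), "0" (hell), "S" (pointer back to the start),
    "->k" (pointer to position k). Rows are Dante's guesses and columns are
    Lucifer's hidden numbers; both conventions are assumptions of this sketch.
    """
    matrix = []
    for guess in range(1, m + 1):
        row = []
        for hidden in range(1, m + 1):
            if guess == hidden:
                # Correct guess: advance, or win after the N-th correct guess.
                row.append("1" if position == N else f"->{position + 1}")
            elif guess > hidden:
                # Overshooting Lucifer's number: hell.
                row.append("0")
            else:
                # Undershooting: assumed to restart the streak at S.
                row.append("S")
        matrix.append(row)
    return matrix

for position in range(1, 5):
    print(position, purgatory_matrix(position, N=4, m=3))
```

For N = 4 and m = 3 the last position comes out as the rows (1 S S), (0 1 S), (0 0 1), matching the matrix that survives in the slide text.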
Interesting fact
The probability that Dante goes to heaven from purgatory is nearly 1, if he plays well enough.
Exemplifying important facts
[Figure: three panels built from 2x2 matrices: value iteration on 1 matrix, strategy iteration on 1 matrix, strategy iteration on 3 matrices.]
Numbers shown on the slides (strategies are pairs of row probabilities, listed in slide order):
t = 0
  Value iteration on 1 matrix: value 0
  Strategy iteration on 1 matrix: start strategy 0.5/0.5
  Strategy iteration on 3 matrices: start strategies 0.5/0.5, 0.5/0.5, 0.5/0.5
t = 1
  Value iteration on 1 matrix: value 0.5
  Strategy iteration on 1 matrix: value 0.5; updated strategy 0.66667/0.33333
  Strategy iteration on 3 matrices: values 0.5, 0.25, 0.125; updated strategies 0.66667/0.33333, 0.57143/0.42857, 0.53333/0.46667
t = 2
  Value iteration on 1 matrix: value 0.66667
  Strategy iteration on 1 matrix: value 0.66667; updated strategy 0.75000/0.25000
  Strategy iteration on 3 matrices: values 0.53333, 0.30476, 0.20317; updated strategies 0.75000/0.25000, 0.61765/0.38235, 0.55654/0.44346
t = 3
  Value iteration on 1 matrix: value 0.75000
  Strategy iteration on 1 matrix: value 0.75000; updated strategy 0.80000/0.20000
  Strategy iteration on 3 matrices: values 0.55654, 0.34374, 0.25781; updated strategies 0.80000/0.20000, 0.65072/0.34928, 0.57399/0.42601
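The single-matrix strategy iteration numbers can be reproduced with a few lines. This is a sketch under stated assumptions rather than the algorithm as implemented by the authors: the matrix is taken to be the 2x2 game with rows (1, S) and (0, 1), Lucifer's best response to a stationary strategy that guesses 1 with probability 0 < p < 1 is assumed to be always hiding 1 (so Dante wins with probability p), and the update step replaces Dante's strategy by the column-equalizing strategy of the matrix game with S replaced by the current value. The function name is hypothetical.

```python
def strategy_iteration_single_matrix(iterations):
    """Strategy iteration on the single 2x2 matrix [[1, S], [0, 1]].

    Returns a list of (value, updated_strategy) pairs, where a strategy is
    (P[guess 1], P[guess 2]) for Dante.
    """
    p = 0.5  # start strategy for Dante: uniform
    trace = []
    for _ in range(iterations):
        # Steps 1 and 2: if Lucifer always hides 1, Dante wins exactly when
        # his guess is 1 and goes to hell otherwise, so the value is p.
        value = p
        # Step 3: optimal row strategy in the matrix game [[1, value], [0, 1]],
        # found by equalizing the two columns: p1 = p1 * value + (1 - p1).
        p = 1.0 / (2.0 - value)
        trace.append((value, (p, 1.0 - p)))
    return trace

for t, (value, strategy) in enumerate(strategy_iteration_single_matrix(3), start=1):
    print(t, round(value, 5), tuple(round(x, 5) for x in strategy))
```

In this single-matrix case the value after t iterations is t/(t+1), so it approaches the true value 1 only at rate 1/t.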
The end
Open problems: Find a fast algorithm for the problem
There exists a PSPACE algorithm for the problem, but it is not fast.
Thanks for listening