csr2011 june14 16_30_ibsen-jensen

The complexity of solving reachability games using value and strategy iteration

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen

Aarhus University, Denmark
CSR 2011, 14th June

Overview

What are concurrent reachability games?

Two standard algorithms for solving concurrent reachability games:
The value iteration algorithm
The strategy iteration algorithm

Exemplify important facts for the proof of the time lower bound for both algorithms

Matrix games (von Neumann 1928)

 0  -1   1
 1   0  -1
-1   1   0
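As a small illustration of what a mixed strategy guarantees in a matrix game, the sketch below computes the worst-case expected payoff of a row strategy for the 3x3 matrix on this slide. It assumes the entries are the row player's winnings; the function name `guaranteed_payoff` is made up for this example.

```python
from fractions import Fraction

# Payoff matrix from the slide, read as the row player's winnings
# (an assumption; the slide does not say which player the entries favour).
A = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def guaranteed_payoff(matrix, row_strategy):
    """Worst-case expected payoff of a mixed row strategy over all pure columns."""
    columns = range(len(matrix[0]))
    return min(
        sum(p * row[j] for p, row in zip(row_strategy, matrix))
        for j in columns
    )


uniform = [Fraction(1, 3)] * 3
always_first_row = [Fraction(1), Fraction(0), Fraction(0)]

print(guaranteed_payoff(A, uniform))           # worst case for the uniform mix
print(guaranteed_payoff(A, always_first_row))  # worst case for a pure strategy
```

The matrix is skew-symmetric, so neither player can guarantee more than 0; the uniform mix attains that, while any pure row can be exploited.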

Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998)

Dante* vs. Lucifer*

Each entry can be either 0, 1 or a pointer

* Naming convention from Hansen, Koucky and Miltersen, 2009

[Figure: the running example, a concurrent reachability game with four 3x3 matrices, one of them marked as the start matrix S. One matrix has rows (1 S S), (0 1 S), (0 0 1); the other three show S above the diagonal and 0 below it, with the diagonal entries drawn as pointers.]

Histories and strategies

History: Sequence of positions and choices for each player in each position.

Strategy: Map from histories to probability distributions over choices in the position we arrive at after the history

S1: Set of strategies for Dante

S2: Set of strategies for Lucifer

H1/H2: Sets of stationary strategies (strategies that depend only on the position we arrive at after the history)

Payoffs

v(i,σ,π): The probability of eventually reaching a 1 from position i, if Dante plays by strategy σ and Lucifer by π.
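For stationary strategies the payoff v(i,σ,π) solves a small linear system. The sketch below does this exactly for a game laid out like the four-matrix example, under an assumed layout that is not spelled out on this slide: a matching row and column moves to the next matrix (or to 1 from the last one), a row index below the column index returns to the start matrix, and a row index above it ends in 0. The function name `reach_probabilities` is illustrative.

```python
from fractions import Fraction


def reach_probabilities(dante, lucifer):
    """Probability of eventually reaching a 1 from each matrix, start matrix first.

    dante[k][i] and lucifer[k][j] are the probabilities of row i and column j
    in matrix k. Assumed layout: i == j advances, i < j restarts, i > j is 0.
    """
    n = len(dante)
    advance = [sum(p * q for p, q in zip(dante[k], lucifer[k])) for k in range(n)]
    restart = [
        sum(dante[k][i] * lucifer[k][j]
            for i in range(len(dante[k]))
            for j in range(i + 1, len(lucifer[k])))
        for k in range(n)
    ]
    # Write w[k] = alpha[k] + beta[k] * w[0] and fill in from the last matrix.
    alpha = [Fraction(0)] * (n + 1)
    beta = [Fraction(0)] * (n + 1)
    alpha[n] = Fraction(1)
    for k in reversed(range(n)):
        alpha[k] = advance[k] * alpha[k + 1]
        beta[k] = advance[k] * beta[k + 1] + restart[k]
    # If play returns to the start with probability 1, a 1 is never reached.
    w_start = alpha[0] / (1 - beta[0]) if beta[0] != 1 else Fraction(0)
    return [alpha[k] + beta[k] * w_start for k in range(n)]


third = Fraction(1, 3)
uniform = [[third] * 3 for _ in range(4)]
first_column = [[1, 0, 0] for _ in range(4)]

print(reach_probabilities(uniform, first_column))
print(reach_probabilities(uniform, uniform))
```

Using `Fraction` keeps the result exact, which makes it easy to compare with the rounded decimals shown on the example slides.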

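Everett 1957

Value of i:

v_i := \sup_{\sigma \in S_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi)

v_i = \sup_{\sigma \in H_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in H_1} v(i,\sigma,\pi)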

Algorithmic problems

Quantitatively solving a game: Given the game, compute the value of all positions.

Strategically solving a game: Given the game and ε>0, compute σ such that for all π and i: v(i,σ,π) > v_i - ε.

Value iteration (Shapley 1953)

G_t: A modified version of G, where Dante loses after t moves.

Value iteration computes the value of each position in G_t in iteration t, on the basis of the value of each position in G_{t-1}.
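The update from G_{t-1} to G_t can be sketched for a game shaped like the four-matrix example. This is a minimal sketch under stated assumptions, not the authors' implementation: each matrix is assumed to have a pointer to the next matrix on its diagonal (1 for the last matrix), a pointer to the start matrix above the diagonal and 0 below it. For such a matrix the value has a closed form, so no LP solver is needed. The names `matrix_value` and `value_iteration` are illustrative.

```python
def matrix_value(diagonal, above, m):
    """Value of the m x m matrix game with `diagonal` on the diagonal,
    `above` above it and 0 below it, for non-negative entries."""
    if diagonal <= 0:
        return 0.0
    ratio = min(above / diagonal, 1.0)
    # Dante's equalizing row weights are geometric: 1, (1-ratio), (1-ratio)^2, ...
    weights = [(1.0 - ratio) ** k for k in range(m)]
    return diagonal / sum(weights)


def value_iteration(n_matrices, m, steps):
    """Return the position values in G_0, ..., G_steps; index 0 is the start matrix."""
    values = [0.0] * n_matrices
    history = [values]
    for _ in range(steps):
        start = values[0]
        # Matrix k points to matrix k+1 on its diagonal; the last one points to 1.
        targets = values[1:] + [1.0]
        values = [matrix_value(d, start, m) for d in targets]
        history.append(values)
    return history


for t, row in enumerate(value_iteration(4, 3, 5)):
    print(t, [round(v, 5) for v in row])
```

With four 3x3 matrices the values rise by a factor of 3 per matrix during the first four iterations (1/3, 1/9, 1/27, 1/81) and then creep upward slowly, which is the behaviour the example slides step through.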

Our results: Lower bound for value iteration

There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, so that:

val(G) = 1 and val(G_t) ≤ 3m^{-N/2}, for t = 2^{m^{N/2}}
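To get a feel for the scale of this bound, the sketch below plugs small parameters into the two expressions, reading the flattened slide text as t = 2^(m^(N/2)) and 3·m^(-N/2). That reading of the stacked exponents is an assumption. The helper `lower_bound_instance` is hypothetical, only evaluates the formulas for even N, and does not check whether the theorem's hypotheses hold for such small N.

```python
def lower_bound_instance(n_matrices, m):
    """Return (t, cap) with t = 2^(m^(N/2)) and cap = 3 * m^(-N/2), for even N."""
    half = n_matrices // 2
    t = 2 ** (m ** half)
    cap = 3 / m ** half
    return t, cap


for n in (4, 6, 8, 10):
    t, cap = lower_bound_instance(n, 2)
    print(f"N={n}, m=2: t has {len(str(t))} decimal digits, cap on val(G_t) = {cap}")
```

The iteration count is doubly exponential in N while the cap shrinks only singly exponentially, so a handful of extra matrices already puts t far beyond any feasible running time.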

Our results: Upper bound for value iteration

For any concurrent reachability game G: val(G) - val(G_t) < ε for t = (1/ε)^{m^{O(N)}}

Value iteration example: G_0 to G_9

[Figure: the four-matrix example game, annotated step by step with the value of every position in G_t. Columns below follow the chain of diagonal pointers from the start matrix S to the matrix with 1 on its diagonal.]

 t   start S   2nd matrix   3rd matrix   last matrix
 0   0         0            0            0
 1   0         0            0            0.33333
 2   0         0            0.11111      0.33333
 3   0         0.03704      0.11111      0.33333
 4   0.01235   0.03704      0.11111      0.33333
 5   0.01754   0.04147      0.11533      0.33748
 6   0.02172   0.04493      0.11855      0.33925
 7   0.02519   0.04772      0.12064      0.34068
 8   0.02815   0.04991      0.12388      0.34187
 9   0.03070   0.05129      0.12517      0.34378

Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06)

Was conjectured to be fast

Our results: Upper bound for strategy iteration

An ε-optimal strategy is computed after t = (1/ε)^{m^{O(N)}} iterations of strategy iteration.

This follows from the corresponding results for value iteration.

Our results: Lower bound for strategy iteration

There exists a concurrent reachability game G, with N matrices, for large N, and m rows and columns in each matrix, so that:

val(G) = 1 and the strategy obtained by strategy iteration guarantees winning probability at most 4m^{-N/2}, for t = 2^{m^{N/4}}

Strategy iteration, m=2

N   Number of iterations needed to get over 1/2
7   18446744073709551617
8   340282366920938463463374607431768211457
9   115792089237316195423570985008687907853269984665640564039457584007913129639937
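The three counts in this table have a recognisable shape. The check below is only an observation about the listed numbers, not a statement from the talk: each one equals 2^(2^(N-1)) + 1. The helper name `matches_pattern` is made up for this example.

```python
# Iteration counts copied from the table (m = 2).
table = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}


def matches_pattern(n, iterations):
    """True if `iterations` equals 2^(2^(n-1)) + 1."""
    return iterations == 2 ** (2 ** (n - 1)) + 1


for n, iterations in table.items():
    print(n, matches_pattern(n, iterations), iterations.bit_length())
```

Each extra matrix squares the count (up to the +1), which is the doubly exponential growth the lower bound describes.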

Strategy iteration: Before iteration 1

1. Start strategy for Dante := Uniform

[Figure: the four-matrix example game; every row of every matrix is labelled with probability 0.33333.]

Strategy iteration: Iteration 1

1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: the four-matrix example game stepped through iterations 1 to 9, alternating between the values of the positions and Dante's updated row probabilities. An intermediate frame shows the game as a graph once both strategies are fixed.]

The numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used.

Values calculated in step 2, per iteration (columns follow the chain of diagonal pointers from the start matrix S to the matrix with 1 on its diagonal):

 iteration   start S   2nd matrix   3rd matrix   last matrix
 1           0.01235   0.03704      0.11111      0.33333
 2           0.02065   0.04359      0.11677      0.33748
 3           0.02676   0.04825      0.12067      0.34031
 4           0.03154   0.05185      0.12360      0.34241
 5           0.03544   0.05476      0.12593      0.34407
 6           0.03873   0.05721      0.12786      0.34543
 7           0.04156   0.05932      0.12950      0.34658
 8           0.04404   0.06118      0.13093      0.34758
 9           0.04624   0.06283      0.13219      0.34845

Dante's row probabilities after step 3, per iteration, as shown on the slides:

 iteration   start S                       2nd matrix                    3rd matrix                    last matrix
 1           (0.47368, 0.31579, 0.21053)   (0.37327, 0.33180, 0.29493)   (0.34599, 0.33317, 0.32084)   (0.33748, 0.33332, 0.32920)
 2           (0.55453, 0.29186, 0.15361)   (0.39987, 0.33180, 0.32917)   (0.35458, 0.33289, 0.31253)   (0.34031, 0.33329, 0.32640)
 3           (0.60831, 0.27098, 0.12071)   (0.41947, 0.32646, 0.25407)   (0.36097, 0.33259, 0.30644)   (0.34241, 0.33325, 0.32434)
 4           (0.64720, 0.25350, 0.09930)   (0.43486, 0.32390, 0.24125)   (0.36601, 0.33230, 0.30169)   (0.34407, 0.33322, 0.32271)
 5           (0.67692, 0.23882, 0.08426)   (0.44745, 0.32152, 0.23103)   (0.37015, 0.33202, 0.29783)   (0.34543, 0.33319, 0.32138)
 6           (0.70055, 0.22633, 0.07312)   (0.45807, 0.31933, 0.22260)   (0.37366, 0.33177, 0.29457)   (0.34658, 0.33316, 0.32026)
 7           (0.71988, 0.21557, 0.06455)   (0.46723, 0.31730, 0.21547)   (0.37670, 0.33153, 0.29177)   (0.34758, 0.33313, 0.31929)
 8           (0.73606, 0.20618, 0.05776)   (0.47527, 0.31541, 0.20932)   (0.37937, 0.33130, 0.28933)   (0.34845, 0.33311, 0.31844)
 9           (0.74985, 0.19791, 0.05224)   (0.48241, 0.31366, 0.20393)   (0.38176, 0.33109, 0.28715)   (0.34923, 0.33309, 0.31768)
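The three steps of an iteration (best response for Lucifer, values, update for Dante) can be sketched for a game shaped like the four-matrix example. This is a minimal sketch under assumptions, not the authors' code: each matrix is assumed to have the next position on its diagonal (1 for the last matrix), a pointer to the start above the diagonal and 0 below. Lucifer's best response is found by a plain fixed-point iteration started from 0, chosen for brevity, and Dante's update uses the closed-form equalizing mix for such triangular matrices. All function names are illustrative.

```python
def best_response_values(strategy, m, sweeps=10_000):
    """Values of Dante's stationary strategy when Lucifer best-responds.

    Index 0 is the start matrix. Column j pays p[j] times the value of the
    diagonal target plus (p[0] + ... + p[j-1]) times the start value.
    """
    w = [0.0] * len(strategy)
    for _ in range(sweeps):
        targets = w[1:] + [1.0]
        new_w = [
            min(p[j] * targets[k] + sum(p[:j]) * w[0] for j in range(m))
            for k, p in enumerate(strategy)
        ]
        if max(abs(a - b) for a, b in zip(new_w, w)) < 1e-15:
            return new_w
        w = new_w
    return w


def equalizing_strategy(diagonal, above, m):
    """Dante's optimal mix when the matrix has `diagonal`, `above` and 0 entries."""
    ratio = min(above / diagonal, 1.0)
    weights = [(1.0 - ratio) ** i for i in range(m)]
    total = sum(weights)
    return [x / total for x in weights]


def strategy_iteration(n, m, iterations):
    """Return (values seen in each iteration, Dante's final strategy)."""
    strategy = [[1.0 / m] * m for _ in range(n)]   # uniform start strategy
    trace = []
    for _ in range(iterations):
        w = best_response_values(strategy, m)      # steps 1 and 2
        trace.append(w)
        targets = w[1:] + [1.0]
        for k in range(n):                         # step 3
            if targets[k] <= 0.0:
                continue
            new_mix = equalizing_strategy(targets[k], w[0], m)
            # Switch only where the local matrix game value strictly improves.
            if targets[k] * new_mix[0] > w[k] + 1e-12:
                strategy[k] = new_mix
    return trace, strategy


trace, strategy = strategy_iteration(4, 3, 2)
for number, w in enumerate(trace, start=1):
    print(number, [round(v, 5) for v in w])
```

For four 3x3 matrices the first two rows of values agree, up to rounding, with the numbers on the iteration slides, and after the first update Dante's mix in the start matrix is (9/19, 6/19, 4/19).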

Generalized Purgatory P(N,m)

Lucifer repeatedly hides a number between 1 and m.

Dante must try to guess the number.

If he guesses correctly N times in a row, he goes to heaven.

If he ever guesses incorrectly, overshooting Lucifer’s number, he goes to hell.
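The rules translate directly into one matrix per streak length. The sketch below builds these matrices with rows as Dante's guesses and columns as Lucifer's hidden numbers. It assumes that an undershoot loses the streak and sends play back to the start, which matches the S entries in the example figure but is not stated in words on this slide. The labels "P1", "P2", ... for pointers are invented for this example.

```python
def purgatory_matrix(position, n_rounds, m):
    """Matrix for the position where Dante has `position` correct guesses so far.

    Entries: "0" = hell, "1" = heaven, "S" = back to the start,
    "P<k>" = pointer to the position with k correct guesses.
    """
    matrix = []
    for guess in range(1, m + 1):
        row = []
        for hidden in range(1, m + 1):
            if guess > hidden:
                row.append("0")      # overshoot: hell
            elif guess < hidden:
                row.append("S")      # undershoot: streak is lost (assumed)
            elif position + 1 == n_rounds:
                row.append("1")      # N-th correct guess in a row: heaven
            else:
                row.append(f"P{position + 1}")
        matrix.append(row)
    return matrix


for position in range(4):
    print(position, purgatory_matrix(position, 4, 3))
```

For N = 4 and m = 3 the last matrix comes out as (1 S S), (0 1 S), (0 0 1), the same pattern as the matrix in the running example.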

Interesting fact

The probability that Dante goes to heaven from purgatory is nearly 1, if he plays well enough.
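The smallest case P(1, 2) already shows why the probability is "nearly" 1 rather than exactly 1. In this sketch Dante guesses 2 with a small probability eps and 1 otherwise, and the worst case is taken over Lucifer's two pure stationary replies (against a fixed stationary strategy Lucifer faces a Markov decision process, where such replies suffice). It assumes an undershoot restarts the round; the function name is illustrative.

```python
from fractions import Fraction


def guarantee_single_round(eps):
    """Worst-case probability of heaven in P(1, 2) for the mix (1 - eps, eps)."""
    p_low, p_high = 1 - eps, eps
    # Lucifer always hides 1: guess 1 wins, guess 2 overshoots.
    hides_one = p_low
    # Lucifer always hides 2: guess 2 wins, guess 1 undershoots and play
    # restarts, so the winning probability w solves w = p_high + p_low * w.
    hides_two = p_high / (1 - p_low) if p_low < 1 else Fraction(0)
    return min(hides_one, hides_two)


for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000), Fraction(0)):
    print(eps, guarantee_single_round(eps))
```

Any eps > 0 guarantees 1 - eps, so the guarantee can be pushed arbitrarily close to 1, but eps = 0 guarantees nothing because Lucifer can hide 2 forever.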


Exemplifying important facts

[Figure: three panels built from 2x2 matrices with 1 on the diagonal of the last matrix and 0 below the diagonal: "Value iteration on 1 matrix", "Strategy iteration on 1 matrix" and "Strategy iteration on 3 matrices", stepped from t = 0 to t = 3.]

Values per step:

 t   value iteration, 1 matrix   strategy iteration, 1 matrix   strategy iteration, 3 matrices (start, middle, last)
 0   0
 1   0.5                         0.5                            0.125, 0.25, 0.5
 2   0.66667                     0.66667                        0.20317, 0.30476, 0.53333
 3   0.75000                     0.75000                        0.25781, 0.34374, 0.55654

Dante's row probabilities after the update in step t (all equal to (0.5, 0.5) at t = 0):

 t   1 matrix             3 matrices: start    3 matrices: middle   3 matrices: last
 1   (0.66667, 0.33333)   (0.66667, 0.33333)   (0.57143, 0.42857)   (0.53333, 0.46667)
 2   (0.75000, 0.25000)   (0.75000, 0.25000)   (0.61765, 0.38235)   (0.55654, 0.44346)
 3   (0.80000, 0.20000)   (0.80000, 0.20000)   (0.65072, 0.34928)   (0.57399, 0.42601)
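The single-matrix panel can be reproduced exactly. The sketch assumes the matrix has rows (1, S) and (0, 1), with S pointing back to the same matrix, so that one step of value iteration maps the current value s to the value of the matrix game with rows (1, s) and (0, 1). The function name is illustrative.

```python
from fractions import Fraction


def single_matrix_value_iteration(steps):
    """Values of G_0, ..., G_steps for the one-matrix game."""
    value = Fraction(0)
    values = [value]
    for _ in range(steps):
        # For 0 <= s <= 1 Dante mixes so that both columns pay the same:
        # p1 = p1 * s + p2 with p1 + p2 = 1, giving the value p1 = 1 / (2 - s).
        value = 1 / (2 - value)
        values.append(value)
    return values


for t, v in enumerate(single_matrix_value_iteration(5)):
    print(t, v, float(v))
```

The values are t/(t+1), so reaching 1 - ε takes about 1/ε iterations even with a single matrix; this slow convergence is the kind of fact the slides exemplify.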

The end

Open problems: Find a fast algorithm for the problem

There exists a PSPACE algorithm for the problem, but it is not fast.

Thanks for listening
