QEST'12 paper seminar



DESCRIPTION

A quick seminar based on a paper published at QEST'12.

TRANSCRIPT

Page 1: QEST'12 paper seminar

The PMC Problem
Resolving Non-Determinism
Algorithm
Implementation and Results
Conclusions

Statistical Model Checking for Markov Decision Processes

D. Henriques, J. Martins, P. Zuliani, A. Platzer, E. M. Clarke

Computer Science Department, Carnegie Mellon University

June 6, 2012

D. Henriques, J. Martins, P. Zuliani, A. Platzer, E. M. Clarke: Statistical Model Checking for Markov Decision Processes

Page 2: QEST'12 paper seminar


Summary

1 The PMC Problem

2 Resolving Non-Determinism

3 Algorithm

4 Implementation and Results

5 Conclusions


Page 3: QEST'12 paper seminar


Model Checking

Given:

Property ϕ in temporal logic

A transition system M

Does ϕ hold in M, or M |= ϕ?

Example

Is one car always safely behind another, where x1 and x2 are their positions:

G (x1 < x2)
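Such a bounded safety check can be evaluated directly on a finite trace; a minimal Python sketch (the position data is hypothetical, not from the paper):

```python
# Check the bounded safety property G (x1 < x2) on one finite trace.
# Each trace element is a (x1, x2) pair of car positions.

def globally_safe(trace):
    """True iff x1 < x2 holds in every state of the trace."""
    return all(x1 < x2 for x1, x2 in trace)

safe_trace = [(0, 5), (1, 5), (2, 6)]
unsafe_trace = [(0, 5), (4, 5), (5, 5)]  # positions collide at the end

print(globally_safe(safe_trace))    # True
print(globally_safe(unsafe_trace))  # False
```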

State of the art can handle millions of states. Used in the hardware and software industry.


Page 4: QEST'12 paper seminar


Probabilistic Model Checking

Given:

Property ϕ in temporal logic

A probabilistic transition system M

A probability threshold θ

Is the probability that M satisfies ϕ smaller than θ, P≤θ(ϕ)?

Example

Is it very unlikely that cars collide?

P≤0.00001(Fx1 = x2)


Page 5: QEST'12 paper seminar


Probabilistic Model Checking: exact approach

Exact methods: pros

Can currently handle relatively complex scenarios

Handles systems with non-determinism

Mature tools such as PRISM

They are exact...

Exact methods: cons

State explosion problem greatly reduces applicability

Time-consuming

Possibly hard to parallelise (e.g. PRISM)


Page 6: QEST'12 paper seminar


Probabilistic Model Checking: statistical approach

Statistical Model Checking: pros

Can currently handle very complex scenarios

Highly parallelisable

Only requires bounded memory

Comes in two flavours: hypothesis testing and interval estimation

Statistical Model Checking: cons

Not exact (but converges to the correct solution)

Requires a bounded number of “steps”, i.e. bounded properties

Requires fully probabilistic systems

But most interesting systems feature non-determinism!
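The interval-estimation flavour above, for a fully probabilistic system, can be sketched as follows; `satisfies_once` is a hypothetical stand-in for one bounded simulation of the system, and the sample count comes from the standard Chernoff-Hoeffding bound:

```python
import math
import random

def smc_estimate(satisfies_once, eps=0.05, delta=0.01):
    """Estimate P(phi) to within +/- eps with confidence >= 1 - delta.

    satisfies_once() runs one bounded simulation and returns True/False.
    The Chernoff-Hoeffding bound gives the required number of samples:
    n >= ln(2/delta) / (2 * eps^2).
    """
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(satisfies_once() for _ in range(n))
    return hits / n

# Toy fully probabilistic "system": phi holds with true probability 0.3.
random.seed(0)
p_hat = smc_estimate(lambda: random.random() < 0.3)
print(p_hat)  # an estimate close to 0.3
```

Each sample is independent, which is why the approach parallelises so well.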


Page 7: QEST'12 paper seminar


Summary

1 The PMC Problem

2 Resolving Non-Determinism

3 Algorithm

4 Implementation and Results

5 Conclusions


Page 8: QEST'12 paper seminar


Markov decision processes (MDPs) & schedulers

[Figure: example MDP. From state s, one action branches with probabilities 0.99/0.01, another with 0.5/0.5, and a further edge has probability 1.]

MDP chooses action non-deterministically

Each action has a distribution of target states

Schedulers σ : States → Actions are used to resolve non-determinism

General schedulers ≠ memoryless schedulers for bounded properties
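One lightweight way to represent an MDP and let a memoryless scheduler resolve its non-determinism (the dictionary encoding and state names are illustrative, not the paper's implementation):

```python
import random

# A toy MDP: state -> {action: [(probability, successor_state), ...]}
mdp = {
    "s0": {"a": [(0.99, "s1"), (0.01, "s2")],
           "b": [(0.5, "s1"), (0.5, "s2")]},
    "s1": {"a": [(1.0, "s1")]},
    "s2": {"a": [(1.0, "s2")]},
}

def step(state, scheduler):
    """Resolve non-determinism with the (memoryless) scheduler, then
    sample the chosen action's distribution over successor states."""
    action = scheduler[state]      # depends on the current state only
    r, acc = random.random(), 0.0
    for p, succ in mdp[state][action]:
        acc += p
        if r < acc:
            return succ
    return mdp[state][action][-1][1]   # guard against rounding error

# Fixing a scheduler turns the MDP into a Markov chain we can simulate.
scheduler = {"s0": "a", "s1": "a", "s2": "a"}
random.seed(1)
print(step("s0", scheduler))  # "s1" (the 0.99-probability branch)
```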


Page 9: QEST'12 paper seminar


Probabilistic Model Checking: resolving non-determinism

Property P≤θ(ϕ) actually asks: is the probability that model M satisfies property ϕ smaller than θ for all schedulers?

Thus, we need only check optimal schedulers, i.e. those that maximise P(ϕ)

If an optimal scheduler generates a probability above θ, the property is false.

True otherwise.

How do we find optimal schedulers?


Page 10: QEST'12 paper seminar


Summary

1 The PMC Problem

2 Resolving Non-Determinism

3 Algorithm

4 Implementation and Results

5 Conclusions


Page 11: QEST'12 paper seminar


[Figure: algorithm flowchart. Starting from a uniform scheduler σ, alternate scheduler evaluation (producing Q-estimates) and scheduler improvement (producing an improved σ); once a candidate σ emerges, determinise it and run SMC, which returns True or False.]


Page 12: QEST'12 paper seminar


Scheduler Evaluation

This step estimates:

How good the current scheduler is

How much each choice contributed to the satisfaction of ϕ

It does this by:

Turning the MDP into a Markov chain using scheduler σ

Sampling from the Markov chain repeatedly

For each satisfying trace, give a positive point to each action taken; for each falsifying trace, a negative point

At the end, for each action, we have an estimate of the probability of satisfying ϕ
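The evaluation step can be sketched as follows; `sample_trace` and `satisfies` are hypothetical placeholders for the induced Markov-chain simulator and the bounded trace checker:

```python
from collections import defaultdict

def evaluate_scheduler(sample_trace, satisfies, num_samples):
    """Estimate, for each (state, action) pair, the fraction of
    sampled traces through it that satisfy phi.

    sample_trace() -> list of (state, action) pairs, one bounded run
    of the Markov chain induced by sigma; satisfies(trace) -> bool.
    """
    hits = defaultdict(int)    # traces through (s, a) satisfying phi
    total = defaultdict(int)   # all traces through (s, a)
    for _ in range(num_samples):
        trace = sample_trace()
        sat = satisfies(trace)
        for pair in set(trace):      # count each pair once per trace
            total[pair] += 1
            if sat:
                hits[pair] += 1
    return {pair: hits[pair] / total[pair] for pair in total}

# Toy example: two fixed traces, only the first satisfying.
traces = iter([[("s0", "a")], [("s0", "b")]])
q = evaluate_scheduler(lambda: next(traces), lambda t: t[0][1] == "a", 2)
print(q)  # {('s0', 'a'): 1.0, ('s0', 'b'): 0.0}
```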


Page 13: QEST'12 paper seminar


Scheduler Improvement

This step provably improves scheduler σ by:

For each state, choosing the “best” action with probability 1 − ε

Choosing each of the other actions with probability ε/(n − 1), where n is the number of possible actions

This ensures that:

Search efforts are largely directed at the promising regions of the state space

All states remain explorable/reachable (p > 0)
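The improvement rule above can be sketched as an ε-greedy update (function and variable names are ours, not the paper's):

```python
def improve_scheduler(q, actions, eps=0.1):
    """Build an improved randomized scheduler from Q-estimates.

    q maps (state, action) to the estimated P(phi); actions maps each
    state to its available actions. The "best" action gets probability
    1 - eps; each of the other n - 1 actions gets eps / (n - 1), so
    every action keeps positive probability and stays explorable.
    """
    sigma = {}
    for state, acts in actions.items():
        n = len(acts)
        if n == 1:
            sigma[state] = {acts[0]: 1.0}
            continue
        best = max(acts, key=lambda a: q.get((state, a), 0.0))
        sigma[state] = {a: 1 - eps if a == best else eps / (n - 1)
                        for a in acts}
    return sigma

sigma = improve_scheduler({("s0", "a"): 0.9, ("s0", "b"): 0.2},
                          {"s0": ["a", "b"]}, eps=0.1)
print(sigma)  # {'s0': {'a': 0.9, 'b': 0.1}}
```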


Page 14: QEST'12 paper seminar


Putting it all together

The entire algorithm is thus very simple:

Start with an uninformed (i.e. uniform) scheduler

Estimate best actions

Improve scheduler with this new information

Rinse & repeat

When the scheduler is “good enough” (or a time limit is reached), determinise it

Run Statistical Model Checking using the determinised scheduler
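The loop above, with each step abstracted as a function the caller supplies, might look like:

```python
def smc_for_mdp(init_uniform, evaluate, improve, determinise, smc_check,
                rounds=50):
    """Top-level loop: start from a uniform scheduler, alternate
    evaluation and improvement, then determinise the result and run
    statistical model checking on the induced Markov chain."""
    sigma = init_uniform()
    for _ in range(rounds):
        q = evaluate(sigma)        # estimate P(phi) per (state, action)
        sigma = improve(sigma, q)  # eps-greedy shift toward best actions
    return smc_check(determinise(sigma))
```

A usage with trivial stand-ins (purely to show the wiring): `smc_for_mdp(lambda: "u", lambda s: {}, lambda s, q: s, lambda s: s, lambda s: s == "u", rounds=3)` returns True.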


Page 15: QEST'12 paper seminar


Properties

This algorithm is a false-biased Monte Carlo algorithm:

If it finds a counterexample, it returns false (up to the SMC guarantees)

If it does not, it returns true with arbitrarily high probability

It has the following nice properties:

Provides counter-example

Converges

Statistically correct

Highly parallelisable


Page 16: QEST'12 paper seminar


Summary

1 The PMC Problem

2 Resolving Non-Determinism

3 Algorithm

4 Implementation and Results

5 Conclusions


Page 17: QEST'12 paper seminar


Implementation

Integrated with PRISM simulation engine

Works with PRISM MDP benchmarks

Parallel sampling

Synchronisation of data structures during evaluation

Ran on 32-core and 48-core machines


Page 18: QEST'12 paper seminar


Scheduler improvement

[Figure: fraction of satisfied traces out of total traces (y-axis, 0–1) vs. number of learning rounds (x-axis, 0–90), for 10, 50 and 100 processes.]


Page 19: QEST'12 paper seminar


Parallelisability

[Figure: runtime in seconds (0–250) vs. number of threads (1–32), for Mutex Bugged 10 and Mutex Bugged 30.]


Page 20: QEST'12 paper seminar


Correctness

[Figure: verification time in seconds (0–350) for SMC vs. PRISM on instances with 10 (8K states), 15 (393K states), 20 (16M states) and 25 (654M states) processes.]

[Companion poster: “Learning Optimal Policies for Model Checking”, João Martins and David Henriques. Panels: Introduction, Background, The Algorithm, Results (Convergence and Efficiency), References.]

Markov decision processes (MDPs) are expressive models, popular for modeling systems that exhibit both probabilistic and non-deterministic behaviour.

Useful quantitative properties over MDPs can be automatically verified with probabilistic model checking (PMC), a popular formal verification technique.

Unfortunately, PMC suffers from the state explosion problem. Statistical methods can be used to approximate the desired result without need for complete state space exploration.

One well-identified shortcoming is that statistical methods have been limited to fully probabilistic systems.

Bounded LTL: BLTL is an expressive probabilistic logic for reasoning about dynamic systems. Its syntax is given by

ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | F<n ϕ | G<n ϕ | ϕ U<n ϕ | ϕ W<n ϕ

It allows us to express properties such as “a request is acknowledged within n time steps” or “the process enters the critical region only after the flag is set”.

PRISM: PRISM is the reference, state-of-the-art probabilistic model checker.

It answers the question P>p(ϕ) using value iteration: is the probability of satisfying ϕ greater than p for all resolutions of non-determinism? P>p(ϕ) is known as the probabilistic property and ϕ as the temporal formula.

It requires the entire state space in memory.

Objectives: develop a Reinforcement Learning algorithm to learn optimal policies for model checking PLTL in MDPs that does not require computation over the entire state space.

Using the above technique, apply Statistical Model Checking (SMC) to systems with non-determinism. To the best of our knowledge, this is the first attempt to solve this problem in a general setting.

Integrate the algorithm with the PRISM model checker; in particular, allow the use of its extensive benchmark suite.


Top-Level Algorithm

1. Initialise a policy σ such that actions are chosen uniformly from each state
2. Do K times:
   a. Sample a set P of N paths from the MDP using policy σ
   b. For each path π ∈ P:
      If π ⊨ ϕ, positively reinforce the actions along π
      If π ⊭ ϕ, negatively reinforce them
   c. Update the policy based on the reinforcement
3. Determinise the policy
4. Use SMC to check the probabilistic property

Leader Election Protocol (Error): randomized leader election protocol. The graphic shows the probability of electing a leader within x steps.

[Figure (Results – Efficiency): probability of electing a leader (y-axis, 0–1) vs. number of steps (x-axis, 25–75), with lower bound, upper bound and real probability curves.]

Mutex Protocol (Error): several mutual exclusion processes, one having a small probability of entering the critical zone illegally. We check the worst-case probability of error.

Reinforcement: for each state-action pair (s, a), the reinforcement R is

R(s, a) = |{π : (s, a) ∈ π and π ⊨ ϕ}| − |{π : (s, a) ∈ π and π ⊭ ϕ}|

Policy Update: the new probability distributions are multinomials with parameters given by the MLE from the reinforcement information, R(s, a) / Σ_a′ R(s, a′).

To avoid transitions with probability 0 and to minimize harmful runs, the actual policy update is a mixture of the previous distribution and this new distribution.
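The mixture update can be sketched as follows (the mixing weight `alpha` is a hypothetical parameter; the text does not fix its value):

```python
def update_policy(old, mle, alpha=0.5):
    """Mix the previous action distribution with the MLE distribution
    from reinforcement, so no transition probability collapses to 0."""
    return {a: (1 - alpha) * old[a] + alpha * mle.get(a, 0.0)
            for a in old}

print(update_policy({"a": 0.5, "b": 0.5}, {"a": 1.0, "b": 0.0}))
# {'a': 0.75, 'b': 0.25}
```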

Convergence and Stopping Criteria: since optimal policies are deterministic, every once in a while we determinise the policy and check the probabilistic property using SMC.

Negative answers from SMC are (probabilistically) guaranteed to mean the probabilistic property is false, since there is at least one policy achieving the value in question.

Positive answers from SMC may be false positives. We run the algorithm several times to minimize the probability of always converging to local maxima.

[Figures: Mutex Protocol – Efficiency (probability, 0–1.2, vs. number of steps/processes, 1–21, for deterministic vs. probabilistic policies) and Mutex Protocol – Convergence (over learning rounds K).]

Wireless Network (Efficiency): IEEE 802.11 Wireless LAN standard for collision avoidance. Several stations broadcast at the same time and enact a back-off protocol when collisions are detected.

[Figures: estimated vs. true probability (0.7–1) for 10, 15, 20 and 25 processes, and runtime in seconds (0–70) of Time (PRISM) vs. Time (SMC) against the number of stations: 2 (204K states), 3 (616K states), 4 (1.9M states), 5 (6.2M states), 6 (19.8M states).]


Page 21: QEST'12 paper seminar


Benchmark results. For each model, "out" is the SMC answer at threshold θ and "t" its runtime in seconds; the PRISM column gives the exact probability and PRISM's runtime.

CSMA 34
  θ      0.5    0.8    0.85   0.9     0.95   | PRISM
  out    F      F      F      T       T      | 0.86
  t      1.7    11.5   35.9   115.7   111.9  | 136

CSMA 36
  θ      0.3    0.4    0.45   0.5     0.8    | PRISM
  out    F      F      F      T       T      | 0.48
  t      2.5    9.4    18.8   133.9   119.3  | 2995

CSMA 44
  θ      0.5    0.7    0.8    0.9     0.95   | PRISM
  out    F      F      F      F       T      | 0.93
  t      3.5    3.7    17.5   69.0    232.8  | 16244

CSMA 46
  θ      0.5    0.7    0.8    0.9     0.95   | PRISM
  out    F      F      F      F       F∗     | memout
  t      3.7    4.1    4.2    26.2    258.9  | memout

WLAN 5
  θ      0.1    0.15   0.2    0.25    0.5    | PRISM
  out    F      F      T      T       T      | 0.18
  t      4.9    11.1   124.7  104.7   103.2  | 1.6

WLAN 6
  θ      0.1    0.15   0.2    0.25    0.5    | PRISM
  out    F      F      T      T       T      | 0.18
  t      5.0    11.3   127.0  104.9   102.9  | 1.6


Page 22: QEST'12 paper seminar


Summary

1 The PMC Problem

2 Resolving Non-Determinism

3 Algorithm

4 Implementation and Results

5 Conclusions


Page 23: QEST'12 paper seminar


Conclusions

Trading absolute correctness for statistical correctness gives us more applicability

Faster than traditional exact approaches for not completely structured systems

Statistical correctness

Integration with PRISM


Page 24: QEST'12 paper seminar


Thank you, questions?
