
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study

Sebastian Burckhardt, Rajeev Alur, Milo M. K. Martin

Department of Computer and Information Science

University of Pennsylvania

CAV 2006, Seattle


The General Problem

[Diagram: software + multiprocessor → concurrent executions]

Concurrency libraries can help (e.g. Java JSR-166), but how to debug the libraries?

The Specific Problem

[Diagram: optimized implementations of concurrent data types + shared-memory multiprocessor with relaxed memory model → concurrent executions]

Case study: use SAT solver to find bugs

Case Study: Two-Lock Queue

Algorithm published by M. Michael and M. Scott [PODC 1996]

Singly linked list with head and tail pointers
Dummy node at front
Independent head and tail locks
  → allows for concurrent enqueue() and dequeue()
Race condition if queue is empty
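The structure described above can be sketched in Python as follows. This is only an illustration of the data layout and locking discipline, not the hand-translated low-level code used in the case study; the names `TwoLockQueue` and `_Node`, and the choice to return `None` on an empty queue, are assumptions made for the sketch. Python does not expose relaxed memory effects, so the sketch cannot reproduce the fence bugs discussed later.

```python
import threading


class _Node:
    """List node (hypothetical helper name)."""

    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Singly linked list with a dummy node and separate head/tail locks."""

    def __init__(self):
        dummy = _Node()          # dummy node at the front
        self.head = dummy
        self.tail = dummy
        self.head_lock = threading.Lock()
        self.tail_lock = threading.Lock()

    def enqueue(self, value):
        node = _Node(value)
        with self.tail_lock:     # only the tail lock is taken
            self.tail.next = node
            self.tail = node

    def dequeue(self):
        with self.head_lock:     # only the head lock is taken
            first = self.head.next
            if first is None:    # empty: head is still the dummy node
                return None      # assumption: None signals "empty"
            value = first.value
            self.head = first    # 'first' becomes the new dummy node
            return value
```

When the queue is empty, head and tail both point at the dummy node, so enqueue() writes the same `next` field that dequeue() reads while the two hold different locks. That is the race condition the slide mentions.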

Case Study: Our Correctness Criterion

Client program observes
  ordering of operation calls within each thread
  argument and return values of the operations

Code is correct if and only if all executions are observationally equivalent to some serial execution (def. serial: interleaved at operation boundaries only)

We assume serial executions are correct (can be verified by conventional sequential methods)

thread 1

enqueue(1) enqueue(2)

thread 2

enqueue(3) dequeue() → 1

thread 3

dequeue() → 3 dequeue() → 2
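To make the criterion concrete, here is a minimal brute-force sketch that searches for a serial witness of an observation such as the three-thread example. It is not the method used in the case study (which encodes the check for a SAT solver); the function name `serial_witness`, the tuple encoding of operations, and the convention that a dequeue on an empty queue returns `None` are all assumptions of the sketch.

```python
def serial_witness(threads):
    """Return a serial order of operations that reproduces every observed
    return value on a FIFO queue, or None if no such order exists.

    threads: one list per thread; each operation is ("enq", value) or
    ("deq", observed_result).
    """

    def search(positions, queue, order):
        if all(p == len(t) for p, t in zip(positions, threads)):
            return order
        for i, thread in enumerate(threads):
            p = positions[i]
            if p == len(thread):
                continue                      # this thread is finished
            kind, val = thread[p]
            nxt = positions[:i] + (p + 1,) + positions[i + 1:]
            step = order + [(i, kind, val)]
            if kind == "enq":
                found = search(nxt, queue + (val,), step)
            elif queue and queue[0] == val:   # dequeue sees the front element
                found = search(nxt, queue[1:], step)
            elif not queue and val is None:   # assumed result on empty queue
                found = search(nxt, queue, step)
            else:
                found = None                  # observation contradicts FIFO
            if found is not None:
                return found
        return None

    return search(tuple(0 for _ in threads), (), [])


observed = [
    [("enq", 1), ("enq", 2)],   # thread 1
    [("enq", 3), ("deq", 1)],   # thread 2
    [("deq", 3), ("deq", 2)],   # thread 3
]
witness = serial_witness(observed)
```

For the observation above a witness exists, for example enqueue(1), enqueue(3), dequeue() → 1, dequeue() → 3, enqueue(2), dequeue() → 2, so this execution satisfies the criterion.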

Finer Interleavings = More Executions

serial executions
  threads interleave the operations
  (operations are atomic)
  (operations are in-order)

sequentially consistent executions
  threads interleave the instructions
  (instructions are atomic)
  (instructions are in-order)

relaxed executions
  hardware makes performance-motivated compromises
  (stores may be non-atomic)
  (loads/stores may be out-of-order)

[Diagram: sets of executions, labeled Serial and Relaxed]

Reordered Instructions = More Executions

Case Study: Relaxed Memory Models

Example:

thread 1: x = 1; y = 2
thread 2: print y → 2; print x → 0

Output not consistent with any interleaved execution!
  can be the result of out-of-order stores
  can be the result of out-of-order loads
  improves performance (more choices for processor)

Q: Why doesn’t everything break?
A: Relaxations are transparent to “normal” programs
  uniprocessor semantics are preserved
  library code for lock/unlock contains memory ordering fences
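The claim that the output (2, 0) matches no interleaved execution can be checked by enumerating all interleavings of the two threads. The sketch below assumes x and y are initially 0 (implied by the printed 0); the helper names `interleavings` and `sc_outputs` are made up for illustration.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def sc_outputs():
    """All (print y, print x) results under instruction interleaving."""
    thread1 = [("store", "x", 1), ("store", "y", 2)]
    thread2 = [("print", "y"), ("print", "x")]
    outputs = set()
    for schedule in interleavings(thread1, thread2):
        memory = {"x": 0, "y": 0}        # assumed initial values
        printed = []
        for instr in schedule:
            if instr[0] == "store":
                memory[instr[1]] = instr[2]
            else:
                printed.append(memory[instr[1]])
        outputs.add(tuple(printed))
    return outputs
```

Of the six interleavings, none prints 2 for y and then 0 for x: once thread 2 has seen y = 2, the earlier store to x has already taken effect.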

Which Memory Model?

Memory models are platform dependent

We use a conservative approximation “Relaxed” to capture common effects

Once code is correct for “Relaxed”, it is correct for all models

See paper for formal spec of “Relaxed”

[Diagram: memory models PPC and Alpha in relation to “Relaxed”]

Halftime Overview

General motivation

Case study parameters
  Two-lock queue implementation
  Correctness criterion
  Relaxed memory models

Coming up:

Our verification method
  Symbolic tests
  SAT encoding

Results
  Bugs found
  Evaluation & Conclusion

Our Verification Method

[Diagram: implementation code with commit points + symbolic test → Encoder → SAT solver → pass / counterexample]

How To Bound Executions

Verify individual “symbolic tests”
  finite number of operations
  nondeterministic instruction order
  nondeterministic input values

Example (this is the smallest one in our test suite):

thread 1: enqueue(X)
thread 2: dequeue() → Y

User creates suite of tests of increasing size

Why symbolic test programs?

1) Avoid undecidability by making everything finite:
  State is unbounded (dynamic memory allocation)
    ... is bounded for individual test
  Sequential consistency is undecidable
    ... is decidable for individual test

2) Gives us finite instruction sequence to work with
  State space too large for interleaved system model
    ... can directly encode value flow between instructions
  Memory model specified by axioms
    ... can directly encode ordering axioms on instructions

Implementation code

we hand-translated Michael & Scott’s code (above) into a low-level representation that uses explicit loads, stores

we added code for dynamic memory allocation and locks

Commit points

designate where the operation commits logically

given order of commit points, we can construct serial witness execution

eliminates the ∃ in “∀ executions ∃ equivalent serial execution”

Counterexample Trace

thread 1: enqueue(1)
thread 2: dequeue() → 0

Commit point order (3 < 6) indicates that enqueue precedes dequeue, so we would expect dequeue() → 1.

Incorrect value (0) of queue element gets read (7) before correct value (1) is written (11).
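The role of commit points in this trace can be illustrated with a small sketch: replay the operations serially in commit-point order on a reference FIFO queue and compare the results with what the concurrent execution returned. The function names, the operation identifiers such as "t2.deq", and the use of `None` for an empty queue are assumptions of the sketch, not part of the tool.

```python
from collections import deque


def expected_results(commit_order):
    """Replay operations in commit-point order on a reference FIFO queue.

    commit_order: list of (op_id, kind, argument) with kind "enqueue" or
    "dequeue"; returns {op_id: value} for every dequeue.
    """
    queue = deque()
    results = {}
    for op_id, kind, arg in commit_order:
        if kind == "enqueue":
            queue.append(arg)
        else:
            results[op_id] = queue.popleft() if queue else None
    return results


def is_counterexample(commit_order, observed):
    """True if the observed return values differ from the serial witness."""
    return expected_results(commit_order) != observed


# The trace above: enqueue(1) commits before dequeue(), which returned 0.
trace = [("t1.enq", "enqueue", 1), ("t2.deq", "dequeue", None)]
bad = is_counterexample(trace, {"t2.deq": 0})
```

Because the commit order fixes a single serial witness, the check needs no search over serial executions: the witness yields dequeue() → 1, the trace shows 0, and the execution is reported as a counterexample.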

Encoding

Given
  symbolic test T(A, B)
  memory model Y
  implementation code & commit point specifications

thread 1: enqueue(A)
thread 2: dequeue() → B

Encoding
  First step: encode concurrent executions of T on Y as solutions to CNF formula Y(A, B, X) (aux vars X)
  Second step: encode counterexamples as solutions to

  Y(A, B, X) ∧ Atomic(A’, B’, X’) ∧ (A = A’) ∧ (commit point orders match) ∧ ((B ≠ B’) ∨ (some operations commit out of order))

Encoding Detail: Obtain Symbolic Instruction Stream

Finite instruction sequence for each thread
Only loads, stores, moves, and fences
Each register is assigned exactly once
Control flow represented by predicates

Encoding Detail: Memory Order

Example: two threads:

thread 1: s1 (store), s2 (store)
thread 2: l1 (load), l2 (load)

Encoding variables
  Use bool vars for relative order (x<y) of memory accesses
  Use bitvector variables Ax and Dx for address and data values associated with memory access x

Encode constraints
  encode transitivity of memory order
  encode ordering axioms of the memory model
    Example (for SC): (s1<s2) ∧ (l1<l2)
  encode value flow
    “Loaded value must match last value stored to same address”
    Example: value must flow from s1 to l1 under following conditions:
    ((s1<l1) ∧ (As1 = Al1) ∧ ((s2<s1) ∨ (l1<s2) ∨ (As2 ≠ Al1))) ⇒ (Ds1 = Dl1)
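As a sanity check on the value-flow constraint for s1 and l1, the following sketch enumerates every memory order of the four accesses that satisfies the SC axioms, computes what l1 loads by the "last store to same address" rule, and confirms that the constraint holds in each case. It is an explicit enumeration written for illustration, not the SAT encoding itself; the two-address memory, the data values 1 and 2, and the initial value 0 are assumptions.

```python
from itertools import permutations, product

ACCESSES = ("s1", "s2", "l1", "l2")


def sc_orders():
    """Total orders of the accesses satisfying (s1<s2) and (l1<l2)."""
    for order in permutations(ACCESSES):
        pos = {a: i for i, a in enumerate(order)}
        if pos["s1"] < pos["s2"] and pos["l1"] < pos["l2"]:
            yield order


def loaded_value(order, addr, data, load, initial=0):
    """Data of the latest store before `load` to the same address."""
    value = initial
    for acc in order:
        if acc == load:
            return value
        if acc in data and addr[acc] == addr[load]:
            value = data[acc]
    raise ValueError("load not in order")


def flow_constraint_holds(order, addr, data, d_l1):
    """The implication from the slide, for flow from s1 to l1."""
    pos = {a: i for i, a in enumerate(order)}
    premise = (
        pos["s1"] < pos["l1"]
        and addr["s1"] == addr["l1"]
        and (pos["s2"] < pos["s1"] or pos["l1"] < pos["s2"]
             or addr["s2"] != addr["l1"])
    )
    return (not premise) or data["s1"] == d_l1


def check_all():
    data = {"s1": 1, "s2": 2}                 # assumed store data
    for order in sc_orders():
        for choice in product((0, 1), repeat=4):
            addr = dict(zip(ACCESSES, choice))
            d_l1 = loaded_value(order, addr, data, "l1")
            if not flow_constraint_holds(order, addr, data, d_l1):
                return False
    return True
```

The disjunction in the premise rules out s2 as an intervening store to the same address, which is why s1 is guaranteed to be the latest store seen by l1 whenever the premise holds.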

Encoding Detail: The combined formula

[Diagram labels: communication formula; thread-local formulas; memory order variables; input values; output values; intermediate values]

So what did we learn in the case study?

General motivation

Case study parameters
  Two-lock queue implementation
  Correctness criterion
  Relaxed memory models

Our verification method
  Symbolic tests
  SAT encoding

Coming up:

Results
  Bugs found
  Evaluation & Conclusion

Results: 5 code problems found

3 were mistakes we made
  first commit point guess was wrong
  incorrect/insufficient fences in lock/unlock and alloc/free

2 were caused by missing fences in queue implementation (not the fault of the authors... they were assuming an SC multiprocessor)
  store-store fence
  load-load fence

Results: Scalability

[Graph: runtime in seconds (y-axis) vs. # memory accesses (loads/stores) in test (x-axis), for the tests in our suite (unsatisfiable instances only)]

Fast on small tests, slow on long tests
Not sensitive to # threads

All 5 problems were found on smallest 2 tests... all under 1 sec

Conclusion

PROs
  quickly finds subtle bugs
  supports relaxed memory models
  counterexample traces
  catches broad range of bugs (not limited to deadlocks or data races)
  is more automatic than deductive methods

CONs / FUTURE WORK & CHALLENGES
  not truly scalable (though scalable enough to be useful)
  not fully automatic
  does not solve full problem (bounded instances, commit points)

We would recommend this method to designers and implementors of concurrent data types.

Ordering/Atomicity Relaxations

The following 2 examples illustrate the main effects (1. ordering relaxation / 2. atomicity relaxation)

EXAMPLE 1: store, load may execute out of order

initially A=B=0
processor 1: store A, 1; load B, 0
processor 2: store B, 1; load A, 0

EXAMPLE 2: stores are buffered locally before effect is global (split store into local / remote components)

initially A=B=0
processor 1: store A, 1; load A, reg; store reg, B
processor 2: load B, 1; load A, 0

[In the diagrams, pink numbers = memory order]

Where necessary, a programmer can prevent these effects by inserting fence instructions
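Example 1 can be reproduced with a small enumeration: list all memory orders of the four accesses, once with each processor's store kept before its load and once without that constraint, and collect the loaded values. This is a simplified sketch of the ordering relaxation only (it does not model store buffering from Example 2); the access names and the function `outcomes` are invented for illustration.

```python
from itertools import permutations

ACCESSES = {
    "st_A": ("store", "A", 1),   # processor 1
    "ld_B": ("load", "B"),       # processor 1
    "st_B": ("store", "B", 1),   # processor 2
    "ld_A": ("load", "A"),       # processor 2
}
# Program order within each processor: store first, then load.
PROGRAM_ORDER = [("st_A", "ld_B"), ("st_B", "ld_A")]


def outcomes(enforce_program_order):
    """Set of (value loaded from B, value loaded from A) over all orders."""
    results = set()
    for order in permutations(ACCESSES):
        pos = {name: i for i, name in enumerate(order)}
        if enforce_program_order and any(
            pos[first] > pos[second] for first, second in PROGRAM_ORDER
        ):
            continue
        memory = {"A": 0, "B": 0}            # initially A=B=0
        loaded = {}
        for name in order:
            access = ACCESSES[name]
            if access[0] == "store":
                memory[access[1]] = access[2]
            else:
                loaded[access[1]] = memory[access[1]]
        results.add((loaded["B"], loaded["A"]))
    return results
```

With program order enforced the result (0, 0) never appears; once the store and load of a processor may swap, it does. Enforcing the order again is what a fence between the store and the load achieves in this model.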

What code?
  Data type implementations optimized for concurrent execution (concurrency libraries)

What machines?
  Common shared-memory multiprocessors (e.g. PPC, Sparc, Alpha)

What bugs?
  Bugs caused by concurrency (we assume code runs fine if single-threaded)

Encoding Concurrent Executions

x1  load a[0], R1
x2  store R1, y

x3  load y, R2
    move R2+1, R3
x4  store 1, a[R3]

Variables O(n²)
  bitvectors R1, R2, R3 for intermediate values
  boolean variables Mij to represent memory order xi < xj (for i < j)

Constraints O(n³)
  memory order is transitive: ⋀ i<j<k (Mij ∧ Mjk) → Mik
  loads get latest value stored to same address
  memory order must respect memory model axioms and fences (e.g. sequential consistency requires M12 ∧ M34)
  thread-local computations connect values (e.g. R3 = R2 + 1)
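The variable and constraint counts can be made concrete with a sketch that generates the memory-order variables Mij and the transitivity clauses in DIMACS-style integer form. One assumption goes beyond the slide: since Mij exists only for i < j, "xj before xi" is represented as ¬Mij, and each triple gets a second clause (Mij ∨ Mjk ∨ ¬Mik) to exclude the reverse cycle. The function names are hypothetical, and the model counter is a brute-force check for tiny n, not a SAT solver.

```python
from itertools import combinations, product


def order_vars(n):
    """Map each pair (i, j), i < j, to a variable number: O(n^2) variables."""
    pairs = combinations(range(1, n + 1), 2)
    return {pair: k + 1 for k, pair in enumerate(pairs)}


def transitivity_clauses(n):
    """Two clauses per triple i < j < k: O(n^3) clauses in total."""
    m = order_vars(n)
    clauses = []
    for i, j, k in combinations(range(1, n + 1), 3):
        clauses.append([-m[i, j], -m[j, k], m[i, k]])   # (Mij and Mjk) -> Mik
        clauses.append([m[i, j], m[j, k], -m[i, k]])    # reverse direction
    return clauses


def count_models(n):
    """Count assignments satisfying all clauses (brute force, small n only)."""
    clauses = transitivity_clauses(n)
    num_vars = n * (n - 1) // 2
    count = 0
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count
```

For n = 4 accesses this gives 6 variables and 8 clauses, and the satisfying assignments correspond exactly to the 24 total orders of the accesses. Memory model axioms such as M12 ∧ M34 would then be added as unit clauses on top.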
