Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study
Sebastian Burckhardt, Rajeev Alur, Milo M. K. Martin
Department of Computer and Information Science
University of Pennsylvania
CAV 2006, Seattle
Sebastian Burckhardt-2-
The General Problem
[Diagram: software + multiprocessor → concurrent executions → bugs]
Concurrency libraries can help (e.g. Java JSR-166), but how do we debug the libraries?
The Specific Problem
[Diagram: optimized implementations of concurrent data types + shared-memory multiprocessor with relaxed memory model → concurrent executions → bugs]
Case study: use a SAT solver to find bugs
Case Study: Two-Lock Queue
Algorithm published by M. Michael and M. Scott [PODC 1996]
[Figure: queue drawn as a linked list with nodes 1, 2, 3; a head pointer with its lock and a tail pointer with its lock]
Singly linked list with head and tail pointers
Dummy node at front
Independent head and tail locks
  → allows for concurrent enqueue() and dequeue()
Race condition if queue is empty
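The structure described above can be sketched in Python as follows. This is an illustration only, not Michael and Scott's published code: the names `Node`, `TwoLockQueue`, `head_lock` and `tail_lock` are hypothetical, `threading.Lock` stands in for the lock primitives, and returning `None` from an empty queue is an assumption. Python does not expose relaxed-memory effects, so the sketch shows the dummy-node, two-lock design rather than the fence placement studied in this talk.

```python
import threading


class Node:
    """One cell of the singly linked list."""

    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Queue with a dummy node at the front and independent head/tail locks."""

    def __init__(self):
        dummy = Node()
        self.head = dummy            # always points at the dummy node
        self.tail = dummy            # points at the last node
        self.head_lock = threading.Lock()
        self.tail_lock = threading.Lock()

    def enqueue(self, value):
        node = Node(value)
        with self.tail_lock:         # only the tail lock is taken
            self.tail.next = node
            self.tail = node

    def dequeue(self):
        with self.head_lock:         # only the head lock is taken
            first = self.head.next
            if first is None:        # queue is empty (assumed result: None)
                return None
            value = first.value
            self.head = first        # the dequeued node becomes the new dummy
            return value
```

Because enqueue() touches only the tail and dequeue() touches only the head, the two can run concurrently. The empty queue is the delicate case, since both ends then refer to the same dummy node.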
Case Study: Our Correctness Criterion
The client program observes
  the ordering of operation calls within each thread
  the argument and return values of the operations
Code is correct if and only if all executions are observationally equivalent to some serial execution (def. serial: interleaved at operation boundaries only)
We assume serial executions are correct (can be verified by conventional sequential methods)
Example observation:
  thread 1: enqueue(1); enqueue(2)
  thread 2: enqueue(3); dequeue() → 1
  thread 3: dequeue() → 3; dequeue() → 2
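To make the criterion concrete, here is a brute-force sketch that searches for a serial witness of an observation such as the three-thread example on this slide. It is not the SAT-based method of the talk. The function name `serial_witness`, the `("enq", v)` / `("deq", v)` tuples, and the convention that a dequeue on an empty queue returns `None` are assumptions made for illustration.

```python
def serial_witness(threads):
    """Search for a serial execution that explains the observation.

    threads: one list per thread of ("enq", value) or ("deq", returned_value).
    Returns a list of (thread_number, op, value) in serial order, or None.
    """

    def search(positions, queue, trace):
        if all(p == len(t) for p, t in zip(positions, threads)):
            return trace                      # every operation was placed
        for i, ops in enumerate(threads):
            p = positions[i]
            if p == len(ops):
                continue                      # this thread is finished
            op, value = ops[p]
            if op == "enq":
                new_queue = queue + (value,)
            else:
                front = queue[0] if queue else None
                if front != value:
                    continue                  # return value would not match
                new_queue = queue[1:]
            next_positions = positions[:i] + (p + 1,) + positions[i + 1:]
            found = search(next_positions, new_queue, trace + [(i + 1, op, value)])
            if found is not None:
                return found
        return None

    return search((0,) * len(threads), (), [])


# The observation from the slide.
observed = [
    [("enq", 1), ("enq", 2)],
    [("enq", 3), ("deq", 1)],
    [("deq", 3), ("deq", 2)],
]
witness = serial_witness(observed)
```

One serial order that explains the slide's observation is enqueue(1), enqueue(3), dequeue() → 1, enqueue(2), dequeue() → 3, dequeue() → 2. The search may return a different but equally valid witness.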
Finer Interleavings = More Executions
Reordered Instructions = More Executions
serial executions
  threads interleave the operations
  (operations are atomic)
  (operations are in-order)
sequentially consistent executions
  threads interleave the instructions
  (instructions are atomic)
  (instructions are in-order)
relaxed executions
  hardware makes performance-motivated compromises
  (stores may be non-atomic)
  (loads/stores may be out-of-order)
[Diagram: sets of executions labeled Serial, SC, Relaxed]
Case Study: Relaxed Memory Models
Example:
  thread 1: x = 1; y = 2
  thread 2: print y (→ 2); print x (→ 0)
Output not consistent with any interleaved execution!
  can be the result of out-of-order stores
  can be the result of out-of-order loads
  improves performance (more choices for processor)
Q: Why doesn’t everything break?
A: Relaxations are transparent to “normal” programs
  uniprocessor semantics are preserved
  library code for lock/unlock contains memory ordering fences
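The claim that the output (2, 0) matches no interleaved execution can be checked by enumerating every interleaving of the four instructions. The sketch below assumes that x and y are initially 0, which the printed 0 suggests but the slide does not state. The function name `sc_outcomes` is hypothetical.

```python
def sc_outcomes():
    """Return all (printed y, printed x) pairs reachable by interleaving."""
    thread1 = [("x", 1), ("y", 2)]       # x = 1; y = 2
    thread2 = ["y", "x"]                 # print y; print x
    outcomes = set()

    def run(i, j, memory, printed):
        if i == len(thread1) and j == len(thread2):
            outcomes.add(tuple(printed))
            return
        if i < len(thread1):             # thread 1 executes its next store
            var, value = thread1[i]
            run(i + 1, j, {**memory, var: value}, printed)
        if j < len(thread2):             # thread 2 executes its next print
            run(i, j + 1, memory, printed + [memory[thread2[j]]])

    run(0, 0, {"x": 0, "y": 0}, [])
    return outcomes
```

Under these assumptions the reachable outcomes are (0, 0), (0, 1) and (2, 1). The pair (2, 0) requires the two stores or the two loads to take effect out of program order.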
Which Memory Model?
Memory models are platform dependent
We use a conservative approximation “Relaxed” to capture common effects
Once code is correct for “Relaxed”, it is correct for all models
See paper for formal spec of “Relaxed”
[Diagram: memory models SC, 390, TSO, PSO, RMO, PPC and Alpha, with “Relaxed” as the approximation that covers them]
Halftime Overview
done:
  General motivation
  Case study parameters
    Two-lock queue implementation
    Correctness criterion
    Relaxed memory models
coming up:
  Our verification method
    Symbolic tests
    SAT encoding
  Results
    Bugs found
    Evaluation & Conclusion
Our Verification Method
[Diagram: (1) symbolic test and (2) implementation code with (3) commit points are inputs to the (5) Encoder; its output goes to the SAT solver, which reports either pass or a (4) counterexample]
How To Bound Executions (part 1 of the method: symbolic test)
Verify individual “symbolic tests”
  finite number of operations
  nondeterministic instruction order
  nondeterministic input values
Example (this is the smallest one in our test suite):
  thread 1: enqueue(X)
  thread 2: dequeue() → Y
User creates suite of tests of increasing size
Why symbolic test programs?
1) Avoid undecidability by making everything finite:
  State is unbounded (dynamic memory allocation)
    ... is bounded for individual test
  Sequential consistency is undecidable
    ... is decidable for individual test
2) Gives us finite instruction sequence to work with
  State space too large for interleaved system model
    ... can directly encode value flow between instructions
  Memory model specified by axioms
    ... can directly encode ordering axioms on instructions
Implementation code (part 2 of the method)
We hand-translated Michael & Scott’s code (above) into a low-level representation that uses explicit loads and stores
We added code for dynamic memory allocation and locks
Commit points (part 3 of the method)
designate where the operation commits logically
given the order of commit points, we can construct a serial witness execution
eliminates the ∃ in “∀ executions ∃ equivalent serial execution”
Counterexample Trace (part 4 of the method)
thread 1: enqueue(1)
thread 2: dequeue() → 0
[Figure: instruction-level trace of both threads, with the memory accesses numbered 1 to 14]
Commit point order (3 < 6) indicates that enqueue precedes dequeue, so we would expect dequeue() → 1
The incorrect value (0) of the queue element is read (7) before the correct value (1) is written (11).
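The check behind this counterexample can be sketched as a replay: take the operations in commit-point order, run them on a sequential queue, and compare each observed return value with what the serial witness produces. The function name `replay_in_commit_order` and the tuple layout are hypothetical. The actual tool performs this comparison inside the SAT encoding rather than by explicit replay.

```python
from collections import deque


def replay_in_commit_order(operations):
    """Replay operations serially and report return-value mismatches.

    operations: list of (name, argument, observed_result) in commit-point order.
    Returns a list of (index, expected, observed) for every dequeue whose
    observed result differs from the serial witness.
    """
    queue = deque()
    mismatches = []
    for index, (name, argument, observed) in enumerate(operations):
        if name == "enqueue":
            queue.append(argument)
        elif name == "dequeue":
            expected = queue.popleft() if queue else None
            if expected != observed:
                mismatches.append((index, expected, observed))
        else:
            raise ValueError(f"unknown operation: {name}")
    return mismatches


# The trace from the slide: enqueue(1) commits before dequeue() → 0.
trace = [("enqueue", 1, None), ("dequeue", None, 0)]
mismatches = replay_in_commit_order(trace)
```

For the slide's trace the replay expects 1 where the concurrent execution returned 0, which is exactly the discrepancy the counterexample reports.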
Encoding (part 5 of the method)
Given
  symbolic test T(A, B)
  memory model Y
  implementation code & commit point specifications
Encoding
  First step: encode concurrent executions of T on Y as solutions to CNF formula $Y(A, B, X)$ (aux vars X)
  Second step: encode counterexamples as solutions to
    $Y(A, B, X) \wedge \mathrm{Atomic}(A', B', X') \wedge (A = A') \wedge (\text{commit point orders match}) \wedge ((B \neq B') \vee (\text{some operations commit out of order}))$
Example test:
  thread 1: enqueue(A)
  thread 2: dequeue() → B
Encoding Detail: Obtain Symbolic Instruction Stream
Finite instruction sequence for each thread
Only loads, stores, moves, and fences
Each register is assigned exactly once
Control flow represented by predicates
Encoding Detail: Memory Order
Example: two threads
  thread 1: s1 (store); s2 (store)
  thread 2: l1 (load); l2 (load)
Encoding variables: $O(n^2)$
  Use bool vars for relative order $(x < y)$ of memory accesses
  Use bitvector variables $A_x$ and $D_x$ for address and data values associated with memory access $x$
Encode constraints: $O(n^3)$
  encode transitivity of memory order
  encode ordering axioms of the memory model
    Example (for SC): $(s_1 < s_2) \wedge (l_1 < l_2)$
  encode value flow
    “Loaded value must match last value stored to same address”
    Example: value must flow from s1 to l1 under the following conditions:
    $\big((s_1 < l_1) \wedge (A_{s_1} = A_{l_1}) \wedge ((s_2 < s_1) \vee (l_1 < s_2) \vee (A_{s_2} \neq A_{l_1}))\big) \rightarrow (D_{s_1} = D_{l_1})$
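As a sanity check on the value-flow condition, the sketch below compares its premise with the informal rule "s1 is the last store to the loaded address before l1". It is restricted to the accesses s1, s2 and l1, tries every total order and every address assignment over two addresses, and is a brute-force illustration rather than the SAT encoding itself. The helper names `premise`, `last_store_before` and `premise_matches_last_store` are hypothetical.

```python
from itertools import permutations, product


def premise(pos, addr):
    """Premise of the value-flow condition for the pair (s1, l1)."""
    return (
        pos["s1"] < pos["l1"]
        and addr["s1"] == addr["l1"]
        and (
            pos["s2"] < pos["s1"]
            or pos["l1"] < pos["s2"]
            or addr["s2"] != addr["l1"]
        )
    )


def last_store_before(pos, addr, load):
    """Latest store to the load's address that precedes it, or None."""
    candidates = [
        s for s in ("s1", "s2")
        if pos[s] < pos[load] and addr[s] == addr[load]
    ]
    return max(candidates, key=lambda s: pos[s]) if candidates else None


def premise_matches_last_store():
    """True if the premise holds exactly when s1 is the last store seen by l1."""
    for order in permutations(["s1", "s2", "l1"]):
        pos = {name: index for index, name in enumerate(order)}
        for a_s1, a_s2, a_l1 in product([0, 1], repeat=3):
            addr = {"s1": a_s1, "s2": a_s2, "l1": a_l1}
            if premise(pos, addr) != (last_store_before(pos, addr, "l1") == "s1"):
                return False
    return True
```

In the real encoding the positions and addresses are symbolic variables, and one such implication is emitted per store/load pair, with the other stores appearing as disjuncts.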
Encoding Detail: The combined formula
[Diagram labels: communication formula, memory order variables, input values, output values, intermediate values, thread-local formulas]
So what did we learn in the case study?
done:
  General motivation
  Case study parameters
    Two-lock queue implementation
    Correctness criterion
    Relaxed memory models
  Our verification method
    Symbolic tests
    SAT encoding
coming up:
  Results
    Bugs found
    Evaluation & Conclusion
Results: 5 code problems found
3 were mistakes we made
  first commit point guess was wrong
  incorrect/insufficient fences in lock/unlock and alloc/free
2 were caused by missing fences in the queue implementation (not the fault of the authors, who were assuming an SC multiprocessor)
  missing store-store fence
  missing load-load fence
Results: Scalability
[Plot: runtime in seconds (y-axis, 0 to 1000) versus # memory accesses (x-axis, 0 to 150)]
Graph shows tests in our suite (unsatisfiable instances only)
  y-axis: runtime in seconds
  x-axis: # accesses (loads/stores) in test
Fast on small tests, slow on long tests
Not sensitive to # threads
All 5 problems were found on smallest 2 tests... all under 1 sec
Conclusion
PROs
  quickly finds subtle bugs
  supports relaxed memory models
  counterexample traces
  catches broad range of bugs (not limited to deadlocks or data races)
  is more automatic than deductive methods
CONs / FUTURE WORK & CHALLENGES
  not truly scalable (though scalable enough to be useful)
  not fully automatic
  does not solve full problem (bounded instances, commit points)
We would recommend this method to designers and implementors of concurrent data types.
Ordering/Atomicity Relaxations
The following 2 examples illustrate the main effects (1. ordering relaxation / 2. atomicity relaxation)
Bracketed numbers give the memory order (pink numbers in the original figure)
EXAMPLE 1: store, load may execute out of order
  initially A=B=0
  processor 1: store A, 1 [3]; load B, 0 [1]
  processor 2: store B, 1 [4]; load A, 0 [2]
EXAMPLE 2: stores are buffered locally before effect is global
  initially A=B=0
  processor 1: store A, 1 [1/6]; load A, reg [2]; store reg, B [3]
  processor 2: load B, 1 [4]; load A, 0 [5]
  (split store into local / remote components)
Where necessary, a programmer can prevent these effects by inserting fence instructions
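Example 1 can be reproduced with a small enumeration over memory orders. The sketch below is a deliberate simplification and not the “Relaxed” model of the paper: it either requires each processor's store to precede its own load (as sequential consistency does) or drops that requirement. The names `PROGRAM` and `outcomes` are hypothetical.

```python
from itertools import permutations

# (processor, kind, address); every store writes the value 1.
PROGRAM = [
    ("p1", "store", "A"),
    ("p1", "load", "B"),
    ("p2", "store", "B"),
    ("p2", "load", "A"),
]


def outcomes(keep_store_load_order):
    """Return the set of (value loaded by p1, value loaded by p2)."""
    results = set()
    for order in permutations(range(len(PROGRAM))):
        pos = {access: k for k, access in enumerate(order)}
        # Program order: access 0 before 1 on p1, access 2 before 3 on p2.
        if keep_store_load_order and not (pos[0] < pos[1] and pos[2] < pos[3]):
            continue
        memory = {"A": 0, "B": 0}          # initially A = B = 0
        loaded = {}
        for access in order:
            processor, kind, address = PROGRAM[access]
            if kind == "store":
                memory[address] = 1
            else:
                loaded[processor] = memory[address]
        results.add((loaded["p1"], loaded["p2"]))
    return results
```

With the ordering requirement in place, both loads returning 0 is impossible. Dropping it admits the memory order shown for Example 1, where both loads come before both stores. Example 2 needs stores split into local and remote parts, which this sketch does not model.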
What code? Data type implementations optimized for concurrent execution (concurrency libraries)
What machines? Common shared-memory multiprocessors (e.g. PPC, Sparc, Alpha)
What bugs? Bugs caused by concurrency (we assume code runs fine if single-threaded)
Encoding Concurrent Executions
Example instruction streams (memory accesses labeled x1 to x4):
  x1: load a[0], R1; x2: store R1, y
  x3: load y, R2; move R2+1, R3; x4: store 1, a[R3]
Variables: $O(n^2)$
  bitvectors R1, R2, R3 for intermediate values
  boolean variables $M_{ij}$ to represent memory order $x_i < x_j$ (for $i < j$)
Constraints: $O(n^3)$
  memory order is transitive: $\bigwedge_{i<j<k} (M_{ij} \wedge M_{jk}) \rightarrow M_{ik}$
  loads get latest value stored to same address
  memory order must respect memory model axioms and fences (e.g. sequential consistency requires $M_{12} \wedge M_{34}$)
  thread-local computations connect values (e.g. R3 = R2 + 1)
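The variable and constraint counts can be illustrated by generating the transitivity clauses for n memory accesses. In this sketch, which is not the tool's actual encoder, a literal is a pair (variable, polarity), the variable for x_i < x_j exists only for i < j, and the reverse order x_j < x_i is represented by its negation. That representation, and the use of all ordered triples instead of only i < j < k, are assumptions. The function names are hypothetical.

```python
from itertools import combinations, permutations, product


def order_vars(n):
    """One boolean variable M_ij per pair i < j: O(n^2) variables."""
    return list(combinations(range(1, n + 1), 2))


def literal(i, j):
    """Literal meaning x_i < x_j, expressed over the variables with i < j."""
    return ((i, j), True) if i < j else ((j, i), False)


def transitivity_clauses(n):
    """Clauses for (x_i < x_j) and (x_j < x_k) implies (x_i < x_k): O(n^3)."""
    clauses = []
    for i, j, k in permutations(range(1, n + 1), 3):
        a, b, c = literal(i, j), literal(j, k), literal(i, k)
        # Implication a and b -> c written as the clause (not a) or (not b) or c.
        clauses.append([(a[0], not a[1]), (b[0], not b[1]), c])
    return clauses


def count_models(n):
    """Count assignments satisfying all clauses (brute force, small n only)."""
    variables = order_vars(n)
    clauses = transitivity_clauses(n)
    count = 0
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[v] == pol for v, pol in clause) for clause in clauses):
            count += 1
    return count
```

For the four accesses x1 to x4 this gives 6 order variables and 24 clauses, and the satisfying assignments correspond exactly to the 24 total orders of the accesses. A memory model axiom such as the one for sequential consistency then removes some of those orders by adding unit clauses on individual variables.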