Abstract Transformers for Thread Correlation Analysis
Michal Segalov, TAU
Tal Lev-Ami, TAU
Roman Manevich, TAU
G. Ramalingam, MSR India
Mooly Sagiv, TAU
Motivation
A novel approach for static analysis of highly concurrent algorithms
  Verify correctness
  Alert on (possible) bugs
Challenges
  Fine-grained synchronization: requires subtle reasoning on thread interference
  Heap data structures: unbounded state space
add(node) {
  while (true) {
    <prev,cur,next,found> = locate(node.key)
    if (found) return false;
    node.next = cur
    if (CAS(prev.next, <0,cur>, <0,node>))
      return true;
  }
}
remove(key) {
  while (true) {
    <prev,cur,next,found> = locate(key)
    if (!found) return false;
    if (!CAS(cur.next, <0,next>, <1,next>)) continue;
    if (CAS(prev.next, <0,cur>, <0,next>))
      DeleteNode(cur);
    else locate(key);
    return true;
  }
}
locate(key) {
  restart:
  pred = Head;
  <tmp,curr> = pred.next;
  while (true) {
    if (curr == null) return <null, null, null, false>;
    <cmark, next> = curr.next;
    ckey = curr.key;
    if (pred.next != <0,curr>) goto restart;
    if (!cmark) {
      if (ckey >= key) return <pred, curr, next, (key == ckey)>;
      pred = curr;
    } else {
      if (CAS(pred.next, <0,curr>, <0,next>)) DeleteNode(curr);
      else goto restart;
    }
    curr = next;
  }
}
Concurrent Set [M. M. Michael, SPAA’02]
Set implemented by a linked list
Heavy use of CAS (Compare and Swap)
Fine-grained concurrency
add(node1) {
  while (true) {
    <prev1,cur1,next1,found> = locate(node1.key)
    if (found) return false;
    node1.next = cur1
    if (CAS(prev1.next, <0,cur1>, <0,node1>))
      return true;
  }
}
remove(key) {
  while (true) {
    <prev2,cur2,next2,found> = locate(key)
    if (!found) return false;
    if (!CAS(cur2.next, <0,next2>, <1,next2>)) continue;
    if (CAS(prev2.next, <0,cur2>, <0,next2>))
      DeleteNode(cur2);
    else locate(key);
    return true;
  }
}
[Figure: linked list starting at Head with nodes 1, 2, 3, 4. Tr: remove(2) holds prev2, cur2, next2; Ta: add(3) holds prev1, cur1, node1. CAS fails due to mark bit.]
Detecting a Bug
A node is removed before it is marked
remove(key) {
  while (true) {
    <prev,cur,next,found> = locate(key)
    if (!found) return false;
    if (!CAS(cur.next, <0,next>, <1,next>))
      continue;
    if (CAS(prev.next, <0,cur>, <0,next>))
      DeleteNode(cur);
    else locate(key);
  }
}
add(node1) {
  while (true) {
    <prev1,cur1,next1,found> = locate(node1.key)
    if (found) return false;
    node1.next = cur1
    if (CAS(prev1.next, <0,cur1>, <0,node1>))
      return true;
  }
}
remove(key) {
  while (true) {
    <prev2,cur2,next2,found> = locate(key)
    if (!found) return false;
    if (CAS(prev2.next, <0,cur2>, <0,next2>)) DeleteNode(cur2);
    if (!CAS(cur2.next, <0,next2>, <1,next2>)) continue;
    else locate(key);
  }
}
Concurrent Set [M. M. Michael, SPAA’02]
[Figure: linked list starting at Head with nodes 1, 2, 3, 4. Tr: remove(2) holds prev2, cur2, next2; Ta: add(3) holds prev1, cur1, node1. A memory leak.]
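The two interleavings on the slides above can be replayed in a small single-threaded model. This is only a sketch under assumptions: `Node`, `cas`, `reachable_keys` and `interleave` are hypothetical names, the `next` field is modeled as a `(mark, successor)` tuple standing for `<mark,successor>`, and the schedule (Ta has located prev1 = node 2 and cur1 = node 4 before Tr acts) is one plausible reading of the figures.

```python
class Node:
    """List node whose next field packs a mark bit with a successor."""
    def __init__(self, key, succ=None):
        self.key = key
        self.next = (0, succ)  # stands for <0,succ>


def cas(node, expected, new):
    """Model of CAS on node.next; returns True if the swap happened."""
    mark, succ = node.next
    if mark == expected[0] and succ is expected[1]:
        node.next = new
        return True
    return False


def reachable_keys(head):
    """Keys of all nodes reachable from the sentinel head."""
    keys, node = [], head.next[1]
    while node is not None:
        keys.append(node.key)
        node = node.next[1]
    return keys


def interleave(mark_before_unlink):
    # Head -> 1 -> 2 -> 4
    n4 = Node(4)
    n2 = Node(2, n4)
    n1 = Node(1, n2)
    head = Node(None, n1)
    # Ta: add(3) has already located prev1 = node 2, cur1 = node 4.
    node1 = Node(3, n4)
    prev1, cur1 = n2, n4
    # Tr: remove(2) has already located prev2 = node 1, cur2 = node 2.
    prev2, cur2, next2 = n1, n2, n4
    if mark_before_unlink:
        cas(cur2, (0, next2), (1, next2))           # Tr marks node 2
        added = cas(prev1, (0, cur1), (0, node1))   # Ta sees <1,4> and fails
        cas(prev2, (0, cur2), (0, next2))           # Tr unlinks node 2
    else:
        cas(prev2, (0, cur2), (0, next2))           # Tr unlinks node 2 first
        added = cas(prev1, (0, cur1), (0, node1))   # Ta succeeds on a removed node
        cas(cur2, (0, next2), (1, next2))           # Tr's marking comes too late
    return added, reachable_keys(head)


ok = interleave(mark_before_unlink=True)
bug = interleave(mark_before_unlink=False)
print(ok, bug)
```

With marking first, Ta's CAS fails and add would retry. With the swapped order, add reports success although node 3 is not reachable from Head.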
Main Results
Thread-correlation analysis
  A new kind of thread-modular analysis
  Precise enough to prove properties of fine-grained concurrent programs
  Not automatically proven before
Two transformer enhancements
  Summarizing Effects
  Summary Abstraction
  On a concurrent set implementation, the speedup is 34x!
Thread-modular Abstraction
Abstraction from the point of view of one thread
  Maintains local store and global store precisely
  Abstracts away local stores of all other threads
Naturally handles unbounded number of threads
Imprecise modeling of thread interactions
  Fine-grained concurrency
[Figure: program state with thread t; the main thread keeps precise information.]
Thread Correlation Abstraction
Refines thread-modular abstraction to reason about thread interactions
  Tracks correlations between local stores of every two threads
3 levels of abstraction
  Main thread
  Secondary thread
  All other threads
Main-Second abstracted asymmetrically
[Figure: main thread kept with precise information; secondary thread tracked less precisely.]
Singleton Buffer Example
boolean empty = true;
Object b = null;
produce() {
1: Object p = new();
2: await (empty) then {
b = p;
empty = false;
}
3:
}
consume() {
Object c;
4: await (!empty) then {
c = b;
empty = true;
}
5: use(c);
6: dispose(c);
7:
}
Safe dereference
No double free
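The singleton buffer above can be tried out as a runnable program. The following is a minimal sketch, assuming that `await (cond) then {...}` can be modeled with `threading.Condition.wait_for`; the class name `SingletonBuffer`, the `disposed` log and the `run` driver are hypothetical additions used only to observe the two properties.

```python
import threading


class SingletonBuffer:
    def __init__(self):
        self.empty = True
        self.b = None
        self._cond = threading.Condition()
        self.disposed = []  # log of dispose() calls, to check for double frees

    def produce(self):
        p = object()                                      # 1: Object p = new()
        with self._cond:                                  # 2: await (empty) then {...}
            self._cond.wait_for(lambda: self.empty)
            self.b = p
            self.empty = False
            self._cond.notify_all()

    def consume(self):
        with self._cond:                                  # 4: await (!empty) then {...}
            self._cond.wait_for(lambda: not self.empty)
            c = self.b
            self.empty = True
            self._cond.notify_all()
        assert c is not None                              # 5: use(c), safe dereference
        self.disposed.append(c)                           # 6: dispose(c)


def run(n):
    """Run n producers and n consumers to completion on one buffer."""
    buf = SingletonBuffer()
    threads = [threading.Thread(target=buf.consume) for _ in range(n)]
    threads += [threading.Thread(target=buf.produce) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buf


buf = run(4)
print(len(buf.disposed))
```

A run like this only tests some schedules. The analysis in the talk aims to prove the two properties for all schedules and any number of threads.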
Thread Modular Abstraction
[Figure: three concrete states over consumer threads c1, c2, c3, c4, each with global store "empty"; the thread PCs are (6, 6, 4, 4), (6, 4, 6, 4) and (6, 4, 4, 6). Per-thread factoids: C1 = {6}, C2 = {6, 4}, C3 = {6, 4}, C4 = {6, 4}, …]
Thread Modular Abstraction
[Figure: the same three concrete states and per-thread factoids C1 = {6}, C2 = {6, 4}, C3 = {6, 4}, C4 = {6, 4}, …; the figure also shows the state with PCs (6, 4, 4, 4).]
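The thread-modular abstraction of the three states in the figure can be computed directly. This is a simplified sketch, assuming each thread's local store is reduced to its PC and the global store is ignored; `STATES`, `abstract_thread_modular` and `concretize_thread_modular` are hypothetical names.

```python
from itertools import product

THREADS = ("c1", "c2", "c3", "c4")
# PCs of c1..c4 in the three concrete states shown on the slide.
STATES = {(6, 6, 4, 4), (6, 4, 6, 4), (6, 4, 4, 6)}


def abstract_thread_modular(states):
    """Keep, per thread, the set of PCs it has in some state."""
    return {t: {s[i] for s in states} for i, t in enumerate(THREADS)}


def concretize_thread_modular(factoids):
    """All states in which every thread's PC is allowed by its own factoids."""
    return set(product(*(sorted(factoids[t]) for t in THREADS)))


factoids = abstract_thread_modular(STATES)
concrete = concretize_thread_modular(factoids)
print(factoids)
print(len(concrete))
```

Because each thread is considered in isolation, the concretization is a Cartesian product and contains states such as (6, 6, 6, 6) that none of the original states resembles.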
Thread Correlation Abstraction
[Figure: the three concrete states with PCs (6, 6, 4, 4), (6, 4, 6, 4) and (6, 4, 4, 6) over c1, c2, c3, c4, and their 2-thread factoids for the pairs (C1,C2), (C1,C3), (C1,C4), (C2,C1), (C2,C3), (C2,C4), … Each 2-thread factoid shows the main thread's PC (6 or 4), the global store "empty" and the two threads involved.]
Concretization Example
[Figure: the same three concrete states and their 2-thread factoids for the pairs (C1,C2), (C1,C3), (C1,C4), (C2,C1), (C2,C3), (C2,C4), …; the figure also shows the state with PCs (6, 4, 4, 4).]
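The concretization of the pairwise abstraction can be enumerated in the same PC-only setting. The sketch below assumes that a 2-thread factoid is just the pair of PCs of two distinct threads; `abstract_pairs` and `concretize_pairs` are hypothetical names, and the real factoids on the slides also carry heap information.

```python
from itertools import permutations, product

N = 4  # threads c1..c4, indexed 0..3
STATES = {(6, 6, 4, 4), (6, 4, 6, 4), (6, 4, 4, 6)}


def abstract_pairs(states):
    """For each ordered pair of distinct threads, the PC pairs seen together."""
    return {(i, j): {(s[i], s[j]) for s in states}
            for i, j in permutations(range(N), 2)}


def concretize_pairs(factoids):
    """All states whose every 2-thread projection appears in the abstraction."""
    pcs = sorted({pc for pairs in factoids.values() for pair in pairs for pc in pair})
    return {s for s in product(pcs, repeat=N)
            if all((s[i], s[j]) in pairs for (i, j), pairs in factoids.items())}


factoids = abstract_pairs(STATES)
concrete = concretize_pairs(factoids)
print(sorted(concrete))
```

In this model only (6, 4, 4, 4) is added to the three original states, whereas tracking each thread on its own would also admit states with two of c2, c3, c4 at PC 6.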
Abstractions Compared
Thread-modular abstraction: 2 levels of abstraction
Thread-correlation abstraction: 3 levels of abstraction
[Figure: main thread kept with precise information; secondary thread tracked less precisely.]
Point-wise Transformer
6: C1: dispose(c1)
[Figure: a factoid with global store "empty" and labels c1, c3, b; the PC changes from 6 to 7 after dispose(c1).]
Safe?? Single factoid: no… All factoids: Yes!
Point-wise Transformer
6: C1: dispose(c1)
[Figure: a factoid at PC 5 with labels c2, c3, b; the PCs of c1 and c4 are unknown (?).]
Build 3-Thread Factoids (model effect C1 has on C2)
C1: Executing
C2: Tracked
C3: Other
[Figure: 2-thread factoids for the pairs (C1,C2), (C1,C3), (C1,C4), (C2,C1), (C2,C3), (C2,C4), … are combined into 3-thread factoids.]
3-Thread Factoids
C1: Executing
C2: Tracked
C3: Other
[Figure: the 2-thread factoids for (C1,C2), (C1,C3), (C2,C1) and (C2,C3) are combined into two 3-thread factoids over c1, c2, c3, with PCs 6: 4: and 6: 6: shown.]
6: C1: dispose(c) (exec)
C1: Executing
C2: Tracked
C3: Other
[Figure: executing dispose(c) on each 3-thread factoid moves c1 from PC 6 to PC 7; the other threads are unchanged.]
6: C1: dispose(c) (project)
C1: Executing
C2: Tracked
C3: Other
[Figure: the 3-thread factoids after the step, with c1 at PC 7, are projected onto 2-thread factoids for the pairs (C2,C3) and (C2,C1).]
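The build, exec and project steps shown on the preceding slides can be sketched over PC-only factoids. This is an illustration under strong assumptions, not the transformer from the paper: a 2-thread factoid is modeled as an anonymous pair `(main_pc, secondary_pc)`, `step_dispose` is a hypothetical model of statement 6, and only the effect on the tracked thread's factoids is computed.

```python
def step_dispose(pc):
    """Hypothetical statement model: dispose(c) moves a thread from PC 6 to PC 7."""
    return 7 if pc == 6 else None


def indirect_effects(factoids, step):
    """Effect of one step by an executing thread on the tracked thread's factoids."""
    out = set()
    for (e, t) in factoids:               # executing thread seen with tracked thread
        e_next = step(e)
        if e_next is None:
            continue                      # the statement is not enabled at this PC
        for (e2, o) in factoids:          # executing thread seen with another thread
            if e2 != e or (t, o) not in factoids:
                continue                  # the three 2-thread views must agree
            # exec: 3-thread substate (e, t, o) becomes (e_next, t, o)
            # project: keep the factoids whose main thread is the tracked one
            out.add((t, e_next))
            out.add((t, o))
    return out


R = {(6, 4), (4, 6), (4, 4)}
result = indirect_effects(R, step_dispose)
print(sorted(result))
```

The nested loop over pairs of factoids is where the quadratic number of 3-thread substates comes from.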
Transformers Spectrum
[Axis: efficient to precise]
Most-precise transformer: incomputable?
Point-wise transformer (thread-modular): efficient, imprecise
Baseline transformer: precise enough, quadratic blow-ups
w. Summarizing Effects: precise enough, more efficient
w. Summary Abstraction: precise enough, efficient
Reducing Quadratic Blow-ups
|3-thread factoids| = O(|2-thread factoids|^2)
Summarizing Effects
  Memoize computations on common substates
  No over-approximation
Summary Abstraction
  Aggressive abstraction to executing thread
  Crucial for performance
  Potential loss of precision
Memoizing PCs
6: C1: dispose(c)
C1: Executing
C2: Tracked
C3: Other
[Figure: 2-thread factoids for (C1,C2), (C1,C3), (C2,C1) and (C2,C3), with c2 and c3 at PCs 4, 5 or 6, are combined into 3-thread factoids; exec moves c1 from PC 6 to PC 7, and proj yields the 2-thread factoids for (C2,C1) and (C2,C3).]
Memoizing PCs
6: C1: dispose(c)
C1: Executing
C2: Tracked
C3: Other
[Figure: the same 2-thread factoids for (C1,C2), (C1,C3), (C2,C1) and (C2,C3), with the PCs 6: and 5: highlighted.]
These states are identical up to the PCs, which are invisible to the executing thread
Memoizing PCs
6: C1: dispose(c)
C1: Executing
C2: Tracked
C3: Other
[Figure: the PCs of the non-executing threads are set aside as a frame; exec is applied to a single 3-thread factoid over c1, c2, c3 (c1 moves from PC 6 to PC 7), and proj yields the (C2,C1) and (C2,C3) factoids for each combination of frame PCs.]
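The memoization idea can be sketched as follows: compute the effect of the statement once per part of the substate that the executing thread can see, then re-attach the frame. This is only an illustration under assumptions; substates are PC triples `(exec_pc, tracked_pc, other_pc)`, the visible part is reduced to `exec_pc`, and `exec_dispose` and `apply_with_summaries` are hypothetical names.

```python
def exec_dispose(visible):
    """Hypothetical costly transformer on the part visible to the executing thread."""
    exec_dispose.calls += 1
    return 7 if visible == 6 else None


exec_dispose.calls = 0  # counts how often the transformer really runs


def apply_with_summaries(substates, transformer):
    """Apply the transformer once per visible part and reuse the summary."""
    summary = {}  # visible part -> effect
    out = set()
    for (e, t, o) in sorted(substates):
        if e not in summary:
            summary[e] = transformer(e)
        if summary[e] is not None:
            out.add((summary[e], t, o))  # re-attach the frame (PCs of other threads)
    return out


substates = {(6, 6, 6), (6, 6, 5), (6, 5, 6), (6, 5, 5)}
result = apply_with_summaries(substates, exec_dispose)
print(sorted(result), exec_dispose.calls)
```

Since the result for each substate is the same as applying the transformer to it directly, this reuse does not over-approximate.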
Evaluation
Implemented on top of TVLA
  Unbounded number of threads
  Unbounded number of objects
Thread-modular not precise enough
Thread correlation analysis proved required properties
Reproduced injected errors
Speedups Relative to Baseline
[Chart: y-axis "Speedup" from 0 to 40. Series: Summ. effects speedup, Summ. abs. speedup, Both speedup.]
Related Work
Thread-modular abstractions
  Finite-state model checking [Flanagan & Qadeer, SPIN’03]
  Environment abstraction [Clarke et al., VMCAI’06, TACAS’08]
Thread-modular shape analysis
  Coarse-grained concurrency [Gotsman et al., PLDI’07]
  Fine-grained concurrency [SAS’08, CAV’08]
Summary
New analysis for concurrent systems
  Thread-correlation abstraction
  Handles unbounded number of threads
Two important transformer enhancements
  Summarizing effects
  Summary abstraction
  Reduce quadratic blow-ups
Empirically evaluated
Thanks!
Which Properties Did You Prove?
Properties: Data Structure Invariants, Linearization
Benchmarks: Hand Over Hand, DCAS, CAS, Lazy List, Michael, Michael Opt
Why 3 Levels of Abstraction?
Generalizes naturally by maintaining k local stores in second level
  k=1 suffices for our benchmarks
  Same principles for optimizing
More than 3 levels of abstraction complicate reasoning; usefulness not obvious
What is the Increment Relative to CAV’08?
CAV’08 uses two levels of abstraction (thread-modular)
Baseline transformer is too expensive: it timed out on some of our benchmarks
Does Baseline Transformer Make Sense?
Transformer used by earlier CAV’08 paper
Starting point of [Flanagan & Qadeer, SPIN’03]
We added optimizations by distinguishing 3 levels of abstraction
Which Properties Did You Prove?
Properties: Data Structure Invariants, Linearization
Benchmarks: Hand Over Hand, DCAS, CAS, Lazy List, Michael, Michael Opt
(And in other thread too)
Summary Abstraction
Sound approximation heuristics
Details in paper
[Figure labels: precise reasoning, coarse reasoning, reduce preciseness, coarse reasoning; baseline transformer vs. w. summary abstraction.]
Running Times
[Chart: y-axis from 0 to 1200. Benchmarks: Hand Over Hand, Lazy List (OPODIS 05), DCAS (PLDI08). Series: No Summ Abs, No Summ Effects; No Summ Abs, With Summ Effects; With Summ Abs, No Summ Effects; With Summ Abs, With Summ Effects.]
Types of Algorithms
Lock free Wait free
Hand Over Hand
No No
DCAS
CAS No (locate)
Lazy List (locate)
Michael No
Michael Opt (locate)
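Baseline Transformer
[Figure: factoids are combined into 3-thread substates; the statement st is executed once with the 1st thread and once with the tracked thread as the executing thread, and the results are turned back into factoids. The abstract transformer TR# over thread pairs (t1, t2) has a direct and an indirect component.]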
Conditions Ensuring No Loss of Precision
Abstraction does not distinguish between local stores with same footprint
Footprint is idempotent