Heap Decomposition for Concurrent Shape Analysis

R. Manevich, T. Lev-Ami, M. Sagiv (Tel Aviv University)
G. Ramalingam (MSR India)
J. Berdine (MSR Cambridge)

Dagstuhl 08061, February 7, 2008

Thread-modular analysis for coarse-grained concurrency
E.g., [Qadeer & Flanagan, SPIN’03], [Gotsman et al., PLDI’07], …
With each lock lk, associate
  a subheap h(lk), partitioning the heap: H = h(lk1) * … * h(lkn)
  a local invariant I(lk), inferred or specified
When thread t
  acquires lk, it assumes I(lk)
  releases lk, it ensures I(lk)
Can analyze each thread “separately”
  Avoid explicitly enumerating all thread interleavings

Thread-modular analysis for fine-grained concurrency?
[Figure: four CAS (Compare And Swap) operations]
No locks means more interference between threads
No nice heap partitioning
Still, the idea of reasoning about threads separately is appealing

Overview
State space is too large for two reasons
  Unbounded number of objects, so the state space is infinite
    Apply finitary abstractions to data structures (e.g., abstract away the length of a list)
  Exponential in the number of threads
Observation: threads operate on part of the state
  Correlations between different substates are often irrelevant for proving safety properties
Our approach: develop an abstraction for substates
  Abstract away correlations between substates of different threads
  Reduce the exponential state space

Non-blocking stack [Treiber 1986]

#define EMPTY -1
typedef int data_type;
typedef struct node_t { data_type d; struct node_t *n; } Node;
typedef struct stack_t { struct node_t *Top; } Stack;

[1] void push(Stack *S, data_type v) {
[2]   Node *x = alloc(sizeof(Node));
[3]   x->d = v;
[4]   do {
[5]     Node *t = S->Top;
[6]     x->n = t;
[7]   } while (!CAS(&S->Top,t,x));
[8] }

[9]  data_type pop(Stack *S){
[10]   do {
[11]     Node *t = S->Top;
[12]     if (t == NULL)
[13]       return EMPTY;
[14]     Node *s = t->n;
[15]     data_type r = s->d;
[16]   } while (!CAS(&S->Top,t,s));
[17]   return r;
[18] }
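The listing above is slide pseudocode: locals declared inside the do-block are used in the loop condition, and CAS and alloc are left abstract. The following is a minimal compilable sketch under stated assumptions, not the authors' code. It assumes CAS can be modeled by the C11 strong compare-exchange and alloc by malloc, hoists the locals out of the loop, and never frees popped nodes (safe memory reclamation is out of scope here). It also reads the popped node's own field t->d rather than s->d as on line [15], because s is NULL when the stack holds a single node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define EMPTY (-1)

typedef int data_type;

typedef struct node_t {
    data_type d;
    struct node_t *n;
} Node;

typedef struct stack_t {
    _Atomic(Node *) Top;
} Stack;

/* Assumption: the slides' CAS(addr, old, new) is modeled by the C11 strong
 * compare-exchange; it returns nonzero iff *addr was old and is now new. */
static int CAS(_Atomic(Node *) *addr, Node *expected, Node *desired) {
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

void push(Stack *S, data_type v) {
    Node *x = malloc(sizeof(Node));
    Node *t;
    assert(x != NULL);
    x->d = v;
    do {
        t = atomic_load(&S->Top);   /* snapshot of the current top */
        x->n = t;                   /* link the new node in front of it */
    } while (!CAS(&S->Top, t, x));  /* retry if Top changed meanwhile */
}

data_type pop(Stack *S) {
    Node *t, *s;
    data_type r;
    do {
        t = atomic_load(&S->Top);
        if (t == NULL)
            return EMPTY;
        s = t->n;
        r = t->d;                   /* deviates from slide line [15], see lead-in */
    } while (!CAS(&S->Top, t, s));
    return r;                       /* t is deliberately leaked in this sketch */
}
```

A stack is set up with `Stack S; atomic_init(&S.Top, NULL);` before the first push or pop.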

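Example: successful push
[Figure: list heap with global Top, local variables t and x, and next fields n]

Example: successful push
[Figure: the same heap after the CAS on Top succeeds]

Example: unsuccessful push
[Figure: heap with Top, t, x and next fields n, in which the CAS fails]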

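Concrete states with storable threads
[Figure: concrete state with global Top, a list linked by n fields, and four thread objects: prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16), with local variables x, t, s]
Legend: thread object = name + program location; edges labeled x, t, s = local variables; n = next field of list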

Full state S1
[Figure: the concrete state with Top, the n-linked list, and threads prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16), labeled S1]

Decomposition(S1)
[Figure: four substates of S1, one per thread: M1 (prod1, pc=7), M2 (prod2, pc=6), M3 (cons1, pc=14), M4 (cons2, pc=16); each shows Top, the thread's local variables, and n fields]
Decomposition(S1) = M1 ⊓ M2 ⊓ M3 ⊓ M4
Note that S1 ⊑ Decomposition(S1)
  A substate represents all full states that contain it
Decomposition is state-sensitive (depends on values of pointers and heap connectivity)
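The statement that a substate represents all full states containing it can be illustrated with a small C model. This is only a sketch under simplifying assumptions, not HeDec's representation: heap nodes are reduced to integer ids, the n fields are omitted, and the names FullState, Substate, project and contains are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 4
#define NIL (-1)   /* null pointer or unused local variable */

/* Thread object: program location plus local variables (node ids). */
typedef struct { int pc, x, t, s; } Thread;

/* Full state: the global Top plus every thread object. */
typedef struct { int top; Thread th[NTHREADS]; } FullState;

/* Substate: the global Top plus the thread object of a single thread. */
typedef struct { int tid; int top; Thread th; } Substate;

/* Decomposition keeps, for thread tid, only what that thread can see. */
Substate project(const FullState *S, int tid) {
    assert(0 <= tid && tid < NTHREADS);
    Substate m = { tid, S->top, S->th[tid] };
    return m;
}

/* A full state contains a substate if it agrees with it on Top and on
 * the substate's thread; all other threads are unconstrained. */
bool contains(const FullState *S, const Substate *m) {
    const Thread *a = &S->th[m->tid];
    return S->top == m->top && a->pc == m->th.pc && a->x == m->th.x
        && a->t == m->th.t && a->s == m->th.s;
}
```

In this model the substate of prod1 is contained in S1 and in any state that differs from S1 only in other threads, which is the sense in which one substate stands for many full states.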

Full states S1 and S2
[Figure: S1 with threads prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16), next to S2 with the roles swapped: prod2 (pc=7), prod1 (pc=6), cons2 (pc=14), cons1 (pc=16); both show Top, local variables x, t, s and n fields]

Decomposition(S1 ⊔ S2)
[Figure: substates of S1: M1 (prod1, pc=7), M2 (prod2, pc=6), M3 (cons1, pc=14), M4 (cons2, pc=16); substates of S2: K1 (prod1, pc=6), K2 (prod2, pc=7), K3 (cons1, pc=16), K4 (cons2, pc=14)]
Decomposition(S1 ⊔ S2) = (M1 ⊔ K1) ⊓ (M2 ⊔ K2) ⊓ (M3 ⊔ K3) ⊓ (M4 ⊔ K4)
(S1 ⊔ S2) ⊑ Decomposition(S1 ⊔ S2)
  Cartesian abstraction ignores correlations between substates
  State space exponentially more compact
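The compactness claim can be made concrete with a toy C model of a Cartesian abstraction. It is a sketch under assumptions that are not in the slides: each component's substates are numbered 0 to 7, a component's set of substates is a bitmask, and all names (Cartesian, add_state, represents, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NCOMP 4    /* one component (subdomain) per thread */
#define NLOCAL 8   /* hypothetical number of distinct substates per component */

/* Full state: one substate id per component. */
typedef struct { uint8_t local[NCOMP]; } FullState;

/* Cartesian abstraction: bit k of set[i] means substate k was seen for
 * component i. Which substates occurred together is not recorded. */
typedef struct { uint8_t set[NCOMP]; } Cartesian;

/* Join one full state into the abstraction, component by component. */
void add_state(Cartesian *a, const FullState *s) {
    for (int i = 0; i < NCOMP; i++) {
        assert(s->local[i] < NLOCAL);
        a->set[i] |= (uint8_t)(1u << s->local[i]);
    }
}

/* A full state is represented iff each of its substates is present. */
int represents(const Cartesian *a, const FullState *s) {
    for (int i = 0; i < NCOMP; i++)
        if (!(a->set[i] & (1u << s->local[i])))
            return 0;
    return 1;
}

static unsigned popcount8(uint8_t v) {
    unsigned c = 0;
    for (; v; v &= (uint8_t)(v - 1))
        c++;
    return c;
}

/* Substates stored: the sum of the component set sizes. */
unsigned stored_substates(const Cartesian *a) {
    unsigned n = 0;
    for (int i = 0; i < NCOMP; i++)
        n += popcount8(a->set[i]);
    return n;
}

/* Full states represented: the product of the component set sizes. */
unsigned long represented_states(const Cartesian *a) {
    unsigned long n = 1;
    for (int i = 0; i < NCOMP; i++)
        n *= popcount8(a->set[i]);
    return n;
}
```

Adding two full states that differ in all four components stores 8 substates but represents 16 full states, including mixed ones that were never added. That over-approximation is the price of dropping correlations, and the sum-versus-product gap is where the exponential saving comes from.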

Abstraction properties
Substates in each subdomain correspond to a single thread
  Abstract away correlations between threads
  Exponential reduction of state space
Substates preserve information on part of the heap (relevant to one thread)
Substates may overlap
  Useful for reasoning about programs with fine-grained concurrency
  Better approximate interference between threads

Main results
New parametric abstraction for heaps
  Heap decomposition + Cartesian abstraction
  Parametric in underlying abstraction + decomposition
Parametric sound transformers
  Allows balancing efficiency and precision
Implementation in HeDec
  Heap Decomposition + Canonical Abstraction
Used to prove interesting properties of heap-manipulating programs with fine-grained concurrency
  Linearizability
Analysis scales linearly in number of threads

Sound transformers
[Figure: the abstract transformer # maps the sets {XHj1}, {XHj2}, {XHj3}, {XHj4} of subdomains j1 to j4 to the sets {YHj1’}, {YHj2’}, {YHj3’}, {YHj4’} of subdomains j1’ to j4’]

Pointwise transformers
[Figure: # is applied to each set separately: {XHj1} to {YHj1’}, {XHj2} to {YHj2’}, {XHj3} to {YHj3’}, {XHj4} to {YHj4’}]
Efficient, but often too imprecise

Imprecision example
[1] void push(Stack *S, data_type v) {
[2]   Node *x = alloc(sizeof(Node));
[3]   x->d = v;
[4]   do {
[5]     Node *t = S->Top;
[6]     x->n = t;
[7]   } while (!CAS(&S->Top,t,x));
[8] }
[Figure: substate M2 (prod2, pc=6) with Top, x, t and n fields]
Apply # to M2: schedule prod1 and execute x->n=t
But where do x and t of prod1 point to?

Imprecision example
[Code: push, lines [1] to [8], as on the previous slide]
[Figure: # applied with threads prod1, prod2, cons1, cons2 (pc values 7, 6, 14, 16) yields a substate for prod2 at pc=7 with Top, x, t and n fields]
False alarm: possible cyclic list

Full composition transformers
[Figure: the sets {XHj1}, {XHj2}, {XHj3}, {XHj4} of subdomains j1 to j4 are all composed, # is applied to the composition, and the result is decomposed into {YHj1’}, {YHj2’}, {YHj3’}, {YHj4’}]
Precise, but exponential space blow-up

Partial composition
[Figure: {XHj1} is composed pairwise with {XHj2}, with {XHj3}, and with {XHj4}]

Partial composition
[Figure: # is applied to each pairwise composition ({XHj1} with {XHj2}, {XHj1} with {XHj3}, {XHj1} with {XHj4}); the results yield {YHj1’}, {YHj2’}, {YHj3’}, {YHj4’}]
Efficient and precise
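The difference between pointwise and partial composition can be sketched on a deliberately tiny model. Everything here is a hypothetical stand-in rather than the HeDec transformers: a substate of a thread is a pair (top, x) of node ids in 0..3, where top is shared by all substates (the overlap) and x is the thread's local; the scheduled thread executes the toy step `top = x`; a set of substates is a 16-bit mask.

```c
#include <assert.h>
#include <stdint.h>

#define DOM 4   /* hypothetical: node ids 0..3 */

/* Set of substates (top, x), stored as a bitmask indexed by top * DOM + x. */
typedef uint16_t SubstateSet;

static SubstateSet bit(int top, int x) {
    assert(0 <= top && top < DOM && 0 <= x && x < DOM);
    return (SubstateSet)(1u << (top * DOM + x));
}

/* Pointwise: update another thread's substates for the step "top = x" of
 * the scheduled thread without consulting the scheduled thread's substates.
 * Its x is unknown, so top may become any node. */
SubstateSet step_pointwise(SubstateSet other) {
    SubstateSet out = 0;
    for (int top = 0; top < DOM; top++)
        for (int x = 0; x < DOM; x++)
            if (other & bit(top, x))
                for (int v = 0; v < DOM; v++)
                    out |= bit(v, x);
    return out;
}

/* Partial composition: pair each substate of the scheduled thread with each
 * substate of the other thread that agrees on the overlap (top), run the
 * step on the pair, then project the result back onto the other thread. */
SubstateSet step_partial(SubstateSet sched, SubstateSet other) {
    SubstateSet out = 0;
    for (int top = 0; top < DOM; top++)
        for (int xi = 0; xi < DOM; xi++) {
            if (!(sched & bit(top, xi)))
                continue;
            for (int xj = 0; xj < DOM; xj++)
                if (other & bit(top, xj))
                    out |= bit(xi, xj);   /* top = x_i; x_j is unchanged */
        }
    return out;
}
```

With one substate per thread, partial composition produces a single updated substate, while the pointwise version produces one for every possible value of top. Only two components are composed at a time, so the cost stays far below that of composing all threads.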

Partial composition example
[Figure: substates M1 (prod1, pc=7), M2 (prod2, pc=6), K1 (prod1, pc=6), K2 (prod2, pc=7), each with Top, x, t and n fields; {XHj1} is composed with {XHj2}]

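Partial composition example
[Figure: composing {XHj1} (subdomain j1) with {XHj2} (subdomain j2) gives two composed substates: K2 with K1 (prod2 at pc=7, prod1 at pc=6) and K2 with M1 (prod2 at pc=7, prod1 at pc=7), each showing Top, both threads' x and t, and n fields]
False alarm avoided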

Experimental results
List-based fine-grained algorithms
  Non-blocking stack [Treiber 1986]
  Non-blocking queue [Doherty and Groves FORTE’04]
  Two-lock queue [Michael and Scott PODC’96]
Benign data races
Verified absence of null dereferences and memory leaks
Verified linearizability
  Analysis built on top of existing full heap analysis of [Amit et al. CAV’07]
Scaled analysis from 2/3 threads to 20 threads
Extended to unbounded threads (different work)

Experimental results
Exponential time/space reduction
Non-blocking stack + linearizability
[Figure: two plots comparing Decomp and Full: number of states (0 to 250,000) against number of threads (0 to 20), and time in seconds (0 to 4,000) against number of threads (0 to 20)]

Related work
Disjoint regions decomposition [TACAS’07]
  Fixed decomposition scheme
  Most precise transformer is FNP-complete
Partial join [Manevich et al. SAS’04] [Yang et al.]
  Orthogonal to decomposition
  In HeDec we combine decomposition + partial join
Handling concurrency for an unbounded number of threads
  Thread-modular analysis [Gotsman et al. PLDI’07]
  Rely-guarantee [Vafeiadis et al. CAV’07]
  Thread quantification (submitted)

More related work
Local transformers
  Works by Reynolds, O’Hearn, Berdine, Yang, Gotsman, Calcagno
Heap analysis by separation [Yahav & Ramalingam PLDI’04] [Hackett & Rugina POPL’05]
  Decompose verification problem itself and conservatively approximate contexts
Heap decomposition for interprocedural analysis [Rinetzky et al. POPL’05] [Rinetzky et al. SAS’05] [Gotsman et al. SAS’06] [Gotsman et al. PLDI’07]
  Decompose/compose at procedure boundaries
Predicate/variable clustering [Clarke et al. CAV’00]
  Statically-determined decomposition

Conclusion
Parametric framework for shape analysis
  Scaling analyses of programs with fine-grained concurrency
  Generalizes thread-modular analysis
  Key idea: state decomposition
  Also useful for sequential programs
Used to prove intricate properties like linearizability
HeDec tool: http://www.cs.tau.ac.il/~tvla#HEDEC

Future/ongoing work
Extended analysis for an unbounded number of threads via thread quantification
  Orthogonal technique
  Both techniques compose very well
Can we automatically infer good decompositions?
Can we automatically tune transformers?
Can we reuse ideas in non-shape analyses?

Invited questions
How do you choose a decomposition?
How do you choose transformers?
How does it compare to separation logic?
What is a general principle and what is specific to shape analysis?
Caveats / limitations?

How do you choose a decomposition?
In general this is an open problem
  Perhaps counterexample-guided refinement can help
Depends on the property you want to prove
Aim at causes of combinatorial explosion
  Threads
  Iterators
For linearizability we used
  For each thread t: thread node, objects referenced by local variables, objects referenced by global variables
  Objects referenced by global variables and objects correlated with sequential execution
  Locks component: for each lock, the thread that acquires it

How do you choose transformers?
In general a challenging problem
  Have to balance efficiency and precision
Have some heuristics
  Core subdomains

How does it compare to separation logic?
Relevant separating conjunction *r
  Like * but without the disjointness requirement
Do you have an analog of the frame rule?
  For disjoint regions decomposition [TACAS’07]
  In general no, but instead we can use transformers of different levels of precision
    #(I1 *r I2) = #precise(I1) *r #less-precise(I2)
    where #less-precise is cheap to compute
  Perhaps can find conditions for which #(I1 *r I2) = #precise(I1) *r I2
Relativized formulae

What is a general principle and what is specific to shape analysis?
Decomposing abstract domains is general
  Substate abstraction + Cartesian product
Parametric transformers for Cartesian abstractions are general
Chopping down heaps by heterogeneous abstractions is shape-analysis specific

Caveats / limitations?
Decomposition + transformers defined by user
  Not specialized for program/property
Too much overlap between substates can lead to more expensive analyses
Too fine a decomposition requires lots of composition
  Partial composition is a bottleneck
  We have the theory for finer-grained compositions + incremental transformers but no implementation
Instantiated framework for just one abstraction (Canonical Abstraction)
  Can this be useful for separation logic-based analyzers?
