locksmith: practical static race detection for cmwh/papers/locksmith-journal.pdflocksmith is a...

54
Locksmith: Practical Static Race Detection for C Polyvios Pratikakis University of Maryland and Jeffrey S. Foster University of Maryland and Michael Hicks University of Maryland Locksmith is a static analysis tool for automatically detecting data races in C programs. In this paper, we describe each of Locksmith’s component analyses precisely, and present systematic measurements that isolate interesting tradeoffs between precision and efficiency in each analysis. Using a benchmark suite comprising standalone applications and Linux device drivers totaling more than 200,000 lines of code, we found that a simple no-worklist strategy yielded the most efficient interprocedural dataflow analysis; that our sharing analysis was able to determine that most locations are thread-local, and therefore need not be protected by locks; that modeling C structs and void pointers precisely is key to both precision and efficiency; and that context sensitivity yields a much more precise analysis, though with decreased scalability. Put together, our results illuminate some of the key engineering challenges in building Locksmith and data race detection analyses in particular, and constraint-based program analyses in general. Categories and Subject Descriptors: F.3.1 [Logics and Meanings of Programs]: Specify- ing and Verifying and Reasoning about Programs; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages—Program Analysis; D.2.4 [Software Engineering]: Soft- ware/Program Verification—Validation General Terms: Data Race, Race Detection, Static Analysis Additional Key Words and Phrases: context-sensitive, correlation inference, sharing analysis, contextual effects, Locksmith 1. INTRODUCTION Multithreaded programming is becoming increasingly important as parallel ma- chines become more widespread. Dual-core processors are fairly common, even among desktop users, and hardware manufacturers have announced prototype chips with as many as 80 [intel.com 2007] or 96 [news.com 2007] cores. It seems inevitable that to take advantage of these resources, multithreaded programming will become the norm even for the average programmer. However, writing multithreaded software is currently quite difficult, because the programmer must reason about the myriad of possible thread interactions and may need to consider unintuitive memory models [Manson et al. 2005]. One partic- ularly important problem is data races, which occur when one thread accesses a memory location at the same time another thread writes to it [Lamport 1978]. Races make the program behavior unpredictable, sometimes with disastrous con- sequences [Leveson and Turner 1993; Poulsen 2004]. Moreover, race-freedom is an important property in its own right, because race-free programs are easier to un- ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY, Pages 1–0??.

Upload: others

Post on 18-Mar-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Locksmith: Practical Static Race Detection for C

Polyvios Pratikakis

University of Maryland

and

Jeffrey S. Foster

University of Maryland

and

Michael Hicks

University of Maryland

Locksmith is a static analysis tool for automatically detecting data races in C programs. In this

paper, we describe each of Locksmith’s component analyses precisely, and present systematic

measurements that isolate interesting tradeoffs between precision and efficiency in each analysis.Using a benchmark suite comprising standalone applications and Linux device drivers totaling

more than 200,000 lines of code, we found that a simple no-worklist strategy yielded the most

efficient interprocedural dataflow analysis; that our sharing analysis was able to determine thatmost locations are thread-local, and therefore need not be protected by locks; that modeling

C structs and void pointers precisely is key to both precision and efficiency; and that context

sensitivity yields a much more precise analysis, though with decreased scalability. Put together,our results illuminate some of the key engineering challenges in building Locksmith and data race

detection analyses in particular, and constraint-based program analyses in general.

Categories and Subject Descriptors: F.3.1 [Logics and Meanings of Programs]: Specify-ing and Verifying and Reasoning about Programs; F.3.2 [Logics and Meanings of Programs]:

Semantics of Programming Languages—Program Analysis; D.2.4 [Software Engineering]: Soft-

ware/Program Verification—Validation

General Terms: Data Race, Race Detection, Static Analysis

Additional Key Words and Phrases: context-sensitive, correlation inference, sharing analysis,

contextual effects, Locksmith

1. INTRODUCTION

Multithreaded programming is becoming increasingly important as parallel ma-chines become more widespread. Dual-core processors are fairly common, evenamong desktop users, and hardware manufacturers have announced prototype chipswith as many as 80 [intel.com 2007] or 96 [news.com 2007] cores. It seems inevitablethat to take advantage of these resources, multithreaded programming will becomethe norm even for the average programmer.

However, writing multithreaded software is currently quite difficult, because theprogrammer must reason about the myriad of possible thread interactions and mayneed to consider unintuitive memory models [Manson et al. 2005]. One partic-ularly important problem is data races, which occur when one thread accesses amemory location at the same time another thread writes to it [Lamport 1978].Races make the program behavior unpredictable, sometimes with disastrous con-sequences [Leveson and Turner 1993; Poulsen 2004]. Moreover, race-freedom is animportant property in its own right, because race-free programs are easier to un-

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY, Pages 1–0??.

2 ·

derstand, analyze and transform [Alexandrescu et al. 2005; Reynolds 2004]. Forexample, race freedom simplifies reasoning about code that uses locks to achieveatomicity [Flanagan and Freund 2004; Flanagan and Qadeer 2003].

In prior work, we introduced a static analysis tool called Locksmith for automat-ically finding all data races in a C program [Pratikakis et al. 2006b]. Locksmithaims to soundly detect data races, and works by enforcing one of the most commontechniques for race prevention: for every shared memory location ρ, there must besome lock ` that is held whenever ρ is accessed. When this property holds, we saythat ρ and ` are consistently correlated. In our prior work, we formalized Lock-smith by presenting an algorithm for checking consistent correlation in λ�, a smallextension to the lambda calculus that models locks and shared locations. We thenbriefly sketched some of the necessary extensions to handle the full C language, anddescribed empirical measurements of Locksmith on a benchmark suite.

In this paper, we discuss in detail the engineering aspects of scaling the basicalgorithms for race detection to the full C language. We present our algorithmsprecisely on a core language similar to λ�, which captures the interesting andrelevant features of C with POSIX threads and mutexes. We subsequently extendit with additional features of C, and describe how we handle them in Locksmith.We then perform a systematic exploration of the tradeoffs between precision andefficiency in the analysis algorithms used in Locksmith, both in terms of thealgorithm itself, and in terms of its effects on Locksmith as a whole. We performedmeasurements on a range of benchmarks, including C applications that use POSIXthreads and Linux kernel device drivers. Across more than 200,000 lines of code, wefound many data races, including ones that cause potential crashes. Put together,our results illuminate some of the key engineering challenges in building Locksmithin particular, and constraint-based program analyses in general. We discoveredinteresting—and sometimes unexpected—conclusions about the configuration ofanalyses that lead to the best precision with the best performance. We believe thatour findings will prove valuable for other static analysis designers.

We found that using efficient techniques in our dataflow analysis engine eliminatesthe need for complicated worklist algorithms that are traditionally used in similarsettings. We found that our sharing analysis was effective, determining that theoverwhelming majority of locations are thread-local, and therefore accesses to themneed not be protected by locks. We also found that simple techniques based onscoping and an intraprocedural uniqueness analysis improve noticeably on our basicthread sharing analysis. We discovered that field sensitivity is essential to model Cstructs precisely, and that distinguishing each type that a void ∗ may represent iskey to good precision. Lastly, we found that context sensitivity, which we achievewith parametric polymorphism, greatly improves Locksmith’s precision.

The next section presents an overview of Locksmith and its constituent algo-rithms, and serves as a road map for the rest of the paper.

2. OVERVIEW

Fig. 1 shows the architecture of Locksmith, which is structured as a series of sub-analyses that each generate and solve constraints. In this figure, plain boxes repre-sent processes and shaded boxes represent data. Locksmith is implemented using

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 3

CILCFG

Labeling and Constraint Generation

AbstractControl Flow Graph

Contextual Effect

Constraints

Label Flow Constraints(Points-to Analysis)

Linearity Check

Linearity Constraints

Sharing AnalysisLock State Analysis

Shared LocationsAcquired Locks

Correlation Inference

Correlations

Race Detection

Escaping Check

Non-linear Locks

Fig. 1. Locksmith architecture

CIL, which parses the input C program and simplifies it to a core sub-language [Nec-ula et al. 2002]. As of now, Locksmith supports the core of the POSIX and Linuxkernel thread API, namely the API calls for creating a new thread, the calls for allo-cating, acquiring, releasing and destroying a lock, as well as trylock(). Currently,we do not differentiate between read and write locks.

In the remainder of this section, we sketch each of Locksmith’s components andthen summarize the results of applying Locksmith to a benchmark suite. In thesubsequent discussion, we will use the code in Fig. 2 as a running example.

The program in Fig. 2 begins by defining four global variables, locks lock1 andlock2 and integers count1 and count2. Then lines 4–9 define a function atomic incthat takes pointers to a lock and an integer as arguments, and then increments

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

4 ·

1 pthread mutex t lock1, lock2;2 int count1 = 0, count2 = 0;3

4 void atomic inc(pthread mutex t ∗lock ,5 int ∗count) {6 pthread mutex lock(lock );7 ∗count++;8 pthread mutex unlock(lock);9 }

10

11 int main(void) {12 pthread t1, t2, t3 ;13 int local = 0;14

15 pthread mutex init (&lock1, NULL);16 pthread mutex init (&lock2, NULL);17

18 local ++;19 pthread create 1(&t1, NULL, thread1, &local);20 pthread create 2(&t2, NULL, thread2, NULL);21 pthread create 3(&t3, NULL, thread3, NULL);22 }

23 void ∗thread1(void ∗a) {24 int ∗y = (int ∗) a; /∗ int∗ always ∗/25 while(1) {26 ∗y++; /∗ thread−local ∗/27 }28 }29

30 void ∗thread2(void ∗c) {31 while(1) {32 pthread mutex lock(&lock1);33 count1++;34 pthread mutex unlock(&lock1);35 count2++; /∗ access without lock ∗/36 }37 }38

39 void ∗thread3(void ∗b) {40 while(1) {41 /∗ needs polymorphism for atomic inc ∗/42 atomic inc4(&lock1, &count1);43 atomic inc5(&lock2, &count2);44 }45 }

Fig. 2. Example multithreaded C program

the integer while holding the lock. The main function on lines 11–22 allocates aninteger variable local, initializes the two locks, and then spawns three threads thatexecute functions thread1, thread2 and thread3, passing variable local to thread1 andNULL to thread2 and thread3. We annotate each thread creation and function callsite, except calls to the special mutex initialization function, with a unique indexi, whose use will be explained below. The thread executing thread1 (lines 23–28)first extracts the pointer-to-integer argument into variable y and then continuouslyincrements the integer. The thread executing thread2 (lines 30–37) consists of aninfinite loop that increases count1 while holding lock lock1 and count2 withoutholding a lock. The thread executing thread3 (lines 39–45) consists of an infiniteloop that calls atomic inc twice, to increment count1 under lock1 and count2 underlock2.

There are several interesting things to notice about the locking behavior in thisprogram. First, observe that though the variable local is accessed both in the par-ent thread (lines 13,18) and its child thread thread1 (via the alias ∗y on line 26),no race is possible despite the lack of synchronization. This is because these ac-cesses cannot occur simultaneously, because the parent only accesses local beforethe thread for thread1 is created, and never afterward. Thus both accesses are lo-cal to a particular thread. Second, tracking of lock acquires and releases must beflow-sensitive, so we know that the access on line 33 is guarded by a lock, and theaccess on line 35 is not. Lastly, the atomic inc function is called twice (lines 42–43)with two different locks and integer pointers. We need context sensitivity to avoidconflating these two calls, which would lead to false alarms.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 5

&lock1 &lock2

lock count

&count1 &count2

&local a y(1

(4 (5 (4 (5

atomic_inc

main thead1

(a) Label flow graph

NewL(&lock1) [15]

NewL(&lock2) [16]

Acc(&local) [18]

Fork [19]

Fork [20]

Fork [21]

Acc(y) [26]

Acq(&lock1) [32]

Rel(&lock1) [34]

Acc(&count1) [33]

Acc(&count2) [35]

Acq(lock) [6]

Rel(lock) [8]

Acc(count) [7]

Call [42]

Call [43]

Ret [42]

Ret [43]

thread1

thread2

thread3

(from main)

(from main)

(from main)

(1

(2

(3

(4

(5

)4

)5

main

thread1

atomic_inc

thread2

thread3

(b) Abstract control flow graph

Fig. 3. Constraint graphs generated for example in Fig. 2

2.1 Labeling and Constraint Generation

The first phase of Locksmith is labeling and constraint generation, which traversesthe CIL CFG and generates two key abstractions that form the basis of subsequentanalyses: label flow constraints, to model the flow of data within the program, andabstract control flow constraints, to model the sequencing of key actions and relatethem to the label flow constraints. Because a set of label flow constraints can beconveniently visualized as a graph, we will often refer to them as a label flow graph,and do likewise for a set of abstract control flow constraints.

2.1.1 Label flow graph. Fig. 3(a) shows the label flow graph for the examplefrom Fig. 2. Nodes are static representations of the run-time memory locations(addresses) that contain locks or other data. Edges represent the “flow” of datathrough the program [Mossin 1996; Rehof and Fahndrich 2001; Kodumal and Aiken2005], e.g., according to assignment statements or function calls. The source of a

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

6 ·

path in the label flow graph is an allocation site, e.g., it is the address of a global orlocal variable (e.g., &lock1, &count1, or &local in Fig. 2), or the representation ofa program line at which a malloc() occurs. We distinguish addresses of locks fromthose of other data (which may be subject to races); generally speaking we refer tothe former using metavariable ` and the latter using metavariable ρ.

Locksmith’s label flow analysis is field-sensitive when modeling C struct types,in which each field of each instance of a struct is modeled separately. We found thatfield sensitivity significantly improves precision. To make our algorithm sufficientlyscalable, we modeled fields lazily [Foster et al. 2006]—only if (and when) a fieldwas actually accessed by the program does Locksmith model it, as opposed toeagerly tracking each field of a given instance from the time the instance is created.We found that over all benchmarks only 35%, on average, of the declared fieldsof struct variables in the program are actually accessed, and so the lazy approachafforded significant savings.

Locksmith also tries to model C’s void* types precisely, yet efficiently. In ourfinal design, when a void* pointer might point to two different types, we assumethat it is not used to cast between them, but rather that the programmer alwayscasts the void* pointer to the correct type before using it (in the style of anuntagged union). This is an unsound assumption that might possibly mask a race.However, we found it to greatly increase the precision of the analysis, and is usuallytrue for most C programs. We also tried two sound but less precise alternativestrategies. First, and most conservatively, if a type is cast to void*, we conflate allpointers in that type with each other and the void*. While sound, this techniqueis quite imprecise, and the significant amount of false aliasing it produces degradesLocksmith’s subsequent analyses. A second alternative we considered behaves inexactly the same way, but only when more than one type is cast to the same void*

pointer. Assuming a given void* is only cast to/from a single type, we can relateany pointers occurring within the type to the particular type instances cast to thevoid*, as if the type was never cast to void* in the first place. We found thatapproximately one third of all void* pointers in our benchmarks alias one type,so this strategy increased the precision compared to simply conflating all pointerscasted to a void*. Nevertheless, we found that our final design is more precise andmore efficient, in that it prunes several superficial or imprecise constraints.

To achieve context sensitivity, we incorporate additional information about func-tion calls into the label flow graph. Call and return edges corresponding to a callindexed by i in the program are labeled with (i and )i, respectively. During con-straint resolution, we know that two edges correspond to the same call only if theyare labeled by the same index. For example, in Fig. 3(a) the edges from &lock1 and&count1 are labeled with (4 since they arise from the call on line 42, and analo-gously the edges from &lock2 and &count2 are labeled with (5 . We use a variationon context-free language (CFL) reachability to compute the flow of data throughthe program [Pratikakis et al. 2006b]. In this particular example, since count isaccessed with lock held, we would discover that counti is accessed with locki heldfor i ∈ 1..2. Without the labeled edges, we could not distinguish the two call sites,and Locksmith would lose precision. In particular, Locksmith would think thatlock could be either lock1 or lock2, and thus we would not know which one was held

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 7

at the access to count, causing us to report a potential data race on line 7.Section 3 discusses the label flow analysis in detail, not considering context sen-

sitivity, while Section 5 discusses extensions to this analysis to handle struct andvoid* types. We initially present context-insensitive algorithms for each Lock-smith phase, and discuss context sensitivity for all parts in Section 6.

2.1.2 Abstract control flow graph. Fig. 3(b) shows the abstract control flow graph(ACFG) for the example from Fig. 2. Nodes in the ACFG capture operations in theprogram that are important for data race detection, and relate them to the labelsfrom the label flow graph. ACFGs contain 7 kinds of nodes (the notation [n] nextto each node indicates the line number n from which the node is induced). NewL(`)represents a statement that creates a new lock at location `, and Acq(`) and Rel(`)denote the acquire and release, respectively, of the lock `. Reads and writes ofmemory location ρ are represented by Acc(ρ). Thread creation is indicated by Forknodes, which have two successors: the next statement in the parent thread, andthe first statement of the child thread’s called function. The edge to the latter isannotated with an index just as in the label flow graph, to allow us to relate thetwo graphs. For example, the child edge for the Fork corresponding to line 19 islabeled with (1 , which is the same annotation used for the edge from &local to a inthe label flow graph. Lastly, function calls and returns are represented by Call andRet nodes in the graph. For call site i, we label the edge to the callee with (i, andwe label the return edge with )i, again to allow us to relate the label flow graphwith the ACFG. The edges from a Call to the corresponding Ret allow us to flowinformation “around” callees, often increasing precision; we defer discussion of thisfeature to Section 3.3.

In addition to label flow constraints and the abstract control flow graph, thefirst phase of Locksmith also generates linearity constraints and contextual effectconstraints, which are discussed below (Section 2.5).

2.2 Sharing Analysis

The next Locksmith phase determines the set of locations that could be potentiallysimultaneously accessed by two or more threads during a program’s execution. Werefer to these as the program’s shared locations. We limit subsequent analysisfor possible data races to these shared locations. In particular, if a location isnot shared, then it need not be consistently accessed with a particular lock held.Moreover, if an access site (that is, a read or a write through a pointer) nevertargets a shared variable, it need not be considered by the analysis.

As shown in Fig. 1, this phase takes as input contextual effect constraints, whichare also produced during labeling and constraint generation. In standard effectsystems [Talpin and Jouvelot 1994], the effect of a program statement is the set oflocations that may be accessed (read or written) when the statement is executed.Our contextual effect system additionally determines, for each program state, thefuture effect, which contains the locations that may be accessed by the remainder ofthe current thread. To compute the shared locations in a program, at each threadcreation point we intersect the standard effect of the created thread with the futureeffect of the parent thread. If a location is in both effects, then the location isshared. Note that the future effect of a thread includes the effect of any threads it

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

8 ·

creates, so the future effect of the parent includes the effects of any threads thatthe parent creates later.

For example, consider line 19 in Fig. 2. The spawned thread1 has standardeffect {&local}. The parent thread by itself has no future effect, since it accesses nointeresting variables. However, it spawns two child threads which themselves accesscount1 and count2. Therefore, the future effect at line 19 is {&count1,&count2}.Since {&local1}∩{&count1,&count2} = ∅, there are no shared locations created bythis fork. In particular, even though local was accessed in the past by the parentthread (line 18), our sharing analysis correctly determines all its accesses to bethread local.

On the other hand, consider line 20. Here the effect of the spawned thread2 is{&count1,&count2}, and the future effect at line 20 is also {&count1,&count2}.Thus we determine from this call that count1 and count2 are shared. Notice thathere it was critical to include the effect of thread3 when computing the future effectof the parent, since the parent thread itself does not access anything interesting.

In our implementation, we also distinguish read and write effects, and only marka location ρ as shared if at least one of the accesses to ρ in the intersected effects isa write. In other words, we do not consider locations that are only read by multiplethreads to be shared, and Locksmith does not generate race warnings for them.For that reason, we do not need to differentiate between read and write accesses inthe Acc(ρ) nodes of the ACFG.

The analysis just described only determines if a location is ever shared. It couldbe that a location starts off as thread local and only later becomes shared, meaningthat its initial accesses need not be protected by a consistent lock, while subsequentones do. For example, notice that &count1 in Fig. 2 becomes shared due to thethread creations at lines 20 and 21, since both thread2 and thread3 access it. Sowhile the accesses at lines 33 and 27 (via line 42) must consider &count1 as shared,&count1 would not need to be considered shared if it were accessed at, say, line 19.We use a simple dataflow analysis to distinguish these two cases, and thus avoidreporting a false alarm in the latter case. Section 4 presents the sharing analysisand this variant in more detail.

2.3 Lock State Analysis

In the next phase, Locksmith computes the state of each lock at every programpoint. To do this, we use the ACFG to compute the set of locks ` held before andafter each statement.

In the ACFG in Fig. 3(b), we begin at the entry node by assuming all locks arereleased. In the subsequent discussion, we write Ai for the set of locks that aredefinitely held after statement i. Since statements 15–21 do not affect the set oflocks held, we have A15 = A16 = A18 = A19 = A20 = A21 = AEntry = ∅.

We continue propagation for the control flow of the three created threads. Notethat even if a lock is held at a fork point, it is not held in the new thread, so weshould not propagate the set of held locks along the child Fork edge. For thread1,we find simply that A26 = ∅. For thread2, we have A32 = A33 = {&lock1}, andA34 = A35 = ∅. And lastly for thread3, we have A42 = A43 = A8 = ∅ andA6 = A7 = {lock}. Notice that this last set contains the name of the formalparameter lock. When we perform correlation inference, discussed next, we will

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 9

need to translate information about lock back into information about the actualarguments at the two call sites. Note that for the lock state and the subsequentcorrelation analyses to be sound, we need to reason about the linearity of locks(introduced in Section 2.5). We do not need to reason about the linearity of sharedmemory locations the same way, because contrary to locks, any imprecision on thealiasing of memory locations only leads to a more conservative analysis.

2.4 Correlation Inference

The next phase of Locksmith is correlation inference, which is the core race de-tection algorithm. For each shared variable, we intersect the sets of locks held atall its accesses. We call this the guarded-by set for that location, and if the set isempty, we report a possible data race.

We begin by generating initial correlation constraints at each Acc(ρ) node suchthat ρ may be shared according to the sharing analysis. Correlation constraintshave the form ρ�{`1, . . . , `n}, meaning location ρ is accessed with locks `1 through`n held. We write Cn for the set of correlation constraints inferred for statement n.

The first access in the program, on line 18, yields no correlation constraints(C18 = ∅) because, as we discussed above, the sharing analysis determines the&local is not a shared variable. Similarly, C26 = ∅ because the only location that“flows to” y in the label flow graph is &local, which is not shared. On line 33,on the other hand, we have an access to a shared variable, and so we initializeC33 = {&count1 � {&lock1}}, using the output of the lock state analysis to reportwhich locks are held. Similarly, C35 = {&count2�∅}, since no locks are held at thataccess. Finally, C7 = {count � {lock}}. Here we determine count may be sharedbecause at least one shared variable flows to it in the label flow graph.

Notice that this last correlation constraint is in terms of the local variables ofatomic inc. Thus at each call to atomic inc, we must instantiate this constraint interms of the caller’s variables. We use an iterative fixpoint algorithm to propagatecorrelations through the control flow graph, instantiating as necessary until we reachthe entry node of main. At this point, all correlation constraints are in terms of thenames visible in the top-level scope, and so we can perform the set intersectionsto look for races. Note that, as is standard for label flow analysis, when we labela syntactic occurrence of malloc() or any other memory allocation or lock creationsite, we treat that label as a top-level name.

We begin by propagating C7 backwards, setting C6 = C7. Continuing the back-wards propagation, we encounter two Call edges. For each call site i in the program,there exists a substitution Si that maps the formal parameters to the actual pa-rameters; this substitution is equivalent to a polymorphic type instantiation [Rehofand Fahndrich 2001; Pratikakis et al. 2006a]. For call site 4 we have S4 = [lock 7→&lock1, count 7→ &count1]. Then when we propagate the constraints from C7 back-wards across the edge annotated (4 , we apply S4 to instantiate the constraints forthe caller. In this case we set C42 = S4({count� {lock}}) = {&count1� {&lock1}}and thus we have derived the correct correlation constraint inside of thread3. Sim-ilarly, when we propagate C6 backwards across the edge annotated (5 , we findC43 = {&count2 � {&lock2}}.

We continue backwards propagation, and eventually push all the correlations wehave mentioned so far back to the entry of main:

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

10 ·&count1 � {&lock1} from line 33&count2 � ∅ from line 35&count1 � {&lock1} from the call on line 42&count2 � {&lock2} from the call on line 43

(Note that there are substitutions for the calls indexed by 1–3, but they do notrename &counti or &locki, since those are global names.) We now intersect thelock sets from each correlation, and find that &count1 is consistently correlatedwith (or guarded by) lock1, whereas &count2 is not consistently correlated with alock. We then report a race on &count2.

One important detail we have omitted is that when we propagate correlationconstraints back to callers, we need to interpret them with respect to the “closure”of the label flow graph. For example, given a constraint x � {&lock}, if &y flowsto x in the label flow graph, then we also derive &y � {&lock}. More informationabout the closure computation can be found elsewhere [Pratikakis et al. 2006b].

Propagating correlation constraints backwards through the ACFG also helps toimprove the reporting of potential data races. In our implementation, we alsoassociate a program path, consisting of a sequence of file and line numbers, witheach correlation constraint. When we generate an initial constraint at an access inthe ACFG, the associated path contains just the syntactic location of that access.Whenever we propagate a constraint backwards across a Call edge, we prepend thefile and line number of the call to the path. In this way, when the correlationconstraints reach the main, they describe a path of function calls from main tothe access site, essentially capturing a stack trace of a thread at the point of apotentially racing access, and developers can use these paths to help understanderror messages.

Section 3.3.2 presents the algorithm for solving correlation constraints and in-ferring all correlations in the program. Since correlation analysis is an iterativefixpoint computation, in which we iteratively convert local names to their globalequivalents, we compute correlations using the same framework we used to inferthe state of locks.

2.5 Linearity and Escape Checking

The constraint generation phase also creates linearity constraints, which we use toensure that a static lock name ` used in the analysis never represents two or morerun-time locks that are simultaneously live. Without this assurance, we could notmodel lock acquire and release precisely. That is, suppose during the lock stateanalysis we encounter a node Acq(`). If ` is non-linear, it may represent more thanone lock, and thus we do not know which one will actually be acquired by thisstatement. On the other hand, if ` is linear, then it represents exactly one locationat run time, and hence after Acq(`) we may assume that ` is acquired.

Lock ` could be non-linear for a number of reasons. Consider, for example, alinked list data structure where each node of the linked list contains a lock, meantto guard access to the data at that node. Standard label flow analysis will labeleach element in such a recursive structure with the same name ρ whose pointed-tomemory is a record containing some lock `. Thus, ρ statically represents arbitrarilymany run-time locations, and consequently ` represents arbitrarily many locks.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 11

PO

SIX

thre

ad

pro

gra

ms

Lin

ux

dev

ice

dri

ver

s

Benchmark Size Time Warnings Not- Races Race/Total(LOC) (sec) guarded locations %

aget 1,914 0.85 62 31 31 62/352 (17.61)

ctrace 2,212 0.59 10 9 2 10/311 (3.22)

engine 2,608 0.88 7 0 0 7/410 (1.71)knot 1,985 0.78 12 8 8 12/321 (3.74)

pfscan 1,948 0.46 6 0 0 6/240 (2.50)

smtprc 8,624 5.37 46 1 1 46/1079 (4.26)

3c501 17,443 9.18 15 5 4 15/408 (3.68)eql 16,568 21.38 35 0 0 35/270 (12.96)

hp100 20,370 143.23 14 9 8 14/497 (2.82)

plip 19,141 19.14 42 11 11 44/466 (9.44)sis900 20,428 71.03 6 0 0 6/779 (0.77)

slip 22,693 16.99 3 0 0 3/382 (0.99)

sundance 19,951 106.79 5 1 1 5/753 (0.66)synclink 24,691 1521.07 139 2 0 139/1298 (10.71)wavelan 20,099 19.70 10 1 1 10/695 (1.44)

Total 200,675 1937.44 412 78 67 414/8261 (5.01)

Fig. 4. Benchmarks

With such a representation, a naive analysis would not complain about a programthat acquires a lock contained in one list element but then accesses data present inother elements.

To be conservative, Locksmith treats locks ` such as these as non-linear, withthe consequence that nodes Acq(`) and Rel(`) of such non-linear ` are ignored.This approach solves the problem of missing potential races, but is more likely togenerate false positives, e.g., when there is an access that is actually guarded by ` atrun time. Locksmith addresses this issue by using user-specified existential typesto allow locks in data structures to sometimes be linear, and includes an escapechecking phase to ensure existential types are used correctly. We refer the readerto related papers for further discussion on linearity constraints [Pratikakis et al.2006b] and how they can be augmented with existential quantification [Pratikakiset al. 2006a]. Note that the two locks used in the example of Fig. 2 are linear, sincethey can only refer to one run-time lock.

2.6 Soundness Assumptions

C’s lack of strong typing makes reasoning about C programs difficult. For example,because C programs are permitted to construct pointers from arbitrary integers, itcan be difficult to prove that dereferencing such a pointer will not race with memoryaccessed by another thread. Nevertheless, we believe Locksmith to be sound aslong as the C program under analysis has certain characteristics; assuming thesecharacteristics is typical of C static analyses. In particular, we assume that

—the program does not contain memory errors such as double frees or dereferencingof dangling pointers.

—different calls to malloc() return different memory locations.

—the program manipulates locks and threads only using the provided API.

—variables that may hold values of more than one type—in particular, void* point-ers, untagged unions, and va list lists of optional function arguments—are used

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

12 ·

in a type-safe manner; i.e., each read of such a multi-typed value presumes thetype that was last written.1

—there is no pointer arithmetic except for indexing arrays, and all array indexingis within bounds.

—asm blocks only access what is reachable from their parameters, respecting theirtypes.

—integers are not cast to pointers.

—no non-local control flow such as signal handlers or setjmp/longjmp.

Locksmith prints a warning for code that may introduce unsoundness.

2.7 Results

Fig. 4 summarizes the results of running Locksmith on a set of benchmarks varyingin size, complexity and coding style. The results shown correspond to the defaultconfiguration for all Locksmith analyses, in particular: context-sensitive, field-sensitive label flow analysis, with lazy field propagation and no conflation undervoid*; flow- and context-sensitive sharing analysis; and context-sensitive lock stateanalysis and correlation inference.

The first part of the table presents a set of applications written in C using POSIXthreads, whereas the second part of the table presents the results for a set of net-work drivers taken from the Linux kernel, written in GNU-extended C using kernelthreads and spinlocks. The first column gives the benchmark name and the secondcolumn presents the number of preprocessed and merged lines of code for everybenchmark. We used the CIL merger to combine all the code for every benchmarkinto a single C file, also removing comments and redundant declarations. The nextcolumn lists the running time for Locksmith. Experiments in this paper wereperformed on a dual-core, 3GHz Pentium D CPU with 4GB of physical memory.All times reported are the median of three runs. The fourth column lists the to-tal number of warnings (shared locations that are not protected by any lock) thatLocksmith reports. The next column lists how many of those warnings correspondto cases in which shared memory locations are not protected by any lock, as deter-mined by manual inspection. The sixth column lists how many of those we believecorrespond to races. Note that in some cases there is a difference between theunguarded and races columns, where an unguarded location is not a race. Theseare caused by the use of other synchronization constructs, such as pthread join,semaphores, or inline atomic assembly instructions, which Locksmith does notmodel. Finally, the last column presents the number of allocated memory locationsthat Locksmith reported as unprotected, versus the total number of allocatedlocations, where we consider struct fields to be distinct memory locations. Notethat the number of unprotected memory locations is different from the number ofrace warnings reported in the fourth column. This is because a Locksmith racewarning might involve several concrete memory locations, when they are aliased.

1Locksmith can be configured to treat void* pointers and va list argument lists conservatively,without assuming that they are type-safe. However, this introduces a lot of imprecision, so it is

not the default behavior.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 13

Warning: Possible data race: &count2:example.c:2 is not protected! references: dereference of count:example.c:5 at example.c:7 &count2:example.c:2 => atomic_inc.count:example.c:43 => count:example.c:5 at atomic_inc example.c:43 locks acquired: *atomic_inc.lock:example.c:43 concrete lock2:example.c:16 lock2:example.c:1 in: FORK at example.c:21 -> example.c:43

dereference of &count2:example.c:2 at example.c:35 &count2:example.c:2 locks acquired: <empty> in: FORK at example.c:20

(1)

(3)

(4)

(5)

(2)

Fig. 5. Sample Locksmith warning. Highlighting and markers added for expository purposes.

2.7.1 Warnings. Each warning produced by Locksmith contains informationto explain the potential data race to the user. Fig. 5 shows a sample warning fromthe analysis of the example program shown in Fig. 2, stored in file example.c. Theactual output of Locksmith is pure text; here we have added some highlightingand markers (referred to below) for expository purposes.Locksmith issues one warning per allocation site that is shared but inconsis-

tently (or un-)protected. In this example, the suspect allocation site is the contentsof the global variable count2, declared on line 2 of file example.c (1). After reportingthe allocation site, Locksmith then describes each syntactic access of the sharedlocation.

The text block indicated by (2) describes the first access site at line 7, accessingvariable count which is declared at line 5. The other text block shown in the errorreport (each block is separated by a newline) corresponds to a different access site.Within a block, Locksmith first describes why the expression that was actuallyaccessed aliases the shared location (3). In this case, the shared location &count2“flows to” (indicated by =>) the argument count passed to the function call ofatomic inc at line 43 (written as atomic inc.count), in the label flow graph. That,in turn, flows to the formal argument count of the function, declared at line 5, dueto the invocation at line 43. If there is more than one such path in the label flowgraph, we list the shortest one.

Next, Locksmith prints the set of locks held at the access (4). We speciallydesignate concrete lock labels, which correspond to variables initialized by functionpthread mutex init(), from aliases of those variables. Aliases are included in the errorreport to potentially help the programmer locate the relevant pthread mutex lock()and pthread mutex unlock() statements. In this case, the second lock listed is aconcrete lock created at line 16 and named lock2, after the variable that storesthe result of pthread mutex init(). The global variable lock2 itself, listed third, is

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

14 ·

a different lock that aliases the concrete lock created at line 16. There is also anadditional alias listed first, the argument ∗lock of the call to function atomic inc atline 43 (denoted as ∗atomic inc.lock : example.c : 43). We do not list aliasing pathsfor the lock sets, because we list all the aliases, and also printing the paths betweenthem would only add confusion by replicating the lock aliases’ names many times.

Finally, Locksmith gives stack traces leading up to the access site (5). Eachtrace starts with the thread creation point (either a call to pthread create() or thestart of main()), followed by a list of function invocations. For this access site,the thread that might perform the access is created at line 21 and then makes acall at example.c : 43 to the function that contains the access. We generate thisinformation during correlation inference, as we propagate information from theaccess point backwards through the ACFG (Section 3.3.2).

The other dereference listed has the same structure. Notice that the intersectionof the held lock sets of the two sites is empty, triggering the warning. In this case,the warning corresponds to a true race, caused by the unguarded write to count2 atline 35, listed as the second dereference in the warning message. To check whethera warning corresponds to an actual race, the programmer has to verify that thelisted accesses might actually happen simultaneously, including the case where asingle access occurs simultaneously in several threads. Also, the programmer wouldhave to verify that the aliasing listed indeed occurs during the program execution,and is not simply an artifact of imprecision in the points-to analysis. Moreover, theprogrammer must check whether the set of held locks contains additional locks thatLocksmith cannot verify are definitely held. Last, but not least, the programmerneeds to validate that the location listed in the warning is in fact shared amongdifferent threads.

2.7.2 Races. We found races in many of the benchmarks. In knot, all of theraces are on counter variables always accessed without locks held. These variablesare used to generate usage statistics, which races could render inaccurate, but thiswould not affect the behavior of knot. In aget, most of the races are due to anunprotected global array of shared information. The programmer intended for eachelement of the array to be thread-local, but a race on an unrelated memory locationin the signal handling code can trigger erroneous computation of array indexes,causing races that may trigger a program crash. The remaining two races are dueto unprotected writes to global variables, which might cause a progress bar to beprinted incorrectly. In ctrace, two global flag variables can be read and set at thesame time, causing them to have erroneous values. In smtprc, a variable containingthe number of threads is set in two different threads without synchronization. Thiscan result in not counting some threads, which in turn may cause the main threadto not wait for all child threads at the end of the program. The result is a memoryleak, but in this case it does not cause erroneous behavior since it occurs at theend of the program. In most of the Linux drivers, the races correspond to integercounters or flags, but do not correspond to bugs that could crash the program, asthere is usually a subsequent check that restores the correct value to a variable. Therest of the warnings for the Linux drivers can potentially cause data corruption,although we could not verify that any can cause the kernel to crash.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 15

e ::= x | v | e e | if0 e then e else e| (e, e) | e.j | ref e | ! e | e := e

| newlock | acquire e | release e | fork ev ::= n | λx:t.e | (v1, v2)

t ::= int | t× t | t→ t | ref (t) | lock

Fig. 6. Source language

2.7.3 False positives. The majority of false positives are caused by over-approx-imation in the sharing analysis. The primary reason for this is conservatism in thelabel flow (points-to) analysis, which can cause many thread-local locations to bespuriously conflated with shared locations. Since thread-local memory need not be,and usually is not, protected by locks, this causes many false warnings of races onthose locations. Overly conservative aliasing has several main causes: structuralsubtyping where the same data is cast between two different types (e.g. differentstructs that share a common prefix), asm blocks, casting to or from numericaltypes, and arrays, in decreasing order of significance. One approach to betterhandling structural subtyping may be adapting physical subtyping [Siff et al. 1999]to Locksmith. Currently, when a struct type is cast to a different struct type,Locksmith does not compute field offsets to match individual fields, but ratherconservatively assumes that all labels of one type could alias all labels of the other.

The second largest source of false positives in the benchmarks is the flow sensitiv-ity of the sharedness property, in more detail than our current flow-sensitive sharingpropagation can capture. Specifically, any time that a memory location might beaccessed by two threads, we consider it shared immediately when the second threadis created. However, in many cases thread-local memory is first initialized locally,and then becomes shared indirectly, e.g., via an assignment to a global or otherwiseshared variable. We eliminate some false positives using a simple intraproceduraluniqueness analysis—a location via a unique pointer as determined by this analysisis surely not shared—but it is too weak for many other situations.

Roadmap. In the remainder of this paper, we discuss the components just de-scribed in more detail, starting from its core analysis engine (Section 3), withseparate consideration of lock state analysis (Section 3.3.1), correlation inference(Section 3.3.2), the sharing analysis (Section 4), techniques for effectively modelingC struct and void* types (Section 5), and extensions to enable context-sensitiveanalysis (Section 6). A detailed discussion of related work on data race detectionis presented in Section 7. For many of Locksmith’s analysis components, weimplemented several possible algorithms, and measured the algorithms’ effects onthe precision and efficiency of Locksmith. By combining a careful exposition ofLocksmith’s inner workings with such detailed measurements, we have endeavoredto provide useful data to inform further developments in the static analysis of Cprograms (multithreaded or otherwise). Locksmith is freely available on the web(http://www.cs.umd.edu/projects/PL/locksmith).

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

16 ·

3. LABELING AND CONSTRAINT GENERATION

We present Locksmith’s key algorithms on the language in Fig. 6. This languageextends the standard lambda calculus, which consists of variables x, functions λx:t.e(where the argument x has type t), and function application e e. To model condi-tional control flow, we add integers n and the conditional form if0 e0 then e1 else e2,which evaluates to e1 if e0 evaluates to 0, and to e2 otherwise. To model structureddata (i.e., C structs) we introduce pairs (e, e) along with projection e.j. The latterform returns the jth element of the pair (j ∈ 1, 2). We model pointers and dynamicallocation using references. The expression ref e allocates a fresh memory locationm, initializing it with the result of evaluating e and returning m. The expression ! ereads the value stored in memory location e, and the expression e1 := e2 updatesthe value in location e1 with the result of evaluating e2.

We model locks with three expressions: newlock dynamically allocates and re-turns a new lock, and acquire e and release e acquire and release, respectively, lock e.Our language also includes the expression fork e, which creates a new thread thatevaluates e in parallel with the current thread. The expression fork e is asyn-chronous, i.e., it returns to the parent immediately without waiting for the childthread to complete.

Source language types t include the integer type int , pair types t × t, functiontypes t → t, reference (or pointer) types ref (t), and the type lock of locks. Notethat our source language is monomorphically typed and that, in a function λx:t.e,the type t of the formal argument x is supplied by the programmer. This matchesC, which includes programmer annotations on formal arguments. If we wish toapply Locksmith to a language without these annotations, we can always applystandard type inference to determine such types as a preliminary step.

3.1 Labeling and Constraint Generation

As discussed in Section 2, the first stage of Locksmith is labeling and constraintgeneration, which produces both label flow constraints, to model memory loca-tions and locks, and abstract control flow constraints, to model control flow. Wespecify the constraint generation phase using type inference rules. We discuss thelabel flow constraints only briefly here, and refer the reader to prior work [Mossin1996; Fahndrich et al. 2000; Rehof and Fahndrich 2001; Kodumal and Aiken 2004;Johnson and Wagner 2004; Pratikakis et al. 2006a], including the Locksmith con-ference paper [Pratikakis et al. 2006b], for more details. In our implementation,we use Banshee [Kodumal and Aiken 2005], a set-constraint solving engine, torepresent and solve label flow constraints. For the time being, we present a purelymonomorphic (context-insensitive) analysis; Section 6 discusses context sensitivity.

We extend source language types t to labeled types τ , defined by the grammarat the top of Fig. 7(a). The type grammar is mostly the same as before, with twomain changes. First, reference and lock types now have the forms ref ρ(τ) and lock `,where ρ is an abstract location and ` is an abstract lock. As mentioned in the lastsection, each ρ and ` stands for one or more concrete, run-time locations. Second,function types now have the form (τ, φ)→` (τ ′, φ′), where τ and τ ′ are the domainand range type, and ` is a lock, discussed further below. In this function type, φand φ′ are statement labels that represent the entry and exit node of the function.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 17

τ ::= int | τ × τ | (τ, φ)→` (τ, φ) | ref ρ(τ) | lock`

C ::= C ∪ C | τ ≤ τ | ρ ≤ ρ | ` ≤ ` | φ ≤ φ | φ : κκ ::= Acc(ρ) | NewL(`) | Acq(`) | Rel(`)| Fork | Call(`, φ) | Ret(`, φ)

ρ ∈ abstract locations

` ∈ abstract locksφ ∈ abstract statement labels

〈〈int〉〉 = int

〈〈t1 × t2〉〉 = 〈〈t1〉〉 × 〈〈t2〉〉〈〈t1 → t1〉〉 = (〈〈t1〉〉, φ1)→` (〈〈t2〉〉, φ2) φ1, φ2, ` fresh

〈〈ref (t)〉〉 = ref ρ(〈〈t〉〉) ρ fresh

〈〈lock 〉〉 = lock` ` fresh〈〈τ〉〉 = τ ′ where τ ′ is τ with fresh ρ, `, φ’s, as above

C ` φ ≤ (φ′ : κ) ≡ C ` φ ≤ φ′ and C ` φ′ : κ and φ′ fresh

(a) Auxiliary definitions

Var

C;φ; Γ, x : τ ` x : τ ;φ

Int

C;φ; Γ ` n : int ;φ

PairC;φ; Γ ` e1 : τ1;φ1

C;φ1; Γ ` e2 : τ2;φ2

C;φ; Γ ` (e1, e2) : τ1 × τ2;φ2

ProjC;φ; Γ ` e : τ1 × τ2;φ′ j ∈ 1, 2

C;φ; Γ ` e.j : τj ;φ′

RefC;φ; Γ ` e : τ ;φ′ ρ fresh

C;φ; Γ ` ref e : ref ρ(τ);φ′

DerefC;φ; Γ ` e1 : ref ρ(τ);φ1

C ` φ1 ≤ (φ′ : Acc(ρ))

C;φ; Γ ` ! e1 : τ ;φ′

AssignC;φ; Γ ` e1 : ref ρ(τ1);φ1

C;φ1; Γ ` e2 : τ2;φ2

C ` τ2 ≤ τ1 C ` φ2 ≤ (φ′ : Acc(ρ))

C;φ; Γ ` e1 := e2 : τ2;φ′

NewlockC ` φ ≤ (φ′ : NewL(`)) ` fresh

C;φ; Γ ` newlock : lock`;φ′

AcquireC;φ; Γ ` e : lock`;φ′

C ` φ′ ≤ (φ′′ : Acq(`))

C;φ; Γ ` acquire e : int ;φ′′

ReleaseC;φ; Γ ` e : lock`;φ′

C ` φ′ ≤ (φ′′ : Rel(`))

C;φ; Γ ` release e : int ;φ′′

ForkC;φ′; Γ ` e : τ ;φ′′

C ` φ ≤ (φ′ : Fork)

C;φ; Γ ` fork e : int ;φ

Lamτ = 〈〈t〉〉 C;φλ; Γ, x : τ ` e : τ ′;φ′λφλ fresh ` = {locks e may access}C;φ; Γ ` λx:t.e : (τ, φλ)→` (τ ′, φ′λ);φ

AppC;φ; Γ ` e1 : (τ, φλ)→` (τ ′, φ′λ);φ1

φ1; Γ ` e2 : τ2;φ2 C ` τ2 ≤ τC ` φ2 ≤ (φ3 : Call(`, φ′))

C ` φ3 ≤ φλ C ` φ′λ ≤ (φ′ : Ret(`, φ3))

C;φ; Γ ` e1 e2 : τ ′;φ′

CondC;φ; Γ ` e0 : int ;φ0

C;φ0; Γ ` e1 : τ1;φ1 C;φ0; Γ ` e2 : τ2;φ2

τ = 〈〈τ1〉〉 C ` τ1 ≤ τ C ` τ2 ≤ τC ` φ1 ≤ φ′ C ` φ2 ≤ φ′ φ′ fresh

C;φ; Γ ` if0 e0 then e1 else e2 : τ ;φ′

(b) Type inference rules

Fig. 7. Labeling and Constraint Generation Rules

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

18 ·

We will usually say statement φ instead of statement label φ when the distinctionis made clear by the use of a φ metavariable.

During type inference, our type rules generate constraints C, including flow con-straints of the form τ ≤ τ ′, indicating a value of type τ flows to a position of typeτ ′. Flow constraints among types are ultimately reduced to flow constraints ρ ≤ ρ′and ` ≤ `′ among locations and locks, respectively. When we draw a set of labelflow constraints as a graph, as in Fig. 3(a), ρ’s and `’s form the nodes, and eachconstraint x ≤ y is drawn as a directed edge from x to y.

We also generate two kinds of constraints that, put together, define the ACFG.Whenever statement φ occurs immediately before statement φ′, our type systemgenerates a constraint φ ≤ φ′. As above, we drawn such a constraint as an edge fromφ to φ′. We generate kind constraints of the form φ : κ to indicate that statement φhas kind κ, where the kind indicates the statement’s relevant behavior, as describedin Section 2.1. Note that our type rules assign at most one kind to each statementlabel, and thus we showed only the kinds of statement labels in Fig. 3(b). Statementlabels with no kind, including join points and function entries and exits, have nointeresting effect.

The bottom half of Fig. 7(a) defines some useful shorthands. The notation 〈〈·〉〉denotes a function that takes either a standard type or a labeled type and returnsa new labeled type with the same shape but with fresh abstract locations, locks,and statement labels at each relevant position. By fresh we mean a metavariablethat has not been introduced elsewhere in the typing derivation. We also use theabbreviation C ` φ ≤ (φ′ : κ), which stands for C ` φ ≤ φ′, C ` φ′ : κ, andφ′ fresh. These three operations often go together in our type inference rules.

Fig. 7(b) gives type inference rules that prove judgments of the form C;φ; Γ `e : τ ;φ′, meaning under constraints C and type environment Γ (a mapping fromvariable names to labeled types), if the preceding statement label is φ (the inputstatement label), then expression e has type τ and has the behavior described bystatement φ′ (the output statement label). In these rules, the notation C ` C ′

means that C must contain the constraints C ′. Viewing these rules as defininga constraint generation algorithm, we interpret this judgment as generating theconstraint C ′ and adding it to C.

We discuss the rules briefly. Var and Int are standard and yield no constraints.The output statement labels of these rules are the same as the input statementlabels, since accessing a variable or referring to an integer has no effect.

In Pair, we type e1 with the input statement φ for the whole expression, yieldingoutput statement φ1. We then type e2 starting in φ1 and yielding φ2, the outputstatement label for the whole expression. Notice we assume a left-to-right order ofevaluation. In Proj, we type check the subexpression e, and the output statementlabel of the whole expression is the output of e.

Ref types memory allocation, associating a fresh abstract location ρ with thenewly created updatable reference. Notice that this rule associates ρ with a syn-tactic occurrence of ref, but if that ref is in a function, it may be executed multipletimes. Hence the single ρ chosen by Ref may stand for multiple run-time locations.

Deref is the first rule to actually introduce a new statement label into theabstract control flow graph. We type e1, yielding a pointer to location ρ and

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 19

an output statement φ1. We then add a new statement φ′ to the control flowgraph, occurring immediately after φ1, and give φ′ the kind Acc(ρ) to indicate thedereference. Statement φ′ is the output of the whole expression. Assign is similar,but also requires the type τ2 of e2 be a subtype of the type τ1 of data referencedby the pointer e1.

Newlock types a lock allocation, assigning it a fresh abstract lock `, similarlyto Ref. Acquire and Release both require that their argument be a lock, andboth return some int (in C these functions typically return void). All three of theserules introduce a new statement label of the appropriate kind into the control flowgraph immediately after the output statement label of the last subexpression.

In Fork, the control flow is somewhat different than the other rules, to matchthe asynchronous nature of fork. At run time, the expression e is evaluated in a newthread. Hence we introduce a new statement φ′ with the special kind Fork, to markthread creation, and type e with input statement φ′. We sequence φ′ immediatelyafter φ, since that is their order in the control flow. The output statement label ofthe fork expression as a whole is the same as the input statement φ, since no statein the parent changes after the fork.

In Cond, we sequence the subexpressions as expected. We type both e1 and e2

with input statement φ0, since either may occur immediately after e0 is evaluated.We also create a fresh statement φ′ representing the join point of the condition,and add appropriate constraints to C. Since the join point has no effect on theprogram state, we do not associate a kind with it. We also join the types τ1 andτ2 of the two branches, constraining them to flow to a type τ , which has the sameshape as τ1 but has fresh locations and locks. Note that for the constraints in thisrule to be satisfied (as described in the following section), τ2 must have the sameshape as τ1, and so we could equivalently have written τ = 〈〈τ2〉〉.Lam type checks the function body e in an environment with x bound to τ , which

is the standard type t annotated with fresh locations and locks. We create a newstatement φλ to represent the function entry, and use that as the input statementlabel when typing the function body e. We place φλ and φ′λ, the output statementlabel after e has been evaluated, in the type of the function. We also add an abstractlock ` to the function type to represent the function’s lock effect, which is the setof locks that may be acquired or released when the function executes. For eachlock `′ that either e acquires or releases directly or that appears on the arrow of afunction called in e, a separate effect analysis (not shown) generates a constraint`′ ≤ `. Then during constraint resolution, we compute the set of locks that flowto ` to compute the lock effect. We discuss the use of lock effects in Section 2.4.The output statement label for the expression as a whole is the same as the inputstatement label, since defining a function has no effect.

Finally, App requires that e2’s type be a subtype of e1’s domain type. We alsoadd the appropriate statement labels to the control flow graph. Statement φ2 isthe output of e2, and φλ is the entry node for the function. Thus clearly we needto add control flow from φ2 to φλ. Moreover, φ′λ is the output statement labelof the function body, and that should be the last statement label in the functionapplication as a whole. However, rather than directly inserting φλ and φ′λ in thecontrol flow graph, we introduce two intermediate statement labels, φ3 just before

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

20 ·

C ∪ {int ≤ int} ⇒ C

C ∪ {ref ρ(τ) ≤ ref ρ′(τ ′)} ⇒ C ∪ {ρ ≤ ρ′, τ ≤ τ ′, τ ′ ≤ τ}

C ∪ {lock` ≤ lock`′} ⇒ C ∪ {` ≤ `′}

C ∪ {τ1 × τ2 ≤ τ ′1 × τ ′2} ⇒ C ∪ {τ1 ≤ τ ′1, τ2 ≤ τ ′2}C ∪ {(τ1, φ1)→`1 (τ ′1, φ

′1) ≤ (τ2, φ2)→`2 (τ ′2, φ

′2)} ⇒ C ∪ {τ2 ≤ τ1, τ ′1 ≤ τ ′2, `1 ≤ `2}

∪ {φ2 ≤ φ1, φ′1 ≤ φ′2}

C ∪ {ρ ≤ ρ′, ρ′ ≤ ρ′′} ∪ ⇒ {ρ ≤ ρ′′}C ∪ {` ≤ `′, `′ ≤ `′′} ∪ ⇒ {` ≤ `′′}

Fig. 8. Label flow constraint rewriting rules

the call, and φ′ just after. Statement φ3 has kind Call(`, φ′), and statement φ′ haskind Ret(`, φ3). Pictorially, the control flow graph looks like the following, wherethe φ’s in the kinds of the Call and Ret nodes are drawn with dashed lines:

φ2 φ3 φλ φ'λ φ'

Call(l)

Ret(l)

Using this structure, we can propagate certain dataflow facts “around” functions—i.e., directly from Call to Ret, rather than through the function body—therebyimproving precision and gaining some speed up. In particular, we use this for ourlock state computation (Section 2.3).

3.2 Label Flow Constraint Resolution

After generating constraints using the rules in Fig. 7, we can then solve the con-straints to compute various facts about the analyzed program. We use flow con-straints to answer questions about which locations and locks are used by variousstatements in the program, i.e., to perform a label flow analysis. These constraintshave the form c ≤ c′ where c and c′ are either locations ρ, locks `, or types τ .2

We apply the rewriting rules in Fig. 8 to translate the constraints to a simpler,solved form. The first group of rewriting rules operate on constraints of the formτ ≤ τ ′. These rules are standard structural subtyping rules, matching the shapes ofthe left- and right-hand sides of the constraints and then propagating subtyping tothe components in the usual way (e.g., invariant for references and contravariant forfunction domains) [Pierce 2002]. We will assume that the source program is typecorrect with respect to the standard types, so that these rewriting rules will neverencounter a constraint they cannot reduce further, i.e., in the constraint τ ≤ τ ′,the types τ and τ ′ will always be the same modulo abstract locations and locks.

After applying these rewriting rules, we are left with constraints ρ ≤ ρ′ and` ≤ `′. The remaining rewrite rules add any transitively implied constraints. Herethe notation C ∪ ⇒ C ′ means we add the constraints C ′ to C. We define Sol(C)to be the set of constraints computed by exhaustively applying the rules in Fig. 8

2We discuss the control flow constraints on φ labels in Section 3.3.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 21

to C. We can then define

Flow(C, ρ) = {ρ′ | ρ′ ≤ ρ ∈ Sol(C)}Flow(C, `) = {`′ | `′ ≤ ` ∈ Sol(C)}

In other words, Flow(C, ρ) is the set of abstract locations that flow to ρ in theconstraints C, and similarly for Flow(C, `). We can use this information to answerquestions about locations and locks in the program. For example, given a deref-erence site ! e, if type inference assigns e the type ref ρ(int) and ref e′ the typeref ρ

′(int), then if it is possible for e to evaluate to ref e′, then ρ′ ∈ Flow(C, ρ). In

other words, the set Flow(C, ρ) conservatively models the set of locations that mayflow to the reference annotated with ρ.

Consider the example of Section 2. In function thread1() (lines 23–28) the ar-gument a corresponds to a variable with type ref ρ&a(ref ρa(int)) in this language,ignoring for now the special void* type. (We follow the convention that the locationname is subscripted by the program variable that names the location.) Similarly, thelocal variable y in thread1() corresponds to a variable with type ref ρ&y (ref ρy (int)).(Because in C variables can be l-values, we consider all variables to be referencesto the corresponding type, adding an extra level of reference that is implicit inthe C program.) In C, variable names are implicitly dereferenced when they oc-cur in a read context. For example, the assignment to y in thread1() (line 24)can be written as y = a, where the occurrence of y at the left hand side of theassignment denotes the location y whereas the occurrence of a at the right handside denotes its contents. In our formal language, that assignment corresponds toy := ! a. Typing this assignment with Assign and Deref creates the flow con-straint ref ρa(int) ≤ ref ρy (int). Then, the second and first rewriting rules in Fig. 8solve the constraint reducing it to ρa ≤ ρy, and thus ρa ∈ Flow(C, ρy).

Note that the constraints in this section are monomorphic and do not include the(i and )i edges we introduced in Section 2 for context sensitivity. We will discusshow to incorporate context sensitivity into this system in Section 6.

3.3 Data Flow Analysis with the Abstract Control Flow Graph

Locksmith uses a generic, mostly standard dataflow analysis engine to com-pute per-program point information, such as which locks are held, by propagatingdataflow facts through the ACFG. To construct a dataflow analysis, the program-mer specifies the following characteristics of the target analysis [Aho and Ullman1977]:

—The direction of the analysis (forwards or backwards)

—The type of the dataflow facts to propagate

—Initial dataflow facts at the program entry, and at each statement label

—A merge function to join dataflow facts

—Transfer functions for each kind of statement label

In the remainder of this section, we discuss two dataflow analyses used by Lock-smith—lock state analysis and correlation inference—and then compare the per-formance of various strategies for implementing the fixpoint computation.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

22 ·

φ kind Acqout(φ)

Acc(ρ) Acqin(φ)

Fork ∅NewL(`) Acqin(φ)

Acq(`) Acqin(φ) ∪ Flow(C, `)

φ kind Acqout(φ)

Rel(`) Acqin(φ) \ Flow(C, `)

Call(`, φ′) Acqin(φ) ∩ Flow(C, `)Split(φ′) = Acqin(φ) \ Flow(C, `)

Ret(`, φ′) Acqin(φ) ∪ Split(φ)

Fig. 9. Transfer functions for lock state inference

1 let g () = !y in2 acquire k;3 g ();4 !x;5 release k;6 g ()

Acq(k)[2]

Call[3]

Ret[3]

{k}∅

Acc(x)[4]

Call[6]

Ret[6]

∅ ∅

{k}{k} {k}

Rel(k)[5]

∅∅

Acc(y)[1]

Fig. 10. Splitting the lock state at a function call

3.3.1 Lock State: Forwards dataflow. Locksmith’s lock state analysis is a for-wards dataflow analysis, where the sets of dataflow facts are the sets of locks held.The set of held locks is initially empty for all statement labels, and the merge func-tion is set intersection. Fig. 9 lists the transfer functions for each kind of statementlabel. Acc(ρ) and NewL(`) do not alter the lock state, since neither acquires or re-leases a lock. The transfer function for Fork always returns the empty set of locks,as every new thread starts with all locks released. (Recall from Fig. 7 that the firststatement label in a thread has kind Fork.) The transfer functions for Acq(`) andRel(`) add and remove, respectively, Flow(C, `) from the lock state. This latter setincludes ` and all its aliases. The separate linearity check (mentioned in Section 2.5)ensures this set contains only one run-time lock. In our implementation, we alsosignal a warning at an attempt to acquire or release a lock that is already acquiredor released, respectively.

The transfer function for Call(`, φ′) partitions the held locks into two non-overlap-ping sets. The transfer function sets the output set of held locks to be the inputset intersected with Flow(C, `), which contains the lock effect of the function, i.e.,the aliases of all locks that may be changed by the called function. It is thisintersection that will be propagated into the body of the function. The transferfunction saves the remaining held locks in the set Split(φ′). Then the transferfunction for Ret(`, φ′) adds the saved locks back to the held set. Due to the waythe inference rules generate fresh variables, it is always the case that φ′ is unique,generated fresh for each call site.

For example, consider the program in Fig. 10. This program calls function gtwice, once with k held, and once without holding k. It also accesses x with k held,just after the first call to g. The function g itself accesses y. The right side of thefigure shows the ACFG for this example, annotated with the lock state Acqout(φ)at the end of each edge from statement φ. Statement numbers are given belowthe statement kinds. Initially, no locks are held (∅ on the edge to Acq(k) [2]), andafter line 2 the lock state is {k} (shown on the edge from Acq(k) [2]). Then at

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 23

φ kind Corrout(φ)

Acc(ρ) Corr in(φ) ∪ {ρ� Acqout(φ)}(ρ shared)

Fork Corr in(φ)

NewL(`) Corr in(φ)

φ kind Corrout(φ)

Acq(`) Corr in(φ)

Rel(`) Corr in(φ)

Call(`, φ′) (Corr in(φ) + Split(φ′)) ∪ Corr in(φ′)

Ret(`, φ′) ∅

Fig. 11. Transfer functions for correlation inference

the call to g, we split the lock state into two parts. Since g does not acquire orrelease any locks, we propagate {k} ∩ ∅ = ∅ from the Call [3] to Acc(y) [1]. Duringcorrelation inference, we will recover the fact that y was accessed with k held. Wepropagate the other part of the lock state, {k} \ ∅, from Call [3] to Ret [3]. Next,after Ret [3], the lock state is {k}∪∅, i.e., the locks that flowed “around” g plus thelocks that flowed “through” g. Thus at Acc(x) [4], we see that k is held. Continuingthrough the program, at the next call to g, the empty set of locks is held, and thatis propagated into g as before. This time after the Ret no locks are held, and theprogram continues.

Critically, with this analysis we can discover that x is guarded by k. Imagine ifwe had not split the lock state. Then we would have no dashed lines in the graphin Fig. 10, and we would directly connect the Call and Ret nodes to Acc(y). Butthen we would have two edges flowing to Acc(y), one with state {k}, and one withstate ∅. We would then intersect these sets and decide that no locks were heldat the entry to g. That summarization is fine for g, but when we propagate thisinformation, we would decide no locks were held after the Ret statement labels inthe ACFG, and thus we would think x was accessed with no lock held.

In essence, by splitting the lock state, we make functions parametric in locks theydo not change; similar approaches have been used in other type systems [Smithet al. 2000; Foster et al. 2002]. (We do propagate information about changed locksinto the function, since otherwise we would not be able to track the state of thoselocks correctly.) We have found this kind of lightweight polymorphism critical toLocksmith’s precision. It is particularly important for commonly called functionssuch as printf, which would otherwise almost always cause the lock state to beempty upon their return.

3.3.2 Correlation Inference: Backwards dataflow. Locksmith also uses thedataflow analysis engine to implement correlation inference. Recall from Section 2.4that a correlation constraint has the form ρ�{`1, . . . , `n}, where the `i are the locksthat are held during an access to ρ. To generate such constraints the analysis uses abackwards propagation, where the per-φ state is a set Corr of correlations. Initiallythe set of correlations is empty for all statement labels, and the merge function isset union.

Fig. 11 shows the transfer rules for correlation inference. Note that since this isa backwards analysis, Corr in(φ) corresponds to the state after statement φ, andCorrout(φ) corresponds to the state before φ.

The transfer function for Acc(ρ) adds ρ � Acqout(φ) to the set of correlations,where ρ is determined to be shared according to the sharing analysis (Section 4),and Acqout(φ) was computed by the lock state inference. The transfer functions

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

24 ·

for Fork, NewL(`), Acq(`), and Rel(`) simply propagate the set of correlations. Thelast two transfer functions are the most interesting. Recall that during the lockstate computation, any held locks that are not changed by a function are notpropagated through the function body. However, any lock that is held for theduration of a function call clearly is correlated with all accesses that occur in thebody of the function. Because of this, the transfer function for Call(`, φ′) adds theset Split(φ′) of held locks that are “hidden” from the function body, to the lock setof every correlation for the function. We define Corr in(φ) + Split(φ′) to be the setof correlations {ρ� ({`1, . . . , `n} ∪ Split(φ′))} where ρ� {`1, . . . , `n} ∈ Corr in(φ).

We can apply this transfer function to all correlations caused by accesses duringthe execution of the invoked function. However, if the called function creates a newthread, the correlations originating from that thread are clearly not protected bythe held locks in Split(φ′), even though they are propagated backwards through thefunction body during the analysis. We handle that case by marking all correlationsthat have been propagated through a Fork point as closed, so that their lock set isnot augmented through Split nodes.

We also include in the output of Call(`, φ′) all correlations that occur after thefunction returns, Corr in(φ′). Then, the transfer function for Ret(`, φ′) always re-turns the empty set, as all correlations occurring after the function call are alreadypropagated to the point before it. This speeds up correlation inference, because weneed not propagate correlation information into called functions. Also, recall thatdue to the use of the special call and return kinds in the lock state analysis, we“hide” a set of held locks for every function call. Clearly, as the locks in the splitset are held during the function call, they protect all dereferences that occur inthis given evaluation of the function body. We therefore need to add that “hidden”set of held locks to the set of correlations that occur in the function. Propagat-ing correlations this way facilitates that, as only the correlations that occur in thefunction body are propagated through the Call(`, φ′) node.

To see these transfer functions in action, consider again the program in Fig. 10.Initially the correlation sets are empty, but the transfer functions for accessesAcc(x) [4] and Acc(y) [1] introduce two initial correlations in this program: x� {k}(generated at Acc(x) [4]) and y � ∅ (generated at Acc(y) [1]). The correlationconstraint x � {k} is first propagated backward to Ret [3], then to Call [3], thenAcq(k) [2], and then to the beginning of the program. Notice that following therule for Ret in Fig. 11, we do not propagate this constraint backwards into node [11]in the called function.

There are two backward paths for the second correlation, on y. When we propa-gate it to Call [3], we add the “hidden” lock k to the correlation, yielding y�{k} atCall [3]. When we propagate it to Call [6], there are no hidden locks, and so we getcorrelation y�∅. We propagate both of these constraints unchanged to the start ofthe program. Thus we have that y is correlated with both {k} and ∅, and thereforey is not consistently guarded by a lock.

In our implementation, we also associate a call stack with each correlation con-straint. When we propagate information through a Call node, we add the nameof the called function to the call stack. In this way, once correlations reach theentry of the whole program, we can report not only what locations are correlated

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 25

Benchmark Queue Stack Double Stack Postorder Set No worklist

aget 0.84 0.82 0.83 0.80 0.85 (5)

ctrace 0.61 0.58 0.60 0.57 0.59 (4)

engine 0.88 0.88 0.87 0.89 0.88 (5)knot 0.77 0.74 0.76 0.74 0.78 (8)

pfscan 0.43 0.43 0.41 0.43 0.46 (5)

smtprc 5.92 6.77 5.63 4.31 5.37 (7)

3c501 9.03 9.46 9.21 9.39 9.18 (6)eql timeout timeout timeout timeout 21.38 (4)

hp100 timeout timeout timeout 2524.49 143.23 (15)

plip 30.73 30.21 28.53 25.11 19.14 (5)sis900 85.07 82.89 84.97 79.37 71.03 (6)

slip timeout timeout timeout timeout 16.99 (5)

sundance 104.22 108.92 108.59 103.73 106.79 (11)synclink timeout timeout timeout timeout 1521.07 (11)

wavelan 19.69 20.08 19.66 19.76 19.70 (6)

Fig. 12. Time (in seconds) to perform correlation inference using several fixpoint strategies. The

rightmost column also includes the number of visits.

with which locks, but also on what paths the dereferences occurred. For example,for the code in Fig. 10, if y were shared, we would report a data race, indicatingthat the accesses were due to the call on line 3 and the call on line 6. We havefound that this “path” information makes Locksmith error reports much easier tounderstand.

3.3.3 Fixpoint Computation Strategies. We implemented several strategies forfinding a fixpoint of the sets computed by our dataflow analyses. First, we experi-mented with worklist-based schemes. In these approaches, each time the input set(i.e. the output of predecessors or successors, for a forwards or backwards analysis,respectively) computed for some node φ changed, we would add φ to the worklistfor reconsideration. We tried four particular worklist implementations, discussedin detail by Cooper et al. [2004]: Queue, Stack, Double Stack, and Postorder Set.Initially, our implementations avoided adding duplicate nodes to the worklist, butwe found that the cost to detect and eliminate duplicates is comparable to thegain from not processing the additional nodes. Thus, none of the implementationsreported here attempt to eliminate duplicate nodes.

Second, we implemented the following simple strategy for backwards (forwards)analysis without a worklist:

(1) Starting from the exit (entry) nodes, perform a postorder (reverse postorder)visit of the whole graph, applying the transfer function at each node to propa-gate the state to its predecessors (successors).

(2) If anything changed during the last visit, then revisit the whole graph.

Using postorder for backwards analysis and reverse postorder for forwards analysisis extremely important [Aho and Ullman 1977]. For example, postorder traversalvisits successors of φ before a node φ. Thus in a backwards analysis, a postordertraversal will (in the absence of cycles) require only a single pass through the ACFGto reach a fixpoint.

Fig. 12 shows the times to perform correlation inference using the various strate-

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

26 ·

gies. A timeout indicates a run that did not terminate within one hour. Wefound that for our benchmarks, the Queue, Stack, and Double Stack strategiestake roughly the same time, and the Postorder Set strategy is slightly faster. Sur-prisingly, we discovered that getting rid of the worklist altogether is the optimalstrategy, performing significantly better on the larger benchmarks (the Linux kerneldrivers). For example, on slip, all four worklist algorithms time out after one hour,whereas the no worklist strategy terminates in 17 seconds.

One would expect that the worklist strategies would visit far fewer nodes thanthe no-worklist strategy, and indeed this is the case for our benchmarks. However,Locksmith uses hashconsing of state data structures to memoize transfer functionson control flow nodes, and thus avoids re-computing the output state for everycontrol flow node visited, if the input has not changed. As a result, the cost ofmaintaining the worklist far exceeds the cost of redundant visits to nodes. The lastcolumn also shows the number of visits of the whole graph, in parentheses.

We do not present the respective measurements for the lock state analysis. Wefound that because each benchmark includes a relatively small set of locks, thesets of held locks at each program point are small. This makes the lock stateanalysis quite fast, as little information needs to be propagated. Indeed, for all thebenchmark programs the running time for lock state analysis is negligible comparedto the total running time, and all five strategies work equally well.

4. SHARING ANALYSIS

As we discussed in Section 3.3.2, during correlation inference we can safely ignoreaccesses to thread-local data, since such data need not be protected by locks. In thissection we show how we compute the set of thread-shared locations. We found thatour analysis allows Locksmith to ignore a significant—usually dominant—fractionof the accesses in the program as thread-local.

A simple, but coarse sharing analysis might work by determining the memorylocations read or written by each thread in the program; any location that is everaccessed by more than one thread is potentially shared. Our core analysis (Sec-tion 4.1) improves on this simple approach by ordering memory accesses accordingto whether they happen before or after forking a child thread. In particular, evenif two threads access the same location, the location cannot be accessed simultane-ously, and will not be considered shared by the core analysis, if all accesses in onethread occur before all accesses in the other.

We refine this basic approach in two main ways. First, we use a simple scopingrefinement to avoid confusing non-conflicting accesses to local variables with thesame names but allocated on the stacks of distinct threads. Without this refine-ment, if a parent forks two threads that execute the same function, accesses byeach thread to its own local variables would be considered conflicting, since twounordered threads would be accessing the “same” location. Second, the core anal-ysis only determines whether there are any unordered accesses of some location,but fails to distinguish those accesses that are unordered from those that are not.To gain back some of this lost precision, we use a simple uniqueness analysis totrack a location from the time it is allocated to the point it becomes assigned toa global variable or passed to a function, both which are events that could lead to

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 27

Eff-DerefΦ1; Γ ` e : ref ρ(τ)

Φε2 = Flow(C, ρ) Φ1 � Φ2 ↪→ Φ

Φ; Γ ` ! e : τ

XFlow-CtxtΦ1 = [ε1; (ε2 ∪ ω2)] Φ2 = [ε2;ω2]

Φ = [(ε1 ∪ ε2);ω2]

Φ1 � Φ2 ↪→ Φ

Eff-ForkΦe; Γ ` e : τ Φεe ⊆ Φε Φεe ∩ Φω ⊆ SharLocs

Φ; Γ ` fork e : τ

Fig. 13. Contextual effect rules for finding shared locations (selected)

the location being thread-shared—all accesses that occur before those points mustbe thread-local. Both of these refinements add useful precision at low cost (Sec-tion 4.2). We also tried tracking whether an access to an eventually shared locationoccurs before or after the fork that precipates its being shared, but we found thatdoing so did not add much benefit beyond the first two refinements (Section 4.3).

4.1 Contextual effects for finding shared locations

Locksmith uses an effect system to infer which locations are thread-shared. Givensome expression e, the effect ε of e is the set of all locations ρ that e could dereferenceduring its evaluation [Talpin and Jouvelot 1994]. In recent work, we proposed ageneralization of effects that we call contextual effects [Neamtiu et al. 2008]. Thecontextual effect of e consists not only of the effect of e’s computation, but also theeffect α of the computation that has already occurred—called the prior effect—and the effect ω of the computation yet to take place—called the future effect. Atevery occurrence of fork e in the program, we compute the effect ε of e, the childthread, and the future effect ω of the parent thread. We consider as thread-sharedthose locations in the intersection of these two sets. The implementation by defaultdifferentiates between read and write effects, and does not consider memory that isonly read in parallel to be shared.

Fig. 13 contains selected typing rules from our contextual effect system as appliedto this sharing analysis. Full details can be found in our prior paper [Neamtiu et al.2008]. In these rules, a contextual effect Φ consists of a pair [ε;ω], where the firstelement is the standard effect, and the second element is the future effect.3 In ourimplementation, we generate effect-related constraints along with label flow con-straints. For simplicity, we present effect typing rules here as a separate judgment,but it would be straightforward to merge these rules with those in Fig. 8.Eff-Deref types a dereference ! e. In this case, the contextual effect Φ of the

entire expression is computed by combining the effect Φ1 of the subexpression e,defined in the first premise, with the effect Φ2 of the dereference itself, defined inthe second premise. The second premise of Eff-Deref requires that the standardeffect of Φ2 includes ρ and locations that flow to it according to the label flowanalysis (written Flow(C, ρ), where the constraint set C is due to the label flowanalysis discussed in the previous section). Here the syntax Φε refers to the ε (i.e.,

3We elide prior effects in these rules, because they are not needed in our sharing computation.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

28 ·

1 let x = ref 1 in2 let y = ref 2 in3 !x;4 !y;5 fork (!x; !y);6 fork (!x)

Φ11; Γ ` !x; ! y : int Φε11 ⊆ Φε1Φε11 ∩ Φω1 ⊆ SharLocs

Φ1; Γ ` fork (!x; ! y) : int

Φ22; Γ ` !x : int Φε22 ⊆ Φε2Φε22 ∩ Φω2 ⊆ SharLocs

Φ2; Γ ` fork (!x) : int

Φ1 � Φ2 ↪→ Φ

Φ; Γ ` fork (!x; ! y); fork (!x) : int

where

Φ = [{ρx, ρy}; ∅]Φ1 = [{ρx, ρy}; {ρx}] Φ2 = [{ρx}; ∅]Φ11 = [{ρx, ρy}; ∅] Φ22 = [{ρx}; ∅]

Fig. 14. Example program to illustrate sharing analysis

the first) component of Φ. Given Φ1 and Φ2, the combined effect is computed inthe last premise by the judgment Φ1 � Φ2 ↪→ Φ, defined by rule XFlow-Ctxt.

Looking at XFlow-Ctxt, the first premise requires that because effect Φ1 oc-curs before Φ2, the future effect of Φ1 should include Φ2’s standard effect ε2 andits future effect ω2. Intuitively, rule XFlow-Ctxt computes the effect Φ of thesequential composition of Φ1 and Φ2. Thus, returning to Eff-Deref, the futureeffect of Φ1 naturally must contain Flow(C, ρ), according to XFlow-Ctxt, sincethe evaluation of e occurs before the dereference. The third premise of XFlow-Ctxt states that the combined effect Φ should contain the standard effects of bothΦ1 and Φ2. Again, returning to Eff-Deref, we can see that the ε component ofΦ will thus contain the effect of e (i.e., Φε1) and the effect of the dereference itself(Φε2).

Finally, Eff-Fork type checks thread creation. The second premise of Forkindicates that the standard effect Φεe of the thread itself should be contained inthe effect of the parent. Notice the future effect Φωe of the thread is unconstrained(effectively making it ∅). In particular, as expected, it contains no informationabout the effect of the parent, since the two will execute in parallel. Finally, thethird premise adds to the set SharLocs the locations that the parent and childthread (and threads they fork) could access in parallel: the intersection of thestandard effect of the child thread Φεe and the future effect Φω of the parent.

We can see this analysis in action in Fig. 14. The left side of the figure showsa simple code example, and the right side shows the typing derivation for the lastpart of the example, involving the two thread forks.4 Within this derivation, theboxed portion shows the subderivation for the expression fork (!x; ! y). Here wecan see that the standard effect of this expression is Φε1 = {ρx, ρy}, i.e., locationscorresponding to x and y. The future effect Φω1 consists of the location ρx—thisis because the effect of the subsequent thread spawn is {ρx}, and by EffDeref,this effect is also attributed to the parent thread. Consequently, the intersection ofthese two effects is {ρx}, indicating that x is potentially accessed simultaneously bytwo threads. The second thread spawn yields no additional shared locations (sinceΦω2 = ∅).

There are two interesting things to notice about this example. First, y is accessed

4Note that the syntax e1; e2 can be treated as shorthand for (λx : t.e2) e1 for some x that does

not occur free in e2, and t being the source type of e1.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 29

Pointers Allocation SitesBenchmark Total Shared In scope Total Shared In scope

aget 1,411 258 235 352 64 62

ctrace 1,089 129 116 311 12 12engine 1,441 60 17 410 11 7

knot 1,238 338 238 321 30 15

pfscan 987 53 48 240 8 7smtprc 4,275 196 67 1,079 74 46

3c501 10,020 954 913 408 20 20

eql 4,572 2,377 2,168 273 43 35

hp100 19,401 5,268 5,210 497 15 15plip 13,249 2,867 2,823 466 49 49

sis900 38,624 2,648 2,594 779 11 9

slip 13,748 1,338 1,281 382 20 19sundance 34,142 3,313 3,267 753 9 9

synclink 51,147 11,621 11,472 1,298 155 139wavelan 18,799 2,535 2,125 695 128 10

Total 214,143 33,955 32,574 8,264 649 454

Fig. 15. Precision of sharing analysis and scoping

in the parent thread, at line 4, and then subsequently in the first child thread.Nevertheless, our analysis does not consider y as shared, and indeed, y can neverbe simultaneously accessed by both threads. Second, the example illustrates that itis crucial to include the effect of a child thread in the effect of its parent. Otherwise,we would not have discovered that the two child threads both might simultaneouslyaccess x.

Fig. 15 measures the precision of the sharing analysis. For each benchmark, thesecond column shows the total number of pointers and the third column shows thenumber of shared pointers, computed by intersecting the effects at thread creationpoints. We ignore the fourth column for now and return to it in Section 4.2. Wealso report the results of the sharing analysis in terms of allocation sites, where anallocation site is either a call to malloc() or the location of a variable definition (fifthand sixth columns, ignore the last column). The results underline the effectivenessof using contextual effects to compute shared memory locations; only 16% of allpointers and only 8% of all allocation sites are in the SharLocs set computed byFork. This precision is critical to reducing false positives in Locksmith: thread-local data is almost always accessed without a lock held, and thus if Locksmithincorrectly determines a location is thread-shared, it will likely report a data racefor that location.

4.2 Scoping and Uniqueness

We used two additional optimizations to improve the results of the sharing analysiseven further. Consider the program in Fig. 16. Here the reference g (line 1) isvisible within the function f, which allocates two references x and y (lines 3–4),then writes to them (lines 5–6), and then assigns x to g (line 7). The program callsfunction f twice (lines 9–10) in two parallel threads. In this example, the sharinganalysis from Sections 4.1 and 4.3 will determine that x and y are thread-shared atthe writes on lines 5–6, because they are in the effects of both threads. However,in both cases this is overly conservative.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

30 ·

1 let g = ref ( ref 0) in2 let f () =3 let x = (ref 42) in4 let y = (ref 0) in5 y := 1;6 x := 43;7 g := x8 in9 fork f ();

10 f ()

Fig. 16. Limitations of core sharing analysis

Scoping Optimization. Notice that the location y refers to is allocated in thescope of f and never escapes. Hence, the uses of y by the two different threads mustrefer to distinct memory locations. More generally, when computing thread-sharedlocations, we can hide effects on locations that must be thread-local due to scoping.Formally, we change the type rule for fork as follows:

Eff-Fork-DownΦe; Γ ` e : τ

Φεe ⊆ Φε ~ρ =⋃

ρ∈fl(Γ)

Flow(C, ρ) Φεe ∩ Φω ∩ ~ρ ⊆ SharLocs

Φ; Γ ` fork e : τ

Here we compute the set ~ρ of labels that are reachable in the label flow graph fromlocations in Γ, meaning they are visible to both the parent and child thread. Thenwe only add locations to SharLocs if they are also in ~ρ.

Fig. 15 shows the benefit of this optimization. The fourth column shows thenumber of shared pointers and the last column shows the number of shared alloca-tion sites when using the revised rule Eff-Fork-Down. The average percentageof shared pointers and allocation sites improves to 15% and 6%, respectively.

Uniqueness Analysis. Returning to the example program in Fig.16, note that thescoping optimization does not apply to x, since it escapes via a write to the globalvariable g. However, while x does escape the scope of f eventually, there is no waythat it can be accessed by any other thread during the write at line 6, since thealiasing on line 7 has not yet occurred. So, we can safely ignore the access to x atline 6, not requiring it to be protected by a lock.

This situation can occur in C programs when a struct is malloc’d and initializedthread-locally before becoming shared. To model this situation precisely, we devel-oped a uniqueness analysis to determine when a memory access is guaranteed tobe thread-local because the accessed location has not yet become aliased. We thenignore these accesses during the correlation analysis.

Our uniqueness analysis is implemented as a simple, intraprocedural dataflowanalysis. Whenever a location is created (through a local variable definition ora call to malloc), we mark it as unique. When a unique location ρ is assignedto any non-unique location, or a variable with location ρ has its address taken, ρbecomes non-unique. Using this analysis, we discover that x is unique on line 6 in

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 31

With scoping Without scopingand uniqueness and uniqueness

Benchmark Time (s) Warnings Time (s) Warnings

aget 0.85 62 0.88 64

ctrace 0.59 10 0.58 10engine 0.88 7 0.69 11

knot 0.78 12 0.83 28

pfscan 0.46 6 0.43 7smtprc 5.37 46 5.46 74

3c501 9.18 15 8.93 15eql 21.38 35 20.01 35

hp100 143.23 14 140.97 14plip 19.14 42 18.75 44

sis900 71.03 6 70.77 6

slip 16.99 3 16.65 3sundance 106.79 5 103.21 5synclink 1521.07 139 1454.91 155wavelan 19.70 10 20.60 128

Fig. 17. Scoping and uniqueness

our example, and hence is thread-local.Fig. 17 shows the aggregate effect of these two optimizations for Locksmith. We

compare the running time and number of warnings when using both techniques,shown in the second and third columns, against the running time and number ofwarnings without them, in the fourth and fifth columns. Our results show thatthe effect on running times is negligible, but in several of the benchmarks the twooptimizations determine that many accesses are thread-local, significantly reducingthe number of warnings. In all benchmarks but ctrace, the gain in precision is duepurely to the scoping optimization rather than the uniqueness analysis.

4.3 Flow-sensitive sharedness

Consider the example in Fig. 14 again. Suppose we add acquire l and release l beforeand after, respectively, each access !x in the two child threads (lines 5 and 6), forsome lock l. Also suppose that x is aliased by a global variable immediately afterits allocation. This changed program will be correct, but our analysis will falselycomplain that the dereference of x at line 3 is a potential race because it is notprotected by a lock. Moreover, the scoping optimization and uniqueness analysisdescribed in Section 4.2 do not apply, as x is aliased by a global variable. However,at this dereference site, x is not actually shared, and thus requires no guardinglock. This is because it will not become shared until both of the child threads arespawned, at lines 5 and 6.

In this case, as in the uniqueness analysis in Section 4.2, we can safely ignore amemory access when the accessed location is thread-local during the access, evenif it later becomes shared.

To address this problem, we can use a simple dataflow analysis along the ACFGto determine at which sites in the program a location can be dereferenced after itbecomes shared. Any sites that occur before it becomes shared can be dropped fromcorrelation inference. The transfer functions for the analysis are straightforward,as shown in Fig. 18. In essence, we seed the analysis at the fork points with those

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

32 ·

φ kind Shout(φ)

Fork the portion of SharLocs computed at this fork site

all others Shin(φ)

Fig. 18. Sharedness inference

Benchmark No dataflow Context-insensitive Context-sensitive

Time(s) Warnings Time(s) Warnings Time(s) Warnings

aget 0.80 62 0.85 62 1.73 62ctrace 0.58 10 0.59 10 0.81 10engine 0.89 7 0.88 7 1.27 7

knot 0.64 12 0.78 12 1.70 12pfscan 0.45 6 0.46 6 0.57 2smtprc 5.11 46 5.37 46 16.71 46

3c501 9.29 15 9.18 15 11.46 15eql 21.39 35 21.38 35 24.35 35

hp100 143.45 14 143.23 14 172.97 14plip 19.18 42 19.14 42 37.80 42

sis900 71.96 6 71.03 6 82.35 6

slip 17.05 3 16.99 3 18.92 3sundance 106.89 5 106.79 5 117.01 5synclink 1,513.94 139 1,521.07 139 1,823.91 139

wavelan 19.69 10 19.70 10 26.84 10

Total 1,931.31 412 1,937.44 412 2,338.40 408

Fig. 19. The effect on Locksmith’s results of different dataflow strategies for finding shared

location dereference sites

locations made possibly shared due to that fork. Specifically, we combine the typerules in Fig. 13 with Fig. 7 to get a judgement of the form C;φ; Φ; Γ ` fork e : τ ;φfor typing fork e expressions. Then, at every such point in the program, we setΦεe ∩ Φω ⊆ Shin(φ) to seed the analysis.

Unfortunately, while this optimization adds precision in general, it is not veryhelpful for our benchmarks, as shown in Fig. 19. Columns 2 and 3 in the figureshow the results of Locksmith when using the contextual effects analysis (in-cluding scoping and uniqueness) to compute shared locations, without using thedataflow analysis, while columns 4 and 5 include the dataflow analysis. We seethat the running times are nearly the same, but unfortunately, so are the warningcounts. One reason for this is that a location that is eventually shared may bewritten to by the parent after the child is forked, and then shared with the child bywriting to a global variable. The dataflow analysis conservatively considers all ac-cesses after the fork to be potentially shared. When we make the dataflow analysiscontext-sensitive (Section 6.4), we see an improvement in one case—pfscan has 4fewer warnings. However the context-sensitive results are clearly more expensive tocompute. Thus, because employing dataflow is essentially free and could increaseprecision, we enable it by default, while context-sensitive dataflow for discoveringsharing can be enabled via a command-line flag. (Thus the results from columns 4and 5 in Fig. 19 match the results in Fig. 4.)

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 33

Field-sensitive Field-insensitiveCGen Total CGen Total

Program Tm (s) Tm (s) Labels Shr Wrn Tm (s) Tm (s) Labels Shr Wrnaget 0.55 0.85 5,634 62 62 0.50 0.67 5,490 62 62(11)

ctrace 0.40 0.59 4,351 12 10 0.38 0.53 4,285 15 13(5)engine 0.76 0.88 5,051 7 7 0.79 0.91 4,989 59 59(7)

knot 0.55 0.78 4,752 15 12 0.52 0.83 4,566 24 21(12)pfscan 0.36 0.46 4,143 7 6 0.36 0.46 4,139 15 14(5)smtprc 3.09 5.37 14,815 46 46 3.08 5.14 14,917 97 97(43)3c501 7.92 9.18 25,905 20 15 7.60 18.56 22,976 42 42(6)

eql 2.72 21.38 8,954 35 35 2.39 17.99 7,484 42 42(18)hp100 35.92 143.23 31,609 15 14 34.18 976.12 22,214 41 41(10)

plip 16.41 19.14 24,124 49 42 17.82 103.21 18,969 60 60(6)sis900 65.66 71.03 84,797 9 6 60.45 132.18 71,630 42 42(6)

slip 15.11 16.99 25,371 19 3 15.44 33.24 18,333 56 31(5)sundance 96.72 106.79 73,552 9 5 81.44 6835.26 61,540 44 44(8)synclink 1433.56 1521.07 68,643 139 139 1232.05 timeout n/a 171 n/awavelan 17.89 19.70 30,052 10 10 16.90 40.19 21,071 43 44(6)

Fig. 20. Field sensitivity

5. EFFICIENTLY AND PRECISELY MODELING STRUCT AND VOID* TYPES

In this section we discuss some additional techniques that we used to increase thespeed and precision of Locksmith as applied to C programs. In particular, we ex-plain how we analyze struct and void* types effectively. We initially explored someof these ideas when developing CQual [Foster et al. 2006]. This paper’s contributionis to express the ideas more precisely, in particular using a new formalism for ouranalysis of structures, and to measure their costs and benefits directly, measuringLocksmith’s performance and precision when using one or the other of severaldifferent strategies.

5.1 Field sensitivity

In designing a static analysis for C, one important decision is whether to modelC struct types field-insensitively or field-sensitively [Heintze and Tardieu 2001]. Ina field-insensitive analysis, all fields of a struct type are conflated, i.e., x.f and x.gare treated as the same location by the analysis for any fields f and g. In a field-sensitive analysis, different struct fields are distinguished, i.e., x.f and x.g are treatedas different locations.5 These two design points potentially trade off efficiency andprecision—field-insensitive analysis may be less precise but more scalable, becauseit distinguishes fewer locations. In particular, if there are m occurrences of structtypes, each of which has n fields, then field-sensitive analysis would annotate O(mn)types with fresh locations, whereas field-insensitive analysis would only annotateO(m) types.

We implemented support for both field-insensitive and field-sensitive analysis inLocksmith. Field insensitivity is actually somewhat tricky to use in an analysislike Locksmith, which is layered on top of C types: True field insensitivity wouldthrow away those types, thereby requiring some significant approximations in theanalysis (e.g., conflating all labels of all fields of a struct). Thus, since we had a

5There is a third design point, a field-based analysis, in which x.f and x.g are different, but x.f and

y.f are the same if x and y are instances of the same struct type. We did not explore this optionfor Locksmith, however, because it would greatly reduce the effectiveness of context sensitivity,

which we found was important to improving precision.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

34 ·

full field-sensitive analysis, we opted to implement a simple variation to simulatethe following key aspect of field insensitivity: each instance of a struct uses a singlelocation ρ to represent the top-level location of all fields. Otherwise we use thestandard flow-sensitive implementation. For example, suppose we have a structwith three fields int x, int ∗ y, and int ∗ z. Then for an instance of this struct, fieldsx, y, and z would have types ref ρ(int) (since x can be written to, it has a ref type),

ref ρ(ref ρ′y (int)), and ref ρ(ref ρ

′z (int)), respectively.

Fig. 20 compares the two approaches. For each style of analysis, we list thetime for constraint generation (including annotating types with fresh labels, i.e.,abstract locations and locks), the total analysis time, the number of generated labels(locations and locks), the number of shared locations, and the number of reportedwarnings. Note that since Locksmith issues one warning per unprotected sharedlocation, this means warning counts from field-insensitive and field-sensitive analysisare incomparable: A single warning from field-insensitive analysis might actuallycorrespond to multiple races from the field-sensitive analysis. To normalize thewarning counts, if there is a warning on a location corresponding to a struct field,we count that as n warnings for comparison purposes, where n is the number offields in that struct instance, as computed by our lazy fields algorithm (Section 5.2).The normalized warnings for the field-insensitive analysis are listed in the rightmostcolumn, with the raw number of warnings in parentheses.

These results show that the field-insensitive analysis takes less time to gener-ate constraints and generally creates fewer labels than the field-sensitive analy-sis.6 However, even for the very slightly reduced precision with field insensitivity—conflating the locations of all struct fields—many more locations are consideredshared, which in turn makes Locksmith as a whole both less precise, as evidencedby the warning counts, and considerably slower, since it must infer correlations formore aliases.

5.2 Lazy struct fields

Since field-sensitive analysis can potentially be expensive, in order to achieve theperformance reported in Fig. 20 we had to implement field sensitivity carefully. Atfirst, we used a naive approach, in which we fully annotated all field types of allstruct instances. We quickly ran into scalability problems, however, and were notable to analyze any but the smallest benchmarks.

Examining our benchmarks, we found that many C struct types have a largenumber of fields (up to 300!). However, many large struct types are declared by alibrary and only used in a small subset of the code, and this subset often accessesonly a fraction of the struct’s total fields. Our naive implementation was assigningabstract locations and locks to all of the rarely- or never-used fields, wasting memoryand time generating constraints among them.

To regain scalability, our field-sensitive implementation lazily annotates the fields

6For one program, smtprc, there are fewer field-sensitive labels than field-insensitive labels. This

is because the field-insensitive analysis always creates at least one label (for the location of all

fields) for every occurrence of a struct, whereas the field-sensitive analysis might avoid creatingany labels for a struct instance if it is created and then immediately equated with another structinstance.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 35

τ ::= . . . | t×ζ tC ::= . . . | ζ[j] = τ | ζ = ζ

ζ ∈ pair labels

〈〈t1 × t2〉〉 = t1 ×ζ t2 ζ fresh

〈〈other t〉〉 = τ as in Fig. 7(a)〈〈τ〉〉 = τ ′ where τ ′ is τ with fresh ρ, `, φ, ζ’s

(a) Auxiliary definitions

PairC;φ; Γ ` e1 : τ1;φ1 C;φ1; Γ ` e2 : τ2;φ2 ζ fresh

C ` ζ[1] = τ1 C ` ζ[2] = τ2 tj = std type of ej , j ∈ 1..2

C;φ; Γ ` (e1, e2) : t1 ×ζ t2;φ2

ProjC;φ; Γ ` e : t1 ×ζ t2;φ′ j = 1, 2

τj = 〈〈tj〉〉 C ` ζ[j] = τj

C;φ; Γ ` e.j : τj ;φ′

(b) Type inference rules

C ∪ {t1 ×ζ t2 ≤ t1 ×ζ′t2} ⇒ C ∪ {ζ = ζ′}

C ∪ {ζ = ζ′} ⇒ C[ζ/ζ′]C ∪ {ζ[j] = τ1, ζ[j] = τ2} ∪ ⇒ {τ1 = τ2}

(c) Constraint resolution rule

Fig. 21. Type inference rules for modeling pairs lazily

of struct types [Foster et al. 2006]. Initially we leave all occurrence of struct typesunannotated. Then whenever we encounter a field access in the program, we add theaccessed field to the corresponding struct type. If we create a label flow constraintbetween two struct types, we equate the labels on their fields.

Fig. 21 extends the inference rules from Fig. 7 to implement this lazy field gen-eration algorithm. Our formalism uses pairs instead of general structs, and so weillustrate our approach by modeling pairs lazily. Fig. 21(a) gives the new type andconstraint definitions. Types are the same as before, except pair types now havethe form t ×ζ t, where ζ is a pair label. Notice that this pair type contains unan-notated types t. The pair label ζ is used to track the labeled components of thetype: the constraint ζ[j] = τ indicates that component j of any pair type annotatedwith ζ has annotated type τ . The constraint ζ1 = ζ2 indicates the correspondingcomponents of pairs labeled with ζ1 and ζ2 have the same types.

We also extend 〈〈·〉〉 to introduce fresh pair labels. As shown, when we translatea standard pair type into a labeled pair type, we tag it with a fresh pair label butdo not introduce labels for the component types. This annotation function is usedin Lam from Fig. 7 to give fresh labels to programmer-supplied types. Thus we seethe laziness of this approach: we do not automatically create labels for a pair type

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

36 ·

when it is mentioned in the program text.

Fig. 21(b) gives our modified type rules. Pair types pair creation, which nowassociates a fresh pair label ζ with the output type and constrains the componentsof ζ to their corresponding labeled types. Notice that there is no laziness in thisrule, because we have labeled types for e1 and e2. Here we compute the standardtypes t1 and t2 (of e1 and e2, respectively) by stripping off all labels from τ1 and τ2.Using standard types keeps the analysis “lazy.” Standard types have no (wasted)location or lock annotations; instead, we track the omitted locations off to the sideusing ζ.

More interestingly, Proj types projection, which lazily annotates only the ac-cessed component of the pair. We create a type τj with fresh labels and constrainit to be equal to ζ[j]. In our implementation, rather that creating constraintsζ[j] = τ and then solving them later, we solve these constraints on-line, as we ap-ply the type inference rules. We maintain a partial mapping ST from each ζ[j] toits type. Initially ST is empty. In Proj, if ST (ζ[j]) exists, we use it in place of τjrather than making up a fresh type. Otherwise, we do make a fresh type τj and setST (ζ[j]) = τj . Our implementation for Pair is similar.

Fig. 21(c) gives the resolution rules for our new constraint forms. The firstrewriting rule is for subtyping among pair types. Here we assume the standardtypes of the pairs match (i.e., the program passes the standard type checker) andequate the pair labels, which are merged by the next rewriting rule. The last ruleequates types for different occurrences of ζ[j].

Notice that we lose some precision here compared to the previous type system.In our lazy approach, subtyping among pair types requires their component typesto be equal. The constraint resolution rule from Fig. 8, on the other hand, permitssubtyping the component types, which is more precise. However, in practice, Cprograms mostly manipulate pointers to structs, and subtyping pointer types re-quires that the pointed-to types are equal (Fig. 8), which negates any benefits ofthe more precise subtyping rule. Thus we lose little practical precision with thisapproach.

Moreover, in our implementation, we maintain pair labels ζ in a union-find datastructure. Given the constraint ζ = ζ ′, we unify the two sides of the equationand equate the associated types in ST . This unification process reduces the needto create types for fields. For example, if ζ[0] = τ and ζ ′[1] = τ ′ are the onlymappings in ST , then after we unify the pair labels we will have ζ[0] = ζ ′[0] = τand ζ[1] = ζ ′[1] = τ ′, without creating any additional field types. Thus, we alsogain efficiency from this approach.

We already saw in Fig. 20 that field sensitivity, while it typically produces morelabels that field-insensitive analysis, does not yield an inordinate number of labels.Fig. 22 gives some measurements that illustrate why this is the case. For eachbenchmark, we list the total number of struct types in the program, the number ofinstances of all struct types, the total possible number of instance fields, and theinstance fields that are actually used, both in absolute numbers and as a percentageof the total number of fields. We define the total number of instance fields as allthe used fields of all instances of struct types. For example, if a program definestwo instances of a single struct type with three fields and the program accesses

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 37

struct InstancesBenchmark Types Total Total fields Used fields

aget 11 61 563 179 (32%)

ctrace 12 43 427 79 (19%)

engine 14 48 618 85 (14%)knot 16 68 565 156 (28%)

pfscan 13 65 639 78 (12%)

smtprc 19 80 1,019 106 (10%)

3c501 37 1,025 14,691 3,768 (26%)eql 33 888 9,617 2,301 (24%)

hp100 36 1,786 41,537 12,072 (29%)

plip 46 1,986 24,161 7,320 (30%)sundance 58 4,141 51,400 16,504 (32%)

sis900 60 4,511 58,952 18,106 (31%)

slip 39 1,426 31,529 8,319 (26%)synclink 49 5,431 68,423 38,497 (56%)wavelan 58 2,879 31,823 11,608 (36%)

Total 501 24,438 335,964 119,178 (35%)

Fig. 22. Lazy field statistics

all fields of one instance and two of the other so that the lazy field analysis onlypopulates those with location labels, we “use” five instance fields out of a totalof six. This data shows that on average across all the benchmarks, only 35% ofthe possible instance fields are actually used. Thus, lazy field analysis is effectivebecause modeling those fields would otherwise consume memory and time with nogain in precision.

5.3 Modelling void*

In addition to deciding how to model structs, another important decision in ana-lyzing C code is determining how to model void*, which is typically used by Cprogrammers to express polymorphism. For Locksmith, the key choice is howto track the abstract locations and locks of types that “flow” to or from void*

positions. We experimented with three different strategies:

—Conflate void*. Since any type might be cast to or from void*, an imprecisebut sound approach is to conflate all abstract locations and locks that reach avoid* type. More precisely, let τ = ref ρ(void) be an occurrence of void* inthe program, labeled with location ρ. If we ever derive a constraint τ ≤ τ ′ orτ ′ ≤ τ , we equate all the locations in τ ′ with ρ. This is quite conservative, sinceit effectively aliases all locations reachable from a type that flows to or from avoid*. If any locks occur inside τ ′, Locksmith warns about the loss of precision,and considers these locks non-linear and thus unable to protect memory locations.

—Singleton void*. In the previous approach, we conflated labels because void*

types may be cast arbitrarily. However, it could be that a particular void* inthe program is used with only one concrete type. We thus tried refining theprevious approach as follows. Let τ = ref ρ(void) be an occurrence of void*.We wish to define the partial function base type as a map from void* occurrencesto the single concrete type it could be replaced with. Given a constraint τ ≤ τ ′

or τ ′ ≤ τ , there are three cases. If base type(τ) is as yet undefined, we set

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

38 ·Conflate void* Single void* Type-based void*

Benchmark Tm (s) Wrn Tm (s) Wrn Tm (s) Wrn

aget 1.89 72 1.98 72 0.85 62

ctrace 0.60 10 0.61 10 0.59 10

engine 0.90 7 0.89 7 0.88 7knot 1.46 12 0.77 12 0.78 12

pfscan 0.45 6 0.46 6 0.46 6

smtprc 5.22 46 5.26 46 5.37 46

3c501 246.24 121 358.98 121 9.18 15eql 12.40 41 12.58 42 21.38 35

hp100 105.80 50 782.52 50 143.23 14

plip 413.96 60 4,011.22 148 19.14 42sis900 778.20 149 6,037.00 152 71.03 6

slip 88.79 68 timeout n/a 16.99 3

sundance 3,188.32 148 7,661.57 148 106.79 5synclink timeout n/a timeout n/a 1,521.07 139wavelan 168.28 112 169.03 111 19.70 10

Fig. 23. Performance of void* strategies

it to τ ′. Otherwise, if τ ′ has the same shape (i.e., underlying standard type) asbase type(τ), we generate the constraint base type(τ) ≤ τ ′ or τ ′ ≤ base type(τ), asappropriate. Otherwise, we revert to the above conflation strategy, and collapsebase type(τ) and any other types that flow to τ . As before, if we collapse typesthen we treat any locks occurring inside τ as non-linear, and assume they do notprotect any locations. This approach is sound but is more precise than conflationin the case of void*s that are used with only one type. Indeed, we found thatapproximately one third of void* pointers in our benchmarks are only cast to orfrom one non-void* type.

—Type-based void*. Finally, we can improve further on the previous approach if weare willing to sacrifice completely sound modeling of void*. For each standardtype t that flows to τ = ref ρ(void), we create a type base type(τ, t). Then givena constraint τ ≤ τ ′ or τ ′ ≤ τ , we generate the constraint base type(τ, t) ≤ τ ′ orτ ′ ≤ base type(τ, t), where t is the underlying standard type from τ ′. Thus, ourbase type function is now indexed by the shape of the underlying type, similar toa C untagged union. We will never collapse types (or mark locks as non-linear)using this strategy. Modeling void*s this way is unsound, since we may missrelationships among different types that are cast to or from void*—this wouldbe like storing a pointer into an untagged union but then extracting an integer.Our assumption is that this behavior is unlikely or harmless to our analysis, sinceif it were not the program would likely fail. By default, Locksmith uses thisstrategy, possibly sacrificing soundness, to reduce the number of false positives.The user can switch to one of the above alternative, conservative strategies usinga run-time flag.

For all of these approaches, we need to integrate our modeling of void* with ourlazy modeling of structs. That is, we might discover during constraint resolutionthat a void* type points to a struct type, or that a struct type contains a void*

type.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 39

1 struct cache entry {2 int refs ;3 pthread mutex t refs mutex ;4 ...5 };6

7 void cache entry addref (cache entry ∗entry) {8 ...9 pthread mutex lock(&entry→refs mutex);

10 entry→refs++;11 pthread mutex unlock(&entry→refs mutex);12 ...13 }

Fig. 24. Example code with a per-element lock

Fig. 23 compares the running times and number of warnings produced for eachvoid* strategy. We can see that on most of the benchmarks, the type-based void*

approach yields both many fewer warnings and is much faster than the other ap-proaches. This phenomenon is similar to that of Fig. 20: the more precise analysiscauses fewer locations to be conflated, which both speeds up the computation ofthe Flow() sets and reduces the number of shared locations. Moreover, the type-based warnings are much easier to follow for the user, as there is much less falsealiasing due to conflation. Though the type-based approach could be unsound, amanual analysis of a sample of the additional warnings produced by the alternativeanalyses found no additional races.

6. CONTEXT SENSITIVITY

So far, the analyses we have presented have been context-insensitive. While theresulting analysis is easy to understand and implement, its precision suffers. Thissection considers how we add context sensitivity to solve these problems.

We consider two kinds of context sensitivity. The first is standard: we would liketo analyze different calls to the same function distinctly, rather than conflate them.We use an approach pioneered by Reps et al. [1995], who showed how to reduce theproblem of tracking flow context-sensitively through function calls to the problemof context-free language (CFL) reachability. The insight is to view a call to andreturn from some function f as a string containing a left and right parenthesis,respectively, subscripted by an index identifying the call site. Thus the problem oftracking flow through function calls is one of matching like-subscripted parenthe-ses. We draw ideas more directly from Rehof and Fahndrich [2001] and Fahndrichet al. [2000], which apply Reps et al.’s idea to label flow analysis and points-toanalysis, respectively. To solve context-free language reachability constraints, weuse Banshee, which encodes and solves the problem using set constraints.

In addition to function calls, the analyses presented so far conflate all locks ina recursive data structure, causing further imprecision. Fig. 24 illustrates thisproblem with code extracted from the knot web server. Here cache entry is a linkedlist with a per-node lock refs mutex that guards accesses to the refs field. Withoutsome added context sensitivity, Locksmith conflates all the locks and locations in

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

40 ·

1 let id = λa:ref ρa(int).a in2 let x = id1 (ref 1) in3 let y = id2 (ref 2) in4 !x

ρ1

ρ2ρa ρr

ρx

ρy

idρ1

ρ2ρa ρr

ρx

ρy

id(1

(2

)1

)2

(a) Source program (b) Monomorphic analysis (c) CFL polymorphic analysis

Fig. 25. Precision gain for label flow from context-sensitive analysis with universal polymorphism

the data structure. As a result, it does not know exactly which lock is held at thewrite to entry→refs, and reports that entry→refs may not always be accessed withthe same lock held, falsely indicating a potential data race.

We augment our analysis to solve this problem using a kind of “data structuresensitivity.” To do this, we provide support for existential quantification to relateelements within data structures. This support is encoded as a CFL reachabilityproblem in a manner similar to our encoding of context sensitivity for function calls.We add annotations to specify that in type cache entry, the fields refs and refs mutexshould be given existentially quantified labels. Then we add pack annotations whencache entry is created and unpack annotations wherever it is used, e.g., withincache entry addref. Locksmith then can determine that the refs mutex lock in anode always guards the refs field in that node.

6.1 Examples

We provide two examples to illustrate our approach to supporting context sensitiv-ity via CFL reachability, considering both universal and existential quantification.

Universal context sensitivity. Fig. 25 gives the canonical example illustrating thebenefits of context sensitivity for label flow analysis. (We discuss context-sensitivedataflow analysis below.) This program defines an identity function id and appliesit twice on distinct locations, on lines 2 and 3. As in Section 2, we have indexedeach syntactic use of id with an integer. Fig. 25(b) shows a simplification of theconstraint graph produced by applying the context-insensitive type rules in Fig. 7.Here ρi is the location containing integer i, locations ρa and ρr are from the domainand range types of id, respectively, and ρx and ρy are from the types of x and y.Notice that when we compute the transitive closure of these constraints, we willdiscover that both ρ1 and ρ2 flow to ρx, even though only ρ1 may actually reachthe dereference of x at run time.

Fig. 25(c) shows how using context-free language reachability, which we discussedbriefly in Section 2, eliminates this imprecision. When we use the type of id, welabel the generated constraints with indexed parentheses. In our example, the callon line 2 yields edges ρ1 →(1 ρa and ρa →)1 ρx, and analogously for the call online 3. When we resolve the constraints in Fig. 25(c), we only transitively closepaths that contain no mismatched edges. In this case, that means there is a pathfrom ρ1 to ρx, since (1 matches )1 , but there is no path from ρ2 to ρx, since (1does not match )2 .

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 41

1 let f = λ a:ref ρa(int).0 in2 let g = λ b:ref ρb(int).print b in3 let p = if · · · then4 pack1 (f, ref 1)5 else6 pack2 (g, ref 2)7 in8 unpack (p1, p2) = p in9 p1 p2

ρ1

ρ2ρp2 .

p∃

ρp1

ρa . ρb .

ρprint

f g

ρ1

ρ2ρp2 .

p∃

ρp1

ρa . ρb .

ρprint

f g

(1

(2)1 )2

(a) Source program (b) Monomorphic analysis (c) CFL polymorphic analysis

Fig. 26. Precision gain for label flow from context-sensitive analysis with existential polymorphism

Existential context sensitivity. Consider the example shown in Figure 26(a). Inthis program, function g prints its argument, whereas function f does not.7 Inlines 3–7 of this program we create existentially quantified pairs using pack opera-tions in which f is paired two distinct locations, initialized with values 1 and 2. Asin the example for universal polymorphism above, we have indexed each syntacticoccurrence of pack with an integer. Using an if, we conflate these two pairs, bindingone of them to p. In the last line we use p by applying its first component to itssecond component.

Fig. 26(b) shows a simplification of the constraint graph produced by applying thecontext-insensitive type rules in Fig. 7. Again, ρi is the location containing integeri, locations ρa and ρb are from the domain types of f and g respectively, and ρp1,ρp2 are from the types of the existential package p, and ρprint is the domain of theprint instruction. In this example, f can only be applied to the reference to 1, andg to the reference to 2. However, in the transitive closure of these constraints ρ1flows to ρprint, suggesting a flow that cannot happen at run-time.

Fig. 26(c) shows how using context-free language reachability eliminates thisimprecision. When we create the type of p, we label the generated constraints withindexed parentheses. (This is as opposed to the case of universal polymorphismabove, where the parenthesis edges are introduced at the point of use rather thanthe point of creation.) In our example, the pack on line 4 yields edges ρ1→(1 ρp1and ρp2→)1 ρa, and similarly for the call on line 6. We resolve the constraints inthe same way as in the previous example, which means that there is a path fromρ2 to ρprint, since (2 matches )2 , but there is no path from ρ1 to ρprint, since (1does not match )2 .

In the remainder of this section, we show how to incorporate CFL context sensi-tivity into our system and also apply it to dataflow analysis (analogously to Repset al [Reps et al. 1995]). In Locksmith we apply the ideas of universal and exis-tential polymorphism from label flow analysis to correlation inference, in the sameway. We focus on universal polymorphism here, and refer the reader to our previ-

7For the purpose of this example, we augment the language with a print expression that prints

the contents of its argument.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

42 ·

e ::= . . . | let f = v in e2 | fi | fix f :t.vσ ::= (∀.τ, ~η)

η ::= ` | ρ | φC ::= . . . | η �i+ η | η �i− η

(a) Extensions to source language, types, and constraints

LetC;φ1; Γ ` v1 : τ1;φ2 ~η = fl(Γ)

C;φ2; Γ, f : (∀.τ1, ~η) ` e2 : τ2;φ3

C;φ1; Γ ` let f = v1 in e2 : τ2;φ3

Instτ ′ = 〈〈τ〉〉 C ` τ �i+ τ ′ C ` ~η �i± ~η

C;φ; Γ, f : (∀.τ, ~η) ` fi : τ ′;φ

Fixτ = 〈〈t〉〉 ~η = fl(Γ) C;φ1; Γ, f : (∀.τ, ~η) ` v : τ ′;φ2

C ` τ ′ ≤ τ τ ′′ = 〈〈t〉〉 C ` τ �i+ τ ′′ C ` ~η �i± ~ηC;φ1; Γ ` fix f :t.v : τ ′′;φ2

(b) Additional type inference rules

Fig. 27. Extensions to Fig. 7 for context sensitivity

ous work on existential context sensitivity [Pratikakis et al. 2006a]. We end withexperimental results illustrating the precision benefits of context sensitivity.

6.2 Labeling and Constraint Generation

Fig. 27 extends our core constraint generation rules from Fig. 7, as in Pratikakiset al. [2006a; 2006b]. We begin by introducing three new kinds of expressions, asshown in Fig. 27(a). Expression let f = v in e binds f to value v during eval-uation of e, assigning f a polymorphic type. Here we assume that the names ofpolymorphically-typed variables are syntactically distinct from other (monomor-phically) typed variables. In practice for C, we only introduce polymorphism forfunctions, whose names are easily identified. Next, the expression fi corresponds toa use of variable f annotated with index i. In practice, we simply assign a distinctindex to each syntactic use of a function name. Finally, fix f :t.v binds f to v recur-sively inside of v (which will always be a function in practice). In C, all functionsare potentially mutually recursive, and so we treat a C program as if it were a setof nested fix bindings. Notice that we do not need to visit a function’s definitionbefore its calls, because we can generate the right polymorphic function type fromsimply the function declaration. We then generate the constraints for the functionbody later upon encountering the function definition. So, it is not necessary to visitfunction definitions in a topological order of the call graph.

Let- and fix-bound variables f are assigned polymorphic type schemes σ of theform (∀.τ, ~η). Here τ is the generalized type, and ~η is the set of labels (i.e., ρ’s,`’s and φ’s) that are not quantified in the type scheme [Henglein 1993]. Duringtyping of the new language forms, we generate instantiation constraints of the formη �ip η′, where p is a polarity, either + or −, and i is an index. Informally, such aconstraint means that there is some substitution Si that instantiates η to η′. Thepolarity indicates the direction of “flow”. More particularly, a constraint η �i+ η′

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 43

C ∪ {int �ip int} ⇒ C

C ∪ {ref ρ(τ) �ip ref ρ′(τ ′)} ⇒ C ∪ {ρ �ip ρ′, τ �i± τ ′}

C ∪ {lock` �ip lock`′} ⇒ C ∪ {` �ip `′}

C ∪ {t1 ×ζ t2 �ip t1 ×ζ′t2} ⇒ C ∪ {ζ �i± ζ′}

C ∪ {(τ1, φ1)→ (τ ′1, φ′1) �ip (τ2, φ2)→ (τ ′2, φ

′2)} ⇒

C ∪ {τ1 �ip τ2, φ1 �ip φ2, τ ′1 �ip τ ′2, φ′1 �ip φ′2}

C ∪ {ζ �i± ζ′} ∪ {ζ[j] = τ} ∪ ⇒ {τ �i± ζ′[j]}C ∪ {ζ′ �i± ζ} ∪ {ζ[j] = τ} ∪ ⇒ {ζ′[j] �i± τ}

C ∪ {ρ1 �i− ρ0, ρ1 ≤ ρ2, ρ2 �i+ ρ3} ∪ ⇒ {ρ0 ≤ ρ3}C ∪ {`1 �i− `0, `1 ≤ `2, `2 �i+ `3} ∪ ⇒ {`0 ≤ `3}

Fig. 28. Extensions to Figs. 8 and 21 for context sensitivity

corresponds to an output from a function, and we draw it with an edge η →)i η′.Similarly, a constraint η �i− η′ corresponds to an input to a function, and we draw it

with an edge η′ →(i η′. Notice that for a negative polarity constraint, the directionof the graph edge is opposite the direction of the �.

Fig. 27(b) shows the new type inference rules.8 Let first types v1, and thentypes e2 with f bound to the type scheme (∀.τ1, ~η), where τ1 is the type of v1, and~η is the set of free labels of Γ (as usual for Hindley-Milner-style polymorphism,these are the labels we cannot quantify [Pierce 2002]). In Inst, we instantiate atype scheme (∀.τ, ~η) at index i. We generate a type τ ′ by reannotating τ withfresh labels. We then generate an instantiation constraint τ �i+ τ ′ to indicate thatτ is used at index i at type τ ′, and we generate constraints ~η �i± ~η to indicatethat the substitution Si represented by the constraint τ �i+ τ ′ must not renameany variables in ~η, i.e., they must be instantiated to themselves. (Here the ± isshorthand for generating two constraints, one with polarity + and one with polarity−.) Lastly, Fix combines Let and Inst, binding f to a type scheme during thetyping of v, and then instantiating f to a fresh type as the result.

6.3 Context-Sensitive Label Flow Constraint Resolution

To compute the flow of labels in our new constraint system, we extend our constraintrewriting rules as shown in Fig. 28. The first set of rewrite rules corresponds tothe standard subtyping rules. We reduce �ip constraints to components of a typein a manner that is invariant for references and pairs (due to lazy fields, and thusanalogously to equating pair labels in Fig. 21), and co- and contra-variant forfunction return and argument types, respectively. Here we write p for the oppositeof polarity p.

The next two rules propagate instantiation constraints to components of a lazypair. The first rule requires that if ζ is instantiated to ζ ′, then the j component of ζis instantiated to the j component of ζ ′. The next rule handles the other directionof instantiation. Note that in both rules, the generated constraint on the right-hand

8We have implicitly relaxed the definition of Γ to also include type schemes σ.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

44 ·

side refers to ζ ′[j] although that might be empty. In that case we assume that it isset to 〈〈τ〉〉 (not shown), i.e., a type of the appropriate shape, annotated with freshlabels.

The last two rewrite rules propagate constraints along paths with matched paren-theses. Pictorially, given a sequence of constraints ρ0 →(i ρ1 → ρ2 →)i ρ3, the firstrewrite rule generates a new constraint ρ0 → ρ3, derived from matching the paren-theses on the path; this is called matched flow by Rehof and Fahndrich [2001]. Thelast rewriting rule follows the same pattern for abstract locks.

Given these rewriting rules, our definition of Flow() remains the same:

Flow(C, ρ) = {ρ′ | ρ′ ≤ ρ ∈ Sol(C)}Flow(C, `) = {`′ | `′ ≤ ` ∈ Sol(C)}

For example, letting C be the constraints in Fig. 25, we have Flow(C, ρx) = {ρ1}and Flow(C, ρy) = {ρ2}.

6.4 Context-sensitive Data Flow Analysis

We show now how we extend the dataflow analysis and ACFG with context sensi-tivity, encoding it also with parametric polymorphism.

As discussed above, each instantiation of a type scheme σ = (∀.τ, ~η) at index igenerates the constraint τ �i+ τ ′, where τ ′ = 〈〈τ〉〉. We say that τ is the abstracttype and τ ′ is the instance type at the instantiation i. Moreover, the instantiation idefines a substitution Si of labels, such that Si(τ) = τ ′ and also for all labels η ∈ ~η,we have Si(η) = η. We represent the reverse substitution with S−1

i , mapping the

labels of τ ′ to τ . For example, the instantiation ref ρ(int) �i+ ref ρ′(int) defines

the substitutions Si(ρ) = ρ′ and S−1i (ρ′) = ρ. Note that the substitutions Si and

S−1i only translate between the abstract and the instance type, regardless of the

instantiation polarity. So, even if the above instantiation had negative polarity,ref ρ(int) �i− ref ρ

′(int), the substitutions remain Si(ρ) = ρ′ and S−1

i (ρ′) = ρ.Consider an instantiation of a function type according to Fig. 28, (τ1, φ1) →

(τ ′1, φ′1) �i+ (τ2, φ2) → (τ ′2, φ

′2). This generates the constraints φ1 �i− φ2 and

φ′1 �i+ φ′2 among the statement labels representing the function start and end ofthe abstract and instance types. We extend the definition of dataflow analysis onthe ACFG to account for the two kinds of instantiation edges, so that when wepropagate facts across instantiation edges, they are brought to the correct context.Namely, we apply the substitution Si to all facts propagated from φ′1 to φ′2, totranslate all labels defined in the context of φ′1 to the corresponding labels in thecontext of φ′2. Conversely, we apply the substitution S−1

i to all facts propagatedfrom φ2 to φ1 (recall that the negative instantiation polarity reverses the directionof the graph edge), to translate all facts in terms of the abstract type of the instan-tiation. Note that Si is a partial map, so it is possible that not all facts defined atthe left side of the instantiation can be expressed in terms of its right side, or viceversa. In general, we only propagate the facts that can be expressed at the targetstatement label of an instantiation edge.

Although these rules might resemble the Call and Ret kinds and edges in thecontrol flow graph, in fact they are orthogonal. Specifically, an instantiation edgebetween statement labels corresponds to an occurrence of a function name in theprogram, whereas statement labels with kind Call and Ret correspond to a function

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 45

1 let mylock = λl:lock`a.acquire a in

2 let l1 : lock`1 = newlock in

3 let l2 : lock`2 = newlock in4 let = mylock1 l1 in5 let = mylock2 l2 in6 !x

φ1 φ2

φin φout

φ4 φ5φ3 φ6

φacq

NewL(l1) NewL(l2) Acc(x)

Acq(a)

(1 )1(2 )2

(a) Source program (b) ACFG

Fig. 29. Context-sensitive analysis for lock state

invocation. In many cases these happen to coincide, but one does not imply theother in general. For instance, when a program uses a function pointer to aliasmany functions and invokes it once, then many instantiations correspond to oneinvocation, whereas when the program assigns one function to a function pointer,but invokes it many times, then one instantiation has many invocations.

Lock State. In the Lock State Analysis we propagate the set of held locks acrossinstantiation edges as discussed above, by applying the appropriate renaming, ac-cording to the polarity of the constraint. Namely, for an instantiation φ �i+ φ′ that

corresponds to φ →)i φ′, we translate the set of held locks at φ by applying thesubstitution Si, we close the translated set under aliasing, and we propagate theresulting set of held locks (and all their aliases) to statement φ:

Flow(C, Si(Acqout(φ))) ⊆ Acq in(φ′)

Similarly, for an instantiation φ �i− φ′ that corresponds to φ′ →(i φ, we translate

the set of held locks at φ′ by applying the substitution S−1i before propagating it

to φ:

Flow(C, S−1i (Acqout(φ

′))) ⊆ Acq in(φ)

For example, the program in Fig. 29(a) defines a wrapper function for acquiringa lock that takes an argument a of type lock `a and acquires it. The program createstwo locks and acquires them before dereferencing a variable x (not defined here, forbrevity). Clearly, since the function mylock acquires its argument, the mechanismfor “hiding” irrelevant locks using Call and Ret nodes has no effect here. Indeed,we need to differentiate between the two contexts of the calls to mylock (markedwith indices 1 and 2) to infer that both locks l1 and l2 are held at the dereferencepoint. We do this using the context-sensitive ACFG shown in Fig. 29(b), simplifiedby omitting nodes of no interest for this example. During the dataflow analysis,we infer (as in the monomorphic case) that at the end of the function (φout) theabstract lock ` is held (` ∈ Acqout(φout)). We also have φout �1

+ φ4 and φout �2+

φ5. Moreover, from the instantiations 1 and 2 of mylock’s type (lock `a, φin) →(int , φout), we have S1(`a) = `1 and S2(`a) = `2. To propagate the set of held locksalong the instantiation edges φout �i+ φ4, we apply the corresponding substitutionto the set of held locks, propagating S1(Acqout(φout) = S1({`a}) = {`1} to φ4.Similarly, we propagate S2(Acqout(φout) = S2({`a}) = {`2} to φ5.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

46 ·

Eff-LamΦf ; Γ, x : τ ′ ` e : τ

Φ; Γ ` λx:e. : τ ′ →Φf τ

(a) Type rule for function definition

C ∪ {τ1 →Φ1 τ ′1 �ip τ2 →Φ2 τ ′2} ⇒ C ∪ {τ1 �ip τ2, τ ′1 �ip τ ′2,Φ1 �ip Φ2}

C ∪ {Φ1 �ip Φ2} ⇒ C ∪ {Φα �ip Φα,Φω �ip Φω}

(b) Constraint resolution rules

Fig. 30. Context-Sensitive Contextual Effects

Correlation Inference. We extend the correlation inference with context sensitiv-ity in a similar way, adapted for a backwards analysis. Specifically, at instantiationφ �i+ φ′, due to the backwards direction of the propagation, we propagate from φ′

to φ. Since φ′ lies in the “instance” context, we use S−1i to translate the state at

φ′ to the state at φ. Namely, for every correlation ρ� ~ at φ′, we add a correlationS−1i (ρ) � Flow(C, S−1

i (~)) to φ.Likewise, for negative instantiation edges φ �i− φ′, we propagate from φ to φ′.

As in this case φ lies in the left side of the instantiation, we use Si to translatethe state at φ to the state at φ′. Now, for every correlation ρ � ~ at φ, we add acorrelation Si(ρ) � Flow(C, Si(~)) to φ.

6.5 Context-sensitive Sharing Analysis

We extended the sharing analysis with context sensitivity, both for computing theshared locations at fork points using context-sensitive contextual effects, and alsofor the flow-sensitive propagation of sharing information that marks the interestingdereferences in the program.

Contextual Effects. The contextual effect system presented in Section 4.1 can beextended with context sensitivity in the same way as the label flow analysis. Aspresented in detail in previous work [Neamtiu et al. 2008], function types are an-notated with the effect Φf of the function. We repeat the type rule for functiondefinition in Fig. 30(a). Note that a function type is annotated with the effect Φfof the function body. Moreover, since function definition itself has no effect, it canbe typed under any effect Φ. As in Section 4.1, we present contextual effects asa standalone system, although it is straightforward to combine with the rules inFig. 27. Fig. 30(b) defines the instantiation for annotated function types and con-textual effects. The contextual effect of a function is instantiated covariantly, whichtranslates to a covariant instantiation for the standard effect, and a contravariantinstantiation for the future effect, because the standard effects of the function aredefined inside the function and “returned” to the environment, whereas the fu-ture effect is defined outside the function, in the calling contexts, and “enters” thefunction.

Note that the future effect ω at a given program point in a function includes

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 47

the effects of the program after the function returns. In combination with contextsensitivity this might cause some locations that are in the effect (i.e. accessed afterthe current function returns) to not have a corresponding location, or not yet exist,in the current context. In other words, there might not be a matched parenthesispath from a location ρ, dereferenced in the continuation, to the future effect ω inthe current location. For example, consider the toy program:

1 let f = λ x . x+1 in2 f1 1;3 let p = (ref 41) in4 !p

Clearly, the variable p is not in scope in the body of function f, and moreover,there is no alias of p that is in scope either. This means that there can be nomatched parentheses path from p to the future effect of expression x + 1 in thebody of f. Indeed, the only path from p to the future effect of the expressioninvolves a (1 edge due to the instantiation of f. However, p is clearly in the futureeffect of the expression x + 1, as it is dereferenced later in the program, after thecall to f. To address this problem, when solving for future effects at fork pointsto compute shared locations, we consider paths that do not contain mismatchedparentheses (a.k.a. PN-flow [Fahndrich et al. 2000]), instead of paths with onlymatched parentheses. For the same reason, we also use PN-flow to compute theset of labels in scope and their aliases for the scoping optimization discussed inSection 4.2.

Shared Locations Propagation. The propagation of shared locations according todataflow discussed in Section 4.3 is straightforward to extend with context sensitiv-ity, in the same way as the above dataflow analyses for lock state and correlationinference. For positive instantiation edges, φ �i+ φ′, we propagate from φ to φ′

(forwards analysis), using Si to translate the set of shared locations at φ to thecontext of φ′ and adding the closed set (to account for aliasing) to the state at φ′:

Flow(C, Si(Shout(φ))) ⊆ Shout(φ′)

Similarly, for negative instantiation edges φ �i− φ′, we propagate from φ′ to φ usingSi to translate the shared locations to the context of φ:

Flow(C, S−1i (Shout(φ

′))) ⊆ Shout(φ)

6.6 Results

Fig. 31 compares the running times and number of warnings for context-sensitiveand context-insensitive versions of Locksmith. Note that since the context-sensitiveanalysis is no less sound than the context-insensitive analysis, any warning it elim-inates is a false positive. The context-sensitive results are the same as Fig. 4,reproduced here for convenience. These results show that context sensitivity signif-icantly increases the running time of the analysis, often very significantly, e.g., formost of the Linux drivers. The exceptions are the sis900 and slip benchmarks, forwhich the imprecision of context-insensitive analysis creates so much aliasing thatLocksmith runs out of memory trying to compute the closure of the label flow

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

48 ·Context-sensitive Context-insensitive

Benchmark Time (s) Warnings Time (s) Warnings

aget 0.85 62 0.64 77

ctrace 0.59 10 0.42 21

engine 0.88 7 0.60 15knot 0.78 12 0.60 31

pfscan 0.46 6 0.50 26

smtprc 5.37 46 4.16 128

3c501 9.18 15 0.75 20eql 21.38 35 0.86 41

hp100 143.23 14 2.76 25

plip 19.14 42 1.39 46sis900 71.03 6 Out of Mem. n/a

slip 16.99 3 Out of Mem. n/a

sundance 106.79 5 1.32 20synclink 1521.07 139 23.42 227wavelan 19.70 10 9.59 143

Fig. 31. Comparison of context sensitivity and insensitivity

graph. Furthermore, we see that context sensitivity notably reduces the number ofwarnings reported by Locksmith, eliminating many false positives.

7. RELATED WORK

Several systems have been developed for detecting data races and other concurrencyerrors in multi-threaded programs, including dynamic analysis, static analysis, andhybrid systems.

Dynamic systems such as Eraser [Savage et al. 1997] instrument a program to finddata races at run time and require no annotations. The efficiency and precision ofdynamic systems can be improved with static analysis [Choi et al. 2002; O’Callahanand Choi 2003; Agarwal et al. 2005]. Dynamic systems are fast and easy to use,but cannot prove the absence of races, and require comprehensive test suites.

Researchers have developed type checking systems against races [Flanagan andAbadi 1999] for several languages, including Java [Flanagan and Freund 2000],Java variants [Boyapati and Rinard 2001], and Cyclone [Grossman 2003]. Suchsystems based on type checking perform very well but require a significant numberof programmer annotations, which can be time consuming when checking large codebases [Engler and Ashcraft 2003; Flanagan and Freund 2001]. Static race detectionin ESC/Java [Flanagan et al. 2002], which employs a theorem prover, similarlyrequires many annotations.

Some researchers have developed tools to automatically infer the annotationsneeded by the Java-based type checking systems just mentioned. Most target Java1.4, which simplifies the problem by permitting only lexically-acquired locks viasynchronized statements, whereas C (and Java 1.5) programs may acquire and re-lease locks at any program point. Houdini [Flanagan and Freund 2001] can infertypes for the original race-free Java system [Flanagan and Freund 2000], but lackscontext sensitivity. More recently Agarwal and Stoller [Agarwal and Stoller 2004]and Rose et al [Rose et al. 2005] have developed algorithms that infer types basedon dynamic traces, but these require sizeable test suites to avoid excessive false

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 49

alarms. Flanagan and Freund [Flanagan and Freund 2007] have proposed a systemfor inference which is formulated to support parameterized classes and dependenttypes. Though the problem is NP-complete, their SAT-based approach can ana-lyze 30K lines of Java code in 46 minutes. Von Praun and Gross’s dataflow-basedsystem [von Praun and Gross 2003] also requires no annotations and performs well,checking 2000-line programs in a few seconds.

Naik, Aiken, and Whaley present a race detection system for Java [Naik et al.2006]. Their system scales well to large Java programs and has found several races.Analyzing Java 1.4 avoids some problems we encountered analyzing C code, such asflow-sensitive locking, low-level pointer operations, and unsafe type casts. They alsoomit linearity checking, which we include in Locksmith. In later work [Naik andAiken 2007], Naik and Aiken address the lack of linearity in locks by introducing aconditional must not alias analysis, which can handle fine-grain locking.

Several completely automatic static analyses have been developed for findingraces in C code. Polyspace [Hote 2004] is a proprietary tool that uses abstract in-terpretation to find data races (and other problems). The Blast model checker hasbeen used to find data races in programs written in NesC, a variant of C [Henzingeret al. 2004]. Race checking is not limited to checking for consistent correlation andcan be state dependent, but is limited to checking global variables and can bequite expensive. Seidl et al [Seidl et al. 2003] propose a framework for analyzingmultithreaded programs that interact through global variables. Using their frame-work they develop a race detection system for C and apply it to a small set ofbenchmarks, finding several data races. It is unclear whether their analysis sup-ports context sensitivity and how it models data structures. RacerX [Engler andAshcraft 2003] does not soundly model some features of C for better scalability andto reduce false alarms, but may miss races as a result. KISS [Qadeer and Wu 2004]builds on model checking techniques, and has been shown to find many races, butignores certain kinds of thread interleavings.

Voung, Jhala and Lerner present Relay, a race detection system for C [Vounget al. 2007] that uses flow-sensitive propagation of lock set and guarded-by informa-tion similar to Locksmith. Relay scales to millions of lines of C code by analyzingand summarizing the behavior of parts of the program in parallel, using symbolicevaluation. Unlike Relay, Locksmith generates and solves the constraints for thewhole program together, and is implemented to run on a single processor, limitingits scalability. One benefit of the whole program, type-based analysis in Locksmithis that it can track the flow of function pointers precisely. In contrast, the modularper-file analysis in Relay may not track aliasing of function pointers across filescorrectly, which can result in unmodeled control flow.

Terauchi proposes LP-Race [Terauchi 2008], a static analysis tool that reducesthe problem of race detection to linear programming. The reduction is such thatone need not directly compute acquired locks, and LP-race can handle synchro-nization via semaphores and signals. LP-Race scales to medium-sized programs,some of which cause Locksmith to run out of memory. However, Locksmithruns slightly faster though it uses a more precise aliasing and sharing analysis.We conjecture that, as a result of this more precise analysis, Locksmith’s reportsare more precise—two abstract locations differentiated by a precise analysis could

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

50 ·

be considered to be one location in a less precise analysis. Locksmith’s anal-ysis is inclusion-based, and is both field- and context-sensitive. LP-Race uses aunification-based analysis, that is also field-sensitive. In one common benchmark(smtprc) LP-Race was able to eliminate false positives due to handling semaphoresand thread joins. However, LP-Race produced additional false positives due to alimitation in handling loops that fork an unbounded number of threads. Due tothe way we use future effects in our sharing analysis, Locksmith is able to handlesuch loops and infer shared locations more precisely.

Work that detects violations of atomicity, either dynamically [Flanagan and Fre-und 2004] or statically [Flanagan and Qadeer 2003; Flanagan et al. 2005] typicallyrequires a program to be free of races.

Our analysis is based on ideas initially explored by Reps et al [Reps et al. 1995]and Rehof and Fahndrich [Rehof and Fahndrich 2001], who showed how to encodecontext-sensitive analysis as a context-free language reachability problem. Oursupport for existential types is related to restrict or focus for alias analysis [Aikenet al. 2003; Fahndrich and DeLine 2002]. Our flow-sensitive analysis is a significantextension of our previous work on flow-sensitive type qualifiers [Foster et al. 2002],which used a similar flow-sensitive constraint graph. Both systems can be seen asinference for a variant of the calculus of capabilities [Crary et al. 1999].

Correlation between locks and locations is similar to correlation between regionsand pointers, and several researchers have looked at the problem of region inference,including the Tofte and Birkedal system for the ML Kit [Tofte and Birkedal 1998].Henglein et al [Henglein et al. 2001] use a control-flow-sensitive and context-sensitivetype system to check that regions with non-lexical allocation and deallocation areused correctly. Our treatment of lock allocation is similar to Henglein et al’s treat-ment of region allocation, but our formal system supports higher-order functions,and we present a constraint-based inference algorithm.

8. CONCLUSION

In this paper we described Locksmith, a static analysis tool for finding data racesin C programs. We presented the core algorithms of Locksmith on a simple im-perative language and explored the engineering challenges in handling all of C. Foreach algorithm we performed extensive measurements, comparing with alternativealgorithms in terms of precision and performance. We found that context sensi-tivity greatly reduces the number of false warnings, but also limits Locksmith’soverall scalability. Perhaps surprisingly, we found that field sensitivity improvesboth precision and performance, the latter because more precise modeling of alias-ing speeds up subsequent phases of Locksmith. We described our approaches tomodeling fields lazily and for handling void ∗ pointers, both of which were impor-tant to precision and performance. We also found that our sharing analysis waseffective, determining that most locations are thread-local, and that some simplescoping optimizations and a uniqueness analysis improved on our sharing analysisfurther. Lastly, we found that not using a worklist was the best strategy for ourdataflow analyses. We believe these results will prove valuable to designers of otherstatic analyses for C.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 51

ACKNOWLEDGMENTS

We would like to thank Alan Cox for answering a question on Linux device drivers.This work was supported by NSF CCF-0541036, CCF-0346989, CCF-0430118, andCCF-0524036.

REFERENCES

Agarwal, R., Sasturkar, A., Wang, L., and Stoller, S. D. 2005. Optimized run-time race

detection and atomicity checking using partial discovered types. In ASE ’05: Proceedings of the20th IEEE/ACM international Conference on Automated software engineering. ACM Press,

New York, NY, USA, 233–242.

Agarwal, R. and Stoller, S. D. 2004. Type inference for parameterized race-free java. In

Proceedings of the Fifth International Conference on Verification, Model Checking and AbstractInterpretation. Lecture Notes in Computer Science, vol. 2937. Springer-Verlag, Venice, Italy,

149–160.

Aho, A. V. and Ullman, J. D. 1977. Principles of Compiler Design (Addison-Wesley series in

computer science and information processing). Addison-Wesley Longman Publishing Co., Inc.,

Boston, MA, USA.

Aiken, A., Foster, J. S., Kodumal, J., and Terauchi, T. 2003. Checking and inferring localnon-aliasing. In PLDI ’03: Proceedings of the ACM SIGPLAN 2003 conference on Program-

ming language design and implementation. ACM Press, New York, NY, USA, 129–140.

Alexandrescu, A., Boehm, H., Henney, K., Hutchings, B., Lea, D., and Pugh, B. 2005.

Memory model for multithreaded c++: Issues.

Boyapati, C. and Rinard, M. 2001. A parameterized type system for race-free java programs.

In OOPSLA ’01: Proceedings of the 16th ACM SIGPLAN conference on Object oriented pro-gramming, systems, languages, and applications. ACM Press, New York, NY, USA, 56–69.

Choi, J.-D., Lee, K., Loginov, A., O’Callahan, R., Sarkar, V., and Sridharan, M. 2002.Efficient and precise datarace detection for multithreaded object-oriented programs. In PLDI

’02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design

and implementation. ACM Press, New York, NY, USA, 258–269.

Cooper, K. D., Harvey, T. J., and Kennedy, K. 2004. Iterative data-flow analysis, revisited.Tech. Rep. TR04-100, Department of Computer Science, Rice University.

Crary, K., Walker, D., and Morrisett, G. 1999. Typed memory management in a calculus ofcapabilities. In POPL ’99: Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on

Principles of programming languages. ACM Press, New York, NY, USA, 262–275.

Engler, D. and Ashcraft, K. 2003. Racerx: effective, static detection of race conditions and

deadlocks. In SOSP ’03: Proceedings of the nineteenth ACM symposium on Operating systemsprinciples. ACM Press, New York, NY, USA, 237–252.

Fahndrich, M. and DeLine, R. 2002. Adoption and focus: practical linear types for imper-ative programming. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002 Conference on

Programming language design and implementation. ACM Press, New York, NY, USA, 13–24.

Fahndrich, M., Rehof, J., and Das, M. 2000. Scalable context-sensitive flow analysis using

instantiation constraints. In PLDI ’00: Proceedings of the ACM SIGPLAN 2000 conferenceon Programming language design and implementation. ACM, New York, NY, USA, 253–263.

Flanagan, C. and Abadi, M. 1999. Types for safe locking. In ESOP ’99: Proceedings of the 8thEuropean Symposium on Programming Languages and Systems. Springer-Verlag, London, UK,

91–108.

Flanagan, C. and Freund, S. N. 2000. Type-based race detection for java. In PLDI ’00:

Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and

implementation. ACM Press, New York, NY, USA, 219–232.

Flanagan, C. and Freund, S. N. 2001. Detecting race conditions in large programs. In PASTE’01: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for

software tools and engineering. ACM Press, New York, NY, USA, 90–96.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

52 ·

Flanagan, C. and Freund, S. N. 2004. Atomizer: a dynamic atomicity checker for multithreaded

programs. In POPL ’04: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium onPrinciples of programming languages. ACM Press, New York, NY, USA, 256–267.

Flanagan, C. and Freund, S. N. 2007. Type inference against races. Sci. Comput. Pro-

gram. 64, 1, 140–165.

Flanagan, C., Freund, S. N., and Lifshin, M. 2005. Type inference for atomicity. In TLDI

’05: Proceedings of the 2005 ACM SIGPLAN international workshop on Types in languages

design and implementation. ACM Press, New York, NY, USA, 47–58.

Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B., and Stata, R.

2002. Extended static checking for java. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002Conference on Programming language design and implementation. ACM Press, New York, NY,

USA, 234–245.

Flanagan, C. and Qadeer, S. 2003. A type and effect system for atomicity. In PLDI ’03:Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and

implementation. ACM Press, New York, NY, USA, 338–349.

Foster, J. S., Johnson, R., Kodumal, J., and Aiken, A. 2006. Flow-insensitive type qualifiers.ACM Trans. Program. Lang. Syst. 28, 6, 1035–1087.

Foster, J. S., Terauchi, T., and Aiken, A. 2002. Flow-sensitive type qualifiers. In PLDI ’02:

Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design andimplementation. ACM Press, New York, NY, USA, 1–12.

Grossman, D. 2003. Type-safe multithreading in cyclone. In TLDI ’03: Proceedings of the 2003

ACM SIGPLAN international workshop on Types in languages design and implementation.ACM Press, New York, NY, USA, 13–25.

Heintze, N. and Tardieu, O. 2001. Ultra-fast aliasing analysis using cla: a million lines of c codein a second. In PLDI ’01: Proceedings of the ACM SIGPLAN 2001 conference on Programming

language design and implementation. ACM, New York, NY, USA, 254–263.

Henglein, F. 1993. Type inference with polymorphic recursion. ACM Trans. Program. Lang.Syst. 15, 2, 253–289.

Henglein, F., Makholm, H., and Niss, H. 2001. A direct approach to control-flow sensitive

region-based memory management. In PPDP ’01: Proceedings of the 3rd ACM SIGPLANinternational conference on Principles and practice of declarative programming. ACM Press,

New York, NY, USA, 175–186.

Henzinger, T. A., Jhala, R., and Majumdar, R. 2004. Race checking by context inference.SIGPLAN Not. 39, 6, 1–13.

Hote, C. 2004. Run-time error detection through semantic analysis.

intel.com. 2007. Teraflops research chip.

Johnson, R. and Wagner, D. 2004. Finding user/kernel pointer bugs with type inference.

In SSYM’04: Proceedings of the 13th conference on USENIX Security Symposium. USENIXAssociation, Berkeley, CA, USA, 9–9.

Kodumal, J. and Aiken, A. 2004. The set constraint/cfl reachability connection in practice.In PLDI ’04: Proceedings of the ACM SIGPLAN 2004 conference on Programming languagedesign and implementation. ACM, New York, NY, USA, 207–218.

Kodumal, J. and Aiken, A. 2005. Banshee: A scalable constraint-based analysis toolkit. In

SAS, C. Hankin and I. Siveroni, Eds. Lecture Notes in Computer Science, vol. 3672. Springer,London, UK, 218–234.

Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed system. Commun.

ACM 21, 7, 558–565.

Leveson, N. G. and Turner, C. S. 1993. An investigation of the therac-25 accidents. Com-

puter 26, 7, 18–41.

Manson, J., Pugh, W., and Adve, S. V. 2005. The java memory model. In POPL ’05: Pro-ceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming lan-

guages. ACM Press, New York, NY, USA, 378–391.

Mossin, C. 1996. Flow Analysis of Typed Higher-Order Programs. Ph.D. thesis, DIKU, Depart-

ment of Computer Science, University of Copenhagen.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

· 53

Naik, M. and Aiken, A. 2007. Conditional must not aliasing for static race detection. SIGPLAN

Not. 42, 1, 327–338.

Naik, M., Aiken, A., and Whaley, J. 2006. Effective static race detection for java. In PLDI’06: Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and

implementation. ACM Press, New York, NY, USA, 308–319.

Neamtiu, I., Hicks, M., Foster, J. S., and Pratikakis, P. 2008. Contextual effects for version-

consistent dynamic software updating and safe concurrent programming. In Proceedings of theACM Conference on Principles of Programming Languages (POPL). ACM, New York, NY,

USA, 37–50.

Necula, G. C., McPeak, S., Rahul, S. P., and Weimer, W. 2002. Cil: Intermediate language

and tools for analysis and transformation of c programs. In CC ’02: Proceedings of the 11thInternational Conference on Compiler Construction. Springer-Verlag, London, UK, 213–228.

news.com. 2007. Designer puts 96 cores on single chip.

O’Callahan, R. and Choi, J.-D. 2003. Hybrid dynamic data race detection. In PPoPP ’03:Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel

programming. ACM Press, New York, NY, USA, 167–178.

Pierce, B. C. 2002. Types and programming languages. MIT Press, Cambridge, MA, USA.

Poulsen, K. 2004. Tracking the blackout bug.

Pratikakis, P., Foster, J. S., and Hicks, M. 2006a. Existential label flow inference via CFL

reachability. In Proceedings of the Static Analysis Symposium (SAS). Springer, Seoul, Korea,88–106.

Pratikakis, P., Foster, J. S., and Hicks, M. 2006b. Locksmith: context-sensitive correlationanalysis for race detection. In PLDI ’06: Proceedings of the 2006 ACM SIGPLAN conference on

Programming language design and implementation. ACM Press, New York, NY, USA, 320–331.

Qadeer, S. and Wu, D. 2004. Kiss: keep it simple and sequential. In PLDI ’04: Proceedings of

the ACM SIGPLAN 2004 conference on Programming language design and implementation.ACM Press, New York, NY, USA, 14–24.

Rehof, J. and Fahndrich, M. 2001. Type-base flow analysis: from polymorphic subtyping to

cfl-reachability. In POPL ’01: Proceedings of the 28th ACM SIGPLAN-SIGACT symposium

on Principles of programming languages. ACM Press, New York, NY, USA, 54–66.

Reps, T., Horwitz, S., and Sagiv, M. 1995. Precise interprocedural dataflow analysis via graphreachability. In POPL ’95: Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on

Principles of programming languages. ACM Press, New York, NY, USA, 49–61.

Reynolds, J. C. 2004. Toward a grainless semantics for shared-variable concurrency. In FSTTCS,

K. Lodaya and M. Mahajan, Eds. Lecture Notes in Computer Science, vol. 3328. Springer,Chennai, India, 35–48.

Rose, J., Swamy, N., and Hicks, M. 2005. Dynamic inference of polymorphic lock types. Science

of Computer Programming (SCP) 58, 3 (December), 366–383. Special Issue on Concurrency

and Synchronization in Java programs. Supercedes 2004 CSJP paper of the same name.

Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., and Anderson, T. 1997. Eraser: a

dynamic data race detector for multithreaded programs. ACM Trans. Comput. Syst. 15, 4,391–411.

Seidl, H., Vene, V., and Muller-Olm, M. 2003. Global invariants for analyzing multi-threaded

applications.

Siff, M., Chandra, S., Ball, T., Kunchithapadam, K., and Reps, T. 1999. Coping with type

casts in c. In ESEC/FSE-7: Proceedings of the 7th European software engineering conferenceheld jointly with the 7th ACM SIGSOFT international symposium on Foundations of software

engineering. Springer-Verlag, London, UK, 180–198.

Smith, F., Walker, D., and Morrisett, J. G. 2000. Alias types. In ESOP ’00: Proceedings

of the 9th European Symposium on Programming Languages and Systems. Springer-Verlag,London, UK, 366–381.

Talpin, J.-P. and Jouvelot, P. 1994. The type and effect discipline. Inf. Comput. 111, 2,245–296.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.

54 ·

Terauchi, T. 2008. Checking race freedom via linear programming. In PLDI ’08: Proceedings of

the 2008 ACM SIGPLAN conference on Programming language design and implementation.ACM, New York, NY, USA, 1–10.

Tofte, M. and Birkedal, L. 1998. A region inference algorithm. ACM Trans. Program. Lang.

Syst. 20, 4, 724–767.

von Praun, C. and Gross, T. R. 2003. Static conflict analysis for multi-threaded object-orientedprograms. In PLDI ’03: Proceedings of the ACM SIGPLAN 2003 conference on Programming

language design and implementation. ACM, New York, NY, USA, 115–128.

Voung, J. W., Jhala, R., and Lerner, S. 2007. Relay: static race detection on millions of lines

of code. In ESEC-FSE ’07: Proceedings of the the 6th joint meeting of the European softwareengineering conference and the ACM SIGSOFT symposium on The foundations of software

engineering. ACM, New York, NY, USA, 205–214.

ACM Transactions on Programming Languages and Systems, Vol. V, No. N, Month 20YY.