
Task Types for Pervasive Atomicity

Yu David Liu
SUNY Binghamton
[email protected]

Scott F. Smith
The Johns Hopkins University
[email protected]

Abstract

Several languages now provide atomic block syntax to bracket code in which atomicity must be preserved. Outside of these blocks, however, there are no atomicity guarantees. In this paper we define a language that takes the opposite approach: with no programmer declaration, atomicity holds pervasively in the code, and is broken only when an explicit need for sharing is declared. Furthermore, our principled way of sharing prevents a complete loss of atomicity; it only means a large region may need to be broken into smaller atomicity-enforcing regions.

At the core of our work is a novel constraint-based polymorphic type inference system, Task Types, to help enforce this flavor of pervasive atomicity statically, and to reduce the need for costly run-time support mechanisms such as locks or transactions. Task Types statically map objects onto tasks (threads). New program syntax is minimal; programmers need only declare the sharing points, as subtasks. We show the reasonableness of our type system by proving that type soundness, isolation invariance, and atomicity enforcement properties hold at run time. Task Type programming requires more explicit focus on how objects are to be shared between threads; we illustrate the programming philosophy in several examples.

1. Introduction

In an era when multi-core programming is becoming the rule, not the exception, it is now generally agreed that atomicity – the property that an execution in the presence of interleavings always has the same effect as a sequential execution – is a crucial invariant. Several languages support a notion of atomic block, asking programmers to declare a code block as subject to atomicity enforcement; this is a large advance over the lack of any atomicity, but it leaves the rest of the program unattended. The overall correctness of such a program then heavily depends on the judgment of the programmer, since any interleaved access on unattended shared memory is a violation of the sequential view, and is a spot where unexpected bugs may be introduced. Such designs are also known to lead to problems when the atomicity-enforcing code interleaves with the non-atomicity-enforcing code, a condition known as weak atomicity [8; 32; 1].

[Figure 1 diagrams omitted. Legend: task object, subtask object, ordinary object, messaging; panel (c) additionally shows a task after capture.]

Figure 1. (a) Non-Shared Memory by Default (b) Subtasking (c) Capture

In this paper we take the approach that atomicity should be pervasive throughout the program: every line of code is part of some atomic region. If atomicity is to be pervasive, there will be a much greater need for locking or software transactional memory (STM) control, to the point where it would seem that performance would be severely impacted. A key insight that we build on – originally demonstrated in the Actor model [3] – is that if the threads of a program never share objects, the threads can execute in any interleaved manner without violating atomicity; no additional run-time support is needed. We previously developed a dynamically typed language, Coqa, which enforces a non-shared-memory model by default [25]; in this paper we develop Task Types, which define a primarily static, as opposed to completely dynamic, enforcement of this model.

Figure 1(a) illustrates how objects are statically localized in threads via Task Types. Here the two rectangular boxes represent two runtime threads, which we call tasks. Our type system statically guarantees this picture: each object is "owned by" only one task. The black task objects are special objects which can start a new task; the white ordinary objects are the vast majority of objects and are statically localized to tasks.


(Columns: [1] Atomic Blocks/Sections; [2] Task Types (This Work); [3] Message-Passing Designs (Erlang, MPI, etc.); [4] Pure Actors)

sharing strategy:
  [1] shared memory; [2] non-shared memory by default; [3] non-shared memory; [4] non-shared memory
atomicity property:
  [1] weak or strong atomicity in atomic blocks; [2] pervasive strong atomicity; [3] pervasive strong atomicity; [4] pervasive strong atomicity
atomicity span (code):
  [1] atomic block; [2] entire task, divided by subtask access; [3] message handler, divided by blocking operators; [4] message handler
atomicity span (memory):
  [1] multiple objects possible; [2] multiple objects possible; [3] (usually) single actor only; [4] single actor only
programmability (synchronous messaging):
  [1] Yes; [2] Yes; [3] Varies; [4] No
programmability (atomicity intention):
  [1] declare if you need it; [2] implicit (divided at subtasking time); [3] implicit (divided at synchronization time); [4] implicit

Figure 2. The Spectrum of Atomicity-Oriented Programming Models

A pure non-shared-memory model significantly impacts programmability: how can two threads communicate? One obvious solution is that they communicate only when a new thread is launched – the pure Actor model adopts this strategy – but this would mean giving up synchronous messaging and dividing a piece of code that logically forms one thread into numerous sub-threads, one for each message reply continuation. We take an intermediate approach between standard multithreaded programming and Actors, where object sharing is supported, but only via principled language abstractions. Additionally, our design is flexible enough to naturally support both pure Actor-style and pure standard-style multithreaded programming in the cases where that is desired.

A key abstraction we adopt is the subtasking construct of Coqa. Tasks are allowed to interact, but only via special portal objects called subtask objects, as illustrated in Fig. 1(b). Each subtask object conceptually handles requests from tasks one at a time, spawning a subtask for each. Subtasks are in fact not threads; they run on the caller thread, which has blocked waiting for the reply – subtask messaging is synchronous. Like a task object, a subtask object can own a set of ordinary objects.

Subtask messaging gives tasks the sharing they critically need, but in a limited manner. The most important benefit of limited sharing is that some atomicity properties are preserved: the sending task has one region of atomicity from its start to the subtask invocation, the subtask itself is an atomic region, and the task execution following the subtask is a third region of atomicity. In general, any multithreaded application can be divided into a series of atomic regions of execution, with every line of code falling into some atomic region – atomicity is pervasive. A particular advantage of this methodology is that it becomes more feasible to exhaustively enumerate all interleaving possibilities of concurrent applications, a boon for program analysis and testing, and ultimately for the deployment of more reliable software.
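To make the three-region division concrete, consider the following sketch in our surface syntax; the classes and methods here (Transfer, Account, Audit, and their members) are hypothetical and serve only as illustration:

  task class Transfer {
    void run(Account from, Account to, Audit log) {
      from.debit(10);    // region 1: atomic, from task start
      to.credit(10);     //   ...up to the subtask invocation
      log=>record(10);   // region 2: the subtask itself is an atomic region
      from.check();      // region 3: atomic, from subtask return to task end
    }
  }

Here Audit would be declared as a subtask class, so log=>record(10) is the only point at which the task's atomicity is split.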

In the dynamic model of Coqa, ordinary objects are assigned to particular tasks at runtime, and when the task finishes the objects are freed for use by another task. Task Types statically assign objects to tasks. Rather than creating a complex static system to approximate the freeing of an object, our language adds a notion of capture, in analogy to a type cast, to bridge the cases where the runtime can be "smarter" than compile time. The expression capture o allows a task to use an object o originally used by a now-completed task. Potentially captured objects must have their owner task tracked at runtime, and at capture time there must be a check to verify that the object is not currently used by any other task. This mechanism is illustrated in Fig. 1(c).

The design of subtasking and capture reflects a common philosophy: non-shared memory is the default mode, and any programming need that does not observe this default must be explicitly specified. The two are complementary sides of the same coin: subtasking breaks atomicity but can be used anytime, whereas capturing preserves atomicity but is only effective when the object being captured is no longer used by another task.

Before we describe our system in more detail, we first summarize the state of the art in atomicity language design and static enforcement.

Other Approaches to Atomicity. Flanagan and Qadeer [19] first explored type system support for atomicity. Their type system assigns expressions types indicating commutativity properties, such as mover, indicating the expression can be exchanged with an adjacent execution step without violating atomicity. Philosophically, such an approach is a bottom-up strategy whereas Task Types are top-down: the work by Flanagan and Qadeer assumes a shared-memory model and supports reasoning about the composability of atomic blocks, whereas Task Types enforce a non-shared-memory model in which atomicity violations by default cannot happen in the first place.


Since Flanagan and Qadeer's pioneering work, many atomicity-oriented systems have been designed. Researchers have realized that the efficiency of such systems heavily relies on how much dynamic auditing of sharing is needed, and that their correctness crucially relies on how memory is partitioned [32; 1]. We feel the time is ripe for another foundational reflection on type system support for atomicity – this time focusing on memory.

The Actor model and related message-passing languages [3; 5; 33] achieve atomicity by imposing much stronger limitations on object sharing: threads communicate only at thread launch. A primary appeal of this model is that it trivially supports pervasive strong atomicity; see Fig. 2. The concern is the span of atomicity. In the pure Actor model, each message handler is atomic. To support Java-style synchronous messaging, however, two explicit message handlers need to be involved: one on the receiver for processing the request, and one on the sender for processing the return value. The atomicity of a messaging round-trip is thus broken into halves. There is also a programmability concern with such a design – the flow of control is broken into more pieces, and the program's meaning becomes harder to piece together. This latter aspect has been improved in implemented Actor-based languages, which include implicit CPS transformations for example, but that alone does not improve the span of an atomic region. Most Actor-based languages provide blocking constructs, such as receive in Erlang, which further break the atomicity of a message handler.

Recently, several static inference techniques have been designed to automatically insert locks to enforce atomicity for atomic blocks [27; 23; 17; 9]. These techniques, however, fundamentally assume that atomic blocks are a "local" construct, and make assumptions that are unrealistic for supporting larger atomic regions: some, for example, require all invocations in an atomic block to be inlined [27], some require a static bound on lock resources [17; 9; 23], and some assume all objects accessed in the atomic block are not accessed elsewhere [23]. Such assumptions are reasonable in these experimental efforts – many of them C-based – but do not generalize to an object-oriented setting with pervasive atomicity: both the number of objects (and hence locks) and the number of threads are unknown statically; programs are fundamentally recursive; and atomicity enforcement is needed not for a block of code, but for all code.

Ownership types [12; 11; 7] and region types [34; 20; 10] are well known for partitioning memory. Task Types were initially inspired by ownership type systems, but they turn out to be a significant departure from them. One key reason is our support for sharing. In addition, because task creations are dynamic, there may be an unbounded number of runtime tasks of a particular sort due to recursion; they must be modeled by only finitely many owners in the static type system, and so one "static owner" may represent multiple "dynamic owners". Standard ownership type systems would not work in this context because, to guarantee soundness, the single static owner cannot even share with "itself" – it may represent multiple runtime tasks that should not share with each other!

2. Informal Discussion

P   ::= −→Cls                           program
Cls ::= µ class a extends a {F M}       class
µ   ::= task | subtask | ε              modifier
F   ::= −→Fe                            fields
Fe  ::= a f = e                         field entry
M   ::= −→Me                            methods
Me  ::= a m(a x){e}                     method entry
e   ::= x | this | new^L a              expressions
      | f | f:=e | capture e | (a)e
      | e.m(e) | e->m(e) | e=>m(e)

Figure 3. Abstract Syntax

In this section we highlight a number of features of our language, focusing in particular on its type system. The abstract syntax of our language is given in Fig. 3, and is largely standard for a Java-like language: it includes classes and objects (Cls), methods (M), fields (F), inheritance, private field access (read and write), and casting. Notation −→X represents a sequence of X's. We use metavariable a to represent class names, m for method names, f for field names, and x for variable names. The program point label L associated with the new expression is only used in the formal development and can be ignored by programmers. To avoid null pointers, our language supports field initializers and requires that no forward referencing happen at initialization time, just as in Java. The syntax omits several constructs that are useful (so much so that we will use some of them in our examples) but are easily encodable: integers, constructors, statement composition e; e′, local variable declarations and assignments (encodable by method invocations), and public field access (encodable by a pair of getter and setter methods, as sketched below). The arity of method arguments is simplified to 1.
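As one illustration of these encodings, public access to a field can be recovered with a getter/setter pair. The class below is a hypothetical sketch in the syntax of Fig. 3 (Cell and its members do not appear elsewhere in the paper):

  class Cell {
    Data val = new Data();
    Data get() { return val; }             // public read of private field val
    Data set(Data v) { val:=v; return v; } // public write of private field val
  }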

New to our language are the class modifiers µ, signifying whether the class will create task objects (µ = task), subtask objects (µ = subtask), or ordinary objects (µ = ε). The messaging and capture syntax will be discussed below. Fig. 4 shows a code snippet.

2.1 Object Messaging

Expression o->m(v) creates a new task, where o is a task object. Line [1] in Fig. 4 creates a task by sending a start1 message to the task object instantiated inside the create method. Task creation messaging is asynchronous and has no interesting return value. Following Actors, each task object keeps a queue of all received messages and processes them serially.


class Main {
  void main() {
    TaskFactory f = new TaskFactory();
    MyShared s1 = new MyShared();
    MyShared s2 = new MyShared();
    MyShared s3 = new MyShared();
    Data d = new Data();
    (f.create()) ->start1(s1, s2, d); //[1]
    (f.create()) ->start2(s3);        //[2]
    (f.create()) ->start2(s3);        //[3]
    (f.create()) ->start1(s1, s2, d); //[4] (error)
  }
}
class TaskFactory {
  MyTask create() { return new MyTask(); }
}
task class MyTask {
  void start1(MyShared s1, MyShared s2, Data d) {
    s1=>process(d); s2=>process(d); //[*]
  }
  void start2(MyShared s) {
    s=>process(new Data());
  }
}
subtask class MyShared {
  void process(Data d) { d.access(); }
}
class Data { void access() {...} }

[Figure 4 object diagrams omitted. Panel (a) shows tasks Ta1, Ta2, Ta3 (plus the main task Ta) accessing s1, s2 : MyShared, d : Data, s3 : MyShared, and dl : Data; panel (b) shows tasks Tb1 and Tb4 (plus the main task Tb) both accessing s1, s2 : MyShared and d : Data.]

Figure 4. A code snippet with (a) a snapshot right before main completion, where lines [1], [2], [3] are included (but not [4]), creating tasks Ta1, Ta2, Ta3 respectively; (b) a snapshot right before main completion, where lines [1] and [4] are included (but not [2], [3]), creating tasks Tb1 and Tb4. Such a program would produce a type error, so this run would never happen in reality.

This sequential processing constraint is to preserve atomicity; however, it is still easy to have the same code running in multiple tasks in parallel, by instantiating the same task class multiple times as is done in lines [2] and [3].

Subtasks can be created when tasks need to share state. Expression o=>m(v) creates a subtask, where o is a subtask object. Line [*] creates two subtasks by sending process messages to s1 and s2 respectively. A subtask is similar to a task in that a subtask object also keeps a queue of received messages and executes them one by one. Unlike a task, however, a subtask can return values, and o=>m(v) is a synchronous operation: the message sender blocks until the subtask completes. For this reason, a subtask is not in fact a distinct thread – it can piggyback on its parent task. The subtask object, however, can be shared by multiple tasks/subtasks; this is the crucial rendezvous mechanism. In Fig. 4(a), the two tasks Ta2 and Ta3 share subtask object s3. Since subtasks are technically not threads, we do not illustrate subtasks as nested boundary boxes.
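For instance, a subtask object can serve as a shared counter; in the hypothetical sketch below, each => invocation is synchronous, returns a value, and is processed serially even when many tasks share the same Counter object:

  subtask class Counter {
    int n = 0;
    int next() { n:=n + 1; return n; }
  }
  // in some task body, where c refers to a Counter shared by several tasks:
  int t = c=>next();  // blocks until the subtask completes, then yields a value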

Expression o.m(v) is the familiar synchronous messaging of Java-like languages, where o is an ordinary, unshared object (i.e., neither a task object nor a subtask object). Each such object must be owned by the same task that owns the sender object.

2.2 Dynamic and Static Enforcement of Atomicity

We previously showed how atomicity can be enforced dynamically [25]; the goal of our Task Type system is to statically approximate dynamic enforcement so that most of the dynamic enforcement mechanisms become unnecessary. We first give a simplified account of how atomicity can be achieved dynamically for the language defined in Fig. 3.

In our Coqa language [25] we implemented atomicity with locks, and proved that a running task is atomic if it locks every ordinary object it accesses; if such an object has already been locked by another task, the task blocks until the object is unlocked, which happens when the lock-holding task completes. Alternatively, the Coqa locks could be implemented with STM: atomicity is preserved if a transaction monitor records every object a task has accessed; upon completion, the task rolls back if any of its objects were accessed by other tasks in the interim, and otherwise commits.

Subtasking breaks whole-task atomicity, but as we explained in Sec. 1, it breaks it in the sense of partitioning the task into sequential regions that are still atomic, not in the sense of destroying atomicity completely; this fact is proved in [25]. Subtasking is partly related to the open nesting of STM transactions [13; 8; 29]. In a lock-based implementation, a subtask locks all objects it has accessed, but unlocks all of them upon completion to claim partial victory. From a concurrency perspective, subtasking increases the chances of parallelism by reducing the duration of contention. Another common issue in this context is whether a subtask can access objects locked by its ancestors.


For STM systems, the answer is often negative [8] due to high cost, whereas the feature is supported relatively easily in Coqa [25].

2.3 Task Isolation

Task Types mimic the aforementioned dynamic enforcement behaviors at compile time, so that much of the work of locking or transaction monitoring at run time can go away. The type system enforces a non-shared-memory model (called task isolation here), but not a strict one. Introducing sharing points, subtasking in particular, produces non-trivial issues that must be solved. It means that the static access graph – a static global view of how objects are accessed, where object access is defined as being sent a message – is not a strict hierarchy: the nodes representing subtask objects are sharing points. For instance, in Fig. 4(a), Ta2 and Ta3 share subtask object s3. Object dl is indeed accessed by two tasks – and a DAG is formed – but this case is legal because the sharing is encapsulated in the subtask. As another example, the access to object d in the same figure is also legal, but for a different reason: it is accessed by two different subtasks, but such sharing is not an atomicity violation because s1 and s2 originate from the same parent task. Recall that subtasking follows synchronous semantics, so the two accesses cannot happen at the same time: when s2 accesses d, s1 must have already completed, and s2 can access d without any problem.

Things are quite different if a snapshot like Fig. 4(b) could exist. Here, even though d is also accessed by s1 and s2 – just as in Fig. 4(a) – the two tasks Tb1 and Tb4 both have access to s1 and s2. It is possible that s1 could access d for a request from Tb1 while, at the same time, s2 accesses d for a request from Tb4, which would violate our atomicity model.

In Sec. 3 we will define our task isolation model in such a way that a run-time snapshot like Fig. 4(b) can never arise; the program would typecheck only if line [4] were removed. For the run-time snapshots that Task Types do allow, we show that the dynamic locking semantics outlined in Sec. 2.2 is sound and, most importantly, that much of the need for run-time locking is obviated.

2.4 Context-Sensitivity: The Demand for Instance Precision

The example above also shows the importance of correctly approximating the number of tasks being created. After all, the only difference between Fig. 4(a) and Fig. 4(b) from the perspective of object d is that the number of root tasks has increased from 1 to 2, and yet this makes the difference between correctness and error. The algorithm thus has to correctly identify that lines [1] and [4] will indeed create two different tasks. Here a factory method is used: each invocation of create returns a different MyTask instance, even though there is only one program point with the new MyTask expression. The algorithm must be able to treat different call sites of the same method differently: context-sensitivity is needed. Task Type context-sensitivity is related to polymorphic type-based constraint systems such as [36], and is also closely related to context-sensitive program analyses such as [16; 39].

2.5 Tunneling: Battling Against the Effect of Recursion

As is well known, some imprecision must be introduced to analyze recursive programs while keeping the analysis terminating. For the specific problem we are tackling here, however, things are more serious, in that a standard approach would not even be sound. Suppose we had a top-level loop (implemented via recursion in our primitive language) in the body of the main function:

  ...
  x = 5;
  while (x > 0) {
    (f.create()) ->start1(s1, s2, d);
    x = x - 1;
  }

Based on our analysis of Fig. 4(b), we know this program will cause problems. However, we obviously cannot reject all programs with recursion or looping. If we changed the asynchronous expression in the loop above to (f.create()) ->start2(s3), the loop would be legal, since no ordinary objects are shared by the different tasks launched in the loop, as the following variant shows.
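Concretely, the legal variant of the loop reads as follows; each iteration launches a fresh task, and the only object shared among those tasks is the subtask object s3:

  x = 5;
  while (x > 0) {
    (f.create()) ->start2(s3); // legal: tasks share only subtask object s3
    x = x - 1;
  }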

Our solution centers around a novel tunnel detection analysis. The type system first ignores recursion unsoundness and performs the non-shared-memory type inference described in Sec. 2.3; it then additionally and explicitly outlaws the bad "tunnels": object sharing between two tasks that cannot be differentiated by type variables. The intuition is that if a task x and another task y both end up with access to some object o, the "bad" cases that cannot be detected by the non-shared-memory type checking algorithm are those where y is represented by the same type variable as x, so that the type system is tricked into believing x and y are the same task. It turns out that only a few checks are needed to outlaw all such tunnels. This topic is discussed in more detail in Sec. 3.4.

2.6 Capture

From a programmer's perspective, capture allows more programs to typecheck without violating atomicity. As we explained earlier, a program with both line [1] and line [4] in Fig. 4 would not typecheck, because the dangerous case of Fig. 4(b) might arise. The same program would typecheck, however, if line [*] were changed to

  d0 = capture d;
  s1=>process(d0); s2=>process(d0); //[*]

Semantically, capture is a typecast which allows an object to change its owner. As described in Sec. 1, the change only happens at runtime for an object already unlocked by its original hosting task.


If the object is still locked by a different task, capture blocks.
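Putting this fix in context, the body of start1 from Fig. 4 becomes the following (d0 is simply a local name for the captured reference):

  task class MyTask {
    void start1(MyShared s1, MyShared s2, Data d) {
      d0 = capture d;                   // blocks if d is still locked by another task
      s1=>process(d0); s2=>process(d0); //[*]
    }
    ...
  }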

2.7 Properties

We will show a type soundness property for Task Types. We also show that Task Types preserve a non-shared-memory model modulo explicit sharing points such as subtask objects. As a result, we can prove that a program with all locks removed – except those used for captured objects or for mutual exclusion on task and subtask objects – behaves the same as one subject to the purely dynamic semantics of Sec. 2.2. In other words, Task Types preserve atomicity properties pervasively, and at a relatively low run-time cost.

3. The Formal Type System

P    ::= −→(a ↦ 〈µ; F; M; U〉)            parameterized signatures
M    ::= −→(m ↦ ∀α.(τ → τ′))              method signatures
F    ::= −→(f ↦ ∀α.τ)                     field signatures
U    ::= −→a                              superclass set
τ    ::= a@α                              types
α                                          type variable (incl. self)
Σ    ::= −→cons                           constraint set
cons ::=                                   constraint
         β ≤ β′                            flow constraint
       | α −hosts→ α′                      hosting constraint
       | lazy(α, m, σ)                     lazy closure constraint
       | 〈α1; m1〉 −[α; ef]→ 〈α2; m2〉     data flow constraint
β    ::=                                   flow constraint element
         α                                 object
       | 〈α; f〉                           field access marker
       | 〈a; L; σ〉                        instantiation marker
       | cap(α, α′)                        capture marker
σ    ::= −→(α ↦ α′)                       substitution
G    ::= −→(〈a; m〉 ↦ Σ)                  labeled constraints
Γ    ::= −→γ                              typing environment
γ    ::= x ↦ τ | myC ↦ a | myM ↦ m
ef   ∈ Bool                                effect label

Figure 5. Auxiliary Definitions for the Type System

We now describe Task Types rigorously. First we review our basic notation. −→xn denotes a set {x1, ..., xn}. −→(xn ↦ yn) denotes a mapping sequence (also called a mapping) [x1 ↦ y1, ..., xn ↦ yn]. Given M = −→(xn ↦ yn), dom(M) denotes the domain of M, defined as {x1, ..., xn}; we also write M(x1) = y1, ..., M(xn) = yn. When no confusion arises, we drop the subscript n on sets and mapping sequences and simply write −→x and −→(x ↦ y).

Binary operator ∈ is used both for set containment and sequence containment, and ∅ represents both the empty set and the empty sequence. We write M[x ↦ y] for a mapping update: M and M[x ↦ y] are identical except that M[x ↦ y] maps x to y. Updateable mapping concatenation ⊕ is defined as M1 ⊕ M ≝ M1[x1 ↦ y1]...[xn ↦ yn] for M = −→(xn ↦ yn). We also write M1 ⊕ M as M1 ⊎ M, except that the latter requires the pre-condition dom(M1) ∩ dom(M) = ∅.

We define function Ξ(P) = P to extract formalization-friendly signature information P from the more programmer-friendly P. We defer its verbose but obvious definition to Appx. A, and describe it here informally. The function takes each class and computes its signature a ↦ 〈µ; F; M; U〉, including its name a, modifier µ, field signatures F, method signatures M, and a superclass set U containing the names of all superclasses (including a itself). For a class with superclasses, we "inline" all superclasses, so that the signature of the class includes the signature information of all fields and methods in the hierarchy. To make parametric polymorphism more explicit and readable, the signatures computed by Ξ quantify the object types appearing in them. Thus, given a method a′ m(a x){e}, its signature in M is m ↦ ∀α, α′.(a@α → a′@α′), where α and α′ are type variables. Similarly, given a field a f = e, its signature in F is f ↦ ∀α′′.a@α′′. For simplicity of the formal development, we require the ∀-bound type variables for a particular class signature to be distinct from each other.
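As a small worked example of Ξ, consider the hypothetical class below; its extracted signature follows:

  class Box {
    Data content = new Data();
    Data get(Data x) { return content; }
  }

  Ξ yields: Box ↦ 〈ε; [content ↦ ∀α′′. Data@α′′]; [get ↦ ∀α, α′. (Data@α → Data@α′)]; U〉

where U contains Box together with its superclasses.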

We define a constraint-based type system, with constraint sets Σ containing constraints cons. The meanings of the specific constraints will be clarified later, but in general a constraint connected by ≤ is intuitively a flow constraint, and we use metavariable β to represent these flow elements. Typing environment Γ contains mappings from variables to types, plus two special mappings denoting where the currently typed expression is lexically enclosed: myC ↦ a says the enclosing class is a, and myM ↦ m says the enclosing method is m.

Function FTV(•) is defined as the set of free type variables in •, where • is either Σ, cons, β, or τ. It includes all type variables (α) in these constructs, except those in dom(σ) when • includes a substitution σ as part of its construct, as in lazy(α, m, σ) and 〈a; L; σ〉. Substitution •[σ] is defined on the same domain as FTV, and substitutes all free occurrences of type variables.

3.1 The Type Rules

Type checking starts with the (T-Global) rule. Judgment ⊢global P\G holds when program P typechecks. Its overall structure includes a per-class typechecking phase (T-Class), where type constraints Σ are collected and recorded in a structure G in a per-class, per-method manner, indexed by 〈a; m〉. Constraints associated with field initializers do not belong to any method, so they are indexed by a built-in special method name m∗; see (T-Fields).


(T-Global)
Ξ(P) = P    P = [Cls1, Cls2, ..., Clsn]    dom(P) = {a1, ..., an}
i = 1, ..., n    name(Clsi) = ai    P ⊢cls Clsi : P(ai)\Gi
G = G1 ⊎ G2 ⊎ ... ⊎ Gn    isolatedTasks(G, P)    tunnelFree(G, P)
P(amain) = 〈task; ∅; Mmain; Umain〉    dom(Mmain) = {mmain}
──────────────────────────────
⊢global P\G

(T-Class)
[myC ↦ a], P ⊢ml M : M\Gm    [myC ↦ a], P ⊢fl F : F\Gf
──────────────────────────────
P ⊢cls µ class a {F M} : 〈µ; F; M; U〉\(Gf ⊎ Gm)

(T-Methods)
M = [Me1, Me2, ..., Men]    dom(M) = {m1, m2, ..., mn}
i = 1, ..., n    name(Mei) = mi    Γ, P ⊢m Mei : M(mi)\Gi
──────────────────────────────
Γ, P ⊢ml M : M\(G1 ⊎ ... ⊎ Gn)

(T-Method)
Γ(myC) = a0    Γ ⊕ [myM ↦ m, x ↦ τ], P ⊢ e : τ′\Σ
──────────────────────────────
Γ, P ⊢m (a′ m(a x){e}) : ∀α.(τ → τ′)\(〈a0; m〉 ↦ Σ)

(T-Fields)
Γ(myC) = a0
F = [Fe1, Fe2, ..., Fen]    dom(F) = {f1, f2, ..., fn}
i = 1, ..., n    name(Fei) = fi    Γ, P ⊢f Fei : F(fi)\Σi
──────────────────────────────
Γ, P ⊢fl F : F\(〈a0; m∗〉 ↦ Σ1 ∪ ... ∪ Σn)

(T-Field)
Γ, P ⊢ f:=e : τ\Σ
──────────────────────────────
Γ, P ⊢f (a f = e) : ∀α.τ\Σ

name(µ class a extends a′ {F M}) ≝ a
name(a m(a′ x){e}) ≝ m
name(a f = e) ≝ f

Figure 6. Top-Level Typing Rules

When a method or field is typed, the signature used is the one computed by Ξ, as illustrated in (T-Method) and (T-Field). A program contains a bootstrapping task class named Main with a special method main. Since all types in our core system are object types, the sugar unit ≝ a_object@α_object is used to declare methods with no arguments or return values.

Type constraint closure is defined in Fig. 8, as a system of ⊢ocl judgments. These judgments are used by the definitions of isolatedTasks(G, P) and tunnelFree(G, P). The first enforces task isolation as described in Sec. 2.3 and Sec. 2.4, while the second guarantees the model is correct in the presence of recursion, as described in Sec. 2.5. We use a gray color (in the original typesetting) to mark the formalism related to the definition of tunnelFree(G, P), since the reader may want to initially avoid that complex issue.

(T-Var)
──────────────────────────────
Γ, P ⊢ x : Γ(x)\∅

(T-New)
P(a) = 〈µ; F; M; U〉    {α1, ..., αn} = {α | F(f) = ∀α.a@α}
α′, α′1, ..., α′n fresh    σ = [self ↦ α′, α1 ↦ α′1, ..., αn ↦ α′n]
──────────────────────────────
Γ, P ⊢ new^L a : (a@α′)\({〈a; L; σ〉 ≤ α′} ∪ {lazy(α′, m∗, ∅)})

(T-SyncM)
Γ, P ⊢inv 〈e; m; e′〉 : τ\Σ
Γ, P ⊢ e : (a@α)\Σ0    P(a) = 〈ε; F; M; U〉
──────────────────────────────
Γ, P ⊢ e.m(e′) : τ\Σ ∪ Σ0 ∪ {self −hosts→ α}

(T-ASyncM)
Γ, P ⊢inv 〈e; m; e′〉 : unit\Σ
Γ, P ⊢ e : (a@α)\Σ0    P(a) = 〈task; F; M; U〉
──────────────────────────────
Γ, P ⊢ e->m(e′) : unit\Σ ∪ Σ0

(T-Subtask)
Γ, P ⊢inv 〈e; m; e′〉 : τ\Σ
Γ, P ⊢ e : (a@α)\Σ0    P(a) = 〈subtask; F; M; U〉
──────────────────────────────
Γ, P ⊢ e=>m(e′) : τ\Σ ∪ Σ0 ∪ {self −hosts→ α}

(T-Self)
Γ(myC) = a
──────────────────────────────
Γ, P ⊢ this : a@self\∅

(T-Read)
P(Γ(myC)) = 〈µ; F; M; U〉    F(f) = ∀α.a@α
──────────────────────────────
Γ, P ⊢ f : a@α\{〈self; f〉 ≤ α}

(T-Write)
P(Γ(myC)) = 〈µ; F; M; U〉
Γ, P ⊢ e : a@α′\Σ    F(f) = ∀α.a@α
──────────────────────────────
Γ, P ⊢ f:=e : a@α′\Σ ∪ {α′ ≤ α}

(T-Capture)
Γ, P ⊢ e : (a@α)\Σ    α′ fresh
Σ′ = {cap(α, self) ≤ α′, self −hosts→ α′}    P(a) = 〈ε; F; M; U〉
──────────────────────────────
Γ, P ⊢ capture e : (a@α′)\Σ ∪ Σ′

(T-Cast)
Γ, P ⊢ e : (a′@α′)\Σ    P(a) = 〈µ; F; M; U〉
P(a′) = 〈µ′; F′; M′; U′〉    a ∈ U′ or a′ ∈ U
──────────────────────────────
Γ, P ⊢ (a) e : (a@α′)\Σ

(T-Sub)
Γ, P ⊢ e : (a@α)\Σ    P(a) = 〈µ; F; M; U〉    a′ ∈ U
──────────────────────────────
Γ, P ⊢ e : (a′@α)\Σ

(T-Call)
Γ, P ⊢ e : (a@α)\Σ
P(a) = 〈µ; F; M; U〉    M(m) = ∀α1, α2.(a1@α1 → a2@α2)
α′1, α′2 fresh    Γ, P ⊢ e′ : a1@α01\Σ′
Σh = {lazy(α, m, [α1 ↦ α′1, α2 ↦ α′2])}    Γ(myM) = m0
Σd = {〈self; m0〉 −[α′1; false]→ 〈α; m〉, 〈α; m〉 −[α′2; false]→ 〈self; m0〉}
──────────────────────────────
Γ, P ⊢inv 〈e; m; e′〉 : a2@α02\Σ′ ∪ Σh ∪ Σd ∪ {α01 ≤ α′1, α′2 ≤ α02}

Figure 7. Type Rules


(OCL-Main)
──────────────────────────────
G ⊢ocl G(〈amain; mmain〉)

(OCL-Subset)
G ⊢ocl Σ    Σ′ ⊆ Σ
──────────────────────────────
G ⊢ocl Σ′

(OCL-Union)
G ⊢ocl Σ1    G ⊢ocl Σ2
──────────────────────────────
G ⊢ocl Σ1 ∪ Σ2

(OCL-FlowHost)
G ⊢ocl α′1 ≤ α1    G ⊢ocl α′2 ≤ α2    G ⊢ocl α1 −hosts→ α2
──────────────────────────────
G ⊢ocl α′1 −hosts→ α′2

(OCL-CapClass)
G ⊢ocl 〈a; L; σ〉 ≤ α    G ⊢ocl cap(α, α′′) ≤ α′
──────────────────────────────
G ⊢ocl 〈a; L; σ〉 ≤ α′

(OCL-Lazy-Intro)
G ⊢ocl lazy(α, m, σ)
──────────────────────────────
G, ∅ ⊢oclaux lazy(α, m, σ)

(OCL-Lazy-Cancel)
G, Π ⊢oclaux Σ
──────────────────────────────
G ⊢ocl Σ

(OCL-Lazy-NonRecursive)
G, Π ⊢oclaux lazy(α, m, σm)    G ⊢ocl 〈a; L; σf〉 ≤ α
〈a; σ′m; m〉 ∉ Π for all σ′m
Σ = G(〈a; m〉)    σ = fresh(FTV(Σ) − dom(σf) − dom(σm))
──────────────────────────────
G, Π ∪ {〈a; σm; m〉} ⊢oclaux Σ[σ][σf][σm]

(OCL-Lazy-Recursive)
G, Π ⊢oclaux lazy(α, m, σm)    G ⊢ocl 〈a; L; σf〉 ≤ α
〈a; σ′m; m〉 ∈ Π
σm = [α1 ↦ α′1, ..., αu ↦ α′u]
σ′m = [α1 ↦ α′′1, ..., αu ↦ α′′u]
──────────────────────────────
G, Π ⊢oclaux ⋃ i∈{1,...,u} ({α′i ≤ α′′i} ∪ {α′′i ≤ α′i})

(OCL-MutSource)
G ⊢ocl 〈α1; m1〉 −[α; ef]→ 〈α2; m2〉    G ⊢ocl 〈α1; f〉 ≤ α
──────────────────────────────
G ⊢ocl 〈α1; m1〉 −[α; true]→ 〈α2; m2〉

(OCL-DataFlow)
G ⊢ocl 〈α1; m1〉 −[α′1; ef1]→ 〈α2; m2〉
G ⊢ocl 〈α2; m2〉 −[α′2; ef2]→ 〈α3; m3〉    G ⊢ocl α′1 ≤ α′2
──────────────────────────────
G ⊢ocl 〈α1; m1〉 −[α′1; ef1 ∨ ef2]→ 〈α3; m3〉

Data structure Π ::= −→〈a; σ; m〉
Function fresh({α1, ..., αn}) ≝ [α1 ↦ α′1, ..., αn ↦ α′n] where α′1, ..., α′n fresh

Figure 8. Type Constraint Closure

Any discussion of this gray-marked formalism is deferred to Sec. 3.4.

Expressions are typed via the ⊢ rules. (T-New) generates a fresh type variable α′, allowing objects of the same class created at different program points to be distinguished. The added constraint 〈a; L; σ〉 ≤ α′ associates the static information (class name a, lexical point L) with α′. Constraint lazy(α′, m∗, ∅) leaves a marker in the constraint set, stating that the constraints associated with the field initializers of a should be included at closure time. Eager inclusion of such constraints is not always possible because of recursion. We discuss this marker constraint in more detail shortly.

The most central constraints are the α −hosts→ α′ constraints, which piece together the static access graph that will later be analyzed for isolation preservation. They are generated by (T-SyncM) and (T-Subtask), where self is a type variable with a pre-defined name. Constraint self −hosts→ α says that the object enclosing the o.m(v) or o=>m(v) expression is accessing o (which has type variable α). Messaging claims access, and this builds an edge α −hosts→ α′ on the static access graph, meaning α hosts α′. Our type system types each class only once. At this modular typechecking phase, the type system does not yet know which type variable represents the enclosing object; hence the placeholder self is used, which is instantiated with the precise type variable at type closure time. We also do not eagerly collect the constraints belonging to the method body of the receiver (due to recursion); instead, a marker lazy(α, m, σ) is added, informing the type closure algorithm to include the constraints belonging to the m method of object α later. This is reflected by Σh in the (T-Call) rule, a rule used by all three messaging rules for their common part.

The only other place a −hosts→ constraint occurs is (T-Capture). Here a fresh type variable is generated to represent the captured object (intuitively, an object with a new identity). The −hosts→ constraint says that the enclosing object now treats the captured object as one of its own. Constraint cap(α, self) ≤ α′ records the flow, remembering that the captured object α′ originally was α and was captured by the enclosing object of the expression. Asynchronous messaging (T-ASyncM) does not have a return value. Subtyping is captured by (T-Sub), which is standard nominal subtyping.
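As a worked instance of (T-Capture): typing capture d inside some method m0, with d of type Data@αd (αd is an illustrative name), generates a fresh α′ and the constraints

  cap(αd, self) ≤ α′    // records the original object and its capturer
  self −hosts→ α′       // the enclosing object now hosts the captured object

and at closure time self is instantiated with the type variable of the actual enclosing object.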

3.2 Type Closure, Recursion Bounding, and Context Sensitivity

Type closure is defined by G ⊢ocl Σ judgments. If Σ is a singleton set {cons}, we also write G ⊢ocl cons for brevity. The type closure process starts with (OCL-Main), beginning with the constraints collected from the main function. The most interesting rules here are perhaps the four related to the processing of lazy constraints.


classn(G, α) ≝ {a | G ⊢ocl 〈a; L; σ〉 ≤ α}

mod(G, P, α) ≝ {µ | a ∈ classn(G, α), P(a) = 〈µ; F; M; U〉}

accesses(G, α, α′) ≝ (G ⊢ocl α ≤ α′) ∨ (G ⊢ocl α −hosts→ α′)
    ∨ (accesses(G, α, α′′) ∧ (G ⊢ocl α′′ −hosts→ α′))

isAccessed(G, P, α) ≝ mod(G, P, α) = {ε} ∧ (G ⊢ocl α′ −hosts→ α) for some α′

roots(G, P, α) ≝ {α′ | accesses(G, α′, α), mod(G, P, α′) = {task}}

cutVertices(G, P, α) ≝ {αcut | mod(G, P, αcut) = {task} or {subtask},
    ⋀ αr∈roots(G,P,α) accesses(G, αr, αcut)}

isolatedTasks(G, P) ≝ ∀α, α1, α2.
    isAccessed(G, P, α) ∧ {α1, α2} ⊆ cutVertices(G, P, α)
    =⇒ accesses(G, α1, α2) ∨ accesses(G, α2, α1)

instHost(G, α) ≝ {σ(self) | G ⊢ocl 〈a; L; σ〉 ≤ α}

capHost(G, α) ≝ {α′ | G ⊢ocl cap(α′′, α′) ≤ α}

extFree(G, α, αt) ≝ ∀α′ ∈ instHost(G, α) ∪ capHost(G, α). accesses(G, αt, α′)

dissemFree(G, P, α, αt) ≝
    G ⊢ocl 〈αs; ms〉 −[α; true]→ 〈αd; md〉 ∧ accesses(G, αt, αd)
    =⇒ mod(G, P, αs) ≠ {subtask}

cordFree(G, P, α, αt) ≝
    G ⊢ocl 〈αs; ms〉 −[α; ef]→ 〈αd; md〉 ∧ accesses(G, αt, αs) ∧ accesses(G, αt, αd)
    =⇒ mod(G, P, αd) ≠ {task}

tunnelFree(G, P) ≝ ∀α, αt.
    isAccessed(G, P, α) ∧ αt ∈ roots(G, P, α)
    =⇒ extFree(G, α, αt) ∧ dissemFree(G, P, α, αt) ∧ cordFree(G, P, α, αt)

Figure 9. Functions isolatedTasks and tunnelFree

For every lazy(α, m, σ) constraint in the constraint set of the main method, the type system uses (OCL-Lazy-Intro) to close in the constraints associated with method m of object α. To ensure termination, we mimic the call stack in data structure Π, recording the methods whose constraints have already been included on the call chain. If the constraints are new, (OCL-Lazy-NonRecursive) is used to merge the new constraints, while if the same method has been closed before, (OCL-Lazy-Recursive) simply equates the type variables of the current invocation with those of the previous invocation. The remaining rules propagate constraints along flow paths ≤, which is implicitly reflexive and transitive.

Context sensitivity is usually calculated in a program analysis formalism, but here we use a type-constraint-based formalism for its ease of mathematical manipulation and proof of correctness. We are maximally context-sensitive in the sense that every call site refreshes the type variables of the constraints of the invoked method body; we only bound this process at recursion, so that termination is achieved. The key variable refreshing happens in (OCL-Lazy-NonRecursive), where all type variables in Σ except those in dom(σm) and dom(σf) are refreshed. dom(σm) contains the type variables representing the formal argument and the return value of the enclosing method. These type variables are refreshed by the caller, not the callee, as can be seen in (T-Call): they end up in the caller's constraint set and are refreshed there. dom(σf) contains the type variables representing the fields of the object. They are refreshed by (T-New), so that different instances of the same class get different type variables for their fields, while across different invocations on the same instance a field keeps a consistent view of the object it holds. Each object creation point in turn lies in some method, and that method's invocations will refresh these variables again, so different calls yield different object types. This is the reason why the different MyTask instances in Fig. 4 can be approximated statically even though they are all instantiated from one program point. Note that we are giving a high-level description of closure which is more a specification than an implementation; many efficiencies can be realized in an implementation, which we do not cover here.
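Sketching this on Fig. 4: each of the two create() call sites contributes its own lazy(αf, create, σi) marker (αf, σ1, σ2 are illustrative names); when the closure expands each marker, (OCL-Lazy-NonRecursive) refreshes the type variable attached to the new MyTask expression inside create, so lines [1] and [4] receive distinct MyTask type variables:

  (f.create()) ->start1(s1, s2, d); //[1]: closing lazy(αf, create, σ1) yields MyTask@αt1
  (f.create()) ->start1(s1, s2, d); //[4]: closing lazy(αf, create, σ2) yields a fresh MyTask@αt2

It is precisely because αt1 and αt2 are distinct that isolatedTasks can detect the illegal sharing of d between them.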

3.3 Isolation Preservation

If we define an access path to be a −hosts→ path from a task object to an ordinary object, then the specific invariant Task Types enforce is: the subgraph of the static access graph that includes exactly all access paths to a fixed ordinary object must have one articulation point (cut vertex) which is a subtask or a task.


Intuitively, this says that if more than one task can access an object, then that object must "hide behind" a subtask which controls the sharing, i.e., it is always accessed through a single subtask. Note that such a definition does not restrict a DAG case like d in Fig. 4(a): in that case, d "hides behind" the task object of Ta1 itself, which is thread-local as well. The rigorous definition of this notion is the cutVertices(G, P, α) function in Fig. 9, intuitively the set of cut-vertex ancestors of α. Helper function mod(G, P, α) computes the modifier information for an object; it is easy to see that, according to our type system, it is always a singleton set. Predicate accesses(G, α, α′) defines whether an access path exists in the static access graph. Function roots(G, P, α) computes all "root" task objects that can access α. The definition of isolatedTasks then captures the intuition expressed above: all cut vertices must be in a total order; in other words, it is always possible to find a "least upper bound" cut vertex for α. Readers can verify why the case of dl in Fig. 4(a) typechecks, and that d in Fig. 4(b) leads to a type error.

According to this definition, both subtask objects and ordinary objects can be owned, and all three kinds of objects can be owners. Note that the static access graph we are defining is rather relaxed compared with ownership types. For instance, we are not interested in enforcing any hierarchical structure between a pair of ordinary objects: they can form cycles if a call chain leads back to an ordinary object later. What interests us is how they interact with tasks and subtasks. It can easily be verified that the definition here indeed allows a subtask (or the ordinary objects it owns) to access the objects belonging to its ancestors, as long as this does not violate the invariant we have just defined; in cases where a subtask object is accessed by multiple tasks and such sharing would lead to problems, it is indeed caught by the type system.

3.4 Tunnel Detection

As observed in Sec. 2.5, a tunnel detection analysis is needed to guarantee soundness. This analysis is defined by the tunnelFree function in Fig. 9; there are only three cases that might lead to "smuggling" tunnels, each of which is illustrated in Fig. 10. In all three sub-figures, T1 and T2 are two tasks that cannot be differentiated statically, as was the case in the while loop example of Sec. 2.5. Therefore, when isolatedTasks is computed, only T1 and T3 are approximated, and isolatedTasks holds. That, however, can still lead to non-shared-memory violations at run time.

The first kind of violation is dubbed an "external tunnel", as shown in Fig. 10(a). Here T3 might instantiate object o and pass it down to both T1 and T2; this is possible since T1 and T2 might both have been created by T3. Indeed, this case is why the while loop example of Sec. 2.5 fails to typecheck, and this violation is caught by the extFree function, one requirement of tunnelFree. That function in turn relies on functions instHost(G, α) and capHost(G, α), which compute object α's "host": the object that instantiated or captured α.

[Figure 10 diagrams omitted. Panels (a), (b), and (c) each show tasks T1, T2, T3 with task objects t1, t2, t3, subtask objects sx1, sx2, sy1, sy2, and o3, and the shared objects o (panels (a) and (c)) and o2 (panel (b)).]

Figure 10. Tunneling

In both cases α is purely local to its host, and the extFree function requires that the host be task-local, so α must be task-local as well.

We call the second violation a "disseminator tunnel", illustrated in Fig. 10(b); it is prevented by the dissemFree function. In the figure, suppose we have an object o2 that is instantiated inside T3, and a reference to o2 is stored in a field of subtask object o3. At different times, o2 can be passed into T1 via sx1, and into T2 via sy2. The isolatedTasks definition would not be able to detect such a case for o2, and would be tricked into believing o2 was in fact sent to the same task twice. Observe that for this "tunnel" to exist, a necessary condition is that the subtask o3 stores o2: the trick to being a "disseminator" is to send the same object (o2 here) to two different tasks that cannot be differentiated by a type variable. The dissemFree function analyzes the data flow chain, based on the constraints collected in the per-class typechecking phase – the ones marked in gray that we have yet to explain. In (T-Call), notice that the constraint set {〈self; m0〉 −[α′1; false]→ 〈α; m〉, 〈α; m〉 −[α′2; false]→ 〈self; m0〉} is collected, meaning an object α′1 flows from the m0 method of the self object into the m method of the α object, whereas the second constraint records the flow of the return value. The false here is an effect label; a true value denotes that the flow originates from an effectful source. With the help of the read constraints collected in (T-Read), a type closure rule for recording mutable sources (OCL-MutSource), and a propagation rule (OCL-DataFlow), the data flow can be combined with the effect. What the dissemFree definition states is that if a data flow exists between a pair of objects αs and αd, and αd is inside a task (αt), then it must not be the case that the flow path starts from a subtask object that has stored the object represented by α.

The third case is called a "birth cord tunnel", illustrated in Fig. 10(c), and is prevented by the cordFree function. Again suppose we have an object o that is instantiated inside T1. A reference to o in T1 can in fact be passed to sx1, and then to o3. Suppose o3 then starts up T2 by sending an asynchronous message to t1 again, passing o as the argument. The isolatedTasks definition would also not be able to detect such an invalid case. What cordFree states is that if a data flow exists between a pair of objects αs and αd inside a task (αt), and one end smuggles the object α out while the other end smuggles it in, then the flow path must not end with a task, regardless of whether the path is effectful or not. Here observe that since accesses(G, αt, αd) holds, if αd were a task, then αt and αd must be equal according to the accesses definition: the task is sending an asynchronous message to a task object that statically cannot be differentiated from its own task object, and a smuggling tunnel is possible.

Note that the tunnel detection analysis is more relaxed than a typical escape analysis. The tunnel detection analysis considers it perfectly fine for a locally instantiated object to leak out of the task boundary, as long as there are no "tunnels" for it to flow back into a task represented by the same type variable. This is sound because, if the object were to leak out of the task boundary and be accessed by a task that the type system can differentiate via type variables, such a violation would already be caught by isolatedTasks.

4. Dynamic Semantics

We now define a lock-based operational semantics for our language. Related definitions and selected reduction rules are given in Fig. 11; the rest of the rules are given in Appx. B. Small-step reductions S ⇒ S′ are defined over states S = 〈H; T〉, for H the object heap and T a set of parallel tasks (and subtasks). H is a mapping from objects o to field stores (Fd) tagged with their class name a. Program point L and substitution σ are only used to facilitate proofs. A task (or subtask) is a tuple consisting of the task (or subtask) object o that initiates it, and an expression e to be evaluated. Expressions are extended with values v and the runtime expression wait o, which blocks the current execution and waits for the completion of the subtask executing in o. For the convenience of computing closures, a field read expression f is always treated as this.f, and similarly for field writes. Evaluation contexts E are standard. Let 〈H; T〉 ⇒∗ 〈H′; T′〉 denote the multi-step reduction transitively defined over ⇒.

H   ::= −→(o ↦ 〈a; L; σ; Acc; Fd〉)
Fd  ::= −→(f ↦ v)
T   ::= 〈o; e〉 | T ∥ T′
Acc ::= −→o
v   ::= o
e   ::= ... | v | wait o | e.f | e.f:=e
o                                        object ID
E   ::= • | v.f:=E | capture E | (a)E
      | E.m(e) | v.m(E)
      | E->m(e) | v->m(E)
      | E=>m(e) | v=>m(E)

tryLock(H, o, ot) ≝ H′ = H[o ↦ 〈a; L; σ; Acc ∪ {ot}; Fd〉]
    if H(o) = 〈a; L; σ; Acc; Fd〉 and Acc ⊆ ancestors(H, ot)

method(P, H, o, m) ≝ Me
    if H(o) = 〈a; L; σ; Acc; Fd〉, µ class a {F M} ∈ P,
    Me ∈ M, and name(Me) = m

(R-Capture)
H′ = tryLock(H, o, ot)
──────────────────────────────
H, 〈ot; capture o〉 ⇒ H′, 〈ot; o〉

(R-SyncM)
H′ = tryLock(H, o, ot)    method(P, H, o, m) = a′ m(a x){e}
──────────────────────────────
H, 〈ot; o.m(v)〉 ⇒ H′, 〈ot; e{v/x}{o/this}〉

(R-AsyncM)
H(o) = 〈a; L; σ; ∅; Fd〉    H′ = H[o ↦ 〈a; L; σ; {o}; Fd〉]
method(P, H, o, m) = a′ m(a x){e}
──────────────────────────────
H, 〈ot; E[ o->m(v) ]〉 ⇒ H′, 〈ot; E[ o_unit ]〉 ∥ 〈o; e{v/x}{o/this}〉

(R-SubTask)
H(o) = 〈a; L; σ; ∅; Fd〉    H′ = H[o ↦ 〈a; L; σ; {ot}; Fd〉]
method(P, H, o, m) = a′ m(a x){e}
──────────────────────────────
H, 〈ot; E[ o=>m(v) ]〉 ⇒ H′, 〈ot; E[ wait o ]〉 ∥ 〈o; e{v/x}{o/this}〉

Figure 11. Definitions and Selected Rules for the Dynamic Semantics

Let P ⇒init 〈H; T〉 denote the first step of reduction, which leads to the initial configuration. Let P ⇒∗ 〈H; T〉 denote P ⇒init 〈H′; T′〉 and 〈H′; T′〉 ⇒∗ 〈H; T〉 for some H′, T′. We use 〈H; T〉 ⇑P to denote that the computation diverges. Reductions are implicitly parameterized by the code base P.

Each object o on the heap has a lock set Acc recording the tasks (or subtasks) that have accessed o. Lock-inducing object access can happen only in the reductions of the following four expressions: (R-Capture), (R-SyncM), (R-AsyncM), (R-SubTask). The latter two require that the task/subtask object is currently not accessed (Acc = ∅), so that serial processing of requests is enforced. (R-Capture) and (R-SyncM) depend on the pre-condition tryLock(H, o, ot), which attempts to have task/subtask object ot lock o if a subset condition, explained shortly, holds.


If successful, the Acc set is extended, recording the fact that ot currently locks the object. In fact, the combined view of the Acc sets is precisely the dynamic access graph:

Definition 1 (Dynamic Access Graph). For heap H, the dynamic access graph accG(H) is defined as a graph 〈V; E〉 where V = dom(H) and E = {o′ −dhosts→ o | H(o) = 〈a; L; σ; Acc; Fd〉, o′ ∈ Acc}.

We are able to prove the dynamic access graph is alwaysa forest.

Theorem 1 (Dynamic Task Isolation). If P ⇒∗ 〈H; T〉, then graph accG(H) is a forest of directed redundant trees (i.e., trees potentially with redundant edges). The direction is defined by −dhosts→.

For a forest with no redundant edges, every ordinary object has precisely one parent – isolation is achieved. When redundant edges appear, they signify that a subtask can access the objects accessed by its ancestors. Note that subtasking is synchronous, as we explained in Sec. 2.1, and hence this is correct (see [25] for a more detailed discussion of this issue). An orphan node with no edges in the forest denotes an ordinary object not accessed by any task, or a task/subtask object with no running task/subtask. A node representing a task object may also have a cyclic edge, meaning the object itself has been accessed (i.e., it is currently running some task).

At first glance, it might seem unintuitive that the static access graph of Sec. 2.1 is not a forest while the dynamic one is. The key issue here is that the static system is only aware that a subtask object is shared by multiple tasks; dynamically, the subtask object processes the requests from different tasks one by one, so in any snapshot of the run time, each subtask is accessed by only one task.

Knowing that the dynamic access graph is a forest, we now turn back to the reduction rules. Function ancestors(H, ot) is the standard function computing all ancestor nodes of ot in accG(H), inclusive of ot itself. Pre-condition Acc ⊆ ancestors(H, ot) intuitively means that for ot to be allowed access to o, the tasks/subtasks currently accessing the object must be ancestors of ot – the case we explained just above. When Acc = ∅, ot is trivially given access: o is currently accessed by no one. When o is currently accessed by some task and ot is a task/subtask not from the same tree, the reduction is blocked. This is consistent with our intuition: capture o and o.m(v) block if o is held by another task on a different tree, according to (R-Capture) and (R-SyncM). The rest of the formalism is relatively straightforward.
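As a concrete instance (with names from Fig. 4 and a sketched heap), when a task object t reduces s1=>process(d) via (R-SubTask), the heap entry of s1 moves from Acc = ∅ to Acc = {t}, and the task blocks on wait s1:

  H, 〈t; E[ s1=>process(d) ]〉 ⇒ H′, 〈t; E[ wait s1 ]〉 ∥ 〈s1; d.access()〉
      where H(s1) = 〈MyShared; L; σ; ∅; Fd〉 and H′(s1) = 〈MyShared; L; σ; {t}; Fd〉

When the subtask completes, its objects are unlocked (Sec. 2.2), and s1 is free to serve the next request.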

5. Formal Properties

We now establish the main properties of Task Types.

Theorem 2 (Type Soundness). If judgment ⊢global P\G holds and deadlockFree(G, P), then either S ⇑P, or S ⇒∗ 〈H′; 〈o; v〉〉, where P ⇒init S, for some H′, o, v.

This theorem states that the execution of a statically typed, deadlock-free program either diverges or computes to a value. The theorem is established via Subject Reduction and Progress lemmas, together with the fact that the bootstrapping process leads to a well-typed initial state. The theorem depends on the predicate deadlockFree, which we define now.

Definition 2 (Deadlock Freedom). deadlock free(G, P) holds iff ¬cycleCap(G) ∧ ¬cycleSubtask(G, P), where Ξ(P) = P.

Definition 3 (Cyclic Capture Chain). cycleCap(G) holds iff there exist α_host^0, …, α_host^(n−1) and α_ord^0, …, α_ord^(n−1) such that for i = 0, …, n−1, predicate accesses(G, α_host^i, α_ord^i) holds and judgment G ⊢ocl cap(α_ord^i, α_host^((i+1) mod n)) ≤ α_ordnew^i holds.

Definition 4 (Cyclic Subtask Invocation Chain). Predicate cycleSubtask(G, P) holds iff there exist distinct α and α′ such that mod(G, P, α) = mod(G, P, α′) = {subtask}, accesses(G, α, α′), accesses(G, α′, α), G ⊢ocl 〈a; L; σ〉 ≤ α, G ⊢ocl 〈a′; L′; σ′〉 ≤ α′, and σ(self) ≠ σ′(self).

These definitions constitute the standard decision procedure for statically detecting potential deadlocks: a program can deadlock if there is no partial ordering on object access. In our language, this may happen either when there is a cyclic chain of object capturing (Def. 3) or when there is a cycle of subtask invocations (Def. 4). Note that in the latter definition, two subtasks are enough to express the invocation chain since accesses itself is transitive. The comparison σ(self) ≠ σ′(self) is used to make sure self-deadlocking is not possible on subtasks.
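As a concrete illustration of Def. 4 (hypothetical code in the paper's own syntax, not from the paper), the following two subtask classes synchronously invoke each other; with no partial order on object access between them, cycleSubtask flags the program:

    subtask class Ping {
        Pong other;
        void setOther(Pong p) { other = p; }
        void hit() { other=>hit(); }  // synchronous subtask invocation into Pong
    }
    subtask class Pong {
        Ping other;
        void setOther(Ping p) { other = p; }
        void hit() { other=>hit(); }  // and Pong calls back into Ping: a cycle
    }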

Our language greatly reduces the likelihood of deadlocks. Potential deadlock can only occur with capture and subtask blocking, and those are in turn explicitly specified in the program and used only to declare sharing. A precise type system will also maximally differentiate the type variables above, reducing the chance of forming false-positive cycles in the static access graph.

If the predicate deadlock free were added as a pre-condition of the (T-Global) rule, Task Types would achieve deadlock-free programming; Autolocker [27] achieves deadlock freedom in a similar fashion. As a theoretical foundation, however, we consciously avoid this route, and allow language implementors to decide how to treat a program for which deadlock free fails: reject it, provide warnings, or insert run-time checks.

We now describe some properties related to task isolation.

Definition 5 (Task-Local Objects). Predicate local(P, L) asserts that the objects created at lexical point L are local. It holds iff ⊢global P\G, Ξ(P) = P, G ⊢ocl 〈a; L; σ〉 ≤ α, and P(a) = 〈ε; F; M; U〉 imply that there do not exist α′, α′′ such that G ⊢ocl cap(α, α′′) ≤ α′.


Definition 6 (Lock Erasure Semantics). Reduction in the presence of lock erasure, P ⇒e* S, is defined as identical to P ⇒* S, except with function tryLock redefined as tryLock(H, o, ot) ≝ H′ = H[o ↦ 〈a; L; σ; Acc ∪ {ot}; Fd〉] for H(o) = 〈a; L; σ; Acc; Fd〉 whenever local(P, L) holds, and tryLock unchanged when local(P, L) does not hold.

In the lock erasure semantics, the pre-condition Acc ⊆ ancestors(H, ot) of tryLock is removed for local objects. This implies that the lock set is never consulted for local object reductions, so a real-world implementation could simply skip the bookkeeping of this data structure.
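Continuing the earlier TryLockSketch (again purely illustrative), lock erasure amounts to skipping the ancestor check for objects whose allocation label satisfies local(P, L); here localOf is a hypothetical predicate answering that question.

    // Drop-in variant of tryLock under lock-erasure semantics (Def. 6):
    // for local objects the subset pre-condition is simply not consulted,
    // so an implementation could also skip the Acc bookkeeping entirely.
    boolean tryLockErased(String o, String ot) {
        Set<String> holders = acc.computeIfAbsent(o, k -> new HashSet<>());
        if (!localOf(o) && !ancestors(ot).containsAll(holders)) return false;
        holders.add(ot); // kept here only to mirror Definition 6's bookkeeping
        return true;
    }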

Theorem 3 (Static Task Isolation). If P ⇒e* 〈H; T〉, then graph accG(H) is a forest of directed trees, potentially with redundant edges. The direction is defined by -dhosts->.

This theorem says that even if reduction always ignores the locks on local objects, the dynamic access graph is identical to the one achieved following ⇒*, as stated in Thm. 1. In other words, isolation is achieved without the need for run-time locking. Definition local thus gives a decision procedure that a compiler can use to find local objects that do not need runtime locks.

Last, we can prove that static task isolation preserves atomicity as defined in [25].

Theorem 4 (Atomicity). If ⊢global P\G, then the atomicity property defined in [25] is preserved for all executions of P with lock erasure semantics.

We lack the space for a formal definition of the atomicity property; informally, it is the pervasive, partitioned atomicity property described in Sec. 2.2. Since our dynamic system is very similar to that of [25], where this property is formally proved, the same proof technique applies here.

6. Programming with Task Types

Task Type programming brings object sharing between tasks to the fore, and constitutes a different programming philosophy; in this section we explore this philosophy by considering some common multithreaded programming paradigms. The vast majority of objects are ordinary objects not shared across tasks, and they can be programmed as usual. Our focus here is on the more challenging case of object sharing and synchronization.

6.1 Pure Task Synchronization

Tasks often need to synchronize in ways that do not involve passing of objects. Examples include semaphores, barriers, latches, etc. All of these synchronizations obviously constitute break points in the atomicity of the involved tasks, and this translates to the fact that their implementation interface will be a subtasking invocation. Since no significant object data is passed from one task to the other, there will not be conflicts over which thread owns what object. These synchronization primitives have similar implementation strategies; here we give the simplest, a semaphore.

    // The semaphore itself
    subtask class Semaphore {
        int counter = 0;
        SemaphoreProxy newProxy() {
            sh = new SemaphoreProxy();
            sh.setSem(this);
            return(sh);
        }
        boolean p() {
            if (counter > 0) { counter = counter - 1; return(true); }
            else { return(false); }
        }
        void v() {
            counter = counter + 1;
        }
    }
    // A local proxy for the semaphore which can busy wait
    class SemaphoreProxy {
        Semaphore theSem;
        void setSem(Semaphore s) {
            theSem = s;
        }
        void p() {
            if (! theSem=>p()) { this.p(); } // busy wait
        }
        void v() {
            theSem=>v();
        }
    }
    // Test user code
    task class SemaphoreUser {
        void testSem(Semaphore s) {
            sh = s=>newProxy(); // get a local proxy
            // ... arbitrary sh.p() and sh.v() messagings ...
        }
    }
    class Main {
        void main() {
            t1 = new SemaphoreUser();
            t2 = new SemaphoreUser();
            s = new Semaphore();
            t1->testSem(s);
            t2->testSem(s);
        }
    }

Figure 12. A Simple Semaphore

A Semaphore   Semaphores can be directly implemented with Task Types; however, the implementation requires busy waiting, so a production language would likely provide semaphores as a built-in primitive. It is nonetheless useful to define a busy-waiting semaphore as a case study. Figure 12 shows how a simple semaphore can be implemented as a shared subtask class Semaphore. The testing main program creates two semaphore user tasks t1 and t2, which are launched and then repeatedly invoke semaphore P and V operations in some arbitrary order. The P operation on a semaphore may sometimes (correctly) block, but there would be a problem if method p() on the subtask Semaphore object blocked: subtasks can process only one message at a time, so a blocking p() would unintentionally starve out any incoming v() and the system would deadlock. For this reason, each task asks for a local proxy, a SemaphoreProxy, which it interacts with, and which can poll the actual Semaphore p() method, which does not block. Since access to subtask objects is mutually exclusive, there will be no race conditions on the semaphore operations.

The sharing of s between tasks t1 and t2 will typecheck because class Semaphore was declared as a subtask class and can be shared freely. Note that each p()/v() invocation on theSem defines a break point in the otherwise complete atomicity of t1/t2. The proxies sh local to each task can be polymorphically typed and instantiated uniquely for t1 and t2, meaning the type system is smart enough to see two versions of sh's type, one for each task, rather than just one. If there were only one sh type, the two tasks could not share it since only one task can be the owner, and there would be a type error. This shows the power of the polymorphic type system.
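Other pure synchronizers follow the same proxy recipe. As an illustration (hypothetical code in the style of Figure 12, not from the paper), a countdown latch keeps its blocking await in a task-local proxy so the subtask object itself never blocks:

    subtask class Latch {
        int count;
        void setCount(int n) { count = n; }
        void countDown() { if (count > 0) { count = count - 1; } }
        boolean isOpen() { return(count == 0); }
    }
    class LatchProxy {
        Latch theLatch;
        void setLatch(Latch l) { theLatch = l; }
        void await() {
            if (! theLatch=>isOpen()) { this.await(); } // busy wait, as in Figure 12
        }
    }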

6.2 Synchronization with Data Sharing

The default in Task Types is that no sharing of objects between tasks occurs; every such sharing is explicitly declared, and arises at a synchronization point or subtask where objects are passed out or in. Object sharing thus comes to the forefront in Task Type programming: since tasks cannot share data by default, most of the "new thinking" needed involves planning how objects will be shared in the cases where sharing is needed.

Before studying a particular example, let us analyze the different cases of object sharing. The following are all possible interface points where objects can pass from one task to another:

1. objects passed as arguments when a new task is started;

2. objects passed as arguments to subtasks; and,

3. objects returned by subtask invocations.

If in a program no objects are passed by any of these three mechanisms, then there can be no object sharing between tasks; in that sense these three cases define a complete interface for object sharing between tasks.
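In the concrete syntax of this paper, the three interface points look as follows (an illustrative snippet; t, s, o and the method names are hypothetical):

    t->start(o);        // 1. o passed as an argument when task t is started
    s=>process(o);      // 2. o passed as an argument to a subtask invocation
    o = s=>produce();   // 3. o returned by a subtask invocation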

Objects passed at one of the three interface points above can themselves be of several different forms. Here is a list of all the interesting cases, including several that have not been defined in the system of this paper:

1. task and subtask objects, which can be freely shared since they cannot be owned by tasks;

2. captured objects, i.e. objects which the receiver captures to its own task, and which will be successfully transferred to the receiving task as long as the original owning task has completed;

3. black-box objects, which the receiving task will not be able to touch since they are considered owned by a different task;

4. transferred objects (not in this paper), the case where the originating task has not yet terminated but has released the object and will no longer be able to access it;

5. immutable objects (not in this paper), which can be shared freely; and,

6. objects passed by copy (not in this paper), another case where free sharing is supported.

For all but case 1 above there is in addition a "depth" dimension to the sharing: the fields of the shared objects themselves each fall into one of the six cases above. For example, an object could be immutable at the top level but contain one field that is a subtask object and another field that is a captured object, etc.

For objects being shared, programmers need to decide which modality is the most appropriate. This work is rewarded by a program with a formally declared sharing policy which is fully enforced by the language, as opposed to the current practice of an informal policy which is spottily enforced by the language design.

Object Sharing in MapReduce   Here we look at a common type of multi-core algorithm, where a large data structure is partitioned between tasks by an "embarrassingly parallel" algorithm. One such programming pattern is MapReduce [14], which is similar to the "join continuation" of Actors [3]. Here we develop a simplified MapReduce which captures much of the behavior in a small amount of code; the code is given in Figure 13. Each WorkUnit object encapsulates data and a unit of work that needs to be done on the data, performed by a WorkUnit.work() call which in this simplified example just returns a single int result. We elide the details of the work units here, but an example of a work unit could be a single chromosome of DNA plus a pattern to search for in the chromosome; the result is the number of distinct occurrences of that pattern. The total work is then the set of all 22 chromosomes and a pattern to search for. Each Mapper task loads in a unit of work, performs the work, and passes the result to a Reducer subtask, which combines the numerical results via a commutative and associative operator combine. This example focuses on the shared objects; in practice there will be many objects within WorkUnits holding the actual data and doing the computation on that data, and since those objects are not shared there is zero programmer burden at that point: only shared objects require programmer overhead.

The object sharing between tasks and subtasks is defined via Task Types. The subtask class UnitLoader loads the data of one work unit from somewhere; e.g., it may read it from a file. This activity takes place in a subtask because each Mapper needs to take its turn loading data, and subtasks provide mutual exclusion. Observe that if main() were to read in the work units directly it would own all of them, and the individual mapper tasks would then not be able to access them until main() completed.


    task class Mapper {
        void map(UnitLoader ul, Reducer r) {
            wu = ul=>loadWorkUnit();
            int result = wu.work();
            r=>reduce(result);
        }
    }
    subtask class Reducer {
        int current = 0;
        int size;
        void setSize(int s) {
            size = s;
        }
        void reduce(int r) {
            current = this.combine(current, r);
            size = size - 1;
            if (size == 0) { ... output final result ... };
        }
        int combine(int i, int j) {
            ... combine i and j, e.g. add them ...
        }
    }
    subtask class UnitLoader {
        WorkUnit loadWorkUnit() {
            ... load in a new WorkUnit() ...
        }
    }
    class WorkUnit {
        ... declared local data here ...
        int work() { ... code which does work on data ... }
    }
    task class Main {
        void main() {
            reducer = new Reducer();
            ul = new UnitLoader();
            reducer=>setSize(this.getNumUnits());
            for (i = 1; i <= this.getNumUnits(); i++) do {
                mapper = new Mapper();
                mapper->map(ul, reducer);
            }
        }
        int getNumUnits() {
            ... returns the number of work units ...
        }
    }

Figure 13. A Simplified Map-Reduce Example

And even at that point, a capture expression would be needed in the mapper to dynamically transfer ownership from the Main task to the Mapper task. In this example, the Mapper map method can freely access its work unit wu; it was created by the UnitLoader, but the only "root task" with access to it is the particular mapper task itself, so the mapper is the cut vertex and the objects are appropriately partitioned. Also observe by inspection that there are no tunnels whereby wu could sneak out into some other mapper task: this is trivial because the wu objects are not passed anywhere except from the UnitLoader to the appropriate mapper, and the loader does not store them in a field. The reducer, ul, and all the mappers are either tasks or subtasks and so can be shared freely.

This example shows how programmers need to pay close attention to object sharing when programming with Task Types. The benefit is a guarantee of many fewer interleaving scenarios and a significantly eased debugging burden, resulting in faster deployment of more reliable software.

7. Related Work

Type Systems and Static Analyses   STM systems primarily use dynamic means to enforce atomicity [21; 38; 8]. The problem of weak atomicity has attracted significant interest recently, and a number of static or hybrid solutions exist to ensure that transactional code and non-transactional code do not interfere with each other. Such a property is enforced by Shpeisman et al. [32] via a hybrid approach with a dynamic escape analysis and a not-accessed-in-transaction static analysis. The latter has the good property of allowing "data handoff": a transaction can pass along an object without accessing it. Data handoff is useful when different threads share a data structure (say a queue) but not the elements in it. This property also holds for Task Types: passing along an object reference or storing the reference is not considered access, since atomicity is not violated. AME [1] describes a conceptually static procedure to guarantee violation-freedom of transactional and non-transactional code. Harris et al. [22] use Haskell monads to separate computations with effects from those without.

Constructing type systems to assure race condition freedom is a well-explored topic. These systems work under significantly different assumptions than we do: they typically assume a Java-like shared memory with explicit lock acquire/release expressions. Given the non-shared-memory assumption and protected access to subtask objects, race conditions do not occur in our system. On a technical level, some of these race-free type systems share properties with Task Types. For instance, RaceFreeJava [18] allows programmers to declare certain classes to be thread-local so that their instances can be excluded from consideration by the main algorithm. Some techniques are designed to remove unnecessary locks for synchronized blocks [4; 31]. These systems can be viewed as precursors to lock inference for atomic sections [27; 17; 9; 23], which we reviewed in Sec. 1.

There are many static analysis algorithms for tracking the flow of objects. Closest to our work are several thread escape analyses [15; 40; 6] for object-oriented languages, which use reachability graphs to prevent or track alias escape from threads. Task Types share a focus on thread locality with this work, but differ in two important aspects: (1) the pervasive atomicity of our language requires subtask-style sharing, a "partial escape" that should be allowed but has no representation in these analyses; (2) fundamentally, object references do not need to be confined to guarantee atomicity: it is perfectly fine for a task to create an object and pass it over to another task, which in turn stores it or passes it further to a third task. The key to atomicity is that there is no conflicting object access. The effect of these differences is to produce unique issues that escape analyses do not encounter and that we need to solve here. Two particular examples are our cut-vertex definition of object isolation and our tunnel detection methodology.

Language Designs for Atomicity   The shortcomings of explicitly declared atomic blocks are summarized in [35]. In that work, a data-centric approach is taken: the fields of an object are partitioned into statically labelled atomic sets, and access to fields in the same set is guaranteed to be atomic, analogous to declaring different subtasks for different atomic sets in Task Types. Their data-centric approach is perhaps one step forward compared with atomic blocks, but the design philosophy is still no-atomicity-unless-you-declare-it, and hence fundamentally different from our notion of pervasive atomicity. Atomic sets are particularly well suited for atomicity within one object; for atomicity spanning multiple objects, a keyword unitfor is provided to compose atomic sets, but this increases annotation overhead and violates a property of atomic blocks that we believe is intuitive: "atomicity is deep", i.e. all objects directly and indirectly invoked from an atomic block should be protected. In our work, task atomicity is deep by default.

A recent work closer to our spirit of pervasive atomicity is AME [24; 1]. The language constructs of AME are along the lines of Actors, where an async e expression starts up an atomicity-preserving thread. AME differs from our work in its support of an expression unprotected e, meaning atomicity is not preserved for e. Because code inside unprotected blocks interleaves with the default protected code, strong atomicity cannot be achieved when the condition of violation freedom is not satisfied. AME does not support synchronous messaging, and does not overlap with the static type system aspect of our work.

In Sec. 1, we explained how the Actor model and actor-based message-passing languages such as Erlang [5] can achieve pervasive atomicity. Kilim [33] is a more recent actor-like language, with a focus on providing refined message-passing mechanisms without sacrificing the isolation property of Actors. The Kilim type system relies on extra programmer declarations called isolation modifiers to denote how each parameter can be passed/used in the inter-actor context. Kilim and Task Types have the same focus on object isolation, but in orthogonal and complementary design spaces: Kilim on message passing, and Task Types on atomicity.

8. Towards a Realistic Language

This language proposal represents a fairly significant departure from the norm, and it raises many issues which we had to leave out of this initial design to keep it manageable. In this section we sketch some of the other features that may be needed in a full implementation. More programming experience will be needed to determine which features below are critically needed and which are not.

Type System Extensions   Perhaps the most pressing area is to further increase the accuracy of Task Types, so that more programs that in fact do not violate atomicity properties are typable. Some obvious extensions include the following. Immutable objects can be freely shared between tasks without violating atomicity, so a mechanism for creating immutable objects is needed; this is not a difficult problem. Similarly, call-by-copy is a variation on immutability which also allows object data to be passed from one task to another without introducing any atomicity-violating back-channels of information flow. Both of these declarations are useful in a "shallow" form in which only the top-level object is copied/immutable, but deep versions may also be desirable, to allow a large immutable object structure to be shared or a large object structure to be copied. In this paper, we have considered the most general case where every program point is recursive. A simple algorithm can be designed to label non-recursive program points (such as the main method), so that tunnel detection there would be unnecessary.
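The following is purely illustrative surface syntax for the immutability and copy extensions just discussed; none of these keywords exist in the formal system of this paper.

    immutable class Pattern { ... }       // shallow: only the top-level object is frozen
    deep immutable class Genome { ... }   // deep: the entire reachable structure is frozen
    copy class Coordinates { ... }        // instances passed between tasks by copy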

As a conceptual model, Task Types define object access to occur at method invocation time. Of course, what really affects atomicity is when the object fields are read or written. This finer-grained view can be supported by collecting the -hosts-> constraints at (T-Read) and (T-Write), rather than at (T-SyncM). Non-exclusive read / exclusive write locks can also easily be added.
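For illustration only, a field-read rule that collects the hosts constraint at the access itself might take the following shape, in the style of the auxiliary rules of Fig. 19; this is an assumption about how the refinement could look, not a rule of this paper.

    (T-Read′)
        Γ, P ⊢ e : a0@α0\Σ0    F(f) = ∀α. a@α
        -----------------------------------------------------------
        Γ, P ⊢ e.f : a@α\Σ0 ∪ {〈self; f〉 ≤ α, self -hosts-> α0}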

A less trivial extension is support for object transfer. Currently, tasks hold on to objects even if they are in fact finished with them. In many cases it is statically determinable that once a task passes an ordinary object to another task it will no longer access it; in this case there is no atomicity violation even though the object is shared: its owner was simply changed from one task to another. In cases where it cannot be statically determined that the host task is finished, it is still possible to dynamically transfer an object via an explicit declaration: the receiving task becomes the new owner, and the originating task will be forced to raise an exception if it touches the object. Implementing such a transfer is easily accomplished by the compiler using a proxy pattern on the host side.
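A minimal sketch of that host-side proxy pattern in plain Java follows; the class, interface, and method names here are illustrative, not the compiler's actual generated code.

    // Sketch of a host-side proxy for dynamic object transfer, assuming
    // compiler-generated proxies that guard every access after transfer.
    class TransferProxy {
        interface Target { int work(); }  // stand-in for the transferred object's interface

        private Target target;            // the real object, until transferred away
        TransferProxy(Target t) { target = t; }

        // Called at the transfer point: the receiving task becomes the new owner.
        synchronized Target transfer() {
            Target t = target;
            target = null;                // host-side accesses now fail
            return t;
        }

        // Every method the host forwards through the proxy checks ownership first.
        synchronized int work() {
            if (target == null)
                throw new IllegalStateException("object was transferred to another task");
            return target.work();
        }
    }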

Language Construct Extensions   Programmers may wish to force an ordinary object invocation, say e.m(e′), to be strictly atomic, i.e., with no subtasking appearing inside the body of m or the methods directly or indirectly invoked by that method. Currently there is no direct way to achieve this, but we could easily extend our language with an expression, say e..m(e′), to indicate such strict atomicity. Since the type constraint set contains a conservative approximation of all methods m may invoke, and since the effect of dynamic dispatch can be mitigated with the precise concrete class type information for each object, it is easy to statically verify this property.
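For instance (an illustrative snippet; the names are hypothetical, and e..m(e′) is the proposal just described, not part of the formal system):

    // Would typecheck only if no subtasking can occur anywhere in the
    // conservatively approximated call graph of process():
    result = unit..process(input);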

There are also different forms of object capture that could be considered beyond the block-until-freed modality supported now. Both a non-blocking capture, which raises an exception when an object is busy, and a capture with timeout, which will not block forever, are potential additions. It may similarly be useful to support these variations for subtask message blocking when the subtask object is currently busy: either raise an exception immediately, or support a timeout which could raise an exception later if the wait is too long. Supporting these variations can further help reduce deadlocks.

There are several language features not addressed here, including exceptions, native methods, interface types, packages, etc. As alluded to in the previous section, a production language would also need to include non-busy-waiting implementations of semaphores, synchronization barriers, latches, etc.

Towards an Efficient and Robust Implementation   Effective implementations of context-sensitive/polymorphic inference algorithms have been investigated extensively [28; 16; 2; 36; 39]. We are currently implementing Task Types on top of the CoqaJava compiler [25], which is in turn built on top of Polyglot [30]. The ongoing implementation of polymorphic inference is based on Wang and Smith's work [36], and we are also actively looking at various optimizations such as BDDs [39]. For full-fledged deadlock avoidance, one strategy we are interested in pursuing is a Discrete Control Theory-based model [37], which reorganizes locks with maximum permissibility of parallelism.

9. Conclusion

We have presented a new approach to achieving atomicity in multithreaded object-oriented programs, taking a top-down, mostly static approach as opposed to the usual bottom-up and usually dynamic one. While our language syntax requires only minor changes to a standard object model, the approach to data sharing between threads is significantly different from the norm. A consequence of the model is that object sharing between tasks is brought front and center for the programmer, where it should be.

Atomicity with Task Types is top-down in the sense that it is pervasive: we slice the whole into atomic parts instead of building islands of atomic blocks. Our notion of sharing between tasks is principled, via subtasking or capturing, both of which still preserve atomicity properties: strong atomicity is achieved for all code regardless of sharing.

Task Types incorporate a powerful inference algorithm with a precise polymorphic/context-sensitive analysis which statically verifies that ordinary objects are appropriately partitioned between tasks; Task Type programming thus requires minimal annotation overhead. Since the partitioning is verified statically, there is no need for dynamic partitioning of objects between tasks, meaning there is no additional runtime overhead. The one exception is when the type system is too weak: in this case we support a cast-like capture syntax which dynamically assigns an object to a particular task. We still need to use locks on captured objects since there may be real run-time contention; even though we use locks here, our approach is largely neutral as to whether locks or STMs are used. Type soundness and isolation enforcement are formally proved. The "partial sharing" enabled by subtasks adds a significant subtlety to correctness.

A Technical Report containing the proofs may be found online [26].

References

[1] Martín Abadi, Andrew Birrell, Tim Harris, and Michael Isard. Semantics of transactional memory and automatic mutual exclusion. In POPL '08, pages 63–74, 2008.

[2] Ole Agesen. Concrete Type Inference: Delivering Object-Oriented Applications. PhD thesis, Stanford University, Stanford, CA, USA, 1996.

[3] Gul Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, Mass., 1990.

[4] Jonathan Aldrich, Craig Chambers, Emin Gun Sirer, and Susan Eggers. Static analyses for eliminating unnecessary synchronization from Java programs. In SAS '99, 1999.

[5] J. Armstrong. Erlang: a survey of the language and its industrial applications. In INAP '96, the 9th Exhibitions and Symposium on Industrial Applications of Prolog, pages 16–18, Hino, Tokyo, Japan, 1996.

[6] Bruno Blanchet. Escape analysis for object-oriented languages: application to Java. SIGPLAN Not., 34(10):20–34, 1999.

[7] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for safe programming: preventing data races and deadlocks. In OOPSLA '02, pages 211–230, 2002.

[8] B. Carlstrom, A. McDonald, H. Chafi, J. Chung, C. Minh, C. Kozyrakis, and K. Olukotun. The Atomos transactional programming language. In PLDI '06, June 2006.

[9] Sigmund Cherem, Trishul Chilimbi, and Sumit Gulwani. Inferring locks for atomic sections. In PLDI '08, pages 304–315, 2008.

[10] Wei-Ngan Chin, Florin Craciun, Shengchao Qin, and Martin Rinard. Region inference for an object-oriented language. In PLDI '04, pages 243–254, 2004.

[11] Dave Clarke. Object Ownership and Containment. PhD thesis, University of New South Wales, July 2001.

[12] David G. Clarke, John M. Potter, and James Noble. Ownership types for flexible alias protection. In OOPSLA '98, pages 48–64. ACM Press, 1998.

[13] Cray Inc. Chapel Specification, 2005.

[14] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI '04, 2004.

[15] Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreedhar, and Sam Midkiff. Escape analysis for Java. In OOPSLA '99, pages 1–19, 1999.

[16] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In PLDI '94, pages 242–256, 1994.

[17] Michael Emmi, Jeffrey S. Fischer, Ranjit Jhala, and Rupak Majumdar. Lock allocation. In POPL '07, pages 291–296, 2007.

[18] Cormac Flanagan and Stephen N. Freund. Type-based race detection for Java. In PLDI '00, pages 219–232. ACM Press, 2000.

[19] Cormac Flanagan and Shaz Qadeer. A type and effect system for atomicity. In PLDI '03, pages 338–349, 2003.

[20] Dan Grossman. Type-safe multithreading in Cyclone. In TLDI '03, pages 13–25, 2003.

[21] Tim Harris and Keir Fraser. Language support for lightweight transactions. In OOPSLA '03, pages 388–402, 2003.

[22] Tim Harris, Simon Marlow, Simon Peyton-Jones, and Maurice Herlihy. Composable memory transactions. In PPoPP '05, pages 48–60, 2005.

[23] Michael Hicks, Jeffrey S. Foster, and Polyvios Pratikakis. Lock inference for atomic sections. In TRANSACT '06, June 2006.

[24] Michael Isard and Andrew Birrell. Automatic mutual exclusion. In HOTOS '07: Proceedings of the 11th USENIX Workshop on Hot Topics in Operating Systems, pages 1–6, Berkeley, CA, USA, 2007. USENIX Association.

[25] Yu David Liu, Xiaoqi Lu, and Scott F. Smith. Coqa: Concurrent objects with quantized atomicity. In CC '08, March 2008.

[26] Yu David Liu and Scott F. Smith. Task Types for Pervasive Atomicity (Long Version). Technical report, March 2009. http://www.cs.binghamton.edu/~davidl/tasktypes

[27] Bill McCloskey, Feng Zhou, David Gay, and Eric Brewer. Autolocker: synchronization inference for atomic sections. In POPL '06, pages 346–358, 2006.

[28] Ana Milanova, Atanas Rountev, and Barbara G. Ryder. Parameterized object sensitivity for points-to analysis for Java. ACM Trans. Softw. Eng. Methodol., 14(1):1–41, 2005.

[29] Yang Ni, Vijay Menon, Ali-Reza Adl-Tabatabai, Antony L. Hosking, Richard L. Hudson, J. Eliot B. Moss, Bratin Saha, and Tatiana Shpeisman. Open nesting in software transactional memory. In PPoPP '07, March 2007.

[30] Nathaniel Nystrom, Michael R. Clarkson, and Andrew C. Myers. Polyglot: An extensible compiler framework for Java. In CC '03, volume 2622, pages 138–152, NY, April 2003. Springer-Verlag.

[31] Erik Ruf. Effective synchronization removal for Java. In PLDI '00, pages 208–218, 2000.

[32] Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Steven Balensiefer, Dan Grossman, Richard L. Hudson, Katherine F. Moore, and Bratin Saha. Enforcing isolation and ordering in STM. In PLDI '07, pages 78–88, 2007.

[33] Sriram Srinivasan and Alan Mycroft. Kilim: Isolation-typed actors for Java. In ECOOP '08, 2008.

[34] Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Information and Computation, 1997.

[35] Mandana Vaziri, Frank Tip, and Julian Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL '06, pages 334–345, 2006.

[36] Tiejun Wang and Scott F. Smith. Precise constraint-based type inference for Java. In ECOOP '01, pages 99–117, 2001.

[37] Yin Wang, Stephane Lafortune, Terence Kelly, Manjunath Kudlur, and Scott Mahlke. The theory of deadlock avoidance via discrete control. In POPL '09, pages 252–263, 2009.

[38] Adam Welc, Suresh Jagannathan, and Antony L. Hosking. Transactional monitors for concurrent objects. In ECOOP '04, pages 519–542, 2004.

[39] John Whaley and Monica S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In PLDI '04, pages 131–144, 2004.

[40] John Whaley and Martin Rinard. Compositional pointer and escape analysis for Java programs. In OOPSLA '99, pages 187–206, 1999.

A. The Ξ Function

    Ξ(P) ≝ [a1 ↦ Ξclass(a1, P), …, an ↦ Ξclass(an, P)]
        if P = Cls1 … Clsn and name(Clsi) = ai for i = 1, …, n

    Ξclass(a, P) ≝ 〈µ; F′ ⊎ F; M′ ⊕ M; {a} ∪ U′〉
        if µ class a extends a′ {F M} ∈ P
           Ξclass(a′, P) = 〈µ′; F′; M′; U′〉
           Ξfl(F, P) = F    Ξml(M, P) = M
           overrideOK(M′, M)
           µ = µ′ unless µ′ = aobject

    Ξclass(aobject, P) ≝ 〈ε; ∅; ∅; {aobject}〉

    Ξml(M, P) ≝ [m ↦ ∀α, α′. (a@α → a′@α′) | a m(a′ x){e} ∈ M]    (α, α′ fresh for each m)

    Ξfl(F, P) ≝ [f ↦ ∀α. a@α | a f = e ∈ F]    (α fresh for each f)

Figure 14. Ξ(P) = P: Computing Signatures

Fig. 14 defines the Ξ function. overrideOK(M1, M2) is defined as follows: for any m such that M1(m) = ∀α1, α′1. (a1@α1 → a′1@α′1) and M2(m) = ∀α2, α′2. (a2@α2 → a′2@α′2), it holds that a1 = a2 and a′1 = a′2. We also do not consider shadowing of fields on the hierarchy (see the ⊎ operator in the definition). The two restrictions are consistent with Featherweight Java. Furthermore, we require that all class modifiers (such as task) are the same for all classes on the same inheritance hierarchy, unless the class in question is the special root aobject.

B. Additional Reduction Rules

All reduction rules other than those in the main text are presented in Fig. 15, except the standard rule for evaluation context reduction and for reducing e; e′. droots(H) is defined as the set of roots (represented by object IDs) in forest accG(H). Function removeLocks(H, o) is a reductive operation on the forest accG(H), removing all edges (in-edges and out-edges) for node o. Anything related to σ in the rules is used purely to facilitate the proof.

    (R-New)
        Ξ(P) = P    P(a) = 〈µ; F; M; U〉    o fresh
        H′ = H[o ↦ 〈a; L; σ; ∅; ∅〉]
        {α1, …, αn} = {α | F(f) = ∀α. a@α}    α′, α′1, …, α′n fresh
        σ = [self ↦ α′, α1 ↦ α′1, …, αn ↦ α′n]
        µ class a {F M} ∈ P    F = a1 f1 = e1 … an fn = en
        --------------------------------------------------------------
        H, 〈ot; newL a〉 ⇒ H′, 〈ot; (o.f1:=e1; …; o.fn:=en; o)〉

    (R-Write)
        H(o) = 〈a; L; σ; Acc; Fd〉
        H′ = H[o ↦ 〈a; L; σ; Acc; Fd[f ↦ v]〉]
        --------------------------------------------------------------
        H, 〈ot; o.f:=v〉 ⇒ H′, 〈ot; v〉

    (R-Read)
        H(o) = 〈a; L; σ; Acc; Fd〉    Fd(f) = v
        --------------------------------------------------------------
        H, 〈ot; o.f〉 ⇒ H, 〈ot; v〉

    (R-Cast)
        H(o) = 〈a; L; σ; Acc; Fd〉
        Ξ(P) = P    P(a) = 〈µ; F; M; U〉    a′ ∈ U
        --------------------------------------------------------------
        H, 〈ot; (a′)o〉 ⇒ H, 〈ot; o〉

    (R-TaskEnd)
        ot ∈ droots(H)
        --------------------------------------------------------------
        H, 〈ot; v〉 ‖ T ⇒ removeLocks(H, ot), T

    (R-SubtaskEnd)
        accG(H) = 〈V; E〉    (o′t -dhosts-> ot) ∈ E
        --------------------------------------------------------------
        H, 〈ot; v〉 ‖ 〈o′t; E[ wait ot ]〉 ⇒ removeLocks(H, ot), 〈o′t; E[ v ]〉

    (R-Commute)
        H, T ‖ T′ ⇒ H, T′ ‖ T

    (R-Compose)
        H, T ⇒ H′, T′
        --------------------------------------------------------------
        H, T ‖ T′′ ⇒ H′, T′ ‖ T′′

    (R-Init)
        task class amain {F M} ∈ P    aunit mmain(aunit x){e} ∈ M
        H = omain ↦ 〈amain; Lmain; [self ↦ αmain]; {omain}; ∅〉
        H′ = ounit ↦ 〈aunit; Lunit; ∅; ∅; ∅〉
        --------------------------------------------------------------
        P ⇒init 〈H ⊎ H′; 〈omain; e〉〉

Figure 15. Additional Operational Semantics Rules


    ancestors(H, o) ≝ {o} ∪ anch(H, o)
    anch(H, o) ≝ ∅                          if accG(H) = 〈V; E〉 and (o′ -dhosts-> o) ∉ E for all o′
    anch(H, o) ≝ {o′} ∪ anch(H, o′)         if accG(H) = 〈V; E〉 and (o′ -dhosts-> o) ∈ E

    removeLocks(H, o) ≝ H′[o ↦ 〈a; L; σ; ∅; Fd〉]
        if H(o) = 〈a; L; σ; Acc; Fd〉
        and H′ = ⊎_{H(o0) = 〈a0; L0; σ0; Acc0; Fd0〉} (o0 ↦ 〈a0; L0; σ0; Acc0 − {o}; Fd0〉)

    (R-Context)
        H, 〈ot; e〉 ⇒ H′, 〈ot; e′〉
        -----------------------------------
        H, 〈ot; E[ e ]〉 ⇒ H′, 〈ot; E[ e′ ]〉

Figure 16. Auxiliary Definitions for Reduction Rules

    S ::= 〈H; T〉                              configuration signature
    H ::= [o ↦ 〈a; L; σ; Acc〉, …]             heap signature
    T ::= o, …                                 task signature
    γ ::= · · · | iconf ↦ S                    extended environment element

    structure(〈H; T〉) ≝ 〈structure(H); structure(T)〉
    structure(H) ≝ H    if dom(H) = dom(H) = {o1, …, on}
                        and ∀i ∈ {1, …, n}. structure(H(oi)) = H(oi)
    structure(〈a; L; σ; Acc; Fd〉) ≝ 〈a; L; σ; Acc〉
    structure(T ‖ T′) ≝ structure(T) ∪ structure(T′)
    structure(〈ot; e〉) ≝ {ot}

Figure 17. Auxiliary Definitions for Proof

    (WFT-Config)
        T ⊢wftH H    H ⊢wftT T
        -------------------------
        P ⊢wft 〈H; T〉

    (WFT-Heap)
        ∀o ∈ dom(H). H(o) = 〈a; L; σ; Acc〉 and Acc ⊆ {ot | 〈ot; o〉 in T}
        ∀〈µ; F; M; U〉 ∈ range(P). FTV(M) ∩ dom(σ) = ∅
        -------------------------
        P, T ⊢wftH H

    (WFT-TaskPool)
        ∀o in T. H(o) = 〈a; L; σ; Acc〉 and o ∈ Acc
        -------------------------
        H ⊢wftT T

Figure 18. Well-Formedness Definitions


    (T-Config)
        ⊢global P\G    Ξ(P) = P    structure(S) = S    P ⊢wft S
        S, P ⊢h H\Σh    S, P ⊢tp T\Σtp    accG(H) is a forest
        --------------------------------------------------------------
        P ⊢conf 〈H; T〉\(Σh ∪ Σtp)

    (T-Heap)
        dom(H) = dom(H) = {o1, …, on}    ∀i ∈ {1, …, n}. S, P ⊢hc H(oi)\Σi
        --------------------------------------------------------------
        S, P ⊢h H\(Σ1 ∪ … ∪ Σn)

    (T-HeapCell)
        P(a) = 〈µ; F; M; U〉    dom(Fd) = {f1, …, fn}
        [iconf : S, myC : a], P ⊢ Fd(fi) : ai@α′i\∅
        F(fi) = ∀αi. ai@αi    σ(αi) = α′i    dom(σ) ∩ range(σ) = ∅    σ(self) = α
        --------------------------------------------------------------
        S, P ⊢hc 〈a; L; σ; Acc; Fd〉\{〈a; L; σ〉 ≤ α}

    (T-TaskPool)
        S, P ⊢tp T\Σ    S, P ⊢tp T′\Σ′
        --------------------------------------------------------------
        S, P ⊢tp T ‖ T′\Σ ∪ Σ′

    (T-Task)
        [iconf : S], P ⊢ e : τ\Σ    S = 〈H; T〉    H(o) = 〈a; L; σ; Acc〉
        --------------------------------------------------------------
        S, P ⊢tp 〈o; e〉\Σ[σ]

    (T-RID)
        Γ(iconf) = 〈H; T〉    H(o) = 〈a; L; σ; Acc〉    α = σ(self)
        --------------------------------------------------------------
        Γ, P ⊢ o : (a@α)\∅

    (T-AuxRead)
        Γ, P ⊢ e : (a0@α0)\Σ0    P(a0) = 〈µ; F; M; U〉    F(f) = ∀α. a@α
        --------------------------------------------------------------
        Γ, P ⊢ e.f : a@α\Σ0 ∪ {〈self; f〉 ≤ α}

    (T-AuxWrite)
        Γ, P ⊢ e : (a0@α0)\Σ0    P(a0) = 〈µ; F; M; U〉
        Γ, P ⊢ e′ : a′@α′\Σ    F(f) = ∀α. a@α
        --------------------------------------------------------------
        Γ, P ⊢ e.f:=e′ : a′@α′\Σ0 ∪ Σ ∪ {α′ ≤ α}

    (T-Wait)
        Γ, P ⊢ e : (a@α)\Σ
        --------------------------------------------------------------
        Γ, P ⊢ wait e : unit\Σ

Figure 19. Auxiliary Typing Rules

C. Proof

C.1 Auxiliary Definitions for Reduction Rules

Fig. 16 gives rigorous definitions that are used by the reduction rules but were omitted from the main text due to the page limit.

C.2 Auxiliary Definitions for Proof

Fig. 17 gives definitions of the data structures used to type dynamic data structures, along with some auxiliary functions; they are used solely in the proof. In addition, droot(H, o) is defined as the root (represented by an object ID) of the tree containing o in forest accG(H). If o is an isolated node, the function returns o itself.

C.3 Auxiliary Typing Rules

Fig. 18 and Fig. 19 define auxiliary rules used in the proof.


C.4 Properties About Values

Lemma 1 (Empty Constraints). If Γ, P ⊢ v : τ\Σ, then Σ = ∅.

Proof. Induction on the type derivation and case analysis on the last step. There are only two cases, (T-RID) and (T-Sub).

Lemma 2 (Unique Type Variable for Values). If Γ, P ⊢ v : a1@α1\Σ1 and Γ, P ⊢ v : a2@α2\Σ2, then α1 = α2.

Proof. Case analysis on v; the only possibilities for the last step leading to the two derivations are (T-RID) and (T-Sub).

Lemma 3 (Value Substitution). For any Γ such that

    Γ, P ⊢ v : τ′\Σ    (3.1)
    [myC ↦ a, myM ↦ m, x ↦ τ′], P ⊢ e : τ\Σ′    (3.2)

then Γ ⊕ [myC ↦ a, myM ↦ m], P ⊢ e{v/x} : τ\Σ ∪ Σ′.

Proof. Standard substitution lemma.

C.5 Properties About σ

Lemma 4 (Context-Sensitive Signature Type Refreshment). If

    Γ ⊕ [x ↦ ad@αd], P ⊢ e : ar@αr\Σ    (4.1)
    α′d ∉ FTV(Σ)    (4.2)
    α′r ∉ FTV(Σ)    (4.3)

then Γ ⊕ [x ↦ ad@α′d], P ⊢ e : ar@α′r\Σ[σ], where σ = [αd ↦ α′d, αr ↦ α′r].

Proof. Induction on the judgment and case analysis on the last step of the derivation. The only interesting rule is (T-Var), as this is where the type variable is refreshed.

Lemma 5 (Type Variable Substitution over Subsetting). If Σ1 ⊆ Σ2 and range(σ) ∩ FTV(Σ2) = ∅, then Σ1[σ] ⊆ Σ2[σ].

Proof. Basic property of substitution.

Lemma 6 (Context-Sensitive Substitution). If

    ⊢global P\G    (6.1)
    Ξ(P) = P    (6.2)
    µ class a {F M} ∈ P    (6.3)
    t m(t x){e} ∈ M    (6.4)
    P(a) = 〈µ; F; M; U〉    (6.5)
    M(m) = ad@αd ↦ ar@αr    (6.6)
    Γ, P ⊢ v : ad@αd0\Σv    (6.7)
    α′d, α′r fresh    (6.8)

then Γ ⊕ [myC : a, myM : m], P ⊢ e{v/x} : ar@αr0\Σ ∪ Σv, where Σ = G(〈a; m〉)[αd ↦ α′d, αr ↦ α′r] ∪ {αd0 ≤ α′d, α′r ≤ αr0}.

Proof. Given (6.1), (6.2), (6.3), (6.4), (6.5), (6.6), (T-Global), and (T-Class), we know

    [myC ↦ a, myM ↦ m, x ↦ ad@αd], P ⊢ e : ar@αr\Σm    (6.9)
    Σm = G(〈a; m〉)    (6.10)

According to Lem. 4, (6.9), and (6.8), we know

    [myC ↦ a, myM ↦ m, x ↦ ad@α′d], P ⊢ e : ar@α′r\Σ′m    (6.11)
    Σ′m = Σm[αd ↦ α′d, αr ↦ α′r]    (6.12)

According to Lem. 3, (6.7), and (6.11), we know

    Γ ⊕ [myC ↦ a, myM ↦ m], P ⊢ e{v/x} : ar@α′r\Σ′m ∪ Σv    (6.13)

Thus let Σ = Σ′m. According to Lem. 5, (6.10), (6.12), and (6.8), we know

    Σ′m = G(〈a; m〉)[αd ↦ α′d, αr ↦ α′r]    (6.14)

C.6 Definition and Properties of -G-⇀

Definition 7. Σ -G-⇀ Σ′ holds iff G ⊕ [〈amain; mmain〉 ↦ Σ] ⊢ocl Σ′.

Lemma 7. If Σ -G-⇀ Σ′, then FTV(Σ′) = {α | α ∈ FTV(Σ) or α fresh}.

Proof. Case analysis on the (OCL-) rules.

Lemma 8 (-G-⇀ Subsetting). If Σ1 -G-⇀ Σ2 and Σ3 ⊆ Σ2, then Σ1 -G-⇀ Σ3.

Proof. Directly follows from the definition of -G-⇀, (OCL-Main), and (OCL-Subset).

Lemma 9 (-G-⇀ Padding). If Σ1 -G-⇀ Σ2, then Σ1 ∪ Σ -G-⇀ Σ2 ∪ Σ.

Proof. Directly follows from the definition of -G-⇀, (OCL-Main), (OCL-Subset), and (OCL-Union).

21 2009/5/5

C.7 Properties of Γ

Definition 8. Γ ≤acc Γ′ holds iff Γ′ is the same as Γ, except that in the H part, each Acc in Γ′ is a superset of its counterpart in Γ.

Lemma 10 (Acc Strengthening). If Γ, P ⊢ e : τ\Σ and Γ ≤acc Γ′, then Γ′, P ⊢ e : τ\Σ.

Proof. Induction on the type derivation and case analysis on the last step. The only interesting rules are (T-SyncM), (T-Capture), (T-AsyncM), (T-Subtask).

C.8 Dynamic Isolation

Definition 9 (Stack Access inside Tasks/Subtasks). Predicate stack(H, T) holds iff, for any 〈o; e〉 in T with accG(H) = 〈V; E〉, every edge (o -dhosts-> o′) ∈ E has o′ an ordinary object. In addition, the same holds for all ancestors of o.

Lemma 11 (Stack Access Preservation over Reduction). Given Si = 〈Hi; Ti〉 for i = 1, 2,

    S1 ⇒ S2    (11.1)
    stack(H1, T1)    (11.2)

then stack(H2, T2).

Proof. Simple case analysis on reduction rules.

Lemma 12 (Stack Access). If P ⇒∗ 〈H;T 〉, then stack(H,T ).

Proof. Follows directly from Lem. 11, together with the fact that the property holds trivially at initialization.

Definition 10 (Zero In-Edges Correspond to Zero Out-Edges). zeroInzeroOut(H) holds iff, for accG(H) = 〈V; E〉 and any node o, {o1 | (o1 -dhosts-> o) ∈ E} = ∅ implies {o2 | (o -dhosts-> o2) ∈ E} = ∅.

Lemma 13 (Zero Edge Correspondence over Reduction). Given S1 ⇒ S2, where Si = 〈Hi; Ti〉 for i = 1, 2, zeroInzeroOut(H1), and stack(H1, T1), then zeroInzeroOut(H2) must hold.

Proof. Induction on the length of the execution sequence, and case analysis over the reduction rule applied in the last step of reduction S1 ⇒ S2.

For Cases (R-Self), (R-Read), (R-Cast), (R-AsyncM), (R-Subtask), (R-Commute), note that H2 = H1, so the conclusion holds trivially. For Case (R-New), the only change accG(H2) has relative to accG(H1) is one extra isolated node added to the forest, so the property is unaffected. For Case (R-Write), the Acc component of the heap does not change, hence the graph in question remains unchanged and the conclusion holds trivially.

For Cases (R-Context), (R-Compose), the conclusion immediately follows from the induction hypothesis.

For Case (R-Capture) and Case (R-SyncM), the reduction in both cases attempts to add one more edge to forest accG(H1). Graph-theoretically, adding edges to a forest only leads to fewer nodes with zero in-edges, and thus the lemma holds trivially.

For Cases (R-TaskEnd), (R-SubtaskEnd), the change from graph accG(H1) to graph accG(H2) is to eliminate all the in-edges and out-edges of ot, where T1 = 〈ot; v〉 ‖ T′1 for some T′1. Thus, for the node ot, the zeroInzeroOut property holds trivially. Such an operation also reduces the in-degree of the nodes that ot is a parent of. Without loss of generality, pick one such node and call it o′t; given accG(H1) = 〈V1; E1〉, we know (ot -dhosts-> o′t) ∈ E1. Note that the form of T1 satisfies the pre-condition of Lem. 11, and we also know that stack(H1, T1) holds. Thus by that lemma, if there exists some o′′t such that (o′t -dhosts-> o′′t) ∈ E1, then o′t = o′′t. In other words, all the out-edges of o′t, if any, are cyclic self-edges. So suppose the reduction indeed reduces the in-degree of o′t to zero; then the self-edges are removed as well, and hence all the out-edges of o′t are reduced to zero too. Hence predicate zeroInzeroOut(H2) holds.

Lemma 14 (Isolation Preservation over Reduction). Given Si = 〈Hi; Ti〉 for i = 1, 2,

    S1 ⇒ S2    (14.1)
    accG(H1) is a forest    (14.2)
    zeroInzeroOut(H1)    (14.3)

then accG(H2) is a forest.

Proof. Induction on the length of the execution sequence, and case analysis over the reduction rule applied in the last step of reduction S1 ⇒ S2.

For Cases (R-Self), (R-Read), (R-Cast), (R-AsyncM), (R-Subtask), (R-Commute), note that H2 = H1, so the conclusion holds trivially. For Case (R-New), the only change accG(H2) has relative to accG(H1) is one extra isolated node added to the forest, hence the result is also a forest. For Case (R-Write), the Acc component of the heap does not change, hence the graph in question remains unchanged, so the conclusion holds trivially.

For Cases (R-Context), (R-Compose), the conclusion immediately follows from the induction hypothesis.


For Case (R-Capture) and Case (R-SyncM), the reduction in both cases attempts to add one more edge to forest accG(H1), namely ot -dhosts-> o, where T1 = 〈ot; o′; e1〉, with e1 = o.m(v) for Case (R-SyncM) and e1 = capture o for Case (R-Capture). From here on we discuss only Case (R-SyncM), since Case (R-Capture) is identical. In this case, with H1(o) = 〈a; L; σ; Acc; Fd〉, there are only a few possibilities:

• If Acc = ∅, we know by the definition of accG that no edge o′ -dhosts-> o exists for any o′ in forest accG(H1). By (14.3) and the definition of zeroInzeroOut, no edge o -dhosts-> o′0 exists for any o′0 in forest accG(H1) either. Thus o is an isolated node, and adding edge ot -dhosts-> o still results in a forest, by the basic properties of forests.

• If Acc ≠ ∅ and ot ∈ Acc, then the reduction makes no change to the heap according to (R-SyncM), so the conclusion holds trivially.

• If Acc ≠ ∅ and ot ∉ Acc, then by (R-SyncM) the pre-condition Acc ⊆ anc(H1, ot) holds, which says that every object ID in Acc is an ancestor of ot. Let any such ancestor be o′t. By the definition of accG, an edge o′t -dhosts-> o is already in the graph accG(H1). Since o′t is an ancestor of ot, there exist ot1, …, otn such that the edges o′t -dhosts-> ot1, ot1 -dhosts-> ot2, …, otn -dhosts-> ot exist. Thus, even if edge ot -dhosts-> o is added to the original graph, the graph is still a forest, since the new edge is at most a redundant one.

For Cases (R-TaskEnd), (R-SubtaskEnd), note that the change from graph accG(H1) to graph accG(H2) is reductive in terms of edges. Graph-theoretically, removing edges from a forest results in a (sparser) forest.

Theorem 5 (Isolation Preservation). See Thm. 1.

Proof. It can be trivially seen that (R-Init) leads to a forest. The rest of the proof is immediate by Lem. 14.

C.9 Static Isolation

Lemma 15 (Locality Preservation Reduction). Given

    ⊢global P\G    (15.1)
    S1 ⇒e S2    (15.2)
    Si = 〈Hi; Ti〉 for i = 1, 2    (15.3)

then for all o ∈ dom(H1) such that H1(o) = 〈a; L; σ; Acc1; Fd1〉 and local(P, L):

    • either droot(H2, o) = droot(H1, o),
    • or droot(H2, o) = o.

Proof. Induction on the reduction sequence defined in (15.2) and case analysis on the last step.

If the reduction is an instance of (R-TaskEnd) or (R-SubtaskEnd), note that the reduction is reductive in terms of the Acc component for all objects on the heap; in other words, the edges in accG(H1) only shrink. By Thm. 1, accG(H1) forms a forest. Graph-theoretically, when some edges are removed there are only two possibilities for any node in the graph: either it is still connected to its root (and thus has the same root), or it becomes isolated. These two cases are droot(H2, o) = droot(H1, o) and droot(H2, o) = o respectively. The conclusion thus holds.

If the reduction is an instance of (R-Capture), by (T-Capture) the only possible change to the graph accG(H1) is on o0, where the redex is capture o0. If this were the case, by (15.1) we know a constraint cap(α, self) ≤ α′ would be put into the constraint set, which contradicts the definition of local(P, L): given Ξ(P) = P, for all G ⊢ocl 〈a0; L; σ0〉 ≤ α0 and P(a0) = 〈ε; F0; M0; U0〉, there never exist α′, α′′ such that G ⊢ocl cap(α0, α′′) ≤ α′.

If the reduction is an instance of (R-SyncM), the redex is e1 = o0.m(v). In this case, by (15.1) and (T-SyncM), we know α′0 -hosts-> α0 is part of the constraint set for some α′0 and α0, where α0 represents o0. By (15.1), we also know isolatedTasks(G, P). Hence, if isAccessed(G, P, α0), then for all α1, α2 with {α1, α2} ⊆ cutVertices(G, P, α0), either accesses(α1, α2) or accesses(α2, α1). It is easy to verify that isAccessed(G, P, α0) does hold in this case. What this says is that all the cut vertices of α0 lie on the same path leading to the same static variable root, say αroot. This would give us droot(H2, o) = droot(H1, o), except that when the static graph is instantiated to the dynamic one, αroot might represent more than one object; should that happen, the equation would not hold. This luckily cannot happen because, by (15.1), we also know tunnelFree(G, P). The latter expands to three cases. Without loss of generality, assume there are indeed two objects oroot1 and oroot2 that both map to αroot, with droot(H2, o) = oroot2 and droot(H1, o) = oroot1. The three cases are: 1) oroot1 and oroot2 acquire a reference to o when both of them are instantiated as tasks. In this case, the only possibility is that o is passed in as an argument when (R-AsyncM) created oroot1 and oroot2; but then a member of either instHost(G, α0) or capHost(G, α0) could not be a descendant of root αroot, a contradiction with the definition of extFree. 2) oroot1 and oroot2 acquire a reference to o when both of them are in the middle of their execution. Consider how two tasks could receive data back in the middle of their execution. A new expression certainly would not work, since what is returned is not the shared data o. An asynchronous invocation expression does not have a return value. The only interesting case is that both acquire o through a path involving a number of synchronous invocations, ending at a subtasking operation where o is stored; this is precisely what is outlawed by dissemFree. 3) One of oroot1 and oroot2 acquires a reference to o in the middle of its execution, while the other acquires it as a task being created. Without loss of generality we need only prove one direction. In this case, the first must, through a chain of calls, eventually send an asynchronous message to the second to start a task, passing o as an argument; this cannot happen, as it would be caught by cordFree.

All the other reductions either do not change the heap, or only change parts that are irrelevant here (e.g., (R-Write) updates the Fd component of a heap entry, and (R-New) adds a new isolated node to the graph). The conclusion holds trivially for those cases.

Lemma 16 (Hierarchical Elements in Acc). If P ⇒* 〈H; T〉, then for any o such that H(o) = 〈a; L; σ; Acc; Fd〉, either Acc = ∅ or all elements of Acc lie on one tree path in accG(H). (In other words, if we consider the partial order on each tree in accG(H) as transitive, then all elements of Acc form a total order as a sub-graph of a specific tree in accG(H).)

Proof. A simple induction on the length of execution sequences; the only interesting cases are (R-SyncM) and (R-Capture). Note that the pre-condition Acc ⊆ anc(H, ot) on both rules enforces exactly this property: it says that Acc must consist of ancestors of ot. Without loss of generality, assume the tree path from the root to ot is [o1, …, on, ot]; then the set Acc must be a subset of {o1, …, on, ot}. Now it is trivial to see that Acc ∪ {ot} also forms a tree path, namely a sub-path of [o1, …, on, ot]. Hence the invariant holds over the reductions of (R-SyncM) and (R-Capture). (R-TaskEnd) and (R-SubTaskEnd) only shrink Acc for the nodes involved (or set it to ∅). Removing nodes from a tree path does not change the property: the remaining nodes are trivially on the same tree path (only spaced further apart).

Theorem 6 (Static Isolation). See Thm. 3 in Sec. 5.

Proof. For any heap cell with a label L where local(P, L) does not hold, the definition of tryLock under ⇒e* remains the same for any reduction involving these cells, so the proof of isolation is identical to that of Thm. 5. The only interesting case is then the heap cells where local(P, L) holds. In this case, for such an o with H(o) = 〈a; L; σ; Acc; Fd〉, the reduction is the same as in the case of Thm. 5 iff we can prove Acc ⊆ ancestors(H, ot). If Acc = ∅, the condition holds trivially. Otherwise, let Acc = {o1, o2, …, on}. According to Lem. 16, o1, o2, …, on in fact form a path on the same tree in the forest accG(H). Thus, if ot is a descendant of on (or on itself), the pre-condition holds trivially by the definition of ancestors: the ancestors of ot subsume Acc. If not, without loss of generality there are only a few cases:

• Case 1: ot is on the tree path of {o1, o2, …, on}, but it is not on. Without loss of generality, assume it is ok, where 1 ≤ k ≤ n − 1, or it lies between ok and ok+1. By Lem. 12, ot can at most have children nodes on accG(H), but never grandchildren. Thus the only possibility would be for on−1 to be ot, with on its direct child. Even this does not work, since the same lemma says all children nodes of ot must be ordinary objects, while by the basic properties of Acc its elements are always task objects and subtask objects. Contradiction.

• Case 2: ot is not on the tree path of {o1, o2, …, on}, but is in the same tree: either ot is higher up the tree, or it is part of a sub-tree whose ancestors do not completely cover {o1, o2, …, on}. It is not possible for ot to be higher up the tree, since it would then have children which are not ordinary objects, contradicting Lem. 12. It is not possible for ot to be part of a sub-tree whose ancestors do not completely cover {o1, o2, …, on}: by Lem. 12, no ancestor of ot can have non-ordinary children either, so there is no place in the graph for those nodes of {o1, o2, …, on} that are not ancestors of ot.

• Case 3: ot does not belong to the same tree at all. This cannot happen, by Lem. 15.

Hence the conclusion holds after the proof by contradiction of the three cases.

C.10 Subject Reduction

Lemma 17 (Structural Invariant over Reduction). If P ⊢ S1\Σ1 and S1 ⇒ S2, then

    structure(S2) = S2    (17.1)
    P ⊢wft S2    (17.2)

Proof. Straightforward case analysis.


Lemma 18 (Well-Typed Heap over Reduction). If

    ⊢global P\G    (18.1)
    Ξ(P) = P    (18.2)
    S1 ⇒ S2    (18.3)
    Si = 〈Hi; Ti〉, for i = 1, 2    (18.4)
    structure(Si) = Si, for i = 1, 2    (18.5)
    structure(Hi) = Hi, for i = 1, 2    (18.6)
    S1, P ⊢h H1\Σh1    (18.7)
    o ∈ dom(H1)    (18.8)

then

    S2, P ⊢h H2\Σh2    (18.9)
    Σh1 -G-⇀ Σh2    (18.10)
    Hi(o) = 〈a; L; σ; Acci〉, for i = 1, 2    (18.11)

Proof. Straightforward case analysis.

Lemma 19 (Well-Typed Single Task Reduction). If

    ⊢global P\G    (19.Known1)
    Ξ(P) = P    (19.Known2)
    S1 ⇒ S2    (19.Known3)
    Si = 〈Hi; Ti〉, for i = 1, 2    (19.Known4)
    Ti = 〈ot; o; ei〉, for i = 1, 2    (19.Known5)
    structure(Si) = Si, for i = 1, 2    (19.Known6)
    Si, P ⊢h Hi\Σhi, for i = 1, 2    (19.Known7)
    structure(Hi) = Hi, for i = 1, 2    (19.Known8)
    Hi(o) = 〈a; L; σ; Acci〉, for i = 1, 2    (19.Known9)
    [iconf : S1, myC : a], P ⊢ e1 : τ1\Σ1    (19.Known10)

then

    [iconf : S2, myC : a], P ⊢ e2 : τ2\Σ2    (19.Goal1)
    Σh1 ∪ Σ1 -G-⇀ Σ02    (19.Goal2)
    range(σfresh) ∩ FTV(Σ02 ∪ Σ1) = ∅    (19.Goal3)
    for all α0 ∈ (dom(σfresh) ∩ dom(σ)). σfresh(α0) = σ(α0)    (19.Goal4)
    Σ2 = Σ02[σfresh]    (19.Goal5)
    τ2 = τ1[σfresh]    (19.Goal6)

Proof. Induction on the length of the reduction in (19.Known3), and case analysis on the last reduction rule used:

Case (R-New): By (R-New) and (19.Known2), we know

    e1 = newL′ a′
    P(a′) = 〈µ′; F′; M′; U′〉    o′ fresh
    Fd′ = ⊎_{f ∈ dom(F′)} (f ↦ null)
    H2 = H1[o′ ↦ 〈a′; L′; σ′; ∅; Fd′〉]
    {α1, …, αn} = {α0 | F′(f) = ∀α0. a0@α0}
    α′, α′1, …, α′n fresh
    σ′ = [self ↦ α′, α1 ↦ α′1, …, αn ↦ α′n]
    N2 = N1
    e2 = o′

By (T-New), (19.Known10), and the definition of e1 above, we know

    τ1 = a′@α′
    Σ1 = {〈a′; L′; σ′〉 ≤ α′}

Thus by (19.Known4), (19.Known5), (19.Known6), and the definition of structure, we know

    S1 = 〈H1; N1; T1〉
    S2 = S1[o′ ↦ 〈a′; L′; σ′; ∅〉]

With the definition of S2 above and (T-RID), we know

    [iconf : S2, myC : a], P ⊢ o′ : a′@α′\∅    (19.1)

Let Σ2 = ∅ and τ2 = a′@α′. Hence (19.1) is identical to (19.Goal1). Let σfresh = ∅ and Σ02 = ∅. Thus (19.Goal3), (19.Goal4), (19.Goal5), (19.Goal6) hold trivially. (19.Goal2) also holds trivially because of Lem. 8.

Case (R-Cast): By (R-Cast), we know e1 = (a′)v and e2 = v. We also know H2 = H1, N2 = N1. Thus by (19.Known4), (19.Known5), (19.Known6), and the definition of structure, we know S2 = S1. By the same reduction rule, together with (19.Known2), we also know

H1(v) = 〈av; Lv;σv;Accv;Fdv〉 (19.2)

P(av) = 〈µv;Fv;Mv;Uv〉 (19.3)

a′ ∈ Uv (19.4)

By (19.Known1), (T-Heap), (T-HeapCell), we know

σv(self) = αv (19.5)

By the definition of structure and (19.2), we know

H1(v) = 〈av; Lv;σv;Accv〉 (19.6)


By (T-RID), (19.6), (19.5), we know

[iconf : S1,myC : a],P ` v : (av@αv)\∅ (19.7)

By (T-Cast), (19.Known10), and the definition of e1 above, we know

[iconf : S1,myC : a],P ` v : (a′v@α′v)\Σ′v (19.8)

P(a′) = 〈µv;F ′;M′;U ′〉 (19.9)

a′v ∈ U ′ or a′ ∈ Uv (19.10)

Σ1 = Σ′v

τ1 = a′@α′v

By Lem. 1 and (19.8), we know Σ′v = ∅. By Lem. 2, (19.7), (19.8), we know αv = α′v. Thus (19.7) can be rewritten as

[iconf : S1,myC : a],P ` v : (av@α′v)\∅ (19.11)

By (T-Sub), (19.11), (19.3), (19.4), we know

[iconf : S1,myC : a],P ` v : (a′@α′v)\∅ (19.12)

Let Σ2 = ∅ and τ2 = a′@α′v. Earlier we have shown S2 = S1, hence (19.12) is identical to (19.Goal1). Let σfresh = ∅ and let Σ02 = ∅. (19.Goal2) holds trivially because of Lem. 8; (19.Goal3), (19.Goal4), (19.Goal5), and (19.Goal6) hold trivially.

Case (R-Capture): In this case we know e1 = capture v and e2 = v. We also know

H1(v) = 〈av; Lv;σv;Accv;Fdv〉 (19.13)

Accv ⊆ anc(ot) (19.14)

H2 = H1[v 7→ 〈av; Lv;σv;Accv ∪ {ot};Fdv〉] (19.15)

N2 = N1 (19.16)

Thus by (19.Known4), (19.Known5), (19.Known6), the definition of structure, and Def. 8, we know

[iconf : S1,myC : a] ≤acc [iconf : S2,myC : a] (19.17)

By (19.Known1), (T-Heap), (T-HeapCell), we know

σv(self) = αv (19.18)

By the definition of structure and (19.13), we know

H1(v) = 〈av; Lv;σv;Accv〉 (19.19)

By (T-RID), (19.19), (19.18), we know

[iconf : S1,myC : a],P ` v : (av@αv)\∅ (19.20)

By (T-Capture), (19.Known10), we know

[iconf : S1,myC : a], P ` v : (a′v@α′v)\Σ′v (19.21)

α′ fresh (19.22)

P(a′v) = 〈ε;F ′v;M′v;U ′v〉 (19.23)

τ1 = a′v@α′ (19.24)

Σ1 = Σ′v ∪ {cap(α′v, self) ≤ α′, self −hosts→ α′} (19.25)

By Lem. 1 and (19.21), we know Σ′v = ∅. By Lem. 2, (19.20), (19.21), we know αv = α′v. With all these, together with Lem. 10, (19.17), (19.21), we know

[iconf : S2,myC : a], P ` v : (a′v@αv)\∅ (19.26)

Let Σ2 = ∅ and τ2 = a′v@αv. Hence (19.26) is identical to (19.Goal1). Let σfresh = [α′ 7→ αv] and let Σ02 = ∅. (19.Goal2) also holds trivially because of Lem. 8. (19.Goal5), (19.Goal6) hold trivially. To prove (19.Goal3), note that range(σfresh) = {αv} and FTV (Σ02 ∪ Σ1) = FTV (Σ1) = {αv, self, α′}. (19.Goal4) holds trivially since α′ is fresh and dom(σfresh) ∩ dom(σ) = ∅.

Case (R-Read): Given (19.Known8), (19.Known9), and the definition of structure, H1(o) = 〈a; L;σ;Acc;Fd1〉 for some Fd1. Thus by (R-Read), we know e1 = f and e2 = Fd1(f). We also know H2 = H1, N2 = N1. Thus by (19.Known4), (19.Known5), (19.Known6), and the definition of structure, we know S2 = S1. By (T-Read), (19.Known10), and the definition of e1 above, we know

P(a) = 〈µ;F ;M;U〉 (19.27)

F(f) = ∀αf.a@αf (19.28)

τ1 = a@αf (19.29)

Σ1 = {〈self; f〉 ≤ αf} (19.30)

By (19.Known1), (T-Heap), (T-HeapCell), (19.Known7),(19.27), (19.28), we know

σ(self) = α (19.31)

[iconf : S1,myC : a],P ` Fd1(f) : a@α′f\∅ (19.32)

σ(αf) = α′f (19.33)

Earlier we have proven S2 = S1. Let Σ2 = ∅ and τ2 = a@α′f. Hence (19.32) is identical to (19.Goal1). Let σfresh = [αf 7→ α′f] and let Σ02 = ∅. Thus (19.Goal5) and (19.Goal6) hold trivially. (19.Goal2) also holds trivially because of Lem. 8. By (19.Known1), (T-Heap), (T-HeapCell), we know {αf} ∩ {α′f} = ∅. Thus, together with Σ02 = ∅, we know (19.Goal3) holds. To prove (19.Goal4), note that dom(σfresh) ∩ dom(σ) = {αf} according to (19.33) and the definition of σfresh. Thus (19.Goal4) holds because σfresh(αf) = α′f = σ(αf), again according to (19.33).

Case (R-Write): Given (19.Known8), (19.Known9), and the definition of structure, H1(o) = 〈a; L;σ;Acc;Fd1〉 for some Fd1. Thus by (R-Write), we know e1 = f:=v and e2 = v. We also know H2 = H1[o 7→ 〈a; L;σ;Acc;Fd1{f 7→ v}〉], N2 = N1. Thus by (19.Known4), (19.Known5), (19.Known6), and the definition of structure, we know S2 = S1. By (T-Write), (19.Known10), and the definition


of e1 above, we know

P(a) = 〈µ;F ;M;U〉 (19.34)

F(f) = ∀αf.a@αf (19.35)

τ1 = a@αv (19.36)

[iconf : S1,myC : a], P ` v : a@αv\Σv (19.37)

Σ1 = Σv ∪ {αv ≤ αf, αf ≤ 〈self; f〉} (19.38)

By Lem. 1 and (19.37), we know Σv = ∅. Earlier we have proven S2 = S1. Let Σ2 = Σv = ∅ and τ2 = a@αv. Hence (19.37) is identical to (19.Goal1). Let σfresh = ∅ and let Σ02 = ∅. Thus (19.Goal3), (19.Goal4), (19.Goal5), and (19.Goal6) hold trivially. (19.Goal2) also holds trivially because of Lem. 8.

Case (R-SyncM): thus

e1 = in(o′, e′1)

e2 = in(o′, e′2)

H1, 〈ot; o′; e′1〉 ⇒ H2, N2, 〈ot; o′; e′2〉 (19.39)

By (T-In), (19.Known10), and the definition of e1 above, we know

H1(o′) = 〈a′; L′;σ′f;Acc′〉 (19.40)

[iconf : S1,myC : a],P ` e′1 : τ ′1\Σ′1 (19.41)

σ′ = fresh(FTV (Σ′1)− dom(σ′f)− dom(σ′m)) (19.42)

τ1 = τ ′1[σ′][σ′f][σ′m] (19.43)

Σ1 = Σ′1[σ′][σ′f][σ′m] (19.44)

Given (19.40), Lem. 18, (19.Known1), (19.Known2), (19.Known3), (19.Known4), (19.Known5), (19.Known6), (19.Known7), (19.Known8), (19.Known9), we have, for some Acc′′,

H2(o′) = 〈a′; L′;σ′f;Acc′′〉 (19.45)

By the induction hypothesis, (19.41), (19.Known1), (19.Known2), (19.39), (19.Known6), (19.Known7), (19.Known8), (19.Known9), (19.40), (19.45), we know

[iconf : S2,myC : a],P ` e′2 : τ ′2\Σ′2 (19.46)

Σh1 ∪ Σ′1 −⇀G Σ′02 (19.47)

range(σ′fresh) ∩ FTV (Σ′02 ∪ Σh1 ∪ Σ′1) = ∅ (19.48)

Σ′2 = Σ′02[σ′fresh] (19.49)

τ ′2 = τ ′1[σ′fresh] (19.50)

Let

σ′′ = fresh(FTV (Σ′2)− dom(σ′f)− dom(σ′m)) (19.51)

By (T-In), (19.45), (19.46), (19.51), and the definition of e2, we know

[iconf : S2,myC : a],P ` e2 : τ2\Σ2 (19.52)

τ2 = τ ′2[σ′′][σ′f][σ′m] (19.53)

Σ2 = Σ′2[σ′′][σ′f][σ′m] (19.54)

Given (19.47), and since the relation −⇀G is stable under substitution, we know

Σh1[σ′][σ′f][σ′m] ∪ Σ′1[σ′][σ′f][σ′m] −⇀G Σ′02[σ′][σ′f][σ′m]

which, according to (19.44) and the fact that these substitutions leave Σh1 unchanged, can be rewritten as

Σh1 ∪ Σ1 −⇀G Σ′02[σ′][σ′f][σ′m] (19.55)

Now according to Lem. 7 and (19.47), we know each type variable in Σ′02 is either fresh or belongs to FTV (Σh1 ∪ Σ′1). Let the former variables be α1, . . . , αk, and the latter be αk+1, . . . , αm. By the definition of Σ′2 in (19.49), the type variables in Σ′2 are either in Σ′02 or in range(σ′fresh). According to (19.48), these two sets are disjoint. Let the type variables in range(σ′fresh) be αm+1, . . . , αn. Thus we know FTV (Σ′2) = {α1, . . . , αn}. Rewriting the definition of σ′′

in (19.51), we know

σ′′ = fresh({α1, . . . , αn} − dom(σ′f)− dom(σ′m))

Note that {α1, . . . , αk} are fresh, and αm+1, . . . , αn are fresh as well, so the above equation can be written as

σ′′ = σ′′x ⊎ σ′′y ⊎ σ′′z (19.56)

σ′′x = fresh({αk+1, . . . , αm} − dom(σ′f)− dom(σ′m)) (19.57)

σ′′y = fresh({α1, . . . , αk}) (19.58)

σ′′z = fresh({αm+1, . . . , αn}) (19.59)

Since {αk+1, . . . , αm} contains all the type variables in FTV (Σ′2) that belong to FTV (Σh1 ∪ Σ′1), we know

Σ′2[σ′′xnew] = Σ′2[σ′′x ] (19.60)

τ ′2[σ′′xnew] = τ ′2[σ′′x ] (19.61)

σ′′xnew = fresh(FTV (Σh1 ∪ Σ′1)− dom(σ′f)− dom(σ′m)) (19.62)

By elementary set reasoning, FTV (Σh1 ∪ Σ′1) − dom(σ′f) − dom(σ′m) = (FTV (Σh1) − dom(σ′f) − dom(σ′m)) ∪ (FTV (Σ′1) − dom(σ′f) − dom(σ′m)). According to the definition of σ′ in (19.42), we know

σ′′xnew = σ′ ⊎ σh (19.63)

σh = fresh(FTV (Σh1)− dom(σ′f)− dom(σ′m)) (19.64)

We know (19.Goal1) holds because of (19.52). (19.Goal2) holds because of (19.55) when Σ02 = Σ′02[σ′][σ′f][σ′m]. To prove (19.Goal5), we know

Σ2 = Σ′2[σ′′][σ′f][σ′m] By (19.54)

= Σ′2[σ′′x ][σ′′y ][σ′′z ][σ′f][σ′m] By (19.56)

= Σ′2[σ′′xnew][σ′′y ][σ′′z ][σ′f][σ′m] By (19.60)

= Σ′2[σ′][σh][σ′′y ][σ′′z ][σ′f][σ′m] By (19.63)

= Σ′02[σ′fresh][σ′][σh][σ′′y ][σ′′z ][σ′f][σ′m] By (19.49)

= Σ′02[σ′][σ′f][σ′m][σ′fresh][σh][σ′′y ][σ′′z ]

where σfresh = [σ′fresh][σh][σ′′y ][σ′′z ]. To prove (19.Goal6),

note that

τ2 = τ ′2[σ′′][σ′f][σ′m] By (19.53)

= τ ′2[σ′′x ][σ′′y ][σ′′z ][σ′f][σ′m] By (19.56)

= τ ′2[σ′′xnew][σ′′y ][σ′′z ][σ′f][σ′m] By (19.61)

= τ ′2[σ′][σh][σ′′y ][σ′′z ][σ′f][σ′m] By (19.63)

= τ ′1[σ′fresh][σ′][σh][σ′′y ][σ′′z ][σ′f][σ′m] By (19.50)

= τ ′1[σ′][σ′f][σ′m][σ′fresh][σh][σ′′y ][σ′′z ] By the disjointness of the substitutions

= τ1[σ′fresh][σh][σ′′y ][σ′′z ] By (19.43)

thus (19.Goal6) holds, and (19.Goal3) holds since the ranges of σ′fresh, σh, σ′′y, and σ′′z consist of fresh type variables.

The last case is the reduction of self. Thus we know e1 = this and e2 = o. We also know H2 = H1, N2 = N1. Thus by (19.Known4), (19.Known5), (19.Known6), and the definition of structure, we know S2 = S1. By (T-Self), (19.Known10), and the definition of e1 above, we know τ1 = a@self and Σ1 = ∅. By (19.Known1), (T-Heap), (T-HeapCell), (19.Known7), we know that σ(self) = α for some α. With this, by (T-RID) and (19.Known9), we know

[iconf : S1,myC : a], P ` o : a@α\∅ (19.65)

Earlier we have proven S2 = S1. Let Σ2 = ∅ and τ2 = a@α. Hence (19.65) is identical to (19.Goal1). Let σfresh = [self 7→ α] and let Σ02 = ∅. Thus (19.Goal5) and (19.Goal6) hold trivially. (19.Goal2) also holds trivially because of Lem. 8. (19.Goal3) also holds trivially since Σ02 ∪ Σ1 = ∅. To prove (19.Goal4), note that dom(σfresh) ∩ dom(σ) = {self} according to the definition of σfresh and our earlier conclusion that σ(self) = α. Thus (19.Goal4) holds because σfresh(self) = α = σ(self).

Case (R-AsyncM): With (19.Known9), (19.Known3), (19.Known4), and (T-RID), we know

Σo = {〈a; L;σ〉 ≤ α} (19.66)

Given (19.Known2), (T-Call), and (19.39), we know

Γ, P ` v : ad@αd0\Σv (19.67)

Σlazy = {lazy(α,m, σm)} (19.68)

Σd = { 〈self; Γ(myM)〉 α′d−→false 〈α; m〉,
      〈α; m〉 α′r−→false 〈self; Γ(myM)〉 } (19.69)

τ1 = ar@αr0 (19.70)

Σ1 = Σv ∪ Σlazy ∪ Σd ∪ {αd0 ≤ α′d, α′r ≤ αr0} (19.71)

According to Lem. 6, (19.Known1), (19.Known5), (19.Known6),(19.Known7), (19.Known9), (19.52), (19.53), (??), (19.67),we know

Γ ⊎ [myC : a,myM : m],P ` e{v/x} : ar@αr0\Σ′′ (19.72)

Σ′′ = G(〈a; m〉)[σm] ∪ Σv ∪ {αd0 ≤ α′d, α′r ≤ αr0} (19.73)

According to Def. 8, (19.Known8), we know

Γ′(iconf) = 〈H′; T 〉 (19.74)

H′(o) = 〈a; L;σ;Acc′〉 (19.75)

According to Lem. 10, (19.72), we know

Γ′ ⊎ [myC : a,myM : m],P ` e{v/x} : ar@αr0\Σ′′ (19.76)

With (T-In), (19.74), (19.75), (19.Known9), (19.76), we know

Γ′,P ` in(o, e{v/x}) : τ2\Σ2 (19.77)

τ2 = ar@αr0[σin][σ][σm] (19.78)

Σ2 = Σ′′[σin][σ][σm] (19.79)

σin = fresh(FTV (Σ′′)− dom(σ)− {αd, αr}) (19.80)

Let σfresh = [σin][σ][σm]. We know (19.Goal6) holds by (19.78). The tricky part is to show (19.Goal3). The rest of the proof shows that (19.Goal2) holds. We know that if Σ02 = Σ′′, then (19.Goal5) holds according to (19.79). Now according to (OCL-Main), we know trivially

G ⊎ [〈amain; mmain〉 7→ Σo ∪ Σ1] `ocl Σo ∪ Σ1 (19.81)

Let G′ = G ⊎ [〈amain; mmain〉 7→ Σo ∪ Σ1]. By (19.66), (19.41), (19.68), (19.71), and the definition of `ocl, we know

G′ `ocl 〈a; L;σ〉 ≤ α

G′ `ocl lazy(α,m, σm)

As a result of (??), we know G′(〈a; m〉) = G(〈a; m〉). By (OCL-Lazy-Intro), (OCL-Lazy-NonRecursive), (OCL-Lazy-Cancel), and the rules above, we know

G′ `ocl G(〈a; m〉)[σfitg][σ][σm]

σfitg = fresh(FTV (G(〈a; m〉))− dom(σ)− dom(σm)) (19.82)


It is obvious that σfitg, σ, and σm have disjoint domains, so the above can be rewritten as:

G′ `ocl G(〈a; m〉)[σm][σfitg][σ] (19.83)

Now, according to the definitions of σfitg and σin and the definition of Σ′′ in (19.73), we know G(〈a; m〉)[σm][σfitg] = G(〈a; m〉)[σm][σin]. Together with the disjointness of the three substitutions, we know

G′ `ocl G(〈a; m〉)[σin][σ][σm] (19.84)

which is

Σo ∪ Σlazy ∪ Σd −⇀G G(〈a; m〉)[σm] (19.85)

Now according to Lem. 9, we know

Σo ∪ Σv ∪ Σlazy ∪ Σd ∪ {αd0 ≤ α′d, α′r ≤ αr0} −⇀G G(〈a; m〉)[σm] ∪ Σv ∪ {αd0 ≤ α′d, α′r ≤ αr0}

That is, according to the definition of Σ1 in (19.71):

Σo ∪ Σ1 −⇀G G(〈a; m〉)[σm] ∪ Σv ∪ {αd0 ≤ α′d, α′r ≤ αr0}

which is (19.Goal2).

Case (R-Context): thus

e1 = E[ e′1 ]

e2 = E[ e′2 ]

H1, 〈ot; o; e′1〉 ⇒ H2, N2, 〈ot; o; e′2〉 (19.86)

By (T-Context), (19.Known10), the conclusion holds by induction.

Lemma 20 (Subject Reduction). Given Si = 〈Hi; Ti〉 for i = 1, 2,

P ` S1\Σ1 (20.1)

S1 ⇒ S2 (20.2)

`global P\G (20.3)

then P ` S2\Σ2 and Σ1 −⇀G Σ2.

Proof. Given (20.1) and (T-Config), we know

Ξ(P) = P (20.4)

structure(S1) = S1 (20.5)

P `wft S1 (20.6)

S1,P `h H1\Σh1 (20.7)

S1,P `tp T1\Σtp1 (20.8)

Σ1 = Σh1 ∪ Σtp1 (20.9)

Given Lem. 17, (20.1), (20.2), we know

structure(S2) = S2 (20.10)

P `wft S2 (20.11)

Given Lem. 18, (20.3), (20.4), (20.2), (20.5), (20.10), (20.7), we know

S2,P `h H2\Σh2 (20.12)

Σh1 −⇀G Σh2 (20.13)

With these preconditions, the non-trivial cases are proved in Lem. ??, Lem. 19, and Lem. 18.

C.11 Progress

Definition 11. deadlockable(〈H; T 〉) holds iff we know daccesses(H, oi, o′i) for i = 0 . . . n − 1, 〈oi, E[ ei ]〉 belongs to T , and ei = capture o′(i+1) mod n or o′(i+1) mod n.mi(vi).
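To make the cyclic structure of this definition concrete, the following minimal Python sketch (ours, purely illustrative, not part of the formal development) reduces deadlockability to cycle detection in a waits-for graph. The blocked_on map, recording for each task the task that holds the object it is waiting to capture or invoke, is a hypothetical encoding of the daccesses facts above.

# Hypothetical illustration of Definition 11: a configuration is
# deadlockable iff tasks form a cycle t0 -> t1 -> ... -> t0, where an
# edge ti -> tj means ti's next step (a capture or a synchronous call)
# needs an object that tj currently accesses.
def deadlockable(blocked_on):
    """blocked_on maps each task to the task holding the object it is
    waiting for (None if it is not blocked).  True iff the waits-for
    graph contains a cycle."""
    for start in blocked_on:
        seen, t = set(), start
        while t is not None and t not in seen:
            seen.add(t)
            t = blocked_on.get(t)
        if t is not None and t == start:  # walked back to the start: a cycle
            return True
    return False

assert deadlockable({"t0": "t1", "t1": "t0"})      # mutual wait: a cycle
assert not deadlockable({"t0": "t1", "t1": None})  # a chain, no cycle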

Lemma 21 (Progress). Given S1 = 〈H1; T1〉 and P ` S1\Σ1, then either T1 = 〈o; v〉, or deadlockable(S1), or there exists S2 such that S1 ⇒ S2.

Proof. Induction on the execution sequence and case analysis on the last step of the reduction. (R-Context) progresses by the induction hypothesis. (R-New), (R-Write), (R-Read), (R-Commute), (R-Compose) always progress. (R-Cast) progresses because the pre-condition a′ ∈ U is immediately satisfied by (T-Cast), which in turn is a sub-derivation of P ` S1\Σ1. The non-trivial cases are (R-SyncM), (R-Capture), (R-AsyncM), (R-Subtask), which we now prove.

To prove the case of (R-SyncM), note that the pre-condition to satisfy is Acc ⊆ anc(H, ot). We know the elements of Acc belong to the same tree in the forest of accG(H), and they form a tree path. In plain words, the pre-condition means that the executing task/subtask ot must be lower on the tree path to which all elements of Acc belong. If deadlockable(S) does not hold, we know a cycle does not exist in T1, so there must be a chain in T1 that can reduce. (R-Capture), (R-AsyncM), (R-Subtask) can be proved in a similar manner.

C.12 Atomicity

We write step str = (S, r, S′) to denote a transition S ⇒ S′ by reduction rule r. We let change(str) = (so, r, s′o) denote the fact that the begin and end heaps of step str differ at most on their state of o, taking it from so to s′o. Similarly, the change of two consecutive steps which together change at most two objects o1 and o2, o1 ≠ o2, is represented as change(str1str2) = ((so1, so2), r1r2, (s′o1, s′o2)). If o1 = o2, then change(str1str2) = (so1, r1r2, s′o1).
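This notation can be transcribed into a small executable model; the Step record and the heap-as-dictionary representation in the following Python sketch are our own hypothetical encoding, used only to make change( ) concrete.

from dataclasses import dataclass

@dataclass
class Step:
    before: dict  # heap before the step: object id -> object state
    rule: str     # name of the reduction rule r
    after: dict   # heap after the step

def change(st):
    """For a local step, return (o, s_o, s'_o) for the at-most-one
    object o whose state differs between the two heaps (cf. Lemma 22
    below), or None if the heaps agree everywhere."""
    diff = [o for o in st.before.keys() | st.after.keys()
            if st.before.get(o) != st.after.get(o)]
    assert len(diff) <= 1, "a local step changes at most one object"
    if not diff:
        return None
    o = diff[0]
    return (o, st.before.get(o), st.after.get(o))

# Example: an (R-Write)-style step taking o's state from 0 to 1.
assert change(Step({"o": 0}, "R-Write", {"o": 1})) == ("o", 0, 1)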

Definition 12 (Local and Nonlocal Step). A step str = (S, r, S′) is a local step if r is one of the local rules: (R-Read), (R-Write), (R-New), or (R-SyncM). str is a


nonlocal step if r is one of the nonlocal rules: (R-Task), (R-SubTask), (R-TEnd), or (R-STEnd).

Every nonlocal rule has a label, given in Fig. 11. For example, the (R-Task) rule has label (R-Task)(t, γ, mn, v, o, t′), meaning asynchronous message mn was sent from object γ in task t to another object o in a new task t′, and the argument passed was v. These labels are used as the observables; the local rules also carry labels, but since they are local steps internal to a task t, they are not observable steps.

Intuitively, observable steps are places where tasks interact by making the results of local steps possibly visible to other tasks. For instance, in an (R-Task) step, a task t creates a new task t′ by sending an asynchronous message mn. Along with the message mn, t sends a parameter v to t′. If v is a value calculated by t's local steps, then upon the execution of the (R-Task) step, t makes its internal computation visible to the newly created t′.

Lemma 22. In any given local step str, at most one object o's state can be changed, from so to s′o (so is null if str creates o).

Proof. A local step is a step applying one of the local rules listed in Definition 12. Among these rules, only (R-New), (R-Read), and (R-Write) can possibly change object state. (R-New) creates a single object. The (R-Read) and (R-Write) rules each operate on exactly one object and may change its state. No other rules change object state.

Definition 13 (Computation Path). A computation path p is a finite sequence of steps str1str2 . . . stri−1stri such that str1, str2, . . . , stri−1, stri = (S0, r1, S1), (S1, r2, S2), . . . , (Si−2, ri−1, Si−1), (Si−1, ri, Si).

Here we only consider finite paths, as is common in process algebra; this simplifies our presentation. Infinite paths can be interpreted as a set of ever-increasing finite paths. For brevity, we use computation path and path interchangeably when no confusion would arise.

The initial run-time configuration of a computation path, S0, always has H = ∅ and T = {omain}, where omain is the default bootstrapping task that starts the execution of a program from the entry main method of a class in the program.

Definition 14 (Observable Behavior). The observable behavior of a computation path p, ob(p), is the label sequence of all nonlocal steps occurring in p.

Definition 15 (Observable Equivalence). Two paths p1 and p2 are observably equivalent, written p1 ≡ p2, iff ob(p1) = ob(p2).
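Definitions 14 and 15 amount to a filter over a path. In the hypothetical encoding below (steps as (task, rule, label) triples, with the four nonlocal rule names hard-coded), observable equivalence is simply equality of the filtered label sequences.

NONLOCAL = {"R-Task", "R-SubTask", "R-TEnd", "R-STEnd"}

def ob(path):
    """Label sequence of the nonlocal steps of a path (Definition 14)."""
    return [label for (_task, rule, label) in path if rule in NONLOCAL]

def equivalent(p1, p2):
    """Observable equivalence of two paths (Definition 15)."""
    return ob(p1) == ob(p2)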

Definition 16 (Object-blocked). A task t is in an object-blocked state S at some point in a path p if it would be enabled for a next step str = (S, r, S′) for which r is an (R-SyncM), (R-AsyncM), (R-Subtask), or (R-Capture) step on object o, except for the fact that there is a capture violation on o: one of the Acc ⊆ preconditions fails to hold in S, and so the step cannot in fact be the next step at that point.

An object-blocked state is a state in which a task cannot make progress because it has to wait for an object to become available to it.

Definition 17 (Sub-path and Maximal Sub-path). Given a fixed p, for some task t a sub-path spt of p is a sequence of steps in p which are all local steps of task t. A maximal sub-path is an spt in p which is longest: no local t steps in p can be added to the beginning or the end of spt to obtain a longer sub-path.

Note that the steps in spt need not be consecutive in p; they can be interleaved with steps belonging to other tasks.

Definition 18 (Pointed Maximal Sub-path). For a given path, a pointed maximal sub-path for a task t (pmspt) is a maximal sub-path spt with either 1) one nonlocal step appended to its end, or 2) no more t steps ever occurring in the path.

The second case is the technical case of when the (finite) path has ended but the task t is still running. The last step of a pmspt is called its point. We omit the t subscript on pmspt when we do not care which task a pmsp belongs to.

Since we have extended the pmsp maximally and have allowed inclusion of one nonlocal step at the end, we have captured all the steps of any path in some pmsp:

Lemma 23. For a given path p, all the steps of p can be partitioned into a set of pmsp's where each step str of p occurs in precisely one pmsp, written str ∈ pmsp.

Proof. This follows immediately from the definition of pmsp.
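The partition claimed by Lemma 23 can be computed by a single scan over the path. In the sketch below (again a hypothetical encoding, with steps as (task, rule) pairs), a task's accumulated local steps are closed off into a pmsp whenever one of its nonlocal steps (its point) arrives; tasks still running when the finite path ends contribute the point-less pmsp's of case 2) of Definition 18.

NONLOCAL = {"R-Task", "R-SubTask", "R-TEnd", "R-STEnd"}

def partition_into_pmsps(path):
    """path: list of (task, rule) pairs.  Returns a list of pmsp's,
    each a list of indices into path; every index occurs in exactly
    one pmsp (Lemma 23)."""
    pending, pmsps = {}, []
    for i, (task, rule) in enumerate(path):
        pending.setdefault(task, []).append(i)
        if rule in NONLOCAL:            # this nonlocal step is the point
            pmsps.append(pending.pop(task))
    pmsps.extend(pending.values())      # tasks still running at path end
    return pmsps

p = [("t1", "R-Read"), ("t2", "R-New"), ("t1", "R-Task"), ("t2", "R-Write")]
assert partition_into_pmsps(p) == [[0, 2], [1, 3]]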

Given this fact, we can make the following unambiguous definition.

Definition 19 (Indexed pmsp). For some fixed path p, define pmsp(i) to be the ith pointed maximal sub-path in p, where all the steps of pmsp(i) occur after any of pmsp(i − 1) and before any of pmsp(i + 1).

The pmsp’s are the units which we need to serialize: theyare all spread out in the initial path p, and we need to showthere is an equivalent path where each pmsp runs in turn asan atomic unit.

Definition 20 (Path around a pmsp(i)). The path around a pmsp(i) is the finite sequence of all of the steps in p from the first step after pmsp(i − 1) to the end of pmsp(i), inclusive. It includes all steps of pmsp(i) and also all the interleaved steps of other tasks.

Definition 19 defines a global ordering on all pmsp's in a path p without regard to which task a pmsp belongs to. The following Definition 21 defines a task-scope index of a pmsp, which indicates the local ordering of the pmsp's of a task within the scope of that task.


Definition 21 (Task Indexed pmsp). For some fixed path p, define pmspt,i to be the ith pointed maximal sub-path of task t in p, where all the steps of pmspt,i occur after any of pmspt,i−1 and before any of pmspt,i+1.

For a pmspt,i, when its task-scope ordering is irrelevant, we omit the index i and simply write pmspt.

Definition 22 (Waits-for and Deadlocking Path). For some path p, pmspt1,i waits-for pmspt2,j if t1 goes into an object-blocked state in pmspt1,i on an object captured by t2 in the blocked state. A deadlocking path p is a path where this waits-for relation has a cycle: pmspt1,i waits-for pmspt2,j while pmspt2,i′ waits-for pmspt1,j′.

Hereafter we assume in this theoretical development that there are no such cycles.

Definition 23 (Quantized Sub-path and Quantized Path). A quantized sub-path contained in p is a pmspt of p where all steps of pmspt are consecutive in p. A quantized path p is a path consisting of a sequence of quantized sub-paths.

The main technical lemma is the following Bubble-Down Lemma, which shows how local steps can be pushed down in the computation. Use of such a lemma is the standard technique for showing atomicity properties. In this approach, all state-transition steps of a potentially interleaving execution are categorized by their commutativity with consecutive steps: right movers, left movers, both movers, or non-movers. The reduction is defined as moving the transition steps in the allowed direction. We show the local steps are right movers; in fact they are both movers, but that stronger result is not needed.
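The mover discipline is easiest to see on a toy heap: two local steps of different tasks that update different objects commute, which is the simplest instance of the Bubble-Down Lemma below. The functional heap representation and the apply_write helper in this Python sketch are hypothetical, for illustration only.

def apply_write(heap, obj, field, value):
    """An (R-Write)-style local step as a pure function on heaps."""
    cell = dict(heap.get(obj, {}))
    cell[field] = value
    new_heap = dict(heap)
    new_heap[obj] = cell
    return new_heap

h0 = {"o1": {"f": 0}, "o2": {"g": 0}}
h_ab = apply_write(apply_write(h0, "o1", "f", 1), "o2", "g", 2)
h_ba = apply_write(apply_write(h0, "o2", "g", 2), "o1", "f", 1)
assert h_ab == h_ba  # either order yields the same final heap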

Definition 24 (Step Swap). For any two consecutive steps str1str2 in a computation path p, a step swap of str1str2 is defined as swapping the order of application of rules in the two steps, i.e., apply r2 first and then r1. We let st′r2st′r1 denote a step swap of str1str2.

Definition 25 (Equivalent Step Swap). For two consecutive steps str1str2 in a computation path p, where str1 ∈ pmspt1, str2 ∈ pmspt2, t1 ≠ t2, and str1str2 = (S, r1, S′)(S′, r2, S′′), if the step swap of str1str2, written st′r2st′r1, gives a new path p′ such that p ≡ p′ and st′r2st′r1 = (S, r2, S∗)(S∗, r1, S′′), then it is an equivalent step swap.

Lemma 24 (Bubble-down Lemma). For any path p with any two consecutive steps str1str2, where str1 ∈ pmspt1, str2 ∈ pmspt2, and t1 ≠ t2, if str1 is a local step, then a step swap of str1str2 is an equivalent step swap.

Proof. First, observe that if t2 is a subtask of t1, then it is impossible for str1 to be a local step while str2 is a step of t2, because according to the semantics defined in Figure 11, a task and its subtask never have their local steps consecutive to each other. Figure 20 illustrates all possible cases of how the steps of a task and the steps of its subtask may be laid out in a path. It clearly shows that the local steps of a task t1 and its subtask t2 are always demarcated by some nonlocal steps.

According to Definitions 14 and 15, ≡ is defined by the sequence of labels of nonlocal steps occurring in path p, so a step swap of str1str2 always gives a new path p′ with p′ ≡ p: since str1 is a local step by assumption, swapping str1 with str2 never changes the ordering of labels. Therefore, to show that the step swap of str1str2 is an equivalent step swap, we only need to prove that if str1str2 = (S, r1, S′)(S′, r2, S′′), then the step swap of str1str2 is st′r2st′r1 = (S, r2, S∗)(S∗, r1, S′′).

Because str1 is a local step, it can change at most one object's state on the heap H, and a local step does not change N according to the operational semantics. So we can represent str1 as str1 = ((H1, T1), r1, (H2, T2)), where H1 and H2 differ at most on one object o.

(Case I) str2 is also a local step. In this case, str2 = ((H2, T2), r2, (H3, T3)). Because str1 and str2 are steps of different tasks t1 and t2, str1 and str2 must change different elements of T (t1 and t2 respectively), by inspection of the rules. This means the changes made by str1 and str2 to T always commute. So in Case I we focus only on the changes str1 and str2 make on H, and omit N and T for concision.

Suppose change(str1) = (so1, r1, s′o1) and change(str2) = (so2, r2, s′o2).

If o1 ≠ o2, then it must hold that change(str1str2) = ((so1, so2), r1r2, (s′o1, s′o2)). Swapping the order of str1, str2 by applying r2 first and then r1 results in the change ((so1, so2), r2r1, (s′o1, s′o2)), which has the same start and end states as change(str1str2), regardless of what r1 and r2 are.

If o1 = o2 = o, then change(str1) = (so, r1, s′o), change(str2) = (s′o, r2, s′′o), and change(str1str2) = (so, r1r2, s′′o). Let so = 〈a; L;Acc;Fd〉. By inspection of the rules, this case only arises when both

of the rules are among (R-SyncM), (R-Capture), (R-AsyncM), (R-Subtask), and (R-New).

Subcase a: r1 = (R-New).

Then change(str1) = (null, r1, so), so r2 cannot be (R-New), since o cannot be created twice. And r2 also cannot be any other local rule: o is just created in str1 by t1, and for t2 to be able to access o, it must first obtain a reference to o, which only t1 can pass to it, directly or indirectly. As a result, if str2 is a local step of t2 that operates on o, it cannot be consecutive to str1, and vice versa. Therefore, r1 = (R-New) is not possible.

Subcase b: r1 = (R-SyncM), (R-Capture), (R-AsyncM), or (R-Subtask).

Trivially, r2 cannot be an (R-New) that creates o, because no step can operate on an object before it is created. In this case, change(str1) = (〈a; L;Acc;Fd〉, r1, 〈a; L;Acc ∪ {ot};Fd〉) if ot locks o in str1, or Acc′ = Acc if Acc is already a subset of the ancestors of t1. Either way, there does not exist a consecutive str2 of t2 where r2 is (R-SyncM), (R-AsyncM), (R-Capture), or (R-Subtask): if there were, firing str2 would violate the preconditions of the rule.


[Figure 20: panels (a)–(d) show the possible layouts, within a path, of the steps of a task t1 and its subtask t2; in each panel t1 creates t2 with a nonlocal SUBTASK step, t2 finishes with a nonlocal STEND step, and the local steps of t1 and t2, possibly interleaved with steps of other tasks, are always demarcated by these nonlocal steps.]

Figure 20. Task and Subtask

(Case II) str2 is a nonlocal step.

Let str1 = ((H1, T1), r1, (H2, T2)), where H2 differs from H1 at most on object o, since str1 is a local step. Because str1 and str2 are steps of t1 and t2 respectively, they must change different elements of T , i.e., t1 and t2. str2, as a nonlocal step, may add a new fresh element t to T , but str1 and str2 still obviously commute in terms of changes to T . So in the following proof we need not concern ourselves with commutativity on T .

Subcase a: r2 = (R-Task).

Let ot be the new task created in str2; then it must hold that str2 = ((H2, T2), r2, (H2, T2 ‖ 〈ot; e〉)). The final state change of taking str1 and str2 in this order is ((H1, T1), r1r2, (H2, T2 ‖ 〈ot; e〉)). Swapping str1 and str2 results in the consecutive steps st′1 = ((H1, T1), r2, (H1, T1 ‖ 〈ot; e〉)) and st′2 = ((H1, T1 ‖ 〈ot; e〉), r1, (H2, T2 ‖ 〈ot; e〉)), which makes the combined state change of st′1st′2 equal to ((H1, T1), r2r1, (H2, T2 ‖ 〈ot; e〉)), the same state change as that of str1str2.

Subcase b: r2 = (R-SubTask).

Let t be the new task created in str2; then it must hold that str2 = ((H2, T2), r2, (H2, T3)), and str1str2 = ((H1, T1), r1r2, (H2, T3)). If we apply r2 first, we get st′1 = ((H1, T1), r2, (H1, T2)). Then r1 is applied to get st′2 = ((H1, T2), r1, (H2, T3)). Therefore, st′1st′2 results in a transition ((H1, T1), r2r1, (H2, T3)), which has the same start and final states as str1str2.

Subcase c: r2 = (R-TEnd).

Let str2 = ((H2, T2), r2, (H3, T3)), where H3 = ⊎H2(o)=〈a;L;Acc;Fd〉 (o 7→ 〈a; L;Acc\ot2;Fd〉). Namely,


str2 removes ot2 from the Acc sets of all objects in H2. Consider the case when str1 changes an object o's Fd but not its Acc: taking str1 then str2 has the same state change as taking the two steps in reverse order, because the two steps work on different regions of the heap. If str1 also changes the Acc of o, then it can only add ot1 to the Acc of o, while str2 may only remove ot2 from the Acc of o. Consequently, after swapping the two steps we still get the same result, because the set operations commute since ot1 ≠ ot2.

Subcase d: r2 = (R-STEnd).

In this case str2 = ((H2, T2), r2, (H3, T3)), where H3 = ⊎H2(o)=〈a;L;Acc;Fd〉 (o 7→ 〈a; L;Acc\ot2;Fd〉). Namely,

str2 removes ot2 from the Acc sets of all objects in H2. Consider the case when str1 changes an object o's Fd but not its Acc: taking str1 then str2 has the same state change as taking the two steps in reverse order, because the two steps work on different regions of the heap. If str1 adds ot1 to the Acc of o, swapping the two steps still gives the same result, because the set operations commute since ot1 ≠ ot2.
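The set-operation commutativity invoked in Subcases c and d is elementary; the following one-line check (with made-up task names, purely illustrative) spells it out.

acc = {"t0", "ot2"}
assert (acc | {"ot1"}) - {"ot2"} == (acc - {"ot2"}) | {"ot1"}  # holds since ot1 != ot2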

Given this Lemma we can now directly prove the Quantized Atomicity Theorem.

Theorem 7 (Quantized Atomicity). Quantized atomicity as stated in Thm. 4 holds: for all paths p there exists a quantized path p′ such that p′ ≡ p.

Proof. Proceed by first sorting all pmsp's of p into a well-ordering induced by the ordering of their points in p. Write pmsp(i) for the i-th indexed pmsp in this ordering, and suppose that there are n pmsp's in total in p. We proceed by induction to show, for all i ≤ n, that p is equivalent to a path pi in which the 1st through ith indexed pmsp's have been bubbled to be quantized sub-paths in a prefix of pi: p ≡ pi = pmsp(1) . . . pmsp(i) . . . , where pmsp(k) is quantized for k = 1 . . . i. With this fact, for i = n we have p ≡ pn = pmsp(1) . . . pmsp(n), where pmsp(k) is quantized for k = 1 . . . n, proving the result.

The base case i = 0 is trivial, since the quantized prefix is empty. Assume by induction that all pmsp(k) for k ≤ i have been bubbled into quantized sub-paths, and that the bubbled path pi = pmsp(1) . . . pmsp(i) . . . , where pmsp(k) is quantized for k = 1 . . . i, satisfies pi ≡ p. The path around pmsp(i + 1) includes steps of pmsp(i + 1) and steps of pmsp's with bigger indices. By repeated applications of the Bubble-Down Lemma, all the local steps that do not belong to pmsp(i + 1) can be pushed down past its point, defining a new path pi+1. In pi+1, pmsp(i + 1) is now also a quantized sub-path, and pi+1 ≡ p because pi ≡ p and the Bubble-Down Lemma, which turns pi into pi+1, does not shuffle any nonlocal steps, so pi ≡ pi+1.
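The serialization performed by this proof can also be mimicked operationally. The Python sketch below uses the same hypothetical encoding as before (steps as (task, rule) pairs, nonlocal rule names hard-coded): it reorders a path so that every pmsp becomes contiguous, emitting pmsp's in the order of their points. The Bubble-Down Lemma is what justifies that such a reordering corresponds to a legal, observably equivalent execution.

NONLOCAL = {"R-Task", "R-SubTask", "R-TEnd", "R-STEnd"}

def quantize(path):
    """Return a quantized reordering of path: each pmsp is emitted
    contiguously, and pmsp's are ordered by the position of their
    points (point-less pmsp's of still-running tasks come last)."""
    pending, pmsps = {}, []
    for i, (task, rule) in enumerate(path):
        pending.setdefault(task, []).append(i)
        if rule in NONLOCAL:
            pmsps.append(pending.pop(task))
    pmsps.extend(pending.values())
    pmsps.sort(key=lambda idxs: idxs[-1])  # well-order pmsp's by their points
    return [path[i] for idxs in pmsps for i in idxs]

p = [("t1", "R-Read"), ("t2", "R-Write"), ("t1", "R-Task"), ("t2", "R-TEnd")]
assert quantize(p) == [("t1", "R-Read"), ("t1", "R-Task"),
                       ("t2", "R-Write"), ("t2", "R-TEnd")]

Since every nonlocal step is the point of its pmsp and pmsp's are emitted in point order, the label sequence of nonlocal steps, and hence ob(p), is unchanged.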
