Analysis of Multithreaded Programs
Martin Rinard, Laboratory for Computer Science
Massachusetts Institute of Technology
What is a multithreaded program?
• Multiple parallel threads of control
• Shared mutable memory (threads read and write shared data)
• Lock acquire and release
NOT general parallel programs:
• No message passing
• No tuple spaces
• No functional programs
• No concurrent constraint programs
NOT just multiple threads of control:
• No continuations
• No reactive systems
Why do programmers use threads?
• Performance (parallel computing programs)
  • Single computation
  • Execute subcomputations in parallel
  • Example: parallel sort
• Program structuring mechanism (activity management programs)
  • Multiple activities
  • Thread for each activity
  • Example: web server
• Properties have big impact on analyses
Practical Implications
• Threads are useful and increasingly common
  • POSIX threads standard for C, C++
  • Java has built-in thread support
  • Widely used in industry
• Threads introduce complications
  • Programs viewed as more difficult to develop
  • Analyses must handle new model of execution
  • Lots of interesting and important problems!
Outline
• Examples of multithreaded programs
  • Parallel computing program
  • Activity management program
• Analyses for multithreaded programs
• Handling data races
• Future directions
Parallel Sort
Example - Divide and Conquer Sort

[Figure, built up over several slides: the array 4 7 6 1 5 3 8 2 is divided into quarters (Divide), each quarter is sorted recursively (Conquer), and the sorted quarters are merged (Combine) to yield 1 2 3 4 5 6 7 8.]
Divide and Conquer Algorithms
• Lots of recursively generated concurrency
• Recursively solve subproblems in parallel
• Combine results in parallel
“Sort n Items in d, Using t as Temporary Storage”

void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    spawn sort(d, t, n/4);
    spawn sort(d+n/4, t+n/4, n/4);
    spawn sort(d+2*(n/4), t+2*(n/4), n/4);
    spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
    sync;
    spawn merge(d, d+n/4, d+n/2, t);
    spawn merge(d+n/2, d+3*(n/4), d+n, t+n/2);
    sync;
    merge(t, t+n/2, t+n, d);
  } else insertionSort(d, d+n);
}

The same code is shown repeatedly, highlighting each phase in turn:
• Divide the array into subarrays and recursively sort the subarrays in parallel. Subproblems are identified using pointers into the middle of the array: d, d+n/4, d+n/2, d+3*(n/4).
• Sorted results are written back into the input array d.
• “Merge Sorted Quarters of d Into Halves of t”: the two merge calls combine the sorted quarters of d into the halves of t at t and t+n/2.
• “Merge Sorted Halves of t Back Into d”: the final merge produces the fully sorted array in d.
• “Use a Simple Sort for Small Problem Sizes”: below CUTOFF, insertionSort(d, d+n) sorts the range directly.
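The spawn/sync code above is Cilk-style C. For readers who want to run the idea, here is a rough Java analogue using the fork/join framework (my sketch, not from the talk): a two-way divide instead of the four-way split above, with CUTOFF and the class names chosen for illustration.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ParallelSort extends RecursiveAction {
    static final int CUTOFF = 32;   // illustrative threshold
    final int[] d, t;               // data and temporary storage
    final int lo, hi;               // sort the range d[lo..hi)

    ParallelSort(int[] d, int[] t, int lo, int hi) {
        this.d = d; this.t = t; this.lo = lo; this.hi = hi;
    }

    protected void compute() {
        if (hi - lo <= CUTOFF) { insertionSort(); return; }
        int mid = (lo + hi) / 2;
        // "spawn" both halves, then "sync" before merging
        invokeAll(new ParallelSort(d, t, lo, mid),
                  new ParallelSort(d, t, mid, hi));
        merge(mid);
    }

    void insertionSort() {
        for (int i = lo + 1; i < hi; i++) {
            int v = d[i], j = i - 1;
            while (j >= lo && d[j] > v) { d[j + 1] = d[j]; j--; }
            d[j + 1] = v;
        }
    }

    void merge(int mid) {   // merge the two sorted halves through t
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) t[k++] = d[i] <= d[j] ? d[i++] : d[j++];
        while (i < mid) t[k++] = d[i++];
        while (j < hi)  t[k++] = d[j++];
        System.arraycopy(t, lo, d, lo, hi - lo);
    }

    public static void main(String[] args) {
        int[] d = {4, 7, 6, 1, 5, 3, 8, 2};
        new ForkJoinPool().invoke(new ParallelSort(d, new int[d.length], 0, d.length));
        System.out.println(java.util.Arrays.toString(d));
    }
}

As in the slide's code, the parallelism is structured: both recursive tasks complete before the merge runs, so the tasks update disjoint ranges without synchronization.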
Key Properties of Parallel Computing Programs
• Structured form of multithreading
  • Parallelism confined to small region
  • Single thread coming in
  • Multiple threads exist during computation
  • Single thread going out
• Deterministic computation
  • Tasks update disjoint parts of data structure in parallel without synchronization
  • May also have parallel reductions
Web Server
[Figure, built up across several slides: a Main Loop repeatedly accepts a new connection and starts a new client thread; each of the growing set of Client Threads loops, waiting for input and producing output.]
Main Loop
class Main {
  static public void loop(ServerSocket s) {
    c = new Counter();
    while (true) {
      Socket p = s.accept();
      Worker t = new Worker(p, c);
      t.start();
    }
  }
}

Accept new connection → Start new client thread
Worker threads
class Worker extends Thread {
  Socket s; Counter c;
  public void run() {
    out = s.getOutputStream();
    in = s.getInputStream();
    while (true) {
      inputLine = in.readLine();
      c.increment();
      if (inputLine == null) break;
      out.writeBytes(inputLine + "\n");
    }
  }
}

Wait for input → Increment counter → Produce output
Synchronized Shared Counter

class Counter {
  int contents = 0;
  synchronized void increment() {
    contents++;
  }
}
Acquire lock
Increment counter
Release lock
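In Java the synchronized keyword performs this acquire/increment/release sequence implicitly. A sketch of the same counter with the locking made explicit (my rendering using java.util.concurrent.locks, not from the talk):

import java.util.concurrent.locks.ReentrantLock;

class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int contents = 0;

    void increment() {
        lock.lock();        // acquire lock
        try {
            contents++;     // increment counter
        } finally {
            lock.unlock();  // release lock
        }
    }
}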
Simple Activity Management Programs
• Fixed, small number of threads
• Based on functional decomposition

[Figure: User Interface Thread, Device Management Thread, Compute Thread]
Key Properties of Activity Management Programs
• Threads manage interactions
  • One thread per client or activity
  • Blocking I/O for interactions
• Unstructured form of parallelism
• Object is unit of sharing
  • Mutable shared objects (mutual exclusion)
  • Private objects (no synchronization)
  • Read shared objects (no synchronization)
  • Inherited objects passed from parent to child
Why analyze multithreaded programs?
Discover or certify absence of errors (multithreading introduces new kinds of errors)
Discover or verify application-specific properties (interactions between threads complicate analysis)
Enable optimizations (new kinds of optimizations with multithreading; complications with traditional optimizations)
Classic Errors in Multithreaded Programs
Deadlocks
Data Races
Deadlock
Thread 1:
lock(l);
lock(m);
x = x + y;
unlock(m);
unlock(l);
Thread 2:
lock(m);
lock(l);
y = y * x;
unlock(l);
unlock(m);
Deadlock if circular waiting for resources (typically mutual exclusion locks)
The slides replay the execution step by step:
• Threads 1 and 2 start execution.
• Thread 1 acquires lock l.
• Thread 2 acquires lock m.
• Thread 1 holds l and waits for m, while Thread 2 holds m and waits for l: deadlock.
Data Races

A[i] = v;  ||  A[j] = w;

Data race if two parallel threads access the same memory location and at least one access is a write.

[Figure: two panels show A[i] = v and A[j] = w executing in parallel (data race) versus ordered one after the other (no data race).]
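A minimal Java program exhibiting the definition (an illustrative example, not from the talk): both threads access a[0], at least one access is a write, and no synchronization orders them.

class RaceDemo {
    static int[] a = new int[1];

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> a[0] = 1);   // parallel write
        Thread t2 = new Thread(() -> a[0] = 2);   // parallel write, same location
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(a[0]);                 // nondeterministically 1 or 2
    }
}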
Synchronization and Data Races
Thread 1:
lock(l);
x = x + 1;
unlock(l);
Thread 2:
lock(l);
x = x + 2;
unlock(l);
No data race if synchronization separates accesses
Synchronization protocol: Associate lock with data
Acquire lock to update data atomically
Why are data races errors?
• Correct programs exist that contain races
• But most races are programming errors
  • Code intended to execute atomically
  • Synchronization omitted by mistake
• Consequences can be severe
  • Nondeterministic, timing-dependent errors
  • Data structure corruption
  • Complicates analysis and optimization
Overview of Analyses for Multithreaded Programs
Key problem: interactions between threads
• Flow-insensitive analyses
  • Escape analyses
• Dataflow analyses
  • Explicit parallel flow graphs
  • Interference summary analysis
• State space exploration
Escape Analyses
Program With Allocation Sites

[Figure, repeated across three slides: a program with allocation sites, showing methods main(i,j), compute(d,e), evaluate(i,j), multiplyAdd(a,b,c), multiply(m), add(u,v), abs(r), and scale(n,m).]

Correlate lifetimes of objects with lifetimes of computations

Objects allocated at this site do not escape the computation of this method
Classical Approach
• Reachability analysis
  • If an object is reachable only from local variables of the current procedure, then the object does not escape that procedure
Escape Analysis for Multithreaded Programs
• Extend analysis to recognize when objects do not escape to a parallel thread – OOPSLA 1999
  • Blanchet
  • Bogda, Hoelzle
  • Choi, Gupta, Serrano, Sreedhar, Midkiff
  • Whaley, Rinard
• Analyze interactions to recapture objects that do not escape a multithreaded subcomputation
  • Salcianu, Rinard – PPoPP 2001
Applications
• Synchronization elimination
• Stack allocation
• Region-based allocation
• Data race detection
  • Eliminate accesses to captured objects as a source of data races
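A small example of the kind of opportunity such analyses expose (illustrative, not from the talk). The StringBuffer below is reachable only from a local variable and never escapes the method, so no parallel thread can touch it: its locking can be eliminated, the object can be stack or region allocated, and its accesses can be ruled out as a source of data races.

class EscapeDemo {
    static String buildLine(String key, String value) {
        StringBuffer sb = new StringBuffer();  // allocation site: object is captured
        sb.append(key);                        // StringBuffer methods are synchronized,
        sb.append('=');                        // but the lock is provably uncontended
        sb.append(value);
        return sb.toString();                  // only the resulting String escapes
    }
}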
Analysis via Parallel Flow Graphs
Parallel Flow Graphs

Thread 1: p = &x; *p = &y; p = &z
Thread 2: q = &a; *q = &b

[Figure: the statements of the two threads connected by intrathread control-flow edges and interthread control-flow edges, plus a heap with nodes x, y, z, a, b.]

Basic Idea: Do dataflow analysis on the parallel flow graph
Infeasible Paths Issue

[Figure: the same parallel flow graph, with an infeasible path through the two threads highlighted.]

Infeasible paths cause the analysis to lose precision. Because of the infeasible path, the analysis thinks x → z.
Analysis Time Issue: Potential Solutions

• Partial order approaches – remove edges between statements in independent regions
  • How to recognize independent regions?
  • Seems like we might need an analysis…

[Figure: the Thread 1 / Thread 2 flow graph (p = &x; *p = &y; p = &z  ||  q = &a; *q = &b) with interthread edges between independent regions removed.]

• Control flow/synchronization analysis
  • Synchronization may prevent m from immediately preceding n in execution
  • If so, no edge from m to n

[Figure: Thread 1: y = 1; lock(a); y = y + w; x = x + 1; unlock(a)   Thread 2: x = 1; lock(a); x = x + v; y = y + 1; unlock(a) — no edges between the statements inside the two critical sections.]
Experience
• Lots of research in the field over the last two decades
  • Deadlock detection
  • Data race detection
  • Control analysis for multithreaded programs (mutual exclusion, precedence properties)
  • Finite-state properties
• Scope – simple activity management programs
  • Inlinable programs
  • Bounded threads and objects
References
• FLAVERS
  • Dwyer, Clarke – FSE 1994
  • Naumovich, Avrunin, Clarke – FSE 1999
  • Naumovich, Clarke, Cobleigh – PASTE 1999
• Masticola, Ryder
  • ICPP 1990 (deadlock detection)
  • PPoPP 1993 (control-flow analysis)
• Duesterwald, Soffa – TAV 1991
  • Handles procedures
• Blieberger, Burgstaller, Scholz – Ada-Europe 2000
  • Symbolic analysis for dynamic thread creation

Scope: inlinable programs, bounded objects and threads
Interference Approaches
Dataflow Analysis for Bitvector Problems
• Knoop, Steffen, Vollmer – TOPLAS 1996
• Bitvector problems
  • Dataflow information is a vector of bits
  • Transfer function for one bit does not depend on values of other bits
  • Examples: reaching definitions, available expressions
• As efficient and precise as the sequential version!
Available Expressions Example
[Figure: parbegin — Thread 1: a = x + y; c = x + y — in parallel with Thread 2: x = b; b = x + y — parend; then d = x + y.]

Where is x + y available?
• Not available after x = b (killed by x = b).
• Available again after b = x + y.
• At c = x + y? — ???

Three Interleavings
[Figure: the three interleavings of the two threads; in some interleavings x + y is available at c = x + y, in others it has been killed by x = b.]

Resolution: x + y is not available at c = x + y, or at any other statement that may execute in parallel with x = b, because some interleavings place x = b just before it. At d = x + y, after the join, it is available: on every interleaving b = x + y recomputes x + y after the kill.
Key Concept: Interference
• x = b interferes with x + y
• x + y is not available at any statement that executes in parallel with x = b
• Nice algorithm:
  • Precompute interference
  • Propagate information along sequential control-flow edges only!
  • Handle parallel joins specially

[Figure: the parbegin/parend example again, with the interference from x = b marked.]
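A compact sketch of that algorithm for the single expression x + y (a simplification I wrote for illustration; the real analysis runs over bitvectors of all expressions and handles arbitrary control flow and joins): interference is precomputed, then information is propagated through each thread's straight-line code alone.

class InterferenceAvail {
    // Each statement either computes x+y (GEN), writes one of its operands (KILL),
    // or neither (OTHER).
    enum Kind { GEN, KILL, OTHER }

    // Ordinary sequential forward propagation within one thread.
    static boolean[] forward(Kind[] thread) {
        boolean[] availAfter = new boolean[thread.length];
        boolean avail = false;
        for (int i = 0; i < thread.length; i++) {
            if (thread[i] == Kind.GEN)  avail = true;
            if (thread[i] == Kind.KILL) avail = false;
            availAfter[i] = avail;
        }
        return availAfter;
    }

    static boolean kills(Kind[] t) {
        for (Kind k : t) if (k == Kind.KILL) return true;
        return false;
    }

    public static void main(String[] args) {
        // Thread 1: a = x + y; c = x + y      Thread 2: x = b; b = x + y
        Kind[] t1 = { Kind.GEN, Kind.GEN };
        Kind[] t2 = { Kind.KILL, Kind.GEN };

        // Step 1: precompute interference (does the parallel thread contain a kill?).
        // Step 2: propagate along sequential edges only, then apply interference:
        // x+y is unavailable at every statement parallel to a killing statement.
        boolean[] a1 = forward(t1);
        boolean[] a2 = forward(t2);
        if (kills(t2)) java.util.Arrays.fill(a1, false);
        if (kills(t1)) java.util.Arrays.fill(a2, false);

        System.out.println("thread 1: " + java.util.Arrays.toString(a1));  // [false, false]
        System.out.println("thread 2: " + java.util.Arrays.toString(a2));  // [false, true]
    }
}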
Limitations
• No procedures
• Bitvector problems only (no pointer analysis)
• But can remove these limitations
  • Integrate interference into the abstraction
  • Adjust rules to flow information from the end of a thread to the start of parallel threads
  • Iteratively compute interactions
  • Summary-based approach for procedures
  • Lose precision for non-bitvector problems
Pointer Analysis for Multithreaded Programs
• Dataflow information is a triple ⟨C, I, E⟩:
  • C = current points-to information
  • I = interference points-to edges from parallel threads
  • E = set of points-to edges created by the current thread
• Interference: Ik = ∪ { Ej : j ≠ k }, where t1 … tn are the n parallel threads
• Invariant: I ⊆ C
• Within each thread, interference points-to edges are always added to the current information
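A sketch of the triple as data, with the interference and join rules matching the general parbegin/parend dataflow equations given below (my representation, with points-to edges encoded as strings like "p->x"; not from the talk):

import java.util.HashSet;
import java.util.Set;

class PointsToTriple {
    final Set<String> C;  // current points-to edges
    final Set<String> I;  // interference edges from parallel threads
    final Set<String> E;  // edges created by the current thread

    PointsToTriple(Set<String> c, Set<String> i, Set<String> e) { C = c; I = i; E = e; }

    // Entering a parallel thread: the edges siblingE created by the parallel
    // threads become interference, and E restarts empty for this thread.
    PointsToTriple enterThread(Set<String> siblingE) {
        return new PointsToTriple(union(C, siblingE), union(I, siblingE), new HashSet<>());
    }

    // Join at parend: the parent resumes with both threads' current information
    // and accumulates all created edges; its own interference I is unchanged.
    static PointsToTriple join(PointsToTriple t1, PointsToTriple t2,
                               PointsToTriple parent) {
        return new PointsToTriple(union(t1.C, t2.C), parent.I,
                                  union(parent.E, union(t1.E, t2.E)));
    }

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a); r.addAll(b); return r;
    }
}

The invariant I ⊆ C holds because enterThread adds every interference edge to C as well.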
Analysis for Example

p = &x;
parbegin
  *p = 2;   ||   p = &y; *p = 1;
parend

Where does p point to at each *p statement?

The slides step through the analysis with ⟨C, I, E⟩ triples:
• After p = &x in the parent thread: ⟨{p→x}, ∅, {p→x}⟩.
• Entering each parallel thread, the edges created by the other thread become interference. The thread executing p = &y; *p = 1 creates the edge p→y; at *p = 1, p points only to y.
• The thread executing *p = 2 sees interference {p→y}, so its current information is {p→x, p→y}: at *p = 2, p may point to x or to y.
• At the thread join, the results are combined: after parend, p → {x, y}, and the created edges are {p→x, p→y}.
General Dataflow Equations

Parent thread before parbegin: ⟨C, I, E⟩

Entering the parallel threads, each thread sees the other's created edges as interference:
  Thread 1 entry: ⟨C ∪ E2, I ∪ E2, ∅⟩
  Thread 2 entry: ⟨C ∪ E1, I ∪ E1, ∅⟩

At the ends of the threads:
  Thread 1: ⟨C1, I ∪ E2, E1⟩
  Thread 2: ⟨C2, I ∪ E1, E2⟩

After parend, the parent thread continues with:
  ⟨C1 ∪ C2, I, E ∪ E1 ∪ E2⟩
Compositionality Extension
• Compositional at thread level
  • Analyze each thread once in isolation
  • Abstraction captures potential interactions
  • Compute interactions whenever information is needed
• Combine with escape analysis to obtain partial program analysis
Experience & Expectations
• Limited implementation experience
  • Pointer analysis (Rugina, Rinard – PLDI 2000)
  • Compositional pointer and escape analysis (Salcianu, Rinard – PPoPP 2001)
  • Small but real programs
• Promising approach
  • Scales like analyses for sequential programs
  • Partial program analyses
Issues
• Developing abstractions
  • Need interference abstraction
  • Need fork/join rules
  • Need interaction analysis
• Analysis time
• Precision for richer abstractions
State Space Exploration
State Space Exploration for Multithreaded Programs
/* a controls x, b controls y */
lock a, b;
int x, y;

Thread 1:
lock(a); lock(b);
t = x; x = y; y = t;
unlock(b); unlock(a);

Thread 2:
lock(b); lock(a);
s = y; y = x; x = s;
unlock(a); unlock(b);

[Figure: the explored state space of interleaved lock acquisitions (1: lock(a), 2: lock(b), …), with the deadlocked states marked.]
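A toy explicit-state exploration of this example (my sketch, nothing like a production model checker): a state is the pair of thread program counters; a state where some thread is unfinished but no thread can take a step is deadlocked. The data statements are omitted since only the lock operations matter for deadlock.

import java.util.*;

class DeadlockExplorer {
    // +x means lock(x), -x means unlock(x): the two threads from the slide.
    static String[][] threads = {
        {"+a", "+b", "-b", "-a"},   // Thread 1
        {"+b", "+a", "-a", "-b"}    // Thread 2
    };

    // Locks held by thread t after executing its first pc operations.
    static Set<String> held(int t, int pc) {
        Set<String> h = new HashSet<>();
        for (int i = 0; i < pc; i++) {
            String op = threads[t][i];
            if (op.charAt(0) == '+') h.add(op.substring(1));
            else h.remove(op.substring(1));
        }
        return h;
    }

    public static void main(String[] args) {
        Deque<int[]> work = new ArrayDeque<>();
        Set<List<Integer>> seen = new HashSet<>();
        work.push(new int[]{0, 0});
        while (!work.isEmpty()) {
            int[] s = work.pop();
            if (!seen.add(Arrays.asList(s[0], s[1]))) continue;
            Set<String> allHeld = held(0, s[0]);
            allHeld.addAll(held(1, s[1]));
            boolean stuck = true, done = true;
            for (int t = 0; t < 2; t++) {
                if (s[t] == threads[t].length) continue;   // thread finished
                done = false;
                String op = threads[t][s[t]];
                if (op.charAt(0) == '+' && allHeld.contains(op.substring(1)))
                    continue;                              // acquire blocked
                int[] next = s.clone(); next[t]++;         // take one step
                work.push(next);
                stuck = false;
            }
            if (stuck && !done)
                System.out.println("Deadlocked state: pcs " + Arrays.toString(s));
        }
    }
}

Running it reports the single deadlocked state [1, 1]: Thread 1 holds a and waits for b while Thread 2 holds b and waits for a.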
Strengths
• Conceptually simple (at least at first…)
• Harmony with other areas of computer science (simple search often beats more sophisticated approaches)
• Can test for lots of properties and errors
• Lots of technology and momentum in this area
  • Packaged model checkers
  • Big successes in hardware verification
Challenges
• Analysis time
• Unbounded program features
  • Dynamic thread creation
  • Dynamic object creation
• Potential solutions
  • Sophisticated abstractions (increases complexity…)
    • Cousot, Cousot – 1984
    • Chow, Harrison – POPL 1992
    • Yahav – POPL 2001
  • Granularity coarsening/partial-order techniques
    • Chow, Harrison – ICCL 1994
    • Valmari – CAV 1990
    • Godefroid, Wolper – LICS 1991
Granularity Coarsening
[Figure: the many interleavings of x = 1; y = 2 with a = 3; b = 4 collapse into a single coarse step.]

Basic Idea: Eliminate analysis of interleavings of independent statements
Issue: Aliasing
x = 1   ||   *p = 3

Are these two statements independent? Depends…

Potential Solution: Layered analysis (Ball, Rajamani – PLDI 2001)
Potential Problem: Information from later analyses may be needed or useful in earlier analyses

[Figure: Program → Pointer Analysis → Model Extraction → Model Checking → Properties]
Experience
• Program analysis style
  • Has been used for very detailed properties
  • Analysis time issues limit it to tiny programs
• Explicit model extraction/model checking style
  • Still exploring how to make it work for software in general, not just multithreaded programs
  • No special technology required for multithreaded programs (at first…)
Expectations
In principle, the approach should be quite useful
• Multithreaded programs typically have sparse interaction patterns
• Just not obvious from the code
• Need some way to target the tool at only those interactions that can actually occur/are interesting
• Pointer preanalysis seems like a promising approach
Application to safety problems
• Deadlock detection
  • Variety of existing approaches
  • Complex programs can have very simple synchronization behavior
  • Ripe for model extraction/model checking
• Data race detection
  • More complicated problem
  • Largely unsolved
  • Very important in practice
Why data races are so important
• Inadvertent atomicity violations
  • Timing-dependent data structure corruption
  • Nondeterministic, irreproducible failures
• Architecture effects
  • Data races expose weak memory consistency models
  • Destroy abstraction of a single shared memory
• Compiler optimization effects
  • Data races expose effects of standard optimizations
  • Compiler can change the meaning of the program
• Analysis complications
Atomicity Violations
class list {
  static int length = 0;
  static list head = null;
  list next; int value;
  static void insert(int i) {
    list n = new list(i);
    n.next = head;
    head = n;
    length++;
  }
}

[Figure, built up across several slides: starting from a list containing 4 with length = 1, two parallel calls insert(5) || insert(6) interleave. Both new nodes set next to the old head (the node 4); head ends up pointing to only one of them, so the other node is lost, yet length is incremented twice and reaches 3. The list structure and the length field are now inconsistent.]
Atomicity Violation Solution

class list {
  static int length = 0;
  static list head = null;
  list next; int value;
  static synchronized void insert(int i) {
    list n = new list(i);
    n.next = head;
    head = n;
    length++;
  }
}

With insert synchronized, the parallel calls insert(5) || insert(6) execute atomically, and the list and length stay consistent.
Analysis Complications
Analysis is unsound if it does not take the effect of data races into account
• Desirable to analyze the program at the granularity of atomic operations
  • Reduces state space
  • Required to extract interesting properties
• But must verify that operations are atomic!
  • Complicated analysis problem
  • Extract locking protocol
  • Verify that program obeys protocol
Architecture Effects
Weak Memory Consistency Models

Initially: y = 1; x = 0

Thread 1: y = 0; x = 1        Thread 2: z = x + y

What is the value of z?

Three Interleavings
[Figure: the three interleavings of Thread 1's writes with Thread 2's read give z = 1, z = 0, and z = 1.]

z can be 0 or 1 — INCORRECT REASONING!

z can be 0 or 1 OR 2! The memory system can reorder writes as long as it preserves the illusion of sequential execution within each thread. Different threads can observe different orders!
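A Java rendering of the example (illustrative, not from the talk). Because the fields are plain and the accesses race, the Java memory model lets Thread 2 observe the writes out of order: z = 2 is a legal outcome, exactly as the slide argues.

class WeakMemoryDemo {
    static int x = 0, y = 1, z;   // plain, non-volatile fields

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { y = 0; x = 1; });
        Thread t2 = new Thread(() -> { z = x + y; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Interleaving reasoning predicts 0 or 1, but Thread 2 may see
        // x = 1 while still seeing the old y = 1, giving z = 2.
        System.out.println(z);
    }
}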
Analysis Complications
• Interleaving semantics is incorrect
  • No soundness guarantee for current analyses
  • Formal semantics of weak memory consistency models still under development
    • Maessen, Arvind, Shen – OOPSLA 2000
    • Manson, Pugh – Java Grande/ISCOPE 2001
  • Unclear how to prove ANY analysis sound…
• State space is larger than one might think
  • Complicates state space exploration
  • Complicates human reasoning
How does one write a correct program?
Initially: y = 1; x = 0

Thread 1: lock(l); y = 0; x = 1; unlock(l)      Thread 2: lock(l); z = x + y; unlock(l)

What is the value of z?

Operations are not reordered across synchronizations. If synchronization separates conflicting accesses from parallel threads, then the reorderings are not visible: race-free programs can use interleaving semantics. Here z is 1.
Compiler Optimization Effects
• Standard optimizations assume a single thread
• With interleaving semantics, optimizations may change the meaning of the program
• Even if optimizations are applied only within serial parts of the program!
  • Superset of reordering effects
  • Midkiff, Padua – ICPP 1990
Options
• Rethink and reimplement all compilers
  • Lee, Padua, Midkiff – PPoPP 1999
• Transform the program to restore the sequential memory consistency model
  • Shasha, Snir – TOPLAS 1988
  • Lee, Padua – PACT 2000
• No optimizations across synchronizations
  • Java memory model (Pugh – Java Grande 1999)
  • Semantics is no longer an interleaving semantics
Program Analysis: analyze the program, verify absence of data races
• Appealing option
• Unlikely to be feasible for the full range of programs
  • Must reconstruct the association between locks, the data they protect, and the threads that access the data
    • Dynamic object and thread creation
    • References and pointers
    • Diversity of locking protocols
  • Whole-program analysis
  • Exception: simple activity management programs
Eliminate races at language level
• Type system formalizes sharing patterns
• Check that accesses are properly synchronized
• Not as difficult as the fully automatic approach
  • Separate analysis of each module
  • No need to reconstruct the locking protocol
  • Types provide locking information
• Limits the sharing patterns a program can use
  • Key question: Is the limitation worth the benefit?
  • Depends on expressiveness, flexibility, intrusiveness, perceived value of the system
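A sketch of what such type-level information can look like in Java, using a hypothetical @GuardedBy annotation in the spirit of the systems cited below (the annotation and the checker that would enforce it are assumptions of this sketch, not a real API):

import java.lang.annotation.*;

@Retention(RetentionPolicy.CLASS)
@Target(ElementType.FIELD)
@interface GuardedBy { String value(); }   // hypothetical: names the protecting lock

class Account {
    private final Object balanceLock = new Object();

    @GuardedBy("balanceLock")   // a checker would reject any unguarded access
    private int balance;

    void deposit(int amount) {
        synchronized (balanceLock) {   // lock held, so the access type-checks
            balance += amount;
        }
    }
}

Because the types name the lock, each module can be checked separately: there is no need to reconstruct the locking protocol by whole-program analysis.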
Standard Sharing Patterns for Activity Management Programs
Private data – single-thread ownership
Mutual exclusion data – lock protects data; acquire lock to get ownership
Migrating data – ownership moves between threads in response to data structure insertions and removals
Published data – distributed for read-only access
General Principle of Ownership
• Formalize as an ownership relation
  • Relation between data items and threads
• Basic requirement for reads
  • When a thread reads a data item, it must own the item (but can share ownership with other threads)
• Basic requirement for writes
  • When a thread writes a data item, it must be the sole owner of the item
Typical Actions to Change Ownership
Object creation (creator owns the new object)
Synchronization operations
  Lock acquire (acquire the data that the lock protects)
  Lock release (release the data)
  Similarly for post/wait, Ada accept, …
Thread creation (thread inherits data from parent)
Thread termination (parent gets the data back)
Unique reference acquisition and release (acquire or release the referenced data)
Proposed Systems
• Monitors + copy in/copy out
  • Concurrent Pascal (Brinch Hansen – TSE 1975)
  • Guava (Bacon, Strom, Tarafdar – OOPSLA 2000)
• Mutual exclusion data + private data
  • Flanagan, Abadi – ESOP 2000
  • Flanagan, Freund – PLDI 2000
• Mutual exclusion data + private data + linear/ownership types
  • DeLine, Fähndrich – PLDI 2001
  • Boyapati, Rinard – OOPSLA 2001
Basic Approach

Thread + Private Data
• Private data identified as such in the type system
• Type system ensures it is reachable only from
  • Local variables
  • Other private data

Lock + Shared Data
• Type system identifies the correspondence between locks and the data they protect
• Type system ensures
  • Threads hold the lock when they access the data
  • Data is accessible only from other data protected by the same lock

Communication uses a copy model.

Extension: Unique References
• Type system ensures at most one reference to such an object
• Ownership transfer, shown step by step:
  • Step One: grab the lock
  • Step Two: transfer the unique reference
  • Step Three: release the lock
  • Result: the object has been transferred; the ownership relation changes over time
Prospects
• Remaining challenge: general data structures
  • Objects with multiple references
  • Ownership changes correlated with movement between data structures
  • Recognize insertions and deletions
• Language-level solutions are the way to go for activity management programs
  • Tractable for typical sharing patterns
  • Big impact in practice
Benefits of ownership formalization
• Identification of atomic regions
  • Weak memory invisible to the programmer
  • Enables coarse-grain program analysis
• Promotes lots of new and interesting analyses
  • Component interaction analyses
  • Object propagation analyses
• Better understanding of software structure
  • Analysis and transformation
  • Software engineering
What about parallel computing programs?
Parallel Computing Sharing Patterns
Specialized Sharing Patterns
• Unsynchronized accesses to disjoint regions of a single aggregate structure
  • Threads update disjoint regions of an array
  • Threads update disjoint subtrees
• Generalized reductions
  • Commuting updates
  • Reduction trees
Parallel Computing Prospects
• No language-level solution is likely to be feasible
  • Race freedom depends on arbitrarily complicated properties of the updated data structures
• Impact of data races not as large
  • Parallelism confined to specific algorithms
• Range of targeted analysis algorithms
  • Parallel loops with dense matrices
  • Divide and conquer programs
  • Generalized reduction recognition
Future Directions
Integrating Specifications
• Past focus: discovering properties
• Future focus: verifying properties
• Understanding the atomicity structure is crucial
  • Assume race-free programs
  • Type system or previous analysis
• Enables Owicki/Gries style verification
  • Assume the property holds
  • Show that each atomic action preserves it
  • Consider only actions that affect the property
Failure Containment
• Threads as the unit of partial failure
  • Partial executions of failed atomic actions
  • Rollback mechanism
  • Optimization opportunity
• New analyses and transformations
  • Failure propagation analysis
  • Failure response transformations
Model Checking
• Avalanche of model checking research
• Layered analyses for model extraction
  • Flow-insensitive pointer analysis
• Initial focus on control problems
  • Deadlock detection
  • Operation sequencing constraints
  • Checking finite-state properties
Steps towards practicality
• Java threads prompt experimentation
  • Threads as a standard part of a safe language
  • Available multithreaded benchmarks
  • Open Java implementation platforms
• More implementations
  • Interprocedural analyses
  • Scalability emerges as a key concern
  • Directs analyses to relevant problems
Summary
• Multithreaded programs common and important
• Two kinds of multithreaded programs
  • Parallel computing programs
  • Activity management programs
• Data races as the key analysis problem
  • Programming errors
  • Complicate analysis and transformation
• Different solutions for different programs
  • Language solution for activity management
  • Targeted analyses for parallel computing
• Future directions – specifications, failure containment, model checking, practical implementations