scalable and precise dynamic datarace detection for structured parallelism

48
Scalable and Precise Dynamic Datarace Detection for Structured Parallelism Raghavan Raman Jisheng Zhao Vivek Sarkar Rice University June 13, 2012 Martin Vechev Eran Yahav ETH Zürich Technion

Upload: langer

Post on 22-Feb-2016

47 views

Category:

Documents


0 download

DESCRIPTION

Scalable and Precise Dynamic Datarace Detection for Structured Parallelism. Raghavan Raman Jisheng Zhao Vivek Sarkar Rice University. Martin Vechev Eran Yahav ETH Z ü rich Technion. June 13, 2012. Parallel Programming. Parallel programming is inherently hard - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

Scalable and Precise Dynamic Datarace Detection

for Structured Parallelism

Raghavan Raman Jisheng Zhao Vivek SarkarRice University

June 13, 2012

Martin Vechev Eran YahavETH Zürich Technion

Page 2: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

2

Parallel Programming• Parallel programming is inherently hard– Need to reason about large number of

interleavings

• Dataraces are a major source of errors– Manifest only in some of the possible schedules– Hard to detect, reproduce, and correct

Page 3: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

3

Limitations of Past Work• Worst case linear space and time overhead per

memory access• Report false positives and/or false negatives• Dependent on scheduling techniques• Require sequentialization of input programs

Page 4: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

4

Structured Parallelism• Trend in newer programming languages: Cilk,

X10, Habanero Java (HJ), ...– Simplifies reasoning about parallel programs– Benefits: deadlock freedom, simpler analysis

• Datarace detection for structured parallelism– Different from that for unstructured parallelism– Logical parallelism is much larger than number of

processors

Page 5: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

5

Contribution• First practical datarace detector which is

parallel with constant space overhead– Scalable– Sound and Precise

Page 6: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

6

Structured Parallelism in X10, HJ• async <stmt>– Creates a new task that executes <stmt>

• finish <stmt>– Waits for all tasks spawned in <stmt> to complete

Page 7: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

7

SPD3: Scalable Precise Dynamic Datarace Detection

• Identifying parallel accesses– Dynamic Program Structure Tree (DPST)

• Identifying interfering accesses– Access Summary

Page 8: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

8

Dynamic Program Structure Tree (DPST)

• Maintains parent-child relationships among async, finish, and step instances– A step is a maximal sequence of statements with

no async or finish

Page 9: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

9

DPST Examplefinish { // F1

F1

Page 10: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

10

DPST Examplefinish { // F1

S1; F1

S1

Page 11: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

11

DPST Examplefinish { // F1

S1;async { // A1

}

F1

A1S1

Page 12: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

12

DPST Examplefinish { // F1

S1;async { // A1

} S5;

F1

A1S1 S5

Page 13: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

13

DPST Examplefinish { // F1

S1;async { // A1

async { // A2

}} S5;

F1

A1

A2

S1 S5

Page 14: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

14

DPST Examplefinish { // F1

S1;async { // A1

async { // A2

}async { // A3

}

} S5;

F1

A1

A3A2

S1 S5

Page 15: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

15

DPST Examplefinish { // F1

S1;async { // A1

async { // A2

}async { // A3

} S4;

} S5;

F1

A1

A3A2

S1

S4

S5

Page 16: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

16

DPST Examplefinish { // F1

S1;async { // A1

async { // A2S2;

}async { // A3

} S4;

} S5;

F1

A1

A3A2

S1

S2

S4

S5

Page 17: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

17

DPST Examplefinish { // F1

S1;async { // A1

async { // A2S2;

}async { // A3

S3;}

S4; } S5;

F1

A1

A3A2

S1

S2 S3

S4

S5

Page 18: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

18

DPST Examplefinish { // F1

S1;async { // A1

async { // A2S2;

}async { // A3

S3;}

S4; } S5;async { // A4

S6;}

}

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Page 19: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

19

DPST Examplefinish { // F1

S1;async { // A1

async { // A2S2;

}async { // A3

S3;}

S4; } S5;async { // A4

S6;}

}

F1

A1 A4

A3

Left-to-right ordering of children

A2

S1

S2 S3

S4

S5

S6

1: 2: 3: 4: 5: 6: 7: 8: 9:10:11:12:13:14:15:16:

Page 20: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

20

DPST Operations• InsertChild (Node n, Node p)– O(1) time– No synchronization needed

• DMHP (Node n1, Node n2)– O(H) time• H = height(LCA(n1, n2))

– DMHP = Dynamic May Happen in Parallel

Page 21: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

21

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Page 22: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

22

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Page 23: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

23

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

LCA (S3, S6)

Page 24: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

24

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Child of F1 that is ancestor of S3

Page 25: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

25

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

A1 is an async => DMHP (S3, S6) = true

Page 26: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

26

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Page 27: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

27

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

LCA (S5, S6)

Page 28: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

28

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Child of F1 that is ancestor of S5

Page 29: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

29

Identifying Parallel Accesses using DPST

DMHP (S1, S2)

1) L = LCA (S1, S2)2) C = child of L that is

ancestor of S1

3) If C is async return true

Else return false

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

S5 is NOT an async => DMHP (S5, S6) = false

Page 30: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

30

Access SummaryProgram Memory

M Ms.w Ms.r1 Ms.r2

Shadow Memory… …

Page 31: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

31

Access SummaryProgram Memory

M Ms.w Ms.r1 Ms.r2

Shadow Memory

A Step Instance that Wrote M

Page 32: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

32

Access SummaryProgram Memory

M Ms.w Ms.r1 Ms.r2

Shadow Memory

Two Step Instances that Read M

Page 33: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

33

Access Summary Operations

• WriteCheck (Step S, Memory M)– Check for access that interferes with a write of M by S

• ReadCheck (Step S, Memory M)– Check for access that interferes with a read of M by S

Page 34: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

34

SPD3 Example

= M = M

= M M =

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Page 35: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

35

SPD3 Example

= M = M

= M M =

Executing Step

Ms.r1 Ms.r2 Ms.w

S1 null null null

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Page 36: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

36

SPD3 Example

= M = M

= M M =

Executing Step

Ms.r1 Ms.r2 Ms.w

S1 null null null

S4 (Read M) S4 null null

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Update Ms.r1

Page 37: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

37

SPD3 Example

= M = M

= M M =

Executing Step

Ms.r1 Ms.r2 Ms.w

S1 null null null

S4 (Read M) S4 null null

S3 (Read M) S4 S3 null

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Update Ms.r2

Page 38: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

38

SPD3 Example

= M = M

= M M =

Executing Step

Ms.r1 Ms.r2 Ms.w

S1 null null null

S4 (Read M) S4 null null

S3 (Read M) S4 S3 null

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

S3, S4 stand for subtree under LCA(S3,S4)

Page 39: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

39

SPD3 Example

= M = M

= M M =

Executing Step

Ms.r1 Ms.r2 Ms.w

S1 null null null

S4 (Read M) S4 null null

S3 (Read M) S4 S3 null

S2 (Read M) S4 S3 null

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

S2 is in the subtree under LCA(S3, S4)

=> Ignore S2

Page 40: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

40

SPD3 Example

= M = M

= M M =

Executing Step

Ms.r1 Ms.r2 Ms.w

S1 null null null

S4 (Read M) S4 null null

S3 (Read M) S4 S3 null

S2 (Read M) S4 S3 null

S6 (Write M)

F1

A1 A4

A3A2

S1

S2 S3

S4

S5

S6

Report a Read-Write Datarace between steps S4 and S6

Page 41: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

41

SPD3 Algorithm• At async, finish, and step boundaries– Update the DPST

• On every access to a memory M, atomically– Read the fields of its shadow memory, Ms

– Perform ReadCheck or WriteCheck as appropriate– Update the fields of Ms, if necessary

Page 42: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

42

Space Overhead

• DPST: O(a+f)– ‘a’ is the number of async instances– ‘f’ is the number of finish instances

• Shadow Memory: O(1) per memory location

Page 43: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

43

Empirical Evaluation• Experimental Setup– 16-core (4x4) Intel Xeon 2.4GHz system

• 30 GB memory• Red Hat Linux (RHEL 5)• Sun Hotspot JDK 1.6

– All benchmarks written in HJ using only Finish/Async constructs• Executed using the adaptive work-stealing runtime

– SPD3 algorithm • Implemented in Java with static optimizations

Page 44: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

44

SPD3 is Scalable

Crypt

LUFa

ct

MolDyn

MonteCarlo

RayTrac

erSe

ries

SOR

Spars

eMatMult FF

THealt

h

Nqueens

Strass

en

Fannku

ch

Mandelbrot

Matmul

GeoMean0.002.004.006.008.00

10.0012.0014.0016.0018.00

1-thread 2-thread 4-thread 8-thread 16-thread

Slow

dow

n re

lativ

e to

resp

ectiv

e (w

.r.t.

num

ber o

f thr

eads

) uni

nstr

umen

ted

Page 45: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

45

Estimated Peak Heap Memory Usageon 16-threads

Crypt

LUFa

ct

MolDyn

MonteCarlo

RayTrac

erSe

ries

SOR

Spars

e0

100020003000400050006000700080009000

10000

Eraser FastTrack SPD3

Estim

ated

Pea

k He

ap M

emor

y U

sage

(M

B)

Page 46: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

46

Estimated Peak Heap Memory Usage LUFact Benchmark

1 2 4 8 160

500

1000

1500

2000

2500

3000

Eraser FastTrack2 SPD3

Number of threads

Estim

ated

Pea

k He

ap M

emor

y U

sage

(M

B)

Page 47: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

47

Related Work: A ComparisonProperties OTFDAA Offset-

SpanSP-bags SP-hybrid FastTrack ESP-bags SPD3

Target Language Nested Fork-Join & Synchronization

operations

Nested Fork-Join

Spawn-Sync

Spawn-Sync

Unstructured Fork-Join

Async-Finish

Async-Finish

Space Overhead per memory location

O(n) O(1) O(1) O(1) O(N) O(1) O(1)

Guarantees Per-Schedule Per-Input Per-Input Per-Input Per-Input Per-Input Per-Input

Empirical Evaluation No Minimal Yes No Yes Yes Yes

Execute Program in Parallel

Yes Yes No Yes Yes No Yes

Dependent on Scheduling technique

No No No Yes No No No

OTFDAA – On the fly detection of access anomalies (PLDI ’89)n – number of threads executing the programN – maximum logical concurrency in the program

Page 48: Scalable and Precise Dynamic  Datarace  Detection for Structured Parallelism

48

Summary• First practical datarace detector which is

parallel with constant space overhead– Dynamic Program Structure Tree– Access Summary