work-stealing tree data structure

58
Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky 1

Upload: aleksandar-prokopec

Post on 05-Jul-2015

372 views

Category:

Software


2 download

DESCRIPTION

The basics of the work-stealing tree data structure, presented at LCPC 2013.

TRANSCRIPT

Page 1: Work-stealing Tree Data Structure

Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

Aleksandar Prokopec Martin Odersky

1

Page 2: Work-stealing Tree Data Structure

Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

Aleksandar Prokopec Martin Odersky

Irregular Data-Parallel

2

Page 3: Work-stealing Tree Data Structure

Uniform workload

(0 until 10000000) reduce (+)

3

Page 4: Work-stealing Tree Data Structure

Uniform workload

(0 until 10000000) reduce (+)

sum = sum + x

4

Page 5: Work-stealing Tree Data Structure

Uniform workload

(0 until 10000000) reduce (+)

sum = sum + x

N

cycles

5

Page 6: Work-stealing Tree Data Structure

Baseline workload

for (0 until 10000000) {}

N

cycles

6

Page 7: Work-stealing Tree Data Structure

Irregular workload

7

Page 8: Work-stealing Tree Data Structure

Irregular workload

N

cycles

8

Page 9: Work-stealing Tree Data Structure

Irregular workload

for {

x <- 0 until width

y <- 0 until height

} image(x, y) = compute(x, y)

N

cycles

9

Page 10: Work-stealing Tree Data Structure

Irregular workload

for {

x <- 0 until width

y <- 0 until height

} image(x, y) = compute(x, y) image(x, y) = compute(x, y)

N

cycles

10

Page 11: Work-stealing Tree Data Structure

Workload function

workload(n) – work spent on element n after the data-parallel operation completed

11

Page 12: Work-stealing Tree Data Structure

Workload function

Could be…

Runtime value dependent

for {

x <- 0 until width

y <- 0 until height

} img(x, y) = compute(x, y)

workload(n) – work spent on element n after the data-parallel operation completed

12

Page 13: Work-stealing Tree Data Structure

Workload function

Could be…

Execution-schedule dependent

for (n <- nodes)

n.neighbours += new Node

workload(n) – work spent on element n after the data-parallel operation completed

13

Page 14: Work-stealing Tree Data Structure

Workload function

Could be…

Totally random for ((x, y) <- img.indices)

img(x, y) = sample(

x + random(),

y + random()

)

workload(n) – work spent on element n after the data-parallel operation completed

14

Page 15: Work-stealing Tree Data Structure

Data-parallel scheduler

Assign loop elements to workers without knowledge about the workload function.

15

Page 16: Work-stealing Tree Data Structure

Data-parallel scheduler

1. Linear speedup for the baseline workload

Assign loop elements to workers without knowledge about the workload function.

16

Page 17: Work-stealing Tree Data Structure

Data-parallel scheduler

1. Linear speedup for the baseline workload 2. Optimal speedup for irregular workloads

Assign loop elements to workers without knowledge about the workload function.

17

Page 18: Work-stealing Tree Data Structure

Static batching

Decides on the worker-element assignment before the data-parallel operation begins.

N

cycles

18

Page 19: Work-stealing Tree Data Structure

Static batching

Decides on the worker-element assignment before the data-parallel operation begins.

No knowledge → divide uniformly.

Not optimal for even mildly irregular workloads.

N

cycles

19

Page 20: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

progress

20

Page 21: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

0

21

Page 22: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

2 T0: CAS

T0

22

Page 23: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

4 T1: CAS

T0 T1

23

Page 24: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

6 T0: CAS

T0 T1

24

Page 25: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

8 T0: CAS

T0 T1

25

Page 26: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

10 T0: CAS

T0 T1

26

Page 27: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

12 T0: CAS

T0 T1

27

Page 28: Work-stealing Tree Data Structure

Fixed-size batching

Workload-driven – decides during execution.

N

cycles

progress

Pros: lightweight Cons: minimum batch size, contention

28

Page 29: Work-stealing Tree Data Structure

Fixed-size batching - contention

29

Page 30: Work-stealing Tree Data Structure

Factoring, GSS, TS

Batch size varies.

N

cycles

progress

Pros: lightweight Cons: contention

30

Page 31: Work-stealing Tree Data Structure

Task-based work-stealing

N

cycles

0..2 2..4 4..8 8..16

31

Page 32: Work-stealing Tree Data Structure

Task-based work-stealing

N

cycles

0..2 2..4 4..8 8..16

2..4

4..8

8..16

T0 T1 0..2

32

Page 33: Work-stealing Tree Data Structure

Task-based work-stealing

N

cycles

0..2 2..4 4..8 8..16

2..4

4..8

8..16

T0 T1 0..2

steal – a rare event

33

Page 34: Work-stealing Tree Data Structure

Task-based work-stealing

N

cycles

0..2 2..4 4..8 8..16

2..4

4..8

8..16

T0 T1

10..12

12..16

8..10 0..2

34

Page 35: Work-stealing Tree Data Structure

Task-based work-stealing

Pros: can be adaptive - uses stealing information Cons: heavyweight - minimum batch size much larger

N

cycles

0..2 2..4 4..8 8..16

2..4

4..8

8..16

T0 T1

10..12

12..16

0..2 8..10

35

Page 36: Work-stealing Tree Data Structure

Task-based work-stealing

N

cycles

0..2 2..4 4..8 8..16

Cannot be stolen after T0 starts processing it

36

Page 37: Work-stealing Tree Data Structure

Work-stealing tree

0 0 T0 N

owned

37

Page 38: Work-stealing Tree Data Structure

Work-stealing tree

0 0 T0 N 0 50 T0 N

owned owned

T0: CAS

38

Page 39: Work-stealing Tree Data Structure

Work-stealing tree

0 0 T0 N 0 50 T0 N 0 N T0 N …

owned owned completed

T0: CAS T0: CAS

What about stealing?

39

Page 40: Work-stealing Tree Data Structure

Work-stealing tree

0 0 T0 N 0 50 T0 N 0 N T0 N …

owned owned completed

0 -51 T0 N

T0: CAS

T1: CAS

stolen

T0: CAS

40

Page 41: Work-stealing Tree Data Structure

Work-stealing tree

0 50 T0 N 0 N T0 N …

owned completed

0 -51 T0 N

T0: CAS

stolen

T0: CAS

0 0 T0 N

owned

T1: CAS

41

Page 42: Work-stealing Tree Data Structure

Work-stealing tree

0 50 T0 N 0 N T0 N …

owned completed

0 -51 T0 N

T0: CAS

stolen

0 -51 T0 N

expanded

50 50 T0 M M M T1 N

T0: CAS

0 0 T0 N

owned

M = (50 + N) / 2 42

Page 43: Work-stealing Tree Data Structure

Work-stealing tree

0 50 T0 N 0 N T0 N …

owned completed

0 -51 T0 N

T0: CAS

stolen

0 -51 T0 N

expanded

50 50 T0 M M M T1 N

T0: CAS

0 0 T0 N

owned

M = (50 + N) / 2

T0 or T1: CAS

43

Page 44: Work-stealing Tree Data Structure

Work-stealing tree

0 50 T0 N 0 N T0 N …

owned completed

0 -51 T0 N

T0: CAS

stolen

0 -51 T0 N

expanded

50 50 T0 M M M T1 N

T0 or T1: CAS

T0: CAS

0 0 T0 N

owned

M = (50 + N) / 2 44

Page 45: Work-stealing Tree Data Structure

Work-stealing tree - contention

45

Page 46: Work-stealing Tree Data Structure

Work-stealing tree scheduling

1) find either a non-expanded, non-completed node

2) if not found, terminate

3) if not owned, steal and/or expand, and descend

4) advance until node is completed or stolen

5) go to 1)

50

Page 47: Work-stealing Tree Data Structure

Work-stealing tree scheduling

1) find either a non-expanded, non-completed node

2) if not found, terminate

3) if not owned, steal and/or expand, and descend

4) advance until node is completed or stolen

5) go to 1)

1) find either a non-expanded, non-completed node

51

Page 48: Work-stealing Tree Data Structure

Choosing the node to steal

Find first, in-order traversal

2 9

5

3

52

Page 49: Work-stealing Tree Data Structure

Choosing the node to steal

Find first, in-order traversal

2 9

5

3

Catastrophic – a lot of stealing, huge trees

53

Page 50: Work-stealing Tree Data Structure

Choosing the node to steal

Find first, in-order traversal Find first, random order traversal

2 9

5

3

2 9

5

3

Catastrophic – a lot of stealing, huge trees

54

Page 51: Work-stealing Tree Data Structure

Choosing the node to steal

Find first, in-order traversal Find first, random order traversal

2 9

5

3

2 9

5

3

Catastrophic – a lot of stealing, huge trees

Works reasonably well.

55

Page 52: Work-stealing Tree Data Structure

Choosing the node to steal

Find first, in-order traversal Find first, random order traversal Find most elements

2 9

5

3

2 9

5

3

2 9

5

3

Catastrophic – a lot of stealing, huge trees

Works reasonably well. Generates least nodes. Seems to be best.

56

Page 53: Work-stealing Tree Data Structure

Comparison with fixed-size batching

57

Page 54: Work-stealing Tree Data Structure

Comparison with fixed-size batching

58

Page 55: Work-stealing Tree Data Structure

Comparison with task work-stealing

59

Page 56: Work-stealing Tree Data Structure

Thank you!

Questions?

60

Page 57: Work-stealing Tree Data Structure

Finding work

61

Page 58: Work-stealing Tree Data Structure

Other workloads

62