6.189 multicore programming primer, january … › courses › electrical-engineering-and...6.189...

44
MIT OpenCourseWare http://ocw.mit.edu 6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation format: Saman Amarasinghe, 6.189 Multicore Programming Primer, January (IAP) 2007. (Massachusetts Institute of Technology: MIT OpenCourseWare). http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative Commons Attribution-Noncommercial-Share Alike. Note: Please use the actual date you accessed this material in your citation. For more information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

Upload: others

Post on 29-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

MIT OpenCourseWare http://ocw.mit.edu

6.189 Multicore Programming Primer, January (IAP) 2007

Please use the following citation format:

Saman Amarasinghe, 6.189 Multicore Programming Primer, January (IAP) 2007. (Massachusetts Institute of Technology: MIT OpenCourseWare). http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative Commons Attribution-Noncommercial-Share Alike.

Note: Please use the actual date you accessed this material in your citation.

For more information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

Page 2: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

6.189 IAP 2007

Lecture 12

StreamIt Parallelizing Compiler

Prof. Saman Amarasinghe, MIT. 1 6.189 IAP 2007 MIT

Page 3: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Common Machine Language

● Represent common properties of architectures � Necessary for performance

● Abstract away differences in architectures � Necessary for portability

● Cannot be too complex � Must keep in mind the typical programmer

● C and Fortran were the common machine languages foruniprocessors � Imperative languages are not the correct abstraction for

parallel architectures. ● What is the correct abstraction for parallel multicore

machines?

Prof. Saman Amarasinghe, MIT. 2 6.189 IAP 2007 MIT

Page 4: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Common Machine Language for Multicores

● Current offerings: � OpenMP � MPI � High Performance Fortran

● Explicit parallel constructs grafted onto imperative language● Language features obscured:

� Composability � Malleability � Debugging

● Huge additional burden on programmer: � Introducing parallelism � Correctness of parallelism � Optimizing parallelism

Prof. Saman Amarasinghe, MIT. 3 6.189 IAP 2007 MIT

Page 5: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Explicit Parallelism

● Programmer controls details of parallelism! ● Granularity decisions:

� if too small, lots of synchronization and thread creation � if too large, bad locality

● Load balancing decisions � Create balanced parallel sections (not data-parallel)

● Locality decisions � Sharing and communication structure

● Synchronization decisions � barriers, atomicity, critical sections, order, flushing

● For mass adoption, we need a better paradigm: � Where the parallelism is natural � Exposes the necessary information to the compiler

Prof. Saman Amarasinghe, MIT. 4 6.189 IAP 2007 MIT

Page 6: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Unburden the Programmer

● Move these decisions to compiler!� Granularity � Load Balancing � Locality � Synchronization

● Hard to do in traditional languages� Can a novel language help?

Prof. Saman Amarasinghe, MIT. 5 6.189 IAP 2007 MIT

Page 7: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Properties of Stream Programs

● Regular and repeating computation

● Synchronous Data Flow ● Independent actors

with explicit communication ● Data items have short lifetimes

Benefits: ● Naturally parallel ● Expose dependencies to compiler● Enable powerful transformations

6.189 IAP 2007 MIT

Speaker

Prof. Saman Amarasinghe, MIT. 6

Adder

AtoD

FMDemod

LPF1

Duplicate

RoundRobin

LPF2 LPF3

HPF1 HPF2 HPF3

Page 8: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Outline

● Why we need New Languages?

● Static Schedule ● Three Types of Parallelism ● Exploiting Parallelism

Prof. Saman Amarasinghe, MIT. 7 6.189 IAP 2007 MIT

Page 9: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Steady-State Schedule

● All data pop/push rates are constant ● Can find a Steady-State Schedule

� # of items in the buffers are the same before and the after executing the schedule

� There exist a unique minimum steady state schedule

● Schedule = { }

A B C

… push=2

pop=3 push=1

pop=2 …

Prof. Saman Amarasinghe, MIT. 8 6.189 IAP 2007 MIT

Page 10: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

…push=2

Steady-State Schedule

● All data pop/push rates are constant ● Can find a Steady-State Schedule

� # of items in the buffers are the same before and the after executing the schedule

� There exist a unique minimum steady state schedule

● Schedule = { A }

A B C

… push=2

pop=3 push=1

pop=2 …

Prof. Saman Amarasinghe, MIT. 9 6.189 IAP 2007 MIT

Page 11: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

…push=2

Steady-State Schedule

● All data pop/push rates are constant ● Can find a Steady-State Schedule

� # of items in the buffers are the same before and the after executing the schedule

� There exist a unique minimum steady state schedule

● Schedule = { A, A }

A B C

… push=2

pop=3 push=1

pop=2 …

Prof. Saman Amarasinghe, MIT. 10 6.189 IAP 2007 MIT

Page 12: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

pop=3push=1

Steady-State Schedule

● All data pop/push rates are constant ● Can find a Steady-State Schedule

� # of items in the buffers are the same before and the after executing the schedule

� There exist a unique minimum steady state schedule

● Schedule = { A, A, B }

A B C

… push=2

pop=2 …

pop=3 push=1

Prof. Saman Amarasinghe, MIT. 11 6.189 IAP 2007 MIT

Page 13: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

…push=2

Steady-State Schedule

● All data pop/push rates are constant ● Can find a Steady-State Schedule

� # of items in the buffers are the same before and the after executing the schedule

� There exist a unique minimum steady state schedule

● Schedule = { A, A, B, A }

A B C

… push=2

pop=3 push=1

pop=2 …

Prof. Saman Amarasinghe, MIT. 12 6.189 IAP 2007 MIT

Page 14: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

pop=3push=1

Steady-State Schedule

● All data pop/push rates are constant ● Can find a Steady-State Schedule

� # of items in the buffers are the same before and the after executing the schedule

� There exist a unique minimum steady state schedule

● Schedule = { A, A, B, A, B }

A B C

… push=2

pop=2 …

pop=3 push=1

Prof. Saman Amarasinghe, MIT. 13 6.189 IAP 2007 MIT

Page 15: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

pop=2…

Steady-State Schedule

● All data pop/push rates are constant ● Can find a Steady-State Schedule

� # of items in the buffers are the same before and the after executing the schedule

� There exist a unique minimum steady state schedule

● Schedule = { A, A, B, A, B, C }

A B C

… push=2

pop=3 push=1

pop=2 …

Prof. Saman Amarasinghe, MIT. 14 6.189 IAP 2007 MIT

Page 16: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Initialization Schedule

● When peek > pop, buffer cannot be empty after firing a filter

● Buffers are not empty at the beginning/end of the steady state schedule

● Need to fill the buffers before starting the steady state execution

peek=4 pop=3 push=1

Prof. Saman Amarasinghe, MIT. 15 6.189 IAP 2007 MIT

Page 17: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

peek=4pop=3push=1

Initialization Schedule

● When peek > pop, buffer cannot be empty after firing a filter

● Buffers are not empty at the beginning/end of the steady state schedule

● Need to fill the buffers before starting the steady state execution

peek=4 pop=3 push=1

Prof. Saman Amarasinghe, MIT. 16 6.189 IAP 2007 MIT

Page 18: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Outline

● Why we need New Languages?

● Static Schedule ● Three Types of Parallelism ● Exploiting Parallelism

Prof. Saman Amarasinghe, MIT. 17 6.189 IAP 2007 MIT

Page 19: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Data ParallelismPeel iterations of filter, place within scatter/gather pair (fission)parallelize filters with state

Pipeline ParallelismBetween producers and consumersStateful filters can be parallelized

Types of Parallelism

Scatter

Gather

Task Parallelism � Parallelism explicit in algorithm � Between filters without producer/consumer

relationship

Task

Prof. Saman Amarasinghe, MIT. 18 6.189 IAP 2007 MIT

Page 20: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Types of Parallelism

Scatter

Gather

Scatter

Gather

Pip

elin

e

Data Parallel

Data

Task

Task Parallelism � Parallelism explicit in algorithm � Between filters without producer/consumer

relationship

Data Parallelism � Between iterations of a stateless filter � Place within scatter/gather pair (fission) � Can’t parallelize filters with state

Pipeline Parallelism � Between producers and consumers � Stateful filters can be parallelized

Prof. Saman Amarasinghe, MIT. 19 6.189 IAP 2007 MIT

Page 21: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Types of Parallelism

Scatter

Gather

Scatter

Gather

Task

Pip

elin

e

Data

Traditionally:

Task Parallelism � Thread (fork/join) parallelism

Data Parallelism � Data parallel loop (forall)

Pipeline Parallelism � Usually exploited in hardware

Prof. Saman Amarasinghe, MIT. 20 6.189 IAP 2007 MIT

Page 22: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Outline

● Why we need New Languages?

● Static Schedule ● Three Types of Parallelism ● Exploiting Parallelism

Prof. Saman Amarasinghe, MIT. 21 6.189 IAP 2007 MIT

Page 23: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Baseline 1: Task Parallelism

Adder

Splitter

Joiner

Compress

BandPass

Expand

Process

BandStop

Compress

BandPass

Expand

Process

BandStop

● Inherent task parallelism between two processing pipelines

● Task Parallel Model: � Only parallelize explicit task

parallelism � Fork/join parallelism

● Execute this on a 2 core machine ~2x speedup over single core

● What about 4, 16, 1024, … cores?

Prof. Saman Amarasinghe, MIT. 22 6.189 IAP 2007 MIT

Page 24: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Raw Microprocessor16 inorder, single-issue cores with D$ and I$

16 memory banks, each bank with DMA

Evaluation: Task Parallelism

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

Bitonic

Sort

Chann

elVoc

oder

DCT

DES

FFT Filte

rbank

FMRadio

Serpe

nt

TDE MPEG2D

ecod

er

Vocod

er

Radar

Geo

metricM

ean

Thro

ughp

ut N

orm

aliz

ed to

Sin

gle

Cor

e St

ream

It

Cycle accurate simulator

Parallelism: Not matched to target! Synchronization: Not matched to target!

Prof. Saman Amarasinghe, MIT. 23 6.189 IAP 2007 MIT

Image by MIT OpenCourseWare.

Page 25: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Adder

ExpandExpandExpandExpandExpandExpand

Baseline 2: Fine-Grained Data Parallelism

Splitter

Joiner

BandStopBandStopBandStopAdder Splitter

Joiner

ProcessProcessProcess

Joiner

BandPassBandPassBandPass

CompressCompressCompress

BandStopBandStopBandStop

Expand

BandStop

Splitter

Joiner

Splitter

Process

BandPass

Compress

Splitter

Joiner

Splitter

Joiner

Splitter

Joiner

ProcessProcessProcessProcess

Joiner

BandPassBandPassBandPassBandPass

CompressCompressCompressCompress

BandStopBandStopBandStopBandStop

Expand

Splitter

Joiner

Splitter

Splitter

Joiner

Splitter

Joiner

Splitter

Joiner

● Each of the filters in the example are stateless

● Fine-grained Data Parallel Model: � Fiss each stateless filter N ways

(N is number of cores) � Remove scatter/gather if

possible ● We can introduce data parallelism

� Example: 4 cores ● Each fission group occupies entire

machine

Prof. Saman Amarasinghe, MIT. 24 6.189 IAP 2007 MIT

Page 26: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Evaluation: Fine-Grained Data Parallelism

0

1

2

3 4

5

6

7

8

9 10

11

12

13

14

15 16

17

18

19

Bitonic

Sort

Channe

lVocod

er

DCT

DES

FFT

Filterban

k

FMRadio

Serpen

t

TDE MPEG2Deco

der

Vocod

er

Radar

Ge

Thro

ughp

ut N

orm

aliz

ed to

Sin

gle

Cor

e St

ream

It

Task Fine-Grained Data

Good ParallelisToo Much Synchronizatio

ometr

icMea

n

m! n!

Prof. Saman Amarasinghe, MIT. 25 6.189 IAP 2007 MIT

Image by MIT OpenCourseWare.

Page 27: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

● Resultant stream graph is mapped to hardware� One filter per core

● What about 4, 16, 1024, cores?

Baseline 3: Hardware Pipeline Parallelism

Adder

Splitter

Joiner

Compress

BandPass

Expand

Process

BandStop

Compress

BandPass

Expand

Process

BandStop

● The BandPass and BandStop filters contain all the work

● Hardware Pipelining � Use a greedy algorithm to

fuse adjacent filters � Want # filters <= # cores

● Example: 8 Cores

Prof. Saman Amarasinghe, MIT. 26 6.189 IAP 2007 MIT

Page 28: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Baseline 3: Hardware Pipeline Parallelism

Adder

Splitter

Joiner

BandPass

Compress Process Expand

BandStop

BandPass

BandStop

Compress Process Expand

● The BandPass and BandStop filters contain all the work

● Hardware Pipelining � Use a greedy algorithm to

fuse adjacent filters � Want # filters <= # cores

● Example: 8 Cores ● Resultant stream graph is

mapped to hardware � One filter per core

● What about 4, 16, 1024, cores?� Performance dependent on

fusing to a load-balanced stream graph

Prof. Saman Amarasinghe, MIT. 27 6.189 IAP 2007 MIT

Page 29: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Evaluation: Hardware Pipeline Parallelism

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

Bitonic

Sort

Channe

lVocod

er

DCT

DES

FFT

Filterban

k

FMRadio

Serpen

t

TDE MPEG2Deco

der

Vocod

er

Radar

Geometr

icMea

n

Thro

ughp

ut N

orm

aliz

ed to

Sin

gle

Cor

e St

ream

It

Task Fine-Grained Data Hardware Pipelining

Parallelism: Not matched to target! Synchronization: Not matched to target!

Prof. Saman Amarasinghe, MIT. 28 6.189 IAP 2007 MIT

Image by MIT OpenCourseWare.

Page 30: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

The StreamIt Compiler

Coarsen Data Software Granularity Parallelize Pipeline

1. Coarsen: Fuse stateless sections of the graph 2. Data Parallelize: parallelize stateless filters 3. Software Pipeline: parallelize stateful filters

Compile to a 16 core architecture � 11.2x mean throughput speedup over single core

Prof. Saman Amarasinghe, MIT. 29 6.189 IAP 2007 MIT

Page 31: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Phase 1: Coarsen the Stream Graph

Splitter

BandPass Peek BandPass

Compress

Peek

Peek

Expand

BandStop

Process

Peek

Expand

BandStop

Process

Compress

Joiner

Adder

● Before data-parallelism is exploited ● Fuse stateless pipelines as much

as possible without introducing state � Don’t fuse stateless with stateful � Don’t fuse a peeking filter with

anything upstream

Prof. Saman Amarasinghe, MIT. 30 6.189 IAP 2007 MIT

Page 32: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Phase 1: Coarsen the Stream Graph

Splitter

Joiner

BandPass Compress Process Expand

BandPass Compress Process Expand

BandStop BandStop

Adder

● Before data-parallelism is exploited ● Fuse stateless pipelines as much

as possible without introducing state � Don’t fuse stateless with stateful � Don’t fuse a peeking filter with

anything upstream ● Benefits:

� Reduces global communication and synchronization

� Exposes inter-node optimization opportunities

Prof. Saman Amarasinghe, MIT. 31 6.189 IAP 2007 MIT

Page 33: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

AdderAdderAdder

Phase 2: Data Parallelize

Splitter

BandPass Compress

BandPass

Process Expand

Process Compress

Expand

Data Parallelize for 4 cores

BandStop

Fiss 4 ways, to occupy entire chip

Prof. Saman Amarasinghe, MIT. 32 6.189 IAP 2007 MIT

Joiner

Adder

BandStop

Splitter

Joiner

Page 34: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

AdderAdderAdder

ProcesExpand

ProcesExpand

Phase 2: Data Parallelize

Splitter

Joiner

Adder

BandPass Compress

s

Splitter

Joiner

BandPass Compress Process Expand

BandPass Compress

s

Splitter

Joiner

BandPass Compress Process Expand

BandStop BandStop

Splitter

Joiner

Data Parallelize for 4 cores

Task parallelism! Each fused filter does equal work Fiss each filter 2 times to occupy entire chip

Prof. Saman Amarasinghe, MIT. 33 6.189 IAP 2007 MIT

Page 35: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

AdderAdderAdder

ProcesExpand

ProcesExpand

Phase 2: Data Parallelize

BandStop BandStop

Splitter

Joiner

Adder

BandPass Compress

s

Splitter

Joiner

BandPass Compress Process Expand

BandPass Compress

s

Splitter

Joiner

BandPass Compress Process Expand

Splitter

Joiner

BandStop

Splitter

Joiner

BandStop

Splitter

Joiner

Data Parallelize for 4 cores ● Task-conscious data parallelization

� Preserve task parallelism ● Benefits:

� Reduces global communication and synchronization

Task parallelism, each filter does equal workFiss each filter 2 times to occupy entire chip

Prof. Saman Amarasinghe, MIT. 34 6.189 IAP 2007 MIT

Page 36: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Evaluation: Coarse-Grained Data Parallelism

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

Bitonic

Sort Cha

nnelVocoder

DCT

DES

FFT Filte

rbank

FMRadio

Serpent

TDE MPEG2Dec

oder Voco

der

Radar

Geometric

Mean Th

roug

hput

Nor

mal

ized

to S

ingl

e C

ore

Stre

amIt

Task Fine-Grained Data Hardware Pipelining Coarse-Grained Task + Data

Good Parallelism! Low Synchronization!

Prof. Saman Amarasinghe, MIT. 35 6.189 IAP 2007 MIT

Image by MIT OpenCourseWare.

Page 37: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

6

Simplified VocoderSplitter

Joiner

AdaptDFT AdaptDFT 6

RectPolar

Splitter

Splitter

Amplify

Diff

UnWrap

Accum

Joiner

Joiner

PolarRect

20

20

22

Diff

Unwrap

11

Data Parallel

Amplify

Accum

1 1 Data Parallel, but too little work! 11

Data Parallel Target a 4 core machine

Prof. Saman Amarasinghe, MIT. 36 6.189 IAP 2007 MIT

Page 38: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

RectPoRectPoRectPo

RectPoRectPoRectPo

Data Parallelize

6

larlarlar

Splitter

Joiner

AdaptDFT AdaptDFT

Splitter

Splitter

Amplify

Diff

UnWrap

Accum

Amplify

Diff

Unwrap

Accum

Joiner

RectPolar

6

5Splitter

Joiner

larlarlarPolarRect

Splitter

Joiner

Joiner 20

20

2

1

1

1

5

2

1

1

1

Target a 4 core machine

Prof. Saman Amarasinghe, MIT. 37 6.189 IAP 2007 MIT

Page 39: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

RectPo

Data + Task Parallel Execution

6Cores

2Time

1 211

1

Splitter

Joiner

Splitter

Splitter

Joiner

Splitter

Joiner

lar Splitter

Joiner

Joiner

6

2

1

1

1

5

5 Target 4 core machine

Prof. Saman Amarasinghe, MIT. 38 6.189 IAP 2007 MIT

Page 40: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

RectPo

We Can Do Better!

6Cores

2 16Time 1

1

1

Splitter

Joiner

Splitter

Splitter

Joiner

Splitter

Joiner

lar Splitter

Joiner

Joiner

6

2

1

1

1

5

5

Target 4 core machine

Prof. Saman Amarasinghe, MIT. 39 6.189 IAP 2007 MIT

Page 41: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

RectPolar

RectPolar

RectPolar

RectPolar

Phase 3: Coarse-Grained Software Pipelining

Prof. Saman Amarasinghe, MIT. 40 6.189 IAP 2007 MIT

Prologue

New Steady

State

● New steady-state is free of dependencies

● Schedule new steady-state using a greedy partitioning

Page 42: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Greedy Partitioning

To Schedule: Cores

Time

Target 4 core machine

16

Prof. Saman Amarasinghe, MIT. 41 6.189 IAP 2007 MIT

Page 43: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

42 6.189 IAP 2007 MITProf. Saman Amarasinghe, MIT.

Coarse-Grained Task + Data + Software Pipeline

Evaluation: Coarse-Grained Task + Data + Software Pipelining

0 1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19

Bitonic

Sort Cha

nnelVocoder

DCT

DES

FFT

Filterba

nk

FMRadio

Serpent

TDE MPEG2Dec

oder

Vocoder

Radar

GeometricM

ean

Thro

ughp

ut N

orm

aliz

ed to

Sin

gle

Cor

e St

ream

It

Task Fine-Grained Data Hardware Pipelining Coarse-Grained Task + Data

Best Parallelism! Lowest Synchronization!

Image by MIT OpenCourseWare.

Page 44: 6.189 Multicore Programming Primer, January … › courses › electrical-engineering-and...6.189 Multicore Programming Primer, January (IAP) 2007 Please use the following citation

Summary

● Streaming model naturally exposes task, data, and pipeline parallelism

● This parallelism must be exploited at the correct granularity and combined correctly

Task Fine-

Grained Data

Hardware Pipelining

Coarse-Grained Task + Data

Coarse-Grained Task + Data + Software

Pipeline

Parallelism Application Dependent

Good Application Dependent

Good Best

Synchronization Application Dependent

High Application Dependent

Low Lowest

● Robust speedups across varied benchmark suite

Prof. Saman Amarasinghe, MIT. 43 6.189 IAP 2007 MIT