SUVINAY SUBRAMANIAN, MARK C. JEFFREY, MALEEN ABEYDEERA, HYUN RYONG LEE, VICTOR A. YING, JOEL EMER, DANIEL SANCHEZ
ISCA 2017
FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM
Current speculative systems scale poorly
Speculative parallelization, e.g., transactional memory (TM), simplifies parallel programming
But it performs poorly on real-world applications… because these applications comprise large atomic tasks
Large atomic tasks limit performance
Database transaction: query X … update Z … query U … update V
Millions of cycles
Prone to aborts
Challenging to track
Serial (misses parallelism)
Large atomic tasks have abundant nested parallelism!
[Figure: the transaction's operations (qry X, upd Z, qry U, upd V, …) decomposed into many nested fine-grain tasks (qry K, qry Y, upd J, qry S, qry M, upd L, …)]
How to:
◦ extract parallelism?
◦ maintain atomicity?
◦ achieve high performance?
Prior TMs fail to exploit nested parallelism
1. Merging of “nested” speculative state with parent
◦ Large speculative state, prone to aborts
2. Cyclic dependence between parent and nested children
◦ Deadlock and livelock issues
[Figure: transactions X and Y executing over time on cores 1 to 4, with X's nested tasks A and B]
See the paper for more details!
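Ordering tasks to guarantee atomicity
[Figure: on cores 1 to 4, X and Y are ordered 1 and 2, and X's nested tasks A and B are ordered 1.1 and 1.2, so A and B run in parallel yet fall between X and Y in the order]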
Fractal decouples atomicity from parallelism
1. Decouples the unit of atomicity from the unit of parallelism
◦ Domain: all tasks belonging to a domain appear to execute atomically
2. Implementation guarantees atomicity by ordering tasks
◦ No merging of speculative state
Benefits of Fractal
Tiny tasks
Easy to track
Composable speculative parallelism
Fractal Execution Model
DECOUPLING ATOMICITY FROM PARALLELISM
Domains to group tasks into atomic units
Fractal programs consist of atomic tasks
◦ Tasks may access arbitrary data
◦ Tasks may create child tasks
◦ Tasks belong to a hierarchy of nested domains
Semantics across domains
Each task:
◦ can create a single subdomain
◦ can enqueue child tasks to its subdomain or to its current domain
All tasks in a domain, together with the domain's creator, appear to execute as a single atomic unit
[Figure: root domain with tasks X and Y, whose subdomains contain tasks A to E and L to P, respectively]
Semantics within a domain
Unordered
◦ Arbitrary order while respecting parent-child dependences
Timestamp-ordered
◦ Tasks appear to execute in increasing timestamp order
◦ Children appear to execute after parent
[Figure: example domains under the root domain, with timestamps (1, 2, 3, 10, 12) labeling the tasks of a timestamp-ordered domain]
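The two orderings can be phrased as checks on the order in which a domain's tasks appear to execute. The C++ sketch below is an illustration, not Fractal code: `DomainType`, `Task`, and `valid_order` are hypothetical names, and each task is assumed to record at most one parent inside the same domain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model of the tasks of a single domain (not Fractal's API).
enum class DomainType { Unordered, TimestampOrdered };

struct Task {
  int id;
  int parent;          // id of the parent task, or -1 if it has none in this domain
  uint64_t timestamp;  // ignored in unordered domains
};

// Returns true if `order` (tasks in the order they appeared to execute)
// is allowed by the semantics of a domain of the given type.
bool valid_order(DomainType type, const std::vector<Task>& order) {
  std::unordered_map<int, std::size_t> pos;  // task id -> position in `order`
  for (std::size_t i = 0; i < order.size(); ++i) pos[order[i].id] = i;

  for (std::size_t i = 0; i < order.size(); ++i) {
    const Task& t = order[i];
    // Both domain types: a child must appear after its parent.
    auto p = pos.find(t.parent);
    if (t.parent != -1 && p != pos.end() && p->second > i) return false;
    // Timestamp-ordered domains: timestamps must never decrease.
    if (type == DomainType::TimestampOrdered && i > 0 &&
        order[i - 1].timestamp > t.timestamp)
      return false;
  }
  return true;
}
```

For a parent at timestamp 1 with children at timestamps 10 and 2, the order parent, 10, 2 is valid in an unordered domain but not in a timestamp-ordered one, which requires parent, 2, 10.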
Fractal software API
Creating and enqueuing tasks:
fractal::enqueue(function_pointer, timestamp, arguments...);
Creating sub-domains:
fractal::create_subdomain(<domain_type>);
High-level programming interface, e.g., forall(), callcc(), parallel_reduce()
Example: Database transactions in Fractal
[Figure: TXN1 (query X, query Z, update Z, query U, update V) and TXN2 (query A, query B, update C, update Z, update K) are tasks T1 and T2 in the root domain; each creates a subdomain holding its five fine-grain tasks (qry X … upd V and qry A … upd K) with timestamps 1 to 5]
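To make the API and this example concrete, here is a single-threaded toy model in C++. Everything in it is hypothetical: `toy_fractal::enqueue`, `create_subdomain`, `run`, and `run_txn_example` are made-up names that only loosely mirror `fractal::enqueue` and `fractal::create_subdomain`. Tasks are ordered by a fractal virtual time (VT), assumed here to be the creator's VT with the child's timestamp appended and compared lexicographically. The toy runs tasks one at a time in that order, which is the order a correct speculative execution must appear to follow; it models no speculation, no parallelism, and no unordered domains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy, single-threaded stand-in for the Fractal API. All names are hypothetical.
namespace toy_fractal {

using VT = std::vector<uint64_t>;  // one timestamp per domain level, root first

struct Task {
  VT vt;
  std::function<void()> fn;
};

// Makes the priority queue pop the smallest VT first (lexicographic order).
struct LaterVT {
  bool operator()(const Task& a, const Task& b) const { return a.vt > b.vt; }
};

std::priority_queue<Task, std::vector<Task>, LaterVT> pending;
VT current;                 // VT of the running task; empty before run()
bool has_subdomain = true;  // before run(), enqueue() targets the root domain

// Loosely mirrors fractal::create_subdomain(); only timestamp-ordered domains.
void create_subdomain() { has_subdomain = true; }

// Loosely mirrors fractal::enqueue(): the child's VT is its creator's VT with
// the child's timestamp appended. This toy only enqueues into a subdomain.
void enqueue(uint64_t timestamp, std::function<void()> fn) {
  assert(has_subdomain && "toy model: call create_subdomain() first");
  VT child = current;
  child.push_back(timestamp);
  pending.push(Task{std::move(child), std::move(fn)});
}

// Runs every pending task, smallest VT first.
void run() {
  while (!pending.empty()) {
    Task t = pending.top();
    pending.pop();
    current = t.vt;
    has_subdomain = false;
    t.fn();
  }
  current.clear();
  has_subdomain = true;
}

std::string vt_string(const VT& vt) {  // e.g. {1, 3} -> "1:3"
  std::string s;
  for (std::size_t i = 0; i < vt.size(); ++i) s += (i ? ":" : "") + std::to_string(vt[i]);
  return s;
}

}  // namespace toy_fractal

// The slide's example: TXN1 and TXN2 are root-domain tasks (timestamps 1 and 2);
// each creates a subdomain with five tasks at timestamps 1-5. Tasks are enqueued
// in scrambled order to show that VTs, not enqueue order, decide execution order.
std::vector<std::string> run_txn_example() {
  using namespace toy_fractal;
  const std::vector<std::vector<std::string>> ops = {
      {"qry X", "qry Z", "upd Z", "qry U", "upd V"},   // TXN1
      {"qry A", "qry B", "upd C", "upd Z", "upd K"}};  // TXN2
  std::vector<std::string> trace;
  for (uint64_t txn : {2, 1}) {
    enqueue(txn, [txn, &ops, &trace] {
      create_subdomain();
      for (uint64_t i = 5; i > 0; --i) {
        std::string op = ops[txn - 1][i - 1];
        enqueue(i, [op, &trace] { trace.push_back(vt_string(current) + " " + op); });
      }
    });
  }
  run();
  return trace;
}
```

Calling `run_txn_example()` yields all five TXN1 tasks ("1:1 qry X" through "1:5 upd V") before any TXN2 task, even though TXN2 was enqueued first, because every VT that starts with 1 sorts before every VT that starts with 2.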
Fractal Implementation
ATOMICITY THROUGH ORDERING
Fractal Virtual Time (VT)
Fractal assigns a fractal virtual time (VT) to each task
◦ Captures the ordering of tasks across domains and within a domain
Fractal VT = 45:23:108:…:9, a sequence of domain VTs, one per level of domain nesting
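Ordering tasks by fractal VT amounts to comparing domain VTs left to right, with a VT that is a strict prefix of another ordering first, so a task precedes every task in its subdomain. The C++ sketch below assumes that representation (a `std::vector` of integers, root domain first); `FractalVT`, `precedes`, and `vt_order` are hypothetical names, and how the hardware assigns domain VTs dynamically is not modeled. The test reuses the earlier ordering example: X = 1, A = 1.1, B = 1.2, Y = 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Assumed representation: one domain VT per nesting level, root domain first.
using FractalVT = std::vector<uint64_t>;

// True if a task with VT `a` must appear to execute before one with VT `b`.
// Domain VTs are compared left to right; if one VT is a prefix of the other,
// the shorter one (the creator of the subdomain) comes first.
bool precedes(const FractalVT& a, const FractalVT& b) {
  return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// Returns the task names in the order they must appear to execute.
std::vector<std::string> vt_order(std::vector<std::pair<std::string, FractalVT>> tasks) {
  std::sort(tasks.begin(), tasks.end(),
            [](const auto& x, const auto& y) { return precedes(x.second, y.second); });
  std::vector<std::string> names;
  for (const auto& t : tasks) names.push_back(t.first);
  return names;
}
```

In the database example, every TXN1 task (1:1 … 1:5) precedes every TXN2 task (2:1 … 2:5) under this comparison, which is how ordering alone keeps each transaction atomic. Since `std::vector` already compares lexicographically, `a < b` does the same job; the named function only makes the rule explicit.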
Example: Database transactions in Fractal
[Figure: TXN1 and TXN2 are root-domain tasks T1 (VT 1) and T2 (VT 2); each transaction's subdomain holds its five fine-grain tasks, whose fractal VTs are listed below]
TXN1: qry X 1:1, qry Z 1:2, upd Z 1:3, qry U 1:4, upd V 1:5
TXN2: qry A 2:1, qry B 2:2, upd C 2:3, upd Z 2:4, upd K 2:5
Fractal VT captures all ordering information
Swarm [MICRO’15]: An efficient substrate for ordered speculation
Swarm executes tasks speculatively and out of order
◦ Large hardware task queues
◦ Scalable ordered speculation
◦ Scalable ordered commits
Efficiently supports tiny speculative tasks
[Figure: 64-tile, 256-core chip with memory/IO controllers; tile organization: 4 cores with private L1I/D caches, a per-tile L2, an L3 slice, a router, and a task unit]
Fractal features
Fractal VT construction requires no centralized structures
Fractal VT assigns order dynamically
Hardware supports a small number of concurrent depths
◦ “Zooming” operations allow for unbounded nesting
◦ Spill tasks from shallower domains to memory
◦ Parallelism compounds quickly with depth
See the paper for more details!
[Figure: timeline on cores 1 to 4 of TXN1 = X (query X, query Z, update Z, query U, update V; task T1, VT 1) and TXN2 = Y (query A, query B, update C, update Z, update K; task T2, VT 2). Their fine-grain tasks (qry X 1:1, qry Z 1:2, upd Z 1:3, qry U 1:4, upd V 1:5 and qry A 2:1, qry B 2:2, upd C 2:3, upd Z 2:4, upd K 2:5) execute speculatively and out of order.]
Abort task: upd Z (2:4) executes before upd Z (1:3); when upd Z (1:3) then accesses Z, only upd Z (2:4) is aborted and re-executed
Tracking and conflict detection at the level of fine-grain tasks
◦ Task-level tracking
◦ Task-level conflict detection (CD)
◦ Selective aborts waste less work
Commit parent before child completes
Fractal unlocks the benefits of fine-grain parallelism
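The abort on upd Z in the timeline can be mimicked with a deliberately simplified model of task-level conflict detection. In this C++ sketch, all names (`SpecTask`, `record_access`) are hypothetical: each speculative task records the addresses it read and wrote, and when a task accesses an address, every live task with a later VT that already made a conflicting access (at least one of the two is a write) is aborted, and no other task is. A real system must also undo the aborted task's writes and handle tasks that depend on it; this sketch does neither.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

using FractalVT = std::vector<uint64_t>;  // compared lexicographically

// Hypothetical record of one speculative task (not Swarm's or Fractal's format).
struct SpecTask {
  std::string name;
  FractalVT vt;
  std::set<char> reads, writes;  // addresses modeled as one-letter names
  bool aborted;
};

// Records an access by tasks[t] to `addr`. Aborts every live task with a later
// VT that already accessed `addr` in a conflicting way (at least one of the two
// accesses is a write). Earlier-VT tasks and non-conflicting tasks are untouched.
// Returns the names of the tasks it aborted.
std::vector<std::string> record_access(std::vector<SpecTask>& tasks, std::size_t t,
                                       char addr, bool is_write) {
  std::vector<std::string> aborted;
  for (std::size_t i = 0; i < tasks.size(); ++i) {
    SpecTask& other = tasks[i];
    if (i == t || other.aborted || !(tasks[t].vt < other.vt)) continue;
    bool conflict = other.writes.count(addr) > 0 ||
                    (is_write && other.reads.count(addr) > 0);
    if (conflict) {
      other.aborted = true;  // a real system would also undo its writes
      other.reads.clear();
      other.writes.clear();
      aborted.push_back(other.name);
    }
  }
  (is_write ? tasks[t].writes : tasks[t].reads).insert(addr);
  return aborted;
}
```

Replaying the slide's case, qry A (2:1) reads A, then upd Z (2:4) reads and writes Z, and finally upd Z (1:3) reads Z: only upd Z (2:4) is aborted, while qry A keeps its work because it touched no common data. Aborting at task granularity is what lets Fractal discard less work than aborting a whole transaction.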
Evaluation
Methodology
Event-driven, Pin-based simulator
Target system: 256-core, 64-tile chip
◦ In-order, single-issue, scoreboarded cores
◦ 16 KB per-core L1s
◦ 256 KB per-tile L2s
◦ 64 MB shared L3 (1 MB/tile)
◦ 16K task queue entries (64/core), 4K commit queue entries (16/core)
[Figure: tile with 4 cores, per-core L1I/D caches, a per-tile L2, an L3 slice, a router, and a task unit]
Scalability experiments from 1–256 cores
◦ Scaled-down systems have fewer tiles
Applications
◦ Unordered (STAMP): labyrinth, bayes
◦ Ordered: color, msf, silo, maxflow, mis
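The per-unit and total sizes in the target configuration can be cross-checked with a few lines of compile-time arithmetic. This C++ sketch is only a consistency check; `TargetSystem` and the derived constants are hypothetical names, not the simulator's configuration format.

```cpp
#include <cstdint>

// Target system parameters from the methodology slide (names are hypothetical).
struct TargetSystem {
  static constexpr uint32_t tiles = 64;
  static constexpr uint32_t cores = 256;
  static constexpr uint32_t l3_kb_per_tile = 1024;  // 1 MB L3 slice per tile
  static constexpr uint32_t task_q_entries_per_core = 64;
  static constexpr uint32_t commit_q_entries_per_core = 16;
};

// Totals implied by the per-unit figures.
constexpr uint32_t cores_per_tile = TargetSystem::cores / TargetSystem::tiles;
constexpr uint32_t l3_total_mb = TargetSystem::tiles * TargetSystem::l3_kb_per_tile / 1024;
constexpr uint32_t task_q_total = TargetSystem::cores * TargetSystem::task_q_entries_per_core;
constexpr uint32_t commit_q_total = TargetSystem::cores * TargetSystem::commit_q_entries_per_core;

static_assert(cores_per_tile == 4, "4 cores per tile, as in the tile diagram");
static_assert(l3_total_mb == 64, "64 MB shared L3");
static_assert(task_q_total == 16 * 1024, "16K task queue entries");
static_assert(commit_q_total == 4 * 1024, "4K commit queue entries");
```

All four checks hold: 256 cores over 64 tiles gives 4 cores per tile, 64 tiles of 1 MB give 64 MB, and 256 cores times 64 and 16 entries give 16K and 4K entries.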
Fractal uncovers abundant nested parallelism
Flat: large atomic tasks
Fractal: nested parallelism exposed through fine-grained tasks
Fractal uncovers abundant nested parallelism
[Figure: speedup vs. number of cores (1c to 256c) for maxflow, bayes, and labyrinth, Flat vs. Fractal; maxflow reaches 322x with Fractal]
Speedup range across these applications: Flat 1x to 4.9x, Fractal 88x to 322x
Average task length (cycles):
            maxflow   bayes   labyrinth
Flat        3260      1.8M    16M
Fractal     373       3590    220
Fractal avoids over-serialization
[Figure: speedup vs. number of cores (1c to 256c) for mis, color, and msf, comparing Flat, Swarm, and Fractal; mis reaches 145x with Fractal]
Speedup range across these applications: Flat 26x to 98x, Swarm 21x to 119x, Fractal 40x to 145x
Average task length (cycles):
            mis    color   msf
Flat        162    633     113
Fractal     115    96      49
Conclusion
Speculative systems must extract nested parallelism in order to scale large, complex, real-world applications
Fractal: An execution model for fine-grain nested speculative parallelism
◦ Decouple atomicity from parallelism
◦ Guarantee atomicity by ordering tasks
Fractal unlocks the benefits of fine-grain speculative parallelism
◦ Parallelizes many challenging workloads
◦ Enables composition of speculative parallel algorithms
Thank You! Questions?