ptg: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfptg: an...
TRANSCRIPT
![Page 1: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/1.jpg)
PTG: an abstraction for unhindered parallelism
Anthony Danalis Innovative Computing Laboratory
University of Tennessee WOLFHPC 2014, New Orleans
George Bosilca Aurelien Bouteiller Mathieu Faverge Thomas Herault Jack Dongarra
![Page 2: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/2.jpg)
Programming Paradigms
² PTG: Parameterized Task Graph
² Key abstraction behind the PaRSEC runtime
² PTG revives an old programming paradigm:
² Dataflow-based execution
² What is wrong with current prog. paradigms?
² What is the current programming paradigm?
![Page 3: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/3.jpg)
Serial Code
do i = 1, len y(i) = y(i) + a*x(i)enddo
3
![Page 4: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/4.jpg)
Serial Code
do i = 1, len y(i) = y(i) + a*x(i)enddo
4
We tell the computer what to do and exactly in what order to do it.
![Page 5: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/5.jpg)
Vector Architectures
do i = 1, len y(i) = y(i) + a*x(i)enddo
5
![Page 6: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/6.jpg)
VLIW
do i = 1, len y(i) = y(i) + a*x(i)enddo
6
![Page 7: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/7.jpg)
Superscalar
do i = 1, len y(i) = y(i) + a*x(i)enddo
7
![Page 8: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/8.jpg)
OpenMP Code
#pragma omp paralleldo i = 1, len y(i) = y(i) + a*x(i)enddo
8
![Page 9: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/9.jpg)
OpenMP Code
#pragma omp paralleldo i = 1, len y(i) = y(i) + a*x(i)enddo
9
![Page 10: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/10.jpg)
MPI Code
// Exchange initial datado i = my_start, my_len y(i) = y(i) + a*x(i)enddo// Exchange results
10
![Page 11: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/11.jpg)
MPI Code
// Exchange initial datado i = my_start, my_len y(i) = y(i) + a*x(i)enddo// Exchange results
11
![Page 12: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/12.jpg)
MPI Code
// Exchange initial datado i = my_start, my_len y(i) = y(i) + a*x(i)enddo// Exchange results
12
![Page 13: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/13.jpg)
MPI+X Code
• Works well (maybe) only for isomorphic systems • Plagued by limitations of MPI and X
• Dynamic Load balancing • Handling Jitter • Data distribution • Difficult to develop • Difficult to debug
• Replaces “big hammer” by two big hammers!
13
![Page 14: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/14.jpg)
Current Programming Paradigm
Coarse Grain Parallelism with Explicit Message Passing (CGP)
14
Most parallel programs are essentially sequential, with some code to coordinate.
![Page 15: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/15.jpg)
Panel vs Tile Algorithms
15
LAPACK / Panel PLASMA / Tile
![Page 16: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/16.jpg)
What does the code look like?
16
for (k = 0; k < MT; k++) { Insert_Task( geqrt, A[k][k], INOUT, T[k][k], OUTPUT); for (m = k+1; m < MT; m++) { Insert_Task( tsqrt, A[k][k], INOUT | REGION_D|REGION_U, A[m][k], INOUT | LOCALITY, T[m][k], OUTPUT); } for (n = k+1; n < NT; n++) { Insert_Task( unmqr, A[k][k], INPUT | REGION_L, T[k][k], INPUT, A[k][m], INOUT); for (m = k+1; m < MT; m++) { Insert_Task( tsmqr, A[k][n], INOUT, A[m][n], INOUT | LOCALITY, A[m][k], INPUT, T[m][k], INPUT);
![Page 17: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/17.jpg)
What’s wrong with this code
✗ It has: ✗ Control Flow ✗ Hints for runtime to infer Data Flow ✗ High memory requirements or reduced parallelism
ü It should have: ü No (or minimal, user defined) Control Flow ü Explicit Data Flow ü Unhindered parallelism
17
![Page 18: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/18.jpg)
What captures the semantics?
{[k]->[k,k+1]:k<=mt-2}
GEQRT(k)k = 0 .. mt-1
TSMQR(k,m,n)k = 0 .. mt-1m = k+1 .. mt-1n = k+1 .. mt-1
TSQRT(k,m)k = 0 .. mt-1m = k+1 .. mt-1
UNMQR(k,n)k = 0 .. mt-1n = k+1 .. mt-1
{[k,m,n]->[n]:k+1==n && k+1==m}
{[k,m]->[k,m,n]:k<nt-1 && k<n<nt} {
[k,m,n]->[n,m]:k+1==n && m>n}
{[k,m,n]->[k+1,n]:k+1==m && n>m}
{[k]->[k,n]:k<n<nt && k<nt-1}
{[k,n]->[k,k+1,n]:k<mt-1}
{[k,m,n]->[k+1,m,n]:n>k+1 && m>k+1}
{[k,m,n]->[k,m+1,n]:m<mt-1}
{[k,m]->[k,m+1]:m<mt-1}
![Page 19: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/19.jpg)
Parameterized Task Graph (PTG)
ü Task Classes w/ parameters ü geqrt(k), tsqrt(k,m), unmqr(k,n), tsmqr(k,n,m)
ü Precedence constraints between Tasks
ü Compressed form of the Execution DAG
ü Fixed size (problem size independent)
19
![Page 20: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/20.jpg)
Why not discover the whole DAG?
20
Node0
Node1
Node2
Node3
PO
GE
TR
SYTR TR
PO GE GE
TR TR
SYSY GE
PO
TR
SY
PO
SY SY
![Page 21: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/21.jpg)
Why not discover the whole DAG?
21
Node0
Node1
Node2
Node3
PO
GE
TR
SYTR TR
PO GE GE
TR TR
SYSY GE
PO
TR
SY
PO
SY SY
![Page 22: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/22.jpg)
Why not discover the whole DAG?
Node0
Node1
Node2
Node3
PO
GE
TR
SYTR TR
PO GE GE
TR TR
SYSY GE
PO
TR
SY
PO
SY SY
![Page 23: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/23.jpg)
When building the DTG:
The runtime uses a single:
• Insert_Task() function
• Structure for the elements of the DAG
• Function for traversing a task’s neighborhood
The Dynamic Task Graph is not program specific
23
![Page 24: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/24.jpg)
When using a PTG:
{[k]->[k,k+1]:k<=mt-2}
TSQRT(k,m)k = 0 .. mt-1m = k+1 .. mt-1
{[k,m]->[k,m,n]:k<nt-1 && k<n<nt}
{[k,m,n]->[n,m]:k+1==n && m>n} {[
k,m]->[k,m+1]:m<mt-1}
Program specific DAG walkers
![Page 25: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/25.jpg)
PTG: PING-PONG
25
![Page 26: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/26.jpg)
PTG: Binary Tree Reduction
26
![Page 27: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/27.jpg)
Conc
epts
• Separation of roles: compiler optimizes each task, developer describes dependencies between tasks, runtime orchestrates dynamic execution
• Separate algorithms from data distribution • Avoid limitations of control flow execution
Runt
ime
• Portability layer for heterogeneous architectures
• Scheduling policies adapt execution to the hardware & ongoing system status
• Data movements between consumers are inferred from dependencies. Communication/computation overlap
• Coherency protocols minimize data movements
• Memory hierarchy (including NVRAM and disk) integral part of the scheduling decisions
PaRSEC: focus on the algorithm
![Page 28: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/28.jpg)
Developer’s view of PaRSEC
28
![Page 29: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/29.jpg)
Load Balance, Idle time & Jitter
!me$
SPMD$/$M
PI$
PaRSEC
$
29
![Page 30: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/30.jpg)
Performance: (H)QR
0
500
1000
1500
2000
2500
3000
0 50000 100000 150000 200000 250000 300000
PE
RF
OR
MA
NC
E (
GF
LO
P/S
)
MATRIX SIZE M(N=4,480)
Solving Linear Least Square Problem (DGEQRF)
60-node, 480-core, 2.27GHz Intel Xeon Nehalem, IB 20G System
Theoretical Peak: 4358.4 GFlop/s
LibSCI Scalapack
DPLASMA DGEQRF
Hierarchical QR
30
![Page 31: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/31.jpg)
Performance: Systolic QR
0
5
10
15
20
25
768 2304 4032 5760 7776 10080 14784 19584 23868
PE
RF
OR
MA
NC
E (
TF
LO
P/S
)
NUMBER OF CORES
DGEQRF performance strong scaling
LibSCI Scalapack
PaRSEC DPLASMA
Cray XT5 (Kraken) - N = M = 41,472
31
![Page 32: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/32.jpg)
Distributed CPUs + GPUs
0
10
20
30
40
50
60
70
80
16+3N=31k
144+27 400+75 576+108 784+147 1024+192N=246k
Perf
orm
an
ce (
TF
lop
/s)
Number of Cores+GPUs
Distributed Hybrid DPOTRF Weak Scaling on Keeneland1 to 64 nodes (16 cores, 3 M2090 GPUs, Infiniband 20G per node)
PaRSEC DPOTRF (3 GPU per node)Ideal Scaling
32
![Page 33: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/33.jpg)
NWChem Coupled Cluster (CC)
• Computational Chemistry
• TCE Coupled Cluster
• Machine generated Fortran 77 code
• Behavior depends on dynamic, immutable data
• Long sequential chains of DGEMMs
• Work (chain) stealing through GA atomics
33
![Page 34: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/34.jpg)
CC t1_2_2_2
�����
�����
�
��
���
����� ����� ���� �����
���
� ��
��
���
��
��
��
���
��
����� � ����������
������� �!�"��
#��$�� %� ��� �����
34
![Page 35: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/35.jpg)
CC t2_8
0
1
2
3
4
5
6
7
16x1 16x2 16x4 16x8
Ex
ec
uti
on
Tim
e (
se
c)
Nodes x Cores/Node
Execution Time of icsd_t2_8() subroutine in CCSD of NWChem
Original (isolated) icsd_t2_8()PaRSEC (isolated) icsd_t2_8()
35
![Page 36: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/36.jpg)
Summary
² PTG offers a Data-Flow based Prog. Paradigm
² PaRSEC offers state-of-the-art performance
² Data distribution is decoupled from Algorithm
² Control Flow limitations are avoided
² CGP != highest performance
² CGP != easiest to program (?)
36
![Page 37: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/37.jpg)
Backup slides
37
![Page 38: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/38.jpg)
Why not discover the whole DAG?
Quantify Reduced Parallelism (if using window)
38
Question: Given window size W and P processors, what is the highest level of parallelism that can be missed because of control flow limitations? Assuming P<W
![Page 39: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/39.jpg)
PTG vs DTG
39
for(i=0;i<W;i++){ Task(RW:A[i][0]); for(j=1; j<a*W; j++){ Task(R:A[i][j-1], W:A[i][j]); } }
W
...
... a*W
![Page 40: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/40.jpg)
PTG vs DTG
40
for(i=0;i<W;i++){ Task(RW:A[i][0]); for(j=1; j<a*W; j++){ Task(R:A[i][j-1], W:A[i][j]); } }
W
...
... a*W
![Page 41: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/41.jpg)
PTG vs DTG
41
W
...
... a*W
Dynamic Task Graph (DTG): a*W+(W-1)*(a-1)*W = O(a*W^2)
![Page 42: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/42.jpg)
PTG vs DTG
42
W
...
... a*W
Dynamic Task Graph (DTG): a*W+(W-1)*(a-1)*W = O(a*W^2)
Parameterized Task Graph (PTG):
O(a*W^2/P)
![Page 43: PTG: an abstraction for unhindered parallelismhpc.pnl.gov/conf/wolfhpc/2014/talks/danalis.pdfPTG: an abstraction for unhindered parallelism Anthony Danalis Innovative Computing Laboratory](https://reader035.vdocuments.us/reader035/viewer/2022062610/6113cbf29d903b23cf6327f3/html5/thumbnails/43.jpg)
PTG vs DTG
43
W
...
... a*W
Dynamic Task Graph (DTG): a*W+(W-1)*(a-1)*W = O(a*W^2)
Parameterized Task Graph (PTG):
O(a*W^2/P)
O(DTG)/O(PTG) = P