Domain-Specific Abstractions and Performance Portability ...
![Page 1: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/1.jpg)
Domain-Specific Abstractions and Performance Portability
& Data Access Complexity: Revisiting the Red/Blue Pebble Game
P. (Saday) Sadayappan, The Ohio State University
![Page 2: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/2.jpg)
Acknowledgements
Collaborators: Gerald Baumgartner (LSU), Albert Cohen (ENS Paris), Jason Cong (UCLA), Franz Franchetti (CMU), Robert Harrison (Stony Brook), So Hirata (U. Illinois), Jarek Nieplocha (PNNL), Srini Parthasarathy (OSU), Louis-Noel Pouchet (UCLA), Russ Pitzer (OSU, Chem), J. Ramanujam (LSU), Fabrice Rastello (ENS Lyon), Nasko Rountev (OSU), Vivek Sarkar (Rice)
Ph.D. Students: Muthu Baskaran, Uday Bondhugula, Jim Dinan, Xiaoyang Gao, Albert Hartono, Justin Holewinski, Sriram Krishnamoorthy, Qingda Lu, Mohammad Arafat, Venmugil Elango, Tom Henretty, Martin Kong, Pai-Wei Lai, Mahesh Ravishankar, Kevin Stock, Sanket Tavarageri
Funding: National Science Foundation, Department of Energy, DARPA
![Page 3: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/3.jpg)
Why Domain-Specific Languages?
• Productivity
– High-level abstractions ease application development
• Performance
– Domain-specific semantics enable specialized optimizations
– Constraints on specification enable more effective general-purpose transformations and tuning (tiling, fusion)
• Portability
– New architectures => changes only in domain-specific compiler, without any change in user application code
![Page 4: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/4.jpg)
int aFunction(int a, int b) { int c=b; return a; } main() { int a,b,c,d,e; int i=4; for (i=0;i<10;i++) { int j=55; c=i+j; c=aFunction(i,c); a=aFunction(a+1,b); } #pragma SliceTarget a; return 0; }
DSL Technology for Exascale Computing (D-TEC)
Lead PI Daniel J. Quinlan
Lawrence Livermore National Laboratory
Co-PIs and Institutions
Armando Solar-Lezama, Adam Chlipala, Srinivas Devadas, Una-May O’Reilly, Nir Shavit, Youssef Marzouk @ Massachusetts Institute of Technology
John Mellor-Crummey & Vivek Sarkar @ Rice University
Vijay Saraswat & David Grove @ IBM Watson
P. Sadayappan & Atanas Rountev @ Ohio State University
Ras Bodik @ University of California at Berkeley
Craig Rasmussen @ University of Oregon
Phil Colella @ Lawrence Berkeley National Laboratory
Rich Vuduc @ Georgia Tech
Scott Baden @ University of California at San Diego
Deputy PI Saman Amarasinghe
MIT
![Page 5: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/5.jpg)
The Tensor Contraction Engine A Domain-Specific Compiler for Many-Body Methods in Quantum Chemistry
Oak Ridge National Laboratory
David E. Bernholdt, Robert Harrison
Pacific Northwest National Laboratory
Jarek Nieplocha
Louisiana State University Gerald Baumgartner J. Ramanujam
Ohio State University Xiaoyang Gao, Albert Hartono, Sriram Krishnamoorthy, Qingda Lu, Alex Sibiryakov, Russell Pitzer, P. Sadayappan
University of Florida
So Hirata
University of Waterloo
Marcel Nooijen
Supported by NSF and DOE
![Page 6: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/6.jpg)
Time Crunch in Quantum Chemistry
Two major bottlenecks in computational chemistry:
• Very computationally intensive models
• Extremely time consuming to develop codes
The vicious cycle of computational science:
• More powerful computers make more accurate models computationally feasible :-)
• But efficient parallel implementation of complex models takes longer and longer
• Hence computational scientists spend more time with low-level programming for performance, and less time doing science :-(
• Coupled Cluster family of models in electronic structure theory
• Increasing number of terms => explosive increase in code complexity
• Theory is the same, but efficient implementations of higher order models took many years

| Theory | #Terms | #F77Lines | Year |
|--------|--------|-----------|------|
| CCD    | 11     | 3209      | 1978 |
| CCSD   | 48     | 13213     | 1982 |
| CCSDT  | 102    | 33932     | 1988 |
| CCSDTQ | 183    | 79901     | 1992 |
![Page 7: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/7.jpg)
CCSD Doubles Equation (Quantum Chemist’s Eye Test Chart :-)
hbar[a,b,i,j] == sum[f[b,c]*t[i,j,a,c],{c}] -sum[f[k,c]*t[k,b]*t[i,j,a,c],{k,c}] +sum[f[a,c]*t[i,j,c,b],{c}] -sum[f[k,c]*t[k,a]*t[i,j,c,b],
{k,c}] -sum[f[k,j]*t[i,k,a,b],{k}] -sum[f[k,c]*t[j,c]*t[i,k,a,b],{k,c}] -sum[f[k,i]*t[j,k,b,a],{k}] -sum[f[k,c]*t[i,c]*t[j,k,b,a],{k,c}] +sum[t[i,c]*t[j,d]*v[a,b,c,d],{c,d}] +sum[t[i,j,c,d]*v[a,b,c,d],{c,d}] +sum[t[j,c]*v[a,b,i,c],{c}] -sum[t[k,b]*v[a,k,i,j],{k}] +sum[t[i,c]*v[b,a,j,c],{c}] -sum[t[k,a]*v[b,k,j,i],{k}] -sum[t[k,d]*t[i,j,c,b]*v[k,a,c,d],{k,c,d}] -sum[t[i,c]*t[j,k,b,d]*v[k,a,c,d],{k,c,d}] -sum[t[j,c]*t[k,b]*v[k,a,c,i],{k,c}] +2*sum[t[j,k,b,c]*v[k,a,c,i],{k,c}] -sum[t[j,k,c,b]*v[k,a,c,i],{k,c}] -sum[t[i,c]*t[j,d]*t[k,b]*v[k,a,d,c],{k,c,d}] +2*sum[t[k,d]*t[i,j,c,b]*v[k,a,d,c],{k,c,d}] -sum[t[k,b]*t[i,j,c,d]*v[k,a,d,c],{k,c,d}] -sum[t[j,d]*t[i,k,c,b]*v[k,a,d,c],{k,c,d}] +2*sum[t[i,c]*t[j,k,b,d]*v[k,a,d,c],{k,c,d}] -sum[t[i,c]*t[j,k,d,b]*v[k,a,d,c],{k,c,d}] -sum[t[j,k,b,c]*v[k,a,i,c],{k,c}] -sum[t[i,c]*t[k,b]*v[k,a,j,c],{k,c}] -sum[t[i,k,c,b]*v[k,a,j,c],{k,c}] -sum[t[i,c]*t[j,d]*t[k,a]*v[k,b,c,d],{k,c,d}] -sum[t[k,d]*t[i,j,a,c]*v[k,b,c,d],{k,c,d}] -sum[t[k,a]*t[i,j,c,d]*v[k,b,c,d],{k,c,d}] +2*sum[t[j,d]*t[i,k,a,c]*v[k,b,c,d],{k,c,d}] -sum[t[j,d]*t[i,k,c,a]*v[k,b,c,d],{k,c,d}] -sum[t[i,c]*t[j,k,d,a]*v[k,b,c,d],{k,c,d}] -sum[t[i,c]*t[k,a]*v[k,b,c,j],{k,c}] +2*sum[t[i,k,a,c]*v[k,b,c,j],{k,c}] -sum[t[i,k,c,a]*v[k,b,c,j],{k,c}] +2*sum[t[k,d]*t[i,j,a,c]*v[k,b,d,c],{k,c,d}] -sum[t[j,d]*t[i,k,a,c]*v[k,b,d,c],{k,c,d}] -sum[t[j,c]*t[k,a]*v[k,b,i,c],{k,c}] -sum[t[j,k,c,a]*v[k,b,i,c],{k,c}] -sum[t[i,k,a,c]*v[k,b,j,c],{k,c}] +sum[t[i,c]*t[j,d]*t[k,a]*t[l,b]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[k,b]*t[l,d]*t[i,j,a,c]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[k,a]*t[l,d]*t[i,j,c,b]*v[k,l,c,d],{k,l,c,d}] +sum[t[k,a]*t[l,b]*t[i,j,c,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[j,c]*t[l,d]*t[i,k,a,b]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[j,d]*t[l,b]*t[i,k,a,c]*v[k,l,c,d],{k,l,c,d}] +sum[t[j,d]*t[l,b]*t[i,k,c,a]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,c]*t[l,d]*t[j,k,b,a]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,c]*t[l,a]*t[j,k,b,d]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,c]*t[l,b]*t[j,k,d,a]*v[k,l,c,d],{k,l,c,d}] 
+sum[t[i,k,c,d]*t[j,l,b,a]*v[k,l,c,d],{k,l,c,d}] +4*sum[t[i,k,a,c]*t[j,l,b,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,k,c,a]*t[j,l,b,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,k,a,b]*t[j,l,c,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,k,a,c]*t[j,l,d,b]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,k,c,a]*t[j,l,d,b]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,c]*t[j,d]*t[k,l,a,b]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,j,c,d]*t[k,l,a,b]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,j,c,b]*t[k,l,a,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,j,a,c]*t[k,l,b,d]*v[k,l,c,d],{k,l,c,d}] +sum[t[j,c]*t[k,b]*t[l,a]*v[k,l,c,i],{k,l,c}] +sum[t[l,c]*t[j,k,b,a]*v[k,l,c,i],{k,l,c}] -2*sum[t[l,a]*t[j,k,b,c]*v[k,l,c,i],{k,l,c}] +sum[t[l,a]*t[j,k,c,b]*v[k,l,c,i],{k,l,c}] -2*sum[t[k,c]*t[j,l,b,a]*v[k,l,c,i],{k,l,c}] +sum[t[k,a]*t[j,l,b,c]*v[k,l,c,i],{k,l,c}] +sum[t[k,b]*t[j,l,c,a]*v[k,l,c,i],{k,l,c}] +sum[t[j,c]*t[l,k,a,b]*v[k,l,c,i],{k,l,c}] +sum[t[i,c]*t[k,a]*t[l,b]*v[k,l,c,j],{k,l,c}] +sum[t[l,c]*t[i,k,a,b]*v[k,l,c,j],{k,l,c}] -2*sum[t[l,b]*t[i,k,a,c]*v[k,l,c,j],{k,l,c}] +sum[t[l,b]*t[i,k,c,a]*v[k,l,c,j],{k,l,c}] +sum[t[i,c]*t[k,l,a,b]*v[k,l,c,j],{k,l,c}] +sum[t[j,c]*t[l,d]*t[i,k,a,b]*v[k,l,d,c],{k,l,c,d}] +sum[t[j,d]*t[l,b]*t[i,k,a,c]*v[k,l,d,c],{k,l,c,d}] +sum[t[j,d]*t[l,a]*t[i,k,c,b]*v[k,l,d,c],{k,l,c,d}] -2*sum[t[i,k,c,d]*t[j,l,b,a]*v[k,l,d,c],{k,l,c,d}] -2*sum[t[i,k,a,c]*t[j,l,b,d]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,c,a]*t[j,l,b,d]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,a,b]*t[j,l,c,d]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,c,b]*t[j,l,d,a]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,a,c]*t[j,l,d,b]*v[k,l,d,c],{k,l,c,d}] +sum[t[k,a]*t[l,b]*v[k,l,i,j],{k,l}] +sum[t[k,l,a,b]*v[k,l,i,j],{k,l}] +sum[t[k,b]*t[l,d]*t[i,j,a,c]*v[l,k,c,d],{k,l,c,d}] +sum[t[k,a]*t[l,d]*t[i,j,c,b]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,c]*t[l,d]*t[j,k,b,a]*v[l,k,c,d],{k,l,c,d}] -2*sum[t[i,c]*t[l,a]*t[j,k,b,d]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,c]*t[l,a]*t[j,k,d,b]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,j,c,b]*t[k,l,a,d]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,j,a,c]*t[k,l,b,d]*v[l,k,c,d],{k,l,c,d}] 
-2*sum[t[l,c]*t[i,k,a,b]*v[l,k,c,j],{k,l,c}] +sum[t[l,b]*t[i,k,a,c]*v[l,k,c,j],{k,l,c}] +sum[t[l,a]*t[i,k,c,b]*v[l,k,c,j],{k,l,c}] +v[a,b,i,j]
![Page 8: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/8.jpg)
Example: A CCSD Tensor Contraction
• Tensor contraction is a generalized high-dimension analog of matrix-matrix product
• Each loop index appears in exactly two tensors
• Contraction indices appear only in the input (r.h.s.) tensors: m and n
• External indices appear in output tensor and one input tensor: {i, k} and {j, l}
• Each loop is either a parallel loop or a reduction loop
• Relative order of the contraction indices in loop nest is unimportant (associative reordering of reduction)
• Loop nests can be considered fully permutable (dependence analysis in existing compilers will not permit permutation of contraction indices)
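The loop structure these bullets describe can be sketched in C. The index placement `C[i][j][k][l] += A[i][m][k][n] * B[j][n][l][m]`, the extent `D`, and all function names below are assumptions made for illustration; the slide only states that m and n are contraction indices and that {i, k} and {j, l} are external. The two functions run the same six loops in different orders, and the check confirms they produce the same tensor (integer-valued data keeps the floating-point sums exact).

```c
#include <assert.h>
#include <string.h>

enum { D = 3 };  /* hypothetical extent shared by all six indices */
typedef double Tensor4[D][D][D][D];

/* Contraction loops (m, n) innermost: each C element is one reduction. */
static void contract_mn_inner(Tensor4 C, Tensor4 A, Tensor4 B) {
    for (int i = 0; i < D; i++)
    for (int j = 0; j < D; j++)
    for (int k = 0; k < D; k++)
    for (int l = 0; l < D; l++)
        for (int m = 0; m < D; m++)
        for (int n = 0; n < D; n++)
            C[i][j][k][l] += A[i][m][k][n] * B[j][n][l][m];
}

/* Same computation with the contraction loops outermost and the
 * external loops permuted; legal because the reduction is associative. */
static void contract_mn_outer(Tensor4 C, Tensor4 A, Tensor4 B) {
    for (int n = 0; n < D; n++)
    for (int m = 0; m < D; m++)
        for (int l = 0; l < D; l++)
        for (int k = 0; k < D; k++)
        for (int j = 0; j < D; j++)
        for (int i = 0; i < D; i++)
            C[i][j][k][l] += A[i][m][k][n] * B[j][n][l][m];
}

/* Fill a tensor with small integer values so that sums are exact. */
static void fill(Tensor4 t, int seed) {
    for (int p = 0; p < D; p++)
    for (int q = 0; q < D; q++)
    for (int r = 0; r < D; r++)
    for (int s = 0; s < D; s++)
        t[p][q][r][s] = (double)((p + 2 * q + 3 * r + 5 * s + seed) % 7);
}

/* Returns 1 if both loop orders yield the same output tensor. */
static int permuted_nests_agree(void) {
    static Tensor4 A, B, C1, C2;
    fill(A, 1);
    fill(B, 4);
    memset(C1, 0, sizeof C1);
    memset(C2, 0, sizeof C2);
    contract_mn_inner(C1, A, B);
    contract_mn_outer(C2, A, B);
    return memcmp(C1, C2, sizeof C1) == 0;
}
```

Calling `permuted_nests_agree()` from a `main` is enough to exercise both orders. A general-purpose compiler sees a loop-carried dependence on `C[i][j][k][l]` through m and n, which is why the permutation has to come from domain knowledge.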
![Page 9: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/9.jpg)
Tensor Contraction Engine
range V = 3000;
range O = 100;
index a,b,c,d,e,f : V;
index i,j,k : O;
mlimit = 10000000;
function F1(V,V,V,O);
function F2(V,V,V,O);
procedure P(in T1[O,O,V,V], in T2[O,O,V,V], out X)=
begin
  A3A == sum[ sum[F1(a,b,e,k) * F2(c,f,b,k), {b,k}]
            * sum[T1[i,j,c,e] * T2[i,j,a,f], {i,j}],
            {a,e,c,f}]*0.5 + ...;
end
$$A3A = \frac{1}{2} \sum_{a,e,c,f} \left( X_{ce,af}\, Y_{ae,cf} + \ldots \right)$$
$$X_{ce,af} = \sum_{i,j} t_{ij}^{ce}\, t_{ij}^{af}, \qquad Y_{ae,cf} = \sum_{b,k} \langle ab \| ek \rangle \langle cb \| fk \rangle$$
• Automatic transformation from high-level specification
– Chemist specifies computation in high-level mathematical form
– Synthesis system transforms it to efficient parallel program
– Code is tailored to target machine
– Code can be optimized for specific molecules being modeled
• Multi-institutional collaboration (OSU, LSU, Waterloo, ORNL, PNNL, U. Florida)
• Two versions of TCE developed
– a) Full chemistry, but fewer optimizations (Hirata)
– b) Excluded some details, but sophisticated optimizations
– Used to implement over 20 models, in latest release of NWChem (a few million lines of synthesized code)
– First parallel implementation for many of the methods
– New improved TCE-2 planned
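To make the A3A expression in the specification above concrete, here is a minimal C sketch of its first term, evaluated two ways: following the nesting of the `sum[...]` operators (two intermediates combined per output index tuple), and as one fully expanded eight-deep loop nest. The tiny ranges `NV` and `NO`, the bodies of `F1` and `F2`, and the amplitude values are all hypothetical; the slide only declares the functions and arrays. This is not the code TCE generates, only an illustration of what the high-level form denotes.

```c
#include <assert.h>

enum { NV = 3, NO = 2 };  /* tiny stand-ins for range V = 3000, O = 100 */

/* Hypothetical integral functions; the slide only declares F1 and F2. */
static double F1(int a, int b, int e, int k) { return (a + 2 * b + e + k) % 3; }
static double F2(int c, int f, int b, int k) { return (c + f + 2 * b + k) % 4; }

static double T1[NO][NO][NV][NV], T2[NO][NO][NV][NV];

/* Small integer-valued amplitudes so that all sums are exact. */
static void init_amplitudes(void) {
    for (int i = 0; i < NO; i++)
    for (int j = 0; j < NO; j++)
    for (int c = 0; c < NV; c++)
    for (int e = 0; e < NV; e++) {
        T1[i][j][c][e] = (i + j + c + 2 * e) % 3;
        T2[i][j][c][e] = (2 * i + j + c + e) % 2 + 1;
    }
}

/* Factored form: Y = sum over {b,k}, X = sum over {i,j}, then X*Y. */
static double a3a_factored(void) {
    double total = 0.0;
    for (int a = 0; a < NV; a++)
    for (int e = 0; e < NV; e++)
    for (int c = 0; c < NV; c++)
    for (int f = 0; f < NV; f++) {
        double y = 0.0, x = 0.0;
        for (int b = 0; b < NV; b++)
            for (int k = 0; k < NO; k++)
                y += F1(a, b, e, k) * F2(c, f, b, k);
        for (int i = 0; i < NO; i++)
            for (int j = 0; j < NO; j++)
                x += T1[i][j][c][e] * T2[i][j][a][f];
        total += y * x;
    }
    return 0.5 * total;
}

/* Expanded form: a single loop nest over all eight indices. */
static double a3a_expanded(void) {
    double total = 0.0;
    for (int a = 0; a < NV; a++)
    for (int e = 0; e < NV; e++)
    for (int c = 0; c < NV; c++)
    for (int f = 0; f < NV; f++)
    for (int b = 0; b < NV; b++)
    for (int k = 0; k < NO; k++)
    for (int i = 0; i < NO; i++)
    for (int j = 0; j < NO; j++)
        total += F1(a, b, e, k) * F2(c, f, b, k)
               * T1[i][j][c][e] * T2[i][j][a][f];
    return 0.5 * total;
}

/* Returns 1 if both evaluation orders give the same value. */
static int a3a_forms_agree(void) {
    init_amplitudes();
    return a3a_factored() == a3a_expanded();
}
```

The expanded nest performs on the order of V^5·O^3 multiply-adds, while the factored form needs about V^5·O + V^4·O^2, so the nesting chosen in the specification already matters at realistic ranges.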
![Page 10: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/10.jpg)
Customized Kernel Generation for Tensor Contractions
• Effective SIMD utilization is increasingly important for high performance on current/emerging processors
• Automatic vectorization by production compilers (even with manual unrolling) often results in performance well under 50% of machine peak
• Customized code generator (using vector intrinsics) for tensor contractions
![Page 11: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/11.jpg)
Approach to Code Generation
![Page 12: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/12.jpg)
Example: Multi-resolution Kernel
![Page 13: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/13.jpg)
Example: Multi-resolution Kernel
![Page 14: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/14.jpg)
Vectorization Dimensions
Inner loop “j” is good for vectorization (stride is 0 or 1):
for (k = 0; k < N; k++)
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      C[i][j] += A[i][k] * B[k][j];
Inner loop “i” is bad for vectorization (access stride is N):
for (k = 0; k < N; k++)
  for (j = 0; j < N; j++)
    for (i = 0; i < N; i++)
      C[i][j] += A[i][k] * B[k][j];
![Page 15: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/15.jpg)
What if No Vectorizable Loop?
for (k = 0; k < N; k++)
  for (j = 0; j < N; j++)
    for (i = 0; i < N; i++)
      C[i][j] += A[k][i] * B[j][k];
![Page 16: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/16.jpg)
What if No Vectorizable Loop?
We can still vectorize by use of in-register transpose.
The cost of the register transpose is often amortizable.
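One way to picture the in-register transpose is the following C sketch for the loop `C[i][j] += A[k][i] * B[j][k]`. It is an assumption-laden illustration, not the generated code from the talk: vector registers are modelled as rows of a small `float` array (`VL` lanes, a hypothetical vector length), and the names `contract_blocked` and `transpose_block` are invented here. Vectorizing along `i` gives stride-1 loads of `A` and a broadcast of `B`, but the accumulated lanes form a column of `C`; a single transpose per block, after the whole `k` loop, turns them into rows so that `C` is also updated with unit stride.

```c
#include <assert.h>

enum { N = 8, VL = 4 };  /* N is assumed to be a multiple of VL */

/* Scalar reference: the loop nest from the slide. */
static void contract_ref(float C[N][N], float A[N][N], float B[N][N]) {
    for (int k = 0; k < N; k++)
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                C[i][j] += A[k][i] * B[j][k];
}

/* Transpose a VL x VL block held in "registers" (modelled as arrays). */
static void transpose_block(float r[VL][VL]) {
    for (int a = 0; a < VL; a++)
        for (int b = a + 1; b < VL; b++) {
            float t = r[a][b];
            r[a][b] = r[b][a];
            r[b][a] = t;
        }
}

static void contract_blocked(float C[N][N], float A[N][N], float B[N][N]) {
    for (int i = 0; i < N; i += VL)
        for (int j = 0; j < N; j += VL) {
            /* acc[jj] accumulates the column C[i..i+VL-1][j+jj]. */
            float acc[VL][VL] = {{0}};
            for (int k = 0; k < N; k++)
                for (int jj = 0; jj < VL; jj++) {
                    float b = B[j + jj][k];          /* stride-0 broadcast */
                    for (int v = 0; v < VL; v++)     /* stride-1 load of A */
                        acc[jj][v] += A[k][i + v] * b;
                }
            /* One transpose per block, amortized over the N-long k loop. */
            transpose_block(acc);
            /* acc[ii] now holds the row C[i+ii][j..j+VL-1]. */
            for (int ii = 0; ii < VL; ii++)
                for (int v = 0; v < VL; v++)         /* stride-1 update of C */
                    C[i + ii][j + v] += acc[ii][v];
        }
}

/* Returns 1 if the blocked version matches the scalar reference. */
static int blocked_matches_reference(void) {
    static float A[N][N], B[N][N], C1[N][N], C2[N][N];
    for (int p = 0; p < N; p++)
        for (int q = 0; q < N; q++) {
            A[p][q] = (float)((p + 2 * q) % 5);
            B[p][q] = (float)((3 * p + q) % 7);
            C1[p][q] = C2[p][q] = 0.0f;
        }
    contract_ref(C1, A, B);
    contract_blocked(C2, A, B);
    for (int p = 0; p < N; p++)
        for (int q = 0; q < N; q++)
            if (C1[p][q] != C2[p][q]) return 0;
    return 1;
}
```

In a real kernel the inner `v` loops would each be one vector instruction and `transpose_block` a short shuffle sequence; the scalar model only shows where the transpose sits relative to the reduction.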
![Page 17: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/17.jpg)
Code Example
![Page 18: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/18.jpg)
Multiresolution Kernel Performance
![Page 19: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/19.jpg)
Performance: CCSD Tensor Contraction
![Page 20: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/20.jpg)
Domain-Specific Optimization: Stencils
Carnegie Mellon University Franz Franchetti, Richard Veras
Louisiana State University J. Ramanujam
Ohio State University Tom Henretty, Justin Holewinski, Martin Kong, Nasko Rountev, P. Sadayappan
UCLA
Louis-Noel Pouchet
Supported by NSF and DOE
![Page 21: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/21.jpg)
Embedded DSL for Stencils
• Benefits of high-level specification of computations
– Ease of use
• For mathematicians/scientists creating the code
– Ease of optimization
• Facilitate loop and data transformations by compiler
• Automatic transformation by compiler into parallel C/C++ code
• Embedded DSL provides flexibility
– Generality of standard programming language (C, MATLAB) for non compute-intensive parts
– Automated transformation of embedded DSL code for high performance on different target architectures
• Target architectures for Stencil DSL
– Vector-SIMD (AVX, LRBNi, ..), GPU, FPGA, customized accelerators
![Page 22: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/22.jpg)
Stencil DSL Example -- Standalone
int Nr; int Nc;
grid g [Nr][Nc];

double griddata a on g at 0,1;

pointfunction five_point_avg(p) {
  double ONE_FIFTH = 0.2;
  [1]p[0][0] = ONE_FIFTH*([0]p[-1][0] + [0]p[0][-1]
             + [0]p[0][0] + [0]p[0][1] + [0]p[1][0]);
}

iterate 1000 {
  stencil jacobi_2d {
    [0      ][0:Nc-1] : [1]a[0][0] = [0]a[0][0];
    [Nr-1   ][0:Nc-1] : [1]a[0][0] = [0]a[0][0];
    [0:Nr-1][0      ] : [1]a[0][0] = [0]a[0][0];
    [0:Nr-1][Nc-1   ] : [1]a[0][0] = [0]a[0][0];
    [1:Nr-2][1:Nc-2] : five_point_avg(a);
  }

  reduction max_diff max {
    [0:Nr-1][0:Nc-1] : fabs([1]a[0][0] - [0]a[0][0]);
  }
} check (max_diff < .00001) every 4 iterations
Reference data over two time steps: current (0) and next (1)
![Page 23: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/23.jpg)
Stencil DSL – Embedded in C
int main() {
  int Nr = 256; int Nc = 256; int T = 100;
  double *a = malloc(Nc*Nr*sizeof(double));

#pragma sdsl start time_steps:T block:8,8,8 tile:1,3,1 time:4
  int Nr; int Nc;
  grid g [Nr][Nc];
  double griddata a on g at 0,1;
  pointfunction five_point_avg(p) {
    double ONE_FIFTH = 0.2;
    [1]p[0][0] = ONE_FIFTH*([0]p[-1][0] + [0]p[0][-1]
               + [0]p[0][0] + [0]p[0][1] + [0]p[1][0]); }
  iterate 1000 {
    stencil jacobi_2d {
      [0      ][0:Nc-1] : [1]a[0][0] = [0]a[0][0];
      [Nr-1   ][0:Nc-1] : [1]a[0][0] = [0]a[0][0];
      [0:Nr-1][0      ] : [1]a[0][0] = [0]a[0][0];
      [0:Nr-1][Nc-1   ] : [1]a[0][0] = [0]a[0][0];
      [1:Nr-2][1:Nc-2] : five_point_avg(a);}
    reduction max_diff max {
      [0:Nr-1][0:Nc-1] : fabs([1]a[0][0] - [0]a[0][0]);
    }
  } check (max_diff < .00001) every 4 iterations
#pragma sdsl end
}
![Page 24: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/24.jpg)
Related Work
• 20+ publications over the last few years on optimizing stencil computations
• Some stencil DSLs and stencil compilers
– Pochoir (MIT), PATUS (Univ. Basel), Halide (MIT)
• Frameworks for building DSLs
– SEJITS (LBL); Liszt, OptiML, OptiQL, ... (Stanford)
• Our focus has been complementary: developing abstraction-specific compiler transformations matched to performance-critical characteristics of target architecture
![Page 25: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/25.jpg)
Stencils on Vector-SIMD Processors
• Fundamental source of inefficiency with stencil codes on current short-vector SIMD ISAs (e.g. SSE, AVX …)
– Concurrent operations on contiguous elements
– Each data element is reused in different “slots” of vector register
– Redundant loads or shuffle ops needed
• Compiler transformations based on matching computational characteristics of stencils to vector-SIMD architecture characteristics
for (i=0; i<H; ++i)
  for (j=0; j<W; ++j)
    c[i][j] += b[i][j] + b[i][j+1];
[Figure: data in memory (elements a to x) and vector registers VR0 to VR4 with slots 0 to 3, holding c[i][j] and b[i][j]; the two b operands occupy registers as "m n o p" and "n o p q".]
Inefficiency: Each element of b is loaded twice
![Page 26: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/26.jpg)
Data Layout Transformation
• (a) 1D vector in memory ⇔ (b) 2D logical view of same data
• (c) Transposed 2D array moves interacting elements into same slot of different vectors ⇔ (d) New 1D layout after transformation
• Boundaries need special handling
for (i = 0; i < N; ++i)
  a[i] = b[i-1] + b[i] + b[i+1];
(a) original layout (24 elements, vector slots 0 to 3):
a b c d | e f g h | i j k l | m n o p | q r s t | u v w x
(b) dimension lifted (V rows):
a b c d e f
g h i j k l
m n o p q r
s t u v w x
(c) transposed (V columns):
a g m s
b h n t
c i o u
d j p v
e k q w
f l r x
(d) transformed layout:
a g m s | b h n t | c i o u | d j p v | e k q w | f l r x
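The transformation from layout (a) to layout (d) can be sketched in C as follows. The constants `V` (vector length) and `M` (elements per lifted row) and the function names are assumptions chosen to match the 24-element figure; boundaries are handled with plain scalar code here, whereas the slide only says they need special handling. The check applies the three-point stencil to interior points in the transformed layout and compares against the original layout.

```c
#include <assert.h>
#include <string.h>

enum { V = 4, M = 6, N = V * M };  /* vector length, row length, total */

/* (a) to (d): view src as V rows of M, transpose, store as M rows of V. */
static void dlt(const double *src, double *dst) {
    for (int r = 0; r < V; r++)
        for (int c = 0; c < M; c++)
            dst[c * V + r] = src[r * M + c];
}

/* out[i] = in[i-1] + in[i] + in[i+1] for 1 <= i <= N-2, in DLT layout. */
static void stencil_dlt(const double *in_t, double *out_t) {
    /* Interior columns: neighbors sit in the same slot of the adjacent
     * vectors, so each r loop is one aligned vector operation. */
    for (int c = 1; c < M - 1; c++)
        for (int r = 0; r < V; r++)
            out_t[c * V + r] = in_t[(c - 1) * V + r] + in_t[c * V + r]
                             + in_t[(c + 1) * V + r];
    /* Boundary columns: a neighbor lives in a different slot. */
    for (int r = 0; r < V; r++) {
        if (r > 0)      /* left neighbor is the end of the previous row */
            out_t[r] = in_t[(M - 1) * V + (r - 1)] + in_t[r] + in_t[V + r];
        if (r < V - 1)  /* right neighbor is the start of the next row */
            out_t[(M - 1) * V + r] = in_t[(M - 2) * V + r]
                                   + in_t[(M - 1) * V + r] + in_t[r + 1];
    }
}

/* Writes the transformed order of the letters a..x into out. */
static void layout_letters(char out[N + 1]) {
    double src[N], dst[N];
    for (int p = 0; p < N; p++) src[p] = p;
    dlt(src, dst);
    for (int p = 0; p < N; p++) out[p] = (char)('a' + (int)dst[p]);
    out[N] = '\0';
}

/* Returns 1 if the stencil on the DLT layout matches the original. */
static int dlt_stencil_matches_reference(void) {
    double b[N], bt[N], ref[N] = {0}, out_t[N] = {0};
    for (int i = 0; i < N; i++) b[i] = (double)((i * i) % 11);
    for (int i = 1; i < N - 1; i++) ref[i] = b[i - 1] + b[i] + b[i + 1];
    dlt(b, bt);
    stencil_dlt(bt, out_t);
    for (int r = 0; r < V; r++)
        for (int c = 0; c < M; c++)
            if (out_t[c * V + r] != ref[r * M + c]) return 0;
    return 1;
}
```

`layout_letters` reproduces the order shown in (d), and only the two boundary columns need cross-slot accesses, which is where a vector implementation would use shuffles.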
![Page 27: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/27.jpg)
Data Layout Transformation: Evaluation
[Figure: bar chart of Gflop/s (y-axis, 0 to 16) by benchmark and microarchitecture. Benchmarks: J-1D, J-2D-5pt, J-2D-9pt, J-3D, Heattut-3D, FDTD-2D, Rician-2D. Microarchitectures: Phenom, Core2Quad, Core i7. Series: Ref., DLT, DLTi.]
![Page 28: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/28.jpg)
Standard Tiling with DLT
[Figure: (a) standard tiling, linear view, with Tile 1 to Tile 4 along space (i) and time (t) and the tile dependences between them; (b) standard tiling, DLT view (t=1).]
• Standard tiling cannot be used with the layout transform
• Inter-tile dependences prevent vectorization
![Page 29: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/29.jpg)
Split-tiling
[Figure: time vs. space iteration domain divided into alternating upright and inverted tiles, labeled 1 to 6.]
• Divide iteration space into upright and inverted tiles
• For each T timesteps, where T = time tile size…
• Execute all upright (U) tiles in parallel
• Execute all inverted (I) tiles in parallel
• Nested split tiling can be used for multiple spatial dimensions
• For 2D, 4 kinds of tiles: UU, UI, IU, II
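The upright/inverted schedule can be sketched for a 1D three-point Jacobi stencil as below. This is a minimal sequential illustration under stated assumptions, not the generated code from the talk: the sizes `N`, `B`, `TT`, `STEPS`, the fixed boundary points, the two-buffer storage, and all function names are chosen here for the example, and the loops over tiles in each phase are the ones that could run in parallel. Upright tiles shrink by one point per side per time step; inverted tiles then fill the gaps between them.

```c
#include <assert.h>

enum { N = 34, B = 8, TT = 4, STEPS = 8 };  /* requires B >= 2*TT */

/* Advance points lo..hi from time t to t+1 (buffers alternate by parity). */
static void step_range(double buf[2][N], int t, int lo, int hi) {
    const double *in = buf[t % 2];
    double *out = buf[(t + 1) % 2];
    for (int x = lo; x <= hi; x++)
        out[x] = (in[x - 1] + in[x] + in[x + 1]) / 3.0;
}

/* Reference: sweep the whole interior once per time step. */
static void jacobi_ref(double buf[2][N], int steps) {
    for (int t = 0; t < steps; t++)
        step_range(buf, t, 1, N - 2);
}

static void jacobi_split(double buf[2][N], int steps) {
    int nb = (N - 2) / B;  /* number of spatial blocks in the interior */
    for (int t0 = 0; t0 < steps; t0 += TT) {
        /* Phase 1: upright tiles, mutually independent. */
        for (int b = 0; b < nb; b++) {
            int x0 = 1 + b * B;
            for (int s = 0; s < TT; s++) {
                /* Edge tiles do not shrink next to the fixed boundary. */
                int lo = (b == 0) ? x0 : x0 + s;
                int hi = (b == nb - 1) ? x0 + B - 1 : x0 + B - 1 - s;
                step_range(buf, t0 + s, lo, hi);
            }
        }
        /* Phase 2: inverted tiles centered on block borders,
         * also mutually independent; they grow by one point per side. */
        for (int b = 1; b < nb; b++) {
            int xb = 1 + b * B;
            for (int s = 1; s < TT; s++)
                step_range(buf, t0 + s, xb - s, xb + s - 1);
        }
    }
}

/* Returns 1 if split tiling reproduces the reference result. */
static int split_matches_reference(void) {
    double ref[2][N], spl[2][N];
    for (int x = 0; x < N; x++) {
        double v = (double)((x * 7) % 11);
        ref[0][x] = ref[1][x] = spl[0][x] = spl[1][x] = v;
    }
    jacobi_ref(ref, STEPS);
    jacobi_split(spl, STEPS);
    for (int x = 0; x < N; x++) {
        double d = ref[STEPS % 2][x] - spl[STEPS % 2][x];
        if (d < 0) d = -d;
        if (d > 1e-12) return 0;
    }
    return 1;
}
```

Because each tile's updates at a given time step cover one contiguous range, the inner loop of `step_range` has no dependences across tiles of the same kind, which is what makes the schedule compatible with the layout transform.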
![Page 30: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/30.jpg)
Hybrid Split-tiling
[Figure: time vs. space diagram with tiles numbered 1 to 12.]
• Trade-off between standard tiling (parallelogram shape) and split tiling (trapezoidal shape)
• Split tiling enables inter-tile parallelism and DLT but cache footprint grows with time-tile size
• Standard tiling inhibits inter-tile parallelism but cache footprint is independent of time-tile size
• Hybrid approach: combine split tiling along some inner spatial loops and standard tiling along outer spatial loops
![Page 31: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/31.jpg)
Experimental Results: DLT+Split Tiling
[Figure: bar chart of GFLOP/S (y-axis, 0 to 100) by benchmark on Sandy Bridge. Series: icc-par, pochoir, pluto, nested-split, hybrid-split.]
![Page 32: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/32.jpg)
Stencils on GPUs
• Vector-SIMD alignment problems non-existent
• Different optimization challenges: limited forms of synchronization, avoidance of thread divergence
• Overlapped tiling
– Redundantly compute neighboring cells to avoid inter-thread-block sync, lower communication, and avoid thread divergence
[Figure: overlapped tile. Legend: logical computation; actual computation at time t; actual computation at time t+1; elements needed at time t+1; useless computation.]
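Overlapped tiling can be illustrated with a sequential C sketch for a 1D three-point Jacobi stencil. Everything specific here is an assumption for the example: the sizes `N`, `B`, `TT`, the fixed boundary points, and the function names. Each tile plays the role of one thread block: it starts from the tile plus a halo of `TT` points per side, advances `TT` time steps entirely on private copies with the computed region shrinking by one point per side per step, and writes back only its own `B` points. For simplicity the private copies span the whole array; a GPU kernel would stage only the tile and its halo in shared memory.

```c
#include <assert.h>
#include <string.h>

enum { N = 34, B = 8, TT = 4 };  /* N - 2 is a multiple of B */

static double avg3(const double *in, int x) {
    return (in[x - 1] + in[x] + in[x + 1]) / 3.0;
}

/* Reference: TT full sweeps over the interior. */
static void jacobi_ref(const double *in, double *out) {
    double cur[N], nxt[N];
    memcpy(cur, in, sizeof cur);
    memcpy(nxt, in, sizeof nxt);
    for (int s = 0; s < TT; s++) {
        for (int x = 1; x < N - 1; x++) nxt[x] = avg3(cur, x);
        memcpy(cur, nxt, sizeof cur);
    }
    memcpy(out, cur, sizeof cur);
}

/* Overlapped tiling; returns the number of point updates performed. */
static long jacobi_overlapped(const double *in, double *out) {
    long updates = 0;
    memcpy(out, in, N * sizeof *in);  /* carries the fixed boundaries */
    for (int x0 = 1; x0 < N - 1; x0 += B) {  /* tiles are independent */
        double cur[N], nxt[N];
        memcpy(cur, in, sizeof cur);
        memcpy(nxt, in, sizeof nxt);
        for (int s = 0; s < TT; s++) {
            /* Halo region still needed after this step. */
            int lo = x0 - (TT - 1 - s);
            int hi = x0 + B - 1 + (TT - 1 - s);
            if (lo < 1) lo = 1;
            if (hi > N - 2) hi = N - 2;
            for (int x = lo; x <= hi; x++) {
                nxt[x] = avg3(cur, x);
                updates++;
            }
            memcpy(cur + lo, nxt + lo, (size_t)(hi - lo + 1) * sizeof *cur);
        }
        memcpy(out + x0, cur + x0, B * sizeof *cur);  /* own points only */
    }
    return updates;
}

/* Returns 1 on a match and reports the update count through *updates. */
static int overlapped_matches_reference(long *updates) {
    double in[N], ref[N], ovl[N];
    for (int x = 0; x < N; x++) in[x] = (double)((x * 7) % 11);
    jacobi_ref(in, ref);
    *updates = jacobi_overlapped(in, ovl);
    for (int x = 0; x < N; x++) {
        double d = ref[x] - ovl[x];
        if (d < 0) d = -d;
        if (d > 1e-12) return 0;
    }
    return 1;
}
```

The returned update count exceeds `(N - 2) * TT`, the work of the reference; the difference is the redundant halo computation that is traded for the absence of synchronization between tiles.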
![Page 33: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/33.jpg)
Stencil Compiler for GPU: Performance
![Page 34: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/34.jpg)
Multi-target Code Generation from SDSL
[Figure: Matlab/eSDSL and C/eSDSL inputs feed Multi-target Optimization and Code Generation, which targets Multicore CPU, GPU, and FPGA.]
![Page 35: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/35.jpg)
Many Stand-Alone DSL Efforts
[Figure: DSL1, DSL2, …, DSLm, each with its own compiler (Compiler 1, Compiler 2, …, Compiler m), targeting SSE, AVX, VHDL, CUDA, VSX, …]
• Our DSL compilers for tensors and stencils did not reuse components
• Brute-force approach: N*M compilers for N domains and M platforms
• Can we achieve reuse in optimizing many DSLs for multiple targets?
![Page 36: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/36.jpg)
A Layered Approach
Architecture-Independent Domain-Specific Layer: Stencil DSL, Tensor DSL, …, DSL-k; Domain-Specific Transformations; Domain-Specific Partitioners
Architecture-Independent General Transformations: Polyhedral Transformations; General-purpose Transformations; General-purpose Partitioners
Architecture-Specific Kernel Optimizers: Multi-core CPU, GPU, FPGA, …
![Page 37: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/37.jpg)
Codelet-Centric Vector-SIMD Optimization
• Problem: Compilers like icc typically achieve only a fraction of processor’s peak performance on polyhedrally transformed code
– Overheads due to unaligned load/store operations
– Overheads due to repeated load/stores of reused data elements
• Solution: Develop a decoupled two-step optimization process that allows separation of concerns:
– Use polyhedral compiler transformations to form small (L1 cache resident) tiles with specific desirable properties: vector codelet
– Use a low-level code optimizer (SPIRAL back-end) to generate optimized code for vector codelet
• Properties of vectorizable codelet: formulated as an Integer Linear Program
– Maximal inner-most loop parallelism
– Maximal number of stride 0/1 accesses in inner-most loop
– Maximal number of innermost permutable loops
– Minimal number of unaligned load/store operations
• Same approach now being targeted at customized accelerators
• Details in PLDI ’13 paper
![Page 38: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/38.jpg)
Experimental Results: Vector Codelets
![Page 39: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/39.jpg)
IO Complexity: Revisiting the Red/Blue Pebble Game
ENS Fabrice Rastello
Louisiana State University J. Ramanujam
Ohio State University Venmugil Elango, P. Sadayappan
UCLA
Louis-Noel Pouchet
![Page 40: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/40.jpg)
High-Level Summary
• Characterizing “communication” complexity is much more complicated than computational complexity
– Number of data transfers from large “slow” memory to limited “fast” memory (cache, registers etc.)
– Unlike computational complexity, depends on order of execution of operations and amount of fast memory
• First IO lower bounds: Hong/Kung classic 1981 paper
– Tight lower bounds for some algorithms but loose for others
– Monolithic: Cannot compose analyses: S = S1;S2
– Only used for analysis of regular Computational DAGs
– Recent advance by Demmel, Yelick, and colleagues at Berkeley provides a generalization for a class of loop computations (linear and tensor algebra, N-body, etc.), but not effective for stencils etc.
• We develop pebbling variant and new bounding technique
1. Enables composability: LB(S) from LB(S1) and LB(S2), S = S1;S2
2. Tighter lower bounds for complementary class of CDAGs
3. Enables full automation for arbitrary, irregular CDAGs
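The claim that data movement depends on execution order and on the amount of fast memory can be checked with a small simulation. The sketch below is an illustration under explicit assumptions, not the pebbling analysis from the talk: fast memory is modelled as a fully associative store with LRU replacement (the pebble game lets the player choose what to evict), only loads from slow memory are counted, and the sizes `NN` and `TILE` and all names are invented here. It counts loads for the same matrix-multiply operations executed in two different orders.

```c
#include <assert.h>

enum { NN = 16, TILE = 4, MAXCAP = 1024 };

typedef struct {
    int cap, used;       /* capacity in words, words currently held */
    long stamp, loads;   /* logical clock, loads from slow memory   */
    int addr[MAXCAP];
    long last[MAXCAP];   /* time of last use, for LRU eviction      */
} FastMem;

/* Access one word; on a miss, load it and evict the LRU word if full. */
static void touch(FastMem *fm, int a) {
    int victim = 0;
    fm->stamp++;
    for (int s = 0; s < fm->used; s++) {
        if (fm->addr[s] == a) { fm->last[s] = fm->stamp; return; }
        if (fm->last[s] < fm->last[victim]) victim = s;
    }
    fm->loads++;
    if (fm->used < fm->cap) victim = fm->used++;
    fm->addr[victim] = a;
    fm->last[victim] = fm->stamp;
}

/* One multiply-add C[i][j] += A[i][k] * B[k][j] touches three words. */
static void mac(FastMem *fm, int i, int j, int k) {
    touch(fm, i * NN + k);                /* A[i][k] */
    touch(fm, NN * NN + k * NN + j);      /* B[k][j] */
    touch(fm, 2 * NN * NN + i * NN + j);  /* C[i][j] */
}

/* Loads for the plain i, j, k order with cap words of fast memory. */
static long loads_untiled(int cap) {
    FastMem fm = {0};
    fm.cap = cap;
    for (int i = 0; i < NN; i++)
        for (int j = 0; j < NN; j++)
            for (int k = 0; k < NN; k++)
                mac(&fm, i, j, k);
    return fm.loads;
}

/* Loads for the same operations executed tile by tile. */
static long loads_tiled(int cap) {
    FastMem fm = {0};
    fm.cap = cap;
    for (int it = 0; it < NN; it += TILE)
    for (int jt = 0; jt < NN; jt += TILE)
    for (int kt = 0; kt < NN; kt += TILE)
        for (int i = it; i < it + TILE; i++)
        for (int j = jt; j < jt + TILE; j++)
        for (int k = kt; k < kt + TILE; k++)
            mac(&fm, i, j, k);
    return fm.loads;
}
```

With 64 words of fast memory the tiled order needs fewer loads than the untiled one, while with 768 words (all three matrices fit) both orders load each word exactly once. The operation count is identical in every case, which is the contrast with computational complexity that the summary draws.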
![Page 41: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/41.jpg)
CDAG characterization [Bilardi 2001]
![Page 42: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/42.jpg)
Hong-Kung (1981): Red-Blue Pebble Game
![Page 43: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/43.jpg)
Code Example and its CDAG
[Figure: CDAG of the loop; vertex labels A[0], A[1], A[2], A[3], S, S]
for (i = 1; i < 4; ++i)
  S += A[i-1] * A[i];
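The loop above can be written out as an explicit CDAG. The following is a minimal sketch in Python, assuming one vertex per multiply and one per add; the vertex names (`m1`, `s1`, `s0` for the initial value of S) and the function `build_cdag` are hypothetical, and the slide's own figure may group operations differently.

```python
def build_cdag(n=4):
    """Return (preds, inputs, outputs) for: for (i = 1; i < n; ++i) S += A[i-1] * A[i].

    Assumed modeling: one vertex per multiply (m_i) and one per add (s_i);
    s0 stands for the initial value of S. All names are illustrative only.
    """
    preds = {f"A{i}": [] for i in range(n)}      # array elements are inputs
    preds["s0"] = []                             # initial S is also an input
    for i in range(1, n):
        preds[f"m{i}"] = [f"A{i-1}", f"A{i}"]    # A[i-1] * A[i]
        preds[f"s{i}"] = [f"s{i-1}", f"m{i}"]    # S += product
    inputs = [v for v, p in preds.items() if not p]
    outputs = [f"s{n-1}"]                        # final value of S
    return preds, inputs, outputs


preds, inputs, outputs = build_cdag()
print(len(preds), inputs, outputs)
```

In this model A[1] and A[2] each feed two multiplies, which is the data reuse that a pebbling strategy can exploit or lose depending on how many red pebbles are available.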
![Page 44: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/44.jpg)
Red-Blue Pebble Game: 2 Red Pebbles
[Figure: pebbling steps on the example CDAG; vertex labels A[0], A[1], A[2], A[3], S, S]
![Page 60: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/60.jpg)
Red-Blue Pebble Game: 3 Red Pebbles
[Figure: pebbling steps on the example CDAG; vertex labels A[0], A[1], A[2], A[3], S, S]
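The effect of the red-pebble budget can be checked mechanically by replaying a move sequence. The sketch below assumes the usual four red-blue rules (load, store, compute, delete) and counts one IO per load or store. The pebbling steps shown on the slides are not recoverable from this transcript, so the tiny CDAG, the move list, and the function name `replay_red_blue` are hypothetical.

```python
def replay_red_blue(preds, inputs, outputs, moves, num_red):
    """Replay (op, vertex) moves under the red-blue rules and return the IO count.

    Raises ValueError on an illegal move, on exceeding num_red red pebbles,
    or if some output lacks a blue pebble at the end.
    """
    red, blue = set(), set(inputs)      # inputs start with blue pebbles
    io = 0
    for op, v in moves:
        if op == "load":                # red onto a blue-pebbled vertex: 1 IO
            if v not in blue:
                raise ValueError(f"load {v}: no blue pebble")
            red.add(v)
            io += 1
        elif op == "store":             # blue onto a red-pebbled vertex: 1 IO
            if v not in red:
                raise ValueError(f"store {v}: no red pebble")
            blue.add(v)
            io += 1
        elif op == "compute":           # all predecessors must hold red pebbles
            if not preds[v] or any(p not in red for p in preds[v]):
                raise ValueError(f"compute {v}: operands not in fast memory")
            red.add(v)
        elif op == "delete":            # free a red pebble at no cost
            red.discard(v)
        else:
            raise ValueError(f"unknown move {op}")
        if len(red) > num_red:
            raise ValueError(f"{op} {v}: more than {num_red} red pebbles")
    if not set(outputs) <= blue:
        raise ValueError("incomplete game: an output has no blue pebble")
    return io


# Hypothetical CDAG: c = a op b, d = c op b
preds = {"a": [], "b": [], "c": ["a", "b"], "d": ["c", "b"]}
moves = [("load", "a"), ("load", "b"), ("compute", "c"),
         ("delete", "a"), ("compute", "d"), ("store", "d")]
io_count = replay_red_blue(preds, ["a", "b"], ["d"], moves, num_red=3)
print(io_count)
```

Replaying the same moves with `num_red=2` raises `ValueError` at the first compute, because this sketch requires a free red pebble for the result (no sliding of a pebble from an operand to the result).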
![Page 73: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/73.jpg)
Hong-Kung (1981): S-partitioning of CDAG
![Page 74: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/74.jpg)
Hong-Kung (1981): S-partitioning theorem
Lower Bound
![Page 75: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/75.jpg)
The Composability Problem
• The IO for a computation includes all inputs and outputs of the CDAG
• Consider S = S1 ; S2
• If some outputs of S1 are inputs to S2, they may be passed in red pebbles in a complete game for S
• OptIO(S) <= OptIO(S1) + OptIO(S2)
• Hence Lbound(S1) and Lbound(S2) cannot simply be added to get Lbound(S)
• In contrast, Comp(S) = Comp(S1) + Comp(S2)
[Figure: Code Section S1 followed by Code Section S2]
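The inequality OptIO(S) <= OptIO(S1) + OptIO(S2) can be observed on a toy case by exhaustive search. The sketch below runs Dijkstra's algorithm over red-blue game states (loads and stores cost 1, computes and deletes cost 0). It is only practical for very small CDAGs, and the function name `opt_io` and the three-vertex chain are assumptions made for illustration.

```python
import heapq
from itertools import count


def opt_io(preds, inputs, outputs, num_red):
    """Minimum number of loads + stores over all complete red-blue games.

    Exhaustive shortest-path search; the state space is exponential in |V|.
    Returns None if no complete game exists with num_red red pebbles.
    """
    start = (frozenset(), frozenset(inputs))
    goal = set(outputs)
    dist = {start: 0}
    tie = count()                       # tie-breaker so states are never compared
    heap = [(0, next(tie), start)]
    while heap:
        d, _, (red, blue) = heapq.heappop(heap)
        if d > dist[(red, blue)]:
            continue                    # stale heap entry
        if goal <= blue:
            return d
        nxt = []
        for v in preds:
            if v in blue and v not in red:
                nxt.append((1, red | {v}, blue))     # load
            if v in red and v not in blue:
                nxt.append((1, red, blue | {v}))     # store
            if v in red:
                nxt.append((0, red - {v}, blue))     # delete
            if preds[v] and v not in red and all(p in red for p in preds[v]):
                nxt.append((0, red | {v}, blue))     # compute
        for cost, r, b in nxt:
            if len(r) > num_red:
                continue
            if d + cost < dist.get((r, b), float("inf")):
                dist[(r, b)] = d + cost
                heapq.heappush(heap, (d + cost, next(tie), (r, b)))
    return None


# Hypothetical sections: S1 computes y from x, S2 computes z from y.
io_s1 = opt_io({"x": [], "y": ["x"]}, ["x"], ["y"], num_red=2)
io_s2 = opt_io({"y": [], "z": ["y"]}, ["y"], ["z"], num_red=2)
io_s = opt_io({"x": [], "y": ["x"], "z": ["y"]}, ["x"], ["z"], num_red=2)
print(io_s1, io_s2, io_s)
```

In the composed game the intermediate value y stays under a red pebble, so it is neither stored by S1 nor loaded by S2, and the optimum for S is strictly smaller than the sum of the two separate optima.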
![Page 76: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/76.jpg)
Addressing Composability
1. Modify pebble game to disallow re-computation
– Add a third type of pebble: white
– Place a white pebble on a vertex, along with a red pebble, when the vertex is first computed
– Compute rule changed to allow a compute only when the vertex does not have a white pebble (no change to load, store and delete rules)
2. Define a new metric IIO (internal IO) of CDAG G:
– Given G = (I,V,E,O), internal CDAG G’ = (∅,V,E,∅)
– IIO(G) is IO(G’), for optimal game on G’
– IIO(G) ≤ IO(G) ≤ IIO(G)+|I|+|O|
• Composable lower bounds: IIO(G1)+IIO(G2) ≤ IIO(G1 ∪ G2)
• Lower bounds NOT composable: IO(G1 ∪ G2) ≤ IO(G1)+IO(G2)
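The white-pebble rule and the internal CDAG G’ can be illustrated with a small replay function. This is a sketch under stated assumptions rather than the authors' formal definition: input vertices must be loaded, a predecessor-free vertex that is not tagged as an input (as in G’) may be computed once at no IO cost, and a game is complete when every non-input vertex carries a white pebble and every output a blue one. The name `replay_rbw` and the three-vertex chain are hypothetical.

```python
def replay_rbw(preds, inputs, outputs, moves, num_red):
    """Replay (op, vertex) moves under red-blue-white rules; return the IO count."""
    red, blue, white = set(), set(inputs), set()
    io = 0
    for op, v in moves:
        if op == "load":
            if v not in blue:
                raise ValueError(f"load {v}: no blue pebble")
            red.add(v)
            io += 1
        elif op == "store":
            if v not in red:
                raise ValueError(f"store {v}: no red pebble")
            blue.add(v)
            io += 1
        elif op == "compute":
            if v in white:              # white pebble forbids re-computation
                raise ValueError(f"compute {v}: already computed")
            if v in inputs or any(p not in red for p in preds[v]):
                raise ValueError(f"compute {v}: not computable here")
            red.add(v)
            white.add(v)                # mark the first (and only) computation
        elif op == "delete":
            red.discard(v)
        else:
            raise ValueError(f"unknown move {op}")
        if len(red) > num_red:
            raise ValueError(f"{op} {v}: more than {num_red} red pebbles")
    all_computed = all(v in white for v in preds if v not in inputs)
    if not all_computed or not set(outputs) <= blue:
        raise ValueError("incomplete game")
    return io


preds = {"a": [], "b": ["a"], "c": ["b"]}
# Game on G = (I, V, E, O) with I = {a}, O = {c}
game_g = [("load", "a"), ("compute", "b"), ("delete", "a"),
          ("compute", "c"), ("store", "c")]
io_g = replay_rbw(preds, ["a"], ["c"], game_g, num_red=2)
# Game on the internal CDAG G' = (∅, V, E, ∅): no vertex is tagged input or output
game_internal = [("compute", "a"), ("compute", "b"), ("delete", "a"),
                 ("compute", "c")]
io_internal = replay_rbw(preds, [], [], game_internal, num_red=2)
print(io_internal, io_g)
```

For this chain the two games shown use 0 and 2 IOs, which is consistent with IIO(G) ≤ IO(G) ≤ IIO(G)+|I|+|O| when |I| = |O| = 1. The replay checks a given game only; it does not prove optimality.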
![Page 77: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/77.jpg)
Red-Blue-White Pebble Game
![Page 78: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/78.jpg)
S-partitioning of CDAG – new definition
![Page 79: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/79.jpg)
IIO RBW Pebble Game: 2 Red Pebbles
[Figure: pebbling steps on the example CDAG; vertex labels A[0], A[1], A[2], A[3], S, S]
![Page 93: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/93.jpg)
Limitation of Hong-Kung S-partition Model
• Hong/Kung’s S-partitioning approach constrains vertex partitions based on the size of input dominator sets
– Not effective for computations with few inputs that generate many more intermediates and finally output few values; extreme example: Diamond DAG
– For S>0 red pebbles, a valid 2S-partition is the entire CDAG, i.e. lower bound = S*(1-1) = 0
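The arithmetic in the last bullet follows from the general form of the Hong-Kung bound. As a hedged reconstruction (the theorem slide itself did not survive in this transcript), write Q for the number of IO operations of any complete game with S red pebbles and P(2S) for the minimum number of vertex sets in any valid 2S-partition; both symbols are introduced here for illustration.

```latex
\[
  Q \;\ge\; S \cdot \bigl( P(2S) - 1 \bigr)
\]
% If a single vertex set covering the whole CDAG is already a valid
% 2S-partition, as the slide states for the Diamond DAG, then P(2S) = 1:
\[
  P(2S) = 1 \;\Longrightarrow\; Q \;\ge\; S \cdot (1 - 1) = 0 .
\]
```

A bound of zero carries no information, which is why a different bounding technique is needed for this class of CDAGs.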
![Page 94: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/94.jpg)
New Approach: MinCut Partitioning
• Association between the wavefront of “live” vertices just before firing vertex x in any valid RBW pebble game and a convex graph mincut including vertex x
• The number of vertices in the maximal convex graph-mincut associated with all vertices is a lower bound on the maximal schedule wavefront in an optimal pebble game.
[Figure: vertex x with vertex sets S and T]
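The notion of a schedule wavefront can be made concrete on a small diamond DAG. The sketch below measures the largest wavefront of one given schedule; it does not compute the convex mincut itself. It assumes the following definition, which may differ in detail from the authors': the wavefront at x is x plus every already-fired vertex that still has at least one unfired successor. The grid model of the diamond DAG and both function names are hypothetical.

```python
def diamond_dag(n):
    """n x n grid; vertex (i, j) feeds (i+1, j) and (i, j+1) when they exist."""
    succs = {}
    for i in range(n):
        for j in range(n):
            succs[(i, j)] = [(a, b) for a, b in ((i + 1, j), (i, j + 1))
                             if a < n and b < n]
    return succs


def max_wavefront(succs, schedule):
    """Largest wavefront over a schedule (a topological order of all vertices).

    Assumed definition: wavefront at x = {x} plus fired vertices that still
    have an unfired successor, i.e. values that must remain available.
    """
    fired = set()
    best = 0
    for x in schedule:
        live = {u for u in fired if any(s not in fired for s in succs[u])}
        best = max(best, len(live) + 1)     # +1 counts x itself
        fired.add(x)
    return best


n = 4
succs = diamond_dag(n)
row_major = [(i, j) for i in range(n) for j in range(n)]
widest = max_wavefront(succs, row_major)
print(widest)
```

For the row-major schedule the wavefront grows with the side length n of the grid, so a budget of red pebbles smaller than the wavefront forces some live values out to slow memory.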
![Page 95: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/95.jpg)
Diamond DAG with const. mincut wrt x
[Figure: vertex x with vertex sets S and T]
![Page 96: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/96.jpg)
Constrained mincut wrt x
[Figure: vertex x with vertex sets S and T]
![Page 97: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/97.jpg)
Constrained mincut wrt x
[Figure: vertex x with vertex sets S and T]
![Page 98: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/98.jpg)
Diamond DAG with 4 disjoint vertex sets
[Figure: diamond DAG split into vertex sets G1, G2, G3, G4]
![Page 99: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/99.jpg)
IO Complexity Comparison
[Table comparing IO lower bounds. Columns: Hong-Kung, Our bounds. Rows: Diamond DAG, FFT, 9-pt Seidel. Cell values not preserved.]
![Page 100: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/100.jpg)
IO Lower Bounds for Arbitrary CDAGs
• Automation: tool to generate an IO lower bound for an arbitrary CDAG instance, for a given number of red pebbles
• Combine both approaches and use the tighter bound:
– 2S-partitioning of Hong-Kung
– New convex mincut
• Each bounding approach is effective for a different class of CDAGs
– 2S-partitioning: high-fanout DAGs like Mat-Mult
– Convex mincut: bounded-fanout CDAGs like stencils
![Page 101: Domain’Specific-Abstrac3ons-and- PerformancePortability ...labexcompilation.ens-lyon.fr/wp-content/uploads/2013/02/Saday.pdfSrini Parthasarathy (OSU) Louis-Noel Pouchet (UCLA) Russ](https://reader034.vdocuments.us/reader034/viewer/2022051905/5ff6f091b3a3467c10688d02/html5/thumbnails/101.jpg)
Summary
• Characterizing lower bounds on the inherent data access complexity of algorithms is important
• Previous approaches have some limitations
• New approach to deriving lower bounds
– Develop tighter analytical bounds for some algorithms
– Enable composing lower bounds for constituent sub-computations
– Generate lower bounds for irregular CDAGs