Tobias Fuchs
LMU Munich, MNM Team
www.mnm-team.org
Expressing and Exploiting
Multi-Dimensional Locality
in DASH
SPPEXA Symposium 2016
Background
DASH
• Vision: “C++ standard template library for HPC”.
• Provides n-dim array abstraction for stencil- and dense matrix
operations.
• Realization of the PGAS (partitioned global address space)
programming model.
PGAS and Locality
• Combine distributed memory into virtual global memory space.
• Strong sense of data ownership:
private, shared local, shared global
// Private: visible only to the declaring unit.
int p = 42;
// Shared local: element of a global array stored in this unit's own memory.
int p = 42;
dash::Array<T> a;
a.local[4] = p;
// Shared global: element of a global array that may reside on any unit.
int p;
dash::Array<T> a;
p = a[40];
• Locality (access distance to data) is the predominant factor for efficiency.
L = (local accesses) / (total accesses)
• Access pattern on data depends on implementation of algorithm.
• Complexity to maintain locality increases exponentially with the number
of data dimensions.
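The locality ratio L can be made concrete with a small sketch that does not use DASH. It assumes a 1-D array of n elements split into contiguous blocks of block_size elements, one block per unit, and a 3-point stencil that reads i-1, i and i+1 for every owned element. The function name and the distribution are illustrative assumptions, not part of the slides.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not a DASH function): locality
// L = (local accesses) / (total accesses) for one unit that applies a
// 3-point stencil to every element of its contiguous block.
double stencil_locality(long n, long block_size, long unit) {
  assert(n > 0 && block_size > 0 && unit >= 0 && unit * block_size < n);
  const long begin = unit * block_size;
  const long end   = std::min(begin + block_size, n);
  long local = 0;
  long total = 0;
  for (long i = begin; i < end; ++i) {
    for (long j = i - 1; j <= i + 1; ++j) {
      if (j < 0 || j >= n) continue;        // outside the index domain
      ++total;
      if (j / block_size == unit) ++local;  // element j is owned by this unit
    }
  }
  return static_cast<double>(local) / static_cast<double>(total);
}
```

With larger blocks only the two block boundaries cause remote accesses, so L approaches 1. In more dimensions the boundary grows relative to the block volume, which is one way to see why locality becomes harder to maintain.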
Objective and Approach
Objective
Portable efficiency by automatic deduction of optimal data distribution.
Approach
1. Identify distribution properties that allow well-defined specification of
any data distribution.
2. Let algorithms specify soft / hard constraints on distribution properties.
3. Derive optimal distribution for a given set of constraints.
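The three steps can be sketched at run time as a selection among candidate distributions. This is only an illustration under stated assumptions: properties are modelled as bit flags, hard constraints must all be met, and soft constraints are used as a score. The names Property, Candidate and select_candidate are hypothetical and do not reflect how DASH implements the deduction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical property flags, named after properties in the slides.
enum Property : std::uint32_t {
  kBalanced    = 1u << 0,
  kRegular     = 1u << 1,
  kRectangular = 1u << 2,
  kNeighbor    = 1u << 3,
  kRowMajor    = 1u << 4
};

struct Candidate {
  std::string   name;        // label of a candidate distribution
  std::uint32_t properties;  // bitwise OR of the properties it provides
};

// Number of set bits in v.
int count_bits(std::uint32_t v) {
  int c = 0;
  for (; v != 0; v &= v - 1) ++c;
  return c;
}

// Returns the index of the candidate that satisfies all hard constraints
// and the largest number of soft constraints, or -1 if none qualifies.
// Ties are resolved in favor of the earlier candidate.
int select_candidate(const std::vector<Candidate>& candidates,
                     std::uint32_t hard, std::uint32_t soft) {
  assert((hard & soft) == 0);  // a property is either hard or soft
  int best = -1;
  int best_score = -1;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    if ((candidates[i].properties & hard) != hard) continue;
    const int score = count_bits(candidates[i].properties & soft);
    if (score > best_score) {
      best = static_cast<int>(i);
      best_score = score;
    }
  }
  return best;
}
```

An algorithm would pass its mandatory properties as hard and its preferences as soft. A result of -1 signals that no known distribution fulfills the hard constraints.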
Distribution Properties
Property Categories
The mappings that make up a data distribution can be categorized by stage:
• Partitioning: decomposing the index domain into blocks
• Mapping: assigning blocks to units
• Layout: storage order of block elements in each unit's local memory
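The three stages can be followed for a single element in a minimal sketch. It assumes a 2-D index domain split into equally sized rectangular blocks, round-robin assignment of blocks to units, and row-major storage inside each block. BlockPattern2D and Location are hypothetical types for illustration, not DASH classes.

```cpp
#include <cassert>

struct Location {
  int unit;    // unit that owns the element
  int offset;  // position in that unit's local memory
};

// Hypothetical 2-D pattern; block extents are assumed to divide the
// domain extents evenly.
struct BlockPattern2D {
  int rows, cols;    // extents of the index domain
  int brows, bcols;  // extents of one block
  int nunits;        // number of units

  Location locate(int r, int c) const {
    assert(nunits > 0 && rows % brows == 0 && cols % bcols == 0);
    assert(r >= 0 && r < rows && c >= 0 && c < cols);
    // 1. Partitioning: which block contains (r, c)?
    const int blocks_per_row = cols / bcols;
    const int block = (r / brows) * blocks_per_row + (c / bcols);
    // 2. Mapping: blocks are assigned to units round-robin.
    const int unit        = block % nunits;
    const int local_block = block / nunits;  // index among the unit's blocks
    // 3. Layout: row-major inside a block, blocks stored back to back.
    const int in_block = (r % brows) * bcols + (c % bcols);
    return { unit, local_block * brows * bcols + in_block };
  }
};
```

Changing only one stage, for example the block-to-unit assignment in step 2, yields a different distribution while the other two stages stay untouched.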
Example: Morton Order Distribution
| Category | Properties |
| --- | --- |
| Partitioning | balanced, regular, rectangular |
| Mapping | balanced, minimal, neighbor |
| Layout | blocked, linear, canonical |
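For readers unfamiliar with Morton order, the following sketch computes the Morton (Z-order) index of 2-D coordinates by interleaving their bits. It uses the common mask-and-shift technique and is independent of DASH. How the slides' pattern applies the index to blocks is not shown in the source, so treating x and y as block coordinates is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Spreads the lower 16 bits of v so that bit i moves to bit 2*i.
std::uint32_t spread_bits(std::uint32_t v) {
  v &= 0x0000FFFFu;
  v = (v | (v << 8)) & 0x00FF00FFu;
  v = (v | (v << 4)) & 0x0F0F0F0Fu;
  v = (v | (v << 2)) & 0x33333333u;
  v = (v | (v << 1)) & 0x55555555u;
  return v;
}

// Morton index of (x, y): bits of x occupy the even positions,
// bits of y the odd positions.
std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
  assert(x < 65536u && y < 65536u);  // 16 bits per coordinate
  return spread_bits(x) | (spread_bits(y) << 1);
}
```

Coordinates that are close in both dimensions tend to receive nearby indices, which is why this ordering is associated with the neighbor mapping property.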
Use Cases
Automatic Deduction of Optimal Data Distribution
“Find a data distribution that fulfills a set of properties.”
// Deduces pattern type, initializes pattern instance:
auto pattern =
  make_pattern<
    // Template arguments: compile-time deduction
    // via C++11 generic template metaprogramming
    partitioning_properties<
      balanced, regular >,
    mapping_properties<
      neighbor >,
    layout_properties<
      blocked, row_major >
  >(
    // Function arguments: run-time deduction
    Size<2>(10000,10000),
    Team<2>(24,24));
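The compile-time half of such a deduction can be sketched with plain C++11 type traits. The example below is not DASH's make_pattern: every name lives in a hypothetical sketch namespace, and the rule "a blocked layout selects a tiled pattern type" is an assumption made only to show the mechanism.

```cpp
#include <type_traits>

namespace sketch {

// Property tags, named after properties used in the slides.
struct blocked {};
struct row_major {};

// contains<T, Ts...>::value is true if T occurs in Ts...
template <typename T, typename... Ts>
struct contains : std::false_type {};

template <typename T, typename U, typename... Ts>
struct contains<T, U, Ts...>
  : std::conditional<std::is_same<T, U>::value,
                     std::true_type,
                     contains<T, Ts...>>::type {};

// Property list for the layout category.
template <typename... Props>
struct layout_properties {};

// Hypothetical pattern types to choose from.
struct TiledPattern {};
struct LinearPattern {};

// Maps a property list to a pattern type at compile time.
template <typename Layout>
struct deduce_pattern;

template <typename... Props>
struct deduce_pattern<layout_properties<Props...>> {
  using type = typename std::conditional<
      contains<blocked, Props...>::value,
      TiledPattern, LinearPattern>::type;
};

static_assert(
    std::is_same<
        deduce_pattern<layout_properties<blocked, row_major>>::type,
        TiledPattern>::value,
    "a blocked layout selects the tiled pattern type");

}  // namespace sketch
```

Because the selection happens in the type system, a property set with no matching pattern can be turned into a compile error instead of a run-time failure.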
“Find a data distribution that is optimal for a given algorithm.”
// Deduce pattern from algorithm constraints:
auto pattern = dash::make_pattern<
                 dash::summa_pattern_constraints >(
                   Size<2>(10000,10000),
                   Team<2>(24,24));

dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);

dash::summa(matrix_a, matrix_b, matrix_c);
Automatic Deduction of Optimal Algorithm
“Find algorithm variant that is optimal for a given data distribution.”
// Specify how data is distributed in global memory:
auto pattern = dash::TilePattern<2>(10000,10000, TILED(100,100));

dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);

// Selects matrix product algorithm variant that is optimal for the given
// pattern:
dash::multiply(matrix_a, matrix_b, matrix_c);
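One way to select an algorithm variant from the pattern type is tag dispatch on a pattern trait. The sketch below is independent of DASH. The is_tiled trait, the Matrix stand-in and the returned variant names are assumptions for illustration, and the slides do not state which variant dash::multiply picks for which pattern.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical pattern types exposing a compile-time trait.
struct TilePattern    { static constexpr bool is_tiled = true;  };
struct BlockedPattern { static constexpr bool is_tiled = false; };

// Minimal stand-in for a distributed matrix: only the extent is kept.
template <typename Pattern>
struct Matrix { int extent; };

// Variant for tiled patterns (assumed here to be SUMMA).
template <typename M>
std::string multiply_impl(const M&, const M&, M&, std::true_type) {
  return "summa";
}

// Fallback variant for all other patterns.
template <typename M>
std::string multiply_impl(const M&, const M&, M&, std::false_type) {
  return "generic";
}

// Chooses the variant at compile time from the pattern's trait and
// returns the name of the chosen variant instead of computing a product.
template <typename Pattern>
std::string multiply(const Matrix<Pattern>& a, const Matrix<Pattern>& b,
                     Matrix<Pattern>& c) {
  assert(a.extent == b.extent && b.extent == c.extent);
  return multiply_impl(a, b, c,
                       std::integral_constant<bool, Pattern::is_tiled>());
}

}  // namespace sketch
```

The caller writes the same multiply call for every pattern. Overload resolution on the tag type picks the variant, so no run-time branch is needed.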
“Find data distribution for the most efficient algorithm variant.”
// Use constraints of most efficient algorithm, usually SUMMA for DGEMM:
auto pattern = dash::make_pattern<
                 dash::multiply_pattern_constraints >(
                   Size<2>(10000,10000),
                   Team<2>(24,24));

dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);

// Calls dash::summa
dash::multiply(matrix_a, matrix_b, matrix_c);
Evaluation: DGEMM
MKL multithreaded vs. DASH MPI (GFLOP/s)
DASH: automatic distribution of matrix elements to MPI processes,
each using serial MKL for block matrix multiplication (SUMMA).
MKL: OpenMP threads, matrix initialization in master thread.
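The structure of SUMMA can be illustrated with a serial simulation. The sketch assumes square n x n row-major matrices and a p x p grid of processes with p dividing n. Broadcasts of blocks are replaced by plain reads and the local block product is a naive loop rather than a serial MKL call, so it shows the blocking scheme only, not the benchmarked implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial simulation of SUMMA: C += A * B on a p x p process grid.
void summa_serial(const std::vector<double>& A,
                  const std::vector<double>& B,
                  std::vector<double>& C, int n, int p) {
  assert(n > 0 && p > 0 && n % p == 0);
  const std::size_t nn =
      static_cast<std::size_t>(n) * static_cast<std::size_t>(n);
  assert(A.size() == nn && B.size() == nn && C.size() == nn);
  const int bs = n / p;  // extent of one block
  // Step k: block column k of A and block row k of B are made
  // available to all processes.
  for (int k = 0; k < p; ++k) {
    for (int bi = 0; bi < p; ++bi) {    // process row
      for (int bj = 0; bj < p; ++bj) {  // process column
        // Process (bi, bj) updates its block: C(bi,bj) += A(bi,k) * B(k,bj)
        for (int i = bi * bs; i < (bi + 1) * bs; ++i) {
          for (int l = k * bs; l < (k + 1) * bs; ++l) {
            const double a = A[i * n + l];
            for (int j = bj * bs; j < (bj + 1) * bs; ++j) {
              C[i * n + j] += a * B[l * n + j];
            }
          }
        }
      }
    }
  }
}
```

Each process only ever writes its own block of C, which is the property that lets a suitable data distribution keep most accesses local.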
MKL multithreaded vs. DASH MPI (Speedup)
DASH: High locality due to optimal data distribution,
massive communication overhead (MPI, no shared windows).
MKL: Low locality (first touch issues), no communication.
DASH beats MKL for bigger N and higher degrees of parallelism.
Speedup = (DASH GFLOP/s) / (MKL GFLOP/s)
Evaluation: SGEMM
MKL multithreaded vs. DASH MPI (GFLOP/s)
DASH: automatic distribution of matrix elements to MPI processes,
each using serial MKL for block matrix multiplication (SUMMA).
MKL: OpenMP threads, matrix initialization in master thread.
MKL multithreaded vs. DASH MPI (Speedup)
DASH: High locality due to optimal data distribution,
massive communication overhead (MPI, no shared windows).
MKL: Low locality (first touch issues), no communication.
DASH beats MKL for bigger N and higher degrees of parallelism.
Speedup = (DASH GFLOP/s) / (MKL GFLOP/s)
Summary
• The optimal distribution of n-dim data depends on an unmanageable multitude
of factors (topology, access pattern, data flow, …).
• We defined a universal classification of distribution properties.
• Property system allows automatic deduction of optimal data distribution
and algorithm variants at compile time and run time.
Works with any C++11 compiler (tested: Intel 14.0+, gcc 4.7+, clang).
• Work in progress: optimal data distribution for data flows.
Tobias Fuchs
www.mnm-team.org/~fuchst
DASH Project
www.dash-project.org
Visit for upcoming release