Expressing and Exploiting Multi-Dimensional Locality in DASH

Tobias Fuchs [email protected] LMU Munich, MNM Team www.mnm-team.org Expressing and Exploiting Multi-Dimensional Locality in DASH SPPEXA Symposium 2016


Post on 23-Jan-2018


Page 1: Expressing and Exploiting Multi-Dimensional Locality in DASH

Tobias Fuchs

[email protected]

LMU Munich, MNM Team

www.mnm-team.org

Expressing and Exploiting

Multi-Dimensional Locality

in DASH

SPPEXA Symposium 2016

Page 2: Expressing and Exploiting Multi-Dimensional Locality in DASH


Page 3: Expressing and Exploiting Multi-Dimensional Locality in DASH

Background


DASH

• Vision: “C++ standard template library for HPC”.

• Provides an n-dimensional array abstraction for stencil and dense matrix

operations.

• Realization of the PGAS (partitioned global address space)

programming model.

Page 4: Expressing and Exploiting Multi-Dimensional Locality in DASH

Background


PGAS and Locality

• Combine distributed memory into virtual global memory space.

• Strong sense of data ownership:

private, shared local, shared global

int p = 42;

Page 5: Expressing and Exploiting Multi-Dimensional Locality in DASH

Background


PGAS and Locality

• Combine distributed memory into virtual global memory space.

• Strong sense of data ownership:

private, shared local, shared global

int p = 42;          // private
dash::Array<T> a;
a.local[4] = p;      // shared local access

Page 6: Expressing and Exploiting Multi-Dimensional Locality in DASH

Background


PGAS and Locality

• Combine distributed memory into virtual global memory space.

• Strong sense of data ownership:

private, shared local, shared global

int p;               // private
dash::Array<T> a;
p = a[40];           // shared global access, possibly remote

Page 7: Expressing and Exploiting Multi-Dimensional Locality in DASH

Background


PGAS and Locality

• Locality (access distance to data) is the predominant factor for efficiency.

L = (local accesses) / (total accesses)

• Access pattern on data depends on implementation of algorithm.

• Complexity to maintain locality increases exponentially with the number

of data dimensions.
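The locality ratio L can be made concrete with a small counting experiment. The sketch below is not DASH code; it assumes a 1-D array split into contiguous blocks (one block per unit) and a 3-point stencil, and the helper names `owner_of` and `stencil_locality` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of DASH: the unit that owns global index i
// when the array is split into contiguous blocks of block_size elements.
std::size_t owner_of(std::size_t i, std::size_t block_size) {
  assert(block_size > 0);
  return i / block_size;
}

// L = (local accesses) / (total accesses) for a 3-point stencil that
// `unit` applies to every element it owns in an array of n elements.
double stencil_locality(std::size_t n, std::size_t block_size,
                        std::size_t unit) {
  std::size_t local = 0, total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (owner_of(i, block_size) != unit) continue;
    // Neighbors i-1, i, i+1, clipped at the array boundaries.
    const std::size_t lo = (i == 0) ? 0 : i - 1;
    const std::size_t hi = (i + 1 < n) ? i + 1 : i;
    for (std::size_t j = lo; j <= hi; ++j) {
      ++total;
      if (owner_of(j, block_size) == unit) ++local;
    }
  }
  return total == 0 ? 1.0 : static_cast<double>(local) / total;
}
```

With 16 elements in blocks of 4, unit 1 performs 12 accesses of which 10 are local; shrinking the block size lowers the ratio because more neighbors fall into other units' blocks.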

Page 8: Expressing and Exploiting Multi-Dimensional Locality in DASH

Objective and Approach


Objective

Portable efficiency by automatic deduction of optimal data distribution.

Approach

1. Identify distribution properties that allow well-defined specification of

any data distribution.

2. Let algorithms specify soft / hard constraints on distribution properties.

3. Derive optimal distribution for a given set of constraints.

Automatic deduction of optimal data distribution

Page 9: Expressing and Exploiting Multi-Dimensional Locality in DASH

Distribution Properties


Property Categories

Mappings in data distribution can be categorized by their stages:

Partitioning:  Decomposing the index domain into blocks

Mapping:       Assigning blocks to units

Layout:        Storage order of block elements in units’ local memory
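The three stages can be read as three small functions applied in sequence. The following is a minimal sketch under stated assumptions: a 2-D index domain, rectangular tiles, a block-cyclic assignment of tiles to a 2-D grid of units, and row-major storage inside each tile. `Coord2` and `TiledDistribution` are hypothetical names, not DASH types.

```cpp
#include <cassert>
#include <cstddef>

struct Coord2 { std::size_t row, col; };

// Illustrative 2-D distribution; not a DASH pattern class.
struct TiledDistribution {
  std::size_t tile_rows, tile_cols;   // block extents
  std::size_t team_rows, team_cols;   // units arranged as a 2-D grid

  // Partitioning: which block contains the global coordinate g?
  Coord2 block_of(Coord2 g) const {
    assert(tile_rows > 0 && tile_cols > 0);
    return Coord2{ g.row / tile_rows, g.col / tile_cols };
  }

  // Mapping: which unit owns the block? (block-cyclic in both dimensions)
  std::size_t unit_of(Coord2 block) const {
    assert(team_rows > 0 && team_cols > 0);
    return (block.row % team_rows) * team_cols + (block.col % team_cols);
  }

  // Layout: row-major offset of the element inside its block.
  std::size_t offset_in_block(Coord2 g) const {
    assert(tile_rows > 0 && tile_cols > 0);
    return (g.row % tile_rows) * tile_cols + (g.col % tile_cols);
  }
};
```

For 100x100 tiles on a 2x2 unit grid, the element at (250, 130) lies in block (2, 1), which this mapping assigns to unit 1, at offset 5030 within the block.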

Page 10: Expressing and Exploiting Multi-Dimensional Locality in DASH

Distribution Properties


Example: Morton Order Distribution

Category       Properties

Partitioning   balanced, regular, rectangular

Mapping        balanced, minimal, neighbor

Layout         blocked, linear, canonical
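The slide does not show how a Morton order is computed. A common technique, given here only as an assumption about what such a distribution might use, is to interleave the bits of the block coordinates so that blocks close together in 2-D tend to get nearby linear indices. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Spread the lower 16 bits of x so they occupy the even bit positions.
std::uint32_t spread_bits(std::uint32_t x) {
  x &= 0x0000FFFFu;
  x = (x | (x << 8)) & 0x00FF00FFu;
  x = (x | (x << 4)) & 0x0F0F0F0Fu;
  x = (x | (x << 2)) & 0x33333333u;
  x = (x | (x << 1)) & 0x55555555u;
  return x;
}

// Morton (Z-order) index of block (row, col): row bits go to the odd
// positions, column bits to the even positions.
std::uint32_t morton_index(std::uint32_t row, std::uint32_t col) {
  assert(row < 65536u && col < 65536u);
  return (spread_bits(row) << 1) | spread_bits(col);
}
```

The four blocks of any aligned 2x2 group receive consecutive indices (for example 0, 1, 2, 3 for the group at the origin), which is the neighborhood-preserving behavior the mapping properties above refer to.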

Page 11: Expressing and Exploiting Multi-Dimensional Locality in DASH

Use Cases


Automatic Deduction of Optimal Data Distribution

“Find a data distribution that fulfills a set of properties.”

// Deduces pattern type, initializes pattern instance:
auto pattern =
  make_pattern<
    partitioning_properties<       // compile-time deduction
      balanced, regular >,         // via C++11 generic template
    mapping_properties<            // metaprogramming
      neighbor >,
    layout_properties<
      blocked, row_major >
  >(
    Size<2>(10000,10000),          // run-time deduction
    Team<2>(24,24));
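How property tags can steer type selection at compile time may be easier to see in isolation. The sketch below is not DASH's implementation: it assumes a single property category, two placeholder pattern types (`TilePatternSketch`, `BlockPatternSketch`), and a hypothetical `deduce_pattern` trait, and it only uses C++11 `<type_traits>`.

```cpp
#include <type_traits>

// Hypothetical property tags; the names mirror the slide, not DASH's types.
struct blocked {};
struct row_major {};

// Compile-time test whether tag T occurs in the list Ts...
template <typename T, typename... Ts>
struct contains : std::false_type {};

template <typename T, typename Head, typename... Tail>
struct contains<T, Head, Tail...>
  : std::conditional<std::is_same<T, Head>::value,
                     std::true_type,
                     contains<T, Tail...> >::type {};

// A property list that can be queried for a tag.
template <typename... Tags>
struct layout_properties {
  template <typename T>
  struct has : contains<T, Tags...> {};
};

// Placeholder pattern types standing in for real pattern classes.
struct TilePatternSketch {};
struct BlockPatternSketch {};

// Deduction rule (an assumption for illustration): a blocked layout
// selects the tile pattern, anything else the block pattern.
template <typename Layout>
struct deduce_pattern {
  typedef typename std::conditional<
      Layout::template has<blocked>::value,
      TilePatternSketch,
      BlockPatternSketch>::type type;
};
```

Here `deduce_pattern< layout_properties<blocked, row_major> >::type` resolves to `TilePatternSketch` without any run-time cost; the extents and team size would then only be needed to construct the instance.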

Page 12: Expressing and Exploiting Multi-Dimensional Locality in DASH

Use Cases


Automatic Deduction of Optimal Data Distribution

“Find a data distribution that is optimal for a given algorithm.”

// Deduce pattern from algorithm constraints:
auto pattern = dash::make_pattern< dash::summa_pattern_constraints >(
                 Size<2>(10000,10000),
                 Team<2>(24,24));

dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);

dash::summa(matrix_a, matrix_b, matrix_c);
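To see why SUMMA favors a tiled distribution, it helps to look at its step structure. The following serial simulation is a sketch, not the `dash::summa` implementation: it assumes square n x n row-major matrices, a p x p grid of units with n divisible by p, and a zero-initialized result matrix. In a real run, step k would broadcast block column k of A along unit rows and block row k of B along unit columns; here the "units" are just loop iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Matrix;  // row-major, n x n

// One local update: C(bi,bj) += A(bi,bk) * B(bk,bj) on nb x nb blocks.
void block_gemm(const Matrix& a, const Matrix& b, Matrix& c,
                std::size_t n, std::size_t nb,
                std::size_t bi, std::size_t bj, std::size_t bk) {
  for (std::size_t i = bi * nb; i < (bi + 1) * nb; ++i)
    for (std::size_t k = bk * nb; k < (bk + 1) * nb; ++k)
      for (std::size_t j = bj * nb; j < (bj + 1) * nb; ++j)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Serial simulation of SUMMA on a p x p grid of units.
void summa_sketch(const Matrix& a, const Matrix& b, Matrix& c,
                  std::size_t n, std::size_t p) {
  assert(p > 0 && n % p == 0);
  assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
  const std::size_t nb = n / p;
  for (std::size_t k = 0; k < p; ++k)          // one broadcast step
    for (std::size_t ui = 0; ui < p; ++ui)     // unit row
      for (std::size_t uj = 0; uj < p; ++uj)   // unit column
        block_gemm(a, b, c, n, nb, ui, uj, k); // purely local work
}
```

Each unit only ever touches its own block of C plus the two blocks received in the current step, which is the access pattern the pattern constraints are meant to match.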

Page 13: Expressing and Exploiting Multi-Dimensional Locality in DASH

Use Cases


Automatic Deduction of Optimal Algorithm

“Find algorithm variant that is optimal for a given data distribution.”

// Specify how data is distributed in global memory:
auto pattern = dash::TilePattern<2>(10000,10000, TILED(100,100));

dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);
// Selects matrix product algorithm variant that is optimal for the given
// pattern:
dash::multiply(matrix_a, matrix_b, matrix_c);
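One plausible way to select an algorithm variant from the pattern type is tag dispatch on a compile-time trait. This is a sketch under assumptions, not how `dash::multiply` is implemented: the pattern and matrix types, the `is_tiled` trait, and the returned variant names are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical pattern types that expose a compile-time trait.
struct TiledPatternSketch { typedef std::true_type  is_tiled; };
struct RowPatternSketch   { typedef std::false_type is_tiled; };

// Hypothetical matrix type that remembers its pattern type.
template <typename Pattern>
struct MatrixSketch { typedef Pattern pattern_type; };

// Variant chosen for tiled distributions (a SUMMA-like algorithm).
template <typename M>
std::string multiply_impl(const M&, const M&, M&, std::true_type) {
  return "summa";
}

// Fallback variant for every other distribution.
template <typename M>
std::string multiply_impl(const M&, const M&, M&, std::false_type) {
  return "generic";
}

// Entry point: the overload is resolved at compile time from the pattern.
template <typename M>
std::string multiply_sketch(const M& a, const M& b, M& c) {
  assert(&a != &c && &b != &c);  // result must not alias an operand
  return multiply_impl(a, b, c, typename M::pattern_type::is_tiled());
}
```

The caller writes the same `multiply_sketch(a, b, c)` call for either pattern; only the matrix type decides which overload runs.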

Page 14: Expressing and Exploiting Multi-Dimensional Locality in DASH

Use Cases


Automatic Deduction of Optimal Algorithm

“Find data distribution for the most efficient algorithm variant.”

// Use constraints of most efficient algorithm, usually SUMMA for DGEMM:
auto pattern = dash::make_pattern< dash::multiply_pattern_constraints >(
                 Size<2>(10000,10000),
                 Team<2>(24,24));

dash::Matrix<double, 2> matrix_a(pattern);
dash::Matrix<double, 2> matrix_b(pattern);
dash::Matrix<double, 2> matrix_c(pattern);
// Calls dash::summa
dash::multiply(matrix_a, matrix_b, matrix_c);

Page 15: Expressing and Exploiting Multi-Dimensional Locality in DASH

Evaluation: DGEMM


MKL multithreaded vs. DASH MPI (GFLOP/s)

DASH: automatic distribution of matrix elements to MPI processes,

each using serial MKL for block matrix multiplication (SUMMA).

MKL: OpenMP threads, matrix initialization in master thread.

Page 16: Expressing and Exploiting Multi-Dimensional Locality in DASH

Evaluation: DGEMM


MKL multithreaded vs. DASH MPI (Speedup)

DASH: High locality due to optimal data distribution,

massive communication overhead (MPI, no shared windows).

MKL: Low locality (first touch issues), no communication.

DASH beats MKL for bigger N and higher degrees of parallelism.

Speedup = DASH_GFLOPS / MKL_GFLOPS
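The slides do not state how GFLOP/s were measured. Assuming the conventional operation count of 2·N³ floating-point operations for an N x N matrix product, the two quantities could be computed as in this sketch; the function names are hypothetical.

```cpp
#include <cassert>

// GFLOP/s of an n x n GEMM that took `seconds`, using the conventional
// count of 2 * n^3 floating-point operations (an assumption, see above).
double gemm_gflops(double n, double seconds) {
  assert(seconds > 0.0);
  return 2.0 * n * n * n / seconds / 1e9;
}

// Speedup as defined on the slide: DASH GFLOP/s over MKL GFLOP/s.
double speedup(double dash_gflops, double mkl_gflops) {
  assert(mkl_gflops > 0.0);
  return dash_gflops / mkl_gflops;
}
```

A speedup above 1 therefore means the DASH run achieved a higher rate than multithreaded MKL for the same N.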

Page 17: Expressing and Exploiting Multi-Dimensional Locality in DASH

Evaluation: SGEMM


MKL multithreaded vs. DASH MPI (GFLOP/s)

DASH: automatic distribution of matrix elements to MPI processes,

each using serial MKL for block matrix multiplication (SUMMA).

MKL: OpenMP threads, matrix initialization in master thread.

Page 18: Expressing and Exploiting Multi-Dimensional Locality in DASH

Evaluation: SGEMM


MKL multithreaded vs. DASH MPI (Speedup)

DASH: High locality due to optimal data distribution,

massive communication overhead (MPI, no shared windows).

MKL: Low locality (first touch issues), no communication.

DASH beats MKL for bigger N and higher degrees of parallelism.

Speedup = DASH_GFLOPS / MKL_GFLOPS

Page 19: Expressing and Exploiting Multi-Dimensional Locality in DASH

Summary


Summary

• Optimal distribution of n-dim data depends on an unmanageable multitude

of factors (topology, access pattern, data flow, …).

• We defined a universal classification of distribution properties.

• Property system allows automatic deduction of optimal data distribution

and algorithm variants at compile time and run time.

Works with any C++11 compiler (tested: Intel 14.0+, gcc 4.7+, clang).

• Work in progress: optimal data distribution for data flows.

Page 20: Expressing and Exploiting Multi-Dimensional Locality in DASH

Tobias Fuchs

[email protected]

www.mnm-team.org/~fuchst

DASH Project

www.dash-project.org

Visit for upcoming release