Les Mardi du Développement Technologique
Towards a solver software stack on top of runtime systems

24th of November 2015

HiePACS - RealOpt - Storm
Kaust - LBNL - Stanford - UCD - UTK
INRIA Bordeaux Sud-Ouest

Outline

Context

Build solver software stack

Solutions and achievements

Next objectives

A solver software stack on top of runtimes - Mardi du Developpement Technologique - 24th of November 2015 2/23

Context

Matrices Over Runtime Systems @ Exascale

Linear algebra: AX = B
→ Sequential-Task-Flow:
      for (j = 0; j < N; j++)
          Task (A[j]);
→ Directed Acyclic Graph
→ Runtime systems
→ Heterogeneous platforms

[Figure: execution trace over time on CPU and GPU workers]
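
The step from a sequential task flow to a directed acyclic graph can be sketched as follows. This is a minimal illustration, not the code of any runtime named in these slides: tasks are submitted in program order together with the data they read and write, and dependency edges are inferred from those accesses. The helper names `build_dag` and `cholesky_tasks` are hypothetical, and the tile Cholesky loop nest is only an assumed example of an STF algorithm.

```python
from collections import defaultdict


def build_dag(tasks):
    """Infer dependency edges from tasks given in sequential submission order.

    Each task is a tuple (name, reads, writes), where reads and writes are
    sets of data identifiers. Returns a set of (predecessor, successor) edges.
    """
    last_writer = {}              # data -> task that last wrote it
    readers = defaultdict(list)   # data -> tasks that read it since the last write
    edges = set()
    for name, reads, writes in tasks:
        for d in reads:
            if d in last_writer:              # read-after-write dependency
                edges.add((last_writer[d], name))
        for d in writes:
            if d in last_writer:              # write-after-write dependency
                edges.add((last_writer[d], name))
            for r in readers[d]:              # write-after-read dependency
                edges.add((r, name))
        for d in reads:
            if d not in writes:
                readers[d].append(name)
        for d in writes:
            last_writer[d] = name
            readers[d] = []
    return edges


def cholesky_tasks(nt):
    """Sequential task flow of a tile Cholesky factorization on nt x nt tiles.

    Assumed example: tiles are identified by (row, column); a tile that is
    updated in place appears in both the read set and the write set.
    """
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k})", {(k, k)}, {(k, k)}))
        for m in range(k + 1, nt):
            tasks.append((f"TRSM({m},{k})", {(k, k), (m, k)}, {(m, k)}))
        for m in range(k + 1, nt):
            tasks.append((f"SYRK({m},{k})", {(m, k), (m, m)}, {(m, m)}))
            for n in range(k + 1, m):
                tasks.append((f"GEMM({m},{n},{k})",
                              {(m, k), (n, k), (m, n)}, {(m, n)}))
    return tasks


# Example: the DAG of a 2 x 2 tile Cholesky is a chain of four tasks.
edges = build_dag(cholesky_tasks(2))
```

The algorithm itself stays a plain sequential loop nest; only the declared data accesses let a scheduler discover which tasks may run concurrently.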

A view of HiePACS solvers

Chameleon: dense linear solver
• Tile algorithms: BLAS 3, some BLAS 2, LAPACK One-Sided, Norms
• Supported runtimes: Quark and StarPU (PaRSEC soon)
• Ability to use clusters of heterogeneous nodes:
  ▸ MPI+threads, CPUs (BLAS/LAPACK) + GPUs (cuBLAS/MAGMA)

PaStiX: sparse linear solver
• LL^T, LDL^T, and LU, with static pivoting, supernodal approach
• Native version: MPI+threads
• Versions with runtimes: on top of PaRSEC or StarPU

MaPHyS: hybrid direct/iterative sparse linear solver
• Solves Ax = b, where A is a square, nonsingular, general matrix
• Native version: MPI+PaStiX/MUMPS+BLAS/LAPACK
• Does not support runtimes for now; work in progress

ScalFMM: scalable fast multipole methods
• Simulates N-body interactions using the Fast Multipole Method based on interpolation (Chebyshev or Lagrange)
• Native version: MPI+OpenMP+BLAS+FFTW
• Runtime versions: StarPU, OpenMP4 → StarPU (ADT K'STAR)

Solver on top of runtime system - example: Chameleon

Chameleon: dense linear algebra tile algorithms (STF) on top of runtime systems

1) Tile matrix layout
• Tile size nb = 192, 320, 960, ...

2) Algorithms
• STF algorithms (PLASMA)
  ▸ BLAS 3, some BLAS 2
  ▸ LAPACK One-Sided, Norms
• Generic task submission interface
  ▸ different runtimes can be plugged in

3) Runtime systems
• QUARK: CPUs (pthread)
• StarPU: CPUs, CUDA, MPI
[Figure: execution trace over time on CPU and GPU workers]

4) Optimized kernels
• BLAS, LAPACK
  ▸ Intel MKL, Netlib BLAS/LAPACK, ...
• CUDA/cuBLAS, MAGMA
• Misc: hwloc, FxT, SimGrid, EZTrace, ...
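
The tile matrix layout of step 1 can be illustrated with a short sketch. It is not Chameleon's storage code: the function name `to_tiles` and the dictionary-of-tiles representation are assumptions chosen for readability, and the toy tile size used in the example is far smaller than the nb values quoted on the slide.

```python
def to_tiles(matrix, nb):
    """Split a dense matrix (a list of rows) into nb x nb tiles.

    Returns a dict mapping tile coordinates (i, j) to a tile, itself a list
    of rows. Tiles in the last tile row or column are smaller when nb does
    not divide the matrix dimensions.
    """
    m, n = len(matrix), len(matrix[0])
    tiles = {}
    for i in range(0, m, nb):
        for j in range(0, n, nb):
            tiles[(i // nb, j // nb)] = [row[j:j + nb] for row in matrix[i:i + nb]]
    return tiles


# Example: a 4 x 4 matrix with entries 0..15, cut into 2 x 2 tiles.
a = [[4 * r + c for c in range(4)] for r in range(4)]
tiles = to_tiles(a, 2)
```

Each tile then becomes the unit of data on which a task operates, which is why the tile size nb is a tuning parameter: larger tiles give more efficient kernels, smaller tiles expose more parallelism.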

How to deploy complex HPC software stacks?

Build solver software stack

Common situations

You rock: install the whole stack by yourself!
• specialists on the whole stack are rare
• compatibility problems between versions
• takes a lot of time

You are a numerical analyst not comfortable with software building
• ask someone else to do it
• use pre-installed versions (binary packages, modules)
  ▸ problem: only a couple of versions exist

Wish list

Two kinds of users
1. Top-level users want the best version:
   • a default build with the best options to get performance on the target platform
2. A specialist wants to have the lead:
   • on the components he operates on
     ▸ flexibility to set his version
   • but may not care about many dependencies
     ▸ automatic choice of best options

Requirements

• A simple process to install a default version
• A flexible way to choose build variants
  ▸ choose compiler, software versions
  ▸ enable components, e.g. MPI: YES/NO
  ▸ build options, e.g. --enable-debug
• Be able to install it on a remote cluster
  ▸ no root permissions
  ▸ possibly no internet access

Existing toolboxes

• PETSc
  ▸ scientific library for solving PDEs in parallel, MPI+threads
  ▸ wrappers to external solvers (partitioners, linear algebra, ...)
  ▸ custom Python scripts to activate packages
    ▸ detection mode or download+install of a web release: great!
    ▸ detection problems, fixed versions to download
• Trilinos
  ▸ similar to PETSc, maybe even broader scope
  ▸ embeds one precise version of solvers
  ▸ no tool to install missing third-party libraries

⇒ no competitive tool to install dependencies

Solutions and achievements

Common build-install process for solvers
• use of CMake with similar options
• rely on the same detection of the system and libraries
  ▸ recursive system of CMake Finds
  ▸ if your application depends on Chameleon, in CMake:

    find_package(CHAMELEON COMPONENTS STARPU MPI CUDA MAGMA FXT)

[Figure: Chameleon dependency graph. Legend: Solver, Runtime, Kernel, Miscellaneous, Interface C-Fortran. Nodes: CHAMELEON, SCHED_QUARK, SCHED_STARPU, QUARK, STARPU, HWLOC, FXT, SIMGRID, CBLAS, LAPACKE, BLAS, LAPACK, MAGMA, cuBLAS, CUDA, MPI]

List of available find_package modules in Morse:

  solvers   chameleon, magma, mumps, pastix, plasma, scalapack
  runtimes  quark, parsec, starpu
  kernels   (c)blas, lapack(e), fftw
  misc      (par)metis, (pt)scotch, hwloc, fxt, eztrace

Available online: https://scm.gforge.inria.fr/anonscm/svn/morse/trunk/morse_distrib/cmake_modules/morse/find
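
The idea of a recursive system of Finds, where looking for one library triggers the search for the libraries it depends on, can be sketched as a depth-first traversal. This is only an illustration in Python, not the CMake modules themselves: the function `detection_order` is hypothetical, and the dependency table below is an assumed subset, since the exact edges of the slide's dependency graph are not recoverable from the text.

```python
def detection_order(package, deps, seen=None, order=None):
    """Return the packages to detect, dependencies first, for one top-level package.

    deps maps a package name to the list of packages it requires. Each
    package is visited once, so shared dependencies are detected only once.
    """
    if seen is None:
        seen, order = set(), []
    if package in seen:
        return order
    seen.add(package)
    for dep in deps.get(package, []):
        detection_order(dep, deps, seen, order)
    order.append(package)   # appended only after all of its dependencies
    return order


# Assumed, partial dependency table for illustration only.
DEPS = {
    "CHAMELEON": ["STARPU", "CBLAS", "LAPACKE"],
    "STARPU": ["HWLOC", "FXT"],
    "CBLAS": ["BLAS"],
    "LAPACKE": ["LAPACK"],
    "LAPACK": ["BLAS"],
}

order = detection_order("CHAMELEON", DEPS)
```

With this table, BLAS is reached through both CBLAS and LAPACK but appears once in the result, and CHAMELEON comes last.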

State of the art: tool to distribute the software stack
• we do not want to reinvent the wheel, i.e. we want to use an existing solution:
  ▸ Dpkg, 0install, Gub, Guix/Nix, EasyBuild, ...
• classical package managers cannot meet our requirements
  ▸ no root permissions, build variants easy to specify, a mode to handle non-open software (Intel MKL, NVIDIA CUDA)
• Spack, a custom tool to install HPC libraries, will be used

http://scalability-llnl.github.io/spack/

• Python 2.7: no install needed, ready to be used

$ git clone https://github.com/scalability-llnl/spack.git

$ ./spack/bin/spack install gcc

• Easy way to set build variants, examples:

$ spack install openmpi %[email protected]

$ spack install netlib-lapack +shared

$ spack install parpack ^netlib-lapack ^[email protected]

• Handle modulefiles, mirrors to work on clusters

$ spack load mpi

$ spack mirror create openmpi mpich hwloc netlib-blas

$ spack mirror add
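
To make the build-variant syntax above more concrete, here is a toy parser for the sigils used in these commands: `@` for a version, `%` for a compiler, `+` and `~` to enable or disable a variant, and `^` to constrain a dependency. It is a sketch under simplifying assumptions, not Spack's own parser: the function `parse_spec` is hypothetical, tokens must be separated by whitespace, and the version `1.0` in the example is a placeholder rather than a value taken from the slides.

```python
def _node(token):
    """Build an empty spec node from a 'name' or 'name@version' token."""
    name, _, version = token.partition("@")
    return {"name": name, "version": version or None,
            "compiler": None, "variants": {}, "deps": []}


def parse_spec(text):
    """Parse a whitespace-separated spec string into a nested dict (toy grammar)."""
    tokens = text.split()
    if not tokens or tokens[0][0] in "^%+~":
        raise ValueError("a spec must start with a package name")
    root = current = _node(tokens[0])
    for token in tokens[1:]:
        if token.startswith("^"):            # constraint on a dependency
            current = _node(token[1:])
            root["deps"].append(current)
        elif token.startswith("%"):          # compiler for the current package
            current["compiler"] = token[1:]
        elif token.startswith(("+", "~")):   # variant switched on or off
            current["variants"][token[1:]] = token.startswith("+")
        else:
            raise ValueError(f"unexpected token: {token}")
    return root


# Example with a placeholder version (1.0 is not taken from the slides).
spec = parse_spec("parpack ^netlib-lapack +shared ^openmpi@1.0")
```

Variants and compilers that follow a `^name` token attach to that dependency rather than to the top-level package, which is how a single command line can describe a whole customized stack.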

Get Involved with Spack!

Spack is starting to be used in production at LLNL
- Build, test, and deployment by code teams.
- Tools, libraries, and Python at Livermore Computing.
- Build research projects for students, postdocs.

Spack has a rapidly growing external community.
- NERSC is working with LLNL on Cray support for Cori.
- Argonne/IIT cluster challenge project.
- Kitware contributing ParaView builds & features.
- Users at INRIA, EPFL, U. Oregon, Sandia, LANL, others.

Come to our SC15 Talk: The Spack Package Manager: Bringing Order to HPC Software Chaos
Wednesday, 11:30am, Hall 18AB

Get Spack! https://bit.ly/spack-git

LLNL-PRES-8034738

Spack weaknesses

• Not so mature ⇒ bugs, not robust enough?
• Detection mode is missing ... but will be integrated soon
  ▸ positive exchanges with the main developer
  ▸ quick to respond
  ▸ we feel that they have the same needs

Morse in Spack: a fork where new packages can be found

• Available online - git repository: https://github.com/fpruvost/spack/ (morse branch)

$ git clone https://github.com/fpruvost/spack.git

$ cd spack && git checkout morse

$ ./bin/spack install maphys

• Build variants examples:

$ spack install maphys ~examples +mumps

$ spack install pastix +starpu ^[email protected] ^mkl-blas

$ spack install [email protected] +debug +cuda +mpi +fxt +examples

Tutorial online: http://morse.gforge.inria.fr/tuto_spack-morse/tuto_spack.html

MORSE provides packages to automatically install libraries and their dependencies, with:

Dense linear solvers: Chameleon, MAGMA, PLASMA, ScaLAPACK
Sparse linear solvers: HIPS, MaPHyS, MUMPS, PaStiX, qr_mumps, SuiteSparse
Fast Multipole Method solvers: ScalFMM
Runtime systems: QUARK, PaRSEC, StarPU
Kernels: (C)BLAS (MKL, Netlib, OpenBLAS), LAPACK(E), FFTW
Miscellaneous: (Par)METIS, (PT-)SCOTCH, hwloc, MPI, CUDA, SimGrid, EZTrace

Next objectives

Deliver a V0 of the solver stack distribution:
• Add some missing MPI packages
• Communication:
  ▸ Documentation: prerequisites, features, limitations, compatibility issues
  ▸ Website

Task-based solvers for users - an open question

Unify task-based solvers
• Installation of solvers is one step
  ▸ common way to install the solver stacks
• What about their integration and usage in upper-level programs?
• Is it possible to factorize something between solvers?
  ▸ Chameleon, HIPS, MaPHyS, PaStiX, ScalFMM
    · change some solvers' API, or
    · drive existing solvers with an intermediate layer (converters)
  ▸ tutorial, documentation, vocabulary?

Thank you
Any questions?