
The DOE Advanced CompuTational Software Collection (ACTS)

EU-US Summer School on HPC Challenges in Computational Sciences

Dublin, Ireland, June 28, 2012

An overview on High Performance Numerical Libraries

Tony Drummond
Computational Research Division
Lawrence Berkeley National Laboratory

High Performance Software Numerical Libraries, EU-US Summer School on HPC Challenges, Dublin, Ireland - June 28, 2012

OUTLINE

• Motivation for HPC Numerical Libraries
• Introduction to DOE ACTS Collection
• ACTS Numerical Functionality
• Use of the Tools in the ACTS Collection
• Sustainability through emerging hardware
• Summary


Motivation - HPC Applications


HPC Software Stack

APPLICATIONS

GENERAL PURPOSE TOOLS

PLATFORM SUPPORT TOOLS AND UTILITIES

HARDWARE

HPC Software Stack

[Diagram: software stack layers, top to bottom]
• APPLICATIONS
• GENERAL PURPOSE TOOLS
• PLATFORM SUPPORT TOOLS AND UTILITIES
• HARDWARE

Tools shown in the diagram: SLEPc, PETSc, ATLAS, TAU, Global Arrays, AztecOO, Overture, OPT++, CCA, Hypre, SuperLU, ScaLAPACK, TAO, SUNDIALS


The DOE ACTS Collection

Goal: The Advanced CompuTational Software Collection (ACTS) makes reliable and efficient software tools more widely used, and more effective in solving the nation’s engineering and scientific problems.

References:
• L.A. Drummond, O. Marques: An Overview of the Advanced CompuTational Software (ACTS) Collection. ACM Transactions on Mathematical Software Vol. 31 pp. 282-301, 2005
• http://acts.nersc.gov


Speeding-Up Software Development


The DOE ACTS Collection

Category: Numerical
• AztecOO: Scalable linear and non-linear solvers using iterative schemes.
• Hypre: A family of scalable preconditioners.
• PETSc: Scalable linear and non-linear solvers and additional support for PDE related work.
• OPT++: Object-oriented nonlinear optimization solvers.
• SUNDIALS: Solvers for systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
• ScaLAPACK: High performance parallel dense linear algebra.
• SLEPc: Scalable algorithms for the solution of large sparse eigenvalue problems.
• SuperLU: Scalable direct solution of large, sparse, nonsymmetric linear systems of equations.
• TAO: Large-scale optimization software.

Category: Code Development
• Global Arrays: Supports the development of parallel programs.
• Overture: Supports the development of computational fluid dynamics codes in complex geometries.

Category: Run Time Support
• TAU: Portable and scalable performance analysis and tracing tools for C, C++, Fortran and Java programs.

Category: Library Development
• ATLAS: Automatic generation of optimized dense numerical linear algebra for scalar processors.


Current State of DOE ACTS Collection

Category: Numerical
• AztecOO: Scalable linear and non-linear solvers using iterative schemes.
• Hypre: A family of scalable preconditioners.
• PETSc: Scalable linear and non-linear solvers and additional support for PDE related work.
• SUNDIALS: Solvers for systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
• ScaLAPACK: High performance parallel dense linear algebra.
• SLEPc: Scalable algorithms for the solution of large sparse eigenvalue problems.
• SuperLU: Scalable direct solution of large, sparse, nonsymmetric linear systems of equations.
• TAO: Large-scale optimization software.

Category: Code Development
• Global Arrays: Supports the development of parallel programs.
• Overture: Supports the development of computational fluid dynamics codes in complex geometries.

Category: Run Time Support
• TAU: Portable and scalable performance analysis and tracing tools for C, C++, Fortran and Java programs.

Category: Library Development
• ATLAS: Automatic generation of optimized dense numerical linear algebra for scalar processors.

Tools in Consideration
• ML: Multilevel preconditioners from Trilinos
• BELOS: Krylov based solvers from Trilinos
• Zoltan: Parallel partitioning, data management and load balancing
• pOSKI: Sparse auto-tuning library
• Considering 4 more


New Set of Tutorials and Exercises

http://acts.nersc.gov/events/Workshop2011

Next Workshop: August 14-17, 2012

http://acts.nersc.gov/events/Workshop2012


Functionality in The DOE ACTS Collection

Computational Problem: Systems of Linear Equations
Methodology: Direct Methods
Algorithm: Library
• LU Factorization: ScaLAPACK (dense), SuperLU (sparse)
• Cholesky Factorization: ScaLAPACK
• LDL^T (tridiagonal matrices): ScaLAPACK
• QR Factorization: ScaLAPACK
• QR with column pivoting: ScaLAPACK
• LQ factorization: ScaLAPACK


Linear Solvers

Solution of systems of linear equations may seem easy, but is at the heart of many computational problems.

The choice of solution methods depends on matrix characteristics:
• Symmetric vs nonsymmetric
• Positive definite vs indefinite
• Dimension
• Sparsity
• Special structures
  – Banded; block bordered diagonal
• Conditioning
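Several of the characteristics listed above can be checked programmatically before a solver is chosen. The following is a minimal sketch in plain Python for small dense matrices stored as lists of rows. The helper names (`is_symmetric`, `is_positive_definite`, `bandwidth`, `density`) are illustrative assumptions and are not part of any ACTS library.

```python
import math

def is_symmetric(A, tol=1e-12):
    """True if A equals its transpose up to tol."""
    n = len(A)
    return all(abs(A[i][j] - A[j][i]) <= tol for i in range(n) for j in range(i))

def is_positive_definite(A):
    """Attempt a Cholesky factorization of a symmetric matrix A.

    The factorization completes only when every pivot is positive,
    which for a symmetric matrix means A is positive definite.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            return False
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True

def bandwidth(A):
    """Largest distance |i - j| of a nonzero entry from the diagonal."""
    return max((abs(i - j) for i, row in enumerate(A)
                for j, v in enumerate(row) if v != 0), default=0)

def density(A):
    """Fraction of entries that are nonzero."""
    return sum(1 for row in A for v in row if v != 0) / (len(A) * len(A[0]))

# Example: a small tridiagonal matrix.
A = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
print(is_symmetric(A), is_positive_definite(A), bandwidth(A), density(A))
```

A symmetric positive definite, banded matrix such as this one would typically point toward Cholesky-type direct methods or conjugate gradient iterations; conditioning is not estimated in this sketch.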


Linear Solvers

Two primary flavors of linear solvers:
• Direct
• Iterative

It is better to think of them as methods at opposite ends of a spectrum of linear solvers.
• Preconditioned iterative methods are somewhere in between.
• Where they are in the spectrum depends on the choice of preconditioners, including preconditioners constructed using techniques from sparse direct methods.


Comparison Between Direct and Iterative Solvers

Iterative
• Unknown number of operations
  – Depends on the number of iterations
• Preconditioning may be needed to improve convergence
• Low memory requirement
• Simple data structure
• Easier to implement
• Less communication
• Fewer graph problems
• Handling multiple right-hand sides (RHS) may not be easy

Direct
• Finite number of operations
  – Does not depend on convergence behavior
• Pivoting may be needed to maintain stability
• Large memory requirement
• Complex data structure
  – Banded structure need not be optimal
• Harder to implement
• More communication
• More graph problems
  – Ordering, symbolic manipulation
• Easy to handle multiple RHS
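To illustrate the iterative column of this comparison, here is a minimal sketch of the conjugate gradient method with an optional Jacobi (diagonal) preconditioner, written in plain Python for a small dense symmetric positive definite matrix. The function `pcg` and its parameters are illustrative assumptions, not the interface of AztecOO, PETSc, or Hypre. The returned iteration count is only known after the run, which is the "unknown number of operations" point above.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=1000, jacobi=False):
    """Conjugate gradient for SPD A; returns (x, iterations used)."""
    n = len(b)
    # Jacobi preconditioner: apply the inverse of diag(A); identity otherwise.
    inv_diag = [1.0 / A[i][i] if jacobi else 1.0 for i in range(n)]
    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

A = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x, iterations = pcg(A, b, jacobi=True)
print(iterations, x)
```

Only matrix-vector products and a few vectors are needed, which reflects the low memory requirement and simple data structures listed for iterative methods.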


Comparison Between Direct and Iterative Solvers

Direct methods or iterative methods?
• The choice depends on dimensions, sparsity, and conditioning.
• Sparse direct solvers have become very efficient.
  – Almost all sparse direct solvers are built on top of dense matrix operations.
• Direct methods are desirable when:
  – The problem is poorly conditioned
  – High accuracy is desired
  – Dimensions are small. How small is "small"? It really depends on memory requirements and time to solution.
  – Multiple linear systems with the same matrix must be solved (only one factorization is required)
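The last point, one factorization reused for several right-hand sides, can be sketched as follows. This is a small dense LU factorization with partial pivoting in plain Python; the names `lu_factor` and `lu_solve` are chosen for illustration and do not correspond to the ScaLAPACK or SuperLU interfaces.

```python
def lu_factor(A):
    """LU factorization with partial pivoting.

    Returns (LU, perm): L (unit diagonal) and U packed into one matrix,
    and perm such that row k of the factored matrix is row perm[k] of A.
    """
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k to maintain stability.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            LU[k], LU[p] = LU[p], LU[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm

def lu_solve(LU, perm, b):
    """Solve A x = b using a factorization computed by lu_factor."""
    n = len(b)
    x = [b[perm[i]] for i in range(n)]        # apply the row permutation
    for i in range(n):                        # forward substitution (L)
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):              # back substitution (U)
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
LU, perm = lu_factor(A)                       # O(n^3) work, done once
for rhs in ([4.0, 10.0, 24.0], [7.0, 19.0, 49.0]):
    print(lu_solve(LU, perm, rhs))            # O(n^2) work per right-hand side
```

The expensive factorization is paid once, and each additional right-hand side costs only two triangular solves.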


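Software Structure of ScaLAPACK

[Diagram: layered software structure]
• Components, top to bottom: ScaLAPACK; PBLAS; LAPACK and BLACS; BLAS and MPI/PVM/...
• Diagram labels: Global, Local, platform specific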


ScaLAPACK Linear Equation Routines


Summary of ScaLAPACK Functionality


Magma Library

http://icl.cs.utk.edu/magma/
• LAPACK for hybrid architectures including many-core/GPU
• Version 1.0 released in April, 2011


SuperLU
• Solve general sparse linear system A x = b.
  – Example: A of dimension 10^6, only 10 ~ 100 nonzeros per row
• Algorithm: Gaussian elimination (LU factorization: A = LU), followed by lower/upper triangular solutions.
  – Store only nonzeros and perform operations only on nonzeros.
• Efficient and portable implementation for high-performance architectures, flexible interface.
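The idea of storing only nonzeros and operating only on nonzeros can be sketched with a compressed sparse row (CSR) layout. CSR is used here purely for illustration; the slides do not describe SuperLU's internal storage, and the function names below are hypothetical.

```python
def dense_to_csr(A):
    """Convert a dense matrix (list of rows) to CSR arrays.

    values  : nonzero entries, row by row
    col_idx : column index of each stored entry
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x, touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[i], row_ptr[i + 1])))
    return y

A = [[4.0, 0.0, 0.0],
     [0.0, 0.0, 2.0],
     [1.0, 0.0, 3.0]]
values, col_idx, row_ptr = dense_to_csr(A)
print(values, col_idx, row_ptr)
print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```

For a matrix of dimension 10^6 with 10 to 100 nonzeros per row, this kind of layout stores on the order of 10^7 to 10^8 entries instead of 10^12.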


SuperLU Flavors

• SuperLU
  – Platform: Serial
  – Language: C
  – Data type: Real/complex, Single/double
• SuperLU_MT
  – Platform: SMP
  – Language: C + Pthread (or pragmas)
  – Data type: Real, double
• SuperLU_DIST
  – Platform: Distributed
  – Language: C + MPI
  – Data type: Real/complex, Double


SuperLU Design
• Exploit dense submatrices in the L & U factors
• Why are they good?
  – Permit use of Level 3 BLAS
  – Reduce inefficient indirect addressing (scatter/gather)
  – Reduce graph algorithms time by traversing a coarser graph


Functionality in The DOE ACTS Collection

Computational Problem: Systems of Linear Equations (cont.)
Methodology: Iterative Methods
Algorithm: Library
• Conjugate Gradient: AztecOO (Trilinos), PETSc
• GMRES: AztecOO, PETSc, Hypre
• CG Squared: AztecOO, PETSc
• Bi-CG Stab: AztecOO, PETSc
• Quasi-Minimal Residual (QMR): AztecOO
• Transpose Free QMR: AztecOO, PETSc


Trilinos


Trilinos Linear Solvers

Package: Linear Solvers/Preconditioners
• Amesos: Direct sparse linear solvers
• AztecOO: Krylov based iterative linear solvers; ILU-type methods
• Belos: Krylov based iterative linear solvers
• CLAPS: Domain decomposition methods
• Epetra: Direct dense linear solver
• IFPACK/TIFPACK: Algebraic preconditioners; ILU-type methods
• Komplex: Krylov based iterative linear solvers
• Meros: Block preconditioners
• ML: Multigrid methods
• Teko: Block preconditioners


Trilinos interoperability

Library: Functionality
• SuperLU: Direct sparse linear solvers
• MUMPS: Direct sparse linear solvers
• PETSc: Epetra_PETScAIJMatrix; ML accepts PETSc KSP for smoothers (fine grid only)
• ...


PETSc Linear Solvers

Object: Linear Solvers/Preconditioners
• KSP: Krylov based iterative solvers
• PC: Left preconditioning techniques; interoperable interface to direct and other iterative solver schemes

Preconditioners: Additive Schwarz, Block Jacobi, Jacobi, ILU, ICC, LU (sequential only), Others

Krylov Subspace Methods: GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebyshev, Other


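Preconditioning Technique in PETSc

Original system: $Ax = b$

Preconditioned system: $(M_L^{-1} A M_R^{-1})(M_R x) = M_L^{-1} b$

For $M_L = I$ (right preconditioning): $r \equiv b - A M_R^{-1} M_R x$

For $M_R = I$ (left preconditioning, PETSc default): $r_L \equiv M_L^{-1} b - M_L^{-1} A x = M_L^{-1} r$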
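The two residual definitions can be checked numerically on a tiny example. The sketch below assumes a Jacobi preconditioner M = diag(A) and an arbitrary trial vector x; both are choices made only for this illustration, and the code does not use PETSc. It shows that right preconditioning leaves the residual r = b − Ax unchanged, while left preconditioning yields the scaled residual M⁻¹r.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[4.0, 1.0],
     [2.0, 3.0]]
b = [1.0, 2.0]
x = [0.5, -0.25]                 # arbitrary trial vector, not the solution
m = [A[0][0], A[1][1]]           # assumed preconditioner M = diag(A)

# True residual r = b - A x
r = [bi - axi for bi, axi in zip(b, matvec(A, x))]

# Left preconditioning (M_R = I): r_L = M^{-1} b - M^{-1} A x
r_left = [bi / mi - axi / mi for bi, axi, mi in zip(b, matvec(A, x), m)]

# Right preconditioning (M_L = I): unknown y = M x, residual b - A M^{-1} y
y = [mi * xi for mi, xi in zip(m, x)]
x_back = [yi / mi for yi, mi in zip(y, m)]
r_right = [bi - axi for bi, axi in zip(b, matvec(A, x_back))]

print("r       =", r)
print("r_left  =", r_left)       # equals M^{-1} r
print("r_right =", r_right)      # equals r
```

One practical consequence is that a convergence test on the left-preconditioned residual measures M⁻¹r rather than r itself.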


PETSc Linear Solver Interface

[Diagram: flow of a PETSc application]
• User code: Main Routine; Application Initialization; Evaluation of A and b; Post-Processing
• PETSc code: Linear Solvers (SLES); Solve Ax = b; PC; KSP


PETSc Interoperability

Libraries/Frameworks: Functionality
• MUMPS: Direct sparse linear solvers
• SuperLU: Direct sparse linear solvers
• Trilinos: ML; Epetra
• Hypre: Preconditioners
• LAPACK/ScaLAPACK: Direct dense linear solver
• ...


PETSc Interface


Overlapping Functionality

APPLICATIONS

Iterative Schemes for Linear and Non-Linear Solvers


Functionality in The DOE ACTS Collection

Computational Problem: Systems of Linear Equations (cont.)
Methodology: Iterative Methods (cont.)
Algorithm: Library
• SYMMLQ: PETSc
• Preconditioned CG: AztecOO, PETSc, Hypre
• Richardson: PETSc
• Block Jacobi Preconditioner: AztecOO, PETSc, Hypre
• Point Jacobi Preconditioner: AztecOO
• Least Squares Polynomials: PETSc


Functionality in The DOE ACTS Collection

Computational Problem: Systems of Linear Equations (cont.)
Methodology: Iterative Methods (cont.)
Algorithm: Library
• SOR Preconditioning: PETSc
• Overlapping Additive Schwarz: PETSc
• Approximate Inverse: Hypre
• Sparse LU preconditioner: AztecOO, PETSc, Hypre
• Incomplete LU (ILU) preconditioner: AztecOO
• Least Squares Polynomials: PETSc

Methodology: MultiGrid (MG) Methods
Algorithm: Library
• MG Preconditioner: PETSc, Hypre
• Algebraic MG: Hypre
• Semi-coarsening: Hypre


Functionality in The DOE ACTS Collection

Computational Problem: Linear Least Squares Problems
• Least Squares, $\min_x \|b - Ax\|_2$: ScaLAPACK
• Minimum Norm Solution, $\min_x \|x\|_2$: ScaLAPACK
• Minimum Norm Least Squares, $\min_x \|x\|_2$ and $\min_x \|b - Ax\|_2$: ScaLAPACK

Computational Problem: Standard Eigenvalue Problem
• Symmetric Eigenvalue Problem, $Az = \lambda z$ for $A = A^H$ or $A = A^T$: ScaLAPACK (dense), SLEPc (sparse)

Computational Problem: Singular Value Problem
• Singular Value Decomposition, $A = U \Sigma V^T$, $A = U \Sigma V^H$: ScaLAPACK (dense), SLEPc (sparse)

Computational Problem: Generalized Symmetric Definite Eigenproblem
• Eigenproblem, $Az = \lambda Bz$, $ABz = \lambda z$, $BAz = \lambda z$: ScaLAPACK (dense), SLEPc (sparse)


Functionality in The DOE ACTS Collection

Computational Problem: Non-Linear Equations
Methodology: Newton Based
Algorithm: Library
• Line Search: PETSc
• Trust Regions: PETSc
• Pseudo-Transient Continuation: PETSc
• Matrix Free: PETSc
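As an illustration of the Newton-based line search entry above, the following is a minimal sketch of Newton's method with a backtracking line search for a 2x2 nonlinear system F(x) = 0. It is plain Python and does not use the PETSc nonlinear solver interface; the function name, the sufficient-decrease constant, and the test problem are assumptions made for this example.

```python
import math

def newton_line_search(F, J, x, tol=1e-10, max_iter=50):
    """Newton's method with backtracking on ||F|| for a 2x2 system.

    F(x) returns [f0, f1]; J(x) returns the 2x2 Jacobian as nested lists.
    Returns (x, iterations used).
    """
    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    for it in range(max_iter):
        f = F(x)
        fnorm = norm(f)
        if fnorm <= tol:
            return x, it
        (a, b), (c, d) = J(x)
        det = a * d - b * c
        # Newton step: solve J dx = -f by Cramer's rule.
        dx = [(-f[0] * d + f[1] * b) / det,
              (-f[1] * a + f[0] * c) / det]
        # Backtracking: halve the step until ||F|| decreases sufficiently.
        t = 1.0
        while t > 1e-8:
            trial = [x[0] + t * dx[0], x[1] + t * dx[1]]
            if norm(F(trial)) <= (1.0 - 1e-4 * t) * fnorm:
                break
            t *= 0.5
        x = trial
    return x, max_iter

# Test problem: intersection of the circle x^2 + y^2 = 4 with the line x = y.
F = lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]
J = lambda v: [[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]]

root, iterations = newton_line_search(F, J, [3.0, 1.0])
print(root, iterations)
```

The line search only shortens the Newton step when a full step fails to reduce the residual norm, so near the solution the method keeps Newton's fast local convergence.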


Functionality in The DOE ACTS Collection

Computational Problem: Non-Linear Optimization

Methodology: Newton Based
• Newton: OPT++, TAO
• Finite-Difference Newton: OPT++, TAO
• Quasi-Newton: OPT++, TAO
• Non-linear Interior Point: OPT++, TAO

Methodology: CG
• Standard Non-linear CG: OPT++, TAO
• Limited Memory BFGS: OPT++
• Gradient Projections: TAO

Methodology: Direct Search
• No derivative information: OPT++


TAO Interface Reusing PETSc


Functionality in The DOE ACTS Collection

Computational Problem: Non-Linear Optimization (cont.)
Methodology: Semismoothing
• Feasible Semismooth: TAO
• Infeasible Semismooth: TAO

Computational Problem: Ordinary Differential Equations
Methodology: Integration
• Adams-Moulton (variable coefficient forms): CVODE (SUNDIALS), CVODES
• Backward Differentiation Formula, Direct and Iterative Solvers: CVODE, CVODES

Computational Problem: Nonlinear Algebraic Equations
Methodology: Inexact Newton
• Line Search: KINSOL (SUNDIALS)

Computational Problem: Differential Algebraic Equations
Methodology: Backward Differentiation Formula
• Direct and Iterative Solvers: IDA (SUNDIALS)
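To show why backward differentiation formulas are used for stiff problems, here is a minimal sketch of the first-order BDF (backward Euler) with fixed step size for a scalar ODE, compared against forward Euler. This is not the CVODE interface, and it omits the variable order and step size control a production integrator would provide; all names and the test problem y' = -1000 y are assumptions for this example.

```python
def backward_euler(f, dfdy, t0, y0, h, steps, newton_iters=20, tol=1e-12):
    """Fixed-step BDF1: solve z = y + h f(t + h, z) by Newton at each step."""
    t, y = t0, y0
    for _ in range(steps):
        t_next = t + h
        z = y                                   # initial Newton guess
        for _ in range(newton_iters):
            g = z - y - h * f(t_next, z)        # implicit equation residual
            dz = g / (1.0 - h * dfdy(t_next, z))
            z -= dz
            if abs(dz) <= tol:
                break
        t, y = t_next, z
    return y

def forward_euler(f, t0, y0, h, steps):
    """Explicit Euler, included only for comparison."""
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y

lam = 1000.0
f = lambda t, y: -lam * y                       # stiff test equation
dfdy = lambda t, y: -lam

h, steps = 0.01, 10                             # h * lam = 10, too large for explicit Euler
y_implicit = backward_euler(f, dfdy, 0.0, 1.0, h, steps)
y_explicit = forward_euler(f, 0.0, 1.0, h, steps)
print(y_implicit)    # decays toward 0, like the exact solution
print(y_explicit)    # grows in magnitude: the explicit method is unstable here
```

Each implicit step requires solving an equation, which is where the direct and iterative solvers listed in the table come into play for systems of ODEs.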


Functionality in The DOE ACTS Collection

Computational Problem: Writing Parallel Programs
Support: Distributed Arrays
Techniques: Library
• Shared-Memory: Global Arrays
• Grid Generation: OVERTURE
• Structured Meshes: Hypre, OVERTURE, PETSc
• Semi-Structured Meshes: Hypre, OVERTURE


Functionality in The DOE ACTS Collection

Computational Problem: Profiling
Support: Algorithmic Performance
• Automatic Instrumentation: PETSc
• User Instrumentation: PETSc
Support: Execution Performance
• Automatic Instrumentation: TAU
• User Instrumentation: TAU

Computational Problem: Code Optimization
Support: Library Installation
• Linear Algebra Tuning: ATLAS


User Interfaces


Kernel Optimization

[Diagram: PETSc, Hypre, AztecOO, SuperLU, and ScaLAPACK, with labels P1, P2, P3, Row-Projection Methods, and Dense Linear Algebra]


Kernel Optimization

Problems shown:
• $Ax = b$ or $AX = B$
• $Hx = b'$
• $\min_x \|b - Ax\|_2$, $\min_x \|x\|_2$
• $Az = \lambda z$
• $A = U \Sigma V^T$, $A = U \Sigma V^H$
• $Az = \lambda Bz$, $ABz = \lambda z$, $BAz = \lambda z$

Exploit concurrency (in and out of a node):
• Hybrid programming (MPI+threads)
• NUMA-aware operations

Kernel reusability:
• Bottom-up automatic optimization
• Identify key parameters in the algorithm
• Run-time parameter control
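The ideas of identifying a key parameter and controlling it at run time can be sketched with a blocked matrix multiply whose block size is chosen empirically. This is only an illustration in plain Python of the general auto-tuning approach; it is not how ATLAS or OSKI are implemented, and the function names, test matrices, and candidate block sizes are assumptions.

```python
import time

def blocked_matmul(A, B, bs):
    """C = A B computed tile by tile; bs is the tunable block size."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                for i in range(ii, min(ii + bs, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + bs, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, p)):
                            Ci[j] += a * Bk[j]
    return C

def tune_block_size(n, candidates):
    """Time each candidate block size on an n x n problem and pick the fastest."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for bs in candidates:
        start = time.perf_counter()
        blocked_matmul(A, B, bs)
        timings[bs] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings

best, timings = tune_block_size(24, [4, 8, 24])
print("selected block size:", best)
```

The selected value depends on the machine and on the run, which is the reason such parameters are exposed for tuning rather than fixed in the source.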


Functionality in The DOE ACTS Collection

OSKI

University of California, Berkeley

Symmetric peak = 612 MFlops


ACTS Software Sustainability Cycle


References

• L.A. Drummond, O. Marques: An Overview of the Advanced CompuTational Software (ACTS) Collection. ACM Transactions on Mathematical Software Vol. 31 pp. 282-301, 2005

• http://acts.nersc.gov

• http://acts.nersc.gov/events/Workshop2011