an overview on high performance numerical libraries · challenges in computational sciences dublin,...
TRANSCRIPT
The DOE Advanced CompuTational Software Collection (ACTS)
EU-US Summer School on HPC Challenges in Computational Sciences
Dublin, Ireland, June 28, 2012
An overview on High Performance Numerical Libraries
Tony DrummondComputational Research DivisionLawrence Berkeley National Laboratory
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
OUTLINE
• Motivation for HPC Numerical Libraries• Introduction to DOE ACTS Collection
• ACTS Numerical Functionality
• Use of the Tools in the ACTS Collection
• Sustainability through emerging hardware• Summary
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Motivation - HPC Applications
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
HPC Software Stack
APPLICATIONS
GENERAL PURPOSE TOOLS
PLATFORM SUPPORT TOOLS AND UTILITIES
HARDWARE
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
HPC Software Stack
APPLICATIONS
GENERAL PURPOSE TOOLS
PLATFORM SUPPORT TOOLS AND UTILITIES
HARDWARE
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
HPC Software Stack
APPLIC
ATIO
NS
GENER
AL PURP
OSE
TOO
LS
PLATFORM SUPPORT TOOLS AND UTILITIES
HARDWARE
SLEPc
PETSc
ATLAS
TAU
Global Arrays
AztecOO
Overture
OPT++
CCA
Hypre
SuperLU
ScaLAPACK
TAO
SUNDIALS
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
The DOE ACTS Collection
Goal: The Advanced CompuTational Software Collection (ACTS) makes reliable and efficient software tools more widely used, and more effective in solving the nation’s engineering and scientific problems.
References:• L.A. Drummond, O. Marques: An Overview of the Advanced CompuTational Software (ACTS)
Collection. ACM Transactions on Mathematical Software Vol. 31 pp. 282-301, 2005• http://acts.nersc.gov
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Speeding-Up Software Development
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
The DOE ACTS CollectionCategory Tool Functionalities
Numerical AztecOO Scalable linear and non-linear solvers using iterative schemes.
Hypre A family of scalable preconditioners.
PETSc Scalable linear and non-linear solvers and additional support for PDE related work.
OPT++ Object-oriented nonlinear optimization solvers.
SUNDIALS Solvers for the solution of systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
ScaLAPACK High performance parallel dense linear algebra.
SLEPc Scalable algorithms for the solution of large sparse eigenvalue problems.
SuperLU Scalable direct solution of large, sparse, nonsymmetric linear systems of equations.
TAO Large-scale optimization software.
Code DevelopmentGlobal Arrays Supports the development of parallel programs.
Overture Supports the development of computational fluid dynamics codes in complex geometries.
Run Time SupportTAU Portable and scalable performance analyzes and tracing tools for C, C++, Fortran and Java
programs.
Library DevelopmentATLAS Automatic generation of optimized numerical dense algebra for scalar processors.
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Current State of DOE ACTS Collection
Category Tool FunctionalitiesNumerical AztecOO Scalable linear and non-linear solvers using iterative schemes.
Hypre A family of scalable preconditioners.PETSc Scalable linear and non-linear solvers and additional support for PDE related work.SUNDIALS Solvers for the solution of systems of ordinary differential equations, nonlinear algebraic
equations, and differential-algebraic equations.ScaLAPACK High performance parallel dense linear algebra.SLEPc Scalable algorithms for the solution of large sparse eigenvalue problems.SuperLU Scalable direct solution of large, sparse, nonsymmetric linear systems of equations.TAO Large-scale optimization software.
Code DevelopmentGlobal Arrays Supports the development of parallel programs.
Overture Supports the development of computational fluid dynamics codes in complex geometries.
Run Time Support TAU Portable and scalable performance analyzes and tracing tools for C, C++, Fortran and Java programs.
Library Development ATLAS Automatic generation of optimized numerical dense algebra for scalar processors.
Category Tool FunctionalitiesTools in
ConsiderationML Multilevel Preconditioners from TrilinosBELOS Krylov based solvers from TrilinosZoltan Parallel Partitional, Data-Management and Load BalancingpOSKI Sparse auto-tuning library
Considering 4 more
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Newly set of Tutorials and Exercises
http://acts.nersc.gov/events/Workshop2011
Next WorkshopAugust 14-17, 2012
http://acts.nersc.gov/events/Workshop2012
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
Problem Methodology Algorithms Library
Systems of Linear Equations
Direct Methods
LU FactorizationScaLAPACK(dense)
SuperLU (sparse)
Cholesky Factorization
ScaLAPACK
LDLT (Tridiagonal matrices)
ScaLAPACK
QR Factorization ScaLAPACK
QR with column pivoting
ScaLAPACK
LQ factorization ScaLAPACK
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Linear Solvers Solution of systems of linear equations may seem easy, but is at
the heart of many computational problems.
The choice of solution methods depends on matrix characteristics. Symmetric vs nonsymmetric Positive definite vs indefinite Dimension Sparsity Special structures
• Banded; block bordered diagonal Conditioning
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Linear Solvers
Primary two favors of linear solvers: Direct Iterative
Prefer to think of them as methods at opposite ends of a spectrum of linear solvers. Preconditioned iterative methods are somewhere in between. Where they are in the spectrum depends on the choice of
preconditioners, including preconditioners constructed using techniques from sparse direct methods.
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Comparison Between Direct and Iterative Solvers Iterative
Unknown no. of ops• Depend on no. of iterations
Preconditioning may be needed to improve convergence
Low memory requirement Simple data structure
Easier to implement Less communication Fewer graph problems
Handling multiple RHS may not be easy
Direct Finite no. of ops
• Doesn’t depend on anything Pivoting may be needed to
maintain stability Large memory requirement Complex data structure
• Banded structure need not be optimal
Harder to implement More communication More graph problems
• Ordering, symbolic manipulation Easy to handle multiple RHS
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Comparison Between Direct and Iterative Solvers Direct methods or iterative methods?
Depend on dimensions, sparsity, and conditioning
Sparse direct solvers have become very efficient.• Almost all sparse direct solvers are built on top of dense matrix
operations.
Direct methods are desirable when• Poor conditioning• High accuracies are desired• Small dimensions … How small is “small”?Really depend on memory requirement and time to solution
• Solving multiple linear systems with the same matrixOnly one factorization required
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
Problem Methodology Algorithms Library
Systems of Linear Equations
Direct Methods
LU FactorizationScaLAPACK(dense)
SuperLU (sparse)
Cholesky Factorization
ScaLAPACK
LDLT (Tridiagonal matrices)
ScaLAPACK
QR Factorization ScaLAPACK
QR with column pivoting
ScaLAPACK
LQ factorization ScaLAPACK
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Software Structure of ScaLAPACK
ScaLAPACK
BLAS
LAPACK BLACS
MPI/PVM/...
PBLASGlobalLocal
platform specific
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
ScaLAPACK Linear Equation Routines
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Summary of ScaLAPACK Functionality
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Magma Library
http://icl.cs.utk.edu/magma/• LAPACK for hybrid architectures including many-
core/GPU
• Version 1.0 released in April, 2011
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
SuperLU• Solve general sparse linear system A x = b.
–Example: A of dimension 106, only 10 ~ 100 nonzeros per row
• Algorithm: Gaussian elimination (LU factorization: A = LU), followed by lower/upper triangular solutions.–Store only nonzeros and perform operations only on
nonzeros.
• Efficient and portable implementation for high-performance architectures, flexible interface.
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
SuperLU Flavors
SuperLU SuperLU_MT SuperLU_DIST
Platform Serial SMP Distributed
Language CC + Pthread
(or pragmas)C + MPI
Data typeReal/complex,
Single/doubleReal, double
Real/complex,
Double
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
SuperLU Design• Exploit dense submatrices in the L & U factors
• Why are they good?– Permit use of Level 3 BLAS– Reduce inefficient indirect addressing (scatter/gather)– Reduce graph algorithms time by traversing a coarser graph
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemMethodology Algorithms Library
Systems of Linear Equations
(cont..)
Iterative Methods
Conjugate Gradient AztecOO (Trilinos)
PETSc
GMRES AztecOO
PETSc
Hypre
CG Squared AztecOO
PETSc
Bi-CG Stab AztecOO
PETSc
Quasi-Minimal Residual (QMR)
AztecOO
Transpose Free QMR
AztecOO
PETSc
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Trilinos
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Trilinos Linear SolversPackages Linear Solvers/Preconditioners
Amesos • Direct sparse linear solvers
AztecOO • Krylov based iterative linear solvers• ILU-type methods
Belos • Krylov based iterative linear solversCLAPS • Domain decomposition methodsEpetra • Direct dense linear solver
IFPACK/TIFPACK • Algebraic preconditioners• ILU type methods
Komplex • Krylov based iterative linear solvers
Meros • Block preconditioners
ML • Multigrid methodsTeko • Block preconditioners
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Trilinos interoperability
Library FunctionalitySuperLU • Direct sparse linear solvers
MUMPS • Direct sparse linear solvers
PETSc• Epetra_PETScAIJMatrix• ML accepts PETSc KSP for smoothers
(fine grid only) ::
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
PETSc Linear Solvers
Object Linear Solvers/PreconditionersKSP • Krylov based preconditioners
PC• Left preconditioning techniques• interoperable interface to direct and other
iterative solver schemes
AdditiveSchwartz
BlockJacobi Jacobi ILU ICC LU
(Sequential only) Others
Preconditioners
GMRES CG CGS Bi-CG-STAB TFQMR Richardson Chebychev Other
Krylov Subspace Methods
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
(ML−1AMR
−1 )(MRx) = ML−1b,
r ≡ b − AMR−1MRx
For ML=I
rL ≡ ML−1b − ML
−1Ax = ML−1r
For MR=IPETSCDefault
Ax=b,
Preconditioning Technique in PETSc
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
PETSc
ApplicationInitialization Evaluation of A and b Post-
Processing
SolveAx = b PC KSP
Linear Solvers (SLES)
PETSc codeUser code
Main Routine
PETSc Linear Solver Interface
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
PETSc InteroperabilityLibraries/
Frameworks Functionality
MUMPS • Direct sparse linear solvers
SuperLU • Direct sparse linear solvers
Trilinos • ML• Epetra
Hypre • Preconditioners
LAPACK/ScaLAPACK • Direct dense linear solver
::
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
PETSc Interface
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Overlapping Functionality
APPLICATIONS
Iterative Schemes forLinear and Non-Linear Solvers
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemMethodology Algorithms Library
Systems of Linear Equations
(cont..)
Iterative Methods
Conjugate Gradient AztecOO (Trilinos)
PETSc
GMRES AztecOO
PETSc
Hypre
CG Squared AztecOO
PETSc
Bi-CG Stab AztecOO
PETSc
Quasi-Minimal Residual (QMR)
AztecOO
Transpose Free QMR
AztecOO
PETSc
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemMethodology Algorithms Library
Systems of Linear Equations
(cont..)
Iterative Methods
(cont..)
SYMMLQPETSc
Precondition CG AztecOOPETScHypre
Richardson PETSc
Block Jacobi Preconditioner
AztecOOPETScHypre
Point Jocobi Preconditioner
AztecOO
Least Squares Polynomials
PETSc
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemMethodology Algorithms Library
Systems of Linear Equations
(cont..)
Iterative Methods
(cont..)
SOR Preconditioning PETSc
Overlapping Additive Schwartz PETSc
Approximate Inverse Hypre
Sparse LU preconditionerAztecOOPETScHypre
Incomplete LU (ILU) preconditioner
AztecOO
Least Squares Polynomials PETSc
MultiGrid (MG)
Methods
MG PreconditionerPETScHypre
Algebraic MG Hypre
Semi-coarsening Hypre
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
Problem Methodology Algorithm Library
Linear Least Squares Problems
Least Squares ScaLAPACK
Minimum Norm Solution ScaLAPACK
Minimum Norm Least Squares
ScaLAPACK
Standard Eigenvalue Problem Symmetric Eigenvalue Problem
For A=AH or A=AT
ScaLAPACK (dense)
SLEPc (sparse)
Singular Value Problem Singular Value Decomposition
ScaLAPACK (dense)
SLEPc (sparse)
Generalized Symmetric Definite Eigenproblem
Eigenproblem ScaLAPACK (dense)
SLEPc (sparse)
€
minx | | b− Ax | |2
€
minx | | x | |2
€
minx | | x | |2
€
minx | | b− Ax | |2
€
Az = λz
€
A = UΣVT
A = UΣVH
€
Az = λBzABz = λzBAz = λz
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS Collection
Computational Problem
Methodology Algorithm Library
Non-Linear Equations
Newton Based
Line Search PETSc
Trust Regions PETSc
Pseudo-Transient Continuation
PETSc
Matrix Free PETSc
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemMethodology Algorithm Library
Non-Linear Optimization
Newton Based
Newton OPT++
TAO
Finite-Difference Newton OPT++
TAO
Quasi-Newton OPT++
TAO
Non-linear Interior Point OPT++
TAO
CG
Standard Non-linear CG OPT++
TAO
Limited Memory BFGS OPT++
Gradient Projections TAO
Direct Search No derivate information OPT++
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
TAO Interface Reusing PETSc
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemMethodology Algorithm Library
Non-Linear Optimization
Newton Based
Newton OPT++
TAO
Finite-Difference Newton OPT++
TAO
Quasi-Newton OPT++
TAO
Non-linear Interior Point OPT++
TAO
CG
Standard Non-linear CG OPT++
TAO
Limited Memory BFGS OPT++
Gradient Projections TAO
Direct Search No derivate information OPT++
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemMethodology Algorithm Library
Non-Linear Optimization (cont..)
Semismoothing
Feasible Semismooth
TAO
Unfeasible semismooth
TAO
Ordinary Differential Equations Integration
Adam-Moulton
(Variable coefficient forms)
CVODE (SUNDIALS)
CVODES
Backward Differential Formula
Direct and Iterative Solvers
CVODE
CVODES
Nonlinear Algebraic Equations Inexact Newton
Line Search KINSOL (SUNDIALS)
Differential Algebraic Equations
Backward Differential Formula
Direct and Iterative Solvers
IDA (SUNDIALS)
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS Collection
Computational Problem
Support Techniques Library
Writing Parallel Programs
Distributed Arrays
Shared-Memory Global Arrays
Grid Generation OVERTURE
Structured Meshes Hypre
OVERTURE
PETSc
Semi-Structured Meshes
Hypre
OVERTURE
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS CollectionComputational
ProblemSupport Technique Library
ProfilingAlgorithmic Performance
Automatic instrumentation PETSc
User Instrumentation PETSc
Execution Performance
Automatic Instrumentation TAU
User Instrumentation TAU
Code OptimizationLibrary Installation
Linear Algebra Tuning ATLAS
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
User Interfaces
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Kernel Optimization
PETSc
Hypre
AztecOO P3
P2
P1
SuperLU
ScaLAPACK
Row-Projection Methods
Dense Linear Algebra
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Kernel Optimization
€
minx | | b− Ax | |2
€
minx | | x | |2
€
minx | | x | |2
€
minx | | b− Ax | |2
€
Az = λz
€
A = UΣVT
A = UΣVH
€
Az = λBzABz = λzBAz = λz
Ax = b or AX = B
Hx = b’Exploit concurrency :(in and out a node)• Hybrid programming (MPI+threads)• NUMA Aware operations
Kernel reusability:• Bottom-Up automatic optimization• Identify key parameters in the algorithm• Run-time parameter control
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
Functionality in The DOE ACTS Collection
OSKI
University of California, Berkeley
Symmetric peak = 612 MFlops
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
ACTS Software Sustainability Cycle
High Performance Software Numerical Libraries EU-US Summer School on HPC ChallengesDublin, Ireland - June 28, 2012
References
• L.A. Drummond, O. Marques: An Overview of the Advanced CompuTational Software (ACTS) Collection. ACM Transactions on Mathematical Software Vol. 31 pp. 282-301, 2005
• http://acts.nersc.gov
• http://acts.nersc.gov/events/Workshop2011