Javier Cuenca, José González
Department of Ingeniería y Tecnología de Computadores
Domingo Giménez
Department of Informática y Sistemas
University of Murcia, Spain
Towards the Design of an Automatically Tuned Linear Algebra Library
Linear Algebra: highly optimizable operations, but optimizations are platform-specific
Traditional method: hand-optimization for each platform
• Time-consuming
• Incompatible with hardware evolution
• Incompatible with changes in the system (architecture and basic libraries)
• Unsuitable for systems with variable workload
• Misuse by non-expert users
Current Situation of Linear Algebra Parallel Routines
Some groups and projects:
ATLAS, GrADS, LAWRA, FLAME, I-LIB
But the problem is very complex.
Solutions to this situation?
Parameterised routines: system parameters and algorithmic parameters
• System parameters are obtained at installation time, using an analytical model of the routine and simple installation routines. Only a reduced number of executions is needed at installation time.
• Algorithmic parameters are obtained from the analytical model, using the system parameters obtained in the installation process.
Our approach
Our approach: the scheme
[Figure: scheme of the approach in two stages. DESIGN (LAR-DESIGNER): LAR, MODELLING LAR, LAR-MOD, IMPLEMEN. OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): LAR-IF, BL, EXECUT. OF LAR-ERs, LAR-SPF, OAP SELECTION, LAR-OAPF, INCLUSION PROCESS, LIBRARY.]
Design: Modelling the LAR
The behaviour of the algorithm on the platform is defined by its execution time:
Texec = f(SPs, n, APs)
where
• SPs = f(n, APs): System Parameters
• APs: Algorithmic Parameters
• n: Problem Size
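As a minimal sketch of what such a model looks like in code, the Python function below evaluates a hypothetical execution-time formula. The cost terms (a cubic arithmetic term shared among p processors plus a per-block-step message cost) are illustrative assumptions, not the model used by the authors, and the field names t_c, t_s and t_w are simply labels for an arithmetic cost and two communication costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemParams:
    """System parameters (SPs); the values are supplied by the caller."""
    t_c: float  # cost of one arithmetic operation (seconds)
    t_s: float  # message start-up time (seconds)
    t_w: float  # time to send one word (seconds)


@dataclass(frozen=True)
class AlgorithmicParams:
    """Algorithmic parameters (APs)."""
    b: int  # block size
    p: int  # number of processors


def t_exec(sp: SystemParams, n: int, ap: AlgorithmicParams) -> float:
    """Hypothetical model Texec = f(SPs, n, APs).

    Illustrative cost terms only, not the authors' formula.
    """
    arithmetic = (2.0 / 3.0) * n**3 / ap.p * sp.t_c
    steps = n // ap.b  # number of block steps
    # Assume communication only happens with more than one processor.
    communication = 0.0 if ap.p == 1 else steps * (sp.t_s + n * ap.b * sp.t_w)
    return arithmetic + communication
```

Calling `t_exec(SystemParams(1e-9, 1e-5, 1e-8), 1024, AlgorithmicParams(b=32, p=4))` returns the predicted time in seconds for that combination; changing b or p changes the prediction, which is what makes the APs tunable.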
LAR-MOD: Analytical Model of LAR
System Parameters (SPs), which determine LAR performance:
• Hardware platform: physical characteristics and current conditions
• Basic libraries
Two Kinds of SPs:
Communication System Parameters (CSPs):
• ts: start-up time
• tw: word-sending time
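A common reading of these two parameters is the linear cost model in which sending a message of n words takes ts + n·tw. The Python sketch below fits ts and tw by least squares to (message size, time) measurements. The function name and the sample timings are hypothetical; in a real installation the measurements would come from a communication kernel rather than from a formula.

```python
def fit_ts_tw(sizes, times):
    """Least-squares fit of times ~ t_s + size * t_w. Returns (t_s, t_w)."""
    m = len(sizes)
    if m < 2 or m != len(times):
        raise ValueError("need at least two (size, time) pairs")
    mean_x = sum(sizes) / m
    mean_y = sum(times) / m
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    t_w = sxy / sxx               # slope: time per word
    t_s = mean_y - t_w * mean_x   # intercept: start-up time
    return t_s, t_w


# Synthetic timings generated from t_s = 50e-6 s and t_w = 0.1e-6 s per word.
sizes = [1_000, 10_000, 100_000]
times = [50e-6 + s * 0.1e-6 for s in sizes]
t_s, t_w = fit_ts_tw(sizes, times)
```

Because the synthetic timings lie exactly on a line, the fit recovers the generating values up to rounding; real measurements would scatter around the line.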
Arithmetic System Parameters (ASPs):
• tc: arithmetic cost. Using BLAS: k1, k2 and k3
How to estimate each SP?
1. Identify the kernel that accounts for the performance cost of the LAR
2. Build an Estimation Routine from this kernel
Design: Making the LAR-ERs
Arithmetic System Parameters (ASPs): the Estimation Routine is the computation kernel of the LAR
• Similar storage scheme
• Similar quantity of data
Communication System Parameters (CSPs): the Estimation Routine is the communication kernel of the LAR
• Similar kind of communication
• Similar quantity of data
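An arithmetic estimation routine along these lines might look like the Python sketch below: it times a small computation kernel for several block sizes and records the average cost per floating-point operation. The kernel (a pure-Python matrix product) and all names are stand-ins chosen so the example runs without extra libraries; an actual LAR-ER would call the same basic library routine, on the same storage scheme and data volume, as the LAR itself.

```python
import time


def matmul_kernel(a, b, size):
    """Naive size x size matrix product: the stand-in computation kernel."""
    c = [[0.0] * size for _ in range(size)]
    for i in range(size):
        row_a, row_c = a[i], c[i]
        for k in range(size):
            aik, row_b = row_a[k], b[k]
            for j in range(size):
                row_c[j] += aik * row_b[j]
    return c


def estimate_arithmetic_cost(block_sizes, repeats=3):
    """Return {block size: estimated seconds per floating-point operation}."""
    costs = {}
    for size in block_sizes:
        a = [[1.0] * size for _ in range(size)]
        b = [[2.0] * size for _ in range(size)]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_kernel(a, b, size)
            # Keep the fastest (least disturbed) of the repeated runs.
            best = min(best, time.perf_counter() - start)
        costs[size] = best / (2.0 * size**3)  # 2 * size^3 flops per product
    return costs


costs = estimate_arithmetic_cost([8, 16, 32])
```

Storing one value per block size, rather than a single number, matches the idea that the system parameters themselves depend on the problem size and on the algorithmic parameters.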
LAR-ERs: Estimation Routines
Design: hand-made, only once
Design: Process has finished
Installation: Running the LAR-ERs
Installation: obtaining the OAP
Algorithmic Parameters (APs)
Once the SP values are known, the optimum values for the APs (OAP) are calculated:
• b: block size
• p: number of processors
• r × c: logical topology, i.e. the grid configuration (logical 2D mesh)
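The selection step can be sketched as an exhaustive search: evaluate the model for every candidate combination of APs and keep the one with the lowest predicted time. In the Python sketch below the model is a hypothetical placeholder, and the candidate lists and SP values are invented for illustration; only the search pattern is the point.

```python
from itertools import product


def grid_shapes(p):
    """All (r, c) with r * c == p: the logical 2D meshes for p processors."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def select_oap(model, sps, n, block_sizes, processor_counts):
    """Return (predicted time, b, p, r, c) minimising model(sps, n, b, p, r, c)."""
    best = None
    for b, p in product(block_sizes, processor_counts):
        for r, c in grid_shapes(p):
            t = model(sps, n, b, p, r, c)
            if best is None or t < best[0]:
                best = (t, b, p, r, c)
    return best


def toy_model(sps, n, b, p, r, c):
    """Hypothetical Texec; illustrative terms, not the authors' formula."""
    arithmetic = (2.0 / 3.0) * n**3 / p * sps["t_c"]
    if p == 1:
        return arithmetic
    steps = n / b
    # Assumed per-step cost: messages along the rows and columns of the mesh.
    comm = steps * ((r + c) * sps["t_s"]
                    + n * b * (1.0 / r + 1.0 / c) * sps["t_w"])
    return arithmetic + comm


sps = {"t_c": 1e-9, "t_s": 5e-5, "t_w": 1e-7}
oap = select_oap(toy_model, sps, 2048, [16, 32, 64], [1, 2, 4, 8])
```

Since the search only evaluates a formula, it needs no further executions of the routine, which is consistent with keeping the number of runs at installation time small.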
Installation: putting it all together
Installation process finished
• LAR: Least Squares Toeplitz routine. Platform: network of PCs.
• LAR: One-sided Block Jacobi Method to solve the Symmetric Eigenvalue Problem. Platform: SGI Origin 2000.
• LAR: Gaussian elimination. Platform: NoW (heterogeneous system).
• LAR: block LU factorization. Platforms: IBM SP2, SGI Origin 2000, NoW. Basic libraries: reference BLAS, machine BLAS, ATLAS.
Experiments
[Figure: LU on IBM SP2. Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case (SEQ) and in parallel with 4 and 8 processors (PAR4, PAR8). Horizontal axis: 512 to 3584 in steps of 512; vertical axis: 0 to 1.4.]
[Figure: LU on Origin 2000. Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case (SEQ) and in parallel with 4, 8 and 16 processors (PAR4, PAR8, PAR16). Horizontal axis: 512 to 3584 in steps of 512; vertical axis: 0 to 1.4.]
[Figure: LU on NoW. Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 processors, using machine BLAS and ATLAS as basic libraries (SEQ BLAS, SEQ ATLAS, PAR4 BLAS, PAR4 ATLAS). Horizontal axis: 512 to 2048 in steps of 512; vertical axis: 0 to 1.2.]
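The quotient reported in these figures can be computed as in the minimal Python sketch below. The dictionary of timings is made-up sample data, not values read from the experiments, and the function name is hypothetical.

```python
def model_quality(times_by_params, model_choice):
    """Quotient: time with the model's parameters / best measured time.

    A value of 1.0 means the model picked the optimum parameters.
    """
    optimum = min(times_by_params.values())
    return times_by_params[model_choice] / optimum


# Made-up timings (seconds) for one problem size, keyed by block size b.
measured = {16: 12.4, 32: 10.0, 64: 10.5}
quotient = model_quality(measured, model_choice=64)
```

A quotient close to 1 indicates that the parameters chosen through the model cost little compared with the best parameters found by measurement.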
We are trying to develop a methodology that is valid for a wide range of systems and to include it in the design of linear algebra libraries: the methodology still has to be analysed on more systems and with more routines
The basic linear algebra library to use can be considered as another parameter
An installation strategy common to a set of routines must be developed
At the moment we are analysing routines individually, but it could be preferable to analyse algorithmic schemes
We are working on the design of a strategy for selecting the parameters in dynamic systems
Future Work