![Page 1: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/1.jpg)
*Koç University, Istanbul, Turkey
parcorelab.com
A Prediction Framework for Fast Sparse Triangular Solves
Najeeb Ahmad*, Buse Yilmaz*, Didem Unat*
Best Artifact Awardee
![Page 3: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/3.jpg)
Outline
• Part I: Main Topic
– Introduction
– Background and Motivation
– Prediction Framework
– Evaluation
– Related Work
– Conclusion
• Part II: Artifact Evaluation
– Our Artifact Evaluation Experience
![Page 7: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/7.jpg)
Introduction
• Sparse Triangular Solve (SpTRSV)
– an important computational kernel
– most time-consuming part of an application in many cases
• e.g. ILU-preconditioned GMRES solvers [1]
• Many CPU, GPU SpTRSV algorithms available
– CPU: Intel MKL, Park et al. [2]
– GPU: NVIDIA cuSPARSE library, Liu et al. [3], Li et al. [4]
• No single algorithm/platform performs best for all input matrices
– Algorithm performance varies with matrix sparsity pattern
Selecting the fastest SpTRSV algorithm is a non-trivial task!!!
![Page 11: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/11.jpg)
Contributions
• A machine learning-based framework
– predicts the fastest SpTRSV algorithm on heterogeneous CPU-GPU systems
– automated feature extraction, performance data collection and model training
– Extensible with new training datasets and algorithms
• Performance, accuracy, and overhead evaluation of the framework on a state-of-the-art CPU-GPU system
• Performance study of six SpTRSV algorithms (CPU & GPU)
• Identification of matrix sparsity features for SpTRSV performance prediction
![Page 14: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/14.jpg)
Background and Motivation
• Sparse triangular systems
Ly = b or Ux = y
[Figure: lower-triangular matrix L and upper-triangular matrix U]
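To make the lower-triangular system Ly = b concrete, here is a minimal sequential forward-substitution sketch. It is not taken from the talk: the CSR arrays `indptr`, `indices`, `data` and the function name `forward_substitution` are illustrative assumptions, and the sketch assumes every row stores a nonzero diagonal entry.

```python
def forward_substitution(indptr, indices, data, b):
    """Solve L y = b for a lower-triangular L stored in CSR format.

    Assumption: each row stores its (nonzero) diagonal entry.
    """
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif j < i:
                # y[j] is already known because rows are processed in order.
                acc -= data[k] * y[j]
        if diag is None or diag == 0:
            raise ValueError(f"row {i} has no nonzero diagonal entry")
        y[i] = acc / diag
    return y


# Illustrative 3x3 system:
# L = [[2, 0, 0],
#      [1, 4, 0],
#      [0, 3, 5]]
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 4.0, 3.0, 5.0]
b = [2.0, 9.0, 21.0]
y = forward_substitution(indptr, indices, data, b)  # [1.0, 2.0, 3.0]
```

Each y[i] needs every earlier y[j] that appears in row i, which is the dependency that makes SpTRSV hard to parallelize.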
![Page 16: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/16.jpg)
Background and Motivation
• SpTRSV characteristics
[Figure: L sparsity pattern and the dependency graph for L]
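The link between a sparsity pattern and its dependency graph can be sketched as follows: row i depends on row j whenever L has an off-diagonal nonzero at (i, j), and rows that share the same longest dependency chain form one level. The function name `level_sets` and the 4x4 pattern are illustrative assumptions, not the figure from the slide.

```python
def level_sets(indptr, indices):
    """Group the rows of a lower-triangular CSR pattern into dependency levels."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                # y[i] depends on y[j], so row i sits at least one level below row j.
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


# Illustrative pattern (x = nonzero):
# row 0: x
# row 1: . x
# row 2: x . x
# row 3: . x x x
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
levels = level_sets(indptr, indices)  # [[0, 1], [2], [3]]
```

Rows inside one level have no dependencies on each other, so they can be solved in parallel; the levels themselves must be processed in order.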
![Page 20: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/20.jpg)
Background and Motivation
• SpTRSV Algorithms
– Level-scheduling
– Synchronization-free
• SpTRSV performance
– CPU
• Intel MKL library
– MKL(sequential)
– MKL(parallel)
– GPU
• NVIDIA cuSPARSE library
– CUS1
– CUS2(level)
– CUS2(no level)
• Sync-Free [3]
[Chart: Fastest SpTRSV on Intel Gold (6148) + NVIDIA V100 GPU for 37 sparse matrices (from SuiteSparse collection)]
MKL(seq): 30%
MKL(par): 5%
CUS1: 19%
CUS2(lvl): 19%
CUS2(no lvl): 5%
Sync-Free: 22%
[Chart: Fastest SpTRSV by platform on Intel Gold (6148) + NVIDIA V100 GPU for 37 sparse matrices (from SuiteSparse collection)]
CPU: 35%
GPU: 65%
How to select the fastest SpTRSV algorithm for a given input matrix on a CPU-GPU platform?
![Page 32: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/32.jpg)
SpTRSV Prediction Framework
• A machine learning-based framework for predicting the fastest SpTRSV algorithm on a CPU-GPU machine
– Based on features and SpTRSV performance data of a pre-selected matrix set
• Framework overview
– Five components
1. Sparse Matrix Dataset
2. Matrix Feature Extractor
3. SpTRSV Algorithm Repository
4. Performance Data Collector
5. Model Trainer and Tester
Input Sparse Matrix → Prediction Model → Predicted SpTRSV Algorithm
![Page 33: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/33.jpg)
Feature Selection
• Structural or sparsity features
– Started with ~50 features, 30 features finalized
• Based on feature scores
![Page 40: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/40.jpg)
Feature Selection
• Selected Features
| No. | Features | Description | Score rank |
|---|---|---|---|
| 1 | `nnzs` | # of non-zeros | 1 |
| 2-4 | `<max,mean,std>_nnz_pl_rw` | nnz per level row-wise stats | 2,4,5 |
| 5 | `max_nnz_pl_cw` | nnz per level col-wise stats | 3 |
| 6 | `m` | Number of rows/columns | 6 |
| 7-10 | `<max,mean,median,std>_rpl` | Rows per level stats | 7,12,13,16 |
| 11-12 | `<min,max>_cl_cnt` | Columns with max/min length | 8,10 |
| 13-14 | `<max,min>_rl_cnt` | Rows with max/min lengths | 9,11 |
| 15-17 | `<max,std,median>_cl` | Column-length stats | 14,22,29 |
| 18 | `lvls` | # of levels | 15 |
| 19-21 | `<max,mean,std>_mean_cl_pl` | Column-length per level stats | 17,18,20 |
| 22-25 | `<max,mean,median,std>_rl` | Row-length stats | 19,27,28,30 |
| 26-30 | `<max,std,mean,median,min>_mean_rl_pl` | Row-length per level stats | 21,23,24,25,26 |
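A few of the listed features can be computed directly from the sparsity pattern. The sketch below is an illustration only, not the framework's C++/CUDA extractor: it assumes a lower-triangular CSR pattern with stored diagonals, counts the diagonals in `nnzs`, and uses the population standard deviation for `std_rpl`. All of these are assumptions.

```python
import statistics


def basic_features(indptr, indices):
    """Compute nnzs, m, lvls and rows-per-level (rpl) statistics for a CSR pattern."""
    n = len(indptr) - 1  # assumes at least one row
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    rows_per_level = [0] * (max(level) + 1)
    for lv in level:
        rows_per_level[lv] += 1
    return {
        "nnzs": indptr[-1],            # stored non-zeros (diagonal included here)
        "m": n,                        # number of rows/columns
        "lvls": len(rows_per_level),   # number of levels
        "max_rpl": max(rows_per_level),
        "mean_rpl": statistics.mean(rows_per_level),
        "median_rpl": statistics.median(rows_per_level),
        "std_rpl": statistics.pstdev(rows_per_level),
    }


# Illustrative 4x4 pattern: rows 0 and 1 are independent, row 2 needs row 0,
# row 3 needs rows 1 and 2.
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
features = basic_features(indptr, indices)
```

For this pattern the rows-per-level vector is [2, 1, 1], giving three levels with at most two rows available for parallel work in any level.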
![Page 47: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/47.jpg)
SpTRSV Prediction Framework
• Matrix Feature Extractor
– A C++/CUDA tool
• Uses CPU, GPU for efficient feature extraction
• Performance data collector
– For each input matrix, collects performance data for all algorithms
– Reports ID of the fastest algorithm
• Model Trainer and Tester
• Model Selection
– Scikit-learn library for model selection and evaluation
– Supervised machine learning
• Deep learning requires large data sets and long training times
– Evaluated classification models:
• Decision trees
• Random Forests
• Support Vector Machine (with grid-search)
• K-Nearest Neighbors
• Multi-Layer Perceptron
[Diagram: matrix features and IDs of the fastest algorithm are the inputs to the Model Trainer and Tester]
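The performance data collector step (run every algorithm on a matrix, report the ID of the fastest) might look roughly like the sketch below. The integer IDs, the `fastest_algorithm` helper, the best-of-N timing policy, and both stand-in solvers are assumptions made for illustration; the real collector times the MKL, cuSPARSE, and Sync-Free implementations.

```python
import time


def fastest_algorithm(algorithms, matrix, rhs, repeats=5):
    """Time each candidate solver; return (ID of the fastest, per-ID timings)."""
    timings = {}
    for algo_id, solve in algorithms.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            solve(matrix, rhs)
            best = min(best, time.perf_counter() - start)
        timings[algo_id] = best  # best-of-N is an assumed policy
    return min(timings, key=timings.get), timings


def dense_forward_solve(L, b):
    """Stand-in solver: dense forward substitution."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[j] * y[j] for j in range(i))) / row[i])
    return y


def delayed_solve(L, b):
    """Stand-in for a slower algorithm: same result with an artificial delay."""
    time.sleep(0.01)
    return dense_forward_solve(L, b)


L = [[2.0, 0.0, 0.0], [1.0, 4.0, 0.0], [0.0, 3.0, 5.0]]
b = [2.0, 9.0, 21.0]
algorithms = {0: dense_forward_solve, 1: delayed_solve}  # hypothetical IDs
winner, timings = fastest_algorithm(algorithms, L, b)
```

The winning ID becomes the class label for that matrix, paired with its feature vector for training.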
![Page 53: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/53.jpg)
Model Training and Testing
[Diagram: model training and testing flow]
• Sparse Matrix Dataset split into Training Set (75%) and Testing Set (25%)
• Matrix Feature Extraction and Feature Scaling
• Algorithm Performance Data
• Classifier Training with 10-fold cross validation → Trained Model → Predicted Algorithm
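The training and testing flow can be sketched with scikit-learn, using a 75%/25% split, feature scaling, 10-fold cross validation, and the five classifier families evaluated in the talk. Everything else here is an assumption: the data is synthetic (30 features, 6 class labels standing in for algorithm IDs), `StandardScaler` is one possible scaling choice, and the hyperparameters and grid are placeholders rather than the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (matrix features, ID of fastest algorithm).
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           n_classes=6, n_clusters_per_class=1, random_state=0)

# 75% training set, 25% testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm_grid_search": GridSearchCV(SVC(), {"C": [1, 10], "gamma": ["scale", 0.01]}, cv=3),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "multi_layer_perceptron": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                            random_state=0),
}

results = {}
for name, clf in models.items():
    # Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
    pipe = make_pipeline(StandardScaler(), clf)
    cv_accuracy = cross_val_score(pipe, X_train, y_train, cv=10).mean()
    test_accuracy = pipe.fit(X_train, y_train).score(X_test, y_test)
    results[name] = (cv_accuracy, test_accuracy)
```

Comparing the cross-validated accuracy with the held-out test accuracy for each entry in `results` is one way to pick a model without tuning on the testing set.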
![Page 54: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/54.jpg)
Effects of CPU-GPU Data Transfers
• Computations before/after SpTRSV on different platforms
A Prediction Framework for Fast Sparse Triangular Solves 13
SpTRSV
Computations on CPU
Computations on GPU
Nu
merical A
pp
lication
Ly = b
![Page 55: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/55.jpg)
Effects of CPU-GPU Data Transfers
• Computations before/after SpTRSV on different platforms
A Prediction Framework for Fast Sparse Triangular Solves 13
SpTRSV
Computations on CPU
Computations on GPU
Nu
merical A
pp
lication
Ly = b
SpTRSV on CPU
![Page 56: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/56.jpg)
Effects of CPU-GPU Data Transfers
• Computations before/after SpTRSV on different platforms
A Prediction Framework for Fast Sparse Triangular Solves 13
SpTRSV
Computations on CPU
Computations on GPU
Nu
merical A
pp
lication
Ly = b
SpTRSV on CPU
y
![Page 57: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/57.jpg)
Effects of CPU-GPU Data Transfers
• Computations before/after SpTRSV on different platforms
A Prediction Framework for Fast Sparse Triangular Solves 13
SpTRSV
Computations on CPU
Computations on GPU
Nu
merical A
pp
lication
Ly = b
SpTRSV on GPU
Effects of CPU-GPU Data Transfers
• Computations before/after SpTRSV on different platforms
A Prediction Framework for Fast Sparse Triangular Solves 13
SpTRSV
Computations on CPU
Computations on GPU
Nu
merical A
pp
lication
Ly = b
Computations on GPU
Computations on CPU
Nu
merical A
pp
lication
SpTRSV
Data transfershave no impact onAlgorithm selectionin this case
![Page 76: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/76.jpg)
Effects of CPU-GPU Data Transfers
• Computations before/after SpTRSV on same platform
[Figure: numerical-application pipelines around SpTRSV (Ly = b) in which the computations before and after SpTRSV run on the same platform. Running SpTRSV on that platform needs no transfer; running it on the other platform moves both b and y across the CPU-GPU boundary.]
– Data transfers may impact algorithm selection in this case
– Framework accounts for data transfer costs
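The transfer argument on these two slides can be sketched as a small selection rule. This is an illustrative sketch only, not the framework's actual code: the `Candidate` type, the algorithm labels, the timings, and the assumption that moving b costs the same as moving y are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical algorithm label
    platform: str    # "cpu" or "gpu"
    solve_ms: float  # SpTRSV time for this algorithm (made-up numbers below)


def transfers_needed(before: str, after: str, solve_on: str) -> int:
    """Count vector transfers: b moves if the producer platform differs from
    the solver platform, y moves if the consumer platform differs."""
    return int(before != solve_on) + int(after != solve_on)


def pick(candidates, before, after, transfer_ms):
    """Pick the candidate with the lowest solve time plus transfer time.
    Assumes each vector transfer costs the same transfer_ms."""
    def total(c):
        return c.solve_ms + transfers_needed(before, after, c.platform) * transfer_ms
    return min(candidates, key=total)


candidates = [Candidate("cpu_algo", "cpu", 5.0), Candidate("gpu_algo", "gpu", 4.0)]

# Different platforms before/after: every choice pays exactly one transfer,
# so the ranking is the same as without transfers.
mixed = pick(candidates, before="cpu", after="gpu", transfer_ms=2.0)

# Same platform before/after: the other platform pays two transfers,
# which can flip the choice.
same = pick(candidates, before="cpu", after="cpu", transfer_ms=2.0)
```

With these made-up numbers the GPU candidate wins in the mixed case, while the CPU candidate wins when both surrounding computations run on the CPU.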
![Page 81: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/81.jpg)
Evaluation
• Hardware platform
– CPU: Intel Xeon Gold 6148
• 40 cores (20 cores/socket)
– GPU: NVIDIA Tesla V100
• Software configuration
– Intel Parallel Studio 2019
– NVIDIA CUDA 10.1
– Compiler options:
• -O3
• -gencode arch=compute_70,code=sm_70
• Sparse Matrix Dataset
– 998 real square matrices from the SuiteSparse Matrix Collection
• 1K to 16.24M rows
• 1.074K to 232M nonzeros (nnz)
– Extensible with new matrices
• Performance of SpTRSV Algorithms (998 matrices)
[Pie chart: MKL(seq) 41%, MKL(par) 1%, CUS1 11%, CUS2(lvl) 6%, CUS2(no lvl) 2%, Sync-Free 39%; by platform: CPU 42%, GPU 58%]
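As a quick arithmetic check, the CPU/GPU split follows from the per-algorithm shares in the pie chart. The platform assigned to each algorithm below is an assumption inferred from the totals (MKL variants on the CPU, the rest on the GPU), not something the slide states explicitly.

```python
# Per-algorithm shares (percent) as read from the pie chart.
# The platform column is an assumption, labeled here for illustration.
shares = {
    "MKL(seq)": ("CPU", 41),
    "MKL(par)": ("CPU", 1),
    "CUS1": ("GPU", 11),
    "CUS2(lvl)": ("GPU", 6),
    "CUS2(no lvl)": ("GPU", 2),
    "Sync-Free": ("GPU", 39),
}

# Sum the shares per platform.
by_platform = {}
for platform, pct in shares.values():
    by_platform[platform] = by_platform.get(platform, 0) + pct

print(by_platform)
```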
![Page 84: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/84.jpg)
Evaluation: Prediction Accuracy
– 10-fold cross validation
– With 30 features
• Mean: ~87%, ~89%, ~87%, ~87%
– With top 10 features
• Mean: ~80%, ~81%, ~80%, ~79%
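For readers unfamiliar with the protocol, 10-fold cross validation can be sketched as follows. This is a minimal illustration under stated assumptions: the 1-nearest-neighbor classifier is a stand-in (the slides do not name the model here), the two-feature synthetic data and the labels `cpu_algo` / `gpu_algo` are made up, and all function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbor_predict(train_x, train_y, x):
    """Stand-in classifier: label of the closest training point (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def cross_val_accuracy(xs, ys, k=10, seed=0):
    """Hold out each fold once, train on the rest, and average the fold accuracies."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(
            nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Synthetic, well-separated feature vectors standing in for matrix features.
rng = random.Random(1)
xs = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
xs += [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
ys = ["cpu_algo"] * 50 + ["gpu_algo"] * 50

mean_acc = cross_val_accuracy(xs, ys, k=10)
print(f"mean accuracy over 10 folds: {mean_acc:.2f}")
```

Each matrix is used for testing exactly once and for training nine times, which is why the slide can report a single mean score per metric.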
![Page 85: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/85.jpg)
Evaluation: Speedup Gain
• Framework achieves significant speedups over arbitrary algorithm choice
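One way to read "speedup over an arbitrary algorithm choice" is the mean, over matrices, of the time of a fixed algorithm divided by the time of the predicted algorithm. The sketch below assumes that definition (an arithmetic mean of per-matrix ratios); the matrix names, algorithm labels, and timings are invented for illustration.

```python
def mean_speedup(times, predicted, baseline):
    """Mean over matrices of t_baseline / t_predicted.

    times:     {matrix: {algorithm: time}}
    predicted: {matrix: algorithm chosen by the model}
    baseline:  the single algorithm a user would otherwise always run
    """
    ratios = [t[baseline] / t[predicted[name]] for name, t in times.items()]
    return sum(ratios) / len(ratios)


# Made-up timings (arbitrary units) for two hypothetical matrices.
times = {
    "matrix_a": {"cpu_algo": 4.0, "gpu_algo": 2.0},
    "matrix_b": {"cpu_algo": 1.0, "gpu_algo": 3.0},
}
predicted = {"matrix_a": "gpu_algo", "matrix_b": "cpu_algo"}

vs_cpu = mean_speedup(times, predicted, "cpu_algo")  # (4/2 + 1/1) / 2
vs_gpu = mean_speedup(times, predicted, "gpu_algo")  # (2/2 + 3/1) / 2
```

Because the baseline differs per comparison, a framework evaluated this way yields one mean speedup per fixed algorithm rather than a single number.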
![Page 86: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/86.jpg)
Evaluation: Overheads
• Acceptable overheads, especially for large matrices
![Page 87: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/87.jpg)
Related Work
• OSKI library [5]
– Runtime autotuning of SpTRSV
• PetaBricks [6]
– Algorithm selection based on data size
• Nitro [7]
– Algorithm selection through user-guided machine learning
• MAPS simulation framework [8]
– Heuristics-based SpTRSV algorithm selection on CPU/GPU
– Limited to reservoir simulation
• SpTRSV algorithm selection on GPUs [9]
– Machine learning-based approach
![Page 88: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/88.jpg)
Conclusions
• We use a supervised machine learning approach for SpTRSV algorithm selection on CPU-GPU systems
• We implemented the approach as an automated, extensible framework for prediction model training and prediction of the fastest SpTRSV algorithm
• Framework evaluation on an Intel Gold CPU + NVIDIA V100 GPU shows 87% model accuracy and 1.4-2.7x mean SpTRSV speedups
![Page 91: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/91.jpg)
Paper Artifacts
• Artifacts
– Support materials (source code, tools, benchmarks, datasets, models) required for reproducibility of claimed experimental results¹
• Artifact Evaluation Process (AEP) at Euro-Par¹
– Completely optional but highly recommended
– Evaluated by independent committee of experts
• Our motivation for Artifact Evaluation
– Making our research more accessible
– Documenting and organizing the research for future reference/extension
– Enhancing credibility of the research claims
¹ Euro-Par 2020 Call for Artifact Evaluation
![Page 97: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/97.jpg)
Artifact Preparation for Evaluation
• File organization
– A selection criterion: artifact should be self-contained
![Page 104: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/104.jpg)
Artifact Preparation for Evaluation
• File organization
– A selection criterion: artifacts with long running time will not be evaluated
– A selection criterion: ease of use
![Page 105: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/105.jpg)
Artifact Preparation for Evaluation
• The Overview Document
![Page 112: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/112.jpg)
Artifact Preparation for Evaluation
• Reproducing Paper Results
![Page 116: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/116.jpg)
Artifact Preparation for Evaluation
• Dataset Generation Guide
![Page 122: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/122.jpg)
Artifact Evaluation
• Final Remarks
– Artifact Evaluation is a time-consuming but rewarding process
– Artifact mirror: https://github.com/ParCoreLab/SpTRSV_Framework
– For questions, queries, suggestions, contact: [email protected]
![Page 123: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/123.jpg)
THANK YOU
![Page 124: A Prediction Framework for Fast Sparse Triangular Solves](https://reader034.vdocuments.us/reader034/viewer/2022050715/624bfd71426fc569f27b31b8/html5/thumbnails/124.jpg)
References
[1] A. Jamal et al., "A Hybrid CPU/GPU Approach for the Parallel Algebraic Recursive Multilevel Solver pARMS," 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, 2016
[2] J. Park et al., "Sparsifying synchronization for high-performance shared-memory sparse triangular solver," ISC, 2014
[3] W. Liu et al., "Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides," Concurrency and Computation: Practice and Experience, 2017
[4] Li et al., "Efficient parallel implementations of sparse triangular solves for GPU architectures," SIAM Conference on Parallel Processing for Scientific Computing, 2020
[5] Vuduc et al., "OSKI: A library of automatically tuned sparse matrix kernels," Journal of Physics: Conference Series, 2005
[6] J. Ansel et al., "PetaBricks: A language and compiler for algorithmic choice," SIGPLAN, 2009
[7] S. Muralidharan et al., "Nitro: A framework for adaptive code variant tuning," IEEE 28th IPDPS, 2014
[8] H. Klie et al., "Exploiting capabilities of many core platforms in reservoir simulation," SPE Reservoir Simulation Symposium, 2011
[9] E. Dufrechou et al., "Automatic selection of sparse triangular linear system solvers on GPUs through machine learning techniques," International Symposium on Computer Architecture and High Performance Computing, 2019