h-cholesky on manycore
TRANSCRIPT
H-Cholesky Factorization on Many-Core Accelerators
Gang Liao
August 2, 2015
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
Background
If A is a positive definite matrix, its Cholesky factorization is A = LL^T, where L is lower triangular.
Data matrices representing numerical observations, such as proximity or correlation matrices, are often huge and hard to analyze. Decomposing them into lower-order or lower-rank canonical forms reveals the inherent characteristics and structure of the matrices and helps to interpret their meaning readily.
Hierarchical Matrix Hierarchical matrices (H-matrices) are a powerful tool to represent dense matrices coming from integral equations or partial differential equations in a hierarchical, block-oriented, data-sparse way with log-linear memory costs.
Hierarchical Matrix
Implementation: Inadmissible Leaves
The product index set resolves into admissible and inadmissible leaves of the tree. The assembly, storage, and matrix-vector multiplication differ for the two corresponding classes of sub-matrices. Inadmissible leaves are stored as ordinary dense (full) matrices.
Implementation: Admissible Leaves
Admissible leaves admit a low-rank representation R = A·B^T, which reduces both storage and the cost of matrix-vector multiplication.
Hierarchical Matrix Representation
Profiling
Compiler Optimization – Full matrix
icc opt1: icc with basic optimizations such as -O2.
icc opt2: icc with -msse4.2 -O3.
icc mkl: icc opt2 plus calls to MKL routines.
Numerical Libraries Optimization – Full matrix
dpotrf_ vs. plasma_dpotrf vs. magma_dpotrf
MKL: Intel Math Kernel Library (Intel MKL) accelerates math processing routines.
PLASMA: Parallel Linear Algebra for Scalable Multi-core Architectures
MAGMA: Matrix Algebra on GPU and Multicore Architectures
Parallel Optimization
The concept of task-based DAG computation is used to split the H-Cholesky factorization into individual tasks and to define the corresponding dependencies, which together form a DAG.
Code Analysis
Multicore Optimization – H-Cholesky Factorization
Example 1:
Example 2:
Manycore Optimization – H-Cholesky Factorization
1. Allocate buffers on the accelerator and copy r->a[row_offset] and r->b[col_offset] to it.
2. Copy the result ft->e from the accelerator back into CPU host memory.
Result & Conclusion
[Figure: H-Cholesky decomposition time (sec) versus nmin (leaf size), problem size 1000² vertices; curves: MKL, Hybrid.]
H-Cholesky factorization on many-core accelerators is highly efficient, and it also scales well to large H-matrices.