h-cholesky on manycore
TRANSCRIPT
H-Cholesky Factorization on Many-Core Accelerators
Gang Liao
August 2, 2015
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
Background
If A is a positive definite matrix, its Cholesky factorization is A = LL^T, where L is lower triangular.
Data matrices representing numerical observations, such as proximity or correlation matrices, are often huge and hard to analyze. Decomposing them into lower-order or lower-rank canonical forms reveals the inherent characteristics and structure of the matrices and helps to interpret their meaning readily.
Hierarchical Matrix Hierarchical matrices (H-matrices) are a powerful tool to represent dense matrices coming from integral equations or partial differential equations in a hierarchical, block-oriented, data-sparse way with log-linear memory costs.
Hierarchical Matrix
Implementation: Inadmissible Leaves
The product index set resolves into admissible and inadmissible leaves of the tree. The assembly, storage, and matrix-vector multiplication differ for the two corresponding classes of sub-matrices. Inadmissible leaves are stored as ordinary dense (full) matrices.
Implementation: Admissible Leaves
Admissible leaves admit a low-rank representation R = A·B^T, which reduces both storage and the cost of matrix-vector multiplication.
Hierarchical Matrix Representation
Profiling
Compiler Optimization – Full matrix
icc opt1: icc with basic optimizations such as -O2.
icc opt2: icc with -msse4.2 -O3.
icc mkl: icc opt2 plus calls to MKL routines.
Numerical Libraries Optimization – Full matrix
dpotrf_ vs. plasma_dpotrf vs. magma_dpotrf
MKL: Intel Math Kernel Library (Intel MKL) accelerates math processing routines.
PLASMA: Parallel Linear Algebra for Scalable Multi-core Architectures
MAGMA: Matrix Algebra on GPU and Multicore Architectures
Parallel Optimization
The concept of task-based DAG computation is used to split the H-Cholesky factorization into individual tasks and to define the corresponding dependencies, which together form a DAG.
Code Analysis
Multicore Optimization – H-Cholesky Factorization
Example 1:
Example 2:
Manycore Optimization – H-Cholesky Factorization
1. Allocate buffers on the accelerator and copy r->a[row_offset] and r->b[col_offset] to it.
2. Copy the result ft->e from the accelerator back into CPU host memory.
Result & Conclusion
[Figure: H-Cholesky decomposition time (sec) versus nmin (leaf size), problem size 1000² vertices; curves: MKL, Hybrid.]
H-Cholesky factorization on many-core accelerators is highly efficient, and it also scales well to large H-matrices.