Generic Compressed Matrix Insertion

Peter Gottschling – SmartSoft/TUD
Dag Lindbo – Kungliga Tekniska Högskolan

SmartSoft – TU Dresden
Peter.Gottschling@smartsoft-computing.com
Tel.: +49 (0) 351 463 34018



Page 2: Overview

• Software libraries
  • MTL4
  • FEniCS
• Compressed sparse matrices
• Insertion
• Benchmarks
• Vision

Page 3: Matrix Template Library 4

• Generic library for high-performance numeric operations in mathematical notation
• Many new techniques, such as implicit enable-if and meta-tuning
• Most modern iterative solvers
• Focus on high-performance simulation: FEM/XFEM/FVM/FDM
• Commercial version in preparation
• Parallel version in progress
• Multi-core, GPU support and multigrid in the near future

Page 4

Innovative product development through the finite element method (FEM)

Linear equation solver:

    template <class LinearOperator, class HilbertSpaceX, class HilbertSpaceB,
              class Preconditioner, class Iteration>
    int cg(const LinearOperator& A, HilbertSpaceX& x, const HilbertSpaceB& b,
           const Preconditioner& M, Iteration& iter)
    {
        typedef typename mtl::Collection<HilbertSpaceX>::value_type Scalar;
        Scalar rho, rho_1, alpha, beta;
        HilbertSpaceX p(size(x)), q(size(x)), r(size(x)), z(size(x));

        r = b - A*x;                    // initial residual
        while (! iter.finished(r)) {
            z = solve(M, r);            // apply preconditioner
            rho = dot(r, z);

            if (iter.first())
                p = z;                  // first search direction
            else {
                beta = rho / rho_1;
                p = z + beta * p;       // update search direction
            }

            q = A * p;
            alpha = rho / dot(p, q);    // step length

            x += alpha * p;             // update solution
            r -= alpha * q;             // update residual
            rho_1 = rho;

            ++iter;
        }
        return iter;
    }

Page 5: FEniCS

• Free software for solving differential equations
• FFC – FEniCS Form Compiler
  • High-level math language for formulating differential equations
  • Generates C++ code
• DOLFIN – generic FEM kernel
  • C++ library for FEM cores: assembler, mesh and function abstraction
  • Interface to uBLAS, PETSc, Trilinos, and MTL4
• Paper focuses on matrix assembly

Page 6: Compressed Sparse Row Format

• Most common general-purpose sparse format
• Entries sorted
• Kind of run-length encoding on rows

Page 7: In-Flight Insertion

• Very simple use
  • Like dense matrices
• Simple realization
• Extremely expensive
  • All following entries are shifted
  • Quadratic complexity

Example: A[0][1] = 6;

Page 8: Two-phase Insertion

• Dedicated insertion phase
• Matrix is available after terminating insertion
• Later modification impossible
• Works for distributed matrices as well
• Used in PETSc, includes construction of communication buffers for distributed sparse matrix-vector products (SpMVP)
• Janus derives its name from it (two faces)

Page 9: Inserter Concept in MTL4

• Inserter = object providing operations to set up other objects, e.g. matrices or vectors, efficiently
• Insertion phase lasts as long as inserter lives
  • Insert within a scope (block, function)
• Matrix ready when inserter destroyed
• Later insertion possible with another inserter
• Extends to distributed matrices and vectors
• MTL4 inserters have minimal memory usage

Page 10: Using Inserters

    #include <iostream>
    #include <boost/numeric/mtl/mtl.hpp>

    using namespace mtl;

    int main(int argc, char* argv[])
    {
        compressed2D<float> A(3, 5);
        {
            // Insertion phase: lasts as long as ins is alive
            matrix::inserter<compressed2D<float> > ins(A);
            ins[0][0] << 1.0; ins[0][2] << 2.0;
            ins[1][3] << 3.0;
            ins[2][1] << 4.0; ins[2][4] << 5.0;
        }   // ins is destroyed here, A is ready to use

        std::cout << "A is\n" << A << '\n';
        return 0;
    }

Page 11: Direct Insertion

• Reserve s entries per row
• Find insert position
  • By linear or binary search
• Move remainder in row
  • Linear in s
  • That is, constant for fixed s

Example: A[0][1] = 6;

Page 12: Indirect Insertion

• For saturated rows use “spare” container
  • std::map of index pair
  • Logarithmic in number of spare entries
  • Additional allocation
• About 10 times slower than direct insertion

Example: A[0][4] = 7;

Page 13: Benchmark

• Assemble CRS matrix
• Row order important, and order within row
• Performance measure: number of non-zeros inserted per second
• Reassembly
• Three libraries: uBLAS (including vector-of-vector), MTL4, PETSc
• Ordinary workstation (Intel)
• All benchmarks run in a simple interface routine for each library, e.g.

    // Matrix is the matrix type of the respective library
    void insert_row(Matrix& A, int row_idx, int* cols_idx, double* a, int n)
    {
        for (int j = 0; j < n; j++)
            A(row_idx, cols_idx[j]) += a[j];
    }

Page 14: Benchmark: Assembly rate with ascending rows

• 10,000 rows, 5 non-zeros/row
• MTL4: 46 million entries per second
• uBLAS: 5.9 million entries per second
• uBLAS (gov): 2 million entries per second
• PETSc: 22 million entries per second

Page 15: Benchmark: Assembly rate with ascending rows

• 100,000 rows, 50 non-zeros/row
• MTL4: 29.6 million entries per second
• uBLAS: 6.5 million entries per second
• uBLAS (gov): 2.8 million entries per second
• PETSc: 32.3 million entries per second

Page 16: Benchmark: Assembly rate with random rows

• 10,000 rows, 5 non-zeros/row
• MTL4: 41.4 million entries per second
• uBLAS: 31,300 entries per second
• uBLAS (gov): 1.9 million entries per second
• PETSc: 19.9 million entries per second

Page 17: Benchmark: Assembly rate with random rows

• 100,000 rows, 50 non-zeros/row
• MTL4: 25.6 million entries per second
• uBLAS: measurement abandoned
• uBLAS (gov): 2.7 million entries per second
• PETSc: 25.6 million entries per second

Page 18: Benchmark: Assembly rate with entirely random entries

• 10,000 rows, 5 non-zeros/row
• MTL4: 4.8 million entries per second
• uBLAS: 16,700 entries per second
• uBLAS (gov): 1.8 million entries per second
• PETSc: 15,900 entries per second

Page 19: Benchmark: Assembly rate with random rows

• 10,000 rows, 50 non-zeros/row
• MTL4: 2.9 million entries per second
• uBLAS: 3,340 entries per second
• uBLAS (gov): 1.7 million entries per second
• PETSc: 13,400 entries per second

Page 20: How to do Science in Silicon?

[Diagram; labels: Graphic application, CPU, GPU]

Page 21

[Diagram; labels: Scientific Software, Scientific application, CPU, GPU, Multi-Core, Par. Arch., Scien. Proc.]

Page 22: Conclusions

• Introduced a new approach for setting and modifying compressed sparse matrices
• Does not need a preparation phase
• Minimal memory footprint
• Optimal performance
• Tuned block insertion in progress
• Extends to distributed data structures