parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core cpu

19
Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU Dmitry Belousov, Elena Akimova Institute of Mathematics and Mechanics of Ural Branch of RAS, Ural Federal University named the first President of Russia B.N.Yeltsin 1

Upload: ural-pdc

Post on 19-Feb-2017

72 views

Category:

Science


2 download

TRANSCRIPT

Page 1: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Parallel algorithms for solving linear systems

with block-fivediagonal matrices on multi-core

CPU

Dmitry Belousov, Elena Akimova

Institute of Mathematics and Mechanics of Ural Branch of RAS,

Ural Federal University named the first President of Russia B.N.Yeltsin

1

Page 2: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Systems of linear algebraic equations (SLAE) with block-fivediagonal

matrices arise when solving some mathematical modelling problems, in

particular, geoelectrics and diffusion problems.

Introduction

After using a finite-difference approximation with improved accuracy, the

problem is reduced to solving a SLAE with block-fivediagonal matrix [1].2

Page 3: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

SLAE

0 0 0 1 0 2 0

1 0 1 1 1 2 1 3 1

,

,

С Y D Y E Y F

BY CY DY EY F

− + =− + − + =

We consider the system:

(1)

2 1 1 2

2 1 1

1 1 1 1 1 1

, 2,..., 1;

,

,

i i i i i i i i i i i

N N N N N N N N N

N N N N N N N

AY BY CY DY EY F i N

A Y B Y C Y D Y F

A Y B Y C Y F

− − + +

− − +

+ − + + + +

− + − + = = −

− + − = − + =

3

Page 4: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

After a finite-difference approximation

improved accuracy, the geoelectrics

problems and diffusion problems can be

reduced to solving SLAE with block-

fivediagonal matrices with following

DATA STRUCTURE

Fig. 1. Block-fivediagonal matrix

fivediagonal matrices with following

structure which presented in Fig. 1.

We are proposing the Parallel Conjugate

Gradient Method with regularization

(PCMR), conjugate gradient method with

preconditioner (PCGM), and parallel

square root method (PSRM) to solve such

SLAEs.

4

Page 5: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Parallel Conjugate Gradient Method

0 0 0 0, ,r b Ax p r= − =

We have the systems of the following form:

The iterative process for the method has following form:

Ax b=� �,Ax b A A Eα= = +We add parameters for regularization:

(2)

2

1 12

21

1 1 2

2

2

, ,

, , ,( , )

, .

k

k k k k k k

k k kk k

k

k k k

k kk

r b Ax p r

rx x p r r Ap

Ap p

rp r p

r

α α α

β β

+ +

+

+ +

= − =

= + = = −

= + =

The condition for stopping the CGM iterative process with preconditioner is

/kAr b b ε− <

(3)

5

Page 6: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Parallel Conjugate Gradient Method (PCGM) with preconditioner

1 1 ,C Ax C b− −=

max max

min min

( ) ( ), ( ) , ( ) ,cond A cond A cond A cond Aλ λλ λ

<< = =ɶ

ɶ ɶɶ

Ax b=

(4)

Initial System to be replaced by the system that can be converged

by interative method essentially faster.

We have the following condition to choose the preconditioner:

0 0 0 1 0 0 0

1 1

1 11 1 1 1 1

, , ,

( , ), , ,

( , )

( , ), , .

( , )

k kk k k k k k

k k kk k

k kk k k k k

k k k k

r b Ax p C r z p

r zx x p r r Ap

Ap p

r zz C r p z p

r z

α α α

β β

+ +

+ ++ − + + +

= − = =

= + = = −

= = + =

The condition for stopping the CGM iterative process with preconditioner is

/ .kAz b b ε− <

(5)

min minλ λɶ

6

Page 7: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Parallel Conjugate Gradient Method (PCGM) Parallelization

The parallelization of the CGM methods is based on splitting the

matrix A by horizontal lines into m blocks, and the solution vector z

and the vector of the right-hand part b of the SLAE are divided into

m parts so that n=m x L, where n is the dimension of the system of

equations, m is the number of processors, and L is the number of

lines

Fig. 2. Distribution of data among processors

lines

7

Page 8: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Parallel Square Root Method (PSRM)

= =

TA S S= where S is an upper triangular matrix with positive elements on the main

diagonal, is the transposed matrixTS

This method is based on decomposition of the symmetric positive definite

matrix A in the product:

, .TS y b Sz y= =

1 1 11

1

1 1

,,

, 2,3,...., ; , 1, 2,...,1.

n n nn

i n

i ki k i ik k

k k ii i

ii ii

z y sy b s

b s y y s z

y i n z i n ns s

= = +

==

− − = = = = − −

∑ ∑

To solve systems, we use the recursion formulas:

(6)

8

Page 9: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Parallel Square Root Method parallelization

1 -

Processor

2 -

processor…

m –

processor

Fig. 2. Distribution of the i-th line among processors

9

Page 10: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Parallel matrix sweep algorithm

10

(7)

Page 11: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

(8)

(9)

11

(10)

(11)

(12)

Page 12: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

12

Page 13: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Numerical experiments and results

The parallel algorithms: conjugate gradient method with

preconditioner (PCGM), conjugate gradient method with

regularization (PCGR) and parallel square root method

(PSRM), were implemented on:

• Multi-core Intel processor using the technology of

OpenMP,

• and the Windows API development tools.

The basic data and quasi-model solution of the problem

were provided by the Department of Borehole Geophysics

of the Institute of Petroleum Geology and Geophysics of

the Siberian Branch of RAS (Novosibirsk).

13

Page 14: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Numerical experiments and results

Fig 3. Quasy model solution14

Page 15: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Method Computing system T,sec.(Windows API)

T, sec. (OpenMP)

PCGM

Intel Core I5 (1 core ) 165 145

Intel Core I5 (2 core ) 142 127

Intel Core I5 (4 core ) 118 101

mTmT

Computation times

Table 1. Computation times of solving SLAE

PCGR

Intel Core I5 (1 core ) 1805 1804

Intel Core I5 (2 core ) 1490 1450

Intel Core I5 (4 core ) 1288 1230

PSRM

Intel Core I5 (1 core ) 26 18

Intel Core I5 (2 core ) 18 13

Intel Core I5 (4 core ) 9 7

15

Page 16: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

The numerical solution of the problem is compared with the quasi-model solution

by means of calculating the relative error

/ ,M N MY Y Yσ = − MY is the quasi-model solution of the problem

NY is the numerical solution of the problem

This problem was solved by the parallel conjugate gradient method with

preconditioner, parallel matrix sweep algorithm, and parallel square rootpreconditioner, parallel matrix sweep algorithm, and parallel square root

method. The numerical solution of the problem coincides with the quasi-model

solution with accuracy

7 7 710 , 2 10 , 6 10 .PCGM PCGR PSRMσ σ σ− − −≈ ≈ ⋅ ≈ ⋅

11 6 5maxmax min

min

( ) 1.3 10 , 1.4 10 , 1.1 10 0,cond Aλ

λ λλ

−= ≈ ⋅ = ⋅ = ⋅ >

A priori we find the condition number of the original matrix A

16

Page 17: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Conclusion

We have proposed the methods of solving the SLAE with block-

fivediagonal matrices:

1. Parallel square root method with preconditioner are proposed.

2. Parallel square root method with regularization are proposed.

3. Parallel conjugate gradient method.3. Parallel conjugate gradient method.

4. Parallel matrix sweep algorithm.

Algorithms , instead 4 are implemented on the Intel multi-core processor.

Investigation of efficiency and optimization of parallel algorithms for

solving the problem with quasi-model data are performed.

17

Page 18: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

References1. Dashevsky, J.A, Surodina, I.V., Epov, M.I.: Quasi-three-dimensional mathematical

modelling of diagrams of axisymmetric direct current probes in anisotropic profiles.

Siberian J. of Industrial Mathematics. Vol. 5, № 3 (11), pp. 76–91 (2002).

2. Gorbachev, I.I., Popov, V.V. & Akimova, E.N. Computer simulation of the diffusion

interaction between carbonitride precipitates and austenitic matrix with allowance for the

possibility of variation of their composition, The Physics of Metals and

Metallography 2006, Volume 102, Issue 1, pp 18–28,

http://link.springer.com/article/10.1134/S0031918X06070039

3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat. 3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat.

Fizmat. Lit. (1963).

4. Elena N. Akimova, Dmitry V. Belousov: Parallel algorithms for solving linear systems with

block-tridiagonal matrices on multi-core CPU with GPU, Journal of Computational Science

2012 Volume 3, Issue 6, Pages 445–449,

http://www.sciencedirect.com/science/article/pii/S1877750312000932

5. Akimova E.N. A parallel matrix sweep algorithm for solving linear system with block-

fivediagonal matrices // AIP Conf. Proc.1648, 850028. 2015. Rhodes, Greece, 22–28 Sept.

2014. 5 p., http://scitation.aip.org/content/aip/proceeding/aipcp/10.1063/1.4913083

6. Voevodin, Vl.V.: Parallel programming technologies. URL: http://parallel.ru.

7. Samarskii, A.A., Nikolaev E.S.: Methods for solving the grid equations. M. Nauka. (1978).

8. Process and Thread Functions. https://msdn.microsoft.com/en-

us/library/windows/desktop/ms684847(v=vs.85).aspx.

18

Page 19: Parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core CPU

Thanks for Attention!Contact

Dmitry Belousov Elena AkimovaDmitry Belousov Elena Akimova

[email protected] [email protected]

T: +79068088664

Institute of Mathematics and Mechanics of UrB RAS

19