parallel algorithms for solving linear systems with block-fivediagonal matrices on multi-core cpu
TRANSCRIPT
Parallel algorithms for solving linear systems
with block-fivediagonal matrices on multi-core
CPU
Dmitry Belousov, Elena Akimova
Institute of Mathematics and Mechanics of Ural Branch of RAS,
Ural Federal University named the first President of Russia B.N.Yeltsin
1
Systems of linear algebraic equations (SLAE) with block-fivediagonal
matrices arise when solving some mathematical modelling problems, in
particular, geoelectrics and diffusion problems.
Introduction
After using a finite-difference approximation with improved accuracy, the
problem is reduced to solving a SLAE with block-fivediagonal matrix [1].2
SLAE
0 0 0 1 0 2 0
1 0 1 1 1 2 1 3 1
,
,
С Y D Y E Y F
BY CY DY EY F
− + =− + − + =
We consider the system:
(1)
2 1 1 2
2 1 1
1 1 1 1 1 1
, 2,..., 1;
,
,
i i i i i i i i i i i
N N N N N N N N N
N N N N N N N
AY BY CY DY EY F i N
A Y B Y C Y D Y F
A Y B Y C Y F
− − + +
− − +
+ − + + + +
− + − + = = −
− + − = − + =
3
After a finite-difference approximation
improved accuracy, the geoelectrics
problems and diffusion problems can be
reduced to solving SLAE with block-
fivediagonal matrices with following
DATA STRUCTURE
Fig. 1. Block-fivediagonal matrix
fivediagonal matrices with following
structure which presented in Fig. 1.
We are proposing the Parallel Conjugate
Gradient Method with regularization
(PCMR), conjugate gradient method with
preconditioner (PCGM), and parallel
square root method (PSRM) to solve such
SLAEs.
4
Parallel Conjugate Gradient Method
0 0 0 0, ,r b Ax p r= − =
We have the systems of the following form:
The iterative process for the method has following form:
Ax b=� �,Ax b A A Eα= = +We add parameters for regularization:
(2)
2
1 12
21
1 1 2
2
2
, ,
, , ,( , )
, .
k
k k k k k k
k k kk k
k
k k k
k kk
r b Ax p r
rx x p r r Ap
Ap p
rp r p
r
α α α
β β
+ +
+
+ +
= − =
= + = = −
= + =
The condition for stopping the CGM iterative process with preconditioner is
/kAr b b ε− <
(3)
5
Parallel Conjugate Gradient Method (PCGM) with preconditioner
1 1 ,C Ax C b− −=
max max
min min
( ) ( ), ( ) , ( ) ,cond A cond A cond A cond Aλ λλ λ
<< = =ɶ
ɶ ɶɶ
Ax b=
(4)
Initial System to be replaced by the system that can be converged
by interative method essentially faster.
We have the following condition to choose the preconditioner:
0 0 0 1 0 0 0
1 1
1 11 1 1 1 1
, , ,
( , ), , ,
( , )
( , ), , .
( , )
k kk k k k k k
k k kk k
k kk k k k k
k k k k
r b Ax p C r z p
r zx x p r r Ap
Ap p
r zz C r p z p
r z
α α α
β β
−
+ +
+ ++ − + + +
= − = =
= + = = −
= = + =
The condition for stopping the CGM iterative process with preconditioner is
/ .kAz b b ε− <
(5)
min minλ λɶ
6
Parallel Conjugate Gradient Method (PCGM) Parallelization
The parallelization of the CGM methods is based on splitting the
matrix A by horizontal lines into m blocks, and the solution vector z
and the vector of the right-hand part b of the SLAE are divided into
m parts so that n=m x L, where n is the dimension of the system of
equations, m is the number of processors, and L is the number of
lines
Fig. 2. Distribution of data among processors
lines
7
Parallel Square Root Method (PSRM)
= =
TA S S= where S is an upper triangular matrix with positive elements on the main
diagonal, is the transposed matrixTS
This method is based on decomposition of the symmetric positive definite
matrix A in the product:
, .TS y b Sz y= =
1 1 11
1
1 1
,,
, 2,3,...., ; , 1, 2,...,1.
n n nn
i n
i ki k i ik k
k k ii i
ii ii
z y sy b s
b s y y s z
y i n z i n ns s
−
= = +
==
− − = = = = − −
∑ ∑
To solve systems, we use the recursion formulas:
(6)
8
Parallel Square Root Method parallelization
1 -
Processor
2 -
processor…
m –
processor
Fig. 2. Distribution of the i-th line among processors
9
Parallel matrix sweep algorithm
10
(7)
(8)
(9)
11
(10)
(11)
(12)
12
Numerical experiments and results
The parallel algorithms: conjugate gradient method with
preconditioner (PCGM), conjugate gradient method with
regularization (PCGR) and parallel square root method
(PSRM), were implemented on:
• Multi-core Intel processor using the technology of
OpenMP,
• and the Windows API development tools.
The basic data and quasi-model solution of the problem
were provided by the Department of Borehole Geophysics
of the Institute of Petroleum Geology and Geophysics of
the Siberian Branch of RAS (Novosibirsk).
13
Numerical experiments and results
Fig 3. Quasy model solution14
Method Computing system T,sec.(Windows API)
T, sec. (OpenMP)
PCGM
Intel Core I5 (1 core ) 165 145
Intel Core I5 (2 core ) 142 127
Intel Core I5 (4 core ) 118 101
mTmT
Computation times
Table 1. Computation times of solving SLAE
PCGR
Intel Core I5 (1 core ) 1805 1804
Intel Core I5 (2 core ) 1490 1450
Intel Core I5 (4 core ) 1288 1230
PSRM
Intel Core I5 (1 core ) 26 18
Intel Core I5 (2 core ) 18 13
Intel Core I5 (4 core ) 9 7
15
The numerical solution of the problem is compared with the quasi-model solution
by means of calculating the relative error
/ ,M N MY Y Yσ = − MY is the quasi-model solution of the problem
NY is the numerical solution of the problem
This problem was solved by the parallel conjugate gradient method with
preconditioner, parallel matrix sweep algorithm, and parallel square rootpreconditioner, parallel matrix sweep algorithm, and parallel square root
method. The numerical solution of the problem coincides with the quasi-model
solution with accuracy
7 7 710 , 2 10 , 6 10 .PCGM PCGR PSRMσ σ σ− − −≈ ≈ ⋅ ≈ ⋅
11 6 5maxmax min
min
( ) 1.3 10 , 1.4 10 , 1.1 10 0,cond Aλ
λ λλ
−= ≈ ⋅ = ⋅ = ⋅ >
A priori we find the condition number of the original matrix A
16
Conclusion
We have proposed the methods of solving the SLAE with block-
fivediagonal matrices:
1. Parallel square root method with preconditioner are proposed.
2. Parallel square root method with regularization are proposed.
3. Parallel conjugate gradient method.3. Parallel conjugate gradient method.
4. Parallel matrix sweep algorithm.
Algorithms , instead 4 are implemented on the Intel multi-core processor.
Investigation of efficiency and optimization of parallel algorithms for
solving the problem with quasi-model data are performed.
17
References1. Dashevsky, J.A, Surodina, I.V., Epov, M.I.: Quasi-three-dimensional mathematical
modelling of diagrams of axisymmetric direct current probes in anisotropic profiles.
Siberian J. of Industrial Mathematics. Vol. 5, № 3 (11), pp. 76–91 (2002).
2. Gorbachev, I.I., Popov, V.V. & Akimova, E.N. Computer simulation of the diffusion
interaction between carbonitride precipitates and austenitic matrix with allowance for the
possibility of variation of their composition, The Physics of Metals and
Metallography 2006, Volume 102, Issue 1, pp 18–28,
http://link.springer.com/article/10.1134/S0031918X06070039
3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat. 3. Faddeev, V.K., Faddeeva, V.N.: Computational methods of linear algebra. M.: Gos. Isdat.
Fizmat. Lit. (1963).
4. Elena N. Akimova, Dmitry V. Belousov: Parallel algorithms for solving linear systems with
block-tridiagonal matrices on multi-core CPU with GPU, Journal of Computational Science
2012 Volume 3, Issue 6, Pages 445–449,
http://www.sciencedirect.com/science/article/pii/S1877750312000932
5. Akimova E.N. A parallel matrix sweep algorithm for solving linear system with block-
fivediagonal matrices // AIP Conf. Proc.1648, 850028. 2015. Rhodes, Greece, 22–28 Sept.
2014. 5 p., http://scitation.aip.org/content/aip/proceeding/aipcp/10.1063/1.4913083
6. Voevodin, Vl.V.: Parallel programming technologies. URL: http://parallel.ru.
7. Samarskii, A.A., Nikolaev E.S.: Methods for solving the grid equations. M. Nauka. (1978).
8. Process and Thread Functions. https://msdn.microsoft.com/en-
us/library/windows/desktop/ms684847(v=vs.85).aspx.
18
Thanks for Attention!Contact
Dmitry Belousov Elena AkimovaDmitry Belousov Elena Akimova
[email protected] [email protected]
T: +79068088664
Institute of Mathematics and Mechanics of UrB RAS
19