Parallel Iterative Solution of the Hermite Collocation Equations on GPUs

Post on 06-Jul-2015

Page 1: Parallel iterative solution of the hermite collocation equations on gpus

Parallel Iterative Solution of the Hermite Collocation Equations on GPUs

Emmanuel N. Mathioudakis

Co-Authors: Elena Papadopoulou, Yiannis Saridakis, Nikolaos Vilanakis

TECHNICAL UNIVERSITY OF CRETE

DEPARTMENT OF SCIENCES

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

73100 CHANIA - CRETE - GREECE

Page 2: Parallel iterative solution of the hermite collocation equations on gpus

Talk Overview

• Hermite Collocation for elliptic BVPs and derivation of the Collocation linear system

• Development of a parallel algorithm for the Schur Complement method on Shared Memory Parallel Architectures

• Parallel implementation on multicore computers with GPUs

Page 3: Parallel iterative solution of the hermite collocation equations on gpus

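Hermite Collocation Method

BVP:

\[
\begin{aligned}
\mathcal{L}\,u(x,y) &= f(x,y), & (x,y) &\in \Omega \\
\mathcal{B}\,u(x,y) &= g(x,y), & (x,y) &\in \partial\Omega
\end{aligned}
\]

Collocation equations at the collocation points $(\sigma_i^x, \sigma_j^y)$:

\[
\begin{aligned}
\mathcal{L}\,u(\sigma_i^x,\sigma_j^y) - f(\sigma_i^x,\sigma_j^y) &= 0, & (\sigma_i^x,\sigma_j^y) &\in \Omega \\
\mathcal{B}\,u(\sigma_i^x,\sigma_j^y) - g(\sigma_i^x,\sigma_j^y) &= 0, & (\sigma_i^x,\sigma_j^y) &\in \partial\Omega
\end{aligned}
\]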

Page 4: Parallel iterative solution of the hermite collocation equations on gpus

Hermite Collocation Method…

Page 5: Parallel iterative solution of the hermite collocation equations on gpus

Hermite Collocation Method…

Page 6: Parallel iterative solution of the hermite collocation equations on gpus

Red – Black Collocation Linear system

Page 7: Parallel iterative solution of the hermite collocation equations on gpus

Red – Black Collocation Linear system

Page 8: Parallel iterative solution of the hermite collocation equations on gpus

Red – Black Collocation Linear system

Page 9: Parallel iterative solution of the hermite collocation equations on gpus

Red – Black Collocation Linear system

Page 10: Parallel iterative solution of the hermite collocation equations on gpus

Red – Black Collocation Linear system

Page 11: Parallel iterative solution of the hermite collocation equations on gpus

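Model Problem

\[
\begin{aligned}
\nabla^2 u(x,y) + \lambda\,u(x,y) &= f(x,y), & (x,y) &\in \Omega \\
u(x,y) &= g(x,y), & (x,y) &\in \partial\Omega
\end{aligned}
\]

with a sign restriction on the constant $\lambda$.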

Page 12: Parallel iterative solution of the hermite collocation equations on gpus

Helmholtz Collocation Matrix

Page 13: Parallel iterative solution of the hermite collocation equations on gpus

Red – Black Collocation Linear system

Page 14: Parallel iterative solution of the hermite collocation equations on gpus

The collocation matrix is large and sparse, and enjoys no pleasant properties (e.g. symmetry, definiteness).

Iterative + Parallel

Page 15: Parallel iterative solution of the hermite collocation equations on gpus

Iterative Solution

Page 16: Parallel iterative solution of the hermite collocation equations on gpus

Iterative Solution

Page 17: Parallel iterative solution of the hermite collocation equations on gpus

Eigenvalues of Collocation matrix

Page 18: Parallel iterative solution of the hermite collocation equations on gpus

Schur Complement Iterative Solution
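
As a generic reminder of the technique named in this slide title, the block elimination behind a Schur complement solve can be written as follows. This is a sketch under assumptions: the red-black ordering is taken to give a 2×2 block system, the block names $A_{RR}, A_{RB}, A_{BR}, A_{BB}$ are illustrative notation rather than the authors', and eliminating the red unknowns first is also an assumption.

```latex
\[
\begin{pmatrix} A_{RR} & A_{RB} \\ A_{BR} & A_{BB} \end{pmatrix}
\begin{pmatrix} x^{(R)} \\ x^{(B)} \end{pmatrix}
=
\begin{pmatrix} b^{(R)} \\ b^{(B)} \end{pmatrix}
\]
% Eliminate x^{(R)} = A_{RR}^{-1} ( b^{(R)} - A_{RB} x^{(B)} ) from the second block row:
\[
\left( A_{BB} - A_{BR} A_{RR}^{-1} A_{RB} \right) x^{(B)}
= b^{(B)} - A_{BR} A_{RR}^{-1} b^{(R)}
\]
% Solve the reduced system iteratively, then recover the eliminated unknowns:
\[
x^{(R)} = A_{RR}^{-1} \left( b^{(R)} - A_{RB}\, x^{(B)} \right)
\]
```

The reduced system has half the unknowns of the original one, which is what makes it attractive as the target of the iterative solver.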

Page 19: Parallel iterative solution of the hermite collocation equations on gpus

Parallel Iterative Solution of the Collocation Linear System on Shared Memory Architectures

• Uniform load balancing between core threads

• Minimal idle cycles of core threads

• Minimal communication

Page 20: Parallel iterative solution of the hermite collocation equations on gpus

case of ns = 2p

\[
x_1^{(R)} \quad x_2^{(R)} \quad x_3^{(R)} \quad x_4^{(R)}
\]

Page 21: Parallel iterative solution of the hermite collocation equations on gpus

case of ns = 2p

\[
x_{p+1}^{(B)} \quad x_{p+2}^{(B)} \quad x_{p+3}^{(B)} \quad x_{p+4}^{(B)}
\]

Page 22: Parallel iterative solution of the hermite collocation equations on gpus

case of ns = 2p

\[
x_1^{(R)} \quad x_2^{(R)} \quad x_3^{(R)} \quad x_4^{(R)} \quad
x_{p+1}^{(B)} \quad x_{p+2}^{(B)} \quad x_{p+3}^{(B)} \quad x_{p+4}^{(B)}
\]

V1 V2 V3 V4 V5 V6 V7 V8

Page 23: Parallel iterative solution of the hermite collocation equations on gpus

V1 V2 V3 V4 V5

Page 24: Parallel iterative solution of the hermite collocation equations on gpus

Mapping onto a fixed-size architecture of N cores

case of k = 2p/N even: 2k virtual threads

Core P_j, j = 1,…,N:

\[
x_l^{(R)}, \quad l = (j-1)k+1, \dots, jk
\]
\[
x_l^{(B)}, \quad l = 2p+(j-1)k+1, \dots, 2p+jk
\]
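
The index ranges on this slide can be enumerated with a few lines of Python. This is only an illustrative sketch: the function name `blocks_for_core` is hypothetical, and labelling the first range "red" and the second "black" is an assumption, since the slide lists the two ranges without tying them unambiguously to a colour.

```python
def blocks_for_core(j, p, n_cores):
    """Return the two block-index ranges handled by core P_j (j is 1-based).

    Uses k = 2p/N and the ranges from the slide:
      l = (j-1)k+1, ..., jk   and   l = 2p+(j-1)k+1, ..., 2p+jk.
    The red/black labels are an assumption of this sketch.
    """
    if not 1 <= j <= n_cores:
        raise ValueError("j must lie in 1..N")
    if (2 * p) % n_cores != 0:
        raise ValueError("N must divide 2p")
    k = 2 * p // n_cores
    if k % 2 != 0:
        raise ValueError("this sketch covers only the even-k case")
    red = list(range((j - 1) * k + 1, j * k + 1))
    black = list(range(2 * p + (j - 1) * k + 1, 2 * p + j * k + 1))
    return red, black


# Example: p = 4 and N = 2 cores give k = 4, so each core gets 2k = 8 blocks.
for core in (1, 2):
    print(core, blocks_for_core(core, p=4, n_cores=2))
```

Every core receives the same number of blocks (2k), which is the uniform load balancing the earlier slide asks for.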

Page 25: Parallel iterative solution of the hermite collocation equations on gpus

Parallel Schur Complement Iterative Solution

Page 26: Parallel iterative solution of the hermite collocation equations on gpus

Parallel BiCGSTAB

Page 27: Parallel iterative solution of the hermite collocation equations on gpus

Parallel BiCGSTAB
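
For orientation, here is a minimal serial sketch of the textbook unpreconditioned BiCGSTAB iteration in plain Python. It is not the authors' parallel variant: the dense list-of-lists matrix, the helper names (`matvec`, `dot`, `residual_norm`) and the stopping rule on the relative residual are all assumptions made for illustration. In a parallel realization the matrix-vector products and inner products are the natural candidates for distribution across threads.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_norm(A, x, b):
    """Euclidean norm of b - A x."""
    return math.sqrt(sum((bi - yi) ** 2 for bi, yi in zip(b, matvec(A, x))))


def bicgstab(A, b, tol=1e-10, max_iter=200):
    """Textbook BiCGSTAB from a zero initial guess; returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]          # initial residual b - A*0
    r_hat = r[:]      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown: rho = 0")
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            # Early exit after the BiCG half-step.
            return [xi + alpha * pi for xi, pi in zip(x, p)], it
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        rho = rho_new
    raise RuntimeError("BiCGSTAB did not converge within max_iter")


# Small nonsymmetric example whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
b = [6.0, 15.0, 11.0]
x, iterations = bicgstab(A, b)
print(x, iterations, residual_norm(A, x, b))
```

BiCGSTAB needs no symmetry or definiteness of the matrix, which matches the earlier remark that the collocation matrix has neither.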

Page 28: Parallel iterative solution of the hermite collocation equations on gpus

The Dirichlet Helmholtz Problem

\[
u(x,y) = 10\,\phi(x)\,\phi(y), \quad (x,y) \in [0,1] \times [0,1]
\]

with

\[
\phi(x) = (x^2 - x)\, e^{-100 (x - 0.1)^2}
\]
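
A short sketch for evaluating this test solution, assuming it reads u(x,y) = 10 φ(x) φ(y) with φ(x) = (x² − x) e^{−100(x−0.1)²} on the unit square. The function names `phi` and `u_exact` are illustrative. Because x² − x vanishes at x = 0 and x = 1, the function is zero on the whole boundary, which fits a Dirichlet problem.

```python
import math


def phi(t):
    """phi(t) = (t^2 - t) * exp(-100 (t - 0.1)^2), as assumed above."""
    return (t * t - t) * math.exp(-100.0 * (t - 0.1) ** 2)


def u_exact(x, y):
    """Assumed test solution u(x, y) = 10 * phi(x) * phi(y) on [0,1] x [0,1]."""
    return 10.0 * phi(x) * phi(y)


# The solution vanishes on the boundary and is sharply peaked near (0.1, 0.1).
print(u_exact(0.0, 0.5), u_exact(1.0, 0.5), u_exact(0.1, 0.1))
```

The sharp peak near (0.1, 0.1) is what makes this a demanding test case on coarse grids.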

Page 29: Parallel iterative solution of the hermite collocation equations on gpus

HP SL390s – Tesla M2070 GPUs

HP SL390s

6 core [email protected]

24GB memory

Oracle Linux 6.3 x64

PGI 13.5 Fortran

PCI-e gen2 x16

+ 2 x Tesla M2070 GPUs

Page 30: Parallel iterative solution of the hermite collocation equations on gpus

Realization on HP SL390s Tesla GPU machine: Iterations / Error measurements

ns      BiCGSTAB iterations    || b - A x_n ||_2
256     294                    6.06e-11
512     589                    2.85e-11
1024    1161                   1.39e-11
2048    3726                   9.59e-12
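
A quick arithmetic check on the iteration counts reported on this slide shows how they grow each time ns doubles. The numbers are copied from the slide; the variable names are illustrative.

```python
# Values as reported on the slide.
ns = [256, 512, 1024, 2048]
iterations = [294, 589, 1161, 3726]

# Growth factor of the iteration count for each doubling of ns.
ratios = [later / earlier for earlier, later in zip(iterations, iterations[1:])]
for size, ratio in zip(ns[1:], ratios):
    print(f"ns = {size}: iterations grew by a factor of {ratio:.2f}")
```

The count roughly doubles for the first two doublings of ns and more than triples for the last one.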

Page 31: Parallel iterative solution of the hermite collocation equations on gpus

Realization on HP SL390s Tesla GPU machine Time measurements

Page 32: Parallel iterative solution of the hermite collocation equations on gpus

Realization on HP SL390s Tesla GPU machine Time measurements

Page 33: Parallel iterative solution of the hermite collocation equations on gpus

Realization on HP SL390s Tesla GPU machine Time measurements

Page 34: Parallel iterative solution of the hermite collocation equations on gpus

Realization on HP SL390s Tesla GPU machine Time measurements

Page 35: Parallel iterative solution of the hermite collocation equations on gpus

Realization on HP SL390s Tesla GPU machine Time measurements

Page 36: Parallel iterative solution of the hermite collocation equations on gpus

Realization on HP SL390s Tesla GPU machine Time measurements

Page 37: Parallel iterative solution of the hermite collocation equations on gpus

Conclusions

• A new parallel algorithm has been developed that implements the Schur complement method with the BiCGSTAB iterative solver for the Hermite Collocation equations.

• The algorithm is realized on shared-memory multi-core machines with GPUs.

• A performance acceleration of up to 30% is observed.

Page 38: Parallel iterative solution of the hermite collocation equations on gpus

Future work

• Design an efficient parallel Schur complement algorithm for the Hermite Collocation equations on multiprocessor/grid machines with GPUs.
