
SOME EXPERIMENTS on GRID COMPUTING in COMPUTATIONAL FLUID DYNAMICS

Thierry Coupez(**), Alain Dervieux(*), Hugues Digonnet(**), Hervé Guillard(*), Jacques Massoni(***), Vanessa Mariotti(*), Youssef Mesri(*), Patrick Nivet(*), Steve Wornom(*)

Large scale computations and CFD

Turbulent flows: required number of mesh points N = Re^(9/4) (a rough estimate per case is sketched below)
Laboratory experiment: Re = 82 000
Industrial devices: Re = 1 000 000
Geophysical flows: Re = 100 000 000
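As a quick sanity check of the N = Re^(9/4) scaling, the following short Python sketch evaluates it for the three Reynolds numbers quoted above (the geophysical value is read as 10^8):

```python
# Rough estimate of the number of mesh points needed for a fully resolved
# turbulent computation, using the classical scaling N ~ Re^(9/4).
reynolds = {
    "laboratory experiment": 82_000,
    "industrial device": 1_000_000,
    "geophysical flow": 100_000_000,
}

for name, re in reynolds.items():
    n_points = re ** 2.25          # Re^(9/4)
    print(f"{name:22s} Re = {re:>12,d}  ->  N ~ {n_points:.2e} mesh points")
```

Even the laboratory case already calls for on the order of 10^11 points, which is why single clusters quickly become insufficient.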

Future of large scale computations in CFD

What kind of architecture for these computations? Super-clusters, e.g. the Tera10 machine of CEA/DAM: 4532 Intel Itanium processors

Grid architecture?

Year            2000       2005        2010
Mesh size       1 M        10 M        100 M
Compute power   1 Tflops   10 Tflops   100 Tflops

End-users requirements

Transparent solution: the grid must be viewed as a single unified resource by the end-users

No major code modifications: codes using Fortran/MPI and C/C++/MPI must run on the grid

Secure

MecaGrid: project started 11/2002

Connect 3 sites in the PACA region

Perform experiments in grid computing applied to multimaterial fluid dynamics

Set-up of the Grid

The Marseille and CEMEF clusters use private IP addresses; only the front-ends are routable through the Internet.

Solution: create a VPN; the front-ends are connected by a tunnel through which packets are encrypted and transmitted.

Installation of the Globus middleware

Message passing : MPICH-G2

The MecaGrid: heterogeneous architecture of 162 procs

Sites: INRIA Sophia (clusters "pf" and "nina"), CEMEF Sophia, IUSTI Marseille.
Cluster specifications (N nodes, processor speed Sp, internal network Vpq):
N=32, bi-proc, Sp=2.4 GHz, Vpq=100 Mb/s
N=32, mono-proc, Sp=2.4 GHz, Vpq=100 Mb/s
N=19, bi-proc, Sp=2.4 GHz, Vpq=1 Gb/s
N=16, bi-proc, Sp=933 MHz, Vpq=100 Mb/s
Inter-site links (nominal): 10 Mb/s, 100 Mb/s, 10 Mb/s.

The MecaGrid: measured performances

Same sites and clusters (INRIA Sophia "pf" and "nina", CEMEF Sophia, IUSTI Marseille):
N=32, bi-proc, Sp=2.4 GHz, Vpq=100 Mb/s
N=32, mono-proc, Sp=2.4 GHz, Vpq=100 Mb/s
N=16, bi-proc, Sp=2.4 GHz, Vpq=1 Gb/s
N=16, bi-proc, Sp=933 MHz, Vpq=100 Mb/s

Measured inter-site bandwidths (nominal 100 Mb/s): 3.7 Mb/s, 7.2 Mb/s and 5 Mb/s.
Stability of the external network.

CFD and parallelism: the SPMD model

Mesh partitioning: the initial mesh is split into sub-domains (sub-domain 1, sub-domain 2, sub-domain 3); each sub-domain gets its own solver instance and its own data, and the global solution is assembled by message passing between the sub-domain solvers (a minimal sketch follows).
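To make the SPMD pattern concrete, here is a minimal, hypothetical mpi4py sketch: each MPI rank owns one sub-domain of a 1D mesh, a few Jacobi sweeps stand in for the flow solver, and interface (ghost) values are exchanged by message passing. This is not the AERO-3D or CIMlib code, only the partition/solve/exchange structure.

```python
# Minimal SPMD sketch (hypothetical): one sub-domain per MPI rank,
# Jacobi relaxation as a stand-in solver, halo exchange between neighbours.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 100                       # interior points owned by this rank
u = np.zeros(n_local + 2)           # +2 ghost cells for the halos
u[1:-1] = rank                      # arbitrary initial data

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(50):              # "solver" loop
    # exchange halo values with neighbouring sub-domains
    comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[:1], source=left)
    # one Jacobi relaxation sweep on the local sub-domain
    u[1:-1] = 0.5 * (u[:-2] + u[2:])

print(f"rank {rank}: mean value {u[1:-1].mean():.3f}")
```

On a grid, the same pattern holds; the difference is only that some of the Sendrecv calls cross the slow inter-site links.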

CODE PORTING

AERO-3D: finite volume code using Fortran77/MPI; 3D compressible Navier-Stokes equations with turbulence modeling (50 000 instructions); rewrite of the code in Fortran 90.

AEDIPH: finite volume code designed for multimaterial studies.

CIMlib library of CEMEF: a C++/MPI finite element library solving multimaterial incompressible flows.

Test case 1: Jet in cross flow

3D LES turbulence modeling, compressible flow, explicit solver

Results for 32 partitions, 100 time steps:

             Sophia clusters   Sophia1-Marseille   Sophia2-Marseille
241K mesh    729 s             817 s               1181 s
Com/work     9%                69%                 46%
400K mesh    827 s             729 s               965 s
Com/work     1%                13%                 6%

Test case 2: 3D dam break problem

3D incompressible Navier-Stokes computation; level-set representation of the interface with Hamilton-Jacobi reinitialization (sketched below); iterative implicit scheme using GMRES (MINRES) preconditioned with ILU; 600 time steps.
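The Hamilton-Jacobi reinitialization mentioned above restores the signed-distance property of the level-set function by driving phi_t + sign(phi_0)(|grad phi| - 1) = 0 to steady state. A toy 1D Python sketch of this step (first-order Godunov upwinding; purely illustrative, unrelated to the authors' 3D implementation):

```python
import numpy as np

def reinitialize_1d(phi0, dx, n_steps=200, dt=None):
    """Toy 1D level-set reinitialization: iterate
    phi_t + sign(phi0) * (|phi_x| - 1) = 0 to steady state."""
    if dt is None:
        dt = 0.5 * dx
    phi = phi0.copy()
    s = phi0 / np.sqrt(phi0**2 + dx**2)            # smoothed sign of phi0
    for _ in range(n_steps):
        dminus = np.diff(phi, prepend=phi[0]) / dx  # backward differences
        dplus = np.diff(phi, append=phi[-1]) / dx   # forward differences
        # Godunov upwind choice of the gradient magnitude
        grad = np.where(
            s > 0,
            np.sqrt(np.maximum(np.maximum(dminus, 0)**2, np.minimum(dplus, 0)**2)),
            np.sqrt(np.maximum(np.minimum(dminus, 0)**2, np.maximum(dplus, 0)**2)),
        )
        phi = phi - dt * s * (grad - 1.0)
    return phi

# Example: a distorted level set whose zero crossing sits at x = 0.5
x = np.linspace(0.0, 1.0, 201)
phi0 = (x - 0.5) ** 3                 # same zero level, wrong gradient
phi = reinitialize_1d(phi0, dx=x[1] - x[0])
print(abs(np.gradient(phi, x)).mean())   # close to 1 after reinitialization
```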

3D DAM BREAK RESULTS

500 K mesh, 2.5 M elements, 600 time steps. Implicit code: 600 linear systems of size 2M x 2M solved. Results on 3 x 4 procs distributed over 3 different clusters: 60 h; with optimisation of the code for the grid: 37 h.

1.5 M mesh, 8.7 M elements, 600 time steps. Implicit code: 600 linear systems of size 6M x 6M solved. Results on 3 x 11 procs distributed over 3 different clusters: 125 h.

PROVISIONAL CONCLUSIONS:

MecaGrid gives access to a large number of processors and makes it possible to run larger applications than on an in-house cluster

For sufficiently large applications, the grid competes with an in-house cluster

No significant communication overhead for sufficiently large applications

HOWEVER

Fine tuning of the application codes is needed to obtain good efficiency

Algorithmic developments

Heterogeneous Mesh partitioning

The mapping problem: find the mesh partition m that minimises the CPU time.

Homogeneous (cluster architecture): load balancing.
Computation time of processor p: t_p(m) = Σ_{v : m(v)=p} c(p, m(v)), with c(p, m(v)) = w(v)/s_p,
where w(v) is the work carried by mesh vertex v and s_p the speed of processor p.

Heterogeneous (Grid): communication over the links must be counted as well.
Communication time of processor p: c_p(m) = Σ_{(u,v) ∈ E : m(u)=p, m(v)=q ≠ p} c(p, q, m), with c(p, q, m) = w(u,v)/v_pq,
where w(u,v) is the data exchanged along edge (u,v) and v_pq the bandwidth between processors p and q.
Objective: minimise F(A, W, m) = max_p ( t_p(m) + c_p(m) ) (an illustrative evaluation is sketched below).
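A small sketch of how such a cost function can be evaluated for a candidate partition (all names and numbers are made up for the example, this is not the MecaGrid partitioner): vertices carry work weights w(v), edges carry communication weights w(u,v), processors have speeds s_p and pairwise bandwidths v_pq, and the objective is the maximum over processors of computation plus communication time.

```python
# Illustrative evaluation of the heterogeneous partition cost
# F(m) = max_p [ t_p(m) + c_p(m) ] for a candidate mapping m.
from collections import defaultdict

work = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 2.0}                # w(v)
edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0}    # w(u,v)
speed = {0: 2.4, 1: 0.933}                                     # s_p
bandwidth = {(0, 1): 5.0, (1, 0): 5.0}                         # v_pq
mapping = {"a": 0, "b": 0, "c": 1, "d": 1}                     # m(v): vertex -> proc

def partition_cost(mapping):
    t = defaultdict(float)   # computation time per processor: sum of w(v)/s_p
    c = defaultdict(float)   # communication time per processor: sum of w(u,v)/v_pq
    for v, p in mapping.items():
        t[p] += work[v] / speed[p]
    for (u, v), w_uv in edges.items():
        p, q = mapping[u], mapping[v]
        if p != q:           # only cut edges cost communication
            c[p] += w_uv / bandwidth[(p, q)]
            c[q] += w_uv / bandwidth[(q, p)]
    return max(t[p] + c[p] for p in speed)

print(f"F(m) = {partition_cost(mapping):.3f}")
```

A heterogeneity-aware partitioner searches over mappings m to reduce this value, instead of simply equalising the number of vertices per processor.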

Algorithmic Developments

Iterative linear solvers: solve AX = b, with A sparse.
Preconditioned iteration: X ← X + P (b - AX), where P is the preconditioning matrix.
LU factorization of A: A = LU; in practice P comes from incomplete factorizations ILU(0), ILU(1), …, ILU(k) (a sketch of the iteration follows the table).

                    ILU(0)   ILU(1)   ILU(2)   ILU(3)
Normalized # iter    100      60       46       38
CPU cluster          100      97       126      205
CPU MecaGrid         100      60       65       87
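Below is a hedged sketch of the preconditioned iteration X ← X + P(b - AX) with an incomplete LU preconditioner, using SciPy's spilu (a threshold/fill-factor ILU, used here only as a stand-in for the level-of-fill ILU(k) in the table). It illustrates the trade-off the table quantifies: a richer factorization costs more to build and apply, but cuts the iteration count, which pays off when each iteration triggers communication over slow grid links.

```python
# Preconditioned Richardson iteration  x <- x + P(b - A x)
# with an incomplete LU factorization as P.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
# 1D Laplacian test matrix (sparse)
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)   # richer fill -> stronger P
x = np.zeros(n)
for it in range(1000):
    r = b - A @ x                 # residual
    x = x + ilu.solve(r)          # apply the preconditioner: x <- x + P r
    if np.linalg.norm(r) < 1e-8 * np.linalg.norm(b):
        break
print(f"stopped after {it + 1} iterations, residual {np.linalg.norm(b - A @ x):.2e}")
```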

Hierarchical mesh partitioner

The initial mesh is first partitioned among the clusters C_1, ..., C_i, ..., C_NC; a second-level partitioner then splits each cluster's sub-mesh among its own processors (a two-level sketch follows).
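The two-level idea can be sketched as follows (illustrative numbers, not CEMEF's partitioner): level 1 distributes mesh elements among clusters in proportion to their aggregate compute power, so only this level has to account for the slow inter-cluster links; level 2 splits each cluster's share equally among its identical processors.

```python
# Two-level (hierarchical) partitioning sketch: cluster level first,
# then processor level inside each cluster. All data is illustrative.
n_elements = 400_000

clusters = {                      # procs and per-proc speed per cluster
    "cluster_A": {"procs": 64, "speed": 2.4},
    "cluster_B": {"procs": 32, "speed": 2.4},
    "cluster_C": {"procs": 32, "speed": 0.933},
}

# Level 1: share elements between clusters proportionally to aggregate speed,
# so each cluster finishes its local work in roughly the same time.
power = {c: v["procs"] * v["speed"] for c, v in clusters.items()}
total = sum(power.values())
share = {c: round(n_elements * p / total) for c, p in power.items()}

# Level 2: inside a cluster the processors are identical, so a plain
# equal-size split (classical load balancing) is enough.
for c, n_c in share.items():
    per_proc = n_c // clusters[c]["procs"]
    print(f"{c}: {n_c} elements, about {per_proc} per processor")
```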

Heterogeneous Mesh partitioning : Test case on 32 proc, mesh size 400 K

CPU time (s), 400 K mesh on 32 procs:

                  Homogeneous partition   Heterogeneous partition
Sophia-MRS        579.505                 143.66
Sophia1-Sophia2   349.42                  180.8

Gain of more than 75%!

Conclusions

The grid appears to be a viable alternative to specialized super-clusters for large-scale CFD computations

From the point of view of numerical analysis, grid architectures raise new questions:

Mesh and graph partitioning
Linear solvers
Communication and latency-hiding schemes
…