TRANSCRIPT
SOME EXPERIMENTS ON GRID COMPUTING IN COMPUTATIONAL FLUID DYNAMICS
Thierry Coupez (**), Alain Dervieux (*), Hugues Digonnet (**), Hervé Guillard (*), Jacques Massoni (***), Vanessa Mariotti (*), Youssef Mesri (*), Patrick Nivet (*), Steve Wornom (*)
Large scale computations and CFD
Turbulent flows, required number of mesh points: N = Re^(9/4)
Laboratory experiment: Re = 82 000
Industrial devices: Re = 1 000 000
Geophysical flows: Re = 100 000 000
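As a rough order-of-magnitude illustration of this scaling (arithmetic added here, not taken from the slides), a fully resolved simulation would need

N \approx Re^{9/4}: \quad Re = 8.2\times10^{4} \Rightarrow N \sim 10^{11}, \quad Re = 10^{6} \Rightarrow N \sim 3\times10^{13}, \quad Re = 10^{8} \Rightarrow N \sim 10^{18},

far beyond the 1 M to 100 M point meshes discussed below, hence the reliance on turbulence modeling.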
Future of large scale computations in CFD
What kind of architecture for these computations? Super-clusters, e.g. the Tera10 machine of CEA/DAM: 4532 Intel Itanium processors
Grid architectures?
Year         2000       2005        2010
Mesh size    1 M        10 M        100 M
Compute      1 Tflops   10 Tflops   100 Tflops
End-user requirements
Transparent solution: the grid must be viewed as a single unified resource by the end-users
No important code modifications: codes using Fortran/MPI and C/C++/MPI must run on the grid
Secure
MecaGrid: project started 11/2002
Connect 3 sites in the PACA region
Perform experiments in grid computing applied to multimaterial fluid dynamics
Set-up of the Grid
The Marseille and CEMEF clusters use private IP addresses; only the front-ends are routable through the Internet
Solution: create a VPN; the front-ends are connected by a tunnel through which packets are encrypted and transmitted
Installation of the Globus middleware
Message passing : MPICH-G2
The MecaGrid : heterogeneous architecture of 162 procs
Site / clusters            Nodes                                  Processor   Internal network
INRIA Sophia (pf, nina)    32 bi-processor + 32 mono-processor    2.4 GHz     100 Mb/s
CEMEF Sophia               19 bi-processor                        2.4 GHz     1 Gb/s
IUSTI Marseille            16 bi-processor                        933 MHz     100 Mb/s
Inter-site links: 10 Mb/s to 100 Mb/s
The MecaGrid: measured performances
[Diagram: the same four clusters (pf and nina at INRIA Sophia, CEMEF Sophia, IUSTI Marseille) with measured network performance; the external links, nominally 100 Mb/s, deliver a measured 3.7 Mb/s.]
Stability of the external network: the measured inter-site bandwidth fluctuates over time (observed values between 5 Mb/s and 7.2 Mb/s)
CFD and parallelism: the SPMD model
Mesh partitioning: the initial mesh is split into sub-domains (sub-domain 1, 2, 3, ...); each sub-domain is assigned to one processor, which runs its own copy of the solver on its own data; processors exchange interface values by message passing and the partial results are assembled into the global solution.
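As an illustration of this SPMD pattern, here is a minimal, hypothetical C++/MPI skeleton (it is not taken from AERO-3D or CIMlib): each rank owns one sub-domain, advances its local data, and exchanges interface values with its neighbours at every time step.

```cpp
// Minimal SPMD sketch: one MPI rank per mesh sub-domain, with a halo exchange
// between neighbouring ranks at every time step (illustrative only).
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> local(1000, 0.0);   // unknowns of this sub-domain
    std::vector<double> halo(10, 0.0);      // interface values received from the neighbour

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 100; ++step) {
        // message passing: send interface data to the right, receive from the left
        MPI_Sendrecv(local.data(), 10, MPI_DOUBLE, right, 0,
                     halo.data(),  10, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // the local solver update would go here, using 'local' and 'halo'
    }

    MPI_Finalize();
    return 0;
}
```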
CODE PORTING
AERO-3D: finite volume code using Fortran77/MPI, solving the 3D compressible Navier-Stokes equations with turbulence modeling (50 000 instructions); the code was rewritten in Fortran 90
AEDIPH: finite volume code designed for multimaterial studies
CIMlib library of CEMEF: a C++/MPI finite element library solving multimaterial incompressible flows
Test case 1: jet in cross flow
3D LES turbulence modeling, compressible flow, explicit solver
Results for 32 partitions, 100 time steps
                Sophia clusters   Sophia1-Marseille   Sophia2-Marseille
241K mesh       729 s             817 s               1181 s
  Com/work      9%                69%                 46%
400K mesh       827 s             729 s               965 s
  Com/work      1%                13%                 6%
Test case 2: 3D dam break problem
3D incompressible Navier-Stokes computation; level-set representation of the interface with Hamilton-Jacobi reinitialization; iterative implicit scheme using GMRES (MINRES) preconditioned with ILU; 600 time steps
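For reference, the standard Hamilton-Jacobi reinitialization of a level-set function (assuming the usual formulation; the exact scheme used in the code is not detailed here) solves, in pseudo-time \tau,

\frac{\partial \phi}{\partial \tau} + \operatorname{sign}(\phi_0)\,\left(\lvert\nabla\phi\rvert - 1\right) = 0, \qquad \phi(\cdot, 0) = \phi_0,

which restores the signed-distance property \lvert\nabla\phi\rvert = 1 near the interface without moving its zero level set.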
3D DAM BREAK RESULTS
500 K mesh points, 2.5 M elements, 600 time steps. Implicit code: 600 linear systems of size 2M x 2M solved. Result on 3 x 4 processors over 3 different clusters: 60 h; with optimisation of the code for the grid: 37 h
1.5 M mesh points, 8.7 M elements, 600 time steps. Implicit code: 600 linear systems of size 6M x 6M solved. Result on 3 x 11 processors over 3 different clusters: 125 h
PROVISIONAL CONCLUSIONS :
MecaGrid gives access to a large number of processors and the possibility to run larger applications than on an in-house cluster
For sufficiently large applications, the grid competes with an in-house cluster
No significant communication overhead for sufficiently large applications
HOWEVER
Fine tuning of the application codes is needed to obtain good efficiency
Algorithmic developments are required
Heterogeneous Mesh partitioning
The mapping problem: find the mesh partition that minimises the CPU time
Homogeneous architecture (cluster): load balancing
Heterogeneous architecture (Grid): minimise the makespan

F(W, A, m) = \max_{p \in A} \left( t_p(m) + c_p(m) \right)

with the computation time of processor p

t_p(m) = \sum_{v \in V,\; m(v)=p} c(p, v), \qquad c(p, v) = \frac{w(v)}{s_p},

and its communication time

c_p(m) = \sum_{q \in A} c(p, q, m), \qquad c(p, q, m) = \sum_{(u,v) \in E,\; m(u)=p,\; m(v)=q} \frac{w(u, v)}{v_{pq}},

where W = (V, E) is the weighted mesh graph, A the set of processors, m the mapping of mesh vertices to processors, w(v) the computational weight of vertex v, w(u, v) the communication weight of edge (u, v), s_p the speed of processor p and v_{pq} the bandwidth between processors p and q.
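A minimal sketch, assuming a simple edge-list graph representation and charging each cut edge to both of its end processors (this is not the authors' partitioner), of how the makespan F(W, A, m) of a given mapping could be evaluated:

```cpp
// Illustrative sketch: evaluate F(W, A, m) = max_p ( t_p(m) + c_p(m) )
// for a given vertex-to-processor mapping m.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Edge { int u, v; double w; };  // edge (u, v) with communication weight w(u, v)

double mapping_cost(const std::vector<double>& vweight,                // w(v), one entry per vertex
                    const std::vector<Edge>& edges,                    // edges of the mesh graph
                    const std::vector<int>& map,                       // m(v): vertex -> processor
                    const std::vector<double>& speed,                  // s_p, one entry per processor
                    const std::vector<std::vector<double>>& bandwidth) // v_pq between processors p and q
{
    const std::size_t P = speed.size();
    std::vector<double> comp(P, 0.0), comm(P, 0.0);

    // t_p(m): computation time accumulated on each processor
    for (std::size_t v = 0; v < vweight.size(); ++v)
        comp[map[v]] += vweight[v] / speed[map[v]];

    // c_p(m): communication time of the cut edges
    for (const Edge& e : edges) {
        const int p = map[e.u], q = map[e.v];
        if (p != q) {
            comm[p] += e.w / bandwidth[p][q];
            comm[q] += e.w / bandwidth[q][p];
        }
    }

    // F(W, A, m): the makespan, i.e. the time of the slowest processor
    double F = 0.0;
    for (std::size_t p = 0; p < P; ++p)
        F = std::max(F, comp[p] + comm[p]);
    return F;
}
```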
Algorithmic Developments
Iterative linear solvers: solve Ax = b with A sparse by the preconditioned iteration x <- x + P(b - Ax), where P is the preconditioning matrix obtained from an incomplete LU factorization of A (A ≈ LU): P = ILU(0), ILU(1), ..., ILU(k)
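A minimal sketch of this preconditioned iteration in C++, with a diagonal (Jacobi) preconditioner standing in for the ILU(k) factorizations actually used on the grid (illustrative only):

```cpp
// Preconditioned fixed-point iteration x <- x + P(b - Ax) for a sparse matrix A
// stored in CSR format. P = diag(A)^{-1} (Jacobi) is a simple stand-in for ILU(k).
#include <cmath>
#include <cstddef>
#include <vector>

struct CSRMatrix {
    std::vector<int>    row_ptr, col;   // CSR structure
    std::vector<double> val;            // nonzero values
    std::vector<double> diag;           // diagonal entries of A
};

void solve(const CSRMatrix& A, const std::vector<double>& b,
           std::vector<double>& x, int max_iter = 1000, double tol = 1e-8)
{
    const std::size_t n = b.size();
    std::vector<double> r(n);
    for (int it = 0; it < max_iter; ++it) {
        // residual r = b - A x and its norm
        double norm2 = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double Ax_i = 0.0;
            for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
                Ax_i += A.val[k] * x[A.col[k]];
            r[i] = b[i] - Ax_i;
            norm2 += r[i] * r[i];
        }
        if (std::sqrt(norm2) < tol) break;
        // x <- x + P r with P = diag(A)^{-1}
        for (std::size_t i = 0; i < n; ++i)
            x[i] += r[i] / A.diag[i];
    }
}
```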
                     ILU(0)   ILU(1)   ILU(2)   ILU(3)
Normalized # iter    100      60       46       38
CPU cluster          100      97       126      205
CPU MecaGrid         100      60       65       87
Heterogeneous Mesh partitioning : Test case on 32 proc, mesh size 400 K
[Bar chart, y-axis: time (s), 0 to 600. CPU times for the homogeneous and heterogeneous (grid-optimised) partitions on the Sophia-MRS and Sophia1-Sophia2 cluster pairs: 579.5 s, 349.4 s, 180.8 s and 143.7 s (series labels N-I(hom), N-P(hom), N-P(opt), N-I(opt)); consistent with the stated gain, the heterogeneous partition brings the Sophia-MRS time from about 580 s down to about 144 s.]
Gain of more than 75% !
Conclusions
The grid appears to be a viable alternative to specialized super-clusters for large scale CFD computations
From the point of view of numerical analysis, grid architectures are a source of new questions:
Mesh and graph partitioning, linear solvers, communication and latency-hiding schemes, ...