Optimized Parallel Approach for 3D Modelling of Forest Fire Behaviour. G. Accary, O. Bessonov, D. Fougère, S. Meradji, D. Morvan. Institute for Problems in Mechanics, Moscow, Russia; Université de la Méditerranée, Marseille, France; Université Saint-Esprit de Kaslik, Jounieh, Lebanon. Parallel Computing Technologies -- PaCT-2007


Page 1: Optimized Parallel Approach for 3D Modelling of Forest Fire Behaviour G. Accary, O. Bessonov, D. Fougère, S. Meradji, D. Morvan Institute for Problems

Optimized Parallel Approach for 3D Modelling of Forest Fire Behaviour

G. Accary, O. Bessonov, D. Fougère, S. Meradji, D. Morvan

Institute for Problems in Mechanics, Moscow, Russia

Université de la Méditerranée, Marseille, France

Université Saint-Esprit de Kaslik, Jounieh, Lebanon

Parallel Computing Technologies -- PaCT-2007

PaCT-2007: Optimized Parallel Approach for 3D Modelling

Introduction

In this work we present methods for parallelizing the 3D CFD forest fire modelling code FIRESTAR 3D on NuMA computers within the OpenMP environment.

- Numerical model and method
- Why parallelize?
- Computer system selected for this development
- Parallelization models
- Specifics of OpenMP on NuMA computers
- How to parallelize for OpenMP on NuMA?
- Example of OpenMP parallelization, geometric parallelism
- Current approach to parallelize FIRESTAR 3D
- Parallelization results for the benchmark problems
- Parallelization of radiative transfer (input data parallelism)
- Conclusion

Numerical model and method

- Full-physical 3D model of forest fire behaviour
- Complex unsteady flow in a 3D rectangular domain
- Solid phases (vegetation) and gas mixture
- Decomposition mechanisms: drying, pyrolysis, combustion
- Transfer: convection, diffusion, radiation, turbulence
- Navier-Stokes equations in Boussinesq approximation

- Finite Volume discretization, non-uniform staggered grid
- Fully implicit segregated SIMPLER-style solution method
- Linear solvers: BiCGStab (nonsymmetric), CG (symmetric)
- Explicit-class preconditioners for linear solvers

Why parallelize?

3D vs. 2D:
-- much bigger grid (Nx*Ny*Nz grid points vs. Nx*Ny);
-- more complicated discretizations;
-- additional grid compression in problematic areas.

As a result, total computational complexity increases by (at least) 2 orders of magnitude.
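A rough worked example of the grid-size item above, using the two grid sizes reported later in these slides (60x60x60 and 96x96x81). The function name is illustrative and only grid points are counted: extending an Nx*Ny slice to Nz planes multiplies the grid by Nz, and the more complicated discretization and extra grid compression come on top of that factor.

```python
def grid_growth(nx, ny, nz):
    """Return (2D points, 3D points, growth factor) for an nx*ny slice extended to nz planes."""
    points_2d = nx * ny
    points_3d = nx * ny * nz
    return points_2d, points_3d, points_3d // points_2d


# Grid sizes taken from the benchmark slides of this talk.
for nx, ny, nz in [(60, 60, 60), (96, 96, 81)]:
    p2, p3, factor = grid_growth(nx, ny, nz)
    print(f"{nx}x{ny}x{nz}: {p2} -> {p3} grid points, factor {factor}")
```

The point count alone grows by a factor of 60 to 81 for these grids, so a modest extra cost per grid point is enough to reach two orders of magnitude.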

Goal: to accelerate the code by at least a factor of about 10 and (along with other optimizations) to reach the speed of 2D simulations.

Computer system selected for this development

SGI Altix 350 shared-memory system
20 Itanium 2 processors, 1.5 GHz, 4 MB L3 cache

NuMA organization of the system (Non-uniform Memory Architecture):
10 bi-processor modules (SMP nodes), each with its own local memory, interconnected by a very fast interface

Current configuration:
8 nodes (16 CPUs) connected to the NuMA switch - "batch domain" for intensive computations.
2 nodes (4 CPUs) - "interactive" domain for development and debugging.

Parallelization models

2 principal models of parallelization:

message passing (MPI):
- more universal;
- can be applied to distributed memory systems (clusters) as well as to shared memory computers;
- complicated to program, requires total reorganization of the code and (often) revision of algorithms.

shared memory (OpenMP):
- looks like an extension of the Fortran and C programming languages;
- comment-like directives (ignored if compiled without the "-openmp" switch);
- simple to program, makes it easy to parallelize many algorithms.

!$OMP DO
do K=1,Nz
  do J=1,Ny
    do I=1,Nx
      {processing}
    enddo
  enddo
enddo
!$OMP END DO

Specifics of OpenMP on NuMA computers

Access to the memory within a node (local memory) is fast; access to the memory within another node (remote memory) is much slower ==>

Distribution of main data arrays in local memories must correspond to the distribution of computational work between processors !!!

This is not supported explicitly by OpenMP ==> Special initialization is required (e.g. assignment in a parallel loop).

Binding (affinity) of processes to CPUs is needed in order to avoid migration between processors (e.g. with the "dplace" utility).
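A toy model of why the initialization matters, written as a sketch under stated assumptions: memory is placed on the node of the CPU that first writes to it (a "first touch" policy, which is an assumption here and is not named in the slides), each node holds 2 CPUs as on the Altix 350, and K-planes are dealt out to threads in even contiguous blocks. All function names are illustrative.

```python
def plane_owner(k, nz, n_threads):
    """Thread that processes plane k (0-based) under an even contiguous block split."""
    chunk = -(-nz // n_threads)  # ceiling division
    return k // chunk


def local_fraction(nz, n_threads, cpus_per_node=2, parallel_init=True):
    """Fraction of planes stored on the same node as the thread that computes on them."""
    local = 0
    for k in range(nz):
        compute_node = plane_owner(k, nz, n_threads) // cpus_per_node
        # Assumed first-touch placement: data lands on the node of the initializing thread.
        init_thread = plane_owner(k, nz, n_threads) if parallel_init else 0
        init_node = init_thread // cpus_per_node
        local += (init_node == compute_node)
    return local / nz


print("parallel init:", local_fraction(60, 4, parallel_init=True))
print("serial init:  ", local_fraction(60, 4, parallel_init=False))
```

In this model, initializing in a parallel loop with the same split as the computation keeps every plane local, while a serial initialization leaves only the planes used by the first node local.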

How to parallelize for OpenMP on NuMA?

Usually, geometric parallelism is applied: data elements are split along some dimension.

FIRESTAR 3D - most computations are in the CG solvers and in the calculation of turbulent quantities => easily and naturally parallelizable in OpenMP.

Algorithms with recursive dependences (3-diagonal solvers, line Jacobi/GS preconditioners) - more difficult, not naturally parallelizable (in development).

Restrictions of OpenMP/NuMA: parallelization in only one spatial direction (~16 CPUs is a limit)

Input data parallelism (or event parallelism) - for radiative transport equation (split by angles).
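To illustrate the recursive dependences mentioned above, here is a minimal sketch of the standard Thomas algorithm for a tridiagonal system. It is a textbook routine, not code from FIRESTAR 3D, and the names are illustrative. Each step of both sweeps uses the result of the previous step, so the loop along the line cannot simply be cut into independent pieces the way an element-wise loop can.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused),
    d: right-hand side. Returns the solution as a list.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: step i needs cp[i-1] and dp[i-1] (recursive dependence).
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution: step i needs x[i+1] (recursive dependence again).
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Small check: the matrix with 2 on the diagonal and -1 beside it, exact solution [1, 2, 3].
print(thomas([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0]))
```

Separate grid lines are still independent of each other, which is one reason such solvers are harder but not impossible to parallelize.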

Example of OpenMP parallelization (geometric parallelism)

!$OMP DO
do K=1,Nz
  do J=1,Ny
    do I=1,Nx
      Wo3(I,J,K)=Wo2(I,J,K)+ &
                 beta*Wo3(I,J,K)
    enddo
  enddo
enddo
!$OMP END DO

Every processor computes its own part of the outermost DO-loop (do K=1,Nz). Iterations of this loop are split evenly between all CPUs. Portions of 3D data arrays must be distributed between local memories accordingly.
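The even split of the outermost loop can be pictured with a short sketch. It assumes contiguous blocks whose sizes differ by at most one plane; the exact split used by a given OpenMP runtime may differ, and the function name is illustrative.

```python
def split_planes(nz, n_threads):
    """Split K = 1..nz into contiguous blocks, one per thread, sizes differing by at most 1."""
    base, extra = divmod(nz, n_threads)
    blocks = []
    start = 1
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        blocks.append((start, start + size - 1))  # inclusive K range for thread t
        start += size
    return blocks


for t, (k_first, k_last) in enumerate(split_planes(60, 4)):
    print(f"thread {t}: K = {k_first}..{k_last}")
```

Each thread's K range is also the slab of the 3D arrays that should reside in the local memory of that thread's node.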

Current approach to parallelize FIRESTAR 3D

Selection and OpenMP-parallelization of the main time-consuming routines:

1) iterative CG solvers and calculation of turbulent quantities: ~80% of CPU time (in serial execution).

2) routines for transport equations (velocity, temperature) and pressure correction: ~20% (in serial execution).

3) initialization - just an assignment in a parallel DO loop that corresponds to the computational parallel DO loops.

4) some serial optimizations and transformations of the code (in order to avoid dependencies and side-effects between threads).

!$OMP DO
do K=0,Nz+1
  do J=0,Ny+1
    do I=0,Nx+1
      Wo2(I,J,K)=0.
      Wo3(I,J,K)=0.
    enddo
  enddo
enddo
!$OMP END DO
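A short estimate of why both groups of routines have to be parallelized. Amdahl's law is not mentioned in the slides; it is brought in here as a standard estimate, assuming ideal scaling of the parallel parts. The ~80% and ~20% shares and the 16 CPUs come from the slides; the function names are illustrative.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Ideal speed-up when only parallel_fraction of the serial run time is parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)


def max_serial_fraction(target_speedup, n_cpus):
    """Largest serial fraction that still allows target_speedup on n_cpus."""
    return (n_cpus / target_speedup - 1.0) / (n_cpus - 1.0)


print("only the ~80% group parallelized:", amdahl_speedup(0.8, 16))
print("both groups parallelized:        ", amdahl_speedup(1.0, 16))
print("serial share allowed for 10x:    ", max_serial_fraction(10, 16))
```

Under these assumptions, parallelizing only the solvers and turbulence routines would cap the speed-up near 4 on 16 CPUs, and a tenfold acceleration leaves room for only about 4% of serial work.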

Parallelization results for the benchmark problem 60x60x60

Speed-up is good! (problem size 170 MB)

2 processors: limited by the throughput of the local memory (which is shared by the 2 CPUs of a node)
4, 8 processors: superlinear speed-up (thanks to the large 4 Mbyte L3 cache in every CPU)
16 processors: negative effects (the grid size 60 is not divisible by 16, i.e. load imbalance; the problem is too small, i.e. the boundaries have a large influence)
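The load imbalance for 16 processors can be quantified with a simple bound. This sketch assumes that whole K-planes are dealt out (60 planes, ignoring boundary planes) and that the busiest CPU sets the pace; it ignores cache and memory effects, which the slide credits for the superlinear results on 4 and 8 processors. The function name is illustrative.

```python
import math


def balance_limited_speedup(nz, n_cpus):
    """Speed-up bound when nz planes are split between n_cpus and the busiest CPU sets the pace."""
    busiest = math.ceil(nz / n_cpus)
    return nz / busiest


for p in (2, 4, 8, 16):
    print(f"{p:2d} CPUs: bound {balance_limited_speedup(60, p)}")
```

With 60 planes on 16 CPUs, some CPUs process 4 planes while others process fewer, so load balance alone limits the speed-up to 15 rather than 16.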

Parallelization results: "airflow canopy" problem 96x96x81

Speed-up is reasonable (problem size 1 GB)

2 processors: limited by the throughput of the local memory (which is shared by the 2 CPUs of a node)
4, 8 processors: no superlinear speed-up (bigger problem!)

16 processors: negative effects (load imbalance etc.) are partly compensated by the positive effects of the large L3 cache
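A crude estimate that is consistent with the cache explanation given for the two benchmarks. It compares the combined L3 capacity of the CPUs in use with the stated problem sizes (170 MB and 1 GB, the latter taken as 1024 MB). This is only an upper bound on how much data could be cached, not a model of actual hit rates, and the names are illustrative.

```python
L3_MB_PER_CPU = 4  # from the slides: 4 Mbyte L3 cache in every CPU


def cached_fraction(problem_mb, n_cpus, l3_mb=L3_MB_PER_CPU):
    """Share of the problem data that the combined L3 caches could hold (capped at 1)."""
    return min(1.0, n_cpus * l3_mb / problem_mb)


for name, size_mb in [("60x60x60", 170), ("96x96x81", 1024)]:
    for p in (4, 8, 16):
        print(f"{name}, {p:2d} CPUs: {cached_fraction(size_mb, p):.3f}")
```

On 16 CPUs the combined caches could hold more than a third of the small problem but only about 6% of the large one, which fits the observation that the cache helps much less for the bigger grid.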

Parallelization of radiative transfer (input data parallelism) (this work was done in collaboration with INRA-URFM-PIF team)

The full sphere is split into parts (sectors) corresponding to the number of processors;

Equations are integrated independently in each sector (for the full domain), i.e. each processor computes its own set of input data;

Afterwards, data from each sector are distributed to subdomains for further processing with geometric parallelism.
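The splitting by angles described above can be sketched as follows. This is a toy model with hypothetical names: the number of discrete directions, the per-direction contribution `weight`, and the combination of sector results by summation are all assumptions made for illustration, not details taken from FIRESTAR 3D.

```python
def split_directions(n_directions, n_cpus):
    """Assign direction indices 0..n_directions-1 to CPUs in contiguous sectors."""
    base, extra = divmod(n_directions, n_cpus)
    sectors, start = [], 0
    for cpu in range(n_cpus):
        size = base + (1 if cpu < extra else 0)
        sectors.append(list(range(start, start + size)))
        start += size
    return sectors


def integrate_sector(directions, n_cells, weight):
    """Stand-in for solving the transport equation: per-cell sum over one sector's directions."""
    return [sum(weight(d, cell) for d in directions) for cell in range(n_cells)]


def total_radiation(n_directions, n_cpus, n_cells, weight):
    """Each sector is processed independently over the full domain, then results are combined."""
    partial = [integrate_sector(sector, n_cells, weight)
               for sector in split_directions(n_directions, n_cpus)]
    return [sum(p[cell] for p in partial) for cell in range(n_cells)]


def toy_weight(direction, cell):
    """Hypothetical per-direction contribution, chosen only to make the sums checkable."""
    return direction + 1 + cell


print(total_radiation(24, 4, 3, toy_weight))
```

Because each sector covers the full domain, the sectors need no communication until their results are combined, which is what makes this form of parallelism independent of the geometric split.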

Conclusion

In this work we developed:
- a strategy of OpenMP parallelization for NuMA computers
- a parallelization method for a 3D CFD fire modelling code

This new method achieves good parallelization efficiency for a moderate number of processors (up to 16).

Further work: acceleration of algebraic solvers, development and parallelization of implicit-class preconditioners.

Acknowledgements

This work was supported by the European integrated fire management project (Fire Paradox) and by the Russian Foundation for Basic Research (project # 05-08-18110).

PaCT-2007, September 2007
Pereslavl-Zalessky, Russia