report of the cemracs project hpc- ilbios: understanding...
TRANSCRIPT
OLGA ASSAINOVA / XAVIER PORTELL
Report of the CEMRACS project HPC-IlBios: understanding the best HPC strategy for the IlBioS approach.Olga Assainova*, Xavier Portell**
CEMRACS 2016. Numerical challenges in parallel scientific computing
August 19, 2016, Marseille
**UMR ECOSYS, AgroParisTech, INRA, Université Paris-SaclayF-78850 Thiverval-Grignon, France.E-mail: [email protected]
*Université de BourgogneFaculté des Sciences Mirande9 avenue Alain SavaryBP 47870 21078 Dijon CedexE-mail: [email protected]
.0‹N°›
SUMMARY
Introduction and background
2
3
1
An overview of StarPU
Model versions produced in CEMRACS
4 Pending work
.0‹N°›
SUMMARY
Introduction and background
2
3
1
An overview of StarPU
Model versions produced in CEMRACS
4 Pending work
Introduction and background_01
.04
.0‹N°›
The CO2 and N2O as main GHG
The CO2 and N2O as main GHG
.0‹N°›
Riebbek (2011)
An important part of the CO2 emissions are due to the activity of the soil microorganisms
The CO2 and N2O as main GHG
.0‹N°›
0,03 % of the total GHG emissions but with a 300-fold greater potential for global warming (Thomson et al 2012)
Hu et al (2015)
.0‹N°›
The soil matrix
The soil matrix
.0‹N°›
The complex geometry of the pore space can now be characterized to a high level of detail and quantify its connectivity and topology
characterized.
X-ray Computed Tomography
The soil matrix
The morphology of the soil matrix affects microbial activity and the gas emissions dynamics to the atmosphere
.0‹N°›
1 Compute-aided Detection - Fluorescence In Situ Hybridization
CAD-FISH1
Imaging of thin soil sections
Microorganisms in soil tend to be found in microcolonies, this means that “identical organisms” will face different microhabitat
conditions
The individuality of the microorganisms plays a role
The soil matrix
.0‹N°›
Mass transport processes modelling in soil
Mass transport processes modelling in soil
.0‹N°›Mass transport processes modelling in soil
Lattice-Boltzmann modelling Fluid Node (D3Q19)
Solid Sites
Solid Sites
From: www.egr.msu.edu/ ~kutay/Lbsite/
.0‹N°›Mass transport processes modelling in soil
Lattice-Boltzmann modelling
Solid Sites
Solid Sites
From: www.egr.msu.edu/ ~kutay/Lbsite/
AdvantageComplex pore space geometry easy
to handleIssues
Computationally demanding, but easy to parallelize (GPU or
accelerators)
.0‹N°›
IlBioS conceptualization and description
IlBioS conceptualization and description
.0‹N°›
The model ILBioS is built coupling an Individual-based Model of the soil bacteria to a lattice-Boltzmann
model simulating the fluid dynamics and mass transport processes of soluble substrates
IlBioS conceptualization and description
.0‹N°›Model conceptualization and description
Lattice-Boltzmann Nodes
Three dimensional lattice-Boltzmann (D3Q7)
.0‹N°›Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
Connected Lattice-Boltzmann Nodes
.0‹N°›Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
.0‹N°›Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
Fluid nodes
Solid nodes
Dissolved Organic Carbon (DOC)
.0‹N°›Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
.0‹N°›Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
Fluid nodes
Boundary Solid nodes
Solid nodes
Boundary Fluid nodes
.0‹N°›Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
Particulate Organic Matter:
Soil bacteria:
Uptake DOC from the boundary fluid nodes.
Release DOC to the boundary fluid nodes.
.0‹N°›Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
mj mass of the POM agent.
.0‹N°›
mi mass of the bacterium
Model conceptualization and description
Three dimensional lattice-Boltzmann (D3Q7)
.0‹N°›
Computing time requirements: antecedents
Computing time requirements: antecedents
.0‹N°›Computing time requirements: antecedents
Computing time requirements of lBioS❑3D CT image of 2003 nodes.❑10% of porosity.❑10 bacterial nodes.❑5 to 15 POM nodes.❑DOC as a single soluble lattice-Boltzmann substrate.❑Single desktop computer.
25 hours of processing time.
.0‹N°›
Model complexity of upcoming IlBioS models
Model complexity of upcoming IlBioS models
.0‹N°›Model complexity of upcoming IlBioS models
Microbial regulation of the terrestrial Nitrous oxide formation (Hu et al., 2015)
106 – 109 bacteria / gram of soil
.0‹N°›Model complexity of upcoming IlBioS models
Initial (minimal) conceptualization of a multispecies IbM to reproduce CO2 and N2O flow in natural samples
NH3/NH4 NO2 /NO3
N2O
Denitrifying heterotroph(CO2 producing, aerobic (O2) and anaerobic (NO2)
respiration)
Nitrifying autotroph (Aerobic, CO2 consuming)
.0‹N°›Model complexity of upcoming IlBioS models
Initial (minimal) conceptualization of a multispecies IbM to reproduce CO2 and N2O flow in natural samples
CO2, O2, DOC, NH4, NO2/ NO3, N2O
Required lattice-Boltzmann substrates in the minimal system:
.0‹N°›
Future computing time requirements
Future computing time requirements
.0‹N°›Future computing time requirements
Computing time requirements of llBioS❑3D CT image of 2003 nodes.❑10 % of porosity.❑1 000 000 bacterial nodes❑5 to 25 POM nodes.❑O2, DOC, NH4, NO2/ NO3
125-150 hours of processing time.
In any case, a non practical amount of time to perform
simulation experiments
.0‹N°›
Parallelisation strategy of HPC-IlBioS
Parallelisation strategy of HPC-IlBioS
.0‹N°›
Tiling of the data in subdomains
StarPU
CPU functions
Parallelisation strategy of HPC-IlBioS
.0‹N°›
StarPU
100 xSpeed up for lattice-Boltzmann models using GPU
(Banari et al., 2014)
CUDA functions
CPU functions
Parallelisation strategy of HPC-IlBioS
.0‹N°›
MPIStarPU
CUDA functions
CPU functions
CUDA functions
CPU functions
CUDA functions
CPU functions
Future need of increasing the resolution of the 3D CT images
Parallelisation strategy of HPC-IlBioS
.0‹N°›
SUMMARY
Introduction and background
2
3
1
An overview of StarPU
Model versions produced in CEMRACS
4 Pending work
.0‹N°›
StarPU Introduction
StarPU Introduction
.0‹N°›StarPU Introduction
The basics of StarPU syntax and tools
❑Codelet❑Task❑Buffer❑Handle❑CPU/CUDA functions
.0‹N°›StarPU Introduction
StarPU preparation task routine
Registration of the data starpu_init(NULL);
starpu_data_handle_t f_handle; starpu_data_handle_t fn_handle;
starpu_vector_data_register(&ToV_handle, STARPU_MAIN_RAM, (uintptr_t)ToV, nsite, sizeof(ToV[0]));
starpu_matrix_data_register(&f_handle, STARPU_MAIN_RAM, (uintptr_t)f, Q, Q, nsite, sizeof(f[0][0]));
..=>
starpu_data_unregister(f_handle); starpu_data_unregister(f_handle); starpu_shutdown();
.0‹N°›StarPU Introduction
StarPU preparation task routine
Codelet
struct starpu_codelet bounceback_codelet = { .cpu_funcs = { bounceback_spu }, .cpu_funcs_name = {"bounceback_spu"}, .nbuffers = 2,
.modes = { R, RW } };
Task insert starpu_task_insert(&bounceback_codelet, STARPU_R, ToV_handle, STARPU_RW, f_handle, STARPU_VALUE, &SbD, sizeof(unsigned int), STARPU_VALUE, &nfluidsite, sizeof(unsigned long), STARPU_VALUE, &nsite, sizeof(unsigned long),
STARPU_VALUE, &Q, sizeof(int), 0);
.0‹N°›
void bounceback_spu(void *buffers[], void *cl_arg){
unsigned int SubD=0; unsigned long nfluidsite=0; unsigned long nsite=0; int Q=0;
/* Unpacking arguments to their local copy */ starpu_codelet_unpack_args(cl_arg, &SubD,&nfluidsite, &nsite,&Q);
/* Unpacking buffers to to their local copy */ struct starpu_vector_interface *ToV_v =(struct starpu_vector_interface *) buffers[0]; unsigned long* ToV = (unsigned long *)STARPU_VECTOR_GET_PTR(ToV_v);
struct starpu_matrix_interface **f_m =(struct starpu_matrix_interface **) buffers[1]; double **f= (double **) STARPU_MATRIX_GET_PTR(f_m);
/* Beginning of the computations */ register unsigned long ns; for(ns=nfluidsite;ns<nsite;ns++){ if (SubD!=ToV[ns]) continue; /*Skipping computation of LB Nodes not in the current sub domain*/ register int q; for(q=0;q<Q;q++){ f[q][ns]=0.; } } } // end of bounceback_step_spu() function
StarPUexecutivefunction
StarPU Introduction
.0‹N°›
void bounceback_spu(void *buffers[], void *cl_arg){
unsigned int SubD=0; unsigned long nfluidsite=0; unsigned long nsite=0; int Q=0;
/* Unpacking arguments to their local copy */ starpu_codelet_unpack_args(cl_arg, &SubD,&nfluidsite, &nsite,&Q);
/* Unpacking buffers to to their local copy */ struct starpu_vector_interface *ToV_v =(struct starpu_vector_interface *) buffers[0]; unsigned long* ToV = (unsigned long *)STARPU_VECTOR_GET_PTR(ToV_v);
struct starpu_matrix_interface **f_m =(struct starpu_matrix_interface **) buffers[1]; double **f= (double **) STARPU_MATRIX_GET_PTR(f_m);
/* Beginning of the computations */ register unsigned long ns; for(ns=nfluidsite;ns<nsite;ns++){ if (SubD!=ToV[ns]) continue; /*Skipping computation of LB Nodes not in the current sub domain*/ register int q; for(q=0;q<Q;q++){ f[q][ns]=0.; } } } // end of bounceback_step_spu() function
StarPU Introduction
.0‹N°›
SUMMARY
Introduction and background
2
3
1
An overview of StarPU
Model versions produced in CEMRACS
4 Pending work
.0‹N°›
ILBios
.0‹N°›
Spatial cells (LB Nodes)
NBULK
• Water (DOC) • Air • Solid • (agents)
NBOUND NBOUNDS
0 (NX*NY*NZ)-1LBNode
nfluidsites
Bacteria (many individuals/node)
POM
ILBios
.0‹N°›
ILBios
.0‹N°›
Typedef struct BiologicalAgent { unsigned long Who; /* ID of the individual */ unsigned long LBNode; /* lattice-Boltzmann Node */ double Mass; double Active_Fraction; double Respired_CO2; /* technical variable */
} type_bacteria;
Vector holding the individual bacteria properties
General workflow of the model IlBioS
ILBios
.0‹N°›
Typedef struct POMAgent { unsigned long pWho; /* ID */ unsigned long pLBNode; /* lattice-Boltzmann Node */ double pMass; int pFluidNeigh; /* Number of fluid neighbors */ unsigned long *pFluidNeighID; /* technical variable */
} type_POM;
Vector holding the POM properties
ILBios
.0‹N°›
Initialization of the DOC = population Boltzmann
Solid Sites
Solid Sites
From: www.egr.msu.edu/ ~kutay/Lbsite/
ILBios
.0‹N°›
Initialization of the DOC = population Boltzmann Fluid Node (D3Q19)
Solid Sites
Solid Sites
From: www.egr.msu.edu/ ~kutay/Lbsite/
ILBios
.0‹N°›
double *rho = (double *) calloc(nsite, sizeof(double));
Fluid Node (D3Q19)
Solid Sites
Solid Sites
From: www.egr.msu.edu/ ~kutay/Lbsite/
double **f = (double **) calloc(Q, sizeof(double*)); for (q = 0; q < Q; q++) { f[q] = (double *) calloc(nsite, sizeof (double)); }
Initialization of the DOC = population Boltzmann
ILBios
.0‹N°›
Time step (TS) or main loop
ILBios
.0‹N°›
The State of the bacterial and POM agents and the global out-fluxes are stored in
appropriate data arrays every sampling time.
ILBios
.0‹N°›
ILBios
.0‹N°›
Mass of DOC particles moving in the unitary directions
D2Q5
ILBios
.0‹N°›
ILBios
.0‹N°›
The mass of DOC particles is propagated to the neighbors
ILBios
.0‹N°›
Solid nodes
ILBios
.0‹N°›
The DOC particles bounce back on the solid nodes directions
Solid nodes
ILBios
.0‹N°›
ILBios
.0‹N°›
Increase the DOC content of all fluid neighbors
ILBios
.0‹N°›
Decreases the DOC content of the LBNode
ILBios
.0‹N°›
Writing the sampled output in files.
ILBios
.0‹N°›
ILBios
ILBios SbD
.0‹N°›
ILBios
ILBios SbD
1 2
3 4
.0‹N°›
ILBios
ILBios SbD
ILBios SPU
One task for every function in the time step loop
.0‹N°›
for(t=0;t<TMAX+1;t++) { ..=> starpu_task_insert(&POM_agents_loop_codelet, STARPU_R, tw_handle, STARPU_W, rho_handle, STARPU_RW, f_handle, STARPU_RW, POMPopul_handle, STARPU_VALUE, &POM_ts, sizeof(unsigned long), STARPU_VALUE, &Q, sizeof(int), STARPU_VALUE, &Q2, sizeof(int), STARPU_VALUE, &KPOM, sizeof(double), STARPU_VALUE, &MIN_POS_DBL, sizeof(double), 0);
Examplesofthecodelet
starpu_task_insert(&propagation_codelet, STARPU_R, neighbour_handle, STARPU_RW, f_handle, STARPU_VALUE, &Q, sizeof(int), STARPU_VALUE, &nfluidsite, sizeof(unsigned long), STARPU_VALUE, &nsite, sizeof(unsigned long), 0);
}
ILBios
ILBios SbD
ILBios SPU
.0‹N°›
ILBios
ILBios SbD
ILBios SPU One task for every function and many tasks for collision function
HPC-ILBios (collision)
.0‹N°›
for(int step=0; step<PARTS; step++) { struct starpu_task *task = starpu_task_create(); task->cl = &collision_vSubD_codelet;
task->handles[0] = ToV_handle; task->handles[1] = rho_handle; task->handles[2] = uw_handle; task->handles[3] = c_handle; task->handles[4] = twstar_handle; task->handles[5] = jo2_handle; task->handles[6] = jo_q_handle; task->handles[7] = f_handle; task->handles[8] = j_q_handle; task->handles[9] = evenf_handle; task->handles[10] = oddf_handle; task->handles[11] = fn_handle; // The only one changing parameter param.min = min; param.lenth =lenth; task->cl_arg = ¶m; task->cl_arg_size = sizeof(param); task->synchronous = 1; ret =starpu_task_submit(task)
}
ILBios
ILBios SbD
ILBios SPU
HPC-ILBios (collision)
.0‹N°›
SUMMARY
Introduction and background
2
3
1
An overview of StarPU
Model versions produced in CEMRACS
4 Pending work
.0‹N°›
Domaindecompositionapproach
❑ Partial domain decompositionof the liquid parts of the cub❑Implementing MPI
R1
R2
.0‹N°›
Thank you very much for your attention.
But, before we finish …
A big thanks to all people that has helped us !!
Alexandre Ancel
Herbert Oberlin
David Coulette
Philippe Helluy and all SCHNAPS members
Matthieu Boileau
Laura Grigori
Camile Coti