implementation of a particle in cell algorithm in … · faster than tbb version. implementation of...
TRANSCRIPT
Implementation of a Particle in Cell
algorithm in PEANO
Bart Verleye, Denis Jarema, Pawan KumarDirk Roose
KU Leuven & Flanders Exascience Lab, Leuven
(Intel Exascale Labs Europe)
(with T. Weinzierl and K. Unterweger, TU Munich)
Implementation of a Particle in Cell algorithm in PEANO
Overview
I Context
I Peano
I Particle in Cell (PIC) algorithm
I Particles in Peano
I Results
I Conclusions
Implementation of a Particle in Cell algorithm in PEANO
Goal: space weather simulation
cf. previous talk by Arnaud Beck
A parallel multi-physics (multi-scale) codeI AMR based PDE solver (MHD eqs. or Maxwell eqs.)I with locally refined physics (Maxwell-Vlasov eqs.)⇒ particle in cell (PIC) method
Implementation of a Particle in Cell algorithm in PEANO
Requirements
I AMR data structure
I Multi-level data structure: e.g. coupling PDE − PIC(PDE → boundary conditions for PIC)
I Support for particle-based algorithms
I Dynamic load balancing
Implementation of a Particle in Cell algorithm in PEANO
PEANO
A parallel mesh traversal package (TU Munchen)
Features:
I Multi-level and AMR datastructures
I User is shielded from mostof the parallelisation issues
I Hybrid parallelism(TBB / OpenMP, MPI)
I focus on data locality andefficient cache usage
I Low memory overhead
Implementation of a Particle in Cell algorithm in PEANO
PEANO
Constraints:
I Application programmer ‘sees’ only a mesh cell and thecell at the higher level
cell vertices are shared between cells→ information exchange via vertices
(threads, communications, etc: handled by Peano)
I Cells are accessed in an ‘unknown’ order
Implementation of a Particle in Cell algorithm in PEANO
Peano mesh traversal
[H-J Bungartz et al, Comp Mech 48:365-376, 2011]
Implementation of a Particle in Cell algorithm in PEANO
Handling a cell
How to implement a stencil operation within Peano?assuming that data are in cell centers (‘grid points’)how to get information from neighboring cells?
Store average of the value of a cell and of a neighboring cell ina common vertex → stencil can be computed
mab =fa + fb
2
m(n+1)ab = m
(n)ab +
f(n+1)b − f
(n)b
2
fafb
fdfc
mab
mbdmabcd
Implementation of a Particle in Cell algorithm in PEANO
Programming in PEANO
General approach:
I define data for cells and vertices in definition files
I define ‘methods’ in definition filese.g. 1 step of iterative linear system solver
I pre-process −→ glue code
I implement events that happen during the grid traversal
I add locks, compile with TBB or OpenMP
I implement mergeWithNeighbour (c++),compile with MPI
I set tuning parameters: grain size (TBB auto tuning),message size (MPI)
Implementation of a Particle in Cell algorithm in PEANO
Communication in PEANO (MPI version)
Data stored in a vertex at a partition boundary iscommunicated after each traversal to ’neighbor’ via MPI.
5
D
A
2
7 8
1
6
B
C
4
3
9
2bis
8bis
5bis
Implementation of a Particle in Cell algorithm in PEANO
Particle in Cell Simulation
I Electrostatic or electromagnetic
I Explicit or implicit
I Implemented : explicit electrostaticPIC on a regular grid
I Many particles of several species,e.g. 3 species, 200 particles/species/cell
I Initially: particles evenly distributed,but they move and can cluster→ need for dynamic load balancing
Implementation of a Particle in Cell algorithm in PEANO
PIC ’phases’
1. Particle to grid : particles → charge density in mesh cells
2. Linear system solve : solve discretized Poisson eq. →potential
3. Update E-field and velocity : potential → electrical field→ interpolated field → velocity of particles
4. Update location : velocity (and time step) → newparticle positions & particles movement
Each phase requires interaction between threadsor communication between processes
Implementation of a Particle in Cell algorithm in PEANO
Particle-to-Grid
Particle densities determine the RHS of the Poisson eq.Particle: same size as a cell
I 1 particle contributes to4 cells (in 2D)
I Red part: proportion ofthe particle’s densitycontributed to cell 1
I Contribution: to becomputed for everyparticle and every cell
r_4 r_3
r_2r_1
Implementation of a Particle in Cell algorithm in PEANO
Update E-field and velocity
I Potential → electric fieldin each direction
I Electric field interpolatedat position of particles(inverse of theparticle-to-grid operation)→ new velocity ofparticles
E_4 E_3
E_2E_1
Implementation of a Particle in Cell algorithm in PEANO
Update location
I The update of the location of the particles is a particleonly operation, once the velocity and time-step are known
I For every particle: check whether it has to move toanother cell / vertex
Implementation of a Particle in Cell algorithm in PEANO
Particles in PEANO
I Support for particles was not available in PEANOI Particles should not be stored in vertex data structure
otherwise: particle data loaded into the PEANO streams and
stacks also during Poisson solve
I New feature in PEANO:heap: pointer (in vertex data structure) to an array
I A particle belongs to 8 cells (3D), and therefore stored inthe array belonging to the common vertex
I A cell has access to 8 vertices, and can move a particlefrom one vertex to another
I If a particle moves further away:use the spacetree grid: move via the higher level(s)
Implementation of a Particle in Cell algorithm in PEANO
Particles in PEANO (2)
On a spacetree grid, the particles move via the higher level(s):from the finest green level, via the higher blue level to theorange level, and then (during the next traversal) back via blueto the green level
Implementation of a Particle in Cell algorithm in PEANO
Test setup
I Explicit PIC on mesh with N × N × N vertices, with Mparticles/vertex
I Particles: initial and boundary conditionsI Initial position: random around the vertex location,
inside the vertex coverageI Initial velocity: randomI Boundary condition: bounce back
I Poisson equation: Dirichlet boundary conditions
I Conserved quantities (checked)I Constant total densityI Conservation of momentum (until particles hit the wall)
Implementation of a Particle in Cell algorithm in PEANO
Test case: strong scaling results
I Explicit electrostatic PIC on mesh with 64×64×64vertices, 600 particles/vertex = 157 286 400 particles
I On multicore node (2 socket, 6 core X5660 2.8Ghz):compiled with TBB
I expensive phases : up to 8 cores: superlinear speedup(S8 = 8.7)
I SOR iteration : S8 limited to 2.2
I On HP cluster: compiled with MPII expensive phases : up to 64 cores: superlinear speedupI faster than TBB version
Implementation of a Particle in Cell algorithm in PEANO
Multicore results (TBB)
I PIC time step: for each phase: time (s) & (speed-up)on Intel X5660 2.8Ghz 2x6 core node (averaged over 10 time steps)
I Total = time for 1 time step with 20 SOR iterations
cores Part. to grid SOR Upd. E-field Update Total& velocity Location
1 201 0.34 70 206 4834 48 (4.1) 0.17 (2.0) 18 (3.8) 48 (4.2) 117 (4.1)8 23 (8.7) 0.14 (2.4) 10 (7.0) 24 (8.5) 59 (8.1)
12 19 (10.5) 0.15 (2.2) 12 (5.8) 18 (11.4) 52 (9.2)
I 643 grid, 600 particles/vertex = 157 286 400 particles
I Constant grain size: 500, might not be optimal for 12 cores
Implementation of a Particle in Cell algorithm in PEANO
1 10
Number of cores
1E−01
1E+00
1E+01
1E+02
1E+03
Tim
e (
s)
SORUpdate E-field and VelocityUpdate LocationParticle to GridTotal
Time for each 'traversal' (TBB version)
Implementation of a Particle in Cell algorithm in PEANO
4 6 8 10 12 14
Number of cores
0
2
4
6
8
10
12
14
Sp
eed
up
SORUpdate E-field and VelocityParticle to GridUpdate LocationTotalIdeal
Speedup for each 'traversal' (TBB version)
Implementation of a Particle in Cell algorithm in PEANO
MPI results
I PIC time step: for each phase: time (s) & (speed-up)on Intel X5660 2.8Ghz 2x6 core node (averaged over 10 time steps)
I Total = time for 1 time step with 20 SOR iterations
cores Part. to grid SOR Upd. field Update Total& velocity Location
1 201 0.34 70 206 4834 34 (5.9) 0.13 (2.4) 17 (3.8) 34 (5.8) 89 (5.38)
1x8 16 (11.8) 0.08 (4.1) 9 (7.1) 17 (12.1) 45 (10.6)2x4 16 (12.0) 0.08 (4.0) 9 (7.2) 16 (12.2) 44 (10.8)4x2 16 (12.4) 0.08 (4.4) 8 (7.9) 16 (12.5) 42 (11.3)8x1 16 (12.5) 0.07 (4.7) 8.5 (8.2) 16 (13.7) 40 (11.8)64 2.7 (74) 0.07 (5.1) 2.0 (35) 2.3 (90) 8.3 (58)
125 2.4 (84) 0.30 (1.1) 1.7 (41) 1.2 (172) 11 (43)
I 643 grid, 600 particles/vertex = 157 286 400 particlesImplementation of a Particle in Cell algorithm in PEANO
1 10 100
Number of cores
1E−02
1E−01
1E+00
1E+01
1E+02
1E+03
Tim
e (
s)
SORUpdate E-field and VelocityUpdate LocationParticle to GridTotal
Time for each 'traversal' (MPI version)
Implementation of a Particle in Cell algorithm in PEANO
20 40 60 80 100 120 140
Number of cores
0
50
100
150
200
Sp
eed
up
SORUpdate E-field and VelocityParticle to GridUpdate LocationTotalIdeal
Speedup for each 'traversal' (MPI version)
Implementation of a Particle in Cell algorithm in PEANO
Comparison of SOR and CG solvers for Poisson eq.
Preliminary multicore results, 64×64×64 mesh, time per iter.
cores SOR CG1 0.34s 3.6s4 0.17s 2.2s8 0.14s 1.4s
12 0.15s 1.0s
I TBB: grain size = 500I CG: dot product implementation should be improved
I needed: linear system solvers with better arithm. intensity& avoiding communication
cf. talk by Pieter Ghysels
Implementation of a Particle in Cell algorithm in PEANO