implementation of a particle in cell algorithm in … · faster than tbb version. implementation of...

27
Implementation of a Particle in Cell algorithm in PEANO Bart Verleye, Denis Jarema, Pawan Kumar Dirk Roose KU Leuven & Flanders Exascience Lab, Leuven (Intel Exascale Labs Europe) (with T. Weinzierl and K. Unterweger, TU Munich) Implementation of a Particle in Cell algorithm in PEANO

Upload: doankiet

Post on 26-Sep-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Implementation of a Particle in Cell

algorithm in PEANO

Bart Verleye, Denis Jarema, Pawan KumarDirk Roose

KU Leuven & Flanders Exascience Lab, Leuven

(Intel Exascale Labs Europe)

(with T. Weinzierl and K. Unterweger, TU Munich)

Implementation of a Particle in Cell algorithm in PEANO

Overview

I Context

I Peano

I Particle in Cell (PIC) algorithm

I Particles in Peano

I Results

I Conclusions

Implementation of a Particle in Cell algorithm in PEANO

Goal: space weather simulation

cf. previous talk by Arnaud Beck

A parallel multi-physics (multi-scale) codeI AMR based PDE solver (MHD eqs. or Maxwell eqs.)I with locally refined physics (Maxwell-Vlasov eqs.)⇒ particle in cell (PIC) method

Implementation of a Particle in Cell algorithm in PEANO

Requirements

I AMR data structure

I Multi-level data structure: e.g. coupling PDE − PIC(PDE → boundary conditions for PIC)

I Support for particle-based algorithms

I Dynamic load balancing

Implementation of a Particle in Cell algorithm in PEANO

PEANO

A parallel mesh traversal package (TU Munchen)

Features:

I Multi-level and AMR datastructures

I User is shielded from mostof the parallelisation issues

I Hybrid parallelism(TBB / OpenMP, MPI)

I focus on data locality andefficient cache usage

I Low memory overhead

Implementation of a Particle in Cell algorithm in PEANO

PEANO

Constraints:

I Application programmer ‘sees’ only a mesh cell and thecell at the higher level

cell vertices are shared between cells→ information exchange via vertices

(threads, communications, etc: handled by Peano)

I Cells are accessed in an ‘unknown’ order

Implementation of a Particle in Cell algorithm in PEANO

Peano mesh traversal

[H-J Bungartz et al, Comp Mech 48:365-376, 2011]

Implementation of a Particle in Cell algorithm in PEANO

Handling a cell

How to implement a stencil operation within Peano?assuming that data are in cell centers (‘grid points’)how to get information from neighboring cells?

Store average of the value of a cell and of a neighboring cell ina common vertex → stencil can be computed

mab =fa + fb

2

m(n+1)ab = m

(n)ab +

f(n+1)b − f

(n)b

2

fafb

fdfc

mab

mbdmabcd

Implementation of a Particle in Cell algorithm in PEANO

Programming in PEANO

General approach:

I define data for cells and vertices in definition files

I define ‘methods’ in definition filese.g. 1 step of iterative linear system solver

I pre-process −→ glue code

I implement events that happen during the grid traversal

I add locks, compile with TBB or OpenMP

I implement mergeWithNeighbour (c++),compile with MPI

I set tuning parameters: grain size (TBB auto tuning),message size (MPI)

Implementation of a Particle in Cell algorithm in PEANO

Communication in PEANO (MPI version)

Data stored in a vertex at a partition boundary iscommunicated after each traversal to ’neighbor’ via MPI.

5

D

A

2

7 8

1

6

B

C

4

3

9

2bis

8bis

5bis

Implementation of a Particle in Cell algorithm in PEANO

Particle in Cell Simulation

I Electrostatic or electromagnetic

I Explicit or implicit

I Implemented : explicit electrostaticPIC on a regular grid

I Many particles of several species,e.g. 3 species, 200 particles/species/cell

I Initially: particles evenly distributed,but they move and can cluster→ need for dynamic load balancing

Implementation of a Particle in Cell algorithm in PEANO

PIC ’phases’

1. Particle to grid : particles → charge density in mesh cells

2. Linear system solve : solve discretized Poisson eq. →potential

3. Update E-field and velocity : potential → electrical field→ interpolated field → velocity of particles

4. Update location : velocity (and time step) → newparticle positions & particles movement

Each phase requires interaction between threadsor communication between processes

Implementation of a Particle in Cell algorithm in PEANO

Particle-to-Grid

Particle densities determine the RHS of the Poisson eq.Particle: same size as a cell

I 1 particle contributes to4 cells (in 2D)

I Red part: proportion ofthe particle’s densitycontributed to cell 1

I Contribution: to becomputed for everyparticle and every cell

r_4 r_3

r_2r_1

Implementation of a Particle in Cell algorithm in PEANO

Update E-field and velocity

I Potential → electric fieldin each direction

I Electric field interpolatedat position of particles(inverse of theparticle-to-grid operation)→ new velocity ofparticles

E_4 E_3

E_2E_1

Implementation of a Particle in Cell algorithm in PEANO

Update location

I The update of the location of the particles is a particleonly operation, once the velocity and time-step are known

I For every particle: check whether it has to move toanother cell / vertex

Implementation of a Particle in Cell algorithm in PEANO

Particles in PEANO

I Support for particles was not available in PEANOI Particles should not be stored in vertex data structure

otherwise: particle data loaded into the PEANO streams and

stacks also during Poisson solve

I New feature in PEANO:heap: pointer (in vertex data structure) to an array

I A particle belongs to 8 cells (3D), and therefore stored inthe array belonging to the common vertex

I A cell has access to 8 vertices, and can move a particlefrom one vertex to another

I If a particle moves further away:use the spacetree grid: move via the higher level(s)

Implementation of a Particle in Cell algorithm in PEANO

Particles in PEANO (2)

On a spacetree grid, the particles move via the higher level(s):from the finest green level, via the higher blue level to theorange level, and then (during the next traversal) back via blueto the green level

Implementation of a Particle in Cell algorithm in PEANO

Test setup

I Explicit PIC on mesh with N × N × N vertices, with Mparticles/vertex

I Particles: initial and boundary conditionsI Initial position: random around the vertex location,

inside the vertex coverageI Initial velocity: randomI Boundary condition: bounce back

I Poisson equation: Dirichlet boundary conditions

I Conserved quantities (checked)I Constant total densityI Conservation of momentum (until particles hit the wall)

Implementation of a Particle in Cell algorithm in PEANO

Test case: strong scaling results

I Explicit electrostatic PIC on mesh with 64×64×64vertices, 600 particles/vertex = 157 286 400 particles

I On multicore node (2 socket, 6 core X5660 2.8Ghz):compiled with TBB

I expensive phases : up to 8 cores: superlinear speedup(S8 = 8.7)

I SOR iteration : S8 limited to 2.2

I On HP cluster: compiled with MPII expensive phases : up to 64 cores: superlinear speedupI faster than TBB version

Implementation of a Particle in Cell algorithm in PEANO

Multicore results (TBB)

I PIC time step: for each phase: time (s) & (speed-up)on Intel X5660 2.8Ghz 2x6 core node (averaged over 10 time steps)

I Total = time for 1 time step with 20 SOR iterations

cores Part. to grid SOR Upd. E-field Update Total& velocity Location

1 201 0.34 70 206 4834 48 (4.1) 0.17 (2.0) 18 (3.8) 48 (4.2) 117 (4.1)8 23 (8.7) 0.14 (2.4) 10 (7.0) 24 (8.5) 59 (8.1)

12 19 (10.5) 0.15 (2.2) 12 (5.8) 18 (11.4) 52 (9.2)

I 643 grid, 600 particles/vertex = 157 286 400 particles

I Constant grain size: 500, might not be optimal for 12 cores

Implementation of a Particle in Cell algorithm in PEANO

1 10

Number of cores

1E−01

1E+00

1E+01

1E+02

1E+03

Tim

e (

s)

SORUpdate E-field and VelocityUpdate LocationParticle to GridTotal

Time for each 'traversal' (TBB version)

Implementation of a Particle in Cell algorithm in PEANO

4 6 8 10 12 14

Number of cores

0

2

4

6

8

10

12

14

Sp

eed

up

SORUpdate E-field and VelocityParticle to GridUpdate LocationTotalIdeal

Speedup for each 'traversal' (TBB version)

Implementation of a Particle in Cell algorithm in PEANO

MPI results

I PIC time step: for each phase: time (s) & (speed-up)on Intel X5660 2.8Ghz 2x6 core node (averaged over 10 time steps)

I Total = time for 1 time step with 20 SOR iterations

cores Part. to grid SOR Upd. field Update Total& velocity Location

1 201 0.34 70 206 4834 34 (5.9) 0.13 (2.4) 17 (3.8) 34 (5.8) 89 (5.38)

1x8 16 (11.8) 0.08 (4.1) 9 (7.1) 17 (12.1) 45 (10.6)2x4 16 (12.0) 0.08 (4.0) 9 (7.2) 16 (12.2) 44 (10.8)4x2 16 (12.4) 0.08 (4.4) 8 (7.9) 16 (12.5) 42 (11.3)8x1 16 (12.5) 0.07 (4.7) 8.5 (8.2) 16 (13.7) 40 (11.8)64 2.7 (74) 0.07 (5.1) 2.0 (35) 2.3 (90) 8.3 (58)

125 2.4 (84) 0.30 (1.1) 1.7 (41) 1.2 (172) 11 (43)

I 643 grid, 600 particles/vertex = 157 286 400 particlesImplementation of a Particle in Cell algorithm in PEANO

1 10 100

Number of cores

1E−02

1E−01

1E+00

1E+01

1E+02

1E+03

Tim

e (

s)

SORUpdate E-field and VelocityUpdate LocationParticle to GridTotal

Time for each 'traversal' (MPI version)

Implementation of a Particle in Cell algorithm in PEANO

20 40 60 80 100 120 140

Number of cores

0

50

100

150

200

Sp

eed

up

SORUpdate E-field and VelocityParticle to GridUpdate LocationTotalIdeal

Speedup for each 'traversal' (MPI version)

Implementation of a Particle in Cell algorithm in PEANO

Comparison of SOR and CG solvers for Poisson eq.

Preliminary multicore results, 64×64×64 mesh, time per iter.

cores SOR CG1 0.34s 3.6s4 0.17s 2.2s8 0.14s 1.4s

12 0.15s 1.0s

I TBB: grain size = 500I CG: dot product implementation should be improved

I needed: linear system solvers with better arithm. intensity& avoiding communication

cf. talk by Pieter Ghysels

Implementation of a Particle in Cell algorithm in PEANO

Conclusions

I PIC implemented in PEANO

I Good scalability (except Poisson solver)

I Further analysis needed→ hardware simulator may be useful

cf. next talk by Trevor Carlson

Implementation of a Particle in Cell algorithm in PEANO