Present and Future Computational Requirements
General Plasma Physics
Center for Integrated Computation and Analysis of Reconnection and Turbulence (CICART)
Kai Germaschewski, Homa Karimabadi, Amitava Bhattacharjee, Fatima Ebrahimi, Will Fox, Liwei Lin
CICART, Space Science Center / Dept. of Physics
University of New Hampshire
March 18, 2013
Outline
1 Project Information
2 Computational Strategies
3 Current HPC usage and methods
4 HPC requirements for 2017
5 Strategies for New Architectures
Project Information
Center for Integrated Computation and Analysis of Reconnection and Turbulence
Director: Amitava Bhattacharjee, PPPL / Princeton University
Co-Director: Ben Chandran, University of New Hampshire

CICART has a dual mission in research: it seeks fundamental advances in physical understanding, and works to achieve these advances by means of innovations in computer simulation methods and theoretical models, and validation by comparison with laboratory experiments and space observations. Our research program has two elements: niche areas in the physics of magnetic reconnection and turbulence which build on past accomplishments of the CICART group and to which the group is well positioned to contribute, and the high-performance computing tools needed to address these topics.
Objectives
Magnetic Reconnection
- Reconnection in laser-generated plasma bubbles
- Reconnection and secondary instabilities in large, high-Lundquist-number plasmas
- Particle acceleration in the presence of multiple magnetic islands
- Gyrokinetic reconnection: comparison with fluid and particle-in-cell models

Turbulence
- Imbalanced turbulence
- Ion heating
- Turbulence in laboratory (including fusion-relevant) experiments
Bubble reconnection
W. Fox, APS 2012
Reconnection observed in laser-driven plasma experiments
Rutherford: [Nilson et al., PRL 2006; PoP 2008; Willingale et al., PoP 2010]
Omega: [C. K. Li et al., PRL 2007]
Shenguang: [Zhong et al., Nature Phys. 2010]
Particle energization is under study
Figure annotations: separatrix: acceleration; outflow: heating; inflow.
2D vs 3D: Jz evolution
Quiz: what is the bipolar Jz in 3-D at late time?
Answer: the Hall current system near the x-line.
Figure: Jz evolution over time; panels show early Weibel growth in 2-D, Weibel onset in 3-D (note scales), and late time (peak reconnection).
Current computational methods
Kinetic: Particle Simulation Code (PSC)
- solves the Vlasov-Maxwell equations in 1D/2D/3D
- 1st- and 2nd-order shape functions
- binary collisions
- explicit timestepping, parallelized by domain decomposition (see the particle-push sketch below)
- modular design
- special features: dynamic load balancing, AMR (work in progress)
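For readers unfamiliar with explicit PIC timestepping, the sketch below shows a generic, textbook-style Boris particle push of the kind such codes perform every step. It is not PSC's actual implementation; the function and variable names, and the non-relativistic simplification, are assumptions for illustration only.

```python
import numpy as np

def boris_push(x, v, E, B, q_over_m, dt):
    """Textbook Boris push for one particle: half electric kick,
    magnetic rotation, half electric kick, then position update.
    E and B are the fields interpolated to the particle position."""
    # first half of the electric acceleration
    v_minus = v + 0.5 * q_over_m * dt * E
    # rotation around the magnetic field
    t = 0.5 * q_over_m * dt * B
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)
    # second half of the electric acceleration
    v_new = v_plus + 0.5 * q_over_m * dt * E
    # position update (non-relativistic for simplicity)
    x_new = x + v_new * dt
    return x_new, v_new

# Example: a particle gyrating in a uniform magnetic field
x, v = np.zeros(3), np.array([1.0, 0.0, 0.0])
E, B = np.zeros(3), np.array([0.0, 0.0, 1.0])
for _ in range(100):
    x, v = boris_push(x, v, E, B, q_over_m=-1.0, dt=0.1)
```

In a production code this push is applied to billions of particles per step, which is why the particle push dominates both the cost and the load-balancing considerations discussed later.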
Fluid: Magnetic Reconnection Code (MRC)
- solves extended MHD with a generalized Ohm's law (see the schematic sketch below)
- finite-volume, div B = 0, arbitrary curvilinear grids
- explicit and implicit time integration through PETSc
- automatic code generator generates the r.h.s. and Jacobian
- parallelized by domain decomposition / MPI
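As a reference for what "generalized Ohm's law" adds beyond ideal MHD, here is a schematic pointwise evaluation of its usual textbook form. This is not MRC's finite-volume discretization; the function and variable names are hypothetical, and electron inertia is omitted.

```python
import numpy as np

def generalized_ohms_law_E(v, B, J, grad_pe, n, eta, e=1.602e-19):
    """Schematic evaluation of a generalized Ohm's law at one grid point:
        E = -v x B + (J x B - grad(p_e)) / (n e) + eta * J
    i.e. ideal MHD plus Hall, electron-pressure, and resistive terms."""
    ideal = -np.cross(v, B)              # ideal MHD term
    hall = np.cross(J, B) / (n * e)      # Hall term
    pressure = -grad_pe / (n * e)        # electron pressure gradient term
    resistive = eta * J                  # resistive term
    return ideal + hall + pressure + resistive
```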
Current HPC usage
We used to run most simulations on a local Beowulf cluster, but in recent years we have run almost all simulations at NERSC and other supercomputing centers.
Usage 2009
NERSC: 500,000 hrs
local cluster: 1,400,000 hrs
Usage 2012
NERSC: 3,500,000 hrs
Jaguar: 15,200,000 hrs
XSEDE: 2,000,000 hrs
Blue Waters: 100,000 hrs
local cluster: 100,000 hrs
Typical jobs
PSC: particle-in-cell
- 20,000 - 60,000 cores, 4 - 24 hours
- Data written: 5 TB (data read: 3 TB on restart)
- Memory used: 3 TB, 1 GB / core

MRC/HMHD/OpenGGCM: MHD / Hall-MHD
- 4,000 - 20,000 cores, 12 - 48 hours
- Data written: 2 TB
- Memory used: 10 GB, < 100 MB / core

Necessary software, infrastructure
- HDF5, PETSc, (Parallel Python?) (see the parallel I/O sketch below)
- visualization cluster (ParaView, Python, MATLAB) with large memory and fast disk access
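To make the HDF5 dependency concrete, below is a minimal sketch of a collective parallel write using h5py built against parallel HDF5 and mpi4py. It illustrates the kind of I/O these jobs rely on, not the codes' actual output layer; the file and dataset names are made up.

```python
# Collective parallel-HDF5 write sketch (requires mpi4py and a parallel h5py build).
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
nx_local = 1000
nx_global = nx_local * comm.size

with h5py.File("fields.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("Ez", (nx_global,), dtype="f8")
    # each rank writes its own contiguous slab of the global array
    lo = comm.rank * nx_local
    dset[lo:lo + nx_local] = np.zeros(nx_local)
```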
HPC requirements for 2017
PIC runs are by far the most expensive, so they dominate the requirements.
Requirements
- Hours: 100,000,000
- Number of cores: 1,000,000, at 1-2 GB / core
- Wallclock: 72 hrs
- I/O: 50 TB + 1 PB checkpoint (!?!) (see the quick arithmetic check below)
- Online file storage: 200 TB / 1.2 PB (checkpoint)
- Offline file storage: 1 PB
- Data analysis becomes even more challenging ;-(
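The ~1 PB checkpoint size follows directly from the memory footprint, assuming a checkpoint dumps essentially the full distributed state; a quick arithmetic check on the numbers listed above:

```python
cores = 1_000_000
mem_per_core_gb = (1, 2)                       # 1-2 GB per core, from the requirements above
total_memory_pb = [m * cores / 1e6 for m in mem_per_core_gb]
print(total_memory_pb)                         # [1.0, 2.0] PB -> consistent with a ~1 PB checkpoint
```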
Enables: end-to-end plasma bubble reconnection, extended MHD space weather modeling, predictive 3-D simulations of lab / fusion devices.
Dynamic Load Balancing
Problem with PIC simulations: those particles just keep on moving!
PIC codes parallelized via domain decomposition often become unbalanced over time, even if they were nicely balanced at the start of the simulation.
Rebalance by moving processor domain boundaries:
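A minimal sketch of the idea: given per-patch costs (e.g. particle counts), move the rank boundaries along the 1-D patch ordering so that each rank ends up with roughly the same total load. This illustrates the concept only and is not PSC's actual balancing algorithm; all names are hypothetical.

```python
import numpy as np

def rebalance(patch_loads, n_ranks):
    """Assign consecutive patches to ranks so each rank gets roughly an
    equal share of the total load. Returns the owning rank per patch."""
    loads = np.asarray(patch_loads, dtype=float)
    target = loads.sum() / n_ranks            # ideal load per rank
    owner = np.empty(len(loads), dtype=int)
    rank, acc = 0, 0.0
    for i, w in enumerate(loads):
        # advance the boundary to the next rank once this rank is "full"
        if acc + 0.5 * w > target * (rank + 1) and rank < n_ranks - 1:
            rank += 1
        acc += w
        owner[i] = rank
    return owner

# Example: 8 patches with uneven particle counts, distributed over 3 ranks
print(rebalance([100, 100, 800, 50, 50, 400, 100, 100], 3))
```

In practice the patch costs change as particles move, so the assignment is recomputed periodically during the run.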
Figure: color indicates the processor responsible for the corresponding part of the domain.
Figure: wall time per step [ms] vs. time step, with patch-based load balancing; curves: particle push (reference), whole step (reference), particle push (balanced), whole step (balanced).
NVIDIA GPU, Intel MIC
Performance on TitanDev / Blue Waters: 16-core AMD 6274 CPU, NVIDIA Tesla M2090 / Tesla K20X
Kernel                                Performance [particles/sec]
2D push & V-B current, CPU (AMD)      130 × 10^6
2D push & V-B current, GPU (M2090)    565 × 10^6
2D push & V-B current, GPU (K20X)     710 × 10^6
For best performance, we need to use the GPU and CPU simultaneously. Patch-based load balancing enables us to do that: on each node, we have 1 MPI process that owns ≈ 30 patches, which are processed on the GPU, and 15 MPI processes with 1 patch each, which are processed on the remaining CPU cores.
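A back-of-the-envelope estimate of what the combined GPU + CPU configuration buys per node, using only the kernel rates from the table above; this ignores communication and load imbalance and is illustrative, not a measured speedup.

```python
# Rough per-node throughput estimate, assuming one GPU rank plus 15 CPU ranks
# push particles concurrently (illustrative only; rates from the table above).
gpu_rate = 710e6                              # K20X push rate, particles/sec
cpu_rate_16core = 130e6                       # full 16-core AMD 6274 push rate
cpu_rate_15core = cpu_rate_16core * 15 / 16   # one core is busy driving the GPU

combined = gpu_rate + cpu_rate_15core
print(combined / cpu_rate_16core)             # ~6.4x over running the CPU alone
```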
Suggestions
- Maximum queue limits of > 24 hrs are highly desired
- Wait times of less than 2 weeks
- Sufficient scratch space for data analysis, with purging no more frequent than every 3 months
- Efficient I/O
- Establish a point of contact (POC) for large users
- Support from visualization experts is critical
- Maintain good interactive access
- Fault tolerance solutions
- PGAS / new programming models / load balancing