
Page 1:

DFT requirements for leadership-class computers

N. Schunck
Department of Physics and Astronomy, University of Tennessee, Knoxville, TN-37996, USA

Physics Division, Oak Ridge National Laboratory, Oak Ridge, TN-37831, USA

http://unedf.org

The 3rd LACM-EFES-JUSTIPEN Workshop JIHIR, Oak Ridge National Laboratory, February 23-25, 2009

A. Baran, J. Dobaczewski, J. McDonnell, J. Moré, W. Nazarewicz, N. Nikolov, H. H. Nam, J. Pei, J. Sarich, J. Sheikh, A. Staszczak, M. V. Stoitsov, S. Wild

Page 2:

Nuclear DFT: Why supercomputing?

Why supercomputers:

Large-scale problems (LACM): fission, shape coexistence, time-dependent problems

Systematic restoration of broken symmetries and correlations “made easy” (QRPA, GCM?, etc.)

Optimization of extended functionals on larger sets of experimental data

DFT: A global theory

Supercomputers: DFT at full power…

The ground state of an even nucleus can be computed in a matter of minutes on a standard laptop: why bother with supercomputing?

Principle: average out individual degrees of freedom. Treatment of correlations?

Current lack of quantitative predictions at the ~100 keV level

Extrapolability?

“No limit” theory: from light nuclei to the physics of neutron stars

Rich physics

Fast and reliable

Page 3:

Classes of DFT Solvers

             1D                       2D                          3D
r-space      1 min, 1 core (HFBRAD)   5 hours, 70 cores (HFBAX)   -
HO basis     -                        2 min, 1 core (HFBTHO)      5 hours, 1 core (HFODD)

Computational packages used and developed at ORNL, with an estimate of the resources needed for a standard HFB calculation.

Coordinate space: direct integration of the HFB equations

– Accurate: provides the "exact" result

– Slow and CPU/memory intensive for 2D-3D geometries

Configuration space: expansion of the solutions on a basis (usually HO)

– Fast and amenable to beyond-mean-field extensions

– Truncation effects: source of divergences/renormalization issues

– Wrong asymptotics unless different bases are used (WS, PTG, Gamow, etc.)
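In the configuration-space approach the quasiparticle wave functions are expanded on the basis states and the HFB problem becomes a matrix eigenvalue problem. Schematically, in standard notation (single-particle Hamiltonian h, pairing field Delta, chemical potential lambda, amplitudes U and V):

\[
\sum_{n'}
\begin{pmatrix}
 h_{nn'} - \lambda\,\delta_{nn'} & \Delta_{nn'} \\
 -\Delta^{*}_{nn'} & -h^{*}_{nn'} + \lambda\,\delta_{nn'}
\end{pmatrix}
\begin{pmatrix} U_{n'\mu} \\ V_{n'\mu} \end{pmatrix}
= E_{\mu}
\begin{pmatrix} U_{n\mu} \\ V_{n\mu} \end{pmatrix}
\]

Truncating the sum to a finite number of basis states is the origin of the truncation effects mentioned above.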

Non-linear integro-differential fixed point problem
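A minimal sketch of that fixed-point structure, in Python for illustration only: build_hfb_matrix and compute_densities are hypothetical placeholders for the code-specific construction of the HFB matrix from the densities and of the densities from the quasiparticle states.

import numpy as np

def hfb_self_consistent_loop(build_hfb_matrix, compute_densities, rho0,
                             alpha=0.3, tol=1e-7, maxit=500):
    """Schematic self-consistent (fixed-point) loop: densities -> mean fields and
    pairing fields -> HFB matrix -> quasiparticle states -> new densities, with
    simple linear mixing for stability."""
    rho = rho0
    for it in range(maxit):
        H = build_hfb_matrix(rho)             # Hermitian HFB matrix built from the current densities
        E, W = np.linalg.eigh(H)              # quasiparticle energies and (U, V) amplitudes
        rho_new = compute_densities(E, W)     # new densities from the quasiparticle states
        if np.linalg.norm(rho_new - rho) < tol:
            return rho_new, it                # converged fixed point
        rho = (1.0 - alpha) * rho + alpha * rho_new
    raise RuntimeError("HFB iteration did not converge")

Production solvers replace the linear mixing by a Broyden-type update (see the profiling and deliverables pages below).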

Page 4:

Recent physics achievements

Even-even, odd-even and odd-odd mass tables

Nuclear fission

Systematics of odd-proton states in odd nuclei

Cf. talks by M. Stoitsov, S. Wild and J. Moré

Online resources:

http://massexplorer.org/

http://unedf.org/

Page 5:

Petascale and beyond

• Hardware constraints (see R. Lusk and J. Vary's talks):

– Many cores (100,000+) stacked into sockets: currently 4 cores/socket, evolving toward 8 cores/socket and more

– Small memory per core (shared memory per socket)

– Short, crash-prone, expensive runtime

• Consequences for the architecture of DFT solvers:

– Optimize the time of one HFB calculation: reduce the number of iterations, use symmetries smartly, improve/interface codes, parallelize, etc.

– Work on the parallel wrapper: load balancing, checkpoints, error-control mechanisms, etc.
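As an illustration of the checkpoint/restart part of such a wrapper, a minimal sketch; step and state are hypothetical placeholders, and a real code would write its own binary records rather than a pickle file.

import pickle
import pathlib

def run_with_checkpoints(step, state, n_iter, ckpt="hfb_run.ckpt", every=10):
    """Minimal checkpoint/restart wrapper: resume from the last saved state if a
    checkpoint exists, and save every `every` iterations so a crashed or
    time-limited job does not lose all of its work."""
    path = pathlib.Path(ckpt)
    start = 0
    if path.exists():                               # restart after a crash or wall-time kill
        with path.open("rb") as f:
            start, state = pickle.load(f)
    for it in range(start, n_iter):
        state = step(state)                         # one HFB iteration or one work unit
        if (it + 1) % every == 0:
            with path.open("wb") as f:
                pickle.dump((it + 1, state), f)     # atomic rename etc. omitted in this sketch
    return state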

Page 6:

Optimization - Interface HFBTHO/HFODD

• Restarting HFODD from HFBTHO means:

– Tremendous gain in computation time

– Improved numerical stability

– Taking advantage of existing mass tables

• Procedure:

– Coordinate + phase transformation (both unitary)

– Modify HFODD to restart from HFB matrix elements instead of density fields on the Gauss-Hermite mesh
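The essential point is that both steps are unitary, so the matrix elements can be transferred between the two representations without loss of information. A minimal sketch for a one-body matrix (illustrative only: the actual interface also transforms the U, V quasiparticle matrices and handles the HO quantum-number bookkeeping; T and phases are assumed inputs):

import numpy as np

def transform_matrix_elements(h, T, phases):
    """Apply a basis (coordinate) change followed by a diagonal phase-convention
    change to a one-body matrix. Both steps are unitary, so the transfer between
    the two representations is exact up to round-off.

    h      : matrix in the source basis (e.g. axial HO states)
    T      : unitary overlap matrix, T[i, j] = <target_i | source_j>
    phases : per-state phase angles fixing the target code's sign/phase convention
    """
    U = np.diag(np.exp(1j * np.asarray(phases))) @ T       # combined unitary map
    assert np.allclose(U @ U.conj().T, np.eye(U.shape[0]), atol=1e-10)
    return U @ h @ U.conj().T                              # h' = U h U^+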

• Interface fully working for spherical HO bases (precision of restart at the 10^-4 – 10^-6 level)

• Memory issue for deformed bases

HFBTHO: axial symmetry, cylindrical coordinates, time-reversal symmetry, j-block diagonalization

HFODD: symmetry-unrestricted, Cartesian coordinates, y-simplex eigenbasis, no time-reversal symmetry, full diagonalization

Page 7:

Optimization – HFODD Profiling

Broyden routine: storage of N_Broyden fields on the 3D Gauss-Hermite mesh

Temporary array allocation for HFB matrix diagonalization
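The memory cost of the Broyden update noted above comes from storing the last few residual/update vectors, each the size of the full set of fields on the mesh. A limited-memory sketch of a Broyden-type (second method) mixing for the fixed point x = G(x); this is a generic illustration, not the exact modified-Broyden scheme implemented in HFODD:

import numpy as np

def broyden2_mixing(G, x0, alpha=0.5, m=7, tol=1e-8, maxit=200):
    """Limited-memory Broyden (second method) acceleration of the fixed point
    x = G(x), written for F(x) = G(x) - x. Only the last m update pairs are
    stored, which is what bounds the memory footprint of the mixing step."""
    x = np.asarray(x0, dtype=float)
    F = G(x) - x
    us, dFs = [], []                      # stored rank-1 corrections to the inverse Jacobian

    def apply_H(v):                       # H0 = -alpha*I (plain linear mixing) + corrections
        w = -alpha * v
        for u, dF in zip(us, dFs):
            w += u * (dF @ v) / (dF @ dF)
        return w

    for it in range(maxit):
        if np.linalg.norm(F) < tol:
            return x, it
        x_new = x - apply_H(F)            # quasi-Newton step x_{k+1} = x_k - H_k F_k
        F_new = G(x_new) - x_new
        dx, dF = x_new - x, F_new - F
        us.append(dx - apply_H(dF))       # enforces the secant condition H_{k+1} dF = dx
        dFs.append(dF)
        if len(us) > m:                   # limited memory: drop the oldest pair
            us.pop(0); dFs.pop(0)
        x, F = x_new, F_new
    return x, maxit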

[Figure: memory profile of HFODD (neutrons and protons), calculations by J. McDonnell; the safe memory-per-core limit on Jaguar/Franklin is indicated.]

Page 8:

Optimization – HFODD Parallelization

• Two levels of parallelism handled by a simple MPI group structure:

– Nuclear configuration (Z, N, interaction, {Qλμ}, etc.)

– HFB solver

• Standard PBLAS and ScaLAPACK libraries for distributed linear algebra

• Natural splitting of the HFB matrix (OpenMP): perhaps not scalable enough

• Splitting:

– HFB matrix into N blocks

– Eigenfunctions keep the same N-block splitting

– Densities must be reconstructed piecewise

• Challenges:

– Identify a self-contained set of all matrices required for one iteration

– Handling of conserved symmetries: they give a different block structure

– Identify and replace all BLAS calls by PBLAS equivalents
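A minimal sketch of the two-level MPI group structure, in mpi4py for illustration (cores_per_hfb is an assumed parameter; in the production Fortran code the sub-communicator would back a BLACS context so that PBLAS/ScaLAPACK can distribute the HFB matrix blocks over it):

from mpi4py import MPI

def split_communicators(cores_per_hfb):
    """Two-level parallelism sketch: MPI_COMM_WORLD is split into groups, each group
    handling one nuclear configuration (Z, N, interaction, constraints, ...); the
    sub-communicator is what the distributed HFB solver would work on."""
    world = MPI.COMM_WORLD
    group_id = world.Get_rank() // cores_per_hfb             # which configuration this rank serves
    solver_comm = world.Split(color=group_id, key=world.Get_rank())
    return group_id, solver_comm

if __name__ == "__main__":
    gid, comm = split_communicators(cores_per_hfb=64)        # illustrative group size
    print(f"world rank {MPI.COMM_WORLD.Get_rank()}: configuration group {gid}, local rank {comm.Get_rank()}")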


Page 9:

Optimization - Finite-size spin instabilities

• Response of the nucleus to a perturbation with finite momentum q, studied within RPA

• Channels: scalar-isoscalar, scalar-isovector, vector-isoscalar, vector-isovector, etc.
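Schematically, and assuming a single zero-range residual coupling per channel, an instability appears in channel α when the static RPA response diverges at some transferred momentum q, i.e. when

\[
1 \;-\; v^{(\alpha)}_{\mathrm{res}}(q)\,\Pi_{0}(q,\omega=0) \;=\; 0 ,
\]

where Π₀ is the uncorrelated particle-hole (Lindhard-type) propagator. The full Skyrme-RPA expressions couple several densities and are more involved (cf. the references below); this one-channel form is only meant to show where the divergence comes from.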

Modern Skyrme functionals are highly unstable with respect to finite-size spin perturbations!

[Figure: convergence of HFB calculations of 100 blocked states in 157-165Ba; the region of instability is marked.]

T. Lesinski et al., Phys. Rev. C 74, 044315 (2006); D. Davesne et al., arXiv:0906.1927 (2009)

Warning for the next generation of functionals: stability must be assessed!

Page 10:

Work in progress - Fission

• Example of challenges for next-generation DFT: microscopic description of nuclear fission

• Degrees of freedom at the HFB level: deformation, temperature

• Potential energy surfaces depend critically on the interaction/functional and on pairing correlations

• Computational tools: Augmented Lagrangian Method, Broyden Method

• Precision tools: large bases, benchmarks

• Distributed computing tools: MPI wrapper, load balancing, efficient independent constrained calculations

Static HFB prerequisites
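As an illustration of the first computational tool, a minimal augmented-Lagrangian sketch for a single constraint ⟨Q⟩ = q_target; minimize_hfb is a hypothetical inner solver returning the converged state and ⟨Q⟩, and the actual scheme used in HFODD may differ in detail.

def augmented_lagrangian_constraint(minimize_hfb, q_target, lam0=0.0, c=0.1,
                                    tol=1e-4, maxit=50):
    """Outer loop of an augmented-Lagrangian treatment of one constraint <Q> = q_target
    (e.g. a multipole moment on a potential energy surface). Each outer step minimizes
    the penalized energy E + lam*(<Q> - q_target) + (c/2)*(<Q> - q_target)**2 via the
    inner self-consistent HFB solve, then updates the multiplier."""
    lam = lam0
    for it in range(maxit):
        state, q_val = minimize_hfb(lam, c, q_target)   # inner constrained HFB calculation
        violation = q_val - q_target
        if abs(violation) < tol:
            return state, lam, it                       # constraint satisfied to tolerance
        lam = lam + c * violation                       # standard multiplier update
    raise RuntimeError("constraint not satisfied within maxit outer iterations")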

Page 11:

DFT Computing Infrastructure

Interfacing codes

Parallelize solver

Load balancing

Page 12:

Deliverables Year 2-3

• Have a DFT package combining HFB-THO and HFODD available for large-scale calculations

• Optimize full diagonalization of “large” (4,000 × 4,000) matrices in HFODD

– Take advantage of N-core architecture

– Increase speed for large bases (fission, heavy nuclei)

– Overcome current memory limitations

• Optimize Broyden method (Cf. Jorge’s talk) to improve stability/convergence

• Papers on odd nuclei:

1. Methodology and Theoretical Models

2. Systematics and comparison with experiment

Current status (Year 2-3 work plan):

• Done (for spherical bases): large-scale calculations up to 14,112 cores (2 hours)

• Well on target:

– Parallelization of the HFODD core (PBLAS, ScaLAPACK)

– Will solve issues related to speed, memory and precision

– Change of iteration cycle: updating HFB matrix elements instead of fields

• Done: numerical instabilities of large-scale calculations can be traced back to physical instabilities built into current functionals (see Mario's talk)

• Delayed by the instability problem:

– Paper 1 ready to be published

– Paper 2 in preparation

– Additional Paper 3 on finite-size spin instabilities in preparation

Page 13:

Work Plan (Year 4)

• Physics

– Optimization of DME-based functionals: genetic algorithm + Argonne optimizer (cf Mario’s talk)

– Applications of DME functionals: UNEDF-1

• Computing

– Implement DME functionals in HFODD (study of time-odd channels)

– Complete version 1.0 of the parallel HFODD core: demonstrate efficiency and scalability of the code; first applications: N-dimensional potential energy surfaces, fission pathways

– Improve the parallel interface to HFODD. Optimistic: it should be a good application of ADLB (“moderately long to long” work units of 1-2 hours, little communication). Realistic: remove the master and have him work like a slave (French Revolution spirit); a generic master/worker sketch follows at the end of this list.

– Replace sequential I/O by parallel I/O for HFODD records (used as checkpoints)
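The sketch referred to above: a generic master/worker load-balancing loop in mpi4py, purely illustrative (run_hfb_point and the task list are assumed placeholders; ADLB itself and the "no dedicated master" variant are not shown).

from mpi4py import MPI

TASK_TAG, STOP_TAG = 1, 2

def run_master_worker(tasks, run_hfb_point):
    """Dynamic load balancing for independent constrained-HFB work units (e.g. points
    of a potential energy surface): rank 0 hands out tasks on demand so slow points
    do not stall the whole job. Rank 0 is a dedicated master here, which is exactly
    what the 'realistic' option above would remove."""
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    if rank == 0:
        status, results, next_task, stopped = MPI.Status(), [], 0, 0
        while stopped < size - 1:
            result = comm.recv(source=MPI.ANY_SOURCE, status=status)   # ready signal or finished task
            if result is not None:
                results.append(result)
            if next_task < len(tasks):
                comm.send(tasks[next_task], dest=status.Get_source(), tag=TASK_TAG)
                next_task += 1
            else:
                comm.send(None, dest=status.Get_source(), tag=STOP_TAG)
                stopped += 1
        return results
    else:
        comm.send(None, dest=0)                     # announce availability
        status = MPI.Status()
        while True:
            task = comm.recv(source=0, status=status)
            if status.Get_tag() == STOP_TAG:
                return None
            comm.send(run_hfb_point(task), dest=0)  # return the result and ask for more work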

Remainder of the year:

• New version of HFODD: HFBTHO interface, shell correction, finite temperature, Augmented Lagrangian Method, matrix-element mixing, parallel interface, etc.

• 2 papers on odd nuclei and 1 on spin instabilities in preparation

Page 14:


Nuclear Structure and Nuclear Interactions

Forefront Questions in Nuclear Science and the Role of High Performance Computing January 26-28, 2009 · Washington, D.C.

Microscopic Description of Nuclear Fission

Scientific and computational challenges

• Describe dynamics with novel energy functionals and ab initio methods

1) adiabatic approach; 2) non-adiabatic/early stochastic; 3) full time-dependent dynamics

• Develop ultra-scale techniques for the description of fission

• Build a spectroscopic precision nuclear energy density functional

• Perform constrained minimization on a multi-dimensional potential energy surface

• Find the full spectrum of dense, million-sized matrices

• Predict half-lives, mass and kinetic-energy distributions of fission fragments, and fission cross-sections

• Analyze the fission process through the visualization of time evolution

• Develop scalable application software for time-dependent many-body dynamics

• Societal impact: nuclear energy programs, threat reduction, NNSA Stockpile Stewardship Program

• Time-dependent many-body dynamics: low-energy heavy-ion collisions and nucleon- and photon-induced reactions, neutron-star quakes, vortex dynamics in quantum superfluids

Summary of research direction

Expected Scientific and Computational Outcomes · Potential Impact on Nuclear Science

Our Holy Grail…