05.07.2009
![Page 1: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/1.jpg)
05.07.2009
SciDAC Progress Report:
Algorithms and Parallel Methods for Reactive Atomistic Simulations
![Page 2: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/2.jpg)
Project Accomplishments
• Novel algorithms (solvers, data structures) for reactive simulations
• Comprehensive validation
• Parallel formulation, implementation, performance characterization, and optimization
• Software release in the public domain
• Incorporation of solvers into LAMMPS
![Page 3: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/3.jpg)
Project Accomplishments: Algorithms and Data Structures
• Optimal dynamic data structures for 2-, 3-, and 4-body interactions
• Krylov subspace solvers for charge equilibration
– Effective preconditioners (Block Jacobi)
– Reusing subspaces, selective orthogonalization
– Effective initialization strategies
![Page 4: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/4.jpg)
Project Accomplishments: Comprehensive Validation
• In-house validation on
– Bulk water
– Silica
– Hydrocarbons (hexane, cyclohexane)
• Collaborative validation on a number of other systems (please see software release)
![Page 5: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/5.jpg)
Project Accomplishments: Parallel Implementation
Highly optimized parallel formulation validated on bgl (BG/L), Jaguar (XT4), and Ranger (Sun), among others.
Optimizations to other platforms under way.
Parallel code in limited release (to Purdue, MIT, and NIST).
![Page 6: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/6.jpg)
Project Accomplishments: Software Release
Code Release (limited public release)
• Purdue (Strachan et al., Si/Ge/Si Nanorods)
• CalTech (Goddard et al., Force field development)
• MIT (Buehler et al., Silica/water)
• PSU (van Duin et al., Force field development)
• USF (Pandit et al., Silica/water interface)
• UIUC (Aluru et al.)
• Sandia (Thompson, LAMMPS development)
• Norwegian Institute for Science and Technology (IBM/AIX optimization)
![Page 7: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/7.jpg)
Project Accomplishments: LAMMPS Development
• Charge equilibration implemented as a fix in LAMMPS
– Fully validated for accuracy and performance
• Preliminary implementation of ReaxFF in LAMMPS
– Student at Sandia to complete the implementation over the summer
![Page 8: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/8.jpg)
Project Accomplishments: Details
• The dominant computational cost is associated with the following force field computations:
– Bonded potential
– Non-bonded potential
– Neighbor potential
– Charge equilibration (qEq)
![Page 9: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/9.jpg)
Project Accomplishments: Details
• Bonded, non-bonded, and neighbor potentials require efficient (dynamic) data structures. Their computation is also typically highly optimized through lookups.
• Charge equilibration minimizes electrostatic energy to compute partial charges on atoms. This can be linearized and solved at each timestep using iterative solvers such as CG and GMRES.
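The dynamic neighbor structures mentioned above are commonly built on a cell list, which is rebuilt as atoms move. The following is a minimal sketch of that generic technique, not the project's actual data structure; the function names `min_image_dist` and `build_neighbor_pairs`, the cubic periodic box, and the requirement that the cutoff not exceed half the box length are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def min_image_dist(a, b, box):
    """Distance between two points under the minimum-image convention."""
    d2 = 0.0
    for u, v in zip(a, b):
        d = abs(u - v) % box
        d = min(d, box - d)
        d2 += d * d
    return math.sqrt(d2)

def build_neighbor_pairs(positions, box, cutoff):
    """Return sorted (i, j) pairs, i < j, closer than cutoff in a periodic cube."""
    ncell = max(1, int(box // cutoff))   # cells are at least one cutoff wide
    size = box / ncell
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x / size) % ncell, int(y / size) % ncell, int(z / size) % ncell)
        cells[key].append(idx)
    pairs = set()                        # a set removes duplicates from wrapped cells
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            key = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
            for i in members:
                for j in cells.get(key, ()):
                    if i < j and min_image_dist(positions[i], positions[j], box) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)
```

Only atoms in the same or adjacent cells are compared, so the cost grows roughly linearly with the number of atoms at fixed density instead of quadratically.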
![Page 10: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/10.jpg)
Accurate Charge Equilibration is Essential to Modeling Fidelity
![Page 11: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/11.jpg)
Computational Cost of Charge Equilibration
• Charge equilibration dominates the overall cost at the required (low) error tolerances and for larger systems.
• Efficient solvers for charge equilibration are critical.
![Page 12: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/12.jpg)
Algorithms for Charge Equilibration
• At required tolerances and for larger systems (10^6 atoms and beyond), charge equilibration can take over 75% of total simulation time.
• Efficient algorithms for solving the linear system are essential.
• We implement a number of techniques to accelerate the solve:
– Effective preconditioners (nested, Block Jacobi)
– Reuse of Krylov subspaces (solution spaces are not likely to change significantly across timesteps)
– Selective reorthogonalization (orthogonalization is the major bottleneck for scalability)
– Initial estimates through higher-order extrapolation
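One way to read "initial estimates through higher-order extrapolation" is to predict the next solution vector from the solutions of the previous few timesteps, so the iterative solver starts close to the answer. The sketch below assumes polynomial extrapolation of up to second order over equally spaced timesteps; the slides do not state the order or formula used, and the function name `extrapolate_initial_guess` is hypothetical.

```python
def extrapolate_initial_guess(history):
    """Predict the next solution vector from up to three previous ones.

    history: list of earlier solution vectors, most recent last.
    Assumes equally spaced timesteps (an assumption of this sketch).
    """
    if not history:
        raise ValueError("need at least one previous solution")
    if len(history) == 1:
        return list(history[-1])                       # constant (zeroth order)
    if len(history) == 2:
        older, newer = history
        return [2.0 * b - a for a, b in zip(older, newer)]   # linear
    a, b, c = history[-3:]
    # Quadratic extrapolation: x(t) = 3 x(t-1) - 3 x(t-2) + x(t-3)
    return [x - 3.0 * y + 3.0 * z for x, y, z in zip(a, b, c)]

# A quantity following t**2 at t = 0, 1, 2 is predicted exactly at t = 3.
guess = extrapolate_initial_guess([[0.0], [1.0], [4.0]])
```

A better starting vector reduces the initial residual, which in turn reduces the number of CG or GMRES iterations needed to reach a fixed tolerance.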
![Page 13: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/13.jpg)
Algorithms for Charge Equilibration
• Accelerating GMRES/CG for charge equilibration
– The kernel for the matrix is shielded electrostatics.
– The electrostatics is cut off, typically at 7–10 Å.
– An implicit Block Jacobi accelerator can be constructed from a near-block (say, a 4 Å neighborhood).
– The inverse block can be explicitly computed and reused.
– Alternatively, an inner-outer scheme successively increases the cutoff and uses the shorter cutoff to precondition the outer, longer cutoff.
– Both schemes are implemented in parallel and show excellent scaling. Relative performance is system dependent.
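The Block Jacobi idea above can be sketched as a preconditioned CG in which each diagonal block is inverted once and the inverse is reused in every iteration. This is an illustrative sketch only: it uses contiguous index ranges as a stand-in for the spatial near-blocks described on the slide, operates on small dense matrices in pure Python, and all function names are hypothetical.

```python
import math

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def invert(block):
    """Gauss-Jordan inverse of a small dense block with partial pivoting."""
    n = len(block)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(block)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def block_jacobi(A, block_size):
    """Invert each diagonal block once; the inverses are reused every iteration."""
    blocks = []
    for start in range(0, len(A), block_size):
        end = min(start + block_size, len(A))
        blocks.append((start, invert([row[start:end] for row in A[start:end]])))
    return blocks

def apply_block_jacobi(blocks, r):
    z = [0.0] * len(r)
    for start, inv in blocks:
        seg = r[start:start + len(inv)]
        for i, row in enumerate(inv):
            z[start + i] = dot(row, seg)
    return z

def pcg(A, b, blocks, tol=1e-10, maxiter=200):
    """Preconditioned CG for symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_block_jacobi(blocks, r)
    p = list(z)
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b)) or 1.0
    for it in range(1, maxiter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, it
        z = apply_block_jacobi(blocks, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxiter

# Toy matrix: strong coupling inside 2x2 blocks, weak coupling elsewhere.
A = [[4.0 if i == j else (1.0 if i // 2 == j // 2 else 0.1) for j in range(6)]
     for i in range(6)]
x_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x, iterations = pcg(A, matvec(A, x_true), block_jacobi(A, 2))
```

The preconditioner captures the strong short-range coupling, leaving the Krylov iteration to resolve only the weaker long-range part.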
![Page 14: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/14.jpg)
Serial and Parallel Performance
![Page 15: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/15.jpg)
Single Processor Performance Profiling
• Memory usage and runtimes (NVE water; 648, 6540, 13080, and 26160 atoms)
• Relative cost of various phases
![Page 16: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/16.jpg)
Single Processor Performance Profiling
• Our code is highly optimized. Compared with a traditional (non-reactive) MD code (Gromacs), it was only 3x slower (tested on water and hexane).
• Our code has a very low memory footprint. This is essential because it lets us scale to larger problem instances, which facilitates scaling to large machine configurations.
![Page 17: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/17.jpg)
Parallel Performance
• A number of optimizations have been implemented:
– Trading off redundant computations for messages
– Efficient use of shadow domains and the midpoint method for minimizing redundant computations
– Reducing the number of orthogonalizations in charge equilibration
– Platform-specific optimizations
![Page 18: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/18.jpg)
Parallel Performance
• Performance characterized primarily on two platforms:
– Code achieved 81% efficiency on 1024 cores of Ranger at approximately 6100 atoms/core (1.9 s/timestep for a 6.2M-atom system)
– Code achieved 77% efficiency on 8K cores of a BG/L at approximately 600 atoms/core (1.1 s/timestep for a 4.8M-atom system)
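The atoms-per-core figures above follow directly from the system sizes and core counts. The short calculation below reproduces them and also derives the core-seconds spent per atom per timestep, a derived quantity that is not on the slide. It assumes "8K cores" means 8192; the function name `per_core_stats` is hypothetical.

```python
def per_core_stats(atoms, cores, seconds_per_step):
    """Return (atoms per core, core-seconds per atom per timestep)."""
    atoms_per_core = atoms / cores
    core_seconds_per_atom_step = seconds_per_step * cores / atoms
    return atoms_per_core, core_seconds_per_atom_step

ranger = per_core_stats(6.2e6, 1024, 1.9)   # about 6055 atoms/core
bgl = per_core_stats(4.8e6, 8192, 1.1)      # about 586 atoms/core (8K taken as 8192)
```

Both results are consistent with the rounded "approximately 6100" and "approximately 600" atoms/core quoted above.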
![Page 19: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/19.jpg)
Ongoing Work
Near Term (12 months)
– Integrating our reactive atomistic framework into LAMMPS (graduate student Metin Aktulga is spending the summer with Aidan Thompson and Steve Plimpton at Sandia)
– Parallelizing the GMRES qEq fix in LAMMPS
– Sampling techniques for force-field optimization
![Page 20: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/20.jpg)
Ongoing Work
Medium to Long Term (24-36 months)
– Advanced accelerators for qEq (multipole-type hierarchical preconditioners)
– Platform-specific optimizations (Tesla/GPU, RoadRunner)
– Supporting hybrid force fields (reactive and non-reactive force fields)
– Novel solvers, in particular SPIKE-based techniques
![Page 21: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/21.jpg)
Additional Material
![Page 22: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/22.jpg)
Charge Equilibration (QEq) Method
• Expand electrostatic energy as a Taylor series in charge around neutral charge.
• Identify the term linear in charge as electronegativity of the atom and the quadratic term as electrostatic potential and self energy.
• Using these, solve for self-term of partial derivative of electrostatic energy.
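As a hedged illustration of the expansion described above, the energy of an isolated atom $i$ can be written to second order in its charge. The symbols $X_i$ and $J_i$ follow the notation used elsewhere in this deck; identifying them with the first and second derivatives is the standard QEq reading and is assumed here, not stated on the slide.

$$E_i(q_i) \approx E_i(0) + q_i \left.\frac{\partial E_i}{\partial q_i}\right|_{q_i=0} + \frac{1}{2} q_i^2 \left.\frac{\partial^2 E_i}{\partial q_i^2}\right|_{q_i=0} = E_i(0) + X_i q_i + \frac{1}{2} J_i q_i^2$$

Summing these single-atom terms and adding the pairwise electrostatic interaction between different atoms gives a total energy that is quadratic in the charges.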
![Page 23: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/23.jpg)
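QEq Method
We need to minimize:
$$E_{ele} = \sum_i X_i q_i + \frac{1}{2} \sum_i \sum_j H_{ij} q_i q_j$$
subject to:
$$\sum_i q_i = 0$$
where
$$H_{ij} = J_i \delta_{ij} + \frac{1 - \delta_{ij}}{\left( r_{ij}^3 + \gamma_{ij}^{-3} \right)^{1/3}}$$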
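The shielded kernel can be sketched in a few lines. This example assumes the form $1/(r^3 + \gamma^{-3})^{1/3}$ for the off-diagonal entries and, for simplicity, a single shielding parameter `gamma` for all pairs (the general case has a pair-dependent value). No cutoff or periodic images are applied, and the function names are hypothetical.

```python
import math

def shielded_kernel(r, gamma):
    """Shielded Coulomb kernel: tends to gamma as r -> 0 and to 1/r for large r."""
    return 1.0 / (r ** 3 + gamma ** -3) ** (1.0 / 3.0)

def build_h(positions, J, gamma):
    """Dense, symmetric H with the self term J[i] on the diagonal."""
    n = len(positions)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = J[i]
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            H[i][j] = H[j][i] = shielded_kernel(r, gamma)
    return H

H = build_h([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], [10.0, 12.0, 11.0], 1.0)
```

The shielding keeps the kernel finite when two atoms approach each other, while the long-range behavior remains that of bare electrostatics.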
![Page 24: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/24.jpg)
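QEq Method
Introducing a Lagrange multiplier $u$ for the charge neutrality constraint:
$$E_u(\{q_i\}) = E_{ele}(\{q_i\}) + u \sum_i q_i$$
$$\frac{\partial E_u}{\partial q_i} = X_i + u + \sum_j H_{ij} q_j = 0$$
$$\sum_j H_{ij} q_j = -X_i - u$$
$$H \tilde{q} = -\tilde{X} - u \tilde{1}$$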
![Page 25: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/25.jpg)
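QEq Method
$$q_i = -\sum_k H^{-1}_{ik} \left( X_k + u 1_k \right)$$
$$\sum_i q_i = -\sum_i \sum_k H^{-1}_{ik} X_k - u \sum_i \sum_k H^{-1}_{ik} 1_k = 0$$
From charge neutrality, we get:
$$u = -\frac{\sum_i \sum_k H^{-1}_{ik} X_k}{\sum_i \sum_k H^{-1}_{ik} 1_k}$$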
![Page 26: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/26.jpg)
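QEq Method
Let
$$u = -\frac{\sum_i s_i}{\sum_i t_i}$$
where
$$s_i = -\sum_k H^{-1}_{ik} X_k, \qquad t_i = -\sum_k H^{-1}_{ik} 1_k$$
or
$$\sum_i H_{ki} s_i = -X_k, \qquad \sum_i H_{ki} t_i = -1$$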
![Page 27: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/27.jpg)
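QEq Method
Substituting back, we get:
$$q_i = s_i + u t_i = s_i - \frac{\sum_j s_j}{\sum_j t_j} t_i$$
We need to solve two linear systems of $n$ equations each (2n equations in total) with kernel H, for $s_i$ and $t_i$.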
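The two-solve procedure can be sketched as follows. This is a minimal illustration, assuming the sign convention $H s = -X$, $H t = -1$ and $q_i = s_i - (\sum_j s_j / \sum_j t_j) t_i$, a small dense symmetric positive definite H, and plain unpreconditioned CG; the function names `cg` and `qeq_charges` are hypothetical and this is not the project's solver.

```python
import math

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cg(A, b, tol=1e-12, maxiter=1000):
    """Plain conjugate gradient for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    bnorm = math.sqrt(rr) or 1.0
    for _ in range(maxiter):
        if math.sqrt(rr) <= tol * bnorm:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def qeq_charges(H, X):
    """Solve H s = -X and H t = -1, then combine so the charges sum to zero."""
    s = cg(H, [-x for x in X])
    t = cg(H, [-1.0] * len(X))
    ratio = sum(s) / sum(t)
    return [si - ratio * ti for si, ti in zip(s, t)]

# Two-atom toy system: the analytic answer is q = (X2 - X1) / (J1 + J2 - 2 H12).
q = qeq_charges([[10.0, 2.0], [2.0, 12.0]], [5.0, 8.0])
```

Because both systems share the same matrix, a preconditioner built for one solve can be reused for the other.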
![Page 28: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/28.jpg)
QEq Method
Observations:
• H is dense.
• The diagonal term is $J_i$.
• The shielding term is short-range.
• Long-range behavior of the kernel is 1/r.
![Page 29: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/29.jpg)
Validation: Water System
![Page 30: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/30.jpg)
Hexane (@200K) and cyclohexane (@300K) - liquid phase
~10000 atoms randomly placed around lattice points in a cube
NVT (@200K for hexane, @300K for cyclohexane); the cube is shrunk by 1 Å on each side after every 7500 steps, which provides another way to measure density.
![Page 31: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/31.jpg)
Validation: Silica-Water
![Page 32: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/32.jpg)
Validation: Silica-Water
![Page 33: 05.07.2009](https://reader035.vdocuments.us/reader035/viewer/2022062723/56813f55550346895daa1d54/html5/thumbnails/33.jpg)
Validation: Silica-Water