05.07.2009

SciDAC Progress Report:

Algorithms and Parallel Methods for Reactive Atomistic Simulations

Project Accomplishments

Novel algorithms (solvers, data structures) for reactive simulations

Comprehensive validation

Parallel formulation, implementation, performance characterization and optimization

Software release in the public domain

Incorporation of solvers into LAMMPS

Project Accomplishments: Algorithms and Data Structures

Optimal dynamic data structures for 2-, 3-, and 4-body interactions

Krylov subspace solvers for charge equilibration
– Effective preconditioners (Block Jacobi)
– Reusing subspaces, selective orthogonalization
– Effective initialization strategies

Project Accomplishments: Comprehensive Validation

In-house validation on:
– Bulk water
– Silica
– Hydrocarbons (hexane, cyclohexane)

Collaborative validation on a number of other systems (please see software release)

Project Accomplishments: Parallel Implementation

Highly optimized parallel formulation validated on bgl (BG/L), Jaguar (XT4), and Ranger (Sun), among others.

Optimizations to other platforms under way.

Parallel code in limited release (to Purdue, MIT, and NIST).

Project Accomplishments: Software Release

Code Release (limited public release)
– Purdue (Strachan et al., Si/Ge/Si Nanorods)
– CalTech (Goddard et al., Force field development)
– MIT (Buehler et al., Silica/water)
– PSU (van Duin et al., Force field development)
– USF (Pandit et al., Silica/water interface)
– UIUC (Aluru et al.)
– Sandia (Thompson, LAMMPS development)
– Norwegian Institute for Science and Technology (IBM/AIX optimization)

Project Accomplishments: LAMMPS Development

Charge equilibration implemented as a Fix in LAMMPS
– Fully validated for accuracy and performance

Preliminary implementation of ReaxFF in LAMMPS
– Student at Sandia to complete implementation over summer

Project Accomplishments: Details

• The dominant computational cost is associated with the following force field computations:
– Bonded potential
– Non-bonded potential
– Neighbor potential
– Charge equilibration (qEq)

Project Accomplishments: Details

• Bonded, non-bonded, and neighbor potentials require efficient (dynamic) data structures. Their computation is also typically highly optimized through lookups.

• Charge equilibration minimizes electrostatic energy to compute partial charges on atoms. This can be linearized and solved at each timestep using iterative solvers such as CG and GMRES.

Accurate Charge Equilibration is Essential to Modeling Fidelity

• Charge equilibration dominates overall cost at the required (low) error tolerances and for larger systems.

• Efficient solvers for charge equilibration are critical.

Computational Cost of Charge Equilibration

Algorithms for Charge Equilibration

• At required tolerances and for larger systems (10^6 atoms and beyond), charge equilibration can take over 75% of total simulation time.

• Efficient algorithms for solving the linear system are essential.

• We implement a number of techniques to accelerate the solve:
– Effective preconditioners (nested, Block Jacobi)
– Reuse of Krylov subspaces (solution spaces are not likely to change significantly across timesteps)
– Selective reorthogonalization (orthogonalization is the major bottleneck for scalability)
– Initial estimates through higher order extrapolation.
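The last technique, seeding the iterative solver with an extrapolated initial estimate, can be illustrated with a minimal sketch. This is not the project's code: the function name `extrapolate_guess`, the quadratic order, and the assumption of equally spaced timesteps are all choices made here for illustration.

```python
import numpy as np

def extrapolate_guess(history):
    """Extrapolate the next solution from up to three previous ones.

    history: list of 1-D arrays from equally spaced timesteps, oldest first.
    (Hypothetical helper; the extrapolation order is an assumption.)
    """
    if len(history) == 0:
        raise ValueError("need at least one previous solution")
    if len(history) == 1:
        return history[-1].copy()               # zeroth order: reuse last solution
    if len(history) == 2:
        return 2.0 * history[-1] - history[-2]  # linear extrapolation
    # Quadratic extrapolation through the last three solutions.
    return 3.0 * history[-1] - 3.0 * history[-2] + history[-3]

def q_of_t(t):
    """Toy two-atom charge trajectory that varies quadratically in time."""
    q0 = 0.1 + 0.02 * t + 0.003 * t**2
    return np.array([q0, -q0])

# Charges from three past steps predict the fourth step.
past = [q_of_t(t) for t in (0.0, 1.0, 2.0)]
guess = extrapolate_guess(past)
error = np.max(np.abs(guess - q_of_t(3.0)))
print("max error of extrapolated guess:", error)
```

The better the initial estimate, the smaller the initial residual the Krylov solver has to reduce, which is why smoothly varying charges across timesteps make this inexpensive step worthwhile.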

Algorithms for Charge Equilibration

• Accelerating GMRES/CG for charge equilibration

– The kernel for the matrix is shielded electrostatics

– The electrostatics is cut off, typically at 7–10 Å

– An implicit Block Jacobi accelerator can be constructed from a near-block (say, a 4 Å neighborhood)

– The inverse block can be explicitly computed and reused

– Alternately, an inner-outer scheme successively increases cutoff and uses the shorter cutoff to precondition the outer, longer cutoff

– Both schemes implemented in parallel and show excellent scaling. Relative performance is system dependent.
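The Block Jacobi idea above (invert the near-block once, then reuse it as a preconditioner) can be sketched as follows. This is a serial NumPy illustration under stated assumptions, not the parallel implementation: blocks are taken as contiguous index ranges rather than spatial neighborhoods, the test matrix is synthetic, and the names `block_jacobi_inverse`, `apply_preconditioner`, and `pcg` are hypothetical.

```python
import numpy as np

def block_jacobi_inverse(A, block_size):
    """Explicitly invert the diagonal blocks of A (computed once, then reused)."""
    n = A.shape[0]
    blocks = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        blocks.append((start, stop, np.linalg.inv(A[start:stop, start:stop])))
    return blocks

def apply_preconditioner(blocks, r):
    """Apply z = M^{-1} r block by block."""
    z = np.empty_like(r)
    for start, stop, inv_block in blocks:
        z[start:stop] = inv_block @ r[start:stop]
    return z

def pcg(A, b, blocks=None, tol=1e-10, max_iter=1000):
    """Conjugate gradient, optionally Block Jacobi preconditioned.

    Returns (solution, number of iterations).
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_preconditioner(blocks, r) if blocks is not None else r.copy()
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = apply_preconditioner(blocks, r) if blocks is not None else r.copy()
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Synthetic SPD matrix: strong diagonal blocks plus weak coupling between blocks.
rng = np.random.default_rng(0)
n, block_size = 40, 4
A = np.zeros((n, n))
for start in range(0, n, block_size):
    C = rng.standard_normal((block_size, block_size))
    A[start:start + block_size, start:start + block_size] = C @ C.T + 4.0 * np.eye(block_size)
B = rng.standard_normal((n, n))
A += 0.01 * (B + B.T)
b = rng.standard_normal(n)

blocks = block_jacobi_inverse(A, block_size)
x_plain, it_plain = pcg(A, b)
x_prec, it_prec = pcg(A, b, blocks=blocks)
print("iterations without / with preconditioner:", it_plain, it_prec)
```

In a molecular system the block would instead collect the atoms within the short (for example 4 Å) neighborhood, so that the inverted block captures the strongest part of the shielded kernel.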

Serial and Parallel Performance

• Memory usage and runtimes (NVE water, 648, 6540, 13080, 26160 atoms).

• Relative cost of various phases

Single Processor Performance Profiling

• Our code is highly optimized. Compared with a traditional (non-reactive) MD code (GROMACS), it was only 3x slower (tested on water and hexane).

• Our code has a very low memory footprint. This is essential since it allows us to scale problems to larger instances, facilitating scalability to large machine configurations

Parallel Performance

• A number of optimizations have been implemented:
– Trading off redundant computations for messages
– Efficient use of shadow domains and the midpoint method for minimizing redundant computations
– Reducing the number of orthogonalizations in charge equilibration
– Platform-specific optimizations

Parallel Performance

• Performance characterized primarily on two platforms:
– Code achieved 81% efficiency on 1024 cores of Ranger at approximately 6100 atoms/core (1.9 s/timestep for a 6.2M atom system)
– Code achieved 77% efficiency on 8K cores of a BG/L at approximately 600 atoms/core (1.1 s/timestep for a 4.8M atom system)
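The per-core figures quoted for the two platforms can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above and assumes that "8K cores" means 8192; the derived throughput (atom-timesteps per core-second) is a quantity introduced here for comparison and is not a figure from the report.

```python
# (atoms in system, cores, seconds per timestep), as quoted above.
# Assumption: "8K cores" is taken to mean 8 * 1024 = 8192.
runs = {
    "Ranger": (6.2e6, 1024, 1.9),
    "BG/L": (4.8e6, 8 * 1024, 1.1),
}

atoms_per_core = {}
throughput = {}
for name, (atoms, cores, sec_per_step) in runs.items():
    atoms_per_core[name] = atoms / cores
    # Atom-timesteps processed per core per second of wall-clock time.
    throughput[name] = atoms / (cores * sec_per_step)
    print(f"{name}: {atoms_per_core[name]:.0f} atoms/core, "
          f"{throughput[name]:.0f} atom-steps per core-second")
```

The two runs sit at very different loads per core, roughly a factor of ten apart, so the efficiencies are not directly comparable without accounting for that difference.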

Ongoing Work

Near Term (12 months)
– Integrating our reactive atomistic framework into LAMMPS (graduate student Metin Aktulga spending summer with Aidan Thompson and Steve Plimpton at Sandia)
– Parallelizing the GMRES qEq fix to LAMMPS
– Sampling techniques for force-field optimization

Ongoing Work

Medium to Long Term (24-36 months)
– Advanced accelerators for qEq (multipole-type hierarchical preconditioners)
– Platform-specific optimizations (Tesla/GPU, RoadRunner)
– Supporting hybrid force-fields (reactive and non-reactive force fields)
– Novel solvers, in particular, SPIKE-based techniques

Additional Material

Charge Equilibration (QEq) Method

• Expand electrostatic energy as a Taylor series in charge around neutral charge.

• Identify the term linear in charge as electronegativity of the atom and the quadratic term as electrostatic potential and self energy.

• Using these, solve for self-term of partial derivative of electrostatic energy.

Qeq Method

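We need to minimize:

$$E_{ele}(\{q_i\}) = \sum_i X_i q_i + \frac{1}{2} \sum_{i,j} H_{ij} q_i q_j$$

subject to:

$$\sum_i q_i = 0$$

where

$$H_{ij} = J_i \delta_{ij} + \frac{1 - \delta_{ij}}{\left(r_{ij}^3 + \gamma_{ij}^{-3}\right)^{1/3}}$$

Here $X_i$ is the electronegativity of atom $i$, $J_i$ its self-energy term, $r_{ij}$ the interatomic distance, $\delta_{ij}$ the Kronecker delta, and $\gamma_{ij}$ the shielding parameter.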
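As a concrete illustration of the kernel, the sketch below assembles a dense QEq matrix for a handful of atoms, assuming the shielded form $H_{ii} = J_i$ and $H_{ij} = (r_{ij}^3 + \gamma_{ij}^{-3})^{-1/3}$ for $i \neq j$. It is only a sketch: there is no cutoff, no periodic boundary, and no unit conversion factor, and the function name `build_qeq_matrix` and the parameter values are hypothetical.

```python
import numpy as np

def build_qeq_matrix(positions, J, gamma):
    """Dense QEq matrix with a shielded Coulomb kernel (illustrative only).

    positions: (n, 3) array of coordinates
    J:         (n,) array of diagonal self terms
    gamma:     scalar or (n, n) array of shielding parameters
    """
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)           # pairwise distances r_ij
    H = 1.0 / np.cbrt(r**3 + gamma**-3.0)       # shielded 1/r kernel
    np.fill_diagonal(H, J)                      # diagonal term is J_i
    return H

# Three atoms: two close together, one far away (made-up values).
pos = np.array([[0.0, 0.0, 0.0],
                [100.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
J = np.array([1.0, 2.0, 3.0])
H = build_qeq_matrix(pos, J, gamma=1.0)
print(H)
```

For the distant pair the entry is very close to 1/r, while for the nearby pair the shielding keeps the entry bounded as the distance shrinks.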

Qeq Method

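$$E_u(\{q_i\}) = E_{ele}(\{q_i\}) + u \sum_i q_i$$

$$\frac{\partial E_u}{\partial q_i} = X_i + u + \sum_j H_{ij} q_j = 0$$

$$\sum_j H_{ij} q_j = -X_i - u$$

$$\tilde{H} \tilde{q} = -\tilde{X} - u \mathbf{1}$$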

Qeq Method

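$$q_i = \sum_k H^{-1}_{ik} \left(-X_k - u\,1_k\right)$$

$$\sum_i q_i = -\sum_i \sum_k H^{-1}_{ik} X_k - u \sum_i \sum_k H^{-1}_{ik} 1_k = 0$$

From charge neutrality, we get:

$$u = -\frac{\sum_i \sum_k H^{-1}_{ik} X_k}{\sum_i \sum_k H^{-1}_{ik} 1_k}$$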

Qeq Method

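Let

$$u = \frac{\sum_i s_i}{\sum_i t_i}$$

where

$$s_i = -\sum_k H^{-1}_{ik} X_k, \qquad t_i = \sum_k H^{-1}_{ik} 1_k$$

or

$$-X_k = \sum_i H_{ki} s_i, \qquad 1_k = \sum_i H_{ki} t_i$$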

Qeq Method

Substituting back, we get:

$$q_i = s_i - u\,t_i = s_i - \frac{\sum_j s_j}{\sum_j t_j}\, t_i$$

We need to solve 2n equations with kernel H for $s_i$ and $t_i$.
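The two-solve structure can be checked numerically on a small example. The sketch below assumes the sign convention $H s = -X$, $H t = 1$, $q = s - u t$ with $u = \sum_i s_i / \sum_i t_i$, uses a dense direct solve as a stand-in for the Krylov solvers discussed earlier, and runs on a synthetic symmetric positive definite matrix rather than a real QEq kernel. The function name `solve_qeq` is hypothetical.

```python
import numpy as np

def solve_qeq(H, X):
    """Charges minimizing X.q + 0.5 q.H.q subject to sum(q) = 0.

    Assumed convention: H s = -X, H t = 1, q = s - u t.
    A direct solve stands in for an iterative (CG/GMRES) solver.
    """
    s = np.linalg.solve(H, -X)                # first system:  H s = -X
    t = np.linalg.solve(H, np.ones_like(X))   # second system: H t = 1
    u = s.sum() / t.sum()                     # multiplier from charge neutrality
    return s - u * t

# Synthetic SPD matrix and electronegativities (made-up values).
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)
X = rng.standard_normal(n)

q = solve_qeq(H, X)
gradient = H @ q + X          # equals -u for every atom at the constrained minimum
print("total charge:", q.sum())
print("spread of gradient entries:", np.ptp(gradient))
```

Both systems share the same matrix, so in an iterative setting the preconditioner and any reused subspace information can serve both right-hand sides.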

Qeq Method

Observations:
– H is dense.
– The diagonal term is $J_i$.
– The shielding term is short-range.
– The long-range behavior of the kernel is 1/r.

Validation: Water System

Hexane (@200K) and cyclohexane (@300K) - liquid phase

~10000 atoms randomly placed around lattice points in a cube

NVT (@200K for hexane, @300K for cyclohexane); the cube is shrunk by 1 Å on each side after every 7500 steps, as another way to measure density.

Validation: Silica-Water
