Analysis of Mantevo MiniMD benchmark
Gaurav Chotalia
Friedrich-Alexander-University Erlangen-Nürnberg
July 6, 2016, MuCoSim SS16
Gaurav Chotalia (FAU) MiniMDJuly 6, 2016 MuCoSim SS16 1 /
12
Overview
1 MD for atomistic simulation
2 Profiling
3 Hotspot and bottlenecks
4 Performance analysis
5 Further outlook
Algorithm for MD in MiniMD
Velocity Verlet formulation:
Initialize: X (positions), V (velocities) and F (forces).
For timesteps:
  For every atom:
    Update V by 1/2 step (using F).
    Update X (using V).
    Build neighbor lists (occasionally).
    Calculate F (considering neighbors of current atom).
    Update V by 1/2 step (using new F).
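The velocity Verlet steps listed above can be sketched as a single time-step function. This is a minimal illustration, not MiniMD's integrator: the name `velocityVerletStep`, the flat array layout, the callback for the force routine, and unit mass are all assumptions made for the example. Each stage is applied to all atoms before the next stage starts.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// One velocity Verlet step for atoms of unit mass (an assumption).
// x, v, f are flat arrays of equal length; computeForce refills f from x.
void velocityVerletStep(
    std::vector<double>& x, std::vector<double>& v, std::vector<double>& f,
    double dt,
    const std::function<void(const std::vector<double>&,
                             std::vector<double>&)>& computeForce) {
  assert(x.size() == v.size() && x.size() == f.size());
  const double dtHalf = 0.5 * dt;
  for (std::size_t i = 0; i < x.size(); ++i) {
    v[i] += dtHalf * f[i];  // first half-step velocity update (old F)
    x[i] += dt * v[i];      // position update with the half-step velocity
  }
  // A neighbor list rebuild would go here, but only every few steps.
  computeForce(x, f);       // new forces at the updated positions
  for (std::size_t i = 0; i < x.size(); ++i) {
    v[i] += dtHalf * f[i];  // second half-step velocity update (new F)
  }
}
```

With a harmonic force F = -x as a stand-in for the LJ force, the total energy stays close to its initial value over many steps, which is a quick sanity check for the integrator.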
Lennard-Jones (LJ) potential and cut-off
$U_{LJ} = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$
LJ is a fast-decaying pair potential.
Force is calculated as the negative gradient of the potential.
It decreases rapidly and approaches zero.
This justifies a cut-off beyond a certain distance, which saves computations.
This calls for creating and maintaining a list of neighbors.
Figure: LJ potential (www.file.scir.org)
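A short sketch of the LJ energy and the cut-off force may help here. The function names and the choice to work with the squared distance are assumptions for illustration, not taken from the MiniMD source. The force follows from $F = -dU/dr = \frac{24\varepsilon}{r}\left[2\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$.

```cpp
#include <cassert>

// LJ pair energy U(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6],
// evaluated from the squared distance rsq to avoid a square root.
double ljEnergy(double rsq, double eps, double sigma) {
  assert(rsq > 0.0);
  const double inv = 1.0 / rsq;
  const double sr2 = sigma * sigma * inv;
  const double sr6 = sr2 * sr2 * sr2;
  return 4.0 * eps * (sr6 * sr6 - sr6);
}

// Returns F(r)/r, so the force vector on atom i is (F/r) * (x_i - x_j).
// Pairs at or beyond the cut-off contribute nothing.
double ljForceOverR(double rsq, double eps, double sigma, double rcut) {
  assert(rsq > 0.0);
  if (rsq >= rcut * rcut) return 0.0;  // cut-off: skip the pair entirely
  const double inv = 1.0 / rsq;        // the single division per pair
  const double sr2 = sigma * sigma * inv;
  const double sr6 = sr2 * sr2 * sr2;
  return 24.0 * eps * sr6 * (2.0 * sr6 - 1.0) * inv;
}
```

Only pairs inside the cut-off reach the division, which is why the fraction of neighbors within the cut-off matters for a DIV-based performance estimate.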
Neighbor list creation
Verlet list
List of atoms within sphere of radius $R_{cut} + R_{skin}$
Update list when any atom moves $\frac{1}{2} R_{skin}$
Creating list still requires checking against all atoms!!
Link Cell
Organize atoms in cells of size $R_{cut}$
Check only neighbor cells on grid for force calculations.
Scanning volume of $27 R_{cut}^3$ vs $\frac{4}{3}\pi R_{cut}^3$
Hybrid
Use link cell approach to create Verlet list.
Figure: Neighbor list (www.lammps.sandia.gov)
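The hybrid approach can be sketched as follows: atoms are first binned into cells, and the Verlet list of each atom is then filled by scanning only the 27 surrounding cells. This is a simplified illustration, not MiniMD's `Neighbor::build`. The function name, the cubic box without periodic images, and the use of full lists are assumptions; `rlist` stands for $R_{cut} + R_{skin}$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Build full Verlet lists with a link-cell grid.
// x: flat positions (3 per atom) inside the cubic box [0, box)^3.
std::vector<std::vector<int>> buildVerletLists(const std::vector<double>& x,
                                               double box, double rlist) {
  assert(box > 0.0 && rlist > 0.0 && x.size() % 3 == 0);
  const int natoms = static_cast<int>(x.size() / 3);
  // Cells are at least rlist wide, so all neighbors lie in the 27 nearby cells.
  const int ncell = std::max(1, static_cast<int>(box / rlist));
  const double cellSize = box / ncell;
  auto coord = [&](double pos) {
    return std::min(ncell - 1, std::max(0, static_cast<int>(pos / cellSize)));
  };
  auto index = [&](int cx, int cy, int cz) {
    return (static_cast<std::size_t>(cz) * ncell + cy) * ncell + cx;
  };
  // Step 1: bin every atom into its cell.
  std::vector<std::vector<int>> cells(static_cast<std::size_t>(ncell) * ncell * ncell);
  for (int i = 0; i < natoms; ++i)
    cells[index(coord(x[3 * i]), coord(x[3 * i + 1]), coord(x[3 * i + 2]))].push_back(i);
  // Step 2: for each atom, test only atoms in the surrounding cells.
  std::vector<std::vector<int>> lists(natoms);
  const double rlistSq = rlist * rlist;
  for (int i = 0; i < natoms; ++i) {
    const int cx = coord(x[3 * i]), cy = coord(x[3 * i + 1]), cz = coord(x[3 * i + 2]);
    for (int dz = -1; dz <= 1; ++dz)
      for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
          const int nx = cx + dx, ny = cy + dy, nz = cz + dz;
          if (nx < 0 || ny < 0 || nz < 0 || nx >= ncell || ny >= ncell || nz >= ncell)
            continue;  // outside the box (no periodic images in this sketch)
          for (int j : cells[index(nx, ny, nz)]) {
            if (j == i) continue;
            const double ddx = x[3 * i] - x[3 * j];
            const double ddy = x[3 * i + 1] - x[3 * j + 1];
            const double ddz = x[3 * i + 2] - x[3 * j + 2];
            if (ddx * ddx + ddy * ddy + ddz * ddz < rlistSq) lists[i].push_back(j);
          }
        }
  }
  return lists;
}
```

The distance test inside the cell scan is what reduces the scanned cube of 27 cells to the sphere actually stored in the list.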
Exploiting Newton's 3rd law
We calculate force as interaction between two particles
Use the fact $F_{ij} = -F_{ji}$
Only half of total work needs to be done.
But the resulting scattered updates of the neighbors' forces cause problems with vectorization.
Figure: Neighbor list (www.lammps.sandia.gov)
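The difference between the two list types can be sketched with a toy pair force. The 1D linear force and the function names are assumptions chosen to keep the example short; any antisymmetric pair force behaves the same way. The full-list loop writes only `f[i]`, while the half-list loop also writes `f[j]`, which is the scattered update that complicates vectorization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy antisymmetric 1D pair force on atom i due to atom j (an assumption).
inline double pairForce(double xi, double xj) { return xi - xj; }

// Full list: every pair is stored twice, only f[i] is written.
void forceFull(const std::vector<double>& x,
               const std::vector<std::vector<int>>& full,
               std::vector<double>& f) {
  assert(full.size() == x.size());
  f.assign(x.size(), 0.0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    double fi = 0.0;
    for (int j : full[i]) fi += pairForce(x[i], x[j]);
    f[i] = fi;
  }
}

// Half list: every pair is stored once; Newton's 3rd law supplies the
// reaction force, at the cost of an indirect write to f[j].
void forceHalf(const std::vector<double>& x,
               const std::vector<std::vector<int>>& half,
               std::vector<double>& f) {
  assert(half.size() == x.size());
  f.assign(x.size(), 0.0);
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (int j : half[i]) {
      const double fij = pairForce(x[i], x[j]);
      f[i] += fij;
      f[j] -= fij;  // F_ji = -F_ij
    }
  }
}
```

Both variants produce the same forces; the half list evaluates each pair once instead of twice.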
Profiling
% time   Name
77.56    ForceLJ::compute_halfneigh<0, 1>
14.29    Neighbor::build
 2.04    ForceLJ::compute_halfneigh<0, 1>
Table: Half neighbor list
% time   Name
77.00    ForceLJ::compute_fullneigh<0>
19.12    Neighbor::build
 2.77    ForceLJ::compute_fullneigh<1>
Table: Full neighbor list
Code from bottleneck region
Naive Roofline model (considering only DIV)
Performance unit
Particle updates per second (PU/s)
Average neighbors per particle: $neigh_{avg} = 78$
Average fraction of neighbors within cut-off (DIV performed): $cutNeigh_{avg} = 0.7$
$P_{max} = \frac{f \cdot n_{cores} \cdot SIMD}{\text{cycles per DIV} \cdot neigh_{avg} \cdot cutNeigh_{avg}}$ PU/s
$P_{max} = \frac{2.2 \cdot 10^{9} \cdot 10 \cdot 2}{14 \cdot 78 \cdot 0.7} = 57.56$ MPU/s
$I_{knee} \cdot b_s = P_{max}$, considering $b_s = 40.6$ GB/s
$I_{knee} = \frac{P_{max}}{b_s} = 0.0014$ PU/byte
Assumed data transfer
3 LD per neighbor
$I = \frac{1\,\text{PU}}{neigh_{avg} \cdot 3 \cdot 8\,\text{bytes}}$
$I = 0.0005$ PU/byte $< I_{knee}$ !!!
likwid measured data transfer
$\frac{1}{4}$ of assumed
$I = 4 \cdot \frac{1\,\text{PU}}{neigh_{avg} \cdot 3 \cdot 8\,\text{bytes}}$
$I = 0.002$ PU/byte $> I_{knee}$ !!!
The code scales perfectly, which a memory-bound code ($I < I_{knee}$) would not do, so the assumed data transfer is obviously wrong!!
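The arithmetic of the naive model can be reproduced with a few lines of code. The struct and function names are hypothetical; the input values are the ones quoted on the slide (2.2 GHz, 10 cores, SIMD width 2, 14 cycles per DIV, 78 neighbors, cut-off fraction 0.7, 40.6 GB/s).

```cpp
#include <cassert>

struct NaiveRoofline {
  double pmax;   // peak performance in PU/s (DIV-limited)
  double iknee;  // knee intensity in PU/byte
};

// P_max = f * n_cores * SIMD / (cycles per DIV * neigh_avg * cutNeigh_avg)
// I_knee = P_max / b_s
NaiveRoofline naiveRoofline(double clockHz, int cores, int simdLanes,
                            double cyclesPerDiv, double neighAvg,
                            double cutFraction, double bandwidthBytesPerSec) {
  assert(cyclesPerDiv > 0.0 && neighAvg > 0.0 && cutFraction > 0.0);
  assert(bandwidthBytesPerSec > 0.0);
  const double pmax =
      clockHz * cores * simdLanes / (cyclesPerDiv * neighAvg * cutFraction);
  return {pmax, pmax / bandwidthBytesPerSec};
}

// Intensity in PU/byte when each neighbor costs loadsPerNeighbor 8-byte
// loads, scaled by the fraction of that traffic that reaches memory
// (1.0 = assumed traffic, 0.25 = the likwid measurement on the slide).
double intensity(double neighAvg, int loadsPerNeighbor, double trafficFraction) {
  assert(neighAvg > 0.0 && loadsPerNeighbor > 0 && trafficFraction > 0.0);
  return 1.0 / (neighAvg * loadsPerNeighbor * 8.0 * trafficFraction);
}
```

Calling `naiveRoofline(2.2e9, 10, 2, 14, 78, 0.7, 40.6e9)` gives the slide's $P_{max}$ and $I_{knee}$; comparing `intensity(78, 3, 1.0)` and `intensity(78, 3, 0.25)` against the knee shows the assumed traffic falling on the memory-bound side and the measured traffic on the compute-bound side.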
Possible cause for data transfer discrepancy
Kernel code
for(int i = 0; i < nlocal; i++) {
  for(int k = 0; k < numneighs; k++) {  // avg. length = 78
    const int j = neighs[k];            // indirect access, irregular stride
    // 3 LD per neighbor:
    const double xj = x[j * PAD + 0];
    const double yj = x[j * PAD + 1];
    const double zj = x[j * PAD + 2];
    // calculations
  }
  // update f[i]
}
Total data volume for one i iteration (considering CL granularity) = 78 * 8 * 8 bytes = 4992 bytes (about 4.9 KiB) -> can be kept in cache
Not every atom has an entirely different set of neighbors.
Extra neighbors loaded as part of a cache line (CL) can later be used from cache.
likwid measurements show 40 % higher data volume for L2 as compared to MEM and L3.
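The cache-line argument can be checked with a small counting routine. It is a sketch under stated assumptions: 64-byte cache lines, 8-byte doubles, and an `x` array that starts on a line boundary. The function name is hypothetical; `pad` plays the role of `PAD` in the kernel above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Count the distinct 64-byte cache lines touched when loading
// x[j*pad + 0..2] for every j in neighs.
std::size_t cacheLinesTouched(const std::vector<int>& neighs, int pad) {
  assert(pad >= 3);
  const std::size_t lineBytes = 64, elemBytes = 8;
  std::set<std::size_t> lines;
  for (int j : neighs) {
    assert(j >= 0);
    const std::size_t first = static_cast<std::size_t>(j) * pad * elemBytes;
    const std::size_t last = first + 3 * elemBytes - 1;  // last byte of z
    // 24 bytes span at most two lines, so first and last cover them all.
    lines.insert(first / lineBytes);
    lines.insert(last / lineBytes);
  }
  return lines.size();
}
```

With 78 neighbors that each sit on their own line, the count is 78 lines (4992 bytes), the upper bound used on the slide. For 78 consecutive atoms with `pad = 3`, the same routine reports 30 lines, which illustrates how neighbors sharing cache lines reduce the real traffic.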
Further outlook
Check whether neighbor list building produces some ordering, so that we can know a priori what percentage of the LDs in the inner loop would not be needed.
Refine the Roofline model by considering other operations that cannot hide behind DIV (current measured performance is $\frac{1}{4} P_{max}$).
Investigate the effect of the LD/ST ratio (here 11).
Investigate the effects of branch misprediction.
Compare performance of full and half-neighbor versions.
Thank you. Questions?