Feb. 2, 2017
Molecular Dynamics (MD) on GPUs
Accelerating Discoveries
Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid: “the perfect target for fighting the infection.”
Without GPUs, the supercomputer would need to be 5x larger for similar performance.
Overview of Life & Material Accelerated Apps
MD: All key codes are GPU-accelerated
Great multi-GPU performance
Focus on dense GPU nodes (up to 16 GPUs) and/or large numbers of GPU nodes
ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more
QC: All key codes are ported or being optimized
Focus on using GPU-accelerated math libraries and OpenACC directives
GPU-accelerated and available today:
ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum ESPRESSO/PWscf, TeraChem*
Active GPU acceleration projects:
CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more
* (shown in green) = application where >90% of the workload runs on the GPU
MD vs. QC on GPUs
"Classical" Molecular Dynamics vs. Quantum Chemistry (MO, PW, DFT, Semi-Emp)
What is computed: MD simulates positions of atoms over time (chemical-biological or chemical-material behaviors); QC calculates electronic properties (ground state, excited states, spectral properties, making/breaking bonds, physical properties)
Forces: MD calculates forces from simple empirical formulas (bond rearrangement generally forbidden); QC derives forces from the electron wave function (bond rearrangement OK, e.g., bond energies)
System size: MD handles up to millions of atoms; QC up to a few thousand atoms
Solvent: MD includes solvent without difficulty; QC generally works in a vacuum, but if needed, solvent is treated classically (QM/MM) or with implicit methods
Precision: MD is single-precision dominated; for QC, double precision is important
Libraries: MD uses cuBLAS, cuFFT, CUDA; QC uses cuBLAS, cuFFT, OpenACC
Hardware: MD runs on GeForce (workstations) or Tesla (servers); for QC, Tesla is recommended
ECC: off for MD; on for QC
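To make the "simple empirical formulas" row concrete, here is a minimal sketch of one such term: a 12-6 Lennard-Jones pair potential in reduced units (ε = σ = 1 by default). Choosing Lennard-Jones is an assumption for illustration; the slide names no specific force field, and production MD codes combine terms like this with bonded terms and long-range electrostatics such as PME.

```python
import math


def lj_pair(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> tuple[float, float]:
    """12-6 Lennard-Jones pair energy U(r) and radial force F(r) = -dU/dr.

    A positive force means the pair repels; a negative force means it attracts.
    """
    sr6 = (sigma / r) ** 6
    sr12 = sr6 * sr6
    energy = 4.0 * epsilon * (sr12 - sr6)
    force = 24.0 * epsilon * (2.0 * sr12 - sr6) / r
    return energy, force


def total_lj_energy(positions, cutoff: float = 2.5) -> float:
    """Sum pair energies over all unique pairs closer than `cutoff` (no periodic box)."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r < cutoff:
                total += lj_pair(r)[0]
    return total


r_min = 2 ** (1 / 6)  # location of the potential minimum in reduced units
print(lj_pair(r_min))
```

At r = 2^(1/6)σ the force vanishes and the energy reaches its minimum of -ε. The all-pairs loop in `total_lj_energy` is O(N²) and only for illustration; production codes use cutoffs with neighbor lists.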
GPU-Accelerated Molecular Dynamics Apps
ACEMD
AMBER
CHARMM
DESMOND
ESPResSo
Folding@Home
GPUGrid.net
GROMACS
HALMD
HOOMD-Blue
LAMMPS
mdcore
MELD
NAMD
OpenMM
PolyFTS
Green lettering indicates that performance slides are included.
GPU performance is compared against a dual-socket multi-core x86 CPU node.
Benefits of MD GPU-Accelerated Computing
• On average, 3x-8x faster than CPU-only systems across all tests
• Most major compute-intensive aspects of classical MD have been ported
• Large performance boost with a marginal price increase
• Energy usage cut by more than half
• GPUs scale well within a node and/or over multiple nodes
• K80 is our fastest and lowest-power high-performance GPU yet
Try GPU-accelerated MD apps for free – www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
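Whether energy use is really "cut by more than half" depends on both the speedup and the node power. The slides give no power figures, so the sketch below takes average node power as an input you would measure yourself; the helper names are hypothetical. Because energy is power times time, the GPU node's energy relative to the CPU-only node is (P_gpu / P_cpu) / speedup.

```python
def energy_kwh(avg_power_watts: float, target_ns: float, ns_per_day: float) -> float:
    """Energy (kWh) to simulate `target_ns` at a given throughput and average node power."""
    hours = target_ns / ns_per_day * 24.0
    return avg_power_watts * hours / 1000.0


def max_power_ratio(speedup: float, savings_fraction: float = 0.5) -> float:
    """Largest (GPU-node power / CPU-node power) that still cuts energy by `savings_fraction`.

    E_gpu / E_cpu = (P_gpu / P_cpu) / speedup, so requiring E_gpu / E_cpu <= 1 - savings
    gives P_gpu / P_cpu <= speedup * (1 - savings).
    """
    return speedup * (1.0 - savings_fraction)


# PME-JAC_NVE: 1 Broadwell node vs. 1 node + 1x P100 SXM2 (ns/day from the AMBER 16.3 slides)
speedup = 339.81 / 47.90
print(f"speedup {speedup:.1f}X, break-even power ratio {max_power_ratio(speedup):.2f}")
```

With that result (about 7.1X), the GPU node would halve energy use as long as it draws less than roughly 3.5 times the CPU-only node's power.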
ACEMD
www.acellera.com
470 ns/day on 1 GPU for L-Iduronic acid (1,362 atoms)
116 ns/day on 1 GPU for DHFR (23K atoms)
M. Harvey, G. Giupponi and G. De Fabritiis, ACEMD: Accelerated molecular dynamics simulations in the microseconds timescale, J. Chem. Theory Comput. 5, 1632 (2009)
NVT, NPT, PME, TCL, PLUMED, CAMSHIFT [1]
[1] M. J. Harvey and G. De Fabritiis, An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware, J. Chem. Theory Comput. 5, 2371–2377 (2009)
[2] For a list of selected references, see http://www.acellera.com/acemd/publications
June 2017
AMBER 16
JAC_NVE on GP100s
Running AMBER version 16
The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)
23,558 atoms, PME, 2 fs (ns/day):
1 node + 1x GP100 per node (PCIe): 320.19
1 node + 1x GP100 per node (NVLink): 320.14
1 node + 2x GP100 per node (PCIe): 370.32
1 node + 2x GP100 per node (NVLink): 404.09
JAC_NVE on GP100s
23,558 atoms, PME, 4 fs (ns/day):
1 node + 1x GP100 per node (PCIe): 614.42
1 node + 1x GP100 per node (NVLink): 613.16
1 node + 2x GP100 per node (PCIe): 714.23
1 node + 2x GP100 per node (NVLink): 782.11
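Every chart in this deck reports ns/day: simulated nanoseconds per wall-clock day. For a fixed step rate it is proportional to the integration time step, which is why the 4 fs JAC_NVE runs come in at nearly twice the 2 fs results. A minimal sketch of the conversion in both directions (the function names are illustrative):

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def ns_per_day(steps_per_second: float, timestep_fs: float) -> float:
    """Simulated nanoseconds per wall-clock day at a given step rate and time step."""
    return steps_per_second * timestep_fs * SECONDS_PER_DAY / FS_PER_NS


def steps_per_second(rate_ns_per_day: float, timestep_fs: float) -> float:
    """Invert ns_per_day: the integration step rate implied by a reported throughput."""
    return rate_ns_per_day * FS_PER_NS / (timestep_fs * SECONDS_PER_DAY)


# JAC_NVE, 1 node + 1x GP100 (PCIe), AMBER 16: 2 fs and 4 fs results
for dt, rate in [(2.0, 320.19), (4.0, 614.42)]:
    print(f"{dt:.0f} fs: {rate} ns/day -> {steps_per_second(rate, dt):.0f} steps/s")
```

This gives roughly 1,850 steps/s at 2 fs and 1,780 steps/s at 4 fs, so the longer step costs about 4% in step rate while nearly doubling ns/day. The same arithmetic gives time to solution: at ACEMD's quoted 470 ns/day, 1 µs takes about 1000 / 470 ≈ 2.1 days.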
JAC_NPT on GP100s
23,558 atoms, PME, 2 fs (ns/day):
1 node + 1x GP100 per node (PCIe): 295.75
1 node + 1x GP100 per node (NVLink): 295.42
1 node + 2x GP100 per node (PCIe): 333.03
1 node + 2x GP100 per node (NVLink): 360.64
JAC_NPT on GP100s
23,558 atoms, PME, 4 fs (ns/day):
1 node + 1x GP100 per node (PCIe): 580.47
1 node + 1x GP100 per node (NVLink): 578.48
1 node + 2x GP100 per node (PCIe): 654.66
1 node + 2x GP100 per node (NVLink): 706.53
FactorIX_NVE on GP100s
90,906 atoms, PME (ns/day):
1 node + 1x GP100 per node (PCIe): 106.23
1 node + 1x GP100 per node (NVLink): 105.98
1 node + 2x GP100 per node (PCIe): 142.45
1 node + 2x GP100 per node (NVLink): 166.61
FactorIX_NPT on GP100s
90,906 atoms, PME (ns/day):
1 node + 1x GP100 per node (PCIe): 102.27
1 node + 1x GP100 per node (NVLink): 102.26
1 node + 2x GP100 per node (PCIe): 126.75
1 node + 2x GP100 per node (NVLink): 146.34
Cellulose_NVE on GP100s
408,609 atoms, PME (ns/day):
1 node + 1x GP100 per node (PCIe): 24.01
1 node + 1x GP100 per node (NVLink): 24.02
1 node + 2x GP100 per node (PCIe): 31.35
1 node + 2x GP100 per node (NVLink): 36.91
Cellulose_NPT on GP100s
408,609 atoms, PME (ns/day):
1 node + 1x GP100 per node (PCIe): 22.76
1 node + 1x GP100 per node (NVLink): 22.80
1 node + 2x GP100 per node (PCIe): 28.76
1 node + 2x GP100 per node (NVLink): 32.37
STMV_NPT on GP100s
1,067,095 atoms, PME (ns/day):
1 node + 1x GP100 per node (PCIe): 15.64
1 node + 1x GP100 per node (NVLink): 15.43
1 node + 2x GP100 per node (PCIe): 20.22
1 node + 2x GP100 per node (NVLink): 23.13
TRPCAGE on GP100s
304 atoms, GB (ns/day):
1 node + 1x GP100 per node (PCIe): 1216.56
1 node + 1x GP100 per node (NVLink): 1187.30
Myoglobin on GP100s
2,492 atoms, GB (ns/day):
1 node + 1x GP100 per node (PCIe): 470.41
1 node + 1x GP100 per node (NVLink): 458.28
1 node + 2x GP100 per node (PCIe): 443.49
1 node + 2x GP100 per node (NVLink): 447.23
Nucleosome on GP100s
25,095 atoms, GB (ns/day):
1 node + 1x GP100 per node (PCIe): 11.47
1 node + 1x GP100 per node (NVLink): 11.29
1 node + 2x GP100 per node (PCIe): 21.29
1 node + 2x GP100 per node (NVLink): 20.51
February 2017
AMBER 16
PME-Cellulose_NPT on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 2.35
1 node + 1x K80 per node: 11.36 (4.8X)
1 node + 2x K80 per node: 15.43 (6.6X)
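The speedup labels on these charts are ratios of ns/day against the CPU-only node. A short sketch that reproduces them from the plotted values, plus a hypothetical helper (not from the slides) that bounds the ratio when the baseline is printed with only two decimals:

```python
def speedups(baseline: float, results: dict[str, float]) -> dict[str, float]:
    """Speedup of each configuration over the CPU-only baseline (ratio of ns/day)."""
    return {config: rate / baseline for config, rate in results.items()}


def speedup_bounds(baseline: float, rate: float, decimals: int = 2) -> tuple[float, float]:
    """Range of speedups consistent with a baseline rounded to `decimals` places."""
    half = 0.5 * 10 ** -decimals
    return rate / (baseline + half), rate / (baseline - half)


# PME-Cellulose_NPT on K80s, AMBER 16.3 (ns/day from the slide)
cpu_only = 2.35
k80 = {"1 node + 1x K80": 11.36, "1 node + 2x K80": 15.43}
for config, s in speedups(cpu_only, k80).items():
    print(f"{config}: {s:.1f}X")
```

On the PME-Cellulose_NPT K80 numbers, the first helper gives 4.8X and 6.6X, matching the chart. The bounds matter for tiny baselines: Rubisco-75K's CPU node is printed as 0.01 ns/day, so for the 4.46 ns/day 8x P100 SXM2 result the printed values alone allow anything from about 297X to 892X. The slides do not say whether their speedups were computed from rounded or unrounded values.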
PME-Cellulose_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 2.35
1 node + 1x P100 PCIe (16GB) per node: 21.85 (9.3X)
1 node + 2x P100 PCIe (16GB) per node: 30.00 (12.8X)
PME-Cellulose_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 2.35
1 node + 1x P100 SXM2 per node: 23.37 (9.9X)
1 node + 2x P100 SXM2 per node: 32.22 (13.7X)
1 node + 4x P100 SXM2 per node: 36.65 (15.6X)
PME-Cellulose_NVE on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 2.47
1 node + 1x K80 per node: 11.85 (4.8X)
1 node + 2x K80 per node: 16.53 (6.7X)
PME-Cellulose_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 2.47
1 node + 1x P100 PCIe (16GB) per node: 23.34 (9.4X)
1 node + 2x P100 PCIe (16GB) per node: 32.55 (13.2X)
PME-Cellulose_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 2.47
1 node + 1x P100 SXM2 per node: 24.94 (10.1X)
1 node + 2x P100 SXM2 per node: 35.16 (14.2X)
1 node + 4x P100 SXM2 per node: 40.88 (16.6X)
PME-FactorIX_NPT on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 11.43
1 node + 1x K80 per node: 48.54 (4.2X)
1 node + 2x K80 per node: 66.68 (5.8X)
PME-FactorIX_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 11.43
1 node + 1x P100 PCIe (16GB) per node: 98.77 (8.6X)
1 node + 2x P100 PCIe (16GB) per node: 132.86 (11.6X)
PME-FactorIX_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 11.43
1 node + 1x P100 SXM2 per node: 106.25 (9.3X)
1 node + 2x P100 SXM2 per node: 144.11 (12.6X)
1 node + 4x P100 SXM2 per node: 159.80 (14.0X)
PME-FactorIX_NVE on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 11.98
1 node + 1x K80 per node: 51.14 (4.3X)
1 node + 2x K80 per node: 71.49 (6.0X)
PME-FactorIX_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 11.98
1 node + 1x P100 PCIe (16GB) per node: 105.86 (8.8X)
1 node + 2x P100 PCIe (16GB) per node: 145.83 (12.2X)
PME-FactorIX_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 11.98
1 node + 1x P100 SXM2 per node: 114.88 (9.6X)
1 node + 2x P100 SXM2 per node: 159.24 (13.3X)
1 node + 4x P100 SXM2 per node: 178.02 (14.9X)
PME-JAC_NPT on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-JAC_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 45.89
1 node + 1x K80 per node: 162.09 (3.5X)
1 node + 2x K80 per node: 216.78 (4.7X)
PME-JAC_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-JAC_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 45.89
1 node + 1x P100 PCIe (16GB) per node: 283.60 (6.2X)
1 node + 2x P100 PCIe (16GB) per node: 327.69 (7.1X)
PME-JAC_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
PME-JAC_NPT (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 45.89
1 node + 1x P100 SXM2 per node: 310.52 (6.8X)
1 node + 2x P100 SXM2 per node: 360.64 (7.9X)
1 node + 4x P100 SXM2 per node: 423.09 (9.2X)
PME-JAC_NVE on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-JAC_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 47.90
1 node + 1x K80 per node: 173.20 (3.6X)
1 node + 2x K80 per node: 234.99 (4.9X)
PME-JAC_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
PME-JAC_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 47.90
1 node + 1x P100 PCIe (16GB) per node: 308.46 (6.4X)
1 node + 2x P100 PCIe (16GB) per node: 363.79 (7.6X)
PME-JAC_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
PME-JAC_NVE (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 47.90
1 node + 1x P100 SXM2 per node: 339.81 (7.1X)
1 node + 2x P100 SXM2 per node: 402.18 (8.4X)
1 node + 4x P100 SXM2 per node: 473.10 (9.9X)
GB-Myoglobin on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
GB-Myoglobin (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 28.86
1 node + 1x K80 per node: 288.47 (10.0X)
1 node + 2x K80 per node: 339.45 (11.8X)
GB-Myoglobin on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
GB-Myoglobin (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 28.86
1 node + 1x P100 PCIe (16GB) per node: 483.37 (16.7X)
1 node + 4x P100 PCIe (16GB) per node: 561.94 (19.5X)
GB-Myoglobin on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
GB-Myoglobin (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 28.86
1 node + 1x P100 SXM2 per node: 534.28 (18.5X)
1 node + 4x P100 SXM2 per node: 639.37 (22.2X)
GB-Nucleosome on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
GB-Nucleosome (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 0.40
1 node + 1x K80 per node: 5.84 (14.6X)
1 node + 2x K80 per node: 11.31 (28.3X)
1 node + 4x K80 per node: 20.55 (51.4X)
GB-Nucleosome on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
GB-Nucleosome (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 0.40
1 node + 1x P100 PCIe (16GB) per node: 11.91 (29.8X)
1 node + 2x P100 PCIe (16GB) per node: 22.77 (56.9X)
1 node + 4x P100 PCIe (16GB) per node: 39.91 (99.8X)
1 node + 8x P100 PCIe (16GB) per node: 45.92 (114.8X)
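Speedup over a CPU node hides how well each added GPU is used. Parallel efficiency relative to the 1-GPU run, (R_n / R_1) / n, makes that visible. A minimal sketch applied to the GB-Nucleosome P100 PCIe results (the function name is illustrative):

```python
def scaling_efficiency(rates: dict[int, float]) -> dict[int, float]:
    """Parallel efficiency of each GPU count relative to the 1-GPU run: (R_n / R_1) / n."""
    base = rates[1]
    return {n: rate / base / n for n, rate in sorted(rates.items())}


# GB-Nucleosome on P100 PCIe, AMBER 16.3 (ns/day from the slide); key = GPUs per node
nucleosome = {1: 11.91, 2: 22.77, 4: 39.91, 8: 45.92}
for n, eff in scaling_efficiency(nucleosome).items():
    print(f"{n} GPU(s): {nucleosome[n]:6.2f} ns/day, efficiency {eff:.0%}")
```

This gives about 96%, 84% and 48% for 2, 4 and 8 GPUs. Going from 4 to 8 GPUs adds only about 15% throughput (45.92 vs. 39.91 ns/day), even though the speedup label grows from 99.8X to 114.8X.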
GB-Nucleosome on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
GB-Nucleosome (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 0.40
1 node + 1x P100 SXM2 per node: 13.36 (33.4X)
1 node + 2x P100 SXM2 per node: 25.53 (63.8X)
1 node + 4x P100 SXM2 per node: 46.29 (115.7X)
1 node + 8x P100 SXM2 per node: 48.29 (120.7X)
Rubisco-75K on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
Rubisco-75K (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 0.01
1 node + 1x K80 per node: 0.35 (35.0X)
1 node + 2x K80 per node: 0.69 (69.0X)
1 node + 4x K80 per node: 1.34 (134.0X)
Rubisco-75K on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell)
Rubisco-75K (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 0.01
1 node + 1x P100 PCIe (16GB) per node: 0.71 (71.0X)
1 node + 2x P100 PCIe (16GB) per node: 1.40 (140.0X)
1 node + 4x P100 PCIe (16GB) per node: 2.69 (269.0X)
1 node + 8x P100 PCIe (16GB) per node: 4.20 (420.0X)
Rubisco-75K on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 [3.6GHz Turbo] (Broadwell)
Rubisco-75K (ns/day; speedup vs. 1 Broadwell node):
1 Broadwell node: 0.01
1 node + 1x P100 SXM2 per node: 0.80 (80.0X)
1 node + 2x P100 SXM2 per node: 1.57 (157.0X)
1 node + 4x P100 SXM2 per node: 3.06 (306.0X)
1 node + 8x P100 SXM2 per node: 4.46 (446.0X)
AMBER 14
AMBER 14 vs. AMBER 12
Courtesy of Scott Le Grand, from GTC 2014 presentation
AMBER 14: large P2P and small Boost Clock impacts
AMBER 14 (ns/day) on 4x K40: P2P and Boost Clocks Impact
DHFR NVE PME, 2 fs benchmark (CUDA 6.0, ECC off)
2x Xeon E5-2690 + 4x Tesla K40 @ 745 MHz (no boost, no P2P): 125.77
2x Xeon E5-2690 + 4x Tesla K40 @ 875 MHz (boost, no P2P): 132.97
2x Xeon E5-2690 + 4x Tesla K40 @ 745 MHz (no boost, P2P): 196.68
2x Xeon E5-2690 + 4x Tesla K40 @ 875 MHz (boost, P2P): 215.18
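The "large P2P and small Boost Clock impacts" headline can be quantified from the four DHFR results. P2P here refers to CUDA peer-to-peer transfers between GPUs; the sketch below only computes relative gains from the plotted ns/day values, and the names are illustrative:

```python
# AMBER 14, DHFR NVE PME 2 fs on 4x K40 (ns/day from the slide)
runs = {
    ("745 MHz", "no P2P"): 125.77,
    ("875 MHz", "no P2P"): 132.97,
    ("745 MHz", "P2P"): 196.68,
    ("875 MHz", "P2P"): 215.18,
}


def gain(new: float, old: float) -> float:
    """Fractional throughput gain of `new` over `old`."""
    return new / old - 1.0


baseline = runs[("745 MHz", "no P2P")]
print(f"boost only:          {gain(runs[('875 MHz', 'no P2P')], baseline):.1%}")
print(f"P2P only:            {gain(runs[('745 MHz', 'P2P')], baseline):.1%}")
print(f"boost + P2P:         {gain(runs[('875 MHz', 'P2P')], baseline):.1%}")
print(f"boost on top of P2P: {gain(runs[('875 MHz', 'P2P')], runs[('745 MHz', 'P2P')]):.1%}")
```

Boost clocks alone add about 5.7%, P2P alone about 56.4%, and both together about 71.1%; once P2P is enabled, boost adds about 9.4%.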
AMBER Performance Over Time
Courtesy of Scott Le Grand, from GTC 2014 presentation
Cellulose on K40s, K80s and M6000s
Running AMBER version 14
The blue node contains Dual Intel E5-2698 v3, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel E5-2698 v3, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875 MHz, Tesla K80 @ 562 MHz (autoboost), or Quadro M6000 @ 987 MHz GPUs
PME-Cellulose_NVE, simulated time (ns/day; speedup vs. 1 Haswell node):
1 Haswell node: 1.93
1 CPU node + 1x K40: 8.96 (4.6X)
1 CPU node + 0.5x K80: 7.87 (4.1X)
1 CPU node + 1x K80: 11.76 (6.1X)
1 CPU node + 1x M6000: 10.49 (5.4X)
1 CPU node + 2x K40: 13.67 (7.1X)
1 CPU node + 2x K80: 15.38 (8.0X)
1 CPU node + 2x M6000: 14.90 (7.7X)
Factor IX on K40s, K80s and M6000s
Running AMBER version 14
The blue node contains Dual Intel E5-2698 v3, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel E5-2698 v3, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875 MHz, Tesla K80 @ 562 MHz (autoboost), or Quadro M6000 @ 987 MHz GPUs
PME-FactorIX_NVE, simulated time (ns/day; speedup vs. 1 Haswell node):
1 Haswell node: 9.68
1 CPU node + 1x K40: 40.48 (4.2X)
1 CPU node + 0.5x K80: 33.59 (3.5X)
1 CPU node + 1x K80: 50.70 (5.2X)
1 CPU node + 1x M6000: 47.80 (5.0X)
1 CPU node + 2x K40: 61.18 (6.4X)
1 CPU node + 2x K80: 60.93 (6.3X)
1 CPU node + 2x M6000: 66.89 (7.0X)
JAC on K40s, K80s and M6000s
Running AMBER version 14
The blue node contains Dual Intel E5-2698 v3, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel E5-2698 v3, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875 MHz, Tesla K80 @ 562 MHz (autoboost), or Quadro M6000 @ 987 MHz GPUs
PME-JAC_NVE, simulated time (ns/day; speedup vs. 1 Haswell node):
1 Haswell node: 37.38
1 CPU node + 1x K40: 134.82 (3.6X)
1 CPU node + 0.5x K80: 121.30 (3.2X)
1 CPU node + 1x K80: 174.34 (4.7X)
1 CPU node + 1x M6000: 161.53 (4.3X)
1 CPU node + 2x K40: 200.34 (5.4X)
1 CPU node + 2x K80: 225.34 (6.0X)
1 CPU node + 2x M6000: 219.83 (5.9X)
Cellulose on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 (Ivy Bridge) CPU + Tesla M40 (autoboost) GPUs
PME - Cellulose_NPT, simulated time (ns/day; speedup vs. 1 node):
1 node: 1.07
1 node + 1x M40 per node: 10.12 (9.5X)
1 node + 2x M40 per node: 14.40 (13.5X)
1 node + 4x M40 per node: 15.90 (14.9X)
Cellulose on M40s
PME - Cellulose_NVE, simulated time (ns/day; speedup vs. 1 node):
1 node: 1.07
1 node + 1x M40 per node: 10.50 (9.8X)
1 node + 2x M40 per node: 15.41 (14.4X)
1 node + 4x M40 per node: 17.13 (16.0X)
FactorIX on M40s
PME - FactorIX_NPT, simulated time (ns/day; speedup vs. 1 node):
1 node: 5.38
1 node + 1x M40 per node: 46.90 (8.7X)
1 node + 2x M40 per node: 67.37 (12.5X)
1 node + 4x M40 per node: 72.96 (13.6X)
FactorIX on M40s
PME - FactorIX_NVE, simulated time (ns/day; speedup vs. 1 node):
1 node: 5.47
1 node + 1x M40 per node: 49.33 (9.0X)
1 node + 2x M40 per node: 73.00 (13.3X)
1 node + 4x M40 per node: 80.04 (14.6X)
JAC on M40s
PME - JAC_NPT, simulated time (ns/day; speedup vs. 1 node):
1 node: 20.88
1 node + 1x M40 per node: 149.40 (7.2X)
1 node + 2x M40 per node: 211.97 (10.2X)
1 node + 4x M40 per node: 226.63 (10.9X)
JAC on M40s
PME - JAC_NVE, simulated time (ns/day; speedup vs. 1 node):
1 node: 21.11
1 node + 1x M40 per node: 157.68 (7.5X)
1 node + 2x M40 per node: 230.18 (10.9X)
1 node + 4x M40 per node: 246.15 (11.7X)
Myoglobin on M40s
GB - Myoglobin, simulated time (ns/day; speedup vs. 1 node):
1 node: 9.83
1 node + 1x M40 per node: 232.20 (23.6X)
1 node + 2x M40 per node: 300.86 (30.6X)
1 node + 4x M40 per node: 322.09 (32.8X)
Nucleosome on M40s
GB - Nucleosome, simulated time (ns/day; speedup vs. 1 node):
1 node: 0.13
1 node + 1x M40 per node: 4.67 (35.9X)
1 node + 2x M40 per node: 9.05 (69.6X)
1 node + 4x M40 per node: 16.11 (123.9X)
TrpCage on M40s
GB - TrpCage, simulated time (ns/day; speedup vs. 1 node):
1 node: 408.88
1 node + 1x M40 per node: 831.91 (2.03X)
1 node + 2x M40 per node: 551.36 (1.3X)
1 node + 4x M40 per node: 464.63 (1.1X)
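For the small TrpCage system, adding GPUs lowers throughput: one M40 gives 831.91 ns/day, four give 464.63. A hedged sketch of two ways to read such data: pick the GPU count that maximizes a single run's speed, or use each GPU for its own independent run. The second option is an assumption beyond what the slides measure, since no concurrent-run results are shown; both helper names are hypothetical.

```python
# GB-TrpCage on M40s, AMBER 14 (ns/day from the slide); key = GPUs per node
trpcage = {1: 831.91, 2: 551.36, 4: 464.63}


def best_single_run(rates: dict[int, float]) -> int:
    """GPU count that gives the highest ns/day for one simulation."""
    return max(rates, key=rates.get)


def independent_runs_throughput(rate_per_run: float, n_gpus: int) -> float:
    """Hypothetical aggregate ns/day if each GPU runs its own independent simulation.

    Assumes the runs do not slow each other down, which the slides do not measure.
    """
    return rate_per_run * n_gpus


print(best_single_run(trpcage))  # 1
print(independent_runs_throughput(trpcage[1], 4), "vs. one 4-GPU run:", trpcage[4])
```

Under that no-interference assumption, four independent single-GPU runs would total about 3,328 ns/day, roughly seven times the single 4-GPU run.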
Recommended GPU Node Configuration for AMBER Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets: 2
Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
CPU speed (GHz): 2.66+
System memory per node (GB): 16
GPUs: Kepler K20, K40, K80; Pascal P100
# of GPUs per CPU socket: 1-4
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 3.0 x16 or higher
Server storage: 2 TB
Network configuration: InfiniBand QDR or better
Scale to multiple nodes with the same single-node configuration.
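The recommendation table can be turned into a quick checklist. A hedged sketch: the `NodeSpec` fields and `check_node` are hypothetical names, the thresholds are copied from the table, and the "1 CPU core drives 1 GPU" note is read as requiring at least as many cores per socket as GPUs per socket.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical description of a candidate node; field names are illustrative."""
    cpu_sockets: int
    cores_per_socket: int
    cpu_ghz: float
    system_memory_gb: int
    gpus_per_socket: int
    gpu_memory_gb: int


def check_node(spec: NodeSpec) -> list[str]:
    """List deviations from the recommended single-node AMBER configuration."""
    issues = []
    if spec.cpu_sockets < 2:
        issues.append("fewer than 2 CPU sockets")
    if not 1 <= spec.gpus_per_socket <= 4:
        issues.append("GPUs per socket outside 1-4")
    # 6+ cores per socket, and at least one core to drive each GPU on that socket
    if spec.cores_per_socket < max(6, spec.gpus_per_socket):
        issues.append("too few cores per socket")
    if spec.cpu_ghz < 2.66:
        issues.append("CPU clock below 2.66 GHz")
    if spec.system_memory_gb < 16:
        issues.append("less than 16 GB system memory")
    if spec.gpu_memory_gb < 6:
        issues.append("GPU memory below the preferred 6 GB")
    return issues


print(check_node(NodeSpec(2, 8, 3.0, 64, 2, 12)))  # []
```

Storage, interconnect and PCIe generation are left out because they are easier to verify by hand than to encode as numbers.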
July 2016
CHARMM DOMDEC-GUI
CHARMM DOMDEC-GUI 465 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
465 K System (Her1_HER1_membrane), ns/day (higher is better):
1 Haswell node: 0.36
1 node + 1x K80 per node: 2.15 (6.0X)
CHARMM DOMDEC-GUI 534 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
534 K System (POPC_PSPC_CHL1 mixture), ns/day (higher is better):
1 Haswell node: 0.18
1 node + 1x K80 per node: 1.43 (8.0X)
CHARMM DOMDEC-GUI 20 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 (Haswell) CPUs + Tesla M40 GPUs
20 K System (Crambin), ns/day (higher is better):
1 Haswell node: 16.00
1 node + 1x M40 per node: 59.68 (3.7X)
CHARMM DOMDEC-GUI 61 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 [email protected] GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 [email protected] GHz (Haswell)
CPUs + Tesla M40 GPUs
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who
is responsible for possible benchmarking error.3.90
25.08
0
5
10
15
20
25
30
35
1 Haswell node 1 node + 1x M40 per node
ns/
day
61 K System (GlnBP)
6.4X
*Higher is better
CHARMM DOMDEC-GPU 465K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.3 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.3 GHz (Haswell) CPUs + Tesla M40 GPUs
465K System (Her1_HER1_membrane), ns/day, higher is better: 1 Haswell node 0.36; 1 node + 1x M40 per node 2.27 (6.3X)
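The speedup labels on the CHARMM charts appear to be plain throughput ratios: the GPU node's ns/day divided by the CPU-only node's ns/day. A minimal Python sketch, assuming that definition (the `speedup` helper is our own name, not part of CHARMM), reproduces the labels for the 20K, 61K and 465K systems from the rounded bar values. For the 534K system, 1.43 / 0.18 gives about 7.9 rather than the 8.0X label, which suggests the labels were computed from unrounded timings.

```python
def speedup(baseline_ns_per_day: float, accelerated_ns_per_day: float) -> float:
    """Throughput ratio of a GPU-accelerated run to a CPU-only run, rounded to 0.1."""
    if baseline_ns_per_day <= 0:
        raise ValueError("baseline throughput must be positive")
    return round(accelerated_ns_per_day / baseline_ns_per_day, 1)


# (system and GPU, CPU-only ns/day, 1 node + 1 GPU ns/day), read from the charts above
charmm_results = [
    ("20K Crambin, M40", 16.00, 59.68),
    ("61K GlnBP, M40", 3.90, 25.08),
    ("465K Her1_HER1_membrane, K80", 0.36, 2.15),
    ("465K Her1_HER1_membrane, M40", 0.36, 2.27),
]

for name, cpu_rate, gpu_rate in charmm_results:
    print(f"{name}: {speedup(cpu_rate, gpu_rate)}X")
```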
October 2016
GROMACS 2016
Erik Lindahl (GROMACS developer) video
Water 1.5M on K80s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Water 1.5M, ns/day: 1 Broadwell node 2.79; 1 node + 2x K80 per node 5.22 (1.9X); 1 node + 4x K80 per node 6.14 (2.2X)
Water 3M on K80s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
Water 3M, ns/day: 1 Broadwell node 1.32; 1 node + 2x K80 per node 2.66 (2.0X); 1 node + 4x K80 per node 3.05 (2.3X)
Water 1.5M on M40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs
Water 1.5M, ns/day: 1 Broadwell node 2.79; 1 node + 2x M40 per node 6.15 (2.2X); 1 node + 4x M40 per node 7.60 (2.7X)
Water 3M on M40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs
Water 3M, ns/day: 1 Broadwell node 1.32; 1 node + 2x M40 per node 2.97 (2.3X); 1 node + 4x M40 per node 3.94 (3.0X)
Water 1.5M on P40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs
Water 1.5M, ns/day: 1 Broadwell node 2.79; 1 node + 2x P40 per node 6.60 (2.4X); 1 node + 4x P40 per node 8.07 (2.9X)
Water 3M on P40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs
Water 3M, ns/day: 1 Broadwell node 1.32; 1 node + 2x P40 per node 3.36 (2.5X); 1 node + 4x P40 per node 4.19 (3.2X)
Water 1.5M on P100s PCIe
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Water 1.5M, ns/day: 1 Broadwell node 2.79; 1 node + 2x P100 PCIe (16GB) per node 6.34 (2.3X); 1 node + 4x P100 PCIe (16GB) per node 7.11 (2.5X)
Water 3M on P100s PCIe
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Water 3M, ns/day: 1 Broadwell node 1.32; 1 node + 2x P100 PCIe (16GB) per node 3.16 (2.4X); 1 node + 4x P100 PCIe (16GB) per node 3.43 (2.6X)
February 2017
GROMACS 5.1.2
Water 1.5M on K80s
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Water 1.5M, ns/day: 1 Broadwell node 3.04; 1 node + 1x K80 per node 3.49 (1.1X); 1 node + 2x K80 per node 5.75 (1.9X)
Water 1.5M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Water 1.5M, ns/day: 1 Broadwell node 3.04; 1 node + 1x P100 PCIe (16GB) per node 4.39 (1.4X); 1 node + 2x P100 PCIe (16GB) per node 6.96 (2.3X); 1 node + 4x P100 PCIe (16GB) per node 7.21 (2.4X)
Water 1.5M on P100s SXM2
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Water 1.5M, ns/day: 1 Broadwell node 3.04; 1 node + 1x P100 SXM2 per node 4.11 (1.4X); 1 node + 2x P100 SXM2 per node 6.70 (2.2X); 1 node + 4x P100 SXM2 per node 7.18 (2.4X); 1 node + 8x P100 SXM2 per node 7.88 (2.6X)
Water 3M on K80s
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Water 3M, ns/day: 1 Broadwell node 1.38; 1 node + 1x K80 per node 1.59 (1.2X); 1 node + 2x K80 per node 2.98 (2.2X)
Water 3M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Water 3M, ns/day: 1 Broadwell node 1.38; 1 node + 1x P100 PCIe (16GB) per node 1.96 (1.4X); 1 node + 2x P100 PCIe (16GB) per node 3.43 (2.5X); 1 node + 4x P100 PCIe (16GB) per node 3.80 (2.8X)
Water 3M on P100s SXM2
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Water 3M, ns/day: 1 Broadwell node 1.38; 1 node + 1x P100 SXM2 per node 1.84 (1.3X); 1 node + 2x P100 SXM2 per node 3.50 (2.5X); 1 node + 4x P100 SXM2 per node 3.82 (2.8X)
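The multi-GPU bars in the GROMACS 5.1.2 charts flatten quickly. One way to quantify this from the chart values alone is strong-scaling parallel efficiency relative to the single-GPU run, E(p) = T(p) / (p · T(1)), where T is throughput in ns/day and p the number of GPUs per node. The sketch below is our own illustration using the P100 SXM2 results; the function name is hypothetical and this metric is not part of the published benchmark methodology.

```python
def parallel_efficiency(throughput_by_gpus: dict[int, float]) -> dict[int, float]:
    """Strong-scaling efficiency T(p) / (p * T(1)) for each GPU count p."""
    base = throughput_by_gpus[1]  # single-GPU throughput is the reference
    return {p: t / (p * base) for p, t in sorted(throughput_by_gpus.items())}


# ns/day for GROMACS 5.1.2 on P100 SXM2 nodes, read from the charts above
water_1_5m = {1: 4.11, 2: 6.70, 4: 7.18, 8: 7.88}
water_3m = {1: 1.84, 2: 3.50, 4: 3.82}

for name, data in [("Water 1.5M", water_1_5m), ("Water 3M", water_3m)]:
    for p, eff in parallel_efficiency(data).items():
        print(f"{name}, {p} GPU(s): {eff:.0%}")
```

On these values the 3M system keeps about 95% efficiency at two GPUs versus about 82% for 1.5M, and both drop to roughly half or less at four GPUs, which is consistent with the flattening bars.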
Recommended GPU Node Configuration for GROMACS Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets 2
Cores per CPU socket 6+
CPU speed (GHz) 2.66+
System memory per socket (GB) 32
GPUs Kepler K20, K40, K80
# of GPUs per CPU socket 1x (Kepler GPUs need fast Sandy Bridge or Ivy Bridge CPUs, or high-end AMD Opterons)
GPU memory preference (GB) 6
GPU to CPU connection PCIe 3.0 or higher
Server storage 500 GB or higher
Network configuration Gemini, InfiniBand
February 2017
HOOMD-Blue 1.3.3
lj-liquid on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj-liquid, average timesteps/sec: 1 Broadwell node 326.52; 1 node + 1x K80 per node 1324.84 (4.1X); 1 node + 2x K80 per node 1594.37 (4.9X); 1 node + 4x K80 per node 1942.12 (5.9X)
lj-liquid on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj-liquid, average timesteps/sec: 1 Broadwell node 326.52; 1 node + 1x P100 PCIe (16GB) per node 2912.66 (8.9X); 1 node + 8x P100 PCIe (16GB) per node 3217.68 (9.9X)
lj-liquid on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj-liquid, average timesteps/sec: 1 Broadwell node 326.52; 1 node + 1x P100 SXM2 per node 3129.11 (9.6X); 1 node + 8x P100 SXM2 per node 3397.74 (10.4X)
lj_liquid_512k on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj_liquid_512k, average timesteps/sec: 1 Broadwell node 43.43; 1 node + 1x K80 per node 220.10 (5.1X); 1 node + 2x K80 per node 334.59 (7.7X); 1 node + 4x K80 per node 526.47 (12.1X)
lj_liquid_512k on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj_liquid_512k, average timesteps/sec: 1 Broadwell node 43.43; 1 node + 1x P100 PCIe (16GB) per node 398.12 (9.2X); 1 node + 2x P100 PCIe (16GB) per node 534.54 (12.3X); 1 node + 4x P100 PCIe (16GB) per node 770.18 (17.7X); 1 node + 8x P100 PCIe (16GB) per node 1045.50 (24.1X)
lj_liquid_512k on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj_liquid_512k, average timesteps/sec: 1 Broadwell node 43.43; 1 node + 1x P100 SXM2 per node 443.74 (10.2X); 1 node + 2x P100 SXM2 per node 568.51 (13.1X); 1 node + 4x P100 SXM2 per node 793.36 (18.3X); 1 node + 8x P100 SXM2 per node 1119.76 (25.8X)
lj_liquid_1m on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj_liquid_1m, average timesteps/sec: 1 Broadwell node 22.07; 1 node + 1x K80 per node 109.54 (5.0X); 1 node + 2x K80 per node 181.42 (8.2X); 1 node + 4x K80 per node 303.00 (13.7X)
lj_liquid_1m on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj_liquid_1m, average timesteps/sec: 1 Broadwell node 22.07; 1 node + 1x P100 PCIe (16GB) per node 204.67 (9.3X); 1 node + 2x P100 PCIe (16GB) per node 294.88 (13.4X); 1 node + 4x P100 PCIe (16GB) per node 465.58 (21.1X); 1 node + 8x P100 PCIe (16GB) per node 672.46 (30.5X)
lj_liquid_1m on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
lj_liquid_1m, average timesteps/sec: 1 Broadwell node 22.07; 1 node + 1x P100 SXM2 per node 221.02 (10.0X); 1 node + 2x P100 SXM2 per node 315.07 (14.3X); 1 node + 4x P100 SXM2 per node 488.04 (22.1X); 1 node + 8x P100 SXM2 per node 707.73 (32.1X)
Microsphere on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
microsphere, average timesteps/sec: 1 Broadwell node 17.53; 1 node + 1x K80 per node 64.87 (3.7X); 1 node + 2x K80 per node 98.43 (5.6X); 1 node + 4x K80 per node 166.74 (9.5X)
Microsphere on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
microsphere, average timesteps/sec: 1 Broadwell node 17.53; 1 node + 1x P100 PCIe (16GB) per node 145.71 (8.3X); 1 node + 2x P100 PCIe (16GB) per node 179.54 (10.2X); 1 node + 4x P100 PCIe (16GB) per node 257.58 (14.7X); 1 node + 8x P100 PCIe (16GB) per node 371.24 (21.2X)
Microsphere on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
microsphere, average timesteps/sec: 1 Broadwell node 17.53; 1 node + 1x P100 SXM2 per node 151.51 (8.6X); 1 node + 2x P100 SXM2 per node 186.01 (10.6X); 1 node + 4x P100 SXM2 per node 271.21 (15.5X); 1 node + 8x P100 SXM2 per node 384.72 (21.9X)
Polymer on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
polymer, average timesteps/sec: 1 Broadwell node 362.19; 1 node + 1x K80 per node 975.14 (2.7X); 1 node + 2x K80 per node 1209.45 (3.3X); 1 node + 4x K80 per node 1518.99 (4.2X)
Polymer on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
polymer, average timesteps/sec: 1 Broadwell node 362.19; 1 node + 1x P100 PCIe (16GB) per node 1999.64 (5.5X); 1 node + 4x P100 PCIe (16GB) per node 2143.15 (5.9X); 1 node + 8x P100 PCIe (16GB) per node 2480.70 (6.8X)
Polymer on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
polymer, average timesteps/sec: 1 Broadwell node 362.19; 1 node + 1x P100 SXM2 per node 2111.99 (5.8X); 1 node + 4x P100 SXM2 per node 2272.27 (6.3X); 1 node + 8x P100 SXM2 per node 2651.56 (7.3X)
Quasicrystal on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
quasicrystal, average timesteps/sec: 1 Broadwell node 78.32; 1 node + 1x K80 per node 502.53 (6.4X); 1 node + 2x K80 per node 767.90 (9.8X); 1 node + 4x K80 per node 1280.44 (16.3X)
Quasicrystal on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
quasicrystal, average timesteps/sec: 1 Broadwell node 78.32; 1 node + 1x P100 PCIe (16GB) per node 851.29 (10.9X); 1 node + 2x P100 PCIe (16GB) per node 1199.64 (15.3X); 1 node + 4x P100 PCIe (16GB) per node 1791.41 (22.9X); 1 node + 8x P100 PCIe (16GB) per node 2261.72 (28.9X)
Quasicrystal on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
quasicrystal, average timesteps/sec: 1 Broadwell node 78.32; 1 node + 1x P100 SXM2 per node 939.53 (12.0X); 1 node + 2x P100 SXM2 per node 1249.90 (16.0X); 1 node + 4x P100 SXM2 per node 1940.29 (24.8X); 1 node + 8x P100 SXM2 per node 2429.68 (31.0X)
Triblock-copolymer on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
triblock-copolymer, average timesteps/sec: 1 Broadwell node 361.42; 1 node + 1x K80 per node 953.01 (2.6X); 1 node + 2x K80 per node 1170.47 (3.2X); 1 node + 4x K80 per node 1492.01 (4.1X)
Triblock-copolymer on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
triblock-copolymer, average timesteps/sec: 1 Broadwell node 361.42; 1 node + 1x P100 PCIe (16GB) per node 1999.14 (5.5X); 1 node + 4x P100 PCIe (16GB) per node 2155.27 (6.0X); 1 node + 8x P100 PCIe (16GB) per node 2456.09 (6.8X)
Triblock-copolymer on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
triblock-copolymer, average timesteps/sec: 1 Broadwell node 361.42; 1 node + 1x P100 SXM2 per node 2132.92 (5.9X); 1 node + 4x P100 SXM2 per node 2253.83 (6.2X); 1 node + 8x P100 SXM2 per node 2587.91 (7.2X)
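The HOOMD-Blue charts show a wide spread in multi-GPU scaling: on P100 SXM2 nodes, lj_liquid_1m gains far more from eight GPUs than polymer does. The Karp-Flatt metric is one standard way to turn such measurements into an estimated serial (non-scaling) fraction, e = (1/ψ − 1/p) / (1 − 1/p), where ψ is the speedup over the p = 1 run. The sketch below applies it to the chart values, taking the 1-GPU run (not the CPU-only node) as the baseline; that choice, and the helper name `karp_flatt`, are our own assumptions rather than part of the benchmark.

```python
def karp_flatt(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if processors < 2:
        raise ValueError("need at least 2 processors")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Average timesteps/sec on P100 SXM2 nodes, read from the HOOMD-Blue charts above
one_gpu = {"lj_liquid_1m": 221.02, "polymer": 2111.99}
eight_gpus = {"lj_liquid_1m": 707.73, "polymer": 2651.56}

for name in one_gpu:
    psi = eight_gpus[name] / one_gpu[name]  # speedup over the 1-GPU run
    print(f"{name}: {psi:.2f}x on 8 GPUs, serial fraction ~{karp_flatt(psi, 8):.2f}")
```

Here lj_liquid_1m gives e of about 0.21 and polymer about 0.77. A larger e indicates a larger share of non-scaling work or multi-GPU overhead; because e is measured against the 1-GPU run, it says nothing about the CPU-to-GPU speedups shown in the chart labels.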
February 2017
LAMMPS 2016
Atomic-Fluid Lennard-Jones 2.5 Cutoff on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
Atomic-Fluid Lennard-Jones 2.5 Cutoff, 1/seconds: 1 Broadwell node 0.37; 1 node + 2x K80 per node 0.57 (1.5X)
Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Atomic-Fluid Lennard-Jones 2.5 Cutoff, 1/seconds: 1 Broadwell node 0.37; 1 node + 2x P100 PCIe (16GB) per node 0.62 (1.7X)
Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (autoboost) GPUs
Atomic-Fluid Lennard-Jones 2.5 Cutoff, 1/seconds: 1 Broadwell node 0.37; 1 node + 2x P100 SXM2 per node 0.64 (1.7X)
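The LAMMPS charts plot 1/seconds, i.e. the reciprocal of the benchmark's wall-clock time, so a higher bar is better and the speedup equals the ratio of wall times, t_CPU / t_GPU. The sketch converts the Atomic-Fluid Lennard-Jones 2.5 Cutoff bars back to seconds; it assumes each bar is exactly 1 / (run time), which the axis label suggests but the slides do not spell out, and the helper names are hypothetical.

```python
def wall_time_seconds(rate_per_second: float) -> float:
    """Invert a '1/seconds' bar back to wall-clock seconds."""
    return 1.0 / rate_per_second


def speedup_from_rates(cpu_rate: float, gpu_rate: float) -> float:
    """Speedup as t_cpu / t_gpu, which equals gpu_rate / cpu_rate."""
    return wall_time_seconds(cpu_rate) / wall_time_seconds(gpu_rate)


cpu_rate = 0.37  # 1 Broadwell node
gpu_rates = {"2x K80": 0.57, "2x P100 PCIe": 0.62, "2x P100 SXM2": 0.64}

print(f"CPU only: {wall_time_seconds(cpu_rate):.2f} s")
for config, rate in gpu_rates.items():
    print(f"{config}: {wall_time_seconds(rate):.2f} s, "
          f"{speedup_from_rates(cpu_rate, rate):.1f}X")
```

Rounded to one decimal, these ratios match the 1.5X, 1.7X and 1.7X labels on the three charts.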
Atomic-Fluid Lennard-Jones 5.0 Cutoff on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Atomic-Fluid Lennard-Jones 5.0 Cutoff, 1/seconds: 1 Broadwell node 0.10; 1 node + 1x K80 per node 0.14 (1.4X); 1 node + 2x K80 per node 0.26 (2.6X); 1 node + 4x K80 per node 0.36 (3.6X)
Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Atomic-Fluid Lennard-Jones 5.0 Cutoff, 1/seconds: 1 Broadwell node 0.10; 1 node + 1x P100 PCIe (16GB) per node 0.22 (2.2X); 1 node + 2x P100 PCIe (16GB) per node 0.35 (3.5X); 1 node + 4x P100 PCIe (16GB) per node 0.37 (3.7X); 1 node + 8x P100 PCIe (16GB) per node 0.38 (3.8X)
Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Atomic-Fluid Lennard-Jones 5.0 Cutoff, 1/seconds: 1 Broadwell node 0.10; 1 node + 1x P100 SXM2 per node 0.22 (2.2X); 1 node + 2x P100 SXM2 per node 0.36 (3.6X); 1 node + 4x P100 SXM2 per node 0.41 (4.1X)
Coarse-grain Water on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
Coarse-grain Water, 1/seconds: 1 Broadwell node 0.00437; 1 node + 4x K80 per node 0.00444 (1.0X)
Coarse-grain Water on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Coarse-grain Water, 1/seconds: 1 Broadwell node 0.0044; 1 node + 4x P100 PCIe (16GB) per node 0.0061 (1.4X); 1 node + 8x P100 PCIe (16GB) per node 0.0093 (2.1X)
Coarse-grain Water on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
Coarse-grain Water, 1/seconds: 1 Broadwell node 0.0044; 1 node + 4x P100 SXM2 per node 0.0069 (1.6X); 1 node + 8x P100 SXM2 per node 0.0110 (2.5X)
EAM on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
EAM, 1/seconds: 1 Broadwell node 0.01; 1 node + 1x K80 per node 0.02 (2.0X); 1 node + 2x K80 per node 0.04 (4.0X); 1 node + 4x K80 per node 0.07 (7.0X)
EAM on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
EAM, 1/seconds: 1 Broadwell node 0.01; 1 node + 1x P100 PCIe (16GB) per node 0.03 (3.0X); 1 node + 2x P100 PCIe (16GB) per node 0.05 (5.0X); 1 node + 4x P100 PCIe (16GB) per node 0.08 (8.0X); 1 node + 8x P100 PCIe (16GB) per node 0.13 (13.0X)
EAM on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
EAM, 1/seconds: 1 Broadwell node 0.01; 1 node + 1x P100 SXM2 per node 0.03 (3.0X); 1 node + 2x P100 SXM2 per node 0.05 (5.0X); 1 node + 4x P100 SXM2 per node 0.08 (8.0X); 1 node + 8x P100 SXM2 per node 0.13 (13.0X)
Gay-Berne on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Gay-Berne, 1/seconds: 1 Broadwell node 0.01; 1 node + 1x K80 per node 0.02 (2.0X); 1 node + 2x K80 per node 0.03 (3.0X); 1 node + 4x K80 per node 0.04 (4.0X)
Gay-Berne on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Gay-Berne, 1/seconds: 1 Broadwell node 0.01; 1 node + 1x P100 PCIe (16GB) per node 0.02 (2.0X); 1 node + 2x P100 PCIe (16GB) per node 0.04 (4.0X); 1 node + 4x P100 PCIe (16GB) per node 0.05 (5.0X)
Gay-Berne on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Gay-Berne, 1/seconds: 1 Broadwell node 0.01; 1 node + 1x P100 SXM2 per node 0.02 (2.0X); 1 node + 2x P100 SXM2 per node 0.04 (4.0X); 1 node + 4x P100 SXM2 per node 0.05 (5.0X)
Rhodopsin on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Rhodopsin, 1/seconds: 1 Broadwell node 0.22; 1 node + 1x K80 per node 0.22; 1 node + 2x K80 per node 0.31 (1.4X); 1 node + 4x K80 per node 0.38 (1.7X)
Rhodopsin on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Rhodopsin, 1/seconds: 1 Broadwell node 0.22; 1 node + 1x P100 PCIe (16GB) per node 0.29 (1.3X); 1 node + 2x P100 PCIe (16GB) per node 0.33 (1.5X); 1 node + 4x P100 PCIe (16GB) per node 0.48 (2.2X); 1 node + 8x P100 PCIe (16GB) per node 0.52 (2.4X)
Rhodopsin on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell)
Rhodopsin, 1/seconds: 1 Broadwell node 0.22; 1 node + 1x P100 SXM2 per node 0.30 (1.4X); 1 node + 2x P100 SXM2 per node 0.38 (1.7X); 1 node + 4x P100 SXM2 per node 0.49 (2.2X); 1 node + 8x P100 SXM2 per node 0.50 (2.3X)
Recommended GPU Node Configuration for LAMMPS Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets 2
Cores per CPU socket 6+
CPU speed (GHz) 2.66+
System memory per socket (GB) 32
GPUs GTX Titan X; Kepler K20, K40, K80; Maxwell M40
# of GPUs per CPU socket 1-2
GPU memory preference (GB) 6+
GPU to CPU connection PCIe 3.0 or higher
Server storage 500 GB or higher
Network configuration Gemini, InfiniBand
Scale to thousands of nodes with same single node configuration
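The recommended configuration above can be read as a set of minimums plus one upper bound (at most two GPUs per CPU socket). The sketch below encodes the numeric rows of the LAMMPS table as a Python dataclass and lists which recommendations a candidate node misses; the class, field names, and the example node are hypothetical and only illustrate the table, and the GPU-model and network rows are left out.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of one GPU node; units follow the table rows."""
    cpu_sockets: int
    cores_per_socket: int
    cpu_ghz: float
    memory_per_socket_gb: int
    gpus_per_socket: int
    gpu_memory_gb: int
    pcie_generation: float
    storage_gb: int


# Lower bounds from the LAMMPS table ("6+", "2.66+", "PCIe 3.0 or higher", ...)
LAMMPS_MINIMUM = NodeSpec(2, 6, 2.66, 32, 1, 6, 3.0, 500)
MAX_GPUS_PER_SOCKET = 2  # the table recommends 1-2 GPUs per CPU socket


def unmet_recommendations(node: NodeSpec) -> list[str]:
    """List the table rows that `node` does not satisfy."""
    issues = [f.name for f in fields(NodeSpec)
              if getattr(node, f.name) < getattr(LAMMPS_MINIMUM, f.name)]
    if node.gpus_per_socket > MAX_GPUS_PER_SOCKET:
        issues.append("gpus_per_socket")
    return issues


# Hypothetical candidate: meets every minimum but carries four GPUs per socket
candidate = NodeSpec(2, 8, 3.0, 64, 4, 16, 3.0, 1000)
print(unmet_recommendations(candidate))
```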
July 2017
NAMD 2.12
APOA1 on K80s
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
APOA1, ns/day: 1 Broadwell node 3.45; 1 node + 1x K80 per node 14.92; 1 node + 2x K80 per node 17.73
136
APOA1 on P100s PCIe
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
APOA1, ns/day (higher is better):
1 Broadwell node: 3.45
1 node + 1x P100 PCIe (16GB) per node: 22.58
1 node + 2x P100 PCIe (16GB) per node: 22.85
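An ns/day figure is simulated time per day of wall-clock time, so dividing a target trajectory length by it gives the expected run length. A minimal Python sketch, assuming the rate stays constant over the whole run; the 1-microsecond target and the function name are illustrative choices, not from the slides:

```python
def days_to_simulate(target_ns, ns_per_day):
    """Wall-clock days needed to reach target_ns, assuming a constant rate."""
    return target_ns / ns_per_day


TARGET_NS = 1000.0  # 1 microsecond of simulated time (illustrative target)

# APOA1 rates in ns/day from the NAMD 2.12 P100 PCIe chart.
for label, rate in [("1 Broadwell node", 3.45),
                    ("1 node + 1x P100 PCIe", 22.58)]:
    print(f"{label}: {days_to_simulate(TARGET_NS, rate):.1f} days")
```

With these inputs the CPU-only node needs about 289.9 days and the node with one P100 PCIe about 44.3 days.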
137
APOA1 on P100s SXM2
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
APOA1, ns/day (higher is better):
1 Broadwell node: 3.45
1 node + 1x P100 SXM2 per node: 22.98
1 node + 2x P100 SXM2 per node: 23.44
1 node + 4x P100 SXM2 per node: 23.87
138
F1ATPASE on K80s
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
F1ATPASE, ns/day (higher is better):
1 Broadwell node: 1.15
1 node + 1x K80 per node: 4.81
1 node + 2x K80 per node: 6.27
139
F1ATPASE on P100s PCIe
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
F1ATPASE, ns/day (higher is better):
1 Broadwell node: 1.15
1 node + 1x P100 PCIe (16GB) per node: 7.34
1 node + 2x P100 PCIe (16GB) per node: 6.99
1 node + 4x P100 PCIe (16GB) per node: 7.40
140
F1ATPASE on P100s SXM2
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
F1ATPASE, ns/day (higher is better):
1 Broadwell node: 1.15
1 node + 1x P100 SXM2 per node: 7.11
1 node + 2x P100 SXM2 per node: 6.85
1 node + 4x P100 SXM2 per node: 7.11
141
STMV on K80s
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
STMV, ns/day (higher is better):
1 Broadwell node: 0.292
1 node + 1x K80 per node: 1.274
1 node + 2x K80 per node: 2.085
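The multi-K80 bars can also be read as scaling efficiency: the measured gain from adding accelerators divided by the ideal, linear gain. A minimal Python sketch using the NAMD 2.12 K80 numbers for APOA1 and STMV from the charts above; 100% would mean that doubling the K80 count doubles ns/day. The function and variable names are hypothetical:

```python
def scaling_efficiency(rate_1, rate_n, n):
    """Fraction of ideal linear scaling when going from 1 to n accelerators."""
    return (rate_n / rate_1) / n


# ns/day for "1 node + 1x K80" and "1 node + 2x K80" in the NAMD 2.12 charts.
k80_runs = {"APOA1": (14.92, 17.73), "STMV": (1.274, 2.085)}

for system, (one_k80, two_k80) in k80_runs.items():
    eff = scaling_efficiency(one_k80, two_k80, 2)
    print(f"{system}: {eff:.0%} of ideal scaling from 1x to 2x K80")
```

With these inputs APOA1 reaches about 59% and STMV about 82%, so in these runs the larger STMV system makes better use of the second K80.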
142
STMV on P100s PCIe
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
STMV, ns/day (higher is better):
1 Broadwell node: 0.29
1 node + 1x P100 PCIe (16GB) per node: 2.15
1 node + 2x P100 PCIe (16GB) per node: 2.32
143
STMV on P100s SXM2
Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
STMV, ns/day (higher is better):
1 Broadwell node: 0.292
1 node + 1x P100 SXM2 per node: 2.077
NAMD 2.11 – Up to 2X Faster
145
New GPU features in NAMD 2.11
• GPU-accelerated simulations up to twice as fast as NAMD 2.10
• Pressure calculation with fixed atoms on GPU works as on CPU
• Improved scaling for GPU-accelerated particle-mesh Ewald calculation
• CPU-side operations overlap better and are parallelized across cores.
• Improved scaling for GPU-accelerated simulations
• Nonbonded force calculation results are streamed from the GPU for better overlap.
• NVIDIA CUDA GPU-acceleration binaries for Mac OS X
Selected Text from the NAMD website
146
NAMD 2.11 is up to 2x faster
Simulated time (ns/day) for APoA1 (92,224 atoms) on 1 Node, 2 Nodes, and 4 Nodes; NAMD 2.11 speedup labels: 1.2X, 1.6X, 2.0X
NAMD 2.10 & NAMD 2.11 nodes contain Dual Intel E5-2697 (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs
147
NAMD 2.11 APoA1 on 1 and 2 nodes
Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 (Haswell) CPUs
The green nodes contain Dual Intel E5-2698 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
APoA1 (92,224 atoms), simulated time in ns/day (speedup vs. CPU-only run with the same node count in parentheses):
1 Node: 2.77
1 Node + 1x K80: 11.67 (4.2X)
1 Node + 2x K80: 16.99 (6.1X)
2 Nodes: 5.22
2 Nodes + 1x K80: 19.73 (3.8X)
2 Nodes + 2x K80: 24.31 (4.7X)
148
NAMD 2.11 APoA1 on 4 and 8 nodes
Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 (Haswell) CPUs
The green nodes contain Dual Intel E5-2698 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
APoA1 (92,224 atoms), simulated time in ns/day (speedup vs. CPU-only run with the same node count in parentheses):
4 Nodes: 10.27
4 Nodes + 1x K80: 20.64 (2.0X)
4 Nodes + 2x K80: 23.52 (2.3X)
8 Nodes: 16.85
8 Nodes + 1x K80: 27.83 (1.7X)
8 Nodes + 2x K80: 27.74 (1.6X)
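Another way to compare the NAMD 2.11 APoA1 results is to ask how many CPU-only nodes are needed to match one GPU-accelerated node. A minimal Python sketch using the CPU-only rates from the 1, 2, 4, and 8 node charts; it only considers the measured node counts and does not interpolate between them. The names are hypothetical:

```python
# NAMD 2.11 APoA1 CPU-only rates in ns/day, keyed by node count (charts above).
cpu_only = {1: 2.77, 2: 5.22, 4: 10.27, 8: 16.85}


def min_nodes_to_match(rates_by_nodes, target):
    """Smallest measured node count whose rate reaches target, else None."""
    for nodes in sorted(rates_by_nodes):
        if rates_by_nodes[nodes] >= target:
            return nodes
    return None


print(min_nodes_to_match(cpu_only, 11.67))  # rate of 1 Node + 1x K80
print(min_nodes_to_match(cpu_only, 16.99))  # rate of 1 Node + 2x K80
```

The first call returns 8 and the second returns None: none of the measured CPU-only runs, up to 8 nodes, reaches the 16.99 ns/day of a single node with 2x K80.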
149
NAMD 2.11 is up to 1.8x faster
Simulated time (ns/day) for F1-ATPase (327,506 atoms) on 1 Node, 2 Nodes, and 4 Nodes; NAMD 2.11 speedup labels: 1.1X, 1.8X, 1.4X
NAMD 2.10 & NAMD 2.11 nodes contain Dual Intel E5-2697 (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs
150
NAMD 2.11 F1-ATPase on 1 and 2 nodes
Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 (Haswell) CPUs
The green nodes contain Dual Intel E5-2698 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
F1-ATPase (327,506 atoms), simulated time in ns/day (speedup vs. CPU-only run with the same node count in parentheses):
1 Node: 0.94
1 Node + 1x K80: 3.87 (4.1X)
1 Node + 2x K80: 6.11 (6.5X)
2 Nodes: 1.86
2 Nodes + 1x K80: 7.23 (3.9X)
2 Nodes + 2x K80: 10.58 (5.7X)
151
NAMD 2.11 F1-ATPase on 4 and 8 nodes
Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 (Haswell) CPUs
The green nodes contain Dual Intel E5-2698 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
F1-ATPase (327,506 atoms), simulated time in ns/day (speedup vs. CPU-only run with the same node count in parentheses):
4 Nodes: 3.63
4 Nodes + 1x K80: 11.66 (3.2X)
4 Nodes + 2x K80: 12.62 (3.5X)
8 Nodes: 6.88
8 Nodes + 1x K80: 14.22 (2.1X)
8 Nodes + 2x K80: 15.74 (2.3X)
152
NAMD 2.11 is up to 1.5x faster
Simulated time (ns/day) for STMV (1,066,628 atoms) on 1 Node, 2 Nodes, and 4 Nodes; NAMD 2.11 speedup labels: 1.5X, 1.1X, 1.5X
NAMD 2.10 & NAMD 2.11 nodes contain Dual Intel E5-2697 (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs
153
NAMD 2.11 STMV on 1 and 2 nodes
Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 (Haswell) CPUs
The green nodes contain Dual Intel E5-2698 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
STMV (1,066,628 atoms), simulated time in ns/day (speedup vs. CPU-only run with the same node count in parentheses):
1 Node: 0.23
1 Node + 1x K80: 1.03 (4.5X)
1 Node + 2x K80: 1.75 (7.6X)
2 Nodes: 0.46
2 Nodes + 1x K80: 1.98 (4.3X)
2 Nodes + 2x K80: 3.27 (7.1X)
154
NAMD 2.11 STMV on 4 and 8 nodes
Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 (Haswell) CPUs
The green nodes contain Dual Intel E5-2698 (Haswell) CPUs + Tesla K80 (autoboost) GPUs
STMV (1,066,628 atoms), simulated time in ns/day (speedup vs. CPU-only run with the same node count in parentheses):
4 Nodes: 0.90
4 Nodes + 1x K80: 3.61 (4.0X)
4 Nodes + 2x K80: 4.54 (5.0X)
8 Nodes: 1.74
8 Nodes + 1x K80: 5.86 (3.4X)
8 Nodes + 2x K80: 6.24 (3.6X)
155
Benefits of MD GPU-Accelerated Computing
• 3x-8x faster than CPU-only systems in all tests (on average)
• Most major compute-intensive aspects of classical MD ported
• Large performance boost with marginal price increase
• Energy usage cut by more than half
• GPUs scale well within a node and/or over multiple nodes
• K80 GPU is our fastest and lowest-power high-performance GPU yet
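This section gives no power measurements, so the energy claim cannot be recomputed from it. The relationship behind such a claim can still be sketched: for a fixed amount of simulated time, energy is average power multiplied by run time, and run time shrinks by the speedup. A minimal Python sketch under the assumption of constant average power during each run; the function name and the power ratios tried are hypothetical:

```python
def energy_ratio(power_ratio, speedup):
    """GPU-run energy divided by CPU-only-run energy for the same simulation.

    power_ratio: average power of the GPU-accelerated node(s) divided by the
    average power of the CPU-only node(s), assumed constant over each run.
    speedup: CPU-only run time divided by GPU-accelerated run time.
    """
    return power_ratio / speedup


# Hypothetical power ratios, checked against the 4.2X APoA1 speedup
# (1 Node + 1x K80, NAMD 2.11); a result below 0.5 means energy is more than halved.
for power_ratio in (1.5, 2.1, 3.0):
    print(power_ratio, round(energy_ratio(power_ratio, 4.2), 3))
```

Under these assumptions, a 4.2X speedup halves the energy as long as the GPU-accelerated node draws no more than 2.1 times the power of the CPU-only node.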
Try GPU-accelerated MD apps for free – www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
Dec. 19, 2016
Molecular Dynamics (MD) on GPUs
157
GPU-Accelerated Quantum Chemistry Apps
Abinit
ACES III
ADF
BigDFT
CP2K
GAMESS-US
Gaussian
GPAW
LATTE
LSDalton
MOLCAS
Mopac2012
NWChem
Green Lettering Indicates Performance Slides Included
GPU performance compared against dual-socket multi-core x86 CPUs.
Quantum SuperCharger Library
RMG
TeraChem
UNM
VASP
WL-LSMS
Octopus
ONETEP
Petot
Q-Chem
QMCPACK
Quantum Espresso