
Page 1: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

Feb. 2, 2017

Molecular Dynamics (MD) on GPUs

Page 2:

Accelerating Discoveries

Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, "the perfect target for fighting the infection."

Without GPUs, the supercomputer would need to be 5x larger for similar performance.

Page 3:

Overview of Life & Material Accelerated Apps

MD: All key codes are GPU-accelerated
- Great multi-GPU performance
- Focus on dense (up to 16) GPU nodes and/or large numbers of GPU nodes
- ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more

QC: All key codes are ported or being optimized
- Focus on GPU-accelerated math libraries and OpenACC directives
- GPU-accelerated and available today: ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum Espresso/PWscf, TeraChem*
- Active GPU acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more

green* = application where >90% of the workload is on GPU

Page 4:

MD vs. QC on GPUs

"Classical" Molecular Dynamics | Quantum Chemistry (MO, PW, DFT, Semi-Emp)
Simulates positions of atoms over time; chemical-biological or chemical-material behaviors | Calculates electronic properties: ground state, excited states, spectral properties, making/breaking bonds, physical properties
Forces calculated from simple empirical formulas (bond rearrangement generally forbidden) | Forces derived from the electron wave function (bond rearrangement OK, e.g., bond energies)
Up to millions of atoms | Up to a few thousand atoms
Solvent included without difficulty | Generally in a vacuum; if needed, solvent treated classically (QM/MM) or with implicit methods
Single precision dominated | Double precision is important
Uses cuBLAS, cuFFT, CUDA | Uses cuBLAS, cuFFT, OpenACC
GeForce (workstations), Tesla (servers) | Tesla recommended
ECC off | ECC on
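The "simple empirical formulas" on the MD side can be made concrete with the 12-6 Lennard-Jones pair force, the workhorse nonbonded term in classical force fields. A minimal sketch; the parameters are illustrative (roughly argon-like, epsilon in kcal/mol and sigma in Angstrom) and not taken from the slides:

```python
import math

def lj_force(r, epsilon=0.238, sigma=3.4):
    # Force magnitude of the 12-6 Lennard-Jones potential:
    #   F(r) = 24*epsilon*(2*(sigma/r)**12 - (sigma/r)**6) / r
    # Positive = repulsive, negative = attractive.
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

# The force vanishes at the potential minimum r_min = 2**(1/6) * sigma,
# is repulsive at shorter separations and attractive at longer ones.
r_min = 2 ** (1 / 6) * 3.4
```

Evaluating millions of such pair terms per timestep, in single precision, is exactly the workload that maps well onto GPUs.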

Page 5:

GPU-Accelerated Molecular Dynamics Apps

ACEMD, AMBER, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUGrid.net, GROMACS, HALMD, HOOMD-Blue, LAMMPS, mdcore, MELD, NAMD, OpenMM, PolyFTS

Green lettering indicates performance slides included.
GPU performance is compared against a dual-socket multi-core x86 CPU node.

Page 6:

Benefits of GPU-Accelerated MD Computing

• 3x-8x faster than CPU-only systems in all tests (on average)
• Most major compute-intensive aspects of classical MD are ported
• Large performance boost with a marginal price increase
• Energy usage cut by more than half
• GPUs scale well within a node and/or across multiple nodes
• The K80 is our fastest and lowest-power high-performance GPU yet

Try GPU-accelerated MD apps for free: www.nvidia.com/GPUTestDrive

Why wouldn't you want to turbocharge your research?
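The "energy usage cut by more than half" claim follows from time-to-solution: a GPU node draws more power but finishes far sooner, so energy per simulated nanosecond drops. A sketch with hypothetical power and throughput figures (600 W / 3 ns/day and 900 W / 15 ns/day are assumptions for illustration, not numbers from the slides):

```python
def energy_per_ns(node_watts, ns_per_day):
    # Energy in watt-hours consumed per simulated nanosecond:
    # watts * (24 h/day) / (ns/day).
    return node_watts * 24.0 / ns_per_day

# Hypothetical nodes, for illustration only:
cpu = energy_per_ns(600, 3.0)    # CPU-only node: 4800 Wh per ns
gpu = energy_per_ns(900, 15.0)   # GPU node:      1440 Wh per ns
ratio = gpu / cpu                # 0.3 -> energy cut by well over half
```

The point generalizes: whenever the speedup exceeds the increase in node power, energy to solution goes down.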

Page 7:

ACEMD

Page 8:

www.acellera.com

470 ns/day on 1 GPU for L-Iduronic acid (1362 atoms)

116 ns/day on 1 GPU for DHFR (23K atoms)

M. J. Harvey, G. Giupponi and G. De Fabritiis, ACEMD: Accelerating biomolecular dynamics in the microsecond time scale, J. Chem. Theory Comput. 5, 1632 (2009)
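Rates like these translate directly into the "microseconds timescale" of the cited paper: at a fixed ns/day, a month of wall-clock time yields a predictable amount of simulated time. A small conversion sketch (the 30-day month is an assumption):

```python
def us_per_month(ns_per_day, days=30):
    # Simulated time accumulated over `days` of wall-clock running,
    # in microseconds (1 us = 1000 ns).
    return ns_per_day * days / 1000.0

dhfr     = us_per_month(116)   # 23K-atom DHFR at 116 ns/day  -> 3.48 us/month
iduronic = us_per_month(470)   # 1362-atom system at 470 ns/day -> 14.1 us/month
```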

Page 9:

www.acellera.com

NVT, NPT, PME, TCL, PLUMED, CAMSHIFT [1]

[1] M. J. Harvey and G. De Fabritiis, An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware, J. Chem. Theory Comput. 5, 2371-2377 (2009)
[2] For a list of selected references see http://www.acellera.com/acemd/publications

Page 10:

June 2017

AMBER 16

Page 11:

JAC_NVE on GP100s (23,558 atoms, PME, 2 fs)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   320.19
1 node + 1x GP100 (NVLink): 320.14
1 node + 2x GP100 (PCIe):   370.32
1 node + 2x GP100 (NVLink): 404.09

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)
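The ns/day figures on these slides follow from the integration timestep and the wall-clock rate of timesteps. A minimal sketch of the conversion, which also shows why the 4 fs runs on the next slide roughly double the 2 fs numbers (longer steps are typically enabled by hydrogen mass repartitioning):

```python
def ns_per_day(timestep_fs, steps_per_second):
    # ns/day = (fs per step) * (steps per second) * (86400 s per day) * (1e-6 ns per fs)
    return timestep_fs * steps_per_second * 86400 * 1e-6

# At a fixed per-step cost, doubling the timestep doubles the reported throughput:
# ns_per_day(4, s) == 2 * ns_per_day(2, s) for any steps/s rate s.
```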

Page 12:

JAC_NVE on GP100s (23,558 atoms, PME, 4 fs)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   614.42
1 node + 1x GP100 (NVLink): 613.16
1 node + 2x GP100 (PCIe):   714.23
1 node + 2x GP100 (NVLink): 782.11

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 13:

JAC_NPT on GP100s (23,558 atoms, PME, 2 fs)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   295.75
1 node + 1x GP100 (NVLink): 295.42
1 node + 2x GP100 (PCIe):   333.03
1 node + 2x GP100 (NVLink): 360.64

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 14:

JAC_NPT on GP100s (23,558 atoms, PME, 4 fs)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   580.47
1 node + 1x GP100 (NVLink): 578.48
1 node + 2x GP100 (PCIe):   654.66
1 node + 2x GP100 (NVLink): 706.53

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 15:

FactorIX_NVE on GP100s (90,906 atoms, PME)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   106.23
1 node + 1x GP100 (NVLink): 105.98
1 node + 2x GP100 (PCIe):   142.45
1 node + 2x GP100 (NVLink): 166.61

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 16:

FactorIX_NPT on GP100s (90,906 atoms, PME)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   102.27
1 node + 1x GP100 (NVLink): 102.26
1 node + 2x GP100 (PCIe):   126.75
1 node + 2x GP100 (NVLink): 146.34

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 17:

Cellulose_NVE on GP100s (408,609 atoms, PME)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   24.01
1 node + 1x GP100 (NVLink): 24.02
1 node + 2x GP100 (PCIe):   31.35
1 node + 2x GP100 (NVLink): 36.91

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 18:

Cellulose_NPT on GP100s (408,609 atoms, PME)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   22.76
1 node + 1x GP100 (NVLink): 22.80
1 node + 2x GP100 (PCIe):   28.76
1 node + 2x GP100 (NVLink): 32.37

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 19:

STMV_NPT on GP100s (1,067,095 atoms, PME)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   15.64
1 node + 1x GP100 (NVLink): 15.43
1 node + 2x GP100 (PCIe):   20.22
1 node + 2x GP100 (NVLink): 23.13

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 20:

TRPCAGE on GP100s (304 atoms, GB)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   1216.56
1 node + 1x GP100 (NVLink): 1187.30

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 21:

Myoglobin on GP100s (2,492 atoms, GB)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   470.41
1 node + 1x GP100 (NVLink): 458.28
1 node + 2x GP100 (PCIe):   443.49
1 node + 2x GP100 (NVLink): 447.23

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 22:

Nucleosome on GP100s (25,095 atoms, GB)

Throughput in ns/day, AMBER 16:
1 node + 1x GP100 (PCIe):   11.47
1 node + 1x GP100 (NVLink): 11.29
1 node + 2x GP100 (PCIe):   21.29
1 node + 2x GP100 (NVLink): 20.51

Nodes: Dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink)

Page 23:

February 2017

AMBER 16

Page 24:

PME-Cellulose_NPT on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  2.35
1 node + 1x K80:             11.36  (4.8X)
1 node + 2x K80:             15.43  (6.6X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4
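The speedup annotations on these slides are simply the ratio of GPU-node throughput to the CPU-only node's throughput. A quick check against this slide's numbers:

```python
def speedup(gpu_ns_day, cpu_ns_day):
    # Speedup factor as reported on the slides: GPU-node throughput
    # divided by the CPU-only baseline, both in ns/day.
    return gpu_ns_day / cpu_ns_day

# Values from the PME-Cellulose_NPT K80 slide:
one_k80 = round(speedup(11.36, 2.35), 1)   # 4.8
two_k80 = round(speedup(15.43, 2.35), 1)   # 6.6
```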

Page 25:

PME-Cellulose_NPT on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):    2.35
1 node + 1x P100 PCIe (16GB):  21.85  (9.3X)
1 node + 2x P100 PCIe (16GB):  30.00  (12.8X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 26:

PME-Cellulose_NPT on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  2.35
1 node + 1x P100 SXM2:       23.37  (9.9X)
1 node + 2x P100 SXM2:       32.22  (13.7X)
1 node + 4x P100 SXM2:       36.65  (15.6X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 27:

PME-Cellulose_NVE on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  2.47
1 node + 1x K80:             11.85  (4.8X)
1 node + 2x K80:             16.53  (6.7X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Page 28:

PME-Cellulose_NVE on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):    2.47
1 node + 1x P100 PCIe (16GB):  23.34  (9.4X)
1 node + 2x P100 PCIe (16GB):  32.55  (13.2X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 29:

PME-Cellulose_NVE on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  2.47
1 node + 1x P100 SXM2:       24.94  (10.1X)
1 node + 2x P100 SXM2:       35.16  (14.2X)
1 node + 4x P100 SXM2:       40.88  (16.6X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 30:

PME-FactorIX_NPT on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only): 11.43
1 node + 1x K80:             48.54  (4.2X)
1 node + 2x K80:             66.68  (5.8X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Page 31:

PME-FactorIX_NPT on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):   11.43
1 node + 1x P100 PCIe (16GB):  98.77  (8.6X)
1 node + 2x P100 PCIe (16GB): 132.86  (11.6X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 32:

PME-FactorIX_NPT on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  11.43
1 node + 1x P100 SXM2:       106.25  (9.3X)
1 node + 2x P100 SXM2:       144.11  (12.6X)
1 node + 4x P100 SXM2:       159.80  (14.0X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 33:

PME-FactorIX_NVE on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only): 11.98
1 node + 1x K80:             51.14  (5.4X)
1 node + 2x K80:             71.49  (6.0X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Page 34:

PME-FactorIX_NVE on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):   11.98
1 node + 1x P100 PCIe (16GB): 105.86  (8.8X)
1 node + 2x P100 PCIe (16GB): 145.83  (12.2X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 35:

PME-FactorIX_NVE on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  11.98
1 node + 1x P100 SXM2:       114.88  (9.6X)
1 node + 2x P100 SXM2:       159.24  (13.3X)
1 node + 4x P100 SXM2:       178.02  (14.9X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 36:

PME-JAC_NPT on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  45.89
1 node + 1x K80:             162.09  (3.5X)
1 node + 2x K80:             216.78  (4.7X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Page 37:

PME-JAC_NPT on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):   45.89
1 node + 1x P100 PCIe (16GB): 283.60  (6.2X)
1 node + 2x P100 PCIe (16GB): 327.69  (7.1X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 38:

PME-JAC_NPT on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  45.89
1 node + 1x P100 SXM2:       310.52  (6.8X)
1 node + 2x P100 SXM2:       360.64  (7.9X)
1 node + 4x P100 SXM2:       423.09  (9.2X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 39:

PME-JAC_NVE on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  47.90
1 node + 1x K80:             173.20  (3.6X)
1 node + 2x K80:             234.99  (4.9X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Page 40:

PME-JAC_NVE on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):   47.90
1 node + 1x P100 PCIe (16GB): 308.46  (6.4X)
1 node + 2x P100 PCIe (16GB): 363.79  (7.6X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 41:

PME-JAC_NVE on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  47.90
1 node + 1x P100 SXM2:       339.81  (7.1X)
1 node + 2x P100 SXM2:       402.18  (8.4X)
1 node + 4x P100 SXM2:       473.10  (9.9X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 42:

GB-Myoglobin on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  28.86
1 node + 1x K80:             288.47  (10.0X)
1 node + 2x K80:             339.45  (11.8X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Page 43:

GB-Myoglobin on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):   28.86
1 node + 1x P100 PCIe (16GB): 483.37  (16.7X)
1 node + 4x P100 PCIe (16GB): 561.94  (19.5X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 44:

GB-Myoglobin on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  28.86
1 node + 1x P100 SXM2:       534.28  (18.5X)
1 node + 4x P100 SXM2:       639.37  (22.2X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 45:

GB-Nucleosome on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  0.40
1 node + 1x K80:              5.84  (14.6X)
1 node + 2x K80:             11.31  (28.3X)
1 node + 4x K80:             20.55  (51.4X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Page 46:

GB-Nucleosome on P100s PCIe

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):   0.40
1 node + 1x P100 PCIe (16GB): 11.91  (29.8X)
1 node + 2x P100 PCIe (16GB): 22.77  (56.9X)
1 node + 4x P100 PCIe (16GB): 39.91  (99.8X)
1 node + 8x P100 PCIe (16GB): 45.92  (114.8X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla P100 PCIe (16GB) GPUs; each P100 is paired with a single E5-2699 v4

Page 47:

GB-Nucleosome on P100s SXM2

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only):  0.40
1 node + 1x P100 SXM2:       13.36  (33.4X)
1 node + 2x P100 SXM2:       25.53  (63.8X)
1 node + 4x P100 SXM2:       46.29  (115.7X)
1 node + 8x P100 SXM2:       48.29  (120.7X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 is paired with a single E5-2698 v4

Page 48:

Rubisco-75K on K80s

Throughput in ns/day (speedup vs. the CPU-only node), AMBER 16.3:
1 Broadwell node (CPU only): 0.01
1 node + 1x K80:             0.35  (35.0X)
1 node + 2x K80:             0.69  (69.0X)
1 node + 4x K80:             1.34  (134.0X)

Blue node: Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
Green nodes: same CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4

Slide 49: Rubisco-75K on P100s PCIe

Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Rubisco-75K (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 0.01
1 node + 1x P100 PCIe (16GB) per node: 0.71 (71.0X)
1 node + 2x P100 PCIe (16GB) per node: 1.40 (140.0X)
1 node + 4x P100 PCIe (16GB) per node: 2.69 (269.0X)
1 node + 8x P100 PCIe (16GB) per node: 4.20 (420.0X)

Slide 50: Rubisco-75K on P100s SXM2

Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Rubisco-75K (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 0.01
1 node + 1x P100 SXM2 per node: 0.80 (80.0X)
1 node + 2x P100 SXM2 per node: 1.57 (157.0X)
1 node + 4x P100 SXM2 per node: 3.06 (306.0X)
1 node + 8x P100 SXM2 per node: 4.46 (446.0X)

AMBER 14

Slide 52: AMBER 14 vs. AMBER 12

Courtesy of Scott Le Grand, from his GTC 2014 presentation.

Slide 53: AMBER 14 — large P2P impact, small Boost Clocks impact

AMBER 14 on 4x K40, DHFR NVE PME 2fs benchmark (CUDA 6.0, ECC off); ns/day:
2x Xeon E5-2690 v2 @ 3.0GHz + 4x Tesla K40 @ 745MHz, no P2P: 125.77 (No Boost, No P2P)
2x Xeon E5-2690 v2 @ 3.0GHz + 4x Tesla K40 @ 875MHz, no P2P: 132.97 (Boost, No P2P)
2x Xeon E5-2690 v2 @ 3.0GHz + 4x Tesla K40 @ 745MHz, P2P: 196.68 (No Boost, P2P)
2x Xeon E5-2690 v2 @ 3.0GHz + 4x Tesla K40 @ 875MHz, P2P: 215.18 (Boost, P2P)
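A useful way to read this slide is to separate the two effects: enabling P2P transfers between the four K40s is worth roughly +56%, while raising the clock from 745MHz to 875MHz adds only about +6%. A small sketch with the slide's numbers (variable names are ours):

```python
# DHFR NVE PME 2fs on 4x K40, AMBER 14 (ns/day from the slide)
runs = {
    ("no_boost", "no_p2p"): 125.77,
    ("boost",    "no_p2p"): 132.97,
    ("no_boost", "p2p"):    196.68,
    ("boost",    "p2p"):    215.18,
}

# Isolate each effect against the no-boost/no-P2P baseline
p2p_gain   = runs[("no_boost", "p2p")] / runs[("no_boost", "no_p2p")]
boost_gain = runs[("boost", "no_p2p")] / runs[("no_boost", "no_p2p")]
print(f"P2P alone: +{(p2p_gain - 1) * 100:.0f}%, boost alone: +{(boost_gain - 1) * 100:.0f}%")
```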

Slide 54: AMBER Performance Over Time

Courtesy of Scott Le Grand, from his GTC 2014 presentation.

Slide 55: Cellulose on K40s, K80s and M6000s

Running AMBER version 14
The blue node contains Dual Intel E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875MHz, Tesla K80 @ 562MHz (autoboost), or Quadro M6000 @ 987MHz GPUs

PME-Cellulose_NVE, simulated time (ns/day; speedup vs. the CPU-only node):
1 Haswell node: 1.93
1 CPU node + 1x K40: 8.96 (4.6X)
1 CPU node + 0.5x K80: 7.87 (4.1X)
1 CPU node + 1x K80: 11.76 (6.1X)
1 CPU node + 1x M6000: 10.49 (5.4X)
1 CPU node + 2x K40: 13.67 (7.1X)
1 CPU node + 2x K80: 15.38 (8.0X)
1 CPU node + 2x M6000: 14.90 (7.7X)

Slide 56: Factor IX on K40s, K80s and M6000s

Running AMBER version 14
The blue node contains Dual Intel E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875MHz, Tesla K80 @ 562MHz (autoboost), or Quadro M6000 @ 987MHz GPUs

PME-FactorIX_NVE, simulated time (ns/day; speedup vs. the CPU-only node):
1 Haswell node: 9.68
1 CPU node + 1x K40: 40.48 (4.2X)
1 CPU node + 0.5x K80: 33.59 (3.5X)
1 CPU node + 1x K80: 50.70 (5.2X)
1 CPU node + 1x M6000: 47.80 (5.0X)
1 CPU node + 2x K40: 61.18 (6.4X)
1 CPU node + 2x K80: 60.93 (6.3X)
1 CPU node + 2x M6000: 66.89 (7.0X)

Slide 57: JAC on K40s, K80s and M6000s

Running AMBER version 14
The blue node contains Dual Intel E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875MHz, Tesla K80 @ 562MHz (autoboost), or Quadro M6000 @ 987MHz GPUs

PME-JAC_NVE, simulated time (ns/day; speedup vs. the CPU-only node):
1 Haswell node: 37.38
1 CPU node + 1x K40: 134.82 (3.6X)
1 CPU node + 0.5x K80: 121.30 (3.2X)
1 CPU node + 1x K80: 174.34 (4.7X)
1 CPU node + 1x M6000: 161.53 (4.3X)
1 CPU node + 2x K40: 200.34 (5.4X)
1 CPU node + 2x K80: 225.34 (6.0X)
1 CPU node + 2x M6000: 219.83 (5.9X)

Slide 58: Cellulose on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

PME - Cellulose_NPT, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 1.07
1 Node + 1x M40 per node: 10.12 (9.5X)
1 Node + 2x M40 per node: 14.40 (13.5X)
1 Node + 4x M40 per node: 15.90 (14.9X)

Slide 59: Cellulose on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

PME - Cellulose_NVE, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 1.07
1 Node + 1x M40 per node: 10.50 (9.8X)
1 Node + 2x M40 per node: 15.41 (14.4X)
1 Node + 4x M40 per node: 17.13 (16.0X)

Slide 60: FactorIX on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

PME - FactorIX_NPT, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 5.38
1 Node + 1x M40 per node: 46.90 (8.7X)
1 Node + 2x M40 per node: 67.37 (12.5X)
1 Node + 4x M40 per node: 72.96 (13.6X)

Slide 61: FactorIX on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

PME - FactorIX_NVE, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 5.47
1 Node + 1x M40 per node: 49.33 (9.0X)
1 Node + 2x M40 per node: 73.00 (13.3X)
1 Node + 4x M40 per node: 80.04 (14.6X)

Slide 62: JAC on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

PME - JAC_NPT, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 20.88
1 Node + 1x M40 per node: 149.40 (7.2X)
1 Node + 2x M40 per node: 211.97 (10.2X)
1 Node + 4x M40 per node: 226.63 (10.9X)

Slide 63: JAC on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

PME - JAC_NVE, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 21.11
1 Node + 1x M40 per node: 157.68 (7.5X)
1 Node + 2x M40 per node: 230.18 (10.9X)
1 Node + 4x M40 per node: 246.15 (11.7X)

Slide 64: Myoglobin on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

GB - Myoglobin, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 9.83
1 Node + 1x M40 per node: 232.20 (23.6X)
1 Node + 2x M40 per node: 300.86 (30.6X)
1 Node + 4x M40 per node: 322.09 (32.8X)

Slide 65: Nucleosome on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

GB - Nucleosome, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 0.13
1 Node + 1x M40 per node: 4.67 (35.9X)
1 Node + 2x M40 per node: 9.05 (69.6X)
1 Node + 4x M40 per node: 16.11 (123.9X)

Slide 66: TrpCage on M40s

Running AMBER version 14
The blue node contains a Single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a Single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs

GB - TrpCage, simulated time (ns/day; speedup vs. the CPU-only node):
1 Node: 408.88
1 Node + 1x M40 per node: 831.91 (2.0X)
1 Node + 2x M40 per node: 551.36 (1.3X)
1 Node + 4x M40 per node: 464.63 (1.1X)
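TrpCage is the outlier in this series: unlike Nucleosome, adding M40s beyond the first actually reduces throughput, because the tiny system cannot amortize multi-GPU communication. A quick per-GPU efficiency check on the slide's numbers (variable names are ours):

```python
# GB-TrpCage on M40s, AMBER 14 (ns/day from the slide)
baseline = 408.88                       # 1 CPU-only node
runs = {1: 831.91, 2: 551.36, 4: 464.63}  # GPU count -> ns/day

for n, ns_day in runs.items():
    total = ns_day / baseline           # speedup vs. the CPU node
    per_gpu = total / n                 # speedup normalized by GPU count
    print(f"{n}x M40: {total:.1f}X total, {per_gpu:.2f}X per GPU")
# Absolute throughput drops past one GPU for this small GB system.
```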

Slide 67: Recommended GPU Node Configuration for AMBER Computational Chemistry

Workstation or single-node configuration:
# of CPU sockets: 2
Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
CPU speed (GHz): 2.66+
System memory per node (GB): 16
GPUs: Kepler K20, K40, K80, P100
# of GPUs per CPU socket: 1-4
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 3.0 x16 or higher
Server storage: 2 TB
Network configuration: InfiniBand QDR or better

Scale to multiple nodes with the same single-node configuration.
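The recommendations above can be encoded as a quick sanity check for a proposed node. This is only a sketch of the table, with a function name and thresholds of our own choosing (thresholds taken from the slide):

```python
def check_amber_node(sockets, cores_per_socket, ghz, ram_gb, gpus_per_socket):
    """Return a list of ways a proposed node falls short of the AMBER recommendation."""
    problems = []
    if sockets != 2:
        problems.append("2 CPU sockets recommended")
    if cores_per_socket < 6:
        problems.append("6+ cores per socket (1 CPU core drives 1 GPU)")
    if ghz < 2.66:
        problems.append("CPU speed 2.66 GHz or faster")
    if ram_gb < 16:
        problems.append("16 GB system memory per node")
    if not 1 <= gpus_per_socket <= 4:
        problems.append("1-4 GPUs per CPU socket")
    return problems

print(check_amber_node(2, 8, 2.8, 32, 2))  # an empty list means the node qualifies
```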

July 2016

CHARMM DOMDEC-GUI

Slide 69: CHARMM DOMDEC-GUI 465 K System Benchmark

Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for possible benchmarking error.

465 K System (Her1_HER1_membrane), ns/day (higher is better; speedup vs. the CPU-only node):
1 Haswell node: 0.36
1 node + 1x K80 per node: 2.15 (6.0X)

Slide 70: CHARMM DOMDEC-GUI 534 K System Benchmark

Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for possible benchmarking error.

534 K System (POPC_PSPC_CHL1 mixture), ns/day (speedup vs. the CPU-only node):
1 Haswell node: 0.18
1 node + 1x K80 per node: 1.43 (8.0X)

Slide 71: CHARMM DOMDEC-GUI 20 K System Benchmark

Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for possible benchmarking error.

20 K System (Crambin), ns/day (speedup vs. the CPU-only node):
1 Haswell node: 16.00
1 node + 1x M40 per node: 59.68 (3.7X)

Slide 72: CHARMM DOMDEC-GUI 61 K System Benchmark

Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for possible benchmarking error.

61 K System (GlnBP), ns/day (speedup vs. the CPU-only node):
1 Haswell node: 3.90
1 node + 1x M40 per node: 25.08 (6.4X)

Slide 73: CHARMM DOMDEC-GUI 465 K System Benchmark

Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for possible benchmarking error.

465 K System (Her1_HER1_membrane), ns/day (speedup vs. the CPU-only node):
1 Haswell node: 0.36
1 node + 1x M40 per node: 2.27 (6.3X)

October 2016

GROMACS 2016

Slide 75: Erik Lindahl (GROMACS developer) video

Slide 76: Water 1.5M on K80s

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

Water 1.5M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 2.79
1 node + 2x K80 per node: 5.22 (1.9X)
1 node + 4x K80 per node: 6.14 (2.2X)

Slide 77: Water 3M on K80s

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

Water 3M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 1.32
1 node + 2x K80 per node: 2.66 (2.0X)
1 node + 4x K80 per node: 3.05 (2.3X)

Slide 78: Water 1.5M on M40s

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs

Water 1.5M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 2.79
1 node + 2x M40 per node: 6.15 (2.2X)
1 node + 4x M40 per node: 7.60 (2.7X)

Slide 79: Water 3M on M40s

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs

Water 3M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 1.32
1 node + 2x M40 per node: 2.97 (2.3X)
1 node + 4x M40 per node: 3.94 (3.0X)

Slide 80: Water 1.5M on P40s

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs

Water 1.5M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 2.79
1 node + 2x P40 per node: 6.60 (2.4X)
1 node + 4x P40 per node: 8.07 (2.9X)

Slide 81: Water 3M on P40s

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs

Water 3M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 1.32
1 node + 2x P40 per node: 3.36 (2.5X)
1 node + 4x P40 per node: 4.19 (3.2X)

Slide 82: Water 1.5M on P100 PCIes

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

Water 1.5M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 2.79
1 node + 2x P100 PCIe (16GB) per node: 6.34 (2.3X)
1 node + 4x P100 PCIe (16GB) per node: 7.11 (2.5X)

Slide 83: Water 3M on P100 PCIes

Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

Water 3M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 1.32
1 node + 2x P100 PCIe (16GB) per node: 3.16 (2.4X)
1 node + 4x P100 PCIe (16GB) per node: 3.43 (2.6X)

February 2017

GROMACS 5.1.2

Slide 85: Water 1.5M on K80s

Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Water 1.5M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 3.04
1 node + 1x K80 per node: 3.49 (1.1X)
1 node + 2x K80 per node: 5.75 (1.9X)

Slide 86: Water 1.5M on P100s PCIe

Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Water 1.5M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 3.04
1 node + 1x P100 PCIe (16GB) per node: 4.39 (1.4X)
1 node + 2x P100 PCIe (16GB) per node: 6.96 (2.3X)
1 node + 4x P100 PCIe (16GB) per node: 7.21 (2.4X)

Slide 87: Water 1.5M on P100s SXM2

Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Water 1.5M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 3.04
1 node + 1x P100 SXM2 per node: 4.11 (1.4X)
1 node + 2x P100 SXM2 per node: 6.70 (2.2X)
1 node + 4x P100 SXM2 per node: 7.18 (2.4X)
1 node + 8x P100 SXM2 per node: 7.88 (2.6X)
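The water-box numbers show how quickly the marginal gain per added GPU shrinks for GROMACS at this system size. A small sketch computing the step-to-step gains from the chart's values (variable names are ours):

```python
# Water 1.5M on P100 SXM2, GROMACS 5.1.2 (ns/day; key = GPU count, 0 = CPU only)
ns_day = {0: 3.04, 1: 4.11, 2: 6.70, 4: 7.18, 8: 7.88}

counts = [0, 1, 2, 4, 8]
for prev, cur in zip(counts, counts[1:]):
    gain = ns_day[cur] / ns_day[prev]   # throughput ratio for each upgrade step
    print(f"{prev} -> {cur} GPUs: x{gain:.2f}")
# Doubling from 4 to 8 GPUs buys under 10% more throughput here.
```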

Slide 88: Water 3M on K80s

Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Water 3M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 1.38
1 node + 1x K80 per node: 1.59 (1.2X)
1 node + 2x K80 per node: 2.98 (2.2X)

Slide 89: Water 3M on P100s PCIe

Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Water 3M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 1.38
1 node + 1x P100 PCIe (16GB) per node: 1.96 (1.4X)
1 node + 2x P100 PCIe (16GB) per node: 3.43 (2.5X)
1 node + 4x P100 PCIe (16GB) per node: 3.80 (2.8X)

Slide 90: Water 3M on P100s SXM2

Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

Water 3M (ns/day; speedup vs. the CPU-only node):
1 Broadwell node: 1.38
1 node + 1x P100 SXM2 per node: 1.84 (1.3X)
1 node + 2x P100 SXM2 per node: 3.50 (2.5X)
1 node + 4x P100 SXM2 per node: 3.82 (2.8X)

Slide 91: Recommended GPU Node Configuration for GROMACS Computational Chemistry

Workstation or single-node configuration:
# of CPU sockets: 2
Cores per CPU socket: 6+
CPU speed (GHz): 2.66+
System memory per socket (GB): 32
GPUs: Kepler K20, K40, K80
# of GPUs per CPU socket: 1x (Kepler GPUs need a fast Sandy Bridge or Ivy Bridge, or high-end AMD Opterons)
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 3.0 or higher
Server storage: 500 GB or higher
Network configuration: Gemini, InfiniBand

February 2017

HOOMD-Blue 1.3.3

Slide 93: lj-liquid on K80s

Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

lj-liquid (avg timesteps/sec; speedup vs. the CPU-only node):
1 Broadwell node: 326.52
1 node + 1x K80 per node: 1324.84 (4.1X)
1 node + 2x K80 per node: 1594.37 (4.9X)
1 node + 4x K80 per node: 1942.12 (5.9X)

Slide 94: lj-liquid on P100s PCIe

Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

lj-liquid (avg timesteps/sec; speedup vs. the CPU-only node):
1 Broadwell node: 326.52
1 node + 1x P100 PCIe (16GB) per node: 2912.66 (8.9X)
1 node + 8x P100 PCIe (16GB) per node: 3217.68 (9.9X)

Slide 95: lj-liquid on P100s SXM2

Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

lj-liquid (avg timesteps/sec; speedup vs. the CPU-only node):
1 Broadwell node: 326.52
1 node + 1x P100 SXM2 per node: 3129.11 (9.6X)
1 node + 8x P100 SXM2 per node: 3397.74 (10.4X)

Slide 96: lj_liquid_512k on K80s

Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

lj_liquid_512k (avg timesteps/sec; speedup vs. the CPU-only node):
1 Broadwell node: 43.43
1 node + 1x K80 per node: 220.10 (5.1X)
1 node + 2x K80 per node: 334.59 (7.7X)
1 node + 4x K80 per node: 526.47 (12.1X)

Slide 97: lj_liquid_512k on P100s PCIe

Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)

lj_liquid_512k (avg timesteps/sec; speedup vs. the CPU-only node):
1 Broadwell node: 43.43
1 node + 1x P100 PCIe (16GB) per node: 398.12 (9.2X)
1 node + 2x P100 PCIe (16GB) per node: 534.54 (12.3X)
1 node + 4x P100 PCIe (16GB) per node: 770.18 (17.7X)
1 node + 8x P100 PCIe (16GB) per node: 1045.50 (24.1X)

98
lj_liquid_512k on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: lj_liquid_512k, avg timesteps/sec]
1 Broadwell node: 43.43
1 node + 1x P100 SXM2 per node: 443.74 (10.2X)
1 node + 2x P100 SXM2 per node: 568.51 (13.1X)
1 node + 4x P100 SXM2 per node: 793.36 (18.3X)
1 node + 8x P100 SXM2 per node: 1119.76 (25.8X)

99
lj_liquid_1m on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: lj_liquid_1m, avg timesteps/sec]
1 Broadwell node: 22.07
1 node + 1x K80 per node: 109.54 (5.0X)
1 node + 2x K80 per node: 181.42 (8.2X)
1 node + 4x K80 per node: 303.00 (13.7X)

100
lj_liquid_1m on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: lj_liquid_1m, avg timesteps/sec]
1 Broadwell node: 22.07
1 node + 1x P100 PCIe (16GB) per node: 204.67 (9.3X)
1 node + 2x P100 PCIe (16GB) per node: 294.88 (13.4X)
1 node + 4x P100 PCIe (16GB) per node: 465.58 (21.1X)
1 node + 8x P100 PCIe (16GB) per node: 672.46 (30.5X)

101
lj_liquid_1m on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: lj_liquid_1m, avg timesteps/sec]
1 Broadwell node: 22.07
1 node + 1x P100 SXM2 per node: 221.02 (10.0X)
1 node + 2x P100 SXM2 per node: 315.07 (14.3X)
1 node + 4x P100 SXM2 per node: 488.04 (22.1X)
1 node + 8x P100 SXM2 per node: 707.73 (32.1X)

102
Microsphere on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: microsphere, avg timesteps/sec]
1 Broadwell node: 17.53
1 node + 1x K80 per node: 64.87 (3.7X)
1 node + 2x K80 per node: 98.43 (5.6X)
1 node + 4x K80 per node: 166.74 (9.5X)

103
Microsphere on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: microsphere, avg timesteps/sec]
1 Broadwell node: 17.53
1 node + 1x P100 PCIe (16GB) per node: 145.71 (8.3X)
1 node + 2x P100 PCIe (16GB) per node: 179.54 (10.2X)
1 node + 4x P100 PCIe (16GB) per node: 257.58 (14.7X)
1 node + 8x P100 PCIe (16GB) per node: 371.24 (21.2X)

104
Microsphere on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: microsphere, avg timesteps/sec]
1 Broadwell node: 17.53
1 node + 1x P100 SXM2 per node: 151.51 (8.6X)
1 node + 2x P100 SXM2 per node: 186.01 (10.6X)
1 node + 4x P100 SXM2 per node: 271.21 (15.5X)
1 node + 8x P100 SXM2 per node: 384.72 (21.9X)

105
Polymer on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: polymer, avg timesteps/sec]
1 Broadwell node: 362.19
1 node + 1x K80 per node: 975.14 (2.7X)
1 node + 2x K80 per node: 1209.45 (3.3X)
1 node + 4x K80 per node: 1518.99 (4.2X)

106
Polymer on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: polymer, avg timesteps/sec]
1 Broadwell node: 362.19
1 node + 1x P100 PCIe (16GB) per node: 1999.64 (5.5X)
1 node + 4x P100 PCIe (16GB) per node: 2143.15 (5.9X)
1 node + 8x P100 PCIe (16GB) per node: 2480.70 (6.8X)

107
Polymer on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: polymer, avg timesteps/sec]
1 Broadwell node: 362.19
1 node + 1x P100 SXM2 per node: 2111.99 (5.8X)
1 node + 4x P100 SXM2 per node: 2272.27 (6.3X)
1 node + 8x P100 SXM2 per node: 2651.56 (7.3X)

108
Quasicrystal on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: quasicrystal, avg timesteps/sec]
1 Broadwell node: 78.32
1 node + 1x K80 per node: 502.53 (6.4X)
1 node + 2x K80 per node: 767.90 (9.8X)
1 node + 4x K80 per node: 1280.44 (16.3X)

109
Quasicrystal on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: quasicrystal, avg timesteps/sec]
1 Broadwell node: 78.32
1 node + 1x P100 PCIe (16GB) per node: 851.29 (10.9X)
1 node + 2x P100 PCIe (16GB) per node: 1199.64 (15.3X)
1 node + 4x P100 PCIe (16GB) per node: 1791.41 (22.9X)
1 node + 8x P100 PCIe (16GB) per node: 2261.72 (28.9X)

110
Quasicrystal on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: quasicrystal, avg timesteps/sec]
1 Broadwell node: 78.32
1 node + 1x P100 SXM2 per node: 939.53 (12.0X)
1 node + 2x P100 SXM2 per node: 1249.90 (16.0X)
1 node + 4x P100 SXM2 per node: 1940.29 (24.8X)
1 node + 8x P100 SXM2 per node: 2429.68 (31.0X)

111
Triblock-copolymer on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: triblock-copolymer, avg timesteps/sec]
1 Broadwell node: 361.42
1 node + 1x K80 per node: 953.01 (2.6X)
1 node + 2x K80 per node: 1170.47 (3.2X)
1 node + 4x K80 per node: 1492.01 (4.1X)

112
Triblock-copolymer on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: triblock-copolymer, avg timesteps/sec]
1 Broadwell node: 361.42
1 node + 1x P100 PCIe (16GB) per node: 1999.14 (5.5X)
1 node + 4x P100 PCIe (16GB) per node: 2155.27 (6.0X)
1 node + 8x P100 PCIe (16GB) per node: 2456.09 (6.8X)

113
Triblock-copolymer on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart: triblock-copolymer, avg timesteps/sec]
1 Broadwell node: 361.42
1 node + 1x P100 SXM2 per node: 2132.92 (5.9X)
1 node + 4x P100 SXM2 per node: 2253.83 (6.2X)
1 node + 8x P100 SXM2 per node: 2587.91 (7.2X)

114

February 2017

LAMMPS 2016

Page 115: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

115

Atomic-Fluid Lennard-Jones 2.5 Cutoff on K80s

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

0.37

0.57

0.00

0.20

0.40

0.60

0.80

1.00

1 Broadwell node 1 node +2x K80 per node

1/se

conds

Atomic-Fluid Lennard-Jones 2.5 Cutoff

1.5X

Page 116: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

116

Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s PCIe

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

0.37

0.62

0.00

0.20

0.40

0.60

0.80

1.00

1 Broadwell node 1 node +2x P100 PCIe (16GB)

per node

1/se

conds

Atomic-Fluid Lennard-Jones 2.5 Cutoff

1.7X

Page 117: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

117

Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s SXM2

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (autoboost) GPUs

0.37

0.64

0.00

0.25

0.50

0.75

1.00

1 Broadwell node 1 node + 2x P100 SXM2 per node

1/se

conds

Atomic-Fluid Lennard-Jones 2.5 Cutoff

1.7X

Page 118: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

118

Atomic-Fluid Lennard-Jones 5.0 Cutoff on K80s

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell)0.10

0.14

0.26

0.36

0.00

0.20

0.40

0.60

0.80

1.00

1 Broadwell node 1 node +1x K80 per node

1 node +2x K80 per node

1 node +4x K80 per node

1/se

conds

Atomic-Fluid Lennard-Jones 5.0 Cutoff

1.4X2.6X

3.6X

Page 119: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

119

Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s PCIe

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected]

[3.6GHz Turbo] (Broadwell)0.10

0.22

0.350.37 0.38

0.00

0.20

0.40

0.60

0.80

1.00

1 Broadwell node 1 node +1x P100 PCIe

(16GB) per node

1 node +2x P100 PCIe

(16GB) per node

1 node +4x P100 PCIe

(16GB) per node

1 node +8x P100 PCIe

(16GB) per node

1/se

conds

Atomic-Fluid Lennard-Jones 5.0 Cutoff

2.2X

3.5X 3.7X 3.8X

Page 120: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

120

Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s SXM2

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected]

[3.6GHz Turbo] (Broadwell)0.10

0.22

0.36

0.41

0.00

0.25

0.50

0.75

1.00

1 Broadwell node 1 node +1x P100 SXM2

per node

1 node +2x P100 SXM2

per node

1 node +4x P100 SXM2

per node

1/se

conds

Atomic-Fluid Lennard-Jones 5.0 Cutoff

2.2X3.6X

4.1X

Page 121: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

121

Course-grain Water on K80s

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

0.00437 0.00444

0.0000

0.0020

0.0040

0.0060

0.0080

0.0100

1 Broadwell node 1 node +4x K80 per node

1/se

conds

Course-grain Water

1.0X

Page 122: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

122

Course-grain Water on P100s PCIe

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

0.0044

0.0061

0.0093

0.0000

0.0010

0.0020

0.0030

0.0040

0.0050

0.0060

0.0070

0.0080

0.0090

0.0100

1 Broadwell node 1 node +4x P100 PCIe (16GB)

per node

1 node +8x P100 PCIe (16GB)

per node

1/se

conds

Course-grain Water

1.4X

2.1X

Page 123: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

123

Course-grain Water on P100s SXM2

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

0.0044

0.0069

0.0110

0.0000

0.0020

0.0040

0.0060

0.0080

0.0100

0.0120

1 Broadwell node 1 node +4x P100 SXM2

per node

1 node +8x 100 SXM2

per node

1/se

conds

Course-grain Water

1.6X

2.5X

Page 124: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

124

EAM on K80s

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell)0.01

0.02

0.04

0.07

0.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

1 Broadwell node 1 node +1x K80 per node

1 node +2x K80 per node

1 node +4x K80 per node

1/se

conds

EAM

2.0X

4.0X

7.0X

Page 125: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

125

EAM on P100s PCIe

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected]

[3.6GHz Turbo] (Broadwell)0.01

0.03

0.05

0.08

0.13

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

1 Broadwell node 1 node +1x P100 PCIe

(16GB) per node

1 node +2x P100 PCIe

(16GB) per node

1 node +4x P100 PCIe

(16GB) per node

1 node +8x P100 PCIe

(16GB) per node

1/se

conds

EAM

3.0X

5.0X

8.0X

13.0X

Page 126: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

126

EAM on P100s SXM2

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected]

[3.6GHz Turbo] (Broadwell)0.01

0.03

0.05

0.08

0.13

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

1 Broadwell node 1 node +1x P100 SXM2

per node

1 node +2x P100 SXM2

per node

1 node +4x P100 SXM2

per node

1 node +8x P100 SXM2

per node

1/se

conds

EAM

3.0X

5.0X

8.0X

13.0X
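Beyond raw speedup, the multi-GPU bars can be read as per-GPU scaling efficiency, i.e. how much of ideal linear scaling each configuration retains relative to the single-GPU node. A small illustration using the EAM P100 loop rates reported above (the efficiency metric is our addition, not on the slides):

```python
# Scaling efficiency relative to the 1-GPU node: eff(n) = (rate_n / rate_1) / n.
# Rates are the EAM benchmark loop rates (1/seconds) for n P100 GPUs per node.
rates = {1: 0.03, 2: 0.05, 4: 0.08, 8: 0.13}

efficiency = {n: round((r / rates[1]) / n, 2) for n, r in rates.items()}
print(efficiency)  # {1: 1.0, 2: 0.83, 4: 0.67, 8: 0.54}
```

The falloff (about 54% efficiency at 8 GPUs) is the usual pattern when a fixed-size problem is spread over more devices and communication starts to dominate.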

Page 127: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

127

Gay-Berne on K80s

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell)

0.010.02

0.03

0.04

0.00

0.01

0.02

0.03

0.04

0.05

1 Broadwell node 1 node +1x K80 per node

1 node +2x K80 per node

1 node +4x K80 per node

1/se

conds

Gay-Berne

2.0X

3.0X

4.0X

Page 128: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

128

Gay-Berne on P100s PCIe

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected]

[3.6GHz Turbo] (Broadwell)

0.01

0.02

0.04

0.05

0.00

0.01

0.02

0.03

0.04

0.05

1 Broadwell node 1 node +1x P100 PCIe (16GB)

per node

1 node +2x P100 PCIe (16GB)

per node

1 node +4x P100 PCIe (16GB)

per node

1/se

conds

Gay-Berne

2.0X

4.0X

5.0X

Page 129: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

129

Gay-Berne on P100s SXM2

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected]

[3.6GHz Turbo] (Broadwell)

0.01

0.02

0.04

0.05

0.00

0.01

0.02

0.03

0.04

0.05

1 Broadwell node 1 node +1x SXM2per node

1 node +2x SXM2per node

1 node +4x SXM2per node

1/se

conds

Gay-Berne

2.0X

4.0X

5.0X

Page 130: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

130

Rhodopsin on K80s

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell)

0.22 0.22

0.31

0.38

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

1 Broadwell node 1 node +1x K80 per node

1 node +2x K80 per node

1 node +4x K80 per node

1/se

conds

Rhodopsin

1.4X

1.7X

Page 131: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

131

Rhodopsin on P100s PCIe

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected]

[3.6GHz Turbo] (Broadwell)

0.22

0.29

0.33

0.48

0.52

0.00

0.10

0.20

0.30

0.40

0.50

0.60

1 Broadwell node 1 node +1x P100 PCIe

(16GB) per node

1 node +2x P100 PCIe

(16GB) per node

1 node +4x P100 PCIe

(16GB) per node

1 node +8x P100 PCIe

(16GB) per node

1/se

conds

Rhodopsin

1.3X1.5X

2.2X

2.4X

Page 132: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

132

Rhodopsin on P100s SXM2

Running LAMMPS version 2016

The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2698 [email protected] [3.6GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 [email protected]

[3.6GHz Turbo] (Broadwell)

0.22

0.30

0.38

0.490.50

0.00

0.10

0.20

0.30

0.40

0.50

0.60

1 Broadwell node 1 node +1x P100 SXM2

per node

1 node +2x P100 SXM2

per node

1 node +4x P100 SXM2

per node

1 node +8x P100 SXM2

per node

1/se

conds

Rhodopsin

1.4X

1.7X

2.2X 2.3X

Page 133: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

133

Recommended GPU Node Configuration for LAMMPS Computational Chemistry

Workstation or Single Node Configuration

# of CPU sockets 2

Cores per CPU socket 6+

CPU speed (Ghz) 2.66+

System memory per socket (GB) 32

GPUsGTX Titan X,

Kepler K20, K40, K80, M40

# of GPUs per CPU socket 1-2

GPU memory preference (GB) 6+

GPU to CPU connection PCIe 3.0 or higher

Server storage 500 GB or higher

Network configuration Gemini, InfiniBand

Scale to thousands of nodes with same single node configuration13

3

Page 134: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

July 2017

NAMD 2.12

Page 135: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

135

APOA1 on K80s

Running NAMD version 2.12

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs3.45

14.92

17.73

0

4

8

12

16

20

1 Broadwell node 1 node +1x K80 per node

1 node +2x K80 per node

ns/

day

APOA1

Page 136: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

136

APOA1 on P100s PCIe

Running NAMD version 2.12

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs3.45

22.58 22.85

0

4

8

12

16

20

24

28

1 Broadwell node 1 node +1x P100 PCIe

(16GB) per node

1 node +2x P100 PCIe

(16GB) per node

ns/

day

APOA1

Page 137: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

137

APOA1 on P100s SXM2

Running NAMD version 2.12

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

3.45

22.98 23.44 23.87

0

5

10

15

20

25

30

1 Broadwell node 1 node +1x P100 SXM2

per node

1 node +2x P100 SXM2

per node

1 node +4x P100 SXM2

per node

ns/

day

APOA1
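NAMD reports throughput in ns/day, which converts directly into time-to-solution for a planned trajectory. A minimal sketch (the 100 ns target is an arbitrary example, not from the slides):

```python
def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to produce target_ns of trajectory."""
    return target_ns / ns_per_day

# APOA1 at 23.87 ns/day (1 node + 4x P100 SXM2, above):
# a 100 ns trajectory would take roughly 4.2 days on one such node.
print(round(days_to_simulate(100, 23.87), 1))  # 4.2
```

At the CPU-only rate of 3.45 ns/day the same trajectory would take about a month, which is the practical meaning of the GPU bars in these charts.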

Page 138: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

138

F1ATPASE on K80s

Running NAMD version 2.12

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz

Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs

1.15

4.81

6.27

0

2

4

6

8

1 Broadwell node 1 node +1x K80 per node

1 node +2x K80 per node

ns/

day

F1ATPASE

Page 139: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

139

F1ATPASE on P100s PCIe

Running NAMD version 2.12

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz

Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs

1.15

7.346.99

7.40

0

2

4

6

8

10

1 Broadwell node 1 node +1x P100 PCIe

(16GB) per node

1 node +2x P100 PCIe

(16GB) per node

1 node +4x P100 PCIe

(16GB) per node

ns/

day

F1ATPASE

Page 140: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics

140

F1ATPASE on P100s SXM2

Running NAMD version 2.12

The blue node contains Dual Intel Xeon E5-2690 [email protected] [3.5GHz Turbo]

(Broadwell) CPUs

The green nodes contain Dual Intel Xeon E5-2690 [email protected] [3.5GHz

Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs

1.15

7.116.85

7.11

0

2

4

6

8

10

1 Broadwell node 1 node +1x P100 SXM2

per node

1 node +2x P100 SXM2

per node

1 node +4x P100 SXM2

per node

ns/

day

F1ATPASE

Page 141: STMV on K80s

Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz (3.5 GHz Turbo, Broadwell) CPUs
The green nodes contain the same CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: STMV throughput, ns/day]
1 Broadwell node: 0.292
1 node + 1x K80 per node: 1.274
1 node + 2x K80 per node: 2.085

Page 142: STMV on P100 PCIe

Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz (3.5 GHz Turbo, Broadwell) CPUs
The green nodes contain the same CPUs + Tesla P100 PCIe (16GB) GPUs

[Bar chart: STMV throughput, ns/day]
1 Broadwell node: 0.29
1 node + 1x P100 PCIe (16GB) per node: 2.15
1 node + 2x P100 PCIe (16GB) per node: 2.32

Page 143: STMV on P100 SXM2

Running NAMD version 2.12
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz (3.5 GHz Turbo, Broadwell) CPUs
The green nodes contain the same CPUs + Tesla P100 SXM2 GPUs

[Bar chart: STMV throughput, ns/day]
1 Broadwell node: 0.292
1 node + 1x P100 SXM2 per node: 2.077

Page 144: NAMD 2.11 – Up to 2X Faster

Page 145: New GPU Features in NAMD 2.11

• GPU-accelerated simulations up to twice as fast as NAMD 2.10
• Pressure calculation with fixed atoms on the GPU now works as on the CPU
• Improved scaling for the GPU-accelerated particle-mesh Ewald calculation: CPU-side operations overlap better and are parallelized across cores
• Improved scaling for GPU-accelerated simulations: nonbonded force calculation results are streamed from the GPU for better overlap
• NVIDIA CUDA GPU-acceleration binaries for Mac OS X

Selected text from the NAMD website

Page 146: NAMD 2.11 is up to 2x faster

[Bar chart: APoA1 (92,224 atoms), simulated time (ns/day), NAMD 2.10 vs. NAMD 2.11]
1 node: 1.2x    2 nodes: 1.6x    4 nodes: 2.0x

Both NAMD 2.10 and NAMD 2.11 ran on nodes with Dual Intel E5-2697 v2 @ 2.7 GHz (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs

Page 147: NAMD 2.11 APoA1 on 1 and 2 nodes

Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs
The green nodes contain the same CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: APoA1 (92,224 atoms), simulated time (ns/day); speedups vs. CPU-only at the same node count]
1 node: 2.77
1 node + 1x K80: 11.67 (4.2x)
1 node + 2x K80: 16.99 (6.1x)
2 nodes: 5.22
2 nodes + 1x K80: 19.73 (3.8x)
2 nodes + 2x K80: 24.31 (4.7x)
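The speedup callouts on these charts follow directly from the ns/day values: each GPU configuration divided by the CPU-only run at the same node count. A quick check against the figures above:

```python
# ns/day values from the APoA1 chart above.
cpu_only = {1: 2.77, 2: 5.22}                  # node count -> CPU-only ns/day
gpu = {(1, 1): 11.67, (1, 2): 16.99,           # (nodes, K80s per node) -> ns/day
       (2, 1): 19.73, (2, 2): 24.31}

# Speedup = GPU config throughput / CPU-only throughput at same node count.
speedups = {cfg: round(v / cpu_only[cfg[0]], 1) for cfg, v in gpu.items()}
# Reproduces the chart callouts: 4.2x, 6.1x, 3.8x, 4.7x.
```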

Page 148: NAMD 2.11 APoA1 on 4 and 8 nodes

Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs
The green nodes contain the same CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: APoA1 (92,224 atoms), simulated time (ns/day); speedups vs. CPU-only at the same node count]
4 nodes: 10.27
4 nodes + 1x K80: 20.64 (2.0x)
4 nodes + 2x K80: 23.52 (2.3x)
8 nodes: 16.85
8 nodes + 1x K80: 27.83 (1.7x)
8 nodes + 2x K80: 27.74 (1.6x)

Page 149: NAMD 2.11 is up to 1.8x faster

[Bar chart: F1-ATPase (327,506 atoms), simulated time (ns/day), NAMD 2.10 vs. NAMD 2.11 on 1, 2, and 4 nodes; speedups of 1.1x, 1.8x, and 1.4x]

Both NAMD 2.10 and NAMD 2.11 ran on nodes with Dual Intel E5-2697 v2 @ 2.7 GHz (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs

Page 150: NAMD 2.11 F1-ATPase on 1 and 2 nodes

Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs
The green nodes contain the same CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: F1-ATPase (327,506 atoms), simulated time (ns/day); speedups vs. CPU-only at the same node count]
1 node: 0.94
1 node + 1x K80: 3.87 (4.1x)
1 node + 2x K80: 6.11 (6.5x)
2 nodes: 1.86
2 nodes + 1x K80: 7.23 (3.9x)
2 nodes + 2x K80: 10.58 (5.7x)

Page 151: NAMD 2.11 F1-ATPase on 4 and 8 nodes

Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs
The green nodes contain the same CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: F1-ATPase (327,506 atoms), simulated time (ns/day); speedups vs. CPU-only at the same node count]
4 nodes: 3.63
4 nodes + 1x K80: 11.66 (3.2x)
4 nodes + 2x K80: 12.62 (3.5x)
8 nodes: 6.88
8 nodes + 1x K80: 14.22 (2.1x)
8 nodes + 2x K80: 15.74 (2.3x)

Page 152: NAMD 2.11 is up to 1.5x faster

[Bar chart: STMV (1,066,628 atoms), simulated time (ns/day), NAMD 2.10 vs. NAMD 2.11 on 1, 2, and 4 nodes; speedups of 1.5x, 1.1x, and 1.5x]

Both NAMD 2.10 and NAMD 2.11 ran on nodes with Dual Intel E5-2697 v2 @ 2.7 GHz (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs

Page 153: NAMD 2.11 STMV on 1 and 2 nodes

Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs
The green nodes contain the same CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: STMV (1,066,628 atoms), simulated time (ns/day); speedups vs. CPU-only at the same node count]
1 node: 0.23
1 node + 1x K80: 1.03 (4.5x)
1 node + 2x K80: 1.75 (7.6x)
2 nodes: 0.46
2 nodes + 1x K80: 1.98 (4.3x)
2 nodes + 2x K80: 3.27 (7.1x)

Page 154: NAMD 2.11 STMV on 4 and 8 nodes

Running NAMD version 2.11
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs
The green nodes contain the same CPUs + Tesla K80 (autoboost) GPUs

[Bar chart: STMV (1,066,628 atoms), simulated time (ns/day); speedups vs. CPU-only at the same node count]
4 nodes: 0.90
4 nodes + 1x K80: 3.61 (4.0x)
4 nodes + 2x K80: 4.54 (5.0x)
8 nodes: 1.74
8 nodes + 1x K80: 5.86 (3.4x)
8 nodes + 2x K80: 6.24 (3.6x)

Page 155: Benefits of GPU-Accelerated MD Computing

• 3x–8x faster than CPU-only systems across all tests (on average)
• Most major compute-intensive aspects of classical MD are ported
• Large performance boost for a marginal price increase
• Energy usage cut by more than half
• GPUs scale well within a node and across multiple nodes
• The K80 is our fastest and lowest-power high-performance GPU yet

Try GPU-accelerated MD apps for free – www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
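The "energy usage cut by more than half" claim can be sanity-checked as energy per simulated nanosecond: node power multiplied by wall time per ns. The power figures below are illustrative assumptions (a ~400 W dual-socket CPU node and ~300 W per K80 board), not measurements from this deck; the ns/day values are the APoA1 single-node numbers from page 147.

```python
def kwh_per_ns(node_watts: float, ns_per_day: float) -> float:
    """Energy to simulate one ns, in kWh, at a given sustained node power."""
    hours_per_ns = 24.0 / ns_per_day
    return node_watts * hours_per_ns / 1000.0

# Assumed powers: 400 W CPU-only node; +2 K80 boards at ~300 W each.
cpu_kwh = kwh_per_ns(400.0, 2.77)            # CPU-only node, 2.77 ns/day
gpu_kwh = kwh_per_ns(400.0 + 2 * 300.0, 16.99)  # same node + 2x K80, 16.99 ns/day
# The GPU node draws ~2.5x the power but runs ~6.1x faster, so its
# energy per simulated ns comes out well under half the CPU-only figure.
```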

Page 156: Molecular Dynamics (MD) on GPUs

Dec. 19, 2016

Page 157: GPU-Accelerated Quantum Chemistry Apps

Abinit, ACES III, ADF, BigDFT, CP2K, GAMESS-US, Gaussian, GPAW, LATTE,
LSDalton, MOLCAS, Mopac2012, NWChem, Octopus, ONETEP, Petot, Q-Chem,
QMCPACK, Quantum Espresso, Quantum SuperCharger Library, RMG, TeraChem,
UNM, VASP, WL-LSMS

Green lettering indicates performance slides are included.
GPU performance is compared against a dual-socket multi-core x86 CPU system.