Mastering Computational Chemistry with Deep Learning



Post on 02-Jan-2022


Page 1: Mastering Computational Chemistry with Deep Learning

Olexandr Isayev, Ph.D.

University of North Carolina at Chapel Hill

[email protected]

http://olexandrisayev.com

Mastering Computational Chemistry with Deep Learning

@olexandr

Page 2: Mastering Computational Chemistry with Deep Learning

ANI-1: An extensible DL potential with DFT accuracy at force field computational cost

Chem. Sci., 2017, 8, 3192-3203

DOI: 10.1039/C6SC05720A

(http://arxiv.org/abs/1610.08935)

Joint work with Justin S. Smith and Adrian Roitberg

University of Florida

POSTER & Fast Forward Talk: ANI-1: Solving quantum mechanics with deep learning on GPUs

By Justin Smith

Page 3: Mastering Computational Chemistry with Deep Learning

ANAKIN-ME: Accurate NeurAl networK engINe for Molecular Energies

We want to train a padawan network to become a DFT Jedi master

Why ANI-1?

Ani: The force is strong!

Page 4: Mastering Computational Chemistry with Deep Learning

Quantum Mechanics 101

Time-independent Schrödinger equation

$\hat{H}\,\Psi(\mathbf{r}) = E\,\Psi(\mathbf{r})$

Page 5: Mastering Computational Chemistry with Deep Learning

[Figure: accuracy vs. time for force fields, semi-empirical QM, DFT & HF, and CCSD(T). Time axis ticks: 1, 10^3, 10^5, 10^7, 10^9. Annotation: accessible molecular systems.]

Page 6: Mastering Computational Chemistry with Deep Learning

[Figure: accuracy vs. time for force fields, semi-empirical QM, DFT & HF, and CCSD(T), with the ANI-1 potential added. Time axis ticks: 1, 10^3, 10^5, 10^7, 10^9. Annotation: accessible molecular systems.]

• Rel. error in total energy of ~6 x 10^-4 % vs. DFT

• Accuracy ~1 kcal/mol

• Speedup of 10^5-10^6

Page 7: Mastering Computational Chemistry with Deep Learning

Molecular Mechanics / Force Fields

Page 8: Mastering Computational Chemistry with Deep Learning

Protein - Ligand Docking

Page 9: Mastering Computational Chemistry with Deep Learning

MMFF94

PM7

Kanal, Hutchison, Keith (submitted). Slide credit: G. Hutchison, University of Pittsburgh

Molecular Conformers

Page 10: Mastering Computational Chemistry with Deep Learning

Design Principles

Create a “Force Field” in the sense of a mapping from coordinates R → Energy (Forces) with no a-priori functional form

• Accurate and reproducible

• Fast

• Input consisting only of things that the Schrödinger equation needs (i.e. atomic numbers and positions, plus charge and spin)

• Forces as true gradients of the energy

• Extensible in atomic elements

• Extensible to molecules of very different sizes

• Self-learning
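The "forces as true gradients of the energy" principle can be checked numerically. Below is a minimal sketch, assuming a hypothetical pairwise toy potential (`toy_energy`) in place of a trained network; it is not the ANI-1 model, only an illustration that analytic forces should match the negative finite-difference gradient of the energy.

```python
import numpy as np

def toy_energy(coords):
    """Hypothetical stand-in for a learned potential: sum of Morse-like pair terms."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            energy += (1.0 - np.exp(-(r - 1.0))) ** 2
    return energy

def analytic_forces(coords):
    """Forces as the exact negative gradient of toy_energy."""
    forces = np.zeros_like(coords)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[i] - coords[j]
            r = np.linalg.norm(d)
            e = np.exp(-(r - 1.0))
            dE_dr = 2.0 * (1.0 - e) * e  # derivative of the pair term w.r.t. r
            forces[i] -= dE_dr * d / r
            forces[j] += dE_dr * d / r
    return forces

def numerical_forces(coords, h=1e-5):
    """Central finite-difference estimate of -dE/dR, one coordinate at a time."""
    forces = np.zeros_like(coords)
    for idx in np.ndindex(coords.shape):
        plus = coords.copy()
        minus = coords.copy()
        plus[idx] += h
        minus[idx] -= h
        forces[idx] = -(toy_energy(plus) - toy_energy(minus)) / (2.0 * h)
    return forces

coords = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 0.9, 0.3]])
f_exact = analytic_forces(coords)
f_numeric = numerical_forces(coords)
max_dev = np.max(np.abs(f_exact - f_numeric))
print("max |analytic - numerical| force deviation:", max_dev)
```

A potential whose forces pass this kind of check conserves energy in molecular dynamics, which is one reason the principle matters.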

Page 11: Mastering Computational Chemistry with Deep Learning

How does ANI-1 work?

Molecular representation (MR)

• Transformation from coordinates to a deep-learning-friendly input vector

• Accomplished through heavy modifications of Behler and Parrinello symmetry functions [1], giving the atomic environment vector (AEV, or $\vec{G}_i^X$)

• $\vec{G}_i^X$ provides an atom's local chemical environment up to a cutoff radius

• Mods provide recognizable features in MR

• Mods provide better atomic number differentiation

High-dimensional neural network potential (HDNNP) [1]

• Utilizes AEVs by computing one for each atom

• Total energy is a sum of atomic contributions

• Allows training on datasets with many molecules of different sizes (diverse)

• One NNP per atomic number

[Figure: HDNNP for H2O. Coordinates $q_1$, $q_2$, $q_3$ are converted to AEVs $\vec{G}_1^O$, $\vec{G}_1^H$, $\vec{G}_2^H$; these pass through NNP (O) and NNP (H) to give atomic energies $E_1^O$, $E_1^H$, $E_2^H$, which are summed to the total energy $E_T$. Each color represents a distinct NNP.]

1) J. Behler and M. Parrinello, Phys. Rev. Lett., 2007, 98, 146401.

J. Smith, O.I., A. Roitberg. Chem. Sci., 2017, 8, 3192-3203
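The "sum of atomic contributions, one NNP per atomic number" idea can be sketched in a few lines. This is an illustration only: the networks below are tiny, randomly initialized, and use a tanh activation chosen for brevity; the AEVs are random placeholder vectors, and all names (`make_network`, `atomic_energy`, `total_energy`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
AEV_LENGTH = 8  # toy size for illustration, far smaller than a real AEV

def make_network(sizes):
    """Random dense layers as (weights, biases) pairs; untrained, for illustration."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def atomic_energy(network, aev):
    """Forward pass: hidden layers with tanh, linear output giving one atomic energy."""
    x = aev
    for w, b in network[:-1]:
        x = np.tanh(x @ w + b)
    w, b = network[-1]
    return float((x @ w + b)[0])

# One network per element, shared by every atom of that element.
networks = {
    "H": make_network([AEV_LENGTH, 16, 1]),
    "O": make_network([AEV_LENGTH, 16, 1]),
}

def total_energy(species, aevs):
    """Total energy as the sum of per-atom energies from the element-specific networks."""
    return sum(atomic_energy(networks[s], g) for s, g in zip(species, aevs))

# Water: one O and two H atoms, each with its own (placeholder) AEV.
species = ["O", "H", "H"]
aevs = rng.random((3, AEV_LENGTH))
e_total = total_energy(species, aevs)

# Swapping the two hydrogens leaves the total unchanged (up to rounding).
e_swapped = total_energy(["O", "H", "H"], aevs[[0, 2, 1]])
print(e_total, e_swapped)
```

Because the same per-element network is reused for every atom of that element, the same set of weights applies to molecules of any size.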

Page 12: Mastering Computational Chemistry with Deep Learning

Molecular Representation

R = 5 Å
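A minimal sketch of the radial part of such a representation, assuming Behler-Parrinello-style Gaussians with a cosine cutoff. The parameter values (`eta`, `n_shifts`) and the water geometry are illustrative assumptions, and this sketch ignores the angular terms and the per-element sub-vectors that a full AEV contains.

```python
import numpy as np

def cosine_cutoff(r, r_c):
    """Smooth cutoff: 0.5*cos(pi*r/r_c) + 0.5 inside r_c, zero outside."""
    return np.where(r <= r_c, 0.5 * np.cos(np.pi * r / r_c) + 0.5, 0.0)

def radial_aev(coords, i, r_c=5.0, eta=4.0, n_shifts=8):
    """Radial environment of atom i: one Gaussian-weighted neighbour sum per shift."""
    shifts = np.linspace(0.5, r_c, n_shifts, endpoint=False)  # assumed radial shifts (Å)
    others = np.delete(coords, i, axis=0)
    r = np.linalg.norm(others - coords[i], axis=1)            # distances to neighbours
    gauss = np.exp(-eta * (r[:, None] - shifts[None, :]) ** 2)
    return (gauss * cosine_cutoff(r, r_c)[:, None]).sum(axis=0)

# Approximate water geometry in Å (illustrative values).
water = np.array([[0.0, 0.0, 0.0],
                  [0.9572, 0.0, 0.0],
                  [-0.24, 0.927, 0.0]])
g_oxygen = radial_aev(water, 0)

# Rotating and translating the molecule leaves the vector unchanged.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = water @ rot.T + np.array([3.0, -1.0, 2.0])
g_moved = radial_aev(moved, 0)

# An atom beyond the cutoff contributes nothing.
far = np.vstack([water, [[10.0, 0.0, 0.0]]])
g_far = radial_aev(far, 0)
print(g_oxygen)
```

The vector depends only on interatomic distances inside the cutoff sphere, so it is invariant to translation and rotation by construction.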

Page 13: Mastering Computational Chemistry with Deep Learning

What do you need?

• ANI requires TONS of data

• Currently we run ~20M DFT data points. To be released soon

• Molecules with 1 to 8 heavy atoms from the GDB database

• Train network on the data

• Validate on separate data

• Test on ‘known sizes’ (Molecules with <= # max heavy atoms per molecule in training set)

• Interpolation

• Test on ‘unknown sizes’ (Molecules larger than any in the training set)

• Extrapolation
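The 'known sizes' versus 'unknown sizes' test split can be sketched as a filter on heavy-atom count. The molecule list and the helper names below are hypothetical; the threshold of 8 is an assumption matching the training-set size limit quoted above.

```python
def count_heavy_atoms(symbols):
    """Number of non-hydrogen atoms in a list of element symbols."""
    return sum(1 for s in symbols if s != "H")

def split_by_size(molecules, max_train_heavy):
    """Split names into interpolation (known sizes) and extrapolation (unknown sizes)."""
    known, unknown = [], []
    for name, symbols in molecules.items():
        if count_heavy_atoms(symbols) <= max_train_heavy:
            known.append(name)
        else:
            unknown.append(name)
    return known, unknown

# Hypothetical test molecules given as element-symbol lists.
molecules = {
    "methane": ["C"] + ["H"] * 4,
    "ethanol": ["C", "C", "O"] + ["H"] * 6,
    "benzene": ["C"] * 6 + ["H"] * 6,
    "decane": ["C"] * 10 + ["H"] * 22,
}
known, unknown = split_by_size(molecules, max_train_heavy=8)
print("interpolation:", known)
print("extrapolation:", unknown)
```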

Page 14: Mastering Computational Chemistry with Deep Learning

Training the ANI-1 potential

• Best network architecture: 768 – 128 – 128 – 64 – 1 (122,944 weights + 321 biases)

• AEV cutoff – Radial SFs: 4.6 Å; Angular SFs: 3.1 Å

• AEV setup – 32 radial functions; 8x8 angular functions (768 elements)

• Included atomic numbers: H, C, N, O, S, F

• Trained and tested with an in-house C++/CUDA program (NeuroChem)

• Trained on batches of 1024 molecules from the ANI-1 dataset

• Approximate training time: ~2000 epochs or ~48 hours

• Early stopping with learning rate annealing

• % of ANI-1 dataset utilization: Training: 80%; Validation: 10%; Test: 10%

• Final fitness (RMSE):

Training set: 1.299 kcal/mol

Validation set: 1.348 kcal/mol

Test set: 1.359 kcal/mol

J. Smith, O.I., A. Roitberg. Chem. Sci., 2017, 8, 3192-3203
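The quoted parameter counts follow directly from the layer sizes, as the short check below shows. The second function is an assumption about how the 768 AEV elements are counted (one radial block per neighbour species, one angular block per unordered species pair); the helper names are hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights and biases of a fully connected network with the given layer sizes."""
    weights = sum(m * n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases

def aev_length(n_species, n_radial=32, n_angular=8 * 8):
    """Assumed counting rule: radial block per species, angular block per species pair."""
    n_pairs = n_species * (n_species + 1) // 2
    return n_species * n_radial + n_pairs * n_angular

weights, biases = count_parameters([768, 128, 128, 64, 1])
print(weights, biases)   # 122944 321
print(aev_length(4))     # 768
```

Under this counting rule, 768 elements corresponds to four neighbour species; six species would give 1536, so the 768 figure presumably describes a four-element model.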

Page 15: Mastering Computational Chemistry with Deep Learning

ANI-1 test case 1

• Determine agreement of ANI-1 total potential energy to DFT (ωB97X/6-31G(d))

• 131 randomly selected molecules with 10 heavy atoms

• Generated ~62 conformations for each of them

• Total of ~8200 structures/energies (300 kcal/mol energy range for each molecule)

Page 16: Mastering Computational Chemistry with Deep Learning

Total energy correlation: ANI-1 vs. DFT

(131 molecules with 10 heavy atoms, 8200 total molecules + conformations) [units: kcal/mol]

J. Smith, O.I., A. Roitberg. Chem. Sci., 2017, 8, 3192-3203

Page 17: Mastering Computational Chemistry with Deep Learning
Page 18: Mastering Computational Chemistry with Deep Learning

73 total structures

10 heavy atoms

25 total atoms

RMSE: 1.2 kcal/mol (0.048 kcal/mol/atom)

DFT time: 1143.11 s

ANI time: 0.0032 s

357000x speedup!
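The speedup and per-atom error on this slide can be reproduced from the quoted numbers; the variable names are illustrative.

```python
dft_time_s = 1143.11   # DFT wall time quoted on the slide
ani_time_s = 0.0032    # ANI wall time quoted on the slide
rmse_kcal_mol = 1.2
total_atoms = 25

speedup = dft_time_s / ani_time_s             # roughly 3.57e5
rmse_per_atom = rmse_kcal_mol / total_atoms   # 0.048 kcal/mol/atom
print(f"speedup: {speedup:.0f}x, RMSE per atom: {rmse_per_atom:.3f} kcal/mol")
```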

Page 19: Mastering Computational Chemistry with Deep Learning

Relative energy correlation (30 kcal/mol)

J. Smith, O.I., A. Roitberg. Chem. Sci., 2017, 8, 3192-3203

Page 20: Mastering Computational Chemistry with Deep Learning

ANI-1 test case 2

• ANI-1 potential’s smoothness and goodness of fit to DFT potential surface scans

• Molecules considered are relatively large (53, 31, and 44 atoms)

• 4 scans included: bond stretch, angle bend, and two dihedral scans

Page 21: Mastering Computational Chemistry with Deep Learning

ANI-1 potential unrelaxed scans

J. Smith, O.I., A. Roitberg. Chem. Sci., 2017, 8, 3192-3203

Page 22: Mastering Computational Chemistry with Deep Learning

ANI-1 potential unrelaxed scans

J. Smith, O.I., A. Roitberg. Chem. Sci., 2017, 8, 3192-3203

Page 23: Mastering Computational Chemistry with Deep Learning

Simulating a box of water on ANI-1.1 (Chads Hopkins)

From 50 ps MD run @ 300 K

[Figure: ANI-1.1 theoretical OH vibrational spectra vs. experimental IR absorbance]

Self-diffusion coefficient

Method        D (x10^-5 cm^2/s)
Experiment    2.5
ANI-1.1       3.2
TIP3P         5.9
TIP4P         3.3
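A self-diffusion coefficient is commonly estimated from the slope of the mean squared displacement (MSD) via the Einstein relation, MSD(t) ≈ 6 D t in three dimensions. The sketch below assumes that approach and uses synthetic MSD data; it is not necessarily the procedure used for the numbers above, and the function name is hypothetical.

```python
import numpy as np

# Unit conversion: 1 Å^2/ps = 1e-16 cm^2 / 1e-12 s = 1e-4 cm^2/s
A2_PER_PS_TO_CM2_PER_S = 1e-4

def self_diffusion_coefficient(times_ps, msd_a2):
    """Einstein relation in 3D: D = slope(MSD vs. t) / 6, returned in cm^2/s."""
    slope, _intercept = np.polyfit(times_ps, msd_a2, 1)
    return slope / 6.0 * A2_PER_PS_TO_CM2_PER_S

# Synthetic, noise-free MSD for a 50 ps run with D = 2.5e-5 cm^2/s.
times = np.linspace(0.0, 50.0, 501)
d_true = 2.5e-5
msd = 6.0 * (d_true / A2_PER_PS_TO_CM2_PER_S) * times
d_est = self_diffusion_coefficient(times, msd)
print(f"D = {d_est:.2e} cm^2/s")
```

With real trajectories the fit is usually restricted to the linear, long-time part of the MSD curve.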

Page 24: Mastering Computational Chemistry with Deep Learning

Diels-Alder Reaction

[Figure: panels A, B, C, D]

Page 25: Mastering Computational Chemistry with Deep Learning

The Big Picture

An automated and self-consistent data generation framework

[Flowchart components: ANI network agent; structure pools (IRC pool, GDB pool, online database pool); CV MD/MC sampler; CV structure sampler; CV conformer search; determine bad structures; compute normal mode coordinates; carry out restrained NMS; computations with QM on a compute cluster; database of molecular properties (i.e. energies); retrain networks]

Page 26: Mastering Computational Chemistry with Deep Learning

Summary

• Universal NN potential for small organic molecules

• Accuracy of high-quality DFT calculations

• Extremely fast evaluation: <0.001 s/molecule on 1 GPU

• Up to 10^6 speedup in comparison to DFT

• Can do molecular dynamics, reactions and break bonds!

• Stay tuned!