Machine Learning for Quantum Mechanics
Matthias Rupp
Fritz Haber Institute of the Max Planck Society, Berlin, Germany
Hands-on Workshop on Density Functional Theory and Beyond, July 31–August 11, 2017, Humboldt University, Berlin, Germany
Outline
1. Rationale: quantum mechanics, machine learning
2. Kernel learning: kernel trick, kernel ridge regression
3. Model building: overfitting, validation, free parameters
4. Property prediction: representation, energies of molecules and crystals

Hands-on session: property prediction with qmmlpack
Rationale
Challenges in quantum mechanical simulations

• High-throughput screening (Castelli et al., Energy Environ. Sci. 12, 2013)
• Large systems (image: Tarini et al., IEEE Trans. Visual. Comput. Graph., 2006)
• Long simulations (Liwo et al., Proc. Natl. Acad. Sci. USA 102: 2362, 2005)
• Quantum effects (image: Hiller et al., Nature 476: 236, 2011)
Approximations
Hierarchy of numerical approximations to Schrödinger's equation:

| Abrv. | Method                                          | Runtime  |
|-------|-------------------------------------------------|----------|
| FCI   | Full Configuration Interaction (CISDTQ)         | O(N^10)  |
| CC    | Coupled Cluster (CCSD(T))                       | O(N^7)   |
| CI    | Configuration Interaction (CISD)                | O(N^6)   |
| MP2   | Møller–Plesset second-order perturbation theory | O(N^5)   |
| HF    | Hartree–Fock                                    | O(N^4)   |
| DFT   | Density Functional Theory (Kohn–Sham)           | O(N^3–4) |
| TB    | Tight Binding                                   | O(N^3)   |
| MM    | Molecular Mechanics                             | O(N^2)   |

N = system size

Is it possible to be both accurate and fast?
The key idea
• Exploit redundancy in related QM calculations
• Interpolate between QM calculations using ML
• Smoothness assumption (regularization)

[Figure: ML interpolation between reference calculations; — QM, - - - ML, • reference calculations]
Relationship to other models
| quantum chemistry    | force fields         | machine learning        |
|----------------------|----------------------|-------------------------|
| generally applicable | limited domain       | generally applicable    |
| no or little fitting | fitting to one class | refitted to any dataset |
| form from physics    | form from physics    | form from statistics    |
| deductive            | mostly deductive     | inductive               |
| few or no parameters | some parameters      | many parameters         |
| slow                 | fast                 | in between              |
| small systems        | large systems        | large systems           |
Machine learning
Machine learning (ML) studies algorithms whose performance improves
with data (“learning from experience”). Mitchell, McGraw Hill, 1997
Data {(x₁, y₁), …, (xₘ, yₘ)} → ML algorithm (black box) → model (hypothesis) f: X → Y
• widely applied, many problem types and algorithms
• systematic identification of regularity in data for prediction & analysis
• interpolation in high-dimensional spaces
• inductive, data-driven; empirical in a principled way
• connections to statistics, mathematics, computer science, physics, … (example: information theory)
Problem types

Unsupervised learning: data do not have labels.
Given {xᵢ}ⁿᵢ₌₁, find structure.
• dimensionality reduction (Burges, now Publishers, 2010)

Supervised learning: data have labels.
Given {(xᵢ, yᵢ)}ⁿᵢ₌₁, predict y for new x.
• novelty detection
• classification
• regression
• structured output learning

Semi-supervised learning: some data have labels.
Given {(xᵢ, yᵢ)}ⁿᵢ₌₁ and {xᵢ}ᵐᵢ₌₁, m ≫ n, predict y for new x.

Active learning: the algorithm chooses which data to label.
Choose n data {xᵢ}ⁿᵢ₌₁ to predict y for new x.
Artificial neural networks
f(x_{i,j}) = h( Σ_{k=1}^{n_i} w_{i−1,k} f(x_{i−1,k}) )

• parametric model
• universal function approximator
• training via non-convex optimization
Kernel learning
The kernel trick
Idea:
• Transform samples into a higher-dimensional space
• Implicitly compute inner products there
• Rewrite the linear algorithm to use only inner products

Input space X ↦ feature space H:
k: X × X → ℝ, k(x, z) = ⟨φ(x), φ(z)⟩

Schölkopf, Smola: Learning with Kernels, 2002; Hofmann et al.: Ann. Stat. 36, 1171, 2008.
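As an illustrative sketch (not from the slides; function names are ours), the trick can be checked numerically for the degree-2 polynomial kernel k(x, z) = ⟨x, z⟩², whose explicit feature map in two dimensions is φ(x) = (x₁², √2·x₁x₂, x₂²):

```python
import numpy as np

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: k(x, z) = <x, z>^2."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit feature map for the same kernel in two dimensions."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Implicit evaluation in input space equals the inner product in feature space.
assert np.isclose(poly_kernel(x, z), phi(x) @ phi(z))
```

The kernel never materializes φ; for kernels such as the Gaussian, the feature space is infinite-dimensional and only the implicit route is available.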
![Page 14: Machine Learning for Quantum Mechanics · Problem types Unsupervised learning: Data do not have labels Given) x i *n i=1,findstructure • dimensionality reduction Burges, now Publishers,](https://reader036.vdocuments.us/reader036/viewer/2022071211/6023bc34c5d36f4e5d747c6b/html5/thumbnails/14.jpg)
Kernel functions

Kernels correspond to inner products:
If k: X × X → ℝ is symmetric positive semi-definite,
then k(x, z) = ⟨φ(x), φ(z)⟩ for some φ: X → H.

Inner products encode information about lengths and angles:
‖x − z‖² = ⟨x, x⟩ − 2⟨x, z⟩ + ⟨z, z⟩,  cos θ = ⟨x, z⟩ / (‖x‖ ‖z‖)

• well-characterized function class
• closure properties
• access data only via K_ij = k(xᵢ, xⱼ)
• X can be any non-empty set
Examples of kernel functions

• Linear kernel: k(x, z) = ⟨x, z⟩
• Gaussian kernel: k(x, z) = exp(−‖x − z‖²₂ / (2σ²))
• Laplacian kernel: k(x, z) = exp(−‖x − z‖₁ / σ)
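A minimal NumPy sketch of these kernels, evaluated pairwise on sample matrices (function names are ours, not from the hands-on package):

```python
import numpy as np

def linear_kernel(X, Z):
    """k(x, z) = <x, z> for all pairs of rows of X and Z."""
    return X @ Z.T

def gaussian_kernel(X, Z, sigma=1.0):
    """k(x, z) = exp(-||x - z||_2^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def laplacian_kernel(X, Z, sigma=1.0):
    """k(x, z) = exp(-||x - z||_1 / sigma)."""
    d1 = np.abs(X[:, None, :] - Z[None, :, :]).sum(axis=-1)
    return np.exp(-d1 / sigma)

X = np.random.RandomState(0).randn(5, 3)
K = gaussian_kernel(X, X)
# A kernel matrix is symmetric positive semi-definite (slide "Kernel functions").
assert np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() > -1e-10
```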
Comparison of linear and kernel ridge regression

Ridge regression: minimizing
min_{β ∈ ℝᵈ} Σᵢ₌₁ⁿ (f(xᵢ) − yᵢ)² + λ‖β‖²
yields
β = (XᵀX + λI)⁻¹ Xᵀy
for models
f(x) = Σᵢ₌₁ᵈ βᵢ xᵢ

Kernel ridge regression: minimizing
min_{α ∈ ℝⁿ} Σᵢ₌₁ⁿ (f(xᵢ) − yᵢ)² + λ‖f‖²_H
yields
α = (K + λI)⁻¹ y
for models
f(x) = Σᵢ₌₁ⁿ αᵢ k(xᵢ, x)
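The kernel ridge regression column translates into a few lines of NumPy. A sketch with our own function names (σ and λ values are illustrative), fitting y = sin(x) and predicting at a new point:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, sigma, lam):
    """Solve (K + lambda I) alpha = y for the weight vector alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """f(x) = sum_i alpha_i k(x_i, x)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0])
alpha = krr_fit(X, y, sigma=1.0, lam=1e-6)
pred = krr_predict(X, alpha, np.array([[0.5]]), sigma=1.0)
assert abs(pred[0] - np.sin(0.5)) < 0.1   # interpolates the smooth target well
```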
The basis function picture

[Figure: training samples (xᵢ, yᵢ), kernel basis functions (solid), and the prediction f (dashed) learned for y = cos(x)]

Vu et al., Int. J. Quant. Chem. 115: 1115, 2015
Representer theorem

Kernel models have the form
f(z) = Σᵢ₌₁ⁿ αᵢ k(xᵢ, z)
due to the representer theorem:

Any function minimizing a regularized risk functional
ℓ((xᵢ, yᵢ, f(xᵢ))ⁿᵢ₌₁) + g(‖f‖)
admits the above representation.

Intuition:
• the model lives in the space spanned by the training data
• weighted sum of basis functions

Schölkopf, Herbrich & Smola, COLT 2001
Centering in kernel feature space

Centering X and y is equivalent to having a bias term b.

For kernel models, center in kernel feature space:
k̃(x, z) = ⟨φ(x) − (1/n) Σᵢ₌₁ⁿ φ(xᵢ), φ(z) − (1/n) Σᵢ₌₁ⁿ φ(xᵢ)⟩
⇒ K̃ = (I − (1/n)𝟙) K (I − (1/n)𝟙)

Some kernels, like the Gaussian and Laplacian kernels, do not need centering.
Poggio et al., Tech. Rep., 2001
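In code, centering acts on the kernel matrix alone. A short sketch (our names): for the linear kernel K = XXᵀ, double-centering K is equivalent to subtracting the column means of X first.

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel matrix: K -> (I - (1/n) 11^T) K (I - (1/n) 11^T)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return H @ K @ H

rng = np.random.RandomState(0)
X = rng.randn(6, 3) + 2.0            # data with a nonzero mean
K = X @ X.T                          # linear kernel
Xc = X - X.mean(axis=0)              # explicitly centered data
assert np.allclose(center_kernel(K), Xc @ Xc.T)
```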
Model building
How regularization helps against overfitting
[Figure: noisy training samples (x, y) illustrating a regularized fit that follows the trend versus an unregularized fit that chases the noise]
Effect of regularization

• Underfitting: λ too large (errors 0.123 / 0.443)
• Fitting: λ right (errors 0.044 / 0.068)
• Overfitting: λ too small (errors 0.036 / 0.939)

Rupp, PhD thesis, 2009; Vu et al., Int. J. Quant. Chem. 115: 1115, 2015
Learning theory

prediction error = approximation error a + estimation error e + optimization error o

F = model class, A = true model, B = best model in class, C = best identifiable model (data), D = best identifiable model (optimization)

Changes in the size of F ⇔ a vs. e ⇔ bias–variance trade-off

Bottou & Bousquet, NIPS 2007
Validation
Why?
• assess model performance
• optimize free parameters (hyperparameters)

Which statistics?
• root mean squared error (RMSE)
• mean absolute error (MAE)
• maximum error
• squared correlation coefficient (R²)

What else can we learn from validation?
• distribution of errors, not only summary statistics
• convergence of error with number of samples
Validation
Golden rule: never use training data for validation.

Violation of this rule leads to overfitting, because it measures flexibility in fitting instead of generalization ability (rote-learner example).

If there is sufficient data:
• divide the data into two subsets, training and validation
• build the model on the training subset
• estimate the error of the trained model on the validation subset

Sometimes an external validation set is used in addition.
Statistical validation
If there are too few data, statistical resampling methods can be used, such as cross-validation, bagging, bootstrapping, and jackknifing.

k-fold cross-validation:
• divide the data into k evenly sized subsets
• for i = 1, …, k, build a model on the union of subsets {1, …, k} \ {i} and validate on subset i

All model-building steps must be repeated for each data split:
• all pre-processing, such as feature selection and centering
• optimization of hyperparameters

Hansen et al., J. Chem. Theor. Comput., 3404, 2013
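The k-fold loop above is a few lines of NumPy. A generic sketch (our names; `fit`/`predict` are placeholders for any model):

```python
import numpy as np

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """k-fold CV: train on k-1 folds, validate on the held-out fold."""
    folds = np.array_split(np.random.RandomState(seed).permutation(len(X)), k)
    rmse = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(X)), fold)
        model = fit(X[train], y[train])
        resid = predict(model, X[fold]) - y[fold]
        rmse.append(np.sqrt(np.mean(resid ** 2)))
    return rmse   # distribution of errors, not just a summary statistic

# Toy usage: the "model" is simply the training mean.
X, y = np.arange(20.0).reshape(-1, 1), np.ones(20)
errs = cross_validate(X, y, fit=lambda X, y: y.mean(),
                      predict=lambda m, X: np.full(len(X), m))
assert np.allclose(errs, 0.0)    # constant target -> zero error in every fold
```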
Hyperparameters: physically motivated choices

Length scale σ:
σ ≈ ‖x − z‖, e.g., the median nearest-neighbor distance

Regularization strength λ:
≙ noise variance (Bayesian)
≙ leeway around yᵢ for fitting
⇒ target accuracy

[Figure: RMSE as a function of length scale and regularization strength on a logarithmic grid]

Rupp: Int. J. Quant. Chem., 1058, 2015
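The median nearest-neighbor heuristic for σ in a few lines (our function name; Euclidean distances assumed, though the Laplacian kernel would use the 1-norm):

```python
import numpy as np

def median_nn_distance(X):
    """Heuristic sigma: median over samples of the distance to the nearest neighbor."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)          # ignore zero self-distances
    return np.median(d.min(axis=1))

X = np.array([[0.0], [1.0], [3.0], [7.0]])
# Nearest-neighbor distances are 1, 1, 2, 4, so the median is 1.5.
assert median_nn_distance(X) == 1.5
```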
Hyperparameters: statistically motivated choices
• data-driven methods for choosing hyperparameters
• optimize using grid search or gradient descent
• use statistical validation to estimate the error
• for validation and hyperparameter optimization, use nested data splits

[Figure: RMSE as a function of length scale and regularization strength on a logarithmic grid]

Rupp: Int. J. Quant. Chem., 1058, 2015
Nested data splits

• never use data from training in validation
• for performance assessment and hyperparameter optimization, use nested cross-validation or nested hold-out sets
• beware of overfitting

Example 1: plain overfitting
✗ train on all data, predict all data
✓ split data, train, predict

Example 2: centering
✗ center data, split data, train & predict
✓ split data, center training set, train, center test set, predict

Example 3: cross-validation with feature selection
✗ feature selection, then cross-validation
✓ feature selection within each split of the cross-validation
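Example 2 can be made concrete in a few lines: centering before the split leaks test statistics into the training set, while the correct order reuses the training mean for the test split (a sketch with synthetic data):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 2) + 5.0                # data with a nonzero mean
train, test = np.arange(8), np.arange(8, 10)

# Wrong: center with the mean of ALL data, then split (test leaks into training).
X_train_bad = X[train] - X.mean(axis=0)

# Right: split first, center with the training mean, reuse it for the test split.
mu_train = X[train].mean(axis=0)
X_train_ok = X[train] - mu_train
X_test_ok = X[test] - mu_train

# The training sets differ: the "wrong" one depends on the held-out samples.
assert not np.allclose(X_train_bad, X_train_ok)
```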
Property prediction
Examples
• screening: chemical interpolation (Rupp et al., Phys. Rev. Lett. 108(5): 058301, 2012)
• molecular dynamics: potential energy surfaces (Behler, Phys. Chem. Chem. Phys. 13(40): 17930, 2011)
• dynamics simulations: crack propagation in silicon (Li et al., Phys. Rev. Lett. 114: 096405, 2015)
• crystal structure prediction: (meta)stable states (Ghiringhelli et al., Phys. Rev. Lett. 114(10): 105503, 2015)
• density functional theory: kinetic energies (Snyder et al., Phys. Rev. Lett. 108(25): 253002, 2012)
• transition state theory: dividing surfaces (Pozun et al., J. Chem. Phys. 136(17): 174101, 2012)
• amorphous systems: relaxation in glassy liquids (Schoenholz, Cubuk et al., Nat. Phys. 12(5): 469, 2016)
• design: stable interface search (Kiyohara, Oda, Tsuda, Mizoguchi, Jpn. J. Appl. Phys. 55(4): 045502, 2016)
The combinatorial nature of chemical/materials space

[Figure: molecules decoded from randomly sampled points in the latent space of a variational autoencoder, near a given molecule (aspirin [2-(acetyloxy)benzoic acid]); aspirin derivatives]

• molecule space: graph theory
• materials space: group theory
• combinatorial explosion

Gomez-Bombarelli et al., arXiv, 2016
Learning potential energy surfaces
Chang, von Lilienfeld: CHIMIA 68: 602, 2014; von Lilienfeld: Int. J. Quant. Chem. 113: 1676, 2013
Predicting atomization energies
• 7 165 small organic molecules (H, C, N, O, S; 1–7 non-H atoms)
• DFT (PBE0) atomization energies
• kernel ridge regression, Gaussian kernel k(M, M′) = exp(−d²(M, M′) / (2σ²))

[Figure: learning curves; RMSE and MAE (kcal/mol) decrease with the number N of training molecules]

Rupp, Tkatchenko, Müller, von Lilienfeld: Phys. Rev. Lett., 2012
Coulomb matrix
M_ij = ½ Z_i^2.4              for i = j
M_ij = Z_i Z_j / |R_i − R_j|  for i ≠ j

• atom positions Rᵢ and proton numbers Zᵢ
• sort by simultaneously permuting rows and columns
• Frobenius norm; pad with zeros to allow different sizes

Rupp, Tkatchenko, Müller & von Lilienfeld: Phys. Rev. Lett., 2012
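A sketch of the Coulomb matrix in NumPy (our function name; atomic units in the original work, here purely illustrative; sorting by descending row norm is one way to fix the atom ordering by a simultaneous row/column permutation):

```python
import numpy as np

def coulomb_matrix(Z, R, size=None):
    """M_ii = 0.5 Z_i^2.4, M_ij = Z_i Z_j / |R_i - R_j| for i != j."""
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    # Simultaneously permute rows and columns (here: by descending row norm).
    order = np.argsort(-np.linalg.norm(M, axis=1))
    M = M[order][:, order]
    if size is not None and size > n:        # zero-pad to allow different sizes
        P = np.zeros((size, size))
        P[:n, :n] = M
        M = P
    return M

Z = np.array([1, 1])                          # H2, bond length 1.4 (a.u.)
R = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0]])
M = coulomb_matrix(Z, R, size=3)
assert M.shape == (3, 3)
assert np.isclose(M[0, 1], 1 / 1.4) and np.isclose(M[0, 0], 0.5)
```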
Extension to other properties

Learning the map from molecular structure to molecular properties:
• various properties
• various levels of theory
• small organic molecules
• Coulomb matrix representations
• kernel learning, deep neural networks
• for 5k training molecules, errors are comparable to the reference

Montavon et al., New J. Phys., 2013; Hansen et al., J. Chem. Theor. Comput., 2013
Local properties
Local interpolation is global extrapolation.

• linear scaling of computational effort with system size
• size consistent in the limit
• requires partitioning for global properties

Bartok et al., Phys. Rev. Lett. 104, 2010; Behler, J. Phys. Condens. Matter 26, 2014; Rupp et al., J. Phys. Chem. Lett. 6, 2015
Summary
1. machine learning finds regularity in data for analysis or prediction, improving with more data
2. kernel trick: implicit transformation to high-dimensional spaces, with kernel ridge regression as example
3. for validation, avoid overfitting by following the golden rule
4. interpolation of electronic structure calculations; example: atomization energies of organic molecules
Tutorial
Matthias Rupp: Machine Learning for Quantum Mechanics in a Nutshell.
International Journal of Quantum Chemistry 115(16): 1058–1073, 2015
http://doi.org/10.1002/qua.24954

Links
http://mrupp.info (Publications)
http://qmml.org (Datasets)