![Page 1: Prediction of NMR Chemical Shifts. A Chemometrical Approach](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816020550346895dcf1fde/html5/thumbnails/1.jpg)
Prediction of NMR Chemical Shifts.
A Chemometrical Approach
K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg
Advanced Chemistry Development (ACD)
Structure and its spectral data
[Figure: the structure and its spectra: 2D COSY and HMQC plots (F2 and F1 chemical shift, ppm) and 1D 13C and 1H spectra (chemical shift, ppm, vs. normalized intensity).]
Sometimes the solution is not obvious
• In many cases we obtain several structures consistent with the spectral data.
• In this case we need a method to rank the structures.
• The most powerful method is to compare the experimental and predicted 13C NMR spectra.
13C NMR spectral data
[Figure: experimental and predicted 13C NMR spectra of two candidate structures, labeled with average deviations of 2.00 and 9.62.]
How to find the best structure?
• In most cases the predicted spectrum of the “correct structure” fits the experimental spectrum best.
• In practice, the average deviation between the predicted and experimental spectra of the “correct structure” is 2-3 ppm.
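As a rough illustration of this ranking step, the sketch below scores each candidate structure by the average absolute deviation between its predicted 13C shifts and the experimental ones, then sorts the candidates by that score. It assumes the peak assignment is unknown, so both peak lists are simply sorted and paired in order; the function names and the shift values are hypothetical and not taken from the slides.

```python
def average_deviation(predicted, experimental):
    """Mean absolute deviation (ppm) between predicted and experimental 13C shifts.

    Assumption: the atom-to-peak assignment is unknown, so both lists are
    sorted and paired in order.
    """
    if len(predicted) != len(experimental):
        raise ValueError("peak lists must have the same length")
    pairs = zip(sorted(predicted), sorted(experimental))
    return sum(abs(p - e) for p, e in pairs) / len(experimental)


def rank_candidates(candidates, experimental):
    """Return (name, deviation) tuples sorted from best to worst fit."""
    scored = [(name, average_deviation(shifts, experimental))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical data for illustration only.
experimental = [20.0, 40.0, 60.0]
candidates = {"A": [22.0, 41.0, 57.0], "B": [30.0, 45.0, 70.0]}
for name, dev in rank_candidates(candidates, experimental):
    print(f"{name}: {dev:.2f} ppm")
```

Candidate A (2.00 ppm) falls in the 2-3 ppm range quoted above, while B (8.33 ppm) would be ranked well below it.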
The role of spectrum prediction
• Real-world task: an unknown structure with molecular formula C29H32N2O5 and 1D and 2D NMR spectral data.
• 20 min to generate all candidate structures (> 12 000).
• 24 hours to predict the 13C NMR spectra of all the generated structures.
• The speed of spectrum prediction must be increased.
Methods of NMR spectrum prediction
• Quantum mechanics: extremely slow
• Database approach: accurate but slow
– HOSE codes
– Maximum Common Substructure
• Rule-based: fast but inaccurate
– Additive scheme
– Neural networks
• Our choice: improve the accuracy of a fast method
Additive scheme
$\delta = \sum_i a_i x_i$
[Figure: a fragment built from C, O, CH3, C, CH2, CH, CH2, CH2 and CH2 atoms, each labeled with its increment.]
153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31
The main problem is finding the correct values of the atom increments.
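The increment sum on the slide can be written as a weighted sum in which each variable value $x_i$ says how many times the increment $a_i$ applies. The sketch below reproduces the slide's arithmetic. Purely for illustration, it assumes that the first term (153.71 ppm) is a constant with $x_i = 1$ and that the two 0.52 ppm terms belong to one variable with $x_i = 2$; the function name is hypothetical.

```python
def additive_shift(increments, counts):
    """delta = sum_i a_i * x_i for increments a_i and variable values x_i."""
    if len(increments) != len(counts):
        raise ValueError("need one count per increment")
    return sum(a * x for a, x in zip(increments, counts))


# Increments from the slide (ppm). Assumptions for illustration: the first
# term is a constant with x = 1, and both 0.52 ppm terms are one variable (x = 2).
increments = [153.71, -1.85, -4.49, -1.39, -2.79, 1.43, 0.52, -1.35]
counts = [1, 1, 1, 1, 1, 1, 2, 1]
print(round(additive_shift(increments, counts), 2))  # 144.31
```

Finding the increments therefore amounts to fitting the coefficients $a_i$ of a linear model to known shifts.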
Available data
• We have a database of 1.5 million 13C chemical shifts.
• We can try to obtain the correct increment values from it!
How to encode atom environment
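[Figure: the same fragment, with atoms grouped into spheres around the central atom. Each input variable is an atom type in a given sphere, and its value is the number of such atoms; the example shows CH2, CH and C (one each) in the 1st sphere and CH2 (2), CH3 (1) and O (1) in the 2nd sphere.]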
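One way to compute such variables, sketched under our own assumptions: find the topological distance (number of bonds) of every atom from the central atom with a breadth-first search, then count atom types per sphere. The example molecule (butan-2-one), the type strings and the function names are hypothetical; the slides do not show the authors' code.

```python
from collections import Counter, deque


def sphere_distances(adjacency, center, max_sphere):
    """Breadth-first search: topological distance (bonds) from center, up to max_sphere."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if dist[atom] == max_sphere:
            continue  # do not expand beyond the outermost sphere
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def encode_environment(atom_types, adjacency, center, max_sphere):
    """Variables keyed by (sphere, atom type); values are atom counts."""
    dist = sphere_distances(adjacency, center, max_sphere)
    return Counter((d, atom_types[a]) for a, d in dist.items() if d > 0)


# Hypothetical example: butan-2-one, CH3-C(=O)-CH2-CH3, central atom = carbonyl C.
atom_types = ["CH3", "C", "O", "CH2", "CH3"]
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
print(dict(encode_environment(atom_types, adjacency, center=1, max_sphere=2)))
```

Each (sphere, type) key would become one column of the variable matrix, and the count is the value for that atom.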
Data for PLS regression
[Figure: layout of the regression data: samples in rows, the atom environment encoding as the X block and the chemical shifts as the Y block.]
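To make the X/Y layout concrete, here is a minimal PLS1 regression (a single Y column, the chemical shift) using the NIPALS algorithm with deflation, written in plain Python. It is an illustrative sketch, not the authors' implementation; the function names are hypothetical and a real system would use an optimized linear-algebra library. In the additive-scheme setting the fitted linear coefficients would presumably play the role of the atom increments. With as many components as the rank of the centered X, PLS1 reproduces the ordinary least-squares fit, which the toy example uses as a check.

```python
def fit_pls1(X, y, n_components):
    """PLS1 regression (one response) via NIPALS with deflation; pure Python."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                  # centered y
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # E^T f
        norm = sum(c * c for c in w) ** 0.5
        if norm < 1e-12:
            break  # y is already fully explained
        w = [c / norm for c in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(c * c for c in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                            # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict_pls1(model, x):
    """Apply the same centering and deflation steps to one new sample."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * pj for ej, pj in zip(e, p)]
        y_hat += q * t
    return y_hat


# Toy data (hypothetical): y = 1 + 2*x1 + 3*x2 exactly.
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [3, 4, 6, 8]
model = fit_pls1(X, y, n_components=2)
print(round(predict_pls1(model, [3, 2]), 6))  # 13.0
```

With fewer components than variables, PLS keeps only the directions of X most correlated with the shifts, which is one reason it is attractive for large, collinear descriptor sets like this one.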
Find the best structure encoding
• Initially, the best scheme of structure representation is not evident.
• We should find the scheme that gives the best accuracy.
• We should optimize:
– the substituent coding scheme
– the number of “spheres” used
Data used
• 210 K chemical shifts used as the training set.
• 170 K chemical shifts from recent literature used as an external validation set.
How to describe atom type
• Atom type (C, O, etc.)
• Hybridization (sp3, sp2, etc.)
• Valence
• Number of neighboring H atoms
• Charge
• Distance to the “central” atom (in bonds)
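[Figure: a CH3, CH, CH, NH2 fragment with the “central” atom and a “substituent” (NH2) marked; the NH2 substituent is encoded as atom type 7 (N), hybridization 1 (sp3), valence 3, 2 H atoms, charge 0 and distance 3 bonds.]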
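A minimal sketch of this descriptor as a Python data class, with fields in the order of the bullet list. The class name and the numeric codes for sp2 and sp are hypothetical; only the NH2 values (7, 1, 3, 2, 0, 3), as read from the slide, come from the source.

```python
from dataclasses import dataclass

# Hypothetical hybridization codes; the slide only shows sp3 -> 1.
HYBRIDIZATION = {"sp3": 1, "sp2": 2, "sp": 3}


@dataclass(frozen=True)
class SubstituentDescriptor:
    atomic_number: int   # atom type, e.g. 7 for N
    hybridization: int   # code from HYBRIDIZATION
    valence: int
    n_hydrogens: int     # number of attached H atoms
    charge: int
    distance: int        # bonds to the "central" atom

    def key(self):
        """Tuple usable as a variable name in the environment encoding."""
        return (self.atomic_number, self.hybridization, self.valence,
                self.n_hydrogens, self.charge, self.distance)


nh2 = SubstituentDescriptor(atomic_number=7, hybridization=HYBRIDIZATION["sp3"],
                            valence=3, n_hydrogens=2, charge=0, distance=3)
print(nh2.key())  # (7, 1, 3, 2, 0, 3)
```

Because the class is frozen, instances are hashable, so the descriptor (or its key) can replace a plain atom-type string as a counting key.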
Results for different atom encodings

| Atom encoding | Average deviation (ppm) | Standard deviation (ppm) |
|---|---|---|
| Atoms only | 7.17 | 10.96 |
| + Element type | 5.36 | 8.76 |
| + Hybridization | 4.39 | 6.57 |
| + All other | 3.52 | 5.37 |
Results for the number of spheres

| Number of “spheres” | Average deviation (ppm) | Standard deviation (ppm) |
|---|---|---|
| 1 | 5.43 | 7.69 |
| 2 | 3.97 | 5.88 |
| 3 | 3.66 | 5.51 |
| 4 | 3.52 | 5.37 |
| 5 | 3.51 | 5.37 |
| 6 | 3.53 | 5.40 |
Is it the best possible accuracy?
• The best possible average deviation is 3.5 ppm.
• We need less than 3 ppm (2 ppm is preferable).
• Should we use additional variables?
• We should be very careful when adding variables.
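Substituent interference (cross effect)
[Figure: 13C shifts of the substituted =C atom in CH2=CXY. From CH2=CH2 (122.90 ppm), one CH3 group gives 134.16 ppm (+11.26) and one Cl gives 125.38 ppm (+2.48). The observed shifts for CH3/CH3 (141.48 ppm), Cl/Cl (125.90 ppm) and Cl/CH3 (138.30 ppm) differ from the additive estimates 145.42, 127.86 and 136.64 ppm; the deviations labeled on the slide are +1.34, -1.94 and -3.94.]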
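The arithmetic behind the cross effect can be checked directly: take the single-substituent increments from the monosubstituted alkenes, add them for each disubstituted case, and compare with the observed shift. The sketch uses the values from the slide and the assignment of the disubstituted shifts implied by the figure (our reading, since the figure itself is not reproduced); the names are hypothetical.

```python
# 13C shifts (ppm) of the substituted =C atom in CH2=CXY, as read off the slide.
PARENT = 122.90                          # CH2=CH2
INCREMENT = {"H": 0.0,
             "CH3": 134.16 - PARENT,     # from CH2=CH(CH3)
             "Cl": 125.38 - PARENT}      # from CH2=CHCl
OBSERVED = {("CH3", "CH3"): 141.48, ("Cl", "Cl"): 125.90, ("Cl", "CH3"): 138.30}


def additive_estimate(x, y):
    """Pure additive scheme: parent shift plus one increment per substituent."""
    return PARENT + INCREMENT[x] + INCREMENT[y]


for (x, y), observed in OBSERVED.items():
    estimate = additive_estimate(x, y)
    print(f"{x}, {y}: additive {estimate:.2f}, observed {observed:.2f}, "
          f"cross effect {observed - estimate:+.2f}")
```

The CH3/CH3 residual reproduces the -3.94 ppm on the slide; from these rounded values the two chlorine cases come out as -1.96 and +1.66 ppm, slightly different from the slide's labels. In either case, the residual is what a variable for the substituent pair has to absorb.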
Enhanced structure encoding
[Figure: the same fragment, encoded with two blocks of input variables: atoms and pairs of atoms (crosses). A pair variable, such as “CH2 and CH” or “C and O”, holds the number of such pairs (1 in the example).]
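A hypothetical sketch of counting crosses: every pair of environment atoms within `max_sphere` of the central atom whose mutual topological distance is at most `max_pair_distance` adds one count under a key built from both atoms' sphere and type and their distance. The exact key used by the authors is not given on the slide; the two parameters mirror the two axes (number of spheres, distance between atoms within a cross) of the results chart that follows.

```python
from collections import Counter, deque
from itertools import combinations


def bfs_distances(adjacency, start):
    """Topological distance (number of bonds) from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def encode_crosses(atom_types, adjacency, center, max_sphere, max_pair_distance):
    """Count pairs of environment atoms ("crosses").

    Hypothetical key: (sphere_a, type_a, sphere_b, type_b, distance_ab), with the
    two (sphere, type) ends sorted so that each unordered pair has one key.
    """
    from_center = bfs_distances(adjacency, center)
    atoms = [a for a, d in from_center.items() if 0 < d <= max_sphere]
    crosses = Counter()
    for a, b in combinations(atoms, 2):
        d_ab = bfs_distances(adjacency, a)[b]
        if d_ab <= max_pair_distance:
            end_a, end_b = sorted([(from_center[a], atom_types[a]),
                                   (from_center[b], atom_types[b])])
            crosses[(*end_a, *end_b, d_ab)] += 1
    return crosses


# Hypothetical example: butan-2-one, central atom = carbonyl C.
atom_types = ["CH3", "C", "O", "CH2", "CH3"]
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
for key, count in encode_crosses(atom_types, adjacency, 1, 2, 2).items():
    print(key, count)
```

Raising either limit adds more pair variables, so both would need to be tuned against a validation set, as the chart suggests.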
Results for atom pairs (crosses)
[Figure: mean error (ppm) as a function of the distance between the atoms within a cross (1 to 4) and the number of spheres (1 to 4); the error axis runs from 2.0 to 3.6 ppm.]
More enhancements?
• The accuracy is now good enough (2.3 ppm).
• But it is still poor in some cases.
• Unfortunately, these cases are very important.
• These “special” cases should be taken into account.
Stereo effects: double bonds
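[Figure: a molecule containing an OOH group and a double bond bearing two CH3 groups (13C shifts 25.7 and 17.6 ppm) with through-space distances of 3.9 Å and 2.9 Å.]
• We use “topological” distance.
• Sometimes equal topological distances correspond to different “real” (spatial) distances.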
Modified structure encoding
[Figure: the variables form three blocks: atoms, pairs of atoms (crosses) and “stereo” effects.]
Prediction of spectra by different methods (mean error, ppm)

| Taken into account | All types of atoms | CH3 | =C |
|---|---|---|---|
| Atoms only | 3.52 | 1.55 | 8.03 |
| + pairs of atoms (crosses) | 2.32 | 1.50 | 3.22 |
| + “stereo” effects | 2.27 | 1.24 | 3.22 |
| + solvent | 2.25 | 1.24 | 3.20 |

+ to be continued?
Size of training set
• We have 1.5 million chemical shifts.
• We should try to use all available data.
• There is only one problem: matrix size.
• In many cases the matrix becomes larger than 2 GB.
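A back-of-the-envelope estimate shows how quickly the matrix grows: one row per chemical shift, one column per encoding variable, 8 bytes per double-precision value. The number of variables (200) is a hypothetical figure, since the slides do not state it.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory for a dense matrix of float64 values (8 bytes each by default)."""
    return n_rows * n_cols * bytes_per_value


n_shifts = 1_500_000   # from the slide: one row per chemical shift
n_variables = 200      # hypothetical number of encoding variables
size = dense_matrix_bytes(n_shifts, n_variables)
print(f"{size / GIB:.2f} GiB")  # 2.24 GiB
```

Since most environment variables are zero for any given atom, sparse storage, or working with the much smaller variable-by-variable cross-product matrices, are common ways around such limits; the slides do not say which approach, if any, was used.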
Bigger dataset – smaller mean error!
[Figure: average deviation (ppm) versus the number of structures in the training set (thousands: 1, 2, 4, 8, 16, 32, 64, 128, 207).]
The final results
| Method | Average deviation (ppm) | Rate of calculation (shifts/s) |
|---|---|---|
| Old method: HOSE codes | 1.87 | 6 |
| New additive scheme | 1.83 | 5800 |

About three orders of magnitude faster!
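The "three orders of magnitude" follows directly from the two rates in the table; a one-line check:

```python
import math

hose_rate = 6          # shifts per second, HOSE codes (from the table)
additive_rate = 5800   # shifts per second, new additive scheme (from the table)

speedup = additive_rate / hose_rate
print(f"{speedup:.0f}x faster, about {math.log10(speedup):.2f} orders of magnitude")
```

This prints a speedup of about 967x, i.e. log10 of roughly 2.99, just under three orders of magnitude.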
Prediction time: the past and present
[Figure: the structure, molecular formula C29H32N2O5.]

| Method | Average deviation | Time |
|---|---|---|
| HOSE codes | 1.72 | > 24 hours |
| Additive scheme | 1.63 | 2 min |
Conclusions
• Combining a “new” method with an old, well-known algorithm can produce very good (and unexpected) results.