![Page 1: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/1.jpg)
Computers in Chemistry
Dr John Mitchell, University of St Andrews
![Page 2: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/2.jpg)
1. Why?
• Working with experiment to test our theories.
• Computer uses theory to calculate an answer that can be compared with experiment.
• If prediction and experiment don’t agree, something has to give.
![Page 3: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/3.jpg)
Atoms in molecules are not spherical
![Page 4: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/4.jpg)
To Test Our Theories
• The theory that lies beneath chemistry is ultimately quantum physics.
• To turn this into a prediction of the rate of a chemical reaction or the frequency of a transition in an IR spectrum requires a lot of computation.
![Page 5: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/5.jpg)
To Test Our Theories
• Computation’s ability to make accurate predictions of experimental measurements is a good test of the validity of a theory.
• We only understand if we can predict.
![Page 6: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/6.jpg)
Crystal Structure Prediction
• Given the structural diagram of an organic molecule, predict the 3D crystal structure.
[Structural diagram of an example organic molecule containing S, N, Br and O atoms]
Slide after SL Price, Int. Sch. Crystallography, Erice, 2004
![Page 7: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/7.jpg)
To Access Data that Experiment can’t
• Computational chemistry also provides a way of obtaining information that would be very difficult, expensive or time-consuming to get experimentally.
• Behaviour at very high temperature or pressure.
• Details of structure of liquids at atomic scale.
• Dynamics of proteins.
![Page 8: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/8.jpg)
Phase Changes of Iron in the Earth’s Core
et al.,
![Page 9: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/9.jpg)
Structure of Liquid Water and Water Clusters
Computer simulations are an important source of evidence, since atomic scale details of an irregular structure are hard to obtain by experiment.
![Page 10: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/10.jpg)
Dynamic Motions of Proteins
X-ray crystallography gives a single static structure
![Page 11: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/11.jpg)
Dynamic Motions of Proteins
Simulation can show how the protein flexes
![Page 12: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/12.jpg)
2. The Power to Compute
![Page 13: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/13.jpg)
Development of Computer Power
University of Manchester SSEM, 1948
![Page 14: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/14.jpg)
Development of Computer Power
IBM Roadrunner, 2008
![Page 15: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/15.jpg)
Computer Power: Moore’s Law
Computer power doubles every two years: exponential growth
![Page 16: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/16.jpg)
Computer Power: Moore’s Law
Logarithmic scale
![Page 17: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/17.jpg)
Computer Power: Moore’s Law
This growth will, eventually, slow down as components reach atomic scale … we think!
![Page 18: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/18.jpg)
The Size of the Problem
![Page 19: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/19.jpg)
Scaling
• Nonetheless, theoretical chemistry is expensive.
• Often cost scales as the fourth power of molecule size.
[Figure: Scaling of the Expense of a Typical Quantum Chemical Calculation. x-axis: Atoms in Molecule (0 to 60); y-axis: Time (seconds) (0 to 700000)]
![Page 20: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/20.jpg)
Typical scaling is ~N⁴. For the foreseeable future, there will be chemical problems at the limit of our computing power.
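To make the consequence of N⁴ scaling concrete, here is a minimal sketch that combines it with the "doubles every two years" rule from the Moore's Law slides. The function names are illustrative, and the exponent and doubling time are the round figures quoted in the slides, not measured values.

```python
import math

def relative_cost(size_ratio, exponent=4.0):
    """Cost multiplier when the molecule grows by size_ratio, assuming cost ~ N**exponent."""
    return size_ratio ** exponent

def years_to_afford(size_ratio, exponent=4.0, doubling_time_years=2.0):
    """Years of hardware growth needed before the larger job runs in the same time.

    Assumes computer power doubles every doubling_time_years (idealised Moore's Law).
    """
    doublings = math.log2(relative_cost(size_ratio, exponent))
    return doublings * doubling_time_years

# Doubling the number of atoms costs 2**4 = 16 times more,
# which is four doublings of computer power, i.e. eight years.
print(relative_cost(2))    # 16.0
print(years_to_afford(2))  # 8.0
```

Under these assumptions, each doubling of molecule size has to wait roughly eight years of hardware progress, which is why some problems stay at the limit of available computing power.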
![Page 21: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/21.jpg)
3. Philosophies of Computational Chemistry
![Page 22: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/22.jpg)
The Two Faces of Computational Chemistry
Theoretical Chemistry
Informatics
![Page 23: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/23.jpg)
“The problem is difficult, but by making suitable approximations we can solve it at reasonable cost based on our understanding of physics and chemistry.”
Philosophy of Theoretical Chemistry
![Page 24: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/24.jpg)
Theoretical Chemistry
• Calculations and simulations based on real physics.
• Calculations are either quantum mechanical or use numbers derived from quantum mechanics.
• Attempt to model or simulate reality.
• Usually Low Throughput.
![Page 25: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/25.jpg)
What Kinds of Theoretical Chemistry can be Done?
Prof. Eitan Geva
(1) Quantum Chemistry
![Page 26: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/26.jpg)
What Kinds of Theoretical Chemistry can be Done?
(1) Quantum Chemistry
Using quantum mechanics to solve the structures and energetics of molecules; everything depends on the distribution of electrons.
![Page 27: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/27.jpg)
What Kinds of Theoretical Chemistry can be Done?
(1) Quantum Chemistry
Although quantum chemistry involves solving Schrödinger’s equation, it is not fully exact. There are some approximations involved.
![Page 28: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/28.jpg)
What Kinds of Theoretical Chemistry can be Done?
(1) Quantum Chemistry
Wavefunction → Distribution of electrons within the molecule
![Page 29: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/29.jpg)
What Kinds of Theoretical Chemistry can be Done?
(1) Quantum Chemistry
Distribution of electrons → Physical and chemical behaviour of the molecule
![Page 30: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/30.jpg)
What Kinds of Theoretical Chemistry can be Done?
(1) Quantum Chemistry
There are two main kinds of quantum chemistry:
• Ab initio
• Density Functional Theory
![Page 31: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/31.jpg)
What Kinds of Theoretical Chemistry can be Done?
(1) Quantum Chemistry
Ab initio “from first principles”.
• Solve Schrödinger equation to get wavefunction.
• In principle rigorous – we know what we calculate.
• But the standard “Hartree-Fock” method contains significant approximations.
• Expensive to adjust for these and get more accuracy.
![Page 32: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/32.jpg)
What Kinds of Theoretical Chemistry can be Done?
(1) Quantum Chemistry
Density Functional Theory
• Makes use of the theorem that all properties of interest can be determined directly from the electron density.
• True in principle, but the correct “functional” is unknown.
• Less rigorous than ab initio, but usually more accurate for an equivalent cost (or cheaper for similar accuracy).
![Page 33: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/33.jpg)
What Kinds of Theoretical Chemistry can be Done?
(2) Molecular Simulation
![Page 34: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/34.jpg)
What Kinds of Theoretical Chemistry can be Done?
(2) Molecular Simulation
There are various techniques for simulating molecules; the most significant is probably Molecular Dynamics.
Molecular Dynamics makes a “balls-and-springs” model of the molecule in the computer, and follows its behaviour over time.
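As a rough illustration of the balls-and-springs idea, the sketch below follows two "atoms" joined by one harmonic spring in one dimension. It uses the velocity Verlet integrator, which is a common choice but is an assumption here since the slides do not name an integrator. All quantities are in arbitrary reduced units and the parameter values are made up.

```python
def spring_force(x1, x2, k, r0):
    """Forces on atom 1 and atom 2 from a harmonic spring with rest length r0."""
    stretch = (x2 - x1) - r0
    f1 = k * stretch  # a stretched spring pulls atom 1 towards atom 2
    return f1, -f1

def simulate(x1, x2, v1, v2, m1, m2, k, r0, dt, n_steps):
    """Velocity Verlet integration; returns the bond length after every step."""
    f1, f2 = spring_force(x1, x2, k, r0)
    lengths = []
    for _ in range(n_steps):
        # Half-step velocity update using the current forces.
        v1 += 0.5 * dt * f1 / m1
        v2 += 0.5 * dt * f2 / m2
        # Full-step position update.
        x1 += dt * v1
        x2 += dt * v2
        # New forces at the new positions, then the second half-step.
        f1, f2 = spring_force(x1, x2, k, r0)
        v1 += 0.5 * dt * f1 / m1
        v2 += 0.5 * dt * f2 / m2
        lengths.append(x2 - x1)
    return lengths

# Start with the bond stretched by 0.1 beyond its rest length of 1.0.
bond_lengths = simulate(x1=0.0, x2=1.1, v1=0.0, v2=0.0,
                        m1=1.0, m2=1.0, k=1.0, r0=1.0,
                        dt=0.01, n_steps=2000)
print(min(bond_lengths), max(bond_lengths))
```

The bond length oscillates about its rest value. A real force field adds angle, torsion and non-bonded terms, but the time-stepping loop has the same shape.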
![Page 35: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/35.jpg)
What Kinds of Theoretical Chemistry can be Done?
(2) Molecular Simulation
Light-harvesting protein subunit.
![Page 36: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/36.jpg)
What Kinds of Theoretical Chemistry can be Done?
(2) Molecular Simulation
Time steps need to be very, very short (~10⁻¹⁵ seconds), so it takes a million steps to simulate one nanosecond of real time and a billion steps to simulate a microsecond.
So it is hard to directly simulate relatively slow or rare events, such as protein folding.
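The step counts quoted above follow directly from dividing the simulated time by the time step. A small sketch of that arithmetic is below; the `steps_per_second` throughput used in the example is a hypothetical figure chosen for illustration, not a benchmark.

```python
TIME_STEP_S = 1e-15  # one femtosecond, the time step quoted in the slides

def steps_needed(simulated_time_s, time_step_s=TIME_STEP_S):
    """Number of MD steps required to cover simulated_time_s of real time."""
    return round(simulated_time_s / time_step_s)

def wall_time_days(n_steps, steps_per_second):
    """Wall-clock days needed at an assumed (hypothetical) simulation throughput."""
    return n_steps / steps_per_second / 86400

print(steps_needed(1e-9))  # one nanosecond: 1000000 steps
print(steps_needed(1e-6))  # one microsecond: 1000000000 steps

# A process taking a millisecond, at a hypothetical 1000 steps per second:
print(wall_time_days(steps_needed(1e-3), steps_per_second=1000.0))
```

Events on millisecond or longer timescales need so many steps that direct simulation becomes impractical without much faster hardware or specialised sampling methods.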
![Page 37: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/37.jpg)
What Kinds of Theoretical Chemistry can be Done?
(2) Molecular Simulation
Also, a balls-and-springs model lacks the quantum mechanics needed to simulate a chemical reaction.
Nonetheless, molecular dynamics is very important for understanding shape changes, interactions and energetics of large molecules.
![Page 38: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/38.jpg)
The Two Faces of Computational Chemistry
Theoretical Chemistry
Informatics
![Page 39: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/39.jpg)
Philosophy of Informatics
“The problem is too difficult to solve at reasonable cost based on real physics and chemistry, so instead we will build a purely empirical model to predict the required molecular properties from chemical structure, using the available data.”
![Page 40: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/40.jpg)
Informatics
• In general, informatics methods represent phenomena mathematically, but not in a physics-based way.
• The relationship between inputs and outputs is modelled by an empirically parameterised equation or a more elaborate mathematical model.
• Do not attempt to simulate reality.
• Usually High Throughput.
![Page 41: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/41.jpg)
What is Cheminformatics?
Calculating or predicting molecular properties without using a physics-based approach. Rather than modelling how the molecular world really works, cheminformatics is an empirical discipline, using available data to find correlations between chemical structure and properties.
Cheminformatics techniques are often used in drug discovery and pharmaceutical research, and the requirements of the pharmaceutical industry have dominated the development of the subject.
![Page 42: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/42.jpg)
Modelling in Chemistry
[Diagram: modelling methods arranged between PHYSICS-BASED and EMPIRICAL, ATOMISTIC and NON-ATOMISTIC, and LOW THROUGHPUT and HIGH THROUGHPUT. Methods shown: Density Functional Theory, ab initio, Molecular Dynamics, Monte Carlo, Docking, Car-Parrinello, DPD, CoMFA, 2-D QSAR/QSPR, Machine Learning, AM1, PM3 etc., Fluid Dynamics]
![Page 43: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/43.jpg)
4. How Best to Compute Solubility?
![Page 45: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/45.jpg)
Which would you Prefer ...
or ?
Solubility in water (and other biological fluids) is highly desirable for pharmaceuticals!
![Page 46: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/46.jpg)
Solubility is an important issue in drug discovery and a major cause of failure of drug development projects.
This is expensive for the industry.
A good computational model for predicting the solubility of druglike molecules would be very valuable.
![Page 47: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/47.jpg)
Drug Disc. Today, 10 (4), 289 (2005)
![Page 48: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/48.jpg)
Our Methods …
(A) Thermodynamic Cycle (Theoretical chemistry)
![Page 49: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/49.jpg)
We want to construct a theoretical model that will predict solubility for druglike molecules …
We expect our model to use real physics and chemistry and to give some insight …
We don’t expect it to be fast by informatics standards, but it should be reasonably accurate …
Our Thermodynamic Cycle method …
![Page 50: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/50.jpg)
Can we use theoretical chemistry to calculate solubility via a thermodynamic cycle?
![Page 51: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/51.jpg)
Gsub comes from lattice energy minimisation based on the experimental crystal structure.
![Page 52: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/52.jpg)
Calculate Energy of Infinite Periodic Lattice
Unit cell
![Page 53: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/53.jpg)
Calculate Energy of Infinite Periodic Lattice
• Take one molecule
• Solve its Schrödinger equation
• Calculate its interactions
• Allow unit cell to change
• Find best size, shape, packing
• Find energy of infinite lattice
This is the same methodology as used in crystal structure prediction.
![Page 54: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/54.jpg)
Gsub comes from lattice energy minimisation based on the experimental crystal structure.
![Page 55: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/55.jpg)
Gsolv comes from a computational solvation model, RISM
![Page 56: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/56.jpg)
Model of Solvent-Solute Interaction
Calculate energy of interaction between solute and solvent
Model is called RISM
![Page 57: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/57.jpg)
Gsolv comes from model of solvent-solute interaction
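A minimal sketch of how the two legs of such a cycle could be combined into a solubility estimate is given below. It assumes the cycle is crystal → gas → solution, so that the free energy of solution is the sum of the sublimation and solvation free energies, and it assumes the same 1 mol/L standard state throughout so that log10 of the solubility in mol/L is -ΔG/(RT ln 10). These conventions, the function names and the input numbers are illustrative assumptions and are not taken from the slides.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def solution_free_energy(dg_sub, dg_solv):
    """Free energy of solution (kJ/mol) as the sum of the two legs of the cycle."""
    return dg_sub + dg_solv

def log10_solubility(dg_sub, dg_solv, temperature_k=298.15):
    """log10 of solubility in mol/L, assuming a 1 mol/L standard state."""
    dg_sol = solution_free_energy(dg_sub, dg_solv)
    return -dg_sol / (R_KJ_PER_MOL_K * temperature_k * math.log(10))

# Made-up inputs: sublimation costs 50 kJ/mol, solvation returns 30 kJ/mol.
print(log10_solubility(dg_sub=50.0, dg_solv=-30.0))
```

Because RT ln 10 is about 5.7 kJ/mol at room temperature, an error of that size in either free energy shifts the predicted solubility by a factor of ten, which shows how demanding the approach is.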
![Page 58: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/58.jpg)
Theoretical Chemistry: Solubility Results
![Page 59: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/59.jpg)
Theoretical Chemistry: Solubility Results
These results are OK, but we would hope to do better
![Page 60: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/60.jpg)
Our Methods …
(B) Random Forest (informatics)
![Page 61: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/61.jpg)
We want to construct a model that will predict solubility for druglike molecules …
We don’t expect our model either to use real physics and chemistry or to be easily interpretable …
We do expect it to be fast and reasonably accurate …
Our Random Forest Model …
![Page 62: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/62.jpg)
Random Forest
This is a decision tree.
We use lots of them to make a forest!
A Machine Learning Method
![Page 63: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/63.jpg)
Random Forest
This is a decision tree.
![Page 64: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/64.jpg)
Random Forest
Generate more trees randomly.
(1) By randomly sampling with replacement to make different “bootstrap samples” of the data for each tree.
![Page 65: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/65.jpg)
Random Forest
Generate more trees randomly.
(2) By randomly choosing the pool of questions to ask of the data for each node (junction) of each tree.
![Page 66: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/66.jpg)
Random Forest
● Machine Learning method introduced by Breiman and Cutler (2001)
● Development of Decision Trees (Recursive Partitioning):
● Dataset is partitioned into consecutively smaller subsets
● Each partition is based upon the value of one descriptor
● The descriptor used at each split is selected so as to optimise splitting
● Bootstrap sample of N objects chosen from the N available objects with replacement
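The bootstrap step in the last point can be sketched in a few lines. The function name is illustrative. Objects that are never drawn are commonly called "out-of-bag", and for large N roughly 37% of objects fall into that group in any one sample.

```python
import random

def bootstrap_sample(n_objects, rng):
    """Draw n_objects indices with replacement; also return the indices never drawn."""
    in_bag = [rng.randrange(n_objects) for _ in range(n_objects)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_objects) if i not in chosen]
    return in_bag, out_of_bag

rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_sample(1000, rng)
print(len(in_bag), len(set(in_bag)), len(out_of_bag))
```

Each tree is grown on its own `in_bag` sample, and the left-out objects give that tree a built-in test set.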
![Page 67: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/67.jpg)
Random Forest
Generate more trees randomly.
![Page 68: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/68.jpg)
Random Forest
Generate more trees randomly.
![Page 69: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/69.jpg)
Random Forest
Generate more trees randomly.
![Page 70: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/70.jpg)
Random Forest
Generate more trees randomly.
We use lots of them to make a forest!
![Page 71: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/71.jpg)
Random Forest for Solubility Prediction
A Forest of Regression Trees
Each leaf contains a group of molecules with similar solubility.
![Page 72: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/72.jpg)
Random Forest
• The molecules whose solubility is to be predicted are run through every tree (~ flow chart) in the forest.
• Each tree predicts a solubility for each molecule.
• We average the predictions over hundreds of different trees.
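The averaging procedure can be sketched with a deliberately tiny forest. To keep it short, each "tree" here is a single split (a stump) rather than a full regression tree, the data are made up, and all names are illustrative. It is a toy under those assumptions, not the model used in the work described.

```python
import random
from statistics import mean

def fit_stump(X, y, candidate_descriptors):
    """Find the one split (descriptor, threshold) that minimises squared error."""
    best = None
    for d in candidate_descriptors:
        for threshold in sorted({row[d] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[d] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[d] > threshold]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, d, threshold, ml, mr)
    if best is None:  # no usable split: predict the mean everywhere
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    d, threshold, left_mean, right_mean = stump
    return left_mean if row[d] <= threshold else right_mean

def fit_forest(X, y, n_trees, n_candidates, rng):
    """Grow each stump on a bootstrap sample with a random pool of descriptors."""
    n, n_desc = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]         # bootstrap sample
        candidates = rng.sample(range(n_desc), n_candidates)  # random question pool
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], candidates))
    return forest

def predict_forest(forest, row):
    """Average the predictions of every tree in the forest."""
    return mean(predict_stump(s, row) for s in forest)

# Made-up data: descriptor 0 controls the property, descriptor 1 is noise.
X = [[i / 20, (i * 7) % 5] for i in range(20)]
y = [1.0 if row[0] < 0.5 else 3.0 for row in X]
forest = fit_forest(X, y, n_trees=25, n_candidates=2, rng=random.Random(1))
print(predict_forest(forest, [-1.0, 0]), predict_forest(forest, [2.0, 0]))
```

A real forest grows deep trees and redraws the descriptor pool at every node, but the two sources of randomness and the final averaging are the same.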
![Page 73: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/73.jpg)
Random Forest
![Page 74: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/74.jpg)
Random Forest: Solubility Results
Test set (te): RMSE = 0.69, r² = 0.89, Bias = -0.04
Training set (tr): RMSE = 0.27, r² = 0.98, Bias = 0.005
Out-of-bag (oob): RMSE = 0.68, r² = 0.90, Bias = 0.01
DS Palmer et al., J. Chem. Inf. Model., 47, 150-158 (2007)
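For readers unfamiliar with these statistics, the sketch below shows one common way to compute them. The slides do not give definitions, so the choices here (bias as mean of predicted minus observed, r² as the squared Pearson correlation) are assumptions and may differ from those in the cited paper. The example values are made up.

```python
import math
from statistics import mean

def rmse(observed, predicted):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(mean((p - o) ** 2 for o, p in zip(observed, predicted)))

def bias(observed, predicted):
    """Mean signed error (predicted minus observed)."""
    return mean(p - o for o, p in zip(observed, predicted))

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)

# Made-up example: every prediction is 0.5 too high.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
print(rmse(obs, pred), bias(obs, pred), r_squared(obs, pred))  # 0.5 0.5 1.0
```

The example shows why all three are reported together: a model can correlate perfectly with experiment and still carry a systematic offset.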
![Page 75: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/75.jpg)
Test set (te): RMSE = 0.69, r² = 0.89, Bias = -0.04
Training set (tr): RMSE = 0.27, r² = 0.98, Bias = 0.005
Out-of-bag (oob): RMSE = 0.68, r² = 0.90, Bias = 0.01
DS Palmer et al., J. Chem. Inf. Model., 47, 150-158 (2007)
These results are competitive with the best solubility prediction methods
![Page 76: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/76.jpg)
What Have we Learned?
• For this particular problem, informatics does a bit better than pure theoretical chemistry.
![Page 77: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/77.jpg)
How to Utilise Informatics
• Fast informatics models can be integrated into drug discovery to compute solubilities for molecules before deciding whether to synthesise them.
• This saves much time and money that would otherwise be spent making useless compounds.
![Page 78: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/78.jpg)
Fits into drug discovery pipeline here
![Page 79: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/79.jpg)
Why Pursue Theory?
• Theory promises to give a greater understanding of why some molecules are more soluble than others.
• Advances in theory can be transferable to other contexts.
• Theoretical models can be systematically improved.
![Page 80: Computers in Chemistry Dr John Mitchell University of St Andrews](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649d0c5503460f949e1030/html5/thumbnails/80.jpg)