the basic technology research programme proof of concept studies & consortia building networks
Post on 21-Dec-2015
The Basic Technology Research Programme
Proof of Concept Studies & Consortia Building Networks
Background
• Cross research council endeavour – administered by EPSRC
• Funding for research to create a new technology
• Change the way we do science
• Underpin the future industrial base
Background
• 15 research projects funded up to April 2003
• Total funding for this period - £41M
• To support large, long term, high risk, high impact research consortia
• Encourage investigation of speculative ideas
Background
• Two levels of funding
– One year start up
– Full grant up to five years
• Two types of start up funding
– Proof of concept
– Consortia building networking
Proof of Concept Studies
• One year funding up to £100K
• Research to investigate feasibility of developing the new technology
• Output – a business case for the next step of investigation to be submitted in May 2004
– Basic Technology Programme
– Existing Research Council initiatives
– DTI programmes
Consortia Building Networks
• Involvement of the users of the new technology at a very early stage
• Funding to form networks & hold workshops
ParaSurf – in silico Screening Technology
• Basic Technology Funding for October 2003 to September 2004
– Proof of concept
– Consortia building networking
• Academic partners
– University of Portsmouth
– University of Erlangen
– University of Southampton
– University of Oxford
– University of Aberdeen
ParaSurf – Proof of Concept Research Programme
• Development of techniques to describe irregular solids & surfaces
• Development of projection & pattern recognition techniques for non-planar colour-coded surfaces
– spherical harmonics, molecular topology
• Conformational analysis
• Rigid body dynamics incorporating surface features
– rigid parts of molecule treated as anisotropic solids linked by rotatable bonds
• Investigate how best to generate prediction models using surface properties that define a low dimensional chemical space
– QSAR, pattern recognition, artificial intelligence, analysis of surfaces
• Benchmarking using Grid computing
ParaSurf – Proof of Concept Research Programme
Potential applications of the in silico screening technology
• High throughput virtual docking
• Physical property mapping
• ADMET prediction
• Long time-period simulation techniques
• Crystallisation and solubility
• Prediction of tautomers
• Chemical reactivity and metabolism
ParaSurf Progress Report
Letchworth, 16th March 2004
Main Areas
1. Molecular Surfaces and Property Calculation
2. RGB Encoding & Pattern Recognition
3. Conformational Analysis
4. Rigid Body Molecular Dynamics
5. Analysis of Variables & QSAR models
6. Grid Computing
7. Consortium Building
Datasets
Small
Consensus Set of 74 Drug Molecules (diverse)
QSAR set (31 CoMFA steroids)
Medium
WDI subset (2,400 comps)
Harvard Chembank dataset (2,000 comps)
Large
WDI (50,000)
Maybridge (50,000)
Example Molecule
Allopurinol
Surface Definition & Local Property Calculation
Calculations
3D co-ordinates from CORINA
QM calculations with VAMP
Local Properties and surfaces from ParaSurf
ParaSurf v1.0
• Surfaces
– Isodensity Surfaces
– Shrink Wrap
– Marching Cube
– Surfaces fit to Spherical Harmonics
• Properties
– MEP, LIE, LEA and LP
– Encoded at points on the surface
– Encoded as Spherical Harmonic Expansions
Small molecule
RGB Encoding & Pattern Recognition
RGB Encoding
• Each Local Property encoded as a colour
– LIE encoded on Red channel
– LEA encoded on Green channel
– LP encoded on Blue channel
Allopurinol RGB Surface
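The channel assignment above can be sketched as a simple linear scaling of each local property to an 8-bit colour value. This is only an illustration: the slides do not say how the properties are scaled, so the linear mapping, the clamping, and the names `to_channel`, `rgb_encode` and `ranges` are all assumptions.

```python
def to_channel(value, lo, hi):
    """Scale value linearly from [lo, hi] to an integer 0..255 (assumed scaling)."""
    if hi == lo:
        return 0
    frac = (value - lo) / (hi - lo)
    frac = min(1.0, max(0.0, frac))  # clamp values outside the assumed range
    return round(frac * 255)


def rgb_encode(points, ranges):
    """Encode (LIE, LEA, LP) triples at surface points as (R, G, B) colours.

    points: list of (lie, lea, lp) values, one triple per surface point.
    ranges: dict mapping "LIE", "LEA", "LP" to hypothetical (min, max) limits.
    """
    colours = []
    for lie, lea, lp in points:
        colours.append((
            to_channel(lie, *ranges["LIE"]),  # LIE on the red channel
            to_channel(lea, *ranges["LEA"]),  # LEA on the green channel
            to_channel(lp, *ranges["LP"]),    # LP on the blue channel
        ))
    return colours
```

Once encoded this way, a coloured surface can in principle be handed to ordinary image-based pattern recognition methods.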
RGB Encoding
• Alternative Encoding
– LIE
– LEA
– Absolute value of MEP
Allopurinol RGB Surface
Conformational Analysis
Conformational Analysis
• Efficient All Atom MD analysis (DASH)
– Treated as time series (not Cluster Analysis)
– Scales linearly with simulation length
– No need for arbitrary choice of number of clusters
– Can be analysed using Markov Chain methodology
MD studies of Rosiglitazone
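To illustrate what treating a trajectory as a time series makes possible, the sketch below estimates Markov transition probabilities from a sequence of conformational state labels in a single pass, so the cost grows linearly with simulation length. It is not the DASH algorithm: it assumes the trajectory has already been reduced to discrete state labels, and `transition_matrix` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def transition_matrix(states):
    """Estimate P(next state | current state) from a time-ordered list of labels."""
    counts = defaultdict(Counter)
    # One pass over consecutive frames: O(n) in the trajectory length.
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix
```

For example, `transition_matrix(["A", "A", "B", "A", "B", "B"])` gives a probability of 2/3 for moving from state A to state B.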
Rigid Body Molecular Dynamics
Rigid body molecular dynamics
• Well founded methodology, e.g. CNS / XPLOR (Axel T. Brunger, Stanford University)
• Idea is to use rigid groups to model flexibility:
– in the ligand
– and the protein binding site
• Allows time-steps of 10 fs to 20 fs
QSAR models
Distribution of Properties
Correlation Matrix
       MEP    LP     LEA    LIE
MEP    1      -0.1   0.47   0.39
LP     -0.1   1      0.58   0.26
LEA    0.47   0.58   1      0.44
LIE    0.39   0.26   0.44   1
Descriptors
34 descriptors based on Normal Distribution
Principal Components
Spherical Harmonic Co-efficients
Descriptors for LIE
• LIE_max – Maximum value of the local ionization energy
• LIE_min – Minimum value of the local ionization energy
• mean(LIE) – Mean value of the local ionization energy
• ΔLIE – Range of the local ionization energy
• σ²(LIE) – Variance in the local ionization energy
Other Descriptors
• Moments
– Order 1 – Mean
– Order 2 – Variance
– Order 3 – Skewness
– Order 4 – Kurtosis
• Overlapping Gaussians
– Derived from previous work on MD analysis
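The statistical descriptors and moments listed above can be computed directly from the property values sampled at the surface points. The following is a minimal sketch, assuming unweighted points and population (not sample) statistics; ParaSurf may weight points by surface area or normalise differently, and `surface_descriptors` is a hypothetical name.

```python
def surface_descriptors(values):
    """Summary statistics of one local property sampled at surface points."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, an assumption).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "max": max(values),
        "min": min(values),
        "mean": mean,
        "range": max(values) - min(values),
        "variance": m2,
        # Standardised moments; set to 0.0 when the property is constant.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }
```

The same function applies unchanged to MEP, LEA and LP values.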
QSAR models
• Models derived from Local Properties
– Surface Integral Model for Solvation Energy
– RMS Error ~ 0.75 kcal
• Drug Likeness
– SOMs trained on WDI (drugs) & Maybridge (general)
– Parameters from PC of Local Property Descriptors
– Medium sized datasets superimposed on SOMs
GRID Computing
GRID Computing
• ParaSurf compiled on
– SGI IRIX
– Windows
– Linux (SUSE)
– IBM AIX
• Future Platforms
– SUN Solaris
• GRID enabling at Portsmouth (Mark Baker), Southampton and Oxford.
Provisional Timings
• SGI R10k, 256MB
– VAMP ~ 30 s/compound
– ParaSurf ~ 10 s/compound
• Intel 1.8 Xeon / AMD Athlon XP-2000+
– ParaSurf ~ 2 s/compound
• SGI FUEL Workstation R14K
– ParaSurf ~ 2 s/compound
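As a rough illustration of what these provisional timings imply for the large datasets, the sketch below scales a per-compound time to a whole library. It assumes the VAMP and ParaSurf times simply add and that the work divides evenly across processors with no overhead; `screening_days` and `n_workers` are hypothetical names, not part of any ParaSurf tooling.

```python
SECONDS_PER_DAY = 86400


def screening_days(n_compounds, seconds_per_compound, n_workers=1):
    """Wall-clock days to process a library, assuming ideal parallel scaling."""
    return n_compounds * seconds_per_compound / n_workers / SECONDS_PER_DAY
```

Under these assumptions, `screening_days(50000, 30 + 10)` comes to about 23 days for a 50,000-compound set on a single SGI R10k processor, which is the kind of figure that motivates distributing the work.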
Conclusions
Conclusions
• Properties can be calculated
• Properties can be RGB encoded
• Properties are local
• Properties can be used for QSAR models
Computer vision methods for comparing molecular surfaces
• Comparing and recognising 3D objects is an active research area in robotics and AI.
• Fast methods have been developed for database indexing.
• Rotationally invariant descriptors of 3D objects are possible.
Pattern matching on molecular surfaces
• Can we recognise similar surfaces?
• Can we recognise similar surface fragments?
• Can we identify the most similar surface to our target?
• How do we compare field descriptors on the molecular surface?
Rotationally invariant 3D object descriptors
• Internal coordinates e.g. a distance matrix.
• Energy distributions based on the spherical harmonics.
• The spherical harmonic coefficients.
• Radial integration, radial scanning, and invariant moments.
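One of the descriptors above, the energy distribution over spherical harmonic degrees, is easy to sketch. For each degree l, the sum of squared coefficient magnitudes over all orders m is unchanged by rotation, because a rotation only mixes coefficients within the same l. The coefficient layout used here (a dict keyed by `(l, m)`) and the name `energy_per_degree` are assumptions for illustration.

```python
def energy_per_degree(coeffs):
    """Rotation-invariant energies E_l = sum over m of |c_lm|^2.

    coeffs: dict mapping (l, m) to a real or complex expansion coefficient.
    Returns the energies ordered by increasing degree l.
    """
    energy = {}
    for (l, _m), c in coeffs.items():
        energy[l] = energy.get(l, 0.0) + abs(c) ** 2
    return [energy[l] for l in sorted(energy)]
```

Two surfaces can then be compared through these short vectors without first aligning them, at the cost of discarding the information carried by the individual coefficients.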
Surface comparison
Two different approaches:
1. Using spherical harmonic molecular surfaces [J. Comp. Chem. 20(4) 383-395; Ritchie and Kemp 1999; University of Aberdeen].
2. Partial molecular alignment via local structure analysis [J. Chem. Inf. Comput. Sci. 40(2) 503-512; Robinson, Lyne and Richards 2000; University of Oxford].
An example grid of surface points
A grid is placed on a ParaSurf surface in order to reduce the number of surface points from 4038 to 55.
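The slide does not say how the grid is constructed. One plausible way to thin a dense set of surface points is to bin them into cubic cells and keep a single representative per cell, sketched below; the cell-based scheme and the names `grid_reduce` and `spacing` are assumptions.

```python
def grid_reduce(points, spacing):
    """Keep one surface point per cubic grid cell of edge length `spacing`."""
    cells = {}
    for p in points:
        key = tuple(int(c // spacing) for c in p)  # index of the cell holding p
        cells.setdefault(key, []).append(p)
    reduced = []
    for key in sorted(cells):
        members = cells[key]
        centre = [sum(c) / len(members) for c in zip(*members)]
        # Keep the original point closest to the cell's centroid.
        nearest = min(
            members,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, centre)),
        )
        reduced.append(nearest)
    return reduced
```

A larger `spacing` gives fewer representative points, which trades surface detail for a smaller alignment search.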
Partial molecular alignment
• We do not know which points on the two surfaces need to be aligned with each other.
• The essential approach is: all surface points on one surface are compared with all points on the other.
• For two surfaces with M and N points, there are MN possible alignments:
– we want to reduce this large search space!
Voting pairs are possible alignments
The voting pairs can have a critical effect on the quality of the surface alignment.
The voting table
• A voting table may list all matching pairs of surface points (i.e. all possible alignments).
• A smart editing of votes within the voting table can enable speed and accuracy.
– We want to only consider alignments between similar local features on the surfaces.
– The more false votes we have in the voting table, the harder it is to find the optimum alignment.
A distance matrix can be used to describe local surface features
The internal distance matrix can be used to distinguish between surface points.
By comparing rows and columns from distance matrices of different surfaces we can detect similar surface features.
[Figure: surface points P1, P2 and P3]
Selecting the voting pairs
Similar local features, or interest points, on the molecular surface can be identified using a distance matrix.
For a point on each surface:
1. Arrays of internal surface point distances are calculated for both points, i.e. dist1[], dist2[].
2. After a crude alignment, the absolute difference of dist1[] and dist2[] indicates the similarity of this pair of points.
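The two steps above can be sketched as follows. The "crude alignment" is not specified in the slides, so this sketch substitutes sorting each distance array, which makes the comparison independent of point ordering; that substitution, the mean absolute difference, the `threshold` parameter and all function names are assumptions.

```python
import math


def distance_profile(points, index):
    """Sorted distances from points[index] to every other surface point."""
    p = points[index]
    return sorted(math.dist(p, q) for j, q in enumerate(points) if j != index)


def profile_difference(dist1, dist2):
    """Mean absolute difference of two distance arrays (0 means identical)."""
    n = min(len(dist1), len(dist2))
    return sum(abs(a - b) for a, b in zip(dist1[:n], dist2[:n])) / n


def select_voting_pairs(surface_a, surface_b, threshold):
    """Return index pairs (i, j) whose local distance profiles are similar."""
    profiles_b = [distance_profile(surface_b, j) for j in range(len(surface_b))]
    pairs = []
    for i in range(len(surface_a)):
        dist1 = distance_profile(surface_a, i)
        for j, dist2 in enumerate(profiles_b):
            if profile_difference(dist1, dist2) <= threshold:
                pairs.append((i, j))
    return pairs
```

A tight `threshold` keeps the voting table small, which matters because every false vote makes the optimum alignment harder to find.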
Scoring the possible alignments
The optimum alignment is composed of a rotation R and a translation T.
• Apply the current rotation r:
1. Score the translation vectors t = p – q of all voting pairs (p,q) using a gravitational potential:
$$P_i = \sum_{j \neq i} \frac{1}{|t_i - t_j|}$$
2. High potentials identify clusters of similar translation vectors.
3. The vector with the highest potential is the optimum translation T.
• Scoring all r gives R and T.
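For a single trial rotation, the scoring step can be sketched as below: each voting pair contributes a translation vector t = p – q, and each vector is scored by the sum of inverse distances to all the others. The small `softening` term that avoids division by zero for identical vectors is an assumption, as is the name `best_translation`; the points q are assumed to have been rotated already.

```python
import math


def best_translation(voting_pairs, softening=1e-6):
    """Return (translation, potential) for the highest-scoring voting pair.

    voting_pairs: list of (p, q) coordinate tuples, q already rotated by r.
    """
    vectors = [tuple(a - b for a, b in zip(p, q)) for p, q in voting_pairs]
    best_vector, best_potential = None, float("-inf")
    for i, t_i in enumerate(vectors):
        # Inverse-distance potential over all other translation vectors.
        potential = sum(
            1.0 / (math.dist(t_i, t_j) + softening)
            for j, t_j in enumerate(vectors)
            if j != i
        )
        if potential > best_potential:
            best_vector, best_potential = t_i, potential
    return best_vector, best_potential
```

Repeating this for every trial rotation and keeping the rotation with the highest potential would give both R and T. The double loop is quadratic in the number of voting pairs, which is one more reason to keep the voting table small.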
Scoring with a gravitational potential
[Figure: translation vectors (x,y coordinates plotted)]
[Figure: some voting pairs for example rotations]
Can we use the potential to compare aligned structures?
Can we get better alignments with more voting pairs?
Example alignments
• Example 1: RMSD = 0.75
• Example 2: RMSD = 1.05
• Example 3: RMSD = 1.20
• Example 4: RMSD = 1.89
[Figures: aligned surfaces A and B for each example]
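The RMSD values quoted for these examples measure how far apart corresponding points are after alignment. The slides do not state which points are paired, so the sketch below simply assumes two equal-length lists of already-matched coordinates; `rmsd` is a hypothetical helper.

```python
import math


def rmsd(points_a, points_b):
    """Root-mean-square deviation between matched points of two aligned sets."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point sets must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(points_a, points_b))
    return math.sqrt(total / len(points_a))
```

A lower value indicates a closer superposition, as in Example 1 compared with Example 4.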
Matching with the surface field descriptors: example 1
• Surfaces are aligned (using a quick search method; e.g. 45º rotations).
• Best N alignments are selected.
• Each alignment is gently perturbed and optimised using the field descriptors.
Matching with the surface field descriptors: example 2
• Align using the field descriptors’ values to identify suitable voting pairs:
– only match on similar field descriptors.
• Filtering can be achieved by aligning the fields separately.
• More accurate alignments can be generated by combining field values.
Parameterisation
• Voting pairs:
– The distance between points in surface grid.
– The number of voting pairs.
– Identifying and selecting local features.
– How to represent the fields at interest points.
• Scoring:
– Scoring function to identify the correct rotation and translation (e.g. gravitational potential).
– Target function to compare different surface alignments (e.g. RMSD).
• Optimising the alignments.
Molecular Surface Property Graphs
Characterize the behaviour of a property
f : S → ℝ
on a molecular surface S, in terms of a directed graph G on S derived from the gradient vector field
dx/dt = grad f(x)
Vertices (G) = fixed points of grad f (= critical points of f ).
Edges (G) = stable and unstable manifolds of the saddle points.
Gradient Flow
• minima
• saddles
• maxima
Molecular Surface Property Graph
Applications
• Similarity
– Pattern recognition methods
– Maximal common subgraphs
• Complementarity
– Compare ligand graph with graph induced on ligand by receptor
• QSAR
– Topological indices
Example
S = Connolly Surface
f(x) = Electrostatic Potential = ∑ q(i) / d(x,i)
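The electrostatic potential in this example is a sum of point-charge contributions, each charge q(i) divided by its distance d(x,i) from the surface point x. A minimal sketch follows, in arbitrary units with no physical constant applied; the name `electrostatic_potential` and the `(charge, position)` input layout are assumptions.

```python
import math


def electrostatic_potential(x, charges):
    """f(x) = sum of q_i / d(x, i) over atomic point charges.

    x: coordinates of the surface point.
    charges: list of (q_i, position_i) pairs; x must not coincide with an atom.
    """
    return sum(q / math.dist(x, position) for q, position in charges)
```

For instance, a charge of +1 at distance 1 and a charge of -1 at distance 2 give a potential of 0.5 at x.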
Method
• Locate critical points of f (Newton-Raphson).
• Linearize at saddles, find eigenvectors of Hessian( f ).
• Integrate gradient vector field forward in time from 2 points on unstable eigenvector, backward in time from 2 points on stable eigenvector (Runge-Kutta).
• Integrate to boundary of Connolly surface patch, then continue on adjacent patch until reaching another critical point.
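The first three steps of the method can be sketched on a toy problem. The example below works in a flat two-dimensional domain with f(x, y) = cos x + cos y rather than on a Connolly surface, so patch boundaries are not handled; the test function, step sizes and all function names are assumptions made for illustration.

```python
import math


def grad(p):
    """Gradient of the toy property f(x, y) = cos(x) + cos(y)."""
    x, y = p
    return (-math.sin(x), -math.sin(y))


def hessian(p):
    x, y = p
    return ((-math.cos(x), 0.0), (0.0, -math.cos(y)))


def newton_critical_point(p, steps=50, tol=1e-12):
    """Newton-Raphson search for a zero of grad f, starting from p."""
    x, y = p
    for _ in range(steps):
        gx, gy = grad((x, y))
        (a, b), (c, d) = hessian((x, y))
        det = a * d - b * c
        dx = (-d * gx + b * gy) / det  # solves H * delta = -grad f
        dy = (c * gx - a * gy) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            break
    return (x, y)


def classify(p):
    """Label a critical point from the eigenvalues of the symmetric Hessian."""
    (a, b), (_, d) = hessian(p)
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    if mid - rad > 0:
        return "minimum"
    if mid + rad < 0:
        return "maximum"
    return "saddle"  # degenerate points are also reported as saddles here


def rk4_flow(p, direction=1.0, h=0.05, steps=2000, tol=1e-8):
    """Integrate dx/dt = direction * grad f with classical Runge-Kutta."""
    def f(q):
        g = grad(q)
        return (direction * g[0], direction * g[1])

    x, y = p
    for _ in range(steps):
        k1 = f((x, y))
        k2 = f((x + h / 2 * k1[0], y + h / 2 * k1[1]))
        k3 = f((x + h / 2 * k2[0], y + h / 2 * k2[1]))
        k4 = f((x + h * k3[0], y + h * k3[1]))
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if math.hypot(*grad((x, y))) < tol:  # reached another critical point
            break
    return (x, y)
```

Starting Newton-Raphson at (3.0, 0.2) finds the saddle at (π, 0). Integrating forward from a point displaced along its unstable direction, for example `rk4_flow((math.pi - 0.01, 0.0))`, ends at the maximum at the origin, while integrating backward (`direction=-1.0`) from `(math.pi, 0.01)` ends at the minimum at (π, π). Each such path is one edge of the property graph.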
Allopurinol
8 maxima, 7 minima, 13 saddles
#maxima – #saddles + #minima = χ(S) = 2
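This count gives a quick consistency check on a set of located critical points: for a closed surface with the topology of a sphere, maxima minus saddles plus minima must equal the Euler characteristic, 2. A small sketch, with the hypothetical helper name `euler_characteristic`:

```python
def euler_characteristic(n_maxima, n_saddles, n_minima):
    """Alternating sum of critical point counts on a closed surface."""
    return n_maxima - n_saddles + n_minima


# Allopurinol counts from the slide: 8 maxima, 13 saddles, 7 minima.
allopurinol_chi = euler_characteristic(8, 13, 7)
```

A result other than 2 on a sphere-like surface would suggest that the critical point search has missed or duplicated a point.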
Work in Progress …
• Implementation for
S = spherical harmonic surface
f = MEP, LIE, LEA and LP
– Use images of triangulation points as starting points for Newton-Raphson search for critical points.
– Automatic differentiation.
Summary
• Molecular surfaces
• QM properties presented on surface
• Compound screening
• Pattern matching on surfaces – Martin Swain
• Critical features – Dave Whitley
• Data reduction and QSAR – Brian Hudson
• Spherical harmonic representation – Dave Ritchie
Future directions
• High-throughput ligand docking
– Superimposition of ligand and a “negative” of the receptor
• Use of the fields to drive simulation
– Use of the fields to derive intermolecular forces
– Rigid-body motions – long time-step MD
– Free energy calculations
A hierarchy of methods
• Rapid screening using computationally fast approaches
– 3D fields – Andy Vinter
• On reduced set:
– Semi-empirical property calculations and alignments
• On most interesting molecules:
– Density-functional or ab-initio calculations and alignment
• More accurate molecular representations are used as appropriate, as resources allow