the shape of things to come? linkedin: fractal dimensionality and … · 2020. 1. 23. · •...
Abstract:
Fractal dimension [1,2] can be used to capture shape
information about small molecules and macromolecules. We
use this method to generate shape descriptions of
14,556 crystal structures obtained from the sc-PDB [3]
database. We trained 11 sequence translation
models to generate ligand fingerprints from protein
representations, generating ligand “answers” to
protein “questions”. For three-quarters of the test
set, reconstructed fingerprints were similar enough
to those found in the crystallographic data to enable
virtual screening, based on target analysis alone.
The shape of things to come?
Fractal dimensionality and its applications
in deep-learning-driven ligand-receptor
interaction prediction.
Ryan Byrne - Dr. H. Chen, AstraZeneca Mölndal, Prof. Dr. G. Schneider, ETH Zürich
References
1. Mandelbrot, B. (1967). Science, 156, 636–638.
2. Grassberger, P., & Procaccia, I. (1983). Physica D: Nonlinear Phenomena, 9, 189–208.
3. Kellenberger, E., Muller, P., et al. (2006). Journal of Chemical Information and Modeling, 46(2), 717–727.
4. Vaswani, A., Shazeer, N., et al. (2017). Advances in Neural Information Processing Systems, 5998.
Data preparation:
• We extracted and cleaned 14,556 crystal
structures from the sc-PDB
crystallographic database.
• We then generated FD fingerprints for
each ligand (in its bound conformation),
and a protein fingerprint for all residues
within 3.5 Å of the ligand.
• We retained 10% of these as a test set.
• A low (0.33±0.09) average pairwise
Tanimoto similarity was observed
between protein pocket fingerprints.
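
A minimal Python sketch of two of the steps above, under stated assumptions. Pocket selection is taken to mean residues with at least one atom within 3.5 Å of any ligand atom, and fingerprints are treated as plain numeric vectors compared with the continuous Tanimoto coefficient. The function names and the data layout are hypothetical and do not come from the poster; the FD fingerprint calculation itself is not shown.

```python
import math
from itertools import combinations
from statistics import mean

CUTOFF = 3.5  # distance cutoff in Å, as stated in the poster


def pocket_residues(residues, ligand_atoms, cutoff=CUTOFF):
    """Return ids of residues with any atom within `cutoff` Å of any ligand atom.

    residues: dict mapping a residue id to a list of (x, y, z) coordinates
    ligand_atoms: list of (x, y, z) coordinates
    (This data layout is an assumption made for illustration.)
    """
    return [
        res_id
        for res_id, atoms in residues.items()
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms)
    ]


def tanimoto(a, b):
    """Continuous Tanimoto coefficient of two equal-length numeric vectors.

    For 0/1 vectors this reduces to |A and B| / |A or B|.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical.
    return dot / denom if denom else 1.0


def mean_pairwise_tanimoto(fingerprints):
    """Average Tanimoto similarity over all unordered pairs of fingerprints."""
    return mean(tanimoto(a, b) for a, b in combinations(fingerprints, 2))
```

A low value from `mean_pairwise_tanimoto` over the pocket fingerprints would indicate, as reported above, that the pockets in the data set are diverse rather than near-duplicates.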
Follow us @Aegis_ITN
“This project has received funding from the European Union’s Framework
Programme for Research and Innovation Horizon 2020 (2014-2020) under
the Marie Skłodowska-Curie Grant Agreement No. 675555, Accelerated
Early staGe drug dIScovery (AEGIS).”
Retrospective and prospective studies
Back-translation?
Model training:
• We considered 11 sequence-to-sequence
architectures, and 76 hyper-parameter
combinations.
• Models were trained to reconstruct the
ligand shape fingerprint from the
associated protein fingerprint.
• Assessment was via perplexity (a measure of
the uncertainty of the models about each
decision) and accuracy.
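
The two assessment metrics can be sketched as follows. This assumes the standard definition of perplexity (the exponential of the mean negative log-probability that the model assigns to each correct output token) and a token-level accuracy; the poster does not state whether accuracy was computed per token or per sequence, and the function names are illustrative only.

```python
import math


def perplexity(correct_token_probs):
    """Perplexity from the probabilities the model gave to the correct tokens.

    A model that is always certain and right scores 1; a model guessing
    uniformly among k options scores k.
    """
    nll = -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)
    return math.exp(nll)


def token_accuracy(predicted, target):
    """Fraction of positions where the predicted token equals the target token."""
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


# A model choosing uniformly among four tokens has a perplexity close to 4.
uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])
three_of_four = token_accuracy([1, 2, 3, 4], [1, 2, 0, 4])
```

Lower perplexity means the model is less uncertain about each decision, which is why it is read alongside accuracy rather than instead of it.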
Outcomes
• The novel transformer [4] architecture was
the best performing, by some measure.
• Our final architecture is a four-layer
transformer, with a dense-layer width of
512 and eight attention heads, trained
with the Adam optimiser.
• Our model could generate adequate
or excellent reconstructions in three-
quarters of the examples tested, with
the latter category representing a third
of the total.
Fractal dimension: A user's guide
Fractal dimension (FD) is a measure of the
roughness and complexity of a surface. More
specifically, it describes how the properties of a
surface vary with the scale at which they are
measured.
These non-integer measures of dimensionality
correspond to the complexity and contortion of a
surface in predictable ways, and allow us to rapidly
rank molecules based on their shape. We can also
use it to describe target pockets.
We adopt this formalism to perform fast, shape-
based virtual screening, and to analyse target
pockets.
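
To make the idea of scale-dependent measurement concrete, here is a minimal box-counting sketch in Python. Box counting is one generic FD estimator, swapped in here for illustration; it is not necessarily the estimator used to build the fingerprints in this work (the cited Grassberger and Procaccia paper [2] describes a correlation-based approach). The function names and the choice of box sizes are assumptions.

```python
import math


def box_count(points, box_size):
    """Number of distinct cubic boxes of edge `box_size` containing a point."""
    return len({tuple(math.floor(c / box_size) for c in p) for p in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(eps) against log(1 / eps).

    N(eps) is the number of occupied boxes at box size eps; the slope
    estimates how quickly detail appears as the measuring scale shrinks.
    """
    xs = [math.log(1.0 / eps) for eps in box_sizes]
    ys = [math.log(box_count(points, eps)) for eps in box_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Synthetic test shapes sampled inside the unit cube.
line = [((i + 0.5) / 64, 0.0, 0.0) for i in range(64)]
plane = [((i + 0.5) / 64, (j + 0.5) / 64, 0.0) for i in range(64) for j in range(64)]
sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]

fd_line = box_counting_dimension(line, sizes)    # close to 1
fd_plane = box_counting_dimension(plane, sizes)  # close to 2
```

A smooth, flat surface behaves like the plane here, while a rougher, more contorted surface fills more boxes at small scales and its estimate moves from 2 towards 3.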
[Illustration: example surfaces with FD ≈ 2, 2.4 ≤ FD ≤ 2.6, and FD ≈ 3]
[Figure 3 category labels: Random performance, Poor recovery, VS Useful, VS Ready]
Figure 3: Ligand fingerprints reconstructed (predicted) from protein translation
compared to those extracted (calculated) from the sc-PDB. Approximately three-
quarters are ‘useful’ or better, a characterisation based on performance in large-
scale retrospective and prospective studies. A third are ‘VS ready’, based on the
same analysis.
Figure 2: Maximum accuracy and perplexity achieved on the validation
set for each trained model. Best model per architecture family (LSTM,
GRU, CNN, Transformer) highlighted.
[Figure 2 axes: Validation Perplexity, Validation Accuracy]
Figure 1: Illustration of the defined pocket region for a protein-ligand complex
(ligand extracted for clarity). Pocket and ligand shapes are captured in the
fingerprints with corresponding colour key. The remaining fingerprint is the one
generated by the deep-learning model, which is compared against the experimental
version. PDB ID: 5N2F.
Combining our shape descriptors with deep learning resulted in a
model that could create useful fingerprints for virtual screening in three-
quarters of cases, based on analysis of target structure alone.
Twitter:
@Ryan_Byrne_
LinkedIn: