the shape of things to come? linkedin: fractal dimensionality and … · 2020. 1. 23. · •...

Abstract:

Fractal dimension [1,2] can be used to capture shape information about small molecules and macromolecules. We use this method to generate shape descriptions of 14,556 crystal structures obtained from the sc-PDB [3] database. We trained 11 sequence translation models to generate ligand fingerprints from protein representations, generating ligand “answers” to protein “questions”. For three-quarters of the test set, reconstructed fingerprints were similar enough to those found in the crystallographic data to enable virtual screening based on target analysis alone.

The shape of things to come?

Fractal dimensionality and its applications in deep-learning-driven ligand-receptor interaction prediction.

Ryan Byrne - Dr. H. Chen, AstraZeneca Mölndal, Prof. Dr. G. Schneider, ETH Zürich

References

1. Mandelbrot, B. (1967). Science, 156, 636–638.

2. Grassberger, P., & Procaccia, I. (1983). Physica D: Nonlinear Phenomena, 9, 189–208.

3. Kellenberger, E., Muller, P., et al. (2006). Journal of Chemical Information and Modeling, 46(2), 717–727.

4. Vaswani, A., Shazeer, N., et al. (2017). Advances in Neural Information Processing Systems, 5998.

Data preparation:

• We extracted and cleaned 14,556 crystal structures from the sc-PDB crystallographic database.

• We then generated an FD fingerprint for each ligand (in its bound conformation), and a protein fingerprint for all residues within 3.5 Å of the ligand.

• We retained 10% of these as a test set.

• A low average pairwise Tanimoto similarity (0.33 ± 0.09) was observed between protein pocket fingerprints.
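The average pairwise similarity quoted above can be illustrated with a minimal sketch. It assumes each fingerprint can be treated as a set of "on" features and that the ± value is a standard deviation; the poster states neither, and the toy fingerprints below are hypothetical, not sc-PDB data.

```python
from itertools import combinations
from statistics import mean, stdev


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' features."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def pairwise_tanimoto_summary(fingerprints):
    """Return (mean, standard deviation) over all unordered fingerprint pairs."""
    sims = [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]
    return mean(sims), stdev(sims)


# Hypothetical pocket fingerprints, for illustration only.
pockets = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 6, 7, 8}]
avg, sd = pairwise_tanimoto_summary(pockets)
print(f"mean pairwise Tanimoto = {avg:.2f} ± {sd:.2f}")
```

A low mean, as reported for the pocket fingerprints, indicates that the pockets in the data set are diverse rather than near-duplicates of one another.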

Follow us @Aegis_ITN

“This project has received funding from the European Union’s Framework Programme for Research and Innovation Horizon 2020 (2014-2020) under the Marie Skłodowska-Curie Grant Agreement No. 675555, Accelerated Early staGe drug dIScovery (AEGIS).”

Retrospective and prospective studies

Back-translation?

Model training:

• We considered 11 sequence-to-sequence architectures and 76 hyper-parameter combinations.

• Models were trained to reconstruct the ligand shape fingerprint from the associated protein fingerprint.

• Assessment was via perplexity (a measure of the uncertainty of the models about each decision) and accuracy.
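The two assessment metrics can be sketched as follows. This is an illustration only: it assumes perplexity is the exponential of the mean negative log-probability assigned to the correct tokens (the usual definition for sequence models) and that accuracy is counted per token; the poster does not give either formula. The probabilities and tokens are made up.

```python
import math


def perplexity(true_token_probs):
    """exp of the mean negative log-probability given to the correct tokens."""
    nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
    return math.exp(nll)


def token_accuracy(predicted, target):
    """Fraction of positions where the predicted token equals the target."""
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


# Hypothetical model outputs for one four-token fingerprint.
probs = [0.5, 0.25, 0.5, 0.25]  # probability assigned to each correct token
ppl = perplexity(probs)
acc = token_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"])
print(f"perplexity = {ppl:.3f}, accuracy = {acc:.2f}")
```

A perplexity of 1 would mean the model is certain of every correct token; larger values mean the model is, on average, choosing among more equally plausible options at each step.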

Outcomes

• The novel transformer [4] architecture was the best performing, by some measure.

• Our final architecture is a four-layer transformer with a dense-layer width of 512 and eight attention heads, trained with the Adam optimiser.

• Our model produced adequate or excellent reconstructions for three-quarters of the examples tested, with excellent reconstructions accounting for a third of the total.

Fractal dimension: A user's guide

Fractal dimension (FD) is a measure of the roughness and complexity of a surface. More specifically, it describes how the properties of a surface vary with the scale at which they are measured.

These non-integer measures of dimensionality correspond to the complexity and contortion of a surface in predictable ways, and allow us to rapidly rank molecules based on their shape. We can also use FD to describe target pockets.

We adopt this formalism to perform fast, shape-based virtual screening, and to analyse target pockets.
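To make the idea of scale-dependent measurement concrete, the sketch below estimates a box-counting dimension for a set of 3D points. Box counting is used here only because it is simple to show; the poster does not state which FD estimator was used, and its references [1,2] point to other formulations. The function names and test shapes are hypothetical.

```python
import math


def box_count(points, size):
    """Number of distinct cubic boxes of edge `size` occupied by the points."""
    return len({tuple(math.floor(c / size) for c in p) for p in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(size) against log(1 / size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


sizes = [1 / 4, 1 / 8, 1 / 16]

# A flat sheet embedded in 3D: the estimate should be close to 2.
sheet = [(i / 64, j / 64, 0.0) for i in range(64) for j in range(64)]
fd_sheet = box_counting_dimension(sheet, sizes)

# A filled cube: the estimate should be close to 3.
cube = [(i / 16, j / 16, k / 16)
        for i in range(16) for j in range(16) for k in range(16)]
fd_cube = box_counting_dimension(cube, sizes)

print(f"sheet FD ≈ {fd_sheet:.2f}, cube FD ≈ {fd_cube:.2f}")
```

Rough or contorted surfaces fall between these two limits, which is why a non-integer value carries shape information.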

[Illustration: example surfaces with FD ≈ 2, 2.4 ≤ FD ≤ 2.6, and FD ≈ 3]

[Figure 3 labels: Poor recovery; VS Useful; VS Ready; Random performance]

Figure 3: Ligand fingerprints reconstructed (predicted) from protein translation compared to those extracted (calculated) from the sc-PDB. Approximately three-quarters are ‘useful’ or better, a characterisation based on performance in large-scale retro- and prospective studies. A third are ‘VS ready’, based on the same analysis.

Figure 2: Maximum accuracy and perplexity achieved on the validation set for each trained model. Best model per architecture family (LSTM, GRU, CNN, Transformer) highlighted.

[Figure 2 axes: Validation Perplexity; Validation Accuracy]

Figure 1: Illustration of the defined pocket region for a protein-ligand complex (ligand extracted for clarity). Pocket and ligand shapes are captured in the fingerprints with the corresponding colour key. The remaining fingerprint is the one generated by the deep-learning model, which is compared against the experimental version. PDB ID: 5N2F.

Combining our shape descriptors with deep learning resulted in a model that could create useful fingerprints for virtual screening in three-quarters of cases, based on analysis of target structure alone.
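One way a reconstructed fingerprint might be used for virtual screening is to rank a compound library by similarity to it. The sketch below is an assumption-laden illustration, not the poster's pipeline: it treats fingerprints as sets of "on" features, uses Tanimoto similarity as the ranking score, and the compound names and fingerprints are invented.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' features."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def rank_library(query_fp, library, top_k=2):
    """Return the top_k (name, score) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical ligand fingerprint predicted from a protein pocket.
predicted = {1, 2, 3, 4}

# Hypothetical screening library.
library = {
    "cmpd_A": {1, 2, 3, 4, 5},
    "cmpd_B": {7, 8, 9},
    "cmpd_C": {1, 2, 9},
}

hits = rank_library(predicted, library)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

Because the query comes from the target structure alone, no known ligand is needed to start the ranking.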

Twitter:

@Ryan_Byrne_

LinkedIn:
