Molecular and Data Visualization in Drug Discovery
Deepak Bandyopadhyay
GlaxoSmithKline
Intro: Human Body & Disease Biology
• From Wikipedia: – Abnormal condition that affects part or all of an organism.
– Associated with specific symptoms and signs.
• Causes: – Single cause, e.g. pathogen, poison, nutrient deficiency, genetics
– Multiple factors including environment, lifestyle, genetics
http://www.biologyguide.net/biol1/1_disease.htm
Mycobacterium tuberculosis
Chest X-ray showing lung cancer
Drug Discovery Parts/Timeline
Focus of Drug Discovery
• Narrow down on one or a few substances to test in humans and develop into a drug that treats a disease
Components:
Target Selection and Validation
genome
protein
link to disease
disease
genetics
pathology
biological target
In Vitro Biology
Medicinal Chemistry (Lead Optimization)
Lead Discovery (a.k.a. Screening)
In Vivo Biology
Molecular and Data Visualization
• The two parts of my job at GSK!
• Molecules: – small (drugs/peptides) and large (proteins/DNA/RNA/lipids)
– visualized in 1D (SMILES), 2D (structure), 3D (coords / conformations), 4D (Mol. Dynamics)
• Data: – Format: numeric / text, continuous / categorical, delimited / database / XML / proprietary
– Source: instruments, manual entry, calculation
– About drug discovery projects (key: molecule ID), genomics/proteomics (key: gene/protein ID), clinical studies (key: anon. patient ID), …
Ibuprofen
DRUG
PROTEIN
EGFR
Ball and stick
EGFR ribbons
Movie: Introduction to Drug Design
By Schrödinger (molecular modeling software company): https://www.youtube.com/watch?v=u49k72rUdyc
Bioactivity 101
• Concentration-Response curve and IC50
• Structure Activity Relationship (SAR)
pIC50 = -log10(IC50), with IC50 in mol/L (M)
Example: IC50 = 12.8 µM (micromolar), so pIC50 = 6 - log10(12.8) = 4.89
Think Avogadro, pH…
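The conversion on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and not part of the original talk. It relies only on the fact that 1 µM = 1e-6 M, which is where the "6" in the slide's formula comes from.

```python
import math


def pic50_from_molar(ic50_molar: float) -> float:
    """pIC50 = -log10(IC50), with IC50 expressed in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def pic50_from_micromolar(ic50_um: float) -> float:
    """Same quantity for IC50 given in micromolar: 6 - log10(IC50 in uM)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)


# The slide's example: IC50 = 12.8 uM
print(f"{pic50_from_micromolar(12.8):.2f}")  # 4.89
print(f"{pic50_from_molar(12.8e-6):.2f}")    # 4.89
```

Like pH, the scale is logarithmic: each unit of pIC50 is a tenfold change in potency, and larger values mean a more potent compound.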
Molecular Visualization Deconstructed
• Representations
• Navigation
• Interaction
• What would you add?
Aspirin (ligand)
Cox-1 (protein)
Binding pocket surface
polar +ve charge
hydrophobic
-ve charge
XY translate, Z zoom; rotate about X/Y or Z (e.g. in program MOE)
F1, F2, F3: save/restore scenes
Select, Hide/Show, Center, Prev/Next Scene, Expand Sel., Import/Export, Align, Compute…
Purposes of Molecule Visualization
• Understand and rationalize “SAR” in 3D
• (Protein) Structure-Based Drug Design. E.g.: – Aspirin binds COX1/2, Celebrex binds COX2 only
• Clearly illustrate biological systems / processes
• What other tasks can you think of?
Case study 1: Protein-Protein Interactions
HIV-1 coat protein gp120 bound to antibody 17b (Light, Heavy) and CD4
Panels: gp120/CD4 interface; gp120/antibody L/H interface
Rank color: ordered color legend (swatches not recoverable)
Ban, Y. E. A., Edelsbrunner, H., & Rudolph, J. (2006). Interface surfaces for protein-protein complexes. J. ACM, 53(3), 361-378.
Case study 2: Molecular dynamics simulation of a drug entering the binding site of a target protein
Decherchi et al., Nature Comms. 6(6155), 2015. https://www.youtube.com/watch?v=ckTqh50r_2w
From Molecules to Data
Mol spreadsheets, visualizations
StarDrop Glowing Molecules™ image from http://www.asteris-app.com/technical-info.htm
Hybrid molecule/data visualization
Software Systems: Spotfire
• Feature set / distinguishing factors: – Handling large datasets via filtering and memory management
– Tabular file (CSV, Excel) or database input
– Multiple, configurable visualization types
– Easy enough for domain experts to use / share
– Life science add-ons
• Molecule depiction
• Specialized –omics packages
Figures: binned pIC50 trellised by HBA and HBD; pIC50 vs. % inh
Software Systems: LiveDesign
• Consolidate multiple disconnected tools for molecule design
– Integrated Single Platform
– Intuitive UI
– 2D, 3D, Data & Visuals
– Social aspect
Dimensions, dimensions…
• Molecules: 1D (SMILES e.g. c1ccccc1), 2D (depiction), 3D (coords), 4D (motion)
• Data: – 100s of activities, measured and predicted properties per row (compound)
– ~100K for gene expression, clinical trial data
– Millions for –omics, next-gen sequencing
– Then there’s systems biology…
• Dimensionality reduction is a key capability – PCA, SOM, Stochastic Proximity Embedding,…
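To make the dimensionality-reduction bullet concrete, here is a minimal sketch of the first of the listed techniques, PCA, using only the Python standard library. It finds the first principal component by power iteration on the covariance matrix and projects each compound onto it. The property table is hypothetical, the function names are illustrative, and no column scaling is applied. In practice a numerical library would be used, and properties on very different scales would be standardized first.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def first_principal_component(cov, iterations=500):
    """Power iteration: repeatedly multiply by cov and renormalize."""
    d = len(cov)
    # Asymmetric start vector; it could in principle be orthogonal to the
    # top eigenvector, which this sketch does not guard against.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pc1_scores(rows):
    """Return (direction, scores): each row reduced to one coordinate."""
    centered = center_columns(rows)
    v = first_principal_component(covariance_matrix(centered))
    return v, [sum(a * b for a, b in zip(row, v)) for row in centered]


# Hypothetical compounds x properties table (e.g. pIC50, logP, HBD count)
compounds = [
    [4.9, 2.1, 1.0],
    [5.6, 2.8, 1.0],
    [6.3, 3.4, 2.0],
    [7.1, 4.0, 2.0],
]
direction, scores = pc1_scores(compounds)
for row, score in zip(compounds, scores):
    print(row, f"PC1 = {score:+.2f}")
```

Repeating the procedure on the residual data would give a second component for a 2D scatter plot, which is the usual way such a reduction is displayed.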
Challenges / Types of Visualization
• Key capabilities for data visualization
– Make large data comprehensible to humans
– High-level summary + drill-down
– Quickly (auto?) isolate interesting data points
http://guides.library.duke.edu/datavis/vis_types
map
SOM
Parallel coords
Heat map protein
Volume rendering
http://flagshipbio.com/amino-acid-structure-properties-using-self-organizing-maps/
Radar plot
Box Plot
Sunburst
2D 3D nD hierarchical
Dendrogram
Network/Graph layout
Wikipedia
All the Data at Once: Vlaaivis
T. J. Howe, G. Mahieu, P. Marichal, T. Tabruyn and P. Vugts. Data reduction and representation in drug discovery. Drug Discovery Today 12(1/2):45-53, Jan 2007.
All the Data at Once (cont’d): Radar Plots
• Circular histogram for viewing multi-parameter results
P. D. Leeson & S. A. St-Gallay. The influence of the 'organizational factor' on compound quality in drug discovery. Nature Reviews Drug Discovery 10, 749-765 (October 2011).
Property differences are scaled to either +1, whereby the company with a positive ('best') property value had the highest magnitude, or −1, whereby the company with the lowest ('worst') value had the highest magnitude.
Visualizing Large Datasets
P. Ertl & B. Rohde, J. Cheminformatics 4(12), 2012
Gaspar et al. J. Chem. Inf. Model., 2015, 55 (1), pp 84–94
Network-like similarity graph
Bajorath et al.
• Dimensionality reduction
• Graph layout
• Activity landscape
• Probabilistic property plots
• Scaffold abstraction
Steven Muchmore, Abbott Labs (now AbbVie)
Molecule cloud
Plot labels: Molecular Property 1; Molecular Property 2; Probability of success (crossing cell membrane)
SAR Tables
• SAR: Structure-Activity Relationship – Split molecule: core/scaffold, pendant R-groups
– SAR Table: molecule spreadsheet with R-groups and Activity Data
(-OH)
(-COOH)
SAR Maps - R1 vs. R2 on a Core
Figure labels: R1; R2; Core “scaffold”; color scale pIC50 (protein 2) ‒ pIC50 (protein 1), running between "Selective for protein 1" and "Selective for protein 2"
D. K. Agrafiotis et al. SAR Maps: A New SAR Visualization Technique for Medicinal Chemists. J. Med. Chem., 2007, 50 (24), 5926–5937.
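The underlying data layout of an SAR map can be sketched as a pivot of an SAR table: one row per R1 substituent, one column per R2 substituent, and each cell holding the selectivity difference pIC50 (protein 2) minus pIC50 (protein 1). The code below is only an illustration of that pivot, not the method of the cited paper. All records, R-group labels and field names are hypothetical.

```python
def sar_map(records):
    """Pivot an SAR table into an R1 x R2 grid of selectivity differences.

    Each record is a dict with keys "R1", "R2", "pIC50_1", "pIC50_2"
    (hypothetical field names). Cells with no compound are None.
    """
    r1_labels = sorted({r["R1"] for r in records})
    r2_labels = sorted({r["R2"] for r in records})
    cells = {(r["R1"], r["R2"]): r["pIC50_2"] - r["pIC50_1"] for r in records}
    grid = [[cells.get((a, b)) for b in r2_labels] for a in r1_labels]
    return r1_labels, r2_labels, grid


# Hypothetical SAR table for one core scaffold
table = [
    {"R1": "-OH",   "R2": "-CH3", "pIC50_1": 6.1, "pIC50_2": 7.9},
    {"R1": "-OH",   "R2": "-Cl",  "pIC50_1": 7.2, "pIC50_2": 6.0},
    {"R1": "-COOH", "R2": "-CH3", "pIC50_1": 5.5, "pIC50_2": 5.6},
]

rows, cols, grid = sar_map(table)
print("R1 \\ R2".ljust(10) + "".join(c.rjust(8) for c in cols))
for label, values in zip(rows, grid):
    cells = ["n/a".rjust(8) if v is None else f"{v:+.1f}".rjust(8) for v in values]
    print(label.ljust(10) + "".join(cells))
```

Positive cells indicate selectivity for protein 2 and negative cells selectivity for protein 1. A real SAR map would color the cells on that scale instead of printing numbers.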
Clustering
• Based on chemical descriptors, biological activity, etc…
• Agglomerative or hierarchical
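As a minimal sketch of agglomerative clustering on chemical descriptors, the code below performs complete-link clustering with a Tanimoto distance over sets of structural features. The molecule names, feature sets and the 0.5 distance threshold are hypothetical choices for illustration. Real workflows would use fingerprints from a cheminformatics toolkit and a library clustering routine.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; 0 means identical feature sets."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def complete_link_clusters(items, threshold):
    """Agglomerative clustering with complete linkage.

    items: dict mapping a molecule name to its set of features.
    Clusters are merged, closest pair first, until the closest pair is
    farther apart than `threshold`.
    """
    clusters = [[name] for name in items]

    def linkage(c1, c2):
        # Complete link: distance between the two most dissimilar members
        return max(tanimoto_distance(items[x], items[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical molecules described by integer feature IDs
molecules = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {1, 2, 6, 7},
    "mol_d": {8, 9},
}
print(complete_link_clusters(molecules, threshold=0.5))
# [['mol_a', 'mol_b'], ['mol_c'], ['mol_d']]
```

Here mol_c shares features with mol_a and mol_b but still ends up alone, because each molecule can belong to only one cluster and the threshold cuts it off.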
Hoek, Keith S. et al.: Metastatic potential of melanomas defined by specific gene expression profiles with no BRAF signature. Pigment Cell Research 19 (4), 290-302
http://chemmine.ucr.edu/help/
Molecules Genes
Limitations of Clustering
Each molecule belongs to a single cluster, which can be limiting
Figure labels: seals (fur)? ducks (bill)? penguins (flipper)? singleton?
Cluster 3 Cluster 10
similar molecules ≠ same cluster
Many singletons
Plot axes: Complete Link Cluster ID vs. Cluster Size
Automatic Decomposition into (All) Overlapping Scaffolds
Figure columns: Molecule; Scaffold(s); Related Molecules
Values shown: malarial parasite assay pIC50 8.1 and 8.2; related molecules … 49 total, … 226 total, 2 total; Avg pIC50 8.15, 7.8, 7.8
Next Step: Combine with Activities and Properties
Figure columns: Molecule; Scaffold(s); Annotation; Related Molecules
Related molecules: … 49 total, … 226 total, 2 total
Values shown: 8.5, 8.2, 8.0, 7.5, 7.7, 8.5, 7.4, 7.9, 7.7, 8.2
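The idea of combining overlapping scaffolds with activities can be sketched as a simple aggregation: each molecule may belong to several scaffolds, and each scaffold is annotated with the count and average pIC50 of its member molecules. This is an illustrative sketch, not the tool shown in the talk. Scaffold names, molecule IDs and most values are hypothetical. Only the pair 8.1 and 8.2 averaging to 8.15 comes from the slides.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_scaffold(molecules):
    """Group pIC50 values by scaffold; a molecule may appear under several.

    molecules: list of dicts with keys "id", "pIC50", "scaffolds"
    (hypothetical field names). Returns scaffold -> (count, average pIC50).
    """
    by_scaffold = defaultdict(list)
    for mol in molecules:
        for scaffold in mol["scaffolds"]:
            by_scaffold[scaffold].append(mol["pIC50"])
    return {s: (len(values), mean(values)) for s, values in by_scaffold.items()}


# Hypothetical data; scaffold_C reproduces the slide's 8.1 / 8.2 pair
data = [
    {"id": "m1", "pIC50": 8.1, "scaffolds": ["scaffold_A", "scaffold_B", "scaffold_C"]},
    {"id": "m2", "pIC50": 8.2, "scaffolds": ["scaffold_A", "scaffold_C"]},
    {"id": "m3", "pIC50": 7.4, "scaffolds": ["scaffold_A", "scaffold_B"]},
    {"id": "m4", "pIC50": 7.9, "scaffolds": ["scaffold_B"]},
]

for scaffold, (count, avg) in sorted(aggregate_by_scaffold(data).items()):
    print(f"{scaffold}: {count} total, Avg pIC50 {avg:.2f}")
```

Because scaffolds overlap, the counts add up to more than the number of molecules. That overlap is what lets a molecule link to several groups of related molecules instead of a single cluster.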
Case Study: Linking Molecules By Scaffolds
• Use aggregate properties for decision making
• Find related molecules with improved properties
Figure labels (Example 1, Example 2): axes Improving property 1 vs. Improving activity 2, and Improving activity 3 vs. Improving property 4; Aggregate (scaffold) ↓ Drill down (8 molecules)
> Keep top half of molecule, substitute bottom half
Summary and Lessons Learned
• Drug discovery has specialized types of data that are best understood by visualization
• Good visualizations can support the making of good decisions (and the converse: GIGO…)
• The human element is important – visuals and analytics should be creatable/usable by scientists
• As new visual analytics experts, consider careers in an industry where you can add value and be creative
– Subtle plug for drug discovery
Future Directions and Challenges in Data Visualization for Drug Discovery
• Human vs. Machine or Human + Machine ?
• Automate the tedious parts of data prep/integration
• Intuitiveness by design
• Interconnection by design
• Integration of latest visualization techniques developed for other domains
• Using emerging media, e.g. VR, Kinect
• What can you think of?