molecular and data visualization in drug discovery

Molecular and Data Visualization in Drug Discovery Deepak Bandyopadhyay GlaxoSmithKline


Post on 12-Apr-2017


TRANSCRIPT

Page 1: Molecular and data visualization in drug discovery

Molecular and Data Visualization in Drug Discovery

Deepak Bandyopadhyay

GlaxoSmithKline

Page 2: Molecular and data visualization in drug discovery

Intro: Human Body & Disease Biology

• Disease (from Wikipedia):
– Abnormal condition that affects part or all of an organism.
– Associated with specific symptoms and signs.

• Causes:
– Single cause, e.g. pathogen, poison, nutrient deficiency, genetics
– Multiple factors including environment, lifestyle, genetics

http://www.biologyguide.net/biol1/1_disease.htm

Mycobacterium tuberculosis

Chest X-ray showing lung cancer

Page 3: Molecular and data visualization in drug discovery

Drug Discovery Parts/Timeline

Focus of Drug Discovery

• Narrow down on one or a few substances to test in humans and develop into a drug that treats a disease

Components:

Target Selection and Validation

genome

protein

link to disease

disease

genetics

pathology

biological target

In Vitro Biology

Medicinal Chemistry (Lead Optimization)

Lead Discovery (a.k.a. Screening)

In Vivo Biology

Page 4: Molecular and data visualization in drug discovery

Molecular and Data Visualization

• The two parts of my job at GSK!

• Molecules:
– small (drugs/peptides) and large (proteins/DNA/RNA/lipids)
– visualized in 1D (SMILES), 2D (structure), 3D (coords / conformations), 4D (Mol. Dynamics)

• Data:
– Format: numeric / text, continuous / categorical, delimited / database / XML / proprietary
– Source: instruments, manual entry, calculation
– About drug discovery projects (key: molecule ID), genomics/proteomics (key: gene/protein ID), clinical studies (key: anon. patient ID), …

Ibuprofen

DRUG

PROTEIN

EGFR

Ball and stick

EGFR ribbons

Page 5: Molecular and data visualization in drug discovery

Movie: Introduction to Drug Design

By Schrödinger (molecular modeling software company): https://www.youtube.com/watch?v=u49k72rUdyc

Page 6: Molecular and data visualization in drug discovery

Bioactivity 101

• Concentration-Response curve and IC50

• Structure Activity Relationship (SAR)
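A minimal sketch of a concentration-response curve, assuming the standard four-parameter logistic (Hill) model; the slide does not say which model was used. The function name, the 0-100% response range and the Hill slope of 1 are illustrative assumptions. IC50 is the concentration at which the response is halfway between top and bottom.

```python
def hill_response(conc, ic50, top=100.0, bottom=0.0, hill_slope=1.0):
    """Percent activity remaining at inhibitor concentration `conc`.

    Four-parameter logistic (Hill) model. `conc` and `ic50` must use the
    same concentration units. All default parameters are assumptions.
    """
    if conc < 0 or ic50 <= 0:
        raise ValueError("conc must be >= 0 and ic50 must be > 0")
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill_slope)


# Hypothetical compound with IC50 = 12.8 (micromolar), sampled at a few doses.
ic50_um = 12.8
curve = [(c, round(hill_response(c, ic50_um), 1)) for c in (0.1, 1.0, 12.8, 100.0)]
for conc, response in curve:
    print(f"{conc:>6} uM -> {response}% activity")
```

At conc equal to ic50 the function returns the midpoint of top and bottom, which is the definition of IC50 that the curve encodes.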

pIC50 = -log10(IC50), with IC50 in mol/L (M)

Example: IC50 = 12.8 µM (micromolar), so pIC50 = 6 - log10(12.8) = 4.89

Think Avogadro, pH…
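The conversion can be sketched in a few lines of Python. The function names and the unit table are illustrative, not from the slides; the only assumption is that pIC50 is defined on the molar scale, as in the worked example.

```python
import math

# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50_from_ic50(ic50, unit="uM"):
    """Return pIC50 = -log10(IC50 in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


def ic50_from_pic50(pic50, unit="uM"):
    """Inverse conversion: IC50 in the requested unit."""
    return 10 ** (-pic50) / UNIT_TO_MOLAR[unit]


print(round(pic50_from_ic50(12.8, "uM"), 2))  # 4.89, matching the slide
```

Like pH, each unit of pIC50 is a tenfold change in concentration, so a higher pIC50 means a more potent compound.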

Page 7: Molecular and data visualization in drug discovery

Molecular Visualization Deconstructed

• Representations

• Navigation

• Interaction

• What would you add?

Aspirin (ligand)

Cox-1 (protein)

Binding pocket surface

polar +ve charge

hydrophobic

-ve charge

Navigation: XY translate, Z zoom; rotate about X/Y or Z (e.g. in the program MOE)

F1, F2, F3: save/restore scenes

Interaction: Select, Hide/Show, Center, Prev/Next Scene, Expand Sel., Import/Export, Align, Compute…

Page 8: Molecular and data visualization in drug discovery

Purposes of Molecule Visualization

• Understand and rationalize “SAR” in 3D

• (Protein) Structure-Based Drug Design. E.g.:
– Aspirin binds COX1/2, Celebrex binds COX2 only

• Clearly illustrate biological systems / processes

• What other tasks can you think of?

Page 9: Molecular and data visualization in drug discovery

Case Study 1: Protein-Protein Interactions

HIV-1 coat protein gp120 bound to antibody 17b (Light, Heavy) and CD4

Panels: gp120/CD4 interface; gp120/antibody L/H interface

Legend: rank color scale

Ban, Y. E. A., Edelsbrunner, H., & Rudolph, J. (2006). Interface surfaces for protein-protein complexes. J. ACM, 53(3), 361-378.

Page 10: Molecular and data visualization in drug discovery

Case Study 2: Molecular Dynamics Simulation of a drug entering the binding site of a target protein

Decherchi et al., Nature Comms. 6(6155), 2015. https://www.youtube.com/watch?v=ckTqh50r_2w

Page 11: Molecular and data visualization in drug discovery

From Molecules to Data

Mol spreadsheets, visualizations

StarDrop Glowing Molecules™ image from http://www.asteris-app.com/technical-info.htm

Hybrid molecule/data visualization

Page 12: Molecular and data visualization in drug discovery

Software Systems: Spotfire

• Feature set / distinguishing factors:
– Handling large datasets via filtering and memory management
– Tabular file (CSV, Excel) or database input
– Multiple, configurable visualization types
– Easy enough for domain experts to use / share
– Life science add-ons
  • Molecule depiction
  • Specialized –omics packages

Figures: Binned pIC50 trellised by HBA and HBD; pIC50 vs. % inh

Page 13: Molecular and data visualization in drug discovery

Software Systems: LiveDesign

• Consolidate multiple disconnected tools for molecule design

– Integrated Single Platform

– Intuitive UI

– 2D, 3D, Data & Visuals

– Social aspect

Page 14: Molecular and data visualization in drug discovery

Dimensions, dimensions…

• Molecules: 1D (SMILES e.g. c1ccccc1), 2D (depiction), 3D (coords), 4D (motion)

• Data:
– 100s of activities, measured and predicted properties per row (compound)
– ~100K for gene expression, clinical trial data
– Millions for –omics, next-gen sequencing
– Then there’s systems biology…

• Dimensionality reduction is a key capability
– PCA, SOM, Stochastic Proximity Embedding, …
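To make the idea concrete, here is a rough sketch of PCA, the first technique in the list, reduced to extracting the first principal component by power iteration. It uses plain Python for self-containment; in practice a numerical library would be used. The function name and the toy descriptor values are hypothetical, and SOM and Stochastic Proximity Embedding are not covered.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length numeric tuples, one per compound.
    Uses power iteration on the sample covariance matrix. Assumption:
    the all-ones start vector is not orthogonal to the component.
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two rows")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance along the current vector
        vec = [x / norm for x in nxt]
    # Fix the sign so the first non-zero loading is positive.
    lead = next((x for x in vec if abs(x) > 1e-12), 1.0)
    if lead < 0:
        vec = [-x for x in vec]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Hypothetical two-descriptor data lying on the line y = 2x.
toy = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, scores = first_principal_component(toy)
print(direction, scores)
```

Each compound's score is its coordinate along the direction of greatest variance, so hundreds of columns can be collapsed to one or two plot axes.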

Page 15: Molecular and data visualization in drug discovery

Challenges / Types of Visualization

• Key capabilities for data visualization

– Making large data comprehensible to humans

– High-level summary + drill-down

– Quickly (auto?) isolate interesting data points

http://guides.library.duke.edu/datavis/vis_types

Visualization types (2D, 3D, nD, hierarchical): map, SOM, parallel coords, heat map, protein, volume rendering, radar plot, box plot, sunburst, dendrogram, network/graph layout

Image sources: http://flagshipbio.com/amino-acid-structure-properties-using-self-organizing-maps/, Wikipedia

Page 16: Molecular and data visualization in drug discovery

All the Data at Once: Vlaaivis

T. J. Howe, G. Mahieu, P. Marichal, T. Tabruyn and P. Vugts. Data reduction and representation in drug discovery. Drug Discovery Today 12(1/2):45-53, Jan 2007.

Page 17: Molecular and data visualization in drug discovery

All the Data at Once (cont’d): Radar Plots

• Circular histogram for viewing multi-parameter results

Paul D. Leeson & Stephen A. St-Gallay. The influence of the 'organizational factor' on compound quality in drug discovery. Nature Reviews Drug Discovery 10, 749-765 (October 2011).

Property differences are scaled to either +1, whereby the company with a positive ('best') property value had the highest magnitude, or −1, whereby the company with the lowest ('worst') value had the highest magnitude.
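A rough sketch of how a radar plot lays out multi-parameter data: each property gets its own axis at an equal angular spacing, and the scaled value sets the distance from the center. This sketch assumes simple min-max scaling to [0, 1], which is not the +1/−1 scaling of the cited figure; function names and ranges are illustrative.

```python
import math


def scale_to_unit(value, low, high):
    """Min-max scale a raw property value to [0, 1], clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def radar_vertices(scaled_values):
    """Return the (x, y) polygon vertices for one compound.

    Axis k sits at angle 2*pi*k/n, starting on the positive x axis.
    """
    n = len(scaled_values)
    if n < 3:
        raise ValueError("a radar plot needs at least three axes")
    vertices = []
    for k, value in enumerate(scaled_values):
        angle = 2 * math.pi * k / n
        vertices.append((value * math.cos(angle), value * math.sin(angle)))
    return vertices


# Hypothetical compound with four properties already scaled to [0, 1].
print(radar_vertices([1.0, 0.5, 1.0, 0.5]))
```

The resulting vertex list can be passed to any 2D drawing routine as a closed polygon.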

Page 18: Molecular and data visualization in drug discovery

Visualizing Large Datasets

P. Ertl & B. Rohde, J. Cheminformatics 4(12), 2012

Gaspar et al. J. Chem. Inf. Model., 2015, 55 (1), pp 84–94

Network-like similarity graph

Bajorath et al.

• Dimensionality reduction

• Graph layout

• Activity landscape

• Probabilistic property plots

• Scaffold abstraction

Steven Muchmore, Abbott Labs (now Abbvie)

Molecule cloud

Plot labels: Molecular Property 1; Molecular Property 2; Probability of success (crossing cell membrane)

Page 19: Molecular and data visualization in drug discovery

SAR Tables

• SAR: Structure-Activity Relationship
– Split molecule: core/scaffold, pendant R-groups
– SAR Table: molecule spreadsheet with R-groups and Activity Data

(-OH)

(-COOH)

Page 20: Molecular and data visualization in drug discovery

SAR Maps - R1 vs. R2 on a Core

Figure labels: Selective for protein 1; pIC50(protein 2) - pIC50(protein 1); Selective for protein 2; R1; R2

Core “scaffold”:

D. K. Agrafiotis et al. SAR Maps:  A New SAR Visualization Technique for Medicinal Chemists. J. Med. Chem., 2007, 50 (24), 5926–5937.
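A minimal sketch of the data behind an R1 vs. R2 grid, assuming each compound has already been split into a core plus two R-groups. This is not the method of the cited paper; the record layout, function name and example values are hypothetical. Each cell holds the selectivity pIC50(protein 2) - pIC50(protein 1).

```python
def sar_map(records):
    """Pivot compound records into an R1 (rows) x R2 (columns) grid.

    records: dicts with keys "r1", "r2", "pic50_1", "pic50_2".
    Cell value: pic50_2 - pic50_1 (positive = selective for protein 2).
    Empty cells are None. If an R1/R2 pair repeats, the last record wins.
    """
    r1_groups = sorted({rec["r1"] for rec in records})
    r2_groups = sorted({rec["r2"] for rec in records})
    cells = {
        (rec["r1"], rec["r2"]): round(rec["pic50_2"] - rec["pic50_1"], 2)
        for rec in records
    }
    grid = [[cells.get((r1, r2)) for r2 in r2_groups] for r1 in r1_groups]
    return r1_groups, r2_groups, grid


# Hypothetical compounds sharing one core scaffold.
compounds = [
    {"r1": "-OH", "r2": "-CH3", "pic50_1": 6.0, "pic50_2": 7.5},
    {"r1": "-OH", "r2": "-Cl", "pic50_1": 7.0, "pic50_2": 6.0},
    {"r1": "-COOH", "r2": "-CH3", "pic50_1": 5.5, "pic50_2": 5.5},
]
rows, cols, grid = sar_map(compounds)
print(rows, cols, grid)
```

Empty cells are useful in their own right: they mark R1/R2 combinations that have not been made yet.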

Page 21: Molecular and data visualization in drug discovery

Clustering

• Based on chemical descriptors, biological activity, etc…

• Agglomerative or hierarchical
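As an illustration, here is a small sketch of agglomerative clustering with complete linkage and a Tanimoto distance on fingerprint bit sets. The fingerprints, names and thresholds are made up, and the naive search over all cluster pairs is for clarity only.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of fingerprint bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def complete_link_clusters(fingerprints, threshold):
    """Agglomerative clustering with complete linkage.

    fingerprints: dict of molecule name -> set of feature bits.
    Repeatedly merges the two clusters whose farthest members are
    closest, stopping when that distance exceeds `threshold`.
    """
    clusters = [[name] for name in sorted(fingerprints)]

    def linkage(c1, c2):
        return max(tanimoto_distance(fingerprints[x], fingerprints[y])
                   for x in c1 for y in c2)

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical fingerprints: three related molecules and one unrelated.
fps = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {1, 2, 3, 4, 5},
    "mol_d": {10, 11, 12},
}
print(complete_link_clusters(fps, threshold=0.5))
print(complete_link_clusters(fps, threshold=0.3))
```

With the tighter threshold, mol_b ends up as a singleton even though it is as close to mol_c as mol_a is, because each molecule can belong to only one cluster.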

Hoek, Keith S. et al.: Metastatic potential of melanomas defined by specific gene expression profiles with no BRAF signature. Pigment Cell Research 19 (4), 290-302

http://chemmine.ucr.edu/help/

Molecules Genes

Page 22: Molecular and data visualization in drug discovery

Limitations of Clustering

Each molecule is assigned to a single cluster, which can be limiting

seals (fur)

?

singleton

?

ducks (bill)

?

penguins (flipper)

?

Cluster 3 Cluster 10

similar molecules ≠ same cluster

Many singletons

Plot axes: Complete Link Cluster ID; Cluster Size

Page 23: Molecular and data visualization in drug discovery

Automatic Decomposition into (All) Overlapping Scaffolds

Malarial parasite assay pIC50 8.1

… 49 total

… 226 total

2 total

Molecule

Scaffold(s)

Related Molecules

Page 24: Molecular and data visualization in drug discovery

Next Step: Combine with Activities and Properties

Figure labels: 8.2; Avg pIC50 8.15; Avg pIC50 7.8; Avg pIC50 7.8

… 49 total

… 226 total

2 total

Values shown: 8.5, 8.2, 8.0, 7.5, 7.7, 8.5, 7.4, 7.9, 7.7, 8.2

Molecule

Scaffold(s)

Annotation

Related Molecules

Page 25: Molecular and data visualization in drug discovery

Case Study: Linking Molecules By Scaffolds

• Use aggregate properties for decision making

• Find related molecules with improved properties
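A minimal sketch of the aggregation step, assuming the scaffold decomposition has already been done by some other tool and that a molecule may belong to several overlapping scaffolds. The record layout, scaffold IDs and pIC50 values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_scaffold(molecules):
    """Summarize activity per scaffold.

    molecules: dicts with "id", "pic50" and "scaffolds" (a list of
    scaffold IDs; a molecule is counted under every scaffold it has).
    Returns scaffold ID -> {"count", "avg_pic50", "best"}.
    """
    members = defaultdict(list)
    for mol in molecules:
        for scaffold in mol["scaffolds"]:
            members[scaffold].append(mol)
    summary = {}
    for scaffold, mols in members.items():
        summary[scaffold] = {
            "count": len(mols),
            "avg_pic50": round(mean([m["pic50"] for m in mols]), 2),
            "best": max(mols, key=lambda m: m["pic50"])["id"],
        }
    return summary


# Hypothetical molecules; mol_1 contains both scaffolds.
mols = [
    {"id": "mol_1", "pic50": 8.2, "scaffolds": ["scaf_A", "scaf_B"]},
    {"id": "mol_2", "pic50": 8.5, "scaffolds": ["scaf_A"]},
    {"id": "mol_3", "pic50": 7.4, "scaffolds": ["scaf_B"]},
]
summary = aggregate_by_scaffold(mols)
for scaffold, stats in sorted(summary.items()):
    print(scaffold, stats)
```

Drilling down from a scaffold's average to its "best" member is the same move as going from the aggregate view to the individual molecules.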

Axis labels: Improving property 1; Improving activity 2

Aggregate (scaffold)

↓ Drill down

(8 molecules)

Axis labels: Improving activity 3; Improving property 4

> Keep top half of molecule, substitute bottom half

Example 1 Example 2

Page 26: Molecular and data visualization in drug discovery

Summary and Lessons Learned

• Drug discovery has specialized types of data that are best understood by visualization

• Good visualizations support good decisions (and the converse: GIGO…)

• The human element is important – visuals and analytics should be creatable/usable by scientists

• As new visual analytics experts, consider careers in an industry where you can add value and be creative

– Subtle plug for drug discovery

Page 27: Molecular and data visualization in drug discovery

Future Directions and Challenges in Data Visualization for Drug Discovery

• Human vs. Machine or Human + Machine ?

• Automate the tedious parts of data prep/integration

• Intuitiveness by design

• Interconnection by design

• Integration of latest visualization techniques developed for other domains

• Using emerging media, e.g. VR, Kinect

• What can you think of?