med264 structural bioinformatics

Structural Bioinformatics with Examples Drawn from Our Own Work Philip E. Bourne Professor of Pharmacology UCSD Associate Vice Chancellor for Innovation & Industry Alliances [email protected] 1 MED264 10/24/13

Upload: philip-bourne

Post on 06-May-2015


DESCRIPTION

What constitutes structural bioinformatics and 2 example areas from our own work - studying evolution using structure and what really happens when we take a drug. Presented to UCSD medical students in years 1-3

TRANSCRIPT

Page 1: Med264 Structural Bioinformatics

Structural Bioinformatics with Examples Drawn from Our Own Work

Philip E. Bourne Professor of Pharmacology UCSD

Associate Vice Chancellor for Innovation & Industry Alliances

[email protected]

MED264 10/24/13

Page 2: Med264 Structural Bioinformatics

How I Got Excited

Page 3: Med264 Structural Bioinformatics

Some Things Stay with You Your Whole Life

Page 4: Med264 Structural Bioinformatics

Drivers: Numbers & Complexity

[Figure: y-axis "Number of released entries". Courtesy of the RCSB Protein Data Bank]

Page 5: Med264 Structural Bioinformatics

Putting Structural Bioinformatics in Perspective

Pharmacy Informatics
• Drug dosing
• Pharmacokinetics
• Pharmacy Information Systems

Biomedical Informatics
• EHR
• Decision support systems
• Hospital Information Systems

Bioinformatics
• Algorithms
• Genomics
• Proteomics
• Biological networks
• Systems Biology

Note: These are only representative examples

Page 6: Med264 Structural Bioinformatics

Putting Structural Bioinformatics in Perspective

Pharmacy Informatics, Biomedical Informatics, Bioinformatics

• Controlled vocabularies
• Ontologies
• Literature searching
• Data management
• Pharmacogenomics
• Personalized medicine

Note: These are only representative examples

Structural Bioinformatics

Page 7: Med264 Structural Bioinformatics


Page 8: Med264 Structural Bioinformatics

Structural Bioinformatics – Example Topics

• Structure prediction
• Evolution
• Drug discovery
• Sequence-structure-function relationships…

Video: http://www.scivee.tv/node/11616

Page 9: Med264 Structural Bioinformatics
Page 10: Med264 Structural Bioinformatics

Determining 3D Structures – X-ray Crystallography

Basic Steps

1. Target Selection
2. Crystallomics: isolation, expression, purification, crystallization
3. Data Collection
4. Structure Solution
5. Structure Refinement
6. Functional Annotation
7. Publish

Structural biology moves from being functionally driven to genomically driven

[Diagram annotations: fill in protein fold space; robotics; -ve data; software engineering; functional prediction; not necessarily]

Page 11: Med264 Structural Bioinformatics

Enough background. Let's look at two fundamental questions where structural bioinformatics is critical

1. Is structure useful in studying evolution and what can we learn?

2. What really happens when we take a drug?


Page 12: Med264 Structural Bioinformatics

Nature’s Reductionism

There are ~20^300 possible proteins >>>> all the atoms in the Universe

~45M protein sequences from UniProt

~90,000 protein structures

Yield ~1500 folds, ~2000 superfamilies, ~4000 families (SCOP 1.75)
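To put the first number on this slide in scale, the sketch below reads it as 20^300, that is, 20 amino acids at each position of a 300-residue chain, and compares it with a commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe. Both the chain length and the atom estimate are assumptions made for illustration, not figures stated on the slide.

```python
import math

AMINO_ACIDS = 20
CHAIN_LENGTH = 300            # assumed chain length behind the slide's number
ATOMS_IN_UNIVERSE_LOG10 = 80  # assumed order-of-magnitude estimate

# Python integers are arbitrary precision, so the exact value is available.
sequences = AMINO_ACIDS ** CHAIN_LENGTH

# Work in log10 to compare magnitudes without printing a 391-digit number.
log10_sequences = CHAIN_LENGTH * math.log10(AMINO_ACIDS)
excess_orders = log10_sequences - ATOMS_IN_UNIVERSE_LOG10

print(f"20^300 is about 10^{log10_sequences:.0f}")
print(f"That exceeds the assumed atom count by ~{excess_orders:.0f} orders of magnitude")
```

Against that space, the ~1500 folds listed on the slide are a vanishingly small sample, which is the sense in which nature is "reductionist" here.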

Page 13: Med264 Structural Bioinformatics

Structure Provides an Evolutionary Fingerprint

Distribution among the three kingdoms as taken from SUPERFAMILY

• Superfamily distributions would seem to be related to the complexity of life

[Venn diagram, SCOP fold (765 total): Eukaryota (650), Archaea (416), Bacteria (564); region counts 2, 42, 10, 135, 118, 387, 17]

[Venn diagram, Any genome / All genomes: region values 153/14, 9/1, 21/2, 310/0, 645/49, 29/0, 68/0]

Page 14: Med264 Structural Bioinformatics

Method – Distance Determination

Presence/Absence Data Matrix (rows: SCOP superfamilies (FSF); columns: SUPERFAMILY organisms)

FSF        C. intestinalis   C. briggsae   F. rubripes
a.1.1      1                 1             1
a.1.2      1                 1             1
a.10.1     0                 0             1
a.100.1    1                 1             1
a.101.1    0                 0             0
a.102.1    0                 1             1
a.102.2    1                 1             1

Distance Matrix

                  C. intestinalis   C. briggsae   F. rubripes
C. intestinalis   0                 101           109
C. briggsae                         0             144
F. rubripes                                       0
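The step from the presence/absence matrix to the distance matrix can be sketched as follows. The slide does not say which distance measure was used; this example assumes a simple count of superfamilies present in one organism but not the other (a Hamming distance), and it uses only the seven rows shown on the slide, so its numbers are far smaller than the 101, 109 and 144 computed over the full matrix.

```python
from itertools import combinations

ORGANISMS = ["C. intestinalis", "C. briggsae", "F. rubripes"]

# The seven rows visible on the slide: superfamily -> presence (1) or absence (0),
# in the same organism order as ORGANISMS.
PRESENCE = {
    "a.1.1":   [1, 1, 1],
    "a.1.2":   [1, 1, 1],
    "a.10.1":  [0, 0, 1],
    "a.100.1": [1, 1, 1],
    "a.101.1": [0, 0, 0],
    "a.102.1": [0, 1, 1],
    "a.102.2": [1, 1, 1],
}


def pairwise_distance(presence, organisms):
    """Count the superfamilies whose presence differs between each pair (assumed metric)."""
    distances = {}
    for (i, a), (j, b) in combinations(enumerate(organisms), 2):
        distances[(a, b)] = sum(row[i] != row[j] for row in presence.values())
    return distances


distances = pairwise_distance(PRESENCE, ORGANISMS)
for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d}")
```

A matrix like this is the input a tree-building method would take.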

Page 15: Med264 Structural Bioinformatics

If Structure is so Conserved, is it a Useful Tool in the Study of Evolution?

The Answer Would Appear to be Yes

• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome

Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8


Page 16: Med264 Structural Bioinformatics


The Influence of Environment on Life

Chris Dupont, Scripps Institution of Oceanography, UCSD

Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827

Page 17: Med264 Structural Bioinformatics

Consider the Distribution of Disulfide Bonds among Folds

• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~ 2.0 billion years ago
• Logical deduction – disulfides more prevalent in folds (organisms) that evolved later
• This would seem to hold true
• Can we take this further?

[Venn diagram, SCOP fold (708 total), Eukaryota / Archaea / Bacteria; percentage shown per region: 0% (0/2), 16.7% (7/42), 0% (0/10), 31.9% (43/135), 14.4% (17/118), 4.7% (18/387), 5.9% (1/17)]

Page 18: Med264 Structural Bioinformatics

Evolution of the Earth

• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time
• The ocean was the “cradle” for 90% of evolution

Page 19: Med264 Structural Bioinformatics

Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History

• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean-solid lines, euxinic ocean-dashed lines).

• The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.

[Figure: x-axis, billions of years before present (4.5 to 0); y-axis, concentration (O2 in arbitrary units, Zn and Fe in moles L-1); series: oxygen, zinc, iron, cobalt, manganese; diversification periods marked for Bacteria, Archaea and Eukarya]

Replotted from Saito et al, 2003 Inorganica Chimica Acta 356: 308-318

Page 20: Med264 Structural Bioinformatics


The Gaia Hypothesis

Gaia - a complex entity involving the Earth's biosphere, atmosphere, oceans, and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet.

James Lovelock

Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/) "land" or "earth", from the Greek Γαῖα; is a Greek goddess personifying the Earth


Page 21: Med264 Structural Bioinformatics


The Question

• Have the emergent properties of an organism as judged by its protein content been influenced by the environment?

• Will do this by consideration of the metallomes of a broad range of species

• The metallomes can only be deduced by consideration of the protein structures to which the metal is covalently bound

• Will hypothesize that these emergent properties in turn influenced the environment


Page 22: Med264 Structural Bioinformatics

Superfamily Distribution As Well As Overall Content Has Changed

Bacteria Fe superfamilies

a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5

Eukaryotic Fe superfamilies

a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5

Page 23: Med264 Structural Bioinformatics

Metal Binding Proteins are Not Consistent Across Superkingdoms

[Figure, panel A: total Zn-binding domains in a proteome plotted against total domains in a proteome (axis ticks are powers of ten). Panel B: slope of fitted power law (scale 0 to 2) for Zn, Fe, Mn and Co in Archaea, Bacteria and Eukarya]

Since these data are derived from current species they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis

Page 24: Med264 Structural Bioinformatics

Power Laws: Fundamental Constants in the Evolution of Proteomes

A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).

van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP, (Ed.). Power laws, scale-free networks, and genome biology
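The slope discussed on this slide can be estimated by least-squares regression on log-transformed counts. The sketch below assumes a model of the form n_group = c × N^slope, where N is the total number of domains in a proteome. The proteome sizes and domain counts are synthetic values invented for illustration, and the slide does not state which fitting procedure was actually used.

```python
import math


def power_law_slope(total_domains, group_domains):
    """Least-squares slope of log10(group_domains) against log10(total_domains)."""
    xs = [math.log10(x) for x in total_domains]
    ys = [math.log10(y) for y in group_domains]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic proteome sizes (total domains per proteome); not real data.
totals = [1_000, 3_000, 10_000, 30_000, 100_000]

# A group that keeps pace with genome growth: a constant 2% of all domains.
in_step = [0.02 * n for n in totals]

# A group that is preferentially duplicated as proteomes grow.
expanding = [0.001 * n ** 1.5 for n in totals]

print(f"in step with genome growth: slope = {power_law_slope(totals, in_step):.2f}")
print(f"preferentially duplicated:  slope = {power_law_slope(totals, expanding):.2f}")
```

With real proteomes the points scatter around the line, so the fitted slope would normally be reported with an uncertainty.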


Page 25: Med264 Structural Bioinformatics


Why are the Power Laws Different for Each Superkingdom?

• Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are correlated to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen

• We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom

• This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments


Page 26: Med264 Structural Bioinformatics

Do the Metallomes Contain Further Support for this Hypothesis?

Superkingdom   Fold family                   % Fe-binding   Fe bound by   O2
Eukarya        Cytochrome P450               0.44 ± 0.48    heme          yes
Eukarya        Cytochrome c3-like            0.13 ± 0.3     heme          no
Eukarya        Cytochrome b5                 0.12 ± 0.09    heme          no
Eukarya        Purple acid phosphatase       0.11 ± 0.08    amino         no
Eukarya        Penicillin synthase-like      0.07 ± 0.1     amino         yes
Eukarya        Hypoxia-inducible factor      0.07 ± 0.04    amino         yes
Eukarya        Di-heme elbow motif           0.06 ± 0.01    heme          no
Archaea        4Fe-4S ferredoxins            1.80 ± 0.7     Fe-S          no
Archaea        MoCo biosynthesis proteins    1.60 ± 0.3     Fe-S          no
Archaea        Heme-binding PAS domain       1.10 ± 1.0     heme          no
Archaea        HemN                          0.80 ± 0.20    Fe-S          1
Archaea        α-helical ferredoxin          0.60 ± 0.16    Fe-S          no
Archaea        biotin synthase               0.55 ± 0.1     Fe-S          no
Archaea        ROO N-terminal domain-like    0.5 ± 0.1      amino         2
Bacteria       High potential iron protein   0.38 ± 0.25    Fe-S          no
Bacteria       Heme-binding PAS domain       0.3 ± 0.4      heme          1
Bacteria       MoCo biosynthesis proteins    0.21 ± 0.15    Fe-S          no
Bacteria       HemN                          0.2 ± 0.15     Fe-S          no
Bacteria       4Fe-4S ferredoxins            0.2 ± 0.2      Fe-S          no
Bacteria       cytochrome c                  0.14 ± 0.2     heme          no
Bacteria       α-helical ferredoxin          0.12 ± 0.09    Fe-S          no

Overall percent of Fe bound by

Superkingdom   Fe-S      heme      amino
Eukarya        21 ± 9    47 ± 19   32 ± 12
Archaea        68 ± 12   13 ± 14   19 ± 6
Bacteria       47 ± 11   22 ± 12   31 ± 16

1. Some, but not all, PAS domains actually sense oxygen
2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway

Page 27: Med264 Structural Bioinformatics

e- Transfer Proteins
Same Broad Function, Same Metal, Different Chemistry

Induced by the Environment?

Fe-S clusters
• Fe bound by S
• Cluster held in place by Cys
• Generally negative reduction potentials
• Very susceptible to oxidation

Cytochromes
• Fe bound by heme (and amino-acids)
• Generally positive reduction potentials
• Less susceptible to oxidation

Page 28: Med264 Structural Bioinformatics


Hypothesis

• Emergence of cyanobacteria changed oxygen concentrations

• Impacted relative metal ion concentrations in the ocean

• Organisms evolved to use these metals in new ways, giving rise to new biological processes, e.g., complex signaling

• This in turn further impacted the environment

• Only protein structures could reveal such dependencies


Page 29: Med264 Structural Bioinformatics

What really happens when we take a drug?


Page 30: Med264 Structural Bioinformatics

Our Motivation

• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive

Collins and Workman 2006 Nature Chemical Biology 2 689-700

Page 31: Med264 Structural Bioinformatics

A Reverse Engineering Approach to Drug Discovery Across Gene Families

1. Characterize ligand binding site of primary target (Geometric Potential)
2. Identify off-targets by ligand binding site similarity (Sequence order independent profile-profile alignment)
3. Extract known drugs or inhibitors of the primary and/or off-targets
4. Search for similar small molecules
5. Dock molecules to both primary and off-targets
6. Statistical analysis of docking score correlations

Xie and Bourne 2009 Bioinformatics 25(12) 305-312
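The last step of this workflow, comparing docking scores between the primary target and an off-target, could look something like the sketch below. It assumes the Pearson correlation coefficient as the statistic, which the slide does not specify, and the docking scores are hypothetical numbers invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical docking scores for the same five ligands against two binding
# sites (more negative = better predicted binding). Not real data.
primary_target = [-9.1, -8.4, -7.9, -6.5, -5.2]
off_target = [-8.7, -8.0, -7.1, -6.9, -4.8]

r = pearson(primary_target, off_target)
print(f"docking score correlation: r = {r:.2f}")
```

A high correlation across a ligand set would suggest that the two sites prefer similar molecules, which is the signal this approach looks for when proposing off-targets.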

Page 32: Med264 Structural Bioinformatics

Characterization of the Ligand Binding Site - The Geometric Potential

• Initially assign each Cα atom a value that is the distance to the environmental boundary
• Update the value with those of surrounding Cα atoms dependent on distances and orientation – atoms within a 10 Å radius define i

$$GP = P + \sum_{i \in \text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}$$

Conceptually similar to hydrophobicity or electrostatic potential that is dependent on both global and local environments

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
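A minimal sketch of the update rule described on this slide, reading the formula as GP = P + Σ P_i / (D_i + 1.0) × (cos(α_i) + 1.0) / 2.0 over neighboring Cα atoms. The interpretation of P as the atom's initial boundary distance, D_i as the distance to neighbor i and α_i as an orientation angle follows the bullets above; the function name, argument layout and sample values are assumptions made for illustration.

```python
import math

NEIGHBOR_RADIUS = 10.0  # Å, the radius quoted on the slide


def geometric_potential(p, neighbors):
    """Update an atom's initial value p with contributions from its neighbors.

    neighbors: iterable of (p_i, d_i, alpha_i) tuples, where p_i is the
    neighbor's initial value, d_i its distance in Å and alpha_i the
    orientation angle in radians (assumed convention).
    """
    total = p
    for p_i, d_i, alpha_i in neighbors:
        if d_i > NEIGHBOR_RADIUS:
            continue  # only atoms within the radius contribute
        distance_term = p_i / (d_i + 1.0)               # nearer neighbors weigh more
        orientation_term = (math.cos(alpha_i) + 1.0) / 2.0  # ranges from 0 to 1
        total += distance_term * orientation_term
    return total


# Hypothetical atom with three neighbors: one aligned, one opposed, one too far.
gp = geometric_potential(
    2.0,
    [(4.0, 3.0, 0.0), (4.0, 3.0, math.pi), (4.0, 12.0, 0.0)],
)
print(gp)
```

The orientation term scales each neighbor's contribution between 0 (pointing away) and 1 (aligned), so only the first neighbor in this example adds to the starting value.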

Page 33: Med264 Structural Bioinformatics

Discrimination Power of the Geometric Potential For Residue Clusters

[Figure: distributions for binding site and non-binding site residue clusters plotted against geometric potential (scale 0 to 100)]

• Geometric potential can distinguish binding and non-binding sites

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9

Page 34: Med264 Structural Bioinformatics

Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm

[Diagram: Structure A and Structure B, each drawn as a graph of residues L, E, R, V, K, D, L]

• Build an associated graph from the graph representations of two structures being compared. Each of the nodes is assigned with a weight from the similarity matrix

• The maximum-weight clique corresponds to the optimum alignment of the two structures

Xie and Bourne 2008 PNAS, 105(14) 5441
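The two bullets above can be illustrated with a toy association graph. In this sketch each node pairs a residue of structure A with a residue of structure B, two nodes are connected when the corresponding inter-residue distances agree, and the heaviest clique is taken as the alignment. The coordinates, the weight of 1.0 for identical residue types, the 0.5 Å tolerance and the exhaustive search are all assumptions chosen to keep the example small; they stand in for the similarity matrix and solver of the published method.

```python
import math

# Hypothetical residues: (amino acid, coordinates in Å). Structure B contains
# the same L-E-R triangle as A, in a different sequence order, plus a decoy L.
STRUCT_A = [("L", (0, 0, 0)), ("E", (3, 0, 0)), ("R", (0, 4, 0))]
STRUCT_B = [("R", (10, 4, 0)), ("L", (10, 0, 0)), ("E", (13, 0, 0)), ("L", (20, 20, 0))]
TOLERANCE = 0.5  # Å, assumed distance-agreement threshold

# Nodes of the association graph: residue pairs of matching type, weight 1.0.
nodes = [(i, j) for i, (aa_a, _) in enumerate(STRUCT_A)
         for j, (aa_b, _) in enumerate(STRUCT_B) if aa_a == aa_b]
weight = {node: 1.0 for node in nodes}


def adjacent(u, v):
    """Two pairings are compatible if they preserve the inter-residue distance."""
    (i, j), (k, l) = u, v
    if i == k or j == l:
        return False  # a residue may be matched only once
    d_a = math.dist(STRUCT_A[i][1], STRUCT_A[k][1])
    d_b = math.dist(STRUCT_B[j][1], STRUCT_B[l][1])
    return abs(d_a - d_b) <= TOLERANCE


def max_weight_clique(nodes, weight):
    """Exhaustive search; fine for toy graphs, exponential in general."""
    best, best_weight = [], 0.0

    def extend(clique, candidates, total):
        nonlocal best, best_weight
        if total > best_weight:
            best, best_weight = clique, total
        for idx, v in enumerate(candidates):
            rest = [u for u in candidates[idx + 1:] if adjacent(u, v)]
            extend(clique + [v], rest, total + weight[v])

    extend([], list(nodes), 0.0)
    return best, best_weight


alignment, score = max_weight_clique(nodes, weight)
print(alignment, score)
```

Because edges depend only on geometry, the alignment ignores the order of residues along the chain, which is what makes the comparison sequence-order independent.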

Page 35: Med264 Structural Bioinformatics

Similarity Matrix of Alignment

Chemical Similarity

• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)

• Amino acid chemical similarity matrix

Evolutionary Correlation

• Amino acid substitution matrix such as BLOSUM45

• Similarity score between two sequence profiles

$$d = \sum_{i=1}^{20} f_a^i S_b^i + \sum_{i=1}^{20} f_b^i S_a^i$$

f_a, f_b are the 20 amino acid target frequencies of profile a and b, respectively
S_a, S_b are the PSSM of profile a and b, respectively

Xie and Bourne 2008 PNAS, 105(14) 5441
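As a worked example, the profile-profile score can be read as d = Σ f_a^i S_b^i + Σ f_b^i S_a^i, summed over amino acid types i. The sketch below applies that reading to a single pair of profile columns. To stay short it uses a hypothetical three-letter alphabet instead of 20 amino acids, and the frequencies and PSSM scores are invented.

```python
def profile_similarity(f_a, s_a, f_b, s_b):
    """Score two profile columns: frequencies of one against PSSM scores of the other."""
    a_against_b = sum(fa * sb for fa, sb in zip(f_a, s_b))
    b_against_a = sum(fb * sa for fb, sa in zip(f_b, s_a))
    return a_against_b + b_against_a


# Hypothetical columns over a reduced three-letter alphabet (not real data).
f_a = [0.5, 0.5, 0.0]   # target frequencies, profile a
s_a = [2, 1, -1]        # PSSM scores, profile a
f_b = [1.0, 0.0, 0.0]   # target frequencies, profile b
s_b = [3, -2, 0]        # PSSM scores, profile b

d = profile_similarity(f_a, s_a, f_b, s_b)
print(d)
```

The score is symmetric in a and b, so the result does not depend on which structure is treated as the query.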

Page 36: Med264 Structural Bioinformatics

We are particularly interested in applying these techniques to neglected diseases

Page 37: Med264 Structural Bioinformatics

The Problem with Tuberculosis

• One third of global population infected
• 1.7 million deaths per year
• 95% of deaths in developing countries
• Anti-TB drugs hardly changed in 40 years
• MDR-TB and XDR-TB pose a threat to human health worldwide
• Development of novel, effective and inexpensive drugs is an urgent priority

Page 38: Med264 Structural Bioinformatics

The TB-Drugome

1. Determine the TB structural proteome

2. Determine all known drug binding sites from the PDB

3. Determine which of the sites found in 2 exist in 1

4. Call the result the TB-drugome

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Page 39: Med264 Structural Bioinformatics

1. Determine the TB Structural Proteome

[Venn diagram: TB proteome 3,996 proteins; solved structures 284; homology models 1,446; remaining 2,266]

• High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
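The coverage figures on this slide can be checked from its four numbers. The sketch assumes that 3,996 is the size of the TB proteome, 284 the number of proteins with solved structures and 1,446 the number covered only by homology models; that assignment is inferred from the arithmetic, not stated explicitly on the slide.

```python
TB_PROTEOME = 3996      # assumed: total proteins in the TB proteome
SOLVED = 284            # assumed: proteins with solved structures
HOMOLOGY_MODELS = 1446  # assumed: proteins covered only by homology models

coverage_solved = 100 * SOLVED / TB_PROTEOME
coverage_with_models = 100 * (SOLVED + HOMOLOGY_MODELS) / TB_PROTEOME
uncovered = TB_PROTEOME - SOLVED - HOMOLOGY_MODELS

print(f"solved structures only: {coverage_solved:.1f}%")
print(f"with homology models:   {coverage_with_models:.1f}%")
print(f"proteins without structural coverage: {uncovered}")
```

Under this reading the fourth number on the slide, 2,266, is the part of the proteome with no structural coverage.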

Page 40: Med264 Structural Bioinformatics

2. Determine all Known Drug Binding Sites in the PDB

• Searched the PDB for protein crystal structures bound with FDA-approved drugs

• 268 drugs bound in a total of 931 binding sites

[Bar chart: No. of drug binding sites per drug; labelled drugs include methotrexate, chenodiol, alitretinoin, conjugated estrogens, darunavir and acarbose]

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Page 41: Med264 Structural Bioinformatics

Map 2 onto 1 – The TB-Drugome

http://funsite.sdsc.edu/drugome/TB/

Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).

Page 42: Med264 Structural Bioinformatics

From a Drug Repositioning Perspective

• Similarities between drug binding sites and TB proteins are found for 61/268 drugs

• 41 of these drugs could potentially inhibit more than one TB protein

[Bar chart: No. of potential TB targets per drug; labelled drugs include raloxifene, alitretinoin, conjugated estrogens & methotrexate, ritonavir, testosterone, levothyroxine and chenodiol]

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Page 43: Med264 Structural Bioinformatics

Top 5 Most Highly Connected Drugs

Drug: levothyroxine
Intended targets: transthyretin, thyroid hormone receptor α & β-1, thyroxine-binding globulin, mu-crystallin homolog, serum albumin
Indications: hypothyroidism, goiter, chronic lymphocytic thyroiditis, myxedema coma, stupor
No. of connections: 14
TB proteins: adenylyl cyclase, argR, bioD, CRP/FNR trans. reg., ethR, glbN, glbO, kasB, lrpA, nusA, prrA, secA1, thyX, trans. reg. protein

Drug: alitretinoin
Intended targets: retinoic acid receptor RXR-α, β & γ, retinoic acid receptor α, β & γ-1&2, cellular retinoic acid-binding protein 1&2
Indications: cutaneous lesions in patients with Kaposi's sarcoma
No. of connections: 13
TB proteins: adenylyl cyclase, aroG, bioD, bpoC, CRP/FNR trans. reg., cyp125, embR, glbN, inhA, lppX, nusA, pknE, purN

Drug: conjugated estrogens
Intended targets: estrogen receptor
Indications: menopausal vasomotor symptoms, osteoporosis, hypoestrogenism, primary ovarian failure
No. of connections: 10
TB proteins: acetylglutamate kinase, adenylyl cyclase, bphD, CRP/FNR trans. reg., cyp121, cysM, inhA, mscL, pknB, sigC

Drug: methotrexate
Intended targets: dihydrofolate reductase, serum albumin
Indications: gestational choriocarcinoma, chorioadenoma destruens, hydatidiform mole, severe psoriasis, rheumatoid arthritis
No. of connections: 10
TB proteins: acetylglutamate kinase, aroF, cmaA2, CRP/FNR trans. reg., cyp121, cyp51, lpd, mmaA4, panC, usp

Drug: raloxifene
Intended targets: estrogen receptor, estrogen receptor β
Indications: osteoporosis in post-menopausal women
No. of connections: 9
TB proteins: adenylyl cyclase, CRP/FNR trans. reg., deoD, inhA, pknB, pknE, Rv1347c, secA1, sigC
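The connection counts in this table can be reproduced by treating the drug-protein pairs as a small bipartite network. The sketch below uses only the TB protein lists given on this slide; the variable names are illustrative, and the second tally (how many of these five drugs hit each protein) is an extra view that the slide itself does not show.

```python
from collections import Counter

# Drug -> TB proteins with similar binding sites, as listed on the slide.
DRUG_TO_TB_PROTEINS = {
    "levothyroxine": {
        "adenylyl cyclase", "argR", "bioD", "CRP/FNR trans. reg.", "ethR", "glbN",
        "glbO", "kasB", "lrpA", "nusA", "prrA", "secA1", "thyX", "trans. reg. protein",
    },
    "alitretinoin": {
        "adenylyl cyclase", "aroG", "bioD", "bpoC", "CRP/FNR trans. reg.", "cyp125",
        "embR", "glbN", "inhA", "lppX", "nusA", "pknE", "purN",
    },
    "conjugated estrogens": {
        "acetylglutamate kinase", "adenylyl cyclase", "bphD", "CRP/FNR trans. reg.",
        "cyp121", "cysM", "inhA", "mscL", "pknB", "sigC",
    },
    "methotrexate": {
        "acetylglutamate kinase", "aroF", "cmaA2", "CRP/FNR trans. reg.", "cyp121",
        "cyp51", "lpd", "mmaA4", "panC", "usp",
    },
    "raloxifene": {
        "adenylyl cyclase", "CRP/FNR trans. reg.", "deoD", "inhA", "pknB", "pknE",
        "Rv1347c", "secA1", "sigC",
    },
}

# Number of TB proteins connected to each drug.
connections = {drug: len(proteins) for drug, proteins in DRUG_TO_TB_PROTEINS.items()}

# Number of these five drugs connected to each TB protein.
protein_hits = Counter(p for proteins in DRUG_TO_TB_PROTEINS.values() for p in proteins)

for drug in sorted(connections, key=connections.get, reverse=True):
    print(f"{drug}: {connections[drug]} connections")
print(protein_hits.most_common(3))
```

Proteins hit by several drugs, such as the CRP/FNR transcriptional regulator here, are natural places to look when asking whether a binding site is promiscuous.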

Page 44: Med264 Structural Bioinformatics

Systems Pharmacology

Chang et al. 2010 PLoS Comp. Biol. 6(9): e1000938 & Chang et al. 2013 BMC Systems Biology 7:102

Page 45: Med264 Structural Bioinformatics

A closing note…


Page 46: Med264 Structural Bioinformatics

Your Social Responsibility

Josh Sommer and Chordoma Disease

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation#fullprogram

Page 47: Med264 Structural Bioinformatics

Questions?

[email protected]
