A network-based representation of protein fold space

Spencer Bliven

Qualifying Examination 6/6/2011

Overview

1. Background & Motivation

2. Preliminary Research

3. Proposed Future Research

Fold Space

What protein folds are possible?

Discrete or Continuous? Both? Neither?

What portion of fold space is utilized by nature?

Long debated questions. Why?

Understanding of structure-function relationship

Protein design/engineering

Protein evolution

Classification

Previous Work

Orengo, Flores, Taylor, Thornton. Protein Eng (1993) vol. 6 (5) pp. 485-500

Holm and Sander. J Mol Biol (1993) vol. 233 (1) pp. 123-38

Holm and Sander. Science (1996) vol. 273 (5275) pp. 595-603

Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp. 247-60

Hou, Sims, Zhang, Kim. PNAS (2003) vol. 100 (5) pp. 2386-90

Taylor. Curr Opin Struct Biol (2007) vol. 17 (3) pp. 354-61

Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8

[Figure labels: α, α+β, β, α/β]

Why can we do better?

More structures

Sampling of globular folds “saturated”

Few novel folds being discovered

Geometric arguments for saturation of small protein folds

Recent all-vs-all computation

Cluster sequences to 40% identity

17,852 representatives (updated weekly)

189 million FATCAT rigid-body alignments

[Figure: PDB content growth chart, annotated 73503. Source: http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100, accessed 5/31/2011]

Structural Similarity Graph

Nodes: PDB chains, non-redundant to 40%

Edges: FATCAT-rigid alignments

“Significant” edges:

p < 0.001

Length > 25

Coverage > 50

Hierarchically cluster to reduce complexity in visualization
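
The graph construction described on this slide can be sketched as follows. This is a minimal illustration, not the code used for the talk: the record fields (`chain_a`, `chain_b`, `p_value`, `length`, `coverage`) are hypothetical names, the example records are made up, and treating the coverage threshold as a percentage is an assumption.

```python
from collections import defaultdict

# Thresholds as listed on the slide. Reading coverage as a percentage
# is an assumption; the slide gives only the bare number 50.
P_MAX = 0.001
MIN_LENGTH = 25
MIN_COVERAGE = 50


def is_significant(aln):
    """Return True if an alignment record passes all three thresholds."""
    return (
        aln["p_value"] < P_MAX
        and aln["length"] > MIN_LENGTH
        and aln["coverage"] > MIN_COVERAGE
    )


def build_graph(alignments):
    """Build an undirected adjacency map from the significant alignments."""
    graph = defaultdict(set)
    for aln in alignments:
        if is_significant(aln):
            a, b = aln["chain_a"], aln["chain_b"]
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical alignment records, invented for illustration only.
example_alignments = [
    {"chain_a": "A", "chain_b": "B", "p_value": 1e-4, "length": 120, "coverage": 80},
    {"chain_a": "A", "chain_b": "C", "p_value": 0.2, "length": 40, "coverage": 30},
]
graph = build_graph(example_alignments)
```

Only the first record passes all three filters, so the resulting graph has a single edge between `A` and `B`.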

[Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]

Agreement with SCOP

Class p < 10^-6

Fold p < 10^-7

Superfamily p < 10^-10

Continuity

Grishin. J Struct Biol (2001) vol. 134 (2-3) pp. 167-85

Skolnick claims ≤ 7 intermediates between any two proteins

We observe network diameter = 15

Can find interesting paths
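
Network diameter and paths between two structures can both be computed with breadth-first search. The sketch below assumes the graph is an adjacency map of node to neighbor set; the toy graph and its node names are hypothetical, and this is not necessarily how the diameter of 15 was obtained.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def diameter(graph):
    """Longest shortest path over all pairs of mutually reachable nodes."""
    return max(max(bfs_distances(graph, n).values()) for n in graph)


def shortest_path(graph, start, goal):
    """One shortest path from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical four-node chain A - B - C - D.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
path = shortest_path(toy, "A", "D")
intermediates = len(path) - 2  # nodes strictly between the two endpoints
```

A path of length d has d - 1 intermediates, which is the quantity to compare against a claimed bound on intermediates.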

Symmetry

Beta Propellers

[Figure labels: C4, C5, C6, C7]

Symmetry

Functionally important

Protein evolution (e.g. beta-trefoil)

DNA binding

Allosteric regulation

Cooperativity

Widespread (~20% of proteins)

Focus of algorithmic work

[Figure labels: FGF-1 (Lee & Blaber. PNAS 2011); TATA Binding Protein (1TGH); Hemoglobin (4HHB)]

Cross-class example

3GP6.A: PagP, modifies lipid A; f.4.1 (transmembrane beta-barrel)

1KT6.A: Retinol-binding protein; b.60.1 (Lipocalins)

Summary of Preliminary Research

Calculated all-vs-all alignment

Prlić A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics (2010) vol. 26 (23) pp. 2983-2985

Built network of significant alignments

Approximately matches SCOP classifications

Improved structural alignment algorithms

Identify symmetry, circular permutations, topology-independent alignments

Discussed more in report

Future Research

Improve the network

1. Improve all-vs-all comparison algorithm

2. Tune parameters during graph generation

Annotate the network & draw biological inferences

3. Annotate nodes with functional information

4. Compare with other networks

Create new networks

5. Enhance structural comparison algorithms

1. Improve all-vs-all comparison algorithm

Need domain decomposition

Use Combinatorial Extension (CE)

2. Tune parameters during graph generation

Don’t use p-values

Shouldn’t compare p-values, statistically*

Not normalized by secondary structure

Not accurate due to multiple testing problem

Use TM-score

RMSD, normalized to the alignment length

Determine optimal thresholds for determining “significance”

For instance, train an SVM

* Technically ok here, since one-to-one with the FATCAT score
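
For reference, the TM-score as published by Zhang & Skolnick (2004) is a sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below assumes the per-residue distances after superposition are already available; the function names are hypothetical, and the floor of 0.5 on d0 is an assumption of this sketch for short chains.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in Angstroms."""
    if target_length <= 15:
        raise ValueError("target_length must be greater than 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Floor for very short chains; an assumption of this sketch.
    return max(d0, 0.5)


def tm_score(distances, target_length):
    """TM-score from aligned-pair distances (Angstroms) after superposition.

    distances holds one value per aligned residue pair; unaligned target
    residues contribute nothing, so partial alignments score lower.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical cases on a 100-residue target.
perfect = tm_score([0.0] * 100, 100)      # every residue aligned exactly
half_aligned = tm_score([0.0] * 50, 100)  # only half the target aligned
```

Because the sum is divided by the target length and d0 grows with length, the score is comparable across proteins of different sizes, which is the property a single graph-wide threshold needs.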

FATCAT p-value by Class

Performs poorly on all-alpha in “twilight zone”

Terrible on membrane proteins

Probably reflects non-structural considerations in SCOP assignment

3. Annotate nodes with functional information

SCOP/CATH classifications

GO terms

Metal binding

Ligand binding

Symmetry
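
One simple way to use such annotations is to attach a label to each node and ask how often an edge joins two nodes with the same label. The sketch below is only an illustration: the label map, the toy graph, and the function name are hypothetical, and this is not the statistical test behind the SCOP agreement p-values.

```python
def same_label_edge_fraction(graph, labels):
    """Fraction of annotated edges whose endpoints share a label.

    graph is a symmetric adjacency map (node -> set of neighbors).
    Edges with an unannotated endpoint are skipped.
    """
    same = 0
    total = 0
    for node, nbrs in graph.items():
        for nbr in nbrs:
            # node < nbr counts each undirected edge exactly once.
            if node < nbr and node in labels and nbr in labels:
                total += 1
                if labels[node] == labels[nbr]:
                    same += 1
    return same / total if total else float("nan")


# Hypothetical graph and SCOP-class-style labels, invented for illustration.
toy_graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
toy_labels = {"A": "b", "B": "b", "C": "f"}
fraction = same_label_edge_fraction(toy_graph, toy_labels)
```

The same function works for any categorical annotation, such as a GO term or a metal-binding flag, by swapping the label map.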

[Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]

4. Compare with other networks

Define other types of network over the set of protein representatives

Protein-protein interactions

Co-expression

Correlate to the structural similarities
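
A first-pass way to correlate two networks defined over the same nodes is the Jaccard index of their edge sets. This is a sketch under that assumption, with hypothetical toy networks; the talk does not specify which correlation measure is intended.

```python
def edge_set(graph):
    """Undirected edges of an adjacency map, as a set of frozensets."""
    return {
        frozenset((a, b))
        for a, nbrs in graph.items()
        for b in nbrs
        if a != b
    }


def edge_jaccard(g1, g2):
    """Shared edges divided by all edges present in either network."""
    e1, e2 = edge_set(g1), edge_set(g2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0


# Hypothetical networks over the same three representatives.
structural = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
interaction = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
overlap = edge_jaccard(structural, interaction)
```

Here the two networks share one edge (A-B) out of three distinct edges, so the overlap is one third.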

[Figure labels: Structural similarity; Protein-protein interaction]

5. Enhance structural comparison algorithms

Improve automated pseudo-symmetry detection

Find topology-independent relationships

[Figure label: C3]

Summary

Fold space as network

Improve network creation

Annotate network with functional information

Improve structural similarity detection

Acknowledgments

Bourne Lab

Philip Bourne

Andreas Prlić

Lab & PDB members

Qualifying Exam Committee

Ruben Abagyan

Patricia Jennings

Andy McCammon

Collaborators

Philippe Youkharibache

Jean-Pierre Changeux

Rotation Advisors

Pavel Pevzner

Philip Bourne

José Onuchic & Pat Jennings

Mike MacCoss

Virgil Woods