protein structure space patrice koehl computer science and genome center koehl

38
Protein Structure Space Patrice Koehl Computer Science and Genome Center http://www.cs.ucdavis.edu/~koehl/

Post on 21-Dec-2015

222 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Structure Space

Patrice Koehl

Computer Science and Genome Center

http://www.cs.ucdavis.edu/~koehl/

Page 2: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

From Sequence to Function

KKAVINGEQIRSISDLHQTLKKWELALPEYYGENLDALWDCLTGVEYPLVLEWRQFEQSKQLTENGAESVLQVFREAKAEGCDITI

Sequence

Structure

Function

ligand

Page 3: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Structure Space1CTF 1TIM 1K3R

1A1O 1NIK 1AON

68 AA 247 AA 268 AA

384 AA 4504 AA 8337 AA

Page 4: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Outline

•Protein Shape DescriptorsDifferential Geometry Tools

•Classifying Proteins

The Shapes of Protein Structures

•Protein Structure SpaceDimension?

•Complexity of Protein Structures

Are Proteins 3D, or 1D objects?

Page 5: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Outline

•Protein Shape DescriptorsDifferential Geometry Tools

•Classifying Proteins

The Shapes of Protein Structures

•Protein Structure SpaceDimension?

•Complexity of Protein Structures

Are Proteins 3D, or 1D objects?

Page 6: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Classification of Protein Structure: CATH

C

A

T

AlphaMixed AlphaBeta

Beta

Sandwich

Tim BarrelOther Barrel

Super RollBarrel

Page 7: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Structure Space

Test set2,930 proteins out of 23,000 proteins in PDBNo sequence similarity (Fasta E-value < e-4)

Reference structural similarity defined from CATH769 folds

104,000 pairs of similar structures out of 4,600,000 pairs

Performance measure: ROC curve(Receiver Operating Characteristic)

Page 8: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

⎥⎥⎥

⎢⎢⎢

⎡=

0...

...0...

...0

1

1

N

N

d

d

D XXG T= X

Distance Matrix Metric Matrix Points in Space

Projecting Protein Structure Space

Page 9: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

⎥⎥⎥

⎢⎢⎢

⎡=

0...1

...01

110

D

⎥⎥⎥

⎢⎢⎢

⎡=

0...0

...01

010

D

k

k

k

k

Class

Fold

Projecting Protein Structure Space

Page 10: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Structure Similarity

Root mean square distance: cRMS:

( ) ∑=

−−=N

iii TRba

NBAcRMS

1

21,

N: number of equivalent atoms between A and BR, T: rigid transformation that minimizes cRMS.

Page 11: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Eigenvalues of the Metric Matrix:

Protein Structure Classes

Measure of Structure Similarity:cRMS after Optimal Superposition(Structal)

Page 12: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

A Picture of the Protein Structure Space

Proteins

Proteins

α and Proteins

Page 13: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

A Picture of the Protein Structure Space

1a81G2

1sfcK0

1repC21bdo00

2bi6H0

Proteins

Proteins

α and Proteins

Page 14: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Outline

•Protein Shape DescriptorsDifferential Geometry Tools

•Classifying Proteins

The Shapes of Protein Structures

•Protein Structure SpaceDimension?

•Complexity of Protein Structures

Are Proteins 3D, or 1D objects?

Page 15: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

0 10 20 30 40 50 60 70 80 90 100

Rate of true negatives (%)

100

90

80

70

60

50

30

20

Rat

e of

true

pos

itive

s (%

)

40

10

Random measureArea = 0.5

“Perfect” measureArea = 1.0

Protein Fold SpaceROC Analysis

(Receiver Operating Characteristic)

Page 16: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Fold SpaceROC Analysis

(Receiver Operating Characteristic)

True positives

pairs of proteins that belong to the same T class of CATH

True negatives

pairs of proteins that belong to the same C class, but not thesame T class.

Page 17: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Rate of true negatives (%)

Rat

e of

true

pos

itive

s (%

)

Protein Fold Space

Fasta: 0.54

CATH Class : 0.51

CATH Fold20 : 0.98 Fold20: first 20coordinates derivedfrom the CATHfold matrix

CATH class: first 3coordinates derivedfrom the CATHclass matrix

Page 18: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Rate of true negatives (%)

Rat

e of

true

pos

itive

s (%

)

Protein Fold Space

Fasta: 0.54Fasta: 0.54

Structal: 0.88

Page 19: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Structure Features

()zyxzyxˆsin2),(),,( dR =

{ }),,(min)(),(

zyxxzy

R=ρ

y

x

z

R(x,y,z)

{ })(min xx

ρ=Δ

Global radius of curvature:

Thickness:

(Gonzalez & Maddocks, PNAS, 1999, 96:4769)

Page 20: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Thickness of a protein structure

= 2.60 Ǻ

Page 21: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Curvature Feature Vector

p

zyxpp dCdCdCzyxR

U/1

),,(

1⎟⎟⎠

⎞⎜⎜⎝

⎛= ∫∫∫

[ ]543215 UUUUUC =

Page 22: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Performance of the Curvature Feature Vector

Curvature vector performs better than fasta.

Needs more features to match Structal.

Rate of true negatives (%)

Rat

e of

true

pos

itive

s (%

)

Fasta: 0.54

Structal: 0.88

C5: 0.65

Page 23: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Structure Features: Writhing

∫∫=2 2121 ),(

4

1dtdtttWr ω

π

Fain and Røgen, PNAS, 100: 119 (2003)

( )3

21

121121

)()(

)('),()(),('det),(

tt

tttttt

γγ

γγγγω

−=

Writhing Number

+ -

Sign of Crossing

(t1)

(t2)

[ ]|||| 12121110 WrWrWrWrW =

Writhe Feature Vector for Each Protein

1

Page 24: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Protein Structure Features: Writhing

W10 Writhe performs better than C5 Curvature

Rat

e of

true

pos

itive

s (%

)

Rate of true negatives (%)

Fasta: 0.54

C5: 0.65

Structal: 0.88

W10: 0.77

Page 25: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Outline

•Protein Shape DescriptorsDifferential Geometry Tools

•Classifying Proteins

The Shapes of Protein Structures

•Protein Structure SpaceDimension?

•Complexity of Protein Structures

Are Proteins 3D, or 1D objects?

Page 26: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Clustering Protein Fragments to Extract a Small Set of Representatives (a Library)

data clustereddata

library

(Simulated annealing K means)

Page 27: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Generating an approximate structure

Fragment library

A B C D

Page 28: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Generating an approximate structure

Fragment library

A B C D

Page 29: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Fragment library

Generating an approximate structure

A B C D

Page 30: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Fragment library

Generating an approximate structure

A B C D

Page 31: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Fragment library

Generating an approximate structure

A B C D

Structural Sequence:

AC

Page 32: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Fitting Protein Structures

better

100 fragments of length 50.91 Ǻ cRMS

50 fragments of length 72.78 Ǻ cRMS

Page 33: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Complexity(states/residue)

Ave

rage

cR

MS

dis

tan

ce

Longer fragments give better fit at same complexity

Fragment Size:7 residues6 residues5 residues4 residues

(Kolodny, Koehl, Guibas, Levitt, J. Mol. Biol.,323, 297 2002)

3

1

−= LNC N: number of fragmentsL: size of each fragment

Page 34: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Choosing the “right” library

Size L N such that Complexity=20

7 160000

6 8000

5 400

4 20

Page 35: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

cRMS model-experimental structure cRMS model-experimental structure

0.2 0.6 1.0 0.2 0.6 1.0

# of

str

uct

ure

sA Structural Alphabet for Protein Backbone

Pro

tein

siz

e

Fragment size: 4Number of fragment: 20

Page 36: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Structural Alphabet: Application to Structure Comparison

cRMS = 1Å

Page 37: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Collaborators• Michael Levitt

(Computational Biology)

Stanford University

• Marc Delarue

(Biophysics)

Institut Pasteur, Paris

• Rachel Kolodny

(Computer Science)

Columbia University

• Herbert Edelsbrunner

(Math/Computer Science)

Duke University

• Peter Roegen

(Math)

DTU, Denmark

• Joel Hass

(Math)

UC Davis

Page 38: Protein Structure Space Patrice Koehl Computer Science and Genome Center koehl

Thank You