Quality and effectiveness of protein structure models

Anna Tramontano (anna.tramontano@uniroma1.it)

DIMACS 2006


Page 1: Anna . Tramontano @uniroma1.it

DIMACS 2006

Quality and effectiveness of protein structure models

anna.tramontano@uniroma1.it

Page 2: Anna . Tramontano @uniroma1.it

The paradigm

Sequence

Molecular structure

Molecular function

Page 3: Anna . Tramontano @uniroma1.it

Detecting homology

Protein     No.
FLAV_CLOBE  1    A . . . I V Y W S G T G N T E K M A E
CYSJ_THIRO  2    A . I T I L F G S Q T G N A K A V A E

Page 4: Anna . Tramontano @uniroma1.it

Proteins evolve

[Plot: r.m.s.d. (y axis, 0.00 to 3.50) against fraction sequence identity after structural superposition (x axis, 1.0 to 0)]

$\mathrm{r.m.s.d.} = \left[ (1/N) \sum d^2 \right]^{1/2}$

Chothia and Lesk, EMBO J., 1986
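The r.m.s.d. definition on this slide can be written out as a short Python sketch. It assumes the two structures are already superposed and that atoms are paired by position in the lists; the function name `rmsd` and the toy coordinates are hypothetical and only illustrate the formula.

```python
import math

def rmsd(coords_a, coords_b):
    """r.m.s.d. = [(1/N) * sum(d^2)]^(1/2) over N paired, already superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # d^2 is the squared distance between the two paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates: every atom is displaced by exactly 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, reference))
```

The superposition step itself is deliberately left out; in practice the value depends on how the two structures are superposed before the distances are measured.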

Page 5: Anna . Tramontano @uniroma1.it

Comparative modelling

AVGIFRAAVCTRGVAKAVDFVP

AVGIFRAAVCTRGVAKAVDFVP
| || | | || |||||  ||
AIGIWRSATCTKGVAKA--FVA

+

If the alignment is correct, we can use the Chothia and Lesk relationship to predict the expected quality of the model.
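As a rough illustration of that idea, the sketch below computes the fraction of identical residues in the alignment shown on this slide and feeds it into an exponential relationship. The functional form and the constants 0.40 and 1.87 are quoted from memory of Chothia and Lesk (1986), where they describe core main-chain atoms; they do not appear on the slide and should be treated as assumptions. The function names are hypothetical.

```python
import math

def fraction_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over the aligned, gap-free columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    return sum(a == b for a, b in pairs) / len(pairs)

def expected_rmsd(identity, a=0.40, b=1.87):
    """Assumed relationship: rmsd = a * exp(b * H), with H = 1 - identity."""
    return a * math.exp(b * (1.0 - identity))

target = "AVGIFRAAVCTRGVAKAVDFVP"
template = "AIGIWRSATCTKGVAKA--FVA"
identity = fraction_identity(target, template)
print(identity, expected_rmsd(identity))
```

The estimate only makes sense if the alignment is right, which is exactly the condition stated on the slide: a misaligned region contributes errors that no identity-based curve can anticipate.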

Page 6: Anna . Tramontano @uniroma1.it

Fold recognition

Orengo, Curr. Op. Str. Biol, 1994

AVGIFRAAVCTRGVAKAVDFVPVESMETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGAPVTYSTYGKFLADGGCSGGAYDIIICDECHSTDSTTILGIGTVLDQAETAGARLVVLATATPPGSVTVPHPNIEEVALSNTGEIP

Score and select model

Page 7: Anna . Tramontano @uniroma1.it

Fragment based

Bystroff and Baker, JMB, 1998

AVGIFRAAVCTRGVAKAVDFVP…

AVGIFR AAVCTR GVAKAVDF

Page 11: Anna . Tramontano @uniroma1.it

Fragment based

Bystroff and Baker, JMB, 1998

AVGIFRAAVCTRGVAKAVDFVP…

AVGIFR AAVCTR GVAKAVDF

Score and select model

Page 12: Anna . Tramontano @uniroma1.it

The evaluation

Moult et al., Proteins, 1995

CASP: Critical assessment of techniques for protein structure prediction

AVSRAFTRAFTAAFDGHTYIPK

Page 13: Anna . Tramontano @uniroma1.it

The evaluation

Tramontano, NSB, 2003

[Chart: Models, Targets and Groups; x axis 1 to 6; value axes 0 to 300, 0 to 70 and 0 to 30000]

Page 14: Anna . Tramontano @uniroma1.it

The evaluation

Cozzetto and Tramontano, Proteins, 2004

CASP4 CASP5 CASP6: Best models

[Plot: y axis labelled "Max P.A L0" (20 to 120), x axis 0 to 80; series casp4, casp5 and casp6]

Page 15: Anna . Tramontano @uniroma1.it

State of the art

http://predictioncenter.gov

Moult et al., Proteins, 2005.

Page 16: Anna . Tramontano @uniroma1.it

State of the art

http://www.caspur.it/PMDB

Castrignanò et al., NAR, 2006.

Page 17: Anna . Tramontano @uniroma1.it

Structural genomics

Page 18: Anna . Tramontano @uniroma1.it

Molecular replacement

Protein preparation

Protein crystallization

Diffraction data measurements

Phase estimation

Model building

Page 19: Anna . Tramontano @uniroma1.it

Molecular replacement

Model

Rotation search

Translation search

?

Page 20: Anna . Tramontano @uniroma1.it

Molecular replacement

Completely automatic procedure:

CASP models

MolRep (10x10)

AMoRe (20)

RefMac (10)

ArpWarp

Page 21: Anna . Tramontano @uniroma1.it

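Molecular replacement

Giorgetti et al., Bioinformatics, 2005

[Plot: axis ticks from 40 to 100, with a question mark on the plot]

GDT-TS (distance based measure):

$\text{GDT-TS} = [N_{CA}(1\,\text{Å}) + N_{CA}(2\,\text{Å}) + N_{CA}(4\,\text{Å}) + N_{CA}(8\,\text{Å})]/4$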
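The GDT-TS formula on this slide can be sketched in a few lines of Python. The sketch assumes that NCA(x Å) means the percentage of Cα atoms lying within x Å of their position in the experimental structure, and that the per-residue distances come from a single superposition given as input. As far as I know, the real GDT procedure searches over many superpositions for each cutoff, so this is a simplification. The function name and the example distances are hypothetical.

```python
def gdt_ts(distances, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Average, over the cutoffs, of the percentage of C-alpha atoms within each cutoff."""
    if not distances:
        raise ValueError("need at least one C-alpha distance")
    n = len(distances)
    # One percentage per cutoff: how many C-alpha atoms deviate by at most t angstroms
    percentages = [100.0 * sum(d <= t for d in distances) / n for t in thresholds]
    return sum(percentages) / len(thresholds)

# Hypothetical model-to-target C-alpha distances, in angstroms
ca_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
print(gdt_ts(ca_distances))
```

Unlike r.m.s.d., a few badly placed residues cannot dominate this score, because each residue only counts as inside or outside a cutoff.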

Page 22: Anna . Tramontano @uniroma1.it

Molecular replacement

Giorgetti et al., submitted

What if we don’t know the quality of the model?

What if we don’t know how to build models?

Page 23: Anna . Tramontano @uniroma1.it

Molecular replacement

Giorgetti et al., submitted

ACTFGARTEADEASRTFCGAVHIGFRLPMNHTYWPLYHMVCS…

Structure factors

Page 24: Anna . Tramontano @uniroma1.it

Molecular replacement

60% success rate

Page 25: Anna . Tramontano @uniroma1.it

Molecular replacement

60% success rate

If one of the retrieved models works, the procedure is successful

Page 26: Anna . Tramontano @uniroma1.it

Function prediction

molecular: catalytic activity

biological: blood coagulation

cellular: extracellular

Page 27: Anna . Tramontano @uniroma1.it

Moult et al., Proteins, 1995

AVSRAFTRAFTAAFDGHTYIPK

The experiment

?

Page 28: Anna . Tramontano @uniroma1.it

Function prediction

Scheme of the experiment

Collect known info on targets

Ask people to provide ADDITIONAL information

Compare predictions

Is there a consensus?

Once the structure is known, can we say more?

Page 29: Anna . Tramontano @uniroma1.it

Function prediction

EC Number

Binding

Binding site(s)

Residue role(s)

PT modifications

Free text comments

Soro and Tramontano, Proteins, 2005

Page 30: Anna . Tramontano @uniroma1.it

Function prediction

We had too few predictions per target to draw any sensible conclusion.

However, for the sake of the experiment, we tried to see what we could do, and which problems would arise in analysing the data (other than the format), pretending that the numbers were significant.

Page 31: Anna . Tramontano @uniroma1.it

Function prediction

Summary table for target T0230

Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)

Predictions:

GO number  GO name                                                 frequency
287        magnesium ion binding                                   1
4176       ATP-dependent peptidase activity                        1
4475       mannose-1-phosphate guanylyltransferase activity        1
4476                                                               1
4672       protein kinase activity                                 1
5094       Rho GDP-dissociation inhibitor activity                 1
5554       Molecular function unknown                              1 -
6812       PROCESS                                                 (1)
6825       PROCESS                                                 (1)
8170       N-methyltransferase activity                            1
16822      hydrolase activity, acting on acid carbon-carbon bonds  1
46872      metal ion binding                                       1

Page 32: Anna . Tramontano @uniroma1.it

Function prediction

Summary table for target T0230

Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)

Predictions:

GO number  GO name                                                 frequency  GO Parents
287        magnesium ion binding                                   1          46872, 43167, 5488
4176       ATP-dependent peptidase activity                        1          8233, 16787, 3824
4475       mannose-1-phosphate guanylyltransferase activity        1          8905, 16779, 16772, 16740, 3824
4672       protein kinase activity                                 1          16773, 16772, 16740 (16301), 3824
5094       Rho GDP-dissociation inhibitor activity                 1          5092, 5083, 30695, 30234
8170       N-methyltransferase activity                            1          8168, 16741, 16740, 3824
16822      hydrolase activity, acting on acid carbon-carbon bonds  1          16787, 3824
46872      metal ion binding                                       1          43167, 5488

Page 34: Anna . Tramontano @uniroma1.it

Function prediction

Summary table for target T0230

Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)

Predictions:

GO number  GO name                                                 frequency  GO Parents
287        magnesium ion binding                                   1          46872, 43167, 5488
4176       ATP-dependent peptidase activity                        1          8233, 16787, 3824
4475       mannose-1-phosphate guanylyltransferase activity        1          8905, 16779, 16772, 16740, 3824
4672       protein kinase activity                                 1          16773, 16772, 16740 (16301), 3824
5094       Rho GDP-dissociation inhibitor activity                 1          5092, 5083, 30695, 30234
8170       N-methyltransferase activity                            1          8168, 16741, 16740, 3824
16822      hydrolase activity, acting on acid carbon-carbon bonds  1          16787, 3824
46872      metal ion binding                                       1          43167, 5488

16787 hydrolase

3824 catalytic activity

16740 transferase activity
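One way to look for agreement among predictions that differ at the most specific level is to count how many of them share each parent term. The Python sketch below does this for the T0230 table, with GO numbers and parent lists copied from the slide. The counting rule, the function names and the `min_support` parameter are assumptions made for illustration, not necessarily the procedure used by the authors.

```python
from collections import Counter

# GO number -> parent terms, copied from the T0230 summary table
# (the alternative parent 16301, shown in brackets for 4672, is left out).
PARENTS = {
    287: [46872, 43167, 5488],
    4176: [8233, 16787, 3824],
    4475: [8905, 16779, 16772, 16740, 3824],
    4672: [16773, 16772, 16740, 3824],
    5094: [5092, 5083, 30695, 30234],
    8170: [8168, 16741, 16740, 3824],
    16822: [16787, 3824],
    46872: [43167, 5488],
}

def ancestor_support(parents):
    """For every GO term, count the predictions that equal it or list it as a parent."""
    support = Counter()
    for term, ancestors in parents.items():
        # A set, so that each prediction supports a given term at most once
        support.update({term, *ancestors})
    return support

def shared_terms(parents, min_support=2):
    """Terms supported by at least min_support predictions, best supported first."""
    support = ancestor_support(parents)
    return sorted((t for t, n in support.items() if n >= min_support),
                  key=lambda t: (-support[t], t))

support = ancestor_support(PARENTS)
print(shared_terms(PARENTS))
```

With this rule, the best supported terms include 3824, 16740 and 16787, the three terms named at the bottom of the slide.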

Page 35: Anna . Tramontano @uniroma1.it

Function prediction

Results: GO consensus

Target  Group  Mod  GO     GO name
T0226   P0009  1    4347   glucose-6-phosphate isomerase activity
        P0050  1    16429  tRNA (adenine-N1-)-methyltransferase activity
        P0050  3    16429  tRNA (adenine-N1-)-methyltransferase activity
        P0070  1    4347   glucose-6-phosphate isomerase activity
        P0344  1    4360   glutamine-fructose-6-phosphate transaminase (isomerizing) activity
        Cons: Transferase activity/Isomerase activity

T0243   P0050  2    3980   UDP-glucose:glycoprotein glucosyltransferase activity
        P0050  3    4581   dolichyl-phosphate beta-glucosyltransferase activity
        P0070  1    3700   transcription factor activity
        P0237  1    3677   DNA binding
        P0344  1    3677   DNA binding
        Cons: DNA binding/Transferase activity

T0249   P0003  1    3677   DNA binding
        P0070  1    3700   transcription factor activity
        P0100  1    3700   transcription factor activity
        P0237  1    3677   DNA binding
        P0344  1    3677   DNA binding
        P0589  1    3700   transcription factor activity
        Cons: DNA binding/Transcription factor activity

T0263   P0049  1    3754   chaperone activity
        P0050  1    5098   Ran GTPase activator activity
        P0050  2    5098   Ran GTPase activator activity
        P0070  1    4497   monooxygenase activity
        P0237  1    16491  oxidoreductase activity
        P0344  1    3676   nucleic acid binding
        Cons: Binding/Oxidoreductase activity/Enzyme regulator activity

T0266   P0003  1    3723   RNA binding
        P0049  1    3754   chaperone activity
        P0050  1    4587   ornithine-oxo-acid transaminase activity
        P0050  3    4047   aminomethyltransferase activity
        P0096  1    4827   proline-tRNA ligase activity
        P0237  1    166    nucleotide binding
        P0726  1    4812   tRNA ligase activity
        Cons: Binding/Transport activity/Transferase activity

Soro and Tramontano, Proteins, 2005

Page 36: Anna . Tramontano @uniroma1.it

Function prediction

18 months later…

Annotations in DB decreased by 5%

24 new targets were annotated

We looked at methods (abstracts, directly contacting predictors, literature)

Page 37: Anna . Tramontano @uniroma1.it

Function prediction

[Chart: counts 4, 5, 2, 2, 2, 1, 11; categories labelled with the binary codes 11011, 10011, 10100, 10001, 10000, 11100, 10101, 11001]

Page 38: Anna . Tramontano @uniroma1.it

Function prediction

18 months later…

4 newly annotated targets had been correctly predicted by at least one method

85% of the consensus non-redundant predictions were correct

Page 40: Anna . Tramontano @uniroma1.it

Function prediction

Page 41: Anna . Tramontano @uniroma1.it

Announcements

CASP is about to start again:

We will start collecting targets next week

There will be a few differences

http://predictioncenter.org

Page 42: Anna . Tramontano @uniroma1.it

Acknowledgements

BioSapiens - EU VI Framework

Ministero della Salute

Università di Roma, Facoltà di Medicina

Istituto Pasteur Roma

San Paolo

CNR

Claudia Bonaccini, Michele Ceriani, Domenico Cozzetto, Emanuela Giombini, Alejandro Giorgetti, Paolo Marcatili, Veronica Morea, Romina Oliva, Massimiliano Orsini, Marialuisa Pellegrini, Domenico Raimondo, Simonetta Soro, Ivano Talamo

Krzysztof Fidelis, Tim Hubbard, Andriy Kryshtafovych, John Moult, Burkhard Rost, Adam Zemla

Structural biologists

Predictors