Protein Secondary Structures Assignment and prediction Pernille Haste Andersen 17.05.2006

Page 1

Protein Secondary Structures

Assignment and prediction

Pernille Haste Andersen

17.05.2006

Page 2

Outline

• What is protein secondary structure?

• How can it be used?

• Different prediction methods
  – Alignment to homologues
  – Propensity methods
  – Neural networks

• Evaluation of prediction methods

• Links to prediction servers

Page 3

Secondary Structure Elements

[Figure: β-strand, helix, turn/bend]

Page 4

Use of secondary structure

• Classification of protein structures

• Definition of loops (active sites)

• Use in fold recognition methods

• Improvements of alignments

• Definition of domain boundaries

Page 5

Classification of secondary structure

• Defining features
  – Dihedral angles
  – Hydrogen bonds
  – Geometry

• Assigned manually by crystallographers, or
• Automatically
  – DSSP (Kabsch & Sander, 1983)
  – STRIDE (Frishman & Argos, 1995)
  – DSSPcont (Andersen et al., 2002)
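DSSP decides whether a backbone C=O group and an N-H group are hydrogen bonded using a simple electrostatic model (Kabsch & Sander, 1983): partial charges of 0.42e and 0.20e, a dimensional factor of 332, and a bond whenever the energy falls below -0.5 kcal/mol. The sketch below computes that energy from four atom positions. Crystal structures rarely contain hydrogens, so DSSP places them itself; here the H coordinates are simply passed in, and the collinear test geometry is made up for illustration.

```python
import math

# Partial charges (units of e) and dimensional factor of the DSSP model.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; a bond is assigned when E is below this


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between C=O and N-H groups.

    E = q1 * q2 * f * (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN), distances in angstroms.
    """
    return Q1 * Q2 * F * (
        1.0 / math.dist(o, n)
        + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h)
        - 1.0 / math.dist(c, n)
    )


def is_hbond(c, o, n, h):
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


# Made-up collinear C=O ... H-N geometry (coordinates in angstroms).
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

Moving the N-H group far away drives the energy towards zero, so no bond is assigned.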

Page 6

Dihedral Angles

phi: dihedral angle about the N-Calpha bond
psi: dihedral angle about the Calpha-C bond
omega: dihedral angle about the C-N (peptide) bond

From http://www.imb-jena.de
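All three angles can be computed from atomic coordinates with the standard four-point dihedral formula. A minimal sketch in plain Python (helper names are our own): phi of residue i uses the atoms C(i-1), N(i), Calpha(i), C(i); psi uses N(i), Calpha(i), C(i), N(i+1); omega uses Calpha(i), C(i), N(i+1), Calpha(i+1).

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans
```

With this formula a positive angle means a clockwise rotation of the front bond onto the back bond when looking along p1 to p2; cis gives 0 and trans gives ±180.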

Page 7

Helices        phi (deg)   psi (deg)   H-bond pattern
-----------------------------------------------------
alpha-helix      -57.8       -47.0       i+4
pi-helix         -57.1       -69.7       i+5
3₁₀-helix        -74.0        -4.0       i+3

(omega = 180 deg)
From http://www.imb-jena.de

Page 8

Beta strands   phi (deg)   psi (deg)   omega (deg)
--------------------------------------------------
beta strand      -120         120          180

From http://broccoli.mfn.ki.se/pps_course_96/

[Figure: antiparallel and parallel β-sheets]
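As a rough illustration only (DSSP itself assigns states from hydrogen-bond patterns, not from angles), the ideal angles on Pages 7 and 8 can be used to label a (phi, psi) pair by its nearest reference conformation. The distance measure, a wrapped angular Euclidean distance, is an assumption of this sketch.

```python
# Ideal (phi, psi) values in degrees, copied from the two tables above.
REFERENCE = {
    "alpha-helix": (-57.8, -47.0),
    "pi-helix": (-57.1, -69.7),
    "3-10 helix": (-74.0, -4.0),
    "beta strand": (-120.0, 120.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_conformation(phi, psi):
    """Name of the reference conformation closest to (phi, psi)."""
    def dist2(name):
        rphi, rpsi = REFERENCE[name]
        return angle_diff(phi, rphi) ** 2 + angle_diff(psi, rpsi) ** 2
    return min(REFERENCE, key=dist2)


print(nearest_conformation(-125.4, 129.1))
```

The example input is residue 5 of the DSSP excerpt on Page 11 (phi -125.4, psi 129.1); it lands closest to the beta-strand reference, in line with its E assignment there.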


Page 10

Secondary Structure Type Descriptions

* H = alpha helix
* G = 3₁₀-helix
* I = 5-helix (pi helix)
* E = extended strand, participates in beta ladder
* B = residue in isolated beta bridge
* T = hydrogen-bonded turn
* S = bend
* C = coil

Page 11

Automatic assignment programs
• DSSP (http://www.cmbi.kun.nl/gv/dssp/)
• STRIDE (http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html)
• DSSPcont (http://cubic.bioc.columbia.edu/services/DSSPcont/)

• The Protein Data Bank visualizes DSSP assignments on the structures in its database

# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA

1 4 A E 0 0 205 0, 0.0 2,-0.3 0, 0.0 0, 0.0 0.000 360.0 360.0 360.0 113.5 5.7 42.2 25.1
2 5 A H - 0 0 127 2, 0.0 2,-0.4 21, 0.0 21, 0.0 -0.987 360.0-152.8-149.1 154.0 9.4 41.3 24.7
3 6 A V - 0 0 66 -2,-0.3 21,-2.6 2, 0.0 2,-0.5 -0.995 4.6-170.2-134.3 126.3 11.5 38.4 23.5
4 7 A I E -A 23 0A 106 -2,-0.4 2,-0.4 19,-0.2 19,-0.2 -0.976 13.9-170.8-114.8 126.6 15.0 37.6 24.5
5 8 A I E -A 22 0A 74 17,-2.8 17,-2.8 -2,-0.5 2,-0.9 -0.972 20.8-158.4-125.4 129.1 16.6 34.9 22.4
6 9 A Q E -A 21 0A 86 -2,-0.4 2,-0.4 15,-0.2 15,-0.2 -0.910 29.5-170.4 -98.9 106.4 19.9 33.0 23.0
7 10 A A E +A 20 0A 18 13,-2.5 13,-2.5 -2,-0.9 2,-0.3 -0.852 11.5 172.8-108.1 141.7 20.7 31.8 19.5
8 11 A E E +A 19 0A 63 -2,-0.4 2,-0.3 11,-0.2 11,-0.2 -0.933 4.4 175.4-139.1 156.9 23.4 29.4 18.4
9 12 A F E -A 18 0A 31 9,-1.5 9,-1.8 -2,-0.3 2,-0.4 -0.967 13.3-160.9-160.6 151.3 24.4 27.6 15.3
10 13 A Y E -A 17 0A 36 -2,-0.3 2,-0.4 7,-0.2 7,-0.2 -0.994 16.5-156.0-136.8 132.1 27.2 25.3 14.1
11 14 A L E >> -A 16 0A 24 5,-3.2 4,-1.7 -2,-0.4 5,-1.3 -0.929 11.7-122.6-120.0 133.5 28.0 24.8 10.4
12 15 A N T 45S+ 0 0 54 -2,-0.4 -2, 0.0 2,-0.2 0, 0.0 -0.884 84.3 9.0-113.8 150.9 29.7 22.0 8.6
13 16 A P T 45S+ 0 0 114 0, 0.0 -1,-0.2 0, 0.0 -2, 0.0 -0.963 125.4 60.5 -86.5 8.5 32.0 21.6 6.8
14 17 A D T 45S- 0 0 66 2,-0.1 -2,-0.2 1,-0.1 3,-0.1 0.752 89.3-146.2 -64.6 -23.0 33.0 25.2 7.6

Page 12

Secondary Structure Prediction

• What to predict?
  – All 8 types, or pool the types into groups

DSSP (8 states) → Q3 (3 states):
* H = alpha helix, G = 3₁₀-helix, I = 5-helix (pi helix) → H
* E = extended strand, B = beta bridge → E
* T = hydrogen-bonded turn, S = bend, C = coil → C

Page 13

Straight HEC

Secondary Structure Prediction

• What to predict?
  – All 8 types, or pool the types into groups

Q3 (3 states):
* H = alpha helix → H
* E = extended strand → E
* T = hydrogen-bonded turn, S = bend, C = coil, G = 3₁₀-helix, I = 5-helix (pi helix), B = beta bridge → C
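The two grouping schemes on Pages 12 and 13 amount to lookup tables from the eight DSSP letters to H, E and C. A minimal sketch; treating a blank or unknown symbol as coil is an assumption (DSSP writes a blank for residues with no assigned structure).

```python
# Page 12 grouping: helices (H, G, I) -> H, strand/bridge (E, B) -> E, rest -> C.
DSSP_TO_Q3 = {"H": "H", "G": "H", "I": "H",
              "E": "E", "B": "E",
              "T": "C", "S": "C", "C": "C"}

# Page 13 "straight HEC": only H and E are kept; everything else becomes C.
STRAIGHT_HEC = {"H": "H", "E": "E"}


def reduce_states(dssp, scheme="dssp"):
    """Map an 8-state DSSP string to H/E/C; blank or unknown symbols become C."""
    if scheme == "dssp":
        table = DSSP_TO_Q3
    elif scheme == "straight":
        table = STRAIGHT_HEC
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return "".join(table.get(s, "C") for s in dssp)


print(reduce_states("HGIEBTSC"), reduce_states("HGIEBTSC", "straight"))
```

The choice of scheme changes what counts as a correct prediction, so accuracies are only comparable when computed with the same grouping.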

Page 14

Secondary Structure Prediction

• Simple alignments
  – Align to a close homolog for which the structure has been experimentally solved.

• Heuristic methods (e.g., Chou-Fasman, 1974)
  – Apply scores for each amino acid and sum them over a window.

• Neural networks
  – Raw sequence (late 80's)
  – BLOSUM matrix (e.g., PHD, early 90's)
  – Position-specific alignment profiles (e.g., PSIPRED, late 90's)
  – Multiple networks balloting, probability conversion, output expansion (Petersen et al., 2000).

Page 15

Improvement of accuracy

1974   Chou & Fasman        ~50-53%
1978   Garnier              63%
1987   Zvelebil             66%
1988   Qian & Sejnowski     64.3%
1993   Rost & Sander        70.8-72.0%
1997   Frishman & Argos     <75%
1999   Cuff & Barton        72.9%
1999   Jones                76.5%
2000   Petersen et al.      77.9%

Page 16

Simple Alignments

• A solved structure of a homolog of the query is needed
• Homologous proteins have ~88% identical (three-state) secondary structure
• If no close homolog can be identified, alignments give almost random results
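The 88% figure is a per-residue agreement between the three-state assignments of two aligned homologs. The same count, applied to a prediction versus the DSSP-derived states, is the Q3 score commonly used for accuracies like those on Page 15. A minimal sketch; skipping gap positions is an assumption.

```python
def three_state_identity(ss_a, ss_b, gap="-"):
    """Percentage of aligned positions (both non-gap) with the same H/E/C state."""
    if len(ss_a) != len(ss_b):
        raise ValueError("strings must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ss_a, ss_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    same = sum(a == b for a, b in pairs)
    return 100.0 * same / len(pairs)


# Q3 of a prediction against the observed states (made-up strings).
print(three_state_identity("CCHHHHCCEEEC", "CCHHHCCCEEEC"))
```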

Page 17

Propensities: Amino acid preferences in α-helix

Page 18

Propensities: Amino acid preferences in β-strand

Page 19

Propensities: Amino acid preferences in coil

Page 20

Chou-Fasman propensities

Name   P(a)   P(b)   P(turn)   f(i)    f(i+1)   f(i+2)   f(i+3)
Ala    142     83      66      0.06    0.076    0.035    0.058
Arg     98     93      95      0.070   0.106    0.099    0.085
Asp    101     54     146      0.147   0.110    0.179    0.081
Asn     67     89     156      0.161   0.083    0.191    0.091
Cys     70    119     119      0.149   0.050    0.117    0.128
Glu    151     37      74      0.056   0.060    0.077    0.064
Gln    111    110      98      0.074   0.098    0.037    0.098
Gly     57     75     156      0.102   0.085    0.190    0.152
His    100     87      95      0.140   0.047    0.093    0.054
Ile    108    160      47      0.043   0.034    0.013    0.056
Leu    121    130      59      0.061   0.025    0.036    0.070
Lys    114     74     101      0.055   0.115    0.072    0.095
Met    145    105      60      0.068   0.082    0.014    0.055
Phe    113    138      60      0.059   0.041    0.065    0.065
Pro     57     55     152      0.102   0.301    0.034    0.068
Ser     77     75     143      0.120   0.139    0.125    0.106
Thr     83    119      96      0.086   0.108    0.065    0.079
Trp    108    137      96      0.077   0.013    0.064    0.167
Tyr     69    147     114      0.082   0.065    0.114    0.125
Val    106    170      50      0.062   0.048    0.028    0.053
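The heuristic idea from Page 14 (score each residue, then average over a window) can be sketched with the P(a) and P(b) columns above. This is a simplified window rule, not the full Chou-Fasman procedure, which adds nucleation and extension rules that are not reproduced here; the window width of 6 and the cutoff of 100 are assumptions.

```python
# Chou-Fasman helix (P(a)) and strand (P(b)) propensities from the table above.
P_A = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
       "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
       "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_B = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
       "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
       "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def window_means(seq, table, width=6):
    """Mean propensity of every full window; entry i covers seq[i:i + width]."""
    values = [table[aa] for aa in seq]
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def label_windows(seq, width=6, cutoff=100.0):
    """Simplified rule, one label per window: H if mean P(a) exceeds the cutoff
    and mean P(b); E for the reverse case; otherwise C."""
    labels = []
    for a, b in zip(window_means(seq, P_A, width), window_means(seq, P_B, width)):
        if a > cutoff and a > b:
            labels.append("H")
        elif b > cutoff and b > a:
            labels.append("E")
        else:
            labels.append("C")
    return "".join(labels)


print(label_windows("IKEEHVIIQAEFYLNPDQSGEF"))
```

Each residue is scored in isolation before averaging, which is exactly the limitation named on the next page: the method sees very little of a residue's structural context.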

Page 21

Chou-Fasman

• Generally applicable

• Works for sequences with no solved homologs

• But the accuracy is low!

• The problem is that the method does not use enough information about the structural context of a residue

Page 22

Neural Networks

• Benefits
  – Generally applicable
  – Can capture higher-order correlations
  – Can use inputs other than sequence information

• Drawbacks
  – Need a large amount of data (many different solved structures). However, nearly 2500 solved structures with low sequence identity and high resolution are available today
  – Complex method with several pitfalls

Page 23

Architecture

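[Figure: a window of the sequence IKEEHVIIQAEFYLNPDQSGEF….. (here IKEEHVI IQAE) feeds the input layer, which is connected by weights to a hidden layer and then to an output layer with three units: H, E and C.]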
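The architecture above is a plain feed-forward network. A minimal pure-Python forward pass, assuming sigmoid units and using hypothetical sizes (an 11-residue window, 20 inputs per residue, 5 hidden units) with random, untrained weights:

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer: one sigmoid unit per row of `weights`."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an untrained layer)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, hidden, output):
    """Input window -> hidden layer -> output layer (H, E, C activities)."""
    return dense(dense(inputs, *hidden), *output)


# Hypothetical sizes, chosen for illustration only.
WINDOW, PER_RESIDUE, N_HIDDEN = 11, 20, 5
rng = random.Random(0)
hidden = random_layer(WINDOW * PER_RESIDUE, N_HIDDEN, rng)
output = random_layer(N_HIDDEN, 3, rng)
h, e, c = forward([0.0] * (WINDOW * PER_RESIDUE), hidden, output)
```

Training (for example by backpropagation) is what turns these random weights into a predictor; the predicted state for the central residue is then usually the output with the highest activity.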

Page 24

Sparse encoding

Input neuron: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Amino acid:

A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

R 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

N 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

C 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Q 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

E 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
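Sparse encoding turns each window position into a block of 20 inputs containing a single 1. A minimal sketch using the neuron order of the table above; encoding positions that fall outside the sequence as all zeros is an assumption (some methods add an extra input for this case).

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # input neuron order, as in the table above


def sparse_encode(aa):
    """20 inputs with a 1 at the residue's neuron; all zeros if aa is None."""
    return [1 if aa == a else 0 for a in AMINO_ACIDS]


def encode_window(seq, center, width):
    """Concatenate sparse vectors for the `width` residues centred on `center`
    (width assumed odd); positions outside the sequence are all zeros."""
    half = width // 2
    vector = []
    for i in range(center - half, center + half + 1):
        vector.extend(sparse_encode(seq[i] if 0 <= i < len(seq) else None))
    return vector


inputs = encode_window("IKEEHVIIQAEFYLNPDQSGEF", center=5, width=11)
```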

Page 25

Input Layer

[Figure: the window IKEEHVI IQAE sparse-encoded in the input layer, each position represented by 0/1 values]

Page 26

BLOSUM 62

   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
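With BLOSUM encoding, each window position contributes its 20 BLOSUM62 scores instead of a 0/1 vector, so similar amino acids get similar inputs. The sketch below embeds the first 20 columns of the matrix above (B, Z, X and * dropped); zero padding outside the sequence is again an assumption.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# First 20 columns of the BLOSUM62 rows shown above.
BLOSUM62_ROWS = """
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

# Parse each line into residue -> list of 20 integer scores.
BLOSUM62 = {}
for line in BLOSUM62_ROWS.strip().splitlines():
    aa, *scores = line.split()
    BLOSUM62[aa] = [int(s) for s in scores]


def blosum_encode_window(seq, center, width):
    """Concatenate BLOSUM62 rows for the window; zeros outside the sequence."""
    half = width // 2
    vector = []
    for i in range(center - half, center + half + 1):
        if 0 <= i < len(seq):
            vector.extend(BLOSUM62[seq[i]])
        else:
            vector.extend([0] * len(AMINO_ACIDS))
    return vector


inputs = blosum_encode_window("IKEEHVIIQAEFYLNPDQSGEF", center=5, width=11)
```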

Page 27

Input Layer

[Figure: the window IKEEHVI IQAE in the input layer, each position represented by its BLOSUM62 scores instead of 0/1 values]

Page 28

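Secondary networks (Structure-to-Structure)

[Figure: a first network reads a window of the sequence IKEEHVIIQAEFYLNPDQSGEF….. and outputs H, E, C for each position; these outputs over a window of positions (HEC HEC HEC) form the input layer of a second network, which passes them through weights, a hidden layer and an output layer to give the final H, E, C prediction.]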
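The second network sees the first network's H, E and C outputs for a window of neighbouring residues, which lets it smooth out implausible patterns such as isolated one-residue helices. A minimal sketch of building that input; the window of 17 follows Page 32, and zero padding at the ends is an assumption.

```python
def structure_window(first_outputs, center, width=17):
    """Concatenate the (H, E, C) outputs of the first network over a window
    centred on `center`; positions outside the sequence give (0, 0, 0)."""
    half = width // 2
    vector = []
    for i in range(center - half, center + half + 1):
        if 0 <= i < len(first_outputs):
            vector.extend(first_outputs[i])
        else:
            vector.extend((0.0, 0.0, 0.0))
    return vector


# Made-up first-level outputs for a 5-residue sequence.
outputs = [(0.9, 0.05, 0.05)] * 5
second_level_input = structure_window(outputs, center=2)
```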

Page 29

PHD method (Rost and Sander)

• Combine neural networks with sequence profiles
  – 6-8 percentage points increase in prediction accuracy over standard neural networks

• Use a second-layer "structure-to-structure" network to filter predictions

• Jury of predictors

• Set up as a mail server
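A sequence profile summarises a multiple alignment of homologs column by column, so the network sees which residues are tolerated at each position rather than a single sequence. A minimal sketch of an unweighted frequency profile; PHD's actual profile construction (including any sequence weighting) is not reproduced here, and the alignment in the example is made up.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def frequency_profile(alignment, gap="-"):
    """Per-column amino acid frequencies from aligned sequences (gaps ignored).
    Returns one list of 20 frequencies per alignment column."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment if s[col] != gap)
        total = sum(counts.values())
        profile.append([counts[a] / total if total else 0.0 for a in AMINO_ACIDS])
    return profile


profile = frequency_profile(["IKEEH", "IREEH", "VKE-H"])
```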

Page 30

PSI-Pred (Jones)

• Use alignments from iterative sequence searches (PSI-BLAST) as input to a neural network

• Better predictions due to better sequence profiles

• Available as a stand-alone program and via the web

Page 31

Position-specific scoring matrices (PSI-BLAST profiles)

      A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 1 I -2 -4 -5 -5 -2 -4 -4 -5 -5  6  0 -4  0 -2 -4 -4 -2 -4 -3  4
 2 K -1 -1 -2 -2 -3 -1  3 -3 -2 -2 -3  4 -2 -4 -3  1  1 -4 -3  2
 3 E  5 -3 -3 -3 -3  3  1 -2 -3 -3 -3 -2 -2 -4 -3 -1 -2 -4 -3  1
 4 E -4 -3  2  5 -6  1  5 -4 -3 -6 -6 -2 -5 -6 -4 -2 -3 -6 -5 -5
 5 H -4  2  1  1 -5  1 -2 -4  9 -5 -2 -3 -4 -4 -5 -3 -4 -5  1 -5
 6 V -3  0 -4 -5 -4 -4 -2 -3 -5  1 -2  1  0  1 -4 -3  3 -5 -3  5
 7 I  0 -2 -4  1 -4 -2 -4 -4 -5  1  0 -2  0  2 -5  1 -1 -5 -3  4
 8 I -3  0 -5 -5 -4 -2 -5 -6  1  2  4 -4 -1  0 -5 -2  0 -3  5 -1
 9 Q -2 -3 -2 -3 -5  4 -1  3  5 -5 -3 -3 -4 -2 -4  2 -1 -4  2 -2
10 A  2 -4 -4 -3  2 -3 -1 -4 -2  1 -1 -4 -3 -4  1  2  3 -5 -1  1
11 E -1  3  1  1 -1  0  1 -4 -3 -1 -3  0  3 -5  4 -1 -3 -6 -3 -1
12 F -3 -5 -5 -5 -4 -4 -4 -1 -1  1  1 -5  2  5 -1 -4 -4 -3  5  2
13 Y  3 -5 -5 -6  3 -4 -5 -2 -1  0 -4 -5 -3  3 -5 -2 -2 -2  7  1
14 L -1 -3 -4 -2  1  5  1 -1 -1 -1  1 -3 -3  1 -5 -1 -1 -2  3 -2
15 N -1 -4  4  1  5 -3 -4  2 -4 -4 -4 -3 -2 -4 -5  2  0 -5  0  0
16 P -2  4 -4 -4 -5  0 -3  3  2 -5 -4  0 -4 -3  0  1 -2 -1  5 -3
17 D -3 -2  1  5 -6 -2  2  2 -1 -2 -2 -3 -5 -4 -5 -1  2 -6 -3 -4
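PSSM scores are unbounded integers, so they are usually squashed into a fixed range before being fed to a network. PSIPRED is described as using the standard logistic function for this, which is the choice assumed in the sketch below; the three rows are copied from the matrix above.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the PSSM above

# Rows 1-3 of the PSSM above (query residues I, K, E).
PSSM = [
    ("I", [-2, -4, -5, -5, -2, -4, -4, -5, -5, 6, 0, -4, 0, -2, -4, -4, -2, -4, -3, 4]),
    ("K", [-1, -1, -2, -2, -3, -1, 3, -3, -2, -2, -3, 4, -2, -4, -3, 1, 1, -4, -3, 2]),
    ("E", [5, -3, -3, -3, -3, 3, 1, -2, -3, -3, -3, -2, -2, -4, -3, -1, -2, -4, -3, 1]),
]


def logistic(x):
    """Squash a raw PSSM score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def scaled_inputs(pssm):
    """One list of 20 network inputs per sequence position."""
    rows = []
    for residue, scores in pssm:
        if len(scores) != len(AMINO_ACIDS):
            raise ValueError(f"row for {residue} must have {len(AMINO_ACIDS)} scores")
        rows.append([logistic(s) for s in scores])
    return rows


inputs = scaled_inputs(PSSM)
```

These scaled rows then take the place of the sparse or BLOSUM vectors in the window encoding.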

Page 32

Several different architectures

• Sequence-to-structure
  – Window sizes 15, 17, 19 and 21
  – Hidden units 50 and 75
  – 10-fold cross-validation => 80 predictions

• Structure-to-structure
  – Window size 17
  – Hidden units 40
  – 10-fold cross-validation => 800 predictions

Output: C C H H C C C

Output: C C C C C C C
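The prediction counts follow from the training grid. One reading consistent with the numbers on the slide: 4 window sizes × 2 hidden-layer sizes × 10 folds give 80 sequence-to-structure predictions, and passing each of them through a structure-to-structure network from each of the 10 folds gives 800.

```python
from itertools import product

WINDOW_SIZES = [15, 17, 19, 21]
HIDDEN_UNITS = [50, 75]
FOLDS = range(10)

# First level: one sequence-to-structure network per (window, hidden, fold).
seq_to_str = list(product(WINDOW_SIZES, HIDDEN_UNITS, FOLDS))

# Second level: each first-level prediction goes through a
# structure-to-structure network trained in each of the 10 folds.
str_to_str = list(product(seq_to_str, FOLDS))

print(len(seq_to_str), len(str_to_str))  # 80 800
```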

Page 33

The majority rules

• Combining predictions from several networks improves the prediction

• Combinations of 800 different networks were used in the method described by Petersen TN et al. (2000), Prediction of protein secondary structure at 80% accuracy, Proteins 41:17-20
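The simplest way to combine many networks is a per-residue vote over their predicted states. A minimal sketch; breaking ties by first occurrence is an assumption, and averaging the raw network outputs before choosing a state is a common alternative.

```python
from collections import Counter


def majority_vote(predictions):
    """Per-residue majority over equally long H/E/C strings.
    Ties go to the state encountered first (Counter.most_common ordering)."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must have the same length")
    return "".join(Counter(p[i] for p in predictions).most_common(1)[0][0]
                   for i in range(length))


# Two outputs as on Page 32, plus a made-up third network.
print(majority_vote(["CCHHCCC", "CCCCCCC", "CCHHCCC"]))
```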

Page 34

Activities to probabilities

[Table: helix activities (network output) and strand activities (network output), binned at 0.05, 0.10, 0.15 … 1.0, index a table of calculated coil probabilities; visible entries include 0.99, 0.9, 0.83 and 0.75.]

Coil conversion
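A table like this can be estimated from training data by binning the helix and strand activities and counting how often the residues falling into each cell are truly coil. A minimal sketch under that assumption, using the 0.05 bin width from the slide; the data in the example is made up.

```python
from collections import defaultdict

BIN = 0.05  # bin width of the slide's activity axes
N_BINS = int(1 / BIN)


def bin_index(activity):
    """Bin an activity in [0, 1]; 1.0 falls into the last bin."""
    return min(int(activity / BIN), N_BINS - 1)


def coil_table(helix_acts, strand_acts, true_states):
    """Fraction of truly-coil residues per (helix bin, strand bin) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [coil count, total]
    for h, e, state in zip(helix_acts, strand_acts, true_states):
        cell = counts[(bin_index(h), bin_index(e))]
        cell[1] += 1
        if state == "C":
            cell[0] += 1
    return {key: coil / total for key, (coil, total) in counts.items()}


def coil_probability(table, h, e, default=None):
    """Look up the estimated coil probability; `default` for unseen cells."""
    return table.get((bin_index(h), bin_index(e)), default)


table = coil_table([0.02, 0.03, 0.9], [0.01, 0.04, 0.1], "CHH")
```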

Page 35

Benchmarking secondary structure predictions

• EVA
  – Newly solved structures are sent to prediction servers every week

http://cubic.bioc.columbia.edu/eva/sec/res_sec.html

Page 36

EVA results (Rost et al., 2001)

• PROFphd 77.0%

• PSIPRED 76.8%

• SAM-T99sec 76.1%

• SSpro 76.0%

• Jpred2 75.5%

• PHD 71.7%

Source: Cubic.columbia.edu/eva

Page 37

Links to servers

• Several links: http://cubic.bioc.columbia.edu/eva/doc/explain_methods.html#type_sec

• ProfPHD: http://www.predictprotein.org/

• PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/

• JPred: http://www.compbio.dundee.ac.uk/~www-jpred/

Page 38

Practical Conclusions

• If you need a secondary structure prediction, use the newer methods based on advanced machine learning, such as:
  – ProfPHD
  – PSIPRED
  – JPred

• Rather than one of the older methods, such as:
  – Chou-Fasman
  – Garnier