Prediction of Protein Structure in 1D
2° structure, TM regions, and solvent accessibility. Topic 13. Chapter 29, Gu and Bourne “Structural Bioinformatics”.
![Page 1: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/1.jpg)
2° structure, TM regions, and solvent accessibility
Topic 13. Chapter 29, Gu and Bourne “Structural Bioinformatics”
![Page 2: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/2.jpg)
The Truth (Information) is Out (In) There
![Page 3: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/3.jpg)
The Truth (Information) is Out (In) There
But we’re still having a tough time finding it.
![Page 4: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/4.jpg)
Given a protein sequence (primary structure), predict its secondary structures
GHWIATRGQLIREAYEDYRHFSSECPFIP
CEEEEECCCEEEEECCCHHHHHHCCCCCC
E: β-strand, H: α-helix, C: coil
Assumption: short stretches of residues have a propensity to adopt a certain conformation ⇒ the conformation of the central residue in a sequence fragment depends only on its flanking residues (sliding window).
Protein Secondary Structure Prediction
H: (H: α-helix, G: 3₁₀ helix, I: π-helix); E: (E: β-strand, B: bridge); C: (T: turn, S: bend, C: coil)
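The sliding-window assumption on this slide can be sketched in a few lines of Python. This is only an illustration, not any particular published method: the function name `sliding_windows`, the half-width of 3, and the `-` padding character for positions beyond the termini are assumptions made here.

```python
def sliding_windows(sequence, half_width=3, pad="-"):
    """Yield (center_index, window) for every residue in the sequence.

    Positions beyond either terminus are filled with `pad` (an assumption).
    """
    padded = pad * half_width + sequence + pad * half_width
    for i in range(len(sequence)):
        yield i, padded[i:i + 2 * half_width + 1]


sequence = "GHWIATRGQLIREAYEDYRHFSSECPFIP"   # example sequence from the slide
observed = "CEEEEECCCEEEEECCCHHHHHHCCCCCC"   # its secondary structure string

# Each training example pairs a window with the state of its central residue.
examples = [(window, observed[i]) for i, window in sliding_windows(sequence)]
```

Every method in the rest of the deck can be read as a different way of mapping such a window to one of the three states.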
![Page 5: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/5.jpg)
-- Because we can (kind of).
-- Because it could be a first step towards prediction of protein tertiary structure.
Why secondary structure prediction?
“Have solution, need problem.” Nearly every imaginable algorithm has been applied to secondary structure prediction.
![Page 6: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/6.jpg)
1. First generation: single amino acid propensities. Chou-Fasman method (1974), GOR I-IV. ~56-60% accuracy
2. Second generation: segments of 3-51 adjacent residues. NNSSP, SSPAL. ~65% accuracy
3. Neural networks: PHD, Psi-Pred, J-Pred
4. Support vector machines (SVM)
5. Hidden Markov models (HMM)
Items 3-5 are third generation methods using evolutionary information. ~76% accuracy
Secondary Structure Prediction Methods
![Page 7: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/7.jpg)
1. Three-state per-residue prediction accuracy

$$Q_3 = \frac{100}{N_{obs}} \sum_{i=1}^{3} M_{ii}$$

$M_{ii}$: number of residues observed in state i and predicted in state i
$N_{obs}$: total number of residues observed in the 3 states
Secondary Structure Prediction Accuracy
2. Per-segment prediction accuracy (SOV, Segment OVerlap)
Per-state segment overlap:
S1: observed SS segment
S2: predicted SS segment
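As a minimal sketch of the per-residue measure, Q3 can be computed directly from an observed and a predicted state string. The function name `q3` and the choice to ignore residues whose observed state is outside H, E, C are assumptions made for this illustration; SOV is not covered here.

```python
def q3(observed, predicted, states="HEC"):
    """Three-state per-residue accuracy, in percent.

    Counts residues observed in state i and predicted in state i (the M_ii
    terms) and divides by the number of residues observed in the 3 states.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    n_obs = sum(1 for o in observed if o in states)
    correct = sum(1 for o, p in zip(observed, predicted) if o in states and o == p)
    return 100.0 * correct / n_obs


score = q3("HHEECC", "HHHHCC")  # 4 of 6 residues agree
```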
![Page 8: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/8.jpg)
Calculate the propensity for a given amino acid to adopt a certain ss-type
$$P_{\alpha i} = \frac{P(\alpha \mid aa_i)}{p(\alpha)} = \frac{p(\alpha, aa_i)}{p(\alpha)\, p(aa_i)}$$

Example: from a data set with 30 proteins
#Ala = 2,000, #residues = 20,000, #helix = 4,000, #Ala in helix = 580
p(α, aa) = 580/20,000, p(α) = 4,000/20,000, p(aa) = 2,000/20,000
P = 580 / (4,000/10) = 1.45

i: amino acid; α: secondary structure state
Single Residue Propensity Methods
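The propensity calculation can be checked numerically. The sketch below reproduces the Ala/helix example from the slide; the function names `propensity` and `propensity_table` are hypothetical, and the second function assumes a single sequence with an aligned state string in which the chosen state occurs at least once.

```python
from collections import Counter


def propensity(n_aa_in_state, n_state, n_aa, n_total):
    """p(state, aa) / (p(state) * p(aa)), computed from raw counts."""
    p_joint = n_aa_in_state / n_total
    p_state = n_state / n_total
    p_aa = n_aa / n_total
    return p_joint / (p_state * p_aa)


def propensity_table(sequence, structure, state):
    """Propensity of every amino acid in `sequence` for one state."""
    aa_counts = Counter(sequence)
    joint = Counter(aa for aa, ss in zip(sequence, structure) if ss == state)
    n_state = structure.count(state)
    n_total = len(sequence)
    return {aa: propensity(joint[aa], n_state, aa_counts[aa], n_total)
            for aa in aa_counts}


# Slide example: 580 Ala in helix, 4,000 helix residues, 2,000 Ala, 20,000 residues.
p_ala_helix = propensity(580, 4000, 2000, 20000)
```

A value above 1 means the amino acid is found in that state more often than expected by chance.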
![Page 9: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/9.jpg)
Amino Acid Propensities to Secondary Structures
Chou-Fasman method

| Residue | T | S | P | T | A | E | L | M | R | S | T | G |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P(H) | 69 | 77 | 57 | 69 | 142 | 151 | 121 | 145 | 98 | 77 | 69 | 57 |
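The slide lists per-residue helix propensities but not the rule that turns them into a prediction. The sketch below assumes a simplified, commonly quoted nucleation rule (at least 4 of 6 consecutive residues with P(H) above 100); the window size, the threshold, and the function name `helix_nuclei` are assumptions, and the extension and termination steps of the full method are left out.

```python
# Values copied from the slide, one score per position of the example peptide.
residues = "TSPTAELMRSTG"
p_helix = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]


def helix_nuclei(scores, window=6, min_formers=4, threshold=100):
    """Return 0-based start positions of windows that qualify as helix nuclei."""
    nuclei = []
    for start in range(len(scores) - window + 1):
        formers = sum(1 for s in scores[start:start + window] if s > threshold)
        if formers >= min_formers:
            nuclei.append(start)
    return nuclei


nuclei = helix_nuclei(p_helix)
```

Under these assumptions, only the windows that cover all four of A, E, L, M qualify as nuclei.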
![Page 10: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/10.jpg)
* The idea is simple: predict SS of the central residue of a given segment from homologous segments (neighbors).
For example, from a database, find some number of the closest sequences to a subsequence defined by a window around the central residue, then use max(N_α, N_β, N_c) to assign the SS.
Nearest Neighbor Methods
[Figure: query sequence RSTEVRASRQLAKEKVN with a window around the central residue; homologous sequences with states E, C, C, H, H, C, C; assigned state: C]
Key parameters:
1. How to define similarity?
2. What size window of sequence should be examined?
3. How many close sequences should be selected?
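A toy version of the nearest-neighbor idea makes the three key parameters concrete. Everything here is hypothetical: the database entries are invented for illustration, similarity is a plain identity count (published methods use more elaborate scoring), and `k=3` is an arbitrary choice.

```python
from collections import Counter


def identity(a, b):
    """Similarity as the number of identical positions (parameter 1)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def predict_central(query, database, k=3, similarity=identity):
    """Assign the state with the most votes among the k closest windows."""
    ranked = sorted(database, key=lambda entry: similarity(query, entry[0]),
                    reverse=True)
    votes = Counter(state for _, state in ranked[:k])  # N_alpha, N_beta, N_c
    return votes.most_common(1)[0][0]


# Invented 7-residue windows (parameter 2) with the state of their central residue.
database = [
    ("ASKQLAR", "H"),
    ("GSRQPGK", "C"),
    ("VTVEVRV", "E"),
    ("ATRELAK", "H"),
    ("PSGQGAK", "C"),
]

state = predict_central("ASRQLAK", database, k=3)  # k is parameter 3
```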
![Page 11: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/11.jpg)
The Devil is in the details…
![Page 12: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/12.jpg)
D. Jones, J. Mol. Biol. 292, 195 (1999).
Method: Neural network
Input data: PSSM generated by PSI-BLAST
Bigger and better sequence database
-- Combining several databases and data filtering
Training and test sets preparation
-- SS prediction only makes sense for proteins with no homologous structure.
-- No sequence or structural homologues between training and test sets, checked by CATH and PSI-BLAST (mimicking a realistic situation).
Psi-Pred Method
![Page 13: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/13.jpg)
Window size = 15
Two networks
First network (sequence-to-structure):
-- 315 = (20 + 1) × 15 inputs; the extra unit indicates where the window spans either the N or C terminus
-- Data are scaled to the [0, 1] range using 1/[1 + exp(-x)]
-- 75 hidden units
-- 3 outputs (H, E, L)
Second network (structure-to-structure):
-- Structural correlation between adjacent sequences
-- 60 = (3 + 1) × 15 inputs
-- 60 hidden units
-- 3 outputs
Accuracy ~76%
Psi-Pred Method--Neural Network
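The input layout of the first network can be sketched as follows. This is an illustration of the 315 = (20 + 1) × 15 arithmetic and the logistic scaling only, not the actual PSIPRED code: the function names, the zero-filling of positions beyond a terminus, and the placement of the extra unit after each block of 20 scores are assumptions.

```python
import math

WINDOW = 15
HALF = WINDOW // 2


def logistic(x):
    """Scale a PSSM score to the [0, 1] range: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def encode_window(pssm, center):
    """Build the 315 inputs for the residue at index `center`.

    `pssm` is a list of rows, each holding 20 scores for one residue.
    Each window position contributes 20 scaled scores plus one extra unit
    that is 1.0 when the position lies beyond the N or C terminus.
    """
    features = []
    for pos in range(center - HALF, center + HALF + 1):
        if 0 <= pos < len(pssm):
            features.extend(logistic(score) for score in pssm[pos])
            features.append(0.0)
        else:
            features.extend([0.0] * 20)   # assumption: empty positions are zeros
            features.append(1.0)
    return features


# Hypothetical all-zero PSSM for a 30-residue protein.
pssm = [[0] * 20 for _ in range(30)]
inputs = encode_window(pssm, center=0)
```

The second network would be fed in the same way, with the 3 output values of the first network taking the place of the 20 PSSM scores, giving (3 + 1) × 15 = 60 inputs.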
![Page 14: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/14.jpg)
Conf: Confidence (0=low, 9=high); very important!
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
AA: Target sequence

# PSIPRED HFORMAT (PSIPRED V2.3 by David Jones)

Conf: 966899999997542002357777557999999716898188034435788873356776
Pred: CCHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHHHCCCCCEEECCCCEEEEEEECCCCCC
  AA: MMWEQFKKEKLRGYLEAKNQRKVDFDIVELLDLINSFDDFVTLSSCSGRIAVVDLEKPGD
              10        20        30        40        50        60

Conf: 777179998337888888988751235636899718261220179868899999998557
Pred: CCCCEEEEEECCCCCHHHHHHHHHCCCCCEEEEECCCEEEEECCCHHHHHHHHHHHHHCC
  AA: KASSLFLGKWHEGVEVSEVAEAALRSRKVAWLIQYPPIIHVACRNIGAAKLLMNAANTAG
              70        80        90       100       110       120

Conf: 200242314703799714651435541487355188999999999999999889999999
Pred: CCCCCCEECCCEEEEEECCCEEEEEECCCCCEEECHHHHHHHHHHHHHHHHHHHHHHHHH
  AA: FRRSGVISLSNYVVEIASLERIELPVAEKGLMLVDDAYLSYVVRWANEKLLKGKEKLGRL
             130       140       150       160       170       180
Sample Psi-Pred Output
***Compare the prediction for residues 9 and 17***
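The comparison can be made concrete by reading the confidence digit for each residue. The sketch below uses only the first 20 positions of the sample output; the function names and the cutoff of 3 for "low confidence" are assumptions made for illustration.

```python
# First 20 positions of the sample output above.
conf = "96689999999754200235"
pred = "CCHHHHHHHHHHHHHHHCCC"
aa = "MMWEQFKKEKLRGYLEAKNQ"


def residue_report(position):
    """Return (amino acid, predicted state, confidence) for a 1-based position."""
    i = position - 1
    return aa[i], pred[i], int(conf[i])


def low_confidence(conf_line, cutoff=3):
    """1-based positions whose confidence digit is below `cutoff`."""
    return [i + 1 for i, digit in enumerate(conf_line) if int(digit) < cutoff]


report_9 = residue_report(9)     # confidence 9
report_17 = residue_report(17)   # confidence 0
uncertain = low_confidence(conf)
```

Residue 9 is predicted with the highest confidence, while residue 17 sits in a run of low-confidence positions and its assigned state should be treated with caution.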
![Page 15: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/15.jpg)
Sample Psi-Pred Output-II
![Page 16: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/16.jpg)
Again, voting rules methods tend to be best

ATKAVCVLKGDGPVQGTIHFEAKGDTVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGP 2SOD
CCCCCCCCCCCCCCCCEEHCCHHECEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCC BPS
CCHEEEEECCCCCCCCEEEHHHCCCEEEEEEEEECECCCCCCEEEECCCCCCCCCCCCCC D_R
CCCEEEEEECCCCCEEEEEEEECCCEEEEEEEEEEEECCCCCEEEEECCCCCCCCCCCCC DSC
CCCEEEEECCCCCCCEEEEEECCCCEEEEEEEEECCCCCCCCEEEEEECCCCCCCCCCCC GGR
HHHCEEEECCCCCCCEEEEEECCCCEEEEEECEEEEEECCCCEEEEECCCCCCEEECCCC GOR
CCCCEEEECCCCCCCCCEEECCCCCCEEEEECEEECCCCCCCEEEECCCCCCCCEEECCC H_K
CCCCEEEEECCCCCCCCCEEECCCCCEEEECCCCCCCCCCCEEEEEEEECCCCCCCCCCC K_S
CCCCEEEECCCCCCCCEEEEECCCCEEEEEEEEEEECCCCCCEEEEECCCCCCCCCCCCC JOI
---EEEEE------EEEEEEEEE--EEEEEEEEE-----EEEEEEEE------------- 2SOD

HFNPLSKKHGGPKDEERHVGDLGNVTADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEK 2SOD
CCCCCCCCCCCCCCCCCCCCCCECCCCCCHEECCCCCCCCCECCEECEEEEEEEEEEECC BPS
CCCCCCCCCCCCCCCHHCECCCCCECCCCCCEEEEEEECCEEEECCCEEEEEEEEEEECC D_R
CCCCCCCCCCCCCCEEEEECCCCCCCCCCCCEEEEEECCCCCCCCCCEEEEEEEEEEECC DSC
CCCCCCCCCCCCCCCCEEECCCCCCCCCCCCCEEEEECCCCCCCCCCEEEECEEEEEECC GGR
CCCCCCCCCCCCCCHHEEECCCCCCCCCCCCEEEEEEECCEEECCCCEEEEEEEEEECCC GOR
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEECCCCCCCCCCCCCCHHHHHHEECCC H_K
CCCCCCCCCCCCCCCCEEECCCCCCCCCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEE K_S
CCCCCCCCCCCCCCCCEEECCCCCCCCCCCCEEEEEECCCCECCCCCEEEEEEEEEEECC JOI
--------------------EEEEEE------EEEEEEE--------------EEEEE-- 2SOD
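A voting rule of this kind can be sketched as a per-position majority vote over several predictions. The exact rule behind the joint prediction on the slide is not stated, so the plain majority used here, the fallback to coil on ties, and the short example strings are all assumptions.

```python
from collections import Counter


def consensus(predictions, tie_state="C"):
    """Per-position majority vote over equal-length prediction strings.

    When the two most frequent states are tied, `tie_state` is assigned
    (a hypothetical tie-breaking choice).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top_state, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        result.append(tie_state if tied else top_state)
    return "".join(result)


# Three invented predictions for a 5-residue stretch.
joint = consensus(["CEEEC", "CEECC", "HEEEC"])
```

Isolated errors made by one method, such as the stray H in the third string, are outvoted as long as the other methods do not make the same mistake at the same position.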
![Page 17: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/17.jpg)
[Figure: distribution of per-protein accuracy. x-axis: percentage of correctly predicted residues per protein (30 to 100); y-axis: percentage of all 150 proteins (0 to 25). Methods: PSIPRED, SSpro, PROF, PHDpsi, JPred2, PHD.]
Prediction Accuracy (EVA)
EVA: Automatic evaluation of prediction servers
![Page 18: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/18.jpg)
Currently ~76%
Proteins with more than 100 homologues: 80%
Assignment is ambiguous (5-15%). Recall DSSP vs STRIDE.
-- Non-unique protein structures (dynamic), H-bond cutoff, etc.
Different secondary structures between homologues (~12%).
Non-locality. Secondary structure is influenced by long-range interactions.
-- Some segments can have multiple structure types (chameleon sequences).
How Far Can We Go?
![Page 19: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/19.jpg)
Conceptually similar problem to SS prediction: Buried vs. Exposed.
Weighted Ensemble Solvent Accessibility predictor: http://pipe.scs.fsu.edu/wesa.html
Solvent accessibility
[Figure: residues labeled E (exposed) or B (buried)]
![Page 20: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/20.jpg)
To provide structural context for putative mutations that one wants to characterize biochemically or biophysically.
Why bother?
![Page 21: Prediction of Protein Structure in 1D](https://reader035.vdocuments.us/reader035/viewer/2022062218/5681652e550346895dd7b408/html5/thumbnails/21.jpg)
Again, conceptually similar problem to SS prediction: TM vs. Not.
Transmembrane Segment Prediction