![Page 1: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/1.jpg)
Machine Learning Algorithms for Protein Structure Prediction
Jianlin Cheng
Institute for Genomics and Bioinformatics
School of Information and Computer Sciences
University of California, Irvine
2006
![Page 2: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/2.jpg)
Outline
I. Introduction
II. 1D Prediction
III. 2D Prediction (Beta-Sheet Topology)
IV. 3D Prediction (Fold Recognition)
V. Publications and Bioinformatics Tools
![Page 3: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/3.jpg)
Importance of Protein Structure Prediction
AGCWY……
Sequence → Structure → Function
Cell
![Page 4: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/4.jpg)
Four Levels of Protein Structure
Primary Structure (a directional sequence of amino acids/residues)
Secondary Structure (helix, strand, coil)
N C…
Residue1
Alpha Helix Beta Strand / Sheet Coil
Residue2
Peptide bond
![Page 5: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/5.jpg)
Four Levels of Protein Structure
Tertiary Structure
Quaternary Structure (complex)
G Protein Complex
![Page 6: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/6.jpg)
1D: Secondary Structure Prediction
Coil
MWLKKFGINLLIGQSV…
CCCCHHHHHCCCSSSSS…
Accuracy: 78%
Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005
Neural Networks + Alignments
Strand
Helix
![Page 7: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/7.jpg)
1D: Solvent Accessibility Prediction
Exposed
Buried
MWLKKFGINLLIGQSV…
eeeeeeebbbbbbbbeeeebbb…
Accuracy: 79%
Neural Networks + Alignments
Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005
![Page 8: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/8.jpg)
MWLKKFGINLLIGQSV…
OOOOODDDDOOOOO…
93% TP at 5% FP
Disordered Region
Cheng, Sweredoski, Baldi. Data Mining and Knowledge Discovery, 2005
1D-RNN
1D: Disordered Region Prediction Using Neural Networks
![Page 9: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/9.jpg)
MWLKKFGINLLIGQSV…
NNNNNNNBBBBBNNNN…
Domain 1 Domain 2 Domains
1D: Protein Domain Prediction Using Neural Networks
Cheng, Sweredoski, Baldi. Data Mining and Knowledge Discovery, 2006.
1D-RNN
+ SS and SA
HIV capsid protein Inference/Cut
Boundary
Top ab-initio domain predictor in CAFASP4
![Page 10: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/10.jpg)
1D: Predict Single-Site Mutation From Sequence Using Support Vector Machine
• First method to predict energy changes from sequence accurately
• Useful for protein engineering, protein design, and mutagenesis analysis
…MWLAVFILINLK…
Support Vector Machine
Correlation = 0.76
Cheng, Randall, and Baldi. Proteins, 2006
![Page 11: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/11.jpg)
2D: Contact Map Prediction
3D Structure → 2D Contact Map (n × n; rows indexed by residue i, columns by residue j, each running from 1 to n)
Distance Threshold = 8 Å
Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005
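A minimal sketch of how a contact map can be derived from a known 3D structure with the 8 Å threshold named on this slide. It assumes one representative atom per residue (the slide does not say which atom is used), a strict `<` comparison, and hypothetical coordinates; `contact_map` is an illustrative name, not a function from any of the cited tools.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return an n x n 0/1 matrix; entry (i, j) is 1 when residues i and j
    are closer than `threshold` angstroms."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = 1
    return cmap

# Hypothetical coordinates in angstroms, one point per residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
```

The resulting matrix is symmetric, which is why a predictor only needs to fill one triangle of it.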
![Page 12: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/12.jpg)
2D: Disulfide Bond Prediction
Disulfide Bond
Cysteine j
Cysteine i
2D-RNN
Graph Matching
[1] Baldi, Cheng, Vullo. NIPS, 2004.
[2] Cheng, Saigo, Baldi. Proteins, 2005
Support Vector Machine
yes
![Page 13: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/13.jpg)
2D: Prediction of Beta-Sheet Topology
N terminus
C terminus
Cheng and Baldi, Bioinformatics, 2005
Beta Sheet
Beta Strand
Beta Residue Pair
• Ab-Initio Structure Prediction
• Fold Recognition
• Protein Design
• Protein Folding
![Page 14: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/14.jpg)
An Example of Beta-Sheet Topology
Structure of Protein 1VJG
Level 1: Beta Sheets
4 5
2 1 3 6 7
![Page 15: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/15.jpg)
An Example of Beta-Sheet Topology
Structure of Protein 1VJG
Level 1: Beta Sheets
Level 2: Strand, Strand Pair, Strand Alignment, Pairing Direction
Antiparallel
Parallel
4 5
2 1 3 6 7
![Page 16: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/16.jpg)
An Example of Beta-Sheet Topology
Structure of Protein 1VJG
Level 1: Beta Sheets
Level 2: Strand, Strand Pair, Strand Alignment, Pairing Direction
Level 3: Beta Residue, Residue Pair
Antiparallel
Parallel
4 5
2 1 3 6 7
H-bond
![Page 17: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/17.jpg)
Three-Stage Prediction of Beta-Sheets
• Stage 1: Predict beta-residue pairing probabilities using 2D-Recursive Neural Networks (2D-RNN, Baldi and Pollastri, 2003)
• Stage 2: Use beta-residue pairing probabilities to align beta-strands
• Stage 3: Predict beta-strand pairs and beta-sheet topology using graph algorithms
![Page 18: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/18.jpg)
Stage 1: Prediction of Beta-Residue Pairings Using 2D-Recursive Neural Networks
Input Matrix I (m×m)
2D-RNN: O = f(I)
Output / Target Matrix (m×m)
Iij: 20 inputs for residues, 3 for SS, 2 for SA
Oij: Pairing Prob.
Tij: 0/1
(i,j)
…AHYHCKRWQNEDGHTPRKDECLIELMQDAQRMRK….
i j
![Page 19: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/19.jpg)
An Example (Target)
Protein 1VJG
Beta-Residue Pairing Map (Target Matrix)
1 2 3 4 5 6 7
![Page 20: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/20.jpg)
An Example (Target)
Protein 1VJG
Beta-Residue Pairing Map (Target Matrix)
1 2 3 4 5 6 7
Antiparallel
Parallel
![Page 21: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/21.jpg)
An Example (Prediction)
![Page 22: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/22.jpg)
Stage 2: Beta-Strand Alignment
• Use output probability matrix as scoring matrix
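• Dynamic programming
• Disallow gaps and use the simplified search algorithm
Antiparallel: residues 1…m of one strand aligned against residues n…1 of the other
Parallel: residues 1…m of one strand aligned against residues 1…n of the other
Total number of alignments = 2(m+n-1)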
![Page 23: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/23.jpg)
Strand Alignment and Pairing Matrix
• The alignment score is the sum of the pairing probabilities of the aligned residues
• The best alignment is the alignment with the maximum score
• Strand Pairing Matrix
Strand Pairing Matrix of 1VJG
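The gap-free search described in Stage 2 can be sketched as follows: slide one strand along the other in both directions, score each placement by summing the predicted pairing probabilities of the aligned residues, and keep the maximum. The function name, the argument layout, and the toy probability matrix are assumptions made for illustration; this is not the BETApro implementation.

```python
def best_ungapped_alignment(prob, strand_a, strand_b):
    """prob[i][j]: predicted pairing probability of residues i and j.
    strand_a, strand_b: lists of residue indices in sequence order.
    Returns (score, direction, offset, aligned_pairs) of the best alignment."""
    m, n = len(strand_a), len(strand_b)
    candidates = []
    for direction, b in (("parallel", strand_b),
                         ("antiparallel", strand_b[::-1])):
        for offset in range(-(n - 1), m):  # m + n - 1 relative shifts
            pairs = [(strand_a[i], b[i - offset])
                     for i in range(m) if 0 <= i - offset < n]
            score = sum(prob[i][j] for i, j in pairs)
            candidates.append((score, direction, offset, pairs))
    # Two directions times (m + n - 1) shifts, as stated on the Stage 2 slide.
    assert len(candidates) == 2 * (m + n - 1)
    return max(candidates, key=lambda c: c[0])

# Hypothetical 10-residue protein with one strongly predicted antiparallel pairing.
n_res = 10
prob = [[0.0] * n_res for _ in range(n_res)]
for i, j in [(0, 9), (1, 8), (2, 7)]:
    prob[i][j] = prob[j][i] = 0.9
score, direction, offset, pairs = best_ungapped_alignment(
    prob, [0, 1, 2], [6, 7, 8, 9])
```

Running this for every pair of strands gives one best score per pair, which is what a strand pairing matrix holds.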
![Page 24: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/24.jpg)
Stage 3: Prediction of Beta-Strand Pairings and Beta-Sheet Topology
(a) Seven strands of protein 1VJG in sequence order
(b) Beta-sheet topology of protein 1VJG
![Page 25: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/25.jpg)
Minimum Spanning Tree Like Algorithm
Strand Pairing Graph (SPG)
(a) Complete SPG
Strand Pairing Matrix
![Page 26: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/26.jpg)
Minimum Spanning Tree Like Algorithm
Strand Pairing Graph (SPG)
Goal: Find a set of connected subgraphs that maximize the sum of the alignment scores and satisfy the constraints
Algorithm: Minimum Spanning Tree Like Algorithm
(a) Complete SPG (b) True Weighted SPG
Strand Pairing Matrix
![Page 27: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/27.jpg)
An Example of MST Like Algorithm
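Strand Pairing Matrix of 1VJG

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0 | | | | | | |
| 2 | 1.3 | 0 | | | | | |
| 3 | .94 | .37 | 0 | | | | |
| 4 | .02 | .02 | .04 | 0 | | | |
| 5 | .02 | .02 | .03 | 1.9 | 0 | | |
| 6 | .10 | .05 | .74 | .04 | .04 | 0 | |
| 7 | .02 | .02 | .03 | .02 | .02 | .20 | 0 |

Topology so far: 4 5
Step 1: Pair strands 4 and 5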
![Page 28: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/28.jpg)
An Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG (unchanged from Step 1)
Topology so far: 4 5; 2 1 (N terminus marked)
Step 2: Pair strands 1 and 2
![Page 29: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/29.jpg)
An Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG (unchanged from Step 1)
Topology so far: 4 5; 2 1 3 (N terminus marked)
Step 3: Pair strands 1 and 3
![Page 30: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/30.jpg)
An Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG (unchanged from Step 1)
Topology so far: 4 5; 2 1 3 6 (N terminus marked)
Step 4: Pair strands 3 and 6
![Page 31: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/31.jpg)
An Example of MST Like Algorithm
Strand Pairing Matrix of 1VJG (unchanged from Step 1)
Topology so far: 4 5; 2 1 3 6 7 (N and C termini marked)
Step 5: Pair strands 6 and 7
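The five steps of this example can be reproduced with a Kruskal-style greedy sketch over the 1VJG strand pairing matrix. The constraints used here are assumptions chosen for illustration, since the slides only say "satisfy the constraints": a strand takes at most two partners, a pair that would close a cycle is skipped, and the search stops once every strand has a partner. The function name `greedy_strand_pairing` is hypothetical.

```python
def greedy_strand_pairing(scores, max_partners=2):
    """scores: {(i, j): alignment score} for strand pairs (1-based, i < j).
    Returns the accepted pairs in the order they were chosen."""
    strands = sorted({s for pair in scores for s in pair})
    parent = {s: s for s in strands}   # union-find forest over strands
    degree = {s: 0 for s in strands}   # number of partners per strand

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    chosen = []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (i, j), _ in ranked:
        if all(degree[s] > 0 for s in strands):
            break                      # assumed stop rule: every strand is paired
        if degree[i] >= max_partners or degree[j] >= max_partners:
            continue                   # assumed constraint: at most two partners
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                   # assumed constraint: no cycles
        parent[ri] = rj
        degree[i] += 1
        degree[j] += 1
        chosen.append((i, j))
    return chosen

# Lower triangle of the strand pairing matrix of 1VJG, rows 1..7.
rows = [
    [],
    [1.3],
    [.94, .37],
    [.02, .02, .04],
    [.02, .02, .03, 1.9],
    [.10, .05, .74, .04, .04],
    [.02, .02, .03, .02, .02, .20],
]
scores = {(c + 1, r + 1): v for r, row in enumerate(rows) for c, v in enumerate(row)}
pairing = greedy_strand_pairing(scores)
```

Under these assumptions the accepted pairs come out in the same order as Steps 1 to 5, leaving two sheets: strands 4 and 5, and strands 2, 1, 3, 6, 7.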
![Page 32: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/32.jpg)
1. Beta Residue Pairing

| Method | Specificity/Sensitivity | Ratio of Improvement |
|---|---|---|
| BetaPairing | 41% | 17.8 |
| CMAPpro (Pollastri and Baldi, 2002) | 27% | 11.7 |

2. Beta Strand Alignment

| Method | Alignment Accuracy | Pairing Direction |
|---|---|---|
| BetaPairing | 66% | 84% |
| Statistical Potential (Hubbard, 1994) | 40% | X |
| Pseudo-energy (Zhu and Braun, 1999) | 35% | X |
| Information Theory (Steward and Thornton, 2002) | 37% | X |

3. Beta Strand Pairing

| Method | Specificity | Sensitivity | % of non-local pairs |
|---|---|---|---|
| MST Like | 53% | 59% | 20% |
![Page 33: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/33.jpg)
3D Structure Prediction
• Ab-Initio Structure Prediction
• Template-Based Structure Prediction
Physical force field – protein folding
Contact map – reconstruction
MWLKKFGINLLIGQSV…
……
Select structure with minimum free energy
MWLKKFGINKH…
Protein Data Bank
Fold Recognition
Alignment
Template
Simulation
Query protein
![Page 34: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/34.jpg)
A Machine Learning Information Retrieval Framework for Fold Recognition
MWLKKFGIN……
Protein Data Bank
Fold Recognition
Alignment
Template
Query Protein
Cheng and Baldi, Bioinformatics, 2006
Machine Learning Ranking
![Page 35: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/35.jpg)
Classic Fold Recognition Approaches
Sequence - Sequence Alignment
(Needleman and Wunsch, 1970; Smith and Waterman, 1981)
Query: ITAKPAKTPTSPKEQAIGLSVTFLSFLLPAGWVLYHL
Template: ITAKPQWLKTSE------------SVTFLSFLLPQTQGLYHL
Works for >40% sequence identity
(Close homologs in protein family)
Alignment (similarity) score
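As a rough illustration of how a global alignment score is computed, here is a compact Needleman-Wunsch sketch. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen for brevity; practical tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
    return score[n][m]

identical = needleman_wunsch("ITAK", "ITAK")
one_gap = needleman_wunsch("ITAK", "ITK")
```

The score of the best alignment between a query and each template is the similarity used for ranking in this classic approach.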
![Page 36: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/36.jpg)
Classic Fold Recognition Approaches
Profile - Sequence Alignment
(Altschul et al., 1997)
Query Family:
ITAKPAKTPTSPKEQAIGLSVTFLSFLLPAGWVLYHL
ITAKPEKTPTSPREQAIGLSVTFLEFLLPAGWVLYHL
ITAKPAKTPTSPKEEAIGLSVTFLSFLLPAGWVLYHL
ITAKPQKTPTSLKEQAIGLSVTFLSFLLPAGWALYHL
Template:
ITAKPQWLKTSERSTEWQSVTFLSFLLPQTQGLYHN
More sensitive for distant homologs in superfamily (>25% identity)
Average Score
![Page 37: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/37.jpg)
Classic Fold Recognition Approaches
Profile - Sequence Alignment
(Altschul et al., 1997)
Query Family:
ITAKPAKTPTSPKEQAIGLSVTFLSFLLPAGWVLYHL
ITAKPEKTPTSPREQAIGLSVTFLEFLLPAGWVLYHL
ITAKPAKTPTSPKEEAIGLSVTFLSFLLPAGWVLYHL
ITAKPQKTPTSLKEQAIGLSVTFLSFLLPAGWALYHL
Position Specific Scoring Matrix or Hidden Markov Model (columns 1 … n; example column: A 0.4, C 0.1, …, W 0.5)
Template:
ITAKPQWLKTSERSTEWQSVTFLSFLLPQTQGLYHN
More sensitive for distant homologs in superfamily (>25% identity)
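To make the profile idea concrete, the sketch below turns the four aligned query-family sequences from this slide into per-column residue frequencies. This is only the frequency table; an actual position specific scoring matrix, as built by PSI-BLAST for example, also applies pseudocounts, sequence weighting, and log-odds scaling, all of which are left out here. The function name is hypothetical.

```python
from collections import Counter

def column_frequencies(aligned_seqs):
    """Return one {residue: frequency} dict per alignment column."""
    length = len(aligned_seqs[0])
    assert all(len(s) == length for s in aligned_seqs)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs)
        profile.append({aa: c / len(aligned_seqs) for aa, c in counts.items()})
    return profile

# The query family shown on the slide (already aligned, no gaps).
family = [
    "ITAKPAKTPTSPKEQAIGLSVTFLSFLLPAGWVLYHL",
    "ITAKPEKTPTSPREQAIGLSVTFLEFLLPAGWVLYHL",
    "ITAKPAKTPTSPKEEAIGLSVTFLSFLLPAGWVLYHL",
    "ITAKPQKTPTSLKEQAIGLSVTFLSFLLPAGWALYHL",
]
profile = column_frequencies(family)
```

Column 6 of this family, for instance, holds A in two sequences and E and Q in one each, so a template residue at that position is scored against a distribution rather than a single letter.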
![Page 38: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/38.jpg)
Classic Fold Recognition Approaches
Profile - Profile Alignment
(Rychlewski et al., 2000)
Query Family:
ITAKPAKTPTSPKEQAIGLSVTFLSFLLPAGWVLYHL
ITAKPEKTPTSPREQAIGLSVTFLEFLLPAGWVLYHL
ILAKPAKTPTSPKEEAIGLSVTFLSFLLPAGWVLYHL
ITAKPQKTPTSLKEQAIGLSVTFLSFLLPAGWALYHL
Template Family:
ITAKPQWLKTSERSTEWQSVTFLSFLLPQTQGLYHN
IPARPQWLKTSKRSTEWQSVTFLSFLLPYTQGLYHN
IGAKPQWLWTSERSTEWHSVTFLSFLLPQTQGLYHM
Profiles (one per family): columns 1 … m with example column A 0.3, C 0.5, …, W 0.2; columns 1 … n with example column A 0.1, C 0.4, …, W 0.5
More sensitive for very distant homologs (>15% identity)
![Page 39: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/39.jpg)
Classic Fold Recognition Approaches
MWLKKFGINLLIGQS….
Useful for recognizing similar folds without sequence similarity (no evolutionary relationship)
Query
Template Structure
Fit
Fitness Score
Sequence - Structure Alignment (Threading)
(Bowie et al., 1991; Jones et al., 1992; Godzik and Skolnick, 1992; Lathrop, 1994)
![Page 40: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/40.jpg)
Integration of Complementary Approaches
Meta Server
FR Server 1
FR Server 2
FR Server 3
Query
Internet
Consensus
1. Reliability depends on availability of external servers
2. Makes decisions on a handful of candidates
(Lundstrom et al., 2001; Fischer, 2003)
![Page 41: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/41.jpg)
Machine Learning Classification Approach
Proteins
Class 1
Class 2
Class m
Classify individual proteins into several or dozens of structure classes
(Jaakkola et al., 2000; Leslie et al., 2002; Saigo et al., 2004)
Problem 1: can’t scale up to thousands of protein classes
Problem 2: doesn’t provide templates for structure modeling
Support Vector Machine (SVM)
![Page 42: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/42.jpg)
Machine Learning Information Retrieval Framework
Query-Template Pair
-
+
Relevance Function (e.g., SVM)
• Extract pairwise features
• Comparison of two pairs (four proteins)
• Relevant or not (one score) vs. many classes
• Ranking of templates (retrieval)
Score 1
Score 2
…
Score n
Rank
![Page 43: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/43.jpg)
Pairwise Feature Extraction
• Sequence / Family Information Features: Cosine, correlation, and Gaussian kernel
• Sequence – Sequence Alignment Features: Palign, ClustalW
• Sequence – Profile Alignment Features: PSI-BLAST, IMPALA, HMMer, RPS-BLAST
• Profile – Profile Alignment Features: ClustalW, HHSearch, Lobster, Compass, PRC-HMM
• Structural Features: Secondary structure, solvent accessibility, contact map, beta-sheet topology
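The first feature group (cosine, correlation, and Gaussian kernel) can be sketched on a simple vector representation of each protein. Using amino-acid composition vectors and a kernel width of `sigma=1.0` are assumptions for illustration; the slides do not specify which vectors the features are computed on, and all function names here are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 amino acids in seq."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def correlation(u, v):
    """Pearson correlation: cosine of the mean-centred vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])

def gaussian_kernel(u, v, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def pair_features(query, template):
    """One feature vector per query-template pair."""
    q, t = composition(query), composition(template)
    return [cosine(q, t), correlation(q, t), gaussian_kernel(q, t)]

same = pair_features("MWLKKFGINLLIGQSV", "MWLKKFGINLLIGQSV")
different = pair_features("MWLKKFGINLLIGQSV", "AGCWY")
```

In the framework described here, scores from the alignment tools and the predicted structural features would be appended to the same per-pair vector before it is passed to the relevance function.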
![Page 44: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/44.jpg)
Pairwise Feature Extraction
![Page 45: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/45.jpg)
Relevance Function: Support Vector Machine Learning
Positive Pairs (Same Folds)
Negative Pairs (Different Folds)
Training/Learning
Support Vector Machine
Training Data Set
Feature Space
Hyperplane
![Page 46: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/46.jpg)
Relevance Function: Support Vector Machine Learning
f(x) = K is Gaussian Kernel:
Margin
Margin
(1) (2)
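For reference, the textbook form of an SVM decision function with a Gaussian kernel is sketched below. The symbols $\alpha_i$, $y_i$, $b$, and $\sigma$ and the exact kernel parametrization are assumptions; the slide's own notation may differ.

```latex
f(x) = \sum_{i} \alpha_i \, y_i \, K(x, x_i) + b,
\qquad
K(x, y) = \exp\!\left( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \right)
```

Here $x$ is the feature vector of a query-template pair, $x_i$ are the training pairs, $y_i \in \{+1, -1\}$ marks same-fold versus different-fold pairs, and the value of $f(x)$ serves as the relevance score.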
![Page 47: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/47.jpg)
Training and Cross-Validation
• Standard benchmark (Lindahl’s dataset, 976 proteins)
• 976 x 975 query-template pairs (about 7,468 positives)
Queries 1 … 976, each with 975 query-template pairs (Query 1’s pairs, Query 2’s pairs, …)
Train / Learn (90%: queries 1 – 878)
Test (10%: queries 879 – 976)
Rank 975 templates for each query
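The pair construction and the query-level split shown on this slide can be sketched as below. Only the single 90%/10% split from the slide is reproduced; rotating the held-out block for the other cross-validation folds is not shown, and the helper names are hypothetical.

```python
def query_template_pairs(n_proteins):
    """All ordered (query, template) pairs with query != template, 1-based."""
    return [(q, t)
            for q in range(1, n_proteins + 1)
            for t in range(1, n_proteins + 1)
            if q != t]

def split_by_query(pairs, last_train_query):
    """Keep all pairs of one query on the same side of the split."""
    train = [p for p in pairs if p[0] <= last_train_query]
    test = [p for p in pairs if p[0] > last_train_query]
    return train, test

pairs = query_template_pairs(976)
train, test = split_by_query(pairs, 878)
```

Splitting by query rather than by pair keeps each test query's 975 templates together, so they can be ranked as a complete list.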
![Page 48: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/48.jpg)
Results for Top Five Ranked Templates
• Family: close homologs, more identity
• Superfamily: distant homologs, less identity
• Fold: no evolutionary relation, no identity
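| Method | Family | Superfamily | Fold |
|---|---|---|---|
| PSI-BLAST | 72.3 | 27.9 | 4.7 |
| HMMER | 73.5 | 31.3 | 14.6 |
| SAM-T98 | 75.4 | 38.9 | 18.7 |
| BLASTLINK | 78.9 | 40.6 | 16.5 |
| SSEARCH | 75.5 | 32.5 | 15.6 |
| SSHMM | 71.7 | 31.6 | 24.0 |
| THREADER | 58.9 | 24.7 | 37.7 |
| FUGUE | 85.8 | 53.2 | 26.8 |
| RAPTOR | 77.8 | 50.0 | 45.1 |
| SPARKS3 | 86.8 | 67.7 | 47.4 |
| FOLDpro | 89.9 | 70.0 | 48.3 |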
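One plausible reading of a "top five" result is the percentage of queries for which at least one correct template appears among the five highest-ranked templates. The sketch below computes that quantity; the metric definition, the function name, and the toy rankings are assumptions for illustration and are not taken from the benchmark.

```python
def top_k_sensitivity(rankings, relevant, k=5):
    """rankings: {query: [templates, best first]}.
    relevant: {query: set of correct templates}.
    Returns the percentage of queries with a correct template in the top k."""
    evaluated = [q for q in rankings if relevant.get(q)]
    hits = sum(1 for q in evaluated
               if any(t in relevant[q] for t in rankings[q][:k]))
    return 100.0 * hits / len(evaluated)

# Hypothetical rankings for two queries over three templates.
rankings = {"q1": ["t3", "t1", "t2"], "q2": ["t2", "t3", "t1"]}
relevant = {"q1": {"t1"}, "q2": {"t1"}}
top2 = top_k_sensitivity(rankings, relevant, k=2)
top3 = top_k_sensitivity(rankings, relevant, k=3)
```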
![Page 49: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/49.jpg)
Specificity-Sensitivity Plot (Family)
![Page 50: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/50.jpg)
Specificity-Sensitivity Plot (Superfamily)
![Page 51: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/51.jpg)
Specificity-Sensitivity Plot (Fold)
![Page 52: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/52.jpg)
Advantages of MLIR Framework
• Integration
• Accuracy
• Extensibility
• Simplicity
• Reliability
• Completeness
• Potentials
Disadvantages
• Slower than some alignment methods
![Page 53: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/53.jpg)
A CASP7 Example: T0290
Query sequence (173 residues):
RPRCFFDIAINNQPAGRVVFELFSDVCPKTCENFRCLCTGEKGTGKSTQKPLHYKSCLFHRVVKDFMVQGGDFSEGNGRGGESIYGGFFEDESFAVKHNAAFLLSMANRGKDTNGSQFFITKPTPHLDGHHVVFGQVISGQEVVREIENQKTDAASKPFAEVRILSCGELIP
Compare with the experimental structure: RMSD = 1 Å
FOLDpro
Predicted Structure
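For readers unfamiliar with the measure, RMSD is the root-mean-square distance between corresponding atoms of two structures. The sketch below assumes the two coordinate sets are already optimally superimposed and matched atom by atom; the superposition step itself is not shown, and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists (angstroms)."""
    assert len(coords_a) == len(coords_b)
    sq_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq_sum / len(coords_a))

# Hypothetical two-atom structures offset by 1 angstrom along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(predicted, experimental)
```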
![Page 54: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/54.jpg)
Publications and Bioinformatics Tools
1. P. Baldi, J. Cheng, and A. Vullo. Large-Scale Prediction of Disulphide Bond Connectivity. NIPS 2004. [DIpro 1.0]
2. J. Cheng, H. Saigo, and P. Baldi. Large-Scale Prediction of Disulphide Bridges Using Kernel Methods, Two-Dimensional Recursive Neural Networks, and Weighted Graph Matching. Proteins, 2006. [DIpro 2.0]
3. J. Cheng and P. Baldi. Three-Stage Prediction of Protein Beta-Sheets by Neural Networks, Alignments, and Graph Algorithms. Bioinformatics, 2005. [BETApro]
4. J. Cheng, A. Randall, M. Sweredoski, and P. Baldi. SCRATCH: a Protein Structure and Structural Feature Prediction Server. Nucleic Acids Research, 2005. [SSpro 4/ACCpro 4/CMAPpro 2]
5. J. Cheng, M. Sweredoski, and P. Baldi. Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data. Data Mining and Knowledge Discovery, 2005. [DISpro]
![Page 55: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/55.jpg)
Publications and Bioinformatics Tools
6. J. Cheng, L. Scharenbroich, P. Baldi, and E. Mjolsness. Sigmoid: Towards a Generative, Scalable, Software Infrastructure for Pathway Bioinformatics and Systems Biology. IEEE Intelligent Systems, 2005. [Sigmoid]
7. J. Cheng, A. Randall, and P. Baldi. Prediction of Protein Stability Changes for Single Site Mutations Using Support Vector Machines. Proteins, 2006. [MUpro]
8. S. A. Danziger, S. J. Swamidass, J. Zeng, L. R. Dearth, Q. Lu, J. H. Chen, J. Cheng, V. P. Hoang, H. Saigo, R. Luo, P. Baldi, R. K. Brachmann, and R. H. Lathrop. Functional Census of Mutation Sequence Spaces: The Example of p53 Cancer Rescue Mutants. IEEE Transactions on Computational Biology and Bioinformatics, 2006.
9. J. Cheng, M. Sweredoski, and P. Baldi. DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks. Data Mining and Knowledge Discovery, 2006. [DOMpro]
10. J. Cheng and P. Baldi. A Machine Learning Information Retrieval Approach to Protein Fold Recognition. Bioinformatics, 2006. [FOLDpro]
![Page 56: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/56.jpg)
Acknowledgements
• Pierre Baldi
• G. Wesley Hatfield, Eric Mjolsness, Hal Stern, Dennis Decoste, Suzanne Sandmeyer, Richard Lathrop, Gianluca Pollastri, Chin-Rang Yang
• Mike Sweredoski, Arlo Randall, Liza Larsen, Sam Danziger, Trent Su, Hiroto Saigo, Alessandro Vullo, Lucas Scharenbroich
![Page 57: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/57.jpg)
![Page 58: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/58.jpg)
Markov Models
![Page 59: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/59.jpg)
![Page 60: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/60.jpg)
![Page 61: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/61.jpg)
1D-Recursive Neural Network
![Page 62: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/62.jpg)
2D-Recursive Neural Network
![Page 63: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/63.jpg)
![Page 64: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/64.jpg)
2D-RNNs
![Page 65: Machine Learning Algorithms for Protein Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022062518/5681441e550346895db0bb2e/html5/thumbnails/65.jpg)
2D RNNs