jpred and jnet: protein secondary structure prediction · limits of protein secondary structure....
TRANSCRIPT
![Page 1: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/1.jpg)
JPred and Jnet:Protein Secondary
Structure Predictionwww.compbio.dundee.ac.uk/jpred
![Page 2: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/2.jpg)
Protein Sequence
Secondary StructureFold
...AI
LE
GD
AY S
H
K...
MFUNCTION?
a-helix
b-strand
![Page 3: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/3.jpg)
What is the difference between JPred and Jnet???• JNet refers to the prediction “engine” that does the
work. The current version of this is Version 2.3.1
• JPred refers to the website. This uses JNet and other tools to do predictions and present them in different ways.
![Page 4: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/4.jpg)
History of JPred/JNet
• 1987: Zpred: First predictor that used multiple sequence alignment
• 1999: Jpred 1: Did prediction by combining prediction methods developed by different groups that worked from multiple sequence alignments
• 2000: JNet 1: Multiple neural network predictor replaced all other methods in JPred
• 2002: JPred 2: Retraining JNet – improved accuracy
• 2009: JPred 3: Retraining JNet, algorithm improvements to Jnet and website refresh
• 2015: Jpred 4: Retraining JNet, major website improvements
![Page 5: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/5.jpg)
Neural Network???Machine learning method
![Page 6: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/6.jpg)
Neural Networks
• Inductive method of learning
• Supervised learning• Inputs and outputs provided
• Dependent on ‘quality’ of observations• Representative
• Unbiased (non-redundant)
InputNodes
HiddenNodes
OutputNodes
![Page 7: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/7.jpg)
Training and Testing JPred4/Jnet 2.3.1
• You need training data – where you know the answer.• We use a set of PDB domains of known structure from
the SCOP domains database
• Testing • 1. Cross-validation on 1208 domains
• 2. Blind test on 150 domains not used in training
![Page 8: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/8.jpg)
Neural Network Inputs
• Generate alignments for each sequence by searching UniRef90 with PSI-BLAST
• Make profiles:• Position-Specific Scoring Matrix (PSSM)• Hidden Markov Model (HMMer3)
• Earlier versions of JNet/Jpred had more inputs.
![Page 9: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/9.jpg)
Profiles give position-specific scoring
Aligning to Glycine at position 11 scores +6.5
Aligning to Glycine at position 23 scores -1.51
This emphasises position-specific features of the protein family
Compared to Gly-Gly score of 0.6 in the BLOSUM62 matrix.
![Page 10: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/10.jpg)
Neural Network Outputs
• DSSP definitions of secondary structure reduced from 8- to 3-state• H: Helix
• E or B: Extended strand
• Everything else: coil
![Page 11: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/11.jpg)
Query --------KMLTQRAEIDRAFEEAAGSAETLSVERLVTFLQHQQRSeq 2 --------KILTKREEIDVIYGEYAKTDGLMSANDLLNFLLTEQR
Seq 3 --------KALTKRAEVQELFESFSADGQKLTLLEFLDFLREEQKSeq 4 --------RELLRRPELDAVFIQYSANGCVLSTLDLRDFLSD-QG
DSSP --HHHHHHHHH------HHHHHHHHHHHHH-------
}
# Input 10 0 0 0 0 0 0 0 0.9820 0.1192 0.28689 ...
# Output 11 0 0
Each position in the input vector is the score for an amino acid within the window
![Page 12: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/12.jpg)
Query --------KMLTQRAEIDRAFEEAAGSAETLSVERLVTFLQHQQRSeq 2 --------KILTKREEIDVIYGEYAKTDGLMSANDLLNFLLTEQR
Seq 3 --------KALTKRAEVQELFESFSADGQKLTLLEFLDFLREEQKSeq 4 --------RELLRRPELDAVFIQYSANGCVLSTLDLRDFLSD-QG
DSSP --HHHHHHHHH------HHHHHHHHHHHHH-------
}
# Input 20 0 0 0 0 0 0 0.9820 0.1192 0.28689 0.0474 ...
# Output 21 0 0
Each position in the input vector is the score for an amino acid within the window
![Page 13: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/13.jpg)
Query --------KMLTQRAEIDRAFEEAAGSAETLSVERLVTFLQHQQRSeq 2 --------KILTKREEIDVIYGEYAKTDGLMSANDLLNFLLTEQR
Seq 3 --------KALTKRAEVQELFESFSADGQKLTLLEFLDFLREEQKSeq 4 --------RELLRRPELDAVFIQYSANGCVLSTLDLRDFLSD-QG
DSSP --HHHHHHHHH------HHHHHHHHHHHHH-------
}
# Input 30 0 0 0 0 0 0.9820 0.1192 0.28689 0.0474 0.1192 ...
# Output 30 0 1
Each position in the input vector is the score for an amino acid within the window
![Page 14: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/14.jpg)
Two-Layer Ensemble
E-EEEEH-------HHH-HHH-HHH-HE- --EEEEE-------HHHHHHHHHHHHHH-
Sequence to Structure Structure to Structure
Actually, both have hundreds of inputs and three outputs – only two outputs shown for simplicity
![Page 15: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/15.jpg)
Training and Testing JPred4/Jnet 2.3.1
• You need training data – where you know the answer.• We use a set of PDB domains of known structure from
the SCOP domains database
• Testing • 1. Cross-validation on 1358 domains
• 2. Blind test on 150 domains not used in training
![Page 16: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/16.jpg)
Cross-validation training
• ‘Blind’ data - removed subset
• k-fold Cross-Validation (887 seqs)• Divide training data into k groups, train on k-1 and test
on remainder. Do this k times.
Train Test
![Page 17: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/17.jpg)
Jnet Version 1: Multiple methodsof presenting alignmentinformation.
If alternative networks do not agree, predict with network trained on difficultto predict regions.
![Page 18: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/18.jpg)
70.8
71.6
72.1
74.4
75.2
76.5
76.9
67 68 69 70 71 72 73 74 75 76 77 78
Blosum 62 profile ClustalW
Frequency profile ClustalW
Frequency profile PSIBLAST
HMMER profile ClustalW
PSSM PSIBLAST
Average of HMMER and PSSM PSIBLAST
Jury/No Jury network
JNet Version 1: Effect of Different Alignment Inputs
Average Percentage Accuracy
(7-fold cross-validation on 480 proteins)
JNet also accurately predicts whether amino acids will be buried or exposedin the folded structure of the protein.
![Page 19: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/19.jpg)
Blind Test
JNet Version 1
![Page 20: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/20.jpg)
Cuff, J. A. & Barton, G. J., (2000), Proteins 40: 502-511.
Comparison of JNet Version 1.0 to other Prediction Methodsin a Blind Test - (406 proteins)
62.0
70.6 73.372.370.7
74.6 76.4
![Page 21: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/21.jpg)
That was in 2000What has happened since?
![Page 22: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/22.jpg)
Average Prediction Accuracy is Rising, but Flattening off
Current JPred
![Page 23: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/23.jpg)
Confidence in prediction. JNet 1.0 vs Jnet 2.0
![Page 24: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/24.jpg)
Latest Jnet…
Dr. Alexey Drozdetskiy
Mean Q3 prediction accuracy curves
Residue Coverage curves
![Page 25: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/25.jpg)
Introducing the Practical
• At last!
![Page 26: JPred and Jnet: Protein Secondary Structure Prediction · Limits of protein Secondary Structure. Title: PowerPoint Presentation Author: Geoff Barton Created Date: 5/9/2015 9:35:26](https://reader036.vdocuments.us/reader036/viewer/2022071506/6126b6b40ecdd2577a449bd4/html5/thumbnails/26.jpg)
Limits of protein Secondary Structure