Development of classification methods to predict new 14-3-3-binding proteins and phosphopeptides
Fábio M. Marques Madeira
Supervisor: Professor Geoff Barton
7th May 2013
14-3-3s dock onto pairs of tandem phosphoSer/Thr
[Figure labels: 14-3-3; P, P; Kinase 1, Kinase 2]
Hundreds of structurally and functionally diverse targets
2R-ohnologue families
The binding specificity of 14-3-3s is determined by overall steric fit and the sequence flanking the phosphoSer/Thr site
Mode I: RSX(pS/T)XP
Mode II: RX(F/Y)X(pS)XP
Mode III: C-terminal X(pS/T)
Johnson et al. (2011) Molecular & Cellular Proteomics 10, M110.005751.
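The three binding modes listed above can be read as simple sequence patterns. A minimal sketch follows, assuming X means any residue and that sequences are plain one-letter strings, so pS/T is written as [ST] and phosphorylation state is not checked. The names MODES and find_motifs are hypothetical and are not taken from the talk.

```python
import re

# Assumed regex translations of the three motif modes (X = any residue).
# Lookaheads are used so that overlapping matches are all reported.
MODES = {
    "I": re.compile(r"(?=(RS.[ST].P))"),     # RSX(pS/T)XP
    "II": re.compile(r"(?=(R.[FY].S.P))"),   # RX(F/Y)X(pS)XP
    "III": re.compile(r"(?=(.[ST])$)"),      # C-terminal X(pS/T)
}

def find_motifs(sequence):
    """Return (mode, 0-based start, matched text) for every motif-like match."""
    hits = []
    for mode, pattern in MODES.items():
        for match in pattern.finditer(sequence):
            hits.append((mode, match.start(1), match.group(1)))
    return hits

# Toy sequence, invented for illustration only.
print(find_motifs("MARSASAPKRAYASEPLT"))
```

A pattern match of this kind only flags candidate sites; the classifiers described later in the talk score the flanking sequence instead of relying on an exact pattern.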
ANIA: ANnotation and Integrated Analysis of the 14-3-3 interactome
Development and evaluation of three new classifiers
Position-specific scoring matrix (PSSM)
Artificial Neural Network (ANN)
Support Vector Machines (SVM)
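Of the three classifiers, the PSSM is the simplest to illustrate. The sketch below builds a log-odds matrix from equal-length positive peptides and sums the per-position scores for a query peptide. The uniform background, the pseudocount of 1 and the toy peptides are assumptions made for illustration; the talk does not state how its PSSM was parameterised.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(peptides, pseudocount=1.0):
    """Log-odds PSSM from equal-length peptides (uniform background assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(peptides[0])):
        counts = Counter(peptide[pos] for peptide in peptides)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm

def score(pssm, peptide):
    """Sum of per-position log-odds; unknown residues contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, peptide))

# Toy positive peptides, invented for illustration only.
positives = ["RSASAPK", "RSHSYPA", "RSQSLPE"]
pssm = build_pssm(positives)
print(score(pssm, "RSASAPK"), score(pssm, "GGGGGGG"))
```

A peptide would be called a predicted binder when its score exceeds a threshold chosen on the training data.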
Defining positive and negative examples for training and testing
Training datasets:
Previous: 76 Pos, 76 Neg
Current: 273 Pos, 93 Neg
1,192 Likely Neg
72 Proteins
[Figure labels: pS/T, pS/T; C-, -N]
Blind datasets:
Previous: 17 Pos, 17 Neg
Current: 38 Pos, 38 Neg
Sequence redundancy thresholds: 60%, 50% and 40%
Different motif regions/lengths: [-3:3], [-5:5], [-7:7], [-9:9], [-11:11]
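A motif region such as -5:5 denotes the residues from five positions before to five positions after the pS/T site. One way to cut such windows out of a protein sequence is sketched below. Padding with "X" near the termini and the function name motif_window are assumptions for illustration, not details given in the talk.

```python
def motif_window(sequence, site, upstream=5, downstream=5, pad="X"):
    """Return residues site-upstream .. site+downstream (site is a 0-based index).

    Positions that fall outside the sequence are filled with `pad`, so every
    window has length upstream + downstream + 1.
    """
    left = site - upstream
    right = site + downstream + 1
    prefix = pad * max(0, -left)
    suffix = pad * max(0, right - len(sequence))
    return prefix + sequence[max(0, left):min(len(sequence), right)] + suffix

# Toy sequence, invented for illustration only.
seq = "MARSASAPKRAYASEPLT"
print(motif_window(seq, 5))          # -5:5 window around the S at index 5
print(motif_window(seq, 17, 3, 3))   # -3:3 window around the C-terminal T
```

Because upstream and downstream are separate parameters, the same function also covers non-symmetrical regions.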
Development and evaluation of three new classifiers
The area under the curve (AUC) was assessed by jackknife testing
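As a rough illustration of this evaluation, the sketch below assumes that jackknife here means leave-one-out testing: each example is scored by a model trained on all the others, and the AUC is then computed from the held-out scores. The train and score callables and the toy data are hypothetical placeholders, not the classifiers from the talk.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def jackknife_scores(examples, labels, train, score):
    """Score each example with a model trained on all the other examples."""
    held_out = []
    for i, example in enumerate(examples):
        rest = [e for j, e in enumerate(examples) if j != i]
        rest_labels = [l for j, l in enumerate(labels) if j != i]
        model = train(rest, rest_labels)
        held_out.append(score(model, example))
    return held_out

# Toy one-dimensional "model": the mean of the training values.
examples, labels = [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]
scores = jackknife_scores(
    examples, labels,
    train=lambda xs, ys: sum(xs) / len(xs),
    score=lambda model, x: x - model,
)
pos = [s for s, l in zip(scores, labels) if l == 1]
neg = [s for s, l in zip(scores, labels) if l == 0]
print(auc(pos, neg))
```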
Q - Accuracy
MCC - Matthews Correlation Coefficient
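Both measures follow from the confusion-matrix counts using their standard definitions. A minimal sketch, with hypothetical counts chosen only to make the arithmetic easy to follow:

```python
import math

def accuracy(tp, tn, fp, fn):
    """Q: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical counts, not results from the talk.
print(accuracy(8, 8, 2, 2), mcc(8, 8, 2, 2))
```

MCC ranges from -1 to 1 and, unlike Q, remains informative when the positive and negative sets differ greatly in size.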
Amino acid alphabet reduction reduces accuracy
Grouping 20 amino acids in 10 physicochemical classes (Li et al., 2003; Livingstone and Barton, 1993)
Overall, alphabet reduction led to lower classification performance, suggesting that some sequence features that influence 14-3-3 binding were lost by the reduction.
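Alphabet reduction amounts to a many-to-one mapping applied to each peptide before training. The 10-class grouping below is an illustrative assumption and not necessarily the grouping used in the talk; GROUPS and reduce_alphabet are hypothetical names.

```python
# Illustrative 10-class grouping (an assumption, not taken from the slides).
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every amino acid to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

def reduce_alphabet(peptide):
    """Rewrite a peptide in the reduced alphabet; unknown symbols pass through."""
    return "".join(REDUCE.get(aa, aa) for aa in peptide)

# Toy peptides, invented for illustration only.
print(reduce_alphabet("RSASAPK"))
print(reduce_alphabet("RSHTYPD"))
```

Under this mapping R and K become indistinguishable, as do S and T, which shows how a reduced alphabet can discard residue-level distinctions around the binding site.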
Protein secondary structure, disorder and conservation do not improve the performance of the ANN
Sequence conservation
Protein secondary structure by Jpred
Protein disorder by IUPred, DisEMBL and GlobPlot
P – Positives; N – Negatives (true + likely neg); L – Likely neg only; R – Random neg
Blind testing shows that the PSSM is the best overall predictor
80% Overall Accuracy
Prediction of new 14-3-3-binding sites using the PSSM
Human Proteome
Scansite includes a set of predictions based on the type I 14-3-3-binding motif: RSX(pS/T)XP
The PSSM predictor outperforms Scansite in terms of accuracy
[Figure panels: PSSM, Scansite]
Conclusions
New strategy to map negative datasets
Performance improvement (AUC from ~0.80 to 0.88) and 80% accuracy for the PSSM model (60% redundancy threshold, [-5:5] motif region)
Large-scale prediction of the human 14-3-3-binding proteome
The PSSM classifier outperforms Scansite in terms of accuracy
Future work
1. Train and test the classifiers using non-symmetrical motif regions, e.g. [-6:3]
2. Investigate new machine learning algorithms such as Bayesian classifiers
3. Use the PSSM classifier to predict the 14-3-3-binding proteome of model organisms such as Arabidopsis thaliana
4. Integrate predictions in ANIA and investigate whether the candidate sites are lynchpin sites conserved across 2R-ohnologue family members
Acknowledgements
Geoff Barton
Chris Cole
All members in the Computational Biology group
Carol MacKintosh and Michele Tinti