Development of classification methods to predict new 14-3-3-binding proteins and phosphopeptides

Post on 24-Feb-2016


Development of classification methods to predict new 14-3-3-binding proteins and phosphopeptides

Fábio M. Marques Madeira

Supervisor: Professor Geoff Barton

7th May 2013

14-3-3s dock onto pairs of tandem phosphoSer/Thr

[Figure: Kinase 1 and Kinase 2, a pair of phosphorylated sites (P P), and 14-3-3]

Hundreds of structurally and functionally diverse targets

2R-ohnologue families

The binding specificity of 14-3-3s is determined by overall steric fit and the sequence flanking the phosphoSer/Thr site


Mode I: RSX(pS/T)XP

Mode II: RX(F/Y)X(pS)XP

Mode III: C-terminal X(pS/T)
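As a rough illustration of how the Mode I and Mode II consensus motifs above could be searched for in a protein sequence, here is a minimal regular-expression sketch. It is not the method used in the talk: the function name `scan_motifs` and the example sequence are made up, X is assumed to mean any residue, and because a plain sequence carries no phosphorylation annotation the code can only flag candidate Ser/Thr positions. Mode III is left out because the slide shorthand does not pin down the exact C-terminal spacing.

```python
import re

# Consensus motifs transcribed from the slide, with X read as "any residue".
# Each entry is (pattern, offset of the candidate Ser/Thr within the match).
# Lookaheads are used so that overlapping matches are all reported.
MOTIFS = {
    "Mode I": (re.compile(r"(?=(RS.[ST].P))"), 3),     # RSX(pS/T)XP
    "Mode II": (re.compile(r"(?=(R.[FY].S.P))"), 4),   # RX(F/Y)X(pS)XP
}


def scan_motifs(sequence):
    """Return (mode, 1-based site position, matched text), sorted by position."""
    hits = []
    for mode, (pattern, offset) in MOTIFS.items():
        for match in pattern.finditer(sequence):
            site = match.start() + offset + 1  # convert to 1-based numbering
            hits.append((mode, site, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical sequence, used only to exercise the function.
example_hits = scan_motifs("MARSASAPGGRAFASVPK")
```

A strict consensus search like this only finds sites that fit the motif exactly, which is one reason to look at trained classifiers instead.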

Johnson et al. (2011) Molecular & Cellular Proteomics 10, M110.005751.

ANIA: ANnotation and Integrated Analysis of the 14-3-3 interactome


Development and evaluation of three new classifiers


Position-specific scoring matrix (PSSM)

Artificial Neural Network (ANN)

Support Vector Machines (SVM)
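To make the first of these classifiers concrete, the sketch below shows one common way to build and apply a position-specific scoring matrix: log-odds of observed residue frequencies against a background, with pseudocounts. The uniform background, the pseudocount value, the function names and the three training peptides are all assumptions for illustration, and the PSSM described in the talk may have been built differently.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length peptides.

    Assumes a uniform background frequency (1/20) for every residue.
    Returns a list with one {residue: score} dictionary per position.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [p[pos] for p in peptides]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_peptide(pssm, peptide):
    """Sum the per-position scores; unknown symbols (e.g. padding) score 0."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the PSSM length")
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, peptide))


# Hypothetical training peptides centred on a Ser/Thr, for illustration only.
toy_pssm = build_pssm(["RSASAPL", "RSGSLPA", "RTASFPV"])
motif_like = score_peptide(toy_pssm, "RSASAPL")
unrelated = score_peptide(toy_pssm, "GGGGGGG")
```

Peptides resembling the training set receive higher scores than unrelated ones, and a score threshold then turns the matrix into a binary classifier.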

Defining positive and negative examples for training and testing

Training datasets:

Previous: 76 Pos, 76 Neg

Current: 273 Pos, 93 Neg

1,192 Likely Neg

72 Proteins

[Figure: protein from N- to C-terminus with two pS/T sites]


Blind datasets:

Previous: 17 Pos, 17 Neg

Current: 38 Pos, 38 Neg

Sequence redundancy thresholds: 60%, 50% and 40%

Different motif regions/lengths: -3:3, -5:5, -7:7, -9:9, -11:11
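The motif regions listed above are windows of residues on either side of the candidate pS/T. A minimal sketch of cutting such a window out of a protein sequence follows. The function name, the 1-based site numbering and the use of "X" to pad windows that run past the protein termini are assumptions, not details given in the slides.

```python
def extract_window(sequence, site, upstream=5, downstream=5, pad="X"):
    """Return the residues from -upstream to +downstream around `site`.

    `site` is the 1-based position of the candidate Ser/Thr. Windows that
    extend past either terminus are padded with `pad`, so every window has
    length upstream + 1 + downstream.
    """
    index = site - 1
    if not 0 <= index < len(sequence):
        raise ValueError("site lies outside the sequence")
    left = sequence[max(0, index - upstream):index]
    right = sequence[index + 1:index + 1 + downstream]
    return (
        pad * (upstream - len(left))
        + left
        + sequence[index]
        + right
        + pad * (downstream - len(right))
    )


# Hypothetical sequence, used only to exercise the function.
protein = "MARSASAPGGRAFASVPK"
window_5_5 = extract_window(protein, 6)                            # [-5:5]
window_near_terminus = extract_window(protein, 2, 3, 3)            # [-3:3], padded
window_asymmetric = extract_window(protein, 15, upstream=6, downstream=3)
```

Keeping separate `upstream` and `downstream` arguments means the same helper also covers non-symmetrical regions.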

The area under the curve (AUC) was assessed by jackknife testing
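The slides do not spell out the jackknife protocol. A common reading in this field is leave-one-out testing: each example is held out in turn, the model is trained on the rest, and the held-out scores are pooled to compute the AUC. The sketch below follows that reading. The toy one-feature "model" and its data are placeholders and have nothing to do with the actual classifiers.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def jackknife_scores(examples, labels, train, score):
    """Score each example with a model trained on all the other examples."""
    held_out = []
    for i in range(len(examples)):
        train_x = examples[:i] + examples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        held_out.append(score(model, examples[i]))
    return held_out


# Toy stand-in model (hypothetical): one numeric feature per example.
def train_means(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def score_by_distance(model, x):
    pos_mean, neg_mean = model
    # Higher when x is closer to the positive mean than to the negative mean.
    return abs(x - neg_mean) - abs(x - pos_mean)


features = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
labels = [1, 1, 1, 0, 0, 0]
held_out = jackknife_scores(features, labels, train_means, score_by_distance)
pos_scores = [s for s, y in zip(held_out, labels) if y == 1]
neg_scores = [s for s, y in zip(held_out, labels) if y == 0]
jackknife_auc = auc(pos_scores, neg_scores)
```

In practice `train` and `score` would wrap the PSSM, ANN or SVM, and `examples` would be the motif windows.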


Q - Accuracy

MCC - Matthews Correlation Coefficient
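Both measures follow from the confusion matrix using their standard definitions: Q = (TP + TN) / (TP + TN + FP + FN) and MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A small sketch follows. The counts in the usage lines are invented for illustration and are not results from the talk.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Q: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, from -1 to +1.

    Returns 0.0 when a row or column of the confusion matrix is empty,
    which is the usual convention for the undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts, for illustration only.
q_value = accuracy(tp=8, tn=7, fp=3, fn=2)
mcc_value = mcc(tp=8, tn=7, fp=3, fn=2)
```

MCC is the more informative of the two when the positive and negative sets differ in size, because it uses all four cells of the confusion matrix.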

Amino acid alphabet reduction reduces accuracy

Grouping the 20 amino acids into 10 physicochemical classes (Li et al., 2003; Livingstone and Barton, 1993)

Overall, alphabet reduction led to lower classification performance, suggesting that some sequence features that influence 14-3-3 binding were lost in the reduction.
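Alphabet reduction amounts to mapping each residue to a representative of its class before training. The transcript does not preserve the grouping that was shown, so the ten classes below are an illustrative assumption and may differ from the ones used in the talk. The function name is also hypothetical.

```python
# Illustrative 10-class grouping (an assumption, not taken from the slides).
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every residue to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_alphabet(peptide):
    """Rewrite a peptide in the reduced alphabet; unknown symbols pass through."""
    return "".join(REDUCE.get(aa, aa) for aa in peptide)


reduced = reduce_alphabet("RTDYVIM")
```

After reduction, residues in the same class become indistinguishable to the classifier (here Arg and Lys, or Ser and Thr), which is one way that binding-relevant detail could be lost.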

Protein secondary structure, disorder and conservation do not improve the performance of the ANN

Sequence conservation

Protein secondary structure by Jpred

Protein disorder by IUPred, DisEMBL and GlobPlot

P – Positives; N – Negatives (true + likely neg); L – Likely neg only; R – Random neg


Blind testing shows that the PSSM is the best overall predictor

80% Overall Accuracy

Prediction of new 14-3-3-binding sites using the PSSM

Human Proteome

The PSSM predictor outperforms Scansite in terms of accuracy

Scansite includes a set of predictions based on the type I 14-3-3-binding motif: RSX(pS/T)XP

[Figure: PSSM vs. Scansite]

Conclusions

New strategy to map negative datasets

Performance improvement (AUC from ~0.80 to 0.88) and 80% accuracy for the PSSM model (60% redundancy threshold and [-5:5] motif region)

Large-scale prediction of the human 14-3-3-binding proteome

The PSSM classifier outperforms Scansite in terms of accuracy


Future work

1. Test training of the classifiers using non-symmetrical motif regions, e.g. [-6:3]

2. Investigate new machine learning algorithms such as Bayesian classifiers

3. Use the PSSM classifier to predict the 14-3-3-binding proteome of model organisms such as Arabidopsis thaliana

4. Integrate predictions in ANIA and investigate if the candidate sites are lynchpin sites conserved across 2R-ohnologue family members

Acknowledgements

Geoff Barton

Chris Cole

All members in the Computational Biology group

Carol MacKintosh and Michele Tinti
