Localization prediction of transmembrane proteins
![Page 1: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/1.jpg)
Localization prediction of transmembrane proteins
Stefan Maetschke, Mikael Bodén and Marcus Gallagher
The University of Queensland
![Page 2: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/2.jpg)
Maetschke et al., The University of Queensland
Protein classes
Protein → Soluble | Membrane
Membrane → Integral | Peripheral
Integral → Transmembrane | Anchored
Transmembrane → α-helical | β-barrel
α-helical → Single-spanning | Multi-spanning
![Page 3: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/3.jpg)
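Transmembrane protein types
[Figure: membrane topologies of Type-I, Type-II, Type-III and Type-IV (multi-spanning) transmembrane proteins, with the N- and C-termini (N, C) marked relative to the cytosol (inside); a signal peptide is labelled.]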
![Page 4: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/4.jpg)
Eukaryotic cell
[Figure: eukaryotic cell with labelled compartments: Nucleus, Mitochondrion, Peroxisome, Lysosome, Endoplasmic Reticulum, Golgi Complex, ERGIC and Endosome; RNA and Ribosome are also labelled.]
![Page 5: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/5.jpg)
Secretory and endocytic pathway
![Page 6: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/6.jpg)
Problem and hypothesis
• Sorting signals for transmembrane proteins serve multiple purposes (targeting, retention, retrieval, avoidance) and are largely unknown, so the problem is challenging and multi-faceted.
• Current localization prediction of eukaryotic transmembrane proteins is poor: models based on soluble proteins are ill-suited, so previous work is inadequate and incomplete.
• Localization prediction for transmembrane proteins is virtually unexplored (paucity and variance of data); it remains an open problem.
• Explicit modelling of protein topology should enhance localization prediction accuracy, because parameter tuning receives explicit guidance towards biologically sensible solutions. This is the approach proposed here.
![Page 7: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/7.jpg)
Hidden Markov model
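Initial state probabilities:
$$\pi_i = P(q_1 = S_i)$$
State transition probabilities:
$$A = \{a_{ij}\}, \quad a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$$
Observation probabilities:
$$B = \{b_i(k)\}, \quad b_i(k) = P(o_t = V_k \mid q_t = S_i)$$
[Figure: three-state left-to-right model S1 → S2 → S3 with self-transitions a11, a22, a33, forward transitions a12, a23 and emission distributions b1, b2, b3. Each state emits one of 20 symbols (single amino acids), indexed 1 = A, 2 = R, ..., 20 = V. An example state sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3 is shown aligned with an observation sequence.]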
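The three parameter sets on this slide (initial, transition and observation probabilities) are enough to score a sequence under the model. The sketch below is a minimal scaled forward algorithm in Python, not the authors' implementation: the function name, the toy transition values and the uniform emission table are all assumptions chosen for illustration.

```python
import math

# Assumed ordering of the 20 observation symbols V_1..V_20 (A first, V last).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def forward_log_likelihood(seq, pi, A, B):
    """Return log P(seq | pi, A, B) for a non-empty sequence.

    pi[i]    : initial probability of state S_i
    A[i][j]  : transition probability S_i -> S_j
    B[i][aa] : probability that state S_i emits amino acid aa
    """
    n_states = len(pi)
    # alpha[i] is proportional to P(o_1..o_t, q_t = S_i); it is rescaled
    # at every step so long sequences do not underflow.
    alpha = [pi[i] * B[i][seq[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][seq[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

# Toy left-to-right model with three states (illustrative numbers only).
pi = [1.0, 0.0, 0.0]
A = [[0.7, 0.3, 0.0],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]
uniform = {aa: 1.0 / 20 for aa in AMINO_ACIDS}
B = [uniform, uniform, uniform]

score = forward_log_likelihood("MKV", pi, A, B)
```

With uniform emissions every sequence of length T scores T·log(1/20), which is a convenient sanity check before plugging in trained parameters.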
![Page 8: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/8.jpg)
2-order Hidden Markov model
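Initial state probabilities:
$$\pi_i = P(q_1 = S_i)$$
State transition probabilities:
$$A = \{a_{ij}\}, \quad a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$$
Observation probabilities:
$$B = \{b_i(k)\}, \quad b_i(k) = P(o_t = V_k \mid q_t = S_i)$$
[Figure: three-state left-to-right model S1 → S2 → S3 with self-transitions a11, a22, a33, forward transitions a12, a23 and emission distributions b1, b2, b3. Each state emits one of 400 symbols (amino acid pairs), indexed 1 = AA, 2 = AR, 3 = AN, 4 = AD, ..., 400 = VV. An example state sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3 is shown aligned with an observation sequence.]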
![Page 9: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/9.jpg)
3-order Hidden Markov model
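Initial state probabilities:
$$\pi_i = P(q_1 = S_i)$$
State transition probabilities:
$$A = \{a_{ij}\}, \quad a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$$
Observation probabilities:
$$B = \{b_i(k)\}, \quad b_i(k) = P(o_t = V_k \mid q_t = S_i)$$
[Figure: three-state left-to-right model S1 → S2 → S3 with self-transitions a11, a22, a33, forward transitions a12, a23 and emission distributions b1, b2, b3. Each state emits one of 8000 symbols (amino acid triplets), indexed 1 = AAA, 2 = AAR, 3 = AAN, 4 = AAD, 5 = AAC, 6 = AAQ, ..., 8000 = VVV. An example state sequence s1 s1 s1 s2 s2 s2 s2 s2 s2 s3 is shown aligned with an observation sequence.]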
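For the higher-order models the observation alphabet grows from 20 amino acids to 400 pairs and 8000 triplets. The Python sketch below shows one possible way to index such symbols. The residue ordering beyond the entries visible on the slides and the use of overlapping windows are assumptions, and the function names are hypothetical.

```python
from itertools import product

# Assumed residue order; it reproduces the indices visible on the slides
# (A, R, N, D, C, Q first and V last), the rest is a guess.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def kmer_index(k):
    """Map every k-mer to a 1-based index, with the last residue varying fastest."""
    return {
        "".join(residues): i
        for i, residues in enumerate(product(AMINO_ACIDS, repeat=k), start=1)
    }

def encode(seq, k):
    """Turn a protein sequence into k-mer indices using overlapping windows."""
    table = kmer_index(k)
    return [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]

pairs = kmer_index(2)      # 400 symbols: AA, AR, AN, AD, ..., VV
triplets = kmer_index(3)   # 8000 symbols: AAA, AAR, ..., VVV
encoded = encode("AARN", 2)  # windows AA, AR, RN
```

The alphabet size is 20 to the power k, so each additional order multiplies the number of emission parameters per state by 20. This is the trade-off behind comparing 1-, 2- and 3-order models on a dataset of limited size.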
![Page 10: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/10.jpg)
Signal peptide
[Figure: signal peptide regions (N-terminal region, hydrophobic core, cleavage region) followed by the mature protein.]
![Page 11: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/11.jpg)
Transmembrane domain
icap TMD ocap
![Page 12: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/12.jpg)
Protein topology model
[Figure: topology model built from the states SP, N-term, ocap, TMD, icap and C-term, with the outside and inside of the membrane marked.]
![Page 13: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/13.jpg)
Localization model (5 x topology models)
[Figure: eukaryotic cell with labelled compartments: Nucleus, Mitochondrion, Peroxisome, Lysosome, Endoplasmic Reticulum, Golgi Complex, ERGIC and Endosome.]
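A localization model made of five topology models suggests scoring a protein with each location's model and reporting the best-scoring location. The slides do not spell out this decision rule, so the Python sketch below is an assumption. The toy scoring functions are hypothetical stand-ins for trained topology HMMs, and the "favoured" residues are arbitrary and carry no biological meaning.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def make_toy_model(favoured, boost=3.0):
    """Hypothetical stand-in for a trained topology model.

    Returns a function that scores a sequence by the log-probability of a
    simple composition model in which `favoured` residues are `boost`
    times more likely than the others.
    """
    weights = {aa: (boost if aa in favoured else 1.0) for aa in AMINO_ACIDS}
    total = sum(weights.values())
    log_p = {aa: math.log(w / total) for aa, w in weights.items()}
    return lambda seq: sum(log_p[aa] for aa in seq)

def classify(seq, models):
    """Score seq with every location model and return (best location, all scores)."""
    scores = {location: score(seq) for location, score in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# One model per location, as in "5 x topology models".
models = {
    "Plasma Membrane": make_toy_model("L"),
    "Endoplasmic Reticulum": make_toy_model("K"),
    "Golgi Complex": make_toy_model("S"),
    "Lysosome": make_toy_model("Y"),
    "Endosome": make_toy_model("D"),
}

prediction, scores = classify("KKKAL", models)
```

In a real predictor each entry of `models` would return the log-likelihood of the sequence under that location's trained topology model; the comparison step stays the same.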
![Page 14: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/14.jpg)
LOCATE dataset
• Subset of the LOCATE database (FANTOM3, mouse proteome)
• Filtered for transmembrane proteins
• No multi-targeted proteins
• Redundancy reduced (<25%)
• TMDs and SPs are labeled (predicted)
• High-quality localization annotation

Proteins per location:
873 Plasma Membrane
261 Endoplasmic Reticulum
141 Golgi Complex
45 Lysosome
31 Endosome
1351 Total
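The class counts are strongly skewed towards the plasma membrane. The short Python sketch below recomputes the total and the class fractions from the numbers on this slide; the variable names are illustrative.

```python
# Proteins per location, copied from the slide.
counts = {
    "Plasma Membrane": 873,
    "Endoplasmic Reticulum": 261,
    "Golgi Complex": 141,
    "Lysosome": 45,
    "Endosome": 31,
}

total = sum(counts.values())
fractions = {location: n / total for location, n in counts.items()}

# Accuracy of a trivial predictor that always answers the largest class.
majority_class = max(counts, key=counts.get)
majority_baseline = fractions[majority_class]

for location, n in counts.items():
    print(f"{location:22s} {n:5d} {fractions[location]:6.1%}")
```

A predictor that always answers "Plasma Membrane" would be right for roughly 65% of these proteins, so plain accuracy is easy to inflate on this dataset and a correlation-based measure is more informative.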
![Page 15: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/15.jpg)
Prediction performance
[Figure: bar chart of prediction performance (MCC, axis from 0 to 0.5) for SVM-1, SVM-2, HMM-1, HMM-2 and HMM-3.]
• LOCATE dataset
• Mean correlation coefficient
• 10-fold cross-validation, repeated 10 times
• Five locations (ER, PM, GO, EN, LY)
• SVM: linear kernel
• 1-, 2- and 3-order HMMs
[Figure: confusion matrix for HMM-2.]
=> Di-peptide composition superior to single amino acid composition
=> Topological model superior to non-topological model
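Performance on this slide is reported as MCC, described as a mean correlation coefficient. One common reading, assumed here and not confirmed by the slides, is the Matthews correlation coefficient computed per location in a one-vs-rest fashion and then averaged. The Python sketch below computes that quantity from a confusion matrix; the function names and the example matrix are invented for illustration and are not the authors' results.

```python
import math

def per_class_mcc(conf, k):
    """Matthews correlation coefficient for class k, one-vs-rest.

    conf[i][j] = number of proteins of true class i predicted as class j.
    """
    n = len(conf)
    tp = conf[k][k]
    fn = sum(conf[k]) - tp                        # true k, predicted other
    fp = sum(conf[i][k] for i in range(n)) - tp   # true other, predicted k
    tn = sum(sum(row) for row in conf) - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mean_mcc(conf):
    """Average the per-class MCC over all classes."""
    return sum(per_class_mcc(conf, k) for k in range(len(conf))) / len(conf)

# Invented 3-class confusion matrix, for illustration only.
example = [
    [8, 1, 1],
    [2, 6, 2],
    [1, 1, 8],
]
example_score = mean_mcc(example)
```

A perfect predictor yields 1.0 and a predictor uncorrelated with the true labels yields 0, regardless of how unevenly the classes are populated.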
![Page 16: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/16.jpg)
Predictor comparison
[Figure: bar chart of prediction accuracy in % (axis from 0 to 80). Predictors in chart order: CELLO, WolfPSort, PAnalyst, HMM-2; bar labels in chart order: 18, 33, 48, 75.]
CELLO 2.5: http://cello.life.nctu.edu.tw/
WolfPSort: http://wolfpsort.seq.cbrc.jp/
ProteomeAnalyst 2.5: http://www.cs.ualberta.ca/~bioinfo/PA/Sub/
HMM-2: http://pprowler.itee.uq.edu.au/TMPHMMLoc
• Test set (20 PM, 20 ER, 20 Golgi)
• HMM: only three classes but test set train set
• Other predictors: more classes but test set train set
→ difficult to compare!
![Page 17: Localization prediction of transmembrane proteins](https://reader035.vdocuments.us/reader035/viewer/2022081508/568147f1550346895db528bb/html5/thumbnails/17.jpg)
Conclusion
• Novel predictor for subcellular localization of transmembrane proteins along the secretory pathway: http://pprowler.itee.uq.edu.au/TMPHMMLoc
• Protein model has fewer states than topology predictors (TMHMM, HMMTOP, etc.) but is of second order
• Localization model is trained and tested using LOCATE, a recent, high-quality localization dataset
• Overall better performance than current localization predictors (transmembrane proteins, eukaryotic, secretory pathway)
– Di-peptide composition superior to single amino acid composition
– "Topological" model superior to "non-topological" baseline model