Page 1

Spatial proteomics
Combining experimental and annotation data to predict protein sub-cellular localisation.

Laurent Gatto
[email protected], @lgatt0

Computational Proteomics Unit
http://cpu.sysbiol.cam.ac.uk/
University of Cambridge

13 Jan 2015, Heidelberg

Page 2

Plan

Introduction

Spatial proteomics

Previously . . .

Transfer learning

Page 3

Regulations

Page 4

Cell organisation

Spatial proteomics is the systematic study of protein localisations.

Image from Wikipedia http://en.wikipedia.org/wiki/Cell_(biology).

Page 5

Spatial proteomics - Why?

Mis-localisation
Disruption of the targeting/trafficking process alters proper sub-cellular localisation, which in turn perturbs the cellular functions of the proteins.

- Abnormal protein localisation leading to the loss of functional effects in diseases (Laurila and Vihinen, 2009).
- Disruption of the nuclear/cytoplasmic transport (nuclear pores) has been detected in many types of carcinoma cells (Kau et al., 2004).

Multi- and re-localisation

- Differentiation: Tfe3 in mouse ESC (Betschinger et al., 2013).
- Metabolism: changes in carbon sources, elemental limitations.

Page 7

Spatial proteomics - How, experimentally

[Figure: tree of organelle proteomics approaches.]

- Single cell, direct observation
  - Tagging: GFP, epitope, protein-specific antibody
- Population level
  - Subcellular fractionation (number of fractions), quantitative mass spectrometry
    - Cataloguing
      - 1 fraction: pure fraction catalogue
    - Relative abundance
      - 2 fractions (enriched and crude): subtractive proteomics (enrichment)
      - n discrete fractions: invariant rich fraction (clustering)
      - n continuous fractions (gradient approaches): PCP (χ²), LOPIT (PCA, PLS-DA)

Figure: Organelle proteomics approaches (Gatto et al., 2010)

Page 8

Fusion proteins and immunofluorescence

Page 9

Spatial proteomics - How, experimentally

[Figure: the organelle proteomics approaches tree from page 7, shown again.]

Figure: Organelle proteomics approaches (Gatto et al., 2010). Gradient approaches: Dunkley et al. (2006), Foster et al. (2006).

⇒ Explorative/discovery approaches, global localisation maps.

Page 10

[Figure: experimental workflow. Cell lysis; fractionation/centrifugation; quantitation/identification by mass spectrometry. Organelle label in the figure: e.g. Mitochondrion.]

Page 11

Quantitation data and organelle markers

       Fraction1  Fraction2  ...  Fractionm  markers
p1     q1,1       q1,2       ...  q1,m       unknown
p2     q2,1       q2,2       ...  q2,m       loc1
p3     q3,1       q3,2       ...  q3,m       unknown
p4     q4,1       q4,2       ...  q4,m       loci
...    ...        ...        ...  ...        ...
pj     qj,1       qj,2       ...  qj,m       unknown
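The table layout above can be sketched in code. This is a minimal illustration in plain Python, assuming made-up intensity values, hypothetical protein names `p1` to `p4` and a hypothetical helper `split_markers`; it is not how MSnbase or pRoloc store the data.

```python
# Hypothetical quantitation table: one profile per protein across m = 3 fractions,
# plus a marker annotation ("unknown" for proteins without a curated location).
profiles = {
    "p1": [0.10, 0.60, 0.30],
    "p2": [0.70, 0.20, 0.10],
    "p3": [0.65, 0.25, 0.10],
    "p4": [0.05, 0.15, 0.80],
}
markers = {"p1": "unknown", "p2": "loc1", "p3": "unknown", "p4": "loc2"}


def split_markers(profiles, markers):
    """Separate labelled marker proteins from unlabelled ones."""
    labelled = {
        p: (q, markers[p]) for p, q in profiles.items() if markers[p] != "unknown"
    }
    unlabelled = {p: q for p, q in profiles.items() if markers[p] == "unknown"}
    return labelled, unlabelled


labelled, unlabelled = split_markers(profiles, markers)
print(sorted(labelled))    # marker proteins, usable as a training set
print(sorted(unlabelled))  # proteins whose localisation is to be predicted
```

The marker proteins play the role of the training set for a classifier, and the `unknown` rows are the ones to be assigned a localisation.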

Page 12

Visualisation and classification

[Figure: correlation profiles across fractions 1, 2, 4, 5, 7, 8, 11 and 12 for ER, Golgi, mit/plastid, PM and vacuole, and a principal component analysis plot (PC1 vs PC2). Legend: ER, Golgi, mit/plastid, PM, vacuole; marker, PLS-DA, unknown.]

Figure: From Gatto et al. (2010), Arabidopsis thaliana data from Dunkley et al. (2006)

Page 14

Previously . . .

- Infrastructure: MSnbase (Gatto and Lilley, 2012)
- Spatial proteomics data: pRolocdata (Gatto et al., 2014)
- Novelty detection: pRoloc::phenoDisco (Breckels et al., 2013)
- Localisation prediction and visualisation: pRoloc (Gatto et al., 2014) and pRolocGUI

MSnSet (storageMode: lockedEnvironment)
assayData: 2031 features, 8 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: n113 n114 ... n121 (8 total)
  varLabels: Fraction.information
  varMetadata: labelDescription
featureData
  featureNames: Q62261 Q9JHU4 ... Q9EQ93 (2031 total)
  fvarLabels: Uniprot.ID UniprotName ... markers (8 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
- - - Processing information - - -
Loaded on Fri Nov 7 16:49:05 2014.
Normalised to sum of intensities.
Added markers from 'mrk' marker vector. Fri Nov 7 16:49:05 2014
 MSnbase version: 1.13.16

Page 15

Previously . . .

- Several mouse E14TG2a embryonic stem cells.
- Human Embryonic Kidney fibroblast cells.
- The Arabidopsis AT_CHLORO database (Ferro et al., 2010).
- Mouse organs (Foster et al., 2006).
- Arabidopsis from callus (Dunkley et al., 2006; Nikolovski et al., 2014) and roots (Groen et al., 2014).
- Drosophila embryos (Tan et al., 2009).
- Chicken DT40 lymphocyte cell (Hall et al., 2009).
- . . .
- Collected from the literature

Page 16

Previously . . .

[Figure: two PCA plots of the same data, PC1 (58.53%). Legend of the first plot: ER, Golgi, mitochondrion, PM, unknown. Legend of the second plot: Cytoskeleton, ER, Golgi, Lysosome, mitochondrion, Nucleus, Peroxisome, PM, Proteasome, Ribosome 40S, Ribosome 60S, unknown.]

Page 17

Previously . . .

[Figure: PCA plot, PC1 (58.53%) vs PC2 (29.96%). Legend: Cytoskeleton, ER, Golgi, Lysosome, mitochondrion, Nucleus, Peroxisome, PM, Proteasome, Ribosome 40S, Ribosome 60S.]

Page 19

What about annotation data from repositories such as GO, sequence features, signal peptide, transmembrane domains, images, protein-protein interactions, ...

- From a user perspective: "free/cheap" vs. expensive
- Abundant (all proteins, 100s of features) vs. (experimentally) limited/targeted (1000s of proteins, 6 to 20 features)
- For localisation in system at hand: low vs. high quality
- Static vs. dynamic

number of GO features ≫ number of experimental fractions ⇒ dilution of experimental data
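The dilution argument above can be made concrete with a toy calculation. The sketch below assumes made-up vectors (8 experimental fractions, 200 binary GO features) and assumes the two data sources are naively concatenated and compared with a squared Euclidean distance; none of these choices come from the slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical profiles over 8 fractions, each normalised to sum to 1.
# The two proteins have completely non-overlapping profiles.
exp_a = [0.5, 0.5, 0, 0, 0, 0, 0, 0]
exp_b = [0, 0, 0, 0, 0, 0, 0.5, 0.5]

# Hypothetical binary GO vectors with 200 terms; the proteins differ on 10 of them.
go_a = [1] * 10 + [0] * 190
go_b = [0] * 200

d_exp = squared_distance(exp_a, exp_b)  # contribution of the experimental data
d_go = squared_distance(go_a, go_b)     # contribution of the annotation data
share = d_exp / (d_exp + d_go)          # experimental share of the total distance

print(d_exp, d_go, round(share, 3))
```

Even though the experimental profiles do not overlap at all, they account for only a small share of the concatenated distance in this toy setting, which is one way to read "dilution of experimental data".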

Page 21

Goal
Support/complement the primary target domain (experimental data) with auxiliary data (annotation) features without compromising the integrity of our primary data.

Updated experimental design for

- primary/experimental data

and

- auxiliary/annotation data

Page 22

[Figure: updated experimental design with two data sources.]

PRIMARY EXPERIMENTAL DATA: cell lysis; fractionation/centrifugation (e.g. mitochondrion); quantitation/identification by mass spectrometry; visualisation.

         X113    X114   X115    X116   X117   X118    X119    X121
O00767   0.1361  0.150  0.1062  0.147  0.277  0.1429  0.0380  0.00338
P51648   0.1914  0.205  0.0566  0.165  0.237  0.0996  0.0180  0.02727
Q2TAA5   0.1297  0.201  0.0546  0.146  0.292  0.1463  0.0206  0.00902
Q9UKV5   0.0939  0.207  0.0419  0.204  0.344  0.1098  0.0000  0.00000
...      ...     ...    ...     ...    ...    ...     ...     ...

(rows x1 ... xn)

AUXILIARY DATA: database query; extract GO CC terms; convert terms to binary; visualisation.

         GO:0016021  GO:0005789  GO:0005783  ...
O00767   1           1           1           ...
P51648   1           1           0           ...
Q2TAA5   1           1           0           ...
Q9UKV5   0           0           0           ...
...      ...         ...         ...         ...

(rows x1 ... xn; columns GO1 ... GOA)
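The "convert terms to binary" step can be sketched as follows. This is a minimal illustration in plain Python: the annotation sets mirror the four rows shown on the slide, while the dictionary `go_cc` and the helper `to_binary_matrix` are hypothetical names, and the database query itself is not shown.

```python
# GO cellular component (CC) terms per protein, mirroring the slide's example rows.
# In practice these sets would come from a database query.
go_cc = {
    "O00767": {"GO:0016021", "GO:0005789", "GO:0005783"},
    "P51648": {"GO:0016021", "GO:0005789"},
    "Q2TAA5": {"GO:0016021", "GO:0005789"},
    "Q9UKV5": set(),
}


def to_binary_matrix(annotations):
    """Return the sorted term list and one 0/1 row per protein."""
    terms = sorted({term for ts in annotations.values() for term in ts})
    matrix = {
        protein: [1 if term in ts else 0 for term in terms]
        for protein, ts in annotations.items()
    }
    return terms, matrix


terms, matrix = to_binary_matrix(go_cc)
print(terms)
for protein, row in matrix.items():
    print(protein, row)
```

Columns here are ordered alphabetically by GO identifier, so they appear in a different order from the slide; the 0/1 content per protein is the same.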

Page 23

[Figure: PCA plot, PC1 (40.28%) vs PC2 (25.7%). Legend: 40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome, unknown.]

Data from mouse stem cells (E14TG2a)

We use a class-weighted kNN transfer learning algorithm to combine primary and auxiliary data, based on Wu and Dietterich (2004):

$$V(c_i)_j = \theta^{*} n^{P}_{ij} + (1 - \theta^{*}) n^{A}_{ij}$$

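Page 24

Classes and weights

$$C = \{c_{i=1}, \ldots, c_{i=l}\}; \quad \Theta = \{0, 0.5, 1\}$$

Primary data

$$L_P = \begin{pmatrix} q_{1,1} & q_{1,2} & \ldots & q_{1,m} \\ q_{2,1} & q_{2,2} & \ldots & q_{2,m} \\ \vdots & \vdots & & \vdots \\ q_{j,1} & q_{j,2} & \ldots & q_{j,m} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}; \quad k_P$$

Auxiliary data

$$L_A = \begin{pmatrix} b_{1,1} & b_{1,2} & \ldots & \ldots & b_{1,n} \\ b_{2,1} & b_{2,2} & \ldots & \ldots & b_{2,n} \\ \vdots & \vdots & & & \vdots \\ b_{j,1} & b_{j,2} & \ldots & \ldots & b_{j,n} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}; \quad k_A$$

Neighbour matrices (columns are the classes $c_{i=1}, \ldots, c_{i=l}$)

$$N_P = \begin{pmatrix} n^{P}_{1,1} & \ldots & n^{P}_{1,l} \\ n^{P}_{2,1} & \ldots & n^{P}_{2,l} \\ \vdots & & \vdots \end{pmatrix}; \quad N_A = \begin{pmatrix} n^{A}_{1,1} & \ldots & n^{A}_{1,l} \\ n^{A}_{2,1} & \ldots & n^{A}_{2,l} \\ \vdots & & \vdots \end{pmatrix}$$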

Page 25

Classes and weights

[Figure: two proteins, labelled 1 and 2, and their nearest neighbours. Legend: c1, c2, c3.]

$$N_P = \begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline p_1 & 3/3 & 0 & 0 \\ p_2 & 1/3 & 2/3 & 0 \\ \vdots & \vdots & \vdots & \vdots \end{array}$$
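The neighbour matrix rows above can be reproduced with a small kNN count. This is a sketch under stated assumptions: the 2-D coordinates are invented so that, with k = 3, the two query proteins give the same rows as the slide's example; `neighbour_fractions` is a hypothetical helper, and Euclidean distance is an assumption rather than something the slides specify.

```python
import math
from collections import Counter


def neighbour_fractions(query, labelled, classes, k):
    """Fraction of the k nearest labelled profiles that belong to each class."""
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return [counts[c] / k for c in classes]


classes = ["c1", "c2", "c3"]

# Hypothetical labelled (marker) profiles in two dimensions.
labelled = [
    ((0.0, 0.0), "c1"), ((0.1, 0.0), "c1"), ((0.0, 0.1), "c1"),
    ((1.0, 0.0), "c2"), ((1.1, 0.0), "c2"),
    ((3.0, 3.0), "c3"),
]

p1 = (0.05, 0.05)  # sits inside the c1 cluster
p2 = (0.7, 0.0)    # sits between c1 and c2, closer to c2

row_p1 = neighbour_fractions(p1, labelled, classes, k=3)
row_p2 = neighbour_fractions(p2, labelled, classes, k=3)
print(row_p1)
print(row_p2)
```

Running the same count on the auxiliary data, with its own k, would give the corresponding rows of the auxiliary neighbour matrix.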

Page 26

Classes and weights

Weights matrix (labelled)

$$\begin{array}{c|ccc|c} & c_1 & c_2 & c_3 & \\ \hline \theta_1 & 0 & 0 & 0 & F1_1 \\ \theta_2 & 0 & 0 & 1 & F1_2 \\ \theta_i & \vdots & \vdots & \vdots & F1_i \\ \vdots & 1 & 1 & 0 & \vdots \\ \theta_{\Theta^l} & 1 & 1 & 1 & F1_{\Theta^l} \end{array}$$

$$\theta^{*} = \{1, 0, 1\}$$

(BiocParallel)
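The weight search above can be sketched as an exhaustive grid over one weight per class, scored by macro F1 on labelled proteins. Everything in this block is an assumption for illustration: the toy neighbour matrices `N_P` and `N_A`, the labels in `truth`, and the helpers `predict`, `macro_f1` and `best_weights`. A real analysis would score the candidates on held-out data, and the search could be parallelised, which this sketch does not attempt.

```python
from itertools import product


def predict(n_p, n_a, theta):
    """Index of the class with the highest weighted vote for one protein."""
    votes = [t * p + (1 - t) * a for t, p, a in zip(theta, n_p, n_a)]
    return max(range(len(votes)), key=votes.__getitem__)


def macro_f1(truth, pred, n_classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / n_classes


def best_weights(N_P, N_A, truth, grid=(0, 0.5, 1)):
    """Try every combination of per-class weights and keep the best macro F1."""
    n_classes = len(N_P[0])
    scored = []
    for theta in product(grid, repeat=n_classes):
        pred = [predict(p, a, theta) for p, a in zip(N_P, N_A)]
        scored.append((macro_f1(truth, pred, n_classes), theta))
    return max(scored, key=lambda s: s[0])  # first best combination on ties


# Hypothetical neighbour matrices for four labelled proteins and three classes.
truth = [0, 1, 2, 1]
N_P = [[1, 0, 0], [1 / 3, 2 / 3, 0], [0, 0, 1], [2 / 3, 1 / 3, 0]]
N_A = [[1, 0, 0], [0, 1, 0], [0, 1 / 3, 2 / 3], [0, 1, 0]]

score, theta_star = best_weights(N_P, N_A, truth)
print(score, theta_star)
```

With three classes and three candidate weights the grid has 27 rows, one per combination. The best combination found on this toy data need not match the θ* shown on the slide.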

Page 27

Classes and weights

Class-weighted classifier (unlabelled)

$$V(c_i)_j = \theta^{*} n^{P}_{ij} + (1 - \theta^{*}) n^{A}_{ij}$$

$$\begin{array}{c|ccc} & c_{i=1} & \ldots & c_{i=l} \\ \hline 1 & & & \\ 2 & & & \\ 3 & & V(c_i)_j & \\ \vdots & & & \\ j & & & \end{array}$$

$$y_j = \operatorname{argmax}(V(c_i)_j)$$

Page 28

$$\theta^{*} = \{1, 0, 1\}; \quad N_P = \begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline p_1 & 3/3 & 0 & 0 \\ p_2 & 1/3 & 2/3 & 0 \\ \vdots & \vdots & \vdots & \vdots \end{array}$$

$$\begin{aligned}
V(c_1)_1 &= 1 \times 3/3 + (1 - 1) \times n^{A}_{1,1} \\
V(c_2)_1 &= 0 \times 0 + (1 - 0) \times n^{A}_{1,2} \\
V(c_3)_1 &= 1 \times 0 + (1 - 1) \times n^{A}_{1,3} \\
V(c_1)_2 &= 1 \times 1/3 + (1 - 1) \times n^{A}_{2,1} \\
V(c_2)_2 &= 0 \times 2/3 + (1 - 0) \times n^{A}_{2,2} \\
V(c_3)_2 &= 1 \times 0 + (1 - 1) \times n^{A}_{2,3}
\end{aligned}$$

Class-weighted classifier (unlabelled)

$$V(c_i)_j = \theta^{*} n^{P}_{ij} + (1 - \theta^{*}) n^{A}_{ij}$$

$$\begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline 1 & V(c_1)_1 & V(c_2)_1 & V(c_3)_1 \\ 2 & V(c_1)_2 & V(c_2)_2 & V(c_3)_2 \\ \vdots & \vdots & \vdots & \vdots \\ j & & & \end{array}$$

$$y_j = \operatorname{argmax}(V(c_i)_j)$$
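The worked example above can be run end to end once the auxiliary neighbour counts are filled in. In this sketch θ* and the primary matrix `N_P` are taken from the slide, whereas the auxiliary matrix `N_A` is hypothetical (the slide leaves those entries symbolic) and `class_weighted_votes` is an illustrative helper name.

```python
def class_weighted_votes(n_p, n_a, theta):
    """V(c_i)_j = theta_i * nP + (1 - theta_i) * nA, for every class i of one protein."""
    return [t * p + (1 - t) * a for t, p, a in zip(theta, n_p, n_a)]


classes = ["c1", "c2", "c3"]
theta_star = [1, 0, 1]                         # from the slide

N_P = [[3 / 3, 0, 0], [1 / 3, 2 / 3, 0]]       # from the slide
N_A = [[2 / 3, 1 / 3, 0], [0, 3 / 3, 0]]       # hypothetical auxiliary counts

predictions = []
for n_p, n_a in zip(N_P, N_A):
    votes = class_weighted_votes(n_p, n_a, theta_star)
    best = max(range(len(votes)), key=votes.__getitem__)  # argmax over classes
    predictions.append(classes[best])
    print(votes, classes[best])
```

A weight of 1 makes a class rely only on the primary data and a weight of 0 only on the auxiliary data, so here c1 and c3 are voted on from the experimental neighbours and c2 from the annotation neighbours.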

Page 29

[Figure, panels A to E. Box plots of F1 score for the Combined, Primary and Auxiliary classifiers, overall and per class (40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome). Two PCA plots, one with PC1 (3.43%) vs PC2 (2.08%) and one with PC1 (40.28%) vs PC2 (25.7%), both with the same class legend plus unknown. Plot of classifier weight (0, 1/3, 2/3, 1) per class.]

Data from mouse stem cells (E14TG2a).

Page 30

Why? Dual-localisation
Proteins may be present simultaneously in several organelles (e.g. trafficking).

[Figure: PCA plot, PC1 (64.36%) vs PC2 (22.34%). Legend: ER lumen, ER membrane, Golgi, Mitochondrion, Plastid, PM, Ribosome, TGN, vacuole, unknown.]

[Figure: from Betschinger et al. (2013).]

[Figure: PCA plot, Mouse ESC (E14TG2a) in serum LIF, PC1 (50.05%) vs PC2 (24.61%), with Tfe3 highlighted. Legend: Actin cytoskeleton, Cytosol, Endosome, ER/GA, Extracellular matrix, Lysosome, Mitochondria, Nucleus − Chromatin, Nucleus − Nucleolus, Peroxisome, Plasma Membrane, Proteasome, Ribosome 40S, Ribosome 60S, unknown.]

Page 32

Funding
BBSRC and PRIME-XS EU FP7

- Lisa Breckels, Computational Proteomics Unit
- Kathryn Lilley, Cambridge Centre of Proteomics
- Sean Holden, Computer Laboratory

Thank you for your attention

Page 33

J Betschinger, J Nichols, S Dietmann, P D Corrin, P J Paddison, and A Smith. Exit from pluripotency is gated by intracellular redistribution of the bHLH transcription factor Tfe3. Cell, 153(2):335–47, Apr 2013. doi:10.1016/j.cell.2013.03.012.

LM Breckels, L Gatto, A Christoforou, AJ Groen, KS Lilley, and MW Trotter. The effect of organelle discovery upon sub-cellular protein localisation. J Proteomics, 88:129–40, Aug 2013.

TPJ Dunkley, S Hester, IP Shadforth, J Runions, T Weimar, SL Hanton, JL Griffin, C Bessant, F Brandizzi, C Hawes, RB Watson, P Dupree, and KS Lilley. Mapping the Arabidopsis organelle proteome. PNAS, 103(17):6518–6523, Apr 2006.

LJ Foster, CL de Hoog, Y Zhang, Y Zhang, X Xie, VK Mootha, and M Mann. A mammalian organelle map by protein correlation profiling. Cell, 125(1):187–199, Apr 2006.

L Gatto and KS Lilley. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics, 28(2):288–9, Jan 2012.

L Gatto, JA Vizcaino, H Hermjakob, W Huber, and KS Lilley. Organelle proteomics experimental designs and analysis. Proteomics, 2010.

L Gatto, L M Breckels, S Wieczorek, T Burger, and K S Lilley. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata. Bioinformatics, Jan 2014.

TR Kau, JC Way, and PA Silver. Nuclear transport and cancer: from mechanism to intervention. Nat Rev Cancer, 4(2):106–17, Feb 2004.

K Laurila and M Vihinen. Prediction of disease-related mutations affecting protein localization. BMC Genomics, 10:122, 2009.

P Wu and TG Dietterich. Improving SVM accuracy by training on auxiliary data sources. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, New York, NY, USA, 2004. ACM.