introduction to ms-based proteomics and bioconductor infrastructure · 2015-06-23 · csama 17 june...

38

Upload: others

Post on 25-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Introduction to MS-based proteomics andBioconductor infrastructure

Laurent Gatto

[email protected] � @lgatt0

http://cpu.sysbiol.cam.ac.uk

CSAMA � 17 June 2015

Page 2: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Outline

Proteomics and MS data

Bioconductor infrastructure

Examples

Ranges infrastructure

Application: spatial proteomics

Page 3: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Mass-spectrometry � LC-MS/MS

Precursor iondissociation

Detector

Precursor ions MS1

Fragmented ions

MSMS - MS2

AnalyserSourceSeparation

Page 4: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

0 500 1000 1500 2000 2500 3000 3500

020

4060

8010

0Chromatogram: total intensity over time

Time (sec)

Tota

l ion

cur

rent

TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01−20141210.mzMLTIC: 7.22e+09

Page 5: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

MS1 (and MS2) spectra

400 500 600 700 800

0.0e

+00

5.0e

+07

1.0e

+08

1.5e

+08

2.0e

+08

MS1 scan @ 21:3 min

m/z

inte

nsity 200 400 600 800

0e+

002e

+06

4e+

066e

+06

8e+

061e

+07

MS2 scan, precursor m/z 460.79

200 400 600 800 1000

0e+

001e

+06

2e+

063e

+06

4e+

065e

+06

6e+

06

MS2 scan, precursor m/z 488.8

m/z

Page 6: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Mass-spectrometry � LC-MS/MS

Precursor iondissociation

Detector

Precursor ions MS1

Fragmented ions

MSMS - MS2

AnalyserSourceSeparation

Page 7: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Fragmentation

Credit abrg.org

Page 8: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

cid <- calculateFragments("AEGKLRFK",

type=c("b", "y"), z=2)

## Modifications used: C=160.030649

ht(cid, n = 3)

## mz ion type pos z seq

## 1 36.52583 b1 b 1 2 A

## 2 101.04713 b2 b 2 2 AE

## 3 129.55786 b3 b 3 2 AEG

## ...

## mz ion type pos z seq

## 31 357.7185 y6* y* 6 2 GKLRFK

## 32 422.2398 y7* y* 7 2 EGKLRFK

## 33 457.7583 y8* y* 8 2 AEGKLRFK

Page 9: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

MS1 and MS2 spectra

400 500 600 700 800

0.0e

+00

5.0e

+07

1.0e

+08

1.5e

+08

2.0e

+08

MS1 scan @ 21:3 min

m/z

inte

nsity 200 400 600 800

0e+

002e

+06

4e+

066e

+06

8e+

061e

+07

MS2 scan, precursor m/z 460.79

200 400 600 800 1000

0e+

001e

+06

2e+

063e

+06

4e+

065e

+06

6e+

06

MS2 scan, precursor m/z 488.8

m/z

Page 10: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

MS1 and MS2 spectra

400

600

800

1000

1200

21.10

21.15

21.20

21.25

21.30

21.35

0.0e+00

5.0e+07

1.0e+08

1.5e+08

2.0e+08

m/zretention time200

400

600

800

1000

1200

21.06

21.07

21.08

21.09

21.10

21.11

0.0e+00

5.0e+07

1.0e+08

1.5e+08

2.0e+08

m/zretention time

Page 11: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Proteomics data

I raw data:MS1 and MS2 overretention time

I identi�cation:MS2

I quantitation:MS1 or MS2

I protein database(to match MS2 spectraagainst)

Status package

Raw (mz*ML) X mzR

mzTab X MSnbase

mgf X MSnbase

mzIdentML X mzID, mzRmzQuantML (?mzR)

Page 12: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Bioconductor infrastructure

biocViews: Proteomics, MassSpectrometry

010

2030

4050

Num

ber

of B

ioco

nduc

tor

pack

ages

2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14

Bioconductor Versions

ProteomicsMassSpectrometryMassSpectrometryData

010

000

2000

030

000

4000

050

000

Pac

kage

dow

nloa

ds

2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14

Bioconductor Versions

ProteomicsMassSpectrometry

Page 13: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Learning from Bioconductor

| genomics | proteomics |

|------------------+------------------------|

| eSet (past?) | *MSnSet (present) |

| Ranges (present) | *Pbase et al. (future) |

| | PPI |

| | *localisation (present)|

Page 14: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

MSnSet

Page 15: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Example

library("MSnbase")

rx <- readMSData("rawdata.mzML") ## raw data

rx <- addIdentificationData(rx, "identification.mzid")

rx <- rx[!is.na(fData(rx)$pepseq)]

plot(rx[[10]], reporters = TMT6, full=TRUE)

Page 16: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Example

0e+00

2e+05

4e+05

6e+05

8e+05

300 600 900 1200M/Z

Inte

nsity

Precursor M/Z 600.36

0e+00

2e+05

4e+05

6e+05

8e+05

126.080 126.591 127.102 127.613 128.124 128.635 129.146 129.657 130.168 130.679 131.190

Page 17: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Example

library("MSnbase")

rx <- readMSData(f, centroided = TRUE)

rx <- addIdentificationData(rx, g)

rx <- rx[!is.na(fData(rx)$pepseq)]

plot(rx[[10]], reporters = TMT6, full=TRUE)

plot(rx[[4730]], rx[[4929]])

Page 18: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Example

0 500 1000 1500

−1.

0−

0.5

0.0

0.5

1.0

m/z

inte

nsity

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

prec scan: 6975, prec mass: 545.017, prec z: 3, # common: 34

●●●

●●●

●●

●●

●●●

●●

●●

●● ●●

● ● ●

●●●

prec scan: 7198, prec mass: 545.017, prec z: 3, # common: 35

Page 19: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Example

library("MSnbase")

rx <- readMSData(f, centroided = TRUE)

rx <- addIdentificationData(rx, g)

rx <- rx[!is.na(fData(rx)$pepseq)]

plot(rx[[10]], reporters = TMT6, full=TRUE)

plot(rx[[4730]], rx[[4929]])

qt <- quantify(rx, reporters = TMT6)

## qt <- readMSnSet("quantdata.csv", ecols = 5:11)

nqt <- normalise(qt, method = "vsn")

boxplot(exprs(nqt))

MAplot(nqt[, 1:2])

Page 20: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Example

● ● ●

TMT6.126 TMT6.128 TMT6.130

1015

2025

● ●●●●●●●●

●●●● ●●

●●●

●●●

●●

●●● ● ●●●●●● ●●●

●●●●

●● ●●

●●●

●●● ●

●●●

●●

●●●●●●●

●●●

● ●●● ●●● ●●●●●●

●● ●●●

●●

●●●●●●

●●●

●●●●●

●●●●● ●●●●

●●●

● ●●●●●●●● ●●

●●●●

●●

●●● ●

●● ●

●●● ●

●●●●●●

●●

●●●● ●

●●

●●● ●

● ●●

●●●●

●● ●

●●● ●

●●●

●●●●●●●●

●●

●● ●●●

●●● ●

●●

●●

●●

● ●●● ●

● ●● ●

●●●

●●● ●●●

●●

●●

●●●● ●●●

●●

●●

●●

●●

●●●

●●●

●●●

●● ●●●●

●●●●● ● ●●●●●●

● ●● ●●●●●

●●●

● ●●●●●● ●●● ●

●●

● ●●●●●

● ●●● ●● ●●●●

● ●●●●●

● ●●●

●●●●●●

● ●

● ●●●

● ●

●●● ●

●●● ●

● ●● ●●●●●

●●● ●● ●

●●●●

●●

●●

● ●

●●●●●●

● ●●●●●●●●

●●● ●●● ●●●

●●

●●●

●●●●

●●

●●●

●●●●●●

●●●●

●●●●●●

●●●●●●●

●●

●●● ●●

●●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●●●●●

● ●●●●

●●●●

●●●●

●●●

●●●● ●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●● ●●● ●●

●●

●●● ●

●●

●●

●● ●● ●

●●●●●●●

●●●●

●●●● ●

●●● ●●●● ●●

●●● ●●

●●

● ●●● ● ●●

●●

●●

●●

●●

●●● ●● ●

● ●● ●●

●●●

●●

●●●●●●

●●● ● ●●●●●

●●● ●●●●●●●

●●

●●

●●

●●●●● ●

●●●●●●● ●●

●●●●●●

● ●●●

●● ●●

●●● ●

● ●●●●●● ●

●●●●●●●

●● ●●●●●

●●●

●●

●● ●

●● ●● ●

●●●●

●● ●

●●

●●

●●●●

●●●●●

●●●●●

●●● ●

●●●●● ●

●●●

●● ●●●●

●●

●●

● ●●●

●●●●

●●●●●

●●

●●● ●●●●●●

●●

● ●●●●●●●●●

●●●

●●●

●●●

● ●●●●● ●● ●●●●

●●●

●●

●●●●●●

●●

●●●●●● ●●●

●●●●●

●●●

●● ●●●●

●●

●●

●●●

●● ●

●●●●●●

● ●●●

●●

●●●

●●●

●●●●● ●

●● ●

●●●●●●●

● ●●●●●●●●●● ●●

●●

●●●

●●●●●●●●

●●

●●

● ●●●●● ●

●●

●●

●●●

● ●● ●

●●

●●● ●●●●

●●

●● ●

●●● ●

●●

●● ●●●●●

●●

●● ●●

●●●

● ●●● ●

●●●●

●●●

● ●●

●●●●●● ●● ●●● ●●●

● ●●●●

●● ●●● ●

● ●●●

●●●●● ●●●

●●

●● ●

●●

●●●●● ●●●●

●●●●●●●●●

●●●●●●●●●

● ●●

●●●

● ●●

●●●

●●

●●

●●●

●●

●●

●●● ●●●●

●●●

● ●●●●●

●●●

●●●

●●

●●

●●● ●● ●

●●

●●●

●●● ●●

●●

● ●●

● ● ●●●●●●

●●

●●●

●●●● ●● ●●

●●●●●●

● ●

●●●

●●●

●● ● ●

●●●●●●

●●

●●●

●●●●● ●●●●

●●●

●●

●●●

●●

● ● ●●●

●●●●

●●

●●●●●

●●●

●●●

●●

●●● ●

●●●

●●

●●●●●●●

●● ●●●●

●●● ●

●●

●●

●●

●●●●

● ●

●●

●●● ●●●

●●●●

●●

●●

●●●

●●●●●●

●●● ● ●●●

●●

● ●

● ●●●

●●

● ●●

●●●

● ●●

●●●●●● ●●●

● ●●●

●● ●● ●● ●

●●●●●

●●

●●

●●● ●

●●●●

●● ● ●

●●

●●●● ●●● ●

●● ●●●

● ●●●●●●

● ●●●● ●

●● ●●

●●●●

●●●

●●●

● ● ●●●●●●

●● ●●

●●●● ●

●●

● ● ●●

●●●●

●●●●

●●

●●●●

●●●

●●●

●●

● ●● ●

●●●

●●

●●

●●●●●●

● ● ●●

●●● ●

●●

●●●

●●

●●●

●●

● ●●●

●● ●●

● ●●

●●

●●

● ● ●●●

●●

●●

●●

●● ●●

●●●

●● ●● ●●

●●

●●●

●●

●●

●●●

●●●●

●●

● ●●● ●

● ● ●●●●

●●

●●

●● ●

●●

●●

● ●●

●● ●●

●●

● ●● ●●

●● ●●

●●●

●●●

●●●●

●●

●●

●●●

● ●●●●

●●

●●

●●

●● ●

●● ● ●

●●

●●●

●●●

●●

● ●

●●

●●

●●

●●

●● ● ●●

● ●

●● ●

●● ●●● ●

●●

●●●

●●●

●●

●●

●●●●

●●

●●●

●●●

●●●

● ●

● ●●

●●

●●

● ● ●

●●

●●

●●

●●

●●●●

●●

● ●● ●●

●●●

●●●

●● ●

●●

●● ●

●●

●● ●

●● ●

●●

●●

● ●

●●●

●●

●● ●●●

●●

● ●

●●

●●

●● ●●

●● ●●

●●●

●●● ●

●●

●●

●●●

●●

●● ● ●

●●●

●●●

●●

●●

●●

●●●

●●

●●●●●

● ●●

●●●●●● ●

● ●● ●●●

●●●

● ●●

●●

●● ●●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●●

●●●

●●

●●

● ●●

●●

●●

●●

●●

●●●●●● ● ●

●●

●●●●●

●●

●● ● ●

●●

●●

● ●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●●●

● ●

●●

●●

●●

●●

●●

●●●

● ●●

●●

●●

●●●

● ●●●●

●● ●●

●●

●●●●

●●●

●●

●●●●

●●

●●

●●

●●

●●●

●●

● ●● ●● ●

● ●●

● ●●

● ●● ●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●●●

●●●

●●

●●

●●●●

●●

●●●

● ●●●●●

●●

●●●●●

●● ● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●●●● ●●

●●

●●

●●●●

●●●

●●

●●●

● ●●

●●

●●● ●●●●

●●

● ●

●●●

●●●

●●●

●●●●● ●● ●●

● ●

●●

●●

●●

●● ●●●● ●●●

●● ●●●

●●

●●●●

●●●●●

●●

●●

● ●

●● ●●

●●

●●

●●●

●●

●● ●

●●●

●●

●●

●●● ●

● ●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●● ●● ●●

●●

●●

●●●

●●●●

●●●

●●

●●

●●●

●●

● ● ●●●●

●●

●●

●●●

● ●

●●

● ●

●●

●●

●● ●●

●●●

●●

●●

●●

●● ●

●●

●●

●●●

●●

●●

●●● ●●●

●●

●● ● ● ●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●● ●●

●●

● ●● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

● ●

● ●●

●●

●● ●●●

●●● ●

●●

●●

●●●

●●●

● ●● ●●

●●● ●

●●

●●

●●

●●

● ●

●●●

●●● ●●●

●● ●●

●●

●●

● ●●● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●●

●●●

●●●●

●●

●●

●●

●●●●

●●

● ●●

●●

●●

●●● ●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●● ●

●●

●●

●●

●● ●

●●

●●●● ●●

●●●

●●

●●

●● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

●●

● ●● ●● ●●

●● ●

●● ●

● ●●●

● ●●●● ●●

●●

●●●●

● ●●

●●●

● ●

●●

● ●

●●●

●●

●●●

●●

● ●●

●●●●●●

●●

● ●● ● ●

●●

●●●●

●●●

●●●

●●

●●●

●●

●●

●●

●● ●●●●

●●●

●●

●● ●● ●

●●

●●

●●

●●

●●

● ●

● ●

●●

● ●

●●

●●●

● ● ●● ●●

●●●

●●

●●

●●

●●

●●

●● ●

●● ●●

● ●

● ●

●●

● ●

●●

●●●

●●

●●

●●●

●●●

● ●

● ●● ●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●

●●

● ●

● ●

●● ●

●●

●●

●●

●●

●●

●●

● ●●●

●●●●

●●

●●●

● ●●

●● ●

●● ●

●●●

●●

●●

●●● ● ●

●●

●●

● ●●●

●●

● ●

● ●

● ●●●

●●

●●

●●

●●●

●●

● ●● ●● ●

●●

●●

●●●

●●

●● ●●●

●●

●●

●●

● ●●

●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●●

●●

● ●● ●●

●●

●●●

●● ●●●

●●

●●●●●

●●● ●●●

●●

●● ● ●

●●

●●●

● ●

●●

●●●

●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●●

●●

●● ●●

● ●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●● ●●

● ●

●● ●

● ●●

●●●

● ●

●●●

●●

●●

●●

●●● ●

●●

●●

● ●●●

● ●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●●

● ●

●●

●●

● ●●

●●●

●●●●

●●● ●

●●

●●● ●

● ●●

● ●●●

●●

●●

● ●●

●●

●●●

●●

●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●●●

●●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●● ●

● ● ●●●

●●●●

●●

●●

●●● ●

●●●

● ●●

●●

●●● ●

●● ●●

●●

●●●

●●●

●●●●●

●●●

●●

● ●●

●●●●

●●

●●● ●

●●● ●

●●●

●●

●●●●

●●

● ●●

●●●

●●●

●●

●● ●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●

●●

●●

●●●

●●

●●

● ●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

● ●●

●●

●●

●● ●

● ●●●

●●

●●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●● ●●

●●

●●

●●

●●

● ●●● ●

●●

●●●

●●

●●

●●

●●

●● ●●●

●●●

●●●●

●●

●●

●●

● ●●

●●● ●

●● ● ●●● ●●● ●●●

●●

●●

●●●●

●●●

●●

●●●

● ● ●●●●●●● ●●

● ●●●●●●● ● ●

●●●●

●●●

●●●

●●

●●●

●●●●●

●●

● ●●

●●● ●●●

●●

● ●●●●●●

●●●

●●●

●●●●●●

●●● ●●

●●

●●● ●●● ●

●●●●

●●●

● ●● ●●●

●● ●●●●

●●● ●● ●●

●●●

●●● ●●●● ●

●● ●● ●●●● ●●●

●●●●

●●●

● ●● ●●●●●●●● ●●●●

●●●●

●●

●● ●● ●●

●●

●●●●●●

●● ● ●● ●

●●

●●●●● ●●●●●

●● ●● ●●●●●● ●

●●● ●●

●●●

●●●●●

●●●●● ●

●● ●●

●●

●●● ●●●

●●●

●●●●● ●●

●●●

●●●●●●

●●●●● ●●

● ●●●

●●

●●●●●

●●●●

● ●●●●● ●●●●

●● ● ●●

●●●

●●● ●●

3.0 3.5 4.0 4.5−

0.15

−0.

10−

0.05

0.00

0.05

0.10

0.15

A

M

Median: 0.000187IQR: 0.013

Page 21: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

More

I RforProteomics package

library("RforProteomics")

RforProteomics()

RProtVis()

citation(package = "RforProteomics")

I Proteomics work�ow on the Bioc site

I Lab on Friday

Page 22: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

I protein database

I raw dataI quantitationI identi�cation

Page 23: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Ranges infrastructure

gs ge

G

T1

esi ee

i

i = 1

esi ee

i

i = 2

esi ee

i

i = 3

esi ee

i

i = 4

T2i = 1 i = 3 i = 4

T3i = 1 i = 4

P

psj pe

j

j = 1

psj pe

j

j = 2

psj pe

j

j = 3

Pj = 1 j = 2 j = 31 LP

Π

πsk πe

k

k = 1

πsk πe

k

k = 2

πskπe

k

k = 3

πskπe

k

k = 4

Page 24: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Pbase package

library("Pbase")

p <- Proteins("uniprot.fasta")

p <- addIdentificationData(p, "identification.mzid")

aa(p) ## peptides sequences as a AAStringSet

pranges(p) ## peptide ranges as IRangesList

i <- which(acols(p)[, "EntryName"] == "EF2_HUMAN")

plot(p[i])

plot(p[i], from = 155, to = 185)

Page 25: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Along protein coordinates

100

200

300

400

500

600

700

800

NH COOH

pept

ides

230

240

250NH COOH

P13

639

W A F T L K Q F A E M Y V A K F A A K G E G Q L G P A E R A K K V E

pept

ides

Page 26: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Along genome coordinates

... using transcript models as GRangesList and Gviz for plotting.

Chromosome 1

156.11 mb

156.115 mb

156.12 mb

156.125 mb

156.13 mb

156.135 mb

pept

ides

SYLLGNSSPRTQSPQNCSIM

RATRSGAQASSTPLSPTR

SVGGSGGGSFGDNLVTR

RATRSGAQASSTPLSPTR ELEKTYSAKLDNAR

METPSQRRATR

DTSRRLLAEKEREMAEMR

LALDMEIHAYR

LALDMEIHAYRK

RQNGDDPLLTYRFPPK

SVGGSGGGSFGDNLVTR

SVGGSGGGSFGDNLVTR

From the Pbase mapping vignette.

Page 27: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Along genome coordinates (with raw data)

Chromosome X

78.11 mb

78.115 mb

78.12 mb

78.125 mb

5' 3'3' 5'

MS

2 sp

ectr

aE

NS

T00

0003

7331

6

From the Pbase mapping vignette.

Page 28: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

With RNA-Seq readsChromosome 17

30.12 mb

30.13 mb

30.14 mb

30.15 mb

30.16 mb

30.17 mb

30.18 mb

EN

ST

0000

0612

959

0

20

40

60

80

Alig

nmen

tsTr

ack

From https://github.com/ComputationalProteomicsUnit/Intro-Integ-Omics-Prot

Page 29: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Spatial proteomics

I The cellular sub-division

allows cells to establish a

range of distinct

microenvironments, each

favouring di�erent

biochemical reactions and

interactions and, therefore,

allowing each compartment

to ful�l a particular

functional role.

I Localisation and

sequestration of proteins

within subcellular niches is a

fundamental mechanism for

the post-translational

regulation of protein

function. Spatial proteomics is the systematic

study of protein localisations.

Page 30: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Figure : Immuno�uorescence:ZFPL1, Golgi (left) and FHL2,mainly localized to actin �lamentsand focal adhesion sites. Alsodetected in the nucleus (right).(from the Human Protein Atlas)

Cell lysatefraction

centrifugation

iTRAQMS/MS

LOPIT(PCA,

PLS-DA)

label-freeMS/MS

(χ )2PCP

Pure fraction

catalogue

Invariantrich

fraction(clustering)

Subtractiveproteomics

(enrichment)

Figure : Mass spectrometry-basedapproaches based on densitygradient subcellular fractionation.

Page 31: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Cell membrane lysis

Mechanical or bu�er-induced lysis of the plasma membrane with

minimal disruption to intracellular organelles followed by subcellular

fractionation.

Page 32: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Density gradient separation

Page 33: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Quantitation by LC-MSMS

Page 34: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Data

Fraction1 Fraction2 . . . Fractionm markers

p1 q1,1 q1,2 . . . q1, m unknown

p2 q2,1 q2,2 . . . q2, m loc1

p3 q3,1 q3,2 . . . q3, m unknown

p4 q4,1 q4,2 . . . q4, m lock...

......

......

...

pn qn,1 qn,2 . . . qn, m unknown

Data analysis

MSnbase for data manipulation, pRoloc for clustering, classi�cation

and plotting, and pRolocGUI for interactive exploration.

Page 35: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

0.2

0.3

0.4

0.5

Correlation profile − ER

Fractions

1 2 4 5 7 81112

0.1

0.2

0.3

0.4

Correlation profile − Golgi

Fractions

1 2 4 5 7 81112

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Correlation profile − mit/plastid

Fractions

1 2 4 5 7 81112

0.15

0.20

0.25

0.30

0.35

Correlation profile − PM

Fractions

1 2 4 5 7 81112

0.1

0.2

0.3

0.4

0.5

0.6

Correlation profile − Vacuole

Fractions

1 2 4 5 7 81112

●●

●●

●●

●●

●●●● ●

●●

●●

●●

−10 −5 0 5

−5

05

Principal component analysis

PC1

PC

2

ERGolgimit/plastidPM

vacuolemarkerPLS−DAunknown

Figure : From Gatto et al. (2010), data from Dunkley et al. (2006).

Page 36: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

2009 vs 2013

−3 −2 −1 0 1 2 3

−3

−2

−1

01

23

PC1 (58.53%)

PC

2 (2

9.96

%)

●●

● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●● ●●

● ●

● ●

●●

● ●●

●●

●●

●● ●

●● ●

●●●

ER/GolgimitochondrionPMunknown

−3 −2 −1 0 1 2 3

−3

−2

−1

01

23

PC1 (58.53%)

PC

2 (2

9.96

%)

CytoskeletonERGolgiLysosomemitochondrionNucleus

PeroxisomePMProteasomeRibosome 40SRibosome 60S

Figure : Semi-supervised approach Breckels et al. (2013). Data from Tanet al (2009).

Page 37: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

−6 −4 −2 0 2 4

−4

−2

02

4

PC1 (50.05%)

PC

2 (2

4.61

%)

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

● ●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

●●

Actin cytoskeletonCytosolEndosomeER/GAExtracellular matrixLysosomeMitochondriaNucleus − ChromatinNucleus − NucleolusPeroxisomePlasma MembraneProteasomeRibosome 40SRibosome 60Sunknown

●●●●●●

111111

●●●●●●222222 ●

●●●3333

●●●●●

●●●4 4

4444

44

●●

●●

● ●

55

5

55

5 5

●● ●●●

● ●

● 66 666

6 6

6

6

●●●●●●●●●

● ●● 777 7 77 7

777 7

7

●●●●● ●●●

88888 8

88

●●●● ●●●

9999 999

9

1 Dynein2 Vesicles − Clathrin 3 13S condensin4 T complex5 Nucleus lamina

6 Vesicles − COPI/II7 eIF3 complex8 ARP2/3 complex9 COP9 signalosome

From Betschinger et al. (2013)

−6 −4 −2 0 2 4

−4

−2

02

4

Mouse ESC (E14TG2a) in serum LIF

PC1 (50.05%)

PC

2 (2

4.61

%)

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

● ●

● ●

● ●

●●

●●

● ●●

●●●

●●

●●

●●

● ●

● ●

● ●

●● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●

● ●●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

Actin cytoskeletonCytosolEndosomeER/GAExtracellular matrixLysosomeMitochondriaNucleus − ChromatinNucleus − NucleolusPeroxisomePlasma MembraneProteasomeRibosome 40SRibosome 60Sunknown

●Tfe3

Page 38: Introduction to MS-based proteomics and Bioconductor infrastructure · 2015-06-23 · CSAMA 17 June 2015. Outline Proteomics and MS data Bioconductor infrastructure ... 5.0e+07 1.0e+08

Acknowledgement

I Lisa Breckels

I Sebastien Gibb

I Kathryn Lilley

(CCP)

## R version 3.2.0 Patched (2015-04-22 r68234)## Platform: x86_64-unknown-linux-gnu (64-bit)## Running under: Ubuntu 14.04.2 LTS#### attached base packages:## [1] stats graphics grDevices utils datasets methods base#### loaded via a namespace (and not attached):## [1] Biobase_2.29.1 vsn_3.37.1## [3] splines_3.2.0 foreach_1.4.2## [5] Formula_1.2-1 affy_1.47.1## [7] Pbase_0.9.0 highr_0.5## [9] stats4_3.2.0 latticeExtra_0.6-26## [11] BSgenome_1.37.1 Rsamtools_1.21.8## [13] impute_1.43.0 RSQLite_1.0.0## [15] lattice_0.20-31 biovizBase_1.17.1## [17] limma_3.25.9 chron_2.3-45## [19] digest_0.6.8 GenomicRanges_1.21.15## [21] RColorBrewer_1.1-2 XVector_0.9.1## [23] colorspace_1.2-6 preprocessCore_1.31.0## [25] plyr_1.8.2 MALDIquant_1.12## [27] XML_3.98-1.2 biomaRt_2.25.1## [29] zlibbioc_1.15.0 scales_0.2.4## [31] affyio_1.37.0 cleaver_1.7.0## [33] BiocParallel_1.3.25 IRanges_2.3.11## [35] ggplot2_1.0.1 SummarizedExperiment_0.1.5## [37] GenomicFeatures_1.21.13 nnet_7.3-9## [39] Gviz_1.13.2 BiocGenerics_0.15.2## [41] proto_0.3-10 survival_2.38-1## [43] magrittr_1.5 evaluate_0.7## [45] doParallel_1.0.8 MASS_7.3-40## [47] foreign_0.8-63 mzR_2.3.1## [49] BiocInstaller_1.19.6 Pviz_1.3.0## [51] tools_3.2.0 data.table_1.9.4## [53] formatR_1.2 matrixStats_0.14.0## [55] stringr_1.0.0 MSnbase_1.17.5## [57] S4Vectors_0.7.5 munsell_0.4.2## [59] cluster_2.0.1 AnnotationDbi_1.31.16## [61] lambda.r_1.1.7 Biostrings_2.37.2## [63] pcaMethods_1.59.0 GenomeInfoDb_1.5.7## [65] mzID_1.7.0 futile.logger_1.4.1## [67] grid_3.2.0 RCurl_1.95-4.6## [69] dichromat_2.0-0 iterators_1.0.7## [71] VariantAnnotation_1.15.13 bitops_1.0-6## [73] gtable_0.1.2 codetools_0.2-11## [75] DBI_0.3.1 reshape2_1.4.1## [77] GenomicAlignments_1.5.9 gridExtra_0.9.1## [79] knitr_1.10.5 rtracklayer_1.29.10## [81] Hmisc_3.16-0 ProtGenerics_1.1.0## [83] futile.options_1.0.0 stringi_0.4-1## [85] parallel_3.2.0 Rcpp_0.11.6## [87] rpart_4.1-9 acepack_1.3-3.3