data integration, mass spectrometry proteomics software development


Upload: neil-swainston

Post on 10-May-2015


Page 1: Data Integration, Mass Spectrometry Proteomics Software Development

Overview

• Quantitative proteomics

• Data integration in kinetic modelling in systems biology


A typical proteomics experiment

• Various routes through this map

• Separating by size or charge in most cases

Identify peptides as a proxy for proteins, comparing theoretical to experimental spectra


Quantitative proteomics

• Approach described is qualitative
  • Peptides / proteins identified but not quantified

• Mass spectrometry is not quantitative per se
  • Different compounds have different physicochemical properties
  • May ionise differently, more / less readily

• Therefore peak intensities cannot be compared between two different compounds
  • Applies to peptides / proteins


Quantitative proteomics

• BUT peak intensities can be compared between compounds sharing the same physicochemical properties
  • Isotopes

• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)


Quantitative proteomics

• Can apply the same principle for peptides:

• IDVAVDSTGVFK
• IDVAVDSTGVFK*

• Lysine (K) residue is labelled with ¹³C

• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)
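As a rough illustration of the mass difference, the sketch below computes the m/z of the light and heavy forms of the example peptide. It assumes that all six carbons of the lysine are ¹³C (inferred from ΔM = 6 Da) and a doubly charged ion; the function name `peptide_mz` and the small mass table are illustrative, not part of any tool described in these slides.

```python
# Monoisotopic residue masses (Da), limited to the residues in the example peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "T": 101.04768, "I": 113.08406, "D": 115.02694, "K": 128.09496,
    "F": 147.06841,
}
WATER = 18.010565          # mass of H2O added to the residue sum
PROTON = 1.007276          # mass of one charge-carrying proton
C13_MINUS_C12 = 1.0033548  # mass gained per 12C -> 13C substitution


def peptide_mz(sequence, charge, heavy_lysines=0, carbons_per_label=6):
    """Return the monoisotopic m/z of a peptide ion.

    heavy_lysines is the number of lysines carrying the label;
    carbons_per_label=6 is an assumption (fully 13C-labelled lysine).
    """
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    neutral += heavy_lysines * carbons_per_label * C13_MINUS_C12
    return (neutral + charge * PROTON) / charge


light = peptide_mz("IDVAVDSTGVFK", charge=2)
heavy = peptide_mz("IDVAVDSTGVFK", charge=2, heavy_lysines=1)
print(f"light m/z = {light:.3f}, heavy m/z = {heavy:.3f}, shift = {heavy - light:.3f}")
```

At charge 2 the 6 Da mass difference appears as a shift of about 3 m/z units between the two isotopic clusters.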


Quantitative proteomics

• Absolute quantitative proteomics requires an isotopically-labelled peptide of known concentration spiked into the sample

• Peptides differing only in isotopic label behave consistently
  – Comparable peak intensity, comparable retention time

• Ratio of labelled over non-labelled peptide can be used to determine absolute concentration of sample peptide
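The arithmetic behind this can be sketched as follows. The function name, the intensities and the spike concentration are made-up values for illustration; the only assumption is that signal is proportional to amount for the two isotopic forms.

```python
def absolute_concentration(light_intensity, heavy_intensity, spiked_heavy_conc):
    """Estimate the concentration of the unlabelled (sample) peptide.

    The labelled peptide is spiked in at a known concentration, so
    sample concentration = spike concentration / (heavy / light).
    """
    if light_intensity < 0 or heavy_intensity <= 0:
        raise ValueError("intensities must be non-negative, heavy must be > 0")
    heavy_over_light = heavy_intensity / light_intensity if light_intensity else float("inf")
    if heavy_over_light == float("inf"):
        return 0.0  # no light signal: sample peptide not detected
    return spiked_heavy_conc / heavy_over_light


# Hypothetical example: light:heavy signal of 40:60, spike of 10 fmol/µL.
sample_conc = absolute_concentration(40.0, 60.0, spiked_heavy_conc=10.0)
print(f"estimated sample concentration: {sample_conc:.2f} fmol/µL")
```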


Quantitative proteomics: QconCAT

[Figure: expected and observed area ratios for 3 peptides. x-axis: proportion of light present (0 to 1); y-axis: area ratio H/(H+L) (0 to 1). Series: expected L/(L+H) with linear fit; area L/(L+H) for peptides 1, 2 and 3.]

[Figure: mass spectrum of a 40:60 mixture (DBKtest07 #1073, RT: 15.60, FTMS + c ESI Full ms [300.00-2000.00]). x-axis: m/z (516 to 524); y-axis: relative abundance. Labelled peaks include 516.28, 516.78, 517.28 and 519.29, 519.79, 520.29.]

Data: Kathleen Carroll (Orbitrap MS)


Quantitative proteomics: QconCAT

• Requirements:
  • Determine absolute protein concentrations under a given cellular condition
  • Quantify a number (~50) of proteins simultaneously

• Apply QconCAT methodology
  • Allows simultaneous introduction of many labelled peptides into sample
  • Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nat Protoc. 2006, 1:1029-43.


Quantitative proteomics: QconCAT

• Construct an artificial protein containing many peptides
  – At least one from each protein of interest
  – Ensure that the artificial protein is isotopically-labelled


Quantitative proteomics: QconCAT

• Numerous absolute protein quantitations can be performed simultaneously


…from instrument to browser

• From a QconCAT informatics perspective, there are three steps…

1. Selection of QconCAT peptides
2. Analysis and submission of data
3. Browsing / querying


Selection of QconCAT peptides

Q. For a given protein, which peptides are suitable candidates for QconCAT peptides?

Must…
• Be unique across organism
• Be detectable (digestible, flyable)

Preferably…
• Be unmodified


QconCAT Selection Wizard

• Takes protein accession numbers as input (and other parameters)

• Provides list of potential QconCAT peptides
  • Downloads sequence
  • Performs BLAST against species-specific UniProt (tests uniqueness)
  • Filters peptides “appropriately”
  • Applies score to peptide, using PeptideSieve (predicts flyability)
  • Computational prediction of proteotypic peptides for quantitative proteomics. Mallick P, et al. Nat Biotechnol. 2007, 25:125-31.
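The selection logic can be sketched in simplified form. This is not the wizard's implementation: it assumes tryptic digestion (cleave after K or R, not before P), replaces the BLAST search with an exact substring check, uses the absence of M and C as a crude stand-in for "unmodified", and omits PeptideSieve scoring entirely. The toy proteome and all function names are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """In-silico digest: cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def candidate_peptides(target, proteome, min_len=6, max_len=20):
    """Return peptides of `target` that pass simple QconCAT-style filters."""
    others = [seq for name, seq in proteome.items() if name != target]
    candidates = []
    for peptide in tryptic_peptides(proteome[target]):
        if not min_len <= len(peptide) <= max_len:
            continue  # too short or too long to be a practical candidate
        if any(peptide in seq for seq in others):
            continue  # not unique across the (toy) proteome
        if set(peptide) & {"M", "C"}:
            continue  # assumption: M and C are prone to modification
        candidates.append(peptide)
    return candidates


# Hypothetical two-protein "proteome" for illustration only.
proteome = {
    "P1": "MSTAGKIDVAVDSTGVFKLLEQWR",
    "P2": "MKTTLLEQWRGGASDFK",
}
print(candidate_peptides("P1", proteome))
```

Here `MSTAGK` is rejected for containing methionine and `LLEQWR` for also occurring in `P2`, leaving one candidate.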


QconCAT…

Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nature Protocols 1, 1029-1043 (2006)


QconCAT data analysis

• Identify and quantify peptides / proteins of interest

• Generate results in standard data format
  • Facilitates data sharing
  • Exploit existing software tools

• PRIDE XML
  • PRoteomics IDEntifications
  • Community developed standard
  • http://www.ebi.ac.uk/pride/


QconCAT data analysis

[Workflow diagram. Components: PRIDE Converter (mzData, PRIDE XML); Mascot; QconCAT Pride Wizard (Identify, Quantify, Format, Upload); PRIDE XML; eXist database; Web / web service; Browser.]



Pride Converter

• Pride Converter (EBI) used to extract meta-data
  • Who ran the sample, what was the sample, instrument used? etc.
  • http://code.google.com/p/pride-converter/
  • PRIDE Converter: making proteomics data-sharing easy. Barsnes H, et al. Nat Biotechnol. 2009, 27:598-9.

• Simple wizard allowing experimental data to be marked up with meta-data


QconCAT PrideWizard: Identify

• Goal: to identify heavy-labelled QconCAT peptides
  • Uses Mascot
  • http://www.matrixscience.com/search_form_select.html

• De facto standard database search engine for identifying peptides / proteins


QconCAT PrideWizard: Identify

• Mascot results are parsed to find labelled QconCAT peptides:


QconCAT PrideWizard: Quantify

• Goal: to quantify heavy-labelled QconCAT peptides

• We now know m/z and retention time of peak identified as a QconCAT peptide

• First step: extract mass chromatogram for both heavy (labelled) and light (unlabelled) peptide
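A minimal sketch of extracting a mass chromatogram, assuming each scan is held in memory as a retention time plus a list of (m/z, intensity) peaks. The data layout, the ppm tolerance and the toy values are assumptions for illustration, not the wizard's actual data model.

```python
def extract_chromatogram(scans, target_mz, tol_ppm=10.0):
    """Sum the intensity found within tol_ppm of target_mz in every scan.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples.
    Returns a list of (retention_time, summed_intensity) points.
    """
    tolerance = target_mz * tol_ppm / 1e6
    chromatogram = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        chromatogram.append((retention_time, intensity))
    return chromatogram


# Toy scans: a light peptide at m/z 516.28 and its heavy partner at 519.29.
scans = [
    (15.5, [(516.28, 100.0), (519.29, 150.0), (517.50, 5.0)]),
    (15.6, [(516.28, 400.0), (519.29, 600.0)]),
    (15.7, [(516.28, 120.0), (519.29, 180.0)]),
]
light_xic = extract_chromatogram(scans, 516.28)
heavy_xic = extract_chromatogram(scans, 519.29)
print(light_xic)
print(heavy_xic)
```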


QconCAT PrideWizard: Quantify

• Extracted mass chromatograms
  • Heavy and light peptides should overlay, as they should have the same retention time


QconCAT PrideWizard: Quantify

• Could use peak areas to quantify heavy versus light
  • BUT hard (and inaccurate) to determine start and end of the peak


QconCAT PrideWizard: Quantify

• Alternative: extract individual scans showing isotopic clusters for both heavy and light


QconCAT PrideWizard: Quantify

• Apply sliding window and plot heavy versus light:


QconCAT PrideWizard: Quantify

• Final step: apply linear regression to determine heavy:light ratio (and an error):
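The regression step can be sketched with ordinary least squares, where the slope of heavy against light intensity estimates the heavy:light ratio and the standard error of the slope serves as the error. This is a simplified illustration: it takes one heavy and one light intensity per scan as already-paired inputs, does not implement the sliding window, and uses invented intensities.

```python
import math


def heavy_light_ratio(light, heavy):
    """Fit heavy = intercept + slope * light; return (slope, standard error)."""
    n = len(light)
    if n != len(heavy) or n < 3:
        raise ValueError("need at least three paired intensities")
    mean_x = sum(light) / n
    mean_y = sum(heavy) / n
    sxx = sum((x - mean_x) ** 2 for x in light)
    if sxx == 0:
        raise ValueError("light intensities must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(light, heavy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(light, heavy))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, std_err


# Hypothetical per-scan intensities across one eluting peak.
light = [100.0, 200.0, 400.0, 300.0, 150.0]
heavy = [152.0, 298.0, 601.0, 449.0, 226.0]
ratio, error = heavy_light_ratio(light, heavy)
print(f"heavy:light = {ratio:.3f} ± {error:.3f}")
```

Because every scan across the peak contributes a point, the estimate does not depend on choosing where the chromatographic peak starts and ends.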


MCISB Proteome Database

• Searchable repository of quantitative proteomics data

• Geeky bit…
  • eXist native XML database holding PRIDE XML
  • JSP front end
  • Querying extensible through XQuery

• Web and web-service interface
  • Both human and computer-queryable


QconCAT informatics pipeline

• Reference:

• A QconCAT informatics pipeline for the analysis, visualization and sharing of absolute quantitative proteomics data. Swainston N, et al. Proteomics. 2011, 11:329-33.


Data Integration


Systems biology modelling

[Diagram: Systems Biology Model. Data sources: enzyme kinetics, quantitative metabolomics, quantitative proteomics. Resources: PRIDE XML, MeMo, SABIO-RK and MeMo-RK, each exposed through a web service. Model inputs: parameters (KM, kcat) and variables (metabolite and protein concentrations).]
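To show how such parameters and variables combine in a kinetic model, here is a minimal sketch using the irreversible Michaelis-Menten rate law. The slides do not state which rate laws the models use, so the choice of rate law, the function name and the numbers are all assumptions for illustration.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Reaction rate v = kcat * [E] * [S] / (KM + [S]).

    kcat and km are parameters (from enzyme kinetics data);
    enzyme_conc and substrate_conc are variables (from quantitative
    proteomics and metabolomics respectively).
    """
    if km + substrate_conc <= 0:
        raise ValueError("KM + [S] must be positive")
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


# Hypothetical values: kcat = 100 /s, KM = 0.5 mM, [E] = 0.001 mM, [S] = 0.5 mM.
rate = michaelis_menten_rate(kcat=100.0, km=0.5, enzyme_conc=0.001, substrate_conc=0.5)
print(f"v = {rate:.3f} mM/s")
```

With [S] equal to KM the enzyme runs at half its maximum rate, which is why the absolute enzyme concentration from proteomics is needed to turn kcat into a flux.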


Modelling life-cycle workflows


From experiment to simulation

Kinetic models

Experimental data

Systematic integration of experimental data and models in systems biology. Li P, et al. BMC Bioinformatics. 2010, 11:582.


Conclusion

• An informatics pipeline has been developed for analysis of quantitative proteomics data
  • Data is associated with metadata, identified, quantified, and uploaded to database
  • Community standards have been followed

• Experimental data can be incorporated in systems biology models
  • Allows simulations of biological systems to be performed


Thanks…
