data integration, mass spectrometry proteomics software development

Post on 10-May-2015

Overview

• Quantitative proteomics

• Data integration in kinetic modelling in systems biology

A typical proteomics experiment

• Various routes through this map
• Separating by size or charge in most cases
• Identify peptides as a proxy for proteins, comparing theoretical to experimental spectra

Quantitative proteomics

• Approach described is qualitative
• Peptides / proteins identified but not quantified
• Mass spectrometry is not quantitative per se
• Different compounds have different physicochemical properties
• May ionise differently, more / less readily
• Therefore peak intensities cannot be compared between two different compounds
• Applies to peptides / proteins

Quantitative proteomics

• BUT peak intensities can be compared between compounds sharing the same physicochemical properties
• Isotopes
• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)

Quantitative proteomics

• Can apply the same principle for peptides:

• IDVAVDSTGVFK
• IDVAVDSTGVFK*
• Lysine (K) residue is labelled with 13C
• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)
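The mass shift can be checked with a short calculation. The sketch below assumes that all six carbons of the C-terminal lysine are 13C (consistent with ΔM = 6 Da) and uses standard monoisotopic residue masses; the function name `peptide_mz` is illustrative only.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "I": 113.08406, "D": 115.02694, "V": 99.06841, "A": 71.03711,
    "S": 87.03203, "T": 101.04768, "G": 57.02146, "F": 147.06841,
    "K": 128.09496,
}
WATER = 18.01056         # added once per peptide (free N- and C-termini)
PROTON = 1.00728         # added once per positive charge
C13_MINUS_C12 = 1.00335  # mass difference between 13C and 12C


def peptide_mz(sequence, charge, heavy_lysine=False):
    """Return the m/z of a peptide ion; optionally label each lysine with six 13C."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if heavy_lysine:
        # Assumption: every lysine carries six 13C atoms (13C6-lysine).
        mass += 6 * C13_MINUS_C12 * sequence.count("K")
    return (mass + charge * PROTON) / charge


light = peptide_mz("IDVAVDSTGVFK", charge=2)
heavy = peptide_mz("IDVAVDSTGVFK", charge=2, heavy_lysine=True)
delta_mz = heavy - light  # about 3 m/z units for a doubly charged ion
```

For a doubly charged ion the 6 Da mass difference appears as a shift of about 3 on the m/z axis, because m/z divides the mass by the charge.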

Quantitative proteomics

• Absolute quantitative proteomics requires an isotopically-labelled peptide of known concentration spiked into the sample
• Isotopically labelled and unlabelled forms of a peptide behave consistently
– Comparable peak intensity, comparable retention time
• The ratio of labelled to non-labelled peptide can be used to determine the absolute concentration of the sample peptide
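The arithmetic behind this step is a single proportion. A minimal sketch, assuming the labelled (heavy) standard is spiked in at a known concentration and that signal intensity scales equally for both forms; the function name, units and numbers are hypothetical.

```python
def absolute_concentration(light_intensity, heavy_intensity, heavy_concentration):
    """Estimate the concentration of the unlabelled (light) sample peptide.

    light_intensity / heavy_intensity is the measured light:heavy signal ratio;
    heavy_concentration is the known concentration of the spiked standard.
    """
    if heavy_intensity <= 0:
        raise ValueError("heavy_intensity must be positive")
    return heavy_concentration * light_intensity / heavy_intensity


# Hypothetical example: a 40:60 light:heavy signal with the standard spiked at 3.0 units.
sample_concentration = absolute_concentration(40.0, 60.0, 3.0)
```

The result carries whatever concentration unit was used for the spiked standard.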

[Figure: expected and observed area ratios for 3 peptides. x-axis: proportion of light present (0 to 1); y-axis: area ratio H/(H+L) (0 to 1); series: expected L/(L+H) with linear fit, and area L/(L+H) for peptides 1, 2 and 3]

[Figure: mass spectrum DBKtest07 #1073, RT: 15.60, AV: 1, NL: 1.67E6, FTMS + c ESI Full ms [300.00-2000.00]. x-axis: m/z (516 to 524); y-axis: relative abundance (0 to 100); labelled peaks include 516.28, 516.78, 517.28, 519.29, 519.79 and 520.29]

Mixture 40:60

Data: Kathleen Carroll (Orbitrap MS)

Quantitative proteomics: QconCAT

• Requirements:
• Determine absolute protein concentrations under a given cellular condition
• Quantify a number (~50) of proteins simultaneously
• Apply QconCAT methodology
• Allows simultaneous introduction of many labelled peptides into sample
• Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nat Protoc. 2006, 1:1029-43.

Quantitative proteomics: QconCAT

• Construct an artificial protein containing many peptides
– At least one from each protein of interest
– Ensure that the artificial protein is isotopically-labelled

• Numerous absolute protein quantitations can be performed simultaneously

…from instrument to browser

• From a QconCAT informatics perspective, there are three steps…
1. Selection of QconCAT peptides
2. Analysis and submission of data
3. Browsing / querying

Selection of QconCAT peptides

Q. Given a protein, which peptides are suitable candidates for QconCAT peptides?

Must…
• Be unique across organism
• Be detectable (digestible, flyable)

Preferably…
• Be unmodified

QconCAT Selection Wizard

• Takes protein accession numbers as input (and other parameters)

• Provides list of potential QconCAT peptides
• Downloads sequence
• Performs BLAST against species-specific UniProt (tests uniqueness)
• Filters peptides “appropriately”
• Applies score to peptide, using PeptideSieve (predicts flyability)
• Computational prediction of proteotypic peptides for quantitative proteomics. Mallick P, et al. Nat Biotechnol. 2007, 25:125-31.
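The selection criteria can be illustrated with a toy filter. This is a sketch, not the Selection Wizard itself: it replaces BLAST with exact substring matching against a small in-memory proteome, omits PeptideSieve scoring, and treats peptides containing M or C as likely to be modified. The sequences, accession names, minimum length and function names are all hypothetical.

```python
def tryptic_peptides(sequence):
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def candidate_peptides(target_id, proteome, min_length=6, avoid="MC"):
    """Keep peptides that are long enough, unique to the target, and free of avoided residues."""
    others = [seq for pid, seq in proteome.items() if pid != target_id]
    candidates = []
    for peptide in tryptic_peptides(proteome[target_id]):
        if len(peptide) < min_length:
            continue  # too short to be informative
        if any(peptide in other for other in others):
            continue  # not unique across the (toy) proteome
        if any(aa in avoid for aa in peptide):
            continue  # assumption: M and C are prone to modification
        candidates.append(peptide)
    return candidates


# Hypothetical two-protein proteome.
toy_proteome = {
    "P1": "MKIDVAVDSTGVFKAAGHLLRTPEMCDEFKRPLK",
    "P2": "GSAAGHLLRWWK",
}
digest = tryptic_peptides(toy_proteome["P1"])
selected = candidate_peptides("P1", toy_proteome)
```

In this toy case only IDVAVDSTGVFK survives: the other fragments are too short, shared with the second protein, or contain M / C.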

QconCAT…

Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nature Protocols 1, 1029-1043 (2006)

QconCAT data analysis

• Identify and quantify peptides / proteins of interest

• Generate results in standard data format
• Facilitates data sharing
• Exploit existing software tools
• PRIDE XML
• PRoteomics IDEntifications
• Community developed standard
• http://www.ebi.ac.uk/pride/

QconCAT data analysis

[Workflow diagram. Elements shown: mzData, Mascot, PRIDE Converter, PRIDE XML, QconCAT Pride Wizard (Identify, Quantify, Format, Upload), PRIDE XML, eXist database, Web / web service, Browser]

Pride Converter

• Pride Converter (EBI) used to extract meta-data
• Who ran the sample, what was the sample, instrument used? etc.
• http://code.google.com/p/pride-converter/
• PRIDE Converter: making proteomics data-sharing easy. Barsnes H, et al. Nat Biotechnol. 2009, 27:598-9.

• Simple wizard allowing experimental data to be marked up with meta-data

QconCAT PrideWizard: Identify

• Goal: to identify heavy-labelled QconCAT peptides
• Uses Mascot
• http://www.matrixscience.com/search_form_select.html

• De facto standard database search engine for identifying peptides / proteins

QconCAT PrideWizard: Identify

• Mascot results are parsed to find labelled QconCAT peptides:

QconCAT PrideWizard: Quantify

• Goal: to quantify heavy-labelled QconCAT peptides

• We now know m/z and retention time of peak identified as a QconCAT peptide

• First step: extract mass chromatogram for both heavy (labelled) and light (unlabelled) peptide
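Extracting a mass chromatogram amounts to summing, scan by scan, the intensity found within a small m/z window around the target. The sketch below assumes scans are held as `(retention_time, [(mz, intensity), ...])` tuples; that layout, the tolerance and all numbers are illustrative and not the PrideWizard's internal representation.

```python
def extract_chromatogram(scans, target_mz, tolerance=0.01):
    """Return one (retention_time, summed_intensity) pair per scan.

    Only peaks whose m/z lies within +/- tolerance of target_mz are summed.
    """
    chromatogram = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        chromatogram.append((retention_time, intensity))
    return chromatogram


# Hypothetical scans containing a light (516.28) and a heavy (519.29) peak.
toy_scans = [
    (15.50, [(516.28, 10.0), (519.29, 15.0), (600.00, 99.0)]),
    (15.60, [(516.28, 40.0), (519.29, 60.0)]),
    (15.70, [(516.28, 12.0), (519.29, 18.0)]),
]
light_xic = extract_chromatogram(toy_scans, 516.28)
heavy_xic = extract_chromatogram(toy_scans, 519.29)
```

Both traces share the same retention-time axis, so they can be overlaid directly.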

QconCAT PrideWizard: Quantify

• Extracted mass chromatograms
• Heavy and light peptides should overlay, as they should have the same retention time

QconCAT PrideWizard: Quantify

• Could use peak areas to quantify heavy versus light
• BUT hard (and inaccurate) to determine start and end of the peak

QconCAT PrideWizard: Quantify

• Alternative: extract individual scans showing isotopic clusters for both heavy and light

QconCAT PrideWizard: Quantify

• Apply sliding window and plot heavy versus light:

QconCAT PrideWizard: Quantify

• Final step: apply linear regression to determine heavy:light ratio (and an error):
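The sliding-window and regression steps can be sketched as follows. This is an ordinary least-squares fit written out by hand, assuming per-scan light and heavy intensities are already available; the window width, function names and data are hypothetical, and the published pipeline may differ in detail.

```python
import math


def sliding_window_sums(values, width):
    """Sum each run of `width` consecutive values (smooths scan-to-scan noise)."""
    return [sum(values[i:i + width]) for i in range(len(values) - width + 1)]


def ratio_by_regression(light, heavy):
    """Fit heavy = slope * light + intercept; return (slope, standard error of slope).

    The slope is the estimated heavy:light ratio.
    """
    n = len(light)
    if n != len(heavy) or n < 3:
        raise ValueError("need at least three paired points")
    mean_x = sum(light) / n
    mean_y = sum(heavy) / n
    sxx = sum((x - mean_x) ** 2 for x in light)
    if sxx == 0:
        raise ValueError("light intensities must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(light, heavy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(light, heavy))
    std_error = math.sqrt(residual / (n - 2) / sxx)
    return slope, std_error


# Hypothetical per-scan intensities across an eluting peak (heavy = 1.5 x light).
light_scans = [10.0, 20.0, 40.0, 30.0, 12.0, 5.0]
heavy_scans = [15.0, 30.0, 60.0, 45.0, 18.0, 7.5]
light_points = sliding_window_sums(light_scans, width=3)
heavy_points = sliding_window_sums(heavy_scans, width=3)
ratio, ratio_error = ratio_by_regression(light_points, heavy_points)
```

Fitting a slope uses every point across the peak, so no decision about where the peak starts and ends is needed.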

MCISB Proteome Database

• Searchable repository of quantitative proteomics data

• Geeky bit…
• eXist native XML database holding PRIDE XML
• JSP front end
• Querying extensible through XQuery
• Web and web-service interface
• Both human and computer-queryable

QconCAT informatics pipeline

• Reference:

• A QconCAT informatics pipeline for the analysis, visualization and sharing of absolute quantitative proteomics data. Swainston N, et al. Proteomics. 2011, 11:329-33.

Data Integration

Systems biology modelling

[Diagram: enzyme kinetics, quantitative metabolomics and quantitative proteomics feed into a Systems Biology Model as parameters (KM, kcat) and variables (metabolite, protein concentrations). Data resources shown: PRIDE XML, MeMo, SABIO-RK and MeMo-RK, each accessed through a web service]
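How the three data types meet in a kinetic model can be shown with a single Michaelis-Menten reaction: the enzyme concentration would come from quantitative proteomics, KM and kcat from enzyme kinetics, and the substrate concentration from quantitative metabolomics. The sketch below is a generic textbook rate law with a simple Euler integration, not the models from the talk; all values are hypothetical.

```python
def michaelis_menten_rate(enzyme_conc, kcat, km, substrate_conc):
    """Reaction rate v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def simulate(enzyme_conc, kcat, km, substrate0, dt=0.01, steps=1000):
    """Integrate substrate -> product with fixed-step Euler; return final (S, P)."""
    substrate, product = substrate0, 0.0
    for _ in range(steps):
        rate = michaelis_menten_rate(enzyme_conc, kcat, km, substrate)
        converted = min(rate * dt, substrate)  # never consume more than is left
        substrate -= converted
        product += converted
    return substrate, product


# Hypothetical values: [E] from proteomics, kcat and KM from kinetics, [S] from metabolomics.
initial_rate = michaelis_menten_rate(enzyme_conc=0.002, kcat=100.0, km=0.5, substrate_conc=0.5)
final_substrate, final_product = simulate(enzyme_conc=0.002, kcat=100.0, km=0.5, substrate0=0.5)
```

At [S] = KM the rate is half of Vmax = kcat × [E], which gives a quick sanity check on the inputs.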

Modelling life-cycle workflows

From experiment to simulation

Kinetic models

Experimental data

Systematic integration of experimental data and models in systems biology. Li P, et al. BMC Bioinformatics. 2010, 11:582.

Conclusion

• An informatics pipeline has been developed for analysis of quantitative proteomics data
• Data is associated with metadata, identified, quantified, and uploaded to database
• Community standards have been followed
• Experimental data can be incorporated in systems biology models
• Allows simulations of biological systems to be performed

Thanks…
