data integration, mass spectrometry proteomics software development

Overview

• Quantitative proteomics

• Data integration in kinetic modelling in systems biology

http://mcisb.org/index.html

A typical proteomics experiment

• Various routes through this map
  – Separating by size or charge in most cases

Identify peptides as a proxy for proteins, comparing theoretical to experimental spectra


Quantitative proteomics

• Approach described is qualitative
  – Peptides / proteins identified but not quantified

• Mass spectrometry is not quantitative per se
  – Different compounds have different physicochemical properties
  – May ionise differently, more / less readily

• Therefore peak intensities cannot be compared between two different compounds
  – Applies to peptides / proteins



• BUT peak intensities can be compared between compounds sharing the same physicochemical properties
  – Isotopes

• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)



• Can apply the same principle for peptides:

• IDVAVDSTGVFK
• IDVAVDSTGVFK*

• Lysine (K) residue is labelled with 13C

• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)
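
The mass difference quoted above can be turned into the m/z spacing expected between the light and heavy peaks. The sketch below assumes that all six lysine carbons are 13C (consistent with ΔM = 6 Da) and that the peptide charge state is known; the function names are illustrative only.

```python
# Mass difference between one 13C atom and one 12C atom, in daltons.
C13_MINUS_C12 = 13.0033548 - 12.0


def label_mass_shift(n_labelled_carbons: int) -> float:
    """Mass added to a peptide when n carbons are replaced by 13C."""
    return n_labelled_carbons * C13_MINUS_C12


def mz_shift(n_labelled_carbons: int, charge: int) -> float:
    """Spacing on the m/z axis between light and heavy peaks at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return label_mass_shift(n_labelled_carbons) / charge


if __name__ == "__main__":
    # Assumed case: 13C6-lysine label on a doubly charged peptide.
    print(f"mass shift: {label_mass_shift(6):.4f} Da")
    print(f"m/z shift at 2+: {mz_shift(6, 2):.4f}")
```

For a doubly charged peptide the spacing is about 3 m/z units, which is why heavy and light clusters appear as two well-separated groups of peaks.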



• Absolute quantitative proteomics requires an isotopically-labelled peptide of known concentration spiked into the sample

• Labelled and unlabelled forms of a peptide behave consistently
  – Comparable peak intensity, comparable retention time

• Ratio of labelled to non-labelled peptide can be used to determine the absolute concentration of the sample peptide
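
As a worked example of the ratio calculation, the sketch below converts a measured heavy:light ratio and a known spike concentration into a sample concentration. It assumes the detector responds identically to both forms; the function name, units and numbers are illustrative, not taken from the slides.

```python
def sample_concentration(heavy_over_light: float, spike_concentration: float) -> float:
    """Concentration of the unlabelled (sample) peptide.

    heavy_over_light    -- measured ratio of labelled to non-labelled signal
    spike_concentration -- known concentration of the labelled peptide,
                           in whatever unit the result should carry
    """
    if heavy_over_light <= 0:
        raise ValueError("ratio must be positive")
    # light / heavy = 1 / ratio, so light = spike / ratio
    return spike_concentration / heavy_over_light


if __name__ == "__main__":
    # Hypothetical numbers: spike at 10 fmol/ul, heavy signal 1.5x the light signal.
    print(sample_concentration(1.5, 10.0))
```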


[Figure: expected and observed area ratios for 3 peptides. x-axis: proportion of light present (0 to 1); y-axis: area ratio H/(H+L) (0 to 1). Series: expected L/(L+H), with linear fit; area L/(L+H), peptide 2.]

[Figure: mass spectrum, DBKtest07 #1073, RT: 15.60, NL: 1.67E6, FTMS + c ESI Full ms [300.00-2000.00]. x-axis: m/z (516 to 524); y-axis: relative abundance. Labelled peaks include 516.28, 516.78, 517.28 and 519.29, 519.79, 520.29.]

Mixture 40:60

Data: Kathleen Carroll (Orbitrap MS)

Quantitative proteomics: QconCAT



• Requirements:
  – Determine absolute protein concentrations under a given cellular condition
  – Quantify a number (~50) of proteins simultaneously

• Apply QconCAT methodology
  – Allows simultaneous introduction of many labelled peptides into sample
  – Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nat Protoc. 2006, 1:1029-43.



• Construct an artificial protein containing many peptides
  – At least one from each protein of interest
  – Ensure that the artificial protein is isotopically-labelled


• Numerous absolute protein quantitations can be performed simultaneously



…from instrument to browser

• From a QconCAT informatics perspective, there are three steps…

1. Selection of QconCAT peptides
2. Analysis and submission of data
3. Browsing / querying


Selection of QconCAT peptides

Q. Given a protein, which peptides are suitable candidates for QconCAT peptides?

Must…
• Be unique across organism
• Be detectable (digestible, flyable)

Preferably…
• Be unmodified


QconCAT Selection Wizard

• Takes protein accession numbers as input (and other parameters)

• Provides list of potential QconCAT peptides
  – Downloads sequence
  – Performs BLAST against species-specific UniProt (tests uniqueness)
  – Filters peptides “appropriately”
  – Applies score to peptide, using PeptideSieve (predicts flyability)
  – Computational prediction of proteotypic peptides for quantitative proteomics. Mallick P, et al. Nat Biotechnol. 2007, 25:125-31.
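
The selection criteria can be illustrated with a much-simplified stand-in for the wizard. The sketch below is not the wizard's implementation. It assumes tryptic digestion (cleave after K or R, not before P), tests uniqueness by exact substring match against a toy proteome instead of BLAST against UniProt, and uses a crude "no M or C" rule in place of a real modification filter. PeptideSieve scoring is omitted entirely. The accessions and sequences are invented for illustration.

```python
import re

# Hypothetical two-protein "proteome"; accessions and sequences are made up.
TOY_PROTEOME = {
    "P1": "MKIDVAVDSTGVFKELGGAPR",
    "P2": "MSTELGGAPRAAAK",
}


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def candidate_peptides(accession: str, proteome: dict[str, str],
                       min_len: int = 6, max_len: int = 20) -> list[str]:
    """Return peptides of one protein that pass the simplified filters."""
    candidates = []
    for peptide in tryptic_digest(proteome[accession]):
        # Very short or very long peptides are assumed to be poorly detectable.
        if not min_len <= len(peptide) <= max_len:
            continue
        # Must be unique: the sequence may not occur in any other protein.
        if any(peptide in seq for acc, seq in proteome.items() if acc != accession):
            continue
        # Prefer unmodified: skip residues assumed prone to modification.
        if set(peptide) & {"M", "C"}:
            continue
        candidates.append(peptide)
    return candidates


if __name__ == "__main__":
    print(candidate_peptides("P1", TOY_PROTEOME))
```

In this toy case ELGGAPR is rejected because it also occurs in P2, and MK is rejected as too short, leaving a single candidate for P1.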


QconCAT…

Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nature Protocols 1, 1029-1043 (2006)

QconCAT data analysis

• Identify and quantify peptides / proteins of interest

• Generate results in standard data format
  – Facilitates data sharing
  – Exploit existing software tools

• PRIDE XML
  – PRoteomics IDEntifications
  – Community developed standard
  – http://www.ebi.ac.uk/pride/




[Diagram: QconCAT informatics pipeline. Components: mzData, PRIDE Converter, Mascot, QconCAT Pride Wizard (Identify, Quantify, Format, Upload), PRIDE XML, eXist database, web / web service, browser.]


Pride Converter

• Pride Converter (EBI) used to extract meta-data
  – Who ran the sample, what was the sample, instrument used? etc.
  – http://code.google.com/p/pride-converter/
  – PRIDE Converter: making proteomics data-sharing easy. Barsnes H, et al. Nat Biotechnol. 2009, 27:598-9.

• Simple wizard allowing experimental data to be marked up with meta-data



QconCAT PrideWizard: Identify

• Goal: to identify heavily-labelled QconCAT peptides
  – Uses Mascot
  – http://www.matrixscience.com/search_form_select.html

• De facto standard database search engine for identifying peptides / proteins



QconCAT PrideWizard: Identify

• Mascot results are parsed to find labelled QconCAT peptides:





QconCAT PrideWizard: Quantify

• Goal: to quantify heavily-labelled QconCAT peptides

• We now know m/z and retention time of peak identified as a QconCAT peptide

• First step: extract mass chromatogram for both heavy (labelled) and light (unlabelled) peptide


• Extracted mass chromatograms
  – Heavy and light peptide should overlay as they should have same retention time
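
Extracting a mass chromatogram amounts to summing, scan by scan, the intensity found within a small tolerance of the target m/z. The sketch below assumes scans are already available as (retention time, peak list) pairs and uses a 10 ppm window; both are illustrative choices. The m/z values are borrowed from the peak labels of the example spectrum, while the retention times and intensities are invented.

```python
Scan = tuple[float, list[tuple[float, float]]]  # (retention time, [(m/z, intensity), ...])


def extract_chromatogram(scans: list[Scan], target_mz: float,
                         tol_ppm: float = 10.0) -> list[tuple[float, float]]:
    """Summed intensity within tol_ppm of target_mz for every scan."""
    tol = target_mz * tol_ppm / 1e6
    return [
        (rt, sum(intensity for mz, intensity in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]


def apex_rt(chromatogram: list[tuple[float, float]]) -> float:
    """Retention time at which the chromatogram is most intense."""
    return max(chromatogram, key=lambda point: point[1])[0]


# Toy scans: invented intensities, one unrelated peak at m/z 518.00.
TOY_SCANS: list[Scan] = [
    (15.5, [(516.28, 100.0), (519.29, 150.0)]),
    (15.6, [(516.28, 400.0), (519.29, 600.0), (518.00, 50.0)]),
    (15.7, [(516.28, 120.0), (519.29, 180.0)]),
]

if __name__ == "__main__":
    light = extract_chromatogram(TOY_SCANS, 516.28)
    heavy = extract_chromatogram(TOY_SCANS, 519.29)
    # Co-eluting peptides should share the same apex retention time.
    print(apex_rt(light), apex_rt(heavy))
```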


• Could use peak areas to quantify heavy versus light
  – BUT hard (and inaccurate) to determine start and end of the peak


• Alternative: extract individual scans showing isotopic clusters for both heavy and light


• Apply sliding window and plot heavy versus light:


• Final step: apply linear regression to determine heavy:light ratio (and an error):
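
The regression step can be sketched as an ordinary least-squares fit of heavy intensity against light intensity, where the slope estimates the heavy:light ratio and its standard error gives the error. How the sliding window pairs up heavy and light intensities is not spelled out on the slide, so the sketch simply takes paired values as input. The data are invented and the function name is illustrative.

```python
import math


def fit_heavy_light_ratio(light: list[float], heavy: list[float]) -> tuple[float, float, float]:
    """Fit heavy = slope * light + intercept by least squares.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(light)
    if n != len(heavy) or n < 3:
        raise ValueError("need at least three paired intensities")
    mean_x = sum(light) / n
    mean_y = sum(heavy) / n
    sxx = sum((x - mean_x) ** 2 for x in light)
    if sxx == 0:
        raise ValueError("light intensities must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(light, heavy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(light, heavy))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_se


if __name__ == "__main__":
    # Invented paired intensities, roughly 1.5 heavy per 1 light.
    light = [50.0, 100.0, 120.0, 400.0]
    heavy = [78.0, 148.0, 183.0, 598.0]
    slope, intercept, se = fit_heavy_light_ratio(light, heavy)
    print(f"heavy:light = {slope:.3f} +/- {se:.3f}")
```

Using many points per peptide, instead of two integrated peak areas, avoids having to decide where each chromatographic peak starts and ends.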



MCISB Proteome Database

• Searchable repository of quantitative proteomics data

• Geeky bit…
  – eXist native XML database holding PRIDE XML
  – JSP front end
  – Querying extensible through XQuery

• Web and web-service interface
  – Both human and computer-queryable
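
To give a flavour of what "computer-queryable" can mean, the sketch below builds a query URL for eXist's generic REST interface, which accepts an XQuery expression in the `_query` parameter. This is not the MCISB web service itself. The host, the collection path and the XQuery expression (including the `PeptideItem` and `Sequence` element names) are assumptions made for illustration, and the code only constructs the URL without contacting any server.

```python
from urllib.parse import urlencode


def build_exist_query_url(base_url: str, collection: str, xquery: str,
                          how_many: int = 10) -> str:
    """Build a GET URL that asks an eXist REST endpoint to run an XQuery."""
    params = urlencode({"_query": xquery, "_howmany": how_many})
    return f"{base_url.rstrip('/')}/{collection.strip('/')}?{params}"


if __name__ == "__main__":
    # Hypothetical server and collection; element names are assumed, not verified.
    url = build_exist_query_url(
        "http://localhost:8080/exist/rest",
        "db/pride",
        "//PeptideItem[Sequence = 'IDVAVDSTGVFK']",
    )
    print(url)
```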

QconCAT informatics pipeline

• Reference:

• A QconCAT informatics pipeline for the analysis, visualization and sharing of absolute quantitative proteomics data. Swainston N, et al. Proteomics. 2011, 11:329-33.

Data Integration

Systems biology modelling

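[Diagram: experimental data feeding a systems biology model through web services. Experimental inputs: enzyme kinetics, quantitative metabolomics, quantitative proteomics. Resources, each exposed as a web service: PRIDE XML, MeMo, SABIO-RK, MeMo-RK. Model content: parameters (KM, kcat) and variables (metabolite and protein concentrations).]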
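
One way to see how the three data types meet in a model is a single enzyme-catalysed reaction. The sketch below assumes an irreversible Michaelis-Menten rate law, which the slides do not specify. KM and kcat would come from enzyme kinetics, the enzyme concentration from quantitative proteomics, and the substrate concentration from quantitative metabolomics. All numbers are invented.

```python
def michaelis_menten_rate(kcat: float, km: float,
                          enzyme_conc: float, substrate_conc: float) -> float:
    """Reaction rate v = kcat * [E] * [S] / (KM + [S]).

    kcat            -- turnover number (per second), from enzyme kinetics
    km              -- Michaelis constant (mM), from enzyme kinetics
    enzyme_conc     -- enzyme concentration (mM), from quantitative proteomics
    substrate_conc  -- substrate concentration (mM), from quantitative metabolomics
    """
    if km + substrate_conc <= 0:
        raise ValueError("KM + [S] must be positive")
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


if __name__ == "__main__":
    # Invented values: at [S] = KM the rate is half of kcat * [E].
    v = michaelis_menten_rate(kcat=100.0, km=0.5, enzyme_conc=0.001, substrate_conc=0.5)
    print(f"v = {v} mM/s")
```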

Modelling life-cycle workflows

From experiment to simulation

Kinetic models

Experimental data

Systematic integration of experimental data and models in systems biology. Li P, et al. BMC Bioinformatics. 2010, 11:582.

Conclusion

• An informatics pipeline has been developed for analysis of quantitative proteomics data
  – Data is associated with metadata, identified, quantified, and uploaded to database
  – Community standards have been followed

• Experimental data can be incorporated in systems biology models
  – Allows simulations of biological systems to be performed

Thanks…
