data integration, mass spectrometry proteomics software development
TRANSCRIPT
Overview
• Quantitative proteomics
• Data integration in kinetic modelling in systems biology
A typical proteomics experiment
• Various routes through this map
• Separating by size or charge in most cases
• Identify peptides as a proxy for proteins, comparing theoretical to experimental spectra
Quantitative proteomics
• Approach described is qualitative
• Peptides / proteins identified but not quantified
• Mass spectrometry is not quantitative per se
• Different compounds have different physicochemical properties
• May ionise differently, more / less readily
• Therefore peak intensities cannot be compared between two different compounds
• Applies to peptides / proteins
Quantitative proteomics
• BUT peak intensities can be compared between compounds sharing the same physicochemical properties
• Isotopes
• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)
Quantitative proteomics
• Can apply the same principle for peptides:
• IDVAVDSTGVFK
• IDVAVDSTGVFK*
• Lysine (K) residue is labelled with ¹³C
• Same physicochemical properties
• Different molecular masses (ΔM = 6 Da)
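The mass difference for the example peptide can be checked with a short calculation. This is a minimal sketch, assuming the label replaces all six carbons of the lysine with ¹³C (inferred from ΔM = 6 Da, not stated on the slide). The residue masses are standard monoisotopic values; the function names are illustrative only.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "F": 147.06841, "G": 57.02146,
    "I": 113.08406, "K": 128.09496, "S": 87.03203, "T": 101.04768,
    "V": 99.06841,
}
WATER = 18.010565          # added once per peptide (free N- and C-termini)
PROTON = 1.007276          # mass of the charge carrier
C13_MINUS_C12 = 1.003355   # mass difference between 13C and 12C


def peptide_mass(sequence, heavy_lysine=False):
    """Monoisotopic mass of a peptide, optionally with labelled lysine."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if heavy_lysine:
        # Assumption: each K carries six 13C atoms (13C6 lysine).
        mass += 6 * C13_MINUS_C12 * sequence.count("K")
    return mass


def mz(mass, charge):
    """m/z of the protonated peptide at the given charge state."""
    return (mass + charge * PROTON) / charge


light = peptide_mass("IDVAVDSTGVFK")
heavy = peptide_mass("IDVAVDSTGVFK", heavy_lysine=True)
print(f"light {light:.4f} Da, heavy {heavy:.4f} Da, shift {heavy - light:.4f} Da")
print(f"2+ ions: light m/z {mz(light, 2):.4f}, heavy m/z {mz(heavy, 2):.4f}")
```

At charge 2+ the 6 Da shift appears as roughly 3 m/z units, which is why the heavy and light isotopic clusters sit side by side in a spectrum.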
Quantitative proteomics
• Absolute quantitative proteomics requires an isotopically-labelled peptide of known concentration spiked into the sample
• Labelled and unlabelled forms of the same peptide behave consistently
– Comparable peak intensity, comparable retention time
• Ratio of labelled over non-labelled peptide can be used to determine the absolute concentration of the sample peptide
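The arithmetic behind this can be sketched as follows. It assumes the heavy and light signals respond identically, so the ratio of signals equals the ratio of amounts. The function name, areas, and spike concentration are hypothetical illustration values, not data from the talk.

```python
def absolute_concentration(heavy_area, light_area, heavy_spike_conc):
    """Estimate the sample (light) peptide concentration.

    Assumes light_conc / heavy_conc == light_area / heavy_area,
    i.e. identical response for labelled and unlabelled peptide.
    """
    if heavy_area <= 0:
        raise ValueError("heavy_area must be positive")
    return heavy_spike_conc * light_area / heavy_area


# Hypothetical numbers: heavy standard spiked at 50 (arbitrary units),
# observed peak areas in a 40:60 light:heavy proportion.
heavy_area, light_area = 6.0e6, 4.0e6
sample_conc = absolute_concentration(heavy_area, light_area, heavy_spike_conc=50.0)
light_fraction = light_area / (light_area + heavy_area)
print(f"sample concentration: {sample_conc:.2f}")
print(f"L/(L+H): {light_fraction:.2f}")
```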
[Figure: expected and observed ratio areas, 3 peptides. x-axis: proportion of light present (0 to 1); y-axis: area ratio H/(H+L) (0 to 1). Series: expected L/(L+H) with linear fit; area L/(L+H) for peptides 1, 2 and 3.]
[Figure: mass spectrum, DBKtest07 #1073, RT: 15.60, AV: 1, NL: 1.67E6, FTMS + c ESI Full ms [300.00-2000.00]. x-axis: m/z (516 to 524); y-axis: relative abundance (0 to 100). Main labelled peaks: 516.28, 516.78, 517.28 and 519.29, 519.79, 520.29. Mixture 40:60.]
Data: Kathleen Carroll (Orbitrap MS)
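The peak labels in the spectrum above allow a quick consistency check. This is a sketch under one assumption: that the peaks at 516.28, 516.78, 517.28 form the light isotopic cluster and those at 519.29, 519.79, 520.29 the heavy one. The charge state follows from the isotope spacing, and the mass shift from the offset between clusters.

```python
C13_SPACING = 1.003355  # approximate mass spacing between isotopic peaks (Da)

# Peak m/z values read from the spectrum labels (assignment is an assumption).
light_cluster = [516.28, 516.78, 517.28]
heavy_cluster = [519.29, 519.79, 520.29]


def charge_from_spacing(cluster):
    """Infer charge state from the mean m/z spacing of an isotopic cluster."""
    spacings = [b - a for a, b in zip(cluster, cluster[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(C13_SPACING / mean_spacing)


charge = charge_from_spacing(light_cluster)
mass_shift = (heavy_cluster[0] - light_cluster[0]) * charge
print(f"charge state: {charge}+")
print(f"heavy - light mass shift: {mass_shift:.2f} Da")
```

A spacing of about 0.5 m/z implies a doubly charged ion, so the roughly 3 m/z offset between clusters corresponds to a mass difference of about 6 Da, in line with the ΔM quoted earlier.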
Quantitative proteomics: QconCAT
• Requirements:
• Determine absolute protein concentrations under a given cellular condition
• Quantify a number (~50) of proteins simultaneously
• Apply QconCAT methodology
• Allows simultaneous introduction of many labelled peptides into sample
• Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nat Protoc. 2006, 1:1029-43.
Quantitative proteomics: QconCAT
• Construct an artificial protein containing many peptides
– At least one from each protein of interest
– Ensure that the artificial protein is isotopically-labelled
• Numerous absolute protein quantitations can be performed simultaneously
Quantitative proteomics: QconCAT
…from instrument to browser
• From a QconCAT informatics perspective, there are three steps…
1. Selection of QconCAT peptides
2. Analysis and submission of data
3. Browsing / querying
Selection of QconCAT peptides
Q. For a given protein, which peptides are suitable candidates for QconCAT peptides?
Must…
• Be unique across the organism
• Be detectable (digestible, flyable)
Preferably…
• Be unmodified
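The "digestible" and "unique" criteria can be illustrated with a small sketch. It assumes tryptic digestion (cleave after K or R, but not before P) and tests uniqueness by exact substring matching against a toy proteome. The accessions `P1` and `P2` and their sequences are hypothetical, and exact matching is a simplification swapped in for a real similarity search.

```python
import re


def tryptic_digest(sequence, min_length=6):
    """In-silico tryptic digest: cleave after K/R unless followed by P."""
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    # Drop the empty trailing fragment and peptides too short to be useful.
    return [p for p in peptides if len(p) >= min_length]


def unique_peptides(proteome, accession, min_length=6):
    """Peptides of one protein that occur in no other protein of the proteome."""
    unique = []
    for peptide in tryptic_digest(proteome[accession], min_length):
        hits = [acc for acc, seq in proteome.items() if peptide in seq]
        if hits == [accession]:
            unique.append(peptide)
    return unique


# Hypothetical two-protein "proteome" for illustration only.
proteome = {
    "P1": "MKTAYIAKQRIDVAVDSTGVFKLLSPR",
    "P2": "MSTAYIAKGGLLSPRWWK",
}
candidates = unique_peptides(proteome, "P1")
print(candidates)
```

Here TAYIAK is rejected because it also occurs in `P2`, while the remaining candidate is unique to `P1`. Flyability and modification checks are not covered by this sketch.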
QconCAT Selection Wizard
• Takes protein accession numbers as input (and other parameters)
• Provides list of potential QconCAT peptides
• Downloads sequence
• Performs BLAST against species-specific UniProt (tests uniqueness)
• Filters peptides “appropriately”
• Applies score to peptide, using PeptideSieve (predicts flyability)
• Computational prediction of proteotypic peptides for quantitative proteomics. Mallick P, et al. Nat Biotechnol. 2007, 25:125-31.
QconCAT…
Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nature Protocols 1, 1029-1043 (2006)
QconCAT data analysis
• Identify and quantify peptides / proteins of interest
• Generate results in standard data format
• Facilitates data sharing
• Exploit existing software tools
• PRIDE XML
• PRoteomics IDEntifications
• Community developed standard
• http://www.ebi.ac.uk/pride/
QconCAT data analysis
[Diagram: QconCAT data analysis pipeline. Components shown: mzData, Mascot, PRIDE Converter, PRIDE XML, QconCAT Pride Wizard (Identify, Quantify, Format, Upload), eXist database, Web / web service, Browser.]
Pride Converter
• Pride Converter (EBI) used to extract meta-data
• Who ran the sample, what was the sample, instrument used? etc.
• http://code.google.com/p/pride-converter/
• PRIDE Converter: making proteomics data-sharing easy. Barsnes H, et al. Nat Biotechnol. 2009, 27:598-9.
• Simple wizard allowing experimental data to be marked up with meta-data
Pride Converter
QconCAT PrideWizard: Identify
• Goal: to identify heavy-labelled QconCAT peptides
• Uses Mascot
• http://www.matrixscience.com/search_form_select.html
• De facto standard database search engine for identifying peptides / proteins
QconCAT PrideWizard: Identify
• Mascot results are parsed to find labelled QconCAT peptides:
QconCAT PrideWizard: Quantify
• Goal: to quantify heavy-labelled QconCAT peptides
• We now know m/z and retention time of peak identified as a QconCAT peptide
• First step: extract mass chromatogram for both heavy (labelled) and light (unlabelled) peptide
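Extracting a mass chromatogram amounts to summing, scan by scan, the intensity found near a target m/z. The sketch below assumes each scan is a `(retention_time, [(mz, intensity), ...])` pair. That data layout, the tolerance, and the toy values are hypothetical, not the format used by the actual pipeline.

```python
def extract_chromatogram(scans, target_mz, tolerance=0.01):
    """Return (retention_time, summed_intensity) for peaks near target_mz."""
    chromatogram = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        chromatogram.append((retention_time, intensity))
    return chromatogram


# Toy scans for illustration only.
scans = [
    (15.50, [(516.28, 1.0e5), (519.29, 1.5e5), (600.00, 9.0e4)]),
    (15.60, [(516.28, 6.0e5), (519.29, 9.0e5)]),
    (15.70, [(516.28, 2.0e5), (519.29, 3.0e5), (450.10, 5.0e4)]),
]
light_xic = extract_chromatogram(scans, 516.28)
heavy_xic = extract_chromatogram(scans, 519.29)
print("light:", light_xic)
print("heavy:", heavy_xic)
```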
QconCAT PrideWizard: Quantify
• Extracted mass chromatograms
• Heavy and light peptides should overlay, as they should have the same retention time
QconCAT PrideWizard: Quantify
• Could use peak areas to quantify heavy versus light
• BUT hard (and inaccurate) to determine start and end of the peak
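The sensitivity to peak boundaries is easy to demonstrate. This sketch integrates a toy chromatogram with the trapezoidal rule over two different start / end choices. The data points and boundaries are hypothetical.

```python
def peak_area(chromatogram, start, end):
    """Trapezoidal area of the chromatogram between two retention times."""
    points = [(rt, i) for rt, i in chromatogram if start <= rt <= end]
    return sum(
        (t2 - t1) * (i1 + i2) / 2
        for (t1, i1), (t2, i2) in zip(points, points[1:])
    )


# Toy chromatogram: (retention time, intensity).
chromatogram = [
    (15.4, 0.2e5), (15.5, 1.0e5), (15.6, 6.0e5), (15.7, 2.0e5), (15.8, 0.3e5),
]
narrow = peak_area(chromatogram, 15.5, 15.7)
wide = peak_area(chromatogram, 15.4, 15.8)
print(f"narrow window area: {narrow:.0f}")
print(f"wide window area:   {wide:.0f}")
```

The two areas differ even though the underlying peak is the same, so any heavy:light ratio based on areas inherits the uncertainty of the boundary choice.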
QconCAT PrideWizard: Quantify
• Alternative: extract individual scans showing isotopic clusters for both heavy and light
QconCAT PrideWizard: Quantify
• Apply sliding window and plot heavy versus light:
QconCAT PrideWizard: Quantify
• Final step: apply linear regression to determine heavy:light ratio (and an error):
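The regression step can be sketched with ordinary least squares: plot heavy intensity against light intensity for each scan, fit a line, and read the slope as the heavy:light ratio with its standard error. The sliding-window extraction is not reproduced here. The intensity pairs are hypothetical, and fitting with an intercept is an assumption rather than the pipeline's documented choice.

```python
import math


def heavy_light_ratio(light, heavy):
    """Least-squares slope of heavy vs light, with its standard error."""
    n = len(light)
    if n != len(heavy) or n < 3:
        raise ValueError("need at least three paired intensities")
    mean_x = sum(light) / n
    mean_y = sum(heavy) / n
    sxx = sum((x - mean_x) ** 2 for x in light)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(light, heavy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the error estimate on the slope.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(light, heavy))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err


# Hypothetical per-scan intensities (arbitrary units).
light = [1.0, 2.0, 4.0, 6.0, 3.0]
heavy = [1.6, 2.9, 6.1, 9.0, 4.4]
ratio, error = heavy_light_ratio(light, heavy)
print(f"heavy:light ratio = {ratio:.3f} +/- {error:.3f}")
```

Because every scan across the peak contributes a point, the ratio does not depend on choosing where the peak starts and ends.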
MCISB Proteome Database
• Searchable repository of quantitative proteomics data
• Geeky bit…
• eXist native XML database holding PRIDE XML
• JSP front end
• Querying extensible through XQuery
• Web and web-service interface
• Both human and computer-queryable
QconCAT informatics pipeline
• Reference:
• A QconCAT informatics pipeline for the analysis, visualization and sharing of absolute quantitative proteomics data. Swainston N, et al. Proteomics. 2011, 11:329-33.
Data Integration
Systems biology modelling
[Diagram: data sources feeding a Systems Biology Model. Enzyme kinetics provides parameters (KM, kcat); quantitative metabolomics and quantitative proteomics provide variables (metabolite and protein concentrations). Resources shown: PRIDE XML, MeMo, SABIO-RK and MeMo-RK, each accessed through a web service.]
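How the parameters and variables combine in a kinetic model can be shown with a single reaction. The sketch assumes a Michaelis-Menten rate law, which the slides do not name explicitly. The enzyme concentration plays the role of the proteomics measurement and the substrate concentration the metabolomics one. All numerical values are hypothetical.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Reaction rate v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def simulate_substrate(kcat, km, enzyme_conc, substrate0, dt, steps):
    """Forward-Euler simulation of substrate depletion over time."""
    substrate = substrate0
    trajectory = [substrate]
    for _ in range(steps):
        rate = michaelis_menten_rate(kcat, km, enzyme_conc, substrate)
        substrate = max(0.0, substrate - rate * dt)
        trajectory.append(substrate)
    return trajectory


# Hypothetical values: kcat and KM from enzyme kinetics, [E] from
# quantitative proteomics, initial [S] from quantitative metabolomics.
trajectory = simulate_substrate(
    kcat=10.0, km=0.5, enzyme_conc=0.01, substrate0=2.0, dt=0.1, steps=50
)
print(f"substrate after {len(trajectory) - 1} steps: {trajectory[-1]:.4f}")
```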
Modelling life-cycle workflows
From experiment to simulation
Kinetic models
Experimental data
Systematic integration of experimental data and models in systems biology. Li P, et al. BMC Bioinformatics. 2010, 11:582.
Conclusion
• An informatics pipeline has been developed for analysis of quantitative proteomics data
• Data is associated with metadata, identified, quantified, and uploaded to database
• Community standards have been followed
• Experimental data can be incorporated in systems biology models
• Allows simulations of biological systems to be performed
Thanks…