relative quantification of proteins by 2d dige and label free lc-ms/ms 6 th european summer school...

44
Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin Wells Brixen/Bressanone, Italy

Upload: john-pierce

Post on 28-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Relative quantification of proteins by 2D DIGE and label free LC-MS/MS

6th European Summer School19-25 August 2012

Martin Wells Brixen/Bressanone, Italy

Page 2: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

1. Relative quantification – what do we need?2. The challenges and limitations specific to proteomics3. The Progenesis software brand – concept and philosophy 4. The relative quant products:

Progenesis SameSpots for 2D gels Progenesis LC-MS for proteomics Progenesis CoMet for metabolomics Progenesis MALDI

Introduction

Page 3: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Who are we?• Nonlinear Dynamics develops proteomics and metabolomics

analysis software that is different, ground-breaking and above all designed to help you generate reliable conclusions that are reproducible within and across-labs.

• Founded 1989• Head Office - Newcastle Upon Tyne, UK

Nonlinear Dynamics Ltd.

Page 4: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Proteomics – Relative Quantification

Example applications• Discovery of protein expression changes related

to disease or environment conditions

• Characterisation of proteins or complexes to understand functionality and or activity

Page 5: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Relative quantification - What is the goal?

“To discover the proteins that warrant further investigation as rapidly, objectively and reliably as possible.”

Proteins x, y and z are significantly changing in their abundance as a apparent result of changes in the experimental conditions. Investigate these observation…

Page 6: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Experimental design that can answer the biological questions

• Like for like measure of abundance across all samples to make relative comparisons

• Correction for differences in sample loading• Reliable measure of variance within experimental groups• Statistical tools to confidently find differences that are

significant and likely to be true• Identification workflow to annotate the measured

features• Confirmation that other measurements are in consensus

What do we need?

Page 7: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Feature detection and matching has been one of the major challenges in relative quantitative proteomics data analysis, leading to time consuming subjective user editing and non-reproducible results.

• Defining a feature consistently is more important to relative quantification than individual boundaries.

• Independent detection of each files leads to inconsistent interpretation of features and matching challenges

• How to handle / avoid “missing” data. Ie abundance measurement in only 3 out of 5 samples

Like for like measurements

Page 8: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Co-detection

Aggregate co-detection

Mapping the detection to all runs avoiding missing data

Page 9: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Data alignment – a prerequisite to co-detection

Section prior to alignment Section after alignment

Page 10: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Like for like measurements

Page 11: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Precision of label free quantification – ion abundance

FTMS scan intervals effect on Feature abundance variance

0

5

10

15

20

25

1 1.3 1.5 1.7 1.9

FTMS Scan Interval

mean

CV

Human Lysate Digest 750ng 70 min separation mean peak width 90% Max ~21sCVs calculated on quintruplicate analyses of 47,482 multiply charged featuresInst: Orbitrap XL and Dionex RSLCn Duncan Smith, Paterson Institute for Cancer Research, Manchester

20 16 14 12 11Mean number of FTMS data points/peak

Page 12: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Quantitative Precision and Accuracy

Not accurate, not precise

Precise, not accurate

Accurate, not precise

Accurate and precise

“The most complete dataset with accurate ratio’s and high precision was achieved by Progenesis” Richard R. Sprenger, USD, Odense - ASMS 2012 - Label-free absolute quantification of proteomes: evaluation and comparison of bioinformatics platforms and strategies

Page 13: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Benefit of high mass resolution in complex mixtures

Page 14: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Experimental design that can answer the biological questions

• Like for like measure of abundance across all samples to make relative comparisons

• Correction for differences in sample loading

What do we need?

Page 15: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• If you loaded 20% more sample then most proteins will display a 20% increase in abundance, assuming…

Data NormalisationAssuming like for like measurements are taken for all features you can calculate ratios for each feature. This is repeated to a common file for each sample. These ratios should on average be 1:1 assuming most proteins are not affected by the experimental condition.

Correction for sample loading

Page 16: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Relative normalised abundances

5 groups with 3 replicates per group.

Graphical expression profile showing group means with error bars representing 95% confidence bounds.

Page 17: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Experimental design that can answer the biological questions

• Like for like measure of abundance across all samples to make relative comparisons

• Correction for differences in sample loading• Reliable measure of variance within experimental

groups

What do we need?

Page 18: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Reliable measure of variance

• Technical noise

- e.g. experimentally introduced bias

• Analytical inadequacies

- e.g. incorrect or missing quantification of experiment data

• Biological variation

- how do we handle this?

These obstacles result in a loss of power and, therefore, a reduction in our ability to confidently discover significant expression changes

Page 19: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Determining the variance of the abundance measurement enables determination of confidence bounds. (95%)

• The greater the number of replicates the smaller the confidence interval - in a well controlled environment.

• Variance if often proportional to the mean.• The confidence of any reported difference or POWER is a

relationship between the size of the confidence intervals and the effect size.

Reliable measure of variance

Page 20: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Experimental design that can answer the biological questions

• Like for like measure of abundance across all samples to make relative comparisons

• Correction for differences in sample loading• Reliable measure of variance within experimental

groups• Statistical tools to confidently find differences that

are significant and likely to be true

What do we need?

Page 21: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Know the assumptions and limitations of the test you apply• Student T-test and One-way ANOVA assume:

Normal distribution Equal variance Sample independence

• To achieve the above a data transformation is typically required.

• Proteomics data is typically log distributed and requires a transformation to meet the pre-requisites of many statistical tests

Statistical tools

Log distribution Normal distribution

Page 22: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Uni and multivariate statistics

The complete data table of quantitative information, without missing data, enables the robust and confident application of multivariate statistics to support traditional univariate tests. • Principle component analysis PCA

•ANOVA p-values, typically p<0.05 – ie 5% type I error (false positive)

•False discovery rate p value “correction” – q values to reduce type 1 errors

• Power analysis per feature to reduce type II errors (false negative)

•Correlation analysis of proteins or peptides ions abundancesSee which protein abundances are highly correlated. Is there a

biological cause to this effect.The greater the biological variation the more reliable this relationship.

Page 23: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Principle component analysis

Fantastic overview of your data, enabling assessment of the major contributing factors (principle components) to variance in your experiment.

Eg. Are all the files within each group showing similar overall characteristics to each other. Are all groups different to all other groups or just one group is different to the rest.

Page 24: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

False discoveries and q-values

Theoretical p-value distribution Actual p-value distribution

0

[1] Storey,J.D. and Tibshirani,R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100, 9440–9445.

• q-values take the into account the actual p-value distribution and can be referred to as a corrected p-value

• False discoveries occur when we are observing a difference when in truth there is none, thus indicating a test of poor specificity. Q-values are a tool to reduce such false discoveries

• False Discovery Rate (FDR) assumes a uniform distribution of p-values but ...

Page 25: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Link protein expression to pathways

Label Free LC-MS Based Proteomics for Integrated Preclinical Pharmaceutical Toxicology. Experience from the FP6 InnoMed PredTox Consortium. Scientific poster presented by Dr. Ben Collins from University College Dublin, Ireland. ASMS, 2010.

Find similar expression profiles

Page 26: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Experimental design that can answer the biological questions

• Like for like measure of abundance across all samples to make relative comparisons

• Correction for differences in sample loading• Reliable measure of variance within experimental

groups• Statistical tools to confidently find differences that

are significant and likely to be true• Identification workflow to annotate the measured

features

What do we need?

Page 27: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Identification workflow with co-detection

Peptide ion indentified obtained from any one file

No Missing data!

Identification can be confirmed by any other

file(s)

Page 28: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Should MSMS acquisition simple be replicated for each sample?

RT

400 2000

22min

92min

A MSMS of 400-525 (7.8% mz window)

B MSMS of 525-638 (7.0% mz window)

C MSMS of 638-766 (8.0% mz window)

D MSMS of 766-963 (12.3% mz window)

E MSMS of 963-2000 (64.8% mz window)

Quintuplicate analysis gives

5 Identical LCMS experiments/sample

iGPFDDA

iGPF gives 5 completely different

LCMSMS experiments/sample

Page 29: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Identification workflow

41,482 multiple charge

ions

iGPF 16,285(39%)

DDA 4,522(11%)

• Using iGPF 3.5 time more peptides were identified compared to DDA.

• Many of the ‘unsuccessful’ MS2 spectra are very weak due to the low abundance of the precursor.

• 41,482 multiply charged ions quantified from MS1 data

• FDR <1%• Further 2D-LC-MSMS annotations

yielded an additional 10,100 peptide ID’s due to a reduction in complexity and increased on column concentration

Duncan Smith, Paterson Institute for Cancer Research, Manchester

Page 30: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Experimental design that can answer the biological questions

• Like for like measure of abundance across all samples to make relative comparisons

• Correction for differences in sample loading• Reliable measure of variance within experimental groups• Statistical tools to confidently find differences that are

significant and likely to be true• Identification workflow to annotate the measured

features• Confirmation that any other measurements are in

consensus

What do we need?

Page 31: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Multiple measurements identified as unique and from the same protein can be collated and the protein abundance calculated.

• However you may have several reported fold changes – so which is correct?

Protein Abundance Results

Page 32: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Relative protein abundance calculations

Page 33: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Are all peptide measurements in agreement? Or do some peptides reflect modification differences between groups rather than abundances changes.

• Activation of the same quantity of protein may result in some peptides changing significantly whilst all others remaining constant.

• Are there any technical (eg miscleavage) or biological (eg PTM) outliers that need filtering out of the “abundance” measure.

Protein “Abundance” Results

Page 34: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Peptide consensus – example PTM

Page 35: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Generate additional MSE/MS2 data to annotate

quantification data

Set-up experiment

groups

View Results & Peptide Ion

Quantification

Multivariate Statistical Analysis

Peptide Search & Import Results

Protein View & Quantification Report

Data Import

RT Alignment

Automatic detection normalisation & quantification

of LC-MS data at the peptide level

LABEL FREE QUANTIFICATIO

N

Progenesis LC-MS analytical workflow

THEN IDENTIFY

Directly load vendor data filesApply lock-mass correctionApply dead time correction (if applicable)

Wikipedia

Page 36: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

View Peptide DataDelve into underlying peptide information

Analyse fractionated samples

Analysis of Each FractionReduce sample complexity but quantify more runs

Recombine FractionsNormalise across fractions to see global view

Protein ViewQuantification & identification at the protein level

Inclusion lists

Page 37: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Normalising pre-fractionated data

Two step process Standard normalisation within each fraction Use common abundance distribution characteristics to normalise

between fractions

Page 38: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Precision is more common to see reported and maybe easier to access with technical repetitionIs 15%-20% Precision good – is it good enough??

Effect Size - The fold change between the groups

The POWER – combination of both the above may be more suitable and relevant to differential expression results. Also takes into consideration of the sample size.

Accuracy is more challenging and requires some prior knowledge . Ie the “correct” or target answer. Is 15-20% Accuracy good – is it good enough??

Precision and Accuracy – what’s needed?

FDA Guidance for Industry - Bioanalytical Method ValidationThe precision should not exceed 15% of the coefficient of variation (CV) except for the LLOQ, where it should not exceed 20% of the CV.

The accuracy should be within 15% of the actual value except at LLOQ, where it should not deviate by more than 20%.

Page 39: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Impact of Precision to relative quant

~15% CV

~50% CV

~30% CV

Page 40: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

What analytical precision is needed?

For relative quant would it be more appropriate to report POWER than precision or even fold change? Something that is close to the limits of detection (LOD) is likely to have poor precision (in that group) but if the effect size (fold change) is large then any short coming in precision may not be critical to the significance of the observed difference.

Would statistical POWER be more informative?

p<0.05

p<0.05

Page 41: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Accuracy is more challenging and requires some prior knowledge . Ie the “correct” or target answer. Is 15-20% Accuracy good – is it good enough??

Precision and Accuracy – what’s needed?

30

50

40

20

60

10

fmol

Page 42: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Relative quantification of proteomics data can provide some extremely powerful biological insight.

• A clear understanding of the software tools, their assumptions and limitations are vital to the correct interpretation of the data.

• Knowledge of the major contributors to technical variance in your whole workflow and the control and monitoring of this can greatly impact your confidence in the results.

Conclusion

Page 43: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

Progenesis SameSpots for 2D gels

Progenesis LC-MS for proteomics

Progenesis CoMet for metabolomics

Progenesis MALDI

The Progenesis product range

www.nonlinear.com

Page 44: Relative quantification of proteins by 2D DIGE and label free LC-MS/MS 6 th European Summer School 19-25 August 2012 Martin WellsBrixen/Bressanone, Italy

• Colleagues at Nonlinear Dynamics• Duncan Smith et al,

Paterson Institute for Cancer Research, Manchester

Thank you for your attention

• Stats glossary http://www.stats.gla.ac.uk/steps/glossary/index.html

Acknowledgements