
Upload: andra-hawkins

Post on 02-Jan-2016


Page 1: Analysis of Complex Proteomic Datasets Using Scaffold

Free Scaffold Viewer can be downloaded at: www.proteomesoftware.com

Page 2: Scaffold: Why do we need it?

Shotgun proteomics: analysis of complex mixtures

Whole cell extract

10,000+ proteins

600,000 peptides

1.2 million spectra!

• Beyond the realm of manual interpretation
• How do we determine what is a valid protein identification?

Page 3: Statistical Analysis Using Scaffold

• All search engines use different scoring algorithms, so their results cannot be compared directly
• Many search engines' results are described by more than one value

Examples:

Mascot: Ion Score and Identity Score

SEQUEST: Xcorr and DeltaCn

Page 4: Statistical Analysis Using Scaffold

Peptide Prophet*

• Creates a universal score (discriminant score) for the search engine result (e.g. Xcorr and DeltaCn are compressed to one score for SEQUEST results, Ion Score and Identity Score for Mascot results)

• Plots a histogram of the discriminant scores and fits a bimodal distribution based on standard statistics to differentiate between correct and incorrect hits

• Computes the probability that the match is correct at a given discriminant score

*Nesvizhskii, A. I. et al., Anal. Chem. 2003, 75, 4646-4658

Page 5: Statistical Analysis Using Scaffold

Histogram of discriminant scores

[Figure: histogram of the number of spectra in each bin (y-axis, 0 to 200) against discriminant score D (x-axis, -3.9 to 7.3)]

Page 6: Statistical Analysis Using Scaffold

Assumes a mixture of standard statistical distributions: “incorrect” and “correct”

[Figure: histogram of the number of spectra in each bin (0 to 200) against discriminant score D (-3.9 to 7.3), with the “incorrect” and “correct” distributions labeled]
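The mixture assumption on this slide can be illustrated with a small expectation-maximization (EM) fit. This is only a sketch under stated assumptions: both components are taken to be Gaussian (the slide says only "standard statistical distributions"), and the function names, starting values, and synthetic scores are hypothetical. It is not the Peptide Prophet or Scaffold implementation.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=200):
    """Fit an 'incorrect' and a 'correct' Gaussian to discriminant scores by EM.

    Assumption: both components are Gaussian. Returns the fitted means,
    standard deviations, and the mixing weight of the 'correct' component.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # Hypothetical starting values: one component in each half of the range.
    mean_neg, mean_pos = lo + 0.25 * span, lo + 0.75 * span
    sd_neg = sd_pos = span / 4.0
    weight_pos = 0.5
    n = len(scores)

    for _ in range(iterations):
        # E-step: responsibility of the 'correct' component for each score.
        resp = []
        for d in scores:
            pos = weight_pos * normal_pdf(d, mean_pos, sd_pos)
            neg = (1.0 - weight_pos) * normal_pdf(d, mean_neg, sd_neg)
            resp.append(pos / (pos + neg))

        # M-step: re-estimate each component from its weighted scores.
        n_pos = sum(resp)
        n_neg = n - n_pos
        mean_pos = sum(r * d for r, d in zip(resp, scores)) / n_pos
        mean_neg = sum((1.0 - r) * d for r, d in zip(resp, scores)) / n_neg
        var_pos = sum(r * (d - mean_pos) ** 2 for r, d in zip(resp, scores)) / n_pos
        var_neg = sum((1.0 - r) * (d - mean_neg) ** 2 for r, d in zip(resp, scores)) / n_neg
        sd_pos = max(math.sqrt(var_pos), 1e-6)
        sd_neg = max(math.sqrt(var_neg), 1e-6)
        weight_pos = n_pos / n

    return {
        "mean_incorrect": mean_neg, "sd_incorrect": sd_neg,
        "mean_correct": mean_pos, "sd_correct": sd_pos,
        "weight_correct": weight_pos,
    }


# Synthetic discriminant scores (made up for illustration only).
rng = random.Random(0)
scores = [rng.gauss(-1.0, 1.0) for _ in range(600)]
scores += [rng.gauss(3.0, 1.0) for _ in range(300)]
fit = fit_two_gaussians(scores)
print(fit)
```

With too few spectra the two components cannot be separated reliably, which is why the fit needs enough data in both the "incorrect" and "correct" populations.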

Page 7: Statistical Analysis Using Scaffold

Peptide Probability Threshold

[Figure: histogram of the number of spectra in each bin (0 to 200) against discriminant score D (-3.9 to 7.3), with the “incorrect” and “correct” distributions labeled]

$$p(+ \mid D) = \frac{p(D \mid +)\,p(+)}{p(D \mid +)\,p(+) + p(D \mid -)\,p(-)}$$

where + denotes a “correct” match and − an “incorrect” match.
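As a worked illustration of the peptide probability threshold, the sketch below applies Bayes' rule to a two-component mixture: the probability that a match is correct, given its discriminant score D, is the "correct" likelihood times its prior, divided by the sum of that term and the corresponding "incorrect" term. The Gaussian shapes, their parameters, the prior, and the function names are all hypothetical values chosen for illustration; they are not fitted values from the slides.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def probability_correct(d, prior_correct=0.5,
                        correct=(3.0, 1.0), incorrect=(-1.0, 1.0)):
    """Posterior probability that a match with discriminant score d is correct.

    `correct` and `incorrect` are (mean, sd) pairs of the assumed Gaussian
    components; `prior_correct` is the assumed fraction of correct matches.
    """
    pos = normal_pdf(d, *correct) * prior_correct
    neg = normal_pdf(d, *incorrect) * (1.0 - prior_correct)
    return pos / (pos + neg)


def passes_threshold(d, threshold=0.95, **mixture):
    """True if the match clears the chosen peptide probability threshold."""
    return probability_correct(d, **mixture) >= threshold


# Evaluate a few scores taken from the histogram's x-axis tick marks.
for d in (-0.7, 0.9, 2.5, 4.1):
    print(f"D = {d:5.1f}  p(correct) = {probability_correct(d):.4f}  "
          f"passes 95%: {passes_threshold(d)}")
```

Raising the threshold trades missed identifications for a lower error rate among the matches that are kept.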

Page 8: Statistical Analysis Using Scaffold

One search engine may not be enough

[Figure: overlap of identifications among SEQUEST, X!Tandem, and Mascot. Region percentages as extracted (their assignment to regions is not recoverable): 9%, 19%, 7%, 34%, 5%, 4%, 22%]

www.proteomesoftware.com

Page 9: Statistical Analysis Using Scaffold

• Peptide Prophet statistics are applied separately to each search engine's results (i.e. Mascot, SEQUEST, and X!Tandem)
• Scaffold Merger combines the peptide probabilities from each search engine to generate a protein probability

The probability of identifying a spectrum

+

The probability of agreement between search engines

Protein Probability

Page 10: Statistical Analysis Using Scaffold

Advantages of using Scaffold

• Allows you to choose a statistical error rate by setting probability thresholds

• Allows you to compare and combine results from different experiments and different search engines

• Allows sharing of raw data and search results

• Accepted as a suitable statistical method to validate large datasets

Page 11: This is the Samples view

Page 12: List of all the proteins found in your samples

Homologous proteins (proteins matched to the same peptides) are shown. You can link out directly to database entries.

Page 13: How does Scaffold deal with peptides that can be assigned to more than one protein?

General rule: explain the spectral data with the smallest set of proteins

Protein A and Protein B share all the same peptides, so they will be grouped together

[Figure: proteins A and B]

Page 14: How does Scaffold deal with peptides that can be assigned to more than one protein?

General rule: explain the spectral data with the smallest set of proteins

Protein A and Protein B each have one unique peptide, so they will be listed separately only if the peptide probability is > 50%

[Figure: proteins A and B]

Page 15: How does Scaffold deal with peptides that can be assigned to more than one protein?

General rule: explain the spectral data with the smallest set of proteins

Protein B has two unique peptides, so it will be listed separately

[Figure: proteins A and B]
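The three cases on these slides can be sketched as a small grouping routine. This is a deliberately simplified illustration, not Scaffold's actual algorithm: it assumes that proteins with identical peptide sets form one group, and that a group is listed only if it has at least one unique peptide whose probability exceeds 50%. The function name, data layout, and probabilities are hypothetical.

```python
from collections import defaultdict


def listed_protein_groups(protein_peptides, peptide_prob, min_unique_prob=0.5):
    """Return the protein groups that would be listed separately.

    protein_peptides: dict mapping protein name -> set of peptide sequences
    peptide_prob:     dict mapping peptide sequence -> peptide probability
    """
    # Proteins that share all the same peptides are grouped together.
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)

    listed = []
    for peptides, proteins in groups.items():
        # Peptides claimed by any other group are not unique to this one.
        claimed_elsewhere = set()
        for other_peptides in groups:
            if other_peptides != peptides:
                claimed_elsewhere |= other_peptides
        unique = peptides - claimed_elsewhere
        # List the group only if a unique peptide is sufficiently probable.
        if any(peptide_prob[p] > min_unique_prob for p in unique):
            listed.append(tuple(sorted(proteins)))
    return sorted(listed)


probs = {"p1": 0.9, "p2": 0.9, "p3": 0.9, "p4": 0.3}

# Case 1: A and B share all the same peptides -> one group.
same = listed_protein_groups({"A": {"p1", "p2"}, "B": {"p1", "p2"}}, probs)

# Case 2: each has one unique peptide, but B's (p4) is below 50% -> only A listed.
weak = listed_protein_groups(
    {"A": {"p1", "p2", "p3"}, "B": {"p1", "p2", "p4"}}, probs)

# Case 3: same layout, but B's unique peptide is now probable -> both listed.
strong = listed_protein_groups(
    {"A": {"p1", "p2", "p3"}, "B": {"p1", "p2", "p4"}}, {**probs, "p4": 0.9})

print(same, weak, strong)
```

A full parsimony analysis also has to handle proteins whose peptides are spread across several other proteins; that case is left out of this sketch.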

Page 16: Scaffold will extract GO terms from NCBI annotations

Page 17: Gene Ontology “GO” terms

• Controlled vocabulary containing consistent descriptions of gene products in different databases

• Describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner

Gene Ontology Project http://www.geneontology.org/GO.doc.shtml

Page 18: List of samples

Page 19: Color coded to represent the probability that the protein identification is correct

Probability thresholds for peptide and protein identifications and the required number of unique peptides can be defined

Page 20: This is the Proteins view

Page 21: Spectrum of each peptide, labeled with y and b ions, which can be used for manual validation
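For readers checking the y and b labels by hand, the sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. The helper names are hypothetical, and modifications, higher charge states, and neutral losses are left out. The example peptide is the one shown on the Good Spectrum slide.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def b_ions(sequence):
    """Singly charged b-ion m/z values, b1 through b(n-1)."""
    ions, running = [], 0.0
    for aa in sequence[:-1]:
        running += RESIDUE_MASS[aa]
        ions.append(running + PROTON)
    return ions


def y_ions(sequence):
    """Singly charged y-ion m/z values, y1 through y(n-1)."""
    ions, running = [], 0.0
    for aa in reversed(sequence[1:]):
        running += RESIDUE_MASS[aa]
        ions.append(running + WATER + PROTON)
    return ions


peptide = "IAELAGFSVPENTK"
mass = peptide_mass(peptide)
b = b_ions(peptide)
y = y_ions(peptide)
print(f"neutral mass: {mass:.2f}")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i:<2} {b_mz:9.3f}   y{i:<2} {y_mz:9.3f}")
```

The computed neutral mass can be compared with the 1474.73 AMU reported for this peptide on the Good Spectrum slide. Complementary fragments are a quick sanity check: each b ion and its matching y ion add up to the peptide mass plus two protons.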

Page 22: Manual Spectrum Evaluation

• Search engine scores
  Is the peptide found by more than one search engine?
  Mascot ion score > 40
  SEQUEST Xcorr > 2 (+2 ion), 2.5 (+3 ion)
  deltaCn > 0.2
• Good signal-to-noise
• Long stretches of y and/or b ions
• All dominant peaks are assigned as y or b ions
• Fragmentation chemistry
  N-terminal cleavage at P: dominant y-ion
  C-terminal cleavage at D and E: dominant b-ion
  Peptides containing W: abundant y-ions
  S and T: tend to lose water (-18 Da)
  R, N, and Q: tend to lose ammonia (-17 Da)
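The score cutoffs on this slide are easy to encode as a first-pass check before looking at the spectrum itself. The sketch below is illustrative only: the function and constant names are hypothetical, and the slide gives Xcorr cutoffs for +2 and +3 ions only, so other charge states are reported as having no rule. The two test cases use the scores shown on the Good Spectrum and Bad Spectrum slides.

```python
# Cutoffs taken from the slide; names are hypothetical.
XCORR_MIN = {2: 2.0, 3: 2.5}   # keyed by precursor charge
DELTA_CN_MIN = 0.2
MASCOT_ION_MIN = 40.0


def score_checks(charge, xcorr, delta_cn, mascot_ion):
    """Compare search engine scores with the rule-of-thumb cutoffs.

    Returns a dict of booleans; 'xcorr' is None when the slide gives no
    cutoff for the charge state.
    """
    xcorr_min = XCORR_MIN.get(charge)
    return {
        "xcorr": None if xcorr_min is None else xcorr > xcorr_min,
        "delta_cn": delta_cn > DELTA_CN_MIN,
        "mascot_ion": mascot_ion > MASCOT_ION_MIN,
    }


# Scores from the "Good Spectrum" slide (IAELAGFSVPENTK, +2).
good = score_checks(charge=2, xcorr=2.61, delta_cn=0.4, mascot_ion=60.1)

# Scores from the "Bad Spectrum" slide (YPLADYALTPDMAIVDANLVMDMPK, +3).
bad = score_checks(charge=3, xcorr=2.26, delta_cn=0.2, mascot_ion=9.93)

print("good:", good)
print("bad: ", bad)
```

Scores alone do not settle the question; the remaining criteria (signal-to-noise, ion series coverage, fragmentation chemistry) still require inspecting the spectrum.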

Page 23: Good Spectrum

Peptide sequence: IAELAGFSVPENTK

+2 charge on parent peptide

1474.73 AMU, +2 H (Parent Error: -650 ppm)

SEQUEST: Xcorr = 2.61, deltaCn = 0.4

Mascot: Ion Score = 60.1, Identity Score = 37.3

• Good signal-to-noise
• Good coverage of y and b ion series
• Dominant y-ion at N-terminal cleavage of P

[Figure: annotated MS/MS spectrum, relative intensity (0% to 100%) against m/z (0 to 1250), with labeled b ions (b3 to b13, plus b9-H2O) and y ions (y3 to y12)]

Page 24: Bad Spectrum

Peptide sequence: YPLADYALTPDMAIVDANLVMDMPK

+3 charge on parent peptide

2767.75 AMU, +3 H (Parent Error: -240 ppm)

SEQUEST: Xcorr = 2.26, deltaCn = 0.2

Mascot: Ion Score = 9.93, Identity Score = 37.3

• Poor signal-to-noise
• Poor coverage of y and b ion series
• Multiple unassigned peaks

[Figure: annotated MS/MS spectrum, relative intensity (0% to 100%) against m/z (0 to 2500). Peak labels include scattered b and y ions (e.g. y3 to y7, y9 to y12, b5, b8, b9, b11, b14, b15) plus numerous doubly charged, neutral-loss, and internal-fragment assignments (e.g. b9-H2O-H2O+2H, b17-H2O+2H, internal PLADYAL-NH3, internal PLADYALTPD-CO)]

Page 25: This is the Statistics view

Page 26: Scaffold Statistics View

Score Histogram

Red indicates “correct” proteins

Blue indicates “incorrect” proteins

A protein is “correct” if it passes the peptide probability, protein probability, and minimum number of peptides filters.

Important! There must be enough data to fit two distributions for the statistics to be valid.

Page 27: Scaffold Statistics View

With only 1 unique peptide (95% peptide probability), the maximum protein probability is <90%.

With at least 2 unique peptides (95% peptide probability), the maximum protein probability is ~100%.
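One way to see why a second peptide raises the ceiling so sharply: if peptide identifications were independent, the chance that at least one of them is right is 1 minus the product of their individual error probabilities. The sketch below uses that simplified independence assumption and a hypothetical function name; it is not Scaffold's actual protein model. In particular, it yields 0.95 for a single 95% peptide, so the <90% ceiling quoted on the slide must come from additional adjustments that are not modeled here.

```python
def naive_protein_probability(peptide_probs):
    """Probability that at least one peptide identification is correct.

    Assumes the peptide identifications are independent (a simplification).
    """
    p_all_wrong = 1.0
    for p in peptide_probs:
        p_all_wrong *= (1.0 - p)
    return 1.0 - p_all_wrong


one_peptide = naive_protein_probability([0.95])
two_peptides = naive_protein_probability([0.95, 0.95])
print(f"1 peptide at 95%:  {one_peptide:.4f}")
print(f"2 peptides at 95%: {two_peptides:.4f}")
```

Under this assumption each additional 95% peptide multiplies the remaining error by 0.05, which is consistent with the near-100% figure for two or more unique peptides.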

Page 28: Scaffold Statistics View

SEQUEST only

Missed IDs

Page 29: Scaffold Statistics View

Mascot only

Missed IDs

Page 30: Scaffold Statistics View

Using both Mascot and SEQUEST results in more “correct” protein identifications

Mascot only

SEQUEST only

Both

Page 31: This is the Publish View

Page 32: Publication Guidelines for Proteomic Data

Journal of Molecular and Cellular Proteomics

http://www.mcponline.org/misc/ParisReport_Final.shtml

Page 33: Publication Guidelines for Proteomic Data

Data Analysis

• Name and version of the software used to extract the peak list

• Name and version of the database searching software (Mascot, SEQUEST, Spectrum Mill, or X!Tandem)

• Values of all search parameters used (enzyme, modifications, mass tolerance, etc.)

• Name and size of the database searched (Swiss-Prot or NCBI, and the number of sequence entries)

• Name and version of any additional software used for statistical analysis, with an explanation of the analysis (Scaffold, number-of-peptides requirements, probability settings)

Page 34: Publication Guidelines for Proteomic Data

Each Protein Identified

• Accession number

• Sequence coverage and total number of unique peptides

Each Peptide Identified

• Peptide sequence, noting any modifications or missed cleavages

• Parent peptide ion mass and charge

• All search engine scores