![Page 1: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/1.jpg)
Proteomics Informatics – Protein identification II: search engines and
protein sequence databases (Week 5)
![Page 2: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/2.jpg)
General Criteria for a Good Protein Identification Algorithm
• The response to random input data should be random.
• Maximum number of correct identifications and minimum number of incorrect identifications for any data set.
• Maximal separation between scores for correct identifications and the distribution of scores for random matching proteins for any data set.
• The statistical significance of the results should be calculated.
• The searches should be fast.
![Page 3: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/3.jpg)
Search Parameters
Parent tolerance: +/- daltons/ppm
Fragment tolerance: +/- daltons/ppm
Complete mods: Cys alkylation
Potential mods (artifacts): Met/Trp oxidation, Gln/Asn deamidation
Potential mods (PTMs): Phosphoryl, sulfonyl, acetyl, methyl, glycosyl, GPI
Cleavage: Trypsin ([KR]|{P})
Scoring method: Scores or statistics
Sequences: FASTA files
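Two of the search parameters listed on this slide, the cleavage rule and the mass tolerance, can be made concrete with a short sketch. It assumes that `[KR]|{P}` means "cleave after K or R unless the next residue is P"; the function names `tryptic_peptides` and `within_tolerance` are hypothetical and do not belong to any search engine.

```python
def tryptic_peptides(sequence, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P ([KR]|{P})."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal peptide
    # Join up to `missed_cleavages` neighbouring pieces.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def within_tolerance(measured, theoretical, tol, unit="Da"):
    """Check a mass match with the tolerance given in daltons or ppm."""
    error = abs(measured - theoretical)
    if unit == "ppm":
        return error <= theoretical * tol * 1e-6
    return error <= tol


print(tryptic_peptides("AKRPGKM", missed_cleavages=1))
print(within_tolerance(1000.004, 1000.0, 5, unit="ppm"))
```

A ppm tolerance scales with the mass being matched, whereas a dalton tolerance is the same absolute window at every mass.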
![Page 4: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/4.jpg)
Identification – Peptide Mass Fingerprinting
[Workflow diagram. Experimental side: MS. Database side: Sequence DB, Pick Protein, Digestion, All Peptide Masses. The two sides meet at Compare, Score, Test Significance, repeated for each protein, yielding Identified Proteins.]
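The peptide mass fingerprinting loop on this slide (pick a protein, digest it, compute all peptide masses, compare with the measured masses, repeat for each protein) can be sketched as follows. This is a minimal illustration, not the scoring used by any real search engine: it ranks proteins by a plain count of matched masses, allows no missed cleavages or modifications, and all function names are hypothetical. The residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence):
    """In-silico tryptic digest: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(measured, sequence, tol_da=0.5):
    """Number of measured masses explained by the protein's peptides."""
    theoretical = [peptide_mass(p) for p in digest(sequence)]
    return sum(any(abs(m - t) <= tol_da for t in theoretical) for m in measured)


def rank_proteins(measured, database, tol_da=0.5):
    """Repeat the comparison for each protein and sort by match count."""
    scores = {name: count_matches(measured, seq, tol_da)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


database = {"P1": "GAKSVR", "P2": "LLLKEEER"}  # toy sequences
measured = [peptide_mass("GAK"), peptide_mass("SVR")]
print(rank_proteins(measured, database))
```

A match count alone does not say whether the top hit is significant; that is the role of the "Test Significance" step in the diagram.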
![Page 5: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/5.jpg)
Response to Random Data
[Figure; y-axis: Normalized Frequency]
![Page 6: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/6.jpg)
ProFound – Search Parameters
http://prowl.rockefeller.edu/
![Page 7: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/7.jpg)
ProFound – Protein Identification by Peptide Mapping
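$$
P(k \mid D I) \propto P(k \mid I)\,\frac{(N-r)!}{N!}\,\prod_{i=1}^{r}\left\{\sqrt{\frac{2}{\pi}}\,\frac{m_{\max}-m_{\min}}{\sigma_i}\sum_{j=1}^{g_i}\exp\left[-\frac{(m_i-m_{ij0})^2}{2\sigma_i^2}\right]\right\} F_{\mathrm{pattern}}
$$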
W. Zhang & B.T. Chait,
Analytical Chemistry
72 (2000) 2482-2489
![Page 8: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/8.jpg)
ProFound Results
![Page 9: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/9.jpg)
Peptide Mapping – Mass Accuracy
[Figure, two panels. ProFound: -log(e) (0 to 7) versus Mass Tolerance (Da) (0 to 2). Mascot: Score (0 to 140) versus Mass Tolerance (Da) (0 to 2).]
![Page 10: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/10.jpg)
Peptide Mapping - Database Size
[Figure panels: S. cerevisiae; Fungi; All Taxa]
Expectation values, peptide mapping example:
S. cerevisiae   4.8e-7
Fungi           8.4e-6
All Taxa        2.9e-4
![Page 11: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/11.jpg)
Missed Cleavage Sites
[Figure panels: u = 1; u = 2; u = 4]
Expectation values, peptide mapping example:
u = 1   4.8e-7
u = 2   1.1e-5
u = 4   6.8e-4
![Page 12: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/12.jpg)
Peptide Mapping - Partial Modifications
[Figure panels: No Modifications; Phosphorylation (S, T, or Y)]
Protein     Searched without modifications   Searched with possible phosphorylation of S/T/Y
DARPP-32    0.00006                          0.01
CFTR        0.00002                          0.005
Even if the protein is modified, it is usually better to search a protein sequence database without specifying possible modifications when using peptide mapping data.
![Page 13: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/13.jpg)
Peptide Mapping - Ranking by Direct Calculation of the Significance
![Page 14: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/14.jpg)
Tandem MS – Database Search
[Workflow diagram. Sample side: Lysis, Fractionation, Digestion, LC-MS, MS/MS. Database side: Sequence DB, Pick Protein, Pick Peptide, All Fragment Masses. The two sides meet at Compare, Score, Test Significance, repeated for all peptides and for all proteins.]
![Page 15: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/15.jpg)
Algorithms
![Page 16: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/16.jpg)
Comparing and Optimizing Algorithms
[Figure: for Algorithm 1 and Algorithm 2, the score distributions of True and False identifications (x-axis: Score) are shown next to the corresponding curves of Sensitivity versus 1-Specificity.]
![Page 17: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/17.jpg)
MS/MS - Parent Mass Error and Enzyme Specificity
$$
\left(\sum_i I_i\right) \times n_b! \times n_y!
$$
Expectation values, MS/MS example:
Δm=2, Trypsin           2.5e-5
Δm=100, Trypsin         2.5e-5
Δm=2, non-specific      7.9e-5
Δm=100, non-specific    1.6e-4
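The factorial expression on this slide reads as a score of the hyperscore type: the summed intensity of the matched fragment peaks multiplied by the factorials of the numbers of matched b and y ions. The sketch below illustrates that reading only. It assumes the peak matching has already been done, and the function name `hyperscore` and its arguments are hypothetical.

```python
from math import factorial


def hyperscore(matched_b_intensities, matched_y_intensities):
    """Summed matched intensity times n_b! times n_y!."""
    total_intensity = sum(matched_b_intensities) + sum(matched_y_intensities)
    n_b = len(matched_b_intensities)
    n_y = len(matched_y_intensities)
    return total_intensity * factorial(n_b) * factorial(n_y)


# Two matched b ions and three matched y ions (toy intensities).
print(hyperscore([1.0, 2.0], [3.0, 1.0, 1.0]))  # 8.0 * 2! * 3! = 96.0
```

The factorial terms make the score grow quickly with the number of matched ions, which helps separate long consistent ion series from a few chance matches.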
![Page 18: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/18.jpg)
Sequest
Cross-correlation
![Page 19: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/19.jpg)
X! Tandem - Search Parameters
http://www.thegpm.org/
![Page 20: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/20.jpg)
X! Tandem - Search Parameters
![Page 21: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/21.jpg)
X! Tandem - Search Parameters
![Page 22: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/22.jpg)
Conventional, single stage searching
[Diagram: spectra and sequences pass through a generic search engine.]
Generic search engine: test all cleavages, modifications, & mutations for all sequences
![Page 23: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/23.jpg)
Some hard problems in MS/MS analysis in proteomics
Determining potential modifications
- e.g., oxidation, phosphorylation, deamidation
- calculation order 2^n
- NP complete
Allowing for unanticipated peptide cleavages
- e.g., chymotryptic contamination in trypsin
- calculation order ~ 200 × tryptic cleavage
- “unfortunate” coefficient
Detecting point mutations
- e.g., sequence homology
- calculation order 18^N
- NP complete
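The exponential cost of potential modifications can be seen by enumerating the variants of a single peptide: every modifiable site is either modified or not, so n sites give 2^n candidate forms to score. The sketch below is illustrative only; the function name `modification_variants` and the default choice of S, T, and Y as modifiable residues are assumptions made for the example.

```python
from itertools import product


def modification_variants(peptide, modifiable="STY"):
    """Return every combination of modified site positions (0-based)."""
    sites = [i for i, aa in enumerate(peptide) if aa in modifiable]
    variants = []
    for states in product((False, True), repeat=len(sites)):
        variants.append(tuple(s for s, on in zip(sites, states) if on))
    return variants


variants = modification_variants("ASTYK")
print(len(variants))  # 3 modifiable sites -> 2**3 = 8 variants
```

Each additional potential modification site doubles the number of candidate peptides.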
![Page 24: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/24.jpg)
Multi-stage searching
[Diagram: spectra and sequences pass through successive X! Tandem stages: Tryptic cleavage, Modifications #1, Modifications #2, Point mutation.]
![Page 25: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/25.jpg)
Search Results
![Page 26: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/26.jpg)
Search Results
![Page 27: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/27.jpg)
Sequence Annotations
![Page 28: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/28.jpg)
Search Results
![Page 29: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/29.jpg)
Search Results
![Page 30: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/30.jpg)
Mascot
http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=MIS
![Page 31: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/31.jpg)
Identification – Spectrum Library Search
[Workflow diagram. Sample side: Lysis, Fractionation, Digestion, LC-MS/MS, MS/MS. Library side: Spectrum Library, Pick Spectrum. The two sides meet at Compare, Score, Test Significance, repeated for all spectra, yielding Identified Proteins.]
![Page 32: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/32.jpg)
Steps in making an Annotated Spectrum Library (ASL):
1. Find the best 10 spectra for a particular sequence, with the same PTMs and charge.
2. Add the spectra together and normalize the intensity values.
3. Assign a “quality” value: the median expectation value of the 10 spectra used.
4. Record the 20 most intense peaks in the averaged spectrum, its parent ion z, m/z, sequence, protein accessions & quality.
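The four library-building steps on this slide can be sketched as below. Several details are assumptions made for the example: each spectrum is a dictionary from integer m/z bin to intensity, "best" means lowest expectation value, and normalization scales the strongest summed peak to 1. The function name `build_library_entry` is hypothetical, and the parent ion z, m/z, sequence, and protein accessions that a real entry records are omitted for brevity.

```python
from statistics import median


def build_library_entry(spectra, n_best=10, n_peaks=20):
    """spectra: list of (expectation_value, {mz_bin: intensity}) pairs,
    all for one sequence with the same PTMs and charge."""
    # Step 1: keep the best spectra (lowest expectation values).
    best = sorted(spectra, key=lambda s: s[0])[:n_best]
    # Step 2: add the spectra together, then normalize the intensities.
    summed = {}
    for _, peaks in best:
        for mz, intensity in peaks.items():
            summed[mz] = summed.get(mz, 0.0) + intensity
    top = max(summed.values())
    normalized = {mz: i / top for mz, i in summed.items()}
    # Step 3: quality is the median expectation value of the spectra used.
    quality = median(e for e, _ in best)
    # Step 4: record only the most intense peaks.
    strongest = sorted(normalized.items(), key=lambda kv: kv[1],
                       reverse=True)[:n_peaks]
    return {"quality": quality, "peaks": sorted(strongest)}


spectra = [(1e-3, {100: 2.0, 200: 1.0}), (1e-5, {100: 2.0, 300: 4.0})]
print(build_library_entry(spectra, n_peaks=2))
```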
![Page 33: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/33.jpg)
Spectrum Library Characteristics – Peptide Length
[Figure: fraction of library (%) (0 to 10) versus peptide length (0 to 50).]
![Page 34: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/34.jpg)
Spectrum Library Characteristics – Protein Coverage
[Figure: % coverage (0 to 50), shown for residues and for peptides, versus protein Mr (kDa) (10 to 190).]
![Page 35: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/35.jpg)
Identification – Spectrum Library Search
[Figure: Library spectrum (5:25) compared with Test spectrum (5:25).]
Results: 4 peaks selected, 1 peak missed
![Page 36: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/36.jpg)
Identification – Spectrum Library Search
How likely is this? Apply a hypergeometric probability model:
- 25 possible m/z values;
- 5 peaks in the library spectrum; and
- 4 selected by the test spectrum.
Matches   Probability
1         0.45
2         0.15
3         0.016
4         0.00039
5         0.0000037
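The hypergeometric model on this slide can be checked numerically. The sketch below assumes the test spectrum contributes 4 draws from the 25 possible m/z values, 5 of which are occupied by library peaks; under that reading it reproduces the first four rows of the probability table (a fifth match is not possible with only 4 draws, so that row is not covered by this sketch). The function name `hypergeom_pmf` is hypothetical.

```python
from math import comb


def hypergeom_pmf(k, population, library_peaks, test_peaks):
    """Probability that exactly k of the test peaks hit library peaks."""
    if k < 0 or k > library_peaks or k > test_peaks:
        return 0.0
    return (comb(library_peaks, k)
            * comb(population - library_peaks, test_peaks - k)
            / comb(population, test_peaks))


for k in range(1, 5):
    print(k, hypergeom_pmf(k, population=25, library_peaks=5, test_peaks=4))
```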
![Page 37: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/37.jpg)
Identification – Spectrum Library Search
If you have 1000 possible m/z values and 20 peaks in test and library spectrum?
[Figure: p (log scale, 1.0E-14 to 1.0E+00) versus matches (1 to 10).]
1 matched: p = 0.6
5 matched: p = 0.0002
10 matched: p = 0.0000000000001
![Page 38: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/38.jpg)
Identification – Spectrum Library Search
[Diagram: an Experimental Mass Spectrum (m/z) is compared against a Library of Assigned Mass Spectra to give the Best search result.]
![Page 39: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/39.jpg)
X! Hunter
![Page 40: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/40.jpg)
X! Hunter algorithm:
1. Use dot product to find a library spectrum that best matches a test spectrum.
2. Calculate p-value with hypergeometric distribution.
3. Use p-value to calculate expectation value, given the identification parameters.
4. If expectation value is less than the median expectation value of the library spectrum, report the median value.
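The four steps listed on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the X! Hunter implementation: spectra are dictionaries from integer m/z bin to intensity, the p-value is taken as the upper tail of the hypergeometric distribution, and the expectation value is approximated as the p-value times the number of library spectra searched. The names `dot_product`, `p_value`, and `hunter_search` are hypothetical.

```python
from math import comb, sqrt


def dot_product(a, b):
    """Normalized dot product of two binned spectra."""
    shared = sum(a[mz] * b[mz] for mz in a if mz in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return shared / norm if norm else 0.0


def p_value(matches, population, library_peaks, test_peaks):
    """P(at least `matches` shared peaks) under a hypergeometric model."""
    total = 0.0
    for k in range(matches, min(library_peaks, test_peaks) + 1):
        total += (comb(library_peaks, k)
                  * comb(population - library_peaks, test_peaks - k)
                  / comb(population, test_peaks))
    return total


def hunter_search(test, library, population):
    # Step 1: best library spectrum by dot product.
    best = max(library, key=lambda entry: dot_product(test, entry["peaks"]))
    # Step 2: p-value from the number of shared peaks.
    matches = len(set(test) & set(best["peaks"]))
    p = p_value(matches, population, len(best["peaks"]), len(test))
    # Step 3: expectation value (assumed: p times library size).
    expectation = p * len(library)
    # Step 4: never report better than the library entry's median value.
    return best["sequence"], max(expectation, best["quality"])


library = [
    {"sequence": "PEPTIDEK", "peaks": {100: 1.0, 200: 0.5, 300: 0.2}, "quality": 1e-4},
    {"sequence": "OTHERK", "peaks": {150: 1.0, 250: 0.5, 350: 0.2}, "quality": 1e-3},
]
print(hunter_search({100: 0.9, 200: 0.6, 300: 0.1}, library, population=1000))
```

Step 4 acts as a floor: a match cannot be reported as more significant than the spectra from which the library entry was built.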
![Page 41: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/41.jpg)
X! Hunter Result
Query Spectrum
Library Spectrum
![Page 42: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/42.jpg)
Dynamic Range In Proteomics
The goal is to identify and characterize all components of a proteome.
Large discrepancy between the experimental dynamic range and the range of amounts of different proteins in a proteome.
[Figure: Number of Proteins versus Log (Protein Amount), showing the Distribution of Protein Amounts, the Experimental Dynamic Range, and the Desired Dynamic Range.]
![Page 43: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/43.jpg)
[Diagram of the proteomics workflow and its limits. Steps: Sample, Extraction, Protein Labeling, Protein Separation, Digestion, Peptide Labeling, Peptide Separation, Ionization, Mass Separation, Fragmentation, Mass Separation, Detection. Annotations: Loss of material; Limit of amount of material; Separation of material; Detection limit; Dynamic range. Axis: Protein Abundance.]
![Page 44: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/44.jpg)
Experimental Designs Simulated
[Diagram: Sample, Protein Separation, Digestion, Peptide Separation, Mass Spectrometry. Peptide separation panels: # of peptides per bin versus "Retention time" (bin), bins 1 to k. Mass spectrometry panels: peptides m1 to m6 shown against the MS dynamic range. Axis: Protein Abundance.]
![Page 45: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/45.jpg)
Parameters in Simulation
● Distribution of protein amounts in sample
● Loss of peptides before binding to the column
● Loss of peptides after elution off the column
● Distribution of mass spectrometric response for different peptides present at the same amount
● Total amount of peptides that are loaded on column (limited by column loading capacity)
● # of peptide fractions
● # of Proteins in each fraction
● Dynamic range of mass spectrometer
● Detection limit of mass spectrometer
[Diagram: Sample, Protein Separation, Digestion, Peptide Separation, Mass Spectrometry. Peptide separation panels: # of peptides per bin versus "Retention time" (bin), bins 1 to k. Mass spectrometry panels: peptides m1 to m6 shown against the MS dynamic range. Axis: Protein Abundance.]
![Page 46: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/46.jpg)
Simulation Results for 1D-LC-MS
[Workflow: Complex Mixtures of Proteins, Digestion, RPC, MS Analysis.]
[Figure, four panels of Number of Proteins versus log(Protein Amount): Tissue and Body Fluid, each shown with No Protein Separation and with Protein Separation: 10 fractions.]
![Page 47: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/47.jpg)
Success Rate of a Proteomics Experiment
DEFINITION: The success rate of a proteomics experiment
is defined as the number of proteins detected divided by
the total number of proteins in the proteome.
[Figure: Number of Proteins versus Log (Protein Amount), showing the Distribution of Protein Amounts and the Proteins Detected.]
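The success rate defined on this slide is a simple ratio, shown here as a minimal sketch. The function name `success_rate` and the use of sets of protein identifiers are assumptions made for the example.

```python
def success_rate(detected, proteome):
    """Number of proteins detected divided by the total number in the proteome."""
    proteome = set(proteome)
    return len(set(detected) & proteome) / len(proteome)


print(success_rate({"A", "B"}, {"A", "B", "C", "D"}))  # 2 of 4 detected -> 0.5
```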
![Page 48: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/48.jpg)
Relative Dynamic Range of a Proteomics Experiment
DEFINITION: RELATIVE DYNAMIC RANGE, RDRx,
where x is e.g. 10%, 50%, or 90%
[Figure: Number of Proteins and Fraction of Proteins Detected versus Log (Protein Amount), showing the Distribution of Protein Amounts, the Proteins Detected, and the levels RDR90, RDR50, and RDR10.]
![Page 49: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/49.jpg)
Number of Proteins in Mixture
[Figure: Success Rate (0 to 1) and Relative Dynamic Range (RDR50) (0 to 1) versus Number of Proteins in Mixture (1 to 100000, log scale), for Tissue and Body Fluid. Numbered points 1 and 2 on the curves correspond to numbered distributions of Number of Proteins versus log(Protein Amount) for Tissue (0 to 6) and Body Fluid (0 to 10).]
![Page 50: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/50.jpg)
Amount of Peptides Loaded on the Column
[Figure: Success Rate (0 to 1) and Relative Dynamic Range (RDR50) (0 to 1) versus Amount Loaded [mg] (0.01 to 100, log scale), for Tissue and Body Fluid. Numbered points 2 and 3 on the curves correspond to numbered distributions of Number of Proteins versus log(Protein Amount) for Tissue (0 to 6) and Body Fluid (0 to 10).]
![Page 51: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/51.jpg)
Peptide Separation
[Figure: Success Rate (0 to 1) and Relative Dynamic Range (RDR50) (0 to 1) versus Number of Peptide Fractions (10 to 100000, log scale), for Tissue and Body Fluid. Numbered points 3 and 4 on the curves correspond to numbered distributions of Number of Proteins versus log(Protein Amount) for Tissue (0 to 6) and Body Fluid (0 to 10).]
![Page 52: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/52.jpg)
Amount loaded and peptide separation
[Figure: Relative Dynamic Range (0 to 1.0) versus Success Rate (0 to 1.0) for Tissue, with numbered distributions of Number of Proteins versus log(Protein Amount) (0 to 6) for each step. Steps shown: Protein separation, Amount loaded, Peptide separation.]
Order:
1. Protein separation
2. Amount loaded
3. Peptide separation
Order:
1. Protein separation
2. Peptide separation
3. Amount loaded
Ranges:
Protein separation: 30000 – 3000 proteins in each fraction
Amount loaded: 0.1 ug – 10 ug
Peptide separation: 100 – 1000 fractions
![Page 53: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/53.jpg)
Repeat Analysis
1 Analysis
![Page 54: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/54.jpg)
2 Analyses
Repeat Analysis
![Page 55: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/55.jpg)
3 Analyses
Repeat Analysis
![Page 56: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/56.jpg)
4 Analyses
Repeat Analysis
![Page 57: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/57.jpg)
5 Analyses
Repeat Analysis
![Page 58: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/58.jpg)
6 Analyses
Repeat Analysis
![Page 59: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/59.jpg)
7 Analyses
Repeat Analysis
![Page 60: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/60.jpg)
8 Analyses
Repeat Analysis
![Page 61: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/61.jpg)
Repeat Analysis: Simulations
[Figure, two panels comparing Experiment and Simulation: Success Rate (0 to 0.3) versus Number of Repeats (0 to 10); RDR10 (0 to 0.5) versus Number of Repeats (0 to 10).]
![Page 62: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/62.jpg)
Summary
• The success rate of proteome analysis is influenced by the following factors (listed in order of importance):
  • Amount of peptides loaded on column or mass spectrometric detection limit
  • The degree of peptide separation or mass spectrometric dynamic range
  • The degree of protein separation
![Page 63: Proteomics Informatics Protein identification II: search engines and protein sequence …fenyolab.org/presentations/Proteomics_Informatics_2013/... · 2013-03-05 · protein sequence](https://reader033.vdocuments.us/reader033/viewer/2022050215/5f619c7f6c6e265656026a8c/html5/thumbnails/63.jpg)