1 mass spectrometry-based proteomics xuehua shen (adapted from slides with textbook)
TRANSCRIPT
![Page 1: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/1.jpg)
1
Mass Spectrometry-based Proteomics
Xuehua Shen
(Adapted from slides with textbook)
![Page 2: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/2.jpg)
2
Outline
• Motivation of proteomics
• Mass spectrometry-based proteomics
• Instrumentation of mass spectrometry
• De novo sequencing algorithm
• Database search
• Algorithms of real software (e.g., sequence tags)
![Page 3: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/3.jpg)
3
Motivation
• Proteins are working units of the cells
– The number of found genes is much less than the number of expressed proteins
– Directly related with cell processes and diseases
>1,000,000 distinct protein forms
~30,000 human genes
DNA
Alternative
splicing
mRNA Protein
Post-translational
Modification
>100,000 RNA messages
SNP
![Page 4: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/4.jpg)
4
Tools for Proteomics
• Edman degradation reaction
• NMR (Nuclear Magnetic Resonance)
• X-ray crystallography
• Protein array
• Mass Spectrometry
![Page 5: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/5.jpg)
5
Mass Spectrometry-based Proteomics
• Primary sequence (sequencing, identification)
• Post-translational modification (PTM) (characterization)
• Quantitative proteomics (quantification)
• Protein-protein interaction
![Page 6: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/6.jpg)
6
![Page 7: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/7.jpg)
7
Components of Mass Spectrometer
• Ion source (ESI and MALDI)
• Mass analyzer (ion traps, TOF, Quadrupole, FT, etc.)
– Mass-to-charge ratio (m/z)
• Ion detector
![Page 8: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/8.jpg)
8
Peptide and Intact Protein
• Peptide: a fragment of protein
• Some enzymes, e.g. trypsin, break protein into peptides.
• Some technology put intact protein into the mass spectrometer
![Page 9: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/9.jpg)
9
Peptide Fragmentation
• Peptides tend to fragment along the backbone.
• Fragments can also loose neutral chemical groups like NH3 and H2O.
H...-HN-CH-CO . . . NH-CH-CO-NH-CH-CO-…OH
Ri-1 Ri Ri+1
H+
N-Terminus C-Terminus
Collision Induced Dissociation
![Page 10: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/10.jpg)
10
Ideal Mass Spectrum
![Page 11: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/11.jpg)
11
Real Mass Spectrum
![Page 12: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/12.jpg)
12
N- and C-terminal Peptides
N-term
inal
pep
tides
C-te
rmin
al p
eptid
es
![Page 13: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/13.jpg)
13
Terminal peptides and ion types
Peptide
Mass (D) 57 + 97 + 147 + 114 = 415
Peptide
Mass (D) 57 + 97 + 147 + 114 – 18 = 397
without
![Page 14: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/14.jpg)
14
N- and C-terminal Peptides
N-term
inal
pep
tides
C-te
rmin
al p
eptid
es
415
486
301
154
57
71
185
332
429
![Page 15: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/15.jpg)
15
N- and C-terminal Peptides
N-term
inal
pep
tides
C-te
rmin
al p
eptid
es
415
486
301
154
57
71
185
332
429
![Page 16: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/16.jpg)
16
N- and C-terminal Peptides
415
486
301
154
57
71
185
332
429
![Page 17: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/17.jpg)
17
N- and C-terminal Peptides
415
486
301
154
57
71
185
332
429
Problem:
Reconstruct peptide from the set of masses of fragment
![Page 18: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/18.jpg)
18
Mass Spectra
G V D L K
mass0
57 Da = ‘G’ 99 Da = ‘V’LK D V G
• The peaks in the mass spectrum:
– Prefix
– Fragments with neutral losses (-H2O, -NH3)
– Noise and missing peaks.
and Suffix Fragments.
D
H2O
![Page 19: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/19.jpg)
19
Protein Identification with MS/MS
G V D L K
mass0
Inte
nsity
mass0
MS/MSPeptide Identification:
![Page 20: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/20.jpg)
20
Protein Identification by Tandem Mass Spectrometry
SSeeqquueennccee
S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6T: + c d Full ms2 638.00 [ 165.00 - 1925.00]
200 400 600 800 1000 1200 1400 1600 1800 2000
m/z
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
Rel
ativ
e Ab
unda
nce
850.3
687.3
588.1
851.4425.0
949.4
326.0524.9
589.2
1048.6397.1226.9
1049.6489.1
629.0
MS/MS instrumentMS/MS instrument
De Novo interpretation•SherengaDatabase search•Sequest
![Page 21: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/21.jpg)
21
De Novo vs. Database Search
S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6T: + c d Full ms2 638.00 [ 165.00 - 1925.00]
200 400 600 800 1000 1200 1400 1600 1800 2000m/z
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
Re
lative
Ab
un
da
nce
850.3
687.3
588.1
851.4425.0
949.4
326.0524.9
589.2
1048.6397.1226.9
1049.6489.1
629.0
WR
A
C
VG
E
K
DW
LP
T
L T
WR
A
C
VG
E
K
DW
LP
T
L T
De Novo
AVGELTK
Database Search
Database ofknown peptides
MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT,
HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE,
ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC,
GVFGSVLRA, EKLNKAATYIN..
Database ofknown peptides
MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT,
HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE,
ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC,
GVFGSVLRA, EKLNKAATYIN..
Mass, Score
![Page 22: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/22.jpg)
22
Current Status
• It is still a open problem of protein sequencing no matter whether using de novo sequencing or database search methods
• Following algorithms only deal with simplified (or ideal) spectrums
• Some algorithms combine de novo sequencing and database search
![Page 23: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/23.jpg)
23
Pros and Cons of de novo Sequencing
• Advantage:– Gets the sequences that are not necessarily in the database.
– An additional similarity search step using these sequences may identify the related proteins in the database.
• Disadvantage:– Requires higher quality data.
– Often contains errors.
![Page 24: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/24.jpg)
24
Outline
• Motivation of proteomics
• Mass spectrometry-based proteomics
• Instrumentation of mass spectrometry
• De novo sequencing
• Database search
• Algorithms of real software (e.g., sequence tags)
![Page 25: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/25.jpg)
25
De novo Peptide Sequencing
S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6T: + c d Full ms2 638.00 [ 165.00 - 1925.00]
200 400 600 800 1000 1200 1400 1600 1800 2000
m/z
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
Rel
ativ
e A
bund
ance
850.3
687.3
588.1
851.4425.0
949.4
326.0524.9
589.2
1048.6397.1226.9
1049.6489.1
629.0
SequenceSequence
![Page 26: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/26.jpg)
26
Peptide Sequencing Problem
Goal: Find a peptide with maximal match between an experimental and theoretical spectrum.
Input:
– S: experimental spectrum
– Δ: set of possible ion types
– m: parent mass
Output:
– P: peptide with mass m, whose theoretical spectrum matches the experimental S spectrum the best
![Page 27: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/27.jpg)
27
Procedure of De Novo Sequencing
• Build spectrum graph
– How to create vertices (from masses)
– How to create edges (from mass differences)
• Find best path or rank paths of spectrum graph
– How to find candidate paths
– How to score paths
![Page 28: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/28.jpg)
28
S E Q U E N C Eb
Mass/Charge (M/Z)Mass/Charge (M/Z)
From Sequence to Spectrum
![Page 29: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/29.jpg)
29
a
Mass/Charge (M/Z)Mass/Charge (M/Z)
S E Q U E N C E
From Sequence to Spectrum(cont.)
![Page 30: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/30.jpg)
30
S E Q U E N C E
Mass/Charge (M/Z)Mass/Charge (M/Z)
a is an ion type shift in b
From Sequence to Spectrum(cont.)
![Page 31: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/31.jpg)
31
y
Mass/Charge (M/Z)Mass/Charge (M/Z)
E C N E U Q E S
From Sequence to Spectrum (cont.)
![Page 32: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/32.jpg)
32
Mass/Charge (M/Z)Mass/Charge (M/Z)
Inte
nsit
yIn
tens
ity
From Sequence to Spectrum (cont.)
![Page 33: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/33.jpg)
33
Mass/Charge (M/Z)Mass/Charge (M/Z)
Inte
nsit
yIn
tens
ity
From Sequence to Spectrum (cont.)
![Page 34: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/34.jpg)
34
noise
Mass/Charge (M/Z)Mass/Charge (M/Z)
From Sequence to Spectrum (cont.)
![Page 35: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/35.jpg)
35
MS/MS Spectrum
Mass/Charge (M/z)Mass/Charge (M/z)
Inte
nsit
yIn
tens
ity
![Page 36: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/36.jpg)
36
Some Mass Differences between Peaks Correspond to Amino Acids
ss
ssss
ee
eeee
ee
ee
ee
ee
ee
qquu
uu
uu
nn
nn
nn
ee
cc
cc
cc
![Page 37: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/37.jpg)
37
Now decoding from spectrum to sequence…?
Build spectrum graph
![Page 38: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/38.jpg)
38
Peptide Fragmentation
HO NH3+
| |
R1 O R2 O R3 O R4
| || | || | || |H -- N --- C --- C --- N --- C --- C --- N --- C --- C --- N --- C -- COOH | | | | | | | H H H H H H H
y3
b2
y2 y1
b3a2 a3
b2-H2O
y3 -H2O
b3- NH3
y2 - NH3
• Different ion types (b, y, b-NH3, b-H2O)• Fragment at one site (internal ions)
![Page 39: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/39.jpg)
39
Example of Ion Type
• Δ={δ1, δ2,…, δk}
• Ion types
{b, b-NH3, b-H2O}
correspond to
Δ={0, 17, 18} *Note: In reality the δ value of ion type b is -1 but we will “hide” it for the sake of simplicity
![Page 40: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/40.jpg)
40
Why Peptide Sequencing hard
• Two ladders of overlapping masses, could not tell whether it is b ion or y ion
• Incomplete fragmentation
• Chemical noise
• Mass accuracy of the instrument is not good enough (Q=K, G+V=156.090, R=156.101)
• Q: Is sequencing shorter or longer peptide harder?
![Page 41: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/41.jpg)
41
Vertices of Spectrum Graph
• Vertices are generated by reverse shifts corresponding to
ion types Δ={δ1, δ2,…, δk}
• Every mass s in an MS/MS spectrum generates k vertices
V(s) = {s+δ1, s+δ2, …, s+δk}
corresponding to potential N-terminal peptides
• Vertices of the spectrum graph:
{initial vertex}V(s1) V(s2) ... V(sm) {terminal vertex}
![Page 42: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/42.jpg)
42
Reverse Shifts
Shift in H2O+NH3
Shift in H2O
![Page 43: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/43.jpg)
43
Edges of Spectrum Graph
• Two vertices with mass difference corresponding to
an amino acid A:
– Connect with an edge labeled by A (Directed Graph)
• Gap edges for di- and tri-peptides
– Potential sequence tag method (covered later)
![Page 44: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/44.jpg)
44
Best Path of Spectrum Graph
• How to find candidate paths
• There are many paths, how to find the correct one?
• We need scoring to evaluate paths
![Page 45: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/45.jpg)
45
Find Candidate Paths
• Heuristics: find a path with maximum number
of edges
• Longest path problem in DAG
• DFS (Depth First Search)
![Page 46: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/46.jpg)
46
Path Score
• p(P,S) = probability that peptide P produces spectrum S= {s1,s2,…sq}
• Scoring = computing probabilities
![Page 47: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/47.jpg)
47
Finding Optimal Paths in the Spectrum Graph
• For a given MS/MS spectrum S, find a peptide P’ maximizing p(P,S) over all possible peptides P:
• Peptides = paths in the spectrum graph
• P’ = the optimal path in the spectrum graph
• Some software rank paths
p(P,S)p(P',S) Pmax
![Page 48: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/48.jpg)
48
Ratio Test Scoring for Partial Peptides
• Incorporates premiums for observed ions and penalties for missing ions.
• Example: for k=4, assume that for a partial peptide P’ we only see ions δ1,δ2,δ4.
The score is calculated as:
RRRR q
q
q
q
q
q
q
q 4321
)1(
)1(
![Page 49: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/49.jpg)
49
Why Not Sequence De Novo?
• De novo sequencing is still not very accurate!
• Less than 30% of the peptides sequenced were completely correct!
Algorithm Amino Acid Accuracy
Whole Peptide Accuracy
Lutefisk (Taylor and Johnson, 1997). 0.566 0.189
SHERENGA (Dancik et. al., 1999). 0.690 0.289
Peaks (Ma et al., 2003). 0.673 0.246
PepNovo (Frank and Pevzner, 2005). 0.727 0.296
![Page 50: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/50.jpg)
50
De Novo vs. Database Search
S#: 1708 RT: 54.47 AV: 1 NL: 5.27E6T: + c d Full ms2 638.00 [ 165.00 - 1925.00]
200 400 600 800 1000 1200 1400 1600 1800 2000m/z
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
Re
lative
Ab
un
da
nce
850.3
687.3
588.1
851.4425.0
949.4
326.0524.9
589.2
1048.6397.1226.9
1049.6489.1
629.0
WR
A
C
VG
E
K
DW
LP
T
L T
WR
A
C
VG
E
K
DW
LP
T
L T
De Novo
AVGELTK
Database Search
Database ofknown peptides
MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT,
HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE,
ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC,
GVFGSVLRA, EKLNKAATYIN..
Database ofknown peptides
MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT,
HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE,
ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC,
GVFGSVLRA, EKLNKAATYIN..
![Page 51: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/51.jpg)
51
Outline
• Motivation of proteomics
• Mass spectrometry-based proteomics
• Instrumentation: Mass Spectrometry
• De novo sequencing algorithm
• Database search
• Algorithms of real software (e.g., sequence tags)
![Page 52: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/52.jpg)
52
Peptide Identification Problem
Goal: Find a peptide from the database with maximal match between an experimental and theoretical spectrum.
Input:
– S: experimental spectrum
– database of peptides
– Δ: set of possible ion types
– m: parent mass
Output:
– A peptide of mass m from the database whose theoretical spectrum matches the experimental S spectrum the best
![Page 53: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/53.jpg)
53
Match between Spectra and the Shared Peak Count
• The match between two spectra is the number of masses
(peaks) they share (Shared Peak Count or SPC)
• In practice mass-spectrometrists use the weighted SPC
that reflects intensities of the peaks
• Match between experimental and theoretical spectra is
defined similarly
![Page 54: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/54.jpg)
54
MS/MS Database Search
Database search in mass-spectrometry has been successful in identification of already known proteins.
Experimental spectrum can be compared with theoretical spectra of database peptides to find the best fit.
SEQUEST (Yates et al., 1995)
But reliable algorithms for identification of peptides is a much more difficult problem.
Q: Why can a peptide be not identical to a sequence in the database
![Page 55: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/55.jpg)
55
Deficiency of the Shared Peaks Count
Shared peaks count (SPC): intuitive measure of spectral similarity.
Problem: SPC diminishes very quickly as the number of mutations increases.
Only a small portion of correlations between the spectra of mutated peptides is captured by SPC.
![Page 56: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/56.jpg)
56
SPC Diminishes Quickly
S(PRTEIN) = {98, 133, 246, 254, 355, 375, 476, 484, 597, 632}
S(PRTEYN) = {98, 133, 254, 296, 355, 425, 484, 526, 647, 682}
S(PGTEYN) = {98, 133, 155, 256, 296, 385, 425, 526, 548, 583}
no mutations
SPC=10
1 mutation
SPC=5
2 mutations
SPC=2
![Page 57: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/57.jpg)
57
Post-Translational ModificationsProteins are involved in cellular signaling and
metabolic regulation.
They are subject to a large number of biological modifications.
Almost all protein sequences are post-translationally modified and 200 types of modifications of amino acid residues are known.
![Page 58: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/58.jpg)
58
Examples of Post-Translational Modification
Post-translational modifications increase the number of “letters” in amino acid alphabet and lead to a combinatorial explosion in both database search and de novo approaches.
![Page 59: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/59.jpg)
59
Search for Modified Peptides: Virtual Database Approach
Yates et al.,1995: an exhaustive search in a virtual database of all modified peptides.
Exhaustive search leads to a large combinatorial problem, even for a small set of modifications types.
Problem (Yates et al.,1995). Extend the virtual database approach to a large set of modifications.
![Page 60: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/60.jpg)
60
Modified Peptide Identification Problem
Goal: Find a modified peptide from the database with maximal match between an experimental and theoretical spectrum.
Input:
– S: experimental spectrum
– database of peptides
– Δ: set of possible ion types
– m: parent mass
– Parameter k (# of mutations/modifications)
Output:
– A peptide of mass m that is at most k mutations/modifications apart from a database peptide and whose theoretical spectrum matches the experimental S spectrum the best
![Page 61: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/61.jpg)
61
Spectrum Alignment
• See 8.14 and 8.15 in the text book for one algorithm
• Complicated for real spectrums
![Page 62: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/62.jpg)
62
Outline
• Motivation of proteomics
• Mass spectrometry-based proteomics
• Instrumentation: Mass Spectrometry
• De novo sequencing algorithm
• Database search
• Algorithms of real software (e.g., sequence tags)
![Page 63: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/63.jpg)
63
Combining de novo and Database Search in Mass-Spectrometry
• So far de novo and database search were presented as two separate techniques
• Database search is rather slow: many labs generate more than 100,000 spectra per day. SEQUEST takes approximately 1 minute to compare a single spectrum against SWISS-PROT (54Mb) on a desktop.
• It will take SEQUEST more than 2 months to analyze the MS/MS data produced in a single day.
• Q: Can slow database search be combined with fast de novo analysis?
![Page 64: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/64.jpg)
64
What Can be Done with De Novo?
• Given an MS/MS spectrum:
– Can de novo predict the entire peptide sequence?
– Can de novo predict a set of partial sequences, that with high probability, contains at least one correct tag?
A Covering Set of Tags
- No! (accuracy is less than 30%).
- Yes!
![Page 65: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/65.jpg)
65
Peptide Sequence Tags
• A Peptide Sequence Tag is short substring of a peptide.
Example: G V D L KG V D V D L
D L K
Tags:
![Page 66: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/66.jpg)
66
Filtration with Peptide Sequence Tags
• Peptide sequence tags can be used as filters in database searches.
• The Filtration: Consider only database peptides that contain the tag (in its correct relative mass location).
• First suggested by Mann and Wilm (1994).
• Similar concepts also used by:– GutenTag - Tabb et. al. 2003.
– MultiTag - Sunayev et. al. 2003.
– OpenSea - Searle et. al. 2004.
![Page 67: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/67.jpg)
67
Why Filter Database Candidates?
• Effective filtration can greatly speed-up the process, enabling expensive searches involving post-translational modifications.
• Goal: generate a small set of covering tags and use them to filter the database peptides.
![Page 68: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/68.jpg)
68
Summary
• Protein sequencing
• Mass spectrum
• De novo search and database search
• Difficulty of protein sequencing
![Page 69: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/69.jpg)
69
The End
![Page 70: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/70.jpg)
70
Quality Measure of Mass Spectrometer
• Sensitivity
• Mass accuracy
• Resolution
• Dynamic range
![Page 71: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/71.jpg)
71
Exhaustive Search for Modified Peptides
• YFDSTDYNMAK
• 25=32 possibilities, with 2 types of modifications!
Phosphorylation?
Oxidation?
• For each peptide, generate all modifications.
• Score each modification.
![Page 72: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/72.jpg)
72
Peptide Identification Problem: Challenge
Very similar peptides may have very different spectra!
Goal: Define a notion of spectral similarity that correlates well with the sequence similarity.
If peptides are a few mutations/modifications apart, the spectral similarity between their spectra should be high.
![Page 73: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/73.jpg)
73
Why Filtration ?
Scoring
Protein Query
Sequence Alignment – Smith Waterman Algorithm
Sequence matches
Protein Sequences
Filtration
Filtered Sequences
Sequence Alignment – BLAST
Database
actgcgctagctacggatagctgatccagatcgatgccataggtagctgatccatgctagcttagacataaagcttgaatcgatcgggtaacccatagctagctcgatcgacttagacttcgattcgatcgaattcgatctgatctgaatatattaggtccgatgctagctgtggtagtgatgtaaga
• BLAST filters out very few correct matches and is almost as accurate as Smith – Waterman algorithm.
Database
actgcgctagctacggatagctgatccagatcgatgccataggtagctgatccatgctagcttagacataaagcttgaatcgatcgggtaacccatagctagctcgatcgacttagacttcgattcgatcgaattcgatctgatctgaatatattaggtccgatgctagctgtggtagtgatgtaaga
![Page 74: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/74.jpg)
74
Filtration and MS/MS
Scoring
MS/MS spectrum
Peptide Sequencing – SEQUEST / Mascot
Sequence matches
Peptide Sequences
Filtration
Peptide Sequences
Database
MDERHILNMKLQWVCSDLPTYWASDLENQIKRSACVMTLACHGGEMNGALPQWRTHLLERTYKMNVVGGPASSDALITGMQSDPILLVCATRGHEWAILFGHNLWACVNMLETAIKLEGVF
GSVLRAEKLNKAAPETYIN..
Database
MDERHILNMKLQWVCSDLPTYWASDLENQIKRSACVMTLACHGGEMNGALPQWRTHLLERTYKMNVVGGPASSDALITGMQSDPILLVCATRGHEWAILFGHNLWACVNMLETAIKLEGVF
GSVLRAEKLNKAAPETYIN..
![Page 75: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/75.jpg)
75
Filtration in MS/MS Sequencing
• Filtration in MS/MS is more difficult than in BLAST.
• Early approaches using Peptide Sequence Tags were not able to substitute the complete database search.
• Current filtration approaches are mostly used to generate additional identifications rather than replace the database search.
• Can we design a filtration based search that can replace the database search, and is orders of magnitude faster?
![Page 76: 1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)](https://reader035.vdocuments.us/reader035/viewer/2022062321/56649d9c5503460f94a84e92/html5/thumbnails/76.jpg)
76
Asking the Old Question Again: Why Not Sequence De Novo?
• De novo sequencing is still not very accurate!
Algorithm Amino Acid Accuracy
Whole Peptide Accuracy
Lutefisk (Taylor and Johnson, 1997). 0.566 0.189
SHERENGA (Dancik et. al., 1999). 0.690 0.289
Peaks (Ma et al., 2003). 0.673 0.246
PepNovo (Frank and Pevzner, 2005). 0.727 0.296