jimmy eng - toolstools.proteomecenter.org/course/lectures/0610-day1.eng.pdf · 2006-10-16 · •...

MS/MS Database Searching Jimmy Eng Day 1 October 16, 2006



1

MS/MS Database Searching

Jimmy Eng, Day 1

October 16, 2006


2

Day 1 Lecture Topics

• Basic background & motivation
• Peptide fragmentation, nomenclature
• Peptide vs. tandem mass spectra
• Sequence database searching
  – Databases
  – Enzymes
  – Modifications
• Interpretation of search results; manual validation
• Introduction to software tools


3

[Figure: denatured protein complex → peptides → 1D or 2D chromatographic separation of peptides (HPLC) → mass spec → DB search → identify proteins in complex]

Protein Identification Strategy


4

TPP

[Figure: Trans-Proteomic Pipeline (TPP) components: LC-MS/MS data, mzXML file format, Pep3D, SEQUEST/COMET, Mascot/ProbID/SpectraST, pepXML file format, xINTERACT, PeptideProphet, XPRESS/ASAPRatio, Libra, ProteinProphet, protXML file format, Qualscore, XLink, SBEAMS, PeptideAtlas, Cytoscape, Gaggle…]


5

[Figure: protein → digestion → peptides → ionization → mass analysis → MS spectrum (m/z)]

Single Stage MS


6

[Figure: protein → digestion → peptides → ionization → isolation → fragmentation → mass analysis; MS spectrum of peptides (m/z) and MS/MS spectrum of fragments (m/z)]

Tandem MS


7

[Figure: a spectrum plotted as intensity vs. m/z, and the same data stacked along time (scan #)]

2D view: m/z, intensity

3D view: m/z, intensity, time

Mass vs. Intensity vs. Time


8

Mass vs. Intensity vs. Time

[Figure: MS scans stacked along time (scan #); axes m/z and intensity]


9

Mass vs. Intensity vs. Time

[Figure: MS scans along time (scan #); axes m/z and intensity; a peak at m/z 1000.2 is labeled]


10

[Figure: full MS scan (scan #294, RT 9.89, Full ms [300.00-1600.00]); relative abundance (0–100) vs. m/z (400–1600), with labeled peaks from m/z 342.7 to 1541.3, including 496.4, 528.3, 661.6, 704.2, and 991.7]

MS/MS Data Acquisition

1. Acquire full (MS) scan
2. Select an ion
3. Isolate ion
4. Fragment ion → MS/MS scan


11

MS vs. MS/MS

[Figure: MS scans and an MS/MS scan along time (scan #); axes m/z and intensity]


12

2D view of an LC-MS experiment

You’ll learn all about Pep3D soon!


13

Amino Acids

Residue masses (Da):

Amino acid      3-letter  1-letter  Average   Monoisotopic
Glycine         Gly       G         57.0519   57.02146
Alanine         Ala       A         71.0788   71.03711
Serine          Ser       S         87.0782   87.03203
Proline         Pro       P         97.1167   97.05276
Valine          Val       V         99.1326   99.06841
Threonine       Thr       T         101.1051  101.04768
Cysteine        Cys       C         103.1388  103.00919
Leucine         Leu       L         113.1594  113.08406
Isoleucine      Ile       I         113.1594  113.08406
Asparagine      Asn       N         114.1038  114.04293
Aspartic acid   Asp       D         115.0886  115.02694
Glutamine       Gln       Q         128.1307  128.05858
Lysine          Lys       K         128.1741  128.09496
Glutamic acid   Glu       E         129.1155  129.04259
Methionine      Met       M         131.1926  131.04049
Histidine       His       H         137.1411  137.05891
Phenylalanine   Phe       F         147.1766  147.06841
Arginine        Arg       R         156.1875  156.10111
Tyrosine        Tyr       Y         163.1760  163.06333
Tryptophan      Trp       W         186.2132  186.07931


14

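Average vs. Monoisotopic Mass

Monoisotopic mass – mass calculated from the most abundant isotope of each element (12C, 1H, 14N, 16O, 32S); the first peak of the isotopic envelope

For example: DIGSESTEDQAMEDIK
Mono MH+: 1767.7594 Da
Avg MH+: 1768.8438 Da

Average mass – centroid of isotopic envelope

Charge state = 1 / Δm, where Δm is the m/z spacing between adjacent isotope peaks

Difference in mass can be significant!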
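The two MH+ values for DIGSESTEDQAMEDIK can be reproduced from the residue masses in the amino acid table: sum the residues, then add one water (for the peptide termini) and one hydrogen. The minimal sketch below assumes standard water and hydrogen constants that are not given on the slide, and assumes a hydrogen atom (1.007825 Da) rather than a bare proton (1.007276 Da) was added, since that is what reproduces 1767.7594. `charge_state_from_spacing` is a hypothetical helper for the Charge state = 1 / Δm rule.

```python
# Residue masses (Da) from the amino acid table: (monoisotopic, average)
RESIDUE_MASSES = {
    "G": (57.02146, 57.0519), "A": (71.03711, 71.0788), "S": (87.03203, 87.0782),
    "P": (97.05276, 97.1167), "V": (99.06841, 99.1326), "T": (101.04768, 101.1051),
    "C": (103.00919, 103.1388), "L": (113.08406, 113.1594), "I": (113.08406, 113.1594),
    "N": (114.04293, 114.1038), "D": (115.02694, 115.0886), "Q": (128.05858, 128.1307),
    "K": (128.09496, 128.1741), "E": (129.04259, 129.1155), "M": (131.04049, 131.1926),
    "H": (137.05891, 137.1411), "F": (147.06841, 147.1766), "R": (156.10111, 156.1875),
    "Y": (163.06333, 163.1760), "W": (186.07931, 186.2132),
}

# Standard constants (assumption: not listed on the slide), (monoisotopic, average)
WATER = (18.010565, 18.01528)
HYDROGEN = (1.007825, 1.00794)  # H atom; a bare proton would be 1.007276


def mh_plus(peptide: str, monoisotopic: bool = True) -> float:
    """Singly protonated peptide mass: sum of residues + H2O + H."""
    idx = 0 if monoisotopic else 1
    residues = sum(RESIDUE_MASSES[aa][idx] for aa in peptide)
    return residues + WATER[idx] + HYDROGEN[idx]


def charge_state_from_spacing(delta_mz: float) -> int:
    """Charge state = 1 / (m/z spacing between adjacent isotope peaks), rounded."""
    return round(1.0 / delta_mz)


if __name__ == "__main__":
    pep = "DIGSESTEDQAMEDIK"
    print(f"Mono MH+: {mh_plus(pep):.4f} Da")
    print(f"Avg MH+:  {mh_plus(pep, monoisotopic=False):.4f} Da")
    print(charge_state_from_spacing(0.5))  # isotope peaks 0.5 apart -> 2
```

The roughly 1.1 Da gap between the two printed values is the "significant" difference the slide warns about: a search run with the wrong mass type can miss the correct peptide entirely.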


15

Fragment Ions

[Figure: peptide backbone H2N-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-CO-NH-CH(R4)-COOH (+H+), with cleavage sites labeled a1–a3 / x3–x1, b1–b3 / y3–y1, and c1–c3 / z3–z1]

http://www.matrixscience.com/help/fragmentation_help.html


16

d-, v-, and w-ions are created by side-chain cleavage. These ions are typically generated under high-energy collision-induced dissociation (CID) conditions. Of note, d- and w-ions allow the isobaric residues leucine and isoleucine to be differentiated.

[Figure: structures of the d2, w2, and v2 ions (H+)]

http://www.matrixscience.com/help/fragmentation_help.html

Fragment Ion Types


17

Immonium Ions

An internal fragment with just a single side chain, formed by a combination of a-type and y-type cleavage, is called an immonium ion. These ions can be diagnostic for the presence of the corresponding amino acid in the peptide sequence.

http://www.abrf.org/ResearchGroups/MassSpectrometry/EPosters/ms97quiz/residueMasses.html

Amino acid              Residue mass (Da)   Immonium ion mass (Da)   +/-
Glycine                 57.02147            30.03438                 -
Alanine                 71.03712            44.05003                 -
Serine                  87.03203            60.04494                 +
Proline                 97.05277            70.06568                 ++
Valine                  99.06842            72.08133                 ++
Threonine               101.04768           74.06059                 +
Cysteine                103.00919           76.0221                  -
  carbamidomethylated   160.03065           133.0436                 +
  carboxymethylated     161.01466           134.0276                 +
  acrylamide adduct     174.0463            147.0592                 +
Isoleucine              113.08407           86.09698                 ++
Leucine                 113.08407           86.09698                 ++
Asparagine              114.04293           87.05584                 +
Aspartic acid           115.02695           88.03986                 +
Glutamine               128.05858           101.0715                 +
Lysine                  128.09497           101.1079 (84.08136)
Glutamic acid           129.0426            102.0555                 +
Methionine              131.04049           104.0534                 +
  oxidized methionine   147.0354            120.0483                 +
Histidine               137.05891           110.0718                 ++
Phenylalanine           147.06842           120.0813                 ++
Arginine                156.10112           129.114                  -
Tyrosine                163.06333           136.0762                 ++
Tryptophan              186.07931           159.0922                 +

http://www.matrixscience.com/help/fragmentation_help.html
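The immonium ion masses in the table follow a simple rule: residue mass minus CO plus H. The sketch below assumes standard CO (27.994915 Da) and hydrogen-atom (1.007825 Da) constants, which reproduce the tabulated values to about 0.0001 Da. The names are illustrative. With these constants, P, L and F land near 70, 86 and 120, which are the diagnostic peaks labeled in the MALDI-TOF-TOF example.

```python
CO = 27.994915       # carbon monoxide, monoisotopic (assumed constant)
HYDROGEN = 1.007825  # hydrogen atom, monoisotopic (assumed constant)

# A few monoisotopic residue masses (Da) copied from the immonium ion table
RESIDUES = {"G": 57.02147, "P": 97.05277, "L": 113.08407,
            "F": 147.06842, "W": 186.07931}


def immonium_mass(residue_mass: float) -> float:
    """Singly charged immonium ion m/z = residue mass - CO + H."""
    return residue_mass - CO + HYDROGEN


if __name__ == "__main__":
    for aa, mass in RESIDUES.items():
        print(aa, round(immonium_mass(mass), 4))
```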


18

[Figure: MALDI-TOF-TOF tandem mass spectrum of APNDFNLK (rabbit glycogen phosphorylase), with immonium ions labeled 70 → P, 86 → I/L, 120 → F]

Immonium Ions


19

D L Y S K (+)

N-terminal fragments: D, D L, D L Y, D L Y S
C-terminal fragments: L Y S K, Y S K, S K, K

Peptide Fragmentation


20

A-P-N-D-F-N-L-K (MH+ 918.5)

b-ions   Cleavage                y-ions
72.0     A | P-N-D-F-N-L-K       847.4
169.1    A-P | N-D-F-N-L-K       750.4
283.1    A-P-N | D-F-N-L-K       636.3
398.2    A-P-N-D | F-N-L-K       521.3
545.2    A-P-N-D-F | N-L-K       374.2
659.3    A-P-N-D-F-N | L-K       260.2
772.4    A-P-N-D-F-N-L | K       147.1

(monoisotopic masses)

Fragmenting a Peptide
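The b/y table for APNDFNLK can be regenerated from the monoisotopic residue masses: each b-ion is a running sum of N-terminal residues plus a proton, and each y-ion is a running sum of C-terminal residues plus water and a proton. This is a minimal sketch assuming singly charged ions and standard proton (1.007276 Da) and water (18.010565 Da) masses, which are not given on the slide; `b_y_ions` is an illustrative name.

```python
# Monoisotopic residue masses (Da) from the amino acid table, for the residues used here
MONO = {"A": 71.03711, "P": 97.05276, "N": 114.04293, "D": 115.02694,
        "F": 147.06841, "L": 113.08406, "K": 128.09496}
PROTON = 1.007276   # assumed constant (not on the slide)
WATER = 18.010565   # assumed constant (not on the slide)


def b_y_ions(peptide):
    """Return (b, y) lists of singly charged ion m/z values.

    Entry i of both lists comes from the same backbone cleavage,
    i.e. b_(i+1) is paired with y_(n-i-1), as in the table rows.
    """
    b, y = [], []
    running = PROTON
    for aa in peptide[:-1]:            # b1 .. b(n-1)
        running += MONO[aa]
        b.append(running)
    running = WATER + PROTON
    for aa in reversed(peptide[1:]):   # y1 .. y(n-1)
        running += MONO[aa]
        y.append(running)
    return b, y[::-1]                  # reverse y so it lines up with b


if __name__ == "__main__":
    pep = "APNDFNLK"
    b, y = b_y_ions(pep)
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"{bi:7.1f}  {pep[:i]} | {pep[i:]}  {yi:7.1f}")
```

Each row's b- and y-ion sum to MH+ plus one proton, which is a quick sanity check when reading a spectrum by hand.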


21

A-P-N-D-F-N-L-K (MH+ 918.5)

Sequence vs. Tandem Mass Spectrum


22

A P N D F N L K

b-ions

Sequence vs. Tandem Mass Spectrum


23

A P N D F N L K

y-ions

Sequence vs. Tandem Mass Spectrum


24

Sequence vs. Tandem Mass Spectrum

A P N D F N L K

APNDFNLK


25

[Figure: raw, uninterpreted MS/MS spectra searched against a sequence database]

>SEQ1
CVVEELCPTPEGKDIGESVDLLKLQWCWENGTLRSLDCDVVS
>SEQ2
DLRSWTVRIDALNHGVKPHPPNVSVVDLTNR
>

Uninterpreted MS/MS Database Search


26

Input:
• Fragmentation spectrum
• Precursor mass, charge state

1. From the sequence database, select peptides whose mass matches the input precursor mass (within a mass tolerance)
2. Theoretically fragment the peptides
3. Compare the theoretical fragments to the acquired spectrum
4. Generate a score
5. Rank by score and display the best matches

Uninterpreted MS/MS Database Search
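The five search steps can be illustrated with a toy search. Everything here is an assumption for illustration only: the score is a plain shared-peak count (not the scoring used by SEQUEST, Mascot, ProbID, or X! Tandem), only singly charged b- and y-ions are generated, the candidate list stands in for a digested database, and the tolerances are arbitrary defaults.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da) from the amino acid table
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER, PROTON = 18.010565, 1.007276  # assumed standard constants


def mh_plus(peptide: str) -> float:
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def fragment_ions(peptide: str) -> List[float]:
    """Step 2: theoretical singly charged b- and y-ions."""
    ions, b, y = [], PROTON, WATER + PROTON
    for aa in peptide[:-1]:
        b += MONO[aa]
        ions.append(b)
    for aa in reversed(peptide[1:]):
        y += MONO[aa]
        ions.append(y)
    return ions


def shared_peaks(theoretical: List[float], observed: List[float], tol: float) -> int:
    """Steps 3-4: count theoretical ions within tol of any observed peak."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


def search(observed: List[float], precursor_mh: float, peptides: List[str],
           precursor_tol: float = 1.0, fragment_tol: float = 0.5) -> List[Tuple[str, int]]:
    hits = []
    for pep in peptides:
        # Step 1: keep only candidates whose MH+ matches the precursor
        if abs(mh_plus(pep) - precursor_mh) > precursor_tol:
            continue
        hits.append((pep, shared_peaks(fragment_ions(pep), observed, fragment_tol)))
    # Step 5: rank by score, best first
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    spectrum = fragment_ions("APNDFNLK")  # idealized, noise-free "acquired" spectrum
    print(search(spectrum, 918.47, ["APNDFNLK", "NPADFNLK", "GGGGK"]))
```

In this example NPADFNLK has exactly the same precursor mass as APNDFNLK, so step 1 cannot separate them. Only the fragment comparison ranks the correct sequence first, which is the reason an uninterpreted search has to score fragment ions at all.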


27

[Figure: a raw MS/MS spectrum compared against peptides of the same nominal mass from the sequence database; similarity scores 1.00, 0.34, 0.29]

>SEQ1
CVVRELCPTPEGKDIGESVDLLKLQWCWENGTLRSLDCDVVSRDIGSESTEDRAMEDIK
>SEQ2
DLRSWTVRIDALNHGVKPHPPNVSVVDLTNRGDVEKGKKIFVQKCAQCHTVEKGGKHKT

Uninterpreted MS/MS Database Search


28

MASCOT


29

MASCOT


30

MASCOT


31

MASCOT


32

Mascot Score?

From a presentation on the Matrix Science web site:
• Each ion series is matched and scored independently
• If an ion series contains only a random number of matches, or fewer, it is discarded
• All combinations of the ion series with non-random levels of matching are tested to see which combination gives the highest score
• Having “too many” ion series doesn’t affect the score; it just reduces specificity


33

Interpreting Mascot results

• Ions Score = -10 x Log(P)

– Calculation of P is ‘black box’

– Extension of the MOWSE score


34

Interpreting Mascot results

• Identity threshold = -10 x Log(E/N)
  – E is the significance threshold
  – N is the number of peptides in the database matching the precursor mass

• Example
  – If you can accept a 1 in 20 chance of a false positive, select an E of 0.05
  – If there are 4000 peptides that match the precursor ion mass:
    S = -10 x Log(0.05/4000) = 49

Matrix Science http://www.matrixscience.com/pdf/2005WKSHP4.pdf
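Both Mascot formulas are easy to check by hand. The sketch below assumes "Log" means log base 10, which is what reproduces the threshold of 49 in the example. How Mascot computes P itself is the "black box" noted above, so `ions_score` only converts a given P into a score; both function names are illustrative.

```python
import math


def ions_score(p: float) -> float:
    """Ions Score = -10 x log10(P), for a probability P supplied by the caller."""
    return -10.0 * math.log10(p)


def identity_threshold(e: float, n: int) -> float:
    """Identity threshold = -10 x log10(E / N).

    e: significance threshold (e.g. 0.05 for a 1 in 20 chance of a false positive)
    n: number of database peptides matching the precursor mass
    """
    return -10.0 * math.log10(e / n)


if __name__ == "__main__":
    print(round(identity_threshold(0.05, 4000)))  # 49, as in the example
```

Because N sits in the denominator, the threshold rises as more database peptides share the precursor mass. A larger database or a wider mass tolerance therefore demands a higher score for the same confidence.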


35

Interpreting Mascot results

• Homology threshold – “The homology threshold is an empirical measure of whether the match is an outlier”

Matrix Science http://www.matrixscience.com/pdf/2005WKSHP4.pdf


36

Interpreting Mascot results

• Expectation value
  – The number of times you could expect to get this score or better by chance
• E = Pthresh x (10 ^ ((Sthresh - score) / 10))
• If Pthresh = 0.05 and Sthresh = 50:
  – score = 40 corresponds to E = 0.5
  – score = 50 corresponds to E = 0.05
  – score = 60 corresponds to E = 0.005

Matrix Science http://www.matrixscience.com/pdf/2005WKSHP4.pdf
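The expectation-value formula translates directly into a function. The defaults below are the Pthresh = 0.05 and Sthresh = 50 values from the example, and the function name is hypothetical.

```python
def expectation_value(score: float, p_thresh: float = 0.05, s_thresh: float = 50.0) -> float:
    """E = Pthresh x 10 ^ ((Sthresh - score) / 10)."""
    return p_thresh * 10 ** ((s_thresh - score) / 10)


if __name__ == "__main__":
    for s in (40, 50, 60):
        print(s, expectation_value(s))
```

Each 10-point increase in score lowers E by a factor of 10, which matches the three rows of the example.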


37

• Protein, nucleic acid, and EST sequence databases

• Optionally include enzyme specificity in the search

• Post-translational modifications can be identified

• Search software

MS/MS Database Search Parameters


38

Raw genomic

Transcript or EST

Protein sequence

Sequence Databases


39

• Protein, nucleic acid, and short EST sequence databases can all be searched

• Optionally include enzyme specificity in the search

• Post-translational modifications can be identified

• Search software

MS/MS Database Search Parameters


40

DB: enzyme constraint


41

GDVEKGTKIFVQKCAQCHTVEKGGKHKTGPNLHGLFGSK

TGQAPGFSYTDANKNKGITWGEETLMEYLENPKSYIPGT

GDVEKGKKIFVQKCAQCHTVEKGGKHKTGPNLHGLFGRK

TGQAPGFSYTDANKNKGITWGEETLMEYLENPKKYIPGT

tryptic peptides:

enzyme-unconstrained peptides:

DB: enzyme constraint


42

human IPI database, 47,754

mass      # tryptic peptides   # unconstr. peptides   factor
1000 Da   1,430                321,999                225x
2000 Da   466                  325,096                697x
3000 Da   249                  317,750                1276x

DB: tryptic peptides vs. unconstrained search
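The gap between the tryptic and unconstrained columns comes from how many subsequences each rule allows. The sketch below digests one of the sequences from the enzyme-constraint slide. It assumes the common "cleave after K or R, but not before P" trypsin rule and no missed cleavages, and it uses `re.split` with a zero-width pattern (Python 3.7+). It counts peptides only; it does not bin them by mass the way the IPI table does.

```python
import re
from typing import List


def tryptic_peptides(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def unconstrained_peptides(protein: str) -> List[str]:
    """Every contiguous subsequence, i.e. no enzyme constraint at all."""
    n = len(protein)
    return [protein[i:j] for i in range(n) for j in range(i + 1, n + 1)]


if __name__ == "__main__":
    seq = "GDVEKGTKIFVQKCAQCHTVEKGGKHKTGPNLHGLFGSK"  # sequence from the slide
    print(tryptic_peptides(seq))
    print(len(tryptic_peptides(seq)), len(unconstrained_peptides(seq)))
```

A protein of length n has n(n+1)/2 contiguous subsequences but only about one tryptic peptide per K/R. This is why dropping the enzyme constraint multiplies the number of candidates per precursor mass by hundreds in the table.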


43

• Protein, nucleic acid, and short EST sequence databases can all be searched

• Optionally include enzyme specificity in the search

• Post-translational modifications can be identified

• Search software

MS/MS Database Search Parameters


44

• Static Modification
  – All occurrences of an amino acid are modified

• Variable/Differential Modification
  – One or more occurrences of an amino acid may be modified

• Modifications can typically be specified on any residue(s) or termini.

Post-Translational Modifications


45

Serine phosphorylation:

1. DIGSESTEDQAMEDYK
2. DIGSESTEDQAMEDYK
3. DIGSESTEDQAMEDYK
4. DIGSESTEDQAMEDYK

[Figure: the four forms carry a phosphate (P) on neither serine, on one serine or the other, or on both]

How many peptide forms are possible if you consider serine and threonine phosphorylation for the above peptide? Serine + threonine + tyrosine?

Variable Modifications
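Each candidate site can independently be modified or not, so a peptide with n candidate sites has 2^n forms to test. The sketch below enumerates those forms with `itertools.product`. It assumes there is no limit on modifications per peptide (real search engines often impose one), and it marks modified residues in lowercase purely for illustration. `modified_forms` is a hypothetical helper.

```python
from itertools import product
from typing import List


def modified_forms(peptide: str, sites: str) -> List[str]:
    """Every on/off combination of a variable modification on the given residues.

    Modified residues are written in lowercase (illustrative notation only).
    """
    positions = [i for i, aa in enumerate(peptide) if aa in sites]
    forms = []
    for states in product((False, True), repeat=len(positions)):
        chars = list(peptide)
        for pos, modified in zip(positions, states):
            if modified:
                chars[pos] = chars[pos].lower()
        forms.append("".join(chars))
    return forms


if __name__ == "__main__":
    for form in modified_forms("DIGSESTEDQAMEDYK", "S"):
        print(form)
```

Calling it with `"ST"` or `"STY"` instead of `"S"` answers the question on the slide, and shows how quickly the candidate count grows as residues are added to the variable-modification list.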


46

human IPI database, 47,754

mass      # tryptic peptides   # tryptic phos STY   factor   # unconstr. peptides   # unconstr. phos STY
1000 Da   1,430                5,093                3.5x     321,999                1,167,740
2000 Da   466                  7,283                15.6x    325,096                4,538,383
3000 Da   249                  16,761               67.3x    317,750                15,641,722

Variable Modification Search


47

• Protein, nucleic acid, and short EST sequence databases can all be searched

• Optionally include enzyme specificity in the search

• Post-translational modifications can be identified

• Search software

Uninterpreted MS/MS Database Search


48

• Phenyx

• SpectrumMill

• ProteinPilot

• SEQUEST

• X! Tandem

• OMSSA

• ProbID

What about other programs?


49

ProbID


50

ProbID


51

ProbID


52

ProbID

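Immonium ions: H, M, W, Y, F

$$\mathrm{pr}(I(S) \mid k, B) = 1 - \frac{i}{5}$$

where i = # of immonium peaks in the spectrum whose corresponding amino acid is absent from the peptide

Unmatched ions:

$$\mathrm{pr}(N(S) \mid k, B) = \left(\frac{1}{\mathrm{mass}_{\max} - \mathrm{mass}_{\min}}\right)^{r}$$

where r = # of unmatched ions, and mass_max and mass_min are the highest and lowest peaks in the spectrum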
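The immonium and unmatched-ion terms can be transcribed into code. This is a minimal sketch. It assumes the caller has already detected which immonium peaks are present (passed as one-letter codes) and counted the unmatched ions. The slide does not show how ProbID combines these factors with its other terms, so the sketch stops at the individual factors; the function names are hypothetical.

```python
from typing import Iterable

IMMONIUM_RESIDUES = "HMWYF"  # residues whose immonium ions are considered (per the slide)


def immonium_term(peptide: str, observed_immonium: Iterable[str]) -> float:
    """pr(I(S)|k,B) = 1 - i/5.

    i counts observed immonium peaks whose amino acid is absent from the peptide.
    """
    i = sum(1 for aa in set(observed_immonium)
            if aa in IMMONIUM_RESIDUES and aa not in peptide)
    return 1 - i / 5


def unmatched_term(r: int, mass_max: float, mass_min: float) -> float:
    """pr(N(S)|k,B) = (1 / (mass_max - mass_min)) ** r, r = # of unmatched ions."""
    return (1.0 / (mass_max - mass_min)) ** r


if __name__ == "__main__":
    # An F immonium peak is explained by APNDFNLK; an H immonium peak is not
    print(immonium_term("APNDFNLK", {"F", "H"}))
    print(unmatched_term(2, 1100.0, 100.0))
```

The unmatched-ion term treats each unexplained peak as if it landed uniformly at random across the spectrum's mass range. Every extra unmatched peak therefore multiplies the probability by a small factor.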


53

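ProbID

Match pattern:

$$\mathrm{pr}(\mathrm{pat}(S) \mid k, B) = \frac{\#\ \text{of matched ion pairs}}{3(n-1)}$$

where n = # of AA in peptide

Matched ions:

$$\mathrm{pr}(M(S) \mid k, B) = \prod_{i} a_i \, e^{-\frac{(m_i - m)^2}{2\sigma^2}}$$

where a_i = amplitude of each peak, m_i = mass of each peak, m = predicted mass of the matched fragment ion, σ = mass accuracy std dev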
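The match-pattern and matched-ion terms can be transcribed the same way. The sketch assumes each matched peak is supplied as (a_i, m_i, m), with m taken to be the predicted fragment mass and the amplitudes already normalized. Both are assumptions, because the slide does not define m or the amplitude scale. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def pattern_term(matched_pairs: int, n_residues: int) -> float:
    """pr(pat(S)|k,B) = (# of matched ion pairs) / 3(n - 1)."""
    return matched_pairs / (3 * (n_residues - 1))


def matched_term(peaks: Iterable[Tuple[float, float, float]], sigma: float) -> float:
    """pr(M(S)|k,B) = prod_i a_i * exp(-(m_i - m)^2 / (2 sigma^2)).

    Each peak is (a_i, m_i, m): amplitude, observed mass, and predicted mass
    (the meaning of m is an assumption).
    """
    prob = 1.0
    for a, m_obs, m_pred in peaks:
        prob *= a * math.exp(-((m_obs - m_pred) ** 2) / (2 * sigma ** 2))
    return prob


if __name__ == "__main__":
    print(pattern_term(14, 8))
    print(matched_term([(0.5, 100.0, 100.0), (1.0, 101.0, 100.0)], sigma=1.0))
```

The Gaussian factor rewards matches that fall close to the predicted mass relative to the instrument's mass accuracy σ. Intense peaks (large a_i) contribute more than weak ones.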


54

ProbID output


55

X! Tandem

• Open source search engine
• Very fast
• Lots of user-definable search options
• Built-in “refinement” mode


56

X! Tandem refinement mode:

[Figure: 1st pass search (tryptic, Ox M) against the full database → identified proteins → subset DB → 2nd pass search with multiple parameters, for spectra not identified in the 1st pass]
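The two-pass refinement flow can be expressed as a small orchestration function. This is a hypothetical sketch of the data flow only: X! Tandem's own implementation and parameter names are not shown here. The two search passes are passed in as callables, and the search signature (spectrum IDs plus a {protein_id: sequence} database in, {spectrum_id: protein_id} out) is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical search signature: (spectrum IDs, {protein_id: sequence}) -> {spectrum_id: protein_id}
Search = Callable[[List[str], Dict[str, str]], Dict[str, str]]


def refinement_search(spectra: List[str], full_db: Dict[str, str],
                      first_pass: Search, second_pass: Search) -> Dict[str, str]:
    # 1st pass: restrictive search (e.g. tryptic, oxidized Met) against the full database
    ids = dict(first_pass(spectra, full_db))
    # Proteins identified in the 1st pass define a much smaller subset database
    identified = set(ids.values())
    subset_db = {pid: seq for pid, seq in full_db.items() if pid in identified}
    # 2nd pass: only spectra not identified in the 1st pass, searched against
    # the subset with the broader ("multiple") parameters
    remaining = [s for s in spectra if s not in ids]
    ids.update(second_pass(remaining, subset_db))
    return ids
```

The design choice is the point of the mode: the expensive, permissive second search runs only against the proteins already found, so its search space stays small even with many modifications or relaxed enzyme rules.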


57

Interpretation Rules

K.LLGNQATFSPIVTVEPR.R

K.SPSDVKPLPSPDTDVPLSSVE.I

D.PEDVFTENPDEKSIITY.V

An enzyme-unrestricted search can greatly assist interpretation.

Look for peptides that exhibit the expected cleavage at both the N- and C-terminus.

Don’t bother with peptides that exhibit no correct cleavage.


58

Match all fragment ions!

Correct identifications don’t exhibit random fragment ion matches. Look for a series of y-ions or b-ions.

Trypsin leaves a basic residue (K or R) at the C-terminus, which tends to produce strong y-ions, so ideally the big peaks match y-ions.

Interpretation Rules


59

If a big peak matches a y-ion from cleavage on the N-terminal side of a proline, that is a good indication of a correct identification.

The reverse is not true: a proline in a peptide that does not correspond to a big peak is not an indication of an incorrect identification.

Interpretation Rules


60

Random or reverse databases?

Original sequence:
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYRQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVA

Reverse peptide sequence:
MKSYASSFLFLLSIFTVWRFVGRRHADKHAVESRFKFNEEGLDKYQAFAILVLARVHDEFPCQQKAFETVENVLKDCNEASEDAVCTKDGFLTHLSKAVTCL

When searching a forward + reverse sequence database, the estimated fraction of incorrect matches is:

$$\frac{2 \times (\#\ \text{reverse matches passing cutoff})}{\#\ \text{total matches passing cutoff}}$$
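The reverse peptide sequence on this slide follows a simple recipe: split the protein into tryptic peptides, reverse each one, and keep its C-terminal K/R in place so the decoys have the same masses and tryptic termini as the originals. The sketch below assumes that recipe and the common "cleave after K/R, not before P" trypsin rule. It also implements the error-rate estimate above. The factor of 2 reflects the assumption that incorrect matches fall into forward and reverse sequences with equal probability, so the reverse hits account for only about half of them. Function names are hypothetical.

```python
import re
from typing import List


def tryptic_peptides(protein: str) -> List[str]:
    """Cleave after K/R unless followed by P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def reverse_peptides(protein: str) -> str:
    """Reverse each tryptic peptide, keeping a C-terminal K/R in place."""
    out = []
    for pep in tryptic_peptides(protein):
        if pep[-1] in "KR":
            out.append(pep[-2::-1] + pep[-1])  # reverse everything but the last residue
        else:
            out.append(pep[::-1])              # C-terminal peptide without K/R
    return "".join(out)


def estimated_error_rate(reverse_hits: int, total_hits: int) -> float:
    """2 * (# reverse matches passing cutoff) / (# total matches passing cutoff)."""
    return 2 * reverse_hits / total_hits


if __name__ == "__main__":
    original = ("MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYRQQCPFEDHVK"
                "LVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVA")
    print(reverse_peptides(original))
    print(estimated_error_rate(10, 200))  # 10 reverse hits out of 200 passing matches
```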