Post on 21-Jun-2020


Making the Most of Your Edman Sequencing Data: A Primer on Data Calling, Analysis, Interpretation, and Reporting

ESRG Tutorial, ABRF 2003, Feb 10-13, Denver, CO

Ben Madden, Mayo Clinic, Rochester, MN

Topics Covered

1. Aspects of calling amino acids
2. Factors that interfere with making amino acid assignments
3. Database searching
4. Reporting results
5. Calling, searching, and interpreting sample examples

Goals of Edman Sequencing

Assign N-terminal sequence
Identify the protein(s)/peptide(s) present in sample
Locate position of mutation/modification
Indirectly establish presence of amino-terminal modification (acetylation, pyroglutamic acid, etc.)

Data generation

Detecting changes in the heights/areas of peaks corresponding to PTH-amino acids in a series of consecutive HPLC chromatograms.
- an increase in height signals the presence of an amino acid at a particular cycle, followed by a decrease at a later cycle
- some noise-level changes are present throughout the run

Analysis of raw data

Requires the means to compare peak heights or peak areas in one chromatogram with the peak heights and areas in succeeding chromatograms.

Methods of calling amino acids

Strip chart recorder / light box
Computer chromatography software
- ABI Model 610
- HP/Agilent Chemstation
- SequencePro
- any chromatography software

Chromatography software: Overlay or stacking option

Compares 2 or more chromatograms
- manually visualize the differences in peak heights
- more forgiving of inconsistencies in chromatography

[Figure: overlaid chromatograms; traces labeled 2, 3, and Std1]

Chromatography software: Subtraction mode option

Shows combined image of current chromatogram minus peaks of prior chromatogram
- requires tight chromatography
- hard to see consecutive amino acids

[Figure: subtraction chromatogram, "A-singleexm Residue 3-2", with peaks labeled by one-letter amino acid code]
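Subtraction mode can be illustrated with a minimal Python sketch. It assumes both traces are lists of detector readings sampled at identical retention times; the function name and the toy numbers are hypothetical and are not taken from any chromatography package.

```python
def subtract_chromatograms(current, previous):
    """Return current minus previous, point by point.

    Both traces are assumed to be sampled at the same retention times.
    """
    if len(current) != len(previous):
        raise ValueError("traces must contain the same number of points")
    return [c - p for c, p in zip(current, previous)]


# Toy traces (arbitrary units). Cycle n has one peak; cycle n+1 gains a second.
cycle_n = [0, 1, 5, 1, 0, 0]
cycle_n_plus_1 = [0, 1, 5, 1, 4, 0]
new_peak_only = subtract_chromatograms(cycle_n_plus_1, cycle_n)

# The same single peak, shifted by one sampling point in the later cycle.
shifted = [0, 0, 1, 5, 1, 0]
shift_artifact = subtract_chromatograms(shifted, cycle_n)
```

In the first case only the new peak survives. In the second, a one-point retention shift leaves a negative/positive artifact although no new residue appeared, which is why this mode needs tight chromatography. A residue repeated in consecutive cycles largely cancels for the same reason.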

Chromatography software options

Histograms
- peak height/areas of a single amino acid in each cycle of the run
- requires good integration/quantitation

Chromatography software options

Software calling
- requires tight chromatography and solid integration
- may miss problematic amino acids
- not always reliable

What constitutes a called amino acid?

Potentially any signal that rises above background level variations and falls at a later cycle can be an assignment
Calls may be defined as positive or tentative depending on how far above background levels the peaks rise
Experience is a factor
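The rise-then-fall criterion can be sketched in Python for a single amino acid followed across cycles. This is only an illustration: the median-based background estimate and the two threshold factors are assumptions made here, not values from the tutorial, and real calls also depend on experience.

```python
from statistics import median


def call_cycles(pmol, positive_factor=3.0, tentative_factor=1.5):
    """Classify each cycle of one PTH-amino acid trace.

    pmol: yields (pmol) of a single amino acid, one value per cycle.
    Returns one entry per cycle: "positive", "tentative", or None.
    Both factors are hypothetical thresholds relative to the background.
    """
    background = median(pmol)  # crude background estimate (assumption)
    calls = []
    for i, value in enumerate(pmol):
        previous = pmol[i - 1] if i > 0 else background
        rises = value > previous
        # The signal must fall again at some later cycle.
        falls_later = any(later < value for later in pmol[i + 1:])
        if rises and falls_later and value >= positive_factor * background:
            calls.append("positive")
        elif rises and falls_later and value >= tentative_factor * background:
            calls.append("tentative")
        else:
            calls.append(None)
    return calls


# Hypothetical alanine yields over eight cycles.
alanine = [1.0, 1.1, 9.0, 2.5, 1.2, 1.0, 1.9, 1.0]
print(call_cycles(alanine))
```

With these made-up numbers, cycle 3 is a positive call and cycle 7 a tentative one. The final cycle can never be called by this rule because no later cycle exists to show the fall.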

Calling amino acids

Although all amino acids are similar in the Edman degradation reactions and the resulting PTH-amino acids all have very similar extinction coefficients at 269 nm, there are some modifications that can affect the height/areas of the HPLC peaks

Calling amino acids

PTH-Ser recoveries are lower due to loss of H2O
- forms PTH-DTT-dehydroalanine derivative

[Figure: chromatogram with peaks labeled S, dha, and A]

Calling amino acids

PTH-Thr recovery can be lower due to loss of H2O
- forms numerous dehydro-aminoisobutyric acid-DTT derivatives

[Figure: chromatogram "B- Celgene/JM UbcH7 011003 8:Residue 5" with peaks labeled by one-letter amino acid code, DPTU, and DPU; peaks marked x and T]

Calling amino acids

Methionine can oxidize, resulting in lower recovery of PTH-Met

[Figure: two chromatograms with peaks labeled M and MetO]

Calling amino acids

PTH-Glu is accompanied by PTH-Glu-aniline amide, which becomes more abundant in later cycles

[Figure: overlaid chromatograms; traces labeled 13 and 14, peaks labeled E and E’]

Calling amino acids

Tryptophan is often oxidized to several kynureninyl adducts, resulting in low or no recovery of PTH-Trp

[Figure: overlaid chromatograms; traces labeled 8, 9, and Std1; peaks labeled ox W, P, V, DPTU, W, DPU, and F; standard peaks in orange]

Calling amino acids

No PTH-cysteine peaks are observed due to loss of H2S
- might see some PTH-DTT-dehydroalanine
- must be alkylated (iodoacetamide, vinylpyridine, acrylamide, etc.)
PTC-proline is slow to cleave, leaving greater than normal lag

Calling amino acids

[Figure: overlaid chromatograms; traces labeled 6, 7, and Std1; peaks labeled PTH-S-propionamide-Cys, E, H, and P]
PVDF blot where cysteine reacted with acrylamide during electrophoresis
Proline lag

Calling amino acids

PTH-Asn and PTH-Gln will partially deamidate to give PTH-Asp and PTH-Glu
PTH-His and PTH-Arg can be low due to poor extraction

Calling amino acids

“Blank” cycles can occur due to:
- unalkylated Cys
- completely oxidized Trp
- modified amino acid
  • Asn-CHO, (N)-X-S/T motif
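A blank cycle that sits two positions before a Ser or Thr fits the (N)-X-S/T motif and may therefore be a glycosylated Asn. The Python sketch below flags such cycles. It assumes the assigned sequence is written in one-letter code with "X" marking a blank cycle, which is a convention chosen here; the function name is hypothetical. A flagged cycle is only a candidate, since unalkylated Cys or oxidized Trp also give blanks.

```python
def possible_glyco_blanks(sequence, blank="X"):
    """Return 1-based cycle numbers of blanks that fit (N)-X-S/T.

    A blank at cycle i is flagged when cycle i+2 was called as S or T.
    The residue at cycle i+1 may be anything, as on the slide.
    """
    hits = []
    for i, residue in enumerate(sequence):
        if residue == blank and i + 2 < len(sequence) and sequence[i + 2] in "ST":
            hits.append(i + 1)
    return hits


# Hypothetical assignment with blank cycles at 2, 6, and 9.
print(possible_glyco_blanks("AXGSLXVTXK"))
```

Here cycles 2 and 6 are flagged. The blank at cycle 9 is not, because the run ends before cycle 11 could confirm the motif.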

Calling amino acids

Multiple amino acids per cycle
- major and minor signals might be used to assign more than one sequence
- more challenging to distinguish major and minor with comparable signals, taking into account low-recovery amino acids (C, W, S, T, R, H, M)
- numerous low-level signals in each cycle may just be noise

Calling amino acids

Rising background throughout the run
- dependent on protein stability
- more prominent with larger protein runs
- will limit length of calls

Factors preventing good assignments

Presence of non-sample amino acids
- contaminated cartridge blocks
  • cleaning procedures using MeOH, ACN/H2O, nitric acid, pyrolysis
- contaminated supports (PVDF blots, PVDF strips, GF, etc.)
- “dirty” Polybrene

Factors preventing good assignments

Sequencer performance (chemical or mechanical problems)
- excessive lag
- poor repetitive yield
- evaluate by running a standard protein frequently or use an internal peptide standard for each run
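Repetitive yield for a standard run is commonly estimated from a straight-line fit of log(pmol) against cycle number. The Python sketch below shows one such calculation. It assumes natural logs, an ordinary least-squares fit, and an initial yield extrapolated to cycle 0; laboratories differ on these conventions, and the data are invented.

```python
import math


def repetitive_yield(cycles, pmol):
    """Least-squares fit of ln(pmol) against cycle number.

    Returns (initial_yield, repetitive_yield). The initial yield is the
    fitted value extrapolated to cycle 0 (an assumed convention); the
    repetitive yield is the fitted fraction retained per cycle.
    """
    n = len(cycles)
    if n < 2 or n != len(pmol):
        raise ValueError("need at least two matching cycle/pmol pairs")
    logs = [math.log(p) for p in pmol]
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


# Hypothetical yields of well-recovered residues from a standard protein run.
cycles = [2, 5, 9, 14]
pmol = [100 * 0.95 ** c for c in cycles]
initial, repetitive = repetitive_yield(cycles, pmol)
print(round(initial, 1), round(repetitive, 3))
```

Low-recovery residues (C, W, S, T, R, H, M) are usually left out of such a fit because their yields do not reflect the sequencer's step efficiency.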

Factors preventing good assignments

Sequencer chemical artifacts
- bad solvents and reagents, additives
  • excessive DTT in S2B
- HPLC solvents, buffers, additives
- co-Gln, aniline, DPTU, DPU

Factors preventing good assignments

Sequencer chemical artifacts, cont.
- R2B (red) vs R2C (blue) (+/- PMTC)

[Figure: three chromatogram overlays; traces labeled 10 and 13, and 9 and 8; peaks labeled PMTC, DPTU, and ox Trp; one panel captioned "R2C red vs R2B blue"]

Factors preventing good assignments

HPLC problems
- retention time reproducibility
  • replace worn pump seals
  • eliminate leaky fittings
  • gradients / column equilibration
- baseline flatness
  • acetone, KH2PO4
- column life
  • lower, broader peaks with older column

[Figure: chromatogram overlays; traces labeled 3, 4, and 5; panels labeled "normal" and "Pump seal failure"]

Factors preventing good assignments

cLC guard column failure

Factors affecting good assignments

Samples
- sample amount / purity
  • the lower the amount, the higher the purity required for confident calls
- sample prep (see last year's tutorial online at WWW.ABRF.ORG)

Reasons for database searching

Identification of protein sample
- Is assigned sequence in the database?
- Does the hit clearly identify the protein?
  • Is the match real or by chance?
  • Longer sequences for definitive hits
Determine if the sequence is unique.
- Is assigned sequence not in the database?
- Does no exact hit mean you have a new sequence?
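A rough estimate shows why longer sequences give more definitive hits. The Python sketch below assumes all 20 amino acids are equally likely and independent, which real proteins are not, and the database size is a made-up figure. It is a back-of-the-envelope guide, not the statistic that BLAST or FASTA report.

```python
def expected_chance_matches(query_length, database_residues, alphabet_size=20):
    """Expected number of exact matches arising by chance.

    Assumes every residue is drawn uniformly and independently from
    alphabet_size amino acids, so each start position matches with
    probability (1 / alphabet_size) ** query_length. database_residues
    approximates the number of possible start positions.
    """
    return database_residues * (1.0 / alphabet_size) ** query_length


# Hypothetical database of 3.2 million residues.
for length in (5, 8, 12):
    print(length, expected_chance_matches(length, 3_200_000))
```

Under these assumptions a 5-residue query is expected to match about once by chance in 3.2 million residues (20^5 = 3,200,000), and each additional residue lowers that expectation 20-fold.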

Reasons for database searching

Determine homology to other sequences in the database
- need enough sequence to establish a relation
Sort multiple Edman assignments
- multiple amino acids per cycle

Statistical based search algorithms

BLAST
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) J. Mol. Biol. 215, 403-410
FASTA
- Lipman, D.J. and Pearson, W.R. (1985) Science 227, 1435-1441
SSEARCH (Smith-Waterman)
- Smith, T.F. and Waterman, M.S. (1981) J. Mol. Biol. 147, 195-197

Text based search algorithms

FINDPATTERN (GCG)
MSPATTERN / MSEDMAN
Peptidesearch

Protein Databases

NCBI nr
SWALL
Swissprot
TrEMBL
Ludwig nr
Owl
PIR
PRF

Factors that influence database searching

Search algorithm (FASTA, BLAST, SSEARCH)
Length of query sequence (>5)
Scoring matrix (PAM#, BLOSUM#)
Gap cost / penalty

Factors that influence database searching

Wordsize (1-3)
Filtering
Expect (E)
Database

Web-Based Searching

National Center for Biotechnology Information (NCBI)
- www.ncbi.nlm.nih.gov/BLAST/
- online BLAST tutorial
- BLAST searches
- nr database

Web-Based Searching

European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI)
- www.ebi.ac.uk/tools
- FASTA
- BLAST
- SSEARCH
- SWALL database

Database Searching: First Attempt

BLAST default parameters
- filter
- BLOSUM62
- expect 10
- wordsize 3
- database nr
FASTA default parameters
- BLOSUM50
- expect 1
- wordsize (k-tup) 2
- database swall

Search returns no hits

Search parameters too strict
- remove filters
- increase Expect
- use lower PAM or higher BLOSUM matrix
- decrease word size
- At NCBI BLAST can use “nearly identical short sequences” option
  • E 20000
  • PAM30
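The relaxation steps above can be written down as a small parameter preset in Python. The dictionary keys and the function name are hypothetical labels chosen for this sketch and are not the option names of any BLAST client; the values are the ones listed on the slides, and lowering the word size by one is an assumed reading of "decrease word size".

```python
# BLAST defaults as listed in "Database Searching: First Attempt".
BLAST_DEFAULTS = {
    "filter": True,
    "matrix": "BLOSUM62",
    "expect": 10,
    "word_size": 3,
    "database": "nr",
}


def relax_for_short_query(params):
    """Return a relaxed copy of params for a short Edman query.

    Follows the slide: remove filters, raise Expect to 20000, switch to
    PAM30, and decrease the word size (by one here, never below 1).
    """
    relaxed = dict(params)
    relaxed["filter"] = False
    relaxed["expect"] = 20000
    relaxed["matrix"] = "PAM30"
    relaxed["word_size"] = max(1, params["word_size"] - 1)
    return relaxed


print(relax_for_short_query(BLAST_DEFAULTS))
```

Recording both the default and the relaxed set makes it easy to report the search parameters alongside the results.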

Search returns many hits

Are the hits occurring by random chance?
- parameters not strict enough
  • decrease E
  • higher PAM or lower BLOSUM
  • use filter
Are the hits to a highly conserved sequence?
Need more sequence data

Searching still returns no exact hit

Sequence not in protein database
- search nucleotide database
  • TBLASTN, TFASTA
- ESTs

Database searching with multiple amino acids

Amino acids are similar in amount
Cannot assign a major sequence
Use search algorithms that allow multiple entries
Use search results to sort amino acid assignments into protein sequences
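The idea of a search that allows multiple entries per cycle can be sketched in Python with a regular expression. This is a toy illustration of text-based pattern matching, not the method used by MSPATTERN, FASTF, or any other tool named in this tutorial; the function names and the three "database" sequences are invented.

```python
import re


def cycles_to_pattern(cycle_calls):
    """Build a regular expression from per-cycle calls.

    cycle_calls: one string of one-letter codes per cycle, e.g. "AG" when
    both Ala and Gly were called. An empty string means a blank cycle and
    matches any residue.
    """
    parts = []
    for calls in cycle_calls:
        if not calls:
            parts.append(".")
        elif len(calls) == 1:
            parts.append(re.escape(calls))
        else:
            parts.append("[" + "".join(sorted(set(calls))) + "]")
    return re.compile("".join(parts))


def search(cycle_calls, database):
    """Return {entry name: first matching stretch} for entries that match."""
    pattern = cycles_to_pattern(cycle_calls)
    hits = {}
    for name, sequence in database.items():
        match = pattern.search(sequence)
        if match:
            hits[name] = match.group(0)
    return hits


# Four cycles with two comparable signals each, and a toy database.
calls = ["AG", "LV", "KS", "DE"]
toy_db = {"protA": "MALKDQ", "protB": "TTGVSEWW", "protC": "MMMMMM"}
print(search(calls, toy_db))
```

In this made-up case the hits ALKD and GVSE together account for both signals in every cycle, which is how search results can be used to sort a mixture into two sequences.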

Web based searching: multiple amino acids

Text based searching
- Protein Prospector (www.prospector.ucsf.edu)
  • MSPATTERN
  • numerous databases
  • M.W. / species filtering
- PepSearch (www.mann.embl-heidelberg.de/GroupPages/PageLink/peptidesearchpage.html)

Web based searching: multiple amino acids

Statistical based searching
- FASTF / FASTF3
  • http://fasta.bioch.virginia.edu
  • http://www.ebi.ac.uk/fasta33
  • numerous databases

Reporting Results: Contents

Whatever the user wants
Raw data (PTH chromatograms, list of all amino acid yields)
Called amino acids in each cycle
- major and minor
- positive or tentative
Pmoles (raw or background subtracted)
Initial yield / repetitive yield

Reporting Results: Contents

Individual cycle comments
Comments on the sequencing run
Assigned sequence/s
Database search parameters and results
Reconcile the sequencer data and database results

Reporting Results: Contents

Sequencer and run conditions
Sample workup

Reporting Results: Styles

Lab designed report forms
Sequence analysis software printouts
Spreadsheets
Word processor
Database programs (Filemaker Pro)
Handwritten report / email

ESRG 2003 Sample User Reports

36 reports received
- 25 lab designed report form
- 6 sequence analysis software printout
- 3 spreadsheet
- 1 email
- 1 copy of handwritten lab notebook page

Information included in ABRF ESRG 2003 user reports

Item                                     %
Sample information                       58%
Sample preparation                       17%
Sequencer run conditions                 14%
Raw data                                 8%
Manually called amino acids              89%
Positive / tentative call distinction    69%
Place for minor calls                    33%
Computer called amino acids              14%
Pmole raw                                44%
Pmole background subtracted              22%
Individual cycle discussion              39%
Assigned sequence                        69%
IY / RY information                      31%
Sequencing run discussion                33%
Edman degradation discussion             17%

Perform database search                  44%
Database search parameters               25%
Copy of database search results          88%
Copy of database protein entry           38%
Database search discussion               50%

ESRG 2003 Report Examples
