automated analysis of proteomics data on tap
TRANSCRIPT
-
8/14/2019 Automated Analysis of Proteomics Data on Tap
1/57
Automated Analysis of Proteomics Data on Tap
Simon Chiang
Wednesday, April 8, 2009
-
Tap (Task Application)
[Diagram: workflow graph; node labels a, b, b1, c, c1, c2, ?, x, x1, x2, y, y1, z]
-
MS/MS Identification
Protein Mixture
Digest to Peptides
Measure Peptide Mass (MS)
Fragment, Measure Fragment Masses (MS/MS)
Identify Peptides
Correlate to Proteins
-
Peptide Identification
Peptide Mass
Fragmentation Spectrum (match/score)
Experimental vs. Predicted from DB
-
Peptide Identification
Peptide Mass (filter)
Fragmentation Spectrum
Experimental vs. Predicted from DB
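The two Peptide Identification slides describe a filter-then-score approach: database candidates are first filtered by peptide mass, and the survivors are scored by comparing predicted fragments with the experimental spectrum. A minimal Ruby sketch of that idea follows. It assumes a simple shared-peak count as the score (real search engines use more elaborate scoring), and the `Candidate` record, method names, and tolerances are hypothetical.

```ruby
# Hypothetical candidate: a peptide with a mass and predicted fragment m/z values.
Candidate = Struct.new(:sequence, :mass, :fragments)

# Keep only candidates whose mass lies within +/- tol of the measured peptide mass.
def filter_by_mass(candidates, measured_mass, tol)
  candidates.select { |c| (c.mass - measured_mass).abs <= tol }
end

# Count predicted fragments that have an experimental peak within +/- tol.
def shared_peak_count(predicted, experimental, tol)
  predicted.count { |p| experimental.any? { |e| (e - p).abs <= tol } }
end

# Filter by mass, then rank the survivors by shared-peak count (best first).
def identify(candidates, measured_mass, spectrum, mass_tol: 0.5, frag_tol: 0.5)
  filter_by_mass(candidates, measured_mass, mass_tol)
    .map { |c| [c.sequence, shared_peak_count(c.fragments, spectrum, frag_tol)] }
    .sort_by { |_, score| -score }
end

# Made-up data for illustration only.
candidates = [
  Candidate.new("PEPTIDE_A", 500.2, [100.1, 200.2, 300.3]),
  Candidate.new("PEPTIDE_B", 500.4, [100.1, 250.0, 350.0]),
  Candidate.new("PEPTIDE_C", 650.0, [100.1, 200.2, 300.3])
]
ranked = identify(candidates, 500.3, [100.0, 200.3, 300.2])
```

The mass filter runs first because it is cheap and removes most candidates before the more expensive spectrum comparison.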
-
PTM Identification
Variable Modification
-
Crosslinks
An unusual post-translational modification (modification by another peptide)
Same process, scoring algorithms
Main difference is prediction of crosslinked spectrum
-
Search Space
-
Iterative Searching
Search normally
Harvest strong peptide identifications
Generate subset database
Search unidentified for crosslinks vs subset
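The four steps above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not Tap's implementation: the `Hit` record, the score threshold, and all method names are hypothetical, and the search itself is replaced by made-up results. The `crosslink_pairs` helper assumes that every peptide may pair with every other peptide (or itself), which is one way to see why a subset database shrinks the crosslink search space.

```ruby
# Hypothetical search result: spectrum id, matched protein accession, and a score.
Hit = Struct.new(:spectrum_id, :accession, :score)

# Harvest strong identifications (score at or above a threshold).
def harvest_strong(hits, threshold)
  hits.select { |h| h.score >= threshold }
end

# The subset database holds only proteins with at least one strong hit.
def subset_database(database, strong_hits)
  keep = strong_hits.map(&:accession).uniq
  database.select { |accession, _sequence| keep.include?(accession) }
end

# Spectra without a strong identification go to the second-pass search.
def unidentified(spectrum_ids, strong_hits)
  spectrum_ids - strong_hits.map(&:spectrum_id)
end

# Unordered peptide pairs (including self-pairs) a crosslink search would consider.
def crosslink_pairs(n_peptides)
  n_peptides * (n_peptides + 1) / 2
end

# Made-up database and first-pass results for illustration only.
database = { "P1" => "MKWVTFISLLK", "P2" => "AAKRPEPTIDEK", "P3" => "GGGKLLLR" }
hits = [Hit.new(1, "P1", 60), Hit.new(2, "P3", 12), Hit.new(3, "P1", 45)]

strong  = harvest_strong(hits, 40)
subset  = subset_database(database, strong)
pending = unidentified([1, 2, 3, 4], strong)
```

Under the pairing assumption, 1000 peptides give 500500 candidate pairs while 50 peptides give 1275, so restricting the second pass to the subset keeps the crosslink search tractable.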
-
Iterative Searching
Strong Ids
Subset(identified proteins)
Crosslink Db
-
Variation
Search normally
Harvest strong peptide identifications
Generate subset database
Search unidentified for PTMs vs subset
-
Variation
Search normally
Harvest strong peptide identifications
Repeat (different search engine)
-
Variation
Search normally
Harvest strong peptide identifications
-
Searching
-
A Practical Matter
Not (mostly) an issue of having software
Time, complexity, configuration
Processing of results
Mostly an issue of using software
-
Web Applications
Basis for most search engines/tools
Primarily a human interface
Hard to automate a human*
-
Tap-Mechanize
Mechanize - a library for automating interaction with websites
Used to redirect/resubmit HTTP requests
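The resubmission idea can be illustrated with Ruby's standard library alone. The sketch below is an assumption-laden illustration and does not use the Mechanize or Tap-Mechanize API: it takes a form body captured from a browser submission, overrides selected fields, and re-encodes it so the request could be sent again. The field names and the `rewrite_form` helper are hypothetical, and nothing is sent over the network.

```ruby
require "uri"

# Rewrite a captured form submission: decode the body, override selected
# fields, and re-encode it for resubmission.
def rewrite_form(captured_body, overrides)
  fields = URI.decode_www_form(captured_body).to_h
  URI.encode_www_form(fields.merge(overrides))
end

# Hypothetical body captured from a search form.
captured = "database=SwissProt&enzyme=Trypsin&missed_cleavages=1"
body = rewrite_form(captured, "missed_cleavages" => "2")
```

Because the rewritten body has the same shape as what the browser sent, the web application sees an ordinary form submission.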
-
Tap-Mechanize
User Application
-
Tap-Mechanize
User
Application
-
Advantages
Utilizes native web interface
Robust and adaptable
Multi-page requests supported
Lowest common denominator (works for most web apps)
-
Tap (Task Application)
A framework to automate workflows
Define computational tasks
Join tasks into workflows
-
Tasks
Submit data to a search engine
Download results
Convert a file format
Perform a calculation
Generate a report
...
-
Sequence
a -> b -> c
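A sequence join can be sketched in plain Ruby by modeling each task as a lambda and passing each result to the next task. This is a minimal illustration only; the `sequence` helper is hypothetical and is not Tap's API.

```ruby
# Each task is modeled as a lambda; a sequence feeds each result to the next task.
def sequence(*tasks)
  ->(input) { tasks.reduce(input) { |value, task| task.call(value) } }
end

a = ->(x) { x + 1 }
b = ->(x) { x * 2 }
c = ->(x) { "result: #{x}" }

workflow = sequence(a, b, c)
output = workflow.call(3)
```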
-
Fork
a -> b
a -> c
-
Merge
a -> c
b -> c
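Fork and merge joins can be sketched the same way, with tasks modeled as lambdas. The `fork_tasks` and `merge_tasks` helpers below are hypothetical names for illustration and are not Tap's API: a fork passes one task's result to several downstream tasks, and a merge collects the results of several upstream tasks for one downstream task.

```ruby
# Fork: the result of one task is passed to each of several downstream tasks.
def fork_tasks(source, *targets)
  lambda do |input|
    value = source.call(input)
    targets.map { |task| task.call(value) }
  end
end

# Merge: the results of several upstream tasks are collected and passed to one task.
def merge_tasks(sources, target)
  ->(input) { target.call(sources.map { |task| task.call(input) }) }
end

a = ->(x) { x + 1 }
b = ->(x) { x * 2 }
c = ->(x) { x * 10 }
total = ->(values) { values.sum }

forked = fork_tasks(a, b, c).call(1)        # a feeds both b and c
merged = merge_tasks([a, b], total).call(5) # a and b both feed total
```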
-
What Tap Does...
Programmers: make/test, document, distribute
Users: install, learn, configure/use
-
Usage
Standard command line
Web interface in development
-
Internals
Written in Ruby
DSL for docs, configs
Distribution with RubyGems
-
Examples
-
Searching Perfect Data
Digest Protein (ALBU_HUMAN, Trypsin)
Generate Predicted Spectra (b, y ions; peptides n > 3 residues)
Search
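The digest and prediction steps can be sketched in Ruby as below. This is an illustration under stated assumptions, not the code used in the talk: it applies the common trypsin rule (cleave after K or R unless followed by P) with no missed cleavages, keeps peptides longer than 3 residues, and computes singly charged b and y ion m/z values from monoisotopic residue masses. The method names are hypothetical, and the input is a made-up sequence rather than ALBU_HUMAN.

```ruby
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
  "G" => 57.02146, "A" => 71.03711, "S" => 87.03203, "P" => 97.05276,
  "V" => 99.06841, "T" => 101.04768, "C" => 103.00919, "L" => 113.08406,
  "I" => 113.08406, "N" => 114.04293, "D" => 115.02694, "Q" => 128.05858,
  "K" => 128.09496, "E" => 129.04259, "M" => 131.04049, "H" => 137.05891,
  "F" => 147.06841, "R" => 156.10111, "Y" => 163.06333, "W" => 186.07931
}.freeze
WATER  = 18.01056
PROTON = 1.00728

# Cleave after K or R unless the next residue is P; no missed cleavages.
def tryptic_digest(sequence, min_length: 4)
  sequence.split(/(?<=[KR])(?!P)/).select { |pep| pep.length >= min_length }
end

# Sum of residue masses for a peptide or fragment.
def residue_sum(peptide)
  peptide.each_char.sum { |aa| RESIDUE_MASS.fetch(aa) }
end

# Singly charged b and y ion m/z values for every backbone cleavage site.
def by_ions(peptide)
  (1...peptide.length).map do |i|
    { b: residue_sum(peptide[0, i]) + PROTON,
      y: residue_sum(peptide[i..-1]) + WATER + PROTON }
  end
end

# Made-up sequence for illustration only.
peptides = tryptic_digest("GASPKAAKRPGGGGR")
ions = by_ions(peptides.first)
```

Searching spectra predicted this way gives "perfect" input, so any unassigned peptide or mass error in the results points at the search engine or its configuration rather than at the data.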
-
Workflow
Protein -> Digest -> [Peptides] -> Predict -> [Spectra] -> Search -> Result URL
-
Peptide Fragmentation
-
Protein Identification
-
Unassigned Peptide?
-
Peptide Mass Error?
-
Explanations
Unassigned peptides due to a server configuration (MinPepLenInSearch 5)
Peptide mass error due to rounding (algorithm precision)
-
Classify Results by GO Terms
Search with Mascot
Search with GPM
Extract Intersection of Results
Map Accessions to Entrez (PIR)
Classify with GoGetter
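The intersection and mapping steps reduce to simple collection operations, sketched here in Ruby. All accessions and identifiers are placeholders, and the lookup table stands in for whatever mapping PIR would return; none of this reflects the actual Mascot, GPM, or GoGetter interfaces.

```ruby
# Placeholder accessions reported by two search engines for the same data.
mascot_hits = %w[ACC_A ACC_B ACC_C]
gpm_hits    = %w[ACC_C ACC_A ACC_D]

# Keep only proteins identified by both engines (order follows the first list).
shared = mascot_hits & gpm_hits

# Placeholder accession-to-Entrez table; a real mapping would come from PIR.
entrez_ids = { "ACC_A" => "1001", "ACC_C" => "1003" }
mapped = shared.map { |acc| [acc, entrez_ids[acc]] }.to_h
```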
-
Workflow
Load Data
mgf
-
Workflow
Search/Export Mascot
Search/Export GPM
-
Workflow
Intersect Results
Map Accessions to Entrez
GoGetter
Graph
-
Gene Ontology, GO Slims : Biological Process - Weighted (Dataset 1 name)
Biological process (go:0008150) (16.35%)
Cellular process (go:0009987) (16.18%)
Macromolecule metabolic process (go:0043170) (15.98%)
Metabolic process (go:0008152) (15.70%)
Nucleobase, nucleoside, nucleotide and nucleic aci..
Cell communication (go:0007154) (8.73%)
Regulation of biological process (go:0050789) (6.46%)
Transport (go:0006810) (4.14%)
Response to stimulus (go:0050896) (2.48%)
Multicellular organismal development (go:0007275) (2
Biosynthetic process (go:0009058) (0.67%)
Cell differentiation (go:0030154) (0.56%)
Cell death (go:0008219) (0.48%)
Electron transport (go:0006118) (0.48%)
Secretion (go:0046903) (0.33%)
Membrane fusion (go:0006944) (0.33%)
ALBU_HUMAN
-
Recycling
-
Digest Predict Search
-
Digest Predict Load Search/Export Mascot
Search/Export GPM
-
Simple Iterative Search
Search Partition Search (+ PTMs)
-
Data Preparation
.RAW -> Extract Data -> [.dta] -> Convert Format -> .mgf -> Search -> Result URL
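The format conversion step can be illustrated with a small Ruby sketch. It assumes the usual .dta layout (first line: singly protonated precursor mass and charge; remaining lines: m/z and intensity pairs) and writes a minimal MGF entry, converting the (M+H)+ mass to the precursor m/z that MGF expects. The `dta_to_mgf` helper and the title are hypothetical; this is not the conversion code from the talk.

```ruby
PROTON_MASS = 1.00728

# Convert the text of one .dta file into a single MGF entry.
def dta_to_mgf(dta_text, title)
  lines = dta_text.lines.map(&:strip).reject(&:empty?)
  mh, charge = lines.first.split
  charge = charge.to_i
  # (M+H)+ plus the extra protons, divided by the charge, gives the precursor m/z.
  pepmass = (mh.to_f + (charge - 1) * PROTON_MASS) / charge
  [
    "BEGIN IONS",
    "TITLE=#{title}",
    "PEPMASS=#{format('%.4f', pepmass)}",
    "CHARGE=#{charge}+",
    *lines.drop(1),
    "END IONS"
  ].join("\n") + "\n"
end

# Made-up spectrum for illustration only.
dta = "1001.5000 2\n200.1 15.0\n300.2 42.5\n"
mgf = dta_to_mgf(dta, "scan_1")
```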
-
Mspire
ms-mascot
ms-uniprot
ms-in_silico
ms-gpm
ms-fasta
ms-unimod
ms-xcalibur
ms-prots
ms-data_explorer
constants
molecules
external
-
Anticipated Usage
Programmer makes tasks
Researchers make workflows from tasks
Definition
Configuration
Execution
-
Importance
Expedited analysis, reproducibility
Evaluation studies (swap x for y)
Teaching/Learning
"Performance is not secondary. It affects everything you do, it affects how you use an application." - Linus Torvalds
-
Context
-
TPP, TOPP, CPAS
Various pipeline suites
Pipeline in the sense of sequence
Relatively hard to install, use, extend
Many similarities, but only sequences
-
Ongoing Research
Iterative searching - Nesvizhskii, 2006
Crosslink id via subsetting - Rinner, 2008
Subsetting search engine - Li, 2009
Enhancing peptide identification confidence by combining search methods - Alves, 2008
Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies - Searle, 2008
-
Future Directions
Web interface, cleanup
Implement iterative searching with subsets
Development of crosslink search algorithm (if necessary)
-
Acknowledgments
Kirk Hansen
Hansen Lab
Ashley Zurawel
Lauren Kiemele
Ahn Lab
John Prince
Thesis Committee
Bob Hodges
Paul Fennessey
Christine Wu
Larry Hunter
Brad Bendiak
Mark Duncan