1
InCoB 2009, Singapore
René Hussong et al.
Highly accelerated feature detection in mass spectrometry data
using modern graphics processing units
Bioinformatics 25 (2009).
Junior Research Group for Protein-Protein-Interactions and Computational Proteomics
Saarland University, Saarbruecken, Germany
2
Outline
∙ Introduction & Motivation
  - The Differential Proteomics Pipeline
∙ Computational Proteomics
  - Signal Processing and Feature Detection
  - The Isotope Wavelet Transform
∙ Parallelization via GPUs
∙ Results & Discussion
3
The Differential Proteomics Pipeline
Two samples (e.g. sick vs. healthy) → Mass Spectrometer → List of differentially expressed proteins
Applications range from basic pharmaceutical research through medical diagnostics and therapy to biotechnology and engineering.
4
Principle of Biological Mass Spectrometry
[Figure: proteins are digested into peptides; the peptides are ionized and accelerated; the resulting fingerprint spectrum plots intensity against mass]
5
Principle of Biological Mass Spectrometry
[Figure: digest → fingerprint spectrum (intensity vs. mass); annotation: mass of a single neutron]
7
(Simple) Feature Finding
Typically done by simple thresholding.
Needs additional preprocessing steps, e.g.:
- Baseline elimination (e.g. by morphological filters)
- Noise reduction and/or smoothing
(Mostly) needs resampling
Needs additional postprocessing steps, e.g.:
- Peak clustering (so-called “deconvolution”)
- Model fitting, charge prediction
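The preprocessing and thresholding steps listed on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the flat structuring element, the window widths, the moving-average smoother, and the synthetic signal are all choices made here, not taken from the talk.

```python
def erode(xs, w):
    """Flat-window minimum filter (grey-scale erosion), half-width w."""
    n = len(xs)
    return [min(xs[max(0, i - w):min(n, i + w + 1)]) for i in range(n)]


def dilate(xs, w):
    """Flat-window maximum filter (grey-scale dilation), half-width w."""
    n = len(xs)
    return [max(xs[max(0, i - w):min(n, i + w + 1)]) for i in range(n)]


def remove_baseline(xs, w):
    """Subtract a morphological opening (erosion, then dilation) as the baseline."""
    baseline = dilate(erode(xs, w), w)
    return [x - b for x, b in zip(xs, baseline)]


def smooth(xs, w):
    """Moving average with half-width w (truncated at the borders)."""
    n = len(xs)
    out = []
    for i in range(n):
        window = xs[max(0, i - w):min(n, i + w + 1)]
        out.append(sum(window) / len(window))
    return out


def pick_peaks(xs, threshold):
    """Indices of local maxima that exceed the threshold."""
    return [i for i in range(1, len(xs) - 1)
            if xs[i] > threshold and xs[i] >= xs[i - 1] and xs[i] > xs[i + 1]]


# Synthetic signal (assumption): rising baseline plus two triangular peaks.
signal = [5.0 + 0.05 * i for i in range(60)]
for centre, height in ((20, 10.0), (40, 6.0)):
    for d in range(-2, 3):
        signal[centre + d] += height * (1 - abs(d) / 3)

cleaned = smooth(remove_baseline(signal, w=5), w=1)
peaks = pick_peaks(cleaned, threshold=2.0)
print(peaks)
```

Note that the structuring element must be wider than the peaks, otherwise the opening removes part of the signal together with the baseline.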
8
The Isotope Wavelet Transform
Convolution with a kernel function
- by construction robust against noise and baseline artifacts
- also acts as a filter for chemical noise
- simultaneously predicts the charge state
- needs no explicit resampling
- only a single parameter (threshold)
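To make "convolution with a kernel function" concrete, here is a minimal sketch. The kernel below is an assumption chosen for illustration (a cosine with one period per isotope spacing, weighted by a Poisson-like envelope, and shifted to zero mean); it is not the exact kernel from the talk. The names `kernel`, `transform`, and `samples_per_da` are hypothetical.

```python
import math


def kernel(z, lam, samples_per_da, n_peaks=4):
    """Sampled, mean-free illustrative kernel for charge z (assumed form)."""
    length = n_peaks * samples_per_da // z
    values = []
    for i in range(length):
        u = z * i / samples_per_da  # position in units of the isotope spacing
        envelope = math.exp(-lam) * lam ** u / math.gamma(u + 1.0)
        values.append(math.cos(2.0 * math.pi * u) * envelope)
    mean = sum(values) / len(values)
    return [v - mean for v in values]  # zero mean: a flat baseline cancels


def transform(signal, kern):
    """Correlate signal and kernel at every position where the kernel fits."""
    k = len(kern)
    return [sum(signal[b + j] * kern[j] for j in range(k))
            for b in range(len(signal) - k + 1)]


samples_per_da = 10
spectrum = [7.0] * 120  # constant baseline (assumption)
for offset, height in zip((50, 60, 70, 80), (100.0, 60.0, 20.0, 5.0)):
    spectrum[offset] += height  # isotope pattern starting at index 50

kern = kernel(z=1, lam=0.6, samples_per_da=samples_per_da)
response = transform(spectrum, kern)
best = max(range(len(response)), key=response.__getitem__)
flat = transform([7.0] * 120, kern)  # baseline only
print(best, max(abs(v) for v in flat))
```

Because the sampled kernel sums to zero, a constant baseline contributes nothing to the transform, which is one way to read "by construction robust against baseline artifacts".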
9
Results – Myoglobin PMF
10
Parallelization via CUDA
16
Parallelization via CUDA
[Figure: threads T0 … Tn; b-th data point]
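The figure itself did not survive extraction, so the following is only one plausible reading: every transform value depends solely on the input samples around its own data point b, so each value can be computed by an independent worker. The sketch emulates that decomposition with a Python thread pool; in CUDA the index b would typically be derived from the block and thread indices. All function names are hypothetical, and Python threads illustrate the decomposition only, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def response_at(b, signal, kern):
    """Work of one (emulated) thread: the transform value at data point b."""
    return sum(signal[b + j] * kern[j] for j in range(len(kern)))


def transform_sequential(signal, kern):
    """Reference implementation: one data point after the other."""
    positions = range(len(signal) - len(kern) + 1)
    return [response_at(b, signal, kern) for b in positions]


def transform_parallel(signal, kern, workers=4):
    """Same result, but every data point is handed to a pool worker."""
    positions = range(len(signal) - len(kern) + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the output in the order of the data points
        return list(pool.map(lambda b: response_at(b, signal, kern), positions))


signal = [float((3 * i) % 7) for i in range(50)]
kern = [1.0, -2.0, 1.0]
parallel = transform_parallel(signal, kern)
sequential = transform_sequential(signal, kern)
print(parallel == sequential)
```

No worker writes to a location that another worker reads, so no synchronization is needed between data points.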
17
Parallelization via CUDA and TBB
2x NVIDIA Tesla C870 via Intel Threading Building Blocks
1x NVIDIA Tesla C870
1x CPU 2.3 GHz
>200x speedup
18
Open Issues – Future Work
∙ Solutions for machine-specific ‘artifacts’, e.g.
  - Tailing effects in TOF analyzers
  - Severe mass discretization in high-resolution data
∙ Separating overlapping patterns
∙ Tests for MSn spectra
  - Refined averagine model
GPU solutions
19
Availability: OpenMS
∙ An open source C++ library for mass spectrometry
∙ Designed for “users” as well as for “developers”
∙ TOPP: “The OpenMS proteomics pipeline”
  - suite of independent software tools
  - includes file handling / conversion
  - peak picking and feature detection
  - includes visualizer TOPPView
  …
http://www.openms.de
20
References
Hussong, R, Gregorius, B, Tholey, A, and Hildebrandt, A (2009). Highly accelerated feature detection in proteomics data sets using modern graphics processing units. Bioinformatics 25.
Schulz-Trieglaff, O, Hussong, R, Gröpl, C, Leinenbach, A, Hildebrandt, A, Huber, C, and Reinert, K (2008). Computational Quantification of Peptides from LC-MS Data. Journal of Computational Biology 15(7).
Sturm, M, Bertsch, A, Gröpl, C, Hildebrandt, A, Hussong, R, Lange, E, Pfeifer, N, Schulz-Trieglaff, O, Zerck, A, Reinert, K, and Kohlbacher, O (2008). OpenMS - An open-source software framework for mass spectrometry. BMC Bioinformatics 9(163).
Hussong, R, Tholey, A, and Hildebrandt, A (2007). Efficient Analysis of Mass Spectrometry Data Using the Isotope Wavelet. In: COMPLIFE 2007: The Third International Symposium on Computational Life Science. American Institute of Physics (AIP) 940.
Schulz-Trieglaff, O, Hussong, R, Gröpl, C, Hildebrandt, A, and Reinert, K (2007). A Fast and Accurate Algorithm for the Quantification of Peptides from Mass Spectrometry Data. In: Proceedings of the Eleventh Annual International Conference on Research in Computational Molecular Biology (RECOMB). Lecture Notes in Bioinformatics (LNBI) 4453.
21
The Isotope Wavelet Transform
[Figure: kernel function for charge state 1, mass 1000 Da]
[Figure: kernel function for charge state 1, mass 2000 Da]
22
The Isotope Wavelet Transform
[Figure panels: MS spectrum (charge state 3); charge-1 transform; charge-2 transform; charge-3 transform]
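The panels suggest that the spectrum is transformed once per candidate charge and that the matching charge responds most strongly. A minimal sketch of that idea follows; the kernel form (cosine times a Poisson-like envelope, zero mean), the value of `lam`, and the synthetic charge-3 pattern are assumptions made for illustration, not the authors' implementation.

```python
import math


def kernel(z, lam, samples_per_th, n_peaks=4):
    """Sampled, mean-free illustrative kernel for charge z (assumed form)."""
    length = n_peaks * samples_per_th // z
    values = []
    for i in range(length):
        u = z * i / samples_per_th  # position in units of the isotope spacing 1/z
        envelope = math.exp(-lam) * lam ** u / math.gamma(u + 1.0)
        values.append(math.cos(2.0 * math.pi * u) * envelope)
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def transform(signal, kern):
    """Correlate signal and kernel at every position where the kernel fits."""
    k = len(kern)
    return [sum(signal[b + j] * kern[j] for j in range(k))
            for b in range(len(signal) - k + 1)]


samples_per_th = 60
spectrum = [0.0] * 600
# Charge-3 pattern: isotope peaks are 1/3 Th apart, starting at index 200.
for k, height in enumerate((100.0, 60.0, 20.0, 5.0)):
    spectrum[200 + k * samples_per_th // 3] = height

scores = {}
for z in (1, 2, 3):
    response = transform(spectrum, kernel(z, 0.6, samples_per_th))
    b = max(range(len(response)), key=response.__getitem__)
    scores[z] = (response[b], b)  # strongest response and its position

predicted = max(scores, key=lambda z: scores[z][0])
print(predicted, scores[predicted][1])
```

Only the charge-3 kernel has its positive lobes aligned with all isotope peaks at once; for the other charges some peaks fall on negative lobes and partly cancel.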
23
The Sweep Line Idea
[Figure: sweep line over the plane spanned by m/z [Th] and RT [s]]
2 additional parameters: RT_cutoff, RT_interleave
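The slide names the two parameters but not their exact semantics. The sketch below assumes that `rt_cutoff` is the minimum number of scans in which a pattern must be seen, and that `rt_interleave` is the maximum number of consecutive scans in which it may be missing. The function `sweep`, the tolerance `mz_tol`, and the input layout are hypothetical.

```python
def sweep(scans, rt_cutoff, rt_interleave, mz_tol=0.01):
    """Sweep over scans ordered by retention time.

    scans: list of lists with candidate m/z values, one list per scan.
    Returns (mz, first_scan, last_scan) for every accepted trace.
    """
    open_traces = []
    accepted = []

    def close(trace):
        # Keep a trace only if it was seen in enough scans.
        if trace["hits"] >= rt_cutoff:
            accepted.append((trace["mz"], trace["first"], trace["last"]))

    for idx, candidates in enumerate(scans):
        for mz in candidates:
            for trace in open_traces:
                # Extend a trace at most once per scan.
                if abs(trace["mz"] - mz) <= mz_tol and trace["last"] < idx:
                    trace["last"] = idx
                    trace["hits"] += 1
                    break
            else:
                open_traces.append({"mz": mz, "first": idx, "last": idx, "hits": 1})
        # Close traces that have been missing for too many scans in a row.
        still_open = []
        for trace in open_traces:
            if idx - trace["last"] > rt_interleave:
                close(trace)
            else:
                still_open.append(trace)
        open_traces = still_open

    for trace in open_traces:  # flush at the end of the run
        close(trace)
    return accepted


scans = [[500.0], [500.0, 700.0], [], [500.0], [500.0], [], [], [], [500.0]]
features = sweep(scans, rt_cutoff=3, rt_interleave=1)
print(features)
```

In this example the pattern at 500.0 survives one missing scan and is accepted, while the single hit at 700.0 and the late isolated hit at 500.0 are discarded.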
24
Open Issues – Future Work
[Figure: digest → fingerprint (charge state 1) and fragment fingerprint; axes: intensity vs. mass/charge]
25
Open Issues – Future Work
∙ Separating overlapping patterns
26
The Retention Time
27
Results – 2D noisy data
28
The Adaptive Isotope Wavelet Kernel
- […] denotes the Heaviside step function
- λ(m) is a linear function fit to the averagine model
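The formula on this slide was lost, so the following is not the authors' kernel but a sketch of the two ingredients named above: a Heaviside step that switches the kernel off for negative arguments, and a mass-dependent λ(m) that is linear in m. The coefficients `LAMBDA_SLOPE` and `LAMBDA_OFFSET` are hypothetical placeholders, not fitted averagine values, and the cosine times Poisson-like envelope is an assumed form.

```python
import math

LAMBDA_SLOPE = 1.0 / 1800.0  # hypothetical coefficient, not a fitted value
LAMBDA_OFFSET = 0.0          # hypothetical coefficient, not a fitted value


def lam(mass):
    """Linear stand-in for lambda(m)."""
    return LAMBDA_SLOPE * mass + LAMBDA_OFFSET


def heaviside(t):
    """Heaviside step function: 0 for t < 0, 1 otherwise."""
    return 1.0 if t >= 0 else 0.0


def adaptive_kernel(t, mass, z=1):
    """Illustrative mass-adaptive kernel value at offset t (assumed form)."""
    if heaviside(t) == 0.0:
        return 0.0
    u = z * t  # offset in units of the isotope spacing
    rate = lam(mass)
    envelope = math.exp(-rate) * rate ** u / math.gamma(u + 1.0)
    return math.cos(2.0 * math.pi * u) * envelope


def strongest_isotope(mass, n_peaks=4):
    """Index of the isotope position with the largest kernel value."""
    heights = [adaptive_kernel(float(k), mass) for k in range(n_peaks)]
    return heights.index(max(heights))


print(strongest_isotope(1000.0), strongest_isotope(2000.0))
```

With these placeholder coefficients the largest kernel lobe sits on the first isotope position at 1000 Da and moves to the second at 2000 Da, which is the kind of mass dependence the adaptive kernel is meant to capture.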