Spatial segmentation of imaging mass spectrometry data with edge-preserving image denoising and clustering

Theodore Alexandrov, Michael Becker, Sören Deininger, Günther Ernst, Liane Wehder, Markus Grasmair, Ferdinand von Eggeling, Herbert Thiele, and Peter Maass

Upload: emma

Post on 24-Feb-2016

Page 1: Outline

Spatial segmentation of imaging mass spectrometry data with edge-preserving image denoising and clustering

Theodore Alexandrov, Michael Becker, Sören Deininger, Günther Ernst, Liane Wehder, Markus Grasmair, Ferdinand von Eggeling, Herbert Thiele, and Peter Maass

Page 2: Outline

Outline

Background on MS Imaging and goals of paper

Methods

Results

Conclusions and Criticism

Page 4: Outline

Background: what is MS imaging?

In the words of almighty Wikipedia:

Mass spectrometry imaging is a technique used in mass spectrometry to visualize the spatial distribution of e.g. compounds, biomarker, metabolites, peptides or proteins by their molecular masses.

Or in images:

Page 5: Outline

Goals of this paper:

To propose a new procedure for spatial segmentation of MALDI-imaging datasets.

This procedure clusters all spectra into different groups based on their similarity.

This partition is represented by a segmentation map, which helps to understand the spatial structure of the sample.

Page 6: Outline

Goal: in images…(it is MS Imaging after all)

Page 7: Outline

Why?

Current multivariate methods (e.g. PCA) were not designed for MS data, and their output cannot be used to interpret the data directly.

Current clustering algorithms do not take spatial information into account.

Here, we assume that spectra close to each other should be similar.

Page 8: Outline

Outline

Background on MS Imaging and goals of paper

Methods

Results

Conclusions and Criticism

Page 9: Outline

Overview: Pipeline

Page 10: Outline

Datasets

Rat brain coronal section
◦ 80 µm raster
◦ 200 laser shots per position; 20185 spectra
◦ Data acquired: 2.5 kDa-25 kDa
◦ Data considered: 2.5 kDa-10 kDa; 3045 points

Section of neuroendocrine tumor (NET) invading the small intestine
◦ 50 µm raster
◦ 300 laser shots per position; 27360 spectra
◦ Data acquired: 1 kDa-30 kDa
◦ Data considered: 3.2 kDa-18 kDa; 5027 points

Page 11: Outline

Spectra Preprocessing

Baseline correction
◦ TopHat algorithm, minimal baseline width set to 10%, default in ClinProTools

No normalization

No binning

ASCII -> Matlab
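To make the baseline-correction step concrete, here is a minimal pure-Python sketch of a top-hat filter: the baseline is estimated by a morphological opening (erosion followed by dilation) and subtracted from the spectrum. The function names are hypothetical, and reading the 10% setting as a window covering 10% of the spectrum length is an assumption; ClinProTools may define its baseline width differently.

```python
def sliding_extreme(values, width, pick):
    """Apply `pick` (min or max) over a centered window of about `width` samples."""
    half = width // 2
    n = len(values)
    return [pick(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def tophat_baseline_correct(spectrum, width_fraction=0.10):
    """Subtract a morphological-opening baseline estimate from a spectrum."""
    # Assumption: the window spans `width_fraction` of the spectrum length.
    width = max(3, int(len(spectrum) * width_fraction))
    eroded = sliding_extreme(spectrum, width, min)    # erosion
    baseline = sliding_extreme(eroded, width, max)    # dilation -> opening
    return [s - b for s, b in zip(spectrum, baseline)]


# Synthetic spectrum: slowly decaying baseline plus one narrow peak at index 50.
spectrum = [100.0 - 0.5 * i for i in range(100)]
spectrum[50] += 40.0
corrected = tophat_baseline_correct(spectrum)
```

In this toy case the sloping baseline is removed almost everywhere and the narrow peak survives with close to its original height. The truncated windows at the low-index border leave a small residue there.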

Page 12: Outline

Peak-Picking

Part 1: conventional peak picking applied to each 10th spectrum. Select 10 peaks.
◦ Orthogonal Matching Pursuit (OMP) because it is fast and simple
◦ Gaussian kernel deconvolution

Part 2: keep consensus peaks:
◦ Only keep peaks that appear in at least 1% of the considered spectra
◦ Omit spurious peaks
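The consensus step (Part 2) can be sketched as a simple vote count. This is an illustration under assumptions, not the authors' code: `consensus_peaks` is a hypothetical helper, peaks are represented as integer m/z bin indices, and positions are matched exactly, whereas real data would likely need a tolerance when matching peaks across spectra.

```python
from collections import Counter


def consensus_peaks(peak_lists, min_fraction=0.01):
    """Return peak positions found in at least `min_fraction` of the spectra."""
    # Each spectrum contributes at most one vote per peak position.
    counts = Counter(p for peaks in peak_lists for p in set(peaks))
    min_count = min_fraction * len(peak_lists)
    return sorted(p for p, c in counts.items() if c >= min_count)


# Hypothetical toy input: peaks picked in every 10th of 10000 spectra.
considered = range(0, 10000, 10)               # 1000 spectra
peak_lists = [[120, 455] for _ in considered]  # two peaks present everywhere
for peaks in peak_lists[:50]:
    peaks.append(777)                          # found in 5% of considered spectra
for peaks in peak_lists[:3]:
    peaks.append(901)                          # found in 0.3%: treated as spurious
kept = consensus_peaks(peak_lists)
```

With the 1% threshold, the peak at position 901 is dropped because it occurs in only 3 of the 1000 considered spectra, while 777 is kept even though it is absent from most of them.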

Page 13: Outline

Edge-preserving denoising of m/z images

Imaging dataset is a reduced datacube with 3 coordinates: x, y, m/z (reduced in m/z dimension by peak picking)

MALDI-imaging data is noisy

Must be able to keep fine anatomical or histological details

Grasmair modification of Total Variation minimizing Chambolle algorithm
◦ Parameter θ between 0.5 and 1: smoothness of resulting image

Page 14: Outline

Edge-preserving denoising of m/z images

Total variation (TV) ~ sum of absolute differences between neighboring pixels

Chambolle algorithm searches for an approximation of the image with small TV

Chambolle algorithm => smoothness adjusted globally by manually choosing a parameter

Grasmair locally adapts denoising parameter of Chambolle
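The TV definition above can be illustrated with a few lines of Python. This sketch computes the anisotropic form (horizontal plus vertical absolute differences), which matches the slide's informal description; Chambolle's algorithm is usually stated for the isotropic variant, so treat this as an illustration of the quantity being minimized, not of the algorithm itself. The function name is hypothetical.

```python
def total_variation(image):
    """Sum of absolute differences between horizontally and vertically adjacent pixels."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                tv += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:
                tv += abs(image[r + 1][c] - image[r][c])
    return tv


# A clean step edge versus the same edge with one noisy pixel.
clean = [[0, 0, 1, 1] for _ in range(4)]
noisy = [row[:] for row in clean]
noisy[1][0] = 1   # isolated outlier in the flat region

tv_clean = total_variation(clean)   # 4.0: the edge costs its jump height once per row
tv_noisy = total_variation(noisy)   # 7.0: one outlier adds 3 to the total
```

A sharp edge contributes only its jump height, while an isolated noisy pixel adds differences with several of its neighbors. This is why searching for a small-TV approximation tends to suppress noise without blurring edges.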

Page 15: Outline

Clustering

Specify number of clusters a priori

High Dimensional Discriminant Clustering (HDDC)
◦ Available in Matlab toolbox
◦ Each cluster is modeled by a Gaussian distribution with its own covariance structure.
◦ HDDC developed for high-dimensional data (d > 10)
◦ Note: In Matlab HDDC = high-dimensional data clustering
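The following sketch shows how clustering spectra leads to a segmentation map. It deliberately swaps in plain k-means as a stand-in, since HDDC itself (Gaussian clusters with individual covariance structure) is not reproduced here. All function names, the deterministic initialization, and the toy data are assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on lists of floats; returns one cluster label per point."""
    # Deterministic initialization for the sketch: evenly spaced input points.
    n = len(points)
    centers = [list(points[i * (n - 1) // max(k - 1, 1)]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels[i] = dists.index(min(dists))
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def segmentation_map(labels, coords, height, width):
    """Place each spectrum's cluster label at its (row, col) pixel position."""
    seg = [[-1] * width for _ in range(height)]
    for lab, (r, c) in zip(labels, coords):
        seg[r][c] = lab
    return seg


# Toy 2x4 "tissue": left and right halves have different peak intensities.
coords = [(r, c) for r in range(2) for c in range(4)]
spectra = [[5.0, 0.1] if c < 2 else [0.2, 4.0] for r, c in coords]
labels = kmeans(spectra, k=2)
seg = segmentation_map(labels, coords, height=2, width=4)
```

The clustering itself never sees pixel coordinates; spatial structure appears only when the labels are written back to their positions. In the pipeline described here, spatial information enters earlier, through denoising of the m/z images.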

Page 16: Outline

Outline

Background on MS Imaging and goals of paper

Methods

Results

Conclusions and Criticism

Page 17: Outline

Rat brain: peak picking

Used 2019 spectra out of 20185 (10%)

Potential peaks: 373 peaks (red triangles)

Consensus peaks: 110 peaks (green triangles)
◦ Present in at least 20 spectra out of the 2019 (1%)

Discarded peaks mostly in low m/z regions

Hypothesize they are noise peaks because MALDI imaging spectra have high baseline in low m/z region.
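The counts on this slide can be reproduced with a short arithmetic check. Two details are assumptions: that the subsampling starts at the first spectrum, and that the 1% threshold is rounded down.

```python
total_spectra = 20185
considered = range(0, total_spectra, 10)   # every 10th spectrum, starting at the first
n_considered = len(considered)             # 2019
threshold = n_considered // 100            # 1% of 2019, rounded down: 20
```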

Page 18: Outline

Rat brain: peak picking

OMP successfully detects major peaks

Gaussian function provides reasonable approximation of peak shape

Page 19: Outline

Rat brain: noise in MALDI-imaging

Strong noise

Noise variance changes within m/z image and between m/z images

Noise variance is linearly proportional to peak intensity

Page 20: Outline

Rat brain: noise in MALDI-imaging

Page 21: Outline

Edge-preserving denoising

Apply Grasmair method to selected 110 consensus peaks

Efficiently removes the noise while not smoothing out edges

Page 22: Outline

Rat brain: segmentation map

Shows anatomical features

Restricted to spatial resolution of MALDI-imaging dataset

Page 23: Outline

Rat brain: importance of edge-preserving denoising

No denoising: borders do not match as well

3x3 median smoothing: bad edge preservation

5x5 median smoothing: lose many regions
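A small sketch of why median smoothing can lose fine regions: a structure narrower than half the window is outvoted by its surroundings and disappears. The helper below is a hypothetical pure-Python median filter with edge replication, run on a toy image rather than on the rat brain data.

```python
from statistics import median


def median_filter(image, size=3):
    """Square median filter with edge replication at the borders."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            out[r][c] = median(window)
    return out


# A one-pixel-wide vertical structure (column 1) next to a broad region (columns 4-6).
image = [[0, 1, 0, 0, 1, 1, 1] for _ in range(5)]
smoothed = median_filter(image, size=3)
```

In this toy image the one-pixel-wide line is erased by the 3x3 window while the broad region is kept, which mirrors the loss of small regions noted above for larger windows.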

Page 24: Outline

Rat brain: co-localized masses

Find mass values expressed in region
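The slide does not say how co-localized masses are found. One plausible approach, shown here purely as an assumption, is to rank m/z images by their Pearson correlation with the binary mask of a cluster. The function names and the m/z values in the example are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def colocalized_masses(mz_images, labels, cluster, top=3):
    """Rank m/z values by correlation of their image with a cluster mask."""
    mask = [1.0 if lab == cluster else 0.0 for lab in labels]
    scores = {mz: pearson(img, mask) for mz, img in mz_images.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical flattened m/z images (one intensity per pixel) and cluster labels.
labels = [0, 0, 0, 1, 1, 1]
mz_images = {
    5000.0: [9.0, 8.0, 9.5, 1.0, 0.5, 1.2],   # high inside cluster 0
    6000.0: [0.8, 1.1, 0.9, 7.5, 8.0, 7.0],   # high inside cluster 1
    7000.0: [3.0, 5.0, 4.0, 4.0, 3.0, 5.0],   # no spatial preference
}
best_for_cluster_1 = colocalized_masses(mz_images, labels, cluster=1, top=1)
```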

Page 25: Outline

Rat brain: the role of parameters (peak picking)

3 main parameters in addition to peak width
◦ Portion of spectra considered for peak picking (each 10th spectrum)
◦ Number of peaks selected for each spectrum (10 peaks)
◦ Percentage of spectra where peak is found for consensus peak list (1%)

Page 26: Outline

Rat brain: the role of parameters (peak picking)

Robust to changes of second and third parameter

[Figure: segmentation maps for 5, 10, and 20 peaks per spectrum and consensus thresholds of 0.1%, 1%, and 5%]

Page 27: Outline

Rat brain: the role of parameters (peak picking)

Increase of parameter 1 can be compensated by higher value for parameter 2

[Figure panels: each 5th spectrum; each 20th spectrum]

Page 28: Outline

Rat brain: the role of parameters (denoising and number of clusters)

Segmentation maps for
◦ 3 levels of denoising (0.6, 0.7, 0.8)
◦ 3 numbers of clusters (6, 8, 10)

Decreasing the number of clusters merges features

Too much denoising causes loss of structural details

Page 29: Outline

Rat brain: the role of parameters (denoising and number of clusters)

Page 30: Outline

Human neuroendocrine tumor dataset

Page 31: Outline

Human neuroendocrine tumor dataset

Page 32: Outline

Outline

Background on MS Imaging and goals of paper

Methods

Results

Conclusions and Criticism

Page 33: Outline

Conclusions

Peak picking: usually done on mean spectrum
◦ 1% consensus better for peaks in small spatial area

Edge-preserving denoising
◦ One study with average moving window and one study posthoc to improve classification

Clustering methods
◦ HDDC better results than k-means but significantly slower
◦ Currently, mostly hierarchical clustering = memory intensive

Importance to cancer studies
◦ Represents a proteomic functional topographic map

Page 34: Outline

Criticism

Didn't explain why they got rid of part of the range for which the data was acquired

Dataset reduction by peak picking
◦ Done initially on a per-spectrum basis, so it may get rid of lower-abundance peaks which still show an interesting image
◦ Also, because a peak must be present in 1% of the 10% selected spectra, smaller regions of interest can be missed if the 10% selection is bad

Highly parameterized + slow running time would make it hard to run many trials

Page 35: Outline

Thank you

Page 36: Outline

TV-minimization (Grasmair slides)

Page 37: Outline

TV-minimization (Grasmair slides)