background - datascience755.files.wordpress.com · web viewslide segmentation separates the...


BMED 6780
Spring 2011
Professor May D. Wang

Technical Report: Cancer Genome Atlas

Advisor: Sonal Kothari

Team Members: Rajan Ananthan, Nak-Seung Patrick Hyun

April 19, 2011

Georgia Institute of Technology
College of Engineering
School of Electrical and Computer Engineering

I. Contents

II. Background
III. Methodology
    A. Slide Segmentation
    B. Color Segmentation
    C. Ground Truth for Assessment
IV. Results
    A. Slide Segmentation
    B. Color Segmentation
V. GUI Design
VI. Discussion
    A. Strengths and Weaknesses
    B. Scope
    C. Clinical Application

II. Background

The Cancer Genome Atlas (TCGA) was founded to aid research and development in detecting and diagnosing cancer. The TCGA data examined in this study consist of multi-resolution images of ovarian and brain cancer cells stained with Hematoxylin and Eosin (H&E), a common technique that differentiates cell components. H&E stains nuclei blue, cytoplasm pink, and red blood cells, if present, red. Automatic color segmentation, which is of immense value to histopathologists, has been used to distinguish these colors. This automation aids pathologists, who identify cells by examining nuclei shape and size. In addition, automating the process provides objectivity and repeatability that are not present in human observation. The design of the system is centered on two features: 1) slide segmentation and 2) color segmentation. Slide segmentation separates the useful portion of the tissue from the background.

In histopathology, quantitative metrics must focus on two key outcomes: 1) accuracy and 2) efficiency. To analyze accuracy, a semi-automated system for generating ground truth images was used. The results showed that the segmentations obtained with added texture information did not correlate strongly with the ground truth images.

III. Methodology

A. Slide Segmentation

Theory

The goal of slide segmentation is to separate the useful portions of tissue for image analysis. In this implementation, active contours were chosen to accomplish this task. Active contours, also called snakes, are advantageous in this application since the background (primarily all white) is different from the object being segmented, the tissue. The snake model specifically chosen is an implementation of the Chan-Vese active contour, which evolves the snake by minimizing an energy functional using a level set formulation.

$$F(c_1, c_2, C) = \mu \cdot \mathrm{Length}(C) + \nu \cdot \mathrm{Area}(\mathrm{inside}(C)) + \lambda_1 \int_{\mathrm{inside}(C)} |u_0(x, y) - c_1|^2 \, dx \, dy + \lambda_2 \int_{\mathrm{outside}(C)} |u_0(x, y) - c_2|^2 \, dx \, dy$$

Here u_0 is the image intensity, C is the contour, and c_1 and c_2 are the average intensities inside and outside C. The μ parameter controls the size of the objects that are segmented. If μ is small, smaller objects are detected; if μ is large, only larger objects are detected. Since the length is minimized, the contour will wrap tightly around the object being segmented. For our implementation, μ = 0.5. The ν parameter has a similar effect and for our implementation, ν = 0. The λ1 and λ2 parameters adjust the emphasis on the interior and exterior mean-squared intensity differences.
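To make the roles of the terms concrete, the following is a minimal sketch of a discrete version of the energy, not the report's implementation. It assumes a grayscale image stored as nested lists and a binary mask standing in for the contour, approximates Length(C) by counting neighbouring pixel pairs with different labels, and uses λ1 = λ2 = 1, which the report does not specify. The function name `chan_vese_energy` and the toy image are hypothetical.

```python
def chan_vese_energy(image, mask, mu=0.5, nu=0.0, lam1=1.0, lam2=1.0):
    """Discrete Chan-Vese energy of a binary mask (True = inside the contour).

    mu = 0.5 and nu = 0 follow the report; lam1 = lam2 = 1 is an assumption.
    """
    rows, cols = len(image), len(image[0])
    inside = [image[r][c] for r in range(rows) for c in range(cols) if mask[r][c]]
    outside = [image[r][c] for r in range(rows) for c in range(cols) if not mask[r][c]]

    # c1, c2: average intensity inside and outside the contour
    c1 = sum(inside) / len(inside) if inside else 0.0
    c2 = sum(outside) / len(outside) if outside else 0.0

    # Length(C) approximated by the number of 4-neighbour pairs with different labels
    length = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and mask[r][c] != mask[r][c + 1]:
                length += 1
            if r + 1 < rows and mask[r][c] != mask[r + 1][c]:
                length += 1

    fit_inside = sum((v - c1) ** 2 for v in inside)
    fit_outside = sum((v - c2) ** 2 for v in outside)
    return mu * length + nu * len(inside) + lam1 * fit_inside + lam2 * fit_outside


# Toy slide: a dark 2x2 "tissue" block on a white background
image = [[255, 255, 255, 255],
         [255,  50,  50, 255],
         [255,  50,  50, 255],
         [255, 255, 255, 255]]
good_mask = [[False, False, False, False],
             [False, True,  True,  False],
             [False, True,  True,  False],
             [False, False, False, False]]
bad_mask = [[False, False, False, False],
            [True,  True,  False, False],
            [True,  True,  False, False],
            [False, False, False, False]]

good_energy = chan_vese_energy(image, good_mask)
bad_energy = chan_vese_energy(image, bad_mask)
```

When the mask matches the tissue exactly, both fitting terms vanish and only the length penalty remains, so the correctly placed mask has a much lower energy than the shifted one.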

Level Set Method

The level set method is implicit and parameter-free and thus provides a way to model complex shape features. In addition, intrinsic geometric properties of the curve (e.g., normal and curvature) can be estimated directly from the level set function. The key feature of the level set method is that topological changes are handled automatically, since the contour is embedded as the zero level set of a function defined in a higher-dimensional space.

Implementation

Initialization

The program for slide segmentation requires an initial binary mask as a starting point for contour evolution. To obtain this mask efficiently and accurately, Otsu’s method is used to determine the initial region that the tissue occupies. Otsu’s method automatically selects a threshold from the image histogram. The algorithm separates the image into foreground and background by choosing the threshold that minimizes the intra-class (within-class) variance. The next step in initialization is to dilate the mask that results from Otsu’s method until one connected cluster is obtained. This ensures that all pixels that may have diagnostic value are included in the initial mask. Once the initial mask is available, the signed distance function (SDF) is computed, with points on the interior having negative values and points on the exterior having positive values.
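The thresholding step can be illustrated with a short sketch of Otsu's method. It is an illustration only, assuming 8-bit grayscale values in a flat list and assuming that tissue is darker than the white background, so pixels at or below the threshold are treated as tissue. It maximizes the between-class variance, which is equivalent to minimizing the within-class variance. The function name `otsu_threshold` and the sample pixel values are hypothetical, and the dilation and SDF steps are not shown.

```python
def otsu_threshold(pixels):
    """Return the threshold t in 0..255 that best splits pixels into <= t and > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no valid split at this t
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1 / total**2
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


# Bimodal toy data: dark tissue values and bright background values
pixels = [10] * 50 + [12] * 50 + [240] * 60 + [250] * 40
t = otsu_threshold(pixels)
tissue = [p <= t for p in pixels]  # initial binary mask, True = tissue
```

On this toy data the threshold falls between the dark and bright groups, so the 100 dark pixels form the initial mask.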

Gradient Descent

The gradient descent method is used to minimize the energy functional. The fundamental principle is that repeatedly taking sufficiently small steps proportional to the negative of the gradient moves the solution toward a local minimum. For the Chan-Vese algorithm, the Euler-Lagrange equation is formulated and solved numerically using gradient descent.
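The principle can be shown on a one-dimensional function. The sketch below is a generic gradient descent loop, not the Chan-Vese solver itself. The step size, starting point, and test function f(x) = (x - 3)^2 are illustrative assumptions, and the loop stops after a fixed number of iterations, in the same spirit as the iteration limit used in this report.

```python
def gradient_descent(grad, x0, step=0.1, max_iter=200):
    """Take max_iter steps of size `step` along the negative gradient."""
    x = x0
    for _ in range(max_iter):
        x = x - step * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient f'(x) = 2 (x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

With this step size the distance to the minimum shrinks by a constant factor at every iteration, so the result is very close to 3 after the fixed iteration budget.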

Convergence Criteria

The stopping criterion is a user-selected maximum number of iterations. For the sake of efficiency, this method was chosen over a convergence threshold. Since the system places more weight on color segmentation, which is the more time-consuming process, this choice increases efficiency while minimally sacrificing accuracy.

Optimization

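Level set evolution equation:

$$\frac{\partial \phi}{\partial t} = \delta(\phi) \left[ \mu \, \kappa(\phi) \, |\nabla \phi| - \nu - \lambda_1 (u_0 - c_1)^2 + \lambda_2 (u_0 - c_2)^2 \right]$$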

To reduce computational complexity, the level set function is computed over a narrow band near the contour (zero level set). Since fewer pixels are used, the time required decreases significantly. Also, this does not significantly affect the accuracy of the segmentation since most slides contain one or two tissue samples.
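The narrow-band idea can be sketched as a selection of the pixels whose level set value is close to zero. The example below is an illustration under assumptions, not the report's code: the level set function is a signed distance to a circle (negative inside, positive outside, matching the convention described for the SDF), and the band half-width of 1.0 is an arbitrary choice. The function names are hypothetical.

```python
import math


def signed_distance_to_circle(size, center, radius):
    """Signed distance to a circle on a size x size grid (negative inside)."""
    return [[math.hypot(r - center[0], c - center[1]) - radius
             for c in range(size)]
            for r in range(size)]


def narrow_band(phi, width):
    """Return the (row, col) positions where |phi| is at most `width`."""
    return [(r, c)
            for r, row in enumerate(phi)
            for c, value in enumerate(row)
            if abs(value) <= width]


phi = signed_distance_to_circle(size=11, center=(5, 5), radius=3.0)
band = narrow_band(phi, width=1.0)
fraction_updated = len(band) / (11 * 11)
```

Only the positions in `band` would be updated in each iteration, so the cost per iteration scales with the contour length rather than with the image area.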

B. Color Segmentation

C. Ground Truth for Assessment

Ground truth images were generated semi-automatically through a graphical user interface. The user selects 3 or 4 colors which are used as initialization for one iteration of k-means. The GUI is shown below.

After selecting the appropriate colors, the segmentation will appear in the same window and the user can toggle between the two views by pressing “Reset” to view the original image and “Show Color” to view the ground truth image. Another feature is the ability to control the weighting of each color by adjusting the sliders. Also, since it is very difficult to pinpoint where the user is clicking, the selected color is shown on the right of the sliders.
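A single k-means iteration seeded with user-selected colors can be sketched as follows. This is a minimal illustration that assumes RGB tuples and squared Euclidean distance in RGB space. The slider weighting is not modeled because the report does not describe how the weights are applied, and the function name and sample colors are hypothetical.

```python
def kmeans_one_iteration(pixels, seeds):
    """Assign each pixel to its nearest seed color, then recompute the centroids."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Assignment step: index of the closest seed for every pixel
    labels = [min(range(len(seeds)), key=lambda k: dist2(p, seeds[k]))
              for p in pixels]

    # Update step: mean color of each cluster (keep the seed if a cluster is empty)
    centroids = []
    for k, seed in enumerate(seeds):
        members = [p for p, label in zip(pixels, labels) if label == k]
        if members:
            centroids.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
        else:
            centroids.append(tuple(float(v) for v in seed))
    return labels, centroids


# Hypothetical clicks: bluish nuclei, pink cytoplasm, white background
seeds = [(65, 45, 155), (235, 155, 195), (255, 255, 255)]
pixels = [(60, 40, 150), (70, 50, 160),
          (230, 150, 190), (240, 160, 200),
          (250, 250, 250)]
labels, centroids = kmeans_one_iteration(pixels, seeds)
```

The resulting `labels` play the role of the ground truth class for each pixel.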

IV. Results

A. Slide Segmentation

The results of slide segmentation are shown in the figure below.

Upon closer examination, the following observations can be made:

1. There are regions where the object is over-segmented (red square).
2. The segmentation includes stain blobs (blue circle).
3. The segmentation is invariant to small grainy noise (purple circle on bottom).
4. The algorithm is clearly capable of detecting oddly shaped objects.

B. Color Segmentation

V. GUI Design

The graphical user interface was designed with two panels so that the user can select different sections of the image, which is not possible with a single-window display. The user can select the number of iterations with a slider bar that ranges from 100 to 250. This feature gives the user more control over the segmentation and makes it possible to compensate for over-segmentation or under-segmentation. Because the stopping rule is a fixed iteration count rather than a convergence criterion, this approach avoids the slow convergence issue.

VI. Discussion

A. Strengths and Weaknesses

Slide Segmentation

Advantages:
- Works well with images with strong background/foreground separation
- Not sensitive to pixelated (grainy) background noise
- Segments oddly shaped objects and detects interior contours
- Detects blurred boundaries, which are typical in histology slides
- Automatic detection of changes in topology
- Speed

Disadvantages:
- Sensitive to initialization
- Difficult to segment objects that have a similar average intensity to the background
- Segments small clusters of cells or stain blobs
- No convergence criterion (semi-automatic)

B. Clinical Application/Future Work

It has already been established that combining texture features with the Gaussian Mixture Model may not result in a strong correlation with the ground truth when tested on TCGA brain and ovarian cells. The scope would need to be broadened to include data sets for other tissue types before drawing any final conclusions about the validity of this approach. In terms of a clinical application, the system may be enhanced to include different texture measures or to incorporate unsupervised learning techniques such as the self-organizing map (SOM) instead of supervised training data. This would increase the computational complexity but may be more robust to different types of cancer tissue, since a user-chosen set of training data may not capture all the variation in clinical data.

C. Insights

This project solidified our understanding of active contours and mixture models and gave us an understanding of the strengths and weaknesses of our proposed algorithm. For slide segmentation, the Chan-Vese approach is a clear choice for segmenting tissue samples from the background. For color segmentation, the EM Gaussian method with texture may not be suited to ovarian and brain tissue.
