IEEE Proceedings of 4th International Conference on Intelligent Human Computer Interaction, Kharagpur, India, December 27-29, 2012
Kernel Induced Rough c–means Clustering for Lymphocyte Image Segmentation
Subrajeet Mohapatra∗
Department of Information Technology
Birla Institute of Technology Mesra
Ranchi–835215, India
Email: [email protected]
Dipti Patra, Sunil Kumar∗
Department of Electrical Engineering
National Institute of Technology Rourkela
Rourkela–769008, Odisha
Email: [email protected]
Sanghamitra Satpathi
Department of Pathology
Ispat General Hospital
Rourkela–769008, Orissa
Email: [email protected]
Abstract—Blood microscopic image segmentation is a fundamental tool for automated diagnosis of hematological disorders. In particular, lymphoblast image segmentation acts as the foundation for all image based leukemia diagnostic systems. Precision in image segmentation is a necessary condition for improving the diagnostic accuracy in automated cytology. Since the diagnostic information content of the segmented images is plentiful, suitable segmentation routines need to be developed for better disease recognition. In this paper, a Kernel Induced Rough C–means (KIRCM) clustering algorithm is introduced for the segmentation of human lymphocyte images. Rough C–means clustering (RCM) is performed in a higher dimensional feature space to obtain improved segmentation accuracy and to facilitate automated Acute Lymphoblastic Leukemia (ALL) detection. Comparative analysis reveals that the use of rough sets in kernel space clustering for leukocyte segmentation gives the proposed scheme an edge over existing schemes.
Keywords-Lymphocyte, image segmentation, parametric kernel, clustering, rough sets
Stained peripheral blood smear is an essential component in
the laboratory diagnosis of hematological disorders. Peripheral
blood is considered important, as the blood cells are easily
accessible indicators of disturbances in their organs of origin
or degradation, which are much less accessible for diagnosis.
Thus, changes in the cellular components of the blood allow
important inferences to be drawn about various disease
conditions. Among all, hematological abnormalities are considered
crucial and need immediate medication independent of age,
sex and race. Among all blood cells, the leukocyte is an important
diagnostic indicator for various blood disorders including
cancer. Leukocyte disorders can be classified as neoplastic
(cancerous) or non–neoplastic. Leukemia is one of the
potentially fatal neoplastic disorders of leukocytes and is the
subject of the present study. It can be understood as a hematological
malignancy with increased numbers of myeloid or lymphoid
blasts and can be acute or chronic depending on the severity
of the disease. In the present work we consider only acute
lymphoblastic leukemia (ALL).
Diagnosis of ALL through human visual assessment is always
associated with several lacunae, i.e. increased patient waiting
time and inconsistent, subjective diagnosis reports. Usage
of advanced techniques like flow cytometry, immunophenotyping,
molecular probing etc. for ALL screening is also limited in
developing countries due to high cost. Thus there is always a
need for a consistent as well as cost effective automated ALL
screening system which can improve the diagnostic accuracy
without bias.
Although image segmentation is fundamental, no single method
suits all types of images. Therefore, a specific algorithm has
to be developed for lymphocyte images, which will facilitate
the development of an automated ALL screening system. Over the
years various peripheral blood smear or bone marrow segmentation
methods have been proposed; they are mostly shape, threshold,
region growing or edge based schemes. WBC segmentation based on
simple thresholding followed by contour identification using
shape analysis was introduced by Liao and Deng [1]. Yang et al. [2]
used a color GVF active contour model for leukocyte segmentation.
Use of classical watershed segmentation for nuclear and cytoplasm
region extraction in lymphocyte images was initially presented
by Angulo et al. [3]. An automated leukocyte segmentation
scheme using Gaussian mixture modeling and the EM algorithm
was proposed by Sinha et al. [4]. Dorini et al. [5] used a
watershed transform based on the image forest transform to extract
the WBC nucleus. Ghosh et al. [6] proposed a threshold detection
scheme using fuzzy divergence for leukocyte segmentation.
Although the aforementioned leukocyte segmentation schemes
are reasonably successful in nucleus extraction, they perform
poorly when it comes to cytoplasm extraction. As cytoplasm
is a decisive morphological component of blood for ALL
detection, utmost care must be taken during its extraction.
Due to staining it is feasible to classify each color pixel of a
peripheral blood smear image into one of three classes, i.e.
cytoplasm, nucleus and background. However, color pixel
resemblance between the cytoplasm and background regions makes
the segmentation a more difficult problem. Consequently, to
achieve a satisfactory segmentation accuracy in terms of cytoplasm
and nucleus region extraction, a novel kernel induced hybrid
clustering scheme is proposed for lymphocyte image
segmentation.
The rest of the paper is organized as follows: Section I describes
the schema of the proposed method. A review of the foundation
of kernel space clustering is presented in Section II.
The proposed KIRCM clustering algorithm for leukocyte image
segmentation is outlined in Section III. Experiments and
results are described in Section IV. Sections V and VI present
a discussion and conclusion of this work respectively.
978-1-4673-4369-5/12/$31.00 ©2012 IEEE
I. Material and Methods
A. Blood Smear Image Acquisition
Blood samples were collected at Ispat General Hospital,
Rourkela, India through randomization. Subsequently, blood
smears were prepared and stained using Leishman stain for
visualization of cell components. A total of 165 blood smear
images were captured with a digital microscope (Carl Zeiss
India) under a 100X oil immersion setting, with an effective
magnification of 1000. Manual segmentation was performed
by a panel of hematologists headed by one of the authors of
the present work.
B. Sub Imaging
Peripheral blood smear images are relatively large, with
more than one leukocyte per image. However, the desired
region of interest (ROI) must contain a single lymphocyte
only for ALL detection, since each lymphocyte
in the entire blood smear image has to be evaluated for
differentiating a lymphoblast from a mature lymphocyte. For
a detailed process of subimaging the reader is referred to our
previous work [7]. Sample sub images containing a single
lymphocyte only are shown in Fig. 1.
(a) IGH1aLB (b) IGH1bLB (c) IGH1LB
Fig. 1. Cropped Sub Images (Lymphocytes)
C. Color Space Conversion
Blood microscopic images are acquired in RGB color space.
A colorimetric transformation of the initial color coordinate
system, i.e. RGB, is essential to obtain a color space whose
representation of the color data is best suited to the
segmentation process [8]. The L∗a∗b∗ color
model is a suitable alternative for image segmentation, as the
color dimension is reduced [9] and the color channels are
uncorrelated. This color space consists of a luminosity layer
L∗ and a set of chromaticity layers a∗ and b∗ which contain
the color information. Transforming the blood microscopic images
from RGB to CIELAB and retaining only the chromaticity layers
reduces the color dimension of the pixels from three (RGB) to
two (a∗ and b∗) and facilitates faster clustering.
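The conversion described above can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' implementation: it assumes 8-bit sRGB input and the D65 reference white, neither of which is stated in the paper, and the function names `rgb_to_lab` and `ab_features` are hypothetical.

```python
import numpy as np

# Assumption: input is standard sRGB with a D65 white point.
_RGB2XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                     [0.2126729, 0.7151522, 0.0721750],
                     [0.0193339, 0.1191920, 0.9503041]])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def rgb_to_lab(rgb):
    """Convert an (H, W, 3) uint8 sRGB image to a float CIELAB image."""
    c = np.asarray(rgb, dtype=np.float64) / 255.0
    # Undo the sRGB gamma curve to obtain linear RGB.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalised by the reference white.
    xyz = lin @ _RGB2XYZ.T / _WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3,
                 np.cbrt(xyz),
                 xyz / (3.0 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def ab_features(rgb):
    """Drop L* and return an (H*W, 2) array of (a*, b*) values per pixel."""
    lab = rgb_to_lab(rgb)
    return lab[..., 1:].reshape(-1, 2)
```

Each row of the array returned by `ab_features` is the two-dimensional color feature of one pixel, in row-major pixel order, so cluster labels can later be reshaped back to the image grid.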
D. Image Segmentation
Color features at each pixel are mapped into a much
higher dimensional Hilbert space, making the data more easily
separable. RCM clustering is performed on this kernelised data
to assign each pixel to one of the regions, i.e. cytoplasm,
nucleus or background. It is observed that the distances
computed between the data patterns and the cluster centers
in the Hilbert space are more robust to outliers, providing
better segmentation accuracy. Details of lymphocyte image
segmentation using the proposed RCM clustering in kernel
space are presented in Section III.
II. Kernel Space Clustering
A. Kernel Space
Clustering is an unsupervised classification of data patterns
and has been used in diverse areas such as pattern
recognition, data mining, image processing and marketing.
Popular clustering algorithms, i.e. K–means and Fuzzy C
Means, have been widely used in the task of image segmentation.
However, these algorithms are unable to cluster
non–linearly separable data such as images, resulting in poor
segmentation performance. Feature space transformation using
kernels is necessary for the clustering of nonlinear image
data (color and texture) to improve segmentation
performance. Kernel functions are used to transform the data
in the image plane into a feature plane of higher (possibly
infinite) dimension known as kernel space. A nonlinear mapping
function φ transforms the nonlinear separation problem
in the image plane into a linear separation problem in kernel
space, facilitating clustering in the feature space. Due
to the high and possibly infinite feature dimension, it is unrealistic
to measure the Euclidean distance between the transformed
variables directly. However, as per Mercer’s theorem, working
directly on the transformed variables can be avoided. Mercer’s
theorem can be used to calculate the distance between the
pixel feature values in the kernel space without knowing the
transformation function φ(.), as presented in Section II-B.
B. Mercer’s Theorem
Consider φ(.) to be a nonlinear mapping function from
the observation space I to a higher dimensional
feature space J. Let x and y be two points in
the image plane, each representing a pixel with color values,
and let φ(x) and φ(y) be their corresponding kernelised values in
the feature plane. The squared Euclidean distance
between φ(x) and φ(y) in the feature space can be represented
as:
\[ J_k(x, y) = \|\phi(x) - \phi(y)\|^2 \tag{1} \]
As per Mercer’s theorem, any continuous, symmetric,
positive semidefinite kernel function can be expressed as a dot
product in a higher dimensional space. Therefore it is unnecessary
to know the transfer function while calculating the distance in
the feature plane.
The transfer function φ(·) is usually not defined explicitly;
instead the kernel function k is given and is defined as
\[ k(x, y) = \phi(x)^T \cdot \phi(y) \quad \forall (x, y) \in I^2 \tag{2} \]
where "·" is the dot product in the kernel space.
Thus (1) can be represented in terms of the kernel function as
\[
\begin{aligned}
J_k(x, y) &= \|\phi(x) - \phi(y)\|^2 = (\phi(x) - \phi(y))^T \cdot (\phi(x) - \phi(y)) \\
          &= \phi(x)^T \phi(x) - \phi(y)^T \phi(x) - \phi(x)^T \phi(y) + \phi(y)^T \phi(y) \\
          &= k(x, x) - k(x, y) - k(x, y) + k(y, y) \\
          &= k(x, x) - 2k(x, y) + k(y, y), \quad \forall (x, y) \in I^2
\end{aligned}
\tag{3}
\]
where J_k(x, y) is the non–Euclidean distance measure in the
original data space corresponding to the squared norm in the
kernel space. This distance provides more linear separability
among features when compared to the simple Euclidean distance
measure. Some standard kernel functions are listed in Table I.
TABLE I
Kernel Functions
Kernel        Expression
Linear        $k(x, y) = x^T y + c$
Gaussian      $k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$
Exponential   $k(x, y) = \exp(-\|x - y\| / 2\sigma^2)$
Sigmoid       $k(x, y) = \tanh(c\,(x^T \cdot y) + \theta)$
Polynomial    $k(x, y) = (x \cdot y + c)^d$
Using kernel functions, the non–Euclidean distance between
feature points can be measured without defining the transfer
function φ(·). The objective of the proposed algorithm is to
nonlinearly transform the lymphocyte image data, in the form of
color (a∗ and b∗) features, into a high dimensional kernel
space and then perform clustering. Accordingly, Rough
C–means clustering is performed on this kernelised data for
the segmentation of lymphocyte images. The Rough C–means
algorithm along with the proposed Kernel Induced Rough
C–means (KIRCM) clustering algorithm for lymphocyte image
segmentation is presented in the next section.
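As an illustration of Eq. (3), the kernel-induced distance can be evaluated directly from a kernel function, without ever forming φ(·). The sketch below uses the Gaussian and polynomial kernels of Table I. The function names and the default parameter values (`sigma`, `c`, `d`) are assumptions made for the example and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel of Table I, evaluated over the last axis."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel of Table I, evaluated over the last axis."""
    dot = np.sum(np.asarray(x, dtype=float) * np.asarray(y, dtype=float), axis=-1)
    return (dot + c) ** d


def kernel_distance(x, y, kernel):
    """Squared feature-space distance of Eq. (3):
    k(x, x) - 2 k(x, y) + k(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
```

For the Gaussian kernel k(x, x) = 1 for every x, so Eq. (3) reduces to J_k(x, y) = 2(1 − k(x, y)), which is bounded above by 2 however far apart x and y are. With the polynomial kernel and c = 0, d = 1, the same function returns the ordinary squared Euclidean distance.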
III. Lymphocyte Image Segmentation using KIRCM
Sub images containing a single lymphocyte per image are
desirable and are obtained as described in Section I-B. Suitable
color conversion from RGB to L∗a∗b∗ is performed on each
lymphocyte image. The a∗ and b∗ components of the lymphocyte
image are considered as two features for color based clustering
in the feature space. The implicit assumption of
hyper–spherical or hyper–ellipsoidal clusters in lymphocyte image
data is often restrictive. Hence nonlinear mapping is necessary
and is achieved using suitable kernel functions as described in
the previous section. Clustering is performed on the kernelised
version of the image color features using the Rough C–means
clustering algorithm as defined in Section III-A.
A. Rough c-means (RCM)
In Rough c–means (RCM) clustering, the idea of standard
K–means is extended by viewing each class as an interval
or rough set [10]. A rough set Y is characterized by its
lower and upper approximations $\underline{B}Y$ and $\overline{B}Y$
respectively. In the rough context an object $X_k$ can be a member
of at most one lower approximation. If $X_k \in \underline{B}Y$ of
cluster Y, then concurrently $X_k \in \overline{B}Y$ of the same
cluster, and it will never belong to other clusters. If $X_k$ is
not a member of any lower approximation, then it will belong to
two or more upper approximations. The updated centroid $v_i$ of
cluster $U_i$ is computed as
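\[
v_i =
\begin{cases}
M_1 & \text{if } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\
M_2 & \text{if } \underline{B}U_i = \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\
M_3 & \text{otherwise}
\end{cases}
\tag{4}
\]
where,
\[
\begin{aligned}
M_1 &= w_{low} \frac{\sum_{X_k \in \underline{B}U_i} X_k}{|\underline{B}U_i|}
     + w_{up} \frac{\sum_{X_k \in (\overline{B}U_i - \underline{B}U_i)} X_k}{|\overline{B}U_i - \underline{B}U_i|} \\
M_2 &= \frac{\sum_{X_k \in (\overline{B}U_i - \underline{B}U_i)} X_k}{|\overline{B}U_i - \underline{B}U_i|} \\
M_3 &= \frac{\sum_{X_k \in \underline{B}U_i} X_k}{|\underline{B}U_i|}
\end{aligned}
\tag{5}
\]
The parameters $w_{low}$ and $w_{up}$ correspond to the relative
weighting factors for the lower and upper approximations respectively
in the centroid update. In this process the weight factor
for the lower approximation ($\underline{B}U_i$) is higher than that
of the rough boundary ($\overline{B}U_i - \underline{B}U_i$), i.e.
$w_{low} > w_{up}$. Here $|\underline{B}U_i|$ signifies
the number of members in the lower approximation of cluster
$U_i$, whereas $|\overline{B}U_i - \underline{B}U_i|$ is the number
of members present in the rough boundary between the two
approximations. The RCM algorithm can be summarized as follows: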
1. Assign initial centroids $v_i$ for the c clusters.
2. Each data object $X_k$ is assigned either to the lower
   approximation $\underline{B}U_i$ or to the upper approximation
   $\overline{B}U_i$ of cluster $U_i$, by computing the difference
   in its distance $d(X_k, v_i) - d(X_k, v_j)$ from cluster centroid
   pairs $v_i$ and $v_j$.
3. If $d(X_k, v_i) - d(X_k, v_j)$ is less than a particular
   threshold T, then $X_k \in \overline{B}U_i$ and
   $X_k \in \overline{B}U_j$ and $X_k$ cannot be a member of any
   lower approximation, else $X_k \in \underline{B}U_i$ such that
   the distance $d(X_k, v_i)$ is minimum over the c clusters.
4. Compute the new updated centroid $v_i$ for each cluster $U_i$
   using equation (4).
5. Iterate until convergence, i.e., there are no more data
   members in the rough boundary.
The Rough c–means algorithm is completely governed by three
parameters, namely w_low, w_up and T. The threshold parameter
can be defined as the relative distance of a data member X_k from
a pair of cluster centroids v_i and v_j. These parameters have to
be suitably tuned for proper segmentation.
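The five steps above can be sketched as follows. This is an illustrative NumPy version under several assumptions that are not in the paper. The threshold test is read as "distance to another centroid minus distance to the nearest centroid is below T". Iteration stops when the centroids no longer move or after `max_iter` passes, rather than when the boundary is empty. The default values of `w_low`, `w_up` and `T` are placeholders. The `init` and `dist` arguments are hypothetical hooks: passing a kernel-induced distance such as Eq. (3) as `dist` is one possible way to obtain a kernelised variant.

```python
import numpy as np


def rough_c_means(X, c=3, w_low=0.7, w_up=0.3, T=0.1, max_iter=50,
                  init=None, seed=0, dist=None):
    """Rough c-means sketch. Returns (centroids, lower, boundary), where
    lower and boundary are boolean (n, c) membership matrices."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if dist is None:
        # Default: Euclidean distance from every row of A to centroid v.
        dist = lambda A, v: np.linalg.norm(A - v, axis=1)
    if init is None:
        # Step 1: pick c distinct data points as initial centroids.
        rng = np.random.default_rng(seed)
        v = X[rng.choice(n, size=c, replace=False)].copy()
    else:
        v = np.array(init, dtype=float)
        c = len(v)
    lower = np.zeros((n, c), dtype=bool)
    boundary = np.zeros((n, c), dtype=bool)
    for _ in range(max_iter):
        D = np.stack([dist(X, v[i]) for i in range(c)], axis=1)  # (n, c)
        nearest = D.argmin(axis=1)
        d_min = D[np.arange(n), nearest]
        # Steps 2-3: clusters whose centroid is almost as close as the nearest.
        close = (D - d_min[:, None]) < T
        ambiguous = close.sum(axis=1) > 1
        lower = np.zeros((n, c), dtype=bool)
        rows = np.flatnonzero(~ambiguous)
        lower[rows, nearest[rows]] = True          # unambiguous objects
        boundary = close & ambiguous[:, None]      # upper minus lower
        # Step 4: centroid update following Eq. (4) and (5).
        new_v = v.copy()
        for i in range(c):
            lo, bd = X[lower[:, i]], X[boundary[:, i]]
            if len(lo) and len(bd):
                new_v[i] = w_low * lo.mean(axis=0) + w_up * bd.mean(axis=0)
            elif len(bd):
                new_v[i] = bd.mean(axis=0)
            elif len(lo):
                new_v[i] = lo.mean(axis=0)
            # If both sets are empty the centroid is left unchanged.
        # Step 5 (assumed stopping rule): stop once centroids are stable.
        if np.allclose(new_v, v):
            break
        v = new_v
    return v, lower, boundary
```

A hard label per pixel can be taken from `lower`, with boundary pixels resolved by the nearest centroid if a crisp segmentation is required. How the paper resolves such pixels is not stated.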
B. Kernel Induced Rough C–means clustering (KIRCM)
Lymphocyte images can be visually segmented into nucleus,
cytoplasm, RBC and background stain regions, as
suggested by the hematologist. Experiments were conducted
to determine the optimum number of classes c for accurate
segmentation of lymphocyte images, and it was found to be three.
Background stain and RBC are merged and considered as a
single region, whereas cytoplasm and nucleus are considered
as two other morphological regions. RCM is used to classify
each pixel into one of three independent classes or regions. The
proposed clustering based segmentation algorithm is applied
to each lymphocyte sub image to extract the nucleus and
cytoplasm regions from the background, and its output is used for
automated ALL diagnosis. The detailed KIRCM algorithm for
lymphocyte image segmentation is presented as follows:
1. Let I_rgb represent an original color leukocyte image in
   RGB color format.
2. Apply L∗a∗b∗ color space conversion on I_rgb to obtain the
   L∗a∗b∗ image, i.e. I_lab.
3. Construct the input feature vector using the a∗ and b∗
   components of I_lab.
4. Using a nonlinear mapping function φ(.), transform the
   input feature vector into a higher dimensional feature
   space.
5. Perform rough c–means clustering within this feature
   space using a nonlinear kernel function.
6. Obtain the labeled image from the clustered output.
7. Reconstruct the segmented RGB color image for each
   class representing an individual morphological region.
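Steps 6 and 7 can be illustrated with a short sketch. It assumes that clustering has already produced one integer label per pixel in row-major order. The function name `split_regions` and the white fill value are hypothetical choices, and the mapping from cluster index to nucleus, cytoplasm or background is not determined by the clustering itself and would have to be decided separately.

```python
import numpy as np


def split_regions(rgb, labels, n_classes=3, fill=255):
    """Build the labeled image and one RGB image per class.

    rgb     : (H, W, 3) original color image
    labels  : H*W integer cluster labels in row-major pixel order
    fill    : value written to pixels outside the class (assumed white)
    """
    rgb = np.asarray(rgb)
    # Step 6: reshape the flat cluster labels back onto the image grid.
    label_img = np.asarray(labels).reshape(rgb.shape[:2])
    regions = []
    for k in range(n_classes):
        out = np.full_like(rgb, fill)
        mask = label_img == k
        # Step 7: copy the original colors only where the class matches.
        out[mask] = rgb[mask]
        regions.append(out)
    return label_img, regions
```

Each returned image keeps the original colors of one class and blanks everything else, which mirrors the per-region outputs shown in the segmentation figures.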
IV. Experiments and Performance Measures
A total of 150 lymphocyte sub images are obtained using
the proposed sub image separation method from the 165 blood
smear images collected from IGH Rourkela. Out of the entire
image data set, 81 images were classified by the hematologist
as benign samples and the rest were identified as malignant.
The segmentation performance of the proposed scheme is
evaluated over a set of 20 images which includes both
benign and malignant samples, and for which expert manual
segmentations are available. Three experiments were
conducted to demonstrate the efficacy of the proposed scheme.
In the first experiment the proposed scheme is compared with
four published leukocyte segmentation schemes, namely Fuzzy
Divergence (FD) [6], Gaussian Mixture Model (GMM) [11],
Modified Fuzzy C Means (MFCM) [12] and Rough K–Means
(RKM) [13]. The second experiment illustrates
the segmentation error in terms of misclassification error
percentage (ε), which is evaluated by comparing the segmentation
results with the available manually segmented images using the
following relation:
\[ \varepsilon = \frac{\text{Total number of misclassified pixels}}{\text{Total number of pixels in a region}} \times 100 \tag{6} \]
The comparison with the human segmentation is also done
using the Tanimoto Index (TI) [14], which is used for evaluating
segmentation accuracy. TI is defined as:
\[ TI = \frac{|LRI \cap LMI|}{|LRI \cup LMI|} \tag{7} \]
where LRI is the labeled reference image provided by the
human expert and LMI is the labeled measured image obtained
using the proposed segmentation approach. |LRI ∩ LMI|
denotes the total number of pixels classified as a particular label
by both the proposed method and the human expert (ground
truth). |LRI ∪ LMI| denotes the number of pixels classified
as a particular label by either the proposed method or the
human expert. In the last experiment, the proposed scheme is
compared with the reported schemes in terms of computation
time.
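The two measures of Eq. (6) and Eq. (7) can be computed per region from a pair of label images, as sketched below. Eq. (6) does not spell out what counts as a misclassified pixel. This sketch assumes it is any pixel whose membership in the region differs between the reference and the measured image, normalised by the size of the reference region. The function names are hypothetical.

```python
import numpy as np


def misclassification_error(reference, measured, label):
    """Eq. (6) for one region, in percent (see assumption in the text)."""
    ref = np.asarray(reference) == label
    mea = np.asarray(measured) == label
    n_region = int(ref.sum())
    if n_region == 0:
        raise ValueError("reference image contains no pixels with this label")
    # Pixels whose region membership disagrees between the two images.
    n_wrong = int(np.logical_xor(ref, mea).sum())
    return 100.0 * n_wrong / n_region


def tanimoto_index(reference, measured, label):
    """Eq. (7) for one region: |LRI ∩ LMI| / |LRI ∪ LMI|."""
    ref = np.asarray(reference) == label
    mea = np.asarray(measured) == label
    union = int(np.logical_or(ref, mea).sum())
    if union == 0:
        # Assumption: two empty regions are treated as a perfect match.
        return 1.0
    return float(np.logical_and(ref, mea).sum()) / union
```

For example, if the expert marks two pixels as nucleus and the method recovers only one of them with no false positives, the functions return an error of 50 percent and a TI of 0.5.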
A. Experiment 1
To visualize the subjective performance, the segmented output
of the proposed scheme is compared with the reported schemes
for a single lymphocyte image and is presented in Fig. 2.
Segmentation results obtained using the proposed scheme
for five lymphoblast (malignant lymphocyte) images are also
presented in Fig. 3 for subjective evaluation.
Fig. 2. Comparative lymphocyte segmentation results (original image: IGH14H; columns: cytoplasm, nucleus, background; rows: FD, GMM, MFCM, RKM, KIRCM)
B. Experiment 2
Fig. 4 exhibits manually segmented images for four sample
lymphocytes, which include two healthy and two malignant
cells. Since the predefined regions of the manually segmented
images are available, the misclassification error percentage (ε) can
be computed for each morphological region (cytoplasm and
nucleus) of a lymphocyte separately using Equation 6.
Segmentation errors in terms of misclassification error percentage
(ε) for the above four lymphocyte images along with six others
are tabulated in Table II. As a reference, we also
include in Fig. 5 the performance of the proposed segmentation
scheme in terms of misclassification error percentage for
all 20 lymphocyte image samples for which ground truth
images are available. Performance in terms of TI is also
computed over the twenty images using the available ground truth
images and is presented in Fig. 6.
Fig. 3. Lymphocyte segmentation results using the proposed approach (columns: original, cytoplasm, nucleus)
Fig. 4. Ground truth images of sample lymphocytes
TABLE II
Comparison of segmentation error percentage
Image    Error    FD       GMM      MFCM     RKM      KIRCM
11H      NSE      11.85    8.11     4.02     3.17     1.05
         CSE      -        56.80    8.56     6.42     2.07
7H       NSE      12.40    6.45     7.20     4.30     1.17
         CSE      -        40.06    22.39    6.46     2.18
16L2B    NSE      10.80    7.93     10.05    4.47     1.96
         CSE      -        25.53    14.69    5.58     2.21
51LB     NSE      7.23     19.96    4.50     3.57     0.97
         CSE      -        36.36    14.81    6.63     2.02
52LB     NSE      15.74    26.24    6.67     3.04     1.07
         CSE      -        31.26    6.29     4.65     2.35
NSE: Nucleus Segmentation Error
CSE: Cytoplasm Segmentation Error
Fig. 5. Segmentation error for the results of the proposed scheme on 20 lymphocyte images (x-axis: image samples 1 to 20; series: nucleus, cytoplasm).
Fig. 6. Tanimoto Index (TI) for the segmentation results of the proposed scheme on 20 lymphocyte images (x-axis: image samples 1 to 20; series: nucleus, cytoplasm).
C. Experiment 3
In this experiment, all the cited schemes are used to segment
two lymphocyte images (IGH1aLB and IGH14H) of size 128×128.
The computational times (in seconds) are recorded for all
the schemes and are shown in Fig. 7.
V. Discussion
It can be concluded from the above experiments that the
proposed segmentation scheme outperforms the other reported
schemes in terms of nucleus and cytoplasm extraction, as
shown by ε, TI and visual assessment.
Fig. 7. Variation of computational time in seconds for two lymphocyte images (x-axis: segmentation schemes FD, GMM, MFCM, RKM, Proposed; series: IGH 11H, IGH 7H).
VI. Conclusion
A kernel framework has been applied to rough clustering for
lymphocyte image segmentation, which is the main theme of
the paper. Encouraging outcomes in terms of nucleus and
cytoplasm extraction were observed in comparison with standard
lymphocyte segmentation schemes. The results obtained motivate
future work, which includes lymphocyte image segmentation
for touching cells and improved segmentation accuracy.
VII. Acknowledgement
The authors would like to thank Dr. R. R. Panda of IGH
Rourkela, and Dr. R. K. Jena and Dr. Sudha Sethy of the Department
of Clinical Haematology, SCB Medical College Cuttack, for
their clinical support and valuable advice, which helped us in
significantly improving the paper.
References
[1] Q. Liao and Y. Deng, “An accurate segmentation method for white blood cell images,” in Proceedings of the IEEE International Symposium on Biomedical Imaging, 2002, pp. 245–248.
[2] L. Yang, P. Meer, and D. Foran, “Unsupervised segmentation based on robust estimation and color active contour models,” IEEE Transactions on Information Technology in Biomedicine, vol. 9, no. 3, pp. 475–486, September 2005.
[3] J. Angulo and G. Flandrin, “Microscopic image analysis using mathematical morphology: Application to haematological cytology,” vol. 1, pp. 304–312, 2003.
[4] N. Sinha and A. Ramakrishnan, “Blood cell segmentation using EM algorithm,” in Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, 2002.
[5] L. Dorini, R. Minetto, and N. Leite, “White blood cell segmentation using morphological operators and scale-space analysis,” in Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, October 2007, pp. 294–304.
[6] M. Ghosh, D. Das, C. Chakraborty, and A. K. Ray, “Automated leukocyte recognition using fuzzy divergence,” Micron, vol. 41, no. 7, pp. 840–846, 2010.
[7] S. Mohapatra, D. Patra, and K. Kumar, “Fast leukocyte image segmentation using shadowed sets,” International Journal of Computational Biology and Drug Design, vol. 5, no. 1, pp. 49–65, Jan. 2012.
[8] C. Charrier, G. Lebrun, and O. Lezoray, “Evidential segmentation of microscopic color images with pixel classification posterior probabilities,” Journal of Multimedia, vol. 2, no. 3, 2007.
[9] O. Demirkaya, M. H. Asyali, and P. Sahoo, Image Processing with MATLAB: Applications in Medicine and Biology. Taylor and Francis, 2009.
[10] P. Lingras and C. West, “Interval set clustering of web users with rough k-means,” Journal of Intelligent Information Systems, vol. 23, pp. 5–16, 2004.
[11] N. Sinha and A. G. Ramakrishnan, “Automation of differential blood count,” in Proceedings of the Conference on Convergent Technologies for Asia-Pacific Region, vol. 2, 2003, pp. 547–551.
[12] S. Chinwaraphat, A. Sanpanich, C. Pintavirooj, M. Sangworasil, and P. Tosranon, “A modified fuzzy clustering for white blood cell segmentation,” in Proceedings of the Third International Symposium on Biomedical Engineering, vol. 6, 2008, pp. 2259–2261.
[13] S. Mohapatra, D. Patra, and K. Kumar, “Blood microscopic image segmentation using rough sets,” in Proceedings of the International Conference on Image Information Processing, November 2011, pp. 1–6.
[14] K. Tu, H. Yu, Z. Guo, and X. Li, “Learnability-based further prediction of gene functions in gene ontology,” Genomics, vol. 84, no. 6, pp. 922–928, 2004.