[ieee 2012 4th international conference on intelligent human computer interaction (ihci) -...

IEEE Proceedings of 4th International Conference on Intelligent Human Computer Interaction, Kharagpur, India, December 27-29, 2012

Kernel Induced Rough c–means Clustering for Lymphocyte Image Segmentation

Subrajeet Mohapatra∗

Department of Information Technology

Birla Institute of Technology Mesra

Ranchi–835215, India

Email: [email protected]

Dipti Patra, Sunil Kumar∗

Department of Electrical Engineering

National Institute of Technology Rourkela

Rourkela–769008, Odisha


Sanghamitra Satpathi

Department of Pathology

Ispat General Hospital

Rourkela–769008, Orissa


Abstract: Blood microscopic image segmentation is a fundamental tool for automated diagnosis of hematological disorders. In particular, lymphoblast image segmentation is the foundation of every image based leukemia diagnostic system. Precision in image segmentation is a necessary condition for improving diagnostic accuracy in automated cytology. Since the diagnostic information content of the segmented images is plentiful, suitable segmentation routines need to be developed for better disease recognition. In this paper, a Kernel Induced Rough C–means (KIRCM) clustering algorithm is introduced for the segmentation of human lymphocyte images. Rough C–means (RCM) clustering is performed in a higher dimensional feature space to obtain improved segmentation accuracy and to facilitate automated Acute Lymphoblastic Leukemia (ALL) detection. Comparative analysis reveals that the use of rough sets in kernel space clustering for leukocyte segmentation gives the proposed scheme an edge over existing schemes.

Keywords: Lymphocyte, image segmentation, parametric kernel, clustering, rough sets

A stained peripheral blood smear is an essential component in the laboratory diagnosis of hematological disorders. Peripheral blood is considered important because blood cells are easily accessible indicators of disturbances in their organs of origin or degradation, which are much less accessible for diagnosis. Thus, changes in the cellular components of the blood allow important inferences to be drawn about various disease conditions. Hematological abnormalities in particular are considered crucial and need immediate medication independent of age, sex and race. Among all blood cells, the leukocyte is an important diagnostic indicator for various blood disorders including cancer. Leukocyte disorders can be classified as neoplastic (cancerous) or non–neoplastic. Leukemia is one of the potentially fatal neoplastic disorders of leukocytes and is the subject of the present study. It can be understood as a hematological malignancy with increased numbers of myeloid or lymphoid blasts and can be acute or chronic depending on the severity of the disease. In the present work we consider only acute lymphoblastic leukemia (ALL).

Diagnosis of ALL through human visual assessment is always associated with several shortcomings, i.e. increased patient waiting time and inconsistent, subjective diagnostic reports. The use of advanced techniques such as flow cytometry, immunophenotyping and molecular probing for ALL screening is also limited in developing countries due to high cost. Thus there is a need for a consistent and cost effective automated ALL screening system which can improve diagnostic accuracy without bias.

Although image segmentation is fundamental, no single method works equally well for all types of images. Therefore, a specific algorithm has to be developed for lymphocyte images to facilitate the development of an automated ALL screening system. Over the years, various peripheral blood smear

or bone marrow segmentation methods have been proposed

and are mostly shape, threshold, region growing or edge based

schemes. WBC segmentation based on simple thresholding

followed by contour identification using shape analysis was

introduced by Liao and Deng [1]. Yang et al. [2] used color

GVF active contour model for leukocyte segmentation. Use

of classical watershed segmentation for nuclear and cytoplasm

region extraction in lymphocyte images was initially presented

by Angulo et al. [3]. Automated leukocyte segmentation

scheme using Gaussian mixture modeling and EM algorithm

was proposed by Sinha et al. [4]. Dorini et al. [5] used water-

shed transform based on image forest transform to extract the

WBC nucleus. Ghosh et al. [6] proposed a threshold detection

scheme using fuzzy divergence for leukocyte segmentation.

Although the aforementioned leukocyte segmentation schemes are reasonably successful in nucleus extraction, they perform poorly when it comes to cytoplasm extraction. As cytoplasm is a decisive morphological component for ALL detection, utmost care must be taken during its extraction.

Due to staining, it is feasible to classify each color pixel of a peripheral blood smear image into one of three classes, i.e. cytoplasm, nucleus and background. However, the color resemblance between pixels of the cytoplasm and background regions makes the segmentation more difficult. Consequently, to achieve satisfactory segmentation accuracy in terms of cytoplasm and nucleus region extraction, a novel kernel induced hybrid clustering scheme is proposed for lymphocyte image segmentation.

The rest of the paper is organized as follows: Section I describes the schema of the proposed method. A review of the foundation of kernel space clustering is presented in Section II. The proposed KIRCM clustering algorithm for leukocyte image segmentation is outlined in Section III. Experiments and results are described in Section IV. Sections V and VI present a discussion and the conclusion of this work, respectively.

978-1-4673-4369-5/12/$31.00 ©2012 IEEE

I. Material and Methods

A. Blood Smear Image Acquisition

Blood samples were collected at Ispat General Hospital, Rourkela, India through randomization. Blood smears were subsequently prepared and stained with Leishman stain for visualization of cell components. A total of 165 blood smear images were captured with a digital microscope (Carl Zeiss India) under a 100X oil immersion setting, with an effective magnification of 1000. Manual segmentation was performed by a panel of hematologists headed by one of the authors of the present work.

B. Sub Imaging

Peripheral blood smear images are relatively large and contain more than one leukocyte per image. However, the desired region of interest (ROI) must contain only a single lymphocyte for ALL detection, since each lymphocyte in the blood smear image has to be evaluated to differentiate a lymphoblast from a mature lymphocyte. For the detailed sub imaging process the reader is referred to our previous work [7]. Sample sub images containing a single lymphocyte each are shown in Fig. 1.

(a) IGH1aLB (b) IGH1bLB (c) IGH1LB

Fig. 1. Cropped Sub Images (Lymphocytes)

C. Color Space Conversion

Blood microscopic images are acquired in RGB color space. A colorimetric transformation of the initial RGB color coordinate system is essential to obtain a color space in which the color data is best represented for the segmentation process [8]. The L∗a∗b∗ color model is a suitable alternative for image segmentation as the color dimension is reduced [9] and the color channels are uncorrelated. This color space consists of a luminosity layer L∗ and two chromaticity layers a∗ and b∗, which contain the color information. Transforming the blood microscopic images from RGB to CIELAB reduces the color dimension of the pixels from three (R, G and B) to two (a∗ and b∗) and facilitates faster clustering.
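As an illustration of the conversion described above, the following minimal sketch maps one 8-bit RGB pixel to L∗a∗b∗ and keeps only the (a∗, b∗) pair as the clustering feature. The paper does not state which RGB primaries or white point were used; the sRGB transfer curve and the D65 white point below are assumptions, and the function names `srgb_to_lab` and `ab_features` are hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit pixel to CIE L*a*b* (assumes sRGB input, D65 white)."""
    def to_linear(c):
        # Undo the sRGB gamma curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # CIELAB companding function with its linear toe.
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def ab_features(pixels):
    """Drop L* and keep the two chromaticity values (a*, b*) per pixel."""
    return [srgb_to_lab(*p)[1:] for p in pixels]
```

Discarding L∗ is what reduces each pixel from three values to two before clustering.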

D. Image Segmentation

Color features at each pixel are mapped into a much higher dimensional Hilbert space, making the data more easily separable. RCM clustering is performed on this kernelised data to assign each pixel to one of the regions, i.e. cytoplasm, nucleus or background. It is observed that the distances computed between the data patterns and the cluster centers in the Hilbert space are more robust to outliers, providing better segmentation accuracy. Details of lymphocyte image segmentation using the proposed RCM clustering in kernel space are presented in Section III.

II. Kernel Space Clustering

A. Kernel Space

Clustering is an unsupervised classification of data patterns and has been used in diverse areas such as pattern recognition, data mining, image processing and marketing. Popular clustering algorithms such as K–means and Fuzzy C Means have been widely used for image segmentation. However, these algorithms are unable to cluster non–linearly separable data such as images, resulting in poor segmentation performance. Feature space transformation using kernels is necessary for clustering nonlinear image data (color and texture) and improving segmentation performance. Kernel functions are used to transform the data in the image plane into a feature space of higher (possibly infinite) dimension known as the kernel space. A nonlinear mapping function φ transforms the nonlinear separation problem in the image plane into a linear separation problem in kernel space, facilitating clustering in the feature space. Due to the high and possibly infinite feature dimension, it is unrealistic to measure the Euclidean distance between the transformed variables directly. However, as per Mercer’s theorem, working directly on the transformed variables can be avoided. Mercer’s theorem can be used to calculate the distance between pixel feature values in the kernel space without knowing the transformation function φ(.), as presented in Section II-B.

B. Mercer’s Theorem

Consider φ(.) to be a nonlinear mapping function from the observation space I to a higher dimensional feature space J. Let x and y be two points in the image plane, each representing a pixel with color values, and let φ(x) and φ(y) be their corresponding kernelised values in the feature space. The squared Euclidean distance between φ(x) and φ(y) in the feature space can be represented as:

$$J_k(x, y) = \|\phi(x) - \phi(y)\|^2 \tag{1}$$

As per Mercer’s theorem, any continuous, symmetric, positive semidefinite kernel function can be expressed as a dot product in a higher dimensional space. Therefore it is unnecessary to know the transfer function when calculating the distance in the feature space.

The transfer function φ(·) is usually not defined explicitly; instead, the kernel function k is given and is defined as

$$k(x, y) = \phi(x)^T \cdot \phi(y) \quad \forall (x, y) \in I^2 \tag{2}$$

where "·" is the dot product in the kernel space.

Thus (1) can be expressed in terms of the kernel function as

$$\begin{aligned}
J_k(x, y) &= \|\phi(x) - \phi(y)\|^2 = (\phi(x) - \phi(y))^T \cdot (\phi(x) - \phi(y)) \\
&= \phi(x)^T \phi(x) - \phi(y)^T \phi(x) - \phi(x)^T \phi(y) + \phi(y)^T \phi(y) \\
&= k(x, x) - k(x, y) - k(x, y) + k(y, y) \\
&= k(x, x) - 2k(x, y) + k(y, y), \quad \forall (x, y) \in I^2
\end{aligned} \tag{3}$$

where Jk (x, y) is the non–Euclidean distance measure in the

original data space corresponding to the squared norm in the

kernel space. This distance provides more linear separability

among features when compared to simple Euclidean distance

measure. Some standard kernel functions are listed in Table I.

TABLE I. Kernel Functions

Kernel        Expression
Linear        $k(x, y) = x^T y + c$
Gaussian      $k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$
Exponential   $k(x, y) = \exp\left(-\frac{\|x - y\|}{2\sigma^2}\right)$
Sigmoid       $k(x, y) = \tanh(c(x^T \cdot y) + \theta)$
Polynomial    $k(x, y) = (x \cdot y + c)^d$
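The kernel-induced distance of (3) can be sketched in a few lines. The snippet below implements three of the kernels from Table I and evaluates $J_k$ purely through kernel calls, never forming φ. The function names and default parameter values (`sigma=1.0`, `c=1.0`, `d=2`) are illustrative assumptions, not values reported in the paper.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sq_euclid(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y, c=0.0):
    return dot(x, y) + c


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sq_euclid(x, y) / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, c=1.0, d=2):
    return (dot(x, y) + c) ** d


def kernel_distance(x, y, kernel):
    """Eq. (3): squared feature-space distance from kernel evaluations only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
```

Two special cases are useful sanity checks. With the linear kernel the constant c cancels and $J_k$ reduces to the ordinary squared Euclidean distance. With the Gaussian kernel, k(x, x) = 1 for every x, so $J_k(x, y) = 2(1 - k(x, y))$, which is bounded above by 2; this bound is one way to see why distant outliers have limited influence.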

Using kernel functions, the non–Euclidean distance between feature points can be measured without defining the transfer function φ(·). The objective of the proposed algorithm is to nonlinearly transform the lymphocyte image data, in the form of color features (a∗ and b∗), into a high dimensional kernel space and then perform clustering. Accordingly, Rough C–means clustering is performed on this kernelised data for the segmentation of lymphocyte images. The Rough C–means algorithm and the proposed Kernel Induced Rough C–means (KIRCM) clustering algorithm for lymphocyte image segmentation are presented in the next section.

III. Lymphocyte Image Segmentation using KIRCM

Sub images containing a single lymphocyte each are desirable and are obtained as described in Section I-B. Color conversion from RGB to L∗a∗b∗ is performed on each lymphocyte image. The a∗ and b∗ components of the lymphocyte image are taken as two features for color based clustering in the feature space. The implicit assumption of hyper–spherical or hyper–ellipsoidal clusters in lymphocyte image data is often restrictive. Hence a nonlinear mapping is necessary and is achieved using suitable kernel functions as described in the previous section. Clustering is performed on the kernelised version of the image color features using the Rough C–means clustering algorithm defined in Section III-A.

A. Rough c-means (RCM)

In Rough c–means (RCM) clustering, the idea of standard K–means is extended by viewing each class as an interval or rough set [10]. A rough set Y is characterized by its lower and upper approximations $\underline{B}Y$ and $\overline{B}Y$ respectively. In the rough context an object $X_k$ can be a member of at most one lower approximation. If $X_k \in \underline{B}Y$ of cluster Y, then $X_k \in \overline{B}Y$ of the same cluster as well, and it never belongs to any other cluster. If $X_k$ is not a member of any lower approximation, then it belongs to two or more upper approximations. The updated centroid $v_i$ of cluster $U_i$ is computed as

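$$v_i = \begin{cases}
M_1 & \text{if } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\
M_2 & \text{if } \underline{B}U_i = \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\
M_3 & \text{otherwise}
\end{cases} \tag{4}$$

where,

$$\begin{aligned}
M_1 &= w_{low} \frac{\sum_{X_k \in \underline{B}U_i} X_k}{|\underline{B}U_i|} + w_{up} \frac{\sum_{X_k \in (\overline{B}U_i - \underline{B}U_i)} X_k}{|\overline{B}U_i - \underline{B}U_i|} \\
M_2 &= \frac{\sum_{X_k \in (\overline{B}U_i - \underline{B}U_i)} X_k}{|\overline{B}U_i - \underline{B}U_i|} \\
M_3 &= \frac{\sum_{X_k \in \underline{B}U_i} X_k}{|\underline{B}U_i|}
\end{aligned} \tag{5}$$

The parameters $w_{low}$ and $w_{up}$ are the relative weighting factors for the lower and upper approximations, respectively, in the centroid update. The weight factor for the lower approximation ($\underline{B}U_i$) is higher than that of the rough boundary ($\overline{B}U_i - \underline{B}U_i$), i.e. $w_{low} > w_{up}$. Here $|\underline{B}U_i|$ signifies the number of members in the lower approximation of cluster $U_i$, whereas $|\overline{B}U_i - \underline{B}U_i|$ is the number of members in the rough boundary between the two approximations. The RCM algorithm can be summarized as follows: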

1. Assign initial centroids $v_i$ for the c clusters.

2. Assign each data object $X_k$ either to the lower approximation $\underline{B}U_i$ or to the upper approximation $\overline{B}U_i$ of cluster $U_i$, by computing the difference in its distance $d(X_k, v_i) - d(X_k, v_j)$ from the cluster centroid pair $v_i$ and $v_j$.

3. If $d(X_k, v_i) - d(X_k, v_j)$ is less than a particular threshold T, then $X_k \in \overline{B}U_i$ and $X_k \in \overline{B}U_j$, and $X_k$ cannot be a member of any lower approximation; else $X_k \in \underline{B}U_i$ such that the distance $d(X_k, v_i)$ is minimum over the c clusters.

4. Compute the new updated centroid $v_i$ for each cluster $U_i$ using equation (4).

5. Iterate until convergence, i.e., until there are no more data members in the rough boundary.

The rough c–means algorithm is completely governed by three parameters: $w_{low}$, $w_{up}$ and T. The threshold T can be defined as the relative distance of a data member $X_k$ from a pair of cluster centroids $v_i$ and $v_j$. These parameters have to be suitably tuned for proper segmentation.
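The five steps above can be sketched as follows, here with a plain Euclidean distance. Several details are assumptions made for this sketch rather than choices stated in the paper: the nearest centroid is taken as $v_i$ and a point falls in the rough boundary when some other centroid is at most T farther away; iteration stops when the centroids no longer move (or after `max_iter` passes); an empty cluster keeps its previous centroid; and the defaults `w_low=0.7`, `w_up=0.3`, `threshold=0.5` are placeholders.

```python
import math


def euclid(x, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))


def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def rough_assign(data, centroids, threshold, dist=euclid):
    """Steps 2-3: split the data into lower approximations and rough boundaries."""
    c = len(centroids)
    lower = [[] for _ in range(c)]
    boundary = [[] for _ in range(c)]
    for x in data:
        d = [dist(x, v) for v in centroids]
        i = min(range(c), key=d.__getitem__)  # nearest centroid
        close = [j for j in range(c) if j != i and d[j] - d[i] <= threshold]
        if close:
            # Ambiguous point: upper approximation of several clusters, no lower.
            for j in [i] + close:
                boundary[j].append(x)
        else:
            lower[i].append(x)
    return lower, boundary


def update_centroid(lower, boundary, old, w_low, w_up):
    """Step 4: centroid update following Eq. (4) and Eq. (5)."""
    if lower and boundary:  # M1
        ml, mb = mean(lower), mean(boundary)
        return tuple(w_low * a + w_up * b for a, b in zip(ml, mb))
    if boundary:  # M2
        return mean(boundary)
    if lower:  # M3
        return mean(lower)
    return old  # empty cluster: keep the previous centroid (assumption)


def rough_c_means(data, centroids, w_low=0.7, w_up=0.3, threshold=0.5,
                  max_iter=100, tol=1e-9):
    centroids = [tuple(v) for v in centroids]
    for _ in range(max_iter):
        lower, boundary = rough_assign(data, centroids, threshold)
        new = [update_centroid(lower[i], boundary[i], centroids[i], w_low, w_up)
               for i in range(len(centroids))]
        shift = max(euclid(a, b) for a, b in zip(new, centroids))
        centroids = new
        if shift < tol:
            break
    lower, boundary = rough_assign(data, centroids, threshold)
    return centroids, lower, boundary
```

A larger T moves more points into the rough boundary, and a larger $w_{low}$ relative to $w_{up}$ makes the centroid follow the unambiguous members more closely, which is why the three parameters need joint tuning.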

B. Kernel Induced Rough C–means clustering (KIRCM)

Lymphocyte images can be visually segmented into nucleus, cytoplasm, RBC and background stain regions, as suggested by the hematologist. Experiments were conducted to determine the optimum number of classes c for accurate segmentation of lymphocyte images, and it was found to be three. Background stain and RBC are merged and considered as a single region, whereas cytoplasm and nucleus are considered as the two other morphological regions. RCM is used to classify each pixel into one of these three classes or regions. The proposed clustering based segmentation algorithm is applied to each lymphocyte sub image to extract the nucleus and cytoplasm regions from the background, and the result is used for automated ALL diagnosis. The detailed KIRCM algorithm for lymphocyte image segmentation is as follows:

1. Let $I_{rgb}$ represent an original color leukocyte image in RGB color format.

2. Apply L∗a∗b∗ color space conversion to $I_{rgb}$ to obtain the L∗a∗b∗ image $I_{lab}$.

3. Construct the input feature vector using the a∗ and b∗ components of $I_{lab}$.

4. Using a nonlinear mapping function φ(.), transform the input feature vector into a higher dimensional feature space.

5. Perform rough c–means clustering within this feature space using a nonlinear kernel function.

6. Obtain the labeled image from the clustered output.

7. Reconstruct the segmented RGB color image for each class representing an individual morphological region.
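Steps 4 and 5 leave open how a centroid is represented once φ is implicit. One possible reading, sketched below, treats the rough centroid of (4) as a weighted mean $m = \sum_j a_j \phi(x_j)$ of its members in feature space, so that $\|\phi(x) - m\|^2 = k(x, x) - 2\sum_j a_j k(x, x_j) + \sum_j \sum_l a_j a_l k(x_j, x_l)$ can be evaluated with kernel calls only. This is an assumption about the implementation, not the authors' stated procedure; all function names are hypothetical, and the double sum makes the sketch quadratic in cluster size, so it is meant for illustration rather than full images.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def rough_weights(lower, boundary, w_low=0.7, w_up=0.3):
    """Express the centroid of Eq. (4) as member points plus mean coefficients."""
    if not lower and not boundary:
        raise ValueError("cluster has no members")
    if lower and boundary:  # M1
        pts = lower + boundary
        coef = ([w_low / len(lower)] * len(lower)
                + [w_up / len(boundary)] * len(boundary))
    else:  # M2 or M3
        pts = lower or boundary
        coef = [1.0 / len(pts)] * len(pts)
    return pts, coef


def feature_space_distance(x, pts, coef, kernel):
    """Squared distance from phi(x) to the implicit weighted centroid."""
    cross = sum(a * kernel(x, p) for a, p in zip(coef, pts))
    self_term = sum(a * b * kernel(p, q)
                    for a, p in zip(coef, pts)
                    for b, q in zip(coef, pts))
    return kernel(x, x) - 2.0 * cross + self_term


def kernel_rough_assign(data, clusters, threshold, kernel):
    """One rough assignment pass; clusters is a list of (lower, boundary) pairs."""
    models = [rough_weights(lo, bd) for lo, bd in clusters]
    new = [([], []) for _ in clusters]
    for x in data:
        d = [feature_space_distance(x, pts, coef, kernel) for pts, coef in models]
        i = min(range(len(d)), key=d.__getitem__)
        close = [j for j in range(len(d)) if j != i and d[j] - d[i] <= threshold]
        if close:
            for j in [i] + close:
                new[j][1].append(x)  # rough boundary
        else:
            new[i][0].append(x)  # lower approximation
    return new
```

Because these are squared feature-space distances, and with a Gaussian kernel they stay within a small bounded range, the threshold T would need to be chosen on a different scale than in the input-space version.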

IV. Experiments and Performance Measures

A total of 150 lymphocyte sub images are obtained using

the proposed sub image separation method from the 165 blood

smear images collected from IGH Rourkela. Out of the entire

image data set 81 images were classified by the hematologist

as benign samples and the rest were identified as malignant.

The segmentation performance of the proposed scheme is evaluated over a set of 20 images, including both benign and malignant samples, for which expert manual segmentations are available. Three experiments were conducted to demonstrate the efficacy of the proposed scheme. In the first experiment the proposed scheme is compared with four published leukocyte segmentation schemes, namely Fuzzy Divergence (FD) [6], Gaussian Mixture Model (GMM) [11], Modified Fuzzy C Means (MFCM) [12] and Rough K–Means (RKM) [13]. The second experiment measures the segmentation error in terms of the misclassification error percentage (ε), evaluated by comparing the segmentation results with the available manually segmented images using the following relation:

$$\varepsilon = \frac{\text{Total number of misclassified pixels}}{\text{Total number of pixels in a region}} \times 100 \tag{6}$$

The comparison with the human segmentation is also made using the Tanimoto Index (TI) [14], which is used to evaluate segmentation accuracy. TI is defined as:

$$TI = \frac{|LRI \cap LMI|}{|LRI \cup LMI|} \tag{7}$$

where LRI is the labeled reference image provided by the human expert and LMI is the labeled measured image obtained using the proposed segmentation approach. |LRI ∩ LMI| denotes the total number of pixels classified as a particular label by both the proposed method and the human expert (ground truth). |LRI ∪ LMI| denotes the number of pixels classified as a particular label by either the proposed method or the human expert. In the last experiment, the proposed scheme is compared with the reported schemes in terms of computation time.
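Both measures can be computed per region from two label maps. The sketch below works on flattened label sequences of equal length. It assumes that "misclassified pixels" in (6) means pixels that belong to the region in the reference image but receive a different label from the algorithm; the paper does not spell this out, so that reading, like the function names, is an assumption.

```python
def misclassification_error(reference, measured, label):
    """Eq. (6): percentage of the reference region that was given another label."""
    region = [m for r, m in zip(reference, measured) if r == label]
    if not region:
        raise ValueError("label does not occur in the reference image")
    wrong = sum(1 for m in region if m != label)
    return 100.0 * wrong / len(region)


def tanimoto_index(reference, measured, label):
    """Eq. (7): |LRI intersect LMI| / |LRI union LMI| for one label."""
    inter = sum(1 for r, m in zip(reference, measured) if r == label and m == label)
    union = sum(1 for r, m in zip(reference, measured) if r == label or m == label)
    if union == 0:
        raise ValueError("label occurs in neither image")
    return inter / union


# Toy example: 1 = nucleus, 0 = everything else.
reference = [1, 1, 1, 1, 0, 0]
measured = [1, 1, 1, 0, 1, 0]
nucleus_error = misclassification_error(reference, measured, 1)  # 25.0
nucleus_ti = tanimoto_index(reference, measured, 1)              # 0.6
```

Unlike ε as defined here, TI also penalizes pixels wrongly added to a region, so the two measures complement each other.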

A. Experiment 1

To visualize the subjective performance, the segmented output of the proposed scheme is compared with that of the reported schemes for a single lymphocyte image in Fig. 2. Segmentation results obtained using the proposed scheme for five lymphoblast (malignant lymphocyte) images are also presented in Fig. 3 for subjective evaluation.

Fig. 2. Comparative lymphocyte segmentation results. [Figure: original image IGH14H, with Cytoplasm, Nucleus and Background panels for each of FD, GMM, MFCM, RKM and KIRCM.]

B. Experiment 2

Fig. 4 exhibits manually segmented images for four sample lymphocytes, comprising two healthy and two malignant cells. Since the predefined regions of the manually segmented images are available, the misclassification error percentage (ε) can be computed separately for each morphological region (cytoplasm and nucleus) of a lymphocyte using Equation 6. The segmentation error in terms of ε for the above four lymphocyte images along with six others is tabulated in Table II. For reference, Fig. 5 also shows the performance of the proposed segmentation scheme in terms of misclassification error percentage for all 20 lymphocyte image samples for which ground truth images are available. Performance in terms of TI is also computed over the twenty images using the available ground truth images and is presented in Fig. 6.

Fig. 3. Lymphocyte Segmentation results using the proposed approach. [Figure columns: Original, Cytoplasm, Nucleus.]

Fig. 4. Ground truth images of sample lymphocytes

C. Experiment 3

In this experiment, all the cited schemes are used to segment two lymphocyte images (IGH1aLB and IGH14H) of size 128 × 128. The computational time (in seconds) is recorded for all the schemes and is shown in Fig. 7.

TABLE II. Comparison of segmentation error percentage

Image   Error   FD      GMM     MFCM    RKM     KIRCM
11H     NSE     11.85   8.11    4.02    3.17    1.05
        CSE     -       56.80   8.56    6.42    2.07
7H      NSE     12.40   6.45    7.20    4.30    1.17
        CSE     -       40.06   22.39   6.46    2.18
16L2B   NSE     10.80   7.93    10.05   4.47    1.96
        CSE     -       25.53   14.69   5.58    2.21
51LB    NSE     7.23    19.96   4.50    3.57    0.97
        CSE     -       36.36   14.81   6.63    2.02
52LB    NSE     15.74   26.24   6.67    3.04    1.07
        CSE     -       31.26   6.29    4.65    2.35

NSE: Nucleus Segmentation Error

CSE: Cytoplasm Segmentation Error

Fig. 5. Segmentation error for the results of the proposed scheme on 20 lymphocyte images. [Plot: x-axis, image samples 1 to 20; series: Nucleus and Cytoplasm.]

Fig. 6. Tanimoto Index (TI) for the segmentation results of the proposed scheme on 20 lymphocyte images. [Plot: x-axis, image samples 1 to 20; series: Nucleus and Cytoplasm.]

V. Discussion

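The above experiments indicate that the proposed segmentation scheme outperforms the other reported schemes in nucleus and cytoplasm extraction, as shown by ε, TI and visual assessment. For the images in Table II, KIRCM gives nucleus errors between 0.97% and 1.96% and cytoplasm errors between 2.02% and 2.35%, the lowest value in every row.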

Fig. 7. Variation of computational time in seconds for two lymphocyte images. [Plot: x-axis, segmentation schemes FD, GMM, MFCM, RKM and Proposed; series: IGH 11H and IGH 7H.]

VI. Conclusion

A kernel framework has been applied to rough clustering for lymphocyte image segmentation, which is the main theme of the paper. Encouraging outcomes in terms of nucleus and cytoplasm extraction were observed in comparison with standard lymphocyte segmentation schemes. The results obtained motivate future work, including lymphocyte image segmentation for touching cells and further improvement of segmentation accuracy.

VII. Acknowledgement

The authors would like to thank Dr. R. R. Panda of IGH Rourkela, Dr. R. K. Jena and Dr. Sudha Sethy, Department of Clinical Haematology, SCB Medical College Cuttack, for their clinical support and valuable advice, which helped us significantly improve the paper.

References

[1] Q. Liao and Y. Deng, “An accurate segmentation method for white blood cell images,” in Proceedings of the IEEE International Symposium on Biomedical Imaging, 2002, pp. 245–248.

[2] L. Yang, P. Meer, and D. Foran, “Unsupervised segmentation based on robust estimation and color active contour models,” IEEE Transactions on Information Technology in Biomedicine, vol. 9, no. 3, pp. 475–486, September 2005.

[3] J. Angulo and G. Flandrin, “Microscopic image analysis using mathematical morphology: Application to haematological cytology,” vol. 1, pp. 304–312, 2003.

[4] N. Sinha and A. Ramakrishnan, “Blood cell segmentation using EM algorithm,” in Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, 2002.

[5] L. Dorini, R. Minetto, and N. Leite, “White blood cell segmentation using morphological operators and scale-space analysis,” in Proceedings of the Brazilian Symposium on Computer Graphics and Image Processing, October 2007, pp. 294–304.

[6] M. Ghosh, D. Das, C. Chakraborty, and A. K. Ray, “Automated leukocyte recognition using fuzzy divergence,” Micron, vol. 41, no. 7, pp. 840–846, 2010.

[7] S. Mohapatra, D. Patra, and K. Kumar, “Fast leukocyte image segmentation using shadowed sets,” International Journal of Computational Biology and Drug Design, vol. 5, no. 1, pp. 49–65, Jan. 2012.

[8] C. Charrier, G. Lebrun, and O. Lezoray, “Evidential segmentation of microscopic color images with pixel classification posterior probabilities,” Journal of Multimedia, vol. 2, no. 3, 2007.

[9] O. Demirkaya, M. H. Asyali, and P. Sahoo, Image Processing with MATLAB: Applications in Medicine and Biology. Taylor and Francis, 2009.

[10] P. Lingras and C. West, “Interval set clustering of web users with rough k-means,” Journal of Intelligent Information Systems, vol. 23, pp. 5–16, 2004.

[11] N. Sinha and A. G. Ramakrishnan, “Automation of differential blood count,” in Proceedings of the Conference on Convergent Technologies for Asia-Pacific Region, vol. 2, 2003, pp. 547–551.

[12] S. Chinwaraphat, A. Sanpanich, C. Pintavirooj, M. Sangworasil, and P. Tosranon, “A modified fuzzy clustering for white blood cell segmentation,” in Proceedings of the Third International Symposium on Biomedical Engineering, vol. 6, 2008, pp. 2259–2261.

[13] S. Mohapatra, D. Patra, and K. Kumar, “Blood microscopic image segmentation using rough sets,” in Proceedings of the International Conference on Image Information Processing, November 2011, pp. 1–6.

[14] K. Tu, H. Yu, Z. Guo, and X. Li, “Learnability-based further prediction of gene functions in gene ontology,” Genomics, vol. 84, no. 6, pp. 922–928, 2004.
