[ieee 2010 second international conference on computational intelligence, modelling and simulation...
Comparison between K-Mean and C-Mean Clustering for CBIR
Ritu Shrivastava, Khushbu Upadhyay, Raman Bhati
Acropolis Institute of Technology and Research, Indore, India
[email protected], [email protected],
Durgesh Kumar Mishra,
Acropolis Institute of Technology and Research, Indore, India
Abstract: Traditionally, an image is retrieved with the
help of an associated tag that is added to the image when
it is stored in the database. This text based image
retrieval is time-consuming, laborious and expensive. To
overcome these flaws, content based image retrieval has
been proposed, which avoids the use of textual
descriptions and retrieves images based on their visual
similarity. To achieve this, images are clustered using
clustering techniques. Clustering groups similar images
based on some properties for efficient and faster
retrieval. This paper compares two clustering
techniques, K-mean and C-mean clustering, as used in
content based image retrieval systems.
Keywords: CBIR, clusters, seed points, K-mean, C-mean
I. INTRODUCTION
An image retrieval system is a computer system for
browsing, searching and retrieving images from a
large database of digital images. Most traditional and
common methods of image retrieval utilize some
method of adding metadata such as captioning,
keywords, or descriptions to the images so that
retrieval can be performed over the annotation words.
Manual image annotation is time-consuming,
laborious and expensive; to address this, there has
been a large amount of research done on automatic
image annotations. [3]
Additionally, the increase in
social web applications and the semantic web has
inspired the development of several web-based image
annotation tools.
Image search is a specialized data search used to
find images. To search for images, a user may
provide a query such as keywords or an image file/link,
or click on some image, and the system will return
images "similar" to the query. [1]
The similarity used
for search criteria could be meta tags, color
distribution in images, region/shape attributes, etc.
• Text based image retrieval - search of
images based on associated metadata such as
keywords, text, etc.
• Content based image retrieval (CBIR) – the
application of computer vision to the image
retrieval. CBIR aims at avoiding the use of
textual descriptions and instead retrieves
images based on their visual similarity to a
user-supplied query image or user-specified
image features. [3]
Recently, content based image retrieval has gained
much popularity, and a lot of research is going on to
make image retrieval easier and faster, and that too
with minimum implementation cost. [4]
II. LITERATURE SURVEY
P. Sankara Rao et al. [1] proposed a neural
network for content based image retrieval. The authors
first cluster the images available
in the database using hierarchical and K-mean
clustering. The clusters obtained are then supplied to
a neural network, which uses a radial basis function to
derive the images relevant to the user
query.
Tapas Kanungo, David M. Mount, Nathan S.
Netanyahu, Christine D. Piatko, Ruth Silverman, and
Angela Y. Wu [2] proposed an enhanced Lloyd's
algorithm which is simple to implement and compute
and gives better results than the other available K-mean
heuristics; the underlying K-mean problem itself is NP-hard.
The websites www.wikipedia.org [3], www.cs.cmu.edu
[6] and intranet.cs.man.ac.uk [7] provide general
information about the topic and give updates
on the research work done till now. They also provide
source code of these algorithms for different
applications in MATLAB.
Y. Rui, T. S. Huang and S.-F. Chang [4] discuss the
techniques used for image retrieval in the past and the
challenges faced with these techniques. They also discuss
the current progress in this field.
Chang Wen Chen, Jiebo Luo and Kevin J. Parker
[9] discuss the problems faced while using the K-mean
algorithm and propose an adaptive K-mean algorithm,
describing its working and its advantages over simple K-mean.
Weiling Cai, Songcan Chen and Daoqiang Zhang [10]
discuss fuzzy C-mean clustering, its working
and its drawbacks. The paper proposes that incorporating
local information in the objective function while
clustering improves the performance of the algorithm
and makes it resistant to noise and outliers.
III. COMPARATIVE ANALYSIS
Clustering is another term for grouping; it’s an
unsupervised form of learning. In clustering we
assign some label to data points that are close to each
Second International Conference on Computational Intelligence, Modelling and Simulation
978-0-7695-4262-1/10 $26.00 © 2010 IEEE
DOI 10.1109/CIMSiM.2010.66
other. This closeness relies on the distance metric
between data points. [3]
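To make the notion of closeness concrete, the following is a minimal Python sketch of two common distance metrics applied to image feature vectors. The three-bin colour histograms and the names `query`, `image_a` and `image_b` are hypothetical illustrations, not data from this paper, and the choice of metric is an assumption.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical 3-bin colour histograms used only for illustration.
query = (0.20, 0.50, 0.30)
image_a = (0.25, 0.45, 0.30)
image_b = (0.70, 0.10, 0.20)

# The image with the smaller distance is treated as "closer" to the query.
closest = min([image_a, image_b], key=lambda img: euclidean(query, img))
```

Under either metric, `image_a` is closer to the query than `image_b`, so a clustering step would tend to place it in the same group as the query.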
K-mean clustering aims to partition n observations
into k clusters in which each observation belongs to
the cluster with the nearest mean. It is similar to the
expectation-maximization algorithm for mixtures of
Gaussians in that both attempt to find the
centers of natural clusters in the data. [5]
The most common algorithm uses an iterative
refinement technique.
Algorithm:
Given the cluster number K, the K-mean algorithm
is carried out in three steps:
1) Initialize the seed points.
2) Assign each object to the cluster with the
nearest seed point.
3) Compute new seed points as the centroids of
the new clusters formed.
Steps 2 and 3 are repeated until the assignment of the
N observations to the K clusters no longer changes. [6]
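The three steps above can be sketched in plain Python as follows. This is an illustrative sketch and not the implementation used in any of the cited works: the function name `kmeans`, the random sampling of initial seed points, the `max_iter` cap and the toy `features` list are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-mean on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: initialize the seed points (here: k distinct observations, an assumption).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the nearest seed point.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Step 3: recompute each seed point as the centroid of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous seed point
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical 2-D feature vectors forming two well-separated groups.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = kmeans(features, k=2)
```

On this toy data the first three vectors end up in one cluster and the last three in the other; which cluster receives index 0 depends on the sampled seed points.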
In fuzzy C-mean clustering, by contrast, each point has
a degree of belonging to every cluster rather than
belonging completely to just one cluster. Thus, points
on the edge of a cluster may be in the cluster to a
lesser degree than points in the center of the cluster. [5]
Algorithm:
Given the cluster number K, the C-mean algorithm is
carried out in three steps:
1) Assign coefficients to each data point
randomly for being in the clusters.
2) Compute the centroid for each cluster.
3) For each data point, compute its coefficients
of being in the new clusters formed.
Steps 2 and 3 are repeated until the change in the
coefficient values between two successive iterations
does not exceed the given threshold value. [8]
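A minimal Python sketch of these steps is given below. It uses the standard fuzzy C-mean update rules with a fuzzifier `m`, which the text above does not specify; `m = 2`, the threshold `eps`, the function name `fuzzy_c_means` and the toy `features` data are assumptions chosen for illustration.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-mean on equal-length numeric tuples; assumes fuzzifier m > 1."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Step 1: random membership coefficients, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centroids = []
    for _ in range(max_iter):
        # Step 2: each centroid is the mean of all points weighted by u^m.
        centroids = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centroids.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Step 3: recompute the coefficients from distances to the new centroids.
        new_u = []
        for p in points:
            dists = [math.dist(p, cj) for cj in centroids]
            if min(dists) == 0.0:
                # The point coincides with a centroid: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_u.append([v / total for v in inv])
        # Stop once no coefficient changed by more than the threshold.
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    return centroids, u

# Hypothetical 2-D feature vectors forming two well-separated groups.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
fcm_centroids, memberships = fuzzy_c_means(features, c=2)
```

Each row of `memberships` holds one point's degrees of belonging to the two clusters and sums to 1; taking the largest entry per row gives a hard assignment comparable to the K-mean output.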
IV. PROPOSED WORK
The K-mean algorithm is easy to implement and
computationally fast. It is iterative in nature and
works for numerical data. [2] [6] For categorical data an
extended version known as the K-modes algorithm is
used. The algorithm is unsuitable for data with noise and
outliers, and it cannot form non-globular clusters. The
algorithm is deemed to have converged when the
assignments no longer change; however, only a local
minimum can be obtained. [9] The global optimum is not
guaranteed, and the result may depend on the initial
clusters. As the algorithm is usually very fast, it is
common to run it multiple times with different
starting conditions.
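The practice of running the algorithm several times can be sketched as follows: each run starts from different randomly sampled seed points, and the run with the lowest within-cluster sum of squared distances is kept. The helper names `lloyd` and `best_of_restarts`, the number of restarts and the toy `points` data are assumptions for illustration only.

```python
import math
import random

def lloyd(points, k, rng, max_iter=100):
    """One K-mean run from randomly sampled seed points; returns (cost, labels)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged, possibly only to a local minimum
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    # Within-cluster sum of squared distances: lower means tighter clusters.
    cost = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return cost, labels

def best_of_restarts(points, k, restarts=10, seed=0):
    """Repeat K-mean with different starting conditions and keep the cheapest run."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])

# Hypothetical 2-D feature vectors forming three groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (10, 0), (10, 1), (11, 0), (11, 1)]
single_cost, _ = lloyd(points, 3, random.Random(0))
best_cost, best_labels = best_of_restarts(points, 3, restarts=10, seed=0)
```

The best of several runs can never cost more than its own first run, but it still carries no guarantee of being the global optimum.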
The fuzzy C-mean algorithm minimizes intra-cluster
variance and always converges. It takes a long time
to compute, and its result depends on the initial
choice made.
Fuzzy C-mean clustering is sensitive to noisy data but
assigns a low degree of membership to outliers. [3] To
reduce sensitivity to noise, local information
should be incorporated in the objective function. [10]
However, C-mean clustering has the same problem as
K-mean: the minimum found is a local minimum.
V. CONCLUSION
This paper compares the K-mean and C-mean
clustering techniques for the CBIR system. The
analysis shows that both techniques work on the
distance metric concept; that is, both compute the
distance between the centroid of each cluster and a
data point. The point is associated most strongly with
the cluster whose centroid is at the minimum distance. Both
algorithms use a previously chosen number K (the number of
clusters to be formed); therefore the results depend
on the cluster number K and the initial choice of seed
points. The K-mean algorithm is easy and fast to
compute; the C-mean algorithm, on the other hand, takes
a long computational time. Both converge but suffer
from the problem of local minima.
REFERENCES
[1] P. Sankara Rao et al., “An approach for CBIR
system through multi layer neural network”,
International Journal of Engineering Science and
Technology Vol. 2(4), 2010, 559-563
[2] Tapas Kanungo, David M. Mount, Nathan S.
Netanyahu, Christine D. Piatko, Ruth Silverman, and
Angela Y. Wu, “An Efficient k-Means Clustering
Algorithm: Analysis and Implementation”, IEEE
transactions on pattern analysis and machine
intelligence, vol. 24, no. 7, July 2002.
[3] http://en.wikipedia.org/wiki/Cluster_analysis.
[4] Y. Rui, T. S. Huang, S.-F. Chang, “Image retrieval:
Past, present and future”, in: M. Liao (Ed.), Proceedings
of the International Symposium on Multimedia
Information Processing, Taipei, Taiwan, 1997.
[5] http://en.wikipedia.org/wiki/Cluster_analysis#k-means_clustering.
[6] http://www.cs.cmu.edu/~awm/tutorials.
[7] http://intranet.cs.man.ac.uk/mlo/comp20411/.
[8] http://ce.sharif.edu/~m_amiri/.
[9] Chang Wen Chen, Jiebo Luo and Kevin J. Parker,
“Image Segmentation via Adaptive K Mean
Clustering and Knowledge-Based Morphological
Operations with Biomedical Applications”, IEEE
transactions on image processing, vol. 7, no. 12,
December 1998.
[10] Weiling Cai, Songcan Chen, Daoqiang Zhang,
“Fast and robust fuzzy c-means clustering algorithms
incorporating local information for image
segmentation”, ISSN:0031-3203.