[ieee 2010 second international conference on computational intelligence, modelling and simulation...


Page 1: [IEEE 2010 Second International Conference on Computational Intelligence, Modelling and Simulation (CIMSiM) - Bali, Indonesia (2010.09.28-2010.09.30)] 2010 Second International Conference

Comparison between K-Mean and C-Mean Clustering for CBIR

Ritu Shrivastava, Khushbu Upadhyay, Raman Bhati

Acropolis Institute of Technology and Research, Indore, India


Durgesh Kumar Mishra,

Acropolis Institute of Technology and Research, Indore, India


Abstract: Traditionally, an image is retrieved with the help

of an associated tag, which is added to the image when it is

stored in the database. This text based image retrieval is

time-consuming, laborious and expensive. To overcome these

flaws, content based image retrieval has been proposed, which

avoids the use of textual descriptions and retrieves images

based on their visual similarity. To achieve this, images are

clustered using clustering techniques. Clustering groups

similar images based on some properties for efficient and

faster retrieval. This paper compares two clustering

techniques, K-mean and C-mean clustering, as used in

content based image retrieval systems.

Keywords: CBIR, clusters, seed points, K-mean, C-mean

I. INTRODUCTION

An image retrieval system is a computer system for

browsing, searching and retrieving images from a

large database of digital images. Most traditional and

common methods of image retrieval utilize some

method of adding metadata such as captioning,

keywords, or descriptions to the images so that

retrieval can be performed over the annotation words.

Manual image annotation is time-consuming,

laborious and expensive; to address this, there has

been a large amount of research done on automatic

image annotations. [3]

Additionally, the increase in

social web applications and the semantic web has

inspired the development of several web-based image

annotation tools.

Image search is a specialized data search used to

find images. To search for images, a user may

provide query terms such as keyword, image file/link,

or click on some image, and the system will return

images "similar" to the query. [1]

The similarity used

for search criteria could be meta tags, color

distribution in images, region/shape attributes, etc.

• Text based image retrieval – search of

images based on associated metadata such as

keywords, text, etc.

• Content based image retrieval (CBIR) – the

application of computer vision to image

retrieval. CBIR aims at avoiding the use of

textual descriptions and instead retrieves

images based on their visual similarity to a

user-supplied query image or user-specified

image features. [3]

Recently, content based image retrieval has gained

much popularity, and a lot of research is under way to

make image retrieval easier and faster at

minimum implementation cost. [4]

II. LITERATURE SURVEY

P. Sankara Rao et al. [1] proposed a neural

network for content based image retrieval. The authors

first cluster the images available in the database

using hierarchical and K-mean clustering. The

clusters obtained are then supplied to the neural

network, which uses a radial basis function to

retrieve the images relevant to the user

query.

Tapas Kanungo, David M. Mount, Nathan S.

Netanyahu, Christine D. Piatko, Ruth Silverman, and

Angela Y. Wu [2] proposed an enhanced version of Lloyd’s

algorithm which is simple to implement and compute

and gives better results than other available K-mean

heuristics. Heuristics are needed because the underlying

K-mean clustering problem is NP-hard.

The websites www.wikipedia.org [3], www.cs.cmu.edu

[6] and intranet.cs.man.ac.uk [7] provide general

information about the topic and give updates

on the research work done till now. They also provide

source code of these algorithms for different

applications in MATLAB.

Y. Rui, T. S. Huang and S.-F. Chang [4] discuss the

techniques used for image retrieval in the past and the

challenges faced with these techniques. They also discuss

the current progress in this field.

Chang Wen Chen, Jiebo Luo and Kevin J. Parker

[9] discuss the problems faced while using the K-mean

algorithm and propose an adaptive K-mean algorithm,

describing its working and its advantages over simple K-mean.

Weiling Cai, Songcan Chen and Daoqiang Zhang [10]

discuss fuzzy C-mean clustering, its working

and its drawbacks. The paper proposes that incorporating

local information in the objective function while

clustering improves the performance of the algorithm

and makes it resistant to noise and outliers.

III. COMPARATIVE ANALYSIS

Clustering is another term for grouping; it’s an

unsupervised form of learning. In clustering we

assign some label to data points that are close to each

other. This closeness relies on the distance metric

between data points. [3]

Second International Conference on Computational Intelligence, Modelling and Simulation

978-0-7695-4262-1/10 $26.00 © 2010 IEEE

DOI 10.1109/CIMSiM.2010.66

104

K-mean clustering aims to partition n observations

into k clusters in which each observation belongs to

the cluster with the nearest mean. It is similar to the

expectation-maximization algorithm for mixtures of

Gaussians in the sense that both attempt to find the

centers of natural clusters in the data. [5]

The most common algorithm uses an iterative

refinement technique.

Algorithm:

Given the cluster number K, the K-means algorithm

is carried out in three steps:

1) Initialize the seed points.

2) Assign each object to the cluster with the

nearest seed point.

3) Compute new seed points as the centroids of

the new clusters formed.

Steps 2 and 3 are repeated until the assignment of the

N observations to the K clusters no longer changes. [6]
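
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in any of the cited works: the function name `kmeans`, the optional `init` argument, the use of Euclidean distance, the random choice of initial seed points and the `max_iter` cap are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k clusters.

    `init` optionally supplies the k starting seed points (an assumption
    of this sketch); otherwise k observations are picked at random.
    """
    rng = random.Random(seed)
    # Step 1: initialize the seed points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the nearest seed point
        # (Euclidean distance is assumed here).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: stop iterating
        labels = new_labels
        # Step 3: compute new seed points as the centroids of the clusters.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous seed point
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

In a CBIR setting each tuple would be a feature vector extracted from one image (for example a colour histogram); the feature extraction itself is outside the scope of this sketch.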

In fuzzy C-mean clustering, by contrast, each point has

a degree of belonging to clusters rather than

belonging completely to just one cluster. Thus, points

on the edge of a cluster may be in the cluster to a

lesser degree than points in the center of the cluster. [5]
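
The degree of belonging can be written down explicitly. The block below gives the commonly used formulation of fuzzy C-mean clustering; the notation is an assumption of this note and does not appear in the paper: x_i are the N data points, c_j the K cluster centroids, u_ij the degree to which point i belongs to cluster j, and m > 1 a fuzzifier parameter (m = 2 is a frequent choice).

```latex
% Objective minimized by fuzzy C-mean clustering (standard formulation)
J_m = \sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij}^{\,m} \, \lVert x_i - c_j \rVert^2,
\qquad \sum_{j=1}^{K} u_{ij} = 1 \quad \text{for every } i

% Centroid of cluster j: mean of all points weighted by their membership
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}}

% Membership of point i in cluster j, from its distances to all centroids
u_{ij} = \frac{1}{\sum_{l=1}^{K}
  \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{\frac{2}{m-1}}}
```

A point that is much closer to one centroid than to the others gets a membership near 1 for that cluster, while a point lying between clusters gets its membership split among them.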

Algorithm:

Given the cluster number K, the C-mean algorithm is

carried out in three steps:

1) Randomly assign to each data point coefficients

for being in the clusters.

2) Compute the centroid for each cluster.

3) For each data point, compute its coefficients

of being in the new clusters formed.

Steps 2 and 3 are repeated until the change in the

coefficients between two iterations no longer exceeds

the given threshold value. [8]
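
The C-mean steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code of any cited source: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the threshold `eps`, the `max_iter` cap and the use of Euclidean distance are choices made for the example.

```python
import math
import random


def fuzzy_c_means(points, k, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Return (centroids, u), where u[i][j] is the degree to which
    point i belongs to cluster j. `m` is the assumed fuzzifier (> 1)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random coefficients, normalized so each point's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    for _ in range(max_iter):
        # Step 2: centroid of each cluster = mean of all points weighted by u^m.
        centroids = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))
        # Step 3: recompute each point's coefficients from its distances.
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            if min(dists) == 0.0:
                # Point sits exactly on a centroid: give it full membership there.
                zeros = [d == 0.0 for d in dists]
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)
        # Stop when no coefficient changed by more than the threshold.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change <= eps:
            break
    return centroids, u
```

Taking the cluster with the largest coefficient for each point turns the fuzzy result into a hard assignment comparable with the K-mean output.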

IV. PROPOSED WORK

The K-mean algorithm is easy to implement and

computationally fast. It is iterative in nature and

works for numerical data. [2] [6]

For categorical data, an

extended version known as the K-modes algorithm is

used. The algorithm is unsuitable for noisy data and

outliers, and it cannot form clusters of non-globular shape. The

algorithm is deemed to have converged when the assignments

no longer change; however, only a local minimum may be obtained. [9]

Because the global optimum is not guaranteed, the result

may depend on the initial

clusters. As the algorithm is usually very fast, it is

common to run it multiple times with different

starting conditions.
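
Running the algorithm several times and keeping the best run can be sketched as below. The selection criterion (the within-cluster sum of squared distances), the helper names `kmeans_once`, `within_cluster_ss` and `best_of_restarts`, and the default of 10 restarts are assumptions of this example; the paper does not specify how the best run is chosen.

```python
import math
import random


def kmeans_once(points, k, rng, max_iter=100):
    """One K-mean run from randomly chosen seed points (Euclidean distance)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-mean `restarts` times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centroids, labels = kmeans_once(points, k, rng)
        cost = within_cluster_ss(points, centroids, labels)
        if best is None or cost < best[0]:
            best = (cost, centroids, labels)
    return best
```

Each restart may still end in a local minimum; taking the lowest-cost run only makes a poor result less likely, it does not guarantee the global optimum.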

The fuzzy C-mean algorithm minimizes intra-cluster

variance and always converges. The algorithm

takes a long time to compute. The result of

the algorithm depends on the initial choice made.

Fuzzy C-mean clustering is sensitive to noisy data but

assigns a low degree of membership to outliers. [3]

To

reduce sensitivity to noise, local information

should be incorporated in the objective function. [10]

However, C-mean clustering has the same problem as

K-mean: the minimum found is a local minimum.

V. CONCLUSION

This paper compares the K-mean and C-mean

clustering techniques for the CBIR system. The

analysis shows that both techniques work on the

distance metric concept, that is, both compute the

distance between the centroid of a cluster and a data

point. The point with the minimum distance is taken

into account and is added to the cluster. Both

algorithms use a previously fixed number K (the number of

clusters to be formed); therefore the results depend

on the cluster number K and the initial choice of seed

points. The K-mean algorithm is easy and fast to

compute; the C-mean algorithm, on the other hand, takes

a long computational time. Both converge but suffer

from the problem of local minima.

REFERENCES

[1] P. Sankara Rao et al., “An approach for CBIR

system through multi layer neural network”,

International Journal of Engineering Science and

Technology, Vol. 2(4), 2010, 559-563.

[2] Tapas Kanungo, David M. Mount, Nathan S.

Netanyahu, Christine D. Piatko, Ruth Silverman, and

Angela Y. Wu, “An Efficient k-Means Clustering

Algorithm: Analysis and Implementation”, IEEE

transactions on pattern analysis and machine

intelligence, vol. 24, no. 7, July 2002.

[3] http://en.wikipedia.org/wiki/Cluster_analysis.

[4] Y. Rui, T. S. Huang, S.-F. Chang, “Image retrieval:

Past, present and future”, in: M. Liao (Ed.), Proceedings

of the International Symposium on Multimedia

Information Processing, Taipei, Taiwan, 1997.

[5] http://en.wikipedia.org/wiki/Cluster_analysis#k-means_clustering.

[6] http://www.cs.cmu.edu/~awm/tutorials.

[7] http://intranet.cs.man.ac.uk/mlo/comp20411/.

[8] http://ce.sharif.edu/~m_amiri/.

[9] Chang Wen Chen, Jiebo Luo and Kevin J. Parker,

“Image Segmentation via Adaptive K Mean

Clustering and Knowledge-Based Morphological

Operations with Biomedical Applications”, IEEE

transactions on image processing, vol. 7, no. 12,

December 1998.

[10] Weiling Cai, Songcan Chen, Daoqiang Zhang,

“Fast and robust fuzzy c-means clustering algorithms

incorporating local information for image

segmentation”, ISSN:0031-3203.
