
Page 1: Unsupervised Clustering of Images using their Joint ...leibniz.cs.huji.ac.il/tr/acc/2003/HUJI-CSE-LTR... · windows. The models are used in our enhancement of [6] algorithm to jointly

Unsupervised Clustering of Images using their Joint Segmentation

Sonia Starik Yevgeny Seldin Michael Werman

School of Computer Science and Engineering, The Hebrew University of Jerusalem

Jerusalem, Israel
E-mail: {starik,seldin,werman}@cs.huji.ac.il

Abstract

We present a method for unsupervised content-based classification of images. The idea is to first segment the images using centroid models common to all the images in the set, and then, by drawing an analogy between models/images and words/documents, to apply algorithms from the field of unsupervised document classification to cluster the images.

We choose our centroid models to be histograms of marginal distributions of wavelet coefficients on image subwindows. The models are used in our enhancement of the algorithm of [6] to jointly segment all the images in the input set. Finally, we use the sequential Information Bottleneck algorithm [14] to cluster the images based on the result of the segmentation.

The method is applied to the classification of nature views and the categorization of paintings by drawing style. It is shown to be superior to image classification algorithms that regard each image as a single model. We see our current work as opening a new perspective on high-level unsupervised data analysis.

1. Introduction

Clustering a set of images into meaningful classes has many applications in the organization of image databases and the design of their human interfaces. [10, 8, 17, 13, 4] are only a few of the reviews and recent works in the field. Image clustering can also be used to segment a movie, for example [5], and to facilitate image data mining, for example searching for interesting partitions of medical images [15].

There are numerous papers describing algorithms for supervised and unsupervised segmentation of images, although it is still not a completely solved problem. There have also been a number of papers describing algorithms for supervised clustering of images, and fewer papers describing unsupervised clustering of images. See [1] for a nice example of unsupervised clustering using extra information.

In this paper we treat the problem of unsupervised clustering of an image set into clusters of images, where we are given only the images.

The algorithm has three main steps. The first step is to compute histograms from wavelet coefficients and color for an overlapping net of small windows over each image in the input set. The second step can be viewed either as unsupervised joint segmentation of a large set of images, or as finding a small set of histograms (centroids) that, as a mixture, best describes the various histograms present in the input. We approach this problem in a deterministic annealing framework, looking for a top-down hierarchy of segmentations at increasing levels of resolution. We start with a single average model for all the histograms, then repeatedly split the model capturing the largest portion of the data and use an EM-like procedure to segment that portion.

The last step is to use the co-occurrences of centroids (models) and images in order to cluster the images. Treating centroids as words and images as documents, we use the sequential Information Bottleneck algorithm [14] to achieve this goal. It should be noted that while the idea of a parallel between word counts in a document and the integral of a model (feature) probability over an image has already appeared in the supervised classification literature [7], we see our current work as a new point of view on unsupervised data classification. It enables an automatic search for significant features in the input data, namely data segments common to multiple objects in the input. The procedure may be seen as parsing unmarked texts into words and subsequently classifying the texts based on the segmented words. It is suitable for cases where the input data units consist of several objects with no predefined ordering between them, so that simple classification methods, such as straightforward comparison of the units, are not applicable. The suggested approach is very general and may be applied not only to image classification, but also to unsupervised analysis of audio signals, protein sequences, spike trains



and many other types of data, using appropriate algorithms and data structures for their segmentation.

The first two steps of the algorithm bear a close resemblance to [6]. There are two major differences. First, our algorithm jointly segments all the images in the set. Thus the models (or cluster centroids) we get after segmentation are common to all the images in the set and may be seen as a “model dictionary”. Note that those models may be used to segment new images from outside the set and, through that segmentation, to associate them with the corresponding clusters. The second difference is that we explicitly introduce the dependence between neighboring image locations through the definition of our distance measure. This idea was taken from our previous work on one-dimensional data [12] and stabilizes the segmentation process (in [6] a post-processing step filtering out noisily labeled regions was required).

Another work slightly similar to ours is [4]. They represent each image with a Gaussian mixture over pixel color and location values. The mixtures, and thus the images corresponding to them, are then clustered using agglomerative Information Bottleneck. Compared to this work, our use of joint segmentation of the images and of the sequential Information Bottleneck gives a more global, top-down view of the data. Also, the chosen wavelet-based data modeling makes it possible to operate with much more complex and location-independent features of the data.

2. Probabilistic image modeling based on texture and color features

We start with a description of our parametric model for image sites. Being the basic building block of the algorithm, the correct choice of the model is of crucial importance for the final success of the whole process. In the current work we choose our image sites to be square subimages (windows) of a predefined size, and we use a texture approach to model them parametrically: we model each window as if it were a homogeneous texture sampled i.i.d. from a single distribution, and an image as a collection of those textures.

To model a single window, we use the common approach that a texture can be identified by the energy distribution in its frequency domain, via modeling the marginal densities of its wavelet subband coefficients. Such an approach was successfully used by [3] in image retrieval applications.

In our algorithm, we characterize a texture as a set of marginal histograms of its wavelet subband coefficients. The number of bins in a histogram was chosen to be the square root of the total number of coefficients at the corresponding subband, as a compromise between distribution resolution and statistical significance of the empirical counts. In order to assign approximately the same number of samples to each bin of the histogram, we make a coarse estimate of the distribution of coefficients in the subband as a Gaussian, fitting the parameters on the whole data set, and use the inverse of the Gaussian distribution to construct the histogram bins. Although the distribution is more likely to be a Generalized Gaussian density (as was shown by [3]), this coarse estimation by a simple Gaussian is sufficient. The conventional pyramid wavelet decomposition is then performed with a predefined number of levels using one of the known wavelet filters (we used Daubechies and reverse biorthogonal wavelets), and one histogram is created for each subband. We normalize the histograms to obtain probability distributions and take the resulting set of distributions to be our parametric model for the window. By moving to the space of wavelet histograms, the number of parameters characterizing the texture image is reduced to about the square root of the image area, and we also profit from the statistical nature of such a model.
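As an illustration, the window model above can be sketched as follows. This is a minimal sketch, not the authors' code: it substitutes a plain one-level-per-iteration Haar decomposition for the Daubechies/reverse-biorthogonal filters used in the paper, and builds Gaussian-quantile bin edges as described; all function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def haar_subbands(img):
    """One level of a 2-D Haar transform: returns (LL, LH, HL, HH)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def subband_histogram(coeffs):
    """Normalized histogram with ~sqrt(n) bins whose edges are Gaussian
    quantiles fitted to the coefficients, so bins get roughly equal mass."""
    x = coeffs.ravel()
    n_bins = max(2, int(round(np.sqrt(x.size))))
    g = NormalDist(float(x.mean()), float(x.std()) + 1e-12)
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.concatenate(([-np.inf], [g.inv_cdf(q) for q in qs], [np.inf]))
    counts, _ = np.histogram(x, bins=edges)
    return counts / counts.sum()

def window_model(window, levels=2):
    """Window model: one normalized histogram per detail subband per level."""
    model = []
    ll = np.asarray(window, float)
    for _ in range(levels):
        ll, lh, hl, hh = haar_subbands(ll)
        model += [subband_histogram(s) for s in (lh, hl, hh)]
    return model
```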

As a similarity measure between distributions p and q we use the Kullback-Leibler divergence, which is a natural measure of distance between probability distributions and is defined as D(p ‖ q) = Σ_i p(i) log(p(i)/q(i)) (see [2]). To compute the distance from a window model w to a centroid model c we compute the pairwise divergences D(w_s ‖ c_s) for each subband s and take a weighted sum as the total distance. Since the number of coefficients decreases by a factor of 4 from level to level, and due to the fact that there is more variability at higher resolutions, we give a correspondingly decreasing weight to the distances at lower resolutions.
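In code, the distance computation might look like this. It is a sketch under the assumptions that a model is a flat list of per-subband histograms, finest level first, and that the weights decay by the same factor of 4 as the coefficient counts; the paper does not specify its exact weights.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two histograms.
    A small eps keeps empty bins from producing log(0)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def model_distance(window_model, centroid_model, levels, bands_per_level=3):
    """Weighted sum of per-subband KL divergences; coarser levels get
    smaller weights (hypothetical factor-of-4 decay per level)."""
    d = 0.0
    for lvl in range(levels):
        weight = 4.0 ** (levels - 1 - lvl)   # finest level weighted most
        for b in range(bands_per_level):
            s = lvl * bands_per_level + b
            d += weight * kl(window_model[s], centroid_model[s])
    return d
```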

To build a centroid for a weighted set of window models, we build an average model as a weighted sum: for each subband s,

c_s = Σ_i α_i · w_{i,s},

where w_{i,s} is the histogram of subband s of window model i and α_i is the (normalized) weight of the window. The average model computed this way minimizes the weighted sum of distances of the windows to the centroid, min_c Σ_i α_i D(w_i ‖ c) (see [6]). The centroid model has exactly the same parametric structure as the window models.
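A sketch of the centroid update (the function name is illustrative; weights are renormalized so the average stays a distribution):

```python
import numpy as np

def weighted_average_model(window_models, weights):
    """Per-subband weighted average of window histograms.  For the KL
    distortion used here, this average minimizes the weighted sum of
    divergences D(w_i || c) over the centroid c."""
    w = np.asarray(weights, float)
    w = w / w.sum()                      # normalize the window weights
    n_bands = len(window_models[0])
    return [sum(wi * np.asarray(m[s], float) for wi, m in zip(w, window_models))
            for s in range(n_bands)]
```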

In the same manner we can use other features, such as image color, or a weighted combination of two or more types of wavelet filters, or a combination of all of them.

In our experiments, when we used a color model, we took a histogram of the hue component of the image color space.

3. Joint image segmentation algorithm

As different regions in a set of images are likely to have different textures, while a specific texture may appear at many locations in different images in the set, we want to segment the set into the family of textures constituting it. The goal is to represent each image in the set as



a soft mixture of a small number of textures common to all the images in the set. Segmentation of the images is done in a top-down hierarchical manner. The algorithm integrates the ideas developed in [12] for sequence segmentation with the texture segmentation algorithm of [6].

Our segmentation works on a shifted grid of windows of size g × g with shifts of h (g is divisible by h). This results in an image being divided into small patches of size h × h, which determine the resolution of our clustering. The dependence between overlapping grid windows {w_j} is introduced through the definition of the distance measure between a patch x_i and a model c_k:

d(c_k ; x_i) = ( Σ_{w_j ∋ x_i} D(W(w_j) ‖ c_k) ) / |{w_j : x_i ∈ w_j}|,

and the assignment probabilities α_{c_k}(w_j):

α_{c_k}(w_j) = p(c_k | w_j) = ( Σ_{x_i ∈ w_j} α_{c_k}(x_i) ) / |{x_i : x_i ∈ w_j}|,

where W(w_j) is the set of wavelet histograms of the window w_j.

The goal of the segmentation process is to find K segment models {c_k}_{k=1}^K such that the average in-segment distortion ⟨d⟩ = (1/N) Σ_k Σ_i p(c_k | x_i) · d(c_k ; x_i) is minimal (N is the total number of patches). We approach this problem in a deterministic annealing framework (see [9]), which essentially uses EM to estimate the segmentation at increasing levels of resolution. This framework allows us to avoid many local minima, as well as to obtain a hierarchy of segmentation solutions. In this work we also present a new algorithmic enhancement, which we call forced hierarchy, that significantly speeds up the segmentation process.

We start the description of the algorithm from the inner loop of Soft-Clustering. One problem we address is how to divide (segment) the image patches among given models. For this purpose we define an objective functional for the segmentation step:

min_{p(c_k | x_i) : Σ_k p(c_k | x_i) = 1} I(X; C)  subject to  ⟨d⟩ ≤ d_max,   (1)

where I(X; C) = (1/N) Σ_i Σ_k p(c_k | x_i) log( p(c_k | x_i) / p(c_k) ) is the mutual information between patches and assigned models, with p(c_k) = (1/N) Σ_i p(c_k | x_i), and d_max is the maximal currently permitted distortion (the current resolution); it is gradually decreased as we advance with the segmentation. (1) is known as the Rate Distortion function in Rate Distortion theory (see [2]). The idea is that the assignment minimizing the mutual information subject to the given constraints is the most probable one (see [2, 9]). (1) is solved by the Blahut-Arimoto algorithm, which iteratively repeats the first two steps of Soft-Clustering (Fig. 1) until convergence. However, from our previous experience [11], it is better to make just a single pass through this computation between subsequent model updates in the Soft-Clustering procedure we describe now.

The Soft-Clustering procedure is much like the EM algorithm; the only difference is that in our case it is repeated numerous times at increasing levels of resolution β (β is the Lagrange multiplier corresponding to d_max in the solution of (1)). It starts from a pair of models c_1, c_2 and a prior distribution p(c) over them, and iterates between data partition (steps 1 and 2) and model re-estimation (step 3) until convergence to a local minimum (Fig. 1). The meaning of the last parameter, α_parent, is explained below.

Soft-Clustering(c_{1,2}, p(c_{1,2}), β, α_parent)

Repeat until convergence:

1. α_{c_k}(x_i) = p(c_k | x_i) = [ p(c_k) e^{-β d(c_k ; x_i)} / Σ_{k'} p(c_{k'}) e^{-β d(c_{k'} ; x_i)} ] · α_parent(x_i)

2. p(c_k) = (1/N) Σ_{i=1}^{N} p(c_k | x_i)

3. c_k = WeightedAverageModel({W(w_j)}, {α_{c_k}(w_j)})

Figure 1: Soft-Clustering procedure
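Steps 1 and 2 of the procedure can be sketched as follows, with the distance matrix held fixed for brevity (the paper's step 3 re-estimates the centroids between passes). Here dist[k, i] stands for d(c_k; x_i) and alpha_parent restricts the children to their parent's data; all names are illustrative.

```python
import numpy as np

def soft_cluster_step(dist, p_c, beta, alpha_parent):
    """One Blahut-Arimoto pass: returns assignments alpha[k, i] = p(c_k|x_i)
    scaled by alpha_parent, and the updated model priors p(c_k)."""
    logits = np.log(p_c)[:, None] - beta * dist
    post = np.exp(logits - logits.max(axis=0, keepdims=True))
    post = post / post.sum(axis=0, keepdims=True)   # p(c_k | x_i)
    alpha = post * alpha_parent[None, :]            # restrict to parent's data
    new_p_c = alpha.sum(axis=1) / alpha_parent.sum()
    return alpha, new_p_c

def soft_cluster(dist, p_c, beta, alpha_parent, n_iter=100, tol=1e-8):
    """Iterate until the priors stop moving (convergence to a local minimum)."""
    for _ in range(n_iter):
        alpha, new_p_c = soft_cluster_step(dist, p_c, beta, alpha_parent)
        if np.abs(new_p_c - p_c).max() < tol:
            return alpha, new_p_c
        p_c = new_p_c
    return alpha, p_c
```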

The idea of deterministic annealing (DA) is to look at a series of soft clustering solutions at increasing levels of resolution β. Each subsequent search is initialized by splitting the set of models obtained at the previous convergence. Splitting a model creates two new copies of the given model with small antisymmetric perturbations of all its parameters. DA has the following nice property: at each given resolution β there is a finite number of models K_β required to describe the data at that resolution. If the number of models is greater than K_β, Soft-Clustering converges with the extra models either unified together (their parameters are almost identical and the distance between them is negligible) or “having lost all their data” (p(c) → 0). When this happens we may increase the resolution parameter β to search for finer partitions of the data. The advantage of the DA approach is that at low resolution the potential function of the data partition is smoothed. This enables us to avoid many shallow local minima and to achieve qualitatively better results.

In this paper we suggest a computational improvement to the DA procedure, which we call forced hierarchy. The idea is that child models created after a split compete only for the part of the data that belonged to their parent. This is



done through the normalization vector α_parent we introduced in Soft-Clustering. This limits the algorithm's search space (in the original algorithm each model competes with all the rest of the models over all the data) but substantially decreases the computation time. Note that in such an approach each branch in the hierarchy is completely independent of the other branches. Thus we proceed with segmentation in each branch independently of the other branches. For each model we remember the value of β at which it was created, each time choosing to split the largest model (the one having maximal p(c)) and continuing its branching from the point where it was interrupted. This way we obtain a division of the data into approximately even-sized segments and avoid generating an unnecessary number of small models that capture large variations in small portions of the data. Since we work at increasing levels of resolution, avoiding shallow local minima, the new approach gives almost no degradation of the final result. Moreover, since its running time is substantially better, in the same time we manage to get a much finer segmentation. The only current drawback of the new approach is that we lack the cluster-stability indication we previously had (segmentation solutions that remained stable through relatively long intervals of β increments were stable clusters). But since the focus of this paper is on classification of images based on their segmentation, and since we take soft integrals when building our co-occurrence matrix (see next section), finding stable segmentations is less crucial than finding a fine segmentation.

The complete algorithm is given in Fig. 2. We start with the model set M containing a single model trained on the whole image set. Then we repeatedly choose the model c having the largest p(c) from M, split it, and soft-cluster the patches that belonged to c (according to α_c) between the two child models c_1 and c_2. The resolution β is set to the value β_c at which c was created (initially a small value). If after soft clustering c_1 and c_2 become almost identical, or one of them empties, we discard the two models, increase β_c and split c again. Otherwise we discard c and add c_1 and c_2 to M, setting β_{c_1} = β_{c_2} = β_c. This is repeated until the process is terminated or 1/|M| drops below the minimal model probability taken into consideration.
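The split operation described above (two copies of a model with small antisymmetric perturbations) could be sketched as follows; eps is a hypothetical perturbation scale, and the renormalization keeps each child histogram a distribution.

```python
import numpy as np

def split_model(model, eps=0.01, rng=None):
    """Split a centroid into two children whose histograms are perturbed
    in opposite (antisymmetric) directions, then renormalized."""
    rng = rng if rng is not None else np.random.default_rng(0)
    child1, child2 = [], []
    for h in model:
        noise = eps * rng.standard_normal(h.shape) * h   # antisymmetric perturbation
        a = np.clip(h + noise, 1e-12, None)
        b = np.clip(h - noise, 1e-12, None)
        child1.append(a / a.sum())
        child2.append(b / b.sum())
    return child1, child2
```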

4. Image classification according to the segmentation map

Now we come to the final but major algorithmic point of our paper. We use the segmentation result obtained in the previous section for unsupervised classification of the input set of images. The idea is to regard each image I_i as a document and each model c_k as a word, defining p(c_k | I_i) = (1/n(I_i)) Σ_{w_j ∈ I_i} p(c_k | w_j), analogous to the normalized count of the number of appearances of c_k in I_i

The segmentation algorithm:

Initialization:

For all i, α_{c_0}(x_i) = 1

c_0 = BuildModel(X, α_{c_0})

M = {c_0}, p(c_0) = 1

β_{c_0} = β_0

Annealing loop:

While 1/|M| ≥ ε_1:

1. Take the c having the largest p(c) out of M.

2. {c_1, c_2} = Split(c)

3. {c_1, c_2, α_{c_1}, α_{c_2}} = Soft-Cluster(c_{1,2}, {(1/2) p(c)}_{1,2}, β_c, α_c)

4. if ( p(c_1) < ε_1 or p(c_2) < ε_1 or d(c_1, c_2) < 1/β_c )

(a) β_c = β_c · β_mult

(b) Return c to M.

else

(a) β_{c_1} = β_{c_2} = β_c

(b) Add c_1 and c_2 to M.

Figure 2: Unsupervised Sequence Segmentation algorithm.

(n(I_i) is the number of patches in I_i). With this analogy we use the sequential Information Bottleneck (sIB) clustering algorithm [14] to cluster the images via the co-occurrence matrix p(I_i, c_k).
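The image/document analogy amounts to averaging the soft patch assignments over each image; a sketch (patch_assignments[i] is assumed to be an (n_patches_i × K) array of p(c_k | x) for image I_i; names are illustrative):

```python
import numpy as np

def model_given_image(patch_assignments):
    """Rows are images: p(c_k | I_i) = mean over the image's patches of
    the soft assignment p(c_k | x), i.e. the normalized 'word count'
    of model c_k in image I_i."""
    return np.stack([np.asarray(a, float).mean(axis=0)
                     for a in patch_assignments])
```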

The sIB clustering algorithm is based on the Information Bottleneck (IB) clustering framework proposed in [16]. Its goal is to represent the input images {I_i} with a small number of clusters {T_r}, so that the distribution of models (features) inside each cluster, p(c_k | T_r) = (1/|T_r|) Σ_{I_i ∈ T_r} p(c_k | I_i), is as close as possible to the original distributions p(c_k | I_i) of the images constituting the clusters. In [16] it is argued that the correct measure of the quality of the representation is the mutual information fraction I(T; C) / I(I; C). As well, the mutual information I(T; I) measures the compactness of the representation. [16] derive a formal solution to the quality-compactness tradeoff problem, a set of self-consistent equations that can be solved iteratively:

1. p(T_r | I_i) = ( p(T_r) / Z(β, I_i) ) · exp( -β Σ_k p(c_k | I_i) log( p(c_k | I_i) / p(c_k | T_r) ) )

2. p(c_k | T_r) = (1 / (n · p(T_r))) Σ_i p(T_r | I_i) p(c_k | I_i)

3. p(T_r) = (1/n) Σ_i p(T_r | I_i)


Here n is the number of images and Z(β, I_i) is a normalization factor. The Lagrange multiplier β determines the tradeoff between compactness and precision of the representation.
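A sketch of iterating the three equations (not the authors' code: uniform prior over the n images, random soft initialization, and all names are illustrative):

```python
import numpy as np

def ib_iterate(p_c_given_i, beta, k, n_iter=100, rng=None):
    """Iterate the three self-consistent IB equations.
    p_c_given_i[i, j] = p(c_j | I_i); returns p(T|I), p(C|T), p(T)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = p_c_given_i.shape[0]
    eps = 1e-12
    p_t_given_i = rng.dirichlet(np.ones(k), size=n)      # random soft start
    for _ in range(n_iter):
        p_t = p_t_given_i.mean(axis=0)                                       # eq. 3
        p_c_given_t = (p_t_given_i.T @ p_c_given_i) / (n * p_t[:, None] + eps)  # eq. 2
        # eq. 1: p(T_r | I_i) ∝ p(T_r) exp(-beta KL(p(c|I_i) || p(c|T_r)))
        logratio = (np.log(p_c_given_i + eps)[:, None, :]
                    - np.log(p_c_given_t + eps)[None, :, :])
        d = np.sum(p_c_given_i[:, None, :] * logratio, axis=2)   # (n, k) KLs
        w = np.log(p_t + eps)[None, :] - beta * d
        w = np.exp(w - w.max(axis=1, keepdims=True))
        p_t_given_i = w / w.sum(axis=1, keepdims=True)
    return p_t_given_i, p_c_given_t, p_t
```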

In [14] it was shown that the sIB algorithm has superior performance over a range of other clustering methods (agglomerative, K-means and more), typically by a significant margin, and is even comparable with a standard (supervised) Naive Bayes classifier on document classification problems. The idea of the sIB algorithm is to start from a random partition of the data into K clusters and sequentially take a random sample I_i out of its cluster and reassign it to a new cluster T_r, so that the mutual information I(T; C) (and thus the quality of the representation I(T; C) / I(I; C)) is maximized. Due to the monotonic growth of I(T; C), the procedure is guaranteed to converge.

5. Results

5.1 Basic Texture Segmentation

As a basic test of our segmentation algorithm, we took synthetic mixes of gray-scale textures from [6], each of which consists of five different natural textures, and applied our segmentation algorithm with appropriately chosen grid size, grid shift and number of decomposition levels, using Daubechies wavelet filters. A typical result (hard partition) is shown in Fig. 3. The image is divided into 5 models, close to each of the 5 main image regions. In the soft partition (not shown here) similar-looking textures softly share some of the models, while distant textures do not have any intersections.

5.2 Clustering - synthetic example

This experiment was done to demonstrate the advantage of image segmentation as a preprocessing step for classification. Here we compare the results of the algorithm when using small windows vs. taking one big window (grid size = grid shift = image size), that is, skipping the segmentation step and treating each image as a single model.

In both cases we used the same decomposition settings with Daubechies wavelet filters. We used two sets of synthetic images (three in each set), and the algorithm classified them correctly when it used segmentation (Fig. 5), while without segmentation it failed (Fig. 4).

5.3 Classification of Natural Views

We took 21 color pictures of natural views in Australia (taken from http://www.cs.washington.edu/research/imagedatabase) and divided them into 2, 3, 4, 5 and 6 clusters (see the partition into 6 clusters in Fig. 6 and the complete hierarchy in the supplementary material). Although class partition of images is a subjective task, the obtained classification looks very natural. It is also nice to see the almost hierarchical splitting of clusters as their number increases.

In order to compare classification with prior image segmentation against classification of non-segmented images, we ran the algorithm with a single window equal to the image size. The obtained partition was much less convincing (see the supplementary material).

5.4 Classification of Painting styles

To demonstrate how our algorithm classifies images based on their textural properties, we took 12 pictures by the famous post-impressionist painter Paul Cezanne, painted in different periods of his life with different techniques. The algorithm divided the pictures into four clusters, where the similarity between the pictures inside each cluster can easily be seen (see Fig. 7).

6. Conclusion

The presented work combines several ideas from different areas of machine learning and image processing into one coherent procedure for unsupervised content-based image classification. The major contribution of our work is the idea of using joint segmentation of a set of images for clustering it, via the image/document, segment/word analogy.

Acknowledgments

We thank Gal Chechik for providing us his implementation of the sequential Information Bottleneck algorithm, Prof. Naftali Tishby for useful advice, and Efim Belman for helpful discussions.

References

[1] K. Barnard, P. Duygulu, and D. Forsyth. Modeling the statistics of image features and associated text. In SPIE Electronic Imaging 2002, Document Recognition and Retrieval IX, 2002.

[2] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, New York, NY, 1991.

[3] Minh N. Do and Martin Vetterli. Texture similarity measurement using Kullback-Leibler distance on wavelet subbands. In Proceedings of the IEEE International Conference on Image Processing (ICIP 2000), September 2000.



Figure 3: Basic Texture Segmentation.

(a) Cluster 1 of 2 (b) Cluster 2 of 2

Figure 4: Clustering without segmentation.

(a) Cluster 1 of 2 (b) Cluster 2 of 2

Figure 5: Clustering with prior segmentation.

[4] Jacob Goldberger, Hayit Greenspan, and Shiri Gordon. Unsupervised image clustering using the information bottleneck method. In DAGM-Symposium, pages 158–165, 2002.

[5] B. Gunsel, A. Ferman, and A. Tekalp. Temporal video segmentation using unsupervised clustering and semantic object tracking. Journal of Electronic Imaging, 7(3):592–604, July 1998. SPIE IS&T.

[6] Thomas Hofmann, Jan Puzicha, and Joachim M. Buhmann. Unsupervised texture segmentation in a deterministic annealing framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):803–818, 1998.

[7] Daniel Keren. Recognizing image style and activities in video using local features and naive Bayes.

[8] Swarup Medasani and Raghu Krishnapuram. Categorization of image databases for efficient retrieval using robust mixture decomposition. Computer Vision and Image Understanding: CVIU, 83(3):216–235, 2001.

[9] Kenneth Rose. Deterministic annealing for clustering, compression, classification, regression and related optimization problems. Proceedings of the IEEE, 86(11):2210–2239, November 1998.

[10] Y. Rui, T. Huang, and S. Chang. Image retrieval: current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, 10(4):39–62, April 1999.

[11] Yevgeny Seldin. On unsupervised learning of mixtures of Markovian sources. Master's thesis, The Hebrew University of Jerusalem, 2001.

[12] Yevgeny Seldin, Gill Bejerano, and Naftali Tishby. Unsupervised sequence segmentation by a mixture of switching variable memory Markov sources. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 513–520, 2001.

[13] S. Krishnamachari and M. Abdel-Mottaleb. Hierarchical clustering algorithm for fast image retrieval. In IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases VII.

[14] Noam Slonim, Nir Friedman, and Naftali Tishby. Unsupervised document classification using sequential information maximization. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2002.

[15] L. Tang, R. Hanka, H. Ip, K. Cheung, and R. Lam. Semantic query processing and annotation generation for content-based retrieval of histological images. In SPIE Medical Imaging 2000, 2000.



(a) Cluster 1 of 6 (b) Cluster 2 of 6

(c) Cluster 3 of 6 (d) Cluster 4 of 6

(e) Cluster 5 of 6 (f) Cluster 6 of 6

Figure 6: Nature views classification.



(a) Landscapes painted in early periods. The objects are stressed and the compositions are more structured than in the later period.

(b) Later (more abstract) landscape paintings. Forms are deliberately simplified and outlined with dark contours.

(c) Portraits. (d) Still life.

Figure 7: Classification of Paul Cezanne’s drawings by painting style.



[16] Naftali Tishby, Fernando Pereira, and William Bialek. The information bottleneck method. In Allerton Conference on Communication, Control and Computing, volume 37, pages 368–379, 1999.

[17] James Ze Wang, Gio Wiederhold, Oscar Firschein, and Sha Xin Wei. Content-based image indexing and searching using Daubechies' wavelets. Int. J. on Digital Libraries, 1(4):311–328, 1997.
