Posted on 21-Mar-2016
Unsupervised Image Clustering using
Probabilistic Continuous Models and
Information Theoretic Principles
Shiri Gordon, Electrical Engineering – Systems, Faculty of Engineering,
Tel-Aviv University
Under the supervision of Dr. Hayit Greenspan
Introduction: Content-Based Image Retrieval (CBIR)
• The interest in Content-Based Image Retrieval (CBIR) and efficient image-search algorithms has grown out of the necessity of managing large image databases
• Most CBIR systems are based on search-by-query:
– The user provides an example image
– The database is searched exhaustively for the images most similar to the query
CBIR: Issues
• Image representation
• Distance measure between images
• Image search algorithms
• Example systems: QBIC (IBM), Blobworld (Berkeley), Photobook (MIT), VisualSEEk (Columbia)
What is Image Clustering?
• Performing a supervised / unsupervised mapping of the archive images into classes
• The classes should provide the same information about the image archive as the entire image collection
Why do we need Clustering?
• Faster search-by-query algorithms
• Browsing environment
• Image categorization
[Diagram: a query image is compared against cluster centers; only the images within the closest cluster are searched]
[Diagram: images categorized into clusters, e.g. "Yellow", "Blue", "Green"]
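The speed-up from clustering can be sketched as a two-stage search: compare the query to the cluster centers first, then search exhaustively only inside the closest cluster. A minimal sketch in Python, with Euclidean distance standing in for the KL distance between image models used later in the talk (all names here are illustrative, not from the talk):

```python
import numpy as np

def two_stage_search(query, cluster_centers, cluster_members, top_clusters=1):
    """Search a clustered database: compare the query only to cluster
    centers, then exhaustively to the images of the closest cluster(s)."""
    # Distance from the query to every cluster center (Euclidean here
    # for illustration; the talk uses a KL distance between GMMs).
    center_dists = np.linalg.norm(cluster_centers - query, axis=1)
    best = np.argsort(center_dists)[:top_clusters]
    # Exhaustive search restricted to the selected clusters.
    candidates = np.vstack([cluster_members[c] for c in best])
    image_dists = np.linalg.norm(candidates - query, axis=1)
    return candidates[np.argsort(image_dists)]

# Toy database: two clusters of 2-D "image signatures".
rng = np.random.default_rng(0)
members = [rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))]
centers = np.array([m.mean(axis=0) for m in members])
query = np.array([4.9, 5.1])
ranked = two_stage_search(query, centers, members)
print(ranked.shape)  # only 50 images were compared, not all 100
```

Only the images of the winning cluster are ranked, which is the source of the speed-up over exhaustive search.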
GMM-IB System Block-Diagram
Clustering via the Information-Bottleneck (IB) method
[Block diagram: Images → Image GMMs → IB clustering → Image clusters / Cluster GMMs]
• Feature space = color (CIE-Lab); spatial (x, y); …
• The feature vectors are grouped in a 5-dimensional space
• Each image is modeled as a Gaussian mixture distribution in feature space
Image Representation [“Blobworld”: Belongie, Carson, Greenspan, Malik, PAMI 2002]
Pixels → Feature vectors → Regions
Image Representation via Gaussian Mixture Modeling (GMM)
• Feature-space GMM:
  f(y|θ) = Σ_{j=1..k} α_j · (2π)^{-d/2} |Σ_j|^{-1/2} · exp( -(1/2)(y-μ_j)^T Σ_j^{-1} (y-μ_j) )
• Parameter set: θ = {α_j, μ_j, Σ_j}, j = 1..k, where α_j ≥ 0, Σ_j α_j = 1, μ_j ∈ R^d, and Σ_j is a d×d positive-definite matrix
• The Expectation-Maximization (EM) algorithm determines the maximum-likelihood parameters of a mixture of k Gaussians
– Initialization of the EM algorithm via K-means
– Model selection via MDL (Minimum Description Length)
• 5-dimensional feature space: color (L*a*b) & spatial (x, y)
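The EM fit with K-means initialization described above can be sketched as follows. This is an illustrative NumPy implementation, not the talk's code; the MDL model-selection step is omitted, and a 2-D toy data set stands in for the 5-D color-spatial features:

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of N(mean, cov) at the rows of x."""
    d = x.shape[1]
    diff = x - mean
    expo = -0.5 * np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(expo) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def fit_gmm(x, k, iters=100, seed=0):
    """Maximum-likelihood GMM fit with EM; means initialized by a few
    K-means iterations, as on the slide (MDL model selection omitted)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    means = x[rng.choice(n, size=k, replace=False)]
    for _ in range(10):  # K-means initialization of the means
        labels = ((x[:, None] - means[None]) ** 2).sum(-1).argmin(1)
        means = np.array([x[labels == j].mean(0) if np.any(labels == j)
                          else means[j] for j in range(k)])
    weights = np.full(k, 1.0 / k)
    covs = np.array([np.cov(x.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(iters):
        # E-step: responsibilities r_ij ∝ α_j · N(x_i | μ_j, Σ_j)
        r = np.stack([w * gauss_pdf(x, m, c)
                      for w, m, c in zip(weights, means, covs)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate α_j, μ_j, Σ_j from the responsibilities
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs

# Toy 2-D stand-in for one image's 5-D (L, a, b, x, y) feature vectors.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
weights, means, covs = fit_gmm(features, k=2)
print(weights.sum())  # mixing weights α_j sum to 1
```

In practice one would fit models for a range of k and keep the one with the best MDL score, as the slide indicates.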
Category GMM
[Diagram: Images → Image Models → Category Model]
• Variability in colors per spatial location
• Variability in location per color
Distance between each image model (rows) and each category model (columns):
Image \ Category    (1)     (2)     (3)     (4)
(1) monkey          6.5    32.5    34.8    16.4
(2) snow           29.6    10.4    42.1    30.4
(3) sunset         30.2    36.3    14.2    27.7
(4) flowers        14.4    28.7    29.1     8.5
GMM – KL Framework [Greenspan, Goldberger, Ridel, CVIU 2001]
• Kullback-Leibler (KL) distance between distributions:
  D(f_I || f_C) = E_{f_I}[ log( f_I(x) / f_C(x) ) ]
• KL distance between the image model and the category model, approximated over the feature set extracted from the image:
  D(f_I || f_C) ≈ (1/n) Σ_{t=1..n} log[ f_I(x_t) / f_C(x_t) ]
  where f_I is the image distribution, f_C is the category distribution, {x_t} is the feature set extracted from the image, and n is the data-set size
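The empirical approximation of the KL distance can be sketched directly: evaluate the log-ratio of the two mixture densities on the feature set drawn from the image. An illustrative NumPy sketch (function names are mine, not the talk's):

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of N(mean, cov) at the rows of x."""
    d = x.shape[1]
    diff = x - mean
    expo = -0.5 * np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(expo) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def gmm_logpdf(x, weights, means, covs):
    """Log-density of a Gaussian mixture at the rows of x."""
    comp = np.stack([w * gauss_pdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)])
    return np.log(comp.sum(axis=0))

def kl_empirical(f_I, f_C, samples):
    """D(f_I || f_C) ≈ (1/n) Σ_t log[ f_I(x_t) / f_C(x_t) ],
    with x_t the feature set drawn from the image distribution f_I."""
    return float(np.mean(gmm_logpdf(samples, *f_I) - gmm_logpdf(samples, *f_C)))

# Two single-Gaussian "mixtures": the true KL is 0.5·||μ_I − μ_C||² = 1.0 here.
rng = np.random.default_rng(1)
f_I = ([1.0], [np.zeros(2)], [np.eye(2)])
f_C = ([1.0], [np.ones(2)], [np.eye(2)])
x = rng.normal(0.0, 1.0, (5000, 2))   # samples from f_I
print(kl_empirical(f_I, f_C, x))      # close to the true value 1.0
```

The approximation improves with n; with the image's own feature vectors as samples, this is exactly the empirical estimate on the slide.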
Unsupervised Clustering using the Information-Bottleneck (IB) Principle [N. Slonim, N. Tishby, NIPS 1999]
• The desired clustering is the one that minimizes the loss of mutual information between objects and the features extracted from them
• The information contained in the objects about the features is ‘squeezed’ through a compact ‘bottleneck’ of clusters
Information Bottleneck Principle – Motivation
• Maximize the information that the clusters C retain about the features Y, given a constraint on the number of required clusters K:
  max_{|C| ≤ K} I(C; Y)
• Since I(X;Y) does not depend on the clustering, this is equivalent to minimizing the loss of mutual information about the objects X:
  min_C [ I(X;Y) − I(C;Y) ]
Information Bottleneck Principle – Greedy Criterion
• The minimization problem posed by the IB principle can be approximated by various algorithms using a greedy merging criterion. The information loss caused by merging clusters c1 and c2 is:
  d(c1, c2) = I(C_before; Y) − I(C_after; Y)
            = Σ_{i=1,2} Σ_y p(c_i, y) log[ p(c_i, y) / (p(c_i) p(y)) ] − Σ_y p(c1∪c2, y) log[ p(c1∪c2, y) / (p(c1∪c2) p(y)) ]
            = Σ_{i=1,2} p(c_i) · D( p(y | c_i) || p(y | c1∪c2) )
  where p(c_i) is the prior probability of cluster c_i and D(f || g) = E_f[ log(f/g) ] is the KL distance
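For discrete distributions the merge cost d(c1, c2) can be computed directly from the last equality above; a small illustrative sketch (names are mine):

```python
import numpy as np

def kl(f, g):
    """Discrete KL distance D(f || g) = E_f[ log(f/g) ]."""
    mask = f > 0
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

def merge_cost(p_c1, p_c2, p_y_c1, p_y_c2):
    """Information loss d(c1, c2) for merging clusters c1 and c2:
    Σ_{i=1,2} p(c_i) · D( p(y|c_i) || p(y|c1∪c2) )."""
    p_merged = p_c1 + p_c2
    # p(y | c1∪c2) is the prior-weighted average of the members' distributions.
    p_y_merged = (p_c1 * p_y_c1 + p_c2 * p_y_c2) / p_merged
    return p_c1 * kl(p_y_c1, p_y_merged) + p_c2 * kl(p_y_c2, p_y_merged)

# Merging identical clusters loses no information...
same = np.array([0.5, 0.5])
print(merge_cost(0.3, 0.7, same, same))          # 0.0
# ...while merging very different ones is costly.
a, b = np.array([0.9, 0.1]), np.array([0.1, 0.9])
print(merge_cost(0.5, 0.5, a, b) > 0)            # True
```

A greedy agglomerative algorithm simply merges, at each step, the pair with the smallest d(c1, c2).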
GMM-IB Framework
[Block diagram: images and their feature vectors → GMM-IB clustering → image clusters]
• Each image X is represented by its GMM, p(y|X); the prior probability over images is uniform, p(X) = 1/|X|
• A cluster distribution is the average of its members' models: p(y|c) = (1/|c|) Σ_{X∈c} p(y|X)
• Greedy merging criterion: d(c1, c2) = Σ_{i=1,2} p(c_i) · D( p(y|c_i) || p(y|c1∪c2) )
• Objective: min_C [ I(X;Y) − I(C;Y) ]
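Putting the pieces together, the agglomerative IB procedure repeatedly merges the pair of clusters with the smallest information loss. The sketch below uses discrete p(y|x) as a stand-in for the continuous GMMs of the talk, with the uniform prior p(x) = 1/|X| (all names illustrative):

```python
import numpy as np

def kl(f, g):
    """Discrete KL distance D(f || g)."""
    mask = f > 0
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

def aib(p_y_given_x):
    """Agglomerative IB: start with one cluster per image and repeatedly
    merge the pair with the smallest information loss d(c1, c2).
    Returns the merge history as (pair merged, information lost)."""
    n = len(p_y_given_x)
    priors = [1.0 / n] * n                 # uniform prior p(x) = 1/|X|
    dists = [p.copy() for p in p_y_given_x]
    history = []
    while len(dists) > 1:
        best = None
        for i in range(len(dists)):
            for j in range(i + 1, len(dists)):
                p = priors[i] + priors[j]
                merged = (priors[i] * dists[i] + priors[j] * dists[j]) / p
                cost = (priors[i] * kl(dists[i], merged)
                        + priors[j] * kl(dists[j], merged))
                if best is None or cost < best[0]:
                    best = (cost, i, j, merged, p)
        cost, i, j, merged, p = best
        history.append(((i, j), cost))
        # Replace c_i with the merged cluster, drop c_j.
        dists[i], priors[i] = merged, p
        del dists[j], priors[j]
    return history

# Three "images": two nearly identical, one different.
p = [np.array([0.8, 0.2]), np.array([0.79, 0.21]), np.array([0.1, 0.9])]
hist = aib(p)
print(hist[0][0])  # the two similar images are merged first: (0, 1)
```

In the GMM-IB setting, each discrete distribution would be replaced by an image GMM and `kl` by the empirical KL approximation between mixtures.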
Example
[Figure: example of the agglomerative merging process, merge levels 0–8]
Results: AIB – Optimum Number of Clusters
[Plot: loss of mutual information during the clustering process]
Results: AIB – Generated Tree
[Figure: the hierarchical tree generated by the AIB algorithm]
Mutual Information as a Quality Measure
• The mutual information is the reduction in the uncertainty of X gained from the knowledge of Y:
  I(X;Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log[ p(x, y) / (p(x) p(y)) ]
• There is no closed-form expression for mixtures of Gaussian distributions
• The greedy criterion derived from the IB principle provides a tool for approximating this measure
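For discrete distributions, the definition above can be evaluated directly, which is useful as a sanity check; an illustrative sketch:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = Σ_xy p(x,y) log[ p(x,y) / (p(x) p(y)) ], in nats."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

# Independent variables carry no mutual information...
indep = np.outer([0.5, 0.5], [0.3, 0.7])
print(round(mutual_information(indep), 6))   # 0.0
# ...a deterministic relation carries the full entropy (log 2 nats here).
det = np.array([[0.5, 0.0], [0.0, 0.5]])
print(round(mutual_information(det), 4))     # 0.6931
```

For GMMs this sum has no closed form, which is why the IB greedy criterion serves as the approximation tool.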
Mutual Information as a Quality Measure – Example
           C1      C2      C3
I(C; Y)   1.51    1.32    1.18
I(X; Y)   2.73    2.72    2.72
Results
• An image database of 1460 images, hand-picked from the COREL database to create 16 labeled categories
• A GMM model is built for each image
• The various clustering algorithms, with various image representations, are applied to the database
Results: Retrieval Experiments
• Clustering for efficient retrieval
• Comparing between clustering methodologies
Results: Mutual Information as a Quality Measure
• Comparing between image representations
• Comparing between clustering algorithms:
  Clustering method        I(C;Y)
  AIB                       1.63
  K-means + reduced GMM     1.68
  SIB + average GMM         1.67
Summary
• Image clustering is performed using the IB method
• IB is applied to continuous representations of images and categories via Gaussian Mixture Models
• From the AIB algorithm:
– We can determine the optimal number of clusters in the database
– We have a “built-in” distance measure
– The database is arranged in a tree structure that provides a browsing environment and more efficient search algorithms
– The tree can be modified using algorithms like the SIB and K-means to achieve a more stable solution
Future Work
• Making the current framework more feasible for large databases:
– A simpler approximation for the KL distance
– Incorporating the reduced category GMM into the clustering algorithms
• Performing relaxation on the hierarchical tree structure
• Using the tree structure for the creation of a “user-friendly” environment
• Extending the feature space