
DATA REPRESENTATION AND PATTERN RECOGNITION IN IMAGE MINING

THOKARE NITIN D

M.E. SSA, Dept of EE, IISc Bangalore, Sr. No. 4910-412-091-07119

Guide: Prof. M N Murty, Dept of CSA, IISc Bangalore

ABSTRACT

Image mining currently forms an integral part of many web-based applications and hence interests many researchers. Though its fundamentals have been developed across many research areas, the important problem of image representation and its interpretation demands more attention in order to support real-time applications and approach close-to-human performance. Specifically, we are working towards image annotation, which needs a good model that can be learned and applied in real time. We review present state-of-the-art techniques in image-mining-related fields such as image representation, image retrieval and image annotation. In image retrieval, the HSV color space and Local Binary Pattern (LBP) features are used for image representation. For large collections of images, Euclidean distance is found to give more relevant results than other proximity metrics.

Index Terms— Image Retrieval, Image Annotation, Local Binary Pattern, Association Rule Mining, Heavy-tailed distribution

1. INTRODUCTION

As visual stimuli form the most important part of human perception, the image and video parts of multimedia are receiving growing attention from researchers in Computer Vision, Machine Learning, Multimedia Systems and Data Mining. Image mining, an application of computer vision, machine learning, image processing, information retrieval, data mining and databases, includes image-related tasks such as image retrieval [1],[2],[3], automatic image annotation [4], and the extraction of implicit knowledge and image relationships.

The increasing use of social web applications and the semantic web inspires the development of web-based image annotation and image retrieval systems. Many such systems and sites have been developed to date, using techniques such as content-based image retrieval (CBIR), also called query by visual example (QBVE), query by semantic example (QBSE), and textual query.

The major challenges lie in image representation and in using pixels (low-level information) to obtain relationships between objects (high-level information) contained in images. Image mining in particular involves all or some of the steps of image preprocessing, transformation, feature extraction, interpretation, and searching for objects/knowledge within an image database [5],[6]. Preliminary work in this area has been done on object detection, object recognition, association rule mining, clustering and classification.

2. PROBLEM DEFINITION

Representing the raw image (pixels containing low-level information) in a useful and more informative format (high-level information), and using this representation to annotate images and obtain knowledge about the image database, is the basic need of many tasks in image mining. Better techniques for these steps are therefore needed to get satisfactory results in image retrieval and knowledge mining from image collections. Several authors have suggested techniques that use Gaussian mixture models to model the distribution of image labels from a training dataset; these models are then used to assign labels to (annotate) a test image [7],[8]. In practice a Gaussian mixture model may not fit all the data, so we use a heavy-tailed distribution for this modelling, which may represent real-world data more accurately [9].

3. RELATED WORK

In this section we review the following fundamental work done in image mining:

• image representation

• image retrieval

• image annotation and

• association rule mining


3.1. Image Representation

This is the first and most important part of image mining. It is necessary to compare images and find the similarity (or dissimilarity) between a pair or a collection of images. This can be done by comparing images pixel by pixel, but that approach is prone to errors caused by translation or scaling of images. To take care of such variations we should use translation-, rotation- and scale-invariant features for image representation. The three main features of an image are color, texture and shape.

3.1.1. Color

Generally a gray image is considered for image segmentation, object detection or object recognition tasks, but for image mining it is more useful to consider color information. Different color spaces such as RGB, YCbCr/YUV and HSV can be used to capture the color content of an image. As the HSV representation is the most perceptually relevant of these, we chose the HSV color space to represent the color content of images in this work.

3.1.2. Texture and Shape

Widely used textural features are coarseness, contrast, directionality, line-likeness, regularity and roughness. All these features are present in every image at different levels. Similarly, shape information such as lines, circles/ellipses and rectangles can be used as features representing an image. Here, a given image is divided into patches (overlapping or non-overlapping) and the above features (texture and shape) are extracted from each patch. Collectively, all patches form the feature vector corresponding to that image, which can be used for classification or comparison.

3.2. Image Annotation

Labeling a given image with semantically correct labels that faithfully describe its contents is the task of image annotation algorithms. Image annotation can be done in different ways. Given an image, one can divide it into segments that correspond to possibly different objects using image segmentation, apply an object recognition algorithm to each segment, and finally annotate the image with all the labels found in the recognition process. As part of this, I completed face detection and recognition using improved LBP under a Bayesian framework [10]. One result of this work is shown in Figure 1.

But image segmentation is itself an active research area, owing to the complex variation of object appearance from image to image. Moreover, object recognition requires that every object that may appear in a test image has been trained for, which is not feasible. To avoid such complexity we follow another approach: learn a probabilistic model that relates each image's local and/or global features to the labels given to the image. This work is explored further in the preliminary-experiments section.

Fig. 1. One result of the face detection experiments. Detected faces are shown using rectangles.

3.3. Image Retrieval

For image retrieval we can use a text-based or a content-based approach. In the text-based approach, the textual information given about an image is analysed and the appropriate label(s) for that image are decided. The result of image annotation can be used as a textual representation of the image for text-based retrieval. This approach has been in use since the birth of the image retrieval concept, but nowadays the latter approach, content-based image retrieval, is getting more attention from researchers. Here, local and global features such as color, texture and shape are extracted from an image and used for comparison between two images. [11] and [12] discuss seven rotation-, translation- and scale-invariant moments for image analysis. The first four of them are as follows:

\[\phi_1 = \mu_{20} + \mu_{02} \tag{1}\]

\[\phi_2 = (\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2 \tag{2}\]

\[\phi_3 = (\mu_{30} - 3\mu_{12})^2 + (3\mu_{21} - \mu_{03})^2 \tag{3}\]

\[\phi_4 = (\mu_{30} + \mu_{12})^2 + (\mu_{21} + \mu_{03})^2 \tag{4}\]

where \(\mu_{ij}\) is the \((i+j)\)th-order normalized central moment. In image retrieval systems, relevance feedback is commonly used to refine the query and obtain more relevant results: through relevance feedback the query is internally modified and reissued. Let \(X(i)\) denote the query at the \(i\)th step of a relevance feedback session; the modified query \(X(i+1)\) is then computed as:


\[X(i+1) = \alpha X(i) + \beta \sum_{Y_k \in R} \frac{Y_k}{|R|} - \gamma \sum_{Y_k \in N} \frac{Y_k}{|N|}\]

where \(Y_k\) is an image from the results retrieved at the \(i\)th stage, \(R\) is the set of relevant examples and \(N\) the set of non-relevant examples marked by the user in feedback, and \(\alpha\), \(\beta\), \(\gamma\) are constants controlling the importance given to the previous query, the relevant examples and the non-relevant examples, respectively.

This is known as hard relevance feedback, where only 'relevant' and 'non-relevant' options are available to the user. Soft relevance feedback can be obtained by providing the user with more than two options [13], such as 'Highly Relevant (HR)', 'Relevant (R)', 'No Opinion (NO)', 'Non-relevant (NR)' and 'Highly Non-relevant (HN)'. In this case the modified query is obtained by:

\[X(i+1) = \alpha X(i) + \beta \frac{\sum_{Y_k \in R} r_k Y_k}{\sum_{Y_k \in R} r_k} + \gamma \frac{\sum_{Y_k \in N} r_k Y_k}{\sum_{Y_k \in N} r_k}\]

where the \(r_k\) are relevance weights for the different options, for example:

\[r_k = \begin{cases} 0.5 & \text{if } Y_k \in \text{HR} \\ 0.1 & \text{if } Y_k \in \text{R} \\ 0.0 & \text{if } Y_k \in \text{NO} \\ -0.1 & \text{if } Y_k \in \text{NR} \\ -0.5 & \text{if } Y_k \in \text{HN} \end{cases}\]
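One plausible reading of the soft update can be sketched as follows (a sketch under assumptions: positive-weight images pull the query and negative-weight images push it away with weight magnitude |r_k|; the function name and the α, β, γ defaults are illustrative, not from [13]):

```python
import numpy as np

# illustrative relevance weights, matching the table above
WEIGHTS = {'HR': 0.5, 'R': 0.1, 'NO': 0.0, 'NR': -0.1, 'HN': -0.5}

def soft_feedback_update(query, feedback, alpha=1.0, beta=0.75, gamma=0.25):
    """Soft relevance feedback: `feedback` is a list of (image_vector, option)
    pairs, option being one of 'HR', 'R', 'NO', 'NR', 'HN'."""
    rel = [(np.asarray(y, float), WEIGHTS[o]) for y, o in feedback if WEIGHTS[o] > 0]
    non = [(np.asarray(y, float), -WEIGHTS[o]) for y, o in feedback if WEIGHTS[o] < 0]

    def weighted_mean(pairs):
        # r_k-weighted average of the example vectors; 0 if no examples
        total = sum(r for _, r in pairs)
        return sum(r * y for y, r in pairs) / total if pairs else 0.0

    return (alpha * np.asarray(query, float)
            + beta * weighted_mean(rel)
            - gamma * weighted_mean(non))
```

Usage: `soft_feedback_update([1.0, 1.0], [([2.0, 0.0], 'HR'), ([0.0, 2.0], 'R'), ([4.0, 4.0], 'HN')])` moves the query strongly toward the HR example, weakly toward the R example, and away from the HN example.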

3.4. Association Rule Mining

Association analysis is useful for discovering relationships hidden in large data collections [14]. These relations, represented as rules, can help the user make important decisions. By collecting user feedback during relevance feedback sessions, it is possible to extract the user's relevance concept from the feedback and form association rules, which in turn help improve the precision rate.

When a user starts a new query session, the a priori relevance association rules for this query are first retrieved [13], and initial results are shown using these associations. Then, depending on the soft relevance feedback given by the user, the query is recomputed and used to further improve the results. Association rules formed from many users' experience help produce better image retrieval results in future sessions.

3.5. Heavy-Tailed Distributions

Heavy-tailed distributions are probability distributions whose tails are not exponentially bounded, i.e. their tails are heavier than those of the exponential distribution. More precisely, if \(F(x)\) denotes the cumulative distribution function of a random variable \(X\) and \(\bar{F}(x) = 1 - F(x)\), then \(X\) is said to have a heavy-tailed distribution if [9]

\[\bar{F}(x) \sim c x^{-\alpha}\]

where \(c\) is a positive constant, \(0 < \alpha < 2\), and \(a(x) \sim b(x)\) means \(\lim_{x \to \infty} a(x)/b(x) = 1\).
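The definition can be illustrated numerically: for a Pareto tail, F̄(x)·x^α stays at a positive constant, whereas for an exponential tail the same product collapses to zero (a quick sketch; the specific α and x_m values are arbitrary):

```python
import numpy as np

# Pareto(alpha) survival function: F_bar(x) = (x_m / x)**alpha for x >= x_m.
# Heavy tail: F_bar(x) * x**alpha approaches the positive constant x_m**alpha.
alpha, x_m = 1.5, 1.0
xs = np.array([1e2, 1e4, 1e6])

pareto_tail = (x_m / xs) ** alpha
print(pareto_tail * xs ** alpha)    # all entries equal x_m**alpha = 1.0

exp_tail = np.exp(-xs)              # exponential survival function, rate 1
print(exp_tail * xs ** alpha)       # effectively zero everywhere
```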

The Pareto distribution, the Cauchy distribution and Zipf's law are some common heavy-tailed distributions. The Cauchy distribution has the density function

\[f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi}\left[\frac{\gamma}{(x - x_0)^2 + \gamma^2}\right] \tag{5}\]

where \(x_0\) is the location parameter, specifying the location of the peak of the distribution, and \(\gamma\) is the scale parameter, which specifies the half-width at half-maximum (HWHM).
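Eq. (5) and the HWHM interpretation of γ can be checked directly (a minimal sketch; parameter values are arbitrary):

```python
import math

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Cauchy density of Eq. (5): peak at x0, half-width gamma."""
    return gamma / (math.pi * ((x - x0) ** 2 + gamma ** 2))

# HWHM property: the density at x0 +/- gamma is half the peak value
x0, gamma = 2.0, 0.5
peak = cauchy_pdf(x0, x0, gamma)    # equals 1 / (pi * gamma)
assert math.isclose(cauchy_pdf(x0 + gamma, x0, gamma), peak / 2)
```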

Zipf's law states that the number of appearances of an object, denoted \(R\), and its rank, denoted \(n\), are related by

\[R = c n^{-\beta}\]

for some positive constants \(c\) and \(\beta\). For \(\beta = 1\), Zipf's law states that popularity (\(R\)) and rank (\(n\)) are inversely proportional.
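Because the law is linear on log-log axes, c and β can be recovered from rank-frequency data with a straight-line fit (a small illustration on ideal Zipfian counts; the c and β values are arbitrary):

```python
import numpy as np

# Zipf's law R = c * n**(-beta) gives log R = log c - beta * log n,
# so beta and c fall out of a least-squares line through (log n, log R).
c, beta = 1000.0, 1.2
n = np.arange(1, 101)            # ranks 1..100
R = c * n ** (-beta)             # ideal Zipfian counts

slope, intercept = np.polyfit(np.log(n), np.log(R), 1)
print(-slope, np.exp(intercept))  # recovers beta = 1.2 and c = 1000
```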

4. PRELIMINARY EXPERIMENTS AND RESULTS

In this section, details of the image retrieval part and its results are given. The initial approach for the image annotation part is also explained here.

4.1. Image Retrieval

Image retrieval can be done using textual or visual information. Textual image retrieval uses the text available in the information or explanation accompanying an image; textual information also includes the labels or annotations of an image. Visual-content-based image retrieval, on the other hand, uses the visual information of an image, obtained in the form of features such as color, texture and shape.

In this work we completed image retrieval part using vi-sual features as follows:

• First, convert the image from RGB to the HSV color space, then divide the image into 6 × 4 (or 4 × 6), i.e. 24, same-size patches (overlapping or non-overlapping) for each of H, S and V separately.

• For each patch, find the histogram over the range [0 : 0.25 : 1] and concatenate the three histograms (each of dimension 5) corresponding to the H, S and V planes.

• To add texture features, the grayscale image is transformed into an image containing local shape and texture information using the Local Binary Pattern (LBP), which is an illumination-invariant feature [15]. A 256-bin histogram is formed from the image LBP.


• Hence the whole image is represented by a 616-dimensional vector (24 × 15 color features and 256 LBP features).

Fig. 2. Local Binary Pattern computation.

Figure 2 shows how the LBP is computed at each pixel position. For the similarity (or dissimilarity) measure between the query image and images from the dataset, different metrics are available, such as Euclidean distance, canonical dot product, cosine of the angle between two vectors, city-block distance, Minkowski distance, Hamming distance, chi-square, Kullback-Leibler divergence, Jeffrey divergence and Bhattacharyya distance. Figure 3 shows the results of image retrieval using HSV color, LBP texture features and the Euclidean distance dissimilarity measure [17].
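The feature-construction steps above can be sketched in code. This is a minimal sketch under assumptions: the input is taken to be already in HSV with channels scaled to [0, 1], the five histogram bins are equal-width on [0, 1] (the exact bin placement for centres 0:0.25:1 is not specified), and the LBP is computed on the V channel as a stand-in for the grayscale image:

```python
import numpy as np

def lbp_histogram(gray):
    """Normalized 256-bin LBP histogram (basic 8-neighbour LBP, as in
    Figure 2: each neighbour >= centre contributes one bit of the code)."""
    g = np.asarray(gray).astype(np.int32)
    c = g[1:-1, 1:-1]
    code = np.zeros_like(c)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def hsv_color_features(hsv, rows=4, cols=6):
    """Per-patch 5-bin histograms of H, S and V: 24 patches x 15 bins = 360."""
    H, W, _ = hsv.shape
    feats = []
    for i in range(rows):
        for j in range(cols):
            patch = hsv[i * H // rows:(i + 1) * H // rows,
                        j * W // cols:(j + 1) * W // cols]
            for ch in range(3):
                h, _ = np.histogram(patch[..., ch], bins=5, range=(0.0, 1.0))
                feats.append(h / max(h.sum(), 1))
    return np.concatenate(feats)

def descriptor(hsv):
    """616-D descriptor: 360 color features + 256 LBP features."""
    return np.concatenate([hsv_color_features(hsv),
                           lbp_histogram(hsv[..., 2] * 255)])

# retrieval = rank dataset images by Euclidean distance to the query descriptor
rng = np.random.default_rng(0)
imgs = [rng.random((48, 72, 3)) for _ in range(5)]   # synthetic "HSV" images
d = [descriptor(im) for im in imgs]
ranked = np.argsort([np.linalg.norm(d[0] - di) for di in d])
print(ranked[0])   # the query image itself is always ranked first
```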

4.2. Image Annotation

[7] discusses probabilistic modelling for image annotation and retrieval using semantic labels given to training images: using a Gaussian Mixture Model (GMM), the algorithm trains the model from training images provided with labels. Along similar lines, we propose the following algorithm for image annotation. Let ω = {ω1, ω2, ..., ωm} be the set of labels given to images I = {I1, I2, ..., In}; each image can have multiple labels. To annotate a test image we need to know how the features in the image are related to their corresponding labels. To model these relationships we learn a mixture model, with a heavy-tailed distribution function, corresponding to each ωi ∈ ω. Let Ii ⊆ I be the set of images annotated with label ωi. Then for each image:

1. Convert the image from RGB to the HSV color space. Divide the HSV color image into 16 × 16 overlapping samples, obtained by scanning the image in left-to-right, top-to-bottom order with 50% area overlap between each adjacent pair.

2. For each patch, compute the 616-dimensional feature vector as explained in Section 4.1 above.

3. Treating all these samples as independent, we learn the mixture of K distributions that maximizes their likelihood using the Expectation-Maximization (EM) algorithm. This produces the following class-conditional distribution for each image:

\[P_{X|W}(x|I_i) = \sum_{k=1}^{K} \pi_{I_i,k}\, F(x; \theta_{I_i,k})\]

where \(\pi_{I_i,k}\) and \(\theta_{I_i,k}\) are the maximum-likelihood parameters for mixture component \(k\).

4. Applying the hierarchical EM algorithm [7] to the above image-level mixtures leads to a conditional distribution for class ωi of

\[P_{X|W}(x|w_i) = \sum_{k=1}^{K} \pi_{w_i,k}\, F(x; \theta_{w_i,k})\]

5. For annotation of a test image It, we divide the test image and find the corresponding feature vectors Xt as discussed in Section 4.1.

6. For each class ωi ∈ ω compute

\[\log P_{W|X}(w_i|X_t) = \log P_{X|W}(X_t|w_i) + \log P_W(w_i) - \log P_X(X_t)\]

where

\[\log P_{X|W}(X_t|w_i) = \sum_{x \in X_t} \log P_{X|W}(x|w_i)\]

\(P_W(w_i)\) is computed from the training set as the proportion of images carrying annotation \(w_i\), and \(P_X(X_t)\) is a constant in the computation above for all \(w_i\).

7. Annotate the test image with the classes wi whose posterior score, log P_{W|X}(wi|Xt), is greater than some threshold.
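Steps 4-7 of the scoring side can be sketched as follows. This is a sketch under assumptions: isotropic Gaussian components stand in for the (here heavy-tailed) component density F, the mixture parameters are hand-set rather than EM-trained, and the class names, function names and threshold are purely illustrative:

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log density of an isotropic Gaussian, standing in for F(x; theta)."""
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var)
                   + np.sum((x - mu) ** 2, axis=-1) / var)

def log_class_likelihood(X, weights, mus, var):
    """log P(X|w) = sum_x log sum_k pi_k F(x; theta_k), as in steps 4 and 6."""
    comp = np.stack([np.log(w) + log_gauss(X, mu, var)
                     for w, mu in zip(weights, mus)])      # shape (K, N)
    return float(np.sum(np.logaddexp.reduce(comp, axis=0)))

def annotate(X, class_models, priors, threshold):
    """Step 7: keep every label whose unnormalised log posterior
    log P(X|w) + log P(w) clears the threshold (log P(X) is shared)."""
    scores = {w: log_class_likelihood(X, *m) + np.log(priors[w])
              for w, m in class_models.items()}
    return [w for w, s in scores.items() if s > threshold], scores

# two toy 2-D "classes" with hand-set mixture parameters (illustrative only)
models = {
    'sky':  ([0.5, 0.5], [np.array([0.0, 0.0]), np.array([1.0, 1.0])], 0.1),
    'sand': ([1.0], [np.array([5.0, 5.0])], 0.1),
}
priors = {'sky': 0.5, 'sand': 0.5}
X_t = np.array([[0.1, 0.0], [0.9, 1.1]])   # patch features of a test image
labels, scores = annotate(X_t, models, priors, threshold=-50.0)
print(labels)   # only the class whose mixture explains the patches survives
```

Since the test patches sit near the 'sky' mixture components and far from the 'sand' component, only 'sky' clears the threshold.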

5. OBSERVATIONS AND CONCLUSIONS

For image representation, the HSV color model is found to retain more visual information and to be more relevant. For texture representation, of the many methods available, LBP is found to be the simplest and to give more relevant results than the others. When shape information via invariant image moments was additionally used, the results were found to be less relevant than those obtained with only HSV color and LBP texture features. Still, for some queries (containing large objects with similar color and texture but different shapes) the true positives in the results are few, so shape features are still needed; a better shape model is therefore necessary.

After experiments with different distance metrics, the Euclidean distance is found to be the simplest and most appropriate dissimilarity measure for image retrieval over a large, varied collection of images.

6. FUTURE WORK

6.1. Image Representation

Search for more accurate shape features is needed.


Fig. 3. Image retrieval results: the first column shows the query image and the remaining columns show the top 7 retrieved results in descending rank order. All images are from the reduced Corel database, Simplicity1000, which contains 10 classes, each with 100 images [16].

6.2. Image Annotation

1. Experiments with the Gaussian mixture model (GMM) need to be completed.

2. The use of heavy-tailed distributions instead of a GMM, as a better approximation, is to be examined.

6.3. Image Retrieval

1. Image retrieval using a combination of visual information (color, texture and shape) and textual information (image annotations) can be done to improve the results.

2. Application of association rule mining to image annotation, and of relevance feedback to query modification in image retrieval.

7. REFERENCES

1. N. Vasconcelos, “From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval”, Computer, Vol. 40, No. 7, pp. 20-26, 2007

2. Arnold W. M. Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, Ramesh Jain, “Content-based image retrieval at the end of the early years”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 1349-1380, 2000

3. V Vani, Sabitha Raju, “A Detailed Survey on Query by Image Content Techniques”, Proceedings of the 12th International Conference on Networking, VLSI and Signal Processing, pp. 204-209, 2010

4. Allan Hanbury, “A survey of methods for image annotation”, Journal of Visual Languages and Computing, Vol. 19, No. 5, pp. 617-627, October 2008

5. Gonzalez and Woods, “Digital Image Processing”, Prentice Hall, 2nd Edition, 2002

6. Anil K Jain, “Fundamentals of Digital Image Processing”, Prentice Hall, 1989

7. Gustavo Carneiro, Antoni B. Chan, Pedro J. Moreno, Nuno Vasconcelos, “Supervised learning of semantic classes for image annotation and retrieval”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, Issue 3, pp. 394-410, 2007

8. J Li, J Z Wang, “Real-Time Computerized Annotation of Pictures”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, Issue 6, pp. 985-1002, 2008

9. Mark Crovella, “Performance Evaluation with Heavy Tailed Distributions”, Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing, pp. 1-10, June 2001


10. H Jin, Q Liu, H Lu, X Tong, “Face Detection Using Improved LBP Under Bayesian Framework”, Third International Conference on Image and Graphics, pp. 306-309, 2004

11. M K Hu, “Visual Pattern Recognition by Moment Invariants”, IRE Transactions on Information Theory, Vol. 8, Issue 2, pp. 179-187, 1962

12. J Flusser, “Moment Invariants in Image Analysis”, Proceedings of World Academy of Science, Engineering and Technology, Vol. 11, pp. 196-201, 2006

13. Peng-Yeng Yin, Shin-Huei Li, “Content-based image retrieval using association rule mining with soft relevance feedback”, Journal of Visual Communication and Image Representation, Vol. 17, Issue 5, pp. 1108-1125, 2006

14. P N Tan, M Steinbach, V Kumar, “Introduction to Data Mining”, Pearson, 2009

15. T Ojala, M Pietikäinen, T Mäenpää, “Multiresolution Gray-scale and Rotation Invariant Texture Classification with Local Binary Patterns”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 971-987, July 2002

16. Simplicity1000 image dataset, http://wang.ist.psu.edu/docs/related/

17. R O Duda, P E Hart, D G Stork, “Pattern Classification”, 2nd edition, 2000