Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail [email protected]. ETRA 2010, Austin, TX, March 22–24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

Visual Attention for Implicit Relevance Feedback in a Content Based Image Retrieval

A. Faro, D. Giordano, C. Pino, C. Spampinato∗

Department of Informatics and Telecommunication Engineering, University of Catania, Catania, 95125, Italy

Abstract

In this paper we propose an implicit relevance feedback method that aims to improve the performance of existing Content Based Image Retrieval (CBIR) systems by re-ranking the retrieved images according to users' eye gaze data. This represents a new mechanism for implicit relevance feedback: the sources usually taken into account for image retrieval are based on the natural behavior of the user in his/her environment, estimated by analyzing mouse and keyboard interactions. In detail, after images are retrieved by querying a CBIR with a keyword, our system computes the most salient regions (where users look with greater interest) of the retrieved images by gathering data from an unobtrusive eye tracker, such as the Tobii T60. According to the features, in terms of color and texture, of these relevant regions, our system re-ranks the images initially retrieved by the CBIR. A performance evaluation, carried out on a set of 30 users using Google Images and the keyword “pyramid”, shows that about 87% of the users are more satisfied with the output images when the re-ranking is applied.

Keywords: Relevance Feedback, Content Based Image Retrieval, Visual Attention, Eye Tracker

1 Introduction

During the last ten years, with the growth of the Internet and advances in digital camera research, huge collections of images have been created and shared on the web. In the past, the only ways to search digital images were keyword indexing or simply browsing. The need to quickly find images in large digital image databases has led image processing researchers to develop Content Based Image Retrieval (CBIR), i.e., systems for image retrieval based on the concept of similar visual content. Moreover, recent research in information retrieval considers the user's personal environment in order to better understand the user's needs. Indeed, in CBIR systems the user does not always get results fully related to the image query, especially in web-based image retrieval such as Google Images (http://images.google.it) or Yahoo!'s Picture Gallery (http://gallery.yahoo.com/). This is mainly due to the fact that metadata often cannot explain the content of an image well, and even when the description is exhaustive, the user's attention may be drawn only to some portions of the image, which often correspond to its most salient areas. In order to take these user needs into account, a relevance feedback mechanism must be integrated into CBIRs.

∗e-mail: {afaro, dgiordan, cpino, cspampin}@diit.unict.it

Relevance feedback is a key feature of image retrieval systems. Its main idea is to take the initially retrieved outputs and use the user's feedback on their relevance to the initial query in order to perform a new query. In the literature two types of feedback are defined: explicit feedback and implicit feedback. Since the former requires greater effort on the user's side, because it may be difficult to get explicit relevance assessments from searchers [Xu et al. 2008], implicit feedback methods, where feedback data are obtained by observing the user's actions in his/her natural environment, have gained more attention. Until today the most explored and implemented sources for implicit relevance feedback have been the interactions of users with the mouse and keyboard [Kelly and Teevan 2003]. A new evidence source for implicit feedback, explored in the last few years, e.g., in [Moe et al. 2007], [Miller and Agne 2005], [Granka et al. 2004], is the user's visual attention (provided by eye movements), which introduces a potentially very valuable new dimension of contextual information [Buscher 2007]. Indeed, in a CBIR, knowledge of human visual attention would allow us to select the most salient parts of an image, which can be used both for image retrieval, as in [Marques et al. 2006], and for implementing relevance feedback mechanisms. Moreover, the detection of the salient regions observed by a user is crucial information for finding image similarity. In this paper we propose an implicit relevance feedback mechanism based on visual attention, implemented with a Tobii T60 eye tracker and designed to be integrated into a web-based CBIR; it aims at re-ranking the output images provided by the CBIR using the most salient regions extracted via the eye tracker. The proposed system represents a novel application of eye tracking to image retrieval, since other approaches in the literature, e.g. [Oyekoya and Stentiford 2006], [Oyekoya and Stentiford 2007], are based on rough retrieval engines built on high-level features. It allows both users with disabilities to give feedback on the obtained results and generic users to tune the CBIR to their cognitive perception of the images (e.g., “I unconsciously prefer reddish images”, or “When I look at Egyptian images I prefer to see pyramids and the Sphinx”). The remainder of the paper is organized as follows: in section 2 the architecture of the proposed system is discussed. In section 3 an experimental evaluation on the Google Images CBIR is performed and the experimental results on a set of 30 users are shown. Finally, in the last section conclusions and future work are presented.

2 The Proposed System

The flow diagram of the proposed system is shown in fig. 1: 1) the user inserts a keyword for image searching; 2) the web-based CBIR retrieves the most relevant images whose metadata contain the inserted word; 3) the user looks at the output images, and the system identifies the most relevant regions by using the Tobii facilities, together with their features (e.g. color, texture, etc.); 4) the system re-ranks the output images according to the extracted information. Our system uses the Tobii eye tracker to capture implicit relevance feedback and to reclassify the images in a different order of relevance with respect to the initial classification, in order to improve the ranking provided by the search in a CBIR environment. A minimal sketch of this flow is given below.
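The following is a minimal, runnable Python sketch of the four-step flow. Every function in it is a hypothetical stand-in, not part of any real CBIR or Tobii API: the CBIR results and the fixation data are faked with placeholder values, and feature extraction and re-ranking are reduced to one-liners (both are developed in detail in section 2).

```python
import random

def search_cbir(keyword):
    # 1-2) Stand-in for the web-based CBIR query: return image identifiers.
    return [f"{keyword}_{i}.jpg" for i in range(8)]

def record_fixations(images):
    # 3) Stand-in for the eye tracker export: pretend the user fixated on
    # the first two thumbnails.
    return images[:2]

def extract_features(image_id):
    # Placeholder feature vector (the real one is F = [C, Cr, E, H, ...]).
    random.seed(image_id)
    return [random.random() for _ in range(5)]

def rerank(images, relevant):
    # 4) Order images by similarity to the fixated ones (detailed later).
    f_rel = [extract_features(r) for r in relevant]
    def score(img):
        f = extract_features(img)
        return -min(sum(abs(a - b) for a, b in zip(f, fr)) for fr in f_rel)
    return sorted(images, key=score, reverse=True)

results = search_cbir("pyramid")
print(rerank(results, record_fixations(results)))
```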


Figure 1: Implicit Relevance Feedback for the new Ranking Method in web-based CBIR.

The aim is to capture the user's gaze fixations in order to identify the characteristics of the images s/he declares to be of interest. This allows the tool to automatically retrieve further relevant images. The tool can also discover, in an unsupervised way, the characteristics of images of potential user interest: it derives the characteristics of the images of user interest by considering the images that mainly captured the user's attention, e.g., by taking into account the user's visual activity over the analyzed images. In the former case the tool learns how to select further relevant images, whereas in the latter case it can also reclassify the images already examined by the user, suggesting that s/he reconsider some potentially relevant images more deeply. Although the proposed system has been tested only on Google Images to improve retrieval precision, it may be applied to improve the retrieval precision of any document on the basis of the images featured in the document.

Figure 2 shows the general architecture of the proposed implicit relevance feedback, where we point out the system's ability to rearrange the images initially retrieved from a web-based CBIR (e.g. Google Images) without any user supervision, i.e., only on the basis of the user's gaze fixations. A fine tuning of the characteristics that the images should possess may be carried out by the system, subject to the user's agreement, for a better rearrangement of the images or for extracting relevant images from other datasets. In detail, the re-ranking mechanism is composed of the following steps:

• First Image Retrieval. The user enters some keywords in the CBIR and observes the results. During this phase, the eye tracker stores gaze fixations on the thumbnails of the retrieved images that most captured the user's attention, together with her/his eye movements;

• Features Extraction. One of the crucial points in CBIR is the choice of the low-level features used to compare the image under test with the queried image; the combination of features determines the effectiveness of the search. The extracted features can relate to the entire image (global features) or to a portion of it (local features). Local feature extraction is more complex, because it requires a first step to detect the important regions of the image, e.g., by clustering algorithms or object recognition, but it permits a considerable reduction of the computational complexity of the search algorithms. In our case the detection is simplified by the eye tracker, which allows us to identify the regions of major interest. The local features considered for describing image content are the Contrast $C$, Correlation $C_r$, Energy $E$, Homogeneity $H$, Gabor filter maps $G\text{-}Maps$ (24 maps: 6 scales and 4 orientations), and two novel features that describe:

– Brightness, computed as $r_{bright} = \mu + \frac{\mu_3 \cdot 255^2}{10}$;

– Smoothness, computed as $r_{smooth} = \frac{1}{\mu_2} + \frac{1}{\mu_4} + E + H$.

The above features are based on the moments of the histogram $H$ of the gray levels. The $n$th moment of the histogram of gray levels is

$$\mu_n(x) = \sum_{i=0}^{L-1} (x_i - \mu)^n \, p(x_i)$$

where $p(x_i)$ is the probability of finding a pixel of the image with gray level $x_i$ (given by the histogram $H$), $L$ is the number of gray levels, and $\mu$ is the average value. Therefore, in the proposed system, the images returned by the CBIR and the file containing the data taken by the eye tracker are processed in order to identify the most relevant images and their features. Each image is then represented by a feature vector $F = [C, C_r, E, H, G\text{-}Maps, r_{bright}, r_{smooth}]$. A sketch of this feature extraction is given after this list.

• Re-Ranking. The values of the extracted features that the images should possess to best fit the user's interest are then processed to produce a ranking of the initially retrieved images. In detail, we compute a similarity score (which represents a sort of implicit relevance feedback) between the most relevant images, detected at the previous step, and the images retrieved at the first step (see fig. 3). The similarity metric is based on the concept of distance, measured between the feature vector $F_{rel}$ (normalized between 0 and 1) of the most salient images (extracted at the previous step) and the feature vector $F_{ret}$ (normalized between 0 and 1) of the images initially retrieved (at step 1).


Figure 2: System Architecture.

The images are re-ranked by using this similarity score, computed as:

$$f(I_{Rel}, I_{Ret}) = \sum_{i=1}^{N} w_i \cdot \Omega_i(f^i_{rel}, f^i_{ret}), \qquad w_1 + w_2 + \dots + w_N = 1 \qquad (1)$$

where $I_{Rel}$ is the relevant image detected at the previous step, $I_{Ret}$ is an image initially retrieved, and $f^i_{rel}$, $f^i_{ret}$ are the $i$th of the $N$ features of the vector $F$ of $I_{Rel}$ and $I_{Ret}$, respectively. $\Omega_i$ is the fitness function related to the features $f^i_{rel}$, $f^i_{ret}$ and is computed as:

$$\Omega_i = e^{-\frac{1}{2} \cdot (f^i_{ret} - f^i_{rel})} \qquad (2)$$

Finally, the retrieved images are ordered, hence re-ranked, according to decreasing values of the similarity score $f$. A sketch of this computation is given after this list.
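As a worked example of the Features Extraction step, the sketch below computes the GLCM-based features (Contrast, Correlation, Energy, Homogeneity), a bank of 24 Gabor responses, and the two histogram-moment features on a grayscale region. It is a minimal sketch assuming scikit-image (≥ 0.19) and NumPy; the GLCM distance and angles, the Gabor frequencies, and the reduction of each Gabor map to its mean magnitude are our assumptions, since the paper does not specify them, and the formulas for r_bright and r_smooth follow the reconstruction given above.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.filters import gabor

def extract_features(region):
    """region: 2D uint8 array, a salient region cropped around fixations."""
    # GLCM texture features; distance/angle choices are our assumptions.
    glcm = graycomatrix(region, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, normed=True)
    C  = graycoprops(glcm, "contrast").mean()
    Cr = graycoprops(glcm, "correlation").mean()
    E  = graycoprops(glcm, "energy").mean()
    H  = graycoprops(glcm, "homogeneity").mean()

    # 24 Gabor maps: 6 scales (frequencies) x 4 orientations; we keep the
    # mean magnitude of each filtered map as a scalar summary.
    region_f = region.astype(float) / 255.0
    g_maps = []
    for freq in (0.05, 0.1, 0.15, 0.2, 0.3, 0.4):        # assumed scales
        for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
            real, imag = gabor(region_f, frequency=freq, theta=theta)
            g_maps.append(np.hypot(real, imag).mean())

    # Central moments of the gray-level histogram:
    # mu_n = sum_i (x_i - mu)^n * p(x_i), p given by the histogram.
    hist, _ = np.histogram(region, bins=256, range=(0, 256))
    p = hist / hist.sum()
    x = np.arange(256)
    mu = (x * p).sum()
    mu2, mu3, mu4 = (((x - mu) ** n * p).sum() for n in (2, 3, 4))

    # Brightness and smoothness, per the formulas reconstructed above
    # (r_smooth is undefined for perfectly uniform regions, where mu2 = 0).
    r_bright = mu + (mu3 * 255 ** 2) / 10
    r_smooth = 1 / mu2 + 1 / mu4 + E + H

    return np.array([C, Cr, E, H, *g_maps, r_bright, r_smooth])
```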
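The re-ranking of equations (1) and (2) can then be sketched as follows. The uniform weights $w_i$ are our assumption (the paper does not state how the weights are chosen), feature vectors are assumed to be already normalized to [0, 1], and scoring each retrieved image against its best-matching relevant image is likewise our choice, since the paper does not detail how scores against several relevant images are aggregated.

```python
import numpy as np

def similarity(f_rel, f_ret, weights=None):
    """Equations (1)-(2): weighted sum of per-feature fitness values."""
    f_rel = np.asarray(f_rel, dtype=float)   # features of a relevant image
    f_ret = np.asarray(f_ret, dtype=float)   # features of a retrieved image
    n = len(f_rel)
    if weights is None:
        weights = np.full(n, 1.0 / n)        # assumed: uniform, sums to 1
    omega = np.exp(-0.5 * (f_ret - f_rel))   # eq. (2), per feature
    return float(np.dot(weights, omega))     # eq. (1)

def rerank(retrieved, relevant_vectors):
    """Order (image_id, feature_vector) pairs by decreasing score f."""
    def score(item):
        _, f_ret = item
        return max(similarity(f_rel, f_ret) for f_rel in relevant_vectors)
    return sorted(retrieved, key=score, reverse=True)
```

A call such as `rerank([(url, F) for url, F in results], [F_rel])` then returns the initially retrieved images in the new order.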

The relevance feedback detected by the eye tracker could be improved by taking into account rankings carried out by other methods, e.g., methods that model the user's behavior during image analysis from how the user operates the mouse and keyboard.

3 User Interface and Experimental Results

The system has been implemented by integrating the functionality of Tobii Studio with Matlab 7.5, which is responsible for processing the output provided by the eye tracker. Tobii Studio makes it possible to record a web browsing session, setting appropriate parameters such as the URL and the initial size of the browser window. By default the web browsing homepage is set to http://images.google.com/, whereas the window size and resolution are set to the entire screen and the maximum resolution allowed by the monitor. After a proper training phase with the instrument, the user is authorized to start regular recording sessions, which terminate by pressing the F10 key on the keyboard. At the end of the session the user confirms the export, in textual form, of the two files, related to fixations and events, needed for the computation of the relevance feedback. Thus, the information representing the gaze fixations and the information related to the images, which are merged in the same picture, are actually separated into two files.
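A sketch of how the exported fixation file might be processed is given below (in Python rather than the Matlab used by the authors). The tab-separated column layout (timestamp, duration, x, y) and the thumbnail bounding boxes are our assumptions; the actual Tobii Studio export format may differ.

```python
import csv
from collections import defaultdict

def fixations_per_image(fixation_file, thumbnails):
    """Accumulate fixation duration per retrieved image.

    fixation_file: TSV with assumed columns timestamp, duration_ms, x, y.
    thumbnails: dict mapping image id -> (left, top, right, bottom) box
                in screen coordinates (also an assumption).
    """
    attention = defaultdict(float)
    with open(fixation_file, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        for row in reader:
            x, y = float(row["x"]), float(row["y"])
            for image_id, (l, t, r, b) in thumbnails.items():
                if l <= x <= r and t <= y <= b:
                    attention[image_id] += float(row["duration_ms"])
                    break
    # Images sorted by total fixation time: the top ones are taken as the
    # most relevant images, whose regions feed the feature extraction.
    return sorted(attention.items(), key=lambda kv: kv[1], reverse=True)
```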

Figure 3: The eye tracker's implicit relevance feedback produces an image input for the CBIR system.

To evaluate the effectiveness of the proposed system in increasing the precision of the information retrieval carried out by Google Images, we show below how the system significantly rearranges the collection of images proposed by Google in response to the word “pyramid”, and we evaluate the performance increase as perceived by a set of 30 users. Indeed, this collection is proposed without any knowledge of the user's interest by merging images of “pyramid” where the subject is either a monument or a geometric solid (see fig. 4).

With the eye tracker we can gain insight into the user's interests by discovering, for example, that s/he is more interested in pyramids as monuments, since the most fixated images are those of the Egyptian pyramids (as shown by the heatmap in fig. 5). With this information at hand it is relatively easy for the system to discover, after the recording session, the images relevant to the user by following the processing procedure pointed out in the previous section.


Figure 4: Google Ranking for “Pyramid” Keyword.

Fig. 6 shows the collection of images as re-proposed by our system.

Figure 5: Gaze Fixations on the Images retrieved by Google using the “Pyramid” keyword.

Figure 6: New Images Pyramid Ranking according to the Eye Tracker feedback given by the user.

The new ranking correctly suggests a sequence that favors the pyramids most similar to those observed, and therefore actually requested, by the user. The user's intent was caught with an implicit relevance feedback, taking into account that s/he was particularly attracted by a picture with the Sphinx in the foreground and a pyramid in the background. The proposed system was thus able to discover meaningful information from how the perception process was carried out by the user. Indeed, in the newly proposed ranking, the top two places are occupied by images with the pyramid and the Sphinx. Finally, we tested the performance of the proposed system on a set of 30 users. In detail, after the re-ranking each user was asked to say whether the first five retrieved images were more or less relevant to the inserted word with respect to the ones obtained by Google Images. The results are reported in table 1, where we can see that 86.6% of the users were more satisfied after the re-ranking, 6.7% were indifferent, and 6.7% were less satisfied.

          Less Satisfied   Indifferent   More Satisfied
Users           2               2              26
%              6.7             6.7            86.6

Table 1: Qualitative Performance Evaluation on a set of 30 Users

4 Conclusions and Future Work

The proposed model shows that the use of an eye tracker to detect implicit feedback may greatly improve the performance of a search in a CBIR system. Future developments will concern the possibility of considering not only the first image but also the next ones in order of importance, to obtain a more refined ranking. Moreover, we are currently working on the possibility of using visual attention for image indexing, thus taking into account the real contents of images. A comparison with other web-based CBIRs and tests on a wider set of users, in order to provide quantitative results, will be carried out in future work.

References

BUSCHER, G. 2007. Attention-based information retrieval. In SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, 918–918.

GRANKA, L. A., JOACHIMS, T., AND GAY, G. 2004. Eye-tracking analysis of user behavior in WWW search. In SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, 478–479.

KELLY, D., AND TEEVAN, J. 2003. Implicit feedback for inferring user preference: a bibliography. SIGIR Forum 37, 2, 18–28.

MARQUES, O., MAYRON, L. M., BORBA, G. B., AND GAMBA, H. R. 2006. Using visual attention to extract regions of interest in the context of image retrieval. In ACM-SE 44: Proceedings of the 44th Annual Southeast Regional Conference, ACM, New York, NY, USA, 638–643.

MILLER, T., AND AGNE, S. 2005. Attention-based information retrieval using eye tracker data. In K-CAP '05: Proceedings of the 3rd International Conference on Knowledge Capture, ACM, New York, NY, USA, 209–210.

MOE, K. K., JENSEN, J. M., AND LARSEN, B. 2007. A qualitative look at eye-tracking for implicit relevance feedback. In CIR.

OYEKOYA, O. K., AND STENTIFORD, F. W. 2006. Eye tracking – a new interface for visual exploration. BT Technology Journal 24, 3, 57–66.

OYEKOYA, O., AND STENTIFORD, F. 2007. Perceptual image retrieval using eye movements. Int. J. Comput. Math. 84, 9, 1379–1391.

XU, S., ZHU, Y., JIANG, H., AND LAU, F. C. M. 2008. A user-oriented webpage ranking algorithm based on user attention time. In AAAI'08: Proceedings of the 23rd National Conference on Artificial Intelligence, AAAI Press, 1255–1260.
