
PicHunter: A Bayesian Image Retrieval System

Ingemar Cox (1,2,3,4) T. Conway (3)

Joumana Ghosn (2,3) Matt Miller (1,2,3,4)

Thomas Minka (3,4) Steve Omohundro (1)

Thomas Papathomas (2,3) Peter N. Yianilos (1,2,3,4)

Project overview

1. Target Testing and the PicHunter Bayesian Multimedia Retrieval System, I.J. Cox, Matt Miller, S.M. Omohundro, P.N. Yianilos, Proceedings of the Forum on Research & Technology Advances in Digital Libraries, pp 66-75, 1996.

2. PicHunter: Bayesian Relevance Feedback for Image Retrieval, I.J. Cox, Matt Miller, Stephen Omohundro, P.N. Yianilos, 13th International Conference on Pattern Recognition, Vol.III, Track C, pp.361-369, August 1996.

– Introduces PicHunter, the Bayesian framework, and describes a working system including measured user performance.

3. Hidden Annotation in Content Based Image Retrieval, I.J. Cox, Joumana Ghosn, Matt Miller, T. Papathomas, P.N. Yianilos, IEEE Workshop on Content-Based Access of Image & Video Libraries, pp.76-81, June 1997

– Introduces the idea of ``hidden annotation'', and reports results demonstrating that it improves performance.

Project overview

4. An Optimized Interaction Strategy for Bayesian Relevance Feedback, I. J. Cox, M. L. Miller, T. Minka, P. N. Yianilos, IEEE International Conference on Computer Vision and Pattern Recognition - CVPR '98, Santa Barbara, CA, pp. 553-558, 1998.

– Introduces an improved stochastic image display strategy allowing the system to ``ask better questions.''

5. Psychophysical Studies of the Performance of an Image Database Retrieval System, T. Papathomas, T. Conway, I. Cox, J. Ghosn, M. Miller, T. Minka, P. Yianilos, Proceedings of Human Vision and Electronic Imaging III, San Jose, CA, Vol 3299, pp. 591-602, January 1998.

– Describes psychophysical studies of the system in a controlled environment.

Project summary

• The Bayesian Image Retrieval System, PicHunter: Theory, Implementation and Psychophysical Experiments, I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, P. N. Yianilos, IEEE Transactions on Image Processing, 9, 1, 20-37, (2000)

Introduction

• A search consists of

– Query

– Repeated relevance feedback

• To date, the emphasis has been on the query phase (better representations); relevance feedback is crude or non-existent

• Lack of quantitative measures for comparing performance of search algorithms

The main ideas

• Bayesian relevance feedback

– Learn from human interactions

– Model the user's actions, not his/her query

• Quantifiable testing

– Target testing

– Baseline testing

• Optimize the image display

User interface

Target testing

• The user is shown an image from the database. His/her task is to use the system to find it. We measure the number of interactions required. This, then, is easily compared against a simple linear search

• Not a perfect model for all intended uses --- but something we can measure and use for comparisons

Features

• Pictorial features

– Originally 18 global features

• % of pixels that are one of 11 colors

• Mean color saturation

• Median intensity of the image

• Image width and height

• A measure of global contrast

• Two measures of the number of edgels computed at different thresholds
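A few of these global features are simple enough to sketch directly. The snippet below is a minimal illustration, not PicHunter's implementation: it assumes an image is a list of 8-bit RGB tuples, takes saturation from the HSV conversion in Python's standard `colorsys` module, and treats intensity as the mean of the three channels. The slides do not say how saturation or intensity were defined, so both choices are assumptions.

```python
import colorsys
import statistics

def mean_saturation(pixels):
    """Mean HSV saturation over a list of (r, g, b) tuples in 0..255."""
    sats = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels]
    return sum(sats) / len(sats)

def median_intensity(pixels):
    """Median per-pixel intensity; intensity is assumed to be the channel mean in 0..1."""
    return statistics.median((r + g + b) / (3 * 255) for r, g, b in pixels)

# Hypothetical four-pixel "image": pure red, black, mid-gray, white.
pixels = [(255, 0, 0), (0, 0, 0), (128, 128, 128), (255, 255, 255)]
print(mean_saturation(pixels))   # only the red pixel is saturated
print(median_intensity(pixels))
```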

Features

• Hidden annotation

– Provides semantic labels

– 147 attributes

– Boolean vector, normalized Hamming distance
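As a minimal sketch of the distance named above, assume each image's annotation is stored as an equal-length list of booleans (the storage format is an assumption, not something the slides specify). The normalized Hamming distance is then the fraction of attributes on which two images disagree.

```python
def normalized_hamming(a, b):
    """Fraction of attribute positions where two Boolean vectors differ."""
    if len(a) != len(b):
        raise ValueError("annotation vectors must have the same length")
    if not a:
        raise ValueError("annotation vectors must be non-empty")
    differing = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return differing / len(a)

# Hypothetical 5-attribute annotations (the slides mention 147 attributes).
img1 = [True, False, True, False, False]
img2 = [True, True, False, False, False]
print(normalized_hamming(img1, img2))  # 2 of 5 positions differ -> 0.4
```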

Bayesian relevance feedback

• At denotes the current user action,

• Dt is the current display

• H is the session history, including the images currently displayed. Thus,

– Ht = {D1, A1, D2, A2, …, Dt, At}

• T is a target image.


• We build a predictive model P(A|T,H)

• Then from Bayes rule

$$P(T = T_i \mid H_t) = \frac{P(A_t \mid T = T_i, D_t, S_{t-1})\, P(T = T_i \mid H_{t-1})}{\sum_j P(A_t \mid T = T_j, D_t, S_{t-1})\, P(T = T_j \mid H_{t-1})}$$


• Assume time-invariance and same for all users

$$P(A_t \mid T = T_i, D_t)$$
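One feedback step under the simplified user model can be sketched as follows. This is an illustration under stated assumptions rather than the system's code: the function name `bayes_update` is hypothetical, the prior is a list holding P(T = Ti | Ht-1) for every image, and the likelihood list holds P(At | T = Ti, Dt) as supplied by some user model.

```python
def bayes_update(prior, likelihood):
    """One relevance-feedback step.

    prior[i]      = P(T = T_i | H_{t-1})
    likelihood[i] = P(A_t | T = T_i, D_t), supplied by a user model
    returns         P(T = T_i | H_t)
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)  # the sum over j in the denominator
    if total == 0:
        raise ValueError("user model assigns zero probability to the action")
    return [u / total for u in unnormalized]

# Hypothetical 4-image database: the user's action is twice as likely
# if image 0 is the target as it is for any other image.
prior = [0.25, 0.25, 0.25, 0.25]
likelihood = [0.8, 0.4, 0.4, 0.4]
posterior = bayes_update(prior, likelihood)
print(posterior)  # image 0 ends up with roughly twice the mass of each other image
```

Because the posterior after one step becomes the prior for the next, the whole session history is carried by a single probability vector.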

Absolute-distance model

• Only one image, Xq, in the display Dt can be selected at each iteration

• The probability of Ti increases or decreases depending on the distance d(Ti, Xq)

– P(T=Ti) ← P(T=Ti) G(d(Ti, Xq))
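The update above can be sketched in a few lines. The slide does not define G, so this example assumes a decreasing exponential, G(d) = exp(-d / sigma), and renormalizes after reweighting so the values remain a probability distribution. The one-dimensional features and the `dist` callable are hypothetical stand-ins for real image features.

```python
import math

def absolute_distance_update(probs, targets, selected, dist, sigma=1.0):
    """Reweight each P(T=T_i) by G(d(T_i, X_q)) and renormalize.

    G(d) = exp(-d / sigma) is an assumed choice of decreasing function.
    """
    weights = [p * math.exp(-dist(t, selected) / sigma)
               for p, t in zip(probs, targets)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 1-D feature values for three candidate targets.
targets = [0.0, 1.0, 5.0]
probs = [1 / 3, 1 / 3, 1 / 3]
dist = lambda a, b: abs(a - b)

# The user selects the image whose feature value is 1.0.
updated = absolute_distance_update(probs, targets, selected=1.0, dist=dist)
print(updated)  # candidates nearer the selected image gain probability
```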

Relative-distance model

• Let Q = {Xq1, Xq2, …, XqC} denote the set of selected images in display Dt

• Let N = {Xn1, Xn2, …, XnL} denote the set of unselected images

• Then we compute the distance difference

– d(Ti, Xqk) − d(Ti, Xnm) for all pairs {Xqk, Xnm}

– The probabilities of images Ti that are closer to Xqk are increased, while those closer to Xnm are decreased.
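A rough sketch of this pairwise scheme follows. The slide does not say how a distance difference is turned into a probability change, so the example assumes a logistic function of the difference with a hypothetical scale parameter `sigma`, multiplies the factors over all (selected, unselected) pairs, and renormalizes.

```python
import math

def relative_distance_update(probs, targets, selected, unselected, dist, sigma=1.0):
    """Raise P(T_i) when T_i is nearer to selected images than to unselected ones."""
    updated = []
    for p, t in zip(probs, targets):
        score = p
        for xq in selected:
            for xn in unselected:
                diff = dist(t, xq) - dist(t, xn)  # negative when t is closer to xq
                score *= 1.0 / (1.0 + math.exp(diff / sigma))  # assumed logistic factor
        updated.append(score)
    total = sum(updated)
    return [u / total for u in updated]

# Hypothetical 1-D features: the user selects the image at 1.0 and leaves 5.0 unselected.
targets = [0.0, 1.0, 5.0, 6.0]
probs = [0.25] * 4
dist = lambda a, b: abs(a - b)
updated = relative_distance_update(probs, targets, [1.0], [5.0], dist)
print(updated)  # mass shifts toward the candidates at 0.0 and 1.0
```

Very large distance differences divided by a small `sigma` would overflow `math.exp`; a production version would need to clamp the exponent.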

Display updating algorithm

• Most probable display

• Most informative display (Max. mutual information)

• Sampling

• Query by example

Experimental setup

• Database of 4522 images

– 1500 annotated

• M/N, A/R, P/S/B

– Memory / no memory (relevance feedback history)

– Absolute / relative distance

– Pictorial / semantic / both features

Experimental notation

• MRB – memory, relative distance, pictorial and semantic features

• MAB – memory, absolute distance, pictorial and semantic features

• NRB – no memory, …

• NAB

• MRS – memory, relative, semantic features

• MRP – memory, relative, pictorial features

Experimental results: Memory, metric and features

                 MRB    MAB    NRB    NAB    MRS    MRP
6 naïve users    25.4   35.8   45.5   33.2   15.6   35.1
2 exp. users     13.1   31.6   28.4   22.2    8.8   18.9

Baseline testing

• Similarity testing

– How many images are examined before the user sees a similar image?

– Compare to the number needed when randomly searching the database
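The random-search baseline has a simple closed form if we assume images are examined one at a time in random order without repeats, and that K of the N images count as similar: the expected number examined before the first similar one appears is (N + 1) / (K + 1). The sketch below checks that formula by simulation. The values N = 100 and K = 4 are hypothetical and do not come from the experiments.

```python
import random

def expected_random_examined(n_images, n_similar):
    """Expected draws (without replacement) up to and including the first similar image."""
    return (n_images + 1) / (n_similar + 1)

def simulate_random_search(n_images, n_similar, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        order = list(range(n_images))
        rng.shuffle(order)
        # Images numbered 0..n_similar-1 play the role of the similar ones.
        total += next(pos for pos, img in enumerate(order, start=1) if img < n_similar)
    return total / trials

exact = expected_random_examined(100, 4)
estimate = simulate_random_search(100, 4)
print(exact, estimate)
```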

Target versus category search

               MRB/T   MRS/T   MRB/C   RAND/C
Naïve users    25.4    15.6    12.2    19.7
Exp. users     13.1     8.8     8.9    20.1

Improved pictorial features

• Pictorial features

– HSV 64-element histogram

– HSV 256-element autocorrelogram

– RGB 128-element color coherence vector

Experimental results (User learning)

                          Before explanation   After explanation
Pictorial only                   17.1                 13.2
Pictorial and semantic           11.7                  9.5

Display updating algorithms

• Most probable display

• Most informative display (Max. mutual information)

• Sampling

• Query by example

Most Probable Display

• Performs quite well

• However, the greedy strategy suffers from “over-learning”

– PicHunter “gets stuck” in a local maximum

– Display after display of “lions”, say
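The most-probable strategy amounts to showing the images with the highest current posterior. A minimal sketch, assuming the posterior is a plain list indexed by image number (the function name and the five-image example are hypothetical):

```python
import heapq

def most_probable_display(probs, display_size):
    """Indices of the display_size images with the highest posterior probability."""
    return heapq.nlargest(display_size, range(len(probs)), key=probs.__getitem__)

# Hypothetical posterior over a 5-image database.
probs = [0.05, 0.40, 0.10, 0.30, 0.15]
print(most_probable_display(probs, 3))  # [1, 3, 4]
```

Because each display is drawn from the current peak of the posterior, successive displays tend to look alike, which is the "stuck" behaviour described above.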

Most-Informative Display

• Try to minimize the total number of iterations required in a search

– Try to elicit as much information from the user as possible

– Information theory suggests entropy as an estimate of the number of questions one needs to ask to resolve the ambiguity
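To make the entropy estimate concrete, the sketch below computes the Shannon entropy, in bits, of a probability distribution over candidate targets. With a uniform distribution over a 4522-image database it comes to roughly 12.1 bits, that is, about twelve ideal yes/no questions. The function name is an assumption for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n_images = 4522
uniform = [1.0 / n_images] * n_images
print(entropy_bits(uniform))        # close to log2(4522)
print(entropy_bits([1.0, 0.0]))     # target already known: nothing left to ask
```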

Most-informative display

• Consider the ideal (deterministic) case, in which the display consists of two images

$$P_{\text{ideal}}(A = 1 \mid X_1, X_2, T) = \begin{cases} 1 & \text{if } d(X_1, T) < d(X_2, T) \\ 0.5 & \text{if } d(X_1, T) = d(X_2, T) \\ 0 & \text{if } d(X_1, T) > d(X_2, T) \end{cases}$$

Most-informative display

• Generalization to the non-deterministic case

$$P_{\text{sigmoid}}(A = 1 \mid X_1, X_2, T) = \frac{1}{1 + \exp\big((d(X_1, T) - d(X_2, T)) / \sigma\big)}$$
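The two user models for a two-image display can be compared side by side. In this sketch A = 1 is read as "the user picks X1", the inputs are the two distances d(X1, T) and d(X2, T), and `sigma` is taken to be a noise scale; the function names are hypothetical. As `sigma` shrinks, the sigmoid model approaches the deterministic one.

```python
import math

def p_ideal(d1, d2):
    """Deterministic user: always picks the image closer to the target."""
    if d1 < d2:
        return 1.0
    if d1 > d2:
        return 0.0
    return 0.5

def p_sigmoid(d1, d2, sigma):
    """Noisy user: probability of picking X1 falls off smoothly as d1 exceeds d2."""
    return 1.0 / (1.0 + math.exp((d1 - d2) / sigma))

for sigma in (1.0, 0.1, 0.01):
    print(sigma, p_sigmoid(1.0, 2.0, sigma), p_ideal(1.0, 2.0))
```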

Most informative display

• Performing the minimization is non-trivial

– Perform Monte Carlo simulation

• Draw random displays {X1, X2… XND} from the distribution P(T=Ti)

• Sampling is a special case of most informative method where only one Monte Carlo sample is drawn
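The Monte Carlo search can be sketched for the two-image case. This is an illustration under several assumptions, not the published algorithm: images are represented by hypothetical one-dimensional feature values, the user follows a sigmoid model with scale `sigma`, candidate displays are drawn from the current distribution P(T=Ti), and the display kept is the one with the lowest expected entropy after the user's answer.

```python
import math
import random

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def p_select_first(t, x1, x2, sigma):
    """Assumed sigmoid user model: chance of picking x1 when the target is t."""
    diff = abs(t - x1) - abs(t - x2)
    return 1.0 / (1.0 + math.exp(diff / sigma))

def expected_entropy(probs, features, pair, sigma):
    """Expected posterior entropy after the user answers for this display."""
    i, j = pair
    like_first = [p_select_first(f, features[i], features[j], sigma) for f in features]
    result = 0.0
    for like in (like_first, [1.0 - l for l in like_first]):
        joint = [p * l for p, l in zip(probs, like)]
        p_action = sum(joint)
        if p_action > 0:
            result += p_action * entropy_bits([x / p_action for x in joint])
    return result

def most_informative_pair(probs, features, n_samples, sigma=1.0, seed=0):
    """Draw n_samples candidate displays from probs; keep the most informative."""
    rng = random.Random(seed)
    indices = list(range(len(probs)))
    best_pair, best_h = None, float("inf")
    for _ in range(n_samples):
        i, j = rng.choices(indices, weights=probs, k=2)
        if i == j:
            continue  # a display needs two distinct images
        h = expected_entropy(probs, features, (i, j), sigma)
        if h < best_h:
            best_pair, best_h = (i, j), h
    return best_pair, best_h

# Hypothetical 8-image database with evenly spaced features and a uniform prior.
features = [float(k) for k in range(8)]
probs = [1.0 / 8] * 8
pair, h = most_informative_pair(probs, features, n_samples=50)
print(pair, h, entropy_bits(probs))
```

With `n_samples=1` the function simply returns the single sampled display (or `None` if the two draws coincide), which corresponds to the sampling strategy described above.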

Simulation results: deterministic

Simulation results: non-deterministic

Experimental results: Display strategies

Display strategy: EB’, EP’, ES, RB’ (MRB’), AB’ (NAB’), RS (MRS), RP (MRP)

Naïve users: 11.3, 25.8, 16.0, 12.0, 20.4, 11.8, 29.6

Exp. users: 6.8, 10.2, 8.3, 8.65, 11.5

Future directions

• More efficient algorithms

• Automatic detection of hidden features

• Explore slightly richer user interfaces

• Explore increased use of online learning
