Post on 02-Sep-2020


Page 1: A Probabilistic Topic- Connection Model for Automatic ...xc35/ppt/CIKM10_Slides.pdf · •The major difficulty in content‐based image retrieval is the “semantic gap” between

A Probabilistic Topic-Connection Model for Automatic Image Annotation

www.ischool.drexel.edu

Xin Chen1, Xiaohua Hu1, Zhongna Zhou2, Caimei Lu1, Gail Rosen3, Tingting He4, E.K. Park5

1College of Information Science and Technology, Drexel University, Philadelphia, PA, USA, 2Dept. of ECE at University of Missouri in Columbia, MO, USA, 3Dept. of ECE at Drexel University in Philadelphia, PA, USA, 4Dept. of Computer Science at Central China Normal University in Wuhan, China, 5CSI-CUNY in Staten Island, NY, USA

2010-11-9

Outline

Problem Statement & Research Questions

Review ‐ Background & Existing Techniques

Represent Image Content in the Feature Space

Topic Modeling

Developed Method and Evaluation

Developed Methods

Evaluation

Conclusions


Web images come with additional text information (Flickr.com)


Web images come with additional text information (Wikipedia page)


ImageNet (Fei-Fei et al., CVPR '09)

An ontology of images based on WordNet. It currently has:

• 13,000+ categories of visual concepts

• 10 million human-cleaned images (~700 images per category)

• Openly available (www.image-net.org)


Mapping an ImageNet synset to a Wikipedia page

ImageNet dataset: image synset “Chrysanthemum coronarium”

Wikipedia page obtained by URL matching

The mapping provides an ‘indirect’ link between image sets and textual information.


Objective: utilizing available image-text pairs as prior knowledge for automatic image annotation

Manual image annotation is time‐consuming, laborious and expensive.

Image‐text pairs provide insight into the correlation between the visual content of images and informative textual descriptions.

Breakthroughs in automatic image annotation will help to organize the massive amount of digital images, promote the development and study of image storage and retrieval systems, and serve other applications such as online image‐sharing.


Problem 1: the semantic gap

• The major difficulty in content‐based image retrieval is the “semantic gap” between image features and the user (Smeulders et al., 2000).

Retrievr is a CBIR application in which you draw a query image that is then used to find matching Flickr images.


Problem 2: image appearance varies widely within the same image category

Variations:

Similarity transform:

• Spatial layout change

• Scale change

• Rotations

• Blurring

• Illumination change

• …

Affine transform:

• Skewing

• Different scaling of axes

• …


Solution: Objects represented by parts and key-points


Bag-of-Features: Background & Existing Techniques

Assumption: the patterns of different object categories can be represented by different distributions of local structures.

Salient point detector

• Harris‐Laplace Detector (Mikolajczyk, 2004)

• DoG salient points detector (Lowe, 2004)

Region detector 

• Kadir‐Brady (KB) saliency detector (Kadir and Brady, 2001)

• Maximally Stable Extremal Regions (MSERs) (Matas et al., 2002)

Quantification – local descriptors:

• SIFT (Lowe, 2004)

• Color‐SIFT (van de Weijer, 2006)


Outline

Problem Statement & Research Questions

Review ‐ Background & Existing Techniques

Represent Image Content in the Feature Space

• Represent Image by ‘Key‐points’

• Represent Image by ‘Parts’

Topic Modeling

Developed Method and Evaluation

Developed Methods

Evaluation

Conclusions


The Gaussian images and the difference-of-Gaussian images (Lowe, 2004)

Images are blurred by a 2D Gaussian function:

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/(2\sigma^{2})}$$

Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images.

For the next octave, the Gaussian images are down-sampled by a factor of 2 and the process is repeated.
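The blur, subtract and down-sample steps can be sketched in plain Python as follows. This is a minimal illustration, not Lowe's implementation: the function names, the base scale `sigma0=1.6`, the number of levels and the scale step `k` are all assumptions chosen for the example, and the image is a plain list of lists.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Normalized 1D Gaussian kernel; G(x, y, sigma) is separable into two of these."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping coordinates at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """2D Gaussian blur as a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def dog_octave(image, sigma0=1.6, levels=4):
    """Gaussian images of one octave and the differences of adjacent pairs."""
    k = 2 ** (1.0 / (levels - 1))  # assumed scale step between adjacent levels
    gaussians = [gaussian_blur(image, sigma0 * k ** i) for i in range(levels)]
    dogs = [[[b - a for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(g0, g1)]
            for g0, g1 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs

def downsample(image):
    """Keep every second pixel in both directions to start the next octave."""
    return [row[::2] for row in image[::2]]
```

A flat image is a useful sanity check: every Gaussian image stays flat, so every difference-of-Gaussian image is zero and no key points would be detected.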


Difference-of-Gaussian (DoG) salient points detector (Lowe, 2004)

Original image Output of DoG salient point detection

The DoG salient point detector finds the scale‐space extrema of the difference‐of‐Gaussian images and tends to extract blob‐like key points from images.


Scale Invariant Feature Transform (SIFT) salient point descriptor (Lowe, 2004)

Image patches containing salient points are rotated to a canonical orientation and divided into cells. Each cell is represented as an 8‐dimensional feature vector that accumulates the gradient magnitude in eight orientations.

Compared with other descriptors, the SIFT descriptor is more robust and is invariant to rotation and to scale and luminance changes.

The SIFT descriptor of salient points (2×2 cells) (Lowe, 2004)
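To make the cell-histogram idea concrete, here is a simplified sketch of a 2×2-cell descriptor. It is only an approximation of SIFT under stated assumptions: the patch is assumed to be already rotated to its canonical orientation, gradients are plain central differences, and the Gaussian weighting and interpolation used by the full descriptor are omitted. The function names are hypothetical.

```python
import math

def orientation_histogram(cell, bins=8):
    """Accumulate gradient magnitude into `bins` orientation bins for one cell."""
    hist = [0.0] * bins
    height, width = len(cell), len(cell[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist

def cell_descriptor(patch, grid=2, bins=8):
    """Concatenate the per-cell histograms of a grid x grid layout and L2-normalize."""
    size = len(patch) // grid
    vector = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [row[gx * size:(gx + 1) * size]
                    for row in patch[gy * size:(gy + 1) * size]]
            vector.extend(orientation_histogram(cell, bins))
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]

# A horizontal intensity ramp: every gradient points along +x (bin 0).
ramp = [[float(x) for x in range(8)] for _ in range(8)]
descriptor = cell_descriptor(ramp)
```

With 2×2 cells and eight bins the vector has 32 entries; the 4×4 layout more commonly used with SIFT would give 128.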


Grouping similar local descriptors into visual words

Typically, the K‐means clustering algorithm is used to cluster the descriptors of extracted image patches into visual words and to establish a code book of visual words for a specific image collection.

Code book of visual words (Sivic, 2003) and (Fei-Fei et al., 2005)

Each key‐point is assigned the index of the cluster center closest to its descriptor.
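The clustering and assignment steps can be sketched as below. This is a bare-bones Lloyd-style K-means written for illustration, with hypothetical function names and toy 2D descriptors; real SIFT descriptors have many more dimensions and real code books are far larger.

```python
import random
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, descriptor):
    """Index of the cluster center closest to the descriptor (its visual word)."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], descriptor))

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd iterations; the centers form the code book of visual words."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def bag_of_visual_words(descriptors, centers):
    """Histogram of visual-word indices for the descriptors of one image."""
    counts = Counter(nearest(centers, d) for d in descriptors)
    return [counts.get(j, 0) for j in range(len(centers))]

toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
code_book = kmeans(toy, k=2)
histogram = bag_of_visual_words(toy, code_book)
```

The resulting histogram is the image's bag-of-visual-words vector: the spatial arrangement of key points is discarded and only word counts remain.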


“Bag-of-Visual-Words”

• Since visual words appear repeatedly in images and carry some atomic meaning, they can be regarded as the visual analog of text words. Each image can be represented as a “Bag-of-Visual-Words”, which is an unordered collection of visual words.

• Many effective text mining and information retrieval techniques (such as feature selection, stop-word removal and TF-IDF term weighting) can be applied to the vector space model of visual words.
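As one example of a text technique carried over to visual words, the sketch below applies TF-IDF weighting to bag-of-visual-words count vectors. The particular variant (relative term frequency times `log(N / df)`) is an assumption; the slides do not say which weighting formula was used.

```python
import math

def tfidf(count_vectors):
    """Weight visual-word counts by term frequency and inverse document frequency."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: number of images in which each visual word occurs.
    df = [sum(1 for v in count_vectors if v[j] > 0) for j in range(n_terms)]
    weighted = []
    for v in count_vectors:
        total = sum(v) or 1
        weighted.append([
            (v[j] / total) * math.log(n_docs / df[j]) if df[j] else 0.0
            for j in range(n_terms)
        ])
    return weighted

# Three images over a three-word code book; word 0 occurs in every image.
weights = tfidf([[2, 1, 0], [1, 0, 3], [4, 0, 0]])
```

A visual word that occurs in every image gets zero weight, which is the same effect stop-word removal has on text.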


A simple test: the 15-scenes benchmark

The 15-scenes benchmark dataset consists of 4485 images spread over 15 categories. Each scene category contains 200 to 400 images, and the categories range from natural scenes to man-made environments.

Oliva & Torralba, 2001; Fei-Fei & Perona, 2005; Lazebnik et al., 2006


[Figure: precision-recall curves (axes: Recall, Precision) for the forest, tallbuilding and coast categories, each under three settings labeled S., V. and B.]

Comparison of precision-recall with respect to different image categories

Chen et al., PAKDD '09


Top ranked image retrieval results

Top ranked retrieval results

Query image (Coast)


Image retrieval results (continued)

Top ranked retrieval results

Query image (Skyscrapers)


LDA model for the ‘Bag-of-Visual-Words’ (Fei-Fei et al., 2005)

By representing images as ‘Bag-of-Visual-Words’, we can perform topic modeling on image documents in the same way as on text documents.

[Figure: LDA model for visual words (Fei-Fei et al., 2005), linking the codewords dictionary to the Bag-of-Visual-Words representation]


Maximally Stable Extremal Regions (MSERs) (Matas et al., 2002)

MSER is a highly efficient region detector. The idea originates from thresholding in the image color/intensity space I. Thresholding at level t yields a binary image E_t as follows:

$$E_t(x) = \begin{cases} 1 & \text{if } I(x) \ge t \\ 0 & \text{otherwise} \end{cases}$$

An extremal region is maximally stable when the area (or the boundary length) of the segment changes the least with respect to the threshold.

The set of MSERs is closed under continuous geometric transformations and is invariant to affine intensity changes.
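The stability criterion can be illustrated with a toy threshold sweep. This sketch is not the efficient MSER algorithm: it simply re-thresholds the image at each level, measures the 4-connected region around a fixed seed pixel, and reports the threshold at which the area changes least. The `>=` thresholding direction, the function names and the `delta` parameter are assumptions made for the example.

```python
def threshold(image, t):
    """Binary image: 1 where the intensity is at least t, else 0."""
    return [[1 if p >= t else 0 for p in row] for row in image]

def region_area(binary, seed):
    """Area of the 4-connected foreground region containing the seed pixel."""
    height, width = len(binary), len(binary[0])
    if not binary[seed[0]][seed[1]]:
        return 0
    stack, seen = [seed], {seed}
    while stack:
        y, x = stack.pop()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if (0 <= ny < height and 0 <= nx < width
                    and binary[ny][nx] and (ny, nx) not in seen):
                seen.add((ny, nx))
                stack.append((ny, nx))
    return len(seen)

def most_stable_threshold(image, seed, thresholds, delta=1):
    """Return (variation, threshold, area) with the smallest relative area change."""
    areas = [region_area(threshold(image, t), seed) for t in thresholds]
    best = None
    for i in range(delta, len(thresholds) - delta):
        if areas[i] == 0:
            continue
        variation = abs(areas[i - delta] - areas[i + delta]) / areas[i]
        if best is None or variation < best[0]:
            best = (variation, thresholds[i], areas[i])
    return best

image = [[10, 10, 10, 10],
         [10, 200, 200, 10],
         [10, 200, 200, 10],
         [10, 10, 10, 10]]
best = most_stable_threshold(image, (1, 1), [0, 50, 100, 150, 250])
```

Here the bright 2×2 blob keeps the same area across the middle thresholds, so it is reported as the stable region.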


The appearance of object parts as continuous space

Image Morphing


How to quantify the image parts in a continuous space?

Image patches containing salient parts are rotated to a canonical angle and rescaled to a uniform size (these are known as normalized patches).

Principal component analysis (PCA) is performed on the normalized patches to obtain a feature representation.

Finally, the appearance of each patch (an n × n matrix) is quantified as a feature vector of its first k (typically 20-50) principal components.

Adjusting image patches to uniform size


Selection of Principal Component Number

Reconstruction of first 50 PCA components

Major PCA components = 15

Original Pixel


Summary: image represented by key-points and parts

Represent image by SIFT descriptors and MSER features

SIFT(key‐points)

MSER(parts)


Topic Modeling -- Intuitive

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical, nerve, image, Hubel, Wiesel

Intuitive

• Assume the data we see is generated by some parameterized random process.

• Learn the parameters that best explain the data.

• Use the model to predict (infer) new data, based on data seen so far.


‘Bag-of-words’ model for text documents

Example document d_i: “Texas Instruments said it has developed the first 32-bit computer chip designed specifically for artificial intelligence applications [...]”

D = document collection, W = lexicon/vocabulary

[Figure: document d_i is represented as a vector of term frequencies over vocabulary terms such as “artifact”, “artificial”, “intelligence” and “interest”. Stacking these vectors for documents d_1 … d_I over terms w_1 … w_J gives the Document-Term Matrix.]
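A document-term matrix of this kind can be built in a few lines. The sketch below is illustrative only: the tokenizer is a simple lowercase regular expression, and the function names and example documents are assumptions rather than anything taken from the slides.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9\-]+", text.lower())

def document_term_matrix(docs):
    """Return the sorted vocabulary and one term-frequency row per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix

vocab, matrix = document_term_matrix([
    "artificial intelligence chip",
    "intelligence intelligence interest",
])
```

Each row is the bag-of-words vector of one document: word order is discarded and only the frequency of each vocabulary term is kept.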


Notations

Word: the basic unit, an item from a vocabulary indexed by {1, . . . , V}.

Document: a sequence of N words, denoted by w = (w1, w2, . . . , wN).

Collection: a total of D documents, denoted by C = {w1, w2, . . . , wD}.

Topic: denoted by z; the total number of topics is K. Each topic has its own word distribution p(w|z).


Background & Existing Techniques of Generative Latent Topic Models

The Naïve Bayesian model

$$z^{*} = \arg\max_{z}\, p(z \mid w), \qquad p(z \mid w) \propto p(z)\, p(w \mid z)$$

• z*: word-topic decision

• p(z): prior probability of topic z

• p(w|z): likelihood of word w given topic z

The probabilistic latent semantic indexing (PLSI) model

PLSI Model (Hofmann, 2001)

Assumption:

Each document has a mixture of k topics.

Fitting the model involves:

Estimating the topic-specific word distributions p(wi|zk) and the document-specific topic distributions p(zk|dj) from the corpus via maximum likelihood estimation (MLE).
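For reference, the mixture assumption and the objective that MLE maximizes can be written out as below. This is the standard PLSI formulation rather than a formula taken from the slides; the count symbol n(d_j, w_i), the number of times word w_i occurs in document d_j, is notation introduced here.

$$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$$

$$\mathcal{L} = \sum_{j} \sum_{i} n(d_j, w_i)\, \log p(w_i \mid d_j)$$

The two sets of distributions are usually fitted by expectation-maximization, since the topic z_k behind each word occurrence is not observed.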


Latent Dirichlet Allocation (LDA) Model (Blei, 2003)

In the PLSI model, the topic mixture probabilities p(zk|dj) of the documents are fixed once the model is estimated. When a new document arrives, the model has to be re-estimated, so PLSI does not scale.

The LDA model treats the probability of latent topics for each document, p(z|d), and the conditional probability of words for each latent topic, p(w|z), as latent random variables that are subject to change when a new document arrives.

Generative process of the LDA model:

$$\theta_d \sim \mathrm{Dir}(\boldsymbol{\alpha}), \qquad p(z_j \mid d) \sim \mathrm{Multi}(\theta_d)$$

$$\phi_j \sim \mathrm{Dir}(\boldsymbol{\beta}), \qquad p(w_i \mid z_j) \sim \mathrm{Multi}(\phi_j)$$

Usually, a symmetric prior is used:

$$\boldsymbol{\alpha} = \{\alpha_1, \alpha_2, \ldots, \alpha_T\} = \{0.1\}, \qquad \boldsymbol{\beta} = \{\beta_1, \beta_2, \ldots, \beta_W\} = \{0.01\}$$
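The generative process can be simulated directly, which is a convenient way to see what the symbols mean. The sketch below draws a synthetic corpus under symmetric priors; the function names, corpus sizes and the use of normalized Gamma draws to sample from a Dirichlet are choices made for this illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for very small alpha
        return [1.0 / len(alpha)] * len(alpha)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index with the given probabilities."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate_corpus(n_docs, doc_len, T, W, alpha=0.1, beta=0.01, seed=0):
    """Sample topic-word distributions phi_j and documents of word indices."""
    rng = random.Random(seed)
    phi = [sample_dirichlet([beta] * W, rng) for _ in range(T)]  # phi_j ~ Dir(beta)
    docs = []
    for _ in range(n_docs):
        theta = sample_dirichlet([alpha] * T, rng)  # theta_d ~ Dir(alpha)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)               # topic ~ Multi(theta_d)
            words.append(sample_categorical(phi[z], rng))    # word ~ Multi(phi_z)
        docs.append(words)
    return phi, docs

phi, docs = generate_corpus(n_docs=5, doc_len=20, T=3, W=10)
```

Inference runs this process in reverse: given only the words, it estimates the topic assignments and the distributions θ and φ.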


Explanation of Prior Settings

Why do we set $\boldsymbol{\alpha} = \{\alpha_1, \alpha_2, \ldots, \alpha_T\} = \{0.1\}$?

This setting makes the topic modeling results more diverse. With a small α, each document favors a small number of topics related to its content, instead of assigning equal probability to every latent topic.

$$\theta_d \sim \mathrm{Dir}(\boldsymbol{\alpha}), \qquad p(z_j \mid d) \sim \mathrm{Multi}(\theta_d)$$

Documents as mixtures of topics, each with a different prior probability
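The effect of a small α can be checked numerically. The sketch below compares the average share of the single largest topic in samples from a symmetric Dirichlet with α = 0.1 against α = 10; the number of topics, the sample size and the function names are arbitrary choices for the illustration.

```python
import random

def sample_symmetric_dirichlet(alpha, T, rng):
    """Draw from a symmetric Dir(alpha) over T topics via normalized Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(T)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for very small alpha
        return [1.0 / T] * T
    return [d / total for d in draws]

def mean_largest_share(alpha, T=10, n=2000, seed=0):
    """Average probability mass of the dominant topic across n sampled mixtures."""
    rng = random.Random(seed)
    return sum(max(sample_symmetric_dirichlet(alpha, T, rng)) for _ in range(n)) / n

sparse = mean_largest_share(0.1)   # small alpha: a few topics dominate each document
dense = mean_largest_share(10.0)   # large alpha: mass is spread almost evenly
```

With α = 0.1 a single topic typically takes most of a document's mass, whereas with α = 10 every topic stays close to the uniform share of 1/T.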


Extension of topic model for visual words and image captions

The original topic model can be extended by introducing a new branch for visual words, which associates the topics of visual words with those of caption words. The resulting model is called the Corr‐LDA model; its prototype was introduced by (Blei, 2003).

The model is estimated by Gibbs sampling, a Monte Carlo process (Griffiths, 2004): each iteration estimates the posterior probability of topics from the current word‐topic assignment, then samples from that posterior to determine the word‐topic assignment for the next round.

[Figure: plate diagram of the Corr‐LDA model (Blei, 2003) over D documents and T latent topics, with hyperparameters α and β, topic mixture θ, topic assignments z, words w and word distributions φ, plus a second branch of tilde-marked variables; the two branches are labeled Entity Type 1 and Entity Type 2]

Posterior probability for topics at each iteration:

$$p(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$$
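A collapsed Gibbs sampler built on this kind of posterior can be sketched as follows. To keep it short, the example covers plain single-branch LDA rather than Corr‐LDA, and it is not the authors' code: the function name, the count-table names, the default hyperparameters and the toy corpus are all assumptions.

```python
import random

def gibbs_lda(docs, W, T, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA over documents given as lists of word indices."""
    rng = random.Random(seed)
    n_wt = [[0] * T for _ in range(W)]   # times word w is assigned to topic j
    n_dt = [[0] * T for _ in docs]       # times a word of document d is assigned to j
    n_t = [0] * T                        # total words assigned to topic j
    z = []
    for d, doc in enumerate(docs):       # random initial word-topic assignment
        zd = []
        for w in doc:
            t = rng.randrange(T)
            zd.append(t)
            n_wt[w][t] += 1
            n_dt[d][t] += 1
            n_t[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]              # remove the current assignment (the "-i" counts)
                n_wt[w][t] -= 1
                n_dt[d][t] -= 1
                n_t[t] -= 1
                # Unnormalized posterior; the document-length denominator is the
                # same for every topic j, so it is dropped.
                weights = [(n_wt[w][j] + beta) / (n_t[j] + W * beta)
                           * (n_dt[d][j] + alpha) for j in range(T)]
                r = rng.random() * sum(weights)
                acc, t = 0.0, T - 1
                for j, weight in enumerate(weights):
                    acc += weight
                    if r < acc:
                        t = j
                        break
                z[d][i] = t              # record the newly sampled assignment
                n_wt[w][t] += 1
                n_dt[d][t] += 1
                n_t[t] += 1
    return z, n_wt, n_dt

toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 0, 1, 1]]
z, n_wt, n_dt = gibbs_lda(toy_docs, W=4, T=2)
```

After sampling, the count tables give smoothed estimates of the topic-word and document-topic distributions, for example (n_wt[w][j] + β) / (n_t[j] + Wβ) for p(w|z = j).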


Problem with the Corr‐LDA model in modeling images and text

Needle-leaf forest is composed largely of straight trunked, conical trees with relatively short branches, and small, narrow, needlelike leaves. These trees are conifers. Where evergreen, the needleleaf forest provides continuous and deep shade to the ground so that lower layers of vegetation are sparse or absent except for a thick carpet of mosses in many places. Species are few and large tracts of forest consist almost entirely of but one or two species.

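[Figure: document-level topic mixture composition for the passage above. Topics 1, 2, 3, 5, … are linked to words such as branch, species, leaf, tree, animal and ground. Alongside is the plate diagram of the Corr-LDA model, with variables α, θ, z, w, φ, β, y, v, ψ and β' over D documents and T topics.]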


An improved topic model

Note: multi-word phrases are extracted by Xtract (Smadja, 1993)


The Data Collection and Settings

The image dataset is acquired from ImageNet (http://www.image‐net.org/). Specifically, we download the synsets under the “flower”, “mammal” and “tree” subtrees. Each synset is mapped to the Wikipedia page describing the same concept.

A rule‐based method is used to identify the explanatory sections in the Wikipedia pages. Articles with fewer than 200 words are filtered out. In total, we obtain text descriptions for 1452 synsets (330, 562 and 560 synsets for the subtrees “flower”, “mammal” and “tree”, respectively).

For each synset, we replicate the text description to each of its images. We then index the single words and multi‐word phrases in the text descriptions, and extract visual‐word features as well as MSER region features from the images (an average of 1095 visual words and 127 MSER regions per image).

ImageNet takes its backbone hierarchical ontology from WordNet: each node holds a group of images that depict a particular concept, named by a synonym set or “synset”.


Illustration of uncovered latent topics by proposed model

Topic 84

Top word       Probability    Top phrase                Probability
flower         0.019254       one flower                0.015733
orchid         0.012133       orchid family             0.009458
Amanda         0.00867        several genus             0.008829
subgenera      0.006814       smooth leaf               0.007536
shape          0.006617       triangular flower         0.006662
monophylet     0.006449       temperate climate         0.006321
Masdevallia    0.004167       a flower                  0.005409
genera         0.003656       specy ¨cm.                0.004575
subgenu        0.003208       horticultural trade       0.004265
sever          0.003009       e.g.m.                    0.004012
section        0.003009       reproductive structure    0.003869
genu           0.002962       division magnoliophyta    0.003179
tuft           0.002903       biological function       0.003105
dura           0.002583       male sperm                0.003041
Klotzsch       0.002562       female ovum               0.002879
COLOMBIA       0.002558       higher plant              0.002747
subtrib        0.002537       next generation           0.002676
epiphyt        0.002384       primary mean              0.002664
final          0.002314       reproductive organ        0.002459
botanist       0.002215       selective pressure        0.002443


Illustration of uncovered latent topics by proposed model

Topic 116

Top word       Probability    Top phrase                Probability
Leopard        0.011636       snow leopard              0.014342
Africa         0.0095         black panther             0.014025
Panthera       0.007002       sri lanka                 0.013044
jaguar         0.00681        male leopard              0.012733
lion           0.005525       genus panthera            0.012725
spot           0.005232       small spot                0.012723
cat            0.004863       mammal specy              0.012718
black          0.00485        great diversity           0.012718
cross          0.004607       greek word                0.012718
Felida         0.004351       southern asia             0.012718
home           0.003937       Indian subcontinent       0.012718
hybrid         0.003923       rain forest               0.007444
India          0.003921       short leg                 0.005945
Uncia          0.003818       american continent        0.005864
central        0.003755       berlin zoo                0.005864
normal         0.003571       forest area               0.005108
exist          0.003102       wide variation            0.004198
parent         0.003069       across                    0.004079
climb          0.003063       abundant prey             0.003955
habitat        0.003011       several specy             0.003925


Likelihood comparison

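Likelihood of visual components given the model:

$$p(\mathbf{v} \mid \mathbf{y}) = \prod_{j=1}^{T_2} \left[ \int p(\mathbf{v}_j \mid y_j, \psi_j)\, p(\psi_j \mid \beta_2)\, d\psi_j \right] = \left[ \frac{\Gamma(V\beta_2)}{\Gamma(\beta_2)^{V}} \right]^{T_2} \prod_{j=1}^{T_2} \frac{\prod_{v} \Gamma\!\left(C^{VT}_{vj} + \beta_2\right)}{\Gamma\!\left(\sum_{v'} C^{VT}_{v'j} + V\beta_2\right)}$$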
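A likelihood of this kind is normally evaluated in log space to avoid overflow in the Gamma functions. The sketch below assumes the standard collapsed Dirichlet-multinomial form, with `counts[j][v]` holding the number of times visual word v is assigned to visual topic j and a symmetric prior `beta`; these names and the exact form are assumptions, not details confirmed by the slides.

```python
import math

def log_likelihood(counts, beta):
    """log p(v | y) for a collapsed Dirichlet-multinomial with symmetric prior beta."""
    V = len(counts[0])
    total = 0.0
    for row in counts:                      # one row of counts per visual topic
        total += math.lgamma(V * beta) - V * math.lgamma(beta)
        total += sum(math.lgamma(c + beta) for c in row)
        total -= math.lgamma(sum(row) + V * beta)
    return total

# One topic, two visual words, uniform prior (beta = 1), a single observation.
single = log_likelihood([[1, 0]], beta=1.0)
```

In this toy case the single observation has probability 1/2, so the log-likelihood is log(1/2); with no observations at all the log-likelihood is 0.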


Perplexity comparison (# of visual topics=1000)

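$$\mathrm{Perplexity} = \exp\left[ -\frac{\sum_{d \in D_{test}} \log p(\mathbf{w}_d, \mathbf{p}_d \mid \mathbf{v}_d, \mathbf{r}_d)}{\sum_{d \in D_{test}} \left( N_{dw} + N_{dp} \right)} \right]$$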
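Once the per-document log-probabilities are available, the perplexity computation itself is short. In the sketch below, `log_probs` holds the log-probability of each test document's text and `token_counts` holds the number of text tokens (words plus phrases) in each document; both argument names are assumptions.

```python
import math

def perplexity(log_probs, token_counts):
    """exp of the negative total log-probability per text token on the test set."""
    return math.exp(-sum(log_probs) / sum(token_counts))

# A model that spreads probability uniformly over 4 outcomes has perplexity 4.
uniform = perplexity([10 * math.log(0.25), 5 * math.log(0.25)], [10, 5])
```

Lower perplexity means the model assigns higher probability to the held-out text, so it is read as "lower is better".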


Annotation accuracy comparison

$$p(w_i \mid d_j) = \sum_{t=1}^{T} p(w = w_i \mid z = t)\, p(z = t \mid d = d_j)$$
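The annotation rule marginalizes over topics and then ranks the vocabulary by the resulting score. A small sketch, with hypothetical names and made-up toy distributions:

```python
def annotate(p_w_given_z, p_z_given_d, vocab, top_n=3):
    """Rank words by p(w | d) = sum over topics t of p(w | z=t) * p(z=t | d)."""
    scores = {
        word: sum(p_w_given_z[t][i] * p_z_given_d[t]
                  for t in range(len(p_z_given_d)))
        for i, word in enumerate(vocab)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

vocab = ["flower", "leopard", "tree"]
p_w_given_z = [[0.8, 0.1, 0.1],   # topic 0 favors "flower"
               [0.1, 0.8, 0.1]]   # topic 1 favors "leopard"
p_z_given_d = [0.9, 0.1]          # topic mixture inferred for one image
top = annotate(p_w_given_z, p_z_given_d, vocab)
```

For an unseen image, p(z = t | d) would come from inference on its visual features; the highest-scoring words are then proposed as annotations.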


Conclusions

A probabilistic topic‐connection model is proposed to deal with the problem of modeling images and their associated text descriptions.

Specifically, new latent variables are introduced to allow for more flexible sampling of word topics and visual topics, in which one word topic may connect to multiple visual topics.

The proposed model provides a better representation of the connection between latent semantic topics and latent image patterns, and thus achieves better performance in automatic image annotation than the traditional Corr‐LDA model.


Questions or Comments?

THANK YOU FOR COMING! ☺