presented by ivan chiou

Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information

Presented by Ivan Chiou

Authors: Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale

Department of Information and Computer Science, University of Trento, Italy

Abstract

Background: approaches to improving image ranking

Takes into account user annotation, time and location

Proposal

Define a novel multimodal similarity measure

Combine visual features, annotated concepts and geo-tagging

Propose a learning approach based on SVMs (Support Vector Machines)

Introduction

Image-graph based techniques

Vertices represent images, including their visual and semantic information.

Probabilistic models

PLSA (Probabilistic Latent Semantic Analysis) methodology

Visual features

Annotations

GPS coordinates

SVMs are able to learn from the data the weights to be assigned.

Take a random set of image queries

Retrieve the set of images with the highest similarity

Have them judged for relevance by human annotators

Train SVMs with these examples.

Combining visual, Concept and GPS signals (1/2)

PLSA

User-generated multimedia content

Visual content

Image tagging

Geo-location

Produces corresponding topic spaces with reduced dimensions.

Expectation Maximization

Fast on-line retrieval for very large datasets
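To make the PLSA step concrete, here is a minimal sketch of Expectation Maximization for PLSA on a toy document-word count matrix. It is an illustration only, not the authors' implementation: the function names `plsa` and `normalise`, the random initialisation, the iteration count, and the toy data are all assumptions, and the example uses 2 topics rather than the 100 used in the slides.

```python
import math
import random


def normalise(row, fallback):
    """Scale a non-negative row so it sums to 1; keep `fallback` if the row is all zero."""
    total = sum(row)
    return [v / total for v in row] if total > 0 else list(fallback)


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM (hypothetical helper).

    counts[d][w] is the count of word w in document d.
    Returns (p_z_d, p_w_z, history), where p_z_d[d][z] = P(z|d),
    p_w_z[z][w] = P(w|z), and history holds the log-likelihood
    measured at the start of each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    uniform_z = [1.0 / n_topics] * n_topics
    uniform_w = [1.0 / n_words] * n_words
    # Random (assumed) initialisation of both distributions.
    p_z_d = [normalise([rng.random() + 0.1 for _ in range(n_topics)], uniform_z)
             for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() + 0.1 for _ in range(n_words)], uniform_w)
             for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        acc_zd = [[0.0] * n_topics for _ in range(n_docs)]
        acc_wz = [[0.0] * n_words for _ in range(n_topics)]
        loglik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                loglik += n * math.log(total)
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    acc_zd[d][z] += resp
                    acc_wz[z][w] += resp
        history.append(loglik)
        # M-step: renormalise the accumulated expected counts.
        p_z_d = [normalise(acc_zd[d], p_z_d[d]) for d in range(n_docs)]
        p_w_z = [normalise(acc_wz[z], p_w_z[z]) for z in range(n_topics)]
    return p_z_d, p_w_z, history


# Toy corpus: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
toy_counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
topic_mix, word_dist, loglik_history = plsa(toy_counts, n_topics=2)
```

Each row of `topic_mix` is the reduced-dimension representation of one document. In the setting of the slides, one such topic space would be produced per modality.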

Combining visual, Concept and GPS signals (2/2)

PLSA with 100 topics.

Visual features

SIFT (Scale Invariant Feature Transform)

128-element descriptor

2500 salient points (K-Means, training set of 5000 images)

Bag-of-words associating a feature vector with each image.

Image annotation

Consists of all the tags in the dataset, except words used just once or by a single user.

Total number: 5500 words

GPS coordinates

Calculated as the distance between the GPS coordinates of the query and those of the retrieved images.
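The slides do not say which distance is used for the GPS signal. As one plausible choice, the sketch below uses the haversine great-circle distance; the function names, the mean Earth radius constant, and the sample coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(query, candidates):
    """Sort (name, lat, lon) candidates by distance to the (lat, lon) query."""
    return sorted(candidates,
                  key=lambda c: haversine_km(query[0], query[1], c[1], c[2]))


# Approximate coordinates of two Paris landmarks, used only as sample data.
eiffel = (48.8584, 2.2945)
candidates = [("louvre", 48.8606, 2.3376), ("near_eiffel", 48.8590, 2.2950)]
ranked = rank_by_distance(eiffel, candidates)
```

A smaller distance would then translate into a higher GPS similarity between the query and a retrieved image.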

Supervised Multimodal Approach (1/2)

Improve retrieval accuracy

Relies on a Development Set (DS)

Relevant images

Relevant

Irrelevant

Annotated by users

Proposing SVMs

Two important properties

They are robust to overfitting, offering the possibility to trade off generalization against empirical error, so that the model can be tuned to a more general setting.

They can include additional features in the parameter vector.

Supervised Multimodal Approach (2/2)

SVMs: Multimodal 2 (MM2)
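The idea of letting an SVM learn how much weight each modality deserves can be sketched as follows. This is not the MM2 model itself: it assumes a linear SVM trained by plain subgradient descent on the hinge loss, with one feature per modality (visual, tag and GPS similarity between a query and a candidate image). The function names, hyperparameters and toy feature values are all hypothetical.

```python
def score(w, b, x):
    """Linear decision value w.x + b for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Train a linear SVM with L2 regularisation by subgradient descent.

    X: list of feature vectors (here: per-modality similarity scores).
    y: list of labels, +1 for relevant and -1 for irrelevant.
    Returns the learned weights w and bias b.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(w, b, xi)
            # Regularisation step: shrink the weights slightly.
            w = [wj - lr * lam * wj for wj in w]
            # Hinge-loss step: only examples inside the margin contribute.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


# Toy (visual, tag, GPS) similarities, made up for illustration.
X_toy = [(0.9, 0.8, 0.9), (0.2, 0.1, 0.3), (0.8, 0.9, 0.7),
         (0.3, 0.2, 0.1), (0.7, 0.6, 0.95), (0.1, 0.4, 0.2)]
y_toy = [1, -1, 1, -1, 1, -1]
w_toy, b_toy = train_linear_svm(X_toy, y_toy)
```

The learned weights play the role of the per-modality weights mentioned in the introduction, and candidates can be re-ranked by their decision value. The regularisation strength `lam` controls the trade-off between generalization and empirical error.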

Experimental Result (1/5)

100,000 images of Paris from Flickr.

2500 SIFT / 50,000 images.

5500 tags / 50,000 images.

Maximum two images per user.

Avoid similar images taken by the same photographer.

100 query images, with the top-ranked 9 images retrieved for each

How to judge whether an image is relevant

Half of the 72 annotators must consider the image relevant
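The relevance rule can be written as a simple vote threshold. The slide does not say whether exactly half of the annotators is enough, so the sketch below assumes "at least half"; the function name `is_relevant` and the `threshold` parameter are hypothetical.

```python
def is_relevant(votes, threshold=0.5):
    """Return True if at least `threshold` of the annotators voted 'relevant'.

    votes: list of booleans, one per annotator.
    The 'at least' (>=) comparison is an assumption, not stated in the slides.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) >= threshold * len(votes)


# 72 annotators, 40 of whom consider the image relevant.
example_votes = [True] * 40 + [False] * 32
example_label = is_relevant(example_votes)
```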

Experimental Result (2/5)

Result

900 retrieved images

VS: 305 relevant images

TS: 218 relevant images

VS+TS: 308 relevant images

MM1: 641

GPS coordinates.

MM2: accuracy of 72% and MAP of 0.78
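The sketch below shows how figures of this kind could be computed. It assumes that accuracy means the fraction of the 900 retrieved images judged relevant (for MM1, 641 / 900, roughly 0.71), and that average precision is normalised by the number of relevant images in each retrieved list. The slides state neither definition, so both are assumptions, as are the function names.

```python
def precision(n_relevant, n_retrieved):
    """Fraction of retrieved images that were judged relevant."""
    return n_relevant / n_retrieved


def average_precision(relevance):
    """Average precision of one ranked list of booleans (True = relevant).

    Normalised by the number of relevant items in the list itself,
    which is an assumption about the evaluation protocol.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the average precision over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Relevant counts from the slide, out of 900 retrieved images.
reported = {"VS": 305, "TS": 218, "VS+TS": 308, "MM1": 641}
fractions = {name: precision(n, 900) for name, n in reported.items()}
```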

Experimental Result (3/5)

Experimental Result (4/5)

Figures 4-8

Improves on the basic model when the tag annotation is not reliable

Improves the diversity of the retrieval results (fewer near-identical pictures that differ only by night or day, perspective, or point of view)

Experimental Result (5/5)

Conclusion

Presented a novel way to combine visual information with tags and GPS.

Proposed a supervised machine learning approach (MM2), based on Support Vector Machines.

Results confirm that the approaches improve accuracy.

BACKUP

