presented by ivan chiou

Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information Presented by Ivan Chiou

Upload: marlo

Post on 17-Jan-2016


DESCRIPTION

Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information. Presented by Ivan Chiou. Authors: Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale.

TRANSCRIPT

Page 1: Presented by  Ivan  Chiou

Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information

Presented by Ivan Chiou

Page 2: Presented by  Ivan  Chiou

Authors

Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale

Department of Information and Computer Science, University of Trento, Italy

Page 3: Presented by  Ivan  Chiou

Abstract

Background: approaches to improving image ranking
  Take into account user annotations, time and location

Proposal
  Define a novel multimodal similarity measure that combines visual features, annotated concepts and geotagging.
  Propose a learning approach based on SVMs (Support Vector Machines).

Page 4: Presented by  Ivan  Chiou

Introduction

Image-graph based techniques
  Vertices represent images, including their visual and semantic information.

Probabilistic models
  PLSA (Probabilistic Latent Semantic Analysis) methodology
    Visual features
    Annotations
    GPS coordinates

SVMs are able to learn from the data the weights to be assigned.
  Take a random set of image queries
  Retrieve the set of images with the highest similarity
  Have human annotators judge which ones are relevant
  Train the SVMs with these examples.

Page 5: Presented by  Ivan  Chiou

Combining Visual, Concept and GPS Signals (1/2)

PLSA

User-generated multimedia content
  Visual content
  Image tagging
  Geolocation

Produces corresponding topic spaces with reduced dimensions
  Expectation Maximization

Fast online retrieval for very large datasets
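The topic-space reduction above can be illustrated with a small sketch of PLSA fitted by Expectation Maximization. This is a toy, pure-Python version under stated assumptions: the function name `plsa`, the count matrix and the choice of two topics are invented for illustration and are not taken from the paper.

```python
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-by-word count matrix (list of lists).

    Returns P(z|d) for every document and P(w|z) for every topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row) or 1.0  # guard against an all-zero row
        return [v / total for v in row]

    # Random, strictly positive initialisation of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                for z in range(n_topics):
                    expected = n * post[z] / norm
                    # M-step accumulators: expected counts per topic.
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z

# Toy data: 4 "images" described by counts over 4 "words" (invented values).
counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
doc_topics, topic_words = plsa(counts, n_topics=2)
```

Each image is then represented by its short `P(z|d)` vector instead of the full word histogram, which is what makes comparison at query time cheap.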

Page 6: Presented by  Ivan  Chiou

Combining Visual, Concept and GPS Signals (2/2)

PLSA with 100 topics

Visual features
  SIFT (Scale Invariant Feature Transform)
    128-element descriptor
    2,500 salient points (K-Means, training set of 5,000 images)
  Bag-of-words, associating a feature vector with each image

Image annotation
  Consists of all the tags in the dataset, except words used just once or by a single user
  Total number: 5,500 words

GPS coordinates
  Calculated as the distance between the GPS coordinates of the query and those of the retrieved images
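The slide does not say which distance is used for the GPS signal. A minimal sketch, assuming the great-circle (haversine) distance, could look like this; the function names and the example coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to stay inside asin's domain despite rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def rank_by_distance(query, candidates):
    """Sort (image_id, lat, lon) candidates by distance from the query (lat, lon)."""
    return sorted(
        candidates,
        key=lambda c: haversine_km(query[0], query[1], c[1], c[2]),
    )

# Hypothetical query near the Eiffel Tower and two candidate images.
query = (48.8584, 2.2945)
candidates = [("notre_dame", 48.8530, 2.3499), ("louvre", 48.8606, 2.3376)]
ranked = rank_by_distance(query, candidates)
```

For a dataset confined to one city, a simpler planar approximation would probably rank images the same way; haversine is used here only because it is a well-known default.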

Page 7: Presented by  Ivan  Chiou

Supervised Multimodal Approach (1/2)

Improve retrieval accuracy

Relies on a Development Set (DS)
  Relevant images
    Relevant
    Irrelevant
  Annotated by users

Proposing SVMs
  Two important properties
    They are robust to overfitting, offering the possibility to trade off between generalization and empirical error in order to tune the model to a more general setting.
    They can include additional features in the parameter vector.
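To make the trade-off between generalization and empirical error concrete, here is a minimal sketch of a linear SVM trained by stochastic sub-gradient descent on the hinge loss. It is not the solver used by the authors. The three features per image (visual similarity, tag similarity, GPS closeness), the toy values, and the parameter names `c`, `lr` and `epochs` are all assumptions made for illustration.

```python
import random

def decision(w, b, x):
    """Signed score of feature vector x; positive means 'relevant'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(samples, labels, c=10.0, epochs=300, lr=0.01, seed=0):
    """Minimise 0.5*||w||^2 + c * sum(hinge loss) by stochastic sub-gradient descent.

    A larger c penalises training errors more (lower empirical error);
    a smaller c favours a wider margin (better generalization).
    """
    rng = random.Random(seed)
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * decision(w, b, x)
            # Regulariser sub-gradient, spread evenly over the n samples.
            grad_w = [wj / n for wj in w]
            grad_b = 0.0
            if margin < 1.0:
                # Hinge loss is active: push the sample towards its side.
                grad_w = [g - c * y * xj for g, xj in zip(grad_w, x)]
                grad_b = -c * y
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

# Invented development set: (visual_sim, tag_sim, gps_closeness) per image,
# labelled +1 (relevant) or -1 (irrelevant) by annotators.
samples = [
    (0.9, 0.8, 0.9), (0.8, 0.6, 0.95), (0.7, 0.9, 0.8),
    (0.2, 0.3, 0.1), (0.3, 0.1, 0.2), (0.4, 0.2, 0.15),
]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
```

The learned `w` plays the role of the weights assigned to each modality, and adding another feature only means extending each tuple by one element.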

Page 8: Presented by  Ivan  Chiou

Supervised Multimodal Approach (2/2)

SVMs: Multimodal 2 (MM2)

Page 9: Presented by  Ivan  Chiou

Experimental Results (1/5)

100,000 images of Paris from Flickr
  2,500 SIFT / 50,000 images
  5,500 tags / 50,000 images
  Maximum of two images per user
    Avoids similar images taken by the same photographer

100 query images, with the 9 top-ranked images retrieved for each

How to judge whether an image is relevant
  Half of the 72 annotators consider the image relevant
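The relevance rule can be written as a simple vote count. The slide does not say whether exactly half counts as relevant; the sketch below assumes "at least half", and the function name `is_relevant` is hypothetical.

```python
def is_relevant(votes, threshold=0.5):
    """votes: one boolean per annotator (True = 'relevant').

    Assumption: an image is relevant when at least `threshold`
    of the annotators voted for it.
    """
    return sum(votes) >= threshold * len(votes)

# 72 annotators, as in the experiment; the vote patterns are invented.
exactly_half = [True] * 36 + [False] * 36
just_under = [True] * 35 + [False] * 37
```

With 72 annotators this means an image needs 36 positive votes under the stated assumption.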

Page 10: Presented by  Ivan  Chiou

Experimental Results (2/5)

Results: 900 retrieved images
  VS: 305 relevant images
  TS: 218 relevant images
  VS+TS: 308 relevant images
  MM1: 641 relevant images
    GPS coordinates
  MM2: accuracy of 72% and MAP of 0.78
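The counts above translate directly into the fraction of relevant images among the 900 retrieved, for example 641/900 ≈ 71% for MM1. The sketch below computes these fractions and also shows one common definition of average precision and MAP; the slides do not state which MAP variant was used, so the normalisation by the number of relevant images retrieved is an assumption.

```python
retrieved = 900  # 100 queries x 9 top-ranked images
relevant = {"VS": 305, "TS": 218, "VS+TS": 308, "MM1": 641}

# Fraction of retrieved images judged relevant, per method.
precision = {name: count / retrieved for name, count in relevant.items()}

def average_precision(flags):
    """flags: relevance (True/False) of one query's results, in rank order.

    Assumption: normalised by the number of relevant images retrieved.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(flags, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(all_flags):
    """Mean of the per-query average precision values."""
    return sum(average_precision(f) for f in all_flags) / len(all_flags)
```

Unlike the plain fraction of relevant images, MAP rewards rankings that place relevant images near the top of the 9 results.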

Page 11: Presented by  Ivan  Chiou

Experimental Results (3/5)

Page 12: Presented by  Ivan  Chiou

Experimental Results (4/5)

Figures 4-8

Improves on the basic model when the tag annotation is not reliable

Improves the diversity of the retrieval results (fewer repeats of the same picture taken by night or by day, from a different perspective or from a different point of view)

Page 13: Presented by  Ivan  Chiou

Experimental Results (5/5)

Page 14: Presented by  Ivan  Chiou

Conclusion

Presented a novel way to combine visual information with tags and GPS.

Proposed a supervised machine learning approach (MM2), based on Support Vector Machines.

Results confirm that the approaches improve the accuracy.

Page 15: Presented by  Ivan  Chiou

BACKUP
