Supervised Models for Multimodal Image Retrieval based on Visual,
Semantic and Geographic Information
Presented by Ivan Chiou
Authors: Duc-Tien Dang-Nguyen,
Giulia Boato,
Alessandro Moschitti,
Francesco G.B. De Natale
Department of Information and Computer Science, University of Trento, Italy
Abstract
Background: approaches to improving image ranking
Concerned with user annotation, time, and location
Proposal:
Define a novel multimodal similarity measure
Combine visual features, annotated concepts, and geotagging
Propose a learning approach based on SVMs (Support Vector Machines)
Introduction
Image-graph based techniques
Vertices carry both visual and semantic information.
Probabilistic models
PLSA (Probabilistic Latent Semantic Analysis) methodology
Visual features
Annotation
GPS coordinates
SVMs can learn from the data the weights to be assigned to the features.
Random set of image queries
Retrieve the set of images with the highest similarity
Judged relevant by human annotators
Train SVMs with these examples.
Combining Visual, Concept and GPS Signals (1/2)
PLSA
User-generated multimedia content
Visual content
Image tagging
Geolocation
Produces corresponding topic spaces with reduced dimensions.
Expectation Maximization
Fast online retrieval for very large datasets
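As a rough illustration of how PLSA uses Expectation Maximization to produce a reduced topic space, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names `plsa` and `log_likelihood`, the random initialization, and the fixed iteration count are assumptions made for this example. The input is a count matrix (images as rows, visual words or tags as columns).

```python
import math
import random


def _normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM; counts[d][w] is the count of word w in document d.

    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                for z in range(n_topics):
                    resp = n * post[z] / norm
                    acc_z_d[d][z] += resp
                    acc_w_z[z][w] += resp
        # M-step: re-estimate both distributions from the expected counts.
        p_z_d = [_normalize(row) for row in acc_z_d]
        p_w_z = [_normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Log-likelihood of the observed counts under the fitted model."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                total += n * math.log(p)
    return total
```

Each row of `p_z_d` is the reduced representation of one image. The toy sizes here are for readability only; the slides report 100 topics.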
Combining Visual, Concept and GPS Signals (2/2)
PLSA with 100 topics.
Visual features
SIFT (Scale-Invariant Feature Transform): 128-element descriptors
2500 salient points (K-Means, training set of 5000 images)
Bag-of-words: associates a feature vector with each image.
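The bag-of-words step can be sketched as follows, assuming a codebook of cluster centers has already been learned with K-Means. The function name `bag_of_words` and the tiny 2-dimensional descriptors are illustrative only; real SIFT descriptors have 128 elements.

```python
def bag_of_words(descriptors, codebook):
    """Count how many descriptors fall closest to each codebook center.

    descriptors: list of feature vectors extracted from one image.
    codebook: list of cluster centers (the visual vocabulary).
    Returns a histogram with one bin per visual word.
    """
    histogram = [0] * len(codebook)
    for desc in descriptors:
        # Assign the descriptor to its nearest center (squared Euclidean).
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])),
        )
        histogram[nearest] += 1
    return histogram
```

The resulting histogram is the per-image feature vector that a topic model such as PLSA can take as input.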
Image annotation
Consists of all the tags in the dataset, except words used just once or by a single user.
Total number: 5500 words
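One possible reading of the tag filter is sketched below. The input shape (pairs of user id and tag list) and the name `build_tag_vocabulary` are assumptions for illustration, not taken from the paper.

```python
from collections import defaultdict


def build_tag_vocabulary(tagged_images):
    """Keep tags used more than once and by more than one user.

    tagged_images: iterable of (user_id, tags) pairs, one per image.
    Returns the surviving tags in sorted order.
    """
    uses = defaultdict(int)    # tag -> number of images carrying it
    users = defaultdict(set)   # tag -> distinct users who applied it
    for user_id, tags in tagged_images:
        for tag in set(tags):
            uses[tag] += 1
            users[tag].add(user_id)
    return sorted(t for t in uses if uses[t] > 1 and len(users[t]) > 1)
```

A tag that appears only once is necessarily used by a single user, so the second condition alone would suffice; both are kept to mirror the wording of the slide.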
GPS coordinates
Calculated as the distance between the GPS coordinates of the query and those of the retrieved images.
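The slides do not say which distance formula is used. A common choice for GPS coordinates is the great-circle (haversine) distance, sketched here as an assumption; `haversine_km` and the Earth-radius constant are not from the source.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A smaller distance between the query and a candidate image would then correspond to a higher geographic similarity.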
Supervised Multimodal Approach (1/2)
Improve retrieval accuracy
Relies on a Development Set (DS)
Relevant images
Labels: Relevant / Irrelevant
Annotated by users
Proposing SVMs
Two important properties
They are robust to overfitting, offering the possibility to trade off between generalization and empirical error in order to tune the model to a more general setting.
Additional features can be included in the parameter vector.
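To make the idea of learning modality weights concrete, here is a minimal linear soft-margin SVM trained by full-batch subgradient descent on the regularized hinge loss. It is a sketch under stated assumptions, not the authors' MM2 model: the three features (visual, tag, and GPS similarity between a query and a candidate), the toy data, and the hyperparameters `reg`, `lr`, and `epochs` are all hypothetical.

```python
def train_linear_svm(features, labels, reg=0.01, lr=0.01, epochs=10000):
    """Minimize reg/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).

    features: list of feature vectors; labels: +1 (relevant) or -1 (irrelevant).
    reg controls the trade-off between margin size and training error.
    Returns (w, b).
    """
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(features, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 if the pair is predicted relevant, otherwise -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical (visual, tag, GPS) similarity triples with relevance labels.
X = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.7, 0.7, 0.8],
     [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1]]
y = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(X, y)
```

The learned `weights` play the role of per-modality importance, and extending the model with an additional feature only requires appending one more value to each feature vector.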
Supervised Multimodal Approach (2/2)
SVMs: Multimodal 2 (MM2)
Experimental Results (1/5)
100,000 images of Paris from Flickr.
2500 SIFT / 50,000 images.
5,500 tags / 50,000 images.
Maximum of two images per user.
Avoids similar images taken by the same photographer.
100 query images, with the top-ranked 9 images retrieved for each
How to judge whether an image is relevant:
Half of the 72 annotators must consider the image relevant
Experimental Results (2/5)
Result
900 retrieved images
VS: 305 relevant images
TS: 218 relevant images
VS+TS: 308 relevant images
MM1: 641
GPS coordinates.
MM2: accuracy of 72% and MAP of 0.78
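The counts on this slide can be turned into rates with a short sketch. It assumes that accuracy means the fraction of the 900 retrieved images judged relevant, and it uses one common definition of average precision (mean of precision at each relevant rank); the slides do not spell out either definition, and the function names are illustrative.

```python
def precision(relevant, retrieved):
    """Fraction of retrieved images that were judged relevant."""
    return relevant / retrieved


def average_precision(flags):
    """Average precision for one ranked list of 0/1 relevance flags."""
    hits, total = 0, 0.0
    for rank, flag in enumerate(flags, start=1):
        if flag:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of average precision over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Relevant counts reported on the slide, out of 900 retrieved images.
reported = {"VS": 305, "TS": 218, "VS+TS": 308, "MM1": 641}
rates = {name: precision(count, 900) for name, count in reported.items()}
```

Under that assumption the baselines come out at roughly 34% (VS), 24% (TS), 34% (VS+TS) and 71% (MM1).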
Experimental Results (3/5)
Experimental Results (4/5)
Figures 4-8
Improves on the basic model when the tag annotation is not reliable
Improves diversification of the retrieval results (fewer near-identical pictures that differ only in night or day, perspective, or point of view)
Experimental Results (5/5)
Conclusion
Presented a novel way to combine visual information with tags and GPS.
Proposed a supervised machine learning approach (MM2) based on Support Vector Machines.
Results confirm that the approaches improve accuracy.
BACKUP