a unified framework for retrieving diverse social images
TRANSCRIPT
A Unified Framework for Retrieving Diverse Social Images
MAIA ZAHARIEVA1,2 AND PATRICK SCHWAB1
1Multimedia Information Systems, Faculty of Computer Science, University of Vienna, Austria 2Interactive Media Systems, Institute of Software Technology and Interactive Systems,
Vienna University of Technology, Austria
The large number of publicly shared images leads to a high number of visually similar data. In traditional search engines these images receive a similar ranking, as the only ranking criterion is image’s relevance to a query. This, in turn, produces redundant search results, where users often have to browse through several pages of similar images to get an overview of the underlying dataset. This is where the MediaEval 2014 Retrieving Diverse Social Images Task [1] and, subsequently, this project come in: We aim at the reduction of redundancy in search results by introducing diversity as a ranking criterion.
EXPERIMENTS
R A N K F E AT U R E C O M PA R I S O N M E T R I C 1 CM3x3 Euclidean 2 TFIDF Spearman
SIFT Euclidean 3 CM χ2
LBP χ2
LBP3x3 χ2
4 HOG cosine GLRLM χ2
GLRLM3x3 χ2
5 CSD cosine CN correlation
6 CN3x3 Euclidean
R A N K F E AT U R E C O M PA R I S O N M E T R I C 1 CM χ2
CM3x3 Euclidean 2 HOG cosine 3 GLRLM3x3 χ2
LBP3x3 χ2
4 GLRLM Euclidean
LBP χ2
SIFT Euclidean
5 CSD cosine
CN Euclidean CN3x3 Euclidean
6 TFIDF Euclidean
R U N R E L E VA N C E R A N K I N G I M A G E C L U S T E R I N G
run1 (V) CM3x3 SIFT
run2 (T) TFIDF TFIDF run3 (VT) TFIDF CSD run5 (V) CM3x3 CSD
R U N C R @ 2 0 P @ 2 0 F 1 @ 2 0 run1 0.4426 0.7600 0.5552 run2 0.4132 0.7250 0.5188 run3 0.4484 0.7567 0.5559 run5 0.4369 0.7617 0.5499
R U N C R @ 2 0 P @ 2 0 F 1 @ 2 0 run1 0.3901 0.6646 0.4863 run2 0.3909 0.6809 0.4888 run3 0.3982 0.6732 0.4949 run5 0.3915 0.6752 0.4897
Table 1: Feature-metric combination ranking for relevance ranking.
Table 5: Evaluation results on the test dataset.
RESULTS
Table 3: The features used in the runs. V – visual, T – textual, VT – visual and textual
Table 4: Evaluation results on the development dataset.
Table 2: Feature-metric combination ranking for image clustering.
• We submitted 4 different runs for the final evaluation (Table 3 shows the configurations).
• The combination of visual and textual information achieved the best performances.
• However, the clustering recall (CR) remained relatively low due to the large number of irrelevant images.
We conducted a thorough evaluation of the employed feature-metric combinations at the two main stages of our approach: relevance ranking and image clustering.
REFERENCES
INTRODUCTION
APPROACH
CONCLUSION
B. Ionescu, A. Popescu, M. Lupu, A. L. Gînscâ, and H. Müller. Retrieving diverse social images at mediaeval 2014: Challenges, datasets, and evaluation. In MediaEval 2014 Multimedia Challenge Workshop, 2014.
[1]
We employ a multi-stage approach for the retrieval of diverse social images. The workflow passes the following three main stages: Relevance ranking of input images.
In this stage we evaluate how relevant each image is with respect to a given query. We compare every image to a set of representative images by computing the distance between their correspon-ding feature vectors. For the given dataset, we use the provided Wikipedia images as reference images. Following, we assign the smallest distance to any of the reference images as its relevance score. Image clustering for diversification.
The image clustering stage aims at the identifica-tion of groups of similar images that can be used to diversify the final retrieval results. The clustering algorithm is up to choice. For our purposes, Adaptive Hierarchical Clustering (AHC) showed the best performance. It’s important to note that the features and the distance metric used in this stage can, and optimally should, differ from the ones employed in the relevance ranking stage. Final image selection.
The final stage combines the results of the previous steps to facilitate a result set that is both relevant and diverse according to the initial image set. We use a Round-Robin algorithm that chooses the most relevant images from alternating clusters.
Although, there are significant differences in the performances of single features, the top perform-ing features prove to be highly interchangeable. Achieved results indicate that - for the given data-sets - the crucial part of the process is not so much the diversification but more the assessment of image relevance. In general, the achieved results outline the limitations of the available textual (and visual) information in assessing image relevance. A possible approach to improve the results may consider occasionally available GPS data and employ external resources as additional source for information.
Image clustering Relevance ranking
1 2 3 …!
Final image selection
Figure 1: Schematic overview of the proposed approach. Images from: http://google.com/search?q=eiffel+tower,
accessed July 2nd 2014
ACKNOWLEDGMENT This work has been partly funded by the Vienna Science and Technology Fund (WWTF) through project ICT12-010.