A Unified Framework for Retrieving Diverse Social Images

Maia Zaharieva¹,² and Patrick Schwab¹

¹ Multimedia Information Systems, Faculty of Computer Science, University of Vienna, Austria
² Interactive Media Systems, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Austria

INTRODUCTION

The large number of publicly shared images leads to a high number of visually similar items. In traditional search engines these images receive a similar ranking, as the only ranking criterion is the image's relevance to a query. This, in turn, produces redundant search results, where users often have to browse through several pages of similar images to get an overview of the underlying dataset. This is where the MediaEval 2014 Retrieving Diverse Social Images Task [1] and, subsequently, this project come in: we aim to reduce redundancy in search results by introducing diversity as a ranking criterion.

EXPERIMENTS

We conducted a thorough evaluation of the employed feature-metric combinations at the two main stages of our approach: relevance ranking and image clustering. Tables 1 and 2 show the resulting feature-metric combination rankings for the two stages; a sketch of the comparison metrics follows the tables.

Table 1: Feature-metric combination ranking for relevance ranking.
Rank 1: CM3x3 (Euclidean)
Rank 2: TFIDF (Spearman), SIFT (Euclidean)
Rank 3: CM (χ²), LBP (χ²), LBP3x3 (χ²)
Rank 4: HOG (cosine), GLRLM (χ²), GLRLM3x3 (χ²)
Rank 5: CSD (cosine), CN (correlation)
Rank 6: CN3x3 (Euclidean)

Table 2: Feature-metric combination ranking for image clustering.
Rank 1: CM (χ²), CM3x3 (Euclidean)
Rank 2: HOG (cosine)
Rank 3: GLRLM3x3 (χ²), LBP3x3 (χ²)
Rank 4: GLRLM (Euclidean), LBP (χ²), SIFT (Euclidean)
Rank 5: CSD (cosine), CN (Euclidean), CN3x3 (Euclidean)
Rank 6: TFIDF (Euclidean)
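For concreteness, here is a small sketch of the comparison metrics listed in Tables 1 and 2. The poster does not specify the exact formulations used (e.g., the χ² variant, or how correlation is turned into a distance), so the helper names and definitions below are illustrative assumptions built on SciPy, not the original implementation:

```python
import numpy as np
from scipy.spatial.distance import euclidean, cosine
from scipy.stats import spearmanr, pearsonr

def chi_square(x, y, eps=1e-10):
    # One common chi-squared distance for histogram-like features (CM, LBP, GLRLM).
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))

def compare(x, y, metric):
    """Distance between two feature vectors under one of the metrics from Tables 1 and 2."""
    if metric == "euclidean":
        return euclidean(x, y)
    if metric == "cosine":
        return cosine(x, y)                           # 1 - cosine similarity
    if metric == "chi2":
        return chi_square(x, y)
    if metric == "spearman":
        return 1.0 - spearmanr(x, y).correlation      # rank correlation turned into a distance
    if metric == "correlation":
        return 1.0 - pearsonr(x, y)[0]                # Pearson correlation turned into a distance
    raise ValueError(f"unknown metric: {metric}")
```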

RESULTS

Table 3: The features used in the runs (V – visual, T – textual, VT – visual and textual).
RUN        RELEVANCE RANKING    IMAGE CLUSTERING
run1 (V)   CM3x3                SIFT
run2 (T)   TFIDF                TFIDF
run3 (VT)  TFIDF                CSD
run5 (V)   CM3x3                CSD

Table 4: Evaluation results on the development dataset.
RUN    CR@20    P@20     F1@20
run1   0.4426   0.7600   0.5552
run2   0.4132   0.7250   0.5188
run3   0.4484   0.7567   0.5559
run5   0.4369   0.7617   0.5499

Table 5: Evaluation results on the test dataset.
RUN    CR@20    P@20     F1@20
run1   0.3901   0.6646   0.4863
run2   0.3909   0.6809   0.4888
run3   0.3982   0.6732   0.4949
run5   0.3915   0.6752   0.4897

•  We submitted four different runs for the final evaluation (Table 3 shows the configurations).
•  The combination of visual and textual information (run3) achieved the best performance.
•  However, cluster recall (CR) remained relatively low due to the large number of irrelevant images (the computation of CR@20, P@20, and F1@20 is sketched below).
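The task metrics in Tables 4 and 5 are computed per query and then averaged over all queries. A minimal sketch of the per-query computation is given below; the exact official evaluation script may differ in details (tie handling, rounding), so treat this as an illustration rather than the task's reference implementation:

```python
def evaluate_query(result_ids, relevant_ids, cluster_of, k=20):
    """P@k, cluster recall CR@k, and F1@k for a single query.

    result_ids   -- ranked list of returned image ids (best first)
    relevant_ids -- set of ids judged relevant for the query
    cluster_of   -- ground-truth cluster label for each relevant id
    """
    top = result_ids[:k]
    relevant_in_top = [i for i in top if i in relevant_ids]

    p = len(relevant_in_top) / k                                  # precision at k
    all_clusters = {cluster_of[i] for i in relevant_ids}          # clusters annotated for the query
    found_clusters = {cluster_of[i] for i in relevant_in_top}     # clusters represented in the top k
    cr = len(found_clusters) / len(all_clusters)                  # cluster recall at k

    f1 = 0.0 if p + cr == 0 else 2 * p * cr / (p + cr)            # harmonic mean of P@k and CR@k
    return p, cr, f1
```

Because F1@20 is averaged per query, the values in Tables 4 and 5 are not simply the harmonic mean of the averaged CR@20 and P@20 columns.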

APPROACH

We employ a multi-stage approach for the retrieval of diverse social images. The workflow passes through the following three main stages (Figure 1):

Relevance ranking of input images. In this stage we evaluate how relevant each image is with respect to a given query. We compare every image to a set of representative images by computing the distance between their corresponding feature vectors. For the given dataset, we use the provided Wikipedia images as reference images. Each image is then assigned the smallest distance to any of the reference images as its relevance score.
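A minimal sketch of this scoring step, assuming the feature vectors have already been extracted and using Euclidean distance as an example (the runs pair different features with different metrics, see Table 1); the function and variable names are illustrative, not the original implementation:

```python
import numpy as np

def relevance_scores(candidate_features, reference_features):
    """Assign each candidate image the distance to its closest reference
    (Wikipedia) image; a smaller score means higher estimated relevance."""
    scores = []
    for img in candidate_features:                    # one feature vector per candidate image
        dists = [np.linalg.norm(img - ref)            # Euclidean distance to each reference image
                 for ref in reference_features]
        scores.append(min(dists))                     # keep only the smallest distance
    return np.array(scores)

# Relevance ranking: most relevant (smallest distance) first.
# ranking = np.argsort(relevance_scores(candidates, wikipedia_refs))
```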

Image clustering for diversification. The image clustering stage aims at the identification of groups of similar images that can be used to diversify the final retrieval results. The choice of clustering algorithm is open; for our purposes, Adaptive Hierarchical Clustering (AHC) showed the best performance. It is important to note that the features and the distance metric used in this stage can, and ideally should, differ from the ones employed in the relevance ranking stage.
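The poster names Adaptive Hierarchical Clustering (AHC) without giving details, so the sketch below substitutes SciPy's standard agglomerative clustering with a fixed distance cut; the metric and threshold are placeholder assumptions and, as noted above, should be chosen independently of the relevance-ranking stage:

```python
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_images(features, metric="cosine", threshold=0.5):
    """Group similar images; each resulting cluster is one candidate 'view'
    used later to diversify the final result list."""
    condensed = pdist(features, metric=metric)        # pairwise distances with a stage-specific metric
    tree = linkage(condensed, method="average")       # bottom-up (agglomerative) clustering
    labels = fcluster(tree, t=threshold, criterion="distance")  # cut the dendrogram into flat clusters
    return labels
```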

Final image selection. The final stage combines the results of the previous steps to produce a result set that is both relevant and diverse with respect to the initial image set. We use a round-robin algorithm that chooses the most relevant images from alternating clusters.
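A sketch of the round-robin selection, assuming `ranked_ids` is the relevance-ordered list from the first stage and `cluster_of` maps each image id to its cluster from the second stage (both names are illustrative):

```python
def round_robin_select(ranked_ids, cluster_of, k=20):
    """Cycle over the clusters and repeatedly take each cluster's most
    relevant remaining image until k images have been selected."""
    # Group the relevance-ranked ids by cluster; dict insertion order equals
    # the order of each cluster's best-ranked image.
    queues = {}
    for img_id in ranked_ids:                          # most relevant first
        queues.setdefault(cluster_of[img_id], []).append(img_id)

    selected = []
    while len(selected) < k and any(queues.values()):
        for members in queues.values():                # one pass = at most one image per cluster
            if members and len(selected) < k:
                selected.append(members.pop(0))        # that cluster's best remaining image
    return selected
```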

Figure 1: Schematic overview of the proposed approach: (1) relevance ranking, (2) image clustering, (3) final image selection. Images from http://google.com/search?q=eiffel+tower, accessed July 2, 2014.

CONCLUSION

Although there are significant differences in the performance of individual features, the top-performing features prove to be highly interchangeable. The achieved results indicate that, for the given datasets, the crucial part of the process is not so much the diversification but rather the assessment of image relevance. In general, the results outline the limitations of the available textual (and visual) information for assessing image relevance. A possible way to improve the results is to consider the occasionally available GPS data and to employ external resources as an additional source of information.

REFERENCES

[1] B. Ionescu, A. Popescu, M. Lupu, A. L. Gînscă, and H. Müller. Retrieving diverse social images at MediaEval 2014: Challenges, datasets, and evaluation. In MediaEval 2014 Multimedia Challenge Workshop, 2014.

ACKNOWLEDGMENT

This work has been partly funded by the Vienna Science and Technology Fund (WWTF) through project ICT12-010.