clustering and retrieval of social events in flickr

Clustering and Retrieval of Social Events in FlickrMaia Zaharieva1,2 Daniel Schopfhauser1 Manfred Del Fabro3 Matthias Zeppelzauer4

1Interactive Media Systems Group, Vienna University of Technology, Austria2Multimedia Information Systems Group, University of Vienna, Austria3Distributed Multimedia Systems Group, Klagenfurt University, Austria

4Institute of Creative Media Technologies, St. Pölten Univ. of Applied Sciences, Austria

This work has been partly funded by the Vienna Science and Technology Fund (WWTF) through project ICT12-010 and by the Carinthian Economic Promotion Fund (KWF) under grant KWF-20214/22573/33955.

!Contact: Dr. Maia Zaharieva, maia.zaharieva@{tuwien|univie}.ac.at

Development queries Test queriesR P F1 R P F1

Run 1 0.4656 0.8990 0.5367 0.2242 0.4570 0.2287Run 2 0.5052 0.8974 0.6192 0.2365 0.3268 0.2109Run 3 0.4770 0.4391 0.3838 0.4057 0.4203 0.2877

EXPERIMENTS IIConfigurationRun 1:‣ No query expansion‣ Neglect events without geo-coordinates‣ Use event-type modelsRun 2:‣ Run 1 + query expansionRun 3 (unsupervised):‣ No query expansion‣ Consider all events‣ No event-type models

Results

EVENT RETRIEVAL APPROACH

Lessons Learned‣ Unsupervised approach performs best‣Query expansion reduces precision‣ Event-type models do not help

Queries8 development queries:‣music, community, conferences, …10 test queries:‣music, community, sport, theatre, …

Dataset# photos: 362,578# events: 17,834% photos with GPS data: 20.12%% events with GPS data: 41.30%

Development set Test setF1 NMI F1 NMI

Run 1 0.9356 0.9873 0.9476 0.9886Run 2 0.9343 0.9872 0.9466 0.9884Run 3 0.9178 0.9840 0.9407 0.9872Run 4 0.9159 0.9836 0.9404 0.9871Run 5 0.9098 0.9822 0.9386 0.9866

Results

EXPERIMENTS IConfigurationRun 1: Text-based clustering + TERun 2: Text-based clustering + LTE2Run 3: Location-based clustering, LTE1Run 4: Location-based clustering, LTE2Run 5: Time-based clustering, TE

EVENT CLUSTERING APPROACHDatasetDevelopment set: # photos: 362,578# events: 17,834% photos with GPS data: 20.12%% events with GPS data: 41.30%

Test set: # photos: 110,541

Lessons Learned‣ High generalization ability‣ Capture time, user, and location information achieve impressive results‣Overall, existing metadata information achieve robust results

Event clustering

Event retrieval

Image collection

Event 1

Event 2

Event N

...Event 1

...

Event M

Query

Result

MOTIVATIONBackground‣ Immense daily growth of publicly available photos depicting social events‣ Increasing need for efficient algorithms that are able to mine large photo collections (e.g. Flickr)

Research Questions‣ Can we perform reliable event clustering using solely available metadata?‣ Supervised vs. unsupervised event retrieval?‣Generalization ability of the employed approaches?

Fundamental Principles‣ Simple but robust heuristics / decision criteria ‣Minimize number of parameters‣Minimize assumptions on dataset

photos

user name

capture time

GPS data

user-generated textual descriptions

Time-based clustering

generate single-item

event clusters

single-item event clusters

(SE)

merge SEA & SEB iff- user(SEA)=user(SEB)- �t(SEA,SEB) < tht

time-based event clusters (per user)

(TE)

merge TEA & TEB iff- min(�t(TEA,TEB))< tht- min(�l(TEA,TEB)) < thl

location-time-based event clusters

(LTE1)

Location-based clustering

Step 1: estimate a representative location per cluster, LStep 2: merge TEA & TEB iff - min(�t(TEA,TEB))< tht - �l(LA,LB) < thl

location-time-based event clusters

(LTE2)

Approach II

Text-based clustering

XOR

Step 1: textual features extraction

Step 2: merge LTEA & LTEB iff - Intersection(TDA,TDB)<thTD - Intersection(TAA, TAB)<thTA - temporal and/or location constraints apply

term dictionaries(TD)

LDA topic assignments (TA)

Approach I

event clusters

test queries

development queries(ground truth)

GeoTagging: city, country, venue

textual features extraction: TF-IDF

query expansion: WordNet synsets

event type models- topic extraction- one class SVMs

Feature extraction Matching: hard constraints

temporal constraints

geo constraints (country)

geo constraints (city)

Matching: soft constraints

matching venue?

textual similarity

event-type score

candidate clusters

weighting

adaptive thresholding

result clusters

clustering and retrieval of social events in flickr

Software