how ‘how’ reflects what’s what: content-based exploitation of how users frame social images

Post on 27-Jun-2015

DESCRIPTION

Our presentation in the High Risk / High Reward track at the ACM MM 2014 conference. We present a novel way to tackle large-scale image classification or retrieval.

TRANSCRIPT

How ‘How’ Reflects What’s What:

Content-based Exploitation of How Users Frame Social Images

Michael Riegler, Simula Research Laboratory, Norway
Martha Larson, Delft University of Technology, Netherlands
Mathias Lux, University of Klagenfurt, Austria
Christoph Kofler, Delft University of Technology, Netherlands

We will introduce a signal that exists in every image collection and gives you an enormous speedup!

Take Home Message

❖ Photographers use intentional frames.

❖ The frames reflect the semantic categories of images.

❖ In turn, global image features reflect the frames.

❖ This motivates a fast and simple approach to image semantics.

❖ Take home a strong inner feeling that you want to try it out yourself!

But what is intentional framing?

❖ You may think that you already know it, and that it is called:

❖ Concepts or…

❖ Scenes

❖ But wrong!

❖ And let me tell you, it is also not:

❖ Composition

❖ Also wrong!

The Definition

“Intentional framing is the sum of choices made by photographers on exactly how to portray the subject matter that they have decided to photograph.”

Picture source: https://www.flickr.com/photos/ausnap/5712791522/in/photostream/

Mechanics of Intentional Framing

[Diagram: three elements, “the photographer’s intent”, “semantic category of an image” and “global image features”, connected by three arrows, each labelled “reflects”.]

Time for examples…

Hypothesis

❖ Photographers’ choices.

❖ Even if framing is not a conscious decision, it still is an unconscious one.

❖ Similar intents for taking images lead to similar framings.

❖ Global features can capture these intentional semantics.

The Exploration Experiments…

Global Features and Intent

❖ Global features connect semantics and intent.

❖ Show that there is solid evidence for intentional framing.

❖ Clustering experiment on two different data sets:

❖ Intent data set

❖ Fashion 10000 data set

Correlation of People’s Perception and Global Features

❖ X-means clustering

❖ Based on different global features.

❖ Features can capture different aspects (edges, colour, etc.).

❖ The density of the clusters built from global features correlated with the users’ perception of the intentional framing in them.

[Figure panels: Original, Edge, Color]

Evidence of Human Perception of Intent

[Figure: correlation matrix of intent categories (1 to 6) against global features. Black marks a positive correlation, red a negative correlation.]

Global features: CEDD, FCTH, Gabor, Tamura, Luminance Layout, Scalable Color, Opponent Histogram, Auto Color Correlogram, JPEG Coefficient, Edge Histogram, PHOG, JCD, Joint Histogram

Correlation between semantic categories and global features

Correlation of 0.56

The Application Experiments…

Content Based Classification

❖ Using intentional framing to tackle a classification problem.

❖ Simple search-based classifier (SimSea).

❖ Our submission to the ACM MM ’13 Yahoo! Large-scale Flickr-tag Image Classification Grand Challenge.

❖ Reviewers told us: It is too simple…
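The slides describe SimSea only as a simple search-based classifier, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a joint RGB histogram as the global feature (a stand-in for descriptors such as JCD), an L1 nearest-neighbour search, and a majority vote over the labels of the nearest indexed images. The names `color_histogram`, `search_classify` and `toy_image`, and the synthetic images, are hypothetical.

```python
from collections import Counter

import numpy as np


def color_histogram(image, bins=4):
    """Global feature: joint RGB histogram of the whole image, L1-normalised."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()


def search_classify(query, index_features, index_labels, k=3):
    """Label a query by majority vote among its k nearest indexed images."""
    distances = np.abs(index_features - query).sum(axis=1)  # L1 distance
    nearest = np.argsort(distances)[:k]
    votes = Counter(index_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def toy_image(rng, base_rgb, size=16):
    """Hypothetical stand-in for a photo: one base colour plus small noise."""
    noise = rng.integers(-20, 21, size=(size, size, 3))
    return np.clip(np.array(base_rgb) + noise, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
base_colors = {"sky": (30, 100, 220), "food": (220, 100, 30)}

# Build the search index: one feature vector and one label per image.
index_labels = [label for label in base_colors for _ in range(5)]
index_features = np.array(
    [color_histogram(toy_image(rng, base_colors[label])) for label in index_labels]
)

# Classify an unseen bluish image by searching the index.
query = color_histogram(toy_image(rng, base_colors["sky"]))
predicted = search_classify(query, index_features, index_labels)
print(predicted)
```

Because classification is just a lookup over precomputed feature vectors, there is no training step, which fits the "fast and simple" framing of the talk. The choice of `k` and of the distance measure here is arbitrary.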

Remember the challenge?

❖ 2 million images.

❖ 10 different semantic categories:

❖ nature, people, music, london, 2012, food, wedding, sky, beach, travel.

❖ Extremely diverse categories.

The results

iAP per category based on the development set

Category   JCD     CL      OH      PHOG
2012       0.198   0.128   0.130   0.104
beach      0.448   0.487   0.342   0.534
food       0.531   0.492   0.389   0.352
london     0.244   0.201   0.146   0.347
music      0.526   0.457   0.495   0.164
nature     0.502   0.410   0.435   0.503
people     0.264   0.227   0.244   0.105
sky        0.628   0.601   0.544   0.473
travel     0.139   0.101   0.128   0.112
wedding    0.463   0.272   0.262   0.235
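The per-category scores above can be summarised with a few lines of arithmetic. The sketch below transcribes the development-set iAP values and computes, for each global feature, the mean over the ten categories, plus the best-scoring feature per category. It assumes that a mean iAP is the unweighted average over categories; the slides do not spell out the formula, and the variable names are illustrative.

```python
# iAP per category on the development set, transcribed from the slide.
FEATURES = ["JCD", "CL", "OH", "PHOG"]
IAP = {
    "2012":    [0.198, 0.128, 0.130, 0.104],
    "beach":   [0.448, 0.487, 0.342, 0.534],
    "food":    [0.531, 0.492, 0.389, 0.352],
    "london":  [0.244, 0.201, 0.146, 0.347],
    "music":   [0.526, 0.457, 0.495, 0.164],
    "nature":  [0.502, 0.410, 0.435, 0.503],
    "people":  [0.264, 0.227, 0.244, 0.105],
    "sky":     [0.628, 0.601, 0.544, 0.473],
    "travel":  [0.139, 0.101, 0.128, 0.112],
    "wedding": [0.463, 0.272, 0.262, 0.235],
}

# Assumption: mean iAP is the unweighted average over the ten categories.
mean_iap = {
    feature: sum(row[i] for row in IAP.values()) / len(IAP)
    for i, feature in enumerate(FEATURES)
}

# Feature with the highest iAP in each category.
best_feature = {
    category: FEATURES[row.index(max(row))]
    for category, row in IAP.items()
}

for feature, value in mean_iap.items():
    print(f"{feature}: mean iAP {value:.3f}")
for category, feature in best_feature.items():
    print(f"{category}: best feature {feature}")
```

On these numbers JCD has the highest mean, while PHOG scores best for beach, london and nature, which suggests that different global features capture different framings.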

Compared to the Official Results

❖ Very good results with a very simple method.

❖ Very time efficient.

❖ Processed on a single desktop PC.

Method                MiAP
SimSea (our method)   0.391
Local 1 (SMaL [1])    0.422
Local 2 (SVM [1])     0.413
Concept 1 (HA [2])    0.37

[1] E. Mantziou, S. Papadopoulos, and Y. Kompatsiaris. Scalable Training with Approximate Incremental Laplacian Eigenmaps and PCA. In Proceedings of ACM MM ’13, pages 381–384, 2013.
[2] W. Hsu. Flickr-tag Prediction Using Multi-modal Fusion and Meta Information. In Proceedings of ACM MM ’13, pages 353–356, 2013.

Conclusion

❖ Intentional framing exists.

❖ Different framings correspond to different global features.

❖ An interesting framework for leveraging global features in classification.

❖ Fast and simple!

❖ New vista for multimedia research.

Questions? Thank you!
