flickr distance

Post on 23-Feb-2016


DESCRIPTION

ACM Multimedia 2008. Flickr Distance. Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li Microsoft Research Asia University of Science and Technology of China October 28, 2008. Multimedia Information Retrieval. Indexing. Ranking. Clustering. ……. Recommendation. Annotation. - PowerPoint PPT Presentation

TRANSCRIPT

Flickr Distance

ACM Multimedia 2008

Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li

Microsoft Research Asia

University of Science and Technology of China

October 28, 2008

Multimedia Information Retrieval

Indexing, Ranking, Clustering, ……, Recommendation, Annotation

Image Similarity/Distance

Concept Similarity/Distance

Annotation, Indexing, Ranking, Clustering, ……, Recommendation

Image Similarity/Distance

Numerous efforts have been made.

Concept Similarity/Distance

Example concepts: Olympic, Sports, Cat, Tiger, Paw

More and more used, but not well studied.

WordNet Distance

Google Distance

Tag Concurrence Distance

WordNet Distance

WordNet: 150,000 words

WordNet Distance
Quite a few methods exist to compute it in WordNet
The basic idea is to measure the length of the path between two words

Pros: Built by human experts, so close to human perception

Cons: Coverage is limited and difficult to extend
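The path-length idea can be illustrated with a small sketch. The graph below is a hypothetical toy "is-a" fragment, not real WordNet data, and `build_graph` and `path_length` are illustrative names. Actual WordNet measures differ in how they normalize the path, so this shows only the basic idea.

```python
from collections import deque

# Hypothetical toy "is-a" graph (undirected edges); not real WordNet data.
TOY_EDGES = [
    ("cat", "feline"), ("tiger", "feline"), ("feline", "carnivore"),
    ("dog", "canine"), ("canine", "carnivore"), ("carnivore", "animal"),
    ("horse", "equine"), ("donkey", "equine"), ("equine", "animal"),
]

def build_graph(edges):
    """Turn an edge list into an adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def path_length(graph, source, target):
    """Number of edges on the shortest path, or None if either word is missing or unreachable."""
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

graph = build_graph(TOY_EDGES)
print(path_length(graph, "cat", "tiger"))         # 2
print(path_length(graph, "cat", "donkey"))        # 5
print(path_length(graph, "airplane", "airport"))  # None: words outside the graph have no distance
```

The `None` result mirrors the coverage limitation: a word that the ontology does not contain has no distance at all.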

Google Distance

Normalized Google Distance (NGD)
Reflects the concurrency of two words in Web documents
Defined as: (formula not preserved in this transcript)

Pros: Easy to get and huge coverage

Cons: Only reflects concurrency in textual documents. Not really concept distance (semantic relationship)
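For reference, the NGD as commonly defined by Cilibrasi and Vitányi is NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) is the number of pages containing x, f(x, y) the number containing both, and N the total number of indexed pages. The sketch below assumes this standard form; the function name and the hit counts are made up for illustration, since the slide's own formula did not survive.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts.

    fx, fy: pages containing each word; fxy: pages containing both;
    n: total number of indexed pages (assumed larger than fx and fy).
    """
    if fx <= 0 or fy <= 0 or n <= 0:
        raise ValueError("counts must be positive")
    if fxy <= 0:
        return math.inf  # the two words never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Hypothetical counts, for illustration only.
print(normalized_google_distance(1000, 1000, 1000, 1_000_000))  # 0.0: always together
print(normalized_google_distance(1000, 1000, 10, 1_000_000))    # about 0.667
```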

Concept Pairs          Google Distance
Airplane – Dog         0.2562
Football – Soccer      0.1905
Horse – Donkey         0.2147
Airplane – Airport     0.3094
Car – Wheel            0.3146

Tag Concurrence Distance

Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07)
Reflects how frequently two tags occur in the same images
Based on the same idea as NGD
Mostly sparse (> 95% of the entries in the similarity matrix are zero)

Pros: Images are taken into account

Cons:
a) Tags are sparse, so visual concurrency is not well reflected
b) Training data is difficult to get

Figure: similarity matrix for 500 tags; similarity matrix for 50 tags
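A minimal sketch of the idea, assuming the NGD-style formula is applied to image counts instead of page counts (the exact formula used in the cited work is not given on the slide). The tag sets and all function names are hypothetical. The last function measures the sparsity the slide points out: the share of tag pairs that never appear in the same image.

```python
import math
from itertools import combinations

# Hypothetical tag sets, one per image.
images = [
    {"bear", "fur", "grass", "tree"},
    {"polar bear", "water", "sea"},
    {"polar bear", "fighting", "usa"},
    {"airplane", "airport"},
    {"airplane", "sky"},
]

def count_tags(images):
    """Count images per tag and images per (sorted) tag pair."""
    single, pair = {}, {}
    for tags in images:
        for t in tags:
            single[t] = single.get(t, 0) + 1
        for a, b in combinations(sorted(tags), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair

def tag_concurrence_distance(a, b, single, pair, n_images):
    """NGD-style distance computed from image counts (assumed form)."""
    fab = pair.get(tuple(sorted((a, b))), 0)
    if fab == 0:
        return math.inf  # the tags never share an image
    la, lb = math.log(single[a]), math.log(single[b])
    denominator = math.log(n_images) - min(la, lb)
    if denominator == 0:
        return 0.0  # both tags appear in every image
    return (max(la, lb) - math.log(fab)) / denominator

def zero_fraction(single, pair):
    """Share of distinct tag pairs with no co-occurrence at all."""
    n_tags = len(single)
    return 1 - len(pair) / (n_tags * (n_tags - 1) // 2)

single, pair = count_tags(images)
print(tag_concurrence_distance("airplane", "airport", single, pair, len(images)))
print(tag_concurrence_distance("bear", "airplane", single, pair, len(images)))  # inf
print(zero_fraction(single, pair))
```

Even in this tiny collection most tag pairs never co-occur, which is why the resulting distance is undefined (infinite here) for most pairs.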

Concept Pairs          Google Distance    Tag Concurrence Distance
Airplane – Dog         0.2562             0.8532
Football – Soccer      0.1905             0.1739
Horse – Donkey         0.2147             0.4513
Airplane – Airport     0.3094             0.1833
Car – Wheel            0.3146             0.9617

Different Concept Relationships

Synonymy: different words but the same meaning (table tennis – ping-pong)
Visually Similar: similar things or things of the same type (horse – donkey)
Meronymy: part and the whole (car – wheel)
Concurrency: exist at the same scene/place (airplane – airport)

Concept Distance

Mine from ontology: WordNet distance is good, but coverage is too low
Mine from text documents: Google distance's coverage is very high, but it is for the text domain
Mine from image tags: Image tag concurrence distance implicitly uses image information, but tags are too sparse

Can we mine concept distance from image content?

Some Facts

Semantic concept distance is based on human cognition
80% of human cognition comes from visual information
There are around 2.8 billion photos on Flickr (as of Sep 08)
On average, each Flickr image has around 8 tags

To mine concept distance from a large tagged image collection based on image content

Example image tags: (bear, fur, grass, tree); (polar bear, water, sea); (polar bear, fighting, usa)

Overview of Flickr Distance

Concept A: Airplane -> Concept Model A
Concept B: Airport -> Concept Model B
Concept Model A, Concept Model B -> Flickr Distance (A, B)

Concept Pairs          Flickr Distance
Airplane – Dog         0.5151
Football – Soccer      0.0315
Horse – Donkey         0.4231
Airplane – Airport     0.0576
Car – Wheel            0.0708

Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency

What We Need

R1: A Good Image Collection
Large
High coverage, especially on daily life
With tags

R2: A Good Concept Representation or Model
Based on image content
Can cover wider concept relationships
Can handle a large concept set

Concept Models
Discriminative: SVM, Boosting, …
Generative
Global Feature
Local Feature
w/o Spatial Relation: Bag-of-Words (pLSA, LDA), …
w/ Spatial Relation: 2D HMM, MRF, …

VLM – Visual Language Model
Spatial-relation sensitive
Efficient
Can handle object variations

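Statistical Language Model

Example sentence: "I am talking about statistical language model."

Unigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x)$

Bigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-1})$

Trigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-2} w_{x-1})$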
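The three models differ only in how much preceding context they condition on. The sketch below estimates these probabilities by simple counting (maximum likelihood, no smoothing). The two-sentence corpus is a hypothetical extension of the example sentence, and `ngram_probability` is an illustrative name.

```python
from collections import Counter

def ngram_probability(tokens, word, context=()):
    """Maximum-likelihood estimate of P(word | context).

    context is the tuple of immediately preceding words:
    () for unigram, one word for bigram, two words for trigram.
    """
    context = tuple(context)
    n = len(context) + 1
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    context_total = sum(c for gram, c in grams.items() if gram[:-1] == context)
    return grams[context + (word,)] / context_total if context_total else 0.0

# Hypothetical toy corpus built around the example sentence.
corpus = ("i am talking about statistical language model . "
          "i am talking about visual language model .").split()

print(ngram_probability(corpus, "language"))                      # unigram: 0.125
print(ngram_probability(corpus, "statistical", ("about",)))       # bigram: 0.5
print(ngram_probability(corpus, "visual", ("talking", "about")))  # trigram: 0.5
```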

Performance of VLM

Comparison on Image Categorization: Caltech 8 categories / 5097 images

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14

Latent-Topic VLM (1)

Why Latent-Topic

Latent-Topic VLM: Visual variations of a concept are taken as latent topics

$$P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C) = \sum_{k=1}^{K} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, z_k^C)\, P(z_k^C \mid d_j^C)$$

$C$: a concept
$d_j^C$: the $j$-th image in concept $C$
$z_k^C$: the $k$-th latent topic of concept $C$

Latent-Topic VLM (2)

Latent-Topic VLM Training
Solved by the EM algorithm. The objective function is to maximize the joint distribution of the concept $C$ and its visual word arrangement $A_w$:

$$\text{maximize } p(A_w, C) = \prod_{d_j^C \in C} \prod_{x,y} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C)$$

E-step: Estimate the posteriors of the hidden topics

M-step: Maximize the likelihood of visual arrangement
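The two EM steps can be sketched on a simplified model. The code below is an assumption-laden illustration, not the authors' training procedure: it drops the neighboring-word context and fits a plain latent-topic mixture P(w | d) = sum over k of P(w | z_k) P(z_k | d) to per-image visual-word counts. All names and the toy counts are hypothetical. In the full model the "word" would be a visual word conditioned on its two neighbors.

```python
import math
import random

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def em_step(counts, p_w_z, p_z_d):
    """One EM iteration. counts[d][w] = occurrences of word w in image d."""
    num_topics, num_words = len(p_w_z), len(p_w_z[0])
    new_w_z = [[1e-12] * num_words for _ in range(num_topics)]  # tiny floor avoids zeros
    new_z_d = [[1e-12] * num_topics for _ in counts]
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n == 0:
                continue
            # E-step: posterior of each hidden topic for this (image, word) pair
            posterior = normalize([p_w_z[k][w] * p_z_d[d][k] for k in range(num_topics)])
            # Accumulate expected counts used by the M-step
            for k in range(num_topics):
                new_w_z[k][w] += n * posterior[k]
                new_z_d[d][k] += n * posterior[k]
    # M-step: re-estimate both distributions from the expected counts
    return [normalize(r) for r in new_w_z], [normalize(r) for r in new_z_d]

def log_likelihood(counts, p_w_z, p_z_d):
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                mix = sum(p_w_z[k][w] * p_z_d[d][k] for k in range(len(p_w_z)))
                total += n * math.log(mix)
    return total

def train(counts, num_topics, iterations=50, seed=0):
    rng = random.Random(seed)
    num_words = len(counts[0])
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(num_words)]) for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(num_topics)]) for _ in counts]
    history = []
    for _ in range(iterations):
        p_w_z, p_z_d = em_step(counts, p_w_z, p_z_d)
        history.append(log_likelihood(counts, p_w_z, p_z_d))
    return p_w_z, p_z_d, history

# Hypothetical counts: 4 images, 4 visual words.
counts = [[5, 4, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 1, 4, 6]]
p_w_z, p_z_d, history = train(counts, num_topics=2)
print(history[0], history[-1])  # the log-likelihood should not decrease
```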

Performance of LT-VLM

Comparison on Image Categorization: Caltech 8 categories / 5097 images

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14
LT-VLM        94              0.24

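Flickr Distance

Kullback–Leibler (KL) divergence (topic distance)
Good, but not symmetric

$$D_{KL}(P_{z_i^{C_1}} \,|\, P_{z_j^{C_2}}) = \sum_{l} P_{z_i^{C_1}}(l) \log \frac{P_{z_i^{C_1}}(l)}{P_{z_j^{C_2}}(l)}$$

Jensen–Shannon (JS) divergence (topic distance)
Better, as it is symmetric
And the square root of JS divergence is a metric, so is Flickr Distance

$$D_{JS}(P_{z_i^{C_1}} \,|\, P_{z_j^{C_2}}) = \frac{1}{2} D_{KL}(P_{z_i^{C_1}} \,|\, M) + \frac{1}{2} D_{KL}(P_{z_j^{C_2}} \,|\, M), \qquad M = \frac{1}{2}\left(P_{z_i^{C_1}} + P_{z_j^{C_2}}\right)$$

Flickr Distance (concept distance)

$$D_{Flickr}(C_1, C_2) = \sum_{i=1}^{K} \sum_{j=1}^{K} P(z_i^{C_1} \mid C_1)\, P(z_j^{C_2} \mid C_2)\, D_{JS}(P_{z_i^{C_1}} \,|\, P_{z_j^{C_2}})$$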
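A minimal sketch of these definitions, assuming each latent topic is given as a discrete distribution over the same set of trigrams and each concept supplies topic weights P(z | C). The function names and the toy distributions are hypothetical. The concept distance here is the weighted sum of pairwise JS divergences as transcribed on the slide; whether the original applies a square root at this step cannot be recovered from the transcript.

```python
import math

def kl_divergence(p, q):
    """D_KL(p | q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric topic distance: average KL divergence to the midpoint M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def flickr_distance(topics_a, weights_a, topics_b, weights_b):
    """Weighted sum of JS divergences over all topic pairs of two concepts.

    topics_x[i]: trigram distribution of latent topic i of concept x
    weights_x[i]: P(z_i | C_x)
    """
    return sum(
        wa * wb * js_divergence(pa, pb)
        for pa, wa in zip(topics_a, weights_a)
        for pb, wb in zip(topics_b, weights_b)
    )

# Hypothetical concepts with two latent topics over three trigrams.
topics_a, weights_a = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]], [0.6, 0.4]
topics_b, weights_b = [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]], [0.5, 0.5]

print(kl_divergence(topics_a[0], topics_b[0]), kl_divergence(topics_b[0], topics_a[0]))
print(js_divergence(topics_a[0], topics_b[0]), js_divergence(topics_b[0], topics_a[0]))
print(flickr_distance(topics_a, weights_a, topics_b, weights_b))
```

The first print shows KL changing when its arguments are swapped, while JS in the second print does not, which is the reason the slide prefers JS.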

Procedure of Flickr Distance

Concept A: Airplane -> Tag search in Flickr -> LT-VLM -> Concept Model A
Concept B: Airport -> Tag search in Flickr -> LT-VLM -> Concept Model B
Concept Model A, Concept Model B -> Jensen-Shannon Divergence -> Flickr Distance (A, B)

Experiments

Evaluation: Objective evaluation; Subjective evaluation

Applications: Concept clustering; Image annotation; Tag recommendation

Experiments - Configurations

Images: 6,400,000 from Flickr

Concepts: 130,000,000 different tags; 10,000,000 filtered tags; 1,000 randomly-selected tags

Comparison: Normalized Google Distance (NGD); Tag Concurrence Distance (TCD); Flickr Distance (FD)

Eva1: Subjective Evaluation

Ground-Truth
12 persons are asked to score the semantic correlation of each concept pair
Average scores are taken as ground-truth

Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

Figure: Correct Rate for NGD, TCD, and FD (y-axis from 0.47 to 0.57; bar values not preserved)
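The two steps amount to counting how often a distance orders two concept pairs the same way as the ground truth. The sketch below is one way to compute such a correct rate. It assumes the ground truth has already been expressed as a distance (lower means more related), and the function name and all numbers are hypothetical.

```python
from itertools import combinations

def correct_rate(distance, ground_truth):
    """Fraction of pairs of concept pairs ordered consistently with ground truth.

    Both arguments map a concept pair, e.g. ("a", "b"), to a distance.
    Comparisons where the ground truth is tied are skipped.
    """
    pairs = sorted(set(distance) & set(ground_truth))
    agree = total = 0
    for p, q in combinations(pairs, 2):
        truth_gap = ground_truth[p] - ground_truth[q]
        if truth_gap == 0:
            continue
        total += 1
        if (distance[p] - distance[q]) * truth_gap > 0:
            agree += 1
    return agree / total if total else 0.0

# Hypothetical distances for three concept pairs.
distance = {("a", "b"): 0.2, ("c", "d"): 0.5, ("e", "f"): 0.4}
ground_truth = {("a", "b"): 0.1, ("c", "d"): 0.3, ("e", "f"): 0.6}
print(correct_rate(distance, ground_truth))  # two of the three orderings agree
```

Only the order matters, not the values, so distances on different scales can be compared against the same ground truth.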

Eva2: Objective Evaluation

Ground-Truth
WordNet Distance
Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

Figure: Correct Rate for NGD, TCD, and FD (y-axis from 0.45 to 0.54; bar values not preserved)

App1: Concept Clustering

Concept Clustering: 23 concepts; 3 groups – (1) outer space, (2) animal and (3) sports

Normalized Google Distance
Group 1: bears, horses, moon, space
Group 2: bowling, dolphin, donkey, Saturn, sharks, snake, softball, spiders, turtle, Venus, whale, wolf
Group 3: baseball, basketball, football, golf, soccer, tennis, volleyball

Tag Concurrence Distance
Group 1: moon, space, Venus, whale
Group 2: baseball, donkey, softball, wolf
Group 3: basketball, bears, bowling, dolphin, football, golf, horses, Saturn, sharks, soccer, spiders, tennis, turtle, volleyball

Flickr Distance
Group 1: moon, Saturn, space, Venus
Group 2: bears, dolphin, donkey, golf, horses, sharks, spiders, tennis, whale, wolf
Group 3: baseball, basketball, football, snake, soccer, bowling, softball, volleyball

App2: Image Annotation

Based on an approach using concept relations: Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007)
On 79 concepts / 79,000 images

Figure: Total number of correct keywords (y-axis, 0 to 1200) among the first N words (x-axis positions 1 to 4), for NGD-DCMRM, TC-DCMRM, and FD-DCMRM. Bar values as extracted: 55, 212, 212, 301; 53, 186, 193, 310; 57, 354, 423, 960

App3: Tag Recommendation

To Improve Tagging Quality
Eliminating tag incompleteness, noise, and ambiguity
500 images / 10 recommended tags per image

Precision @ 10
NGD: 0.652
Tag Concurrence Distance: 0.665
Flickr Distance: 0.758
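Precision @ 10 is the share of the ten recommended tags that are judged correct, averaged over the test images. A minimal sketch, with hypothetical tags and relevance judgments (the slide does not say how relevance was judged):

```python
def precision_at_k(recommended, relevant, k=10):
    """Share of the top-k recommended tags that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for tag in recommended[:k] if tag in relevant) / k

def mean_precision_at_k(all_recommended, all_relevant, k=10):
    """Average precision@k over a collection of images."""
    scores = [precision_at_k(rec, rel, k) for rec, rel in zip(all_recommended, all_relevant)]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical recommendation for one image: 10 tags, 7 judged relevant.
recommended = ["bear", "fur", "grass", "tree", "zoo", "car", "snow", "wild", "sky", "usa"]
relevant = {"bear", "fur", "grass", "tree", "zoo", "snow", "wild"}
print(precision_at_k(recommended, relevant))  # 0.7
```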

Summary

Flickr Distance

A novel approach to discover semantic relationships from image content
based on real-life images from the Web
based on collective intelligence from grassroots

A distance more consistent with human perception
A measurement more effective in many applications

Future Work

Flickr Distance as a Service.

Thank You

Backup

TagNet

TagNet – Visual Concept Net

Can be used in many applications
Knowledge representation
Concept learning
Multimedia retrieval
...

$$G = \langle V, E, W \rangle$$

$v \in V$: concept (node)
$e \in E$: semantic relationship (edge)
$w \in W$: Flickr Distance (weight)
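Such a graph can be held in a plain adjacency map. The sketch below is illustrative only: the function names, the optional `max_distance` cutoff, and the edge weights are all hypothetical and are not taken from the authors' TagNet.

```python
def build_tagnet(distances, max_distance=None):
    """Build a weighted undirected graph from {(concept_a, concept_b): distance}.

    Every concept becomes a node; an edge is kept only if its weight
    does not exceed max_distance (when a cutoff is given).
    """
    graph = {}
    for (a, b), weight in distances.items():
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if max_distance is None or weight <= max_distance:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph

def nearest_concepts(graph, concept, k=3):
    """The k neighbors with the smallest edge weight."""
    return sorted(graph[concept].items(), key=lambda item: item[1])[:k]

# Hypothetical edge weights, for illustration only.
distances = {
    ("airplane", "airport"): 0.06,
    ("airplane", "dog"): 0.52,
    ("car", "wheel"): 0.07,
    ("horse", "donkey"): 0.42,
    ("dog", "horse"): 0.35,
}
graph = build_tagnet(distances)
print(nearest_concepts(graph, "airplane", k=2))  # [('airport', 0.06), ('dog', 0.52)]
```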

TagNet

Visualization
The bigger the distance, the longer the edge
Using a tool called NetDraw, provided by the International Network for Social Network Analysis

Outline
Motivation
Overview
Visual Language Model
Flickr Distance Calculation
Evaluations and Applications

Semantic Relationship Is Important

Many efforts on using semantic relationships
GJ Qi et al. Correlative Multi-Label Video Annotation. ACM MM 2007.
R. Datta et al. Image Retrieval: Ideas, Influences and the Trends of the New Age. ACM Computing Surveys, 2008.
L. Leslie et al. Annotation of Paintings with High-Level Semantic Concepts Using Transductive Inference and Ontology-based Concept Disambiguation. ACM MM 2007.
J. Yu et al. Semantic Subspace Projection and Its Application in Image Retrieval. IEEE T CSVT 2008.

Applications of semantic relationships
Natural language processing
Object detection
Concept detection
Multimedia retrieval

Discussion

Why can VLM divergence estimate concept distance?

Why does FD work well even when tags are not complete?

Example concepts: Computer, TV, Office

room patterns, computer patterns, other patterns
room patterns, TV patterns, other patterns
room patterns, screen patterns, other patterns

VLM: distribution of trigrams

Visual Word Generation

Typical methods: SIFT + Clustering/PCA

Our method: Patch + Texture Direction Histogram + Hashing
Efficient, low-dimensional, and rotation-invariant
Only needs 1/20 of the computation of the SIFT feature

Image Patch -> Patch Gradient -> Texture Histogram -> Hashing -> Visual Word (e.g. 1 0 0 1 0 0 1 0)
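The pipeline can be sketched end to end, with the caveat that the slide does not give the details. The code below assumes an 8-bin gradient-direction histogram weighted by gradient magnitude, a circular shift to the dominant direction for rotation invariance, and a hash that sets a bit when a bin exceeds the mean. All three choices and all names are assumptions for illustration, not the authors' exact method.

```python
import math

def texture_direction_histogram(patch, bins=8):
    """Histogram of gradient directions over a grayscale patch (list of rows)."""
    hist = [0.0] * bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # horizontal central difference
            gy = patch[r + 1][c] - patch[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist

def rotate_to_dominant(hist):
    """Circularly shift the histogram so the strongest direction comes first."""
    start = hist.index(max(hist))
    return hist[start:] + hist[:start]

def hash_to_visual_word(hist):
    """Binary code: one bit per bin, set when the bin exceeds the mean (assumed hash)."""
    mean = sum(hist) / len(hist)
    return "".join("1" if h > mean else "0" for h in hist)

def visual_word(patch):
    return hash_to_visual_word(rotate_to_dominant(texture_direction_histogram(patch)))

# Two 8x8 ramps that differ only by a 90-degree rotation, plus a flat patch.
horizontal_ramp = [[c for c in range(8)] for _ in range(8)]
vertical_ramp = [[r for _ in range(8)] for r in range(8)]
flat = [[5] * 8 for _ in range(8)]

print(visual_word(horizontal_ramp))  # 10000000
print(visual_word(vertical_ramp))    # 10000000 (same word after rotation normalization)
print(visual_word(flat))             # 00000000
```

Because the code is only 8 bits, each patch maps directly to one of at most 256 visual words without a clustering step, which is consistent with the efficiency claim on the slide.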

Performance of VLM

Comparison on Image Categorization: Caltech 8 categories / 5097 images (L. Wu, et al. MIR 2007/T-MM 2008)

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14
LT-VLM        94              0.24

Eva1: Objective Evaluation

Ground-Truth
WordNet Distance
Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all concept triples (A,B,C)
Step 2: Get 6 distance pairs for each triple (consider asymmetry)
Step 3: Compute the correct ratio of each distance pair in terms of order (not value), compared with the ground-truth distance pair

Example (NGD vs. Ground-Truth, triple A, B, C): (AB, AC) x; (AB, BC) √; (AC, BC) √


Future Work

Scalability:
Large-scale testing
TagNet as a service

Other data:
“PicNet Distance” based on different dataset / Optimizing dataset
Integrating text/tag concurrency distance and Flickr Distance

Concept modeling:
Handling scale variations (multiple-resolution)
New models

More applications:
Tag ranking
Query suggestions