
Page 1: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Tom De Nies, Pedro Debevere, Davy Van Deursen, Wesley De Neve, Erik Mannens and Rik Van de Walle

Ghent University – IBBT – Multimedia Lab

MediaEval: Search and Hyperlinking, 4-5 October, Pisa, Italy

Page 2: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Our approach in a nutshell

1. Create enriched representation of videos and queries
2. Apply multiple similarity metrics
3. Merge results by late fusion

Page 3: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Enriched Data Representation

Page 4: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Enriched Data Representation

Advantages:
• Comparable queries and videos
• Extra metadata containing disambiguated concepts
• Easy conversion from video to query object
→ possible to use the same approach for Search and Linking!

Disadvantages:
• Enrichment step when ingesting data can take a while
• Only English NER tools → automatic translation step needed for other languages
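As a rough illustration of this enrichment step (a minimal sketch, not the actual pipeline: `extract_entities` below is a toy stand-in for a real NER service, and the translation callable is hypothetical):

```python
import re

def extract_entities(text):
    # Toy stand-in for a real NER service: treats capitalized word
    # sequences as candidate entities. A real system would return
    # disambiguated concepts (e.g., URIs) as extra metadata.
    return set(re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text))

def enrich(text, lang="en", translate=None):
    # Only English NER tools are available, so non-English input
    # first goes through an automatic translation step (passed in
    # here as a hypothetical callable).
    if lang != "en" and translate is not None:
        text = translate(text, lang)
    return {"text": text, "entities": extract_entities(text)}

print(enrich("Tom De Nies presented at MediaEval in Pisa."))
```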

Page 5: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


1. Create enriched representation of videos and queries

2. Apply multiple similarity metrics
3. Merge results by late fusion

Page 6: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Similarity metrics

1. “Bag of words” similarity
2. Named Entity-based similarity
3. Tag-based similarity

Page 7: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Bag of Words similarity

Pipeline: text → stop word removal → text without stop words → calculate term frequency (TF) and inverse document frequency (IDF) for each word → vector of TF-IDF weights, where:

TF(t, d) = # of occurrences of term t in document d
IDF(t) = log( |D| / |{d ∈ D : t ∈ d}| )

with D the document corpus.
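A minimal, generic sketch of these formulas in Python (not the authors' implementation; the natural log is assumed since the slide does not fix the base):

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    # docs: list of token lists, stop words already removed.
    n = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = Counter(t for doc in docs for t in set(doc))
    # One sparse {term: TF * IDF} vector per document.
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]
```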

Page 8: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Bag of Words similarity

Similarity between two documents = cosine similarity of their TF-IDF vectors
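For instance, a minimal sketch of this comparison over sparse TF-IDF vectors (generic code, not the authors'):

```python
import math

def cosine_similarity(v1, v2):
    # v1, v2: sparse {term: weight} vectors, e.g. from tf_idf_vectors().
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```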

Page 9: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Bag of Words similarity

• Both corpus & documents are taken into account
• Common words get a lower weight, to exploit unique features
• Expensive training step (IDF initialization)
• No semantics → ambiguity

Page 10: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Named Entity-based Similarity

Named Entities are extracted from the content.
Similar content will have similar entities!

Page 11: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Named Entity-based Similarity

Cosine similarity of vectors, as in Bag of Words, except:
• Words → Named Entities
• IDF → inverse support (IS)

TF-IS weights are computed analogously to TF-IDF, with IS taking the place of IDF.
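The TF-IS formula itself is not preserved here; as a hedged sketch of the idea, assuming IS(e) is an inverse-support score supplied per entity by the NER tool (so, unlike IDF, no corpus indexing is needed):

```python
from collections import Counter

def tf_is_vector(entities, inverse_support):
    # entities: list of entity identifiers extracted from one document.
    # inverse_support: assumed {entity: IS} scores from the NER tool;
    # the default of 1.0 for unseen entities is an arbitrary choice.
    tf = Counter(entities)
    return {e: count * inverse_support.get(e, 1.0) for e, count in tf.items()}
```

Two documents are then compared by taking the cosine similarity of their TF-IS vectors, exactly as in the Bag of Words case.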

Page 12: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Named Entity-based Similarity

Less entities than terms → less calculations than BoW

Named Entities are unambiguous

IDF → IS : no indexing of corpus required

Lower precision / coarser granularity than BoW

Page 13: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Tag-based similarity

• Tags expanded with synonyms (WordNet) for maximal recall
• Jaccard similarity of the expanded tag sets
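A minimal sketch of this tag comparison, using NLTK's WordNet interface for the synonym expansion (an illustration of the idea, not the authors' implementation):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def expand(tags):
    # Expand a tag set with WordNet synonyms for higher recall.
    expanded = set(tags)
    for tag in tags:
        for synset in wn.synsets(tag):
            expanded.update(l.replace("_", " ") for l in synset.lemma_names())
    return expanded

def tag_similarity(tags_a, tags_b):
    # Jaccard similarity of the synonym-expanded tag sets.
    a, b = expand(tags_a), expand(tags_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```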

Page 14: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Tag-based similarity

• Uses user-generated metadata
• Very coarse granularity / low precision
• Synonyms give higher recall

Page 15: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


1. Create enriched representation of videos and queries

2. Apply multiple similarity metrics
3. Merge results by late fusion

Page 16: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Late Fusion
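One plausible score-level fusion scheme (the exact merging strategy is not specified in these slides, so both the normalization and the weights below are assumptions):

```python
def late_fusion(runs, weights):
    # runs: one {doc_id: score} dict per similarity metric.
    # weights: one weight per metric; the values used in the actual
    # system are not given here, so these are free parameters.
    fused = {}
    for scores, w in zip(runs, weights):
        # Normalize each metric's scores to [0, 1] before combining,
        # so no single metric dominates by scale alone.
        top = max(scores.values(), default=1.0) or 1.0
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s / top
    # Final ranking: documents sorted by fused score.
    return sorted(fused, key=fused.get, reverse=True)
```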

Page 17: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Evaluation: Search

Run                    | MRR @60/30/10         | mGAP @60/30/10        | MASP @60/30/10
1 (LIMSI: BoW+NE)      | 0.188 / 0.15 / 0.117  | 0.120 / 0.089 / 0.033 | 0.066 / 0.066 / 0.061
2 (LIUM: BoW+NE)       | 0.254 / 0.187 / 0.054 | 0.140 / 0.069 / 0.033 | 0.046 / 0.046 / 0.028
3 (LIMSI: BoW+NE+Tags) | 0.165 / 0.128 / 0.094 | 0.099 / 0.069 / 0.017 | 0.061 / 0.061 / 0.057
4 (LIUM: BoW+NE+Tags)  | 0.221 / 0.154 / 0.038 | 0.115 / 0.053 / 0.017 | 0.040 / 0.041 / 0.023

Page 18: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Evaluation: Search

Unexpected:
• LIUM > LIMSI, even though LIMSI had better language detection → due to the automatic translation?
• NE + BoW > NE + BoW + Tags → tags rank false positives higher and find more results, so MRR decreases

Page 19: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Evaluation: Search

Run                    | Precision @60 | Recall @60
1 (LIMSI: BoW+NE)      | 0.056         | 0.40
2 (LIUM: BoW+NE)       | 0.061         | 0.467
3 (LIMSI: BoW+NE+Tags) | 0.054         | 0.433
4 (LIUM: BoW+NE+Tags)  | 0.059         | 0.50

Page 20: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Evaluation: Linking

Run                 | MAP (ground truth) | MAP (search results)
LIMSI (BoW+NE)      | 0.157              | 0.014
LIUM (BoW+NE)       | 0.171              | 0.040
LIMSI (BoW+NE+Tags) | 0.157              | 0.003
LIUM (BoW+NE+Tags)  | 0.171              | 0.037

Possible explanations:
• Thresholds were optimized for the Search task, not for Linking
• User-generated tags vs. extracted tags

… to be investigated!

Page 21: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Improvements / Future Work

• Better ranking criteria / late fusion
• Improve tag similarity
• Optimize parameters for linking

Page 22: Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Similarity using Named Entities


Discussion

These research activities were funded by Ghent University, IBBT, the IWT Flanders, the FWO-Flanders, and the European Union, in the context of the IBBT project Smarter Media in Flanders (SMIF).