Page 1: Concentric Semantic Snapshot

THE CONCENTRIC NATURE OF NEWS SEMANTIC SNAPSHOTS

JOSÉ LUIS REDONDO GARCIA, GIUSEPPE RIZZO, RAPHAËL TRONCY

@peputo / [email protected]

@giusepperizzo / [email protected]

@rtroncy / raphael.troncy@eurecom.fr

Page 2: Concentric Semantic Snapshot

Overview

May 1, 2023 8th International Conference on Knowledge Capture

1. Introducing the Problem: Contextualizing News Items
   o The News Semantic Snapshot (NSS)

2. Previous Work:
   o Frequency-based Functions
   o Multidimensional Relevancy Approach

3. A Concentric Model for Generating NSS

Page 4: Concentric Semantic Snapshot

The Problem: Contextualizing News

- Wolfgang Schäuble: Finance Minister
- Christian Democratic Union: Ruling Party in Germany

Page 5: Concentric Semantic Snapshot

The Problem: Contextualizing News

- Sarah Harrison: WikiLeaks Editor
- Sheremetyevo: Airport in Moscow

Page 6: Concentric Semantic Snapshot

Contextualizing News: Applications

Page 7: Concentric Semantic Snapshot

News Semantic Snapshot (NSS) [1]

[1] Redondo et al., Generating the Semantic Snapshot of Newscasts using Entity Expansion, ICWE 2015, Rotterdam.

Page 8: Concentric Semantic Snapshot

Recreating the NSS

ea eb ec ed
(1) EXPANSION: query generation, search, document retrieval…
ea eb ec ed ef eg eh ei ej ek el em
(2) SELECTION: filtering, clustering, ranking…
ea ec eh ej ek em (News Semantic Snapshot)

Page 9: Concentric Semantic Snapshot

News Semantic Snapshot: Gold Standard

Involving: experts in the news domain + users

Dimensions:
(1) Video Subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles

Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation

Page 11: Concentric Semantic Snapshot

(1) Bringing in Missing Entities: News Entity Expansion

1.a)

1.b) Parameters [1]:

Web sites to be crawled:
- Google
- L1: A set of 10 international English-speaking newspapers
- L2: A set of 3 international newspapers used in GS

Temporal Window:
- 1W
- 2W

Annotation filtering:
- Schema.org

[1] Redondo et al., Generating the Semantic Snapshot of Newscasts using Entity Expansion, ICWE 2015, Rotterdam.

Page 12: Concentric Semantic Snapshot

Recreating the NSS

ea eb ec ed: Recall (NER on Subtitles) = 0.42
(1) EXPANSION: query generation, search, document retrieval…
ea eb ec ed ef eg eh ei ej ek el em: Recall (E. Expansion) = 0.91
(2) SELECTION: filtering, clustering, ranking…
ea ec eh ej ek em (News Semantic Snapshot)
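
The two recall figures on this slide measure how much of the gold-standard NSS an entity set covers. A minimal sketch of such a set-based recall, assuming entities are compared by exact label. The labels are the placeholder names from the figure, and treating the six selected entities as the gold standard is an assumption made only for illustration, so the values do not reproduce 0.42 and 0.91:

```python
def recall(candidates, gold):
    """Fraction of gold-standard entities that appear among the candidates."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(gold & set(candidates)) / len(gold)


# Toy data: placeholder labels from the figure, not the talk's real entities.
gold_nss = {"ea", "ec", "eh", "ej", "ek", "em"}
from_subtitles = {"ea", "eb", "ec", "ed"}
after_expansion = {"ea", "eb", "ec", "ed", "ef", "eg",
                   "eh", "ei", "ej", "ek", "el", "em"}

subtitle_recall = recall(from_subtitles, gold_nss)    # 2 of 6 gold entities
expansion_recall = recall(after_expansion, gold_nss)  # 6 of 6 gold entities
```

Expansion can only add entities, so recall after expansion is never lower than recall on the subtitles alone. The cost is a longer list, which is what the selection step has to deal with.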

Page 13: Concentric Semantic Snapshot

The Selection Problem:

[Figure: entities from Entity Expansion, positions 0 to N, ranked by FIdeal(ei) to obtain the NSS and by a candidate function FX(ei); the two rankings are compared via MNDCG.]
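
To make the comparison between a candidate ranking and the ideal one concrete, here is a minimal sketch of NDCG at a cut-off k, averaged over several news items. It assumes that MNDCG stands for the mean of NDCG over the evaluated videos and that the gold standard provides a graded relevance score per entity. The function names are illustrative and not taken from the authors' code:

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gains at later positions count less."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))


def ndcg_at_k(ranked, relevance, k):
    """NDCG@k of a ranked entity list against graded relevance scores."""
    gains = [relevance.get(entity, 0.0) for entity in ranked[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(rankings, relevances, k):
    """Average NDCG@k over several news items (assumed reading of MNDCG)."""
    scores = [ndcg_at_k(r, rel, k) for r, rel in zip(rankings, relevances)]
    return sum(scores) / len(scores) if scores else 0.0
```

Because of the logarithmic discount, swapping two entities near the top changes the score far more than swapping two near the bottom.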

Page 14: Concentric Semantic Snapshot

Overview

1. Introducing the Problem: Contextualizing News Items
   o The News Semantic Snapshot (NSS)

2. Previous Work:
   o Frequency-based Function
   o Multidimensional Relevancy Approach

3. A Concentric Model for Generating NSS

Page 15: Concentric Semantic Snapshot

1º Entity Frequency, SNOW Workshop 2014 [2]

[2] Redondo et al., Describing and Contextualizing Events in TV News Show, SNOW Workshop, WWW 2014, Seoul, Korea.

Page 16: Concentric Semantic Snapshot

Frequency Based: Results

[Figure: entities from Expansion, positions 0 to N, ranked by FREQ and compared with the NSS.]

F(Laura Poitras) = 2
F(Glenn Greenwald) = 1
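
A frequency function of this kind can be sketched as a simple count over the retrieved documents. The sketch assumes each document has already been annotated with a list of entity labels and that an entity gets one vote per document. Whether the original function counts documents or individual mentions is not stated on the slide:

```python
from collections import Counter


def rank_by_frequency(documents):
    """Count in how many retrieved documents each entity is mentioned."""
    counts = Counter()
    for entities in documents:
        counts.update(set(entities))  # one vote per document
    return counts.most_common()


# Toy input chosen so that the counts match the slide's example values.
docs = [
    ["Laura Poitras", "Glenn Greenwald"],
    ["Laura Poitras"],
]
ranking = rank_by_frequency(docs)
```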

Page 17: Concentric Semantic Snapshot

May 1, 2023 15th International Conference on Web Engineering (ICWE)

Multidimensional Approach, ICWE 2015 [1]

(Fr) (FrGaussian)

[1] Redondo et al., Generating the Semantic Snapshot of Newscasts using Entity Expansion, ICWE 2015, Rotterdam.

Page 18: Concentric Semantic Snapshot

Multidimensional Approach

POPULARITY (FPOP):
- Based on Google Trends
- w = 2 months
- μ + 2*σ (2.5%)

EXPERT RULES (FEXP), example:
- [ Location, = 0.48 ]
- [ Person, = 0.74 ]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
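
The two dimensions can be sketched as small scoring helpers. Several points are assumptions rather than facts from the slides: the popularity test flags a value that exceeds the window's mean plus two standard deviations, the "< 2" rule is read as "entities seen fewer than two times get weight 0.0", and unknown entity types default to 0.0. Only the three type weights are copied from the example above:

```python
from statistics import mean, pstdev


def is_popularity_peak(history, current):
    """True if `current` exceeds mean + 2 * std of the window `history`."""
    if len(history) < 2:
        return False
    return current > mean(history) + 2 * pstdev(history)


# Weights copied from the slide's example; the rule semantics are assumed.
TYPE_WEIGHTS = {"Location": 0.48, "Person": 0.74, "Organization": 0.95}


def expert_weight(entity_type, frequency):
    """Weight by entity type; entities seen fewer than 2 times get 0.0."""
    if frequency < 2:
        return 0.0
    return TYPE_WEIGHTS.get(entity_type, 0.0)  # default is hypothetical
```

For roughly normal data, about 2.5% of values fall above μ + 2σ, which matches the percentage shown next to the threshold.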

Page 19: Concentric Semantic Snapshot

Multidimensionality: Results

- News Entity Expansion + Dimensions Generate the News Semantic Snapshot
- Best score: 0.667 in MNDCG at 10, better than BS1/2
  • Collection: CSE (Google + 2W + Schema.org)
  • Ranking:
    • Expert Rules
    • Popularity

Page 20: Concentric Semantic Snapshot

Multidimensionality: Results

[Figure: entities from Expansion ranked by FREQ + POP + EXP and compared with the NSS.]

Page 21: Concentric Semantic Snapshot

Follow up: Fine-Tuning

1. Exploit Google Relevance (+1.80%)
2. Promote Subtitle Entities (+2.50%)
3. Exploit Named Entity Extractor’s confidence (+0.20%)
4. Interpret popularity Dimension (+1.40%)
5. Performing Clustering before Filtering (-0.60%)

- NO SIGNIFICANT IMPROVEMENT -

Page 22: Concentric Semantic Snapshot

No Improvement: Why?

[Figure: NSS compared with the Original ranking (FREQ, POP, EXP) and its Re-Shuffle by Tune Function X.]

How many Dimensions?
How to combine them?

Page 23: Concentric Semantic Snapshot

Overview

1. Introducing the Problem: Contextualizing News Items
   o The News Semantic Snapshot (NSS)

2. Previous Work:
   o Frequency-based Function
   o Multidimensional Relevancy Approach

3. A Concentric Model for Generating NSS

Page 24: Concentric Semantic Snapshot

Thinking Outside the Box:

1. Is there room for improvement?
2. Is MNDCG a good measure to evaluate NSS?
3. How to significantly improve the approach?

Page 25: Concentric Semantic Snapshot

Room for Improvement?

[Figure: GAIN]

Page 26: Concentric Semantic Snapshot

Room for Improvement?

Page 27: Concentric Semantic Snapshot

How to Evaluate NSS?

MNDCG:
• Too focused on success at first positions (decay function)
• NSS intends to be flexible, ranking is application-dependent

COMPACTNESS:
• Prioritizes coverage over ranking
• Compromise between: Recall and NSS size
• Recall*: positives are weighted according to score in GT
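
The Recall* idea can be sketched as a weighted recall in which each gold-standard entity contributes in proportion to its score. This is one plausible reading of "positives are weighted according to score in GT". The exact weighting used in the evaluation is not given on the slide, and the function name is illustrative:

```python
def weighted_recall(candidates, gold_scores):
    """Recall where each gold entity counts in proportion to its GT score."""
    total = sum(gold_scores.values())
    if total == 0:
        return 0.0
    present = set(candidates)
    found = sum(score for entity, score in gold_scores.items()
                if entity in present)
    return found / total
```

With gold scores `{"a": 3.0, "b": 1.0}`, retrieving only `a` gives 0.75 while retrieving only `b` gives 0.25, so missing a highly scored entity costs more than missing a marginal one.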

Page 28: Concentric Semantic Snapshot

Compactness:

Recall: 22/33 = 0.66

[Figure: three snapshots A, B and C compared against the NSS at the same recall, with different sizes.]

Sa = 27
Sb = 33
Sc = 54

A > B > C
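
One way to read this slide is that, at a fixed recall, the snapshot that needs fewer entities is the more compact one. The sketch below follows that reading. `size_to_reach` is a hypothetical helper that finds how long a ranked list must be to reach a target recall, and the final ordering uses the three sizes from the slide:

```python
def size_to_reach(ranked, gold, target_recall):
    """Smallest prefix length of `ranked` whose recall reaches the target."""
    gold = set(gold)
    if not gold:
        return None
    found = set()
    for size, entity in enumerate(ranked, start=1):
        if entity in gold:
            found.add(entity)
        if len(found) / len(gold) >= target_recall:
            return size
    return None  # the target recall is never reached


# Sizes from the slide: all three snapshots reach the same recall.
SIZES = {"A": 27, "B": 33, "C": 54}
by_compactness = sorted(SIZES, key=SIZES.get)  # most compact first
```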

Page 29: Concentric Semantic Snapshot

Re-thinking the Approach: Concentric Snapshot

Duality in News Entity Spectrum:
• REPRESENTATIVE entities:
  • Driving the plot of the story, sometimes evident for users.
• RELEVANT entities:
  • Related to former via specific reasons

Exploit the entity semantic relations

Popular?

Suggested by Expert?

Informative?

Unexpected?

Interesting?

Highly Explicative?

Page 30: Concentric Semantic Snapshot

Hypothesis: Concentric Snapshot

CORE:
• Representative entities
• Spottable via Frequency dimensions
• High degree of cohesiveness

CRUST:
• Attached to the Core via particular relations
• Agnostic to relevancy nature: informativeness, interestingness, etc.

Page 31: Concentric Semantic Snapshot

Core Generation

a) Representative entities: Frequency Dimension

b) Cohesiveness (DBpedia)

Page 32: Concentric Semantic Snapshot

Crust Generation

The number of Web documents talking simultaneously about a particular entity e and the Core:

??
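
The co-occurrence count described on this slide can be sketched over a local toy corpus. The talk computes it over Web search results, whereas this sketch represents each document as a set of entity labels. It also assumes that "talking about the Core" means mentioning every Core entity. Both function names and the `min_hits` parameter are hypothetical:

```python
def web_presence(entity, core, documents):
    """Number of documents mentioning `entity` together with every Core entity."""
    core = set(core)
    return sum(1 for doc in documents if entity in doc and core <= set(doc))


def build_crust(candidates, core, documents, min_hits=1):
    """Candidates outside the Core, ordered by decreasing co-occurrence count."""
    scored = {e: web_presence(e, core, documents)
              for e in candidates if e not in core}
    kept = [e for e, hits in scored.items() if hits >= min_hits]
    return sorted(kept, key=lambda e: (-scored[e], e))


# Toy corpus with placeholder labels.
docs = [{"ea", "ec", "eh"}, {"ea", "ec", "ej"},
        {"ea", "eh"}, {"ea", "ec", "eh", "ek"}]
crust = build_crust(["eh", "ej", "ek", "el"], {"ea", "ec"}, docs)
```

The score never asks why an entity co-occurs with the Core, which is what lets the Crust stay agnostic to the nature of the relevancy.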

Page 33: Concentric Semantic Snapshot

Experimental Settings

CORE: (2 configurations)

1. Entity Frequency
   • Core1: Jaro-Winkler > 0.9
   • Core2: Frequency based on Exact String matching

2. Cohesiveness:
   • Everything is Connected Engine [3]
   • Skb(e1, e2) > 0.125

[3] Everything is Connected Engine: https://github.com/mmlab/eice
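
The two Core steps can be sketched as follows, with loud caveats. The standard library has no Jaro-Winkler implementation, so the fuzzy merge uses `difflib.SequenceMatcher` as a stand-in similarity with the same 0.9 threshold. The cohesiveness scores are passed in as a plain dictionary instead of being queried from the Everything is Connected Engine. Keeping an entity when it is linked to at least one other candidate above 0.125 is an assumed rule:

```python
from collections import Counter
from difflib import SequenceMatcher


def similar(a, b, threshold=0.9):
    """Stand-in for Jaro-Winkler: difflib ratio on lower-cased labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > threshold


def fuzzy_frequencies(mentions, threshold=0.9):
    """Count mentions, merging labels that nearly match an earlier label."""
    counts = Counter()
    for label in mentions:
        match = next((k for k in counts if similar(k, label, threshold)), label)
        counts[match] += 1
    return counts


def cohesive_core(candidates, skb, min_score=0.125):
    """Keep candidates linked to another candidate with score > min_score."""
    def score(a, b):
        return skb.get((a, b), skb.get((b, a), 0.0))
    return [a for a in candidates
            if any(score(a, b) > min_score for b in candidates if b != a)]
```

Exact string matching (the Core2 configuration) corresponds to replacing `fuzzy_frequencies` with a plain `Counter(mentions)`.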

Page 34: Concentric Semantic Snapshot

Experimental Settings

CRUST: (2 configurations)

1. Candidates for CRUST generation:
   • Ex1: 1° ICWE 2015 by R*(50): L2+Google, F3 1W, Gauss + POP
   • Ex2: 2° ICWE 2015 by R*(50): L2+Google, F3 1W, Freq + POP

2. Function for attaching entities to CORE:
   • SWEB(ei, Core) over Google CSE, default Configuration

Page 35: Concentric Semantic Snapshot

Experimental Settings

Projecting CORE and CRUST: (2 configurations)
• Core+Crust:
• CrustOnly:

[Figure: CORE and CRUST taken from the Expansion and projected against the NSS as Core+Crust and CrustOnly.]

Page 36: Concentric Semantic Snapshot

Experimental Settings

Baselines:

BAS01: best run in ICWE 2015 at R*(50)
BAS02: second best run in ICWE 2015 at R*(50)

FREQ POP EXP

Page 37: Concentric Semantic Snapshot

Results: Compactness

(2*2*2 + 2) Runs

Percentage decrease of 36.9% over BAS01

IdealGT: size of NSS according to Gold Standard
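
The run count can be unpacked by enumerating the configurations from the Experimental Settings slides. Reading 2*2*2 as Core variant times Crust candidate set times projection, plus the two baselines, is an assumption based on the "(2 configurations)" labels. The `percentage_decrease` helper shows the arithmetic behind a figure such as 36.9%, using made-up sizes:

```python
from itertools import product

CORES = ["Core1", "Core2"]
CRUST_CANDIDATES = ["Ex1", "Ex2"]
PROJECTIONS = ["Core+Crust", "CrustOnly"]
BASELINES = ["BAS01", "BAS02"]


def enumerate_runs():
    """All concentric configurations followed by the two baselines."""
    runs = ["/".join(combo)
            for combo in product(CORES, CRUST_CANDIDATES, PROJECTIONS)]
    return runs + BASELINES


def percentage_decrease(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


runs = enumerate_runs()  # 8 concentric runs + 2 baselines
```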

Page 38: Concentric Semantic Snapshot

Results: Recall* over N

Page 39: Concentric Semantic Snapshot

Conclusion

• News applications can benefit from the News Semantic Snapshot (NSS)

• Proposed a concentric model for generating the NSS:
  • Formalizes the duality in entities (Representative vs. Relevant)
  • Exploits the entity semantic relations between Core and Crust
  • Accommodates different relevancy dimensions in a single model via the notion of web presence (SWeb)

• Concentric model better reproduces the NSS:
  • Better Compactness: 36.9% over BAS01
  • Similar recall, smaller size

• Concentric model is easier to implement:
  • Core can be reproduced via Frequency Dimension
  • Crust brings up relevant entities without having to deal with fuzzy dimensions

Page 40: Concentric Semantic Snapshot

Future

• Extend the number of videos considered in GT: from 5 to 23 (+18), check [4] for more information

• Spot not only relationships between Crust and the Core but also predicates that characterize them:
  Editor in WikiLeaks

[4] https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation

Page 41: Concentric Semantic Snapshot

JOSÉ LUIS REDONDO GARCIA, GIUSEPPE RIZZO, RAPHAËL TRONCY

@peputo / [email protected]

@giusepperizzo / [email protected]

@rtroncy / [email protected]

http://www.slideshare.net/joseluisredondo/concentric-semantic-snapshot

Visit poster at booth: 34