Semantically Capturing and Representing News Stories on the Web
José Luis Redondo García, Jluisred.github.io, @peputo

Uploaded by jose-luis-redondo-garcia on 17-Feb-2017


TRANSCRIPT

Page 1: Semantically Capturing and Representing News Stories on the Web

José Luis Redondo García

Jluisred.github.io @peputo

Page 2: Outline

Semantic Annotation of News’ Context

Original artwork by Matt Might http://matt.might.net/articles/phd-school-in-pictures/

1. TOWARDS A SEMANTIC MULTIMEDIA WEB
   i. Media annotation
   ii. A multimedia model
   iii. Semantic media exploitation
2. CONTEXTUALIZING NEWS STORIES
   i. The News Semantic Snapshot (NSS)
   ii. The multidimensional nature of the entity relevance
   iii. A concentric model for NSS generation
   iv. NSS in the consumption of News

[Figure: diagram with the labels Previous, PHD and Future Career]

Page 3: Outline

Semantically Capturing and Representing News Stories on the Web 3

Part II: Semantic Annotation of News’ Context
- Multidimensional Relevancy
- NSS Generation
- Concentric Model
- NSS Gold Standard
- News Prototypes

2016/03/04

Page 4: The Use Case: Contextualizing News

http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8 (Media Fragment URI 1.0)

[Figure: named entities recognized over the subtitles (NE over Subtitles): Edward Snowden; Sarah Harrison, WikiLeaks editor; Sheremetyevo, airport in Moscow]
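The fragment URI above uses the temporal dimension of W3C Media Fragments URI 1.0 (`#t=start,end`, in seconds by default). A minimal Python sketch of extracting that interval, assuming only plain-seconds NPT values; the `hh:mm:ss`, SMPTE and wall-clock forms allowed by the spec are not handled (the float conversion raises `ValueError`). The function name `parse_temporal_fragment` is hypothetical:

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_temporal_fragment(uri: str) -> Tuple[float, Optional[float]]:
    """Return (start, end) in seconds for the `t=` dimension of a Media Fragment URI.

    Sketch only: plain-seconds NPT values, optionally prefixed with "npt:".
    """
    fragment = urlsplit(uri).fragment            # e.g. "t=34.1,39.8"
    params = parse_qs(fragment, keep_blank_values=True)
    if "t" not in params:
        raise ValueError(f"no temporal dimension in {uri!r}")
    value = params["t"][0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0   # omitted start: beginning of media
    end = float(end_text) if end_text else None        # omitted end: end of media
    if end is not None and end <= start:
        raise ValueError("end time must be greater than start time")
    return start, end


if __name__ == "__main__":
    print(parse_temporal_fragment(
        "http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8"))  # → (34.1, 39.8)
```

Returning `None` for an open end keeps "until the end of the media" distinct from an explicit end time, which a player or indexer can resolve once the duration is known.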

Page 5: The Use Case: Contextualizing News

Page 6: Research Questions

Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web?

Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content?

Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?

Page 7: Part 1: Towards a Semantic Multimedia Web

Research questions: Q.1, Q.2

Page 8: Bringing Multimedia to the Web

Why?

- Make video a first-class citizen of the Web
- Make video universally accessible and shareable at different granularities (segments)
- Benefit from the vast knowledge already present on the Web

Page 9: State of the Art & Related Work (Part 1)

Semantic Annotation (Named Entity, Multimodal, Expansion)
- Alfonseca, E. and Manandhar. An unsupervised method for General Named Entity Recognition and Automated Concept Discovery
- Mendes, P., Jakob, M., Garcia-Silva, A. and Bizer, C. DBpedia Spotlight: shedding light on the web of documents
- Shinyama, Y. and Sekine, S. Named entity discovery using comparable news articles
- Chang, S.-F., Manmatha, R. and Chua, T.-S. Combining text and audio-visual features in video indexing
- Wang, R. C. and Cohen, W. W. Iterative Set Expansion of Named Entities Using the Web
- Talukdar, P. P., Brants, T., Liberman, M. and Pereira, F. A Context Pattern Induction Method for Named Entity Extraction

Multimedia Modeling
- MPEG-7 http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm
- TV-Anytime http://tech.ebu.ch/tvanytime
- Synchronized Multimedia Integration Language (SMIL) https://www.w3.org/TR/REC-smil/
- Media Fragment URI 1.0 specification (W3C) http://www.w3.org/TR/media-frags
  - Synote: http://linkeddata.synote.org
  - Ninsuna: http://ninsuna.elis.ugent.be/
- BBC Programmes Ontology http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml
- Schema.org (SchemaDotOrgTV) http://www.w3.org/wiki/WebSchemas/
- Ontology for Media Resources https://www.w3.org/TR/mediaont-10/
- Web Annotation https://www.w3.org/TR/annotation-model/

Page 10: Multimedia Annotations

- Automatic annotation is needed: 300 hours of video are uploaded to YouTube every minute
- What is inside the video? A multimodal approach
- Semantic annotations leveraging Web resources enable more human-like operations

(Part 1.a)

Page 11: Multimedia Annotation: Named Entity Recognition

NERD: (1) ontology http://nerd.eurecom.fr/ontology (2) API http://nerd.eurecom.fr/api/application.wadl (3) UI http://nerd.eurecom.fr

[Figure: entities extracted from the fragment http://data.linkedtv.eu/media/e2899e7f#t=840,900 and typed with the NERD ontology: S-Bahn (nerd:Product), Obama (nerd:Person), Michelle (nerd:Person), Berlin (nerd:Location)]

Part 1.a

NERD-ML [Rizzo_LREC’14]: https://github.com/giusepperizzo/nerdml

Page 12: Multimedia Annotation: Named Entity Expansion

[Figure: a) entities from the seed document DS; other documents similar to DS; b) expanded entities]

[Redondo_SNOW’14]

Part 1.a

Page 13: Multimedia Annotation: Expansion Pipeline

[Redondo_SNOW’14]

Part 1.a

Available @ http://linkedtv.eurecom.fr/entitycontext/api/

Page 14: Multimedia Annotation: Multimodal Approach

- Text:
  - Keyword Extraction
  - Topic Recognition
  - From Textual Visual Cues to LSCOM Concepts
- Visual:
  - Visual Concept Detection (LSCOM)
  - Shot Segmentation
  - Scene Segmentation
  - Optical Character Recognition (OCR)
  - Automatic Speech Recognition (ASR)
  - Face Detection and Tracking
  - …

[Figure: Multimedia Knowledge Model]

Part 1.a

Page 15: Multimedia Model

- Explicitly represent video and its annotations
- At the level of fragments
- Based on well-known vocabularies, flexible and extensible while being Linked Data compliant

(Part 1.b)

Page 16: Multimedia Model: LinkedTV Model

[Figure: LinkedTV model.
Data: Broadcast Data; Analysis Results (support for segmentation); External Datasets.
Vocabularies: BBC Ontology + SchemaDotOrgTV; Media Fragments URI 1.0 (W3C); Ontology for Media Resources (W3C); Web Annotations (W3C); LSCOM; NERD; Ontology for Provenance Management.
Classes: Programme, Brand, Series, Episode, Version, Broadcast, Service, Broadcast Channel, MediaFragment, Scene, Shot, Face, Annotation, Concept, Keyword, Entity, Provenance.]

Part 1.b

Available @ http://data.linkedtv.eu/ontologies/core/
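To make the model concrete, the sketch below emits N-Triples for one annotation in the spirit of these slides: an `oa:Annotation` whose target is a media fragment and whose body is an entity typed with NERD. It is an illustration under assumptions: it uses only the public W3C Open Annotation, W3C Ontology for Media Resources and NERD namespaces, the annotation IRI and function names are hypothetical, and the actual LinkedTV core ontology adds classes and properties (Entity, Locator, OffsetBasedString, provenance) that are not reproduced here.

```python
from typing import List, Tuple

# Public namespaces: W3C Open Annotation, W3C Ontology for Media Resources, NERD.
OA = "http://www.w3.org/ns/oa#"
MA = "http://www.w3.org/ns/ma-ont#"
NERD = "http://nerd.eurecom.fr/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotation_triples(annotation: str, fragment: str, media: str,
                       entity: str, nerd_type: str) -> List[Tuple[str, str, str]]:
    """Triples linking a media fragment to an entity through an oa:Annotation (hypothetical helper)."""
    return [
        (fragment, RDF_TYPE, MA + "MediaFragment"),
        (fragment, MA + "isFragmentOf", media),
        (annotation, RDF_TYPE, OA + "Annotation"),
        (annotation, OA + "hasTarget", fragment),   # what is annotated
        (annotation, OA + "hasBody", entity),       # what the fragment is about
        (entity, RDF_TYPE, NERD + nerd_type),
    ]


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    triples = annotation_triples(
        annotation="http://example.org/annotation/1",   # hypothetical IRI
        fragment="http://data.linkedtv.eu/media/e2899e7f#t=840,900",
        media="http://data.linkedtv.eu/media/e2899e7f",
        entity="http://dbpedia.org/resource/Berlin",
        nerd_type="Location",
    )
    print(to_ntriples(triples))
```

Keeping the fragment as its own resource means several annotations (entities, shots, faces) can share the same `#t=` target without repeating it.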

Page 17: Multimedia Model: LinkedTV Model

Part 1.b

[Figure: LinkedTV model excerpt with MediaResource, Locator, MediaFragment, Annotation, Entity, Type, URL (hyperlink) and OffsetBasedString]

Page 18: Multimedia Model: TV2RDF Service

Part 1.b

[Figure: TV2RDF service: Content Publisher, Analysis Metadata, TV2RDF (Conversion + NERD), RDF, Triplestore]

Available @ http://linkedtv.eurecom.fr/tv2rdf/

Page 19: Exploiting Knowledge

- Leverage the model and annotations for advanced mining tasks
- Probe the value of the multimodal approach: evaluation on standard corpora

(Part 1.c)

Page 20: Exploitation: Enriching

Part 1.c

[Figure: an oa:Annotation on rbbaktuell_20120809 pointing to nerd:Location Berlin, used to illustrate the seed video] [Milicic_WWW'13]

Page 21: Exploitation: Enriching Services & Prototypes

Part 1.c

Name URL Published @

MediaCollector http://linkedtv.eurecom.fr/api/mediacollector/search/ [Rizzo_SAM’12]

MediaFinder http://mediafinder.eurecom.fr/ [Milicic_WWW’13]

Italian Elections 2013 http://mediafinder.eurecom.fr/story/elezioni2013 [Milicic_ESWC’13]

TVEnricher http://linkedtv.eurecom.fr/tvenricher/api/ [LinkedTV_D2.6’14]

TVNewsEnricher http://linkedtv.eurecom.fr/newsenricher/api/ [Redondo_ESWC’14]

Page 22: Exploitation: Classifying videos

Part 1.c

[Figure: Temporal distribution of entity types. One panel per Dailymotion channel: 1. fun, 2. tech, 3. sport, 4. news, 5. creation, 6. lifestyle, 7. shortfilms, 8. music, 9. other. x-axis: the temporal positions of NEs (0 to 1); y-axis: the number of NEs. Entity types: Thing, Amount, Animal, Event, Function, Location, Organization, Person, Product, Time.]

[Li_LIME'13] Dailymotion dataset, 805 videos, 46.58% accuracy
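The figure shows, per channel, how many named entities of each type occur at each relative position in the video. One plausible way to turn that distribution into a feature vector for channel classification is sketched below; the 10-bin histogram, the function names and the concatenated layout are assumptions, not the method of [Li_LIME'13].

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def temporal_profile(entities: List[Tuple[str, float]], duration: float,
                     bins: int = 10) -> Dict[str, List[int]]:
    """Count named entities of each type in `bins` equal slices of a video.

    `entities` holds (entity_type, timestamp_in_seconds) pairs (hypothetical input format).
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    profile: Dict[str, List[int]] = defaultdict(lambda: [0] * bins)
    for entity_type, timestamp in entities:
        if not 0 <= timestamp <= duration:
            raise ValueError(f"timestamp {timestamp} outside the video")
        position = timestamp / duration                  # relative position in [0, 1]
        index = min(int(position * bins), bins - 1)      # position 1.0 goes to the last bin
        profile[entity_type][index] += 1
    return dict(profile)


def feature_vector(profile: Dict[str, List[int]], types: List[str],
                   bins: int = 10) -> List[int]:
    """Concatenate per-type histograms in a fixed type order (missing types become zeros)."""
    vector: List[int] = []
    for entity_type in types:
        vector.extend(profile.get(entity_type, [0] * bins))
    return vector
```

With the ten types of the figure, each video becomes a 100-dimensional vector that any standard classifier could consume.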

Page 23: Exploitation: Promoting Media Fragments

Part 1.c

Available @ http://linkedtv.eurecom.fr/HyperTED

[Redondo_ISWC’14]

Page 24: Evaluation: Multimodal @ MediaEval 2013

Part 1.c

- ~1697 h of BBC video data, 2323 videos
- Different TV shows (news, sports, politics…) from 2012
- Subtitles and ASR (English)
- Output of some visual algorithms: shot and face detection

[Figure: Search Task (a text/visual (T/V) query returns videos v1, v2, v3, …, vn) and Hyperlinking Task (an anchor in video va is linked to videos v1, v2, v3, …, vn)]

Page 25: Evaluation: Multimodal @ MediaEval 2013

Part 1.c

Annotations (processing time, type):
- Visual Concept Detection (151): 20 days on 100 cores, Visual **
- Scene Segmentation: 2 days on 6 cores, Visual
- OCR: 1 day on 10 cores, Visual
- Keywords Extraction: 5 hours, Textual **
- Named Entities Extraction: 4 days, Textual
- Face Detection and Tracking: 4 days on 160 cores, Visual

Approach:
- Data indexing:
  - Lucene & Solr
  - Granularities: shots, scenes, sliding windows…
  - Multimodality
- Query formulation:
  - Search: text + visual cues + visual concept mapping (MLSCOM)
  - Hyperlinking: subtitles, keywords, LSCOM concepts (MoreLikeThis)
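Indexing at several granularities means turning a video's timed text into retrievable units. A minimal sketch of the sliding-window case, assuming subtitle cues with start/end times in seconds; the window size and step are hypothetical parameters, and each resulting segment could then be indexed as one Lucene/Solr document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Segment:
    start: float
    end: float
    text: str


def sliding_windows(cues: List[Cue], size: float, step: float) -> List[Segment]:
    """Group subtitle cues into overlapping fixed-length windows."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if not cues:
        return []
    last_end = max(cue.end for cue in cues)
    segments: List[Segment] = []
    i = 0
    while i * step < last_end:
        start = i * step              # computed from i to avoid float drift
        end = start + size
        # a cue is kept if it overlaps the window [start, end)
        text = " ".join(c.text for c in cues if c.start < end and c.end > start)
        if text:
            segments.append(Segment(start, end, text))
        i += 1
    return segments
```

A step smaller than the window size makes consecutive segments overlap, so a topic that straddles a window boundary is still fully contained in some segment.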

Page 26: Evaluation: MediaEval 2013 Results

Part 1.c

Search Task: 0.19 MRR (Mean Reciprocal Rank)

Hyperlinking Task: 0.72 P10

[Sahuguet_MediaEval’13]
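For reference, the two measures reported on this slide can be computed as follows. This is a simplified sketch: items are matched by identifier only, whereas the official MediaEval evaluation also had to decide when a returned segment matches the expected one in time, which is not modelled here.

```python
from typing import List, Sequence, Set


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]], relevant: List[Set[str]]) -> float:
    """MRR over queries; relevant[i] holds the relevant items of query i."""
    if not rankings or len(rankings) != len(relevant):
        raise ValueError("need one relevance set per query")
    return sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant)) / len(rankings)


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots holding relevant items (empty slots count as misses)."""
    return sum(1 for item in ranking[:k] if item in relevant) / k
```

MRR rewards finding the single known item early (search), while P10 rewards filling the first ten link targets with relevant segments (hyperlinking).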

Page 27: Evaluation: MediaEval 2014 Results

Part 1.c

Search Task

[Hoang_MediaEval’14]

Hyperlinking Task

- Changes in the 2014 edition:
  - New dataset from BBC: 2686 hours and 3520 videos
  - No visual cues in search queries
  - New approach: 22% MAP improvement on the 2013 dataset

0.71 P10

0.67 P10

Page 28: Narrowing down…

From Multimedia Content to News Items

Page 29: Part 2: Semantically Contextualizing News Stories

Research question: Q.3

Page 30: The Use Case: Contextualizing News

[Figure: Wolfgang Schäuble, Finance Minister; Christian Democratic Union, ruling party in Germany]

Part 2

Page 31: State of the Art & Related Work (Part 2)

Topics: Named Entities in News, Graph, Contextualizing News, Relevancy of Entities

Semantic News Annotation
- N. Fernandez, J. A. Fisteus, L. Sanchez, and G. Lopez. IdentityRank: Named entity disambiguation in the news domain.
- S. Chabra. Entity-centric summarization: Generating text summaries for graph snippets.
- A. Fuxman, P. Pantel, Y. Lv, A. Chandra, P. Chilakamarri, M. Gamon, D. Hamilton, B. Kohlmeier, D. Narayanan, E. Papalexakis, and B. Zhao. Contextual insights.
- N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions.
- N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Back to the past: Supporting interpretations of forgotten stories by time-aware re-contextualization.
- N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Time-travel translator: Automatically contextualizing news articles.
- T. Stajner, B. Thomee, A.-M. Popescu, M. Pennacchiotti, and A. Jaimes. Automatic selection of social media responses to news.

Page 32: News Semantic Snapshot (NSS)

- Definition and Motivation
- A Gold Standard of News Entities

(Part 2.a)

Page 33: The News Semantic Snapshot (NSS)

Part 2.a

What is on top: entities explicitly appearing in the documents, e.g. Edward Snowden, Laura Poitras, Anatoly Kucherena

Going deeper is always challenging.

Page 34: The News Semantic Snapshot (NSS)

Part 2.a

News Semantic Snapshot (NSS) [Redondo_ICWE’15]

Page 35: The News Semantic Snapshot: Gold Standard

Part 2.a

- High level of detail, significant human intervention (experts in the news domain + users)
- Entities in 5 dimensions (visual & text):
  (1) Video subtitles
  (2) Image in the video
  (3) Text in the video image
  (4) Suggestions of an expert
  (5) Related articles
- User survey

[Figure: video frame with dimensions (1), (2) and (3) marked; subtitle: “We don't have any extradition treaty with Russia. Broadly speaking our policy remains the same: that we'd like him returned”]

[Romero_TVX’14]

Page 36: The News Semantic Snapshot: Gold Standard

Part 2.a

Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation

25

Page 37: Automatically Generating the NSS

(Part 2.b)

- The selection problem
- Approaches: frequency-based, multidimensional, concentric
- Experiments and results

Page 38: Generating the NSS: General Method

[Figure: a) Entities from Seed Document DS, b) Expanded Entities, c) News Semantic Snapshot] [Redondo_SNOW’14]

Part 2.b

Page 39: Generating the NSS: Entity Expansion

[Figure: a) Entities from Seed Document DS, b) Expanded Entities, c) News Semantic Snapshot] [Redondo_SNOW’14]

Part 2.b

Page 40: Generating the NSS: Expansion’s Settings

Part 2.b

Parameters:
- Query:
  - Title
  - 5 W’s over subtitle entities
- Web sites to be crawled:
  - Google
  - L1: a set of 10 international English-speaking newspapers
  - L2: a set of 3 international newspapers used in the Gold Standard
- Temporal window: 1W, 2W
- Annotation filtering:
  - Schema.org

[Redondo_ICWE’15]

Page 41: Generating the NSS: Expansion’s Settings

[Figure: a) Entities DS, b) Expanded Entities, c) News Semantic Snapshot] [Redondo_SNOW’14]

Part 2.b

Recall (Entity Expansion) = 0.91
Recall (NER on Subtitles) = 0.42

Page 42: Generating the NSS: Selection

[Figure: a) Entities DS, b) Expanded Entities, c) News Semantic Snapshot]

Part 2.b

[Redondo_SNOW’14]

Page 43: Generating the NSS: The Selection problem

Part 2.b

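[Figure: the expansion returns entities ei ranked from 0 to N; FIdeal(ei) is the ranking that yields the NSS, and the question is which function FX(ei) reproduces it: FX(ei) =? FIdeal(ei)]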

Page 44: Generating the NSS: Measures

Part 2.b

1. Precision / Recall @ N
   - Popular
   - Easy to interpret
2. Mean Normalized Discounted Cumulative Gain (MNDCG) @ N
   - Considers ranking
   - Rewards relevant documents at the top positions
3. Compactness for Recall R
   - Compromise between recall and NSS size
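A sketch of recall@N and (M)NDCG@N for ranked entity lists against the Gold Standard, assuming binary relevance (an entity is either in the Gold Standard or not) and the common 1/log2(rank + 1) discount; the exact gain and discount used in the experiments may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def recall_at_n(ranking: Sequence[str], gold: Set[str], n: int) -> float:
    """Share of the Gold Standard entities found in the first n positions."""
    if not gold:
        raise ValueError("empty gold standard")
    return len(set(ranking[:n]) & gold) / len(gold)


def ndcg_at_n(ranking: Sequence[str], gold: Set[str], n: int) -> float:
    """Binary-relevance NDCG@n; assumes the ranking has no duplicate entities."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, entity in enumerate(ranking[:n], start=1) if entity in gold)
    ideal_hits = min(n, len(gold))               # best case: every top slot is a hit
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg_at_n(rankings: List[Sequence[str]], golds: List[Set[str]], n: int) -> float:
    """MNDCG@n: NDCG@n averaged over news items."""
    if not rankings or len(rankings) != len(golds):
        raise ValueError("need one gold set per ranking")
    return sum(ndcg_at_n(r, g, n) for r, g in zip(rankings, golds)) / len(rankings)
```

The logarithmic discount is what makes MNDCG favour hits at the first positions, whereas recall@N treats all N positions equally.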

Page 45: Generating the NSS: Compactness Example

Part 2.b

Recall: 22/33 ≈ 0.67

NSS sizes needed to reach this recall: Sa = 27, Sb = 33, Sc = 54

Compactness: A > B > C
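The slides describe compactness only informally, as a compromise between recall and NSS size. One way to operationalize it, assumed here rather than taken from the thesis, is to measure how long each ranking must be to reach a target recall: the shorter, the more compact. The toy rankings in the usage check are invented so that they reproduce the sizes of the example (27, 33 and 54 entities for recall 22/33); the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def size_for_recall(ranking: Sequence[str], gold: Set[str],
                    target_recall: float) -> Optional[int]:
    """Smallest prefix of `ranking` whose recall against `gold` reaches target_recall.

    Returns None when the whole ranking never reaches the target.
    """
    needed = math.ceil(round(target_recall * len(gold), 9))  # round() absorbs float noise
    if needed == 0:
        return 0
    hits = 0
    for size, entity in enumerate(ranking, start=1):
        if entity in gold:
            hits += 1
            if hits >= needed:
                return size
    return None


def rank_by_compactness(rankings: Dict[str, Sequence[str]], gold: Set[str],
                        target_recall: float) -> List[Tuple[str, int]]:
    """Order approaches from most to least compact (smaller NSS at equal recall)."""
    sizes = {name: size_for_recall(r, gold, target_recall) for name, r in rankings.items()}
    reached = [(name, size) for name, size in sizes.items() if size is not None]
    return sorted(reached, key=lambda item: item[1])
```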

Page 46: Generating the NSS: The Approaches

Part 2.b

1. Frequency-Based Ranking [Redondo_SNOW’14]
   - Leverages the biggest sample provided by the expansion
   - Prioritizes representativeness
2. Multidimensional Entity Relevance Ranking [Redondo_ICWE’15]
   - The relevancy of entities is grounded in different dimensions
3. Concentric-Based Approach [Redondo_KCAP’15A]
   - Core / Crust model
   - Alleviates the problem of dealing with many dimensions

Page 47: Generating the NSS: (1) Frequency-Based

Part 2.b

[Redondo_SNOW’14]
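Frequency-based ranking can be sketched as counting how many of the documents collected by the expansion mention each entity. Counting each entity once per document (rather than once per mention) is an assumption, and the toy input is invented so that it matches the values F(Laura Poitras) = 2 and F(Glenn Greenwald) = 1 used in Experiment 1; `frequency_ranking` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequency_ranking(documents: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Rank entities by the number of expanded documents that mention them."""
    counts: Counter = Counter()
    for entities in documents:
        counts.update(set(entities))   # each entity counts at most once per document
    # highest frequency first; ties broken alphabetically for a stable output
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    expanded_docs = [  # hypothetical entity sets extracted from three related articles
        ["Edward Snowden", "Laura Poitras"],
        ["Edward Snowden", "Laura Poitras", "Glenn Greenwald"],
        ["Edward Snowden"],
    ]
    print(frequency_ranking(expanded_docs))
```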

A

Page 48: Generating the NSS: (2) Multidimensional

Part 2.b

[Redondo_ICWE’15]

Page 49: Generating the NSS: (2) Multidimensional

Part 2.b

POPULARITY (FPOP):
- Based on Google Trends
- w = 2 months
- µ + 2*σ (2.5%)

EXPERT RULES (FEXP), example:
- [ Location, = 0.43]
- [ Person, = 0.78]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
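The popularity dimension flags entities whose Google Trends interest rises above µ + 2σ within a window w. A minimal sketch, assuming a series of daily interest values (so w = 2 months is approximated by 60 points) and reading "peak" as the latest value exceeding the threshold computed over the preceding window. The expert-rule weights are the example values from the slide; the 0.0 default for other types and all names are hypothetical.

```python
import statistics
from typing import Dict, Sequence

# Example expert-rule weights from the slide; other types fall back to a hypothetical default.
EXPERT_TYPE_WEIGHTS: Dict[str, float] = {"Location": 0.43, "Person": 0.78, "Organization": 0.95}


def has_popularity_peak(series: Sequence[float], window: int) -> bool:
    """True if the latest value exceeds mu + 2*sigma of the preceding `window` values."""
    if window < 2 or len(series) < window + 1:
        raise ValueError("need window >= 2 and at least window + 1 observations")
    history = series[-window - 1:-1]
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history)
    return series[-1] > mu + 2 * sigma


def expert_weight(entity_type: str, default: float = 0.0) -> float:
    """Expert-rule score for an entity type."""
    return EXPERT_TYPE_WEIGHTS.get(entity_type, default)
```

Under a normal distribution roughly 2.5% of values lie above µ + 2σ, which matches the percentage given on the slide.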

Page 50: Experiment 1: Frequency VS Multidimensional

Part 2.b

20 x 4 x 4 = 320 formulas

Page 51: Experiment 1: Frequency VS Multidimensional

Part 2.b

- News entity expansion & dimensions → generate NSS
- Frequency-based score: 0.473 MNDCG @ 10
- Best score: 0.698 MNDCG @ 10
  - Collection: CSE (Google + 2W + Schema.org)
  - Ranking: Expert Rules + Popularity

Multidimensional Nature of the NSS

Page 52: Experiment 1: Frequency VS Multidimensional

Part 2.b

[Figure: frequency-based ranking (FREQ) of the NSS entities, e.g. F(Laura Poitras) = 2, F(Glenn Greenwald) = 1]

Page 53: Experiment 1: Frequency VS Multidimensional

Part 2.b

[Figure: the entities from the expansion are scored as FREQ + POP + EXP = NSS]
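The multidimensional score adds the three dimensions (FREQ + POP + EXP). A sketch of such a combination, assuming max-normalization of each dimension and optional weights; both choices and the function names are illustrative, since the slides do not detail how the dimensions are normalized or weighted.

```python
from typing import Dict, List, Tuple


def max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the maximum (assumed normalization)."""
    top = max(scores.values(), default=0.0)
    return {e: (s / top if top > 0 else 0.0) for e, s in scores.items()}


def combine_dimensions(freq: Dict[str, float], pop: Dict[str, float], exp: Dict[str, float],
                       weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)
                       ) -> List[Tuple[str, float]]:
    """Rank entities by a weighted sum of normalized FREQ, POP and EXP scores."""
    dims = [max_normalize(d) for d in (freq, pop, exp)]
    entities = set().union(*dims)                 # every entity scored in any dimension
    combined = {e: sum(w * d.get(e, 0.0) for w, d in zip(weights, dims)) for e in entities}
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

Tuning the weights is one simple way to re-shuffle the original frequency ranking, which is the kind of adjustment explored in Experiment 2.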

Page 54: Experiment 2: Multidimensional ++

Part 2.b

MNDCG @ 10:
1. Exploit Google relevance (+1.80%)
2. Promote subtitle entities (+2.50%)
3. Exploit the named entity extractor’s confidence (+0.20%)
4. Interpret the popularity dimension (+1.40%)
5. Perform clustering before filtering (-0.60%)

- NO SIGNIFICANT IMPROVEMENT -

Page 55: Experiment 2: Multidimensional ++

Part 2.b

[Figure: tuning a function X over FREQ, POP and EXP re-shuffles the original ranking of the NSS]

Page 56: Re-thinking the problem: measures

Part 2.b

MNDCG:
- Too focused on success at the first positions (decay function)
- The NSS intends to be flexible; ranking is application-dependent

COMPACTNESS:
- Prioritizes coverage over ranking while minimizing the NSS size

Page 57: Re-thinking the problem: dimensions

Part 2.b

Duality in the news entity spectrum:
- Representative entities:
  - Driving the plot of the story
- Relevant entities:
  - Related to the former for specific reasons
  - Exploit the entity semantic relations

[Figure: possible reasons for relevance: Suggested by Expert? Informative? Unexpected? Interesting? Explicative?]

Page 58: Generating the NSS: (3) Concentric Approach

Part 2.b

- Core:
  - Representative entities
  - Spottable via frequency dimensions
  - High degree of cohesiveness
- Crust:
  - Attached to the Core via semantic relations
  - Agnostic to the nature of relevancy: informativeness, interestingness, etc.

[Redondo_KCAP’15A]

Page 59: Generating the NSS: (3) Core Creation

Part 2.b

a) Spot representative entities: frequency dimension
b) Cohesiveness (DBpedia)

Page 60: Generating the NSS: (3) Crust Creation

Part 2.b

SWEB(e, Core): the number of Web documents talking simultaneously about a particular entity e and the Core
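A sketch of Crust creation, assuming an in-memory corpus of entity sets as a stand-in for the Google CSE queries used in the experiments, and reading "talking simultaneously about e and the Core" as a document that mentions e and every Core entity (a looser reading would require only some Core entities). The `min_docs` threshold and all function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def s_web(entity: str, core: Set[str], corpus: List[Set[str]]) -> int:
    """Number of documents mentioning `entity` together with every Core entity."""
    return sum(1 for doc in corpus if entity in doc and core <= doc)


def build_crust(candidates: Iterable[str], core: Set[str], corpus: Iterable[Iterable[str]],
                min_docs: int = 1) -> List[Tuple[str, int]]:
    """Attach non-Core candidates to the Core, strongest co-occurrence first."""
    docs = [set(d) for d in corpus]             # each document as a set of entities
    scored = [(e, s_web(e, core, docs)) for e in candidates if e not in core]
    crust = [(e, score) for e, score in scored if score >= min_docs]
    return sorted(crust, key=lambda kv: (-kv[1], kv[0]))


def concentric_nss(core: Set[str], crust: List[Tuple[str, int]]) -> List[str]:
    """Core entities first (alphabetical), then Crust entities by attachment strength."""
    return sorted(core) + [e for e, _ in crust]
```

Because the score only measures co-occurrence with the Core, it does not need to know why an entity is relevant, which mirrors the Crust being agnostic to the nature of relevancy.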

Page 61: Experiment 3: Multidimensional VS Concentric

Part 2.b

Concentric Core:
1. Entity frequency:
   - Core1: Jaro-Winkler > 0.9
   - Core2: frequency based on exact string matching
2. Cohesiveness:
   - Everything is Connected Engine, Skb(e1, e2) > 0.125

Everything is Connected Engine: https://github.com/mmlab/eice
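Core1 groups surface forms whose Jaro-Winkler similarity exceeds 0.9 before counting frequencies. A self-contained sketch of the standard Jaro-Winkler measure (prefix scale 0.1, common prefix capped at 4 characters) and of a greedy grouping step; lowercasing, the greedy first-match strategy and the function names are assumptions, and the cohesiveness filter (Skb > 0.125 via the Everything is Connected Engine) is not covered.

```python
from typing import Dict, List


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matching characters within a window, penalized by transpositions."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # count matched characters that appear in a different order in s2
    k = transpositions = 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * scaling * (1 - sim)


def group_mentions(mentions: List[str], threshold: float = 0.9) -> Dict[str, int]:
    """Greedy grouping: a mention joins the first group whose representative is similar enough."""
    groups: Dict[str, int] = {}
    for mention in mentions:
        for representative in groups:
            if jaro_winkler(mention.lower(), representative.lower()) > threshold:
                groups[representative] += 1
                break
        else:
            groups[mention] = 1
    return groups
```

Grouping "Edward Snowden" with a misspelled "Edward Snowdon" lets both mentions count towards one Core candidate, which exact string matching (Core2) would keep apart.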

Page 62: Experiment 3: Multidimensional VS Concentric

Part 2.b

Concentric Crust:
1. Candidates for CRUST generation:
   - Ex1: 1st in ICWE 2015 by R*(50): L2+Google, F3 1W, Gauss + POP
   - Ex2: 2nd in ICWE 2015 by R*(50): L2+Google, F3 1W, Freq + POP
2. Function for attaching entities to the CORE:
   - SWEB(ei, Core) over Google CSE, default configuration

Page 63: Experiment 3: Multidimensional VS Concentric

Part 2.b

Combining CORE and CRUST: Core+Crust vs. CrustOnly

Page 64: Experiment 3: Multidimensional VS Concentric

Part 2.b

36.9% more compact than Multidimensional (decrease in NSS size)

IdealGT: size of the NSS according to the Gold Standard

(2*2*2 + 2) runs

Page 65: Experiment 3: Multidimensional VS Concentric

Part 2.b

[Figure: NSS Gold Standard, Fukushima Disaster 2013, n=22]

Page 66: Experiment 3: Multidimensional VS Concentric

Part 2.b

[Figure: Multidimensional vs. Concentric]

Page 67: NSS: A suitable model for news applications?

Part 2.b

Page 68: Consuming the Concentric NSS

(Part 2.c)

- News consumption phases
- The NSS for feeding news prototypes

Page 69: NSS Consumption: News Prototypes

Part 2.c

- … short summaries, previews, hotspots …
- … advanced graphs and diagrams, timelines, in-depth summaries …
- … second screen apps, slideshows, info-boxes …

Page 70: NSS Consumption: Consumption Phases

Part 2.c

The Before, The During, The After

Page 71: NSS Consumption: Phases VS Layers

Part 2.c

[Redondo_KCAP’15B]

Page 72: Conclusions & Future Work

- Publications
- References

Page 73: Conclusions

Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web?
a. Applied NER and NED as semantic annotation techniques in the multimedia domain
b. Developed other techniques such as Named Entity Expansion or Visual Concept Mapping
c. LinkedTV model to harmonize annotations into the Linked Data Web

Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content?
a. Exploiting multimedia semantic techniques: enriching, highlighting media fragments (hotspots), classifying videos…
b. Evaluation of multimodal approaches via MediaEval 2013/2014

Page 74: Conclusions

Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?
a. Proposed the NSS model and a Gold Standard
b. The multidimensional nature of the entity relevance:
   - Gaussian function, popularity, expert rules…
c. The concentric model better reproduces the NSS:
   - Better compactness: 36.9% over BAS01 (similar recall, smaller size)
   - Core/Crust brings up relevant entities without having to deal with fuzzy dimensions
d. The NSS better supports the news consumption phases (Before, During, After)

Page 75: Future Work

- [S] Publish the generated NSS on the Web (Linked Data)
- [S] Extend the Gold Standard:
  - From 5 to 23 videos, concentric-based model for candidate selection
  - Submission to TOIS
- [S] Not depending on “big players” for retrieving knowledge during the expansion phase (Terrier VS Google experiments)

Page 76: Semantically Capturing and Representing News Stories on the Web

Future Work

•  [M] Use crowdsourcing for Gold Standard creation:
    •  Increase the size of the Gold Standard without involving experts
    •  Consider different levels of entity relevance

•  [M] Supervised techniques: Learning to Rank
    •  Entity features: surface forms, URLs, types…
    •  Features from documents, sources, and other provenance information
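The slide does not say which Learning to Rank algorithm would be used. As one simple, well-known instance of the pairwise approach, the sketch below trains a linear scoring function with perceptron updates on pairs in which one candidate entity is labeled more relevant than another. The three features per entity (normalized mention frequency, whether a Linked Data URI was found, normalized number of sources), the relevance labels, and all names and data are hypothetical stand-ins for the entity and provenance features listed above.

```python
from itertools import product

def score(w, x):
    """Linear ranking score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(items, epochs=20, lr=0.1):
    """Pairwise perceptron. items: list of (feature_vector, relevance_label)."""
    w = [0.0] * len(items[0][0])
    # Every ordered pair where the first item is strictly more relevant.
    pairs = [(a[0], b[0]) for a, b in product(items, items) if a[1] > b[1]]
    for _ in range(epochs):
        mistakes = 0
        for xa, xb in pairs:
            diff = [p - q for p, q in zip(xa, xb)]
            # Update only when the preferred entity does not score strictly higher.
            if score(w, diff) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # all training pairs are ordered correctly
            break
    return w

def rank(w, candidates):
    """candidates: dict entity -> feature_vector; most relevant first."""
    return sorted(candidates, key=lambda name: score(w, candidates[name]), reverse=True)

# Hypothetical features: [mention frequency, has Linked Data URI, number of sources]
training = [
    ([0.9, 1, 0.8], 2),  # highly relevant
    ([0.6, 1, 0.5], 1),  # somewhat relevant
    ([0.2, 0, 0.1], 0),  # not relevant
    ([0.4, 0, 0.3], 0),  # not relevant
]
weights = train_pairwise(training)
candidates = {
    "BBC": [0.3, 0, 0.2],
    "Edward_Snowden": [0.8, 1, 0.9],
    "Sheremetyevo": [0.5, 1, 0.4],
}
print(rank(weights, candidates))  # ['Edward_Snowden', 'Sheremetyevo', 'BBC']
```

With this toy data a single update already orders every training pair, and the URI feature keeps a zero weight because the other two features separate the pairs; real features would need to be validated against the Gold Standard.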

Page 77: Semantically Capturing and Representing News Stories on the Web

Future Work

•  [L] Identify not only the strength of the relationships between the Crust and the Core, but also their predicates

[Figure: Generating Explanations (e.g., “Editor in WikiLeaks”) by analyzing the documents considered in Sweb]

Page 78: Semantically Capturing and Representing News Stories on the Web

Future Work

•  [L] Stop relying on “Big Players” during Crust generation:
    •  Continuous indexing
    •  Better-curated white lists
    •  Fresher structured databases: DBpedia events

•  [L] Reuse the concentric model in context-related tasks:
    •  Named Entity Extraction/Disambiguation: as another feature, similar to BagOfWords, Word2vec…
    •  Exploratory Searches: diversity, serendipity…

[Steiner_ICWE’15]

Page 79: Semantically Capturing and Representing News Stories on the Web

José Luis Redondo García

http://jluisred.github.io

@peputo

http://github.com/jluisred

“my small dent in the vast ocean of knowledge…”

Ph.D.

Questions?

Page 80: Semantically Capturing and Representing News Stories on the Web

Publications

Journals

•  Redondo Garcia J. L., Lozano-Tello A. (2012) OntoTV: An Ontology-Based System for the Management of Information about Television Content. International Journal of Semantic Computing, 6(01), 111-130.

Conferences

•  Redondo Garcia J. L., Rizzo G., Troncy R. (2015) Capturing News Stories Once, Retelling a Thousand Ways. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA.

•  Redondo Garcia J. L., Rizzo G., Troncy R. (2015) The Concentric Nature of News Semantic Snapshots: Knowledge Extraction for Semantic Annotation of News Items. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA. (Best Paper Award)

•  Redondo Garcia J. L., Rizzo G., Romero L. P., Hildebrand M., Troncy R. (2015) Generating Semantic Snapshots of Newscasts using Entity Expansion. In: 15th International Conference on Web Engineering (ICWE'15), Rotterdam, the Netherlands.

•  Rizzo G., Steiner T., Troncy R., Verborgh R., Redondo Garcia J. L., Van de Walle R. (2012) What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks. In: (ACM Multimedia) International Workshop on Socially-Aware Multimedia (SAM'12), Nara, Japan.

Journals (2), Conferences (6), Workshops (5), Demos/Posters (7)

Page 81: Semantically Capturing and Representing News Stories on the Web

References

[Redondo_KCAP’15B] Capturing News Stories Once, Retelling a Thousand Ways

[Redondo_KCAP’15A] The Concentric Nature of News Semantic Snapshots

[Redondo_ICWE’15] Generating Semantic Snapshots of Newscasts using Entity Expansion

[Redondo_ISWC’14] Finding and sharing hot spots in Web Videos

[Redondo_ESWC’14] Augmenting TV Newscasts via Entity Expansion

[Redondo_SNOW’14] Describing and Contextualizing Events in TV News Show

[LinkedTV_D2.6’14] LinkedTV Framework for Generating Video Enrichments with Annotations

[Romero_TVX’14] LinkedTV News: A dual mode second screen companion for web-enriched news broadcasts

[Hoang_MediaEval’14] LinkedTV at MediaEval 2014 Search and Hyperlinking Task

[Rizzo_LREC’14] Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web

[Li_LIMe’13] Enriching Media Fragments with Named Entities for Video Classification

[Milicic_WWW’13] Live Topic Generation from Event Streams

[Milicic_ESWC’13] Tracking and Analyzing the 2013 Italian Election

[Sahuguet_MediaEval’13] LinkedTV at MediaEval 2013 Search and Hyperlinking Task

[Rizzo_SAM’12] What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks
