semantically capturing and representing news stories on the web
TRANSCRIPT
Semantically Capturing and Representing News Stories on the Web José%Luis%Redondo%García
Jluisred.github.io @peputo
Outline
� Semantic Annotation of News’ Context
Original artwork by Matt Might http://matt.might.net/articles/phd-school-in-pictures/
� TOWARDS A SEMANTIC MULTIMEDIA WEB
i. Media annotation ii. A multimedia model iii. Semantic media
exploitation � CONTEXTUALIZING NEWS
STORIES i. The News Semantic
Snapshot (NSS) ii. The multidimensional
nature of the entity relevance
iii. A concentric model for NSS generation
iv. NSS in the consumption of News
Future Career PHD Previous
1 2
Outline
Semantically Capturing and Representing News Stories on the Web 3
� Part II: Semantic Annotation of News’
Context
Multidimensional Relevancy
NSS Generation
Concentric Model
NSS Gold Standard
News Prototypes
2016/03/04
The Use Case: Contextualizing News
Semantically Capturing and Representing News Stories on the Web 4
http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8 (Media Fragment URI 1.0)
Edward Snowden
(NE over Subtitles) Sarah Harrison
WikiLeaks Editor Airport in Moscow
Sheremetyevo
2016/03/04
Semantically Capturing and Representing News Stories on the Web 5
The Use Case: Contextualizing News
2016/03/04
Semantically Capturing and Representing News Stories on the Web 6
Research Questions
� Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web?
� Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content?
� Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?
2016/03/04
Part 1 Towards a Semantic Multimedia Web
Semantically Capturing and Representing News Stories on the Web 7
1
Q.1, Q.2
2016/03/04
“ Bringing Multimedia to the Web
Why?
Semantically Capturing and Representing News Stories on the Web 8
� Make video a first citizen of the Web
� Make video universally accessible and shareable at different granularities (segments)
� Benefit from the vast knowledge already present on the Web
2016/03/04
Semantic Annotation � Alfonseca, E. and Manandhar. An
unsupervised method for General Named Entity Recognition and Automated Concept Discovery
� Mendes, P., Jakob, M. and Garcia-Silva, A and Bizer, C. DBpedia spotlight: shedding light on the web of documents
� Shinyama, Y. and Sekine, S. Named entity discovery using comparable news articles
� Chang, S-F, Manmatha, R and Chua, T-S.
Combining text and audio-visual features in video indexing
� Wang, Richard C. and Cohen, William W. Iterative Set Expansion of Named Entities Using the Web
� Talukdar, P-P., Brants, T., Liberman, M. and Pereira, F. A. Context Pattern Induction Method for Named Entity Extraction
Multimedia Modeling � MPEG-7 http://mpeg.chiariglione.org/
standards/mpeg-7/mpeg-7.htm � TV-Anytime http://tech.ebu.ch/tvanytime � Synchronized Multimedia Integration
Language https://www.w3.org/TR/REC-smil/
� Media Fragment URI 1.0 specification (W3C) http://www.w3.org/TR/media-frags ◉ Synote: http://linkeddata.synote.org ◉ Ninsuna: http://ninsuna.elis.ugent.be/
� BBC Programmes Ontology http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml
� Schema.org (SchemaDotOrgTV) http://www.w3.org/wiki/WebSchemas/
� Ontology for Media Resources https://www.w3.org/TR/mediaont-10/
� Web Annotation https://www.w3.org/TR/annotation-model/
Semantically Capturing and Representing News Stories on the Web 9
State of the Art & Related Work Part 1
Named Entity
Multimodal
Expansion
� 2016/03/04
Multimedia Annotations
Semantically Capturing and Representing News Stories on the Web 10
� Automatic annotation: 300 hours/min YouTube video
� What is inside the video? multimodal approach
� Semantic annotations, leveraging on Web Resources: more human-like operations
1.a
2016/03/04
1 ontology http://nerd.eurecom.fr/ontology 2 API http://nerd.eurecom.fr/api/application.wadl 3 UI http://nerd.eurecom.fr
Multimedia Annotation: Named Entity Recognition
Semantically Capturing and Representing News Stories on the Web 11
nerd:Product S-Bahn
nerd:Person Obama
nerd:Person Michelle
nerd:Location Berlin
http://data.linkedtv.eu/media/e2899e7f#t=840,900
Part 1.a
https://github.com/giusepperizzo/nerdml
ML [Rizzo_LREC’14]
2016/03/04
Other documents similar to DS
b) Expanded Entities
a) Entities from Seed Document DS
Multimedia Annotation: Named Entity Expansion
Semantically Capturing and Representing News Stories on the Web 12
[Redondo_SNOW’14]
Part 1.a
2016/03/04
Multimedia Annotation: Expansion Pipeline
Semantically Capturing and Representing News Stories on the Web 13
[Redondo_SNOW’14]
Part 1.a
Available @ http://linkedtv.eurecom.fr/entitycontext/api/
2016/03/04
Multimedia Annotation: Multimodal Approach
� Text: ○ Keyword Extraction ○ Topic Recognition ○ From Textual Visual Cues to LSCOM Concepts
� Visual: ○ Visual Concept Detection (LSCOM) ○ Shot Segmentation ○ Scene Segmentation ○ Optical Character Recognition (OCR) ○ Automatic Speech Recognition (ASR) ○ Face Detection and Tracking ○ …
14
Multimedia Knowledge
Model
Part 1.a
Semantically Capturing and Representing News Stories on the Web 2016/03/04
Multimedia Model
Semantically Capturing and Representing News Stories on the Web 15
� Explicitly represent video and its annotations
� At the level of fragments
� Based on well-known vocabularies, flexible and extensible while being Linked Data compliant
1.b
2016/03/04
Multimedia Model: LinkedTV Model
Semantically Capturing and Representing News Stories on the Web 16
Annotation Concept
Keyword BBC Ontology + SchemaDotOrgTV
ANALYSIS RESULTS (Support for segmentation)
Media Fragments URI 1.0 (W3C)
LSCOM
Ontology for Media Resources (W3C)
BROADCAST DATA
Web Annotations (W3C)
EXTERNAL DATASETS
Entity
NERD
Provenance
Ontology for Provenance Management
Programme
Brand
Series
Episode
Version Broadcast
Service Broadcast Channel
Scene
Shot
MediaFragment
Face
Part 1.b
Available @ http://data.linkedtv.eu/ontologies/core/
2016/03/04
Semantically Capturing and Representing News Stories on the Web 17
Part 1.b
Locator
MediaResource
MediaFragment Annotation
Entity
URL (hyperlink)
Type
OffsetBasedString
Multimedia Model: LinkedTV Model
2016/03/04
Multimedia Model: TV2RDF Service
Semantically Capturing and Representing News Stories on the Web 18
Part 1.b
Content Publisher
RDF
Conversion + NERD
TV2RDF
Analysis Metadata
RDF
Triplestore
Available @ http://linkedtv.eurecom.fr/tv2rdf/
2016/03/04
Exploiting Knowledge
Semantically Capturing and Representing News Stories on the Web 19
� Leverage on the Model & Annotations for advanced mining tasks
� Probe the value of multimodal approach: Evaluation on standard corpora
1.c
2016/03/04
Semantically Capturing and Representing News Stories on the Web 20
Part 1.c Exploitation: Enriching
oa:Annotation
rbbaktuell_20120809
nerd:Location Berlin
Illustrate seed video [Milicic_WWW'13]
2016/03/04
Exploitation: Enriching Services & Prototypes
Semantically Capturing and Representing News Stories on the Web 21
Part 1.c
Name URL Published @
MediaCollector http://linkedtv.eurecom.fr/api/mediacollector/search/ [Rizzo_SAM’12]
MediaFinder http://mediafinder.eurecom.fr/ [Milicic_WWW’13]
Italian Elections 2013 http://mediafinder.eurecom.fr/story/elezioni2013 [Milicic_ESWC’13]
TVEnricher http://linkedtv.eurecom.fr/tvenricher/api/ [LinkedTV_D2.6’14]
TVNewsEnricher http://linkedtv.eurecom.fr/newsenricher/api/ [Redondo_ESWC’14]
2016/03/04
Exploitation: Classifying videos
Semantically Capturing and Representing News Stories on the Web 22
Part 1.c
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1.fun channel
0 17
85 85 96 106 114
78
117140
188
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
2.tech channel
0
410453
402 396 404353 364 344 374
571
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
3.sport channel
0
192
298 301 288 291 302260 270
361
231
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
4.news channel
0
527481 488 469
412 412 434 419487
792
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
5.creation channel
0
259 272245
186149
177 165 165143
205
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
6.lifestyle channel
0
1128
786 563 525 475 519 465 501 467
1567
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
7.shortfilms channel
0
169216431567156714971234121410991025
4268
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
8.music channel
0
204222
186
129166
131148 137 125
169
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
9.other channel
0
423495
451401 404
356 354 368 338
689
Thing Amount Animal Event Function Loc Organization Person Product Timex−Axis: The temporal positions of NEsy−Axis: The number of NEs
[Li_LIME'13] Dailymotion Dataset, 805 videos, 46.58% Accuracy
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1.fun channel
0 17
85 85 96 106 114
78
117140
188
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
2.tech channel
0
410453
402 396 404353 364 344 374
571
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
3.sport channel
0
192
298 301 288 291 302260 270
361
231
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
4.news channel
0
527481 488 469
412 412 434 419487
792
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
5.creation channel
0
259 272245
186149
177 165 165143
205
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
6.lifestyle channel
0
1128
786 563 525 475 519 465 501 467
1567
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
7.shortfilms channel
0
169216431567156714971234121410991025
4268
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
8.music channel
0
204222
186
129166
131148 137 125
169
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
9.other channel
0
423495
451401 404
356 354 368 338
689
Thing Amount Animal Event Function Loc Organization Person Product Timex−Axis: The temporal positions of NEsy−Axis: The number of NEs
Temporal distribution of entity types
2016/03/04
Exploitation: Promoting Media Fragments
Semantically Capturing and Representing News Stories on the Web 23
Part 1.c
Available @ http://linkedtv.eurecom.fr/HyperTED
[Redondo_ISWC’14]
2016/03/04
Evaluation: Multimodal @ Mediaeval 2013
Semantically Capturing and Representing News Stories on the Web 24
Part 1.c
~ 1697h of BBC video data, 2323 videos � Different TV shows
(news, sports, politics…) from 2012
� Subtitles and ASR (English)
� Output of some visual algorithms: shot and face detection
Anchor
Search Task Hyperlinking Task
Query
T/V
v1 v2 v3 vn v1 v2 v3 vn va
2016/03/04
Evaluation: Multimodal @ Mediaeval 2013
Semantically Capturing and Representing News Stories on the Web 25
Part 1.c
Annotations Processing Time Type
Visual Concept Detection (151) 20 days on 100 cores Visual **
Scene Segmentation 2 days on 6 cores Visual
OCR 1 day on 10 cores Visual
Keywords Extraction 5 hours Textual **
Named Entities Extraction 4 days Textual
Face detection and Tracking 4 days on 160 cores Visual
� Data Indexing: ◉ Lucene & Solr ◉ Granularities: Shot, Scenes, Sliding Windows… ◉ Multimodality
� Query Formulation: ◉ Search: Text + Visual Cues + Visual Concept
Mapping, MLSCOM ◉ Hyperlink: Subtitles, Keywords, LSCOM
concepts (MoreLikeThis)
Approach:
2016/03/04
0.19 MRR (Mean R. Rank)
Evaluation: Mediaeval 2013 Results
Semantically Capturing and Representing News Stories on the Web 26
Part 1.c
Search Task
Hyperlinking Task
[Sahuguet_MediaEval’13]
0,72 P10
2016/03/04
Evaluation: Mediaeval 2014 Results
Semantically Capturing and Representing News Stories on the Web 27
Part 1.c
Search Task
[Hoang_MediaEval’14]
Hyperlinking Task
� Changes in 2014 edition: ◉ New Dataset from BBC: 2686 hours and 3520 videos ◉ No Visual Cues on Search Queries ◉ New Approach: 22% MAP improvement in 2013 Dataset
0.71 P10
0.67 P10
2016/03/04
“ Narrowing down…
From Multimedia Content to
News Items
Semantically Capturing and Representing News Stories on the Web 28 2016/03/04
Part 2 Semantically Contextualizing News Stories
Semantically Capturing and Representing News Stories on the Web 29
2
Q.3
2016/03/04
The Use Case: Contextualizing News
Semantically Capturing and Representing News Stories on the Web 30
Wolfgang Schäuble
Finance Minister Ruling Party in Ger.
Christian Democratic Union
Part 2
2016/03/04
Semantic News Annotation � N. Fernandez, J. A. Fisteus, L. Sanchez, and G. Lopez. Identityrank: Named
entity disambiguation in the news domain.
� S. Chabra. Entity-centric summarization: Generating text summaries for graph snippets.
� A. Fuxman, P. Pantel, Y. Lv, A. Chandra, P. Chilakamarri, M. Gamon, D. Hamilton, B. Kohlmeier, D. Narayanan, E. Papalexakis, and B. Zhao. Contextual insights
� N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions.
� N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Back to the past: Supporting interpretations of forgotten stories by time-aware re-contextualization.
� N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Time-travel translator: Automatically contextualizing news articles.
� T. Stajner, B. Thomee, A.-M. Popescu, M. Pennacchiotti, and A. Jaimes. Automatic selection of social media responses to news.
Semantically Capturing and Representing News Stories on the Web 31
State of the Art & Related Work Part 2
Graph
Named Entities in News
Contextualizing News
Relevancy of Entities
2016/03/04
Semantic Snapshot of News (NSS)
Semantically Capturing and Representing News Stories on the Web 32
� Definition and Motivation
� A Gold Standard of News Entities
2.a
2016/03/04
Semantically Capturing and Representing News Stories on the Web 33
Going deep down… It is always challenging
What is on top: Entities explicitly appearing in the documents
Laura Poitras
Anatoly Kucherena
Edward Snowden
Part 2.a The News Semantic Snapshot (NSS)
2016/03/04
The News Semantic Snapshot (NSS)
Semantically Capturing and Representing News Stories on the Web 34
Part 2.a
News Semantic Snapshot (NSS) [Redondo_ICWE’15]
2016/03/04
The News Semantic Snapshot: Gold Standard
Semantically Capturing and Representing News Stories on the Web 35
Part 2.a
� High Level of detail, significant human Intervention: (Experts in the news domain + users)
� Entities in 5 Dimensions: (Visual & Text)
(1) Video Subtitles
(2) Image in the video
(4) Suggestions of an expert
(5) Related articles
USER SURVEY
“We don't have any extradition treaty with Russia. Broadly speaking our policy remains the same: that
we'd like him returned
(3) Text in the video image
(2)
(3)
(1)
[Romero_TVX’14]
2016/03/04
The News Semantic Snapshot: Gold Standard
Semantically Capturing and Representing News Stories on the Web 36
Part 2.a
Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
25
2016/03/04
Automatically Generating the NSS
Semantically Capturing and Representing News Stories on the Web 37
2.b
� The Selection problem
� Approaches: frequency-based, multidimensional, concentric
� Experiments and Results
2016/03/04
b) Expanded Entities
a) Entities from Seed Document DS
Generating the NSS: General Method
Semantically Capturing and Representing News Stories on the Web 38
[Redondo_SNOW’14]
(2)
c) News Semantic Snapshot
Part 2.b
2016/03/04
b) Expanded Entities
a) Entities from Seed Document DS
Generating the NSS: Entity Expansion
Semantically Capturing and Representing News Stories on the Web 39
[Redondo_SNOW’14]
(2)
c) News Semantic Snapshot
Part 2.b
2016/03/04
Generating the NSS: Expansion’s Settings
Semantically Capturing and Representing News Stories on the Web 40
Part 2.b
Query: - Title - 5 W’s over Subtitles Entities Web sites to be crawled: - Google - L1 : A set of 10 internationals
English speaking newspapers - L2 : A set of 3 international
newspapers used in GS Temporal Window: - 1W: - 2W:
Annotation filtering - Schema.org
[Redondo_ICWE’15]
Parameters:
2016/03/04
b) Expanded Entities
a) Entities DS
Generating the NSS: Expansion’s Settings
Semantically Capturing and Representing News Stories on the Web 41
[Redondo_SNOW’14]
(2)
c) News Semantic Snapshot
Part 2.b
Recall (E. Expansion) =
0.91
Recall (NER on Subtitles) =
0.42
2016/03/04
b) Expanded Entities
a) Entities DS
Generating the NSS: Selection
Semantically Capturing and Representing News Stories on the Web 42
(2)
c) News Semantic Snapshot
Part 2.b
[Redondo_SNOW’14]
2016/03/04
Generating the NSS: The Selection problem
Semantically Capturing and Representing News Stories on the Web 43
Part 2.b
(NSS)
0
N
FIdeal(ei)
(NSS)
FX(ei)
=?
Expansion
2016/03/04
Generating the NSS: Measures
Semantically Capturing and Representing News Stories on the Web 44
Part 2.b
1 Precision / Recall @ N - Popular - Easy to interpret
2 Mean Normalized Discounted Cumulative Gain (MNDCG) @ N:
- Considers ranking - Relevant documents at the top positions
3 Compactness for Recall R: - Compromise between: Recall and NSS size
2016/03/04
Generating the NSS: Compactness Example
Semantically Capturing and Representing News Stories on the Web 45
Part 2.b
Recall: 22/33 = 0.66
Sa = 27
Sb = 33
Sc = 54
Sa = 27
Sb = 33
Sc= 54
(NSS
)
A
B C
A
B
C
>
>
2016/03/04
Generating the NSS: The Approaches
Semantically Capturing and Representing News Stories on the Web 46
Part 2.b
1 Frequency-Based Ranking - Leverages on biggest sample provided by expansion - Prioritizes representativeness
2 Multidimensional Entity Relevance Ranking
- Relevancy of entities is ground on different dimensions
3 Concentric Based Approach - Core / Crust model - Alleviates the problem of dealing with many dimensions
[Redondo_SNOW’14]
[Redondo_ICWE’15]
[Redondo_KCAP’15A]
2016/03/04
Generating the NSS: (1) Frequency-Based
Semantically Capturing and Representing News Stories on the Web 47
Part 2.b
[Redondo_SNOW’14]
A
2016/03/04
Generating the NSS: (2) Multidimensional
Semantically Capturing and Representing News Stories on the Web 48
Part 2.b
[Redondo_ICWE2015]
2016/03/04
Semantically Capturing and Representing News Stories on the Web 49
Part 2.b
POPULARITY (FPOP) EXPERT RULES (FEXP)
49
- Based on Google Trends - w = 2 months - µ + 2*σ (2.5%)
Example: - [ Location, = 0.43] - [ Person, = 0.78] - [ Organization, = 0.95 ] - [ < 2 , = 0.0 ]
Generating the NSS: (2) Multidimensional
2016/03/04
Experiment 1: Frequency VS Multidimensional
Semantically Capturing and Representing News Stories on the Web 50
Part 2.b
20 x 4 x 4 =
320 formulas
2016/03/04
Experiment 1: Frequency VS Multidimensional
Semantically Capturing and Representing News Stories on the Web 51
Part 2.b
� News Entity Expansion & Dimensions ! Generate NSS
� Frequency-based score: 0.473 MNDCG @ 10
� Best score: 0.698 MNDCG @ 10 • Collection:
• CSE (Google + 2W + Schema.org) • Ranking:
• Expert Rules • Popularity
Multidimensional Nature of the NSS
2016/03/04
Experiment 1: Frequency VS Multidimensional
Semantically Capturing and Representing News Stories on the Web 52
Part 2.b
(NSS)
FREQ
0
(NSS
)
F(Laura Poitras) = 2
F(Glenn Greenwald) = 1
2016/03/04
Experiment 1: Frequency VS Multidimensional
Semantically Capturing and Representing News Stories on the Web 53
Part 2.b
(NSS)
(Expansion)
FREQ
POP
EXP
+
+
=
(NSS
)
2016/03/04
Experiment 2: Multidimensional ++
Semantically Capturing and Representing News Stories on the Web 54
Part 2.b
1. Exploit Google relevance (+1.80%) 2. Promote subtitle entities (+2.50%) 3. Exploit named entity extractor’s
confidence (+0.20%) 4. Interpret popularity dimension (+1.40%) 5. Performing clustering before filtering
(-0.60%) - NO SIGNIFICANT IMPROVEMENT -
NMDCG @ 10:
2016/03/04
Experiment 2: Multidimensional ++
Semantically Capturing and Representing News Stories on the Web 55
Part 2.b
Tune Function X
FREQ
POP
EXP
Re-Shuffle
Original
(NSS
)
2016/03/04
Semantically Capturing and Representing News Stories on the Web 56
Part 2.b
MNDCG: • Too focused on success at first positions (decay Function) • NSS intends to be flexible, ranking is application-dependent COMPACTNESS: • Prioritizes coverage over ranking while minimizing NSS size
Re-thinking the problem: measures
2016/03/04
Semantically Capturing and Representing News Stories on the Web 57
Part 2.b
Duality in news entity spectrum: • Representative entities:
• Driving the plot of the story • Relevant entities
• Related to former via specific reasons • Exploit the entity semantic relations
Suggested by Expert?
Informative? Unexpected?
Interesting?
Explicative?
Re-thinking the problem: dimensions
2016/03/04
Semantically Capturing and Representing News Stories on the Web 58
Part 2.b Generating the NSS: (3) Concentric Approach
� Core • Representative entities • Spottable via frequency
dimensions • High degree of
cohesiveness
� Crust • Attached to the Core via
semantic relations • Agnostic to relevancy
nature: informativeness, interestingness, etc.
[Redondo_KCAP2015A]
2016/03/04
Semantically Capturing and Representing News Stories on the Web 59
Part 2.b Generating the NSS: (3) Core Creation
a) Spot representative entities: Frequency Dimension
(NSS)
b) Cohesiveness (DBpedia)
2016/03/04
Semantically Capturing and Representing News Stories on the Web 60
Part 2.b Generating the NSS: (3) Crust Creation
The number of Web documents talking simultaneously about a particular entity e and the Core: ?
2016/03/04
Experiment 3: Multidimensional VS Concentric
Semantically Capturing and Representing News Stories on the Web 61
Part 2.b
1. Entity Frequency ○ Core1: Jaro-Winkler > 0.9 ○ Core2: Frequency based on Exact String matching
2. Cohesiveness: ○ Everything is Connected Engine, Skb(e1, e2) > 0.125
Everything is Connected Engine:
https://github.com/mmlab/eice
Concentric Core:
2016/03/04
Experiment 3: Multidimensional VS Concentric
Semantically Capturing and Representing News Stories on the Web 62
Part 2.b
1. Candidates for CRUST generation: ○ Ex1: 1° ICWE2015 by R*(50): L2+Google, F3 1W, Gauss+ POP ○ Ex2: 2° ICWE 2015 by R*(50): L2+Google, F3 1W, Freq + POP
2. Function for attaching entities to CORE: ○ SWEB(ei, Core) over Google CSE, default configuration
Concentric Crust:
2016/03/04
Experiment 3: Multidimensional VS Concentric
Semantically Capturing and Representing News Stories on the Web 63
Part 2.b
Combining CORE and CRUST:
Core+Crust CrustOnly
2016/03/04
Experiment 3: Multidimensional VS Concentric
Semantically Capturing and Representing News Stories on the Web 64
Part 2.b
36.9% more compact than Multidimensional (NSS’s size decrease)
IdealGT: size of SSN according to Gold Standard
(2*2*2 + 2) Runs
2016/03/04
Experiment 3: Multidimensional VS Concentric
Semantically Capturing and Representing News Stories on the Web 65
Part 2.b
NSS Gold Standard
Fukushima Disaster 2013
2016/03/04
n=22
Multidimensional
Concentric
Semantically Capturing and Representing News Stories on the Web 66
Part 2.b Experiment 3: Multidimensional VS Concentric
2016/03/04
Semantically Capturing and Representing News Stories on the Web 67
Part 2.b NSS: Suitable model for news applications ?
2016/03/04
Consuming the Concentric NSS
Semantically Capturing and Representing News Stories on the Web 68
2.c
� News consumption phases
� The NSS for feeding news prototypes
2016/03/04
Semantically Capturing and Representing News Stories on the Web 69
Part 2.c NSS Consumption: News Prototypes
… short summaries,
previews, hotspots …
… advanced graphs and diagrams,
timelines, in-depth summaries
…
… second screen apps, slideshows,
info-boxes …
2016/03/04
Semantically Capturing and Representing News Stories on the Web 70
Part 2.c NSS Consumption: Consumptions Phases
The Before The During The After
2016/03/04
Semantically Capturing and Representing News Stories on the Web 71
Part 2.c NSS Consumption: Phases VS Layers
[Redondo_KCAP’15B]
2016/03/04
Conclusions & Future Work
Semantically Capturing and Representing News Stories on the Web 72
� Publications
� References
2016/03/04
Semantically Capturing and Representing News Stories on the Web 73
Conclusions
a. Applied NER and NED as semantic annotation techniques in the multimedia domain
b. Developed other techniques such as Named Entity Expansion or Visual Concept Mapping
c. LinkedTV model to harmonize annotations into the Linked Data Web
Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web?
Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content?
a. Exploiting multimedia semantic techniques: enriching, highlighting media fragments (hotspots), classifying videos…
b. Evaluation of multimodal approaches via Mediaeval 2013/2014
2016/03/04
Semantically Capturing and Representing News Stories on the Web 74
Conclusions
a. Proposed the NSS model and a Gold Standard
b. The multidimensional nature of the entity relevance • Gaussian function, popularity, experts rules…
c. Concentric model better reproduces the NSS: • Better Compactness: 36.9% over BAS01 (similar recall, smaller size) • Core/Crust brings up relevant entities without having to deal with
fuzzy dimensions
d. NSS better supports the news consumption phases: (Before, During, After)
Q3: Is it possible to automatically contextualize news
stories with background information so they can be effectively interpreted by humans and machines?
2016/03/04
Semantically Capturing and Representing News Stories on the Web 75
Future Work • [S] Publish generated NSS on the Web (Linked Data) • [S] Extend the Gold Standard:
• From 5 to 23 videos, concentric based model for candidate selection • Submission to TOIS
• [S] Not depending on “big players” for retrieving knowledge during the expansion phase (Terrier VS Google experiments)
2016/03/04
Semantically Capturing and Representing News Stories on the Web 76
Future Work
• [M] Using the power of crowdsourcing in Gold Standard creation
• Increase size of the Gold Standard without involving experts
• Consider different levels of entity relevancy
• [M] Supervised techniques: Learn to Rank • Features in entities: surface forms, URL’s, types… • Features in documents, sources, and other provenance
information
2016/03/04
Semantically Capturing and Representing News Stories on the Web 77
Future Work
• [L] Spot not only the strength of the relationships between Crust and the Core, but also the predicates
Editor in WikiLeaks
Generating Explanations
analyzing documents considered in Sweb
2016/03/04
Semantically Capturing and Representing News Stories on the Web 78
Future Work
• [L] Not having to rely on “Big Players” during Crust generation:
• Continuous indexing • Better curated white lists • Fresher structured databases: DBpedia events
• [L] Reusing concentric model in context-related tasks: • Name Entity Extraction/Disambiguation
" As another feature similar to BagOfWords, Word2vec… • Exploratory Searches
" Diversity, serendipity…
++
[Steiner_ICWE’15]
2016/03/04
José Luis Redondo García
http://jluisred.github.io
@peputo
http://github.com/jluisred
“my small dent in the vast ocean of knowledge…”
Ph.D.
questions?
Semantically Capturing and Representing News Stories on the Web 80
Publications
Journals
• Redondo Garcia J. L and Adolfo Lozano-Tello: OntoTV: an Ontology Based System for the Management of Information about Television Content. International Journal of Semantic Computing, 6(01), 111-130, 2012.
Conferences • Redondo Garcia J. L., Rizzo G., Troncy R. (2015) Capturing News Stories Once, Retelling
a Thousand Ways. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA.
• Redondo Garcia J. L., Rizzo G., Troncy R. (2015) The Concentric Nature of News Semantic Snapshots: Knowledge Extraction for Semantic Annotation of News Items. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA. Best Paper Award
• Redondo Garcia J. L., Rizzo G., Romero L. P., Hildebrand M., Troncy R. (2015) Generating Semantic Snapshots of Newscasts using Entity Expansion. In: 15th International Conference on Web Engineering (ICWE'15), Rotterdam, the Netherlands.
• Rizzo G., Steiner T., Troncy R., Verborgh R., Redondo Garcia J. L. and Van de Walle R. (2012), What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks. In (ACM Multimedia) International Workshop on Socially-Aware Multimedia (SAM'12), Nara, Japan
Journals (2), Conferences (6), Workshops(5), Demo/Poster(7)
2016/03/04
Semantically Capturing and Representing News Stories on the Web 81
References [Redondo_KCAP’15B] Capturing News Stories Once, Retelling a Thousand Ways
[Redondo_KCAP’15A] The Concentric Nature of News Semantic Snapshots
[Redondo_ICWE’15] Generating Semantic Snapshots of Newscasts using Entity Expansion
[Redondo_ISWC’14] Finding and sharing hot spots in Web Videos
[Redondo_ESWC’14] Augmenting TV Newscasts via Entity Expansion
[Redondo_SNOW’14] Describing and Contextualizing Events in TV News Show
[LinkedTV_D2.6’14] LinkedTV Framework for Generating Video Enrichments with Annotations
[Romero_TVX’14] LinkedTV News: A dual mode second screen companion for web-enriched news broadcasts
[Hoang_MediaEval’14] LinkedTV at MediaEval 2014 Search and Hyperlinking Task
[Rizzo_LREC’14] Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web
[Li_LIMe'13] Enriching Media Fragments with Named Entities for Video Classification
[Milicic_WWW'13] Live Topic Generation from Event Streams
[Milicic_ESWC’13] Tracking and Analyzing The 2013 Italian Election
[Sahuguet_MediaEval’13] LinkedTV at MediaEval 2013 Search and Hyperlinking Task
[Rizzo_SAM’12] What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks
2016/03/04