Entity Enrichment and Consolidation in ARCOMEM
Elena Demidova1,
including slides by: Stefan Dietze1, Diana Maynard2, Thomas Risse1, Wim Peters2, Katerina Doka3, Yannis Stavrakas3
1 L3S Research Center, Hannover, Germany
2 University of Sheffield, UK
3 IMIS, RC ATHENA, Athens, Greece
The ARCOMEM approach
• Make use of the Social Web
  – Huge source of user-generated content
  – Wide range of articulation methods: from simple "I like it" buttons to complete articles
  – Represents the diversity of opinions of the public
• User activities are often triggered by
  – Events and related entities (e.g. sport events, celebrations, crises, news articles, persons, locations)
  – Topics (e.g. global warming, financial crisis, swine flu)
A semantic-aware and socially-driven preservation model is a natural way to go
Slide 2
The extraction components for text
Aim
• Extraction of Entities, Topics, Events and Opinions (ETOEs) from
  – Web pages
  – Social Web (Twitter, YouTube, Facebook, …)
Challenges
• Entity recognition from degraded input sources (tweets etc.)
• Advancing state-of-the-art NLP and text mining
• Dynamics detection: evolution of terms/entities
• Semantic representation of Web objects and entities
• Appropriate RDF schemas for ETOEs and Web objects
• Exploiting (Linked Open) Web data to enrich extracted ETOEs
• Entity classification (into events, locations, topics etc.) & consolidation
Slide 3
ETOE extraction with GATE: an example
Slide 4
[Figure: ETOE extraction example in GATE; label: candidate multi-word term]
Data consolidation & integration problem
Data extracted by different components, or during different processing cycles, is not aligned => consolidation, disambiguation & correlation are required.
Slide 5
<Location>Greece</Location><Person>Venizelos</Person> <Location>Griechenland</Location>
<Organisation>Greek Parliament</Organisation>
?
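The consolidation problem above can be sketched in a few lines. This is a minimal illustration, not the ARCOMEM implementation: it assumes that each extracted entity may carry a reference URI and that "Greece" and "Griechenland" were both linked to the same DBpedia resource. The URIs, the tuple layout and the `consolidate` helper are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical extraction output: (entity type, surface label, reference URI).
# The URIs are assumed for illustration; the slide does not state them.
extracted = [
    ("Location", "Greece", "http://dbpedia.org/resource/Greece"),
    ("Location", "Griechenland", "http://dbpedia.org/resource/Greece"),
    ("Person", "Venizelos", None),                # no reference found yet
    ("Organisation", "Greek Parliament", None),   # no reference found yet
]


def consolidate(entities):
    """Group labels that share the same entity type and reference URI."""
    groups = defaultdict(list)
    unresolved = []
    for etype, label, uri in entities:
        if uri is None:
            # Without a shared reference the entity cannot be aligned here.
            unresolved.append((etype, label))
        else:
            groups[(etype, uri)].append(label)
    return dict(groups), unresolved


groups, unresolved = consolidate(extracted)
for (etype, uri), labels in groups.items():
    print(etype, uri, labels)
print("unresolved:", unresolved)
```

Entities without a reference stay unresolved in this sketch; deciding how they relate to the others is exactly the open question marked by the "?" on the slide.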
Data clustering & enrichment
Enrichment of entities with related references to Linked Data, particularly reference datasets (DBpedia, Freebase, …) => use enrichments for correlation/clustering/consolidation
Slide 6
<Event>Trichet warns of systemic debt crisis</Event>
<Person>Jean Claude Trichet</Person> <Organisation>ECB</Organisation>
Enrichment for clustering & correlation: example
Slide 7
<Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
<Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
Slide 8
=> dbpprop:office
   dbpedia:President_of_the_European_Central_Bank
   dbpedia:Governor_of_the_Banque_de_France
=> dcterms:subject
   category:Living_people
   category:Karlspreis_recipients
   category:Alumni_of_the_École_Nationale_d'Administration
   category:People_from_Lyon
   …
Slide 9
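One possible way to use such enrichment properties for correlation is to compare the dcterms:subject categories of two enrichments. The sketch below uses Jaccard similarity, which is a choice made for this example; the slides do not name a similarity measure. The first category set is taken from the slide, while the second entity and its categories are purely hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# dcterms:subject values listed on the slide for Jean-Claude Trichet.
trichet_subjects = {
    "category:Living_people",
    "category:Karlspreis_recipients",
    "category:Alumni_of_the_École_Nationale_d'Administration",
    "category:People_from_Lyon",
}

# Hypothetical second enrichment; these categories are invented for the
# example and are not real DBpedia data for any particular resource.
other_subjects = {
    "category:Living_people",
    "category:Karlspreis_recipients",
    "category:Hypothetical_category",
}

score = jaccard(trichet_subjects, other_subjects)
print(f"category overlap: {score:.2f}")
```

Very broad categories such as Living_people match a large number of resources, so a practical setup would probably weight or filter categories before comparing them.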
ARCOMEM entities and enrichments - graph
Slide 10
Nodes: entities/events (blue), enrichments DBpedia (green), Freebase (orange)
1013 clusters of correlated entities/events
1013 clusters of correlated entities/events => cluster expansion by considering related enrichments
Slide 11
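The clustering shown in the graph can be approximated by treating entities and enrichments as nodes and computing connected components. The following is a minimal sketch using union-find, under the assumption that clusters correspond to connected components; the slides do not say how the 1013 clusters were computed. The node names, the `cluster` helper and the link between the ECB resource and the president's office are illustrative assumptions.

```python
from collections import defaultdict


def cluster(edges):
    """Return the connected components of an undirected graph as sorted lists."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    components = defaultdict(set)
    for node in parent:
        components[find(node)].add(node)
    return sorted(sorted(c) for c in components.values())


# Entity -> enrichment links, following the example on the earlier slides.
entity_links = [
    ("person:Jean Claude Trichet", "dbpedia:Jean-Claude_Trichet"),
    ("organisation:ECB", "dbpedia:ECB"),
]

# Links between enrichments. The first follows dbpprop:office from the slide;
# the second is a hypothetical relation added only to show the expansion.
related_links = [
    ("dbpedia:Jean-Claude_Trichet", "dbpedia:President_of_the_European_Central_Bank"),
    ("dbpedia:President_of_the_European_Central_Bank", "dbpedia:ECB"),
]

base_clusters = cluster(entity_links)
expanded_clusters = cluster(entity_links + related_links)
print(len(base_clusters), "clusters before expansion")
print(len(expanded_clusters), "cluster(s) after expansion")
```

In this toy graph the person and the organisation start in separate clusters and are merged once relations between their enrichments are taken into account, which is the effect described as cluster expansion.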