Missing Mr. Brown and buying an Abraham Lincoln: Dark Entities and DBpedia
Marieke van Erp, Filip Ilievski, Marco Rospocher and Piek Vossen
• Entity linking is an important step in building knowledge graphs
• DBpedia is the de facto resource for entity linking:
• it’s large
• it’s broad
• it’s got good documentation
• it’s got tools
• Still, its coverage is insufficient
The Problem
Dark entities
• Not the same as NIL entities!
• Dark entities are those entities for which a knowledge base has no information in the context of the entity linking task
• In NewsReader we use this context for building event-centric knowledge graphs.
• We need to know more about an entity besides its type
1.2 Million News Articles on Cars
• 2003 - 2013
• Born digital
• Deep processing via a 15-module NLP pipeline
• First intra-document information extraction, followed by cross-document event and entity coreference
Performance of the system
NERC: CoNLL 2003
System                   Precision  Recall  F1
NewsReader               91.64      90.21   90.92
Stanford NER             --         --      88.08
Ratinov et al. (2009)    --         --      90.57
Passos et al. (2014)     --         --      90.90

NEL: NewsReader system
Dataset                  Precision  Recall
CoNLL/AIDA               79.67      75.95
TAC2010                  79.77      60.68
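
The reported NewsReader NERC scores are consistent with F1 being the balanced harmonic mean of precision and recall. The following is a minimal sketch under that assumption (the slides do not state the formula); the function name `f1_score` is illustrative, and the NEL F1 values it computes are derived here, not reported in the slides.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (same scale in, same scale out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NERC, NewsReader row: precision 91.64, recall 90.21.
nerc_f1 = round(f1_score(91.64, 90.21), 2)
print(nerc_f1)  # 90.92, matching the reported F1

# NEL rows: only precision and recall are reported, so F1 is derived here.
nel_f1 = {
    "CoNLL/AIDA": round(f1_score(79.67, 75.95), 2),
    "TAC2010": round(f1_score(79.77, 60.68), 2),
}
print(nel_f1)
```

The derived NEL values make the recall drop on TAC2010 easier to compare against CoNLL/AIDA in a single number.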
What’s going wrong in the pipeline?
• Real-world data is dirty
• NER isn’t perfect
• Conjunctions
• Coreference resolution
What’s going wrong with linking to DBpedia?
• Subdivisions
  • April 2006: Polo production moving from Spain to Eastern Europe because of social problems at Volkswagen-Pamplona, and maybe to Volkswagen-Vorst in Belgium
  • July 2006: Polo production in Vorst, no jobs lost in Spain but extra jobs in Belgium
  • August 2006: Fewer Golfs produced in Vorst, maybe more Polos. ‘If not, we have a problem’, says a union representative... Chances that Vorst will not make any Polos next year are minimal, because the factory invested this year in a special new welding installation specific to Polo cars
  • November 2006: Volkswagen stops the production of the Golf in Vorst: 3,500 jobs are lost; plant renamed to Audi-Brussels
  • November 2009: Audi plant in Vorst stops the production of the Polo: 300 jobs lost
• Audi-Brussels: present in DBpedia
• Volkswagen Pamplona: linked to Volkswagen
Volkswagen closes Volkswagen Pamplona ≠ dbp:Volkswagen closes dbp:Volkswagen
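
The collapse of subject and object in the statement above can be reproduced with a toy linker. This is a minimal sketch, not the NewsReader linker: the knowledge base entries, the URIs, and the token-level back-off strategy in `link_with_backoff` are all assumptions chosen to mimic the failure.

```python
from typing import Dict, Optional, Tuple

# Toy knowledge base: surface form -> URI. Illustrative entries only,
# not an extract of DBpedia.
TOY_KB: Dict[str, str] = {
    "Volkswagen": "dbp:Volkswagen",
    "Audi-Brussels": "dbp:Audi_Brussels",
}


def link_with_backoff(mention: str, kb: Dict[str, str]) -> Tuple[Optional[str], bool]:
    """Return (uri, exact). Try the full mention first, then back off to
    the first token of the mention that is itself a KB surface form."""
    if mention in kb:
        return kb[mention], True
    for token in mention.split():
        if token in kb:
            return kb[token], False
    return None, False


def link_triple(subject: str, predicate: str, obj: str, kb: Dict[str, str]):
    """Link both arguments of a statement; report whether both links were exact."""
    s_uri, s_exact = link_with_backoff(subject, kb)
    o_uri, o_exact = link_with_backoff(obj, kb)
    return (s_uri, predicate, o_uri), (s_exact and o_exact)


triple, exact = link_triple("Volkswagen", "closes", "Volkswagen Pamplona", TOY_KB)
print(triple)  # ('dbp:Volkswagen', 'closes', 'dbp:Volkswagen')
print(exact)   # False: the plant was linked to its parent company
```

Keeping the `exact` flag alongside the URI is one way to mark such mentions as candidate dark entities instead of silently accepting the parent link.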
What’s going wrong with linking to DBpedia?
• Domain mismatch/Ambiguity
What can we do?
• Dynamic set of knowledge bases
• Expand knowledge bases
• Leverage latent semantics
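
One possible reading of a "dynamic set of knowledge bases" is to choose which resources to consult per domain, trying domain-specific ones before a general-purpose one. The sketch below assumes that reading; the knowledge bases, the `plants:` prefix, and the function names are hypothetical and only illustrate the lookup order.

```python
from typing import Dict, List, Optional

KnowledgeBase = Dict[str, str]  # surface form -> URI

# General-purpose resource (stand-in for DBpedia; illustrative entry).
GENERAL_KB: KnowledgeBase = {"Volkswagen": "dbp:Volkswagen"}

# Hypothetical domain-specific resource covering car plants.
CAR_PLANT_KB: KnowledgeBase = {"Volkswagen Pamplona": "plants:Volkswagen_Pamplona"}

# Registry mapping a domain label to the extra KBs worth consulting for it.
DOMAIN_KBS: Dict[str, List[KnowledgeBase]] = {"cars": [CAR_PLANT_KB]}


def kbs_for_domain(domain: str) -> List[KnowledgeBase]:
    """Domain-specific KBs first, the general-purpose KB last."""
    return DOMAIN_KBS.get(domain, []) + [GENERAL_KB]


def link(mention: str, domain: str) -> Optional[str]:
    """Return the URI from the first KB that knows the exact mention, else None."""
    for kb in kbs_for_domain(domain):
        if mention in kb:
            return kb[mention]
    return None


print(link("Volkswagen Pamplona", "cars"))     # plants:Volkswagen_Pamplona
print(link("Volkswagen", "cars"))              # dbp:Volkswagen
print(link("Volkswagen Pamplona", "finance"))  # None
```

With the domain resource in place, the plant gets its own identifier and no longer has to fall back to the parent company.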
This research was supported by the European Union’s 7th Framework Programme via the NewsReader project (ICT-316404)