Missing Mr. Brown and buying an Abraham Lincoln: Dark Entities and DBpedia Marieke van Erp, Filip Ilievski, Marco Rospocher and Piek Vossen

Post on 16-Apr-2017

Page 1: Missing Mr. Brown and buying an Abraham Lincoln

Missing Mr. Brown and buying an Abraham Lincoln: Dark Entities and DBpedia

Marieke van Erp, Filip Ilievski, Marco Rospocher and Piek Vossen

Page 2: Missing Mr. Brown and buying an Abraham Lincoln

The Problem

• Entity linking is an important step in building knowledge graphs

• DBpedia is the de facto resource for entity linking:

• it’s large

• it’s broad

• it’s got good documentation

• it’s got tools

• Still, its coverage is insufficient

Page 3: Missing Mr. Brown and buying an Abraham Lincoln

Dark entities

• Not the same as NIL entities!

• Dark entities are those entities for which a knowledge base has no information in the context of the entity linking task

• In NewsReader we use this context for building event-centric knowledge graphs.

• We need to know more about an entity besides its type
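The difference between NIL and dark entities can be made concrete with a small sketch. It reflects one possible reading of the definition above: a NIL entity has no entry in the knowledge base, while a dark entity has an entry that carries nothing useful for the task beyond its type. The toy knowledge base, the `ex:` URIs, and the names `LinkStatus` and `classify` are all assumptions made for illustration and are not taken from NewsReader or DBpedia.

```python
from enum import Enum


class LinkStatus(Enum):
    NIL = "nil"        # no entry in the knowledge base at all
    DARK = "dark"      # entry exists, but nothing useful for the task
    LINKED = "linked"  # entry exists and carries task-relevant facts


# Hypothetical toy knowledge base: URI -> {property: value}.
TOY_KB = {
    "ex:Volkswagen": {"rdf:type": "ex:Company", "ex:industry": "Automotive"},
    "ex:Mr_Brown": {"rdf:type": "ex:Person"},  # only a type is known
}


def classify(uri, kb, ignored=("rdf:type",)):
    """Classify a candidate link as NIL, DARK or LINKED.

    Properties listed in `ignored` do not count as task-relevant
    information (here: the entity type alone is not enough).
    """
    if uri is None or uri not in kb:
        return LinkStatus.NIL
    informative = {p: v for p, v in kb[uri].items() if p not in ignored}
    return LinkStatus.LINKED if informative else LinkStatus.DARK


for candidate in (None, "ex:Mr_Brown", "ex:Volkswagen"):
    print(candidate, classify(candidate, TOY_KB).value)
```

Which properties count as informative depends on the task, so `ignored` is left as a parameter rather than fixed.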

Page 4: Missing Mr. Brown and buying an Abraham Lincoln
Page 5: Missing Mr. Brown and buying an Abraham Lincoln

1.2 Million News Articles on Cars

• 2003 - 2013

• Born digital

• Deep processing via 15-module NLP pipeline

• First intra-document information extraction, followed by cross-document event and entity coreference

Page 6: Missing Mr. Brown and buying an Abraham Lincoln

Performance of the system

NERC: CoNLL 2003

System                  Precision  Recall  F1
NewsReader              91.64      90.21   90.92
Stanford NER            --         --      88.08
Ratinov et al. (2009)   --         --      90.57
Passos et al. (2014)    --         --      90.90

NEL: NewsReader system

Dataset      Precision  Recall
CoNLL/AIDA   79.67      75.95
TAC2010      79.77      60.68
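For reference, F1 is the harmonic mean of precision and recall. The short sketch below recomputes the NewsReader NERC figure from its reported precision and recall, and derives F1 for the two NEL rows. The slide reports no F1 for NEL, so those two values are computed here and are not results from the authors. The function name `f1_score` is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NERC: NewsReader on CoNLL 2003, precision and recall as reported on the slide.
print("NERC NewsReader", round(f1_score(91.64, 90.21), 2))  # 90.92

# NEL: derived from the reported precision and recall, not reported as F1.
nel_rows = [("CoNLL/AIDA", 79.67, 75.95), ("TAC2010", 79.77, 60.68)]
for dataset, precision, recall in nel_rows:
    print("NEL", dataset, round(f1_score(precision, recall), 2))
```

The gap between precision and recall on TAC2010 is what pulls its harmonic mean well below the CoNLL/AIDA value.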

Page 7: Missing Mr. Brown and buying an Abraham Lincoln

Performance of the system

Page 8: Missing Mr. Brown and buying an Abraham Lincoln

What’s going wrong in the pipeline?

• Real world data is dirty

• NER isn’t perfect

• Conjunctions

• Coreference resolution

Page 9: Missing Mr. Brown and buying an Abraham Lincoln

What’s going wrong with linking to DBpedia?

• Subdivisions

• April 2006: production of Polo from Spain to Eastern Europe because of social problems in Volkswagen-Pamplona and maybe to Volkswagen-Vorst in Belgium

• July 2006: Polo production in Vorst, no jobs lost in Spain but extra jobs in Belgium.

• August 2006: Fewer Golfs produced in Vorst, maybe more Polos. ‘If not, we have a problem’, says a union representative... Chances that Vorst will not make any Polos next year are minimal, because the factory invested this year in a special new welding installation specific for Polo cars.

• November 2006: Volkswagen stops the production of Golf in Vorst: 3,500 jobs are lost; plant renamed to Audi-Brussels

• November 2009: Audi plant in Vorst stops the production of Polo: 300 jobs lost

Audi-Brussels present in DBpedia

Volkswagen Pamplona linked to Volkswagen

Volkswagen closes Volkswagen Pamplona ≠ dbp:Volkswagen closes dbp:Volkswagen
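The subdivision problem can be reproduced with a deliberately naive linker. The sketch below is an assumption for illustration only and is not the NewsReader linking component: the surface-form index, the `ex:` URIs, and the functions `naive_link`, `link_triple` and `is_collapsed` are hypothetical. The linker backs off to the longest known prefix of a mention, so a plant that has no entry of its own inherits the URI of its parent company, and the linked statement ends up with the same entity as subject and object.

```python
# Hypothetical surface-form index; "ex:" URIs stand in for real KB entries.
SURFACE_FORMS = {
    "Volkswagen": "ex:Volkswagen",
    "Audi-Brussels": "ex:Audi_Brussels",
}


def naive_link(mention, index):
    """Exact match first, then back off to the longest known prefix."""
    if mention in index:
        return index[mention]
    tokens = mention.split()
    for end in range(len(tokens) - 1, 0, -1):
        prefix = " ".join(tokens[:end])
        if prefix in index:
            return index[prefix]
    return None


def link_triple(subject, predicate, obj, index):
    """Link subject and object of a (subject, predicate, object) statement."""
    return (naive_link(subject, index), predicate, naive_link(obj, index))


def is_collapsed(triple):
    """True when subject and object were linked to the same entity."""
    subject_uri, _, object_uri = triple
    return subject_uri is not None and subject_uri == object_uri


triple = link_triple("Volkswagen", "closes", "Volkswagen Pamplona", SURFACE_FORMS)
print(triple)                # ('ex:Volkswagen', 'closes', 'ex:Volkswagen')
print(is_collapsed(triple))  # True
```

A check like `is_collapsed` only flags the symptom. The plant still lacks an entry of its own, which is the coverage gap the slide points at.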

Page 10: Missing Mr. Brown and buying an Abraham Lincoln

What’s going wrong with linking to DBpedia?

• Domain mismatch/Ambiguity

Page 11: Missing Mr. Brown and buying an Abraham Lincoln

What can we do?

• Dynamic set of knowledge bases

• Expand knowledge bases

• Leverage latent semantics
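One way to read "dynamic set of knowledge bases" is as an ordered lookup that tries a domain-specific resource before falling back to a general one. The sketch below illustrates that reading only. The slides do not specify a mechanism, and the two toy indexes, the URIs, and `link_with_fallback` are hypothetical.

```python
def link_with_fallback(mention, knowledge_bases):
    """Return (kb_name, uri) from the first knowledge base that knows the mention.

    `knowledge_bases` is an ordered list of (name, surface-form index) pairs;
    earlier entries take priority. Returns None if no knowledge base matches.
    """
    for name, index in knowledge_bases:
        uri = index.get(mention)
        if uri is not None:
            return name, uri
    return None


# Hypothetical indexes: a general-purpose KB and a car-industry KB.
GENERAL_KB = {"Volkswagen": "ex:Volkswagen"}
DOMAIN_KB = {"Volkswagen Pamplona": "exdomain:Volkswagen_Pamplona"}

# The set is "dynamic" in that the list can be chosen per corpus or per domain.
car_news_kbs = [("domain", DOMAIN_KB), ("general", GENERAL_KB)]

for mention in ("Volkswagen Pamplona", "Volkswagen", "Mr. Brown"):
    print(mention, "->", link_with_fallback(mention, car_news_kbs))
```

With only the general index in the list, "Volkswagen Pamplona" finds no exact match, which is the situation described for DBpedia above.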

Page 12: Missing Mr. Brown and buying an Abraham Lincoln

Page 13: Missing Mr. Brown and buying an Abraham Lincoln

This research was supported by the European Union’s 7th Framework Programme via the NewsReader project (ICT-316404)