
Named Entity Disambiguation using Linked Data

Danica Damljanović, The University of Sheffield

Brunel University London, 05 March 2012

Named Entity Disambiguation in TrendMiner

[Platform diagram: newswire, market data and polls feed into multilingual text processing (EN, DE, IT, BG, HI), time-series machine learning models, cross-lingual summarisation, and knowledge-based search and browse; the TrendMiner platform supports financial decisions and political analysis.]

Named Entity Recognition is the first step, and it is important to get it right!

Hardik Fintrade Pvt. Ltd.

SORA

Eurokleis srl

Example

Linked Data

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Why DBpedia?

Regularly updated (from Wikipedia)

Good source for named entities

A hierarchy of concepts: a capital is also a city, but not vice versa

Relations between concepts: Paris locatedIn France; ParisHilton bornIn NewYorkCity

Named Entity Recognition

ANNIE produces NE types such as Organization, Location and Person

Resolves coreference: entities with the same meaning are linked, e.g. General Motors and GM

Entity Linking

The Large Knowledge Gazetteer (LKB) matches text against URIs

Matches only against the values of the rdfs:label and foaf:name properties, for all instances of the classes dbpedia-ont:Person, dbpedia-ont:Organisation and dbpedia-ont:Place
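The lookup step can be pictured as a label-to-URI index. A minimal sketch, assuming a toy in-memory gazetteer rather than the actual GATE/LKB implementation, with invented example pairs standing in for rdfs:label values pulled from DBpedia:

```python
# Toy sketch of an LKB-style lookup (illustrative names, not the real
# GATE/LKB API): build a label -> URI index from (label, URI) pairs, as
# would be extracted from rdfs:label / foaf:name values of DBpedia
# Persons, Organisations and Places, then match mentions against it.
from collections import defaultdict

def build_gazetteer(pairs):
    """Map each surface label to the set of URIs that carry it."""
    index = defaultdict(set)
    for label, uri in pairs:
        index[label].add(uri)
    return index

def lookup(index, mention):
    """Return all candidate URIs whose label equals the mention."""
    return sorted(index.get(mention, set()))

# Hypothetical label/URI pairs for illustration:
pairs = [
    ("Paris", "dbpedia:Paris"),
    ("Paris", "dbpedia:Paris,_Ontario"),
    ("Paris Hilton", "dbpedia:Paris_Hilton"),
]
index = build_gazetteer(pairs)
print(lookup(index, "Paris"))  # two candidate URIs: the mention is ambiguous
```

An ambiguous mention is simply one whose label maps to more than one URI, which is what the disambiguation step later resolves.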

So, why not just combine them?

NE types generated by ANNIE miss the URI

LKB does not use any context, so it produces spurious entities: e.g. each letter B is annotated as a possible mention of dbpedia:B_%28Los_Angeles_Railway%29, which refers to a line called B operated by the Los Angeles Railway

How to filter out the noise?

1. Identify NEs (Location, Organisation and Person) using ANNIE
2. For each NE, add URIs of matching instances from DBpedia
3. For each ambiguous NE, calculate disambiguation scores
4. Remove all matches except the highest-scoring one

Disambiguation score

Uses context: a weighted sum of three similarity metrics (string similarity, structural similarity and contextual similarity)
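The scoring-and-filtering step can be sketched as follows. The slides only say the score is a weighted sum of the three similarities; the equal weights and the candidate scores below are invented for illustration:

```python
# Sketch of the disambiguation step: score each candidate URI by a
# weighted sum of (string, structural, contextual) similarity and keep
# only the highest-scoring one. Weights and scores here are made up.
def disambiguation_score(scores, weights=(1/3, 1/3, 1/3)):
    """Weighted sum of (string, structural, contextual) similarities."""
    return sum(w * s for w, s in zip(weights, scores))

def pick_best(candidates):
    """Keep only the highest-scoring candidate URI for an ambiguous NE."""
    return max(candidates, key=lambda c: disambiguation_score(c[1]))[0]

# Hypothetical candidates for the mention "Paris" in a document about France:
candidates = [
    ("dbpedia:Paris",          (1.00, 1.0, 0.9)),  # string, structural, contextual
    ("dbpedia:Paris,_Ontario", (0.36, 0.0, 0.2)),
    ("dbpedia:Paris_Hilton",   (0.42, 0.0, 0.1)),
]
print(pick_best(candidates))  # dbpedia:Paris
```

This implements steps 3 and 4 of the filtering pipeline above: score every ambiguous match, then discard all but the best.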

String similarity

Refers to the edit distance between the text string and the labels of the matching URIs

                                 Levenshtein   Jaccard   Monge-Elkan
Paris vs Paris Hilton             0.4166667       0.5        1.0
Paris vs Paris, Ontario           0.35714287      0.0        1.0
Paris Hilton vs Paris, Ontario    0.4285714       0.0        0.6333333
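The Levenshtein and Jaccard columns above can be reproduced with a normalized Levenshtein similarity (1 minus edit distance over the longer string's length) and Jaccard overlap of whitespace tokens. A minimal sketch, not the exact library the slides used, agreeing with the table up to float precision:

```python
def levenshtein_sim(a, b):
    """1 - edit_distance(a, b) / max(len(a), len(b)), via a rolling-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return 1 - prev[-1] / max(len(a), len(b))

def jaccard_sim(a, b):
    """Overlap of whitespace-separated tokens (note 'Paris' != 'Paris,')."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

print(round(levenshtein_sim("Paris", "Paris Hilton"), 7))  # 0.4166667
print(jaccard_sim("Paris", "Paris Hilton"))                # 0.5
print(jaccard_sim("Paris", "Paris, Ontario"))              # 0.0
```

The Jaccard zeros in the table fall out of the tokenization: the comma makes the token "Paris," differ from "Paris", so the token sets are disjoint.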

Structural similarity

Is there a relation between the ambiguous NE and any other NE from the same sentence or document?

Paris ... France >> true (Paris capitalOf France)

Paris ... New York >> true (ParisHilton bornIn NewYorkCity)
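The relation check can be sketched as a lookup over knowledge-base triples. This is an illustrative in-memory version, not a DBpedia query; the triples are the slide's own examples:

```python
# Structural similarity sketch: treat the knowledge base as a set of
# (subject, predicate, object) triples and ask whether any relation
# directly links a candidate URI to another NE from the same sentence
# or document. Triples below are just the slide's examples.
TRIPLES = {
    ("dbpedia:Paris", "capitalOf", "dbpedia:France"),
    ("dbpedia:Paris_Hilton", "bornIn", "dbpedia:New_York_City"),
}

def related(e1, e2, triples=TRIPLES):
    """True if any predicate directly connects e1 and e2, either direction."""
    return any((s, o) in {(e1, e2), (e2, e1)} for s, _, o in triples)

print(related("dbpedia:Paris", "dbpedia:France"))           # True
print(related("dbpedia:Paris,_Ontario", "dbpedia:France"))  # False
```

In a document mentioning both "Paris" and "France", only the dbpedia:Paris candidate is connected by a relation, which boosts its structural-similarity score.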

Contextual similarity

The probability that two words appear with a similar set of other words (Random Indexing)

Nearest neighbours (similarity : term) for each reading:

Paris France: 0.9999999 paris, 0.3674829 métro, 0.356694 paul-martin, 0.34328446 lewden, 0.33907568 pimpfen, 0.33907568 théas, 0.33907568 werfft, 0.33907568 birmoverse, 0.33907568 cszhech, 0.330207 pierre

Paris Ontario: 0.6818793 paris, 0.6818793 ontario, 0.5707274 merrickville-wolford, 0.5707274 naiscoutaing, 0.5707274 neguaguon, 0.5707274 magnetewan, 0.5707274 wabauskang, 0.5679094 tp, 0.5468101 s-e, 0.42145208 henvey

Paris Hilton: 0.7042532 hilton, 0.70425296 paris, 0.2825679 poverty-related, 0.276114 jaumont, 0.276114 jaune-montagne, 0.276114 malancourt-la-montagne, 0.26384133 mons–january, 0.26142785 métro, 0.26125407 tank-tread, 0.26125407 “plane’s
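The neighbour lists above come from Random Indexing. A rough stand-in for the underlying idea, that words are similar when they co-occur with similar words, is cosine similarity over co-occurrence count vectors; the toy vectors below are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse count vectors given as dicts."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented co-occurrence vectors (context word -> count):
paris_france  = {"métro": 3, "capital": 5, "seine": 2}
paris_ontario = {"ontario": 4, "county": 3, "canada": 2}
paris_hilton  = {"hilton": 5, "hotel": 2, "métro": 1}

# The French capital shares context words with Paris Hilton (e.g. métro)
# but none with the Ontario town, so its contextual similarity is higher:
print(cosine(paris_france, paris_hilton) > cosine(paris_france, paris_ontario))  # True
```

Random Indexing itself avoids building these full co-occurrence vectors by projecting contexts into a fixed-size random space, but the similarity it measures is of the same distributional kind.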

Evaluation

                          Precision   Recall   F-measure
LKB                          0.03       0.86      0.05
LKB+ANNIE                    0.14       0.81      0.24
LKB+ANNIE+Disambiguation     0.66       0.75      0.70

Evaluated on 100 manually annotated Wikipedia user profiles
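The F-measure column is the harmonic mean of precision and recall; a one-line check against the best-performing row of the table:

```python
def f_measure(p, r):
    """F1: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# LKB+ANNIE+Disambiguation row: P = 0.66, R = 0.75
print(f"{f_measure(0.66, 0.75):.2f}")  # 0.70
```

The table shows the trade-off clearly: adding ANNIE and then disambiguation costs some recall (0.86 down to 0.75) but raises precision more than twentyfold, which is what drives the F-measure from 0.05 to 0.70.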

Conclusion

Using Linked Data as an additional knowledge source for resolving context eliminated a large number of incorrect annotations

Thank You!

Questions?

More about the project: http://www.trendminer-project.eu

Contact:

danica.damljanovic@gmail.com
