
FINET: Context-Aware Fine-Grained Named Entity Typing

Luciano Del Corro*, Abdalghani Abujabal*, Rainer Gemulla†, and Gerhard Weikum*

Max Planck Institute for Informatics*
University of Mannheim†

Named Entity Typing

The task of detecting the type(s) of named entities in a given context with respect to a type system (e.g., WordNet)

“Page plays his guitar on the stage” → guitarist

FINET

A system
• for detecting fine-grained types
• in short inputs (e.g., sentences or tweets)
• in a given context
• with respect to WordNet

Context-Aware Typing

• “Steinmeier, the German Foreign Minister, ..” → foreign minister [explicit]
• “Messi plays soccer” → soccer player [almost explicit]
• “Pavano never even made it to the mound” → baseball player [implicit]

Applications

• KB Construction
  • find types for existing entities
• Named Entity Disambiguation
  • “Page played amazingly on the stage” → Musician vs. Businessman
• Semantic Search
  • “Give me all documents that talk about musicians”

Supervised Approaches

• Manually labeled data is scarce
  • thousands of types, and sufficient training data is needed for every type

Distantly Supervised Approaches

• Idea: automatically generate training data via a KB (e.g., Wikipedia)

  “Klitschko is the mayor of Kiev” → mayor, politician, boxer
  “Klitschko is known for his powerful punches” → mayor, politician, boxer

• Problem: the types are context-oblivious (every mention of an entity inherits all of its KB types, as in the two sentences above)

FINET

• Unsupervised
  • most extractors are unsupervised
• Context-aware
  • “Klitschko is the mayor of Kiev” → mayor, politician
• Super fine-grained
  • WordNet as the type system (16K+ types under person, location, organization)

FINET Overview

1. Preprocessing
2. Candidate Generation
   1. Pattern-based extractor [very explicit]
   2. Mention-based extractor [explicit]
   3. Verb-based extractor [almost explicit]
   4. Corpus-based extractor [implicit]
3. Type Selection (via WSD)

[Flow diagram: run an extractor → stopping condition met? → Yes: Type Selection; No: try the subsequent extractor]
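This cascade is simple enough to state in code. A minimal sketch, assuming an ordered list of (candidate generator, stopping condition) pairs and a select_types function standing in for the WSD step; the names are illustrative, not the paper's API:

def finet_cascade(mention, context, extractors, select_types):
    """Run extractors from most to least explicit; hand the first candidate
    set whose stopping condition holds over to type selection (WSD)."""
    for generate_candidates, stopping_condition_met in extractors:
        candidates = generate_candidates(mention, context)
        if stopping_condition_met(candidates):
            return select_types(mention, context, candidates)
    return []  # no extractor produced sufficiently confident candidates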

Preprocessing

“Albert Einstein discovered the law of the photoelectric effect and he won the Nobel Prize in 1921”

• Identify clauses
  • some extractors operate on the clause level (clauses capture local context)
• Identify coarse-grained types [Stanford NER] (see the sketch after this slide)
  • FINET restricts its candidates to hyponyms of the coarse-grained type
  • well-studied task: high precision and recall
  • “Albert Einstein”: PER
• Coreference resolution
  • (“Albert Einstein”, “he”)
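A minimal sketch of the coarse-typing step. The paper uses Stanford NER; spaCy is substituted here purely for brevity, so the model name and the label mapping are assumptions:

import spacy

# Assumption: spaCy stands in for Stanford NER; its labels are mapped to
# the coarse-grained tags FINET uses (PER, LOC, ORG).
COARSE = {"PERSON": "PER", "GPE": "LOC", "LOC": "LOC", "ORG": "ORG"}

nlp = spacy.load("en_core_web_sm")
doc = nlp("Albert Einstein discovered the law of the photoelectric effect "
          "and he won the Nobel Prize in 1921")

for ent in doc.ents:
    if ent.label_ in COARSE:
        print(ent.text, "->", COARSE[ent.label_])  # Albert Einstein -> PER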


Pattern-based Extractor [final patterns]

Targets very explicit types
• “Barack Obama, the president of […]”
• [“Barack Obama”; president-1, president-2, ..]

Pattern: NAMED ENTITY , (modifier) NOUN (modifier)
(the noun attaches to the entity via an appositive (appos) dependency; modifiers via mod edges)

Stopping condition: produce at least one type
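A hedged sketch of the appositive pattern over a spaCy dependency parse; the parser choice is an assumption and the paper's full pattern set is richer:

import spacy

nlp = spacy.load("en_core_web_sm")

def appositive_type_nouns(text):
    """Instances of NAMED_ENTITY , (mod) NOUN (mod): nouns attached to an
    entity mention via an appositive (appos) dependency edge."""
    doc = nlp(text)
    pairs = []
    for ent in doc.ents:
        for child in ent.root.children:
            if child.dep_ == "appos" and child.pos_ == "NOUN":
                pairs.append((ent.text, child.lemma_))
    return pairs

print(appositive_type_nouns(
    "Barack Obama, the president of the United States, spoke."))
# expected: [('Barack Obama', 'president')]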

Pattern-based Extractor [non-final patterns]

• “Shakespeare’s productions” (possessive + transformation)
  • production →DER produce →DER producer
  • [“Shakespeare”; producer-1, producer-2, ..]

Stopping condition: KB lookup, checking the candidates against the entity’s KB types, e.g.
  Shakespeare → writer-1
  Shakespeare → producer-2
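The possessive transformation can be approximated with WordNet's derivational (DER) links in NLTK; a sketch, not the paper's exact procedure:

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def possessive_agent_nouns(noun):
    """production ->DER produce ->DER producer: follow DER links from the
    possessed noun to a verb, then from the verb back to agent nouns."""
    agents = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        for lemma in synset.lemmas():
            for verb in lemma.derivationally_related_forms():
                if verb.synset().pos() != "v":
                    continue
                for agent in verb.derivationally_related_forms():
                    if agent.synset().pos() == "n" and agent.name() != lemma.name():
                        agents.add(agent.name())
    return sorted(agents)

print(possessive_agent_nouns("production"))  # includes 'producer'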


Mention-based Extractor

• Extracts type candidates from the mention string itself
  • “Imperial College London”
  • [“Imperial College London”; college-1, college-2, ..]

Stopping condition: KB lookup
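A minimal sketch of reading candidates off the mention with NLTK's WordNet interface; the paper additionally restricts candidates to hyponyms of the coarse type and confirms them by KB lookup:

from nltk.corpus import wordnet as wn

def mention_type_candidates(mention):
    """Any token of the mention that exists as a common noun in WordNet
    contributes its synsets ('College' -> college-1, college-2, ...)."""
    return {token: wn.synsets(token.lower(), pos=wn.NOUN)
            for token in mention.split()
            if wn.synsets(token.lower(), pos=wn.NOUN)}

print(mention_type_candidates("Imperial College London")["College"])
# [Synset('college.n.01'), Synset('college.n.02'), ...]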


Verb-based Extractor

• Nominalization
  • “play” (verb) → “player” (deverbal noun)
• Verb-argument semantic concordance
  • “Messi plays in Barcelona”

Example 1: Suffixes

• “Messi plays in Barcelona”
• play →“-er” player
• WordNet DER links connect the verb senses to the noun senses:
  play-1, play-2, play-3, … →DER player-1 (player), player-2 (musician), player-3 (actor), player-4 (participant)
• [“Messi”; player, musician, actor, ..]

Stopping condition: KB lookup

Example 2: Synonyms

• “John committed a crime”
• commit →syn perpetrate →DER perpetrator
• [“John”; perpetrator-1]

Stopping condition: KB lookup
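Both examples can be reproduced with WordNet in NLTK: same-synset lemmas supply the synonyms (commit → perpetrate) and DER links supply the deverbal nouns. A sketch under those assumptions:

from nltk.corpus import wordnet as wn

def deverbal_nouns(verb):
    """Candidate nouns for a verb: for every verb sense, follow DER links
    from all of its lemmas (i.e., including synonyms) to noun forms."""
    nouns = set()
    for synset in wn.synsets(verb, pos=wn.VERB):
        for lemma in synset.lemmas():  # e.g. commit, perpetrate, ...
            for related in lemma.derivationally_related_forms():  # DER
                if related.synset().pos() == "n":
                    nouns.add(related.name())
    return sorted(nouns)

print(deverbal_nouns("play"))    # includes 'player'
print(deverbal_nouns("commit"))  # includes 'perpetrator' via 'perpetrate'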


Corpus-based Extractor

• “Messi” & “Cristiano Ronaldo” occur in similar (soccer) contexts
• Key idea: collect the types of similar entities via the KB

Distributional hypothesis: similar entities tend to occur in similar contexts

Word2Vec

• Word vectors represent semantic contexts for a given phrase
• Given a set of phrases, return the k most similar phrases with respect to context

“Maradona expects to win in South Africa”
query: {“Maradona”, “South Africa”}

Related phrases in the corpus:
“Parreira coached Brazil in South Africa”
“Dunga replaced Parreira after South Africa”

Mention                    Type
“Diego Maradona”           <coach-1>, ..
“Parreira”                 <coach-1>, ..
“Carlos Alberto Parreira”  <coach-1>, ..
“Dunga”                    <coach-1>, ..

Stopping condition: sufficient evidence for the candidate types
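The neighbor query itself is a standard word2vec operation; a sketch with gensim, where the model file name and the underscored multiword tokens are assumptions about how the corpus was preprocessed:

from gensim.models import KeyedVectors

# Assumption: a pretrained word2vec model over a corpus in which multiword
# mentions were joined with underscores ("South_Africa").
vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Query with the entity mentions of the input sentence as positive context,
# mirroring query: {"Maradona", "South Africa"}.
for phrase, score in vectors.most_similar(
        positive=["Maradona", "South_Africa"], topn=100)[:5]:
    print(f"{phrase}\t{score:.3f}")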


Type Selection via Word Sense Disambiguation

• Given an entity and a set of candidate types
  • [“Maradona”; soccer_player-1, football_player-1, coach-1, …]
• Select the best types according to the context

Entity Context for WSD

• Entity-oblivious context
  • all words in the input sentence
• Entity-specific context via lexical expansions
  • entity-related words from word vectors

Type Selection via WSD

Naive Bayes trained with word features on WordNet glosses and labeled data (if available) [Extended Lesk].

“Maradona expects to win in South Africa”

Entity-oblivious context: “expects”, “win”, “South Africa”
Entity-specific context: “coach”, “cup”, “striker”, “mid-fielder”, and “captain”
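A gloss-overlap sketch of the selection signal. The paper trains a Naive Bayes classifier in the Extended Lesk setting; the plain overlap count below is a simplified stand-in, shown scoring the candidate senses of “coach”:

from nltk.corpus import wordnet as wn

def gloss_overlap(synset, context):
    """Simplified (Extended-)Lesk signal: overlap between the candidate
    type's WordNet gloss and the context word set."""
    return len(set(synset.definition().lower().split()) & set(context))

# Entity-oblivious words from the sentence plus entity-specific expansions
# (assumed to come from word vectors for "Maradona").
context = ("expects win south africa "
           "coach cup striker mid-fielder captain").split()

for synset in wn.synsets("coach", pos=wn.NOUN):
    print(gloss_overlap(synset, context), synset.name(), synset.definition())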

Experiments

• Datasets
  • 500 random sentences from NYT, year 2007
  • 500 random sentences from CoNLL
  • 100 random tweets
• Type granularities
  • CG: artifact, event, person, location, organization
  • FG: ~200 prominent WN types
  • SFG: all remaining WN types

Type Granularity

System | Type System | Total Types | Top Categories
FINET  | WN          | 16K+        | pers, org, loc
HYENA  | WN          | 505         | all

Results on the NYT dataset (P / Correct Types per granularity)

System          | CG: P / Correct Types | FG: P / Correct Types | SFG: P / Correct Types
FINET           | 87.90 / 872           | 72.42 / 457           | 70.82 / 233
FINET (w/o l.)  | 87.90 / 872           | 71.13 / 436           | 67.11 / 204
HYENA           | 72.40 / 779           | 28.26 / 522           | 20.65 / 160

Conclusion

FINET:
• A system for detecting the types of named entities
• Context-aware
• Unsupervised (mostly)
• Very fine-grained type system

Mapping CG Types to WN

• persons: all descendants of person-1, imaginary_being-1, characterization-3, and operator-2 (10584 in total)
• locations: all descendants of location-1, way-1, and landmass-1 (3681 in total)
• organizations: all descendants of organization-1 and social_group-1 (1968 in total)
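The mapping can be reproduced with NLTK's WordNet interface. The .n.0k synset names below are assumptions translated from the slide's sense numbers, and the counts vary with the WordNet version:

from nltk.corpus import wordnet as wn

def descendant_count(*synset_names):
    """Size of the union of the hyponym closures of the given synsets."""
    seen = set()
    for name in synset_names:
        seen.update(wn.synset(name).closure(lambda s: s.hyponyms()))
    return len(seen)

# person-1, imaginary_being-1, characterization-3, operator-2 from the slide.
print(descendant_count("person.n.01", "imaginary_being.n.01",
                       "characterization.n.03", "operator.n.02"))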

Verb-based Extractor

• “Messi plays soccer”
  • “Messi” is the subject
  • “soccer” is the direct object
  • add “soccer” as a noun modifier to the deverbal noun (“soccer player”)

Verb-based Extractor

• Utilize a corpus of frequent (verb, type) pairs
  • “Messi was treated in the hospital”
  • [“Messi”; patient-1]

Corpus-based Extractor

• Retrieve the 100 most related phrases along with their similarity scores
• Filter out non-entity phrases and entities not compatible with the CG type
• Traverse the result list until we collect 50% of the total score
• If no more than 10 different types were collected, add the types as candidates (sketched in code below)
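Putting the four bullets together, a sketch of the selection loop; all names and data shapes here are illustrative, not the paper's code:

def corpus_type_candidates(neighbors, kb, cg_type, max_types=10):
    """neighbors: (phrase, similarity) pairs, most similar first (e.g. from
    word2vec); kb: phrase -> (coarse_type, set of fine-grained WN types)."""
    usable = [(p, s) for p, s in neighbors
              if p in kb and kb[p][0] == cg_type]  # entity + CG-compatible
    total = sum(s for _, s in usable)
    types, mass = set(), 0.0
    for phrase, score in usable:   # traverse until 50% of the score mass
        types |= kb[phrase][1]
        mass += score
        if mass >= 0.5 * total:
            break
    return types if len(types) <= max_types else set()  # else: no evidence

kb = {"Parreira": ("PER", {"coach-1"}), "Dunga": ("PER", {"coach-1"})}
print(corpus_type_candidates([("Parreira", 0.8), ("Dunga", 0.7)], kb, "PER"))
# {'coach-1'}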