![Page 1: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/1.jpg)
(Open) Information Extraction: Where are we going?
Claudio Delli Bovi
July 18th, 2016
About me
Second-year PhD student
http://wwwusers.di.uniroma1.it/~dellibovi
LCL group @ Sapienza
Advisor: prof. Roberto Navigli
bn:17381128n
Focus (so far): Disambiguation, (Open) Information Extraction
Outline
- BabelNet and friends: some background (Research work @ LCL Sapienza)
- DefIE: OIE from textual definitions (Delli Bovi, Telesca, Navigli: TACL 2015)
- KBUnify: KB disambiguation and unification (Delli Bovi, Espinosa-Anke, Navigli: EMNLP 2015)
Linguistic Computing Laboratory (LCL) @ Sapienza University of Rome
● Part of the Computer Science Department of Sapienza, focused on Natural Language Processing
● Some projects we have been involved in:
○ MultiJEDI (1.3M €): ERC Starting Grant
○ LIDER (1.5M €): EU CSA
○ Google Focused Research Award (300k $)
http://multijedi.org/
● To the best of our knowledge, the largest multilingual encyclopedic dictionary and semantic network (almost 14M entries in 271 languages and 380M semantic connections)
● Initially created as an integration of Wikipedia and WordNet, now BabelNet is a merger of many different resources (Wiktionary, Wikidata, OmegaWiki, VerbNet, ImageNet, …)
● The integration is performed via an automatic linking algorithm and by filling in lexical gaps with the aid of Machine Translation
● BabelNet is composed of Babel Synsets, concepts or entities lexicalized (“WordNet-style”) in many languages and featuring:
○ is-a relations
○ domain and categories
○ images and definitions
○ translations
BabelNet and friends
Babelfy: a graph-based algorithm for multilingual joint Word Sense Disambiguation and Entity Linking, based on BabelNet
The Wikipedia Bitaxonomy: an iterative algorithm for the automatic creation of a “bitaxonomy” for Wikipedia pages and categories
… and much more!
BabelNet and my research
● BabelNet (especially in its early stages) was conceived as a lexico-semantic resource more than an actual knowledge base:
○ semantic connections are mostly lexical relations from WordNet or unspecified “relatedness edges” derived from Wikipedia hyperlinks
[Figure: two unlabeled “semantically related” edges, one between Atom Heart Mother and Pink Floyd, one between Neil Armstrong and NASA]
● Construct from BabelNet a proper knowledge base with labeled relations (X is album by Y, X worked at Y, ... )
● Use Open Information Extraction!
(Open) Information Extraction
OIE is great, but…
Sparsity: many relation phrases express the same relationship (e.g. synonyms, paraphrases)
Ambiguity: arguments (and relation phrases) are ambiguous!
DefIE: OIE from textual definitions
The idea: instead of targeting massive and noisy corpora (like the web) and then trying to find a smart way to cope with the noise, target smaller but “denser” (and virtually noise-free) corpora of definitional knowledge.
Apply OIE techniques to extract as much information as possible!
DefIE: OIE from textual definitions
The tools:
- An underlying inventory/knowledge base (to which arguments and relation patterns will be connected)
- A WSD/EL system (to disambiguate concepts and entity mentions across the input text)
- A syntactic parser (to construct meaningful relation patterns and avoid sparsity)
The tools: inventory/knowledge base
BabelNet (http://babelnet.org)
- 14 million entries
- both lexicographic and encyclopedic knowledge
The tools: WSD/EL system
Babelfy (http://babelfy.org)
- unified graph-based approach to EL and WSD
- unsupervised, based on BabelNet
The tools: syntactic parser
C&C parser (http://svn.ask.it.usyd.edu.au/trac/candc)
- log-linear parser and supertagger based on CCG
- (theoretically) suited to long-distance dependencies
DefIE: How it works
1. Extracting relation instances
Textual definition d: “Atom Heart Mother is the fifth album by English band Pink Floyd.”
Parsing: dependency graph G_d
[Figure: dependency parse of the definition, rooted at “is”, with edge labels subj, comp, mod, det, prep, pobj]
Disambiguation: sense mappings S_d
- Atom Heart Mother: bn:02070902n
- Pink Floyd: bn:03292767n
- album: bn:00002488n
- English: bn:00102248a
- band: bn:00008280n
Syntactic-semantic graph S_d^sem
[Figure: the dependency graph with disambiguated nodes labeled by their senses: Atom Heart Mother (bn:02070902n), album (bn:00002488n), Pink Floyd (bn:03292767n), English (bn:00102248a), band (bn:00008280n). The nodes is, fifth, the, by stay as plain words. Edge labels: subj, comp, mod, det, prep, pobj]
Extraction 1: pattern through album (bn:00002488n), with arguments Atom Heart Mother (bn:02070902n) and Pink Floyd (bn:03292767n)
Extraction 2: arguments Atom Heart Mother (bn:02070902n) and album (bn:00002488n)
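The two extractions for the Atom Heart Mother definition can be reproduced with a small sketch. It assumes that a relation pattern is read off as the shortest path between two disambiguated nodes of the syntactic-semantic graph; the transcript does not state the exact criterion, and the attachment of each edge in `EDGES` is a reconstruction from the slide, not something the slides spell out. The function names are hypothetical.

```python
from collections import deque

# Edges of the syntactic-semantic graph for the example definition.
# ASSUMPTION: which node each edge attaches to is reconstructed from the slide.
EDGES = [
    ("is", "Atom Heart Mother", "subj"),
    ("is", "album", "comp"),
    ("album", "fifth", "mod"),
    ("album", "the", "det"),
    ("album", "by", "prep"),
    ("by", "Pink Floyd", "pobj"),
    ("Pink Floyd", "English", "mod"),
    ("Pink Floyd", "band", "mod"),
]


def build_adjacency(edges):
    """Treat the dependency edges as undirected and build an adjacency map."""
    adj = {}
    for head, dep, _label in edges:
        adj.setdefault(head, []).append(dep)
        adj.setdefault(dep, []).append(head)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns the list of nodes from source to target."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def relation_pattern(adj, x, y):
    """ASSUMPTION: the pattern is the inner part of the shortest path from x to y."""
    path = shortest_path(adj, x, y)
    if path is None:
        return None
    return " ".join(["X"] + path[1:-1] + ["Y"])


adj = build_adjacency(EDGES)
print(relation_pattern(adj, "Atom Heart Mother", "Pink Floyd"))  # X is album by Y
print(relation_pattern(adj, "Atom Heart Mother", "album"))       # X is Y
```

Modifiers such as "fifth" and "the" hang off the path rather than lying on it, which is one reason a parse-based pattern is less sparse than the raw surface string.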
Each relation is a set of ⟨domain argument, range argument⟩ pairs:
R1: ⟨Atom Heart Mother, album⟩, ⟨Pink Floyd, band⟩, ⟨Seattle, city⟩, …
R2: ⟨Atom Heart Mother, Pink Floyd⟩, ⟨Mutter, Rammstein⟩, ⟨Can’t Get Enough, Barry White⟩, …
[Figure: relation patterns for R1 and R2, including the node album (bn:00002488n)]
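A minimal sketch of how extractions could be grouped into relations with a domain and a range. The pattern strings used as keys ("X is Y", "X is album by Y") and the function names are illustrative assumptions; the argument pairs are the ones listed on the slide.

```python
from collections import defaultdict

# (pattern, first argument, second argument); pattern strings are illustrative.
EXTRACTIONS = [
    ("X is Y", "Atom Heart Mother", "album"),
    ("X is album by Y", "Atom Heart Mother", "Pink Floyd"),
    ("X is Y", "Pink Floyd", "band"),
    ("X is album by Y", "Mutter", "Rammstein"),
    ("X is Y", "Seattle", "city"),
    ("X is album by Y", "Can't Get Enough", "Barry White"),
]


def group_by_pattern(extractions):
    """Collect the argument pairs that share the same relation pattern."""
    relations = defaultdict(list)
    for pattern, x, y in extractions:
        relations[pattern].append((x, y))
    return dict(relations)


def domain_and_range(pairs):
    """Domain = set of first arguments, range = set of second arguments."""
    domain = {x for x, _ in pairs}
    rng = {y for _, y in pairs}
    return domain, rng


relations = group_by_pattern(EXTRACTIONS)
domain, rng = domain_and_range(relations["X is album by Y"])
print(sorted(domain))
print(sorted(rng))
```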
DefIE: How it works
2. Relation typing and scoring
For each relation R:
- Substitute each domain and range argument with its hypernym h (using the BabelNet taxonomy) and generate a probability distribution over semantic types for the two sets
- Compute the entropy of R [equation not preserved in the transcript]
- Compute the score of R [equation not preserved in the transcript] from:
  - the domain and range entropy of R
  - the length of the relation pattern of R
  - the total number of extracted instances for R
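The equations on these slides did not survive extraction, so the following is only a sketch of the idea. `type_entropy` is the standard Shannon entropy of the hypernym distribution. `relation_score` is a hypothetical combination of the three ingredients named on the slide (entropy, pattern length, number of instances), chosen so that frequent relations with short patterns and sharply typed arguments score higher; it is not claimed to be the formula used in DefIE.

```python
import math
from collections import Counter


def type_entropy(hypernyms):
    """Shannon entropy (in bits) of the distribution over semantic types."""
    counts = Counter(hypernyms)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def relation_score(domain_hypernyms, range_hypernyms, pattern_length):
    """HYPOTHETICAL score: instances / ((entropy + 1) * pattern length).

    Only the three ingredients come from the slide; the way they are
    combined here is an assumption made for illustration.
    """
    n_instances = len(domain_hypernyms)
    entropy = type_entropy(domain_hypernyms) + type_entropy(range_hypernyms)
    return n_instances / ((entropy + 1.0) * pattern_length)


# Toy hypernyms for three instances of a pattern of length 3 ("is album by").
sharp = relation_score(["album"] * 3, ["band"] * 3, pattern_length=3)
mixed = relation_score(["album", "city", "person"], ["band", "state", "company"], pattern_length=3)
print(sharp, mixed)
```

With the same number of instances and the same pattern length, the relation whose arguments fall into a single semantic type gets the higher score.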
DefIE: How it works
3. Relation taxonomization
Hypernym generalization: the pattern “is studio album by” (concept c_i) is generalized by “is album by” (concept c_j), since c_j = album appears among the hypernyms H(c_i): album, work of art, creation
Substring generalization: the pattern “is studio album by” (noun n_i) is generalized by “is album by” (noun n_j), since n_j = album is the head of n_i = studio album (modifier: studio, head: album)
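Both generalization tests can be sketched on patterns represented as token tuples. The `HYPERNYMS` table is toy data standing in for the BabelNet taxonomy, and both function names are hypothetical; the sketch also assumes head-final English compounds, so the head of "studio album" is its last word.

```python
# Toy hypernym chains standing in for the BabelNet taxonomy (hypothetical data).
HYPERNYMS = {
    "studio album": ["album", "work of art", "creation"],
    "album": ["work of art", "creation"],
}


def single_difference(general, specific):
    """Return the one differing (general, specific) slot, or None."""
    if len(general) != len(specific):
        return None
    diffs = [(g, s) for g, s in zip(general, specific) if g != s]
    return diffs[0] if len(diffs) == 1 else None


def hypernym_generalizes(general, specific, hypernyms=HYPERNYMS):
    """True if `general` is `specific` with one concept replaced by a hypernym."""
    diff = single_difference(general, specific)
    return diff is not None and diff[0] in hypernyms.get(diff[1], [])


def substring_generalizes(general, specific):
    """True if `general` is `specific` with one compound reduced to its head.

    ASSUMPTION: compounds are head-final, so the head is the last word.
    """
    diff = single_difference(general, specific)
    if diff is None:
        return False
    words = diff[1].split()
    return len(words) > 1 and words[-1] == diff[0]


specific = ("is", "studio album", "by")
general = ("is", "album", "by")
print(hypernym_generalizes(general, specific))   # True
print(substring_generalizes(general, specific))  # True
```

The two tests are complementary: the hypernym test needs the concept to be disambiguated and present in the taxonomy, while the substring test works on the surface form alone.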
DefIE: Setup
Dataset: whole set of English textual definitions in BabelNet 2.5
4 357 327 items from 5 different sources (Wikipedia, WordNet, Wikidata, Wiktionary, OmegaWiki)
DefIE: Results

| | DefIE | NELL | PATTY | ReVerb | WiSeNet |
|---|---|---|---|---|---|
| # Relations | 255 881 | 298 | 1 631 531 | 664 746 | 245 935 |
| Avg. extractions | 81.68 | 7 013.03 | 9.68 | 22.16 | 9.24 |
| # Extractions | 20 352 903 | 2 089 883 | 15 802 946 | 14 728 268 | 2 271 807 |
| # Entities | 2 398 982 | 1 996 021 | 1 087 907 | 3 327 425 | 1 636 307 |
| # Edges in the taxonomy | 44 412 | - | 20 339 | - | - |
DefIE: Results
Other evaluations:
- Precision and coverage of relations
- Novelty of information
- Quality of relation taxonomization
- Quality of entity linking/disambiguation
- Impact of definition sources
…
DefIE: Future work
Where from here?
- Relation clustering (as in PATTY and WiSeNet)
- Multilinguality
- Relational learning and KB completion
- Harvest definitions from the web
- Adapt to “general” text
…
KB-Unify: Knowledge base unification via sense embeddings and disambiguation
The idea:
[Figure: Open Information Extraction systems split into linked resources (PATTY, WiSeNet, ...) and unlinked resources (NELL, ReVerb, ...)]
The tools:
- A WSD/EL system (to disambiguate unlinked resources): Babelfy
- A unified sense inventory S (to make the various resources “speak to each other”): BabelNet
- A unified vector space V_S (to associate a vector with each item of S): SensEmbed (Iacobacci et al., 2015), a sense-based embedding model that trains the popular word2vec architecture (skip-gram) on a sense-annotated corpus
KB-Unify: How it works
A bird’s-eye view:
- use BabelNet mappings to redefine each linked resource
- disambiguate each unlinked resource using BabelNet as sense inventory (more on this later!)
KB-Unify: How it works
Disambiguation
Two basic intuitions:
1. Among all triples in the target knowledge base, some (even if ambiguous) will be easier to disambiguate, e.g. ⟨Armstrong, works for, NASA⟩;
2. In general, the disambiguation strategy should vary according to the degree of specificity of each relation.
Disambiguation
Group the set of unlinked triples by relation.
For each relation r:
● Extract and disambiguate a subset of high-confidence seed argument pairs for r;
● Estimate the specificity of r by looking at the distribution of its disambiguated seeds in the vector space V_S;
● Disambiguate the remaining argument pairs of r with Babelfy, either triple-by-triple (if r is general) or all at once (if r is specific).
Identifying seed argument pairs
[Figure: seed selection by disambiguation confidence; labels: Disambiguation Confidence, ζ_dis, Seed]
![Page 60: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/60.jpg)
Ranking relations by specificity
Domain/Range Centroids
Domain/Range Variances
KB-Unify: How it works
![Page 61: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/61.jpg)
Ranking relations by specificity
Domain/Range Centroids
Domain/Range Variances
Specificity threshold: $\delta_{spec}$
KB-Unify: How it works
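The ranking can be sketched as follows, under stated assumptions: the slide names domain/range centroids and variances but does not give formulas, so here the variance is taken to be the mean squared cosine distance of the seed vectors from their centroid, and a relation's generality is the average of its domain and range variances. All function names are hypothetical, and every input list is assumed to be non-empty.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def variance(vectors: List[Vector]) -> float:
    """Mean squared cosine distance from the centroid (assumed definition)."""
    mu = centroid(vectors)
    return sum((1.0 - cosine(v, mu)) ** 2 for v in vectors) / len(vectors)


def generality(domain_vectors: List[Vector], range_vectors: List[Vector]) -> float:
    """Average of domain and range variances: higher means more general."""
    return (variance(domain_vectors) + variance(range_vectors)) / 2.0


def is_specific(domain_vectors: List[Vector], range_vectors: List[Vector],
                delta_spec: float) -> bool:
    """A relation counts as specific when its generality is below delta_spec."""
    return generality(domain_vectors, range_vectors) < delta_spec


tight = [[1.0, 0.0], [1.0, 0.0]]    # arguments clustered around one sense
spread = [[1.0, 0.0], [0.0, 1.0]]   # arguments scattered in the space
```

With these toy vectors, a relation whose arguments are tightly clustered gets a lower generality score than one whose arguments are scattered, so it ranks as more specific.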
![Page 62: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/62.jpg)
Disambiguation with Relation Context
Diagram: unlinked triples → specificity ranking (using the disambiguated seeds) → split at threshold $\delta_{spec}$ → Babelfy: general relations get triple-by-triple disambiguation, specific relations get relation-by-relation disambiguation.
KB-Unify: How it works
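The two disambiguation strategies can be sketched as a batching step, assuming a per-relation generality score is already available and that relations scoring below $\delta_{spec}$ count as specific. The `plan_disambiguation` helper, the `Triple` alias, and the example scores are hypothetical; the actual call to a disambiguator such as Babelfy is deliberately left out.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object), all unlinked strings


def plan_disambiguation(triples_by_relation: Dict[str, List[Triple]],
                        generality: Dict[str, float],
                        delta_spec: float) -> List[List[Triple]]:
    """Group triples into batches to be disambiguated together.

    Specific relations yield one batch with all their triples, so the
    arguments can serve as context for each other; general relations
    yield one batch per triple.
    """
    batches: List[List[Triple]] = []
    for relation, triples in triples_by_relation.items():
        if generality[relation] < delta_spec:
            batches.append(list(triples))          # relation-by-relation
        else:
            batches.extend([t] for t in triples)   # triple-by-triple
    return batches


triples_by_relation = {
    "is a": [("Rome", "is a", "city"), ("jazz", "is a", "genre")],
    "is capital of": [("Rome", "is capital of", "Italy"),
                      ("Paris", "is capital of", "France")],
}
# Example scores and threshold are arbitrary illustrative values.
generality_scores = {"is a": 0.5, "is capital of": 0.01}
batches = plan_disambiguation(triples_by_relation, generality_scores, delta_spec=0.1)
```

Here the general relation "is a" produces two single-triple batches, while the specific relation "is capital of" produces one batch holding both of its triples.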
![Page 63: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/63.jpg)
A bird’s-eye view
KB-Unify: How it works
![Page 64: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/64.jpg)
Relation alignment
KB-Unify: How it works
![Page 69: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/69.jpg)
Evaluation
Experimental setup:
KB-Unify: Experiments
![Page 70: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/70.jpg)
Disambiguation
KB-Unify: Experiments
![Page 71: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/71.jpg)
Specificity ranking
KB-Unify: Experiments
![Page 74: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/74.jpg)
Cross-resource relation alignment
KB-Unify: Experiments
![Page 76: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/76.jpg)
KB-Unify: Future work
Where from here?
- Less “naïve” relation alignment procedure
- Iterative algorithm for disambiguation and alignment (EM-style)
- Unify OIE-based KBs with hand-curated resources (Wikidata, DBpedia, etc.)
…
![Page 77: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/77.jpg)
Wrap up and Conclusion
![Page 79: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/79.jpg)
DefIE: A full-fledged OIE pipeline targeted to textual definitions, with explicit semantic characterization of both arguments and relation patterns
KB-Unify: An approach to knowledge base disambiguation and unification based on a shared sense inventory and a sense-based vector space model
Wrap up and Conclusion
![Page 80: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/80.jpg)
Take-home message(s):
Web-scale OIE is absolutely great, but…
1. Definitional knowledge is important: sometimes it is worth stepping back and analyzing where valuable information is extracted from (quality vs. quantity)
2. Making sense of the output is important: semantic analysis can be used to let different OIE outputs “speak to each other” and benefit from mutual enrichment
Wrap up and Conclusion
![Page 83: (Open) Information Extraction: Where are we going?dellibovi/talks/talk_OIE_AI2.pdf · 2016-07-23 · (Open) Information Extraction: Where are we going? Claudio Delli Bovi July 18th,](https://reader036.vdocuments.us/reader036/viewer/2022070612/5b6b45027f8b9a60188d41bb/html5/thumbnails/83.jpg)
Thank you!
xkcd, “Extended Mind”