![Page 1: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/1.jpg)
T. Flati, D. Vannella, T. Pasini, R. Navigli
2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project
ERC Starting Grant MultiJEDI No. 259234
![Page 2: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/2.jpg)
The Wikipedia structure
Article pages: ~4M
Category pages: ~700K
Two noisy graphs with no explicit hypernym relation.
![Page 3: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/3.jpg)
The Wikipedia structure: an example
[Figure: two linked graphs, Pages and Categories. Node labels:]
Mickey Mouse
Funny Animal
Superman
Cartoon
Donald Duck
Disney comics characters
Disney comics
Disney character
Fictional characters by medium
Comics by genre
Fictional characters
The Walt Disney Company
![Page 4: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/4.jpg)
Our goal
To automatically create a Wikipedia Bitaxonomy for Wikipedia pages and categories in a
simultaneous fashion.
pages categories
![Page 5: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/5.jpg)
KEY IDEA: The page and category levels are mutually beneficial for inducing a wide-coverage, fine-grained integrated taxonomy.
![Page 6: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/6.jpg)
Key idea
[Figure: the Pages and Categories graphs, now connected by is-a edges. Node labels:]
Disney comics characters
Disney comics
Disney character
The Walt Disney Company
Fictional characters by medium
Comics by genre
Fictional characters
Mickey Mouse
Funny Animal
Superman
Cartoon
Donald Duck
![Page 7: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/7.jpg)
A 3-phase method
pages categories
Starting from two noisy graphs
![Page 8: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/8.jpg)
A 3-phase method
1. Build the page taxonomy
pages
![Page 9: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/9.jpg)
A 3-phase method
1. Build the page taxonomy
2. Bitaxonomy Algorithm
pages categories
![Page 10: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/10.jpg)
A 3-phase method
1. Build the page taxonomy
2. Bitaxonomy Algorithm
pages categories
![Page 11: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/11.jpg)
A 3-phase method
1. Build the page taxonomy
2. Bitaxonomy Algorithm
3. Refine the category taxonomy (+50% categories)
pages categories
![Page 12: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/12.jpg)
Contributions
1. Self-contained approach
2. Page taxonomy and category taxonomy built simultaneously
3. State-of-the-art results when compared to all other available taxonomies
![Page 13: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/13.jpg)
1. The WiBi Page taxonomy
![Page 14: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/14.jpg)
Assumptions
• The first sentence of a page is a good definition (also called gloss)
![Page 15: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/15.jpg)
The WiBi Page taxonomy
1. [Syntactic step] Extract the hypernym lemma from a page definition using a syntactic parser;
2. [Semantic step] Apply a set of linking heuristics to disambiguate the extracted lemma.
Example: Scrooge McDuck is a character […]
Syntactic step (dependency labels shown in the figure: nn, nsubj, cop)
Hypernym lemma: character
Semantic step: Scrooge McDuck is a character […] (the extracted lemma is disambiguated)
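As a rough illustration of the syntactic step, the sketch below picks the hypernym lemma out of a hand-written dependency parse of the example gloss. The edge list, the `hypernym_lemma` helper and the rule "take the token that governs both an `nsubj` and a `cop` edge" are assumptions made for illustration; the slides only state that a syntactic parser is used.

```python
# Toy dependency parse of "Scrooge McDuck is a character", written by hand
# as (relation, governor, dependent) triples. A real system would obtain
# these from a syntactic parser; the "det" edge is added for completeness.
EDGES = [
    ("nn", "McDuck", "Scrooge"),
    ("nsubj", "character", "McDuck"),
    ("cop", "character", "is"),
    ("det", "character", "a"),
]


def hypernym_lemma(edges):
    """Return the token that governs both a subject and a copula, if any.

    Assumed rule for copular definitions ("X is a Y"): Y is the governor
    of the nsubj and cop relations. Here the token equals its lemma.
    """
    subject_heads = {gov for rel, gov, _ in edges if rel == "nsubj"}
    copula_heads = {gov for rel, gov, _ in edges if rel == "cop"}
    heads = subject_heads & copula_heads
    return next(iter(heads)) if heads else None


lemma = hypernym_lemma(EDGES)
print(lemma)
```

Glosses that are not simple copular sentences would need further rules, which this sketch does not attempt.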
![Page 16: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/16.jpg)
The semantic step
5 cascading linking heuristics:
1. Crowdsourced
2. Category
3. Multiword
4. Monosemous
5. Distributional
Example: ambiguous hypernym (‘player’) + target page (Cristiano Ronaldo) → linking heuristic → disambiguated hypernym (Football player)
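A cascade of this kind can be sketched as trying each heuristic in order and stopping at the first one that returns a sense. The function names and the two stub heuristics below are hypothetical; in particular, the slides do not say which heuristic resolves the Cristiano Ronaldo example, so the stub answer is hard-coded purely for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A heuristic takes (page title, ambiguous lemma) and returns a sense or None.
Heuristic = Callable[[str, str], Optional[str]]


def link_hypernym(page: str, lemma: str,
                  cascade: List[Tuple[str, Heuristic]]) -> Optional[Tuple[str, str]]:
    """Apply heuristics in order; return (heuristic name, sense) of the first hit."""
    for name, heuristic in cascade:
        sense = heuristic(page, lemma)
        if sense is not None:
            return name, sense
    return None  # no heuristic could disambiguate the lemma


# --- Hypothetical stubs, standing in for the real heuristics ---------------
def crowdsourced_stub(page: str, lemma: str) -> Optional[str]:
    return None  # pretend the gloss contains no usable link


def category_stub(page: str, lemma: str) -> Optional[str]:
    # Hard-coded answer for the slide's example (illustration only).
    if (page, lemma) == ("Cristiano Ronaldo", "player"):
        return "Football player"
    return None


CASCADE = [("crowdsourced", crowdsourced_stub), ("category", category_stub)]
result = link_hypernym("Cristiano Ronaldo", "player", CASCADE)
print(result)
```

Ordering matters in such a design: earlier heuristics are consulted first, and later ones only see the cases left unresolved.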
![Page 17: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/17.jpg)
1. Crowdsourced heuristic
Mickey Mouse is a funny animal cartoon character and the official mascot of The Walt Disney Company.
Use the links from the crowd!
![Page 18: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/18.jpg)
2. Category heuristic
Given a page and its ambiguous hypernym, exploit its categories to build a distribution of the hypernym’s senses.
Ambiguous hypernym: Character
Categories: Characters in Disney package films; Disney comics characters
Pages: Donald Duck, Pluto, Hook, Mickey Mouse, José Carioca, Goofy
![Page 19: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/19.jpg)
2. Category heuristic
Given a page and its ambiguous hypernym, exploit its categories to build a distribution of the hypernym’s senses.
Ambiguous hypernym: Character
Categories: Characters in Disney package films; Disney comics characters
Pages: Donald Duck, Pluto, Hook, Mickey Mouse, José Carioca, Goofy
Glosses shown in the figure:
Goofy is a funny animal cartoon character […]
José Carioca is a Disney cartoon character […]
Captain James Hook is a fictional character […]
Mickey Mouse is a funny animal cartoon character […]
Pluto, also called Pluto the Pup, is a cartoon character […]
Mickey Mouse is a funny animal cartoon character […]
![Page 20: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/20.jpg)
2. Category heuristic
Given a page and its ambiguous hypernym, exploit its categories to build a distribution of the hypernym’s senses.
Page: Donald Duck
Ambiguous hypernym: Character
Categories: Characters in Disney package films; Disney comics characters
Glosses shown in the figure:
Goofy is a funny animal cartoon character […]
José Carioca is a Disney cartoon character […]
Captain James Hook is a fictional character […]
Mickey Mouse is a funny animal cartoon character […]
Pluto, also called Pluto the Pup, is a cartoon character […]
Mickey Mouse is a funny animal cartoon character […]
Sense counts, one row per category:
Character (arts) 5, Funny animal 1
Character (arts) 3, Funny animal 1, Cartoon 1
Total: Character (arts) 8, Funny animal 2, Cartoon 1
![Page 21: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/21.jpg)
2. Category heuristic
Given a page and its ambiguous hypernym, exploit its categories to build a distribution of the hypernym’s senses.
Page: Donald Duck
Ambiguous hypernym: Character
Distribution: Character (arts) 8, Funny animal 2, Cartoon 1
Result: Character (arts)
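The arithmetic of the Donald Duck example can be reproduced in a few lines: the per-category sense counts are summed into one distribution and the most frequent sense is taken. The counts are those shown on the slides; the variable names are illustrative, and how the per-category counts are gathered from the glosses is left out of this sketch.

```python
from collections import Counter

# Sense counts for the ambiguous hypernym "Character", one Counter per
# category of the target page (numbers taken from the slides).
per_category = [
    Counter({"Character (arts)": 5, "Funny animal": 1}),
    Counter({"Character (arts)": 3, "Funny animal": 1, "Cartoon": 1}),
]

# Sum the per-category counts into a single distribution over senses.
distribution = sum(per_category, Counter())

# The sense with the highest count is chosen as the disambiguated hypernym.
best_sense, best_count = distribution.most_common(1)[0]
print(distribution)
print(best_sense, best_count)
```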
![Page 22: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/22.jpg)
Page taxonomy linking heuristics
1. Crowdsourced (1.338M)
2. Category (1.603M)
3. Multiword (65K)
4. Monosemous (161K)
5. Distributional (561K)
![Page 23: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/23.jpg)
Page taxonomy evaluation
![Page 24: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/24.jpg)
The story so far
1. Noisy page graph → Page taxonomy
![Page 25: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/25.jpg)
2. The Bitaxonomy algorithm
![Page 26: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/26.jpg)
The Bitaxonomy algorithm
The information available in the two taxonomies is mutually beneficial;
● At each step exploit one taxonomy to update the other and vice versa;
● Repeat until convergence.
![Page 27: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/27.jpg)
The Bitaxonomy algorithm
Starting from the page taxonomy
[Figure: pages and categories. Node labels: Real Madrid F.C., Atlético Madrid, Football team, Football teams, Football clubs in Madrid, Football clubs. One is-a edge.]
![Page 28: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/28.jpg)
The Bitaxonomy algorithm
Exploit the cross links to infer hypernym relations in the category taxonomy
[Figure: pages and categories. Node labels: Real Madrid F.C., Atlético Madrid, Football team, Football teams, Football clubs in Madrid, Football clubs. Two is-a edges.]
![Page 29: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/29.jpg)
The Bitaxonomy algorithm
Take advantage of cross links to infer back is-a relations in the page taxonomy
[Figure: pages and categories. Node labels: Real Madrid F.C., Atlético Madrid, Football team, Football teams, Football clubs in Madrid, Football clubs. Three is-a edges.]
![Page 30: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/30.jpg)
The Bitaxonomy algorithm
Use the relations found in the previous step to infer new hypernym edges
[Figure: pages and categories. Node labels: Real Madrid F.C., Atlético Madrid, Football team, Football teams, Football clubs in Madrid, Football clubs. Four is-a edges.]
![Page 31: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/31.jpg)
The Bitaxonomy algorithm
Mutual enrichment of both taxonomies until convergence
[Figure: pages and categories. Node labels: Real Madrid F.C., Atlético Madrid, Football team, Football teams, Football clubs in Madrid, Football clubs. Four is-a edges.]
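The alternation described on these slides can be sketched as a loop that updates one taxonomy from the other until nothing changes. This is a heavily simplified illustration, not the authors' algorithm: the voting rule, the `update` helper, and the page-to-category cross links assigned below are assumptions read off the football figure.

```python
from collections import Counter


def update(targets, target_hyper, links, source_hyper, back_links):
    """Assign a hypernym to each uncovered target node by voting.

    For every node linked to the target on the other side, take that node's
    hypernym and project it back through the cross links; the most voted
    candidate becomes the target's hypernym. Returns True if anything changed.
    """
    changed = False
    for node in targets:
        if node in target_hyper:
            continue
        votes = Counter()
        for other in links.get(node, ()):
            hyper = source_hyper.get(other)
            if hyper is None:
                continue
            for candidate in back_links.get(hyper, ()):
                if candidate != node:
                    votes[candidate] += 1
        if votes:
            target_hyper[node] = votes.most_common(1)[0][0]
            changed = True
    return changed


pages = ["Real Madrid F.C.", "Atlético Madrid", "Football team"]
categories = ["Football clubs in Madrid", "Football teams", "Football clubs"]

# Cross links between the two sides (assumed from the figure).
page2cat = {
    "Real Madrid F.C.": ["Football clubs in Madrid"],
    "Atlético Madrid": ["Football clubs in Madrid"],
    "Football team": ["Football teams"],
}
cat2page = {}
for page, cats in page2cat.items():
    for cat in cats:
        cat2page.setdefault(cat, []).append(page)

page_hyper = {"Real Madrid F.C.": "Football team"}  # starting page taxonomy
cat_hyper = {}                                       # empty category taxonomy

changed = True
while changed:  # repeat until convergence
    cats_changed = update(categories, cat_hyper, cat2page, page_hyper, page2cat)
    pages_changed = update(pages, page_hyper, page2cat, cat_hyper, cat2page)
    changed = cats_changed or pages_changed

print(cat_hyper)
print(page_hyper)
```

With these toy links, the first pass infers that Football clubs in Madrid is a kind of Football teams, and the page pass then infers back that Atlético Madrid is a Football team; the second pass finds nothing new and the loop stops.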
![Page 32: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/32.jpg)
Page taxonomy evaluation (cont’d)
Appreciable 3% increase in recall and coverage, with unchanged precision
![Page 33: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/33.jpg)
Category taxonomy evaluation
![Page 34: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/34.jpg)
The story so far
2
![Page 35: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/35.jpg)
3. The WiBi category taxonomy refinement
![Page 36: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/36.jpg)
Category taxonomy refinement
Some categories are affected by structural problems.
[Figure: pages and categories. Category labels: Comics characters by protagonist, Comics characters, Garfield characters. Callout: "No pages associated!"]
![Page 37: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/37.jpg)
Category taxonomy refinement
● 3 refinement procedures to obtain broader coverage for categories
  o Single super category
  o Sub-categories
  o Super-categories
![Page 38: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/38.jpg)
Single super category
This category has only 1 outgoing edge, so we promote its only super category to hypernym.
[Figure: category labels: Comics characters by protagonist, Comics characters, Garfield characters, Animated television characters by series, Animated characters, Fictional characters by medium, Animation]
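The single-super-category rule lends itself to a very small sketch: any category that still lacks a hypernym and has exactly one super category gets that super category as its hypernym. The function name is hypothetical, the edge for "Comics characters by protagonist" is an assumed reading of the figure, and the second entry is a made-up placeholder showing a category the rule leaves alone.

```python
def promote_single_super(super_cats, hypernym):
    """Return new hypernyms for uncovered categories with exactly one super category."""
    promoted = {}
    for category, supers in super_cats.items():
        if category not in hypernym and len(supers) == 1:
            promoted[category] = supers[0]
    return promoted


# Super-category edges of the (noisy) category graph.
super_cats = {
    # Assumed from the slide: a single outgoing edge.
    "Comics characters by protagonist": ["Comics characters"],
    # Hypothetical placeholder: two outgoing edges, so the rule does not apply.
    "Placeholder category": ["Parent A", "Parent B"],
}
hypernym = {}  # no category covered yet in this toy example

promoted = promote_single_super(super_cats, hypernym)
print(promoted)
```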
![Page 39: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/39.jpg)
Sub-categories
Focus on subcategories which have already been covered!
[Figure: category labels: Comics characters by company, Disney comics, Comics by company, Comics characters, DC Comics characters, Marvel Comics characters, Comics titles by company]
![Page 40: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/40.jpg)
Sub-categories
Focus on subcategories which have already been covered!
[Figure: category labels: Comics characters by company, Disney comics, Comics by company, Comics characters, DC Comics characters, Marvel Comics characters, Comics titles by company. Callouts: "Only 1 path ending in u"; "2 paths ending in v"]
![Page 41: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/41.jpg)
Category taxonomy evaluation: coverage
+50% categories covered!
[Figure: coverage after each refinement step; axis labels: 1SUP, SUB, SUPER]
![Page 42: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/42.jpg)
Category taxonomy evaluation: P & R
[Figure: precision and recall across steps; axis labels: Iterations, 1SUP, SUB, SUPER. Callouts: +35% recall; 86%]
![Page 43: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/43.jpg)
Experimental setup
● We created 2 datasets:
  o 1000 randomly sampled pages;
  o 1000 randomly sampled categories.
● Each item was annotated with the most suitable generalization (lemma+page or category).
![Page 44: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/44.jpg)
Competitors
WikiNet
MENTA
WikiTaxonomy
pages categories
![Page 45: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/45.jpg)
Measures
● We calculated typical measures to assess the quality of all the possible taxonomies:
  o Precision
  o Recall
  o Coverage
  o Specificity
  o Granularity
![Page 46: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/46.jpg)
Page taxonomy comparison
![Page 47: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/47.jpg)
Page taxonomy comparison
![Page 48: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/48.jpg)
Category taxonomy comparison
![Page 49: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/49.jpg)
Category taxonomy comparison
![Page 50: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/50.jpg)
Category taxonomy comparison
Specificity measure
![Page 51: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/51.jpg)
Measuring specificity
A system is more specific than another when the hypernym(s) provided by the former are more specific/informative than those provided by the latter.
Example, “Frank Sinatra is a”: System 1 answers “Singer”, System 2 answers “Swing singer”; “Singer” is less specific than “Swing singer”.
![Page 52: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/52.jpg)
Page taxonomy specificity
Ratio of the times in which WiBi provided a more specific answer than the other system
![Page 53: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/53.jpg)
Page taxonomy specificity
Ratio of the times in which WiBi provided a less specific answer than the other system
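The two ratios can be computed from per-item comparisons as in the sketch below. The judgment labels, the "equal" outcome, the choice of denominator (all compared items) and the toy data are all assumptions for illustration; the slides do not spell out these details, and the numbers are not results from the evaluation.

```python
from collections import Counter


def specificity_ratios(judgments):
    """Fraction of items on which one system was more, less, or equally specific."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    counts = Counter(judgments)
    total = len(judgments)
    return {label: counts[label] / total for label in ("more", "less", "equal")}


# Made-up judgments: one label per item, comparing system A against system B.
judgments = ["more", "more", "equal", "less"]
ratios = specificity_ratios(judgments)
print(ratios)
```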
![Page 54: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/54.jpg)
Category taxonomy specificity
![Page 55: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/55.jpg)
Measuring granularity
pages categories
![Page 56: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/56.jpg)
Conclusions
● Unified, 3-phase approach to the construction of a bitaxonomy for the English Wikipedia;
● Self-contained, no additional resources or supervision required;
● Nearly full coverage of Wikipedia pages and categories;
● State-of-the-art performance both on pages and categories.
wibitaxonomy.org
![Page 57: 2 Is Bigger (and Better) Than 1: the Wikipedia Bitaxonomy Project](https://reader036.vdocuments.us/reader036/viewer/2022062321/56813370550346895d9a8591/html5/thumbnails/57.jpg)
Tiziano Flati, Daniele Vannella, Tommaso Pasini, Roberto Navigli
Linguistic Computing Laboratory
lcl.uniroma1.it