Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison
Alessandro Raganato, José Camacho Collados and Roberto Navigli
lcl.uniroma1.it/wsdeval
Word Sense Disambiguation (WSD)
Given the word in context, find the correct sense:
○ The mouse ate the cheese.
○ A mouse consists of an object held in one's hand, with one or more buttons.
International Workshops on Semantic Evaluation
Many evaluation datasets have been constructed for the task:
○ Senseval 2 (2001), WN 1.7
○ Senseval 3 (2004), WN 1.7.1
○ SemEval 2007, WN 2.1
○ SemEval 2013, WN 3.0
○ SemEval 2015, WN 3.0
Problem:
● different formats, construction guidelines and sense inventories
Building a Unified Evaluation Framework
Our goal:
○ build a unified framework for all-words WSD (training and testing)
○ use this evaluation framework to perform a fair quantitative and qualitative empirical comparison
How:
○ standardizing the WSD datasets and training corpora into a unified format
○ semi-automatically converting annotations from any dataset to WordNet 3.0
○ preprocessing the datasets by consistently using the same pipeline
Pipeline for standardizing any given WSD dataset:
Standardizing format:
○ convert all datasets to a unified XML scheme, where preprocessing information (e.g. lemma, PoS tag) of a given corpus can be encoded
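To make the idea of a unified XML scheme concrete, here is a minimal Python sketch that reads a small hypothetical fragment. The element and attribute names (`corpus`, `text`, `sentence`, `wf`, `instance`, `id`, `lemma`, `pos`) and the sample content are assumptions made for this example; the slides do not specify the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag and attribute names are illustrative assumptions,
# not the schema used by the framework.
SAMPLE = """
<corpus lang="en" source="example">
  <text id="d000">
    <sentence id="d000.s000">
      <wf lemma="the" pos="DET">The</wf>
      <instance id="d000.s000.t000" lemma="mouse" pos="NOUN">mouse</instance>
      <instance id="d000.s000.t001" lemma="eat" pos="VERB">ate</instance>
      <wf lemma="the" pos="DET">the</wf>
      <instance id="d000.s000.t002" lemma="cheese" pos="NOUN">cheese</instance>
    </sentence>
  </text>
</corpus>
"""


def read_instances(xml_text):
    """Return (instance id, lemma, PoS, surface form) for each word to disambiguate."""
    root = ET.fromstring(xml_text)
    return [
        (node.get("id"), node.get("lemma"), node.get("pos"), node.text)
        for node in root.iter("instance")
    ]


instances = read_instances(SAMPLE)
for row in instances:
    print(row)
```

Keeping lemma and PoS tag as attributes of each token means every system can consume the same preprocessing instead of running its own.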
WN version mapping:
○ map the sense annotations from their original WordNet version to 3.0
● carried out semi-automatically (Daude et al., 2003)
Jordi Daude, Lluis Padro, and German Rigau. Validation and tuning of wordnet mapping techniques. In Proceedings of RANLP 2003.
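A semi-automatic mapping step can be pictured as follows: annotations with exactly one target sense are converted automatically, and everything else is queued for manual review. This is only a sketch under assumptions; the mapping table, the sense identifiers and the function name `map_annotations` are made-up placeholders, not the actual mappings of Daude et al. or real WordNet identifiers.

```python
# Hypothetical mapping from an older sense inventory to WordNet 3.0.
# Keys and values are invented placeholders, not real sense identifiers.
OLD_TO_WN30 = {
    "old:mouse#1": ["wn30:mouse#1"],
    "old:mouse#2": ["wn30:mouse#4"],
    "old:keep#3": ["wn30:keep#2", "wn30:keep#7"],  # ambiguous: two candidates
}


def map_annotations(annotations, mapping):
    """Split annotations into automatically mapped ones and ones needing review."""
    mapped, to_review = {}, []
    for instance_id, old_sense in annotations.items():
        candidates = mapping.get(old_sense, [])
        if len(candidates) == 1:
            mapped[instance_id] = candidates[0]
        else:
            # Zero or several candidates: leave the decision to a human annotator.
            to_review.append((instance_id, old_sense, candidates))
    return mapped, to_review


annotations = {"t0": "old:mouse#1", "t1": "old:keep#3", "t2": "old:bank#1"}
mapped, to_review = map_annotations(annotations, OLD_TO_WN30)
print(mapped)
print(to_review)
```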
Preprocessing:
○ use the Stanford CoreNLP toolkit for part-of-speech tagging and lemmatization
Semi-automatic verification:
○ develop a script to check that the final dataset conforms to the guidelines
○ ensure that the sense annotations match the lemma and the PoS tag provided by Stanford CoreNLP
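One check such a script might perform is sketched below. It relies on the WordNet sense-key layout, in which the part before `%` is the lemma and the first field after it is the synset type (1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite). The coarse tag names (`NOUN`, `VERB`, `ADJ`, `ADV`), the function name `check_instance` and the example keys are assumptions for illustration; the authors' actual script is not shown in the slides.

```python
# Synset-type digit in a WordNet sense key mapped to a coarse PoS tag.
# The coarse tag names are an assumption of this sketch.
SS_TYPE_TO_POS = {"1": "NOUN", "2": "VERB", "3": "ADJ", "4": "ADV", "5": "ADJ"}


def check_instance(lemma, pos, sense_key):
    """Return a list of problems for one annotated instance (empty if consistent)."""
    problems = []
    key_lemma, _, rest = sense_key.partition("%")
    ss_type = rest.split(":")[0] if rest else ""
    # Multiword lemmas are written with underscores in sense keys.
    if key_lemma != lemma.lower().replace(" ", "_"):
        problems.append(f"lemma mismatch: {lemma!r} vs {key_lemma!r}")
    if SS_TYPE_TO_POS.get(ss_type) != pos:
        problems.append(f"PoS mismatch: {pos!r} vs synset type {ss_type!r}")
    return problems


# Example keys follow the sense-key layout; they are used here only as strings.
print(check_instance("mouse", "NOUN", "mouse%1:05:00::"))
print(check_instance("mouse", "VERB", "mouse%1:05:00::"))
```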
Data - evaluation framework
● Training data:
○ SemCor, a manually sense-annotated corpus
○ OMSTI (One Million Sense-Tagged Instances), a large annotated corpus, automatically constructed by using an alignment-based WSD approach
● Testing data:
○ Senseval 2, covers nouns, verbs, adverbs and adjectives
○ Senseval 3, covers nouns, verbs, adverbs and adjectives
○ SemEval 2007, covers nouns and verbs
○ SemEval 2013, covers nouns only
○ SemEval 2015, covers nouns, verbs, adverbs and adjectives
○ ALL, the concatenation of all five test datasets
Statistics - training data
○ SemCor: 226,036 annotations, 33,362 sense types, 22,436 word types, ambiguity 6.8
○ OMSTI: 911,134 annotations, 3,730 sense types, 1,149 word types, ambiguity 8.9
Statistics - testing data
○ Senseval 2: 2,282 annotations, ambiguity 5.4
○ Senseval 3: 1,850 annotations, ambiguity 6.8
○ SemEval 2007: 455 annotations, ambiguity 8.5
○ SemEval 2013: 1,644 annotations, ambiguity 4.9
○ SemEval 2015: 1,022 annotations, ambiguity 5.5
Statistics - testing data (ALL)
○ ALL, the concatenation of all five evaluation datasets
■ Total test instances: 7,253
■ Nouns: 4,300 instances, ambiguity 4.8
■ Verbs: 1,652 instances, ambiguity 10.4
■ Adjectives: 955 instances, ambiguity 3.8
■ Adverbs: 346 instances, ambiguity 3.1
Evaluation
Evaluation: Comparison systems
● Knowledge-based
○ Lesk_extended (Banerjee and Pedersen, 2003)
○ Lesk+emb (Basile et al., 2014)
○ UKB (Agirre et al., 2014)
○ Babelfy (Moro et al., 2014)
● Supervised
Evaluation: Comparison systems (knowledge-based)
Lesk (Lesk, 1986)
Based on the overlap between the definitions of a given sense and the context of the target word. Two configurations:
- Lesk_extended (Banerjee and Pedersen, 2003): it includes related senses and tf-idf for word weighting.
- Lesk+emb (Basile et al., 2014): enhanced version of Lesk in which similarity between definitions and the target context is computed via word embeddings.
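The overlap idea behind Lesk can be sketched in a few lines of Python. This is a simplified variant, not Lesk_extended or Lesk+emb: it counts shared content words between the context and each definition, with no related senses, tf-idf weighting or embeddings. The sense labels, glosses and stopword list are invented for the example.

```python
import re

# Small illustrative stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "or", "and",
             "one", "that", "is", "to"}


def tokens(text):
    """Lowercased content words of a text, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, glosses):
    """Pick the sense whose gloss shares the most content words with the context."""
    ctx = tokens(context)
    return max(glosses, key=lambda sense: len(ctx & tokens(glosses[sense])))


# Made-up glosses, not WordNet definitions.
GLOSSES = {
    "mouse_animal": "small rodent that eats grain and cheese",
    "mouse_device": "hand-operated device with buttons that controls a cursor",
}

print(simplified_lesk("The mouse ate the cheese.", GLOSSES))
print(simplified_lesk(
    "A mouse consists of an object held in one's hand, with one or more buttons.",
    GLOSSES,
))
```

Exact word overlap is brittle ("ate" does not match "eats"), which is one motivation for replacing it with embedding similarity as in Lesk+emb.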
UKB (Agirre et al., 2014)
Graph-based system which exploits random walks over a semantic network, using Personalized PageRank.
It uses the standard WordNet graph plus disambiguated glosses as connections.
NEW - UKB*: enhanced configuration using sense distributions from SemCor and running Personalized PageRank for each word.
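Personalized PageRank itself can be illustrated with a short power-iteration sketch. This is a generic version on a toy graph, not UKB's implementation: the node names, edges, damping factor and iteration count are all assumptions made for the example. Teleportation is restricted to the context nodes, so candidate senses connected to the context accumulate more rank.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank with teleportation restricted to `seeds`."""
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Every node restarts from the seed distribution with probability 1 - damping.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


# Toy undirected graph as adjacency lists; names and edges are invented.
GRAPH = {
    "mouse#animal": ["rodent", "cheese#food"],
    "mouse#device": ["computer", "button"],
    "rodent": ["mouse#animal"],
    "cheese#food": ["mouse#animal", "eat"],
    "eat": ["cheese#food"],
    "computer": ["mouse#device", "button"],
    "button": ["mouse#device", "computer"],
}

# Context of "The mouse ate the cheese": the walk restarts from these nodes.
ranks = personalized_pagerank(GRAPH, seeds={"cheese#food", "eat"})
best = max(["mouse#animal", "mouse#device"], key=ranks.get)
print(best)
```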
Babelfy (Moro et al., 2014)
Graph-based system that uses random walks with restart over a semantic network, creating high-coherence semantic interpretations of the input text.
Babelfy uses BabelNet as its semantic network; BabelNet provides a large set of connections coming from Wikipedia and other resources.
Evaluation: Results on the concatenation of all datasets
Knowledge-based systems, F-Measure (%) on ALL (bar chart, axis from 20 to 80):
○ Lesk_extended: 48.7
○ UKB: 57.5
○ Lesk+emb: 63.7
○ Babelfy: 65.5
○ MCS baseline: 65.2
○ Worst supervised system: 68.4
Evaluation: Comparison systems
● Knowledge-based
○ Lesk_extended (Banerjee and Pedersen, 2003)
○ Lesk+emb (Basile et al., 2014)
○ UKB (Agirre et al., 2014)
○ Babelfy (Moro et al., 2014)
● Supervised
○ IMS (Zhong and Ng, 2010)
○ IMS+emb (Iacobacci et al., 2016)
○ Context2Vec (Melamud et al., 2016)
Evaluation: Comparison systems (supervised)
IMS (Zhong and Ng, 2010)
SVM classifier over a set of conventional features: surrounding words, PoS tags and local collocations.
IMS+emb: improved versions that integrate word embeddings as an additional feature (Taghipour and Ng, 2015; Rothe and Schütze, 2015; Iacobacci et al., 2016).
Context2Vec (Melamud et al., 2016)
Three steps:
- First, a bidirectional LSTM is trained on an unlabeled corpus.
- Then, this model is used to learn an output (context) vector for each sense annotation in the sense-annotated training corpus.
- Finally, the sense annotation whose context vector is closest to the target word’s context vector is selected as the intended sense.
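The third step amounts to a nearest-neighbour lookup, sketched below. The vectors are made-up three-dimensional placeholders standing in for the outputs of the bidirectional LSTM, and using cosine similarity as the closeness measure is an assumption of this sketch; the slides only say "closest".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_sense(target_vector, sense_vectors):
    """Return the sense whose stored context vector is most similar to the target's."""
    return max(sense_vectors, key=lambda s: cosine(target_vector, sense_vectors[s]))


# Placeholder context vectors; in practice these would come from the trained model.
SENSE_VECTORS = {
    "mouse_animal": [0.9, 0.1, 0.0],
    "mouse_device": [0.1, 0.8, 0.3],
}
target = [0.7, 0.2, 0.1]  # context vector of the word to disambiguate

print(closest_sense(target, SENSE_VECTORS))
```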
Evaluation: Results on the concatenation of all datasets
Supervised systems trained on SemCor, F-Measure (%) on ALL (bar chart, axis from 20 to 80):
○ MFS baseline: 64.8
○ IMS: 68.4
○ Context2Vec: 69.0
○ IMS+emb: 69.6
Supervised (SemCor + OMSTI): adding OMSTI yields gains of +0.4, +0.4 and +0.1 across the three systems.
Evaluation: Analysis
Training corpus
The automatically-constructed OMSTI helps to improve the results of the supervised systems trained on SemCor only.
Research direction -> (semi)automatic construction of sense-annotated datasets in order to overcome the knowledge-acquisition bottleneck.
Knowledge-based vs. Supervised
Supervised systems clearly outperform knowledge-based systems.
Supervised systems seem to better capture local contexts:
In sum, at both the federal and state government levels at least part of the seemingly irrational behavior voters display in the voting booth may have an exceedingly rational explanation.
Knowledge-based systems
Competitive for nouns, but underperform in other PoS tags.
The Most Common Sense (MCS) baseline is still hard to beat.
Only Babelfy and UKB* manage to outperform this baseline but…
- Babelfy uses the MCS baseline as a back-off strategy.
- The configuration of UKB which outperforms the baseline integrates the sense distributions from SemCor.
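To show why this baseline is so simple yet strong, here is a minimal sketch of a most-common-sense baseline together with the usual WSD scoring (precision over attempted instances, recall over all instances, F-measure as their harmonic mean). The toy data, sense labels and function names are assumptions for illustration, and the sketch is not the official scorer of the framework.

```python
from collections import Counter, defaultdict


def train_mcs(training_pairs):
    """Map each lemma to its most common sense in (lemma, sense) training pairs."""
    counts = defaultdict(Counter)
    for lemma, sense in training_pairs:
        counts[lemma][sense] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


def f_measure(predictions, gold):
    """F-measure where `predictions` may leave some gold instances unanswered."""
    correct = sum(1 for i, s in predictions.items() if gold.get(i) == s)
    precision = correct / len(predictions) if predictions else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented toy data.
train = [("mouse", "mouse_animal"), ("mouse", "mouse_animal"),
         ("mouse", "mouse_device"), ("keep", "keep_hold")]
test_lemmas = {"t1": "mouse", "t2": "mouse", "t3": "keep", "t4": "bank"}
gold = {"t1": "mouse_animal", "t2": "mouse_device",
        "t3": "keep_hold", "t4": "bank_river"}

mcs = train_mcs(train)
# Lemmas unseen in training ("bank") receive no answer.
predictions = {i: mcs[l] for i, l in test_lemmas.items() if l in mcs}
score = f_measure(predictions, gold)
print(round(100 * score, 1))
```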
Bias towards the Most Frequent Sense (MFS)
All IMS-based systems answer with the MFS over 75% of the time. Context2Vec is slightly less affected (73.1% on average).
The MFS bias is also present in graph-based systems, confirming the findings of previous studies: Calvo and Gelbukh (2015), Postma et al. (2016).
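One way to quantify such a bias is the share of a system's answers that coincide with the most frequent sense of the target lemma. The sketch below assumes that reading; the exact procedure behind the percentages above is not given in the slides, and the data and names are invented.

```python
def mfs_answer_rate(predictions, lemmas, mfs):
    """Share of predictions equal to the most frequent sense of their lemma."""
    if not predictions:
        return 0.0
    hits = sum(1 for i, sense in predictions.items() if mfs.get(lemmas[i]) == sense)
    return hits / len(predictions)


# Invented toy data: most frequent sense per lemma and one system's answers.
mfs_table = {"mouse": "mouse_animal", "keep": "keep_hold"}
instance_lemmas = {"t1": "mouse", "t2": "mouse", "t3": "keep", "t4": "keep"}
system_answers = {"t1": "mouse_animal", "t2": "mouse_animal",
                  "t3": "keep_hold", "t4": "keep_retain"}

rate = mfs_answer_rate(system_answers, instance_lemmas, mfs_table)
print(f"{100 * rate:.1f}% of answers are the MFS")
```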
Low overall performance on verbs
All systems below 58%.
Verbs are extremely fine-grained in WordNet: 10.4 senses per verb on average across all datasets (4.8 for nouns, and lower for adjectives and adverbs).
For example, the verb keep has 22 meanings in WordNet, 6 of them denoting possession.
Conclusion
We presented a unified evaluation framework for all-words Word Sense Disambiguation, including standardized training and testing data.
This makes it easier for researchers to evaluate their systems and ensures a fair comparison.
Two potential research directions based on semisupervised learning:
- Exploiting large amounts of unlabeled corpora for learning accurate word embeddings or training neural language models
- (Semi)Automatic construction of high-quality sense-annotated corpora
http://lcl.uniroma1.it/wsdeval
Thank you!
All the data available at
http://lcl.uniroma1.it/wsdeval