Oana Adriana Şoica
Building and Ordering a SenDiS Lexicon Network
SenDiS
Outline
o SenDiS operates on a specific lexicon network (LexNet): “sense tagged glosses” relations
o lexicon networks obtained from other semantic / lexical relations
o obtaining a SenDiS LexNet:
  – build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
  – import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
o preprocessing (ordering) the SenDiS LexNet before word sense disambiguation (WSD):
  – truncation of the LexNet
  – leveling the LexNet
Semantic/Lexical Relations
o hypernyms
o hyponyms
o similar to
o has part
o synonyms
o antonyms
o holonyms
o meronyms
o coordinate terms
o troponyms
o entailment
Semantic/Lexical relations: WordNet
[Figure: an excerpt of the WordNet semantic network*]
* Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10 (2009).
Semantic/Lexical relations: GRAALAN
Tail of relation Head of relation Relation type
{synonym} {synonym} Bidirectional, symmetric
{antonym} {antonym} Bidirectional, symmetric
{paronym} {paronym} Bidirectional, symmetric
{hypernym} {hyponym} Bidirectional, asymmetric
{connotation} - Unidirectional
{holonym} {meronym} Bidirectional, asymmetric
{homonym} {homonym} Bidirectional, symmetric
{heteronym} {heteronym} Bidirectional, symmetric
{homophone} {homophone} Bidirectional, symmetric
{diminutive of} {diminutive by} Bidirectional, asymmetric
{augmentative of} {augmentative by} Bidirectional, asymmetric
{extension from} {extension into} Bidirectional, asymmetric
{reduction from} {reduction into} Bidirectional, asymmetric
{generalization from} {generalization into} Bidirectional, asymmetric
{specialization from} {specialization into} Bidirectional, asymmetric
{figurative of} {literal for} Bidirectional, asymmetric
{reference to} - Unidirectional
{derived from} {derived into} Bidirectional, asymmetric
{back formatted form} {back formats} Bidirectional, asymmetric
{abstract for} {concretized from} Bidirectional, asymmetric
{with variant} {variant for} Bidirectional, asymmetric
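The Relation type column decides what has to be stored when a single relation is recorded. A minimal Python sketch, assuming an arc (x, relation, y) runs from x to y (e.g. ("animal", "hypernym", "dog") for “animal is a hypernym of dog”) and that a bidirectional relation is materialized as a second arc carrying the label from the head-of-relation column. Only a few rows of the table are encoded, and the names `Kind`, `RELATIONS` and `add_relation` are hypothetical, not GRAALAN's own format.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Kind(Enum):
    SYMMETRIC = "bidirectional, symmetric"
    ASYMMETRIC = "bidirectional, asymmetric"
    UNIDIRECTIONAL = "unidirectional"


# relation label -> (label of the inverse arc, or None, relation kind); a few table rows
RELATIONS: Dict[str, Tuple[Optional[str], Kind]] = {
    "synonym": ("synonym", Kind.SYMMETRIC),
    "antonym": ("antonym", Kind.SYMMETRIC),
    "hypernym": ("hyponym", Kind.ASYMMETRIC),
    "holonym": ("meronym", Kind.ASYMMETRIC),
    "derived from": ("derived into", Kind.ASYMMETRIC),
    "connotation": (None, Kind.UNIDIRECTIONAL),
    "reference to": (None, Kind.UNIDIRECTIONAL),
}

Arc = Tuple[str, str, str]  # (x, relation, y): an arc from x to y labeled `relation`


def add_relation(arcs: Set[Arc], x: str, relation: str, y: str) -> Set[Arc]:
    """Record x -relation-> y and, for bidirectional relations, the inverse arc."""
    inverse, kind = RELATIONS[relation]
    arcs.add((x, relation, y))
    if kind is not Kind.UNIDIRECTIONAL:
        # symmetric relations reuse their own label, asymmetric ones the paired label
        arcs.add((y, inverse, x))
    return arcs
```

For example, `add_relation(set(), "animal", "hypernym", "dog")` stores both ("animal", "hypernym", "dog") and ("dog", "hyponym", "animal"), while a unidirectional {connotation} produces a single arc.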
Obtaining a SenDiS LexNet
o manually annotating the glosses of a lexicon (using a specific tool that can ease the process)
o importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses
Creating the SenDiS LexNet
o implied a significant effort, usually measured in months, involving several trained linguists
o using a specialized collaborative tool (BuildLNTool – Build Lexicon Network Tool)
o enriching the “gloss tagged” relation with three relative degrees of importance in the gloss context (weak, medium, strong), or ignoring the gloss word
o SenDiS objective, two LexNets:
  – a “gloss tagged” LexNet for the Romanian language
  – a “gloss tagged” LexNet for the English language
BuildLNTool
o BuildLNTool (Build Lexicon Network Tool) provides:
  – a visual and effective mechanism to manually annotate the lexicon glosses
  – a synchronized overview of the already created relations
  – a browsing mechanism for inspecting the already tagged glosses and relations
BuildLNTool – Sections
[Figure: BuildLNTool window with the sections “Lemmas & MWEs”, “Lemma/MWE Info”, “Competence & Definition Trees”, “Root & Leaf Meanings”, and messages and progress]
BuildLNTool – Sections II
o “Lemmas & MWEs”: list of lexicon entries
o “Root & Leaf Meanings”: list of roots and leaves of the lexicon network
o “Lemma/MWE Info”: the lexicon entry currently being analyzed
o “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
o section for messages and progress
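Roots, leaves and spanning trees are standard graph notions, so they can be sketched independently of the tool. A minimal Python sketch, assuming arcs run from a meaning to the senses tagged in its gloss (the slides do not state the direction, nor how competence trees differ from definition trees; reversing the arcs gives the tree in the other direction). The function names and sense identifiers such as "dog#1" are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Tuple

Arc = Tuple[str, str]  # (meaning, sense tagged in its gloss): assumed direction


def roots_and_leaves(vertices: Iterable[str], arcs: List[Arc]) -> Tuple[List[str], List[str]]:
    """Roots have no incoming arcs, leaves have no outgoing arcs."""
    has_in = {v for _, v in arcs}
    has_out = {u for u, _ in arcs}
    vertices = list(vertices)
    roots = sorted(v for v in vertices if v not in has_in)
    leaves = sorted(v for v in vertices if v not in has_out)
    return roots, leaves


def spanning_tree(meaning: str, arcs: List[Arc]) -> Dict[str, Optional[str]]:
    """Breadth-first spanning tree from `meaning`, returned as a child -> parent map."""
    succ = defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
    parent: Dict[str, Optional[str]] = {meaning: None}
    queue = deque([meaning])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in parent:  # first visit wins; later arcs into v are not tree arcs
                parent[v] = u
                queue.append(v)
    return parent


arcs = [("dog#1", "mammal#1"), ("dog#1", "animal#1"),
        ("mammal#1", "animal#1"), ("animal#1", "organism#1")]
print(roots_and_leaves(["dog#1", "mammal#1", "animal#1", "organism#1"], arcs))
print(spanning_tree("dog#1", arcs))
```

In a LexNet that still contains loops, some vertices on a cycle have neither role, which is one reason the ordering step discussed later removes loops.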
BuildLNTool – Lemmas & MWEs
o selection of lexicon entry type
o selection of the unfinished lexicon entries filter
o selection of viewing interval
o text filter
o lexicon entry text
o lexicon entry status
BuildLNTool – Selection of a current lexicon entry
o double click
BuildLNTool – Browsing the meanings of the current lexicon entry
o lexicon entry text
o morphologic interpretation
o list of meanings
o filters
o meaning/gloss fully tagged
o meaning/gloss partially tagged
o meaning/gloss not tagged
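The three tagging states, and the unfinished-entries filter, can be derived from the state of each gloss constituent. A minimal Python sketch, assuming a constituent counts as done once it has a sense or has been explicitly ignored; that rule, the tuple encoding and the function names are hypothetical, not BuildLNTool's actual logic.

```python
from typing import List, Optional, Sequence, Tuple

# One gloss constituent: (chosen sense or None, marked as ignored?)
Constituent = Tuple[Optional[str], bool]


def is_resolved(constituent: Constituent) -> bool:
    """Assumed rule: done once the constituent has a sense or is explicitly ignored."""
    sense, ignored = constituent
    return ignored or sense is not None


def gloss_status(constituents: Sequence[Constituent]) -> str:
    done = sum(1 for c in constituents if is_resolved(c))
    if constituents and done == len(constituents):
        return "fully tagged"
    if done == 0:
        return "not tagged"
    return "partially tagged"


def is_unfinished(glosses: List[Sequence[Constituent]]) -> bool:
    """A lexicon entry is unfinished while any of its glosses is not fully tagged."""
    return any(gloss_status(g) != "fully tagged" for g in glosses)


print(gloss_status([(None, True), ("mammal#1", False)]))   # fully tagged
print(gloss_status([(None, False), ("mammal#1", False)]))  # partially tagged
```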
BuildLNTool – Selection of a current meaning for tagging
o double click
BuildLNTool – Gloss constituent without interpretations
o unrecognized gloss constituent
o ‘Enter’
BuildLNTool – Degrees of relevance (in gloss context)
o default setting: Medium
BuildLNTool – Degrees of relevance II
o ‘Strong’ tokens
o ‘Medium’ tokens
o ‘Weak’ tokens
o ignored (X) tokens
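The relevance degrees turn each tagged gloss into labeled arcs of the LexNet. A minimal Python sketch, assuming every tagged constituent yields one arc (meaning, sense, degree) and that ignored (X) or untagged constituents yield none; the class and function names and the "dog#1"-style sense identifiers are hypothetical. The only default taken from the slides is Medium.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Relevance(Enum):
    IGNORED = "X"
    WEAK = "weak"
    MEDIUM = "medium"
    STRONG = "strong"


@dataclass
class GlossConstituent:
    text: str
    sense: Optional[str] = None              # hypothetical sense identifier
    relevance: Relevance = Relevance.MEDIUM  # default setting: Medium


def tagged_arcs(meaning: str, gloss: List[GlossConstituent]) -> List[Tuple[str, str, str]]:
    """(meaning, sense, degree) arcs; ignored or untagged constituents add nothing."""
    return [
        (meaning, c.sense, c.relevance.value)
        for c in gloss
        if c.sense is not None and c.relevance is not Relevance.IGNORED
    ]


gloss = [
    GlossConstituent("a", relevance=Relevance.IGNORED),
    GlossConstituent("domesticated", "domesticated#1", Relevance.WEAK),
    GlossConstituent("carnivorous", "carnivorous#1"),  # keeps the Medium default
    GlossConstituent("mammal", "mammal#1", Relevance.STRONG),
]
print(tagged_arcs("dog#1", gloss))
```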
BuildLNTool – Gloss tagging
o unsaved annotations
o saved annotations
BuildLNTool – Gloss tagging protocol
o view of meaning tagging tree
o selection of constituent / group of gloss constituents
o set / modify relevance degree
o edit text of gloss constituent
o select / modify the sense for the gloss constituent
o further annotate meaning / save annotations
o choose the next meaning
o further on
o save annotations
o current gloss constituent without sense interpretations
Built LexNets for Romanian and English
LexNets | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R
LL_Romanian - 99% | 1,528,819 | 1,191,942 | 691,010 | 720,420 | 686,210
LL_English - 2% | 36,828 | 30,350 | 18,523 | 17,641 | 17,505

LexNets | Glosses | Tagged Glosses | Targeted Glosses | Tags Density
LL_Romanian - 99% | 130,087 | 118,536 | 58,976 | 0.5757
LL_English - 2% | 259,651 | 3,496 | 7,551 | 0.5767
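The slides do not define Tags Density, but the reported values are reproduced by dividing OpTokens V & R by OperatedTokens and truncating (not rounding) to four decimals; this is an inference from the figures, and the same ratio also matches the WordNet rows reported for the imported glosses. A minimal Python sketch of that check, with integer arithmetic so the truncation is exact; `tags_density` is a hypothetical name.

```python
def tags_density(op_tokens_v_and_r: int, operated_tokens: int) -> float:
    """Inferred: OpTokens V & R / OperatedTokens, truncated to four decimals."""
    return (op_tokens_v_and_r * 10_000 // operated_tokens) / 10_000


ROWS = {
    "LL_Romanian - 99%": (686_210, 1_191_942),
    "LL_English - 2%": (17_505, 30_350),
    "WordNet": (834_803, 2_394_190),
    "WordNet_extendedGlosses": (936_397, 3_114_968),
}

for name, (v_and_r, operated) in ROWS.items():
    print(f"{name}: {tags_density(v_and_r, operated):.4f}")
```

Truncation matters here: rounding would give 0.5768 for LL_English and 0.3487 for WordNet, which do not match the tables.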
Imported WordNet tagged glosses
o WordNet (3.0) is organized in synsets:
  – 117,659 synsets
  – 155,287 words (lexicon entries)
  – 206,941 word-sense pairs (gloss + usage examples)
o the synsets were split and transformed into a classical lexicon format
o the lexicon network imported:
LexNets | Glosses | Tagged Glosses | Targeted Glosses | Tags Density
WordNet | 206,941 | 206,938 | 59,251 | 0.3486
WordNet_extendedGlosses | 206,941 | 206,941 | 83,174 | 0.3006

LexNets | AllTokens | OperatedTokens | OpTokensValid | OpTokensRelated | OpTokens V & R
WordNet | 2,394,190 | 2,394,190 | 2,394,189 | 834,803 | 834,803
WordNet_extendedGlosses | 3,114,968 | 3,114,968 | 3,114,967 | 936,397 | 936,397
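Splitting synsets into a classical lexicon means that every (lemma, synset) pair becomes one meaning of that lemma, which is why WordNet yields more word-sense pairs (206,941) than synsets or words. A minimal Python sketch on toy data: the synset ids s1 and s2, their lemmas and shortened glosses are hypothetical, and how usage examples were attached to glosses is not shown.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Synset(NamedTuple):
    synset_id: str
    lemmas: List[str]
    gloss: str


def split_synsets(synsets: List[Synset]) -> Dict[str, List[Tuple[str, str]]]:
    """Classical lexicon view: each lemma maps to its meanings as (synset_id, gloss)."""
    lexicon: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for synset in synsets:
        for lemma in synset.lemmas:
            # every (lemma, synset) pair becomes one meaning, i.e. one word-sense pair
            lexicon[lemma].append((synset.synset_id, synset.gloss))
    return dict(lexicon)


toy = [
    Synset("s1", ["car", "auto", "automobile"], "a motor vehicle with four wheels"),
    Synset("s2", ["car", "railcar"], "a wheeled vehicle adapted to the rails of railroad"),
]
lexicon = split_synsets(toy)
print(len(toy), len(lexicon), sum(len(m) for m in lexicon.values()))  # 2 4 5
```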
Ordering a SenDiS LexNet
o “gloss tagged” lexicon nets are large and dense graphs, with between 100,000 and 200,000 vertices and over 1,000,000 edges / arcs
o to ease working with such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized:
  – truncation of a lexicon net
  – leveling of a lexicon net
o aims when optimizing a lexicon net:
  – elimination of loops or strongly connected components
  – a minimum number of removed edges
  – leveling on a minimum number of levels
  – minimization/maximization of root/leaf vertices
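Removing loops while removing as few edges as possible is the minimum feedback arc set problem, which is NP-hard, so heuristics are used in practice. As one illustration, a Python sketch of the greedy ordering heuristic of Eades, Lin and Smyth (1993): sinks are moved to the right end, sources to the left end, otherwise the vertex with the largest out-degree minus in-degree goes left; every arc that points backwards in the final order is removed, which leaves a DAG. The results table names an algorithm NT_eades, but this sketch is not claimed to be that implementation or Patentv1; function names are hypothetical.

```python
from collections import defaultdict


def eades_order(vertices, arcs):
    """Greedy vertex ordering of Eades, Lin and Smyth (1993)."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in arcs:
        if u != v:  # self-loops do not affect the order; they are always removed
            succ[u].add(v)
            pred[v].add(u)
    remaining = set(vertices)
    left, right = [], []  # right is collected back to front

    def drop(x):
        remaining.discard(x)
        for w in succ[x]:
            pred[w].discard(x)
        for w in pred[x]:
            succ[w].discard(x)

    while remaining:
        sinks = sorted(x for x in remaining if not succ[x])
        while sinks:  # sinks go to the right end
            for x in sinks:
                right.append(x)
                drop(x)
            sinks = sorted(x for x in remaining if not succ[x])
        sources = sorted(x for x in remaining if not pred[x])
        while sources:  # sources go to the left end
            for x in sources:
                left.append(x)
                drop(x)
            sources = sorted(x for x in remaining if not pred[x])
        if remaining:  # otherwise pick the largest out-degree minus in-degree
            u = max(sorted(remaining), key=lambda x: len(succ[x]) - len(pred[x]))
            left.append(u)
            drop(u)
    return left + right[::-1]


def break_cycles(vertices, arcs):
    """Split arcs into (kept, removed); the kept arcs all point forward, so form a DAG."""
    pos = {v: i for i, v in enumerate(eades_order(vertices, arcs))}
    kept = [(u, v) for u, v in arcs if pos[u] < pos[v]]
    removed = [(u, v) for u, v in arcs if pos[u] >= pos[v]]
    return kept, removed


print(break_cycles("abcd", [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]))
```

On this toy loop a→b→c→a, only the arc ("c", "a") is removed.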
Unordered LexNet
[Figure: a minimal lexicon net in the original form, with entries e1–e9]
Ordered (leveled) LexNet
[Figure: the same minimal lexicon net, leveled]
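Once the loops are gone, a DAG can be leveled so that every arc goes from a lower to a higher level. A minimal Python sketch using longest-path layering, which places each vertex one level below its deepest predecessor and therefore uses the fewest levels possible for a given DAG (the longest path). It assumes roots start at level 0, and it is not claimed to be the leveling procedure behind Patentv1 or NT_eades; `level_dag` is a hypothetical name.

```python
from collections import defaultdict, deque


def level_dag(vertices, arcs):
    """Longest-path leveling: every arc goes from a lower to a higher level."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    level = {v: 0 for v in vertices}  # roots (no incoming arcs) stay on level 0
    queue = deque(v for v in vertices if indeg[v] == 0)
    seen = 0
    while queue:  # Kahn's topological traversal
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(level):
        raise ValueError("graph still contains a cycle; remove feedback arcs first")
    return level


levels = level_dag(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(levels, "levels used:", max(levels.values()) + 1)
```

Here "c" lands on level 2, below "b", even though "a" also points to it directly; the number of levels is one more than the longest path.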
Results on leveling experimental LexNets
LNs | Vertices | Edges In | OLN Algorithm | Edges Out | Edges Removed | Levels | Time (s)
wn | 202,361 | 834,803 | Patentv1 | 821,048 | 13,755 | 192 | 4.5
wn_ex | 205,188 | 936,397 | Patentv1 | 936,397 | 74,526 | 382 | 5.7
ro_48% | 72,067 | 318,741 | Patentv1 | 308,592 | 10,149 | 195 | 1.6
ro_78% | 100,175 | 523,192 | Patentv1 | 504,210 | 18,982 | 244 | 2.3
ro_99% | 120,472 | 686,784 | Patentv1 | 659,030 | 27,754 | 291 | 2.8
ro_48% | 130,407 | 318,741 | NT_eades | 308,334 | 10,407 | 58 | 60
ro_99% | 130,099 | 686,784 | NT_eades | 654,025 | 32,759 | 70 | 330
wn_ex | 206,941 | 936,397 | NT_eades | 904,992 | 31,405 | 46 | 1,315