Evaluating the Waspbench
A Lexicography Tool Incorporating Word Sense Disambiguation
Rob Koeling, Adam Kilgarriff, David Tugwell, Roger Evans
ITRI, University of Brighton
Credits: UK EPSRC grant WASPS, M34971
Lexicographers need NLP
NLP needs lexicography
Word senses: nowhere truer
– Lexicography: the second hardest part
– NLP: word sense disambiguation (WSD)
  – SENSEVAL-1 (1998), Hector: 77%
  – SENSEVAL-2 (2001), WordNet: 64%
– Machine translation: the main cost is lexicography
Synergy
The WASPBENCH
Inputs and outputs
Inputs
– Corpus (processed)
– Lexicographic expertise
Outputs
– Analysis of the meaning/translation repertoire
– Implemented: a word expert that can disambiguate
– A “disambiguating dictionary”
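The slides do not show how a word expert works internally. Purely as an illustration, the sketch below uses a simple decision list, a common WSD technique that is not necessarily the Waspbench's own algorithm. Each hypothetical clue pairs a context word with a sense and a reliability score. The strongest clue that fires decides the sense, and a default sense covers sentences where no clue matches. The `Clue`/`WordExpert` names, the scores and the sense labels are all invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Clue:
    """One context clue: if `word` appears in the sentence, vote for `sense`."""
    word: str
    sense: str
    score: float  # hypothetical reliability score; stronger clues are tried first


class WordExpert:
    """Hypothetical decision-list word expert for one ambiguous word."""

    def __init__(self, target: str, clues: list[Clue], default_sense: str):
        self.target = target
        # Strongest clue first: the first matching clue decides the sense.
        self.clues = sorted(clues, key=lambda c: c.score, reverse=True)
        self.default_sense = default_sense

    def disambiguate(self, sentence: str) -> str:
        tokens = set(re.findall(r"\w+", sentence.lower()))
        for clue in self.clues:
            if clue.word in tokens:
                return clue.sense
        # No clue fired: fall back to the (assumed) most frequent sense.
        return self.default_sense


bank = WordExpert(
    "bank",
    [
        Clue("loan", "finance", 3.0),
        Clue("river", "riverside", 2.5),
        Clue("account", "finance", 2.0),
    ],
    default_sense="finance",
)

print(bank.disambiguate("We walked along the river bank."))    # riverside
print(bank.disambiguate("She opened an account at the bank."))  # finance
```

A decision list makes every decision traceable to one clue. That traceability suits a lexicographer who is reviewing and editing the clues by hand.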
Inputs and outputs
– MT needs rules of the form "in context C, S => T"
– A major determinant of MT quality
– Manual production is expensive
– e.g. English oil => French huile or pétrole?
– SYSTRAN: 400 rules
– Waspbench output: thousands of rules
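To make the rule form "in context C, S => T" concrete, here is a minimal sketch that applies a list of such rules to the oil example. It assumes that a context is a single word occurring anywhere in the sentence, that the first matching rule wins, and that a fallback translation applies when nothing matches. The rules themselves are hypothetical and are neither SYSTRAN rules nor Waspbench output.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    """'In context C, S => T': translate source word S as T when C occurs."""
    context: str
    source: str
    target: str


# Hypothetical rules for English "oil" into French; illustrative only.
OIL_RULES = [
    Rule("crude", "oil", "pétrole"),
    Rule("barrel", "oil", "pétrole"),
    Rule("olive", "oil", "huile"),
    Rule("frying", "oil", "huile"),
]


def translate(source: str, sentence: str, rules: list[Rule], fallback: str) -> str:
    """Return the target of the first rule whose source and context both match."""
    tokens = set(re.findall(r"\w+", sentence.lower()))
    for rule in rules:
        if rule.source == source and rule.context in tokens:
            return rule.target
    return fallback  # no context matched: use a default translation


print(translate("oil", "The price of crude oil rose.", OIL_RULES, "huile"))  # pétrole
print(translate("oil", "Heat the olive oil gently.", OIL_RULES, "huile"))    # huile
```

Matching is on exact tokens, so "barrels" would not trigger the "barrel" rule. This gives one reason why hand-written rule sets stay small while rules derived from a corpus can number in the thousands.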
Evaluation
– Hard:
  – Three communities
  – No precedents
  – The art and craft of lexicography
  – MT personpower budgets
Five threads
– As WSD: SENSEVAL
– For lexicography: MED expert reports
– Quantitative experiments with human subjects
  – India: within-group consistency
  – Leeds: comparison with commercial MT
Words
– Mid-frequency: 1,500-20,000 instances in the BNC
– At least two clearly distinct meanings, checked with reference to translations into French, German and Dutch
– 33 words: 16 nouns, 10 verbs, 7 adjectives
– Around 40 test instances per word
Words
– Nouns: bank, chest, coat, fit, line, lot, mass, party, policy, record, seal, step, term, volume
– Verbs: charge, float, move, observe, offend, post, pray, toast, undermine
– Adjectives: bright, free, funny, hot, moody, strong
Human subjects
– Translation studies students, University of Leeds (thanks: Tony Hartley)
– Native or near-native in English and their other language
– Twelve people, working with Chinese (4), French (3), German (2), Italian (1) and Japanese (2); there was no MT system for Japanese
– Circa four days' work:
  – introduction/training
  – two days to create word experts
  – two days to evaluate output
Method
– Human1 creates word experts (average 30 mins/word)
– The computer uses the word experts to disambiguate the test instances
– An MT system (Babelfish via AltaVista) translates the same test instances
– Human2 evaluates computer and MT performance on each instance as good / bad / unsure / preferred / alternative
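The slides list the Human2 labels but do not say how the labels were aggregated into the percentages reported under Results. The sketch below shows one plausible tally. It is based on two assumptions that are not stated in the source. First, "good", "preferred" and "alternative" all count as acceptable. Second, an instance is counted only as "unsure" if either judgement is unsure. The sample judgements are invented.

```python
from collections import Counter

# Assumption: these labels all count as an acceptable output.
ACCEPTABLE = {"good", "preferred", "alternative"}


def tally(judgements: list[tuple[str, str]]) -> dict[str, int]:
    """Summarise (wasps_label, mt_label) pairs as rounded percentages.

    Assumption: an instance where either judgement is "unsure" is counted
    only under "unsure", so Wasps-only + MT-only + both + neither + unsure
    covers every instance exactly once.
    """
    if not judgements:
        raise ValueError("no judgements to tally")
    counts = Counter()
    for wasps, mt in judgements:
        if "unsure" in (wasps, mt):
            counts["unsure"] += 1
            continue
        w_ok = wasps in ACCEPTABLE
        m_ok = mt in ACCEPTABLE
        counts["wasps"] += w_ok
        counts["mt"] += m_ok
        counts["both"] += w_ok and m_ok
        counts["neither"] += not (w_ok or m_ok)
    n = len(judgements)
    return {k: round(100 * counts[k] / n) for k in ("wasps", "mt", "both", "neither", "unsure")}


# Invented judgements for four test instances.
sample = [("good", "bad"), ("preferred", "good"), ("bad", "bad"), ("unsure", "good")]
print(tally(sample))  # {'wasps': 50, 'mt': 25, 'both': 25, 'neither': 25, 'unsure': 25}
```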
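Results (% of test instances)

| Language | Wasps | MT | Both | Neither | Unsure |
|---|---|---|---|---|---|
| German | 60 | 28 | 19 | 26 | 5 |
| French | 61 | 45 | 37 | 28 | 4 |
| Chinese | 68 | 42 | 37 | 23 | 3 |
| Italian | 67 | 29 | 23 | 22 | 5 |
| All | 64 | 36 | 29 | 25 | 4 |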
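One interpretation of the columns, which the slide does not state, is the share of instances where Wasps was judged correct, MT was correct, both were correct, neither was correct, and the judge was unsure. Under that reading, each row should account for every instance by inclusion-exclusion:

(Wasps - both) + (MT - both) + both + neither + unsure ≈ 100.

The check below uses the table figures verbatim. The totals come out at 100, 101, 99, 100 and 100, which is consistent with rounding.

```python
# Figures copied from the results table (percent of test instances):
# (Wasps, MT, both, neither, unsure)
RESULTS = {
    "German": (60, 28, 19, 26, 5),
    "French": (61, 45, 37, 28, 4),
    "Chinese": (68, 42, 37, 23, 3),
    "Italian": (67, 29, 23, 22, 5),
    "All": (64, 36, 29, 25, 4),
}


def total(wasps: int, mt: int, both: int, neither: int, unsure: int) -> int:
    """Wasps-only + MT-only + both + neither + unsure (should be about 100)."""
    return (wasps - both) + (mt - both) + both + neither + unsure


for lang, row in RESULTS.items():
    print(f"{lang}: {total(*row)}%")
```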
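Results by part of speech (% of test instances)

| POS | Wasps | MT | Both | Neither |
|---|---|---|---|---|
| Nouns | 69 | 40 | 35 | 24 |
| Verbs | 61 | 38 | 32 | 27 |
| Adjectives | 63 | 41 | 31 | 24 |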
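Under the same reading of the columns (an interpretation, not stated on the slide), subtracting "both" gives the instances that only one system got right. The part-of-speech table has no "unsure" column. If each row sums to 100%, the remainder gives an implied unsure share. The sketch below only rearranges the published figures and adds no new data.

```python
# Figures copied from the results-by-POS table: (Wasps, MT, both, neither)
BY_POS = {
    "Nouns": (69, 40, 35, 24),
    "Verbs": (61, 38, 32, 27),
    "Adjs": (63, 41, 31, 24),
}


def split(wasps: int, mt: int, both: int, neither: int) -> tuple[int, int, int]:
    """Return (wasps_only, mt_only, implied_unsure), assuming rows sum to 100%."""
    wasps_only = wasps - both
    mt_only = mt - both
    implied_unsure = 100 - (wasps_only + mt_only + both + neither)
    return wasps_only, mt_only, implied_unsure


for pos, row in BY_POS.items():
    w, m, u = split(*row)
    print(f"{pos}: Wasps only {w}%, MT only {m}%, implied unsure {u}%")
```

For nouns, for example, this gives 34% of instances right only with Wasps against 5% right only with MT.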
Observations
– Grad student users, 4-hour training
– 30 mins per (not-too-complex) word
– 'Fuzzy' words are intrinsically harder
– No great inter-subject disparities (it's the words that vary, not the people)
Conclusion
– WSD can improve MT (using a tool like WASPS)
Future work
– Multiwords (n > 2)
– Thesaurus
– Other source languages
– New corpora, bigger corpora: the web