NUMBER OR NUANCE: Factors Affecting Reliable Word Sense Annotation
Susan Windisch Brown, Travis Rood, and Martha Palmer
University of Colorado at Boulder
Annotators in their little nests agree;
And ’tis a shameful sight,
When taggers on one project
Fall out, and chide, and fight.
[adapted from] Isaac Watts
Automatic word sense disambiguation
Lexical ambiguity is a significant problem in natural language processing (NLP) applications (Agirre & Edmonds, 2006)
  Text summarization
  Question answering
WSD systems might help
  Several studies show benefits for NLP tasks (Sanderson, 2000; Stokoe, 2003; Carpuat and Wu, 2007; Chan, Ng and Chiang, 2007)
  But only with higher system accuracy (90%+)
Annotation reliability affects system accuracy
| WSD system | System performance | Inter-annotator agreement | Sense inventory |
|---|---|---|---|
| SensEval2 | 62.5% | 70% | WordNet |
| Chen et al. (2007) | 82% | 89% | OntoNotes |
| Palmer (2008) | 90% | 94% | PropBank |
Senses for the verb control
WordNet
1. exercise authoritative control or power over
2. control (others or oneself) or influence skillfully
3. handle and cause to function
4. lessen the intensity of; temper
5. check or regulate (a scientific experiment) by conducting a parallel experiment
6. verify by using a duplicate register for comparison
7. be careful or certain to do something
8. have a firm understanding of

OntoNotes
1. exercise power or influence over; hold within limits
2. verify something by comparing to a standard
Possible factors affecting the reliability of word sense annotation
Fine-grained senses result in many senses per word, creating a heavy cognitive load on annotators and making accurate, consistent tagging difficult
Fine-grained senses are not distinct enough for annotators to discriminate between them reliably
Requirements to compare fine-grained and coarse-grained annotation
Annotation of the same words on the same corpus instances
Sense inventories differing only in sense granularity
Previous work (Ng et al., 1999; Edmonds & Cotton, 2001; Navigli et al., 2007)
3 experiments
40 verbs
Number of senses: 2-26
Sense granularity: WordNet vs. OntoNotes
Exp. 1: confirm difference in reliability between fine- and coarse-grained annotation; vary granularity and number of senses
Exp. 2: hold granularity constant; vary number of senses
Exp. 3: hold number constant; vary granularity
Experiment 1
Compare fine-grained sense inventory to coarse
70 instances for each verb from the ON corpus
Annotated with WN senses by multiple pairs of annotators
Annotated with ON senses by multiple pairs of annotators
Compare the ON ITAs to the WN ITAs

| Sense inventory | Ave. number of senses | Granularity |
|---|---|---|
| OntoNotes | 6.2 | Coarse |
| WN | 14.6 | Fine |
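
The slides do not say how ITA was computed. A minimal sketch, assuming ITA is simple pairwise percent agreement (the share of instances on which two annotators chose the same sense). The function name and the sample tags are hypothetical and are not data from the study:

```python
from typing import Sequence


def percent_agreement(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Fraction of instances on which two annotators chose the same sense.

    Assumption: ITA is plain percent agreement with no chance correction.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same instances")
    if not tags_a:
        raise ValueError("at least one instance is required")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical sense tags from one annotator pair on five instances of a verb.
annotator_1 = ["1", "1", "2", "1", "2"]
annotator_2 = ["1", "2", "2", "1", "2"]

ita = percent_agreement(annotator_1, annotator_2)
print(f"ITA: {ita:.0%}")
```

Under this assumption, a verb's ITA for each inventory would be computed over its 70 instances and then compared across the ON and WN taggings.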
Results
[Bar chart of ITA by sense inventory: WordNet (fine-grained) 57%; OntoNotes (coarse-grained) 91%]
Results
Coarse-grained ON annotations had higher ITAs than fine-grained WN annotations
Number of senses: no significant effect (t(79) = -1.28, p = .206)
Sense nuance: a significant effect (t(79) = 10.39, p < .0001)
With number of senses held constant, ITA for coarse-grained annotation is 16.2 percentage points higher than for fine-grained annotation
Experiment 2: Number of senses
Hold sense granularity constant; vary number of senses
2 pairs of annotators, using fine-grained WN senses
First pair uses full set of WN senses for a word
Second pair uses a restricted set on instances that we know should fit one of those senses

| Sense inventory | Ave. number of senses | Granularity |
|---|---|---|
| WN full set | 14.6 | Fine |
| WN restricted set | 5.6 | Fine |
[Diagram: WN senses clustered under OntoNotes grouped senses A, B, and C. WN sense clusters shown: 3, 7, 8, 13, 14; 9, 10; 1, 2, 4, 5, 6, 11, 12]
"Then I just bought plywood, drew the pieces on it and cut them out."
Full set of WN senses: 1 through 14
Restricted set of WN senses: 3, 7, 8, 13, 14
Results
[Bar chart of ITA: WN full set 59%; WN restricted set 53%]
Experiment 3
Number of senses controlled; vary sense granularity
Compare the ITAs for the ON tagging with the restricted-set WN tagging

| Sense inventory | Ave. number of senses | Granularity |
|---|---|---|
| OntoNotes | 6.2 | Coarse |
| WN restricted set | 5.6 | Fine |
Results
[Bar chart of ITA: WN restricted set (fine-grained) 53%; OntoNotes (coarse-grained) 91%]
Conclusion
Number of senses annotators must choose between: never a significant factor
Granularity of the senses: a significant factor, with fine-grained senses leading to lower ITAs
Poor reliability of fine-grained word sense annotation cannot be improved by reducing the cognitive load on annotators.
Annotators cannot reliably discriminate between nuanced sense distinctions.
Acknowledgements
We gratefully acknowledge the efforts of all of the annotators and the support of the National Science Foundation Grants NSF-0415923, Word Sense Disambiguation and CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation and CISE-CRI 0709167, as well as a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, a subcontract from BBN, Inc.
Restricted set annotation
Use the adjudicated ON data to determine the ON sense for each instance.
Use instances from Experiment 1 that were labeled with one selected ON sense (35 instances).
Each restricted-set annotator saw only the WN senses that were clustered to form the appropriate ON sense.
Compare to the full set annotation for those instances.
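
The selection steps above can be sketched as follows. This is only an illustration: the WN sense clusters follow the grouping example shown earlier in the slides, but the pairing of the ON labels A, B, and C with particular clusters, the instance records, and the function names are all assumptions.

```python
# Assumed clustering of WN sense numbers under ON grouped senses.
# The clusters come from the slides' grouping example; which ON label
# belongs to which cluster is an assumption made for illustration.
ON_TO_WN = {
    "A": [1, 2, 4, 5, 6, 11, 12],
    "B": [3, 7, 8, 13, 14],
    "C": [9, 10],
}


def restricted_wn_senses(on_sense, clustering=ON_TO_WN):
    """Return the WN senses a restricted-set annotator would see."""
    return sorted(clustering[on_sense])


def select_instances(instances, selected_on_sense):
    """Keep instances whose adjudicated ON sense is the selected one."""
    return [inst for inst in instances if inst["on_sense"] == selected_on_sense]


# Hypothetical adjudicated instances; the ON sense labels are assumed.
instances = [
    {"id": 1, "text": "Then I just bought plywood, drew the pieces on it "
                      "and cut them out.", "on_sense": "B"},
    {"id": 2, "text": "(hypothetical instance)", "on_sense": "A"},
    {"id": 3, "text": "(hypothetical instance)", "on_sense": "B"},
]

selected = select_instances(instances, "B")
print([inst["id"] for inst in selected])
print(restricted_wn_senses("B"))
```

The ITA on the selected instances under the restricted set can then be compared with the full-set ITA on the same instances.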