WSD using Optimized Combination of Knowledge Sources
Authors: Yorick Wilks and Mark Stevenson
Presenter: Marian Olteanu
Introduction
Regular approaches
  - All words
  - Sample (small trial section)
Problems
  - Ambiguity, especially at fine granularity
  - New senses in text that are not in the dictionary
Approach
Integrates partial sources of information
  - Part-of-speech
  - Dictionary definitions
  - Pragmatic codes
  - Selectional restrictions
Integration
  - Filters
  - Partial selectors (taggers)
Dictionary for senses
Longman Dictionary of Contemporary English (LDOCE)
Two levels
  - Homograph
  - Sense
Methodology
Preprocessing
  - Part-of-speech tagger (Brill)
Part-of-speech
  - Filter: eliminate all incompatible homographs
  - If no sense remains, keep all senses
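
The part-of-speech filter above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Homograph` class, the coarse tags ("n", "v"), and the toy entries are hypothetical, and the sketch assumes each homograph carries a single part of speech.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Homograph:
    """Hypothetical stand-in for one dictionary homograph."""
    headword: str
    pos: str       # coarse part of speech, e.g. "n" or "v" (assumed tag set)
    senses: tuple  # identifiers of the senses listed under this homograph


def pos_filter(homographs, tagged_pos):
    """Drop homographs whose part of speech disagrees with the tagger.

    If nothing survives (for example, the tagger made an error),
    keep every homograph, and therefore every sense.
    """
    compatible = [h for h in homographs if h.pos == tagged_pos]
    return compatible if compatible else list(homographs)


# Toy entries, invented for illustration only.
homographs = [
    Homograph("bank", "n", ("1", "2")),
    Homograph("bank", "v", ("3",)),
]
print([h.pos for h in pos_filter(homographs, "v")])    # ['v']
print([h.pos for h in pos_filter(homographs, "adj")])  # ['n', 'v'] (fallback)
```

The fallback branch is what keeps a tagging error from removing every candidate sense.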
Methodology (cont.)
Dictionary definitions
  - Partial tagger:
  - Count the number of words that appear in both the definition and the context
  - Normalize by the length of the definition
  - Return a list of candidate senses
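
A rough sketch of this overlap score is given below. Several details are assumptions, since the slides do not specify them: the tokenizer, the absence of stop-word removal or stemming, and counting every definition token that occurs in the context. The sense labels and definitions are invented, not LDOCE entries.

```python
import re


def tokenize(text):
    """Lowercase and split into alphabetic tokens (an assumed, very crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def overlap_score(definition, context):
    """Number of definition tokens also found in the context, divided by definition length."""
    def_tokens = tokenize(definition)
    if not def_tokens:
        return 0.0
    context_words = set(tokenize(context))
    shared = sum(1 for token in def_tokens if token in context_words)
    return shared / len(def_tokens)


def rank_senses(definitions, context):
    """Return (sense, score) pairs, best score first; ties broken by sense label."""
    scored = [(sense, overlap_score(text, context)) for sense, text in definitions.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented definitions for illustration only.
definitions = {
    "bank_river": "land along the side of a river",
    "bank_money": "a place where money is kept",
}
for sense, score in rank_senses(definitions, "he sat on the bank of the river"):
    print(sense, round(score, 3))
```

Dividing by the definition length keeps long definitions from winning simply because they contain more words.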
Methodology (cont.)
Pragmatic codes
  - Partial tagger
  - Uses the hierarchy of LDOCE pragmatic codes (subject area)
  - Modified simulated annealing
  - Optimize the number of pragmatic codes of the same type in the sentence
  - Whole paragraph - Only for nouns?
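
The optimization idea can be illustrated with a plain simulated annealing loop. The slides mention a modified variant without describing the modification, so the sketch below is textbook annealing swapped in for illustration. The objective (how many nouns share their pragmatic code with at least one other noun), the linear cooling schedule, and all sense labels and codes are assumptions.

```python
import math
import random
from collections import Counter


def code_agreement(assignment, codes):
    """Count the nouns whose chosen sense shares a pragmatic code with another noun."""
    counts = Counter(codes[sense] for sense in assignment)
    return sum(n for n in counts.values() if n > 1)


def anneal(candidates, codes, steps=2000, start_temp=1.0, seed=0):
    """Search for one sense per noun that maximizes code_agreement.

    candidates: list (one entry per noun) of lists of sense labels.
    codes: mapping from sense label to its pragmatic code.
    Returns the best assignment seen and its score.
    """
    if not candidates:
        return [], 0
    rng = random.Random(seed)
    current = [senses[0] for senses in candidates]  # start from the first-listed senses
    current_score = code_agreement(current, codes)
    best, best_score = list(current), current_score
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling (assumed schedule)
        i = rng.randrange(len(candidates))
        proposal = list(current)
        proposal[i] = rng.choice(candidates[i])
        delta = code_agreement(proposal, codes) - current_score
        # Always accept improvements; accept worse moves with a probability that shrinks as temp falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, current_score + delta
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Invented senses and subject codes for illustration only.
candidates = [["bank_1", "bank_2"], ["interest_1", "interest_2"], ["loan_1"]]
codes = {"bank_1": "GEOGRAPHY", "bank_2": "FINANCE",
         "interest_1": "HOBBIES", "interest_2": "FINANCE", "loan_1": "FINANCE"}
assignment, score = anneal(candidates, codes)
print(assignment, score)
```

Because the best assignment seen so far is tracked separately, the returned score is never lower than that of the starting assignment.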
Methodology (cont.)
Selectional restrictions
  - Filter
  - LDOCE senses: 35 semantic classes (H = human, M = human male, P = plant, etc.)
  - Nouns: their own type
  - Adjectives: the type of the object they modify
  - Adverbs: the type of their modifier
  - Verbs: the types of their subject (S), direct object (DO), and indirect object (IO)
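
As a rough illustration of how such a filter might work for verbs, the sketch below keeps a verb sense only if each of its restricted roles can be satisfied by some candidate sense of the corresponding argument. The one-entry hierarchy (M under H), the data layout, and the sense labels are hypothetical, and ignoring roles that are absent from the sentence is an assumption.

```python
# Hypothetical fragment of a semantic-class hierarchy: child class -> parent class.
PARENT = {"M": "H"}  # human male is a kind of human


def satisfies(noun_class, required):
    """True if noun_class equals the required class or lies below it in the hierarchy."""
    while noun_class is not None:
        if noun_class == required:
            return True
        noun_class = PARENT.get(noun_class)
    return False


def filter_verb_senses(verb_senses, argument_classes):
    """Keep verb senses whose restrictions can be met by the observed arguments.

    verb_senses: {sense label: {role: required class}}, roles being "S", "DO", "IO".
    argument_classes: {role: set of classes of the argument's candidate senses}.
    Restrictions on roles not present in the sentence are ignored (an assumption).
    """
    kept = []
    for sense, restrictions in verb_senses.items():
        compatible = all(
            any(satisfies(cls, required) for cls in argument_classes[role])
            for role, required in restrictions.items()
            if role in argument_classes
        )
        if compatible:
            kept.append(sense)
    return kept


# Invented verb senses for illustration only.
verb_senses = {"grow_1": {"S": "P"}, "grow_2": {"S": "H", "DO": "P"}}
print(filter_verb_senses(verb_senses, {"S": {"M"}, "DO": {"P"}}))  # ['grow_2']
```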
Methodology (cont.)
Combine knowledge sources
  - Decision lists
  - Can assign a sense to unknown words, if there is a definition in LDOCE
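
A decision list is an ordered sequence of tests in which the first test that matches decides the outcome. The sketch below shows only that mechanism. The rules, their order, and the feature names are invented for illustration and are not the rules used by the authors.

```python
def classify(rules, features, default):
    """Return the label of the first rule whose test matches the features."""
    for test, label in rules:
        if test(features):
            return label
    return default


def choose_senses(rules, candidates):
    """Keep the candidate senses that the decision list labels as acceptable."""
    return [sense for sense, features in candidates.items()
            if classify(rules, features, default=False)]


# Hypothetical rules over the outputs of the filters and partial taggers.
rules = [
    (lambda f: not f["passes_filters"], False),   # rejected by a filter
    (lambda f: f["pragmatic_choice"], True),      # picked by the pragmatic-code tagger
    (lambda f: f["definition_rank"] == 1, True),  # best definition overlap
]

# Invented feature values for illustration only.
candidates = {
    "sense_1": {"passes_filters": False, "pragmatic_choice": True, "definition_rank": 1},
    "sense_2": {"passes_filters": True, "pragmatic_choice": False, "definition_rank": 1},
    "sense_3": {"passes_filters": True, "pragmatic_choice": False, "definition_rank": 2},
}
print(choose_senses(rules, candidates))  # ['sense_2']
```

Rule order matters: `sense_1` is rejected by the first rule even though later rules would have accepted it.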
Evaluation
Create a corpus based on SemCor (200,000 words; tagged with WordNet senses)
  - SENSUS: a merger of LDOCE and WordNet (for Machine Translation)
  - Still ambiguity
  - 36,869 out of 85,747 words (personal opinion: strongly biased)
Results
Baseline: 49.8%
70% of the 1st sense - correctly tagged
83.4% accuracy = 92.8% accuracy on all words (!!!)
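
One way to reconcile the 83.4% and 92.8% figures is sketched below. It rests on an assumption the slides do not state outright: that the 36,869 words from the Evaluation slide are the ones actually scored, and that the remaining words out of 85,747 count as trivially correct.

```python
def accuracy_on_all_words(scored_words, all_words, accuracy_on_scored):
    """Overall accuracy if every unscored word is counted as correct (assumed reading)."""
    trivially_correct = all_words - scored_words
    correct = trivially_correct + accuracy_on_scored * scored_words
    return correct / all_words


overall = accuracy_on_all_words(36_869, 85_747, 0.834)
print(f"{overall:.2%}")  # 92.86%
```

Under that reading the result is about 92.86%, which agrees with the quoted 92.8% up to rounding.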
Test by voting: