WSD using Optimized Combination of Knowledge Sources Authors: Yorick Wilks and Mark Stevenson Presenter: Marian Olteanu

Post on 31-Dec-2015


Page 1: WSD using Optimized Combination of Knowledge Sources

WSD using Optimized Combination of Knowledge Sources

Authors: Yorick Wilks and Mark Stevenson

Presenter: Marian Olteanu

Page 2: WSD using Optimized Combination of Knowledge Sources

Introduction

Regular approaches
  All words
  Sample (small trial section)

Problems
  Ambiguity, especially at fine granularity
  New senses in text that are not in dictionary

Page 3: WSD using Optimized Combination of Knowledge Sources

Approach

Integrates partial sources of information
  Part-of-speech
  Dictionary definitions
  Pragmatic codes
  Selectional restrictions

Integration
  Filters
  Partial selectors (taggers)

Page 4: WSD using Optimized Combination of Knowledge Sources

Dictionary for senses

Longman Dictionary of Contemporary English (LDOCE)

Two levels:
  Homograph
  Sense

Page 5: WSD using Optimized Combination of Knowledge Sources

Methodology

Preprocessing
  Part-of-speech tagger (Brill)

Part-of-speech
  Filter – eliminate all incompatible homographs
  If no sense remains – keep all senses
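
The filter-with-fallback step above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy lexicon, the tag names "n" and "v", and the function name filter_by_pos are all hypothetical.

```python
def filter_by_pos(homographs, pos_tag):
    """Keep homographs whose part of speech matches the tagger output.

    If no homograph is compatible, fall back to the full list, which
    mirrors the "if no sense remains, keep all senses" rule on the slide.
    """
    kept = [h for h in homographs if h["pos"] == pos_tag]
    return kept if kept else list(homographs)


# Hypothetical entry: two homographs of "bank", each with its own senses.
bank = [
    {"pos": "n", "senses": ["bank_1_1", "bank_1_2"]},
    {"pos": "v", "senses": ["bank_2_1"]},
]

noun_only = filter_by_pos(bank, "n")      # the verb homograph is dropped
fallback = filter_by_pos(bank, "adj")     # nothing matches, so all are kept
```

The fallback keeps the filter from ever leaving a word without candidates when the tagger is wrong.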

Page 6: WSD using Optimized Combination of Knowledge Sources

Methodology (cont.)

Dictionary definitions
  Partial tagger:
    Count number of words that appear both in definition and the context
    Normalize by the length of the definition
    Return a list of candidate senses
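
A rough sketch of this overlap scoring is given below. It assumes that distinct shared words are counted, that the definition length is its token count, and that no stop-word removal is applied; the slide does not settle any of these points. The sense identifiers and definitions are made up for illustration.

```python
def tokenize(text):
    """Lowercase the text and strip simple punctuation from each token."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]


def rank_by_definition_overlap(definitions, context):
    """Score each sense by (shared words) / (definition length), best first."""
    context_words = set(tokenize(context))
    scored = []
    for sense_id, definition in definitions.items():
        def_words = tokenize(definition)
        if not def_words:
            continue  # an empty definition cannot be scored
        overlap = len(set(def_words) & context_words)
        scored.append((sense_id, overlap / len(def_words)))
    # Highest normalized overlap first; this is the candidate list.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical definitions, not taken from LDOCE.
definitions = {
    "bank_money": "a place where money is kept",
    "bank_river": "land along the side of a river",
}
ranked = rank_by_definition_overlap(definitions, "she sat on the side of the river")
```

Normalizing by definition length stops long definitions from winning simply because they contain more words.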

Page 7: WSD using Optimized Combination of Knowledge Sources

Methodology (cont.)

Pragmatic codes
  Partial tagger - Uses the hierarchy of LDOCE pragmatic codes (subject area)
  Modified simulated annealing
    Optimize the number of pragmatic codes of the same type in the sentence
    Whole paragraph - Only for nouns ?
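
The optimization idea can be illustrated with plain simulated annealing. This is a generic textbook version, not the modified variant the slide refers to, and it ignores the code hierarchy. The objective (number of word pairs sharing a code), the cooling schedule, and the code labels "ECON", "GEO" and "PSY" are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def agreement(assignment):
    """Number of word pairs that were assigned the same subject code."""
    counts = Counter(assignment.values())
    return sum(c * (c - 1) // 2 for c in counts.values())


def anneal_codes(candidates, steps=500, start_temp=1.0, cooling=0.99, seed=0):
    """Pick one code per word so that as many words as possible agree."""
    rng = random.Random(seed)
    words = sorted(candidates)
    current = {w: candidates[w][0] for w in words}  # start from first codes
    best = dict(current)
    temp = start_temp
    for _ in range(steps):
        # Propose changing the code of one randomly chosen word.
        word = rng.choice(words)
        proposal = dict(current)
        proposal[word] = rng.choice(candidates[word])
        delta = agreement(proposal) - agreement(current)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposal
            if agreement(current) > agreement(best):
                best = dict(current)
        temp = max(temp * cooling, 1e-6)
    return best


# Hypothetical nouns from one paragraph with their candidate codes.
candidates = {
    "bank": ["GEO", "ECON"],
    "interest": ["PSY", "ECON"],
    "loan": ["ECON"],
}
best = anneal_codes(candidates)
```

Tracking the best assignment seen so far means the result is never worse than the starting point, even though individual moves may be.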

Page 8: WSD using Optimized Combination of Knowledge Sources

Methodology (cont.)

Selectional Restrictions
  Filter
  LDOCE senses – 35 semantic classes (H = human, M = human male, P = plant, etc)
  Nouns – their type, adjs – the type of the object they modify, adv – type of their modifier, verbs – types of S, DO, IO
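
As a small sketch of how such a filter might work: a verb sense requires a semantic class for one of its arguments, and noun senses whose class is not that class (or a descendant of it) are discarded. Only the codes H, M and P come from the slide; the hierarchy link placing M under H, the sense names, and the function names are assumptions for illustration.

```python
# Assumed fragment of a class hierarchy, child -> parent.
# Treating M (human male) as a kind of H (human) is an assumption.
PARENT = {"M": "H"}


def satisfies(noun_class, required_class):
    """True if noun_class equals required_class or descends from it."""
    cls = noun_class
    while cls is not None:
        if cls == required_class:
            return True
        cls = PARENT.get(cls)  # climb one level; None at the top
    return False


def filter_noun_senses(noun_senses, required_class):
    """Keep the noun senses whose class meets the verb's restriction."""
    return [
        sense
        for sense, cls in noun_senses.items()
        if satisfies(cls, required_class)
    ]


# Hypothetical noun with one plant sense and one human-male sense.
candidate_senses = {"sense_plant": "P", "sense_man": "M"}
human_subject = filter_noun_senses(candidate_senses, "H")
```

Here a verb that requires a human subject keeps only the sense tagged M, because M is assumed to sit below H.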

Page 9: WSD using Optimized Combination of Knowledge Sources

Methodology (cont.)

Combine knowledge sources
  Decision lists
  Can assign sense to unknown words, if there is a definition in LDOCE
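
A decision list is an ordered set of rules in which the first rule that matches decides the outcome. The sketch below shows that mechanism applied to the outputs of the filters and partial taggers. The feature names, thresholds and rule order are invented for the example; in practice the rules would be learned from tagged data, and the slides do not give them.

```python
def apply_decision_list(rules, features, default=False):
    """Return the decision of the first rule whose test matches."""
    for test, decision in rules:
        if test(features):
            return decision
    return default


# Hypothetical rules, ordered from most to least reliable.
# Each rule answers: "should this candidate sense be accepted?"
RULES = [
    (lambda f: not f["passes_restrictions"], False),
    (lambda f: f["chosen_by_pragmatic_codes"] and f["definition_score"] > 0.2, True),
    (lambda f: f["definition_score"] > 0.5, True),
]

# Features for one candidate sense, gathered from the earlier modules.
candidate = {
    "passes_restrictions": True,
    "chosen_by_pragmatic_codes": True,
    "definition_score": 0.3,
}
accepted = apply_decision_list(RULES, candidate)
```

Because only the first matching rule fires, a violated selectional restriction here overrides any support from the other sources.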

Page 10: WSD using Optimized Combination of Knowledge Sources

Evaluation

Create a corpus based on SemCor (200,000 words; tagged with WordNet senses)
  SENSUS – merging between LDOCE and WordNet (for Machine Translation)
  Still ambiguity
  36,869 out of 85,747 words (personal opinion: strongly biased)

Page 11: WSD using Optimized Combination of Knowledge Sources

Results

Baseline: 49.8%
70% of the 1st sense – correctly tagged
83.4% accuracy = 92.8% accuracy on all words (!!!)

Test by voting: