Improving Subcategorization Acquisition using Word Sense Disambiguation

Anna Korhonen and Judith Preiss
University of Cambridge, Computer Laboratory
15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
[email protected], [email protected]

Outline

Subcategorization Acquisition

Baseline System

Baseline System combined with WSD

Probabilistic WSD

Experiment

Evaluation

Methods

Introduction: Subcategorization

The dependents of a verb are classified into:
- arguments, e.g. subject and (direct) object
  - subject
  - non-subject arguments (complements), e.g. Mary knows that she is winning.
- adjuncts, e.g. She read the book with great interest.

The types of complement that a verb permits determine the verb's classification.

This verb classification is called subcategorization.

SCFs: the subcategorization frames of a given predicate; essential for parsing

Introduction

SCF: a particular set of arguments that a verb can appear with

Intransitive verb. NP[subject]. They danced.

Transitive verb. NP[subject], NP[object]. Mary appreciates her professor.

Intransitive with PP. NP[subject], PP. He lives in Paris.

Transitive with PP. NP[subject], NP[object], PP. She put the book on the table.

Introduction: Manual versus automatic subcategorization acquisition

Manual
- does not provide the relative frequency of SCFs
- predicates change behavior

Automatic
- no lexical/semantic information is exploited
- reveals only syntactic aspects
- no distinction between predicate senses

Korhonen (2002) model: back-off estimates based on the predominant sense of a verb (WordNet)

Acquisition goal: domain-specific lexicon (written vs. spoken; genre based on different senses)


Baseline System: a system with knowledge of verb semantics

- Levin (1993): verb senses divide into classes that are distinctive for subcategorization
- Korhonen (2002): verb forms can be divided into semantic classes based on their predominant sense (fly - move); the sense and the semantic class are determined (Levin class "Motion verbs")
- Briscoe and Carroll (1997): SCF distributions are acquired from corpus data

Subcategorization Acquisition
Baseline System: description

Linear interpolation smoothing with back-off estimates is used for the SCF distribution

Obtaining the back-off estimates:
a) 4-5 representative verbs are chosen from a verb class
b) for these verbs, the SCF distribution is built by manual analysis of 300 occurrences of each verb (BNC)
c) the resulting SCF distributions are merged, giving equal weight to each distribution
E.g. fly: move, slide, arrive, travel, sail

An empirical threshold is used to filter out noisy SCFs
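
The baseline steps above (equal-weight merging, linear interpolation, thresholding) can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names, the SCF labels, the interpolation weight of 0.5, the threshold of 0.4 and all probabilities are invented for the example.

```python
from collections import defaultdict


def merge_equal_weight(distributions):
    """Average several SCF distributions, giving each one the same weight."""
    merged = defaultdict(float)
    for dist in distributions:
        for scf, prob in dist.items():
            merged[scf] += prob / len(distributions)
    return dict(merged)


def interpolate(observed, backoff, weight):
    """Linear interpolation: weight * observed + (1 - weight) * backoff."""
    scfs = set(observed) | set(backoff)
    return {
        scf: weight * observed.get(scf, 0.0) + (1 - weight) * backoff.get(scf, 0.0)
        for scf in scfs
    }


def filter_by_threshold(dist, threshold):
    """Drop SCFs whose smoothed probability falls below the threshold."""
    return {scf: p for scf, p in dist.items() if p >= threshold}


# Toy distributions for two representative verbs of one class (invented numbers).
move = {"NP": 0.5, "NP_PP": 0.5}
slide = {"NP": 0.25, "NP_PP": 0.75}
merged = merge_equal_weight([move, slide])

# Toy observed distribution for the target verb; weight 0.5 is an assumption.
observed_fly = {"NP": 1.0}
smoothed = interpolate(observed_fly, merged, weight=0.5)
kept = filter_by_threshold(smoothed, threshold=0.4)
```

With these toy numbers the back-off estimates give the unseen frame NP_PP some probability mass, and the threshold then decides whether that frame is kept.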

Subcategorization Acquisition
Combining with WSD

Preiss & Korhonen (2002)
- created separate corpus datasets for the senses being disambiguated (the first and/or second sense) and further datasets for the remaining senses
- SCFs were acquired from both types of dataset
- back-off estimates were used for the SCFs acquired from the initial dataset; the estimates were used for smoothing according to the relevant sense
- the acquired SCF lexicons were merged at the end, so the final SCF distribution was specific to a verb rather than to a sense
- problems with subcategorization acquisition: the datasets were too small, and separating the data was unnecessary

Subcategorization Acquisition
New method: does not involve separating the data, and uses back-off estimates for the whole sense distribution given by the WSD system, not only for the predominant sense

$p_j(scf_i)$, $j = 1, \dots, n_{bo}$ ($n_{bo}$ = the number of back-off estimates): the probabilities of SCFs in the different back-off distributions

$P(scf_i) = \sum_{j=1}^{n_{bo}} \lambda_j \, p_j(scf_i)$

$\lambda_j$: weights for the different distributions; they sum to 1 and are obtained from the probabilistic WSD system

Probabilistic WSD
- determines a probability distribution over senses for each noun, verb, adjective and adverb
- determines a probability distribution over the senses of each verb and computes its average
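
The weighted sum over back-off distributions can be sketched as below. This is an illustration under assumptions, not the authors' implementation: it assumes the weights are obtained by averaging the WSD system's per-occurrence sense distributions, and the sense names, SCF labels and probabilities are invented.

```python
def average_sense_distribution(per_occurrence):
    """Average per-occurrence sense distributions into one weight per sense.

    Assumption: this average plays the role of the weights lambda_j.
    """
    senses = {sense for dist in per_occurrence for sense in dist}
    n = len(per_occurrence)
    return {s: sum(d.get(s, 0.0) for d in per_occurrence) / n for s in senses}


def mix_backoff(weights, backoff_by_sense):
    """Compute P(scf_i) = sum_j lambda_j * p_j(scf_i)."""
    mixed = {}
    for sense, lam in weights.items():
        for scf, prob in backoff_by_sense[sense].items():
            mixed[scf] = mixed.get(scf, 0.0) + lam * prob
    return mixed


# Toy WSD output for two occurrences of one verb (invented numbers).
occurrences = [
    {"motion": 0.75, "transport": 0.25},
    {"motion": 0.25, "transport": 0.75},
]
lambdas = average_sense_distribution(occurrences)

# One toy back-off distribution per sense (invented numbers).
backoff_by_sense = {
    "motion": {"NP": 0.5, "NP_PP": 0.5},
    "transport": {"NP_NP": 1.0},
}
mixed = mix_backoff(lambdas, backoff_by_sense)
```

Because the weights sum to 1 and each back-off distribution sums to 1, the mixed distribution also sums to 1.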

Subcategorization Acquisition
System Description

- based on the Stevenson and Wilks (2001) system, which combines knowledge sources to produce a WSD tool
- for the probabilistic WSD system, it combines the probability distributions over senses produced by each module (modules described in Yarowsky (2000), Mihalcea (2002), Pedersen (2002))
- each module's output is smoothed according to its confidence value; the output of a low-confidence module is smoothed extensively towards the uniform distribution
- the optimal combination of modules is chosen by accuracy (F-measure) on the English all-words task
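
Confidence-based smoothing and module combination might look like the sketch below. The slides do not give the exact formulas, so two assumptions are made here: smoothing is a linear interpolation with the uniform distribution weighted by the confidence value, and modules are combined by a plain average. The sense labels and numbers are invented.

```python
def smooth_towards_uniform(dist, confidence):
    """Interpolate a sense distribution with the uniform distribution.

    Assumption: confidence in [0, 1] is used directly as the weight,
    so confidence 0 yields the uniform distribution.
    """
    n = len(dist)
    return {s: confidence * p + (1 - confidence) / n for s, p in dist.items()}


def combine_modules(outputs):
    """Average the smoothed distributions of all modules (assumed rule).

    outputs: list of (distribution, confidence) pairs over the same senses.
    """
    smoothed = [smooth_towards_uniform(dist, conf) for dist, conf in outputs]
    senses = smoothed[0].keys()
    return {s: sum(d[s] for d in smoothed) / len(smoothed) for s in senses}


# Two hypothetical modules disagreeing about an occurrence (invented numbers).
module_a = ({"s1": 1.0, "s2": 0.0}, 0.5)  # moderately trusted
module_b = ({"s1": 0.0, "s2": 1.0}, 0.0)  # not trusted: becomes uniform
combined = combine_modules([module_a, module_b])
```

In this toy case the untrusted module contributes only a uniform distribution, so the combined result leans towards the sense preferred by the trusted module.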


Experiment
Test Data

- polysemous verbs whose predominant sense is not very frequent: 29 verbs chosen randomly
- the WordNet senses of the chosen verbs are mapped to Levin-style senses
- the maximum number of Levin senses considered was 4, and some of the given senses were left out

Subcategorization Acquisition
Evaluation Method

- took 20 million words of the BNC and extracted all sentences for the test verbs
- 1000 sentences for each verb were disambiguated with the probabilistic WSD system
- applied the modified subcategorization system
- for each verb, an individual set of back-off estimates was built based on the frequencies of its different senses in the corpus data
- results were evaluated against a manual analysis of the corpus data
- from an average of 300 occurrences of each verb in the BNC test data, 5-21 gold standard SCFs were found (16 SCFs per verb on average)

Subcategorization Acquisition
Evaluation Method

F-measure = 2·P·R / (P + R)
P: precision
R: recall

RC: Spearman rank correlation
KL: Kullback-Leibler distance
CE: cross entropy

- the total number of SCFs missing from the distribution is recorded, to determine the accuracy of the back-off estimates
- comparison with other systems: the baseline and another which assumes no sense at all
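
Three of the measures above can be computed as in the sketch below (Spearman rank correlation is left out for brevity). It rests on assumptions: distributions are compared in the direction gold to acquired, logarithms are base 2, and the acquired distribution gives non-zero probability to every gold SCF. The SCF labels and probabilities are invented.

```python
import math


def precision_recall_f(acquired, gold):
    """Type precision, recall and F-measure = 2PR / (P + R) over SCF sets."""
    acquired, gold = set(acquired), set(gold)
    correct = len(acquired & gold)
    p = correct / len(acquired) if acquired else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def cross_entropy(gold, acquired):
    """CE = -sum_x gold(x) * log2 acquired(x); needs acquired(x) > 0."""
    return -sum(g * math.log2(acquired[x]) for x, g in gold.items() if g > 0)


def kl_distance(gold, acquired):
    """KL = sum_x gold(x) * log2(gold(x) / acquired(x)); needs acquired(x) > 0."""
    return sum(g * math.log2(g / acquired[x]) for x, g in gold.items() if g > 0)


# Toy SCF sets and distributions (invented for illustration).
p, r, f = precision_recall_f({"NP", "NP_PP", "PP", "SCOMP"}, {"NP", "NP_PP"})

gold_dist = {"NP": 0.5, "NP_PP": 0.5}
acquired_dist = {"NP": 0.25, "NP_PP": 0.75}
ce = cross_entropy(gold_dist, acquired_dist)
kl = kl_distance(gold_dist, acquired_dist)
```

KL equals CE minus the entropy of the gold distribution, which is 1 bit in this toy case, so the two measures rank systems identically for a fixed gold standard.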

Subcategorization Acquisition
Results

- of the 175 gold standard SCFs unseen in the unsmoothed lexicon, 107 remain unseen with the predominant sense method
- with the WSD method only 22 remain unseen
- performance improves with the number of senses
- the IS measure shows an intersection between the acquired and the gold standard SCFs when WSD is used


Results

- improvement for highly polysemous verbs (bear, count, roar, etc.)

- verbs whose senses differ substantially in terms of subcategorization (conceive, continue, grasp, etc.)

- verbs whose senses involve mainly NP/PP

- SCFs seem to appear in the data as "families" for a given sense of a verb

- worse performance for seek with WSD, even though it is highly polysemous and its senses differ in terms of subcategorization

- no clear improvement: choose, compose, induce, watch


Conclusions

- using WSD, an improvement can be shown for SCF acquisition of difficult verbs, because these verbs are not only polysemous: their senses also differ in terms of subcategorization

Future work
- a better way of integrating the frequency of acquired senses into the SCFs, and a refinement of the subcategorization method
