Corpora and Statistical Methods, Lecture 5


Corpora and Statistical Methods, Lecture 5

Albert Gatt
Corpora and Statistical Methods, Lecture 5

Application 3: Verb selectional restrictions

Observation: some verbs place strong restrictions on the semantic category of the NPs they take as arguments.

Assumption: we're focusing attention on direct objects (DOs) only.
- eat selects for FOOD DOs: eat cake, eat some fresh vegetables
- grow selects for LEGUME DOs: grow potatoes

Not all verbs are equally constraining. Some verbs seem to place fewer restrictions than others; see doesn't seem too restrictive:
- see John
- see the potato
- see the fresh vegetables

Problem definition: for a given verb and a potential set of arguments (nouns), we want to learn to what extent the verb selects for those arguments. Rather than individual nouns, we're better off using noun classes (FOOD etc.), since these allow us to generalise more; such classes can be obtained from a standard resource, e.g. WordNet.

A short detour: Kullback-Leibler divergence

We are often in a position where we estimate a probability distribution from (incomplete) data; this problem is inherent in sampling. We end up with a distribution P, which is intended as a model of distribution Q. How good is P as a model?

Kullback-Leibler divergence tells us how well our model matches the actual distribution.

Motivating example: suppose I'm interested in the semantic type or class to which a noun belongs, e.g.:
- cake, meat, cauliflower are types of FOOD (among other things)
- potato, carrot are types of LEGUME (among other things)

How do I infer this?

It helps if I know that certain predicates, like grow, select for some types of DO and not others:
- *grow meat, *grow cake
- grow potatoes, grow carrots

Motivating example (cont'd). Ingredients:
- C: the class of interest (e.g. LEGUME)
- v: the verb of interest (e.g. grow)
- P(C): the probability of class C, i.e. the prior probability of finding some element of C as DO of any verb
- P(C|v): the probability of C given that we know the noun is a DO of grow; this is my posterior probability

A more precise way of asking the question: does the probability distribution of C change given the information about v?

Ingredients for KL-divergence:
- some prior distribution P
- some posterior distribution Q

Intuition: KL-divergence measures how much information we gain about P, given that we know Q; if it is 0 (which happens exactly when the two distributions are identical), we gain no information.

Given two probability distributions P and Q, with probability mass functions p(x) and q(x), KL-Divergence is denoted D(p||q)

Calculating KL-Divergence
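Written out, with the sum running over all outcomes x (here, the noun classes), and with logs taken to base 2 (an assumption made here because it reproduces the worked S(v) figures quoted later):

    D(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}

By convention, terms with p(x) = 0 contribute 0.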

(This is the divergence between the prior and the posterior probability distributions.)

More on the interpretation of KL-divergence: if probability distribution P is interpreted as the truth and distribution Q is my approximation, then:

D(p||q) tells me how much extra information I need to add to Q to get to the actual truth.

Back to our problem: applying KL-divergence to selectional restrictions.

Resnik's model (Resnik 1996) has two main ingredients:
- Selectional Preference Strength (S): how strongly a verb constrains its direct object (a global estimate)
- Selectional Association (A): how much a verb v is associated with a given noun class (a specific estimate for a given class)

Notation:
- v = a verb of interest
- S(v) = the selectional preference strength of v
- c = a noun class
- C = the set of all noun classes
- A(v,c) = the selectional association between v and class c

Selectional Preference Strength: S(v) is the KL-divergence between the overall prior distribution of all noun classes and the posterior distribution of noun classes in the direct-object position of v.

This is how much information we gain from knowing the probability that members of a class occur as DO of v. It works as a global estimate of how much v constrains its arguments semantically: the more it constrains them, the more information we stand to gain from knowing that an argument occurs as DO of v.

S(grow): prior vs. posterior

Source: Resnik 1996, p. 135

Calculating S(v)
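In symbols, this is the definition just described, with prior P(c) and posterior P(c|v):

    S(v) = D( P(C|v) \| P(C) ) = \sum_{c \in C} P(c|v) \log_2 \frac{P(c|v)}{P(c)}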

This quantifies the extent to which our prior and posterior probability estimates diverge: how much information do we gain about C by knowing it's the object of v?

Some more examples:

class       P(c)    P(c|eat)   P(c|see)   P(c|find)
people      0.25    0.01       0.25       0.33
furniture   0.25    0.01       0.25       0.33
food        0.25    0.97       0.25       0.33
action      0.25    0.01       0.25       0.01
SPS S(v)            1.76       0.00       0.35

How much information do we gain if we know what a noun is the DO of? Quite a lot if it's an argument of eat; not much if it's an argument of find; none if it's an argument of see.

Source: Manning and Schutze 1999, p. 290
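The S(v) row can be reproduced directly from the class probabilities in the table. The sketch below is illustrative only: the probability values are the ones above, and taking logs to base 2 is an assumption, but it is what yields 1.76, 0.00 and 0.35.

import math

# Prior distribution P(c) over noun classes, and posterior distributions
# P(c|v) for three verbs; values are taken from the table above.
prior = {"people": 0.25, "furniture": 0.25, "food": 0.25, "action": 0.25}
posterior = {
    "eat":  {"people": 0.01, "furniture": 0.01, "food": 0.97, "action": 0.01},
    "see":  {"people": 0.25, "furniture": 0.25, "food": 0.25, "action": 0.25},
    "find": {"people": 0.33, "furniture": 0.33, "food": 0.33, "action": 0.01},
}

def selectional_preference_strength(p_c_given_v, p_c):
    """S(v) = D(P(C|v) || P(C)): KL-divergence, in bits (log base 2)."""
    return sum(
        p_cv * math.log2(p_cv / p_c[c])
        for c, p_cv in p_c_given_v.items()
        if p_cv > 0  # zero-probability terms contribute nothing
    )

for verb, p_c_given_v in posterior.items():
    print(verb, round(selectional_preference_strength(p_c_given_v, prior), 2))
# prints approximately: eat 1.76, see 0.0, find 0.35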

Selectional association: this is estimated on the basis of selectional preference strength.

It tells us how much a verb is associated with a specific class, given the extent to which it constrains its arguments.

Given a class c, A(v,c) tells us how much of S(v) is contributed by c.

Calculating A(v,c)
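In symbols, this is class c's contribution to the S(v) summation, normalised by S(v):

    A(v,c) = \frac{ P(c|v) \log_2 \frac{P(c|v)}{P(c)} }{ S(v) }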

The numerator is the term that class c contributes to the summation in S(v); dividing by S(v) gives the proportion of S(v) which is due to class c.

From A(v,c) to A(v,n): we know how to estimate the association strength of a class with v.

Problem: some nouns can occur in more than one class.

Let classes(n) be the set of classes to which noun n belongs:
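Then, as the chair example below illustrates, the noun takes the association of its most strongly associated class:

    A(v,n) = \max_{c \in classes(n)} A(v,c)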

Example: Susan interrupted the chair.
- chair is in class FURNITURE
- chair is in class PEOPLE

A(interrupt, PEOPLE) > A(interrupt, FURNITURE), so
A(interrupt, chair) = A(interrupt, PEOPLE)
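To make A(v,c) and A(v,n) concrete, here is a short continuation of the earlier S(v) sketch. The probabilities for interrupt are not given in the lecture, so the eat distribution from the Manning and Schutze table is reused, together with a hypothetical noun that could belong to either FOOD or PEOPLE, purely to illustrate the max step.

def selectional_association(p_c_given_v, p_c, c, s_v):
    """A(v,c): the proportion of S(v) contributed by class c."""
    return (p_c_given_v[c] * math.log2(p_c_given_v[c] / p_c[c])) / s_v

def noun_association(p_c_given_v, p_c, noun_classes):
    """A(v,n): the association of the noun's most strongly associated class."""
    s_v = selectional_preference_strength(p_c_given_v, p_c)
    return max(selectional_association(p_c_given_v, p_c, c, s_v)
               for c in noun_classes)

s_eat = selectional_preference_strength(posterior["eat"], prior)
print(round(selectional_association(posterior["eat"], prior, "food", s_eat), 2))
# ~1.08: food accounts for all of S(eat) and more, since the other classes
# make small negative contributions (the proportions still sum to 1).

# Hypothetical noun belonging to both FOOD and PEOPLE (cf. 'chair' above):
print(round(noun_association(posterior["eat"], prior, ["food", "people"]), 2))
# ~1.08: the FOOD reading wins, so its association value is used for the noun.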

Note that this is a kind of word-sense disambiguation!

Some results from Resnik 1996:

Verb (v)   Noun (n)   Class (c)       A(v,n)
answer     request    speech act      4.49
answer     tragedy    communication   3.88
hear       story      communication   1.89
hear       issue      communication   1.89

There are some fairly atypical examples; these are due to the disambiguation method. E.g. tragedy can be in the COMMUNICATION class, and so is assigned A(answer, COMMUNICATION) as its A(v,n).

Overall evaluation: Resnik's results were shown to correlate very well with results from a psycholinguistic study.

The method is promising:
- it seems to mirror human intuitions
- it may have some psychological validity

Possibly an alternative, data-driven account of the semantic bootstrapping hypothesis of Pinker 1989?