Anaphora Resolution

Sobha Lalitha Devi
AU-KBC Research Centre
MIT Campus of Anna University
Chennai-44
sobha@au-kbc.org

Contents

Introduction to Anaphora and Anaphora Resolution

Types of Anaphora

Process of Anaphora Resolution

Tools

Applications

References

Introduction

What is Anaphora

Antecedent

Anaphora Resolution

1. Sabeer Bhatia arrived at Los Angeles International Airport at 6 p.m. on September 23, 1998. His flight from Bangalore had taken 22hrs and he was starving.

[RD, NOV 2000]

Etymology of Anaphora

ANA- Back, Upstream, Back upstream

Phora- Act of Carrying

Anaphora - Act of Carrying Back

What is Anaphora

Anaphora, in discourse, is a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity.

(Hirst 1981)

Cataphora

When the “anaphor” precedes the antecedent

Because she was going to the departmental store, Mary was asked to pick up the vegetables.

Relevance from the Linguistics point of view

Binding Theory is one of the major results of the principles and parameters approach developed in Chomsky (1981) and is one of the mainstays of generative linguistics.

The Binding Theory deals with the relations between nominal expressions and possible antecedents.

It attempts to provide a structural account of the complementarity of distribution between pronouns, reflexives and R-expressions.

Dichotomy Between Linguistics and NLP

The Binding Theory (and its various formulations) deals only with intra-sentential anaphora, a very small subset of the anaphoric phenomena that practical NLP systems are interested in resolving.

A much larger set of anaphoric phenomena involves the resolution of pronouns inter-sententially.

This problem is dealt with by Discourse Representation Theory and, more specifically, by Centering Theory (Grosz et al., 1995).

Types of Anaphors

The Prime Minister is yet to arrive and he is expected at the central hall at any time. [The Times of India, Feb 2001]

This book is about Anaphora Resolution. The book is designed to help beginners in the field and its author hopes that it will be useful.

John screamed, as did Mary.

Pronominal anaphora

Vajpayee hits back forcefully when he told the opposition today “sometimes we fall prey to the media and sometimes you do.” [Indian Express 2001]

Possessive

Priyanka eats only chicken sandwiches before going to take any exam; nothing else goes down her gullet that day. [Indian Express, 13 March 2001]

Reflexive Pronoun

Finally, Danian heaved himself up and lay on a waiting stretcher.

Demonstrative Pronoun

John had lots of packing to do before he shifted his house. This was something he never liked….

Relative Pronoun

Stumper Sameer Dige, who made his test debut, failed to show fast reflexes when it mattered.

Pleonastic It

Cognitive
a. It is believed that…..
b. It appears that…..

Modal Adjectives
c. It is dangerous……
d. It is important…..

Temporal
e. It is five o’clock
f. It is winter

Weather verbs
g. It is raining
h. It is snowing

Distance
i. How far is it to Chennai?
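A resolver normally has to set such non-referential uses of "it" aside before looking for antecedents. The sketch below shows one rough way to do that with surface patterns built from the examples above. The word lists and the function name `is_pleonastic_it` are assumptions made for this illustration, not a rule set taken from the slides or from any published system, and a real filter would need far broader coverage.

```python
import re

# Illustrative word lists (assumptions for this sketch), seeded from the
# example sentences in the slide.
COGNITIVE = r"(?:believed|thought|assumed|known)"
APPEAR = r"(?:appears|seems)"
MODAL_ADJ = r"(?:dangerous|important|necessary|possible)"
WEATHER = r"(?:raining|snowing)"
TEMPORAL = r"(?:\w+ o['’]clock|winter|summer)"

PLEONASTIC_PATTERNS = [
    re.compile(rf"\bit is {COGNITIVE} that\b", re.IGNORECASE),  # It is believed that ...
    re.compile(rf"\bit {APPEAR} that\b", re.IGNORECASE),        # It appears that ...
    re.compile(rf"\bit is {MODAL_ADJ}\b", re.IGNORECASE),       # It is dangerous ...
    re.compile(rf"\bit is {TEMPORAL}\b", re.IGNORECASE),        # It is five o'clock
    re.compile(rf"\bit is {WEATHER}\b", re.IGNORECASE),         # It is raining
    re.compile(r"\bhow far (?:is it|it is)\b", re.IGNORECASE),  # distance
]


def is_pleonastic_it(sentence: str) -> bool:
    """Return True if the sentence matches one of the surface patterns."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)
```

A sentence such as "John bought a book and read it." matches none of the patterns, so its "it" would be passed on to the resolver.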

Non-anaphoric uses of pronouns

He that plants thorns must never expect to gather roses.

He who dares wins.

Deictic

He seems remarkably bright for a child of his age.

Noun Phrase Anaphora

Definite descriptions and Proper names

Roy Keane has warned Manchester United he may snub their pay deal. United’s skipper is even hinting that unless the future Old Trafford Package meets his demands, he could quit the club in June 2000. Irishman Keane, 27, still has 17 months to run on his current 23,000 pound a week contract and wants to commit himself to United for life. Alex Ferguson’s No 1 player confirmed: “If it’s not the contract I want, I won’t sign”.

Coreference

Computational Linguists from many different countries attended the tutorial. The participants found it hard to cope with the speed of the presentation; nevertheless, they managed to take extensive notes.

What is Anaphora Resolution

Anaphora resolution is the process of finding the antecedent of an anaphor.

Anaphor: the expression that points back to a previous item.

Antecedent: the entity to which the anaphor refers.

Different Approaches In Anaphora Resolution

Rule Based

Statistical Based

Lappin and Leass (1994) Anaphora Resolution Algorithm

The Lappin and Leass (1994) anaphora resolution algorithm uses salience weights to determine the antecedent of a pronominal.

It requires a fully parsed sentence structure as input and uses the parse hierarchy in identifying the subject, object, etc.

The algorithm uses syntactic criteria to rule out noun phrases that cannot possibly corefer with the pronoun.

The antecedent is then chosen according to a ranking based on salience weights.

The Syntactic Filter

A pronoun P is non-coreferential with a (non-reflexive or non-reciprocal) noun phrase N if any of the following conditions hold:

1. P and N have incompatible agreement features.
2. P is in the argument domain of N.
3. P is in the adjunct domain of N.
4. P is an argument of a head H, N is not a pronoun, and N is contained in H.
5. P is in the NP domain of N.
6. P is a determiner of a noun Q, and N is contained in Q.

Examples

Condition 1: The woman said that he is funny.

Condition 2: She likes her. John seems to want to see him.

Condition 3: She sat near her.

Condition 4: He believes that the man is amusing. This is the man he said John wrote about.

Condition 5: John’s portrait of him is interesting.
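Of the filter conditions, only Condition 1 (agreement) can be checked without a syntactic parse. The sketch below covers that one condition. The `Mention` class, its feature encoding and the function names are hypothetical choices for this illustration, and the domain-based conditions are deliberately left out because they need the parse tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """A pronoun or noun phrase with (possibly unknown) agreement features."""
    text: str
    person: Optional[int] = None   # 1, 2 or 3; None means unknown
    number: Optional[str] = None   # "sg" or "pl"
    gender: Optional[str] = None   # "m", "f" or "n"


def _compatible(a, b) -> bool:
    # An unknown feature never blocks coreference in this sketch.
    return a is None or b is None or a == b


def agreement_compatible(p: Mention, n: Mention) -> bool:
    """False when P and N have incompatible agreement features (Condition 1)."""
    return (_compatible(p.person, n.person)
            and _compatible(p.number, n.number)
            and _compatible(p.gender, n.gender))


def filter_candidates(pronoun: Mention, candidates: List[Mention]) -> List[Mention]:
    """Keep only the candidates that Condition 1 does not rule out."""
    return [n for n in candidates if agreement_compatible(pronoun, n)]
```

For "The woman said that he is funny.", a masculine singular `he` and a feminine singular `The woman` disagree in gender, so the noun phrase is removed from the candidate list.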

Salience Factors and Weights

Salience factor types with initial weights

Factor type                                        Initial weight
Sentence recency                                   100
Subject emphasis                                   80
Existential emphasis                               70
Accusative emphasis                                50
Indirect object and oblique complement emphasis    40
Head noun emphasis                                 80
Non-adverbial emphasis                             50
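As a worked illustration of how these weights rank candidates, the sketch below sums the initial weights of the factors that apply to each candidate and picks the highest total. The dictionary keys and function names are assumptions made for this example. It covers only the initial weights from the table and leaves out the rest of the algorithm.

```python
from typing import Dict, Iterable

# Initial weights from the table above; the key names are illustrative.
INITIAL_WEIGHTS: Dict[str, int] = {
    "sentence_recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_object_or_oblique": 40,
    "head_noun": 80,
    "non_adverbial": 50,
}


def salience(factors: Iterable[str]) -> int:
    """Sum the initial weights of the distinct factors that apply."""
    return sum(INITIAL_WEIGHTS[f] for f in set(factors))


def best_candidate(candidates: Dict[str, Iterable[str]]) -> str:
    """Return the name of the candidate with the highest salience."""
    return max(candidates, key=lambda name: salience(candidates[name]))


candidates = {
    "subject NP": ["sentence_recency", "subject", "head_noun", "non_adverbial"],
    "object NP": ["sentence_recency", "accusative", "head_noun", "non_adverbial"],
}
# subject NP: 100 + 80 + 80 + 50 = 310; object NP: 100 + 50 + 80 + 50 = 280
```

With these inputs the subject outranks the direct object, which is the usual effect of the 80 versus 50 weighting.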

Kennedy 1996

The linguistic analysis for anaphora resolution includes

The output of a part-of-speech tagger, augmented with syntactic function annotations for each input token (using LINGSOFT).

A set of patterns is used for identifying

NP chunks, with the position of each NP in the text.

Nominal sequences in two subordinate syntactic environments:
a. in an adverbial adjunct
b. in an NP (i.e. containment in a prepositional or clausal complement of a noun, or containment in a relative clause)

Expletive “it”.

Anaphora Resolution

Uses the Lappin and Leass algorithm

SENT-S:  100 iff in the current sentence
CNTX-S:   50 iff in the current context
SUBJ-S:   80 iff GFUN = subject
EXST-S:   70 iff in an existential construction
POSS-S:   65 iff GFUN = possessive
ACC-S:    50 iff GFUN = direct object
DAT-S:    40 iff GFUN = indirect object
OBLQ-S:   30 iff the complement of a preposition
HEAD-S:   80 iff EMBED = NIL
ARG-S:    50 iff ADJUNCT = NIL
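Because each factor here is stated as a condition on shallow annotations (GFUN, EMBED, ADJUNCT) rather than on a parse tree, the table can be read as a list of weight and test pairs. The sketch below encodes it that way. The dictionary keys (`gfun`, `embed`, `adjunct`, `in_current_sentence`, `in_current_context`, `existential`) and their string values are a hypothetical encoding chosen for this illustration, not the system's actual data format.

```python
from typing import Callable, Dict, List, Tuple

NP = Dict[str, object]

# (factor name, weight, test) triples following the table above.
# A missing "embed" or "adjunct" key is treated as NIL.
FACTORS: List[Tuple[str, int, Callable[[NP], bool]]] = [
    ("SENT-S", 100, lambda np: bool(np.get("in_current_sentence"))),
    ("CNTX-S", 50, lambda np: bool(np.get("in_current_context"))),
    ("SUBJ-S", 80, lambda np: np.get("gfun") == "subject"),
    ("EXST-S", 70, lambda np: bool(np.get("existential"))),
    ("POSS-S", 65, lambda np: np.get("gfun") == "possessive"),
    ("ACC-S", 50, lambda np: np.get("gfun") == "direct_object"),
    ("DAT-S", 40, lambda np: np.get("gfun") == "indirect_object"),
    ("OBLQ-S", 30, lambda np: np.get("gfun") == "oblique"),
    ("HEAD-S", 80, lambda np: np.get("embed") is None),
    ("ARG-S", 50, lambda np: np.get("adjunct") is None),
]


def fired_factors(np: NP) -> List[str]:
    """Names of the factors whose condition holds for this NP."""
    return [name for name, _, test in FACTORS if test(np)]


def salience(np: NP) -> int:
    """Sum of the weights of all factors that hold for this NP."""
    return sum(weight for _, weight, test in FACTORS if test(np))


subject_np = {"in_current_sentence": True, "in_current_context": True,
              "gfun": "subject"}
embedded_possessive = {"in_current_sentence": True, "in_current_context": True,
                       "gfun": "possessive", "embed": "np"}
# subject_np: 100 + 50 + 80 + 80 + 50 = 360
# embedded_possessive: 100 + 50 + 65 + 50 = 265
```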

Mitkov 1997

No Parsing of the Input Sentence

Boosting indicators

First Noun Phrases: A score of +1 is assigned to the first NP in a sentence.

Indicating Verbs: A score of +1 is assigned to those NPs immediately following a verb which is a member of a predefined set (including verbs such as discuss, present, illustrate, identify, summarise, examine, describe, define, show, check, develop, review,

MARS Cont….

Lexical Reiteration: A score of +2 is assigned to those NPs repeated twice or more in the paragraph in which the pronoun appears, a score of +1 is assigned to those NPs repeated once in that paragraph.

Section Heading Preference: A score of +1 is assigned to those NPs that also occur in the heading of the section in which the pronoun appears.

Boosting indicators contd..

Collocation Match: A score of +2 is assigned to those NPs that have an identical collocation pattern to the pronoun.

Immediate Reference: A score of +2 is assigned to those NPs appearing in constructions of the form

“… (You) V1 NP … con (you) V2 it (con (you) V3 it)”, where con ∈ {and/or/before/after…}.

Sequential Instructions: A score of +2 is applied to NPs in the NP1 position of constructions of the form “To V1 NP1 V2 NP2. (Sentence). To V3 it, V4 NP4”, where the noun phrase NP1 is the likely antecedent of the anaphor it.

Term Preference: A score of +1 is applied to those NPs identified as representing terms in the genre of the text.

Impeding indicators

Indefiniteness: Indefinite NPs are assigned a score of -1.

Prepositional Noun Phrases: NPs appearing in prepositional phrases are assigned a score of -1.
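The indicators can be applied as simple additive scores over candidate NPs. The sketch below covers only the indicators that reduce to a yes/no or a count (first NP, lexical reiteration, section heading, indefiniteness, prepositional NP) and assumes these features have already been extracted. The `Candidate` class and function names are hypothetical, and resolving ties by list order is a simplification of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    text: str
    first_np_in_sentence: bool = False
    repetitions_in_paragraph: int = 0   # how many times the NP is repeated
    in_section_heading: bool = False
    indefinite: bool = False
    in_prepositional_phrase: bool = False


def indicator_score(c: Candidate) -> int:
    """Sum the boosting and impeding indicator scores for one candidate."""
    score = 0
    if c.first_np_in_sentence:            # First Noun Phrases: +1
        score += 1
    if c.repetitions_in_paragraph >= 2:   # Lexical Reiteration: +2
        score += 2
    elif c.repetitions_in_paragraph == 1: # Lexical Reiteration: +1
        score += 1
    if c.in_section_heading:              # Section Heading Preference: +1
        score += 1
    if c.indefinite:                      # Indefiniteness: -1
        score -= 1
    if c.in_prepositional_phrase:         # Prepositional Noun Phrases: -1
        score -= 1
    return score


def pick_antecedent(candidates: List[Candidate]) -> Candidate:
    """Return the highest-scoring candidate (first one wins on a tie)."""
    return max(candidates, key=indicator_score)


printer = Candidate("the printer", first_np_in_sentence=True,
                    repetitions_in_paragraph=2, in_section_heading=True)
cable = Candidate("a cable", indefinite=True, in_prepositional_phrase=True)
# printer: 1 + 2 + 1 = 4; cable: -1 - 1 = -2
```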

“Vasisth” a Rule Based Anaphora Resolution System

1. mo:han(i) avanRe(i) kuttiye kantu.
   mohan he-poss child-acc see-pst
   (Mohan saw his child.)

2. mo:han(i) avanRe(i) kuttiye kantu ennu kRisnan paRannu.
   mohan he-poss child-acc see-pst compl krishnan say-pst
   (Krishnan said that Mohan saw his child.)

3. *mo:han(i) avane(i) aticcu.
   mohan he-acc beat-pst
   (Mohan beat him.)

4. mo:han avane(i) aticcu ennu kRisnan(i) paRannu.
   mohan he-acc beat-pst compl krishnan say-pst
   (Krishnan said that Mohan beat him.)

The Algorithm for Intra-sentential Anaphora

A pronoun P is coreferential with an NP iff the following conditions hold:

a. P and NP have compatible P, N, G (person, number, gender) features.
b. P does not precede NP.
c. If P is possessive, then NP is the subject of the clause which contains P.
d. If P is non-possessive, then NP is the subject of the immediate clause which does not contain P.
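Conditions (a), (c) and (d) can be checked against the four Malayalam examples above with a small sketch. It rests on two assumptions that the slides do not spell out: "the immediate clause which does not contain P" is read as the clause that directly embeds the pronoun's clause, and condition (b) is left out because the slides do not say how precedence is measured. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PNG = Tuple[int, str, str]  # (person, number, gender)


@dataclass(frozen=True)
class NounPhrase:
    text: str
    png: PNG
    clause: int        # id of the clause the NP belongs to
    is_subject: bool


@dataclass(frozen=True)
class Pronoun:
    text: str
    png: PNG
    clause: int                   # id of the clause containing the pronoun
    possessive: bool
    parent_clause: Optional[int]  # clause directly embedding it, if any


def coreferential(p: Pronoun, np: NounPhrase) -> bool:
    """Check conditions (a), (c) and (d); condition (b) is not modelled."""
    if p.png != np.png:      # (a) person, number, gender must agree
        return False
    if not np.is_subject:    # (c) and (d) both require a subject NP
        return False
    if p.possessive:         # (c) subject of the pronoun's own clause
        return np.clause == p.clause
    # (d) subject of the embedding clause (assumed reading)
    return p.parent_clause is not None and np.clause == p.parent_clause


M3 = (3, "sg", "m")
# Example 4: clause 1 (mo:han avane aticcu) is embedded in clause 0.
mohan = NounPhrase("mo:han", M3, clause=1, is_subject=True)
krishnan = NounPhrase("kRisnan", M3, clause=0, is_subject=True)
avane = Pronoun("avane", M3, clause=1, possessive=False, parent_clause=0)
```

Under this reading, `avane` in example 4 is linked to `kRisnan` and not to `mo:han`, matching the indices in the example, and the starred example 3 is rejected because a non-possessive pronoun cannot take the subject of its own clause.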

Vasisth is a multilingual Anaphora Resolution system

Rule based

With minimum parsing

Exploits the morphology of Indian languages

“VASISTH” Using Salience Measure for Indian Languages

No In-depth Parsing

Exploit the Rich Morphology of the Language

The analysis depends on the salience weight of the candidate (NP) for the antecedent-hood of an anaphor from a list of probable candidates.

The salience weight assignment

a) The current sentence gets a score of 50 and it reduces by 10 for each preceding sentence till it reaches the fifth sentence. The system considers five sentences for identifying the antecedent.

b) The current clause gets a score of 75 if the pronoun present in the clause is a possessive pronoun and if it is a non-possessive pronoun it gets zero score.

c) The immediate clause gets the score 70 in the case of Possessive pronoun and gets a score of 75 for non-possessive pronouns.

d) For non-immediate clause, the possessive pronoun gets a score of 30 and non-possessive pronoun gets a score of 65.

e) The analysis showed that the subject could be the most probable antecedent for the pronoun. The case markings the subject of a sentence could take are nominative and dative.

A Nominative, a Dative and a Possessive NP with a nominative/Dative head could become a subject of a sentence.

f) The direct object of a sentence could be identified by the case markings and all the case markings other than the subject are considered for object. The next most probable NP for antecedent-hood is the direct object and hence it gets a score of 40.

g) The third NP in a clause, which is not identified as the subject or object, is considered as the indirect object and gets a low score of 30.

Salience factor weights for Indian Languages

Salience Factors                   Weights
Current sentence                   50, reduced by 10 for each preceding sentence up to the 5th sentence
Possessive
  Current clause                   75
  Immediate clause                 70
  Non-immediate clause             30
Non-Possessive
  Current clause                   0
  Immediate clause                 75
  Non-immediate clause             65
Possessive and Non-Possessive
  N.Nom                            80
  N.Poss                           50
  N.Dat                            50
  N.Acc, Loc, Instr…               40
  N.others (3rd NP)                30

How it works

The salience weight is assigned to an NP in the following way:

Identify the pronoun.

Consider four sentences above the sentence containing the pronoun.

Consider all the NPs preceding the pronoun (this is the general rule).

Here we also take some NPs which follow the pronoun, since Tamil, like all Indian languages, has relatively free word order.

Assign salience weights.

The NP which gets the maximum salience weight and agrees in PNG (person, number, gender) with the anaphor is considered the antecedent of the anaphor.
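The steps above can be sketched as a small scoring routine that uses the weights from the salience table. The sketch assumes that the sentence, clause and case weights are simply added together, which the slides do not state explicitly, and that clause position and case have already been read off the morphology. The `Candidate` class, the string labels and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PNG = Tuple[int, str, str]  # (person, number, gender)

WINDOW = 5  # the current sentence plus the four sentences above it

# Clause weights, keyed by whether the pronoun is possessive.
CLAUSE_WEIGHTS = {
    True: {"current": 75, "immediate": 70, "non_immediate": 30},
    False: {"current": 0, "immediate": 75, "non_immediate": 65},
}
CASE_WEIGHTS = {"nom": 80, "poss": 50, "dat": 50,
                "acc": 40, "loc": 40, "instr": 40}
OTHER_CASE_WEIGHT = 30  # "N.others (3rd NP)"


@dataclass(frozen=True)
class Candidate:
    text: str
    png: PNG
    sentence_distance: int  # 0 = same sentence as the pronoun
    clause: str             # "current", "immediate" or "non_immediate"
    case: str


def salience(c: Candidate, possessive: bool) -> Optional[int]:
    """Additive salience, or None if the NP lies outside the window."""
    if not 0 <= c.sentence_distance < WINDOW:
        return None
    sentence_score = 50 - 10 * c.sentence_distance
    return (sentence_score
            + CLAUSE_WEIGHTS[possessive][c.clause]
            + CASE_WEIGHTS.get(c.case, OTHER_CASE_WEIGHT))


def resolve(pronoun_png: PNG, possessive: bool,
            candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the PNG-agreeing candidate with the maximum salience."""
    best, best_score = None, None
    for c in candidates:
        if c.png != pronoun_png:
            continue
        score = salience(c, possessive)
        if score is None:
            continue
        if best_score is None or score > best_score:
            best, best_score = c, score
    return best


png = (3, "sg", "m")
a = Candidate("A", png, 0, "immediate", "nom")      # 50 + 75 + 80 = 205
b = Candidate("B", png, 0, "current", "acc")        # 50 + 0 + 40 = 90
c = Candidate("C", png, 1, "non_immediate", "nom")  # 40 + 65 + 80 = 185
```

For a non-possessive pronoun, candidate A (a nominative NP in the immediate clause) wins over B and C.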

Tools

GATE

Java-RAP (pronouns)

GUITAR (Poesio & Kabadjov, 2004; Kabadjov, 2007)

BART (Versley et al., 2008)

Where is it required?

Machine Translation

Information Extraction

Summarization

And in……….almost all NLU applications

References

Massimo Poesio Slides: “Anaphora resolution for Practical task”

Ruslan Mitkov: “MARS a Knowledge Poor anaphora resolution system”

Thank You
