Empirical Evaluation of Pronoun Resolution and Clausal Structure Joel Tetreault and James Allen University of Rochester Department of Computer Science


Page 1: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Empirical Evaluation of Pronoun Resolution and Clausal Structure

Joel Tetreault and James Allen

University of Rochester, Department of Computer Science

Page 2: Empirical Evaluation of Pronoun Resolution and Clausal Structure

RST and pronoun resolution

Previous work suggests that breaking utterances apart into clauses (Kameyama, 1998), or assigning them a hierarchical structure (Grosz and Sidner, 1986; Webber, 1988), can aid pronoun resolution in two ways:

1. Make search more efficient (fewer entities to consider)
2. Make search more successful (block competing antecedents)

Empirical work has focused on using segmentation to limit the accessibility space of antecedents.

We test this claim by performing an automated study on a corpus (a 1241-sentence subsection of the Penn Treebank; 454 third-person pronouns).

Page 3: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Rhetorical Structure Theory

A way of organizing and describing natural text (Mann and Thompson, 1988). It identifies a hierarchical structure and describes binary relations between text parts.

Page 4: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Experiment

Create a coreference corpus that includes Penn Treebank syntactic trees and RST information.

Run pronoun algorithms over this merged data set to determine baseline scores:
- LRC (Tetreault, 1999)
- S-list (Strube, 1998)
- BFP (Brennan et al., 1987)

Develop algorithms that use clausal information to compare against the baseline.

Page 5: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Corpus

52 Wall Street Journal articles from the 1995 Penn Treebank: 1273 sentences, 7594 words, 454 third-person pronouns.

The pronoun corpus is annotated in the same manner as Ge and Charniak (1998); the RST corpus comes from the RST Discourse Treebank (Marcu et al., 2002).

Page 6: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Pronoun Corpus

( (S (S (NP-SBJ-1#-290~1 (DT The) (NN package))
        (VP (VBD was)
            (VP (VBN termed)
                (S (NP-SBJ (-NONE- *-1))
                   (ADJP-PRD (JJ excessive)))
                (PP (IN by)
                    (NP-LGS (DT the) (NNP Bush) (NN administration))))))
     (, ,) (CC but)
     (S (NP-SBJ (PRP#OBJREF-290~2 it))
        (ADVP (RB also))
        (VP (VBD provoked) (NP …

Page 7: Empirical Evaluation of Pronoun Resolution and Clausal Structure

RST Corpus

(SATELLITE (SPAN |4| |19|) (REL2PAR ELABORATION-ADDITIONAL)
  (SATELLITE (SPAN |4| |7|) (REL2PAR CIRCUMSTANCE)
    (NUCLEUS (LEAF |4|) (REL2PAR CONTRAST)
      (TEXT _!The package was termed excessive by the Bush administration,_!))
    (NUCLEUS (SPAN |5| |7|) (REL2PAR CONTRAST)
      (NUCLEUS (LEAF |5|) (REL2PAR SPAN)
        (TEXT _!but it also provoked a struggle with influential California lawmakers_!))

Page 8: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Baseline Results

Algorithm   % Right (S)   % Right (C)
LRC         80.8%         76.4%
S-list      73.4%         70.0%
BFP         59.5%         48.7%
Naïve       50.7%         56.0%

Page 9: Empirical Evaluation of Pronoun Resolution and Clausal Structure

LRC Algorithm

While processing the utterance's entities (left to right):

Push each entity onto Cf-list-new; if the entity is a pronoun, attempt to resolve it first:
- Search through Cf-list-new (left to right), taking the first candidate that meets gender, agreement, and related constraints.
- If none is found, search past utterances' Cf-lists, starting from the previous utterance back to the beginning of the discourse.
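The search order above can be sketched as follows. The entity representation and the agrees filter are simplified stand-ins for the full set of agreement constraints, not the published implementation:

```python
def agrees(pronoun, entity):
    """Toy agreement filter: match on gender and number only."""
    return (pronoun["gender"] == entity["gender"]
            and pronoun["number"] == entity["number"])

def lrc_resolve(pronoun, cf_list_new, past_cf_lists):
    """LRC-style search: current Cf-list left-to-right, then earlier
    utterances' Cf-lists from the most recent back to the start."""
    for entity in cf_list_new:                 # current utterance, l-to-r
        if agrees(pronoun, entity):
            return entity
    for cf_list in reversed(past_cf_lists):    # most recent utterance first
        for entity in cf_list:
            if agrees(pronoun, entity):
                return entity
    return None                                # unresolved

# "The company said it ..." -- "it" resolves to "the company"
pron = {"gender": "neuter", "number": "sg"}
current = [{"name": "the company", "gender": "neuter", "number": "sg"}]
past = [[{"name": "the investors", "gender": "neuter", "number": "pl"}]]
print(lrc_resolve(pron, current, past)["name"])  # the company
```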

Page 10: Empirical Evaluation of Pronoun Resolution and Clausal Structure

LRC Error Analysis (89 errors)

(24) Minimal S: "the committee said the company reneged on its obligations"

(21) Localized errors: "…to get a customer's 1100 parcel-a-week load to its doorstep"

(15) Preposed phrase: "Although he was really tired, John managed to drive 10 hours without sleep"

Page 11: Empirical Evaluation of Pronoun Resolution and Clausal Structure

LRC Errors (2)

(12) Parallelism: "It more than doubled the Federal's long-term debt to 1.9 billion dollars, thrust the company into unknown territory – heavy cargo – and suddenly expanded its landing rights to 21 countries from 4."

(11) Competing antecedents: "The weight of Lebanon's history was also against him, and it is a history…"

(4) Plurals referring to companies: "The Ministry of Construction spreads concrete…. But they seldom think of the poor commuters."

Page 12: Empirical Evaluation of Pronoun Resolution and Clausal Structure

LRC Errors (3)

(2) Genitive errors: "Mr. Richardson wouldn't offer specifics regarding Atco's proposed British project, but he said it would compete for customers…"

Page 13: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Advanced Approaches

Grosz and Sidner (1986): discourse structure depends on intentional structure. Attentional state is modeled as a stack that is pushed and popped as the intentional structure changes.

Veins Theory (Ide and Cristea, 2000): the positions of nuclei and satellites in an RST tree determine the DRA (domain of referential accessibility) for each clause.

Page 14: Empirical Evaluation of Pronoun Resolution and Clausal Structure

G&S Accessibility

[Diagram: a stack of focus spaces holding entities e1–e6 and the pronoun p1. The space containing e6 and p1 is on top, e1 and e2 sit at the bottom, and the space containing e3 has already been popped, so e3 is inaccessible.]

Search order: e6, e5, e4, e1, e2
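Under the stack model, resolution considers only entities in focus spaces still on the stack, from the top down. A minimal sketch, assuming the stack from the figure (bottom to top: {e1, e2}, {e4}, {e5}, {e6}), with the popped space holding e3 simply absent:

```python
def gs_search_order(stack):
    """Attentional state as a stack of focus spaces (stack[-1] is the
    current space); entities are searched from the top space down."""
    order = []
    for space in reversed(stack):   # top of stack first
        order.extend(space)
    return order

# Popped spaces (here the one holding e3) are no longer on the stack,
# so their entities are inaccessible.
stack = [["e1", "e2"], ["e4"], ["e5"], ["e6"]]
print(gs_search_order(stack))  # ['e6', 'e5', 'e4', 'e1', 'e2']
```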

Page 15: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Veins Theory

Each RST discourse unit (leaf) has an associated vein (Cristea et al., 1998; Ide and Cristea, 2000).

A vein provides a "summary of the discourse fragment that contains that unit."

It contains the salient parts of the RST tree: the preceding nuclei and surrounding satellites.

Veins are determined by whether a node is a nucleus or a satellite and by what its left and right children are.

Page 16: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Veins Algorithm

Use the same data set, augmented with head and vein information (automatically computed).

Exception: the RST data set has some multi-child nodes; assume all extra children are right children.

Bonus: areas to the left of the root are potentially accessible, which makes global topics introduced at the beginning of the discourse accessible.

Implementation: search each unit in the entity's DRA, starting with the most recent unit and moving left-to-right within each clause. If no antecedent is found, fall back to LRC.
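A minimal sketch of that search strategy, assuming the veins (and hence each unit's DRA) have been computed elsewhere; entities_of, agrees, and lrc_fallback are hypothetical helpers standing in for the real components:

```python
def vt_resolve(pronoun, dra_units, entities_of, agrees, lrc_fallback):
    """Search the units of the pronoun's DRA from most recent back,
    left-to-right within each unit; back off to LRC on failure."""
    for unit in reversed(dra_units):       # most recent DRA unit first
        for entity in entities_of(unit):   # left-to-right within the unit
            if agrees(pronoun, entity):
                return entity
    return lrc_fallback(pronoun)           # DRA exhausted

# Toy usage: one DRA unit containing a single compatible entity.
ents = {1: ["Lion Nathan"]}
found = vt_resolve(
    pronoun="it",
    dra_units=[1],
    entities_of=lambda u: ents[u],
    agrees=lambda p, e: True,      # stand-in agreement check
    lrc_fallback=lambda p: None,
)
print(found)  # Lion Nathan
```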

Page 17: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Transforms

Goal of the transforms: flatten the corpus somewhat to create larger segments, so that more entities can be considered.

SAT: merge a satellite leaf into its sibling if the sibling is a subtree whose children are all leaves.

SENT: merge the clauses of each sentence in the RST tree back into the sentence.

ATT: merge clauses that stand in an Attribution relation.
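As an illustration, the SENT transform can be sketched on a toy representation in which each clause leaf carries the id of the sentence it came from. This flat list-of-leaves encoding is an assumption for illustration, not the RST Discourse Treebank format:

```python
def sent_transform(leaves):
    """Merge adjacent clause leaves that share a sentence id, restoring
    one segment per sentence (the SENT transform, on a toy encoding)."""
    merged = []
    for sent_id, text in leaves:
        if merged and merged[-1][0] == sent_id:
            # Same sentence as the previous leaf: fold the clause back in.
            merged[-1] = (sent_id, merged[-1][1] + " " + text)
        else:
            merged.append((sent_id, text))
    return merged

# Toy leaves: sentence 1 was split into two clauses by the RST analysis.
leaves = [
    (1, "The package was termed excessive by the Bush administration,"),
    (1, "but it also provoked a struggle"),
    (2, "Lawmakers responded quickly."),
]
print(len(sent_transform(leaves)))  # 2: sentence 1's clauses are merged
```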

Page 18: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Transform Examples

[Diagram: (1) ORIG — a subtree whose nucleus spans Nuc-leaf C1 and Sat-leaf C2 (C1 and C2 stand in an Attribution relation), with Sat-leaf C3 as a sibling satellite; (2) SAT — the satellite leaf merged into its sibling subtree, flattening the node; (3) SENT — the whole subtree collapsed into a single leaf, C1 + C2 + C3; (4) ATT — C1 and C2 merged into one leaf, with C3 left separate.]

Page 19: Empirical Evaluation of Pronoun Resolution and Clausal Structure

SAT example

ORIGINAL:
Nucleus
  Nucleus
    Nuc-leaf: "S.A. Brewing would make a takeover offer for all of Bell Resources"
    Sat-leaf (condition): "if it exercises the option"
  Sat-leaf (attribution): "according to the commission."

TRANSFORM:
Nucleus
  Nuc-leaf: "S.A. Brewing would make a takeover offer for all of Bell Resources"
  Sat-leaf: "if it exercises the option"
  Sat-leaf: "according to the commission"

Page 20: Empirical Evaluation of Pronoun Resolution and Clausal Structure

SENT example

ORIGINAL:
Nucleus
  Nuc-leaf: "Under the plan, Costa Rica will buy back roughly 60% of its bank debt at a deeply discounted price"
  Satellite (attribution)
    Nuc-leaf: "according to officials"
    Sat-leaf (elaboration): "involved in the agreement."

TRANSFORM:
Nuc-leaf: "Under the plan, Costa Rica will buy back roughly 60% of its bank debt at a deeply discounted price, according to officials involved in the agreement"

Page 21: Empirical Evaluation of Pronoun Resolution and Clausal Structure

ATT example

ORIGINAL:
Satellite (summary)
  Nuc-leaf: "Lion Nathan has a concluded contract with Bond and Bell Resources,"
  Sat-leaf (attribution): "said Douglas Myers, Chief Executive of Lion Nathan."

TRANSFORM:
Sat-leaf (summary): "Lion Nathan has a concluded contract with Bond and Bell Resources, said Douglas Myers, Chief Executive of Lion Nathan"

Page 22: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Results

Transform   Veins (S)   Veins (C)   GS (S*)   GS (S)   GS (C)
Original    78.9        76.7        72.3      78.9     71.4
ATT         79.3        78.2        73.7      79.3     76.3
SAT         78.9        76.4        73.6      79.1     73.9
SENT        N/A         N/A         78.5      80.8     N/A
SENT-SAT    N/A         N/A         79.7      80.8     N/A

Page 23: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Long-Distance Resolution

There are 10 cases in the corpus of pronouns whose antecedents lie more than 2 utterances away, most in Attribution relations.

LRC gets them all correct, since there are no competing antecedents ("him", "their").

Veins (without the ATT transform) gets 6 out of 10.

With the transforms, all algorithms get 100%.

Page 24: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Conclusions

Two ways to measure the success of the decomposition strategy: intrasentential and intersentential resolution.

Intra: no improvement; it is better to use grammatical function.

Inter: the long-distance resolutions make it hard to draw concrete conclusions. More data is needed to determine whether the transforms give a good approximation of segmentation.

Using G&S accessibility over clauses does not seem to work either.

At a minimum, even if a method performs the same, it has the advantage of a smaller search space.

Page 25: Empirical Evaluation of Pronoun Resolution and Clausal Structure

Future Work

Error analysis shows that determining coherence relations could account for several intrasentential cases.

Use the rhetorical relations themselves to constrain the accessibility of entities.

Annotate human-human dialogues in the TRIPS 911 domain for reference; they have already been annotated for argumentation acts (Stent, 2001).