anaphora, discourse and information structure

Page 1: Anaphora, Discourse and Information Structure

Anaphora, Discourse and Information Structure

Oana Postolache
[email protected]

EGK Colloquium
April 29, 2004

Page 2: Anaphora, Discourse and Information Structure

Overview

• Anaphora Resolution
• Discourse (parsing)
• Balkanet
• Information Structure

Joint work with Prof. Dan Cristea & Prof. Dan Tufis; Univ. of Iasi

Page 3: Anaphora, Discourse and Information Structure

Anaphora Resolution

“If an incendiary bomb drops next to you, don’t lose your head. Put it in a bucket and cover it with sand.”

Ruslan Mitkov (p.c.)

Page 4: Anaphora, Discourse and Information Structure

Anaphora Resolution

“Anaphora represents the relation between a term (named anaphor) and another (named antecedent), when the interpretation of the anaphor is somehow determined by the interpretation of the antecedent.”

Barbara Lust, Introduction to Studies of Anaphora Acquisition, D. Reidel, 1986

Page 5: Anaphora, Discourse and Information Structure

Anaphora Resolution Types

• Coreference resolution
  The anaphor and the antecedent refer to the same entity in the real world.
  Example: Three blind mice, three blind mice. See how they run! See how they run!

• Functional anaphora resolution
  The anaphor and the antecedent refer to two distinct entities that are in a certain relation.
  Example: When the car stopped, the driver got scared.

Halliday & Hasan 1976

Page 6: Anaphora, Discourse and Information Structure

Types of Coreference

• Pronominal coreference
  The butterflies were dancing in the air. They offered an amazing coloured show.
• Common nouns with different lemmas
  Amenophis the IVth's wife was looking through the window. The beautiful queen was sad.
• Common nouns with different lemmas and number
  A patrol was marching in the street. The soldiers were very well trained.
• Proper names
  The President of the U.S. gave a very touching speech. Bush talked about the antiterrorist war.
• Appositions
  Mrs. Parson, the wife of a neighbour on the same floor, was looking for help.
• Nominal predicates
  Maria is the best student of the whole class.
• Function-value coreference
  The visitors agreed on the ticket price. They concluded that $100 was not that much.

Page 7: Anaphora, Discourse and Information Structure

RARE – Robust Anaphora Resolution Engine

[Diagram: text goes into RARE, which uses one of several AR-models (AR-model1, AR-model2, AR-model3) and produces coreference chains.]

Page 8: Anaphora, Discourse and Information Structure

RARE: Two main principles

1. Coreferential relations are semantic, not textual.

[Diagram: a coreferential anaphoric relation between expressions a and b on the text layer is established on the semantic layer: a proposes center_a, and b evokes center_a.]

Page 9: Anaphora, Discourse and Information Structure

RARE: Two main principles

2. Processing is incremental.

[Diagram: three layers (text, projection, semantic). RE a projects PSa, and PSa proposes center_a; RE b projects PSb, and PSb evokes center_a.]

Page 10: Anaphora, Discourse and Information Structure

Terminology

• text layer: referential expressions (REa, REb, REc, REd, ..., REx)
• projection layer: projected structures (PSx)
• semantic layer: discourse entities (DE1, ..., DEj, ..., DEm)

Page 11: Anaphora, Discourse and Information Structure

What is an AR-model?

[Diagram: the text, projection and semantic layers (REa ... REx, PSx, DE1 ... DEm), annotated with the components listed below.]

• primary attributes
• knowledge sources
• heuristics/rules
• domain of referential accessibility

Page 12: Anaphora, Discourse and Information Structure

Primary attributes

1. Morphological (number, lexical gender, person)
2. Syntactic (REs as constituents of a syntactic tree, quality of being adjunct, embedded or complement of a preposition, inclusion or not in an existential construction, syntactic patterns in which the RE is involved)
3. Semantic and lexical (RE’s head position in a conceptual hierarchy, animacy, sex/natural gender, concreteness, inclusion in a synonymy class, semantic roles)
4. Positional (RE’s offset in the text, inclusion in a discourse unit)
5. Surface realisation (zero/clitic/full/reflexive/possessive/demonstrative/reciprocal pronoun, expletive “it”, bare noun, indefinite NP, definite NP, proper noun)
6. Other (domain concept, frequency of the term in the text, occurrence of the term in a heading)

Page 13: Anaphora, Discourse and Information Structure

Knowledge sources

• A knowledge source: a (virtual) processor able to fetch values for attributes on the projection layer

Minimum set: POS-tagger + shallow parser

Page 14: Anaphora, Discourse and Information Structure

Matching Rules

• Certifying Rules (applied first): certify without ambiguity a possible candidate.

• Demolishing Rules (applied afterwards): rule out a possible candidate.

• Scored Rules: increase/decrease a resolution score associated with a pair <PS, DE>.
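The three rule families above could be combined roughly as follows. This is a minimal sketch, not RARE's actual code: the function name `rank_candidates`, the dictionary representation of a PS and a DE, and the three example rules are all assumptions made for illustration.

```python
import math

def rank_candidates(ps, candidates, certifying, demolishing, scored):
    """Return (DE, score) pairs for one PS, best candidate first."""
    # 1. Certifying rules (applied first): one hit settles the match.
    for de in candidates:
        if any(rule(ps, de) for rule in certifying):
            return [(de, math.inf)]
    # 2. Demolishing rules (applied afterwards): drop vetoed candidates.
    survivors = [de for de in candidates
                 if not any(rule(ps, de) for rule in demolishing)]
    # 3. Scored rules: each rule raises or lowers the <PS, DE> score.
    ranked = [(de, sum(rule(ps, de) for rule in scored)) for de in survivors]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical example rules.
def same_proper_name(ps, de):
    return ps.get("proper_name") is not None and ps.get("proper_name") == de.get("proper_name")

def number_mismatch(ps, de):
    return ps.get("number") != de.get("number")

def recency(ps, de):
    return 1.0 / de["distance"]

ps = {"head": "they", "number": "pl"}
candidates = [
    {"id": "DE1", "head": "queen", "number": "sg", "distance": 1},
    {"id": "DE2", "head": "soldiers", "number": "pl", "distance": 2},
    {"id": "DE3", "head": "butterflies", "number": "pl", "distance": 5},
]
ranked = rank_candidates(ps, candidates, [same_proper_name], [number_mismatch], [recency])
print([de["id"] for de, _ in ranked])  # DE1 is ruled out by the number mismatch
```

Keeping the rules as plain functions passed in lists mirrors the idea that an AR-model is a configuration of the engine rather than part of it.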

Page 15: Anaphora, Discourse and Information Structure

Domain of referential accessibility

Filter and order the candidate discourse entities:
a. Linearly (Dorrepaal, Mitkov, ...)
b. Hierarchically (Grosz & Sidner; Cristea, Ide & Romary, ...)

Page 16: Anaphora, Discourse and Information Structure

The engine

for_each RE in RESequence:
    projection(RE)
    proposing/evoking(PS)
    completion(DE,PS)
    re-evaluation
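To make the loop concrete, here is a small Python sketch of the first three phases. Everything beyond the phase names is an assumption: the `PS` and `DE` classes, the toy lexicon used as the only knowledge source, the agreement score and the threshold are illustrative stand-ins, and re-evaluation of postponed decisions is left out.

```python
from dataclasses import dataclass, field

@dataclass
class PS:                       # projected structure
    text: str
    attrs: dict

@dataclass
class DE:                       # discourse entity
    attrs: dict = field(default_factory=dict)
    mentions: list = field(default_factory=list)

# Hypothetical knowledge source: a tiny lexicon standing in for a tagger/parser.
LEXICON = {
    "Maria": {"number": "sg", "gender": "f"},
    "she": {"number": "sg", "gender": "f"},
    "the soldiers": {"number": "pl"},
    "they": {"number": "pl"},
}

def lexicon_source(re_text):
    return LEXICON.get(re_text, {})

def projection(re_text, knowledge_sources):
    """Build a PS whose attributes are filled in by the knowledge sources."""
    attrs = {}
    for source in knowledge_sources:
        attrs.update(source(re_text))
    return PS(re_text, attrs)

def agreement(ps, de):
    """Toy score: +1 per agreeing attribute, -1 per conflicting one."""
    shared = ps.attrs.keys() & de.attrs.keys()
    return sum(1 if ps.attrs[k] == de.attrs[k] else -1 for k in shared)

def proposing_evoking(ps, entities, threshold=1):
    best = max(entities, key=lambda de: agreement(ps, de), default=None)
    if best is not None and agreement(ps, best) >= threshold:
        return best                      # evoke an existing DE
    new_de = DE()
    entities.append(new_de)              # propose a new DE
    return new_de

def completion(de, ps):
    """Copy the PS attributes the DE does not have yet and record the mention."""
    for key, value in ps.attrs.items():
        de.attrs.setdefault(key, value)
    de.mentions.append(ps.text)

def resolve(re_sequence, knowledge_sources):
    entities = []
    for re_text in re_sequence:
        ps = projection(re_text, knowledge_sources)
        de = proposing_evoking(ps, entities)
        completion(de, ps)
        # Re-evaluation of postponed decisions is not modelled in this sketch.
    return entities

chains = [de.mentions for de in resolve(["Maria", "she", "the soldiers", "they"], [lexicon_source])]
print(chains)
```

The PS is discarded once its attributes have been merged into a DE, which is one way to read the incremental principle stated earlier.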

Page 17: Anaphora, Discourse and Information Structure

The engine: Projection

[Diagram: REx on the text layer is projected onto the projection layer as PSx; the primary attributes of PSx are filled in by the knowledge sources. Also shown: REa ... REd, PSd, DEm and DEn.]

Page 18: Anaphora, Discourse and Information Structure

The engine: Proposing

[Diagram: the three layers with REa ... REx, PSd, PSx, DEm and DEn; labels: domain of referential accessibility, heuristics/rules.]

Page 19: Anaphora, Discourse and Information Structure

The engine: Proposing (2)

proposing/evoking(PS):
• apply certifying rules
• apply demolishing rules
• apply scored rules
• sort candidates in descending order of scores
• use thresholds to:
  – propose a new DE
  – link the current PS to an existing DE
  – postpone decision
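One possible reading of the threshold step is sketched below. The slide does not say how many thresholds there are or what their values are, so the two thresholds, their default values and the function name `decide` are assumptions.

```python
def decide(scored_candidates, link_threshold=0.8, new_threshold=0.3):
    """Map scored <PS, DE> candidates to one of the three outcomes.

    scored_candidates: list of (DE, score) pairs, in any order.
    """
    # Sort candidates in descending order of scores.
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    if not ranked or ranked[0][1] < new_threshold:
        return ("propose_new_DE", None)      # nothing plausible: new entity
    best_de, best_score = ranked[0]
    if best_score >= link_threshold:
        return ("link", best_de)             # confident: link PS to this DE
    return ("postpone", None)                # in between: decide later

print(decide([("DE1", 0.9), ("DE2", 0.4)]))
print(decide([("DE1", 0.5)]))
```

A postponed decision is what the re-evaluation phase would revisit once later REs have been processed.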

Page 20: Anaphora, Discourse and Information Structure

The engine: Completion

[Diagram: the three layers with REa ... REx on the text layer, PSd and PSx on the projection layer, and DEm and DEn on the semantic layer.]

Page 21: Anaphora, Discourse and Information Structure

The engine: Completion (2)

[Diagram: the same configuration without PSx; PSd, DEm and DEn remain.]

Page 22: Anaphora, Discourse and Information Structure

The engine: Re-evaluation

[Diagram: the three layers with REa ... REx, PSd, DEm and DEn.]

Page 23: Anaphora, Discourse and Information Structure

The engine: Re-eval (2)

[Diagram: the same configuration without PSd; only DEm and DEn remain below the text layer.]

Page 24: Anaphora, Discourse and Information Structure

The Coref Corpus

• 4 chapters from George Orwell’s novel “1984”, totalling approx. 19,500 words.
• Preprocessed using a POS-tagger and an FDG parser.
• The NPs were automatically extracted from the FDG structure (some manual corrections were necessary, as was adding other types of referential expressions).
• Manual annotation of the coreferential links (each text was assigned to two annotators).
• Inter-annotator agreement: as low as 60%.

Our annotation conforms to MUC & ACE.

Page 25: Anaphora, Discourse and Information Structure

The Coref Corpus

                                  Text 1  Text 2  Text 3  Text 4   Total
No. of sentences                     311     175     169     328     983
No. of words                        6935    3317    3260    6008   19520
No. of REs                          1942     914     916    1702    5472
Average no. of REs per sentence      6.2     5.2     5.4     5.1     5.4
Pronouns                             645     281     362     614    1902
No. of DEs                           921     520     464     863

Page 26: Anaphora, Discourse and Information Structure

Evaluation

Success Rate = #correctly solved anaphors / all anaphors

For the four texts we obtained values between 60% and 70%.

(Mitkov 2000)
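As a worked illustration of the success-rate formula, the following sketch computes it from two mappings. The representation (anaphor id mapped to entity id) and the sample data are assumptions made for the example; they are not taken from the corpus.

```python
def success_rate(predicted, gold):
    """Fraction of gold anaphors whose predicted antecedent entity is correct."""
    if not gold:
        raise ValueError("no anaphors to evaluate")
    correct = sum(1 for anaphor, entity in gold.items()
                  if predicted.get(anaphor) == entity)
    return correct / len(gold)

# Hypothetical data: three anaphors, two of them resolved correctly.
gold = {"a1": "DE1", "a2": "DE2", "a3": "DE3"}
predicted = {"a1": "DE1", "a2": "DE2", "a3": "DE9"}
rate = success_rate(predicted, gold)
print(f"{rate:.2f}")
```

An anaphor that the system leaves unresolved simply counts as incorrect here, since it is missing from `predicted`.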

Page 27: Anaphora, Discourse and Information Structure

Road Map

• Anaphora Resolution
• Discourse (parsing)
• Balkanet
• Information Structure

Page 28: Anaphora, Discourse and Information Structure

Discourse Parsing

Input: plain text

Goal:
- Automatically obtain a discourse structure of the text (resembling RST trees).
- Apply the Veins Theory to produce focused summaries.

Cristea, Ide & Romary 1998

Page 29: Anaphora, Discourse and Information Structure

Veins Theory: Quick Intro

Cristea, Ide & Romary 1998

[Figure: a discourse tree over units 1 to 5 in which every node carries a head expression (H) and a vein expression (V). The leaves have H=1, H=2, H=3, H=4 and H=5; the internal nodes have H=1, H=3, H=1 3 and H=1 3 5. All nodes have V=1 3 5 except two, which have V=1 2 3 5 and V=1 3 4 5.]

Head expression: the sequence of the most important units within the corresponding span of text

Vein expression: the sequence of units that are required to understand the span of text covered by the node, in the context of the whole discourse
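The head and vein expressions can be computed bottom-up and top-down over a binary tree. The sketch below follows my reading of the rules in Cristea, Ide & Romary 1998 (heads are concatenated from nuclear children; veins are inherited from the parent, with satellites adding their own head). The `Node` class, the helper names and the tree shape are assumptions: the shape was chosen so that the computed values match the H and V labels listed on this slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    nuclear: bool                      # nucleus (True) or satellite (False)
    unit: Optional[int] = None         # unit number, set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    head: tuple = ()                   # unit numbers
    vein: tuple = ()                   # (unit, marked) pairs

def seq(*parts):
    """Merge label sequences in text order; an unmarked copy wins over a marked one."""
    merged = {}
    for part in parts:
        for unit, marked in part:
            merged[unit] = merged.get(unit, True) and marked
    return tuple(sorted(merged.items()))

def compute_heads(node):
    if node.unit is not None:
        node.head = (node.unit,)
        return
    compute_heads(node.left)
    compute_heads(node.right)
    # Head of an internal node: heads of its nuclear children, concatenated.
    node.head = tuple(u for child in (node.left, node.right)
                      if child.nuclear for u in child.head)

def compute_veins(node, parent_vein=None, sibling=None, is_left=True):
    own = tuple((u, False) for u in node.head)
    if parent_vein is None:                        # root: vein = head
        node.vein = own
    elif node.nuclear:
        if not is_left and not sibling.nuclear:    # left sibling is a satellite
            marked = tuple((u, True) for u in sibling.head)
            node.vein = seq(marked, parent_vein)
        else:
            node.vein = parent_vein
    elif is_left:                                  # satellite on the left
        node.vein = seq(own, parent_vein)
    else:                                          # satellite on the right
        unmarked = tuple(p for p in parent_vein if not p[1])
        node.vein = seq(own, unmarked)
    if node.unit is None:
        compute_veins(node.left, node.vein, node.right, True)
        compute_veins(node.right, node.vein, node.left, False)

def units(vein):
    return [u for u, _ in vein]

# Assumed tree: ((1 2) (3 4)) 5, where units 2 and 4 are satellites.
leaf = {i: Node(nuclear=(i not in (2, 4)), unit=i) for i in range(1, 6)}
n12 = Node(True, left=leaf[1], right=leaf[2])
n34 = Node(True, left=leaf[3], right=leaf[4])
n1234 = Node(True, left=n12, right=n34)
root = Node(True, left=n1234, right=leaf[5])
compute_heads(root)
compute_veins(root)
print(root.head, units(leaf[2].vein), units(leaf[4].vein))
```

With this tree the only veins that differ from 1 3 5 are those of the two satellite units, which is consistent with the V labels above.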

Page 30: Anaphora, Discourse and Information Structure

Focused Summaries

We call a focused summary on an entity X a coherent excerpt presenting how X is involved in the story that constitutes the content of the text.

- It is given by the vein expression of the unit to which X belongs.
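Read literally, this definition turns summary extraction into a lookup: find the unit containing X, take its vein expression, and output those units in text order. A minimal sketch follows; the function name, the dictionary layout and the placeholder unit texts are assumptions.

```python
def focused_summary(entity_unit, veins, unit_text):
    """Join the texts of the units on the vein of the unit that contains X."""
    return " ".join(unit_text[u] for u in veins[entity_unit])

# Hypothetical inputs: vein expressions per unit and placeholder clause texts.
veins = {2: [1, 2, 3, 5], 4: [1, 3, 4, 5]}
unit_text = {i: f"<unit {i}>" for i in range(1, 6)}

summary = focused_summary(2, veins, unit_text)
print(summary)
```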

Page 31: Anaphora, Discourse and Information Structure

The method

[Pipeline diagram. Boxes: plain text; FDG parser; segments detector; NP detector; sentence tree extractor; AR-engine; tagged corefs; s-trees; discourse parser; discourse structure; Veins Theory; focused summary.]

Page 32: Anaphora, Discourse and Information Structure

The method

[Pipeline diagram repeated.]

FDG parser: Connexor FDG parser
http://www.connexor.com/m_syntax.html

Page 33: Anaphora, Discourse and Information Structure

The method

[Pipeline diagram repeated.]

NP detector: extracts NPs from the FDG structure.

Page 34: Anaphora, Discourse and Information Structure

The method

[Pipeline diagram repeated.]

AR-engine: RARE...

Page 35: Anaphora, Discourse and Information Structure

The method

[Pipeline diagram repeated.]

Segments detector: detects the boundaries of clauses, based on learning methods.

Georgiana Puscasu (2004): A Multilingual Method for Clause Splitting.

Page 36: Anaphora, Discourse and Information Structure

The method

[Pipeline diagram repeated.]

Sentence tree extractor:
– Proposes one or more tree structure(s) at the sentence level.
– The leaves are the clauses previously detected.
– Uses the FDG structure and the cue-phrases.

Page 37: Anaphora, Discourse and Information Structure

The method

[Pipeline diagram repeated, with no additional annotation.]

Page 38: Anaphora, Discourse and Information Structure

The Discourse Parser

– We have trees for each sentence;
– The goal is to incrementally integrate these trees into a single structure corresponding to the entire text.

The current tree is inserted at each node on the right frontier; each resulting structure is scored considering:
– The coreference links
– Centering Theory
– Veins Theory

[Diagram: a tree with a foot node marked *.]

Cristea, Postolache, Pistol (2004): Summarization through Discourse Structure (submitted to Coling)

Page 39: Anaphora, Discourse and Information Structure

The Discourse Parser

– At the end of the process: a set of trees corresponding to the input text, each with a score
– T* = argmax_{T_i} score(T_i)
– Veins(T*)
– Extract the summary
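The insertion and selection steps can be pictured with a small sketch. Trees are nested tuples and leaves are unit numbers; attaching a new tree at a right-frontier node is modelled as replacing that node with a new parent over the node and the new tree. Both this representation and the scoring function are assumptions: the real parser scores candidates using coreference links, Centering Theory and Veins Theory, whereas the placeholder below merely prefers shallower trees.

```python
def attachments(tree, new):
    """Yield every tree obtained by attaching `new` at a right-frontier node."""
    yield (tree, new)                      # attach at the current frontier node
    if isinstance(tree, tuple):            # then descend along the right edge
        left, right = tree
        for sub in attachments(right, new):
            yield (left, sub)

def height(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[0]), height(tree[1]))

def placeholder_score(tree):
    # Stand-in for the coreference / Centering / Veins based score.
    return -height(tree)

current = ((1, 2), 3)                      # discourse tree built so far
candidates = list(attachments(current, 4)) # one candidate per frontier node
best = max(candidates, key=placeholder_score)   # T* = argmax score(T_i)
print(candidates)
print(best)
```

A full parser would keep several scored candidates alive across sentences; this sketch keeps only the best one at a single step.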

Page 40: Anaphora, Discourse and Information Structure

Discussion & Evaluation

- We do obtain coherent summaries automatically!
- How to evaluate?
- We have 90 summaries made by humans...
  1) Construct a golden summary out of the 90 summaries and compare it with the system output?
  2) Compare the system output with all 90 summaries and take the best result?

Page 41: Anaphora, Discourse and Information Structure

Road Map

• Anaphora Resolution
• Discourse (parsing)
• Balkanet
• Information Structure

Page 42: Anaphora, Discourse and Information Structure

Information Structure

Many approaches for IS:
• Prague School Approach;
• Formal account of English intonation;
• Integrating different means of IS realization within one grammar framework;
• Formal semantics of focus;
• Formal semantics of topic;
• Integrating IS within a theory of discourse interpretation;
• IS-sensitive discourse context updating;

Sgall et al; Steedman; Kruijff; Krifka, Rooth; Hendriks; Vallduvi, Kruijff-Korbayova

Page 43: Anaphora, Discourse and Information Structure

Information Structure

Goals:
• Improve/create/enlarge a corpus annotated for IS (and not only for IS);
• Investigate means of continuing the annotation (at least partially) automatically;
• Investigate how the (major) NLP tasks can benefit from IS;
• Find correlations between different features.

System that detects IS

Page 44: Anaphora, Discourse and Information Structure

Summary

Anaphora Resolution: RARE

Discourse Parsing: Veins theory

Balkanet: Multilingual WordNet

Information Structure

Page 45: Anaphora, Discourse and Information Structure

References

• Postolache, Oana. 2004. “A Coreference Resolution Model on Excerpts from a novel”. ESSLLI’04, to appear.
• Postolache, Oana. 2004. “RARE: Robust Anaphora Resolution Engine”. M.Sci. thesis. Univ. of Iasi.