Post on 24-Aug-2020

Attribution and the PDTB

Silvia Pareti

The University of Edinburgh School of Informatics

Outline

• Introduction

• Attribution in the PDTB

• Annotation schema extension

• Resources development

• Preliminary achievements

• Future directions

Introduction - Attribution

(wsj_0961)

PDTB - Attribution Annotation

Mr. Nemeth said in parliament that Czechoslovakia and Hungary would suffer environmental damage if the twin dams were built as planned. (wsj_0037)

____Explicit____

9163..9165

#### Text ####

if

#### Features ####

Ot, Comm, Null, Null

9067..9096

#### Text ####

Mr. Nemeth said in parliament

##############

if, Contingency.Condition.Unreal present

____Arg1____

9097..9162

#### Text ####

that Czechoslovakia and Hungary would suffer environmental damage

#### Features ####

Inh, Null, Null, Null

____Arg2____

9166..9201

#### Text ####

the twin dams were built as planned

#### Features ####

Inh, Null, Null, Null

(Prasad et al., 2008)
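The numeric pairs in the record above read as stand-off character spans into the raw article text. A minimal Python sketch of how such spans can be resolved, assuming 0-based offsets with an exclusive end and assuming the example sentence starts at offset 9067 of the raw wsj_0037 text (the helper names `parse_span` and `span_text` are hypothetical, not part of any PDTB tool):

```python
# Sketch: resolve PDTB-style "start..end" spans against raw text.
# Assumptions (not stated on the slide): offsets are 0-based character
# positions with an exclusive end, and the sentence starts at offset 9067.

SENTENCE = ("Mr. Nemeth said in parliament that Czechoslovakia and Hungary "
            "would suffer environmental damage if the twin dams were built "
            "as planned.")
SENTENCE_START = 9067


def parse_span(span: str) -> tuple[int, int]:
    """Turn '9067..9096' into the integer pair (9067, 9096)."""
    start, end = span.split("..")
    return int(start), int(end)


def span_text(span: str, text: str = SENTENCE, base: int = SENTENCE_START) -> str:
    """Return the text covered by a span, given the offset of text[0]."""
    start, end = parse_span(span)
    return text[start - base:end - base]


if __name__ == "__main__":
    spans = [("attribution", "9067..9096"), ("Arg1", "9097..9162"),
             ("connective", "9163..9165"), ("Arg2", "9166..9201")]
    for label, span in spans:
        print(f"{label:12s} {span:12s} {span_text(span)}")
```

Under these assumptions the four spans recover exactly the attribution span, Arg1, the connective and Arg2 shown in the record.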

Other corpora with attribution

• MPQA Opinion Corpus (Wiebe et al., 2002)

– 692 articles

– intra-sentential annotation

• RST Discourse Treebank (Carlson & Marcu, 2001)

– 385 articles

– intra-sentential; only explicit sources; cues limited to verbs or "according to"

• GraphBank (Wolf & Gibson, 2005)

– 135 articles

– only attributions not overlapping with other discourse relations

• Other smaller or low-coverage projects

– Sydney Morning Herald Corpus (O’Keefe et al., submitted)

– Corpus TCC and RHETALHO (Pardo et al., 2004)

PDTB - Advantages

Large corpus

Less frequent structures and strategies are better observed, e.g.:

Groused Robert Antolini, head of over-the-counter trading at Donaldson, Lufkin & Jenrette: "It's making it tough for traders to make money." (wsj_1142)

For some at the SEC, an agency that covets its independence, Mr. Breeden may be too much of a Washington insider. (wsj_0955)

PDTB - Advantages

The range of attributions covered is not pre-defined

• Attributions are not limited to the sentence level

• A wide range of attributions are annotated:

– direct, indirect and mixed

– with named or unnamed sources, explicit as well as implicit (e.g. it is believed…)

– with verb and non-verb cues (e.g. idea, for)

• Includes some relevant features

PDTB - Extensions

• Finer grained annotation of the attribution span: source, cue, circumstantial information

• Completing content spans of some direct or mixed attributions

"It's just sort of a one-upsmanship thing with some people," added Larry Shapiro. "They like to talk about having the new Red Rock Terrace one of Diamond Creek's Cabernets or the Dunn 1985 Cabernet, or the Petrus.

Producers have seen this market opening up and they're now creating wines that appeal to these people."

(wsj_0071)

• Annotation of attributions not overlapping with discourse relations

• Annotation of nested attributions

["The Caterpillar people aren't too happy when they see their equipment used like that,"]

[shrugs] [Mr. George].

["They figure it's not a very good advert."] (wsj_1121)

PDTB - Extensions

[They] [figure] [it's not a very good advert]

Annotation Schema

source

cue

SUPPLEMENT

content

[Mr. Nemeth said IN PARLIAMENT] that Czechoslovakia and Hungary would suffer environmental damage if the twin dams were built as planned. (wsj_0037)

[PDTB attribution span]

PDTB discourse connective /Arg1/Arg2 text spans

PDTB Attribution Features

attribution type

• assertion (e.g. say, mention)

• belief (e.g. think, doubt)

• fact (e.g. remember, know)

• eventuality (e.g. allow)

source type

• writer (if explicit, e.g. I think...)

• other (e.g. Mr. Brown, a witness)

• arbitrary (e.g. one, people)

• mixed (e.g. My assessment and everyone's assessment is… (wsj_2012))

factuality (determinacy)

• factual

• non-factual

scopal change (scopal polarity)

• none

• scopal change
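The four-slot feature strings in a PDTB record (e.g. "Ot, Comm, Null, Null") can be read against the value lists above. The sketch below is only an illustration: the abbreviation tables and the slot order (source, type, polarity, determinacy) are assumptions based on my reading of the PDTB 2.0 conventions and are not given on the slides, and `decode_features` is a hypothetical helper.

```python
# Sketch: decode a PDTB-style feature string into the labels used on the slides.
# ASSUMPTION: slot order is source, type, scopal polarity, determinacy, and the
# abbreviations below follow PDTB 2.0 usage; neither is stated on the slides.

SOURCE = {"Wr": "writer", "Ot": "other", "Arb": "arbitrary", "Inh": "inherited"}
TYPE = {"Comm": "assertion", "PAtt": "belief", "Ftv": "fact",
        "Ctrl": "eventuality", "Null": "none"}
POLARITY = {"Neg": "scopal change", "Null": "none"}
DETERMINACY = {"Indet": "non-factual", "Null": "factual"}


def decode_features(features: str) -> dict[str, str]:
    """Map 'Ot, Comm, Null, Null' to readable feature labels."""
    source, type_, polarity, determinacy = (f.strip() for f in features.split(","))
    return {
        "source type": SOURCE[source],
        "attribution type": TYPE[type_],
        "scopal change": POLARITY[polarity],
        "factuality": DETERMINACY[determinacy],
    }


if __name__ == "__main__":
    print(decode_features("Ot, Comm, Null, Null"))
```

An unknown abbreviation raises a `KeyError`, which is deliberate here: a silent default would hide a mismatch between the assumed tables and the actual data.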

Se c’è, cioè, una maggioranza in Parlamento in grado di affrontare seriamente una fase di riforme anche elettorali, Ø penso che la legislatura possa utilmente proseguire. (re075)

If, that is, there is a majority in Parliament able to seriously tackle a phase of reforms, including electoral ones, (I) think that the legislature could usefully continue.

New Attribution Features

source attitude

• neutral (e.g. say, add)

• positive (e.g. welcome, beam)

• critical (e.g. lament, fume)

• tentative (e.g. believe, suggest)

• other (e.g. joke)

authorial stance

• committed (e.g. admit, know)

• not-committed (e.g. lie, claim)

• neutral (e.g. say, suggest)

New Attribution Features

Mr. Nemeth said IN PARLIAMENT that Czechoslovakia and Hungary would suffer environmental damage if the twin dams were built as planned. (wsj_0037)

Attribution type: assertion

Source type: other

Factuality: factual

Scopal change: none

Source attitude: neutral

Authorial stance: neutral

Confronted, Mrs. Yeargin admitted she had given the questions and answers two days before the examination to two low-ability geography classes. (wsj_0044)

Authorial stance: committed

"I think that this magazine is not only called Garbage, but it is practicing journalistic garbage," fumes a spokesman for Campbell Soup. (wsj_0062)

Source attitude: critical

Inter-Annotator Agreement

• 2 annotators

• 14 articles (PDTB)

• annotation manual

• training on an article

• MMAX2 annotation tool (Müller & Strube, 2006)

• complete annotation schema

Data:

• 491 attributions

(22% are nested)

(Pareti, 2012 submitted)

Results - Existence of Attribution

agr = 0.87: the proportion of commonly annotated relations with respect to the annotations identified overall by Annotator A and Annotator B

NOTE: writer attributions were annotated only if explicit

Span selection tasks (agr metric):

Cue         0.97
Source      0.94
Content     0.95
Supplement  0.37
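The agr figures above can be illustrated with a small sketch. It assumes agr is directional (the share of one annotator's annotations that the other also produced) and that the two directions are averaged; the slide does not say how they are combined, nor how partially overlapping spans are matched, so the exact-match sets and the function names here are assumptions.

```python
# Sketch of a directional agreement score and its symmetric mean.
# ASSUMPTIONS: annotations are compared by exact identity (e.g. span tuples or
# ids), and the reported figure averages both directions.

def agr(a: set, b: set) -> float:
    """Share of annotator a's annotations that annotator b also made."""
    if not a:
        raise ValueError("annotator a has no annotations")
    return len(a & b) / len(a)


def mean_agr(a: set, b: set) -> float:
    """Arithmetic mean of the two directional scores."""
    return (agr(a, b) + agr(b, a)) / 2


if __name__ == "__main__":
    # Toy data: annotator A marks 10 relations, annotator B marks 8 of the same.
    annotator_a = set(range(10))
    annotator_b = set(range(2, 10))
    print(agr(annotator_a, annotator_b))   # direction A -> B
    print(agr(annotator_b, annotator_a))   # direction B -> A
    print(mean_agr(annotator_a, annotator_b))
```

In the toy data the score is 0.8 in one direction and 1.0 in the other, which shows why the direction matters when one annotator marks more relations than the other.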

Results - Features

FEATURE            PERCENT AGREEMENT (COUNT)   COHEN'S KAPPA
TYPE               83.42 (317)                 0.63
SOURCE             95 (361)                    0.71
SCOPAL CHANGE      98.68 (375)                 0.60
AUTHORIAL STANCE   94.47 (359)                 0.20
SOURCE ATTITUDE    82.36 (313)                 0.48
FACTUALITY         97.63 (371)                 0.73
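Percent agreement and Cohen's kappa can diverge sharply, as the authorial stance row shows (94.47% against 0.20). The sketch below computes both on invented toy labels to show one plausible reason: when almost every item gets the same label, chance agreement is already very high, so kappa stays low. The counts in parentheses above are consistent with 380 jointly annotated attributions (e.g. 361/380 = 95%), but that denominator is an inference, not a figure from the slides.

```python
# Sketch: percent agreement vs. Cohen's kappa on toy (invented) labels.
from collections import Counter


def percent_agreement(a: list, b: list) -> float:
    """Fraction of items on which both annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list, b: list) -> float:
    """kappa = (p_o - p_e) / (1 - p_e), with p_e from each annotator's label distribution."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # 20 items, heavily skewed towards "neutral"; the annotators disagree
    # only on which single item is "committed".
    ann_a = ["neutral"] * 18 + ["committed", "neutral"]
    ann_b = ["neutral"] * 18 + ["neutral", "committed"]
    print(percent_agreement(ann_a, ann_b))  # high raw agreement
    print(cohen_kappa(ann_a, ann_b))        # near zero (slightly negative)
```

Here raw agreement is 18/20 while expected chance agreement is 362/400, so kappa comes out at -1/19: the annotators agree no better than chance on the rare label.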

Italian Attribution Corpus-ItAC

• 50 articles (37,000 tokens) from Italian newspaper corpora (e.g. La Repubblica)

• 460 attribution relations

• Freely available from: http://homepages.inf.ed.ac.uk/s1052974/resources.php

(Pareti and Prodanof, 2010)

PDTB Attribution Corpus

Stand-off annotation of attribution based on the PDTB:

• Comprises all attribution relations annotated in the PDTB (reconstructed from the current annotation)

• The annotation is further extended according to the revised annotation schema

(Pareti, 2012)

9868 attributions

PDTB Attribution Corpus

Annotation of the attribution span: source, cue, SUPPLEMENT

80% annotated automatically using 48 matching rules, then manually revised, e.g.:

(NP-SBJ)(VP): one person said

(PP-LOC)(NP)(VB): IN DALLAS, LTV said

(NP-SBJ)(VBP)(JJ): I am sure

20% had rarer syntax and was annotated manually, e.g.:

Judge Curry ordered the refunds to begin Feb. 1 and said (wsj_0015)
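The matching rules above can be pictured as a lookup from a sequence of constituent labels to one role per constituent. The sketch below is a guess at the mechanism, not the actual rule set: the three patterns are the ones quoted on the slide, but the role assigned to each slot, the `RULES` table and `assign_roles` are assumptions made for illustration.

```python
# Sketch: rule-based labelling of an attribution span.
# ASSUMPTION: each rule maps a sequence of constituent labels to one role per
# constituent; the role assignments below are inferred from the slide examples.
from typing import Optional

RULES = {
    ("NP-SBJ", "VP"): ("source", "cue"),                     # one person said
    ("PP-LOC", "NP", "VB"): ("supplement", "source", "cue"),  # In Dallas, LTV said
    ("NP-SBJ", "VBP", "JJ"): ("source", "cue", "cue"),        # I am sure
}


def assign_roles(constituents: list[tuple[str, str]]) -> Optional[dict[str, str]]:
    """constituents: (label, text) pairs covering the attribution span.

    Returns a role -> text mapping, or None when no rule matches, in which
    case the span would be left for manual annotation.
    """
    labels = tuple(label for label, _ in constituents)
    roles = RULES.get(labels)
    if roles is None:
        return None
    grouped: dict[str, list[str]] = {}
    for role, (_, text) in zip(roles, constituents):
        grouped.setdefault(role, []).append(text)
    return {role: " ".join(parts) for role, parts in grouped.items()}


if __name__ == "__main__":
    print(assign_roles([("PP-LOC", "In Dallas"), ("NP", "LTV"), ("VB", "said")]))
    print(assign_roles([("NP-SBJ", "I"), ("VBP", "am"), ("JJ", "sure")]))
    print(assign_roles([("VP", "ordered"), ("CC", "and"), ("VP", "said")]))
```

Returning `None` for unmatched label sequences mirrors the split described above between spans handled by rules and spans with rarer syntax that need manual annotation.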

PDTB Attribution Corpus

Further annotation of the content span:

– adding punctuation (direct quotation marks)

– completing content spans that had only been partially annotated

– annotating the quote status of the attribution based on the position of quote span QS and content span CS:

• direct: QS = CS

• indirect: CS outside or contained in QS

• mixed: CS overlaps QS or QS contained in CS
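The three quote-status cases above translate directly into a comparison of two spans. A minimal sketch, assuming half-open (start, end) character spans and at most one quote span per attribution, which is a simplification; `quote_status` is a hypothetical name.

```python
# Sketch: quote status from the relative position of content span (CS)
# and quote span (QS). ASSUMPTIONS: spans are half-open (start, end) pairs
# and there is at most one quote span; None means no quotation marks at all.
from typing import Optional

Span = tuple[int, int]


def quote_status(content: Span, quote: Optional[Span]) -> str:
    if quote is None:
        return "indirect"            # nothing quoted: CS is outside any QS
    cs, ce = content
    qs, qe = quote
    if (cs, ce) == (qs, qe):
        return "direct"              # QS = CS
    if ce <= qs or qe <= cs:
        return "indirect"            # CS entirely outside QS
    if qs <= cs and ce <= qe:
        return "indirect"            # CS contained in QS
    return "mixed"                   # partial overlap, or QS contained in CS


if __name__ == "__main__":
    print(quote_status((10, 20), (10, 20)))
    print(quote_status((12, 18), (10, 20)))
    print(quote_status((10, 30), (15, 25)))
```

The order of the checks matters: equality is tested first, because an identical span would otherwise also satisfy the "contained in" test and be labelled indirect.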

PDTB Attribution Corpus

ATTRIBUTION ID: wsj_0003.pdtb_05
SOURCE SPAN: Darrell Phillips, vice president of human resources for Hollingsworth & Vose
CUE SPAN: said
CONTENT SPAN: “There’s no question that some of those workers and managers contracted asbestos-related diseases,” “But you have to recognize that these events took place 35 years ago. It has no bearing on our work force today.”
SUPPLEMENT SPAN: None
FEATURES: Ot, Comm, Null, Null
QUOTE STATUS: Direct
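A record like the one above can be split into fields by its labels. The sketch below treats the slide text as labelled "KEY: value" pairs; the actual storage format of the corpus is not shown on the slide, so this layout, the `FIELDS` list and `parse_record` are assumptions for illustration only, and the content span in the example is shortened.

```python
# Sketch: split a labelled attribution record into a dict.
# ASSUMPTION: the record is plain text with the field labels shown on the
# slide; the real corpus format may differ. Works for one-line or multi-line text.
import re

FIELDS = ["ATTRIBUTION ID", "SOURCE SPAN", "CUE SPAN", "CONTENT SPAN",
          "SUPPLEMENT SPAN", "FEATURES", "QUOTE STATUS"]
LABEL = re.compile("(" + "|".join(re.escape(f) for f in FIELDS) + r"):\s*")


def parse_record(text: str) -> dict[str, str]:
    # re.split with a capturing group yields [prefix, label, value, label, value, ...]
    parts = LABEL.split(text)
    return {label: value.strip() for label, value in zip(parts[1::2], parts[2::2])}


EXAMPLE = (
    "ATTRIBUTION ID: wsj_0003.pdtb_05 "
    "SOURCE SPAN: Darrell Phillips, vice president of human resources "
    "for Hollingsworth & Vose "
    "CUE SPAN: said "
    'CONTENT SPAN: "There\'s no question that some of those workers and '
    'managers contracted asbestos-related diseases," '  # shortened here
    "SUPPLEMENT SPAN: None "
    "FEATURES: Ot, Comm, Null, Null "
    "QUOTE STATUS: Direct"
)

if __name__ == "__main__":
    for key, value in parse_record(EXAMPLE).items():
        print(f"{key}: {value}")
```

The sketch would break if a content span itself contained one of the field labels followed by a colon, so a real loader would need a more robust delimiter.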

Use of PDTB Attribution Corpus

Independent analysis of attribution:

• cue composition

– several cues other than verbs (prepositions, nouns, adverbs)

– wide range of attributional verbs (266 types in the corpus)

• source composition

– NEs only about 50% of the sources

• attribution structures

Use of PDTB Attribution Corpus

Testing a system for identifying direct quotes and their speakers in the literary and news domains (University of Sydney and Sydney Morning Herald)

(O’Keefe et al. 2012, submitted).

• Rule-based and machine-learning approaches were tested on 3 corpora.

• The results show that direct quotes differ by domain and style

Future

• Development of an attribution extraction system using the data to train a classifier

• Semi-automatic extension of the annotation to comprise all attributions in the corpus

• Annotation of the level of nesting of each attribution

• Release of the corpus for development, testing and shared-task use

Conclusion

• Advantages of attribution in the PDTB

• Development of a finer-grained annotation schema and its inter-annotator agreement results

• Application of the schema to a small corpus of Italian

• Collection and further annotation of attribution in the PDTB

• Importance of this resource for the analysis of attribution and its ‘long tail’ and for testing and developing attribution extraction systems

Bibliography

Carlson, L. and Marcu, D. Discourse tagging reference manual. Technical report ISI-TR-545, ISI, University of Southern California, September 2001.

Müller, C. and Strube, M., Multi-Level Annotation of Linguistic Data with MMAX2. In: Sabine Braun, Kurt Kohn, Joybrato Mukherjee (Eds.): Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods. Frankfurt: Peter Lang, pp. 197-214. (English Corpus Linguistics, Vol.3 ), 2006.

O’Keefe, T., Pareti, S., Curran, J., Koprinska, I. and Honnibal, M., A sequence labelling approach to quote attribution. Manuscript submitted for publication, 2012.

Pardo, T., das Graças Volpe Nunes, M. and Rino, L. Dizer: An automatic discourse analyzer for Brazilian Portuguese. In Ana Bazzan and Sofiane Labidi, editors, Advances in Artificial Intelligence – SBIA 2004, volume 3171 of Lecture Notes in Computer Science, pages 224–234. Springer Berlin / Heidelberg, 2004.

Pareti, S. and Prodanof, I. Annotating attribution relations: Towards an Italian discourse treebank. In Proceedings of LREC10, 2010.

Pareti, S. A database of attribution relations. In Proceedings of LREC12, Istanbul, 23-25 May 2012 (to appear).

Pareti, S., Theory and practise of annotating attributions. Manuscript submitted for publication, 2012.

Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A. and Webber, B. The Penn Discourse Treebank 2.0. In Proceedings of LREC08, 2008.

Wiebe, J. Instructions for annotating opinions in newspaper articles. Technical report, University of Pittsburgh, 2002.

Wolf, F. and Gibson, E. Representing discourse coherence: A corpus-based study. Comput. Linguist., 31:249-288, June 2005.
