TRANSCRIPT
End-to-End Discourse Parser Evaluation
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science, University of Trento, Italy
Contents
- Introduction
  - Discourse parser: what + why + how
  - Discourse parser & Penn Discourse TreeBank (PDTB)
  - Our contribution
- Architecture
- Features
- Results
- Conclusion
Introduction
What: we refer to a coherent, structured group of sentences or expressions as a discourse.
Why: discourse structure represents the meaning of the document.
How: process flow: data (discourse) -> segmentation -> discourse parsing -> discourse structure.
Discourse structure includes relations (a connective and its arguments) lexically anchored in the document text.
Common data sources: the Rhetorical Structure Theory (RST) treebank and the Penn Discourse TreeBank (PDTB). We used the PDTB.
Examples from PDTB (1)
Arg1: I never gamble too far.
Explicit connective: In particular
Arg2: I quit after one try, whether I win or lose.
[EXPANSION]

Each annotated relation includes a connective, two arguments, and a sense label for the connective.
A connective can occur between its two arguments, at the beginning of a sentence, or inside an argument.
The top-level senses of the three-layered sense hierarchy are TEMPORAL, CONTINGENCY, COMPARISON and EXPANSION.
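A relation of this kind can be captured by a small record type. A minimal sketch in Python; the class and field names are our own illustration, not part of the PDTB release format:

```python
from dataclasses import dataclass

# Top-level senses of the three-layered PDTB sense hierarchy
TOP_LEVEL_SENSES = {"TEMPORAL", "CONTINGENCY", "COMPARISON", "EXPANSION"}

@dataclass
class ExplicitRelation:
    """One annotated PDTB relation: a connective, two argument
    spans, and the sense label of the connective."""
    connective: str
    arg1: str
    arg2: str
    sense: str  # e.g. "EXPANSION", or a dotted path into lower levels

    def __post_init__(self):
        # Validate the top level of the (possibly dotted) sense label
        top = self.sense.split(".")[0]
        if top not in TOP_LEVEL_SENSES:
            raise ValueError(f"unknown top-level sense: {top}")

rel = ExplicitRelation(
    connective="In particular",
    arg1="I never gamble too far.",
    arg2="I quit after one try, whether I win or lose.",
    sense="EXPANSION",
)
print(rel.sense)
```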
Examples from PDTB (2)

When Mr. Green won a $240,000 verdict in a land condemnation case against the State in June 1983, he says, Judge O'Kicki unexpectedly awarded him an additional $100,000. [TEMPORAL]

As an indicator of the tight grain supply situation in the U.S., market analysts said that late Tuesday the Chinese government, which often buys U.S. grains in quantity, turned instead to Britain to buy 500,000 metric tons of wheat. [COMPARISON]

Since McDonald's menu prices rose this year, the actual deadline may have been more. [CONTINGENCY]

(Arg1 italicized, connectives underlined, Arg2 boldfaced in the original slides)
PDTB Corpus Statistics
Arg2 is always in the same sentence as the connective.
60.9% of annotated Arg1s are in the same sentence as the connective; 39.1% are in a previous sentence (30.1% in the adjacent sentence, 9.0% non-adjacent).
We used these statistics to establish the baseline.
Our Contribution
Developed an end-to-end discourse parser that retrieves discourse structure (an explicit connective and its two argument spans) starting from a raw text paragraph.
Evaluation: established the system on gold-standard data (PTB + PDTB); evaluated it against a baseline; implemented the same method in an automated system; improved the automated system in terms of applicability.
An overlapping discourse segmentation technique (+2/-2 window) is applied to the complete text.
A chunking strategy is followed for classification; the discourse model is a cascaded CRF.
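The exact windowing is not spelled out on this slide; one plausible reading of the +2/-2 window, sketched in Python (the per-sentence anchoring and function name are our assumptions):

```python
def overlapping_windows(sentences, radius=2):
    """Overlapping discourse segmentation sketch: for each sentence i,
    emit the window of sentences [i - radius, i + radius], clipped at
    the text boundaries. Consecutive windows overlap, so an argument
    that crosses a sentence boundary can still fall inside one window."""
    windows = []
    for i in range(len(sentences)):
        lo = max(0, i - radius)
        hi = min(len(sentences), i + radius + 1)
        windows.append(sentences[lo:hi])
    return windows

sents = ["S1.", "S2.", "S3.", "S4.", "S5."]
for w in overlapping_windows(sents):
    print(w)
```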
End-to-End Architecture
[Pipeline diagram: Doc -> Parser -> Parse_Tree, then feature extraction with Chunklink (Sabine Buchholz, CoNLL'00 shared task), AddDiscourse (Pitler & Nenkova '09, connective sense detection) and RootExtract + Morpha (Johansson; Minnen et al.: morphology and all features), followed by a Pruner and the Arg2 and Arg1 labelers.]
Features
Features used for Arg1 and Arg2 segmentation and labeling:
F1. Token (T)
F2. Sense of connective (CONN)
F3. IOB chain (IOB)
F4. PoS tag
F5. Lemma (L)
F6. Inflection (INFL)
F7. Main verb of main clause (MV)
F8. Boolean feature for MV (BMV)
Additional feature used only for Arg1:
F9. Arg2 labels
For more details: Ghosh et al., IJCNLP 2011
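Per token, these features form one row of a CoNLL-style column file. A sketch of such a row; the column order is illustrative, not the exact layout of the original system. F9 is appended only for the Arg1 classifier, which is what makes the model a cascade:

```python
def token_features(token, conn_sense, iob, pos, lemma, infl, mv, bmv,
                   arg2_label=None):
    """Build one tab-separated feature row: F1-F8 for every token,
    plus F9 (the Arg2 label predicted by the first CRF) when the row
    is fed to the Arg1 classifier."""
    row = [token, conn_sense, iob, pos, lemma, infl, mv, bmv]
    if arg2_label is not None:
        row.append(arg2_label)  # F9: cascaded from the Arg2 CRF
    return "\t".join(str(x) for x in row)

# Hypothetical row for the token "quit" in the Arg2 span of the example
print(token_features("quit", "EXPANSION", "I-S/B-VP", "VBP",
                     "quit", "PRES", "quit", 1, arg2_label="B-ARG2"))
```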
Evaluation & Baseline
Metrics: precision, recall and F1 measure.
Scoring scheme, Exact Match: a classified span is correct only if it exactly coincides with the gold-standard span.
Baseline (based on the statistics given in the annotation manual):
Arg2: label all tokens in the text span between the connective and the beginning of the next sentence.
Arg1: label all tokens in the text span from the end of the previous sentence to the connective position; if the connective occurs at the beginning of a sentence, label the previous sentence.
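The baseline rules above can be written down directly. A sketch assuming pre-tokenized sentences and known connective offsets (the function name and interface are ours):

```python
def baseline_spans(sentences, conn_sent, conn_start, conn_end):
    """Baseline argument spans for one explicit connective.

    sentences:  list of token lists, one per sentence
    conn_sent:  index of the sentence containing the connective
    conn_start, conn_end: token slice of the connective in that sentence

    Arg2: every token from just after the connective to the end of its
    sentence. Arg1: every token from the sentence start up to the
    connective; if the connective is sentence-initial, the whole
    previous sentence instead."""
    sent = sentences[conn_sent]
    arg2 = sent[conn_end:]
    if conn_start == 0 and conn_sent > 0:
        arg1 = sentences[conn_sent - 1]
    else:
        arg1 = sent[:conn_start]
    return arg1, arg2

sents = [["He", "left", "early", "."],
         ["Because", "it", "rained", ",", "we", "stayed", "."]]
# Sentence-initial connective "Because": Arg1 = previous sentence
print(baseline_spans(sents, conn_sent=1, conn_start=0, conn_end=1))
```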
Exact Arg2 Results: Comparison Viewgraph

                    P     R     F1
Baseline            0.53  0.46  0.49
Gold-Standard       0.84  0.74  0.79
Automatic           0.80  0.74  0.77
AutoConn+GoldSPT    0.82  0.70  0.76
GoldConn+AutoSPT    0.76  0.61  0.68
Lightweight (Auto)  0.72  0.56  0.63
Exact Arg1 Results: Comparison Viewgraph

                    P     R     F1
Baseline            0.19  0.19  0.19
Gold-Standard       0.68  0.39  0.49
Automatic           0.63  0.28  0.39
AutoConn+GoldSPT    0.67  0.31  0.43
GoldConn+AutoSPT    0.62  0.31  0.41
Lightweight (Auto)  0.60  0.27  0.37
Features
The IOB (Inside-Outside-Begin) chain lists all constituents on the path between the root node and the current leaf node of the tree. For example, the IOB chain feature for "flashed" is I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is at the beginning of, inside, or at the end of the constituent, or a single-token chunk, respectively.
Conclusion
The automatic end-to-end system performs nearly as well as the gold-standard one.
We are moving towards a "lightweight" version of the pipeline: shallow, with less dependence on SPTs.
We wish to explore more features.
We improved the Arg1 classification result by 5 points using a previous-sentence feature (Ghosh et al., IJCNLP 2011).
Thank you
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science, University of Trento, Italy
{ghosh, riccardi}@disi.unitn.it
Previous Work
- Task limited to retrieving the argument heads (Wellner et al. 2007; Elwell et al. 2008)
- Dinesh et al. (2005) extracted complete arguments with boundaries, but only for a restricted class of connectives
- The identification of Arg1 has been only partially addressed in previous work (Prasad 2010)
- Automatic surface-sense classification (at class level) already reaches the upper bound of inter-annotator agreement (Pitler and Nenkova, 2009)
Data & Tools
Corpus used: Penn Discourse TreeBank (PDTB)
For the gold-standard system, the Penn TreeBank (PTB) corpus is used.
Third-party software/scripts used:
- Stanford Syntactic Parser (Klein & Manning 2003)
- AddDiscourse, for explicit connective classification (Pitler and Nenkova 2008)
- chunklink.pl, to extract IOB chains (Sabine Buchholz, CoNLL Shared Task 2000)
- RootExtractor: Syntactic Parse Tree (SPT) processors (Richard Johansson)
- Morpha (Minnen et al. 2001)
- Conditional Random Fields: CRF++ (Taku Kudo)
Overall Architecture
A syntactic tree parser and a connective detection and classification tool are used in the automatic systems; the PDTB and PTB are not used during the end-to-end automatic testing phase.
End2End Testing Phase
Conditional Random Field
We use the CRF++ tool (http://crfpp.sourceforge.net/) for sequence-labeling classification (Lafferty et al., 2001), with a second-order Markov dependency between tags.
Besides the individual specification of each feature in the feature description template, features are also represented in various combinations.
We used this tool because the output of CRF++ is compatible with the CoNLL 2000 chunking shared task format, and we view our task as a discourse chunking task.
Linear-chain CRFs for sequence labeling offer advantages over both generative models such as HMMs and classifiers applied independently at each sequence position. Sha and Pereira (2003) also claim that, as a single model, CRFs outperform other models for shallow parsing.
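As a concrete illustration, a CRF++ feature description template in this spirit might look as follows. The column indices (0 = token, 1 = connective sense, 2 = IOB chain, 3 = PoS tag) and the specific combinations are illustrative assumptions, not the template used in the paper:

```
# Unigram macros over single columns of the CoNLL-style input
U00:%x[0,0]
U01:%x[0,1]
U02:%x[0,2]
U03:%x[0,3]
# Context and feature combinations
U10:%x[-1,0]/%x[0,0]
U11:%x[0,0]/%x[1,0]
U12:%x[0,2]/%x[0,3]
# Bigram macro: adds dependencies between adjacent output tags
B
```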
Hill Climbing Algorithm
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current <- MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor <- highest-valued successor of current
    if neighbor.VALUE <= current.VALUE then return current.STATE
    current <- neighbor
[Artificial Intelligence: A Modern Approach, Stuart J. Russell & Peter Norvig]
The hill-climbing search algorithm is the most basic local search technique: at each step the current node is replaced by the best neighbor.
Here that is the neighbor with the highest VALUE; if a heuristic cost estimate h were used, we would pick the neighbor with the lowest h.
Hill climbing is a greedy, fast local search.
We optimized the selected feature set with a feature-ablation technique, leaving out one feature each time.
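The ablation loop described above is essentially hill climbing over feature subsets. A sketch with a toy scoring function; in the real setup `score` would train and evaluate the CRF on a validation set (the function names here are ours):

```python
def ablate_features(features, score):
    """Greedy hill-climbing feature ablation: at each step evaluate
    every set obtained by leaving one feature out, move to the best
    one, and stop when no single removal improves the score."""
    current = frozenset(features)
    best = score(current)
    while len(current) > 1:
        # Highest-valued successor: best leave-one-out subset
        neighbor = max((current - {f} for f in sorted(current)), key=score)
        if score(neighbor) <= best:
            break  # local maximum reached
        current, best = neighbor, score(neighbor)
    return set(current), best

def toy_score(feats):
    # Toy validation metric: two useful features, one harmful one
    return len(feats & {"LEMMA", "IOB"}) - 2 * len(feats & {"NOISE"})

selected, s = ablate_features({"LEMMA", "IOB", "NOISE"}, toy_score)
print(selected, s)
```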
Features
The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories of all the constituents on the path between the root node and the current leaf node of the tree. The corresponding feature would be I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is at the beginning of, inside, or at the end of the constituent, or a single-token chunk, respectively. In this case, "flashed" is at the end of every constituent in the chain, except for the last VP, which dominates one single leaf.
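chunklink.pl derives these chains from CoNLL-formatted trees; the idea can be shown with a toy re-implementation over nested-tuple trees (the tree encoding and the C- over B-/E- precedence for single-token constituents are our simplifications):

```python
def iob_chains(tree):
    """IOB chain for every leaf of a parse tree given as nested tuples
    (label, child, ...) with plain strings as leaves. For each
    constituent on the root-to-leaf path: B- first token of the
    constituent, E- last token, I- strictly inside, C- constituent
    covering exactly this one token."""
    def leaves(t):
        return [t] if isinstance(t, str) else [w for c in t[1:] for w in leaves(c)]

    chains = [[] for _ in leaves(tree)]

    def walk(t, start):
        if isinstance(t, str):
            return start + 1
        end = start + len(leaves(t))
        for i in range(start, end):  # tag root-first, so chains read root -> leaf
            if end - start == 1:
                tag = "C-"
            elif i == start:
                tag = "B-"
            elif i == end - 1:
                tag = "E-"
            else:
                tag = "I-"
            chains[i].append(tag + t[0])
        pos = start
        for child in t[1:]:
            pos = walk(child, pos)
        return end

    walk(tree, 0)
    return ["/".join(c) for c in chains]

tree = ("S", ("NP", "He"), ("VP", "ran", ("ADVP", "fast")))
print(iob_chains(tree))
```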
Result: Gold-labeled & Automatic System Output

Gold-labeled system output (baseline results, shown in blue on the original slide, in parentheses):

               P            R            F1
Arg2  Exact    0.84 (0.53)  0.74 (0.46)  0.79 (0.49)
      Partial  0.93 (0.80)  0.82 (0.85)  0.88 (0.82)
      Overlap  0.97 (0.98)  0.88 (0.85)  0.92 (0.91)
Arg1  Exact    0.68 (0.19)  0.39 (0.19)  0.49 (0.19)
      Partial  0.81 (0.50)  0.51 (0.68)  0.62 (0.58)
      Overlap  0.91 (0.70)  0.52 (0.68)  0.66 (0.69)

Automatic system output:

                          P     R     F1
Arg2              Exact   0.80  0.74  0.77
                  Partial 0.91  0.85  0.88
                  Overlap 0.97  0.88  0.92
Arg1 (semi-auto)  Exact   0.64  0.31  0.42
                  Partial 0.76  0.39  0.52
                  Overlap 0.84  0.40  0.54
Arg1 (full auto)  Exact   0.63  0.28  0.39
                  Partial 0.74  0.36  0.48
                  Overlap 0.83  0.37  0.51
Combo Result

Auto Conn + Gold SPT:

               P     R     F1
Arg2  Exact    0.82  0.70  0.76
      Partial  0.93  0.79  0.85
      Overlap  0.96  0.83  0.89
Arg1  Exact    0.67  0.31  0.43
      Partial  0.81  0.44  0.57
      Overlap  0.94  0.44  0.60

Gold Conn + Auto SPT:

               P     R     F1
Arg2  Exact    0.76  0.61  0.68
      Partial  0.91  0.73  0.81
      Overlap  0.96  0.77  0.85
Arg1  Exact    0.62  0.31  0.41
      Partial  0.76  0.42  0.54
      Overlap  0.87  0.43  0.58
Result: Replacing the IOB Chain
P R F1
Arg2 Exact 0.80 0.74 0.77
Partial 0.91 0.85 0.88
Overlap 0.97 0.88 0.92
Arg1 Exact 0.65 0.29 0.40
Partial 0.80 0.43 0.56
Overlap 0.97 0.43 0.60