Project LOGO End-to-End Discourse Parser Evaluation Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson Department of Information Engineering and Computer Science University of Trento, Italy


End-to-End Discourse Parser Evaluation

Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson

Department of Information Engineering and Computer Science, University of Trento, Italy


Content

• Introduction
  • Discourse parser: what, why, and how
  • Discourse parser and the Penn Discourse TreeBank (PDTB)
  • Our contribution
• Architecture
• Features
• Results
• Conclusion


Introduction

• What: a discourse is a coherent, structured group of sentences or expressions
• Why: discourse structure represents the meaning of the document
• How:
  • Process flow: data (discourse) segmentation, then discourse parsing, then discourse structure
  • Discourse structure consists of relations (a connective and its arguments) that are lexically anchored in the document text
  • Common data sources: the Rhetorical Structure Theory (RST) treebank and the Penn Discourse TreeBank (PDTB); we used the PDTB


Examples from PDTB (1)

Arg1 -> I never gamble too far.
Explicit Connective -> In particular
Arg2 -> I quit after one try, whether I win or lose.
[EXPANSION]


Each annotated relation includes a connective, two arguments, and a sense label for the connective.

A connective may occur between its two arguments, at the beginning of a sentence, or inside an argument.

The top-level senses of the three-layered sense hierarchy are TEMPORAL, CONTINGENCY, COMPARISON, and EXPANSION.
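To make the annotation scheme concrete, here is a minimal sketch of how one explicit relation could be held in memory. The class name `ExplicitRelation` and its field names are assumptions made for illustration; they are not the PDTB file format or the authors' data model. Only the four top-level senses and the example text come from the slides.

```python
from dataclasses import dataclass

# The four top-level senses named on the slide.
TOP_LEVEL_SENSES = {"TEMPORAL", "CONTINGENCY", "COMPARISON", "EXPANSION"}


@dataclass(frozen=True)
class ExplicitRelation:
    """Hypothetical container for one explicit discourse relation."""
    arg1: str
    connective: str
    arg2: str
    sense: str  # top-level sense only, for simplicity

    def __post_init__(self):
        # Reject labels outside the top level of the sense hierarchy.
        if self.sense not in TOP_LEVEL_SENSES:
            raise ValueError(f"unknown top-level sense: {self.sense}")


# The EXPANSION example from the slide.
example = ExplicitRelation(
    arg1="I never gamble too far.",
    connective="In particular",
    arg2="I quit after one try, whether I win or lose.",
    sense="EXPANSION",
)
```

A fuller model would also need character or token offsets, since arguments are spans anchored in the text rather than free strings.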


Examples from PDTB (2)

When Mr. Green won a $240,000 verdict in a land condemnation case against the State in June 1983, he says, Judge O’Kicki unexpectedly awarded him an additional $100,000. [TEMPORAL]

As an indicator of the tight grain supply situation in the U.S., market analysts said that late Tuesday the Chinese government, which often buys U.S. grains in quantity, turned instead to Britain to buy 500,000 metric tons of wheat. [COMPARISON ]

Since McDonald’s menu prices rose this year, the actual deadline may have been more. [CONTINGENCY ]

(Arg1 italicized, connectives underlined, Arg2 boldfaced)


PDTB Corpus Statistics

• Arg2 is always in the same sentence as the connective
• 60.9% of the annotated Arg1s are in the same sentence as the connective; 39.1% are in a previous sentence (30.1% adjacent, 9.0% non-adjacent)
• We used these statistics to establish a baseline


Our Contribution

• Developed an end-to-end discourse parser that starts from a text paragraph and retrieves the discourse structure: the explicit connective and its two argument spans
• Evaluation
  • Established the system with gold-standard data (PTB + PDTB)
  • Evaluated it against a baseline
  • Implemented the same method in an automated system
  • Improved the automated system in terms of applicability
• Overlapping discourse segmentation technique (+2/-2 window) applied to the complete text
• Followed a chunking strategy for classification
• The discourse model is a cascaded CRF
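The overlapping segmentation can be sketched as below. The slide only says "+2/-2 window", so this sketch assumes the unit is the sentence and that one window is built around every sentence in the text; the function name `overlapping_windows` is hypothetical.

```python
def overlapping_windows(sentences, before=2, after=2):
    """Return one window per sentence: the sentence itself plus up to
    `before` preceding and `after` following sentences, clipped at the
    edges of the text. Neighbouring windows therefore overlap."""
    windows = []
    for i in range(len(sentences)):
        start = max(0, i - before)
        end = min(len(sentences), i + after + 1)
        windows.append(sentences[start:end])
    return windows


sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
windows = overlapping_windows(sentences)
```

With this layout a connective's sentence always has its two preceding sentences available, which matters because Arg1 is often located in a previous sentence.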


End-to-End Architecture

[Figure: end-to-end pipeline diagram. Labeled components: Doc, Parser, Parse_Tree, Chunklink, AddDiscourse, RootExtract+Morpha, Pruner, Arg2, Arg1]

Chunklink

• By Sabine Buchholz
• CoNLL’00 task

AddDiscourse

• Pitler & Nenkova ‘09
• Connective sense detection

RootExtract+Morpha

• Morphology and all features
• Johansson; Minnen et al.


Features

Features used for Arg1 and Arg2 segmentation and labeling:

F1. Token (T)
F2. Sense of connective (CONN)
F3. IOB chain (IOB)
F4. PoS tag
F5. Lemma (L)
F6. Inflection (INFL)
F7. Main verb of main clause (MV)
F8. Boolean feature for MV (BMV)

Additional feature used only for Arg1:

F9. Arg2 labels

For more details: Ghosh et al., IJCNLP 2011
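Since the task is treated as chunking, each token would typically become one row of whitespace-separated feature columns followed by its label. The sketch below assumes that layout; the column keys, the feature values, and the label `B-Arg2` are made-up placeholders for illustration, not values taken from the authors' data.

```python
# Column order follows F1..F8 from the slide; the short keys are assumptions.
FEATURE_COLUMNS = ["T", "CONN", "IOB", "POS", "L", "INFL", "MV", "BMV"]


def to_crf_row(token_features, label):
    """Serialize one token as a tab-separated row: features, then label."""
    missing = [c for c in FEATURE_COLUMNS if c not in token_features]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")
    values = [str(token_features[c]) for c in FEATURE_COLUMNS]
    return "\t".join(values + [label])


# Hypothetical feature values for a single token.
token = {
    "T": "quit", "CONN": "O", "IOB": "I-S/B-VP", "POS": "VBP",
    "L": "quit", "INFL": "0", "MV": "quit", "BMV": "1",
}
row = to_crf_row(token, "B-Arg2")
```

For the Arg1 model, the Arg2 label (F9) would be one more column before the target label, which is what makes the two models a cascade.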


Evaluation & Baseline

• Metrics: precision, recall, and F1 measure
• Scoring scheme: Exact Match, where a classified span is correct only if it exactly coincides with the gold-standard span
• Baseline (based on the statistics given in the annotation manual):
  • Arg2: label all tokens in the text span between the connective and the beginning of the next sentence
  • Arg1: label all tokens in the text span from the end of the previous sentence to the connective position; if the connective occurs at the beginning of a sentence, label the previous sentence
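The baseline rules above can be sketched as follows. This is one reading of the slide, not the authors' implementation: sentences are assumed to be pre-tokenized lists, the connective is given by token offsets inside its sentence, and `baseline_args` is a hypothetical name.

```python
def baseline_args(sentences, sent_idx, conn_start, conn_end):
    """Heuristic argument spans for one explicit connective.

    sentences: list of token lists; the connective occupies
    sentences[sent_idx][conn_start:conn_end].
    Returns (arg1_tokens, arg2_tokens).
    """
    sentence = sentences[sent_idx]
    # Arg2: everything after the connective up to the end of its sentence.
    arg2 = sentence[conn_end:]
    if conn_start > 0:
        # Arg1: from the start of the sentence up to the connective.
        arg1 = sentence[:conn_start]
    elif sent_idx > 0:
        # Sentence-initial connective: take the whole previous sentence.
        arg1 = sentences[sent_idx - 1]
    else:
        arg1 = []  # no previous sentence available
    return arg1, arg2


sentences = [
    ["I", "never", "gamble", "too", "far", "."],
    ["In", "particular", "I", "quit", "after", "one", "try", ",",
     "whether", "I", "win", "or", "lose", "."],
]
arg1, arg2 = baseline_args(sentences, sent_idx=1, conn_start=0, conn_end=2)
```

Because the heuristic never looks further back than the adjacent sentence, it cannot recover the non-adjacent Arg1 cases at all.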


Exact Arg2 Results: Comparison Viewgraph

System             P     R     F1
Baseline           0.53  0.46  0.49
Gold-Standard      0.84  0.74  0.79
Automatic          0.80  0.74  0.77
AutoConn+GoldSPT   0.82  0.70  0.76
GoldConn+AutoSPT   0.76  0.61  0.68
Lightweight(Auto)  0.72  0.56  0.63
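For reference, exact-match precision, recall, and F1 can be computed as in the sketch below. It assumes spans are represented as `(start, end)` token offsets collected into sets, which is an illustrative choice and not necessarily how the authors' scorer works.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact_match_prf(predicted, gold):
    """predicted, gold: sets of (start, end) spans.
    A predicted span counts as correct only if the identical span is in gold."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


gold = {(0, 5), (7, 12), (14, 20)}
predicted = {(0, 5), (7, 11)}  # the second span misses by one token
p, r, f = exact_match_prf(predicted, gold)
```

Applying `f1_score` to the rounded P and R values of the Automatic row (0.80, 0.74) gives 0.77 after rounding, in line with the reported F1.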


Exact Arg1 Results: Comparison Viewgraph

System             P     R     F1
Baseline           0.19  0.19  0.19
Gold-Standard      0.68  0.39  0.49
Automatic          0.63  0.28  0.39
AutoConn+GoldSPT   0.67  0.31  0.43
GoldConn+AutoSPT   0.62  0.31  0.41
Lightweight(Auto)  0.60  0.27  0.37


Features

The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories of all the constituents on the path between the root node and the current leaf node of the tree. For example, the IOB chain feature for "flashed" is I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E-, and C- indicate whether the given token is, respectively, at the beginning of, inside, or at the end of the constituent, or is a single-token chunk.
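As a small illustration, the chain string can be split into (position, category) pairs. The helper `parse_iob_chain` is a hypothetical name; only the prefix meanings and the example chain come from the slide.

```python
# Meaning of each prefix, as described on the slide.
POSITION_NAMES = {
    "B": "beginning",
    "I": "inside",
    "E": "end",
    "C": "single-token chunk",
}


def parse_iob_chain(chain):
    """Split a chain such as 'I-S/E-VP' into (position, category) pairs,
    ordered from the root of the tree down to the leaf."""
    parsed = []
    for element in chain.split("/"):
        prefix, category = element.split("-", 1)
        parsed.append((POSITION_NAMES[prefix], category))
    return parsed


chain = parse_iob_chain("I-S/E-VP/E-SBAR/E-S/C-VP")
```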


Conclusion

• The automatic end-to-end system performs nearly on par with the gold-standard system (exact-match Arg2 F1: 0.77 vs. 0.79)
• We are moving towards a “lightweight” version of the pipeline: shallow and less dependent on syntactic parse trees (SPTs)
• We wish to explore more features
• We improved our result by 5 points for Arg1 classification using a previous-sentence feature (Ghosh et al., IJCNLP 2011)


Thank you


Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson

Department of Information Engineering and Computer Science, University of Trento, Italy

{ghosh, riccardi}@disi.unitn.it


Previous Work

• Task limited to retrieving the argument heads (Wellner et al. 2007, Elwell et al. 2008)
• Dinesh et al. (2005) extracted complete arguments with boundaries, but only for a restricted class of connectives
• The identification of Arg1 has been only partially addressed in previous work (Prasad 2010)
• Automatic surface-sense classification (at class level) has already reached the upper bound of inter-annotator agreement (Pitler and Nenkova, 2009)


Data & Tools

• Corpus used: Penn Discourse TreeBank (PDTB)
• For the gold-standard system, the Penn TreeBank (PTB) corpus is used
• Third-party software/scripts used:
  • Stanford syntactic tree parser (Klein & Manning 2003)
  • AddDiscourse, for explicit connective classification (Pitler and Nenkova 2009)
  • ChunkLink.pl to extract IOB chains (by Sabine Buchholz; CoNLL Shared Task 2000)
  • RootExtractor: syntactic parse tree (SPT) processor (by Richard Johansson)
  • Morpha (Minnen et al. 2001)
  • Conditional random fields: CRF++ by Taku Kudo


Overall Architecture

• A syntactic tree parser is used for the automatic systems
• A connective detection and classification tool is used for the automatic systems
• PDTB and PTB are not used during the end-to-end automatic testing phase


End2End Testing Phase


Conditional Random Field


We use the CRF++ tool (http://crfpp.sourceforge.net/) for sequence labeling (Lafferty et al., 2001), with a second-order Markov dependency between tags.

Besides specifying each feature individually in the feature description template, we also represent features in various combinations.

We chose this tool because the CRF++ data format is compatible with that of the CoNLL 2000 chunking shared task, and we view our task as a discourse chunking task.

In addition, linear-chain CRFs for sequence labeling offer advantages over both generative models such as HMMs and classifiers applied independently at each sequence position. Sha and Pereira (2003) also claim that, as a single model, CRFs outperform other models for shallow parsing.
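A CRF++ feature template lists individual features as `%x[row,col]` macros and combined features as macros joined by `/`. The sketch below generates such a template; the helper `crfpp_template`, the column indices, and the chosen combination are assumptions for illustration, since the authors' actual template is not shown on the slides.

```python
def crfpp_template(num_columns, combos=(), window=(0,)):
    """Build CRF++ template text.

    num_columns: number of feature columns in the data file.
    combos: pairs of column indices to combine at the current token.
    window: relative row offsets at which each column is read.
    """
    lines = []
    idx = 0
    # One unigram feature per (column, offset).
    for col in range(num_columns):
        for offset in window:
            lines.append(f"U{idx:02d}:%x[{offset},{col}]")
            idx += 1
    # Combined features, e.g. token together with its PoS tag.
    for col_a, col_b in combos:
        lines.append(f"U{idx:02d}:%x[0,{col_a}]/%x[0,{col_b}]")
        idx += 1
    # A bare "B" line requests features over adjacent output-tag pairs.
    lines.append("B")
    return "\n".join(lines)


template = crfpp_template(num_columns=2, combos=[(0, 1)])
```

Whether a bare `B` line matches the exact tag dependency the authors configured is not stated in the slides, so treat that line as a placeholder.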


Hill Climbing Algorithm

function HILL-CLIMBING(problem) returns a state that is a local maximum
    current ← MAKE-NODE(problem.INITIAL-STATE)
    loop do
        neighbor ← a highest-valued successor of current
        if neighbor.VALUE ≤ current.VALUE then return current.STATE
        current ← neighbor

[Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig]

Hill climbing is the most basic local search technique. At each step the current node is replaced by its best neighbor: here, the neighbor with the highest VALUE; if a heuristic cost estimate h is used instead, it is the neighbor with the lowest h.

Hill climbing is a greedy, fast local search.

We optimized the selected feature set with a feature ablation technique, leaving out one feature at a time.
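Applied to feature selection, hill climbing and leave-one-out ablation might look like the sketch below. The neighbor definition (add one unused feature), the integer toy scores, and the names `hill_climb`, `ablation`, and `GAINS` are all assumptions; in practice the value function would train and evaluate a CRF on held-out data.

```python
def hill_climb(initial, neighbors, value):
    """Greedy local search: move to the best neighbor until none improves."""
    current = initial
    while True:
        candidates = list(neighbors(current))
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) <= value(current):
            return current  # local maximum reached
        current = best


def ablation(selected, value):
    """Score of the selected set with each feature left out in turn."""
    return {f: value(selected - {f}) for f in selected}


# Toy scores (in points) standing in for a real train-and-evaluate step.
GAINS = {"T": 30, "CONN": 20, "IOB": 25, "POS": -5}


def value(features):
    return sum(GAINS[f] for f in features)


def neighbors(features):
    # Each neighbor adds exactly one feature not yet selected.
    return [features | {f} for f in GAINS if f not in features]


best = hill_climb(frozenset(), neighbors, value)
drops = ablation(best, value)
```

In this toy run the search stops before adding `POS`, and the ablation scores show `T` to be the feature whose removal costs the most.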


Features

The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories of all the constituents on the path between the root node and the current leaf node of the tree. For the token "flashed", the corresponding feature would be I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E-, and C- indicate whether the given token is, respectively, at the beginning of, inside, or at the end of the constituent, or is a single-token chunk. In this case, "flashed" is at the end of every constituent in the chain, except for the last VP, which dominates one single leaf.


Result: Gold-Labeled & Automatic

Gold-labeled system output (baseline result in parentheses)

               P            R            F1
Arg2  Exact    0.84 (0.53)  0.74 (0.46)  0.79 (0.49)
      Partial  0.93 (0.80)  0.82 (0.85)  0.88 (0.82)
      Overlap  0.97 (0.98)  0.88 (0.85)  0.92 (0.91)
Arg1  Exact    0.68 (0.19)  0.39 (0.19)  0.49 (0.19)
      Partial  0.81 (0.50)  0.51 (0.68)  0.62 (0.58)
      Overlap  0.91 (0.70)  0.52 (0.68)  0.66 (0.69)

Automatic system output

                         P     R     F1
Arg2            Exact    0.80  0.74  0.77
                Partial  0.91  0.85  0.88
                Overlap  0.97  0.88  0.92
Arg1 semi-auto  Exact    0.64  0.31  0.42
                Partial  0.76  0.39  0.52
                Overlap  0.84  0.40  0.54
Arg1 full-auto  Exact    0.63  0.28  0.39
                Partial  0.74  0.36  0.48
                Overlap  0.83  0.37  0.51


Combo Result

Auto Conn + Gold SPT

               P     R     F1
Arg2  Exact    0.82  0.70  0.76
      Partial  0.93  0.79  0.85
      Overlap  0.96  0.83  0.89
Arg1  Exact    0.67  0.31  0.43
      Partial  0.81  0.44  0.57
      Overlap  0.94  0.44  0.60

Gold Conn + Auto SPT

               P     R     F1
Arg2  Exact    0.76  0.61  0.68
      Partial  0.91  0.73  0.81
      Overlap  0.96  0.77  0.85
Arg1  Exact    0.62  0.31  0.41
      Partial  0.76  0.42  0.54
      Overlap  0.87  0.43  0.58


Result: replc. IOB chain

               P     R     F1
Arg2  Exact    0.80  0.74  0.77
      Partial  0.91  0.85  0.88
      Overlap  0.97  0.88  0.92
Arg1  Exact    0.65  0.29  0.40
      Partial  0.80  0.43  0.56
      Overlap  0.97  0.43  0.60