TRANSCRIPT
End-to-End Discourse Parser Evaluation
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science, University of Trento, Italy
Contents
- Introduction
  - Discourse parser: what + why + how
  - Discourse parser & Penn Discourse TreeBank (PDTB)
  - Our contribution
- Architecture
- Features
- Results
- Conclusion
Introduction
What: we refer to a coherent, structured group of sentences or expressions as a discourse.
Why: discourse structure represents the meaning of the document.
How: process flow: data (discourse) -> segmentation -> discourse parsing -> discourse structure.
Discourse structure includes relations (a connective and its arguments) lexically anchored in the document text.
Common data sources: the Rhetorical Structure Theory (RST) treebank and the Penn Discourse TreeBank (PDTB). We used the PDTB.
Examples from PDTB (1)
Arg1: I never gamble too far.
Explicit connective: In particular
Arg2: I quit after one try, whether I win or lose.
[EXPANSION]

Each annotated relation includes a connective, two arguments, and a sense label for the connective.
A connective can occur between its two arguments, at the beginning of a sentence, or inside an argument.
The top-level senses of the three-layered sense hierarchy are TEMPORAL, CONTINGENCY, COMPARISON and EXPANSION.
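A relation of this kind can be captured by a small record type. A minimal sketch in Python; the class and field names are our own illustration, not part of the PDTB release format:

```python
from dataclasses import dataclass

# Top-level senses of the three-layered PDTB sense hierarchy
TOP_LEVEL_SENSES = {"TEMPORAL", "CONTINGENCY", "COMPARISON", "EXPANSION"}

@dataclass
class ExplicitRelation:
    """One annotated PDTB relation: a connective, two argument
    spans, and the sense label of the connective."""
    connective: str
    arg1: str
    arg2: str
    sense: str  # e.g. "EXPANSION", or a dotted path into lower levels

    def __post_init__(self):
        # Validate the top level of the (possibly dotted) sense label
        top = self.sense.split(".")[0]
        if top not in TOP_LEVEL_SENSES:
            raise ValueError(f"unknown top-level sense: {top}")

rel = ExplicitRelation(
    connective="In particular",
    arg1="I never gamble too far.",
    arg2="I quit after one try, whether I win or lose.",
    sense="EXPANSION",
)
print(rel.sense)
```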
Examples from PDTB (2)

When Mr. Green won a $240,000 verdict in a land condemnation case against the State in June 1983, he says, Judge O'Kicki unexpectedly awarded him an additional $100,000. [TEMPORAL]

As an indicator of the tight grain supply situation in the U.S., market analysts said that late Tuesday the Chinese government, which often buys U.S. grains in quantity, turned instead to Britain to buy 500,000 metric tons of wheat. [COMPARISON]

Since McDonald's menu prices rose this year, the actual deadline may have been more. [CONTINGENCY]

(Arg1 italicized, connectives underlined, Arg2 boldfaced in the original slides)
PDTB Corpus Statistics
Arg2 is always in the same sentence as the connective.
60.9% of annotated Arg1s are in the same sentence as the connective; 39.1% are in a previous sentence (30.1% in the adjacent sentence, 9.0% non-adjacent).
We used these statistics to establish the baseline.
Our Contribution
Developed an end-to-end discourse parser that retrieves discourse structure (an explicit connective and its two argument spans) starting from a raw text paragraph.
Evaluation: established the system on gold-standard data (PTB + PDTB); evaluated it against a baseline; implemented the same method in an automated system; improved the automated system in terms of applicability.
An overlapping discourse segmentation technique (+2/-2 window) is applied to the complete text.
A chunking strategy is followed for classification; the discourse model is a cascaded CRF.
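The exact windowing is not spelled out on this slide; one plausible reading of the +2/-2 window, sketched in Python (the per-sentence anchoring and function name are our assumptions):

```python
def overlapping_windows(sentences, radius=2):
    """Overlapping discourse segmentation sketch: for each sentence i,
    emit the window of sentences [i - radius, i + radius], clipped at
    the text boundaries. Consecutive windows overlap, so an argument
    that crosses a sentence boundary can still fall inside one window."""
    windows = []
    for i in range(len(sentences)):
        lo = max(0, i - radius)
        hi = min(len(sentences), i + radius + 1)
        windows.append(sentences[lo:hi])
    return windows

sents = ["S1.", "S2.", "S3.", "S4.", "S5."]
for w in overlapping_windows(sents):
    print(w)
```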
End-to-End Architecture
[Pipeline diagram: Doc -> Parser -> Parse_Tree, then feature extraction with Chunklink (Sabine Buchholz, CoNLL'00 shared task), AddDiscourse (Pitler & Nenkova '09, connective sense detection) and RootExtract + Morpha (Johansson; Minnen et al.: morphology and all features), followed by a Pruner and the Arg2 and Arg1 labelers.]
Features
Features used for Arg1 and Arg2 segmentation and labeling:
F1. Token (T)
F2. Sense of connective (CONN)
F3. IOB chain (IOB)
F4. PoS tag
F5. Lemma (L)
F6. Inflection (INFL)
F7. Main verb of main clause (MV)
F8. Boolean feature for MV (BMV)
Additional feature used only for Arg1:
F9. Arg2 labels
For more details: Ghosh et al., IJCNLP 2011
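Per token, these features form one row of a CoNLL-style column file. A sketch of such a row; the column order is illustrative, not the exact layout of the original system. F9 is appended only for the Arg1 classifier, which is what makes the model a cascade:

```python
def token_features(token, conn_sense, iob, pos, lemma, infl, mv, bmv,
                   arg2_label=None):
    """Build one tab-separated feature row: F1-F8 for every token,
    plus F9 (the Arg2 label predicted by the first CRF) when the row
    is fed to the Arg1 classifier."""
    row = [token, conn_sense, iob, pos, lemma, infl, mv, bmv]
    if arg2_label is not None:
        row.append(arg2_label)  # F9: cascaded from the Arg2 CRF
    return "\t".join(str(x) for x in row)

# Hypothetical row for the token "quit" in the Arg2 span of the example
print(token_features("quit", "EXPANSION", "I-S/B-VP", "VBP",
                     "quit", "PRES", "quit", 1, arg2_label="B-ARG2"))
```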
Evaluation & Baseline
Metrics: precision, recall and F1 measure.
Scoring scheme, Exact Match: a classified span is correct only if it exactly coincides with the gold-standard span.
Baseline (based on the statistics given in the annotation manual):
Arg2: label all tokens in the text span between the connective and the beginning of the next sentence.
Arg1: label all tokens in the text span from the end of the previous sentence to the connective position; if the connective occurs at the beginning of a sentence, label the previous sentence.
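The baseline rules above can be written down directly. A sketch assuming pre-tokenized sentences and known connective offsets (the function name and interface are ours):

```python
def baseline_spans(sentences, conn_sent, conn_start, conn_end):
    """Baseline argument spans for one explicit connective.

    sentences:  list of token lists, one per sentence
    conn_sent:  index of the sentence containing the connective
    conn_start, conn_end: token slice of the connective in that sentence

    Arg2: every token from just after the connective to the end of its
    sentence. Arg1: every token from the sentence start up to the
    connective; if the connective is sentence-initial, the whole
    previous sentence instead."""
    sent = sentences[conn_sent]
    arg2 = sent[conn_end:]
    if conn_start == 0 and conn_sent > 0:
        arg1 = sentences[conn_sent - 1]
    else:
        arg1 = sent[:conn_start]
    return arg1, arg2

sents = [["He", "left", "early", "."],
         ["Because", "it", "rained", ",", "we", "stayed", "."]]
# Sentence-initial connective "Because": Arg1 = previous sentence
print(baseline_spans(sents, conn_sent=1, conn_start=0, conn_end=1))
```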
Exact Arg2 Results: Comparison Viewgraph

                    P     R     F1
Baseline            0.53  0.46  0.49
Gold-Standard       0.84  0.74  0.79
Automatic           0.80  0.74  0.77
AutoConn+GoldSPT    0.82  0.70  0.76
GoldConn+AutoSPT    0.76  0.61  0.68
Lightweight (Auto)  0.72  0.56  0.63
Exact Arg1 Results: Comparison Viewgraph

                    P     R     F1
Baseline            0.19  0.19  0.19
Gold-Standard       0.68  0.39  0.49
Automatic           0.63  0.28  0.39
AutoConn+GoldSPT    0.67  0.31  0.43
GoldConn+AutoSPT    0.62  0.31  0.41
Lightweight (Auto)  0.60  0.27  0.37
Features
The IOB (Inside-Outside-Begin) chain lists all constituents on the path between the root node and the current leaf node of the tree. For example, the IOB chain feature for "flashed" is I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is at the beginning of, inside, or at the end of the constituent, or a single-token chunk, respectively.
Conclusion
The automatic end-to-end system performs nearly as well as the gold-standard one.
We are moving towards a "lightweight" version of the pipeline: shallow, with less dependence on SPTs.
We wish to explore more features.
We improved the Arg1 classification result by 5 points using a previous-sentence feature (Ghosh et al., IJCNLP 2011).
Thank you
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science, University of Trento, Italy
{ghosh, riccardi}@disi.unitn.it
Previous Work
- Task limited to retrieving the argument heads (Wellner et al. 2007; Elwell et al. 2008)
- Dinesh et al. (2005) extracted complete arguments with boundaries, but only for a restricted class of connectives
- The identification of Arg1 has been only partially addressed in previous work (Prasad 2010)
- Automatic surface-sense classification (at class level) already reaches the upper bound of inter-annotator agreement (Pitler and Nenkova, 2009)
Data & Tools
Corpus used: Penn Discourse TreeBank (PDTB)
For the gold-standard system, the Penn TreeBank (PTB) corpus is used.
Third-party software/scripts used:
- Stanford Syntactic Parser (Klein & Manning 2003)
- AddDiscourse, for explicit connective classification (Pitler and Nenkova 2008)
- chunklink.pl, to extract IOB chains (Sabine Buchholz, CoNLL Shared Task 2000)
- RootExtractor: Syntactic Parse Tree (SPT) processors (Richard Johansson)
- Morpha (Minnen et al. 2001)
- Conditional Random Fields: CRF++ (Taku Kudo)
Overall Architecture
A syntactic tree parser and a connective detection and classification tool are used in the automatic systems; the PDTB and PTB are not used during the end-to-end automatic testing phase.
End2End Testing Phase
Conditional Random Field
We use the CRF++ tool (http://crfpp.sourceforge.net/) for sequence-labeling classification (Lafferty et al., 2001), with a second-order Markov dependency between tags.
Besides the individual specification of each feature in the feature description template, features are also represented in various combinations.
We used this tool because the output of CRF++ is compatible with the CoNLL 2000 chunking shared task format, and we view our task as a discourse chunking task.
Linear-chain CRFs for sequence labeling offer advantages over both generative models such as HMMs and classifiers applied independently at each sequence position. Sha and Pereira (2003) also claim that, as a single model, CRFs outperform other models for shallow parsing.
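As a concrete illustration, a CRF++ feature description template in this spirit might look as follows. The column indices (0 = token, 1 = connective sense, 2 = IOB chain, 3 = PoS tag) and the specific combinations are illustrative assumptions, not the template used in the paper:

```
# Unigram macros over single columns of the CoNLL-style input
U00:%x[0,0]
U01:%x[0,1]
U02:%x[0,2]
U03:%x[0,3]
# Context and feature combinations
U10:%x[-1,0]/%x[0,0]
U11:%x[0,0]/%x[1,0]
U12:%x[0,2]/%x[0,3]
# Bigram macro: adds dependencies between adjacent output tags
B
```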
Hill Climbing Algorithm
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current <- MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor <- highest-valued successor of current
    if neighbor.VALUE <= current.VALUE then return current.STATE
    current <- neighbor
[Artificial Intelligence: A Modern Approach, Stuart J. Russell & Peter Norvig]
The hill-climbing search algorithm is the most basic local search technique: at each step the current node is replaced by the best neighbor.
Here that is the neighbor with the highest VALUE; if a heuristic cost estimate h were used, we would pick the neighbor with the lowest h.
Hill climbing is a greedy, fast local search.
We optimized the selected feature set with a feature-ablation technique, leaving out one feature each time.
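The ablation loop described above is essentially hill climbing over feature subsets. A sketch with a toy scoring function; in the real setup `score` would train and evaluate the CRF on a validation set (the function names here are ours):

```python
def ablate_features(features, score):
    """Greedy hill-climbing feature ablation: at each step evaluate
    every set obtained by leaving one feature out, move to the best
    one, and stop when no single removal improves the score."""
    current = frozenset(features)
    best = score(current)
    while len(current) > 1:
        # Highest-valued successor: best leave-one-out subset
        neighbor = max((current - {f} for f in sorted(current)), key=score)
        if score(neighbor) <= best:
            break  # local maximum reached
        current, best = neighbor, score(neighbor)
    return set(current), best

def toy_score(feats):
    # Toy validation metric: two useful features, one harmful one
    return len(feats & {"LEMMA", "IOB"}) - 2 * len(feats & {"NOISE"})

selected, s = ablate_features({"LEMMA", "IOB", "NOISE"}, toy_score)
print(selected, s)
```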
Features
The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories of all the constituents on the path between the root node and the current leaf node of the tree. The corresponding feature would be I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is at the beginning of, inside, or at the end of the constituent, or a single-token chunk, respectively. In this case, "flashed" is at the end of every constituent in the chain, except for the last VP, which dominates one single leaf.
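chunklink.pl derives these chains from CoNLL-formatted trees; the idea can be shown with a toy re-implementation over nested-tuple trees (the tree encoding and the C- over B-/E- precedence for single-token constituents are our simplifications):

```python
def iob_chains(tree):
    """IOB chain for every leaf of a parse tree given as nested tuples
    (label, child, ...) with plain strings as leaves. For each
    constituent on the root-to-leaf path: B- first token of the
    constituent, E- last token, I- strictly inside, C- constituent
    covering exactly this one token."""
    def leaves(t):
        return [t] if isinstance(t, str) else [w for c in t[1:] for w in leaves(c)]

    chains = [[] for _ in leaves(tree)]

    def walk(t, start):
        if isinstance(t, str):
            return start + 1
        end = start + len(leaves(t))
        for i in range(start, end):  # tag root-first, so chains read root -> leaf
            if end - start == 1:
                tag = "C-"
            elif i == start:
                tag = "B-"
            elif i == end - 1:
                tag = "E-"
            else:
                tag = "I-"
            chains[i].append(tag + t[0])
        pos = start
        for child in t[1:]:
            pos = walk(child, pos)
        return end

    walk(tree, 0)
    return ["/".join(c) for c in chains]

tree = ("S", ("NP", "He"), ("VP", "ran", ("ADVP", "fast")))
print(iob_chains(tree))
```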
Result: Gold-labeled & Automatic System Output

Gold-labeled system output (baseline results, shown in blue on the original slide, in parentheses):

               P            R            F1
Arg2  Exact    0.84 (0.53)  0.74 (0.46)  0.79 (0.49)
      Partial  0.93 (0.80)  0.82 (0.85)  0.88 (0.82)
      Overlap  0.97 (0.98)  0.88 (0.85)  0.92 (0.91)
Arg1  Exact    0.68 (0.19)  0.39 (0.19)  0.49 (0.19)
      Partial  0.81 (0.50)  0.51 (0.68)  0.62 (0.58)
      Overlap  0.91 (0.70)  0.52 (0.68)  0.66 (0.69)

Automatic system output:

                          P     R     F1
Arg2              Exact   0.80  0.74  0.77
                  Partial 0.91  0.85  0.88
                  Overlap 0.97  0.88  0.92
Arg1 (semi-auto)  Exact   0.64  0.31  0.42
                  Partial 0.76  0.39  0.52
                  Overlap 0.84  0.40  0.54
Arg1 (full auto)  Exact   0.63  0.28  0.39
                  Partial 0.74  0.36  0.48
                  Overlap 0.83  0.37  0.51
Combo Result

Auto Conn + Gold SPT:

               P     R     F1
Arg2  Exact    0.82  0.70  0.76
      Partial  0.93  0.79  0.85
      Overlap  0.96  0.83  0.89
Arg1  Exact    0.67  0.31  0.43
      Partial  0.81  0.44  0.57
      Overlap  0.94  0.44  0.60

Gold Conn + Auto SPT:

               P     R     F1
Arg2  Exact    0.76  0.61  0.68
      Partial  0.91  0.73  0.81
      Overlap  0.96  0.77  0.85
Arg1  Exact    0.62  0.31  0.41
      Partial  0.76  0.42  0.54
      Overlap  0.87  0.43  0.58
Result: Replacing the IOB Chain
P R F1
Arg2 Exact 0.80 0.74 0.77
Partial 0.91 0.85 0.88
Overlap 0.97 0.88 0.92
Arg1 Exact 0.65 0.29 0.40
Partial 0.80 0.43 0.56
Overlap 0.97 0.43 0.60