A Novel Discourse Parser Based on Support Vector Machine Classification
Source: ACL 2009. Authors: David A. duVerle and Helmut Prendinger
Reporter: Yong-Xiang Chen
Research problem
• Automated annotation of a text with hierarchically organized RST relations:
1. Parse discourse
2. Work within the framework of Rhetorical Structure Theory (RST)
3. Produce a tree-like structure
4. Base the parser on SVM classification
Rhetorical Structure Theory (RST)
• Mann and Thompson (1988)
• A set of structural relations for composing units (‘spans’) of text
– 110 distinct rhetorical relations
– Relations can be of intentional, semantic, or textual nature
• Two-step process (this study focuses on step 2)
– Segmentation of the input text into elementary discourse units (‘edus’)
– Generation of the rhetorical structure tree
• the edus constitute its terminal nodes
• Edus:
– Nucleus
• the relatively more important part of the text
– Satellite
• subordinate to the nucleus; represents supporting information

[Figure: example spans, with out-going arrows pointing from each satellite to its nucleus]
Research restriction
1. Input is a sequence of edus that have been segmented beforehand
2. Use the reduced set of 18 rhetorical relations
• e.g.: PROBLEM-SOLUTION, QUESTION-ANSWER, STATEMENT-RESPONSE, TOPIC-COMMENT and COMMENT-TOPIC are all grouped under one TOPIC-COMMENT relation
3. Turn all n-ary rhetorical relations into nested binary relations
• e.g.: the LIST relation
4. Only adjacent spans of text can be put in relation within an RST tree (‘Principle of sequentiality’, Marcu, 2000)
18 rhetorical relations
• Attribution, Background, Cause, Comparison, Condition, Contrast, Elaboration, Enablement, Evaluation, Explanation, Joint, Manner-Means, Topic-Comment, Summary, Temporal, Topic-Change, Textual-Organization, Same-Unit
Classifier
• Input: two consecutive spans (atomic edus or RST sub-trees) from the input text
• Output: the likelihood of a direct structural relation, as well as probabilities for
1. the relation’s label
2. nuclearity
• Gold standard: human cross-validation levels
Two separate classifiers
• Two separate classifiers are trained:
• S: a binary classifier, for structure
– the existence of a connecting node between the two input sub-trees
• L: a multi-class classifier, for rhetorical relation and nuclearity labeling
Produce a valid tree
• Using these classifiers and a straightforward bottom-up tree-building algorithm
Classes
• 18 super-relations and 41 classes
• Considering only valid nuclearity options
– e.g., (ATTRIBUTION, N, S) and (ATTRIBUTION, S, N) are two classes of ATTRIBUTION
– but not (ATTRIBUTION, N, N)
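How the 41 classes arise can be pictured with a small sketch: pair each relation only with its valid nuclearity options. The nuclearity table below is an illustrative stand-in for three of the 18 relations, not the paper's actual inventory:

```python
# Sketch: nuclearized relation classes. The validity table below is an
# illustrative assumption covering 3 relations, not the paper's full table;
# with all 18 relations it would yield the 41 classes.
VALID_NUCLEARITY = {
    "ATTRIBUTION": [("N", "S"), ("S", "N")],             # mononuclear only
    "CONTRAST":    [("N", "N"), ("N", "S"), ("S", "N")],
    "JOINT":       [("N", "N")],                         # multinuclear only
}

classes = [(rel, n1, n2)
           for rel, options in VALID_NUCLEARITY.items()
           for n1, n2 in options]
```

The same comprehension over the full 18-relation table produces the 41 nuclearized classes used for classifier L.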
Reduce the multi-classification
• Reduce the multi-class problem to a set of binary classifiers, each trained either on a single class (“one vs. all”) or on a pair of classes (“one vs. one”)
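A minimal sketch of the two reduction schemes, with toy scoring functions standing in for the trained binary SVMs:

```python
# "One vs. all": one binary scorer per class; predict the class whose
# scorer is most confident. "One vs. one": one scorer per class pair;
# each pair casts a vote and the majority wins. The scorer callables
# are toy stand-ins for the trained binary SVMs.
def one_vs_all(x, scorers):
    # scorers: {class_label: score_fn}, higher score = more confident
    return max(scorers, key=lambda c: scorers[c](x))

def one_vs_one(x, pair_scorers):
    # pair_scorers: {(c1, c2): fn}; fn(x) > 0 votes for c1, else for c2
    votes = {}
    for (c1, c2), fn in pair_scorers.items():
        winner = c1 if fn(x) > 0 else c2
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```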
Input data
• Annotated documents taken from the RST-DT corpus
– paired with lexicalized syntax trees (LS Trees) for each sentence
– a separate test set is used for performance evaluation
Lexicalized syntax trees (LS Trees)
• Taken directly from the Penn Treebank corpus, then “lexicalized” using a set of canonical head-projection rules
– each internal node of the syntactic tree is tagged with a lexical “head”
Algorithm
• Repeatedly apply the two classifiers, following a naive bottom-up tree-construction method
– to obtain a globally satisfying RST tree for the entire text
• Start with a list of all atomic discourse sub-trees
– made of single edus in their text order
• Recursively select the best match between adjacent sub-trees
– using binary classifier S
• Label the newly created sub-tree (using multi-class classifier L) and update the scoring for S, until only one sub-tree is left
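The loop above can be sketched in a few lines. `score_structure` and `label_relation` are stand-ins for the trained S and L classifiers, and the plain-dict tree representation is a hypothetical choice made here for brevity:

```python
# Greedy bottom-up tree construction as described on the slide. The two
# callables stand in for classifier S (structure score) and classifier L
# (relation + nuclearity label); sub-trees are plain dicts for brevity.
def build_rst_tree(edus, score_structure, label_relation):
    trees = [{"edu": e} for e in edus]   # atomic sub-trees in text order
    while len(trees) > 1:
        # pick the best-scoring pair of adjacent sub-trees (classifier S)
        i = max(range(len(trees) - 1),
                key=lambda k: score_structure(trees[k], trees[k + 1]))
        left, right = trees[i], trees[i + 1]
        # label the new node's relation and nuclearity (classifier L)
        node = {"relation": label_relation(left, right),
                "children": [left, right]}
        trees[i:i + 2] = [node]          # merge; S is re-applied next round
    return trees[0]

def leaves(tree):
    # recover the edus at the terminal nodes, in text order
    if "edu" in tree:
        return [tree["edu"]]
    return [e for c in tree["children"] for e in leaves(c)]
```

A toy run merges three edus into one binary-branching tree while preserving their text order, mirroring the principle of sequentiality.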
Features
1. ‘S[pan]’ features are sub-tree-specific
– extracted symmetrically from both left and right candidate spans
2. ‘F[ull]’ features are a function of the two sub-trees considered as a pair
Textual Organization
• S features:
– number of paragraph boundaries
– number of sentence boundaries
• F features:
– spans belong to the same sentence
– spans belong to the same paragraph
• Hypothesize a correlation between span length and rhetorical relation
– e.g. the satellite in a CONTRAST relation will tend to be shorter than the nucleus
– span size and positioning
• using either tokens or edus as the distance unit
• using relative values for positioning and distance
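A sketch of the size-and-positioning encoding using edus as the distance unit and relative values, as the slide describes; the feature names themselves are hypothetical:

```python
# Hypothetical encoding of span size and positioning as relative values,
# with edus as the distance unit (feature names are illustrative, not
# taken from the paper).
def positional_features(span_start, span_len, total_edus):
    return {
        "rel_position": span_start / total_edus,  # where the span begins
        "rel_length": span_len / total_edus,      # how much text it covers
    }
```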
Lexical Clues and Punctuation
• Discourse markers are good indications
• Use an empirical n-gram dictionary (for n ∈ {1, 2, 3}) built from the training corpus and culled by frequency
– Reason: takes into account non-lexical signals such as punctuation
• Count and encode n-gram occurrences, considering only the first and last n tokens of each span
– Classifier accuracy improved by more than 5%
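The edge-restricted n-gram counting might look like the following sketch; the dictionary of frequent n-grams is assumed to have been built from the training corpus beforehand:

```python
# Count occurrences of dictionary n-grams (n = 1..3), looking only at the
# first n and last n tokens of a span, as described above. The n-gram
# dictionary is assumed given (built from the training corpus and culled
# by frequency).
def ngram_features(tokens, ngram_dict, max_n=3):
    feats = {}
    for n in range(1, max_n + 1):
        for gram in (tuple(tokens[:n]), tuple(tokens[-n:])):
            if gram in ngram_dict:
                feats[gram] = feats.get(gram, 0) + 1
    return feats
```

Note that because only the span edges are inspected, punctuation such as a final period is picked up as a signal, consistent with the slide's rationale.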
Simple Syntactic Clues
• For better generalization
– smaller dependency on lexical content
• Add shallow syntactic clues by encoding part-of-speech (POS) tags for both the prefix and suffix of each span
– lengths higher than n = 3 did not seem to improve performance
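A minimal sketch of the POS prefix/suffix encoding with n = 3; the tagger itself is assumed to run upstream, and the feature names are illustrative:

```python
# Encode the POS tags of the first and last n tokens of a span as shallow
# syntactic features (n = 3 per the slide; feature names are illustrative,
# and POS tagging is assumed to have been done upstream).
def pos_affix_features(pos_tags, n=3):
    return {
        "prefix_pos": tuple(pos_tags[:n]),
        "suffix_pos": tuple(pos_tags[-n:]),
    }
```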
Dominance Sets
• Extracted from the syntax parse trees
• Example: it can be difficult to identify the scope of an ATTRIBUTION relation (illustrated on the slide)
One dominance: Logical nesting order
• Logical nesting order: 1A > 1B > 1C
• This order allows us to favor the relation between 1B and 1C over a relation between 1A and 1B
Dominance Sets
• S features:
– distance to the root of the syntax tree
– distance to the common ancestor in the syntax tree
– dominating node’s lexical head in the span
– relative position of the lexical head in the sentence
• F features:
– common ancestor’s POS tag
– common ancestor’s lexical head
– dominating node’s POS tag (diamonds in the figure)
– dominated node’s POS tag (circles in the figure)
– dominated node’s sibling’s POS tag (rectangles in the figure)
Rhetorical Sub-structure
• Structural features for large spans (higher-level relations)
• Encoding each span’s rhetorical sub-tree into the feature vector
Evaluation
1. Raw performance of the SVM classifiers
2. The entire tree-building task
• Binary classifier S
– trained on 52,683 instances (positive: 1/3, negative: 2/3)
– tested on 8,558 instances
• Classifier L
– trained on 17,742 instances, labeled across 41 classes
– tested on 2,887 instances
Baseline: Reitter’s results (2003)
• A smaller set of training instances
– 7,976 vs. 17,742 in this case
• Fewer classes
– 16 rhetorical relation labels with no nuclearity, vs. our 41 nuclearized relation classes
Full System Performance
• Comparing the structure and labeling of the produced RST tree to the manual annotation
– with perfectly-segmented input and with SPADE segmenter output
– blank tree structure (‘S’)
– with nuclearity (‘N’)
– with rhetorical relations (‘R’)
– fully labeled structure (‘F’)
Comparison with other Algorithms
| System | Testing corpus | Relation classes | Segmenter |
| --- | --- | --- | --- |
| M | unavailable | 15 | SPADE |
| LT | selection of 21 documents | 14 | unavailable |
| dV | selection of 21 documents | 18 | SPADE |
The End
Background
• Coherence relations reflect the author’s intent
– a hierarchically structured set of coherence relations
• Discourse
– focuses on a higher-level view of text than the sentence level
• Due to small differences in the way they were tokenized and pre-treated, the rhetorical tree and the LS tree are rarely a perfect match: an optimal alignment is found by minimizing the edit distance between word sequences
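The alignment step can be approximated with the standard library's sequence matcher, a stand-in for whatever edit-distance routine the authors actually used (the slide does not specify one):

```python
# Align two tokenizations of the same text by finding matching word runs.
# difflib.SequenceMatcher is a stdlib stand-in for the authors' actual
# edit-distance minimization, which the slide does not specify.
import difflib

def align_tokenizations(words_a, words_b):
    sm = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    # (start_a, end_a, start_b, end_b) for each maximal matching run
    return [(a, a + n, b, b + n)
            for a, b, n in sm.get_matching_blocks() if n]
```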
Features
• Use n-fold validation on the S and L classifiers to assess the impact of feature sets on general performance and to eliminate redundant features
Strong Compositionality Criterion