The CASS Technique for Evaluating the Performance of Argument Mining
Rory Duthie, John Lawrence, Katarzyna Budzynska, Chris Reed
Centre for Argument Technology, University of Dundee
IFiS Polish Academy of Sciences
Outline
Motivation and Aim
• Problems when publishing evaluation and results
• CASS (Combined Argument Similarity Score)
Metric
• How CASS is calculated
Automation
• Deployment of CASS
Motivation
•Consistency for the Argument Mining community
•Metric which does not double penalise mismatches
•Automate the calculations
Motivation: Consistency for the community
From the 2nd Workshop on Argument(ation) Mining:
• Inter-annotator agreement:
  • 3 papers - Cohen’s Kappa
  • 3 papers - percentage agreement
  • 2 papers - precision and recall
  • 3 papers - other methods
• Automatic Argument Mining results:
  • 4 papers - accuracy
  • 5 papers - precision, recall and F-score
  • 1 paper - macro-averaged F-score
• Other Metrics in Comp Ling: ROUGE, in text summarization
Motivation: Metric (1/3)
(Kirschner et al., 2015) provides:
• Graph Based approach, APA, Weighted Average
Problems:
• Segmentation differences
• Propositional content relations only
• Not all nodes in an analysis (Distance < 6)
• Relation direction ignored
• Set metrics
Motivation: Metric (2/3)
CASS extends (Kirschner et al., 2015):
• Segmentation differences
• Propositional content relations and dialogical content relations:
• confusion matrices
• all nodes
• differing segmentation
Motivation: Metric (3/3)
• Use CASS to combine scores
• CASS with any metric
• Annotator agreement and Argument Mining results
• Comparison of analysis in different annotation schemes
Motivation: Automatic Solution
• Manual vs. manual: Cohen’s Kappa, Fleiss’ Kappa…
• Manual vs. automatic: Precision, Recall, F-score, Accuracy…
Metric: Segmentation (1/4)
Still, it is possible that, should war erupt in Iraq, American and British forces might fall foul of, for example, the provision of the ICC treaty outlawing attacks on military targets that cause "clearly excessive" harm to civilians.
Metric: Segmentation (2/4)
That is especially so if they do not learn lessons from recent wars and take corrective steps. The weapon most likely to produce such harm is the cluster bomb.
Metric: Segmentation (3/4)
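[Figure: two segmentations, S1 and S2, of the example text, labelled with segment sizes. S1: 20, 18, 29, 39, 31, 18. S2 (order uncertain in the extracted slide): 17, 12, 27, 12, 31, 18, 10, 28.]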
Still, it is possible that, should war erupt in Iraq, American and British forces might fall foul of, for example, the provision of the ICC treaty outlawing attacks on military targets that cause "clearly excessive" harm to civilians.
Metric: Segmentation (4/4)
•Pk - (Beeferman et al., 1999)
•WindowDiff - (Pevzner and Hearst, 2002)
•Segmentation Similarity - (Fournier and Inkpen, 2012)
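To make the window-based idea behind the first two metrics concrete, here is a minimal Python sketch. It is a simplified illustration, not the implementation used for CASS: the boundary-string encoding, the function names, and the segment sizes are assumptions made for this example, and Segmentation Similarity is not covered.

```python
def boundary_string(sizes):
    """Turn segment sizes such as [2, 3] into '01001' ('1' marks a segment end)."""
    return "".join("0" * (size - 1) + "1" for size in sizes)


def default_window(reference):
    """Half the average reference segment length, a common choice for k."""
    return max(1, round(len(reference) / (2 * reference.count("1"))))


def pk(reference, hypothesis, k=None):
    """Share of windows where only one segmentation contains a boundary."""
    k = k or default_window(reference)
    windows = len(reference) - k + 1
    errors = sum(
        ("1" in reference[i:i + k]) != ("1" in hypothesis[i:i + k])
        for i in range(windows)
    )
    return errors / windows


def window_diff(reference, hypothesis, k=None):
    """Share of windows where the two boundary counts differ."""
    k = k or default_window(reference)
    windows = len(reference) - k + 1
    errors = sum(
        reference[i:i + k].count("1") != hypothesis[i:i + k].count("1")
        for i in range(windows)
    )
    return errors / windows


# Hypothetical segment sizes for two annotators over the same 10 units.
s1 = boundary_string([3, 3, 4])
s2 = boundary_string([3, 4, 3])
print(pk(s1, s2), window_diff(s1, s2))
```

Both functions return an error rate (0 means identical segmentations), so a similarity score would be one minus the returned value.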
Metric: Calculating Relations
•Guaranteed matching formula used for all propositions and locutions
•We use the Levenshtein distance
•Levenshtein distance and word positions are combined to give node matches
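The matching step can be sketched as follows. The slides do not give the exact formula, so the way text distance and word position are combined here (a normalised edit distance plus a weighted position difference) is an assumption, as are the names `match_nodes` and `position_weight` and the example nodes.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_nodes(nodes_a, nodes_b, position_weight=0.5):
    """Give every node in annotation A its closest node in annotation B.

    Each node is (text, word_position). The cost mixes normalised edit
    distance with normalised position difference; the mix is hypothetical.
    """
    matches = {}
    for id_a, (text_a, pos_a) in nodes_a.items():
        def cost(item):
            _, (text_b, pos_b) = item
            text_cost = levenshtein(text_a, text_b) / max(len(text_a), len(text_b), 1)
            pos_cost = abs(pos_a - pos_b) / max(pos_a, pos_b, 1)
            return text_cost + position_weight * pos_cost
        matches[id_a] = min(nodes_b.items(), key=cost)[0]
    return matches


# Hypothetical nodes from two annotations of the example sentence.
nodes_a = {
    "a1": ("war might erupt in Iraq", 6),
    "a2": ("forces might fall foul of the ICC treaty", 13),
}
nodes_b = {
    "b1": ("should war erupt in Iraq", 6),
    "b2": ("American and British forces might fall foul of the ICC treaty", 11),
}
print(match_nodes(nodes_a, nodes_b))
```

Because every node in A always receives some partner in B, the matching is guaranteed in the sense that no node is left unpaired, even when the two annotators segmented the text differently.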
Metric: Propositional Relations (1/3)
[Figure: argument graphs over numbered nodes for Annotation 1 and Annotation 2]
Metric: Propositional Relations (2/3)
[Figure: argument graphs over numbered nodes for Annotation 1 and Annotation 2]
Metric: Propositional Relations (3/3)
•Pair nodes and check the relation attached
•When the segmentation differs, consider fine-grained and convergent arguments
•All node pairs are considered to give a confusion matrix
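A minimal sketch of the last step, assuming the nodes of both annotations have already been paired and share identifiers. The relation labels, the `none` label for unrelated pairs, and the function name are illustrative assumptions. Ordered pairs are used so that relation direction counts.

```python
from collections import Counter
from itertools import permutations


def relation_confusion(nodes, relations_a, relations_b, none_label="none"):
    """Count (label in annotation 1, label in annotation 2) over all ordered node pairs."""
    counts = Counter()
    for pair in permutations(nodes, 2):
        label_a = relations_a.get(pair, none_label)
        label_b = relations_b.get(pair, none_label)
        counts[(label_a, label_b)] += 1
    return counts


# Hypothetical annotations: keys are (source, target) node pairs.
nodes = [1, 2, 3]
annotation_1 = {(2, 1): "support", (3, 1): "support"}
annotation_2 = {(2, 1): "support", (3, 2): "support"}

confusion = relation_confusion(nodes, annotation_1, annotation_2)
for cell, count in sorted(confusion.items()):
    print(cell, count)
```

Each cell of the resulting matrix can then feed any agreement or performance score that is defined on a confusion matrix.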
Metric: Dialogical Relations (1/3)
Metric: Dialogical Relations (2/3)
Metric: Dialogical Relations (3/3)
•Split calculation into parts
•When the segmentation differs, consider matched pairs
•All node pairs are considered to give a confusion matrix
CASS technique
•Combine scores for the CASS technique
•Applied to any consistent combination of scores
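The slides do not show the combination formula. As one plausible reading, assumed here purely for illustration, the propositional and dialogical scores are averaged and then combined with the segmentation score through a harmonic mean, in the style of an F-score. The function name and the example values are hypothetical.

```python
def combine_scores(segmentation, propositional, dialogical):
    """Harmonic mean of the segmentation score and the mean relation score.

    Assumes all three inputs are similarity scores in [0, 1] computed with
    the same kind of metric; this combination rule is an assumption.
    """
    relation_mean = (propositional + dialogical) / 2
    if segmentation + relation_mean == 0:
        return 0.0  # avoid dividing by zero when every score is 0
    return 2 * segmentation * relation_mean / (segmentation + relation_mean)


# Hypothetical component scores.
print(combine_scores(segmentation=0.8, propositional=0.6, dialogical=0.4))
```

A harmonic mean keeps the overall score low when either the segmentation or the relation agreement is low, so a strong score on one component cannot hide a weak one.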
CASS: Evaluation
•Use CASS – Kappa as it provides an adjustment of the score for chance
•Not the only score that can be used with CASS
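For reference, this is how Cohen's kappa adjusts observed agreement for chance, computed directly from a confusion matrix. The matrix values are hypothetical, and returning 1.0 in the degenerate case is a convention chosen for this sketch.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix given as a list of rows."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals per label.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    if expected == 1:
        return 1.0  # degenerate case: only one label ever used
    return (observed - expected) / (1 - expected)


# Hypothetical 2x2 confusion matrix (rows: annotation 1, columns: annotation 2).
print(cohens_kappa([[20, 5], [10, 15]]))
```

Here the raw agreement is 0.7, but half of that is expected by chance given the marginals, which is why the chance-adjusted score is lower.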
CASS: Extension
•Any metric with a confusion matrix can be applied to CASS
• E.g. Balanced Accuracy, Informedness…
•We provide a select set, but no metric is ruled out
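The two example metrics named above can be read off a binary confusion matrix as follows. This is a sketch with hypothetical counts; the function names and argument order are chosen for this example only.

```python
def balanced_accuracy(tp, fn, fp, tn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def informedness(tp, fn, fp, tn):
    """Sensitivity + specificity - 1; 0 means no better than chance."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Hypothetical counts from a 2x2 confusion matrix.
cells = dict(tp=20, fn=5, fp=10, tn=15)
print(balanced_accuracy(**cells), informedness(**cells))
```

Either value could stand in for the per-component score before the components are combined, provided the same metric is used for every component.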
Automation: AIF (Argument Interchange Format)
•AIF allows us to split calculations into component parts: segmentation, propositional and dialogical
•AIF allows the translation of other representation models to AIF format
•Allows for comparison of corpora in different representations.
•However, the CASS technique is independent of AIF
Thank You.
Find out more at http://arg.tech
Come to COMMA 2016: Conference on Computational Models of Argument (Potsdam)
Investigate the datasets at http://aifdb.org
References
Christian Kirschner, Judith Eckle-Kohler, and Iryna Gurevych. 2015. Linking the thoughts: Analysis of argumentation structures in scientific publications. In Proceedings of the Second Workshop on Argumentation Mining, pages 1–11. Association for Computational Linguistics.
Doug Beeferman, Adam Berger, and John Lafferty. 1999. Statistical models for text segmentation. Machine Learning, 34(1-3):177–210.
Lev Pevzner and Marti A. Hearst. 2002. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1):19–36.
Chris Fournier and Diana Inkpen. 2012. Segmentation similarity and agreement. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 152–161. Association for Computational Linguistics.