The CASS Technique for Evaluating the Performance of Argument Mining

Rory Duthie, John Lawrence, Katarzyna Budzynska, Chris Reed

Centre for Argument Technology, University of Dundee

IFiS Polish Academy of Sciences

Outline

Motivation and Aim

• Problems when publishing evaluation and results

• CASS (Combined Argument Similarity Score)

Metric

• How CASS is calculated

Automation

• Deployment of CASS

Motivation

• Consistency for the Argument Mining community

• A metric which does not double-penalise mismatches

• Automated calculations

Motivation: Consistency for the community

From the 2nd Workshop on Argument(ation) Mining:

• Inter-annotator agreement: Cohen’s Kappa (3 papers); percentage agreement (3 papers); precision and recall (2 papers); other methods (3 papers)

• Automatic Argument Mining results: accuracy (4 papers); precision, recall and F-score (5 papers); macro-averaged F-score (1 paper)

• Other metrics in computational linguistics: ROUGE, in text summarization

Motivation: Metric (1/3)

(Kirschner et al., 2015) provides:

• Graph-based approach, APA, weighted average

Problems:

• Segmentation differences

• Propositional content relations only

• Not all nodes in an analysis are considered (Distance < 6)

• Relation direction ignored

• Set metrics

Motivation: Metric (2/3)

CASS extends (Kirschner et al., 2015):

• Segmentation differences

• Propositional content relations and dialogical content relations:

• confusion matrices

• all nodes

• differing segmentation

Motivation: Metric (3/3)

• Use CASS to combine scores

• CASS with any metric

• Annotator agreement and Argument Mining results

• Comparison of analysis in different annotation schemes

Motivation: Automatic Solution

Manual vs. Manual: Cohen’s Kappa, Fleiss’ Kappa…

Manual vs. Automatic: Precision, Recall, F-score, Accuracy…

• Aim of CASS (Combined Argument Similarity Score)

Metric: Segmentation (1/4)

Still, it is possible that, should war erupt in Iraq, American and British forces might fall foul of, for example, the provision of the ICC treaty outlawing attacks on military targets that cause "clearly excessive" harm to civilians.

That is especially so if they do not learn lessons from recent wars and take corrective steps. The weapon most likely to produce such harm is the cluster bomb.

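[Figure: two segmentations of the example text, S1 and S2, each shown as a row of segment sizes. S1: 20 18 29 39 31 18. S2: eight segments (12, 31, 18, 10, 28, 17, 12, 27 as extracted).]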

•Pk - (Beeferman et al., 1999)

•WindowDiff - (Pevzner and Hearst, 2002)

•Segmentation Similarity - (Fournier and Inkpen, 2012)
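To make the first two segmentation metrics concrete, here is a minimal sketch of Pk and WindowDiff. It assumes each segmentation is given as a list of segment sizes (as in the S1/S2 rows above) and that the window size `k` is chosen by the caller; the function names and the toy inputs are illustrative only and are not taken from the CASS tooling.

```python
from itertools import accumulate


def boundaries(masses):
    """Convert segment sizes, e.g. [4, 4], to one 0/1 flag per gap between units."""
    cuts = set(accumulate(masses[:-1]))  # unit counts at which a segment ends
    return [1 if gap in cuts else 0 for gap in range(1, sum(masses))]


def _window_error_rate(ref_masses, hyp_masses, k, disagree):
    """Slide a window spanning k gaps and return the fraction of windows in error."""
    ref, hyp = boundaries(ref_masses), boundaries(hyp_masses)
    if len(ref) != len(hyp):
        raise ValueError("both segmentations must cover the same number of units")
    if not 1 <= k <= len(ref):
        raise ValueError("k must be between 1 and the number of gaps")
    windows = len(ref) - k + 1
    errors = sum(
        disagree(sum(ref[i:i + k]), sum(hyp[i:i + k])) for i in range(windows)
    )
    return errors / windows


def pk(ref_masses, hyp_masses, k):
    """Pk: error when only one segmentation places a boundary inside the window."""
    return _window_error_rate(
        ref_masses, hyp_masses, k, lambda r, h: (r == 0) != (h == 0)
    )


def window_diff(ref_masses, hyp_masses, k):
    """WindowDiff: error when the boundary counts inside the window differ."""
    return _window_error_rate(ref_masses, hyp_masses, k, lambda r, h: r != h)


# Hypothetical toy data: the second annotator adds one extra boundary.
reference = [4, 4]
hypothesis = [4, 1, 3]
print(pk(reference, hypothesis, k=3), window_diff(reference, hypothesis, k=3))
```

On this toy input WindowDiff penalises the extra boundary in more windows than Pk does, which is the behaviour Pevzner and Hearst (2002) set out to obtain.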

Metric: Calculating Relations

•Guaranteed matching formula used for all propositions and locutions

•We use the Levenshtein distance

•Levenshtein distance and word positions are combined to give node matches
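The slides do not give the matching formula, so the following is only a sketch of the idea: each node is a hypothetical `(text, start_word)` pair, and the cost of matching two nodes is assumed to be their Levenshtein distance plus the difference in word position. Every node in the first annotation is given its cheapest counterpart in the second.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_nodes(nodes_a, nodes_b):
    """Map each node index in nodes_a to the lowest-cost node index in nodes_b.

    The cost (edit distance + word-position difference) is an assumption made
    for illustration; it is not the formula used by CASS.
    """
    matches = {}
    for i, (text_a, pos_a) in enumerate(nodes_a):
        costs = [
            levenshtein(text_a, text_b) + abs(pos_a - pos_b)
            for text_b, pos_b in nodes_b
        ]
        matches[i] = costs.index(min(costs))
    return matches


# Hypothetical nodes from two annotations of the same text.
annotation_1 = [("war might erupt in Iraq", 0), ("cluster bombs cause harm", 40)]
annotation_2 = [("cluster bombs cause excessive harm", 41), ("war might erupt", 0)]
print(match_nodes(annotation_1, annotation_2))
```

Because every node receives some match, differently segmented analyses can still be compared; how ties and many-to-one matches are resolved is left open in this sketch.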

Metric: Propositional Relations (1/3)

Annotation 1 Annotation 2

•Pair nodes and check the relation attached

•When there is a differing segmentation, consider fine-grained and convergent arguments

•All node pairs are considered to give a confusion matrix
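A minimal sketch of how considering all node pairs can yield a confusion matrix. It assumes the nodes of the two annotations have already been matched, that each annotation is a dictionary from ordered `(source, target)` pairs to a relation label, and that unrelated pairs count as `"none"`; the label names and the use of ordered pairs are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def relation_confusion(nodes, relations_1, relations_2):
    """Count (label in annotation 1, label in annotation 2) over all ordered node pairs."""
    counts = Counter()
    for pair in permutations(nodes, 2):
        label_1 = relations_1.get(pair, "none")
        label_2 = relations_2.get(pair, "none")
        counts[(label_1, label_2)] += 1
    return counts


# Hypothetical annotations over three matched nodes.
nodes = ["a", "b", "c"]
annotation_1 = {("a", "b"): "inference", ("c", "b"): "conflict"}
annotation_2 = {("a", "b"): "inference", ("b", "c"): "conflict"}
confusion = relation_confusion(nodes, annotation_1, annotation_2)
for cell, count in sorted(confusion.items()):
    print(cell, count)
```

In the toy data the two annotators draw the conflict in opposite directions, so it lands in two off-diagonal cells rather than counting as agreement.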

Metric: Dialogical Relations (1/3)

•Split calculation into parts

•When there is a differing segmentation, consider matched pairs

•All node pairs are considered to give a confusion matrix

CASS technique

•Combine scores for the CASS technique

•Applied to any consistent combination of scores
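The slides do not state the combination rule, so the sketch below makes an assumption: the propositional and dialogical scores are averaged, and that average is combined with the segmentation score by a harmonic mean (in the style of an F-score). The function name `combine` and the example values are hypothetical.

```python
def combine(segmentation, propositional, dialogical):
    """Combine one segmentation score with lists of relation scores.

    Assumed rule: arithmetic mean of the relation scores, then harmonic mean
    of that value and the segmentation score.
    """
    relation_scores = list(propositional) + list(dialogical)
    if not relation_scores:
        raise ValueError("at least one relation score is required")
    mean_relations = sum(relation_scores) / len(relation_scores)
    if mean_relations + segmentation == 0:
        return 0.0
    return 2 * mean_relations * segmentation / (mean_relations + segmentation)


# Hypothetical scores, all of the same kind (e.g. all kappa or all F-score).
print(combine(segmentation=0.8, propositional=[0.6], dialogical=[0.4]))
```

All inputs should be the same kind of score, which is what "consistent combination" requires: mixing, say, a kappa for relations with an accuracy for segmentation would make the result hard to interpret.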

CASS: Evaluation

•Use CASS – Kappa as it provides an adjustment of the score for chance

•Not the only score that can be used with CASS
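As a reminder of what the chance adjustment does, here is a minimal sketch of Cohen's kappa computed from a square confusion matrix (rows: annotation 1, columns: annotation 2). The matrix values are made up for illustration.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix given as a list of rows."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    if expected == 1:
        return 1.0  # both annotations use a single label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 70% raw agreement, 50% expected by chance.
print(cohen_kappa([[20, 5], [10, 15]]))
```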

CASS: Extension

•Any metric with a confusion matrix can be applied to CASS

• E.g. Balanced Accuracy, Informedness…

•We provide a select set but there is no metric ruled out
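For instance, both metrics named above can be read directly off a binary confusion matrix. This is a minimal sketch with made-up counts; the function names are illustrative.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, true negative rate) from binary counts."""
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(tp, fn, fp, tn):
    """Mean of the true positive rate and the true negative rate."""
    tpr, tnr = rates(tp, fn, fp, tn)
    return (tpr + tnr) / 2


def informedness(tp, fn, fp, tn):
    """True positive rate plus true negative rate, minus one."""
    tpr, tnr = rates(tp, fn, fp, tn)
    return tpr + tnr - 1


# Hypothetical counts for one relation type treated as the positive class.
print(balanced_accuracy(tp=40, fn=10, fp=5, tn=45))
print(informedness(tp=40, fn=10, fp=5, tn=45))
```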

• Aim of CASS (Combined Argument Similarity Score)

Automation: AIF (Argument Interchange Format)

•AIF allows us to split calculations into component parts: segmentation, propositional and dialogical

•AIF allows the translation of other representation models to AIF format

•Allows for comparison of corpora in different representations.

•However, the CASS technique is independent of AIF

Automation: AIFdb

http://www.aifdb.org/search

Automation: AIFcorpora

http://corpora.aifdb.org/

Automation: Argument Analytics

http://analytics.arg-tech.org

Thank You.

rory@arg.tech

Find out more at http://arg.tech

Come to COMMA 2016: Conference on Computational Models of Argument (Potsdam)

Investigate the datasets at http://aifdb.org

References

Christian Kirschner, Judith Eckle-Kohler, and Iryna Gurevych. 2015. Linking the thoughts: Analysis of argumentation structures in scientific publications. In Proceedings of the Second Workshop on Argumentation Mining. Association for Computational Linguistics, pages 1–11.

Doug Beeferman, Adam Berger, and John Lafferty. 1999. Statistical models for text segmentation. Machine Learning, 34(1-3):177–210.

Lev Pevzner and Marti A Hearst. 2002. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1):19–36.

Chris Fournier and Diana Inkpen. 2012. Segmentation similarity and agreement. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 152–161. Association for Computational Linguistics.
