
Textual Entailment and Machine Learning for Argument Mining

Serena Villata

Université Côte d’Azur, CNRS, Inria, I3S, France


NLP Techniques for Mining Arguments and their Relations

• Typical argument mining pipeline:

1 Argument extraction: detect arguments within the input natural language texts
    • Support Vector Machines
    • Naive Bayes classifiers
    • Logistic Regression
    • Decision Trees and Random Forests

2 Relation extraction: predict which relations hold between the arguments
    • Textual Entailment
    • Support Vector Machines
    • Context-free grammar parser
    • Naive Bayes classifiers

Serena Villata, TE and Machine Learning for Argument Mining, IJCAI Tutorial, 11.07.2016 2







Applying logistic regression to argument extraction



The case of Twitter


DART - Dataset of Arguments and their Relations on Twitter

Step 1: arguments annotation

• tweets containing an opinion

RT mariofraioli: What will #AppleWatch mean for runners? I can’t speak for everyone, but I won’t be running out to get one. Will you? http:t.coxBpj0HWKPW

• claims expressed as questions

RT GrnEyedMandy: What next Republicans? You going to send North Korea a love letter too? #47Traitors

• tweets containing factual information

RT HeathWallace: You can already buy a fake #AppleWatch in China http:t.coWpHEDqYuUC via cnnnews mr gadget http:t.coWhcMKuMWcd

• amount of world knowledge?

RT SaysSheToday: The Dixie Chicks were attacked just for using 1A right to say they were ashamed of GWB. They didn’t commit treason like the #47Senators

• tweets containing pronouns only? Not arguments.

FakeGhostPirate GameOfThrones He is the one true King after all ;)

• tweets containing an advertisement? Arguments if providing opinions/factual information

RT NewAppleDevice: Apple’s smartwatch can be a games platform and here’s why http:t.couIMGDyw08I

• IAA: three annotators (100 tweets), Krippendorff’s α = 0.74
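The agreement figure above is a Krippendorff's α. As a rough illustration of how such a score can be computed for nominal labels (argument vs. not argument), here is a minimal sketch. It is not the tool used for DART; the function name and the input layout (one list of labels per tweet) are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of label lists, one inner list per annotated item
    (here: one per tweet), holding the label given by each annotator.
    Items with fewer than two labels are ignored.
    """
    coincidences = Counter()  # (label_c, label_k) -> weighted count
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels from two different annotators
        # contributes 1 / (m - 1) to the coincidence matrix.
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    observed = sum(w for (c, k), w in coincidences.items() if c != k) / n
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected
```

With full agreement the observed disagreement is zero and α is 1; values near 0 mean agreement no better than chance.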















Step 2: pairs creation

• identical or almost identical tweet-arguments are pruned to avoid redundancy

• arguments discussing the same topic (or the same aspect of it) are grouped together

• pairs are created within such groups

• manual creation of categories for each topic
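The pairing step can be pictured as follows: once tweet-arguments are grouped, candidate pairs are only built inside each group. This is a minimal sketch under assumptions: the function name, the category names and the tweet ids are hypothetical, and only exact repeats are dropped (near-duplicate pruning is not modelled).

```python
from itertools import combinations


def make_pairs(groups):
    """Return all unordered pairs of tweet ids that share a group.

    `groups` maps a (hypothetical) category name to the list of tweet ids
    assigned to it; pairs are only built inside each group.
    """
    pairs = []
    for category, tweet_ids in groups.items():
        # Drop exact repeats while keeping the original order.
        unique_ids = list(dict.fromkeys(tweet_ids))
        for first, second in combinations(unique_ids, 2):
            pairs.append((category, first, second))
    return pairs
```

A group of n tweet-arguments yields n(n-1)/2 candidate pairs, which is why grouping by topic or aspect keeps the annotation effort manageable.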



Step 3: argument linking

• positive relation: i.e., a support relation in abstract bipolar argumentation (Cayrol and Lagasquie-Schiex, 2005)

Tweet-A: The letter #47Traitors sent to Iran is one of the most plainly stupid things a group of senators has ever done. http://t.co/oEJFlJeXjy

Tweet-B: Republicans Admit: That Iran Letter Was a Dumb Idea http://t.co/Edj57f4nE8. You think?? #47Traitors

• negative relation: i.e., an attack relation in abstract argumentation (Dung, 1995)

Tweet-C: #47Traitors is a joke. Given the definition of treason, it would be on the Obama administration if Iran developed a nuclear bomb.

• unrelated













Data:

• (politics) the letter to Iran written by 47 senators on 10/03/2015 (e.g., #47Traitors, #IranLetter)

• (politics) the referendum in Greece for or against Greece leaving the European Union on 10/07/2015 (e.g., #Grexit, #GreeceCrisis)

• (product release) the release of Apple iWatch on 10/03/2015 (e.g., #AppleWatch, #iWatch)

• (product release) the airing of Episode 4 (Season 5) of the series Game of Thrones on 4/05/2015 (e.g., #GameOfThrones, #GoT)



• annotators: three European students (from Luxembourg, Italy and Germany)

• reconciliation phase: the label assigned by at least 2 annotators out of 3 (majority voting mechanism) was chosen

• IAA: between the expert annotators and the reconciled student annotations on 250 tweets (α_47traitors = 0.81)

Topic              # arg   # not arg   # tot
47 Traitors          768         214     982
Grexit               746         241     987
Apple Watch          623         352     975
Game of Thrones      565         374     939
TOTAL               2702        1181    3883
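The reconciliation rule described above (keep the label given by at least 2 of the 3 annotators) amounts to a majority vote. A minimal sketch, with a hypothetical function name and label strings:

```python
from collections import Counter


def reconcile(labels):
    """Return the label chosen by at least two annotators, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None
```

With three annotators and two possible labels a majority always exists; the `None` branch only matters if more labels or fewer votes are allowed.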


Step 2: pairs creation

• + 2200 argument-tweets on Apple watch (on 9/03/2016)

• categories: features (F), price (P), look (L), buying announcements (B), advertisement (A), predictions on the future of the product (S), news (N), and others (O).

• category features divided into: health, innovation, and battery.

• tweets could be annotated with more than one category

Category    O     A     B     F     L     N    P     S
# tweets    720   175   370   619   205   65   189   112


Step 3: arguments linking

• Two expert annotators annotated ∼600 pairs of tweet-arguments in each of the categories look, price, and health, and 100 pairs in the category predictions

• IAA: 99 pairs (33 pairs from each topic), Krippendorff’s α = 0.67.

Category         Support   Attack   Unknown   Total
# in look             72       30       498     600
# in price           134       44       412     590
# in health          222       31       348     601
# in predictions      18       17        65     100
# TOTAL              446      122      1323    1891


Argument mining pipeline

[Figure: pipeline from the Twitter stream to argumentation graphs. Elements shown: Twitter stream; tweet-arguments vs. not arguments; clusters of tweet-arguments; pairs of tweet-arguments and their relations (supports, attacks); argumentation graphs.]


Logistic regression

• Problem of two-class classification:

The posterior probability of class C_1 can be written as a logistic sigmoid acting on a linear function of the feature vector φ, so that:

\[ p(C_1 \mid \phi) = y(\phi) = \sigma(w^{T} \phi) \]

where p(C_2 | φ) = 1 − p(C_1 | φ). For an M-dimensional feature space φ, this model has M adjustable parameters.

Likelihood function for a dataset {φ_n, t_n}, where t_n ∈ {0, 1} and φ_n = φ(x_n) with n = 1, . . . , N:

\[ p(\mathbf{t} \mid w) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n} \]

where t = (t_1, . . . , t_N)^T and y_n = p(C_1 | φ_n).
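The two formulas above translate almost directly into code. The sketch below uses only the standard library; the function names are chosen for this example, and the plain gradient-descent step is an assumption added for illustration (the slides do not say which optimizer was used).

```python
import math


def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def predict_proba(w, phi):
    """p(C1 | phi) = sigma(w^T phi) for weights w and feature vector phi."""
    return sigmoid(sum(w_i * phi_i for w_i, phi_i in zip(w, phi)))


def negative_log_likelihood(w, features, targets):
    """-ln p(t | w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)]."""
    total = 0.0
    for phi, t in zip(features, targets):
        y = predict_proba(w, phi)
        total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total


def gradient_step(w, features, targets, learning_rate=0.1):
    """One batch gradient-descent step (illustrative assumption).

    The gradient of the negative log-likelihood is sum_n (y_n - t_n) phi_n.
    """
    grad = [0.0] * len(w)
    for phi, t in zip(features, targets):
        error = predict_proba(w, phi) - t
        for i, phi_i in enumerate(phi):
            grad[i] += error * phi_i
    return [w_i - learning_rate * g_i for w_i, g_i in zip(w, grad)]
```

Maximizing the likelihood is the same as minimizing the negative log-likelihood, so each small enough step should lower the value returned by `negative_log_likelihood`.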


Logistic regression

[Figure; source: http://www.appstate.edu/~whiteheadjc/service/logit/]


Argument identification

• classification task: argument vs non argument tweets

• train and validate on the first three topics (3-fold cross-validation with randomized hyperparameter search (Bergstra and Bengio, 2012)), test on the Apple Watch dataset

• tweets tokenized with Twokenize and PoS annotated

• baseline model: logistic regression trained on PoS tags and bigrams as features

Approach                                   Average F1
baseline                                   0.64
baseline + tokens                          0.66
baseline + tokens + bigrams of tokens      0.67

• best model: logistic regression, L2-penalized with λ = 100, all the features and re-training on the 3 folds: F1-score = 0.78
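To make the feature sets in the table concrete, here is a minimal sketch of how PoS-tag and token n-gram counts could be assembled for one tweet. It is an interpretation, not the authors' code: the feature-name prefixes are hypothetical, the baseline is assumed to mean PoS unigrams plus PoS bigrams, and tokenization and tagging are assumed to have been done beforehand by external tools.

```python
from collections import Counter


def ngrams(sequence, n):
    """Return the list of n-grams (as tuples) of a sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def tweet_features(tokens, pos_tags):
    """Count-based features for one tweet.

    `tokens` and `pos_tags` are parallel lists, assumed to come from an
    external tokenizer and PoS tagger (not called here).
    """
    features = Counter()
    for tag in pos_tags:                      # PoS unigrams
        features["pos=" + tag] += 1
    for bigram in ngrams(pos_tags, 2):        # PoS bigrams
        features["pos2=" + "_".join(bigram)] += 1
    for token in tokens:                      # token unigrams
        features["tok=" + token.lower()] += 1
    for bigram in ngrams(tokens, 2):          # token bigrams
        features["tok2=" + "_".join(t.lower() for t in bigram)] += 1
    return features
```

The resulting sparse counts play the role of the feature vector φ fed to the logistic regression model.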





2 Relation prediction: predict which relations hold between the arguments



Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions


Debate issue: The use of cell-phones while driving is a public hazard.

Argument: Calling to say you will be late can reduce stress and make you less inclined to drive aggressively

Argument: Research shows that drivers speaking on a mobile phone have much slower reactions in braking tests than non-users …

[Figure: the two arguments, labelled A1 and A11, linked by an ATTACK relation]

• Textual Entailment: ATTACK relation between A1 and A11

• Abstract Argumentation Theory: Argument A1 is rejected. Argument A11 is accepted.

• Decision Making

Online Debates Platforms


Debatepedia


Textual Entailment

• A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true (Chierchia and McConnell-Ginet, 2001).

• Strict entailment - does not account for some uncertainty allowed in applications.

• “Almost certain” entailments:

t: The technological triumph known as GPS . . . was incubated in the mind of Ivan Getting.

h: Ivan Getting invented the GPS.


Textual Entailment

• Generic framework for capturing major semantic inference needs in NLP applications (Dagan and Glickman, 2004).

• Relation between two textual fragments T and H:

T ⇒ H: meaning of H can be inferred from meaning of T, as interpreted by a typical language user.

T1: Research shows that drivers speaking on a mobile phone have much slower reactions in braking tests than non-users.

H: The use of cell-phones while driving is a public hazard.

T2: Regulation could negate the safety benefits of having a phone in the car. When you’re stuck in traffic, calling to say you’ll be late can reduce stress and make you less inclined to drive aggressively to make up lost time.

H: The use of cell-phones while driving is a public hazard.




Details of the Entailment Strategy

• Preprocessing
    Multiple levels of lexical pre-processing
    Syntactic parsing
    Shallow semantic parsing
    Annotating semantic phenomena

• Representation
    Bag of words, n-grams through tree/graph-based representation
    Logical representations

• Knowledge Sources
    Syntactic mapping rules
    Lexical resources
    Semantic phenomena specific modules
    RTE specific knowledge sources
    Additional corpora/Web resources

• Control Strategy and Decision Making
    Single pass/iterative processing
    Strict vs. parameter-based

• Justification
    What can be said about the decision?


Abstract Argumentation Theory

• Directed graph (Dung, 1995)
    Nodes: abstract arguments
    Edges: attack relation

[Figure: arguments A, B and C linked by ATTACK edges; node labels IN and OUT]

• Bipolar argumentation (Cayrol & Lagasquie-Schiex, 2005), (Boella et al., 2010)

[Figure: arguments A, B and C linked by ATTACK and SUPPORT edges, with an additional ATTACK due to support; node labels OUT, OUT, IN]
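The IN/OUT labels on an attack graph can be computed mechanically. The following is a minimal sketch of a grounded labelling for a Dung-style framework (attack edges only; support is not modelled). The function name and data layout are assumptions for this example: an argument becomes IN once all its attackers are OUT, OUT once some attacker is IN, and stays UNDEC otherwise.

```python
def grounded_labelling(arguments, attacks):
    """Label arguments IN, OUT or UNDEC under grounded semantics.

    `arguments` is a list of argument names and `attacks` a set of
    (attacker, target) pairs over those names.
    """
    attackers = {a: set() for a in arguments}
    for source, target in attacks:
        attackers[target].add(source)

    label = {}
    changed = True
    while changed:  # repeat until no new label can be assigned
        changed = False
        for arg in arguments:
            if arg in label:
                continue
            if all(label.get(b) == "OUT" for b in attackers[arg]):
                label[arg] = "IN"    # unattacked, or every attacker is OUT
                changed = True
            elif any(label.get(b) == "IN" for b in attackers[arg]):
                label[arg] = "OUT"   # defeated by an accepted attacker
                changed = True
    return {a: label.get(a, "UNDEC") for a in arguments}
```

For a chain in which C attacks B and B attacks A, C is unattacked and therefore IN, B is OUT, and A is reinstated as IN.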


Example of Debate

Making Internet a right only benefits society

A2: Internet access is essential now; must be a right. The internet is only that wire that delivers freedom of speech, freedom of assembly, and freedom of the press in a single connection.
A1: Making Internet a right only benefits society.

A3: Internet not as important as real rights. We may think of such trivial things as a fundamental right, but consider the truly impoverished and what is most important to them. The right to vote, the right to liberty and freedom from slavery or the right to elementary education.
A1: Making Internet a right only benefits society.

A4: I’ve seen the growing awareness within the developing world that computers and connectivity matter and can be useful. It’s not that computers matter more than water, food, shelter and health-care, but that the network and PCs can be used to ensure that those other things are available. Satellite imagery sent to a local computer can help villages find fresh water, mobile phones can tell farmers the prices at market so they know when to harvest.
A3: Internet not as important as real rights. We may think of such trivial things as a fundamental right, but consider the truly impoverished and what is most important to them. The right to vote, the right to liberty and freedom from slavery or the right to elementary education.

[Figure: argumentation graph over A1, A2, A3 and A4 with two ATTACK edges and one SUPPORT edge]


Data set: creation of T-H argument pairs

1 Debatepedia main issue as starting argument;

2 each opinion considered as an argument;

3 arguments coupled with:
    1 starting argument, or
    2 other arguments in same discussion to which argument refers;

4 resulting pairs of arguments tagged with appropriate relation.




Case Study: TE and support

T: In 1992 the World Health Organization’s Expert Committee on Drug Dependence (ECDD) undertook a ’prereview’ of coca leaf at its 28th meeting. The 28th ECDD report concluded that, “the coca leaf is appropriately scheduled as a narcotic under the Single Convention on Narcotic Drugs, 1961, since cocaine is readily extractable from the leaf.” This ease of extraction makes coca and cocaine inextricably linked. Therefore, because cocaine is defined as a narcotic, coca must also be defined in this way.

H: Coca can be classified as a narcotic. +SUPP, +ENT

T: Coca is not cocaine. Coca is distinct from cocaine. Coca is a natural leaf with very mild effects when chewed. Cocaine is a highly processed and concentrated drug using derivatives from coca, and therefore should not be considered as a narcotic.

H: Coca in its natural state is not a narcotic. What is absurd about the 1961 convention is that it considers the coca leaf in its natural, unaltered state to be a narcotic. The paste or the concentrate that is extracted from the coca leaf, commonly known as cocaine, is indeed a narcotic, but the plant itself is not. +SUPP, -ENT




Case Study: contradiction and attack

H: Coca can be classified as a narcotic. +ATT, +CONTR

T: Coca chewing is bad for human health. The decision to ban coca chewing fifty years ago was based on a 1950 report elaborated by the UN Commission of Inquiry on the Coca Leaf with a mandate from ECOSOC: “We believe that the daily, inveterate use of coca leaves by chewing is thoroughly noxious and therefore detrimental”

H: Chewing coca offers an energy boost. Coca provides an energy boost for working or for combating fatigue and cold. +ATT, -CONTR


Experimental setting: data set

Training set

Topic                            #arg   #pairs tot.   yes   no
Violent games/aggressiveness       16            15     8    7
China one-child policy             11            10     6    4
Consider coca as a narcotic        15            14     7    7
Child beauty contests              12            11     7    4
Arming Libyan rebels               10             9     4    5
Random alcohol breath tests         8             7     4    3
Osama death photo                  11            10     5    5
Privatizing social security        11            10     5    5
Internet access as a right         15            14     9    5
TOTAL                             109           100    55   45

Test set

Topic                            #arg   #pairs tot.   yes   no
Ground zero mosque                  9             8     3    5
Mandatory military service         11            10     3    7
No fly zone over Libya             11            10     6    4
Airport security profiling          9             8     4    4
Solar energy                       16            15    11    4
Natural gas vehicles               12            11     5    6
Cell phones while driving          11            10     5    5
Marijuana legalization             17            16    10    6
Gay marriage as a right             7             6     4    2
Vegetarianism                       7             6     4    2
TOTAL                             110           100    55   45

• Debatepedia as case study provides us with annotated arguments and casts our task as a yes/no entailment task

• Test set pairs concern completely new topics


Experimental setting: evaluation

• EXCITEMENT Open Platform: an open source software platform containing state-of-the-art algorithms for recognizing textual entailment relations
http://hltfbk.github.io/Excitement-Open-Platform/

• FIRST STEP: TEXTUAL ENTAILMENT
Debatepedia train (160); Debatepedia test (100)

EOP configuration   Acc.   Recall   Precision   F1
BIUTEE              0.71   0.94     0.66        0.78
EditDistanceEDA     0.58   0.61     0.59        0.59

• SECOND STEP: TE + ARGUMENTATION THEORY

Pr: 0.74, Rec: 0.76, F1: 0.75
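The F1 values reported above are the harmonic mean of precision and recall. A minimal sketch of that arithmetic (the function name is chosen for this example):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, `f1_score(0.66, 0.94)` rounds to 0.78 and `f1_score(0.74, 0.76)` rounds to 0.75, matching the BIUTEE and second-step figures. Recomputing from precision and recall that were already rounded to two decimals can differ from a reported F1 in the last digit.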




EDITS learning curve: increasing the number of pairs in the training set → improved system performance


Another application of TE to argument mining


The case of the Wikipedia revision history


Supporting community managers using argument mining

GOAL: Efficient management of wiki pages by community managers and animation of communities

1 How to detect the arguments, and the relationships among them? → TEXTUAL ENTAILMENT

2 How to build the overall graph of the changes and discover the winning arguments? → ARGUMENTATION THEORY

3 How to extract further insightful information? → RDF/SPARQL


Combined Framework

Wikipedia revisions for the article “United States”

T (Wiki12): The land area of the contiguous United States is 2,959,064 square miles (7,663,941 km²).
H (Wiki11): The land area of the contiguous United States is approximately 1,800 million acres (7,300,000 km²)

T (Wiki11): The land area of the contiguous United States is approximately 1,800 million acres (7,300,000 km²)
H (Wiki10): The land area of the contiguous United States is approximately 1.9 billion acres (770 million hectares)

T (Wiki10): The land area of the contiguous United States is approximately 1.9 billion acres (770 million hectares)
H (Wiki09): The total land area of the contiguous United States is approximately 1.9 billion acres.

[Figure: two argumentation graphs, (a) and (b), over the arguments A1 (Wiki09), A2 (Wiki10), A3 (Wiki11) and A4 (Wiki12)]


Experimental setting: evaluation

• EDITS system (Edit Distance Textual Entailment Suite) (Kouylekov and Negri, 2010), off-the-shelf system

Basic configuration: word overlap and cosine similarity algorithms; distance calculated on lemmas; stopword list

• FIRST STEP: TEXTUAL ENTAILMENT

                                  Train                            Test
EDITS configuration   rel   Precision  Recall  Accuracy    Precision  Recall  Accuracy
WordOverlap           yes   0.83       0.82    0.83        0.83       0.82    0.78
                      no    0.76       0.73                0.79       0.82
CosineSimilarity      yes   0.58       0.89    0.63        0.52       0.87    0.58
                      no    0.77       0.37                0.76       0.34

(Accuracy is given once per configuration and data split.)

• SECOND STEP: TE + ARGUMENTATION THEORY (Test)

Configuration       Precision   Recall   F-measure
WordOverlap + AT    0.90        0.92     0.91
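To give an intuition for the word-overlap and cosine-similarity configurations named above, here is a minimal, self-contained sketch of both scores with a yes/no decision on top. It is not the EDITS implementation: it works on lowercased whitespace tokens rather than lemmas, uses no stopword list, and the threshold value is a hypothetical placeholder.

```python
import math
from collections import Counter


def word_overlap(text, hypothesis):
    """Fraction of hypothesis words that also occur in the text."""
    t_words = set(text.lower().split())
    h_words = hypothesis.lower().split()
    if not h_words:
        return 0.0
    return sum(1 for w in h_words if w in t_words) / len(h_words)


def cosine_similarity(text, hypothesis):
    """Cosine of the angle between the two bag-of-words count vectors."""
    t_counts = Counter(text.lower().split())
    h_counts = Counter(hypothesis.lower().split())
    dot = sum(t_counts[w] * h_counts[w] for w in h_counts)
    norm = math.sqrt(sum(c * c for c in t_counts.values())) * math.sqrt(
        sum(c * c for c in h_counts.values())
    )
    return dot / norm if norm else 0.0


def entails(text, hypothesis, threshold=0.8, score=word_overlap):
    """Answer yes (True) when the similarity score reaches the threshold.

    The default threshold is an illustrative assumption, not a tuned value.
    """
    return score(text, hypothesis) >= threshold
```

Scores of this kind suit revision histories, where T and H are often near-identical sentences and a small edit signals a possible non-entailment.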












Thanks for your attention!

NoDE DATASET: http://www-sop.inria.fr/NoDE/

Credits

• Tutorial “Textual Entailment” at ACL 2007 by Ido Dagan, Dan Roth, and Fabio Massimo Zanzotto.

• Christopher M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.


Publications

TEXTUAL ENTAILMENT FOR ARGUMENT MINING

E. Cabrio, S. Villata (2013). A Natural Language Bipolar Argumentation Approach to Support Users in Online Debate Interactions, Argument and Computation.

E. Cabrio, S. Villata and F. Gandon (2013). A Support Framework for Argumentative Discussions Management in the Web, Procs of the Extended Semantic Web Conference (ESWC 2013).

DEEP LEARNING FOR ARGUMENT MINING

T. Bosc, E. Cabrio, S. Villata (2016). DART: a Dataset of Arguments and their Relations on Twitter, Procs of the 10th Language Resources and Evaluation Conference (LREC 2016).

T. Bosc, E. Cabrio, S. Villata (2016). Tweeties Squabbling: Positive and Negative Results in Applying Argument Mining on Social Media, Procs of the 6th International Conference on Computational Models of Argument (COMMA 2016).

OVERVIEW PAPERS ABOUT ARGUMENT MINING

A. Peldszus, M. Stede (2013). From Argument Diagrams to Argumentation Mining in Texts: A Survey. Int. Journal of Cognitive Informatics and Natural Intelligence.

M. Lippi, P. Torroni (2016). Argumentation Mining: State of the Art and Emerging Trends. ACM Trans. Internet Techn.

