Signals from Text: Sentiment, Intent, Emotion, Deception
Stephen Pulman
TheySay Ltd, www.theysay.io and Dept. of Computer Science, Oxford University [email protected]
March 9, 2017
Text Analytics
Overview
Sentiment analysis proper: classify text into positive, negative or neutral.
(Future intent, risk, speculation...)
(Demographic profile: age, gender, politics, religion...)
Deception: can we tell if someone is lying?
Emotion: joy, sadness, fear, anger...
Some example application areas.
What tools do we use?
Linguistically based pattern matching (information extraction), e.g. <PERSON> resign/be-sacked/fired/moved-from <COMPANY-ROLE>
Machine learning methods: train a classifier using annotated data: Naive Bayes, Support Vector Machine, Averaged Perceptron, Neural Networks etc.
Currently fashionable: Deep Learning methods - Convolutional Neural Net, Long Short-Term Memory models etc.
Choice usually depends on the availability of annotated data: expensive and time-consuming to acquire.
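A minimal sketch of the pattern-matching idea, assuming a hand-written regular expression stands in for the <PERSON> and <COMPANY-ROLE> slots. The capitalisation heuristic for names, the list of roles, and the function name `extract_departures` are all illustrative assumptions; a production system would use proper named-entity recognition and far richer patterns.

```python
import re

# Hypothetical, heavily simplified pattern: a capitalised two-word name,
# a resignation/dismissal verb, then a company role.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) "
    r"(?:has |was |is )?"
    r"(?P<event>resigned|resigns|sacked|fired|moved from)"
    r"(?: as| from)? "
    r"(?:the |his |her )?"
    r"(?:role of |post of )?"
    r"(?P<role>CEO|CFO|chairman|chief executive)"
)

def extract_departures(text):
    """Return (person, event, role) triples found in text."""
    return [(m["person"], m["event"], m["role"]) for m in PATTERN.finditer(text)]

print(extract_departures("Jane Smith has resigned as CEO of Acme."))
```

Patterns like this need no annotated training data, which is why they remain attractive when labelled data is scarce, but every new phrasing has to be anticipated by hand.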
What is sentiment analysis?
Sentiment proper
Positive, negative, or neutral attitudes expressed in text:
Suffice to say, Skyfall is one of the best Bonds in the 50-year history of moviedom’s most successful franchise.
Skyfall abounds with bum notes and unfortunate compromises.
There is a breach of MI6. 007 has to catch the rogue agent.
Sometimes, factual statements imply sentiment:
Online giant Amazon’s shares have closed 9.8% higher
Building a sentiment analysis system
Cheap and cheerful
Version 1: collect lists of positive and negative words, classify a text based on proportion of pos/neg.
Version 2: (what most commercial systems do) get a training corpus of texts human-annotated for sentiment (e.g. pos/neg/neut); represent each text as a vector of counts of words or successive pairs of words; train your favourite classifier on these vectors.
Problems:
What if the number of positives = the number of negatives?
bag-of-words means structure is ignored: “Airbus: orders slump but profits rise” is wrongly treated the same as “Airbus: orders rise but profits slump”
Compositional effects will be missed:
clever, too clever, not too clever
bacteria, kill, kill bacteria, fail to kill bacteria, never fail to kill bacteria
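The problems listed above can be reproduced in a few lines. This is a sketch under stated assumptions: the word lists `POSITIVE` and `NEGATIVE` are tiny invented examples, not a real sentiment lexicon, and no classifier is trained; the code only builds the Version 1 score and the Version 2 feature vectors.

```python
from collections import Counter

# Tiny illustrative word lists (assumed, not taken from any real lexicon).
POSITIVE = {"rise", "best", "successful", "clever"}
NEGATIVE = {"slump", "unfortunate", "fail"}

def tokens(text):
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,:;!?").lower() for w in text.split()]

def lexicon_sentiment(text):
    """Version 1: compare counts of positive and negative words."""
    words = tokens(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return "neut"  # ties (including no hits at all) fall back to neutral

def bag_of_words(text):
    """Version 2 features: unordered word counts."""
    return Counter(tokens(text))

def bag_of_bigrams(text):
    """Version 2 features: counts of successive word pairs."""
    words = tokens(text)
    return Counter(zip(words, words[1:]))

a = "Airbus: orders slump but profits rise"
b = "Airbus: orders rise but profits slump"
print(bag_of_words(a) == bag_of_words(b))          # same unigram vector
print(lexicon_sentiment(a), lexicon_sentiment(b))  # tie in both cases
print(bag_of_bigrams(a) == bag_of_bigrams(b))      # word pairs differ
print(lexicon_sentiment("too clever"))             # compositional effect missed
```

The two Airbus headlines get identical unigram vectors and the same tied score, so no classifier trained on those features could tell them apart. Word pairs recover some of the ordering, but "too clever" still comes out positive.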
Version 3
Use linguistic analysis
do as full a parse as possible on input texts
use the syntax to do ‘compositional’ sentiment analysis:
Sentence
  Noun: Our product
  VerbPhrase
    Adverb: never
    VerbPhrase
      Verb: fails
      VerbPhrase
        Verb: to kill
        Noun: bacteria
Sentiment logic rules
kill + negative → positive (kill bacteria)
kill + positive → negative (kill kittens)
kill + neutral → neutral (kill time)
too + anything → negative (too clever, too red, too cheap)
etc. In our system (www.theysay.io) we have 75,000+ rules...
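A toy sketch of how such rules might be applied bottom-up over a parse. It is not the TheySay rule set: the prior polarities in `PRIOR` are invented, the tree is a hand-built nested tuple rather than parser output, and treating "fail" and "never" as polarity reversers in the same way as "kill" is an assumption made to cover the "never fails to kill bacteria" example.

```python
# Assumed prior polarities for a handful of words; hypothetical values.
PRIOR = {"bacteria": "neg", "kittens": "pos", "time": "neut", "clever": "pos"}
FLIP = {"pos": "neg", "neg": "pos", "neut": "neut"}

# Words assumed to reverse the polarity of their argument.
REVERSERS = {"kill", "fail", "never"}

def polarity(tree):
    """Compute polarity bottom-up over a nested (head, argument) tuple."""
    if isinstance(tree, str):
        return PRIOR.get(tree, "neut")   # unknown words count as neutral
    head, arg = tree
    arg_polarity = polarity(arg)
    if head in REVERSERS:
        return FLIP[arg_polarity]        # kill + negative -> positive, etc.
    if head == "too":
        return "neg"                     # too + anything -> negative
    return arg_polarity                  # other heads pass polarity through

print(polarity(("kill", "bacteria")))
print(polarity(("never", ("fail", ("kill", "bacteria")))))
```

Each layer of the tree applies one rule to the result of the layer below, which is how three stacked reversals in "never fail to kill bacteria" end up positive.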
Problems:
still need extra work for context-dependence (‘cold’, ‘wicked’, ‘sick’...)
can’t deal with reader perspective: “Oil prices are down” is good for me, not for investors.
can’t deal with sarcasm or irony: “Oh, great, they want it to run on Windows”
Linguistic characteristics of deceptive vs. truthful text
Several studies have found that we can tell liars from truthtellers:
Liars
Liars tend to use more emotion words, fewer first-person pronouns (“I”, “we”), more negation words, and more motion verbs (“lead”, “go”). (Why?)
Possible that liars exaggerate certainty and positive/negative aspects, and do not associate themselves closely with the content.
Truthtellers
Truthtellers use more self references, exclusive words (“except”, “but”), tentative words (“maybe”, “perhaps”) and time-related words.
Truthtellers are more cautious, accept alternatives, and do associate themselves with the content.
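The word categories above translate naturally into numeric features. The sketch below assumes very small hand-picked word lists (the `CATEGORIES` dictionary is illustrative, not the lexicon used in the studies) and simply measures what share of a text's tokens falls in each category; these rates could then feed any classifier.

```python
import re
from collections import Counter

# Small illustrative word lists, assumed for this sketch only.
CATEGORIES = {
    "first_person": {"i", "we", "me", "my", "our", "us"},
    "negation": {"no", "not", "never", "none"},
    "exclusive": {"except", "but", "without"},
    "tentative": {"maybe", "perhaps", "possibly"},
    "motion": {"lead", "go", "move", "walk"},
}

def category_rates(text):
    """Return each category's share of the tokens in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    total = len(words) or 1   # avoid dividing by zero on empty input
    return {name: counts[name] / total for name in CATEGORIES}

print(category_rates("Maybe we go, but I never walk"))
```

Normalising by text length matters here because the studies compare tendencies ("more", "fewer") rather than raw counts.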
Financial applications
Lying CEOs
Larcker and Zakolyukina (2012) looked at the language used by CEOs and CFOs talking to analysts in conference calls about earnings announcements.
Looking at subsequent events:
discovery of ‘accounting irregularities’
restatements of earnings
changes of accountants
exit of CEO and/or CFO
... you can identify retrospectively who was telling the truth and who was not.
This gives us a corpus of transcripts which can be labelled as ‘true’ or ‘deceptive’: training data.
Detecting deception
Avoid losing money
Training a classifier on features like those just described, Larcker and Zakolyukina were able to get up to 66% prediction accuracy.
Building a portfolio of deceptive companies will lose you 4 to 11% per annum...
Verisimilitude
We (TheySay) have built a general ‘verisimilitude’ classifier which looks out for linguistic indicators of deception...
... and also measures clarity and readability (vs. obfuscation, hedging, etc.)
We’ve tried this on speeches by many politicians, and guess what?!
Emotion detection
Many theories of emotion...
Ekman: anger, disgust, fear, happiness, sadness, surprise.
There seems to be a correspondence with facial expressions, and with emoticons.
Universality of emotion classes?
Whereas Japanese and US subjects agree on classification of expressions of happiness, surprise, and sadness...
... agreement is significantly lower for anger, disgust and fear
Is this emotion taxonomy universal? Possibly not - in Ifaluk, spoken on a small group of islands in the Pacific:
“...fago is felt when someone dies, is needy, is ill, or goes on a voyage...”
So fago looks like sadness so far. But “...fago is also felt when in the presence of someone admirable or when given a gift.”
Nevertheless...
...the basic dimensions of emotion are expressed in all languages, we believe.
Emotion classification is difficult
Data collection
Human annotation usually gives an upper limit on what is possible: initial results for inter-annotator agreement are not encouraging.
But we can bootstrap data collection using self-labelled blog posts (LiveJournal) or social media:
Best day in ages! #Happy :)
Gets so #angry when tutors don’t email back..Do you job idiots!
In Oxford (TheySay) we have used distant supervision and human-annotated data to build a multi-dimensional emotion classifier with acceptable accuracy...
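A minimal sketch of the distant-supervision step, assuming that hashtags and emoticons written by the author can be read as noisy emotion labels. The `MARKERS` mapping and the function name `distant_label` are hypothetical; the slides do not say which markers or filtering rules were actually used.

```python
import re

# Hypothetical mapping from self-labelling markers to emotion classes.
MARKERS = {
    "#happy": "happiness", ":)": "happiness",
    "#angry": "anger",
    "#sad": "sadness", ":(": "sadness",
}

def distant_label(post):
    """Return (label, text_without_markers), or None if no single label applies."""
    lowered = post.lower()
    labels = {emotion for marker, emotion in MARKERS.items() if marker in lowered}
    if len(labels) != 1:
        return None   # unlabelled or conflicting markers: skip the post
    cleaned = post
    for marker in MARKERS:
        # Strip the marker so a classifier cannot simply memorise it.
        cleaned = re.sub(re.escape(marker), "", cleaned, flags=re.IGNORECASE)
    return labels.pop(), " ".join(cleaned.split())

print(distant_label("Best day in ages! #Happy :)"))
```

Removing the markers from the training text is a design choice: otherwise the classifier would learn the hashtag itself and fail on posts that carry no label.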
Text analytics and politics
Predicting election results: can we eliminate opinion polls?
A very mixed picture, to put it mildly:
Tumasjan et al. 2010 claimed that share of volume on Twitter corresponded to distribution of votes between 6 main parties in the 2009 German Federal election. (Volume rather than sentiment is also a better predictor of movie success).
Jungherr et al. 2011 pointed out that not all parties running had been included and that different results are obtained with different time windows. Tumasjan et al. replied, toning down their original claims.
Tjong et al. found that Tweet volume was NOT a good predictor for the Dutch 2011 Senate elections, and that sentiment analysis was better.
Predicting election results
Skoric et al. 2012 found some correlation between Twitter volume and the votes in the 2011 Singapore general election, but not enough to make accurate predictions.
Bermingham and Smeaton 2011 found that share of Tweet volume and proportion of positive Tweets correlated best with the outcome of the Irish General Election of 2011
... but that the mean absolute error was greater than that of traditional opinion polls!
So this doesn’t look very promising, although of course sources other than Twitter might give better results, but they are difficult to get at quickly.
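The volume-based approach discussed in these studies can be sketched as follows, taking the error measure to be the mean absolute difference between predicted and actual shares. The parties, tweet counts, and vote shares are invented toy numbers for illustration only; they do not come from any of the cited elections.

```python
def shares(counts):
    """Convert raw counts per party into fractions summing to 1."""
    total = sum(counts.values())
    return {party: n / total for party, n in counts.items()}

def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual shares."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)

# Invented toy numbers, for illustration only; not from any real election.
tweet_counts = {"A": 500, "B": 300, "C": 200}
vote_share = {"A": 0.40, "B": 0.35, "C": 0.25}

predicted = shares(tweet_counts)
print(mean_absolute_error(predicted, vote_share))
```

Comparing this figure with the same measure computed for opinion polls is what lets a study say whether Twitter volume does better or worse than polling.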
What do the voters think?
We can detect which topics arouse strong emotions among the voters:
Candidate Jones:
           economy  immigration  education  health
fear             8            6          2       1
anger            2            9          1       1
happiness        1            1          5       6
Candidate Smith:
           economy  immigration  education  health
fear             2            7          2       3
anger            1            9          1       1
happiness        7            1          5       3
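Once emotion scores are broken down by topic, questions such as "which topic frightens voters most about this candidate?" become simple lookups. The sketch below uses the scores from the two tables above; the data layout and the function names `strongest_topic` and `dominant_emotion` are assumptions for illustration.

```python
TOPICS = ["economy", "immigration", "education", "health"]

# Scores copied from the two candidate tables above.
SCORES = {
    "Jones": {"fear": [8, 6, 2, 1], "anger": [2, 9, 1, 1], "happiness": [1, 1, 5, 6]},
    "Smith": {"fear": [2, 7, 2, 3], "anger": [1, 9, 1, 1], "happiness": [7, 1, 5, 3]},
}

def strongest_topic(candidate, emotion):
    """Return the topic with the highest score for this candidate and emotion."""
    row = SCORES[candidate][emotion]
    return TOPICS[row.index(max(row))]

def dominant_emotion(candidate, topic):
    """Return the emotion with the highest score for this candidate and topic."""
    col = TOPICS.index(topic)
    return max(SCORES[candidate], key=lambda emotion: SCORES[candidate][emotion][col])

print(strongest_topic("Jones", "fear"))
print(dominant_emotion("Smith", "economy"))
```

On these figures the economy provokes mainly fear for Jones but mainly happiness for Smith, while immigration provokes anger for both.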
Scottish Independence Referendum
Positive sentiment
Levels of fear
Economy: Daily Mirror debate 17th May: John McDonnell, Nigel Farage and Peter Mandelson
Immigration: murder of Remain MP Jo Cox 16th June
Sentiment: all topics
Brexit today
Summary
Text Analytics today
We can detect several different types of signals in text. Performance is nowhere near 100% accurate but improving all the time, and being able to process large volumes of text increases the signal-to-noise ratio.
We know that sentiment has a predictive value. Emotion classification gives a finer-grained analysis of opinion, and more insight and explanation than traditional sentiment analysis. Current research is also finding signals from text that predict intent, risk, deception, etc.
Text is an underused resource: it is (mostly) free, and though it is unstructured, text analytics can derive structured information from it very quickly.
Combining rich text signals (sentiment, deception, risk etc.) with other time series data will provide analysts with important insights into underlying trends.