Sentiment Analysis + extras
Natural Language Processing: Lecture 10
09.11.2017
Kairit Sirts
Homework 2 – POS tagging

| Score | Name | Features |
|---|---|---|
| 97.3 | Faiz | word +/-2 words; prefixes, suffixes; numeric, cap, all caps, first and last word, hyphen, lower |
| 96.6 | Maksym | word and stem +/-1 stems; prefixes, suffixes, last 2 words; cap, P and N punct; word pos, sent len |
| 96.5 | Olha | word/stem -2 words/stems, +1 word/stem; prefixes, suffixes |
| 95.4 | Ian | word and stem -2 words/stems; previous POS tag |
| 94.5 | Andre | stem +1 word -2 words; prefixes, suffixes; cap, punct |
| 94.1 | Alina | word -1 word, stem -2 stems; prefixes, suffixes; contains num, punct; cap, is punct, is num |
| 93.9 | Liza | word; prefixes, suffixes; num, first, last, 2nd last, alpha, cap, all caps, punct, article (a, an, the) |
| 93.4 | Viacheslav | stem; prefixes, suffixes; characters (by frequency) |
| 93.2 | Aytaj | word -2 words; prefixes, suffixes; cap, all caps, lower, contains . |
| 92.9 | Vladyslav | word/stem -2 words/stems; prefixes, suffixes; cap, punct; word length |
| 89.5 | Artem | stem |
| 89.2 | Yevhen, Ivan, Hendrik | stem |
| 79.4 | Yurii | word, stem; bigram of word and previous word |
Topics
• Sentiment analysis
• Sentiment lexicons
• Extras
  • Topic modeling
  • More about evaluation
    • Cross-validation
    • Precision and recall in multiclass setting
• Slides partially based on:
  • https://web.stanford.edu/~jurafsky/slp3/slides/7_NB.pdf
  • https://lct-master.org/files/MullenSentimentCourseSlides.pdf
  • https://web.stanford.edu/~jurafsky/slp3/slides/21_SentLex.pdf
Sentiment Analysis
Positive or negative movie review
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.
What is sentiment?
• Sentiment = feelings
  • Attitudes
  • Emotions
  • Opinions
• Subjective impressions, not facts
What is sentiment?
• Generally, a binary opposition in opinions is assumed
• For/against, like/dislike, good/bad etc
• Some sentiment analysis jargon:
  • “Semantic orientation”
  • “Polarity”
What is sentiment analysis
• Using NLP/statistics/machine learning methods to extract, identify or otherwise characterize the sentiment content of a text unit
• Also referred to as:
  • Opinion extraction
  • Opinion mining
  • Sentiment mining
  • Subjectivity analysis
Why sentiment analysis?
• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment
• Customer service: is this customer email satisfied or dissatisfied?
• Marketing: how are people responding to this ad/campaign/product release/news item?
Scherer Typology of Affective States
• Emotion: brief organically synchronized … evaluation of a major event
• angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
• cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
• friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
• liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
• nervous, anxious, reckless, morose, hostile, jealous
What is sentiment analysis?
• Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
  • From a set of types
    • Like, love, hate, value, desire, etc.
  • Or (more commonly) simple weighted polarity:
    • positive, negative, neutral, together with strength
4. Text containing the attitude
  • Sentence or entire document
Sentiment analysis
• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target, source, or complex attitude types
Other related tasks
• Information extraction (discarding subjective information)
• Question answering (recognising opinion-oriented questions)
• Summarisation (accounting for multiple viewpoints)
• “Flame” detection
• Identifying child-suitability of videos based on comments
• Bias identification in news sources, fake news
• Identifying (in)appropriate content for ad placement
Applications in Business Intelligence
• Question: “Why aren’t consumers buying our laptop?”
• We know the concrete data: price, specs, competition etc
• We want to know subjective data:
  • “the design is tacky”
  • “customer service was condescending”
• Misperceptions are also important, e.g. “updated drivers aren’t available” (even though they really are)
Challenges in Sentiment Analysis
• People express opinions in complex ways
• In opinion texts, lexical content alone can be misleading
• Negations and topic changes are common, at both the text level and the sentence level
• Rhetorical devices such as sarcasm, irony, implication etc
A letter to a hardware store
“Dear <hardware store>
Yesterday I had occasion to visit <your competitor>. They had an excellent selection, friendly and helpful salespeople, and the lowest prices in town.
You guys suck.
Sincerely,”
Amazon.com 5 star
“The characters are so real and handled so carefully, that being trapped inside the Overlook is no longer just a freaky experience. You run along with them, filled with dread, from all the horrible personifications of evil inside the hotel's awful walls. There were several times where I actually dropped the book and was too scared to pick it back up. Intellectually, you know it's not real. It's just a bunch of letters and words grouped together on pages. Still, whenever I go into the bathroom late at night, I have to pull back the shower curtain just to make sure.”
Amazon.com 1 star
“The original Star Wars trilogy was a defining part of my childhood. Born as I was in 1971, I was just the right age to fall headlong into this amazing new world Lucas created. I was one of those kids that showed up early at toy stores [...] anxiously awaiting each subsequent installment of the series. I'm so glad that by my late 20s, the old thrill had faded, or else I would have been EXTREMELY upset over Episode I: The Phantom Menace... perhaps the biggest let-down in film history.”
Pitchfork.com (0.0 out of 10)
“Ten years on from Exile, Liz has finally managed to achieve what seems to have been her goal ever since the possibility of commercial success first presented itself to her: to release an album that could have just as easily been made by anybody else.”
Amazon.com 1 star
“It took a couple of goes to get into it, but once the story hooked me, I found it difficult to put the book down -- except for those moments when I had to stop and shriek at my friends, "SPARKLY VAMPIRES!" or "VAMPIRE BASEBALL!" or "WHY IS BELLA SO STUPID?" These moments came increasingly often as I reached the climactic chapters, until I simply reached the point where I had to stop and flail around laughing.”
What to classify
• There are many possibilities of what we might want to classify
  • Users
  • Texts
  • Sentences / paragraphs / chunks of texts
  • Predetermined descriptive phrases
    • <ADJ NOUN>, <NOUN NOUN>, <ADV ADJ>, etc
  • Words
  • Tweets
SA as document classification
• Extract features from text
  • Ngrams
  • POS tags, parse structures
  • Negation words, emoticons, exclamation marks etc
  • Distributed features
  • Sentiment lexicons
• Build a classifier
  • Naïve Bayes
  • Logistic regression
  • SVM
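The two steps above (extract features, build a classifier) can be sketched in a few lines. This is only a minimal illustration, assuming whitespace tokenization, unigram and bigram features, and a multinomial Naïve Bayes with add-one smoothing; the function names and the toy training set (built from the example reviews earlier in the lecture) are hypothetical and not part of the slides.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n_max=2):
    """Lowercased unigram and bigram features from whitespace tokens."""
    tokens = text.lower().split()
    feats = list(tokens)
    for n in range(2, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def train_nb(docs):
    """docs: list of (text, label). Returns class counts, feature counts, vocabulary."""
    class_docs = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for f in ngrams(text):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_docs, feat_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-probability under the model."""
    class_docs, feat_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        denom = sum(feat_counts[label].values()) + len(vocab)  # add-one smoothing
        for f in ngrams(text):
            if f in vocab:  # features never seen in training are ignored
                score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (hypothetical), taken from the example reviews above.
TOY_TRAIN = [
    ("unbelievably disappointing", "neg"),
    ("it was pathetic", "neg"),
    ("the greatest screwball comedy ever filmed", "pos"),
    ("some great plot twists", "pos"),
]
MODEL = train_nb(TOY_TRAIN)
```

Calling `predict_nb(MODEL, "great comedy")` picks the positive class on this toy data. Other features from the list (POS tags, lexicon counts) could be appended to the feature list in the same way.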
Sentiment Lexicons
Polarity keywords
• There seems to be some relation between positive words and positive reviews
• Can we come up with a set of keywords by hand to identify polarity?
• Results from (Pang et al., 2002)

| | Proposed word list | Accuracy | Ties |
|---|---|---|---|
| Human 1 | Pos: dazzling, brilliant, phenomenal, excellent, fantastic. Neg: suck, terrible, awful, unwatchable, hideous | 58% | 75% |
| Human 2 | Pos: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting. Neg: bad, cliched, sucks, boring, stupid, slow | 64% | 39% |
| Human 3 + stats | Pos: love, wonderful, best, great, superb, still, beautiful. Neg: bad, worst, stupid, waste, boring, ?, ! | 69% | 16% |
Sentiment/Affective Lexicons
• The General Inquirer
• LIWC - Linguistic Inquiry and Word Count
• MPQA Subjectivity Cues Lexicon
• Bing Liu Opinion Lexicon
• SentiWordNet
The General Inquirer
P. J. Stone et al., 1966. The General Inquirer: A Computer Approach to Content Analysis.
• Home page: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
  • Strong vs Weak, Active vs Passive, Overstated versus Understated
  • Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc
• Free for Research Use
LIWC - Linguistic Inquiry and Word Count
Pennebaker et al., 2007. Linguistic Inquiry and Word Count: LIWC
• Home page: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• Academic version $90, 30-day rental $10
MPQA Subjectivity Cues Lexicon
T. Wilson et al., 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis.
Riloff and Wiebe, 2003. Learning extraction patterns for subjective expressions.
• Home page: http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/
• 6885 words
  • 2718 positive
  • 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL
Bing Liu Opinion Lexicon
M. Hu and B. Liu, 2004. Mining and Summarizing Customer Reviews.
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative
SentiWordNet
S. Baccianella et al., 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.
• Home page: http://sentiwordnet.isti.cnr.it/
• All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
  • [estimable(J,3)] “may be computed or estimated”
    • Pos 0 Neg 0 Obj 1
  • [estimable(J,1)] “deserving of respect or high regard”
    • Pos .75 Neg 0 Obj .25
Semi-supervised Sentiment Lexicon Extraction
• Use a small amount of information
  • A few labeled examples
  • A few hand-built patterns
• Then bootstrap a lexicon
Hatzivassiloglou and McKeown intuition for identifying word polarity
V. Hatzivassiloglou and K. R. McKeown, 1997. Predicting the Semantic Orientation of Adjectives.
• Adjectives conjoined by “and” have same polarity
  • Fair and legitimate, corrupt and brutal
  • *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by “but” do not
  • fair but brutal
Hatzivassiloglou and McKeown 1997: Step 1
• Label seed set of 1336 adjectives (all occurring more than 20 times in a 21-million-word WSJ corpus)
• 657 positive: adequate central clever famous intelligent remarkable reputed sensitive slender thriving…
• 679 negative: contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…
Hatzivassiloglou and McKeown 1997: Step 2
• Expand seed set to conjoined adjectives
  • Example conjoined pairs: nice, helpful; nice, classy
Hatzivassiloglou and McKeown 1997: Step 3
• Supervised classifier assigns “polarity similarity” to each word pair, resulting in graph:
Hatzivassiloglou and McKeown 1997: Step 3
• Clustering for partitioning the graph into two
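The idea behind these steps can be illustrated with a much simpler stand-in. The sketch below is not the method of the paper (which trains a supervised classifier on pair features and then clusters the graph); it only assumes that “and” links mean same polarity and “but” links mean opposite polarity, and it spreads seed labels through the resulting graph by breadth-first search. The adjective list, sentences, and function names are hypothetical.

```python
import re
from collections import defaultdict, deque

def conjunction_links(sentences, adjectives):
    """Return (a, b, same) triples for 'a and b' / 'a but b' over known adjectives."""
    links = []
    pattern = re.compile(r"\b(\w+) (and|but) (\w+)\b")
    for s in sentences:
        for a, conj, b in pattern.findall(s.lower()):
            if a in adjectives and b in adjectives:
                links.append((a, b, conj == "and"))
    return links

def propagate(seeds, links):
    """seeds: dict word -> +1 or -1. Spread labels along links by BFS."""
    graph = defaultdict(list)
    for a, b, same in links:
        sign = 1 if same else -1  # 'and' keeps polarity, 'but' flips it
        graph[a].append((b, sign))
        graph[b].append((a, sign))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbor, sign in graph[word]:
            if neighbor not in polarity:  # first label reached wins
                polarity[neighbor] = polarity[word] * sign
                queue.append(neighbor)
    return polarity

# Hypothetical toy data built from the adjective pairs on the slides.
ADJECTIVES = {"fair", "legitimate", "corrupt", "brutal", "nice", "helpful", "classy"}
SENTENCES = [
    "The ruling was fair and legitimate.",
    "The regime was corrupt and brutal.",
    "The verdict was fair but brutal.",
    "Staff were nice and helpful.",
    "The hotel was nice and classy.",
]
POLARITY = propagate({"fair": 1, "nice": 1}, conjunction_links(SENTENCES, ADJECTIVES))
```

On real corpora the links are noisy and contradictory, which is why a classifier plus clustering is used rather than the hard propagation shown here.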
Output Polarity Lexicon
• Positive
  • bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
• Negative
  • ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…
Turney Algorithm
Turney, 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.
1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases
Extract two-word phrases with adjectives and adverbs

| First word | Second word | Third word (not extracted) |
|---|---|---|
| JJ | NN or NNS | anything |
| RB, RBR, RBS | JJ | not NN nor NNS |
| JJ | JJ | not NN nor NNS |
| NN or NNS | JJ | not NN nor NNS |
| RB, RBR, RBS | VB, VBD, VBN, VBG | anything |
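The tag patterns in the table translate directly into a small matcher. The sketch below assumes the input is already POS-tagged with Penn Treebank tags, given as a list of `(word, tag)` pairs; the function names are hypothetical.

```python
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

def matches(first, second, third):
    """True if the (first, second) tag pair is extracted given the third tag."""
    third_is_noun = third in NOUN
    if first == "JJ" and second in NOUN:
        return True                      # third word can be anything
    if first in ADV and second == "JJ":
        return not third_is_noun
    if first == "JJ" and second == "JJ":
        return not third_is_noun
    if first in NOUN and second == "JJ":
        return not third_is_noun
    if first in ADV and second in VERB:
        return True                      # third word can be anything
    return False

def extract_phrases(tagged):
    """tagged: list of (word, tag). Returns the extracted two-word phrases."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # At the end of the text there is no third word; treat it as "not a noun".
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if matches(t1, t2, t3):
            phrases.append(f"{w1} {w2}")
    return phrases
```

For example, `extract_phrases([("very", "RB"), ("nice", "JJ"), ("hotel", "NN")])` skips "very nice" (the third word is a noun) and keeps "nice hotel".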
How to measure polarity of a phrase?
• Positive phrases co-occur more with “excellent”
• Negative phrases co-occur more with “poor”
• But how to measure co-occurrence?
PMI - Pointwise Mutual Information
• How much more do events x and y co-occur than if they were independent?
• PMI between words: How much more do two words co-occur than if they were independent?
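In its standard form, PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ): it is 0 when x and y are independent, positive when they co-occur more often than chance, and negative when less often. A minimal sketch, assuming the probabilities are estimated from raw counts over `total` observations (the function and parameter names are illustrative):

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (base 2) from co-occurrence counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))
```

For instance, if two words each occur in 100 of 1000 contexts and co-occur in 50, the ratio is 0.05 / 0.01 = 5, so the PMI is log2(5), roughly 2.32 bits. The function assumes all counts are non-zero.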
How to estimate Pointwise Mutual Information?
• Query search engine
  • Turney used Altavista because it had NEAR operator
• P(word) estimated by hits(word)
• P(word1, word2) estimated by hits(word1 NEAR word2)
Does word occur more with “poor” or “excellent”?
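Following Turney (2002), the polarity of a phrase is the difference of two PMI values, Polarity(phrase) = PMI(phrase, “excellent”) − PMI(phrase, “poor”), which with hit-count estimates simplifies to log2( hits(phrase NEAR “excellent”) · hits(“poor”) / (hits(phrase NEAR “poor”) · hits(“excellent”)) ). The sketch below assumes the hit counts have already been obtained; the small `smooth` constant guarding against zero counts and the function names are assumptions of this example.

```python
import math

def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor, smooth=0.01):
    """Polarity of a phrase from search-engine hit counts (base-2 log ratio)."""
    return math.log2(((near_excellent + smooth) * hits_poor) /
                     ((near_poor + smooth) * hits_excellent))

def rate_review(phrase_scores):
    """Rate a review by the average polarity of its phrases (non-empty list assumed)."""
    avg = sum(phrase_scores) / len(phrase_scores)
    return "positive" if avg > 0 else "negative"
```

A phrase seen 80 times near “excellent” and 20 times near “poor” (with equal totals for the two anchor words) gets a score of about +2, and a review whose phrases average above zero is rated positive.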
Phrases from a positive review
Phrases from a negative review
Results of Turney algorithm
• 410 reviews from Epinions
  • 170 (41%) negative
  • 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information
Using WordNet to learn polarity
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions.
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews.
• WordNet: online thesaurus
• Create positive (“good”) and negative seed-words (“terrible”)
• Find Synonyms and Antonyms
  • Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words
  • Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
• Repeat, following chains of synonyms
• Filter
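The expansion loop can be sketched without WordNet itself. The example below uses a tiny hand-made thesaurus (the `TOY_SYNONYMS` and `TOY_ANTONYMS` dictionaries are hypothetical stand-ins for WordNet lookups), and the final filter, which simply drops words that land in both sets, is one assumed choice; the slides do not specify how filtering is done.

```python
# Hypothetical toy thesaurus standing in for WordNet synonym/antonym lookups.
TOY_SYNONYMS = {
    "good": {"well", "fine"},
    "well": {"good"},
    "terrible": {"awful", "dreadful"},
    "awful": {"terrible"},
}
TOY_ANTONYMS = {
    "good": {"evil", "bad"},
    "terrible": {"wonderful"},
}

def expand(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive and negative word sets by following synonym/antonym links."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        new_pos, new_neg = set(), set()
        for w in pos:
            new_pos |= synonyms.get(w, set())   # synonyms keep polarity
            new_neg |= antonyms.get(w, set())   # antonyms flip polarity
        for w in neg:
            new_neg |= synonyms.get(w, set())
            new_pos |= antonyms.get(w, set())
        pos |= new_pos
        neg |= new_neg
    conflict = pos & neg  # assumed filter: discard words found in both sets
    return pos - conflict, neg - conflict

POS, NEG = expand({"good"}, {"terrible"}, TOY_SYNONYMS, TOY_ANTONYMS)
```

With the seeds “good” and “terrible”, the positive set grows to include “well” and “wonderful”, and the negative set picks up “awful” and “evil”.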
Summary on semi-supervised lexicon learning
• Advantages:
  • Can be domain-specific
  • Can be more robust (more words)
• Intuition:
  • Start with a seed set of words (‘good’, ‘poor’)
  • Find other words that have similar polarity:
    • Using “and” and “but”
    • Using words that occur nearby in the same document
    • Using WordNet synonyms and antonyms
Using a lexicon to detect document sentiment
Simplest unsupervised method
• Count the words with positive sentiment in a document
• Count the words with negative sentiment
• Choose whichever label (positive or negative) has the higher count
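A minimal sketch of this counting method. The two word sets are tiny made-up lexicons, the tokenization is a naive whitespace split, and the "tie" label is an assumption, since the slide does not say how to handle equal counts.

```python
# Tiny illustrative lexicons (hypothetical; a real system would load a full sentiment lexicon).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def lexicon_sentiment(text):
    """Label a document by comparing counts of positive and negative lexicon words."""
    tokens = text.lower().split()            # naive whitespace tokenization
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "tie"                             # assumed fallback for equal counts


label = lexicon_sentiment("great plot and good acting but terrible sound")
```

Here the document has two positive words and one negative word, so it is labeled positive.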
![Page 53: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/53.jpg)
Using a lexicon to detect document sentiment
Simplest supervised method
• Build a classifier
• Predict sentiment (or emotion, or personality) given features
• Use “counts of lexicon categories” as features
• Sample features:
  • LIWC category “cognition” had count of 7
  • NRC Emotion category “anticipation” had count of 2
• Baseline
  • Instead use counts of all the words and bigrams in the training set
  • This is hard to beat
  • But only works if the training and test sets are very similar
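The “counts of lexicon categories” features could be extracted roughly as follows. `CATEGORY_LEXICON` is a hypothetical mini-lexicon that only borrows the category names from the slide; the real LIWC and NRC lexicons are far larger and are not reproduced here.

```python
from collections import Counter

# Hypothetical word-to-category mapping, for illustration only.
CATEGORY_LEXICON = {
    "think": ["cognition"],
    "know": ["cognition"],
    "because": ["cognition"],
    "soon": ["anticipation"],
    "expect": ["anticipation", "cognition"],
}
CATEGORIES = ["cognition", "anticipation"]


def category_count_features(text):
    """Return one count per lexicon category, in the fixed order of CATEGORIES."""
    counts = Counter()
    for token in text.lower().split():
        for category in CATEGORY_LEXICON.get(token, []):
            counts[category] += 1
    return [counts[c] for c in CATEGORIES]


features = category_count_features("I think I know what to expect soon")
```

The resulting fixed-length vector (here 3 for cognition, 2 for anticipation) can be passed to any standard classifier in place of, or next to, the word and bigram counts of the baseline.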
![Page 54: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/54.jpg)
Conclusion on Sentiment Analysis
• Sentiment analysis is about detecting the polarity of subjective opinions
• Related tasks, all related to subjective aspects:
  • Emotion detection, emotion intensity detection
  • Irony, sarcasm, humor detection
  • Personality analysis
• In the simplest case SA is a document classification task, trying to predict the binary sentiment: positive or negative
• Task- or domain-specific sentiment lexicons are useful in SA
![Page 55: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/55.jpg)
LDA topic modeling
![Page 56: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/56.jpg)
LDA topic modeling
• LDA – Latent Dirichlet Allocation
• Unsupervised document clustering into topics or themes
• Each document will be represented as a mixture of K topics
• The topic vectors can be used as document representations in further classification tasks
![Page 57: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/57.jpg)
David Blei, 2012. Probabilistic Topic Models
![Page 58: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/58.jpg)
Intuition of LDA
• Each topic is a distribution over words in a fixed vocabulary
  • The genetics topic has words about genetics with high probability
  • The evolutionary biology topic has words about evolutionary biology with high probability
• Each document has a distribution over topics
  • In a two topic situation (genetics and evolutionary biology) a document may be represented for instance with a vector [0.7, 0.3]
  • This is a mixture of topics
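One way to read the [0.7, 0.3] vector: the probability of a word in a document is the topic-weighted average of the per-topic word probabilities. The symbols θ (document-topic weights) and φ (topic-word probabilities) are notation introduced here, and the word probabilities 0.05 and 0.01 in the worked line are made-up numbers for illustration.

```latex
p(w \mid d) = \sum_{k=1}^{K} \theta_{d,k}\, \phi_{k,w}
\qquad \text{e.g. } p(\text{gene} \mid d) = 0.7 \cdot 0.05 + 0.3 \cdot 0.01 = 0.038
```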
![Page 59: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/59.jpg)
Generative story
LDA is a generative model, thus it has a generative story
• For each topic, generate a distribution over words
• For each document, generate a distribution over topics
• For each word position in the document:
  • Sample a topic from the document’s topic distribution
  • Sample a word from that topic’s distribution over words
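The generative story can be simulated directly. This is a sketch using only the standard library; the function names, the symmetric Dirichlet priors `alpha` and `beta`, and the toy vocabulary are assumptions made for illustration.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, num_topics, doc_lengths, alpha=0.5, beta=0.5, seed=0):
    """Follow the LDA generative story; return (documents, topic assignments)."""
    rng = random.Random(seed)
    # For each topic, generate a distribution over words.
    topic_word = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    docs, assignments = [], []
    for length in doc_lengths:
        # For each document, generate a distribution over topics.
        doc_topic = sample_dirichlet([alpha] * num_topics, rng)
        words, topics = [], []
        for _ in range(length):
            # Sample a topic from the document's topic distribution ...
            z = rng.choices(range(num_topics), weights=doc_topic)[0]
            # ... then sample a word from that topic's word distribution.
            w = rng.choices(vocab, weights=topic_word[z])[0]
            topics.append(z)
            words.append(w)
        docs.append(words)
        assignments.append(topics)
    return docs, assignments


vocab = ["gene", "dna", "species", "evolve"]
docs, assignments = generate_corpus(vocab, num_topics=2, doc_lengths=[5, 8])
```

Training reverses this process: only the words are observed, and the topic-word distributions, document-topic distributions and per-word topic indices have to be inferred.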
![Page 60: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/60.jpg)
LDA model – plate diagram
[Figure: plate diagram of the LDA model, with labels for the hyperparameters, topic-word distributions, document-topic distributions, topic index for each word, and words]
David Blei, 2012. Probabilistic Topic Models
![Page 61: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/61.jpg)
Training an LDA model
• Learn the topic-word distributions
• For each document find the document-topic distributions
• Training equals inference because the model is unsupervised
  • By training the model on a text corpus we also obtain the topic vectors for these documents
Two main methods (both are beyond the scope of our course)
• Gibbs sampling
• Variational Bayes
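Although the details are out of scope, a compact sketch of collapsed Gibbs sampling shows why training and inference coincide: the same count tables yield both the topic-word and the document-topic distributions. The function name, the default hyperparameters and the toy corpus are assumptions; this is a teaching sketch, not a replacement for the tools listed later.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]        # topic counts per document
    n_kw = [[0] * V for _ in range(num_topics)]    # word counts per topic
    n_k = [0] * num_topics                         # total tokens per topic
    z = []                                         # topic index for each token
    for d, doc in enumerate(docs):                 # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[v]: (n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)}
        for t in range(num_topics)
    ]
    return doc_topic, topic_word


toy_docs = [
    ["gene", "dna", "gene", "genome"],
    ["species", "evolve", "species", "fossil"],
    ["gene", "dna", "evolve", "species"],
]
doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2)
```

Each row of `doc_topic` is the topic vector of one training document, which is what can later serve as a document representation for classification.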
![Page 62: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/62.jpg)
LDA fit to 17000 articles from Science
David Blei, 2012. Probabilistic Topic Models
![Page 63: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/63.jpg)
Tools for training topic models
• Gensim
• Mallet
• Stanford Topic Modeling Toolkit
• jLDADMM
• R topic modeling package
![Page 64: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/64.jpg)
Evaluation
![Page 65: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/65.jpg)
Evaluation protocols
• Train-dev-test: 80/10/10, 70/20/10, 70/15/15
  • Train on training set
  • Tune hyper-parameters on development set
  • Test the best model on test set
• What if we can’t afford to split the data?
  • Cross-validation
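A minimal sketch of an 80/10/10 split. The function name and the fixed shuffling seed are assumptions; the fractions can be changed to obtain the other ratios listed above.

```python
import random


def train_dev_test_split(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and split them into train, dev and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)    # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_dev = round(len(shuffled) * dev_frac)
    test_set = shuffled[:n_test]
    dev_set = shuffled[n_test:n_test + n_dev]
    train_set = shuffled[n_test + n_dev:]
    return train_set, dev_set, test_set


train_set, dev_set, test_set = train_dev_test_split(range(100))
```

The test set is touched only once, after the hyper-parameters have been chosen on the development set.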
![Page 66: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/66.jpg)
K-fold cross-validation
• Break up data into 5 folds
• For each fold:
  • Choose the fold as a temporary test set
  • Train on the remaining 4 folds, evaluate on the test fold
• Report average performance of the 5 runs
[Figure: 5-fold cross-validation; in each of iterations 1 to 5 a different fold serves as the test set and the remaining folds are used for training]
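The 5-fold procedure can be sketched as an index generator plus an averaging loop. The names `k_fold_indices` and `cross_validate` and the `evaluate` callback are hypothetical; in practice the data would be shuffled before the folds are cut.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    # Fold sizes differ by at most one when n_items is not divisible by k.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(n_items, evaluate, k=5):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(train_idx, test_idx)
              for train_idx, test_idx in k_fold_indices(n_items, k)]
    return sum(scores) / len(scores)


cv_folds = list(k_fold_indices(10, k=5))
```

With 10 items and 5 folds, every item lands in exactly one test fold of size 2, and each model is trained on the other 8 items.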
![Page 67: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/67.jpg)
Cross-validation
• Most commonly used: 10-fold CV and 5-fold CV
  • Extreme case is LOOCV – leave-one-out cross-validation
• Stratified K-fold CV – the label distribution is preserved in each fold.
  • If the dataset contains 10 positive items and 30 negative items and the data is divided into 5 folds, then how many positive and how many negative items will be in each fold?
• The performance on different folds can vary a lot, especially when folds are small
• If possible, separate the test set and tune the hyperparameters using CV on the train set
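The stratification question can be checked with a small sketch, reading the example as 10 positive and 30 negative items. Dealing the items of each class round-robin across the folds is one simple way to preserve the label distribution; the function name is an assumption.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign item indices to k folds so that each fold keeps the label proportions."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        # Deal the items of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


labels = ["pos"] * 10 + ["neg"] * 30
strat = stratified_folds(labels, k=5)
per_fold = [
    (sum(labels[i] == "pos" for i in fold), sum(labels[i] == "neg" for i in fold))
    for fold in strat
]
# per_fold == [(2, 6), (2, 6), (2, 6), (2, 6), (2, 6)]
```

Each fold therefore holds 2 positive and 6 negative items, the same 1:3 ratio as the full dataset.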
![Page 68: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/68.jpg)
Aggregating CV results
• Micro-averaging
  • Sum TP, FP and FN counts from all folds and then compute precision, recall and F-score
• Macro-averaging
  • Compute precision, recall and F-score for each fold and then average.
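The two aggregation schemes differ only in where the division happens. In this sketch the per-fold (TP, FP, FN) counts are invented numbers, and returning 0.0 for an empty denominator is an assumed convention.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when a denominator is zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_average(fold_counts):
    """Sum TP, FP and FN over folds, then compute the scores once."""
    tp = sum(c[0] for c in fold_counts)
    fp = sum(c[1] for c in fold_counts)
    fn = sum(c[2] for c in fold_counts)
    return prf(tp, fp, fn)


def macro_average(fold_counts):
    """Compute the scores per fold, then average them."""
    per_fold_scores = [prf(*c) for c in fold_counts]
    return tuple(sum(scores) / len(per_fold_scores) for scores in zip(*per_fold_scores))


# Hypothetical (TP, FP, FN) counts for three folds.
fold_counts = [(8, 2, 2), (1, 1, 3), (6, 0, 4)]
micro = micro_average(fold_counts)
macro = macro_average(fold_counts)
```

Micro-averaging lets folds with many positive items dominate, while macro-averaging gives every fold equal weight, so a small noisy fold (the second one here) pulls the macro scores down more.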
![Page 69: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/69.jpg)
Precision and recall in multiclass classification
• Contingency table
Figure 6.5: https://web.stanford.edu/~jurafsky/slp3/6.pdf
![Page 70: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/70.jpg)
Aggregating precision and recall
Figure 6.6: https://web.stanford.edu/~jurafsky/slp3/6.pdf
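For the multiclass case, per-class counts can be read off a confusion matrix and then aggregated over classes. The matrix below is invented for illustration and is not the one shown in the cited figures; the function names are assumptions.

```python
def per_class_counts(confusion, labels):
    """Derive (TP, FP, FN) for each class from a confusion matrix.

    confusion[gold][predicted] is the number of items with that gold label
    that received that predicted label.
    """
    counts = {}
    for c in labels:
        tp = confusion[c][c]
        fp = sum(confusion[g][c] for g in labels if g != c)   # predicted c, gold differs
        fn = sum(confusion[c][p] for p in labels if p != c)   # gold c, predicted differs
        counts[c] = (tp, fp, fn)
    return counts


def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts (0.0 when a denominator is zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical three-class confusion matrix: confusion[gold][predicted].
class_labels = ["pos", "neg", "neu"]
confusion = {
    "pos": {"pos": 5, "neg": 1, "neu": 0},
    "neg": {"pos": 2, "neg": 6, "neu": 2},
    "neu": {"pos": 1, "neg": 1, "neu": 7},
}

class_counts = per_class_counts(confusion, class_labels)
per_class = {c: precision_recall(*class_counts[c]) for c in class_labels}

# Macro-average: mean of the per-class precisions.
macro_precision = sum(p for p, _ in per_class.values()) / len(class_labels)

# Micro-average: pool the counts over classes, then compute precision and recall once.
pooled = [sum(values) for values in zip(*class_counts.values())]
micro_precision, micro_recall = precision_recall(*pooled)
```

When every item has exactly one gold and one predicted label, the pooled FP and FN totals are equal, so micro-averaged precision and recall both equal the overall accuracy (18 of 25 here).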
![Page 71: Sentiment Analysis + extras - courses.cs.ut.ee · •Using NLP/statistics/machine learning methods to extract, ... •Sentiment analysis is the detection of attitudes ... Semi-supervised](https://reader036.vdocuments.us/reader036/viewer/2022081505/5b4075597f8b9af6438d4aac/html5/thumbnails/71.jpg)
Conclusion for Extras
• LDA topic modeling – unsupervised method for clustering documents based on themes or topics
• It can provide useful features for document classification
• Use cross-validation when you can’t afford to split the data into train/dev/test
• Split data into train/test and perform CV with train set to choose features and hyperparameters
• With multiple classes evaluate precision and recall for every class and then aggregate with micro- or macro-averaging