“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI, FACULTY OF COMPUTER SCIENCE
Natural Language Engineering
Daniela GÎFU
http://profs.info.uaic.ro/~daniela.gifu/
Sentiment Analysis
Laboratory 7
What is Sentiment Analysis?
IMPACT OF TOPIC
Sentiment Analysis (SA) - one of the most active topics in NLP.
SA - makes it possible to monitor, identify and understand, in real time, consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.
SA - very popular in social media.
- Target: academia and industry.
08.05.2012
IMPACT IN SOCIAL MEDIA
Social media deals with personal and socially related opinions.
SA - plays a vital role in understanding the opinions expressed in conversations, posts, blogs, etc., and in deriving a short, sensible summary of the most relevant ones.
SA - helps to:
• make quick decisions
• change the strategy and tactics used
• understand the mood of the market
• keep up with changing trends
• improve one’s product
VALIDITY OF SA
- evaluated by comparing the sentiment scores of specific comments with their respective star ratings, a common cue that readers use to filter what they read when looking for information.
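One possible way to make this comparison concrete is to correlate the two series. The sketch below computes a Pearson correlation coefficient; the choice of Pearson's r and the sample numbers are assumptions made for illustration, not something the slides prescribe.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one sentiment score in [-1, +1] and one star rating per comment.
sentiment_scores = [0.9, 0.4, 0.0, -0.5, -0.8]
star_ratings = [5, 4, 3, 2, 1]

r = pearson(sentiment_scores, star_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 would suggest that the sentiment scores track the star ratings closely; a rank correlation could be used instead if only the ordering matters.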
RESEARCH QUESTIONS...
• How comparable are sentiment scores for reviews/comments to their respective star ratings?
• How do sentiment scores impact decision outcomes?
PURPOSE AND MOTIVATION
- to enhance the results of context-based SA.
- to clarify the behavior of the receiver (the reader), who is affected by the multitude of information on forums, etc.
- to improve the performance of SA classifiers based on two approaches (machine learning & lexicon).
SA is similar to…
SA – terminology:
- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson, 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor, 2014].
SA - the process of detecting the contextual polarity of a text.
Sentiment classification techniques
Fig. 1 Sentiment classification techniques
SA levels – document (1)
Positive Negative Neutral
Fig. 2 Supervised learning – for three classes
a) supervised approach
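As an illustration of the supervised approach with three classes, here is a minimal multinomial Naive Bayes classifier written from scratch. It is a sketch under stated assumptions: the training sentences are invented, tokenization is plain whitespace splitting, and the class `NaiveBayes` is a hypothetical helper, not the NLTK implementation.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocabulary:  # words unseen in training are ignored
                    score += log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training set, two documents per class.
texts = [
    "great phone love it",
    "excellent battery and great screen",
    "terrible service hate it",
    "awful battery and terrible screen",
    "the phone has a screen",
    "it comes with a battery",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

classifier = NaiveBayes().fit(texts, labels)
print(classifier.predict("great battery"))
print(classifier.predict("terrible service"))
```

With a realistic corpus, the same interface would be trained on many labelled documents and evaluated on held-out data.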
SA levels – document (2)
Fig. 2 Python NLTK Demos for Natural Language Text Processing
a) supervised approach
http://text-processing.com/demo/
SA levels – document (3)
b) unsupervised approach
Based on determining the semantic orientation (SO) of specific words/phrases.
1. Sentiment lexicon (words/expressions) – [Taboada et al., 2011]
2. Set of predefined POS patterns – [Turney, 2002]
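To make the notion of semantic orientation more tangible, the sketch below computes an SO score in the style of the PMI-based measure associated with [Turney, 2002]: the phrase's co-occurrence with a positive reference word ("excellent") is compared with its co-occurrence with a negative one ("poor"). The counts are hypothetical, and the small smoothing constant is an assumption added to avoid division by zero.

```python
from math import log2


def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """SO = log2( hits(phrase NEAR pos) * hits(neg) / (hits(phrase NEAR neg) * hits(pos)) ).

    near_pos / near_neg: co-occurrence counts of the phrase with the
    positive / negative reference word; hits_pos / hits_neg: total counts
    of the reference words themselves (must be > 0).
    """
    return log2(((near_pos + smoothing) * hits_neg) /
                ((near_neg + smoothing) * hits_pos))


# Hypothetical corpus counts for one extracted phrase.
so = semantic_orientation(near_pos=120, near_neg=15, hits_pos=5000, hits_neg=4000)
print(f"SO = {so:.2f}")  # positive value: the phrase leans towards 'excellent'
```

A document can then be labelled by averaging the SO values of the phrases extracted from it.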
SA levels – clause/sentence
More complex – identifying whether a sentence is opinionated and establishing the nature of the opinion;
- using supervised methods;
1. classifying clauses into two classes [Yu and Hatzivassiloglou, 2003]
2. an approach based on minimum cuts [Pang and Lee, 2004]
The problem: how can we classify questions, sarcasm, metaphor, humor, etc.?
SA levels – features
- several entities for each analyzed text, or several attributes for each entity;
- extraction of the attributes of an object;
Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)
- extract and store all NPs;
- keep only NPs with a frequency above an experimentally learned threshold [Hu and Liu, 2004]
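The last two steps can be sketched as a simple frequency filter. The example assumes that the NPs have already been extracted for each review (for instance by a chunker), counts each NP once per review, and uses an invented threshold; the function name `frequent_features` is hypothetical.

```python
from collections import Counter


def frequent_features(noun_phrases_per_review, min_support):
    """Keep the NPs that occur in at least `min_support` reviews."""
    counts = Counter()
    for noun_phrases in noun_phrases_per_review:
        # Lower-case and de-duplicate so each NP counts once per review.
        counts.update({np.lower() for np in noun_phrases})
    return {np: n for np, n in counts.items() if n >= min_support}


# Hypothetical NPs extracted from three product reviews.
reviews = [
    ["battery", "screen"],
    ["battery", "price"],
    ["Battery", "screen", "strap"],
]

print(frequent_features(reviews, min_support=2))
```

In practice the threshold would be tuned experimentally, as the slide indicates.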
SA levels – comparative
- When a user doesn’t offer a direct opinion about a product. [Jindal and Liu, 2006]
Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.)
- comparative adjectives and adverbs: mai mult, mai puţin (En. - more, less)
- superlative adjectives and adverbs: mai, cel puţin (En. - more, at least)
- additional clauses: decât, împotriva (En. - rather than, against).
These markers cover 98% of the comparative opinions.
SA levels – sentiment lexicon (1)
a) manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004]
Our work: AnaDiP-2010, inspired by LIWC-2007 [Pennebaker et al., 2001]: 9 emotional classes.
<classes>
<class name="emotional" id="1"/>
<class name="positive" id="2" parent="1"/>
<class name="negative" id="3" parent="1"/>
<class name="anxiety" id="4" parent="3"/>
<class name="anger" id="5" parent="3"/>
<class name="sadness" id="6" parent="3"/>
<class name="spectacular" id="7" parent="2"/>
<class name="firmness" id="8" parent="2"/>
<class name="moderation" id="9" parent="2"/>
</classes>
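The hierarchy above can be loaded and traversed with a few lines of code. The following is a minimal sketch that uses Python's standard `xml.etree.ElementTree`; the helper names `load_classes` and `path_to_root` are hypothetical, and this is not the AnaDiP implementation.

```python
import xml.etree.ElementTree as ET

CLASSES_XML = """<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>"""


def load_classes(xml_text):
    """Return two dicts keyed by class id: id -> name and id -> parent id."""
    root = ET.fromstring(xml_text)
    names, parents = {}, {}
    for node in root.iter("class"):
        class_id = node.get("id")
        names[class_id] = node.get("name")
        parents[class_id] = node.get("parent")  # None for the root class
    return names, parents


def path_to_root(class_id, names, parents):
    """List the class names from `class_id` up to the root of the hierarchy."""
    path = []
    while class_id is not None:
        path.append(names[class_id])
        class_id = parents[class_id]
    return path


names, parents = load_classes(CLASSES_XML)
print(path_to_root("6", names, parents))
```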
SA levels – sentiment lexicon (2)
Our software performs part-of-speech (POS) tagging and lemmatization of words. For example:
<lexic name="Politic" lang="ro">
<word lemma="clevetitor" classes="1,3,6"/>
<word lemma="genial" classes="1,2,7"/>
…
</lexic>
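A lexicon in this format can be used to count how often each emotional class occurs in a lemmatized text. The sketch below assumes that lemmatization has already been done and that the `classes` attribute is a comma-separated list of class ids; the helper names and the sample lemma list are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

LEXICON_XML = """<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>"""


def load_lexicon(xml_text):
    """Map each lemma to the list of class ids it belongs to."""
    root = ET.fromstring(xml_text)
    return {w.get("lemma"): w.get("classes").split(",") for w in root.iter("word")}


def class_frequencies(lemmas, lexicon):
    """Count class occurrences over a sequence of lemmas."""
    counts = Counter()
    for lemma in lemmas:
        counts.update(lexicon.get(lemma, []))  # lemmas outside the lexicon add nothing
    return counts


lexicon = load_lexicon(LEXICON_XML)
# Hypothetical lemmatized input.
lemmas = ["un", "discurs", "genial", "și", "un", "adversar", "clevetitor"]
frequencies = class_frequencies(lemmas, lexicon)
print(dict(frequencies))
```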
SA levels – sentiment lexicon (3)
b) corpus-based approaches – a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents from a single domain.
- a classical work [Hatzivassiloglou and McKeown, 1997], using a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, neither/nor, either).
Examples:
bărbat puternic şi armonios / bărbat puternic şi armonios (En. - strong and harmonious man)
femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - sensual or intelligent woman? / poor or wealthy woman?)
băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)
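The intuition behind the connector-based approach can be sketched as label propagation: adjectives joined by a connector that links words of the same orientation inherit the polarity of an already known seed word. This is a deliberately simplified illustration, not the algorithm of [Hatzivassiloglou and McKeown, 1997]; the seed words and the extracted pairs are assumptions, and connectors that signal opposite orientation are not modelled.

```python
from collections import deque


def propagate_polarity(seeds, same_orientation_pairs):
    """Spread seed polarities (+1 / -1) across adjectives linked by a connector."""
    neighbours = {}
    for a, b in same_orientation_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other in neighbours.get(word, ()):
            if other not in polarity:  # first label wins; conflicts are not resolved
                polarity[other] = polarity[word]
                queue.append(other)
    return polarity


# Hypothetical seeds and adjective pairs observed around same-orientation connectors.
seeds = {"puternic": 1, "prost": -1}
pairs = [("puternic", "armonios"), ("prost", "urât"), ("urât", "odios")]

print(propagate_polarity(seeds, pairs))
```

A fuller treatment would also use the connectors that tend to join adjectives of opposite orientation and would resolve conflicting evidence statistically.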
Applications – business and government (1)
“Why aren’t consumers buying our laptop?”, when the price is good and the weight clearly matches consumers’ wishes. [Lee, 2004]
Two kinds of answers:
- subjective reasons about intangible qualities (e.g. the physical keyboard is tacky), or
- misperceptions (which matter even though they are wrong).
Solution: by tracking consumers’ opinions, one can predict trends in sales, etc. [Mishne & Glance, 2006].
Applications – business and government (2)
Solution based on a dictionary + the semantic role of negations and pragmatic connectors:
- classification of emotionally charged words into two classes: positive and negative (plus a neutral class);
- more classes, associating each word with a value in the range -5 to +5;
- [Gîfu and Cristea, 2012a]: a scale over the interval -3 to +3;
- [Gîfu and Scutelnicu, 2013]: a scale of values from -1 to +1.
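A dictionary-based scorer with a simple treatment of negation might look like the sketch below. It assumes word values on the -1 to +1 scale, lets a negation flip the next scored word, and averages the resulting values; the lexicon entries, the negation list, and the averaging rule are illustrative assumptions, and pragmatic connectors are not modelled. English lemmas are used only for readability.

```python
NEGATIONS = {"not", "no", "never"}

# Hypothetical lexicon with values on the -1 to +1 scale.
LEXICON = {"good": 0.6, "nice": 0.4, "horrible": -1.0}


def score_text(lemmas, lexicon=LEXICON, negations=NEGATIONS):
    """Average valence of the scored words; a negation flips the next scored word."""
    total, hits, flip = 0.0, 0, False
    for lemma in lemmas:
        if lemma in negations:
            flip = True  # stays active until the next word found in the lexicon
            continue
        if lemma in lexicon:
            value = lexicon[lemma]
            total += -value if flip else value
            hits += 1
            flip = False
    return total / hits if hits else 0.0


print(score_text(["the", "film", "be", "not", "good"]))
print(score_text(["nice", "and", "good"]))
```

Because the result is an average of values in [-1, +1], it stays inside the same interval.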
Process phases: POS-tagger & NER & Anaphora Resolution (1)
<DOCUMENT>
<P ID="1">
<S ID="1">
<W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative"
Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative"
offset="0"></W>
<NP HEADID="11.2" ID="0" ref="0">
<W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr"
Number="singular" POS="PRONOUN" Person="third" Type="negative"
offset="1">Nimic</W>
<W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
<W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios"
MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
<W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
<W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
<W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
<W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine"
ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE"
offset="29">decât</W>
</NP>
<NP HEADID="11.9" ID="1" ref="1">
<W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof"
MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
<NP HEADID="11.10" ID="2" ref="2">
<W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport"
MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
<W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
<NP HEADID="11.12" ID="3" ref="3">
<W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12"
LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common"
offset="53">platformă</W>
</NP>
</NP>
</NP>
</S>
</P>
</DOCUMENT>
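Annotated output of this kind can be read back with a standard XML parser. The sketch below works on a shortened, well-formed fragment modelled on the sample above and extracts (word form, lemma, POS) triples together with the head lemma of each NP; the helper names are hypothetical, and the real format may carry more attributes than are used here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<DOCUMENT><P ID="1"><S ID="1">
<NP HEADID="11.9" ID="1" ref="1">
<W ID="11.9" LEMMA="pantof" MSD="Ncmpry" POS="NOUN" offset="35">pantofii</W>
<NP HEADID="11.10" ID="2" ref="2">
<W ID="11.10" LEMMA="sport" MSD="Ncmsrn" POS="NOUN" offset="44">sport</W>
</NP>
</NP>
</S></P></DOCUMENT>"""


def tokens(xml_text):
    """Return (form, lemma, POS) for every <W> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(w.text or "", w.get("LEMMA"), w.get("POS")) for w in root.iter("W")]


def np_heads(xml_text):
    """Return the lemma of the head word of every <NP>, resolved through HEADID."""
    root = ET.fromstring(xml_text)
    words = {w.get("ID"): w for w in root.iter("W")}
    return [words[np.get("HEADID")].get("LEMMA")
            for np in root.iter("NP") if np.get("HEADID") in words]


print(tokens(SAMPLE))
print(np_heads(SAMPLE))
```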
Process phases: POS-tagger & NER & Anaphora Resolution (2)
Fig. 3 The interface of the EAT system
SA - Rules
- 46 rules for values.
<rule>
<word attribute="LEMMA" value="cel"/>
<word attribute="LEMMA" value="mai"/>
<word attribute="POS" value="ADJECTIVE"/>
</rule>
Ex: cel mai bun (En. - the best)
<rule>
<word attribute="LEMMA" value="cel"/>
<word attribute="LEMMA" value="mai"/>
<word attribute="LEMMA" value="bun"/>
</rule>
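A rule of this shape can be read as a sequence of attribute/value constraints, one per consecutive token. The sketch below shows one way such a rule could be matched against annotated tokens; the token dictionaries, their POS values, and the function names are assumptions, and how a matched rule changes the sentiment value is not covered.

```python
def matches_at(tokens, start, rule):
    """True if each (attribute, value) pair of the rule matches the token at that offset."""
    if start + len(rule) > len(tokens):
        return False
    return all(tokens[start + i].get(attribute) == value
               for i, (attribute, value) in enumerate(rule))


def find_matches(tokens, rule):
    """Return every start index at which the rule matches."""
    return [i for i in range(len(tokens)) if matches_at(tokens, i, rule)]


# The superlative rule: 'cel' + 'mai' + any adjective.
SUPERLATIVE = [("LEMMA", "cel"), ("LEMMA", "mai"), ("POS", "ADJECTIVE")]

# Hypothetical annotation of "este cel mai bun film" (En. - it is the best film).
sentence = [
    {"LEMMA": "fi", "POS": "VERB"},
    {"LEMMA": "cel", "POS": "DETERMINER"},
    {"LEMMA": "mai", "POS": "ADVERB"},
    {"LEMMA": "bun", "POS": "ADJECTIVE"},
    {"LEMMA": "film", "POS": "NOUN"},
]

print(find_matches(sentence, SUPERLATIVE))
```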
Applications – review sites
- to assess the reviews and ratings about your company or yourself;
- to summarize reviews.
Our work: the consumer’s behaviour, civic identity [Gîfu et al., 2013]
6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter.
- we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.
Applications – politics/sociology
Two dimensions in politics:
1. to know what voters think about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008];
2. to clarify the politicians’ positions, in order to enhance the quality of the information that voters have access to [Bansal et al., 2008; Gîfu, 2013b].
In sociology:
- how ideas and innovations are propagated [Rosen, 1974]
Ex: the polls on different issues
CONCLUSIONS AND DISCUSSIONS
SA - a complex task;
SA - an emerging discipline with promising academic and, most importantly, industrial applications;
.... the sentiment classification problem - more challenging
Final project: SEMEVAL 2018/2019
Lab. 7 SA:
- NLTK (Naïve Bayes Classifier): https://www.nltk.org/_modules/nltk/classify/positivenaivebayes.html
- TextBlob (performs different NLP tasks: POS tagging, NP extraction, SA, etc.): https://textblob.readthedocs.io/en/dev/index.html
Methodology:
1. Categorize each text/document (e.g. tweets) into a specific class: positive, negative, neutral;
2. Add new instances <pos>…</pos>, <neg>…</neg>, etc., according to the classifier used… in your XML/JSON file for each common noun.
3. Add new statistics over the SEMEVAL corpus for the SA task
Statistics over the SEMEVAL data set
Counted elements Value
# sentences
# tokens (punctuation included)
# tokens (punctuation excluded)
# entities
# person
# location
….
# sentiments
# positive (single class or using other subclasses)
# negative (single class or using other subclasses)
# neutral
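Several rows of this table can be filled in automatically. The sketch below counts sentences, tokens with and without punctuation, and sentiment labels; it assumes the corpus is available as (sentence, label) pairs and uses a naive regular-expression tokenizer, so forms such as "don't" are split into several tokens. Entity counts are left out because they depend on the NER annotation.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")        # runs of letters, digits and underscores
PUNCT = re.compile(r"[^\w\s]")   # every other non-space character counts as punctuation


def corpus_statistics(labelled_sentences):
    """Count sentences, tokens and labels over (sentence, label) pairs."""
    stats = Counter()
    for sentence, label in labelled_sentences:
        n_words = len(WORD.findall(sentence))
        n_punct = len(PUNCT.findall(sentence))
        stats["sentences"] += 1
        stats["tokens (punctuation included)"] += n_words + n_punct
        stats["tokens (punctuation excluded)"] += n_words
        stats[label] += 1
    return stats


# Hypothetical labelled sentences standing in for the SEMEVAL data.
data = [
    ("I love this phone!", "positive"),
    ("The battery is awful.", "negative"),
    ("It ships on Monday.", "neutral"),
]

for name, value in corpus_statistics(data).items():
    print(f"# {name}: {value}")
```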
Thank you for your attention!
?