Rule-based approach to sentiment analysis at ROMIP’11 (slides)
DESCRIPTION
Slides for presentation at Dialogue'12. API (free access for devs available): https://mashape.com/dmitrykey/russiansentimentanalyzer
TRANSCRIPT
Rule-based approach to sentiment analysis at ROMIP’11
Dmitry Kan
[email protected]
Twitter: @DmitryKan
AlphaSense Inc
Dialogue, 2012
Outline
• Problem definition
• Base level for accuracy
• Towards shallow parsing of input text
• Rule-based algorithm
• Object-oriented sentiment detection
• Performance
• Open problems
Problem definition
• What is sentiment for people:
– Mood of the author? Mood of the reader? Personal attitude?
– Opinion about the target object (product etc.)?
– Something else, defined by an annotator’s boss?
• What is sentiment for a computer:
– General polarity background
– General opinion mining
– Object (product) oriented opinion mining
– Polarity strength detection
Base level for accuracy
• Cross-annotator agreement gives 80% [1]
• The real performance of a system is the one it shows when used on un-annotated data
• Real example: "CEO of the company turned 50" (was marked as positive -> why?)
• Some machine learning (ML) methods can give 90% and more on test data
• Hard (if not impossible) to do object-oriented sentiment detection with ML
Towards shallow parsing of input text
Example 1: Majority likes this, but I do not like this
• Subclause 1: Majority likes this
• Opposite conjunction: but
• Subclause 2: I do not like this
Example 2: I liked new iPhone, but GalaxyS is not easy to use
• Subclause 1: I liked new iPhone (Object: iPhone, Sentiment: positive)
• Opposite conjunction: but
• Subclause 2: GalaxyS is not easy to use, with negation "not" (Object: GalaxyS, Sentiment: negative)
• Whole sentence (Object: -): Sentiment: neutral (mixed)
totalSentimentScore = totalPositiveScore - totalNegativeScore - penalty, where
penalty = ½ * sentimentCount, if an opposite conjunction is found
penalty = 0, if no opposite conjunction is found
NOT(polarity) = opposite_polarity
Rule-based algorithm flow on an example sentence
Majority likes this, but I do not like this.
Phase 1 (negations): posScore = 0 - negation weight = -2
Phase 2 (individual words):
Word "likes": posScore = -2 + 1 = -1
Word "not": negScore = 0 + 1 = 1
Word "like": posScore = -1 + 1 = 0
Phase 3 (opposite conjunctions): sentimentCount = 3
totalScore = posScore - negScore - ½ * sentimentCount = 0 - 1 - 3/2 = -5/2
Sentiment: Negative
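The three phases above can be traced in code. The following is a minimal Python sketch, not the author's implementation: the tiny English word sets stand in for the Russian dictionaries, the negation weight of 2 is read off the worked example, and the rule that a negation only affects the immediately following positive word is a simplifying assumption. The names `total_score` and `polarity` are hypothetical.

```python
import re

# Toy English lexicons standing in for the Russian dictionaries (assumption).
POSITIVE = {"like", "likes", "liked"}
NEGATIVE = {"not", "bad"}
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
NEGATION_WEIGHT = 2  # inferred from the worked example on the slide


def total_score(sentence):
    """Score one sentence with the three phases from the slide."""
    tokens = re.findall(r"\w+", sentence.lower())
    pos_score = 0.0
    neg_score = 0.0

    # Phase 1 (negations): a negation directly before a positive word
    # removes the negation weight from the positive score.
    for current, following in zip(tokens, tokens[1:]):
        if current in NEGATIONS and following in POSITIVE:
            pos_score -= NEGATION_WEIGHT

    # Phase 2 (individual words): every dictionary word adds 1 to its side.
    sentiment_count = 0
    for token in tokens:
        if token in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif token in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3 (opposite conjunctions): the penalty is half the number of
    # sentiment words, applied only if an opposite conjunction is present.
    has_conjunction = any(t in OPPOSITE_CONJUNCTIONS for t in tokens)
    penalty = sentiment_count / 2 if has_conjunction else 0.0

    return pos_score - neg_score - penalty


def polarity(score):
    """Map a score to a label; the zero threshold is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


example = "Majority likes this, but I do not like this."
print(total_score(example), polarity(total_score(example)))  # -2.5 negative
```

On the example sentence the sketch reproduces the slide's arithmetic: 0 - 1 - 3/2 = -5/2.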
Rule-based algorithm #1/3
• Suits micro-posts (Twitter) or individual sentences
• Polarity dictionaries for Russian (1739 positive and 2338 negative words)
• All words are lemmatized (A. Zaliznyak [2])
• Set of Russian negations that tend to noticeably affect the polarity of the connected word(s): не плохо (not bad); gaps between the negation and the connected word are also handled correctly, for example: Я не сильно люблю это (I do not strongly like this)
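One way to connect a negation to a polarity word across a gap is a small look-ahead window. The sketch below assumes a window of one intervening word, uses a two-entry toy lexicon of surface forms (the slides describe lemmatizing first, which is skipped here), and flips polarity as in NOT(polarity) = opposite_polarity. The names `negated_words` and `MAX_GAP` are hypothetical.

```python
import re

NEGATIONS = {"не"}
# Toy lexicon of surface forms (assumption); the slides lemmatize first.
POLARITY = {"плохо": "negative", "люблю": "positive"}
OPPOSITE = {"positive": "negative", "negative": "positive"}
MAX_GAP = 1  # words allowed between the negation and its target (assumption)


def negated_words(sentence, max_gap=MAX_GAP):
    """Return (word, flipped polarity) for each polarity word under negation."""
    tokens = re.findall(r"\w+", sentence.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token not in NEGATIONS:
            continue
        # Look at the next word and up to max_gap words beyond it.
        for candidate in tokens[i + 1 : i + 2 + max_gap]:
            if candidate in POLARITY:
                hits.append((candidate, OPPOSITE[POLARITY[candidate]]))
                break
    return hits


print(negated_words("не плохо"))               # [('плохо', 'positive')]
print(negated_words("Я не сильно люблю это"))  # [('люблю', 'negative')]
```

With `max_gap=0` the second sentence yields no match, which shows why the gap has to be allowed for.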
Rule-based algorithm #2/3
• Set of Russian opposite conjunctions, which affect the polarity of a sentence’s subclauses in relation to each other: Большинству это всё нравится, а мне нет (Majority likes this, but I do not)
• totalScore = positiveScore - negativeScore - oppositeConjuctionSentimentScore, where oppositeConjuctionSentimentScore removes the polarity mass from a sentence with a conjunction and equals sentimentWordCount / 2
Rule-based algorithm #3/3
• Object-oriented sentiment detection
• First, each sentence of the input text is examined for the presence of the object's keywords
• If such a sentence is found, it is checked for the presence of conjunctions or other subclause boundaries (such as punctuation)
• If no boundary is found, the sentiment of the entire sentence is detected according to the algorithm described above
• If there is a boundary, the subclause containing the keywords is identified and the sentiment of that subclause is detected according to the algorithm described above
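The object-oriented steps can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the English toy lexicon (including "easy" as a positive word), the boundary pattern (comma, semicolon, colon, or "but"), the sentence splitter, and the compact scorer are all stand-ins, and the names `score`, `label`, and `object_sentiment` are hypothetical.

```python
import re

# Toy English lexicons (assumption); the real dictionaries are Russian.
POSITIVE = {"like", "likes", "liked", "easy"}
NEGATIVE = {"not", "bad"}
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
NEGATION_WEIGHT = 2

# Subclause boundaries: punctuation or an opposite conjunction (assumption).
BOUNDARY = re.compile(r"[,;:]|\bbut\b", re.IGNORECASE)


def score(text):
    """Compact version of the sentence-level scoring described above."""
    tokens = re.findall(r"\w+", text.lower())
    pos = neg = count = 0
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if token in NEGATIONS and following in POSITIVE:
            pos -= NEGATION_WEIGHT
        if token in POSITIVE:
            pos += 1
            count += 1
        elif token in NEGATIVE:
            neg += 1
            count += 1
    penalty = count / 2 if OPPOSITE_CONJUNCTIONS & set(tokens) else 0
    return pos - neg - penalty


def label(value):
    return "positive" if value > 0 else "negative" if value < 0 else "neutral"


def object_sentiment(text, keywords):
    """Return one sentiment label per sentence that mentions the object."""
    keys = [k.lower() for k in keywords]
    labels = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not any(k in sentence.lower() for k in keys):
            continue  # the sentence does not mention the object
        clauses = [c for c in BOUNDARY.split(sentence) if c.strip()]
        # Use the subclause containing a keyword; with no boundary there is
        # a single clause, which is the whole sentence.
        target = next(
            (c for c in clauses if any(k in c.lower() for k in keys)), sentence
        )
        labels.append(label(score(target)))
    return labels


text = "I liked new iPhone, but GalaxyS is not easy to use"
print(object_sentiment(text, ["iPhone"]))   # ['positive']
print(object_sentiment(text, ["GalaxyS"]))  # ['negative']
```

Because each subclause is scored on its own, the opposite-conjunction penalty does not apply inside it, which is what lets the two objects in the example receive different labels.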
Performance
• Test data: text reviews (many sentences)
• Accuracy of 64%
• 92% precision and 69% recall for the positive class when two annotators agreed
• Much lower precision and recall for the negative class (not enough dictionary entries; sentiment at text level still to be defined)
• Worked slightly better in a 2-way classifier ensemble with Multinomial Naive Bayes [3]
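For reference, per-class precision and recall like the figures above are computed from gold and predicted labels as below. This is a generic sketch: the label lists are made-up toy data, not the ROMIP evaluation set, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(gold, predicted, target):
    """Precision and recall of the `target` class over paired label lists."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == target and p == target)
    false_pos = sum(1 for g, p in pairs if g != target and p == target)
    false_neg = sum(1 for g, p in pairs if g == target and p != target)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy labels for illustration only (not the ROMIP data).
gold = ["positive", "positive", "positive", "negative", "neutral"]
predicted = ["positive", "positive", "negative", "negative", "positive"]
print(precision_recall(gold, predicted, "positive"))
```

High precision with lower recall, as reported for the positive class, means that texts labeled positive are rarely wrong while some positive texts are missed.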
Open problems
• Multi-sentence sentiment detection
• Domain adaptation: mining polarity words [4]
• Adding more rules for shallow parsing
• Trying out formal syntactic parsing
• Automatic detection of product names (Named Entity Recognition)
Questions?
Thank you!
Bibliography
• [1] Bermingham, A. and Smeaton, A.F. (2009). A study of interannotator agreement for opinion retrieval. In SIGIR, 784-785.
• [2] Andrey Zaliznyak. Grammaticheskij slovar' russkogo jazyka. Moskva, 1977, (further editions are 1980, 1987, 2003).
• [3] Poroshin V. (2012). Proof of concept statistical sentiment classification at ROMIP 2011. In Dialog.
• [4] Chetverkin I., Loukachevitch N. (2010). Automatic Extraction of Domain-specific Opinion Words. Dialogue.
• [5] Minqing Hu, Bing Liu. (2004). Mining and summarizing customer reviews. In Proc. of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.