Rule-based approach to sentiment analysis at ROMIP’11

Dmitry Kan, Twitter: @DmitryKan

AlphaSense Inc

Dialogue, 2012

Outline

• Problem definition
• Base level for accuracy
• Towards shallow parsing of input text
• Rule-based algorithm
• Object-oriented sentiment detection
• Performance
• Open problems

Problem definition

• What is sentiment for people:
  – Mood of the author? Mood of the reader? Personal attitude?
  – Opinion about the target object (product etc.)?
  – Something else, defined by an annotator’s boss?

• What is sentiment for a computer:
  – General polarity background
  – General opinion mining
  – Object (product) oriented opinion mining
  – Polarity strength detection

Base level for accuracy

• Cross-annotator agreement is 80% [1]
• Real performance of the system is the one it shows when used on un-annotated data
• Real example: “CEO of the company turned 50” (was marked as positive -> why?)
• Some machine learning (ML) methods can give 90% and more on test data
• Hard (if not impossible) to do object-oriented sentiment detection with ML

Towards shallow parsing of input text

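Example 1: Majority likes this, but I do not like this

• Subclause 1: “Majority likes this”
• Opposite conjunction: “but”
• Subclause 2: “I do not like this”

Example 2: I liked new iPhone, but GalaxyS is not easy to use

• Subclause 1: “I liked new iPhone” (Object: iPhone, Sentiment: positive)
• Opposite conjunction: “but”
• Subclause 2: “GalaxyS is not easy to use” (negation: “not”; Object: GalaxyS, Sentiment: negative)
• Whole sentence: Object: -, Sentiment: neutral (mixed)

totalSentimentScore = totalPositiveScore – totalNegativeScore – ½ * sentimentCount, if an opposite conjunction is found
totalSentimentScore = totalPositiveScore – totalNegativeScore, if no opposite conjunction is found

NOT(polarity) = opposite_polarity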
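The split into subclauses at an opposite conjunction can be sketched roughly as below. This is a minimal illustration, not the system presented in the talk: the conjunction list, the regular expression, and the function name `split_subclauses` are all assumptions, and English words stand in for the Russian ones.

```python
import re

# Hypothetical list of opposite conjunctions; the slides do not enumerate them.
OPPOSITE_CONJUNCTIONS = ("but", "however", "although")

# An optional comma, whitespace, one of the conjunctions, whitespace.
_PATTERN = re.compile(
    r",?\s+\b(" + "|".join(OPPOSITE_CONJUNCTIONS) + r")\b\s+",
    flags=re.IGNORECASE,
)


def split_subclauses(sentence):
    """Split a sentence at the first opposite conjunction.

    Returns (subclauses, conjunction); conjunction is None when
    no opposite conjunction is found.
    """
    match = _PATTERN.search(sentence)
    if match is None:
        return [sentence.strip()], None
    left = sentence[:match.start()].strip()
    right = sentence[match.end():].strip()
    return [left, right], match.group(1).lower()


clauses, conjunction = split_subclauses(
    "I liked new iPhone, but GalaxyS is not easy to use"
)
print(clauses, conjunction)
```

Each returned subclause can then be scored on its own, which is what allows different objects in one sentence to receive different sentiment.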

Rule-based algorithm flow on example sentence

Majority likes this, but I do not like this.

Phase 1 (negations): posScore = 0 – negation weight = -2

Phase 2 (individual words):
Word “likes”: posScore = -2 + 1 = -1
Word “not”: negScore = 0 + 1 = 1
Word “like”: posScore = -1 + 1 = 0

Phase 3 (opposite conjunctions): sentimentCount = 3

totalScore = posScore – negScore – ½ * sentimentCount = 0 – 1 – 3/2 = -5/2

Sentiment: Negative
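The three phases of this worked example can be reproduced with a short sketch. It rests on several assumptions that the slides do not state: the toy English word lists, applying the negation weight once per negation, counting “not” as a negative dictionary word (as the example does), and thresholding the final score at zero. The function names are hypothetical.

```python
import re

# Toy word lists, assumed for illustration; the real system uses
# lemmatized Russian polarity dictionaries.
POSITIVE = {"like", "likes"}
NEGATIVE = {"not"}
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
NEGATION_WEIGHT = 2  # value used in the worked example


def total_score(sentence):
    tokens = re.findall(r"[a-z]+", sentence.lower())

    # Phase 1 (negations): each negation lowers posScore by the weight.
    pos_score = -NEGATION_WEIGHT * sum(t in NEGATIONS for t in tokens)
    neg_score = 0

    # Phase 2 (individual words): a dictionary hit adds 1 to its score.
    sentiment_count = 0
    for token in tokens:
        if token in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif token in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3 (opposite conjunctions): subtract half the sentiment word
    # count, but only when an opposite conjunction is present.
    has_conjunction = any(t in OPPOSITE_CONJUNCTIONS for t in tokens)
    penalty = sentiment_count / 2 if has_conjunction else 0
    return pos_score - neg_score - penalty


def polarity(score):
    # Thresholding at zero is an assumption, not taken from the slides.
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


score = total_score("Majority likes this, but I do not like this.")
print(score, polarity(score))
```

For the example sentence the sketch arrives at the same total as the slide, -5/2, and therefore at a negative label.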

Rule-based algorithm #1/3

• Suits micro-posts (Twitter) or individual sentences
• Polarity dictionaries for Russian (1739 positive and 2338 negative words)
• All words are lemmatized (A. Zaliznyak [2])
• Set of Russian negations that tend to noticeably affect the polarity of the connected word(s): не плохо (not bad); gaps between words are also handled correctly, for example: Я не сильно люблю это (I do not strongly like this)
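One way to picture negation handling with a gap is sketched below. It is only an illustration under stated assumptions: the tiny lemma map stands in for a real lemmatizer, the polarity entries are toy data, and the `max_gap` parameter (how many words may sit between the negation and the polarity word) is hypothetical. The polarity flip follows the rule NOT(polarity) = opposite_polarity.

```python
# Toy stand-ins (assumptions): a real system would use a full lemmatizer
# and the complete polarity dictionaries.
LEMMAS = {"люблю": "любить"}
POLARITY = {"любить": "positive", "плохо": "negative"}
NEGATIONS = {"не"}
OPPOSITE = {"positive": "negative", "negative": "positive"}


def effective_polarities(tokens, max_gap=1):
    """Return (lemma, polarity) pairs with negation applied.

    A negation flips the first polarity word that follows it with at
    most max_gap other words in between.
    """
    result = []
    flip_until = -1  # last token index still covered by a negation
    for i, token in enumerate(tokens):
        if token in NEGATIONS:
            flip_until = i + 1 + max_gap
            continue
        lemma = LEMMAS.get(token, token)
        polarity = POLARITY.get(lemma)
        if polarity is None:
            continue
        if i <= flip_until:
            polarity = OPPOSITE[polarity]
            flip_until = -1  # the negation has been consumed
        result.append((lemma, polarity))
    return result


print(effective_polarities("я не сильно люблю это".split()))
print(effective_polarities("не плохо".split()))
```

In the first sentence the word сильно sits between the negation and the verb, and the verb is still flipped; in the second the negation is directly adjacent.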

Rule-based algorithm #2/3

• Set of Russian opposite conjunctions, which affect the polarity of a sentence’s subclauses in relation to each other: Большинству это всё нравится, а мне нет (Majority likes this, but I do not)

• totalScore = positiveScore – negativeScore – oppositeConjuctionSentimentScore, where oppositeConjuctionSentimentScore removes the polarity mass from a sentence with a conjunction and equals sentimentWordCount / 2

Rule-based algorithm #3/3

• Object-oriented sentiment detection

• First, each sentence of the input text is examined for the presence of the object’s keywords

• If such a sentence is found, it is checked for conjunctions or other subclause boundaries (such as punctuation)

• If no boundary is found, the sentiment of the entire sentence is detected with the algorithm described above

• If there is a boundary, the subclause containing the keywords is identified and its sentiment is detected with the algorithm described above
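The steps above can be sketched as follows. This is a simplified illustration, not the presented system: `simple_polarity` is a deliberately small stand-in for the sentence-level algorithm, and the word lists, the boundary pattern, and the function names are assumptions.

```python
import re

# Toy data (assumptions), with English standing in for Russian.
POLARITY = {"liked": 1, "easy": 1, "bad": -1}
NEGATIONS = {"not"}
# Subclause boundaries: an opposite conjunction or a semicolon.
BOUNDARY = re.compile(r",?\s+but\s+|;\s*", flags=re.IGNORECASE)


def simple_polarity(text):
    """Stand-in scorer: sum word polarities, flipping after a negation."""
    score, negated = 0, False
    for token in re.findall(r"\w+", text.lower()):
        if token in NEGATIONS:
            negated = True
        elif token in POLARITY:
            value = POLARITY[token]
            score += -value if negated else value
            negated = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def object_sentiment(text, keywords):
    """Return one sentiment label per sentence that mentions the object."""
    keywords = [k.lower() for k in keywords]
    labels = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not any(k in sentence.lower() for k in keywords):
            continue  # the sentence does not mention the object
        clauses = BOUNDARY.split(sentence)
        if len(clauses) == 1:
            # No boundary: score the entire sentence.
            labels.append(simple_polarity(sentence))
        else:
            # Boundary found: score only subclauses with the keywords.
            for clause in clauses:
                if any(k in clause.lower() for k in keywords):
                    labels.append(simple_polarity(clause))
    return labels


text = "I liked new iPhone, but GalaxyS is not easy to use."
print(object_sentiment(text, ["iPhone"]))
print(object_sentiment(text, ["GalaxyS"]))
```

Scoring only the subclause that contains the keywords is what lets one sentence yield a different label for each object.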

Performance

• Test data: text reviews (many sentences)
• Accuracy of 64%
• 92% precision and 69% recall for the positive class when two annotators have agreed
• Much lower precision and recall for the negative class (not enough dictionary entries; sentiment at the text level is yet to be defined)
• Worked slightly better in a 2-way classifier ensemble with Multinomial Naive Bayes [3]
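For reference, per-class precision and recall restricted to the items on which two annotators agree could be computed along these lines. The function name and the toy labels are hypothetical and are not the ROMIP data; the sketch only illustrates how such figures are derived.

```python
def precision_recall_on_agreed(annotator_a, annotator_b, predicted, target):
    """Precision and recall for `target`, using only agreed items as gold."""
    tp = fp = fn = 0
    for a, b, p in zip(annotator_a, annotator_b, predicted):
        if a != b:
            continue  # annotators disagree: the item is skipped
        if p == target and a == target:
            tp += 1
        elif p == target:
            fp += 1
        elif a == target:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for five reviews.
annotator_a = ["pos", "pos", "neg", "neg", "pos"]
annotator_b = ["pos", "pos", "neg", "pos", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]
print(precision_recall_on_agreed(annotator_a, annotator_b, predicted, "pos"))
```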

Open problems

• Multi-sentence sentiment detection
• Domain adaptation: mining polarity words [4]
• Adding more rules for shallow parsing
• Trying out formal syntactic parsing
• Automatic detection of product names (Named Entity Recognition)

Questions?

Thank you!

Bibliography

• [1] Bermingham, A. and Smeaton, A.F. (2009). A study of interannotator agreement for opinion retrieval. In SIGIR, 784-785.

• [2] Andrey Zaliznyak. Grammaticheskij slovar' russkogo jazyka. Moskva, 1977, (further editions are 1980, 1987, 2003).

• [3] Poroshin V. (2012). Proof of concept statistical sentiment classification at ROMIP 2011. In Dialog.

• [4] Chetverkin I., Loukachevitch N. (2010). Automatic Extraction of Domain-specific Opinion Words. Dialogue.

• [5] Minqing Hu, Bing Liu. (2004). Mining and summarizing customer reviews. In Proc. of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.
