rule based approach to sentiment analysis at romip’11 slides

Rule-based approach to sentiment analysis at ROMIP’11 Dmitry Kan [email protected] Twitter: @DmitryKan AlphaSense Inc Dialogue, 2012

Upload: dmitry-kan

Post on 26-Jun-2015


DESCRIPTION

Slides for the presentation at Dialogue'12.
API (free access for developers available): https://mashape.com/dmitrykey/russiansentimentanalyzer

TRANSCRIPT



Outline

• Problem definition
• Base level for accuracy
• Towards shallow parsing of input text
• Rule-based algorithm
• Object-oriented sentiment detection
• Performance
• Open problems


Problem definition

• What is sentiment for people:
  – Mood of the author? Mood of the reader? Personal attitude?
  – Opinion about the target object (product etc.)?
  – Something else, defined by an annotator’s boss?
• What is sentiment for a computer:
  – General polarity background
  – General opinion mining
  – Object (product) oriented opinion mining
  – Polarity strength detection


Base level for accuracy

• Cross-annotator agreement gives 80% [1]
• Real performance of the system is the one it shows when used on un-annotated data
• Real example: “CEO of the company turned 50” (was marked as positive -> why?)
• Some machine learning (ML) methods can give 90% and more on test data
• Hard (if not impossible) to do object-oriented sentiment detection with ML


Towards shallow parsing of input text

Example 1: Majority likes this, but I do not like this
• Subclause 1: Majority likes this
• Opposite conjunction: but
• Subclause 2: I do not like this (negation: not)

Example 2: I liked new iPhone, but GalaxyS is not easy to use
• Subclause 1: I liked new iPhone
• Opposite conjunction: but
• Subclause 2: GalaxyS is not easy to use (negation: not)

Sentiment by object:
• Object: iPhone, Sentiment: positive
• Object: GalaxyS, Sentiment: negative
• Object: -, Sentiment: neutral (mixed)

totalSentimentScore = totalPositiveScore – totalNegativeScore – (½ * sentimentCount if an opposite conjunction is found, 0 if no opposite conjunction is found)

NOT(polarity) = opposite_polarity
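
The subclause split on this slide can be sketched roughly as follows. This is only an illustration: the conjunction list `OPPOSITE_CONJUNCTIONS` is a toy English stand-in (the original system works on Russian), and the helper name `split_subclauses` is hypothetical, not taken from the slides.

```python
import re

# Assumption: a toy English list; the original system uses Russian conjunctions.
OPPOSITE_CONJUNCTIONS = ("but", "however", "although")

def split_subclauses(sentence):
    """Split a sentence at the first opposite conjunction.

    Returns (subclauses, conjunction); conjunction is None if none is found.
    """
    # Optional comma, whitespace, the conjunction as a whole word, whitespace.
    pattern = r",?\s+\b(" + "|".join(OPPOSITE_CONJUNCTIONS) + r")\b\s+"
    match = re.search(pattern, sentence, flags=re.IGNORECASE)
    if match is None:
        return [sentence.strip()], None
    left = sentence[:match.start()].strip()
    right = sentence[match.end():].strip()
    return [left, right], match.group(1).lower()

clauses, conj = split_subclauses("I liked new iPhone, but GalaxyS is not easy to use")
print(clauses, conj)
```

Each returned subclause can then be scored on its own, which is what makes per-object sentiment possible for sentences of this shape.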


Rule based algorithm flow on example sentence

Example sentence: Majority likes this, but I do not like this.

Phase 1 (negations): posScore = 0 – negation weight = 0 – 2 = -2

Phase 2 (individual words):
• Word “likes”: posScore = -2 + 1 = -1
• Word “not”: negScore = 0 + 1 = 1
• Word “like”: posScore = -1 + 1 = 0

Phase 3 (oppositeConjuctions): sentimentCount = 3

totalScore = posScore – negScore – ½ * sentimentCount = 0 – 1 – 3/2 = -5/2

Sentiment: Negative
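
The three phases above can be reproduced with a short sketch. It rests on several assumptions that are not stated on the slides: toy English dictionaries, a crude suffix-stripping `lemma` helper, a negation weight of 2 read off the worked example, and the reading that every negation simply subtracts that weight from the positive score. All function names are hypothetical.

```python
# Assumptions: toy English dictionaries; the real system uses Russian
# dictionaries and proper lemmatization.
POSITIVE = {"like"}
NEGATIVE = {"not"}          # the worked example counts "not" in phase 2
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
NEGATION_WEIGHT = 2         # value taken from the worked example

def lemma(word):
    # Hypothetical stand-in for lemmatization: "likes" -> "like".
    stem = word[:-1]
    return stem if word.endswith("s") and stem in POSITIVE | NEGATIVE else word

def total_score(sentence):
    words = [lemma(w.strip(".,!?").lower()) for w in sentence.split()]
    pos_score = 0.0
    neg_score = 0.0
    # Phase 1: each negation lowers the positive score by the negation weight.
    for w in words:
        if w in NEGATIONS:
            pos_score -= NEGATION_WEIGHT
    # Phase 2: score individual polarity words and count them.
    sentiment_count = 0
    for w in words:
        if w in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif w in NEGATIVE:
            neg_score += 1
            sentiment_count += 1
    # Phase 3: an opposite conjunction removes half of the polarity mass.
    has_conjunction = any(w in OPPOSITE_CONJUNCTIONS for w in words)
    penalty = sentiment_count / 2 if has_conjunction else 0.0
    return pos_score - neg_score - penalty

def label(score):
    # Assumption: zero is treated as neutral.
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

score = total_score("Majority likes this, but I do not like this.")
print(score, label(score))
```

For the example sentence this yields the same total as the slide, -5/2, and the label Negative.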


Rule-based algorithm #1/3

• Suits micro-posts (Twitter) or individual sentences
• Polarity dictionaries for Russian (1739 positive and 2338 negative words)
• All words are lemmatized (A. Zaliznyak [2])
• A set of Russian negations that tend to noticeably affect the polarity of the connected word(s): не плохо (not bad). Gaps between words are also handled correctly, for example: Я не сильно люблю это (I do not strongly like this)
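
One way negation handling with a gap between words might look is sketched below. The lookup-table lemmatizer `LEMMAS`, the two-entry polarity dictionary, and the window size `NEGATION_WINDOW` are all assumptions made for illustration; the slides do not say how large a gap the real system tolerates.

```python
# Assumptions: toy lemma table and polarity dictionary; window size is a guess.
LEMMAS = {"люблю": "любить", "плохо": "плохой"}
POLARITY = {"любить": "positive", "плохой": "negative"}
NEGATIONS = {"не"}
NEGATION_WINDOW = 2  # a negation can reach a polarity word among the next 2 tokens

def flip(polarity):
    # NOT(polarity) = opposite_polarity
    return "negative" if polarity == "positive" else "positive"

def polarities_with_negation(sentence):
    tokens = [LEMMAS.get(t, t) for t in sentence.lower().split()]
    result = []
    remaining = 0  # how many upcoming tokens the last negation still covers
    for token in tokens:
        if token in NEGATIONS:
            remaining = NEGATION_WINDOW
            continue
        polarity = POLARITY.get(token)
        if polarity is not None:
            if remaining > 0:
                polarity = flip(polarity)
                remaining = 0  # the negation is consumed by this word
            result.append((token, polarity))
        elif remaining > 0:
            remaining -= 1     # a neutral word inside the gap
    return result

print(polarities_with_negation("Я не сильно люблю это"))
print(polarities_with_negation("не плохо"))
```

In the first sentence the neutral word сильно sits between the negation and the polarity word, and the negation still applies because the polarity word falls inside the window.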


Rule-based algorithm #2/3

• A set of Russian opposite conjunctions, which affect the polarity of a sentence’s subclauses in relation to each other: Большинству это всё нравится, а мне нет (Majority likes this, but I do not)
• totalScore = positiveScore – negativeScore – oppositeConjuctionSentimentScore, where oppositeConjuctionSentimentScore removes the polarity mass from a sentence with a conjunction and equals sentimentWordCount / 2


Rule-based algorithm #3/3

• Object-oriented sentiment detection
  – First, each sentence of the input text is examined for the presence of the object’s keywords
  – If such a sentence is found, it is checked for the presence of conjunctions or other subclause boundaries (like punctuation)
  – If no boundary is found, the sentiment of the entire sentence is detected according to the algorithm described above
  – If there is a boundary, the subclause containing the keywords is identified and its sentiment is detected according to the algorithm described above
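
The object-oriented steps can be sketched as below. The scorer `toy_sentiment` is a deliberately simplified stand-in for the rule-based sentence algorithm, and the dictionaries, the `BOUNDARY` pattern, and the function names are all hypothetical choices for an English example rather than the system's actual rules.

```python
import re

# Assumptions: toy English dictionaries and boundary pattern.
POSITIVE = {"liked", "easy"}
NEGATIVE = {"bad"}
NEGATIONS = {"not"}
BOUNDARY = re.compile(r"\s*(?:,|;|\bbut\b)\s*", flags=re.IGNORECASE)

def toy_sentiment(text):
    """Simplified scorer: dictionary lookup with negation flipping."""
    score, negate = 0, False
    for word in re.findall(r"\w+", text.lower()):
        if word in NEGATIONS:
            negate = True
            continue
        delta = (word in POSITIVE) - (word in NEGATIVE)
        if delta:
            score += -delta if negate else delta
            negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def object_sentiment(text, keywords):
    keywords = [k.lower() for k in keywords]
    results = []
    # Step 1: look at each sentence and keep those mentioning the object.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not any(k in sentence.lower() for k in keywords):
            continue
        # Step 2: look for subclause boundaries (conjunctions, punctuation).
        parts = [p for p in BOUNDARY.split(sentence) if p]
        if len(parts) == 1:
            # Step 3: no boundary, score the whole sentence.
            results.append(toy_sentiment(sentence))
        else:
            # Step 4: score only the subclause(s) containing the keywords.
            for part in parts:
                if any(k in part.lower() for k in keywords):
                    results.append(toy_sentiment(part))
    return results

text = "I liked new iPhone, but GalaxyS is not easy to use"
print(object_sentiment(text, ["iPhone"]), object_sentiment(text, ["GalaxyS"]))
```

Keyword matching here is plain substring search, which is enough for the example but would need lemmatization for Russian input.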


Performance

• Test data: text reviews (many sentences)
• Accuracy of 64%
• 92% precision and 69% recall for the positive class when two annotators have agreed
• Much lower precision and recall for the negative class (not enough dictionary entries; text-level sentiment yet to be defined)
• Worked slightly better in a 2-way classifier ensemble with Multinomial Naive Bayes [3]
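
For reference, per-class precision and recall restricted to items on which two annotators agree could be computed as in the sketch below. The label lists are made-up toy values used only to exercise the functions; they are not ROMIP data, and the function names are hypothetical.

```python
def precision_recall(gold, predicted, target="positive"):
    """Precision and recall of `predicted` against `gold` for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def agreed_subset(annotator_a, annotator_b, predicted):
    """Keep only items on which both annotators gave the same label."""
    kept = [(a, p) for a, b, p in zip(annotator_a, annotator_b, predicted) if a == b]
    return [a for a, _ in kept], [p for _, p in kept]

# Toy labels, purely illustrative.
a = ["positive", "positive", "negative", "positive", "negative"]
b = ["positive", "positive", "negative", "negative", "negative"]
pred = ["positive", "negative", "positive", "positive", "negative"]

gold, kept_pred = agreed_subset(a, b, pred)
print(precision_recall(gold, kept_pred, target="positive"))
```

Filtering to the agreed subset first means disputed items affect neither measure, which is one reason such figures can look better than overall accuracy.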


Open problems

• Multi-sentence sentiment detection
• Domain adaptation: mining polarity words [4]
• Adding more rules for shallow parsing
• Trying out formal syntactic parsing
• Automatic detection of product names (Named Entity Recognition)


Questions?

Thank you!


Bibliography

• [1] Bermingham, A. and Smeaton, A.F. (2009). A study of interannotator agreement for opinion retrieval. In SIGIR, 784-785.

• [2] Andrey Zaliznyak. Grammaticheskij slovar' russkogo jazyka. Moskva, 1977, (further editions are 1980, 1987, 2003).

• [3] Poroshin V. (2012). Proof of concept statistical sentiment classification at ROMIP 2011. In Dialog.


• [4] Chetverkin I., Loukachevitch N. (2010). Automatic Extraction of Domain-specific Opinion Words. Dialogue.

• [5] Minqing Hu, Bing Liu. (2004). Mining and summarizing customer reviews. In Proc. of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.