potential and limitations of commercial sentiment detection tools
DESCRIPTION
In this paper, we analyze the quality of several commercial tools for sentiment detection. All tools are tested on nearly 30’000 short texts from various sources, such as tweets, news, reviews etc. In addition to the quality analysis (measured by various metrics), we also investigate the effect of increasing text length on the performance. Finally, we show that combining all tools using machine learning techniques increases the overall performance significantly.TRANSCRIPT
Zurich Universitiy of Applied Sciences
Potential and Limitations of Commercial Sentiment Detection Tools
Fatih Uzdillijoint work with Mark Cieliebak and Oliver Dürr
03.12.2013 @ ESSEM’13
Zurich Universitiy of Applied Sciences
About Me
Fatih UzdilliInstitute of Applied Information Technology (InIT)
ZHAW, Winterthur, Switzerland
Email: , more about me: home.zhaw.ch/~uzdi
Research Interest
Information Retrieval, Machine Learning, Sentiment Analysis
Background
Software Engineer, Social Media Monitoring, Search Technologies
203.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Abstract
Evaluation of 9 commercial sentiment tools on approx. 30'000 short texts.
Best commercial tools have accuracy of only 60%.
Combining all tools using Random Forest improved the accuracy.
303.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Motivation
• Scientific results for sentiment detection: «very good performance: > 80%
accuracy»
• Blog posts about commercial tools: «very poor quality,
unusable»4
C
D03.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Motivation
• Scientific results for sentiment detection: «very good performance: > 80%
accuracy»
• Blog posts about commercial tools: «very poor quality,
unusable»5
C
DReally?
03.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
How good is commercial Sentiment Detection?
Is there potential for improvement?
6
source: http://www.commute.com/images/schools_evaluation.jpg source: http://3.bp.blogspot.com/-u3acK_WjaLU/ULYv51mHEhI/ AAAAAAAAARY/ DIZqOfxuswc/s1600/IcebergQ1.jpg
03.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Evaluation Setup
• 7 Public Text Corpora– Single Statements– Different Media Types
• Tweet, News, Review, Speech Transcript
– Total: 28653 Texts
• 9 Commercial APIs– Stand-alone– Free for this evaluation– Arbitrary Text
NEGATIVEPOSITIVE
OTHER( neutral / mixed )
703.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Tool Accuracy
C1(Tweets)
C2(Quotations)
C3(Reviews)
C4(Headlines)
C5(Reviews)
C6(Reviews)C7(News)
0.2
0.3
0.4
0.5
0.6
0.7
0.8Best Tool per Corpus
Average of All Tools
Worst Tool per Corpus
Acc
ura
cy
8
61%
52%
40%
Avg.
03.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Tool Accuracy
C1(Tweets)
C2(Quotations)
C3(Reviews)
C4(Headlines)
C5(Reviews)
C6(Reviews)C7(News)
0.2
0.3
0.4
0.5
0.6
0.7
0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusOverall Best Tool
Acc
ura
cy
9
61%52%40%59%45%
Avg.
03.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Further Findings
• Longer texts are hard to classify
• Corpus annotations might be erroneous
1003.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Can a Meta-Classifier do better?
• 1st Approach: Majority Classifier– Sentiment with most votes chosen
Illustration:
1103.12.2013 Fatih Uzdilli
api1 api2 api3 api4 api5 api6 api7 MajorityText 1 + + - o - + o +Text 2 - + + - - - - -Text 3 - o + + + + - +Text n o o + o - o o o
Zurich Universitiy of Applied Sciences
Tool Accuracy
C1(Tweets)
C2(Quotations)C3(Reviews)
C4(Headlines)C5(Reviews)
C6(Reviews)C7(News)
0.2
0.3
0.4
0.5
0.6
0.7
0.8Best Tool per Corpus
Average of All Tools
Worst Tool per Corpus
Acc
ura
cy
1203.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Majority Classifier beats Average
C1(Tweets)
C2(Quotations)C3(Reviews)
C4(Headlines)C5(Reviews)
C6(Reviews)C7(News)
0.2
0.3
0.4
0.5
0.6
0.7
0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusMajority Classifier
Acc
ura
cy
1303.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
2nd Approach: Random-Forest
1403.12.2013 Fatih Uzdilli
api1 api2 api3 … api n
annotation
Text 1 + - + … o +
Text 2 - + o … + -
Text 3 - o - … + -
Text 4 + o + … - +
Text 5 + o + … o o
Text 6 + o o … - o
Text 7 + - + … o unknown
Text 8 + + o … - unknown
Text 9 o - + … o unknown
Random Forest
Classifier
+
+
o
Train
Train
Train
Train
Predict
Predict
Predict
Train
Train
Zurich Universitiy of Applied Sciences
Before Random Forest
C1(Tweets)
C2(Quotations)C3(Reviews)
C4(Headlines)C5(Reviews)
C6(Reviews)C7(News)
0.2
0.3
0.4
0.5
0.6
0.7
0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusMajority Classifier
Acc
ura
cy
1503.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Random Forest Beats Best Single Tool
C1(Tweets)
C2(Quotations)C3(Reviews)
C4(Headlines)C5(Reviews)
C6(Reviews)C7(News)
0.2
0.3
0.4
0.5
0.6
0.7
0.8Best Tool per CorpusAverage of All ToolsWorst Tool per CorpusMajority Classifier
Acc
ura
cy
1603.12.2013 Fatih Uzdilli
Zurich Universitiy of Applied Sciences
Summary
• Best Tool: 59% Accuracy• Random Forest combination: Up to 9% improvement
<=9%
1703.12.2013 Fatih Uzdilli