Sentiment Analysis Sentiment Analysis: Methods and Results Other work
Don't you love it? Sentiment Analysis with
Crowd Sourcing
Wouter van Atteveldt et al.
2017-02-20
Don't you love it? Sentiment Analysis with Crowd Sourcing Wouter van Atteveldt et al.
About me
• Wouter van Atteveldt ( )
• Political communication, VU Amsterdam
• MSc Artificial Intelligence (Edinburgh)
• PhD AI & Communication Science (VU Amsterdam)
• Research: Automatic Text Analysis, Data Analysis
• http://vanatteveldt.com
What is sentiment analysis?
• Measure evaluative meaning of (subjective) language
• "This movie sucks" → negative
• Applications, e.g.
  • analysing hotel reviews
  • automatic stock market trading
  • early warning systems (for brands and countries)
• Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
• Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
Common approaches in sentiment analysis
• Human annotation
• Dictionaries of positive, negative terms
  • semi-automatic expansion of dictionaries
• Machine learning
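The dictionary approach can be illustrated with a minimal sketch. The word lists below are hypothetical toy examples, not an actual sentiment dictionary, and the scoring rule (positive hits minus negative hits, divided by total hits) is one common convention assumed here for illustration.

```python
import re

# Hypothetical toy word lists; a real dictionary would hold thousands of entries.
POSITIVE = {"love", "great", "interesting", "better"}
NEGATIVE = {"sucks", "terrible", "boring", "worse"}

def dictionary_sentiment(text):
    """Return (positive hits - negative hits) / total hits, or 0.0 without hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    hits = pos + neg
    return (pos - neg) / hits if hits else 0.0

print(dictionary_sentiment("This movie sucks"))     # -1.0
print(dictionary_sentiment("Brexit means brexit"))  # 0.0
```

A sentence without any dictionary term scores 0.0, which cannot be told apart from a genuinely neutral sentence; this is one reason dictionary coverage matters.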
Problems with sentiment analysis
• Evaluations are inherently subjective
• Evaluative language is creative and context-sensitive
  • (even more so than factual language)
• Evaluation implies a relation, but most data/tools are undirected
  • Source likes/dislikes target
Problems with sentiment analysis
• "Terrorist Attacks: 250 Innocent Massacred By ISIS"
• "Saddam Hussein was executed by hanging"
• "This car has better mpg than my old Volvo"
• "Preacher who applauded Orlando mass killing asked to relocate"
• "Janeane Garafalo also was an interesting character ."
• "Brexit means brexit"
Sentiment analysis: what do I want?
Task definition
Given a piece of text, what do the author and mentioned actors think about all mentioned actors and issues?
• "Preacher who applauded Orlando mass killing asked to relocate"
  • → preacher/+/killing
• "This car has better mpg than my old Volvo"
  • → author/+/thiscar, author/-/volvo
• "Terrorist Attacks: 250 Innocent Massacred By ISIS"
  • → ISIS/-/innocents, author/-/ISIS
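The source/polarity/target notation used in these examples maps naturally onto a small data structure. The sketch below is an assumption about how such triples could be represented; the names `Attitude`, `parse_attitude` and `parse_attitudes` are hypothetical and not part of any tool mentioned in these slides.

```python
from typing import NamedTuple

class Attitude(NamedTuple):
    source: str    # who holds the evaluation (the author or a mentioned actor)
    polarity: int  # +1 for positive, -1 for negative
    target: str    # the actor or issue being evaluated

def parse_attitude(notation):
    """Parse one 'source/+/target' or 'source/-/target' string."""
    source, sign, target = notation.strip().split("/")
    return Attitude(source, {"+": 1, "-": -1}[sign], target)

def parse_attitudes(line):
    """Parse a comma-separated list of attitudes."""
    return [parse_attitude(part) for part in line.split(",")]

for attitude in parse_attitudes("author/+/thiscar, author/-/volvo"):
    print(attitude)
```

Keeping source and target explicit is what distinguishes this task from undirected sentiment scoring: the same sentence can yield several triples with different polarities.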
Crowd sourcing
• Use anonymous/untrained people from the Internet to perform a task
• Useful for simple tasks
• Split bigger tasks into smaller steps
• Can be very cheap (cents per unit)
• Quality control using test questions
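Quality control with test questions can be sketched as follows: units with a known answer are mixed into the task, and only coders who answer enough of them correctly are kept. The data layout and the 0.7 accuracy threshold are assumptions for illustration; they are not settings taken from CrowdFlower or from this study.

```python
def trusted_coders(answers, gold, min_accuracy=0.7):
    """Return the coders whose accuracy on the test questions is high enough.

    answers: {coder: {unit: label}} with all codings per coder
    gold:    {unit: label} with the known answers for the test questions
    """
    trusted = set()
    for coder, coded in answers.items():
        tested = [unit for unit in coded if unit in gold]
        if not tested:
            continue  # coder saw no test questions, so cannot be verified
        correct = sum(coded[unit] == gold[unit] for unit in tested)
        if correct / len(tested) >= min_accuracy:
            trusted.add(coder)
    return trusted

gold = {"t1": "neg", "t2": "pos"}
answers = {
    "coder_a": {"t1": "neg", "t2": "pos", "u1": "pos"},  # 2 of 2 correct
    "coder_b": {"t1": "pos", "t2": "pos", "u1": "neg"},  # 1 of 2 correct
}
print(trusted_coders(answers, gold))  # {'coder_a'}
```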
CrowdFlower
• Platform for easily distributing tasks
• Other platforms exist, e.g. mturk
https://make.crowdflower.com/jobs/933225/editor
Why use Crowd Sourcing for sentiment analysis?
• Task is easy to explain
• Judgment is subjective anyway
• Low cost means multiple codings per unit possible
• Low cost means task/domain dependent analysis viable
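With multiple codings per unit, the individual judgments have to be combined into one score. A minimal sketch, assuming each coding is a score of -1, 0 or 1 and that the mean is used as the aggregate; other rules such as majority voting are equally plausible, and the slides do not state which one was used.

```python
from statistics import mean

def aggregate(codings):
    """Average all scores per unit.

    codings: list of (unit_id, score) pairs, score in {-1, 0, 1}
    returns: {unit_id: mean score}
    """
    by_unit = {}
    for unit, score in codings:
        by_unit.setdefault(unit, []).append(score)
    return {unit: mean(scores) for unit, scores in by_unit.items()}

codings = [
    ("s1", 1), ("s1", 1), ("s1", -1),   # two coders positive, one negative
    ("s2", -1), ("s2", -1), ("s2", 0),  # two coders negative, one neutral
]
for unit, score in aggregate(codings).items():
    print(unit, round(score, 2))
```

The spread of scores within a unit can also be kept as a rough indicator of how contested a sentence is, which fits the point that the judgment is subjective anyway.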
Previous work in social science
• Ken Benoit et al, Crowd-sourced text analysis, APSR 2016
• Martin Haselmayer et al, Sentiment analysis of political communication: combining a dictionary approach with crowdcoding, QQ 2016
• Richard Socher et al, Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, EMNLP 2013
Benoit et al
• Compare crowd coding to manual coding
• Policy positions in party manifestoes
• 18k sentences, 4-6 expert codings, 5-20 crowd codings
Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: reproducible and agile production of political data. American Political Science Review, 110(02), 278-295.
Benoit et al
Errors when using multiple experts / crowd coders
Haselmayer et al
• Create German sentiment dictionary using codings
• Applied to election coverage statements
• 13k sentences, 130k codings (2k euro)
Haselmayer, M., & Jenny, M. (2016). Sentiment analysis of political communication: combining a dictionary approach with crowdcoding. Quality & Quantity, 1-24.
Socher et al
• Recursive Neural Network of sentiment
• Crowd sourced 215k phrases in 12k sentences
Existing work: summary
• Promising results of crowd sourcing
• 3 use cases / approaches for crowd results
  • Direct use
  • Build dictionary
  • Build statistical model
Our Objectives
• Build sentiment tools for Dutch, English
• Compare 3 approaches
• Evaluate accuracy, costs, generalizability
ICA Panel proposal: Automatic Sentiment Analysis for Communication Research (Wouter van Atteveldt and Pablo Barbera)
Using crowdsourcing for developing an attributed sentiment analysis tool (Wouter van Atteveldt, Antske Fokkens, Isa Maks, Kevin van Veenen, and Mariken van der Velden)
Method
• Tweets and newspaper sentences about Ukraine treaty
• Coded manually and by CrowdFlower
• Compared to sentiment dictionary and Coosto
  • (Coosto is a social media analytics company that does sentiment analysis)
• (n~200)
Kevin van Veenen (MA thesis 2016), Methodological study of automatic sentiment analysis in political news
Results
Method                         n     Reliability (kappa)   Correlation (rho)
All sentences:
  Expert vs Dictionary         139   .38                   .48
  Expert vs Domain Dictionary  158   .58                   .58
  Expert vs Crowd              190   .86                   .86
Tweets:
  Expert vs Coosto             132   .49                   .56
  Dictionary vs Coosto         104   .63                   .68
  Domain dictionary vs Coosto  109   .43                   .55
  Crowd vs Coosto              132   .52                   .59
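For readers unfamiliar with the reliability column, the sketch below computes Cohen's kappa, i.e. agreement between two codings corrected for chance agreement. This assumes nominal labels and two coders; the slides do not say which kappa variant or weighting was used for the table, and the example labels are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of nominal labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Proportion of units on which both codings agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label distribution
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both codings consist of one and the same label
    return (observed - expected) / (1 - expected)

# Made-up example labels, not data from the study
expert = ["pos", "pos", "neg", "neg", "neu", "neu"]
crowd = ["pos", "pos", "neg", "neu", "neu", "neu"]
print(round(cohens_kappa(expert, crowd), 2))  # 0.75
```

In the example the two codings agree on 5 of 6 units (0.83), chance agreement is 0.33, and kappa is therefore (0.83 - 0.33) / (1 - 0.33) = 0.75.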
Accuracy by number of coders
Costs
Task                 N     Cost ($)   codings / $
Expert coding        190   87.5       2.2
Crowd (15 codings)   450   10.78      42
Crowd (3 codings)    480   4.02       119
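The last column of the cost table follows directly from the first two. A short check, assuming N is the number of codings and the cost is in dollars, as the column header "codings / $" suggests:

```python
# (N, cost) per task, copied from the table above
costs = {
    "Expert coding": (190, 87.50),
    "Crowd (15 codings)": (450, 10.78),
    "Crowd (3 codings)": (480, 4.02),
}

def codings_per_dollar(n, cost):
    """Number of codings obtained per dollar spent."""
    return n / cost

for task, (n, cost) in costs.items():
    print(f"{task}: {codings_per_dollar(n, cost):.1f} codings per dollar")
```

On these figures a dollar buys about 2.2 expert codings against about 119 crowd codings in the 3-codings task, a difference of roughly a factor of 55.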
Conclusion
• Crowd sourcing is cheap and accurate
• To-do:
  • Finish English task
  • Train machine learning / dictionary with crowd data
  • Error analysis / validation
  • Create easy tool for using results
Other work
Some other things I am working on:
• AmCAT Text Analysis Toolkit
• NLPipe Linguistic Processing
• Clause analysis and Source Detection
• Scraping Cantonese
AmCAT
• Amsterdam Content Analysis Toolkit
• Open source web-based text analysis tool
  • Automatic analysis, quantitative manual analysis
  • Multi-user, permissions per project
  • REST API, integration with R/python
• Set up your own server or use ours
  • (experimental docker support)
• http://wiki.amcat.nl, https://amcat.nl
• http://github.com/amcat/amcat
NLPipe
• Natural Language Pipelining
• Easy docker-installable server for NLP
• Lemmatizing, POS tagging, parsing
• Integrated with R / Python
• Modules for Dutch, English
• http://github.com/vanatteveldt/nlpipe
Automatic Text Analysis Made Easy: Using AmCAT, NLPipe and R to do corpus management, linguistic processing, and automatic text analysis. Wouter van Atteveldt, Kasper Welbers, Antske Fokkens, Nel Ruigrok, Martijn Bastiaan, Christian Stuart (ICA 2017)
Clause Analysis / RSyntax
• Detect source use and clauses
• Who does what to whom
• http://vanatteveldt.com/2016-clause-analysis/ (In press, Political Analysis)
• http://github.com/vanatteveldt/rsyntax
Scraping Cantonese
• With Chris Fei Shen
• HK Discussion forums in Cantonese
• No segmenter exists for Cantonese
  • Cannot do corpus analysis, machine learning etc.
• Scrape discusshk.com, build segmenter from trigrams/bigrams
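One simple way to build a segmenter from character n-grams is sketched below. This is an illustration under stated assumptions, not the method used in this project: frequent character bigrams and trigrams are treated as words, and text is segmented by greedy longest match. The function names, the `min_count` threshold and the three example strings are all hypothetical.

```python
from collections import Counter

def build_lexicon(texts, min_count=2):
    """Collect character bigrams and trigrams seen at least min_count times."""
    counts = Counter()
    for text in texts:
        for size in (2, 3):
            for i in range(len(text) - size + 1):
                counts[text[i:i + size]] += 1
    return {gram for gram, count in counts.items() if count >= min_count}

def segment(text, lexicon):
    """Greedy longest match: try a trigram, then a bigram, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for size in (3, 2):
            gram = text[i:i + size]
            if len(gram) == size and gram in lexicon:
                break
        else:
            gram = text[i]  # no known n-gram starts here
        tokens.append(gram)
        i += len(gram)
    return tokens

# Tiny made-up corpus standing in for scraped forum posts
corpus = ["我哋去香港", "香港好靚", "我哋好"]
lexicon = build_lexicon(corpus)
print(segment("我哋去香港", lexicon))  # ['我哋', '去', '香港']
```

Raw frequency is a crude criterion; on a real corpus an association measure over the n-gram counts would probably be needed to separate words from frequent but accidental character sequences.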
Conclusion
• I like automatic text analysis :)
• All tools, code available on github
• Try it out and let me know!
Slides: http://vanatteveldt.com/cityu_seminar