Consumer sentiment analysis with Twitter
Reetta Suonperä, August 2013
My dataset
• Two months, one csv.gz file per day
• In total about 1.2 billion tweets
• It's always easy for a person to say get over, but you don't feel what heart feels to make that statment|PrettynPinkC215|2011-02-01T04:01:16Z|2011-02-01T04:00:48Z|1296532876139018784|
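A minimal sketch of how one daily file might be read, assuming the layout suggested by the example record above: pipe-delimited fields ending in a trailing pipe. The field names (text, user, two timestamps, tweet id) are guesses from that one record, not a documented schema, and `parse_line` and `read_tweets` are hypothetical helpers.

```python
import gzip
from typing import Iterator, NamedTuple


class Tweet(NamedTuple):
    # Field names are assumptions inferred from the single example record.
    text: str
    user: str
    time_a: str
    time_b: str
    tweet_id: str


def parse_line(line: str) -> Tweet:
    """Split one record from the right, so pipes inside the tweet text survive."""
    parts = line.rstrip("\n").rsplit("|", 5)
    if len(parts) != 6:
        raise ValueError("unexpected number of fields")
    text, user, time_a, time_b, tweet_id, _trailing = parts
    return Tweet(text, user, time_a, time_b, tweet_id)


def read_tweets(path: str) -> Iterator[Tweet]:
    """Stream tweets from one gzipped daily file without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            try:
                yield parse_line(line)
            except ValueError:
                continue  # skip malformed records in this sketch
```

Splitting from the right is a design choice: the tweet text is the only field likely to contain the delimiter itself. Streaming line by line matters at this scale, since a single day holds many millions of tweets.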
The tools I use
• General approach: natural language processing (NLP)
• The Natural Language Toolkit (NLTK)
Introduction: the consumer sentiment index
• A survey-based indicator of consumer confidence or sentiment
• History goes back to 1946 at the University of Michigan
• Ireland’s consumer sentiment index by the ESRI since 1996
ESRI survey questions
• Q1: Economic situation in the country (next 12 months)
• Q2: Unemployment in the country (next 12 months)
• Q3: Household financial situation (12 months ago)
• Q4: Household financial situation (next 12 months)
• Q5: Good/bad time to buy large household items
Answers: positive/neutral/negative
This is what it looks like: the KBC/ESRI consumer sentiment index
We can speculate on what drives sentiment – but we can’t really know
On the June 2013 improvement in households’ assessment of their personal finances:
“We think that the ECB rate cut in May played some role … a combination of low inflation, early summer sales and increasing signs of improvement in the residential property market could have contributed…”
On the decline in the July 2013 index:
“We think reports that the Irish economy had fallen back into recession and a couple of high profile job loss announcements unnerved consumers last month.”
Motivation: why using Twitter could help
• More timely
• Continuous information
• Save money
• Insight into what drives sentiment
Previous research
• O’Connor et al (2010): From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
• An index based on tweets containing the word “jobs” correlates with the Michigan index and Gallup’s daily poll
• Indices based on the words “economy” or “job” correlate poorly!
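A rough sketch of how a tweet-based index could be compared with a survey index. The smoothing window and the use of plain Pearson correlation are assumptions for illustration, not necessarily the exact recipe of the paper; `moving_average` and `pearson` are hypothetical helper names.

```python
from math import sqrt
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Average each value with up to window - 1 preceding values."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

In use, a daily score (for example, the ratio of positive to negative tweets mentioning a keyword) would be smoothed with `moving_average`, aligned to the survey dates, and passed to `pearson` together with the survey values.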
The process (simplified)
Initial wordlist topics
• General economic situation: general economy, good times, bad times, econ policy
• Unemployment/employment: job losses, job gains
• Household financial situation: general, income, credit, feeling broke, feeling flush
• Buying climate for major household items: acquire/buy, cost, pricey, bargain
Using WordNet to expand seed wordlist
• Use WordNet to find synonyms for initial keyword list:
• Words have many different meanings
• Include part-of-speech tag
• Word doesn’t exist in WordNet?
• Output does not include tenses or plurals
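The expansion step above might look roughly like this with NLTK's WordNet interface. This is a sketch, not the code used in the talk: `expand_word` and `clean_lemma` are hypothetical helpers, and the WordNet corpus must be installed first (for example via `nltk.download("wordnet")`).

```python
from typing import Optional, Set


def clean_lemma(lemma_name: str) -> str:
    """WordNet joins multi-word lemmas with underscores; turn them into plain phrases."""
    return lemma_name.replace("_", " ").lower()


def expand_word(word: str, pos: Optional[str] = None) -> Set[str]:
    """Return the seed word plus the lemma names of all its synsets.

    pos is a WordNet part-of-speech tag ("n", "v", "a", "r") or None for all.
    Restricting pos limits the senses that are pulled in.
    """
    # Imported here so the module still loads when NLTK data is not installed.
    from nltk.corpus import wordnet

    expanded = {word}
    # synsets() returns an empty list for unknown words, so the seed is kept as-is.
    for synset in wordnet.synsets(word, pos=pos):
        for name in synset.lemma_names():
            expanded.add(clean_lemma(name))
    return expanded
```

Because WordNet returns base forms only, plurals and tenses would still have to be handled separately, for instance by stemming both the wordlist and the tweets.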
Pre-processing tasks
• Regular expressions for more basic tasks:
• Cleaning, tokenising URLs, usernames
• NLTK functionality for more complex tasks
• Stopword removal, stemming, POS-tagging
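A small sketch of the regular-expression part of the pre-processing. The patterns and the `<url>` and `<user>` placeholder tokens are illustrative choices, not the ones used in the talk.

```python
import re
from typing import List

# Illustrative patterns; real tweets will need more careful handling.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
USER_RE = re.compile(r"@\w+")
# A token is a placeholder, or a word with an optional leading '#' and inner apostrophe.
TOKEN_RE = re.compile(r"<url>|<user>|#?\w+(?:'\w+)?")


def preprocess(text: str) -> List[str]:
    """Replace URLs and usernames with placeholders, lowercase, and tokenise."""
    text = URL_RE.sub(" <url> ", text)
    text = USER_RE.sub(" <user> ", text)
    return TOKEN_RE.findall(text.lower())
```

Replacing URLs and usernames with placeholders keeps the information that they were present while removing the near-unique strings themselves. The resulting token list can then be handed to NLTK for stopword removal, stemming and POS-tagging.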
Fine selection – not there yet…
• Do more filtering using bigrams?
• “I broke”
• “pay cut”
• “new job”
• Use POS tags?
• Classification?
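One way the bigram idea could be tried out, using the three bigrams from the slide. The talk does not say whether a matching bigram should include or exclude a tweet, so this sketch only reports which target bigrams occur; `find_bigrams` is a hypothetical helper that expects lowercased tokens.

```python
from typing import Iterable, List, Sequence, Tuple

Bigram = Tuple[str, str]

# The three example bigrams from the slide, lowercased.
TARGET_BIGRAMS = {("i", "broke"), ("pay", "cut"), ("new", "job")}


def find_bigrams(tokens: Sequence[str],
                 targets: Iterable[Bigram] = TARGET_BIGRAMS) -> List[Bigram]:
    """Return the target bigrams that occur in the token sequence, in order."""
    target_set = set(targets)
    # zip pairs each token with its successor, giving all adjacent pairs.
    pairs = zip(tokens, tokens[1:])
    return [pair for pair in pairs if pair in target_set]
```

A bigram such as "new job" is more specific than the single word "job", which may help separate tweets about a person's own situation from general chatter.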
The to-do list
• Finalise fine selection
• Sentiment classification
• Visualisation
Resources
• www.nltk.org
• Natural Language Processing with Python: http://nltk.org/book/
• Python Text Processing with NLTK 2.0 Cookbook
• O’Connor et al (2010): From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
• Bollen et al (2011): Twitter mood predicts the stock market
• Bollen et al (2011): Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena
• Go et al (2009): Twitter sentiment classification using distant supervision
• Jiang et al (2011): Target-dependent Twitter Sentiment Classification
Questions?