Diving into Twitter data on consumer electronic brands
DESCRIPTION
Which consumer electronic brands get tweeted about most? Which brands have more positive/negative sentiment? To find out, 15.3 GB of tweets were downloaded from 13 – 25 May using Python and then analysed in R.
TRANSCRIPT
Diving into Twitter data on consumer electronic brands
Which brands get tweeted about most? Is it mainly positive or negative?
15.3 GB of JSON data downloaded from Twitter’s Streaming API between 13 – 25 May using Python
Before processing, tweets were in raw JSON format
Time created
Tweet text/status
Username
Tweet location (if available)
No. of followers
No. of people followed
No. of statuses
Language
Data should be optimized, as only a fraction of the data is used for analysis; optimization improves model performance and saves cost and time
The same tweet we saw previously
By optimizing the data, 15.3 GB of JSON was converted to 757 MB of CSV (5% of original size)
After processing, only some fields were retained and converted to CSV format
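A minimal sketch of the JSON-to-CSV step, not the author's actual script. It assumes one tweet per line of JSON and uses the Twitter API v1.1 key names (`created_at`, `text`, `user.screen_name`, and so on). Using the user's profile `location` for "tweet location" is an assumption, and the function names are hypothetical.

```python
import csv
import io
import json

# Columns kept in the CSV (assumed to mirror the fields listed on the slide).
FIELDS = ["created_at", "text", "screen_name", "location",
          "followers_count", "friends_count", "statuses_count", "lang"]


def tweet_to_row(tweet):
    """Flatten one tweet dict into the reduced set of fields."""
    user = tweet.get("user") or {}
    return {
        "created_at": tweet.get("created_at", ""),
        # Newlines inside tweets are replaced so each tweet stays on one CSV row.
        "text": tweet.get("text", "").replace("\n", " "),
        "screen_name": user.get("screen_name", ""),
        "location": user.get("location") or "",  # often missing
        "followers_count": user.get("followers_count", 0),
        "friends_count": user.get("friends_count", 0),
        "statuses_count": user.get("statuses_count", 0),
        "lang": tweet.get("lang", ""),
    }


def json_lines_to_csv(lines, out):
    """Write usable tweets from an iterable of JSON lines to `out` as CSV.

    Returns the number of tweets written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines in the stream
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or corrupt record
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # non-tweet messages such as delete notices
        writer.writerow(tweet_to_row(tweet))
        kept += 1
    return kept
```

With a file on disk, this could be called as `json_lines_to_csv(open("tweets.json", encoding="utf-8"), open("tweets.csv", "w", newline="", encoding="utf-8"))`; the file names are placeholders.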
Brand Positive Sentiment
Brand Negative Sentiment
Brand Mixed Sentiment
The list of words for sentiment analysis is adapted from the Harvard General Inquirer dictionaries
Source: http://www.wjh.harvard.edu/~inquirer/homecat.htm, downloaded on 28 May 2014
Tweets are then tagged for brand and sentiment in R
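The tagging was done in R. The following is only a Python sketch of the general idea, for illustration. The brand list follows the seven brands in this deck. The two word sets are tiny hypothetical stand-ins for the General Inquirer lists, and `tag_tweet` is a made-up name.

```python
import re

BRANDS = ["samsung", "sony", "nokia", "htc", "huawei", "blackberry", "motorola"]

# Hypothetical stand-ins; the real lists are adapted from the General Inquirer.
POSITIVE = {"love", "great", "good", "awesome"}
NEGATIVE = {"hate", "bad", "broken", "terrible"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_tweet(text):
    """Return (brands mentioned, sentiment label) for one tweet."""
    tokens = set(tokenize(text))
    brands = [b for b in BRANDS if b in tokens]
    has_pos = bool(tokens & POSITIVE)
    has_neg = bool(tokens & NEGATIVE)
    if has_pos and has_neg:
        sentiment = "mixed"
    elif has_pos:
        sentiment = "positive"
    elif has_neg:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return brands, sentiment
```

For example, `tag_tweet("I love my Samsung")` gives `(["samsung"], "positive")`.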
Initially, tweets were collected based on 17 keywords
Samsung
S4
Xperia
HTC
Huawei
BlackBerry
Apple
S5
Sony
Nokia
Note 3
Lumia
q5
iPhone
q10
z10
Motorola
“Apple” and “iPhone” accounted for 87% of tweet volume
They were removed from the keywords during actual data collection to focus on other brands (and to save space and reduce bandwidth usage)
A trial was conducted with 16 keywords on 11 May, 8 – 9am
1 GB of JSON data was collected in an hour
During a one-hour trial, “Apple” and “iPhone” had 87% share of tweets
Samsung
Sony
Nokia
HTC
Huawei
BlackBerry
Motorola
Tweets containing seven keywords were collected from 13 – 25 May
4% of tweets mentioned 2 or more brands; they were excluded from analysis
8% of tweets had mixed sentiment (i.e., positive and negative sentiment); they were excluded from analysis
92% of tweets remained, each only mentioning 1 brand with either “positive”, “negative”, or “neutral” sentiment
3,681,942 tweets were collected
After processing, 3,234,678 tweets remained for analysis
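The exclusion rules on this slide can be sketched as a small filter. This is an illustration under assumptions, not the original R code. It assumes each tweet is already tagged as a `(brands, sentiment)` pair, that "only mentioning 1 brand" means any tweet naming two or more brands is dropped, and that the multi-brand check runs before the mixed-sentiment check. The `no_brand` bucket is an extra safeguard that the slides do not mention.

```python
def filter_for_analysis(tagged):
    """Keep tweets with exactly one brand and a non-mixed sentiment.

    `tagged` is an iterable of (brands, sentiment) pairs, where `brands`
    is a list of brand names and `sentiment` is one of
    "positive", "negative", "neutral", or "mixed".
    Returns (kept, dropped): kept is a list of (brand, sentiment) pairs,
    dropped is a dict counting why tweets were excluded.
    """
    kept = []
    dropped = {"no_brand": 0, "multi_brand": 0, "mixed": 0}
    for brands, sentiment in tagged:
        if len(brands) == 0:
            dropped["no_brand"] += 1      # assumption: not analysed
        elif len(brands) > 1:
            dropped["multi_brand"] += 1   # cannot attribute sentiment to one brand
        elif sentiment == "mixed":
            dropped["mixed"] += 1         # both positive and negative words
        else:
            kept.append((brands[0], sentiment))
    return kept, dropped
```

Counting the reasons separately makes it easy to report what share of tweets each rule removed.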
Samsung leads in Twitter buzz, followed by Sony and Nokia
Together, they make up 75% of Twitter buzz
Samsung is the clear leader in Twitter buzz, followed by Sony and Nokia
However, Samsung and Sony have wider product offerings relative to the rest, which mainly focus on phones
Also, Huawei’s users may mainly be on Weibo, Renren, etc.
Most brands have a roughly 1:1 ratio of positive to negative tweets
Samsung is the exception, with a ratio of roughly 3:2
Brands have equal ratio of positive to negative tweets
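As an illustration of how such a ratio could be computed, here is a short sketch. It assumes the filtered tweets are available as `(brand, sentiment)` pairs; the function name and input shape are hypothetical, not taken from the original R script.

```python
from collections import Counter


def positive_to_negative_ratio(records):
    """Return {brand: positive count / negative count}.

    `records` is an iterable of (brand, sentiment) pairs. Neutral tweets
    are ignored. A brand with no negative tweets gets float("inf").
    """
    counts = Counter(records)
    brands = {brand for brand, _ in counts}
    ratios = {}
    for brand in brands:
        pos = counts[(brand, "positive")]
        neg = counts[(brand, "negative")]
        ratios[brand] = pos / neg if neg else float("inf")
    return ratios
```

A ratio of 1.0 corresponds to 1:1 and 1.5 to 3:2.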
Dip due to connectivity issues
Brands’ share of tweets is roughly consistent over time
Spikes in tweet volume coincide with product launches
Users who tweet about BlackBerry tend to be better connected (i.e., higher median of followers and people followed)*
* Excluding outliers
Across brands, there is not much difference in user connectedness
The median user has around 250 followers and also follows 250 people
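The slides do not say how outliers were excluded. The sketch below assumes the common 1.5 × IQR rule, purely to illustrate taking a median after trimming extreme follower counts; `median_without_outliers` is a hypothetical name.

```python
import statistics


def median_without_outliers(values, k=1.5):
    """Median of `values` after dropping points outside the IQR fences.

    Assumption: outliers are values below Q1 - k*IQR or above Q3 + k*IQR.
    With fewer than four values, the plain median is returned.
    """
    data = sorted(values)
    if len(data) < 4:
        return statistics.median(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    trimmed = [v for v in data if low <= v <= high]
    return statistics.median(trimmed)
```

For follower counts of `[100, 200, 250, 300, 400, 1_000_000]`, the million-follower account falls outside the fences and the median of the rest is 250.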
The 50th – 75th percentile of users who tweet about Sony, HTC, and Motorola have very high numbers of all-time tweets (spam bots perhaps?)*
While Nokia is 3rd in Twitter buzz share (14%), users who tweet about Nokia have the lowest number of all-time tweets
This suggests that these tweets are likely to come from real users and not bots (or maybe from less active bots)
* Excluding outliers
However, there is a large difference between users’ all time tweets
12,833,979 followers
11,796,709 followers
CNN’s tweet on Obama’s BlackBerry was “seen” by most followers
1,753,696 tweets
1,730,006 tweets
A bot that retweets on farts has the highest all time tweets
Initially, BlackBerry tweets showed 100% negative sentiment
The culprit was the word “lack”; it was removed from the word list
However, removing it reduced negative sentiment for other brands by 2 – 3%
An interesting error led to BlackBerry having 100% negative sentiment
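The slides do not say how words were matched. One plausible explanation, assumed here, is substring matching: "BlackBerry" contains the letters "lack", so every BlackBerry tweet would hit the negative list. The sketch below contrasts substring matching with whole-word matching; the word list and function names are hypothetical.

```python
import re

NEGATIVE_WORDS = ["lack", "broken"]  # hypothetical stand-in list


def has_negative_substring(text):
    """Substring matching: flags 'lack' inside 'BlackBerry'."""
    lowered = text.lower()
    return any(word in lowered for word in NEGATIVE_WORDS)


def has_negative_word(text):
    """Whole-word matching: \\b anchors stop matches inside longer words."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", lowered)
        for word in NEGATIVE_WORDS
    )
```

Under this assumption, whole-word matching would be an alternative to dropping "lack" from the list, keeping it available as a negative signal for genuine uses such as "a lack of apps".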
Track brands’ managed Twitter accounts and conversations to measure engagement: Which brands have better engagement with users and why?
Track general message of tweets: Are tweets of a brand mainly about sales, reviews, complaints, or news?
Network analysis to identify users with high centrality and influence: Which users have high influence and what are they tweeting about my brand?
Geospatial analysis of tweets: Are there differences in brand buzz, sentiment, and engagement across regions?
Where do we go from here?
Code available on GitHub: https://github.com/eugeneyan/Twitter-SMA
Python script to download tweets in JSON format
Python scripts to convert tweets from JSON to CSV (with & without regular expressions filtering)
R script and sentiment analysis list of words
R script and sentiment analysis list of words to reproduce BlackBerry error