Diving into Twitter data on consumer electronic brands

Upload: eugene-yan

Post on 26-Jan-2015

Category:

Data & Analytics


DESCRIPTION

Which consumer electronic brands get tweeted about most? Which brands have more positive or negative sentiment? To find out, 15.3 GB of tweets were downloaded between 13 and 25 May using Python and then analysed in R.

TRANSCRIPT

Page 1: Diving into Twitter data on consumer electronic brands

Diving into Twitter data on consumer electronic brands

Page 2: Diving into Twitter data on consumer electronic brands

Which brands get tweeted about most? Is the sentiment mainly positive or negative?

Page 3: Diving into Twitter data on consumer electronic brands

15.3 GB of JSON data was downloaded from Twitter’s Streaming API between 13 and 25 May using Python

Page 4: Diving into Twitter data on consumer electronic brands

Before processing, tweets were in raw JSON format

Time created

Tweet text/status

Username

Tweet location (if available)

No. of followers

No. of people followed

No. of statuses

Language

Data should be optimized because only a fraction of it is used for analysis; optimization improves model performance and saves cost and time

Page 5: Diving into Twitter data on consumer electronic brands

The same tweet we saw previously

By optimizing the data, 15.3 GB of JSON was converted to 757 MB of CSV (5% of original size)

After processing, only some fields were retained and converted to CSV format
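
The conversion described on this slide might look like the minimal Python sketch below. It assumes the raw file holds one JSON object per line and uses field names from Twitter's v1.1 tweet object (`created_at`, `text`, `lang`, and the `user` sub-fields). The column selection and the helper names `flatten_tweet` and `json_lines_to_csv` are illustrative assumptions, not taken from the author's scripts.

```python
import csv
import io
import json

# Columns kept in the CSV; this selection is an assumption based on the
# fields highlighted on the previous slide.
FIELDS = ["created_at", "text", "screen_name", "location",
          "followers_count", "friends_count", "statuses_count", "lang"]


def flatten_tweet(tweet):
    """Pick a handful of fields out of one parsed tweet object."""
    user = tweet.get("user") or {}
    return {
        "created_at": tweet.get("created_at", ""),
        # Newlines inside the text are replaced so each tweet stays on one row.
        "text": (tweet.get("text") or "").replace("\n", " "),
        "screen_name": user.get("screen_name", ""),
        "location": user.get("location") or "",
        "followers_count": user.get("followers_count", 0),
        "friends_count": user.get("friends_count", 0),
        "statuses_count": user.get("statuses_count", 0),
        "lang": tweet.get("lang", ""),
    }


def json_lines_to_csv(lines, out):
    """Write one CSV row per valid tweet; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip truncated or malformed records
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # skip non-tweet messages such as delete notices
        writer.writerow(flatten_tweet(tweet))
        kept += 1
    return kept
```

With real files, `lines` would be an open file handle on the raw JSON and `out` a file opened with `newline=""`, so the data is processed one line at a time rather than loaded into memory.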

Page 6: Diving into Twitter data on consumer electronic brands

Example tweets tagged by brand, with positive, negative, and mixed sentiment

The list of words for sentiment analysis is adapted from the Harvard General Inquirer dictionaries

Source: http://www.wjh.harvard.edu/~inquirer/homecat.htm, downloaded on 28 May 2014

Tweets are then tagged for brand and sentiment in R
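
The tagging itself was done in R; the Python sketch below only illustrates the general idea of dictionary-based tagging. The brand list is the seven keywords used for collection, while the `POSITIVE` and `NEGATIVE` sets are tiny made-up stand-ins for the adapted General Inquirer lists. Matching on whole tokens is a design choice of this sketch, not necessarily what the original script does.

```python
import re

BRANDS = ["samsung", "sony", "nokia", "htc", "huawei", "blackberry", "motorola"]

# Hypothetical stand-ins for the much longer sentiment word lists.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "hate", "broken"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tag_brands(text):
    """Return the brands mentioned in the text, in BRANDS order."""
    tokens = set(tokenize(text))
    return [brand for brand in BRANDS if brand in tokens]


def tag_sentiment(text):
    """Return 'positive', 'negative', 'mixed', or 'neutral'."""
    tokens = set(tokenize(text))
    has_pos = bool(tokens & POSITIVE)
    has_neg = bool(tokens & NEGATIVE)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"
```

For example, `tag_sentiment("love the screen, hate the battery")` is tagged as mixed because it contains a word from each list.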

Page 7: Diving into Twitter data on consumer electronic brands

Initially, tweets were collected based on 17 keywords

Samsung

S4

Xperia

HTC

Huawei

BlackBerry

Apple

S5

Sony

Nokia

Note 3

Lumia

q5

iPhone

q10

z10

Motorola

Page 8: Diving into Twitter data on consumer electronic brands

“Apple” and “iPhone” accounted for 87% of tweet volume

They were removed from the keywords during actual data collection to focus on other brands (and to save space and reduce bandwidth usage)

A trial was conducted with 16 keywords on 11 May, 8 – 9 am

1 GB of JSON data was collected in an hour

During a one-hour trial, “Apple” and “iPhone” had an 87% share of tweets
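
A share like the 87% above can be estimated by counting the tweets that mention any keyword in a group. The sketch below is one way to do that in Python; the function name `share_mentioning` and the sample tweets are made up for illustration, and whole-word, case-insensitive matching is an assumption about how a mention is defined.

```python
import re


def share_mentioning(texts, keywords):
    """Return the fraction of texts that mention at least one keyword."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    texts = list(texts)
    if not texts:
        return 0.0
    hits = sum(1 for text in texts if pattern.search(text))
    return hits / len(texts)


# Made-up sample; "pineapple" does not count as a mention of "Apple".
sample = [
    "New iPhone rumor",
    "Apple event today",
    "Nokia Lumia deal",
    "pineapple recipe",
]
print(share_mentioning(sample, ["Apple", "iPhone"]))  # 0.5
```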

Page 9: Diving into Twitter data on consumer electronic brands

Samsung

Sony

Nokia

HTC

Huawei

BlackBerry

Motorola

Tweets containing any of these seven keywords were collected from 13 – 25 May

Page 10: Diving into Twitter data on consumer electronic brands

4% of tweets mentioned more than one brand; they were excluded from analysis

8% of tweets had mixed sentiment (i.e., both positive and negative sentiment); they were excluded from analysis

92% of tweets remained, each mentioning only 1 brand with either “positive”, “negative”, or “neutral” sentiment

3,681,942 tweets were collected

After processing, 3,234,678 tweets remained for analysis
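
The exclusion rules on this slide can be sketched as a simple filter. The record layout (a dict with a `brands` list and a `sentiment` string) and the function name `filter_for_analysis` are assumptions for illustration; the order of the two checks is also an assumption.

```python
def filter_for_analysis(tagged_tweets):
    """Keep tweets that mention exactly one brand and are not mixed in sentiment.

    Returns the kept tweets and a count of exclusions by reason.
    """
    kept = []
    excluded = {"not_single_brand": 0, "mixed_sentiment": 0}
    for tweet in tagged_tweets:
        if len(tweet["brands"]) != 1:
            excluded["not_single_brand"] += 1
        elif tweet["sentiment"] == "mixed":
            excluded["mixed_sentiment"] += 1
        else:
            kept.append(tweet)
    return kept, excluded


# Made-up records for illustration.
tagged = [
    {"brands": ["nokia"], "sentiment": "positive"},
    {"brands": ["samsung", "htc"], "sentiment": "neutral"},
    {"brands": ["sony"], "sentiment": "mixed"},
    {"brands": ["huawei"], "sentiment": "neutral"},
]
kept, excluded = filter_for_analysis(tagged)
print(len(kept), excluded)
```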

Page 11: Diving into Twitter data on consumer electronic brands

Samsung leads in Twitter buzz, followed by Sony and Nokia

Together, they make up 75% of Twitter buzz

Samsung is the clear leader in Twitter buzz, followed by Sony and Nokia

However, Samsung and Sony have wider product offerings than the other brands, which mainly focus on phones

Also, Huawei’s users may mainly be on Weibo, Renren, etc.

Page 12: Diving into Twitter data on consumer electronic brands

Most brands have a roughly 1:1 ratio of positive to negative tweets

Samsung is the exception, with a ratio of roughly 3:2

Most brands have a roughly equal ratio of positive to negative tweets
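
For reference, a positive-to-negative ratio per brand can be computed as in the sketch below. The `(brand, sentiment)` input format and the sample counts are made up for illustration; a ratio of 1.5 corresponds to the 3:2 figure quoted for Samsung.

```python
from collections import defaultdict


def sentiment_ratio_by_brand(records):
    """Return the positive-to-negative tweet ratio for each brand.

    Neutral tweets are ignored. A brand with no negative tweets gets
    float('inf').
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for brand, sentiment in records:
        if sentiment in ("positive", "negative"):
            counts[brand][sentiment] += 1
    ratios = {}
    for brand, c in counts.items():
        if c["negative"]:
            ratios[brand] = c["positive"] / c["negative"]
        else:
            ratios[brand] = float("inf")
    return ratios


# Made-up counts: 3 positive and 2 negative for one brand, 1 and 1 for another.
records = (
    [("samsung", "positive")] * 3
    + [("samsung", "negative")] * 2
    + [("nokia", "positive"), ("nokia", "negative"), ("nokia", "neutral")]
)
ratios = sentiment_ratio_by_brand(records)
```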

Page 13: Diving into Twitter data on consumer electronic brands

Dip due to connectivity issues

Brands’ share of tweets is roughly consistent over time

Page 14: Diving into Twitter data on consumer electronic brands

Spikes in tweet volume coincide with product launches

Page 15: Diving into Twitter data on consumer electronic brands

Spikes in tweet volume coincide with product launches

Page 16: Diving into Twitter data on consumer electronic brands

Users who tweet about BlackBerry tend to be better connected (i.e., higher median of followers and people followed)*

* Excluding outliers

Across brands, there is not much difference in user connectedness

The median user has around 250 followers and also follows 250 people
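
The per-brand medians on this slide can be reproduced along the lines of the sketch below. The row layout and the name `median_by_brand` are assumptions for illustration. The median is a natural choice here because a few accounts with millions of followers barely move it.

```python
from collections import defaultdict
from statistics import median


def median_by_brand(rows, field):
    """Return the median of `field` for each brand in `rows`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["brand"]].append(row[field])
    return {brand: median(values) for brand, values in grouped.items()}


# Made-up rows; one account with a very large following is included.
rows = [
    {"brand": "blackberry", "followers_count": 100},
    {"brand": "blackberry", "followers_count": 300},
    {"brand": "blackberry", "followers_count": 5000000},
    {"brand": "nokia", "followers_count": 200},
    {"brand": "nokia", "followers_count": 400},
]
medians = median_by_brand(rows, "followers_count")
```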

Page 17: Diving into Twitter data on consumer electronic brands

The 50th – 75th percentile of users who tweet about Sony, HTC, and Motorola have very high numbers of all-time tweets (spam bots perhaps?)*

While Nokia is 3rd in Twitter buzz share (14%), users who tweet about Nokia have the lowest numbers of all-time tweets

This suggests that these tweets are likely to come from real users and not bots (or maybe from less active bots)

* Excluding outliers

However, there is a large difference between users’ all-time tweets

Page 18: Diving into Twitter data on consumer electronic brands

12,833,979 followers

11,796,709 followers

CNN’s tweet on Obama’s BlackBerry was “seen” by the most followers

Page 19: Diving into Twitter data on consumer electronic brands

1,753,696 tweets

1,730,006 tweets

A bot that retweets about farts has the highest number of all-time tweets

Page 21: Diving into Twitter data on consumer electronic brands

Initially, BlackBerry tweets showed 100% negative sentiment

The culprit was the word “lack”, which appears inside “BlackBerry”; it was removed from the word list

However, removing it reduced negative sentiment for other brands by 2 – 3%

An interesting error led to BlackBerry having 100% negative sentiment
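
The error is easy to reproduce in a few lines. The sketch below is in Python rather than the original R, and it assumes the original tagging matched sentiment words as substrings; the function names and the one-word list are made up for illustration. It also shows one possible alternative to dropping “lack”: matching on word boundaries, so the word stays in the list for other brands.

```python
import re

NEGATIVE_WORDS = ["lack"]  # hypothetical one-word list for the demonstration


def has_negative_substring(text, words):
    """Substring matching: flags any text that contains a word's letters."""
    lowered = text.lower()
    return any(word in lowered for word in words)


def has_negative_word(text, words):
    """Whole-word matching: flags only standalone occurrences of a word."""
    pattern = r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b"
    return re.search(pattern, text.lower()) is not None


tweet = "Love my new BlackBerry"
print(has_negative_substring(tweet, NEGATIVE_WORDS))  # True: "lack" is inside "BlackBerry"
print(has_negative_word(tweet, NEGATIVE_WORDS))       # False
print(has_negative_word("There is a lack of apps", NEGATIVE_WORDS))  # True
```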

Page 22: Diving into Twitter data on consumer electronic brands

Track brands’ managed Twitter accounts and conversations to measure engagement: Which brands have better engagement with users and why?

Track general message of tweets: Are tweets of a brand mainly about sales, reviews, complaints, or news?

Network analysis to identify users with high centrality and influence: Which users have high influence and what are they tweeting about my brand?

Geospatial analysis of tweets: Are there differences in brand buzz, sentiment, and engagement across regions?

Where do we go from here?

Page 23: Diving into Twitter data on consumer electronic brands

Code available on GitHub: https://github.com/eugeneyan/Twitter-SMA

Python script to download tweets in JSON format

Python scripts to convert tweets from JSON to CSV (with & without regular expressions filtering)

R script and sentiment analysis list of words

R script and sentiment analysis list of words to reproduce BlackBerry error