
The Challenges in Analyzing Twitter Data for Public Opinion Researchers

Masahiko Aida, Director of Analytics


WHY TWITTER?

© 2012 All Rights Reserved. 1

• Access to Twitter data is open, unlike Facebook.
• The user base is large:
– 140 million users (compared with 901 million Facebook users and 100 million Google+ users)
• Ubiquitous among politicians in the US (as of May 2012):
– 375 House members (of 435)
– 92 Senators (of 100)
– 49 Governors (of 51)


CHALLENGES IN TWITTER ANALYSIS

• Sampling and Coverage Problem
– The volume of Twitter data can be large, and it can be costly to obtain and store.
– Coverage issue

• Text Analysis Problem 

• Inference Problem 


DATA SIZE AND SAMPLING


Source: Twitter

• During the State of the Union address, there were 766,681 SOTU-related tweets in 95 minutes (about 8,070 tweets per minute). Stored, the data is approximately 600 MB.
• Imagine saving all the tweets during an election year.
• However, one can sample and save a subset.
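One common way to save a subset without storing the full stream is reservoir sampling. The sketch below is an illustration only, not the method used in the talk; the function name `reservoir_sample` and the stand-in stream of numbered strings are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Stand-in for the SOTU stream: 766,681 placeholder "tweets" (count from the slide).
tweets = (f"tweet-{n}" for n in range(766_681))
subset = reservoir_sample(tweets, k=1_000)
print(len(subset))
```

Memory use depends only on `k`, so the same loop works whether the stream lasts 95 minutes or an election year.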


COVERAGE – OBAMA TWEETS EXAMPLE


• Ideally, we want to assign a non-zero chance of selection to every tweet that discusses a particular topic.
• However, with 340 million tweets a day, it is extremely inefficient to pull a random sample.
– Ex. About 20,000 tweets a day include “Barack Obama”, which is 0.0059% of all tweets.
• Another possibility is to query related words such as “the president”.
– However, this will increase noise.
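The inefficiency can be checked with a short calculation. The two daily counts come from the slide; the target of 1,000 matching tweets is a hypothetical number chosen for illustration.

```python
TWEETS_PER_DAY = 340_000_000    # from the slide
OBAMA_TWEETS_PER_DAY = 20_000   # from the slide

# Share of all tweets that mention "Barack Obama".
share = OBAMA_TWEETS_PER_DAY / TWEETS_PER_DAY
print(f"Share of all tweets: {share:.4%}")

# Hypothetical goal: how many randomly drawn tweets are needed,
# on average, to end up with 1,000 that mention "Barack Obama"?
target_hits = 1_000
needed = target_hits / share
print(f"Random tweets needed for about {target_hits:,} hits: {needed:,.0f}")
```

Under these figures, a simple random sample would have to pull roughly 17 million tweets to yield about 1,000 relevant ones.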

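Figure: Venn diagram of the tweet universe. The query sets “Barack”, “Obama”, and “The President” overlap; tweets outside every query are missed (φ), and the broader queries also capture irrelevant tweets.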


POTENTIAL SOLUTION FOR COVERAGE ERRORS


• Stratified approach: create a list of users from a keyword query, then pull tweets by targeting those user IDs.

Stratum            Users in stratum                                    Cost
High density       Tweeted “Obama” within the last 3 days              Least expensive
Mid density        Tweeted “Obama” within the last week                Inexpensive
Low density        Tweeted “Obama” within the last month               Expensive
Very low density   Have not tweeted “Obama” for more than 1 month      Cost prohibitive
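The stratified approach can be sketched as follows. The stratum cutoffs follow the slide, with "1 month" taken as 30 days; the sampling rates, the function names, and the synthetic user list are assumptions made for illustration. Each sampled user carries a weight equal to the inverse of the stratum's sampling rate.

```python
import random
from collections import Counter

def stratum_for(days_since_last_mention):
    """Map a user to a density stratum (None means no keyword tweet was observed)."""
    if days_since_last_mention is None or days_since_last_mention > 30:
        return "very_low"
    if days_since_last_mention <= 3:
        return "high"
    if days_since_last_mention <= 7:
        return "mid"
    return "low"

# Hypothetical sampling rates: cheaper strata are sampled more heavily.
# A rate of 0 for "very_low" reflects the "cost prohibitive" row and
# means that stratum stays uncovered.
SAMPLING_RATE = {"high": 0.50, "mid": 0.20, "low": 0.05, "very_low": 0.0}

def draw_stratified_sample(users, seed=0):
    """Return (user_id, stratum, weight) tuples; weight = 1 / sampling rate."""
    rng = random.Random(seed)
    sample = []
    for user_id, days in users.items():
        stratum = stratum_for(days)
        rate = SAMPLING_RATE[stratum]
        if rate > 0 and rng.random() < rate:
            sample.append((user_id, stratum, 1.0 / rate))
    return sample

# Synthetic users: days since each user's last tweet containing the keyword.
users = {f"user{n}": n % 45 for n in range(1, 2001)}
sample = draw_stratified_sample(users)
print(Counter(stratum for _, stratum, _ in sample))
```

Because the very low density stratum is never sampled, estimates built this way still carry coverage error for users who rarely mention the keyword.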


TEXT ANALYSIS


• Many of our tools assume numeric data, and it is very difficult to map text sentences onto numeric values.
• Several vendors offer rule-based sentiment scores.
– Ex. like, love, greedy, enthusiastic
– Cannot handle sarcasm.
– Vendors use secret proprietary algorithms to code sentiment.
• Alternative: supervised learning methods
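A minimal sketch of a rule-based scorer shows why sarcasm is a problem. The lexicon uses the slide's example words; the polarity assigned to each word and the function name `rule_based_score` are assumptions, and no vendor's actual algorithm is represented here.

```python
import re

# Tiny lexicon from the slide's example words; polarities are assumed.
LEXICON = {"like": 1, "love": 1, "enthusiastic": 1, "greedy": -1}

def rule_based_score(text):
    """Sum the polarity of every lexicon word found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

print(rule_based_score("I love this plan"))                          # 1
print(rule_based_score("Greedy politicians"))                        # -1
print(rule_based_score("Oh great, I just love paying more taxes"))   # 1
```

The third tweet is sarcastic, yet it receives the same positive score as the first because the rule only sees the word "love".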


HOW SENTIMENT CODING WORKS


Workflow:
1. Tweets
2. Human coder classifies and assigns sentiment (training dataset)
3. Create supervised learning models
4. Tweets (the remaining tweets, to be classified by the models)

Example: RTextTools. Timothy P. Jurka, Loren Collingwood, Amber E. Boydstun, Emiliano Grossman and Wouter van Atteveldt (2012). RTextTools: Automatic Text Classification via Supervised Learning. R package version 1.3.6. http://CRAN.R-project.org/package=RTextTools
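The workflow above can be illustrated with a very small classifier. The talk cites the R package RTextTools; the sketch below does not use it and instead swaps in a hand-written multinomial naive Bayes model in Python so that the example is self-contained. The training tweets, labels, and function names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(labeled_tweets):
    """Collect document and word counts per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Step 2 of the workflow: hypothetical tweets coded by a human.
training = [
    ("love the new jobs plan", "positive"),
    ("great speech tonight", "positive"),
    ("so proud of this campaign", "positive"),
    ("terrible tax policy", "negative"),
    ("greedy and out of touch", "negative"),
    ("worst debate performance", "negative"),
]
model = train_naive_bayes(training)         # step 3: fit the model
print(predict(model, "great jobs plan"))    # step 4: classify new tweets
print(predict(model, "terrible and greedy"))
```

With only six training tweets this is a toy; a real application would need a far larger hand-coded set and a held-out sample to check accuracy.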


OTHER TEXT ANALYSIS EXAMPLES


• Forget about quantifying; use a strictly visual approach.
– Appearances of nouns and adjectives
– Process tweets with natural language processing software
• Network analysis
– Identify “influentials” in the network
• Data: Twitter data that includes the following:
1. Mitt Romney
2. WI recall election


FREQUENT TERMS THAT DESCRIBE ROMNEY


Figure: frequent terms in tweets about Romney, with annotations:
– Increased mentions of Romney’s money
– War on Women
– Romney’s proposed tax rate

Data: Sample of tweets that include “Mitt Romney”, processed with a natural language analysis package in Python.
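A rough idea of the term counting behind such a chart is sketched below. The talk used a natural language package to pick out nouns and adjectives; this sketch swaps in a simple stopword filter instead so that it runs with the standard library alone. The stopword list, the sample tweets, and the function name are assumptions.

```python
import re
from collections import Counter

# Small assumed stopword list; it also drops the query terms themselves.
STOPWORDS = {"the", "a", "an", "of", "on", "is", "his", "and", "to", "in",
             "mitt", "romney", "s"}

def frequent_terms(tweets, top_n=5):
    """Count the remaining words across all tweets and return the most common."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)

# Hypothetical tweets that match the query "Mitt Romney".
tweets = [
    "Mitt Romney's money is the story again",
    "Mitt Romney on the war on women",
    "Mitt Romney tax rate and his money",
    "Mitt Romney money money money",
]
print(frequent_terms(tweets, top_n=1))  # [('money', 5)]
```

Running the same count on daily batches would show when a term such as "money" starts to rise.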


GOP PRIMARY POLLS AND TWITTER


1st row: GOP primary public polling summary.
2nd row: frequency of candidate names from the Twitter sample.


VISUALIZING MENTIONS: WI RECALL ELECTION


Method: Collect a sample of tweets that include “WI re-election” or “Scott Walker”, then visualize the relationships among mentions.

Figure: mention network, with clusters labeled “Liberal News Media”, “Tea Party Types”, and “Rasmussen poll”.
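The data behind a mention network can be assembled along these lines. This is a sketch under stated assumptions: the account names and tweets are invented, and "influentials" are approximated here by in-degree (how many distinct accounts mention an account), which is only one of several possible measures.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count directed edges (author -> mentioned account) from (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # skip self-mentions
                edges[(author.lower(), target.lower())] += 1
    return edges

def in_degree(edges):
    """Number of distinct accounts that mention each account."""
    degree = Counter()
    for _source, target in edges:
        degree[target] += 1
    return degree

# Hypothetical tweets matching the recall-election query.
tweets = [
    ("alice", "Reading @newsdesk on the recall"),
    ("bob", "@newsdesk has new numbers, cc @alice"),
    ("carol", "RT @newsdesk: turnout is high"),
    ("carol", "agree with @bob"),
]
edges = mention_edges(tweets)
print(in_degree(edges).most_common(1))  # [('newsdesk', 3)]
```

The resulting edge list can be passed to a graph layout tool to produce a picture like the one on the slide.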


CONSERVATIVE TWITTER ACCOUNTS


Figure: network of conservative Twitter accounts; one account is annotated “Brother of Rush Limbaugh”.

Network visualization allows us to see popular news sources and how mentions are clustered ideologically.


PROBLEM OF INFERENCE I


• We may develop better methods for sampling and for predicting sentiment, so the measurement of Twitter opinion will improve as we gain experience.

• However, the distribution of opinions on Twitter is not directly transferable to the opinions of the general public or likely voters.

• Can we find a way to infer the opinions of the general  population from Twitter data?  Maybe.


PROBLEM OF INFERENCE II


• The purpose of research is not necessarily to obtain unbiased point estimates of a population parameter.
• No smoke where there is no fire.
– Ex. Suppose one person claims “All swans are white.”
– I need just one black swan to prove that is not the case.


PROBLEM OF INFERENCE III : MODEL BASED


• If we can approximate the mechanism that separates Twitter users from the general public, and estimate opinion with the correct specification, the bias will be smaller.
– Ex. Heckman model, sample matching (YouGov Polimetrix surveys)
– Ex. Twitter sentiment of a college-educated gay black man who lives in Ohio
– Likely supports Obama and does not favor Romney.
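A simple relative of these model-based adjustments is post-stratification: estimate sentiment within demographic cells, then weight the cells by their share of the target population rather than their share of Twitter users. The sketch below is not the Heckman model or sample matching, and every number in it is hypothetical. It also assumes that the demographics of Twitter users are known, which is rarely the case in practice.

```python
def poststratified_mean(cell_means, shares):
    """Average the cell means, weighting each cell by its share."""
    missing = set(shares) - set(cell_means)
    if missing:
        raise ValueError(f"no Twitter data for cells: {sorted(missing)}")
    total_share = sum(shares.values())
    return sum(cell_means[cell] * shares[cell] for cell in shares) / total_share

# All numbers below are hypothetical.
twitter_cell_means = {"college": 0.60, "no_college": 0.40}  # mean sentiment per cell
twitter_shares = {"college": 0.70, "no_college": 0.30}      # who is on Twitter
population_shares = {"college": 0.30, "no_college": 0.70}   # target population

raw = poststratified_mean(twitter_cell_means, twitter_shares)
adjusted = poststratified_mean(twitter_cell_means, population_shares)
print(f"raw Twitter mean: {raw:.2f}, reweighted: {adjusted:.2f}")
```

The adjustment only removes bias that is explained by the chosen cells; if Twitter users differ from non-users within a cell, the reweighted figure is still biased.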


SUMMARY


• Technical challenges that can be solved
– Sampling and coverage issues in Twitter sampling
– Mapping of text data onto a scale
• Problems that are hard to solve in the near future
– Generalization of the opinion distribution to the general public
• Think differently
– Use Twitter to find emerging or rare patterns
– Use Twitter to see how people are obtaining information
– Different types of inference (find smoke)