AAPOR - comparing found data from social media and made data from surveys

"When Are Big Data Methods Trustworthy for Social Measurement?" Cliff Lampe (@clifflampe), Josh Pasek, Lauren Guggenheim, Fred Conrad (University of Michigan); Michael Schober (The New School for Social Research)

Upload: cliff-lampe

Post on 20-Dec-2014

DESCRIPTION

This presentation was given at the 2014 AAPOR conference. It examines specific ways in which "big data" from social media differs from data acquired through surveys.

TRANSCRIPT

Page 1: AAPOR - comparing found data from social media and made data from surveys

"When Are Big Data Methods Trustworthy for Social Measurement?"

Cliff Lampe (@clifflampe), Josh Pasek, Lauren Guggenheim, Fred Conrad
University of Michigan

Michael Schober
The New School for Social Research

Page 2: AAPOR - comparing found data from social media and made data from surveys

Presenting on “Big Data”

• Cliff Lampe
– University of Michigan School of Information
– Social Scientist who uses some Big Data techniques
– NOT A REAL DATA SCIENTIST
– Background in survey research

Page 3: AAPOR - comparing found data from social media and made data from surveys

Mostly publish in Computer Science conferences

Page 4: AAPOR - comparing found data from social media and made data from surveys

CHI – Computer Human Interaction
KDD – Knowledge Discovery and Data Mining
WSDM – Web Search and Data Mining

Page 5: AAPOR - comparing found data from social media and made data from surveys

Ironically Data-Free Presentation

Today we are presenting on methodological issues of Big Social Data and surveys. Not presenting new data.

First we describe Big Data and Big Social Data as terms.

Then we describe methodological considerations at the intersection of surveys and Big Social Data

Page 6: AAPOR - comparing found data from social media and made data from surveys

There have been many hyperbolic claims about Big Data

Is Big Data going to replace other forms of social measurement, or is it too flawed to survive? (HINT: Neither)

Page 7: AAPOR - comparing found data from social media and made data from surveys

What is Big Data?

Page 8: AAPOR - comparing found data from social media and made data from surveys

Big Data started in the physical sciences

Page 9: AAPOR - comparing found data from social media and made data from surveys

Big Data is increasingly being applied to social science questions

Page 10: AAPOR - comparing found data from social media and made data from surveys

What counts as “big”?

LHC: .001% of sensors lead to 25 petabytes annually.
Wikipedia: 17 terabytes
Twitter: ~ 10 GB/day

How many observations needed to count as “big”?

Note: 100 million records not all that big.
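To put the figures above on a common scale, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and takes the slide's rounded numbers at face value; the variable names are illustrative only.

```python
# Rough comparison of the data volumes quoted on the slide.
# Assumption: decimal units (1 TB = 1,000 GB; 1 PB = 1,000 TB).
GB_PER_TB = 1_000
TB_PER_PB = 1_000

twitter_gb_per_day = 10   # "~ 10 GB/day" from the slide
lhc_pb_per_year = 25      # "25 petabytes annually" from the slide
wikipedia_tb = 17         # "17 terabytes" from the slide

# Convert everything to terabytes so the numbers are comparable.
twitter_tb_per_year = twitter_gb_per_day * 365 / GB_PER_TB
lhc_tb_per_year = lhc_pb_per_year * TB_PER_PB

print(f"Twitter:   {twitter_tb_per_year:.2f} TB/year")
print(f"Wikipedia: {wikipedia_tb} TB")
print(f"LHC:       {lhc_tb_per_year:,} TB/year")
print(f"LHC vs. Twitter (annual): {lhc_tb_per_year / twitter_tb_per_year:,.0f}x")
```

On these numbers, a year of Twitter data is a few terabytes, which is consistent with the slide's point that social data sets are often not very "big" by physical-science standards.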

Page 11: AAPOR - comparing found data from social media and made data from surveys

Almost nobody who uses these techniques would use the term “big data”. Similar to surveys vs. polls.

Big Data is shorthand for a variety of techniques that include:

- Data capture
- Data storage
- Data analytics
- Search and Retrieval

Page 12: AAPOR - comparing found data from social media and made data from surveys

Challenges in “Big Data”

- Capture
- Curation
- Storage
- Search
- Sharing
- Transfer
- Analysis
- Visualization

Related terms:

Computational social science, data science, information access and retrieval, Web-scale data, data mining, machine learning, non-reactive data

Page 13: AAPOR - comparing found data from social media and made data from surveys

Big Social Data: large data sets about humans that are collected from social interactions captured online, primarily in social media sites.

Page 14: AAPOR - comparing found data from social media and made data from surveys

What are the characteristics of surveys and Big Social Data that define when they are complementary, supplementary, or orthogonal?

Page 15: AAPOR - comparing found data from social media and made data from surveys

Bob Groves
“Three Eras of Survey Research”

Mick Couper
“Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys”

Page 16: AAPOR - comparing found data from social media and made data from surveys

Survey Research
80+ years of research and practice

- Sampling procedures
- Question design
- Estimating precision of statistics
- Practices in reducing survey error

Attempt to represent the population of interest with a sample
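As one illustration of "estimating precision of statistics", the sketch below computes the textbook margin of error for a proportion. It assumes simple random sampling and a normal approximation; real survey designs (clustering, weighting) usually need design-effect adjustments that are not shown here. The function name is made up for this example.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a
    proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a sample of 1,000 respondents.
moe = margin_of_error(p=0.5, n=1000)
print(f"n=1000, p=0.5: +/- {moe:.3f}")
```

The key property is that this precision statement rests on the probability sample; it has no direct counterpart for data collected without a known selection mechanism.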

Page 17: AAPOR - comparing found data from social media and made data from surveys

Research Questions

• Do we see big social data and survey data telling us the same things about society? When and why might this happen?

• How do survey data and big social data compare on important dimensions?

• In what ways are the two fundamentally different from each other?

• How are their uses different from one another?

Page 18: AAPOR - comparing found data from social media and made data from surveys

Highlighting 3 Areas of Concern

How participants understand the activity of responding or posting

Different motivations and communicative dynamics

Nature of the data

Different structure, users, and data properties

Practical, ethical, and analytic considerations

Page 19: AAPOR - comparing found data from social media and made data from surveys

Participants’ Understanding

Page 20: AAPOR - comparing found data from social media and made data from surveys

Participants’ Understanding

– Posting initiative or motivation
– Informed consent
– Ability to opt out
– Prior considerations
– User identity
– Perceived audience and social desirability
– Time pressure/synchrony
– Respondent burden

Page 21: AAPOR - comparing found data from social media and made data from surveys

Participants’ Understanding

• Nature of perceived audience
– Survey: Interviewer, Organization, others in HH
– BSD: Groups of friends, acquaintances, public

• Social Desirability
– Survey: Avoid negative evaluations from researcher
– BSD: Manage impressions for their audience

• Scale of data
• Face threatening topics

Page 22: AAPOR - comparing found data from social media and made data from surveys

Participants’ Understanding

• Identity of user
– Survey: Kept anonymous
– BSD: User-created persona. Multiple users on a single account, multiple accounts for one user, corporate users, etc.

• Prior Considerations
– Survey: May not have thought about issue
– BSD: Have thought about it, maybe not deeply

• Being asked vs caring to post

Page 23: AAPOR - comparing found data from social media and made data from surveys

Nature of the Data

Page 24: AAPOR - comparing found data from social media and made data from surveys

Nature of the Data

– Population coverage
– Sampled units
– Sampling
– Sample size
– Temporal properties
– Relevance to research topic
– Granularity of possible analyses
– Data structure
– Auxiliary information

Page 25: AAPOR - comparing found data from social media and made data from surveys

Nature of the Data

• Sampling
– Surveys: Representative of population of interest (via probability sampling)
– BSD: Users/messages not the full population. User accounts are not always users. Frequency of posting among users varies

• Sample Size
– Surveys: Balance between large enough to make inference and low cost
– BSD: More users and posts than surveys. Limited by access/storage.

• Can size help overcome sampling/representativeness problems?
• The aggregation of SM does not necessarily map on to collection of individual users in survey research
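The question of whether size can overcome representativeness problems can be explored with a small simulation. This is a sketch under invented assumptions, not a result from the presentation: a hypothetical population in which 40% hold some opinion, a simple random sample of 1,000, and a much larger "found" sample in which opinion holders are assumed to post three times as often as everyone else.

```python
import random

rng = random.Random(2014)  # fixed seed so the sketch is repeatable

# Hypothetical population: 40% hold opinion A (coded 1), 60% do not (coded 0).
POP_SIZE = 200_000
TRUE_SHARE = 0.40
population = [1 if rng.random() < TRUE_SHARE else 0 for _ in range(POP_SIZE)]

# "Made" data: a simple random sample of 1,000 people.
survey = rng.sample(population, 1_000)
survey_est = sum(survey) / len(survey)

# "Found" data: people self-select into posting. Assumed posting rates:
# holders of opinion A post 30% of the time, everyone else 10% of the time.
POST_RATE = {1: 0.30, 0: 0.10}
posts = [x for x in population if rng.random() < POST_RATE[x]]
found_est = sum(posts) / len(posts)

# Value the found-data estimate converges to under these posting rates.
expected_found = (TRUE_SHARE * POST_RATE[1]) / (
    TRUE_SHARE * POST_RATE[1] + (1 - TRUE_SHARE) * POST_RATE[0]
)

print(f"True share:            {TRUE_SHARE:.3f}")
print(f"Survey (n={len(survey):,}):      {survey_est:.3f}")
print(f"Found data (n={len(posts):,}): {found_est:.3f}")
print(f"Found data converges to {expected_found:.3f}, not {TRUE_SHARE:.3f}")
```

Under these assumed posting rates, the larger sample settles ever more tightly on the wrong value (about 0.67 instead of 0.40). More data reduces variance, but it does not remove selection bias unless the selection mechanism is known and corrected for.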

Page 26: AAPOR - comparing found data from social media and made data from surveys

Nature of the Data

• Temporal properties:
– Surveys: Memory retrieval, measurement at discrete moments
– BSD: Posting on recent events, continuously

• Auxiliary data:
– Surveys: Paradata (# calls, behavior during interview)
– BSD: Geolocation, system activity, profile info

Page 27: AAPOR - comparing found data from social media and made data from surveys

Practical, Ethical and Analytic Considerations

Page 28: AAPOR - comparing found data from social media and made data from surveys

Practical, Ethical, and Analytic Considerations

– Established research communities
– Consent to research/IRB
– Perception of research among public
– Costs to researchers
– Data ownership
– Adjustments for non-representativeness
– Stability of data source and adjustments
– Updating models in changing environment
– Users and impact

Page 29: AAPOR - comparing found data from social media and made data from surveys

Practical Considerations

• Adjustments for non-representativeness
– Surveys: Well developed, weighting
– BSD: No standard use, depends on style of analysis, may not be done if using certain techniques

• Ethical issues
– Surveys: Explicit consent, regulated by government/IRB
– BSD: Unaware of terms in user agreement, inconsistently regulated by IRBs
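For readers less familiar with survey weighting, the following is a minimal sketch of one common adjustment, post-stratification on a single variable. All numbers are invented for illustration, and the population shares are assumed to come from an external benchmark such as a census.

```python
# Hypothetical sample: group -> (number of respondents, share answering "yes").
sample = {"under_40": (80, 0.60), "40_plus": (20, 0.30)}

# Assumed known population shares for the same groups.
population_share = {"under_40": 0.50, "40_plus": 0.50}

n_total = sum(n for n, _ in sample.values())

# Unweighted estimate: every respondent counts equally.
unweighted = sum(n * yes for n, yes in sample.values()) / n_total

# Post-stratification weight = population share / sample share of the group.
weights = {g: population_share[g] / (n / n_total) for g, (n, _) in sample.items()}

# Weighted estimate: each group is scaled to its population share.
weighted = sum(weights[g] * n * yes for g, (n, yes) in sample.items()) / sum(
    weights[g] * n for g, (n, _) in sample.items()
)

print(f"Unweighted: {unweighted:.2f}")
print(f"Weighted:   {weighted:.2f}")
```

The adjustment only works when group membership is observed for each unit and population benchmarks exist. For social media data, where a "unit" may be an account or a message rather than a person, both conditions are often hard to meet.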

Page 30: AAPOR - comparing found data from social media and made data from surveys

Practical Considerations

• Perception of research/Legitimacy
– Surveys: fatigue, falling response rates, confusion about legitimacy
– BSD: not considered while posting, but concerns over surveillance

Page 31: AAPOR - comparing found data from social media and made data from surveys

YOU’RE SLOW AND EXPENSIVE!

YOU AREN’T REPRESENTATIVE!

Page 32: AAPOR - comparing found data from social media and made data from surveys

Conclusion

We need to stop arguing about the wrong things.

We need a systematic agenda of research looking at the intersection of these [email protected]

@clifflampe