Accessing and Using Big Data to Advance Social Science Knowledge Eric Meyer, Ralph Schroeder, Linnet Taylor and Josh Cowls

Upload: josh-cowls

Post on 10-May-2015


DESCRIPTION

'Brown bag' presentation at the Oxford Internet Institute, June 2014.

TRANSCRIPT

Page 1: Accessing and Using Big Data to Advance Social Science Knowledge

Accessing and Using Big Data to Advance Social Science Knowledge

Eric Meyer, Ralph Schroeder, Linnet Taylor and Josh Cowls

Page 2

Overview

• Project Introduction (Eric)
– Papers
– Conferences and outreach
– Data sources

• Big data theory, definition, limits to science and its uses (Ralph)

• Case study: using big data to gauge public opinion (Josh)

Page 3

Accessing and Using Big Data to Advance Social Science Knowledge

• October 2012 – August 2014
• Funded by the Alfred P. Sloan Foundation
• Data sources
• 120+ interviews, mainly with social scientists
• Workshops and conferences
• No representative sample, but some patterns of disciplinary and skills background and career trajectory

Page 4

Project scope to date

• Economics
• Business (Nemode project with Greg, Monica)
• Wikipedia
• Social Media
• Development
• Political Science
• Causation and Theory
• Policy
• (20+ blog posts on a range of other issues)

Page 5

Outcomes and outreach

• A series of workshops hosted at the Oxford Internet Institute:

– 2013, March. Big Data: Rewards and Risks for the Social Sciences. Oxford Internet Institute, Oxford. http://www.oii.ox.ac.uk/events/?id=557

– 2013, March. Qualitative methods to study data practices: An international comparative workshop. Oxford Internet Institute, Oxford

– 2013, January. Towards a Sociology of Data. Oxford Internet Institute, Oxford. http://www.oii.ox.ac.uk/events/?id=560

– 2014, May. Big data and social change in the developing world, Rockefeller Foundation Bellagio Centre conference

– 2014, August. American Political Science Association Panel on ‘Big Data and Political Science’

• MSc option course ‘Big Data and Society’

Page 6

Definition

• ‘Big data’
– the advance of knowledge via a leap in the scale and scope in relation to a given object or phenomenon
• ‘Data’
– Belongs to the object
– ‘taking…before interpreting’ (Ian Hacking)
• the view that ‘all data are of their nature interpreted’ is misleading: ‘data are made, but as a good first approximation, the making and taking come before interpreting’
– The most atomizable useful unit of analysis

Page 7

Digital Objects and their Referents

[Diagram: a Digital Object (examples: Twitter, Tesco loyalty card information) linked to the Real World (people / physical objects) by the label ‘Represent / Manipulate’]

Page 8

[Diagram labels: Representing, Manipulating, Limits, Digital Data]

Page 9

Uses and Limits

• Big data research uses (academic, commercial, government) are limited to the exploitation of suitable objects, and the objects which ‘give off’ digital data, and the phenomena they lay bare, are limited

• The knowledge produced is aimed at ‘sorting people’ and advancing ‘representing and intervening’ (but without ‘manipulating’, except where this is warranted by practical economic and political objectives)

• The difference between the commercial and the academic world is that in the former knowledge provides competitive and practical advantage, as against advancing (high-consensus rapid-discovery) knowledge in the latter
– The limits in both cases are the objects (to which the data ‘belong’), which need to have digitally manipulable data points available

• How available these objects are differs, but also…
– Causation and theoretical embedding matter for academic social science
– For commercial (and non-academic) uses, ‘predicting’ consumer choices and other behaviours, for limited purposes and without increasing scientific knowledge, is good enough

• There are many objects, for non-academics and scientists to humanities scholars (physical, human, cultural), but they are not infinite

• This availability, not skills or other issues, determines the future of big data research

Page 10

(Big) data definition enables pinpointing impacts and threats

• ‘Google Plus may not be much of a competitor to Facebook as a social network, but…some analysts…say that Google understands more about people’s social activity than Facebook does.’
– New York Times, 15.2.2014, p. A1, ‘The Plus in Google Plus? It’s Mostly for Google’.

• Facebook Likes: ‘Predicting users’ individual attributes and preferences can be used to improve numerous products and services. For instance, digital systems and devices (such as online stores or cars) could be designed to adjust their behavior to best fit each user’s inferred profile…online insurance…advertisements might emphasize security when facing emotionally unstable (neurotic) users but stress potential threats when dealing with emotionally stable ones’
– ‘Private traits and attributes are predictable from digital records of human behavior.’ Kosinski M, Stillwell D, Graepel T., Proc Natl Acad Sci 2013 Apr 9;110(15):5802-5.

• More powerful knowledge will enable better services, and more manipulation

Page 11

Ethical and Social Issues in Big Data Research

• Objects with ‘total’ knowledge (universes)
– Danger is inferring behaviour not of individuals, but of classes of people

• Asymmetry of knower and the subjects of knowledge is greater than elsewhere

• Based not on individuals’ but on aggregate behaviour
– Hence only utilitarian, not Kantian justification?

• Why does prediction or uncovering laws of behaviour ‘grate’?

• Benefits: greater scientific power and more specific details

• Relation to smaller data? ‘Creep’

• Solution: ethical = greater researcher and public awareness; regulatory (would apply to academic researchers?) = prevent legal and specific harms

Page 12

Outlook and Implications

• There is an overlap between real world research and the world of academic research which is closer than elsewhere
– because this is the research front in both
– because they share common objects

• For research
– Develop theoretical frame in which to embed big data (for social media), including power/function, relation to traditional media, and role in society

• For society
– Awareness of how research can generate transparency and manipulability

• Big Brother?
– Yes, but also Brave New World of Omniscience, with Social Science as Handmaiden

Page 13

Exemplar case: using big data to gauge public opinion

Opportunities and challenges

Page 14

The traditional model

• The notion of public opinion was enlivened by the coffee houses of 17th Century Britain

• Inferential statistics provided a rigorous, replicable basis for reporting public opinion, based on a random sample and a margin of error (MOE)

• Remained expensive, random sample difficult to construct, response rates dwindling
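
The margin of error in the traditional model can be illustrated with a short calculation. This is a minimal sketch, assuming a simple random sample and the usual normal approximation for a proportion; the function name `margin_of_error` and the example figures (50% support, 1,000 respondents) are illustrative and do not come from the presentation.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)

# Hypothetical poll: 50% support among 1,000 respondents.
moe = margin_of_error(0.5, 1000)
print(f"+/- {moe * 100:.1f} percentage points")  # prints "+/- 3.1 percentage points"
```

Under these assumptions the error shrinks only with the square root of the sample size, so halving it takes about four times as many respondents, which is consistent with the point that the traditional model remained expensive.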

Page 15

Using big data approaches

• n=all: beyond the sample
• Cheaper (after initial investment)
• More granularity, more insight?

[Chart labels: Cost, Utility; Traditional inferential model, Big data model]

Page 16

But challenges remain…

• Representativeness
• Reliability
• Replicability

Page 17

The challenge of representativeness

Amber Boydstun: anyone who does a Twitter study has to really work hard, I've noticed, to justify why we should care about Twitter because Twitter is not representative of the United States population or the world population, right. And it's not. And even if it was representative, or even if we don't care that it's not representative, it's really hard to figure out in any given study whether you're getting an over-sampling of those users who are just more active than other users.

Data from Dutton, W.H. and Blank, G., with Groselj, D. (2013) Cultures of the Internet: The Internet in Britain. Oxford Internet Survey 2013. Oxford Internet Institute, University of Oxford.
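
Boydstun's point about over-sampling more active users can be made concrete with a toy example. The sketch below assumes a hypothetical list of (user, stance) pairs, one per tweet, and compares the share of tweets taking a position with the share of distinct users taking it. The data and the function names are invented for illustration and are not drawn from the project.

```python
from collections import Counter

# Hypothetical tweets as (user, stance) pairs; one very active user dominates.
tweets = [
    ("alice", "for"), ("alice", "for"), ("alice", "for"),
    ("alice", "for"), ("alice", "for"), ("alice", "for"),
    ("bob", "against"),
    ("carol", "against"),
    ("dave", "for"),
]

def share_by_tweet(rows, stance):
    """Share of all tweets expressing the given stance."""
    counts = Counter(s for _, s in rows)
    return counts[stance] / len(rows)

def share_by_user(rows, stance):
    """Share of distinct users whose most frequent stance is the given one."""
    per_user = {}
    for user, s in rows:
        per_user.setdefault(user, Counter())[s] += 1
    # Each user counts once, whatever their tweet volume.
    majority = [c.most_common(1)[0][0] for c in per_user.values()]
    return majority.count(stance) / len(majority)

print(f"by tweet: {share_by_tweet(tweets, 'for'):.2f}")  # prints "by tweet: 0.78"
print(f"by user:  {share_by_user(tweets, 'for'):.2f}")   # prints "by user:  0.50"
```

In this invented data set, counting tweets suggests strong support while counting users shows an even split, because a single prolific account supplies most of the messages.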

Page 18

The challenge of reliability

Mike Thelwall: really the big problem that we haven’t cracked is that if someone tweets a sentiment it’s not necessarily what they’re feeling, it can be for a variety of reasons, so it doesn’t really reflect directly what they feel necessarily … so it’s quite a stretch to say that if someone tweets, “I’m happy” that they’re actually happy, to give a simple example

• Difficult to establish the meaning of latent messages

• Platform specific behaviours (e.g. hashtags, likes) are not always understood

• Political discourse often laced with sarcasm
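
The reliability problem can be sketched with a deliberately naive word-counting scorer. This is an illustration only: the word lists, the function name `naive_sentiment` and the example messages are assumptions made here, and the presentation does not describe any particular sentiment tool.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"happy", "great", "love", "wonderful"}
NEGATIVE = {"sad", "angry", "hate", "terrible"}

def naive_sentiment(text: str) -> int:
    """Count positive words minus negative words, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

sincere = "I'm happy"
sarcastic = "Oh great, another train strike. So happy."

print(naive_sentiment(sincere))    # prints 1
print(naive_sentiment(sarcastic))  # prints 2
```

Both messages receive a positive score, and the sarcastic one scores higher, although a human reader would probably take it as a complaint.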

Page 19

The challenge of replicability

• Social data is often proprietary – getting access can be difficult, expensive or impossible

• Sometimes access is limited to output – analysis takes place inside a black box

• Challenges basic Popperian assumption of falsifiability

Nick Anstead: there are all these companies that do all this wonderful stuff, but actually as an academic researcher, using them is expensive … what do you actually get from working with these companies? Do you get raw data sets that you go and do stuff with yourself? More commonly, I would suggest, what you probably get is access to, sort of, a black box tool.

Page 20

… and the implications aren’t just academic

Shelton, T. et al., ‘Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of “big data”’. Geoforum, Volume 52, March 2014, pp. 167-79.

Page 21

Project Papers

Schroeder, Ralph (Forthcoming). ‘Big Data: Towards a More Scientific Social Science and Humanities’ in Mark Graham and William H Dutton (eds.), Society and the Internet: How Networks of Information are Changing our Lives.

Schroeder, Ralph, & Taylor, Linnet (Forthcoming). ‘Is bigger better? The emergence of big data as a tool for international development policy.’ GeoJournal.

Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, August). ‘Big Data in the Study of Twitter, Facebook and Wikipedia: On the Uses and Disadvantages of Scientificity for Social Research.’ Paper presented at the proceedings of the Annual Meeting of the American Sociological Association. (being submitted)

Schroeder, Ralph, & Taylor, Linnet. ‘Big Data and Wikipedia Research: Social Science Knowledge across Disciplinary Divides’. Submitted to Information, Communication and Society.

Taylor, Linnet. ‘No place to hide? The ethics and analytics of tracking mobility using African mobile phone data.’ Submitted to Population, Space and Place.

Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet. ‘Big Data in the Social Sciences: Towards a New Research Paradigm?’ (being submitted).

Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, November). ‘The Boundaries of Big Data.’ Paper presented at SIG-SI Symposium, ASIST 2013, November 1-6, 2013, Montreal, Quebec, Canada.

Schroeder, Ralph and Cowls, Josh. ‘Answering Questions and Questioning Answers in the Era of Big Data.’ In preparation.

Taylor, Linnet, Meyer, Eric T., & Schroeder, Ralph. ‘Bigger and better, or more of the same? Emerging practices and perspectives on big data analysis in economics’. Forthcoming in Big Data & Society.

Cowls, Josh. ‘The Crowd in the Cloud?’, forthcoming presentation at IPP 2014.

Cowls, Josh. ‘Big Data and Policy Implementation’, in preparation.

Schroeder, Ralph. ‘Big Data and Policy Implications’, in preparation.