the use of social media for research and analysis: a ... · applying computer science methods to...

The Use of Social Media

for Research and

Analysis: A Feasibility

December 2014

DWP ad hoc research report no. 13

A report of research carried out by the Oxford Internet Institute on behalf of the

Department for Work and Pensions

© Crown copyright 2014. You may re-use this information (not including logos) free of charge in any format or medium, under the terms of the Open Government Licence. To view this licence, visit http://www.nationalarchives.gov.uk/doc/open-government-licence/ or write to the Information Policy Team, The National Archives, Kew, London TW9 4DU, or email: psi@nationalarchives.gsi.gov.uk. This document/publication is also available on our website at: https://www.gov.uk/government/organisations/department-for-work-pensions/about/ research#research-publications If you would like to know more about DWP research, please email: Socialresearch@dwp.gsi.gov.uk First published 2014. ISBN 978-1-78425-407-0 Views expressed in this report are not necessarily those of the Department for Work and Pensions or any other Government Department.

The Use of Social Media for Research and Analysis: A Feasibility Study

Summary

The aim of this report is to explore the ways in which data generated by social

media platforms can be used to support social research and analysis at the

Department for Work and Pensions [DWP]. The report combines a general

review of all the possibilities generated by social media data with an empirical

exploration assessing the feasibility of some solutions, focussing in particular

on the examples of Universal Credit and Personal Independence Payment.

The report argues that social media data can be useful for social research

purposes in two key respects. Firstly, these media can provide indications of

information seeking behaviour (which may indicate public awareness of and

attention to specific policies, as well as providing an idea of the sources where

they get information from). Secondly, they can provide indications of public

opinion of specific policies, or reaction to specific media events. This means

that social media are positioned to provide social researchers at the DWP with

a variety of useful data sources, such as:

o Indications of public reaction to specific policy announcements or

proposals

o Insight into public experiences of services the DWP provide (such as

Jobcentres)

o Ways of measuring overall public attention to DWP policies, or

awareness of key policy changes

o Ways of measuring general social trends of importance to the DWP,

such as a rise in employment

o Insight into the sources of public opinion, such as where people get

information on specific DWP policies

However we also argue that caution is needed in interpreting the results of

social media data, or generalizing from these data to the public at large. The

science behind many of these methods is still developing; and major questions

remain how to employ them properly.

Overall, we recommend that all social media data be “benchmarked” against

other data sources as most indicators developed in the report are very difficult

to interpret in isolation.

This report does not focus on using social media for public relations or

communications, rather it looks at its role in social research and analysis.

Contents

Acknowledgements .................................................................................................... 6

The Authors ................................................................................................................ 7

Glossary and Abbreviations ........................................................................................ 8

Summary .................................................................................................................... 9

1 Introduction ........................................................................................................ 12

2 Part 1 - Social Media for Social Research ......................................................... 14

2.1 Defining Social Media ................................................................................. 14

2.1.1 Social Media Usage ............................................................................ 15

2.2 Using Social Media for Social Research .................................................... 18

2.2.1 Predicting the Present: Using social media to understand current salient

issues 19

2.2.2 Social Media as a Source of Public Opinion ....................................... 21

2.3 Using Social Media Data: Challenges ........................................................ 23

2.3.1 Technical and Methodological Challenges .......................................... 24

2.3.2 Legal and Ethical Challenges .............................................................. 25

3 Part 2 –Social Media Data and DWP Policy Research: the cases of Universal

Credit and Personal Independence Payment ........................................................... 28

3.1 Search Data ............................................................................................... 30

3.1.1 Google Trends and Public Awareness of DWP Policies ..................... 31

3.1.2 Using Google Trends to "predict the present" ..................................... 34

3.1.3 Google Search and the sources of public opinion ............................... 37

3.2 Wikipedia Presence .................................................................................... 39

3.2.1 Wikipedia Readership Statistics as an Indicator of Public Awareness 39

3.2.2 Wikipedia Article Quality as an Indicator of Public Awareness ............ 40

3.3 Mentions of DWP Policies on Social Media Platforms ................................ 43

3.3.1 Public Attention to UC and PIP on Social Media Platforms ................. 44

3.3.2 Social Media Data as an indicator of public opinion ............................ 48

3.4 The DWP’s Own Social Media Presence ................................................... 49

4 Conclusions and Further Work .......................................................................... 53

5 References ........................................................................................................ 55

6 Annex ................................................................................................................ 59

List of tables and figures

Figure 1: HMRC and Inland Revenue compared as Google Search Terms

Figure 2: Universal Credit, Personal Independence Payment and Jobseeker’s Allowance as

Google search terms. Highest peak represents approximately 60,000 monthly searches

Figure 3: Top Related Searches for Universal Credit (left) and Jobseeker’s Allowance (right) Figure 4: Google Flu Trends

Figure 5: Volume of searches for Jobseeker’s Allowance (left) and Facebook (right) in different

regions Figure 6: Google API Results for the search term “Universal Credit”. December 2012 – August

Figure 7: Google API Results for the search term “Universal Credit”, limited to just gov.uk and

dwp.gov.uk. December 2012 – August 2013.

Figure 8: Volume of monthly page views for three Wikipedia pages

Figure 9: Volume of daily page views for three Wikipedia page. Dotted vertical lines indicate

major news events

Figure 10: Length of the UC, JSA and PIP Wikipedia Pages (in words)

Figure 11: The first revision of Wikipedia’s article on Universal Credit, which appeared the day

after it was announced by Iain Duncan Smith in October 2010.

Figure 12: Number of Facebook users mentioning each policy per day. Vertical lines represent

major news events

Figure 13: Number of Twitter users mentioning each policy per day. Two data collection issues

led to missing data in May and August. Vertical lines represent major news events

Figure 14: Cumulative Twitter users

Figure 15: Cumulative Facebook users

Table 1: Size and edit history of different Wikipedia articles for the period January 2013 to

October 2013 Table 2: Directed feedback to Jobcentre Plus Names and places have been removed.

Acknowledgements

The report’s authors would like to acknowledge the financial support provided

by the Department for Work and Pensions which commissioned the report,

and the wide range of useful critiques and comments provided by Andrea

Kirkpatrick, Dimitra Karperou, and several anonymous reviewers.

We would also like to thank Wikimedia Deutschland e.V. and the Wikimedia

Foundation which provided access to the Wikipedia data used in this report;

and i2-sm whose data collection platform was used to access Twitter and

Facebook data.

Finally, we would like to thank the technical staff at the Oxford Internet

Institute for infrastructural support to the project.

All errors and omissions remain our own.

The Authors

Jonathan Bright is a Research Fellow at the Oxford Internet Institute,

University of Oxford. He is a political scientist specialising in political

communication and computational social science (especially “big data”

research”). His major research interest is around how people get information

about politics, and how this process is changing in the internet era.

Helen Margetts is Director of the Oxford Internet Institute, University of

Oxford, where she is Professor of Society and the Internet, and a Fellow of

Mansfield College. She is a political scientist specialising in digital government

and internet-mediated collective action. She is co-author (with Patrick

Dunleavy) of Digital Era Governance: IT Corporations, the State and e-

Government (Oxford University Press, 2006, 2008).

Scott Hale is a Research Assistant and doctoral candidate at the Oxford

Internet Institute, University of Oxford. He is interested in developing and

applying computer science methods to social science questions in the areas

of political science, language, and network analysis.

Taha Yasseri is a Researcher at the Oxford Internet Institute, University of

Oxford. His main research focus is on online societies, governmentcitizen

interactions on the web and structural evolution of the web. He uses

mathematical models and data analysis to study social systems quantitatively.

Glossary and Abbreviations

Abbreviations

DWP – The Department for Work and Pensions

API – Application Programming Interface – An interface for accessing the data

contained in a web platform or service in a systematic way. Many social media sites

maintain such an API.

PIP – Personal Independence Payments

JSA – Jobseeker’s Allowance

DLA – Disability Living Allowance

TOS – Terms of Service

Glossary

Sentiment analysis – A family of techniques which aim to automatically extract

“sentiment” from a piece of text (for example, whether it is positive or negative, or

whether the person writing it was angry or excited, etc.).

Social media – A means of communication, based around a website or internet

service, where the content being communicated is produced by the people using the

service. Can be distinguished from other types of media (such as the news media)

where there is a clear distinction between the producers and consumers of content.

Social network – A type of social media which, in addition to the features listed

above, allows users to maintain lists of other individuals on the site with which they

want to maintain special or frequent contact