Big data in social sciences and humanities: from epistemology to data power. Teresa Numerico, Dept. of Philosophy, Communication and Performing Arts, University of Rome Three. Luiss - Media Politics and Democracy. A Challenging Topic for Social Sciences, 21-22 May 2015

Upload: terindis

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Big data luiss

Big data in social sciences and humanities: from epistemology to data power

Teresa Numerico

Dept. of Philosophy, Communication and Performing Arts, University of Rome Three

Luiss - Media Politics and Democracy. A Challenging Topic for Social Sciences

21-22 May 2015

Page 2: Big data luiss

Questionable Big data examples: Ethical, juridical, political and social doubts

Facebook experiments, Google Flu Trends, culturomics

Page 3: Big data luiss

Facebook experiment on textual emotional contagion

• In June 2014 the journal PNAS published the description of a Facebook experiment that measured negative and positive emotional contagion by altering the news feeds of 689,003 English-language users

• The paper was written by Adam Kramer (Core Data Science Team, Facebook) and two social science scholars who worked at the Depts. of Communication and Information Science, Cornell University

See Schroeder 2014 for a complete analysis of the Facebook experiment

Page 4: Big data luiss

Informed consent

• There is an ongoing discussion about the informed consent of the people who were involved in the experiment

• Users tested in the experiment received no prior information and no opportunity to opt out

• Because Facebook is a company and not a research institution, there was no need to ask for any consent beyond that obtained through the service agreement

• Facebook's defence on this point rests on the fact that the company always manipulates the user experience (Yarkoni 2014, boyd 2014)

Page 5: Big data luiss

IRB approval

• Because the research was conducted independently by Facebook and Professor Hancock had access only to results – and not to any individual, identifiable data at any time – Cornell University’s Institutional Review Board concluded that he was not directly engaged in human research and that no review by the Cornell Human Research Protection Program was required

Press release, Cornell University, 30 June 2014, http://mediarelations.cornell.edu/2014/06/30/media-statement-on-cornell-universitys-role-in-facebook-emotional-contagion-research/

Page 6: Big data luiss

Data collection and interpretations

• The collection of the data and their interpretation raise not only ethical and legal doubts but also epistemological controversies.

• Positive and negative emotional words were counted using the Linguistic Inquiry and Word Count software (LIWC2007), which implies a generic, univocal, context-free definition of words, each judged as positive or negative. The system interprets posts by registering the presence of positive or negative expressions

Kramer et al. 2014, passim
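The epistemological problem with context-free word counting can be made concrete with a minimal sketch. It is not the LIWC2007 software: the two word lists and the function name `count_emotion_words` are hypothetical, chosen only to illustrate the general technique of matching each word against a fixed lexicon regardless of context.

```python
import re

# Hypothetical toy lexicons for illustration only.
# These are NOT the LIWC2007 dictionaries.
POSITIVE = {"happy", "good", "great", "love"}
NEGATIVE = {"sad", "bad", "awful", "hate"}


def count_emotion_words(post):
    """Count lexicon hits in a post, ignoring context entirely."""
    words = re.findall(r"[a-z']+", post.lower())
    return {
        "positive": sum(w in POSITIVE for w in words),
        "negative": sum(w in NEGATIVE for w in words),
        "total": len(words),
    }


if __name__ == "__main__":
    # Negation is invisible to the counter: "not happy" still
    # registers one positive word and no negative word.
    print(count_emotion_words("I am not happy today"))
    print(count_emotion_words("Not bad at all, great job"))
```

In this sketch "I am not happy today" is scored as containing one positive word, and "Not bad at all" as containing one negative word, which is the kind of misreading a univocal, context-free definition of words invites.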

Page 7: Big data luiss

Technological determinism or exploitation of a dominant position?

• Prediction and manipulation are based on the hypothesis that human behaviour is stable and mechanically alterable

• The experiment cannot be replicated according to standard scientific methodology

• The scientists involved in the interpretation process, Jamie Guillory and Jeffrey Hancock, had no control over data acquisition

• However, their reputations as social scientists were used by the Facebook team to validate its data science research results

Page 8: Big data luiss

Social sciences: representing while intervening

• According to Evelyn Fox Keller (1991), a feminist philosopher of science, and to Ian Hacking (1983, 1992), it is not possible to represent something without intervening in it and transforming it

• The Facebook experiment is a clear example of a representation that needs intervention: understanding the emotional reactions of the human beings who were the objects of representation implied manipulating them

• Scientists are like sorcerer's apprentices: they describe emotional reactions while inducing them during the experiment

Page 9: Big data luiss

Google Flu Trends (GFT) failure

• GFT did not predict flu trends correctly: its estimates were almost double the figures provided by the Centers for Disease Control and Prevention (CDC)

• Instability of the data

• Continuous changes in the search algorithms that influenced the GFT data

• Unclear indicators adopted

• Impossible to repeat experiments in order to check results

• Measurement systems impossible to analyse

• The risk of ‘red team’ attacks on the monitored systems, which attempt to manipulate results for economic or political gain

Lazer et al. 2014

Page 10: Big data luiss

Facebook filter bubble study

• Bakshy et al., “Exposure to ideologically diverse news and opinion on Facebook”, Science, 7 May 2015

• David Lumb: Why Scientists Are Upset About the Facebook Filter Bubble Study, https://www.fastcompany.com/3046111/fast-feed/why-scientists-are-upset-over-the-facebook-filter-bubble-study

• Christian Sandvig: The Facebook “It’s Not Our Fault” Study, http://socialmediacollective.org/2015/05/07/the-facebook-its-not-our-fault-study/

• Eli Pariser: Did Facebook’s Big New Study Kill My Filter Bubble Thesis?, https://medium.com/backchannel/facebook-published-a-big-new-study-on-the-filter-bubble-here-s-what-it-says-ef31a292da95

• Zeynep Tufekci: How Facebook’s Algorithm Suppresses Content Diversity (Modestly) and How the Newsfeed Rules Your Clicks, https://medium.com/message/how-facebook-s-algorithm-suppresses-content-diversity-modestly-how-the-newsfeed-rules-the-clicks-b5f8a4bb7bab

• John Wihbey (7 May 2015): Does Facebook drive political polarization? Data science and research, http://journalistsresource.org/studies/society/social-media/facebook-political-polarization-data-science-research#

Page 11: Big data luiss

Facebook data science and politics

• Winter Mason (28 October 2014): Politics and Culture on Facebook in the 2014 Midterm Elections, https://www.facebook.com/notes/facebook-data-science/politics-and-culture-on-facebook-in-the-2014-midterm-elections/10152598396348859

Page 12: Big data luiss

Epistemology and politics: research and power

Changes in thinking about knowledge creation and their consequences

Page 13: Big data luiss

Researching or spying

• How to be a knowledge scientist after the Snowden revelations? (Berendt, Büchler, Rockwell 2015; see also van Dijck 2014)

• The digital humanist is losing innocence and experiencing his/her own ‘Manhattan Project’ syndrome: there is no neutral technology

• Technologies are already oriented once they are used in the research field/battlefield

• An ethics of knowledge science is needed, but it is very difficult to build if we decline responsibility for our creations as soon as we invent them

• There is a power of data, not only because they are never raw and are often proprietary, but also because they are used for political ends, and every supposedly ‘neutral’ manipulation transforms the observed object with no way back

Page 14: Big data luiss

Knowing is transforming, AKA Fox Keller's vision

• There is no such thing as pure science with bad applications

• Knowledge is action, not only with respect to power in society but also with respect to the object of research

• After the knowledge process the object will never be the same

• The role of language in science is never considered enough

• The evocative character of language and its vague, ambiguous status introduce uncontrolled leaps of meaning, metaphors and pre-scientific arguments

Fox Keller 2010

Page 15: Big data luiss

Rhetoric of BD/1: Computers are better problem solvers than humans

• It’s human nature to focus on the problems […] where human skill and ingenuity are most valuable. And it’s normal human prejudice to undervalue the problems [of] the domain where data-driven intelligence really shines. But […] what problems can computers solve that we can’t? And how, when we put that ability together with human intelligence, can we combine the two to do more than either is capable of alone?

Nielsen, 2011, p. 255

Page 16: Big data luiss

Rhetoric of BD/2: data-driven science

• Science is no longer guided by interpretation, models and theory

• Science is “data-driven”, which in BD jargon means that there is no interpretation and no theory prior to the data, because the data supposedly make sense by themselves

• But this is just rhetoric: to find correlations among data series you need to look for them by choosing the right machine learning algorithms, otherwise you risk finding correlations that are just random, particularly with high dimensionality
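The risk of random correlations in high dimensionality can be illustrated with a small simulation. This is only a sketch under stated assumptions: all series are independent Gaussian noise, and the helper names `pearson` and `max_abs_correlation` are hypothetical, introduced here for illustration. It shows that the strongest correlation found between a target series and a pool of unrelated series can only grow as the pool gets larger.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def max_abs_correlation(n_obs, n_series, seed=0):
    """Strongest |r| between one noise series and n_series other noise series.

    Every series is independent random noise (an assumption of this sketch),
    so any correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Few observations, more and more candidate series to search through.
    for n_series in (10, 100, 2000):
        print(n_series, round(max_abs_correlation(20, n_series), 2))
```

With only 20 observations per series, searching a few thousand unrelated series is enough to turn up an apparently strong correlation, which is why a search for patterns without a prior model or theory cannot by itself distinguish signal from chance.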

Page 17: Big data luiss

No BD without solid replicable methodologies

• Machine-learning methods are a valuable part of our toolkit in understanding behavior, but we do not yet understand the precise limits of their applicability

• The biggest contributions before us are not new algorithms or new social theories but new methodologies for decomposing hard questions in the social sciences into a series of robust analyses that are replicable and composable

Raghavan 2014

Page 18: Big data luiss

BD can be useful provided we understand the epistemological implications

• According to Kitchin 2014a we need to develop a “situated, reflexive and contextually nuanced epistemology” in order to use these methods effectively in the social sciences and humanities

• But understanding the problematic epistemological implications means reducing the rhetoric and comprehending the savoir/pouvoir (knowledge/power) relationships that are implied in data-driven results

Page 19: Big data luiss

Let’s ask some final questions on BD experiments and results

• Who owns the data?

• Who owns the machines on which the data are processed?

• Who designs the algorithms that make sense of the data (is the data scientist working with or without the field expert)?

• What do we consider to be definitive results of data-driven procedures?

• Who is going to take advantage of the results?

• Is it possible to replicate the process on different machines with different algorithms, to be sure of the stability of the results?

Page 20: Big data luiss

Bibliographic sources/1

• Berendt B., Büchler M., Rockwell G. (2015) “Is it research or is it spying?”, pre-print of paper published in Künstliche Intelligenz 2015, (C) Springer, URL of this pre-print: http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_buechler_rockwell

• boyd d. (1 July 2014) “What does the Facebook experiment teach us?”, in The Message, URL: https://medium.com/message/what-does-the-facebook-experiment-teach-us-c858c08e287f

• Hacking I. (1983) Representing and Intervening, Cambridge University Press, Cambridge.

• Hacking I. (1992) “The self-vindication of the laboratory sciences”, in Pickering A. (ed.) Science as Practice and Culture, University of Chicago Press, Chicago, pp. 29–64.

• Halevy A., Norvig P., Pereira F. (2009) “The unreasonable effectiveness of data”, in IEEE Intelligent Systems, March/April 2009, vol. 24, no. 2, pp. 8-12, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/35179.pdf

• Fox Keller E. (2010) The Mirage of a Space between Nature and Nurture, Duke University Press, Durham & London.

• Kitchin R. (2014a) “Big Data, new epistemologies and paradigm shifts”, in Big Data and Society, April-June 2014, pp. 1-12.

• Kitchin R. (2014b) The Data Revolution. Big Data, Open Data, Data Infrastructures & Their Consequences, Sage, London.

Page 21: Big data luiss

Bibliographic sources/2

• Kramer A.D.I. et al. (2014) “Experimental evidence of massive-scale emotional contagion through social networks”, in PNAS, June 17, 2014, vol. 111, no. 24, pp. 8788–8790, www.pnas.org/cgi/doi/10.1073/pnas.1320040111

• Lazer D., Kennedy R., King G., Vespignani A. (2014) “The parable of Google Flu: traps in Big Data analysis”, in Science, vol. 343, 14 March 2014, pp. 1203-1205.

• Leetaru K.H. (5 September 2011) “Culturomics 2.0: Forecasting Large-Scale Human Behavior Using Global News Media Tone In Time And Space”, in First Monday 16 (9), URL: http://firstmonday.org/ojs/index.php/fm/article/view/3663/3040#p7

• Licklider J.C.R. (1965) Libraries of the Future, The MIT Press, Cambridge, MA.

• Mayer-Schönberger V., Cukier K. (2013) Big Data. A Revolution That Will Transform How We Live, Work and Think, Houghton Mifflin Harcourt, Boston.

• Michel J.B., Liberman Aiden E. (14 January 2011) “Quantitative Analysis of Culture Using Millions of Digitized Books”, in Science 331 (6014), pp. 176–182.

• Nielsen M. (2012) Reinventing Discovery: The New Era of Networked Science, Princeton University Press, Princeton.

Page 22: Big data luiss

Bibliographic sources/3

• Porsdam H. (2013) “Digital Humanities: On Finding the Proper Balance between Qualitative and Quantitative Ways of Doing Research in the Humanities”, in Digital Humanities Quarterly 2013, Volume 7 Number 3, http://www.digitalhumanities.org/dhq/vol/7/3/000167/000167.html

• Raghavan P. (2014) “It’s time to scale the science in the social sciences”, in Big Data and Society, April-June 2014, pp. 1-4.

• Schroeder R. (2014) “Big Data and the brave new world of social media research”, in Big Data and Society, July-Dec 2014, pp. 1-11, bds.sagepub.com.

• Taylor B. (1989) Oral interview, http://conservancy.umn.edu/bitstream/107666/1/oh154rt.pdf

• Yarkoni T. (July 2014) “In defense of In Defense of Facebook”, in [citation needed], URL: http://www.talyarkoni.org/blog/2014/07/01/in-defense-of-in-defense-of-facebook/

• van Dijck J. (2014) “Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology”, in Surveillance and Society, 2014, vol. 12(2), pp. 197-208.

• Wiener, N. (1950): The Human Use of Human Beings. Houghton Mifflin, Boston.