TRANSCRIPT
Critical issues in the collection, analysis and use of students’ (digital) data
By Paul Prinsloo (University of South Africa)
Presentation at the Centre for Higher Education Development (CHED), University of Cape Town, Wednesday 8 April 2015
Image credit: http://graffitiwatcher.deviantart.com/art/Big-Brother-is-Watching-173890591
ACKNOWLEDGEMENTS
I do not own the copyright of any of the images in this presentation and hereby acknowledge the original copyright and licensing regime of every image and reference used. All the images used in this presentation have been sourced from Google and were labeled for non-commercial reuse.
This work (excluding the images) is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Overview of the presentation
• Map the collection, analysis and use of students’ digital data against the backdrop of discourses regarding surveillance/sousveillance and Big Data/lots of data
• Problematise the collection, analysis and use of student digital data …
• User knowledge and choice in the context of the collection, analysis and use of data
• When our good intentions go wrong…
• Do students know?
• Points of departure
• Implications
• (In)conclusions
The collection, analysis and use of students’ digital data in the context of…
• Claims that Big Data in higher education will change everything and that student data are “the new black” and “the new oil”
• Our “quantification fetish”, the “algorithmic turn” and “techno-solutionism” (Morozov, 2013a, 2013b)
• The current meta-narratives of “techno-romanticism” in education (Selwyn, 2014)
• The belief that data are “raw”, “speak for themselves”, and that collecting ever more data necessarily results in better understanding and interventions
The collection, analysis and use of students’ digital data in the context of… (2)
• Ever-increasing concerns about surveillance, and new forms of “societies of control” (Deleuze, 1992)
• The “algorithmic turn” and the “algorithm as institution” (Napoli, 2013)
• A possible “gnoseological turning point” where our belief about what constitutes knowledge is changing and where individuals are reduced to classes and numbers (Totaro & Ninno, 2014). N=all (Lagoze, 2014)
• Claims that “Privacy is dead. Get over it” (Rambam, 2008)
Problematising the collection, analysis and use of student data…
• Privacy as concept & as enforceable construct is fragile (Crawford & Schultz, 2014; Prinsloo & Slade, 2015)
• Legal & regulatory frameworks (permanently?) lag behind (Silverman, 2015)
• Consent is more than a binary of opt-in or opt-out (Miyazaki & Fernandez, 2000;
Prinsloo & Slade, 2015)
• Individuals share unprecedented amounts of information and yet are increasingly concerned about privacy (Murphy, 2014)
• Discrimination is a fundamental building block in the collection, analysis & use of data (Pfeifle, 2014; Tene & Polonetsky, 2014)
• There are increasing concerns re the lack of algorithmic accountability (Diakopoulos, 2014; Pasquale, 2014) & the fracturing of the control zone (Lagoze,
2014)
• There are also concerns about the unintended consequences of the collection, analysis & use of data (Wigan & Clark, 2013)
Mapping the collection, analysis and use of student digital data against the
discourses of surveillance/sousveillance
From surveillance to sousveillance…
Image credit: http://commons.wikimedia.org/wiki/File:SurSousVeillanceByStephanieMannAge6.png
Jennifer Ringley – 1996-2003 – webcam
Source: http://onedio.com/haber/tum-zamanlarin-en-etkili-ve-onemli-internet-videolari-36465
If I did not share it on Facebook, did it really happen?
We share more than ever before, we are watched more than ever before, and we watch each other more than ever before…
Privacy in flux…
Image source: https://www.mpiwg-berlin.mpg.de/en/news/features/feature14 Copyright could not be established
• 1749: Jacques-François Guillauté proposed “le serre-papiers” – the Paperholder – to King Louis XV
• One of the first attempts to articulate a new technology of power – one based on traces and archives (Chamayou, n.d.)
• The stored documents comprised individual reports on each and every citizen of Paris
The technology would allow the sovereign “…to know every inch of the city as well as his own house, he will know more about ordinary citizens than their own neighbours and the people who see them everyday (…) in their mass, copies of these certificates will provide him with an absolute faithful image of the city” (Chamayou, n.d.)
The Paperholder – “le serre-papiers” (1749)
“Secrets are lies”
“Sharing is caring”
“Privacy is theft”
(Eggers, 2013, p. 303)
Welcome to “The Circle”
TruYou – “one account, one identity, one password, one payment system, per person. (…) The devices knew where you were… One button for the rest of your life online… Anytime you wanted to see anything, use anything, comment on anything or buy anything, it was one button, one account, everything tied together and trackable and simple…”
(Eggers, 2013, p. 21)
“Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior.”
http://www.hup.harvard.edu/catalog.php?isbn=9780674368279
Mapping the collection, analysis and use of student digital data against the discourses of Big Data/lots of data…
What is Big Data?
• Huge in volume
• High in velocity, being created in or near real time
• Diverse in variety
• Exhaustive in scope
• Fine-grained in resolution and uniquely indexical in identification
• Relational in nature
• Flexible, holding traits of extensionality (can add new fields easily) and scalability (can expand in size rapidly)
(Kitchin, 2013, p. 262)
Exploring the differences between Big Data/lots of data… (Lagoze, 2014)
Mayer-Schönberger & Cukier (2013):
• N=all – Big Data as presenting a “complete view” of reality
• Big Data permits us to lessen our desire for exactitude
• We need to shed some of our obsession with causality in exchange for correlations – not necessarily knowing (or caring about) the why but focusing on the what
Lots of data – methodological challenges
Big Data – epistemological challenges
Big Data as cultural, technological, and scholarly phenomenon (boyd & Crawford, 2012)
Big Data as interplay of
• Technological: maximising computation power and algorithmic accuracy to gather, analyse, link, and compare large data sets
• Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims
• Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of trust, objectivity, and accuracy
(Boyd & Crawford, 2012, p. 663)
Three sources of data (Kitchin, 2013, pp. 262-263):
• Directed – a digital form of surveillance wherein the “gaze of the technology is focused on a person or place by a human operator”
• Automated – generated as “an inherent, automatic function of the device or system and include traces …”
• Volunteered – “gifted by users and include interactions across social media and the crowdsourcing of data wherein users generate data” (emphasis added)
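Kitchin’s three-part taxonomy can be read as a small data model. The sketch below is purely illustrative: the `DataSource` and `DataPoint` names and the example records are my own assumptions for a learning-management context, not part of the presentation or of Kitchin’s text.

```python
from dataclasses import dataclass
from enum import Enum

class DataSource(Enum):
    DIRECTED = "directed"        # surveillance focused on a person/place by a human operator
    AUTOMATED = "automated"      # traces generated as an inherent function of the system
    VOLUNTEERED = "volunteered"  # data "gifted" by users (posts, crowdsourcing)

@dataclass
class DataPoint:
    student_id: str
    source: DataSource
    description: str

# Hypothetical examples of each category in a learning-management context
points = [
    DataPoint("s001", DataSource.DIRECTED, "proctored exam webcam feed"),
    DataPoint("s001", DataSource.AUTOMATED, "LMS click-stream log entry"),
    DataPoint("s001", DataSource.VOLUNTEERED, "forum post in course discussion"),
]

# Most learning analytics operates largely on the automated category
automated = [p for p in points if p.source is DataSource.AUTOMATED]
```

The point of the taxonomy (and the sketch) is that the three categories carry different consent and awareness implications, even when they end up in the same data store.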
Different sources/variety of quality/integrity of data
Different role-players with different interests:
• Individuals
• Corporates
• Governments
• Higher education
• Data brokers
• Fusion centres
Different methods/types of surveillance, harvesting and analysis
Issues re
• Informed consent
• Reuse/contextual integrity/context collapse
• Ethics/privacy/justice/care
The Trinity of Big Data
Adapted & refined from Prinsloo, P. (2014). A brave new world. Presentation at SAAIR, 16-18 October. http://www.slideshare.net/prinsp/a-brave-new-world-student-surveillance-in-higher-education
Image credit: http://commons.wikimedia.org/wiki/File:Red_sandstone_Lattice_piercework,_Qutb_Minar_complex.jpg
Image credits: http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg
“Privacy and big data are simply incompatible and the time has come to reconfigure choices that we made decades ago to enforce constraints”
(Lane, Stodden, Bender & Nissenbaum, 2015, p. xii)
Critical questions for big data – boyd & Crawford (2012)
1. Big data changes the definition of knowledge – “Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves” (Anderson, 2008, in boyd & Crawford, 2012, p. 666)
2. Claims to objectivity and accuracy are misleading – “working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth” (boyd & Crawford, 2012, p. 667). Big Data “enables the practice of apophenia: seeing patterns where none actually exist, simply because enormous quantities of data can offer connections that radiate in all directions” (ibid., p. 668)
Critical questions for big data (2) – boyd & Crawford (2012)
3. Bigger data are not always better data
4. Taken out of context, Big Data loses its meaning – leading to context collapse
5. Just because it is accessible does not make it ethical – the difference in ethical review procedures and oversight between research and ‘institutional research’
6. Limited access to Big Data creates new digital divides
User knowledge and choice in the context of the collection, analysis and use of data
Image credit: http://www.mailbow.net/eng/blog/opt-in-and-op-out/
“Providing people with notice, access, and the ability to control their data is key to facilitating some autonomy in a world where decisions are increasingly made about them with the use of personal data, automated processes, and clandestine rationales, and where people have minimal abilities to do anything about such decisions”
(Solove, 2013, p. 1899; emphasis added)
A framework for mapping the collection, use and sharing of personal user information
(Miyazaki & Fernandez, 2000)
Never collect or identify users
Users explicitly opting in to have data collected, used and shared
Users explicitly opting out
The constant collection, analysis and sharing of user data with users’ knowledge
The constant collection, analysis and sharing of user data without users’ knowledge
Also see Prinsloo, P., & Slade, S. (2015). Student vulnerability, agency and learning analytics: an exploration. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015
http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final
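The five options in Miyazaki and Fernandez’s framework can be read as a spectrum from most to least protective of the user. The following is a minimal, hypothetical sketch of that spectrum; the `ConsentRegime` and `may_collect` names are my own illustration, not part of the framework itself.

```python
from enum import IntEnum

class ConsentRegime(IntEnum):
    # Ordered roughly from most to least protective of the user,
    # following the five options in the framework above
    NEVER_COLLECT = 0              # never collect or identify users
    EXPLICIT_OPT_IN = 1            # users explicitly opt in to collection/use/sharing
    EXPLICIT_OPT_OUT = 2           # users explicitly opt out
    COLLECT_WITH_KNOWLEDGE = 3     # constant collection with users' knowledge
    COLLECT_WITHOUT_KNOWLEDGE = 4  # constant collection without users' knowledge

def may_collect(regime: ConsentRegime, opted_in: bool, opted_out: bool) -> bool:
    """Whether collection is permitted for one user under a given regime."""
    if regime is ConsentRegime.NEVER_COLLECT:
        return False
    if regime is ConsentRegime.EXPLICIT_OPT_IN:
        return opted_in                      # silence means no collection
    if regime is ConsentRegime.EXPLICIT_OPT_OUT:
        return not opted_out                 # silence means collection
    return True  # both 'constant collection' regimes collect regardless of choice
```

Note how the opt-in and opt-out regimes treat a non-response in opposite ways; this is exactly why the presentation later argues that higher education should not accept a non-response as equal to opting in.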
The constraints of privacy self-management …
• It is almost impossible to comprehend the scope of data collected, analysed and used, the combination with other sources of information, the future uses for historical information and the possibilities of re-identification of de-personalized data
• These various sources of information and combinations of sources start to resemble “electronic collages” and an “elaborate lattice of information networking” (Solove, 2004, p. 3)
• The fragility of consent… what may be innocuous data in one context, may be damning in another
Adapted from Prinsloo, P., & Slade, S. (2015). Student privacy self-management: implications for learning analytics. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015
http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final
Using student data and student vulnerability: between the devil and the deep blue sea?
Students (some more vulnerable than others)
Generation, harvesting and analysis of data
Our assumptions, selection of data and algorithms may be ill-defined
Turning ‘pathogenic’ – “a response intended to ameliorate vulnerability has the paradoxical effect of exacerbating existing vulnerabilities or generating new ones” (Mackenzie et al., 2014, p. 9)
Adapted from Prinsloo, P., & Slade, S. (2015). Student vulnerability, agency and learning analytics: an exploration. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015
http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final
Do students know/have the right to know…
• what data we harvest from them
• about the assumptions that guide our algorithms
• when we collect data & for what purposes
• who will have access to the data (now & later)
• how long we will keep the data & for what purpose & in what format
• how we will verify the data &
• do they have access to confirm/enrich their digital profiles…?
Adapted from Prinsloo, P., & Slade, S. (2015). Student privacy self-management: implications for learning analytics. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015
http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final
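The disclosure questions above could, hypothetically, be operationalised as fields of a record that an institution publishes for each category of harvested student data. Everything below – the class name, field names and example values – is an illustrative assumption, not an existing policy schema from the presentation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataCollectionDisclosure:
    """Hypothetical record answering the disclosure questions above
    for one category of harvested student data."""
    data_harvested: str                    # what data we harvest
    algorithmic_assumptions: str           # assumptions guiding our algorithms
    collected_when: str                    # when we collect
    purposes: list[str]                    # for what purposes
    access_now_and_later: list[str]        # who has access, now and later
    retention_period_days: Optional[int]   # how long we keep it (None = indefinitely)
    storage_format: str                    # in what format
    verification_method: str               # how the data are verified
    student_can_review: bool               # can students confirm/enrich their profile?

# A hypothetical, filled-in disclosure for click-stream harvesting
disclosure = DataCollectionDisclosure(
    data_harvested="LMS click-stream",
    algorithmic_assumptions="engagement proxies predict risk of failure",
    collected_when="every LMS session",
    purposes=["early-warning alerts"],
    access_now_and_later=["course team", "institutional research"],
    retention_period_days=365,
    storage_format="pseudonymised event log",
    verification_method="none",
    student_can_review=False,
)
```

Making each field mandatory forces the institution to answer every question on the slide explicitly, rather than leaving gaps such as verification or student review unaddressed.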
Do they know?
Do they have the right to know?
Can they opt out and what are the implications if they do/don’t?
Adapted from Prinsloo, P., & Slade, S. (2015). Student privacy self-management: implications for learning analytics. Presentation at LAK15, Poughkeepsie, NY, 16 March 2015
http://www.slideshare.net/prinsp/lak15-workshop-vulnerability-final
Points of departure (1)
(Big) data is…
…not an unqualified good (boyd and Crawford, 2011) and “raw data is an oxymoron” (Gitelman, 2013) – see Kitchin, 2014
Technology and specifically the use of data have been and will always be ideological (Henman, 2004; Selwyn, 2014) and embedded in relations of power (Apple, 2004; Bauman, 2012)
“… ‘educational technology’ needs to be
understood as a knot of social, political,
economic and cultural agendas that are riddled
with complications, contradictions and conflicts”
(Selwyn, 2014, p. 6)
Points of departure (2):
If we accept that… what are the implications for the collection, analysis and use of student data?
Points of departure (3): The (current?) limitations of our surveillance
• Students’ digital lives are but a minute part of a bigger whole – but our collection and analysis pretend as if this minute part represents the whole
• We create smoke and claim we see a fire – so what does the number of clicks mean?
• We seldom wonder what if our algorithms are wrong, and what are the long-term implications for students?
What are the implications for the collection, analysis and use of student (digital) data? (Prinsloo & Slade, 2015)
1. The duty of reciprocal care
• Make terms and conditions accessible and understandable (the latter may mean longer…)
• Make it clear what data is collected, when, for what purpose, for how long it will be kept and who will have access and under what circumstances
• Provide users access to information and data held about them, to verify and/or question the conclusions drawn, and where necessary, provide context
• Provide access to a neutral ombudsperson (Prinsloo & Slade, 2015)
What are the implications …? (2)
2. The contextual integrity of privacy and data – ensure the contextual integrity and lifespan of personal data. Context matters…
3. Student agency and privacy self-management
• The fiduciary duty of higher education implies a social contract of goodwill and ‘do no harm’
• The asymmetrical power relationship between institution and students necessitates transparency, accountability, access and input/collaboration
• Empower students – digital citizenship/care
• The costs and benefits of sharing data with the institution should be clear
• Higher education should not accept a non-response as equal to opting in…
(Prinsloo & Slade, 2015)
What are the implications …? (3)
4. Future direction and reflection
• Rethink consent and employ nudges – move away from thinking just in terms of a binary of opting in or out, and provide a range of choices in specific contexts or needs
• Develop partial privacy self-management – based on context/need/value
• Adjust privacy’s timing and focus – the downstream use of data, the importance of contextual integrity, the lifespan of data
• Move toward substance over neutrality – blocking troublesome and immoral practices, but also soft, negotiated spaces of reciprocal care
(Prinsloo & Slade, 2015)
Ethical use of Student Data for Learning Analytics Policy
An example of the institutionalisation of thinking about the ethical implications of using student data
Available at: http://www.open.ac.uk/students/charter/essential-documents/ethical-use-student-data-learning-analytics-policy
(In)conclusions
“The way forward involves
(1) developing a coherent approach to consent, one that accounts for the social science discoveries about how people make decisions about personal data;
(2) recognising that people can engage in privacy self management only selectively;
(3) adjusting privacy law’s timing to focus on downstream uses; and
(4) developing more substantive privacy rules.
These are enormous challenges, but they must be tackled”
(Solove, 2013)
(In)conclusions
“Technology is neither good nor bad; nor is it neutral… technology’s interaction with social ecology is such that technical developments frequently have environmental, social, and human consequences that go far beyond the immediate purposes of the technical devices and practices themselves”
Melvin Kranzberg (1986, p. 545 in boyd & Crawford, 2012, p. 1)
THANK YOU
Paul Prinsloo (Prof)
Research Professor in Open Distance Learning (ODL)
College of Economic and Management Sciences, Office number 3-15, Club 1, Hazelwood, P O Box 392, Unisa, 0003, Republic of South Africa
T: +27 (0) 12 433 4719 (office)
T: +27 (0) 82 3954 113 (mobile)
[email protected]: paul.prinsloo59
Personal blog: http://opendistanceteachingandlearning.wordpress.com
Twitter profile: @14prinsp