Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
TRANSCRIPT
Bart Custers PhD MSc LLM
Associate professor / head of research
eLaw – Center for Law and Digital Technologies
Leiden University, The Netherlands
Cyber Summit 2016 – Banff, Canada
October 27th 2016, 2:15 pm – 2:45 pm
• Introduction: big data and data reuse
• Eudeco project
• Generating new data vs. data reuse
• Legal and ethical issues
  ▪ Privacy, security
  ▪ Discrimination, stigmatization, polarization
  ▪ Consent, autonomy, self-determination
  ▪ Transparency, integrity, trust
• Suggestions for solutions
• Conclusions
More data => more opportunities.
This calls for data sharing and reuse.
• The Eudeco project (3 years)
  ▪ Five partners
  ▪ Four countries
• Modeling the European Data Economy
  ▪ Focus on big data and data reuse
  ▪ Legal, societal, economic and technological perspectives
Big Data
• Volume (big)
• Velocity (fast)
• Variety (unstructured)

• People
  ▪ Social media
  ▪ User generated content
• Devices (Internet of Things)
  ▪ Sensors: cameras, microphones
  ▪ Trackers: RFID tags, web surfing behavior
  ▪ Other: mobile phones, wearables, self-surveillance / quantified self
Data sharing
• Active role of data subjects (hence: consent)

Data reuse (with/without consent)
• Data recycling: data reuse for the same purpose
• Data repurposing: data reuse for new purposes
• Data recontextualisation: data reuse in a new context
Data reuse may…
• be more efficient
• be more effective (e.g., larger volumes, more completeness)
• include historical data
• not always match original purposes and context
• be difficult:
  ▪ Technological (e.g., interoperability, data portability)
  ▪ Legal (e.g., privacy laws)
  ▪ Economic (e.g., competition)

• Right to data portability
• Right to be forgotten

Facebook likes can predict: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. (Kosinski et al. 2013)
Legal perspective
• Violations of privacy depend on your definition of privacy

Ethical perspective
• Violations of privacy depend on your expectations
  ▪ Subjective: personal expectations
  ▪ Objective: reasonable expectations

• Unwanted disclosure of information
  ▪ Security (hacking, leaking)
  ▪ Predictions
• Unwanted use of information
  ▪ Transparency regarding decision-making
  ▪ Function creep
Informational privacy:
Which data are used? For which purposes?
Data may be discriminating:
• When police surveillance focuses on black neighborhoods, the people in the database will be black (selective sampling)

Patterns may be discriminating:
• A database may show that top managers are male (self-fulfilling prophecy)
• People causing car accidents are >16 years old (non-novel pattern)

Discrimination may be concealed/indirect:
• Selection on zip code instead of ethnic background (redlining)
• Selection on legitimate attributes correlated to discriminating attributes (masking)
Discrimination → Stigmatisation → Polarisation
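The masking point above can be made concrete with a small sketch. Everything here is synthetic and invented for illustration (the group names, zip codes, and probabilities are not from the talk): a selection rule that never sees the sensitive attribute still reproduces the bias, because the zip code acts as a proxy for it.

```python
import random

random.seed(0)

# Synthetic population (invented for illustration): group membership
# correlates strongly with zip code, mimicking residential segregation.
people = []
for _ in range(10_000):
    group = random.choice(["A", "B"])
    if group == "A":
        zip_code = 1000 if random.random() < 0.9 else 2000
    else:
        zip_code = 2000 if random.random() < 0.9 else 1000
    people.append({"group": group, "zip": zip_code})

# A seemingly neutral selection rule that only looks at the zip code...
selected = [p for p in people if p["zip"] == 1000]

# ...still selects overwhelmingly from group A (masking/redlining).
share_a = sum(p["group"] == "A" for p in selected) / len(selected)
print(f"Share of group A among those selected by zip code: {share_a:.0%}")
```

Removing the sensitive attribute from the database therefore does not by itself prevent discriminatory outcomes, which is one motivation for the discrimination-aware data mining mentioned later in the talk.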
Privacy policies / Terms & Conditions
• People do not read policies
  ▪ Reading everything would take 244 hours annually
  ▪ Users are willing to spend 1-5 minutes on this
  ▪ Facebook: 9,500 words (>1 hour), LinkedIn: 7,500 words (~1 hour)
• People do not understand policies
  ▪ Policies are often highly legalistic, technical, or both
  ▪ The devil is in the details
• People do not grasp the consequences
  ▪ The preferred option is not available
  ▪ Take-it-or-leave-it decisions: check the box
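As a back-of-the-envelope check of the word counts above, the figures are consistent with an assumed reading speed of roughly 150 words per minute for careful reading of legal text; the speed figure is our assumption, not taken from the talk.

```python
# Back-of-the-envelope check of the reading-time claims above.
# The ~150 words-per-minute speed is an assumed figure for careful
# reading of legal text; it is not taken from the talk itself.
READING_SPEED_WPM = 150

def reading_time_minutes(word_count: int) -> float:
    """Minutes needed to read a policy of the given length."""
    return word_count / READING_SPEED_WPM

for name, words in [("Facebook", 9_500), ("LinkedIn", 7_500)]:
    minutes = reading_time_minutes(words)
    print(f"{name}: {words:,} words -> ~{minutes:.0f} minutes")
```

At that speed, Facebook's 9,500 words come to just over an hour and LinkedIn's 7,500 words to about 50 minutes, matching the ">1 hour" and "~1 hour" figures on the slide.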
Informational self-determination (Westin, 1967):
People control who gets their data and for which purposes.
Past → Current → Future?
• Big data is used for a lot of decision-making
  ▪ Based on what data?
  ▪ Based on which analyses?

Do you know how many databases you are in?
• Limiting access to sensitive data
  ▪ The basic idea is that if sensitive data are absent from the database/cloud, the resulting decisions/selections cannot be discriminating
  ▪ However, restricting access is very difficult: according to information theory, the dissemination of data follows the laws of entropy
    ▪ Information can easily be copied and multiplied
    ▪ Information can easily be distributed
    ▪ This process is irreversible
Analyze the problem:
• Privacy Impact Assessments

Customize the solution:
• Privacy by Design
  ▪ Privacy enhancing tools
  ▪ Privacy preserving big data analytics
  ▪ Discrimination aware data mining
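To give one concrete instance of the privacy-preserving big data analytics named above, here is a minimal sketch of a differentially private count using the Laplace mechanism. The choice of mechanism, the epsilon values, and the toy dataset are all our own illustration; the talk itself does not prescribe any particular technique.

```python
import random

random.seed(42)

def dp_count(values, predicate, epsilon=1.0):
    """Release a noisy count of the items matching `predicate`.

    Adding Laplace(1/epsilon) noise masks any single record's presence,
    so the published count reveals little about any one individual.
    The difference of two exponential samples is Laplace-distributed.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Toy dataset (invented for illustration): ages of 1,000 people.
ages = [random.randint(18, 90) for _ in range(1_000)]
noisy = dp_count(ages, lambda a: a >= 65, epsilon=0.5)
print(f"Noisy count of people aged 65+: {noisy:.1f}")
```

A smaller epsilon means more noise and stronger privacy; repeated queries against the same data consume the privacy budget additively, so the analyst must trade accuracy against protection.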
Since there is not one problem, there is no single solution:
combinations of smart solutions are required.

New perspectives
• Focus less on:
  ▪ Limiting access to data
  ▪ Restricting the use of data
• Focus more on:
  ▪ Transparency
  ▪ Responsibility
Restricting data access and use limits big data opportunities and is difficult to enforce.
• We need data sharing and data reuse
• There are risks, however, regarding:
  ▪ Privacy, discrimination, consent, transparency
• These risks can be addressed via responsible innovation:
  ▪ Privacy Impact Assessments
  ▪ Privacy by Design
    ▪ Privacy enhancing tools
    ▪ Privacy preserving big data analytics
    ▪ Discrimination aware data mining
• New approaches:
  ▪ Focus less on limiting access to data and restricting its use
  ▪ Focus more on transparency and responsibility