Mitigating risk through implementing
5 safes principles at scale
Nathan Cunningham
Director for Big Data
SAFE OUTPUTS
5 SAFES, Manchester
17-18 September 2015
Privacy Challenges Analysis: It’s surprisingly easy to identify individuals from credit-card metadata. There are two things we know about publicly releasing large sets of data:
1. It’s really hard (maybe impossible) to completely anonymize big data.
2. You’re really easy to identify in most anonymized data sets, using just a small amount of information.
Unique in the shopping mall: On the reidentifiability of credit card metadata (de Montjoye et al., Science, 30 January 2015)
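The second point can be illustrated with a minimal sketch that measures how often a handful of known (shop, day) points singles out one person. The traces below are hypothetical toy data, and the `unicity` function is an illustrative name; it enumerates every k-point subset rather than sampling random points as the cited study does.

```python
from itertools import combinations

# Hypothetical toy traces: person -> set of (shop, day) points. Not real data.
traces = {
    "A": {("bakery", 1), ("cafe", 2), ("gym", 3)},
    "B": {("bakery", 1), ("cafe", 2), ("bar", 3)},
    "C": {("bakery", 1), ("shoe", 2), ("gym", 3)},
}

def unicity(traces, k):
    """Fraction of k-point subsets (taken from each person's own trace)
    that match exactly one person in the whole data set."""
    unique = 0
    total = 0
    for person, trace in traces.items():
        for combo in combinations(sorted(trace), k):
            points = set(combo)
            # Everyone whose trace contains all of the known points.
            matches = [p for p, t in traces.items() if points <= t]
            total += 1
            if matches == [person]:
                unique += 1
    return unique / total

for k in (1, 2, 3):
    print(k, round(unicity(traces, k), 2))
```

Even in this tiny example the share of identifying subsets rises quickly as more points are known, which is the pattern the slide describes.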
• COMMENT
Big Access
The Good, The Bad and the Ugly
BIG Concerns:
• Big data tools can alter the balance of power between government and citizen. Government agencies can reap enormous benefits from using big data to improve service delivery or detect payment fraud. But government uses of big data also have the potential to chill the exercise of free speech or free association. As more data is collected, analyzed, and stored on both public and private systems, we must be vigilant in ensuring that balance is maintained between government and citizens, and revise our laws accordingly.
• Big data tools can reveal intimate personal details. One powerful big data technique involves merging multiple data sets, drawn from disparate sources, to reveal complex patterns. But this practice, sometimes known as “data fusion,” can also lead to the so-called “mosaic effect,” whereby personally identifiable information can be discerned even from ostensibly anonymized data. As big data becomes even more widely used in the private sector to bring a wellspring of innovations and productivity, we must ensure that effective consumer privacy protections are in place to protect individuals.
• Big data tools could lead to discriminatory outcomes. As more decisions about our commercial and personal lives are determined by algorithms and automated processes, we must pay careful attention that big data does not systematically disadvantage certain groups, whether inadvertently or intentionally. We must prevent new modes of discrimination that some uses of big data may enable, particularly with regard to longstanding civil rights protections in housing, employment, and credit.
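The "mosaic effect" in the second concern can be sketched as a simple linkage of two data sets on shared quasi-identifiers. Everything here is an assumption made for illustration: the records are invented, and the field names (`zip`, `birth_year`, `sex`) and the `link` helper are hypothetical.

```python
# Hypothetical records; all values are made up for illustration.
released = [  # "anonymized" release: names removed, quasi-identifiers kept
    {"zip": "M1 1AA", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "M1 1AA", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "M2 2BB", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
]
public = [  # a second, unrelated source that still carries names
    {"name": "Alice", "zip": "M1 1AA", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "M1 1AA", "birth_year": 1975, "sex": "M"},
    {"name": "Carol", "zip": "M3 3CC", "birth_year": 1990, "sex": "F"},
]
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def key(record):
    """Tuple of quasi-identifier values used to join the two sources."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)

def link(released, public):
    """Return (name, diagnosis) pairs where the join key matches one name."""
    names_by_key = {}
    for record in public:
        names_by_key.setdefault(key(record), []).append(record["name"])
    linked = []
    for record in released:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # unambiguous match: the person is re-identified
            linked.append((names[0], record["diagnosis"]))
    return linked

print(link(released, public))
```

Neither source is sensitive on its own terms in this sketch; the disclosure comes only from combining them.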
The risks associated with Big Data
The risk spectrum for Unsafe Data
Disruptive Technology
What question do you want to answer?
Governance modeling
Enable Streaming
Brokering 5 safes as a service
Streams and Lakes
• Data Classification – to create an understanding of the data within Hadoop and provide a classification of this data to external and internal sources
• Centralized Auditing – to provide a framework for capturing and reporting on access to and modifications of data within Hadoop
• Search and Lineage – to allow pre-defined and ad-hoc exploration of data and metadata while maintaining a history of how a data source or explicit data was constructed
• Security and Policy Engine – to protect data and rationalize data access according to compliance policy.
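How classification, a policy engine and centralized auditing might fit together can be sketched in a few lines. This is a toy model under stated assumptions, not a Hadoop API: the labels, roles, field names and the `request_access` function are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical classification of fields (Data Classification).
CLASSIFICATION = {
    "sales.card_number": "restricted",
    "sales.region": "internal",
}

# Hypothetical policy: which labels each role may read (Policy Engine).
POLICY = {
    "analyst": {"public", "internal"},
    "steward": {"public", "internal", "restricted"},
}

audit_log = []  # every decision is recorded here (Centralized Auditing)

def request_access(user, role, field):
    """Decide whether `role` may read `field`, and log the decision."""
    # Unclassified fields are treated as restricted (assumed default).
    label = CLASSIFICATION.get(field, "restricted")
    allowed = label in POLICY.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "field": field,
        "label": label,
        "allowed": allowed,
    })
    return allowed

print(request_access("u1", "analyst", "sales.region"))
print(request_access("u1", "analyst", "sales.card_number"))
print(len(audit_log))
```

Denied requests are logged as well as granted ones, so the audit trail reflects attempted access and not only successful reads.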
Put the user first
• Develop a systems view from the user’s perspective.