TRANSCRIPT
Towards a Modern Approach to Privacy-Aware Government Data Releases
Micah Altman, David O’Brien & Alexandra Wood
MIT Libraries; Berkman Center for Internet & Society
Open Data: Addressing Privacy, Security, and Civil Rights Challenges
19th Annual BCLT/BTLJ Symposium
April 2015
Disclaimer
These opinions are our own. They are not the opinions of MIT, Brookings, Berkman, any of the project funders, nor (with the exception of co-authored previously published work) our collaborators.
Collaborators & Co-Conspirators
Collaborators
● The Privacy Tools for Research Data Project <privacytools.seas.harvard.edu>
● Research Support from Sloan Foundation; National Science Foundation (Award #1237235); Microsoft Corporation
Related Work
● Vadhan, S., et al. 2011. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections.”
● Altman, M., D. O’Brien, S. Vadhan, A. Wood. 2014. “Big Data Study: Request for Information.”
● O’Brien, et al. 2015. “Integrating Approaches to Privacy Across the Research Lifecycle: When Is Information Purely Public?” (Mar. 27, 2015). Berkman Center Research Publication No. 2015-7.
● Wood, et al. 2014. “Integrating Approaches to Privacy Across the Research Lifecycle: Long-Term Longitudinal Studies” (July 22, 2014). Berkman Center Research Publication No. 2014-12.
Preprints and reprints available from: informatics.mit.edu
Goals
1. Examine critical use cases
2. Develop a framework for systematically analyzing privacy in releases of data
3. Produce a guide for selecting among new legal and technical tools for privacy protection
Use Cases for Government Data Releases
● Freedom of Information Act/Privacy Act
● Open Government/E-Government Initiatives
● Traditional Public and Vital Records
● Official Statistics
Recent Examples
● E-Government Data: Occupational Safety and Health Administration release of workplace injury records
● Open Government Data: Open cities data
Public Release of Workplace Injury Records
Benefits from Public Data Availability
● Transparency as a democratic principle
● Accountability of institutions
● Economic and social welfare benefits
● Data for research and scientific progress
Scope of Information Made Public
● All collected data not protected by FOIA, the Privacy Act, or OSHA reporting regulations
● Redaction of names, addresses, dates of birth, and gender
● Information to be released includes job title, date and time of incident, and descriptions of injury or illness and where and how it occurred
Figure: OSHA rulemaking mockup of proposed web display of injury/illness reports
Re-identification Risks
● Individuals can be identified despite redaction of directly identifying fields or attributes
● Robust de-identification of microdata is a very difficult problem, and free-form text fields are especially challenging
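The first risk above can be made concrete with a minimal sketch. It counts how many redacted records remain unique on the fields that are still released. The records, field names, and the choice of quasi-identifiers are hypothetical and are not taken from OSHA data. A unique combination does not by itself prove re-identification, but it marks records that anyone with outside knowledge of the workplace could plausibly single out.

```python
from collections import Counter

# Hypothetical redacted injury records: names, addresses, dates of birth, and
# gender are removed, but establishment, job title, and incident date remain.
records = [
    {"establishment": "Plant A", "job_title": "Welder", "incident_date": "2014-03-02"},
    {"establishment": "Plant A", "job_title": "Welder", "incident_date": "2014-03-02"},
    {"establishment": "Plant A", "job_title": "Forklift operator", "incident_date": "2014-03-09"},
    {"establishment": "Plant B", "job_title": "Night-shift nurse", "incident_date": "2014-04-17"},
]

# Assumption: these released fields act as quasi-identifiers.
QUASI_IDENTIFIERS = ("establishment", "job_title", "incident_date")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys=QUASI_IDENTIFIERS):
    """Return the records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


if __name__ == "__main__":
    singled_out = unique_records(records)
    print(f"{len(singled_out)} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
```

Free-form text fields are harder still, because the identifying detail is not confined to a fixed set of columns that a check like this can enumerate.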
Information Sensitivity
● OSHA identifies “privacy concern cases” as injuries or illnesses related to sexual assault, mental health, or infectious diseases
● Details regarding other injuries or illnesses, such as those related to drug or alcohol abuse, may also be sensitive but are not included
Review, Reporting, and Accountability
● Lack of review mechanisms, such as systematic redactions of sensitive information before release
● Lack of accountability for harm arising from misuse of disclosed data
Framework for Modern Privacy Analysis
Observations
Privacy is not a simple function of the presence or absence of specific fields, attributes, or keywords in a released set of data.
Other factors, including what one can learn or infer about individuals from a data release as a whole or when linked with other information, may lead to harm.
Observations
Redaction, pseudonymization, coarsening, and hashing are often neither adequate nor appropriate practices, and releasing less information is not always a better approach to privacy.
Simple redaction of information that has been identified as sensitive is often not a guarantee of privacy protection and may also reduce the usefulness of the information. In addition, the act of redacting certain fields of a record may reveal the fact that a record contains sensitive information.
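As one illustration of why hashing alone is often inadequate, the sketch below reverses an unsalted hash of an identifier drawn from a small space by trying every candidate. The 4-digit employee ID format and the function names are hypothetical, chosen only to keep the example short.

```python
import hashlib


def pseudonymize(value: str) -> str:
    """Replace a value with its unsalted SHA-256 digest (hex)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def reverse_by_enumeration(digest: str, candidates):
    """Recover the original value by hashing every candidate and comparing."""
    for candidate in candidates:
        if pseudonymize(candidate) == digest:
            return candidate
    return None


# Hypothetical 4-digit employee IDs: only 10,000 possible values exist.
candidate_ids = [f"{n:04d}" for n in range(10_000)]

released = pseudonymize("0427")  # the value as it would appear in a release
recovered = reverse_by_enumeration(released, candidate_ids)
print(recovered)  # prints 0427
```

The same enumeration works for any identifier whose possible values can be listed, such as dates of birth or ZIP codes, which is why the size of the input space matters more than the strength of the hash function.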
Observations
Naïve use of any data sharing model, including a more advanced model, is unlikely to provide adequate protection.
Thoughtful analysis with expert consultation is necessary in order to evaluate the sensitivity of the data collected, to quantify the associated re-identification risks, and to design useful and safe release mechanisms.
Framework for Privacy Analysis
● Benefits from public data availability
● Scope of information made public
● Re-identification risks
● Information sensitivity
● Review, reporting, and information accountability
Privacy Interventions at Any Stage
Data Sharing Models
Data Management Approaches
● Access controls (including tiered access models)
● Secure data enclaves
● Personal data stores
● Audit systems
● Information accountability/operational policy
● Risk assessments
Legal & Regulatory Approaches
● Notice and consent
● Data sharing agreements
● Transparency and audit requirements
● Data minimization requirements
● Accountability for misuse, including civil and criminal penalties and private rights of action
Statistical & Computational Approaches
● Contingency tables
● Synthetic data
● Data visualizations
● Interactive mechanisms
● Multiparty computations
● Functional and homomorphic encryption
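To make one of these approaches concrete, the sketch below releases a contingency table with random noise added to each cell. The slides do not name a specific technique; the Laplace mechanism from differential privacy is used here as an assumption, and the records, category lists, and `epsilon` value are hypothetical. It is a minimal illustration, not a vetted release mechanism.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_contingency_table(rows, row_values, col_values, epsilon, rng=None):
    """Return {(row, col): noisy count} for every cell in the cross product.

    Each record falls in exactly one cell, so adding or removing one record
    changes one count by 1; Laplace noise with scale 1/epsilon is added per cell.
    Empty cells are noised too, so the set of released cells reveals nothing.
    """
    rng = rng or random.Random()
    true_counts = Counter(rows)
    return {
        cell: true_counts[cell] + laplace_noise(1.0 / epsilon, rng)
        for cell in itertools.product(row_values, col_values)
    }


# Hypothetical (job title, injury type) pairs.
reports = [("Welder", "Burn"), ("Welder", "Burn"), ("Nurse", "Needlestick")]
table = noisy_contingency_table(
    reports,
    row_values=["Welder", "Nurse"],
    col_values=["Burn", "Needlestick"],
    epsilon=0.5,
    rng=random.Random(0),
)
for cell, count in sorted(table.items()):
    print(cell, round(count, 2))
```

Smaller values of `epsilon` add more noise and give stronger protection at the cost of accuracy. Choosing that trade-off, and accounting for repeated releases from the same data, is part of the expert analysis the Observations call for.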
Selecting Appropriate Controls
Data Structure
● Logical Structure (e.g., single relation, multiple relational, network/graph, semi-structured, geospatial, aggregate table)
● Source
● Unit of observation
● Attribute measurement type (e.g., continuous/discrete; ratio/interval/ordinal/nominal scale; associated schema/ontology)
● Performance characteristics (e.g., dimensionality/number of measures, number of observation/volume, sparseness, heterogeneity/variety, frequency of updates/velocity)
● Quality characteristics (e.g., measurement error, metadata, completeness, total error)
Analysis Type
● Form of output (e.g., summary scalars, summary table, model parameters, data extract, static data publication, static visualization, dynamic visualization, statistical/model diagnostics)
● Analysis methodology (e.g., contingency tables/counting queries, summary statistics/function estimation, regression models/GLM, general model-based statistical estimation/MLE/MCMC, bootstraps/randomization/data partitioning, data mining/heuristics/custom algorithms)
● Analysis goal (e.g., rule-based, theory formation, existence proof, verification, descriptive inference, forecasting, causal inference, mechanistic inference)
● Utility/loss/quality measure (e.g., entropy, mean squared error, realism, validity of descriptive/predictive/causal statistical inference)
References
● Salil Vadhan, et al., Comments to the Department of Health and Human Services and the Food and Drug Administration, Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections, Docket No. HHS-OPHS-2011-0005 (Oct. 26, 2011), available at http://privacytools.seas.harvard.edu/files/commonruleanprm.pdf.
● Micah Altman, David O’Brien, & Alexandra Wood, Comments to the Occupational Safety and Health Administration, Re: Proposed Rule: Improve Tracking of Workplace Injuries and Illnesses, OSHA-2013-0023-1207 (March 10, 2014), available at http://www.regulations.gov/#%21documentDetail;D=OSHA-2013-0023-1207.
● Micah Altman, David O’Brien, Salil Vadhan, & Alexandra Wood, Comments to the White House Office of Science and Technology Policy, Re: Big Data Study; Request for Information (March 31, 2014), available at http://privacytools.seas.harvard.edu/files/whitehousebigdataresponse1.pdf.
● David O’Brien, et al., Integrating Approaches to Privacy Across the Research Lifecycle: When Is Information Purely Public?, Berkman Center Research Publication No. 2015-7 (March 27, 2015), available at http://ssrn.com/abstract=2586158 or http://dx.doi.org/10.2139/ssrn.2586158.
● Alexandra Wood, et al., Integrating Approaches to Privacy Across the Research Lifecycle: Long-Term Longitudinal Studies, Berkman Center Research Publication No. 2014-12 (July 22, 2014), available at http://ssrn.com/abstract=2469848 or http://dx.doi.org/10.2139/ssrn.2469848.
Questions
E-mail: Micah Altman, [email protected]
Web: privacytools.seas.harvard.edu