Privacy and Security Workgroup: Summary of Big Data Public Hearings January 26, 2015 Deven McGraw, chair Stan Crosley, co-chair


Privacy and Security Workgroup:

Summary of Big Data Public Hearings

January 26, 2015

Deven McGraw, chair
Stan Crosley, co-chair


Agenda

• PSWG Workplan
• Scope
• Key Themes
• Topics to Discuss
  • De-identification
  • Consent
• Backup Slides – Summary of Hearing Testimony

Privacy and Security Draft Workplan

Meetings Task

December 5, 2014 • Virtual hearing – big data and privacy

December 8, 2014 • Virtual hearing – big data and privacy

January 12, 2015 • Big data and privacy in health care

January 26, 2015 • Big data and privacy in health care

February 9, 2015 • Big data and privacy in health care

HITPC Meeting March 10, 2015 • Tentative Date to Present Initial Findings/Recommendations to HITPC

PSWG Workplan Scope Key Themes De-identification Consent


Scope

In scope:
• Privacy and security concerns
• Potential harmful uses (related to privacy)

Out of scope:
• Data quality/data standards
• Non representativeness of data?
  • Shouldn’t try to resolve this from the standpoint of increasing “representativeness” of data but should be considered in discussion of harmful uses


Key Themes

1. Concerns about tools commonly used to protect privacy
   A. De-identification
   B. Patient consent v. norms of use
   C. Transparency
   D. Collection/use/purpose limitations
   E. Security
2. Preventing/Limiting/Redressing Harms
3. Legal Landscape
   A. Gaps or “under” regulation
   B. “Over-” or “mis-” regulation


Topic 1: De-identification - Concerns

Critical tool for protecting privacy, but:
• Concerns persist about re-identification risk, particularly when data sets are combined (mosaic effect) and for data de-identified using the safe harbor method
  • But safe harbor is intended to be easy to use and low cost, to encourage de-identification
• No prohibition/penalties against re-identification
• When expert determination is used, no transparency or objective scrutiny of methods
• Also de-identified data useful for many analytic needs – but not all (not the panacea)
• Even when individuals are not re-identified in the dataset, sensitive information/attributes about them may be revealed/inferred
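The mosaic-effect concern above can be illustrated with a small sketch. It is not drawn from the hearing testimony: the records, field names (`pid`, `zip3`, `birth_year`, `gym`), and helper functions are all hypothetical, and the smallest-group measure is the standard k-anonymity idea rather than anything HIPAA prescribes. The point is only that attributes added by combining datasets can shrink the group a person hides in.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def combine(left, right, key):
    """Join two lists of records on a shared key (hypothetical pseudonymous id)."""
    by_key = {r[key]: r for r in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]

# Hypothetical de-identified clinical extract: every record shares its
# (zip3, birth_year) combination with at least one other record.
clinical = [
    {"pid": 1, "zip3": "021", "birth_year": 1980, "diagnosis": "asthma"},
    {"pid": 2, "zip3": "021", "birth_year": 1980, "diagnosis": "diabetes"},
    {"pid": 3, "zip3": "021", "birth_year": 1975, "diagnosis": "asthma"},
    {"pid": 4, "zip3": "021", "birth_year": 1975, "diagnosis": "heart failure"},
]

# Hypothetical second dataset that adds one more attribute per person.
fitness = [
    {"pid": 1, "gym": "north"},
    {"pid": 2, "gym": "south"},
    {"pid": 3, "gym": "north"},
    {"pid": 4, "gym": "north"},
]

combined = combine(clinical, fitness, "pid")

k_before = smallest_group(clinical, ["zip3", "birth_year"])
k_after = smallest_group(combined, ["zip3", "birth_year", "gym"])
print(k_before, k_after)
```

In this toy data the smallest group falls from two records to one once the extra attribute is joined in, which is why re-assessing re-identification risk after datasets are combined is a recurring theme.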


Topic 1: De-identification - Definitions

Potentially helpful definitions:
• HIPAA Definition of “de-identified”: § 164.514 Other requirements relating to uses and disclosures of protected health information. (a) Standard: de-identification of protected health information. Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.
• From NIH – “Data Enclave” - A controlled, secure environment in which eligible researchers can perform analyses using restricted data resources, but not take the data with them.


Topic 1: De-identification - Recommendations

Possible Solutions: [ideally we identify some “actors” for these recommendations]
• Federal regulators should work together to set consistent de-identification standards for all personal data (HIPAA has the only standard now) and provide incentives for use of de-identified data. Re-identification risk reduction measures applied should depend on context (more applied for public use datasets vs. circumstances where access is controlled, such as through data enclaves)
• Regulators, led by OCR, should continue to define standards and best practices for expert determination. Regulators and industry could collaborate to establish a mechanism to objectively vet statistician approaches; should they also be required to be published?
• Propose certification or accreditation for de-identification experts/organizations
  • Certification may professionalize and grow the field
  • Who should do this?
• Package statistical expertise via automation to provide an easy (and ideally affordable) alternative to safe harbor [who should do this?]


Topic 1: De-identification - Recommendations

Possible Solutions: [ideally we identify some “actors” for these recommendations]
• Congress should enact prohibitions on re-identification and establish penalties for unauthorized re-identification
  • Regulations may need to establish public policy exceptions (for health & safety, or for white hat testing of de-identification techniques?)
• Regulators should require re-assessment of re-identification risk when datasets are combined
  • Re-identification or the “mosaic effect” should be approved by IRBs or Privacy Boards
• OCR should re-evaluate (or limit the use of) Safe Harbor (for example, limit its use to those datasets that meet the presumption upon which Safe Harbor was created or has been tested; no public release datasets?)
• Regulators should impose security requirements to protect de-identified data; security protections should be commensurate with risk.
• How to deal with risk of privacy disclosures or inferences that are not due to re-identification?


Topic 1: De-identification - Recommendations

Possible Solutions: [ideally we identify some “actors” for these recommendations]
• Regulators should examine potential for reduced requirements for de-identification in certain circumstances for validated research. What are some of the circumstances?
  • Access to data in controlled environments, such as data enclaves (NIH definition: A controlled, secure environment in which eligible researchers can perform analyses using restricted data resources.)
  • Internal use only vs. disclosure to others.
  • Execution of data use agreements setting forth permitted uses and prohibiting re-identification (similar to what is required for a HIPAA limited data set).
  • Patient-controlled research initiatives?
  • Where research has been approved by an IRB or Privacy Board.


Topic 2: Consent - Concerns

Valued tool for protecting privacy and individual autonomy but:
• Difficult to obtain informed consent up front for future, valuable big data uses and re-uses
  • Some secondary uses may be unexpected (for example, in data analytics models where the data surface the hypotheses)
• May be impossible for large scale studies
• Even allowing opt-out may skew results
• Lays burden for privacy on individual
• May work best when not over-utilized (for example, not requiring for “expected” uses)
• Policy tension with the tech landscape (technologies to enable are evolving but policies may not reflect technical capabilities). See TSSWG meeting slides on consent. http://www.healthit.gov/facas/calendar/2014/12/17/standards-transport-security-standards-workgroup
• When is transparency a better strategy for engaging individuals than seeking their individual consent, or even allowing opt-outs?


Topic 2: Consent - Recommendations

• Regulators should evaluate policies governing research uses of health data to determine when/under what circumstances such research uses can be pursued under individual engagement models not confined to opt-in specific authorization of a particular research use.
  • Presume research is defined as is currently done in HIPAA and the Common Rule: “systematic investigation….intended to produce generalizable knowledge” [check wording]
  • Consider whether secondary (with TPO not considered a secondary use) use of information introduces additional risk for individual, depending on context:
    • Is research being done in a controlled environment? Internal vs. external?
    • Are there limitations on who is permitted to see the information, and how much information is exposed (identifiability)?
    • Is research intended for public benefit? (Is the research definition itself sufficient to impose this limitation?)
    • Are there reasonable security protections for the data?
  • Could be accomplished through changes in regulation or guidance under existing regulations
    • But could still have problem of varying interpretations by individual institutions, IRBs


Topic 2: Consent - Recommendations

• Regulators and industry should explore/pursue/implement technology options that enable choice when it is required to be obtained.
  • Downstream restrictions coupled with consent provenance.
• Transparency to individuals about actual data uses – whether for identifiable or de-identified data – is key, particularly in circumstances where choice is not provided or is more limited. [what action/what actors?]


Backup Slides: Summary of Hearing Testimony


Health Big Data Opportunities & the Learning Health System Testimony

Beneficial opportunities for using data associated with the social determinants of health
• User generated data; e.g., track diet, steps, workout, sleep, mood, pain, and heart rate
• 3 characteristics: (1) breadth of variables captured, (2) near continuous nature of its collection, and (3) sheer numbers of people generating the data
• Personal benefits: predictive algorithms for risk of readmission in heart failure patients
• Community benefits: asthma inhaler data to identify hot spots; track aggregate behavior of runners
• Key issues: privacy, informed consent, access to the data and data quality
• Important to allow experimentation for the technology and methods to improve
• Important to allow institutions to catch up and learn how best to take advantage of opportunities and realize potential benefits

“Care between the care” patient defined data. May ultimately reveal a near total picture of an individual – merged clinical and patient data; data must flow back and forth. Data needs access, control and privacy mechanisms throughout its life cycle, at level of data use, not just data generation; data storage is not well thought through.


Health Big Data Opportunities & the Learning Health System Testimony

Must embed learning into care delivery; we still do not have answers for a large majority of health questions

Key points:
1. Sometimes there is a need to use fully identifiable data
2. It is not possible to get informed consent for all uses
3. Impossible to notify individuals personally about all uses
4. Can’t do universal opt-out because answers could be unreliable
5. There is likely a standard that could be developed that determines “clearly good/appropriate uses” and “clearly bad/inappropriate uses”

Focus on:
6. Minimum necessary amount of identifiable data (but offset by future use needs)
7. Good processes for approval and oversight
8. Uses of data stated publicly (transparency)
9. Number of individuals who have access to data minimized (distributed systems help accomplish this)

When we use identifiable data, we must store it in highly protected locations – “data enclaves”


Health Big Data Opportunities & the Learning Health System - Testimony

• Shift in the way we look into data and its use
• Paradigm of looking into the data first and then beginning to understand different findings and correlations that you didn’t think about in standard hypothesis-driven research, but you do when you’re doing data driven research

• Focus on sharing, integrating, and analyzing cancer clinical trial data
• Use de-identified data; de-identification is the responsibility of the data provider; most data providers use expert determination method

• Data collected and used to conduct topological data analysis
• Mathematics concept that allows one to see the shape of their data
• Analysis can identify healthcare fraud, waste, and abuse, as well as reduce clinical variation and improve clinical outcomes
• Use de-identified data
• We have not been able to get a data set that shows a continuum of care for a patient
• While interoperability isn’t exactly perfect in other industries, in healthcare we’ve seen that to be a unique issue


Health Big Data Opportunities & the Learning Health System Testimony

• Partners drawn from academia, care delivery, industry, technology and patient and consumer interest
• Key asset is the database – 7.7 terabytes of de-identified data from administrative claims of over 100 million individuals over 20 years, clinical data from electronic health records of 25 million patients, and consumer data on 30 million Americans
• Data provided to researchers via secure enclave
• Premise: combine the insights of multiple partners
• Key issue: systematically coordinating uses of de-identified techniques with subsequent uses of PHI

• Cloud-based, single instance software platform with 59,000 healthcare provider clients
• Products include EHR, practice management, and care coordination services
• Data immediately aggregated into databases; near real-time visibility into medical practice patterns
• Monitor visit data for diagnoses of influenza-like illness
• Tracking the impact of the ACA on community doctors; sentinel group of 15k doctors; measuring # patients seen, health status, and out-of-pocket payment requests


Health Big Data Concerns Testimony

A person’s health footprint now includes Web searches, social media posts, inputs to mobile devices, and clinical information such as downloads from implantable devices

Key issues include (1) notice and consent, (2) unanticipated/unexpected uses, and (3) security

HIPAA does not apply to most apps

Without clear ground rules and accountability for appropriately and effectively protecting user health data, data holders tend to become less transparent about their data practices

Patient perspective
• Frustration with “data dysfunction” - cannot access and combine his/her own data
• Privacy and security are cited as excuses/barriers that prevent access to personal data
• Health data is a social asset; there is a public need for data liquidity


Health Big Data Concerns Testimony

Issues from conferences on big data and civil rights:

1. The same piece of data can be used both to reduce health disparities and empower people and to violate privacy and cause harm
2. All data can be health data
3. Focus on uses and harms rather than costs and benefits. Focusing on C&B implies trade-offs. Instead, seek redress via civil rights laws.
4. Universal design. Design the technology and services to meet the range of needs without barriers for some.
5. Ensure privacy and security of health information via all the FIPPs, not just consent
6. Principle of preventing misuse of patient data. There are many good uses of health information, but there must also be some prohibitions.

Consumer Protections Testimony


• Ease of re-identification narrative may be misleading
  • If you de-identify data properly, success rate is very low for attacks. If you don’t use existing methods or de-identify data at all, and if data is attacked, success rate is high
• De-identification is a powerful privacy protective tool
• Most attacks on health data have been done on datasets that were not de-identified at all or not properly de-identified
• De-identification standards are needed to continue to raise the bar. There are good de-identification methods and practices in use today, but no homogeneity.
• HIPAA works fairly well – but mounting evidence that Safe Harbor has important weaknesses
• De-identification doesn’t resolve issues of harmful uses; may need other governance mechanisms, such as ethics or data access committees
• Privacy architectures. Still need to de-identify the data that goes into Safe Havens
• Distributed computation. You push the computations out to the data sources and have the analysis done where the data is located
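The distributed computation idea in the last bullet can be sketched roughly as follows. This is an illustrative assumption, not a description of any system presented at the hearings: the site data, the `age` field, and the function names are hypothetical. Each site computes a small summary locally, and only those summaries (never the row-level records) are sent back to be pooled.

```python
def local_summary(records, field):
    """Runs at the data source: returns only a count and a sum, not the records."""
    values = [r[field] for r in records]
    return {"count": len(values), "total": sum(values)}

def pooled_mean(summaries):
    """Runs at the coordinator: combines per-site summaries into one overall mean."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / count

# Hypothetical row-level data that never leaves each site.
site_a = [{"age": 40}, {"age": 50}]
site_b = [{"age": 60}, {"age": 70}, {"age": 80}]

# Only these small dictionaries cross organizational boundaries.
summaries = [local_summary(site_a, "age"), local_summary(site_b, "age")]
mean_age = pooled_mean(summaries)
print(mean_age)
```

Sending counts and sums rather than records limits how many people ever see identifiable data, though very small counts can still be revealing, so a real deployment would likely need minimum cell sizes or similar safeguards.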


Consumer Protections Testimony

• Can’t regulate something called “big data” because once you define it, people will find a way around it

• The people who think privacy protections don’t apply to big data are likely the same people who have always been opposed to privacy protections

• No reason to think HIPAA’s research rules need to be different because of big data. HIPAA at least sets a clear and consistent process that covered entities and business associates must follow

• Privacy laws today are overly focused on individual control
  • Individual control is inadequate as both a definition and an aspiration. Impossible expectation to think a person can control his or her personal health data
  • The effect of control is an impediment to availability. For most patients & families, the primary concern about data misuse was that they would be contacted
  • Privacy is too critical and important a value to leave to a notion that individuals should police themselves
  • We need to be thinking about how to make sure data is protected at the same time that it’s available. We don’t let the mechanisms of protection by themselves interfere with the responsible use of the data


Current Law Testimony

• HIPAA Safe Harbor de-identification requires removal of 18 fields
  • May not give researchers the data they need/want; but some researchers cited the value of de-identified data
  • Limited data set is a bit more robust, but not a lot
• Definition of research same under HIPAA and Common Rule (generalizable knowledge)
  • May receive a waiver to use data from an IRB or privacy board
• HITECH changes:
  • Authorization may now permit future research (must adequately describe it)
  • Some compound authorizations now permitted for research purposes
• HIPAA applies to covered entities and business associates; patient authorization/consent is not required for treatment, payment, or healthcare operations purposes
• Paradox in HIPAA
  • Two studies that use data for quality improvement purposes, using the same data points to address the same question or sets of questions and done by the same institution, will be treated differently: as operations (no consent required) if the results are not intended to contribute to generalizable knowledge (intended for internal quality improvement instead), and as research if they are


Current Law Testimony

• HIPAA does not cover a large amount of healthcare data
  • Past few years = explosion in amount of data that falls outside of HIPAA
  • Mobile applications, websites, personal health records, wellness programs
• FTC is default regulator of privacy and security (unfair or deceptive acts or practices)
  • Very active on general enforcement of data security standards
    • Debate as to whether the FTC really has authority to do this; 2 pending cases
  • Less FTC enforcement in privacy space, especially healthcare
    • Tough question is broader FTC ability to pursue unfair practices in area of data privacy (enforcement of deceptive practices is easier)
• Fair Credit Reporting Act (FCRA) governs how information is gathered, used, and what people must be told about contents of credit reports
  • Specific prohibitions on using medical data for credit purposes
• Many conflicting state laws, which are often confusing, outdated and seldom enforced
• Key issue: substantial gaps exist
  • More and more data that is health-related is falling outside the scope of HIPAA rules


Key themes in depth


Gaps, or potential “under-” regulation

• § 164.514 Other requirements relating to uses and disclosures of protected health information.(a) Standard: de-identification of protected health information. Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.

(Is this the definition you'd like to use for "de-identification"? Are there other definitions?) This is the HIPAA definition – it’s the only one I’m aware of, and we should acknowledge we are using it, but it may not necessarily be the standard that all currently follow. We should incorporate this into the deep dive de-identification slides vs. having this separate slide. Insert definition of what we mean when we say identifiable data under HIPAA, because once it is de-identified it can be used for whatever; this raises concerns that can be addressed as part of the de-identification discussion in the deep dive slides.


Gaps, or potential “under-” regulation

• HIPAA applies to health “big data” – but only to identifiable health data collected, accessed, used and disclosed by some (in particular, covered entities and business associates).

• HIPAA does not apply to data that has been de-identified (see definition on prior slide)
• HIPAA does not apply to health data collected, accessed, used and disclosed elsewhere – including in consumer-facing devices and spaces (e.g., the web, mobile apps)
• “Non-health” data, which is collected and used initially for non-health purposes, would likely also be outside of the scope of HIPAA, and could potentially be used for health purposes (for example, socioeconomic determinants).

• FTC has authority (both for entities subject to HIPAA and those not subject to HIPAA) to crack down on unfair and deceptive consumer-directed trade practices with respect to health data and non-health data collection and use – but this is not a comprehensive privacy and security regulatory framework. FTC does not have authority over non-profits except for personal health records (& related apps) for breach notification, per HITECH.

• Consumers/patients have access to health information held by entities covered by HIPAA to make decisions about themselves – but often have difficulty exercising this right (at all or in a timely way), and this right does not extend to all personal data they collect and share; consumers also often do not have access to information used to make decisions about them (except in circumstances covered by the Fair Credit Reporting Act), and often don’t have access to research data.


Potential “Over- (or mis-)” regulation

• HIPAA “Paradox” or QI/Research Distinction – two studies using data for QI purposes, using the same data points to address the same question; one study will be treated as “operations” (no consent required) if the primary purpose of the study does NOT include contributing to “generalizable knowledge,” and the other, intended to contribute to generalizable knowledge, will be treated as research.

• Managing multiplicity of state laws for analytics done across state lines (Gail commented that legislation would be needed not guidance)

• Other regulatory considerations/complexity:
  • 42 CFR part 2 – while it does not differ by state, distribution of data is complicated
  • Common Rule
  • FDA – explore their oversight, gain deeper understanding. How do we want to gather this information and what is the timeline? Do we gather testimony from FDA, research offline, other method?
  • Others?


De-identification

Critical tool for protecting privacy, but:
• Concerns persist about re-identification risk, particularly when data sets are combined (mosaic effect) and for data de-identified using the safe harbor method
  • But safe harbor is intended to be easy to use and low cost, to encourage de-identification
• No prohibition/penalties against re-identification
• When expert determination is used, no transparency about or objective scrutiny of methods
• Also de-identified data useful for many analytic needs – but not all (not the panacea)
• § 164.514 Other requirements relating to uses and disclosures of protected health information. (a) Standard: de-identification of protected health information. Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information. (Is this definition sufficient for de-identification? Other definitions?)
• Data enclaves – highly protected locations to store and analyze data. Functions like a sandbox where the data never leaves the data enclave and can never be combined with outside data. A tool that allows the sharing, among a closed community of researchers, of datasets that are too sensitive to share broadly. (Is this the right definition?)
• From NIH - Data Enclave - A controlled, secure environment in which eligible researchers can perform analyses using restricted data resources.


De-identification Concerns

• Micky suggested a slide on de-identification concerns may need to be added.
• Mitre to pull main concerns for this slide from Testimony.
• Is this the right placement for the slide? Should there be another slide with the deep dive discussion slides as well? We don’t need a slide here on this – should be part of deep dive de-identification discussion.


Topic 1: De-identification - Concerns


Hurdles:
• When expert determination is used, no transparency or objective scrutiny of methods
• De-identified data may have limited future utility
• HIPAA provides some standards – but they are not universally applicable

Privacy Risks:
• Re-identification risk, particularly when data sets are combined (mosaic effect) and for data de-identified using the safe harbor method
• No prohibition/penalties against re-identification
• Revealing information/attributes about members of a group


Topic 1: De-identification - Concerns

Topic: Data Generation
Application:
• Safe Harbor is intended to be easier and cheaper, but more vulnerable
• Little transparency in the expert/statistician determination method
• HIPAA provides some standards – but they are not universally applicable

Topic: Data Use
Application:
• Re-identification risk depends on context (for example, public use datasets vs. more controlled environments)
• Combining datasets once considered to be de-identified may increase re-identification risk
• No prohibition on re-identification

Problem: Concerns have been raised about de-identification; consequently, de-identification is under pressure in a big data world.


Topic 1: De-identification - Concerns

Topic: Data Usability
Application:
• De-identified data may have limited future utility

Topic: Risk of Harm
Application:
• Re-identification potential
• Revealing information/attributes about members of a group

Problem: Concerns have been raised about de-identification; consequently, de-identification is under pressure in a big data world.


Consent

Valued tool for protecting privacy and individual autonomy but:
• Difficult to obtain informed consent up front for future, valuable big data uses and re-uses
• May be impossible for large scale studies
• Even allowing opt-out may skew results
• Lays burden for privacy on individual
• May work best when not over-utilized (for example, not requiring for “expected” uses)
• Policy tension with the tech landscape. See TSSWG meeting slides on consent: http://www.healthit.gov/facas/calendar/2014/12/17/standards-transport-security-standards-workgroup
• Unexpected secondary uses. Downstream restrictions coupled with consent provenance. Transparency vs individual choice.

Topic 2: Consent - Concerns


Hurdles:
• Difficult to obtain informed consent up front for future, valuable big data uses and re-uses.
• May be impossible for large scale studies.
• Even allowing opt-out may skew results.
• Technologies to enable are evolving but policies may not reflect technical capabilities.

Privacy Risks:
• Lays burden for privacy on individual
• Unexpected secondary uses.
• Transparency vs individual choice.


Topic Application

Problem: ???????

Topic 2: Consent - Concerns


Transparency

• Consumers/patients lack transparency about actual uses and disclosures of their personal information
  • HIPAA Notice of Privacy Practices covers what entities have the right to do with data, not what they actually do
  • Privacy Policies, driven primarily by a need to provide legal defensibility, are written for regulators, not consumers; often too long, difficult to read
  • Uses of de-identified data rarely disclosed
• As noted in a previous slide, lack of transparency about data, basis for decisions (for example, uses of algorithms)


Other Protections

Collection/use/purpose limitations
• Do these limits hinder valuable uses of/insights from big data? (allowing data to surface the hypotheses vs. limiting data collection and use to what is needed to address a specific question)
• Complete transparency – may encourage data to be withheld. Tension between transparency and limitations. (Comment made here but should we place this on transparency slide too?)
• Define re-identification practices
• Concerns resulting from rejoining of data – deductions. Threat of sharing across domains that were not intended. No regulation.
• All data is health data/can be used to evaluate health – what protections should exist?
• Regs deal with data from providers not health status. Regs are business specific.
• Special sensitivities of data about you that is health related – what controls can be built in? Many potential harms to consider.

Data Security (suggest separate slide)
• One presenter raised concerns about data storage security
• (Insert more content on data security and storage practices…encryption, authentication, authorization, redundancy, etc.)


Harms

A number of presenters urged us to consider protections that would prevent/limit harms to individuals caused by collection, use and disclosure of big data for health. Such harms could include:
• Discrimination - data “redlining”
• Embarrassment/dignity
• To individuals or to groups
• To trust?
• Harms resulting from sharing data across domains and re-joining it:
  • Financial harms
  • Genomic harms
  • Harm from family history data
  • Medical identity theft harm
  • Other?


Other key themes?

Insert any additional themes received by WG members by e-mail. Forwarding one comment received by Gil Kuperman; it doesn’t suggest additional themes but a potential framework to apply to each theme. I think we can delete this slide for now.