Setting the Stage: How De-Identification Came into U.S. Law, and Why the Debate Matters Today Professor Peter Swire Ohio State University/Future of Privacy Forum FPF Conference on DeIdentification National Press Club December 5, 2011


Setting the Stage: How De-Identification Came into U.S. Law, and Why the Debate Matters Today

Professor Peter Swire
Ohio State University/Future of Privacy Forum
FPF Conference on DeIdentification
National Press Club
December 5, 2011


Overview

• U.S. history: Census, federal agency statistics, & HIPAA

• Why Deidentification (DeID) matters today
  – The debate – it works or it doesn’t
  – Three threat models
  – Analogy to law enforcement

• Big picture – useful for many tasks, even with the limits shown by scientists


Census, Statistics & DeID

• Many years of Census experience
  – Highly useful data
  – Deidentified
    • Periodic opposition to mandatory reporting
    • Needed strong confidentiality promises
  – Suppress small cell size
    • Only home in a census tract
  – Fuzz data
  – Strict rules against release even for national security purposes
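The "suppress small cell size" idea can be illustrated with a minimal sketch. It assumes a simple count table and a hypothetical threshold of 5; the Census Bureau's actual disclosure rules are not described in these slides, and the tract names and data below are invented for illustration.

```python
from collections import Counter

def suppress_small_cells(records, key, threshold=5):
    """Count records per cell and withhold any count below the threshold.

    `threshold=5` is an assumed value for illustration only.
    A suppressed cell is reported as None instead of its true count.
    """
    counts = Counter(key(r) for r in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

# Hypothetical toy data: one record per home, tagged with its census tract.
records = [("tract_A", "home")] * 12 + [("tract_B", "home")]

table = suppress_small_cells(records, key=lambda r: r[0])
print(table)
```

Here `tract_B` contains a single home, so its count is withheld: publishing it would describe one identifiable household.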


Federal Agency Statistics

• Codification in Confidential Information Protection & Statistical Efficiency Act of 2002 (CIPSEA)
  – Good history by Sylvester & Lohr

• Basic rule: if collect data for statistical purposes, use only for statistical purposes, don’t ReID

• Funny thing: same culture & practice for years in private sector polling (Gallup-style) and market research

• Many years of practice here
• Perhaps a basic guideline going forward?


HIPAA

• 1999-2000 regs informed by Sweeney research
• Safe harbor – delete a lot of specified data fields
• Expert (I pushed for this) – where statistical basis, can achieve DeID based on risk, not safe harbor
• Data use agreements – release for research, with enforceable promise not to ReID
• In short:
  – If scrubbed enough, can release publicly
  – If scrubbed less, then enforceable promise not to ReID
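A rough sketch of what "delete a lot of specified data fields" might look like in code. The field names and the `DIRECT_IDENTIFIERS` set are hypothetical and are not the regulation's own list. Coarsening date of birth to a year and zip code to three digits is shown as one common style of generalization, not as a statement of what the rule requires in every case.

```python
# Hypothetical field names for illustration; the actual rule has its own list.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "street_address"}

def scrub(record):
    """Drop direct identifiers and coarsen two quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "date_of_birth" in out:
        # Assumes an ISO "YYYY-MM-DD" string; keep only the year.
        out["birth_year"] = out.pop("date_of_birth")[:4]
    if "zip" in out:
        # Keep only the first three digits of the zip code.
        out["zip3"] = out.pop("zip")[:3]
    return out

patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "date_of_birth": "1970-03-14",
    "zip": "43210",
    "diagnosis": "asthma",
}
print(scrub(patient))
```

This corresponds to the "scrubbed enough" path on the slide. The data-use-agreement path would keep more fields and rely on an enforceable promise instead.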


Why It Matters Today

• Now data mining far beyond specialized researchers
  – The Internet (commercial since only 1993) gives me access to data
  – Storage & processing on my laptop > mainframe of 25 years ago
  – Search is way better
  – The erosion of practical obscurity – “they” really may figure out who “we” are


The Debate is Joined

• Ohm (and others) draw on Sweeney-type research
  – DeID likely to lead to ReID
• Yakowitz (and others) respond
  – Benefits of public data enormous
  – Practical risk/harm from ReID low
• Anonymization creates huge risks or low risks?
• Worth doing anonymization/DeID at all?
• Today’s conference to shed light on this …


Threat Models – Which Attackers?

• Three types of attackers on “anonymized” data:
  – Insiders “peeping”
  – Outside hackers intruding
  – The public who doesn’t get into the database
• DeID often effective for first two
• Ohm/Yakowitz debate primarily on the third


Insiders Peeping

• Swire 2009 Peeping article, at peterswire.net
• Threat: employee or employee of sub-contractor sees the data and “peeps”
  – Sees celebrity information - Clooney
  – Sees information about friend/family/ex
  – Sees information to create harm (ID theft, blackmail)
• Anonymization useful part of anti-peeping strategy
  – Employee doesn’t search or stumble upon Clooney
  – Employee may lack tools to do Sweeney-type analysis
  – Audit logs catch employees who try
  – Give employees access to statistical data, not PII


Outside Hackers

• Hacker may intrude for a short while
  – Anonymization may prevent “ah hah” – Clooney
• Hacker may download database
  – If so, then hacker becomes similar to the public
  – May or may not be good at Sweeney-type tricks
  – May be focused on specific types of information, and not try to ReID
• Less-than-perfect DeID may substantially reduce incidence of ReID


Re-ID by “The Public”

• So, masking may help against some threats
• The debate, though, is whether “the public” (i.e., the experts) can ReID
• Sweeney & other research provides startling & important results of ReID
  – Can everything be ReIdentified?


ReID & 2 Famous Studies

• Date of birth, zip, & gender -> 80%+ unique
  – Yes
  – BUT, DOB is off-the-charts different
    • Gender – splits population in half
    • DOB = 366 (days) x 80 (years) = over 25,000 cells
    • Moral – DOB ridiculously strong to ReID
• Netflix: can ReID over 60% of movie reviews
  – BUT, takes known IMDb reviewers and matches to Netflix
  – Can ReID a lot, but not a big effect
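The cell arithmetic on this slide can be checked directly, and the same idea extends to measuring how many records a given set of fields makes unique. The sketch below uses the slide's own figures (366 days, 80 years, 2 genders). The `fraction_unique` helper and the four-person dataset are hypothetical illustrations, not data from the studies cited.

```python
from collections import Counter

# Cell counts from the slide's own figures.
gender_cells = 2
dob_cells = 366 * 80                 # 29,280 possible dates of birth
combined_cells = gender_cells * dob_cells  # 58,560 cells within a single zip code

def fraction_unique(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Hypothetical toy population, all in the same zip code.
people = [
    {"gender": "F", "dob": "1970-01-01", "zip": "43210"},
    {"gender": "F", "dob": "1970-01-01", "zip": "43210"},
    {"gender": "M", "dob": "1982-06-15", "zip": "43210"},
    {"gender": "F", "dob": "1991-11-30", "zip": "43210"},
]

by_gender = fraction_unique(people, ["gender"])
by_all = fraction_unique(people, ["gender", "dob", "zip"])
print(dob_cells, combined_cells, by_gender, by_all)
```

Gender alone singles out few people, while adding date of birth splits the same population into tens of thousands of cells, which is why DOB does most of the work in the 80%+ result.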


Law Enforcement Analogy

• So, is ReID generally easy or hard, useful or useless?
• Consider cop with a bunch of clues (male, tall, red hair, etc.)
  – Enough to ReID? No
  – Helpful to ReID? Yes
  – A matter of how much legwork, analysis, extra data is available and accurate
  – Very big range for difficulty of finding the suspect
  – Same is true for ability of “the public” to ReID, to name the suspect
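The analogy can be sketched as successive filtering of a candidate pool. The `narrow` function, the five candidates, and the extra "city" clue are all hypothetical; the point is only that each clue shrinks the pool, and whether it shrinks to one depends on what extra data is available.

```python
def narrow(candidates, clues):
    """Return the candidates consistent with every clue."""
    return [c for c in candidates if all(c.get(k) == v for k, v in clues.items())]

# Hypothetical candidate pool.
people = [
    {"name": "A", "sex": "male", "height": "tall", "hair": "red", "city": "Columbus"},
    {"name": "B", "sex": "male", "height": "tall", "hair": "red", "city": "Dayton"},
    {"name": "C", "sex": "male", "height": "short", "hair": "red", "city": "Columbus"},
    {"name": "D", "sex": "female", "height": "tall", "hair": "red", "city": "Columbus"},
    {"name": "E", "sex": "male", "height": "tall", "hair": "brown", "city": "Columbus"},
]

clues = {"sex": "male", "height": "tall", "hair": "red"}
after_clues = narrow(people, clues)

# One extra piece of data (assumed available and accurate) finishes the job.
after_extra = narrow(people, {**clues, "city": "Columbus"})

print([p["name"] for p in after_clues], [p["name"] for p in after_extra])
```

With the three clues alone, two candidates remain: helpful, but not enough to name the suspect. The extra field is what completes the identification.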


Conclusion

• Issue matters today -- more data potentially available to “the public”

• History of useful anonymization in statistics
  – If collect data for statistical purposes, use only for statistical purposes, store that way, don’t ReID
• DeID helps against insider & hacker threats
• ReID by “the public” varies widely in the effort needed to find the “suspect”
• Our conference today to help policymakers learn where DeID likely to be most useful