Towards a procedure to anonymise micro data
Anonymising data from official statistics for public use
IASSIST, Köln - 30.05.2013 Katelijne Gysen
Outline
1. Promotion of official statistics
2. Anonymisation of data
2.1 Trade-off: disclosure risk versus data utility
2.2 Procedure
2.3 Parameter setting for Statistical Disclosure Control (SDC)
3. Uniqueness and k-anonymity
3.1 Concepts
3.2 Recent research on mobility data
3.3 The real fingerprint
3.4 Socio-demographic fingerprint
1. Promotion of official statistics
Collaboration with our NSI:
Micro data for research and teaching purposes
Data from the National Statistical Institute (NSI): Labour Force Survey, Survey on Structure of Earnings, SILC (Survey on Income and Living Conditions), PISA (Education), Swiss Health Survey, Population Census and Business Census, …
2. Anonymisation of data
2.1 Trade-off dilemma: disclosure risk versus data utility
researcher versus data owner
Data utility
Data protection
2.2 Procedure (1)
Flowchart elements (arrow order not recoverable from the transcript):
Dataset
Describe dataset characteristics
Define target public
Describe intrusion scenario
Disclosure risk?
Apply SDC methods
Measure data utility
Risk / utility balance?
Describe access conditions
Release data
2.2 Procedure (2)
Flowchart elements (arrow order not recoverable from the transcript):
Dataset
Describe dataset characteristics
Define target public
Describe intrusion scenario
Set SDC parameters
Disclosure risk?
Apply SDC methods
SDC parameters met?
Measure data utility
Data utility?
Describe access conditions
Release data
2.3 Parameter setting for Statistical Disclosure Control (SDC)
1. Age of the data (min.)
2. Subsample (min.)
3. Level of geographical detail (max.)
4. Global and individual risk (max.)
5. Number of indirect identifying variables (max.)
6. Degree of anonymity for socio-demographic characteristics (min.)
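A minimal sketch of how such a parameter list could be checked before release. It covers only three of the six parameters, and every threshold and measured value below is a hypothetical placeholder, not a value given in the talk. The class name `SdcParameter` and the parameter keys are likewise invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdcParameter:
    name: str
    bound: str        # "min" or "max", as marked on the slide
    threshold: float  # placeholder value, NOT taken from the talk

    def is_met(self, value: float) -> bool:
        # A "min" parameter needs at least the threshold,
        # a "max" parameter must not exceed it.
        if self.bound == "min":
            return value >= self.threshold
        return value <= self.threshold


# Hypothetical settings for three of the six parameters on the slide.
PARAMETERS = [
    SdcParameter("age_of_data_years", "min", 2),
    SdcParameter("indirect_identifying_variables", "max", 5),
    SdcParameter("k_anonymity", "min", 3),
]

# Hypothetical values measured on a candidate release file.
measured = {
    "age_of_data_years": 3,
    "indirect_identifying_variables": 6,
    "k_anonymity": 5,
}

# Parameters that still fail; SDC methods would be applied again for these.
unmet = [p.name for p in PARAMETERS if not p.is_met(measured[p.name])]
print(unmet)
```

In this toy case only the number of indirect identifying variables exceeds its limit, so the file would go back through the "Apply SDC methods" step.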
3. Uniqueness and k-anonymity - 3.1 Concepts
Micro data
Identifying variables
Non-identifying variables
Rare
Observable
Searchable
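To make the two concepts concrete, here is a small sketch that counts how many records share each combination of identifying variables. A record is unique when its combination occurs once, and a file is k-anonymous when every combination occurs at least k times. The function names and the four toy records are invented for illustration.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """For each record, return how many records share its key combination."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def k_anonymity(records, quasi_identifiers):
    """The file is k-anonymous for k = size of the smallest class."""
    return min(class_sizes(records, quasi_identifiers))


# Hypothetical toy records, not real data.
people = [
    {"gender": "F", "yob": 1980, "municipality": "Lausanne"},
    {"gender": "F", "yob": 1980, "municipality": "Lausanne"},
    {"gender": "M", "yob": 1975, "municipality": "Lausanne"},
    {"gender": "M", "yob": 1975, "municipality": "Bern"},
]
QI = ["gender", "yob", "municipality"]

sizes = class_sizes(people, QI)
k = k_anonymity(people, QI)
unique_share = sum(1 for s in sizes if s == 1) / len(sizes)
print(sizes, k, unique_share)
```

Here the two men are each unique on gender, year of birth and municipality, so the toy file is only 1-anonymous and half of its records are unique.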
3.2 Recent research about mobility data
“… four, randomly chosen ‘spatio-temporal points’ (for example, mobile device pings to antennas) is enough to uniquely identify 95% of the individuals”.
The mobility pattern is apparently unique.
3.3 The real fingerprint
“There are as many as 150 ridge characteristics (points) in the average fingerprint.
So how many points must a fingerprint examiner match in order to safely say the prints are indeed those of a particular suspect?”
The answer is surprising.
“There is no standard number required. …
… In fact, the decision as to whether or not there is a match is left entirely to the individual examiner. However, individual departments and agencies may have their own set of standards in place that requires a certain number of points be matched before making a positive identification.”
Source: http://www.leelofland.com/wordpress/comparing-fingerprints-whats-the-point/
3.4 The socio-demographic fingerprint
Gender, Date of birth, Municipality, Civil status, Nationality
3.4 The socio-demographic fingerprint (2)
Anonymity of the Swiss population given simple socio-demographics (values in %)
Source: STATPOP 2010, BFS.
DOB = date of birth, YOB = year of birth

k-anonymity                                                    1     2     5    20   100  1000
Gender * DOB * Municipality                                   74  86.9  95.3   100   100   100
Gender * YOB * Municipality                                  0.7   1.9   6.3  27.6  68.3  92.1
Gender * YOB * Civil status * Municipality                   3.2   6.4  14.9  41.5  77.9  96.6
Gender * YOB * Nationality * Municipality                    7.9  12.9  21.3  47.1    82  97.1
Gender * YOB * Civil status * Nationality * Municipality      12  18.6  31.1  59.6  87.4  98.9
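A sketch of how one row of the table above could be computed from person-level records. It assumes that each column gives the percentage of people who share their combination of characteristics with at most k persons in total (themselves included); this reading is an assumption, since the slide does not spell it out. The function name and the toy records are invented, and the figures it prints are unrelated to STATPOP.

```python
from collections import Counter


def share_at_most_k(records, quasi_identifiers, thresholds):
    """Percentage of records whose key combination occurs at most k times."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    n = len(keys)
    return {
        k: 100.0 * sum(1 for key in keys if counts[key] <= k) / n
        for k in thresholds
    }


# Hypothetical toy population: classes of size 1, 2, 3 and 4.
toy = (
    [{"gender": "F", "yob": 1950}] * 1
    + [{"gender": "M", "yob": 1950}] * 2
    + [{"gender": "F", "yob": 1990}] * 3
    + [{"gender": "M", "yob": 1990}] * 4
)

row = share_at_most_k(toy, ["gender", "yob"], thresholds=[1, 2, 5])
print(row)
```

Adding a variable to `quasi_identifiers` can only split classes further, which is consistent with the shares growing from the Gender * YOB * Municipality row to the rows that also use civil status and nationality.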
References
de Montjoye, Y.-A., Hidalgo, C.A., Verleysen, M., Blondel, V.D. Unique in the crowd: the privacy bounds of human mobility. Scientific Reports 3, article 1376, 2013. DOI: 10.1038/srep01376.
Franconi, L. Public Use Files: practices and methods to increase quality of released microdata. OECD, 2012.
Golle, P. Revisiting the uniqueness of simple demographics in the US population. Palo Alto Research Center, 2006.
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Schulte Nordholt, E., Spicer, K., De Wolf, P.P. Statistical Disclosure Control. Wiley, 2012.
Meindl, B., Kowarik, A., Templ, M. Guidelines for the anonymisation of microdata using R-package sdcMicro. Vienna, 2012.
Sweeney, L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh, 2000.
Sweeney, L. k-Anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10 (5), 2002, 557-570.
Find out more?
About FORS: www.fors.unil.ch
About public microdata for research in CH: www.compass.unil.ch
Let’s connect!