![Page 1: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/1.jpg)
On method-specific record linkage for risk assessment
Jordi NinJavier Herranz Vicenç Torra
![Page 2: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/2.jpg)
Contents
• Disclosure Risk Scenario: how an intruder re-identifies an individual
• Preliminaries: protection methods and record linkage
• Location record linkage: a new way to compute the disclosure risk
• Conclusions and future work
![Page 3: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/3.jpg)
Disclosure Risk Scenario
Preliminaries
Location Record Linkage
Conclusions and future work
![Page 4: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/4.jpg)
Disclosure Risk Scenario
(Figure: the data set X drawn as a table, with its dimensions labelled n and a.)
Attribute classification
Identifiers: Passport number
Quasi-Identifiers: Age, postal code
Confidential: Income
| id | Sex | Marital status | Income |
|----|-----|----------------|--------|
| 1 | Male | Single | 13.500 |
| 2 | Male | Single | 11.000 |
| … | … | … | … |
![Page 5: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/5.jpg)
Disclosure Risk Scenario
Re-identification scenario
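$X = \mathit{id} \parallel X_{nc} \parallel X_{c}$ and $X' = X'_{nc} \parallel X_{c}$, where $\mathit{id}$ are the identifiers, $X_{nc}$ the non-confidential attributes (quasi-identifiers), $X_{c}$ the confidential attributes, and $\parallel$ denotes concatenation of attributes
Privacy is ensured: identifiers are removed and the quasi-identifiers are anonymized ($X_{nc}$ becomes $X'_{nc}$)
Data quality is preserved: the confidential attributes $X_{c}$ are released unchanged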
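A minimal sketch of how the released file is built from the original one in this scenario: the identifiers are dropped, the quasi-identifiers are masked, and the confidential attributes are copied unchanged. The record layout (dicts with an "id" key), the function name `build_release` and the `protect` callback are hypothetical; `protect` stands for any column-wise masking method such as rank swapping or univariate microaggregation.

```python
def build_release(records, quasi_identifiers, confidential, protect):
    """Return the released file X' built from the original file X.

    records: list of dicts holding an "id" field, the quasi-identifiers and
    the confidential attributes (hypothetical layout).
    protect: any column-wise masking function (list of values -> list of
    values of the same length), e.g. rank swapping or univariate
    microaggregation.
    """
    # Mask every quasi-identifier column independently.
    masked = {a: protect([r[a] for r in records]) for a in quasi_identifiers}
    released = []
    for i, r in enumerate(records):
        row = {a: masked[a][i] for a in quasi_identifiers}  # masked quasi-identifiers
        row.update({c: r[c] for c in confidential})          # confidential values kept as-is
        released.append(row)                                 # identifiers are not copied
    return released


# Usage with a toy masking function (round ages down to tens), for illustration only.
recs = [{"id": 1, "age": 34, "income": 13500}, {"id": 2, "age": 37, "income": 11000}]
print(build_release(recs, ["age"], ["income"], lambda col: [10 * (v // 10) for v in col]))
```

The intruder's job in the following slides is to undo exactly this step: to link each released row back to an identified original record.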
![Page 6: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/6.jpg)
Disclosure Risk Scenario
(Figure: data set 1 holds records with attributes X1, X2, X3, X4; data set 2 holds the corresponding protected records with attributes X’1, X’2, X’3, X’4.)
Problem: find the correct mapping between the records of data set 1 and those of data set 2. This is the record linkage problem.
![Page 7: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/7.jpg)
Disclosure Risk Scenario
Distance-based record linkage (DBRL)
• The nearest pairs of records are considered linked pairs
• It is very easy to tune
• Results depend strongly on the parameters
• Moderate time cost
Probabilistic record linkage (PRL)
• Linked pairs are computed using conditional probabilities
• Tuning is difficult
• Few parameters
• High time cost
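A minimal sketch of distance-based record linkage, assuming numeric attributes, z-score standardization of each file separately, and squared Euclidean distance (common choices, but assumptions here). The helper names `dbrl` and `correct_links` are hypothetical. As in risk evaluation, the evaluator knows that protected record i comes from original record i, so correct links can be counted.

```python
import math


def _standardize(rows):
    """Z-score every attribute of a list of equal-length numeric rows."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, sd))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def dbrl(original, protected):
    """links[i] = index of the original record nearest to protected record i."""
    orig = _standardize(original)
    prot = _standardize(protected)
    links = []
    for p in prot:
        dists = [sum((a - b) ** 2 for a, b in zip(p, o)) for o in orig]
        links.append(min(range(len(orig)), key=dists.__getitem__))
    return links


def correct_links(links):
    """Count links i -> i (protected record i really comes from original record i)."""
    return sum(1 for i, j in enumerate(links) if i == j)


original = [[1, 10], [2, 20], [3, 30], [4, 40]]
protected = [[1.1, 11], [2.1, 19], [2.9, 31], [4.2, 39]]
print(correct_links(dbrl(original, protected)))
```

Every protected record is compared with every original record, which is the quadratic cost that the location approach later avoids.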
![Page 8: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/8.jpg)
Disclosure Risk Scenario
Preliminaries
Location Record Linkage
Conclusions and future work
![Page 9: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/9.jpg)
Preliminaries
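Rank swapping - p
Algorithm
For each attribute Attr_j, 1 ≤ j ≤ n:
  sort the records by Attr_j
  swap each value x_ij with a randomly chosen value x_lj, where i < l ≤ i + p
  undo the sorting of Attr_j
End for
End algorithm
• Simple
• Preserves the mean µ and the standard deviation σ of each attribute
• The original combinations of values disappear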
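A minimal sketch of the rank swapping loop above for one attribute, assuming p is a percentage of the number of records (so the window is p% of n ranks) and that each value is swapped at most once, a common variant. The names `rank_swap` and `rank_swap_dataset` are hypothetical.

```python
import random


def rank_swap(column, p, rng=None):
    """Rank swapping of a single attribute.

    p is the maximum swapping distance as a percentage of the number of
    records. Each value is swapped at most once, with a randomly chosen,
    not-yet-swapped value whose rank l satisfies i < l <= i + window.
    """
    rng = rng or random.Random()
    n = len(column)
    window = max(1, int(n * p / 100))
    order = sorted(range(n), key=lambda idx: column[idx])  # record indices by rank
    values = [column[idx] for idx in order]                 # values in rank order
    swapped = [False] * n
    for i in range(n):
        if swapped[i]:
            continue
        partners = [l for l in range(i + 1, min(n, i + window + 1)) if not swapped[l]]
        if partners:  # the last ranks may have no partner left
            l = rng.choice(partners)
            values[i], values[l] = values[l], values[i]
            swapped[i] = swapped[l] = True
    # Undo the sort: give each record the value now sitting at its rank.
    result = [None] * n
    for rank, idx in enumerate(order):
        result[idx] = values[rank]
    return result


def rank_swap_dataset(rows, p, seed=None):
    """Apply rank_swap independently to every attribute of a list of records."""
    rng = random.Random(seed)
    columns = [rank_swap(list(col), p, rng) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


print(rank_swap([8, 6, 10, 7, 9, 2, 1, 4, 5, 3], 20, random.Random(0)))
```

Because values are only permuted within a column, the multiset of each attribute (and hence µ and σ) is unchanged, while every value moves at most `window` ranks. That bounded movement is what the location record linkage later exploits.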
![Page 10: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/10.jpg)
Preliminaries
Rank swapping - p example (p = 20%)
(Figure: two columns of ten values, 8, 6, 10, 7, 9, 2, 1, 4, 5, 3 and 1, 2, …, 10.)
![Page 11: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/11.jpg)
Preliminaries
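Microaggregation - k, a
(Figure: records partitioned into clusters of at least k elements; example with k = 3.)
Each cluster contains at least k records and each record is replaced by the centroid (mean) of its cluster; a is the number of attributes microaggregated together.
• a = 1 (univariate): an optimal partition can be computed
• a > 1 (multivariate): finding the optimal partition is NP-hard, so heuristics are used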
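To make the "replace each value by its cluster mean" step concrete, here is a deliberately naive univariate heuristic (a = 1): sort the values, cut them into consecutive groups of k, and merge a short final group into the previous one. This fixed-size grouping is an illustrative assumption, not the optimal algorithm; the name `microaggregate_fixed` is hypothetical.

```python
def microaggregate_fixed(column, k):
    """Naive univariate microaggregation with fixed-size groups.

    Sorts the values, cuts them into consecutive groups of k, merges a
    final group smaller than k into the previous one (so every group has
    between k and 2k - 1 members), and replaces each value by its group mean.
    """
    n = len(column)
    if n < k:
        raise ValueError("need at least k records")
    order = sorted(range(n), key=lambda i: column[i])
    groups = [order[s:s + k] for s in range(0, n, k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    result = [0.0] * n
    for g in groups:
        mean = sum(column[i] for i in g) / len(g)
        for i in g:
            result[i] = mean
    return result


print(microaggregate_fixed([1, 2, 3, 10, 11, 12, 13], 3))
```

Every protected value is shared by at least k records, which is where the protection comes from; the grouping itself can be far from optimal when cluster boundaries fall inside natural gaps in the data.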
![Page 12: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/12.jpg)
Preliminaries
Optimal univariate Microaggregation
Result 1. When the elements are sorted by the attribute, the elements of each cluster of any optimal partition are contiguous (clusters do not overlap).
Result 2. All clusters of any optimal partition have between k and 2k-1 elements.
(Figure: graph built over the sorted elements x1, x2, x3, x4 for k = 2.)
With these two results the problem becomes a shortest-path problem on a graph built over the sorted elements: the clusters are given by the nodes of the shortest path.
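A sketch of the shortest-path formulation for optimal univariate microaggregation, written as dynamic programming over an acyclic graph. Our assumptions: node i means "the first i sorted values are already grouped", an arc i → j with k ≤ j − i ≤ 2k − 1 forms the cluster of sorted values i..j−1, and its cost is that cluster's within-group sum of squared errors (SSE). The name `optimal_univariate` is hypothetical.

```python
def optimal_univariate(column, k):
    """Optimal univariate microaggregation via a shortest path on a DAG."""
    n = len(column)
    if n < k:
        raise ValueError("need at least k records")
    order = sorted(range(n), key=lambda i: column[i])
    xs = [column[i] for i in order]
    # Prefix sums give the SSE of any contiguous run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        return (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)

    dist = [float("inf")] * (n + 1)
    prev = [-1] * (n + 1)
    dist[0] = 0.0
    for j in range(1, n + 1):  # nodes are already in topological order
        # Result 2: only clusters of k .. 2k-1 elements are arcs.
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            cost = dist[i] + sse(i, j)
            if cost < dist[j]:
                dist[j], prev[j] = cost, i
    # Walk the shortest path back from node n to recover the clusters.
    result = [0.0] * n
    j = n
    while j > 0:
        i = prev[j]
        mean = (s1[j] - s1[i]) / (j - i)
        for r in range(i, j):
            result[order[r]] = mean
        j = i
    return result


print(optimal_univariate([1, 2, 3, 50, 51], 2))
```

Result 1 is what allows the clusters to be contiguous runs of the sorted values; on this input, naive fixed-size grouping would produce {1, 2} and {3, 50, 51}, while the shortest path finds {1, 2, 3} and {50, 51}.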
![Page 13: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/13.jpg)
Preliminaries
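MDAV Microaggregation
(Figure: original data set X and its protected version X’ for k = 2.)
MDAV (Maximum Distance to Average Vector) is a heuristic for multivariate microaggregation.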
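A compact sketch of the MDAV heuristic as it is usually described: while at least 3k records remain, take the record r farthest from the centroid and the record s farthest from r, and build a cluster of k records around each; then finish with one or two last clusters. Records are assumed to be numeric and already standardized; the function names are hypothetical.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(rows):
    """Attribute-wise mean of a list of records."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def mdav(records, k):
    """MDAV sketch: returns the records replaced by their cluster centroids."""
    remaining = list(range(len(records)))
    clusters = []

    def take_cluster(seed):
        # The seed record plus its k - 1 nearest remaining records.
        others = sorted((i for i in remaining if i != seed),
                        key=lambda i: _dist2(records[i], records[seed]))
        cluster = [seed] + others[:k - 1]
        for i in cluster:
            remaining.remove(i)
        clusters.append(cluster)

    def farthest_from(point):
        return max(remaining, key=lambda i: _dist2(records[i], point))

    while len(remaining) >= 3 * k:
        r = farthest_from(_centroid([records[i] for i in remaining]))
        take_cluster(r)
        s = farthest_from(records[r])  # most distant from r
        take_cluster(s)
    if len(remaining) >= 2 * k:
        take_cluster(farthest_from(_centroid([records[i] for i in remaining])))
    if remaining:
        # Last cluster: between k and 2k - 1 records (assuming at least k records).
        clusters.append(list(remaining))
        remaining.clear()

    protected = [None] * len(records)
    for cluster in clusters:
        cen = _centroid([records[i] for i in cluster])
        for i in cluster:
            protected[i] = cen
    return protected


print(mdav([[0], [1], [5], [6], [10], [11]], 2))
```

Unlike univariate microaggregation, each cluster is formed over all attributes at once, so a protected value cannot be traced back to a contiguous range of ranks in a single attribute.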
![Page 14: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/14.jpg)
Preliminaries
Score: protection method evaluation
Score = 0.5 IL + 0.5 DR
IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)
IL1 = mean absolute error
IL2 = mean variation of the averages
IL3 = mean variation of the variances
IL4 = mean variation of the covariances
IL5 = mean variation of the correlations
DR = 0.25 DLD + 0.25 PLD + 0.5 ID
DLD = number of links found using distance-based record linkage (DBRL)
PLD = number of links found using probabilistic record linkage (PRL)
ID = interval disclosure: protected values lying close to the original ones
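The score is a weighted sum, sketched below. We assume each IL_i is normalized to [0, 1] (hence the factor 100) and that DLD, PLD and ID are already given in percent; the function names are hypothetical.

```python
def information_loss(il1, il2, il3, il4, il5):
    """IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5).

    Assumes each IL_i is already normalized to [0, 1].
    """
    return 100 * sum(0.2 * x for x in (il1, il2, il3, il4, il5))


def disclosure_risk(dld, pld, id_):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID, with each term given in percent."""
    return 0.25 * dld + 0.25 * pld + 0.5 * id_


def score(il, dr):
    """Score = 0.5 IL + 0.5 DR; both terms are costs, so lower is better."""
    return 0.5 * il + 0.5 * dr


# Illustrative numbers only, not results from the experiments.
il = information_loss(0.10, 0.05, 0.20, 0.15, 0.10)  # about 12
dr = disclosure_risk(20.0, 10.0, 30.0)               # 22.5
print(score(il, dr))                                  # about 17.25
```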
![Page 15: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/15.jpg)
Disclosure Risk Scenario
Preliminaries
Location Record Linkage
Conclusions and future work
![Page 16: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/16.jpg)
Location Problem Description
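L-RL: Location Record Linkage
Standard record linkage compares each protected record with all the original records
Rank swapping, univariate microaggregation and some other methods build each protected value from only a few original records (those close to it in rank)
Hence it is unnecessary to compare all the records: only the original records located near the protected value need to be considered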
![Page 17: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/17.jpg)
Location record linkage
Method Description
(Figure: the intruder's external data set Xext is linked against the protected data set X’.)
![Page 18: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/18.jpg)
Location record linkage
Example: Rank swapping
p = 20%
(Figure: location record linkage on rank-swapped data; values shown: 17, 6, 13, 14, 16, 19, 12, 5, 16; the final step is labelled "Distance".)
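A sketch of location record linkage for rank-swapped data, under our own assumptions: the intruder knows p, each attribute has distinct values, and both files have the same records. Since a swapped value moves at most `window` ranks, a protected record can only come from an original record whose rank is within `window` on every attribute; only those candidates are compared by (unscaled Euclidean) distance. The function names are hypothetical.

```python
import math


def _ranks(column):
    """0-based rank of each value in a column (ties broken by position)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    rank = [0] * len(column)
    for r, i in enumerate(order):
        rank[i] = r
    return rank


def location_candidates(original, protected, p):
    """For each protected record, the original records it may come from."""
    n = len(original)
    window = max(1, int(n * p / 100))
    orig_ranks = [_ranks(col) for col in zip(*original)]
    prot_ranks = [_ranks(col) for col in zip(*protected)]
    result = []
    for j in range(len(protected)):
        result.append([
            i for i in range(n)
            if all(abs(o[i] - q[j]) <= window for o, q in zip(orig_ranks, prot_ranks))
        ])
    return result


def lrl_rank_swapping(original, protected, p):
    """Link each protected record to its nearest candidate (None if no candidate)."""
    links = []
    for j, cands in enumerate(location_candidates(original, protected, p)):
        if cands:
            links.append(min(cands, key=lambda i: math.dist(original[i], protected[j])))
        else:
            links.append(None)
    return links


original = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
protected = [[2, 10], [1, 20], [3, 30], [4, 40], [5, 50]]  # first attribute swapped
print(location_candidates(original, protected, 20))
print(lrl_rank_swapping(original, protected, 20))
```

The rank filter never discards the true source record, and it shrinks the set of records that must be compared, which is why the linkage becomes both cheaper and more accurate than comparing against the whole file.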
![Page 19: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/19.jpg)
Location record linkage
Rank Swapping Experiments
Data sets:
Census (1080 records & 13 attributes)
EIA (4092 records & 10 attributes)
Rank swapping configurations:
p = 2 … 20
Score modifications:
DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID, where LLD is the number of links found using L-RL
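Reading 0.166 as a rounded 1/6 (an assumption; the slide gives only the decimal), the modified measure keeps the weight split of the original DR = 0.25 DLD + 0.25 PLD + 0.5 ID: the three linkage terms share the same total weight 0.5 that DLD and PLD shared before, and ID keeps its 0.5.

$$
\mathrm{DR}' = \tfrac{1}{6}\,\mathrm{LLD} + \tfrac{1}{6}\,\mathrm{DLD} + \tfrac{1}{6}\,\mathrm{PLD} + \tfrac{1}{2}\,\mathrm{ID},
\qquad
3\cdot\tfrac{1}{6} = \tfrac{1}{2} = 0.25 + 0.25 .
$$

With the rounded weights, 3 × 0.166 = 0.498, so the linkage part falls 0.002 short of 0.5.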
![Page 20: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/20.jpg)
Location record linkage
L-RL: Rank Swapping Linkage Results
![Page 21: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/21.jpg)
Location record linkage
L-RL: Rank Swapping Score Results
![Page 22: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/22.jpg)
Location record linkage
Univariate Microaggregation Experiments
Data sets:
Census (1080 records & 13 attributes)
EIA (4092 records & 10 attributes)
Univariate microaggregation configurations:
k = 10 … 50
Score modifications:
DR = 0.166 LLD+ 0.166 DLD+ 0.166 PLD+ 0.5 ID
![Page 23: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/23.jpg)
Location record linkage
L-RL: Univariate Microaggregation Linkage Results
![Page 24: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/24.jpg)
Location record linkage
L-RL: Univariate Microaggregation Score Results
![Page 25: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/25.jpg)
Location record linkage
MDAV Experiments
Data sets:
Census (1080 records & 13 attributes)
EIA (4092 records & 10 attributes)
MDAV configurations:
k = 10 … 50
Score modifications:
DR = 0.166 LLD+ 0.166 DLD+ 0.166 PLD+ 0.5 ID
![Page 26: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/26.jpg)
Location record linkage
L-RL: MDAV Linkage Results
![Page 27: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/27.jpg)
Location record linkage
L-RL: MDAV Score Results
![Page 28: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/28.jpg)
Disclosure Risk Scenario
Preliminaries
Location Problem Description
Location Record Linkage
Conclusions and future work
![Page 29: On method-specific record linkage for risk assessment](https://reader035.vdocuments.us/reader035/viewer/2022081603/56814bfe550346895db8fc3e/html5/thumbnails/29.jpg)
Conclusions and future work
Conclusions
• We have presented a new type of record linkage designed to exploit the limitations of some protection methods
• The L-RL method gives a more accurate DR evaluation for rank swapping and univariate microaggregation
• MDAV is immune to the location problem
Future work
• We plan to study the DR of MDAV and other protection methods using other ad hoc record linkage methods