Eurostat Statistical Disclosure Control. Presented by Peter-Paul de Wolf, Statistics Netherlands...
TRANSCRIPT
Content
• Introduction
• What’s the problem?
– Specific for business statistics
• Formalising the problem
• What to do?
– Methods
– Software
• Summary
Introduction
• General definition of confidential data:
Data cannot be published “as is”
» By law (e.g. statistical law)
» Sensitive data (what’s sensitive?)
» Respondent considers it confidential
» …
Introduction
• Physical protection
– Entrance
– Network
• Legal protection
– Oath
• Statistical Disclosure Control
– Protection of statistical output
What’s the problem?
Statistical output
• Microdata
– Not often in case of business data
– Obvious: each record represents a single respondent
• Tabular data
– In business data often magnitude tables
– Sometimes frequency tables
– But: aggregated data?!?!?!?
What’s the problem (frequency table)
• Cell value itself not sensitive:
– All contributions are equal (1)
• Spanning variables
– Identifying, e.g. NACE, Region
– Sensitive, e.g. “environmental offence” (illegal dumping of waste, illegal fishing, oil spills, …)
What’s the problem (frequency table)
Example: number of ship-owners
          Environmental offence
Region    Yes    No    Total
…
A           9     0        9
...
What’s the problem (frequency table)
Example: number of ship-owners
          Environmental offence
Region    Yes    No    Total
…
B          14     2       16
...
What’s the problem (frequency table)
Example: number of ship-owners
          Environmental offence
Region    Yes    No    Total
…
C           1     1        2
...
What’s the problem (magnitude table)
Turnover (10^6 €) of instrument producing companies

            Region A       Region B       Region C         Total
Harps       58    151      47    123      36     98     141    372
Organs      71     16     124     21      24      9     219     46
Pianos      92      5     157      2      59      1     308      8
Other      800    302     934    362     651    287    2385    951
Total     1021    474    1262    508     770    395    3053   1377
Formalising the problem
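Suppose cell (Piano, A) consists of
Company X: 81 × 10^6 €
Company Y: 5 × 10^6 €
Other three: 2 × 10^6 € each
Total: 92 × 10^6 €
Company Y can subtract its own contribution from the published total: 92 – 5 = 87,
which is within 7.4% of company X’s true value of 81!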
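The arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the contribution values come from the slide, while the function names `upper_bound_estimate` and `relative_error` are hypothetical helpers introduced here.

```python
# Contributions to cell (Piano, A) in 10^6 EUR, as listed on the slide.
contributions = {"X": 81, "Y": 5, "other_1": 2, "other_2": 2, "other_3": 2}
published_total = sum(contributions.values())


def upper_bound_estimate(total, own_contribution):
    """Upper bound a contributor can put on any other single contributor
    by subtracting its own value from the published cell total."""
    return total - own_contribution


def relative_error(estimate, true_value):
    """Relative overestimate of the true value."""
    return (estimate - true_value) / true_value


# Company Y knows the published total and its own turnover.
estimate_by_y = upper_bound_estimate(published_total, contributions["Y"])
error = relative_error(estimate_by_y, contributions["X"])

print(published_total, estimate_by_y, round(100 * error, 1))  # 92 87 7.4
```

The smaller the remaining contributions are relative to the largest one, the tighter this estimate becomes.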
Formalising the problem
General, objective rules needed
• Threshold rule
• Dominance rule or (n,k)-rule
• p%-rule
The p%-rule is favoured over the (n,k)-rule and implies a minimum of 3 contributors
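A minimal sketch of the three rules, using their commonly stated definitions. The parameter values (`min_contributors=3`, `n=2`, `k=85`, `p=10`) are assumptions chosen for illustration, not values given in the presentation, and contributions are assumed to be non-negative.

```python
def threshold_rule(contribs, min_contributors=3):
    """Sensitive if the cell has fewer than min_contributors contributors."""
    return len(contribs) < min_contributors


def dominance_rule(contribs, n=2, k=85.0):
    """(n,k)-rule: sensitive if the n largest contributions together
    exceed k% of the cell total."""
    top_n = sorted(contribs, reverse=True)[:n]
    return sum(top_n) > k / 100.0 * sum(contribs)


def p_percent_rule(contribs, p=10.0):
    """p%-rule: sensitive if the total minus the two largest contributions
    is less than p% of the largest contribution, i.e. the second largest
    contributor can estimate the largest to within p%."""
    if not contribs:
        return False
    ordered = sorted(contribs, reverse=True)
    remainder = sum(ordered[2:])  # cell total minus the two largest
    return remainder < p / 100.0 * ordered[0]


piano_a = [81, 5, 2, 2, 2]            # cell (Piano, A), in 10^6 EUR
print(threshold_rule(piano_a))        # False: five contributors
print(dominance_rule(piano_a))        # True: 81 + 5 = 86 > 85% of 92
print(p_percent_rule(piano_a))        # True: 92 - 81 - 5 = 6 < 8.1
print(p_percent_rule([60, 40]))       # True: the remainder is 0
print(p_percent_rule([30, 30, 30, 30]))  # False: 60 is not below 3
```

With one or two contributors the remainder is always zero, so the p%-rule flags the cell for any positive p. That is why it implies a minimum of 3 contributors.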
What to do?
• Redesign table
– Combine rows/columns
– Define different categories
• Rounding
• Add noise
• Cell suppression
Cell suppression
Region      A      B      C      D   Total
Harps      58     47     36     89     230
Organs     71    124     24     31     250
Pianos     92    157     59     28     336
Other     800    934    651    742    3127
Total    1021   1262    770    890    3943
Cell suppression
(Table as above, with 3 cells marked X)
Cell suppression
(Table as above, with 6 cells marked X)
Cell suppression
(Table as above, with 9 cells marked X)
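The growing number of X marks reflects that suppressing a cell is not enough on its own when row and column totals are published. The sketch below uses the table values from the slides and a simple, hypothetical `recover` function that fills in any row or column with exactly one missing cell. The suppression patterns are assumptions chosen for illustration, since the transcript does not show which cells were marked, and the marginal totals are assumed to be published.

```python
TABLE = [
    [58, 47, 36, 89],      # Harps
    [71, 124, 24, 31],     # Organs
    [92, 157, 59, 28],     # Pianos
    [800, 934, 651, 742],  # Other
]
ROW_TOTALS = [sum(row) for row in TABLE]
COL_TOTALS = [sum(col) for col in zip(*TABLE)]


def recover(suppressed):
    """Return the suppressed cells that can be recomputed exactly by
    repeatedly subtracting known cells from a row or column total."""
    size = len(TABLE)
    known = [[None if (i, j) in suppressed else TABLE[i][j]
              for j in range(size)] for i in range(size)]
    recovered = {}
    changed = True
    while changed:
        changed = False
        for i in range(size):  # rows with exactly one gap
            gaps = [j for j in range(size) if known[i][j] is None]
            if len(gaps) == 1:
                j = gaps[0]
                known[i][j] = ROW_TOTALS[i] - sum(
                    v for v in known[i] if v is not None)
                recovered[(i, j)] = known[i][j]
                changed = True
        for j in range(size):  # columns with exactly one gap
            gaps = [i for i in range(size) if known[i][j] is None]
            if len(gaps) == 1:
                i = gaps[0]
                known[i][j] = COL_TOTALS[j] - sum(
                    known[r][j] for r in range(size)
                    if known[r][j] is not None)
                recovered[(i, j)] = known[i][j]
                changed = True
    return recovered


# One suppressed cell (Pianos, D) is recomputed from its row total.
print(recover({(2, 3)}))                  # {(2, 3): 28}
# An L-shaped pattern of three cells unravels completely.
print(len(recover({(1, 3), (2, 2), (2, 3)})))  # 3
# A rectangle of four cells resists this simple subtraction attack.
print(recover({(1, 2), (1, 3), (2, 2), (2, 3)}))  # {}
```

Even when no cell can be recomputed exactly, an attacker may still derive bounds on the suppressed values, so a safe pattern has to be checked against more than exact subtraction.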
Software
Latest version can be found on
http://neon.vb.cbs.nl/casc
New Open Source version available end 2014
Contact/info
• Glossary, handbook, project info
– http://neon.vb.cbs.nl/casc
• Wiley book