Census 2011 – A Question of Confidentiality
Statistical Disclosure control for the 2011 Census
Carole Abrahams, ONS Methodology
BSPS – York, September 2011
Overview
• Brief introduction to SDC
• Census outputs & confidentiality
• Record swapping
• Data utility
• 2001 vs 2011
• Communal Establishments
• Further work
Introduction to SDC (1) - What is disclosure risk?
There is a disclosure risk when published information could allow an intruder to establish the identity or particulars of:
• an individual
• a household or family
• a business
• or another statistical unit
Introduction to SDC (2) - Examples of disclosure risk
• Identification disclosure
• Attribute disclosure (AD)
• Group disclosure
Introduction to SDC (3) - Statistical Disclosure Control
Statistical Disclosure Control (SDC) involves
either:
• introducing sufficient ambiguity or damage into published statistics, or reducing their level of detail, so that the risk of disclosing confidential information is reduced to an acceptable level
and/or:
• controlling access to the data
Census outputs and confidentiality
• Disclosure control of Census outputs is required by law
• Confidentiality pledge on Census forms
• Visible variables
– could be used to identify an individual/family/household
– and then to find out something new about them
– Data Environment Analysis Service (DEAS)
• Sensitive variables
– as defined by the Data Protection Act (DPA)
Risk – Utility balance
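[Figure: disclosure risk (information about confidential units), from Low to High, plotted against data utility (information about legitimate items), from Low to High. The figure marks the Original Data, No data and Released Data, and shows a Maximum Tolerable Risk threshold.]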
SDC for Census 2001
• Random record swapping
• Lack of harmonisation across the UK and late changes to the agreed methodology
• Small cell adjustment (SCA) applied in England, Wales and Northern Ireland, but not in Scotland
• SCA protected individual tables, but some risk remained through differencing
• Adverse effect on utility at low geographies and when creating bespoke geographies
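The differencing risk can be illustrated with a minimal Python sketch. It assumes a simple form of small cell adjustment in which a count of 1 or 2 is randomly replaced by 0 or 3, with probabilities chosen so that the expected value is unchanged. The exact 2001 rule may have differed. The area names and counts are hypothetical.

```python
import random


def small_cell_adjust(count: int, rng: random.Random) -> int:
    """Illustrative SCA: a count of 1 or 2 becomes 0 or 3.

    ASSUMPTION: 1 -> 3 with probability 1/3 and 2 -> 3 with probability 2/3,
    so the expected adjusted value equals the original count.
    """
    if count in (1, 2):
        return 3 if rng.random() < count / 3 else 0
    return count


def published_total(oa_counts: dict[str, int], rng: random.Random) -> int:
    """Publish an area total built from OA counts; SCA only alters small totals."""
    return small_cell_adjust(sum(oa_counts.values()), rng)


rng = random.Random(2001)
# Hypothetical counts of people with a sensitive attribute in each OA.
standard_area = {"oa1": 40, "oa2": 25}
bespoke_area = {"oa1": 40, "oa2": 25, "oa3": 1}  # differs by one small OA

# Both totals (65 and 66) are too large for SCA to touch, so subtracting one
# published total from the other recovers the unprotected count for oa3.
difference = published_total(bespoke_area, rng) - published_total(standard_area, rng)
print(difference)  # 1
```

In this toy case each table is protected on its own, but the small difference between two overlapping geographies is not.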
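Census Geography
• 104 Delivery Groups (DGs) in England & Wales
• ≈ 4 Local Authority Districts (LADs) in a DG
• ≈ 20 Middle Layer Super Output Areas (MSOAs) in an LAD
• ≈ 20 Output Areas (OAs) in an MSOA
[Figure: nested hierarchy DG > LAD > MSOA > OA, with each unit containing several units of the level below.]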
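The nesting can be modelled as a small Python data structure. In the sketch below, every OA records the MSOA, LAD and DG that contain it. A helper then finds the finest level shared by two OAs, which is the kind of question the swapping rules on later slides depend on. The class and function names, and the area codes, are hypothetical. The sketch also assumes that codes are unique within each level.

```python
from dataclasses import dataclass
from typing import Optional

# Geography levels from finest to coarsest, as on the slide.
LEVELS = ("oa", "msoa", "lad", "dg")

# Approximate number of OAs per delivery group implied by the slide's ratios.
APPROX_OAS_PER_DG = 4 * 20 * 20  # LADs per DG * MSOAs per LAD * OAs per MSOA


@dataclass(frozen=True)
class OutputArea:
    """Where one OA sits in the OA -> MSOA -> LAD -> DG hierarchy."""
    oa: str
    msoa: str
    lad: str
    dg: str


def lowest_common_level(a: OutputArea, b: OutputArea) -> Optional[str]:
    """Finest level whose area contains both OAs, or None if their DGs differ."""
    for level in LEVELS:
        if getattr(a, level) == getattr(b, level):
            return level
    return None


# Hypothetical codes, not real ONS geography codes.
x = OutputArea("OA001", "MSOA01", "LAD1", "DG1")
y = OutputArea("OA002", "MSOA01", "LAD1", "DG1")
z = OutputArea("OA050", "MSOA07", "LAD2", "DG1")
print(lowest_common_level(x, y))  # msoa
print(lowest_common_level(x, z))  # dg
print(APPROX_OAS_PER_DG)  # 1600
```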
SDC for Census 2011
• Registrars General (RsG) agreement, November 2006
– Small cell counts allowed as long as there is ‘sufficient uncertainty’
– Main risk is attribute disclosure
• Targeted record swapping
– Targeted to ‘risky’ records
– Risk looks at particular variables and takes account of geography
– Risk scores for individuals are combined into a household score
– Households are swapped
– Households are swapped only as far as their risk is considered ‘high’
– Imputation is considered as part of the protection
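The slide says that individual risk scores are combined into a household score, but it does not say how. The Python sketch below assumes one simple rule: the household takes the score of its riskiest member. Households at or above a hypothetical threshold are then flagged for swapping. Both the combining rule and the threshold value are assumptions made for illustration.

```python
# ASSUMPTION: the slide does not say how individual scores are combined;
# taking the maximum (the riskiest person) is one simple choice.
HIGH_RISK_THRESHOLD = 0.8  # hypothetical cut-off for a "risky" household


def household_risk(person_scores: list[float]) -> float:
    """Combine the risk scores of the people in a household into one score."""
    if not person_scores:
        raise ValueError("a household must contain at least one person")
    return max(person_scores)


def risky_households(households: dict[str, list[float]]) -> list[str]:
    """Sorted IDs of households whose combined score reaches the threshold."""
    return sorted(
        hid for hid, scores in households.items()
        if household_risk(scores) >= HIGH_RISK_THRESHOLD
    )


# Hypothetical person-level scores for three households.
example = {"h1": [0.1, 0.9], "h2": [0.2, 0.3], "h3": [0.8]}
print(risky_households(example))  # ['h1', 'h3']
```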
Targeted swapping (1)
• Households
– Risk score based on the uniqueness/rarity of a small number of key variables at different geographies
• Probability
– inversely related to the area imputation rate
– positively related to the household risk score
• Matching
– look for matches only as far as is necessary
– match on household size, and on other variables if possible
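The first two points above can be sketched in Python. Rarity is measured here as one over the number of households in the same area that share the same key-variable values, so a unique household scores 1.0. The selection probability rises with that score and falls as the area imputation rate rises. Both formulas, and the `base_rate` parameter, are hypothetical stand-ins. The slides only give the direction of each relationship.

```python
from collections import Counter


def rarity_scores(households: dict[str, tuple[str, tuple]]) -> dict[str, float]:
    """Risk of each household at ONE geography level.

    `households` maps a household ID to (area code at that level, key-variable
    values). Score = 1 / (households in the same area with the same values).
    ASSUMPTION: a simple rarity measure, not the ONS scoring rule.
    """
    freq = Counter(households.values())
    return {hid: 1.0 / freq[cell] for hid, cell in households.items()}


def swap_probability(risk: float, imputation_rate: float, base_rate: float = 0.1) -> float:
    """Rises with household risk and falls as the area imputation rate rises.

    ASSUMPTION: hypothetical functional form and base_rate.
    """
    return min(1.0, base_rate * risk * (1.0 - imputation_rate))


# The same two households seen at OA level and at MSOA level (hypothetical data).
at_oa = {"h1": ("OA1", (2, 1)), "h2": ("OA2", (2, 1))}
at_msoa = {"h1": ("MSOA1", (2, 1)), "h2": ("MSOA1", (2, 1))}
print(rarity_scores(at_oa)["h1"], rarity_scores(at_msoa)["h1"])  # 1.0 0.5
```

Calling `rarity_scores` once per geography level gives a risk profile across geographies. Here the household is unique within its OA but not within its MSOA.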
Targeted swapping – an example of how it works (1)
• Risky within OA: swap with a household in another OA in the same MSOA
• Risky within MSOA: swap with a household in another MSOA in the same LA
• Risky within LA: swap with a household in another LA within the delivery group
In this example the household is in an area with a high response rate, and therefore low imputation, so the area has a higher than average swapping rate.
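The pairing above can be written as a lookup. This Python sketch reads the slide as saying that the coarsest level at which a household is still risky decides how far away its swap partner may be. That reading is an assumption, and the function name is hypothetical.

```python
from typing import Optional

# Pairing taken from the slide: the coarsest level at which a household is
# still risky decides how far away its swap partner may be.
SWAP_SCOPE = {
    "oa": "another OA in the same MSOA",
    "msoa": "another MSOA in the same LA",
    "la": "another LA in the same delivery group",
}


def swap_scope(risky_at: set[str]) -> Optional[str]:
    """Where to look for a partner, given the levels at which a household is risky."""
    for level in ("la", "msoa", "oa"):  # coarsest level first
        if level in risky_at:
            return SWAP_SCOPE[level]
    return None  # not risky at any level: no swap needed


print(swap_scope({"oa"}))  # another OA in the same MSOA
print(swap_scope({"oa", "msoa"}))  # another MSOA in the same LA
```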
Targeted swapping – an example of how it works (2)
The household is found to be risky within its OA and is selected for swapping. It is only swapped between OAs in the same MSOA.
Households are matched on:
• Adults = 2
• Children = 1
• Pets = 2
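The matching step can be sketched in Python as follows. A partner must be in another OA of the same MSOA and have the same household size. Among the eligible candidates, the one that agrees on the most other variables is preferred. The scoring rule and all names are assumptions. The `others` tuple reuses the adults/children/pets values from the slide's example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Household:
    hid: str
    oa: str
    msoa: str
    size: int           # household size: must always match
    others: tuple = ()  # other matching variables, compared position by position


def find_partner(h: Household, pool: list[Household]) -> Optional[Household]:
    """Partner in another OA of the same MSOA with the same household size,
    preferring the candidate that agrees on the most other variables."""
    candidates = [
        c for c in pool
        if c.msoa == h.msoa and c.oa != h.oa and c.size == h.size
    ]
    if not candidates:
        return None

    def agreement(c: Household) -> int:
        return sum(a == b for a, b in zip(c.others, h.others))

    return max(candidates, key=agreement)


# others = (adults, children, pets), as in the slide's example.
risky = Household("h1", "OA1", "MSOA1", 3, (2, 1, 2))
pool = [
    Household("c1", "OA1", "MSOA1", 3, (2, 1, 2)),  # same OA: not eligible
    Household("c2", "OA2", "MSOA1", 3, (2, 1, 2)),  # agrees on everything
    Household("c3", "OA3", "MSOA1", 3, (3, 0, 2)),  # same size, agrees on pets only
    Household("c4", "OA9", "MSOA2", 3, (2, 1, 2)),  # different MSOA: not eligible
]
print(find_partner(risky, pool).hid)  # c2
```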
Swapping & Sufficient uncertainty
• The level of swapping in an area is determined by its level of non-response/imputation
• Swapping is lower where there are more imputed records
• Sufficient uncertainty has been assessed using two factors:
– Percentage of real attribute disclosures (ADs) protected by imputation & swapping
– Percentage of apparent ADs created
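The two measures can be illustrated on a toy table in Python. Here an AD is taken to be a non-empty row (for example an area) in which every unit falls into a single category. The first measure is the share of real ADs in the original data that no longer appear in the released table. The second is the share of released ADs that are not real. These definitions are a simplified assumption and may not match the exact ONS calculation.

```python
Table = dict[str, dict[str, int]]  # row (e.g. area) -> category -> count


def attribute_disclosures(table: Table) -> set[tuple[str, str]]:
    """(row, column) pairs where every unit in a non-empty row falls in one column."""
    ads = set()
    for row, cells in table.items():
        nonzero = [col for col, n in cells.items() if n > 0]
        if len(nonzero) == 1:
            ads.add((row, nonzero[0]))
    return ads


def uncertainty_measures(original: Table, released: Table) -> tuple[float, float]:
    """Return (% of real ADs no longer shown, % of released ADs that are not real)."""
    real = attribute_disclosures(original)
    shown = attribute_disclosures(released)
    pct_protected = 100.0 * len(real - shown) / len(real) if real else 100.0
    pct_apparent = 100.0 * len(shown - real) / len(shown) if shown else 0.0
    return pct_protected, pct_apparent


# Hypothetical "has attribute?" counts by OA. One "yes" household in OA2 has been
# swapped with one "no" household in OA3; OA1 is untouched.
original = {"OA1": {"yes": 3, "no": 0},
            "OA2": {"yes": 1, "no": 4},
            "OA3": {"yes": 0, "no": 2}}
released = {"OA1": {"yes": 3, "no": 0},
            "OA2": {"yes": 0, "no": 5},
            "OA3": {"yes": 1, "no": 1}}
print(uncertainty_measures(original, released))  # (50.0, 50.0)
```

In this example the swap protects the real AD in OA3 and creates an apparent AD in OA2. The real AD in OA1 stays visible.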
Effect of targeted swapping on data utility
[Figure: two panels, "LLTI by OA" and "LLTI by MSOA".]
• Typical effect of swapping on the numbers of people with a limiting long-term illness (LLTI)
• Based on 2001 data
• Utility is higher at MSOA than at OA level
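One plausible contributing reason for the higher MSOA utility is shown in this Python sketch. It uses hypothetical households, not the 2001 results in the figure. A swap between two OAs in the same MSOA moves counts between those OAs, but it leaves the MSOA totals unchanged. The names and data are illustrative assumptions.

```python
from collections import Counter

# Hypothetical households: ID -> (OA, MSOA, people with LLTI in the household)
households = {
    "h1": ("OA1", "MSOA1", 2),
    "h2": ("OA2", "MSOA1", 0),
    "h3": ("OA3", "MSOA2", 1),
}


def totals_by(hh: dict[str, tuple[str, str, int]], level: int) -> Counter:
    """Sum LLTI counts by OA (level=0) or by MSOA (level=1)."""
    out: Counter = Counter()
    for oa, msoa, llti in hh.values():
        out[(oa, msoa)[level]] += llti
    return out


def swap(hh: dict[str, tuple[str, str, int]], a: str, b: str) -> dict[str, tuple[str, str, int]]:
    """Exchange the locations of households a and b; their attributes travel with them."""
    oa_a, msoa_a, llti_a = hh[a]
    oa_b, msoa_b, llti_b = hh[b]
    swapped = dict(hh)
    swapped[a] = (oa_b, msoa_b, llti_a)
    swapped[b] = (oa_a, msoa_a, llti_b)
    return swapped


after = swap(households, "h1", "h2")  # an OA-level swap inside MSOA1
print(totals_by(households, 0) == totals_by(after, 0))  # False: OA counts change
print(totals_by(households, 1) == totals_by(after, 1))  # True: MSOA counts do not
```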
Summary of SDC methodology
• Main effect on utility will be for small cells at low level geographies
• Tables will be consistent and additive
• A minimum average cell size rule will be applied to tables
• All univariate residence-based tables at OA publishable
• There will be no small cell adjustment
• Tables will contain apparent small cells and apparent ADs, but an intruder cannot find out anything about an individual case with a “high degree of confidence”
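The summary does not give the minimum average cell size threshold. The Python sketch below assumes that average cell size means the table's population divided by its number of cells, where the number of cells is the product of the category counts of its variables. The population and threshold values are hypothetical and do not come from the slides.

```python
import math


def average_cell_size(population: int, dims: list[int]) -> float:
    """Table population divided by the number of cells (product of the category counts)."""
    return population / math.prod(dims)


def passes_min_average_cell_size(population: int, dims: list[int], threshold: float) -> bool:
    """True if the table's average cell size reaches the (hypothetical) threshold."""
    return average_cell_size(population, dims) >= threshold


# Hypothetical OA with 300 residents; the threshold of 1.0 is illustrative only.
print(average_cell_size(300, [2, 5, 10]))  # 3.0
print(passes_min_average_cell_size(300, [2, 5, 10, 20], threshold=1.0))  # False
```

Under this kind of rule, adding more variables or categories to a table at a low geography quickly pushes its average cell size below the threshold.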
Communal establishments
For client residents:
For staff residents:
Further work
• Minority population outputs
• Flow data
• Microdata
• Workplace tables
• Commissioned tables
• Contact: SDC