Post on 28-Mar-2015

Confidentiality and the SARs

Update on SAR progress, and discussion of the disclosure work done for Scotland.

Sam Smith s.smith@man.ac.uk

Update 2001 SARs

Newsletter published very recently: more delays.

Disclosure control work by CAPRI is ongoing.

Current estimate is for the Individual data to be with the SARs team in June.

In-house access at ONS for users with urgent need.

England and Wales

For the release of 100% tables, England and Wales and Northern Ireland rounded small cell counts.

It is not possible to match between the SAR and the tables for England, Wales and NI.

Scotland

Scotland did not round their 100% tables.

As a result, there are counts of 1 in the tables.

If any of these individuals are present in the SAR, it is disclosive.

Background

The following work has been carried out in collaboration with the General Register Office for Scotland, by the SARs team at CCSR.

At time of writing, I have had no access to disclosive data.

There is no geography below Scotland level.

Population Uniques

Population Uniques are people who have one or more characteristics which are Unique in the Population.

Sample Uniques are people who are unique on one or more characteristics in the Sample.

Scale

There are 62 variables in both the SAR and 100% tables.

GROS are interested in trivariate tables, and are concerned only with uniques (cells with a count of 1).

We obtained 37,820 tables, covering all combinations of trivariate tables.
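The table count follows from the number of ways to choose 3 variables out of 62. A minimal check, using placeholder variable names (the real variable names are not listed in these slides):

```python
from itertools import combinations
from math import comb

N_VARIABLES = 62  # variables common to the SAR and the 100% tables (from the slide)

# Number of unordered triples of distinct variables.
n_tables = comb(N_VARIABLES, 3)
print(n_tables)  # 37820

# The same count by explicit enumeration, with placeholder names var01..var62.
variables = [f"var{i:02d}" for i in range(1, N_VARIABLES + 1)]
triples = list(combinations(variables, 3))
print(len(triples) == n_tables)  # True
```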

Request of the tables

An example request for input to their system was provided by GROS

We then replicated and modified it, one request for each table.

The tables arrived on 4 CDs, a month later.
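Replicating one example request into one request per table could be scripted roughly as below. The request format used by the GROS system is not shown in these slides, so `REQUEST_TEMPLATE` and the variable names are purely hypothetical stand-ins.

```python
from itertools import combinations
from string import Template

# Hypothetical template: the real GROS request syntax is not given in the slides.
REQUEST_TEMPLATE = Template("TABLE $first BY $second BY $third FOR Person\n")

def build_requests(variables):
    """Return one request string per unordered triple of variables."""
    return [
        REQUEST_TEMPLATE.substitute(first=a, second=b, third=c)
        for a, b, c in combinations(variables, 3)
    ]

# Four illustrative variables give C(4, 3) = 4 requests.
table_requests = build_requests(["Age", "Industry", "Occupation", "Cars"])
for request in table_requests:
    print(request, end="")
```

With all 62 variables the same loop would produce one request for each trivariate table.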

An example table

Space-Time Research
2001 ED Based OSD - Test 1
Table 1
Cars - Number of by Ever worked Indicator and Number of Rooms
for Person

Ever worked Indicator: No code required (all columns). Columns are Number of Rooms.

Cars                    Not applicable    01-02     03-04     05-06       7+
None                                 -   53,323   421,443   232,335   18,719
One                                  -   33,839   577,499   759,187  188,235
Two                                  -    6,104   174,884   499,420  368,657
Three                                -      772    20,029    83,915   84,619
Four or more                         -      222     4,622    20,353   29,984
Communal establishment          50,485        -         -         -        -

Cars - Number of by Ever worked Indicator and Number of Rooms

Only “No Code Required” shown for Ever Worked.

A Bigger Example Table

Age, Industry, Occupation

Add table here

Analysis

Custom software was written to parse each table and list the file, variables and value locations of all uniques.

List the Uniques.

There are 2.4 million of them.
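The custom software itself is not shown. As a minimal sketch of the idea, assume each table has already been parsed into a mapping from a triple of category values to a count; the file name, categories and counts below are invented for illustration.

```python
from typing import Dict, Iterator, NamedTuple, Tuple

Cell = Tuple[str, str, str]  # one category value per variable

class Unique(NamedTuple):
    table_file: str
    variables: Tuple[str, str, str]
    values: Cell

def find_uniques(table_file: str,
                 variables: Tuple[str, str, str],
                 cells: Dict[Cell, int]) -> Iterator[Unique]:
    """Yield one record for every cell whose count is exactly 1."""
    for values, count in cells.items():
        if count == 1:
            yield Unique(table_file, variables, values)

# Toy, invented counts for illustration only (not real census figures).
toy_cells = {
    ("16-20", "Fishing", "Managers"): 1,
    ("16-20", "Fishing", "Clerical"): 12,
    ("21-25", "Mining", "Managers"): 0,
    ("21-25", "Mining", "Clerical"): 1,
}

uniques = list(find_uniques("table_00001.csv",
                            ("Age", "Industry", "Occupation"),
                            toy_cells))
for u in uniques:
    print(u.table_file, u.variables, u.values)
```

Recording the file, the variables and the values for each unique mirrors the list described on the slide.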

Implementation

Step by Step process.

Keep intermediate steps.

Keep It Simple.

Target

The Scotland Specification is as compatible as possible with the England and Wales specification.

Use recodes to reduce the unique count to a level where the uniques can be dealt with on a record-by-record basis.

Simple Suppression of Uniques

All records with uniques must be perturbed.

Approximately 96% of Uniques will be immediately suppressed by virtue of the sample being 4%.

There are also reductions because of differences in the specifications.
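As a rough illustration of the 96% figure, assume every individual has the same 4% chance of being in the sample and apply that to the 2.4 million uniques listed earlier. This is back-of-envelope arithmetic under that assumption, not a result from the project.

```python
TOTAL_UNIQUES = 2_400_000  # uniques found in the 100% tables (from the slides)
SAMPLE_PERCENT = 4         # sample size as a percentage (from the slides)

# Expected number of uniques whose individual falls in the sample,
# assuming equal selection probability for everyone (an assumption).
expected_in_sample = TOTAL_UNIQUES * SAMPLE_PERCENT // 100
expected_outside = TOTAL_UNIQUES - expected_in_sample

print(expected_in_sample)  # 96000
print(expected_outside)    # 2304000
```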

Recodes

Variables were recoded to coarser categories.

Some were already used to aid the E&W disclosure work, including Age, Hours of Work, Industry and others.

At time of writing, Occupation is the only additional recode for Scotland.
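To show why coarser categories reduce the number of uniques, here is a small sketch of an age recode. The band width, the starting age and the toy ages are assumptions for illustration; the actual band boundaries are not given in these slides.

```python
from collections import Counter

def recode_age(age: int, band_width: int = 5, start: int = 16) -> str:
    """Band ages from `start` upwards; leave younger ages as single years."""
    if age < start:
        return str(age)
    lower = start + ((age - start) // band_width) * band_width
    return f"{lower}-{lower + band_width - 1}"

# Toy data: four people, each with a distinct single year of age.
ages = [16, 17, 18, 40]

before = Counter(str(a) for a in ages)
after = Counter(recode_age(a) for a in ages)

uniques_before = sum(1 for count in before.values() if count == 1)
uniques_after = sum(1 for count in after.values() if count == 1)

print(uniques_before)  # 4
print(uniques_after)   # 1
```

Three of the four single-year cells merge into one band, so only one cell with a count of 1 remains.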

Running the recodes.

The previous slide represents 6 weeks of iterative work.

Each recode had the uniques analysis run, producing a list of uniques.

Distribution of uniques by variable

[Chart: count of uniques (axis from 0 to 1,200,000) for each variable, with four series: No recode, SARbase, sarbase+age5, sarbase+age5+occ]

Moving forwards

We now have a slightly more restrictive specification for Scotland:

Age recoded to between 2 and 5 year bands (for age 16+) (possibly also for EWNI)

Occupation in ?? categories

Industry in 15 categories (applied to EWNI)

Hours of Work banded (applied to EWNI)

So far…

Everything has been done on publicly accessible data.

The above process needs to be rerun on the SAR to find Sample Uniques

This requires access to the disclosive microdata.

Future Work

The 37,820 tables will be recreated for the records in the sample.

The lists of Population Uniques and Sample Uniques will be compared.

Where there is a Population Unique in the Sample, it will be flagged.
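The comparison of the two lists can be pictured as a set intersection, assuming each unique is identified by its variables and its category values. The entries below are invented examples, not real data.

```python
# Each unique is keyed by (variables, values); toy entries for illustration only.
AGE_IND_OCC = ("Age", "Industry", "Occupation")

population_uniques = {
    (AGE_IND_OCC, ("16-20", "Fishing", "Managers")),
    (AGE_IND_OCC, ("21-25", "Mining", "Clerical")),
}
sample_uniques = {
    (AGE_IND_OCC, ("21-25", "Mining", "Clerical")),
    (AGE_IND_OCC, ("26-30", "Fishing", "Clerical")),
}

# Population uniques that also appear in the sample: these get flagged.
flagged = population_uniques & sample_uniques

# Unique in the sample only: more than one such person exists in the population.
sample_only = sample_uniques - population_uniques

print(sorted(flagged))
print(sorted(sample_only))
```

A cell that is unique in the population and whose individual was sampled must also have a count of 1 in the sample table, which is why matching on the cell key is enough in this sketch.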

Applying this to the Microdata

All the Population Uniques in the Sample will be perturbed by ONS.

The method of perturbation will be the same as that used for the England, Wales and NI records.

This method is likely to involve PRAM. Discussion paper available from the SARs website?
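If the method meant here is PRAM (the Post-Randomisation Method), the general idea is that a categorical value is replaced by a random draw governed by a matrix of transition probabilities. The sketch below illustrates only that general idea; the categories and probabilities are invented, and it is not the procedure ONS applied.

```python
import random
from typing import Dict, List

Transition = Dict[str, Dict[str, float]]

def pram(values: List[str], transition: Transition,
         rng: random.Random) -> List[str]:
    """Replace each value by a draw from its row of the transition matrix."""
    perturbed = []
    for value in values:
        row = transition[value]
        targets = list(row)
        weights = [row[t] for t in targets]
        perturbed.append(rng.choices(targets, weights=weights, k=1)[0])
    return perturbed

# Invented probabilities: each value is kept with probability 0.9.
transition = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.1, "B": 0.9},
}

rng = random.Random(0)  # seeded so that a run is repeatable
original = ["A", "B", "A", "A", "B"]
print(pram(original, transition, rng))
```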

The 100% Tables

The 37,820 tables requested cost £2,000 - paid for by the SARs project.

They will be made available to registered SARs/Census users for use in research.

And Finally….

Slides will be available on the seminars webpage tomorrow.

Any questions?
