

Page 1

Joint UNECE/Eurostat Meeting on Population and Housing Censuses (28-30 October 2009)

Accuracy evaluation of NUTS level 2 hypercubes with the adoption of a sampling strategy in the 2011 Italian Population Census

Giancarlo Carbonetti, Mariangela Verrascina

Istat – Italian National Institute of Statistics, Division for General Censuses

Geneva, October 29th 2009

Page 2

Why do we adopt sampling techniques in the Italian Census?

Sampling is crucial for the new census strategy.

The main solutions proposed are related to:
- use of population registers;
- census forms mail-out;
- mixed mode of data collection.

A high response rate is needed.

The 2011 population census has been planned in order to:
- improve the efficiency of the survey operations;
- reduce the workload of the municipalities;
- minimize the statistical burden for the people.

Page 3

What are the effects of adopting a sampling technique?

Advantages

- Keeping a high level of quality by reducing non-sampling error sources → this is an opportunity
- Timeliness, because there is a smaller amount of data to process (hypercubes must be delivered to Eurostat by 1 April 2014) → this is a constraint

Disadvantages

- Introduction of sampling error → an evaluation of the accuracy of the sampling estimates is required

Page 4

The framework

POPULATION: private households.

LISTS: population registers managed by the municipalities.

VARIABLES: non-demographic variables.

DOMAINS: Census Areas.

DIFFERENT STRATEGY: municipality demographic thresholds.

A simulation study has been conducted to determine which methodology yields the most accurate estimates.

Page 5

Some results of the simulation study

Simple Random Sampling of HOUseholds (SRSHOU) from population registers.

Area Frame Sampling where reliable population registers are not available.

Calibrated Estimators.

Definition of Census Areas of about 15,000 inhabitants.

Sampling Ratio of 33%.
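To make the SRSHOU idea concrete, here is a minimal sketch of drawing a simple random sample of households from a register and expanding a sampled count to a population estimate. It is only an illustration under stated assumptions: the register is a hypothetical in-memory list of households, the function names are invented for this example, and the plain expansion estimator stands in for the calibrated estimators used in the study.

```python
import random


def draw_household_sample(register, sampling_ratio, seed=0):
    """Draw a simple random sample of households without replacement."""
    rng = random.Random(seed)
    n = max(1, round(len(register) * sampling_ratio))
    return rng.sample(register, n)


def expansion_estimate(register_size, sample, is_in_cell):
    """Estimate a population count by weighting each sampled household by N/n.

    This is a plain expansion estimator, not a calibrated one.
    """
    weight = register_size / len(sample)
    sampled_count = sum(
        1 for household in sample for person in household if is_in_cell(person)
    )
    return weight * sampled_count


# Hypothetical register: 3,000 households of 1 to 4 persons each.
data_rng = random.Random(42)
register = [
    [{"employed": data_rng.random() < 0.4} for _ in range(data_rng.randint(1, 4))]
    for _ in range(3000)
]

sample = draw_household_sample(register, sampling_ratio=0.33)
estimate = expansion_estimate(len(register), sample, lambda p: p["employed"])
true_count = sum(p["employed"] for household in register for p in household)
print(f"sampled households: {len(sample)}, estimate: {estimate:.0f}, true: {true_count}")
```

Sampling whole households means every member of a selected household is observed, which is why the weight is computed from household counts rather than person counts.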

Page 6

Distribution of average and maximum cv% for classes of cell counts for three tested sampling ratios (SRSHOU design)

Classes of absolute    sampling ratio = 10%      sampling ratio = 20%      sampling ratio = 33%
frequency T            cv%_average   cv%_max     cv%_average   cv%_max     cv%_average   cv%_max
<10                          143.3     191.8           101.4     123.7            66.5      95.8
10├30                         75.9      85.1            48.4      54.6            33.8      38.5
30├50                         51.8      57.1            31.8      37.1            23.4      25.6
50├100                        38.6      41.3            22.3      28.4            17.4      19.1
100├250                       25.4      28.5            15.7      19.6            11.4      12.8
250├500                       16.1      18.3            10.4      12.5             7.5       8.1
500├1,000                     11.8      12.8             7.5       8.2             5.3       5.9
1,000├2,500                    7.5       8.9             4.7       5.9             3.3       3.9
2,500├5,000                    4.9       5.4             3.0       3.6             2.0       2.5
5,000├10,000                   3.2       3.8             2.0       2.5             1.3       1.9

- for cells of 1,000 units cv is about 4%

- for cells of 100 units cv is about 13%

- for cells of 10 units cv is about 40%
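The three figures above fall by roughly a factor of three for each tenfold increase in cell size, which is the pattern a 1/sqrt(T) law produces. A minimal sketch of that relationship follows, assuming a textbook approximation for simple random sampling of persons without replacement and a cell that is a small share of the population: cv ≈ sqrt((1 − f) / (f · T)), with f the sampling ratio and T the cell count. This is not the authors' simulation, which samples households and uses calibrated estimators, so the values are indicative only; `approx_cv_percent` is a hypothetical name.

```python
import math


def approx_cv_percent(cell_count, sampling_ratio):
    """Approximate cv% of an expansion estimate of a small cell count.

    Assumes simple random sampling of persons without replacement and a
    rare cell (binomial variance with the cell share close to zero).
    """
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling_ratio must be in (0, 1]")
    return 100 * math.sqrt((1 - sampling_ratio) / (sampling_ratio * cell_count))


for ratio in (0.10, 0.20, 0.33):
    values = [round(approx_cv_percent(t, ratio), 1) for t in (10, 100, 1000)]
    print(f"sampling ratio {ratio:.2f}: cv% for T = 10, 100, 1000 -> {values}")
```

Under this approximation a cell one hundred times larger has a cv ten times smaller, whatever the sampling ratio.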

Page 7

Curves of sampling errors drawn by the simulation results

[Figure: expected cv (vertical axis, 0 to 140) against classes of absolute frequencies (horizontal axis), with one curve for each sampling ratio: 10%, 20% and 33%.]

Page 8

Relevant issue

What is the impact of the sampling error on the dissemination hypercubes?

The answer is the core of this presentation, which examines the impact of the sampling strategy on the final results.

Page 9

Impact of sampling errors on dissemination hypercubes

Having chosen the sampling strategy (SRSHOU; calibrated estimator), for an area and a dissemination hypercube:

“When can the quality of the statistical table be considered acceptable?”

For a fixed cv (for instance, a critical level could be 12.5%), the global quality of a dissemination hypercube can be considered acceptable:
→ if the percentage of cell counts estimated with a cv higher than the critical value is low;
→ if the percentage of persons classified in those cells is low.

Example 1: if less than 1/3 of cell counts have a cv > 12.5%

Example 2: if less than 10% of persons are classified in cells where cv > 12.5%

Page 10

Evaluation of the sets of estimates with critical accuracy by means of a sampling errors curve

[Figure: sampling error curve of cv against absolute frequencies T. The critical threshold cv_max = 12.5% meets the curve at the frequency Ts. Estimates with absolute frequencies below Ts have cv > 12.5% (high sampling errors, estimated with a critical quality); estimates with absolute frequencies above Ts have cv < 12.5%.]

The lower the amount of information estimated with high levels of cv (that is, the fewer persons classified in cells with absolute frequencies lower than the threshold Ts), the higher the quality of the related dissemination hypercubes.
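Reading the threshold Ts off a sampling error curve can be sketched as follows. This is an illustration only: it assumes the curve is available as a list of (frequency, cv%) points with cv decreasing as frequency grows, and it uses linear interpolation between the two points that bracket the critical cv, which is a choice made here and not something the slides specify. The curve points and the name `critical_threshold` are hypothetical.

```python
def critical_threshold(curve, cv_max=12.5):
    """Return the frequency at which a decreasing cv curve reaches cv_max.

    curve: list of (frequency, cv_percent) points sorted by increasing
    frequency, with cv_percent decreasing. Returns None if the curve never
    drops to cv_max.
    """
    previous = None
    for frequency, cv in curve:
        if cv <= cv_max:
            if previous is None:
                # The first point is already at or below the critical cv.
                return frequency
            f0, cv0 = previous
            # Linear interpolation between the two bracketing points.
            return f0 + (cv0 - cv_max) * (frequency - f0) / (cv0 - cv)
        previous = (frequency, cv)
    return None


# Hypothetical curve points, not taken from the study.
curve = [(10, 40.0), (50, 20.0), (100, 10.0), (500, 5.0)]
ts = critical_threshold(curve)
print(f"Ts = {ts}")
```

Cells with an absolute frequency below the returned value are the ones treated as critical.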

Page 11

Quality evaluations for hypercubes at NUTS level 2

The evaluations cover 8 Eurostat hypercubes that cross demographic variables with one or more long form variables and refer to NUTS level 2.

The hypercubes considered contain topics with breakdowns used in the 2001 Italian Census dissemination, which are close (in terms of number and information content) to the breakdowns to be provided for the next census round.

The number of cells ranges from 1,000 to more than 20,000, depending on the complexity of the statistical table.

Page 12

Hypercubes at NUTS level 2 considered in the study (draft version, April 2009)

Eurostat hypercubes for NUTS level 2 (- = variable not in the hypercube)

Variables                                 H.B1.E0.R1  H.B1.E0.R2  H.B1.E0.R3  H.B1.E0.R4  H.B1.E0.R5  H.B1.E1.R2  H.B1.E1.R3  H.B1.E1.R4
Sex                                       M           M           M           M           M           M           M           M
Age                                       L           L           L           L           L           M           M           S
Long form variables:
Current Activity Status                   M           -           -           -           -           L           M           -
Occupation                                -           M           -           -           -           M           -           M
Industry (branch of economic activity)    -           -           M           -           -           -           M           M
Status in Employment                      -           -           -           M           -           -           -           -
Educational Attainment                    -           -           -           -           M           M           M           M

H.B1.E0.R1 to H.B1.E0.R5: each non-demographic variable has been individually crossed with sex and age (single ages).

H.B1.E1.R2 to H.B1.E1.R4: more than one non-demographic variable has been crossed with sex and age (age classes).

Hypercube computations are simulated with 2001 Census data.

Page 13

Number of potential cells and acceptable cells for each hypercube considered in the study

Number of potential cells = the product of the number of categories

Number of acceptable cells = the number of potential cells without “structural zeros”

Hypercube code   Number of potential cells    Number of acceptable cells   Percentage of acceptable cells
H.B1.E0.R1        1,212 (2x101x6)              1,062                       87.6%
H.B1.E0.R2        2,020 (2x101x10)             1,922                       95.1%
H.B1.E0.R3        3,434 (2x101x17)             3,126                       91.0%
H.B1.E0.R4        1,212 (2x101x6)              1,032                       85.1%
H.B1.E0.R5        1,414 (2x101x7)              1,342                       94.9%
H.B1.E1.R2       23,520 (2x21x8x10x7)          3,810                       16.2%
H.B1.E1.R3       29,988 (2x21x6x17x7)          5,574                       18.6%
H.B1.E1.R4       30,940 (2x13x10x17x7)        26,350                       85.2%
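The arithmetic behind the table can be checked with a short sketch: the number of potential cells is the product of the category counts, and the percentage of acceptable cells is the acceptable count divided by that product. The category counts and acceptable-cell counts below are copied from the table; the dictionary layout and function name are choices made for this illustration.

```python
import math

# (category counts, number of acceptable cells), copied from the table above.
hypercubes = {
    "H.B1.E0.R1": ((2, 101, 6), 1062),
    "H.B1.E0.R2": ((2, 101, 10), 1922),
    "H.B1.E0.R3": ((2, 101, 17), 3126),
    "H.B1.E0.R4": ((2, 101, 6), 1032),
    "H.B1.E0.R5": ((2, 101, 7), 1342),
    "H.B1.E1.R2": ((2, 21, 8, 10, 7), 3810),
    "H.B1.E1.R3": ((2, 21, 6, 17, 7), 5574),
    "H.B1.E1.R4": ((2, 13, 10, 17, 7), 26350),
}


def potential_cells(categories):
    """Number of potential cells: product of the number of categories."""
    return math.prod(categories)


for code, (categories, acceptable) in hypercubes.items():
    potential = potential_cells(categories)
    share = 100 * acceptable / potential
    print(f"{code}: {potential:,} potential, {acceptable:,} acceptable ({share:.1f}%)")
```

The two hypercubes with the lowest share of acceptable cells are those where most category combinations are structural zeros.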

Page 14

Indicators of global accuracy

Two indicators are proposed to measure the global accuracy of the census data produced for a dissemination hypercube when a sampling strategy is adopted:

1) Percentage of critical cells = number of cell counts (>0) lower than the critical threshold Ts / number of acceptable cells

2) Percentage of persons in critical cells = persons classified in critical cells / total of persons

In particular, the second indicator quantifies the percentage of people classified in cells which will be estimated with a low accuracy (10% could be considered a tolerable limit).
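The two indicators can be computed directly from their definitions. The sketch below assumes the hypercube is given as a flat list of cell counts for the acceptable cells only (structural zeros already excluded); the example counts are hypothetical and the function name is invented for this illustration.

```python
def accuracy_indicators(cell_counts, threshold):
    """Return (% of critical cells, % of persons in critical cells).

    cell_counts: counts for acceptable cells only (structural zeros excluded).
    A cell is critical when its count is positive and below the threshold.
    """
    total_persons = sum(cell_counts)
    if not cell_counts or total_persons == 0:
        raise ValueError("need at least one acceptable cell with persons in it")
    critical = [count for count in cell_counts if 0 < count < threshold]
    pct_cells = 100 * len(critical) / len(cell_counts)
    pct_persons = 100 * sum(critical) / total_persons
    return pct_cells, pct_persons


# Hypothetical cell counts, with a threshold Ts of 100.
counts = [0, 5, 20, 80, 400, 1500]
pct_cells, pct_persons = accuracy_indicators(counts, threshold=100)
within_limit = pct_persons < 10  # 10% is the tolerable limit mentioned above
print(f"{pct_cells:.1f}% critical cells, {pct_persons:.1f}% of persons, within limit: {within_limit}")
```

In this example half of the cells are critical but they hold only about 5% of persons, which shows why the second indicator can stay low even when the first is high.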

Page 15

Example 1: Hypercube H.B1.E1.R3. Quality indicators related to NUTS2 areas of Italy: Molise, Marche and Sicilia

Columns for each area: threshold Ts | % of critical cells | % of persons in critical cells

Sampling ratio    Molise                Marche                Sicilia
10%               100   79.2   10.7     250   78.8    6.9     500   75.9    4.2
20%                50   71.0    5.8     100   68.4    3.0     250   68.7    2.1
33%                30   63.6    3.4      50   59.8    1.5     100   59.4    1.0

Hypercube H.B1.E1.R3: sex (2) by age (21) by current activity status (6) by industry (17) by educational attainment (7). Number of acceptable cells = 5,574 (structural zeros excluded).

A cell is critical if its absolute frequency is lower than the threshold Ts, the frequency at which the sampling error curve reaches cv_max = 12.5%.

Page 16

Example 2: Hypercube H.B1.E1.R4. Quality indicators related to NUTS2 areas of Italy: Molise, Marche and Sicilia

Columns for each area: threshold Ts | % of critical cells | % of persons in critical cells

Sampling ratio    Molise                Marche                Sicilia
10%               100   91.9   14.9     250   91.1   11.2     500   91.8    7.3
20%                50   86.5    9.3     100   84.4    6.0     250   87.6    4.5
33%                30   81.3    6.5      50   77.1    3.4     100   79.4    2.2

Hypercube H.B1.E1.R4: sex (2) by age (13) by occupation (10) by industry (17) by educational attainment (7). Number of acceptable cells = 26,350 (structural zeros excluded).

A cell is critical if its absolute frequency is lower than the threshold Ts, the frequency at which the sampling error curve reaches cv_max = 12.5%.

Page 17

Expected quality for hypercubes at NUTS level 2

Distribution of all 20 Italian NUTS2 areas by percentage of persons classified in critical cells for the Eurostat hypercubes considered in the study and the three tested sampling ratios.

Percentage of persons in critical cells (CV = 12.5%)

Eurostat hypercubes      s.r. = 33%      s.r. = 20%               s.r. = 10%
(acceptable cells)       <5%   5-10%     <5%   5-10%   10-15%     <5%   5-10%   10-15%   15-20%   >20%
H.B1.E0.R1 (1062)         20       0      20       0        0      19       1        0        0      0
H.B1.E0.R2 (1922)         20       0      20       0        0      15       4        1        0      0
H.B1.E0.R3 (3126)         20       0      17       3        0      11       4        4        1      0
H.B1.E0.R4 (1032)         20       0      16       4        0       7       8        5        0      0
H.B1.E0.R5 (1342)         20       0      20       0        0      15       5        0        0      0
H.B1.E1.R2 (3810)         20       0      18       2        0      12       6        2        0      0
H.B1.E1.R3 (5574)         20       0      17       3        0      10       5        5        0      0
H.B1.E1.R4 (26350)        15       5       8      10        2       1       9        6        3      1
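Each row of this distribution is a tally of areas by class of the "percentage of persons in critical cells" indicator. A minimal sketch of that tally follows, using the three values reported for H.B1.E1.R4 at a 10% sampling ratio in the Example 2 table. The slides do not say which class a boundary value such as exactly 5% belongs to, so the left-closed classes used here are an assumption, and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

CLASS_EDGES = [5, 10, 15, 20]
CLASS_LABELS = ["<5%", "5-10%", "10-15%", "15-20%", ">20%"]


def classify(pct_persons):
    """Assign a percentage to its class (boundary values go to the upper class)."""
    return CLASS_LABELS[bisect_right(CLASS_EDGES, pct_persons)]


def distribution(pct_by_area):
    """Count areas per class, keeping every class in the result."""
    counts = Counter(classify(pct) for pct in pct_by_area.values())
    return {label: counts.get(label, 0) for label in CLASS_LABELS}


# % of persons in critical cells, H.B1.E1.R4, sampling ratio 10% (Example 2).
example = {"Molise": 14.9, "Marche": 11.2, "Sicilia": 7.3}
tally = distribution(example)
print(tally)
```

Running the same tally over all 20 areas would give one block of a row in the table.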

Page 18

Concluding remarks

The adoption of a sampling strategy does not seem to reduce accuracy.

The sampling error could have a considerable impact only on the estimates of very small frequencies.

NUTS2 hypercubes of differing complexity could be estimated with good accuracy even at lower sampling ratios.

The revised version of the hypercubes considered in this work seems to be less detailed, which should hopefully improve accuracy.

Page 19

Some solutions to enhance accuracy

Adopting small area estimators.

Increasing the set of variables to be observed on the whole population, reducing the set of variables that have to be surveyed on samples of households: adoption of a medium/long form.

Enhancements of estimates regarding rare events and small domains in order to increase their efficiency and to reduce the number of critical cells.

Page 20

Thank you for your attention.
