Joint UNECE/Eurostat Meeting on Population and Housing Censuses (28-30 October 2009)
Accuracy evaluation of NUTS level 2 hypercubes with the adoption of a sampling strategy in the 2011 Italian Population Census
Giancarlo Carbonetti, Mariangela Verrascina
Istat – Italian National Institute of Statistics
Division for General Censuses
Geneva, October 29th 2009
Joint UNECE Eurostat Meeting
Why do we adopt sampling techniques in the Italian Census?
Sampling is crucial for the new census strategy.
The main solutions proposed are related to:
- use of population registers;
- census forms mail out;
- mixed mode of data collection.
A high response rate is needed.
The 2011 population census has been planned in order to:
- improve the efficiency of the survey operations;
- reduce the workload of the municipalities;
- minimize the statistical burden for the people.
What are the effects of adopting a sampling technique?
Advantages
- Keeping a high level of quality (reducing non-sampling error sources) → this is an opportunity
- Timeliness, with a smaller amount of data to process (hypercubes must be delivered to Eurostat by 1 April 2014) → this is a constraint
Disadvantages
- Introduction of sampling error → an evaluation of the accuracy of the sampling estimates is required
The framework
POPULATION: private households.
LISTS: population registers managed by the municipalities.
VARIABLES: non-demographic variables.
DOMAINS: Census Areas.
DIFFERENT STRATEGY: municipality demographic thresholds.
A simulation study was conducted to determine which methodology yields the most accurate estimates.
Some results of the simulation study
Simple Random Sampling of HOUseholds (SRSHOU) from population registers.
Area Frame Sampling where reliable population registers are not available.
Calibrated Estimators.
Definition of Census Areas of about 15,000 inhabitants.
Sampling Ratio of 33%.
Distribution of average and maximum cv% for classes of cell counts for three tested sampling ratios (SRSHOU design)

Classes of absolute   sampling ratio = 10%     sampling ratio = 20%     sampling ratio = 33%
frequency T           cv%_average  cv%_max     cv%_average  cv%_max     cv%_average  cv%_max
<10                   143.3        191.8       101.4        123.7       66.5         95.8
10├30                 75.9         85.1        48.4         54.6        33.8         38.5
30├50                 51.8         57.1        31.8         37.1        23.4         25.6
50├100                38.6         41.3        22.3         28.4        17.4         19.1
100├250               25.4         28.5        15.7         19.6        11.4         12.8
250├500               16.1         18.3        10.4         12.5        7.5          8.1
500├1,000             11.8         12.8        7.5          8.2         5.3          5.9
1,000├2,500           7.5          8.9         4.7          5.9         3.3          3.9
2,500├5,000           4.9          5.4         3.0          3.6         2.0          2.5
5,000├10,000          3.2          3.8         2.0          2.5         1.3          1.9

- for cells of 1,000 units cv is about 4%
- for cells of 100 units cv is about 13%
- for cells of 10 units cv is about 40%
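As a rough plausibility check on these orders of magnitude, the sketch below uses the textbook approximation for the cv of an estimated count under simple random sampling without replacement of persons. This is an assumption made for illustration only: the study samples households and uses calibrated estimators, so its simulated values differ. The function name approx_cv_percent and its optional population argument are hypothetical.

```python
import math


def approx_cv_percent(cell_count, sampling_ratio, population=None):
    """Approximate cv% of an estimated cell count.

    Assumes simple random sampling without replacement of persons
    (NOT the household design with calibration used in the study):
        cv ~ sqrt((1 - f) / f * (1 - p) / T)
    where f is the sampling ratio, T the true cell count and p = T / N.
    If the population size N is not given, p is treated as negligible.
    """
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling_ratio must be in (0, 1]")
    share = 0.0 if population is None else cell_count / population
    variance_factor = (1 - sampling_ratio) / sampling_ratio * (1 - share)
    return 100.0 * math.sqrt(variance_factor / cell_count)


# Cells of 1,000, 100 and 10 units with a 33% sampling ratio.
for t in (1000, 100, 10):
    print(t, round(approx_cv_percent(t, 0.33), 1))
```

With a sampling ratio of 33% this approximation gives about 4.5%, 14% and 45% for cells of 1,000, 100 and 10 units, which is broadly in line with the figures quoted above.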
[Figure: Curves of sampling errors drawn from the simulation results. Expected cv plotted against classes of absolute frequencies (vertical axis from 0 to 140), one curve per sampling ratio: 10%, 20%, 33%.]
Relevant issue
What is the impact of the sampling error on the dissemination hypercubes?
The answer is the core of this presentation, which explains the impact of the sampling strategy on the final results.
Impact of sampling errors on dissemination hypercubes
Having chosen the sampling strategy (SRSHOU; calibrated estimator), for an area and a dissemination hypercube:
“When can the quality of the statistical table be considered acceptable?”
For a fixed cv (for instance, a critical level could be 12.5%), the global quality of a dissemination hypercube can be considered acceptable:
→ if the percentage of cell counts estimated with a cv higher than the critical value is low;
→ if the percentage of persons classified in those cells is low.
Example 1: if fewer than 1/3 of cell counts have a cv > 12.5%
Example 2: if less than 10% of persons are classified in cell counts where cv > 12.5%
Evaluation of the sets of estimates with critical accuracy by means of a sampling errors curve
[Figure: sampling error curve, cv_max plotted against absolute frequencies T. The critical threshold cv = 12.5% meets the curve at T = TS. Estimates with absolute frequencies below TS have cv > 12.5% (high sampling errors, critical quality); estimates with absolute frequencies above TS have cv < 12.5%.]
The lower the amount of information estimated with high levels of cv (referred to persons classified in cells with absolute frequencies lower than the threshold TS), the higher the quality of the related dissemination hypercubes.
Quality evaluations for hypercubes at NUTS level 2
Evaluations are related to 8 Eurostat hypercubes crossing demographic variables with one or more long form variables and referred to NUTS level 2.
The considered hypercubes contain topics with breakdowns used in 2001 Italian Census dissemination, close (in terms of number and information content) to breakdowns to be provided for the next census round.
The number of cells goes from 1,000 to more than 20,000 depending on the complexity of the statistical table.
Hypercubes at NUTS level 2 considered in the study (draft version, April 2009)

Variables                               Eurostat hypercubes for NUTS level 2
                                        H.B1.E0.R1  H.B1.E0.R2  H.B1.E0.R3  H.B1.E0.R4  H.B1.E0.R5  H.B1.E1.R2  H.B1.E1.R3  H.B1.E1.R4
Sex                                     M           M           M           M           M           M           M           M
Age                                     L           L           L           L           L           M           M           S
Current Activity Status                 M           -           -           -           -           L           M           -
Occupation                              -           M           -           -           -           M           -           M
Industry (branch of economic activity)  -           -           M           -           -           -           M           M
Status in Employment                    -           -           -           M           -           -           -           -
Educational Attainment                  -           -           -           -           M           M           M           M

(- = variable not included in the hypercube. Current Activity Status through Educational Attainment are the long form variables. The column placement of the long form entries is reconstructed from the category counts listed for each hypercube.)

Each non-demographic variable has been individually crossed with sex and age (single ages).
More than one non-demographic variable has been crossed with sex and age (age classes).
Hypercube computations are simulated with 2001 Census data.
Number of potential cells and acceptable cells for each hypercube considered in the study
Number of potential cells = the product of the number of categories
Number of acceptable cells = the number of potential cells without “structural zeros”

Hypercube code   Number of potential cells   Number of acceptable cells   Percentage of acceptable cells
H.B1.E0.R1       1,212 (2x101x6)             1,062                        87.6%
H.B1.E0.R2       2,020 (2x101x10)            1,922                        95.1%
H.B1.E0.R3       3,434 (2x101x17)            3,126                        91.0%
H.B1.E0.R4       1,212 (2x101x6)             1,032                        85.1%
H.B1.E0.R5       1,414 (2x101x7)             1,342                        94.9%
H.B1.E1.R2       23,520 (2x21x8x10x7)        3,810                        16.2%
H.B1.E1.R3       29,988 (2x21x6x17x7)        5,574                        18.6%
H.B1.E1.R4       30,940 (2x13x10x17x7)       26,350                       85.2%
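
The arithmetic behind the table can be sketched as follows, using the category counts and the number of acceptable cells reported for hypercube H.B1.E1.R3. The function names are hypothetical and chosen only for this illustration.

```python
import math


def potential_cells(category_counts):
    """Number of potential cells = product of the number of categories."""
    return math.prod(category_counts)


def acceptable_percentage(acceptable_cells, potential):
    """Share of potential cells that are not structural zeros, in percent."""
    return 100.0 * acceptable_cells / potential


# H.B1.E1.R3: sex (2) x age (21) x current activity status (6)
#             x industry (17) x educational attainment (7)
breakdown = [2, 21, 6, 17, 7]
potential = potential_cells(breakdown)
print(potential)                                          # 29988
print(round(acceptable_percentage(5574, potential), 1))   # 18.6
```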
Indicators of global accuracy
Two indicators are proposed to measure the global accuracy of census data produced by adopting a sampling strategy and referred to a dissemination hypercube:
1) Percentage of critical cells = number of cell counts (>0) lower than the critical threshold Ts / number of acceptable cells
2) Percentage of persons in critical cells = persons classified in critical cells / total of persons
In particular, the second indicator quantifies the percentage of people classified in cells which will be estimated with a low accuracy (10% could be considered a tolerable limit).
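
A minimal sketch of the two indicators, assuming the input is the list of counts for all acceptable cells of one hypercube in one area (zero counts included) and that the threshold Ts is already known. The function name and the toy counts are hypothetical; they are not data from the study.

```python
def global_accuracy_indicators(cell_counts, threshold):
    """Return (% of critical cells, % of persons in critical cells).

    cell_counts: counts for all acceptable cells (structural zeros excluded).
    threshold:   critical threshold Ts; a cell is critical if 0 < count < Ts.
    """
    if not cell_counts:
        raise ValueError("cell_counts must not be empty")
    total_persons = sum(cell_counts)
    if total_persons <= 0:
        raise ValueError("total number of persons must be positive")
    critical = [c for c in cell_counts if 0 < c < threshold]
    pct_critical_cells = 100.0 * len(critical) / len(cell_counts)
    pct_persons_in_critical = 100.0 * sum(critical) / total_persons
    return pct_critical_cells, pct_persons_in_critical


# Toy example (hypothetical counts), with Ts = 100.
toy_counts = [5, 20, 80, 400, 1500, 0]
pct_cells, pct_persons = global_accuracy_indicators(toy_counts, threshold=100)
print(round(pct_cells, 1), round(pct_persons, 1))  # 50.0 5.2
```

In this toy case half of the cells are critical, yet they hold only about 5% of the persons, which is why the second indicator can stay below the 10% limit even when the first one is high.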
Example 1: Hypercube H.B1.E1.R3. Quality indicators related to NUTS2 areas of Italy: Molise, Marche and Sicilia

Sampling   Molise                      Marche                      Sicilia
ratio      Ts    % cells   % persons   Ts    % cells   % persons   Ts    % cells   % persons
10%        100   79.2      10.7        250   78.8      6.9         500   75.9      4.2
20%        50    71.0      5.8         100   68.4      3.0         250   68.7      2.1
33%        30    63.6      3.4         50    59.8      1.5         100   59.4      1.0

(Ts = threshold Ts; % cells = % of critical cells; % persons = % of persons in critical cells)
Hypercube H.B1.E1.R3: sex (2) by age (21) by current activity status (6) by industry (17) by educational attainment (7). Number of acceptable cells = 5,574 (no structural zeros).
A cell is critical if its absolute frequency is lower than the threshold TS, the frequency at which cv_max = 12.5%.
Example 2: Hypercube H.B1.E1.R4. Quality indicators related to NUTS2 areas of Italy: Molise, Marche and Sicilia

Sampling   Molise                      Marche                      Sicilia
ratio      Ts    % cells   % persons   Ts    % cells   % persons   Ts    % cells   % persons
10%        100   91.9      14.9        250   91.1      11.2        500   91.8      7.3
20%        50    86.5      9.3         100   84.4      6.0         250   87.6      4.5
33%        30    81.3      6.5         50    77.1      3.4         100   79.4      2.2

(Ts = threshold Ts; % cells = % of critical cells; % persons = % of persons in critical cells)
Hypercube H.B1.E1.R4: sex (2) by age (13) by occupation (10) by industry (17) by educational attainment (7). Number of acceptable cells = 26,350 (no structural zeros).
A cell is critical if its absolute frequency is lower than the threshold TS, the frequency at which cv_max = 12.5%.
Expected quality for hypercubes at NUTS level 2
Distribution of all 20 Italian NUTS2 areas by percentage of persons classified in critical cells for the Eurostat hypercubes considered in the study and the three tested sampling ratios.

Percentage of persons in critical cells (CV = 12.5%)
Eurostat hypercubes    s.r. = 33%     s.r. = 20%             s.r. = 10%
(acceptable cells)     <5%   5-10%    <5%   5-10%   10-15%   <5%   5-10%   10-15%   15-20%   >20%
H.B1.E0.R1 (1062)      20    0        20    0       0        19    1       0        0        0
H.B1.E0.R2 (1922)      20    0        20    0       0        15    4       1        0        0
H.B1.E0.R3 (3126)      20    0        17    3       0        11    4       4        1        0
H.B1.E0.R4 (1032)      20    0        16    4       0        7     8       5        0        0
H.B1.E0.R5 (1342)      20    0        20    0       0        15    5       0        0        0
H.B1.E1.R2 (3810)      20    0        18    2       0        12    6       2        0        0
H.B1.E1.R3 (5574)      20    0        17    3       0        10    5       5        0        0
H.B1.E1.R4 (26350)     15    5        8     10      2        1     9       6        3        1
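
The way areas are counted into the percentage classes can be sketched as follows. The three input values are the percentages of persons in critical cells reported for Molise, Marche and Sicilia in Example 2 (hypercube H.B1.E1.R4, sampling ratio 10%); the other 17 areas are not listed in the slides, so the result covers these three only. How values falling exactly on a class boundary are assigned is an assumption (a value of exactly 5 goes to "5-10%"), and the function names are hypothetical.

```python
from collections import Counter

CLASS_LABELS = ["<5%", "5-10%", "10-15%", "15-20%", ">20%"]
UPPER_BOUNDS = [5.0, 10.0, 15.0, 20.0]  # upper bound of each class but the last


def classify(pct_persons):
    """Assign a percentage of persons in critical cells to its class.

    Assumption: classes are closed on the left, open on the right.
    """
    for label, upper in zip(CLASS_LABELS, UPPER_BOUNDS):
        if pct_persons < upper:
            return label
    return CLASS_LABELS[-1]


def distribution(pct_by_area):
    """Count how many areas fall in each class."""
    counts = Counter(classify(p) for p in pct_by_area.values())
    return {label: counts.get(label, 0) for label in CLASS_LABELS}


# H.B1.E1.R4, sampling ratio 10% (values from Example 2).
example = {"Molise": 14.9, "Marche": 11.2, "Sicilia": 7.3}
print(distribution(example))
# {'<5%': 0, '5-10%': 1, '10-15%': 2, '15-20%': 0, '>20%': 0}
```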
Concluding remarks
The adoption of a sampling strategy does not seem to bring a reduction of accuracy.
The sampling error could have a considerable impact only on the estimates of very small frequencies.
NUTS2 hypercubes with different complexity could be estimated with good accuracy even for lower sampling ratios.
The revised version of the hypercubes considered in the work seems to be less detailed. This will hopefully bring more accuracy.
Some solutions to enhance accuracy
Adopting small area estimators.
Increasing the set of variables to be observed on the whole population, reducing the set of variables that have to be surveyed on samples of households: adoption of a medium/long form.
Enhancements of estimates regarding rare events and small domains in order to increase their efficiency and to reduce the number of critical cells.