Methodology for producing the revised back series of population estimates
for 1992 - 2000
Julie Jefferies
Population and Demography Division
Office for National Statistics
Outline of Presentation
1. Why did the back series need to be revised?
2. The approach taken in 2001, compared to 1991
3. Explaining and quantifying the difference
4. The remaining difference
5. Possible methods for apportioning the remaining difference
6. Development of the final national method
7. The sub-national back series
1. Why did the back series need to be revised?
• Population estimates provide estimates of the population in the years between censuses.
• Following each census there is a new base or starting point.
• A discontinuity occurs in the time series as a result of changing the base.
[Figure: Population estimates for 1991-2001 (based on 1991 Census) and 2001 population estimate (based on 2001 Census). x-axis: years 1991 to 2001; y-axis: population (thousands), 50,500 to 53,500; series: Based on 1991, Based on 2001.]
[Figure: Population estimates for 1991-2001 (based on 1991 Census) and 2001 population estimate (based on 2001 Census), redrawn with the y-axis starting at zero. x-axis: years 1991 to 2001; y-axis: population (thousands), 0 to 60,000; series: Based on 1991, Based on 2001.]
2. Approach taken in 2001 vs. 1991
Following the 1991 Census:
• A method for revising the 1980s back series had already been selected prior to the census
– interim revised estimates produced using simple period method (easy and quick to calculate)
– final revised estimates produced using the more sophisticated linear cohort method
There was no …
• examination of the reasons for the divergence
• evaluation of different methods
… and the final method used was much simpler!
Approach taken in 2001 vs. 1991
In 2001, a three stage approach was used:
1. Examine the reasons for the difference
2. Quantify the impact of these reasons on the estimates over previous decade and adjust the back series
3. Apportion the remaining difference
– Examine a range of methods
– Select the most appropriate method
3. Explaining and quantifying the difference
The difference may be caused by:
a) Issues with using the 2001 Census data (in its raw form) as a base for the mid-2001 population estimates.
b) Accumulated error in the population estimates over the intercensal period – population drift.
Possible causes - shortcomings in methodology or data sources, definitional issues.
[Figure: Population estimates for 1991-2001 (based on 1991 Census) and 2001 population estimate (based on 2001 Census). x-axis: years 1991 to 2001; y-axis: population (thousands), 50,500 to 53,500; series: Based on 1991, Based on 2001.]
2001 Census data
• A number of studies examining the reason for the difference were carried out. These included:
– Demographic analysis of sex ratios, fertility, mortality and migration
– Analysis of the Longitudinal Study
– Comparisons with administrative sources
– Investigation of census data and processes
– Matching studies of address lists collected by local authorities and those held by census
– The Local Authority Population Studies
Impact
Conclusion: an adjusted Census base should be used for the mid-2001 population estimates.
• Hence the final rebased mid-2001 population estimate (September 2004) was 275,000 higher than the original rebased estimate:
• 193,000 due to LS adjustment and other adjustments in September 2003.
• 82,000 due to Local Authority Population Studies and consequential adjustments.
Intercensal population estimates
Two quantifiable sources of error were also identified in the population estimates:
1. The mid-1991 population estimates were too high because they included too big an adjustment for undercoverage in the 1991 Census.
2. Difficulties in the estimation of international migration during the 1990s resulted in an overestimation of population growth.
Impact
1. Mid-1991 population was revised downwards and rolled forward over the decade.
The rolled-forward mid-2001 population estimate was reduced by 351,000.
2. Following thorough methodological research, international migration estimates for the 1990s were revised.
The rolled forward estimate for mid-2001 was reduced by 305,000.
Impact of quantifying these differences
[Figure: x-axis: years 1991 to 2001; y-axis: population (thousands), 50,500 to 53,500; series: Original based on 1991, Original based on 2001, Revised (mig & 91) based on 1991, Revised (Sept 04) based on 2001.]
4. The remaining difference
Remaining difference = 209,000
Possible causes e.g.
• issues to do with the concept and measurement of usual residence (including changes in residence status that do not involve a migration)
• remaining differences in estimating international migration
• births to non-resident mothers
Not possible to separately quantify these causes at present.
The remaining difference
[Figure: Remaining differences between rebased and rolled-forward mid-2001 population estimates. x-axis: Age, 0 to 100; y-axis: Difference, -30000 to 20000.]
5. Apportioning the remaining difference
• Two main methods:
– period
– cohort
• Within these methods, choice of:
– simple (linear)
– weighted (by…..)
Period
Period effect:
• where the error is related to a particular age, i.e. the estimates for those of that age are drifting further and further away from the truth
E.g. Each year we were underestimating the number of students (age 18) leaving an area to go to university or leaving the UK on a gap year
Simple period example 1
For 20 year old males, difference between rolled forward 2001 estimates and rebased 2001 estimates is: 6240
Actual error each year:
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
16 437 437 437 437 437 437 437 437 437 437
17 352 352 352 352 352 352 352 352 352 352
18 911 911 911 911 911 911 911 911 911 911
19 1595 1595 1595 1595 1595 1595 1595 1595 1595 1595
20 624 624 624 624 624 624 624 624 624 624
21 974 974 974 974 974 974 974 974 974 974
Simple period example 2
Accumulated error:
This is what the existing back series estimate needs to be adjusted by.
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
16 437 874 1311 1748 2185 2622 3059 3496 3933 4370
17 352 704 1056 1408 1760 2112 2464 2816 3168 3520
18 911 1822 2733 3644 4555 5466 6377 7288 8199 9110
19 1595 3190 4785 6380 7975 9570 11165 12760 14355 15950
20 624 1248 1872 2496 3120 3744 4368 4992 5616 6240
21 974 1948 2922 3896 4870 5844 6818 7792 8766 9740
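The simple period calculation above can be sketched in a few lines of Python. This is only an illustration, assuming the mid-2001 difference for each age is spread equally over the ten years 1992 to 2001 and then accumulated. The function name `simple_period_adjustment` is hypothetical; the differences are read from the 2001 column of the accumulated error table.

```python
YEARS = list(range(1992, 2002))  # the ten mid-year points 1992..2001

def simple_period_adjustment(diff_2001):
    """Spread a 2001 difference equally over the decade and accumulate it.

    Returns a dict mapping year -> accumulated error for one single age.
    """
    per_year = diff_2001 / len(YEARS)  # equal share of the error each year
    return {year: per_year * (i + 1) for i, year in enumerate(YEARS)}

# Differences at mid-2001 by age, taken from the example table.
diff_by_age = {16: 4370, 17: 3520, 18: 9110, 19: 15950, 20: 6240, 21: 9740}

adjustments = {age: simple_period_adjustment(d) for age, d in diff_by_age.items()}
print(adjustments[20][1992], adjustments[20][2001])
```

Under a period method the same age row is adjusted in every year, so each row of the table depends only on that age's own 2001 difference.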
Cohort
Cohort effect:
• where the error is related to a particular group of people, i.e. the error for this birth cohort built up gradually over the decade as they got older.
E.g. in the rolled forward 2001 estimates, we have too many 45 year old males. This could be because over the decade some people born around 1956 spent periods of time abroad and were not identified as out-migrants.
Linear cohort example 1
For 45 year olds, difference between rolled forward 2001 estimates and rebased 2001 estimates is: 2860.
Actual error each year:
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
36 286 111 223 715 1276 2210 1859 1950 2323 1707
37 214 286 111 223 715 1276 2210 1859 1950 2323
38 243 214 286 111 223 715 1276 2210 1859 1950
39 446 243 214 286 111 223 715 1276 2210 1859
40 237 446 243 214 286 111 223 715 1276 2210
41 571 237 446 243 214 286 111 223 715 1276
42 357 571 237 446 243 214 286 111 223 715
43 368 357 571 237 446 243 214 286 111 223
44 221 368 357 571 237 446 243 214 286 111
45 241 221 368 357 571 237 446 243 214 286
Linear cohort example 2
Accumulated error:
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
36 286 222 669 2860 6380 13260 13013 15600 20907 17070
37 214 572 333 892 3575 7656 15470 14872 17550 23230
38 243 428 858 444 1115 4290 8932 17680 16731 19500
39 446 486 642 1144 555 1338 5005 10208 19890 18590
40 237 892 729 856 1430 666 1561 5720 11484 22100
41 571 474 1338 972 1070 1716 777 1784 6435 12760
42 357 1142 711 1784 1215 1284 2002 888 2007 7150
43 368 714 1713 948 2230 1458 1498 2288 999 2230
44 221 736 1071 2284 1185 2676 1701 1712 2574 1110
45 241 442 1104 1428 2855 1422 3122 1944 1926 2860
This is what the existing back series estimate needs to be adjusted by.
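The linear cohort calculation differs from the period one only in that the error is followed diagonally, along a birth cohort, rather than along a fixed age. A minimal sketch follows, assuming an equal yearly share of the 2001 difference. The function name `linear_cohort_adjustment` is hypothetical; the 2860 figure is the example for people aged 45 in 2001.

```python
def linear_cohort_adjustment(diff_2001, age_2001, first_year=1992, last_year=2001):
    """Accumulate an equal yearly error along one birth cohort.

    Returns a dict mapping (year, age) -> accumulated error, following the
    cohort as it ages by one year per calendar year.
    """
    n_years = last_year - first_year + 1
    per_year = diff_2001 / n_years  # equal share of the error each year
    result = {}
    for i in range(n_years):
        year = first_year + i
        age = age_2001 - (last_year - year)  # the cohort's age in that year
        result[(year, age)] = per_year * (i + 1)
    return result

cohort_45 = linear_cohort_adjustment(2860, age_2001=45)
print(cohort_45[(1992, 36)], cohort_45[(1996, 40)], cohort_45[(2001, 45)])
```

The cells produced here correspond to the diagonal of the accumulated error table that runs from age 36 in 1992 to age 45 in 2001.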
Period + cohort combination?
• Generally we pick whichever effect is likely to be dominant or best approximates the true situation.
• A combination of both the period and cohort effects may be closest to reality.
• Using a combination method is complex - need to decide for each age group what proportion of the error is due to a period effect and what is due to a cohort effect. Then need to apply constraints to ensure that the final error by age is correct.
• For the 1992 to 2000 back series we started out using a cohort method (more later)…..
Simple (linear) vs. weighted
Examples so far have assumed a simple (or linear) effect
A linear method:
• weights each year of the decade equally (divides difference by 10)
• is easier to calculate and understand than a weighted method
• assumes whatever is causing the difference will have an equal impact in each year, which may not be the case
The weighted method
A weighted method:
• weights each year of the decade by a different amount i.e. allocates a varying amount of the difference to each year.
• may be appropriate if the difference is likely to be driven by or correlated with a quantifiable factor
• this factor varies over time or by age (or both)
• weighted methods are much more complex
6. Developing the final national method
The intercensal drift was thought to be correlated with migration (in particular out-migration from an area).
We know that:
• Propensity to migrate varies with age
• Levels of migration change over time
• Apportion difference back over cohort according to propensity to out-migrate by age (IPS data).
• In addition, weight the difference according to level of migration (all ages over time).
Calculating migration age weights 1
IPS out-migration data for males:
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 Total
M40 634 1837 779 2108 890 6503 3056 1745 2408 3971 23930
M41 1678 570 566 201 753 557 1152 1597 1331 762 9168
M42 908 462 1258 2559 489 1242 1169 360 1083 474 10003
M43 1821 1825 401 1410 1658 597 0 1805 467 2323 12307
M44 781 911 770 492 109 269 837 388 647 1614 6817
M45 541 1428 0 725 2486 1239 853 1723 1239 1438 11671
M46 233 2583 630 442 609 1174 1658 288 895 2059 10571
M47 1599 1138 1147 598 381 175 112 192 1099 900 7340
M48 1540 480 115 439 1063 117 148 1118 737 2676 8434
M49 592 870 0 373 626 1681 120 590 428 929 6208
Calculating migration age weights 2
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 Total
M40 2393 2393 2393 2393 2393 2393 2393 2393 2393 2393 23930
M41 917 917 917 917 917 917 917 917 917 917 9168
M42 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 10003
M43 1231 1231 1231 1231 1231 1231 1231 1231 1231 1231 12307
M44 682 682 682 682 682 682 682 682 682 682 6817
M45 1167 1167 1167 1167 1167 1167 1167 1167 1167 1167 11671
M46 1057 1057 1057 1057 1057 1057 1057 1057 1057 1057 10571
M47 734 734 734 734 734 734 734 734 734 734 7340
M48 843 843 843 843 843 843 843 843 843 843 8434
M49 621 621 621 621 621 621 621 621 621 621 6208
Calculating migration age weights 3
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 Cohort total
M40 2393 2393 2393 2393 2393 2393 2393 2393 2393 2393 23287
M41 917 917 917 917 917 917 917 917 917 917 21078
M42 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 19144
M43 1231 1231 1231 1231 1231 1231 1231 1231 1231 1231 17458
M44 682 682 682 682 682 682 682 682 682 682 15054
M45 1167 1167 1167 1167 1167 1167 1167 1167 1167 1167 13772
M46 1057 1057 1057 1057 1057 1057 1057 1057 1057 1057 12638
M47 734 734 734 734 734 734 734 734 734 734 11872
M48 843 843 843 843 843 843 843 843 843 843 11173
M49 621 621 621 621 621 621 621 621 621 621 10645
Calculating migration age weights 4
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 Cohort total
M40 0.225 0.214 0.202 0.189 0.174 0.159 0.137 0.125 0.114 0.103 1.000
M41 0.100 0.086 0.082 0.077 0.073 0.067 0.061 0.053 0.048 0.043 1.000
M42 0.111 0.109 0.094 0.090 0.084 0.079 0.073 0.066 0.057 0.052 1.000
M43 0.143 0.136 0.134 0.116 0.110 0.104 0.097 0.089 0.082 0.070 1.000
M44 0.087 0.079 0.076 0.074 0.064 0.061 0.057 0.054 0.050 0.045 1.000
M45 0.147 0.148 0.136 0.129 0.127 0.110 0.104 0.098 0.092 0.085 1.000
M46 0.145 0.133 0.134 0.123 0.117 0.115 0.099 0.095 0.089 0.084 1.000
M47 0.112 0.101 0.092 0.093 0.086 0.081 0.080 0.069 0.066 0.062 1.000
M48 0.137 0.129 0.116 0.106 0.107 0.098 0.094 0.092 0.079 0.075 1.000
M49 0.107 0.101 0.095 0.085 0.078 0.079 0.072 0.069 0.067 0.058 1.000
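The four age-weight tables can be read as one calculation: average each age's out-migration over the decade, total those averages over the ages a cohort passes through, then divide each average by the cohort total. That reading is inferred from the tables rather than stated on the slides. It does reproduce the cohort total of 10645 and the weights 0.225 (age 40 in 1992) and 0.058 (age 49 in 2001) for the cohort aged 49 in 2001. The function name `cohort_age_weights` is hypothetical.

```python
# Decade-average male out-migration by age (the smoothed table, ages 40-49).
avg_out_migration = {
    40: 2393, 41: 917, 42: 1000, 43: 1231, 44: 682,
    45: 1167, 46: 1057, 47: 734, 48: 843, 49: 621,
}

def cohort_age_weights(avg_by_age, final_age, n_years=10):
    """Return (weights, cohort_total) for the cohort aged final_age in the last year.

    weights maps each age the cohort passes through to its share of the
    cohort's total averaged out-migration, so the weights sum to 1.
    """
    ages = range(final_age - n_years + 1, final_age + 1)
    cohort_total = sum(avg_by_age[a] for a in ages)
    weights = {a: avg_by_age[a] / cohort_total for a in ages}
    return weights, cohort_total

weights_49, total_49 = cohort_age_weights(avg_out_migration, final_age=49)
print(total_49, round(weights_49[40], 3), round(weights_49[49], 3))
```

Only the cohort aged 49 in 2001 can be checked this way from the ages shown, because the other cohorts pass through ages below 40 that are not in the extract.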
Calculating migration time weights
Calculate grossing factors to show how out-migration for each year compares to the average out-migration for the decade
Total out-migration for decade (all ages) = 1,035,604
Average migration per year = 103,560
Grossing factor for 1992 = migration in 1992 (106,875) / average migration per year (103,560) = 1.032
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
Total 106875 96155 91957 85430 93668 92931 103305 108511 133227 123544
Factor 1.032 0.928 0.888 0.825 0.904 0.897 0.998 1.048 1.286 1.193
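As a quick check of the time weights, the grossing factors can be recomputed from the yearly all-age out-migration totals. This sketch assumes only that each factor is the year's total divided by the decade average; the variable names are illustrative.

```python
# All-age out-migration totals by year, as given in the table.
out_migration = {
    1992: 106875, 1993: 96155, 1994: 91957, 1995: 85430, 1996: 93668,
    1997: 92931, 1998: 103305, 1999: 108511, 2000: 133227, 2001: 123544,
}

# Average out-migration per year over the decade.
average = sum(out_migration.values()) / len(out_migration)

# Grossing factor: how each year compares with the decade average.
factors = {year: total / average for year, total in out_migration.items()}

for year, factor in factors.items():
    print(year, round(factor, 3))
```

Because each factor is a ratio to the mean, the ten factors sum to 10, so they redistribute the difference between years without changing its overall size.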
Age weightings:
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
M40 0.225 0.214 0.202 0.189 0.174 0.159 0.137 0.125 0.114 0.103
M41 0.100 0.086 0.082 0.077 0.073 0.067 0.061 0.053 0.048 0.043
M42 0.111 0.109 0.094 0.090 0.084 0.079 0.073 0.066 0.057 0.052
M43 0.143 0.136 0.134 0.116 0.110 0.104 0.097 0.089 0.082 0.070
M44 0.087 0.079 0.076 0.074 0.064 0.061 0.057 0.054 0.050 0.045
M45 0.147 0.148 0.136 0.129 0.127 0.110 0.104 0.098 0.092 0.085
M46 0.145 0.133 0.134 0.123 0.117 0.115 0.099 0.095 0.089 0.084
M47 0.112 0.101 0.092 0.093 0.086 0.081 0.080 0.069 0.066 0.062
M48 0.137 0.129 0.116 0.106 0.107 0.098 0.094 0.092 0.079 0.075
M49 0.107 0.101 0.095 0.085 0.078 0.079 0.072 0.069 0.067 0.058
Time weightings:
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
Factor 1.032 0.928 0.888 0.825 0.904 0.897 0.998 1.048 1.286 1.193
Age weights * time weights 1
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
M40 0.232 0.199 0.179 0.156 0.157 0.143 0.137 0.131 0.146 0.123
M41 0.103 0.080 0.073 0.064 0.066 0.060 0.061 0.055 0.062 0.052
M42 0.114 0.101 0.083 0.074 0.076 0.071 0.072 0.070 0.074 0.062
M43 0.148 0.127 0.119 0.095 0.100 0.093 0.097 0.094 0.105 0.084
M44 0.089 0.074 0.067 0.061 0.058 0.055 0.057 0.057 0.064 0.054
M45 0.152 0.138 0.121 0.107 0.115 0.098 0.104 0.103 0.119 0.101
M46 0.150 0.124 0.119 0.102 0.106 0.103 0.099 0.099 0.115 0.100
M47 0.116 0.093 0.082 0.077 0.077 0.073 0.080 0.072 0.085 0.074
M48 0.142 0.120 0.103 0.088 0.097 0.088 0.093 0.096 0.102 0.090
M49 0.110 0.094 0.084 0.070 0.071 0.071 0.072 0.072 0.087 0.070
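The combined weights are the cell-by-cell product of an age weight and that year's time factor. The sketch below follows the cohort aged 49 in 2001 along its diagonal, using the rounded values printed in the tables, so later cells can differ from the slide in the third decimal place. The variable names are illustrative.

```python
# Age weights along the diagonal for the cohort aged 49 in 2001
# (age 40 in 1992, age 41 in 1993, ..., age 49 in 2001).
age_weights = [0.225, 0.086, 0.094, 0.116, 0.064, 0.110, 0.099, 0.069, 0.079, 0.058]

# Time (grossing) factors for 1992..2001.
time_factors = [1.032, 0.928, 0.888, 0.825, 0.904, 0.897, 0.998, 1.048, 1.286, 1.193]

# Combined weight for each year of the cohort's decade.
combined = [a * t for a, t in zip(age_weights, time_factors)]
cohort_sum = sum(combined)

print([round(w, 3) for w in combined])
print(round(cohort_sum, 4))
```

With these rounded inputs the combined weights for the cohort sum to slightly less than 1, which is one reason a final constraining step is needed when the weights are applied.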
Age weights * time weights 2
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 Cohort total
M40 0.232 0.199 0.179 0.156 0.157 0.143 0.137 0.131 0.146 0.123 1.000
M41 0.103 0.080 0.073 0.064 0.066 0.060 0.061 0.055 0.062 0.052 1.000
M42 0.114 0.101 0.083 0.074 0.076 0.071 0.072 0.070 0.074 0.062 1.000
M43 0.148 0.127 0.119 0.095 0.100 0.093 0.097 0.094 0.105 0.084 1.000
M44 0.089 0.074 0.067 0.061 0.058 0.055 0.057 0.057 0.064 0.054 1.000
M45 0.152 0.138 0.121 0.107 0.115 0.098 0.104 0.103 0.119 0.101 1.000
M46 0.150 0.124 0.119 0.102 0.106 0.103 0.099 0.099 0.115 0.100 1.000
M47 0.116 0.093 0.082 0.077 0.077 0.073 0.080 0.072 0.085 0.074 1.000
M48 0.142 0.120 0.103 0.088 0.097 0.088 0.093 0.096 0.102 0.090 1.000
M49 0.110 0.094 0.084 0.070 0.071 0.071 0.072 0.072 0.087 0.070 1.000
Applying the weights 1
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 Diff
M40 -15530
M41 -8562
M42 -2509
M43 1283
M44 1391
M45 -2890
M46 -275
M47 -2168
M48 -2738
M49 1270
Applying the weights 2
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 Diff
M40 295 -544 -388 -43 -454 198 175 -329 -1251 -1904 -15530
M41 -429 102 -199 -138 -18 -173 84 71 -155 -444 -8562
M42 -262 -421 106 -202 -165 -19 -209 97 95 -156 -2509
M43 -299 -290 -495 121 -273 -202 -27 -271 146 108 1283
M44 126 -149 -154 -255 74 -150 -124 -16 -184 75 1391
M45 -1854 194 -244 -244 -478 125 -285 -223 -33 -292 -2890
M46 791 -1511 168 -206 -243 -430 126 -271 -248 -27 -275
M47 520 494 -1003 108 -156 -167 -332 92 -231 -160 -2168
M48 -479 538 543 -1071 136 -178 -214 -400 129 -247 -2738
M49 85 -317 379 371 -864 100 -146 -165 -362 88 1270
Applying the weights 3
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
M40 295 -835 -874 -130 -1707 956 989 -2130 -7980 -15147
M41 -429 396 -1034 -1012 -148 -1880 1041 1059 -2284 -8425
M42 -262 -849 502 -1237 -1177 -167 -2089 1138 1154 -2440
M43 -299 -552 -1345 623 -1509 -1379 -194 -2360 1284 1262
M44 126 -449 -706 -1599 697 -1659 -1503 -209 -2544 1359
M45 -1854 319 -693 -950 -2078 822 -1944 -1726 -242 -2836
M46 791 -3365 487 -898 -1193 -2507 948 -2216 -1975 -269
M47 520 1285 -4369 595 -1055 -1360 -2839 1039 -2447 -2135
M48 -479 1058 1828 -5440 731 -1233 -1574 -3239 1169 -2694
M49 85 -796 1436 2200 -6304 831 -1379 -1739 -3601 1257
Applying the weights 4
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
M40 298 -849 -887 -132 -1739 979 1005 -2189 -8111 -15530
M41 -434 400 -1051 -1028 -151 -1915 1065 1077 -2348 -8562
M42 -264 -861 507 -1257 -1196 -170 -2129 1164 1173 -2509
M43 -304 -557 -1362 630 -1534 -1401 -198 -2404 1314 1283
M44 128 -455 -711 -1620 704 -1686 -1527 -213 -2592 1391
M45 -1880 326 -703 -958 -2105 830 -1976 -1754 -247 -2890
M46 796 -3411 497 -911 -1202 -2540 957 -2252 -2006 -275
M47 533 1294 -4429 608 -1070 -1371 -2876 1050 -2487 -2168
M48 -494 1084 1840 -5514 747 -1251 -1586 -3282 1181 -2738
M49 87 -821 1472 2214 -6391 848 -1399 -1753 -3648 1270
This is what the existing back series estimate needs to be adjusted by.
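The four "Applying the weights" tables can be read as three steps for each cohort: multiply the 2001 difference by each year's combined weight, accumulate along the cohort, then scale so the accumulated 2001 value equals the difference. The proportional scaling in the last step is inferred from comparing tables 3 and 4 (for example 1257 becoming 1270 for age 49 in 2001) and is not stated on the slides. The function name `apply_weights` is hypothetical, and the weights are the rounded values from the age × time table, so results match the slides only approximately.

```python
from itertools import accumulate

def apply_weights(weights, diff_2001):
    """Allocate a cohort's 2001 difference over the decade.

    Returns (yearly, accumulated, constrained) lists for 1992..2001.
    Assumes the weighted total is non-zero.
    """
    yearly = [w * diff_2001 for w in weights]           # error attributed to each year
    accumulated = list(accumulate(yearly))              # running total along the cohort
    scale = diff_2001 / accumulated[-1]                 # force the 2001 value to match
    constrained = [a * scale for a in accumulated]
    return yearly, accumulated, constrained

# Combined weights along the diagonal for the cohort aged 49 in 2001.
weights_cohort_49 = [0.232, 0.080, 0.083, 0.095, 0.058, 0.098, 0.099, 0.072, 0.102, 0.070]

yearly, accumulated, constrained = apply_weights(weights_cohort_49, 1270)
print(round(yearly[0]), round(constrained[0]), round(constrained[-1]))
```

The constrained values are the adjustments applied to the existing back series for that cohort in each year.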
Story so far….
The 1992-2000 back series was revised using a:
• Cohort method
• Weighted by out-migration
• Migration weights varied by both age and time
QA – weighted cohort method worked well for nearly all ages…..
• But still some issues with teenagers
• Following QA and further research, a period adjustment for teenagers was included
…so the final method was a weighted combination method!
Period adjustment 1
• Introduced to address a specific issue for 18 and 19 year olds
• Analysis of the results obtained using the weighted cohort method suggested that there was a significant period effect associated with these ages which had not been allowed for
• This is possibly due to people taking ‘gap years’ abroad at ages 18 and 19
Period adjustment 2
• A proportion of the difference observed at age 18 and 19 was allocated using a time-weighted period method
• This proportion was determined by comparing the relative size of measured migration among 18 and 19 year olds with migration levels at younger ages
• The remainder of the difference was allocated using the cohort method
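The split described above can be illustrated with a rough sketch. Everything numeric here except the time factors is hypothetical: the slides do not give the proportion used or the differences at ages 18 and 19, so `period_share` and the difference of 1000 are placeholders. The function names are also illustrative. The period part is spread over the years in proportion to the time factors, and the remainder would go through the weighted cohort method.

```python
from itertools import accumulate

# Time (grossing) factors for 1992..2001, from the time weights table.
TIME_FACTORS = [1.032, 0.928, 0.888, 0.825, 0.904, 0.897, 0.998, 1.048, 1.286, 1.193]

def split_difference(diff, period_share):
    """Split a difference into a period part and a cohort part."""
    period_part = diff * period_share
    return period_part, diff - period_part

def time_weighted_period(period_part, factors=TIME_FACTORS):
    """Accumulate the period part over the decade, weighting years by factor."""
    total = sum(factors)
    yearly = [period_part * f / total for f in factors]
    return list(accumulate(yearly))

# Hypothetical inputs: a difference of 1000 at age 18, 40% treated as period effect.
period_part, cohort_part = split_difference(1000, period_share=0.4)
period_accumulated = time_weighted_period(period_part)
print(round(period_part), round(cohort_part), round(period_accumulated[-1]))
```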
Final rebased back series (published Oct 04)
[Figure: x-axis: years 1991 to 2001; y-axis: population (thousands), 50,500 to 53,500; series: Original based on 1991, Original based on 2001, Revised (mig & 91) based on 1991, Revised (Sept 04) based on 2001, Revised back series.]
7. Sub-national back series (published Oct 04)
• Each local authority was calculated separately using the same method as for the national estimate.
– For the age weights, the national distribution was used.
– For the time weights, both international out-migration (IPS) and internal out-migration were used.
• Final LA estimates for each year were constrained to the national estimate.
• QA – sex ratios and time series.
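One common way to constrain sub-national figures to a national total is proportional scaling, sketched below. The slides do not say which constraining method was used, so this is an assumption. The area names and figures are hypothetical, as is the function name `constrain_to_national`.

```python
def constrain_to_national(la_estimates, national_total):
    """Scale local authority estimates so they sum to the national total."""
    la_sum = sum(la_estimates.values())
    scale = national_total / la_sum  # same proportional factor for every area
    return {la: value * scale for la, value in la_estimates.items()}

# Hypothetical estimates for one age, sex and year.
la_estimates = {"Area A": 1200.0, "Area B": 800.0, "Area C": 500.0}
constrained_la = constrain_to_national(la_estimates, national_total=2450.0)
print({la: round(v, 1) for la, v in constrained_la.items()})
```

Proportional scaling keeps each area's share of the total unchanged while making the areas add up to the national figure.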
QA – sex ratios
[Figure: Sex Ratios - Basingstoke and Deane. x-axis: Age, 0 to 84; y-axis: Ratio, 0 to 1.4; one series per year, 1992 to 2001.]
QA – time series
[Figure: Basingstoke and Deane. x-axis: years 1991 to 2001; y-axis: Population, 140000 to 154000; series: Rebased, Rolled Forward.]
Contact details:
www.statistics.gov.uk/popest
email: [email protected]
tel: 01329 813318
Any questions?