cmgpd-ln methodological lecture day 4
DESCRIPTION
CMGPD-LN Methodological Lecture Day 4. Households. Outline. Existing household variables Identifiers Characteristics Dynamics Household relationship Creation of new variables Use of bysort / egen. Identifiers. HOUSEHOLD_ID - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/1.jpg)
CMGPD-LNMethodological Lecture
Day 4
Households
![Page 2: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/2.jpg)
Outline
• Existing household variables– Identifiers– Characteristics– Dynamics– Household relationship
• Creation of new variables– Use of bysort/egen
![Page 3: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/3.jpg)
Identifiers
• HOUSEHOLD_ID– Identifies records associated with a household in the current
register• HOUSEHOLD_SEQ
– The order of the current household (linghu) within the current household group (yihu)
• UNIQUE_HH_ID– Identifies records associated with the same household across
different registers– New value assigned at time of household division
• Each of the resulting households gets a new, different
![Page 4: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/4.jpg)
Characteristics
• HH_SIZE– Number of living members of the household– Set to missing before 1789
• HH_DIVIDE_NEXT– Number of households in the next register that the
members of the current household are associated with.– 1 if no division– 0 if extinction– 2 or more if division– Set to missing before 1789
![Page 5: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/5.jpg)
histogram HH_SIZE if PRESENT & HH_SIZE > 0, width(2) scheme(s1mono) fraction ytitle("Proportion of individuals") xtitle("Number of members")
0.0
5.1
.15
Pro
porti
on o
f ind
ivid
uals
0 50 100 150Number of members
![Page 6: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/6.jpg)
• This isn’t particularly appealing• A log scale on the x axis would help• In STATA, histogram forces fixed width bins, even
when the x scale is set to log• We can collapse the data and plot using twoway bar
or scatter
table HH_SIZE, replacetwoway bar table1 HH_SIZE if HH_SIZE > 0,
xscale(log) scheme(s1mono) xlabel(0 1 2 5 10 20 50 100 150)
![Page 7: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/7.jpg)
020
,000
40,0
0060
,000
80,0
0010
0,00
0Fr
eq.
0 1 2 5 10 20 50 100 150Household size
![Page 8: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/8.jpg)
• What if we would like to convert to fractions?• Compute total number of households by summing table1,
then divide each value of table 1 by the total• sum(table1) returns the sum of table 1 up to the current
observation• total[_N] returns the value of total in the last observation
drop if HH_SIZE <= 0generate total = sum(table1)generate hh_fraction = table1/total[_N]twoway bar hh_fraction HH_SIZE if HH_SIZE > 0, xscale(log) scheme(s1mono) xlabel(0 1 2 5 10 20 50 100 150) ytitle("Proportion of households")
![Page 9: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/9.jpg)
0.0
2.0
4.0
6.0
8P
ropo
rtion
of h
ouse
hold
s
0 1 2 5 10 20 50 100 150Household size
![Page 10: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/10.jpg)
Households as units of analysis• The previous figures all treated individuals as the units of an
analysis• Every household was represented as many times as it had
members– A household with 100 members would contribute 100 observations
• In effect, the figures represent household size as experienced by individuals
• Sometimes we would like to treat households as units of analysis– So that each household only contributes one observation per
register
![Page 11: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/11.jpg)
Households as units of analysis• One easy way is to create a flag variable that is set to 1 only
for the first observation in each household• Then select based on that flag variable for tabulations etc.• This leaves the original individual level data intact
bysort HOUSEHOLD_ID: generate hh_first_record = _n == 1
histogram HH_SIZE if hh_first_record & HH_SIZE > 0, width(2) scheme(s1mono) fraction ytitle("Proportion of households") xtitle("Number of members")
![Page 12: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/12.jpg)
0.1
.2.3
Pro
porti
on o
f hou
seho
lds
0 50 100 150Number of members
![Page 13: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/13.jpg)
0.0
5.1
.15
Pro
porti
on o
f ind
ivid
uals
0 50 100 150Number of members
0.1
.2.3
Pro
porti
on o
f hou
seho
lds
0 50 100 150Number of members
![Page 14: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/14.jpg)
Another approach to plotting trends
• We can plot average household size by year of birth without ‘destroying’ the data with TABLE, REPLACE or COLLAPSE
bysort YEAR: egen mean_hh_size = mean(HH_SIZE) if HH_SIZE > 0
bysort YEAR: egen first_in_year = _n == 1twoway scatter mean_hh_size YEAR if first_in_year & YEAR >= 1775, scheme(s1mono) ytitle("Mean household size of individuals") xlabel(1775(25)1900)
![Page 15: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/15.jpg)
510
1520
25M
ean
hous
ehol
d si
ze o
f ind
ivid
uals
1775 1800 1825 1850 1875 1900Year
![Page 16: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/16.jpg)
Mean household size of individuals by age
keep if AGE_IN_SUI > 0 & SEX == 2 & YEAR >= 1789 & HH_SIZE > 0
bysort AGE_IN_SUI: egen mean_hh_size = mean(HH_SIZE)
bysort AGE_IN_SUI: generate first_in_age = _n == 1twoway scatter mean_hh_size AGE_IN_SUI if
first_in_age & AGE_IN_SUI <= 80, scheme(s1mono) ytitle("Mean household size of individuals") xlabel(1(5)85) xtitle("Age in sui")
lowess mean_hh_size AGE_IN_SUI if first_in_age & AGE_IN_SUI <= 80, scheme(s1mono) ytitle("Mean household size of individuals") xlabel(1(5)85) xtitle("Age in sui") msize(small)
![Page 17: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/17.jpg)
1015
20M
ean
hous
ehol
d si
ze o
f ind
ivid
uals
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86Age in sui
![Page 18: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/18.jpg)
1015
20M
ean
hous
ehol
d si
ze o
f ind
ivid
uals
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86Age in sui
bandwidth = .8
Lowess smoother
![Page 19: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/19.jpg)
Household divisionIndividuals by next register
. tab HH_DIVIDE_NEXT if PRESENT & NEXT_3 & HH_DIVIDE_NEXT >= 0
Number of | household in | the next | available | register | Freq. Percent Cum.---------------+----------------------------------- 1 | 789,250 94.98 94.98 2 | 33,000 3.97 98.95 3 | 5,815 0.70 99.65 4 | 1,812 0.22 99.87 5 | 383 0.05 99.91 6 | 314 0.04 99.95 7 | 196 0.02 99.98 8 | 34 0.00 99.98 9 | 82 0.01 99.99 10 | 86 0.01 100.00---------------+----------------------------------- Total | 830,972 100.00
![Page 20: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/20.jpg)
Household divisionHouseholds by next register
. bysort HOUSEHOLD_ID: generate first_in_hh = _n == 1
. tab HH_DIVIDE_NEXT if PRESENT & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh
Number of | household in | the next | available | register | Freq. Percent Cum.---------------+----------------------------------- 1 | 117,317 97.80 97.80 2 | 2,287 1.91 99.71 3 | 272 0.23 99.94 4 | 57 0.05 99.98 5 | 8 0.01 99.99 6 | 7 0.01 100.00 7 | 2 0.00 100.00 9 | 1 0.00 100.00 10 | 1 0.00 100.00---------------+----------------------------------- Total | 119,952 100.00
![Page 21: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/21.jpg)
Household divisionExample of a simple analysis
generate byte DIVISION = HH_DIVIDE_NEXT > 1
generate l_HH_SIZE = ln(HH_SIZE)/ln(1.1)
logit DIVISION HH_SIZE YEAR if HH_SIZE > 0 & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh
logit DIVISION l_HH_SIZE YEAR if NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh
![Page 22: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/22.jpg)
. logit DIVISION HH_SIZE YEAR if HH_SIZE > 0 & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh
Iteration 0: log likelihood = -15419.716 Iteration 1: log likelihood = -14310.848 Iteration 2: log likelihood = -14127.244 Iteration 3: log likelihood = -14126.276 Iteration 4: log likelihood = -14126.276
Logistic regression Number of obs = 132688 LR chi2(2) = 2586.88 Prob > chi2 = 0.0000Log likelihood = -14126.276 Pseudo R2 = 0.0839
------------------------------------------------------------------------------ DIVISION | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- HH_SIZE | .0882472 .0016549 53.32 0.000 .0850036 .0914908 YEAR | -.0122989 .0005941 -20.70 0.000 -.0134633 -.0111345 _cons | 18.23519 1.087218 16.77 0.000 16.10428 20.3661
![Page 23: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/23.jpg)
. logit DIVISION l_HH_SIZE YEAR if NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh
Iteration 0: log likelihood = -15419.716 Iteration 1: log likelihood = -13953.268 Iteration 2: log likelihood = -13468.077 Iteration 3: log likelihood = -13463.036 Iteration 4: log likelihood = -13463.032 Iteration 5: log likelihood = -13463.032
Logistic regression Number of obs = 132688 LR chi2(2) = 3913.37 Prob > chi2 = 0.0000Log likelihood = -13463.032 Pseudo R2 = 0.1269
------------------------------------------------------------------------------ DIVISION | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- l_HH_SIZE | .1341566 .0023316 57.54 0.000 .1295867 .1387265 YEAR | -.0130866 .0005775 -22.66 0.000 -.0142185 -.0119547 _cons | 17.75924 1.048066 16.94 0.000 15.70507 19.81342------------------------------------------------------------------------------
![Page 24: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/24.jpg)
Creating household variables• bysort and egen are your friends• Use household_id to group observations of the same
household in the same register• Let’s start with a count of the number of live individuals in
the household
bysort HOUSEHOLD_ID: egen new_hh_size = total(PRESENT)
. corr HH_SIZE new_hh_size if YEAR >= 1789(obs=1410354)
| HH_SIZE new_hh~e-------------+------------------ HH_SIZE | 1.0000 new_hh_size | 1.0000 1.0000
![Page 25: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/25.jpg)
Creating measures of age and sex composition of the household
bysort HOUSEHOLD_ID: egen males_1_15 = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 1 & AGE_IN_SUI <= 15)
bysort HOUSEHOLD_ID: egen males_16_55 = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55)
bysort HOUSEHOLD_ID: egen males_56_up = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 56)
bysort HOUSEHOLD_ID: egen females_1_15 = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 1 & AGE_IN_SUI <= 15)
bysort HOUSEHOLD_ID: egen females_16_55 = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55)
bysort HOUSEHOLD_ID: egen females_56_up = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 56)
generate hh_dependency_ratio = (males_1_15+males56_up+females_1_15+females56_up)/HH_SIZE
bysort AGE_IN_SUI: generate first_in_age = _n == 1bysort AGE_IN_SUI: egen mean_hh_dependency_ratio =
mean(hh_dependency_ratio)twoway line mean_hh_dependency_ratio AGE_IN_SUI if first_in_age &
AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55, scheme(s1mono) ylabel(0(0.1)0.5) xlabel(16(5)55) ytitle("Household dependency ratio (Prop. < 15 or >= 56 sui)") xtitle("Age in sui")
![Page 26: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/26.jpg)
0.1
.2.3
.4.5
Hou
seho
ld d
epen
denc
y ra
tio (P
rop.
< 1
5 or
>=
56 s
ui)
16 21 26 31 36 41 46 51 56Age in sui
![Page 27: CMGPD-LN Methodological Lecture Day 4](https://reader035.vdocuments.us/reader035/viewer/2022062501/56815b34550346895dc9045b/html5/thumbnails/27.jpg)
Numbers of individuals who co-reside with someone who holds a position
. bysort HOUSEHOLD_ID: egen position_in_hh = total(PRESENT & HAS_POSITION > 0)
. tab position_in_hh if PRESENT & YEAR >= 1789
position_in | _hh | Freq. Percent Cum.------------+----------------------------------- 0 | 1,177,575 90.23 90.23 1 | 87,517 6.71 96.94 2 | 24,204 1.85 98.79 3 | 8,019 0.61 99.41 4 | 4,893 0.37 99.78 5 | 1,712 0.13 99.91 6 | 651 0.05 99.96 7 | 241 0.02 99.98 8 | 136 0.01 99.99 9 | 101 0.01 100.00------------+----------------------------------- Total | 1,305,049 100.00
. replace position_in_hh = position_in_hh > 0(49183 real changes made)
. tab position_in_hh if PRESENT & YEAR >= 1789
position_in | _hh | Freq. Percent Cum.------------+----------------------------------- 0 | 1,177,575 90.23 90.23 1 | 127,474 9.77 100.00------------+----------------------------------- Total | 1,305,049 100.00