SJTU CMGPD 2012 Methodological Lecture
Recommended Acknowledgments
Contemporary Applications of Historical Data
Origins of the CMGPD-LN
Key Features
CMGPD-LN
Public release at ICPSR supported by the United States Department of Health and Human Services, National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01 HD057175-01A1), with funds from the American Recovery and Reinvestment Act.
Acknowledging the CMGPD
• Please acknowledge and cite the CMGPD in your publications
• This will allow us to document use of the CMGPD
• Will facilitate future applications for support to release additional databases by providing evidence of demand
• Please also send us copies of any papers that result from use of the CMGPD
Recommended acknowledgement
Please include in ALL publications
This research made use of the CMGPD-LN dataset. Preparation of the CMGPD-LN and documentation for public release via ICPSR DSDR was supported by United States Department of Health and Human Services National Institutes of Health Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) R01 HD057175-01A1 "Multi-Generational Family and Life History Panel Dataset" with funds from the American Recovery and Reinvestment Act.
Recommended citations
Please include in ALL publications
• User guide
  – Lee, James Z., Cameron Campbell, and Shuang Chen. 2010. China Multi-Generational Panel Dataset, Liaoning (CMGPD-LN) 1749-1909. User Guide. Ann Arbor, MI: Inter-university Consortium for Political and Social Research.
• Dataset
  – Lee, James Z., and Cameron D. Campbell. China Multi-Generational Panel Dataset, Liaoning (CMGPD-LN), 1749-1909 [Computer file]. ICPSR27063-v5. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011-06-27. doi:10.3886/ICPSR27063
Contemporary Topics
• Family contextual effects on individual outcomes
• Neighborhood and community context
• Life-course processes
  – Conditions in childhood
  – Long-term effects of socioeconomic status
• Economic, climatic and other shocks
• Multigenerational processes
  – Interactions with stratification and inequality
Limitations of contemporary data
• Time depth
  – Panel/cohort studies are recent
  – Prospective data only for portions of life span
  – Exceptions: British Cohort Studies
• Family context
  – Limited to parents, sometimes siblings
  – Typically co-resident
  – Exceptions: PSID, WLS
Limitations of contemporary data
• Event counts
  – When mortality is low, ‘degree of freedom’ problem in all but the largest datasets
  – Difficult to explore complex interactions
• Exogenous shocks
  – Rare enough that their consequences are studied individually
  – Indonesian Tsunami, Hurricane Katrina etc.
Historical population databases
• Individual life histories
• Prospective
• In some cases…
  – Multigenerational
  – Household and community context
  – Kinship
• Exogenous shocks: Price spikes, climate fluctuations, disease epidemics
• High mortality levels
• Examples: CMGPD-LN, HSN, PRDH, UAS, UPD
History of the CMGPD-LN
• Early 1980s: Ju Deyuan at the First Historical Archives alerted James Lee to the existence of the registers at the Liaoning Provincial Archives (LPA)
• James Lee visits LPA three times 1982-1985
• Lee and Campbell visit LPA 1987
• LPA provides Daoyi registers (dataset 1) that become basis of Fate and Fortune
History of the CMGPD-LN
• Datasets 3 and 2 obtained from LPA in early nineties and coded
• 1990-1999: datasets 4-10
  – Datasets became available from the Genealogical Society of Utah
  – Data entry carried out in the United States
• 1999-2008: datasets 11-29
  – Data entry carried out in China
CMGPD-LN
Organization of the Release
• Basic Dataset (DS-001)
  – Identifiers for data management, basic variables
• Restricted Dataset (DS-002)
  – Names and village locations
• Analytic Dataset (DS-003)
  – Richer set of socioeconomic status variables
• Kinship Dataset (DS-004)
  – Ancestry identifiers, constructed kin counts
• Additional files with
CMGPD-LN
Contents
• Longitudinal
• Individuals and households can be linked from one register to the next
• 1.5 million observations of 260,000 people
• 1,051 paternal descent groups identified through record linkage
• 698 communities
• Generational depth
• 1749-1909
• 7 generations
• (Relatively) Easy to Use
• Resemble longitudinally-linked Censuses
• Discrete-time event history (logistic regression etc.)
CMGPD-LN
Contents
• Demographic outcomes
• Mortality
• Marriage
• Reproduction (based on surviving children)
• Migration
• Timing of events
• Closed, can identify individuals at risk
• Health and Disability
• In early registers, annotation of specific conditions for adult males.
• In later registers, indicator of whether or not disabled for adult males.
CMGPD-LN
Contents
• Socioeconomic characteristics
• Attainment of official position for adult males
• Status as an exam candidate, indicative of high education
• Given name
• Flag variables for types of name
• Diminutive, indicative of low status or aspirations
• Non-Han, indicative of expressed ethnicity
• Pinyin transcriptions in restricted release
CMGPD-LN
Contents
• Geographic context
• Villages distributed across a region the size of New Jersey
• Wide variety of economic and ecological contexts
• Basic release
• Region
• Unique village identifier
• Restricted release
• Geocodes for villages accounting for 95% of population
CMGPD-LN
Contents
• Household and family context
• Household of residence
• Relationship to head
• Relatives can be linked to reconstruct descent groups
• Via automated record linkage based on household relationship and longitudinal linkage of individual records
• Kin outside the household
• Basic kinship variables, including parent identifiers, and counts of close kin, available now
• Additional constructed kinship variables available next year
REGION (approximate)
DISTRICT (approximate)
[Figure: Observations (0 to 50,000) by Year (1750 to 1910)]
CMGPD-LN
Format
• Similar in format to a series of triennial Censuses
  – Individuals listed in the same order and easy to link across time
• Organized by community, kin group, household
• Detailed specification of relationship to household head
• Events since the previous register are annotated
  – Basis for construction of flag variables specifying occurrence of events between current register and the next
• Discrete-time event history analysis
  – Typically, logistic regression or complementary log-log regression
  – Outcome: death in the next three years
• Restricting to registers for which the immediately succeeding register is also available
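The discrete-time approach on this slide can be sketched in Stata roughly as follows. This is only an illustration on simulated data, not the authors' code: the variable names `age`, `male`, `next_available` and `died_next3` are hypothetical stand-ins rather than CMGPD-LN variables, and the simulated mortality schedule is arbitrary.

```stata
* Simulated person-register observations; all names and values are hypothetical
clear
set seed 12345
set obs 5000
generate age = 1 + floor(70 * runiform())
generate male = runiform() < 0.5

* 1 if the immediately succeeding register is also available
generate next_available = runiform() < 0.9

* Outcome: death before the next register, defined only where it can be observed
generate died_next3 = runiform() < invlogit(-3 + 0.03 * age) if next_available == 1

* Discrete-time event history models on the restricted sample
logit died_next3 c.age i.male if next_available == 1
cloglog died_next3 c.age i.male if next_available == 1
```

With the real data, the simulated variables would be replaced by the corresponding flag and covariate variables documented in the User Guide.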
CMGPD
Processing
• Images scanned from microfilm
• Provided to coders in China
• Coders in China transcribe contents to Excel spreadsheets
  – Copy previous spreadsheet over and update based on contents of new register
• STATA programs import the contents of the spreadsheets and perform error-checking
  – Inconsistencies across registers
• Reports sent to coders for cleaning
  – Original registers coded ‘as is’, so if an inconsistency is in the original register we leave it
• STATA programs carry out automated linking of kin and generation of variables for analysis
Pre-1789 Format
Feidi Yimiancheng, 1783
Post-1789 Format
Feidi Yimiancheng, 1792
Daoyi 1816
Annotations on register image: Illegal Escape, 23 sui, 74 sui, Dead, 42 sui
Daoyi 1819
Dead
New arrival
[Figure: Proportion of children for whom data on specified ancestor are available (0 to 1), by Year (1750 to 1900). Series: Father, Grandfather, Great-grandfather, Great-great-grandfather, Great-great-great-grandfather, Great-great-great-great-grandfather]
Using the Data
RECORD_NUMBER
• RECORD_NUMBER identifies the same observation across the different datasets
• Use as the basis for one-to-one merge
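* Folder holding the ICPSR release (adjust to your own location)
local cmgpd_ln_location "..\CMGPD-LN from ICPSR\ICPSR_27063"
* Load the basic dataset, then add the analytic dataset one-to-one
use "`cmgpd_ln_location'\DS0001\27063-0001-Data"
merge 1:1 RECORD_NUMBER using "`cmgpd_ln_location'\DS0003\27063-0003-Data"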
Using the Data
RECORD_NUMBER
• If the merged datasets won’t fit into memory, make use of options on use and merge to load specific variables
* Load only the needed variables; the merge key RECORD_NUMBER must be among them
use RECORD_NUMBER YEAR SEX using "`cmgpd_ln_location'\DS0001\27063-0001-Data"
merge 1:1 RECORD_NUMBER using "`cmgpd_ln_location'\DS0003\27063-0003-Data", keepusing(NON_HAN_NAME)
tab YEAR if SEX == 2, sum(NON_HAN_NAME)
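The selective load-and-merge pattern above can be tried without the ICPSR files. The sketch below builds two tiny stand-in datasets in temporary files; the values are invented for illustration, and the final `assert` is an added check (not from the slides) that every record matched.

```stata
* Toy stand-in for the basic dataset (values are invented)
clear
input RECORD_NUMBER YEAR SEX
1 1789 2
2 1789 1
3 1792 2
end
tempfile basic
save "`basic'"

* Toy stand-in for the analytic dataset (values are invented)
clear
input RECORD_NUMBER NON_HAN_NAME
1 0
2 1
3 0
end
tempfile analytic
save "`analytic'"

* Load selected variables, then bring in one variable from the other file
use RECORD_NUMBER YEAR SEX using "`basic'", clear
merge 1:1 RECORD_NUMBER using "`analytic'", keepusing(NON_HAN_NAME)

* Stop if any record failed to match, then drop the bookkeeping variable
assert _merge == 3
drop _merge
```

Dropping `_merge` after checking it keeps a later `merge` from failing because the variable already exists.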
Using the Data
Missing Values
• Following standard practice, missing values are coded as -98 or -99
  – -98 is structural missing
  – -99 is missing
• These are not the same as STATA missing, so observations will not be excluded automatically
• Especially in regressions, computations of means, etc., either manually exclude these, or recode to force exclusion
  – recode ZHI_SHI_REN -99 -98=. or
  – summ ZHI_SHI_REN if ZHI_SHI_REN != -98 & ZHI_SHI_REN != -99
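As an alternative to the `recode` shown above, one possible sketch uses Stata's `mvdecode` to turn the two codes into extended missing values, so they are excluded automatically while the distinction between them is kept. The toy values below are invented, and the mapping of -98 to `.a` and -99 to `.b` is an arbitrary choice for illustration, not a CMGPD-LN convention.

```stata
* Toy data using the ZHI_SHI_REN name from the slide (values are invented)
clear
input ZHI_SHI_REN
0
1
-98
-99
0
end

* Convert -98 to .a (structural missing) and -99 to .b (missing)
mvdecode ZHI_SHI_REN, mv(-98=.a \ -99=.b)

* Extended missing values are now excluded from summaries and regressions
summarize ZHI_SHI_REN
tabulate ZHI_SHI_REN, missing
```

`tabulate` with the `missing` option still shows how many observations fall into each kind of missing.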