Social Domain Record Linkage Environment
Presentation at the 2014 International Health Data Linkage Conference
Health Statistics Division
April 2014
Record Linkage at Statistics Canada
Linkages must satisfy a prescribed review process
• New linkages approved by Executive Management Board chaired by Chief Statistician of Canada
All projects using linked data are publicly announced on StatCan website
• Including tabulations using linked datasets
Approved researchers access only the records needed for their project with no direct identifiers included
14/05/2014Statistics Canada • Statistique Canada2
Creating a Record Linkage Environment
Longitudinal Health and Administrative Data (LHAD) Initiative has proven that a record linkage environment is possible
• System for linking health data using provincial health registries and storing linked keys in a depository
• Does not produce a fully integrated analytical database
• Reduced cost and time for creating linked health analysis files
Social Domain Record Linkage Environment (SDRLE) Proof of Concept Project
• Building on the success of LHAD
• Using Statistics Canada administrative data
• Increase the relevance of surveys by linking socio-demographic indicators from multiple sources
SDRL Environment
[Diagram labels: Derived Record Depository, Key Depository; Hospital Discharge Data, Survey Data, Vital Statistics, Census, Immigration Data, Tax Data, Canadian Cancer Registry]
• Only identifiers of datasets are brought into the environment
• DRD is built through successive record linkages
• Identifiers of the datasets are linked to the DRD
• The results are stored in a depository of linked keys
SDRLE – Derived Record Depository (DRD)
Prototype DRD was built using multiple files linked together to identify unique individuals
Source files include Census 2006, T1 Tax file (1980-2011), Canadian Births Database (1985-2008), Canadian Mortality Database (1992-2009), Landed Immigrant File (1980-2011), Indian Registry
Deterministic Record Linkage Method
• Deterministic record linkage was initially used to restrict the creation of the DRD to the highest quality matches
• The method later evolved to include near exact matches
Only persons identified in at least two datasets through record linkage were included in the DRD
Personal identifiers are stored in the DRD and a unique anonymous record identifier is assigned to these records
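The construction rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used for the prototype: the match key (surname, given name, date of birth, sex), the field names, and the function names `match_key` and `build_drd` are all hypothetical, and only exact matches after simple normalisation are handled (no near exact matching).

```python
from collections import defaultdict
from itertools import count


def match_key(rec):
    """Exact-match key built from normalised identifiers (an assumed rule)."""
    return (
        rec["surname"].strip().upper(),
        rec["given"].strip().upper(),
        rec["dob"],
        rec["sex"].strip().upper(),
    )


def build_drd(sources):
    """sources maps a dataset name to a list of identifier records."""
    seen_in = defaultdict(set)  # match key -> names of datasets containing it
    for name, records in sources.items():
        for rec in records:
            seen_in[match_key(rec)].add(name)

    next_id = count(1)  # anonymous identifier carries no personal information
    drd = {}
    for key in sorted(seen_in):
        if len(seen_in[key]) >= 2:  # keep persons found in at least two datasets
            drd[key] = {"record_id": next(next_id), "sources": sorted(seen_in[key])}
    return drd


# Illustrative records only; not real data.
example_sources = {
    "census": [
        {"surname": "Doe", "given": "John", "dob": "19501012", "sex": "M"},
        {"surname": "Simpson", "given": "Homer", "dob": "20060101", "sex": "M"},
    ],
    "tax": [
        {"surname": " DOE", "given": "john", "dob": "19501012", "sex": "m"},
    ],
}
example_drd = build_drd(example_sources)
```

In this sketch the person found in both illustrative files receives a record identifier, while the person found in only one file is left out of the depository.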
SDRL ENVIRONMENT
Derived Record Depository

Core
SDRLE number
123456789
182354987
129998889

Name
SDRLE number  Surname  Given 1  Given 2  Start date  End date
123456789     Doe      John     Liam     2006        2010
182354987     Doe      Jane     Lena     2006        2009
182354987     Johnson  Jane     Lena     2009        2012
129998889     Simpson  Homer    J        2006        2006

Address
SDRLE number  Address       City      PostalCode  Start date  End date
123456789     150 Tunney’s  Ottawa    K1A0T6      2006        2009
123456789     Disney World  Orlando   12345       2009        2010
182354987     151 Tunney’s  Ottawa    K1A0T5      2006        2012
129998889     Du Parc       Montreal  H3G1B1      2006        2006

Date of birth
SDRLE number  Date of birth  Start date  End date
123456789     19501012       2006        2010
182354987     19600506       2006        2009
182354987     19600605       2009        2012
129998889     2006           2006        2006

Sex
SDRLE number  Sex  Start date  End date
123456789     M    2006        2010
182354987     F    2006        2012
129998889     M    2006        2006

Date of death
SDRLE number  Date of death  Start date  End date
123456789     20100101       2010        2010
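Because each identifier value carries a start date and an end date, the values in effect for a person in a given year can be looked up directly. The sketch below uses the illustrative Name rows from this slide. It assumes that both dates are inclusive years, which the slide does not state; the names `NameSpell` and `names_in_year` are hypothetical.

```python
from typing import NamedTuple


class NameSpell(NamedTuple):
    sdrle_number: str
    surname: str
    given_1: str
    given_2: str
    start: int
    end: int


# Illustrative rows from the Name table on this slide.
NAMES = [
    NameSpell("123456789", "Doe", "John", "Liam", 2006, 2010),
    NameSpell("182354987", "Doe", "Jane", "Lena", 2006, 2009),
    NameSpell("182354987", "Johnson", "Jane", "Lena", 2009, 2012),
    NameSpell("129998889", "Simpson", "Homer", "J", 2006, 2006),
]


def names_in_year(rows, sdrle_number, year):
    """Return the name rows valid for one person in a given year.

    Assumes start and end are inclusive years.
    """
    return [
        r for r in rows
        if r.sdrle_number == sdrle_number and r.start <= year <= r.end
    ]
```

Under the inclusive-year assumption, a year in which a value changes (2009 for person 182354987) returns both the old and the new name, which gives a linkage more than one candidate spelling to match against.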
SDRL ENVIRONMENT
Key Depository
SDRLE Number  DAD ID Number  Tax ID Number  Birth ID Number  Death ID Number  Cancer ID Number  Census ID Number  Immigration Number
123456789     -              490212461      -                1756243763       -                 129309482         1278882762
182354987     4455600        678097512      -                -                123765190A        776545411         -
129998889     1547342        -              2938365789       -                -                 -                 -
Creating linked datasets
[Diagram labels: Key Depository, Cohort dataset, Outcomes dataset, Linked dataset]
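One way to read this slide is that the Key Depository acts as a crosswalk between the source keys of a cohort file and those of an outcomes file. The sketch below is an assumption about that mechanism, not the production process. The DAD and Tax keys are copied from the illustrative Key Depository table, while the analysis variables (`income_band`, `stay_days`) and the function name `link` are hypothetical.

```python
# Subset of the illustrative Key Depository table (None marks "-").
KEY_DEPOSITORY = [
    {"sdrle": "123456789", "dad_id": None, "tax_id": "490212461"},
    {"sdrle": "182354987", "dad_id": "4455600", "tax_id": "678097512"},
    {"sdrle": "129998889", "dad_id": "1547342", "tax_id": None},
]


def link(cohort, cohort_key, outcomes, outcome_key, depository):
    """Join cohort rows to outcome rows through the key depository."""
    # Crosswalk: cohort source key -> outcome source key, where both exist.
    xwalk = {
        row[cohort_key]: row[outcome_key]
        for row in depository
        if row[cohort_key] is not None and row[outcome_key] is not None
    }
    outcomes_by_id = {}
    for row in outcomes:
        outcomes_by_id.setdefault(row[outcome_key], []).append(row)

    linked = []
    for row in cohort:
        target = xwalk.get(row[cohort_key])
        for out in outcomes_by_id.get(target, []):
            linked.append({**row, **out})
    return linked


# Hypothetical analysis variables, for illustration only.
cohort = [
    {"tax_id": "490212461", "income_band": "A"},
    {"tax_id": "678097512", "income_band": "B"},
]
outcomes = [{"dad_id": "4455600", "stay_days": 3}]
linked_dataset = link(cohort, "tax_id", outcomes, "dad_id", KEY_DEPOSITORY)
```

Only the cohort record whose person also has a DAD key in the depository appears in the linked dataset; the personal identifiers held in the DRD are never touched by the join.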
SDRLE results to date
DRD unique person records lower but near the demographic count for Canada
• As a test, compared to the 2011 population count
• Analysis of this DRD indicates that it includes fewer than 200,000 duplicates
• Variations of coverage in sub-populations
• Could be attributable to limitations of the datasets used as well as to the record linkage methodology used for the proof of concept exercise
External linkage between the DRD and the National Longitudinal Survey of Children and Youth, criminal court data (ICCS), hospital discharge data (DAD), tax, and education program records (PSIS and RAIS).
Next steps
Improve methods by:
• Simplifying process for updates to the Derived Record Depository
• Reviewing and optimizing record linkage methods and processes
• Incorporating the use of G-Link for probabilistic matching where appropriate
Additional files to be included in the model
• Files already at StatCan: Canadian Child Tax Benefit, T4 file
• Other files yet to be identified
• Environment is open to addition of new files in the future
QUESTIONS/COMMENTS?
Richard Trudeau
Health Record Linkage Section
Health Statistics [email protected]
Craig Grimes
Health Record Linkage Section
Health Statistics [email protected]
Bob Kingsley
Health Statistics [email protected]
SDRL ENVIRONMENT
[Diagram labels: Derived Depository, Key Registry; Hospital Discharge Data, Survey Data, Vital Statistics, Census, Immigration Data, Tax Data, Canadian Cancer Registry]
• Some datasets are used to derive the Depository
• Some datasets are linked to the Derived Depository for analytical purposes only
Deriving Record Depository and Key Depository
[Flowchart labels: Source (Tax, Census, Births, etc.); Filter; External Record Linkage; Record Depository; Add? Update?; Source updates; Update source; Process metadata; Unlinked source records; Linked source records; Source metadata; Key Depository (linked status)]
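The flowchart labels suggest a loop in which each filtered source record is linked against the Record Depository, may add a new depository record, and has its linked status written to the Key Depository. The sketch below is one possible reading of that flow and rests on several assumptions: the `can_add` flag (standing in for the distinction between datasets used to derive the depository and datasets linked for analysis only), the `match_key` field, and the function name `process_source` are all hypothetical, and the "Update?" branch and metadata outputs are not modelled.

```python
def process_source(source_name, records, depository, key_depository, can_add):
    """Link one source file against the depository (illustrative only).

    records:        list of dicts with 'source_id' and 'match_key'
                    (match_key is None when the filter rejects the record)
    depository:     dict mapping match_key -> depository number
    key_depository: dict mapping (source_name, source_id) -> depository number
    can_add:        whether unmatched records may create new depository records
    """
    linked, unlinked = [], []
    next_number = max(depository.values(), default=0) + 1
    for rec in records:
        key = rec["match_key"]
        number = depository.get(key) if key is not None else None
        if number is None and key is not None and can_add:
            number = next_number  # "Add?": create a new depository record
            next_number += 1
            depository[key] = number
        if number is None:
            unlinked.append(rec["source_id"])
        else:
            key_depository[(source_name, rec["source_id"])] = number
            linked.append(rec["source_id"])
    return linked, unlinked
```

A file processed with `can_add=False` can only attach its keys to persons already in the depository, so its unmatched records end up in the unlinked output.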
Creation of a linked dataset
Analysis file: Linked NLSCY Cohort with
• 93% linked to Tax
• 24% linked to PSIS
• 14% linked to DAD
• 6% linked to RAIS
• 5% linked to ICCS
Cohort file NLSCY: 91% linked
[Diagram labels: Derived Record Depository, Key Registry; ICCS (69%), PSIS (90%), Tax (98%), RAIS (95%), DAD (55%)]