IPUMS to IHSN: Leveraging structured metadata
for discovering multi-national census and survey data
Wendy L. Thomas
4th Conference of the European Survey Research Association (ESRA)
Session 2.10 The role and benefit of structured metadata in survey research II
W.L. Thomas2
Outline
The Project:
- Creation of DDI compliant metadata from the contents of a proprietary database
- Project for the World Bank supported by funding from PARIS21
The Source:
- IPUMS International database of metadata
The Target:
- IHSN Microdata Toolkit [DDI Codebook based]
The Process:
- Mapping a proprietary database to DDI Codebook standard
The Results:
- Identification of improvements for IPUMS
- Limitations of IHSN Microdata Toolkit not attributable to DDI
- Limitations of distribution metadata for supporting on-the-ground processes
2011-07-20
Project
- The project was to extract metadata from the IPUMS database system and to create IHSN Microdata Toolkit compatible metadata for the original data files and for the IPUMS extracts of data for developing countries.
- IPUMS contains metadata and data for 106 censuses from developing countries worldwide
Goals:
- Facilitate access to existing full data sets within the IPUMS member country
- Populate a number of microdata catalogs based on NADA
Integrated Public Use Microdata Series, International
Census microdata for social and economic research (62 countries, 185 censuses, 397 million person records)
IPUMS-International is a project dedicated to collecting and distributing census microdata from around the world. Its purpose is to:
- Collect and preserve data and documentation
- Harmonize data
- Disseminate the data to registered researchers at no cost
https://international.ipums.org/international/
IPUMS Metadata
- Extensive, detailed metadata held as original documents, structured text files, and structured database
- Basic division is between metadata to:
  - Inform the user
  - Drive the harmonization and/or search and extraction engine
- Retains the original metadata for all included original variables:
  - Description
  - Source Question
  - Enumerator Instructions
- Adds information on harmonization process and variables
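As an illustration of how the retained per-variable metadata could be carried into DDI Codebook, here is a minimal Python sketch. The `VariableMetadata` record, its field names, and the sample values are hypothetical and are not taken from the IPUMS database. The element names (`var`, `labl`, `qstn`, `qstnLit`, `ivuInstr`, `txt`) are DDI Codebook 2.x names, but the output is not validated against the schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class VariableMetadata:
    """Hypothetical record for the fields listed on the slide."""
    name: str
    label: str
    description: str
    source_question: str
    enumerator_instructions: str


def to_ddi_var(meta: VariableMetadata, var_id: str) -> ET.Element:
    """Build one DDI Codebook <var> element from a variable record."""
    var = ET.Element("var", {"ID": var_id, "name": meta.name})
    ET.SubElement(var, "labl").text = meta.label
    # Question text and enumerator instructions both sit under <qstn>.
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = meta.source_question
    ET.SubElement(qstn, "ivuInstr").text = meta.enumerator_instructions
    # The free-text description goes into <txt>.
    ET.SubElement(var, "txt").text = meta.description
    return var


# Made-up example values, for illustration only.
example = VariableMetadata(
    name="AGE",
    label="Age",
    description="Age in completed years.",
    source_question="How old was this person at their last birthday?",
    enumerator_instructions="Record age in completed years.",
)
xml_text = ET.tostring(to_ddi_var(example, "V1"), encoding="unicode")
print(xml_text)
```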
IHSN Microdata Toolkit
- Developed by the World Bank Data Group for the IHSN to promote the adoption of international standards and best practices for microdata documentation, dissemination and preservation.
- Consists of 3 modules
  - Microdata Editor: Documents using DDI-Codebook and Dublin Core standards
  - Explorer: Views metadata and exports to common formats
  - CD-ROM Builder: Creates user-friendly CD-ROM and web products
http://ihsn.org/home/index.php?q=tools/toolkit
Process
- The project focused on metadata within the IPUMS database system which could be mapped out to generate an IHSN Microdata Toolkit compatible set of metadata
- The desired output consisted of
  - A set of Dublin Core descriptions of related technical documents and copies of those documents
  - A set of metadata for the original country data
  - A set of metadata and data for the content of the public extract system (harmonized and country specific variables)
- All outputs would be delivered to the country NSO for their use
- Metadata content and document copies would be provided to IHSN for inclusion in their catalog
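A Dublin Core description of one related technical document might be assembled roughly as follows. This is a sketch under stated assumptions: the `record` wrapper element, the `dublin_core_record` helper, and the sample field values are hypothetical. Only the namespace URI and the element names (`title`, `type`, `language`, `format`) come from the Dublin Core Metadata Element Set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> ET.Element:
    """Wrap simple Dublin Core elements in a hypothetical <record> element."""
    record = ET.Element("record")  # wrapper name is an assumption
    for term, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{term}").text = value
    return record


# Made-up document description, for illustration only.
doc = dublin_core_record({
    "title": "Enumerator manual (hypothetical example)",
    "type": "Text",
    "language": "en",
    "format": "application/pdf",
})
dc_xml = ET.tostring(doc, encoding="unicode")
print(dc_xml)
```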
IPUMS Process: Creating the Database
STEP 1: Incoming data and metadata
- Prepare data for entry into IPUMS harmonization process and database
- Data arrives in non-specific formats and reflects the original structure of the data at the NSO or a standard extract format
- Metadata arrives in non-specific format with inconsistent content coverage (may or may not have English language equivalents)
STEP 2: Intermediary data and metadata
- Clean and restrict for confidentiality
- Harmonize
- Establish metadata links
- Data is selected and reformatted into household and person records; IPUMS-generated variables are added for processing
- Basic metadata is entered into a spreadsheet, including: original name, description, IPUMS name, translations, frequency (generated), and all extant content types (including errors)
- Creation of database
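The "frequency (generated)" column in the Step 2 spreadsheet can be pictured with a short sketch. Everything here is illustrative: the CSV column names, the original variable name `P03`, and the sample values are assumptions, not the actual IPUMS layout.

```python
import csv
import io
from collections import Counter

# Hypothetical person-record extract; column names are illustrative only.
PERSON_CSV = """serial,pernum,sex
1,1,1
1,2,2
2,1,2
"""


def variable_frequencies(rows, variable):
    """Count how often each code of one variable occurs in the records."""
    return Counter(row[variable] for row in rows)


rows = list(csv.DictReader(io.StringIO(PERSON_CSV)))
freq = variable_frequencies(rows, "sex")

# One spreadsheet row per variable, with the frequency generated from data.
spreadsheet_row = {
    "original_name": "P03",  # hypothetical original variable name
    "ipums_name": "SEX",
    "description": "Sex",
    "frequency": dict(sorted(freq.items())),
}
print(spreadsheet_row)
```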
STEP 3: IPUMS
- Clean, confidential, harmonized data
- Metadata includes links to original data, questions, instructions, and transformation processes
- Original data structures have not been retained in the database, which includes only the cleaned, restricted content and harmonized variables created by IPUMS
Process Constraints
- Working from the IPUMS database we were only able to construct microdata from STEP 2 and STEP 3
  - Linking back to original metadata and data structure would require individualized processing (entry and linking)
- Had we been able to construct metadata for STEP 1 it would not have been IHSN Microdata Toolkit compatible, just DDI-Codebook compatible
  - Linking information held as file names
  - Alphabetic codes
  - Person records only (households required construction)
  - Multi-record relational database structures for data
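For sources that hold person records only, household records have to be constructed. One simple way to do that is to group persons by a household identifier, as sketched below. The field names (`household_id`, `person_number`) and the derived household fields are hypothetical; the source does not describe how the construction was actually done.

```python
from collections import defaultdict


def build_households(person_records):
    """Derive one household record per household id from person records."""
    members = defaultdict(list)
    for person in person_records:
        members[person["household_id"]].append(person)

    households = []
    for household_id, people in sorted(members.items()):
        households.append({
            "household_id": household_id,
            "n_persons": len(people),
            "person_numbers": [p["person_number"] for p in people],
        })
    return households


# Made-up person records, for illustration only.
persons = [
    {"household_id": "H002", "person_number": 1},
    {"household_id": "H001", "person_number": 1},
    {"household_id": "H001", "person_number": 2},
]
households = build_households(persons)
print(households)
```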
Results
- Modified products delivered
  - Within 24 hours of submission, all metadata files were loaded into the IHSN NADA based catalog and were searchable down to the variable level in that system
- The MPC is now capable of creating DDI/IHSN/NESSTAR compatible codebooks for all extracts on all microdata systems
- Countries can create full extracts of their census data to match the intermediary structure (household/person record) found in the metadata, import it to the Microdata Toolkit and explore or create desired output easily
- Countries can easily compare or create products from the harmonized data from their country or download additional data from the IPUMS International system for cross-national comparison
Insights
- The IPUMS database captured all required metadata elements either directly or in related structured files
  - More information should be moved to the database
  - Informational items that could be structured (e.g., dates) should have both unstructured and structured content
  - Database should be expanded to capture input structures and retain the full process metadata from Step 1 to Step 3 (for curation as well as process purposes)
  - Simplifies the shift to DDI Lifecycle output by simple addition of comparative information
- The IHSN Microdata Toolkit is limited in its ability to support legacy data and legacy data systems on the ground
  - Most limitations are due to software rather than standards constraints
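Holding both unstructured and structured content for an item such as a date could look like the sketch below. The `DateField` class and `make_date_field` helper are hypothetical names, the sample dates are made up, and ISO 8601 is assumed as the structured form.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DateField:
    text: str                  # wording as found in the source document
    iso: Optional[str] = None  # ISO 8601 value, when one can be determined


def make_date_field(text, year=None, month=None, day=None):
    """Keep the original wording and add as precise an ISO value as known."""
    if year is None:
        return DateField(text)
    if month is None:
        return DateField(text, f"{year:04d}")
    if day is None:
        return DateField(text, f"{year:04d}-{month:02d}")
    return DateField(text, date(year, month, day).isoformat())


# Made-up examples, for illustration only.
census_day = make_date_field("Census night, 5 March 2001", 2001, 3, 5)
vague = make_date_field("Early 1990s")
print(census_day, vague)
```

The free text stays searchable and faithful to the source, while the structured value, where present, supports sorting and range queries.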
Distribution and Process Metadata
- IPUMS, like most metadata databases, contains the majority of required objects for supporting distribution metadata in a DDI format (codebook or lifecycle)
- The process of mapping to a detailed standard raised specific issues and highlighted possible improvements to the database content and structure
- Brought to light the need to capture more metadata at an earlier point in the process.
  - Metadata produced outside the database structure was lost
  - Object level version control was missing so intermediary content was lost
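To make the point about object-level version control concrete, here is a minimal sketch of an append-only store that keeps every saved state of a metadata object. The `VersionedStore` class and the `var:AGE` identifier are hypothetical; this is not a description of any IPUMS component.

```python
import copy


class VersionedStore:
    """Append-only store: saving never overwrites an earlier version."""

    def __init__(self):
        self._history = {}

    def save(self, object_id, content):
        """Append a new version and return its 1-based version number."""
        versions = self._history.setdefault(object_id, [])
        versions.append(copy.deepcopy(content))
        return len(versions)

    def get(self, object_id, version=None):
        """Return the latest version, or a specific earlier one."""
        versions = self._history[object_id]
        chosen = versions[-1] if version is None else versions[version - 1]
        return copy.deepcopy(chosen)


store = VersionedStore()
v1 = store.save("var:AGE", {"label": "Edad"})  # intermediary content
v2 = store.save("var:AGE", {"label": "Age"})   # later, translated content
print(store.get("var:AGE", 1), store.get("var:AGE"))
```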
Conclusions
- The ability to create distribution metadata in a standard format increases utility for users, provides new product options, and provides an easy means of populating external catalogs
- The process of mapping to a well formed standard provides an opportunity for content and process improvement
- Capture of incoming metadata is critical to shifting from a distribution metadata structure to a process metadata structure
- Process structures can not only capture the development of the data within the system but can also be used to drive the process and the data and metadata checks
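One way metadata can drive data checks is to treat the documented codes of each variable as the rule set and test the records against it. The sketch below assumes a hypothetical `codebook` mapping of variable names to allowed codes; the variable names and records are made up.

```python
def check_records(records, codebook):
    """Report values that are missing or not documented in the metadata."""
    problems = []
    for index, record in enumerate(records):
        for variable, allowed in codebook.items():
            if variable not in record:
                problems.append((index, variable, "missing"))
            elif record[variable] not in allowed:
                problems.append(
                    (index, variable, f"undocumented code {record[variable]!r}")
                )
    return problems


# Hypothetical metadata: variable name -> set of documented codes.
codebook = {"SEX": {"1", "2"}, "URBAN": {"1", "2", "9"}}
records = [
    {"SEX": "1", "URBAN": "2"},
    {"SEX": "3", "URBAN": "1"},  # code 3 is not documented
    {"SEX": "2"},                # URBAN is absent
]
problems = check_records(records, codebook)
print(problems)
```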
Contacts
Wendy Thomas – Project manager wlt@umn.edu
Benjamin Youngdahl – Programmer beny@umn.edu