Input Data Warehousing: Canada’s Experience with Establishment Level Information
Presentation to the Third International Conference on Establishment Statistics
Montreal, QC, June 20, 2007
Overview
  Introduction of data warehousing as a concept
  Approaches to holding data
  Introduction to Statistics Canada’s Unified Enterprise Survey (UES) Program
  Centralized warehousing of UES data
  Example of the data warehouse at work
Subject-matter areas need or generate different types of information
  Data to support collection
    Questionnaires and supporting metadata
    Frame and sample information
    Status of each respondent during collection
    Survey data
    Administrative data
  Post-collection processing
    Edits (metadata)
    Imputation specifications
    Allocation specifications
    Generation of “clean” datasets
  Tabulation of estimates/analysis of results
    Value of estimate
    Data quality indicators
    Suppression patterns
    Analysis of coherence
Input Data
Input Data Warehouse
  A copy of statistical input data specifically structured for querying and reporting
    Collection
    Post-collection processing
    Tabulation of estimates
Approaches to organizing information holdings
  Decentralized
    In a completely decentralized approach, each subject matter area maintains its own input data
  Centralized
    A centralized data warehouse contains all input data from all subject matter program areas
    All program areas need to use common concepts and classification standards; otherwise a concordance must be established among their systems
  These are extremes along a continuum
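The concordance mentioned for the centralized approach can be pictured as a lookup from each program's own classification codes to a common one. The following is only a minimal sketch: the code values, the field names (`local_code`, `common_code`) and the functions are hypothetical and are not taken from the UES systems.

```python
# Hypothetical concordance from program-specific codes to a common classification.
# All codes below are invented for illustration only.
CONCORDANCE = {
    "RET-01": "C-100",
    "RET-02": "C-100",  # two local codes may collapse into one common code
    "MFG-10": "C-200",
}


def to_common_code(local_code, concordance=CONCORDANCE):
    """Return the common code for a local code, or raise if no mapping exists."""
    try:
        return concordance[local_code]
    except KeyError:
        raise KeyError(f"no concordance entry for {local_code!r}") from None


def harmonize(records, concordance=CONCORDANCE):
    """Return copies of the records with a 'common_code' field added."""
    return [
        dict(record, common_code=to_common_code(record["local_code"], concordance))
        for record in records
    ]


if __name__ == "__main__":
    sample = [
        {"id": 1, "local_code": "RET-01"},
        {"id": 2, "local_code": "MFG-10"},
    ]
    for row in harmonize(sample):
        print(row)
```

Raising on an unmapped code, rather than silently passing it through, is a design choice in this sketch: gaps in the concordance surface before data from different programs are combined.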
Centralized approach
  Advantages
    Economies of scale should lead to reduced overall development and maintenance costs
    Some human resource issues are eased (knowledge and skills retention and transfer)
    Eases integration of data to support data analysis, coherence analysis, etc.
    Allows subject-matter divisions to specialize in data analysis rather than data management
Decentralized approach
  Advantages
    Specialized subject matter expertise is readily available
    Subject matter areas do not depend on a central authority to make changes, so flexibility is increased
    Care and control of the data is clearly established
Questions to address in moving to a more centralized environment
  What purpose does it serve?
  What must be done to the statistical model to ensure compatibility with other data sources?
  What mechanisms need to be in place to ensure a productive client-service relationship?
  Who is custodian of the data?
  Do the benefits of moving to a more centralized environment truly outweigh the costs?
Statistics Canada and the Unified Enterprise Survey Program
  In the late 1990s, Statistics Canada undertook a major program to improve the quality of the provincial economic accounts released by the Agency and of the annual business surveys that feed into those accounts
  These surveys were integrated in order to increase the quality of the data they produce in terms of
    Consistency
    Coherence
    Breadth
    Depth
Features of the UES
  Improved frame (business register)
  Sampling made consistent across surveys, with improved coverage
  Harmonized content and common collection applications
  Administrative data are to be used instead of survey data if possible and if the data are of good quality
  Common post-collection processing systems
  Common storage of data
  Central contact management system
  Improvements in outputs
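The rule in the list above, that administrative data replace survey data where possible and where their quality is good, can be sketched as a simple selection function. This is an illustration under stated assumptions only: the function name, the `admin_quality_ok` flag and the fallback to `None` (leaving the unit for later imputation) are hypothetical, and the slides do not say how quality is actually assessed.

```python
from typing import Optional, Tuple


def choose_value(
    survey_value: Optional[float],
    admin_value: Optional[float],
    admin_quality_ok: bool,
) -> Tuple[Optional[float], str]:
    """Pick the value to carry forward and report which source it came from.

    Assumed rule: prefer administrative data when present and of good quality,
    otherwise fall back to the survey response, otherwise leave the value
    missing so that post-collection processing can handle it.
    """
    if admin_value is not None and admin_quality_ok:
        return admin_value, "admin"
    if survey_value is not None:
        return survey_value, "survey"
    return None, "missing"


if __name__ == "__main__":
    print(choose_value(120.0, 118.5, True))   # administrative value is used
    print(choose_value(120.0, 118.5, False))  # falls back to the survey value
    print(choose_value(None, None, True))     # nothing available
```

Returning the source label alongside the value keeps a record of where each data point came from, which is the kind of metadata a warehouse would need in order to report on the mix of survey and administrative data.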
Moving to a more centralized environment
  What is the purpose?
    The UES data warehouse forms a repository of all the files created through the processing phases of UES and accompanying metadata.
    This supports the work of analysts and survey managers in subject matter divisions, collection managers, statistical methodologists and users in the System of National Accounts
Moving to a more centralized environment
  What must be done to the statistical model to ensure compatibility with other data sources?
    The statistical model for UES surveys forced the harmonization of concepts, definitions and classifications across surveys
    Integration of survey and administrative data required the mapping of tax data to survey data (harmonized conceptually as well as characteristically)
Moving to a more centralized environment
  What mechanisms need to be in place to ensure a productive client-service relationship?
    Project management structure for the UES that crosses functional boundaries
    Change management function to ensure seamless integration of surveys into UES
Moving to a more centralized environment
  Who is custodian of the data?
    Enterprise Statistics Division (ESD) controls access to all common systems.
    Subject matter divisions are exclusively responsible for dissemination, including the determination of aggregations and data suppressions (due to quality and confidentiality)
Moving to a more centralized environment
  Do the benefits of moving to a more centralized environment truly outweigh the costs?
    Reduction in development costs
    Development of best practices that can be shared across the bureau
    Single point of access for input data improves security of all UES-related data
    Rationalization of hardware to minimize the number of servers
The UES Data Warehouse
  The UES Warehouse is centrally managed within Enterprise Statistics Division
  Major components of the data warehouse include:
    Metadata repository
    Processing metadata
    Central data store (CDS)
    External data: data that originate outside UES but have been integrated in the UES framework
The UES Data Warehouse
  Systems interfacing with the data warehouse:
    Unified Tracking and Retrieval Tool (USTART)
    Integrated Questionnaire Metadata System (IQMS)
    UES Processing Interface
    Working Estimation Environment (WEE) interface
    Macro-data adjustment Facility
Operational applications
  Operational monitoring
  Coherence analysis
  Baseline information for operational research
  Quality measures (e.g. response rate analysis)
  Integrated data analysis
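As an illustration of the response rate analysis listed among the quality measures, the sketch below computes an unweighted response rate from the collection status recorded for each respondent. The status labels (`responded`, `refused`, `no_contact`, `out_of_scope`) and the definition of the rate (responding units divided by in-scope units) are assumptions made for this example; the presentation does not give the formula used in UES.

```python
from collections import Counter
from typing import Iterable, Optional


def response_rate(statuses: Iterable[str]) -> Optional[float]:
    """Unweighted response rate: responded units over in-scope units.

    Units flagged 'out_of_scope' are excluded from the denominator.
    Returns None when there are no in-scope units.
    """
    counts = Counter(statuses)
    in_scope = sum(n for status, n in counts.items() if status != "out_of_scope")
    if in_scope == 0:
        return None
    return counts["responded"] / in_scope


if __name__ == "__main__":
    # Hypothetical collection statuses for one survey.
    sample = ["responded", "responded", "refused", "no_contact", "out_of_scope"]
    print(response_rate(sample))  # 2 responded out of 4 in-scope units
```

Because every survey's statuses would be read from the same store and passed through the same function, rates computed this way are comparable across surveys.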
The centralized system in action
  Outcomes
    The centralized input data warehouse provides a centralized tool that allows users to track performance on a consistent basis
      Same method
      Same source data