Input Data Warehousing: Canada’s Experience with Establishment Level Information
Presentation to the Third International Conference on Establishment Statistics
Montreal, QC, June 20, 2007



Overview

- Introduction of data warehousing as a concept
- Approaches to holding data
- Introduction to Statistics Canada’s Unified Enterprise Survey (UES) Program
- Centralized warehousing of UES data
- Example of the data warehouse at work

Subject-matter areas need or generate different types of information

- Data to support collection
  - Questionnaires and supporting metadata
  - Frame and sample information
  - Status of each respondent during collection
  - Survey data
  - Administrative data
- Post-collection processing
  - Edits (metadata)
  - Imputation specifications
  - Allocation specifications
  - Generation of “clean” datasets
- Tabulation of estimates/analysis of results
  - Value of estimate
  - Data quality indicators
  - Suppression patterns
  - Analysis of coherence
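Holding edits and imputation specifications as metadata means one generic routine can process every survey. A minimal Python sketch of that idea follows; the rule format, the variable names (`revenue`, `expenses`, `salaries`) and the use of mean imputation are illustrative assumptions, not the UES processing system.

```python
import operator
from statistics import mean

# Hypothetical edit metadata: (variable to check, comparison, right-hand side).
# A string right-hand side names another variable; a number is a constant.
EDIT_RULES = [
    ("revenue", ">=", 0),
    ("salaries", "<=", "expenses"),
]
OPS = {">=": operator.ge, "<=": operator.le}

# Hypothetical imputation specification: variable -> method (only "mean" here).
IMPUTATION_SPEC = {"revenue": "mean", "salaries": "mean"}


def apply_edits(records, rules):
    """Set values that fail an edit to None so they become candidates for imputation."""
    out = []
    for rec in records:
        rec = dict(rec)  # leave the input records unchanged
        for var, op, rhs in rules:
            right = rec.get(rhs) if isinstance(rhs, str) else rhs
            left = rec.get(var)
            if left is not None and right is not None and not OPS[op](left, right):
                rec[var] = None
        out.append(rec)
    return out


def impute(records, spec):
    """Fill missing values using the method named in the specification."""
    out = [dict(r) for r in records]
    for var, method in spec.items():
        if method != "mean":
            raise ValueError(f"unsupported imputation method: {method}")
        donors = [r[var] for r in out if r.get(var) is not None]
        if not donors:
            continue
        fill = mean(donors)
        for r in out:
            if r.get(var) is None:
                r[var] = fill
    return out
```

For example, a record reporting negative revenue fails the first edit, has its revenue blanked, and then receives the mean of the remaining reported revenue values.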

Input Data

Input Data Warehouse

A copy of statistical input data specifically structured for querying and reporting, spanning:
- Collection
- Post-collection processing
- Tabulation of estimates
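One way to structure input data for querying is a single long-format table keyed by survey, reference year, establishment and processing phase, so that values can be compared across phases. The sketch below assumes this layout and uses Python's built-in `sqlite3`; the table name, column names and phase labels (`collected`, `final`) are hypothetical.

```python
import sqlite3


def build_warehouse(rows):
    """Load (survey, ref_year, establishment_id, phase, variable, value) rows
    into an in-memory table keyed for querying across processing phases."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE input_data (
               survey TEXT, ref_year INTEGER, establishment_id TEXT,
               phase TEXT, variable TEXT, value REAL,
               PRIMARY KEY (survey, ref_year, establishment_id, phase, variable))"""
    )
    con.executemany("INSERT INTO input_data VALUES (?, ?, ?, ?, ?, ?)", rows)
    return con


def changed_in_processing(con, survey, ref_year, variable):
    """Return (establishment_id, collected_value, final_value) for units whose
    final value differs from the collected one (IS NOT is NULL-safe in SQLite)."""
    sql = """
        SELECT r.establishment_id, r.value, f.value
        FROM input_data r
        JOIN input_data f
          ON f.survey = r.survey AND f.ref_year = r.ref_year
         AND f.establishment_id = r.establishment_id AND f.variable = r.variable
        WHERE r.phase = 'collected' AND f.phase = 'final'
          AND r.survey = ? AND r.ref_year = ? AND r.variable = ?
          AND r.value IS NOT f.value
        ORDER BY r.establishment_id
    """
    return con.execute(sql, (survey, ref_year, variable)).fetchall()
```

Keeping every phase in one table lets the same query serve collection managers, methodologists and analysts, instead of each area reconciling its own copies.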

Approaches to organizing information holdings

- Decentralized
  - In a completely decentralized approach, each subject-matter area maintains its own input data.
- Centralized
  - A centralized data warehouse contains all input data from all subject-matter program areas.
  - All program areas need to use common concepts and standards for classification, or else a concordance would have to be found among these systems.
- These are extremes along a continuum.

Centralized approach

Advantages
- Economies of scale should lead to reduced overall development and maintenance costs
- Some human resource issues are eased (knowledge and skills retention and transfer)
- Eases integration of data to support data analysis, coherence analysis, etc.
- Allows subject-matter divisions to specialize in data analysis rather than data management

Decentralized approach

Advantages
- Specialized subject-matter expertise is readily available
- Subject-matter areas are not dependent on a central authority to make changes, so flexibility is increased
- Care and control of the data are clearly established

Questions to address in moving to a more centralized environment

- What purpose does it serve?
- What must be done to the statistical model to ensure compatibility with other data sources?
- What mechanisms need to be in place to ensure a productive client-service relationship?
- Who is the custodian of the data?
- Do the benefits of moving to a more centralized environment truly outweigh the costs?

Statistics Canada and the Unified Enterprise Survey Program

In the late 1990s, Statistics Canada undertook a major program to improve the quality of the provincial economic accounts released by the Agency and of the annual business surveys that feed into those accounts.

These surveys were integrated to increase the quality of the data they produce in terms of:
- Consistency
- Coherence
- Breadth
- Depth

Features of the UES

- Improved frame (business register)
- Sampling made consistent across surveys, with improved coverage
- Harmonized content and common collection applications
- Administrative data used instead of survey data where possible and where the data are of good quality
- Common post-collection processing systems
- Common storage of data
- Central contact management system
- Improvements in outputs
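The rule that administrative data replace survey data only when available and of good quality can be expressed as a per-unit source selection. This sketch assumes a boolean quality flag has already been set by some upstream assessment, which the presentation does not describe; `Unit` and `choose_source` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Unit:
    unit_id: str
    survey_value: Optional[float]
    admin_value: Optional[float]
    admin_quality_ok: bool  # hypothetical flag from an upstream quality check


def choose_source(unit: Unit) -> Tuple[Optional[float], str]:
    """Prefer administrative data when present and of acceptable quality;
    otherwise fall back to the survey value. Returns (value, source)."""
    if unit.admin_value is not None and unit.admin_quality_ok:
        return unit.admin_value, "admin"
    if unit.survey_value is not None:
        return unit.survey_value, "survey"
    return None, "missing"
```

Returning the source alongside the value keeps a record of which units were substituted, which can later feed quality measures.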

Moving to a more centralized environment

What is the purpose?
The UES data warehouse forms a repository of all the files created through the processing phases of the UES, together with the accompanying metadata.

This supports the work of analysts and survey managers in subject-matter divisions, collection managers, statistical methodologists and users in the System of National Accounts.

Moving to a more centralized environment

What must be done to the statistical model to ensure compatibility with other data sources?
- The statistical model for UES surveys forced the harmonization of concepts, definitions and classifications across surveys.
- Integration of survey and administrative data required the mapping of tax data to survey data (harmonized conceptually as well as characteristically).
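Mapping tax data to survey data amounts to a concordance from tax line items to survey variables, where several tax items may feed one survey concept. The line codes below (`T100` and so on) and the survey variable names are made up for illustration and do not correspond to real tax form items.

```python
from collections import defaultdict

# Hypothetical concordance: tax line item -> survey variable.
TAX_TO_SURVEY = {
    "T100": "revenue",
    "T110": "revenue",
    "T200": "salaries",
    "T210": "salaries",
    "T300": "other_expenses",
}


def map_tax_record(tax_items, concordance):
    """Aggregate one unit's tax line items into survey variables.

    Returns (survey_values, unmapped_codes) so that items with no survey
    equivalent are surfaced for review rather than silently dropped."""
    survey = defaultdict(float)
    unmapped = []
    for code, amount in tax_items.items():
        var = concordance.get(code)
        if var is None:
            unmapped.append(code)
        else:
            survey[var] += amount
    return dict(survey), sorted(unmapped)
```

Keeping the concordance as data rather than code means it can be stored in the metadata repository and revised without changing the processing programs.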

Moving to a more centralized environment

What mechanisms need to be in place to ensure a productive client-service relationship?
- A project management structure for the UES that crosses functional boundaries
- A change management function to ensure seamless integration of surveys into the UES

Moving to a more centralized environment

Who is the custodian of the data?
- Enterprise Statistics Division (ESD) controls access to all common systems.
- Subject-matter divisions are exclusively responsible for dissemination, including the determination of aggregations and data suppressions (due to quality and confidentiality).
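Suppression decisions combine confidentiality and quality criteria. The sketch applies a minimum-contributor rule, a dominance rule and a coefficient-of-variation (CV) threshold; the thresholds shown are illustrative defaults and are not Statistics Canada's actual disclosure-control or quality rules.

```python
def suppression_reason(contributions, cv, min_count=3, dominance_share=0.8, max_cv=0.35):
    """Return why a cell should be suppressed, or None if it may be published.

    contributions: establishment-level values summed into the cell.
    cv: estimated coefficient of variation of the cell estimate (or None).
    All thresholds are illustrative defaults, not official rules."""
    if len(contributions) < min_count:
        return "confidentiality: too few contributors"
    total = sum(contributions)
    if total > 0 and max(contributions) / total > dominance_share:
        return "confidentiality: dominant contributor"
    if cv is not None and cv > max_cv:
        return "quality: CV too high"
    return None
```

A complete suppression pattern also needs complementary suppression, so that a suppressed cell cannot be recovered by subtraction from published totals; that step is omitted here.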

Moving to a more centralized environment

Do the benefits of moving to a more centralized environment truly outweigh the costs?
- Reduction in development costs
- Development of best practices that can be shared across the bureau
- A single point of access for input data improves the security of all UES-related data
- Rationalization of hardware to minimize the number of servers

The UES Data Warehouse

The UES warehouse is centrally managed within Enterprise Statistics Division.

Major components of the data warehouse include:
- Metadata repository
- Processing metadata
- Central data store (CDS)
- External data: data that originate outside the UES but have been integrated into the UES framework

The UES Data Warehouse

Systems interfacing with the data warehouse:
- Unified Tracking and Retrieval Tool (USTART)
- Integrated Questionnaire Metadata System (IQMS)
- UES Processing Interface
- Working Estimation Environment (WEE) interface
- Macro-data Adjustment Facility

Operational applications

- Operational monitoring
- Coherence analysis
- Baseline information for operational research
- Quality measures (e.g., response rate analysis)
- Integrated data analysis
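Coherence analysis can start by comparing totals for the same domains from two sources, for example survey estimates and administrative totals. This sketch flags domains whose relative difference exceeds a tolerance, or that appear in only one source; the 10% tolerance and the domain names used in testing are assumptions.

```python
def coherence_flags(survey_totals, admin_totals, tolerance=0.10):
    """Compare two sources' totals for the same domains.

    Returns {domain: relative_difference} for domains where the sources differ
    by more than `tolerance` (relative to the admin total). Domains present in
    only one source are reported with None."""
    flags = {}
    for domain in sorted(set(survey_totals) | set(admin_totals)):
        s = survey_totals.get(domain)
        a = admin_totals.get(domain)
        if s is None or a is None:
            flags[domain] = None
        elif a == 0:
            if s != 0:
                flags[domain] = float("inf")
        else:
            rel = (s - a) / a
            if abs(rel) > tolerance:
                flags[domain] = rel
    return flags
```

Because both sources sit in the same warehouse under harmonized classifications, the comparison needs no ad hoc matching of domains between systems.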

[Figure: Response rates in collection]

[Figure: Final response rates]

The centralized system in action

Outcomes
The centralized input data warehouse provides a tool that allows users to track performance on a consistent basis:
- Same method
- Same source data
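Because every survey's collection status sits in the same central store, one routine can compute response rates for all of them. The sketch below computes an unweighted rate as responding units over in-scope units; the status codes are hypothetical, and an actual program might also use weighted rates or treat other outcomes as response.

```python
from collections import Counter, defaultdict

# Hypothetical status codes.
RESPONDED = {"responded"}
OUT_OF_SCOPE = {"out_of_scope"}


def response_rates(status_rows):
    """Unweighted response rate per survey from (survey, unit_id, status) rows.

    rate = responding units / in-scope units; out-of-scope units are excluded
    from the denominator. One function over one table gives every survey the
    same method and the same source data."""
    counts = defaultdict(Counter)
    for survey, _unit_id, status in status_rows:
        if status in OUT_OF_SCOPE:
            continue
        counts[survey]["in_scope"] += 1
        if status in RESPONDED:
            counts[survey]["responded"] += 1
    return {
        survey: c["responded"] / c["in_scope"]
        for survey, c in sorted(counts.items())
    }
```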

Conclusion

- The centralized data warehouse offers benefits to statistical programs.
- A number of conditions must be fulfilled for success:
  - Purpose
  - Data compatibility
  - Client-service relationship
  - Custodian of data
  - Cost-benefit