Study of Editing and Imputation Practices at Statistics Finland
Janika Konnu and Pauli Ollila, Statistics Finland
Q2010: Editing session, Wednesday 5th of May, 11.00-12.30
Editing Project of Statistics Finland
5 May 2010, Janika Konnu, Pauli Ollila
INTERNAL E&I STUDY OF STATFI
EXTERNAL E&I STUDY
DEVELOPMENTAL WORK FOR THE NEEDS OF STATFI
INFORMATION AND EDUCATION
A two-year development project.
Targets: to provide good E&I practices, help make the production of statistics more effective, improve quality, reduce workload and save costs.
Internal E&I Study of StatFi
Forms the basis for the work of the project.
Describes the current E&I situation at StatFi.
Reveals points where the developmental resources should be allocated in later phases of the project.
INTERNAL E&I STUDY OF STATFI
SURVEY OF E&I PRACTICES AT STATFI
DETAILED STUDIES OF E&I IN SOME STATISTICS
OTHER STUDIES (e.g. auditing reports)
Part 1: Pauli Ollila
Part 2: Janika Konnu
Survey of E&I Practices at StatFi
Conducted in January 2010.
A web questionnaire was used.
Directed to all statistics of StatFi, providing information from all relevant statistics (exceptions: statistics that had been finished, were about to be finished, were in transition, etc.).
Equivalence = one response also covers one or more other statistics.
STATISTICS DEPARTMENT   RESPONSES  EQUIVALENCES  STATISTICS IN ALL
Population Statistics          34            17                 51
Social Statistics              18             0                 18
Prices and Wages               17             7                 24
Economic Statistics            20            11                 31
Business Trends                20             4                 24
Business Structures            25            12                 37
ALL                           134            51                185
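The relation between the three columns can be checked with a short calculation. The following is a minimal sketch in Python that uses only the figures from the table above; the variable names are illustrative and not part of the original study.

```python
# Responses and equivalences per statistics department, copied from the table.
survey_coverage = {
    "Population Statistics": (34, 17),
    "Social Statistics": (18, 0),
    "Prices and Wages": (17, 7),
    "Economic Statistics": (20, 11),
    "Business Trends": (20, 4),
    "Business Structures": (25, 12),
}

# One response can also stand for one or more other statistics (an equivalence),
# so the number of statistics covered is responses + equivalences.
statistics_in_all = {
    department: responses + equivalences
    for department, (responses, equivalences) in survey_coverage.items()
}

total_responses = sum(r for r, _ in survey_coverage.values())
total_equivalences = sum(e for _, e in survey_coverage.values())
total_statistics = sum(statistics_in_all.values())

print(total_responses, total_equivalences, total_statistics)  # prints: 134 51 185
```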
Topics of E&I Survey
The survey aimed to cover all important aspects of editing and imputation.
The questionnaire was reviewed and tested by E&I and survey experts together with subject-matter staff.
Every page allowed free-text comments, which proved to be a very valuable asset.
SURVEYS, REGISTERS, SOURCE DATA
DATA COLLECTION METHODS
PRELIMINARY OPERATIONS
ERROR RECOGNITION PRACTICES
MISSING VALUE PRINCIPLES
ERROR CORRECTION AND IMPUTATION
REPORTING
DATA ARCHIVING
Analysing and Utilising the Results
DATABASE OF PRACTICES IN STATISTICS
DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS
MAKING “STATISTICS TYPES” BY COMMON PRACTICES
STUDYING E&I PROCESSES (string of practices, descriptions)
PROVIDES GOOD BASIS FOR THE DEVELOPMENTAL WORK OF EDITING PROJECT
VALUABLE INFORMATION FOR PLANS OF STATISTICS DEPARTMENTS AND OTHER INSTANCES
Example 1: Work time spent for editing and imputation in statistics (%)
STATISTICS DEPARTMENT   Missing   0-10  11-20  21-30  31-40  41-50  51-60  61-70  71-80    ALL
Population Statistics         2     23      7      1      4      4      1      0      9     51
Social Statistics             0      9      4      1      2      0      2      0      0     18
Prices and Wages              1     11      3      2      3      1      1      2      0     24
Economic Statistics           8      8      4      2      0      3      3      0      3     31
Business Trends               0     11      3      2      2      0      1      5      0     24
Business Structures           2     13      1      5      2      1      0      1     12     37
ALL                          13     75     22     13     13      9      8      8     24    185
DISTRIBUTIONS OF PRACTICES AT
VARIOUS LEVELS
Example 2: Type of data in making statistics at Statistics Finland
STATISTICS DEPARTMENT     SUR  REG  SOU  SUR+REG  SUR+SOU  REG+SOU  SUR+REG+SOU   ALL
Population Statistics       0   12    7        4        4        9           15    51
Social Statistics           1    2    2       10        1        1            1    18
Prices and Wages            0    1    1        2       12        0            8    24
Economic Statistics         4    0    8        4        4        1           11    32
Business Trends             0    2    0       10        0        1            9    22
Business Structures         4    1    2        4        1        5           21    38
ALL                         9   18   20       34       22       17           65   185
SUR = survey, REG = register, SOU = source data
Example 3: Technical editing at the unit level
Statistics with unit-level processing (number of statistics in parentheses)
Pop.Stat.  Soc.Stat.  Pric.&Wages  Econ.Stat.  Busin.Trends  Busin.Struct.    ALL  Practice
     (44)       (15)         (22)        (24)          (22)           (32)  (159)
       19         10           17          23            18             29    116  Unit-level examination with a computer
       37         13            8          21            13             25    117  Logical checks using a program or otherwise
       31         12            8          14            11             19     95  Defining non-valid variable values
       13         11            9          10            11             24     78  Listing extreme values of variables
       34         10           14          22            13             23    116  Comparing with previous or other values
       16          8            5          13             4             19     65  Ratio of values of two variables or different time points, other functions
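Three of the practices listed above (non-valid values, logical checks and ratios between time points) can be illustrated with a small rule set. This is only a sketch under stated assumptions: the record fields, the code list, the rules and the ratio limit are all hypothetical and do not describe any actual StatFi system.

```python
# Hypothetical record layout and limits, chosen only for illustration.
VALID_LEGAL_FORMS = {"10", "20", "30"}


def check_unit(record, previous=None, max_ratio=3.0):
    """Return the list of edit rules that one unit-level record fails."""
    failures = []

    # Defining non-valid variable values: the code must be in the allowed set.
    if record["legal_form"] not in VALID_LEGAL_FORMS:
        failures.append("non-valid legal form code")

    # Logical check (hypothetical rule): wages reported although there are no employees.
    if record["employees"] == 0 and record["wages"] > 0:
        failures.append("wages reported without employees")

    # Comparing with the previous value: ratio of two time points.
    if previous is not None and previous["turnover"] > 0:
        ratio = record["turnover"] / previous["turnover"]
        if ratio > max_ratio or ratio < 1 / max_ratio:
            failures.append("turnover ratio outside limits")

    return failures


current = {"legal_form": "99", "employees": 0, "wages": 1200, "turnover": 900}
last_year = {"turnover": 100}
print(check_unit(current, last_year))
```

Returning a list of failed rules, rather than a single pass/fail flag, keeps the information needed to decide later how each error should be treated.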
Example 4: Model editing at the unit level
Statistics with unit-level processing (number of statistics in parentheses)
Pop.Stat.  Soc.Stat.  Pric.&Wages  Econ.Stat.  Busin.Trends  Busin.Struct.    ALL  Practice
     (44)       (15)         (22)        (24)          (22)           (32)  (159)
        6          3            2           0             6              0     17  Defining the certainty of different variables to be right in the case of conflicting variables (reliability weight, minimum change, Fellegi-Holt principle)
        0          1            4           8             1              1     15  Comparing modelled value and observed value
        1          1            1           0             0              0      3  Modelling the risk of variable values / observations being erroneous (e.g. selective editing)
        0          5           12           0             7              6     30  Finding problematic values by defining the importance of the observation, or a so-called sensitivity function (reveals the effect of the observation on the estimate)
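The idea of ranking observations by their effect on the estimate can be sketched with a simple score. The form used here (survey weight times the absolute gap between the observed and an anticipated value, divided by the estimated total) is one commonly described form of a selective editing score; it is an assumption for illustration, not the method used in any of the statistics surveyed. All names and figures are hypothetical.

```python
def selective_editing_scores(units, estimated_total):
    """Score each unit by its potential effect on the estimated total.

    units maps a unit id to (survey weight, observed value, anticipated value).
    The anticipated value could come from a model or from the previous occasion.
    """
    scores = {}
    for unit_id, (weight, observed, anticipated) in units.items():
        scores[unit_id] = weight * abs(observed - anticipated) / estimated_total
    return scores


def units_to_review(scores, threshold):
    """Return ids of units whose score exceeds the threshold, largest score first."""
    selected = [unit_id for unit_id, score in scores.items() if score > threshold]
    return sorted(selected, key=lambda unit_id: scores[unit_id], reverse=True)


# Hypothetical units: (weight, observed, anticipated)
units = {
    "A": (10.0, 120.0, 100.0),
    "B": (1.0, 5000.0, 500.0),
    "C": (50.0, 101.0, 100.0),
}
scores = selective_editing_scores(units, estimated_total=100000.0)
print(units_to_review(scores, threshold=0.001))
```

Only units above the threshold would be sent to manual review; the rest would be accepted as reported.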
Example 5: Macro editing
Statistics with unit-level processing (number of statistics in parentheses)
Pop.Stat.  Soc.Stat.  Pric.&Wages  Econ.Stat.  Busin.Trends  Busin.Struct.    ALL  Practice
     (44)       (15)         (22)        (24)          (22)           (32)  (159)
       32         15            6          15             6             23     97  Studying distributions and cross-tabulations
       23         14           10          15             7             26     95  Information from calculating preliminary estimates (e.g. mean, total, correlation, deviation)
        0          5            4           0             1              5     15  Controlling the joint effect of survey weights and exceptional values
       15         11           15          18            10             26     95  Comparing with estimates from previous occasion(s), valid limits for estimates (e.g. time series)
        8          8            5          13             7             15     56  Using graphical methods
       25          6           19          19            17             28    114  Studying aggregated data
       28         10            8          18             7             27     98  Comparing with other possible data
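Comparing estimates with those from a previous occasion can be sketched as a check on the relative change of each aggregate. This is a minimal illustration; the cell names, values and the 10 % limit are hypothetical assumptions, not limits used at StatFi.

```python
def flag_aggregates(current, previous, max_relative_change=0.10):
    """Return aggregates whose relative change from the previous occasion exceeds the limit."""
    flagged = {}
    for cell, value in current.items():
        old = previous.get(cell)
        if old is None or old == 0:
            continue  # no basis for comparison
        change = (value - old) / old
        if abs(change) > max_relative_change:
            flagged[cell] = change
    return flagged


# Hypothetical aggregated estimates for two occasions.
current_estimates = {"retail trade": 130.0, "construction": 102.0}
previous_estimates = {"retail trade": 100.0, "construction": 100.0}
flagged = flag_aggregates(current_estimates, previous_estimates)
print(flagged)
```

A flagged aggregate is a signal to go back to the unit-level data behind that cell, not an error in itself.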
Example 6: Treatment types (not imputation)
Statistics with unit-level processing (number of statistics in parentheses)
Pop.Stat.  Soc.Stat.  Pric.&Wages  Econ.Stat.  Busin.Trends  Busin.Struct.    ALL  Practice
     (44)       (15)         (22)        (24)          (22)           (32)  (159)
       27          5           17          20            16             30    115  Contacting the respondent and asking for the value, or getting it from the paper questionnaire of the postal enquiry
        6          2           13          11             8             20     60  Fetching the previous value (cold-deck)
       12          5           13          14            14             25     83  Getting the value from another observation or another source
       27          7            8          21            13             27    103  Getting the real value by reasoning based on the information of the observation in question
       37          8            6          14            10             18     93  Correcting automatically with program lines including conditions or based on a list of erroneous values (e.g. ‘america’ = ‘United States’)
        0          0            1           0             6              0      7  Correcting automatically based on risk functions (e.g. selective editing)
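Two of the treatments above, correction from a list of erroneous values and fetching the previous value (cold-deck), can be sketched as follows. Only the pair ‘america’ = ‘United States’ comes from the slide; the other list entries, the field names and the function names are hypothetical.

```python
# List of known erroneous values and their corrections.
# Only "america" is taken from the slide; the other entries are made up.
CORRECTIONS = {
    "america": "United States",
    "usa": "United States",
    "u.k.": "United Kingdom",
}


def correct_country(value):
    """Replace a known erroneous value; leave anything else unchanged."""
    return CORRECTIONS.get(value.strip().lower(), value)


def cold_deck(current, previous, variable):
    """Fill a missing value with the same unit's value from the previous occasion."""
    if current.get(variable) is None and previous.get(variable) is not None:
        filled = dict(current)
        filled[variable] = previous[variable]
        return filled
    return current


print(correct_country(" America "))
print(cold_deck({"id": 1, "turnover": None}, {"id": 1, "turnover": 250}, "turnover"))
```

The cold-deck function returns a new record instead of changing the original, so the reported data stay available for comparison with the treated data.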
Example 1: Statistics with no unit-level processing
Collecting statistics use statistics and tabulations from several sources; once the information has been gathered, it is brought into the form required by the statistics (6 statistics).
Strict processing statistics are based on one or more data sets (statistical data, external source data or a register), which are used strictly without changes to make the statistics (9 statistics).
Calculation model statistics rely on existing, already edited data and/or tabulations/statistics, which are used to carry out the mathematical or statistical calculation model required by the statistics (11 statistics).
MAKING “STATISTICS TYPES” BY COMMON
PRACTICES
Example 2: Different types of utilising statistics (i.e. estimates from other sources)
Direct use of statistics: statistics (estimates) are fed straight into the process of making statistics, or go through a standard treatment before the process.
Additions and checks: statistics (estimates) are used for treating missing values and errors and/or for various checks.
Making expansion weights: statistics (estimates) and distributions are used for making weights that expand the results to the population level (e.g. calibration).
Index calculation
Account calculation
A part of calculating results: all other uses of statistics (estimates) in calculating the results (excluding index and account calculation).
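As an illustration of making expansion weights from known distributions, the sketch below computes post-stratification weights, which can be seen as a simple special case of calibration. The strata, population counts and function name are hypothetical; actual calibration procedures are typically more general than this.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_counts):
    """Return one weight per stratum: population count divided by sample count.

    sample_strata holds one stratum label per responding unit.
    population_counts holds the known number of population units per stratum.
    """
    sample_counts = Counter(sample_strata)
    return {
        stratum: population_counts[stratum] / n
        for stratum, n in sample_counts.items()
    }


# Hypothetical sample of six enterprises in two size classes.
sample = ["small"] * 4 + ["large"] * 2
population = {"small": 400, "large": 20}
weights = poststratification_weights(sample, population)
print(weights)
```

Summing the weights over the sample reproduces the known population size, which is the property that makes the weighted results expand to the population level.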
Example 3: Types of data collection
STATISTICS DEPARTMENT (only statistics with data collection)
Data collection type                 Pop.Stat.  Soc.Stat.  Pri.Wag  Econ.Stat.  Busin.Tren.  Busin.Struct.  ALL
Full Blaise-based data collection            0          7        0           2            1              0   10
Paper questionnaire collection only          0          1        0           0            0              0    1
Diary surveys                                0          2        0           0            0              0    2
XCOLA-based data collection                  2          0        0           2            1              5   10
XCOLA and paper combination                  0          0        1           0            3              0    4
XCOLA and Excel combination                  0          0        8           2            3              2   15
Other web collection made in StatFi          0          0        1           0            6              0    7
Web collection via external server           5          1        3           4            1             19   33
Excel-based data delivery                    3          0        3           3            1              4   14
Other data delivery or transfer             10          0        2           3            3              0   18
ALL                                         20         11       18          16           19             30  114
Detailed interviews with statistics
Interviews with different types of statistics from the production and editing point of view
Informal discussions between 1-2 interviewers and 1-2 persons from the statistics in question
Reports finalised with the interviewees and made available to everyone at StatFi
DETAILED STUDIES OF E&I IN SOME STATISTICS
Most common methods for editing and imputation
Editing
  deterministic checking rules
  local checking
  distributional checking
  use of other sources or historical data
Imputation
  manual
  cold deck
  average
  hot deck
  automatic imputation (checking lists)
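Two of the imputation methods named above, average and hot deck, can be sketched in a few lines. This is a minimal illustration under assumptions: the record fields, the imputation classes and the fixed random seed are hypothetical, and the hot deck shown is the simple random-donor-within-class variant.

```python
import random
from statistics import mean


def mean_impute(values):
    """Average imputation: replace missing values (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def hot_deck_impute(records, variable, class_variable, rng=None):
    """Hot deck: replace a missing value with the value of a random donor
    from the same imputation class in the current data."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    donors = {}
    for record in records:
        if record[variable] is not None:
            donors.setdefault(record[class_variable], []).append(record[variable])

    imputed = []
    for record in records:
        pool = donors.get(record[class_variable])
        if record[variable] is None and pool:
            record = dict(record)
            record[variable] = rng.choice(pool)
        imputed.append(record)
    return imputed


# Hypothetical records: one missing income in class "A", one in class "B" with no donor.
records = [
    {"id": 1, "region": "A", "income": 5},
    {"id": 2, "region": "A", "income": None},
    {"id": 3, "region": "B", "income": None},
]
print(mean_impute([10, None, 20]))
print(hot_deck_impute(records, "income", "region"))
```

A record with no donor in its class is left missing, so it can still be handled manually or by another method.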
General impression of editing and imputation in StatFi
Usually the respondent is contacted again
Deduction is used where possible
Personnel have strong subject-matter knowledge and awareness of current events
Personnel are very interested in, and willing to work for, methodological improvements