Possibilities of exploiting administrative data in short term statistics in Poland
Jacek Kowalewski
STATISTICAL OFFICE IN POZNAŃ
• possibilities of using existing administrative registers in short term statistics
• assessment of potential usefulness of administrative sources
• analysis of ways to decrease response burden for some companies
• improvement in data completeness and quality
• assessment of the possibilities of exploiting register data for purposes of calibration in the case of incomplete data
Research program
DG1 survey
• business activity report
• the basic source of short-term information about economic activity of businesses
• covers only enterprises with over 9 employees
• timeliness: 5th day after the end of each month
• sales, taxes and subsidies, turnover, employees, working time
• 30 questions in the survey form converted into 460 variables
• current prices, constant prices
DG1 survey
SURVEY SCOPE
BY THEMES
SALES
of manufacturing output: total; domestic; exported (of which goods exported to the Eurozone countries)
of construction output: total; by type of work; by building type
of goods and materials: in retail trade; online; in wholesale trade
TAXES & SUBSIDIES
Excise duty: on goods produced for sale; on goods and materials
Specific subsidies
EMPLOYEES
working persons
average number of employees
SALARIES
Wages and salaries
Dividend
Social insurance premiums
PRICES
of goods and services: total; domestic; exported
of construction work
price indexes
NEW ORDERS
New orders received: total; domestic; exported (of which orders exported to the Eurozone countries)
TRANSPORT
Load capacity of road transport
Transport of goods
Transport of passengers: total; city transport; intercity bus transport
ZP
WORKING TIME
time worked
TURNOVER
Turnover: total; domestic; exported (of which goods exported to the Eurozone countries)
SURVEY SCOPE
Population groups: 18 594 and 78 443 enterprises
Included in the survey: 18 594 (100%) and 13 353 (about 17%)
Sample: 31 947
10% refusals
14% reminders
25% explanations
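The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the first group (18 594 units) is surveyed in full and that 13 353 units are drawn from the second group (78 443 units); the variable names are illustrative and do not come from the presentation.

```python
# Figures taken from the slide; the reading of the two groups is an assumption.
full_coverage_units = 18_594      # group assumed to be surveyed at 100%
sampled_group_size = 78_443       # group assumed to be surveyed by sampling
sampled_units = 13_353            # units drawn from the sampled group

# Total sample is the sum of both parts.
sample_total = full_coverage_units + sampled_units   # 31947

# Sampling fraction in the second group, as a percentage.
sampling_rate_pct = 100 * sampled_units / sampled_group_size

print(sample_total, round(sampling_rate_pct, 1))
```

Under this reading, the two parts add up to the reported sample of 31 947, and the sampling fraction in the second group rounds to the "about 17%" shown on the slide.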
Analysis of the processing of the DG-1 survey
sample
aggregation
sections, divisions, classes
generalization
DG-1 report
Validation and creation of database B1
Database B1
Computation of database BB
Database BB
Aggregation to divisions and sectors in database B3 (sum of reports)
Database B3
Generalization of divisions and sectors in database B3
Card index (Kartoteka)
Database B3
Summation of database B3 to higher-level aggregates
Database B3
Correction of generalization factors
Database B3
Correct?
YES
NO
Summation of database B1 into specialization databases BP; BR; BH
Databases Bx
Generalization of specialization databases BP; BR; BH
Generalized databases Bx
Card index (Kartoteka)
Correction of generalization factors in databases Bx
Databases Bx
Correct?
YES
NO
Figure: Digital reporting within the DG-1 survey, national average from January 2008 to May 2010 (vertical axis: %, 0 to 100)
Data labels in chart order (%): 23, 26, 26, 29, 29, 30, 31, 32, 34, 37, 43, 50, 82, 88, 88, 88, 87, 88, 87, 76, 87, 87, 87, 87, 91, 91, 91, 92, 91
The use of data from administrative registers
• recognition and description of the regulations
• comparison of the scope of data used in the registers with those used by public statistics
• comparison of concept definitions and classifications
• studies into methodological compatibility
• evaluation of the quality of administrative systems to determine their usefulness as data sources for business statistics
• specification of the scope of data to be used
Review of the systems
Table 1. Evaluation of the quality of administrative systems (max 49)
No. | Database / Register | Evaluation of system quality
Tax system
1. | Database of taxpayers of the personal income tax PIT as a source of data in the field of labour market | 41
2. | Database of taxpayers of the personal income tax PIT as a source of data in the field of revenues and costs of activities as well as taxes | 34
3. | Database of taxpayers of the corporate income tax CIT as a source of data in the field of revenues and costs of activities as well as taxes | 41
4. | Database of taxpayers of the value added tax VAT as a source of data in the field of revenues and costs of activities as well as taxes | 44
5. | National Register of Taxpayers (KEP) as a source of data in the field of labour market and revenues and costs of activities | 49
System of social insurance
1. | Central Register of Contribution Payers as a source of data in the field of labour market | 39
2. | Central Register of the Insured as a source of data in the field of labour market | 37
Revenues: DG1 survey vs administrative registers
Figure: Distribution of enterprises by revenue, DG-1, 2008 (vertical axis: number of enterprises, 0 to 4500; horizontal axis: revenues)
Figure: Relationship between the values of accumulated revenue from DG-1 and from the PIT or CIT register, all units together, 2008 (horizontal axis: DG1 survey; vertical axis: PIT, CIT register)
Usefulness of calibration
• Transforming administrative data sources into statistical data sets
• General population (MEETS) consisting of companies in the DG-1 survey which were successfully matched with information from the KEP, CIT, PIT and ZUS databases
• Mean revenue was estimated using a calibration estimator with a known vector of the population total of auxiliary variables (the Horvitz-Thompson estimator was used as a benchmark).
The study procedure
$$\hat{Y}_{HT} = \sum_{i \in s} d_i y_i = \sum_{i=1}^{n} d_i y_i.$$
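To make the comparison of the two estimators concrete, here is a minimal sketch on synthetic data. The presentation does not say which calibration distance function was used; the code assumes the common linear (chi-square distance) form, equal design weights from simple random sampling, and a single made-up auxiliary variable `x` standing in for register data. All names are illustrative.

```python
import numpy as np

def horvitz_thompson_total(y, d):
    """Horvitz-Thompson estimate of the population total: sum of d_i * y_i."""
    return float(np.sum(np.asarray(d) * np.asarray(y)))

def linear_calibration_weights(X, d, totals):
    """Weights w_i = d_i * (1 + x_i' lambda) such that sum_i w_i x_i
    reproduces the known population totals of the auxiliary variables."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    totals = np.asarray(totals, dtype=float)
    T = (X * d[:, None]).T @ X                    # sum_i d_i x_i x_i'
    lam = np.linalg.solve(T, totals - X.T @ d)    # Lagrange multipliers
    return d * (1.0 + X @ lam)

# Synthetic population (made-up numbers, not the MEETS dataset).
rng = np.random.default_rng(42)
N, n = 1000, 100
x_pop = rng.lognormal(mean=3.0, sigma=1.0, size=N)      # auxiliary variable
y_pop = 2.0 * x_pop + rng.normal(0.0, 5.0, size=N)      # study variable

# Simple random sample without replacement; design weights d_i = N / n.
idx = rng.choice(N, size=n, replace=False)
y, x = y_pop[idx], x_pop[idx]
d = np.full(n, N / n)

# Calibrate on the population size and the known total of x.
X = np.column_stack([np.ones(n), x])
totals = np.array([N, x_pop.sum()])
w = linear_calibration_weights(X, d, totals)

mean_ht = horvitz_thompson_total(y, d) / N    # benchmark estimate of the mean
mean_cal = float(np.sum(w * y)) / N           # calibration estimate of the mean
print(mean_ht, mean_cal, y_pop.mean())
```

The calibrated weights reproduce the known totals of the auxiliary variables exactly, which is what lets register information pull the estimate towards the population when the auxiliary variable is strongly related to revenue.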
• Simulation: 5%, 10% and 15% samples were drawn from the MEETS dataset
• After obtaining a sample, information about revenue (dependent variable Y) for some enterprises was replaced with missing data
• Different approaches were used to generate missing data: at random (option 1), or by assigning missing data to the enterprises with the lowest (option 2) or highest (option 3) revenue
• 500 iterations were performed for each option to compute the expected value of revenue and the expected value of the bias
The study procedure (continued)
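The simulation steps above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the population is synthetic and lognormal, the estimator is the plain mean of responding units, and the "relative error" is taken to be the relative root mean squared error, a definition the slides do not spell out.

```python
import numpy as np

def missing_mask(y, share, option, rng):
    """Boolean mask of sampled units whose revenue is set to missing."""
    n = len(y)
    k = int(round(share * n))
    if option == 1:                       # option 1: missing at random
        idx = rng.choice(n, size=k, replace=False)
    elif option == 2:                     # option 2: lowest revenues missing
        idx = np.argsort(y)[:k]
    elif option == 3:                     # option 3: highest revenues missing
        idx = np.argsort(y)[n - k:]
    else:
        raise ValueError("option must be 1, 2 or 3")
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

def simulate(population, sample_frac, missing_share, option,
             iterations=500, seed=0):
    """Return (expected estimate, bias, relative RMSE in percent)."""
    rng = np.random.default_rng(seed)
    n = int(round(sample_frac * len(population)))
    true_mean = population.mean()
    estimates = np.empty(iterations)
    for it in range(iterations):
        sample = rng.choice(population, size=n, replace=False)
        mask = missing_mask(sample, missing_share, option, rng)
        estimates[it] = sample[~mask].mean()      # mean of respondents only
    bias = estimates.mean() - true_mean
    rrmse = 100 * np.sqrt(np.mean((estimates - true_mean) ** 2)) / true_mean
    return estimates.mean(), bias, rrmse

# Synthetic right-skewed "revenue" population (made-up parameters).
pop_rng = np.random.default_rng(1)
population = pop_rng.lognormal(mean=10.0, sigma=1.5, size=2000)

# 10% sample, 10% missing data, all three missing-data options.
results = {opt: simulate(population, 0.10, 0.10, opt) for opt in (1, 2, 3)}
for opt, (expected, bias, rrmse) in results.items():
    print(opt, round(expected), round(bias), round(rrmse, 2))
```

With an unadjusted respondent mean, dropping the lowest revenues pushes the estimate up and dropping the highest pushes it down, which is the kind of non-response effect the calibration estimator is meant to dampen.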
Table 2. Relative estimation error of estimators of the monthly enterprise revenue (in percent)
Sample size | % of missing data | Horvitz-Thompson estimator: option 1 | option 2 | option 3 | Calibration estimator: option 1 | option 2 | option 3
5% | 5% | 28.37 | 28.71 | 8.94 | 14.40 | 14.35 | 12.25
5% | 10% | 29.78 | 26.61 | 7.51 | 14.89 | 14.06 | 9.79
5% | 15% | 32.43 | 27.49 | 7.42 | 15.08 | 14.60 | 9.10
10% | 5% | 20.33 | 19.32 | 6.03 | 10.77 | 10.57 | 7.95
10% | 10% | 19.65 | 18.94 | 5.45 | 9.73 | 10.04 | 6.19
10% | 15% | 20.05 | 19.74 | 5.12 | 11.19 | 10.27 | 6.07
15% | 5% | 14.69 | 15.11 | 4.67 | 8.42 | 8.15 | 5.29
15% | 10% | 16.40 | 14.15 | 4.24 | 8.67 | 7.87 | 4.75
15% | 15% | 16.49 | 15.30 | 3.81 | 8.62 | 8.03 | 4.50
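A quick way to read Table 2 is to divide each calibration error by the corresponding Horvitz-Thompson error. The sketch below only re-uses the figures from the table; the dictionary layout and names are illustrative.

```python
# Relative estimation errors (percent) copied from Table 2.
# Keys: (sample size %, missing data %); values: errors for options 1, 2, 3.
ht = {
    (5, 5): (28.37, 28.71, 8.94), (5, 10): (29.78, 26.61, 7.51),
    (5, 15): (32.43, 27.49, 7.42), (10, 5): (20.33, 19.32, 6.03),
    (10, 10): (19.65, 18.94, 5.45), (10, 15): (20.05, 19.74, 5.12),
    (15, 5): (14.69, 15.11, 4.67), (15, 10): (16.40, 14.15, 4.24),
    (15, 15): (16.49, 15.30, 3.81),
}
cal = {
    (5, 5): (14.40, 14.35, 12.25), (5, 10): (14.89, 14.06, 9.79),
    (5, 15): (15.08, 14.60, 9.10), (10, 5): (10.77, 10.57, 7.95),
    (10, 10): (9.73, 10.04, 6.19), (10, 15): (11.19, 10.27, 6.07),
    (15, 5): (8.42, 8.15, 5.29), (15, 10): (8.67, 7.87, 4.75),
    (15, 15): (8.62, 8.03, 4.50),
}

# Ratio of calibration error to Horvitz-Thompson error for each cell.
ratios = {
    key: tuple(round(c / h, 2) for c, h in zip(cal[key], ht[key]))
    for key in ht
}
for key in sorted(ratios):
    print(key, ratios[key])
```

For options 1 and 2 every ratio falls roughly between 0.46 and 0.58, so the calibration estimator about halves the relative error. For option 3 every ratio is above 1, meaning the Horvitz-Thompson figures in the table are the lower ones in that case.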
Conclusions
• Administrative registers are hard to obtain (system barrier)
• The direct use of administrative data in short-term statistics is limited by the overly long data processing time and the long wait for data to be transferred by their administrators.
• Administrative databases can serve as direct data sources for surveys, as a source of information for completing missing data, and as a source of data for comparison with information collected in statistical surveys by means of statistical reporting forms
• The databases are a rich source of potential auxiliary variables, which can improve the quality of estimations by reducing the negative effect of non-response in statistical reporting.
Conclusions
• The use of all available information about selected variables helps to reduce the respondent burden and improve the efficiency of estimators
• The use of administrative data in producing business statistics may not completely solve all of their problems (highly right-skewed distributions, very high differentiation and high concentration), but it can significantly contribute to minimizing their negative effects.
• The calibration approach can improve the quality of short-term estimation
• Calibration combined with the methodology of small area statistics can increase the range of estimation methods available in business surveys
Conclusions
To celebrate the 100th anniversary of the Polish Statistical Association
http://www.stat.gov.pl/pts/kongres2012