IMPUTING MISSING ADMINISTRATIVE DATA FOR SHORT-TERM ENTERPRISE STATISTICS Pieter Vlag – Statistics Netherlands Joint work with DESTATIS, Statistics Estonia, Statistics Finland, ISTAT, Statistics Lithuania, ONS


Page 1

IMPUTING MISSING ADMINISTRATIVE DATA FOR SHORT-TERM ENTERPRISE STATISTICS

Pieter Vlag – Statistics Netherlands

Joint work with

DESTATIS, Statistics Estonia, Statistics Finland, ISTAT,

Statistics Lithuania, ONS

Page 2

Imputing missing admin data for STS-estimates

Outline of the presentation

• Scope of the project - use of admin data for STS

• Two situations:

a. VAT fairly complete and representative ("VAT representative")

b. VAT not complete and not representative ("VAT not representative")

• VAT representative

a. imputing missing values

• Imputing missing values

a. methods for imputations

b. which units to impute

• Conclusions and implications for other projects

Page 3

Scope of the project

Final situation (after a year):
- all admin data are available for NSIs
- data cover the population

Monthly and quarterly estimates: part of the admin data is ‘missing’

[Diagram: final situation = L.E. (survey) + admin data; monthly and quarterly estimates = L.E. (survey) + admin data + missing]

Assumption: if admin data are complete, they can be used for statistics

Challenge: how to estimate for ‘missing’ admin data in the case of monthly and quarterly estimates

Scope: turnover (VAT-registration), wages + employees (“social security data”)


Page 4

Additional Value of ESSnet AdminData

• VAT = Value Added Tax

• The European Union value added tax (EU VAT) is a value added tax encompassing member states in the European Union VAT area. Joining in this is compulsory for member states of the European Union.

• Each Member State's national VAT legislation must comply with the provisions of EU VAT law as set out in Directive 2006/112/EC.

TRANSLATION TO STATISTICS

• INPUT: Available VAT-information quite similar in Europe!

• OUTPUT: obligations also similar in Europe (STS, SBS, ESR regulations)

• CONCLUSIONS ESSNET: methodological challenges in the use of admin data are identical -> solutions may differ, but only to a limited extent


Page 5

Two situations

Situation A: L.E. (100 % sample) + VAT almost complete

GENERAL SITUATION FOR Q; t+45 days

Situation B: L.E. (100 % sample) + VAT not available or very limited

GENERAL SITUATION FOR M; t+30 days

SITUATION A. or B. FOR OTHER ESTIMATES (Q-flash; M-T+45/50d) DIFFERS PER COUNTRY

Page 6

Methods Situation A: methodology

SITUATION A: Admin data coverage almost complete

ESTIMATION ONLY BASED ON ADMIN DATA

established techniques
• Level estimates
• Imputation of missing data (with available VAT)

SITUATION B: Admin data coverage incomplete

ADMIN DATA = AUXILIARY INFORMATION

experimental meth.

NOT DISCUSSED FURTHER

[Diagram labels: Final situation; STS; 100 % sample; Admindata; Missing; sample; VAT; ESTIMATION; L.E.; SME; VAT T-x]

QUALITY STS-ESTIMATES: Revision compared to final estimate

average bias: $B = \frac{1}{T}\sum_{t=1}^{T} e_t$

average error (taken here as the mean absolute revision): $E = \frac{1}{T}\sum_{t=1}^{T} |e_t|$
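
The two quality measures named on this slide (average bias and average error of the revision against the final estimate) can be sketched in Python. This is a minimal illustration under stated assumptions: the revision e_t is taken as the relative difference between the first STS estimate and the final estimate in period t, the average error is taken as the mean absolute revision, and the function name and sample figures are hypothetical.

```python
def revision_measures(first_estimates, final_estimates):
    """Return (average bias, average error) over T periods.

    Assumption: e_t = (first - final) / final for each period t.
    """
    revisions = [
        (first - final) / final
        for first, final in zip(first_estimates, final_estimates)
    ]
    periods = len(revisions)
    bias = sum(revisions) / periods                    # signed revisions can cancel out
    error = sum(abs(e) for e in revisions) / periods   # absolute revisions cannot
    return bias, error


# Hypothetical figures: one over-estimate and one under-estimate.
bias, error = revision_measures([102.0, 98.0], [100.0, 100.0])
print(bias, error)
```

A bias close to zero combined with a clearly positive error would indicate revisions that are sizeable but not systematically in one direction.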

Page 7

Methods for imputations

• Analysed several production systems, i.e. DE, F, “Nordic countries”, NL, I

• Imputation of “missing VAT” based on Ot/Ot-1 (change on the previous period) or Ot/Ot-12 (change on the same month a year earlier) of available VAT, or similar approaches

• Stratification levels for calculating stratum imputations differ, from NACE 2-digit x 2 size classes to NACE 4-digit x 9 size classes

KEY QUESTION: Do these different approaches lead to different output, given that the methods are generally applied when the coverage of L.E. survey + available VAT exceeds 90 % of the target variable?
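
A ratio imputation of the Ot/Ot-1 or Ot/Ot-12 type can be sketched as follows. This is only an illustration, not any country's production system: it assumes O denotes turnover, that the stratum ratio is the sum of current values over the sum of reference-period values for units that reported in both periods, and that the dictionary keys `stratum`, `prev` and `curr` are hypothetical names.

```python
from collections import defaultdict


def impute_by_stratum_ratio(units):
    """Impute missing current turnover with a stratum-level ratio.

    Each unit is a dict with 'stratum', 'prev' (turnover in the
    reference period, t-1 or t-12) and 'curr' (None when missing).
    """
    # Sum current and reference turnover over units that reported both.
    sums = defaultdict(lambda: [0.0, 0.0])
    for unit in units:
        if unit["curr"] is not None and unit["prev"] is not None:
            sums[unit["stratum"]][0] += unit["curr"]
            sums[unit["stratum"]][1] += unit["prev"]

    result = []
    for unit in units:
        value, imputed = unit["curr"], False
        if value is None and unit["prev"] is not None:
            curr_sum, prev_sum = sums[unit["stratum"]]
            if prev_sum > 0:  # no ratio available for an empty stratum
                value = unit["prev"] * curr_sum / prev_sum
                imputed = True
        result.append({**unit, "curr": value, "imputed": imputed})
    return result


# Hypothetical stratum: NACE 47, one size class.
units = [
    {"id": "A", "stratum": "47-small", "prev": 100.0, "curr": 110.0},
    {"id": "B", "stratum": "47-small", "prev": 200.0, "curr": 220.0},
    {"id": "C", "stratum": "47-small", "prev": 50.0, "curr": None},
]
completed = impute_by_stratum_ratio(units)
```

Changing the `stratum` key from a coarse to a fine classification is the only thing that differs between the stratification levels mentioned above; the coarser the stratum, the more units contribute to each ratio.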


Page 8

Methods for imputations: testing of different methodologies (example Estonia)

Conclusion: imputation methods provide similar results if the population is fixed and VAT covers > 80 % of the population

[Figure: Turnover growth rate, NACE 47. x-axis: Month, Year 2011 (1 to 8); y-axis: Growth rate (1,0 to 1,4). Series: IMP Ot/Ot-12 and IMP Ot/Ot-1, each at NACE2 and STS level, for the 1st and 2nd transmission; Survey growth rate.]

Page 9

Comparing imputations with realisations (approach Statistics Finland)

• Five imputation rules for the current period at micro-level:
  - Mean annual change
  - Geometric mean of monthly changes
  - Previous turnover
  - Mean turnover
  - Turnover of comparison month

• Imputation rules are automatically evaluated and compared by calculating maximum proportional forecast errors using data for the five latest months. The selection rules are:

• An imputation rule with a maximum proportional forecast error < 20% and the same direction of change as in the last two months is automatically admissible;

• The model with the smallest maximum error is considered best

Main difference with other detected practices:

• No assumption that available VAT = representative

• Not all missing data imputed (in practice 20 - 50 %)
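
The selection of an imputation rule by maximum proportional forecast error can be sketched as below. This is a hedged illustration, not Statistics Finland's implementation: only three of the five named rules are included, their definitions (last value, mean of the last three months, value twelve months earlier) are assumptions because the slide only gives their names, and the direction-of-change condition is left out.

```python
from statistics import mean


# Hypothetical rule definitions; each forecasts the next month from the history.
def previous_turnover(history):
    return history[-1]


def mean_turnover(history, window=3):
    return mean(history[-window:])


def comparison_month(history):
    return history[-12]


RULES = {
    "previous_turnover": previous_turnover,
    "mean_turnover": mean_turnover,
    "comparison_month": comparison_month,
}


def max_proportional_error(rule, series, n_eval=5):
    """Largest |forecast - actual| / actual over the n_eval latest months."""
    errors = []
    for k in range(len(series) - n_eval, len(series)):
        forecast = rule(series[:k])  # only data before month k is used
        errors.append(abs(forecast - series[k]) / series[k])
    return max(errors)


def select_rule(series, threshold=0.20):
    """Return (best rule name, its maximum error, admissible under threshold)."""
    scores = {name: max_proportional_error(rule, series) for name, rule in RULES.items()}
    best = min(scores, key=scores.get)
    return best, scores[best], scores[best] < threshold


# Hypothetical unit: 13 flat months, then 10 % monthly growth (17 months in total).
series = [100.0] * 13 + [110.0, 121.0, 133.1, 146.41]
best, err, admissible = select_rule(series)
```

The series must contain at least 17 months so that the comparison-month rule has twelve months of history for the earliest evaluated month. A unit for which no rule is admissible would be left unimputed, which fits the remark that only part of the missing data is imputed.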

Page 10

Comparing imputations with realisations (more precise conclusions)

Explanations:
- Outlier effect on calculated Ot/Ot-1 or Ot/Ot-12 values
- Late VAT-reporters are likely a selective group in countries with automatic fining systems for late VAT-reporting.

Impact of selectivity on output is generally negligible due to the high coverage of available data

Page 11

Which units to impute

PROVISIONAL POPULATION (at STS-estimate) compared with FINAL POPULATION (when all data complete: A. ACTIVE, B. INACTIVE)

I. ACTIVE: reporter active
   A. ACTIVE: (a)
   B. INACTIVE: x

II. ASSUMED ACTIVE: (late) reporting expected, IMPUTED VALUE
   A. ACTIVE: correctly assumed active (b)
   B. INACTIVE: incorrectly assumed active (c)

III. ASSUMED INACTIVE: no (late) reporting expected, NO IMPUTED VALUE
   A. ACTIVE: incorrectly assumed inactive (c)
   B. INACTIVE: correctly assumed inactive (a)

ESTIMATE = I + II + III

REVISED ESTIMATE = A.

REVISION DUE TO: (a) revised VAT; (b) imputation technique; (c) uncertainty provisional population

Page 12

Impact on results: example Italy

Columns:
a = Early reporters
b = Units with imputed missing values at t, with later reporting: imputed values at t
c = Units with imputed missing values at t, with later reporting: reported values at t+12
d = Units with imputed values at t, without later reporting: imputed value
e = Units without imputed values at t but reporting at t+12: reported values
f = (b+d) Total imputed value
g = (c+e) Total reported value
h = (a+f)
i = (a+g)

Columns b and c relate to the imputation technique; columns d and e relate to the uncertainty of the provisional population.

Nace Division   a      b     c     d     e     f     g     h      i
10              98.2   1.2   1.2   0.4   0.7   1.6   1.8   99.8   100.0
15              98.4   1.0   1.0   0.5   0.7   1.5   1.6   99.8   100.0
25              98.6   0.9   0.9   0.3   0.4   1.3   1.4   99.9   100.0
28              98.5   1.0   1.0   0.3   0.5   1.3   1.5   99.8   100.0
30              98.1   1.2   1.2   0.5   0.7   1.7   1.9   99.8   100.0
41              97.7   1.4   1.4   0.8   1.0   2.2   2.3   99.9   100.0
47              98.0   1.2   1.2   0.6   0.8   1.8   2.0   99.8   100.0
64              98.2   1.2   1.2   0.3   0.6   1.5   1.8   99.7   100.0
71              98.3   1.1   1.1   0.3   0.6   1.5   1.7   99.8   100.0
81              96.3   1.8   1.8   0.7   2.0   2.5   3.7   98.8   100.0

Conclusion: the effect on the revision caused by uncertainty about which units to impute is larger than the effect of the imputation technique itself
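
The conclusion can be checked against the Italian figures with a short sketch. The interpretation used here is an assumption based on the column grouping: b - c is read as the effect of the imputation technique (imputed versus later reported value for the same units) and d - e as the effect of the uncertain provisional population (imputed but never reported versus reported but never imputed). The values are copied from columns a to e of the table; the names `ROWS` and `revision_components` are hypothetical.

```python
# Columns (a, b, c, d, e) per NACE division, copied from the Italy example.
ROWS = {
    10: (98.2, 1.2, 1.2, 0.4, 0.7),
    15: (98.4, 1.0, 1.0, 0.5, 0.7),
    25: (98.6, 0.9, 0.9, 0.3, 0.4),
    28: (98.5, 1.0, 1.0, 0.3, 0.5),
    30: (98.1, 1.2, 1.2, 0.5, 0.7),
    41: (97.7, 1.4, 1.4, 0.8, 1.0),
    47: (98.0, 1.2, 1.2, 0.6, 0.8),
    64: (98.2, 1.2, 1.2, 0.3, 0.6),
    71: (98.3, 1.1, 1.1, 0.3, 0.6),
    81: (96.3, 1.8, 1.8, 0.7, 2.0),
}


def revision_components(rows):
    """Return {nace: (technique_effect, population_effect)}.

    Assumed reading: technique_effect = b - c, population_effect = d - e.
    Results are rounded to one decimal, the precision of the table.
    """
    components = {}
    for nace, (_a, b, c, d, e) in rows.items():
        components[nace] = (round(b - c, 1), round(d - e, 1))
    return components


components = revision_components(ROWS)
for nace, (technique, population) in components.items():
    print(nace, technique, population)
```

At the one-decimal precision of the table, the technique effect is zero in every division, while the population effect is negative throughout, which is consistent with h falling short of i.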

Page 13

Conclusions

• When using admin data for STS, missing data are imputed

• Most widely used imputation rules are Ot/Ot-1 or Ot/Ot-12

• Given the large coverage of available data, the exact imputation technique chosen has only a limited impact on the outcome, despite the indication that the main assumption of the techniques used, “available VAT = representative”, might not be 100 % correct.

• More important than the imputation technique is the estimate for the provisional population
