UNECE - Conference of European Statisticians Work Session on Statistical Data Editing Silvia Pacini ([email protected]) M. Carla Congia, Donatella Tuzi ISTAT - ITALY The Editing Process in the Italian Short-Term Survey on Labour Cost based on Administrative Data Vienna, Austria, 21 – 23 April 2008

Page 1: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

UNECE - Conference of European Statisticians

Work Session on Statistical Data Editing

Silvia Pacini ([email protected])

M. Carla Congia, Donatella Tuzi

ISTAT - ITALY

The Editing Process in the Italian Short-Term Survey on Labour Cost based on Administrative Data

Vienna, Austria, 21 – 23 April 2008

Page 2: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Outline

- The main features of the Oros Survey

- The peculiarities of the administrative sources used and their main impact on the E&I process (unlike traditional surveys, where many non-sampling errors may be prevented or reduced ex ante during the planning phase)

- The main steps of the Oros editing process

- Final remarks

Page 3: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Main features of the Oros Survey (1)

In Italy the Oros Survey is an innovative example of the extensive use of administrative data to produce short-term business statistics

Variables: gross wages, other labour costs, total labour cost

Coverage: all Italian firms with at least one employee in the private non-agricultural sector

Timeliness: 70 days from the end of the reference quarter

Page 4: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Main features of the Oros Survey (2)

Until 2002: gross wages and total labour cost were produced only by the Monthly Survey on Large Enterprises (LES), covering firms with more than 500 employees (20% of total private employees)

Since 2003: the Oros Survey has released indicators for all Italian private firms with at least one employee (100% of private employees), using the administrative data collected by the Italian National Social Security Institute (INPS) integrated with LES data

Diagram: administrative data of the National Social Security Institute (INPS), combined with LES data, feed the Oros Survey

Page 5: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

The administrative sources (1)

Administrative Register (AR): structural information on administrative units (id number, fiscal code, birth date…)

The AR represents the current population updated at the end of the reference quarter but suffers from over-coverage problems (temporary suspensions and firm closures are under-recorded)

Impact on the E&I process:

Making the AR suitable for statistical purposes requires:

- checks on the quality of the fiscal code (used as firm identification code)

- drawing the NACE code from the Italian Statistical Business Register (BR-ASIA); 90% of the INPS active units are linked

Page 6: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

The administrative sources (2)

Electronic Monthly Social Contribution Declarations (DM10 forms archive)

1.3 mln employers - 10 mln employees

- RAW DATA transmitted to Istat 35 days after the end of the reference quarter. These data are not subjected to prior aggregations and checks by INPS because of the tight time constraints

- PROVISIONAL POPULATION used to produce provisional estimates (95-98% of the population used to produce final estimates 5 quarters later)

Impact on the E&I process:

- preliminary checks

- complex retrieval process of the statistical variables based on a “metadata database” built ad hoc in-house

- the high number of units makes selective editing necessary

Page 7: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Main steps of the Oros Survey E&I process

Given the peculiarities of the administrative information used, checks have been developed along the whole process

Monthly administrative micro data

- Preliminary checks and retrieval of statistical variables

- Micro editing

- Imputation of temporary employment agencies

- The large firms: checks for survey data integration

- Macroediting

Oros Survey indicators

Page 9: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Preliminary checks and the retrieval of statistical variables (1)

The social contribution declaration, or DM10 form, is a detailed grid containing information about the firm, number of employees by type of employment, paid days, wage bill, social contributions, credit terms and tax reliefs. Information is declared at a highly disaggregated level identified by 4-digit administrative codes (more than 5,000)

Complete and updated METADATA are necessary for:

- the translation of the administrative information

- the estimation of some components of other labour costs not declared in the DM10 form

METADATA DATABASE

- built in-house to standardize and use information on laws and regulations, contribution rates, codes, and other technical aspects of Social Security

- updated quarterly

Page 10: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Preliminary checks and the retrieval of statistical variables (2)

Retrieval of statistical variables:

1) the number of employees and the related wage bills have to be calculated by selecting and aggregating the appropriate codes

2) other labour costs have to be calculated, and some components not recorded in the DM10 (e.g. employers’ injury insurance premiums and severance payments) have to be estimated

Preliminary checks:

to investigate and possibly correct errors in codes, duplications, inconsistencies with current legislation…

In this step E&I is mainly automatic and based on the metadata database, BUT the updating of the metadata database cannot be completely automated

Page 11: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Preliminary checks and the retrieval of statistical variables (3)

FROM (input)

the administrative micro data (10 mln records each month)

8 records on average for 1 DM10

- translation - checks - aggregation

TO (output)

the statistical micro data (1.3 mln records each month)

1 record for 1 DM10
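The translation, checks and aggregation step on this slide can be illustrated with a minimal Python sketch. It is not the Oros procedure (which the slides describe as developed in SAS): the code table `CODE_MAP`, the record fields `dm10_id`, `code` and `amount`, and the rule that a repeated code within one DM10 is a duplicate are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical metadata table: administrative code -> (statistical variable, sign).
# The real DM10 codes (more than 5,000) and their meaning are not given in the slides.
CODE_MAP = {
    "0001": ("employees", 1),
    "0002": ("wage_bill", 1),
    "0003": ("social_contributions", 1),
    "0004": ("social_contributions", -1),  # assumed: a rebate reduces contributions
}


def retrieve_statistical_records(admin_records, code_map=CODE_MAP):
    """Translate, check and aggregate administrative records into one record per DM10."""
    totals = defaultdict(lambda: defaultdict(float))
    rejected = []  # records set aside for review, with the reason
    seen = set()   # (dm10_id, code) pairs already processed
    for rec in admin_records:
        key = (rec["dm10_id"], rec["code"])
        if rec["code"] not in code_map:
            rejected.append((rec, "unknown code"))      # check: code missing from metadata
        elif key in seen:
            rejected.append((rec, "duplicate"))         # check: assumed duplication rule
        else:
            seen.add(key)
            variable, sign = code_map[rec["code"]]      # translation
            totals[rec["dm10_id"]][variable] += sign * rec["amount"]  # aggregation
    return {dm10: dict(values) for dm10, values in totals.items()}, rejected
```

Keeping the code table as data rather than as program logic mirrors the role the slides give to the metadata database: when legislation changes, only the table needs updating.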

Page 12: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Preliminary checks and the retrieval of statistical variables (4)

Implications of highly disaggregated raw data:

- a mine of information for multiple statistical aims

- the possibility of keeping the translation of administrative information under control (an activity not done by INPS, given the short time available for the release of the Oros indicators)

- very complex ad hoc, in-house procedures for the translation of administrative information into statistical data

- the building and continuous updating of the metadata database, which requires multiple skills (legal, statistical, etc.)

- the handling of a huge quantity of data in a very short time

Page 14: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Micro editing (1)

Once statistical data have been made available, a more traditional micro editing procedure is set up

Selective editing criteria

based on a score function assigning to each of the 1.3 mln units the probability that an error occurs in the target variables

Cut-off thresholds

to select the anomalous values, which are interactively analysed and, if necessary, corrected

In this step E&I is mainly interactive

Page 15: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Micro editing (2)

Units are checked through some edit rules mainly based on well-known functional relations among the analysed variables. Edits are aimed at evaluating at unit record level both cross-sectional and longitudinal consistency using the information on the previous month:

- a positive amount of wage bills must correspond to a positive amount of employment, and often to a particular rate of social contributions

- the number of employees recorded in the current month should not significantly differ from that of the previous month

- the gross per capita wages, or the per capita paid days, should have similar and acceptable amounts in the analysed period

- the rate of social contributions on gross wages should fall within an expected range
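The four edit rules listed above can be written down as a small Python sketch. The field names (`employees`, `wage_bill`, `social_contributions`) and all numeric bounds are hypothetical; the slides do not report the thresholds actually used in the Oros Survey.

```python
def check_unit(cur, prev,
               max_emp_change=0.5,
               wage_ratio_bounds=(0.5, 2.0),
               contrib_rate_bounds=(0.0, 0.6)):
    """Return the names of the edit rules failed by one unit.

    cur and prev are dicts for the current and the previous month.
    All bounds are illustrative assumptions.
    """
    failed = []

    # Cross-sectional: a positive wage bill requires positive employment.
    if cur["wage_bill"] > 0 and cur["employees"] <= 0:
        failed.append("wages_without_employment")

    # Longitudinal: employment should not change too much month on month.
    if prev["employees"] > 0:
        change = abs(cur["employees"] - prev["employees"]) / prev["employees"]
        if change > max_emp_change:
            failed.append("employment_jump")

    # Longitudinal: per capita wages should stay at a similar level.
    if cur["employees"] > 0 and prev["employees"] > 0 and prev["wage_bill"] > 0:
        ratio = (cur["wage_bill"] / cur["employees"]) / (prev["wage_bill"] / prev["employees"])
        if not wage_ratio_bounds[0] <= ratio <= wage_ratio_bounds[1]:
            failed.append("per_capita_wage_shift")

    # Cross-sectional: contribution rate on gross wages within an expected range.
    if cur["wage_bill"] > 0:
        rate = cur["social_contributions"] / cur["wage_bill"]
        if not contrib_rate_bounds[0] <= rate <= contrib_rate_bounds[1]:
            failed.append("contribution_rate_out_of_range")

    return failed
```

Returning the list of failed rules, rather than a single yes/no flag, gives the reviewer who analyses the unit interactively a starting point.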

Page 16: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Micro editing (3)

Given the peculiarities of the administrative data used, the target variable distributions may have significant tail areas, so that the identification of the cut-off threshold is particularly problematic:

- very low per capita wages (e.g. firms whose employees all receive only supplementary earnings from employers)

- negative per capita other labour costs (e.g. social contribution rebates)

These aspects affect both the calculation of suitable check indicators and the singling out of potential errors
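One common way to set cut-offs that are less sensitive to heavy tails is to base them on the median and the median absolute deviation (MAD) instead of the mean and standard deviation. The slides do not say how the Oros thresholds are computed, so the Python sketch below is a swapped-in illustration; the multiplier `k` is an assumption.

```python
import statistics


def robust_cutoffs(values, k=3.0):
    """Lower and upper cut-offs at median +/- k scaled MADs (an assumed rule)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    spread = 1.4826 * mad  # scaling that makes the MAD comparable to a standard deviation
    return med - k * spread, med + k * spread


def select_for_review(values, k=3.0):
    """Values outside the robust cut-offs, to be analysed interactively."""
    low, high = robust_cutoffs(values, k)
    return [v for v in values if v < low or v > high]
```

A value selected this way is only a candidate: as the examples on the slide show, very low or negative amounts can be legitimate, so the final decision stays with the reviewer.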

Page 18: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Imputation of temporary employment agencies (1)

The INPS provisional population covers 95-98% of total active units, but evidence shows that unit nonresponse does not affect changes in Oros wages and other labour costs.

Temporary employment agencies are an exception:

- 100 enterprises

- 300,000 employees

- 3% of private sector employment

- 20% of employees of sector K (Real estate, renting and business activities), where they are all classified by INPS

These large enterprises are not included in the LES data. Their imputation is essential because of their weight: the absence of even one of these units may have a significant impact not only on levels but also on changes of the per capita indicators

Page 19: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Imputation of temporary employment agencies (2)

Which are unit nonresponses?

In a traditional survey:

a list of theoretical respondents is available

In the Oros Survey:

the AR is available but it suffers from over-coverage problems

Prediction of the unit activity state

through a longitudinal analysis of the unit activity in the nearby quarters (based on the evidence that the probability that a position which is a latecomer in one quarter is also a latecomer in the nearby quarters is actually low)

Given the dynamic nature of these firms, it is necessary to follow their frequent changes (e.g. mergers, split-ups, etc.) over time to correctly single out unit nonresponses

Page 20: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Imputation of temporary employment agencies (3)

Imputation criteria

Deterministic imputation mainly based on the longitudinal information available on each unit:

• suitable values are selected from the closest quarter in which the currently missing unit was a respondent

• those values are then updated using panel information drawn from the current respondents
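The deterministic imputation described on this slide can be sketched in Python as follows. The data layout and the choice of updating factor (the growth of the panel total between the donor quarter and the current quarter) are assumptions; the slides do not specify how the panel information is used, and this sketch looks only at past quarters.

```python
def impute_missing(history, current, missing_id):
    """Impute the current value of a missing unit.

    history: {quarter_index: {unit_id: value}} for past quarters (higher index = more recent).
    current: {unit_id: value} for the units that responded in the current quarter.
    """
    # Closest past quarter in which the missing unit was a respondent.
    donor_quarters = [q for q, units in history.items() if missing_id in units]
    if not donor_quarters:
        raise ValueError("no past quarter available for this unit")
    donor = history[max(donor_quarters)]
    base = donor[missing_id]

    # Panel: current respondents that also responded in the donor quarter.
    panel = [u for u in current if u in donor]
    past_total = sum(donor[u] for u in panel)
    if not panel or past_total == 0:
        raise ValueError("no usable panel to update the donor value")

    # Update the donor value with the growth observed on the panel (assumed rule).
    return base * sum(current[u] for u in panel) / past_total
```

For instance, with `history = {1: {"A": 100, "B": 200, "C": 300}, 2: {"A": 110, "B": 210}}` and `current = {"A": 121, "B": 231}`, the value for unit `"C"` is taken from quarter 1 and scaled by the growth of units A and B between quarter 1 and the current quarter.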

Page 22: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

The large firms: checks for survey data integration (1)

Why the integration with survey data?

The administrative data could guarantee the coverage of all firms in the private sector, but for the estimates on Large Enterprises (LE) the use of LES data is preferable because:

- each of them has a considerable influence on the estimates (1,000 enterprises / 2 million employees)

- they are frequently subject to changes over time

Direct contact with LE can guarantee higher data quality and a more rapid and efficient management of their changes (spin-offs, mergers, …)

Page 23: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

The large firms: checks for survey data integration (2)

The integration implies:

1) the production of variables harmonised with those produced using administrative data

2) a check and editing procedure to correctly single out the LES enterprises in the administrative data. Starting from the list of firms belonging to the survey, a complementary list of INPS firms must be defined, avoiding omissions and duplications

Record linkage problems

The FISCAL CODE is the only matching variable between the two archives, but it is not sufficient to correctly identify the SAME firms because of:

- formal errors

- updating at different times (mergers, hive-offs, split-ups might be recorded in different periods)

Page 24: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

The large firms: checks for survey data integration (3)

LES archive

Fiscal code   Employees
A             1000
B             5000
C             2000
……            …….

INPS archive

Fiscal code   Employees
A             1000
B             3500
X             1500
Y             2000
D             1500
E             900
Z             2400
……            ….…

A firm may have different fiscal codes in the two archives: a significantly different number of employees signals possible problems

12% of total LES employment: manually checked and joined to the corresponding INPS firms
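The check described on this slide, matching on fiscal code and treating a large difference in employment as a warning sign, can be sketched in Python. The 10% tolerance and the wording of the review reasons are assumptions; in the Oros process the flagged cases are resolved manually.

```python
def link_archives(les, inps, tolerance=0.1):
    """Match LES firms to INPS firms on fiscal code.

    les, inps: {fiscal_code: employees}.
    Returns (matched codes, cases to review manually with the reason).
    """
    matched, to_review = [], []
    for code, les_emp in les.items():
        inps_emp = inps.get(code)
        if inps_emp is None:
            # The firm may appear in INPS under a different fiscal code.
            to_review.append((code, "fiscal code not found in INPS"))
        elif abs(inps_emp - les_emp) > tolerance * les_emp:
            # Same code but different size: possible merger, hive-off or split-up.
            to_review.append((code, "employment differs"))
        else:
            matched.append(code)
    return matched, to_review
```

The function only separates clean matches from cases needing attention; deciding which INPS records belong to a flagged LES firm remains a manual step.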

Page 26: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Macroediting (1)

Final quality controls on macro data are a key step in the E&I process to identify possible residual errors that may significantly affect the series

Changes in contribution legislation, as well as economic events with an impact on macro data, are frequent, so irregular but acceptable trends must be distinguished as far as possible from anomalies due, for example, to:

- an erroneous updating of the “metadata database”

- outliers/errors not singled out and corrected in the previous editing steps

If errors are detected, a drill-down to micro data is necessary

Page 27: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Macroediting (2)

- analytic and graphical inspection of the time series at sub-population detail, through statistical measures which have to respect pre-determined acceptance boundaries

- automatic detection of outliers based on TERROR, an application of the software TRAMO-SEATS (Caporello and Maravall, 2002), which detects suspected errors in the latest observations by comparing them with their forecasts estimated through REG-ARIMA models

- comparison with figures drawn from other Istat statistical sources (e.g. National Accounts data, indices of wages according to collective agreements, etc.)

- variable relationships, whose coherence must always be guaranteed (e.g. the ratio of other labour costs to wages, the evolution of their trends, etc.)
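The idea behind the automatic outlier check, comparing the latest observation with a forecast and flagging it when the forecast error is unusually large, can be shown with a deliberately simplified Python sketch. It does not use TERROR or a REG-ARIMA model: the forecast here is a seasonal naive one (the value one year earlier plus the average year-on-year change), and the threshold `k` is an assumption.

```python
import statistics


def flag_last_observation(series, period=4, k=3.0):
    """Flag the last value of a series if it is far from a seasonal naive forecast.

    series: list of values in time order (quarterly by default).
    Returns (flagged, forecast).
    """
    history, last = series[:-1], series[-1]
    # Year-on-year changes observed in the past.
    diffs = [history[i] - history[i - period] for i in range(period, len(history))]
    if len(diffs) < 2:
        raise ValueError("series too short to estimate the forecast error")
    # Forecast: same period one year earlier plus the average year-on-year change.
    forecast = history[-period] + statistics.mean(diffs)
    # Typical size of the year-on-year variation, used as the yardstick.
    spread = statistics.stdev(diffs)
    return abs(last - forecast) > k * spread, forecast
```

A flagged value is a suspected error only: as the previous slide notes, a legislative change can produce an irregular but acceptable movement, so a flag should trigger a drill-down to micro data rather than an automatic correction.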

Page 28: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Main steps of the Oros Survey E&I process

Monthly administrative micro data

- Preliminary checks and retrieval of statistical variables

- Micro editing

- Imputation of temporary employment agencies

- The large firms: checks for survey data integration

- Macroediting

Oros Survey indicators

Documentation

Page 29: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Documentation of the E&I process

The Oros E&I process, developed ad hoc in SAS and fully integrated in the general survey production process, is updated and documented quarterly:

- metadata are archived

- methodological information is documented

- imputed data are flagged (and pre-imputation data are archived)

- quality indicators on the impact of the imputation are calculated

The documentation of the Oros process guarantees its reproducibility and repeatability
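Flagging imputed data, archiving pre-imputation values and computing an impact indicator can be illustrated with a short Python sketch. The record layout (`imputed`, `pre_imputation`) and the indicator chosen (share of the total that comes from imputed values) are assumptions; the slides do not define the quality indicators actually calculated.

```python
def impute_with_flag(record, variable, new_value):
    """Return a copy of the record with the value imputed, flagged and the original archived."""
    updated = dict(record)
    # Archive the pre-imputation value so the step can be reproduced or undone.
    updated["pre_imputation"] = {**record.get("pre_imputation", {}),
                                 variable: record.get(variable)}
    # Flag the variable as imputed.
    updated["imputed"] = sorted(set(record.get("imputed", [])) | {variable})
    updated[variable] = new_value
    return updated


def imputation_impact(records, variable):
    """Share of the total of a variable that comes from imputed values."""
    total = sum(r[variable] for r in records)
    imputed = sum(r[variable] for r in records if variable in r.get("imputed", []))
    return imputed / total if total else 0.0
```

Because `impute_with_flag` returns a new record and leaves the input untouched, the raw data stay available alongside the edited data.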

Page 30: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Final remarks

The E&I process of the quarterly Italian Oros Survey:

- was developed without any previous experience in the use of administrative data for the production of short-term indicators

- was gradually implemented, learning from experience

- is characterized by a complex process of translating administrative information into statistical data

- is continuously updated because of the evolution of social security legislation

- consists of a systematic sequence of checks and editing steps which should assure the quality of the indicators produced

- turns out to be reliable in terms of both effectiveness (quality of the entire process) and efficiency (relatively little time required and low use of human and economic resources)

Less “standardizable” than the E&I process of a traditional survey?

Page 31: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

Final remarks

Although the E&I process of a survey based on administrative data is very source-specific, what have we learnt from the Oros experience?

Links with administrative institutions are fundamental; nevertheless, an in-house metadata database and ad hoc procedures have to be built

The greater the disaggregation of the source, the more detailed the checks that are necessary and the more E&I steps that have to be developed

The greater the file size, the more selective the E&I process has to be, while remaining interactive (only partially automatic)

The greater the complexity of the data and the frequency of metadata changes, the more attention and human resources are needed to monitor frequent modifications

Page 32: UNECE - Conference of European Statisticians Work Session on Statistical Data Editing

References

Baldi C., Ceccato F., Cimino E., Congia M.C., Pacini S., Rapiti F., Tuzi D. (2004) Use of Administrative Data to produce Short Term Statistics on Employment, Wages and Labour Cost. Essays, n.15/2004, Istat, Rome.

Caporello G., Maravall A. (2002) A tool for quality control of time series data. Program TERROR. Bank of Spain.

Istat (2006) Rilevazione mensile sull’occupazione, gli orari di lavoro e le retribuzioni nelle grandi imprese, Metodi e Norme n.29, Roma.

Istat, CBS, SFSO, Eurostat (2007) Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys, available on the web site: http://edimbus.istat.it/dokeos/document/document.php?openDir=%2FRPM_EDIMBUS

Thank you for your attention

Silvia Pacini

[email protected]