New procedures for Editing and Imputation of demographic variables G. Bianchi, A. Manzari, A. Pezone, A. Reale, G. Saporito ISTAT



ISTAT's aim in handling the 2001 Italian Population Census data was to provide a complete and consistent set of data by performing plausible imputations while preserving as much of the collected information as possible

Adopted strategy:

Dividing the E&I problem into simpler sub-problems and finding an appropriate solution for each of them

The overall E&I process consists of several procedures addressing specific E&I problems and implementing different E&I methods.

The aim of this strategy is to improve the quality of final results because each problem is solved by a suitable tool

Three new procedures are presented in the paper


The first procedure has been developed to deal with problems occurring when connected subsets of variables are handled in sequential E&I steps

An approach suggested by graph theory has been used: the variables handled in the first step are edited and imputed taking into account the information provided by the variables treated in the second step

The procedure consists of three main phases:

A. Location of the variable (pivot) involved in the highest number of connections among the subsets. The pivot variable is edited in the first step

B. Definition of a new auxiliary variable, the Subset of Admissible Values (SAV) of the pivot variable, identifying the values of the pivot variable that are as consistent as possible with the information provided by the subset of variables that will be edited in the second step

C. E&I of the pivot variable using its SAV
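
Phases A and B can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ISTAT implementation: the variable names, the edits, and the helper functions `locate_pivot` and `admissible_values` are hypothetical, a "connection" is taken to be an edit involving variables from both subsets, and "as consistent as possible" is read as "failing the fewest connecting edits".

```python
from collections import Counter

def locate_pivot(edits, first_step, second_step):
    """Phase A: first-step variable involved in the most edits that
    also involve at least one second-step variable."""
    counts = Counter()
    for variables in edits:
        if variables & second_step:          # the edit connects the two subsets
            counts.update(variables & first_step)
    if not counts:
        return None
    return max(sorted(counts), key=counts.get)   # sorted() makes ties deterministic

def admissible_values(domain, record, cross_edits):
    """Phase B: values of the pivot failing the fewest connecting edits,
    given the second-step values observed in the record."""
    failures = {v: sum(1 for fails in cross_edits if fails(v, record)) for v in domain}
    fewest = min(failures.values())
    return {v for v, n in failures.items() if n == fewest}

# Hypothetical variables and edits (each edit is the set of variables it involves).
first_step = {"age", "marital_status"}
second_step = {"education", "employment"}
edits = [
    {"age", "marital_status"},
    {"age", "education"},
    {"age", "employment"},
    {"marital_status", "employment"},
]
pivot = locate_pivot(edits, first_step, second_step)

# Hypothetical connecting edits: each returns True when the combination fails.
cross_edits = [
    lambda age, rec: rec["education"] == "degree" and age < 20,
    lambda age, rec: rec["employment"] == "employed" and age < 15,
]
record = {"education": "degree", "employment": "employed"}
sav = admissible_values(range(0, 110), record, cross_edits)
```

In phase C, the first-step E&I would then restrict imputed values of the pivot to `sav`.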


The second procedure aims at locating the household reference person (Person 1) when:

one person has declared himself or herself to be Person 1, but that person's Year of birth is missing or not consistent with the role (17 years old or younger)

either more than one person or no person has declared to be Person 1


The approach is based on optimization techniques and was implemented by adapting the "first fields then donors" algorithm of the DIESIS system to this specific problem

The procedure assigns the Person 1 role to the person for whom restoring household consistency requires the minimum change in the values of the demographic variables
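
The minimum change idea can be illustrated with a small brute-force sketch. It is a deliberately simplified stand-in, not the DIESIS algorithm: only two hypothetical fields per person are considered (`declared_p1` and `age`, the latter assumed to be derived from Year of birth), and the cost of a candidate is simply the number of field values that would have to change.

```python
def changes_needed(household, candidate):
    """Number of field changes required to make `candidate` the only Person 1."""
    changes = 0
    for i, person in enumerate(household):
        is_candidate = (i == candidate)
        if person["declared_p1"] != is_candidate:
            changes += 1                      # the role field must change
        if is_candidate and (person["age"] is None or person["age"] < 18):
            changes += 1                      # Year of birth must be imputed
    return changes

def choose_person1(household):
    """Index of the person whose selection as Person 1 needs the fewest changes."""
    return min(range(len(household)), key=lambda i: changes_needed(household, i))

# Hypothetical household in which two people declared themselves Person 1.
household = [
    {"declared_p1": True,  "age": 45},
    {"declared_p1": True,  "age": 10},
    {"declared_p1": False, "age": 43},
]
person1 = choose_person1(household)
```

Here the first person is selected, since only the second person's role has to be changed. A real application would evaluate the full set of household edits (for example those on relationship to Person 1) rather than these two fields.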


The third procedure is concerned with the treatment of invalid or inconsistent responses for the demographic variables

The demographic variables have been processed by the DIESIS system using two algorithms: "first donors then fields" (data driven approach) and "first fields then donors" (minimum change approach)

The data driven approach was selected as the default, with the option of switching to the minimum change approach when, for a given failed-edit household, the number of changes proposed by the data driven approach was exceedingly high compared with the number proposed by the minimum change approach

The two algorithms have been jointly used in order to balance the plausibility of the imputation actions with the preservation of the collected information.
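
The switching rule might be sketched as below. The slides do not say how "exceedingly high" was quantified, so the `max_ratio` threshold, the function name `choose_imputation`, and the example fields are assumptions made purely for illustration.

```python
def choose_imputation(data_driven, minimum_change, max_ratio=2.0):
    """Pick the proposal to apply to one failed-edit household.

    Each proposal maps a field name to its imputed value. The data driven
    proposal is the default; the minimum change proposal is used only when
    the data driven one changes more than `max_ratio` times as many fields.
    `max_ratio` is a hypothetical threshold, not a value from the source.
    """
    if len(data_driven) > max_ratio * len(minimum_change):
        return "minimum_change", minimum_change
    return "data_driven", data_driven

# Hypothetical proposals for one household.
data_driven = {"age": 34, "marital_status": "married", "relationship": "spouse",
               "sex": "F", "citizenship": "IT"}
minimum_change = {"age": 34, "relationship": "spouse"}

approach, proposal = choose_imputation(data_driven, minimum_change)
```

With five proposed changes against two, the sketch falls back to the minimum change proposal; with three against two it would keep the data driven one.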
