GENEralised software for Sampling Estimates and Errors in Surveys
Summary of the presentation
- Objectives and evolution of the software
- Software installation pre-requisites
- Data needed for Genesees
- Input data sets (characteristics, controls)
- Output: tables, file formats, data sets
- Structural Business Statistics (SBS) surveys using Genesees
  - Population of interest - Business Register
  - Domains of interest
  - SME Sampling strategy (current)
  - Variables of interest
- Case study
Objectives and evolution of the software (1/2)
Need to estimate variables of interest for social and economic statistics
Guarantee coherence among estimates in time and space
Improve the quality of the data produced (for example, in accordance with the SBS Council Regulation)
Methodology (Deville and Särndal, 1992)
Implemented by Falorsi P.D. and Falorsi S.
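For reference, the calibration approach of Deville and Särndal (1992) can be written as the constrained minimisation below. This is the generic textbook formulation, not a transcription of the Genesees code. The notation is assumed here: $d_k$ is the initial weight, $w_k$ the final weight, $\mathbf{x}_k$ the vector of auxiliary variables, $\mathbf{t}_x$ the vector of known totals, $q_k$ a unit-level scale factor and $G$ the distance function. The second line gives the closed-form solution for the linear (chi-square) distance only; other distance functions are solved iteratively.

```latex
\min_{w_k,\; k \in s} \; \sum_{k \in s} \frac{d_k}{q_k}\, G\!\left(\frac{w_k}{d_k}\right)
\quad \text{subject to} \quad
\sum_{k \in s} w_k \mathbf{x}_k = \mathbf{t}_x

G(u) = \tfrac{1}{2}(u - 1)^2
\;\Longrightarrow\;
w_k = d_k \left(1 + q_k\, \mathbf{x}_k^{\top} \boldsymbol{\lambda}\right),
\qquad
\boldsymbol{\lambda} =
\Bigl(\sum_{k \in s} d_k q_k\, \mathbf{x}_k \mathbf{x}_k^{\top}\Bigr)^{-1}
\Bigl(\mathbf{t}_x - \sum_{k \in s} d_k \mathbf{x}_k\Bigr)
```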
Objectives and evolution of the software (2/2)
- Genesees prototype for social statistics
- Genesees prototype for enterprise statistics (1992 as first reference year)
- Several contributions to the development of the software have since been provided by other Istat researchers
- New releases are delivered regularly
- Genesees is currently used for estimation in almost all Istat surveys
Software installation pre-requisites
- SAS for Windows
- SAS Language, Macro, IML, Stat, Graph
- HD ≥ 4 Mb; RAM ≥ 64 Mb
How to download Genesees:
- Go to http://www.istat.it/Metodologi/index.htm, then select “Metodi e Software per le indagini statistiche”
- Download the file “Genesees3.zip” and unzip it into the directory c:\Genesees
- E-mail [email protected] to obtain the starting password; you will then be informed about new releases of the software
Data needed for Genesees
Frame (example: Business Register) → to get the known totals of auxiliary variables as a reference structure
Survey respondent units → to compute the initial sampling weight correction factor and then to assign the final sampling weight to each unit
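As a rough illustration of how a known total from the frame turns initial weights into final weights, the sketch below applies linear calibration with a single auxiliary variable. It is a simplified stand-in, not the Genesees procedure: the function name `linear_calibration` and the toy figures are hypothetical, and only one auxiliary variable and the linear distance are handled.

```python
def linear_calibration(initial_weights, aux_values, known_total):
    """Return (correction_factors, final_weights) for ONE auxiliary variable.

    Assumption: linear (chi-square) distance with equal scale factors, so each
    correction factor is 1 + lam * x_k for a single multiplier lam.
    """
    direct_estimate = sum(d * x for d, x in zip(initial_weights, aux_values))
    denominator = sum(d * x * x for d, x in zip(initial_weights, aux_values))
    if denominator == 0:
        raise ValueError("auxiliary variable is zero for every respondent")

    # Multiplier chosen so that the weighted total hits the known total.
    lam = (known_total - direct_estimate) / denominator

    factors = [1 + lam * x for x in aux_values]
    final_weights = [d * g for d, g in zip(initial_weights, factors)]
    return factors, final_weights


# Hypothetical toy data: four respondents, known frame total of 130.
d = [10.0, 10.0, 20.0, 20.0]
x = [1.0, 2.0, 1.0, 3.0]
factors, weights = linear_calibration(d, x, 130.0)
calibrated_total = sum(w * xk for w, xk in zip(weights, x))
print(factors, weights, calibrated_total)
```

With these toy figures the direct estimate is 110, and the final weights reproduce the known total of 130 up to floating-point rounding.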
Input data sets (characteristics)
Input SAS data sets: “Noti” and “Inp”
“Noti” (variable names ≤ 8 characters):
- Planned population = domain of interest (alphanumeric variable; ≤ 15 characters)
- Totals of auxiliary variables (numeric variables; at least 1 variable)
“Inp” (variable names ≤ 8 characters):
- Id. Code (numeric variable)
- Planned population (as in “Noti”)
- Auxiliary variables (numeric variables; must be supplied in the same order as in “Noti”)
- Coef = initial weight, adjusted for unit non-response (numeric variable)
- Ck = “distance weight” (numeric variable; optional)
Input data sets (controls)
“Noti”:
- Planned popul. = . → procedure stops → data set “Noti-miss”
- Totals of aux. var. = . → 0
“Inp”:
- Id. Code = . → procedure stops → data set “Missing”
- Id. Code = double → data set “Codici-doppi”
- Auxiliary variables = . → 0
- Coef = . → 1 (no controls)
- Ck = . → 1
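The “Inp” controls listed above can be mimicked with a few lines of Python. This is only a sketch of the stated rules, not Genesees code: the function name `control_inp`, the dictionary keys and the use of `None` for a SAS missing value (`.`) are assumptions made for illustration, and the slide does not say whether duplicate codes stop the procedure, so they are simply reported here.

```python
from collections import Counter


def control_inp(rows, aux_vars):
    """Apply the "Inp" controls to a list of dicts (None = SAS missing value)."""
    missing = [r for r in rows if r.get("id") is None]
    if missing:
        # Missing Id. Code: the procedure stops and the rows go to "Missing".
        return {"stopped": True, "missing": missing, "codici_doppi": [], "rows": []}

    # Duplicate Id. Codes are collected, as in the "Codici-doppi" data set.
    counts = Counter(r["id"] for r in rows)
    doubles = [r for r in rows if counts[r["id"]] > 1]

    cleaned = []
    for r in rows:
        row = dict(r)  # copy, so the caller's data is left untouched
        for var in aux_vars:
            if row.get(var) is None:
                row[var] = 0  # missing auxiliary variable -> 0
        if row.get("coef") is None:
            row["coef"] = 1  # missing initial weight -> 1
        if row.get("ck") is None:
            row["ck"] = 1  # missing distance weight -> 1
        cleaned.append(row)
    return {"stopped": False, "missing": [], "codici_doppi": doubles, "rows": cleaned}


# Hypothetical records: one missing auxiliary value, one duplicated code.
sample = [
    {"id": 1, "domain": "A", "x1": 5, "coef": 10.0, "ck": None},
    {"id": 2, "domain": "A", "x1": None, "coef": None, "ck": 2.0},
    {"id": 2, "domain": "B", "x1": 3, "coef": 8.0, "ck": 1.0},
]
result = control_inp(sample, ["x1"])
print(result["codici_doppi"])
```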
Output tables
Output tables (summary descriptive statistics on the calibration estimation process):
- Table 1: Statistics on estimates and final weights for planned popul.
- Table 2: Statistics on initial weights correction factors
- Table 3: Statistics on estimates and initial weights
- Table 4: Prefixed parameters for the iterative estimation procedure
- Table 5: Known totals, direct and final estimates, and differences
- Tabulate 1: Controls on the domains: known totals, direct estimates, ratios between known totals and direct estimates, sample totals
- Tabulate 2: Sample size (respondents) and population estimate with direct weights
- Tabulate 3: Controls on domains without sample units
Output file formats
- “genesees.log” (SAS log)
- “stampa1.txt” – “stampa6.txt” (Tables)
- “stampe stime.htm” (Tables)
- SAS data sets (“*.sas7bdat”)
Output data sets (1/2)
Diagnostics (errors detected in the input step, if any):
- “missing” (Id. Code = .)
- “noti-miss” (Planned popul. = .)
- “vuoti” (domain is present in “Noti” but is not present in “Inp”)
- “codici-doppi” (Id. Code = double)
- “csenzat” (domain is present in “Inp” but is not present in “Noti”)
- “savestime” (shows the parameters entered)
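The two domain-mismatch diagnostics amount to set differences between the domains found in “Noti” and those found in “Inp”. A minimal sketch, with a hypothetical function name and made-up domain codes:

```python
def domain_diagnostics(noti_domains, inp_domains):
    """Compare the domains of "Noti" and "Inp" (illustrative only)."""
    noti = set(noti_domains)
    inp = set(inp_domains)
    return {
        # In "Noti" but with no respondent in "Inp".
        "vuoti": sorted(noti - inp),
        # In "Inp" but with no known totals in "Noti".
        "csenzat": sorted(inp - noti),
    }


# Hypothetical domain codes.
report = domain_diagnostics(["151", "152", "153"], ["151", "153", "154", "151"])
print(report)
```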
Output data sets (2/2)
Statistics and final weights:
- “Pesifin” (initial w.; corr. factor; final w.; id.; conta; domain)
- “stat”:
  • conta
  • max; min; sum; mean; var; cv (with reference to initial weights, correction factor and final weights)
  • iterations; maxiter; converge; constraints (c2); sample units in the domain (r2); dist. func.
- “stimedir”: domain; aux. var. totals; conta
- “stimefin”: known total; direct estimate; final estimate; conta; difference between final estimate and known total
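The descriptive statistics kept in “stat” (conta, max, min, sum, mean, var, cv) are straightforward to reproduce for any weight column. The sketch below is an illustration only: the function name is hypothetical, the variance uses the n − 1 divisor, and cv is taken as standard deviation over mean, both of which are assumptions about how Genesees defines them.

```python
import statistics


def weight_stats(values):
    """Summary statistics for one column of weights or correction factors."""
    mean = statistics.mean(values)
    # Sample variance (n - 1 divisor); defined as 0.0 for a single value.
    var = statistics.variance(values) if len(values) > 1 else 0.0
    return {
        "conta": len(values),  # number of units
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
        "mean": mean,
        "var": var,
        "cv": (var ** 0.5) / mean if mean else None,
    }


# Hypothetical final weights for one domain.
print(weight_stats([10.0, 20.0, 30.0]))
```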
Structural Business Statistics (SBS) Surveys using Genesees (1/2)
- Small and Medium Enterprises (SME) Survey
- Information and Communication Technologies (ICT) Survey
- Structure of Earnings Survey (SES)
- Labor Cost Survey (LCS)
- Prodcom
- SBS Preliminary Estimates
- …
Structural Business Statistics (SBS) Surveys using Genesees (2/2)
Estimation of economic variables on enterprises according to:
- Istat traditional data production on enterprises
- Structural Business Statistics (SBS) EU Council Regulation No 58/97
Deliverables:
- Preliminary estimates (1 estimation domain; t + 10 months)
- Final estimates (3 estimation domains; t + 18 months)
- Quality indicators and specific reports (3 estimation domains; t + 24 months)
  • Coefficient of Variation - CV (3 domains)
  • Item and unit non-response rate (1 domain)
  • Specific reports on survey strategy and principal economic activity
t = year of reference
Population of interest (1/2)
Number of Italian enterprises (SBS 2002)

Economic activity sector               Number of persons employed
(NACE Rev.1 Division)           1-9     10-19     20-49    50-249      250+       Total
Manufacturing (10-41)       463,052    54,359    26,229    10,506     1,553     555,699
Constructions (45)          513,156    18,403     5,124     1,139        78     537,900
Services (50-74)          2,555,631    49,268    17,405     6,597     1,261   2,630,162
Total                     3,531,839   122,030    48,758    18,242     2,892   3,723,761

Number of persons employed in Italian enterprises (SBS 2002)

Economic activity sector               Number of persons employed
(NACE Rev.1 Division)           1-9      10-19      20-49     50-249       250+       Total
Manufacturing (10-41)     1,230,838    730,888    774,903    951,764  1,176,772   4,865,165
Constructions (45)        1,045,386    237,915    145,778     98,091     47,850   1,575,020
Services (50-74)          4,464,548    643,597    517,363    634,725  1,459,231   7,719,464
Total                     6,740,772  1,612,400  1,438,044  1,684,580  2,683,853  14,159,649
Population of interest (2/2)
Table 1. Administrative data sources and variables provided
Variables | ASIA Register | Balance Sheets Data | Social security Data | Fiscal Data
Identity code  x x x x
Economic activity code  x x x x
Style of the firm  x x x
Firm location  x x
Economic variables  x x
Employment variables  x x x
Business Register ASIA
- Data sources: Tax Register, Chambers of Commerce, Social Security, Work Accident Insurance, Electric Power Board, SEAT telephone directory
- Statistical and probabilistic procedure for enterprises’ main economic activity detection
- Variables in the register are the result of standardization, normalization and integration of information provided by administrative sources
Domains of study (SBS final estimates)
Code   Type of domain (partition of population of interest)   Number of domains (in the partition)
DOM1   NACE Rev.1.1 Class (4-digit)                           461
DOM2   NACE Rev.1.1 Group (3-digit) by size-class             1,047
DOM3   NACE Rev.1.1 Division (2-digit) by region              984
SME Sampling strategy (current)
- N ≈ 3,723,000 enterprises (Business Register); enterprises with fewer than 10 persons employed cover 94.8% of the total enterprises and 47.8% of the total employment
- Stratified simple random sample
- H ≈ 26,000 strata (NACE Rev.1.1, Size class, Region)
- n ≈ 120,000 (negative coordination with other SBS Surveys, multivariable and multidomain sample allocation)
- Survey technique: postal questionnaire; 2 call-backs
- Calibration estimators methodology (Deville and Särndal, 1992)
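Under stratified simple random sampling, the initial weight of a unit is commonly taken as the stratum population size divided by the stratum sample size, then inflated for unit non-response. The sketch below shows one common way of doing this, not necessarily the adjustment used for the SME survey; the function name, the stratum labels and all counts are hypothetical.

```python
def stratum_weights(frame_counts, sample_counts, respondent_counts):
    """Initial weight per stratum, adjusted for unit non-response.

    Assumption: respondents within a stratum are treated as a simple random
    subsample of the selected units, so the adjusted weight is N_h / r_h.
    """
    weights = {}
    for stratum, n_frame in frame_counts.items():
        n_sample = sample_counts[stratum]
        n_resp = respondent_counts.get(stratum, 0)
        if n_resp == 0:
            # No respondents: no weight can be assigned in this stratum.
            weights[stratum] = None
            continue
        design_weight = n_frame / n_sample  # N_h / n_h
        nonresponse_adjustment = n_sample / n_resp  # n_h / r_h
        weights[stratum] = design_weight * nonresponse_adjustment
    return weights


# Hypothetical strata: frame size, selected sample, respondents.
print(stratum_weights({"h1": 1000, "h2": 200},
                      {"h1": 50, "h2": 20},
                      {"h1": 40, "h2": 0}))
```

A stratum left without respondents is flagged rather than weighted, since in practice it would have to be collapsed with a neighbouring stratum or handled separately.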
Variables of interest
- Turnover
- Value added at factor cost
- Employment
- Total purchases of goods and services
- Personnel costs
- Wages and salaries
- Production value
- …
Totals of the variables of study are estimated with reference to subpopulations of interest (domains), as requested by the SBS EU Regulation
Case study
Data base used for the study:
Population: N = 291,202 (joint-stock companies in the BR)
Sample size: n = 33,608 (11.5%)
Domain of study considered:
DOMINIO: NACE Rev. 1.1 group
Starting picture
Thank you!