Post on 21-Jan-2016
An R vs SAS Experiment
Megan Pope and Gareth Clews
Office for National Statistics
R at ONS
• Open source software in ONS
• Supporting the government IT strategy
• Development of training for GSS
• R Development Group
i. Support use of R within ONS
ii. Increase user base
iii. Aim for incorporation in production systems
• Teaching R to a SAS audience
• Increasing usage
SAS at ONS
• Designated standard software
• Statistics Canada Generalised Estimation System (GES)
• Suite of SAS macros
• Calibration weights, domain estimates, variance estimates
ReGenesees
• Free R package
• R evolved Generalised software for sampling estimates and errors in surveys
• Developed by Italian Statistics Office (Istat)
R vs SAS
• Comparative study of complex survey estimation software
• Quality Improvement Fund (QIF)
• SAS (GES) v R (ReGenesees)
• Investigating open source in line with GSS strategy
Calibration
• Used if there is a relationship between auxiliary data and response variable
• An estimation procedure which constrains sample-based estimates of auxiliary variables to known totals (or accurate estimates)
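The constraint described above can be written out as a short worked formula. This is a sketch under assumptions: the slides do not say which distance function is used, so the linear (chi-square) case is shown purely for illustration, and the symbols $d_k$ (design weight), $w_k$ (calibrated weight), $x_k$ (auxiliary vector), $t_x$ (known total) and $s$ (sample) are notation introduced here.

$$
\min_{w}\ \sum_{k \in s} \frac{(w_k - d_k)^2}{2\,d_k}
\quad \text{subject to} \quad
\sum_{k \in s} w_k x_k = t_x
$$

$$
w_k = d_k\left(1 + x_k^{\top}\lambda\right),
\qquad
\lambda = \Big(\sum_{k \in s} d_k x_k x_k^{\top}\Big)^{-1}\Big(t_x - \sum_{k \in s} d_k x_k\Big)
$$

The calibrated weights stay as close as possible to the design weights while reproducing the known auxiliary totals exactly. The estimate of a survey variable $y$ is then $\sum_{k \in s} w_k y_k$.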
Surveys chosen and why ...
• Business surveys
• QSI – Cut-off sample
• BRES – Separate calibration totals; set thresholds for Winsorisation
• ABS – Biggest survey, with 4,000 strata; externally calibrated weights
Surveys chosen and why ...
• Social surveys
• LFS – Biggest survey; resource intensive
• LOS – longitudinal
• IPS – 2-stage calibration
Quarterly Stock Inquiry
• Cut-off sampling
• Combined ratio estimation
• Calibration to one auxiliary
• Estimates and variance estimates
• GES – Seven separate input files
• ReGenesees – Six simple commands
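To make "combined ratio estimation" with one auxiliary concrete, here is a minimal base R sketch. It does not use GES or ReGenesees, and every value in it (the toy sample, the column names d, x, y and the known total x_total) is hypothetical rather than QSI data. It only illustrates the idea: a single adjustment factor, computed across all strata, scales the design weights so that the weighted auxiliary total matches the known total.

```r
# Toy sample (hypothetical values, not ONS data)
sample_df <- data.frame(
  stratum = c("A", "A", "B", "B"),
  d       = c(10, 10, 5, 5),       # design weights
  x       = c(20, 30, 100, 140),   # auxiliary variable, known for every unit
  y       = c(4, 6, 25, 30)        # survey variable to be estimated
)

# Known population total of the auxiliary (assumed for illustration)
x_total <- 2000

# Combined ratio: one adjustment factor shared by all strata
x_hat <- sum(sample_df$d * sample_df$x)  # design-weighted estimate of the x total
g     <- x_total / x_hat                 # calibration adjustment factor

# Calibrated weights and the resulting estimate of the y total
sample_df$w <- sample_df$d * g
y_hat <- sum(sample_df$w * sample_df$y)

# The calibrated weights reproduce the known auxiliary total
calibrated_x_total <- sum(sample_df$w * sample_df$x)
```

A separate ratio estimator would instead compute one factor per stratum; the combined form pools the strata, which is the usual choice when stratum sample sizes are small.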
Quarterly Stock Inquiry - GES
Quarterly Stock Inquiry - ReGenesees
• design<-e.svydesign(data= ids= strata= weights= fpc=)
• template<-pop.template(data= calmodel= partition=)
• pop<-fill.template(universe= template=)
• population.check(df.population= data= calmodel= partition=)
• cal<-e.calibrate(design= df.population= sigma2=)
• est<-svystatTM(design= y= by= ,)
What we found ......
• Software comparison
• Time
• Missing values
• Programming
Conclusions/Recommendations
• ReGenesees successfully used in place of GES
• ReGenesees easier – less risk!
• GES more capable for some aspects and vice versa
• Recommend to explore further!
Questions