Post on 18-Dec-2015
![Page 1: Population Health Surveys Bootstrap Hands-on Workshop Yves Beland, CCHS senior methodologist Larry MacNabb, CCHS dissemination manager developed by François](https://reader030.vdocuments.us/reader030/viewer/2022032703/56649d265503460f949fc915/html5/thumbnails/1.jpg)
Population Health Surveys Bootstrap Hands-on Workshop
Yves Beland, CCHS senior methodologist
Larry MacNabb, CCHS dissemination manager
developed by
François Brisebois CCHS/NPHS senior [email protected]
Purpose of the presentation
Justify the use, understand the theory, and
get familiar with the bootstrap technique
Demystify all illusions about using the
bootstrap technique for variance estimation
Outline
Context
NPHS \ CCHS complex survey design
Variance estimation \ Bootstrap 101
Data support \ using the bootvar program
Why bootstrap?
CV lookup tables
Historical info about variance estimation for NPHS
Variance estimation with other software programs
Future for STC Health Surveys (re. bootstrap)
Context
A data user is interested in producing some results
1- Compute an estimate (total, ratio, etc.)
2- Compute the precision of the estimate (variance,
coefficient of variation (CV), etc.)
Context
1- Compute an estimate
Is not a problem!
Use the provided survey weight with NPHS/CCHS files
Context
1- Compute an estimate (cont’d)
Why use the survey weight?

NPHS Estimates for Diabetes - Canada

| | # People | % People |
|---|---|---|
| Unweighted | 620 | 4.1 |
| Weighted | 865,910 | 3.5 |

Source: 1998 Master Health file

Conclusion: ALWAYS USE THE WEIGHTS
Context
2- Compute the precision of an estimate
Is a problem!!

NPHS Estimates for Diabetes - Canada: STANDARD DEVIATIONS (% People)

| | Estimate | Std Dev. |
|---|---|---|
| Unweighted | 4.1 | 0.162 |
| Weighted | 3.5 | 0.151 |
| Bootstrap weights | 3.5 | 0.177 |

Source: 1998 Master Health file
Context
2- Compute the precision of the estimate (cont’d)
Scaled weights:
Scaled weight = weight / mean(weight)
Used to overcome problems with the computation of the variance for some statistics in SAS
Reference: paper by G. Roberts et al.
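The scaling formula above can be illustrated with a minimal Python sketch. The function name `scale_weights` and the eight example weights are hypothetical, chosen only for illustration. Dividing by the mean leaves the relative size of the weights unchanged but makes them average 1, so they sum to the number of records rather than to the population size.

```python
def scale_weights(weights):
    """Divide each survey weight by the mean weight (scaled weight = weight / mean(weight))."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Hypothetical survey weights for eight respondents.
weights = [100, 200, 300, 400, 500, 600, 700, 800]
scaled = scale_weights(weights)

# The scaled weights sum to the sample size instead of the population total.
scaled_sum = sum(scaled)
```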
Context
2- Compute the precision of the estimate (cont’d)
Why such a difference?
Answer: The complex survey design is the main cause (other factors to be discussed later)
Note: CCHS and NPHS have slightly different frames but are both considered as complex survey designs
Complex survey design
1- Each province is divided into strata
[Figure: Province A divided into Stratum #1 and Stratum #2]
Complex survey design
2- Selection of clusters within each stratum
[Figure: clusters selected within Stratum #1 and Stratum #2 of Province A]
Complex survey design
3- Selection of households within each cluster
[Figure: households selected within the clusters of Stratum #1 and Stratum #2 of Province A]
Complex survey design
How does the sample design affect the precision of estimates?
Stratification decreases variability (more precise)
Clustering increases variability (less precise)
Overall, the multistage design has the effect of increasing variability (less precise than simple random sampling, SRS)
Complex survey design
So why use a multistage cluster sample design anyway?
Pros:
Efficient for interviewing (less travel, less costly)
Better coverage of the entire region of interest
Cons:
Problems for variance estimation
Bootstrap Method
Variance estimation with complex multistage cluster sample design:
The exact formula for variance estimation is too complex; an approximate approach is required
NOTE: accounting for the design in variance estimation is as crucial as using the sampling weights for the estimation of a statistic
Bootstrap Method
Approximate methods for variance estimation:
Taylor linearization
Re-sampling methods:
Balanced Repeated Replication
Jackknife
Bootstrap
Bootstrap Method
Principle:
You want to estimate how precise your estimate of the number of smokers in Canada is
You could draw 500 totally new samples and compare the 500 estimates you would get from these samples. The variance of these 500 estimates would indicate the precision.
Problem: drawing 500 new samples is $$$
Solution: Use your sample as a population, and take many smaller subsamples from it.
Bootstrap 101
How Bootstrap weights are created (the secret is finally revealed!!!)
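Starting point: Full data file (example presented for a given stratum)

| ID | Wgt | Cluster |
|---|---|---|
| A | 10 | 1 |
| B | 10 | 1 |
| C | 10 | 1 |
| D | 10 | 2 |
| E | 10 | 2 |
| F | 10 | 2 |
| G | 10 | 3 |
| H | 10 | 3 |
| I | 10 | 4 |
| J | 10 | 4 |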
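Select n-1 clusters among n within each stratum (with replacement)
B1 = # of times the cluster is selected

| ID | Wgt | Cluster | B1 |
|---|---|---|---|
| A | 10 | 1 | 1 |
| B | 10 | 1 | 1 |
| C | 10 | 1 | 1 |
| D | 10 | 2 | 1 |
| E | 10 | 2 | 1 |
| F | 10 | 2 | 1 |
| G | 10 | 3 | 0 |
| H | 10 | 3 | 0 |
| I | 10 | 4 | 1 |
| J | 10 | 4 | 1 |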
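Repeat the process 500 times (*BOOTSTRAP REPLICATES*)

| ID | Wgt | Cluster | B1 | B2 | ... | B500 |
|---|---|---|---|---|---|---|
| A | 10 | 1 | 1 | 0 | ... | 3 |
| B | 10 | 1 | 1 | 0 | ... | 3 |
| C | 10 | 1 | 1 | 0 | ... | 3 |
| D | 10 | 2 | 1 | 1 | ... | 0 |
| E | 10 | 2 | 1 | 1 | ... | 0 |
| F | 10 | 2 | 1 | 1 | ... | 0 |
| G | 10 | 3 | 0 | 0 | ... | 0 |
| H | 10 | 3 | 0 | 0 | ... | 0 |
| I | 10 | 4 | 1 | 2 | ... | 0 |
| J | 10 | 4 | 1 | 2 | ... | 0 |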
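Apply the survey weight (Wgt) (*BOOTSTRAP WEIGHTS*)

| ID | Wgt | Cluster | B1 | B2 | ... | B500 |
|---|---|---|---|---|---|---|
| A | 10 | 1 | 10 | 0 | ... | 30 |
| B | 10 | 1 | 10 | 0 | ... | 30 |
| C | 10 | 1 | 10 | 0 | ... | 30 |
| D | 10 | 2 | 10 | 10 | ... | 0 |
| E | 10 | 2 | 10 | 10 | ... | 0 |
| F | 10 | 2 | 10 | 10 | ... | 0 |
| G | 10 | 3 | 0 | 0 | ... | 0 |
| H | 10 | 3 | 0 | 0 | ... | 0 |
| I | 10 | 4 | 10 | 20 | ... | 0 |
| J | 10 | 4 | 10 | 20 | ... | 0 |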
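Adjust for the fact that we picked n-1 among n (factor = n / (n-1) = 4/3 = 1.33)

| ID | Wgt | Cluster | B1 | B2 | ... | B500 |
|---|---|---|---|---|---|---|
| A | 10 | 1 | 13 | 0 | ... | 40 |
| B | 10 | 1 | 13 | 0 | ... | 40 |
| C | 10 | 1 | 13 | 0 | ... | 40 |
| D | 10 | 2 | 13 | 13 | ... | 0 |
| E | 10 | 2 | 13 | 13 | ... | 0 |
| F | 10 | 2 | 13 | 13 | ... | 0 |
| G | 10 | 3 | 0 | 0 | ... | 0 |
| H | 10 | 3 | 0 | 0 | ... | 0 |
| I | 10 | 4 | 13 | 27 | ... | 0 |
| J | 10 | 4 | 13 | 27 | ... | 0 |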
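USING THE BOOTSTRAP WGTS: Estimate the number of smokers

| ID | Wgt | Cluster | Smoke | B1 | B2 | ... | B500 |
|---|---|---|---|---|---|---|---|
| A | 10 | 1 | X | 13 | 0 | ... | 40 |
| B | 10 | 1 | X | 13 | 0 | ... | 40 |
| C | 10 | 1 | | 13 | 0 | ... | 40 |
| D | 10 | 2 | | 13 | 13 | ... | 0 |
| E | 10 | 2 | | 13 | 13 | ... | 0 |
| F | 10 | 2 | | 13 | 13 | ... | 0 |
| G | 10 | 3 | X | 0 | 0 | ... | 0 |
| H | 10 | 3 | | 0 | 0 | ... | 0 |
| I | 10 | 4 | | 13 | 27 | ... | 0 |
| J | 10 | 4 | X | 13 | 27 | ... | 0 |
| Estimated smokers | 40 | | | 39 | 27 | ... | 80 |

T = 40 (estimate from the survey weight)

$$\mathrm{Var} = \frac{\sum_{i=1}^{500} (B_i - \bar{B})^2}{499}$$

where $B_i$ is the estimate obtained with the $i$-th set of bootstrap weights and $\bar{B}$ is the mean of the 500 bootstrap estimates.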
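The toy example above can be reproduced with a short Python sketch. It is only an illustration of the simplified slide version: the function names, the seed, and the data layout are assumptions, and the sketch leaves out the nonresponse adjustments and post-stratification that the actual bootstrap weights go through. Because the draws are random, the replicate values will not match the B1, B2 and B500 columns shown on the slide.

```python
import random
from statistics import variance

# Toy stratum from the slides: ID -> (survey weight, cluster, smoker flag).
SAMPLE = {
    "A": (10, 1, 1), "B": (10, 1, 1), "C": (10, 1, 0),
    "D": (10, 2, 0), "E": (10, 2, 0), "F": (10, 2, 0),
    "G": (10, 3, 1), "H": (10, 3, 0),
    "I": (10, 4, 0), "J": (10, 4, 1),
}


def make_bootstrap_weights(sample, n_reps=500, seed=2024):
    """Return {ID: [bootstrap weight for each replicate]} for one stratum."""
    rng = random.Random(seed)
    clusters = sorted({cluster for _, cluster, _ in sample.values()})
    n = len(clusters)
    factor = n / (n - 1)  # adjustment for picking n-1 clusters among n
    boot = {rid: [] for rid in sample}
    for _ in range(n_reps):
        # Select n-1 clusters with replacement; a cluster may appear 0, 1, 2... times.
        picked = rng.choices(clusters, k=n - 1)
        for rid, (wgt, cluster, _) in sample.items():
            boot[rid].append(wgt * picked.count(cluster) * factor)
    return boot


def weighted_smokers(weights_by_id, sample):
    """Weighted count of smokers for one set of weights."""
    return sum(w * sample[rid][2] for rid, w in weights_by_id.items())


boot = make_bootstrap_weights(SAMPLE)

# Point estimate T from the survey weight.
full_estimate = weighted_smokers({rid: rec[0] for rid, rec in SAMPLE.items()}, SAMPLE)

# One estimate per replicate, then the variance with divisor 500 - 1 = 499.
replicate_estimates = [
    weighted_smokers({rid: boot[rid][b] for rid in SAMPLE}, SAMPLE)
    for b in range(500)
]
boot_variance = variance(replicate_estimates)
```

`statistics.variance` divides by the number of replicates minus one, which matches the divisor of 499 in the formula above.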
Bootstrap 101
How Bootstrap replicates are built (cont’d)
The “real” recipe
1- Subsampling of clusters (SRS) within strata
2- Apply (initial design) weight
3- Adjust weight for selection of n-1 among n
4- Apply all standard adjustments (nonresponse, share, etc.)
5- Post-stratification to population counts
Bootstrap 101
How Bootstrap replicates are built (cont’d)
The bootstrap method is intended to mimic the approach used for the sampling and weighting processes
Be careful: some software programs say they include the bootstrap technique; what they really do is skip steps #4 and #5 and use the final weight directly in step #2
Bootstrap 101
STC Methodologists create the bootstrap weight files.
Can you create your own bootstrap wgt file? No
Why? Because to do so you need to know:
The design information, i.e. strata, clusters (to generate the bootstrap subsamples)
The definition of all adjustment classes (including post-stratification)
Bootstrap 101
The bootstrap wgt files are:
Available for all files (except PUMF - confidentiality)
Distributed with the data files in separate files
The bootstrap wgt files contain:
IDs (REALUKEY/SAMPLEID, PERSONID)
Final sampling weight (WTxx)
500 Bootstrap weights (BSW1--BSW500)
Bootstrap - Support
NPHS/CCHS provides data users with SAS & SPSS macro programs to compute bootstrap variances
Macros simplify the computation of bootstrap variance estimates for totals, ratios, differences of ratios, regressions (linear and logistic), and basic generalized linear models
Come with documentation & examples
French and English
Referred to as “bootvar”
Example: Step by Step
Let’s get to work!
Goal: Interested in estimating the number of diabetics (total)
NPHS 1998-99 Dummy file (see information sheet)
Diabetes (CCC6_1J), some totals and ratios
NPHS 1998-99 Dummy Health File

| | # | % of population |
|---|---|---|
| Total cases of diabetes | DIAB | DIAB / TOTAL |
Example: Step by Step
STEP #1
Create your « analysis data file »
Read NPHS\CCHS data file
Prepare dummy variables necessary for your analysis
Keep only necessary variables (include geography desired)
Run the analysis to get point estimates only (not necessary but recommended)
STEP #2
Compute your variances with bootvar
Location of INPUT files:
Your « analysis data file »
The bootstrap weights file
Geography desired
Number of bootstrap weights to use
Specify the desired analysis:
Totals, ratios, diff of ratios
Regression (linear & logit)
Generalized linear modeling
Example: Step by Step
Step #1: On your own
(but can use the examples provided as a starting point)
Step #2: Use the provided Bootvar program
STEP #1
Read input file
Create dummy variables
Keep only necessary variables
Run the analysis to get point estimates
Create dummy variables
For qualitative/categorical variables, we need to identify which value(s) we are interested in. This is done through the creation of a dummy variable
Dummy variable = 1 for characteristic of interest
= 0 otherwise
STEP #1
Create dummy variable: example #1
During the past 12 months, how often did you drink alcoholic beverages? (ALC8_2)
1=Less than once a month
2=Once a month
3=2 to 3 times a month
4=Once a week
5=2 to 3 times a week
6=4 to 6 times a week
7=Every day
Interested in categories 1 to 4 (once a week or less)
DRINK = 1 if ALC8_2 is 1, 2, 3 or 4
= 0 otherwise
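As a language-neutral illustration of the DRINK rule above (the workshop itself does this in SAS or SPSS), here is a minimal Python sketch. The helper name `make_dummy` and the sample responses are hypothetical.

```python
def make_dummy(value, categories_of_interest):
    """Return 1 if the response falls in the categories of interest, 0 otherwise."""
    return 1 if value in categories_of_interest else 0


# Hypothetical ALC8_2 responses for seven respondents.
alc8_2 = [1, 4, 5, 7, 2, 6, 3]

# DRINK = 1 for categories 1 to 4 (once a week or less), 0 otherwise.
drink = [make_dummy(value, {1, 2, 3, 4}) for value in alc8_2]
```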
STEP #1
Create dummy variable: example #2
Diabetes (CCC8_1J): 1=Yes, 2=No, 6=Not applicable, 7=Don’t know, 9=Not stated
Sex (DHC8_SEX): 1=Male, 2=Female
Interested in “males having diabetes”
mdiab = 1 if CCC8_1J = 1 and DHC8_SEX = 1
= 0 otherwise
STEP #1
Create dummy variable: example #2
How to use the dummy variable to get an estimate
Total:

| MDIAB | WT56 | (product) |
|---|---|---|
| 0 | 100 | 0 |
| 0 | 200 | 0 |
| 1 | 300 | 300 |
| 1 | 400 | 400 |
| 1 | 500 | 500 |
| 0 | 600 | 0 |
| 0 | 700 | 0 |
| 0 | 800 | 0 |

ESTIMATE = 1200
In SAS:
proc freq;
  tables mdiab;
  weight wt56;
run;
STEP #1
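Create dummy variable: example #2
How to use the dummy variable to get an estimate
Ratio:

| MDIAB | TOTAL | WT56 | (num) | (den) |
|---|---|---|---|---|
| 0 | 1 | 100 | 0 | 100 |
| 0 | 1 | 200 | 0 | 200 |
| 1 | 1 | 300 | 300 | 300 |
| 1 | 1 | 400 | 400 | 400 |
| 1 | 1 | 500 | 500 | 500 |
| 0 | 1 | 600 | 0 | 600 |
| 0 | 1 | 700 | 0 | 700 |
| 0 | 1 | 800 | 0 | 800 |
| Sum | | | 1200 | 3600 |

ESTIMATE = 1200 / 3600 = 33%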
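The two worked tables (total and ratio) boil down to weighted sums of dummy variables. A minimal Python sketch using the eight rows from the slides follows; the function name `weighted_total` is an illustrative assumption, not part of bootvar.

```python
def weighted_total(flags, weights):
    """Sum of dummy * weight over all records."""
    return sum(flag * weight for flag, weight in zip(flags, weights))


# Rows from the slide: MDIAB dummy, TOTAL dummy (always 1) and the weight WT56.
mdiab = [0, 0, 1, 1, 1, 0, 0, 0]
total = [1] * len(mdiab)
wt56 = [100, 200, 300, 400, 500, 600, 700, 800]

numerator = weighted_total(mdiab, wt56)      # estimated total
denominator = weighted_total(total, wt56)    # estimated population size
ratio_percent = 100 * numerator / denominator
```

Running the same two sums with each of the 500 bootstrap weights in place of WT56 is what produces the replicate estimates used for the variance.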
STEP #1
See example in SPSS

Diabetes (CCC6_1J), some totals and ratios
NPHS 1998-99 Dummy Health File

| | # | % of population |
|---|---|---|
| Diabetes (Nfld, Man & BC) | 169,700 | 3.1 |
STEP #1
Now your turn! (exercise #1)
Add asthma (CCC8_1C) to the table
Use existing program (step1.sas) and add SPSS code to create a dummy variable for asthma; and then get the results

Diabetes & Asthma, some totals and ratios
NPHS 1998-99 Dummy Health File

Table to fill in:

| | # | % of population |
|---|---|---|
| Diabetes (Nfld, Man & BC) | 169,700 | 3.1 |
| Asthma (Nfld, Man & BC) | ASTHMA | ASTHMA / TOTAL |

Results:

| | # | % of population |
|---|---|---|
| Diabetes (Nfld, Man & BC) | 169,700 | 3.1 |
| Asthma (Nfld, Man & BC) | 446,800 | 8.1 |
Step #2: Bootvar Program
Created by methodologists in 1997 (first used with NPHS cycle 2 data)
Version 1.0
One single program (over 1,000 lines of code) divided into 4 sections
Users have to adapt the program to their requests; changes in 3 sections
SAS: bootvar.sas / bootvarf.sas
SPSS: beta version available only on request (bvr_b.sps)
Step #2: Bootvar Program
Version 2.0
Justifications:
Compatible with SAS 8+
Centralize the code where modifications have to be made by the user
Can be used with both NPHS and CCHS data files
Now consists of 2 programs:
One contains the code users need to modify for their requests
One contains the code users do not have to modify (macros)
Step #2: Bootvar Program
Version 2.0
SAS version:
bootvare_v20.sas / bootvarf_v20.sas
macroe_v20.sas / macrof_v20.sas
SPSS version:
bootvare_v21.sps / bootvarf_v21.sps
macroe_v21.sps / macrof_v21.sps
STEP #2: Use of bootvar
Point estimates have already been obtained, let us now estimate the sampling variability of those estimates
Go through the bootvar program (bootvare_v21.sps)
STEP #2: Use of bootvar
See example in SPSS

Diabetes & Asthma, some totals and ratios
NPHS 1998-99 Dummy Health File
Nfld, Man & B.C. only

| | # | 95% C.I. | % of pop. | 95% C.I. |
|---|---|---|---|---|
| Diabetes | 169,700 | (133,400 ; 205,900) | 3.1 | (2.4 ; 3.8) |
| Asthma | 446,800 | | 8.1 | |
STEP #2
Now your turn! (exercise #2)
Compute confidence intervals for asthma
Use bootvare_v21.sps and adjust it to obtain desired results (use the already set up step2.sps program for this exercise)

Diabetes & Asthma, some totals and ratios
NPHS 1998-99 Dummy Health File
Nfld, Man & B.C. only

Table to fill in:

| | # | 95% C.I. | % of pop. | 95% C.I. |
|---|---|---|---|---|
| Diabetes | 169,700 | (133,400 ; 205,900) | 3.1 | (2.4 ; 3.8) |
| Asthma | 446,800 | ? | 8.1 | ? |

Results:

| | # | 95% C.I. | % of pop. | 95% C.I. |
|---|---|---|---|---|
| Diabetes | 169,700 | (133,400 ; 205,900) | 3.1 | (2.4 ; 3.8) |
| Asthma | 446,800 | (381,700 ; 511,900) | 8.1 | (6.9 ; 9.3) |
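The slides do not spell out how a bootstrap variance is turned into a 95% interval. Assuming the usual normal approximation (estimate ± 1.96 × bootstrap standard deviation), the relationship between the standard deviation, the CV and the interval can be sketched in Python as follows. All function names are illustrative, and the standard deviation is backed out of the published diabetes interval rather than taken from bootvar output.

```python
Z_95 = 1.96  # normal quantile assumed for a 95% confidence interval


def confidence_interval(estimate, sd, z=Z_95):
    """Normal-approximation interval: estimate +/- z * sd."""
    half_width = z * sd
    return estimate - half_width, estimate + half_width


def cv_percent(estimate, sd):
    """Coefficient of variation expressed as a percentage of the estimate."""
    return 100.0 * sd / estimate


def sd_from_interval(lower, upper, z=Z_95):
    """Back out the standard deviation implied by a symmetric interval."""
    return (upper - lower) / (2 * z)


# Diabetes total for Nfld, Man & B.C.: 169,700 (133,400 ; 205,900).
sd_diab = sd_from_interval(133_400, 205_900)
cv_diab = cv_percent(169_700, sd_diab)
low, high = confidence_interval(169_700, sd_diab)
```

Under this assumption the implied CV for the diabetes total is a little under 11%. The rebuilt interval differs slightly from the published one because the published figures are rounded to the nearest hundred.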
Bootstrap - More
Why 500 bootstrap weights?
Size of file (for dissemination)
Time of computation (for an average PC)
Accuracy
Use more bootstrap weights?
Faster PC
Accuracy for small domains and more complex analysis methods
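Bootstrap - More
Confidentiality revealed from the bootstrap weights

Cluster identifier known:

| ID | Wgt | Cluster | B1 | B2 | ... | B500 |
|---|---|---|---|---|---|---|
| A | 10 | 1 | 13 | 0 | ... | 33 |
| B | 10 | 1 | 13 | 0 | ... | 33 |
| C | 10 | 1 | 13 | 0 | ... | 33 |
| D | 10 | 2 | 13 | 14 | ... | 0 |
| E | 10 | 2 | 13 | 14 | ... | 0 |
| F | 10 | 2 | 13 | 14 | ... | 0 |
| G | 10 | 3 | 0 | 0 | ... | 0 |
| H | 10 | 3 | 0 | 0 | ... | 0 |
| I | 10 | 4 | 13 | 29 | ... | 0 |
| J | 10 | 4 | 13 | 29 | ... | 0 |

Cluster identifier hidden (?):

| ID | Wgt | Cluster | B1 | B2 | ... | B500 |
|---|---|---|---|---|---|---|
| A | 10 | ? | 13 | 0 | ... | 33 |
| B | 10 | ? | 13 | 0 | ... | 33 |
| C | 10 | ? | 13 | 0 | ... | 33 |
| D | 10 | ? | 13 | 14 | ... | 0 |
| E | 10 | ? | 13 | 14 | ... | 0 |
| F | 10 | ? | 13 | 14 | ... | 0 |
| G | 10 | ? | 0 | 0 | ... | 0 |
| H | 10 | ? | 0 | 0 | ... | 0 |
| I | 10 | ? | 13 | 29 | ... | 0 |
| J | 10 | ? | 13 | 29 | ... | 0 |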
Bootstrap - More
Confidentiality revealed from the bootstrap weights (cont’d)
How can PUMF users estimate their exact variances?
Remote access
Provide dummy file (same structure as master files but contain dummy data)
Test programs and send by e-mail
Research Data Centre
Regional Offices
Why Bootstrap?
Other techniques examined: Taylor, Jackknife
Taylor:
Need to define a linear equation for each statistic examined
Jackknife:
Cannot disseminate because of confidentiality
Number of replicates depends on the number of strata (large number of strata in 1996 makes it impossible to disseminate)
Why Bootstrap?
Bootstrap:
Handles survey designs with many strata more easily
Sets of 500 bootstrap weights can be distributed to data users
Recommended (over the jackknife) for estimating the variance of nonsmooth functions like quantiles, LICO
Reference: “Bootstrap Variance Estimation for the National Population Health Survey”, D. Yeo, H. Mantel, and T.-P. Liu. 1999, ASA Conference.
Bootvar: exercise #3
Results for diabetes broken down by sex and province

Diabetes, some totals and ratios
NPHS 1998-99 Dummy Health File

Table to fill in (same layout for Manitoba and B.C.):

| | # | % of pop. |
|---|---|---|
| Nfld | DIAB | DIAB / TOTAL |
| Nfld, Males | MDIAB | MDIAB / MTOTAL |
| Nfld, Females | FDIAB | FDIAB / FTOTAL |

Results:

| | # | 95% C.I. | % of pop. | 95% C.I. |
|---|---|---|---|---|
| Nfld, Man & BC | 169,700 | (133,400 ; 205,900) | 3.1 | (2.4 ; 3.8) |
| Nfld | 24,900 | (18,200 ; 31,500) | 4.6 | (3.4 ; 5.9) |
| Nfld, Males | 9,800 | (4,600 ; 14,700) | 3.7 | (1.7 ; 5.6) |
| Nfld, Females | 15,100 | (10,000 ; 20,100) | 5.6 | (3.7 ; 7.4) |
| Manitoba | 32,300 | (20,400 ; 44,200) | 3.0 | (1.9 ; 4.1) |
| Manitoba, Males | 15,800 | (7,300 ; 24,400) | 3.0 | (1.3 ; 4.5) |
| Manitoba, Females | 16,500 | (8,000 ; 25,000) | 3.0 | (1.5 ; 4.7) |
| B.C. | 112,500 | (79,300 ; 145,600) | 2.9 | (2.0 ; 3.7) |
| B.C., Males | 68,700 | (43,500 ; 93,900) | 3.5 | (2.2 ; 4.8) |
| B.C., Females | 43,700 | (22,200 ; 65,300) | 2.2 | (1.1 ; 3.4) |
Bootvar: Tricks
If you need to create a dummy variable for a characteristic based on many variables:
Example: Males with diabetes
First, create dummy variables for each individual variable (males, diabetes)
Then, create the dummy variable for the characteristic by multiplying the individual dummy variables
Bootvar: Tricks
Example:
Males = 1,0 (MALES)
Diabetes = 1,0 (DIAB)
Males having diabetes (MDIAB) = MALES * DIAB

| MALES | DIAB | MDIAB = MALES * DIAB |
|---|---|---|
| 1 | 0 | 0 |
| 1 | 1 | 1 |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
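The multiplication trick works because the product of two 0/1 dummies is 1 only when both are 1, which is the logical "and" of the two characteristics. A minimal Python sketch using the four rows of the table above:

```python
# The four combinations shown in the table.
males = [1, 1, 0, 0]
diab = [0, 1, 0, 1]

# MDIAB = MALES * DIAB: 1 only for males who have diabetes.
mdiab = [m * d for m, d in zip(males, diab)]

# The same result written as a logical "and", for comparison.
mdiab_logical = [int(m == 1 and d == 1) for m, d in zip(males, diab)]
```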
Bootvar: Tricks
Use the REGION parameter in bootvar to specify a “stratification” variable (doesn’t have to be a geographic variable!)
Example: REGION = sex will produce results by sex
CV look-up tables
What is it?
Approximate sampling variability tables
Produced for Canada, each province, and by age groups for Canada (also by Health Regions for cycle 2)
Useful only for categorical estimates
Totals & ratios only
CV look-up tables
Approximate Sampling Variability Tables for MANITOBA - Selected Members
Rows: numerator of percentage ('000). Columns: estimated percentage.

| Numerator ('000) | 0.1% | 1.0% | 2.0% | 5.0% | 10.0% | 15.0% | 20.0% | 25.0% | 30.0% | 35.0% | 40.0% | … |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 103.6 | 103.2 | 102.6 | 101.1 | 98.4 | 95.6 | 92.7 | 89.8 | 86.7 | 83.6 | 80.3 | |
| 2 | ******** | 72.9 | 72.6 | 71.5 | 69.5 | 67.6 | 65.6 | 63.5 | 61.3 | 59.1 | 56.8 | |
| 3 | ******** | 59.6 | 59.3 | 58.3 | 56.8 | 55.2 | 53.5 | 51.8 | 50.1 | 48.3 | 46.4 | |
| 4 | ******** | 51.6 | 51.3 | 50.5 | 49.2 | 47.8 | 46.4 | 44.9 | 43.4 | 41.8 | 40.2 | |
| 5 | ******** | 46.1 | 45.9 | 45.2 | 44.0 | 42.7 | 41.5 | 40.2 | 38.8 | 37.4 | 35.9 | |
| 6 | ******** | 42.1 | 41.9 | 41.3 | 40.2 | 39.0 | 37.9 | 36.7 | 35.4 | 34.1 | 32.8 | |
| 7 | ******** | 39.0 | 38.8 | 38.2 | 37.2 | 36.1 | 35.0 | 33.9 | 32.8 | 31.6 | 30.4 | |
| 8 | ******** | 36.5 | 36.3 | 35.7 | 34.8 | 33.8 | 32.8 | 31.7 | 30.7 | 29.6 | 28.4 | |
| 9 | ******** | 34.4 | 34.2 | 33.7 | 32.8 | 31.9 | 30.9 | 29.9 | 28.9 | 27.9 | 26.8 | |
| 10 | ******** | 32.6 | 32.5 | 32.0 | 31.1 | 30.2 | 29.3 | 28.4 | 27.4 | 26.4 | 25.4 | |
| 11 | ******** | ******** | 30.9 | 30.5 | 29.7 | 28.8 | 28.0 | 27.1 | 26.2 | 25.2 | 24.2 | |
| 12 | ******** | ******** | 29.6 | 29.2 | 28.4 | 27.6 | 26.8 | 25.9 | 25.0 | 24.1 | 23.2 | |
| 13 | ******** | ******** | 28.5 | 28.0 | 27.3 | 26.5 | 25.7 | 24.9 | 24.1 | 23.2 | 22.3 | |
| 14 | ******** | ******** | 27.4 | 27.0 | 26.3 | 25.5 | 24.8 | 24.0 | 23.2 | 22.3 | 21.5 | |
| 15 | ******** | ******** | 26.5 | 26.1 | 25.4 | 24.7 | 23.9 | 23.2 | 22.4 | 21.6 | 20.7 | |
| 16 | ******** | ******** | 25.7 | 25.3 | 24.6 | 23.9 | 23.2 | 22.4 | 21.7 | 20.9 | 20.1 | |
| 17 | ******** | ******** | 24.9 | 24.5 | 23.9 | 23.2 | 22.5 | 21.8 | 21.0 | 20.3 | 19.5 | |
| 18 | ******** | ******** | 24.2 | 23.8 | 23.2 | 22.5 | 21.9 | 21.2 | 20.4 | 19.7 | 18.9 | |
| 19 | ******** | ******** | 23.5 | 23.2 | 22.6 | 21.9 | 21.3 | 20.6 | 19.9 | 19.2 | 18.4 | |
| 20 | ******** | ******** | 22.9 | 22.6 | 22.0 | 21.4 | 20.7 | 20.1 | 19.4 | 18.7 | 18.0 | |
| 21 | ******** | ******** | 22.4 | 22.1 | 21.5 | 20.9 | 20.2 | 19.6 | 18.9 | 18.2 | 17.5 | |
| 22 | ******** | ******** | ******** | 21.5 | 21.0 | 20.4 | 19.8 | 19.1 | 18.5 | 17.8 | 17.1 | |
| 23 | ******** | ******** | ******** | 21.1 | 20.5 | 19.9 | 19.3 | 18.7 | 18.1 | 17.4 | 16.7 | |
| 24 | ******** | ******** | ******** | 20.6 | 20.1 | 19.5 | 18.9 | 18.3 | 17.7 | 17.1 | 16.4 | |
| 25 | ******** | ******** | ******** | 20.2 | 19.7 | 19.1 | 18.5 | 18.0 | 17.3 | 16.7 | 16.1 | |
| 30 | ******** | ******** | ******** | 18.4 | 18.0 | 17.5 | 16.9 | 16.4 | 15.8 | 15.3 | 14.7 | |
| 35 | ******** | ******** | ******** | 17.1 | 16.6 | 16.2 | 15.7 | 15.2 | 14.7 | 14.1 | 13.6 | |
| 40 | ******** | ******** | ******** | 16.0 | 15.6 | 15.1 | 14.7 | 14.2 | 13.7 | 13.2 | 12.7 | |
| 45 | ******** | ******** | ******** | 15.1 | 14.7 | 14.2 | 13.8 | 13.4 | 12.9 | 12.5 | 12.0 | |
| 50 | ******** | ******** | ******** | 14.3 | 13.9 | 13.5 | 13.1 | 12.7 | 12.3 | 11.8 | 11.4 | |
Sampling Variability Guidelines

| Type of estimate | CV (%) | Guidelines |
| --- | --- | --- |
| Acceptable | 0.0-16.5 | General unrestricted release. |
| Marginal | 16.6-33.3 | General unrestricted release, but with a warning cautioning users of the high sampling variability. Should be identified by the letter M. |
| Unacceptable | > 33.3 | No release. Should be flagged with the letter U. |

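As a minimal sketch of how these thresholds could be applied programmatically: the Python function below is not part of Bootvar, the name `release_category` is hypothetical, and CVs are assumed to be percentages already rounded to one decimal place, as in the look-up table.

```python
def release_category(cv_percent):
    """Map a CV (in percent, rounded to one decimal) to the release guideline."""
    if cv_percent < 0:
        raise ValueError("a CV cannot be negative")
    if cv_percent <= 16.5:
        return "Acceptable"      # general unrestricted release
    if cv_percent <= 33.3:
        return "Marginal"        # release with a warning, identified by letter M
    return "Unacceptable"        # no release, flagged with letter U


# A few CV values taken from the look-up table rows
for cv in (14.3, 25.7, 51.6):
    print(cv, release_category(cv))
```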
CV look-up tables
Comparison between the bootstrap CV (BTS) and the CV from the look-up table (CVtable), for the number of people having diabetes:
Manitoba total: T = 32K, CVtable = 18%, BTS = 18.7%
Manitoba males: T = 16K, CVtable = 25.7%, BTS = 27.6%
Manitoba females: T = 16.5K, CVtable = 25.3%, BTS = 26.4%
CV look-up tables
Other examples (from master - general file)
Comparison between the bootstrap CV (BTS) and the CV from the look-up table (CVtable):
Number of people experiencing food insecurity:
Manitoba total: T = 40K, CVtable = 11.9%, BTS = 19.8%
Number of people in the lowest income quintile:
Manitoba total: T = 118K, CVtable = 6.4%, BTS = 11.2%
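The size of the gap between the two methods is easier to see as a ratio. The short Python sketch below (illustrative only, not part of Bootvar; the `comparisons` list and `cv_ratio` helper are hypothetical names) divides each bootstrap CV quoted on these slides by the corresponding look-up table CV.

```python
# (label, CV from look-up table in %, bootstrap CV in %), copied from the slides
comparisons = [
    ("Diabetes, Manitoba total (T=32K)", 18.0, 18.7),
    ("Diabetes, Manitoba males (T=16K)", 25.7, 27.6),
    ("Diabetes, Manitoba females (T=16.5K)", 25.3, 26.4),
    ("Other example, Manitoba total (T=40K)", 11.9, 19.8),
    ("Other example, Manitoba total (T=118K)", 6.4, 11.2),
]


def cv_ratio(cv_table, cv_bootstrap):
    """Ratio of the bootstrap CV to the look-up table CV."""
    return cv_bootstrap / cv_table


ratios = [round(cv_ratio(t, b), 2) for _, t, b in comparisons]
for (label, t, b), r in zip(comparisons, ratios):
    print(f"{label}: table {t}%, bootstrap {b}%, ratio {r}")
```

For the diabetes estimates the bootstrap CV is within roughly 4 to 7 percent of the table value, whereas for the two other examples it is about 1.7 times the table value.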
Bootvar: Regression models
Logistic regression model
logit(p) = log(p / (1 - p)) = intercept + b1*X1 + b2*X2, where p = P(Y = 1)
→ Y has to be qualitative (categorical); for now assume it is dichotomous, i.e. 0 or 1
→ Xi can be quantitative or qualitative variables
Bootvar: Regression models
Logistic regression model
Example: Diabetes vs sex and age
→ Categorical variables need to be dichotomized (“dummied”: one variable for each category except one)
→ Sex: if sex=2 then FEMALE = 1; else FEMALE = 0;
→ Age: create a variable for people over 60 (if age > 60 then OVER60=1; else OVER60=0)
→ The model is:
logit(P(DIAB = 1)) = intercept + b1*FEMALE + b2*OVER60
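In a logistic model the right-hand side is on the log-odds scale, so a fitted linear predictor has to be passed through the inverse logit to obtain a probability. The Python sketch below illustrates the dummy coding described above and that conversion. It is not Bootvar code: the function names are hypothetical and the coefficient values are made up for illustration, not survey results.

```python
import math


def dummy_code(sex, age):
    """Return (FEMALE, OVER60) using the coding rules from the slide."""
    female = 1 if sex == 2 else 0
    over60 = 1 if age > 60 else 0
    return female, over60


def predicted_probability(intercept, b1, b2, female, over60):
    """Inverse logit of the linear predictor intercept + b1*FEMALE + b2*OVER60."""
    eta = intercept + b1 * female + b2 * over60
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, for illustration only (not estimates from the survey).
INTERCEPT, B1, B2 = -3.0, -0.2, 1.2

female, over60 = dummy_code(sex=2, age=67)
p = predicted_probability(INTERCEPT, B1, B2, female, over60)
odds_ratio_over60 = math.exp(B2)  # exp(b2) is the odds ratio for OVER60 = 1 vs 0
print(female, over60, round(p, 3), round(odds_ratio_over60, 2))
```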
Bootvar: Regression models
Logistic regression model
Example: Diabetes vs sex and age
logit(P(DIAB = 1)) = intercept + b1*FEMALE + b2*OVER60
In bootvar, use %logreg macro
%logreg(yvar,xvar);
%logreg(DIAB,FEMALE OVER60);
Bootvar: Regression models
Linear regression model
Y = intercept + b1*X1 + b2*X2
→ Y is quantitative
→ Xi can be qualitative (categorical) or quantitative
Bootvar: Regression models
Linear regression model
Example: BMI (body mass index) vs sex and age
→ Categorical variables need to be dichotomized (“dummied”: one variable for each category except one)
→ Sex: if sex=2 then FEMALE = 1; else FEMALE = 0;
→ Age: use it as quantitative (single year of age)
→ The model is:
BMI = intercept + b1*FEMALE + b2*AGE
Bootvar: Regression models
Linear regression model
Example: BMI vs sex and age
BMI = intercept + b1*FEMALE + b2*AGE
In bootvar, use %regress macro
%regress(yvar,xvar);
%regress(BMI,FEMALE AGE);
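The slides do not show what the macro does internally. As a rough sketch of the general idea only, the Python below fits the BMI model by weighted least squares with the final weight, refits it once per bootstrap weight set, and takes the spread of the refitted coefficients as the standard error. Everything here is an assumption made for illustration: the function names, the made-up records and weights, and the choice of the variance of the replicate estimates around their mean as the variance formula.

```python
import math
from statistics import pvariance


def wls(y, X, w):
    """Weighted least squares: solve (X'WX) b = X'Wy by Gauss-Jordan elimination."""
    p = len(X[0])
    A = []  # augmented matrix [X'WX | X'Wy]
    for r in range(p):
        row = [sum(wi * xi[r] * xi[c] for wi, xi in zip(w, X)) for c in range(p)]
        row.append(sum(wi * xi[r] * yi for wi, xi, yi in zip(w, X, y)))
        A.append(row)
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][p] for r in range(p)]


def bootstrap_se(y, X, replicate_weights):
    """Standard error of each coefficient from one refit per replicate weight set."""
    fits = [wls(y, X, w) for w in replicate_weights]
    return [math.sqrt(pvariance([fit[j] for fit in fits])) for j in range(len(X[0]))]


# Made-up records: BMI and the design row [1 (intercept), FEMALE, AGE]
bmi = [24.1, 27.3, 22.8, 30.2, 25.5, 26.9]
X = [[1, 0, 25], [1, 1, 40], [1, 1, 31], [1, 0, 58], [1, 0, 47], [1, 1, 63]]
final_weight = [120.0, 95.0, 110.0, 80.0, 105.0, 90.0]
# Made-up bootstrap weights; a real file would carry many more weight sets.
boot_weights = [
    [240.0, 0.0, 110.0, 160.0, 105.0, 90.0],
    [120.0, 190.0, 0.0, 80.0, 210.0, 90.0],
    [0.0, 95.0, 220.0, 80.0, 105.0, 180.0],
    [120.0, 95.0, 110.0, 160.0, 0.0, 180.0],
]

coef = wls(bmi, X, final_weight)
se = bootstrap_se(bmi, X, boot_weights)
for name, b, s in zip(("intercept", "FEMALE", "AGE"), coef, se):
    print(f"{name}: estimate {b:.3f}, bootstrap SE {s:.3f}")
```

The point estimate always comes from the final weight; the bootstrap weights are used only to measure how much that estimate varies.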
Bootvar: testing
For version 2.0/2.1: Simply set 2 < B < 500
For version 1.0: See documentation!
Historical info about variance estimation for NPHS
Cycle 1: Use of jackknife technique
Could not disseminate with public-use microdata files; only custom requests
Cycle 2 and later: Use of bootstrap technique
Cannot disseminate ….; custom requests or remote access
All cycles: CV look-up tables for large domains (provinces, age groups); only good for totals, ratios, and differences of ...
Variance estimation with other software programs
WesVar (SPSS)
SAS
SUDAAN
STATA
Future for Statistics Canada Health Surveys (re. bootstrap)
NPHS Cycle 4 (2000-2001) data processing & weighting
Promote the use of longitudinal data
Bootstrap programs: finalize version 2.0 (SAS & SPSS)
CCHS Cycle 1.1 bootstrap weights
Bootstrap also used for variance estimation (same programs as for NPHS)
Contacts
Health Pgm Surveys Manager: Lorna Bailie
NPHS Manager: France Bilocq
CCHS Manager: Marc Hamel
CCHS Dissemination Manager: Larry MacNabb
Senior Methodologists: François Brisebois, Mylène Lavigne, Yves Béland
Data Access Services Manager: Mario Bédard
Custom Services Requests: Garry Macdonald
Population Health Surveys