1
Sampling for EHES
Principles and Guidelines
Johan Heldal & Susie Cooper
Statistics Norway
2
Overview
• Why this kind of sampling?
• Target population & sample size
• Sampling frames
• Probability sampling
• Two-stage sampling: PSUs
• Stratification
• Stage 1 sampling
  - Sample sizes
  - Sampling PSUs with PPS
• Stage 2 sampling
• A cost model
• Age-gender stratification
• Further aspects
3
Why?
• Goals for EHES:
• To estimate the distribution of risk levels within national populations.
• To compare risk levels among national populations.
• To predict levels of disease in the future.
• This differs from the usual goal of epidemiologists, which is to establish risk factors and models for risk.
4
Ideal Target Population
• Core: All persons aged 25-64 years on a given date with permanent residence in the country.
Can be extended by age to 18+.
Should also include institutionalized persons.
• Sample size: At least 500 in each of the eight domains
(M, W) × (25-34, 35-44, 45-54, 55-64):
Total ≥ 4000 persons.
For a pilot: ≥ 200 persons.
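The total follows directly from the domain minimum: two genders times four age groups gives eight domains, and 8 × 500 = 4000. A small sketch of that arithmetic (the constant and function names are illustrative only):

```python
from itertools import product

# The eight gender-by-age domains: (M, W) x four 10-year age groups.
GENDERS = ("M", "W")
AGE_GROUPS = ("25-34", "35-44", "45-54", "55-64")
MIN_PER_DOMAIN = 500  # minimum sample size per domain


def minimum_total(min_per_domain: int = MIN_PER_DOMAIN) -> int:
    """Total minimum sample size implied by a per-domain minimum."""
    domains = list(product(GENDERS, AGE_GROUPS))
    return len(domains) * min_per_domain


if __name__ == "__main__":
    print(minimum_total())  # prints 4000 (8 domains x 500)
```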
5
Main Sampling Frame
• List of persons or addresses from which to take a sample (register or census).
Should cover the target population but may need "add-ons",
e.g. a list of institutions.
• A good list frame may be unavailable.
• "Map frames" can be used instead (as in NHANES).
• Telephone directories may be complicated to use as a frame.
6
Probability sampling
• Sampling in scientific surveys is carried out as probability sampling (e.g. simple random sampling).
• Every sampling unit and every target unit has a defined probability of being selected.
• It must be possible to calculate this probability at least for all units being sampled.
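Simple random sampling is the simplest case where the selection probability is known: drawing n of N units without replacement gives every unit probability n/N. A minimal sketch using Python's standard `random` module; the frame of 10,000 persons is a made-up example:

```python
import random
from fractions import Fraction


def simple_random_sample(units, n, rng):
    """Draw n distinct units with equal probability (SRS without replacement)."""
    return rng.sample(list(units), n)


def srs_inclusion_probability(N, n):
    """Under SRS without replacement every unit has inclusion probability n/N."""
    return Fraction(n, N)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the draw is reproducible
    frame = range(1, 10_001)   # a hypothetical frame of 10,000 persons
    sample = simple_random_sample(frame, 400, rng)
    print(len(sample), srs_inclusion_probability(len(frame), 400))  # 400 1/25
```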
7
Two-stage sampling
• Primary Sampling Unit (PSU): an area that can be served by one examination site.
• Small enough that every person living there can easily travel to the site.
• Or small enough that every person can easily be visited.
• Can be created from small census tracts, municipalities, electoral districts, post code areas or … .
• Divide the country into disjoint PSUs.
8
Two-stage sampling
• Stratification: Group the PSUs into strata, i.e. groups of "close" PSUs.
• Use geography and other known information to group similar PSUs together.
• Stage 1: Take a probability sample of PSUs in each stratum.
• Stage 2: Then take a probability sample of persons/households/addresses in each sampled PSU.
9
10
Strata consist of PSUs
• PSU sizes: $N_i$ = number of persons, households or addresses in PSU no. $i$.
Can vary, but not too much within a stratum.
Recommended $N_i \ge 1000$.
• Stratum size: $N = N_1 + \dots + N_M$, where $M$ is the number of PSUs in the stratum.
• A sample of $m \ge 2$ PSUs and $n$ persons or addresses is taken from the stratum.
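PSUs can be built from small areas such as census tracts, and each should preferably have at least 1000 persons. A minimal sketch of one way to do this, assuming the tracts are already listed so that consecutive tracts are geographic neighbours (the input layout and the greedy rule are hypothetical, not the EHESsampling method):

```python
def build_psus(tract_sizes, min_size=1000):
    """Greedily merge consecutive tracts into PSUs of at least min_size persons.

    tract_sizes: list of (tract_id, persons), assumed ordered so that
    consecutive tracts are geographic neighbours (hypothetical input).
    Returns a list of PSUs, each a list of tract ids. A short remainder at
    the end is merged into the last PSU so that no PSU falls below min_size.
    """
    psus, current, current_size = [], [], 0
    for tract_id, persons in tract_sizes:
        current.append(tract_id)
        current_size += persons
        if current_size >= min_size:
            psus.append(current)
            current, current_size = [], 0
    if current:
        if psus:
            psus[-1].extend(current)  # fold leftover tracts into the last PSU
        else:
            psus.append(current)      # all tracts together are below min_size
    return psus


if __name__ == "__main__":
    tracts = [("t1", 400), ("t2", 700), ("t3", 300), ("t4", 800), ("t5", 200)]
    print(build_psus(tracts))  # [['t1', 't2'], ['t3', 't4', 't5']]
```

The stratum size $N$ is then the sum of the persons in all PSUs of the stratum.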
11
Stage 1 sampling
• Selection probability for PSU $i$:
$\pi_i = m N_i / N$ (PPS: probability proportional to size)
• Each PSU gets the same sample size
$p = n/m$ (persons, addresses).
• This gives every person in the same stratum the same probability of being selected:
$\pi_i \cdot p / N_i = (m N_i / N) \cdot (n/m) / N_i = n/N$.
• $m$ and $p$ can be calculated in a cost-variance optimal way in each stratum.
• The program EHESsampling takes care of the calculations and performs the sampling.
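The slides do not say which PPS selection method EHESsampling uses. Systematic PPS is one common method and is assumed in this sketch: lay the PSUs end to end by size, take one random start, and step through the cumulated sizes in steps of $N/m$. When every $m N_i \le N$ this gives PSU $i$ exactly the probability $\pi_i = m N_i / N$. The function names and the example sizes are hypothetical:

```python
import random


def pps_inclusion_probabilities(sizes, m):
    """pi_i = m * N_i / N for each PSU i in one stratum."""
    N = sum(sizes)
    probs = [m * N_i / N for N_i in sizes]
    if max(probs) > 1:
        raise ValueError("some PSU would have pi_i > 1; treat it as a certainty PSU")
    return probs


def systematic_pps_sample(sizes, m, rng):
    """Select m PSUs by systematic PPS: one random start, then fixed steps
    of N/m along the cumulated sizes. Returns the selected PSU indices."""
    pps_inclusion_probabilities(sizes, m)  # validates that every pi_i <= 1
    N = sum(sizes)
    step = N / m
    start = rng.uniform(0, step)
    points = [start + k * step for k in range(m)]
    selected, cumulative, j = [], 0, 0
    for i, N_i in enumerate(sizes):
        cumulative += N_i
        # PSU i covers [cumulative - N_i, cumulative); take the points inside it.
        while j < m and points[j] < cumulative:
            selected.append(i)
            j += 1
    return selected


if __name__ == "__main__":
    sizes = [1200, 1500, 1000, 2300, 1800, 1200]  # hypothetical N_i in one stratum
    print(systematic_pps_sample(sizes, m=2, rng=random.Random(7)))
```

Because each PSU's interval is no longer than the step, no PSU can be hit twice.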
12
Stage 2 sampling
• Sampling of persons or addresses within each of the PSUs sampled at stage 1.
• Simple random sampling of p = n/m
(persons, addresses) in every sampled PSU.
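A minimal sketch of stage 2, assuming the persons of each sampled PSU are available as an in-memory list (the dict layout and names are hypothetical). The helper `overall_probability` checks the result from stage 1: the PSU probability times the within-PSU probability $p/N_i$ is $n/N$ whatever the PSU size:

```python
import random
from fractions import Fraction


def stage2_sample(selected_psus, p, rng):
    """Simple random sample of p persons (or addresses) in every sampled PSU.

    selected_psus: dict mapping PSU id -> list of frame units in that PSU
    (a hypothetical in-memory frame). Returns dict PSU id -> sampled units.
    """
    sample = {}
    for psu_id, units in selected_psus.items():
        if p > len(units):
            raise ValueError(f"PSU {psu_id} has fewer than p = {p} units")
        sample[psu_id] = rng.sample(units, p)
    return sample


def overall_probability(m, N_i, N, p):
    """pi_i * (p / N_i) with pi_i = m N_i / N; equals n / N where n = m * p."""
    return Fraction(m * N_i, N) * Fraction(p, N_i)


if __name__ == "__main__":
    rng = random.Random(11)
    frame = {  # hypothetical sampled PSUs with their person ids
        "psu_A": [f"A{k}" for k in range(1200)],
        "psu_B": [f"B{k}" for k in range(2300)],
    }
    sample = stage2_sample(frame, p=25, rng=rng)
    print({psu: len(units) for psu, units in sample.items()})  # 25 per PSU
```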
13
A cost model
$C_1$ = cost of establishing an extra PSU
$C_2$ = cost of inviting an extra person to the PSU
Total variable cost (budget) model:
$C = C_1 m + C_2 n$
$m$ and $p = n/m$ can be calculated to minimize the variance for a given budget $C$. EHESsampling can do this.
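The slides do not state the variance model behind this optimisation. A standard textbook approach for two-stage samples, assumed here, uses the design effect $1 + (p - 1)\rho$ with intra-PSU correlation $\rho$, so the variance of a mean is proportional to $\rho/m + (1-\rho)/(mp)$. Minimising this subject to $C = C_1 m + C_2 m p$ gives $p = \sqrt{(C_1/C_2)(1-\rho)/\rho}$. A minimal sketch; the function names and the example costs are hypothetical, and this is not the EHESsampling implementation:

```python
import math


def optimal_allocation(budget, c1, c2, rho):
    """Cost-variance optimal (m, p) for C = c1*m + c2*n with n = m*p.

    Assumes the design-effect approximation deff = 1 + (p - 1) * rho, so the
    variance of a mean is proportional to rho/m + (1 - rho)/(m*p).
    Minimising that for a fixed budget gives p = sqrt((c1/c2) * (1 - rho)/rho).
    Returns unrounded (m, p); in practice round p, then recompute m.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    p = math.sqrt((c1 / c2) * (1 - rho) / rho)
    m = budget / (c1 + c2 * p)  # spends the whole budget: c1*m + c2*m*p = budget
    return m, p


def relative_variance(m, p, rho):
    """Variance of the mean up to a constant factor, under the same assumption."""
    return rho / m + (1 - rho) / (m * p)


if __name__ == "__main__":
    # Hypothetical costs: c1 = 2000 per PSU, c2 = 50 per invited person, rho = 0.05
    m, p = optimal_allocation(budget=100_000, c1=2_000, c2=50, rho=0.05)
    print(round(m, 1), round(p, 1))
```

A higher PSU cost $C_1$ or a lower $\rho$ pushes the optimum towards fewer PSUs with more persons each.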
14
Age-gender stratification
• At stage 2: Sample separately within each of the eight (M, W) × 4 age-group domains.
• Only an option if the main sampling frame consists of individual persons.
• Gives better control of the sample size within each age-gender domain.
• Not necessary if the sample size is very large.
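A minimal sketch of age-gender stratified sampling within one PSU, assuming a person-level frame with gender and age in years (the record layout, the names, and the rule of taking a too-small domain whole are hypothetical choices):

```python
import random
from collections import defaultdict


def age_group(age):
    """Map an age in years to one of the four EHES core age groups, else None."""
    for low in (25, 35, 45, 55):
        if low <= age <= low + 9:
            return f"{low}-{low + 9}"
    return None


def sample_by_domain(persons, per_domain, rng):
    """Within one PSU, draw an SRS of per_domain persons in each gender x age domain.

    persons: iterable of (person_id, gender, age) from a person-level frame
    (hypothetical record layout). Domains with too few persons are taken whole.
    """
    domains = defaultdict(list)
    for person_id, gender, age in persons:
        group = age_group(age)
        if group is not None:  # persons outside 25-64 are not eligible
            domains[(gender, group)].append(person_id)
    return {
        domain: rng.sample(ids, min(per_domain, len(ids)))
        for domain, ids in domains.items()
    }
```

With the same number drawn in every domain, persons in small domains get higher selection probabilities than persons in large ones, so estimates need design weights.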
15
With address frames
• An address is either
1. a dwelling, or
2. a house with many dwellings.
1. Dwelling: Invite all eligible persons in the dwelling, if there are not too many.
2. House with many dwellings: Sample some dwellings at the address with a Kish grid (stage 3).
Then proceed as in 1.
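A minimal sketch of the two address cases. Plain simple random sampling stands in for the Kish grid named above, and the cap `max_persons` and the rule of subsampling when it is exceeded are hypothetical, since the slide only says "if not too many":

```python
import random


def invite_at_address(dwellings, dwellings_to_sample, max_persons, rng):
    """Return the persons to invite at one sampled address.

    dwellings: list of dwellings, each a list of eligible person ids, listed
    in a fixed order (e.g. by floor and door); a hypothetical layout.
    Case 1: a single dwelling -> invite all eligible persons if there are at
    most max_persons, otherwise an SRS of max_persons of them (hypothetical).
    Case 2: many dwellings -> first sample dwellings (stage 3), then apply
    case 1 to each. Plain SRS stands in here for the Kish grid.
    """
    if len(dwellings) > 1:
        k = min(dwellings_to_sample, len(dwellings))
        chosen = rng.sample(dwellings, k)  # stage 3: sample dwellings
    else:
        chosen = dwellings
    invited = []
    for persons in chosen:
        if len(persons) <= max_persons:
            invited.extend(persons)
        else:
            invited.extend(rng.sample(persons, max_persons))
    return invited
```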
16
Time and place aspects
• A HES takes time (say, a year).
• Avoid confounding time of year with geography.
• A randomized order for visiting the PSUs is recommended.
• This is simpler to handle if many teams work in parallel.
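A minimal sketch of a randomized visiting order, assuming the PSUs are shuffled once and then dealt out to the teams in turn (the round-robin assignment and the names are hypothetical). A random order tends to spread PSUs from every region over the whole fieldwork period:

```python
import random


def randomized_schedule(psu_ids, n_teams, rng):
    """Randomly order the sampled PSUs and deal them out to teams in turn.

    A random visiting order tends to spread PSUs from every region over the
    fieldwork period, so time of year is not confounded with geography.
    Returns a list with one visiting order (list of PSU ids) per team.
    """
    order = list(psu_ids)
    rng.shuffle(order)
    return [order[t::n_teams] for t in range(n_teams)]


if __name__ == "__main__":
    psus = [f"PSU{k:02d}" for k in range(1, 13)]  # hypothetical 12 sampled PSUs
    for team, route in enumerate(randomized_schedule(psus, 3, random.Random(42)), 1):
        print(team, route)
```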
17
Thank you!