Sampling for EHES

Principles and Guidelines

Johan Heldal & Susie Cooper

Statistics Norway


Overview

• Why this kind of sampling?

• Target population & sample size

• Sampling frames.

• Probability sampling

• Two-stage sampling - PSUs

• Stratification

• Stage 1 sampling

Sample sizes

Sampling PSUs with PPS

• Stage 2 sampling

• A cost model

• Age-gender stratification

• Further aspects

Why?

• Goals for EHES:

• To estimate distribution of risk levels within national populations.

• To compare risk levels among national populations.

• To predict levels of disease in the future.

• This differs from the usual goal of epidemiologists, which is to establish risk factors and models for risk.

Ideal Target Population

• Core: All persons aged 25-64 years on a given date with permanent residence in the country.

Can be extended by age to 18+.

Should also include institutionalized persons.

• Sample size: At least 500 in each of the eight (M,W) x (25-34, 35-44, 45-54, 55-64) domains:

Total ≥ 4000 persons.

For pilot ≥ 200 persons.

Main Sampling Frame

• List of persons/addresses from which to take a sample (register or census).

Should cover the target population but may need "add-ons".

Example of an "add-on": a list of institutions.

• A good list frame may be unavailable.

• Can use "map frames" (as in NHANES).

• Telephone directories may be complicated.

Probability sampling

• Sampling in scientific surveys is carried out as probability sampling (e.g. simple random sampling).

• Every sampling unit and every target unit has a defined probability of being selected.

• It must be possible to calculate this probability at least for all units being sampled.

Two-stage sampling

• Primary Sampling Unit (PSU): An area that can be handled by one examination site.

• Small enough that every person living there can easily travel to the site or be easily visited.

• Can be created from small census tracts, municipalities, electoral districts, post code areas or … .

• Divide the country into disjoint PSUs.

Two-stage sampling

• Stratification: Group the PSUs into strata, i.e. groups of "close" PSUs.

• Use geography and other known information to group similar PSUs together.

• Stage 1: Take a probability sample of PSUs in each stratum.

• Stage 2: Then take a probability sample of persons/households/addresses in each sampled PSU.

Strata consist of PSUs

• PSU sizes: N_i = number of persons, households or addresses in PSU no. i.

Can vary, but not too much within a stratum.

Recommended: N_i ≥ 1000.

• Stratum size: N = N_1 + … + N_M, where M is the number of PSUs in the stratum.

• A sample of m ≥ 2 PSUs and n persons or addresses is taken from the stratum.

Stage 1 sampling

• Selection probability for PSU i:

π_i = m N_i / N (PPS sampling)

• Each sampled PSU gets the same sample size p = n/m (persons, addresses).

• Gives every person in the same stratum equal probability of being selected.

• m and p can be calculated in a cost-variance optimal way in each stratum.

• The program EHESsampling takes care of the calculations and performs sampling.
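
The equal-probability claim above follows from multiplying the two stages: a person in PSU i is selected with probability π_i × p/N_i = (m N_i/N) × (n/(m N_i)) = n/N, which does not depend on i. The sketch below checks this numerically and draws the PSUs with systematic PPS. Systematic selection is an assumption made here for illustration (the slides only say "PPS"), and the function names and PSU sizes are hypothetical; this is not a description of how EHESsampling works.

```python
import random


def pps_systematic(sizes, m, rng):
    """Select m distinct PSU indices with probability proportional to size.

    Systematic PPS is an assumed method; the source only specifies "PPS".
    """
    total = sum(sizes)
    step = total / m
    # PPS without replacement needs pi_i = m * N_i / N <= 1 for every PSU.
    if any(size > step for size in sizes):
        raise ValueError("a PSU is too large: m * N_i / N would exceed 1")
    start = rng.random() * step
    points = [start + k * step for k in range(m)]
    selected, cumulative, next_point = [], 0.0, 0
    for i, size in enumerate(sizes):
        cumulative += size
        # A PSU is selected when a sampling point falls inside its size interval.
        while next_point < m and points[next_point] < cumulative:
            selected.append(i)
            next_point += 1
    return selected


def inclusion_probabilities(sizes, m, n):
    """Return (PSU-level, person-level) inclusion probabilities in one stratum."""
    total = sum(sizes)
    p = n / m  # equal sample size in every sampled PSU
    pi_psu = [m * size / total for size in sizes]
    pi_person = [pi * p / size for pi, size in zip(pi_psu, sizes)]
    return pi_psu, pi_person


# Hypothetical stratum with five PSUs (N = 10000), m = 2 PSUs, n = 100 persons.
sizes = [1200, 3000, 1800, 2500, 1500]
pi_psu, pi_person = inclusion_probabilities(sizes, m=2, n=100)
print(pi_psu)     # larger PSUs are more likely to be drawn
print(pi_person)  # every entry is approximately n / N = 0.01
print(pps_systematic(sizes, m=2, rng=random.Random(1)))
```

Large PSUs are more likely to be drawn at stage 1 but each of their residents is less likely to be drawn at stage 2, and the two effects cancel exactly.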

Stage 2 sampling

• Sampling of persons or addresses within each of the PSUs sampled at stage 1.

• Simple random sampling of p = n/m (persons, addresses) in every sampled PSU.
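
A minimal sketch of this stage, assuming the frame for each PSU is available as a list of identifiers. The function name, the dictionary layout and the example identifiers are hypothetical.

```python
import random


def sample_within_psus(psu_frames, selected_psus, p, rng):
    """Draw a simple random sample of p units, without replacement, in each sampled PSU."""
    sample = {}
    for psu in selected_psus:
        frame = psu_frames[psu]
        if p > len(frame):
            raise ValueError(f"PSU {psu!r} has fewer than {p} units")
        sample[psu] = rng.sample(frame, p)
    return sample


# Hypothetical frames: three PSUs, of which two were selected at stage 1.
psu_frames = {
    "A": [f"A-{k}" for k in range(1200)],
    "B": [f"B-{k}" for k in range(3000)],
    "C": [f"C-{k}" for k in range(1800)],
}
stage2 = sample_within_psus(psu_frames, ["A", "C"], p=50, rng=random.Random(7))
print({psu: len(units) for psu, units in stage2.items()})
```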

A cost model

C_1 = cost of establishing an extra PSU.

C_2 = cost of inviting an extra person to the PSU.

Total variable cost budget model:

C = C_1 m + C_2 n

m and p = n/m can be calculated to minimize variance given the size of this budget. EHESsampling can do this.
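
The slides do not say which variance model underlies this optimization. As a rough sketch only, assume the common textbook approximation in which the variance of a stratum mean is proportional to (1 + (p - 1)ρ) / (mp), where ρ is an intra-PSU correlation (a quantity not mentioned in the source). Substituting n = mp into C = C_1 m + C_2 n and minimizing gives p ≈ sqrt((C_1/C_2)(1 - ρ)/ρ), and m then follows from the budget. The function name and the example figures are hypothetical, and this is not a description of what EHESsampling actually computes.

```python
import math


def optimal_allocation(budget, c1, c2, rho):
    """Return (m, p) under the ASSUMED variance model (1 + (p - 1) * rho) / (m * p).

    budget = c1 * m + c2 * m * p is the cost model C = C_1 m + C_2 n with n = m p.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must be strictly between 0 and 1")
    p_opt = math.sqrt((c1 / c2) * (1 - rho) / rho)
    p = max(1, round(p_opt))              # persons per PSU, rounded to an integer
    m = int(budget // (c1 + c2 * p))      # number of PSUs the budget allows
    if m < 2:
        raise ValueError("budget too small for the required m >= 2 PSUs")
    return m, p


# Hypothetical figures: a PSU costs 10000, a person costs 100, rho = 0.02.
m, p = optimal_allocation(budget=340000, c1=10000, c2=100, rho=0.02)
print(m, p, m * p)  # number of PSUs, persons per PSU, total sample size n
```

Under this model, a higher cost per PSU or a lower ρ pushes toward fewer PSUs with more persons in each.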

Age-gender stratification

• At stage 2: Sample separately for each of the eight (M,W) x 4 age domains.

• An option only if the main sampling frame consists of individual persons.

• Gives better control of sample size within each age-gender domain.

• Not necessary if the sample size is very large.
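
One way to picture this option is the sketch below, which assumes a person-level frame for one sampled PSU with hypothetical `sex` and `age` fields and draws a simple random sample separately in each of the eight domains. Equal sample sizes per domain are an assumption made here for simplicity.

```python
import random
from collections import defaultdict

AGE_BANDS = [(25, 34), (35, 44), (45, 54), (55, 64)]


def domain_of(person):
    """Return (sex, age band) for a person aged 25-64, or None otherwise."""
    for low, high in AGE_BANDS:
        if low <= person["age"] <= high:
            return (person["sex"], (low, high))
    return None


def sample_by_domain(frame, per_domain, rng):
    """Simple random sample of up to per_domain persons in each age-gender domain."""
    groups = defaultdict(list)
    for person in frame:
        domain = domain_of(person)
        if domain is not None:
            groups[domain].append(person)
    return {
        domain: rng.sample(members, min(per_domain, len(members)))
        for domain, members in groups.items()
    }


# Hypothetical PSU frame: 10 persons of each sex at every age from 20 to 69.
frame = [
    {"id": f"{sex}{age}-{k}", "sex": sex, "age": age}
    for sex in ("M", "W")
    for age in range(20, 70)
    for k in range(10)
]
sample = sample_by_domain(frame, per_domain=6, rng=random.Random(11))
for domain in sorted(sample):
    print(domain, len(sample[domain]))
```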

With address frames

• An address is either:

1. A dwelling, or

2. A house with many dwellings.

• Case 1 (a dwelling): Invite all eligible persons in the dwelling, if not too many.

• Case 2 (many dwellings): Sample some dwellings at the address with a Kish grid (stage 3), then proceed as in case 1.

Time and place aspects

• A HES takes time (say a year).

• Avoid confounding between time of year and geography.

• A randomized design for the order of visiting the PSUs is recommended.

• Simpler to handle if many teams work in parallel.
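
A minimal sketch of such a randomized order, assuming the sampled PSUs are simply shuffled and then dealt round-robin to the field teams. Both the shuffle and the round-robin rule are illustrative assumptions; the slides do not prescribe a particular design, and the function names are hypothetical.

```python
import random


def randomized_visit_order(psus, seed=None):
    """Return the sampled PSUs in a random visiting order (reproducible with a seed)."""
    rng = random.Random(seed)
    order = list(psus)
    rng.shuffle(order)
    return order


def assign_to_teams(order, n_teams):
    """Deal the shuffled PSUs round-robin to teams working in parallel."""
    return [order[t::n_teams] for t in range(n_teams)]


# Hypothetical example: 12 sampled PSUs visited by 3 teams over the survey year.
order = randomized_visit_order([f"PSU-{k:02d}" for k in range(1, 13)], seed=2024)
for team, schedule in enumerate(assign_to_teams(order, n_teams=3), start=1):
    print(f"team {team}: {schedule}")
```

Because the order is random, neighbouring PSUs are not systematically visited in the same season, which is the confounding the slide warns about.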

Thank you!