19. why simple random sampling can be seldom used in practice 2a- juan...

Session 2A: The limitations of Simple Random Sampling and The practice of sampling for household surveys Juan Muñoz, Sistemas Integrales Delhi, March 18, 2013 Please join Channel 41

Post on 03-Oct-2020


Page 1: 19. Why Simple Random Sampling Can be Seldom Used in Practice 2A- Juan Munozpubdocs.worldbank.org/en/415941463426765721/Core-19-Why... · Microsoft PowerPoint - 19. Why Simple Random

Session 2A: The limitations of Simple Random Sampling and The practice of sampling for household surveys

Juan Muñoz, Sistemas Integrales
Delhi, March 18, 2013

Please join Channel 41

Page 2

Limitations of Simple Random Sampling

• Simple Random Sampling (SRS) may be an option in certain cases,
• but it may not be practical if
– we need estimates for subgroups of the population, especially if some of the subgroups are small
– we do not have an adequate sample frame
– a Simple Random Sample would be too scattered across the territory
• We then resort to other techniques
– Stratification
– Sampling in stages

Page 3

Elections in an archipelago

Page 4

Stratification

• We divide the population into subgroups, called strata
• We take a separate sample in each stratum
• Stratification may be needed if:
– We want to reduce the standard error, by gaining control of the composition of the sample
– We want to ensure that certain groups are represented
(These two objectives are contradictory in practice)
• The selection probabilities may differ across strata
(This requires the use of weights, to ensure the external validity of our impact evaluation)
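The need for weights when selection probabilities differ across strata can be illustrated with a minimal sketch. The stratum sizes and outcome values below are made-up numbers, and the function name `stratified_mean` is hypothetical; the only point is that each sampled unit is weighted by the inverse of its selection probability, N_h / n_h, assuming a simple random sample within each stratum.

```python
def stratified_mean(strata):
    """Weighted mean from a stratified sample.

    strata: list of dicts with keys
      'N'      : population size of the stratum (assumed known from the frame)
      'sample' : values observed in a simple random sample from the stratum
    """
    weighted_total = 0.0
    population = 0
    for s in strata:
        n = len(s["sample"])
        weight = s["N"] / n              # inverse of the selection probability n / N
        weighted_total += weight * sum(s["sample"])
        population += s["N"]
    return weighted_total / population


# Illustrative (made-up) data: the small stratum is deliberately oversampled.
strata = [
    {"N": 9000, "sample": [10, 12, 11, 13]},   # large stratum, sampling rate 4/9000
    {"N": 1000, "sample": [30, 34, 32, 28]},   # small stratum, sampling rate 4/1000
]

all_values = [v for s in strata for v in s["sample"]]
unweighted = sum(all_values) / len(all_values)
weighted = stratified_mean(strata)
print(f"unweighted mean: {unweighted:.2f}")   # ignores the unequal probabilities
print(f"weighted mean:   {weighted:.2f}")     # restores the population proportions
```

In this toy example the unweighted mean is pulled toward the oversampled small stratum, which is the kind of distortion the weights are meant to correct.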

Page 5

Electoral survey in two stages

Page 6

Two-stage sampling

• Instead of taking a Simple Random Sample (SRS),
• we divide the territory into small areas, called Primary Sampling Units (PSUs)
– In the first stage, we choose PSUs
– In the second stage, we select households in the chosen PSUs

Page 7

Two-stage sampling

• Solves the problems of SRS
– Reduces transportation costs
– Reduces sample frame problems
• The sample can be made self-weighted if
– We choose PSUs with Probability Proportional to Size (PPS), and then
– We take a fixed number of households in each PSU
• The price to pay is the cluster effect
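Why PPS selection followed by a fixed take gives a self-weighted sample can be checked with a short derivation. The notation is introduced here and is not on the slide: $k$ is the number of PSUs selected, $m$ the fixed number of households taken per PSU, $M_i$ the number of households in PSU $i$, and $M$ the total number of households in the population. The sketch assumes that the size measure used for PPS equals the number of households actually listed, and that no PSU is so large that $k M_i / M$ exceeds 1.

```latex
P(\text{a household in PSU } i \text{ is selected})
  = \underbrace{k\,\frac{M_i}{M}}_{\text{first stage (PPS)}}
    \times
    \underbrace{\frac{m}{M_i}}_{\text{second stage (fixed take)}}
  = \frac{k\,m}{M}
```

The PSU size $M_i$ cancels, so under these assumptions every household has the same overall selection probability and therefore the same weight.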

Page 8

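Cluster effect

The standard error grows if, instead of taking a Simple Random Sample of n households, we take a two-stage sample with k PSUs and m households per PSU (n = k·m):

$$e_{TSS}^2 = e_{SRS}^2 \,\bigl[1 + \rho\,(m - 1)\bigr]$$

where
– $e_{TSS}$ is the standard error of the Two-Stage Sample
– $e_{SRS}$ is the standard error of the Simple Random Sample
– $\rho$ is the Intra-Cluster Correlation
– $1 + \rho\,(m - 1)$ is the cluster effect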

Page 9

Cluster effect
For a total sample size of 12,000 households

Number    HHs per   Intra-cluster correlation
of PSUs   PSU       0.01    0.02    0.05    0.10    0.20
3,000       4       1.03    1.06    1.15    1.30    1.60
2,000       6       1.05    1.10    1.25    1.50    2.00
1,500       8       1.07    1.14    1.35    1.70    2.40
1,000      12       1.11    1.22    1.55    2.10    3.20
  800      15       1.14    1.28    1.70    2.40    3.80
  600      20       1.19    1.38    1.95    2.90    4.80
  400      30       1.29    1.58    2.45    3.90    6.80
  300      40       1.39    1.78    2.95    4.90    8.80
  200      60       1.59    2.18    3.95    6.90   12.80
  150      80       1.79    2.58    4.95    8.90   16.80
  100     120       2.19    3.38    6.95   12.90   24.80

Highlighted on the slide: with 600 PSUs of 20 households each and an intra-cluster correlation of 0.05, the cluster effect is 1.95.
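The table can be reproduced with a few lines of code. This is a sketch that assumes the cluster effect is 1 + ρ(m − 1), with ρ the intra-cluster correlation and m the number of households per PSU, which agrees with the entries above; the function and variable names are illustrative.

```python
def cluster_effect(m, rho):
    """Cluster effect 1 + rho * (m - 1) for m households per PSU."""
    return 1 + rho * (m - 1)


TOTAL_HOUSEHOLDS = 12_000
HHS_PER_PSU = [4, 6, 8, 12, 15, 20, 30, 40, 60, 80, 120]
ICCS = [0.01, 0.02, 0.05, 0.10, 0.20]

print(" PSUs  HHs/PSU  " + "  ".join(f"{rho:5.2f}" for rho in ICCS))
for m in HHS_PER_PSU:
    k = TOTAL_HOUSEHOLDS // m      # number of PSUs needed to reach the total
    row = "  ".join(f"{cluster_effect(m, rho):5.2f}" for rho in ICCS)
    print(f"{k:5d}  {m:7d}  {row}")
```

For a fixed total sample, fewer and larger clusters are cheaper to visit, but the cluster effect rises quickly with m once the intra-cluster correlation is not negligible.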

Page 10

Design effect

• In a two-stage sample
Cluster effect = $e_{TSS}^2 / e_{SRS}^2$
• In a more complex design (with two or more stages, stratification, etc.)
Design effect = Deff = $e_{\text{complex design}}^2 / e_{SRS}^2$
• Can be interpreted as an apparent contraction of the sample size, as a result of clustering and stratification
• Can be estimated with special software (e.g., Stata's svy commands)
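The "apparent contraction of the sample size" can be made concrete. Since the squared standard error of a simple random sample is inversely proportional to its size, a design effect of Deff means the complex sample of n households is about as precise as a simple random sample of n / Deff households. The helper names below are illustrative, and the numbers reuse the 12,000-household example with a cluster effect of 1.95.

```python
def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same standard error."""
    return n / deff


def standard_error_inflation(deff):
    """Factor by which the standard error grows relative to SRS (square root of Deff)."""
    return deff ** 0.5


n = 12_000      # total households in the sample
deff = 1.95     # 20 households per PSU, intra-cluster correlation 0.05

print(round(effective_sample_size(n, deff)))       # roughly 6,154 households
print(round(standard_error_inflation(deff), 2))    # roughly 1.4
```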

Page 11

Household samples: Choosing the PSUs

• PSUs are Census Enumeration Areas (EAs), or groups of EAs
(This implies collaboration with the National Census Office)
• PSUs typically have 50 to 200 households
(Outsized PSUs may require some work. See how to do it in the UN manual for household surveys in developing and transition countries)
• The sample frame is a small file that can easily be managed with Excel
• PSUs in the sample are generally selected with Probability Proportional to Size (PPS)
• The selected PSUs must be recognizable in the field
(Computer files are not enough; we also need maps)
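One common way of selecting PSUs with Probability Proportional to Size is systematic selection on the cumulated sizes. The slides do not say which PPS procedure was used, so the sketch below is only one possible implementation; the function name `pps_systematic` and the frame of PSU sizes are made up for illustration.

```python
import random


def pps_systematic(sizes, k, rng=random):
    """Select k PSUs with probability proportional to size (systematic PPS).

    sizes: household counts, one per PSU, in frame order.
    Returns the indices of the selected PSUs in frame order. A PSU larger
    than the sampling interval can be selected more than once, which is
    one reason outsized PSUs need special treatment.
    """
    total = sum(sizes)
    interval = total / k
    start = rng.random() * interval            # random start in [0, interval)
    targets = [start + j * interval for j in range(k)]

    selected = []
    cumulative = 0.0
    idx = 0
    for t in targets:
        # PSU idx covers the range [cumulative, cumulative + sizes[idx]).
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= t:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical frame: number of households in each of eight PSUs.
frame = [120, 80, 200, 50, 150, 100, 60, 140]
sample = pps_systematic(frame, k=3, rng=random.Random(2013))
print(sample)
```

Sorting the frame geographically before the selection spreads the sample across the territory, which is a frequent reason for preferring the systematic variant.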

Page 12

Household samples: Choosing the households

• The best sample frame is the full list of all households in the selected PSUs
• The household listing operation requires time and money. Relative to the project's overall calendar and budget, these are
– Marginal, if they are accounted for beforehand
– Large enough to be a big headache, if they are not
• Information to be reported on the listing
– Name and address, as a minimum
– Additional data required for the selection (e.g., presence of pregnant women, or children)
(Do not ask for additional information that is not essential)
• Households are generally selected from the listing by systematic equal probability sampling
(Beware of imitations, such as random walks, snowballing and expert opinion)
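Systematic equal probability sampling from a household listing can be sketched as follows. The fractional-interval variant shown here is an assumption (the slides do not specify the exact procedure), and the listing is a hypothetical example; it gives every listed household the same selection probability m / N even when the number listed is not a multiple of m.

```python
import random


def systematic_sample(listing, m, rng=random):
    """Select m households from a listing by systematic equal probability sampling."""
    n_listed = len(listing)
    if m >= n_listed:
        return list(listing)                   # very small PSU: take every household
    interval = n_listed / m                    # fractional sampling interval
    start = rng.random() * interval            # random start in [0, interval)
    positions = [min(int(start + j * interval), n_listed - 1) for j in range(m)]
    return [listing[p] for p in positions]


# Hypothetical listing of 57 households in one selected PSU.
listing = [f"household {i + 1}" for i in range(57)]
chosen = systematic_sample(listing, m=12, rng=random.Random(2013))
print(chosen)
```

Because the selection is driven by the listing and a single random start, the interviewer has no discretion over who is interviewed, which is what distinguishes it from the imitations the slide warns about.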

Page 13

The best way of dealing with non response is to…

1. Replace non respondents by similar households

2. Increase the sample size to compensate for non response

3. Use correction formulas

4. Use imputation techniques to simulate the answers of non respondents

5. None of the above

Page 14

The best way of dealing with non response is to prevent it

Lohr, Sharon L. Sampling: Design & Analysis (1999)

The big problem with non response is not the reduction of sample size. The problem is bias.

Page 15

Non response

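[Diagram: factors affecting non response, grouped under Interviewer, Questionnaire and Respondent. Labels shown on the diagram: Training, Work load, Motivation, Qualification, Work plan, Demographic, Socio-economic, Biologic testing, Fatigue, Motivation, Proxy, Availability.]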

Source: "Some factors affecting Non-Response," by R. Platek. 1977. Survey Methodology 3: 191-214.