Sampling - Environmental Science & Policy
Sampling
ESP178 Research Methods
Professor Susan Handy
2/2/16
Ethics
Movie Trailer BBC story
To Test Housing Program, Some Are Denied Aid
New York Times, 12/8/2010
“Half of the test subjects — people who are behind on rent and in danger of being evicted —are being denied assistance from the program for two years, with researchers tracking them to see if they end up homeless.”
http://www.nytimes.com/2010/12/09/nyregion/09placebo.html?pagewanted=all&_r=0
Ethical principles
• Minimize harm
• No deception
• Informed consent
• Distribute benefits equitably
What we’ve covered…
Aspect of Research | Topic | Type of Validity
What to study | Conceptualization and operationalization | Measurement
Who to study | Sampling | External (generalizability)
How to study it | Research design | Internal (causal)
Sampling
Sampling is the (statistical) process of selecting a subset of a population of interest for purposes of making observations and (statistical) inferences about that population.
How it works
Unit of observation
Population = who we want to know about (e.g., people living in a selected area)
Sampling frame = list of units in the population
Sample = who we collect data from
As depicted in your text
Types of Sampling
Type | Definition
Probability sampling, i.e., random | Every element in the population has a known, non-zero probability of being selected; sampling involves random selection (in simple random sampling, every element has an equal chance)
Non-probability sampling, i.e., non-random | The probability that any given element will be selected is not known in advance; selection is non-random
Probability sampling
Goal is a representative sample – one that resembles the population of interest
[Figure: a sample drawn from the population]
Generalizability
[Figure: generalizing from a sample to its own population (sample generalizability) and to a different population (cross-population generalizability)]
Sampling Error
Difference between the characteristics of the sample and the characteristics of the population
Random sampling error: inherent in the process of sampling! Measure it with confidence intervals (see below).
Systematic sampling error: depends on how good your sampling method is! Use good methods to minimize it.
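To make the two kinds of error concrete, the sketch below draws samples from a made-up population. It assumes a hypothetical population of normally distributed scores and a hypothetical flawed frame that leaves out every unit above the median; the function name `sampling_errors` is illustrative. Each error is computed as population parameter minus sample statistic.

```python
import random
import statistics


def sampling_errors(seed: int = 1, n: int = 500) -> tuple[float, float]:
    """Return (random_error, systematic_error) for the mean, each computed as
    population parameter minus sample statistic."""
    rng = random.Random(seed)
    # Hypothetical population of 10,000 scores (illustrative, not real data)
    population = [rng.gauss(50.0, 15.0) for _ in range(10_000)]
    pop_mean = statistics.fmean(population)

    # Random sampling error: simple random sample from a complete frame
    srs = rng.sample(population, n)
    random_error = pop_mean - statistics.fmean(srs)

    # Systematic error: a hypothetical flawed frame that leaves out every unit
    # above the median, so the sample can never represent the upper half
    cutoff = statistics.median(population)
    flawed_frame = [x for x in population if x <= cutoff]
    biased = rng.sample(flawed_frame, n)
    systematic_error = pop_mean - statistics.fmean(biased)
    return random_error, systematic_error


rand_err, sys_err = sampling_errors()
print(f"random error: {rand_err:+.2f}   systematic error: {sys_err:+.2f}")
```

Rerunning with other seeds moves the random error around zero, while the systematic error stays large and positive; a bigger sample shrinks the first but not the second.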
The Sampling Frames Challenge
Issues?
Alternatives?
Probability Sampling Methods
Method | Definition
Simple random sampling | Elements chosen completely by chance
Systematic random sampling | Select first element randomly, then pick every nth element
e.g., use a random start to select the first element, then every 10th element; OR use a random number to select an element on each page of a list
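A minimal sketch of the two selection rules so far, using Python's standard `random` module. The frame of 100 ID numbers is hypothetical. Because `n` is used here for the sample size, the slide's "every nth element" corresponds to the interval `k` below, which is assumed to be N // n with a random start inside the first interval.

```python
import random


def simple_random_sample(frame, n, rng):
    # Every element has the same chance of selection; none is picked twice
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    # Sampling interval k; requires n <= len(frame) so that k >= 1
    k = len(frame) // n
    start = rng.randrange(k)  # random start within the first interval
    # Then every k-th element after the start
    return [frame[start + i * k] for i in range(n)]


rng = random.Random(42)
frame = list(range(1, 101))  # hypothetical sampling frame of 100 IDs
srs = simple_random_sample(frame, 10, rng)
sys_s = systematic_sample(frame, 10, rng)
print("simple random:", sorted(srs))
print("systematic:   ", sys_s)
```

One caution with the systematic rule: if the list has a periodic pattern that lines up with `k` (say, every 10th home is a corner lot), the sample can be biased.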
Stratified random sampling – proportionate | Sort population into strata; randomly select within strata, proportionate to population
Use if sampling frames are available for strata but not for the overall population, or to get more homogeneous samples (see below).
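The proportionate version can be sketched as follows, assuming each unit's stratum can be read off the frame. The campus roles and their counts are hypothetical, as is the helper name `stratified_proportionate`.

```python
import random
from collections import Counter, defaultdict


def stratified_proportionate(frame, stratum_of, n, rng):
    """Simple random sample within each stratum, allocated in proportion to
    each stratum's share of the frame. Rounding can make the total differ
    slightly from n when shares don't divide evenly."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        n_h = round(n * len(units) / len(frame))  # proportional allocation
        sample.extend(rng.sample(units, n_h))
    return sample


# Hypothetical campus frame: 600 students, 300 staff, 100 faculty
frame = ([(i, "student") for i in range(600)]
         + [(i, "staff") for i in range(600, 900)]
         + [(i, "faculty") for i in range(900, 1000)])
sample = stratified_proportionate(frame, lambda u: u[1], 100, random.Random(0))
print(Counter(role for _, role in sample))
```

Each role appears in the sample in exactly its population share (60/30/10), which a simple random sample of 100 would match only on average.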
Stratified random sampling – disproportionate | Sort population into strata; randomly select within strata, disproportionate to population
Use to ensure enough elements in small strata. Strata are defined by some characteristic…
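With disproportionate allocation, small strata are over-represented, so whole-population estimates usually need design weights. A common choice, assumed here, is weight = N_h / n_h (stratum size divided by stratum sample size). The two strata and the 0/1 "bikes to campus" variable are hypothetical.

```python
import random


def stratified_disproportionate(strata, n_per_stratum, rng):
    """strata: stratum name -> list of units; n_per_stratum: name -> sample size.
    Returns the sample and a design weight N_h / n_h for each stratum."""
    sample, weights = {}, {}
    for name, units in strata.items():
        n_h = n_per_stratum[name]
        sample[name] = rng.sample(units, n_h)
        weights[name] = len(units) / n_h  # each sampled unit stands for this many
    return sample, weights


def weighted_mean(sample, weights, value):
    # Weighted total divided by the weighted count (which equals the population size)
    total = sum(weights[h] * value(u) for h, units in sample.items() for u in units)
    count = sum(weights[h] * len(units) for h, units in sample.items())
    return total / count


# Hypothetical: 900 units in a large stratum (nobody bikes),
# 100 in a small stratum (everybody bikes); draw 50 from each
strata = {
    "large": [(i, 0) for i in range(900)],
    "small": [(i, 1) for i in range(900, 1000)],
}
sample, weights = stratified_disproportionate(strata, {"large": 50, "small": 50}, random.Random(1))
all_units = [u for units in sample.values() for u in units]
unweighted = sum(u[1] for u in all_units) / len(all_units)
weighted = weighted_mean(sample, weights, lambda u: u[1])
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")  # unweighted: 0.50, weighted: 0.10
```

Without weights the over-sampled small stratum would dominate the estimate (50% instead of the true 10%).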
Campus Travel Survey Sampling
http://its.ucdavis.edu/research/publications/publication-detail/?pub_id=2537
Campus Mode Share
Cluster sampling | Draw a random sample of clusters, then select elements within clusters
Cluster = naturally occurring grouping, e.g., neighborhoods, classes, etc. Often used for practical reasons, e.g., when data are collected in person.
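A two-stage version of the cluster design can be sketched like this, assuming each cluster (e.g., a neighborhood) comes with a list of its elements. The 20 neighborhoods of 50 households each, and the helper name `cluster_sample`, are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_within, rng):
    """clusters: cluster id -> list of elements.
    Stage 1 draws clusters at random; stage 2 draws elements within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for c in chosen:
        members = clusters[c]
        # Take everyone if the cluster is smaller than n_within
        sample[c] = rng.sample(members, min(n_within, len(members)))
    return sample


# Hypothetical: 20 neighborhoods, 50 households each
clusters = {f"nbhd{j:02d}": [f"nbhd{j:02d}-hh{i}" for i in range(50)] for j in range(20)}
sample = cluster_sample(clusters, n_clusters=4, n_within=10, rng=random.Random(5))
for c, households in sample.items():
    print(c, len(households))
```

The practical payoff is that interviewers visit only 4 neighborhoods instead of households scattered across all 20; the cost is that households within a cluster tend to be alike, which usually widens confidence intervals compared with a simple random sample of the same size.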
Exit Polling
Matched-pairs sampling | Divide population into two groups by a key characteristic; draw a random sample in Group 1, then find matches in Group 2
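The matched-pairs step might look like the sketch below. The matching rule (greedy nearest match on a single numeric characteristic, with no group-2 unit reused) is an assumption for illustration; real studies often match on several characteristics. The street names and lengths are hypothetical.

```python
import random


def matched_pairs(group1, group2, n, key, rng):
    """Randomly sample n units from group1, then match each to the closest
    not-yet-used unit in group2 on key(unit). Needs len(group2) >= n."""
    sampled = rng.sample(group1, n)
    available = list(group2)
    pairs = []
    for unit in sampled:
        match = min(available, key=lambda other: abs(key(other) - key(unit)))
        available.remove(match)  # each group-2 unit is used at most once
        pairs.append((unit, match))
    return pairs


# Hypothetical streets as (name, length)
cul_de_sacs = [(f"C{i}", 100 + 10 * i) for i in range(10)]
through_streets = [(f"T{i}", 95 + 10 * i) for i in range(30)]
pairs = matched_pairs(cul_de_sacs, through_streets, 5, key=lambda s: s[1], rng=random.Random(2))
for c, t in pairs:
    print(c, "->", t)
```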
How do you know if you have a representative sample?
Source: Popovich, et al. 2015
Compare to census data…
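One simple way to do the census comparison is to line up the sample's shares by category against published census shares and look at the percentage-point gaps. The age groups and all numbers below are hypothetical, and `share_gaps` is an illustrative helper; a formal test (e.g., chi-square goodness of fit) could be layered on top.

```python
def share_gaps(sample_counts, census_shares):
    """Percentage-point difference (sample minus census) for each category."""
    n = sum(sample_counts.values())
    return {
        group: round(100 * (sample_counts[group] / n - share), 1)
        for group, share in census_shares.items()
    }


sample_counts = {"18-34": 150, "35-64": 250, "65+": 100}     # hypothetical survey
census_shares = {"18-34": 0.35, "35-64": 0.45, "65+": 0.20}  # hypothetical census
print(share_gaps(sample_counts, census_shares))
# {'18-34': -5.0, '35-64': 5.0, '65+': 0.0}
```

Here the sample over-represents 35-64 year-olds by 5 points and under-represents 18-34 year-olds by 5 points.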
Does sample size matter?
Inferential Statistics for Probability Samples
Term | Definition
n | Sample size
Sample statistic | Statistic computed from the sample, e.g., mean
Population parameter | True value of the statistic, e.g., mean, for the population
Sampling error | Population parameter − sample statistic
We don’t know the population parameter!
So we don’t know the sampling error!
But we can estimate a confidence interval…
Sample statistic
Standard deviation = how close individual scores are to the sample mean
[Figure: individual scores spread around the sample mean]
Let’s say we take a bunch of samples…
Standard error = how close mean scores from repeated samples are to the population mean
[Figure: sample means spread around the population mean]
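The idea of "a bunch of samples" can be simulated directly: draw many samples from a hypothetical normal population, record each sample mean, and compare the spread of those means with SD/√n. The population values (mean 100, SD 10), sample size, and number of repetitions are assumptions.

```python
import math
import random
import statistics

rng = random.Random(7)
mu, sigma = 100.0, 10.0  # hypothetical population mean and SD
n, reps = 25, 2000       # sample size and number of repeated samples

# Mean of each of many independent samples of size n
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

empirical_se = statistics.stdev(sample_means)  # how spread out the sample means are
theoretical_se = sigma / math.sqrt(n)          # 10 / 5 = 2.0
print(f"empirical SE {empirical_se:.2f} vs SD/sqrt(n) {theoretical_se:.2f}")
```

The sample means cluster around the population mean, and their standard deviation lands close to 2.0, the value the standard-error formula predicts.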
Calculating a confidence interval
$$\text{standard error} = \frac{\text{standard deviation}}{\sqrt{n}}$$
$$\text{confidence interval} = \text{sample statistic} \pm 2 \times \text{standard error}$$
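The standard-error and confidence-interval formulas translate directly into code. This is a sketch assuming the values are a simple random sample; `confidence_interval` is a hypothetical helper, and z = 2 follows the rule of thumb on this slide (1.96 is the more exact 95% value).

```python
import math
import statistics


def confidence_interval(values, z=2.0):
    """Approximate 95% CI for a mean: mean ± z × SD / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    se = sd / math.sqrt(n)         # standard error
    return mean - z * se, mean + z * se


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
low, high = confidence_interval(scores)
print(f"95% CI: {low:.2f} to {high:.2f}")
```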
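The 68-95-99.7 percent rule for confidence intervals: about 68%, 95%, and 99.7% of sample means fall within ±1, ±2, and ±3 standard errors of the population mean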
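The 68-95-99.7 figures are the shares of a normal distribution lying within ±1, ±2, and ±3 standard errors of its center. Python's standard-library `statistics.NormalDist` can reproduce them, along with the 1.96 multiplier for exactly 95%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within ±{k} SE: {share:.3f}")
# within ±1 SE: 0.683
# within ±2 SE: 0.954
# within ±3 SE: 0.997

z_95 = std_normal.inv_cdf(0.975)  # 2.5% in each tail
print(f"z for 95%: {z_95:.2f}")   # z for 95%: 1.96
```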
How do we reduce standard error?
More homogeneous populations mean tighter confidence intervals: SD ↓ → SE ↓ → CI ↓
Larger sample sizes mean tighter confidence intervals: n ↑ → SE ↓ → CI ↓
$$\text{confidence interval} = \text{sample statistic} \pm 1.96 \times \text{standard error}$$
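Both levers for reducing the standard error, homogeneity and sample size, can be checked with a few lines of arithmetic. The SDs, sample sizes, and mean below are made-up inputs chosen so the numbers come out round.

```python
import math


def standard_error(sd, n):
    return sd / math.sqrt(n)


def ci95(mean, sd, n):
    half_width = 1.96 * standard_error(sd, n)
    return mean - half_width, mean + half_width


# Larger sample: quadrupling n halves the standard error
print(standard_error(10, 25), standard_error(10, 100))  # 2.0 1.0
# More homogeneous population: halving the SD also halves the standard error
print(standard_error(5, 25))                             # 1.0
print(ci95(50, 10, 100))  # roughly (48.04, 51.96)
```

Because n sits under a square root, shrinking the standard error by half takes four times the sample, while halving the spread of the population does the same job without any extra cases.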
Example
[Figure: histogram of pints per week (1 to 13) by number of students, UCD vs. CSUC]
Standard error = SD/√n
95% CI low = mean − 2 × SE; 95% CI high = mean + 2 × SE
Are the means different? i.e., do the confidence intervals overlap?
Calculation of Confidence Intervals | UCD | CSUC
Mean | 4.1 | 6.0
Standard deviation | 2.54 | 3.32
Standard error if n=20 | 0.57 | 0.74
95% CI low (n=20) | 2.96 | 4.52
95% CI high (n=20) | 5.24 | 7.48
Standard error if n=200 | 0.18 | 0.23
95% CI low (n=200) | 3.74 | 5.54
95% CI high (n=200) | 4.46 | 6.46
Are the means different?
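The UCD/CSUC table can be recomputed from its listed means and standard deviations using SE = SD/√n and mean ± 2 × SE, rounding SE to two decimals. This is a sketch: the helper names are hypothetical, and the overlap check is only a rough screen, since overlapping 95% intervals do not by themselves rule out a significant difference.

```python
import math


def ci_row(mean, sd, n, z=2):
    se = round(sd / math.sqrt(n), 2)  # standard error, rounded to two decimals
    return se, round(mean - z * se, 2), round(mean + z * se, 2)


def overlaps(a, b):
    # Intervals (low, high) overlap if each starts before the other ends
    return a[0] <= b[1] and b[0] <= a[1]


groups = {"UCD": (4.1, 2.54), "CSUC": (6.0, 3.32)}
results = {}
for n in (20, 200):
    rows = {g: ci_row(mean, sd, n) for g, (mean, sd) in groups.items()}
    ucd_ci, csuc_ci = rows["UCD"][1:], rows["CSUC"][1:]
    results[n] = (rows, overlaps(ucd_ci, csuc_ci))
    print(f"n={n}: {rows}  overlap={overlaps(ucd_ci, csuc_ci)}")
```

With 20 students per campus the intervals overlap; with 200 they separate, even though the means and SDs are unchanged.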
Sample size matters
Larger sample is better – up to a point
Percent women based on random sample…
Sample Size | Margin of Error | Range for Estimate
5 | 44% | 6–94%
10 | 31% | 19–81%
20 | 22% | 28–72%
30 | 17% | 33–67%
40 | 15% | 35–65%
50 | 13% | 37–63%
http://www.methodspace.com/group/qualitativeinterviewing/forum/topics/two-problems-with-random-sampling-and-two-alternatives
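The margins in the table broadly follow the usual 95% formula for a proportion, z × √(p(1 − p)/n), evaluated at p = 0.5 (the worst case, and the center of the ranges shown). A sketch, assuming a simple random sample and the normal approximation, which is rough at sizes this small; for some rows the result may differ from the table by about a point.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (5, 10, 20, 30, 40, 50):
    moe = margin_of_error(n)
    print(f"n={n:>2}: ±{moe:.0%}, range {0.5 - moe:.0%} to {0.5 + moe:.0%}")
```

Because the margin shrinks with √n, quadrupling the sample only halves it, one reason a larger sample is better only up to a point.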
A few examples
My Cul-de-Sac Sampling Plan:
The household survey will provide the primary data for testing the hypothesis. We expect the design of the sampling plan for the survey and the design of the survey instrument itself to be particularly challenging for this study. The target population for this study is children living in houses located on cul-de-sacs and through streets in the Sacramento region. Because no sampling frame exists for this population, we will use a multi-stage cluster sampling strategy. First, residential neighborhoods throughout the region will be defined based on major roadways and other geographic features. Census data will be used to eliminate neighborhoods built before 1950 because of the infrequent use of cul-de-sacs in residential developments before this time. From the remaining post-1950 neighborhoods, a random sample of neighborhoods will be selected. Within these neighborhoods, cul-de-sacs will be identified using the 2000 Census street network and the capabilities of geographic information systems (GIS). A random sample of the cul-de-sacs within each neighborhood will be chosen. For each cul-de-sac in the sample, a segment of a nearby through street (defined as a street that links arterial streets and carries significant levels of through traffic) of otherwise similar characteristics and similar length will be chosen. This approach creates matched pairs of streets. Next, all addresses on the sample street pairs will be compiled to create a sample of households. The sample of streets will be used in the field observations, the sample of households will be used in the household survey, and a sub-sample of the sample of households will be used in the in-depth interviews, as described below.
Handy, et al., 2005
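The multi-stage cul-de-sac plan can be sketched as a small pipeline. Everything about the data layout is assumed for illustration: neighborhoods as dicts with a `year_built` field and lists of cul-de-sacs and through streets, and streets as dicts with `name`, `length`, and `addresses`. Matching on length alone is a simplified stand-in for "otherwise similar characteristics and similar length", and the through street is drawn from the same neighborhood rather than simply "nearby".

```python
import random


def street(name, length, n_addresses):
    # Hypothetical street record
    return {"name": name, "length": length,
            "addresses": [f"{name}-{i}" for i in range(n_addresses)]}


def multistage_sample(neighborhoods, n_neighborhoods, n_culdesacs, rng):
    # Stage 1: drop neighborhoods built before 1950, then sample neighborhoods
    eligible = [nb for nb in neighborhoods if nb["year_built"] >= 1950]
    chosen = rng.sample(eligible, n_neighborhoods)
    pairs, households = [], []
    for nb in chosen:
        available = list(nb["through_streets"])
        # Stage 2: random sample of cul-de-sacs within the neighborhood
        for cds in rng.sample(nb["culdesacs"], n_culdesacs):
            # Stage 3: match to the unused through street closest in length
            match = min(available, key=lambda s: abs(s["length"] - cds["length"]))
            available.remove(match)
            pairs.append((cds["name"], match["name"]))
            # Stage 4: all addresses on both streets join the household sample
            households.extend(cds["addresses"] + match["addresses"])
    return pairs, households


neighborhoods = [
    {"year_built": 1942, "culdesacs": [street("A-cds1", 150, 4)],
     "through_streets": [street("A-thr1", 160, 6)]},
    {"year_built": 1978, "culdesacs": [street("B-cds1", 120, 5), street("B-cds2", 200, 3)],
     "through_streets": [street("B-thr1", 130, 8), street("B-thr2", 210, 7)]},
    {"year_built": 1995, "culdesacs": [street("C-cds1", 90, 6), street("C-cds2", 180, 4)],
     "through_streets": [street("C-thr1", 100, 5), street("C-thr2", 175, 9)]},
]
pairs, households = multistage_sample(neighborhoods, 2, 1, random.Random(3))
print(pairs)
print(len(households), "households")
```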
Non-Probability Sampling
Method | Definition
Availability or convenience sampling | Cases selected because they’re easy to find
Quota sampling (proportional or not) | Groups defined by key characteristics; specified number of cases selected in each group
Purposive or expert sampling | Individuals selected for sample because of their knowledge (“key informants”)
Snowball sampling | Start with initial sample, ask them to recommend other participants
Used for exploratory and qualitative research. Must be very cautious about generalizing!
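Snowball recruitment behaves like a breadth-first walk along referrals. The sketch below assumes a hypothetical referral network and an illustrative helper `snowball_sample`. It also shows why generalizing is risky: anyone no participant refers (here, G) has no chance of being included.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Recruit seeds first, then people they refer, then people those
    people refer, until max_size participants are recruited."""
    recruited = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # don't recruit anyone twice
                seen.add(referred)
                queue.append(referred)
    return recruited


# Hypothetical referral network: who each participant would recommend
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "G": ["A"]}
print(snowball_sample(["A"], referrals, max_size=10))  # ['A', 'B', 'C', 'D', 'E', 'F']
```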
Other things to think about!
Goal | Example 1: child as unit of analysis | Example 2: neighborhood as unit
Ensuring that the independent variable varies | Sample includes children who live on cul-de-sacs and children who don’t | Sample includes neighborhoods that have lots of cul-de-sacs and neighborhoods that have few cul-de-sacs
Ensuring that the control variables don’t vary | Sample includes only children from moderate-income households | Sample includes only neighborhoods with average income in the moderate range
After the mid-term: non-response bias
To do
• Sampling exercise on Thursday!
• Read and study for midterm!
• Don’t forget lecture notes on website!
• Office hours Wed 3 – 4:30
• Review in Section on Friday