
Introduction to Survey Sample Weighting

Linda Owens

3/15/2017

Content of Webinar

• What are weights
• Types of weights
• Weighting adjustment methods
• General guidelines for weight construction/use


What are weights?

• A weight is a value assigned to each case in the data file to restore the proportional representation of the target population.
• The value of a weight indicates how much each case will count in a statistical procedure (or how many cases it will represent).
  ◦ A case with a weight of 1 represents only itself.
  ◦ A case with a weight of 2 represents itself plus one other unit.
• Weights are always positive and nonzero, but can be fractions.

Simple example

• In a simple random sample of 1,000 drawn from a population of 100,000, each sampled member would have a weight of 100 and would represent 100 members of the population (the case itself, plus 99 others).
  ◦ 1,000/100,000 = .01
  ◦ 1/.01 = 100
• If only half of the 1,000 sampled members responded, the weight would be doubled to 200 to account for nonsampled members and sampled members who did not respond.
  ◦ 500/100,000 = .005
  ◦ 1/.005 = 200
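The arithmetic in this example can be sketched in a few lines of Python. The function name `srs_weight` and its `response_rate` parameter are illustrative assumptions, not part of the webinar; the sketch simply takes the inverse of the effective selection probability.

```python
def srs_weight(n_sampled: int, population_size: int, response_rate: float = 1.0) -> float:
    """Weight for a simple random sample: population size / number of responding cases.

    response_rate is a hypothetical parameter: the share of sampled members who responded.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    respondents = n_sampled * response_rate
    return population_size / respondents


# 1,000 sampled from 100,000, everyone responds.
full_response_weight = srs_weight(1_000, 100_000)
# Same sample, but only half of the sampled members respond.
half_response_weight = srs_weight(1_000, 100_000, response_rate=0.5)

print(full_response_weight, half_response_weight)
```

Halving the number of responding cases doubles the weight, which matches the 100 and 200 on the slide.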


Conditions for using weights

• Weights allow the researcher to make inferences to the population from which the sample was drawn
  ◦ e.g. what percent of the population engages in regular exercise?
• Weights are used to make adjustments in probability samples, not to fix poorly designed samples or convenience samples
• Sample must be:
  ◦ drawn with probabilistic methods
  ◦ high quality
  ◦ sufficiently large

Reasons for using weights

• Members of the population are sampled with varying probabilities (e.g. freshmen sampled at a higher rate than seniors)
• Nonresponse varies by some characteristic of sampled members (e.g. women have higher response rates than men)
• Make sample characteristics consistent with population characteristics (e.g. percent of sample by gender matches percent of population by gender)


Types of weights

• Base weights (selection weights)
• Expansion weights
• Relative weights
• Nonresponse weights
• Post-stratification weights
• Final analysis weight (generally a combination of the above types)

Base/Selection weights (1)

• Base weights adjust for different probabilities of selection among sampled population members
• Epsem (equal probability of selection method) designs result in a sample in which each member has the same probability of selection
• Epsem samples are sometimes called self-weighting; using weights is rarely necessary for these samples


Base/Selection weights (2)

• Base/selection weight is the inverse of the probability of selection:

$$w_i = \frac{1}{f_i}, \qquad \text{where } f_i = \frac{n}{N}$$

• Sample 100 from a population of 10,000:

$$f_i = \frac{100}{10{,}000} = .01, \qquad w_i = \frac{1}{.01} = 100$$

Base/Selection weights (3)

• When population members are sampled with unequal probabilities, base weights are necessary to ensure proper representation of the population
• Example: If women are sampled from a list at a rate of 1/10 and men at a rate of 1/5, men will be over-represented in the sample if the data are not weighted.
  ◦ Women have a weight of 10, men a weight of 5.
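A small sketch shows why the unweighted sample misrepresents the list. The frame counts (1,000 women and 1,000 men) are an assumption made only for illustration; the sampling rates are the ones from the slide.

```python
from fractions import Fraction

# Hypothetical list frame: equal numbers of women and men (assumed, not from the slides).
frame = {"women": 1_000, "men": 1_000}
# Sampling rates from the slide.
rates = {"women": Fraction(1, 10), "men": Fraction(1, 5)}

sampled = {g: frame[g] * rates[g] for g in frame}   # 100 women, 200 men
base_weight = {g: 1 / rates[g] for g in frame}      # 10 for women, 5 for men

# Share of women in the raw sample.
unweighted_share_women = sampled["women"] / sum(sampled.values())

# Share of women after each case is counted with its base weight.
weighted_counts = {g: sampled[g] * base_weight[g] for g in frame}
weighted_share_women = weighted_counts["women"] / sum(weighted_counts.values())

print(unweighted_share_women, weighted_share_women)
```

Under these assumed frame counts, women make up one third of the raw sample but one half of the weighted sample, which is their share of the list.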


Expansion weights

• Expansion weights are weights that inflate the number of sampled cases to the population N.
• Base weights can sometimes also serve as expansion weights.
• Expansion weights should be used only to estimate total numbers of the population who possess the characteristic of interest.
• Never use expansion weights for model testing as they will inappropriately inflate the sample size being used for analysis.

Relative weights

• Relative weights are appropriate for analytic studies because they do not inflate the sample size.
• They are constructed by ‘normalizing’ expansion weights:
  ◦ Divide each case’s expansion weight by the mean expansion weight
  ◦ Or, multiply each weight by the ratio of actual sample size to sum of expansion weights


Expansion weights vs. relative weights

• Expansion weights sum to the total population size (N)
• Relative weights sum to the study sample size (n), the number of cases in the data file

Weight construction example

Stratum   Ni    ni   fi    wi    rwi    (ni)(rwi)
1         100   5    .05   20    1.33   6.65
2         100   5    .05   20    1.33   6.65
3         100   5    .05   20    1.33   6.65
4         100   5    .05   20    1.33   6.65
5         100   5    .05   20    1.33   6.65
6         50    5    .10   10    .67    3.35
7         50    5    .10   10    .67    3.35
8         50    5    .10   10    .67    3.35
9         50    5    .10   10    .67    3.35
10        50    5    .10   10    .67    3.35
Totals    750   50         150   10     50

Ni = total population in stratum i
ni = number sampled from stratum i
fi = probability of selection in stratum i = (ni/Ni)
wi = base (expansion) weight in stratum i = (Ni/ni) = 1/fi
w̄ = mean expansion (base) weight = [∑(wi)(ni)]/n = (750)/(50) = 15
rwi = relative weight = (wi)/(w̄)
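The table can be reproduced with a short Python sketch that follows the definitions above. Variable names are illustrative. Note that the slide rounds rwi to two decimals before multiplying, so it shows 6.65 and 3.35 where the unrounded products are about 6.67 and 3.33; the column total is 50 either way.

```python
# (N_i, n_i) for the ten strata in the table.
strata = [(100, 5)] * 5 + [(50, 5)] * 5

n = sum(n_i for _, n_i in strata)                  # study sample size
expansion = [N_i / n_i for N_i, n_i in strata]     # w_i = N_i / n_i = 1 / f_i

# Sum of expansion weights over all sampled cases, and their mean.
total_expansion = sum(w_i * n_i for w_i, (_, n_i) in zip(expansion, strata))
mean_weight = total_expansion / n

# Relative weight, method 1: divide by the mean expansion weight.
relative = [w_i / mean_weight for w_i in expansion]
# Method 2: multiply by (actual sample size / sum of expansion weights).
relative_alt = [w_i * n / total_expansion for w_i in expansion]

# Sum of relative weights over all cases; should equal the sample size n.
sum_relative = sum(rw_i * n_i for rw_i, (_, n_i) in zip(relative, strata))

for (N_i, n_i), w_i, rw_i in zip(strata, expansion, relative):
    print(N_i, n_i, w_i, round(rw_i, 2), round(n_i * rw_i, 2))
```

Both normalizing methods give the same relative weights, since dividing by the mean is the same as multiplying by n over the sum.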


Nonresponse weights (1)

• Nonresponse occurs when some sampled units do not respond to the survey:
  ◦ 40% of men respond to survey compared to 50% of women
  ◦ 30% of smokers respond compared to 45% of nonsmokers
• Nonresponse weights adjust base weights so responding units represent those that don’t respond

Nonresponse weights (2)

• Respondents are assigned to weighting adjustment cells
• Characteristics defining cells (gender, race, age) must be on the sample frame
• Nonresponse (NR) adjustment is the reciprocal of the response rate in each cell.
  ◦ NR adjustment for men = 1/.40 = 2.5; women = 1/.50 = 2
  ◦ NR adjustment for smokers = 1/.30 ≈ 3.3; nonsmokers = 1/.45 ≈ 2.2
• These adjustments assume characteristics defining cells are the only variables associated with nonresponse
• NR adjustment is multiplied by the base weight
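A minimal sketch of the cell-based adjustment, using the gender response rates from the slides. The respondent records and their base weights are hypothetical values chosen only to show the multiplication.

```python
# Response rates by weighting cell (from the slides).
response_rate = {"men": 0.40, "women": 0.50}

# NR adjustment is the reciprocal of the response rate in each cell.
nr_adjustment = {cell: 1 / rate for cell, rate in response_rate.items()}

# Hypothetical respondents: (cell, base weight). These values are assumptions.
respondents = [("men", 10.0), ("women", 10.0), ("women", 5.0)]

# Nonresponse-adjusted weight = base weight * NR adjustment for the case's cell.
nr_weighted = [(cell, wb * nr_adjustment[cell]) for cell, wb in respondents]

for cell, weight in nr_weighted:
    print(cell, weight)
```

Each responding man now stands in for 2.5 sampled men, and each responding woman for 2 sampled women, on top of whatever the base weight already represents.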


Post-stratification (1)

• Adjusting sample marginal distribution to match population distribution on key variables (generally demographic)
• Requires an auxiliary dataset to provide the population estimates (census, American Community Survey, etc.)
• Post-stratification formula:

$$w_{ps} = \frac{p_p}{p_s}$$

where: $p_p$ = population proportion
$p_s$ = sample proportion

Post-stratification example

Gender   Population proportion   Sample proportion   Population/Sample   Weight
Female   .52                     .60                 .52/.60             .8667
Male     .48                     .40                 .48/.40             1.2

• Women are over-represented; men are under-represented.
• Their weights are adjusted by the post-stratification ratios.
• Post-stratification weights are used to adjust for minor differences in non-response.
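The gender example can be checked with a short sketch: each weight is the population proportion divided by the sample proportion, and applying the weights brings the sample shares back to the population shares. The dictionary names are illustrative.

```python
# Proportions from the example table.
population = {"female": 0.52, "male": 0.48}
sample = {"female": 0.60, "male": 0.40}

# Post-stratification weight = population proportion / sample proportion.
ps_weight = {g: population[g] / sample[g] for g in population}

# Weighted sample share of each group after applying the weights.
weighted_total = sum(sample[g] * ps_weight[g] for g in sample)
weighted_share = {g: sample[g] * ps_weight[g] / weighted_total for g in sample}

for g in population:
    print(g, round(ps_weight[g], 4), round(weighted_share[g], 2))
```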


Post-stratification (2)

• Post-stratification adjustments often use multiple factors: gender, race, age, education, etc.
• How to incorporate all?
  ◦ One large cross-classification of all factors
    ▪ Often results in a huge number of cells
    ▪ Sample sizes in cells too small to work with
  ◦ Iteratively adjust to factors one at a time
    ▪ Raking
    ▪ Manually or with software designed for it

Raking: how to

1. Weight the data with base weights or adjusted base weights (wb).
2. Run a frequency on the first demographic variable (e.g. gender).
3. Adjust the weighted sample proportion (p) to the population proportion (P): wg = wb*Pf/pf for women and wg = wb*Pm/pm for men.
4. Apply this new weight (wg) and run a frequency on the next demographic variable (e.g. race).
5. Adjust the weighted sample proportion to the population proportion: wr = wg*Pnhb/pnhb for non-Hispanic Black, wr = wg*Pnhw/pnhw for non-Hispanic White, etc.
6. Do this for each demographic variable; repeat until the weighted sample proportions on all demographic variables are close to the population proportions.
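The steps above can be sketched as a small loop in plain Python. This is an illustration under stated assumptions, not the procedure of any particular software package: the function name `rake`, the stopping tolerance, the ten sample records, and the race targets (.30/.70) are all hypothetical, and every target category is assumed to appear in the sample.

```python
def weighted_shares(records, weights, var):
    """Weighted sample proportion of each category of `var`."""
    total = sum(weights)
    shares = {}
    for rec, w in zip(records, weights):
        shares[rec[var]] = shares.get(rec[var], 0.0) + w / total
    return shares


def rake(records, start_weights, targets, max_iter=50, tol=1e-6):
    """Adjust weights one variable at a time until all margins match `targets`.

    targets maps each variable to {category: population proportion}.
    """
    w = list(start_weights)
    for _ in range(max_iter):
        largest_gap = 0.0
        for var, target in targets.items():
            shares = weighted_shares(records, w, var)
            largest_gap = max(largest_gap, max(abs(shares[c] - target[c]) for c in target))
            # Steps 3 and 5: multiply each weight by P / p for the case's category.
            w = [wi * target[rec[var]] / shares[rec[var]] for rec, wi in zip(records, w)]
        if largest_gap < tol:   # step 6: all margins were already close on this pass
            break
    return w


# Hypothetical sample of 10 cases with base weights of 1.
records = (
    [{"gender": "f", "race": "nhb"}] * 2 + [{"gender": "f", "race": "nhw"}] * 4
    + [{"gender": "m", "race": "nhb"}] * 1 + [{"gender": "m", "race": "nhw"}] * 3
)
targets = {"gender": {"f": 0.52, "m": 0.48}, "race": {"nhb": 0.30, "nhw": 0.70}}

raked = rake(records, [1.0] * len(records), targets)
print(weighted_shares(records, raked, "gender"))
print(weighted_shares(records, raked, "race"))
```

Because each set of target proportions sums to 1, every adjustment leaves the sum of the weights unchanged, so the raked weights still sum to the starting total.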


Trimming weights

• Sometimes weights have a large range or a few cases have unusually large weights
  ◦ These may cause problems in the analysis
• Sometimes researchers “trim” these weights (create a cutoff for large weights)
  ◦ No clear standards for how to do it
  ◦ Use of trimming should be limited
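Since there is no clear standard, the following is only one possible way to apply a cutoff, shown as a sketch. The rule used here (cap each weight at a multiple of the mean weight, then rescale so the total is unchanged) and the names `trim_weights` and `max_ratio` are assumptions for illustration, not a recommendation from the webinar.

```python
def trim_weights(weights, max_ratio=5.0):
    """Cap weights at max_ratio * mean weight, then rescale to the original total.

    The cap rule and default ratio are assumptions chosen for illustration.
    """
    mean_w = sum(weights) / len(weights)
    cap = max_ratio * mean_w
    capped = [min(w, cap) for w in weights]
    # Rescaling keeps the sum of weights fixed, so the capped weights end up
    # slightly above the cap itself.
    scale = sum(weights) / sum(capped)
    return [w * scale for w in capped]


# Hypothetical weights: nine ordinary cases and one very large weight.
weights = [1.0] * 9 + [31.0]
trimmed = trim_weights(weights)

print(max(weights) / min(weights), max(trimmed) / min(trimmed))
```

In this made-up example the ratio of largest to smallest weight drops from 31 to 20 while the sum of the weights stays the same.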

To trim or not to trim?

• “How One 19-Year-Old Illinois Man is Distorting National Polling Averages,” The Upshot, Nate Cohn, 10/12/16
• A respondent (R) in the U.S.C./LAT poll had a final weight that was 30 times larger than the average R and 300 times larger than the least-weighted R
• Jill Darling, the survey director at the U.S.C. Center for Economic and Social Research, noted that they had decided not to “trim” the weights (that’s when a poll prevents one person from being weighted up by more than some amount, like five or 10) because the sample would otherwise underrepresent African-American and young voters.
• This makes sense. Gallup got itself into trouble for this reason in 2012: It trimmed its weights, and nonwhite voters were underrepresented.

https://www.nytimes.com/2016/10/13/upshot/how-one-19-year-old-illinois-man-is-distorting-national-polling-averages.html?_r=0


Final analysis weights

• Only one weight per case can be used for data analysis
• Final weight is typically the product of the base weight and the adjustments made for nonresponse and post-stratification
  ◦ e.g. wf = wb × wnr × wps
• Due to rounding, the sum of final weights often differs from the analysis sample size.
  ◦ e.g. n = 1,500 cases; sum of final weights = 1,508.6
• Make a final adjustment by multiplying each final weight by the ratio of actual sample size to sum of final weights (1,500/1,508.6)
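A short sketch of the final step: multiply the component weights case by case, then rescale so the weights sum to the number of cases. The function name and the four cases with their component weights are hypothetical; only the 1,500 and 1,508.6 figures come from the slide.

```python
def final_weights(base, nonresponse, poststrat):
    """wf = wb * wnr * wps per case, rescaled so the weights sum to n."""
    products = [wb * wnr * wps for wb, wnr, wps in zip(base, nonresponse, poststrat)]
    scale = len(products) / sum(products)   # actual sample size / sum of final weights
    return [w * scale for w in products]


# Hypothetical component weights for four cases (illustrative values only).
wb = [1.33, 1.33, 0.67, 0.67]
wnr = [2.5, 2.0, 2.5, 2.0]
wps = [1.2, 0.8667, 1.2, 0.8667]
wf = final_weights(wb, wnr, wps)

# The slide's own numbers: n = 1,500 cases, sum of final weights = 1,508.6.
slide_scale = 1_500 / 1_508.6

print([round(w, 3) for w in wf], round(slide_scale, 4))
```

With the slide's figures, every final weight would be multiplied by roughly 0.9943.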

Use of weights

• If the sample design uses unequal probabilities of selection, weights are necessary when making population inferences with descriptive statistics (e.g. 30% of population smokes).
• In multivariate analysis (e.g. regression):
  ◦ Not as much consensus about using weights
  ◦ If variables used to construct weights are predictors in the regression model, it may not be necessary to use weights
  ◦ Run both ways and compare results
• Weights almost always increase variance of estimates
• Understand how your software (Stata, SAS, SPSS) uses weights