Myoung Ho Lee STATISTICAL METHODS FOR REDUCING BIAS IN WEB SURVEYS 13th September 2012


TRANSCRIPT

Page 1: Myoung Ho Lee

Myoung Ho Lee

STATISTICAL METHODS FOR REDUCING BIAS IN WEB SURVEYS

13th September 2012

Page 2: Myoung Ho Lee

Introduction

Web surveys

Methodology

- Propensity Score Adjustment

- Calibration (Rim weighting)

Case Study

Discussion and Conclusion

Contents 2

Page 3: Myoung Ho Lee

• Trends in Data Collection

Paper and Pencil => Telephone => Computer

=> Internet (Web)

• Internet penetration

Introduction 3

Page 4: Myoung Ho Lee

Pros and Cons of Web surveys

• Pros - Low cost and Speed

- No interviewer effect

- Visual, flexible and interactive

- Respondents' convenience

• Cons - Quality of sample estimates

Web surveys may be a solution, but they bring problems of their own!

Introduction 4

Page 5: Myoung Ho Lee

Previous Studies

• Harris Interactive (2000 ~ )

• Lee (2004), Lee and Valliant (2009)

• Huh and Cho (2009)

• Bethlehem (2010), etc.

Lee and Valliant (2009) : good performance in simulation

However, most other results have not been as good.

- Malhotra and Krosnick (2007), Huh and Cho (2009)

Introduction 5

Page 6: Myoung Ho Lee

Volunteer Panel Web Survey Protocol (Lee, 2004)

Web surveys 6

Under-coverage Self-selection Non-response

Challenge: Fix anticipated biases in web survey estimates that

result from under-coverage, self-selection and non-response

Page 7: Myoung Ho Lee

Methodology

Proposed Adjustment Procedure for Volunteer Panel Web surveys (Lee, 2004)

7

Page 8: Myoung Ho Lee

Propensity Score Adjustment (PSA)

• Original idea : Comparison of two groups, treatment and control,

in observational studies (Rosenbaum and Rubin, 1983)

- by weighting using all auxiliary variables that are thought to

account for the differences

• In context of web surveys, this technique aims to correct for

differences between offline people and online people

- by accounting for the inclinations of people who participate in the

volunteer panel web survey

Methodology 8

Page 9: Myoung Ho Lee

• “Webographic” : overlapping variables between

web and reference survey

- To capture the difference between online and

offline populations (Schonlau et al., 2007)

- For example, “Do you feel alone?”, “In the last month

have you read a book?”…… (Harris Interactive)

Methodology 9

Page 10: Myoung Ho Lee

• Propensity score :

It is assumed that zi are independent given a set of covariates (xi)

• ‘Strong ignorability assumption’ : Response variable is conditionally

independent of treatment assignment given the propensity score.

Methodology 10
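The propensity-score formula on this slide did not survive extraction. The standard definition from Rosenbaum and Rubin (1983), consistent with the surrounding text (with z_i the group indicator and x_i the covariates), is:

```latex
e(x_i) = P(z_i = 1 \mid x_i), \qquad 0 < e(x_i) < 1,
```

and strong ignorability, as stated on the slide, then reads

```latex
y_i \,\perp\, z_i \;\big|\; e(x_i).
```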

Page 11: Myoung Ho Lee

Logistic regression model :

Variable Selection

• Include variables related to not only treatment assignment

but also response in order to satisfy the ‘strong ignorability

assumption’

(Rosenbaum and Rubin, 1984; Brookhart et al., 2006)

Methodology 11
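The logistic regression model itself was also lost in extraction; the usual form, with e(x_i) the propensity score and x_i the selected covariates, would be:

```latex
\operatorname{logit} e(x_i) = \log\frac{e(x_i)}{1 - e(x_i)} = \beta_0 + x_i^{\top}\beta .
```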

Page 12: Myoung Ho Lee

Variable Selection

• In practice, the stepwise selection method has often been used to

develop good predictive models for treatment assignment

• Most previous web studies : Use of all available covariates (5-30)

• Huh and Cho (2009) : chose 9 or 7 out of 123 covariates based on

their “subjective” views

Methodology 12

Page 13: Myoung Ho Lee

Variable Selection

• Stepwise logistic regression using SIC (Schwarz information criterion)

- large number of covariates, little theoretical guidance

• LASSO (PROC GLMSELECT in SAS)

- a good alternative to stepwise variable selection

• Boosted tree (“gbm” in R)

- determine a set of split conditions

Methodology 13

Page 14: Myoung Ho Lee

Applying methods for PSA

• Inverse propensity scores as weights

- weights : the reciprocal of each unit's estimated propensity score

- then, multiply them with sampling weights

• Subclassification (Stratification)

- subgrouping homogeneous people into each stratum

Methodology 14
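The weight formula on this slide was lost in extraction. A minimal Python sketch of inverse-propensity weighting, assuming the simple form w_i = d_i / e_i with d_i the sampling weight and e_i the estimated propensity score (function names are illustrative):

```python
def ipw_weights(propensity, sampling_weights):
    """Inverse-propensity weights: w_i = d_i / e_i (assumed simple form)."""
    return [d / e for d, e in zip(sampling_weights, propensity)]

def weighted_mean(y, w):
    """Weighted estimate of a response variable."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# toy web sample: sampling weights d_i, estimated propensities e_i, response y_i
d = [1.0, 1.0, 1.0, 1.0]
e = [0.8, 0.4, 0.5, 0.25]
y = [1, 0, 1, 1]
w = ipw_weights(e, d)  # low-propensity units get larger weights
```

Note how a very small propensity score inflates a weight sharply; this foreshadows the weight-capping fix discussed later.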

Page 15: Myoung Ho Lee

• Subclassification (Stratification)

1. Combine both reference and web data into one

2. Estimate each propensity score from the combined sample

3. Partition those units into C subclasses according to ordered

values, where each subclass has about the same number of units

4. Compute the adjustment factor, and apply it to all units in the c-th

subclass.

5. Multiply the factor with sampling weights to get PSA weights

Methodology 15
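Steps 1 to 5 above can be sketched in Python as follows. The adjustment factor used here, the reference-sample share over the web-sample share within each subclass, is one common choice and may differ from the exact factor in Lee (2004); the function name is hypothetical:

```python
def psa_subclass_weights(ps_ref, ps_web, d_web, C=5):
    """Subclassification PSA: returns adjusted weights for the web sample."""
    # 1-2. combine both samples' (already estimated) propensity scores
    combined = [(p, "ref", None) for p in ps_ref] + \
               [(p, "web", i) for i, p in enumerate(ps_web)]
    combined.sort(key=lambda t: t[0])
    n = len(combined)
    # 3. partition into C subclasses of about equal size by ordered score
    bounds = [round(c * n / C) for c in range(C + 1)]
    weights = list(d_web)
    for c in range(C):
        cell = combined[bounds[c]:bounds[c + 1]]
        n_ref = sum(1 for _, g, _ in cell if g == "ref")
        n_web = sum(1 for _, g, _ in cell if g == "web")
        if n_web == 0:
            continue
        # 4. adjustment factor for the c-th subclass (assumed form)
        f_c = (n_ref / len(ps_ref)) / (n_web / len(ps_web))
        # 5. multiply the factor with the web units' sampling weights
        for _, g, i in cell:
            if g == "web":
                weights[i] = d_web[i] * f_c
    return weights
```

If a subclass contains mostly reference units and few web units, the factor becomes large, which is exactly the imbalance discussed later under "Why PSA doesn't work well alone".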

Page 16: Myoung Ho Lee

Calibration (Rim weighting)

• Matching sample and population characteristics only with

respect to the marginal distributions of selected covariates

• Little and Wu (1991)

- Iterative algorithm that alternately adjusts weights according to

each covariate's marginal distribution until convergence

Methodology 16
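A minimal sketch of this iterative algorithm (raking, i.e. iterative proportional fitting) in Python; variable names and the convergence check are illustrative, not taken from Little and Wu (1991):

```python
def rake(weights, units, margins, max_iter=100, tol=1e-8):
    """Rim weighting: alternately scale weights so the weighted marginal
    totals match each covariate's target margin, until convergence.
    `units`: list of dicts mapping covariate -> category;
    `margins`: covariate -> {category: target total}."""
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in margins.items():
            # current weighted total per category of this covariate
            totals = {}
            for wi, u in zip(w, units):
                totals[u[var]] = totals.get(u[var], 0.0) + wi
            # scale every unit's weight toward this covariate's margin
            for i, u in enumerate(units):
                factor = target[u[var]] / totals[u[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w

# toy example: calibrate 4 units to gender and age margins
units = [{"sex": "m", "age": "young"}, {"sex": "m", "age": "old"},
         {"sex": "f", "age": "young"}, {"sex": "f", "age": "old"}]
margins = {"sex": {"m": 1.0, "f": 3.0}, "age": {"young": 2.0, "old": 2.0}}
w = rake([1.0] * 4, units, margins)
```

Only the marginal totals are matched; the joint distribution of the covariates is left to the data, which is what distinguishes rim weighting from full post-stratification.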

Page 17: Myoung Ho Lee

Case Study

• Reference survey : “2009 Social Survey” by Statistics Korea

- Culture & Leisure, Income & Consumption, etc.

- All persons aged 15+ in 17,000 households

- Sample size : 37,049

- Face-to-face mode

- Post-stratification estimation

- Assumed to be “True”

Case Study 17

Page 18: Myoung Ho Lee

• Web survey

- Recruiting volunteers from web sites (6,854 households)

- Systematic sampling with unequal selection probabilities

(inverse of rim weights using region, age, gender)

- Sample size : 1,500 households and 2,903 respondents

- Overlapping covariates : 123

Case Study 18

Page 19: Myoung Ho Lee

Case Study – Model Selection 19

M1 = Stepwise(22), M2 = Stepwise(17), M3 = LASSO(12), M4 = Boosted tree(18)

(number of selected covariates in parentheses)

Page 20: Myoung Ho Lee

Assessment methods

• 16 combinations : (Model 1, 2, 3 and 4) × (Inverse weighting

and Subclassification) × (No Calibration and Rim weighting)

• 12 response variables

• Percentage of bias reduction

Case Study 20
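The slides do not show the formula for this metric; one common definition of the percentage of bias reduction, taking the reference-survey estimate as the "true" value, would be:

```latex
\%\ \text{bias reduction} = 100 \times
\frac{\left|\hat{\bar{y}}_{\text{unadj}} - \bar{Y}\right| - \left|\hat{\bar{y}}_{\text{adj}} - \bar{Y}\right|}
     {\left|\hat{\bar{y}}_{\text{unadj}} - \bar{Y}\right|}
```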

Page 21: Myoung Ho Lee

Percentage of bias reduction

[Charts: bias reduction (%) for models M1–M4, comparing Inverse weighting vs. Subclassification, for PSA alone and for PSA with Calibration]

Page 22: Myoung Ho Lee

• Why doesn't PSA work well alone?

Discussion

Propensity scores for each survey in 5 strata in Model 1

22

Page 23: Myoung Ho Lee

What are the possible solutions to fix poor PSA?

• Setting a maximum value for the weights

• Different subclassification algorithm

- A formula for the variance of weights that depends on both the

number of cases from each group within a stratum and the

variability of propensity scores within the stratum

• Matching PSA

- matching a limited number of treated-group members to a larger

number of control-group members

Discussion 23

Page 24: Myoung Ho Lee

• Violation of some assumptions

- ‘Strong ignorability assumption’

- Missing at random (MAR)

- Mode effects

• Variable selection (What are webographic variables?)

- Models affect the performance of PSA significantly

- Perhaps expert knowledge, rather than a statistical approach, is needed

- Further studies are needed

Discussion 24

Page 25: Myoung Ho Lee

• Web surveys have attractive advantages

• However, bias arises from self-selection, under-coverage and non-response

• According to my case study results,

=> It seems difficult to apply PSA to the "real world" just now

• Further research on webographic variables and different PSA

methods is needed

Conclusion 25