Experimental thinkaloud protocols: a new method for evaluating the validity of survey questions

Patrick Sturgis, National Centre for Research Methods (NCRM) and University of Southampton

Paper presented at the New Measurement Issues in Survey Research meeting of the Survey Resources Network, 21 September 2010

Page 1: Experimental thinkaloud protocols: a new method for evaluating the validity of survey questions

Experimental thinkaloud protocols: a new method for evaluating the validity of survey questions

Patrick Sturgis

National Centre for Research Methods (NCRM) and University of Southampton

Paper presented at the New Measurement Issues in Survey Research meeting of the Survey Resources Network, 21 September 2010

Page 2

Do different questions measure the same thing?

Many important concepts are measured by different ‘standard’ questions in surveys:
  Social/political trust
  General health
  Life happiness/satisfaction
  Fear of crime/confidence in police

How to tell if they are ‘equivalent’?

How to tell which is the ‘best’ measure?

Page 3

Validity assessment strategies

Face/process validity
Correlation with criterion variables
Multi-trait-multi-method (MTMM)
Expert panels
Behaviour coding
Interviewer debrief
Thinkaloud protocols/cognitive interview

Page 4

Experimental thinkalouds

Randomly assign respondents to receive one or other version of the ‘same’ question

Follow up with verbatim probe: ‘what came to mind when answering last question?’

Examine marginal distribution of cognitive frames by question type
  Are people thinking of things they should be?

Use thinkaloud variables in regression model to predict earlier response
  Which cognitive frames are most relevant in forming answers to the questions?
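
The first three steps of this design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the studies reported here: the function names (`assign_versions`, `frame_shares`), the version labels "A" and "B" and the toy data are all hypothetical, and simple per-respondent randomisation is assumed.

```python
import random
from collections import Counter


def assign_versions(respondent_ids, versions=("A", "B"), seed=0):
    """Randomly assign each respondent to one version of the question."""
    rng = random.Random(seed)
    return {rid: rng.choice(versions) for rid in respondent_ids}


def frame_shares(assignment, coded_frames):
    """Percentage of respondents mentioning each cognitive frame, by version.

    coded_frames maps respondent id -> set of frame labels coded from the
    verbatim answer to the follow-up probe.
    """
    totals = Counter(assignment.values())
    mentions = {version: Counter() for version in totals}
    for rid, version in assignment.items():
        mentions[version].update(coded_frames.get(rid, set()))
    return {
        version: {frame: 100.0 * n / totals[version] for frame, n in counts.items()}
        for version, counts in mentions.items()
    }


# Hypothetical toy data, purely for illustration.
assignment = {1: "A", 2: "A", 3: "B", 4: "B"}
coded = {
    1: {"known others"},
    2: {"unknown others"},
    3: {"known others"},
    4: {"known others", "local area"},
}
shares = frame_shares(assignment, coded)
```

The final step would enter the coded frames as dummy variables in a regression predicting the answer to the survey question; it is left out of the sketch.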

Page 5

Example 1 - Trust

Page 6

Conceptions of Trust

Trust is a ‘good thing’
  Trusting citizens are good citizens (voting, volunteering, civic engagement)
  Trusting societies are good societies (more democratic, egalitarian, greater economic performance)

Trust ‘lubricates’ social and economic transactions
  Reduces ‘monitoring costs’ and the need for contracts etc.

Page 7

The standard trust question

Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people?
  Most people can be trusted
  Can’t be too careful

Usually credited to Rosenberg (1959), the ‘Rosenberg Generalized Trust’ (RGT) item

Page 8

The Local Area Trust item

How much do you trust people in your local area?
  a lot
  a fair amount
  not very much
  not at all

Reflects Putnam’s emphasis on trust being a property of local areas

Page 9

Trust by Question type

These items are both used more or less interchangeably as measures of generalized trust

Yet, they yield very different estimates of trust at the national level, e.g.:
  Social Capital Community Benchmark survey: 47% most people can be trusted; 83% trust people in local area ‘some’ or ‘a lot’
  UK Taking Part survey: 44% most people can be trusted; 74% trust ‘many’ or ‘some’ of the people in their local area

Why such a large discrepancy in generalized trust (trust in strangers)?

Page 10

Research Design

Ipsos-MORI general population omnibus survey
Random selection of small areas, quota controlled selection of individuals
n=989 (fieldwork, November 2007)
Respondents randomly assigned to RGT or TLA item
In answering the last question, who came to mind when you were thinking about ‘most people’/ ‘people in your local area’?

Page 11

Distributions for trust questions

RGT item (n=508)
  Most people can be trusted: 48% (229)
  Can’t be too careful: 52% (252)

TLA item (n=481)
  A lot: 20% (100)
  A fair amount: 60% (302)
  Not very much: 17% (88)
  Not at all: 3% (17)
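
As a quick arithmetic check, the percentages can be recomputed from the counts in brackets. The sketch below is illustrative only; the helper name `percentages` is hypothetical. Note that the counts as transcribed sum to 481 for the RGT item and 507 for the TLA item, which do not match the n values printed in the column headings, so the sketch uses the count totals as denominators.

```python
def percentages(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts as shown on the slide.
rgt_counts = {"Most people can be trusted": 229, "Can't be too careful": 252}
tla_counts = {"A lot": 100, "A fair amount": 302, "Not very much": 88, "Not at all": 17}

rgt_pct = percentages(rgt_counts)
tla_pct = percentages(tla_counts)
```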

Page 12

Higher order code (% mentioned): primary codes

Known others (42%): 1. colleagues/ex-colleagues; 2. family/family member; 3. friends; 4. most people I know/meet; 5. neighbours; 6. people from my church
Unknown others (22%): 7. anyone/all people; 8. everyone/everybody; 9. foreigners/ethnic minorities; 10. general public/people in general; 11. children/young people; 12. no-one in particular; 13. strangers
Local community (5%): 14. people in this town/village
Named job/profession (10%): 15. doctors; 16. officials/authority figures/professionals; 17. police; 18. politicians/political parties; 19. salesmen/sales people; 20. tradesmen
Other (not relevant) (13%): 21. don't know these days; 22. identity theft; 23. you have to place trust in people; 24. people interested in themselves; 25. people mostly trustworthy; 26. trust people until they upset me; 27. trusting is naïve; 28. other answers
Don’t know/no answer (22%): 29. don't know/not stated
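
The two-level coding frame can be expressed as a lookup from primary code to higher order code. The sketch below is a hypothetical illustration: the grouping of primary codes 1 to 29 is inferred from the layout of the slide, and the names `HIGHER_ORDER`, `higher_order_codes` and `percent_mentioning` are not from the original analysis. Because a respondent may mention more than one kind of person, a respondent can fall under several higher order codes, which would be consistent with the percentages summing to more than 100.

```python
from collections import Counter

# Primary code number -> higher order code (grouping inferred from the slide).
HIGHER_ORDER = {}
for label, codes in [
    ("known others", range(1, 7)),
    ("unknown others", range(7, 14)),
    ("local community", range(14, 15)),
    ("named job/profession", range(15, 21)),
    ("other (not relevant)", range(21, 29)),
    ("don't know/no answer", range(29, 30)),
]:
    for code in codes:
        HIGHER_ORDER[code] = label


def higher_order_codes(primary_codes):
    """Collapse one respondent's primary codes into a set of higher order codes."""
    return {HIGHER_ORDER[c] for c in primary_codes}


def percent_mentioning(responses):
    """Percentage of respondents mentioning each higher order code."""
    counts = Counter()
    for primary in responses:
        counts.update(higher_order_codes(primary))
    return {label: 100.0 * n / len(responses) for label, n in counts.items()}


# Hypothetical coded responses, one list of primary codes per respondent.
example = percent_mentioning([[1, 7], [2], [29], [14]])
```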

Page 13

Who comes to mind by RGT

[Bar chart: % mentioned (axis 0% to 80%) by code (known others, unknown others, named job/profession, people in local area, other, don't know/not stated), with separate bars for ‘most people can be trusted’ and ‘can't be too careful’]

Page 14

Who comes to mind by TLA

[Bar chart: % mentioned (axis 0% to 80%) by code (known others, unknown others, named job/profession, people in local area, other, don't know/not stated), with separate bars for ‘a lot’, ‘a fair amount’ and ‘not at all/not very much’]

Page 15

Who came to mind – both questions

[Bar chart: % mentioned (axis 0% to 60%) by code (known others, unknown others, named job/profession, people in local area, other, don't know/not stated), with separate bars for RGT and TLA]

Page 16

Explanatory Models 1

RGT Item – Binary Logit Model

                                      Model 1a                  Model 2a
Covariates                            Logit (S.E.)       O.R.   Logit (S.E.)       O.R.
Age (years)                           0.028 (0.036)      1.03   0.013 (0.038)      1.01
Sex (male=1)                          0.057 (0.197)      1.06   0.091 (0.207)      1.09
Social class (ABC1=1)                 0.817 (0.213)***   2.26   0.949 (0.227)***   2.58
Longstanding illness (yes = 1)        0.355 (0.335)      1.43   0.462 (0.349)      1.59
Highest qualification (ref=no qualifications)
  Degree                              0.944 (0.337)**    2.60   1.029 (0.354)**    2.80
  GCSE or above                       0.108 (0.261)      1.11   0.142 (0.276)      1.15
Marital status (ref = single, never married)
  Divorced                            0.236 (0.454)      1.27   0.508 (0.476)      1.66
  Married                             0.176 (0.274)      1.19   0.413 (0.291)      1.51
  Widow                               -0.124 (0.516)     0.88   0.272 (0.540)      1.31
Who came to mind? (ref=2. unknown others)
  1. known others                     -                  -      1.535 (0.267)***   4.64
  3. people in local area             -                  -      1.885 (0.763)**    6.60
  4. named job/ profession            -                  -      -0.255 (0.373)     0.78
  5. other (not relevant)             -                  -      0.257 (0.328)      1.29
  6. no-one/ don't know/ not stated   -                  -      1.043 (0.280)***   2.84
Constant                              -1.178 (0.345)     0.31   -2.161 (0.410)     0.12
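
The O.R. columns are the exponentiated logit coefficients. The sketch below shows the conversion for two Model 2a entries, together with an approximate 95% Wald interval. The interval is not reported on the slide and is included only as an illustration; the function names are hypothetical. Because the printed coefficients are rounded, a recomputed odds ratio may differ from the printed one in the last digit.

```python
import math


def odds_ratio(logit):
    """Convert a logit coefficient to an odds ratio."""
    return math.exp(logit)


def wald_interval(logit, se, z=1.96):
    """Approximate 95% confidence interval for the odds ratio (Wald method)."""
    return math.exp(logit - z * se), math.exp(logit + z * se)


# 'Known others' versus 'unknown others' in Model 2a: logit 1.535, S.E. 0.267.
or_known = odds_ratio(1.535)
low, high = wald_interval(1.535, 0.267)

# Social class (ABC1) in Model 1a: logit 0.817.
or_class = odds_ratio(0.817)
```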

Page 17

Explanatory Models 2

TLA Item – Ordered Logit Model

                                      Model 1b                  Model 2b
Covariates                            Logit (S.E.)       O.R.   Logit (S.E.)       O.R.
Age (years)                           0.097 (0.034)**    1.10   0.076 (0.034)*     1.08
Sex (male=1)                          -0.393 (0.186)**   0.68   -0.255 (0.190)     0.77
Social class (ABC1=1)                 0.751 (0.204)***   2.12   0.771 (0.207)***   2.16
Longstanding illness (yes = 1)        0.230 (0.293)      1.26   0.297 (0.297)      1.35
Highest qualification (ref=no qualifications)
  Degree                              0.605 (0.312)*     1.83   0.425 (0.320)      1.53
  GCSE or above                       0.218 (0.255)      1.24   0.075 (0.258)      1.08
Marital status (ref = single, never married)
  Divorced                            -0.247 (0.409)     0.78   -0.206 (0.418)     0.81
  Married                             0.323 (0.249)      1.38   0.275 (0.253)      1.32
  Widow                               0.516 (0.440)      1.68   0.447 (0.448)      1.56
Who came to mind? (ref=2. unknown others)
  1. known others                     -                  -      1.559 (0.305)***   4.75
  3. people in local area             -                  -      0.953 (0.408)*     2.59
  4. named job/ profession            -                  -      0.087 (0.305)      1.09
  5. other (not relevant)             -                  -      0.383 (0.356)      1.47
  6. no-one/ don't know/ not stated   -                  -      0.579 (0.346)      1.78
Constant                              -                  -      -                  -

Page 18

The science of well-being

“Now is the time for every government to collect data on a uniform basis on the happiness of its population…every survey of individuals should automatically measure their well-being, so that in time we can really say what matters to people and by how much. When we do, it will produce very different priorities for our society.” Layard 2010, Science.

Page 19

Survey measures of subjective well-being

Tend to ask about ‘happiness’ or ‘satisfaction’ with life

And treat these as if they are measuring the same concept

Page 20

Happiness = Satisfaction?

Yes – time-series models show same pattern of effects (Blanchflower and Oswald, 2002)

No – happiness and satisfaction correlated but not equivalent in European Values Survey (Gundelach and Kreiner 2004)

Page 21

Mode effects

Widely different estimates of well-being across different surveys

Could mode be an explanatory factor?
  Being unhappy with your life is not socially desirable (people may over-state happiness to an interviewer)

Conti and Pudney (2008) find higher ratings of satisfaction in interviewer-administered relative to self-administered questions

Page 22

Design

Ipsos-MORI face-to-face omnibus survey (quota sample), April 2010

n=2033

Respondents randomly allocated to:
  1. Interviewer administered life satisfaction
  2. Self-administered life satisfaction
  3. Interviewer administered happiness
  4. Self-administered happiness

Page 23

Questions (from European Social Survey)

All things considered, how happy would you say you are? Please answer using the scale on the card where 1 means ‘extremely unhappy’ and 10 means ‘extremely happy’.

1. Extremely unhappy ... 10. Extremely happy

All things considered, how satisfied are you with your life as a whole nowadays? Please answer using the scale on the card where 1 means ‘extremely dissatisfied’ and 10 means ‘extremely satisfied’.

1. Extremely dissatisfied ... 10. Extremely satisfied

Page 24

Verbatims

Now, thinking about your answer to the last question, please tell me what came to mind when thinking about your answer. There are no right or wrong answers; I just want you to tell me everything that came to mind in thinking about how happy you are. What else?

PROBE FULLY

Page 25

Results 1: satisfaction = happiness?

Page 26

Raw distributions for happiness and satisfaction

Mean=7.38 Mean=7.39

Page 27

Satisfaction v Happiness - distributions

Pearson’s Chi Square, p=0.041

Page 28

Satisfaction v Happiness by sex

p=0.394; p=0.047

Means: Male = 7.43; Female = 7.34

Page 29

Results 2: mode effects

Page 30

Mode effect by question - means

Question       CAPI (s.e.)    CASI (s.e.)
Happiness      7.45 (.077)    7.32 (.081)
Satisfaction   7.29 (.081)    7.49 (.085)*
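
One way to gauge the size of each mode difference is a z statistic built from the reported means and standard errors. The slide does not say which test produced the asterisk, so the sketch below is an assumption: it treats the CAPI and CASI groups as independent samples (they were randomly allocated) and uses a normal approximation. The function names are hypothetical.

```python
import math


def diff_z(mean_a, se_a, mean_b, se_b):
    """z statistic for the difference between two independent means."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# CASI minus CAPI, using the means and standard errors from the table.
z_happy = diff_z(7.32, 0.081, 7.45, 0.077)
z_sat = diff_z(7.49, 0.085, 7.29, 0.081)
p_happy = two_sided_p(z_happy)
p_sat = two_sided_p(z_sat)
```

The sign of each statistic shows the direction of the mode difference: negative for happiness (lower under CASI) and positive for satisfaction (higher under CASI).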

Page 31

Mode effect by question - distributions

p=0.015; p=0.209

Page 32

Question*mode*sex - means

Question        CAPI (s.e.)    CASI (s.e.)
Men
  Happiness     7.40 (.105)    7.36 (.118)
  Satisfaction  7.46 (.118)    7.52 (.127)
Women
  Happiness     7.50 (.111)    7.28 (.112)
  Satisfaction  7.12 (.118)    7.48 (.127)**

Page 33

Question*mode*sex - distributions

p=0.018

p=0.145

p=0.037

p=0.053

Page 34

Prediction model

                        happiness   s.e.   satisfaction   s.e.
(Constant)              6.154       .351   6.385          .352
sex (male)              -.012       .137   .292           .137
age (years)             .016        .004   .006           .004
social grade (AB)       .062        .187   .337           .188
social grade (CD)       -.050       .179   .217           .175
net income (banded)     .131        .058   .206           .055
parent (yes)            -.049       .156   -.070          .158
highest qual (degree)   .201        .171   -.263          .169
no qualifications       -.169       .217   .052           .212
mode (CASI)             -.060       .134   -.280          .134
n                       643                645
R²                      .053               .052

Page 35

Verbatim responses

Page 36

Verbatim responses

Verbatim responses coded to a descriptive frame with 111 codes

These were then allocated to one of 14 thematic codes

Page 37

Thematic Codes

1. work/job/education
2. family/friends/pets
3. emotions/feelings/outlook
4. ageing
5. house/home/area
6. financial/material possessions
7. social life/hobby
8. freedom/independence
9. events/temporary
10. health (self)
11. health (other)
12. political/environmental concerns
13. neutral/in the middle
14. other/idiosyncratic

Page 38

Significant differences in thematic codes across questions

[Bar chart: % reporting code (axis 0.0 to 25.0) by thematic code (work/job/education, economy/financial/material, events/temporary, political/environmental), with separate bars for happiness and satisfaction]

Page 39

Conclusions

A great deal of heterogeneity in the frames of reference people use in answering trust questions

Acquaintances more trusted than strangers

Problematic to assume these questions measure generalized trust

Local area question should not be used interchangeably with standard trust item