How does health psychology measure up?
A critical look at measurement in health psychology
Matthew Hankins
16th September 2011
The empirical basis of Health Psychology
• Why do Health Psychologists collect data?
– Theory generation, esp. identifying constructs
– Theory corroboration
– Measuring outcomes (trials etc.)
• The value of such activities is therefore critically dependent on the quality of the data
Questionnaire measures
• Majority of data collected by Health Psychologists is generated by questionnaire measures (‘scales’)
• Questionnaires vary in the quality of data that they generate
– Validity: extent to which the questionnaire measures what is intended
– Reliability: extent to which variance in data reflects variance in construct measured
• Index of measurement error
Pragmatic approach
• Validity
– Unidimensionality (factor analysis)
– Associations between measures
– Discrimination between known groups
• Reliability
– Estimated by Cronbach’s Alpha
– Or test-retest correlation
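As a rough illustration of what the alpha coefficient computes, here is a minimal Python sketch. The function name `cronbach_alpha` and the response data are hypothetical, not taken from the talk. It uses the standard formula: alpha = k/(k-1) × (1 − sum of item variances / variance of the summed score).

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`: one list of k item scores per respondent."""
    k = len(rows[0])
    # Variance of each item across respondents
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    # Variance of the summed scale score
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 3 items, each scored 1-3
responses = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 2, 1],
    [3, 2, 3],
]
print(round(cronbach_alpha(responses), 2))
```

Note that alpha depends only on how strongly the items covary. A high value does not by itself show that the summed score orders people on the construct.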
Scale development
• Combination of these approaches is derived from ‘Classical Test Theory’ (CTT)
– Originated with Spearman (1904)
– Landmark text: Guilford 2nd ed. (1954)
– Fully developed by Lord & Novick (1968)
• Further developments: ‘item-response theory’ (IRT)
– E.g. Rasch model (1960)
• CTT implicit in most empirical Health Psychology research
What is a scale?
• A scale orders people on the construct of interest
• Both CTT & IRT agree that a person’s position on the dimension can be estimated from the item scores
• Strength of IRT is that it does not assume that a set of correlated items forms a scale
• Implicit in CTT: if items load on same factor, we automatically assume that they form a scale
Construct
Low Person A Person B Person C Person D High
Scaling problem
• Whether a set of items forms a scale is a hypothesis (Guttman 1950)
– Formally tested whether items formed ‘Guttman scales’
• “In contemporary psychometric practice, it is the rule rather than the exception that two people having the same score on a test will have [endorsed] different items… Such scores are crude empirical devices known to have some predictive efficiency, but they cannot be called measurements in any strict sense” (Loevinger 1948)
• Additionally, there is no rational basis for adding up a set of ordinal Likert scores unless they have been shown to scale
Example: PHQ-9
• Feeling tired + Little interest in doing things + Poor appetite several days in last 2 weeks
– Scale score = +3
• Thoughts of hurting yourself in some way nearly every day in last 2 weeks
– Scale score = +3
• Are these responses really equivalent?
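The arithmetic behind this example can be sketched in a few lines of Python. The 0 to 3 response coding is the standard PHQ-9 coding. The item labels are abbreviated as on the slide, and the names `RESPONSE_SCORES`, `contribution`, `person_a` and `person_b` are hypothetical.

```python
# Standard PHQ-9 response coding (0-3 per item)
RESPONSE_SCORES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}


def contribution(responses):
    """Sum the 0-3 codes of the listed item responses."""
    return sum(RESPONSE_SCORES[answer] for answer in responses.values())


# Three low-intensity symptoms, each on several days
person_a = {
    "feeling tired": "several days",
    "little interest in doing things": "several days",
    "poor appetite": "several days",
}
# One high-intensity symptom, nearly every day
person_b = {"thoughts of hurting yourself": "nearly every day"}

# Both patterns add the same amount to the summed scale score
print(contribution(person_a), contribution(person_b))
```

The summed score cannot tell these two response patterns apart, which is the point of the question above.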
Implications
• If a set of items is assumed to form a scale, then we cannot be sure that the scale score accurately ranks people on the construct of interest
– People with different positions may be assigned the same score
– People with the same position may be assigned different scores
• Unless we test this hypothesis, assessing reliability & validity is pointless
Rejecting the hypothesis of a scale
• Scales are very rarely ‘rejected’ in health psychology
• Reliability is usually reported as ‘acceptable’ or ‘good’
– Based on arbitrary cut-off around 0.7 (0.6, 0.5…)
– “Test-retest reliability was acceptable (r=0.43)”
• Criteria for validity are usually not specified in advance
– Any factor structure can be accommodated
– Any association can be cited as ‘validating’ scale
• Formal testing of ‘scalability’ of items rare
What we would like: interval scales
What we might have: ordinal scales
What we probably have: disordered categories
A scale that cannot rank-order people is not a scale
Disordered categories
Item ‘difficulty’ (intensity)
• The problem arises because CTT does not account for item difficulty or intensity
• Some items are endorsed at low levels of the construct
– ‘Low intensity item’
– Endorsement may indicate low or high level of construct
• Some items are endorsed at high levels of the construct
– ‘High intensity item’
– Endorsement indicates high level of construct
Example: PHQ-9
• Feeling tired on several days is a low intensity item
– Endorsed at low level of depression
– But may also be endorsed at higher levels of depression
Depression
Low Yes Yes Yes Yes High
Example: PHQ-9
• Thoughts of hurting yourself in some way nearly every day in last 2 weeks is a high intensity item
– Endorsed at high level of depression
– But not endorsed at lower levels of depression
Depression
Low No No No Yes High
How CTT fails to deal with item intensity
Factor analysis groups items of similar intensity
• Factor analysis of a unidimensional construct will produce more than one ‘factor’
• These ‘factors’ are simply sets of items with similar intensities
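A small simulation can make the claim above concrete. The sketch below is illustrative only: the 100 simulated people, the four item thresholds and all names are hypothetical. Every item is a deterministic function of one attribute, so the data are unidimensional by construction, yet the inter-item (phi) correlations are much higher between items of similar intensity than between items of different intensity.

```python
from math import sqrt


def phi(x, y):
    """Pearson (phi) correlation between two lists of 0/1 item scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / sqrt(var_x * var_y)


# 100 hypothetical people spread evenly along ONE attribute (0 to 1)
people = [(i + 0.5) / 100 for i in range(100)]

# Hypothetical item intensities: an item is endorsed once the attribute exceeds its threshold
thresholds = {"low_1": 0.2, "low_2": 0.3, "high_1": 0.7, "high_2": 0.8}
items = {
    name: [1 if level > t else 0 for level in people]
    for name, t in thresholds.items()
}

names = list(items)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(phi(items[a], items[b]), 2))
```

A correlation matrix with this block pattern is the kind of input that leads a factor analysis to separate ‘low’ and ‘high’ items, even though only one attribute generated the data.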
Example: GHQ-12
• Many studies report 2- or 3-factor solutions
• ‘Factors’ simply group items by intensity (Hankins 2008)
Figure (psychiatric morbidity axis, Low to High): GHQ-12 items shown in three groups: 7, 4, 5, 2, 6, 10, 11; 1, 12, 9; 8, 3
How CTT fails to deal with item intensity
Selecting items on basis of factor analysis exacerbates problem, but simultaneously conceals it
• Items are selected on basis of similar intensities, creating scales with limited range but high reliability
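Figure (two psychiatric morbidity axes, each Low to High): the first shows all twelve GHQ-12 items in three groups (7, 4, 5, 2, 6, 10, 11; 1, 12, 9; 8, 3); the second shows only items 7, 4; 1, 12; 8, 3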
Why Rasch modelling is not the answer
• Rasch modelling (RM) explicitly takes into account item intensities
– Stochastic Guttman scale
• Tests the hypothesis that items form a scale
• Additionally claims to produce interval scaling & ‘objective’ measurement
• Increasingly popular in Health Psychology
CTT vs. IRT
• Argument tends to be that IRT is superior to CTT & IRT is ‘objective’ measurement
• Differences more apparent than real:
– Large correlations between CTT data & IRT data
– If data treated as ordinal, perfect correlation between CTT & Rasch data
From Embretson & Reise (2000)
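One way to see why the ordinal agreement is perfect: under the Rasch model the maximum-likelihood person estimate depends only on the raw sum score, and it increases with that score. The sketch below illustrates this for complete data. The item difficulties and function names are hypothetical, and bisection is used for simplicity rather than because any particular Rasch software works this way.

```python
from math import exp


def p_endorse(theta, difficulty):
    """Rasch probability of endorsing an item of the given difficulty."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))


def theta_for_score(raw_score, difficulties, lo=-10.0, hi=10.0):
    """Solve sum_i P_i(theta) = raw_score by bisection (the Rasch ML equation)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        expected = sum(p_endorse(mid, b) for b in difficulties)
        if expected < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical difficulties for a 5-item scale
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]

# Scores of 0 and 5 have no finite ML estimate, so only 1..4 are shown
estimates = [theta_for_score(r, difficulties) for r in range(1, len(difficulties))]
for raw, theta in zip(range(1, len(difficulties)), estimates):
    print(raw, round(theta, 2))
```

Because the estimate is a strictly increasing function of the raw score, ranking people by summed score and ranking them by Rasch estimate give the same order.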
GHQ-12: CTT scoring vs. RM scoring
Problems
• Rasch models require very large samples to allow estimation of person and item parameters
• Very strong assumptions, e.g. logistic item-response curve
– Why should all items have the same form of response?
• The data must fit the model, not the other way round
– Discards potentially useful data to fit arbitrary assumptions
• Interval scaling is questionable gain if psychological constructs are not quantitative in the first place
Ontological diversion
• In general, psychologists seem to believe that attributes are either categorical or quantitative
– A ‘cat’ is different from a ‘tree’: different categories, difference is qualitative
– 30 cm is different from 60 cm: different quantities, difference is quantitative
• Having made this distinction, quantitative attributes may be measured as categorical, ordinal, interval
• Ordinal attributes cannot exist in their own right
– Just a way of collecting data on a quantitative attribute
Ontological diversion
• Russell (1896): the difference between two quantities is itself a quantity
– The difference between two lengths is itself a length
• For psychological attributes to be quantitative, the difference between two ‘levels’ of that attribute must itself be a ‘level’ of that attribute
– Is the difference between two pleasures itself a pleasure?
– Is the difference between two levels of depression itself a level of depression?
• If not, are psychological states then merely categorical?
– But what then do we mean by ‘severity’ of depression?
Ontological diversion
• Is it possible for psychological attributes to be ordinal?
– Can something exist in degree but not quantity?
• Michell (2009) argues that we cannot assume quantity from degree
– shows that they are logically separable: “It is possible that an ordered attribute is non-quantitative”
• Collingwood (1933) argues that some concepts exist only in degree
Ontological diversion
• Are we comfortable talking about degree, rather than quantity?
• Implicit in our descriptions and experiences of psychological attributes
– But does not require the assumption that the attributes are quantitative
The degrees of the lie
• JAQUES
– Can you nominate in order now the degrees of the lie?
• TOUCHSTONE
– O sir, we quarrel in print, by the book; as you have books for good manners: I will name you the degrees. The first, the Retort Courteous; the second, the Quip Modest; the third, the Reply Churlish; the fourth, the Reproof Valiant; the fifth, the Countercheque Quarrelsome; the sixth, the Lie with Circumstance; the seventh, the Lie Direct.
• As You Like It, Act 5 Scene 4
Summary
• Measurement methods in health psychology are suboptimal
• In particular, the fundamental assumption that correlated items form a scale is not routinely tested
• IRT models such as the Rasch model assume that interval scaling is meaningful
• Psychological attributes may not exist as quantities
• Is there a method for constructing purely ordinal scales?
Non-parametric IRT (NPIRT)
• E.g. Mokken (1971)
• Takes into account item intensities
– Stochastic Guttman scale
• Claims only to rank order people
• Very weak assumptions
– Retains data
• Complements CTT
– Uses simple scale score
Examples of NPIRT analysis
• Mokken (1971) proposed two models
– Monotone homogeneity model (MH)
– Doubly monotone model (DM)
• Scales fitting the MH model rank order people on the attribute of interest
• Corollary is that scales not fitting the MH model do not rank order people on the attribute of interest
• Select items for the scale based on homogeneity
• Assess whether the resulting scale fits the MH model
• Scaling procedure and the MH model based on the following minimal assumptions:
– For all items, if person A has a higher degree of X than person B, A’s probability of endorsing an item will be equal to or higher than B’s
– Local independence: item scores are uncorrelated for the same degree of attribute
• If the purpose of the scale is to rank order people on a given attribute then the scale must be monotone homogeneous
• Probability of item being endorsed must be monotone nondecreasing against attribute
• i.e. probability of item endorsement does not decrease with an increase in the measured attribute*
* As estimated from the remaining items of the scale
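The monotonicity requirement above can be checked informally by grouping people on their rest score (the sum of the remaining items) and looking at the proportion endorsing the item in each group. The sketch below is a simplified illustration with hypothetical function names and data. It only compares adjacent rest-score groups, and does not merge small groups or test violations for significance as dedicated Mokken software does.

```python
from collections import defaultdict


def rest_score_curve(rows, item):
    """Proportion endorsing `item` at each rest score (sum of the other items)."""
    groups = defaultdict(list)
    for row in rows:
        groups[sum(row) - row[item]].append(row[item])
    return [(rest, sum(v) / len(v)) for rest, v in sorted(groups.items())]


def monotonicity_violations(rows, item):
    """Count adjacent rest-score groups where the endorsement proportion drops."""
    proportions = [p for _, p in rest_score_curve(rows, item)]
    return sum(1 for a, b in zip(proportions, proportions[1:]) if b < a)


# Hypothetical data: three items forming a perfect Guttman pattern
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]

# Hypothetical data: the last item is endorsed in the middle of the range
# but not by the person with the highest rest score
non_monotone = [[0, 0, 0, 0], [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

print(monotonicity_violations(guttman, 0))
print(monotonicity_violations(non_monotone, 3))
```

An item whose endorsement proportion falls as the rest score rises is a candidate violation of the MH model.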
For this GHQ-12 item the probability of endorsement reaches 50% at a low level of psychological distress.
It is therefore a low intensity item: people endorsing this item are signalling a low level of distress.
For this GHQ-12 item the probability of endorsement reaches 50% at a high level of psychological distress.
It is therefore a high intensity item: people endorsing this item are signalling a high level of distress.
• If two items belong to a unidimensional scale, then:
– Endorsing the more intense item entails that the less intense item also be endorsed
– Endorsing the less intense item does not entail that the more intense item be endorsed
• For a Guttman scale, these are deterministic statements
• For a Mokken scale, these are probabilistic statements
• A Guttman error occurs when the more intense item is endorsed but not the less intense item
• Too many Guttman errors imply that items are not measuring the same attribute
More intense item
Less intense item
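Counting Guttman errors for a pair of items takes only a few lines. In this sketch the function name and data are hypothetical, and the less intense item is taken to be the one endorsed by more people, which is an assumption about how intensity is estimated from the data.

```python
def guttman_errors(rows, i, j):
    """Count people who endorse the more intense of items i, j but not the less intense."""
    n_i = sum(row[i] for row in rows)
    n_j = sum(row[j] for row in rows)
    # Assumption: the item endorsed by more people is the less intense one
    less_intense, more_intense = (i, j) if n_i >= n_j else (j, i)
    return sum(
        1 for row in rows
        if row[more_intense] == 1 and row[less_intense] == 0
    )


# Hypothetical responses to two items (columns) from five people (rows)
rows = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0]]
print(guttman_errors(rows, 0, 1))
```

Here item 0 is endorsed by three people and item 1 by two, so the single person with the pattern [0, 1] is the only Guttman error.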
• This asymmetrical relationship between item pairs can be summarised with Loevinger’s H
– H is the coefficient of homogeneity between two items i and j
• Ranges from 0.0 to 1.0
– 0.0 indicates no association between items
– 1.0 indicates perfect association, given the differences in item intensity
– 1.0 also indicates no Guttman errors
• Mokken (1971) developed H for scale development
– Hij : Homogeneity of pair of items
– Hi : Homogeneity of item i with all items
– H : Homogeneity of scale
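The three coefficients can be sketched for binary items as ratios of observed covariance to the maximum covariance possible given how often each item is endorsed. This ratio form is equivalent to one minus observed over expected Guttman errors. The function names and data below are hypothetical, and the sketch assumes every item is endorsed by some but not all respondents, since otherwise the denominator is zero.

```python
from itertools import combinations


def cov_terms(rows, i, j):
    """Observed covariance of binary items i and j, and its maximum given their marginals."""
    n = len(rows)
    p_i = sum(row[i] for row in rows) / n
    p_j = sum(row[j] for row in rows) / n
    p_both = sum(row[i] * row[j] for row in rows) / n
    return p_both - p_i * p_j, min(p_i, p_j) - p_i * p_j


def h_pair(rows, i, j):
    """Hij: homogeneity of a pair of items."""
    cov, cov_max = cov_terms(rows, i, j)
    return cov / cov_max


def h_item(rows, i):
    """Hi: homogeneity of item i with all other items."""
    terms = [cov_terms(rows, i, j) for j in range(len(rows[0])) if j != i]
    return sum(c for c, _ in terms) / sum(m for _, m in terms)


def h_scale(rows):
    """H: homogeneity of the whole scale."""
    pairs = combinations(range(len(rows[0])), 2)
    terms = [cov_terms(rows, i, j) for i, j in pairs]
    return sum(c for c, _ in terms) / sum(m for _, m in terms)


# Hypothetical data: a perfect Guttman pattern, then the same with one error added
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
with_error = perfect + [[0, 0, 1]]

print(h_scale(perfect), round(h_scale(with_error), 2))
```

With no Guttman errors every coefficient equals 1.0, and adding a single error pattern pulls the scale H down.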
• All Hij > 0
• Start with item pair with highest Hij
• Select third item to maximise scale H
• Proceed until H reaches threshold value c
• Produces a unidimensional scale
– c = 0.3; weak scale
– c = 0.4; medium scale
– c = 0.5; strong scale
– c = 1.0; perfect Guttman scale
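The selection steps above can be sketched as a greedy loop. This is a simplified illustration following the bullets as written, not a reimplementation of Mokken's published procedure, which also applies item-level criteria and significance tests. All names and data are hypothetical, and the sketch does not check that the starting pair itself exceeds c.

```python
from itertools import combinations


def cov_terms(rows, i, j):
    """Observed covariance of binary items i and j, and its maximum given their marginals."""
    n = len(rows)
    p_i = sum(row[i] for row in rows) / n
    p_j = sum(row[j] for row in rows) / n
    p_both = sum(row[i] * row[j] for row in rows) / n
    return p_both - p_i * p_j, min(p_i, p_j) - p_i * p_j


def scale_h(rows, items):
    """Scale H for the given item indices: summed covariances over summed maxima."""
    terms = [cov_terms(rows, i, j) for i, j in combinations(items, 2)]
    return sum(c for c, _ in terms) / sum(m for _, m in terms)


def select_items(rows, c=0.3):
    """Start from the best pair, then add items while scale H stays at or above c."""
    k = len(rows[0])
    pairs = combinations(range(k), 2)
    selected = list(max(pairs, key=lambda pair: scale_h(rows, pair)))
    while True:
        # Candidates must covary positively with every selected item (all Hij > 0)
        candidates = [
            i for i in range(k)
            if i not in selected
            and all(cov_terms(rows, i, j)[0] > 0 for j in selected)
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda i: scale_h(rows, selected + [i]))
        if scale_h(rows, selected + [best]) < c:
            break
        selected.append(best)
    return selected


# Hypothetical data: items 0-2 form a Guttman pattern, item 3 is unrelated to them
guttman_patterns = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
rows = [pattern + [extra] for pattern in guttman_patterns for extra in (0, 1)]
print(select_items(rows, c=0.3))
```

In this toy data the unrelated item has zero covariance with the others, so it is never admitted to the scale.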
Results for GHQ-12
Step Item Scale H
1 p6d 0.79
1 n4d 0.79
2 n6d 0.73
3 n5d 0.68
4 n2d 0.64
5 n3d 0.61
6 p5d 0.59
7 p3d 0.57
8 p4d 0.55
9 n1d 0.53
10 p2d 0.51
11 p1d 0.50
• => the items of the GHQ-12 form a strong unidimensional scale
Monotone homogeneity model: GHQ-12
Item H #vi maxvi zmax #zsig
p1d 0.44 0 0.00 0.00 0
n1d 0.45 0 0.00 0.00 0
p2d 0.43 1 0.06 0.99 0
p3d 0.50 0 0.00 0.00 0
n2d 0.55 0 0.00 0.00 0
n3d 0.51 0 0.00 0.00 0
p4d 0.47 0 0.00 0.00 0
p5d 0.50 1 0.05 0.90 0
n4d 0.56 0 0.00 0.00 0
n5d 0.50 0 0.00 0.00 0
n6d 0.56 1 0.05 0.93 0
p6d 0.53 1 0.04 0.68 0
• Small deviations from MH model but none significant
Conclusion
• The GHQ-12 is a strongly homogeneous unidimensional scale
• Small deviations from monotone homogeneity, none significant
• The GHQ-12 summed score can rank order people by the measured attribute
• i.e. it can serve as an ordinal measure of severity of psychiatric impairment
• Compare to results of EFA/CFA studies
Example: Northwick Park dependency scale
• Item selection from pool of 16 items
Item Scale H
Q8 0.93
Q5 0.93
Q9 0.93
Q2 0.91
Q1 0.88
Q13 0.87
Q7 0.84
Q12 0.82
Q6 0.79
Q14 0.76
Q4 0.74
Q3 0.70
Q11 0.67
Q15 0.62
• 14 items form unidimensional scale
• Two items with serious violations of monotone homogeneity
Item H #vi maxvi zmax #zsig
Q3 0.45 6 0.25 2.88 4
Q11 0.32 5 0.28 3.43 2
Q3: help required using toilet (urination)
Q11: help required with drinking
• Some items decrease in probability as attribute increases
• With extreme dependency, patients require less help with drinking and emptying bladder
– Because at this extreme, they are more likely to be tube-fed and catheterised
• Hence, for these items, probability of endorsement decreases as dependency increases
– Scale is not monotone homogeneous
• The summed score will not rank order people on the measured attribute
Summary
• The credibility of Health Psychology research & practice rests on its empirical evidence base
• This evidence base relies on the quality of questionnaire data
• The quality of questionnaire data may be compromised by the use of inappropriate methods
• We should stop relying on factor analysis & reliability coefficients & test the hypothesis that a set of items constitutes a scale