1
Class 4
Basic Psychometric Characteristics: Variability, Reliability, Interpretability
October 15, 2009
Anita L. Stewart
Institute for Health & Aging
University of California, San Francisco
2
Overview of Class 4
Concepts of error, sources of error and bias in measures
Indicators of variability and reasons for poor variability
Indicators of reliability
Interpretability of scores
3
Components of an Individual’s Observed Item Score
(Simplistic view)
Observed item score = true score + error
4
Components of an Individual’s Observed Item Score
Observed item score = true score + error
True score: “score that would be obtained over repeated testings” (Nunnally, 1994, p. 211)
5
Random versus Systematic Error
Observed item score = true score + error
Error has two components: random and systematic
6
Random versus Systematic Error
Observed item score = true score + error
Error has two components:
– Random error: relevant to reliability
– Systematic error: relevant to validity
7
Components of Variability in Item Scores of a Group of Individuals
Observed score variance = true score variance + error variance
Observed score variance is the total variance (sum of all observed item scores)
8
Components of Variability in Item Scores of a Group of Individuals
Observed score variance = true score variance + (random) error variance
Observed score variance is the total variance (sum of all observed item scores)
9
Combining Items into Multi-Item Scales
When items are combined into a summated scale, random error to some extent “cancels out”
– Error variance is reduced as the number of items increases
– Reducing random error increases the proportion of “true score” variance
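The "cancelling out" of random error can be illustrated with a small simulation. This is only a sketch under stated assumptions that are not in the slides: each item equals the person's true score plus independent, normally distributed random error, the scale is scored as the mean of its items, and the sample size and standard deviations are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_PEOPLE = 2000
TRUE_SD = 1.0    # hypothetical spread of true scores
ERROR_SD = 1.0   # hypothetical random error added to each item

true_scores = [random.gauss(0, TRUE_SD) for _ in range(N_PEOPLE)]

def scale_error_variance(n_items):
    """Variance of (scale score - true score) when the scale is the mean of n_items items."""
    errors = []
    for t in true_scores:
        items = [t + random.gauss(0, ERROR_SD) for _ in range(n_items)]
        errors.append(statistics.fmean(items) - t)
    return statistics.pvariance(errors)

error_var = {k: scale_error_variance(k) for k in (1, 4, 16)}
for k, v in error_var.items():
    print(f"{k:2d} items: error variance of scale score ~ {v:.3f}")
```

Under these assumptions the error variance of the scale score shrinks roughly in proportion to 1 / (number of items), while the true score variance is unchanged.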
10
Sources of Error
Subjects
Observers or interviewers
Measure or instrument
11
Example: Measuring Weight of Children
Observed score is a linear combination of many sources of variation for an individual
12
Measuring Weight in Pounds (Without Shoes) of One Child
Observed weight = true weight (80 lbs) + amount of water past 30 min + weight of clothes + scale is miscalibrated + person weighing children is not very precise
13
Measuring Weight in Pounds (Without Shoes) of One Child
Observed weight (82.1 lbs) = true weight (80 lbs) + amount of water past 30 min (+.25 lb) + weight of clothes (+.70 lb) + scale is miscalibrated (+.1 lb) + person weighing children is not very precise (+1 lb)
82.1 ≈ 80 + .25 + .70 + .1 + 1 (exact sum is 82.05)
14
Sources of Error in Measuring Weight of Children
Weight of clothes
– Subject source of random error
Scale is miscalibrated
– Instrument source of systematic error
Person weighing child is not precise
– Observer source of random error
15
Measuring Depressive Symptoms (past 4 weeks) in an Asian or Latino Man
Observed depression score = “true” depression (16) + hard to choose number on the 1-6 response choice scale + unwillingness to tell interviewer + poor memory of feelings + measure misses 2 culturally-bound symptoms
16
Measuring Depressive Symptoms (past 4 weeks) in an Asian or Latino Man
Observed depression score (12) = “true” depression (16) + hard to choose number on the 1-6 response choice scale (+1) + unwilling to tell interviewer (-2) + poor memory of feelings (-1) + measure misses 2 culturally-bound symptoms (-2)
12 = 16 + 1 - 2 - 1 - 2
17
Sources of Error in Measuring Depression
Hard to choose one number on 1-6 response scale
– Subject source of random error
Unwilling to tell interviewer, poor memory of feelings
– Subject sources of systematic error (underreport true depression)
Measure misses culturally-bound symptoms
– Instrument source of systematic error (underestimate true depression)
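The depression example can be tallied by error type. The sketch below uses only the values from the example; the data layout and the helper name `total_error` are illustrative choices, not part of the lecture.

```python
# Error components from the depression example: (description, error type, value)
TRUE_SCORE = 16
components = [
    ("hard to choose number on 1-6 response scale", "random", +1),
    ("unwilling to tell interviewer", "systematic", -2),
    ("poor memory of feelings", "systematic", -1),
    ("measure misses 2 culturally-bound symptoms", "systematic", -2),
]

def total_error(kind):
    """Sum the error values of one type ('random' or 'systematic')."""
    return sum(value for _, k, value in components if k == kind)

random_error = total_error("random")          # +1
systematic_error = total_error("systematic")  # -5
observed = TRUE_SCORE + random_error + systematic_error
print(observed)  # 12
```

In this example most of the gap between the observed and true score is systematic (a consistent underestimate), which bears on validity rather than reliability.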
18
Four Types of Memory Errors: From Cognitive Psychology
Encoding
– Information inadequately stored in memory
Storage
– Memory eroded over time
Retrieval
– Some events/feelings harder to recall
Reconstruction
– Errors filling in missing pieces
R Tourangeau, Chap 3, in AA Stone et al. (eds), The Science of Self-Report, London: Lawrence Erlbaum, 2000
19
Memory and Time
Autobiographical memory – memory of things in time and space
Events not encoded with their calendar dates
– Thus time is a poor retrieval method
Numerous errors remembering “when” and “how often” something occurred within a particular time frame
N Bradburn, Chap 4, The Science of Self-Report
20
Memory and Emotion
Tend to remember
– positive more than negative experiences
– more emotionally intense than neutral experiences
– non-threatening events more than threatening, sensitive events
Kihlstrom et al, Chap 6, The Science of Self-Report
21
Overview
Concepts of error
Basic psychometric characteristics
– Variability
– Reliability
– Interpretability
22
Variability
Good variability
– All (or nearly all) scale levels are represented
– Distribution approximates bell-shaped normal
Variability is a function of the sample
– Need to understand variability of a measure in sample similar to one you are studying
Review criteria
– Adequate variability on the latent variable that is relevant to your study
23
Indicators of Variability
Range of scores
Mean, median, mode
Standard deviation (or standard error)
Interquartile range
Skewness statistic
% at floor (lowest possible score)
% at ceiling (highest possible score)
24
Range of Scores: Possible and Observed
Especially important for multi-item measures
Example:
– CES-D possible range is 0-30
– Wong et al. study of mothers of young children: observed range was 0-23
» missing entire high end of the distribution (none had high levels of depression)
25
Mean, Median, Mode
Mean - average
Median - midpoint
Mode - most frequent score
In normally distributed measures, these are all the same
In non-normal distributions, they will vary
26
Mean and Standard Deviation
Most information on variability is from mean and standard deviation
– Can envision how measure is distributed on the possible range
– Mean ± 1 SD = about 68% of the scores (if normally distributed)
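For a normally distributed measure, the share of scores within one SD of the mean can be checked with Python's standard library. This is a minimal sketch; the mean of 70 and SD of 20 are hypothetical values on a 0-100 scale, not figures from the lecture.

```python
from statistics import NormalDist

def share_within(k_sd):
    """Share of a normal distribution lying within k_sd standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)

# Hypothetical scale with mean 70 and SD 20 on a 0-100 range
mean, sd = 70, 20
print(f"{mean - sd} to {mean + sd} holds about {share_within(1):.0%} of scores")
```

Knowing the mean and SD therefore lets a reader picture roughly where the bulk of the sample sits on the possible range, provided the distribution is close to normal.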
27
Interquartile Range (IR)
Difference between the 3rd and 1st quartiles
IR = Quartile 3 - Quartile 1
This range contains the middle 50% of the distribution
– 25% of the sample is above and 25% is below this range
28
Quartiles
Divide distribution into 4 parts with 25% of the sample in each part (quartiles)
Quartile 1 - the scale score at the boundary of the lowest 25% of the distribution
Quartile 2 - the score that divides the distribution in half (same as the median)
Quartile 3 - the score at the boundary of the highest 25% (25% of the sample scores above this point)
29
Set of Scores on 12 people
12 people (first row), 12 scores (second row)
Person: 1 2 3 4 5 6 7 8 9 10 11 12
Score: 2 3 8 1 7 4 4 3 2 7 5 3
Re-arrange scores in numeric order
Person: 4 9 1 8 2 12 7 6 11 10 5 3
Score: 1 2 2 3 3 3 4 4 5 7 7 8
30
Example of Quartiles: Set of Scores on 12 people
1 2 2 3 3 3 4 4 5 7 7 8
Q1 = 2.5: boundary of the lowest 25% (lowest 3 people)
Q2 = 3.5: median (50% below, 50% above)
Q3 = 6: boundary of the highest 25% (highest 3 people)
31
Example of Quartiles: Set of Scores on 12 people
1 2 2 3 3 3 4 4 5 7 7 8
Interquartile range = quartile 3 - quartile 1 = 6 - 2.5 = 3.5
Q1 = 2.5, Q2 = 3.5, Q3 = 6
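The quartiles in this example can be reproduced with a few lines of Python. The sketch assumes the "median of each half" convention, which matches the values shown here; the function name `quartiles` is illustrative. Other conventions (for example the default of `statistics.quantiles`) interpolate differently and can give slightly different cut points.

```python
from statistics import median

# The 12 scores from the example, sorted into numeric order
scores = sorted([2, 3, 8, 1, 7, 4, 4, 3, 2, 7, 5, 3])

def quartiles(sorted_scores):
    """Median-of-halves quartiles (one of several conventions)."""
    n = len(sorted_scores)
    lower = sorted_scores[: n // 2]        # lowest half of the sample
    upper = sorted_scores[(n + 1) // 2 :]  # highest half of the sample
    return median(lower), median(sorted_scores), median(upper)

q1, q2, q3 = quartiles(scores)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 2.5 3.5 6.0 3.5
```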
32
Skewness
Positive skew - scores bunched at low end, long tail to the right
Negative skew - opposite pattern
Skewness coefficient ranges from - infinity to + infinity
– the closer to zero, the more normal
Coefficients beyond ±2.0 are cause for concern
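A skewness coefficient can be computed directly from the scores. The sketch below uses the simple moment-based formula (third central moment divided by the 1.5 power of the second); statistical packages often apply small-sample adjustments, so their values may differ slightly. The two score lists are hypothetical.

```python
from statistics import fmean

def skewness(scores):
    """Moment-based skewness coefficient: m3 / m2**1.5 (population moments)."""
    mean = fmean(scores)
    m2 = fmean([(x - mean) ** 2 for x in scores])
    m3 = fmean([(x - mean) ** 3 for x in scores])
    return m3 / m2 ** 1.5

low_bunched = [1, 1, 1, 2, 2, 3, 9]        # hypothetical: bunched low, long right tail
mirrored = [10 - x for x in low_bunched]   # same shape flipped: bunched high

print(round(skewness(low_bunched), 2), round(skewness(mirrored), 2))
```

The first list gives a positive coefficient and its mirror image gives the same magnitude with a negative sign.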
33
Ceiling and Floor Effects: Similar to Skewness Information
Ceiling effects: substantial number of people get highest possible score
Floor effects: opposite
More helpful for single-item measures or coarse scales with only a few levels
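Percent at floor and ceiling is a simple count against the lowest and highest possible scores. A minimal sketch with a hypothetical single item scored 1 (worst) to 5 (best); the function name and data are illustrative only.

```python
def pct_floor_ceiling(scores, lowest_possible, highest_possible):
    """Percent of respondents at the lowest and at the highest possible score."""
    n = len(scores)
    pct_floor = 100 * sum(s == lowest_possible for s in scores) / n
    pct_ceiling = 100 * sum(s == highest_possible for s in scores) / n
    return pct_floor, pct_ceiling

# Hypothetical single item scored 1 (worst) to 5 (best), 10 respondents
item = [5, 5, 4, 5, 3, 5, 2, 5, 4, 1]
pct_floor, pct_ceiling = pct_floor_ceiling(item, 1, 5)
print(pct_floor, pct_ceiling)  # 10.0 50.0
```

Note that the possible range, not the observed range, defines floor and ceiling.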
34
… to what extent did health problems limit you in everyday physical activities (such as walking and climbing stairs)?
[Bar chart: % of respondents at each response level (Not at all, Slightly, Moderately, Quite a bit, Extremely); y-axis 0 to 50%]
49% not limited at all (can’t improve)
35
SF-36 Variability Information in Patients with Chronic Conditions (N=3,445)
All on 0-100 scales, higher is better

| | Physical function (10 items) | Role-physical (4 items) | Mental health (5 items) | Vitality (energy) (5 items) |
|---|---|---|---|---|
| Mean (SD) | 80 (27) | 75 (41) | 71 (21) | 54 (22) |
| Skewness | -.99 | -.26 | -.83 | -.24 |
| % floor | <1 | 24 | <1 | <1 |
| % ceiling | 19 | 37 | 4 | <1 |

McHorney C et al. Med Care. 1994;32:40-66.
36
Evidence of Floor and Ceiling Effects in One SF-36 Scale
All on 0-100 scales, higher is better

| | Physical function (10 items) | Role-physical (4 items) | Mental health (5 items) | Vitality (energy) (5 items) |
|---|---|---|---|---|
| Mean (SD) | 80 (27) | 75 (41) | 71 (21) | 54 (22) |
| Skewness | -.99 | -.26 | -.83 | -.24 |
| % floor | <1 | **24** | <1 | <1 |
| % ceiling | 19 | **37** | 4 | <1 |

McHorney C et al. Med Care. 1994;32:40-66.
37
Reasons for Poor Variability
Low variability in construct being measured in that “sample” (true low variation)
Items not adequately tapping construct
– If only one item, especially hard
Items not detecting variation at one end
What to do:
– If developing measures, add items
– If selecting measures, find another one
38
Advantages of Multi-item Scales Revisited
Using multi-item scales minimizes likelihood of ceiling/floor effects
Even if items are skewed, multi-item scale “normalizes” the skew
39
Percent with “Best” Score on 5 Items in the MOS MHI-5
6-level response scale - all of the time to none of the time:

| Item (“best” response) | % |
|---|---|
| Very nervous person (none of the time) | 34 |
| Felt calm and peaceful (all of the time) | 4 |
| Felt downhearted and blue (none of the time) | 33 |
| Happy person (all of the time) | 10 |
| So down in the dumps nothing could cheer you up (none of the time) | 63 |

Stewart A. et al., Measuring Functioning and Well-Being, 1992
41
Percent with “Best” Score on 5 Items in the MOS MHI-5
5-item scale: only 5% had highest score
42
Overview
Concepts of error
Basic psychometric characteristics
– Variability
– Reliability
– Interpretability
43
Reliability
Extent to which an observed score is free of random error
– Produces the same score each time it is administered (all else being equal)
Population-specific - reliability affected by:
– sample size
– variability in scores (dispersion)
– a person’s level on the scale
44
Back to Components of Variability in Item Scores of a Group of Individuals
Observed score variance = true score variance + error variance
Observed score variance is the total variance (variation is the sum of all observed item scores)
45
Reliability Depends on True Score Variance
Reliability is a group-level statistic
Reliability:
– Reliability = 1 – (error variance / total variance)
– OR: proportion of total variance due to true score (true score variance / total variance)
46
Reliability Depends on True Score Variance
Reliability of .70 means 30% of variance in observed scores is due to error
Reliability = (total variance – error variance) / total variance
.70 = 1.0 – .30 (with total variance scaled to 1.0)
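The two ways of writing reliability give the same number. A minimal sketch, assuming observed variance is simply true score variance plus error variance; the variances of 7.0 and 3.0 are hypothetical values chosen to reproduce the .70 example.

```python
def reliability(true_variance, error_variance):
    """Proportion of total (observed) variance that is true score variance."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance

# Hypothetical variances chosen to reproduce the .70 example
r = reliability(true_variance=7.0, error_variance=3.0)
print(round(r, 2))      # 0.7
print(round(1 - r, 2))  # 0.3, the share of observed variance due to random error
```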
47
Reliability Coefficient
Typically ranges from .00 - 1.00
Higher scores indicate better reliability
48
Importance of Reliability
Necessary for validity (but not sufficient)
– Low reliability (or high measurement error) attenuates correlations with other variables
– May conclude that two variables are not related when they are
Greater reliability = greater power
– The more reliable your scales, the smaller sample size you need to detect an association
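Attenuation can be made concrete with the classical correction-for-attenuation relationship, in which the expected observed correlation equals the true correlation times the square root of the product of the two reliabilities. The sketch below assumes that classical model; the true correlation of .50 and the reliabilities are hypothetical.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation given the true correlation and each measure's reliability."""
    return true_r * math.sqrt(reliability_x * reliability_y)

# Hypothetical values: a true correlation of .50 between two constructs
for rel in (0.9, 0.7, 0.5):
    print(rel, round(attenuated_correlation(0.50, rel, rel), 2))
```

With both reliabilities at .70, a true correlation of .50 would be expected to show up as about .35, which is harder to detect in a given sample size.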
49
Reliable Scale?
NO! There is no such thing as a “reliable” scale
We accumulate “evidence” of reliability in a variety of populations in which it has been tested
50
How Do You Know if a Scale or Measure Has Adequate Reliability?
Adequacy of reliability judged according to standard criteria
– Criteria depend on type of coefficient
51
Types of Reliability Tests
Internal-consistency
Test-retest
Inter-rater
Intra-rater
52
Internal Consistency Reliability: Cronbach’s Alpha
Requires multiple items supposedly measuring same construct to calculate
Extent to which all items measure the same construct (same latent variable)
53
Internal-Consistency Reliability
For multi-item scales
Cronbach’s alpha
– for scales using ordinal items (e.g., 1-5)
Kuder Richardson 20 (KR-20)
– for scales using dichotomous items
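Cronbach's alpha can be computed from item variances and the variance of the summed scale. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total score); the response data are hypothetical and the function name is illustrative. Applied to items scored 0/1, the same formula corresponds to KR-20, up to the choice of variance denominator.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete responses only)."""
    k = len(rows[0])                                  # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])      # variance of the summed scale score
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 people x 4 items scored 1-5
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Only respondents with complete item data can enter this calculation, so it is worth checking how many rows were dropped before interpreting the coefficient.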
54
Minimum Standards for Internal Consistency Reliability
For group comparisons (e.g., regression, correlational analyses)
– .70 or above is minimum (Nunnally, 1978)
– .80 is optimal
– above .90 is unnecessary
For individual assessment (e.g., treatment decisions)
– .90 or above (.95) is preferred (Nunnally, 1978)
55
Internal-Consistency Reliability Can be Spurious
Based on only those who answered all questions in the measure
– If a lot of people are having trouble with the items and skip some, they are not included in test of reliability
Important to compare sample size in reliability calculation to total sample
56
Internal-Consistency Reliability is a Function of Number of Items in Scale
Increases with the number of items
Very large scales (20 or more items) can have high reliability without other good psychometric properties
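The dependence on scale length can be seen with the standardized alpha formula, alpha = k·r / (1 + (k − 1)·r), where k is the number of items and r the average inter-item correlation. This is a sketch under the assumption of a fixed average correlation; the value of .20 is hypothetical.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized alpha from the number of items and the average inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)

# Hypothetical scale whose items correlate only .20 with each other on average
for k in (5, 10, 20, 40):
    print(k, round(standardized_alpha(k, 0.20), 2))
```

With weakly related items (average r = .20), 5 items give about .56 but 20 items give about .83, so a high coefficient on a long scale does not by itself show that the items hang together well.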
57
Example: 20 item Beck Depression Inventory (BDI)
BDI 1978 version (asks about past week)
– Internal consistency reliability = .86
Beck AT et al. J Clin Psychol. 1984;40:1365-1367
58
Example: 20 item Beck Depression Inventory (BDI)
BDI 1978 version (asks about past week)
– Internal consistency reliability = .86
– BUT: 3 items correlated < .30 with other items in the scale
Beck AT et al. J Clin Psychol. 1984;40:1365-1367
59
Reliability Varies by Level on Measure
Reliability can be poorer for those scoring at one end of the scale
Example: Number of visits to doctor in past 12 months
– More reliable for those with fewer visits
60
Test-Retest Reliability
Repeat assessment on individuals not expected to change
Time between assessments should be:
– Short enough so no change occurs
– Long enough so subjects don’t recall first response
Only reliability test for single item measures
Coefficient: correlation between 2 measurements
61
Appropriate Test-Retest Coefficients by Type of Scale
Continuous scales (ratio or interval scales, multi-item Likert scales):
– Pearson
Ordinal or non-normally distributed scales:
– Spearman or Kendall’s tau
Dichotomous (categorical) measures:
– Phi or Kappa
62
Minimum Standards for Test-Retest Reliability
Magnitude of a test-retest correlation is important, not significance
Criterion: similar to that for internal consistency
– >.70 is desirable
– >.80 is optimal
63
Observer or Rater Reliability
Inter-rater reliability (across two or more raters)
– Consistency (correlation) between two or more observers of the same subjects (one point in time)
Intra-rater reliability (within one rater)
– Consistency within one observer
– Correlation among repeated values obtained by the same observer (over time)
![Page 64: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/64.jpg)
64
Observer or Rater Reliability
Sometimes Pearson correlations are used: scores on a group of individuals obtained by one observer are correlated with scores obtained by another observer
– Assesses association only
.65 to .95 are typical correlations
>.85 is considered acceptable
McDowell I et al. Measuring Health, 2006, p. 45.
![Page 65: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/65.jpg)
65
Association vs. Agreement When Correlating Scores from Two Times or Ratings
Association: degree to which scores of one rater linearly predict scores of 2nd rater
Agreement: extent to which same score obtained on 2nd measurement (retest, 2nd rater)
Can have high correlation and poor agreement
– If second score is consistently higher for all subjects, can obtain high correlation
– Need second test of mean differences
![Page 66: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/66.jpg)
66
Hypothetical Scores on 4 Subjects by 2 Observers
[Figure: scores (y-axis, 1 to 7) given by two observers to subjects S1 to S4 (x-axis)]
![Page 67: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/67.jpg)
67
Example of Association and Agreement
Scores by observer 1 are exactly 2 points above scores by observer 2
– Correlation (association) would be perfect (r=1.0)
– Agreement is poor (no agreement on score in any case; a difference of 2 between scores on each subject)
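The example can be checked numerically. The sketch below uses made-up ratings on the 1 to 7 scale (the plotted values from the slide are not recoverable), with observer 1 always 2 points above observer 2. The function and variable names are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings of subjects S1..S4; observer 1 is exactly 2 points higher.
observer2 = [1, 2, 4, 5]
observer1 = [score + 2 for score in observer2]

pairs = list(zip(observer1, observer2))

# Association: how well one rater's scores linearly predict the other's.
r = pearson(observer1, observer2)

# Agreement: proportion of subjects given the identical score by both raters.
exact_agreement = sum(a == b for a, b in pairs) / len(pairs)

# Systematic shift between raters, which the correlation cannot detect.
mean_difference = sum(a - b for a, b in pairs) / len(pairs)

print("r =", round(r, 3))
print("exact agreement =", exact_agreement)
print("mean difference =", mean_difference)
```

The correlation is perfect (up to floating-point rounding) while exact agreement is zero, which is why the mean difference needs its own check.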
![Page 68: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/68.jpg)
68
Intraclass Correlation Coefficient (Kappa) for Testing Inter-rater Reliability
Coefficient indicates level of agreement of two or more judges, exceeding that which would be expected by chance
Appropriate for dichotomous (categorical) scales and ordinal scales
Several forms of kappa:
– e.g., Cohen’s kappa: 2 judges, dichotomous scale
Sensitive to number of observations, distribution of data
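As a rough sketch of "agreement exceeding chance", Cohen's kappa for two judges can be computed as (observed agreement - expected agreement) / (1 - expected agreement). The function name `cohens_kappa` and the ratings are hypothetical, chosen only for illustration.

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters coding the same subjects into categories.

    Undefined (division by zero) when chance agreement equals 1, for example
    when both raters put every subject in the same single category.
    """
    n = len(rater1)
    categories = set(rater1) | set(rater2)

    # Proportion of subjects on which the two raters give the same code.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n

    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no (1/0) judgments by two raters on ten subjects.
rater1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohens_kappa(rater1, rater2)
print("kappa =", round(kappa, 2))
```

Here the raters agree on 8 of 10 subjects (0.80), chance agreement is 0.50, so kappa is about 0.60.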
![Page 69: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/69.jpg)
69
Interpreting Magnitude of Kappa: Level of Reliability
<0.00        Poor
.00 - .20    Slight
.21 - .40    Fair
.41 - .60    Moderate
.61 - .80    Substantial
.81 - 1.00   Almost perfect
.60 or higher is acceptable (Landis, 1977)
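The bands above translate directly into a lookup. This is a small sketch; the function name `kappa_label` is an assumption, and values that fall between printed bands (e.g., .205) are assigned to the lower band.

```python
def kappa_label(kappa):
    """Return the reliability label for a kappa value, using the bands above."""
    if kappa < 0.0:
        return "Poor"
    # Upper bounds of each band, checked in increasing order.
    bands = [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
        (1.00, "Almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1.0")


for value in (-0.10, 0.15, 0.45, 0.70, 0.90):
    print(value, kappa_label(value))
```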
![Page 70: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/70.jpg)
70
Reliability Often Poorer in Lower SES or Low Literacy Groups
More random error due to:
– Reading problems
– Difficulty understanding complex questions
– Unfamiliarity with questionnaires and surveys
![Page 71: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/71.jpg)
71
Advantages of Multi-item Scales Revisited
Using multi-item scales improves reliability
Random error is “canceled out” across multiple items
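One way to see the "canceling out" is a short derivation under the simple observed = true + error model. The assumptions here are added for the sketch: the k items all reflect the same true score T, and their errors are independent, average zero, and share a common variance (written \sigma_e^2, a symbol introduced for this example).

```latex
\begin{align*}
X_i &= T + e_i, \qquad i = 1, \dots, k \\
\bar{X} &= \frac{1}{k}\sum_{i=1}^{k} X_i = T + \frac{1}{k}\sum_{i=1}^{k} e_i \\
\operatorname{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k} e_i\right)
  &= \frac{1}{k^{2}}\sum_{i=1}^{k}\operatorname{Var}(e_i) = \frac{\sigma_e^{2}}{k}
\end{align*}
```

Under these assumptions, averaging 4 items leaves a quarter of the error variance of a single item, while the true-score part is unchanged.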
![Page 72: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/72.jpg)
72
What Makes a Measure Reliable?
Preventing measurement error easier than assessing its effects
Measure
– Clear items, appropriate response choices, etc.
Format
– Make instrument easily understood
Method of administration
– Train raters to do their job
– Adhere to standard administration procedures
![Page 73: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/73.jpg)
73
Overview
Concepts of error
Basic psychometric characteristics
– Variability
– Reliability
– Interpretability
![Page 74: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/74.jpg)
74
Interpretability: What does a Score Mean?
What are the endpoints?
What does a high score mean? (direction of scoring)
Compared to norms - is score low or high?
Single items, more easily interpretable
Multi-item scales, no inherent meaning to scores
![Page 75: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/75.jpg)
75
Endpoints
What is minimum and maximum possible?
– Enable interpretation of mean score
When scores are added, endpoints depend on number of items & number of response choices
– 5 items, 4 response choices = 5 to 20
– 3 items, 5 response choices = 3 to 15
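The endpoint arithmetic can be written as a small helper. This sketch assumes response choices are coded as consecutive integers starting at `lowest_code` (1 in the slide's examples). The function name is hypothetical.

```python
def summed_score_range(n_items, n_choices, lowest_code=1):
    """Return (minimum, maximum) possible score when item codes are summed.

    Assumes each item is coded lowest_code, lowest_code + 1, ...,
    lowest_code + n_choices - 1.
    """
    minimum = n_items * lowest_code
    maximum = n_items * (lowest_code + n_choices - 1)
    return minimum, maximum


print(summed_score_range(5, 4))  # 5 items, 4 response choices
print(summed_score_range(3, 5))  # 3 items, 5 response choices
```

If choices are coded from 0 instead, pass `lowest_code=0`. For instance, 14 items with 5 choices coded 0 to 4 give a range of 0 to 56.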
![Page 76: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/76.jpg)
76
Compare Results to Norms
Comparing your means to published norms helps interpret the mean of your sample
SF-36 has numerous norms, e.g.
– General population
» By age group, gender, and chronic disease
![Page 77: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/77.jpg)
77
SF-36 in MOS Patients versus Population Norms
Mean (SD)       Physical function   Role-physical   Mental health   Vitality (energy)
MOS patients    80 (27)             75 (41)         71 (21)         54 (22)
Norms:
Gen pop         84 (23)             81 (34)         75 (18)         61 (21)
Age 75+         53 (30)             45 (42)         74 (20)         50 (24)
JE Ware et al., SF-36 Health Survey Manual and Interpretation Guide, The Health Institute, 1993.
![Page 78: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/78.jpg)
78
Direction of Scoring
What does a high score mean?
Where in the range does the mean score lie?
– Toward top, bottom?
– In the middle?
![Page 79: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/79.jpg)
79
Descriptive Statistics for ~3,000 Women
M (SD) Min Max
Age 46.2 (2.7) 42.0 52.9
Activity 7.7 (1.8) 3.0 14.0
Stress 8.6 (2.9) 4.0 19.0
Med Care, 2003;41:1262-1276
![Page 80: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/80.jpg)
80
Descriptive Statistics for ~3,000 Women
M (SD) Min Max
Age 46.2 (2.7) 42.0 52.9
Activity 7.7 (1.8) 3.0 14.0
Stress 8.6 (2.9) 4.0 19.0
Med Care, 2003;41:1262
Activity: no measure mentioned
Stress: Perceived Stress Scale (Cohen, 1983)
![Page 81: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/81.jpg)
81
Perceived Stress Scale (Cohen 1983): Hard to Find
Available in JSTOR
– Can print one page at a time
Searched article “on line”
– Could not find scoring information other than reverse 7 of the 14 items and sum them
» Possible score range of 0-56
– Could not find response choices
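The "reverse some items and sum" rule can be sketched generically. Two things here are assumptions, not taken from the article: item codes of 0 to 4 (inferred from 14 items giving a 0-56 range) and the set of reversed item positions, which is a placeholder since the actual scoring key was not found.

```python
def score_summated_scale(responses, reverse_items, low=0, high=4):
    """Sum item responses after reverse-scoring the listed item positions.

    responses: list of item codes, one per item, each between low and high.
    reverse_items: 1-based positions of items to reverse before summing.
    """
    total = 0
    for position, response in enumerate(responses, start=1):
        if not low <= response <= high:
            raise ValueError(f"item {position} is outside {low}-{high}")
        if position in reverse_items:
            # Reversal maps low -> high, high -> low, and so on in between.
            response = low + high - response
        total += response
    return total


# Placeholder key: NOT the published key, just 7 of 14 positions.
hypothetical_reversed = {1, 2, 3, 4, 5, 6, 7}

all_zero = [0] * 14
print(score_summated_scale(all_zero, hypothetical_reversed))

highest_possible = [0] * 7 + [4] * 7
print(score_summated_scale(highest_possible, hypothetical_reversed))
```

With this placeholder key, answering 0 to every item yields 28, not 0, which illustrates why the scoring key and response choices must be reported before a score can be interpreted.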
![Page 82: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/82.jpg)
82
Another Example: Mean Scores in a Sample of Older Adults
Mean
Physical functioning 45.0
Sleep problems 28.1
Disability 35.7
![Page 83: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/83.jpg)
83
Making it Easier to Interpret
Mean*
Physical functioning 45.0
Sleep problems 28.1
Disability 35.7
* All scores 0-100
![Page 84: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/84.jpg)
84
Making it Easier to Interpret
Mean*
Physical functioning (+) 45.0
Sleep problems (-) 28.1
Disability (-) 35.7
* All scores 0-100
(+) indicates higher score is better health
(-) indicates lower score is better health
![Page 85: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/85.jpg)
85
Confusion Introduced by Labels:
SF-36 Bodily Pain scale
– Higher score indicates no pain or limitations due to pain
– Rationale: so 8 subscales scored in same direction
Social Adjustment Scale (Weissman)
Functional Status Index (Jette)
![Page 86: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/86.jpg)
86
Mean Has to be Interpreted Within Possible Range
M SD
Parents’ harsh discipline practices*
Interviewers’ ratings of mother 2.55 .74
Husbands’ reports of wife 5.32 3.30
*Note: high score indicates more harsh practices
![Page 87: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/87.jpg)
87
Mean Has to be Interpreted Within Possible Range (Add Range)
M SD
Parents’ harsh discipline practices*
Interviewers’ ratings of mother (1-5) 2.55 .74
Husbands’ reports of wife (1-7) 5.32 3.30
*Note: high score indicates more harsh practices
![Page 88: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/88.jpg)
88
Mean Has to be Interpreted Within Possible Range
M SD
Parents’ harsh discipline practices*
Interviewers’ ratings of mother (1-5) 2.55 .74
Husbands’ reports of wife (1-7) 5.32 3.30
Interviewer: 1 2 3 4 5 (mean of 2.55 marked)
Husband: 1 2 3 4 5 6 7 (mean of 5.32 marked)
*Note: high score indicates more harsh practices
![Page 89: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/89.jpg)
89
Transforming a Summated Scale to a 0-100 Scale
Works with any ordinal or summated scale
Transforms it so 0 is the lowest possible and 100 is the highest possible
Eases interpretation across numerous scales
Transformed score = 100 x (observed score - minimum possible score) / (maximum possible score - minimum possible score)
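The transformation is a one-line function. A minimal sketch follows; the function name `to_0_100` is an assumption, and the example values are the means and possible ranges from the harsh-discipline slides.

```python
def to_0_100(observed, minimum, maximum):
    """Rescale a score so the lowest possible is 0 and the highest is 100."""
    if maximum <= minimum:
        raise ValueError("maximum possible score must exceed the minimum")
    return 100 * (observed - minimum) / (maximum - minimum)


# Means from the harsh-discipline example, each on its own possible range.
interviewer_mean = to_0_100(2.55, minimum=1, maximum=5)
husband_mean = to_0_100(5.32, minimum=1, maximum=7)

print("Interviewer rating (1-5):", round(interviewer_mean, 1))
print("Husband report (1-7):", round(husband_mean, 1))
```

Because the transformation is linear, it can be applied to a mean as well as to individual scores. On the common 0-100 metric the interviewer mean falls below the midpoint and the husband mean well above it, a contrast that is hard to see from 2.55 versus 5.32 alone.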
![Page 90: 1 Class 4 Basic Psychometric Characteristics: Variability, Reliability, Interpretability October 15, 2009 Anita L. Stewart Institute for Health & Aging](https://reader030.vdocuments.us/reader030/viewer/2022032702/56649f455503460f94c66a96/html5/thumbnails/90.jpg)
90
Homework
Complete rows 13-19 on matrix for both measures
– Interpretability, nature of samples on which it has been tested, variability and central tendency, reliability