1 class 4 psychometric characteristics part i: sources of error, variability, reliability,...
Class 4
Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability
October 12, 2006
Anita L. Stewart
Institute for Health & Aging
University of California, San Francisco
Overview of Class 4
Concepts of error
Basic psychometric characteristics
– Variability
– Reliability
– Interpretability
Components of an Individual’s Observed Item Score
(NOTE: Simplistic view)
Observed item score = true score + error
Components of Variability in Item Scores of a Group of Individuals
Observed score variance = true score variance + error variance
Observed score variance is the total variance (the variation across all observed item scores)
Combining Items into Multi-Item Scales
When items are combined into a scale score, error cancels out to some extent
– Error variance is reduced as more items are combined
– As you reduce random error, the proportion of “true score” variance increases
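The cancelling of random error across items can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the slides: each item's random error is independent, normally distributed with mean zero and the same SD, and the scale score is the mean of the items. The function name `error_variance_of_scale` is hypothetical.

```python
import random
import statistics

def error_variance_of_scale(n_items, n_people=20000, error_sd=1.0, seed=0):
    """Variance of the averaged random error when n_items are combined.

    Assumes independent, mean-zero normal errors with equal SD per item.
    """
    rng = random.Random(seed)
    mean_errors = [
        statistics.fmean(rng.gauss(0.0, error_sd) for _ in range(n_items))
        for _ in range(n_people)
    ]
    return statistics.pvariance(mean_errors)

# Under these assumptions the error variance shrinks roughly as 1 / n_items.
for k in (1, 4, 16):
    print(k, round(error_variance_of_scale(k), 3))
```

Under these assumptions the error variance of the averaged score is close to the single-item error variance divided by the number of items, which is one way to read "error variance is reduced as more items are combined".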
Sources of Error
Subjects
Observers or interviewers
Measure or instrument
Measuring Weight in Pounds of Children: Weight without shoes
An observed score is a linear combination of many sources of variation for an individual
Measuring Weight in Pounds of Children: Weight without shoes
Observed weight =
True weight
+ Amount of water past 30 min
+ Weight of clothes
+ Person weighing children is not very precise
+ Scale is miscalibrated
Measuring Weight in Pounds of Children: Weight without shoes
Observed weight (83 lbs) =
True weight (80 lbs)
+ Amount of water past 30 min (+.25 lb)
+ Weight of clothes (+.75 lb)
+ Person weighing children is not very precise (+1 lb)
+ Scale is miscalibrated (+1 lb)
83 = 80 + .25 + .75 + 1 + 1
Sources of Error
Weight of clothes
– Subject source of error
Person weighing child is not precise
– Observer source of error
Scale is miscalibrated
– Instrument source of error
Measuring Depressive Symptoms in Asian and Latino Men
Observed depression score =
“True” depression
+ Hard to choose one number on the 1-6 response choice scale
+ Unwillingness to tell interviewer
+ Measure not culturally sensitive
Measuring Depressive Symptoms in Asian and Latino Men
Observed depression score (13) =
“True” depression (16)
+ Hard to choose one number on the 1-6 response choice scale (+2)
+ Unwillingness to tell interviewer (-3)
+ Measure not culturally sensitive (-2)
13 = 16 + 2 - 3 - 2
Return to Components of an Individual’s Observed Item Score
Observed item score = true score + error
Components of an Individual’s Observed Item Score
Observed item score = true score + error
Error = random error + systematic error
Sources of Error in Measuring Weight
Weight of clothes
– Subject source of random error
Scale is miscalibrated
– Instrument source of systematic error
Person weighing child is not precise
– Observer source of random error
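The difference between the two kinds of error can be sketched with a simulation of the weighing example. The 80 lb true weight and the +1 lb miscalibration come from the earlier slide; treating observer imprecision as normally distributed noise with an SD of 1 lb is an assumption made only for illustration, and `weigh_children` is a hypothetical helper.

```python
import random
import statistics

def weigh_children(true_weights, scale_bias=1.0, observer_sd=1.0, seed=0):
    """Observed weight = true weight + systematic error + random error.

    scale_bias: the same offset on every weighing (miscalibrated scale).
    observer_sd: SD of the observer's random imprecision (assumed normal).
    """
    rng = random.Random(seed)
    return [w + scale_bias + rng.gauss(0.0, observer_sd) for w in true_weights]

true_weights = [80.0] * 10000
observed = weigh_children(true_weights)
errors = [o - t for o, t in zip(observed, true_weights)]

mean_error = statistics.fmean(errors)  # systematic error survives averaging
sd_error = statistics.pstdev(errors)   # random error shows up as spread
print(round(mean_error, 2), round(sd_error, 2))
```

In this sketch the systematic error shifts every observed score in the same direction, so it remains in the mean, while the random error averages toward zero and appears only as added variability.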
Sources of Error in Measuring Depression
Hard to choose one number on 1-6 response scale
– Subject source of random error
Unwillingness to tell interviewer
– Subject source of systematic error (underreporting true depression)
Instrument is not culturally sensitive (missing some components)
– Instrument source of systematic error
Memory Errors – From Cognitive Psychology
Error remembering “when” and “how often” something occurred within some time frame
Memory and emotion – tend to remember
– positive more than negative experiences
– more emotionally intense than neutral experiences
Memory for threatening, sensitive events is more error prone than non-threatening events
AA Stone et al. (eds), The Science of Self-Report, London: Lawrence Erlbaum, 2000.
Overview
Concepts of error
Basic psychometric characteristics
– Variability
– Reliability
– Interpretability
Variability
Good variability
– All (or nearly all) scale levels are represented
– Distribution approximates bell-shaped normal
Variability is a function of the sample
– Need to understand variability of measure of interest in sample similar to one you are studying
Review criteria
– Adequate variability in a range that is relevant to your study
Common Indicators of Variability
Range of scores (possible, observed)
Mean, median, mode
Standard deviation (standard error)
Skewness
% at floor (lowest score)
% at ceiling (highest score)
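These indicators are straightforward to compute for any set of scores. The sketch below uses only the Python standard library; the function name `variability_summary`, the example scores, and the choice of the moment-based skewness coefficient are illustrative assumptions, not something specified in the slides (statistical packages may use a slightly different skewness formula).

```python
import statistics

def variability_summary(scores, possible_min, possible_max):
    """Common variability indicators for a list of scale scores."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)        # sample SD (n - 1 denominator)
    pop_sd = statistics.pstdev(scores)   # population SD, used for skewness
    # Moment-based skewness: mean cubed deviation / population SD cubed.
    skew = sum((x - mean) ** 3 for x in scores) / (n * pop_sd ** 3)
    return {
        "possible_range": (possible_min, possible_max),
        "observed_range": (min(scores), max(scores)),
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "sd": sd,
        "standard_error": sd / n ** 0.5,
        "skewness": skew,
        "pct_floor": 100 * sum(x == possible_min for x in scores) / n,
        "pct_ceiling": 100 * sum(x == possible_max for x in scores) / n,
    }

# Hypothetical responses to a single 1-5 item, bunched at the low end.
scores = [1, 1, 1, 1, 2, 2, 3, 3, 4, 5]
summary = variability_summary(scores, possible_min=1, possible_max=5)
for name, value in summary.items():
    print(name, value)
```

For these made-up scores, 40% of respondents are at the floor, 10% are at the ceiling, and the skewness coefficient is positive, matching a distribution bunched at the low end.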
Range of Scores
Especially important for multi-item measures
Possible and observed
Example of difference:
– CES-D possible range is 0-30
– Wong et al. study of mothers of young children: observed range was 0-23
» missing entire high end of the distribution (none had high levels of depression)
Mean, Median, Mode
Mean - average
Median - midpoint
Mode - most frequent score
In normally distributed measures, these are all the same
In non-normal distributions, they will vary
Mean and Standard Deviation
Most information on variability is from mean and standard deviation
– Can envision how it is distributed on the possible range
Normal Distributions (Or Approximately Normal)
Mean, SD tell the entire story of the distribution
± 1 SD on each side of the mean = about 68% of the scores
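For a normal distribution, the share of scores within a given number of SDs of the mean can be checked directly from the error function. This is a small sketch using the standard library; `proportion_within` is a hypothetical helper name.

```python
import math

def proportion_within(k_sd):
    """Proportion of a normal distribution within k_sd SDs of the mean."""
    return math.erf(k_sd / math.sqrt(2))

print(round(proportion_within(1), 4))  # 0.6827
print(round(proportion_within(2), 4))  # 0.9545
```

So roughly 68% of scores fall within 1 SD of the mean and roughly 95% within 2 SDs, provided the distribution really is approximately normal.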
Skewness
Positive skew - scores bunched at low end, long tail to the right
Negative skew - opposite pattern
Coefficient ranges from - infinity to + infinity
– the closer to zero, the more normal
Test whether skewness coefficient is significantly different from zero
– thus depends on sample size
Coefficients beyond ±2.0 are cause for concern
Skewed Distributions
Mean and SD are not as useful
– Mean ± 1 SD often extends beyond the maximum or minimum possible score
Ceiling and Floor Effects: Similar to Skewness Information
Ceiling effects: substantial number of people get highest possible score
Floor effects: opposite
Not very meaningful for continuous scales
– there will usually be very few at either end
More helpful for single-item measures or coarse scales with only a few levels
… to what extent did health problems limit you in everyday physical activities (such as walking and climbing stairs)?
[Bar chart: % of respondents (y-axis, 0-50) by response category: Not at all, Slightly, Moderately, Quite a bit, Extremely]
49% not limited at all (can’t improve)
SF-36 Variability Information in Patients with Chronic Conditions (N=3,445)
                 Physical function   Role-physical   Mental health   Vitality (energy)
Possible range   0-100               0-100           0-100           0-100
Mean             80                  75              71              54
SD               27                  41              21              22
Skewness         -.99                -.26            -.83            -.24
% floor          <1                  24              <1              <1
% ceiling        19                  37              4               <1
McHorney C et al. Med Care. 1994;32:40-66.
Reasons for Poor Variability
Low variability in construct being measured in that “sample” (true low variation)
Items not adequately tapping construct
– If only one item, especially hard
Items not detecting important differences in construct at one or the other end of the continuum
Solutions if one is in the process of developing measures: add items
Advantages of multi-item scales revisited
Using multi-item scales minimizes likelihood of ceiling/floor effects
When items are skewed, multi-item scale “normalizes” the skew
Percent with Highest (Best) Score: MOS 5-Item Mental Health Index
Items (6 pt scale - all of the time to none of the time):
– Very nervous person - 34% none of the time
– Felt calm and peaceful - 4% all of the time
– Felt downhearted and blue - 33% none of the time
– Happy person - 10% all of the time
– So down in the dumps nothing could cheer you up - 63% none of the time
Summated 5-item scale (0-100 scale)
– Only 5% had highest score
Stewart A. et al., MOS book, 1992
Overview
Concepts of error
Basic psychometric characteristics
– Variability
– Reliability
– Interpretability
Reliability
Extent to which an observed score is free of random error
– Produces the same score each time it is administered (all else being equal)
Population-specific; reliability increases with:
– sample size
– variability in scores (dispersion)
– a person’s level on the scale
Components of Variability in Item Scores of a Group of Individuals
Observed score variance = true score variance + error variance
Observed score variance is the total variance (the variation across all observed item scores)
Reliability Depends on True Score Variance
Reliability is a group-level statistic
Reliability:
– Reliability = 1 – (error variance as a proportion of total variance)
– Reliability is the proportion of variance due to true score: true score variance / total variance
Reliability Depends on True Score Variance
Reliability = proportion of variance due to true score = true score variance / total variance
Reliability = total variance – error variance (each as a proportion of total variance)
.70 = 100% – 30%
Reliability Depends on True Score Variance
Reliability of .70 means 30% of the variance in the observed score is explained by error
Reliability = total variance – error variance (each as a proportion of total variance)
Reliability = true score variance / total variance
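The arithmetic behind the .70 example can be written out as a short function. This is a minimal sketch; the function name `reliability` and the variance values (70 and 30 units) are chosen only to reproduce the 70%/30% split above.

```python
def reliability(true_variance, error_variance):
    """Reliability = true score variance / total variance."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance

rel = reliability(true_variance=70.0, error_variance=30.0)
print(rel)  # 0.7
```

Equivalently, reliability is 1 minus the error share of total variance (here 1 - 30/100), so the two forms on the slide describe the same quantity.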
Importance of Reliability
Necessary for validity (but not sufficient)
– Low reliability attenuates correlations with other variables (harder to detect true correlations among variables)
– May conclude that two variables are not related when they are
Greater reliability, greater power
– Thus the more reliable your scales, the smaller the sample size you need to detect an association
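The attenuation effect can be quantified with the classical test theory attenuation formula (observed correlation = true correlation × square root of the product of the two reliabilities). The slides do not give this formula; it is brought in here as a standard result, and the numbers are purely illustrative.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation given the true correlation and the
    reliabilities of the two measures (classical attenuation formula)."""
    return true_r * math.sqrt(reliability_x * reliability_y)

# Illustrative values: a true correlation of .50 measured with two scales
# that each have reliability .70.
print(round(attenuated_correlation(0.50, 0.70, 0.70), 2))  # 0.35
```

In this illustration a true correlation of .50 would be expected to appear as about .35, which is why lower reliability calls for a larger sample to detect the same association.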
Reliability Coefficient
Typically ranges from .00 - 1.00
Higher scores indicate better reliability
How Do You Know if a Scale or Measure Has Adequate Reliability?
Adequacy of reliability judged according to standard criteria
– Criteria depend on type of coefficient
Types of Reliability Tests
Internal-consistency
Test-retest
Inter-rater
Intra-rater
Internal Consistency Reliability: Cronbach’s Alpha
Requires multiple items supposedly measuring same construct to calculate
Extent to which all items measure the same construct (same latent variable)
Internal-Consistency Reliability
For multi-item scales
Cronbach’s alpha
– for ordinal scales
Kuder-Richardson 20 (KR-20)
– for dichotomous items
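Cronbach's alpha can be computed from the item variances and the variance of the summed score, using the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total score). The sketch below uses only the standard library; the function name `cronbach_alpha` and the response data are hypothetical, and it assumes complete data (no skipped items).

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for complete data.

    item_scores: list of respondents, each a list of k item scores.
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # regroup: one tuple of scores per item
    item_variances = [statistics.variance(col) for col in items]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Applied to items scored 0/1, the same calculation gives a KR-20-style coefficient, up to the convention used for the variance denominator.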
Minimum Standards for Internal Consistency Reliability
For group comparisons (e.g., regression, correlational analyses)
– .70 or above is minimum (Nunnally, 1978)
– .80 is optimal
– above .90 is unnecessary
For individual assessment (e.g., treatment decisions)
– .90 or above (.95) is preferred (Nunnally, 1978)
Internal-Consistency Reliability Can be Spurious
Based on only those who answered all questions in the measure
– If a lot of people are having trouble with the items and skip some, they are not included in test of reliability
Internal-Consistency Reliability is a Function of Number of Items in Scale
Increases with the number of items
Very large scales (20 or more items) can have high reliability without other good scaling properties
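The dependence on scale length can be seen from the standardized alpha formula, which expresses alpha in terms of the number of items and the average inter-item correlation (the Spearman-Brown relationship). The formula is a standard result rather than something given in the slides, and the average correlation of .20 is an illustrative assumption.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized alpha from the number of items and the average
    inter-item correlation (Spearman-Brown form)."""
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)

# Same weak average inter-item correlation, increasing scale length.
for k in (2, 5, 10, 20):
    print(k, round(standardized_alpha(k, 0.20), 2))
```

With an average inter-item correlation of only .20, a 20-item scale still reaches an alpha of about .83, which illustrates how a long scale can look reliable even when its items hang together weakly.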
Example: 20 item Beck Depression Inventory (BDI)
BDI 1978 version (past week)
– reliability .86
– 3 items correlated < .30 with other items in the scale
Beck AT et al. J Clin Psychol. 1984;40:1365-1367
Test-Retest Reliability
Repeat assessment on individuals who are not expected to change
Time between assessments should be:
– Short enough so no change occurs
– Long enough so subjects don’t recall first response
Coefficient is a correlation between two measurements
For single item measures, the only way to test reliability
Appropriate Test-Retest Coefficients by Type of Measure
Continuous scales (ratio or interval scales, multi-item Likert scales):
– Pearson
Ordinal or non-normally distributed scales:
– Spearman
– Kendall’s tau
Dichotomous (categorical) measures:
– Phi
– Kappa
Minimum Standards for Test-Retest Reliability
Significance of a test-retest correlation has NOTHING to do with the adequacy of the reliability
Criteria: similar to those for internal consistency
– >.70 is desirable
– >.80 is optimal
Observer or Rater Reliability
Inter-rater reliability (across two or more raters)
– Consistency (correlation) between two or more observers on the same subjects (one point in time)
Intra-rater reliability (within one rater)
– A test-retest within one observer
– Correlation among repeated values obtained by the same observer (over time)
Observer or Rater Reliability
Sometimes Pearson correlations are used - correlate one observer with another
– Assesses association only
.65 to .95 are typical correlations
>.85 is considered acceptable
McDowell and Newell
Association vs. Agreement When Correlating Two Times or Ratings
Association is degree to which one score linearly predicts other score
Agreement is extent to which same score is obtained on second measurement (retest, second observer)
Can have high correlation and poor agreement
– If second score is consistently higher for all subjects, can obtain high correlation
– Need second test of mean differences
Hypothetical Scores on 4 Subjects by 2 Observers
[Chart: scores (y-axis, 1-7) from 2 observers for subjects S1-S4 (x-axis)]
Example of Association and Agreement
Scores by observer 1 are exactly 2 points above scores by observer 2
– Correlation (association) would be perfect (r=1.0)
– Agreement is poor (no agreement on score in any case; a difference of 2 between scores on each subject)
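The contrast between association and agreement can be reproduced with a few lines of code. The scores below are hypothetical (the slide's actual chart values are not recoverable here); the only property carried over is that observer 1 is always exactly 2 points above observer 2. `pearson_r` is a hand-written helper so the sketch does not depend on any particular library version.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

observer2 = [1, 2, 4, 5]                 # hypothetical scores for S1-S4
observer1 = [s + 2 for s in observer2]   # always exactly 2 points higher

r = pearson_r(observer1, observer2)
exact_agreement = sum(a == b for a, b in zip(observer1, observer2)) / len(observer1)
mean_difference = sum(a - b for a, b in zip(observer1, observer2)) / len(observer1)
print(r, exact_agreement, mean_difference)
```

Here the correlation is perfect while the proportion of exact agreement is zero and the mean difference is 2 points, which is why a test of mean differences is needed alongside the correlation.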
Intraclass Correlation Coefficient for Testing Inter-rater Reliability (Kappa)
Coefficient indicates level of agreement of two or more judges, exceeding that which would be expected by chance
Appropriate for dichotomous (categorical) scales and ordinal scales
Several forms of kappa:
– e.g., Cohen’s kappa is for 2 judges, dichotomous scale
Sensitive to number of observations, distribution of data
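Cohen's kappa for two judges can be sketched as observed agreement corrected for the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The function name `cohens_kappa` and the ratings are hypothetical, and this is the unweighted form only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters rating the same subjects.

    Undefined (division by zero) if chance-expected agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical dichotomous ratings of 10 subjects by two judges.
rater_a = ["yes", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # 0.6
```

In this made-up example the judges agree on 80% of subjects, but 50% agreement would be expected by chance given their marginal rates, so kappa is .60.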
Interpreting Kappa: Level of Reliability
Kappa         Level of reliability
<0.00         Poor
.00 - .20     Slight
.21 - .40     Fair
.41 - .60     Moderate
.61 - .80     Substantial
.81 - 1.00    Almost perfect
.60 or higher is acceptable (Landis, 1977)
Reliable Scale?
NO! There is no such thing as a “reliable” scale
We accumulate “evidence” of reliability in a variety of populations in which it has been tested
Reliability Often Poorer in Lower SES Groups
More random error due to
– Reading problems
– Difficulty understanding complex questions
– Unfamiliarity with questionnaires and surveys
Advantages of multi-item scales revisited
Using multi-item scales improves reliability
Random error is “canceled out” across multiple items
Overview
Concepts of error
Basic psychometric characteristics
– Variability
– Reliability
– Interpretability
Interpretability of Scale Scores: What does a Score Mean?
Meaning of scores
– What are the endpoints?
– Direction of scoring: what does a high score mean?
– Compared to norms: is score average, low, or high compared to norms?
Single items: more easily interpretable
Multi-item scales: no inherent meaning to scores
![Page 66: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/66.jpg)
66
Endpoints
What is minimum and maximum possible?
– To enable interpretation of mean score
Endpoints of summated scales depend on number of items & number of response choices
– 5 items, 4 response choices = 5 - 20
– 3 items, 5 response choices = 3 - 15
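The endpoint arithmetic above can be sketched as a small function. This assumes response choices are coded as consecutive integers starting at 1, which is what the two examples on the slide imply; the function name and the `lowest_code` parameter are hypothetical.

```python
def summated_scale_range(n_items: int, n_choices: int, lowest_code: int = 1):
    """Return (minimum, maximum) possible score of a summated scale.

    Assumes each item is scored lowest_code, lowest_code + 1, ...,
    up to lowest_code + n_choices - 1, and item scores are summed.
    """
    minimum = n_items * lowest_code
    maximum = n_items * (lowest_code + n_choices - 1)
    return minimum, maximum


print(summated_scale_range(5, 4))  # 5 items, 4 response choices
print(summated_scale_range(3, 5))  # 3 items, 5 response choices
```

If the response choices were coded from 0 instead of 1, passing `lowest_code=0` would shift both endpoints accordingly.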
![Page 67: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/67.jpg)
67
Direction of Scoring
What does a high score mean?
Where in the range does this mean score lie?
– Toward top, bottom?
– In the middle?
![Page 68: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/68.jpg)
68
Descriptive Statistics for 3193 Women
            M (SD)        Min     Max
Age         46.2 (2.7)    44.0    52.9
Activity     7.7 (1.8)     3.0    14.0
Stress       8.6 (2.9)     4.0    19.0
Avis NE et al. Med Care, 2003;41:1262-1276
![Page 69: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/69.jpg)
69
Sample Results: Mean Scores in a Sample of Older Adults
                        Mean
Physical functioning    45.0
Sleep                   28.1
Disability              35.7
![Page 70: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/70.jpg)
70
Example of Table Labeling Scores: Making it Easier to Interpret
                        Mean*
Physical functioning    45.0
Sleep                   28.1
Disability              35.7
* All scores 0-100
![Page 71: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/71.jpg)
71
Example of Table Labeling Scores: Making it Easier to Interpret
                            Mean*
Physical functioning (+)    45.0
Sleep (-)                   28.1
Disability (-)              35.7
* All scores 0-100
(+) indicates higher score is better health
(-) indicates lower score is better health
![Page 72: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/72.jpg)
72
Solutions
Can include in label (+) or (-)
– Can label scale so that higher score is more of “label”
Can easily put score range next to label if they differ in one table
![Page 73: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/73.jpg)
73
Mean Has to be Interpreted Within the Possible Range
                                         M       SD
Parents’ harsh discipline practices*
  Interviewers’ ratings of mother        2.55     .74
  Husbands’ reports of wife              5.32    3.30
*Note: high score indicates more harsh practices
![Page 74: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/74.jpg)
74
Mean Has to be Interpreted Within the Possible Range
                                               M       SD
Parents’ harsh discipline practices*
  Interviewers’ ratings of mother (1-5)        2.55     .74
  Husbands’ reports of wife (1-7)              5.32    3.30
*Note: high score indicates more harsh practices
![Page 75: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/75.jpg)
75
Mean Has to be Interpreted Within the Possible Range
                                               M       SD
Parents’ harsh discipline practices*
  Interviewers’ ratings of mother (1-5)        2.55     .74
  Husbands’ reports of wife (1-7)              5.32    3.30
Interviewer: 1 2 3 4 5 (mean marked at 2.55)
Husband: 1 2 3 4 5 6 7 (mean marked at 5.32)
*Note: high score indicates more harsh practices
![Page 76: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/76.jpg)
76
Mean Has to be Interpreted Within the Possible Range: Adding SD Information
                                               M       SD
Parents’ harsh discipline practices*
  Interviewers’ ratings of mother (1-5)        2.55     .74
  Husbands’ reports of wife (1-7)              5.32    3.30
Interviewer: 1 2 3 4 5 (mean marked at 2.55, SD .74)
Husband: 1 2 3 4 5 6 7 (mean marked at 5.32, SD 3.30)
*Note: high score indicates more harsh practices
![Page 77: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/77.jpg)
77
Transforming a Summated Scale to 0-100 Scale
Works with any ordinal or summated scale
Transforms it so 0 is the lowest possible and 100 is the highest possible
Eases interpretation across numerous scales
Transformed score = 100 x (observed score - minimum possible score) / (maximum possible score - minimum possible score)
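The transformation can be sketched in a few lines of Python. The function name `to_0_100` is hypothetical; the example values reuse the harsh discipline means and ranges from the earlier slides (2.55 on a 1-5 scale, 5.32 on a 1-7 scale).

```python
def to_0_100(observed: float, min_possible: float, max_possible: float) -> float:
    """Rescale a score so min_possible maps to 0 and max_possible to 100."""
    if max_possible <= min_possible:
        raise ValueError("max_possible must be greater than min_possible")
    return 100 * (observed - min_possible) / (max_possible - min_possible)


# Means from the harsh discipline example, each rescaled within its own range
interviewer = to_0_100(2.55, 1, 5)  # roughly 39 on the 0-100 metric
husband = to_0_100(5.32, 1, 7)      # roughly 72 on the 0-100 metric
print(round(interviewer, 1), round(husband, 1))
```

Once both means are on the same 0-100 metric, it is easier to see that the interviewer ratings sit below the midpoint of their range while the husbands' reports sit above the midpoint of theirs.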
![Page 78: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/78.jpg)
78
Homework for Next Class
Complete rows in matrix for your two measures
– Rows 13-18: Nature of samples on which it has been tested, data quality
– Rows 19-26: Variability, reliability, interpretability
![Page 79: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/79.jpg)
79
Next Class (Class 5)
Guest lecture: Steve Gregorich
Factor analysis
![Page 80: 1 Class 4 Psychometric Characteristics Part I: Sources of Error, Variability, Reliability, Interpretability October 12, 2006 Anita L. Stewart Institute](https://reader035.vdocuments.us/reader035/viewer/2022070403/56649f2c5503460f94c47039/html5/thumbnails/80.jpg)
80
Two Readings for Next Week
Selected by Steve Gregorich
– Kline
– Mulaik
Suggest reading them ahead to be able to ask questions