Generalizability Theory
Nothing more practical than a good theory!
This presentation is made by Prof. Zhao
Overview of Presentation
Classes of reliability theories
Generalizability Theory
  G-study
  D-study
Illustrations
Three Reliability Theories
Classical Test Theory
Generalizability Theory
Item Response Theory
Generalizability Theory
The concept of parallel measures is fundamental (as in classical test theory), but the theory allows for multiple sources of error.
Generalizability concept: Reliability depends on the inferences (generalizations) that the investigator wishes to make from the measurement data.
Illustration
Essay test
7 vignette-based essay questions
2 markers independently marking all questions for all examinees
Reliability in a classical framework:
Cronbach’s alpha: 0.66
Inter-rater reliability (i.e. kappa): 0.71
Fundamental Equation
$X = T + E$
X = Observed score
T = True score
E = Error score
$\text{Reliability} = \dfrac{\mathrm{Var}(T)}{\mathrm{Var}(X)}$
The larger the variance of T in relation to the variance of X, the higher the reliability.
Fundamental Equation (continued)
Because $\mathrm{Var}(X) = \mathrm{Var}(T) + \mathrm{Var}(E)$ when T and E are uncorrelated:
$\text{Reliability} = \dfrac{\mathrm{Var}(T)}{\mathrm{Var}(X)} = \dfrac{\mathrm{Var}(T)}{\mathrm{Var}(T) + \mathrm{Var}(E)}$
Multiple sources of error variance
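$\text{Reliability} = \dfrac{\mathrm{Var}(T)}{\mathrm{Var}(T) + \mathrm{Var}(E)}$
Var(E) can come from several sources: markers, essays, and unexplained (residual) variance.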
Two steps in G analysis
1) G(eneralizability)-study: Estimation of the sources of variance that influence the measurement (e.g., variance between examinees, essays and markers)
2) D(ecision)-study: Estimation of reliability indices as a function of concrete sample size(s) (e.g., number of essays, number of markers)
G-study steps
Determine facets (factors of variance)
Determine design
  Random vs fixed
  Crossed vs nested
Crossed vs nested designs
[Figure: crossed design versus nested design. Diagram labels: A, B; 1-6; A-L]
G-study
Determine facets (factors of variance)
Determine design
  Random vs fixed
  Crossed vs nested
Collect data
Analysis of Variance (ANOVA)
Estimation of variance components
Illustration 1
Essay test
7 vignette-based open-ended questions
100 students
One marker marked all essays for all students
G-study questions:
Number of factors/facets? One-facet design
Random or fixed facets? Random
Nested or crossed? Crossed
Sources of Variance
Person x Items
[Venn diagram of the variance sources: p (persons), i (items), and pi,e (person-by-item interaction confounded with error)]
Variance component estimation (one facet design)
An observed score for a person on an item ($X_{pi}$):
$X_{pi} = \mu$ [Overall mean]
$+ (\mu_p - \mu)$ [Person effect]
$+ (\mu_i - \mu)$ [Item effect]
$+ (X_{pi} - \mu_p - \mu_i + \mu)$ [Residual]
Each of these effects has an average (always 0) and a variance ($\sigma^2$). The variances are the variance components.
The variance of all observed scores $X_{pi}$ across all persons and items:
$\sigma^2(X_{pi}) = \sigma^2_p + \sigma^2_i + \sigma^2_{pi,e}$
Variance components
P x I design
Source    Estimated Variance Component    Standard Error    Percentage of Total Variance
p          97.57                           19.02            13.35
i         261.24                          112.98            35.75
pi,e      371.97                           17.60            50.90
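Estimates like those in the table above are obtained from the ANOVA mean squares. The following is a minimal sketch, in Python, of how that might be done for a fully crossed persons x items design with one score per cell. The function name `variance_components_pxi` and the tiny example matrix are illustrative assumptions; they are not the data behind the table.

```python
def variance_components_pxi(scores):
    """Estimate variance components for a crossed persons x items design.

    scores: list of rows, one row per person and one column per item.
    Returns a dict with the keys 'p', 'i' and 'pi,e'.
    """
    n_p = len(scores)
    n_i = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[k] for row in scores) / n_p for k in range(n_i)]

    # Sums of squares for persons, items and the residual (pi,e).
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square solutions for a random-effects p x i design.
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "pi,e": ms_res,
    }


# Hypothetical, purely additive data: 3 persons x 2 items.
example = [[2, 4], [4, 6], [6, 8]]
components = variance_components_pxi(example)
print(components)  # p = 4.0, i = 2.0, pi,e = 0.0
```

With real data the estimates can come out negative because of sampling error; such estimates are commonly set to zero.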
Sources of Variance
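Items : Persons (items nested within persons)
[Venn diagram of the variance sources: p (persons) and i,pi,e (item effect confounded with the person-by-item interaction and error)]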
Variance components
I : P design
Source    Estimated Variance Component    Percentage of Total Variance
p          97.57                           13.35
i,pi,e    633.21                           86.65
In the nested design the item component (261.24; 35.75%) and the residual (371.97; 50.90%) cannot be separated: 261.24 + 371.97 = 633.21 (86.65%).
Sources of Variance
Person x Items x Judges
[Venn diagram of the variance sources: p, i, j, pi, pj, ij, and pij,e]
Variance component estimation (two facet design)
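An observed score for a person on an item rated by a judge ($X_{pij}$):
$X_{pij} = \mu$ [Overall mean]
$+ (\mu_p - \mu)$ [Person effect]
$+ (\mu_i - \mu)$ [Item effect]
$+ (\mu_j - \mu)$ [Judge effect]
$+ (\mu_{pi} - \mu_p - \mu_i + \mu)$ [Person by item effect]
$+ (\mu_{pj} - \mu_p - \mu_j + \mu)$ [Person by judge effect]
$+ (\mu_{ij} - \mu_i - \mu_j + \mu)$ [Item by judge effect]
$+ (X_{pij} - \mu_{pi} - \mu_{pj} - \mu_{ij} + \mu_p + \mu_i + \mu_j - \mu)$ [Residual]
The variance of the observed scores $X_{pij}$ across all persons, items and judges:
$\sigma^2(X_{pij}) = \sigma^2_p + \sigma^2_i + \sigma^2_j + \sigma^2_{pi} + \sigma^2_{pj} + \sigma^2_{ij} + \sigma^2_{pij,e}$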
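The decomposition above can be turned into estimates with the usual ANOVA mean squares. The following is a minimal sketch in Python, assuming a fully crossed, balanced persons x items x judges design with one score per cell and all facets random. The function name `variance_components_pxixj` and the small additive data set are hypothetical and serve only as an illustration.

```python
from itertools import product


def variance_components_pxixj(x):
    """Estimate variance components for a crossed p x i x j design.

    x[p][i][j] holds the single score of person p on item i from judge j.
    """
    n = (len(x), len(x[0]), len(x[0][0]))  # (n_p, n_i, n_j)
    cells = list(product(range(n[0]), range(n[1]), range(n[2])))
    grand = sum(x[p][i][j] for p, i, j in cells) / len(cells)

    def means(keep):
        # Marginal means indexed by the facets in `keep` (0=p, 1=i, 2=j).
        totals, counts = {}, {}
        for cell in cells:
            key = tuple(cell[k] for k in keep)
            totals[key] = totals.get(key, 0.0) + x[cell[0]][cell[1]][cell[2]]
            counts[key] = counts.get(key, 0) + 1
        return {key: totals[key] / counts[key] for key in totals}

    m = [means((k,)) for k in range(3)]  # person, item and judge means

    def ss_main(k):
        reps = len(cells) // n[k]
        return reps * sum((v - grand) ** 2 for v in m[k].values())

    def ss_two_way(a, b):
        reps = len(cells) // (n[a] * n[b])
        return reps * sum(
            (v - m[a][(ka,)] - m[b][(kb,)] + grand) ** 2
            for (ka, kb), v in means((a, b)).items()
        )

    ss = {"p": ss_main(0), "i": ss_main(1), "j": ss_main(2),
          "pi": ss_two_way(0, 1), "pj": ss_two_way(0, 2),
          "ij": ss_two_way(1, 2)}
    ss_total = sum((x[p][i][j] - grand) ** 2 for p, i, j in cells)
    ss["pij,e"] = ss_total - sum(ss.values())

    df = {"p": n[0] - 1, "i": n[1] - 1, "j": n[2] - 1}
    df["pi"] = df["p"] * df["i"]
    df["pj"] = df["p"] * df["j"]
    df["ij"] = df["i"] * df["j"]
    df["pij,e"] = df["p"] * df["i"] * df["j"]
    ms = {k: ss[k] / df[k] for k in ss}

    # Expected-mean-square solutions for the random-effects model.
    res = ms["pij,e"]
    return {
        "p": (ms["p"] - ms["pi"] - ms["pj"] + res) / (n[1] * n[2]),
        "i": (ms["i"] - ms["pi"] - ms["ij"] + res) / (n[0] * n[2]),
        "j": (ms["j"] - ms["pj"] - ms["ij"] + res) / (n[0] * n[1]),
        "pi": (ms["pi"] - res) / n[2],
        "pj": (ms["pj"] - res) / n[1],
        "ij": (ms["ij"] - res) / n[0],
        "pij,e": res,
    }


# Hypothetical additive scores: person + item + judge effect, no interaction.
data = [[[a + b + c for c in (0, 4)] for b in (0, 2)] for a in (0, 2, 4)]
estimates = variance_components_pxixj(data)
print(estimates)
```

For this additive example the person, item and judge components equal the sample variances of the three sets of effects (4, 2 and 8), and every interaction component is zero.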
Variance components
P x I x J design
Source    Estimated Variance Component    Percentage of Total Variance
p          48.71                           10.57
i          25.12                            5.45
j          15.00                            3.26
pi        185.87                           40.33
pj         33.18                            7.20
ij         80.00                           17.36
pij,e      72.94                           15.83
Interpretation of scores
Norm-oriented perspective: scores have relative meaning; they have meaning in relation to each other
Domain-oriented perspective: scores have absolute meaning in relation to the domain of measurement
Mastery-oriented perspective: scores have meaning in relation to a cut-off score (reliability of decisions, not of scores)
Illustration 1
Essay test
7 vignette-based essay questions
1 marker marked all questions for all examinees
Norm-referenced perspective
Calculate the generalizability coefficient!
D-study (ni = 7; norm-referenced)
Variance components (P x I design): p = 97.57, i = 261.24, pi,e = 371.97
$G = \dfrac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi,e}/n_i} = \dfrac{97.57}{97.57 + 371.97/7} = 0.65$
Illustration 2
Essay test
7 vignette-based essay questions
1 marker marked all questions for all examinees
Domain-referenced perspective
Calculate the dependability coefficient!
D-study (ni = 7; domain referenced)
Variance components (P x I design): p = 97.57, i = 261.24, pi,e = 371.97
$D = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_i/n_i + \sigma^2_{pi,e}/n_i} = \dfrac{97.57}{97.57 + 261.24/7 + 371.97/7} = 0.52$
Illustration 3
Essay test
7 vignette-based essay questions
1 marker marked all questions for all examinees
Domain-referenced perspective
Calculate the dependability coefficient for a sample of 10 essays!
D-study (ni = 10; domain referenced)
Variance components (P x I design): p = 97.57, i = 261.24, pi,e = 371.97
$D = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_i/n_i + \sigma^2_{pi,e}/n_i} = \dfrac{97.57}{97.57 + 261.24/10 + 371.97/10} = 0.61$
D-studies for several item samples
N Essays    Generalizability Coefficient (G)    Dependability Coefficient (D)
1           0.21                                0.13
5           0.57                                0.44
7           0.65                                0.52
10          0.72                                0.61
15          0.80                                0.70
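The table above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the one-facet P x I variance components reported earlier (p = 97.57, i = 261.24, pi,e = 371.97); the function name `d_study_one_facet` is an illustrative choice.

```python
def d_study_one_facet(var_p, var_i, var_pie, n_items):
    """Return (G, D) for a crossed persons x items design with n_items items."""
    relative_error = var_pie / n_items            # norm-referenced error
    absolute_error = (var_i + var_pie) / n_items  # domain-referenced error
    g = var_p / (var_p + relative_error)
    d = var_p / (var_p + absolute_error)
    return g, d


for n in (1, 5, 7, 10, 15):
    g, d = d_study_one_facet(97.57, 261.24, 371.97, n)
    print(n, round(g, 2), round(d, 2))
```

Because only the divisor changes, a D-study for any other number of essays needs no new data collection, only a different `n_items`.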
Illustration 4
Essay test
7 vignette-based essay questions
2 markers independently marked all questions for all examinees
Norm-referenced perspective
Calculate the generalizability coefficient!
D-study (ni=7; nj=2; norm referenced)
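Variance components (P x I x J design): p = 48.71, i = 25.12, j = 15.00, pi = 185.87, pj = 33.18, ij = 80.00, pij,e = 72.94
$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi}/n_i + \sigma^2_{pj}/n_j + \sigma^2_{pij,e}/(n_i n_j)} = \dfrac{48.71}{48.71 + 185.87/7 + 33.18/2 + 72.94/(2 \times 7)} = 0.50$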
Illustration 5
Essay test
7 vignette-based essay questions
2 markers independently marked all questions for all examinees
Domain-referenced perspective
Calculate the dependability coefficient!
D-study (ni=7; nj=2; domain referenced)
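Variance components (P x I x J design): p = 48.71, i = 25.12, j = 15.00, pi = 185.87, pj = 33.18, ij = 80.00, pij,e = 72.94
$D = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_i/n_i + \sigma^2_j/n_j + \sigma^2_{pi}/n_i + \sigma^2_{pj}/n_j + \sigma^2_{ij}/(n_i n_j) + \sigma^2_{pij,e}/(n_i n_j)}$
$= \dfrac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$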
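For the two-facet crossed design the same calculation can be sketched in Python. The sketch assumes the P x I x J variance components reported earlier and divides each component by the number of conditions over which it is averaged: n_i for terms involving items only, n_j for terms involving judges only, and n_i x n_j for terms involving both. The function name `d_study_two_facet` is an illustrative choice.

```python
def d_study_two_facet(vc, n_i, n_j):
    """Return (G, D) for a crossed persons x items x judges design."""
    # Relative error: only the components that interact with persons.
    relative_error = (
        vc["pi"] / n_i + vc["pj"] / n_j + vc["pij,e"] / (n_i * n_j)
    )
    # Absolute error: all components except the person component.
    absolute_error = (
        relative_error
        + vc["i"] / n_i + vc["j"] / n_j + vc["ij"] / (n_i * n_j)
    )
    g = vc["p"] / (vc["p"] + relative_error)
    d = vc["p"] / (vc["p"] + absolute_error)
    return g, d


components = {
    "p": 48.71, "i": 25.12, "j": 15.00,
    "pi": 185.87, "pj": 33.18, "ij": 80.00, "pij,e": 72.94,
}
g, d = d_study_two_facet(components, n_i=7, n_j=2)
print(round(g, 2), round(d, 2))  # 0.5 0.43
```

Calling the function with other values of `n_i` and `n_j` shows how adding essays or markers changes the coefficients.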
Illustration 6
Essay test
7 vignette-based essay questions
2 different markers independently marked each question for all examinees
Norm-referenced perspective
Calculate the generalizability coefficient!
D-study (ni=7; nj=2; norm referenced)
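(Judges : Items) x Persons design
Source      Estimated Variance Component    Percentage of Total Variance
p            48.71                           10.57
i            25.12                            5.45
j,ij         95.00                           20.62
pi          185.87                           40.33
pj,pij,e    106.12                           23.03
$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi}/n_i + \sigma^2_{pj,pij,e}/(n_i n_j)} = \dfrac{48.71}{48.71 + 185.87/7 + 106.12/(2 \times 7)} = 0.59$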
D-study summary table
Norm-referenced score interpretation
Number of    Same marker for all essays    Different marker for each essay
essays       One marker    Two markers     One marker    Two markers
5            0.36          0.44            0.39          0.46
7            0.41          0.50            0.47          0.54
10           0.45          0.56            0.56          0.63
15           0.49          0.61            0.65          0.72
Another reliability index
Reliability coefficient (G & D coefficients)
  Scale independent (0-1)
  Non-intuitive interpretation
Standard Error of Measurement (SEM)
  Intuitive interpretation
  Scale dependent
Standard Error of Measurement
$X = T + E$
X = Observed score
T = True score
E = Error score
$\text{Reliability index} = \dfrac{\mathrm{Var}(T)}{\mathrm{Var}(T) + \mathrm{Var}(E)}$
$\text{Standard Error of Measurement (SEM)} = \sqrt{\mathrm{Var}(E)} = \sigma_E$
Interpretation of SEM
Suppose an examinee has a score of 60% and the SEM is 5:
[Figure: score scale from 45 to 75 with confidence intervals centred on 60]
60 ± 5 (1 x SEM): 55 to 65, 68% CI
60 ± 10 (1.96 x 5 ≈ 10): 50 to 70, 95% CI
60 ± 11 (2.14 x 5 ≈ 11): 49 to 71, 95% CI
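The interval for a given confidence level can be sketched in Python as follows. The sketch assumes normally distributed measurement error, so the multiplier is a standard normal quantile; the function name `score_interval` is an illustrative choice.

```python
from statistics import NormalDist


def score_interval(score, sem, level=0.95):
    """Return (lower, upper) bounds of a confidence interval around a score."""
    # Two-sided normal quantile, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return score - z * sem, score + z * sem


low, high = score_interval(60, 5, level=0.95)
print(round(low, 1), round(high, 1))  # 50.2 69.8
```

Rounding 1.96 x 5 up to 10 gives the interval from 50 to 70 used in the example above.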
D-study (ni = 7; norm referenced)
Variance components (P x I design): p = 97.57, i = 261.24, pi,e = 371.97
$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi,e}/n_i} = \dfrac{97.57}{97.57 + 371.97/7} = 0.65$
$\text{SEM} = \sqrt{\sigma^2_{pi,e}/n_i} = \sqrt{371.97/7} = 7.29$
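The SEM is the square root of the error variance that appears in the denominator of the coefficient. A minimal Python sketch for the one-facet design, assuming the variance components above (i = 261.24, pi,e = 371.97); the function names `relative_sem` and `absolute_sem` are illustrative choices.

```python
import math


def relative_sem(var_pie, n_items):
    """SEM for norm-referenced interpretation (relative error)."""
    return math.sqrt(var_pie / n_items)


def absolute_sem(var_i, var_pie, n_items):
    """SEM for domain-referenced interpretation (absolute error)."""
    return math.sqrt((var_i + var_pie) / n_items)


print(round(relative_sem(371.97, 7), 2))  # 7.29
print(round(absolute_sem(261.24, 371.97, 7), 2))
```

The absolute SEM is never smaller than the relative SEM because it also includes the item variance.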
D-study (ni=7; nj=2; domain referenced)
Variance components (P x I x J design): p = 48.71, i = 25.12, j = 15.00, pi = 185.87, pj = 33.18, ij = 80.00, pij,e = 72.94
$D = \dfrac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$
$\text{SEM} = \sqrt{25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = \sqrt{65.16} = 8.07$
Scenario CEX
A mini clinical evaluation exercise (mini-CEX) was developed in which examinees are periodically observed and rated on a rating form. An investigator analyzed a data set from 88 residents who were each observed on 4 occasions, each time by a single, different examiner (cf. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (Clinical Evaluation Exercise): A preliminary investigation. Annals of Internal Medicine 1995;123:795-799).
Design: o : p (occasions nested within persons)
Variance components: p; o,op,e (= o:p)
$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{o:p}/4} = D$
Scenario OSCE I
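An OSCE consisting of 15 stations was administered to 100 final-year students. Each station was scored by two independent examiners on a case-specific checklist. Different examiners were used in each station.
Design: (j : s) x p (examiners nested within stations, crossed with persons)
Variance components: p, s, j:s, ps, pj:s
$G = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{ps}/15 + \sigma^2_{pj:s}/(2 \times 15)}$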
Scenario OSCE II
An experimental OSCE was administered to 20 residents. Each resident was tested on a different day. For each resident, 3 stations were organized with real patients who were available that day. Two examiners observed all residents in all stations and completed a generic rating scale.
Design: (s : p) x j (stations nested within persons, crossed with examiners)
Variance components: p, s:p, j, pj, js:p
$D = \dfrac{\sigma^2_p}{\sigma^2_p + \sigma^2_{s:p}/3 + \sigma^2_j/2 + \sigma^2_{pj}/2 + \sigma^2_{js:p}/(2 \times 3)}$
Scenario Clerkship Evaluation
An investigator wishes to evaluate the teaching quality of 10 clinical clerkships. She developed a questionnaire with 30 items on various quality aspects. The questionnaire was completed by 50 students in each clerkship.
Design: (s : c) x i (students nested within clerkships, crossed with items)
Variance components: c, i, s:c, ci, is:c
$G = \dfrac{\sigma^2_c}{\sigma^2_c + \sigma^2_{s:c}/50 + \sigma^2_{ci}/30 + \sigma^2_{is:c}/(50 \times 30)}$
PS: It is doubtful that i is a random facet; i could be treated as fixed or ignored!
Further reading & software
Literature
Cronbach LJ, Gleser GC, Nanda H, Rajaratnam N. The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley, 1972. The original monograph on generalizability theory. Complete, but hard going for any reader.
Brennan RL. Elements of Generalizability Theory. Iowa City: ACT Publications, 1983. The resource book for most specialists. Not easy for readers without statistical training.
Shavelson RJ, Webb NM. Generalizability theory: A primer. Newbury Park, CA: Sage Publications, 1991. A good and accessible introduction to generalizability theory for any reader.
Software
GENOVA: Conducts G and D studies and provides ample statistical information. Operates on any PC. The program is relatively old and not user friendly. Available from Dr. J. Crick, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104-3190, USA.
SPSS: The Variance Components subprogram of SPSS General Linear Models estimates variance components (also for unbalanced designs). D-studies need to be done manually.