Hierarchical Linear Modeling for Detecting Cheating and Aberrance
DESCRIPTION
Hierarchical Linear Modeling for Detecting Cheating and Aberrance. Statistical Detection of Potential Test Fraud, May 2012, Lawrence, KS. William Skorupski, University of Kansas; Karla Egan, CTB/McGraw-Hill.
TRANSCRIPT
![Page 1: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/1.jpg)
Hierarchical Linear Modeling for Detecting Cheating and Aberrance
Statistical Detection of Potential Test Fraud
May 2012, Lawrence, KS
William Skorupski, University of Kansas
Karla Egan, CTB/McGraw-Hill
![Page 2: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/2.jpg)
Purpose of the Study
“Cheating” as a paradigm for psychometric research has focused on individuals.
Our purpose is to identify groups of cheaters, based on the premise that teachers and administrators may be motivated to inappropriately influence students’ scores.
![Page 3: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/3.jpg)
Background
Importance of cheating detection
Cheating as a classroom-, school-, or even district-wide phenomenon
Results of many large-scale educational assessments are tied to incentives, e.g., merit-based pay, accountability, AYP targets from NCLB
Teachers may be tempted to “teach to the test,” provide inappropriate materials, or alter students’ answer sheets
![Page 4: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/4.jpg)
Previous Study
Skorupski & Egan (2011) demonstrated a Bayesian hierarchical modeling approach for group-level aberrance (real data).
Cross-validation with external reports of impropriety.
Reasonable detection rates, but results were difficult to verify.
![Page 5: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/5.jpg)
Findings
Relatively large aberrance for a few schools at certain Time points suggested that this approach may be useful for flagging potentially cheating schools.
The present simulation study was planned to evaluate detection power.
![Page 6: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/6.jpg)
[Figure: Two “Non-Aberrant” Schools. Mean Scale Score (620 to 700) by Grade/Year (Gr. 3/2008, Gr. 4/2009, Gr. 5/2010).]
![Page 7: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/7.jpg)
[Figure: Two Flagged Schools. Mean Scale Score (620 to 700) by Grade/Year (Gr. 3/2008, Gr. 4/2009, Gr. 5/2010).]
![Page 8: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/8.jpg)
Goals of the study
Evaluate the robustness of the Bayesian HLM approach for detecting group-level cheating through Monte Carlo simulation.
Develop heuristics for flagging known “cheaters” from the analysis
![Page 9: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/9.jpg)
Cheating & Aberrance
Certain kinds of aberrance may be evidence of cheating
Answer copying
Model-data misfit
In our analysis: unusually high group performance at a given Time, given marginal Group & Time effects
• i.e., a large positive interaction effect
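To make "interaction effect" concrete, here is a minimal descriptive sketch. It is not the Bayesian model used in the study: it applies a plain two-way decomposition to a table of group-by-time means. The function name and the example means are hypothetical.

```python
def interaction_effects(cell_means):
    """Two-way decomposition of a groups x times table of means.

    Returns cell - group_mean - time_mean + grand_mean for every cell,
    i.e. what is left after removing the marginal Group and Time effects.
    """
    n_groups = len(cell_means)
    n_times = len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_groups * n_times)
    group_means = [sum(row) / n_times for row in cell_means]
    time_means = [
        sum(row[t] for row in cell_means) / n_groups for t in range(n_times)
    ]
    return [
        [
            cell_means[g][t] - group_means[g] - time_means[t] + grand
            for t in range(n_times)
        ]
        for g in range(n_groups)
    ]


# Hypothetical means: every group grows by 0.5 per Time point, except that
# the last group gains an extra 1.0 at the final Time point.
means = [
    [0.0, 0.5, 1.0],
    [1.0, 1.5, 2.0],
    [-1.0, -0.5, 1.0],
]
effects = interaction_effects(means)
```

In this toy table the largest positive residual falls on the last group at the final Time point, which is the kind of cell the approach is meant to flag. Because the residuals are forced to sum to zero across rows and columns, the inflated cell's residual (4/9 here) is smaller than the 1.0 that was added.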
![Page 10: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/10.jpg)
Important Note
No cheating/aberrance detection method can “prove” cheating; it can merely flag unusual individuals or groups for further review.
Our goal is to demonstrate detection of known group-level cheating with adequate power while maintaining an acceptable Type I error rate.
![Page 11: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/11.jpg)
Methods – Data Simulation
Data created to emulate a vertically scaled SWA
3 linked administrations, means increasing by 0.5 between each Time (Time means = 0, 0.5, 1)
60 Groups, N(g) within ranging from 10 to 260 (Total N = 4,650)
![Page 12: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/12.jpg)
[Figure: Histogram of School Sample Sizes - Real Data. Frequency (0 to 300) by Sample Size (0 to 200).]
![Page 13: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/13.jpg)
[Figure: Histogram of Simulated Sample Sizes. Frequency (0 to 20) by Sample Size (0 to 300).]
![Page 14: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/14.jpg)
[Figure: Histogram of Simulated Sample Means. Frequency (0 to 14) by Sample Mean (-3 to 3).]
51 of 60 group means at Time 1 drawn from N(0, 1)
3 x 3 = 9 groups: N(g) = 10, 60, 110 crossed with group mean = -1, 0, 1
These 9 groups (3 at each Time, so 5% overall) will be the “cheaters”
![Page 15: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/15.jpg)
Simulate Individual Scores
$\varepsilon \sim MVN(\mathbf{0}, R)$: $\mathbf{0}$ is a vector of zeros, $R$ is a correlation matrix with off-diagonals = 0.77 (based on real data study)
Each individual score $Y_{igt}$ was created by taking $\varepsilon_{igt}$ and adding its respective Time and Group mean.
At this point, all scores are “non-aberrant”; main effects alone account for differences
![Page 16: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/16.jpg)
Simulate “Cheating”
For cheating groups, an additional interaction effect is added to $Y_{igt}$
3 at each Time, for group mean = -1, 0, or 1 and N(g) = 10, 60, or 110
Group-by-Time (60 x 3) matrix of effects, GT. If GT = 0, no cheating; if GT > 0, cheating.
GT = 1 for simulated cheaters (i.e., Group mean is +1 above main effects)
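The score-generation steps described on the last two slides can be sketched as follows. This is a minimal illustration, not the authors' simulation code. The function name, the seed, and the shared-component method of producing equicorrelated residuals are assumptions; the correlation (0.77), Time means (0, 0.5, 1), and cheating effect (+1) come from the slides.

```python
import math
import random

RHO = 0.77                     # off-diagonal of R, from the slides
TIME_MEANS = [0.0, 0.5, 1.0]   # Time effects, from the slides
CHEAT_EFFECT = 1.0             # GT = 1 for simulated cheaters


def simulate_group(n, group_mean, cheat_time=None, rng=random):
    """Return n score triples (one score per Time) for a single group.

    Residuals are built from a shared person component plus an
    occasion-specific component, which yields unit variance and a
    correlation of RHO between any two Time points.
    """
    scores = []
    for _ in range(n):
        person = rng.gauss(0.0, 1.0)
        row = []
        for t, time_mean in enumerate(TIME_MEANS):
            occasion = rng.gauss(0.0, 1.0)
            resid = math.sqrt(RHO) * person + math.sqrt(1.0 - RHO) * occasion
            bump = CHEAT_EFFECT if cheat_time == t else 0.0
            row.append(resid + time_mean + group_mean + bump)
        scores.append(row)
    return scores


rng = random.Random(2012)  # arbitrary seed, for reproducibility only
honest = simulate_group(110, group_mean=0.0, rng=rng)
cheater = simulate_group(110, group_mean=0.0, cheat_time=2, rng=rng)
```

Here `cheat_time` is a zero-based index, so `cheat_time=2` inflates the third administration. A full replication would call this once per group with that group's size and mean.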
![Page 17: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/17.jpg)
[Figure: 5 of 60 Simulated Group Means over Time. Group Mean (-1.5 to 3.0) by Time (1, 2, 3); lines labeled Time 1 Cheating, Time 2 Cheating, Time 3 Cheating.]
Each of these 3 patterns was crossed with 3 group sizes: N = 10, 60, 110
![Page 18: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/18.jpg)
Notes on Simulation
Forms must be linked over Time
In this analysis, scale scores were directly simulated (treating scores as measured without error), but in practice item response data would first be obtained and linked in a vertical scale.
Examinees are nested within groups, Time points nested within individuals
![Page 19: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/19.jpg)
[Figure: nesting diagram. Groups (1,…,G) contain Individuals (1,…,N(g)); each individual has scores $Y_{ig1}$, $Y_{ig2}$, $Y_{ig3}$ at linked Time points (1, 2, 3). In the real-data analogue, Schools j = 1,…,J contain Individuals i = 1,…,n(j) with scores $Y_{3ij}$, $Y_{4ij}$, $Y_{5ij}$ for Grades 3 - 5, each from separate calibrations linked to the vertical scale.]
![Page 20: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/20.jpg)
Methods – Analysis
Hierarchical Growth Model
Model: Scale scores for individuals (i) within groups (g) over time (t):
$Y_{igt} = \beta_0 + \beta_{1g} + \beta_{2t} + \beta_{3gt} + \varepsilon_{igt}$
$\varepsilon_{igt} \sim N(0, \sigma^2)$
Fully Bayesian estimation (MCMC) using WinBUGS (Lunn et al., 2000)
50 replications
![Page 21: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/21.jpg)
Baseline Model
Only Time- and Group-level effects are estimated as differences in intercepts (plus interaction term)
With real data, other models could also incorporate covariates (SES, etc.) at any level of the model
![Page 22: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/22.jpg)
Outcomes
The parameter estimates $\beta_{3gt}$ (Group-by-Time interactions) are used to infer aberrant group performance at a given Time.
$\beta_{1g}$ (main effect for Group) could also be used to detect systematic aberrance
Delta values for parameter estimates, plus “Posterior Probability of Cheating” (PPoC).
![Page 23: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/23.jpg)
Outcomes
$\delta_{gt} = \dfrac{\beta_{3gt} - 0}{\sqrt{\sigma_t^2}}$
Standardized effect size for the interaction. The previous study found $\delta \ge 0.5$ to be a reasonable criterion.
PPoC = proportion of posterior draws (samples from the posterior in MCMC output) above zero. Criterion for flagging: PPoC ≥ 0.75
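The two flagging quantities can be computed directly from saved MCMC draws. The sketch below is illustrative only. The function names and the example draws are hypothetical, and summarizing delta by its posterior mean is an assumption (the result plots report a "Mean Estimated Delta").

```python
def ppoc(draws):
    """Posterior Probability of Cheating: share of posterior draws above zero."""
    return sum(1 for d in draws if d > 0) / len(draws)


def flag(delta_draws, delta_cut=0.5, ppoc_cut=0.75):
    """Flag a Group/Time cell when both criteria are met.

    delta_draws: posterior draws of the standardized interaction effect.
    The cutoffs default to the values stated on the slides.
    """
    mean_delta = sum(delta_draws) / len(delta_draws)
    return mean_delta >= delta_cut and ppoc(delta_draws) >= ppoc_cut


# Hypothetical draws for two Group/Time cells (far fewer than a real chain).
suspicious = [0.9, 0.7, -0.1, 0.6, 0.8, 1.1, 0.4, 0.5]
ordinary = [0.3, -0.2, 0.1, 0.4]

suspicious_flagged = flag(suspicious)
ordinary_flagged = flag(ordinary)
```

The second cell reaches a PPoC of 0.75 but is not flagged, because its mean delta is well below 0.5. Requiring both criteria screens out effects that are probably positive but small.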
![Page 24: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/24.jpg)
Cross-validation
Any Group/Time interaction effect with $\delta \ge 0.5$ and PPoC ≥ 0.75 was considered flagged as aberrant (i.e., potentially cheating).
Over replications, correctly identified groups counted toward the Power calculation; false positive flags counted toward the Type I error rate.
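One way to tally these rates is sketched below. The slides do not give the exact formula, so pooling flags over all replications and cells is an assumption, and the function name and toy inputs are hypothetical.

```python
def detection_rates(flags_by_rep, is_cheater):
    """Pool flags over replications into (power, type_1_error).

    flags_by_rep: one list of booleans per replication, one entry per
                  Group/Time cell (True = flagged).
    is_cheater:   booleans marking which cells were simulated as cheating.
    """
    hits = 0
    false_alarms = 0
    for flags in flags_by_rep:
        for flagged, cheat in zip(flags, is_cheater):
            if flagged and cheat:
                hits += 1
            elif flagged:
                false_alarms += 1
    reps = len(flags_by_rep)
    n_cheat = sum(is_cheater)
    n_clean = len(is_cheater) - n_cheat
    power = hits / (reps * n_cheat)
    type_1 = false_alarms / (reps * n_clean)
    return power, type_1


# Toy example: four cells, the first one cheating, two replications.
truth = [True, False, False, False]
flags = [
    [True, False, False, False],   # replication 1: correct flag
    [False, True, False, False],   # replication 2: a miss and a false alarm
]
power, type_1 = detection_rates(flags, truth)
```

In the toy example the cheating cell is caught in one of two replications (power 0.5), and one of the six non-cheating cell-replications is flagged (Type I rate 1/6).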
![Page 25: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/25.jpg)
Results
MCMC: 2 chains, 30,000 iterations each, burn-in = 25,000
Very good convergence of solutions
Main effects for Time and Group were well recovered.
Detection power was very good at Times 2 & 3, quite low for Time 1
Acceptable Type I error rate
![Page 26: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/26.jpg)
[Figure: Average Estimated Time Mean by True Time Mean; r = 0.995.]
![Page 27: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/27.jpg)
[Figure: Mean Estimated Delta by True Group Mean; r = 0.95, N(groups) = 60.]
![Page 28: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/28.jpg)
[Figure: Cheating/Aberrance Indicators. Mean PPoC for Interaction Terms by Mean Estimated Delta for Interaction Terms. Flag criteria: $\delta \ge .5$, PPoC ≥ .75. Marginal: Power = .59, Type I = .04.]
![Page 29: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/29.jpg)
![Page 30: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/30.jpg)
[Figure: Cheating/Aberrance Indicators at Time 1. Mean PPoC for Interaction Term by Mean Estimated Delta for Interaction Term. Flag criteria: $\delta \ge .5$, PPoC ≥ .75. Time 1: Power = .07, Type I = .04.]
![Page 31: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/31.jpg)
[Figure: Cheating/Aberrance Indicators at Time 2. Mean PPoC for Interaction Term by Mean Estimated Delta for Interaction Term. Flag criteria: $\delta \ge .5$, PPoC ≥ .75. Time 2: Power = .71, Type I = .04.]
![Page 32: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/32.jpg)
[Figure: Cheating/Aberrance Indicators at Time 3. Mean PPoC for Interaction Term by Mean Estimated Delta for Interaction Term. Flag criteria: $\delta \ge .5$, PPoC ≥ .75. Time 3: Power = 1, Type I = .05.]
![Page 33: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/33.jpg)
Discussion
Overall power is quite good, but very poor at Time 1
Type I error rate acceptable
Pretty encouraging results; more simulations, replications planned
More conditions with various effect sizes, sample sizes, non-linear trends, etc.
![Page 34: Hierarchical Linear Modeling for Detecting Cheating and Aberrance](https://reader033.vdocuments.us/reader033/viewer/2022050809/568144de550346895db1aad6/html5/thumbnails/34.jpg)
How might this method be used in practice?
Flagged groups may be compared to the overall growth trajectory to infer aberrance of performance.
Groups flagged must then be investigated further.
Unusual performance could be caused by cheating, or it could indicate something exemplary!
Commend or Condemn?