Statistical detection of test fraud (data forensics): Where do I start?
TRANSCRIPT
Nathan Thompson, PhD, and Terry Ausman
Statistical Detection: Where do I start?
2
Welcome! These are some of the lessons I have learned while diving into the field.
Overview of the topic
Discuss resources
Save time and effort for anyone starting out
Purpose is NOT to be a full workshop on data forensics
3
Outline
History
Where do I start learning? Resources
What are the threats to test security?
How do I start deterring? Deterrent solutions like lockdown browsers and remote proctoring
How do I start detecting? Intro to data forensics; software for detection
4
History
Literature dates to before 1950
Many collusion indices; most were descriptive or completely ad hoc
Notable exception: Frary, Tideman, and Watts (1977) – G2
Modern era started when Wollack adapted G2 to IRT
Other analyses have not received as much literature
5
How do I start learning?
6
Resources
In the past, if you wanted to learn:
1. Read all the original articles
2. Read reviews
• Bliss (2012) Covington Award – 25 indices
• Khalid, Mehmood, & Rehman (2011) – 20 indices
• Cizek (1997) book: good, but little attention to forensics
• You still need all the originals.
UNTIL…
7
Resources
Wollack & Maynes (2013)
Kingston & Clark (2014)
You can now start here!
8
Overview of Security Threats
Major sources of issues:
Brain dump makers (harvesting)
Brain dump takers (preknowledge)
Specific location problems
Examinee collusion
Receiving help (teacher, proctor, outside)
Proxy testing
What is your list?
9
Harvesting
What: Steal your content and make it public
Why: Often (but not always) to make money
How: Memorization or images; brain dump sites
Deter: CAT/LOFT
Detect: Unusual responses & latencies; brain dump comparisons; Trojan Horses
Minimize: Frequent republishing
10
Preknowledge
What: Knowing the questions and answers
Why: Easy pass
How: Brain dump sites (used to be word of mouth)
Deter: CAT/LOFT
Detect: High score, low time; brain dump comparisons; Trojan Horses
Minimize: Frequent republishing
11
Examinee Collusion
What: Copying
Why: More items correct
How: Individual or group effort
Deter: CAT/LOFT, multiple forms, proctors
Detect: Collusion indices, group rollups
Minimize: CAT/LOFT, multiple forms
12
Receiving help
What: Teacher, proctor, or outside aid
Why: More items correct; often benefits the aider
How: Individual or group effort
Deter: CAT/LOFT, multiple forms, proctors
Detect: Collusion indices, group rollups, erasure analysis
Minimize: CAT/LOFT, multiple forms, TEIs, performance tests
13
How do I start deterring?
14
Many options:
User roles in test development
Limit access to test content during delivery
Verify identity of examinee
Test window date/time
Test location (IP addresses)
Lockdown browser
Proctor/examinee authentication
Biometrics for ID
Proctor training
Many providers
16
How do I start detecting?
17
It’s a Hypothesis Test!
First step: Identify the threats you are worried about and how you think each would present itself in the data
18
It’s a Hypothesis Test!
Independent variables:
Test centers/locations
Countries
Training programs
Test forms
Individuals
19
It’s a Hypothesis Test!
Dependent variables:
Item response or test time
Item statistics
Test statistics (mean/SD, pass rate)
Person statistics (intra-individual)
Collusion indices
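As a minimal sketch of the hypothesis-test framing: pick one independent variable (say, test center) and one dependent variable (say, pass rate), and compare one group against everyone else. The two-proportion z-test and the counts below are illustrative assumptions, not a method from the talk.

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z-test statistic: is group 1's pass rate
    significantly different from group 2's?"""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled pass rate under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se                          # compare |z| to e.g. 1.96

# Hypothetical counts: one test center passes 45/50; all others pass 600/1000
z = two_prop_z(45, 50, 600, 1000)
```

A large |z| does not prove fraud; it only tells you which centers deserve a closer look.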
20
It’s a Hypothesis Test!
12/2/2014
If you aim at nothing, that’s exactly what you’ll hit.
21
It’s a Hypothesis Test!
Example: Teachers helping kids
Item statistics different than other teachers
Collusion indices
Relatively high scores with relatively short time – bivariate plot?
Item latencies different than other teachers
22
It’s a Hypothesis Test!
Example: Brain dump users
Collusion indices
Responses on Trojan Horses
Relatively high scores with relatively short time
Item latencies
Group level not likely (could be at any test center)
23
Time
High score, low time: Preknowledge or aid
Low score, high time: Harvester
Response patterns
Person fit
Score gains
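The score/time signal can be sketched with plain z-scores, as below. The 2.0 cutoff and the score/time vectors are made-up assumptions for illustration, not a recommended rule:

```python
import statistics

def flag_score_time(scores, times, z_cut=2.0):
    """Flag examinees whose score/time combination matches a threat profile:
    high score + low time -> possible preknowledge or aid,
    low score + high time -> possible harvester."""
    ms, ss = statistics.mean(scores), statistics.stdev(scores)
    mt, st = statistics.mean(times), statistics.stdev(times)
    flags = []
    for i, (x, t) in enumerate(zip(scores, times)):
        zx, zt = (x - ms) / ss, (t - mt) / st
        if zx > z_cut and zt < -z_cut:
            flags.append((i, "possible preknowledge/aid"))
        elif zx < -z_cut and zt > z_cut:
            flags.append((i, "possible harvester"))
    return flags

# Hypothetical data: examinee 10 scores far above the group in far less time
scores = [70] * 10 + [95]
times  = [60] * 10 + [30]
flags = flag_score_time(scores, times)   # → [(10, 'possible preknowledge/aid')]
```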
Step 2: Determine your analysis
24
Options for Detection
Intra-Individual:
• Time/RTE (CBT only)
• Response patterns
• Score gains
• Person fit
Inter-Individual:
• Collusion indices
• Erasure (paper only; also group level)
Group:
• Roll-up of intra and inter
• Descriptive statistics
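The group-level "roll-up" amounts to counting individual-level flags per group; a minimal sketch, with hypothetical examinee IDs and center names:

```python
from collections import Counter

def group_rollup(flagged_ids, group_of):
    """Roll individual-level flags up to the group level: how many
    flagged examinees does each test center / classroom have?"""
    return Counter(group_of[i] for i in flagged_ids).most_common()

# Hypothetical mapping of examinees to test centers
group_of = {"e1": "Center A", "e2": "Center A", "e3": "Center B", "e4": "Center B"}
rollup = group_rollup(["e1", "e2", "e4"], group_of)
# → [('Center A', 2), ('Center B', 1)]
```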
25
More on Collusion Indices
How is collusion quantified? Consider a 100-item test…
Error similarity – we both had 10 errors: Same items? Same responses on those items?
Response similarity – we gave the same response on 50 items? 90?
Some indices are standardized/probabilistic (good); some are descriptive or non-probabilistic (bad)
They can vary in direction (one- or two-tailed)
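The error-similarity and response-similarity quantities above can be sketched as simple tallies over two response strings. The answer key and responses below are invented for illustration, and these are descriptive counts only, not probabilistic indices:

```python
def similarity_counts(resp_a, resp_b, key):
    """Descriptive collusion counts for two examinees' response strings:
    RIC  = items where both gave the same response (responses in common)
    EIC  = items where both answered incorrectly (errors in common)
    EEIC = items where both were wrong AND chose the same wrong option"""
    ric = eic = eeic = 0
    for a, b, k in zip(resp_a, resp_b, key):
        if a == b:
            ric += 1
        if a != k and b != k:
            eic += 1
            if a == b:
                eeic += 1
    return ric, eic, eeic

# Hypothetical 10-item example
key   = "ABCDABCDAB"
alice = "ABCDABCDCC"   # 2 errors
bob   = "ABCDABCDCD"   # 2 errors, one shared wrong option
print(similarity_counts(alice, bob, key))   # → (9, 2, 1)
```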
26
More on Collusion Indices
There are issues to consider when comparing:
ESA only looks at errors and ignores the rest of the data
Major confound with ability: two examinees scoring 99/100 will get flagged for collusion!
Therefore it is important to condition on ability
Some indices have no theoretical basis whatsoever
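A toy illustration of why conditioning on ability matters, assuming (unrealistically) that item matches are independent with a fixed per-pair match probability p: the same 90 matches out of 100 items is astronomically unlikely for a low-ability pair but entirely plausible for two near-perfect examinees. Real indices like Wollack's Omega and G2 build this baseline from IRT, not from a flat p.

```python
from math import comb

def match_tail_prob(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p): chance of m or more identical
    responses under an independence baseline with per-item match
    probability p (p rises with the pair's ability)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# 90 matching responses on a 100-item test:
low_ability  = match_tail_prob(100, 90, 0.50)   # essentially impossible by chance
high_ability = match_tail_prob(100, 90, 0.85)   # unremarkable for two strong examinees
```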
27
More about collusion

                     Probabilistic                                     Descriptive   Ad hoc
Error similarity     B&B                                               EIC, EEIC     HHHHJ
Response similarity  Wollack’s Omega; Wesolowsky Zjk; Frary et al. G2  RIC
28
More resources
ITC Guidelines on the Security of Tests, Examinations, and Other Assessments
TILSA Test Security Guidebook
Conference presentations/workshops (harder to find)
29
Software
Next step: Find software that meets your needs
Scrutiny!
S-check
R packages (CopyDetect)
SIFT
Integrity
Caveon
IRT software like IRTPRO or Xcalibre
30
Epilogue: Then what?
Define a pathway for investigation and actions
Joy Matthews-Lopez and Paul Jones
31
Examples (if time)
500 certification candidates
Gr4 Math (locations): check on teachers and schools; there is incentive to help students
Summary – Q&A