Statistical detection of test fraud (data forensics): Where do I start?
TRANSCRIPT
Nathan Thompson, PhD, and Terry Ausman
Statistical Detection: Where do I start?
2
Welcome! These are some of the lessons I have learned while diving into the field.
Overview of the topic
Discuss resources
Save time and effort for anyone starting out
Purpose is NOT to be a full workshop on data forensics
3
Outline
History
Where do I start learning? Resources
What are the threats to test security?
How do I start deterring? Deterrent solutions like lockdown browsers and remote proctoring
How do I start detecting? Intro to data forensics; software for detection
4
History
Literature dates to before 1950
Many collusion indices; most were descriptive or completely ad hoc
Notable exception: Frary, Tideman, and Watts (1977) – G2
Modern era started when Wollack adapted G2 to IRT
Other analyses have not received as much literature
5
How do I start learning?
6
Resources
In the past, if you wanted to learn:
1. Read all the original articles
2. Read reviews
• Bliss (2012) Covington Award – 25 indices
• Khalid, Mehmood, & Rehman (2011) – 20 indices
• Cizek (1997) book: good, but little attention to forensics
• You still need all the originals.
UNTIL…
7
Resources
Wollack & Maynes (2013)
Kingston & Clark (2014)
You can now start here!
8
Overview of Security Threats
Major sources of issues:
Brain dump makers (harvesting)
Brain dump takers (preknowledge)
Specific location problems
Examinee collusion
Receiving help (teacher, proctor, outside)
Proxy testing
What is your list?
9
Harvesting
What: Steal your content and make it public
Why: Often (but not always) to make money
How: Memorization or images; brain dump sites
Deter: CAT/LOFT
Detect: Unusual responses & latencies; brain dump comparisons; Trojan Horses
Minimize: Frequent republishing
10
Preknowledge
What: Knowing the questions and answers
Why: Easy pass
How: Brain dump sites (used to be word of mouth)
Deter: CAT/LOFT
Detect: High score, low time; brain dump comparisons; Trojan Horses
Minimize: Frequent republishing
11
Examinee Collusion
What: Copying
Why: More items correct
How: Individual or group effort
Deter: CAT/LOFT, multiple forms, proctors
Detect: Collusion indices, group rollups
Minimize: CAT/LOFT, multiple forms
12
Receiving help
What: Teacher, proctor, or outside aid
Why: More items correct; often benefits the aider
How: Individual or group effort
Deter: CAT/LOFT, multiple forms, proctors
Detect: Collusion indices, group rollups, erasure analysis
Minimize: CAT/LOFT, multiple forms, TEIs, performance tests
13
How do I start deterring?
14
Many options:
User roles in test development
Limit access to test content during delivery
Verify identity of examinee
Test window date/time
Test location (IP addresses)
Lockdown browser
Proctor/examinee authentication
Biometrics for ID
Proctor training
Many providers
16
How do I start detecting?
17
It’s a Hypothesis Test!
First step: Identify the threats you are worried about and how you think each would present itself in the data
18
It’s a Hypothesis Test!
Independent variables:
Test centers/locations
Countries
Training programs
Test forms
Individuals
19
It’s a Hypothesis Test!
Dependent variables:
Item response or test time
Item statistics
Test statistics (mean/SD, pass rate)
Person statistics (intra-individual)
Collusion indices
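As a minimal sketch of the hypothesis-test framing: pick one independent variable (say, test center) and one dependent variable (say, pass rate), and compare one group against everyone else. The two-proportion z-test and the counts below are illustrative assumptions, not a method from the talk.

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z-test statistic: is group 1's pass rate
    significantly different from group 2's?"""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled pass rate under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se                          # compare |z| to e.g. 1.96

# Hypothetical counts: one test center passes 45/50; all others pass 600/1000
z = two_prop_z(45, 50, 600, 1000)
```

A large |z| does not prove fraud; it only tells you which centers deserve a closer look.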
20
It’s a Hypothesis Test!
12/2/2014
If you aim at nothing, that’s exactly what you’ll hit.
21
It’s a Hypothesis Test!
Example: Teachers helping kids
Item statistics different than other teachers
Collusion indices
Relatively high scores with relatively short time – bivariate plot?
Item latencies different than other teachers
22
It’s a Hypothesis Test!
Example: Brain dump users
Collusion indices
Responses on Trojan Horses
Relatively high scores with relatively short time
Item latencies
Group level not likely (could be at any test center)
23
Time
High score, low time: Preknowledge or aid
Low score, high time: Harvester
Response patterns
Person fit
Score gains
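The score/time signal can be sketched with plain z-scores, as below. The 2.0 cutoff and the score/time vectors are made-up assumptions for illustration, not a recommended rule:

```python
import statistics

def flag_score_time(scores, times, z_cut=2.0):
    """Flag examinees whose score/time combination matches a threat profile:
    high score + low time -> possible preknowledge or aid,
    low score + high time -> possible harvester."""
    ms, ss = statistics.mean(scores), statistics.stdev(scores)
    mt, st = statistics.mean(times), statistics.stdev(times)
    flags = []
    for i, (x, t) in enumerate(zip(scores, times)):
        zx, zt = (x - ms) / ss, (t - mt) / st
        if zx > z_cut and zt < -z_cut:
            flags.append((i, "possible preknowledge/aid"))
        elif zx < -z_cut and zt > z_cut:
            flags.append((i, "possible harvester"))
    return flags

# Hypothetical data: examinee 10 scores far above the group in far less time
scores = [70] * 10 + [95]
times  = [60] * 10 + [30]
flags = flag_score_time(scores, times)   # → [(10, 'possible preknowledge/aid')]
```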
Step 2: Determine your analysis
24
Options for Detection
Intra-Individual:
• Time/RTE (CBT only)
• Response patterns
• Score gains
• Person fit
Inter-Individual:
• Collusion indices
• Erasure (paper only; also group level)
Group:
• Roll-up of intra and inter
• Descriptive statistics
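The group-level "roll-up" amounts to counting individual-level flags per group; a minimal sketch, with hypothetical examinee IDs and center names:

```python
from collections import Counter

def group_rollup(flagged_ids, group_of):
    """Roll individual-level flags up to the group level: how many
    flagged examinees does each test center / classroom have?"""
    return Counter(group_of[i] for i in flagged_ids).most_common()

# Hypothetical mapping of examinees to test centers
group_of = {"e1": "Center A", "e2": "Center A", "e3": "Center B", "e4": "Center B"}
rollup = group_rollup(["e1", "e2", "e4"], group_of)
# → [('Center A', 2), ('Center B', 1)]
```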
25
More on Collusion Indices
How is collusion quantified? Consider a 100-item test…
Error similarity – we both had 10 errors: Same items? Same responses on those items?
Response similarity – we gave the same response on 50 items? 90?
Some indices are standardized/probabilistic (good); some are descriptive or non-probabilistic (bad)
They can vary in direction (one- or two-tailed)
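The error-similarity and response-similarity quantities above can be sketched as simple tallies over two response strings. The answer key and responses below are invented for illustration, and these are descriptive counts only, not probabilistic indices:

```python
def similarity_counts(resp_a, resp_b, key):
    """Descriptive collusion counts for two examinees' response strings:
    RIC  = items where both gave the same response (responses in common)
    EIC  = items where both answered incorrectly (errors in common)
    EEIC = items where both were wrong AND chose the same wrong option"""
    ric = eic = eeic = 0
    for a, b, k in zip(resp_a, resp_b, key):
        if a == b:
            ric += 1
        if a != k and b != k:
            eic += 1
            if a == b:
                eeic += 1
    return ric, eic, eeic

# Hypothetical 10-item example
key   = "ABCDABCDAB"
alice = "ABCDABCDCC"   # 2 errors
bob   = "ABCDABCDCD"   # 2 errors, one shared wrong option
print(similarity_counts(alice, bob, key))   # → (9, 2, 1)
```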
26
More on Collusion Indices
There are issues to consider when comparing:
ESA only looks at errors and ignores the rest of the data
Major confound with ability: two examinees scoring 99/100 will get flagged for collusion!
Therefore it is important to condition on ability
Some indices have no theoretical basis whatsoever
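A toy illustration of why conditioning on ability matters, assuming (unrealistically) that item matches are independent with a fixed per-pair match probability p: the same 90 matches out of 100 items is astronomically unlikely for a low-ability pair but entirely plausible for two near-perfect examinees. Real indices like Wollack's Omega and G2 build this baseline from IRT, not from a flat p.

```python
from math import comb

def match_tail_prob(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p): chance of m or more identical
    responses under an independence baseline with per-item match
    probability p (p rises with the pair's ability)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# 90 matching responses on a 100-item test:
low_ability  = match_tail_prob(100, 90, 0.50)   # essentially impossible by chance
high_ability = match_tail_prob(100, 90, 0.85)   # unremarkable for two strong examinees
```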
27
More about collusion

                     Probabilistic                                     Descriptive   Ad hoc
Error similarity     B&B                                               EIC, EEIC     HHHHJ
Response similarity  Wollack’s Omega; Wesolowsky Zjk; Frary et al. G2  RIC
28
More resources
ITC Guidelines on the Security of Tests, Examinations, and Other Assessments
TILSA Test Security Guidebook
Conference presentations/workshops (harder to find)
29
Software
Next step: Find software that meets your needs
Scrutiny!
S-check
R packages (CopyDetect)
SIFT
Integrity
Caveon
IRT software like IRTPRO or Xcalibre
30
Epilogue: Then what?
Define a pathway for investigation and actions
Joy Matthews-Lopez and Paul Jones
31
Examples (if time)
500 certification candidates
Gr4 Math (locations): check on teachers and schools; there is incentive to help students
Summary – Q&A