Comparing Decision Rules
Decision accuracy of different decision rules combining multiple measures in a higher educational context
Iris Yocarini, Samantha Bouwmeester, Guus Smeets, and Lidia Arends
CEMO conference on standard setting, 23 September 2015
The decision to be made
• Start of first bachelor year → end of first bachelor year → BSA (binding study advice) decision
  • Pass: student continues to the second bachelor year
  • Fail: student leaves the bachelor program
Decision accuracy
• Given the high stakes, an accurate decision is required
• Compare the decision based on the true score with the decision based on the observed score
Observed score = True score + Error
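In the classical test theory (CTT) model on this slide, an observed score is a true score plus random error. Reliability is the share of observed-score variance that is due to true scores, ρ = σ²_T / σ²_X. The minimal Python sketch below shows how the error variance follows from a chosen reliability. It assumes normally distributed true scores and errors. The true-score mean of 6.5 and SD of 1.5 on the 1.0–10.0 grading scale are hypothetical values, not taken from the study.

```python
import numpy as np


def error_sd(true_sd: float, reliability: float) -> float:
    """Error SD implied by CTT, since reliability = var_T / (var_T + var_E)."""
    return true_sd * np.sqrt((1.0 - reliability) / reliability)


def simulate_observed(n: int, reliability: float, seed: int = 1):
    """Draw true scores T and observed scores X = T + E on a hypothetical grade metric."""
    rng = np.random.default_rng(seed)
    true_mean, true_sd = 6.5, 1.5  # hypothetical values, not from the study
    t = rng.normal(true_mean, true_sd, size=n)
    e = rng.normal(0.0, error_sd(true_sd, reliability), size=n)  # independent of T
    return t, t + e


t, x = simulate_observed(100_000, reliability=0.6)
# Empirical reliability: share of observed variance that is true-score variance.
print(np.var(t) / np.var(x))  # close to 0.6
```

The lower the reliability, the larger the error SD. For example, `error_sd(1.5, 0.8)` is 0.75, while `error_sd(1.5, 0.4)` is about 1.84. Larger errors make pass/fail decisions on observed scores diverge more often from decisions on true scores.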
Decision accuracy
Decision based on observed score (rows) vs. decision based on true score (columns):
                  True: Fail                             True: Pass
Observed: Fail    A  Correct classification              B  Misclassification (false negative)
Observed: Pass    C  Misclassification (false positive)  D  Correct classification
• Total proportion of misclassifications: (B + C) / N, where N is the total sample size
• False negative rate: of all truly competent students, the proportion identified as fails: B / (B + D)
• False positive rate: of all truly failing students, the proportion identified as passes: C / (A + C)
• Positive predictive value: of all students who pass, the proportion correctly classified: D / (C + D)
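These indices can be computed directly from the cell counts A–D. The minimal Python sketch below does this. The class name, method names and example counts are hypothetical illustrations. Sensitivity and specificity are included because the results tables report them. They use the standard definitions, 1 − false negative rate and 1 − false positive rate.

```python
from dataclasses import dataclass


@dataclass
class ClassificationTable:
    """Counts in the 2x2 table: rows = observed decision, columns = true decision."""
    a: int  # observed fail, true fail (correct classification)
    b: int  # observed fail, true pass (false negative)
    c: int  # observed pass, true fail (false positive)
    d: int  # observed pass, true pass (correct classification)

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def misclassification_rate(self) -> float:
        return (self.b + self.c) / self.n

    def false_negative_rate(self) -> float:
        # Of all truly competent students (B + D), the share identified as fails.
        return self.b / (self.b + self.d)

    def false_positive_rate(self) -> float:
        # Of all truly failing students (A + C), the share identified as passes.
        return self.c / (self.a + self.c)

    def positive_predictive_value(self) -> float:
        # Of all students who pass (C + D), the share who truly pass.
        return self.d / (self.c + self.d)

    def sensitivity(self) -> float:
        # Standard definition: 1 - false negative rate.
        return self.d / (self.b + self.d)

    def specificity(self) -> float:
        # Standard definition: 1 - false positive rate.
        return self.a / (self.a + self.c)


table = ClassificationTable(a=150, b=30, c=20, d=800)  # hypothetical counts
print(table.misclassification_rate())     # 0.05
print(table.false_negative_rate())        # 30 / 830, about 0.036
print(table.positive_predictive_value())  # 800 / 820, about 0.976
```

Each rate raises `ZeroDivisionError` when its denominator is empty, for example a sample with no truly competent students. A real analysis would need to decide how to report that case.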
Testing system
• Compensatory testing system at Erasmus University Rotterdam
  • vs. the standard conjunctive testing system in Dutch higher education
• Debate
Reasons behind implementation
• Educational views
• Psychometric argument
  • Classical Test Theory (CTT): an average grade is more reliable than individual test scores
  • Assumption of parallel tests
    • Equal true ability levels
    • Similar test reliabilities
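The CTT argument can be made concrete with the standard Spearman–Brown formula. It gives the reliability of the average (or sum) of k parallel tests, each with reliability ρ: ρ_k = kρ / (1 + (k − 1)ρ). The formula holds only under the parallel-test assumption listed above. The short Python sketch below implements it. The example values (eight courses, each with reliability .6) are hypothetical.

```python
def spearman_brown(reliability: float, k: int) -> float:
    """Reliability of the mean of k parallel tests, each with the given reliability."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k * reliability / (1 + (k - 1) * reliability)


# Eight parallel courses, each with reliability .6 (hypothetical values)
print(round(spearman_brown(0.6, 8), 3))  # 0.923
```

If the courses measure different true abilities, or differ strongly in reliability, the tests are not parallel. The formula then overstates how much averaging helps.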
Factors influencing decision accuracy
• Reliability
• Decision accuracy
(Figure: the CTT model, observed score = true score + error, linked to the true-score vs. observed-score classification table)
Decision rules in practice
• Educational setting: combinatory decision rules
  • Compensatory aspect: required GPA
  • Conjunctive aspect: required minimum grade
• Clusters
• First-year Psychology at Erasmus University
  • Grading scale: 1.0–10.0
  • GPA: 6.0
  • Minimum grade: 4.0
  • Two clusters of 8 courses each
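A combinatory rule joins two checks: a compensatory check (the mean grade reaches the required GPA) and a conjunctive check (every grade reaches the minimum grade). The minimal Python sketch below uses the Erasmus values above as defaults. The function names and example grades are hypothetical. So is the assumption that the rule must hold separately within each cluster; the slide does not state this.

```python
from statistics import mean
from typing import Sequence


def passes_rule(grades: Sequence[float], required_gpa: float = 6.0,
                minimum_grade: float = 4.0) -> bool:
    """Combinatory rule: compensatory GPA check AND conjunctive minimum-grade check."""
    compensatory_ok = mean(grades) >= required_gpa
    conjunctive_ok = min(grades) >= minimum_grade
    return compensatory_ok and conjunctive_ok


def passes_year(clusters: Sequence[Sequence[float]], **rule) -> bool:
    """Hypothetical aggregation: the rule must hold within every cluster."""
    return all(passes_rule(cluster, **rule) for cluster in clusters)


cluster_1 = [7.0, 6.5, 4.5, 8.0, 6.0, 5.5, 7.5, 6.0]  # mean 6.375, lowest 4.5
cluster_2 = [6.0, 3.5, 9.0, 8.0, 7.0, 6.5, 7.0, 8.0]  # mean 6.875, lowest 3.5
print(passes_rule(cluster_1))               # True
print(passes_rule(cluster_2))               # False: 3.5 is below the minimum of 4.0
print(passes_year([cluster_1, cluster_2]))  # False
```

Setting `minimum_grade=1.0` (the lowest possible grade) turns the rule into a purely compensatory one. Under that setting `cluster_2` would pass.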
Our study
• Aim of study
  • Comparing the decision accuracy of different decision rules that combine multiple tests
  • Evaluating the psychometric argument for implementing a compensatory testing system
    • CTT: the average grade is more reliable than individual test scores
• Context: first-year Psychology students at Erasmus University Rotterdam
Decision rules
• Varying:
  • Conjunctive aspect: minimum required grade
  • Compensatory aspect: required GPA
• Also included:
  • Fully conjunctive rule
  • Fully compensatory rule
Decision rule        Minimum grade*     GPA*
Fully conjunctive    5.5                5.5
Fully compensatory   1.0                5.5 / 6.0 / 6.5
Complex rules        3.0 / 4.0 / 5.0    5.5 / 6.0 / 6.5
*Grading scale from 1.0 to 10.0
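The rule table expands into the 13 numbered rules used in the results tables. Rule 1 is fully conjunctive. Rules 2–13 cross the three GPA levels with minimum grades 1.0, 3.0, 4.0 and 5.0, where 1.0 makes a rule fully compensatory. The small Python sketch below generates this grid in the same order. The function name and tuple layout are our own choices.

```python
from itertools import product

GPA_LEVELS = (5.5, 6.0, 6.5)
MINIMUM_GRADES = (1.0, 3.0, 4.0, 5.0)  # 1.0 = no effective minimum (fully compensatory)


def decision_rules() -> list[tuple[int, float, float]]:
    """Return (rule number, required GPA, minimum grade) in the order of the results tables."""
    rules = [(1, 5.5, 5.5)]  # fully conjunctive: every grade must reach 5.5
    # product() varies the minimum grade fastest, matching rows 2-5, 6-9 and 10-13.
    for number, (gpa, minimum) in enumerate(product(GPA_LEVELS, MINIMUM_GRADES), start=2):
        rules.append((number, gpa, minimum))
    return rules


for rule in decision_rules():
    print(rule)
```

Under rule 1 the GPA requirement is redundant. If every grade is at least 5.5, the mean grade is at least 5.5 as well.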
Simulation
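• Simulation study
• Manipulated factors: average test correlation (.1 / .3 / .5 / .7), average test reliability (.4 / .6 / .8), number of tests (8 / 12), number of retakes (0 / 2), and decision rule
• In each condition, decisions based on true scores are compared with decisions based on observed scores (correct classifications, false negatives, false positives)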
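The simulation logic can be sketched in four steps:
1. Draw correlated true course scores.
2. Add measurement error according to a chosen reliability.
3. Apply the same decision rule to true and observed scores.
4. Tabulate the cells A–D.

The minimal Python sketch below follows these steps under our own assumptions. True scores are multivariate normal with a hypothetical mean of 6.5 and SD of 1.5. The test correlation is applied to true scores; the slides do not say whether it refers to true or observed scores. Grades are clipped to the 1.0–10.0 scale, and retakes are not modelled. It is not the authors' simulation code and will not reproduce their numbers.

```python
import numpy as np


def simulate_rule(n_students: int, n_tests: int, correlation: float,
                  reliability: float, required_gpa: float, minimum_grade: float,
                  seed: int = 2015) -> dict:
    """Compare true-score and observed-score decisions for one combinatory rule."""
    rng = np.random.default_rng(seed)
    true_mean, true_sd = 6.5, 1.5  # hypothetical grade-scale parameters

    # Compound-symmetry covariance: equal true-score variances, equal correlation
    # between every pair of tests (assumption: correlation refers to true scores).
    cov = true_sd**2 * ((1 - correlation) * np.eye(n_tests)
                        + correlation * np.ones((n_tests, n_tests)))
    true_scores = rng.multivariate_normal(
        np.full(n_tests, true_mean), cov, size=n_students)

    # CTT: error SD chosen so that var_T / (var_T + var_E) equals the reliability.
    err_sd = true_sd * np.sqrt((1 - reliability) / reliability)
    observed_scores = true_scores + rng.normal(0.0, err_sd, size=true_scores.shape)

    # Keep both score sets on the 1.0-10.0 grading scale.
    true_scores = np.clip(true_scores, 1.0, 10.0)
    observed_scores = np.clip(observed_scores, 1.0, 10.0)

    def decide(scores: np.ndarray) -> np.ndarray:
        # Pass = mean grade meets the GPA AND every grade meets the minimum.
        return ((scores.mean(axis=1) >= required_gpa)
                & (scores.min(axis=1) >= minimum_grade))

    true_pass = decide(true_scores)
    obs_pass = decide(observed_scores)
    a = int(np.sum(~obs_pass & ~true_pass))  # correct fail
    b = int(np.sum(~obs_pass & true_pass))   # false negative
    c = int(np.sum(obs_pass & ~true_pass))   # false positive
    d = int(np.sum(obs_pass & true_pass))    # correct pass
    return {"A": a, "B": b, "C": c, "D": d,
            "misclassification": (b + c) / n_students}


# Fully conjunctive rule (rule 1), 8 tests, average correlation .5
low = simulate_rule(20_000, 8, 0.5, 0.4, required_gpa=5.5, minimum_grade=5.5)
high = simulate_rule(20_000, 8, 0.5, 0.8, required_gpa=5.5, minimum_grade=5.5)
print(low["misclassification"], high["misclassification"])
```

Looping this function over the factor levels and the 13 decision rules would give a grid of conditions with the same layout as the results tables, though not the same values.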
Results – minimum grade & GPA
• Minimum grade: 1.0 / 3.0 / 4.0 / 5.0
• GPA: 5.5 / 6.0 / 6.5
(Figures: results by minimum grade and GPA)
Results – average test reliability
(Figures: proportion of misclassifications; positive predictive value)
Results – number of retakes
(Figures: proportion of misclassifications; false negative rate; false positive rate; positive predictive value)
Comparison conjunctive & compensatory
• In the compensatory decision rule:
  • Fewer classification errors
  • Fewer false negatives, more false positives
  • Higher positive predictive value
Conclusion
• Increasing the degree of compensation results in fewer classification errors
• With a compensatory decision rule: relatively fewer false negatives and more false positives
• Depends on the specific setting and tests used
  • Most important: test reliability and number of retakes
• Psychometric argument
• Standard setting
Take home message
• Decision accuracy is an important consideration
• Focus on both the specific decision rule and the tests used
Results – proportion of misclassifications
GPA = required GPA; Min = minimum grade; Mean = mean over all conditions; factor columns give the mean at each factor level
Rule  GPA  Min   Mean | Avg. test correlation | Avg. test reliability | No. of tests | No. of retakes
                      | .1   .3   .5   .7     | .4   .6   .8          | 8    12      | 0    2
1     5.5  5.5   .18  | .16  .18  .19  .19    | .24  .18  .12         | .19  .17     | .20  .16
2     5.5  1.0   .05  | .03  .04  .05  .06    | .06  .04  .03         | .05  .04     | .06  .03
3     5.5  3.0   .08  | .09  .09  .08  .07    | .14  .07  .04         | .08  .09     | .13  .04
4     5.5  4.0   .17  | .20  .18  .16  .13    | .27  .16  .08         | .15  .19     | .26  .08
5     5.5  5.0   .22  | .23  .23  .22  .21    | .31  .22  .14         | .22  .23     | .28  .16
6     6.0  1.0   .09  | .09  .10  .09  .09    | .13  .09  .06         | .10  .09     | .10  .08
7     6.0  3.0   .11  | .13  .11  .10  .09    | .16  .10  .06         | .11  .11     | .14  .08
8     6.0  4.0   .16  | .20  .17  .14  .12    | .24  .15  .08         | .15  .17     | .22  .10
9     6.0  5.0   .21  | .23  .22  .21  .19    | .29  .21  .13         | .20  .22     | .27  .15
10    6.5  1.0   .13  | .17  .13  .12  .10    | .18  .13  .08         | .14  .12     | .13  .13
11    6.5  3.0   .13  | .17  .14  .12  .10    | .19  .13  .08         | .14  .13     | .13  .13
12    6.5  4.0   .15  | .19  .15  .13  .11    | .21  .14  .09         | .15  .14     | .16  .13
13    6.5  5.0   .17  | .20  .18  .16  .14    | .24  .17  .11         | .17  .17     | .20  .14
Results – sensitivity
Rule  GPA  Min   Mean | Avg. test correlation | Avg. test reliability | No. of tests | No. of retakes
                      | .1   .3   .5   .7     | .4   .6   .8          | 8    12      | 0    2
1     5.5  5.5   .60  | .52  .59  .64  .67    | .45  .60  .76         | .65  .56     | .44  .77
2     5.5  1.0   .97  | .98  .97  .97  .97    | .96  .98  .99         | .97  .98     | .96  .99
3     5.5  3.0   .93  | .91  .92  .93  .95    | .87  .94  .98         | .94  .92     | .88  .98
4     5.5  4.0   .83  | .79  .81  .84  .87    | .71  .84  .93         | .85  .80     | .71  .94
5     5.5  5.0   .67  | .61  .66  .69  .72    | .52  .68  .82         | .71  .63     | .51  .83
6     6.0  1.0   .95  | .94  .95  .95  .95    | .93  .95  .97         | .94  .95     | .92  .98
7     6.0  3.0   .92  | .90  .91  .93  .94    | .87  .93  .96         | .92  .92     | .87  .97
8     6.0  4.0   .85  | .80  .83  .87  .90    | .74  .86  .93         | .87  .83     | .75  .95
9     6.0  5.0   .68  | .61  .67  .71  .75    | .53  .69  .83         | .73  .64     | .52  .84
10    6.5  1.0   .92  | .89  .91  .93  .94    | .89  .92  .94         | .91  .92     | .88  .96
11    6.5  3.0   .90  | .86  .90  .92  .93    | .86  .91  .94         | .90  .90     | .85  .95
12    6.5  4.0   .86  | .80  .85  .88  .91    | .78  .87  .93         | .87  .85     | .78  .94
13    6.5  5.0   .73  | .64  .71  .76  .81    | .59  .74  .86         | .77  .69     | .58  .88
Results – specificity
Rule  GPA  Min   Mean | Avg. test correlation | Avg. test reliability | No. of tests | No. of retakes
                      | .1   .3   .5   .7     | .4   .6   .8          | 8    12      | 0    2
1     5.5  5.5   .93  | .92  .92  .93  .94    | .93  .92  .93         | .91  .94     | .96  .89
2     5.5  1.0   .67  | .57  .66  .71  .74    | .58  .67  .77         | .66  .69     | .76  .58
3     5.5  3.0   .72  | .66  .72  .75  .77    | .69  .71  .77         | .70  .75     | .82  .63
4     5.5  4.0   .80  | .75  .79  .83  .85    | .82  .79  .81         | .78  .83     | .87  .74
5     5.5  5.0   .89  | .86  .88  .90  .92    | .90  .88  .89         | .87  .91     | .93  .85
6     6.0  1.0   .73  | .65  .73  .77  .79    | .64  .73  .82         | .72  .75     | .81  .65
7     6.0  3.0   .75  | .68  .75  .78  .80    | .69  .74  .83         | .73  .77     | .84  .66
8     6.0  4.0   .80  | .75  .80  .82  .83    | .78  .78  .84         | .78  .82     | .88  .72
9     6.0  5.0   .89  | .86  .88  .90  .91    | .90  .88  .89         | .87  .91     | .93  .84
10    6.5  1.0   .80  | .74  .80  .83  .84    | .72  .80  .88         | .79  .82     | .87  .74
11    6.5  3.0   .81  | .75  .80  .83  .84    | .73  .81  .88         | .79  .82     | .87  .74
12    6.5  4.0   .83  | .78  .82  .85  .86    | .78  .82  .89         | .81  .85     | .90  .76
13    6.5  5.0   .88  | .87  .88  .89  .90    | .88  .87  .90         | .86  .91     | .94  .83
Results – positive predictive value
Rule  GPA  Min   Mean | Avg. test correlation | Avg. test reliability | No. of tests | No. of retakes
                      | .1   .3   .5   .7     | .4   .6   .8          | 8    12      | 0    2
1     5.5  5.5   .82  | .68  .80  .88  .93    | .79  .82  .86         | .83  .81     | .79  .86
2     5.5  1.0   .98  | .99  .98  .97  .96    | .97  .98  .98         | .97  .98     | .98  .97
3     5.5  3.0   .98  | .99  .98  .97  .97    | .97  .98  .98         | .98  .98     | .98  .97
4     5.5  4.0   .97  | .96  .97  .97  .98    | .97  .97  .97         | .97  .97     | .97  .97
5     5.5  5.0   .90  | .82  .88  .93  .96    | .88  .89  .91         | .91  .89     | .87  .92
6     6.0  1.0   .93  | .95  .93  .93  .93    | .91  .93  .96         | .93  .94     | .94  .93
7     6.0  3.0   .94  | .95  .94  .93  .93    | .92  .94  .96         | .93  .94     | .95  .93
8     6.0  4.0   .94  | .94  .94  .94  .94    | .93  .93  .95         | .93  .94     | .94  .93
9     6.0  5.0   .89  | .81  .88  .92  .95    | .88  .89  .91         | .90  .89     | .87  .91
10    6.5  1.0   .85  | .82  .85  .87  .88    | .80  .85  .91         | .85  .86     | .87  .84
11    6.5  3.0   .86  | .82  .85  .87  .88    | .81  .85  .91         | .85  .86     | .87  .84
12    6.5  4.0   .86  | .82  .86  .87  .88    | .82  .86  .91         | .85  .87     | .88  .85
13    6.5  5.0   .85  | .77  .85  .89  .91    | .83  .84  .89         | .85  .86     | .85  .85
Results – average test reliability
(Figures: false negative rate; false positive rate)
Previous studies
• Douglas & Mislevy (2010)
• Van Rijn, Béguin, & Verstralen (2012)
• McBee, Peters, & Waterman (2014)