NATIONAL CENTER ON
RESPONSE TO INTERVENTION
EVALUATION CRITERIA
DISCOVERY EDUCATION ASSESSMENT
PROGRESS MONITORING SCREENING TOOL
Table of Contents
National Center for RTI Screening Tool
  Five Criteria for Screening Tool
DEA Progress Monitoring Screening Tool Data
  Generalizability
  Reliability
  Validity
  Classification Analysis
Appendices
National Center for RTI Screening Tool
Starting in 2009, Discovery Education Assessment (DEA) benchmark assessments have received
high marks for use as screening tools from the National Center on Response to Intervention.
Screening tools can be used to identify students who are at risk of not meeting grade level
proficiency standards. These at risk students can then be placed into Response to Intervention
(RTI) or similar programs.
The National Center defines a screening tool as follows:
Screening involves brief assessments that are valid, reliable, and evidence-based. They are
conducted with all students or targeted groups of students to identify students who are at risk of
academic failure and, therefore, likely to need additional or alternative forms of instruction to
supplement the conventional general education approach.
The National Center’s Technical Review Committee (TRC) on Screening “independently
established a set of criteria for evaluating the scientific rigor of screening tools. The TRC rated
each submitted tool against these criteria.” Five types of scientifically rigorous criteria were
evaluated: (1) Classification Accuracy; (2) Generalizability; (3) Reliability; (4) Validity; and (5)
Disaggregated Reliability, Validity, and Classification Data for Diverse Populations. DEA
received high marks on all five criteria. (The NCRTI Screening Tools chart can be found at
http://www.rti4success.org/chart/screeningTools/screeningtoolschart.html#)
These five criteria were defined as follows:
Generalizability
Generalizability refers to the extent to which results generated from one population can be
applied to another population. A tool is considered more generalizable if studies have been
conducted on larger, more representative samples. A rating of Moderate High means the
screening tool has a “Large representative national sample or multiple regional/state samples
with no cross-validation or one or more regional/state samples with cross-validation.”
Reliability
Reliability refers to the consistency with which a tool classifies students from one administration
to the next. A tool is considered reliable if it produces the same results when administering the
test under different conditions, at different times, or using different forms of the test. A rating of
Convincing Evidence means that “split-half, coefficient alpha, test-retest, or inter-rater reliability
(is) greater than .80.”
Validity
Validity refers to the extent to which a tool accurately measures the underlying construct that it is
intended to measure. Validity is measured in three ways: content validity, construct validities
above .70, and predictive validities above .70.
Classification Accuracy
Classification accuracy refers to the extent to which a screening tool is able to accurately classify
students into "at risk for reading or mathematics disability" and "not at risk for reading or
mathematics disability" categories. Classification accuracy was measured by a statistic known as
the Area Under the Curve (AUC). AUC values have to be .85 or greater to receive a rating of
Convincing Evidence. (Area Under the Curve (AUC) Statistic: an overall indication of the
diagnostic accuracy of a Receiver Operating Characteristic (ROC) curve. ROC curves are a
generalization of the set of potential combinations of sensitivity and specificity possible for
predictors. AUC values closer to 1 indicate the screening measure reliably distinguishes among
students with satisfactory and unsatisfactory reading performance, whereas values at .50 indicate
the predictor is no better than chance.)
Disaggregated Reliability, Validity, and Classification Data for Diverse Populations
Data are disaggregated when they are calculated and reported separately for specific sub-groups.
Tools that report disaggregated reliability, validity, and classification data receive the highest
scores in this category.
The following sections describe the specific studies DEA used to achieve high marks from the
National Center.
DEA Progress Monitoring Screening Tool Data
DEA Screeners encompass the subjects of Reading and Mathematics, spanning Grades 3 to 10.
Evaluation criteria will be presented in the following order: Generalizability, Reliability,
Validity, and then Classification Accuracy. Information on disaggregation by ethnic group will
be outlined in the sections on validity and classification accuracy.
Generalizability (Moderate High)
Once again, generalizability refers to the extent to which results generated from one population
can be applied to another population. A tool is considered more generalizable if studies have
been conducted on larger, more representative samples. DEA presented data drawn from two
representative sources: (1) the state of Kentucky, with more than 6,000 students from
five representative school districts; and (2) the District of Columbia Public School System, one
of the nation’s largest urban districts, with more than 20,000 students in Grades 3 to 10.
Standardized test scores were obtained for each student from the Kentucky Core Content
Test (KCCT) from Spring 2008 and the District of Columbia Comprehensive Assessment
System (DCCAS), also from Spring 2008. Each student also completed three DEA
benchmarks during the 2007-2008 school year: Fall, Winter, and Spring. Additional DCCAS
data were obtained from Spring 2009.
Reliability (Convincing Evidence)
Reliability refers to the consistency with which a tool classifies students from one administration
to the next. A tool is considered reliable if it produces the same results when administering the
test under different conditions, at different times, or using different forms of the test. Cronbach’s
alpha was used to measure a screener’s reliability.
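As a rough illustration of the statistic, Cronbach's alpha can be computed from a students-by-items score matrix as k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below assumes simple right/wrong (1/0) item scores; the function name `cronbach_alpha` and the small score matrix are hypothetical and are not DEA data or DEA code.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student item-score lists."""
    k = len(scores[0])  # number of items on the test
    # Variance of each item (column) across students.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each student's total score.
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 item scores: 5 students x 4 items (illustration only).
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

A real screener has far more items and students than this toy matrix; the point is only to show which quantities enter the coefficient.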
The following tables present the range and median reliability coefficients for DEA Screeners for
the three test periods (Fall, Winter, Spring), separately for Kentucky and DC, and also
separately for Reading and Mathematics. All median reliabilities exceed .80,
the criterion established by the National Center to receive a Convincing Evidence rating.
Table 1: Reliabilities for DEA Reading Screeners
Sample                 Test Period   Range        Median
Kentucky               Fall          .73 to .85   .84
Kentucky               Winter        .82 to .86   .84
Kentucky               Spring        .84 to .86   .85
District of Columbia   Fall          .82 to .87   .86
District of Columbia   Winter        .77 to .89   .82
District of Columbia   Spring        .84 to .87   .86
Table 2: Reliabilities for DEA Mathematics Screeners
Sample                 Test Period   Range        Median
Kentucky               Fall          .76 to .87   .83
Kentucky               Winter        .79 to .84   .83
Kentucky               Spring        .81 to .86   .84
District of Columbia   Fall          .79 to .87   .82
District of Columbia   Winter        .75 to .85   .81
District of Columbia   Spring        .81 to .90   .85
Validity (Partially Convincing Evidence)
Validity refers to the extent to which a tool accurately measures the underlying construct that it is
intended to measure. Content validity represents how well a tool measures the skills and
knowledge of a particular domain. Criterion validity measures how well scores on a tool
correlate with scores on an external measure, such as state tests.
Content Validity
Content validity evidence shows that test content is appropriate for the particular constructs that
are being measured. Content validity is measured by agreement among subject matter experts
about test material and alignment to state standards, by highly reliable training procedures for
item writers, by thorough reviews of test material for accuracy and lack of bias, and by
examination of depth of knowledge of test questions.
To ensure content validity of all tests, Discovery Education Assessment carefully aligns the
content of its assessments to a given state’s content standards and the content sampled by the
respective high-stakes test. Discovery Education Assessment employs one of the leading
alignment research methodologies, the Webb Alignment Tool (WAT), which has consistently
supported the alignment of our tests to state-specific content standards both in breadth (i.e.,
number of standards and objectives sampled) and depth (i.e., cognitive complexity of standards
and objectives). All Discovery Education Assessment tests are thus state-specific and feature
reporting categories that match those of a given state’s large-scale assessment used for
accountability purposes.
DEA’s screening tools in Reading and Mathematics are aligned to state specific standards. The
Kentucky screening tools match standards on the Kentucky Core Content Test (KCCT). The
District of Columbia screening tools match standards on the District of Columbia
Comprehensive Assessment System (DCCAS).
Criterion validity: predictive and concurrent
Criterion validity evidence demonstrates that test scores predict scores on an important criterion
variable, such as a state’s standardized large-scale assessment. Predictive validity occurs when
the screening tool is administered at least three months before a state test. DEA screening tools
given during the Fall and Winter periods are predictive of the state test given in the spring.
Concurrent validity occurs when the screening tool is administered less than three months before
a state test. The DEA screening tool administered in the spring represents concurrent validity.
The following tables present predictive and concurrent correlations between DEA Reading and
Mathematics Screeners and either Kentucky or District of Columbia state tests from Spring 2008.
Additional data are also provided for the District of Columbia for Spring 2009. The median
correlations for Reading range from .61 to .76, and the median correlations for Mathematics
range from .61 to .79. Correlations are presented for all students first and then for the two
disaggregated groups of African-American and Hispanic students.
The National Center established a criterion of .70 for predictive validities. Many DEA validities
exceeded this value and many others were in the high .60 range. DEA received a rating of
Partially Convincing Evidence for validity.
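To make the summary statistics concrete, the short sketch below recomputes the range and median for one row of the validity tables and compares the median with the .70 criterion. The coefficients are the Fall predictive correlations for District of Columbia Reading (full sample) listed by grade in appendix Table 23; the helper name `summarize` is hypothetical.

```python
from statistics import median

# Fall predictive correlations for DC Reading, Grades 3-8 and 10,
# copied from appendix Table 23 (full sample).
dc_reading_fall = [0.65, 0.65, 0.69, 0.67, 0.66, 0.67, 0.67]

CRITERION = 0.70  # National Center criterion for predictive validity


def summarize(coefficients):
    """Return (low, high, median) for a list of grade-level correlations."""
    return min(coefficients), max(coefficients), median(coefficients)


low, high, mid = summarize(dc_reading_fall)
print(f"range {low:.2f} to {high:.2f}, median {mid:.2f}")
print("median meets criterion:", mid >= CRITERION)
```

For this row the median is .67, which falls in the high .60 range described above rather than above the criterion.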
Table 3: Predictive and Concurrent Validity for Reading Screeners for All Students
Sample                 Test Period         Range        Median
Kentucky               Fall Predictive     .69 to .74   .71
Kentucky               Winter Predictive   .67 to .71   .68
Kentucky               Spring Concurrent   .64 to .73   .70
District of Columbia   Fall Predictive     .65 to .69   .67
District of Columbia   Winter Predictive   .65 to .71   .66
District of Columbia   Spring Concurrent   .65 to .70   .69
Table 4: Predictive and Concurrent Validity for Reading Screeners for African-American Students
Sample                 Test Period         Range        Median
Kentucky               Fall Predictive     .60 to .68   .67
Kentucky               Winter Predictive   .60 to .75   .67
Kentucky               Spring Concurrent   .48 to .70   .68
District of Columbia   Fall Predictive     .57 to .65   .61
District of Columbia   Winter Predictive   .61 to .70   .62
District of Columbia   Spring Concurrent   .63 to .69   .64
Table 5: Predictive and Concurrent Validity for Reading Screeners for Hispanic Students
Sample                 Test Period         Range        Median
District of Columbia   Fall Predictive     .59 to .70   .65
District of Columbia   Winter Predictive   .62 to .69   .65
District of Columbia   Spring Concurrent   .60 to .72   .63
Table 6: Predictive and Concurrent Validity for Reading Screeners for District of Columbia 2009
Sample                 Test Period         Range        Median
District of Columbia   Fall Predictive     .66 to .73   .70
District of Columbia   Winter Predictive   .68 to .78   .74
District of Columbia   Spring Concurrent   .69 to .78   .76
Table 7: Predictive and Concurrent Validity for Mathematics Screeners for All Students
Sample                 Test Period         Range        Median
Kentucky               Fall Predictive     .71 to .81   .76
Kentucky               Winter Predictive   .72 to .82   .76
Kentucky               Spring Concurrent   .73 to .78   .76
District of Columbia   Fall Predictive     .56 to .70   .67
District of Columbia   Winter Predictive   .67 to .75   .71
District of Columbia   Spring Concurrent   .65 to .77   .74
Table 8: Predictive and Concurrent Validity for Mathematics Screeners for African-American Students
Sample                 Test Period         Range        Median
Kentucky               Fall Predictive     .67 to .70   .69
Kentucky               Winter Predictive   .66 to .77   .72
Kentucky               Spring Concurrent   .63 to .76   .72
District of Columbia   Fall Predictive     .52 to .64   .61
District of Columbia   Winter Predictive   .63 to .71   .64
District of Columbia   Spring Concurrent   .58 to .74   .69
Table 9: Predictive and Concurrent Validity for Mathematics Screeners for Hispanic Students
Sample                 Test Period         Range        Median
District of Columbia   Fall Predictive     .47 to .70   .64
District of Columbia   Winter Predictive   .68 to .78   .68
District of Columbia   Spring Concurrent   .65 to .76   .72
Table 10: Predictive and Concurrent Validity for Mathematics Screeners for District of Columbia 2009
Sample                 Test Period         Range        Median
District of Columbia   Fall Predictive     .65 to .75   .71
District of Columbia   Winter Predictive   .72 to .80   .76
District of Columbia   Spring Concurrent   .78 to .81   .79
Classification Accuracy (Convincing Evidence)
A screening tool should classify students into one of two categories: At Risk or Not At Risk.
The accuracy of this classification can be assessed by comparing this prediction to a student’s
status on a standardized outcome measure. DEA benchmarks for Reading and Mathematics
classified students into the two categories of At Risk and Not At Risk based on the following
Kentucky and DC specific proficiency levels:
Kentucky
• Novice and Apprentice (At Risk)
• Proficient and Distinguished (Not At Risk)
District of Columbia
• Below Basic and Basic (At Risk)
• Proficient and Advanced (Not At Risk)
The actual proficiency level of students was obtained from results on the Spring 2008 KCCT or
the Spring 2008 DCCAS. Furthermore, additional results were obtained for the Spring 2009
DCCAS.
The accuracy and errors of predictions using a screening tool can be classified into one of four
outcomes.
                                         State Test
                         At Risk              Not At Risk          Total
Screener: At Risk        True Positive (a)    False Positive (b)   Total Predicted At Risk (a+b)
Screener: Not At Risk    False Negative (c)   True Negative (d)    Total Predicted Not At Risk (c+d)
Total                    Total True At Risk   Total True Not At    Total Students
                         (a+c)                Risk (b+d)           (a+b+c+d)
a) True Positive indicates the number of students predicted as At Risk on the screener
that are actually At Risk on the state test.
b) False Positive indicates the number of students predicted as At Risk on the screener
that are actually Not At Risk on the state test. So, these students have been falsely
identified as At Risk.
c) False Negative indicates the number of students predicted as Not At Risk on the
screener that are actually At Risk on the state test. So, these students have been
falsely identified as Not At Risk.
d) True Negative indicates the number of students predicted as Not At Risk on the
screener that are actually Not At Risk on the state test.
Two desirable characteristics of screeners are Sensitivity and Specificity. Sensitivity is the
percent of Total True At Risk students that are True Positives. Specificity is the percent of Total
True Not At Risk students that are True Negatives. In terms of the cells of the classification
table:
Sensitivity is True Positive (a) divided by Total True At Risk (a+c)
Specificity is True Negative (d) divided by Total True Not at Risk (b+d)
Let’s look at a specific example. The following table is for Kentucky Grade 3 Reading. The
Screener is the DEA Fall Reading Benchmark and the State Test is the KCCT for Spring 2008.
                                      State Test (KCCT 2008)
Screener (DEA Fall Benchmark)   At Risk   Not At Risk   Total
At Risk                         117       126           243
Not At Risk                      18       407           425
Total                           135       533           668
Sensitivity is True Positive (a) divided by Total True At Risk (a+c)
Sensitivity = 117/135 = .87 or 87%
Specificity is True Negative (d) divided by Total True Not at Risk (b+d)
Specificity = 407/533 = .76 or 76%
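The arithmetic in this example can be checked with a few lines of code. The sketch below takes the four cell counts from the Kentucky Grade 3 Reading table (a = 117, b = 126, c = 18, d = 407, so a + c = 135 and b + d = 533); the function names are hypothetical helpers chosen for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly At Risk students that the screener flags: a / (a + c)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly Not At Risk students that the screener clears: d / (b + d)."""
    return true_neg / (true_neg + false_pos)


# Cell counts from the Kentucky Grade 3 Reading example
# (DEA Fall Benchmark vs. Spring 2008 KCCT).
a, b, c, d = 117, 126, 18, 407

print(f"Sensitivity = {sensitivity(a, c):.2f}")  # 117 / 135
print(f"Specificity = {specificity(d, b):.2f}")  # 407 / 533
```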
A screening tool has to establish a cut score that separates At Risk from Not At Risk. Because
this cut score is used to predict performance on the state test, it has to be set in advance. Each
cut score is associated with particular levels of Sensitivity and Specificity; a different cut score
would yield different levels. Good cut scores strive to balance high levels of Sensitivity and
Specificity.
Table 11 shows the relationship between Sensitivity and Specificity for this same Kentucky
Grade 3 Reading test. The test had 36 questions. Each possible cut score (number correct)
has an associated level of Sensitivity and Specificity. If a cut score of 16 were used, the
Sensitivity would be .72 or 72% and the Specificity would be .85 or 85%. Thus, there would be
higher accuracy in classifying Not At Risk students than At Risk students.
Good screening tools strive to balance Sensitivity and Specificity at high levels. For this
particular test, a cut score of 18 was used to differentiate At Risk from Not At Risk. This cut
score has a Sensitivity index of .87 or 87% and a Specificity index of .76 or 76% (the same
values that were obtained from the analysis in the table above).
The relationship between Sensitivity and Specificity can be graphed as a Receiver Operating
Characteristic (ROC) curve. For this graph, values of Sensitivity are plotted on the y-axis
and values of 1 minus Specificity are plotted on the x-axis. Figure 1 shows this ROC curve for
the Kentucky Grade 3 Reading test.
Classification accuracy was measured by a statistic known as the Area Under the Curve (AUC).
AUC values had to be .85 or greater to receive a rating of Convincing Evidence. (Area Under
the Curve (AUC) Statistic: an overall indication of the diagnostic accuracy of a Receiver
Operating Characteristic (ROC) curve. ROC curves are a generalization of the set of potential
combinations of sensitivity and specificity possible for predictors. AUC values closer to 1
indicate the screening measure reliably distinguishes among students with satisfactory and
unsatisfactory reading performance, whereas values at .50 indicate the predictor is no better than
chance.)
Tables 12 to 14 present Sensitivity and AUC values for Reading Screeners for all students and
for the African-American and Hispanic student subgroups. Tables 15 to 17 present Sensitivity
and AUC values for Mathematics Screeners for all students and for the African-American and
Hispanic student subgroups. The median AUC values are all above .80, with most at .85
or above.
DEA received a rating of Convincing Evidence for classification accuracy.
[Figure 1: ROC Curve for Kentucky Grade 3 Reading Test]
Table 11: Levels of Sensitivity and Specificity by Cut Score
Cut Score   Sensitivity   Specificity   1 - Specificity
2 0.00 1.00 0.00
4 0.00 1.00 0.00
5 0.01 1.00 0.00
6 0.03 0.99 0.01
7 0.08 0.98 0.02
8 0.10 0.98 0.02
9 0.14 0.98 0.02
10 0.24 0.97 0.03
11 0.35 0.95 0.05
12 0.44 0.94 0.06
13 0.50 0.92 0.08
14 0.56 0.90 0.10
15 0.64 0.88 0.12
16 0.72 0.85 0.15
17 0.79 0.81 0.19
18 0.87 0.76 0.24
19 0.93 0.73 0.27
20 0.97 0.68 0.32
21 0.97 0.64 0.36
22 0.98 0.57 0.43
23 0.98 0.52 0.48
24 0.99 0.46 0.54
25 0.99 0.41 0.59
26 0.99 0.33 0.67
27 0.99 0.27 0.73
28 0.99 0.21 0.79
29 0.99 0.17 0.83
30 0.99 0.11 0.89
31 1.00 0.07 0.93
32 1.00 0.04 0.96
33 1.00 0.02 0.98
34 1.00 0.01 0.99
35 1.00 0.00 1.00
36 1.00 0.00 1.00
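The area under the ROC curve can be approximated directly from the rows of Table 11. The sketch below plots each cut score as a point (1 - Specificity, Sensitivity) and sums trapezoids between neighboring points. The trapezoidal rule is an assumption made for illustration, since the report does not state how its AUC values were computed, and the function names are hypothetical.

```python
# (cut score, sensitivity, specificity) rows copied from Table 11.
TABLE_11 = [
    (2, 0.00, 1.00), (4, 0.00, 1.00), (5, 0.01, 1.00), (6, 0.03, 0.99),
    (7, 0.08, 0.98), (8, 0.10, 0.98), (9, 0.14, 0.98), (10, 0.24, 0.97),
    (11, 0.35, 0.95), (12, 0.44, 0.94), (13, 0.50, 0.92), (14, 0.56, 0.90),
    (15, 0.64, 0.88), (16, 0.72, 0.85), (17, 0.79, 0.81), (18, 0.87, 0.76),
    (19, 0.93, 0.73), (20, 0.97, 0.68), (21, 0.97, 0.64), (22, 0.98, 0.57),
    (23, 0.98, 0.52), (24, 0.99, 0.46), (25, 0.99, 0.41), (26, 0.99, 0.33),
    (27, 0.99, 0.27), (28, 0.99, 0.21), (29, 0.99, 0.17), (30, 0.99, 0.11),
    (31, 1.00, 0.07), (32, 1.00, 0.04), (33, 1.00, 0.02), (34, 1.00, 0.01),
    (35, 1.00, 0.00), (36, 1.00, 0.00),
]


def roc_points(rows):
    """Return unique (1 - specificity, sensitivity) points sorted along the x-axis."""
    points = {(round(1 - spec, 2), sens) for _, sens, spec in rows}
    # Anchor the curve at the two corners of the ROC plot.
    points.update({(0.0, 0.0), (1.0, 1.0)})
    return sorted(points)


def trapezoid_auc(points):
    """Approximate the area under the curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


auc = trapezoid_auc(roc_points(TABLE_11))
print(f"Approximate AUC = {auc:.2f}")
```

With these rounded table values the approximation comes out close to the .88 reported for Kentucky Grade 3 Fall Reading in appendix Table 33; small differences from a published AUC are expected because the inputs are rounded to two decimals.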
Table 12: Sensitivity and AUC for Reading Screeners for All Students
                                           Sensitivity           Area Under the Curve (AUC)
Sample                 Test Period         Range        Median   Range        Median
Kentucky               Fall Predictive     .80 to .89   .84      .85 to .90   .88
Kentucky               Winter Predictive   .78 to .88   .84      .78 to .88   .86
District of Columbia   Fall Predictive     .79 to .92   .90      .84 to .86   .85
District of Columbia   Winter Predictive   .75 to .92   .88      .84 to .87   .86
Table 13: Sensitivity and AUC for Reading Screeners for African-American Students
                                           Sensitivity           Area Under the Curve (AUC)
Sample                 Test Period         Range        Median   Range        Median
Kentucky               Fall Predictive     .79 to .93   .89      .78 to .86   .83
Kentucky               Winter Predictive   .85 to .91   .89      .81 to .91   .84
District of Columbia   Fall Predictive     .79 to .93   .90      .81 to .83   .83
District of Columbia   Winter Predictive   .76 to .92   .88      .82 to .86   .84
Table 14: Sensitivity and AUC for Reading Screeners for Hispanic Students
                                           Sensitivity           Area Under the Curve (AUC)
Sample                 Test Period         Range        Median   Range        Median
District of Columbia   Fall Predictive     .79 to .93   .87      .78 to .90   .84
District of Columbia   Winter Predictive   .73 to .90   .87      .81 to .91   .83
Table 15: Sensitivity and AUC for Mathematics Screeners for All Students
                                           Sensitivity            Area Under the Curve (AUC)
Sample                 Test Period         Range         Median   Range         Median
Kentucky               Fall Predictive     .81 to .90    .85      .86 to .91    .90
Kentucky               Winter Predictive   .83 to .94    .88      .87 to .93    .89
District of Columbia   Fall Predictive     .84 to .94    .91      .79 to .87    .85
District of Columbia   Winter Predictive   .89 to .92    .91      .86 to .90    .87
Table 16: Sensitivity and AUC for Mathematics Screeners for African-American Students
                                           Sensitivity            Area Under the Curve (AUC)
Sample                 Test Period         Range         Median   Range         Median
Kentucky               Fall Predictive     .81 to .93    .89      .81 to .90    .83
Kentucky               Winter Predictive   .78 to 1.00   .89      .83 to 1.00   .86
District of Columbia   Fall Predictive     .84 to .93    .91      .81 to .84    .82
District of Columbia   Winter Predictive   .89 to .93    .91      .84 to .88    .85
Table 17: Sensitivity and AUC for Mathematics Screeners for Hispanic Students
                                           Sensitivity            Area Under the Curve (AUC)
Sample                 Test Period         Range         Median   Range         Median
District of Columbia   Fall Predictive     .83 to .92    .86      .75 to .88    .85
District of Columbia   Winter Predictive   .86 to .93    .89      .82 to .91    .89
Appendices
Reliability Tables
Table 17: Cronbach’s Alpha for Kentucky Reading Tests
Table 18: Cronbach’s Alpha for Kentucky Mathematics Tests
Table 19: Cronbach’s Alpha for District of Columbia Reading Tests
Table 20: Cronbach’s Alpha for District of Columbia Mathematics Tests
Reading Validity Tables
Table 21: Predictive Validity for Kentucky Reading Tests
Table 22: Concurrent Validity for Kentucky Reading Tests
Table 23: Predictive Validity for District of Columbia 2008 Reading Tests
Table 24: Concurrent Validity for District of Columbia 2008 Reading Tests
Table 25: Predictive Validity for District of Columbia 2009 Reading Tests
Table 26: Concurrent Validity for District of Columbia 2009 Reading Tests
Math Validity Tables
Table 27: Predictive Validity for Kentucky Mathematics Tests
Table 28: Concurrent Validity for Kentucky Mathematics Tests
Table 29: Predictive Validity for District of Columbia 2008 Mathematics Tests
Table 30: Concurrent Validity for District of Columbia 2008 Mathematics Tests
Table 31: Predictive Validity for District of Columbia 2009 Mathematics Tests
Table 32: Concurrent Validity for District of Columbia 2009 Mathematics Tests
Reading Classification Tables
Table 33: Classification Tables for Kentucky Reading Tests
Table 34: Classification Tables for District of Columbia Reading Tests
Math Classification Tables
Table 35: Classification Tables for Kentucky Mathematics Tests
Table 36: Classification Tables for District of Columbia Mathematics Tests
Reliability
Table 17: Kentucky
Cronbach's Alpha Reliability Coefficients for Reading, 2007-08
Fall Winter Spring
Grade N Coefficient N Coefficient N Coefficient
3 10,405 0.85 9,477 0.84 6,878 0.85
4 10,895 0.73 9,222 0.85 6,593 0.86
5 10,695 0.84 9,097 0.82 6,416 0.84
6 11,246 0.82 9,500 0.84 7,727 0.86
7 10,914 0.84 9,977 0.86 8,560 0.86
8 10,712 0.85 9,543 0.84 8,723 0.84
10 5,040 0.85 3,975 0.85 3,313 0.84
Median 0.84 0.84 0.85
Table 18: Kentucky
Cronbach's Alpha Reliability Coefficients for Math, 2007-08
Fall Winter Spring
Grade N Coefficient N Coefficient N Coefficient
3 10,624 0.76 9,716 0.84 6,775 0.86
4 10,918 0.87 9,161 0.84 6,505 0.84
5 10,840 0.86 9,136 0.79 6,486 0.84
6 11,284 0.83 9,530 0.83 7,472 0.81
7 10,694 0.82 9,970 0.79 8,537 0.86
8 10,750 0.86 9,871 0.81 8,576 0.86
10 4,579 0.81 3,444 0.84 2,796 0.82
Median 0.83 0.83 0.84
Table 19: District of Columbia
Cronbach's Alpha Reliability Coefficients for Reading, 2007-08
Fall Winter Spring
Grade N Coefficient N Coefficient N Coefficient
3 3,010 0.87 3,225 0.89 3,956 0.87
4 3,057 0.86 3,200 0.87 3,929 0.85
5 2,970 0.86 2,953 0.81 3,634 0.86
6 2,672 0.84 2,811 0.78 3,441 0.84
7 2,317 0.84 2,339 0.77 3,102 0.87
8 2,490 0.82 2,510 0.84 3,621 0.87
10 2,493 0.86 2,355 0.82 2,717 0.84
Median 0.86 0.82 0.86
Table 20: District of Columbia
Cronbach's Alpha Reliability Coefficients for Math, 2007-08
Fall Winter Spring
Grade N Coefficient N Coefficient N Coefficient
3 2,967 0.87 3,193 0.85 3,960 0.90
4 3,004 0.84 3,173 0.84 3,923 0.89
5 2,915 0.85 2,919 0.83 3,646 0.87
6 2,630 0.82 2,759 0.78 3,442 0.81
7 2,277 0.80 2,287 0.75 3,127 0.84
8 2,496 0.79 2,475 0.77 3,648 0.83
10 1,993 0.82 2,319 0.81 2,703 0.85
Median 0.82 0.81 0.85
Reading Validity
Table 21: KY DEA Fall & Winter Assessment & 2008 KCCT Reading Results
Predictive Validity - Full Sample
Fall Winter
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 952 0.69 p < .01 1314 0.69 p < .01
4 1060 0.74 p < .01 1398 0.70 p < .01
5 1099 0.71 p < .01 1407 0.67 p < .01
6 1668 0.69 p < .01 1809 0.71 p < .01
7 1411 0.71 p < .01 1615 0.67 p < .01
8 1440 0.71 p < .01 1830 0.68 p < .01
10 354 0.76 p < .01 352 0.64 p < .01
Median 0.71 0.68
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 111 0.66 p < .01 122 0.69 p < .01
4 157 0.67 p < .01 176 0.73 p < .01
5 171 0.68 p < .01 183 0.75 p < .01
6 398 0.60 p < .01 418 0.64 p < .01
7 390 0.68 p < .01 406 0.60 p < .01
8 353 0.61 p < .01 389 0.63 p < .01
Median 0.67 0.67
Table 22: KY DEA Spring Assessment & 2008
KCCT Reading Results
Concurrent Validity - Full Sample
Spring
Grade N Coefficient Sig. (2-tailed)
3 1101 0.70 p < .01
4 1337 0.67 p < .01
5 1301 0.68 p < .01
6 1942 0.73 p < .01
7 1670 0.71 p < .01
8 1746 0.70 p < .01
10 309 0.64 p < .01
Median 0.70
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed)
3 70 0.66 p < .01
4 151 0.48 p < .01
5 129 0.67 p < .01
6 393 0.70 p < .01
7 369 0.70 p < .01
8 360 0.68 p < .01
Median 0.68
Table 23: DC DEA Fall & Winter Assessment & 2008 DC-CAS Reading Results
Predictive Validity - Full Sample
Fall Winter
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 3333 0.65 p < .01 3388 0.65 p < .01
4 3314 0.65 p < .01 3353 0.69 p < .01
5 3079 0.69 p < .01 3124 0.66 p < .01
6 2643 0.67 p < .01 2730 0.65 p < .01
7 2180 0.66 p < .01 2253 0.67 p < .01
8 2474 0.67 p < .01 2549 0.66 p < .01
10 1936 0.67 p < .01 1829 0.71 p < .01
Median 0.67 0.66
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 2597 0.61 p < .01 2651 0.61 p < .01
4 2647 0.57 p < .01 2676 0.62 p < .01
5 2507 0.64 p < .01 2544 0.62 p < .01
6 2150 0.62 p < .01 2227 0.61 p < .01
7 1828 0.60 p < .01 1889 0.62 p < .01
8 2100 0.61 p < .01 2155 0.63 p < .01
10 1560 0.65 p < .01 1485 0.70 p < .01
Median 0.61 0.62
Disagg for Ethnicity: Hispanic
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 383 0.64 p < .01 387 0.62 p < .01
4 341 0.59 p < .01 346 0.66 p < .01
5 329 0.74 p < .01 336 0.65 p < .01
6 297 0.69 p < .01 301 0.69 p < .01
7 231 0.65 p < .01 237 0.69 p < .01
8 235 0.70 p < .01 247 0.60 p < .01
10 218 0.59 p < .01 212 0.62 p < .01
Median 0.65 0.65
Table 24: DC DEA Spring Assessment & 2008 DC-CAS
Reading Results
Concurrent Validity - Full Sample
Spring
Grade N Coefficient Sig. (2-tailed)
3 3372 0.68 p < .01
4 3372 0.69 p < .01
5 3141 0.65 p < .01
6 2796 0.69 p < .01
7 2249 0.70 p < .01
8 2594 0.66 p < .01
10 1924 0.69 p < .01
Median 0.69
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed)
3 2627 0.65 p < .01
4 2693 0.63 p < .01
5 2558 0.61 p < .01
6 2297 0.65 p < .01
7 1878 0.64 p < .01
8 2200 0.63 p < .01
10 1598 0.69 p < .01
Median 0.64
Disagg for Ethnicity: Hispanic
Grade N Coefficient Sig. (2-tailed)
3 392 0.60 p < .01
4 349 0.62 p < .01
5 343 0.65 p < .01
6 299 0.70 p < .01
7 242 0.72 p < .01
8 248 0.60 p < .01
10 208 0.63 p < .01
Median 0.63
Table 25: DC DEA Fall & Winter Assessment & 2009 DC-CAS Reading Results
Predictive Validity - Full Sample
Fall Winter
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 3334 0.66 p < .01 3388 0.68 p < .01
4 3002 0.67 p < .01 3020 0.71 p < .01
5 2907 0.71 p < .01 2970 0.71 p < .01
6 2106 0.73 p < .01 2183 0.76 p < .01
7 2028 0.70 p < .01 2039 0.76 p < .01
8 1976 0.69 p < .01 2006 0.78 p < .01
10 1756 0.72 p < .01 1708 0.74 p < .01
Median 0.70 0.74
Table 26: DC DEA Spring Assessment & 2009 DC-CAS Results
Concurrent Validity - Full Sample
Spring
Grade N Coefficient Sig. (2-tailed)
3 3427 0.72 p < .01
4 3077 0.69 p < .01
5 3024 0.75 p < .01
6 2224 0.77 p < .01
7 2124 0.76 p < .01
8 2068 0.77 p < .01
10 1882 0.78 p < .01
Median 0.76
Mathematics Validity
Table 27: KY DEA Fall & Winter Assessment & 2008 KCCT Mathematics Results
Predictive Validity - Full Sample
Fall Winter
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 1098 0.71 p < .01 1434 0.76 p < .01
4 1075 0.79 p < .01 1409 0.76 p < .01
5 1116 0.76 p < .01 1386 0.74 p < .01
6 1743 0.73 p < .01 1809 0.75 p < .01
7 1479 0.76 p < .01 1649 0.72 p < .01
8 1489 0.81 p < .01 1800 0.77 p < .01
11 162 0.79 p < .01 319 0.82 p < .01
Median 0.76 0.76
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 184 0.68 p < .01 220 0.76 p < .01
4 204 0.67 p < .01 220 0.75 p < .01
5 201 0.64 p < .01 218 0.66 p < .01
6 477 0.69 p < .01 421 0.72 p < .01
7 446 0.70 p < .01 405 0.66 p < .01
8 434 0.69 p < .01 447 0.69 p < .01
11 45 0.69 p < .01 44 0.77 p < .01
Median 0.69 0.72
Table 28: KY DEA Spring Assessment & 2008 KCCT Mathematics Results
Concurrent Validity - Full Sample
Spring
Grade N Coefficient Sig. (2-tailed)
3 1356 0.74 p < .01
4 1308 0.78 p < .01
5 1311 0.79 p < .01
6 1992 0.76 p < .01
7 1753 0.73 p < .01
8 1820 0.78 p < .01
11 289 0.73 p < .01
Median 0.76
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed)
3 202 0.75 p < .01
4 178 0.76 p < .01
5 170 0.74 p < .01
6 470 0.72 p < .01
7 438 0.63 p < .01
8 448 0.71 p < .01
11 46 0.72 p < .01
Median 0.72
Table 29: DC DEA Fall & Winter Assessment & 2008 DC-CAS Mathematics Results
Predictive Validity - Full Sample
Fall Winter
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 3284 0.67 p < .01 3364 0.71 p < .01
4 3284 0.69 p < .01 3328 0.71 p < .01
5 3040 0.70 p < .01 3089 0.75 p < .01
6 2644 0.67 p < .01 2728 0.69 p < .01
7 2165 0.68 p < .01 2238 0.70 p < .01
8 2462 0.63 p < .01 2543 0.67 p < .01
10 1882 0.56 p < .01 1863 0.73 p < .01
Median 0.67 0.71
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 2547 0.61 p < .01 2629 0.67 p < .01
4 2576 0.62 p < .01 2648 0.64 p < .01
5 2466 0.64 p < .01 2510 0.71 p < .01
6 2138 0.61 p < .01 2227 0.63 p < .01
7 1800 0.59 p < .01 1863 0.61 p < .01
8 2067 0.55 p < .01 2131 0.61 p < .01
10 1508 0.52 p < .01 1504 0.69 p < .01
Median 0.61 0.64
Disagg for Ethnicity: Hispanic
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 382 0.64 p < .01 389 0.66 p < .01
4 342 0.65 p < .01 348 0.68 p < .01
5 330 0.70 p < .01 334 0.78 p < .01
6 305 0.64 p < .01 301 0.72 p < .01
7 242 0.70 p < .01 244 0.74 p < .01
8 257 0.64 p < .01 268 0.68 p < .01
10 215 0.47 p < .01 212 0.68 p < .01
Median 0.64 0.68
Table 30: DC DEA Spring Assessment & 2008 DC-CAS
Mathematics Results
Concurrent Validity - Full Sample
Spring
Grade N Coefficient Sig. (2-tailed)
3 3375 0.77 p < .01
4 3369 0.77 p < .01
5 3159 0.74 p < .01
6 2804 0.68 p < .01
7 2279 0.74 p < .01
8 2612 0.65 p < .01
10 1963 0.72 p < .01
Median 0.74
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed)
3 2625 0.73 p < .01
4 2693 0.74 p < .01
5 2575 0.70 p < .01
6 2296 0.61 p < .01
7 1900 0.67 p < .01
8 2197 0.58 p < .01
10 1596 0.69 p < .01
Median 0.69
Disagg for Ethnicity: Hispanic
Grade N Coefficient Sig. (2-tailed)
3 397 0.76 p < .01
4 343 0.72 p < .01
5 341 0.74 p < .01
6 307 0.71 p < .01
7 246 0.74 p < .01
8 270 0.66 p < .01
10 218 0.65 p < .01
Median 0.72
Table 31: DC DEA Fall & Winter Assessment & 2009 DC-CAS Mathematics Results
Predictive Validity - Full Sample
Fall Winter
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 3339 0.66 p < .01 3368 0.72 p < .01
4 2995 0.71 p < .01 3028 0.72 p < .01
5 2903 0.74 p < .01 2938 0.80 p < .01
6 2129 0.79 p < .01 2141 0.78 p < .01
7 1985 0.66 p < .01 2056 0.78 p < .01
8 1991 0.65 p < .01 2035 0.73 p < .01
10 1730 0.75 p < .01 1735 0.76 p < .01
Median 0.71 0.76
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)
3 2485 0.63 p < .01 2507 0.69 p < .01
4 2281 0.66 p < .01 2314 0.67 p < .01
5 2280 0.69 p < .01 2298 0.76 p < .01
6 1672 0.74 p < .01 1683 0.74 p < .01
7 1595 0.62 p < .01 1661 0.72 p < .01
8 1619 0.54 p < .01 1659 0.65 p < .01
10 1175 0.74 p < .01 1359 0.73 p < .01
Median 0.66 0.72
Table 32: DC DEA Spring Assessment & 2009 DC-CAS Mathematics Results
Concurrent Validity - Full Sample
Spring
Grade N Coefficient Sig. (2-tailed)
3 3452 0.78 p < .01
4 3084 0.78 p < .01
5 3026 0.79 p < .01
6 2211 0.78 p < .01
7 2124 0.79 p < .01
8 2082 0.79 p < .01
10 1792 0.81 p < .01
Median 0.79
Disagg for Ethnicity: Black
Grade N Coefficient Sig. (2-tailed)
3 2581 0.76 p < .01
4 2357 0.75 p < .01
5 2377 0.76 p < .01
6 1740 0.74 p < .01
7 1721 0.73 p < .01
8 1703 0.70 p < .01
10 1420 0.78 p < .01
Median 0.75
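The summary row that closes each block of Table 32 is consistent with the median of the seven grade-level coefficients. A minimal check, using the coefficients exactly as printed above (the variable names are illustrative):

```python
from statistics import median

# Spring coefficients from Table 32, grades 3-8 and 10
full_sample = [0.78, 0.78, 0.79, 0.78, 0.79, 0.79, 0.81]
black = [0.76, 0.75, 0.76, 0.74, 0.73, 0.70, 0.78]

full_median = median(full_sample)
black_median = median(black)
print(full_median, black_median)  # 0.79 0.75
```

With seven grades the median is simply the fourth value after sorting, so it is not pulled by a single unusually high or low grade.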
Reading Classification Tables
Table 33: Kentucky Reading Sensitivity, Specificity & Area Under the Curve
Fall Winter
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.87 0.76 0.88 0.88 0.74 0.88
Grade 4 0.89 0.78 0.90 0.84 0.72 0.86
Grade 5 0.84 0.78 0.89 0.84 0.72 0.87
Grade 6 0.82 0.76 0.85 0.82 0.77 0.87
Grade 7 0.85 0.76 0.87 0.82 0.72 0.85
Grade 8 0.80 0.80 0.87 0.84 0.71 0.85
Grade 10 0.83 0.79 0.88 0.78 0.68 0.78
Median 0.84 0.78 0.88 0.84 0.72 0.86
Disagg for Ethnicity: Black
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.89 0.58 0.82 0.91 0.59 0.84
Grade 4 0.89 0.68 0.83 0.88 0.59 0.83
Grade 5 0.93 0.63 0.88 0.91 0.62 0.91
Grade 6 0.82 0.63 0.78 0.89 0.62 0.86
Grade 7 0.89 0.70 0.86 0.85 0.60 0.82
Grade 8 0.79 0.78 0.83 0.86 0.61 0.81
Median 0.89 0.66 0.83 0.89 0.61 0.84
Table 34: DC Reading Sensitivity, Specificity & Area Under the Curve
Fall Winter
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.90 0.64 0.86 0.87 0.69 0.85
Grade 4 0.88 0.63 0.84 0.89 0.67 0.86
Grade 5 0.90 0.63 0.86 0.86 0.68 0.85
Grade 6 0.79 0.75 0.85 0.75 0.74 0.84
Grade 7 0.91 0.62 0.86 0.88 0.65 0.86
Grade 8 0.92 0.58 0.85 0.92 0.59 0.87
Grade 10 0.89 0.64 0.85 0.92 0.66 0.87
Median 0.90 0.63 0.85 0.88 0.67 0.86
Disagg for Ethnicity: Black
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.90 0.58 0.83 0.87 0.64 0.84
Grade 4 0.90 0.55 0.81 0.89 0.59 0.83
Grade 5 0.90 0.57 0.83 0.87 0.64 0.83
Grade 6 0.79 0.72 0.83 0.76 0.71 0.82
Grade 7 0.92 0.57 0.85 0.88 0.59 0.84
Grade 8 0.93 0.51 0.83 0.92 0.52 0.84
Grade 10 0.89 0.61 0.83 0.92 0.63 0.86
Median 0.90 0.57 0.83 0.88 0.63 0.84
Disagg for Ethnicity: Hispanic
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.93 0.56 0.84 0.90 0.58 0.82
Grade 4 0.79 0.58 0.78 0.85 0.60 0.82
Grade 5 0.94 0.64 0.90 0.87 0.64 0.87
Grade 6 0.82 0.61 0.84 0.73 0.69 0.81
Grade 7 0.87 0.65 0.85 0.79 0.74 0.88
Grade 8 0.92 0.66 0.88 0.87 0.71 0.91
Grade 10 0.85 0.58 0.82 0.87 0.69 0.83
Median 0.87 0.61 0.84 0.87 0.69 0.83
Mathematics Classification Tables
Table 35: Kentucky Math Sensitivity, Specificity & Area Under the Curve
Fall Winter
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.83 0.76 0.86 0.83 0.77 0.89
Grade 4 0.89 0.80 0.91 0.83 0.77 0.89
Grade 5 0.86 0.84 0.91 0.83 0.76 0.87
Grade 6 0.81 0.80 0.86 0.88 0.72 0.88
Grade 7 0.90 0.67 0.87 0.89 0.66 0.87
Grade 8 0.84 0.83 0.91 0.93 0.71 0.89
Grade 10 0.85 0.80 0.90 0.94 0.80 0.93
Median 0.85 0.80 0.90 0.88 0.76 0.89
Disagg for Ethnicity: Black
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.93 0.56 0.82 0.91 0.60 0.86
Grade 4 0.89 0.64 0.83 0.89 0.63 0.86
Grade 5 0.86 0.68 0.86 0.78 0.74 0.87
Grade 6 0.81 0.70 0.81 0.89 0.62 0.86
Grade 7 0.89 0.56 0.83 0.88 0.62 0.84
Grade 8 0.84 0.70 0.83 0.92 0.53 0.83
Grade 11 0.92 0.50 0.90 1.00 0.75 1.00
Median 0.89 0.64 0.83 0.89 0.62 0.86
Table 36: DC Math Sensitivity, Specificity & Area Under the Curve
Fall Winter
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.91 0.57 0.84 0.92 0.60 0.86
Grade 4 0.90 0.63 0.85 0.91 0.66 0.87
Grade 5 0.94 0.61 0.87 0.93 0.66 0.90
Grade 6 0.93 0.59 0.86 0.89 0.65 0.87
Grade 7 0.91 0.61 0.84 0.91 0.65 0.87
Grade 8 0.88 0.64 0.85 0.91 0.66 0.88
Grade 10 0.84 0.61 0.79 0.90 0.66 0.86
Median 0.91 0.61 0.85 0.91 0.66 0.87
Disagg for Ethnicity: Black
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.93 0.46 0.82 0.92 0.52 0.85
Grade 4 0.91 0.58 0.83 0.91 0.61 0.85
Grade 5 0.94 0.52 0.84 0.93 0.59 0.88
Grade 6 0.93 0.51 0.83 0.89 0.60 0.84
Grade 7 0.91 0.52 0.81 0.90 0.56 0.84
Grade 8 0.89 0.56 0.82 0.91 0.62 0.86
Grade 10 0.84 0.58 0.78 0.92 0.64 0.86
Median 0.91 0.52 0.82 0.91 0.60 0.85
Disagg for Ethnicity: Hispanic
Sensitivity Specificity AUC Sensitivity Specificity AUC
Grade 3 0.86 0.50 0.77 0.88 0.54 0.81
Grade 4 0.86 0.55 0.83 0.91 0.62 0.85
Grade 5 0.92 0.54 0.85 0.93 0.61 0.91
Grade 6 0.91 0.60 0.87 0.89 0.70 0.91
Grade 7 0.83 0.69 0.86 0.86 0.74 0.89
Grade 8 0.89 0.65 0.88 0.89 0.66 0.91
Grade 10 0.83 0.53 0.75 0.92 0.54 0.82
Median 0.86 0.55 0.85 0.89 0.62 0.89
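The three statistics reported in the classification tables can be illustrated with a small sketch. It rests on assumptions that these tables do not state: a student counts as "at risk" when not proficient on the state test, and the screener flags a student whose benchmark score falls below a cut score. The function names, the cut score, and the data are hypothetical.

```python
def sensitivity_specificity(flagged, at_risk):
    """Return (sensitivity, specificity) from parallel lists of booleans."""
    pairs = list(zip(flagged, at_risk))
    tp = sum(1 for f, r in pairs if f and r)          # at risk, flagged
    fn = sum(1 for f, r in pairs if not f and r)      # at risk, missed
    tn = sum(1 for f, r in pairs if not f and not r)  # not at risk, not flagged
    fp = sum(1 for f, r in pairs if f and not r)      # not at risk, flagged
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, at_risk):
    """Area under the ROC curve: the probability that a randomly chosen
    at-risk student scores lower than a randomly chosen student who is
    not at risk, with ties counted as one half."""
    pos = [s for s, r in zip(scores, at_risk) if r]
    neg = [s for s, r in zip(scores, at_risk) if not r]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical benchmark scores and state-test outcomes for eight students
scores = [12, 15, 18, 20, 22, 25, 27, 30]
at_risk = [True, True, True, False, True, False, False, False]

cut_score = 21  # hypothetical cut: flag students scoring below it
flagged = [s < cut_score for s in scores]

sens, spec = sensitivity_specificity(flagged, at_risk)
area = auc(scores, at_risk)
print(sens, spec, area)  # 0.75 0.75 0.9375
```

Sensitivity and specificity depend on the chosen cut score, while AUC summarizes performance across all possible cuts. That may be why several tables above pair high sensitivity with noticeably lower specificity while AUC stays in a narrower range.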