
1

NATIONAL CENTER ON

RESPONSE TO INTERVENTION

EVALUATION CRITERIA

DISCOVERY EDUCATION ASSESSMENT

PROGRESS MONITORING SCREENING TOOL

2

Table of Contents

National Center for RTI Screening Tool………………………………………………………...3

Five Criteria for Screening Tool…………………………………………………………4

DEA Progress Monitoring Screening Tool Data………………………………………………...6

Generalizability…………………………………………………………………………..6

Reliability………………………………………………………………………………...7

Validity…………………………………………………………………………………...8

Classification Accuracy…………………………………………………………………..12

Appendices………………………………………………………………………………………18

3

National Center for RTI Screening Tool

Starting in 2009, Discovery Education Assessment (DEA) benchmark assessments have received

high marks for use as screening tools from the National Center on Response to Intervention.

Screening tools can be used to identify students who are at risk of not meeting grade level

proficiency standards. These at risk students can then be placed into Response to Intervention

(RTI) or similar programs.

The National Center defines a screening tool as follows:

Screening involves brief assessments that are valid, reliable, and evidence-based. They are

conducted with all students or targeted groups of students to identify students who are at risk of

academic failure and, therefore, likely to need additional or alternative forms of instruction to

supplement the conventional general education approach.

The National Center’s Technical Review Committee (TRC) on Screening “independently

established a set of criteria for evaluating the scientific rigor of screening tools. The TRC rated

each submitted tool against these criteria.” Five types of scientifically rigorous criteria were

evaluated: (1) Classification Accuracy; (2) Generalizability; (3) Reliability; (4) Validity; and (5)

Disaggregated Reliability, Validity, and Classification Data for Diverse Populations. DEA

received high marks on all five criteria. (The NCRTI Screening Tools chart can be found at

http://www.rti4success.org/chart/screeningTools/screeningtoolschart.html#)

Rating key: Convincing Evidence, Partially Convincing Evidence, Unconvincing Evidence

4

These five criteria were defined as follows:

Generalizability

Generalizability refers to the extent to which results generated from one population can be

applied to another population. A tool is considered more generalizable if studies have been

conducted on larger, more representative samples. A rating of Moderate High means the

screening tool has a “Large representative national sample or multiple regional/state samples

with no cross-validation or one or more regional/state samples with cross-validation.”

Reliability

Reliability refers to the consistency with which a tool classifies students from one administration

to the next. A tool is considered reliable if it produces the same results when administering the

test under different conditions, at different times, or using different forms of the test. A rating of

Convincing Evidence means that “split-half, coefficient alpha, test-retest, or inter-rater reliability

(is) greater than .80.”

Validity

Validity refers to the extent to which a tool accurately measures the underlying construct that it is

intended to measure. Validity is measured in three ways: content validity, construct validities

above .70, and predictive validities above .70.

Classification Accuracy

Classification accuracy refers to the extent to which a screening tool is able to accurately classify

students into "at risk for reading or mathematics disability" and "not at risk for reading or

mathematics disability" categories. Classification accuracy was measured by a statistic known as

the Area Under the Curve (AUC). AUC values have to be .85 or greater to receive a rating of

Convincing Evidence. (Area Under the Curve (AUC) Statistic: an overall indication of the

diagnostic accuracy of a Receiver Operating Characteristic (ROC) curve. ROC curves are a

generalization of the set of potential combinations of sensitivity and specificity possible for

predictors. AUC values closer to 1 indicate the screening measure reliably distinguishes among

students with satisfactory and unsatisfactory reading performance, whereas values at .50 indicate

the predictor is no better than chance.)

5

Disaggregated Reliability, Validity, and Classification Data for Diverse Populations

Data are disaggregated when they are calculated and reported separately for specific sub-groups.
Tools that provide disaggregated reliability, validity, and classification data receive the highest
scores in this category.

The following sections describe the specific studies DEA used to achieve high marks from the

National Center.

6

DEA Progress Monitoring Screening Tool Data

DEA Screeners encompass the subjects of Reading and Mathematics, spanning Grades 3 to 10.

Evaluation criteria will be presented in the following order: Generalizability, Reliability,

Validity, and then Classification Accuracy. Information on disaggregation by ethnic group will

be outlined in the sections on validity and classification accuracy.

Generalizability (Moderate High)

Once again, generalizability refers to the extent to which results generated from one population
can be applied to another population. A tool is considered more generalizable if studies have
been conducted on larger, more representative samples. DEA presented data drawn from two
representative sources: (1) the state of Kentucky, with more than 6,000 students from five
representative school districts; and (2) the District of Columbia Public School System, one of
the nation’s largest urban districts, with more than 20,000 students in Grades 3 to 10.
Standardized test scores were obtained for each student from the Kentucky Core Content Test
(KCCT) from Spring 2008 and the District of Columbia Comprehensive Assessment System
(DCCAS), also from Spring 2008. Each student also completed three DEA benchmarks during
the 2007-2008 school year: Fall, Winter, and Spring. Additional data were also obtained for the
DCCAS from Spring 2009.

7

Reliability (Convincing Evidence)

Reliability refers to the consistency with which a tool classifies students from one administration

to the next. A tool is considered reliable if it produces the same results when administering the

test under different conditions, at different times, or using different forms of the test. Cronbach’s

alpha was used to measure a screener’s reliability.
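As an illustration of how such a coefficient is computed, the following minimal Python sketch applies the standard Cronbach's alpha formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the small response matrix are assumptions made up for illustration; they are not DEA data or DEA's scoring code.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row per student, one column per item)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item (column) across students.
    item_variances = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of each student's total score.
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) responses: 4 students, 3 items.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.2f}")
```

Because the formula uses a ratio of variances, population and sample variance give the same alpha as long as one is used consistently.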

The following tables present the range and median reliability coefficients for DEA Screeners for
the three time periods (Fall, Winter, Spring), separately for Kentucky and DC, and also
separately for Reading and Mathematics. The median reliabilities for all test periods exceed .80,
the criterion established by the National Center to receive a Convincing rating.

Table 1: Reliabilities for DEA Reading Screeners

Reading               Test Period   Range        Median
Kentucky              Fall          .73 to .85   .84
Kentucky              Winter        .82 to .86   .84
Kentucky              Spring        .84 to .86   .85
District of Columbia  Fall          .82 to .87   .86
District of Columbia  Winter        .77 to .89   .82
District of Columbia  Spring        .84 to .87   .86

Table 2: Reliabilities for DEA Mathematics Screeners

Mathematics           Test Period   Range        Median
Kentucky              Fall          .76 to .87   .83
Kentucky              Winter        .79 to .84   .83
Kentucky              Spring        .81 to .86   .84
District of Columbia  Fall          .79 to .87   .82
District of Columbia  Winter        .75 to .85   .81
District of Columbia  Spring        .81 to .90   .85
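Each summary row above condenses the grade-level coefficients reported in the appendix into a range and a median. A minimal sketch of that summary, using the Fall coefficients for Kentucky Reading (Grades 3 to 8 and 10) from the appendix reliability table; the function name `summarize` and the threshold parameter are assumptions for illustration:

```python
from statistics import median

# Fall Cronbach's alpha values for Kentucky Reading, Grades 3-8 and 10,
# as listed in the appendix reliability table.
kentucky_reading_fall = [0.85, 0.73, 0.84, 0.82, 0.84, 0.85, 0.85]


def summarize(coefficients, threshold=0.80):
    """Return (low, high, median, median exceeds threshold) for a set of coefficients."""
    mid = median(coefficients)
    return (min(coefficients), max(coefficients), mid, mid > threshold)


summary = summarize(kentucky_reading_fall)
low, high, mid, convincing = summary
print(f"{low:.2f} to {high:.2f}, median {mid:.2f}, exceeds .80: {convincing}")
```

The result reproduces the Kentucky Fall row of Table 1 (.73 to .85, median .84).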

8

Validity (Partially Convincing Evidence)

Validity refers to the extent to which a tool accurately measures the underlying construct that it is

intended to measure. Content validity represents how well a tool measures the skills and

knowledge of a particular domain. Criterion validity measures how well scores on a tool

correlate with scores on an external measure, such as state tests.

Content Validity

Content validity evidence shows that test content is appropriate for the particular constructs that

are being measured. Content validity is measured by agreement among subject matter experts

about test material and alignment to state standards, by highly reliable training procedures for

item writers, by thorough reviews of test material for accuracy and lack of bias, and by

examination of depth of knowledge of test questions.

To ensure content validity of all tests, Discovery Education Assessment carefully aligns the
content of its assessments to a given state’s content standards and the content sampled by the
respective high-stakes test. Discovery Education Assessment employs one of the leading
alignment research methodologies, the Webb Alignment Tool (WAT), which has continually
supported the alignment of our tests to state-specific content standards both in breadth (i.e.,
number of standards and objectives sampled) and depth (i.e., cognitive complexity of standards
and objectives). All Discovery Education Assessment tests are thus state specific and feature
reporting categories matching those of a given state’s large-scale assessment used for
accountability purposes.

DEA’s screening tools in Reading and Mathematics are aligned to state specific standards. The

Kentucky screening tools match standards on the Kentucky Core Content Test (KCCT). The

District of Columbia screening tools match standards on the District of Columbia

Comprehensive Assessment System (DCCAS).

Criterion validity: predictive and concurrent

Criterion validity evidence demonstrates that test scores predict scores on an important criterion

variable, such as a state’s standardized large-scale assessment. Predictive validity occurs when

the screening tool is administered at least three months before a state test. DEA screening tools

given during the Fall and Winter periods are predictive of the state test given in the spring.

Concurrent validity occurs when the screening tool is administered less than three months before

a state test. The DEA screening tool administered in the spring represents concurrent validity.
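The validity tables report correlation coefficients with two-tailed significance levels but do not name the statistic. Assuming it is the Pearson product-moment correlation, a minimal sketch of the calculation looks like this; the function name `pearson_r` and the paired scores are made up for illustration and are not DEA data:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical screener scores (number correct) and state test scale scores.
screener_scores = [12, 18, 20, 25, 30, 33]
state_test_scores = [410, 455, 440, 500, 520, 560]
r = pearson_r(screener_scores, state_test_scores)
print(f"r = {r:.2f}")
```

A predictive coefficient pairs Fall or Winter screener scores with the Spring state test; a concurrent coefficient pairs the Spring screener with the same state test.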

The following tables present predictive and concurrent correlations between DEA Reading and
Mathematics Screeners and either Kentucky or District of Columbia state tests from Spring 2008.
Additional data are also provided for the District of Columbia for Spring 2009. The median
correlations for Reading range from .61 to .75, and the median correlations for Mathematics
range from .61 to .79. Correlations are presented for all students first and then for the two
disaggregated groups of African-American and Hispanic.

The National Center established a criterion of .70 for predictive validities. Many DEA validities
exceeded this value and many others were in the high .60 range. DEA received a rating of
Partially Convincing for validity.

Table 3: Predictive and Concurrent Validity for Reading Screeners for All Students

Reading               Test Period         Range        Median
Kentucky              Fall Predictive     .69 to .74   .71
Kentucky              Winter Predictive   .67 to .71   .68
Kentucky              Spring Concurrent   .64 to .73   .70
District of Columbia  Winter Predictive   .65 to .71   .66


Table 4: Predictive and Concurrent Validity for Reading Screeners for African-American Students








Table 5: Predictive and Concurrent Validity for Reading Screeners for Hispanic Students





10

Table 6: Predictive and Concurrent Validity for Reading Screeners for District of Columbia 2009





Table 7: Predictive and Concurrent Validity for Mathematics Screeners for All Students








Table 8: Predictive and Concurrent Validity for Mathematics Screeners for African-American

Students








Table 9: Predictive and Concurrent Validity for Mathematics Screeners for Hispanic Students





11

Table 10: Predictive and Concurrent Validity for Mathematics Screeners for District of Columbia

2009





12

Classification Accuracy (Convincing Evidence)

A screening tool should classify students into one of two categories: At Risk or Not At Risk.

The accuracy of this classification can be assessed by comparing this prediction to a student’s

status on a standardized outcome measure. DEA benchmarks for Reading and Mathematics

classified students into the two categories of At Risk and Not At Risk based on the following

Kentucky and DC specific proficiency levels:

Kentucky

• Novice and Apprentice (At Risk)

• Proficient and Distinguished (Not At

Risk)

District of Columbia

• Below Basic and Basic (At Risk)

• Proficient and Advanced (Not At

Risk)
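The grouping of state proficiency levels into the two categories can be written as a simple lookup. This is a minimal sketch; the names `AT_RISK_LEVELS`, `NOT_AT_RISK_LEVELS`, and `risk_status` are assumptions for illustration, while the level names come from the lists above:

```python
# State-specific proficiency levels, grouped as in the lists above.
AT_RISK_LEVELS = {
    "Kentucky": {"Novice", "Apprentice"},
    "District of Columbia": {"Below Basic", "Basic"},
}
NOT_AT_RISK_LEVELS = {
    "Kentucky": {"Proficient", "Distinguished"},
    "District of Columbia": {"Proficient", "Advanced"},
}


def risk_status(jurisdiction, proficiency_level):
    """Map a state proficiency level to 'At Risk' or 'Not At Risk'."""
    if proficiency_level in AT_RISK_LEVELS[jurisdiction]:
        return "At Risk"
    if proficiency_level in NOT_AT_RISK_LEVELS[jurisdiction]:
        return "Not At Risk"
    raise ValueError(f"unknown level {proficiency_level!r} for {jurisdiction}")


print(risk_status("Kentucky", "Apprentice"))
print(risk_status("District of Columbia", "Advanced"))
```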

The actual proficiency level of students was obtained from results on the Spring 2008 KCCT or

the Spring 2008 DCCAS. Furthermore, additional results were obtained for the Spring 2009

DCCAS.

The accuracy and errors of predictions using a screening tool can be classified into one of four

outcomes.

                          State Test
                          At Risk                    Not At Risk
Screener: At Risk         True Positive (a)          False Positive (b)             Total Predicted At Risk (a+b)
Screener: Not At Risk     False Negative (c)         True Negative (d)              Total Predicted Not At Risk (c+d)
                          Total True At Risk (a+c)   Total True Not At Risk (b+d)   Total Students (a+b+c+d)

a) True Positive indicates the number of students predicted as At Risk on the screener that are
actually At Risk on the state test.

b) False Positive indicates the number of students predicted as At Risk on the screener that are
actually Not At Risk on the state test. So, these students have been falsely identified as At Risk.

c) False Negative indicates the number of students predicted as Not At Risk on the screener that
are actually At Risk on the state test. So, these students have been falsely identified as Not At
Risk.

d) True Negative indicates the number of students predicted as Not At Risk on the screener that
are actually Not At Risk on the state test.

13

Two desirable characteristics of screeners are Sensitivity and Specificity. Sensitivity is the
percent of Total True At Risk students that are True Positives. Specificity is the percent of Total
True Not At Risk students that are True Negatives. In terms of the four outcomes:

Sensitivity is True Positive (a) divided by Total True At Risk (a+c)

Specificity is True Negative (d) divided by Total True Not At Risk (b+d)

Let’s look at a specific example. The following table is for Kentucky Grade 3 Reading. The
Screener is the DEA Fall Reading Benchmark and the State Test is the KCCT for Spring 2008.

                               State Test KCCT 2008
Screener DEA Fall Benchmark    At Risk   Not At Risk   Total
At Risk                        117       126           243
Not At Risk                    18        407           425
Total                          135       533           668

Sensitivity is True Positive (a) divided by Total True At Risk (a+c)

Sensitivity = 117/135 = .87 or 87%

Specificity is True Negative (d) divided by Total True Not At Risk (b+d)

Specificity = 407/533 = .76 or 76%
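The same arithmetic can be checked with a few lines of Python. This is a minimal sketch using the four cell counts from the Kentucky Grade 3 Reading example; the function names are assumptions for illustration:

```python
def sensitivity(true_pos, false_neg):
    """Share of students truly At Risk that the screener flagged as At Risk."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of students truly Not At Risk that the screener flagged as Not At Risk."""
    return true_neg / (true_neg + false_pos)


# Cell counts from the Kentucky Grade 3 Reading example:
# a = True Positive, b = False Positive, c = False Negative, d = True Negative.
a, b, c, d = 117, 126, 18, 407

sens = sensitivity(a, c)  # a / (a + c)
spec = specificity(d, b)  # d / (b + d)
print(f"Sensitivity = {sens:.2f}, Specificity = {spec:.2f}")
```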


14

A screening tool has to establish a cut score that separates At Risk from Not At Risk. Because
this cut score is used to predict performance on the state test, it has to be set in advance. A
particular cut score is associated with particular levels of Sensitivity and Specificity; a different
cut score would produce different levels. Good cut scores strive to balance high levels of
Sensitivity and Specificity.

Table 11 shows the relationship between Sensitivity and Specificity for this same Kentucky
Grade 3 Reading test. The test had 36 questions. Each possible cut score, expressed as a number
correct, has an associated level of Sensitivity and Specificity. If a cut score of 16 were used, the
table indicates that the Sensitivity would be .72 or 72% and the Specificity would be .85 or
85%. Thus, there would be higher accuracy in classifying Not At Risk students than At Risk
students.
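A table of this kind can be produced by recomputing Sensitivity and Specificity at every possible cut score. The sketch below is illustrative only: the scores are made up, the function name `sweep_cut_scores` is hypothetical, and it assumes a student is flagged At Risk when the number correct is at or below the cut score (the report does not state whether the cut score itself is included).

```python
def sweep_cut_scores(scores, at_risk_on_state_test, max_score):
    """Return (cut score, sensitivity, specificity) for every cut from 0 to max_score."""
    pairs = list(zip(scores, at_risk_on_state_test))
    total_at_risk = sum(1 for _, at_risk in pairs if at_risk)
    total_not_at_risk = len(pairs) - total_at_risk
    rows = []
    for cut in range(max_score + 1):
        # Assumption: flagged At Risk when number correct <= cut.
        true_pos = sum(1 for s, at_risk in pairs if at_risk and s <= cut)
        true_neg = sum(1 for s, at_risk in pairs if not at_risk and s > cut)
        rows.append((cut, true_pos / total_at_risk, true_neg / total_not_at_risk))
    return rows


# Made-up number-correct scores on a 36-question test (not DEA data),
# paired with each student's actual status on the state test.
scores = [10, 15, 18, 22, 16, 20, 25, 30]
at_risk = [True, True, True, True, False, False, False, False]

rows = sweep_cut_scores(scores, at_risk, max_score=36)
for cut, sens, spec in rows[16:21]:
    print(cut, round(sens, 2), round(spec, 2))
```

Raising the cut score flags more students, so Sensitivity can only rise and Specificity can only fall, which is the pattern visible in Table 11.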

Good screening tools strive to balance Sensitivity and Specificity at high levels. For this

particular test, a cut score of 18 was used to differentiate At Risk from Not At Risk. This cut

score has a Sensitivity index of .87 or 87% and a Specificity index of .76 or 76% (the same

values that were obtained from the analysis in the table above).

The relationship between Sensitivity and Specificity can be graphed in an analysis called a

Receiver Operating Characteristic. For this graph, values of Sensitivity are plotted on the y-axis

and values of 1 minus Specificity are plotted on the x-axis. Figure 1 shows this ROC curve for

the Kentucky Grade 3 Reading test.

Classification accuracy was measured by a statistic known as the Area Under the Curve (AUC).

AUC values had to be .85 or greater to receive a rating of Convincing Evidence. (Area Under

the Curve (AUC) Statistic: an overall indication of the diagnostic accuracy of a Receiver

Operating Characteristic (ROC) curve. ROC curves are a generalization of the set of potential

combinations of sensitivity and specificity possible for predictors. AUC values closer to 1

indicate the screening measure reliably distinguishes among students with satisfactory and

unsatisfactory reading performance, whereas values at .50 indicate the predictor is no better than

chance.)
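The report does not say how the AUC values were calculated. One common approach, assumed here, is the trapezoidal rule applied to the (1 - Specificity, Sensitivity) points of the ROC curve. The sketch below applies it to the rounded values in Table 11; the function name `auc_trapezoid` is hypothetical.

```python
# (cut score, sensitivity, specificity) rows copied from Table 11.
TABLE_11 = [
    (2, 0.00, 1.00), (4, 0.00, 1.00), (5, 0.01, 1.00), (6, 0.03, 0.99),
    (7, 0.08, 0.98), (8, 0.10, 0.98), (9, 0.14, 0.98), (10, 0.24, 0.97),
    (11, 0.35, 0.95), (12, 0.44, 0.94), (13, 0.50, 0.92), (14, 0.56, 0.90),
    (15, 0.64, 0.88), (16, 0.72, 0.85), (17, 0.79, 0.81), (18, 0.87, 0.76),
    (19, 0.93, 0.73), (20, 0.97, 0.68), (21, 0.97, 0.64), (22, 0.98, 0.57),
    (23, 0.98, 0.52), (24, 0.99, 0.46), (25, 0.99, 0.41), (26, 0.99, 0.33),
    (27, 0.99, 0.27), (28, 0.99, 0.21), (29, 0.99, 0.17), (30, 0.99, 0.11),
    (31, 1.00, 0.07), (32, 1.00, 0.04), (33, 1.00, 0.02), (34, 1.00, 0.01),
    (35, 1.00, 0.00), (36, 1.00, 0.00),
]


def auc_trapezoid(roc_points):
    """Area under an ROC curve given (1 - specificity, sensitivity) points."""
    # Anchor the curve at (0, 0) and (1, 1), then order the points left to right.
    points = sorted(set(roc_points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per segment
    return area


roc_points = [(1 - spec, sens) for _, sens, spec in TABLE_11]
auc = auc_trapezoid(roc_points)
print(f"AUC = {auc:.2f}")
```

Because Table 11 is rounded to two decimals, the result is an approximation; it comes out close to the .88 listed for Kentucky Grade 3 Fall Reading in the appendix classification table. With no points other than the two anchors, the function returns .50, the chance level.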

Tables 12 to 14 present Sensitivity and AUC values for Reading Screeners for all students and

for the African-American and Hispanic student subgroups. Tables 15 to 17 present Sensitivity

and AUC values for Mathematics Screeners for all students and for the African-American and

Hispanic student subgroups. The AUC values are all in the .80 and above range with most at .85

and above.

DEA received a rating of Convincing Evidence for classification accuracy.

15

Figure 1: ROC Curve for Kentucky Grade 3 Reading Test

Table 11: Levels of Sensitivity and Specificity by Cut Score

Cut Score   Sensitivity   Specificity   1 - Specificity

2 0.00 1.00 0.00

4 0.00 1.00 0.00

5 0.01 1.00 0.00

6 0.03 0.99 0.01

7 0.08 0.98 0.02

8 0.10 0.98 0.02

9 0.14 0.98 0.02

10 0.24 0.97 0.03

11 0.35 0.95 0.05

12 0.44 0.94 0.06

13 0.50 0.92 0.08

14 0.56 0.90 0.10

15 0.64 0.88 0.12

16 0.72 0.85 0.15

17 0.79 0.81 0.19

18 0.87 0.76 0.24

19 0.93 0.73 0.27

20 0.97 0.68 0.32

21 0.97 0.64 0.36

22 0.98 0.57 0.43

23 0.98 0.52 0.48

24 0.99 0.46 0.54

25 0.99 0.41 0.59

26 0.99 0.33 0.67

27 0.99 0.27 0.73

28 0.99 0.21 0.79

29 0.99 0.17 0.83

30 0.99 0.11 0.89

31 1.00 0.07 0.93

32 1.00 0.04 0.96

33 1.00 0.02 0.98

34 1.00 0.01 0.99

35 1.00 0.00 1.00

36 1.00 0.00 1.00

16

Table 12: Sensitivity and AUC for Reading Screeners for All Students

                                          Sensitivity           Area Under the Curve (AUC)
Reading               Test Period         Range        Median   Range        Median
Kentucky              Fall Predictive     .80 to .89   .84      .85 to .90   .88
Kentucky              Winter Predictive   .78 to .88   .84      .78 to .88   .86
District of Columbia  Winter Predictive   .75 to .92   .88      .84 to .87   .86

Table 13: Sensitivity and AUC for Reading Screeners for African-American Students

Sensitivity

Area Under the

Curve (AUC)



Winter Predictive .85 to .91 .89 .81 to .91 .84 Kentucky



District of

Columbia

Table 14: Sensitivity and AUC for Reading Screeners for Hispanic Students

                                          Sensitivity           Area Under the Curve (AUC)
Reading               Test Period         Range        Median   Range        Median
District of Columbia  Fall Predictive     .79 to .93   .87      .78 to .90   .84
District of Columbia  Winter Predictive   .73 to .90   .87      .81 to .91   .83

17

Table 15: Sensitivity and AUC for Mathematics Screeners for All Students

Sensitivity

Area Under the

Curve (AUC)

Mathematics Test Period Range Median Range Median

Fall Predictive .81 to .90 .85 .86 to .91 .90 Kentucky



District of


Table 16: Sensitivity and AUC for Mathematics Screeners for African-American Students

                                  Sensitivity            Area Under the Curve (AUC)
Mathematics   Test Period         Range         Median   Range         Median
Kentucky      Winter Predictive   .78 to 1.00   .89      .83 to 1.00   .86


District of


Table 17: Sensitivity and AUC for Mathematics Screeners for Hispanic Students

Sensitivity

Area Under the

Curve (AUC)


Fall Predictive .83 to .92 .86 .75 to .88 .85 District of


18

Appendices

Reliability Tables

Table 17: Cronbach’s Alpha for Kentucky Reading Tests

Table 18: Cronbach’s Alpha for Kentucky Mathematics Tests

Table 19: Cronbach’s Alpha for District of Columbia Reading Tests

Table 20: Cronbach’s Alpha for District of Columbia Mathematics Tests

Reading Validity Tables

Table 21: Predictive Validity for Kentucky Reading Tests

Table 22: Concurrent Validity for Kentucky Reading Tests

Table 23: Predictive Validity for District of Columbia 2008 Reading Tests

Table 24: Concurrent Validity for District of Columbia 2008 Reading Tests

Table 25: Predictive Validity for District of Columbia 2009 Reading Tests

Table 26: Concurrent Validity for District of Columbia 2009 Reading Tests

Math Validity Tables

Table 27: Predictive Validity for Kentucky Mathematics Tests

Table 28: Concurrent Validity for Kentucky Mathematics Tests

Table 29: Predictive Validity for District of Columbia 2008 Mathematics Tests

Table 30: Concurrent Validity for District of Columbia 2008 Mathematics Tests

Table 31: Predictive Validity for District of Columbia 2009 Mathematics Tests

Table 32: Concurrent Validity for District of Columbia 2009 Mathematics Tests

Reading Classification Tables

Table 33: Classification Tables for Kentucky Reading Tests

Table 34: Classification Tables for District of Columbia Reading Tests

Math Classification Tables

Table 35: Classification Tables for Kentucky Mathematics Tests

Table 36: Classification Tables for District of Columbia Mathematics Tests

19

Reliability

Table 17: Kentucky

Cronbach's Alpha Reliability Coefficients for Reading 0708

Fall Winter Spring

Grade N Coefficient N Coefficient N Coefficient

3 10,405 0.85 9,477 0.84 6,878 0.85

4 10,895 0.73 9,222 0.85 6,593 0.86

5 10,695 0.84 9,097 0.82 6,416 0.84

6 11,246 0.82 9,500 0.84 7,727 0.86

7 10,914 0.84 9,977 0.86 8,560 0.86

8 10,712 0.85 9,543 0.84 8,723 0.84

10 5,040 0.85 3,975 0.85 3,313 0.84

Median 0.84 0.84 0.85

Table 18: Kentucky

Cronbach's Alpha Reliability Coefficients for Math 0708

Fall Winter Spring


3 10,624 0.76 9,716 0.84 6,775 0.86

4 10,918 0.87 9,161 0.84 6,505 0.84

5 10,840 0.86 9,136 0.79 6,486 0.84

6 11,284 0.83 9,530 0.83 7,472 0.81

7 10,694 0.82 9,970 0.79 8,537 0.86

8 10,750 0.86 9,871 0.81 8,576 0.86

10 4,579 0.81 3,444 0.84 2,796 0.82

Median 0.83 0.83 0.84

20

Table 19: District of Columbia

Cronbach's Alpha Reliability Coefficients for Reading 0708

Fall Winter Spring


3 3,010 0.87 3,225 0.89 3,956 0.87

4 3,057 0.86 3,200 0.87 3,929 0.85

5 2,970 0.86 2,953 0.81 3,634 0.86

6 2,672 0.84 2,811 0.78 3,441 0.84

7 2,317 0.84 2,339 0.77 3,102 0.87

8 2,490 0.82 2,510 0.84 3,621 0.87

10 2,493 0.86 2,355 0.82 2,717 0.84

Median 0.86 0.82 0.86

Table 20: District of Columbia

Cronbach's Alpha Reliability Coefficients for Math 0708

Fall Winter Spring


3 2,967 0.87 3,193 0.85 3,960 0.90

4 3,004 0.84 3,173 0.84 3,923 0.89

5 2,915 0.85 2,919 0.83 3,646 0.87

6 2,630 0.82 2,759 0.78 3,442 0.81

7 2,277 0.80 2,287 0.75 3,127 0.84

8 2,496 0.79 2,475 0.77 3,648 0.83

10 1,993 0.82 2,319 0.81 2,703 0.85

Median 0.82 0.81 0.85

21

Reading Validity

Table 21: KY DEA Fall & Winter Assessment & 2008 KCCT Reading Results

Predictive Validity - Full Sample

Fall Winter

Grade N Coefficient Sig. (2-tailed) N Coefficient Sig. (2-tailed)

3 952 0.69 p < .01 1314 0.69 p < .01

4 1060 0.74 p < .01 1398 0.70 p < .01

5 1099 0.71 p < .01 1407 0.67 p < .01

6 1668 0.69 p < .01 1809 0.71 p < .01

7 1411 0.71 p < .01 1615 0.67 p < .01

8 1440 0.71 p < .01 1830 0.68 p < .01

10 354 0.76 p < .01 352 0.64 p < .01

Median 0.71 0.68

Disagg for Ethnicity: Black


3 111 0.66 p < .01 122 0.69 p < .01

4 157 0.67 p < .01 176 0.73 p < .01

5 171 0.68 p < .01 183 0.75 p < .01

6 398 0.60 p < .01 418 0.64 p < .01

7 390 0.68 p < .01 406 0.60 p < .01

8 353 0.61 p < .01 389 0.63 p < .01

Median 0.67 0.67

22

Table 22: KY DEA Spring Assessment & 2008

KCCT Reading Results

Concurrent Validity - Full Sample

Spring

Grade N Coefficient Sig. (2-tailed)

3 1101 0.70 p < .01

4 1337 0.67 p < .01

5 1301 0.68 p < .01

6 1942 0.73 p < .01

7 1670 0.71 p < .01

8 1746 0.70 p < .01

10 309 0.64 p < .01

Median 0.70



3 70 0.66 p < .01

4 151 0.48 p < .01

5 129 0.67 p < .01

6 393 0.70 p < .01

7 369 0.70 p < .01

8 360 0.68 p < .01

Median 0.68

23

Table 23: DC DEA Fall & Winter Assessment & 2008 DC-CAS Reading Results


Fall Winter


3 3333 0.65 p < .01 3388 0.65 p < .01

4 3314 0.65 p < .01 3353 0.69 p < .01

5 3079 0.69 p < .01 3124 0.66 p < .01

6 2643 0.67 p < .01 2730 0.65 p < .01

7 2180 0.66 p < .01 2253 0.67 p < .01

8 2474 0.67 p < .01 2549 0.66 p < .01

10 1936 0.67 p < .01 1829 0.71 p < .01

Median 0.67 0.66



3 2597 0.61 p < .01 2651 0.61 p < .01

4 2647 0.57 p < .01 2676 0.62 p < .01

5 2507 0.64 p < .01 2544 0.62 p < .01

6 2150 0.62 p < .01 2227 0.61 p < .01

7 1828 0.60 p < .01 1889 0.62 p < .01

8 2100 0.61 p < .01 2155 0.63 p < .01

10 1560 0.65 p < .01 1485 0.70 p < .01

Median 0.61 0.62

Disagg for Ethnicity: Hispanic


3 383 0.64 p < .01 387 0.62 p < .01

4 341 0.59 p < .01 346 0.66 p < .01

5 329 0.74 p < .01 336 0.65 p < .01

6 297 0.69 p < .01 301 0.69 p < .01

7 231 0.65 p < .01 237 0.69 p < .01

8 235 0.70 p < .01 247 0.60 p < .01

10 218 0.59 p < .01 212 0.62 p < .01

Median 0.65 0.65

24

Table 24: DC DEA Spring Assessment & 2008 DC-CAS

Reading Results


Spring


3 3372 0.68 p < .01

4 3372 0.69 p < .01

5 3141 0.65 p < .01

6 2796 0.69 p < .01

7 2249 0.70 p < .01

8 2594 0.66 p < .01

10 1924 0.69 p < .01

Median 0.69

Disagg for Ethnicity: Black DC Test C Reading


3 2627 0.65 p < .01

4 2693 0.63 p < .01

5 2558 0.61 p < .01

6 2297 0.65 p < .01

7 1878 0.64 p < .01

8 2200 0.63 p < .01

10 1598 0.69 p < .01

Median 0.64

Disagg for Ethnicity: Hispanic DC Test C Reading


3 392 0.60 p < .01

4 349 0.62 p < .01

5 343 0.65 p < .01

6 299 0.70 p < .01

7 242 0.72 p < .01

8 248 0.60 p < .01

10 208 0.63 p < .01

Median 0.63

25

Table 25: DC DEA Fall & Winter Assessment & 2009 DC-CAS Reading Results


Fall Winter


3 3334 0.66 p < .01 3388 0.68 p < .01

4 3002 0.67 p < .01 3020 0.71 p < .01

5 2907 0.71 p < .01 2970 0.71 p < .01

6 2106 0.73 p < .01 2183 0.76 p < .01

7 2028 0.70 p < .01 2039 0.76 p < .01

8 1976 0.69 p < .01 2006 0.78 p < .01

10 1756 0.72 p < .01 1708 0.74 p < .01

Median 0.70 0.74

Table 26: DC DEA Spring Assessment & 2009 DC-CAS Results

Concurrent Validity - Full Sample - DC Test C Reading

Spring


3 3427 0.72 p < .01

4 3077 0.69 p < .01

5 3024 0.75 p < .01

6 2224 0.77 p < .01

7 2124 0.76 p < .01

8 2068 0.77 p < .01

10 1882 0.78 p < .01

Median 0.76

26

Mathematics Validity

Table 27: KY DEA Fall & Winter Assessment & 2008 KCCT Mathematics Results


Fall Winter


3 1098 0.71 p < .01 1434 0.76 p < .01

4 1075 0.79 p < .01 1409 0.76 p < .01

5 1116 0.76 p < .01 1386 0.74 p < .01

6 1743 0.73 p < .01 1809 0.75 p < .01

7 1479 0.76 p < .01 1649 0.72 p < .01

8 1489 0.81 p < .01 1800 0.77 p < .01

11 162 0.79 p < .01 319 0.82 p < .01

Median 0.76 0.76



3 184 0.68 p < .01 220 0.76 p < .01

4 204 0.67 p < .01 220 0.75 p < .01

5 201 0.64 p < .01 218 0.66 p < .01

6 477 0.69 p < .01 421 0.72 p < .01

7 446 0.70 p < .01 405 0.66 p < .01

8 434 0.69 p < .01 447 0.69 p < .01

11 45 0.69 p < .01 44 0.77 p < .01

Median 0.69 0.72

27

Table 28: KY DEA Spring Assessment & 2008 KCCT Mathematics Results


Spring


3 1356 0.74 p < .01

4 1308 0.78 p < .01

5 1311 0.79 p < .01

6 1992 0.76 p < .01

7 1753 0.73 p < .01

8 1820 0.78 p < .01

11 289 0.73 p < .01

Median 0.76

Disagg for Ethnicity: Black KY Test B Math


3 202 0.75 p < .01

4 178 0.76 p < .01

5 170 0.74 p < .01

6 470 0.72 p < .01

7 438 0.63 p < .01

8 448 0.71 p < .01

11 46 0.72 p < .01

Median 0.72

28

Table 29: DC DEA Fall & Winter Assessment & 2008 DC-CAS Mathematics Results


Fall Winter


3 3284 0.67 p < .01 3364 0.71 p < .01

4 3284 0.69 p < .01 3328 0.71 p < .01

5 3040 0.70 p < .01 3089 0.75 p < .01

6 2644 0.67 p < .01 2728 0.69 p < .01

7 2165 0.68 p < .01 2238 0.70 p < .01

8 2462 0.63 p < .01 2543 0.67 p < .01

10 1882 0.56 p < .01 1863 0.73 p < .01

Median 0.67 0.71



3 2547 0.61 p < .01 2629 0.67 p < .01

4 2576 0.62 p < .01 2648 0.64 p < .01

5 2466 0.64 p < .01 2510 0.71 p < .01

6 2138 0.61 p < .01 2227 0.63 p < .01

7 1800 0.59 p < .01 1863 0.61 p < .01

8 2067 0.55 p < .01 2131 0.61 p < .01

10 1508 0.52 p < .01 1504 0.69 p < .01

Median 0.61 0.64



3 382 0.64 p < .01 389 0.66 p < .01

4 342 0.65 p < .01 348 0.68 p < .01

5 330 0.70 p < .01 334 0.78 p < .01

6 305 0.64 p < .01 301 0.72 p < .01

7 242 0.70 p < .01 244 0.74 p < .01

8 257 0.64 p < .01 268 0.68 p < .01

10 215 0.47 p < .01 212 0.68 p < .01

Median 0.64 0.68

29

Table 30: DC DEA Spring Assessment & 2008 DC-CAS

Mathematics Results


Spring


3 3375 0.77 p < .01

4 3369 0.77 p < .01

5 3159 0.74 p < .01

6 2804 0.68 p < .01

7 2279 0.74 p < .01

8 2612 0.65 p < .01

10 1963 0.72 p < .01

Median 0.74



3 2625 0.73 p < .01

4 2693 0.74 p < .01

5 2575 0.70 p < .01

6 2296 0.61 p < .01

7 1900 0.67 p < .01

8 2197 0.58 p < .01

10 1596 0.69 p < .01

Median 0.69



3 397 0.76 p < .01

4 343 0.72 p < .01

5 341 0.74 p < .01

6 307 0.71 p < .01

7 246 0.74 p < .01

8 270 0.66 p < .01

10 218 0.65 p < .01

Median 0.72

30

Table 31: DC DEA Fall & Winter Assessment & 2009 DC-CAS Mathematics Results


Fall Winter


3 3339 0.66 p < .01 3368 0.72 p < .01

4 2995 0.71 p < .01 3028 0.72 p < .01

5 2903 0.74 p < .01 2938 0.80 p < .01

6 2129 0.79 p < .01 2141 0.78 p < .01

7 1985 0.66 p < .01 2056 0.78 p < .01

8 1991 0.65 p < .01 2035 0.73 p < .01

10 1730 0.75 p < .01 1735 0.76 p < .01

Median 0.71 0.76



3 2485 0.63 p < .01 2507 0.69 p < .01

4 2281 0.66 p < .01 2314 0.67 p < .01

5 2280 0.69 p < .01 2298 0.76 p < .01

6 1672 0.74 p < .01 1683 0.74 p < .01

7 1595 0.62 p < .01 1661 0.72 p < .01

8 1619 0.54 p < .01 1659 0.65 p < .01

10 1175 0.74 p < .01 1359 0.73 p < .01

Median 0.66 0.72

31

Table 32: DC DEA Spring Assessment

and 2009 DC-CAS Mathematics Results


Spring


3 3452 0.78 p < .01

4 3084 0.78 p < .01

5 3026 0.79 p < .01

6 2211 0.78 p < .01

7 2124 0.79 p < .01

8 2082 0.79 p < .01

10 1792 0.81 p < .01

Median 0.79



3 2581 0.76 p < .01

4 2357 0.75 p < .01

5 2377 0.76 p < .01

6 1740 0.74 p < .01

7 1721 0.73 p < .01

8 1703 0.70 p < .01

10 1420 0.78 p < .01

Median 0.75

32

Reading Classification Tables

Table 33: Kentucky Reading Sensitivity, Specificity & Area Under the Curve

Fall Winter

Sensitivity Specificity AUC Sensitivity Specificity AUC

Grade 3 0.87 0.76 0.88 0.88 0.74 0.88

Grade 4 0.89 0.78 0.90 0.84 0.72 0.86

Grade 5 0.84 0.78 0.89 0.84 0.72 0.87

Grade 6 0.82 0.76 0.85 0.82 0.77 0.87

Grade 7 0.85 0.76 0.87 0.82 0.72 0.85

Grade 8 0.80 0.80 0.87 0.84 0.71 0.85

Grade 10 0.83 0.79 0.88 0.78 0.68 0.78

Median 0.84 0.78 0.88 0.84 0.72 0.86

Disagg for Ethnicity: Black


Grade 3 0.89 0.58 0.82 0.91 0.59 0.84

Grade 4 0.89 0.68 0.83 0.88 0.59 0.83

Grade 5 0.93 0.63 0.88 0.91 0.62 0.91

Grade 6 0.82 0.63 0.78 0.89 0.62 0.86

Grade 7 0.89 0.70 0.86 0.85 0.60 0.82

Grade 8 0.79 0.78 0.83 0.86 0.61 0.81

Median 0.89 0.66 0.83 0.89 0.61 0.84

33

Table 34: DC Reading Sensitivity, Specificity & Area Under the Curve

Fall Winter


Grade 3 0.90 0.64 0.86 0.87 0.69 0.85

Grade 4 0.88 0.63 0.84 0.89 0.67 0.86

Grade 5 0.90 0.63 0.86 0.86 0.68 0.85

Grade 6 0.79 0.75 0.85 0.75 0.74 0.84

Grade 7 0.91 0.62 0.86 0.88 0.65 0.86

Grade 8 0.92 0.58 0.85 0.92 0.59 0.87

Grade 10 0.89 0.64 0.85 0.92 0.66 0.87

Median 0.90 0.63 0.85 0.88 0.67 0.86



Grade 3 0.90 0.58 0.83 0.87 0.64 0.84

Grade 4 0.90 0.55 0.81 0.89 0.59 0.83

Grade 5 0.90 0.57 0.83 0.87 0.64 0.83

Grade 6 0.79 0.72 0.83 0.76 0.71 0.82

Grade 7 0.92 0.57 0.85 0.88 0.59 0.84

Grade 8 0.93 0.51 0.83 0.92 0.52 0.84

Grade 10 0.89 0.61 0.83 0.92 0.63 0.86

Median 0.90 0.57 0.83 0.88 0.63 0.84

Disagg for Ethnicity: Hispanic


Grade 3 0.93 0.56 0.84 0.90 0.58 0.82

Grade 4 0.79 0.58 0.78 0.85 0.60 0.82

Grade 5 0.94 0.64 0.90 0.87 0.64 0.87

Grade 6 0.82 0.61 0.84 0.73 0.69 0.81

Grade 7 0.87 0.65 0.85 0.79 0.74 0.88

Grade 8 0.92 0.66 0.88 0.87 0.71 0.91

Grade 10 0.85 0.58 0.82 0.87 0.69 0.83

Median 0.87 0.61 0.84 0.87 0.69 0.83

34

Mathematics Classification Tables

Table 35: Kentucky Math Sensitivity, Specificity & Area Under the Curve

Fall Winter


Grade 3 0.83 0.76 0.86 0.83 0.77 0.89

Grade 4 0.89 0.80 0.91 0.83 0.77 0.89

Grade 5 0.86 0.84 0.91 0.83 0.76 0.87

Grade 6 0.81 0.80 0.86 0.88 0.72 0.88

Grade 7 0.90 0.67 0.87 0.89 0.66 0.87

Grade 8 0.84 0.83 0.91 0.93 0.71 0.89

Grade 10 0.85 0.80 0.90 0.94 0.80 0.93

Median 0.85 0.80 0.90 0.88 0.76 0.89



Grade 3 0.93 0.56 0.82 0.91 0.60 0.86

Grade 4 0.89 0.64 0.83 0.89 0.63 0.86

Grade 5 0.86 0.68 0.86 0.78 0.74 0.87

Grade 6 0.81 0.70 0.81 0.89 0.62 0.86

Grade 7 0.89 0.56 0.83 0.88 0.62 0.84

Grade 8 0.84 0.70 0.83 0.92 0.53 0.83

Grade 11 0.92 0.50 0.90 1.00 0.75 1.00

Median 0.89 0.64 0.83 0.89 0.62 0.86

35

Table 36: DC Math Sensitivity, Specificity & Area Under the Curve

Fall Winter


Grade 3 0.91 0.57 0.84 0.92 0.60 0.86

Grade 4 0.90 0.63 0.85 0.91 0.66 0.87

Grade 5 0.94 0.61 0.87 0.93 0.66 0.90

Grade 6 0.93 0.59 0.86 0.89 0.65 0.87

Grade 7 0.91 0.61 0.84 0.91 0.65 0.87

Grade 8 0.88 0.64 0.85 0.91 0.66 0.88

Grade 10 0.84 0.61 0.79 0.90 0.66 0.86

Median 0.91 0.61 0.85 0.91 0.66 0.87



Grade 3 0.93 0.46 0.82 0.92 0.52 0.85

Grade 4 0.91 0.58 0.83 0.91 0.61 0.85

Grade 5 0.94 0.52 0.84 0.93 0.59 0.88

Grade 6 0.93 0.51 0.83 0.89 0.60 0.84

Grade 7 0.91 0.52 0.81 0.90 0.56 0.84

Grade 8 0.89 0.56 0.82 0.91 0.62 0.86

Grade 10 0.84 0.58 0.78 0.92 0.64 0.86

Median 0.91 0.52 0.82 0.91 0.60 0.85

Disagg for Ethnicity: Hispanic


Grade 3 0.86 0.50 0.77 0.88 0.54 0.81

Grade 4 0.86 0.55 0.83 0.91 0.62 0.85

Grade 5 0.92 0.54 0.85 0.93 0.61 0.91

Grade 6 0.91 0.60 0.87 0.89 0.70 0.91

Grade 7 0.83 0.69 0.86 0.86 0.74 0.89

Grade 8 0.89 0.65 0.88 0.89 0.66 0.91

Grade 10 0.83 0.53 0.75 0.92 0.54 0.82

Median 0.86 0.55 0.85 0.89 0.62 0.89
