validity in the context of high-stakes accountability? rebecca holcombe june 24, 2015 johanna...

Validity in the Context of High-Stakes Accountability?

Rebecca Holcombe, June 24, 2015

Johanna Bandler

American Psychological Association:

“Measurement validity simply means whether a test provides useful information for a particular purpose.”

State purposes for which we want useful information:

• Monitoring equity and quality

• Identifying schools that need intervention

• Identifying promising practices

Only part of what we want students to know is tested.

Under high stakes, schools are incentivized to focus narrowly.

Rating based on a subset of goals: is it enough?

[Figure: What we want students to learn; what is measured by local assessments; what is measured for accountability purposes]

Rating schools: What does a single measure indicate?

Narrowing instruction to high-stakes subjects?

Scores improved in both math and science.

When we see this gain pattern, should we celebrate or worry?

This is not VT data. Credit: Jennifer Jennings, NYU

Narrowing instruction within subjects to content tested for high-stakes purposes?

[Figure: 2011 High School Math Mean Scale Scores by School Size; schools marked as top quartile, middle half, or bottom quartile]

Are scores reliable enough to “identify” the “right” schools?

[Figure: 2011 and 2012 High School Math Mean Scale Scores by School Size; 2012 colors reflect 2011 status]
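A minimal simulation sketch of why the school-size charts raise this question, under loudly hypothetical assumptions: every school has the same true mean, student scores are independent normal draws (SD 1), and there are 100 schools with 15 tested students and 100 with 300. None of these numbers are Vermont data, and the function names are illustrative only. Because no school is truly better than another, any school that lands in the top or bottom quartile got there by chance.

```python
import random
import statistics

def simulate_school_means(sizes, rng, true_mean=0.0, student_sd=1.0):
    """Observed mean score for each school in one hypothetical testing year.

    Assumption: every school has the SAME true mean, so all differences in
    observed means come from which students happened to be tested.
    """
    return [
        statistics.fmean([rng.gauss(true_mean, student_sd) for _ in range(n)])
        for n in sizes
    ]

def quartile_labels(means):
    """Label each school 'top', 'middle', or 'bottom' by its rank among all schools."""
    q1, _, q3 = statistics.quantiles(means, n=4)
    return ["top" if m >= q3 else "bottom" if m <= q1 else "middle" for m in means]

def extreme_quartile_share(sizes, labels):
    """For each school size, the fraction of schools in the top or bottom quartile."""
    shares = {}
    for size in sorted(set(sizes)):
        group = [lab for lab, s in zip(labels, sizes) if s == size]
        shares[size] = sum(lab != "middle" for lab in group) / len(group)
    return shares

def top_quartile_persistence(year1, year2):
    """Fraction of year-1 top-quartile schools that are top-quartile again in year 2."""
    top = [i for i, lab in enumerate(year1) if lab == "top"]
    return sum(year2[i] == "top" for i in top) / len(top)

if __name__ == "__main__":
    rng = random.Random(2011)             # fixed seed so runs are repeatable
    sizes = [15] * 100 + [300] * 100      # hypothetical: 100 small, 100 large schools
    y1 = quartile_labels(simulate_school_means(sizes, rng))
    y2 = quartile_labels(simulate_school_means(sizes, rng))
    print(extreme_quartile_share(sizes, y1))  # small schools fill most extreme slots
    print(top_quartile_persistence(y1, y2))   # well below 1.0 despite identical true means
```

With equal true means, the year-1 top quartile is mostly small schools, and many of them fall out of it in year 2. The sketch ignores real differences between schools and measurement error in individual scores, so it isolates only the sampling problem.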

The problem of small “n”s: Are we identifying the right schools?

How many students need to take the test to get reliable school level results?
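One standard way to frame this question is the standard error of a school's mean score, SE = SD / √n, which shrinks only with the square root of the number of tested students; inverting it gives n = (SD / target SE)². The sketch below assumes a hypothetical student-level SD of 80 scale-score points and treats tested students as independent random draws, ignoring measurement error in each student's score. The function names are illustrative, not from any SBAC or Vermont tool.

```python
import math

def standard_error_of_mean(student_sd: float, n_students: int) -> float:
    """Sampling standard error of a school's mean score: SD / sqrt(n)."""
    if n_students < 1:
        raise ValueError("need at least one tested student")
    return student_sd / math.sqrt(n_students)

def students_needed(student_sd: float, target_se: float) -> int:
    """Smallest n whose standard error is at or below target_se: ceil((SD / SE)^2)."""
    return math.ceil((student_sd / target_se) ** 2)

if __name__ == "__main__":
    SD = 80.0  # hypothetical student-level SD in scale-score points
    for n in (10, 25, 100, 400):
        # a rough 95% band around a school mean is about +/- 2 standard errors
        print(n, round(standard_error_of_mean(SD, n), 1))  # 25.3, 16.0, 8.0, 4.0
    print(students_needed(SD, target_se=5.0))  # 256
```

Quadrupling the number of tested students only halves the standard error, which is why results for small schools can swing so much from year to year.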

Assuming scores are reliable, can we trust proficiency cut scores?

1 student is 8% of total

Wow! Increase of 33% proficient!

Strong increase of 6.6, but does it feel like it doubled?

Assuming scores are reliable, can we trust proficiency cut scores?

1 student is 7% of total

1 student is 2.5% of total
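The "1 student is X% of total" figures follow from 100 / n: roughly 12 or 13 tested students for 8%, 14 or 15 for 7%, and exactly 40 for 2.5%. The sketch below checks that arithmetic, then uses a hypothetical 12-student cohort and a hypothetical cut score of 2500, with several students sitting just below the cut, to show how a uniform gain of 6.6 points can double percent proficient. The scores and cut are invented for illustration; they are not the data behind these slides.

```python
def percent_per_student(n_students: int) -> float:
    """Percentage points of 'percent proficient' that one student represents."""
    return 100.0 / n_students

def percent_proficient(scores, cut):
    """Percent of students scoring at or above the cut score."""
    return 100.0 * sum(score >= cut for score in scores) / len(scores)

if __name__ == "__main__":
    for n in (12, 13, 14, 40):
        print(n, round(percent_per_student(n), 1))  # 8.3, 7.7, 7.1, 2.5

    CUT = 2500  # hypothetical cut score
    # Hypothetical cohort: four students sit a few points below the cut.
    year1 = [2430, 2455, 2470, 2485, 2494, 2495, 2497, 2498, 2510, 2530, 2550, 2580]
    year2 = [score + 6.6 for score in year1]  # hypothetical: every student gains 6.6
    print(round(percent_proficient(year1, CUT), 1))  # 33.3
    print(round(percent_proficient(year2, CUT), 1))  # 66.7
```

Every student gained the same 6.6 points, yet percent proficient doubled because the cut happened to fall just above a cluster of students; with a cut of 2520, the same gain would leave percent proficient unchanged.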

Assuming we trust cut scores, is “predictive validity” of college readiness a function of “readiness” or sampling bias?

Papay, Murnane and Willett (2010) compared the probability of graduating for students just below and just above the cut score.

Compared to peers who “just pass,” low-income, urban students who “just fail” the 10th-grade MCAS:

• Have an 8 percentage point lower probability of graduating on time

• Have a 4 percentage point greater probability of dropping out in the year after initial testing (Papay, Murnane and Willett, 2010)

No such effects were observed for suburban students (regardless of income) or for wealthier urban students.
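Comparing students who "just pass" with those who "just fail" follows the logic of a regression-discontinuity design: near the cut, students on either side are assumed to be similar except for the label they receive. The sketch below is a deliberately simplified version of that idea, a difference in on-time graduation rates within a hypothetical bandwidth around a hypothetical cut. It is not the estimator Papay, Murnane and Willett used; full regression-discontinuity analyses typically also model how outcomes trend with score on each side of the cut. The `Student` record, the records, and the bandwidth are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Student:
    score: float      # hypothetical test score
    graduated: bool   # hypothetical on-time graduation indicator

def graduation_rate(flags):
    """Percent of True values in a list of graduation indicators."""
    return 100.0 * sum(flags) / len(flags)

def just_pass_minus_just_fail(students, cut, bandwidth):
    """Graduation-rate gap (percentage points): just above the cut minus just below.

    Only students within `bandwidth` points of the cut are compared. This is a
    simple difference in means, not a full regression-discontinuity estimate.
    """
    above = [s.graduated for s in students if cut <= s.score < cut + bandwidth]
    below = [s.graduated for s in students if cut - bandwidth <= s.score < cut]
    if not above or not below:
        raise ValueError("no students within the bandwidth on one side of the cut")
    return graduation_rate(above) - graduation_rate(below)

if __name__ == "__main__":
    records = [  # hypothetical records; 210 and 235 fall outside the bandwidth
        Student(210, False), Student(218, True), Student(218, False),
        Student(219, False), Student(219, True), Student(220, True),
        Student(220, True), Student(221, True), Student(221, False),
        Student(235, True),
    ]
    print(just_pass_minus_just_fail(records, cut=220, bandwidth=2))  # 25.0
```

With so few hypothetical records, the number only demonstrates the mechanics; a positive gap in real data is the kind of effect the study reports for low-income urban students.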

Is what we are measuring the impact of schools on learning?

Jurisdiction | % of 4th graders scoring at or above "Proficient" on 2013 NAEP
Minnesota | 59.4%
New Hampshire | 58.7%
Massachusetts | 58.4%
Indiana | 51.8%
Vermont | 51.5%


Jurisdiction | % of 4th graders scoring at or above "Proficient" on 2013 NAEP | Median household income (2-year average, 2012-13) | % of 25-34-year-olds with some kind of postsecondary degree (2010 census)
Minnesota | 59.4% | $61,800.00 | 49.8%
New Hampshire | 58.7% | $70,063.00 | 46.0%
Massachusetts | 58.4% | $63,772.19 | 54.3%
Indiana | 51.8% | $48,690.82 | 36.1%
Vermont | 51.5% | $55,615.76 | 44.5%

Wow, Indiana!


Jurisdiction | % of 4th graders scoring at or above "Proficient" on 2013 NAEP | Median household income (2-year average, 2012-13) | % of 25-34-year-olds with some kind of postsecondary degree (2010 census) | Inclusion rate, students with disabilities
Minnesota | 59.4% | $61,800.00 | 49.8% | 84%
New Hampshire | 58.7% | $70,063.00 | 46.0% | 83%
Massachusetts | 58.4% | $63,772.19 | 54.3% | 88%
Indiana | 51.8% | $48,690.82 | 36.1% | 88%
Vermont | 51.5% | $55,615.76 | 44.5% | 93%

Given this range, how do we understand results?

Reliability and New Assessments

“You’re asking people still, even with the best of rubrics and evidence and training, to make judgments about complex forms of cognition. The more we go towards the kinds of interesting thinking and problems and situations that tend to be more about open-ended answers, the harder it is to get objective agreement in scoring.”

- James Pellegrino (Smarter Balanced Technical Advisory Committee, quoted in the New York Times, 6/22/15)

Closing thought:

“Setting absurd standards and then announcing massive failures has undermined public support for public schools. . . We are dismantling public school systems whose problems are basically the problems of racial and economic polarization, segregation and economic disinvestment.” (Gary Orfield, 2014)

Summary: VT takeaways

1. Assuming we want to rate schools and apply sanctions based on student mastery of a subset of important content, skills and item formats, we may not be able to distinguish between schools where more learning has taken place and schools where students have learned more of the tested content and formats at the expense of other valued learning.

2. Assuming we are comfortable with evaluating based on a subset of goals, scores may not be reliable enough to “identify” the “right” schools.

3. Assuming scores are reliable, performance reporting categories may (and probably do) distort underlying patterns of learning.

4. Assuming we trust scores and performance categories, what we are measuring may not be the impact of schools on learning.

Resources:

• Memo to SBAC on Performance Categories: http://education.vermont.gov/documents/VT_SBAC-Governing-States_Performance-Categories_11_2014.pdf

• Memo to parents and caregivers on SBAC: http://education.vermont.gov/documents/RH_Letter%20to%20Parents%20and%20Caregivers_SBAC_Another%20Measure%20of%20Learning_3_17_2015.pdf

• Memo to schools on SBAC: http://education.vermont.gov/documents/RH_Memo%20to%20Supts%20Principals_Keeping%20Perspective%20SBAC_3_23_2015.pdf

• Vermont State Board of Education Statement and Resolution on Assessment and Accountability: http://education.vermont.gov/documents/EDU-SBE_AssmntAcct_Adpted081914.pdf

• Letter to parents and caregivers about the limitations of NCLB: http://education.vermont.gov/documents/EDU-Letter_to_parents_and_caregivers_AOE_8_8_14.pdf


Partial Bibliography:

Darling-Hammond, Linda; Haertel, Edward; & Pellegrino, James. (2014). Making good use of new assessments: Interpreting and using scores from the Smarter Balanced Assessment Consortium. Smarter Balanced Assessment Consortium. http://education.vermont.gov/documents/EDU-WhitePaper-Making_Good_Use-of_New_Assessments.pdf

Geller, Wendy & Bailey, Glenn. VT Agency of Education Data and Research Work Group.

Ho, Andrew Dean. (2008). The problem with proficiency: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37(6), 351.

Hollingshead, L. & Childs, R. A. (2011). Reporting the percentage of students above a cut score: The effect of group size. Educational Measurement: Issues and Practice, 30(1), 36–43.

Orfield, Gary. (2014). A new civil rights agenda for American education. Educational Researcher, August/September 2014, 286.

Papay, John P.; Murnane, Richard J.; & Willett, John B. (2010). The consequences of high school exit examinations for low-performing urban students: Evidence from Massachusetts. Educational Evaluation and Policy Analysis, 32(1), 5–23.
