TRANSCRIPT
Validity in the Context of High-Stakes Accountability?
Rebecca Holcombe
June 24, 2015
Johanna Bandler
American Psychological Association:
“Measurement validity simply means whether a test provides useful information for a particular purpose.”
State purposes for which we want useful information:
• Monitoring equity and quality
• Identifying schools that need intervention
• Identifying promising practices
Only part of what we want students to know is tested.
Under high stakes, schools are incentivized to focus narrowly.
Rating based on a subset of goals: is it enough?
Diagram labels:
• What we want students to learn
• Measured by local assessments
• Measured for accountability purposes
Rating schools: What does a single measure indicate?
Narrowing instruction to high-stakes subjects?
Scores improved in both math and science.
When we see this gain pattern, should we celebrate or worry?
This is not VT data. Credit: Jennifer Jennings, NYU
Narrowing instruction within subjects to content tested for high stakes purposes?
Figure: 2011 High School Math Mean Scale Scores by School Size
Legend: top quartile of schools, middle half of schools, bottom quartile of schools
Are scores reliable enough to “identify” the “right” schools?
2011 High School Math Mean Scale Scores by School Size
2012 High School Math Mean Scale Scores by School Size (colors reflect 2011 status)
The problem of small “n”s: Are we identifying the right schools?
How many students need to take the test to get reliable school level results?
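One rough way to approach this question is through the standard error of a school's mean score, which shrinks with the square root of the number of students tested. The sketch below is illustrative only: it treats tested students as a simple random sample, and the student-level standard deviation (`ASSUMED_STUDENT_SD`) is a hypothetical value, not a figure from the assessment data shown here.

```python
import math


def standard_error_of_mean(student_sd: float, n_students: int) -> float:
    """Standard error of a school mean, assuming simple random sampling."""
    if n_students < 1:
        raise ValueError("n_students must be at least 1")
    return student_sd / math.sqrt(n_students)


# Hypothetical spread of individual scale scores (an assumption for illustration).
ASSUMED_STUDENT_SD = 30.0

for n in (10, 25, 100, 400):
    se = standard_error_of_mean(ASSUMED_STUDENT_SD, n)
    # An approximate 95% interval is the mean plus or minus 1.96 standard errors.
    print(f"n={n:>3}: standard error about {se:.1f}, "
          f"95% interval about +/-{1.96 * se:.1f} scale points")
```

Under these assumptions, quadrupling the number of tested students only halves the uncertainty, which is why means for small schools can swing from year to year without any real change in learning.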
Assuming scores are reliable, can we trust proficiency cut scores?
1 student is 8% of total
Wow! Increase of 33% proficient!
Strong increase of 6.6, but does it feel like it doubled?
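The contrast between a modest gain in mean score and a dramatic jump in percent proficient can be reproduced with a small made-up cohort. In the sketch below, the scores, the cut score (`CUT_SCORE`) and the 5-point gain are all hypothetical; they are not the data behind the slide. The cohort simply has several students sitting just below the cut.

```python
from statistics import mean


def percent_proficient(scores, cut_score):
    """Percentage of students scoring at or above the cut score."""
    return 100.0 * sum(s >= cut_score for s in scores) / len(scores)


CUT_SCORE = 40  # hypothetical proficiency cut

# Hypothetical cohort of 12 students, several clustered just below the cut.
year1 = [20, 25, 30, 34, 36, 37, 38, 39, 41, 45, 50, 55]
# Every student gains the same 5 points the following year.
year2 = [s + 5 for s in year1]

print(f"Mean gain: {mean(year2) - mean(year1):.1f} points")
print(f"Percent proficient, year 1: {percent_proficient(year1, CUT_SCORE):.1f}%")
print(f"Percent proficient, year 2: {percent_proficient(year2, CUT_SCORE):.1f}%")
```

In this toy cohort every student improves by the same amount, yet the percent proficient doubles because four students happen to cross the cut. The size of the reported jump depends on where the cut falls relative to the score distribution, not only on how much was learned.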
Assuming scores are reliable, can we trust proficiency cut scores?
1 student is 7% of total
1 student is 2.5% of total
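The "1 student is X% of total" callouts follow from simple arithmetic: in a group of n tested students, each student moves the percent proficient by 100/n percentage points. The group sizes used below (12, 14 and 40) are assumptions chosen because they round to the percentages on the slides; the actual group sizes are not given here.

```python
def one_student_share(n_tested: int) -> float:
    """Percentage points by which one student shifts a group's percent proficient."""
    if n_tested < 1:
        raise ValueError("n_tested must be at least 1")
    return 100.0 / n_tested


# Hypothetical group sizes (assumed, not taken from the slides).
for n in (12, 14, 40):
    print(f"{n} students tested: 1 student is about {one_student_share(n):.1f}% of total")
```

With a dozen students, a single student crossing the cut in either direction moves the reported rate by roughly 8 percentage points, so year-to-year changes of that size carry little information on their own.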
Assuming we trust cut scores, is “predictive validity” of college readiness a function of “readiness” or sampling bias?
Study compared the probability of graduating for students just below and just above the cut score.
Papay, Murnane and Willett (2010)
Assuming we trust cut scores, is “predictive validity” of college readiness a function of “readiness” or sampling bias?
Compared to peers who “just pass,” low-income, urban students who “just fail” the 10th grade MCAS:
• Have an 8 percentage point lower probability of graduating on time
• Have a 4 percentage point greater probability of dropping out in the year after initial testing
No such effects observed for suburban students (regardless of income) or wealthier urban students.
Papay, Murnane and Willett (2010)
Is what we are measuring the impact of schools on learning?
Jurisdiction | % of 4th graders scoring at or above "Proficient" on 2013 NAEP
Minnesota | 59.4%
New Hampshire | 58.7%
Massachusetts | 58.4%
Indiana | 51.8%
Vermont | 51.5%
Is what we are measuring the impact of schools on learning?
Jurisdiction | % of 4th graders scoring at or above "Proficient" on 2013 NAEP | Income of households (2-year-average medians, 2012-13) | % of 25-34 year olds with some kind of postsecondary degree, 2010 census
Minnesota | 59.4% | $61,800.00 | 49.8%
New Hampshire | 58.7% | $70,063.00 | 46.0%
Massachusetts | 58.4% | $63,772.19 | 54.3%
Indiana | 51.8% | $48,690.82 | 36.1%
Vermont | 51.5% | $55,615.76 | 44.5%
Wow, Indiana!
Is what we are measuring the impact of schools on learning?
Jurisdiction | % of 4th graders scoring at or above "Proficient" on 2013 NAEP | Income of households (2-year-average medians, 2012-13) | % of 25-34 year olds with some kind of postsecondary degree, 2010 census | Inclusion rate, students with disabilities
Minnesota | 59.4% | $61,800.00 | 49.8% | 84%
New Hampshire | 58.7% | $70,063.00 | 46.0% | 83%
Massachusetts | 58.4% | $63,772.19 | 54.3% | 88%
Indiana | 51.8% | $48,690.82 | 36.1% | 88%
Vermont | 51.5% | $55,615.76 | 44.5% | 93%
Given this range, how do we understand results?
Reliability and New Assessments
“You’re asking people still, even with the best of rubrics and evidence and training, to make judgments about complex forms of cognition. The more we go towards the kinds of interesting thinking and problems and situations that tend to be more about open-ended answers, the harder it is to get objective agreement in scoring.”
-James Pellegrino (SBAC TAC in the NYT, 6/22/15)
Closing thought:
“Setting absurd standards and then announcing massive failures has undermined public support for public schools. . . We are dismantling public school systems whose problems are basically the problems of racial and economic polarization, segregation and economic disinvestment.” (Gary Orfield, 2014)
Summary: VT takeaways
1. Assuming we want to rate schools and apply sanctions based on student mastery of a subset of important content, skills and item formats, we may not be able to distinguish between schools where more learning has taken place and schools where students have learned more of tested content and formats at the expense of other valued learning.
2. Assuming we are comfortable with evaluating based on a subset of goals, scores may not be reliable enough to “identify” the “right” schools.
3. Assuming scores are reliable, performance reporting categories may (and probably do) distort underlying patterns of learning.
4. Assuming we trust scores and performance categories, what we are measuring may not be the impact of schools on learning.
Resources:
Memo to SBAC on Performance Categories: http://education.vermont.gov/documents/VT_SBAC-Governing-States_Performance-Categories_11_2014.pdf
Memo to parents and caregivers on SBAC: http://education.vermont.gov/documents/RH_Letter%20to%20Parents%20and%20Caregivers_SBAC_Another%20Measure%20of%20Learning_3_17_2015.pdf
Memo to schools on SBAC: http://education.vermont.gov/documents/RH_Memo%20to%20Supts%20Principals_Keeping%20Perspective%20SBAC_3_23_2015.pdf
Vermont State Board of Education Statement and Resolution on Assessment and Accountability: http://education.vermont.gov/documents/EDU-SBE_AssmntAcct_Adpted081914.pdf
Letter to parents and caregivers about the limitations of NCLB: http://education.vermont.gov/documents/EDU-Letter_to_parents_and_caregivers_AOE_8_8_14.pdf
Partial Bibliography:
Darling-Hammond, Linda; Haertel, Edward; Pellegrino, James. (2014). Making good use of new assessments: Interpreting and using scores from the Smarter Balanced Assessment Consortium. Smarter Balanced Assessment Consortium. http://education.vermont.gov/documents/EDU-WhitePaper-Making_Good_Use-of_New_Assessments.pdf
Geller, Wendy and Bailey, Glenn. VT Agency of Education Data and Research Work Group.
Ho, Andrew Dean. (2008). The problem with proficiency: Limitations of statistics and policy under No Child Left Behind. Educational Researcher. 37, 6, p. 351.
Hollingshead, L. & Childs, R.A. (2011). Reporting the percentage of students above a cut score: The effect of group size. Educational Measurement: Issues and Practice, 30 (1), 36–43.
Orfield, Gary. (2014). A new civil rights agenda for American education. Educational Researcher, August/September 2014, p. 286.
Papay, John P.; Murnane, Richard J. & Willett, John B. (2010). The consequences of high school exit examinations for low-performing urban students: Evidence from Massachusetts. Educational Evaluation & Policy Analysis. Vol. 32 Issue 1, p. 5-23.