Validity in the Context of High-Stakes Accountability? Rebecca Holcombe, June 24, 2015. Johanna Bandler


Page 1:

Validity in the Context of High-Stakes Accountability?

Rebecca Holcombe
June 24, 2015
Johanna Bandler

Page 2:

American Psychological Association:

“Measurement validity simply means whether a test provides useful information for a particular purpose.”

Page 3:

State purposes for which we want useful information:

• Monitoring equity and quality

• Identifying schools that need intervention

• Identifying promising practices

Page 4:

Only part of what we want students to know is tested.

Under high stakes, schools are incentivized to focus narrowly.

Rating based on a subset of goals: is it enough?

[Diagram contrasting "What we want students to learn" with the narrower subsets "Measured by local assessments" and "Measured for accountability purposes."]

Page 5:

Rating schools: What does a single measure indicate?

Narrowing instruction to high-stakes subjects?

Scores improved in both math and science.

Page 6:

When we see this gain pattern, should we celebrate or worry?

This is not VT data. Credit: Jennifer Jennings, NYU.

Narrowing instruction within subjects to content tested for high-stakes purposes?

Page 7:

[Chart: 2011 High School Math Mean Scale Scores by School Size; schools grouped as top quartile, middle half, and bottom quartile.]

Are scores reliable enough to “identify” the “right” schools?

Page 8:

Are scores reliable enough to “identify” the “right” schools?

[Charts: 2011 and 2012 High School Math Mean Scale Scores by School Size; colors in the 2012 chart reflect each school's 2011 quartile status.]
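
A quick way to ask whether one year's ratings identify anything durable is to simulate two years of scores for schools that are, by construction, identical, and check how often the 2011 quartile labels carry over to 2012. The sketch below is an illustration added here, not part of the original slides; the score distribution, school count, and size range are assumptions, not Vermont data.

```python
import numpy as np

# Simulated check on the stability of school rankings across years (assumed values,
# not Vermont data): every school has the SAME true performance, so any change in
# quartile status from 2011 to 2012 is pure sampling noise.
rng = np.random.default_rng(1)
pop_mean, pop_sd = 1140, 30                      # assumed scale-score mean and SD
sizes = rng.integers(15, 300, size=200)          # assumed tested-group sizes per school

def simulate_year(sizes):
    """Mean scale score for each school in one simulated year."""
    return np.array([rng.normal(pop_mean, pop_sd, n).mean() for n in sizes])

y2011, y2012 = simulate_year(sizes), simulate_year(sizes)
top_2011 = y2011 >= np.quantile(y2011, 0.75)     # "top quartile" label in 2011
top_2012 = y2012 >= np.quantile(y2012, 0.75)
still_top = top_2012[top_2011].mean()
print(f"2011 top-quartile schools still top-quartile in 2012: {still_top:.0%}")
```

In this pure-noise world roughly a quarter of the "top" schools stay on top, which is what chance alone produces; real data sit somewhere between that floor and perfect persistence, and the smaller the school, the more its year-to-year status is dominated by this kind of noise.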

Page 9:

The problem of small “n”s: Are we identifying the right schools?

How many students need to take the test to get reliable school-level results?
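
The slide's question can be made concrete with a small simulation, again added here as an illustration (the scale-score mean and SD are assumptions, not Vermont parameters): draw repeated groups of students from one and the same population and watch how much the school-level mean wobbles at each group size.

```python
import numpy as np

# How much does a school mean move around from sampling noise alone?
# Assumed scale-score distribution; not Vermont parameters.
rng = np.random.default_rng(0)
pop_mean, pop_sd = 1140, 30
n_schools = 10_000                               # simulated schools per size

for n_students in (10, 25, 50, 100, 400):
    means = rng.normal(pop_mean, pop_sd, size=(n_schools, n_students)).mean(axis=1)
    print(f"n = {n_students:>3}: school means spread with SD ≈ {means.std():.1f} points "
          f"(theory: sigma/sqrt(n) = {pop_sd / np.sqrt(n_students):.1f})")
```

With 10 or 25 test-takers, two schools can differ by several scale-score points purely by chance, so a single year's mean cannot cleanly separate nearby schools; precision improves only with the square root of n.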

Page 10:

Assuming scores are reliable, can we trust proficiency cut scores?

1 student is 8% of total

Wow! Increase of 33% proficient!

Strong increase of 6.6, but does it feel like it doubled?
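
The slide's figures (1 student ≈ 8% of the group, a 33-point jump in percent proficient, a mean gain of 6.6) are consistent with a small class in which a few students sit just under the cut. Below is a hypothetical worked example added here; the cut score and the individual scores are invented solely so the arithmetic echoes those figures, not real data.

```python
# Hypothetical illustration of how a proficiency cut score exaggerates change in a
# small group. The cut score and student scores are invented, not real data.
CUT = 1150

year1 = [1120, 1128, 1135, 1140, 1144, 1145, 1147, 1149, 1152, 1156, 1160, 1166]
year2 = [s + 6.6 for s in year1]          # every student gains exactly 6.6 points

def pct_proficient(scores):
    return 100 * sum(s >= CUT for s in scores) / len(scores)

print(f"1 student = {100 / len(year1):.0f}% of the group")                   # ~8%
print(f"Year 1: {pct_proficient(year1):.0f}% proficient")                    # 33%
print(f"Year 2: {pct_proficient(year2):.0f}% proficient")                    # 67%
print(f"Mean scale-score gain: {sum(year2)/len(year2) - sum(year1)/len(year1):.1f}")  # 6.6
```

The class's proficiency rate doubles while every student gains the same 6.6 points: the category boundary plus the small denominator do the exaggerating, which is why the per-student weights on the next slide (7% and 2.5%) matter.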

Page 11:

Assuming scores are reliable, can we trust proficiency cut scores?

1 student is 7% of total

1 student is 2.5% of total

Page 12:

Study compared probability of graduating for students just below and just above the cut score.

Assuming we trust cut scores, is "predictive validity" of college readiness a function of "readiness" or sampling bias?

Papay, Murnane and Willett (2010)

Page 13:

Assuming we trust cut scores, is "predictive validity" of college readiness a function of "readiness" or sampling bias?

Compared to peers who “just pass,” low-income, urban students who “just fail” the 10th grade MCAS:

• Have an 8 percentage point lower probability of graduating on time

• Have a 4 percentage point greater probability of dropping out in the year after initial testing

No such effects were observed for suburban students (regardless of income) or for wealthier urban students.

Papay, Murnane and Willett (2010)
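
Papay, Murnane and Willett's design compares students whose scores land just below and just above the exit-exam cut, on the logic that those students are essentially interchangeable except for the pass/fail label. The sketch below only shows the shape of that comparison; the column names, cut score, and bandwidth are assumptions for illustration, not the study's data or code.

```python
import pandas as pd

# Sketch of a "just failed vs. just passed" comparison in the spirit of
# Papay, Murnane & Willett (2010). Cut score, bandwidth, and column names
# are assumptions for illustration, not the study's actual data or code.
CUT = 220            # assumed MCAS passing scale score
BANDWIDTH = 4        # assumed window of points on either side of the cut

def gap_at_the_cut(students: pd.DataFrame) -> float:
    """On-time graduation gap between narrow passers and narrow failers."""
    near = students[(students["mcas_score"] >= CUT - BANDWIDTH)
                    & (students["mcas_score"] < CUT + BANDWIDTH)].copy()
    near["just_passed"] = near["mcas_score"] >= CUT
    rates = near.groupby("just_passed")["graduated_on_time"].mean()
    return rates[True] - rates[False]

# The study reports a gap of roughly 8 percentage points for low-income urban
# students, and essentially no gap for suburban or wealthier urban students.
```

Because students on either side of the cut are near-identical, the subgroup pattern is what motivates the slide's question: the score "predicts" graduation only for some students, which suggests the prediction is not a clean measure of readiness alone.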

Page 14:

Is what we are measuring the impact of schools on learning?

Jurisdiction | % of 4th graders scoring at or above "Proficient" on 2013 NAEP
Minnesota | 59.4%
New Hampshire | 58.7%
Massachusetts | 58.4%
Indiana | 51.8%
Vermont | 51.5%

Page 15:

Is what we are measuring the impact of schools on learning?

Jurisdiction | % of 4th graders at or above "Proficient," 2013 NAEP | Median household income (2-year average, 2012-13) | % of 25-34 year olds with a postsecondary degree (2010 census)
Minnesota | 59.4% | $61,800.00 | 49.8%
New Hampshire | 58.7% | $70,063.00 | 46.0%
Massachusetts | 58.4% | $63,772.19 | 54.3%
Indiana | 51.8% | $48,690.82 | 36.1%
Vermont | 51.5% | $55,615.76 | 44.5%

Wow, Indiana!

Page 16:

Is what we are measuring the impact of schools on learning?

Jurisdiction | % of 4th graders at or above "Proficient," 2013 NAEP | Median household income (2-year average, 2012-13) | % of 25-34 year olds with a postsecondary degree (2010 census) | Inclusion rate, students with disabilities
Minnesota | 59.4% | $61,800.00 | 49.8% | 84%
New Hampshire | 58.7% | $70,063.00 | 46.0% | 83%
Massachusetts | 58.4% | $63,772.19 | 54.3% | 88%
Indiana | 51.8% | $48,690.82 | 36.1% | 88%
Vermont | 51.5% | $55,615.76 | 44.5% | 93%

Given this range, how do we understand results?
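
One way to read these three slides together is to line the columns up numerically. The sketch below is added here for illustration and is not an analysis from the original presentation; it only re-uses the figures in the tables above (income rounded to whole dollars) and asks how closely the NAEP proficiency ranking tracks income, adult degree attainment, and the inclusion rate.

```python
import numpy as np

# Figures transcribed from the tables above (MN, NH, MA, IN, VT); income rounded to dollars.
naep_pct   = [59.4, 58.7, 58.4, 51.8, 51.5]        # % of grade 4 at/above "Proficient," 2013 NAEP
income     = [61800, 70063, 63772, 48691, 55616]   # median household income, 2012-13
degree_pct = [49.8, 46.0, 54.3, 36.1, 44.5]        # % of 25-34 year olds with a postsecondary degree
inclusion  = [84, 83, 88, 88, 93]                  # inclusion rate for students with disabilities

for label, x in [("median income", income),
                 ("degree attainment", degree_pct),
                 ("inclusion rate", inclusion)]:
    r = np.corrcoef(naep_pct, x)[0, 1]
    print(f"Correlation of NAEP proficiency with {label}: {r:+.2f}")
```

With only five jurisdictions the exact values are fragile, but income and degree attainment move with proficiency while inclusion moves against it, which is the slides' point: the differences may say as much about who lives in a state and who sits for the test as about what its schools do.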

Page 17:

Reliability and New Assessments

“You’re asking people still, even with the best of rubrics and evidence and training, to make judgments about complex forms of cognition. The more we go towards the kinds of interesting thinking and problems and situations that tend to be more about open-ended answers, the harder it is to get objective agreement in scoring.”  

- James Pellegrino (SBAC Technical Advisory Committee, quoted in The New York Times, 6/22/15)
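
As a small illustration of what "objective agreement in scoring" means in practice (the ratings below are entirely made up, added only to show the calculation): two common checks are the exact-agreement rate and Cohen's kappa, which discounts agreement expected by chance.

```python
from collections import Counter

# Hypothetical ratings by two scorers on the same ten open-ended responses (0-3 rubric).
rater_a = [0, 1, 1, 2, 2, 2, 3, 3, 1, 2]
rater_b = [0, 1, 2, 2, 1, 2, 3, 2, 1, 2]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n       # exact agreement rate

# Agreement expected by chance, from each rater's score distribution.
pa, pb = Counter(rater_a), Counter(rater_b)
chance = sum((pa[k] / n) * (pb[k] / n) for k in set(pa) | set(pb))

kappa = (observed - chance) / (1 - chance)                          # Cohen's kappa
print(f"Exact agreement: {observed:.0%}, Cohen's kappa: {kappa:.2f}")
```

The more open-ended the task, the more both the exact-agreement rate and kappa tend to fall, which is the reliability cost Pellegrino is describing.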

Page 18:

Closing thought:

“Setting absurd standards and then announcing massive failures has undermined public support for public schools. . . We are dismantling public school systems whose problems are basically the problems of racial and economic polarization, segregation and economic disinvestment.” (Gary Orfield, 2014)

Page 19:

Summary: VT takeaways

1. Assuming we want to rate schools and apply sanctions based on student mastery of a subset of important content, skills and item formats, we may not be able to distinguish between schools where more learning has taken place and schools where students have simply learned more of the tested content and formats at the expense of other valued learning.

2. Assuming we are comfortable with evaluating based on a subset of goals, scores may not be reliable enough to “identify” the “right” schools.

3. Assuming scores are reliable, performance reporting categories may (and probably do) distort underlying patterns of learning.

4. Assuming we trust scores and performance categories, what we are measuring may not be the impact of schools on learning.

Page 20:

Resources:

Memo to SBAC on Performance Categories: http://education.vermont.gov/documents/VT_SBAC-Governing-States_Performance-Categories_11_2014.pdf

Memo to parents and caregivers on SBAC: http://education.vermont.gov/documents/RH_Letter%20to%20Parents%20and%20Caregivers_SBAC_Another%20Measure%20of%20Learning_3_17_2015.pdf

Memo to schools on SBAC: http://education.vermont.gov/documents/RH_Memo%20to%20Supts%20Principals_Keeping%20Perspective%20SBAC_3_23_2015.pdf

Vermont State Board of Education Statement and Resolution on Assessment and Accountability: http://education.vermont.gov/documents/EDU-SBE_AssmntAcct_Adpted081914.pdf

Letter to parents and caregivers about the limitations of NCLB: http://education.vermont.gov/documents/EDU-Letter_to_parents_and_caregivers_AOE_8_8_14.pdf

Page 21:

Partial Bibliography:

Darling-Hammond, Linda; Haertel, Edward; Pellegrino, James. (2014). Making good use of new assessments: Interpreting and using scores from the Smarter Balanced Assessment Consortium. Smarter Balanced Assessment Consortium. http://education.vermont.gov/documents/EDU-WhitePaper-Making_Good_Use-of_New_Assessments.pdf

Geller, Wendy and Bailey, Glenn. VT Agency of Education Data and Research Work Group.

Ho, Andrew Dean. (2008). The problem with proficiency: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37(6), p. 351.

Hollingshead, L. & Childs, R. A. (2011). Reporting the percentage of students above a cut score: The effect of group size. Educational Measurement: Issues and Practice, 30(1), 36–43.

Orfield, Gary. (2014). A new civil rights agenda for American education. Educational Researcher, August/September 2014, p. 286.

Papay, John P.; Murnane, Richard J. & Willett, John B. (2010). The consequences of high school exit examinations for low-performing urban students: Evidence from Massachusetts. Educational Evaluation and Policy Analysis, 32(1), 5–23.