colorado assessment summit_oct12


Presenter - John Cronin, Ph.D.

Contacting us: NWEA Main Number: 503-624-1951; E-mail: rebecca.moore@nwea.org

This PowerPoint presentation and recommended resources are available at our website: www.kingsburycenter.org

Considerations when using tests for teacher evaluation

Label each player as effective, partially effective, or ineffective

Avg. HR RBI SB

.309 5 54 7

.303 13 53 20

.271 4 30 7

.270 28 71 4

.260 16 58 3

.238 7 37 1

.217 5 28 0

Label each player as effective, partially effective, or ineffective

Avg. HR RBI SB

Rosario .309 5 54 7

Gonzales .303 13 53 20

Scuturo .271 4 30 7

Cudger .270 28 71 4

Helton .260 16 58 3

Hernandez .238 7 37 1

Rosario .217 5 28 0

Facts about baseball players

• If effective baseball players are those who hit .300, then 90% of baseball players are ineffective.

• If effective baseball players are those who are better-than-average hitters, then 50% are ineffective.

• A baseball player retains his job if he performs better than the available replacement.

• Most of the pool of available replacements are lousy baseball players.

Application to teaching

Don’t dismiss teachers for incompetence unless you know you can replace them with someone better.

Don’t identify more teachers for dismissal than you can support through remediation.

Don’t identify more teachers for dismissal than you can manage through the dismissal process.

Key requirements related to testing

• Assessment constitutes 50% of the evaluation.
• Statewide summative assessments are used for the subjects in which they are available; districts will be on their own for other subjects.
• The Colorado Growth Model is used with the statewide assessment.
• A measure of individually attributed or collectively attributed student growth is required.
• Local measures must be credible, valid (aligned), and reliable, and inferences from the measures must be supportable by evidence and logic.
• The law requires that the measures support consistent inferences.
• A rating of ineffective or partially effective can lead to loss of non-probationary status.
• If a value-added model is used, the model must be transparent enough to permit external evaluation.

Unique characteristics of the Colorado approach

• Student progress counts for 50% of the evaluation.

• Teachers are evaluated on both a “catch up” and “keep up” metric (at least on TCAP)

• The Colorado Growth Model will likely be used to evaluate progress (at least on TCAP)

Obvious possible issues

• The requirement that the assessment support inferences of teacher effectiveness opens a legal question.

• The credibility requirement is unique and has not yet been interpreted.

How tests are used to evaluate teachers and principals

Testing → Metric (growth or gain score) → Analysis (value-added effect size and/or ranking) → Evaluation (performance rating)

Expect consistent inconsistency!

Inconsistency occurs because of:

• Differences in test design.
• Differences in testing conditions.
• Differences in the models being applied to evaluate growth.

Inconsistency between tests

[Diagram: a test-retest design comparing two assessments (California STAR and NWEA MAP) across two occasions: Test 1 Time 1, Test 2 Time 1, Test 1 Time 2, Test 2 Time 2.]

The reliability problem – Inconsistency in testing conditions

[The same two-test, two-occasion diagram is repeated to illustrate inconsistency arising from testing conditions.]

The problem with spring-spring testing

[Timeline from 3/11 to 3/12: instruction from Teacher 1, the summer break, and instruction from Teacher 2 all fall between the two spring test administrations.]

Characteristics of value-added metrics

• Value-added metrics are inherently NORMATIVE.
• If below average = partially effective, then half of an average staff will be partially effective.
• Value-added metrics can't measure progress of the larger group over time (see the sketch below).
• Extreme performance is more likely to have alternate explanations.
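To make the normative point concrete, here is a minimal sketch in Python (illustrative simulated data, not figures from the presentation) showing that a percentile-based rating labels the same share of teachers as partially effective even when every teacher's value-added score improves:

import numpy as np

rng = np.random.default_rng(0)

def rate_normatively(value_added, cutoff_percentile=50):
    """Label teachers below the cutoff percentile as 'partially effective'."""
    cutoff = np.percentile(value_added, cutoff_percentile)
    return np.where(value_added < cutoff, "partially effective", "effective")

year1 = rng.normal(loc=0.0, scale=1.0, size=1000)   # an "average" staff
year2 = year1 + 2.0                                  # every teacher improves by 2 points

for label, scores in [("Year 1", year1), ("Year 2, everyone improved", year2)]:
    ratings = rate_normatively(scores)
    share = np.mean(ratings == "partially effective")
    print(f"{label}: {share:.0%} rated partially effective")
# Both lines print about 50%: a normative metric cannot register group-wide progress.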

Issues in the use of growth and value-added measures

“Among those who ranked in the top category on the TAKS reading test, more than 17% ranked among the lowest two categories on the Stanford. Similarly more than 15% of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.”

Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).

Teachers with growth scores in lowest and highest quintile over two years using NWEA’s Measures of Academic Progress

            Bottom quintile Y1&Y2    Top quintile Y1&Y2
Number      59/493                   63/493
Percent     12%                      13%

Year-to-year correlation: r = .64 (r² = .41)

Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010)
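As a rough illustration of what a year-to-year correlation near .64 implies for two-year quintile stability, the following sketch uses simulated teacher scores (not the NWEA sample; the noise level is chosen only to reproduce the target correlation):

import numpy as np

rng = np.random.default_rng(1)
n_teachers, target_r = 493, 0.64

true_effect = rng.normal(size=n_teachers)
noise_sd = np.sqrt(1 / target_r - 1)      # gives corr(year1, year2) ≈ target_r
year1 = true_effect + rng.normal(scale=noise_sd, size=n_teachers)
year2 = true_effect + rng.normal(scale=noise_sd, size=n_teachers)

print("observed r:", round(float(np.corrcoef(year1, year2)[0, 1]), 2))

def quintile(scores):
    # 0 = bottom quintile, 4 = top quintile
    cuts = np.percentile(scores, [20, 40, 60, 80])
    return np.searchsorted(cuts, scores)

q1, q2 = quintile(year1), quintile(year2)
print(f"bottom quintile both years: {np.mean((q1 == 0) & (q2 == 0)):.0%}")
print(f"top quintile both years:    {np.mean((q1 == 4) & (q2 == 4)):.0%}")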

Reliability of teacher value-added estimates

[Chart: Mathematics Growth Index Distribution by Teacher - Validity Filtered. Y-axis: average growth index score and range (roughly -12 to +12); teachers grouped into quintiles Q1 through Q5.]

Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.
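A minimal sketch of the computation behind this display, assuming a pandas DataFrame of student records with hypothetical columns teacher_id, growth_index, and valid_test (the actual NWEA data layout may differ):

import pandas as pd

def teacher_growth_summary(students: pd.DataFrame, min_students: int = 20) -> pd.DataFrame:
    """Average growth index and its standard error per teacher, keeping only
    valid tests and teachers with at least `min_students` students."""
    valid = students[students["valid_test"]]
    summary = (
        valid.groupby("teacher_id")["growth_index"]
        .agg(n="count", mean_growth="mean", sd="std")
    )
    summary = summary[summary["n"] >= min_students].copy()
    summary["std_error"] = summary["sd"] / summary["n"] ** 0.5
    return summary.sort_values("mean_growth")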

Range of teacher value-added estimates

Inconsistency between the Colorado Growth Model and other value-added approaches.

Issues with the Colorado Growth Model

• When applied to MAP it discards the advantages of a cross-grade scale and robust growth norms.

• It is a descriptive and not a causal model.

• As currently applied, it does not control for factors outside the teacher's influence that may affect student growth.

A brief commentary on the Colorado Growth Model

Its limitations

• It does not support inference.

• It does not take advantage of the useful characteristics of a vertical scale.

• It uses only prior scores and past testing history to evaluate growth.

A brief commentary on the Colorado Growth Model

Other limitations

• The model can’t be used for cross-state comparisons.

• The model is problematic for assessing long-term trends.

A finding of effectiveness or ineffectiveness is more defensible when it is arrived at by:

1. Two or more assessments of different designs.
2. Two or more models of different designs.
3. As many cases as possible.

Do not choose tests or models for local assessment in the hope that they will mimic the state assessment.

Potential Litigation Issues

The use of value-added data for high-stakes personnel decisions does not yet have a strong, coherent body of case law.

Expect litigation if value-added results are the lynchpin evidence for a teacher-dismissal case until a body of case law is established.

“The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies.” 

Ballou, D., Mokher, C. and Cavalluzzo, L. (2012) Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes.

Instability at the tails of the distribution

[Chart: value-added estimates for LA Times Teacher #1 and LA Times Teacher #2.]

“Significant evidence of bias plagued the value-added model estimated for the Los Angeles Times in 2010, including significant patterns of racial disparities in teacher ratings both by the race of the student served and by the race of the teachers (see Green, Baker and Oluwole, 2012). These model biases raise the possibility that Title VII disparate impact claims might also be filed by teachers dismissed on the basis of their value-added estimates. 

Additional analyses of the data, including richer models using additional variables mitigated substantial portions of the bias in the LA Times models (Briggs & Domingue, 2010).”

Baker, B. (2012, April 28). If it’s not valid, reliability doesn’t matter so much! More on VAM-ing & SGP-ing Teacher Dismissal.

Possible racial bias in models

Issues in the use of growth and value-added measures

Lack of random assignment

The use of a value-added model assumes that the school doesn’t add a source of variation that isn’t controlled for in the model.

For example, young teachers are assigned disproportionate numbers of students with poor discipline records.

Measurement Issues

Moving from the model to the teacher rating

Translating ranked data to ratings - principles

• There is no “science” per se around translating a ranking to a rating. If you call a bottom-40% teacher ineffective, that is a judgment.

• The rating process can be politicized.

• The process is easy to over-engineer.

New York Rating System

• 60 points assigned from classroom observation
• 20 points assigned from the state assessment
• 20 points assigned from local assessment
• A score of 64 or less is rated ineffective (see the sketch below).
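A minimal sketch of the composite arithmetic described above; it assumes the three components have already been converted to points (the state's conversion table, summarized below, handles that step) and encodes only the 64-point rule stated here:

def composite_rating(observation_pts: int, state_growth_pts: int, local_pts: int) -> str:
    """Observation 0-60, state growth 0-20, local measures 0-20."""
    total = observation_pts + state_growth_pts + local_pts   # composite out of 100
    if total <= 64:
        return f"{total} points: rated ineffective"
    return f"{total} points: above the ineffective cutoff"

print(composite_rating(48, 8, 8))    # 64 -> rated ineffective
print(composite_rating(50, 8, 7))    # 65 -> above the cutoff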

[Conversion table: combined scores for every pairing of observational points (0-60) and growth-measure points (0-40), with both axes banded as Ineffective, Developing, Effective, or Highly Effective.]

Unintended Consequences?

• Many principals and teachers (including good ones) will seek schools or teaching assignments that they think will improve their results.

• Principals and teachers may game the system, inadvertently or intentionally.

• Many teachers will seek opportunities to avoid grades with standardized tests.

• Ranking metrics can discourage cooperation among principals and teachers – finding ways to reward teamwork and cooperation is important.

Case Study #1 - Mean value-added performance in mathematics by school – fall to spring


Case Study #1 - Mean spring and fall test duration in minutes by school


[Chart: mean value-added growth for students who took 10+ minutes longer on the spring test than on the fall test vs. all other students.]

Case Study #1 - Mean value-added growth by school and test duration

Differences in fall-spring test durations
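One practical screen suggested by this case study is to flag students whose spring test duration differs sharply from their fall duration before trusting their growth estimates. A minimal sketch, with assumed column names rather than NWEA's actual schema:

import pandas as pd

def flag_duration_changes(students: pd.DataFrame, threshold_minutes: float = 10.0) -> pd.DataFrame:
    """Mark students whose spring test duration differs from fall by 10+ minutes."""
    out = students.copy()
    out["duration_change"] = out["spring_duration_min"] - out["fall_duration_min"]
    out["duration_flag"] = out["duration_change"].abs() >= threshold_minutes
    return out

# Example with made-up records:
sample = pd.DataFrame({
    "student_id": [1, 2, 3],
    "fall_duration_min": [45, 60, 38],
    "spring_duration_min": [44, 31, 55],
})
print(flag_duration_changes(sample)[["student_id", "duration_change", "duration_flag"]])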

Case Study # 2

[Charts (Mathematics): distribution of students by fall vs. spring test duration (Spring < Fall, Spring = Fall, Spring > Fall; segments of 15%, 25%, and 61%) and growth index scores (roughly 0-6) by duration group.]

Differences in growth index score based on fall-spring test durations

Case Study # 2

[Charts: distribution of students by fall vs. spring test duration (Fall < Spring, Fall = Spring, Fall > Spring; segments of 42%, 33%, and 25%) and raw growth (roughly -5 to 0) by duration group.]

Differences in spring-fall test durations; differences in raw growth by spring-fall test duration

How much of summer loss is really summer loss?

Case Study # 2

[Chart: fall and spring test durations in minutes and growth index scores, plotted by school.]

Differences in fall-spring test duration (yellow/black) and differences in growth index scores (green) by school

Negotiated goals – Student Learning Objectives

• Negotiated goals (SLOs) are likely to be necessary in some subjects.

• It is difficult to set fair and reasonable goals for improvement absent norms or context.

• It is likely that some goals will be absurdly high and others way too low.

An alternate approach

• Give primacy to evaluator observation for judging teachers.
• Focus mandatory observations on low performers.
• Use assessments and value-added measurement to validate observations.
• Require reassessment when observations and assessment data are in significant misalignment (see the sketch below).
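A minimal sketch of the misalignment check in the last bullet, assuming an observation rating on a 1-4 scale and a value-added percentile per teacher; the thresholds are illustrative, not from the presentation:

def needs_reassessment(observation_rating: int, value_added_percentile: float) -> bool:
    """Flag teachers whose observation rating and value-added result point in
    strongly different directions (illustrative thresholds)."""
    strong_obs_weak_va = observation_rating >= 3 and value_added_percentile <= 20
    weak_obs_strong_va = observation_rating <= 2 and value_added_percentile >= 80
    return strong_obs_weak_va or weak_obs_strong_va

print(needs_reassessment(4, 15))   # True: strong observations, weak value-added
print(needs_reassessment(2, 50))   # False: no significant misalignment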

Possible legal issues

• Title VII of the Civil Rights Act of 1964 – Disparate impact of sanctions on a protected group.

• State statutes that provide tenure and other related protections to teachers.

• Challenges to a finding of “incompetence” stemming from the growth or value-added data.

Recommendations

• Embrace the formative advantages of growth measurement as well as the summative.

• Create comprehensive evaluation systems with multiple measures of teacher effectiveness (Rand, 2010)

• Select measures as carefully as value-added models.
• Use multiple years of student achievement data.
• Understand the issues and the tradeoffs.

Presenter - John Cronin, Ph.D.

Contacting us: NWEA Main Number: 503-624-1951; E-mail: rebecca.moore@nwea.org

The presentation and recommended resources are available at our website: www.kingsburycenter.org

Thank you for attending this event
