colorado assessment summit_oct12

Presenter - John Cronin, Ph.D. Contacting us: NWEA Main Number: 503-624-1951 E-mail: [email protected] This PowerPoint presentation and recommended resources are available at our website: www.kingsburycenter.org Considerations when using tests for teacher evaluation

Upload: john-cronin

Post on 15-Jan-2015


TRANSCRIPT

Page 1: Colorado assessment summit_oct12

Presenter - John Cronin, Ph.D.

Contacting us: NWEA Main Number: 503-624-1951 E-mail: [email protected]

This PowerPoint presentation and recommended resources are available at our website: www.kingsburycenter.org

Considerations when using tests for teacher evaluation

Page 2: Colorado assessment summit_oct12

Label each player as effective, partially effective, or ineffective

Avg.   HR   RBI   SB
.309    5    54    7
.303   13    53   20
.271    4    30    7
.270   28    71    4
.260   16    58    3
.238    7    37    1
.217    5    28    0

Page 3: Colorado assessment summit_oct12

Label each player as effective, partially effective, or ineffective

Player      Avg.   HR   RBI   SB
Rosario     .309    5    54    7
Gonzales    .303   13    53   20
Scuturo     .271    4    30    7
Cudger      .270   28    71    4
Helton      .260   16    58    3
Hernandez   .238    7    37    1
Rosario     .217    5    28    0

Page 4: Colorado assessment summit_oct12

Facts about baseball players

• If effective baseball players hit .300, then 90% of baseball players are ineffective.

• If effective baseball players are better-than-average hitters, then 50% are ineffective.

• A baseball player retains his job if he performs better than the available replacement.

• Most of the pool of available replacements are lousy baseball players.
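The first two bullets can be tried against the seven stat lines from the earlier slide. The sketch below is only an illustration: the function name `share_ineffective` and the use of the group mean as the "average" cut are assumptions made here, not part of the presentation.

```python
from statistics import mean

# Batting averages from the seven stat lines on the earlier slide.
AVERAGES = [0.309, 0.303, 0.271, 0.270, 0.260, 0.238, 0.217]

def share_ineffective(averages, cut):
    """Fraction of players whose average falls below the cut."""
    below = [a for a in averages if a < cut]
    return len(below) / len(averages)

# Rule 1 (absolute): "effective" means hitting .300 or better.
absolute_share = share_ineffective(AVERAGES, 0.300)

# Rule 2 (relative): "effective" means at or above the group mean
# (an assumed stand-in for "better than average").
relative_share = share_ineffective(AVERAGES, mean(AVERAGES))

print(f"Below .300: {absolute_share:.0%}")
print(f"Below group mean: {relative_share:.0%}")
```

With this small group, five of the seven players fall below the absolute cut and three of the seven fall below the relative cut, so the label a player receives depends on the rule chosen as much as on the player.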

Page 5: Colorado assessment summit_oct12

Application to teaching

Don’t dismiss teachers for incompetence unless you know you can replace them with someone better.

Don’t identify more teachers for dismissal than you can support through remediation.

Don’t identify more teachers for dismissal than you can manage through the dismissal process.

Page 6: Colorado assessment summit_oct12

Key requirements related to testing

• Assessment constitutes 50% of the evaluation.

• Statewide summative assessments for subjects in which available. Districts will be on their own for other subjects.

• Use of the Colorado Growth Model with statewide assessment.

• A measure of individually attributed or collectively attributed student growth.

• Local measure must be credible, valid (aligned), reliable, and inferences from the measure must be supportable by evidence and logic.

• The law requires that the measures should support consistent inferences.

• Rating of ineffective or partially effective can lead to loss of non-probationary status.

• If a value-added model is used the model must be transparent enough to permit external evaluation.

Page 7: Colorado assessment summit_oct12

Unique characteristics of the Colorado approach

• Student progress counts for 50% of the evaluation.

• Teachers are evaluated on both a “catch up” and “keep up” metric (at least on TCAP)

• The Colorado Growth Model will likely be used to evaluate progress (at least on TCAP)

Page 9: Colorado assessment summit_oct12

Obvious possible issues

• The requirement that the assessment support inferences of teacher effectiveness opens a legal question.

• The credibility requirement is unique and has not been interpreted.

Page 10: Colorado assessment summit_oct12

How tests are used to evaluate teachers and principals

1. Testing

2. Metric (Growth or Gain Score)

3. Analysis (Value Added Effect Size and/or ranking)

4. Evaluation (Performance Rating)

Page 11: Colorado assessment summit_oct12

Expect consistent inconsistency!

Page 12: Colorado assessment summit_oct12

Inconsistency occurs because of:

• Differences in test design.

• Differences in testing conditions.

• Differences in models being applied to evaluate growth.

Page 13: Colorado assessment summit_oct12

Inconsistency between tests

California STAR NWEA MAP

Page 14: Colorado assessment summit_oct12

The reliability problem – Inconsistency in testing conditions

[Diagram: Test and Retest, with Test 1 and Test 2 each shown at Time 1 and Time 2]

Page 15: Colorado assessment summit_oct12

The reliability problem – Inconsistency in testing conditions

[Diagram: the Test 1 and Test 2 at Time 1 and Time 2 grid, shown three times]

Page 16: Colorado assessment summit_oct12

The problem with spring-spring testing

3/11 4/11 5/11 6/11 7/11 8/11 9/11 10/11 11/11 12/11 1/12 2/12 3/12

Teacher 1 Summer Teacher 2
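One way to read the timeline: a spring-to-spring gain covers three periods, and only one of them belongs to the teacher being evaluated. The sketch below uses made-up numbers purely for illustration; the period labels follow the slide, and every value is an assumption.

```python
# Made-up score changes for one student across the spring-to-spring window.
# The three periods follow the slide's timeline; the numbers are illustrative only.
changes = {
    "Teacher 1 (after the spring 2011 test)": 2.0,
    "Summer": -1.5,
    "Teacher 2 (up to the spring 2012 test)": 6.0,
}

# What a spring-to-spring comparison measures: the sum of all three periods.
spring_to_spring_gain = sum(changes.values())

# What happened while the student was actually with Teacher 2.
teacher_2_change = changes["Teacher 2 (up to the spring 2012 test)"]

print(f"Measured spring-to-spring gain: {spring_to_spring_gain}")
print(f"Change during Teacher 2's months: {teacher_2_change}")
```

Under these assumed values the measured gain and the change during Teacher 2's months differ, and the difference comes entirely from periods Teacher 2 did not teach.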

Page 19: Colorado assessment summit_oct12

Characteristics of value-added metrics

• Value-added metrics are inherently NORMATIVE.

• If below average = partially effective then half of the average staff will be partially effective.

• Value-added metrics can’t measure progress of the larger group over time.

• Extreme performance is more likely to have alternate explanations.
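The normative point can be illustrated with a small sketch. The scores are hypothetical and `percentile_ranks` is a helper written for this example, not a function from any growth model: if every teacher improves by the same amount, the ranks do not move, so a ranking alone cannot show that the whole group got better.

```python
def percentile_ranks(scores):
    """Percent of the group scoring strictly below each score."""
    n = len(scores)
    return [100 * sum(other < s for other in scores) / n for s in scores]

# Hypothetical growth scores for six teachers (illustrative values only).
year_1 = [2.0, 3.5, 4.0, 5.5, 6.0, 7.5]

# Suppose every teacher improves by the same amount the next year.
year_2 = [s + 1.5 for s in year_1]

ranks_1 = percentile_ranks(year_1)
ranks_2 = percentile_ranks(year_2)

# The group improved, but the normative ranks are unchanged,
# and half of the teachers still sit below the 50th percentile.
print(ranks_1 == ranks_2)
print(sum(r < 50 for r in ranks_2), "of", len(ranks_2), "below the 50th percentile")
```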

Page 20: Colorado assessment summit_oct12

Issues in the use of growth and value-added measures

“Among those who ranked in the top category on the TAKS reading test, more than 17% ranked among the lowest two categories on the Stanford. Similarly more than 15% of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.”

Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).

Page 21: Colorado assessment summit_oct12

Reliability of teacher value-added estimates

Teachers with growth scores in lowest and highest quintile over two years using NWEA’s Measures of Academic Progress

          Bottom quintile Y1&Y2   Top quintile Y1&Y2
Number    59/493                  63/493
Percent   12%                     13%

r = .64, r² = .41

Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010)
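The figures on this slide can be reproduced with a few lines of arithmetic. The sketch below uses only the counts and the correlation reported above; the 4% chance baseline is an added comparison (two independent 20% events), not a number from the presentation.

```python
# Counts and correlation reported on the slide.
TEACHERS = 493
BOTTOM_BOTH_YEARS = 59
TOP_BOTH_YEARS = 63
R = 0.64

bottom_pct = 100 * BOTTOM_BOTH_YEARS / TEACHERS  # share in the bottom quintile both years
top_pct = 100 * TOP_BOTH_YEARS / TEACHERS        # share in the top quintile both years
r_squared = R ** 2                               # share of variance in common across years

# Added comparison: if the two years were unrelated, the chance of landing
# in the same quintile twice would be 20% of 20%.
chance_pct = 100 * 0.2 * 0.2

print(f"Bottom quintile both years: {bottom_pct:.0f}%")
print(f"Top quintile both years: {top_pct:.0f}%")
print(f"r squared: {r_squared:.2f}")
print(f"Expected by chance alone: {chance_pct:.0f}%")
```

The observed 12% and 13% sit between the 4% expected if the two years were unrelated and the 20% that perfectly stable rankings would produce.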

Page 22: Colorado assessment summit_oct12

Range of teacher value-added estimates

[Figure: Mathematics Growth Index Distribution by Teacher - Validity Filtered. Vertical axis: Average Growth Index Score and Range, from -12.00 to 12.00. Teachers are grouped into quintiles Q1 through Q5.]

Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.
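The display described above can be approximated with a short sketch. It assumes the standard error is the standard error of the mean (sample standard deviation divided by the square root of the class size); the presentation does not say how NWEA computed it, and the function name `growth_interval` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

MIN_STUDENTS = 20  # the slide drops teachers with fewer than 20 students

def growth_interval(student_indexes):
    """Return (mean, low, high), where low and high are the mean minus and
    plus one standard error, or None when the teacher has too few students."""
    n = len(student_indexes)
    if n < MIN_STUDENTS:
        return None
    avg = mean(student_indexes)
    se = stdev(student_indexes) / sqrt(n)  # assumed: standard error of the mean
    return avg, avg - se, avg + se

# Hypothetical growth index scores for one teacher's 20 students.
sample = [0.0, 2.0] * 10
avg, low, high = growth_interval(sample)
print(f"mean {avg:.2f}, range {low:.2f} to {high:.2f}")

# A teacher with only 10 students is filtered out.
print(growth_interval([1.0] * 10))
```

Two teachers whose ranges overlap cannot be confidently ordered, which is why the width of the black line matters as much as the position of the green one.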

Page 24: Colorado assessment summit_oct12

Inconsistency among the Colorado Growth Model and other value-added approaches.

Page 26: Colorado assessment summit_oct12

Issues with the Colorado Growth Model

• When applied to MAP it discards the advantages of a cross-grade scale and robust growth norms.

• It is a descriptive and not a causal model.

• As currently applied it does not control for factors outside the teacher’s influence that may affect student growth.

Page 27: Colorado assessment summit_oct12

A brief commentary on the Colorado Growth Model

Its limitations

• It does not support inference.

• It does not take advantage of the useful characteristics of a vertical scale.

• It uses only prior scores and past testing history to evaluate growth.

Page 28: Colorado assessment summit_oct12

A brief commentary on the Colorado Growth Model

Other limitations

• The model can’t be used for cross-state comparisons.

• The model is problematic for assessing long-term trends.

Page 29: Colorado assessment summit_oct12

A finding of effectiveness or ineffectiveness is more defensible when it is arrived at by:

1. Two or more assessments of different designs.

2. Two or more models of different designs.

3. As many cases as possible.

It is not good to choose tests or models for local assessment in hopes that they will mimic the state assessment.

Page 30: Colorado assessment summit_oct12

Potential Litigation Issues

The use of value-added data for high-stakes personnel decisions does not yet have a strong, coherent body of case law.

Expect litigation if value-added results are the lynchpin evidence for a teacher-dismissal case until a body of case law is established.

Page 31: Colorado assessment summit_oct12

“The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies.” 

Ballou, D., Mokher, C. and Cavalluzzo, L. (2012) Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes.

Instability at the tails of the distribution

[Charts: LA Times Teacher #1 and LA Times Teacher #2]

Page 32: Colorado assessment summit_oct12

“Significant evidence of bias plagued the value-added model estimated for the Los Angeles Times in 2010, including significant patterns of racial disparities in teacher ratings both by the race of the student served and by the race of the teachers (see Green, Baker and Oluwole, 2012). These model biases raise the possibility that Title VII disparate impact claims might also be filed by teachers dismissed on the basis of their value-added estimates. 

Additional analyses of the data, including richer models using additional variables mitigated substantial portions of the bias in the LA Times models (Briggs & Domingue, 2010).”

Baker, B. (2012, April 28). If it’s not valid, reliability doesn’t matter so much! More on VAM-ing & SGP-ing Teacher Dismissal.

Possible racial bias in models

Page 33: Colorado assessment summit_oct12

Issues in the use of growth and value-added measures

Lack of random assignment

The use of a value-added model assumes that the school doesn’t add a source of variation that isn’t controlled for in the model.

e.g. Young teachers are assigned disproportionate numbers of students with poor discipline records.

Page 34: Colorado assessment summit_oct12

Measurement Issues

Moving from the model to the teacher rating

Page 35: Colorado assessment summit_oct12

Translating ranked data to ratings - principles

• There is no “science” per se around translating a ranking to a rating. If you call a bottom 40% teacher ineffective that is a judgment.

• The rating process can be politicized.

• The process is easy to over-engineer.

Page 36: Colorado assessment summit_oct12

New York Rating System

• 60 points assigned from classroom observation

• 20 points assigned from state assessment

• 20 points assigned from local assessment

• A score of 64 or less is rated ineffective.
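The point structure above can be written out as a small sketch. Only the three component caps and the cut of 64 come from the slide; the function names and the example teacher are hypothetical, and the cut points for the other rating bands are not given here, so the sketch does not assign them.

```python
# Component caps and the cut score as listed on the slide.
MAX_OBSERVATION = 60
MAX_STATE = 20
MAX_LOCAL = 20
INEFFECTIVE_MAX = 64

def composite_score(observation, state, local):
    """Add the three components after checking each against its cap."""
    parts = [(observation, MAX_OBSERVATION), (state, MAX_STATE), (local, MAX_LOCAL)]
    for points, cap in parts:
        if not 0 <= points <= cap:
            raise ValueError(f"{points} is outside the range 0 to {cap}")
    return observation + state + local

def is_ineffective(total):
    """True when the composite is at or below the slide's cut of 64."""
    return total <= INEFFECTIVE_MAX

# Hypothetical teacher: strong observation score, weak test-based scores.
total = composite_score(observation=50, state=8, local=6)
print(total, is_ineffective(total))
```

In this example a teacher who earns 50 of the 60 observation points still lands at the ineffective cut, which shows how much weight the 40 test-based points carry near the boundary.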

Page 37: Colorado assessment summit_oct12

[Matrix: rows are observational scores 0 to 60, columns are growth-measure scores 0 to 40]

Growth-measure bands, left to right: Ineffective (Growth Measures), Developing (Growth Measures), Effective (Growth Measures), Highly Effective (Growth Measures)

Growth | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Ineffective (Observational)
0 | 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
1 | 2 3 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
2 | 2 4 5 6 6 6 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
3 | 2 5 6 7 7 8 8 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
4 | 3 5 7 8 9 9 10 10 11 11 11 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15
5 | 3 6 8 9 10 11 11 12 12 13 13 14 14 14 14 15 15 15 15 16 16 16 16 16 16 16 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18
6 | 3 6 8 10 11 12 13 13 14 14 15 15 16 16 16 17 17 17 17 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 21
7 | 3 7 9 11 12 13 14 15 15 16 16 17 17 18 18 18 19 19 19 20 20 20 20 20 21 21 21 21 21 22 22 22 22 22 22 22 23 23 23 23 23
8 | 3 7 10 11 13 14 15 16 17 17 18 18 19 19 20 20 20 21 21 21 22 22 22 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 25
9 | 3 8 10 12 14 15 16 17 18 18 19 20 20 21 21 22 22 23 23 23 24 24 24 24 25 25 25 25 26 26 26 26 26 27 27 27 27 27 27 28 28
10 | 3 8 11 13 14 16 17 18 19 20 20 21 22 22 23 23 24 24 25 25 25 26 26 26 27 27 27 27 28 28 28 28 29 29 29 29 29 29 30 30 30
11 | 3 8 11 13 15 17 18 19 20 21 22 22 23 24 24 25 25 26 26 27 27 27 28 28 28 29 29 29 30 30 30 30 31 31 31 31 31 32 32 32 32
12 | 4 8 12 14 16 17 19 20 21 22 23 24 24 25 26 26 27 27 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 33 33 34 34 34 34
13 | 4 9 12 14 16 18 20 21 22 23 24 25 26 26 27 28 28 29 29 30 30 31 31 31 32 32 33 33 33 34 34 34 34 35 35 35 35 36 36 36 36
14 | 4 9 12 15 17 19 20 22 23 24 25 26 27 27 28 29 30 30 31 31 32 32 33 33 33 34 34 35 35 35 36 36 36 37 37 37 37 38 38 38 38
15 | 4 9 13 15 18 19 21 23 24 25 26 27 28 29 29 30 31 31 32 33 33 34 34 35 35 35 36 36 37 37 37 38 38 38 39 39 39 40 40 40 40

Developing (Observational)
16 | 4 9 13 16 18 20 22 23 25 26 27 28 29 30 31 31 32 33 33 34 35 35 36 36 37 37 37 38 38 39 39 39 40 40 40 41 41 41 42 42 42
17 | 4 9 13 16 19 21 23 24 25 27 28 29 30 31 32 33 33 34 35 35 36 37 37 38 38 39 39 39 40 40 41 41 42 42 42 43 43 43 44 44 44
18 | 4 10 14 17 19 21 23 25 26 28 29 30 31 32 33 34 35 35 36 37 37 38 38 39 40 40 41 41 41 42 42 43 43 44 44 44 45 45 45 46 46
19 | 4 10 14 17 20 22 24 26 27 28 30 31 32 33 34 35 36 36 37 38 39 39 40 40 41 42 42 43 43 43 44 44 45 45 46 46 46 47 47 47 48
20 | 4 10 14 17 20 22 24 26 28 29 31 32 33 34 35 36 37 38 38 39 40 41 41 42 42 43 43 44 45 45 45 46 46 47 47 48 48 48 49 49 49
21 | 4 10 14 18 21 23 25 27 29 30 31 33 34 35 36 37 38 39 40 40 41 42 42 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 50 51 51
22 | 4 10 15 18 21 23 26 27 29 31 32 34 35 36 37 38 39 40 41 42 42 43 44 44 45 46 46 47 47 48 48 49 49 50 50 51 51 52 52 52 53
23 | 4 10 15 18 21 24 26 28 30 31 33 34 36 37 38 39 40 41 42 43 43 44 45 46 46 47 48 48 49 49 50 50 51 51 52 52 53 53 54 54 54
24 | 4 11 15 19 22 24 27 29 31 32 34 35 36 38 39 40 41 42 43 44 45 45 46 47 48 48 49 50 50 51 51 52 52 53 53 54 54 55 55 56 56
25 | 4 11 15 19 22 25 27 29 31 33 34 36 37 39 40 41 42 43 44 45 46 47 47 48 49 50 50 51 52 52 53 53 54 54 55 55 56 56 57 57 58
26 | 4 11 16 19 23 25 28 30 32 34 35 37 38 39 41 42 43 44 45 46 47 48 49 49 50 51 51 52 53 53 54 55 55 56 56 57 57 58 58 59 59
27 | 4 11 16 20 23 26 28 30 32 34 36 37 39 40 42 43 44 45 46 47 48 49 50 50 51 52 53 53 54 55 55 56 57 57 58 58 59 59 60 60 61
28 | 4 11 16 20 23 26 29 31 33 35 37 38 40 41 42 44 45 46 47 48 49 50 51 52 52 53 54 55 55 56 57 57 58 59 59 60 60 61 61 62 62
29 | 4 11 16 20 24 26 29 31 34 35 37 39 40 42 43 45 46 47 48 49 50 51 52 53 54 54 55 56 57 57 58 59 59 60 61 61 62 62 63 63 64
30 | 4 11 16 20 24 27 30 32 34 36 38 40 41 43 44 45 47 48 49 50 51 52 53 54 55 56 56 57 58 59 59 60 61 61 62 62 63 64 64 65 65

Effective (Observational)
31 | 4 11 17 21 24 27 30 32 35 37 39 40 42 43 45 46 47 49 50 51 52 53 54 55 56 57 57 58 59 60 61 61 62 63 63 64 64 65 66 66 67
32 | 4 11 17 21 25 28 30 33 35 37 39 41 43 44 46 47 48 50 51 52 53 54 55 56 57 58 59 59 60 61 62 62 63 64 64 65 66 66 67 68 68
33 | 4 12 17 21 25 28 31 33 36 38 40 42 43 45 46 48 49 50 52 53 54 55 56 57 58 59 60 61 61 62 63 64 64 65 66 66 67 68 68 69 69
34 | 4 12 17 21 25 28 31 34 36 38 40 42 44 46 47 49 50 51 53 54 55 56 57 58 59 60 61 62 63 63 64 65 66 66 67 68 68 69 70 70 71
35 | 4 12 17 22 25 29 32 34 37 39 41 43 45 46 48 49 51 52 53 55 56 57 58 59 60 61 62 63 64 64 65 66 67 68 68 69 70 70 71 72 72
36 | 4 12 17 22 26 29 32 35 37 39 41 43 45 47 49 50 52 53 54 55 57 58 59 60 61 62 63 64 65 66 66 67 68 69 69 70 71 72 72 73 74
37 | 4 12 17 22 26 29 32 35 38 40 42 44 46 48 49 51 52 54 55 56 58 59 60 61 62 63 64 65 66 67 68 68 69 70 71 71 72 73 74 74 75
38 | 4 12 18 22 26 30 33 36 38 40 43 45 46 48 50 52 53 55 56 57 58 60 61 62 63 64 65 66 67 68 69 69 70 71 72 73 73 74 75 75 76
39 | 4 12 18 22 26 30 33 36 39 41 43 45 47 49 51 52 54 55 57 58 59 61 62 63 64 65 66 67 68 69 70 71 71 72 73 74 75 75 76 77 77
40 | 4 12 18 23 27 30 33 36 39 41 44 46 48 50 51 53 55 56 57 59 60 61 63 64 65 66 67 68 69 70 71 72 73 73 74 75 76 77 77 78 79
41 | 4 12 18 23 27 31 34 37 39 42 44 46 48 50 52 54 55 57 58 60 61 62 63 65 66 67 68 69 70 71 72 73 74 75 75 76 77 78 78 79 80
42 | 5 12 18 23 27 31 34 37 40 42 45 47 49 51 53 54 56 58 59 60 62 63 64 66 67 68 69 70 71 72 73 74 75 76 76 77 78 79 80 80 81
43 | 5 12 18 23 27 31 34 37 40 43 45 47 49 51 53 55 57 58 60 61 63 64 65 66 68 69 70 71 72 73 74 75 76 77 78 78 79 80 81 82 82
44 | 5 12 18 23 28 31 35 38 41 43 46 48 50 52 54 56 57 59 60 62 63 65 66 67 69 70 71 72 73 74 75 76 77 78 79 80 80 81 82 83 84
45 | 5 13 19 24 28 32 35 38 41 44 46 48 51 53 54 56 58 60 61 63 64 66 67 68 69 71 72 73 74 75 76 77 78 79 80 81 82 82 83 84 85

Highly Effective (Observational)
46 | 5 13 19 24 28 32 35 39 41 44 47 49 51 53 55 57 59 60 62 63 65 66 68 69 70 71 73 74 75 76 77 78 79 80 81 82 83 83 84 85 86
47 | 5 13 19 24 28 32 36 39 42 45 47 49 52 54 56 58 59 61 63 64 66 67 69 70 71 72 74 75 76 77 78 79 80 81 82 83 84 85 85 86 87
48 | 5 13 19 24 29 32 36 39 42 45 47 50 52 54 56 58 60 62 63 65 66 68 69 71 72 73 74 76 77 78 79 80 81 82 83 84 85 86 87 87 88
49 | 5 13 19 24 29 33 36 40 43 45 48 50 53 55 57 59 61 62 64 66 67 69 70 71 73 74 75 77 78 79 80 81 82 83 84 85 86 87 88 89 89
50 | 5 13 19 24 29 33 37 40 43 46 48 51 53 55 57 59 61 63 65 66 68 69 71 72 74 75 76 77 79 80 81 82 83 84 85 86 87 88 89 90 90
51 | 5 13 19 25 29 33 37 40 43 46 49 51 54 56 58 60 62 64 65 67 69 70 72 73 74 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 92
52 | 5 13 19 25 29 33 37 41 44 47 49 52 54 56 58 61 62 64 66 68 69 71 72 74 75 77 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93
53 | 5 13 19 25 30 34 37 41 44 47 50 52 55 57 59 61 63 65 67 68 70 72 73 75 76 77 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94
54 | 5 13 20 25 30 34 38 41 44 47 50 53 55 57 60 62 64 66 67 69 71 72 74 75 77 78 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95
55 | 5 13 20 25 30 34 38 41 45 48 50 53 56 58 60 62 64 66 68 70 71 73 75 76 78 79 80 82 83 84 85 87 88 89 90 91 92 93 94 95 96
56 | 5 13 20 25 30 34 38 42 45 48 51 54 56 58 61 63 65 67 69 70 72 74 75 77 78 80 81 82 84 85 86 87 89 90 91 92 93 94 95 96 97
57 | 5 13 20 25 30 35 38 42 45 48 51 54 56 59 61 63 65 67 69 71 73 74 76 78 79 81 82 83 85 86 87 88 90 91 92 93 94 95 96 97 98
58 | 5 13 20 26 30 35 39 42 46 49 52 54 57 59 62 64 66 68 70 72 73 75 77 78 80 81 83 84 85 87 88 89 90 92 93 94 95 96 97 98 99
59 | 5 13 20 26 31 35 39 43 46 49 52 55 57 60 62 64 66 68 70 72 74 76 77 79 81 82 83 85 86 88 89 90 91 92 94 95 96 97 98 99 100
60 | 5 13 20 26 31 35 39 43 46 49 52 55 58 60 63 65 67 69 71 73 75 76 78 80 81 83 84 86 87 88 90 91 92 93 95 96 97 98 99 100 101

Page 39: Colorado assessment summit_oct12

Unintended Consequences?

• Many principals and teachers (including good ones) will seek schools or teaching assignments that they think will improve their results.

• Principals and teachers may game the system, inadvertently or intentionally.

• Many teachers will seek opportunities to avoid grades with standardized tests.

• Ranking metrics can discourage cooperation among principals and teachers – finding ways to reward teamwork and cooperation is important.

Page 40: Colorado assessment summit_oct12

Case Study #1 - Mean value-added performance in mathematics by school – fall to spring

[Chart: axis range -8.00 to 6.00]

Page 41: Colorado assessment summit_oct12

Case Study #1 - Mean spring and fall test duration in minutes by school

[Chart: axis range 0.00 to 90.00; series: Spring term, Fall term]

Page 42: Colorado assessment summit_oct12

Case Study #1 - Mean value-added growth by school and test duration

[Chart: axis range -10.00 to 8.00; series: Students taking 10+ minutes longer spring than fall, All other students]

Page 43: Colorado assessment summit_oct12

Case Study # 2

Differences in fall-spring test durations

[Chart, Mathematics: categories Spring < Fall, Spring = Fall, Spring > Fall; shares shown: 15%, 25%, 61%]

Differences in growth index score based on fall-spring test durations

[Chart, Mathematics: Growth Index axis from 0.0 to 6.0; categories Spring < Fall, Spring = Fall, Spring > Fall]

Page 44: Colorado assessment summit_oct12

Case Study # 2

How much of summer loss is really summer loss?

Differences in spring-fall test durations

[Chart: categories Fall < Spring, Fall = Spring, Fall > Spring; shares shown: 42%, 33%, 25%]

Differences in raw growth based on spring-fall test duration

[Chart: axis range -5.0 to 0.0; categories Fall < Spring, Fall = Spring, Fall > Spring]

Page 45: Colorado assessment summit_oct12

Case Study # 2

Differences in fall-spring test duration (yellow-black) and differences in growth index scores (green) by school

[Chart by School: Minutes axis from 0 to 200; Growth Index axis from 0.0 to 10.0; series: Growth Index, Fall test duration, Spring test duration]

Page 46: Colorado assessment summit_oct12

Negotiated goals – Student Learning Objectives

• Negotiated goals (SLOs) are likely to be necessary in some subjects.

• It is difficult to set fair and reasonable goals for improvement absent norms or context.

• It is likely that some goals will be absurdly high and others way too low.

Page 47: Colorado assessment summit_oct12

An alternate approach

• Give primacy to evaluator observation for judging teachers.

• Focus mandatory observations on low performers.

• Use assessments and value-added measurement to validate observations.

• Require reassessment when observations and assessment data are in significant misalignment.
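The last bullet can be sketched as a simple check. Everything below is an assumption made for illustration: the presentation does not define "significant misalignment", so the four-level scale, the two-level gap and the function name `needs_reassessment` are all hypothetical.

```python
# Hypothetical four-level scale shared by both sources of evidence.
LEVELS = ["ineffective", "developing", "effective", "highly effective"]

# Assumed definition of "significant misalignment": two or more levels apart.
MISALIGNMENT_GAP = 2

def needs_reassessment(observation_rating, test_based_rating):
    """True when the two ratings differ by at least MISALIGNMENT_GAP levels."""
    gap = abs(LEVELS.index(observation_rating) - LEVELS.index(test_based_rating))
    return gap >= MISALIGNMENT_GAP

# Observation says "effective" but the test-based rating says "ineffective".
print(needs_reassessment("effective", "ineffective"))

# Adjacent ratings are treated as agreement under this assumed rule.
print(needs_reassessment("effective", "developing"))
```

Under this reading the observation remains the primary judgment, and the test data only trigger a second look when the two disagree sharply.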

Page 48: Colorado assessment summit_oct12

Possible legal issues

• Title VII of the Civil Rights Act of 1964 – Disparate impact of sanctions on a protected group.

• State statutes that provide tenure and other related protections to teachers.

• Challenges to a finding of “incompetence” stemming from the growth or value-added data.

Page 49: Colorado assessment summit_oct12

Recommendations

• Embrace the formative advantages of growth measurement as well as the summative.

• Create comprehensive evaluation systems with multiple measures of teacher effectiveness (Rand, 2010)

• Select measures as carefully as value-added models.

• Use multiple years of student achievement data.

• Understand the issues and the tradeoffs.

Page 50: Colorado assessment summit_oct12

Presenter - John Cronin, Ph.D.

Contacting us: NWEA Main Number: 503-624-1951 E-mail: [email protected]

The presentation and recommended resources are available at our website: www.kingsburycenter.org

Thank you for attending this event