Evaluating Pretest to Posttest Score Differences in CAP Science and Social Studies Assessments: How Much Growth is Enough?

February 2014

Dale Whittington, Ph.D. – Shaker Heights

Russ Brown, Ph.D. – CMSD

Denis Jarvinen, Ph.D. – Strategic Measurement and Evaluation, Inc.




Setting Standards for Performance

• Licensing Tests (e.g., Pharmacists): One Test and One Standard for Performance (Pass or Fail)

• State Accountability Testing (e.g., Ohio OAA): One Test, Multiple Standards (Below Basic, Basic, Proficient, Advanced)

• CAP Foundation Science and Social Studies Assessments: Two Tests, One Standard (Evaluating Growth: How Much?)


Looking at Performance Standards

Content-Based Standards

Goal of standard setting is to determine a level of knowledge and skill judged to be appropriate for the test purpose

Growth-Based Standards

Goal of standard setting is to use common statistical feature(s) of the data to set a criterion for acceptable performance


Three Statistic-Based Approaches for Evaluating Growth of Student Scores

• Using Effect Size

• Using The Score Distribution

• Using the Standard Error of Measurement


Describing and Comparing Approaches

• Data Points Needed

• Calculations Required

• Outcomes Using a Common Set of Student Data

• Advantages and Disadvantages


The Common Data Set



Effect Size for SLOs and Growth

Shaker Heights Schools

Prepared by Dale Whittington, Shaker Heights City School District

Ohio Middle Level Annual Conference, Columbus, Ohio

February 21, 2014


What is effect size?

• In an educational setting, effect size is one way to measure the effectiveness of a particular intervention.

• Effect size enables us to measure both the improvement (gain) in learner achievement for a group of learners AND at the same time, take into account the variation of student performance.

Adapted from Understanding, using and calculating effect size, Govt of South Australia, Department of Education & Child Development, http://www.decd.sa.gov.au/quality/files/links/WhatIsEffectSize.pdf


Practical Advantages

• Easy to calculate

• Easy to understand; makes intuitive sense

• Adaptable to different kinds of assessments

• Adaptable to different ways of considering growth and goals for SLOs:

– Shared attribution across the district

– Shared attribution within a school

– Attribution for a specific teacher or group of students


So how do you calculate effect sizes for SLOs or growth?

Start with a set of pretest scores and posttest scores for the same students.

Calculate the difference between the pretest and posttest for each student.

Student  Pretest  Posttest  Difference (AKA Gain)
Denis    40       35        -5
Donna    25       30        +5
Dale     45       50        +5
Russ     30       40        +10


Calculations Continued

Calculate the means and standard deviations for both tests

• Pretest

– Mean: 35.0

– SD: 9.1

• Posttest

– Mean: 38.8

– SD: 8.5

Average the Standard Deviations

• The average of 9.1 and 8.5 is 8.8


How to adapt

• If your pretest and posttest are different lengths, convert to a similar scale, like percentages.

• Think about who you are basing your analysis on and use that to decide what standard deviation (SD) to use

– Common attribution for district: District SD

– Common attribution for school: School SD

– Class: Class SD

– Specific group, such as economically disadvantaged: the group’s SD


Use the average standard deviation and the gains to calculate the effect size:

Student Pretest Posttest Gain Effect

Denis 40 35 -5 -.57

Donna 25 30 +5 +.57

Dale 45 50 +5 +.57

Russ 30 40 +10 +1.14

Effect Size = Gain / Average SD
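The calculation walked through on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' own tool: the function name `effect_sizes` and the dictionary layout are assumptions made for the example, and the sample (n - 1) standard deviation is used because it reproduces the 9.1 and 8.5 shown on the slides.

```python
from statistics import stdev

# Scores from the worked example on the slides: (pretest, posttest).
scores = {
    "Denis": (40, 35),
    "Donna": (25, 30),
    "Dale": (45, 50),
    "Russ": (30, 40),
}


def effect_sizes(scores):
    """Return each student's gain divided by the average of the two SDs.

    `effect_sizes` is a hypothetical helper name used only for this sketch.
    """
    pre = [p for p, _ in scores.values()]
    post = [q for _, q in scores.values()]
    # Sample standard deviation (n - 1 in the denominator), then the
    # simple average of the pretest and posttest SDs.
    avg_sd = (stdev(pre) + stdev(post)) / 2
    return {name: (q - p) / avg_sd for name, (p, q) in scores.items()}


effects = effect_sizes(scores)
for name, es in effects.items():
    print(f"{name}: {es:+.2f}")
```

Because this sketch keeps the unrounded average SD (about 8.83) rather than the rounded 8.8 used on the slide, Russ comes out at about +1.13 instead of +1.14; the other three values match the table.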


Interpret your results: Common criteria

Cohen (1969)

• ‘Small’ (.2)

o real, but difficult to detect

o difference between the heights of 15 year old and 16 year old girls in the US

• ‘Medium’ (.5)

o ‘large enough to be visible to the naked eye’

o difference between the heights of 14 & 18 year old girls

• ‘Large’ (.8)

o ‘grossly perceptible and therefore large’

o difference between the heights of 13 & 18 year old girls

Hattie: “For students moving from one year to the next, the average effect size across all students is 0.40.”
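One way to apply these criteria to a computed effect size is sketched below. Treating Cohen's .2, .5 and .8 as lower cut points, using the absolute value, and the label "below small" are all assumptions made for this illustration; the slide gives only the three benchmark values and Hattie's 0.40 average.

```python
HATTIE_AVERAGE = 0.40  # Hattie's average year-to-year effect size, as quoted on the slide


def cohen_label(effect_size):
    """Label the magnitude of an effect size using .2 / .5 / .8 as cut points.

    Hypothetical helper: the cut-point interpretation is an assumption.
    """
    size = abs(effect_size)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


def at_or_above_hattie(effect_size):
    """True when the (signed) effect size reaches Hattie's 0.40 average."""
    return effect_size >= HATTIE_AVERAGE


for es in (-0.57, 0.57, 1.14):
    print(es, cohen_label(es), at_or_above_hattie(es))
```

Note that the label describes magnitude only, so a loss of -.57 is still "medium" in size even though it does not reach the 0.40 benchmark.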


How results differ, depending on attribution and how you tier students


Another Example based on OAA


Resources

• Understanding, using and calculating effect size. Government of South Australia, Department of Education & Child Development. http://www.decd.sa.gov.au/quality/files/links/WhatIsEffectSize.pdf

• Review Methods/Interpreting Effect Sizes. JHU: Best Evidence Encyclopedia. http://www.bestevidence.org/methods/effectsize.htm

• Calculating an effect size: a practical guide. Visible Learning Plus. http://visiblelearningplus.com/faqs/calculating-effect-size-practical-guide


Establishing Growth Targets with Limited Data

Prepared by Russ Brown, Ph.D. – CMSD


Overview

• Design Principles for Student Growth Model work

• The PROBLEM!

• An Idea for a Solution

• Strengths/Weaknesses


Guiding Principles

1. Equity - like measures for like teachers, like expectations for like students.

2. Simplicity - Parsimony and transparency are critical.

3. Continuous improvement will be critical – It simply will not be perfect on the first try!


The PROBLEM

How much growth is enough?

How do you estimate this when you don’t know the relationship between the two tests?



What do we know?

1. Basic information about the distribution of scores.

2. The relative position of each student on the distribution.

Can we leverage this to set targets?

Time     Mean   SD
Pretest  24.28  9.6


The Idea

1. Devoid of any way to estimate what growth “should be”…

2. Students of like ability (i.e., same pretest scores) would typically be expected to make comparable growth over time.

3. Use Normal Curve Equivalents as a means to establish targets and relative growth.


How

1. Translate Pre-Test scores to NCEs

Z = (Pretest Score - Mean Pretest Score) / Standard Deviation of the Pretest

NCE = (Z × 21.063) + 50 (1-99 interval)

Class   Pretest  Pre-Mean  SD   Pre-Z  Pre-NCE
Class1  8.0      24.3      9.6  -1.7   14.2
Class1  9.0      24.3      9.6  -1.6   16.4
Class1  9.0      24.3      9.6  -1.6   16.4
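The two formulas above can be sketched as a small Python function. The name `to_nce` is made up for this example, and clipping the result to the 1-99 range is one reading of the slide's "(1-99 Interval)" note rather than something the slide spells out.

```python
def to_nce(score, mean, sd):
    """Convert a raw score to a Normal Curve Equivalent (NCE).

    Hypothetical helper: Z = (score - mean) / sd, NCE = Z * 21.063 + 50.
    """
    z = (score - mean) / sd
    nce = z * 21.063 + 50
    # Assumption: NCEs are reported on a 1-99 scale, so clip the extremes.
    return max(1.0, min(99.0, nce))


# Pretest mean and SD as they appear in the slide's table.
pre_mean, pre_sd = 24.3, 9.6

for pretest in (8.0, 9.0):
    print(pretest, round(to_nce(pretest, pre_mean, pre_sd), 1))
```

With the table's mean of 24.3 and SD of 9.6, pretest scores of 8 and 9 come out at 14.2 and 16.4, matching the Pre-NCE column.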


Outcomes – What Threshold?

Calculating whether the goal is obtained:

• Must make a judgment about the stringency of the goal/calculation

Stringency of Goal: 0, -5, -7.5

Student  Pre-NCE  Post-NCE  NCE Change  Goal 0  Goal -5  Goal -7.5
Stu 1    14.2     3.2       -11.1       No      No       No
Stu 2    16.4     7.6       -8.9        No      No       No
Stu 3    16.4     9.0       -7.4        No      No       Yes
Stu 4    18.6     11.9      -6.7        No      No       Yes
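The Yes/No columns in this table can be reproduced with a simple comparison of each student's NCE change against the chosen threshold. The sketch below assumes "at or above the threshold" counts as meeting the goal; the slide's values are consistent with that rule but do not state it, and the names `met_goal` and `results` are made up for the example.

```python
# NCE change values copied from the slide's table (post-NCE minus pre-NCE).
nce_change = {"Stu 1": -11.1, "Stu 2": -8.9, "Stu 3": -7.4, "Stu 4": -6.7}

# The three stringency levels compared on the slide.
thresholds = (0, -5, -7.5)


def met_goal(change, threshold):
    """Assumed rule: the goal is met when the NCE change is at or above the threshold."""
    return change >= threshold


results = {
    student: {t: met_goal(change, t) for t in thresholds}
    for student, change in nce_change.items()
}

for student, row in results.items():
    print(student, ["Yes" if row[t] else "No" for t in thresholds])
```

Loosening the threshold from 0 to -7.5 is what flips Stu 3 and Stu 4 from No to Yes, which is the judgment about stringency the slide refers to.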


Outcomes – What Performance Level?

Percent of students achieving the Goal   Teacher Growth Rating   Translation
90-100%                                  5                       Above
80-89%                                   4                       Met
70-79%                                   3                       Met
60-69%                                   2                       Met
0-59%                                    1                       Below
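The rating bands above translate directly into a lookup. This is a sketch only: the function name `growth_rating` is hypothetical, and treating each band's lower bound as a cut point (so that, for example, 89.5% falls in the 80-89% band) is an assumption, since the slide lists whole-number ranges.

```python
def growth_rating(percent_meeting_goal):
    """Map the percent of students meeting the goal to (rating, translation).

    Hypothetical helper based on the bands in the slide's table.
    """
    if not 0 <= percent_meeting_goal <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent_meeting_goal >= 90:
        return 5, "Above"
    if percent_meeting_goal >= 80:
        return 4, "Met"
    if percent_meeting_goal >= 70:
        return 3, "Met"
    if percent_meeting_goal >= 60:
        return 2, "Met"
    return 1, "Below"


for pct in (12.0, 60.0, 76.0, 84.0, 92.0):
    print(pct, growth_rating(pct))
```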


Outcomes – What Performance Level?

Percent of Students Reaching the Goal, by stringency of goal (rating - percent)

Group    Goal 0      Goal -5     Goal -7.5   Mean Gain
Class 1  1 - 12.0%   1 - 20.0%   1 - 52.0%   24.04
Class 2  4 - 84.0%   5 - 92.0%   5 - 92.0%   37.88
Class 3  1 - 44.0%   1 - 52.0%   2 - 60.0%   34.44
Class 4  1 - 44.0%   1 - 56.0%   2 - 64.0%   34.76

• Not surprisingly – outcomes vary by the stringency of the expectation…


Outcomes – Quick Comparison

Percent of Students Reaching the Goal (rating - percent)

Group    Goal 0      Goal -5     Goal -7.5   Mean Gain
Class 1  1 - 12.0%   1 - 20.0%   1 - 52.0%   24.04
Class 2  4 - 84.0%   5 - 92.0%   5 - 92.0%   37.88
Class 3  1 - 44.0%   1 - 52.0%   2 - 60.0%   34.44
Class 4  1 - 44.0%   1 - 56.0%   2 - 64.0%   34.76

Percent of Students Reaching the Goal (SEM) (rating - percent)

Group    3 SE      2 SE       1 SE       Mean Gain
Class 1  1 - 44%   4 - 88%    5 - 100%   24.04
Class 2  5 - 96%   5 - 100%   5 - 100%   37.88
Class 3  1 - 52%   1 - 56%    3 - 76%    34.44
Class 4  1 - 48%   2 - 60%    3 - 76%    34.76


Outcomes – What about Real Data?

Applied to 3rd Grade OAA (Fall to Spring):

Percent of students achieving the Goal   Building Growth Rating   Translation   IRN Count
90-100%                                  5                        Above         0
60-89%                                   2-4                      Met           37
0-59%                                    1                        Below         36


Outcomes – What about Real Data?

Applied to 4th Grade Benchmark to OAA (Fall to Spring):

Percent of students achieving the Goal   Building Growth Rating   Translation   IRN Count   Mean Value Add Index
90-100%                                  5                        Above         2           1.96
60-89%                                   2-4                      Met           50          -.68
0-59%                                    1                        Below         13          -1.56


Pros and Cons

+ Students with like scores have like expectations for growth

+ Relatively simple and relatively transparent

- Must make a value judgment about the amount of error for which one wishes to compensate (not so transparent)

- More adjustment = more bias at the bottom


Standard Error of Measurement

All scores have a “true” score and “error”

• Error bands on score reports

Standard Error quantifies degree of “error” in a test score

Formula is: Standard Error of Measurement = Standard Deviation × √(1 − Reliability)

Values needed: Mean, Standard Deviation, Reliability of the Test

Assumptions that underlie this approach


Steps

1) For a set of data, calculate the mean and standard deviation

2) Calculate the reliability of the test

3) Use the formula to determine the Standard Error of Measurement (class level, school level)

4) Set a level for the growth standard (1 SE, 2 SE, etc.)

5) Add chosen level of standard error to raw score

6) Convert (raw score + standard error) to percent correct on pretest

7) Find corresponding percent correct/raw score on posttest

(Note: Assumptions here not required once IRT equating is completed)

8) Compare actual student posttest score with target score

9) At or above target score = “Acceptable Progress”
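Steps 3 through 9 can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: it uses the standard classical test theory formula SEM = SD × √(1 − reliability), it reads step 7 as "the same percent correct on the posttest", and the reliability of 0.84 and the 50-item test lengths are hypothetical values chosen for the example. All function names are made up.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def posttest_target(pre_raw, sd, reliability, n_se, pre_items, post_items):
    """Target posttest raw score for one student (steps 5-7).

    Assumption: equal percent correct on pretest and posttest represents
    equal performance, which the slide notes is only needed until IRT
    equating is available.
    """
    sem = standard_error_of_measurement(sd, reliability)
    target_pct = (pre_raw + n_se * sem) / pre_items  # steps 5 and 6
    return target_pct * post_items                   # step 7


def acceptable_progress(post_raw, target):
    """Steps 8-9: at or above the target counts as acceptable progress."""
    return post_raw >= target


# Hypothetical inputs: SD of 9.6 (the pretest SD shown earlier),
# reliability 0.84, and 50 items on each test.
target = posttest_target(pre_raw=24, sd=9.6, reliability=0.84,
                         n_se=1, pre_items=50, post_items=50)
print(round(target, 2), acceptable_progress(28, target))
```

With these inputs the SEM is about 3.84 raw score points, so a student who scored 24 on the pretest needs about 27.84 on the posttest under a 1 SE standard. A pretest score of 48 under a 2 SE standard gives a target of about 55.68, more than the 50 items available, which illustrates how high pretest scores can produce out-of-range targets.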


Calculations for one student


Results


Observations

High pretest scores can lead to out-of-range posttest score targets.

Any modification to the sample that increases the Standard Deviation will increase the value of the Standard Error and therefore require more score growth to reach the target.