
Unpublished Work © 2005 by Educational Testing Service

Growth Options for California

County and District Evaluators’ Meetings

May 10 and 19, 2005

2

Californians Want to Measure Student Growth

CST scales are separate by grade

Each grade has its own Basic (300) and Proficient (350) standards

Connections do not presently exist between grades

3

“Measuring growth” can mean different things to different users

“Vertical scaling”:

Catch-all phrase used by a variety of people to represent growth measures

A technical term for one particular statistical procedure

May or may not be most useful and cost-effective growth measure needed by CA

Today we will explain options for measuring growth and get your input

4

Progress Toward Determining the Best Growth Measure(s) for CA

Exploratory study of vertical scaling of CSTs

Technical Advisory Group

Interviews of CA school district staff about what growth measures would be useful

Growth Options Task Force

Evaluators’ meetings

Growth Options Task Force follow-up

5

Vertical Scaling (Technical definition)

Connect the scales across grades by having students take “linking” items from adjacent grade tests

These links place the items (and scores) across grades on a common scale

Scale scores might range from 200 (grade 2) up to 800 (grade 11)

6

Vertical Scaling

Ideal goals:

Scale scores increase by grade

Scale scores can be compared across grades

A 500 “means the same thing” if it comes from a grade 4 test or a grade 5 test

“Growth” of 10 units “means the same thing” in low grades as high grades

Ideal approximated in real life but never exactly met

Vast majority of vertical scales have been developed with published norm-referenced tests

Few vertical scales exist for state standards-referenced tests

7

Exploratory Vertical Scaling Study for California

ELA grades 2-11

Math grades 2-7

Linking embedded in 2004 operational CST testing

No incremental testing or cost to state

Linking items:

Measured standards that were common across adjacent grades

Placed in “field test buckets”

8

Design

N=3000 to 5000 per linking item

ELA: 17-25 linking items per grade pair

Math: 18-24 linking items per grade pair

Grade 2 students took some grade 3 items and grade 3 students took some grade 2 items, etc.

Scales linked sequentially: 2<3<4<5<6<7<8<9<10<11
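The sequential chain above can be illustrated with a short sketch. The slides do not say which statistical linking method the study used, so the mean-sigma approach below is a swapped-in textbook technique, chosen only because it is easy to read. All item difficulties are invented, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_link(b_from, b_to):
    """Return (A, B) so that A * x + B moves values from one scale to another.

    b_from and b_to hold difficulty estimates for the SAME linking items,
    each estimated on a different grade's scale (an assumed setup).
    """
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B

def compose(outer, inner):
    """Chain two linear transforms: apply inner first, then outer."""
    A2, B2 = outer
    A1, B1 = inner
    return A2 * A1, A2 * B1 + B2

# Invented difficulties for linking items (not study data).
g2_on_g2 = [-1.0, 0.0, 1.0]    # grade 2/3 linking items on the grade 2 scale
g2_on_g3 = [-2.0, -1.0, 0.0]   # the same items on the grade 3 scale
g3_on_g3 = [-1.0, 0.0, 1.0]    # grade 3/4 linking items on the grade 3 scale
g3_on_g4 = [-1.5, 0.0, 1.5]    # the same items on the grade 4 scale

link_2_to_3 = mean_sigma_link(g2_on_g2, g2_on_g3)
link_3_to_4 = mean_sigma_link(g3_on_g3, g3_on_g4)

# Grade 2 reaches the grade 4 scale only through grade 3.
link_2_to_4 = compose(link_3_to_4, link_2_to_3)
```

Because each grade reaches the common scale only through its neighbors, any error in one adjacent-grade link carries forward to every grade further along the chain.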

9

Evaluation of Links

Evidence that supports the validity of vertical scaling is the growth of student scores:

Better performance of higher-grade students than lower-grade students on common items

Scale score distributions that increase as grade increases
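The two kinds of evidence above could be checked roughly as follows. This is a minimal sketch with invented numbers, not the study's analysis. The function names are hypothetical, and comparing grade means is only one simple stand-in for comparing whole score distributions.

```python
from statistics import mean

def higher_grade_did_better(p_lower, p_higher):
    """Per common item: True if the higher grade's proportion correct
    exceeds the lower grade's."""
    return [hi > lo for lo, hi in zip(p_lower, p_higher)]

def increases_by_grade(scores_by_grade):
    """True if the mean scale score rises strictly from each grade to the next."""
    means = [mean(scores_by_grade[g]) for g in sorted(scores_by_grade)]
    return all(a < b for a, b in zip(means, means[1:]))

# Invented proportions correct on three common items (not study data).
p_grade3 = [0.40, 0.55, 0.62]
p_grade4 = [0.52, 0.54, 0.70]
item_check = higher_grade_did_better(p_grade3, p_grade4)  # [True, False, True]

# Invented scale scores on a common scale; the grade 5 mean dips below grade 4.
scores = {3: [310, 330, 350], 4: [340, 360, 380], 5: [345, 355, 365]}
trend_ok = increases_by_grade(scores)  # False
```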

10

Findings:

Higher-grade students consistently did better than lower-grade students on common items that came from the higher-grade operational test

Higher-grade students did not necessarily do better than lower-grade students when common items were from the lower-grade operational test

Position effects were evident: items became more difficult when they appeared later in a test

11

Findings (cont.):

Scale scores generally increased by grade, except:

ELA: grades 9, 10, 11 minimal growth

Math: grades 6 and 7 essentially no growth

12

Conclusions of exploratory study

Concerns:

ELA: Minimal growth in grades 9, 10, and 11

Math: Minimal growth in grades 6 and 7

Possible factors affecting vertical links:

Item position effects

Grade x curriculum interactions

Changes in populations

Not clear if vertical scaling will work for CSTs at all grades

13

Phone Interviews

March/April 2005

15 respondents from CA counties and districts

Asked 5 questions

14

Are you currently using STAR data to make any longitudinal comparisons, and if so, what are you doing with that data?

Used NRT or CST

Aware of inappropriateness of using current CST scale scores for growth

15

Who are the most important potential users in your district of longitudinal information?

Full range: Teachers to Superintendents

Parents

School Boards

Administrators: instructional planning

Teachers: expected student performance

16

If we were able to improve the psychometric underpinnings for making comparisons across grades using CSTs, would that be of benefit to your district? How would you plan to use that information?

Overwhelming enthusiasm for legitimate method of making longitudinal comparisons

Should provide legitimate procedure so users don’t “hurt themselves”

Concern about over-burdening the CSTs by addition of one more purpose

17

Longitudinal comparisons do have their limitations and can be misinterpreted, so we’d like to get your input on what interpretive materials would be most useful to you.

Current post-test workshops and guides should cover this

Few saw need for special efforts

Largest districts have resources to address this

Teacher-specific interpretive materials would be helpful

18

One of the options we are considering is a vertical scale. If we used a vertical scale, there would be some changes, and we would need to have an in-grade scale that differed from an “across-grade” scale. Would that be a problem in your district?

Two diametrically opposed opinions:

Acquired meaning of 300 and 350 too important to do away with

The meaning of 300 and 350 could be easily supplanted

Use of both in-grade and across-grade scales seen as complicated and potentially confusing

19

Growth Options Task Force

Tom Barrett, Riverside USD

Paula Carroll, Lodi USD

J.T. Lawrence, San Diego COE

Phil Morse, LAUSD

Jim Parker, Paramount USD

Jim Stack, SFUSD

Mary Tribbey, Butte COE

Mao Vang, Sacramento City USD

20

Major Options for Tracking Growth

Vertical Scales

Norms

Tables of Expected Growth

21

Vertical Scales

Advantages

Scale scores comparable across grades

Useful if tracking students across many grades

Suitable for statistical analyses

22

Vertical Scales

Disadvantages

Assumption of hierarchical growth may not be met; scores may not grow between grades

Across-grade scale different from within-grade scale

Can highlight inconsistencies (if they exist) of within-grade standards

Scale scores have no intrinsic meaning

Need caution in comparing growth in different parts of scale

Special data collection needed

23

Norms

CA percentiles, NCEs, or Z-scores

By grade by content area

“Typical” growth defined to be what is seen cross-sectionally in state from grade to grade

Types:

Static (using a base year such as 2003)

Rolling (using current year)
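The three norm-referenced scores named above can be computed from a norm group as in the sketch below. The norm group is invented and the function names are hypothetical. For a static norm, the list would hold base-year scores (such as 2003). For a rolling norm, it would hold current-year scores. The NCE conversion uses the standard definition, 50 + 21.06 times the normal deviate of the percentile rank.

```python
from statistics import NormalDist, mean, pstdev

def z_score(score, norm_scores):
    """Distance from the norm-group mean in norm-group SD units."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)

def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below, counting ties as half."""
    below = sum(s < score for s in norm_scores)
    ties = sum(s == score for s in norm_scores)
    return 100 * (below + 0.5 * ties) / len(norm_scores)

def nce(pr):
    """Normal curve equivalent for a percentile rank strictly between 0 and 100."""
    return 50 + 21.06 * NormalDist().inv_cdf(pr / 100)

# Invented norm group for one grade and content area (not CA data).
norm_group = [300, 320, 340, 360, 380]

pr = percentile_rank(340, norm_group)   # 50.0
student_nce = nce(pr)                   # 50.0
student_z = z_score(380, norm_group)    # about 1.41
```

A student whose percentile rank, NCE, or Z-score stays the same from one grade to the next has shown “typical” growth relative to the norm group under this approach.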

24

Norms

Advantages

Fairly easy to understand

Allow comparisons of relative standing and growth relative to norm group

Minimal assumptions are required

Comparisons can be made across content areas

No special data collection needed

25

Norms

Disadvantages

Need to keep clear relative nature of comparison (static vs rolling norm)

No continuous growth scale

Growth expectations are based on cross-sectional, not longitudinal data

“Typical” growth does not necessarily mean student is progressing sufficiently toward Proficiency

26

Tables of Expected Growth

Use longitudinal CA data (e.g., grade 3 and 4 performance for the same students)

Determine statistical expectation of grade 4 scores typically seen for students with each possible grade 3 score

Calculate standard error along with expectation

Standardized deviations from expectations can be compared across grades and content areas
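A table of this kind could be built from matched records roughly as follows. This is a sketch under stated assumptions, not the procedure ETS or CDE would use. The matched pairs are invented, the function name is hypothetical, and “standard error” is taken here to mean the spread of grade 4 scores around the expectation for each grade 3 score.

```python
from collections import defaultdict
from statistics import mean, pstdev

def expected_growth_table(matched_pairs):
    """Map each grade 3 score to (expected grade 4 score, spread, n).

    matched_pairs holds (grade 3 score, grade 4 score) for the same students.
    A grade 3 score seen only once gets a spread of 0.0, so a real table
    would need smoothing or a minimum group size (not shown here).
    """
    by_g3 = defaultdict(list)
    for g3, g4 in matched_pairs:
        by_g3[g3].append(g4)
    return {g3: (mean(g4s), pstdev(g4s), len(g4s))
            for g3, g4s in sorted(by_g3.items())}

# Invented matched records (not CA data).
pairs = [(300, 310), (300, 330), (350, 350), (350, 370), (350, 360)]
table = expected_growth_table(pairs)

expected, spread, n = table[300]   # expected 320, spread 10.0, n 2
```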

27


Tables of Expected Growth

Advantages

Fairly easy to understand

Allow comparisons of growth relative to norm group

Minimal assumptions are required; could be done for high school courses

Comparisons can be made across content areas

Based on actual student growth

28


Tables of Expected Growth

Disadvantages

Tables of expectations may need to be recalculated each year

No continuous growth scale

“Typical” growth does not necessarily mean student is progressing sufficiently toward Proficiency

Matching student data over years required

Expectations would not include students who have been in CA < 1 year or who cannot be tracked

29

Growth Options Task Force

Discussed options in detail for a day

Norms may be most easily understood

Growth Expectations may be most useful for administrators and program evaluation

Classification may be useful: Growth is average/above average/below average

Standardized growth measures that could be pooled over grades could be useful: (Observed score – Expected score)/SE

Will work with CDE and ETS to pilot test some options
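The standardized growth measure and the three-way classification discussed by the Task Force might look like the sketch below. The records are invented and the function names are hypothetical. The cutoff of 1.0 for “above” and “below” average is an assumption made for illustration. The Task Force did not specify one.

```python
from statistics import mean

def standardized_growth(observed, expected, se):
    """(Observed score - Expected score) / SE."""
    return (observed - expected) / se

def classify(z, cutoff=1.0):
    """Label growth using an assumed cutoff (not a Task Force value)."""
    if z > cutoff:
        return "above average"
    if z < -cutoff:
        return "below average"
    return "average"

# Invented records: (grade, observed, expected, SE). Not CA data.
records = [(3, 345, 330, 10), (4, 352, 360, 8), (5, 371, 365, 12)]

z_values = [standardized_growth(obs, exp, se) for _, obs, exp, se in records]
labels = [classify(z) for z in z_values]

# The measure is unit-free, so values can be pooled over grades.
pooled = mean(z_values)
```

With these invented records the standardized values are 1.5, -1.0, and 0.5, so only the grade 3 record is labeled above average under the assumed cutoff.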
