Unpublished Work © 2005 by Educational Testing Service
Growth Options for California
County and District
Evaluators’ Meetings
May 10 and 19, 2005
Californians Want to Measure Student Growth
CST (California Standards Tests) scales are separate by grade
Each grade has its own Basic (300) and Proficient (350) standards
Connections do not presently exist between grades
“Measuring growth” can mean different things to different users
“Vertical scaling”:
A catch-all phrase used by a variety of people to represent growth measures
A technical term for one particular statistical procedure
May or may not be the most useful and cost-effective growth measure for CA
Today we will explain options for measuring growth and get your input
Progress Toward Determining the Best Growth Measure(s) for CA
Exploratory study of vertical scaling of CSTs
Technical Advisory Group
Interviews of CA school district staff about what growth measures would be useful
Growth Options Task Force
Evaluators’ meetings
Growth Options Task Force follow-up
Vertical Scaling (Technical definition)
Connect the scales across grades by having students take “linking” items from adjacent grade tests
These links place the items (and scores) across grades on a common scale
Scale scores might range from 200 (grade 2) up to 800 (grade 11)
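The slides do not say which statistical procedure produced the links. As one illustration only, the mean-sigma method is a common linking technique, used here as an assumption rather than as the procedure ETS applied. It takes the linking items' difficulty estimates from both grades and finds a linear transformation that puts the upper grade's scale onto the lower grade's scale. The function names and difficulty values below are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_constants(b_base, b_new):
    """Mean-sigma linking constants (illustrative sketch).

    b_base: difficulties of the linking items estimated on the base (lower-grade) scale
    b_new:  difficulties of the same items, in the same order, on the new (upper-grade) scale
    Returns (A, B) such that A * theta_new + B expresses a new-scale value on the base scale.
    """
    if len(b_base) != len(b_new) or len(b_base) < 2:
        raise ValueError("need at least two linking items, paired in the same order")
    a = pstdev(b_base) / pstdev(b_new)
    b = mean(b_base) - a * mean(b_new)
    return a, b


def to_base_scale(theta_new, a, b):
    """Apply the linear linking transformation to a new-scale value."""
    return a * theta_new + b


# Hypothetical difficulties for three linking items
grade2_b = [-0.5, 0.0, 0.5]    # estimated from grade 2 students
grade3_b = [-1.5, -1.0, -0.5]  # same items estimated from grade 3 students (easier for them)
A, B = mean_sigma_constants(grade2_b, grade3_b)
print(A, B)                    # 1.0 1.0
print(to_base_scale(0.0, A, B))  # 1.0
```

In this made-up example, an average grade 3 student (0.0 on the grade 3 scale) lands at 1.0 on the grade 2 scale. That is the kind of across-grade increase a vertical scale is meant to show.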
Vertical Scaling
Ideal goals:
Scale scores increase by grade
Scale scores can be compared across grades
A 500 “means the same thing” whether it comes from a grade 4 test or a grade 5 test
“Growth” of 10 units “means the same thing” in low grades as in high grades
The ideal is approximated in real life but never exactly met
The vast majority of vertical scales have been developed with published norm-referenced tests
Few vertical scales exist for state standards-referenced tests
Exploratory Vertical Scaling Study for California
ELA: grades 2-11
Math: grades 2-7
Linking embedded in 2004 operational CST testing
No incremental testing or cost to state
Linking items:
Measured standards that were common across adjacent grades
Placed in “field test buckets”
Design
N = 3,000 to 5,000 students per linking item
ELA: 17-25 linking items per grade pair
Math: 18-24 linking items per grade pair
Grade 2 students took some grade 3 items and grade 3 students took some grade 2 items, etc.
Scales linked sequentially: 2<3<4<5<6<7<8<9<10<11
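With sequential linking, a grade 11 score reaches the grade 2 base only by passing through every intermediate link. The sketch below assumes that each adjacent-grade link is linear, of the form theta(g-1) = A_g * theta(g) + B_g. Under that assumption the chain composes into a single linear transformation. The constants and the function name are hypothetical.

```python
def chain_to_base(links, grade, base_grade=2):
    """Compose adjacent-grade linear links into one transformation onto the base grade.

    links: {g: (A_g, B_g)} where A_g * theta_g + B_g puts a grade-g value on the grade g-1 scale
    Returns (A, B) mapping a grade-`grade` value directly onto the base-grade scale.
    """
    a_total, b_total = 1.0, 0.0  # start from the identity transformation
    for g in range(grade, base_grade, -1):  # apply the grade-g link first, then g-1, ...
        a_g, b_g = links[g]
        a_total, b_total = a_g * a_total, a_g * b_total + b_g
    return a_total, b_total


# Hypothetical link constants for grades 3 and 4 (2 < 3 < 4)
links = {3: (1.0, 0.5), 4: (2.0, 0.0)}
A, B = chain_to_base(links, grade=4)
print(A, B)         # 2.0 0.5
print(A * 1.0 + B)  # 2.5: grade-4 value 1.0 -> 2.0 on grade 3 scale -> 2.5 on grade 2 scale
```

One consequence of composing links this way is that any error in a lower link is carried into every grade above it.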
Evaluation of Links
Evidence that supports the validity of vertical scaling is the growth of student scores:
Better performance of higher-grade students than lower-grade students on common items
Scale score distributions that increase as grade increases
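The two kinds of evidence above can be checked roughly as follows. This is a minimal sketch with hypothetical function names and data. It compares proportion correct on common items, and it compares only the means of the vertical-scale scores. A fuller evaluation would look at whole distributions.

```python
from statistics import mean


def share_higher_grade_better(p_lower, p_upper):
    """Share of common items on which the higher grade's proportion correct exceeds the lower grade's.

    p_lower, p_upper: proportion correct on the same common items, in the same order.
    """
    if len(p_lower) != len(p_upper) or not p_lower:
        raise ValueError("need matching, non-empty lists of item statistics")
    return sum(u > l for l, u in zip(p_lower, p_upper)) / len(p_lower)


def means_increase_by_grade(scores_by_grade):
    """True if the mean vertical-scale score rises strictly from each grade to the next."""
    grades = sorted(scores_by_grade)
    means = [mean(scores_by_grade[g]) for g in grades]
    return all(later > earlier for earlier, later in zip(means, means[1:]))


# Hypothetical data
print(share_higher_grade_better([0.50, 0.60, 0.70], [0.60, 0.65, 0.68]))  # 2 of 3 items
print(means_increase_by_grade({2: [300, 320], 3: [330, 340], 4: [335, 345]}))  # True
```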
Findings:
Higher-grade students consistently did better than lower-grade students on common items that came from the higher-grade operational test
Higher-grade students did not necessarily do better than lower-grade students when common items were from the lower-grade operational test
Position effects were evident: items became more difficult when they appeared later in a test
Findings (cont.):
Scale scores generally increased by grade, except:
ELA: minimal growth in grades 9, 10, and 11
Math: essentially no growth in grades 6 and 7
Conclusions of exploratory study
Concerns:
ELA: minimal growth in grades 9, 10, and 11
Math: minimal growth in grades 6 and 7
Possible factors affecting vertical links:
Item position effects
Grade × curriculum interactions
Changes in populations
Not clear whether vertical scaling will work for CSTs at all grades
Phone Interviews
March/April 2005
15 respondents from CA counties and districts
Asked 5 questions
Are you currently using STAR data to make any longitudinal comparisons, and if so, what are you doing with that data?
Used NRT or CST
Aware of inappropriateness of using current CST scale scores for growth
Who are the most important potential users in your district of longitudinal information?
Full range: teachers to superintendents
Parents
School boards
Administrators: instructional planning
Teachers: expected student performance
If we were able to improve the psychometric underpinnings for making comparisons across grades using CSTs, would that be of benefit to your district? How would you plan to use that information?
Overwhelming enthusiasm for a legitimate method of making longitudinal comparisons
Should provide a legitimate procedure so users don’t “hurt themselves”
Concern about overburdening the CSTs by adding one more purpose
Longitudinal comparisons do have their limitations and can be misinterpreted, so we’d like to get your input on what interpretive materials would be most useful to you.
Current post-test workshops and guides should cover this
Few saw need for special efforts
Largest districts have resources to address this
Teacher-specific interpretive materials would be helpful
One of the options we are considering is a vertical scale. If we used a vertical scale, there would be some changes, and we would need to have an in-grade scale that differed from an “across-grade” scale. Would that be a problem in your district?
Two diametrically opposed opinions:
The acquired meaning of 300 and 350 is too important to do away with
The meaning of 300 and 350 could be easily supplanted
Use of both in-grade and across-grade scales seen as complicated and potentially confusing
Growth Options Task Force
Tom Barrett, Riverside USD
Paula Carroll, Lodi USD
J.T. Lawrence, San Diego COE
Phil Morse, LAUSD
Jim Parker, Paramount USD
Jim Stack, SFUSD
Mary Tribbey, Butte COE
Mao Vang, Sacramento City USD
Vertical Scales
Advantages:
Scale scores comparable across grades
Useful if tracking students across many grades
Suitable for statistical analyses
Vertical Scales
Disadvantages:
Assumption of hierarchical growth may not be met; scores may not grow between grades
Across-grade scale differs from within-grade scale
Can highlight inconsistencies (if they exist) in within-grade standards
Scale scores have no intrinsic meaning
Need caution in comparing growth in different parts of the scale
Special data collection needed
Norms
CA percentiles, NCEs (normal curve equivalents), or z-scores
By grade, by content area
“Typical” growth defined as what is seen cross-sectionally in the state from grade to grade
Types:
Static (using a base year such as 2003)
Rolling (using the current year)
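The three norm-referenced scores listed above can be computed from a norm group's scale scores for one grade and content area. This is a minimal sketch with hypothetical function names and data. The difference between static and rolling norms lies only in which scores are passed as the norm group: a fixed base year such as 2003, or the current year. The NCE uses the standard definition, 50 + 21.06 times the normal deviate of the percentile rank. Clamping the percentile rank to 1 through 99 is an assumption made here so that the deviate stays finite.

```python
from statistics import NormalDist, mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentile rank: percent of the norm group below the score plus half the percent at it."""
    below = sum(s < score for s in norm_scores)
    at = sum(s == score for s in norm_scores)
    return 100.0 * (below + 0.5 * at) / len(norm_scores)


def z_score(score, norm_scores):
    """Standard score relative to the norm group's mean and population SD."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)


def nce(pr):
    """Normal curve equivalent: 50 + 21.06 * (normal deviate of the percentile rank)."""
    pr = min(max(pr, 1.0), 99.0)  # assumed clamp so inv_cdf never sees 0 or 1
    return 50.0 + 21.06 * NormalDist().inv_cdf(pr / 100.0)


# Static norm: a fixed, hypothetical base-year group of grade 4 scale scores
base_year = [300, 350, 400, 450]
pr = percentile_rank(350, base_year)
print(pr)                   # 37.5
print(round(nce(50.0), 2))  # 50.0
```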
Norms
Advantages:
Fairly easy to understand
Allow comparisons of relative standing and growth relative to norm group
Minimal assumptions are required
Comparisons can be made across content areas
No special data collection needed
Norms
Disadvantages:
Need to keep clear the relative nature of the comparison (static vs. rolling norm)
No continuous growth scale
Growth expectations are based on cross-sectional, not longitudinal, data
“Typical” growth does not necessarily mean a student is progressing sufficiently toward Proficiency
Tables of Expected Growth
Use longitudinal CA data (e.g., grade 3 and 4 performance for the same students)
Determine statistical expectation of grade 4 scores typically seen for students with each possible grade 3 score
Calculate standard error along with expectation
Standardized deviations from expectations can be compared across grades and content areas
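The steps above can be sketched with matched grade 3 and grade 4 records. The function names and records are hypothetical, and the sketch makes two assumptions. First, "standard error" is taken here to mean the spread of grade 4 scores among students who had the same grade 3 score, rather than the standard error of the mean. Second, grouping by each exact grade 3 score is the simplest possible version. With real CST data, sparse score points would likely call for smoothing or a regression model instead.

```python
from collections import defaultdict
from statistics import mean, stdev


def expectation_table(pairs):
    """Build {grade-3 score: (expected grade-4 score, SE)} from matched student records.

    pairs: (grade3_score, grade4_score) for the same students in consecutive years.
    SE is assumed to be the SD of grade-4 scores among students with that grade-3 score.
    """
    groups = defaultdict(list)
    for g3, g4 in pairs:
        groups[g3].append(g4)
    return {
        g3: (mean(g4s), stdev(g4s))
        for g3, g4s in groups.items()
        if len(g4s) >= 2  # need at least two students to estimate a spread
    }


def standardized_deviation(g3, g4, table):
    """(observed - expected) / SE for one student; None if no usable table entry exists."""
    if g3 not in table:
        return None
    expected, se = table[g3]
    return (g4 - expected) / se if se > 0 else None


# Hypothetical matched records: (grade 3 score, grade 4 score)
matched = [(300, 310), (300, 330), (350, 360), (350, 380), (350, 370)]
table = expectation_table(matched)
print(standardized_deviation(350, 385, table))  # 1.5
```

Because the result is expressed in SE units, values from different grade pairs and content areas share a common metric. That is what makes them comparable.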
Tables of Expected Growth
Advantages:
Fairly easy to understand
Allow comparisons of growth relative to norm group
Minimal assumptions are required; could be done for high school courses
Comparisons can be made across content areas
Based on actual student growth
Tables of Expected Growth
Disadvantages:
Tables of expectations may need to be recalculated each year
No continuous growth scale
“Typical” growth does not necessarily mean a student is progressing sufficiently toward Proficiency
Matching student data over years required
Expectations would not include students who have been in CA less than 1 year or who cannot be tracked
Growth Options Task Force
Discussed options in detail for a day
Norms may be most easily understood
Growth expectations may be most useful for administrators and program evaluation
Classification may be useful: growth is average, above average, or below average
Standardized growth measures that could be pooled over grades could be useful: (Observed score - Expected score)/SE
Will work with CDE and ETS to pilot test some options
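The Task Force's classification idea and its pooled standardized measure can be sketched together. This is a minimal sketch with hypothetical names and data. The Task Force did not specify a cutoff, so the band edge of ±1.0 is purely an assumption. Pooling works because every value is already expressed as (Observed score - Expected score)/SE, whatever the grade.

```python
from statistics import mean

CUTOFF = 1.0  # hypothetical band edge; not specified by the Task Force


def classify_growth(z, cutoff=CUTOFF):
    """Label one student's standardized growth as above average, average, or below average."""
    if z > cutoff:
        return "above average"
    if z < -cutoff:
        return "below average"
    return "average"


def pooled_growth(z_by_grade):
    """Mean standardized growth, (observed - expected) / SE, pooled across grades.

    z_by_grade: {grade: [standardized values for students in that grade]}
    """
    values = [z for zs in z_by_grade.values() for z in zs]
    return mean(values)


# Hypothetical standardized values for one program's grade 4 and grade 5 students
program = {4: [0.5, 1.5], 5: [-1.2, 2.2]}
print([classify_growth(z) for zs in program.values() for z in zs])
# ['average', 'above average', 'below average', 'above average']
print(round(pooled_growth(program), 2))  # 0.75
```

The example classifies individual students. A pooled mean over many students varies less than individual values do, so applying the same ±1.0 band to a program-level mean would need a different cutoff.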