aligning assessments to monitor growth in math achievement: a validity study jack b. monpas-huber,...
Post on 18-Jan-2018
Aligning Assessments to Monitor Growth in Math Achievement: A Validity Study
Jack B. Monpas-Huber, Ph.D.
Director of Assessment & Student Information
Washington State Assessment Conference
December 2008
Core Themes
1. Technical quality of locally developed district assessments as outcome measure for program evaluation
• Growth modeling in HLM
2. Fidelity of local district instrumentation to state instrumentation
• Sampling same domain
• Built from same maps and item specs
• Measuring same thing?
[Figure: fitted growth lines. y-axis: MATH (about 300 to 508); x-axis: TIME (-0.20 to 4.20); separate lines for FRL = 0 and FRL = 1.]
Possible Questions:
1. What does growth in math look like over the course of one year? Do the data match our expectations and theory of action?
2. Why do some students have steeper growth curves than others?
3. What causes growth? Does math coaching explain variation in the slopes of students’ growth curves?
Evaluating Program with Growth Curves
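One way to make these three questions concrete is a two-level linear growth model of the kind commonly fit in HLM. The sketch below is an assumption about the general model form, not the specification used in this study. MATH, TIME, and FRL are the variable labels that appear with the growth plot in this section; COACH is a hypothetical indicator for whether a student's classroom received math coaching.

```latex
% Level 1: repeated measures t within student i
\mathrm{MATH}_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ti} + e_{ti}

% Level 2: student-level intercepts and slopes
\pi_{0i} = \beta_{00} + \beta_{01}\,\mathrm{FRL}_{i} + r_{0i}
\pi_{1i} = \beta_{10} + \beta_{11}\,\mathrm{FRL}_{i} + \beta_{12}\,\mathrm{COACH}_{i} + r_{1i}
```

Under this form, question 1 concerns the average slope \beta_{10}, question 2 concerns the variance of r_{1i}, and question 3 concerns whether \beta_{12} differs from zero.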
Background of this Project
Issue #1: WASL math achievement seems to decline after 3rd grade
Working Explanations:
1. Reflection of curriculum and instruction?
2. Artifact of WASL standard setting
[Figure: 2007 WASL Math Achievement. y-axis: % Meeting Standard (0 to 100); x-axis: grade (Gr3, Gr4, Gr5, Gr6, Gr7, Gr8, Gr10); series: Spokane, State.]
Background of this Project
Issue #2: District assessments ready for action
Written Curriculum: STANDARDS
Taught Curriculum
Tested Curriculum: ASSESSMENT of Learning
EVOLVING PURPOSES (AND VALIDATION) OF ASSESSMENTS
• Monitor learning across the system
• Inferences about fidelity to curriculum, instructional effectiveness
• Curriculum development, professional development
• Predictive of WASL performance
Design for Exploratory Study
What’s happening to our students from Grade 5 to Grade 7?
Longitudinal: Follow the same students from Grade 5 to Grade 7
Stratified: Approximately 10 students per classroom
Repeated measurements: What can we learn from these data sets?…..
– 2007 Grade 4 WASL
– Fall SASL
– Winter SASL
– 5 End-of-unit district assessments
– 2008 Grade 5 WASL
10 waves
Research Questions
What can we validly infer from WASL-like district benchmark assessments about….
1. Fidelity to curriculum: Are teachers teaching the district curriculum?
2. Rigor of curriculum: Is it rigorous enough? Are we preparing our kids adequately?
3. Growth toward state standard: Are our district assessments showing us true achievement of state standards?
Results: Performance on District Assessments
[Figure: frequency histograms of scores on each assessment, y-axis Frequency. Panels and score ranges shown: Fall SASL (0 to 19), Unit 2 (0 to 10), Unit 3 (0 to 11), Unit 4 (0 to 10), Unit 5 (0 to 7), Unit 6 (0 to 10), Winter SASL (0 to 17), 2007 Grade 5 WASL (axis labeled 5 to 53).]
Research Questions
Further Questions about Growth toward State Standard:
• Are our district assessments showing us true achievement of state standards?
• Are our district assessments aligned to WASL and measuring the same thing?
• How can I “link” scores from district assessments to WASL so that they all sit on the same scale?
• Could difficulty parameters for local items help teachers see the relative difficulty of different skills?
Linking District Assessments to WASL
Equating, Scaling and Linking Literature
• Linear Equating
• Equipercentile Equating
• Item Response Theory (IRT)
• Rasch Model (Winsteps) places item difficulties and person abilities on the same scale
• “Common person” single group design
• Concurrent calibration
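As a small illustration of the simplest method on this list, the sketch below shows linear equating under a single group design, where the same students have a score on both tests. It maps a local score onto the target scale by matching means and standard deviations. The function name and the score lists are made up for illustration; they are not data from this study, and the study itself used the Rasch model rather than this method.

```python
from statistics import mean, pstdev


def linear_equate(x, local_scores, target_scores):
    """Map local-test score x onto the target scale by matching mean and SD.

    Assumes both score lists come from the same group of students
    (single group design).
    """
    mu_x, sd_x = mean(local_scores), pstdev(local_scores)
    mu_y, sd_y = mean(target_scores), pstdev(target_scores)
    if sd_x == 0:
        raise ValueError("local scores have no variance; cannot equate")
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical scores for the same five students on a short unit test
# (raw points) and on a state-style scale. Illustrative values only.
unit_raw = [2, 4, 6, 8, 10]
state_scale = [360, 380, 400, 420, 440]

equated_at_mean = linear_equate(6, unit_raw, state_scale)
equated_above_mean = linear_equate(8, unit_raw, state_scale)
```

With these invented numbers a raw score at the local mean maps to the target mean, and each raw point is worth about ten scale points. Linear equating assumes the two score distributions differ only in location and spread, which short district tests may not satisfy.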
[Figure: growth curves for a random sample of 5 students. y-axis: MATH (about 300 to 508); x-axis: TIME (-0.20 to 4.20), with waves labeled Grade 4 WASL, Fall SASL, Unit 2, Unit 3, Unit 4; legend: FRL = 0, FRL = 1.]
Results: Tentative Growth Curves
Two Challenges
Reliability
• A function of test length and redundancy of items
• District assessments fairly short
• HLM raised doubts about reliability of growth curves
Dimensionality
• Rasch model assumes one dimension
• Each district assessment measures somewhat different content
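Both challenges can be made concrete with two standard formulas. The sketch below is illustrative only: the Spearman-Brown formula shows how reliability would be expected to change with test length if the added items were parallel to the existing ones, and the Rasch function shows that the probability of a correct answer depends on a single ability value, which is the one-dimension assumption. The function names and the input values are hypothetical, not estimates from this study.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor.

    Assumes the added items are parallel to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def rasch_probability(theta, b):
    """Rasch model: probability of a correct response.

    theta is the single person ability, b is the item difficulty,
    both on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical example: a short test with reliability 0.50, doubled in length.
doubled = spearman_brown(0.50, 2)

# A person whose ability equals the item difficulty has an even chance.
even_chance = rasch_probability(0.0, 0.0)
```

If each district assessment measures somewhat different content, a single theta per student may not describe performance on all of them, which is why dimensionality matters for placing the assessments on one scale.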
Broader Issues for District Assessments
Locally developed WASL-like district benchmark assessments
Advantages: Local capacity building, content validity
Challenge: Time and expense of development, technical quality, scaling and equating
Outside Instruments, e.g., Measures of Academic Progress (MAP)
Advantage: Technical quality, growth scale
Challenge: Content validity