Copyright © 2014 CTB/McGraw-Hill LLC.

Building the NCSC Summative Assessment: Towards a Stage-Adaptive Design

Sarah Hagge, Ph.D., and Anne Davidson, Ed.D.

McGraw-Hill Education CTB

CCSSO

New Orleans, LA

June 25, 2014


Overview

Rationale for stage-adaptive test
Proposed stage-adaptive design
Overview of pilot testing: Plan and goals
Summary of results from Pilot Phase I
Main findings and next steps


Rationale for Stage-Adaptive Test

Targeted to student proficiency levels
Improved precision of student test scores
Reduced total testing time
Reduced testing burden on students and teacher test administrators


Proposed Stage-Adaptive Design

All students will receive tests with the same content distribution
Tests will be adaptive based on tiers and item difficulty
  – All students receive the same or a similar first stage, or testlet, of items
  – Students will receive a second stage of items of lower, higher, or about the same difficulty, based on their performance on the first stage of the test
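The presentation does not state how the routing decision is made. The sketch below assumes a simple number-correct rule: a student's Stage 1 proportion correct is compared against two cut points to choose a lower, similar, or higher difficulty second stage. `RoutingRule`, `route_to_stage2`, and the cut values are hypothetical illustrations, not the NCSC routing algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical cut points on the Stage 1 proportion-correct scale."""
    lower_cut: float  # below this -> lower-difficulty second stage
    upper_cut: float  # at or above this -> higher-difficulty second stage

    def __post_init__(self) -> None:
        if not 0.0 <= self.lower_cut <= self.upper_cut <= 1.0:
            raise ValueError("cuts must satisfy 0 <= lower_cut <= upper_cut <= 1")


def route_to_stage2(stage1_scores: list[int], rule: RoutingRule) -> str:
    """Choose the second-stage testlet from dichotomous Stage 1 item scores."""
    if not stage1_scores:
        raise ValueError("Stage 1 must contain at least one scored item")
    if any(score not in (0, 1) for score in stage1_scores):
        raise ValueError("item scores must be 0 or 1")
    proportion_correct = sum(stage1_scores) / len(stage1_scores)
    if proportion_correct < rule.lower_cut:
        return "lower"
    if proportion_correct >= rule.upper_cut:
        return "higher"
    return "same"


rule = RoutingRule(lower_cut=0.4, upper_cut=0.7)  # illustrative cuts only
print(route_to_stage2([1, 0, 0, 0, 0], rule))  # lower  (0.2 correct)
print(route_to_stage2([1, 1, 0, 1, 0], rule))  # same   (0.6 correct)
print(route_to_stage2([1, 1, 1, 1, 0], rule))  # higher (0.8 correct)
```

An operational design could instead route on an IRT ability estimate; a number-correct rule is used here only because it is the simplest rule that matches the three-way description above.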


Example of a Stage-Adaptive Design

Stage 1: Moderate difficulty (all students)
Stage 2A: Lower difficulty (lower-performing students from Stage 1)
Stage 2B: Higher difficulty (higher-performing students from Stage 1)

Overview of Pilot Testing


Purpose of Pilot Testing

Collect information necessary to support development and refinement of the NCSC summative assessment design

Pilot Phase 1: Item tryout, Spring 2014
  – Generate student performance data
  – Investigate administration conditions
  – Understand how the items are functioning
  – Investigate the proposed item scoring processes and procedures

Pilot Phase 2: Test forms, Fall 2014
  – Investigate the adaptive algorithm
  – Collect form and student performance data


Broad goals

Try out items
Evaluate items
Understand administration policies
Understand administration processes
  – Computer-based system
  – Accommodations
Investigate building an IRT scale
Develop the stage-adaptive design specification
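The presentation does not name the IRT model under investigation. Because the pilot items are dichotomously scored, a minimal sketch assuming the Rasch (one-parameter logistic) model shows how such a scale links item difficulty to student ability, and how an ability estimate could be computed from a short testlet. The expected a posteriori (EAP) estimator, the standard normal prior, the grid settings, and the item difficulties are all assumptions chosen for illustration.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def eap_ability(responses: list[int], difficulties: list[float],
                n_points: int = 81, lo: float = -4.0, hi: float = 4.0) -> float:
    """Expected a posteriori (EAP) ability estimate.

    Uses a standard normal prior evaluated on an evenly spaced theta grid;
    responses are 1 (correct) or 0 (incorrect), aligned with difficulties.
    """
    if len(responses) != len(difficulties):
        raise ValueError("responses and difficulties must be the same length")
    step = (hi - lo) / (n_points - 1)
    weighted_theta = 0.0
    total_weight = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for u, b in zip(responses, difficulties):
            p = rasch_probability(theta, b)
            weight *= p if u == 1 else 1.0 - p  # likelihood of each response
        weighted_theta += theta * weight
        total_weight += weight
    return weighted_theta / total_weight


difficulties = [-1.0, 0.0, 1.0]  # hypothetical item difficulties
print(round(eap_ability([1, 1, 0], difficulties), 3))
```

EAP is used in this sketch because it stays finite for all-correct and all-incorrect response patterns, where a maximum likelihood estimate diverges; that matters for short first-stage testlets.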


ELA Content and Forms

Grades 3–8 and 11
8 forms/grade
  – Four reading passages
      • Two literary and two informational
      • Foundational items in Grades 3 and 4
  – 22–35 items/form
  – One passage at each of the four tiers
  – Selected response and dichotomously scored constructed response items


Math Content and Forms

Grades 3–8 and 11
8 forms/grade
  – 25 items per form
  – Each form contained a mix of all four item tiers
  – Content distribution percentages similar across the 8 forms
  – Selected response and dichotomously scored constructed response items


Initial Analysis

Demographic characteristics of student sample
  – Descriptive statistics (e.g., gender, ethnicity) were collected for the sample of students.
  – A learner characteristic inventory was used to collect profile information about participating students.
  – Accommodations data were collected prior to administration, along with whether each eligible student used the accommodation.
Form-level results
Classical item analysis
Tier analysis
Item response time


Flagging Criteria for Item Reviews

Classical item analysis
  – Low p-value, < 0.50 (note: Tier 1 items have 2 answer choices)
  – High p-value, > 0.90
  – Low point-biserial correlation, < 0.20
  – High option point-biserial correlation, > 0.05
  – Omit rate, > 5%
Tier reversals (Tier 1 p-value < Tier 4 p-value)
Key checks (distractor analysis)
Survey and student interaction study results
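The classical flagging criteria above can be sketched as a single function. This is an illustrative sketch, not the analysis code used for the pilot. It assumes each response is the selected option letter (or `None` for an omit), scores omits as incorrect, computes an uncorrected point-biserial against a total score that includes the item, and applies the option point-biserial rule to distractors only. It applies the 0.50 low p-value threshold uniformly and does not model the Tier 1 note. `flag_item` and the example data are hypothetical.

```python
from statistics import mean, pstdev

# Thresholds transcribed from the flagging criteria above.
LOW_P, HIGH_P = 0.50, 0.90
LOW_PBIS = 0.20
HIGH_OPTION_PBIS = 0.05
MAX_OMIT_RATE = 0.05


def point_biserial(indicator: list[int], totals: list[float]) -> float:
    """Pearson correlation between a 0/1 indicator and total scores."""
    sd_x, sd_y = pstdev(indicator), pstdev(totals)
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when either variable is constant
    mx, my = mean(indicator), mean(totals)
    cov = mean([(x - mx) * (y - my) for x, y in zip(indicator, totals)])
    return cov / (sd_x * sd_y)


def flag_item(responses: list, key: str, totals: list[float]) -> dict:
    """Classical statistics and review flags for one selected-response item.

    responses holds each student's chosen option, or None for an omit;
    totals holds each student's total test score (including this item).
    """
    n = len(responses)
    scores = [1 if r == key else 0 for r in responses]  # omits score 0
    p_value = mean(scores)
    pbis = point_biserial(scores, totals)
    omit_rate = sum(r is None for r in responses) / n
    flags = []
    if p_value < LOW_P:
        flags.append("low p-value")
    if p_value > HIGH_P:
        flags.append("high p-value")
    if pbis < LOW_PBIS:  # NaN compares False, so constant items are not flagged here
        flags.append("low point-biserial")
    for option in sorted({r for r in responses if r is not None and r != key}):
        indicator = [1 if r == option else 0 for r in responses]
        if point_biserial(indicator, totals) > HIGH_OPTION_PBIS:
            flags.append(f"high option point-biserial ({option})")
    if omit_rate > MAX_OMIT_RATE:
        flags.append("high omit rate")
    return {"p_value": p_value, "point_biserial": pbis,
            "omit_rate": omit_rate, "flags": flags}


# Hypothetical item keyed "A" on which high scorers chose distractor "B".
result = flag_item(["B", "B", "A", "A", None], "A", [10, 9, 3, 2, 1])
print(result["flags"])
```

Operational analyses often use a corrected point-biserial that removes the item from the total score; the uncorrected form is kept here for brevity.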

Pilot Phase I Results


Summary of Student Counts

3832 students overall took ELA (n forms = 8/grade)
3703 students overall took Math (n forms = 8/grade)

Grade   N Total   N ELA   N Math
3           717     518      533
4           742     576      514
5           723     526      550
6           766     598      544
7           722     533      540
8           756     546      555
11          735     535      467
Total      5161    3832     3703


Summary of Descriptive Statistics

Subgroup      Category                          N      %
Gender*       Male                           3329   64.8
              Female                         1811   35.2
Ethnicity**   White                          2690   52.1
              Asian                           159    3.1
              Hawaiian or Pacific Islander     88    1.7
              Indian or Alaska Native         205    4.0
              Hispanic                       1296   25.1
              African American                697   13.5


Summary of Accommodations

Accommodation              Needs N  Needs %   Used N   Used %
Assistive Presentation         278      5.4      107      2.1
Assistive Response             457      8.9      191      3.7
Braille Form                    **       **       **       **
Large Print Form               229      4.4       82      1.6
Paper Version                  512      9.9      349      6.8
Read or Reread                4471     86.6     2930     56.8
Text to Speech                1263     24.5      582     11.3
Scribe                        1103     21.4      446      8.6
Speech to Text                 338      6.5       86      1.7
Sign Interpretation             98      1.9       40      0.8
No Accommodation Needed       2069     40.1     1429     27.7


ELA Form-Level Results

Note. * Forms included all ELA items except the extended Writing prompt.
Note. Cronbach alpha coefficients ranged from 0.56 to 0.90 on ELA forms.


Math Form-Level Results

Note. Cronbach alpha coefficients ranged from 0.31 to 0.83 on math forms.
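Both form-level summaries report Cronbach's alpha, which compares the sum of item variances with the variance of total scores. A minimal sketch for a students-by-items matrix of 0/1 scores follows; the function name and the example matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(score_matrix: list[list[int]]) -> float:
    """Cronbach's alpha for a students x items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(score_matrix[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_columns = list(zip(*score_matrix))  # one tuple of scores per item
    item_variance_sum = sum(pvariance(column) for column in item_columns)
    total_variance = pvariance([sum(row) for row in score_matrix])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 4 students x 3 items (1 = correct, 0 = incorrect).
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(example))  # prints 0.75
```

Using population variances throughout gives the same alpha as using sample variances, because the n/(n − 1) factor cancels in the ratio.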


Classical Item Results

Range of item p-values
  – 0.05 to 0.95
  – P-value standard deviation of 0.11 to 0.23, depending on test form
  – Very few items with low or high p-values
Item omit rates less than 3% across all items
The majority of flagged items had a low point-biserial correlation or a high option point-biserial correlation


Tier results: Mean p-values

[Figure: Mean p-values by grade (3–8, 11) for Tiers 1–4, in separate ELA and Math panels; y-axis: mean p-value, 0 to 1]
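The tier analysis compares mean p-values across tiers. A small sketch, assuming item-level p-values are available as (tier, p-value) pairs for one grade and content area, computes tier means, lists adjacent tiers whose ordering runs against the intended difficulty progression, and applies the reversal rule from the flagging criteria (Tier 1 mean p-value below Tier 4). The function names and example p-values are hypothetical.

```python
from statistics import mean


def mean_p_by_tier(item_p_values: list[tuple[int, float]]) -> dict[int, float]:
    """Average item p-values within each tier for one grade and content area."""
    by_tier: dict[int, list[float]] = {}
    for tier, p_value in item_p_values:
        by_tier.setdefault(tier, []).append(p_value)
    return {tier: mean(values) for tier, values in sorted(by_tier.items())}


def ordering_departures(tier_means: dict[int, float]) -> list[tuple[int, int]]:
    """Adjacent tier pairs where the higher tier is not harder (mean p not lower)."""
    tiers = sorted(tier_means)
    return [(lower, higher) for lower, higher in zip(tiers, tiers[1:])
            if tier_means[higher] >= tier_means[lower]]


def tier_reversal(tier_means: dict[int, float]) -> bool:
    """Reversal as defined in the flagging criteria: Tier 1 mean p below Tier 4."""
    return tier_means[1] < tier_means[4]


# Hypothetical item-level p-values for one grade: (tier, p-value).
items = [(1, 0.82), (1, 0.78), (2, 0.61), (3, 0.55), (3, 0.49), (4, 0.41)]
means = mean_p_by_tier(items)
print(ordering_departures(means), tier_reversal(means))  # prints [] False
```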

Discussion and Next Steps


Main Findings

– Evidence that content is appropriate for students in the Phase I Pilot sample
    • A wide range of item p-values
    • Relatively few items flagged for high or low p-values
    • Item omit rates and not-reached rates of 3% or less
    • Form percent correct ranging from approximately 45% to 70%
– Evidence that tiers are functioning according to design at an aggregate level
    • Tier 1 easier than the other three tiers
    • Tiers 2, 3, and 4 tended to increase in difficulty, from least to most difficult
– Evidence that the item bank can support forms at different difficulty levels
    • Items exhibit a range of p-values


Next Steps

– Investigate IRT scaling on forms with larger sample sizes
– Conduct item- and form-level analyses by student subgroup
– Conduct simulation studies of the adaptive design
– Pilot Phase 2
    • Field-test items to obtain statistics for the operational item bank
    • Evaluate the stage-adaptive design
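The simulation studies are listed as a next step without a described setup. A minimal Monte Carlo sketch, assuming a Rasch model, a standard normal ability distribution, a single proportion-correct cut for two-way routing (as in the Stage 2A/2B example), and a hypothetical true-ability threshold, estimates how often simulated students are routed to the second stage that matches their true ability. All names, parameters, and the seed are hypothetical.

```python
import math
import random
from statistics import mean


def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_two_stage_routing(n_students: int, stage1_difficulties: list[float],
                               cut_proportion: float, theta_cut: float = 0.0,
                               seed: int = 2014) -> dict:
    """Simulate Stage 1 responses and route each simulee to Stage 2A or 2B.

    Routing agreement compares the observed route with the route implied by
    the simulee's true ability (theta >= theta_cut -> Stage 2B).
    """
    rng = random.Random(seed)  # fixed seed for reproducible replications
    low_thetas, high_thetas = [], []
    agreements = 0
    for _ in range(n_students):
        theta = rng.gauss(0.0, 1.0)  # assumed N(0, 1) ability distribution
        n_correct = sum(rng.random() < rasch_p(theta, b) for b in stage1_difficulties)
        to_stage_2b = n_correct / len(stage1_difficulties) >= cut_proportion
        (high_thetas if to_stage_2b else low_thetas).append(theta)
        if to_stage_2b == (theta >= theta_cut):
            agreements += 1
    return {
        "share_stage_2b": len(high_thetas) / n_students,
        "mean_theta_2a": mean(low_thetas) if low_thetas else float("nan"),
        "mean_theta_2b": mean(high_thetas) if high_thetas else float("nan"),
        "routing_agreement": agreements / n_students,
    }


summary = simulate_two_stage_routing(2000, [-1.0, -0.5, 0.0, 0.5, 1.0], 0.6)
print(summary)
```

Varying the Stage 1 length, the cut proportion, and the ability distribution in such a loop is one way to see how routing accuracy trades off against first-stage testing time.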


Thank you!
