idaho standards achievement tests in english language arts and mathematics 2018–2019 technical...

Idaho Standards Achievement

Tests in English Language Arts

and Mathematics

2018–2019 Technical Report

Submitted to

Idaho State Department of Education

by the American Institutes for Research

Idaho Standards Achievement Tests in English Language Arts and Mathematics


i American Institutes for Research

TABLE OF CONTENTS

1. BACKGROUND................................................................................................................... 1

1.1 Updates; 2015–2018 ..................................................................................................................... 2

1.2 Changes in 2018–2019 Summative Assessments ........................................................................... 2

1.3 Impact of Changes in 2018–2019 ELA/L Test Blueprints.............................................................. 3

2. TEST ADMINISTRATION .................................................................................................. 7

2.1 Testing Windows.......................................................................................................................... 7

2.2 Test Options and Administrative Roles ......................................................................................... 7

2.2.1 Administrative Roles .......................................................................................................... 8

2.2.2 Online Administration ..................................................................................................... 10

2.2.3 Paper-Pencil Test Administration .................................................................................... 11

2.2.4 Braille Test Administration .............................................................................................. 11

2.3 Training and Information for Test Coordinators and Administrators ............................................ 12

2.3.1 Online Training ............................................................................................................... 12

2.3.2 District Training Workshops ............................................................................................ 17

2.4 Test Security .............................................................................................................................. 17

2.4.1 Student-Level Testing Confidentiality............................................................................... 17

2.4.2 System Security................................................................................................................ 18

2.4.3 Security of the Testing Environment ................................................................................. 19

2.4.4 Test Security Violations ................................................................................................... 20

2.5 Student Participation .................................................................................................................. 20

2.5.1 Homeschooled Students ................................................................................................... 21

2.5.2 Exempt Students .............................................................................................................. 21

2.6 Online Testing Features and Testing Accommodations ............................................................... 21

2.6.1 Online Universal Tools for All Students ........................................................................... 22

2.6.2 Designated Supports and Accommodations ...................................................................... 24

2.7 Data Forensics Program .............................................................................................................. 39



ii American Institutes for Research

2.7.1 Data Forensics Report ..................................................................................................... 39

2.7.2 Changes in Student Performance ..................................................................................... 39

2.7.3 Item Response Time ......................................................................................................... 40

2.7.4 Inconsistent Item Response Pattern .................................................................................. 40

2.8 Prevention and Recovery of Disruptions in Test Delivery System ............................................... 41

2.8.1 High-Level System Architecture ....................................................................................... 42

2.8.2 Automated Backup and Recovery ..................................................................................... 43

2.8.3 Other Disruption Prevention and Recovery ...................................................................... 43

3. CONSTRUCTION OF GRADES 9 AND 10 TESTS........................................................... 45

3.1 Test Blueprints ........................................................................................................................... 45

3.1.1 ELA/L Test Blueprint ....................................................................................................... 45

3.1.2 Mathematics Test Blueprint ............................................................................................. 46

3.2 Summary of Simulation Studies .................................................................................................. 46

3.2.1 Summary of Adaptive Algorithm ...................................................................................... 46

3.2.2 Testing Plan .................................................................................................................... 48

3.2.3 Statistical Summaries ...................................................................................................... 49

3.2.4 Summary Statistics on Test Blueprints ............................................................................. 50

3.2.5 Summary Statistics on Ability Estimation ......................................................................... 52

3.2.6 Global Item Exposure ...................................................................................................... 54

3.3 Establishing Cut Scores .............................................................................................................. 55

4. SUMMARY OF 2018–2019 OPERATIONAL TEST ADMINISTRATION ....................... 61

4.1 Student Population ..................................................................................................................... 61

4.2 Summary of Overall Student Performance .................................................................................. 62

4.3 Test-Taking Time ....................................................................................................................... 78

4.4 Distribution of Student Ability and Item Difficulty ..................................................................... 81

5. VALIDITY ......................................................................................................................... 90

5.1 Evidence on Test Content ........................................................................................................... 90

5.2 Evidence on Internal Structure .................................................................................................... 97

6. RELIABILITY .................................................................................................................. 101

6.1 Marginal Reliability.................................................................................................................. 101



iii American Institutes for Research

6.2 Standard Error Curves .............................................................................................................. 102

6.3 Reliability of Achievement Classification ................................................................................. 108

6.4 Reliability for Subgroups .......................................................................................................... 113

6.5 Reliability for Claim Scores ...................................................................................................... 116

7. SCORING ......................................................................................................................... 120

7.1 Estimating Student Ability Using Maximum Likelihood Estimation.......................................... 120

7.2 Rules for Transforming Theta to Vertical Scale Scores ............................................................. 121

7.3 Lowest/Highest Obtainable Scores (LOSS/HOSS) .................................................................... 122

7.4 Scoring All Correct and All Incorrect Cases .............................................................................. 123

7.5 Rules for Calculating Strengths and Weaknesses for Claim Scores ............................................ 123

7.6 Target Scores............................................................................................................................ 124

7.6.1 Target Scores Relative to Student’s Overall Estimated Ability ........................................ 124

7.6.2 Target Scores Relative to Proficiency Standard (Level 3 Cut) ........................................ 125

7.7 Handscoring ............................................................................................................................. 126

7.7.1 Rater Selection .............................................................................................................. 126

7.7.2 Rater Training ............................................................................................................... 127

7.7.3 Rater Statistics .............................................................................................................. 129

7.7.4 Rater Monitoring and Retraining ................................................................................... 130

7.7.5 Validity Checks.............................................................................................................. 130

7.7.6 Rater Dismissal ............................................................................................................. 131

7.7.7 Rater Agreement ............................................................................................................ 131

8. REPORTING AND INTERPRETING SCORES ............................................................... 134

8.1 Online Reporting System for Students and Educators ................................................................ 134

8.1.1 Types of Online Score Reports ....................................................................................... 134

8.1.2 Online Reporting System ............................................................................................... 136

8.2 Interpretation of Reported Scores .............................................................................................. 150



iv American Institutes for Research

8.2.1 Scale Score .................................................................................................................... 150

8.2.2 Standard Error of Measurement .................................................................................... 150

8.2.3 Achievement Level ......................................................................................................... 150

8.2.4 Performance Category for Claims ................................................................................. 151

8.2.5 Performance Category for Targets ................................................................................ 151

8.2.6 Aggregated Score .......................................................................................................... 152

8.3 Appropriate Uses for Scores and Reports .................................................................................. 152

9. QUALITY CONTROL PROCEDURE .............................................................................. 154

9.1 Adaptive Test Configuration ..................................................................................................... 154

9.1.1 Platform Review ............................................................................................................ 154

9.1.2 User Acceptance Testing and Final Review.................................................................... 155

9.2 Quality Assurance in Document Processing .............................................................................. 155

9.3 Quality Assurance in Data Preparation ...................................................................................... 155

9.4 Quality Assurance in Handscoring ............................................................................................ 155

9.4.1 Double Scoring Rates, Agreement Rates, Validity Sets, and Ongoing Read-Behinds ....... 155

9.4.2 Handscoring QA Monitoring Reports ............................................................................ 156

9.4.3 Monitoring by State Department of Education ............................................................... 156

9.4.4 Identifying, Evaluating, and Informing the State on Alert Responses .............................. 156

9.5 Quality Assurance in Test Scoring ............................................................................................ 157

9.5.1 Score-Report Quality Check .......................................................................................... 158

REFERENCES ....................................................................................................................... 161

APPENDICES ........................................................................................................................ 162



v American Institutes for Research

LIST OF TABLES

Table 1. Changes in ELA/L Test Blueprints......................................................................................... 3

Table 2. Changes in Testing Time (Grades 3–8 and 10) ....................................................................... 4

Table 3. Changes in Testing Time (Grades 9 and 11) ........................................................................... 4

Table 4. Changes in Test Score Reliabilities (Grades 3–8 and 10) ........................................................ 5

Table 5. Changes in Test Score Reliabilities (Grades 9 and 11) ............................................................ 5

Table 6. 2018–2019 Testing Windows ................................................................................................. 7

Table 7. Summary of Tests and Testing Options in 2018–2019 ............................................................ 7

Table 8. Number of Students Who Took Paper-Pencil Tests in 2018–2019 Summative Test

Administration ........................................................................................................................... 11

Table 9. SY 2018–2019 Universal Tools, Designated Supports, and Accommodations ...................... 28

Table 10. ELA/L Total Students with Allowed Embedded and Non-Embedded Accommodations

(Grades 3–8 and 10) ................................................................................................................... 29

Table 11. ELA/L Total Students with Allowed Embedded and Non-Embedded Accommodations

(Grades 9 and 11) ....................................................................................................................... 29

Table 12. ELA/L Total Students with Allowed Embedded Designated Supports (Grades 3–8 and 10) 30

Table 13. ELA/L Total Students with Allowed Embedded Designated Supports (Grades 9 and 11) .... 30

Table 14. ELA/L Total Students with Allowed Non-Embedded Designated Supports (Grades 3–8 and

10) ............................................................................................................................................. 31

Table 15. ELA/L Total Students with Allowed Non-Embedded Designated Supports (Grades 9 and 11)

................................................................................................................................................... 32

Table 16. Mathematics Total Students with Allowed Embedded and Non-Embedded Accommodations

(Grades 3–8 and 10) ................................................................................................................... 33


(Grades 9 and 11) ....................................................................................................................... 33

Table 18. Mathematics Total Students with Allowed Embedded Designated Supports (Grades 3–8 and

10) ............................................................................................................................................. 34

Table 19. Mathematics Total Students with Allowed Embedded Designated Supports (Grades 9 and

11) ............................................................................................................................................. 35

Table 20. Mathematics Total Students with Allowed Non-Embedded Designated Supports (Grades 3–

8, 10) .......................................................................................................................................... 36

Table 21. Mathematics Total Students with Allowed Non-Embedded Designated Supports (Grades 9

and 11) ....................................................................................................................................... 38

Table 22. Parameters Used to Generate True Ability Distributions ..................................................... 48



vi American Institutes for Research

Table 23. Percentage of Tests Meeting Blueprint Requirements in ELA/L ......................................... 50

Table 24. Percentage of Tests Meeting Blueprint Requirements in Mathematics ................................ 51

Table 25. Average and Range of the Number of Unique Targets Covered in Each Test by Claim ....... 52

Table 26. Bias of the Estimated Abilities ........................................................................................... 52

Table 27. Average Item Difficulty and Student Ability ...................................................................... 53

Table 28. Standard Errors of the Estimated Abilities .......................................................................... 53

Table 29. Percentage of Items by Exposure Rate ............................................................................... 55

Table 30. Sample Sizes of Grades 9, 10, and 11 Students in Vertical Linking Sample ........................ 58

Table 31. Cut Scores for ELA/L ........................................................................................................ 58

Table 32. Cut Scores for Mathematics ............................................................................................... 59

Table 33. Predicted Cut Scores for ELA/L ......................................................................................... 59

Table 34. Predicted Cut Scores for Mathematics ................................................................................ 60

Table 35. Number of Students in Summative ELA/L Assessment (Grades 3–8 and 10) ...................... 61

Table 36. Number of Students in Summative ELA/L Assessment (Grades 9 and 11) .......................... 61

Table 37. Number of Students in Summative Mathematics Assessment (Grades 3–8 and 10) ............. 62

Table 38. Number of Students in Summative Mathematics Assessment (Grades 9 and 11) ................. 62

Table 39. ELA/L Percentage of Students in Achievement Levels for Overall and by Subgroup (Grades

3–5)............................................................................................................................................ 63


6–8)............................................................................................................................................ 64

Table 41. ELA/L Percentage of Students in Achievement Levels for Overall and by Subgroup (Grade

10) ............................................................................................................................................. 65


9 and 11) .................................................................................................................................... 66

Table 43. Mathematics Percentage of Students in Achievement Levels for Overall and by Subgroup

(Grades 3–5)............................................................................................................................... 67


(Grades 6–8)............................................................................................................................... 68


(Grade 10) .................................................................................................................................. 69


(Grades 9 and 11) ....................................................................................................................... 70

Table 47. ELA/L Percentage of Students in Performance Categories by Claim (Grades 3–8 and 10) .. 77



vii American Institutes for Research

Table 48. ELA/L Percentage of Students in Performance Categories by Claim (Grades 9 and 11) ...... 77

Table 49. Mathematics Percentage of Students in Performance Categories by Claim (Grades 3–8 and

10) ............................................................................................................................................. 78

Table 50. Mathematics Percentage of Students in Performance Categories by Claim (Grades 9 and 11)

................................................................................................................................................... 78

Table 51. ELA/L Test-Taking Time (Grades 3–8 and 10) .................................................................. 79

Table 52. ELA/L Test-Taking Time (Grades 9 and 11) ...................................................................... 80

Table 53. Mathematics Test-Taking Time (Grades 3–8 and 10) ......................................................... 80

Table 54. Mathematics Test-Taking Time (Grades 9 and 11) ............................................................. 81

Table 55. Percentage of ELA/L Delivered Tests Meeting Blueprint Requirements for Each Claim and

the Number of Passages Administered (Grades 3–5) ................................................................... 91

Table 56. ELA/L Percentage of Delivered Tests Meeting Blueprint Requirements for Each Claim and

the Number of Passages Administered (Grades 6–8, 10) ............................................................. 92

Table 57. ELA/L Percentage of Delivered Tests Meeting Blueprint Requirements for Each Claim and

the Number of Passages Administered (Grades 9, 11) ................................................................. 93

Table 58. Mathematics Percentage of Delivered Tests Meeting Blueprint Requirements for Claims and

Targets (Grades 3–5) .................................................................................................................. 94


Targets (Grades 6–8 and 10) ....................................................................................................... 95


Targets (Grades 9 and 11) ........................................................................................................... 96

Table 61. Average and Range of the Number of Unique Targets Assessed Within Each Claim Across

All Delivered Tests (Grades 3–8 and 10) .................................................................................... 97

Table 62. Average and Range of the Number of Unique Targets Assessed Within Each Claim Across

All Delivered Tests (Grades 9 and 11) ........................................................................................ 97

Table 63. Correlations Among Claim Scores for ELA/L (Grades 3–8 and 10) .................................... 99

Table 64. Correlations Among Claim Scores for ELA/L (Grades 9 and 11) ........................................ 99

Table 65. Correlations Among Claim Scores for Mathematics (Grades 3–8 and 10) ......................... 100

Table 66. Correlations Among Claim Scores for Mathematics (Grades 9 and 11) ............................. 100

Table 67. Marginal Reliability for ELA/L and Mathematics (Grades 3–8 and 10) ............................ 102

Table 68. Marginal Reliability for ELA/L and Mathematics (Grades 9 and 11) ................................ 102

Table 69. Average Conditional Standard Error of Measurement by Achievement Level (Grades 3–8,

10) ........................................................................................................................................... 107



viii American Institutes for Research

Table 70. Average Conditional Standard Error of Measurement by Achievement Level (Grades 9 and

11) ........................................................................................................................................... 107

Table 71. Average Conditional Standard Error of Measurement at Each Achievement Level Cut Score

and Difference of the SEMs Between Two Cuts (Grades 3–8 and 10) ....................................... 108


and Difference of the SEMs Between Two Cuts (Grades 9 and 11) ........................................... 108

Table 73. Classification Accuracy and Consistency by Achievement Level (Grades 3–8 and 10) ..... 112

Table 74. Classification Accuracy and Consistency by Achievement Level (Grades 9 and 11) ......... 113

Table 75. Marginal Reliability and Average Conditional SEM by Subgroup for ELA/L (Grades 3–5)

................................................................................................................................................. 113

Table 76. Marginal Reliability and Average Conditional SEM by Subgroup for ELA/L (Grades 6–8

and 10) ..................................................................................................................................... 114

Table 77. Marginal Reliability and Average Conditional SEM by Subgroup for ELA/L (Grades 9 and

11) ........................................................................................................................................... 114

Table 78. Marginal Reliability and Average Conditional SEM by Subgroup for Mathematics (Grades

3–5).......................................................................................................................................... 115

Table 79. Marginal Reliability and Average Conditional SEM by Subgroup for Mathematics (Grades

6–8 and 10) .............................................................................................................................. 115

Table 80. Marginal Reliability and Average Conditional SEM by Subgroup for Mathematics (Grades 9

and 11) ..................................................................................................................................... 116

Table 81. Marginal Reliability Coefficients for Claim Scores in ELA/L (Grades 3–8 and 10) .......... 117

Table 82. Marginal Reliability Coefficients for Claim Scores in ELA/L (Grades 9 and 11) .............. 118

Table 83. Marginal Reliability Coefficients for Claim Scores in Mathematics (Grades 3–8 and 10).. 118

Table 84. Marginal Reliability Coefficients for Claim Scores in Mathematics (Grades 9 and 11) ..... 119

Table 85. Vertical Scaling Constants on the Reporting Metric ......................................................... 121

Table 86. Cut Scores in Scale Scores (Grades 3–11) ........................................................................ 122

Table 87. Extended Lowest- and Highest-Obtainable Scores (Grades 3−11) .................................... 123

Table 88. ELA/L Rater Agreements for Short-Answer Items (Grades 3–11) .................................... 131

Table 89. ELA/L Rater Agreements for Full-Write Items (Grades 3–11) .......................................... 132

Table 90. Mathematics Rater Agreements (Grades 3–11) ................................................................. 133

Table 91. Types of Online Score Reports by Level of Aggregation .................................................. 135

Table 92. Types of Subgroups ......................................................................................................... 135

Table 93. Overview of Quality Assurance Reports ........................................................................... 158



ix American Institutes for Research

LIST OF FIGURES

Figure 1. Standard Error of Measurements Across Estimated Theta Range ........................................ 54

Figure 2. ELA/L Smarter Balanced Cut Scores in Grades 7, 8, and 11 ............................................... 56

Figure 3. Mathematics Smarter Balanced Cut Scores in Grades 7, 8, and 11 ...................................... 57

Figure 4. ELA/L %Proficient Across Years (Grades 3–8 and 10) ....................................................... 71

Figure 5. Mathematics %Proficient Across Years (Grades 3–8 and 10) .............................................. 72

Figure 6. ELA/L and Mathematics %Proficient Across Years (Grades 9 and 11)................................ 73

Figure 7. ELA/L Average Scale Score Across Years (Grades 3–8 and 10) ......................................... 74

Figure 8. Mathematics Average Scale Score Across Years (Grades 3–8 and 10) ................................ 75

Figure 9. ELA/L and Mathematics Average Scale Score Across Years (Grades 9 and 11) .................. 76

Figure 10. Student Ability–Item Difficulty Distribution for ELA/L (Grades 3–8 and 10) ................... 82

Figure 11. Student Ability–Item Difficulty Distribution for ELA/L (Grades 9 and 11) ....................... 82

Figure 12. Student Ability–Item Difficulty Distribution by Claim: ELA/L (Grades 3–5) .................... 83

Figure 13. Student Ability–Item Difficulty Distribution by Claim: ELA/L (Grades 6–8, 10) .............. 84

Figure 14. Student Ability–Item Difficulty Distribution by Claim: ELA/L (Grades 9 and 11)............. 85

Figure 15. Student Ability–Item Difficulty Distribution for Mathematics (Grades 3–8 and 10)........... 86

Figure 16. Student Ability–Item Difficulty Distribution for Mathematics (Grades 9 and 11) .............. 86

Figure 17. Student Ability–Item Difficulty Distribution by Claim: Mathematics (Grades 3–5) ........... 87

Figure 18. Student Ability–Item Difficulty Distribution by Claim: Mathematics (Grades 6–8, 10) ..... 88

Figure 19. Student Ability–Item Difficulty Distribution by Claim: Mathematics (Grades 9 and 11) .... 89

Figure 20. Conditional Standard Error of Measurement for ELA/L (Grades 3–8 and 10) .................. 103

Figure 21. Conditional Standard Error of Measurement (Grades 9 and 11) ....................................... 104

Figure 22. Conditional Standard Error of Measurement for Mathematics (Grades 3–8 and 10) ......... 105

Figure 23. Conditional Standard Error of Measurement for Mathematics (Grades 9 and 11) ............. 106

LIST OF EXHIBITS

Exhibit 1. Home Page: State Level .................................................................................................. 136

Exhibit 2. Home Page: District Level .............................................................................................. 137



x American Institutes for Research

Exhibit 3. Subject Detail Page for ELA/L by Gender: District Level ................................................ 138

Exhibit 4. Claim Detail Page for Mathematics by LEP Status: District Level ................................... 139

Exhibit 5. Target Detail Page for ELA/L: School Level ................................................................... 140

Exhibit 6. Target Detail Page for ELA/L: Teacher Level ................................................................. 141

Exhibit 7. Target Detail Page for Mathematics: School Level .......................................................... 142

Exhibit 8. Target Detail Page for Mathematics: Teacher Level......................................................... 143

Exhibit 9. Trend Report for Mathematics: District Level.................................................................. 144

Exhibit 10. Student Detail Page for ELA/L ...................................................................................... 146

Exhibit 11. Student Detail Page for Mathematics ............................................................................. 147

Exhibit 12. Participation Rate Report at District Level ..................................................................... 148

Exhibit 13. State at a Glance ELA/L ................................................................................................ 149

LIST OF APPENDICES

Appendix A Summary of 2018–2019 Interim Assessments

Appendix B Student Performance Across Five Years for All Students and by Subgroup

Appendix C Classification Accuracy and Consistency Index by Subgroup



1 American Institutes for Research

1. BACKGROUND

In 2010, The Smarter Balanced Assessment Consortium (SBAC) began developing a next-generation

assessment system. The assessments were designed to measure the new Common Core State Standards

(CCSS) in English language arts/literacy (ELA/L) and mathematics for grades 3–8 and high school, and to

provide valid, reliable, and fair test scores about student academic achievement.

The Smarter Balanced assessments consist of the end-of-year summative assessment designed for

accountability purposes and the optional interim assessments designed to support teaching and learning

throughout the year. The summative assessments are used to determine student achievement based on the

CCSS and track student progress toward college and career readiness in ELA/L and mathematics. The

summative assessments consist of two parts: a computer adaptive test (CAT) and a performance task (PT).

• Computer Adaptive Test: An online adaptive test that provides an individualized assessment for

each student.

• Performance Task: A task that challenges students to apply their knowledge and skills to respond

to real-world problems. Performance tasks can best be described as collections of questions and

activities that are coherently connected to a single theme or scenario. They are used to better

measure capacities such as depth of understanding, research skills, and complex analysis, which

cannot be adequately assessed with selected- or constructed-response items. Some performance

task items can be scored by the computer, but most are handscored.

Optional interim assessments allow teachers to check student progress throughout the year and give them

information that they can use to improve instruction and learning. These tools are used at the discretion of

schools and districts, and teachers can employ them to check students’ progress in mastering specific

concepts at strategic points during the school year. The interim assessments are available as fixed-form

tests and consist of the following features:

• Interim Comprehensive Assessments (ICAs) test the same content and report scores on the same

scale as the summative assessments.

• Interim Assessment Blocks (IABs) focus on specific sets of related concepts and provide more

detailed information about student learning.

The Idaho State Board of Education formally adopted the CCSS in ELA/L and mathematics on August 12,

2010 (State Board meeting minutes, 2010). The Idaho Content Standards define the knowledge and skills

that students need to succeed in college and careers. These standards include rigorous content and

application of knowledge through higher-order skills and align with college and workforce expectations.

At the same time, Idaho was one of 19 jurisdictions (18 states and the U.S. Virgin Islands), leading the

development of the assessments in ELA/L and mathematics.

The new statewide assessments developed by Smarter Balanced, in ELA/L and mathematics were

administered for the first time in spring 2015 to students in grades 3–11 in all Idaho public elementary and

secondary schools. American Institutes for Research (AIR) delivered and scored the assessments and

produced score reports. Measurement Incorporated (MI) scored the handscored items.

http://www.hawaiipublicschools.org/TeachingAndLearning/Testing/StateAssessment/Pages/AdaptiveTesting.aspx




1.1 UPDATES; 2015–2018

In 2015, as part of a scope of work in the Multi-Agency Assessment Cooperative (MAAC), AIR was tasked

to develop grades 9 and 10 ELA/L and mathematics tests on the basis of the grade 11 item pool in the

Smarter Balanced assessments shared by Idaho, West Virginia, and the U.S. Virgin Islands. The grades 9

and 10 tests would

• be calibrated on the Smarter Balanced grades 3–8 and 11 vertical scale;

• be administered as a computer-adaptive test; and

• have separate grade-specific cut scores.

AIR examined the CCSS for high school grades and concluded to use the grade 11 blueprint for grades 9

and 10 ELA/L. In mathematics, AIR created blueprints for grade 9 Integrated Mathematics I and grade 10

Integrated Mathematics II. AIR also set the cut scores for grades 9 and 10 in both ELA/L and mathematics

using a statistical method.

Idaho administered the high school assessment in grade 10, beginning in spring 2015. The state provides

assessments in grades 9 and 11, at the discretion of local education agencies to administer as an option.

House Bill 314, passed by the Idaho Legislature in the 2015 session mandated a review of the state’s ELA/L

and mathematics content standards, stating, “The state department of education shall begin to review the

Idaho's standards for learning of math and English language arts (ELA) in 2015. Idaho's content standards

of learning are intended to reinforce our commitment to maintaining a college and career ready standards.”

All stakeholders in Idaho were given the opportunity to voice their approval, or disapproval, of all standards

and provide actionable comments from August 12, 2015 to December 15, 2015 via an online platform.

Many avenues were utilized to elicit the maximum number of reviews statewide including radio and TV

ads reaching across Idaho. Once the challenge ended, all comments provided about specific standards were

evaluated by a team of Idaho educators and stakeholders. This team was composed of stakeholders

including K-12 teachers, administrators, higher education institutions, the PTA, parents, and business and

industry. The committee was selected from applications that were available statewide on the Idaho

Challenge home page by a team of stakeholders and the Idaho State Department of Education (SDE)

personnel. The criteria for selection of members was based on expertise, grade span experience, and

regional and stakeholder representation.

The team reviewed all actionable comments in face to face meetings on December 16 and 17, 2015.

Subsequently, the committee recommended 21 revisions or additions to the ELA/L standards and two

revisions to the mathematics standards. These revisions were taken to the Idaho State Board of Education

for approval before moving to the Idaho Legislature for final approval. The revisions were approved by the

2017 legislature.

1.2 CHANGES IN 2018–2019 SUMMATIVE ASSESSMENTS

In 2018, Smarter Balanced updated test blueprints of the ELA/L summative assessment, shortening the test

length by three to four items. The updated test blueprints were implemented in the 2018–2019 test

administration.

The purpose of the changes in the test blueprints was to reduce testing time burden by removing short

answer (SA) items while keeping the claim and target coverage specified in the test blueprint and the test

score reliability the same as the blueprint in previous years.




In the CAT component, the requirement for short answer items in claim 1 reading and claim 2 writing were

removed in the grades 3–5 assessment while keeping the short answer item requirement in the grades 6–11

assessment. Four items were reduced in claim 2 writing and two items were added in claim 4 research. In

the PT component, one or two research items were removed and thus it consisted of one research item and

one full-write item. Overall test length (CAT and PT combined) was shortened by three to four items in the

ELA/L summative assessment. Table 1 shows a summary of the changes in the test blueprints in ELA/L.

Table 1. Changes in ELA/L Test Blueprints

Component Claim 2017–2018

BP

2018–2019

BP Changes in 2018–2019 BP

CAT

Claim 1 Reading* 14–19 14–19

Grades 3–5: Removed 0–1 short answer item

requirement.

Grades 6-11: No change

Claim 2 Writing 10 6

Grades 3–5: Removed one brief-write item

requirement and reduced the item requirement

by four items.

Grades 6–11: Reduced the item requirement

by four items.

Claim 3 Listening 8–9 8–9 No change

Claim 4 Research 6 8 Added two items

PT Claim 4 Research 2–3 1

Kept one DOK 3 item, with preference for

machine-scored item

Claim 2 Full-Write 1 1 No change

Note. * Required items for claim 1 reading are 14–16 in grades 3–5, 14–19 in grades 6–7, 16–19 in grade 8, and 15–16 in grades 9–11 in both the 2017–2018 and 2018–2019 test administrations.

1.3 IMPACT OF CHANGES IN 2018–2019 ELA/L TEST BLUEPRINTS

The impact of changes in the ELA/L test blueprints is presented in Tables 2–5. The total testing time was

reduced in all grades except for grades 7 and 11. As expected, the overall testing time was reduced more in

grades 3–5 than in upper grades because of the removal of short-answer (SA) items in claims 1 and 2 in

grades 3–5. The decrease in overall testing time for grades 3–5 was estimated to be 19 to 23 minutes on the

average and 24 to 30 minutes at the 80th percentile. The decrease in overall testing time for grades 6 and

8–10 was estimated to be 5 to 12 minutes on the average, and 6 to 14 minutes at the 80th percentile. The

overall testing time and the 80th percentile remains the same in grade 7 while the overall testing time and

the 80th percentile increased by 15 and 17 minutes in grade 11.

The test score reliabilities are similar for the overall scores and claims 1, 3, and 4 scores between the 2017–

2018 and 2018–2019 administrations. The reliability for claim 2 writing scores decreased slightly because

the total required items were reduced from 11 to 7 items in combined CAT and PT tests.




Table 2. Changes in Testing Time (Grades 3–8 and 10)

Grade Overall CAT PT

Mean Median 80th Mean Median 80th Mean Median 80th

2017–2018 Testing Time

3 3:55 3:28 5:12 1:57 1:45 2:28 1:59 1:37 2:49

4 4:03 3:35 5:23 2:03 1:52 2:37 2:00 1:40 2:51

5 4:06 3:38 5:26 2:02 1:51 2:37 2:03 1:44 2:54

6 3:55 3:33 5:04 2:02 1:54 2:35 1:53 1:36 2:34

7 3:11 2:54 4:07 1:38 1:31 2:04 1:34 1:20 2:08

8 3:08 2:50 4:06 1:34 1:28 2:00 1:34 1:20 2:09

10 2:18 2:09 3:01 1:13 1:09 1:33 1:05 0:58 1:32


3 3:36 3:10 4:48 1:41 1:31 2:10 1:55 1:35 2:44

4 3:40 3:18 4:53 1:43 1:34 2:12 1:58 1:39 2:47

5 3:44 3:18 4:56 1:45 1:35 2:14 1:59 1:39 2:47

6 3:45 3:24 4:53 1:56 1:48 2:28 1:49 1:33 2:31

7 3:11 2:55 4:07 1:36 1:30 2:02 1:35 1:22 2:11

8 3:03 2:46 4:00 1:34 1:27 2:00 1:29 1:17 2:04

10 2:08 1:59 2:47 1:09 1:06 1:30 0:59 0:51 1:22

Decrease in Testing Time

3 0:19 0:18 0:24 0:16 0:14 0:18 0:04 0:02 0:05

4 0:23 0:17 0:30 0:20 0:18 0:25 0:02 0:01 0:04

5 0:22 0:20 0:30 0:17 0:16 0:23 0:04 0:05 0:07

6 0:10 0:09 0:11 0:06 0:06 0:07 0:04 0:03 0:03

7 0:00 -0:01 0:00 0:02 0:01 0:02 -0:01 -0:02 -0:03

8 0:05 0:04 0:06 0:00 0:01 0:00 0:05 0:03 0:05

10 0:10 0:10 0:14 0:04 0:03 0:03 0:06 0:07 0:10

Table 3. Changes in Testing Time (Grades 9 and 11)

Grade Overall CAT PT

Mean Median 80th Mean Median 80th Mean Median 80th


9 2:35 2:26 3:20 1:22 1:19 1:46 1:13 1:05 1:39

11 1:39 1:27 2:19 0:58 0:55 1:22 0:40 0:29 1:01


9 2:23 2:15 3:09 1:18 1:14 1:40 1:05 1:00 1:31

11 1:54 1:50 2:36 1:04 1:03 1:26 0:49 0:46 1:10

Decrease in Testing Time

9 0:12 0:11 0:11 0:04 0:05 0:06 0:08 0:05 0:08

11 -0:15 -0:23 -0:17 -0:06 -0:08 -0:04 -0:09 -0:17 -0:09




Table 4. Changes in Test Score Reliabilities (Grades 3–8 and 10)

Grade Total Score Claim 1

Reading

Claim 2

Writing

Claim 3

Listening

Claim 4

Research

2017–2018 Administration

3 0.92 0.76 0.79 0.59 0.67

4 0.92 0.74 0.78 0.58 0.68

5 0.92 0.75 0.80 0.61 0.74

6 0.92 0.76 0.80 0.59 0.68

7 0.92 0.77 0.79 0.58 0.67

8 0.92 0.76 0.78 0.59 0.69

10 0.92 0.75 0.79 0.61 0.68


3 0.92 0.76 0.72 0.60 0.70

4 0.91 0.76 0.71 0.60 0.71

5 0.92 0.75 0.71 0.61 0.76

6 0.91 0.76 0.74 0.59 0.71

7 0.91 0.78 0.73 0.55 0.71

8 0.91 0.75 0.72 0.58 0.72

10 0.91 0.77 0.73 0.60 0.71

Table 5. Changes in Test Score Reliabilities (Grades 9 and 11)

Grade Total Score Claim 1

Reading

Claim 2

Writing

Claim 3

Listening

Claim 4

Research


9 0.91 0.73 0.76 0.58 0.66

11 0.91 0.69 0.80 0.60 0.57


9 0.91 0.76 0.71 0.58 0.70

11 0.92 0.79 0.74 0.64 0.71

This report provides a technical summary of the 2018–2019 summative assessments in ELA/L and

mathematics administered in grades 3–8 and 10, as the Idaho Standards Achievement Tests (ISAT) and

also provides information on the construction of grades 9 and 10 tests. The report includes nine chapters:

Overview, Test Administration, Construction of Grades 9 and 10 Tests, Summary of 2018–2019

Operational Test Administration, Validity, Reliability, Scoring, Reporting and Interpreting Scores, and

Quality Control Procedures. The data included in this report are based on Idaho data for the summative

assessment only. The data in the tables and appendices in this report include all students with valid test

scores in the test administration system. The data may not match final accountability reports or other data

files produced by the department of education. For the interim assessments, the number of students who

took ICAs and IABs and their performance are provided in Appendix A.

Although this report includes information on all aspects of the technical quality of the ISAT test

administration for Idaho, it is an addendum to the 2018–2019 Smarter Balanced technical report. The

information on item and test development, item content review, field-test administration, item data review,




item calibrations, content alignment study, standard setting, and other validity information is included in

the Smarter Balanced technical report.

Smarter Balanced produces a report covering all technical aspects of the Smarter Balanced assessments

described in the Standards for Educational and Psychological Testing (American Educational Research

Association [AERA], American Psychological Association [APA], & National Council on Measurement in

Education [NCME], 2014) and the requirements of the U.S. Department of Education Peer Review of State

Assessment Systems Non-Regulatory Guidance for States (U.S. Department of Education, 2015). The

Smarter Balanced technical report includes information using the data at the consortium level, combining

data from the consortium states.




2. TEST ADMINISTRATION

2.1 TESTING WINDOWS

The 2018–2019 Idaho Standards Achievement Tests (ISAT) English language arts/literacy (ELA/L) and

mathematics assessments testing window spanned approximately two months for the online summative

assessments and approximately eight months for the interim assessments. The paper-pencil fixed-form tests

for the summative assessments were administered over a six-week period during the online summative

assessment testing window. Table 6 shows the testing windows for both online and paper-pencil summative

and interim assessments.

Table 6. 2018–2019 Testing Windows

Tests Grade Start Date End Date Mode

Summative Assessments 3–11 3/18/2019 5/24/2019 Online Adaptive

3–11 4/1/2019 5/10/2019 Paper Fixed-Form

Interim Comprehensive Assessments 3–8, 11 8/9/2018

6/3/2019

3/13/2019

7/17/2019 Online Fixed-Form

Interim Assessment Blocks 3–8, 11 8/9/2018

6/3/2019

3/13/2019

7/17/2019 Online Fixed-Form

2.2 TEST OPTIONS AND ADMINISTRATIVE ROLES

The ISAT ELA/L and mathematics assessments are administered primarily online. To ensure that all

eligible students in the tested grades were given the opportunity to take the ISAT ELA/L and mathematics

assessments, several assessment options were available for the 2018–2019 administration to accommodate

students’ needs. Table 7 lists the testing options that were offered in 2018–2019. A testing option was

selected by content area. Once an option was selected, it applied to all tests in the content area.

Table 7. Summary of Tests and Testing Options in 2018–2019

Assessments Test Options Test Mode

Summative Assessments

English Online

Spanish (mathematics only) Online

Braille Online

Braille Paper

Regular Print Fixed-Form Paper

Large Print Fixed-Form Paper

Interim Assessments

English Online

Spanish (mathematics only) Online Braille Online

To ensure standardized administration conditions, teachers (TEs) and test administrators (TAs) follow

procedures outlined in the Idaho Assessment Systems Manual. TEs and TAs must review the manual before

testing to ensure that the testing room is prepared appropriately (e.g., removing certain classroom posters,

arranging desks) and read the boxed directions verbatim to students before and during testing to maintain

the standardized administration conditions. Make-up procedures should be established for any students who

are absent on testing day(s).




2.2.1 Administrative Roles

The key personnel involved with the test administration are District Administrators (DAs), District Test

Coordinators (DCs), School Test Coordinators (SCs), TEs, and TAs. The main responsibilities of these key

personnel are described below. More detailed descriptions can be found in the manual, provided online at

https://idaho.portal.airast.org/core/fileparse.php/1519/urlt/Idaho-AlR-Systems-Manual.pdf.

District Administrator (DA)

The DA’s role is assigned by the Idaho State Department of Education (SDE). The DA is authorized to add

users to the Test Information Distribution Engine (TIDE) and to assign them any role except that of a DA.

DAs and DCs share many of the same test administration responsibilities. Their primary responsibility is

to coordinate the administration of the ISAT ELA/L and mathematics assessments in the district.

District Test Coordinator (DC)

The DC’s primary responsibility is to coordinate the administration of the ISAT ELA/L and mathematics

assessments in the district.

DCs are responsible for performing the following functions:

• Reviewing all state and Smarter Balanced policies and test administration documents

• Reviewing scheduling and test requirements with SCs, TEs, and TAs

• Working with SCs and Technology Coordinators to ensure that all systems, including the secure

browser, are properly installed and functioning

• Importing users (DCs, SCs, TEs, TAs) into TIDE

• Entering and verifying all student information, eligibility, and test settings in TIDE

• Scheduling and administering training sessions for all SCs, TEs, TAs, and Technology

Coordinators

• Ensuring that all personnel are trained on how to properly administer the ISAT ELA/L and

mathematics assessments

• Monitoring the secure administration of the test

• Investigating and recording all testing improprieties, irregularities, and breaches reported by the

TEs/TAs

• Attending to any secure materials in accordance with state and Smarter Balanced policies

School Test Coordinator (SC)

The SC’s primary responsibilities are to coordinate the administration of the ISAT ELA/L and mathematics

assessments and ensure that testing within his or her school is conducted in accordance with the test

procedures and security policies established by the Idaho SDE.

SCs are responsible for performing the following functions:

• Establishing a testing schedule with DCs, TEs, and TAs based on testing windows

https://idaho.portal.airast.org/core/fileparse.php/1519/urlt/Idaho-AlR-Systems-Manual.pdf




• Working with technology staff to ensure timely computer set-ups and installations

• Working with TEs and TAs to review student information in TIDE to ensure that student

information and test settings for designated supports and accommodations are correctly applied

• Entering student test settings in TIDE

• Identifying students who may require designated supports and test accommodations and ensuring

that procedures for testing these students follow state and Smarter Balanced policies

• Attending all district trainings and reviewing all state and Smarter Balanced policies and test

administration documents

• Ensuring that all TEs and TAs attend school or district trainings and review online training modules

posted on the portal

• Establishing secure and separate testing rooms if needed

• Monitoring secure administration of the test

• Monitoring testing progress during the testing window and ensuring that all students participate, as

appropriate

• Investigating and reporting all testing improprieties, irregularities, and breaches reported by the

TEs and TAs

• Attending to any secure material in accordance with state and Smarter Balanced policies

Teacher (TE)

A TE responsible for administering the ISAT ELA/L and mathematics assessments must have the same

qualifications as a TA. They also have the same test administration responsibilities as a TA. TEs can view

student results when they are made available. This role may also be assigned to teachers who do not

administer the test but will need access to student results.

Test Administrator (TA)

TAs are primarily responsible for administering the ISAT ELA/L and mathematics assessments. The TA

role does not allow access to student results and is designed for TAs, such as technology staff, who

administer tests but should not have access to student results.

TAs are responsible for performing the following functions:

• Completing ISAT ELA/L and mathematics assessments administration training

• Training and reviewing all state and Smarter Balanced policies and test administration documents

before administering any ISAT ELA/L and mathematics assessments

• Viewing student information before testing to ensure that a student receives the proper test with the

appropriate supports. TAs should report any potential data errors to SCs and DCs as appropriate.

• Administering the ISAT ELA/L and mathematics assessments

• Reporting all potential test security incidents to the SCs and DCs in a manner consistent with

Smarter Balanced, state, and district policies




2.2.2 Online Administration

Within the state’s testing window, schools can set testing schedules, allowing students to test in intervals

(e.g., multiple sessions) rather than in one long period, minimizing the interruption of classroom instruction

and efficiently using its facility. With online testing, schools do not need to address the on-site storage and

security issues associated with large shipments of printed testing materials.

SCs oversee all aspects of testing at their schools and serve as the main point of contact; TEs and TAs

administer the assessments only. TEs and TAs are trained in the online testing requirements and the

mechanics of starting, pausing, and ending a test session. Training materials for the test administration are

available online and at regional face-to-face training sessions. All school personnel who serve as test

proctors are required to complete an online TA Certification Course before testing begins. Upon completion

of this course, staff members receive a certificate and authorization to log in to the online testing system.

To start a test session, the TE or TA must first access the TA Interface of the online testing system using

his or her own computer. A test session ID is generated when the test session is created. Students who are

taking the assessment with the TE or TA need to enter their Education Unique Identification (EDUID)

number, first name, and test session ID into the Student Interface using computers provided by the school.

The TE or TA then verifies that the students are taking the appropriate assessments with the appropriate

accessibility feature(s) (see Section 2.6 for a list of accommodations). Students can begin testing only when

the TE or TA confirms the settings. The TE or TA will then read aloud the Test Administration Directions

in the Idaho Assessment Systems Manual to the student(s) and guide them through the login process.

Once an assessment is begun, the student must answer all test questions presented on a page before

proceeding to the next page. Skipping questions is not permitted. For the online computer-adaptive

test (CAT), students are allowed to scroll back to review and edit previously answered items, as long as

these items are in the same test session and this session has not been paused for more than 20 minutes.

Students may review and edit previously completed responses until they submit the assessment. During an

active CAT session, if a student reviews and changes the response to a previously answered item, then the

responses to any following items to which the student already responded remain the same. No new items

are assigned to this student because he or she changed an answer. For example, a student paused for 10

minutes after completing item 10. After the pause, the student returned to item 5 and changed the answer.

If the response change in item 5 changed the item score from incorrect to correct, the student’s overall score

improves; however, there is no change in the scores for items 6–10.

For the performance tasks (PTs), there is no pause rule, but the same rules that apply to the CAT for reviews

and changes to responses also apply to PTs.

For the summative assessment, an assessment can be started in one test session and completed in a different

test session. For the CAT, the assessment must be completed within 45 calendar days of the start date or

the assessment opportunity will expire. For the PTs, the assessment must be completed within 20 calendar

days of the start date.

During a test session, TEs or TAs may pause the test for a student or group of students for a break. It is up

to the TEs or TAs to determine an appropriate stopping point; however, for ELA/L and mathematics CATs,

the assessments cannot be paused for more than 20 minutes to ensure the integrity of the test scores or

testing. If an assessment is paused for more than 20 minutes, the student must start a new test session and

resume testing at the next unanswered question where the test was paused. The student may not view or

edit any previous responses.




The TE or TA must remain in the room at all times during a test session to monitor student testing. Once

the test session ends, the TE or TA must ensure that each student has successfully logged out of the system

and collect any handouts or scratch paper that students used during the assessment to securely shred them.

2.2.3 Paper-Pencil Test Administration

The paper-pencil versions of the ISAT ELA/L and mathematics assessments are provided as an

accommodation for students who do not have access to a computer or students with blindness or visual

impairments. For Idaho, paper-pencil tests were offered in regular print, braille, and large print formats.

In a district with student(s) who need to take the paper-pencil version of a test, the DA must submit a request

to SDE for appropriate materials on behalf of the student(s). If the request is approved, the testing contractor

will ship the appropriate test booklets, receipt instructions, and return instructions to the district.

Separate test booklets are used for ELA/L and mathematics assessments. The items from the CAT and the

PT components are combined into one test booklet, including two sessions for the CAT and one session for

the PT in both content areas. Thus, the TE or TA can break up the assessment into separate sessions.

After the student has completed the assessments, the TE must transcribe the student’s responses into the

Data Entry Interface (DEI) and return the test booklets to the testing vendor. The testing vendor will score

the handscored items. Once the handscored items are scored, scores will be combined with the non-

handscored items, and the final score will appear in the Online Reporting System (ORS).

The total number of students who took paper-pencil tests is presented in Table 8.

Table 8. Number of Students Who Took Paper-Pencil Tests

in 2018–2019 Summative Test Administration

Subject G3 G4 G5 G6 G7 G8 G9 G10 G11 Total

ELA/L 4 4 6 2 3 19

Mathematics 4 4 7 1 2 3 21

2.2.4 Braille Test Administration

The adaptive braille test was available with the same test blueprint in English in both ELA/L and

mathematics. In the 2018–2019 test administration, Smarter Balanced added the Braille Hybrid Adaptive

Test (Braille HAT) for mathematics. The Braille HAT consists of a fixed-form segment, a computer-

adaptive segment, and a fixed-form PT. The fixed-form segment includes items with tactile graphics which

can be embossed at the testing location or received as a package of pre-embossed materials through the

SDE. All items on the Braille HAT can be presented to the students using a Refreshable Braille Display

(RBD).

The braille interface is described below:

• The braille interface includes a text-to-speech component for mathematics consistent with the read-

aloud assessment accommodation. The Job Access with Speech (JAWS) screen reading software

provided by Freedom Scientific is an essential component that students use with the braille

interface.




• Mathematics items are presented to students in Nemeth Braille code via a braille embosser through

the adaptive online summative test and a fixed-form PT.

• Students taking the summative ELA/L assessment can emboss both reading passages and items as

they progress through the assessment. If a student has an RBD, a 40-cell RBD is recommended.

The summative ELA/L is presented to the student with items in either contracted or uncontracted

Literary Braille (for items containing only text) and via a braille embosser (for items with tactile or

spatial components that cannot be read by an RBD).

Before administering the online summative assessments using the braille interface, TEs or TAs must ensure

that the technical requirements are met. These requirements apply to the student’s computer, the TE/TA’s

computer, and any assistive braille technologies used in conjunction with the braille interface.

2.3 TRAINING AND INFORMATION FOR TEST COORDINATORS AND ADMINISTRATORS

All DAs, DCs, and SCs oversee all aspects of testing at their schools and serve as the main point of contact,

while TEs and TAs administer the online assessments. The online TA Certification Course, webinars,

manuals, and regional training sites are used to train TEs and TAs on the online testing requirements and

the mechanics of starting, pausing, and ending a test session. Training materials for the administration are

available online (http://idaho.portal.airast.org/resources).

2.3.1 Online Training

Multiple training opportunities were offered to key staff through the Internet.

TA Certification Course

All school personnel who serve as test proctors are required to complete an online TA Certification Course

to administer assessments. This web-based course is about 20 minutes long and covers information on

testing policies and the steps for administering a test session in the online system. The course is interactive,

requiring participants to practice starting test sessions under different scenarios. Throughout the training

and at the end of the course, participants are required to answer multiple-choice questions about the

information provided. Completion of the TA Certification Course is tracked online in the Test Information

Distribution Engine (TIDE).

Webinars

The following training modules were offered to the field:

Idaho Alternate Assessment (IDAA) Tutorial: March 1, 2018. This video provides a 20-minute walk-

through of the various steps needed for a student to participate and complete the Alternate Assessment in

ELA and Mathematics.

New Test Coordinator Video Tutorial: October 12, 2018. This video provides guidance to Idaho educators

on all the important information regarding the Idaho Assessment Systems. It provides detailed directions

on how to access the resources available on the ISAT portal.

Interim Assessment Implementation Video Tutorial: December 12, 2018. This video tutorial outlines the

tasks for administering and scoring Interim Assessments (ICAs and IABs). The optional Interim

Assessments are given to students throughout the year to help teachers monitor student progress. This video

also provides information on all available materials and resources specific to Interim Assessments.

http://idaho.portal.airast.org/




TIDE Video Tutorial Part 1 and 2: December 14, 2018. These videos provide guidance on the Test

Information Distribution Engine (TIDE). Part 1 includes details on activating your TIDE password, logging

in, and resetting your password, as well as the tasks required to add, view, and edit users, students, and

student test settings. Part 2 includes details on creating and printing rosters, printing test tickets and student

test settings, and the several reports available for monitoring test progress throughout your district or school.

Test Administration Interface and Student Interface Video Tutorial: January 11, 2019. This video provides

a walk-through of the test session setup and student sign-in process. This video tutorial also demonstrates

how students can navigate the practice tests, Interim assessments, and Summative assessments offered

through AIR.

Practice and Training Test Site

In August 2018, separate training sites were opened for TEs, TAs, and students. TEs and TAs can practice

administering assessments and starting and ending test sessions on the TA Training Site, and students can

practice taking an online assessment on the Student Practice and Training Site. The ISAT ELA/L and

mathematics assessments practice tests mirror the corresponding Summative assessments for ELA/L and

mathematics. Each test provides students with a grade-specific testing experience, including a variety of

question types and difficulty levels (approximately 30 items each in ELA/L and mathematics), as well as a

performance task.

The training tests are designed to provide students and teachers with opportunities to quickly familiarize

themselves with the software and navigational tools they will use for the ISAT ELA/L and mathematics

assessments. Training tests are available for both ELA/L and mathematics and are organized by grade bands

(grades 3–5, 6–8, and 9–11), with each test containing 5–10 questions.

A student can log in directly to the practice and training test site as a “Guest” without a TA-generated test

session ID, or the student can log in through a training test session created by the TE or TA in the TA

Training Site. Items in the student training test include all item types that are included in the operational

item pool, including multiple-choice, grid, and natural-language items. Teachers can also use these training

tests to help students become familiar with the online platform and question types.

Manuals and User Guides

The following manuals and user guides are available on the ISAT portal (http://idaho.portal.airast.org):

The Idaho Assessment Systems Manual – AIR Systems User Guide combines the following documents into

one comprehensive manual as unique chapters:

• The Test Information Distribution Engine (TIDE) User Guide is designed to help users navigate

TIDE. Users can find information on managing user account information, student account

information, student test settings, student accommodations, appeals, and rosters.

• The Test Administrator User Guide is designed to help users navigate the test delivery system

(TDS), including the Student Interface and the TA Interface, and to help support TAs in managing

and administering online testing for students.

• The Interim Test Administration Manual describes the Interim assessments and provides

administration details and policy information for District Coordinators and School Test

Coordinators regarding policies and procedures for the Interim assessments.





• The Assessment Viewing Application (AVA) User Guide provides an overview of how to access

and use AVA, which allows teachers to view items on the Interim assessments.

• The AIRWays Reporting User Guide provides instructions and support for users viewing

performance reports for Interim assessments. This user guide is intended for district-level, school-

level, and classroom teacher users.

• The Online Reporting System (ORS) User Guide provides information about the ORS, including

instructions for viewing score reports, accessing test management resources, creating and editing

rosters, and searching for students.

• The Test Improprieties in TIDE chapter provides guidance on how to correctly identify and escalate

test improprieties to the SDE by giving specific examples of test improprieties and suggested

actions in TIDE.

• The Online Summative Test Administration Manual provides information for District Coordinators

and School Test Coordinators regarding policies and procedures for the 2018–2019 ISAT ELA/L

and mathematics assessments. This manual also provides information for TEs and TAs

administering the ISAT ELA/L and mathematics assessments. It includes screen captures and step-

by-step instructions on how to administer the online tests.

• The Data Entry Interface (DEI) is a component of the TDS that allows authorized users to enter

student assessment data, such as item responses and scores.

• The Braille Requirements and Testing provides information about supported hardware and

software requirements for Braille testing and instructions for configuring JAWS. Information about

navigating an online Braille test using JAWS is also included.

• The Paper-Pencil Test Administration Quick Reference provides administration information for

accommodated tests administered on paper.

The Comprehensive Technology Manual for Technology Coordinators combines the following documents

into one comprehensive manual as unique chapters:

• The System Requirements for Online Testing outlines the basic technology requirements for

administering an online assessment, including operating system requirements and supported web

browsers.

• The Secure Browser Installation Manual provides instructions for downloading and installing the

secure browser on supported operating systems used for online assessments.

• The Technical Specifications Manual for Online Testing provides technology staff with the

technical specifications for online testing, including information on Internet and network

requirements, general hardware and software requirements, and the text-to-speech function.

• The Operating System Support Plan describes AIR’s plan for supporting operating systems during

the upcoming test administration and following years. This plan helps districts and schools manage

operating system deployments based on the support timelines.

• The Voice Pack Installation Guide provides information on how to install the NeoSpeech™ Voice

Pack on Windows operating systems for students that use this operating system and require the

text-to-speech accommodation. The voice pack is available in TIDE.




The User Roles Chart provides a brief overview of tasks within each of AIR’s systems, which users can

access each system, and features and tasks within each system.

All manuals and user guides pertaining to 2018–2019 online testing are available on the portal. DAs, DCs,

and SCs can use these manuals and user guides to train TEs and TAs regarding test administration policies

and procedures.

Quick Guides

The following quick guides were created to highlight the most important information for all the systems

used in the ISAT ELA/L and mathematics assessments.

The AIRWays Reporting Quick Guide provides instructions and support for users viewing the 2018–2019

Interim Assessment performance reports and handscoring information in AIRWays Reporting.

The ISAT Portal Quick Guide outlines the ISAT portal and the AIR systems available for the 2018–2019

school year.

The Dual Enrollments in TIDE Quick Guide describes a new feature in TIDE for users to have the ability

to enroll students in multiple districts or schools.

The Test Administration Quick Guide provides information to help users access and navigate the TA

Interface.

The Test Information Distribution Engine (TIDE) Quick Guide assist users in the field with creating

accounts for users and students in TIDE.

The Quick Guide to Printing Individual Student Reports (ISRs) in ORS provides information to users in the

field on how to quickly print ISRs for the assessments appearing in the ORS.

The Sample Tests Quick Guide provides directions to administer sample tests for the ISAT ELA/L and

Mathematics, Science and End-of-Course Assessments, and Alternate Assessment ELA/L and Mathematics

Assessments.

Training Modules

The following training modules were created to help users in the field understand the overall ISAT ELA/L

and mathematics assessments as well as how each system works. The modules were provided as PowerPoint

presentations.

Accessibility and Accommodations Training Module: This module provides a general overview of the

accessibility and accommodations tools available for students taking the ELA/L and mathematics

assessments in the TDS.

AIRWays Reporting Training Module: This module is designed to help district-level, school-level, and

classroom teachers navigate and view the Interim assessment student performance reports.

AIRWays Reporting Tutorials: These tutorials explain how to navigate AIRWays Reporting. The following

tutorials were presented to users:

1. How to Access AIRWays Reporting for Districts

2. How to Access AIRWays Reporting for Schools




3. How to Access AIRWays Reporting for Teachers

4. How to Modify Scores

5. How to Handscore Unscored Items

6. How to Access Longitudinal Reports

7. How to Export and Print Student Data

8. How to Create, Manage, and Edit Rosters

9. How to Set Up Your Reports So They Make Sense

Assessment Viewing Application (AVA) Module: This module explains how to navigate the Assessment

Viewing Application. AVA allows authorized users to view the Interim Comprehensive Assessments

(ICAs) and Interim Assessment Blocks (IABs) for administrative and instructional purposes.

Embedded Universal Tools and Online Features Module: This module acquaints students and teachers with

the online, universal tools (e.g., types of calculators, expandable text) available in the ISAT ELA/L and

mathematics assessments.

Online Reporting System (ORS) Tutorials: These tutorials explain how to navigate the ORS, including

summary statistics, student results, and score reports. The following tutorials were presented to users:

1. Defining Student Population

2. Creating Rosters

3. Viewing and Editing Rosters

4. Printing Reports

5. Using the Exploration Menu

6. Viewing Claim Detail Reports

7. Viewing Target Reports

8. Viewing Item Detail Reports

9. Viewing Reports by Demographic Sub-Groups

10. Downloading Individual Student Reports with Manifest Tutorial

11. Downloading Student Data Files

12. Printing Spanish Individual Student Reports (ISRs) – Open Captions

13. Printing Spanish Individual Student Reports (ISRs) – Closed Captions

Online Reporting System (ORS) Training Module: This training module provides information about all

ORS’s features, including instructions for viewing score reports, generating and exporting summary

statistics and student results, and adding and updating rosters.

Online Testing Braille Training Module: This module provides detailed information on how to administer

tests to students using online braille tests for the ELA/L and mathematics ISAT assessments.

Performance Task Overview Module: This module provides an overview of the performance tasks.




Technology Requirements for Online Testing Module: This module provides current information about

technology requirements, site readiness, supported devices, and secure browser installation, and is designed

to help Technology Coordinators prepare for the administration of online tests.

Test Administrator Interface and Student Interface Tutorial: This tutorial provides a 16-minute overview

of the test session setup and student sign-in process. This video tutorial also demonstrates how students can

navigate the practice tests, Interim assessments, and Summative assessments offered through AIR.

What Is a CAT? Module: This module describes the computer-adaptive test (CAT) and its role in taking

ELA/L and mathematics online assessments.

2.3.2 District Training Workshops

District Training Workshops (known locally as Assessment Roadshows) were held on January 31, February

5, and February 8, 2019, at three locations across the state of Idaho. Training was provided for the

administration of the ISAT ELA/L and mathematics assessments. During the training, district staff were

provided with information to support training their district and school staff. The following topics were

discussed during these workshops:

The Test Information Distribution Engine (TIDE) Roadshow Presentation addresses numerous topics in

TIDE, including tasks that typically take place before testing, during test administration, and after testing

has concluded.

The Technology Requirements for Online Testing and the Test Delivery System (TDS) Roadshow

Presentation assists users in the field to understand technology requirements, the TA Interface that TAs use

to administer online tests, and the Student Interface that students use to take online tests.

The Online Reporting System (ORS) Roadshow Presentation assists users in the field by providing

assessment data and tools in order to better understand student performance.

The Writing 3D-Science Assessment Entities Roadshow Presentation explains the 3-dimensional science

standards that students will be tested on in future school years.

The Troubleshooting Common Issues Roadshow Presentation provides guidance on how to reduce test

improprieties and testing delays.

2.4 TEST SECURITY

All test items, test materials, and student-level testing information are secured materials for all assessments.

The importance of maintaining test security and the integrity of test items is stressed throughout the webinar

trainings and in the user guides, modules, and manuals. Features in the testing system also protect test

security. This section describes system security, student confidentiality, and policies on testing

improprieties.

2.4.1 Student-Level Testing Confidentiality

All secured websites and software systems enforce role-based security models that protect individual

privacy and confidentiality in a manner consistent with the Family Educational Rights and Privacy Act

(FERPA) and other federal laws. Secure transmission and password-protected access are basic features of

the current system and ensure authorized data access. All aspects of the system, including item development




and review, test delivery, and reporting, are secured by password-protected logins. Our systems use role-

based security models that ensure that users may access only the data to which they are entitled and may

edit data only in accordance with their user rights.

There are three elements related to ensuring that the correct students are accessing appropriate test content:

1. Test eligibility, which refers to the assignment of a test for a particular student

2. Test accommodation, which refers to the assignment of a test setting to specific students based on

needs

3. Test session, which refers to the authentication process of a TE or TA creating and managing a test

session, the TE or TA reviewing and approving a test (and its settings) for every student, and the

student logging in to take the test

FERPA prohibits the public disclosure of student information or test results. Examples of prohibited

practices include

• providing login information (username and password) to other authorized TIDE users or to

unauthorized individuals;

• sending a student’s name and EDUID number together in an email message; and

• having students log in and test under another student’s EDUID number.

Test materials and score reports should not be exposed to identify student names with test scores except by

authorized individuals with an appropriate need-to-know status. If information about an individual test must

be sent via email or fax, include only the EDUID number, not the student’s name.

All students, including home-schooled students, must be enrolled or registered at their testing schools in

order to take the online or paper-pencil assessments. Student enrollment information, including

demographic data, is uploaded by the districts to TIDE during the school year. For any updates to student

information needed throughout the year, the DAs, DCs, and SCs must update the student information in

TIDE.

Students log in to the online assessment using their legal first name, EDUID number, and a test session ID.

Only students can log in to an online test session. TEs, TAs, proctors, or other personnel are not permitted

to log in to the system on behalf of students, although they are permitted to assist students who need help

logging in. For the paper-pencil versions of the assessments, DAs, DCs, SCs, TEs, or TAs are required to

enter student responses into the Data Entry Interface (DEI) site in order to receive scores for students testing

on paper.

After a test session, only staff with the administrative roles of DAs, DCs, SCs, or TEs can view their

students’ scores in the OORS. TAs do not have access to student scores.

2.4.2 System Security

The objective of system security is to ensure that all data are protected and accessed appropriately by the

right user groups. It is about protecting data and maintaining data and system integrity as intended,

including ensuring that all personal information is secured, that transferred data (whether sent or received)

is not altered in any way, that the data source is known, and that any service or activity can be performed

by a specific, designated user only.




A hierarchy of control: As described in Section 2.2, DAs, DCs, SCs, TAs, and TEs have well-defined

roles and access to the testing system. Districts are responsible for adding users’ access to the AIR systems

through TIDE. DAs must contact SDE to be added to TIDE. DAs are then responsible for selecting and

entering DC and SC information into TIDE, and DCs and SCs are responsible for entering TA and TE

information into TIDE. Throughout the year, the DAs, DCs, and SCs are also expected to delete user

information in TIDE for staff members who have transferred to other schools, resigned, or no longer serve

as TAs or teachers.

Password protection: All access points by different roles—at the state, district, and school levels—require

a password to log in to the system. Newly added users receive passwords through their personal email

addresses or school-assigned email addresses.

Secure Browser: A key role of the Technology Coordinator is to ensure that the secure browser is properly

installed on the computers used for the administration of the online assessments. Developed by the testing

contractor, the secure browser prevents students from accessing other computers or Internet applications

and from copying test information. The secure browser suppresses access to commonly used browsers such

as Internet Explorer and Firefox and prevents students from searching for answers on the Internet or

communicating with other students. The assessments can be accessed only through the secure browser and

not by other Internet browsers.

2.4.3 Security of the Testing Environment

The DCs, SCs, TEs, and TAs work together to determine appropriate testing schedules based on the number

of computers available, the number of students in each tested grade, and the average length of time needed

to complete each assessment.

Testing personnel are reminded in the online training, face-to-face training, and user manuals that

assessments should be administered in testing rooms that do not crowd students. Good lighting, ventilation,

and freedom from noise and interruption are important factors to consider when selecting testing rooms.

TEs and TAs must establish procedures to maintain a quiet environment during each test session,

recognizing that some students may finish more quickly than others. If students are allowed to leave the

testing room when they finish, TEs or TAs are required to explain the procedures for leaving without

disrupting others and to clarify where they are expected to report once they leave. If students are expected

to remain in the testing room until the end of the session, TEs or TAs are encouraged to prepare some quiet

work for students to do after they finish the assessment.

If a student needs to leave the room for a brief time, the TEs or TAs are required to pause the student’s

assessment. For the CAT, if the pause lasts longer than 20 minutes, the student can continue with the rest

of the assessment in a new test session, but the system will not allow the student to return to the items

presented before the pause. This measure is implemented to prevent students from using the time to look

up or verify answers.

Room Preparation

The room should be prepared before the start of the test session. Any information displayed on bulletin

boards, chalkboards, or charts that students might use to help answer test questions should be removed or

covered. This rule applies to rubrics, vocabulary charts, student work, posters, graphs, content-area strategy

charts, etc. The cell phones of both testing personnel and students must be turned off and stored out of sight

in the testing room. TEs and TAs are encouraged to minimize access to the testing rooms by posting signs




in halls and entrances in order to promote optimum testing conditions; they should also post “TESTING—

DO NOT DISTURB” signs on the doors of testing rooms.

Seating Arrangements

TEs and TAs should provide adequate spacing between students’ seats. Students should be seated so that

they will not be tempted to look at the answers of others. Because the online CAT is adaptive, it is unlikely

that students will see the same test questions as other students; however, through appropriate seating

arrangements, students should be discouraged from communicating. For the performance tasks, different

forms are spiraled within a classroom so that students receive different forms of the performance tasks.

After the Test

TEs or TAs must walk through the classroom to pick up any scratch paper that students used and any papers

that display students’ EDUID numbers and names together at the end of a test session. These materials

should be securely shredded or stored in a locked area immediately. The printed reading passages and

questions for any content area assessment provided for a student who is allowed to use this accommodation

in an individual setting must also be shredded immediately after a test session ends.

For the paper-pencil versions, specific instructions are provided in the Paper-Pencil Test Administration

Manual on how to package and secure the test booklets to be returned to the testing contractor’s office after

student responses have been entered into the DEI site.

2.4.4 Test Security Violations

Everyone who administers or proctors the assessments is responsible for understanding the security

procedures for administering them. Prohibited practices as detailed in the Idaho Assessment Systems

Manual fall into one of three categories:

Impropriety: This is a test security incident that has a low impact on the individual or group of students

who are testing and has a low risk of potentially affecting student performance on the test, test security, or

test validity (e.g., student[s] leaving the testing room without authorization).

Irregularity: A test security incident that impacts an individual or group of students who are testing and

may potentially affect student performance on the test, test security, or test validity. These circumstances

can be contained at the local level (e.g., disruption during the test session, such as a fire drill).

Breach: A test security incident that poses a threat to the validity of the test. Breaches require immediate

attention and escalation to the state agency. Examples may include such situations as exposure of secure

materials or a repeatable security/system risk. These circumstances have external implications

(e.g., administrators modifying student answers or students sharing test items through social media).

District and school personnel must document all test security incidents in the test security incident log on

the SDE website (https://apps.sde.idaho.gov/testincidentlog). This log is the document of record for all test

security incidents and should be maintained at the district level and submitted to the SDE as incidents occur

throughout testing.

2.5 STUDENT PARTICIPATION

All students (including retained students) currently enrolled in grades 3–8 and 10 at public schools in Idaho

are required to participate in the ISAT ELA/L and mathematics assessments. Testing at grades 9 and 11 is




optional. Students must be tested in the enrolled grade assessment; out-of-grade-level testing is not allowed

for the administration of the ISAT ELA/L and mathematics assessments.

2.5.1 Homeschooled Students

Students who are homeschooled may participate in the ISAT ELA/L and mathematics assessments at the

request of their parent or guardian. Schools must provide these students with one testing opportunity for

each relevant content area if requested.

2.5.2 Exempt Students

The following students are exempt from participating in the ISAT ELA/L and mathematics assessments:

• Foreign exchange students who are enrolled in a U.S. school

• English learners (ELs) who enrolled in a U.S. school within the last 12 months before the beginning

of testing have a one-time exemption; these students may instead participate in the English

language proficiency assessment consistent with state and federal policies. ELs are not exempt from

completing the mathematics assessment.

• A student with significant cognitive disabilities who meets the criteria for a state-selected or state-

developed ELA/L and mathematics alternative assessment based on alternative achievement

standards (approximately 1% or fewer of the student population). Students meeting these criteria

are not exempt from testing; they are exempt only from completing the ISAT ELA/L and


School personnel should follow federal and state policies regarding student participation.

2.6 ONLINE TESTING FEATURES AND TESTING ACCOMMODATIONS

The Smarter Balanced Assessment Consortium’s Usability, Accessibility, and Accommodations Guidelines

(UAAG) are intended for school-level personnel and decision-making teams, including Individualized

Education Program (IEP) and Section 504 Plan teams, as they prepare for and implement the Smarter

Balanced assessments. The UAAG provide information for classroom teachers, English language

development educators, special education teachers, and instructional assistants to use in selecting and

administering universal tools, designated supports, and accommodations for those students who need them.

The UAAG are also intended for assessment staff and administrators who oversee the decisions that are

made in instruction and assessment.

The UAAG apply to all students. They emphasize an individualized approach to the implementation of

assessment practices for those students who have diverse needs and participate in large-scale content

assessments. The UAAG focus on universal tools, designated supports, and accommodations for the ISAT

ELA/L and mathematics assessments. At the same time, the UAAG support important instructional

decisions about accessibility and accommodations for students who participate in the ISAT ELA/L and


The Summative assessments contain universal tools, designated supports, and accommodations in both

embedded and non-embedded versions. Embedded resources are part of the computer-adaptive test

administration system, whereas non-embedded resources are provided outside of that system.




State-level users, DAs, DCs, and SCs can set embedded and non-embedded designated supports and

accommodations based on their specific user roles. Designated supports and accommodations must be set

in TIDE before starting a test session.

All embedded and non-embedded universal tools are available to all students during a test session. A TE or

TA can deactivate any of the preselected universal tools in the TA Interface of the testing system for a

student who may be distracted by the ability to access a specific tool during a test session.

For additional information about the availability of designated supports and accommodations, refer to the

Usability, Accessibility, and Accommodations Guidelines at http://idaho.portal.airast.org.

2.6.1 Online Universal Tools for All Students

Universal tools are access features of an assessment or exam that are embedded or non-embedded

components of the test administration system. Universal tools are available to all students based on their

preference and selection and have been preset in TIDE. In the 2018–2019 test administration, the following

features of universal tools were available for all students to access. For specific information on how to

access and use these features, refer to the Idaho Assessment Systems Manual at http://idaho.portal.airast.org.

Embedded Universal Tools

Calculator: Students can access an embedded on-screen digital calculator for calculator-allowed items by

clicking the calculator button. This tool is available with the specific items that the Smarter Balanced Item

Specifications indicate are appropriate only.

Digital Notepad: This tool is used for making notes about an item. The digital notepad is item-specific and

is available through the end of the test segment. Notes are not saved when the student moves on to the next

segment or after a break of more than 20 minutes.

English Dictionary: An on-screen English dictionary is available for the full-write portion of an ELA/L

performance task.

English Glossary: Grade- and context-appropriate definitions of specific construct-irrelevant terms are

shown in English on the screen via a pop-up window. The student can access the embedded glossary by

clicking any of the pre-selected terms.

Expandable Items: Each item can be expanded so that it takes up a larger portion of the screen, requiring

less scrolling by the student.

Expandable Passages: Each passage or stimulus can be expanded so that it takes up a larger portion of the

screen, requiring less scrolling by the student.

Global Notes: This digital notepad is available for the full-write portion of ELA/L performance tasks. The

student clicks the notepad icon for the notepad to appear. During the ELA/L performance tasks, notes are

retained from segment to segment so that the student can return to them even though he or she cannot go

back to specific items in any previous segment.

Highlighter: This tool is used to highlight passages or sections of passages and test questions.

Keyboard Navigation: Navigation throughout a text can be accomplished by using a keyboard.

Line Reader: This tool is used to highlight an individual line of text in a passage or test question.






Mark for Review: The student can mark a question for review in order to return to it later. However, for the

CAT, if the assessment is paused for more than 20 minutes, students will not be allowed to return to marked

test questions.

Mathematics Tools: These digital tools (e.g., embedded ruler or protractor) are used for measurements and

are available only with the mathematics items for which the Smarter Balanced Item Specifications deem

them to be appropriate.

Pause: The student can pause the assessment and return to the test question they were working on. However,

if an assessment is paused for more than 20 minutes, students will not be allowed to return to previous test

questions.

Spellcheck: This tool is used to check the spelling of words in student-generated responses. Spellcheck

indicates only that a word is misspelled; it does not provide the correct spelling. This tool is available only

with the specific items for which the Smarter Balanced Item Specifications indicated that it would be

appropriate. Spellcheck is bundled with other embedded writing tools for all full-write portions of a

performance task (planning, drafting, revising, and editing). A full-write is the second part of a performance

task.

Strikethrough: This tool allows students to cross out response options using the strikethrough function.

Take as much time as needed to complete the ISAT ELA/L and mathematics assessments: Testing may be

split across multiple sessions so that the testing does not interfere with class schedules. The CAT assessment

must be completed within 45 calendar days of its start date. Performance tasks must be completed within

20 calendar days of the date on which each task is started.

Thesaurus: A thesaurus can be provided for the full-write portion of an ELA/L performance task. A full-

write is the second part of a performance task. The use of this universal tool may result in the student

needing additional overall time to complete the assessment.

Writing Tools: Selected writing tools (e.g., bold, italics, bullets, and undo/redo) are available for all student-

generated responses.

Zoom In: Students are able to zoom in on test questions, text, or graphics.

Non-Embedded Universal Tools

Breaks: Breaks may be given at predetermined intervals or after completion of sections of the assessment

for students taking a paper-pencil test. Sometimes, students are allowed to take breaks when individually

needed to reduce cognitive fatigue when they experience heavy assessment demands. The use of this

universal tool may result in the student needing additional overall time to complete the assessment.

English dictionary: An English dictionary can be provided for the full-write portion of an ELA/L

performance task. A full-write is the second part of a performance task. The use of this universal tool may

result in the student needing additional overall time to complete the assessment.

Scratch paper: Scratch paper may be used to make notes, write computations, or record responses. Only

plain paper or lined paper is appropriate for ELA/L. Graph paper is required beginning in grade 6 and can

be used on all mathematics assessments. A student can use an assistive technology device for scratch paper

as long as the device is consistent with the child’s IEP or Section 504 Plan and is acceptable to the state.




Thesaurus: A thesaurus can be provided for the full-write portion of an ELA/L performance task. A full-

write is the second part of a performance task. The use of this universal tool may result in the student

needing additional overall time to complete the assessment.

2.6.2 Designated Supports and Accommodations

Designated supports for the ISAT ELA/L and mathematics assessments are those features that are available

for use by any student for whom the need has been indicated by an educator (or team of educators with

parent/guardian and student). Scores achieved by students using designated supports will be included for

federal accountability purposes. It is recommended that a consistent process be used to determine these

supports for individual students. All educators making these decisions should be trained on the process and

understand the range of designated supports available. Smarter Balanced members have identified digitally

embedded and non-embedded designated supports for students for whom an adult or team has indicated a

need for the support.

Accommodations are changes in procedures or materials that increase equitable access during the ISAT

ELA/L and mathematics assessments. Assessment accommodations generate valid assessment results for

students who need them; they allow these students to show what they know and can do. Accommodations

are available for students with documented IEPs or Section 504 Plans. Consortium-approved

accommodations do not compromise the learning expectations, construct, grade-level standard, or intended

outcome of the assessments.

Embedded Designated Supports

Color contrast: Students are able to adjust screen background or font color, based on student needs or

preferences. This may include reversing the colors for the entire interface or choosing the color of font and

background. Black on white, reverse contrast, black on rose, medium gray on light gray, and yellow on blue

are offered for the online assessments.

Masking: Masking involves blocking off content that is not of immediate need or that may be distracting to

the student. Students can focus their attention on a specific part of a test item by using the masking feature.

Mouse pointer: This embedded support allows the mouse pointer to be set to a larger size and for the color

to be changed for students to more readily find the mouse pointer.

Text-to-speech (for mathematics stimuli items and ELA/L items; not for reading passages): Text is read

aloud to the student via embedded text-to-speech technology. The student can control the speed and raise

or lower the volume of the voice via a volume control.

Translated test directions (for mathematics): Translation of test directions is a language support available

before beginning the actual test items. Students can see test directions in another language. As an embedded

designated support, translated test directions are automatically a part of the stacked translation designated

support.

Translations (glossaries) (for mathematics): Translated glossaries are a language support. The translated

glossaries are provided for selected construct-irrelevant terms for mathematics. Translations for these terms

appear on the computer screen when students click on them. The following language glossaries were

offered: Arabic, Cantonese, Filipino, Korean, Mandarin, Punjabi, Russian, Spanish, Ukrainian, and

Vietnamese.




Translations (Spanish stacked) (for mathematics): Stacked translations are a language support available for

some students. They provide a full translation of each test item above the original item in English.

Turn off any universal tools: Teachers can disable any universal tools that might be distracting, that students

do not need to use, or that students are unable to use.

Non-Embedded Designated Supports

Amplification: The student adjusts the volume control beyond the computer’s built-in settings using

headphones or other non-embedded devices.

Bilingual dictionary: A bilingual/dual language word-to-word dictionary is a language support and can be

provided for the full-write portion of an ELA/L performance task.

Color contrast: Test content of online items may be displayed with different colors.

Color overlays: Color transparencies may be placed over a paper-pencil assessment.

Magnification: The size of specific areas of the screen (e.g., text, formulas, tables, graphics, navigation

buttons) may be adjusted by the student with an assistive technology device. Magnification enables

increasing the size to a level not allowed by the zoom universal tool.

Medical device: Students may have access to an electronic device for medical purposes (e.g., glucose

monitor). The device may include a cell phone and should support the student only during testing for

medical reasons.

Noise buffer: These include ear mufflers, white noise, and/or other equipment to reduce environmental

noises.

Read aloud (for mathematics items and ELA/L items; not for reading passages): Text is read aloud to the

student by a trained and qualified human reader who follows the administration guidelines provided in the

Idaho Assessment Systems Manual and the Guidelines for Read Aloud, Test Reader. All or portions of the

content may be read aloud.

Read aloud in Spanish (for mathematics only): Spanish text is read aloud to the student by a trained and

qualified human reader who follows the administration guidelines provided in the Idaho Assessment

Systems Manual and the read-aloud guidelines. All or portions of the content may be read aloud.

Scribe (for ELA/L non-writing items): Students dictate their responses to a human who records verbatim

what they dictate. The scribe must be trained and qualified and must follow the administration guidelines

provided in the Idaho Assessment Systems Manual.

Separate setting: Test location is altered so that the student is tested in a setting different from that made

available for most students.

Simplified test directions: The TA simplifies or paraphrases the test directions found in the Test

Administration Manual according to the Simplified Test Directions guidelines.

Translated test directions: This is a PDF file of directions translated in each of the languages currently

supported. A bilingual adult can read this information to the student.




Translations (glossaries) (for mathematics paper-pencil tests): Translated glossaries are a language support

provided for selected construct-irrelevant terms for mathematics. Glossary terms are listed by item and

include the English term and its translated equivalent.

Embedded Accommodations

American Sign Language (ASL) (for ELA/L listening items and mathematics items): Test content is

translated into ASL video. An ASL human signer and the signed test content are viewed on the same screen.

Students may view portions of the ASL video as often as needed.

Braille: This is a raised-dot code that individuals read with their fingertips. Graphic material (e.g., maps,

charts, graphs, diagrams, illustrations) is presented in a raised format (paper or thermoform). Contracted

and non-contracted braille is available; Nemeth code is available for mathematics.

Braille Transcript: This is a braille transcript of the closed captioning created for the listening passages.

Closed captioning (for ELA/L listening stimulus items): This is printed text that appears on the computer

screen as audio materials are presented.

Streamlined Interface Mode: This accommodation provides a streamlined interface of the test in an

alternate, simplified format in which the items are displayed below the stimuli.

Text-to-speech (for ELA/L reading passages): Text is read aloud to the student via embedded text-to-speech

technology. The student can control the speed and raise or lower the volume of the voice via a volume

control.

Non-Embedded Accommodations

100s Number Table: This is a paper-based table listing numbers from 1–100 available from Smarter

Balanced for reference.

Abacus: For students who typically use an abacus, this tool may be used in place of scratch paper.

Alternate response options: Alternate response options include but are not limited to adapted keyboards,

large keyboards, StickyKeys, MouseKeys, FilterKeys, adapted mouse, touch screen, head wand, and

switches.

Calculator (for grades 6–11 mathematics tests): A non-embedded calculator may be provided to students

needing a special calculator, such as a braille calculator or a talking calculator, currently unavailable in the

assessment platform.

Multiplication table (grade 4 and above mathematics tests): A paper-based single digit (1–9) multiplication

table is available from Smarter Balanced for reference.

Print-on-Demand: Paper copies of either passages/stimuli and/or items may be printed for students. For

those students needing a paper copy of a passage or stimulus, permission to request printing must first be

set in TIDE.

Read aloud (for ELA/L passages): Text is read aloud to the student via an external screen reader or by a

trained and qualified human reader who follows the administration guidelines provided in the Idaho

Assessment Systems Manual and Read Aloud Guidelines. All or portions of the content may be read aloud.




Members can refer to the Guidelines for Choosing the Read Aloud Accommodation when deciding if this

accommodation is appropriate for a student.

Scribe (for ELA/L writing items): Students dictate their responses to a human who records verbatim what

they dictate. The scribe must be trained and qualified, and must follow the administration guidelines

provided in the Idaho Assessment Systems Manual.

Speech-to-text: Voice recognition allows students to use their voices as devices to input information into

the computer in order to dictate responses or give commands (e.g., opening application programs, pulling

down menus, saving work). Voice recognition software generally can recognize speech up to 160 words

per minute. Students may use their own assistive technology devices.

Word prediction: This allows students to begin writing a word and choose from a list of words that have

been predicted from word frequency and syntax rules.

Table 9 presents a list of universal tools, designated supports, and accommodations that were offered in the

2018–2019 administration. Tables 10–21 provide the number of students who were offered the

accommodations and designated supports.




Table 9. SY 2018–2019 Universal Tools, Designated Supports, and Accommodations

Universal Tools Designated Supports Accommodations

Embedded

Calculator1

Digital Notepad English Dictionary2

English Glossary

Expandable Items

Expandable Passages

Global Notes

Highlighter

Keyboard Navigation

Line Reader

Mark for Review

Mathematics Tools3

Pause Spellcheck

Strikethrough

Thesaurus2

Writing Tools4

Zoom

Color Contrast

Masking Mouse Pointer

Text-to-Speech5

Translated Test Directions

Translations (Glossary)6

Translations (Stacked)6

Turn off Any Universal Tools

American Sign Language7

Braille Braille Transcript15

Closed Captioning8

Streamlined Interface Mode

Text-to-Speech9

Non-Embedded

Breaks

English Dictionary2

Scratch Paper

Thesaurus2

Amplification

Bilingual Dictionary2

Color Contrast

Color Overlay

Magnification

Medical Device

Noise Buffers

Read Aloud10

Read Aloud in Spanish

Scribe11

Separate Setting

Simplified Test Directions


Translations (Glossary)12

100s Number Table14

Abacus

Alternate Response

Options13

Calculator1

Multiplication Table14

Print-on-Demand

Read Aloud15

Scribe

Speech-to-Text

Word Prediction

Items shown are available for ELA/L and mathematics unless otherwise noted. 1 For calculator-allowed items only in grades 6–11 2 For full-write portion of ELA/L performance task 3 Includes embedded ruler, embedded protractor 4 Includes bold, italic, underline, indent, cut, paste, spellcheck, bullets, undo/redo 5 For ELA/L PT stimuli, ELA/L PT and CAT items (not ELA/L CAT reading passages), and mathematics stimuli and

items: Must be set in TIDE by district- or school-level user and must be set before test begins. 6 For mathematics items 7 For ELA/L listening items and mathematics items 8 For ELA/L listening items 9 For ELA/L reading passages. Must be set in TIDE by district- or school-level user and must be set before test begins. 10 For ELA/L items (not ELA/L reading passages) and mathematics items 11 For ELA/L non-writing items and mathematics items 12 For mathematics items on the paper-pencil test 13 Includes adapted keyboards, large keyboard, StickyKeys, MouseKeys, FilterKeys, adapted mouse, touch screen,

head wand, and switches 14 For mathematics items beginning in grade 4 15 For ELA/L reading passages, all grades




Table 10. ELA/L Total Students with Allowed Embedded and Non-Embedded Accommodations (Grades

3–8 and 10)

Accommodations Grade

3 4 5 6 7 8 10


American Sign Language 4 11 12 9 14 6 8

Braille 3 1 1

Closed Captioning 17 24 26 35 36 21 29

Streamlined Interface Mode 91 67 106 164 84 122 40

Text-to-Speech: Passages 47 35 56 70 61 71 11

Text-to-Speech: Passages and Items 2,126 2,024 2,146 1,899 1,778 1,759 763


Alternate Response Options 13 24 9 7 6 6 6

Print-on-Demand: Items 5 2 4 1 1 2 1

Print-on-Demand: Passages 2 10 10 1 1

Print-on-Demand: Passages and Items 50 63 55 43 78 73 32

Print-on-Demand: Stimuli and Items 1 3

Read Aloud Stimuli 98 91 84 113 103 70 86

Scribe Items (Writing) 119 131 109 79 49 34

Speech-to-Text 151 129 139 156 160 117 98

Word Prediction 31 30 42 42 52 25 7

Table 11. ELA/L Total Students with Allowed Embedded and Non-Embedded Accommodations (Grades

9 and 11)


9 11


American Sign Language 4

Closed Captioning 5

Streamlined Interface Mode 14

Text-to-Speech: Passages 14 1

Text-to-Speech: Passages and Items 460 35


Alternate Response Options 2

Print-on-Demand: Passages and Items 26 2

Read Aloud Stimuli 49 2

Speech-to-Text 21 5

Word Prediction 12 3




Table 12. ELA/L Total Students with Allowed Embedded Designated Supports (Grades 3–8 and 10)

Designated

Supports Subgroup

Grade

3 4 5 6 7 8 10

Color Contrast

Overall 47 19 36 14 8 10 30

LEP 10 5 3 1 3 1

Special Education 23 15 13 7 3 3 5

Masking

Overall 197 196 180 258 277 313 32

LEP 63 35 39 49 23 23 5


Mouse Pointer

Overall 9 2 4 12 12 20 7

LEP 1 7 8 14


Text-to-Speech: Items

Overall 1,704 1,781 1,834 1,259 992 996 393

LEP 726 627 622 395 315 291 152


Text-to-Speech:

Stimuli

Overall 4 4 3 4

LEP

Special Education 3 4 3 3

Text-to-Speech:

Stimuli & Items

Overall 9 10 20 6 2

LEP 2 1

Special Education 9 9 11 4 2

Table 13. ELA/L Total Students with Allowed Embedded Designated Supports (Grades 9 and 11)

Designated Supports Subgroup Grade

9 11

Color Contrast

Overall 14

LEP 7

Special Education 10

Masking

Overall 177

LEP 7


Mouse Pointer

Overall 2

LEP

Special Education 2


Overall 207 6

LEP 81 2

Special Education 86 3




Table 14. ELA/L Total Students with Allowed Non-Embedded Designated Supports (Grades 3–8 and 10)

Designated

Supports Subgroup

Grade

3 4 5 6 7 8 10

Amplification

Overall 7 6 8 10 147 150 5

LEP 1 1 5 6


Bilingual Dictionary

Overall 107 109 95 66 202 229 66

LEP 106 106 95 63 59 85 63


Color Contrast

Overall 7 4 9 20 162 164 20

LEP 2 2 2 8 8 2


Color Overlay

Overall 4 14 8 13 154 154 1

LEP 5 5


Magnification

Overall 8 19 16 32 158 161 21

LEP 1 1 2 19 7 9 2


Overall 2 4 9 3 10 6 4

Medical Device LEP 1 2


Noise Buffers

Overall 86 112 108 107 219 217 18

LEP 13 16 6 12 21 16 3


Read-Aloud Items

Overall 255 246 249 263 387 321 142

LEP 60 81 53 41 70 50 12


Read-Aloud Stimuli

Overall 4 1 12 2 4

LEP 1


Scribe Items (Non-

Writing)

Overall 79 85 73 54 164 168 5

LEP 4 1 4 1 8 8


Separate Setting

Overall 1,410 1,410 1,564 1,423 1,361 1,304 853

LEP 315 224 265 227 191 184 154

Special Education 951 983 1,104 1,035 867 847 687

Simplified Test

Directions

Overall 632 497 599 406 530 467 262

LEP 244 148 203 93 119 111 68


Translated Test

Directions

Overall 22 10 18 44 159 167 20

LEP 21 9 13 40 19 25 19

Special Education 3 6 5 5 3 9




Table 15. ELA/L Total Students with Allowed Non-Embedded Designated Supports (Grades 9 and 11)


9 11

Amplification

Overall 147

LEP 4


Bilingual Dictionary

Overall 166 1

LEP 28 1


Color Contrast

Overall 148 1

LEP 4


Color Overlay

Overall 147

LEP 4


Magnification

Overall 148

LEP 4


Noise Buffers

Overall 158 1

LEP 9


Read-Aloud Items

Overall 165 4

LEP 11 1


Scribe Items (Non-Writing)

Overall 151

LEP 4


Separate Setting

Overall 392 30

LEP 68 1



Overall 177 9

LEP 17



Overall 150 2

LEP 17 2






(Grades 3–8 and 10)


3 4 5 6 7 8 10


American Sign Language 4 10 8 9 12 6 7

Braille 3 1

Streamlined Interface Mode 90 72 108 163 84 126 39


100s Number Table 216 218 212 172 94 99 15

Abacus 10 6 4 2 4 1

Alternate Response Options 8 13 6 5 4 4 5

Calculator 98 180 213 362 395 486 321

Multiplication Table 55 193 267 242 157 103 24

Print-on-Demand: Items 1 2

Print-on-Demand: Stimuli 2 1

Print-on-Demand: Stimuli and Items 57 54 58 31 39 41 20

Read Aloud Stimuli 2 4

Speech-to-Text 137 109 111 124 129 103 70

Word Prediction 17 6 17 17 18 9 1


(Grades 9 and 11)


9 11


American Sign Language 4

Streamlined Interface Mode 12


100s Number Table 1 3

Alternate Response Options 2

Calculator 73 9

Multiplication Table 3 3

Print-on-Demand: Stimuli and Items 7 2

Speech-to-Text 16 6

Word Prediction 3 1




Table 18. Mathematics Total Students with Allowed Embedded Designated Supports (Grades 3–8 and 10)


3 4 5 6 7 8 10

Color Contrast

Overall 44 17 34 14 10 10 31

LEP 10 5 3 1 3 1


Masking

Overall 191 180 178 218 270 303 30

LEP 63 36 42 26 18 15 3


Mouse Pointer

Overall 8 2 4 12 12 21 7

LEP 1 7 8 14



Overall 192 202 267 142 173 113 54

LEP 41 37 41 30 23 20 7


Text-to-Speech: Stimuli

Overall 10 15 13 8 12 12 1

LEP 3 1 5 1 2 2

Special Education 6 10 5 6 5 8

Text-to-Speech: Stimuli and Items

Overall 2,891 2,838 2,987 2,503 2,088 2,151 945

LEP 991 867 875 651 515 515 223

Special Education 1,298 1,317 1,453 1,363 1,113 1,104 705

Translation (Glossary):

Spanish

Overall 164 142 162 101 100 104 87

LEP 164 139 151 97 92 97 86



Other Languages

Overall 15 8 3 2 2 7 2

LEP 15 7 3 2 2 7 2

Special Education




Table 19. Mathematics Total Students with Allowed Embedded Designated Supports (Grades 9 and 11)


9 11

Color Contrast

Overall 14

LEP 7


Masking

Overall 177

LEP 7


Mouse Pointer

Overall 2

LEP

Special Education 2


Overall 27 1

LEP 2


Text-to-Speech: Stimuli

Overall 3

LEP

Special Education 1

Text-to-Speech: Stimuli and Items

Overall 542 45

LEP 128 6


Translation (Glossary): Spanish

Overall 25 5

LEP 23 5

Special Education 2




Table 20. Mathematics Total Students with Allowed Non-Embedded Designated Supports (Grades 3–8,

10)


3 4 5 6 7 8 10

Amplification

Overall 11 9 10 10 150 149 5

LEP 1 1 1 6 6


Color Contrast

Overall 11 5 11 21 162 161 21

LEP 2 1 2 8 8 2


Color Overlay

Overall 2 13 8 12 153 152 1

LEP 5 5


Magnification

Overall 9 16 16 32 156 162 24

LEP 1 2 19 7 9 2


Medical Device

Overall 2 4 11 3 10 6

LEP 1 2


Noise Buffers

Overall 89 113 132 123 211 215 18

LEP 12 16 10 14 19 16 3


Read-Aloud Items

Overall 296 339 347 328 410 394 160

LEP 69 129 86 67 93 85 34


Read-Aloud Items

(Spanish)

Overall 4 3 2 9 8 8 3

LEP 4 3 2 8 8 7 3


Read-Aloud Stimuli

Overall 230 195 208 208 170 146 39

LEP 57 50 40 20 67 48 7


Read-Aloud Stimuli

(Spanish)

Overall 6 3 2 10 10 8 1

LEP 4 3 2 8 9 8 1

Special Education 3 3 2

Scribe Items

Overall 97 95 87 57 165 173 15

LEP 7 4 6 1 7 7 1


Separate Setting

Overall 1,438 1,454 1,600 1,425 1,366 1,315 857

LEP 324 274 286 241 208 187 159

Special Education 968 984 1,121 1,037 861 848 682

Simplified Test

Directions

Overall 636 541 605 405 525 444 268

LEP 252 193 213 102 127 110 70



Overall 35 26 37 49 188 178 36

LEP 35 25 30 45 44 36 35



Spanish

Overall 110 123 108 66 64 81 51

LEP 110 119 108 65 64 76 51






3 4 5 6 7 8 10


Other Languages

Overall 3 4 3 1 3 3 1

LEP 3 4 3 1 3 3 1

Special Education




Table 21. Mathematics Total Students with Allowed Non-Embedded Designated Supports (Grades 9 and

11)


9 11

Amplification

Overall 147

LEP 4


Color Contrast

Overall 147 1

LEP 4


Color Overlay

Overall 147

LEP 4


Magnification

Overall 149

LEP 4


Noise Buffers

Overall 156 1

LEP 10


Read-Aloud Items

Overall 183 5

LEP 25 1


Read-Aloud Items (Spanish)

Overall 1

LEP 1

Special Education

Read-Aloud Stimuli

Overall 13 2

LEP 1


Read-Aloud Stimuli (Spanish)

Overall 1

LEP 1

Special Education

Scribe Items

Overall 156

LEP 4


Separate Setting

Overall 407 39

LEP 86 5



Overall 207 9

LEP 45



Overall 156 4

LEP 19 4


Translation (Glossary): Spanish

Overall 15 1

LEP 15 1

Special Education 2




2.7 DATA FORENSICS PROGRAM

2.7.1 Data Forensics Report

The validity of test scores depends critically on the integrity of the test administrations. Any irregularities

in test administration could cast doubt on the validity of the inferences based on those test scores. Multiple

facets ensure that tests are administered properly; these include clear test administration policies, effective

TA training, and tools to identify possible irregularities in test administrations.

Online test administration allows for collection of information that was impossible in paper-pencil tests,

such as item response changes, item response time, number of visits for an item or an item group, and test

starting and ending times. AIR’s test delivery system (TDS) captures all of this information.

For online administrations, a set of quality assurance (QA) reports are generated during and after the testing

window. One of the QA reports focuses on flagging possible testing anomalies. Testing anomalies are

analyzed for changes in test scores between administrations, testing time, and item response patterns using

a person-fit index. Flagging criteria used for these analyses are configurable and can be changed by an

authorized user. Analyses are performed at the student level and summarized for each aggregate unit,

including testing session, TA, and school. The QA reports are provided to state clients to monitor testing

anomalies throughout the testing window.

2.7.2 Changes in Student Performance

Changes in student scores between administration years are examined using a regression model to check

for outliers. For these between-year comparisons, students’ current-year scores are regressed on their test

scores from the previous year and on the number of days between the two years’ test-end dates (to control

for the instruction time between the two test scores). Between-year comparisons are performed between the

current school year (e.g., 2018–2019) and the year before the current school year (e.g., 2017–2018).

A large score gain or loss in student scores between administration years is detected by examining the

residuals for outliers. The residuals are computed as the observed value minus the regression model’s

predicted value. To detect unusual residuals, the studentized residuals are computed. An unusual increase

or decrease in student scores between administration years is flagged when the absolute value of the

studentized residual is greater than |3|.

The residuals of students are also aggregated for a testing session, TA, and school. The system flags any

unusual changes in an aggregate performance between administrations and/or years based on the average

of the residuals in the aggregate unit (e.g., testing session, TA, school). For each aggregate unit, a t value

is computed and flagged when |𝑡| is greater than 3,

𝑡 =∑ �̂�𝑖𝑛𝑖=1 /𝑛

√𝑠2

𝑛+∑ 𝜎2(1 − ℎ𝑖𝑖)𝑛𝑖=1

𝑛2

,

where s is the standard deviation of residuals in an aggregate unit; n is the number of students in an

aggregate unit (e.g., testing session, TA, school), 𝜎2 is the MSE from the regression, and �̂�𝑖 is the residual

for the ith student.




The variance of average residuals in the denominator is estimated in two components, conditioning on true

residual 𝑒𝑖 , 𝑣𝑎𝑟(𝐸(�̂�𝑖|𝑒𝑖)) = 𝑠2 and 𝐸(𝑣𝑎𝑟(�̂�𝑖|𝑒𝑖)) = 𝜎

2(1 − ℎ𝑖𝑖). Following the law of total variance

(Billingsley, 1995, p. 456),

𝑣𝑎𝑟(�̂�𝑖) = 𝑣𝑎𝑟(𝐸(�̂�𝑖|𝑒𝑖)) + 𝐸(𝑣𝑎𝑟(�̂�𝑖|𝑒𝑖)) = 𝑠2 + 𝜎2(1 − ℎ𝑖𝑖), hence,

𝑣𝑎𝑟 (∑ �̂�𝑖𝑛𝑖=1

𝑛) =

∑ (𝑠2+𝜎2(1−ℎ𝑖𝑖))𝑛𝑖=1

𝑛2=

𝑠2

𝑛+

∑ (𝜎2(1−ℎ𝑖𝑖))𝑛𝑖=1

𝑛2.

The QA report includes a list of the flagged aggregate units and the number of flagged students. If the

aggregate unit size is between one to five students, the aggregate unit is flagged if the percentage of flagged

students is greater than 50%. The aggregate unit size for the score change is based on the number of students

included in the between-year regression analysis in the aggregate unit.

2.7.3 Item Response Time

The online environment also allows item response time to be captured as the item page time (the time each

item page is presented) in milliseconds. For discrete items, each item appears on the screen one item at a

time, whereas stimulus-based items appear on the screen together. The page time is the time spent on one

item for discrete items and the time spent on all items associated with a stimulus for stimulus-based items.

For each student, the total time taken to complete the test is computed by adding up the page time for all

items and item groups (stimulus-based items).

The expectation is that the item response time will be shorter than the average time if students have a prior

knowledge of items. An example of unusual item response time is a test record for an individual who scores

very well on the test even though the average time spent for each item is far less than that required of

students statewide. If students already know the answers to the questions, the response time will be much

shorter than the response time for those items where the student has no prior knowledge of the item content.

Conversely, if a TA helps students by “coaching” them to change their responses during the test, the testing

time could be longer than expected.

The average and the standard deviation of test-taking time are computed across all students for each

opportunity. Students and aggregate units were flagged if the test-taking time was greater than |3| standard

deviations of the state average. The state average and standard deviation were computed based on all

students when the analysis was performed. The QA report includes a list of the flagged aggregate units.

2.7.4 Inconsistent Item Response Pattern

In item response theory (IRT) models, person-fit measurement is used to identify test takers whose response

patterns are improbable given an IRT model. If a test has psychometric integrity, little irregularity will be

seen in the item responses of the individual who responds to the items fairly and honestly.

If a test taker has prior knowledge of some test items (or is provided answers during the exam), he or she

will respond correctly to those items at a higher probability than indicated by his or her ability as estimated

across all items. In this case, the person-fit index will be large for the student. We note, however, that if a

student has prior knowledge of the entire test content, this will not be detected based on the person-fit index,

even if the item response time index might flag such a student.

The person-fit index is based on all item responses of a test. An unlikely response to a single test question

may not result in a flagged person-fit index. Of course, not all unlikely patterns indicate cheating, as in the




case of a student who is able to guess a significant number of correct answers. Therefore, the evidence of

person-fit index should be evaluated along with other testing irregularities to determine possible testing

irregularities. The number of flagged students is summarized for every testing session, TA, and school.

The person-fit index is computed using a standardized log-likelihood statistic. Following Drasgow, Levine,

and Williams (1985) and Sotaridona, Pornell, and Vallejo (2003), aberrant response pattern is defined as a

deviation from the expected item score model. Snijders (2001) showed that the distribution of 𝑙𝑧 is

asymptotically normal (i.e., with an increasing number of administered items). Even at shorter test lengths

of 8 or 15 items, the “asymptotic error probabilities are quite reasonable for nominal Type I error

probabilities of 0.10 and 0.05” (Snijders, 2001).

Sotaridona et al. (2003) report promising results of using 𝑙𝑧 for systematic flagging of aberrant response

patterns. Students with 𝑙𝑧values greater than |3| are flagged. Aggregate units are flagged with t greater

than |3|,

𝑡 =𝐴𝑣𝑒𝑟𝑎𝑔𝑒 z

l values

√𝑠2 𝑛⁄,

where s = standard deviation of 𝑙𝑧values in an aggregate unit and n = number of students in an aggregate

unit. The QA report includes a list of the flagged aggregate units.

2.8 PREVENTION AND RECOVERY OF DISRUPTIONS IN TEST DELIVERY SYSTEM

AIR is continuously improving our ability to protect our systems from interruptions. AIR’s TDS is designed

to ensure that student responses are captured accurately and stored on more than one server in case of a

failure. Our architecture, described below, is designed to recover from failure of any component with little

interruption. Each system is redundant, and critical student response data is transferred to a different data

center each night.

AIR has developed a unique monitoring system that is very sensitive to changes in server performance.

Most monitoring systems provide warnings when something is going wrong. Ours does, too, but it also

provides warnings when any given server is performing differently from its performance over the prior few

hours, or differently than the other servers performing the same jobs. Subtle changes in performance often

precede actual failure by hours or days, allowing us to detect potential problems, investigate them, and

mitigate them before a failure. On multiple occasions, this has enabled us to make adjustments and replace

equipment before any problems occur.

AIR has also implemented an escalation procedure that enables us to alert clients within minutes of any

disruption. Our emergency alert system notifies by text message our executive and technical staff, who

immediately join a call to understand the problem.

The section below describes AIR system architecture and how it recovers from device failures, Internet

interruptions, and other problems.




2.8.1 High-Level System Architecture

Our architecture provides redundancy, robustness, and reliability required by a large-scale, high-stakes

testing program. Our general approach, which has been adopted by Smarter Balanced as standard policy, is

pragmatic and well supported by our architecture.

Any system built around an expectation of flawless performance of computers or networks within schools

and districts is bound to fail. Our system is designed to ensure that the testing results and experience are

able to respond robustly to such inevitable failures. Thus, AIR’s TDS is designed to protect data integrity

and prevent student data loss at every point in the process.

The key elements of the testing system, including the data integrity processes at work at each point in the

system, are described below. Fault tolerance and automated recovery are built into every component of the

system, as described in this section.

Student Machine

Student responses are conveyed to our servers in real time as students respond. Long responses, such as

essays, are saved automatically at configurable intervals (usually set to one minute) so that student work is

not at risk during testing.

Responses are saved asynchronously, with a background process on the student machine waiting for

confirmation of successfully stored data on the server. If confirmation is not received within the designated

time (usually set to 30–90 seconds), the system will prevent the student from doing any more work until

connectivity is restored. The student is offered the choice of asking the system to try again or pausing the

test and returning at a later time. For example:

• If connectivity is lost and restored within the designated time period, the student may be unaware

of the momentary interruption.

• If connectivity cannot be silently restored, the student is prevented from testing and given the option

of logging out or retrying the save.

• If the system fails completely, upon logging back in the system, the student returns to the item at

which the failure occurred.

In short, data integrity is preserved by confirmed saves to our servers and prevention of further testing if

confirmation is not received.

Test Delivery Satellites

The test delivery satellites communicate with the student machines to deliver items and receive responses.

Each satellite is a collection of web and database servers. Each satellite is equipped with redundant array

of independent disks (RAID) systems to mitigate the risk of disk failure. Each response is stored on multiple

independent disks.

One server for every four satellites serves as a backup hub. This server continually monitors and stores all

changed student response data from the satellites, creating an additional copy of the real-time data. In the

unlikely event of failure, data are completely protected. Satellites are automatically monitored, and upon

failure, they are removed from service. Real-time student data are immediately recoverable from the

satellite, backup hub, or hub (described below), with backup copies remaining on the drive arrays of the

disabled satellite.




If a satellite fails, students will exit the system. The automatic recovery system enables them to log in again

within seconds or minutes of the failure, without data loss. This process is managed by the hub. Data will

remain on the satellites until the satellite receives notice from the demographic and history servers that the

data are safely stored on those disks.

Hub

Hub servers are redundant clusters of database servers with RAID drive systems. Hub servers continuously

gather data from the test delivery satellites and their mini-hubs and store that data as described above. This

real-time backup copy remains on the hub until the hub receives notification from the demographic and

history servers that the data have reached the designated storage location.

Demographic and History Servers

The demographic and history servers store student data for the duration of the testing window. They are

clustered database servers, with RAID subsystems, that provide redundant capability to prevent data loss

in the event of server or disk failure. At the normal conclusion of a test, these servers receive completed

tests from the test delivery satellites. Upon successful completion of the storage of the information, these

servers notify the hub and satellites that it is safe to delete student data.

Quality Assurance System

The quality assurance (QA) system gathers data used to detect cheating, monitors real-time item function,

and evaluates test integrity. Every completed test runs through the QA system, and any anomalies (such as

unscored or missing items, unexpected test lengths, or other unlikely issues) are flagged, and immediate

notification goes out to our psychometricians and project team.

Database of Record

The Database of Record (DoR) is the final storage location for the student data. These clustered database

servers with RAID systems hold the completed student data.

2.8.2 Automated Backup and Recovery

Every system is backed up nightly. Industry-standard backup and recovery procedures are in place to ensure

safety, security, and integrity of all data. This set of systems and processes is designed to provide complete

data integrity and prevent loss of student data. Redundant systems at every point, real-time data integrity

protection and checks, and well-considered real-time backup processes prevent loss of student data, even

in the unlikely event of system failure.

2.8.3 Other Disruption Prevention and Recovery

These testing systems are designed to be extremely fault-tolerant. The systems can withstand failure of any

component with little or no service interruption. This robustness is archived through redundancy. Key

redundant systems are as follows:

• The system’s hosting provider has redundant power generators that can continue to operate for up

to 60 hours without refueling. With the multiple refueling contracts that are in place, these

generators can operate indefinitely.

• The hosting provider has multiple redundancies in the flow of information to and from the system’s




data centers by partnering with nine different network providers. Each fiber carrier must enter the

data center at separate physical points, protecting the data center from a complete service failure

caused by an unlikely network cable cut.

• On the network level are redundant firewalls and load balancers throughout the environment.

• The system uses redundant power and switching within all server cabinets.

• Data are protected by nightly backups. A full weekly backup and incremental nightly backups

protect data. Should a catastrophic event occur, AIR is able to reconstruct real-time data using the

data retained on the TDS satellites and hubs.

• The server backup agents send alerts to notify system administration staff in the event of a backup

error, at which time they will inspect the error to determine whether the backup was successful or

if they need to rerun it.

The system’s TDS is hosted in an industry-leading facility with redundant power, cooling, state-of-the-art

security, and other features that protect the system from failure. The system is redundant at every

component, and in the event of failure, the unique design ensures that data are always stored in at least two

locations. The engineering that led to this system protects student responses from loss.




3. CONSTRUCTION OF GRADES 9 AND 10 TESTS

Part of the scope of work in the Multi-Agency Assessment Cooperative (MAAC) was to develop grades 9

and 10 ELA/L and mathematics tests from the grade 11 item pool in the 2014 Smarter Balanced assessment.

The grades 9 and 10 tests would:

• be shared by Idaho, West Virginia, and the U.S. Virgin Islands;

• be calibrated on the Smarter Balanced grades 3–11 vertical scale;

• be administered as a computer-adaptive test; and

• have separate grade-specific cut scores.

3.1 TEST BLUEPRINTS

The grade 11 Smarter Balanced assessment was developed and intended to meet the standards for all high

school grades, not specifically for grade 11, although the item pool was administered to grade 11 students.

AIR examined the Common Core State Standards (CCSS) for high school grades, and determined that the

extent of the overlap in content standards in high school grades is too great to develop separate grades 9

and 10 blueprints in ELA/L. In mathematics, however, the grade 11 blueprint includes an accumulation of

standards from concepts taught in grades 9, 10, and 11; therefore, the blueprints for grades 9 and 10 can be

developed as a subset of the grade 11 blueprint.

3.1.1 ELA/L Test Blueprint

The CCSS for ELA/L in high school grades are nearly identical; therefore, the blueprint we propose for the

grades 9 and 10 ELA/L benchmark assessments is the same blueprint used in grade 11. Note that although

the same pool is administered in multiple grades, if a student takes a test in multiple grades, the algorithm

selects different items.

The Smarter Balanced blueprint is organized in claims and targets, within which are the CCSS for grades

11 and 12. These groupings can be found in the Smarter Balance Assessment content specifications located

on the Smarter Balanced website (http://www.smarterbalanced.org). The blueprint does not go down to the

standard level; therefore, the specific differences between the two grade bands are indistinguishable on the

blueprint itself.

Based on the content specifications, targets 4 and 5 show some differences between the standards at grades

9 and 10 and grades 11 and 12. For example, standard 9, which is included in both targets 4 and 5, calls for

a comparison across literary texts. In grades 11 and 12, the standard calls for a comparison that is limited

to foundational works of American literature from the same time period. In grades 9 and 10, the standard

calls for an examination of texts across time periods and cultures. While there are some variations in the

passages that support these standards, the items themselves—and the essential skills of integrating

knowledge across multiple texts—are, we believe, ostensibly the same constructs.

The Smarter Balanced assessment blueprint also calls for brief writing tasks as well as an extended writing

task associated with the performance task (PT). The rubric used to score the PT is the same rubric used at

grade 8. It is intended to measure overall writing performance rather than grade-specific subskills. Even the

conventions dimension of the rubric does not specify grade-level grammar/usage skills. A full-credit score

on conventions is given if the response “demonstrates an adequate command of conventions: adequate use

http://www.smarterbalanced.org/




of correct sentence formation, punctuation, capitalization, usage grammar, and spelling; no systematic

pattern of errors is displayed.”

3.1.2 Mathematics Test Blueprint

In mathematics, while all of the targets and domains on the grade 11 mathematics test are considered to be

college and career ready content, the blueprints for grades 9 and 10 included a subset of the grade 11

blueprint based on what is taught in Integrated Mathematics I for grade 9 and Integrated Mathematics II for

grade 10.

The steps taken to create the blueprints for grades 9 and 10 are:

• The targets in claim 1 that contain standards that are not part of the Integrated Mathematics I or

Integrated Mathematics II recommended standards from the CCSS Appendix A were removed.

Domains in claims 2, 3, and 4 that contain standards that are not part of the Integrated Mathematics

I/Integrated Mathematics II recommended standards from the CCSS Appendix A were removed.

• The targets were allocated appropriately to calculator and non-calculator segments based on how

the items were field tested on the grade 11 assessment.

• The total number of items allocated to each claim and content category were updated to be

proportional to the number of items on the grade 11 assessment.

3.2 SUMMARY OF SIMULATION STUDIES

Before the operational testing window, AIR conducts simulations to evaluate and ensure the

implementation and quality of the adaptive item-selection algorithm and the scoring algorithm. The

simulation tool enables us to manipulate key blueprint and configuration settings to match the blueprint and

minimize measurement error; this also enables us to maximize the number of different assessments seen by

students; no items are recycled across opportunities for a student.

3.2.1 Summary of Adaptive Algorithm

For the Smarter Balanced assessments, item selection rules ensure that each student receives an assessment

representing an adequate sample of the domain with appropriate difficulty. The algorithm maximizes the

information for each student and allows for certain constraints to be set, ensuring that items selected

represent the required content distribution. The TDS ensures that students are not exposed to the same items

or passages in subsequent assessments if they attempt multiple opportunities for the same content area.

Items selected for each student depend on the student’s performance on previously selected items. The

accuracy of the student responses to items determines the next item or passage that the student will see.

Thus, each student is presented with a set of items that most accurately aligns with his or her proficiency

level based on grade-level content. Higher performance is followed by more difficult items, and lower

performance is followed by less difficult items until test length constraints are met.

The adaptive algorithm selects items to administer on each student’s assessment to meet three objectives:

• Match the test specifications (test blueprints).

• Accurately classify test takers’ proficiency in each claim.




• Minimize the measurement error by administering an assessment with items targeted to a student’s

ability.

For the first item, the initial ability is used to select the first item or the first item group. The algorithm starts

each assessment with the score in the previous grade. For example, the initial ability for grade 9 students is

the grade 8 score in the previous year. When no prior score is available, the algorithm can either start an

assessment with an item of average difficulty near the average ability of students in the previous scores (

e.g., 2017–2018 summative test scores) assuming the same initial ability for all students, or select the first

item randomly from the pool.

The algorithm, as described in the Smarter Balanced Adaptive Algorithm (Cohen, C., & Albright, L., 2014),

selects the first item based on a prior estimate of student achievement, selecting from the k items providing

the most information given prior student achievement. The parameter k is a configurable parameter that can

be used to mitigate item exposure or more closely match a student’s performance depending on its value.

Because selection of the initial item can affect item exposure, the parameter k was chosen, in consultation

with Smarter Balanced, to minimize the unused item rate while providing the most information given prior

student achievement. Larger values of k provide more exposure control at the expense of optional selection.

After the initial item is administered, the algorithm identifies the best item to administer using the criteria

described in this section.

Match to the Blueprint

The algorithm first selects items to maximize fit to the test blueprint. Blueprints specify a range of items to

be administered in each claim for each assessment, with a collection of constraint sets. A constraint set is

a set of exhaustive, mutually exclusive classifications of items. For example, if a claim consists of four

targets and each item measures one—and only one—of the claims, the claim classifications constitute a

constraint set.

During item selection, the algorithm “rewards” claims that have not yet reached the minimum number of

items. For example, if the measurement claim requires that an assessment contain between eight and nine

items, measurement is the constrained feature. At any point in time, the minimum constraint on some

features may have already been satisfied, while others may not have been satisfied. Other features may be

approaching the maximum defined by the constraint. The value measure must reward items that have not

yet met minimum constraints and penalize items that would exceed the maximum constraints. The

algorithm stops administering items when the specified assessment length is met.

Increased Precision

The adaptive algorithm is able to quickly and efficiently derive very precise estimates of student

achievement. To increase the diagnostic value of score reports, the algorithm also seeks to increase the

likelihood that a student’s claim score will be clearly above or below the proficient level performance

standard. Thus, when selecting items from within each claim, the algorithm also values items that increase

the likelihood function that a student’s claim score is above or below the proficiency cut score. After

identifying eligible items that meet the blueprint, the algorithm selects items that maximize the precision

with which proficiency is assessed for each claim by selecting the best fitting item from the available items

within the targeted claim.




Match to Student Ability

In addition to rewarding items that match the blueprint, the adaptive algorithm also places greater value on

items that maximize assessment information near the student’s estimated ability, ensuring the most precise

estimate of student ability possible, given the constraints of the item pool and satisfaction of the blueprint

match requirement. After each response is submitted, the algorithm recalculates a score. As more answers

are provided, the estimate becomes more precise, and the difficulty of the items selected for administration

more closely aligns to the student’s ability level. Higher performance (answering items correctly) is

followed by more difficult items, and lower performance (answering items incorrectly) is followed by less

difficult items. When the assessment is completed, the algorithm scores the overall assessment and each

claim.

The algorithm allows previously answered items to be changed; however, it does not allow items to be

skipped. Item selection requires iteratively updating the estimate of the overall and claim ability estimates

after each item is answered. When a previously answered item is changed, the proficiency estimate is

adjusted to account for the changed responses when the next new item is selected. While the update of the

ability estimates is performed at each iteration, the overall and claim scores are recalculated using all data

at the end of the assessment for the final score.

The Smarter Balanced TDS administers assessments with items representing the breadth and depth

identified in the test specifications and content standards. Because the assessment adapts to each student’s

performance while maintaining accurate representation of the required grade-level knowledge and skills in

content breadth and depth, the test results provide precise estimates of each student’s achievement level

across the range of proficiency.

3.2.2 Testing Plan

The testing of the adaptive item-selection algorithm begins by generating a sample of test takers’ with true

thetas from a normal distribution with (,) for each grade and subject where and represent mean and

standard deviation of the normal distribution. The parameters for the normal distribution are based on

students’ scores in the spring 2017–2018 summative tests in Idaho, Vermont, and Washington. Each

simulated examinee is administered one test opportunity in both ELA/L and mathematics.

The initial ability is drawn from a uniform distribution within the range of true theta plus or minus 1. The

initial ability is used to initiate the test by choosing the first few items. Table 22 provides the means and

standard deviations used to generate a sample of student abilities in the simulation for grades 9 and 10 in

ELA/L and mathematics. Simulations generated 5,000 simulated test administrations in both ELA/L and

mathematics.

Table 22. Parameters Used to Generate True Ability Distributions

Grade ELA/L Mathematics

Mean SD Mean SD

9 0.849 1.262 0.351 1.501

10 1.179 1.292 0.750 1.513




3.2.3 Statistical Summaries

The statistics computed include the following: the statistical bias of the estimated theta parameter; mean

squared error (MSE); significance of the bias; average standard error of the estimated theta; standard error

of theta at the 5th, 25th, 75th, and 95th percentiles; and the percentage of students’ estimated theta falling

outside the 95% and 99% confidence intervals. Statistical bias refers to whether test scores systematically

underestimate or overestimate the student’s true ability.

Computational details of each statistic are provided on the following pages.

𝑏𝑖𝑎𝑠 = 𝑁−1 ∑ (𝜃𝑖 − 𝜃𝑖)𝑁𝑖=1 (1)

𝑀𝑆𝐸 = 𝑁−1 ∑ (𝜃𝑖 − 𝜃𝑖)2𝑁

𝑖=1 ,

where 𝜃𝑖 is the true theta and 𝜃𝑖 is the estimated theta for individual i. For the variance of the bias, a first-

order Taylor series of Equation (1) is used as:

𝑣𝑎𝑟( 𝑏𝑖𝑎𝑠) = 𝜎2 ∗ 𝑔′(𝜃𝑖)2

=1

𝑁(𝑁−1)∑ (𝜃𝑖 − 𝜃𝑖)

2𝑁𝑖=1 ,

where, 𝜃𝑖 is an average of the estimated thetas.

Significance of the bias is then tested as:

𝑧 =𝑏𝑖𝑎𝑠

√𝑣𝑎𝑟(𝑏𝑖𝑎𝑠)

A p-value for the significance of the bias is reported from this z test.

The average standard error of the estimated theta is computed as:

𝑚𝑒𝑎𝑛(𝑠𝑒) = √𝑁−1∑𝑠𝑒(𝜃𝑖)2𝑁

𝑖=1

,

where 𝑠𝑒(𝜃𝑖) is the standard error of the estimated theta (𝜃) for individual i.

To determine the percentage of students’ estimated theta falling outside the 95% and 99% confidence

interval coverage, a t-statistic is performed as follows:

𝑡 =𝜃𝑖−�̂�𝑖

𝑠𝑒(�̂�𝑖),

where 𝜃𝑖s the estimated theta for individual i, and 𝜃𝑖 is the true theta for individual i. The percentage of

students’ estimated theta falling outside the coverage is determined by comparing the absolute value of the

t-statistic to a critical value of 1.96 for the 95% coverage and to 2.58 for the 99% coverage.




3.2.4 Summary Statistics on Test Blueprints

In the adaptive item-selection algorithm, item selection takes place in two discrete stages: blueprint

satisfaction and match-to-ability. The Smarter Balanced assessment blueprints specify a range of items to

be administered in each claim, content domain/standards, and targets. Moreover, blueprints constrain Depth

of Knowledge (DOK) and item and passage types. For DOK, most of the Smarter Balanced blueprint

specifies the minimum number of items, not the maximum. In blueprints, all content blueprint elements are

configured to obtain a strictly enforced range of items administered. The algorithm also seeks to satisfy

target-level constraints, but these ranges are not strictly enforced. In ELA/L, the blueprints also specify the

number of passages in reading (claim 1) and listening (claim 3) claims.

Tables 23–24 present the percentages of tests aligned with the test blueprints for ELA/L and mathematics

by claim, target, DOK, and item type constraints. In ELA/L, all tests met the blueprint constraints for items

and passages except for target constraints in claim 1 due to the uneven distribution of items across targets

and DOKs within and across passages. In mathematics, all tests met the blueprint requirements.

Table 23. Percentage of Tests Meeting Blueprint Requirements in ELA/L

Claim Content Category/Target

Required

Items in

G9–10

%BP Match for Item

Requirements

%BP Match for

Passage Requirement

G9 G10 G9–10

1 Literary Text 4 100 100 100

Target 2: Central Ideas 1 99 99

Target 4: Reasoning and Evaluation 1 99 99

Targets 1, 3, 5, 6, & 7 2 99 99

Target 2 or 4 short text 0–1 100 100

Informational Text 11–12 100 100 100

Target 9 & 11 2–4 94 93

Targets 8, 10, 12, 13, & 14 7–10 98 97

Target 9 or 11 short text 0–1 100 100

DOK 1 ≤ 4 100 100

DOK 3 or 4 ≥ 3 100 100

2 Writing 6 100 100

Target 1, 3, & 6: Organization/Purpose 1 100 100

Target 1, 3, & 6: Evidence/Elaboration 1 100 100

Target 8: Language and Vocabulary Use 1 100 100

Target 9: Edit/Clarify 3 100 100

DOK 2 ≥ 2 100 100

DOK 3 or 4 1 100 100

Brief Write 1 100 100

3 Listening 8–9 100 100 100

Target 4: Listen/Interpret 8–9 100 100

DOK 2 or higher ≥ 4 100 100

4 Research 8 100 100

Target 2: Interpret and Integrate Information 2–3 100 100

Target 3: Analyze Information/Sources 2–3 100 100

Target 4: Use Evidence 2–3 100 100




Table 24. Percentage of Tests Meeting Blueprint Requirements in Mathematics

Claim Content/Target

G9 G10

Required

Items

%BP

Match

Required

Items %BP Match

1 Overall 20 100 20 100

DOK 2 or higher ≥ 7 100 ≥ 7 100 Priority Cluster 15 100 13 100

Targets D, E 0–5 100

Target F 0–1 100

Targets G, H, I 0–8 100 0–5 100

Target J 0–8 100

Target K 0–8 100

Targets L, M, N 0–8 100 0–9 100 Supporting Cluster 5 100 7 100

Target O 3–5 100

Target P 0–3 100

Targets A, B 0–4 100

Target C 0–3 100

2&4 Overall 6 100 6 100 DOK 3 or higher ≥ 2 100 ≥ 2 100

2. Target A 2 100 2 100

2. Targets B, C, D 1 100 1 100

4. Targets A, D 1 100 1 100

4. Targets B, E 1 100 1 100 4. Targets C, F 1 100 1 100

3-Calc Overall 8 100 5 100

DOK 3 or higher ≥ 2 100 ≥ 2 100 Targets A, D 2–3 100 1–3 100 Targets B, E 3 100 2–3 100

Targets C, F, G 1–2 100 0–2 100

3-No Calc Overall 1 100

Table 25 presents a summary for the number of targets specified in the blueprints and the average and range

of the number of unique targets administered in each simulated test by claim. The blueprints require

covering a few targets in a claim; therefore, the number of targets covered in each test are expected to vary

across tests. The blueprint match results demonstrate the fact that all test forms conform to the same content

target, thus providing evidence of content comparability. In other words, while each form is unique with

respect to its items, all forms align with the same curricular expectations set forth in the test blueprints.




Table 25. Average and Range of the Number of Unique Targets Covered in Each Test by Claim

Grade Total Targets in BP Mean Range (Minimum–

Maximum) C1 C

22

2

C3 C4 C1 C2 C3 C4 C1 C2 C3 C4

ELA/L

9 14 5 1 3 10.0 4 1 3 7–11 4–4 1–1 3–3

10 14 5 1 3 10.0 4 1 3 7–11 4–4 1–1 3–3

Mathematics

9 9 4 7 6 9.0 2 5.5 3 8–9 2–2 3–6 3–3

10 11 4 7 6 7.9 2 4 3 7–9 2–2 2–6 3–3

3.2.5 Summary Statistics on Ability Estimation

Each simulated test includes an initial ability, a true score, and an ability estimate based on the adaptive test

administration. Table 26 presents statistical summaries of the ability estimation, including mean of the

biases, which is the average of the biases of estimated abilities across all students, the standard error of the

mean bias, and the p-value for the significance of the estimated bias reported from the z test. Table 26 also

provides the mean square error and the percentage of students’ estimated theta falling outside the 95%

coverage and 99% coverage.

The mean bias of the estimated abilities is not statistically significant in ELA/L, while significant bias is

observed in mathematics for grades 9 and 10. Significant bias in mathematics is from the lower end of the

ability range due to the mismatch between the item difficulties and students’ abilities. The item pools

currently have a shortage of items that are better targeted toward low performing students, resulting in a

mismatch between the difficulty of item pool and the ability of the student population. Table 27 provides

the average item difficulty for the pool and the average estimated ability for the simulated students. As

shown in Table 27, the average item difficulties in the mathematics pool are much higher than the average

student abilities, indicating a difficulty to select items that maximize assessment information near the

student’s estimated ability while meeting the blueprint requirements. The percentages of students’ estimated

theta falling outside the 95% and 99% confidence interval coverage are close to 5% and 1%, respectively.

Table 26. Bias of the Estimated Abilities

Grade Mean of the

Biases

SE of the

Biases

P-value for

the Z-test MSE

95%

Coverage

99%

Coverage

ELA/L

9 0.01 0.01 0.37 0.18 4.7% 0.9%

10 0.00 0.01 0.43 0.18 5.4% 1.0%

Mathematics

9 0.05 0.01 0.00 0.30 4.9% 0.9%

10 0.05 0.01 0.00 0.30 4.6% 1.1%




Table 27. Average Item Difficulty and Student Ability

Grade

ELA/L Mathematics

Items Ability Items Ability

Mean SD Mean SD Mean SD Mean SD

9 1.722 1.476 0.840 1.363 2.394 1.680 0.279 1.634

10 1.722 1.476 1.175 1.370 2.700 1.364 0.709 1.638

Table 28 presents the mean standard error of the ability estimate across 5,000 simulated test administrations,

as well as the standard error across the ability distribution. The standard errors are large in the low ability

range in both ELA/L and mathematics—an indication that the item pool is too difficult for students and

there is a shortage of easy items. In ELA/L, the standard error is greatest at the very low end of the ability

range. In mathematics, the standard error is greatest at the very low end of the ability range and decreases

as it approaches the very high end of the ability range. The standard error curves are shown in Figure 1.

Table 28. Standard Errors of the Estimated Abilities

Grade Average

SE

SE at

5th Percentile

SE at Bottom

Quartile (25th)

SE at Top

Quartile (75th)

SE at 95th

Percentile

ELA/L

9 0.42 0.46 0.38 0.36 0.40

10 0.41 0.48 0.38 0.39 0.42

Mathematics

9 0.54 0.87 0.58 0.40 0.32

10 0.54 0.76 0.57 0.31 0.30




Figure 1. Standard Error of Measurements Across Estimated Theta Range

3.2.6 Global Item Exposure

The item exposure rate for each item was calculated by dividing the total number of test administrations in

which an item appears by the total number of tests administered. Then, we reported the distribution of the

item exposure rate (r) in six bins. The bins are r = 0% (unused), 0% < r < 20%, 20% < r < 40%, 40% < r <

60%, 60% < r < 80%, and 80% < r < 100%. If global item exposure is minimal, we would expect the largest

portion of items to appear in the 0% < r < 20% bin, an indication that most of the items appear on a very

small percentage of the test forms. The exposure rates are computed for 5,000 simulated tests in both ELA/L

and mathematics.

Table 29 presents the percentage of items that falls into each exposure bin by subject and grade. The

distribution of exposure rates is as expected, given the number of items in the blueprint constraints. Most

test items are administered in 20% or fewer test administrations. There were no items with high exposure

rates (e.g., 60%–80%, 81%–100%) in both ELA/L and mathematics. All items were used in grades 9 and

10 mathematics, while there were some unused items in ELA/L. Most unused items in ELA/L are due to

unbalanced DOK distribution in passage (e.g., too many DOK 3 or 4 items in each passage) and off-grade

items.




Table 29. Percentage of Items by Exposure Rate

Grade Total

Items

Exposure Rate

Unused 0%–20% 21%–40% 41%–60% 61%–80% 81%–100%

ELA/L

9 2,684 8.76 91.06 0.19 0 0 0

10 2,684 8.64 91.17 0.19 0 0 0

Mathematics

9 971 0 97.12 2.16 0.72 0 0

10 1,081 0 96.48 2.68 0.83 0 0

3.3 ESTABLISHING CUT SCORES

There are several ways that cut scores could be established for the common grades 9 and 10 tests. The most

time-consuming and expensive option would be to bring in a panel of standard setters and do a regular

standard setting similar to the one done by the Consortium. This could have been done after the close of the

testing window in 2015. The big disadvantage of this option was that scores in grades 9 and 10 could not

have been reported until after the standard-setting process was completed. A second more simple and

immediate way that the cut scores could be established was to use a regression interpolation procedure and

determine the cut scores statistically. This is the approach taken in the results below.

AIR examined the cut scores established by Smarter Balanced in a variety of ways. Several patterns were

immediately obvious when examining the cut scores in the vicinity of grade 9 and 10. These are shown in

Figures 2 and 3.




Figure 2. ELA/L Smarter Balanced Cut Scores in Grades 7, 8, and 11

-0.350

-0.330

-0.310

-0.290

-0.270

-0.250

-0.230

-0.210

-0.190

-0.170

7 8 9 10 11

Thet

a Sc

ore

Grade

Level 2

0.500

0.550

0.600

0.650

0.700

0.750

0.800

0.850

0.900

7 8 9 10 11

Thet

a Sc

ore

Grade

Level 3

1.600

1.650

1.700

1.750

1.800

1.850

1.900

1.950

2.000

2.050

7 8 9 10 11

Thet

a Sc

ore

Grade

Level 4




Figure 3. Mathematics Smarter Balanced Cut Scores in Grades 7, 8, and 11

-0.400

-0.300

-0.200

-0.100

0.000

0.100

0.200

0.300

0.400

7 8 9 10 11

Thet

a Sc

ore

Grade

Level 2

0.600

0.700

0.800

0.900

1.000

1.100

1.200

1.300

1.400

1.500

7 8 9 10 11

Thet

a Sc

ore

Grade

Level 3

1.500

1.700

1.900

2.100

2.300

2.500

2.700

7 8 9 10 11

Thet

a Sc

ore

Grade

Level 4




The obvious patterns in the graphs are that the cut scores for ELA/L are curvilinear between grades 7 and

11, but the cut scores for mathematics are linear. Therefore, in order to predict the cut scores for grades 9

and 10, AIR used a curvilinear regression approach for ELA/L and a linear regression approach for

mathematics. For ELA/L, theta was converted to exp(theta). The predicted exp(theta) was converted back

to the original theta metric by taking the log of predicted exp(theta). For mathematics, a simple linear

regression using theta was used. The sample sizes used in the regression analyses are listed in Table 30.

Tables 31 and 32 present the values of cut scores used in the regression, along with the slopes and intercepts

of the regressions. The percentage at and above for grades 9 and 10 was obtained from Educational Testing

Service (ETS). These percentages are based on the 2014 Smarter Balanced field-test vertical linking sample.

The predicted cut scores for grades 9 and 10 are provided in Tables 33 and 34. The scaled score cut scores

for grades 9 and 10 are bolded in both tables. The cut scores look reasonable and are probably very close

to what would be established if an actual workshop were used to recommend standards. The statistical

approach relies on the assumption that the results of the 2014 grade 9 and 10 vertical linking samples are

comparable to the results that would have occurred if the 2014 grade 9 and 10 tests had been administered

according to the above blueprints.

Table 30. Sample Sizes of Grades 9, 10, and 11 Students in Vertical Linking Sample


9 7,714 12,016

10 11,924 14,342

11 31,019 21,250

Table 31. Cut Scores for ELA/L

Anchoring

Grade Exp(theta) Theta Cut

Percentage (%)

at and Above

Level 2

7 0.712 -0.340 66

8 0.781 -0.247 71

11 0.838 -0.177 72

Slope 0.028589

Intercept 0.529122

Level 3

7 1.665 0.510 38

8 1.984 0.685 41

11 2.392 0.872 41

Slope 0.17107

Intercept 0.530975

Level 4

7 5.160 1.641 8

8 6.437 1.862 9

11 7.584 2.026 11

Slope 0.554269

Intercept 1.58987




Table 32. Cut Scores for Mathematics

Anchoring Grade Theta Cut Percentage (%) at

and Above

Level 2

7 −0.390 64

8 −0.137 62

11 0.354 59

Slope 0.180846

Intercept −1.625

Level 3

7 0.657 33

8 0.897 32

11 1.426 33

Slope 0.188577

Intercept −0.641

Level 4

7 1.515 13

8 1.741 13

11 2.561 11

Slope 0.264231

Intercept −0.351

Table 33. Predicted Cut Scores for ELA/L

Performance

Standards Grade

Predicted

Theta Cut

Inverse

Proportions Theta Cuts

Scaled Score

Cuts

Level 2

7 −0.316 65 −0.340 2479

8 −0.277 72 −0.247 2487

9 −0.240 68 −0.240 2488

10 −0.205 76 −0.205 2491

11 −0.170 72 −0.177 2493

Level 3

7 0.547 37 0.510 2552

8 0.642 43 0.685 2567

9 0.728 38 0.728 2571

10 0.807 46 0.807 2577

11 0.881 40 0.872 2583

Level 4

7 1.699 8 1.641 2649

8 1.796 10 1.862 2668

9 1.884 9 1.884 2670

10 1.965 13 1.965 2677

11 2.040 11 2.026 2682




Table 34. Predicted Cut Scores for Mathematics

Performance

Standards Grade

Predicted

Theta Cut Inverse

Proportions Theta Cuts

Scaled Score

Cuts

Level 2

7 −0.359 63 −0.390 2484

8 −0.178 63 −0.137 2504

9 0.003 56 0.003 2515

10 0.183 62 0.183 2529

11 0.364 59 0.354 2543

Level 3

7 0.679 32 0.657 2567

8 0.868 33 0.897 2586

9 1.056 28 1.056 2599

10 1.245 33 1.245 2614

11 1.433 33 1.426 2628

Level 4

7 1.499 13 1.515 2635

8 1.763 12 1.741 2653

9 2.027 9 2.027 2676

10 2.291 12 2.291 2697

11 2.556 11 2.561 2718




4. SUMMARY OF 2018–2019 OPERATIONAL TEST ADMINISTRATION

4.1 STUDENT POPULATION

All students enrolled in grades 3–8 and 10 in all public elementary and secondary schools are required to

participate in the Smarter Balanced ELA/L and mathematics assessments. Testing at grades 9 and 11 is

optional. Tables 35–38 present the demographic composition of Idaho students who meet attemptedness

requirements for scoring and reporting of the Smarter Balanced assessments.

Table 35. Number of Students in Summative ELA/L Assessment (Grades 3–8 and 10)

Group G3 G4 G5 G6 G7 G8 G10

All Students 22,722 23,337 24,315 24,367 23,835 23,897 22,305

Female 11,159 11,311 12,005 11,855 11,647 11,558 10,892

Male 11,563 12,026 12,310 12,512 12,188 12,339 11,413

African American 249 261 315 306 250 330 280

AmerIndian/Alaskan 292 276 319 286 309 326 275

Asian 260 254 268 302 267 348 310

Hispanic 3,855 3,948 4,186 4,171 4,035 4,052 3,460

Pacific Islander 198 191 197 177 195 250 381

White 17,868 18,407 19,030 19,125 18,779 18,591 17,599

LEP 2,405 2,067 2,140 1,880 1,718 1,757 1,259

Special Education 2,309 2,244 2,360 2,250 2,107 2,024 1,673

Section 504 492 690 847 979 1,059 1,115 1,027

Note. African American=Black or African American; AmerIndian/Alaskan=American Indian or Alaska Native; Pacific Islander=Native Hawaiian or Other Pacific Islander

Table 36. Number of Students in Summative ELA/L Assessment (Grades 9 and 11)

Group G9 G11

All Students 6,495 388

Female 3,164 197

Male 3,331 191

African American 64 4

AmerIndian/Alaskan 107 8

Asian 52 3

Hispanic 1,074 60

Pacific Islander 38 5

White 5,160 308

LEP 371 14


Section 504 195 14




Table 37. Number of Students in Summative Mathematics Assessment (Grades 3–8 and 10)

Group G3 G4 G5 G6 G7 G8 G10

All Students 22,747 23,356 24,322 24,368 23,826 23,892 22,274

Female 11,165 11,320 12,010 11,854 11,643 11,544 10,866

Male 11,582 12,036 12,312 12,514 12,183 12,348 11,408

African American 261 273 329 312 262 338 283

AmerIndian/Alaskan 293 276 318 285 305 325 273

Asian 264 261 274 305 267 351 309

Hispanic 3,874 3,963 4,201 4,189 4,055 4,067 3,469

Pacific Islander 199 188 197 180 195 251 254

White 17,856 18,395 19,003 19,097 18,742 18,560 17,686

LEP 2,443 2,118 2,190 1,928 1,763 1,785 1,287

Special Education 2,311 2,243 2,356 2,251 2,093 2,022 1,672

Section 504 501 695 851 980 1,061 1,117 1,033

Table 38. Number of Students in Summative Mathematics Assessment (Grades 9 and 11)

Group G9 G11

All Students 6,515 421

Female 3,162 217

Male 3,353 204

African American 76 4

AmerIndian/Alaskan 108 8

Asian 51 4

Hispanic 1,107 66

Pacific Islander 41 5

White 5,132 334

LEP 404 17


Section 504 196 15

4.2 SUMMARY OF OVERALL STUDENT PERFORMANCE

Tables 39–46 present a summary of the 2018–2019 summative test results for all students and by subgroup,

including the average and the standard deviation of scale scores, the percentage of students in each

achievement level, and the percentage of proficient students. Figures 4–6 present the percentage of

proficient students in five years for all students (cohort comparisons). Figures 7–9 present the average scale

scores in five years for all students. The average and the standard deviation of scale scores, as well as the

percentage of proficient students for each test administration across five years by subgroup are provided in

Appendix B.




Table 39. ELA/L Percentage of Students in Achievement Levels

for Overall and by Subgroup (Grades 3–5)

Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 3

All Students 22,722 2427.79 87.96 25 25 25 26 50

Female 11,159 2435.42 86.29 22 24 26 28 54

Male 11,563 2420.43 88.93 28 25 24 23 47

African American 249 2390.77 90.56 41 24 19 16 35

AmerIndian/Alaskan 292 2379.94 87.03 47 24 18 11 29

Asian 260 2445.16 89.22 23 18 26 34 60

Hispanic 3,855 2389.17 81.36 40 28 20 12 32

Pacific Islander 198 2429.55 88.12 24 23 27 26 53

White 17,868 2437.15 86.62 21 24 26 29 55

LEP 2,405 2379.09 79.40 46 27 18 9 27 Special Education 2,309 2349.80 84.15 63 21 10 7 17

Section 504 492 2412.77 81.17 30 27 26 17 43

Grade 4

All Students 23,337 2471.34 93.64 28 20 25 27 52

Female 11,311 2479.78 91.74 24 20 26 29 55

Male 12,026 2463.40 94.71 31 20 24 25 49



Asian 254 2477.36 100.43 26 17 23 34 57

Hispanic 3,948 2429.98 86.21 44 24 20 12 32


White 18,407 2481.69 92.18 24 19 26 31 57

LEP 2,067 2411.93 85.33 52 23 16 9 24

Special Education 2,244 2378.66 89.43 69 15 10 6 16

Section 504 690 2455.52 88.93 33 24 23 20 43 Grade 5

All Students 24,315 2512.57 93.91 23 20 32 24 57

Female 12,005 2523.11 91.24 19 20 34 27 61

Male 12,310 2502.29 95.34 27 21 31 21 53



Asian 268 2541.03 102.08 19 15 26 40 66

Hispanic 4,186 2469.39 86.75 38 26 26 10 36


White 19,030 2523.20 92.12 19 19 34 28 62

LEP 2,140 2452.26 86.40 45 28 20 7 27


Section 504 847 2501.51 85.91 26 24 32 18 50

Note: The percentage of each achievement level may not add up to 100% due to rounding.






Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 6

All Students 24,367 2535.88 93.56 20 25 35 20 55

Female 11,855 2549.42 89.46 16 24 38 23 61

Male 12,512 2523.05 95.54 24 26 33 16 50



Asian 302 2573.29 94.43 13 16 35 36 71

Hispanic 4,171 2493.11 89.08 34 32 26 8 34


White 19,125 2546.20 91.07 16 24 38 22 60


Section 504 979 2516.33 88.29 25 30 34 12 46

Grade 7

All Students 23,835 2561.12 97.95 20 22 39 19 58

Female 11,647 2577.18 92.83 15 21 42 22 64

Male 12,188 2545.77 100.22 25 24 37 15 52



Asian 267 2583.58 102.07 16 18 37 28 66

Hispanic 4,035 2513.87 93.94 35 29 30 7 37


White 18,779 2572.63 94.99 16 21 42 21 63

LEP 1,718 2489.85 97.32 46 26 23 5 28

IDEA 2,107 2440.02 89.65 68 20 10 2 12

Section 504 1,059 2543.23 89.83 24 26 38 11 49 Grade 8

All Students 23,897 2570.21 96.96 20 26 38 16 54

Female 11,558 2588.55 92.85 15 24 41 20 61

Male 12,339 2553.03 97.60 26 28 34 12 46



Asian 348 2607.10 97.70 12 20 36 31 67

Hispanic 4,052 2526.05 91.44 34 33 27 6 33


White 18,591 2581.00 94.86 17 25 40 18 59

LEP 1,757 2501.89 91.64 44 32 19 4 24


Section 504 1,115 2556.67 88.07 22 32 37 9 46






for Overall and by Subgroup (Grade 10)

Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 10

All Students 22,305 2592.89 109.75 19 22 35 24 59

Female 10,892 2608.87 103.21 14 21 38 27 65

Male 11,413 2577.64 113.59 24 23 32 21 53



Asian 310 2627.31 101.02 11 19 36 35 71

Hispanic 3,460 2544.20 104.61 31 30 28 11 39


White 17,599 2603.62 107.44 16 20 36 27 64


Section 504 1,027 2582.78 105.01 21 25 34 20 54






for Overall and by Subgroup (Grades 9 and 11)

Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 9

All Students 6,495 2580.20 104.73 20 25 35 21 56

Female 3,164 2598.18 97.81 13 23 39 25 63

Male 3,331 2563.12 108.18 25 26 31 18 48



Asian 52 2614.30 102.24 15 12 37 37 73

Hispanic 1,074 2539.89 99.44 30 31 28 10 39


White 5,160 2590.43 102.76 17 24 36 23 60

LEP 371 2500.51 97.59 46 29 20 5 25 Special Education 536 2455.59 86.89 67 22 10 1 11

Section 504 195 2566.10 109.13 27 23 30 20 50

Grade 11

All Students 388 2563.96 122.27 29 27 26 18 44

Female 197 2576.92 117.24 25 28 26 21 47

Male 191 2550.59 126.15 33 26 26 15 41



Asian 3 2535.06 228.27 67 0 0 33 33

Hispanic 60 2519.53 123.11 45 23 25 7 32


White 308 2573.27 120.35 26 27 27 20 47

LEP 14 2449.88 81.47 71 29 0 0 0

Special Education 66 2455.41 105.81 67 26 2 6 8

Section 504 14 2552.00 101.06 29 29 29 14 43





Table 43. Mathematics Percentage of Students in Achievement Levels


Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 3

All Students 22,747 2438.41 81.70 24 23 30 23 53

Female 11,165 2434.63 79.52 25 24 30 21 51

Male 11,582 2442.06 83.58 23 22 30 25 55



Asian 264 2470.63 88.17 18 15 27 40 67

Hispanic 3,874 2398.48 75.38 41 27 22 9 31


White 17,856 2448.06 79.70 19 23 32 26 58


Section 504 501 2426.07 75.28 30 25 28 18 46

Grade 4

All Students 23,356 2481.83 81.94 19 31 29 21 50

Female 11,320 2477.88 77.78 20 33 30 18 48

Male 12,036 2485.54 85.51 19 29 28 24 52



Asian 261 2492.06 89.93 20 26 26 28 54

Hispanic 3,963 2441.73 75.41 35 37 20 8 28


White 18,395 2491.92 79.82 15 29 31 24 55

LEP 2,118 2426.81 76.99 43 35 15 6 22


Section 504 695 2469.37 79.20 23 35 27 15 42 Grade 5

All Students 24,322 2510.79 91.43 28 27 21 24 45

Female 12,010 2507.65 88.04 28 29 20 22 42

Male 12,312 2513.85 94.52 27 25 21 27 47



Asian 274 2548.67 103.14 23 16 16 46 61

Hispanic 4,201 2466.61 82.79 46 29 15 9 25


White 19,003 2521.85 89.16 23 27 22 28 50

LEP 2,190 2452.78 83.84 54 27 12 8 20


Section 504 851 2500.91 85.19 31 31 20 18 38







Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 6

All Students 24,368 2526.31 102.58 28 30 22 21 43

Female 11,854 2528.83 97.24 26 30 24 20 43

Male 12,514 2523.91 107.34 29 29 21 21 42



Asian 305 2574.87 107.29 16 19 25 39 64

Hispanic 4,189 2473.49 99.31 48 31 14 7 22


White 19,097 2539.18 98.35 23 30 24 24 48


Section 504 980 2507.93 93.97 33 34 20 13 32

Grade 7

All Students 23,826 2547.24 107.40 26 28 25 21 46

Female 11,643 2548.36 102.78 25 30 25 20 45

Male 12,183 2546.16 111.63 27 26 24 22 46



Asian 267 2585.12 119.70 22 18 23 37 60

Hispanic 4,055 2489.33 103.93 48 29 16 7 24


White 18,742 2561.47 102.57 21 28 27 24 51

LEP 1,763 2466.44 107.85 56 26 12 5 18


Section 504 1,061 2533.49 98.32 29 34 22 15 37 Grade 8

All Students 23,892 2555.54 115.57 32 27 20 21 41

Female 11,544 2561.69 109.03 29 29 21 21 42

Male 12,348 2549.80 121.09 35 26 19 21 39



Asian 351 2614.41 127.25 19 21 21 40 61

Hispanic 4,067 2496.33 103.59 53 28 11 7 19


White 18,560 2570.12 112.68 27 27 22 24 46

LEP 1,785 2475.53 106.75 62 23 9 5 15


Section 504 1,117 2537.52 103.96 38 30 18 14 32






for Overall and by Subgroup (Grade 10)

Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 10

All Students 22,274 2561.64 121.61 40 27 19 14 33

Female 10,866 2562.11 114.74 38 29 20 12 32

Male 11,408 2561.20 127.82 42 24 18 16 34



Asian 309 2619.60 129.62 23 25 24 28 51

Hispanic 3,469 2500.24 106.52 63 23 11 4 14


White 17,686 2575.20 119.38 35 28 21 16 37


Section 504 1,033 2546.89 115.28 46 27 17 9 27






for Overall and by Subgroup (Grades 9 and 11)

Group Number

Tested

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

Grade 9

All Students 6,515 2538.25 114.51 39 29 21 10 32

Female 3,162 2545.37 102.79 36 32 23 9 32

Male 3,353 2531.53 124.19 43 26 20 11 32



Asian 51 2575.30 109.73 27 31 24 18 41

Hispanic 1,107 2490.78 104.90 57 29 12 3 15


White 5,132 2550.62 112.86 35 29 24 12 36

LEP 404 2458.18 101.30 70 22 6 1 8 Special Education 528 2408.67 101.13 88 8 3 1 4

Section 504 196 2532.01 119.05 45 26 18 11 29

Grade 11

All Students 421 2522.32 127.98 57 22 13 8 21

Female 217 2516.93 121.10 56 25 14 5 19

Male 204 2528.06 134.98 57 20 12 11 23

African American 4*

AmerIndian/Alaskan 8*

Asian 4*

Hispanic 66 2458.83 97.03 82 12 5 2 6

Pacific Islander 5*

White 334 2536.90 129.94 51 24 15 10 25

LEP 17 2403.65 51.81 100 0 0 0 0

Special Education 69 2409.43 95.89 91 4 4 0 4

Section 504 15 2517.26 91.67 47 47 7 0 7

Note: The percentage of each achievement level may not add up to 100% due to rounding. * Suppressed the data due to the small sample size, n<10.




Figure 4. ELA/L %Proficient Across Years (Grades 3–8 and 10)

48

46

52

48

5152

60

4950

54

50

5354

61

4748

54

51

54

52

59

50 50

5554 54 54

59

50

52

57

55

58

54

59

Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8 Grade 10

2015 2016 2017 2018 2019




Figure 5. Mathematics %Proficient Across Years (Grades 3–8 and 10)

50

43

38

36

3837

30

52

47

4039

42

38

31

50

47

42

40

42

39

32

52

48

4344 44

42

33

53

50

45

43

46

41

33


2015 2016 2017 2018 2019




Figure 6. ELA/L and Mathematics %Proficient Across Years (Grades 9 and 11)




Figure 7. ELA/L Average Scale Score Across Years (Grades 3–8 and 10)

2425 24272422

2428 2428

2461

24682463

24672471

25022505 2504

25092513

25242527 2527

25322536

25472552 2551 2552

25612566

25702567

2570 2570

2597 2599

25922595 2593

2015 2016 2017 2018 2019





Figure 8. Mathematics Average Scale Score Across Years (Grades 3–8 and 10)

24312435 2434

2437 2438

2470

24772475

24782482

24992502

25052507

2511

2516

2521 2522

25292526

2532

2541 2541 2542

254725462551 2551

2557

255625552557

25592562 2562

2015 2016 2017 2018 2019





Figure 9. ELA/L and Mathematics Average Scale Score Across Years (Grades 9 and 11)




Because the precision of scores in each claim is not sufficient to report scores, given a small number of

items, the scores on each claim are reported using one of the three performance categories, taking into

account the Standard Error of Measurement (SEM) of the claim score: (1) Below Standard, (2) At/Near

Standard, or (3) Above Standard (see Section 7.5, Rules for Calculating Strengths and Weaknesses for

Claim Scores, for the rules). Tables 47–50 present the distribution of performance categories for each claim.

The number of claims is four in ELA/L, and three in mathematics, combining claims 2 and 4.

Table 47. ELA/L Percentage of Students in Performance Categories by Claim (Grades 3–8 and 10)

Grade Performance

Category

Claim 1:

Reading

Claim 2:

Writing

Claim 3:

Listening

Claim 4:

Research

3

Below 24 27 14 28

At/Near 48 53 62 52

Above 27 20 24 20

4

Below 24 24 14 28

At/Near 47 55 64 53

Above 28 21 22 20

5

Below 21 20 17 25

At/Near 47 55 63 48

Above 32 25 20 27

6

Below 25 22 15 20

At/Near 47 55 66 53

Above 27 23 19 26

7

Below 25 19 15 21

At/Near 48 52 68 52

Above 28 29 16 27

8

Below 24 20 15 24

At/Near 48 56 66 51

Above 28 24 18 25

10

Below 23 16 14 22

At/Near 45 50 63 51

Above 32 34 23 27

Table 48. ELA/L Percentage of Students in Performance Categories by Claim (Grades 9 and 11)

Grade Performance

Category

Claim 1:

Reading

Claim 2:

Writing

Claim 3:

Listening

Claim 4:

Research

9

Below 23 16 14 23

At/Near 48 54 66 52

Above 29 30 20 25

11

Below 35 24 23 34

At/Near 39 51 57 46

Above 25 25 20 19




Table 49. Mathematics Percentage of Students in Performance Categories by Claim (Grades 3–8 and 10)

Grade Performance

Category

Claim 1: Concepts

and Procedures

Claims 2 & 4:

Problem Solving

& Modeling and

Data Analysis

Claim 3:

Communicating

Reasoning

3

Below 30 23 22

At/Near 34 48 48

Above 36 29 29

4

Below 32 27 26

At/Near 34 48 47

Above 34 25 27

5

Below 37 28 30

At/Near 32 49 49

Above 31 23 21

6

Below 37 32 31

At/Near 35 47 48

Above 27 21 21

7

Below 35 26 22

At/Near 36 49 56

Above 29 25 22

8

Below 38 30 28

At/Near 36 47 51

Above 26 23 20

10

Below 46 15 24

At/Near 32 68 62

Above 22 17 14

Table 50. Mathematics Percentage of Students in Performance Categories by Claim (Grades 9 and 11)

Grade Performance

Category

Claim 1: Concepts

and Procedures

Claims 2 & 4:

Problem Solving

& Modeling and

Data Analysis

Claim 3:

Communicating

Reasoning

9

Below 47 17 14

At/Near 40 66 72

Above 12 16 13

11

Below 64 49 43

At/Near 24 41 46

Above 12 10 10

4.3 TEST-TAKING TIME

The Smarter Balanced assessments are not timed. The time spent on each item may vary among individual

students, which may provide useful information about student testing behaviors and motivation, for

example. Since the length of a test session could be monitored by teachers (TEs) or test administrators

(TAs) who are knowledgeable about their schools and their students, additional time for students who need

it would be arranged.




In the test delivery system (TDS), item response latency is captured as the item page time (the length of

time that each item page is presented) in milliseconds. Discrete items appear on the screen one item at a

time, and items associated with a stimulus appear on the screen together with the page time measured as

the total time spent on all associated items. In this case, the page time for each item is the average time for

all the items associated with the stimulus. For each student, the total testing time for the test was the sum

of the page time for all items.

Tables 51–54 present an average testing time and the testing time at percentiles for the overall test, the CAT

component, and the PT component.

Table 51. ELA/L Test-Taking Time (Grades 3–8 and 10)

Grade

Average

Testing

Time

(hh:mm)

SD of

Testing

Time

(hh:mm)

Testing Time in Percentiles (hh:mm)

75th 80th 85th 90th 95th

Overall Test

3 3:36 1:57 4:25 4:48 5:20 6:05 7:19

4 3:40 1:51 4:31 4:53 5:23 6:04 7:10

5 3:44 1:53 4:33 4:56 5:28 6:12 7:24

6 3:45 1:46 4:33 4:53 5:20 5:59 7:05

7 3:11 1:30 3:50 4:07 4:29 4:59 5:55

8 3:03 1:29 3:43 4:00 4:22 4:55 5:47

10 2:08 1:05 2:36 2:47 3:02 3:23 4:03

CAT Component

3 1:41 0:52 2:01 2:10 2:21 2:38 3:11

4 1:43 0:47 2:03 2:12 2:22 2:38 3:09

5 1:45 0:48 2:05 2:14 2:26 2:43 3:17

6 1:56 0:49 2:19 2:28 2:39 2:56 3:26

7 1:36 0:40 1:55 2:02 2:11 2:24 2:47

8 1:34 0:42 1:52 2:00 2:10 2:24 2:51

10 1:09 0:31 1:25 1:30 1:37 1:48 2:04

PT Component

3 1:55 1:22 2:27 2:44 3:07 3:39 4:35

4 1:58 1:19 2:30 2:47 3:09 3:40 4:31

5 1:59 1:18 2:31 2:47 3:09 3:38 4:30

6 1:49 1:10 2:17 2:31 2:50 3:16 3:59

7 1:35 1:01 1:58 2:11 2:25 2:48 3:29

8 1:29 0:58 1:52 2:04 2:18 2:40 3:17

10 0:59 0:41 1:15 1:22 1:32 1:45 2:11




Table 52. ELA/L Test-Taking Time (Grades 9 and 11)

Grade

Average

Testing

Time

(hh:mm)

SD of

Testing

Time

(hh:mm)



Overall Test

9 2:23 1:04 2:57 3:09 3:23 3:41 4:14

11 1:54 1:02 2:27 2:36 2:45 3:00 3:27

CAT Component

9 1:18 0:33 1:35 1:40 1:47 1:58 2:14

11 1:04 0:37 1:22 1:26 1:31 1:42 2:00

PT Component

9 1:05 0:38 1:24 1:31 1:40 1:52 2:11

11 0:49 0:32 1:05 1:10 1:17 1:28 1:41

Table 53. Mathematics Test-Taking Time (Grades 3–8 and 10)

Grade

Average

Testing

Time

(hh:mm)

SD of

Testing

Time

(hh:mm)



Overall Test

3 2:16 1:18 2:47 3:03 3:22 3:48 4:40

4 2:19 1:13 2:50 3:04 3:23 3:49 4:38

5 2:43 1:28 3:18 3:37 4:00 4:32 5:31

6 2:32 1:15 3:04 3:18 3:37 4:04 4:52

7 1:56 0:58 2:21 2:32 2:47 3:09 3:45

8 2:07 1:06 2:34 2:47 3:04 3:26 4:09

10 1:22 0:45 1:43 1:51 2:02 2:18 2:45

CAT Component

3 1:28 0:52 1:47 1:56 2:08 2:25 3:00

4 1:33 0:51 1:52 2:02 2:15 2:33 3:08

5 1:36 0:51 1:57 2:07 2:19 2:38 3:12

6 1:39 0:48 1:59 2:08 2:20 2:37 3:07

7 1:25 0:43 1:43 1:52 2:03 2:19 2:47

8 1:32 0:48 1:53 2:02 2:14 2:31 3:02

10 0:55 0:31 1:10 1:16 1:24 1:34 1:51

PT Component

3 0:49 0:36 1:02 1:09 1:19 1:33 1:57

4 0:46 0:32 0:59 1:06 1:14 1:26 1:47

5 1:06 0:49 1:23 1:32 1:45 2:02 2:33

6 0:53 0:39 1:07 1:14 1:23 1:37 1:59

7 0:31 0:23 0:39 0:43 0:49 0:58 1:13

8 0:35 0:25 0:43 0:48 0:55 1:04 1:20

10 0:27 0:20 0:34 0:38 0:43 0:50 1:02




Table 54. Mathematics Test-Taking Time (Grades 9 and 11)

Grade

Average

Testing

Time

(hh:mm)

SD of

Testing

Time

(hh:mm)



Overall Test

9 1:39 0:48 2:04 2:13 2:23 2:38 3:04

11 1:20 0:50 1:47 1:54 2:03 2:17 2:46

CAT Component

9 1:04 0:31 1:20 1:25 1:32 1:42 1:58

11 0:52 0:32 1:09 1:14 1:20 1:29 1:44

PT Component

9 0:35 0:22 0:46 0:50 0:55 1:03 1:15

11 0:28 0:24 0:37 0:43 0:47 0:57 1:06

4.4 DISTRIBUTION OF STUDENT ABILITY AND ITEM DIFFICULTY

Figures 10–19 show the empirical distribution of the Idaho student scale scores in the 2018–2019

administration and the distribution of the administered summative item difficulty parameters for overall

and by reporting category. For overall, the student ability distribution is shifted to the left in all grades and

subjects, a pattern more pronounced in the mathematics upper grades, indicating that the pool includes more

difficult items than the ability of students in the tested population. The pool includes difficult items to

accurately measure high-performing students but needs additional easy items to better measure low-

performing students. At the reporting category level, the student ability distribution is shifted to the left in

claims 1 (reading) and 4 (research) in ELA/L. In mathematics, the student ability distribution is shifted to

the left for all claims except for claim 1 in lower grades. The Smarter Balanced Assessment Consortium

plans to add additional easy items to the pool and to augment the pool in proportion to the test blueprint

constraints (e.g., content, DOK, item type, item difficulties) to better measure low-performing students.




Figure 10. Student Ability–Item Difficulty Distribution for ELA/L (Grades 3–8 and 10)

Figure 11. Student Ability–Item Difficulty Distribution for ELA/L (Grades 9 and 11)




Figure 12. Student Ability–Item Difficulty Distribution by Claim: ELA/L (Grades 3–5)




Figure 13. Student Ability–Item Difficulty Distribution by Claim: ELA/L (Grades 6–8, 10)




Figure 14. Student Ability–Item Difficulty Distribution by Claim: ELA/L (Grades 9 and 11)




Figure 15. Student Ability–Item Difficulty Distribution for Mathematics (Grades 3–8 and 10)

Figure 16. Student Ability–Item Difficulty Distribution for Mathematics (Grades 9 and 11)




Figure 17. Student Ability–Item Difficulty Distribution by Claim: Mathematics (Grades 3–5)




Figure 18. Student Ability–Item Difficulty Distribution by Claim: Mathematics (Grades 6–8, 10)




Figure 19. Student Ability–Item Difficulty Distribution by Claim: Mathematics (Grades 9 and 11)




5. VALIDITY

According to the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014),

validity refers to the degree to which evidence and theory support the interpretations of test scores as

described by the intended uses of assessments. The validity of an intended interpretation of test scores relies

on all the evidence accrued about the technical quality of a testing system, including test development and

construction procedures, test score reliability, accurate scaling and equating, procedures for setting

meaningful achievement standards, standardized test administration and scoring procedures, and attention

to fairness for all test takers. The appropriateness and usefulness of the Smarter Balanced summative

assessments depends on the assessments meeting the relevant standards of validity.

Validity evidence provided in this chapter is as follows:

• Test Content

• Internal Structure

Evidence on test content validity is provided with the blueprint match rates for the delivered tests. Evidence

on internal structure is examined in the results of intercorrelations among claim scores.

Some of the evidence on standardized test administration, scoring procedures, and attention to fairness for

all test takers is provided in other chapters.

5.1 EVIDENCE ON TEST CONTENT

The Smarter Balanced summative assessment includes two components: the computer-adaptive test (CAT)

and the performance task (PT). For the CAT, each student receives a different set of items adapted to his or

her ability. For PT, each student is administered a fixed-form test. The content coverage in all PT forms is

the same.

In the adaptive item-selection algorithm, item selection takes place in two discrete stages: blueprint

satisfaction and match-to-ability. The Smarter Balanced blueprints specify a range of items to be

administered in each claim, content domain/standards, and targets. Moreover, blueprints constrain the DOK

and item and passage types. For DOK constraints, the Smarter Balanced blueprint specifies the minimum

number of items, not the maximum. In blueprints, all content blueprint elements are configured to obtain a

strictly enforced range of items administered. The algorithm also seeks to satisfy target-level constraints,

but these ranges are not strictly enforced. In ELA/L, the blueprints also specify the number of passages in

reading (claim 1) and listening (claim 3) claims.

Tables 55–57 present the percentages of tests aligned with the ELA/L test blueprint constraints for items in

claims, targets and DOK, and passages in claims 1 and 3. For the passage constraints, four passages in claim

1 reading and three to four passages in claim 3 listening are required. The composition of four reading

passages in claim 1 is two literary-text passages (one long and one short passage) and two informational-

text passages (one long and one short passage) in grades 3–5 and one literary-text passage (long passage)

and three information-text passages (one long and two short passages) in grades 6–11.

All tests met the blueprint requirements, except for claims 1 and 2 targets, which administered a few items

more or less than the item requirement. The violations in claim 1 reading targets appeared in all grades due

to the uneven distribution of items across targets and DOKs within and across passages. The violations in




claim 2 writing targets appeared in grade 6 only due to the uneven distribution of items across targets and

DOKs.

Tables 58–60 provide the percentages of tests aligned with the test blueprint constraints for the mathematics

CAT for claims, DOK, and target constraints. In mathematics, the tests met all blueprint requirements.

Table 55. Percentage of ELA/L Delivered Tests Meeting Blueprint Requirements

for Each Claim and the Number of Passages Administered (Grades 3–5)

Claim Content Category/Target

Required

Items

in G3–5

%BP Match for

Item Requirements

%BP Match

for Passage

Requirement

G3 G4 G5 G3–5

1 Literary Text 7–8 100 100 100 100

Target 2: Central Ideas 1–2 100 100 100

Target 4: Reasoning and Evaluation 1–2 100 100 100

Targets 1, 3, 5, 6, & 7 3–6 100 100 100

Informational Text 7–8 100 100 100 100

Target 9: Central Ideas 1–2 83 98 99

Target 11: Reasoning and Evaluation 1–2 100 100 100

Targets 8, 10, 12, 13, & 14 3–6 100 100 100

DOK 2 ≥ 7 100 100 100

DOK 3 or 4 ≥ 2 100 100 100

2 Writing 6 100 100 100

Target 1, 3, or 6: Organization/Purpose 1 100 100 100

Target 1, 3, or 6: Evidence/Elaboration 1 100 100 100

Target 8: Language and Vocabulary Use 1 100 100 100

Target 9: Edit/Clarify 3 100 100 100

DOK 2 or higher ≥ 2 100 100 100

3 Listening 8–9 100 100 100 100

Target 4: Listen/Interpret 8–9 100 100 100

DOK 2 or higher ≥ 3 100 100 100

4 Research 8 100 100 100

Target 2: Interpret and Integrate

Information 2–3 100 100 100

Target 3: Analyze Information/Sources 2–3 100 100 100

Target 4: Use Evidence 2–3 100 100 100




Table 56. ELA/L Percentage of Delivered Tests Meeting Blueprint Requirements

for Each Claim and the Number of Passages Administered (Grades 6–8, 10)

Claim Content Category/Targets

Required

Items in

G6–8

Required

Items in

G10

%BP Match for Item

Requirements

%BP Match

for Passage

Requirement

G6 G7 G8 G10 G6–8, 10

1 Literary Text 4–7 4 100 100 100 100 100

Target 2: Central Ideas 1 1 99 100 100 99

Target 4: Reasoning and Evaluation 1 1 100 100 100 99

Targets 1, 3, 5, 6, & 7 2–5 2 100 100 100 99

Target 2 or 4 short text 0–1 0–1 100 100 100 100

Informational Text 10–12* 11–12 100 100 100 100 100

Target 9 & Target 11 2–5 2–4 100 98 96 93

Targets 8, 10, 12, 13, & 14 7–10 7–10 100 98 96 97

Target 9 or 11 short text 0–1 0–1 100 100 100 100

DOK 1 ≤ 5 ≤ 4 100 100 100 100

DOK 3 or 4 ≥ 2 ≥ 3 100 100 100 100

2 Writing 6 6 100 100 100 100

Target 1, 3, & 6 (Organization/Purpose) 1 1 100 100 100 100

Target 1, 3, & 6 (Evidence/Elaboration) 1 1 100 100 100 100

Target 8: Language and Vocabulary Use 1 1 94 100 100 100

Target 9: Edit/Clarify 3 3 94 100 100 100

DOK 2 ≥ 2 ≥ 2 100 100 100 100

DOK 3 or 4 1 1 100 100 100 100

Brief Write 1 1 100 100 100 100

3 Listening 8–9 8–9 100 100 100 100 100

Target 4: Listen/Interpret 8–9 8–9 100 100 100 100

DOK 2 or higher ≥ 3 ≥ 4 100 100 100 100

4

Research 8 8 100 100 100 100

Target 2: Interpret and Integrate Information

2–3 2–3 100 100 100 100

Target 3: Analyze Information/Sources 2–3 2–3 100 100 100 100

Target 4: Use Evidence 2–3 2–3 100 100 100 100

* Required items for Informational Text are 10–12 in grades 6 and 7, and 12 in grade 8.




Table 57. ELA/L Percentage of Delivered Tests Meeting Blueprint Requirements

for Each Claim and the Number of Passages Administered (Grades 9, 11)

Claim Content Category/Targets

Required

Items in

G9, 11

%BP Match for Item

Requirements

%BP Match

for Passage

Requirement

G9 G11 G9, 11

1 Literary Text 4 100 100 100

Target 2: Central Ideas 1 99 99

Target 4: Reasoning and Evaluation 1 99 99

Targets 1, 3, 5, 6, & 7 2 99 98

Target 2 or 4 Short Text 0–1 100 100

Informational Text 11–12 100 100 100

Target 9 & Target 11 2–4 93 92

Targets 8, 10, 12, 13, & 14 7–10 97 97

Target 9 or 11 Short Text 0–1 100 100

DOK 1 ≤ 4 100 100

DOK 3 or 4 ≥ 3 100 100

2 Writing 6 100 100

Target 1, 3, & 6 (Organization/Purpose) 1 100 100

Target 1, 3, & 6 (Evidence/Elaboration) 1 100 100

Target 8: Language and Vocabulary Use 1 100 100

Target 9: Edit/Clarify 3 100 100

DOK 2 ≥ 2 100 100

DOK 3 or 4 1 100 100

Brief Write 1 100 100

3 Listening 8–9 100 100 100

Target 4: Listen/Interpret 8–9 100 100

DOK 2 or Higher ≥ 4 100 100

4

Research 8 100 100

Target 2: Interpret and Integrate Information

2–3 100 100

Target 3: Analyze Information/Sources 2–3 100 100

Target 4: Use Evidence 2–3 100 100




Table 58. Mathematics Percentage of Delivered Tests Meeting Blueprint Requirements

for Claims and Targets (Grades 3–5)

Claim Content Domain

G3 G4 G5

Required

Items

%BP

Match

Required

Items

%BP

Match

Required

Items

%BP

Match

1 Overall 17–20 100 17–20 100 17–20 100

DOK 2 or higher ≥ 7 100 ≥ 7 100 ≥ 7 100 Priority Cluster 13–15 100

Targets B, C, G, I 5–6 100

Targets D, F 5–6 100

Target A 2–3 100

Supporting Cluster 4–5 100

Targets E. J, K 3–4 100

Target H 1 100

Priority Cluster 13–15 100

Targets A, E, F 8–9 100

Target G 2–3 100

Target D 1–2 100

Target H 1 100


Targets I, K 2–3 100

Targets B, C, J 1 100

Target L 1 100


Targets E, I 5–6 100

Target F 4–5 100

Targets C, D 3–4 100


Targets J, K 2–3 100

Targets A, B, G, H 2 100

2&4 Overall 6 100 6 100 6 100 DOK 3 or Higher ≥ 2 100 ≥ 2 100 ≥ 2 100

2. Target A 2 100 2 100 2 100

2. Targets B, C, D 1 100 1 100 1 100

4. Targets A, D 1 100 1 100 1 100 4. Targets B, E 1 100 1 100 1 100 4. Targets C, F 1 100 1 100 1 100

3 Overall 8 100 8 100 8 100

DOK 3 or Higher ≥ 2 100 ≥ 2 100 ≥ 2 100

Targets A, D 3 100 3 100 3 100

Targets B, E 3 100 3 100 3 100

Targets C, F 2 100 2 100 2 100





for Claims and Targets (Grades 6–8 and 10)


G6 G7 G8 G10

Required

Items

%BP

Match

Required

Items

%BP

Match

Required

Items

%BP

Match

Required

Items

%BP

Match

1 Overall 16–20 100 16–20 100 16–20 100 20 100

DOK 2 or Higher ≥ 7 100 ≥ 7 100 ≥ 7 100 ≥ 7 100 Priority Cluster 12–15 100

Targets E, F 5–6 100

Target A 3–4 100

Targets G, B 2 100

Target D 2 100


Targets C, H, I, J 4–5 100


Targets A, D 8–9 100

Targets B, C 5–6 100


Targets E, F 2–3 100

Targets G, H, I 1–2 100


Targets C, D 5–6 100

Targets B, E, G 5–6 100

Targets F, H 2–3 100


Targets A, I, J 4–5 100

Priority Cluster 13 100

Targets D, E 0–5 100

Target F 0–1 100

Targets G, H, I 0–5 100

Targets L, M, N 0-9 100

Supporting Cluster 7 100

Target O 3–5 100

Targets A, B 0–4 100

2&4 Overall 6 100 6 100 6 100 6 100 DOK 3 or higher ≥ 2 100 ≥ 2 100 ≥ 2 100 ≥ 2 100

2. Target A 2 100 2 100 2 100 2 100

2. Targets B, C, D 1 100 1 100 1 100 1 100

4. Targets A, D 1 100 1 100 1 100 1 100

4. Targets B, E 1 100 1 100 1 100 1 100

4. Targets C, F 1 100 1 100 1 100 1 100

3-Calc Overall 7 100 8 100 8 100 5 100

DOK 3 or higher ≥ 2 100 ≥ 2 100 ≥ 2 100 ≥ 2 100

Targets A, D 2–3 100 3 100 3 100 1–3 100

Targets B, E 2–3 100 3 100 3 100 2–3 100

Targets C, F, G 1–2 100 2 100 2 100 0–2 100

3-No

Calc Overall 1 100 1 100





for Claims and Targets (Grades 9 and 11)


Grade 9 Grade 11

Required

Items

%BP

Match

Required

Items

%BP

Match

1 Overall 20 100 19–22 100

DOK 2 or higher ≥ 7 100 ≥ 7 100 Priority Cluster 15 100 14–16 100

Targets D, E 2 100

Target F 1 100

Targets G, H, I 0–8 100 4–5 100

Target J 0–8 100 2 100

Target K 0–8 100 2 100

Targets L, M, N 0–8 100 3–4 100 Supporting Cluster 5 100 5–6 100

Target O 2 100

Target P 0–3 100 1–2 100

Targets A, B 1 99

Target C 0–3 100 1 100

2&4 Overall 6 100 6 100 DOK 3 or higher ≥ 2 100 ≥ 2 100

2. Target A 2 100 2 100

2. Targets B, C, D 1 100 1 100

4. Targets A, D 1 100 1 100

4. Targets B, E 1 100 1 100 4. Targets C, F 1 100 1 100

3-Calc Overall 8 100 7 100

DOK 3 or higher ≥ 2 100 ≥ 2 100 Targets A, D 2–3 100 2–3 100 Targets B, E 3 100 2–3 100

Targets C, F, G 1–2 100 1–2 100

3-No Calc Overall 1 100

Tables 61 and 62 summarizes the target coverage by claim that includes the average and range of the number

of unique targets administered in each delivered test. Since the test blueprint is not required to cover all

targets in each test, it is expected that the number of targets covered varies across tests. Although the target

coverage varies somewhat across individual tests, all targets are covered at an aggregate level across all

tests combined.




Table 61. Average and Range of the Number of Unique Targets Assessed

Within Each Claim Across All Delivered Tests (Grades 3–8 and 10)

Grade Total Targets in BP Mean Range (Minimum – Maximum)

C1 C2 C3 C4 C1 C2 C3 C4 C1 C2 C3 C4

ELA/L

3 14 5 1 3 10.1 4 1 3 8-13 4-4 1-1 3-3

4 14 5 1 3 10.7 4 1 3 8-14 4-4 1-1 3-3

5 14 5 1 3 11.4 4 1 3 9-14 4-4 1-1 3-3

6 14 5 1 3 10.3 4 1 3 8-11 4-5 1-1 3-3

7 14 5 1 3 10.7 4 1 3 8-11 4-4 1-1 3-3

8 14 5 1 3 10.9 4 1 3 8-11 4-4 1-1 3-3

10 14 5 1 3 10.0 4 1 3 6-11 4-4 1-1 3-3

Mathematics

3 11 4 6 6 10.9 2 5.7 3 9-11 2-2 4-6 3-3

4 12 4 6 6 10.0 2 5.4 3 9-10 2-2 3-6 3-3

5 11 4 6 6 9.0 2 5.2 3.0 9-9 2-2 3-6 3-4

6 10 4 7 6 10.0 2 4.6 3 8-10 2-2 3-7 3-3

7 9 4 7 6 8.0 2 4.6 3 8-8 2-2 3-6 3-3

8 10 4 7 6 10.0 2 4.8 3.0 10-10 1-2 3-6 2-4

10 11 4 7 6 7.9 2 3.9 3.0 6-9 2-2 2-6 2-3

Table 62. Average and Range of the Number of Unique Targets Assessed

Within Each Claim Across All Delivered Tests (Grades 9 and 11)

Grade Total Targets in BP Mean Range (Minimum – Maximum)

C1 C2 C3 C4 C1 C2 C3 C4 C1 C2 C3 C4

ELA/L

9 14 5 1 3 10.0 4 1 3 6-11 4-4 1-1 3-3

11 14 5 1 3 10.0 4 1 3 8-11 4-4 1-1 3-3

Mathematics

9 9 4 7 6 9.0 2 5.5 3 7-9 2-2 3-6 3-3

11 16 4 7 6 14.9 2 5.1 3 14-16 2-2 3-7 3-3

An adaptive testing algorithm constructs a test form unique to each student, targeting the student’s level of

ability and meeting the test blueprints. Consequently, the test forms will not be statistically parallel (e.g.,

equal test difficulty). However, scores from the test should be comparable, and each test form should

measure the same content, albeit with a different set of test items, ensuring the comparability of assessments

in content and scores. The blueprint match and target coverage results demonstrate that test forms conform

to the same content as specified, thus providing evidence of content comparability. In other words, while

each form is unique with respect to its items, all forms align with the same curricular expectations set forth

in the test blueprints.

5.2 EVIDENCE ON INTERNAL STRUCTURE

The measurement and reporting model used in the Smarter Balanced assessments assumes a single

underlying latent trait (ability), with achievement reported as a total score as well as scores for each claim

measured. The evidence on the internal structure is examined based on the correlations among claim scores.




The correlations among claim scores, both observed (below diagonal) and corrected for attenuation (above

diagonal), are presented in Tables 63–66. The correction for attenuation indicates what the correlation

would be if claim scores could be measured with perfect reliability, corrected (adjusted) for measurement

error estimates.

The observed correlation between two claim scores with measurement errors can be corrected for

attenuation 𝑟𝑥′𝑦′ =𝑟𝑥𝑦

√𝑟𝑥𝑥×𝑟𝑦𝑦, where 𝑟𝑥′𝑦′ is the correlation between x and y corrected for attenuation, 𝑟𝑥𝑦 is

the observed correlation between x and y, 𝑟𝑥𝑥 is the reliability coefficient for x, and 𝑟𝑦𝑦 is the reliability

coefficient for y.

When corrected for attenuation (above diagonal), the correlations among claim scores are higher than

observed correlations. The disattenuated correlations are quite high, especially in mathematics. The

correction for attenuation is large in mathematics, because the marginal reliabilities of claim 2 and 4 and

claim 3 scores are low. The low reliabilities are due to large standard errors among lower scores due to a

shortage of easy items in the item pool.

Because the reliability for claim scores are low, the performance of all the claim scores is reported in three

performance categories. The distribution of performance categories for each claim is provided in Tables

47–50, Section 4.2. Scale scores are not reported for claims.




Table 63. Correlations Among Claim Scores for ELA/L (Grades 3–8 and 10)

Grade Claim Observed & Disattenuated Correlation

Claim 1 Claim 2 Claim 3 Claim 4

3

Claim 1: Reading 0.88 0.92 0.91

Claim 2: Writing 0.65 0.85 0.87

Claim 3: Listening 0.62 0.56 0.88

Claim 4: Research 0.67 0.62 0.57

4

Claim 1: Reading 0.87 0.91 0.92

Claim 2: Writing 0.64 0.83 0.87



5

Claim 1: Reading 0.88 0.89 0.93

Claim 2: Writing 0.64 0.83 0.89



6

Claim 1: Reading 0.88 0.93 0.93

Claim 2: Writing 0.66 0.87 0.87



7

Claim 1: Reading 0.87 0.92 0.93

Claim 2: Writing 0.65 0.86 0.87



8

Claim 1: Reading 0.89 0.94 0.93

Claim 2: Writing 0.66 0.88 0.89



10

Claim 1: Reading 0.90 0.93 0.93

Claim 2: Writing 0.68 0.86 0.90



Table 64. Correlations Among Claim Scores for ELA/L (Grades 9 and 11)


Claim 1 Claim 2 Claim 3 Claim 4

9

Claim 1: Reading 0.88 0.94 0.93

Claim 2: Writing 0.65 0.83 0.88



11

Claim 1: Reading 0.88 1 0.94

Claim 2: Writing 0.67 0.87 0.93






Table 65. Correlations Among Claim Scores for Mathematics (Grades 3–8 and 10)


Claim 1 Claim 2&4 Claim 3

3

Claim 1 0.96 0.92

Claim 2 & 4 0.77 0.98

Claim 3 0.76 0.72

4

Claim 1 0.96 0.97

Claim 2 & 4 0.80 0.98

Claim 3 0.79 0.74

5

Claim 1 1 0.96

Claim 2 & 4 0.79 1

Claim 3 0.77 0.73

6

Claim 1 1 0.97

Claim 2 & 4 0.81 1

Claim 3 0.78 0.75

7

Claim 1 1 0.97

Claim 2 & 4 0.79 1

Claim 3 0.75 0.70

8

Claim 1 1 0.96

Claim 2 & 4 0.80 1

Claim 3 0.75 0.70

10

Claim 1 1 0.93

Claim 2 & 4 0.65 1

Claim 3 0.62 0.53

Legend: Claim 1: Concepts and Procedures; Claims 2 & 4: Problem Solving & Modeling and Data Analysis; Claim 3: Communicating Reasoning

Table 66. Correlations Among Claim Scores for Mathematics (Grades 9 and 11)


Claim 1 Claim 2&4 Claim 3

9

Claim 1 1 0.90

Claim 2 & 4 0.70 1

Claim 3 0.59 0.56

11

Claim 1 1 1

Claim 2 & 4 0.76 1

Claim 3 0.71 0.64

Legend: Claim 1: Concepts and Procedures; Claims 2 & 4: Problem Solving & Modeling and Data Analysis; Claim 3: Communicating Reasoning




6. RELIABILITY

Reliability refers to the consistency in test scores. Reliability is evaluated in terms of the Standard Error of

Measurement (SEM). In classical test theory, reliability is defined as the ratio of the true score variance to

the observed score variance, assuming the error variance is the same for all scores. Within the item response

theory (IRT) framework, measurement error varies conditioning on ability. The amount of precision in

estimating achievement can be determined by the test information, which describes the amount of

information provided by the test at each score point along the ability continuum. Test information is a value

that is the inverse of the measurement error of the test; the larger the measurement error, the less test

information is being provided. In computer-adaptive testing (CAT), because selected items vary among

students, the measurement error can vary for the same ability depending on the selected items for each

student.

The reliability evidence of the Smarter Balanced summative tests is provided with marginal reliability,

SEM, and classification accuracy and consistency in each achievement level.

6.1 MARGINAL RELIABILITY

For reliability, the marginal reliability was computed for the scale scores, taking into account the varying

measurement errors across the ability range. Marginal reliability is a measure of the overall reliability of an

assessment based on the average conditional SEM, estimated at different points on the ability scale, for all

students.

The marginal reliability (�̅�) is defined as

�̅� = [𝜎2 − (∑ 𝐶𝑆𝐸𝑀𝑖

2𝑁𝑖=1

𝑁)]/𝜎2,

where N is the number of students; 𝐶𝑆𝐸𝑀𝑖is the conditional standard error of measurement of the scale

score for student i; and 𝜎2is the variance of the scale score. The higher the reliability coefficient, the greater

the precision of the test.

Another way to examine test reliability is with the SEM. In IRT, SEM is estimated as a function of test

information provided by a given set of items that makes up the test. In CAT, items administered vary among

all students, so the SEM also can vary among students, which yields conditional SEM. The average

conditional SEM can be computed as

𝐴𝑣𝑒𝑟𝑎𝑔𝑒𝐶𝑆𝐸𝑀 = 𝜎√1 − �̄� = √∑ 𝐶𝑆𝐸𝑀𝑖2𝑁

𝑖=1 /𝑁.

The smaller the value of average conditional SEM, the greater the accuracy of test scores.

Tables 67 and 68 presents the marginal reliability coefficients and the average conditional SEM for the total

scale scores.




Table 67. Marginal Reliability for ELA/L and Mathematics (Grades 3–8 and 10)

Grade N

Number of

Items Specified

in Test

Blueprint

Marginal

Reliability

Scale Score

Mean

Scale Score

SD

Average

CSEM

Min Max

ELA/L

3 22,722 38 41 0.92 2427.79 87.96 25.64

4 23,337 38 41 0.91 2471.34 93.64 27.56

5 24,315 38 41 0.92 2512.57 93.91 27.04

6 24,367 38 41 0.91 2535.88 93.56 27.50

7 23,835 38 41 0.91 2561.12 97.95 28.80

8 23,897 40 41 0.91 2570.21 96.96 28.88

10 22,305 39 41 0.91 2592.89 109.75 32.15

Mathematics

3 22,747 39 40 0.95 2438.41 81.70 19.12

4 23,356 37 40 0.94 2481.83 81.94 19.40

5 24,322 38 40 0.94 2510.79 91.43 22.50

6 24,368 38 39 0.94 2526.31 102.58 25.53

7 23,826 38 40 0.94 2547.24 107.40 27.13

8 23,892 38 40 0.93 2555.54 115.57 29.53

10 22,274 36 38 0.89 2561.64 121.61 40.25

Table 68. Marginal Reliability for ELA/L and Mathematics (Grades 9 and 11)

Grade N

Number of

Items Specified

in Test

Blueprint

Marginal

Reliability

Scale Score

Mean

Scale Score

SD

Average

CSEM

Min Max

ELA/L

9 6,495 39 41 0.91 2580.20 104.73 32.07

11 388 39 41 0.92 2563.96 122.27 34.98

Mathematics

9 6,515 38 39 0.88 2538.25 114.51 40.17

11 421 40 42 0.91 2522.32 127.98 38.45

6.2 STANDARD ERROR CURVES

Figures 20–23 present plots of the conditional SEM of scale scores across the range of ability. The vertical

lines indicate the cut scores for Level 2, Level 3, and Level 4. The item selection algorithm matched items

to each student’s ability and to the test blueprints with the same precision across the range of abilities.

Overall, the standard error curves suggest that students are measured with a high degree of precision, given

that the standard errors are consistently low. However, larger standard errors are observed at the lower ends

of the score distribution relative to the higher ends. This occurs because the item pools currently have a

shortage of easy items that are better targeted toward these lower-achieving students. Content experts use

this information to consider how to further target and populate item pools.




Figure 20. Conditional Standard Error of Measurement for ELA/L (Grades 3–8 and 10)




Figure 21. Conditional Standard Error of Measurement (Grades 9 and 11)




Figure 22. Conditional Standard Error of Measurement for Mathematics (Grades 3–8 and 10)




Figure 23. Conditional Standard Error of Measurement for Mathematics (Grades 9 and 11)




The SEMs presented in Figures 20–23 are summarized in Tables 69–72. Tables 69 and 70 provides the

average conditional SEM for all scores and for scores in each achievement level. Tables 71 and 72 present

the average conditional SEMs at each cut score and the difference in average conditional SEMs between

two cut scores. As shown in Figures 20–23, the greatest average conditional SEM is in Level 1 in both

ELA/L and mathematics. Average conditional SEMs at all cut scores are similar in ELA/L, but larger in

Level 2 cut scores in mathematics.

Table 69. Average Conditional Standard Error of Measurement by Achievement Level (Grades 3–8, 10)

Grade Level 1 Level 2 Level 3 Level 4 Average

CSEM

ELA/L

3 29.11 24.10 23.67 25.28 25.64

4 29.01 26.57 26.07 28.09 27.56

5 27.58 24.89 25.71 29.82 27.04

6 29.92 25.81 26.34 29.06 27.50

7 31.29 26.97 27.36 31.04 28.80

8 32.35 27.38 27.22 30.36 28.88

10 38.17 30.32 29.56 32.17 32.15

Mathematics

3 22.62 18.23 17.22 18.41 19.12

4 24.02 18.43 17.24 18.81 19.40

5 28.41 21.15 18.52 19.12 22.50

6 33.87 22.55 20.46 21.11 25.53

7 36.02 25.47 22.01 21.18 27.13

8 36.38 28.94 24.15 22.16 29.53

10 53.37 32.57 25.55 22.89 40.25

Table 70. Average Conditional Standard Error of Measurement by Achievement Level (Grades 9 and 11)

Grade Level 1 Level 2 Level 3 Level 4 Average

CSEM

ELA/L

9 38.01 30.46 29.59 31.76 32.07

11 43.92 30.20 29.49 32.77 34.98

Mathematics

9 51.71 33.21 28.45 25.81 40.17

11 45.61 28.60 24.33 21.93 38.45





and Difference of the SEMs Between Two Cuts (Grades 3–8 and 10)

Grade L2 Cut L3 Cut L4 Cut |L2–L3| |L3–L4| |L2–L4|

ELA/L

3 25.20 23.35 23.86 1.85 0.51 1.34

4 26.87 26.09 26.05 0.78 0.04 0.82

5 24.95 24.93 26.67 0.02 1.74 1.72

6 26.24 25.97 27.02 0.27 1.05 0.78

7 27.46 26.89 28.82 0.57 1.93 1.36

8 28.14 27.12 28.08 1.02 0.96 0.06

10 32.05 29.79 29.85 2.26 0.06 2.20

Mathematics

3 18.97 17.63 17.02 1.34 0.61 1.95

4 19.52 17.52 17.17 2.00 0.35 2.35

5 23.56 19.20 18.02 4.36 1.18 5.54

6 24.31 21.16 19.93 3.15 1.23 4.38

7 27.83 23.56 21.12 4.27 2.44 6.71

8 31.04 26.10 22.37 4.94 3.73 8.67

10 36.75 27.95 22.71 8.80 5.24 14.04


and Difference of the SEMs Between Two Cuts (Grades 9 and 11)

Grade L2 Cut L3 Cut L4 Cut |L2–L3| |L3–L4| |L2–L4|

ELA/L

9 32.21 29.68 30.22 2.53 0.54 1.99

11 32.52 – 31.90 – – 0.62

Mathematics

9 36.08 30.77 26.63 5.31 4.14 9.45

11 30.05 24.63 21.20 5.42 3.43 8.85

Note: Cells with “–” indicate no student data at the achievement level cuts.

6.3 RELIABILITY OF ACHIEVEMENT CLASSIFICATION

When student performance is reported in terms of achievement levels, a reliability of achievement

classification is computed in terms of the probabilities of accurate and consistent classification of students

as specified in Standard 2.16 in the Standards for Educational and Psychological Testing (AERA, APA,

and NCME, 2014). The indexes consider the accuracy and consistency of classifications.

For a fixed-form test, the accuracy and consistency of classifications are estimated on a single form’s test

scores from a single test administration based on the true-score distribution estimated by fitting a bivariate

beta-binomial model or a four-parameter beta model (Huynh, 1976; Livingston & Wingersky, 1979;

Subkoviak, 1976; Livingston & Lewis, 1995). For the CAT, because the adaptive testing algorithm

constructs a test form unique to each student, the classification indexes are computed based on all sets of

items administered across students using an IRT-based method (Guo, 2006).

The classification index can be examined in terms of the classification accuracy and the classification

consistency. Classification accuracy refers to the agreement between the classifications based on the form




actually taken and the classifications that would be made on the basis of the test takers’ true scores if their

true scores could somehow be known. Classification consistency refers to the agreement between the

classifications based on the form (adaptively administered items) actually taken and the classifications that

would be made on the basis of an alternate form (another set of adaptively administered items given the

same ability), that is, the percentages of students who would be consistently classified in the same

achievement levels on two equivalent test forms.

In reality, the true ability is unknown, and students do not take an alternate, equivalent form; therefore, the

classification accuracy and the classification consistency are estimated on the basis of students’ item scores

and the item parameters, and the assumed underlying latent ability distribution as described below. The true

score is an expected value of the test score with a measurement error.

For the ith student, the student’s estimated ability is 𝜃𝑖 with SEM of 𝑠𝑒(𝜃𝑖), and the estimated ability is

distributed, as 𝜃𝑖~𝑁(𝜃𝑖 , 𝑠𝑒2(𝜃𝑖)), assuming a normal distribution, where 𝜃𝑖 is the unknown true ability of

the ith student. The probability of the true score at achievement level l based on the cut scores 𝑐𝑙−1 and 𝑐𝑙 is estimated as

𝑝𝑖𝑙 = 𝑝(𝑐𝑙−1 ≤ 𝜃𝑖 < 𝑐𝑙) = 𝑝 ( 𝑐𝑙−1 − 𝜃𝑖

𝑠𝑒(𝜃𝑖)≤𝜃𝑖 − 𝜃𝑖

𝑠𝑒(𝜃𝑖)< 𝑐𝑙 − 𝜃𝑖

𝑠𝑒(𝜃𝑖)) = 𝑝(

𝜃𝑖 − 𝑐𝑙

𝑠𝑒(𝜃𝑖)<𝜃𝑖 − 𝜃𝑖

𝑠𝑒(𝜃𝑖)≤ 𝜃𝑖 − 𝑐𝑙−1

𝑠𝑒(𝜃𝑖))

= Φ(𝜃𝑖 − 𝑐𝑙−1

𝑠𝑒(𝜃𝑖)) −Φ(

𝜃𝑖 − 𝑐𝑙

𝑠𝑒(𝜃𝑖)).

Instead of assuming a normal distribution of 𝜃𝑖~𝑁(𝜃𝑖 , 𝑠𝑒2(𝜃𝑖)), we can estimate the above probabilities

directly using the likelihood function.

The likelihood function of theta given a student’s item scores represents the likelihood of the student’s

ability at that theta value. Integrating the likelihood values over the range of theta at and above the cut point

(with proper normalization) represents the probability of the student’s latent ability or the true score being

at or above that cut point. If a student with estimated theta is below the cut point, a probability of being at

or above the cut point is an estimate of the chance that this student is misclassified as below the cut, and 1

minus that probability is the estimate of the chance that the student is correctly classified as below the cut

score. Using this logic, we can define various classification probabilities.

The probability of the ith student being classified at achievement level l (𝑙 = 1,2,⋯ , 𝐿) based on the cut

scores 𝑐𝑢𝑡𝑙−1 and 𝑐𝑢𝑡𝑙 , given the student’s item scores 𝐳𝑖 = (𝑧𝑖1, ⋯ , 𝑧𝑖𝐽) and item parameters 𝐛 =

(𝐛1, ⋯ , 𝐛𝐽), and using the J administered items, can be estimated as

𝑝𝑖𝑙 = 𝑃(𝑐𝑢𝑡𝑙−1 ≤ 𝜃𝑖 < 𝑐𝑢𝑡𝑙|𝐳, 𝐛) =∫ 𝐿(𝜃|𝐳,𝐛)𝑑𝜃𝑐𝑢𝑡𝑙𝑐𝑢𝑡𝑙−1

∫ 𝐿(𝜃|𝐳,𝐛)𝑑𝜃+∞−∞

for 𝑙 = 2,⋯ , 𝐿 − 1,

𝑝𝑖1 = 𝑃(−∞ < 𝜃𝑖 < 𝑐𝑢𝑡1|𝐳, 𝐛) =∫ 𝐿(𝜃|𝐳,𝐛)𝑑𝜃𝑐𝑢𝑡1−∞

∫ 𝐿(𝜃|𝐳,𝐛)𝑑𝜃+∞−∞

,

𝑝𝑖𝐿 = 𝑃(𝑐𝑢𝑡𝐿−1 ≤ 𝜃𝑖 < ∞|𝐳,𝐛) =∫ 𝐿(𝜃|𝐳,𝐛)𝑑𝜃∞𝑐𝑢𝑡𝐿−1

∫ 𝐿(𝜃|𝐳,𝐛)𝑑𝜃+∞−∞

,

where the likelihood function, based on general IRT models, is




𝐿(𝜃|𝐳𝑖 , 𝐛) = ∏ (𝑧𝑖𝑗𝑐𝑗 +(1−𝑐𝑗)𝐸𝑥𝑝(𝑧𝑖𝑗𝐷𝑎𝑗(𝜃−𝑏𝑗))

1+𝐸𝑥𝑝(𝐷𝑎𝑗(𝜃−𝑏𝑗)))𝑗∈d ∏ (

𝐸𝑥𝑝(𝐷𝑎𝑗(𝑧𝑖𝑗𝜃−∑ 𝑏𝑖𝑘𝑧𝑖𝑗𝑘=1

))

1+∑ 𝐸𝑥𝑝(𝐷𝑎𝑗(∑ (𝜃−𝑏𝑗𝑘)𝑚𝑘=1 ))

𝐾𝑗𝑚=1

)𝑗∈p ,

where d stands for dichotomous and p stands for polytomous items; 𝐛𝑗 = (𝑎𝑗, 𝑏𝑗, 𝑐𝑗) if the jth item is a

dichotomous item, and 𝐛𝑗 = (𝑎𝑗 , 𝑏𝑗1, … , 𝑏𝑗𝐾𝑖) if the jth item is a polytomous item; 𝑎𝑗 is the item’s

discrimination parameter (for Rasch model, 𝑎𝑗 = 1), 𝑐𝑗 is the guessing parameter (for Rasch and 2PL

models, 𝑐𝑗 = 0), and 𝐷 is 1.7 for non-Rasch models and 1 for Rasch model.

Classification Accuracy

Using 𝑝𝑖𝑙, we can construct a 𝐿 × 𝐿 table as

(

𝑛𝑎11 ⋯ 𝑛𝑎1𝐿⋮ ⋮ ⋮

𝑛𝑎𝐿1 ⋯ 𝑛𝑎𝐿𝐿),

where 𝑛𝑎𝑙𝑚 = ∑ 𝑝𝑖𝑚𝑝𝑙𝑖=𝑙 . 𝑛𝑎𝑙𝑚 is the expected number of students at achievement level lm, 𝑝𝑙𝑖 is the ith

student’s achievement level, and 𝑝𝑖𝑚 are the probabilities of the ith student being classified at achievement

level m. In the above table, the row represents the observed level and the column represents the expected

level.

The classification accuracy (CA) at level 𝑙 (𝑙 = 1,⋯ , 𝐿) is estimated by

𝐶𝐴𝑙 =𝑛𝑎𝑙𝑙

∑ 𝑛𝑎𝑙𝑚𝐿𝑚=1

,

and the overall classification accuracy is estimated by

𝐶𝐴 =∑ 𝑛𝑎𝑙𝑙𝐿𝑙=1

𝑁,

where 𝑁 is the total number of students.

Classification Consistency

Using 𝑝𝑖𝑙 , which is similar to accuracy, we can construct another 𝐿 × 𝐿 table by assuming the test is

administered twice independently to the same student group, hence we have

(

𝑛𝑐11 ⋯ 𝑛𝑐1𝐿⋮ ⋮ ⋮

𝑛𝑐𝐿1 ⋯ 𝑛𝑐𝐿𝐿),

where 𝑛𝑐𝑙𝑚 = ∑ 𝑝𝑖𝑙𝑝𝑖𝑚𝑁𝑖=1 . 𝑝𝑖𝑙 and 𝑝𝑖𝑚 are the probabilities of the ith student being classified at

achievement level l and m, respectively based on observed scores and hypothetical scores from equivalent

test form.

The classification consistency (CC) at level 𝑙 (𝑙 = 1,⋯ , 𝐿) is estimated by

𝐶𝐶𝑙 =𝑛𝑐𝑙𝑙

∑ 𝑛𝑐𝑙𝑚𝐿𝑚=1

,

and the overall classification consistency is




𝐶𝐶 =∑ 𝑛𝑐𝑙𝑙𝐿𝑙=1

𝑁.

The analysis of the classification index is performed based on overall scale scores. Tables 73–74 provide

the percentages of classification accuracy and consistency both overall and by achievement level.

The overall classification index ranged from 78% to 83% for accuracy and from 70% to 76% for consistency

across all grades and subjects. For achievement levels, the classification index is higher in L1 and L4 than

in L2 and L3. The higher accuracy at L1 and L4 is due to the fact that the intervals used to compute the

classification probabilities for students in L1 and L4 [−∞, L2 cut; L4 cut, ∞] are wider than the intervals

used to compute the classification probabilities for students in L2 and L3 [L2 cut, L3 cut; L3 cut, L4

cut]. The misclassification probability tends to be higher for narrower intervals.

Accuracy of classifications is higher than the consistency of classifications in all achievement levels. The

accuracy is higher than the consistency because the accuracy is based on one test with a measurement error

and the true score while the consistency is based on two tests with measurement errors. The classification

indexes by subgroup are provided in Appendix C.




Table 73. Classification Accuracy and Consistency by Achievement Level (Grades 3–8 and 10)

Grade Achievement Level ELA/L Mathematics

% Accuracy % Consistency % Accuracy % Consistency

3

Overall 79 71 83 76

L1 89 83 90 85

L2 71 60 74 64

L3 68 57 79 72

L4 87 81 89 84

4

Overall 78 70 83 76

L1 90 84 89 82

L2 63 51 80 73

L3 65 55 78 71

L4 87 80 89 83

5

Overall 79 71 82 75

L1 90 83 89 84

L2 67 55 77 68

L3 74 66 71 61

L4 85 78 90 85

6

Overall 79 71 82 75

L1 89 81 91 85

L2 72 62 77 69

L3 76 68 72 62

L4 83 75 89 83

7

Overall 80 72 82 75

L1 90 83 91 85

L2 70 59 76 67

L3 78 71 74 65

L4 84 75 89 83

8

Overall 80 72 81 74

L1 88 81 90 84

L2 73 63 72 63

L3 79 72 71 60

L4 83 73 90 85

10

Overall 80 72 81 74

L1 88 81 89 84

L2 72 61 69 59

L3 76 68 75 65

L4 85 78 90 84




Table 74. Classification Accuracy and Consistency by Achievement Level (Grades 9 and 11)

Grade Achievement Level ELA/L Mathematics

% Accuracy % Consistency % Accuracy % Consistency

9

Overall 79 70 79 71

L1 87 80 89 84

L2 71 61 68 58

L3 76 68 70 60

L4 84 76 85 74

11

Overall 81 73 87 82

L1 89 83 95 92

L2 72 64 73 64

L3 77 67 77 66

L4 87 77 88 83

6.4 RELIABILITY FOR SUBGROUPS

The reliability of test scores is also computed by subgroup. Tables 75–80 present the marginal reliability

coefficients and average conditional SEMs. The reliability coefficients are similar across subgroups in

ELA/L, but somewhat lower in mathematics for subgroups with low performance, e.g., LEP and special

education. In mathematics, due to the shortage of easy items in the item pool, especially in grade 10, lower

reliability coefficients are due to the large percentage of students in Level 1 with large SEMs in these

subgroups.

Table 75. Marginal Reliability and Average Conditional SEM by Subgroup for ELA/L (Grades 3–5)

Group G3 G4 G5

MR CSEM MR CSEM MR CSEM

All Students 0.92 25.64 0.91 27.56 0.92 27.04

Female 0.91 25.46 0.91 27.46 0.91 27.08

Male 0.92 25.82 0.91 27.65 0.92 27.00

African American 0.91 26.74 0.91 28.24 0.92 27.09

AmerIndian/Alaskan 0.90 26.95 0.90 28.21 0.92 27.05

Asian 0.92 25.42 0.92 27.82 0.93 27.95

Hispanic 0.90 26.29 0.90 27.70 0.91 26.53

Pacific Islander 0.92 25.59 0.90 27.54 0.89 26.54

White 0.91 25.47 0.91 27.51 0.91 27.14

LEP 0.89 26.56 0.89 28.04 0.90 26.82

Special Education 0.89 28.51 0.89 29.57 0.90 28.33

Economic Disadvantage 0.90 25.61 0.90 27.53 0.91 26.46




Table 76. Marginal Reliability and Average Conditional SEM by Subgroup for ELA/L


Group G6 G7 G8 G10

MR CSEM MR CSEM MR CSEM MR CSEM

All Students 0.91 27.50 0.91 28.80 0.91 28.88 0.91 32.15

Female 0.91 27.38 0.90 28.68 0.90 28.71 0.91 31.59

Male 0.92 27.62 0.92 28.90 0.91 29.05 0.92 32.67

African American 0.92 28.27 0.92 30.02 0.91 29.96 0.90 35.72

AmerIndian/Alaskan 0.91 28.68 0.91 29.09 0.89 29.15 0.90 33.11

Asian 0.91 27.79 0.92 29.15 0.91 28.90 0.90 31.37

Hispanic 0.90 27.62 0.91 28.89 0.90 29.48 0.90 33.09

Pacific Islander 0.92 27.59 0.90 28.39 0.90 28.75 0.91 31.78

White 0.91 27.44 0.91 28.75 0.91 28.73 0.91 31.90

LEP 0.90 28.25 0.91 29.60 0.89 30.50 0.89 35.21

Special Education 0.88 30.08 0.88 31.38 0.84 32.27 0.84 37.18

Economic Disadvantage 0.90 27.26 0.90 28.38 0.90 28.38 0.91 31.89

Table 77. Marginal Reliability and Average Conditional SEM by Subgroup for ELA/L

(Grades 9 and 11)

Group G9 G11

MR CSEM MR CSEM

All Students 0.91 32.07 0.92 34.98

Female 0.90 31.50 0.91 35.63

Male 0.91 32.60 0.93 34.30

African American 0.93 35.25 0.91 29.77

AmerIndian/Alaskan 0.89 33.44 0.91 33.84

Asian 0.91 31.29 0.97 39.66

Hispanic 0.89 32.80 0.92 34.99

Pacific Islander 0.92 32.69 0.91 32.28

White 0.90 31.84 0.92 35.06

LEP 0.87 34.62 0.79 37.47

Special Education 0.82 36.86 0.87 38.12

Economic Disadvantage 0.91 32.47 0.90 32.61




Table 78. Marginal Reliability and Average Conditional SEM by Subgroup for Mathematics

(Grades 3–5)

Group G3 G4 G5

MR CSEM MR CSEM MR CSEM

All Students 0.95 19.12 0.94 19.40 0.94 22.50

Female 0.94 19.09 0.94 19.24 0.94 22.42

Male 0.95 19.15 0.95 19.55 0.94 22.57

African American 0.94 20.99 0.94 22.67 0.93 26.30

AmerIndian/Alaskan 0.93 20.13 0.93 21.48 0.93 26.26

Asian 0.95 19.29 0.95 19.46 0.95 22.10

Hispanic 0.93 20.00 0.93 20.35 0.91 24.46

Pacific Islander 0.94 19.04 0.94 19.42 0.92 21.52

White 0.94 18.88 0.94 19.10 0.94 21.92

LEP 0.93 20.38 0.92 21.30 0.91 25.53

Special Education 0.93 22.37 0.93 23.70 0.89 28.71

Economic Disadvantage 0.94 19.00 0.94 19.49 0.93 22.54

Table 79. Marginal Reliability and Average Conditional SEM by Subgroup for Mathematics


Group G6 G7 G8 G10

MR CSEM MR CSEM MR CSEM MR CSEM

All Students 0.94 25.53 0.94 27.13 0.93 29.53 0.89 40.25

Female 0.93 24.91 0.93 26.69 0.93 28.94 0.88 39.48

Male 0.94 26.11 0.94 27.55 0.94 30.07 0.90 40.98

African American 0.92 32.99 0.92 33.83 0.91 34.61 0.79 56.63

AmerIndian/Alaskan 0.91 28.73 0.92 31.87 0.90 32.90 0.83 47.75

Asian 0.95 24.13 0.95 26.22 0.95 27.82 0.93 34.59

Hispanic 0.92 28.72 0.91 30.96 0.90 32.68 0.80 47.68

Pacific Islander 0.94 24.67 0.93 26.51 0.93 29.20 0.89 40.52

White 0.94 24.62 0.94 26.06 0.94 28.66 0.90 38.26

LEP 0.90 31.22 0.90 33.72 0.90 34.55 0.77 53.84

Special Education 0.88 36.75 0.86 38.42 0.83 38.29 0.65 60.25

Economic Disadvantage 0.93 25.65 0.92 27.37 0.92 30.02 0.87 40.95




Table 80. Marginal Reliability and Average Conditional SEM by Subgroup for Mathematics (Grades 9

and 11)

Group G9 G11

MR CSEM MR CSEM

All Students 0.88 40.17 0.91 38.45

Female 0.86 38.37 0.90 38.71

Male 0.89 41.80 0.92 38.17

African American 0.84 51.79 0.91 35.18

AmerIndian/Alaskan 0.81 44.47 0.78 46.14

Asian 0.89 35.73 0.95 41.36

Hispanic 0.81 45.14 0.79 44.51

Pacific Islander 0.90 42.21 0.88 34.31

White 0.88 38.74 0.92 36.99

LEP 0.76 49.39 – 51.92

Special Education 0.69 56.50 0.72 50.35

Economic Disadvantage 0.88 40.86 0.83 37.51

Note: Cells with “–” indicate that marginal reliability coefficients were not computed due to a large SEM.

6.5 RELIABILITY FOR CLAIM SCORES

The marginal reliability coefficients and the measurement errors are also computed for claim scores. In

mathematics, claims 2 and 4 are combined to have enough items to generate a score. Because the precision

of scores in claims is insufficient to report scores given a small number of items, the scores on each claim

are reported using one of the three performance categories, taking into account the SEM of the claim score: (1) Below Standard, (2) At/Near Standard, or (3) Above Standard. Tables 81–84 present the marginal

reliability coefficients for each claim score in ELA/L and mathematics, respectively.




Table 81. Marginal Reliability Coefficients for Claim Scores in ELA/L (Grades 3–8 and 10)

Grade Claim

Number of Items

Specified in Test

Blueprint Marginal

Reliability

Scale Score

Mean

Scale Score

SD

Average

CSEM

Min Max

3

Claim 1: Reading 14 16 0.76 2434.21 101.23 49.83

Claim 2: Writing 7 7 0.72 2417.80 109.88 57.84

Claim 3: Listening 8 9 0.60 2442.93 121.86 76.74

Claim 4: Research 9 9 0.70 2406.09 117.48 64.44

4

Claim 1: Reading 14 16 0.76 2474.01 105.46 51.58

Claim 2: Writing 7 7 0.71 2467.56 120.24 64.26

Claim 3: Listening 8 9 0.60 2487.85 132.40 83.37

Claim 4: Research 9 9 0.71 2449.89 123.99 67.32

5

Claim 1: Reading 14 16 0.75 2522.27 110.86 55.58

Claim 2: Writing 7 7 0.71 2513.90 116.60 62.50

Claim 3: Listening 8 9 0.61 2510.58 132.55 82.41

Claim 4: Research 9 9 0.76 2501.64 118.54 58.30

6

Claim 1: Reading 14 19 0.76 2528.38 112.70 55.12

Claim 2: Writing 7 7 0.74 2534.96 111.26 57.22

Claim 3: Listening 8 9 0.59 2554.57 137.44 88.19

Claim 4: Research 9 9 0.71 2531.98 122.26 66.26

7

Claim 1: Reading 14 19 0.78 2559.04 114.76 54.38

Claim 2: Writing 7 7 0.73 2565.61 121.17 62.59

Claim 3: Listening 8 9 0.55 2568.59 134.15 89.50

Claim 4: Research 9 9 0.71 2550.39 132.90 70.98

8

Claim 1: Reading 16 19 0.75 2565.15 116.67 58.13

Claim 2: Writing 7 7 0.72 2575.05 119.17 63.18

Claim 3: Listening 8 9 0.58 2582.01 131.81 84.92

Claim 4: Research 9 9 0.72 2558.16 126.30 67.17

10

Claim 1: Reading 15 16 0.77 2588.07 127.88 60.76

Claim 2: Writing 7 7 0.73 2602.37 135.12 69.82

Claim 3: Listening 8 9 0.60 2596.05 153.66 96.68

Claim 4: Research 9 9 0.71 2575.88 144.41 77.98




Table 82. Marginal Reliability Coefficients for Claim Scores in ELA/L (Grades 9 and 11)

Grade Claim

Number of Items

Specified in Test

Blueprint Marginal

Reliability

Scale Score

Mean

Scale Score

SD

Average

CSEM

Min Max

9

Claim 1: Reading 15 16 0.76 2573.84 123.87 61.07

Claim 2: Writing 7 7 0.71 2589.51 129.44 69.72

Claim 3: Listening 8 9 0.58 2580.93 149.14 96.34

Claim 4: Research 9 9 0.70 2564.23 141.51 77.95

11

Claim 1: Reading 15 16 0.79 2563.90 142.25 64.54

Claim 2: Writing 7 7 0.74 2569.91 147.81 74.79

Claim 3: Listening 8 9 0.64 2564.64 170.11 101.37

Claim 4: Research 9 9 0.71 2544.02 146.56 79.30

Table 83. Marginal Reliability Coefficients for Claim Scores in Mathematics (Grades 3–8 and 10)

Grade Claim

Number of Items

Specified in Test

Blueprint Marginal

Reliability

Scale Score

Mean

Scale Score

SD

Average

CSEM

Min Max

3

Claim 1 20 20 0.91 2442.15 91.02 28.04

Claims 2 & 4 8 11 0.72 2432.21 93.72 49.98

Claim 3 9 11 0.76 2431.99 95.53 47.07

4

Claim 1 20 20 0.90 2484.41 87.59 27.54

Claims 2 & 4 8 10 0.76 2476.93 94.59 46.31

Claim 3 9 10 0.74 2474.86 97.10 49.09

5

Claim 1 20 20 0.90 2514.38 98.33 31.65

Claims 2 & 4 8 10 0.69 2503.38 101.36 56.42

Claim 3 9 10 0.72 2499.93 113.25 60.35

6

Claim 1 19 19 0.89 2528.79 110.58 37.18

Claims 2 & 4 9 10 0.73 2517.03 116.52 60.85

Claim 3 9 11 0.73 2519.40 120.12 62.70

7

Claim 1 20 20 0.89 2546.80 114.65 37.86

Claims 2 & 4 9 10 0.68 2538.28 125.13 70.77

Claim 3 9 10 0.67 2538.05 131.07 75.15

8

Claim 1 20 20 0.89 2557.48 124.42 42.16

Claims 2 & 4 8 10 0.71 2545.45 131.66 71.21

Claim 3 9 10 0.69 2542.88 139.16 76.88

10

Claim 1 20 20 0.84 2558.80 136.65 55.48

Claims 2 & 4 8 10 0.49 2535.88 164.12 117.25

Claim 3 7 8 0.54 2539.14 152.58 103.94

Legend: Claim 1: Concepts and Procedures; Claims 2 & 4: Problem Solving & Modeling and Data Analysis; and Claim 3: Communicating Reasoning




Table 84. Marginal Reliability Coefficients for Claim Scores in Mathematics (Grades 9 and 11)

Grade Claim

Number of Items

Specified in Test

Blueprint Marginal

Reliability

Scale Score

Mean

Scale Score

SD

Average

CSEM

Min Max

9

Claim 1 20 20 0.80 2528.82 117.70 52.03

Claims 2 & 4 8 10 0.49 2531.78 146.37 104.39

Claim 3 9 10 0.53 2529.12 162.90 112.20

11

Claim 1 22 22 0.86 2520.15 132.84 48.93

Claims 2 & 4 8 10 0.56 2492.14 178.73 118.77

Claim 3 9 10 0.54 2512.92 156.14 105.50

Legend: Claim 1: Concepts and Procedures; Claims 2 & 4: Problem Solving & Modeling and Data Analysis; and Claim 3: Communicating Reasoning




7. SCORING

The Smarter Balanced Assessment Consortium provided the vertically-scaled item parameters by linking

across all grades using common items in adjacent grades. All scores are estimated based on these item

parameters. Each student received an overall scale score, an overall achievement level, and a performance

category for each claim. This section describes the rules used in generating scores, as well as the

handscoring procedure.

7.1 ESTIMATING STUDENT ABILITY USING MAXIMUM LIKELIHOOD ESTIMATION

The Smarter Balanced tests are scored using maximum likelihood estimation (MLE). The likelihood

function for generating the MLEs is based on a mixture of item types.

Indexing items by i, the likelihood function based on the jth person’s score pattern for I items is

𝐿𝑗(𝜃𝑗|𝒛𝑗, 𝒂,𝑏1,… 𝑏𝑘) = ∏ 𝑝𝑖𝑗(𝑧𝑖𝑗|𝜃𝑗, 𝑎𝑖,𝑏𝑖,1, … 𝑏𝑖,𝑚𝑖)𝐼

𝑖=1 ,

where 𝑏𝑖′ = (𝑏𝑖,1, … , 𝑏𝑖,𝑚𝑖

) for the ith item’s step parameters, 𝑚𝑖 is the maximum possible score of this

item, 𝑎𝑖 is the discrimination parameter for item i, 𝑧𝑖𝑗is the observed item score for the person j, and k

indexes the step of the item i.

Depending on the item score points, the probability 𝑝𝑖𝑗(𝑧𝑖𝑗|𝜃𝑗 , 𝑎𝑖 , 𝑏𝑖,1, … , 𝑏𝑖,𝑚𝑖) takes either the form of a

two-parameter logistic (2PL) model for items with one point or the form based on the generalized partial

credit model (GPCM) for items with two or more points.

In the case of items with one score point, we have 𝑚𝑖 = 1,

𝑝𝑖𝑗(𝑧𝑖𝑗|𝜃𝑗, 𝑎𝑖,𝑏𝑖,1, … 𝑏𝑖,𝑚𝑖) =

{

𝑒𝑥𝑝 (𝐷𝑎𝑖(𝜃𝑗 − 𝑏𝑖,1))

1 + 𝑒𝑥𝑝 (𝐷𝑎𝑖(𝜃𝑗 − 𝑏𝑖,1))= 𝑝𝑖𝑗, 𝑖𝑓 𝑧𝑖𝑗 = 1

1

1 + 𝑒𝑥𝑝 (𝐷𝑎𝑖(𝜃𝑗 − 𝑏𝑖,1))= 1 − 𝑝𝑖𝑗, 𝑖𝑓 𝑧𝑖𝑗 = 0

}

;

in the case of items with two or more points,

𝑝𝑖𝑗(𝑧𝑖𝑗|𝜃𝑗 , 𝑎𝑖,𝑏𝑖,1, … 𝑏𝑖,𝑚𝑖) =

{

𝑒𝑥𝑝(

∑ 𝐷𝑎𝑖(𝜃𝑗 −𝑧𝑖𝑗𝑘=1

𝑏𝑖,𝑘))

𝑠𝑖𝑗(𝜃𝑗, 𝑎𝑖,𝑏𝑖,1,…𝑏𝑖,𝑚𝑖)

, 𝑖𝑓 𝑧𝑖𝑗 > 0

1

𝑠𝑖𝑗(𝜃𝑗, 𝑎𝑖,𝑏𝑖,1,…𝑏𝑖,𝑚𝑖), 𝑖𝑓 𝑧𝑖𝑗 = 0

}

,

where 𝑠𝑖𝑗(𝜃𝑗, 𝑎𝑖,𝑏𝑖,1,…𝑏𝑖,𝑚𝑖) = 1 + ∑ exp (∑ 𝐷𝑎𝑖(

𝑙𝑘=1

𝑚𝑖𝑙=1 𝜃𝑗 − 𝑏𝑖,𝑘)), 𝑎𝑛𝑑 𝐷 = 1.7.

Standard Error of Measurement

With MLE, the standard error (SE) for student j is:




𝑆𝐸(𝜃𝑗) = 1

√𝐼(𝜃𝑗) ,

where 𝐼(𝜃𝑗) is the test information for student j, calculated as:

𝐼(𝜃𝑗) = ∑ 𝐷2𝑎𝑖2 (

∑ 𝑙2𝐸𝑥𝑝(∑ 𝐷𝑎𝑖(𝜃𝑗−𝑏𝑖𝑘)𝑙𝑘=1 )

𝑚𝑖𝑙=1

1+∑ 𝐸𝑥𝑝(∑ 𝐷𝑎𝑖(𝜃𝑗−𝑏𝑖𝑘)𝑙𝑘=1 )

𝑚𝑖𝑙=1

− (∑ 𝑙𝐸𝑥𝑝(∑ 𝐷𝑎𝑖(𝜃𝑗−𝑏𝑖𝑘)

𝑙𝑘=1 )

𝑚𝑖𝑙=1

1+∑ 𝐸𝑥𝑝(∑ 𝐷𝑎𝑖(𝜃𝑗−𝑏𝑖𝑘)𝑙𝑘=1 )

𝑚𝑗𝑙=1

)

2

)𝐼𝑖=1 ,

where 𝑚𝑖 is the maximum possible score point (starting from 0) for the ith item, and 𝐷 is the scale factor,

1.7. The SE is calculated based only on the answered item(s) for both complete and incomplete tests. The

upper bound of the SE is set to 2.5 on the 𝜃 metric. Any value larger than 2.5 is truncated at 2.5 on the 𝜃

metric.

The algorithm allows previously answered items to be changed; however, it does not allow items to be

skipped. Item selection requires iteratively updating the estimate of the overall and claim ability estimates

after each item is answered. When a previously answered item is changed, the proficiency estimate is

adjusted to account for the changed responses when the next new item is selected. While the update of the

ability estimates is performed at each iteration, the overall and claim scores are recalculated using all data

at the end of the assessment for the final score.

7.2 RULES FOR TRANSFORMING THETA TO VERTICAL SCALE SCORES

The student’s performance in each subject is summarized in an overall test score referred to as a scale score.

The scale scores represent a linear transformation of the ability estimates (theta scores) using the formula,

𝑆𝑆 = 𝑎 ∗ 𝜃 + 𝑏 . The scaling constants a and b are provided by the Smarter Balanced assessment

consortium. Table 85 presents the scaling constants for each subject for the theta-to-scale score linear

transformation. Scale scores are rounded to an integer.

Table 85. Vertical Scaling Constants on the Reporting Metric

Subject Grade Slope (a) Intercept (b)

ELA/L 3–11 85.8 2508.2

Mathematics 3–11 79.3 2514.9

Standard errors of the MLEs are transformed to be placed onto the reporting scale. This transformation is:

𝑆𝐸𝑠𝑠 = 𝑎 ∗ 𝑆𝐸𝜃,

where 𝑆𝐸𝑠𝑠 is the standard error of the ability estimate on the reporting scale, 𝑆𝐸𝜃 is the standard error of

the ability estimate on the 𝜃 scale, and a is the slope of the scaling constant that transforms 𝜃 to the

reporting scale.

The scale scores are mapped into four achievement levels using three achievement standards (i.e., cut

scores). Table 86 provides three achievement standards for each grade and content area.




Table 86. Cut Scores in Scale Scores (Grades 3–11)


Level 2 Level 3 Level 4 Level 2 Level 3 Level 4

3 2367 2432 2490 2381 2436 2501

4 2416 2473 2533 2411 2485 2549

5 2442 2502 2582 2455 2528 2579

6 2457 2531 2618 2473 2552 2610

7 2479 2552 2649 2484 2567 2635

8 2487 2567 2668 2504 2586 2653

9 2488 2571 2670 2491 2577 2677

10 2491 2577 2677 2529 2614 2697

11 2493 2583 2682 2543 2628 2718

7.3 LOWEST/HIGHEST OBTAINABLE SCORES (LOSS/HOSS)

Although the observed score is measured more precisely in an adaptive test than in a fixed-form test,

especially for high- and low-performing students, if the item pool does not include enough easy or difficult

items to measure low- and high-performing students, the standard error can be large in low and high ends

of the ability range. The Smarter Balanced Assessment Consortium decided to truncate extreme unreliable

student ability estimates. Table 87 presents the lowest obtainable score (LOT or LOSS) and the highest

obtainable score (HOT or HOSS) in both theta and scale score metrics. Estimated thetas lower than LOT

or higher than HOT are truncated to the LOT and HOT values, and are assigned LOSS and HOSS associated

with the LOT and HOT. LOT and HOT were applied to all tests and all scores (total and claim scores). The

standard error for LOT and HOT is computed using the LOT and HOT ability estimates given the

administered items.




Table 87. Extended Lowest- and Highest-Obtainable Scores (Grades 3−11)

Subject Grade Theta Metric Scale Score Metric

LOT HOT LOSS HOSS

ELA/L 3 −5.9110 3.5332 2001 2811

ELA/L 4 −5.5500 4.1826 2032 2867

ELA/L 5 −5.2670 4.7546 2056 2916

ELA/L 6 −5.0000 5.0000 2079 2937

ELA/L 7 −4.9660 5.3119 2082 2964

ELA/L 8 −4.7925 5.6063 2097 2989

ELA/L 9 −4.7305 6.1096 2102 3032

ELA/L 10 −4.7305 6.1096 2102 3032

ELA/L 11 −4.7305 6.1096 2102 3032

Mathematics 3 −5.6030 3.1219 2071 2762

Mathematics 4 −5.3601 4.0264 2090 2834

Mathematics 5 −5.3012 4.7426 2095 2891

Mathematics 6 −5.1942 5.0000 2103 2911

Mathematics 7 −5.1311 5.6630 2108 2964

Mathematics 8 −5.0681 6.0272 2113 2993

Mathematics 9 −5.0000 7.1896 2118 3085

Mathematics 10 −5.0000 7.1896 2118 3085

Mathematics 11 −5.0000 7.1896 2118 3085

7.4 SCORING ALL CORRECT AND ALL INCORRECT CASES

In IRT maximum likelihood (ML) ability estimation methods, zero and perfect scores are assigned the

ability of minus and plus infinity. For all correct and all incorrect cases, the highest obtainable scores (HOT

and HOSS) or the lowest obtainable scores (LOT and LOSS) were assigned in the 2014–2015

administration. Since the 2015–2016 administration, all incorrect and correct cases were scored by either

adding 0.5 to or subtracting 0.5 from an item score with the smallest item discrimination parameter among

the administered operational items (CAT and PT) for a student.

7.5 RULES FOR CALCULATING STRENGTHS AND WEAKNESSES FOR CLAIM SCORES

In ELA/L, claim scores are computed for each claim. In mathematics, claim scores are computed for claim

1, claims 2 and 4 combined, and claim 3. For each claim, three performance categories, relative strengths

and weaknesses, are produced.

If the difference between the proficiency cut score and the claim score is greater (or fewer) than 1.5 times

the standard error of the claim, a plus or minus indicator appears on the student’s score report.

For summative tests, the specific rules are as follows:

• Below Standard (Code = 1): if 𝑟𝑜𝑢𝑛𝑑(𝑆𝑆𝑟𝑐 + 1.5 ∗ 𝑆𝐸(𝑆𝑆𝑟𝑐),0) < 𝑆𝑆𝑝

• At/Near Standard (Code = 2): if 𝑟𝑜𝑢𝑛𝑑(𝑆𝑆𝑟𝑐 + 1.5 ∗ 𝑆𝐸(𝑆𝑆𝑟𝑐),0) ≥ 𝑆𝑆𝑝 and 𝑟𝑜𝑢𝑛𝑑(𝑆𝑆𝑟𝑐 −

1.5 ∗ 𝑆𝐸(𝑆𝑆),0) < 𝑆𝑆𝑝, a strength or weakness is indeterminable

• Above Standard (Code = 3): if 𝑟𝑜𝑢𝑛𝑑(𝑆𝑆𝑟𝑐 − 1.5 ∗ 𝑆𝐸(𝑆𝑆𝑟𝑐),0) ≥ 𝑆𝑆𝑝




where 𝑆𝑆𝑟𝑐 is the student’s scale score on a claim; 𝑆𝑆𝑝 is the proficiency scale score cut (Level 3 cut); and

𝑆𝐸(𝑆𝑆𝑟𝑐) is the standard error of the student’s scale score on the claim.

7.6 TARGET SCORES

The target-level reports cannot be produced for a fixed-form test because the number of items included per

target (i.e., benchmark) is too low to produce a reliable score at the target level. A typical fixed-form test

includes only one or two items per target. Even when aggregated, these data narrowly reflect the benchmark

because they reflect only one or two ways of measuring the target. An adaptive test, however, offers a

tremendous opportunity for target-level data at the class, school, and district area level. With an adequate

item pool, a class of 20 students might respond to 10 or 15 different items measuring any given target.

Target scores are computed for attempted tests based on the responded items. Target scores are computed

in each claim (four claims) for ELA/L and only in claim 1 for mathematics.

Target scores are computed in two ways: (1) target scores relative to a student’s overall estimated ability

(θ), and (2) target scores relative to the proficiency standard (Level 3 cut).

7.6.1 Target Scores Relative to Student’s Overall Estimated Ability

By defining 𝑝𝑖𝑗 = 𝑝(𝑧𝑖𝑗 = 1), indicating the probability that student j responds correctly to item i, 𝑧𝑖𝑗

represents the jth student’s score on the ith item. For items with one score point, we use the 2PL IRT model

to calculate the expected score on item i for student j with estimated ability 𝜃𝑗 as:

𝐸(𝑧𝑖𝑗) =exp (𝐷𝑎𝑖(�̂�𝑗 − 𝑏𝑖))

1 + exp (𝐷𝑎𝑖(�̂�𝑗 − 𝑏𝑖))

For items with two or more score points, using the GPCM model, the expected score for student j with

estimated ability 𝜃𝑗 on an item i with a maximum possible score of mi is calculated as:

𝐸(𝑧𝑖𝑗) =∑𝑙exp(∑ 𝐷𝑎𝑖(�̂�𝑗 − 𝑏𝑖,𝑘)

𝑙𝑘=1 )

1 + ∑ exp(∑ 𝐷𝑎𝑖(�̂�𝑗 − 𝑏𝑖,𝑘)𝑙𝑘=1 )𝑚𝑖

𝑙=1

𝑚𝑖

𝑙=1

For each item i, the residual between observed and expected score for each student is defined as:

𝛿𝑖𝑗 = 𝑧𝑖𝑗 − 𝐸(𝑧𝑖𝑗)

Residuals are summed for items within a target. The sum of residuals is divided by the total number of

points possible for items within the target, T.

𝛿𝑗𝑇 =∑ 𝛿𝑗𝑖𝑖∈𝑇

∑ 𝐾𝑚𝑖𝑖∈𝑇.

For an aggregate unit, a target score is computed by averaging the individual student target scores for the

target, across students of different abilities receiving different items and measuring the same target at

different levels of difficulty,

𝛿̅𝑇𝑔 =1

𝑛𝑔∑ 𝛿𝑗𝑇𝑗∈𝑔 , and 𝑠𝑒(𝛿̅𝑇𝑔) = √

1

𝑛𝑔(𝑛𝑔−1)∑ (𝛿𝑗𝑇 − 𝛿̅𝑇𝑔)

2,𝑗∈𝑔




where 𝑛𝑔 is the number of students who responded to any of the items that belong to the target T for an

aggregate unit g. If a student did not happen to see any items on a particular target, the student is NOT

included in the 𝑛𝑔 count for the aggregate.

A statistically significant difference from zero in these aggregates may indicate that a roster, teacher, school,

or district is more effective (if 𝛿�̅�𝑔is positive) or less effective (negative 𝛿̅𝑇𝑔) in teaching a given target.

In the aggregate, a target performance is reported as a group of students performing better, worse, or as

expected on this target. In some cases, insufficient information will be available and that will be indicated

as well.

For target-level strengths/weakness, report the following:

• If 𝛿̅𝑇𝑔 ≥ +1 ∗ 𝑠𝑒(𝛿�̅�𝑔), then performance is better than on the rest of the test.

• If 𝛿̅𝑇𝑔 ≤ −1 ∗ 𝑠𝑒(𝛿�̅�𝑔), then performance is worse than on the rest of the test.

• Otherwise, performance is similar to performance on the test as a whole.

• If 𝑠𝑒(𝛿�̅�𝑔) > 0.2, data are insufficient.

7.6.2 Target Scores Relative to Proficiency Standard (Level 3 Cut)

By defining 𝑝𝑖𝑗 = 𝑝(𝑧𝑖𝑗 = 1), indicating the probability that student j responds correctly to item i. 𝑧𝑖𝑗

represents the jth student’s score on the ith item. For items with one score point we use the 2PL IRT model

to calculate the expected score on item i for student j with 𝜃𝐿𝑒𝑣𝑒𝑙 3 𝑐𝑢𝑡 as:

𝐸(𝑧𝑖𝑗) =exp(𝐷𝑎𝑖(𝜃𝐿𝑒𝑣𝑒𝑙 3 𝑐𝑢𝑡 − 𝑏𝑖))

1 + exp(𝐷𝑎𝑖(𝜃𝐿𝑒𝑣𝑒𝑙 3 𝑐𝑢𝑡 − 𝑏𝑖))

For items with two or more score points, using the generalized partial credit model, the expected score for

student j with Level 3 cut on an item i with a maximum possible score of mi is calculated as:

𝐸(𝑧𝑖𝑗) =∑𝑙exp(∑ 𝐷𝑎𝑖(𝜃𝐿𝑒𝑣𝑒𝑙 3 𝑐𝑢𝑡 − 𝑏𝑖,𝑘)

𝑙𝑘=1 )

1 + ∑ exp(∑ 𝐷𝑎𝑖(𝜃𝐿𝑒𝑣𝑒𝑙 3 𝑐𝑢𝑡 − 𝑏𝑖,𝑘)𝑙𝑘=1 )𝑚𝑖

𝑙=1

𝑚𝑖

𝑙=1

For each item i, the residual between observed and expected score for each student is defined as:

𝛿𝑖𝑗 = 𝑧𝑖𝑗 − 𝐸(𝑧𝑖𝑗)

Residuals are summed for items within a target. The sum of residuals is divided by the total number of

points possible for items within the target, T.

𝛿𝑗𝑇 =∑ 𝛿𝑗𝑖𝑖∈𝑇

∑ 𝑚𝑖𝑖∈𝑇.

For an aggregate unit, a target score is computed by averaging individual student target scores for the target,

across students of different abilities receiving different items measuring the same target at different levels

of difficulty,




𝛿̅𝑇𝑔 =1

𝑛𝑔∑ 𝛿𝑗𝑇𝑗∈𝑔 , and 𝑠𝑒(𝛿̅𝑇𝑔) = √

1

𝑛𝑔(𝑛𝑔−1)∑ (𝛿𝑗𝑇 − 𝛿̅𝑇𝑔)

2,𝑗∈𝑔

where 𝑛𝑔 is the number of students who responded to any of the items that belong to the target T for an

aggregate unit g. If a student did not happen to see any items on a particular target, the student is NOT

included in the 𝑛𝑔 count for the aggregate.

A statistically significant difference from zero in these aggregates may indicate that a class, teacher, school,

or district is more effective (if 𝛿�̅�𝑔is positive) or less effective (negative 𝛿̅𝑇𝑔) in teaching a given target.

We do not suggest direct reporting of the statistic 𝛿̅𝑇𝑔; instead, we recommend reporting whether, in the

aggregate, a group of students performs better, worse, or as expected on this target. In some cases,

insufficient information will be available and that will be indicated, as well.

For target-level strengths/weakness, we will report the following:

• If 𝛿̅𝑇𝑔 ≥ +1 ∗ 𝑠𝑒(𝛿�̅�𝑔), then performance is above the Proficiency Standard.

• If 𝛿̅𝑇𝑔 ≤ −1 ∗ 𝑠𝑒(𝛿�̅�𝑔), then performance is below the Proficiency Standard.

• Otherwise, performance is near the Proficiency Standard.

• If 𝑠𝑒(𝛿�̅�𝑔) > 0.2, data are insufficient.

7.7 HANDSCORING

AIR provides the automated electronic scoring, and Measurement Incorporated (MI) provides all

handscoring for the Smarter Balanced summative assessments. Short-answer (SA) items and full-write

items in ELA/L and SA items in mathematics are scored by human raters; this is also referred to as

handscoring. The procedures for scoring these items are specified by Smarter Balanced.

Outlined below is the scoring process MI follows. This procedure is used to score responses to all

constructed-response short answer and essay items.

7.7.1 Rater Selection

MI maintains a large pool of raters at each scoring center, as well as distributive raters who work remotely.

MI’s recruiting team first recruits qualified raters who have experience scoring the Smarter Balanced

assessment. Rater accuracy parameters are used to focus recruitment efforts for experienced Smarter

Balanced raters in order to recruit the most objectively accurate raters. Once recruited, experienced raters

are assigned to the content area and grade band(s) in which they are most experienced. These experienced,

demonstrably accurate raters comprise the majority of the total rater pool.

To supplement this core pool, MI contacts other raters in their database who have experience successfully

scoring other large-scale assessments. These raters are assigned to the grade level, subject area, and item

type for which they are most qualified based on their performance on similar projects. Returning staff are

selected based on experience and performance, as well as attendance, punctuality, and cooperation with

work procedures and MI policies. MI maintains evaluations and performance data for all staff who work

on each scoring project in order to determine employment eligibility for future projects. Finally, MI targets

recruitment of new raters for site-based and remote scoring as needed, in order to continue to identify talent




across the country that will best fulfill the handscoring requirements. For new raters, MI’s recruiting team

reviews applications, including prospective raters’ resumes, references, proof of degree, and recognition of

rater requirements, before offering employment.

In selecting team leaders, MI scoring leadership review the files of all returning staff. They look for people

who are experienced team leaders with a record of good performance on previous projects and also consider

raters who have been recommended for promotion to the team leader position.

MI is an equal opportunity employer that actively recruits minority staff. Historically, MI’s temporary staff

on major projects averages about 51% female, 49% male, 76% Caucasian, and 24% minority.

MI requires all handscoring project staff (scoring directors, team leaders, raters, and clerical staff) to sign

a confidentiality/nondisclosure agreement before receiving any training or secure project materials. The

employment agreement indicates that no participant in training and/or scoring may reveal information about

the test, the scoring criteria, or the scoring methods to any person.

7.7.2 Rater Training

All raters hired for Smarter Balanced assessment handscoring task are trained using the rubric(s), anchor

sets, and training/qualifying sets provided by Smarter Balanced. These sets were created during the original

field-test scoring in 2014 and approved by Smarter Balanced. The same anchor sets are used each year.

Additionally, MI conducts an annual review of the rater agreement and scoring materials in order to inform

the development of item-specific, supplemental training materials. Supplemental materials are developed

each summer and implemented in the following operational administration.

Once hired, raters are placed into a scoring group that corresponds to the subject/grade that they are deemed

best suited to score (based on work history, results of the placement assessments, and performance on past

scoring projects). Raters are trained on a specific item type (i.e., brief writes, reading, research, full-writes,

and/or mathematics). Within each group, raters are divided into teams consisting of one team leader and

10–15 raters. Each team leader and rater is assigned a unique number for easy identification of their scoring

work throughout the scoring session. The number of items an individual rater scores is minimized so that

the rater becomes highly experienced in scoring responses to a given set of items.

MI’s Virtual Scoring Center (VSC) includes an online training interface which presents rubrics, scoring

guides, and training/qualifying sets. Raters are trained by a scoring director (in person) or using scripted

videos (online). The same training protocol is followed for both site-based and distributive raters.

After the contracts and nondisclosure forms are signed and the scoring director completes his or her

introductory remarks, training begins. Rater training and team leader training follow the same format. The

scoring director presents the writing or constructed-response task and introduces the scoring guide (anchor

set), then discusses each score point with the entire room. This presentation is followed by practice scoring

on the training/qualifying sets. The scoring director reminds the raters to compare each training/qualifying

set response to anchor responses in the scoring guide to ensure consistency in scoring the training/qualifying

responses.

All scoring personnel log in to MI’s secure Scoring Resource Center (SRC). The SRC includes all online

training modules, functions as the portal to the VSC interface, and serves as the data repository for all

scoring reports that are used for rater monitoring.




After completing the first training set, raters are provided a rationale for the score of each response presented

in the set. Training continues until all training/qualifying sets have been scored and discussed.

Like team leaders, raters must demonstrate their ability to score accurately by attaining the qualifying

agreement percentage established by Smarter Balanced before they may score actual student responses.

Any raters unable to meet the qualifying standards are not permitted to score that item. Raters who reach

the qualifying standard on some items but not others will only score the items on which they have

successfully qualified. All raters understand this stipulation when they are hired.

Training is carefully orchestrated so that raters understand how to apply the rubric in scoring the responses,

how to reference the scoring guide, how to develop the flexibility needed to handle a variety of responses,

and how to retain the consistency needed to accurately score all responses. In addition to completing all of

the initial training and qualifications, significant time is allotted for demonstrations of the VSC handscoring

system, explanations of how to “flag” unusual responses for review by the scoring director, and instructions

about other procedures necessary for the conduct of a smooth project.

Training design varies slightly depending on Smarter Balanced item type:

• Full-writes: Raters train and qualify on baseline sets for each grade and writing purpose (e.g., Grade

3 Narrative, Grade 6 Argumentative, etc.), then take qualifying sets for each item in that grade and

purpose.

• Brief writes, reading, and research: Raters train and qualify on a baseline set within a specific grade

band and target.

• Mathematics: Raters train on baseline items, which qualify the raters for that item as well as any

items associated with it; for items with no associated items, training is for the specific item.

Rater training time varies by grade and content area. Training for brief writes, reading, research, and many

mathematics items can be accomplished in one day, while training for full-writes may take up to five days

to complete. Raters generally work 6.5 hours per day, excluding breaks. Evening shift raters work 3.75

hours, excluding breaks.

Multiple strategies are used to minimize rater bias. First, raters do not have access to any student identifiers.

Unless the students sign their names, write about their home towns, or in some way provide other

identifying information as part of their response, the raters have no knowledge of student characteristics.

Second, all raters are trained using Smarter Balanced-provided materials, which were approved as unbiased

examples of responses at the various score points. Training involves constant comparisons with the rubric

and anchor papers so that raters’ judgments are based solely on the scoring criteria. Finally, following

training, a cycle of diagnosis and feedback is used to identify any issues. Specifically, during scoring, raters

are monitored and any instances of raters making scoring decisions based on anything except the criteria

are discussed. Raters are further monitored, and if any continue to exhibit bias after receiving a reasonable

amount of feedback, they are dismissed.

MI also implements a series of automated score verifications to ensure the accuracy of scores. For example,

MI conducts a blank check which resets scores when a condition code of “blank” is assigned to a response

that has one or more characters in the response string (e.g., a response comprised of spaces or tabs). In this

case, only after three independent raters have assigned a condition code of “blank” to a response that appears

blank but includes characters in the response string is the score recorded. A similar check is run when a

score or condition code other than “blank” is assigned to a response that includes no characters in the




response string. Automatic resetting of double-scored responses when two raters assign non-adjacent

scores, mismatched condition codes, or a combination of a condition code and a numeric score provides an

additional score verification. In addition to automatically resetting and rescoring these responses, the rater

information is captured in a report and reviewed by scoring directors, as one of many tools used to determine

retraining needs.

7.7.3 Rater Statistics

One concern regarding the scoring of any open-response assessment is the reliability and accuracy of the

scoring. MI appreciates and shares this concern and continually develops new and technically sound

methods of monitoring reliability. Reliable scoring starts with detailed scoring rubrics and training materials

and thorough training sessions by experienced trainers. Quality results are achieved through the daily

monitoring of each rater.

In addition to extensive experience in the preparation of training materials and employing management and

staff with unparalleled expertise in the field of handscored educational assessments, MI constantly monitors

the quality of each rater’s work throughout every project. Rater status reports are used to monitor raters’

scoring habits during the Smarter Balanced handscoring project.

MI has developed and operates a comprehensive system for collecting and analyzing scoring data. After

the raters’ scores are submitted into the VSC handscoring system, the data are uploaded into the scoring

data report servers located at MI’s corporate headquarters in Durham, NC.

More than 20 reports are available and can be customized to meet the information needs of the client and

MI’s scoring department. These reports provide the following data:

• Rater ID and team

• Number of responses scored

• Number of responses assigned each score point (1–4 or other)

• Percentage of responses scored that day in exact agreement with a second rater

• Percentage of responses scored that day within one point of agreement with a second rater

• Number and percentage of responses receiving adjacent scores at each line (0/1, 1/2, 2/3, etc.)

• Number and percentage of responses receiving nonadjacent scores at each line

• Number of correctly assigned scores on the validity responses

Updated real-time reports are available that show both daily and cumulative (project-to-date) data. These

reports are available for access by the handscoring project monitors at each MI scoring center via a secure

website, and the handscoring project monitors provide updated reports to the scoring directors several times

per day. MI further utilized dynamic “threshold” reports, which, based on inputted criteria, immediately

identify potential scoring performance issues. These reports allow scoring leadership to pinpoint areas of

concern and to take corrective action with great efficiency. MI scoring directors are experienced in

examining these reports and using the information to determine a need for retraining of individual raters or

the group as a whole. It can easily be determined if a rater is consistently scoring high or low and the

specific score points with which they may be having difficulty. The scoring directors share such information

with the team leaders and direct all retraining efforts.




7.7.4 Rater Monitoring and Retraining

Team leaders spot-check (i.e., read behind) each rater’s scoring to ensure that he or she is on target and

conduct one-on-one retraining sessions addressing any problems found. At the beginning of the project,

team leaders read behind every rater every day; they become more selective about the frequency and number

of read-behinds as raters become more proficient at scoring. The daily rater reliability reports and

validity/calibration results are used to identify raters who need more frequent monitoring.

Retraining is an ongoing process once scoring is underway. Daily analysis of the rater status reports enables

management personnel to identify individual or group retraining needs. If it becomes apparent that a whole

team or group is having difficulty with a particular type of response, large group training sessions are

conducted. Standard retraining procedures include room-wide discussions led by the scoring director, team

discussions conducted by team leaders, and one-on-one discussions with individual raters. It is standard

practice to conduct morning room-wide retraining at MI each day, with a more extensive retraining on

Monday mornings in order to re-anchor the raters after a weekend away from scoring.

Each student response is scored holistically by a trained and qualified rater using the scoring criteria

developed and approved by Smarter Balanced, with a second read conducted on 15% of responses for each

item for reliability purposes. Responses are randomly selected for second reads and scored by raters who

are not aware of the score assigned by the first rater or even that the response has been read before. MI’s

QA/reliability procedures allow the handscoring staff to identify struggling raters very early and begin

retraining at once. While retraining these raters, MI also monitors their scoring intensively to ensure that

all responses are scored accurately. In fact, MI’s monitoring is also used as a retraining method. MI shows

raters responses that the raters have scored incorrectly, explains the correct scores, and has the raters change

the scores.

During scoring, raters occasionally send responses to their leadership for review and/or scoring. These types

of responses most commonly include non-scorable responses such as off-topic or foreign-language

responses that are difficult to score using the available rubrics and reference responses, as well as at-risk

responses that are alerted to the client state for action.

7.7.5 Validity Checks

MI’s VSC scoring system randomly seeds validity responses among operational responses during scoring.

A small set of validity responses is provided by Smarter Balanced for all vendors to use, and these are

supplemented with responses selected and approved by MI scoring management. The “true” scores for these

responses are entered into a validity database. Validity responses are indistinguishable from operational

responses.

MI staff and all clients have access to real-time validity reports that include the response identification

number, the score(s) assigned by the raters, and the “true” scores. A daily and project-to-date summary of

the percentages of correct scores and low/high considerations at each score point is also provided.

Retraining may be conducted with the raters using the validity data as a guide for how to focus the

retraining. Validity results are not used in isolation but as one piece of evidence along with the second read

and read-behind agreement to make decisions about retraining and dismissing raters.

MI has amassed a large, longitudinal dataset of rater performance data from years of Smarter Balanced

handscoring. In spring 2019, we launched an enhanced accuracy monitoring system drawing on these data.

This system used validity responses, calibrated to fit a unidimensional item response theory (IRT) model




for each content area/item type. Calibrating validity responses allows us to prioritize them (using

correlations and fit statistics) such that those responses that provide the greatest information about rater

accuracy are distributed to raters first. MI runs nightly analyses to evaluate performance during scoring.

Empirically-determined cut points are used to classify raters into performance tiers based on recent validity

and inter-rater reliability (IRR). A rater with unacceptable performance initially receives feedback and

additional monitoring in the form of increased read-behinds. If performance does not improve quickly, the

rater is assigned an assessment composed of validity responses, the results of which determine whether the

rater may continue to score.

7.7.6 Rater Dismissal

When read-behinds or daily statistics identify a rater who cannot maintain acceptable agreement rates, the

rater is retrained and monitored by scoring leadership personnel. A rater may be released from the project

if retraining is unsuccessful. In these situations, all items scored by a rater during the timeframe in question

can be identified, reset, and released back into the scoring pool. The aberrant rater’s scores are deleted, and

the responses are redistributed to other qualified raters for rescoring.

7.7.7 Rater Agreement

The inter-rater reliability (IRR) is computed based on scorable responses (numeric scores) scored by two

independent raters only, excluding non-scorable responses (e.g., off-topic, off-purpose, or foreign-language

responses) that are scored by scoring leadership, not by two independent raters. The IRR is computed based

on the raters who scored student responses in Idaho.

In ELA/L, writing essay item responses (full-writes) are scored in three dimensions: convention (0–2

rubric), evidence/elaboration (1–4 rubric), and organization/purpose (1–4 rubric). The short-answer (SA)

items are scored in 0–2. Mathematics SA items are scored using 0–1, 0–2, or 0–3 rubrics.

Tables 88–90 provide a summary of the IRR based on items with a sample size greater than 50. The inter-

rater reliability is presented with average of %exact agreement, minimum and maximum %exact

agreements, combined %exact and %adjacent agreement, and quadratic weighted Kappa (QWK).

Table 88. ELA/L Rater Agreements for Short-Answer Items (Grades 3–11)

Grade # of Items %Exact %(Exact+

Adjacent) QWK

Average Min Max

3 12 77.84 68.67 87.43 100 0.75

4 15 75.86 67.76 83.01 100 0.76

5 16 72.06 62.24 81.94 100 0.75

6 35 73.63 59.04 88.73 100 0.69

7 37 73.44 62.14 87.32 100 0.70

8 38 74.48 60.58 89.44 100 0.72

9–11 77 74.49 57.79 98.18 100 0.74




Table 89. ELA/L Rater Agreements for Full-Write Items (Grades 3–11)

Grade Dimensions # of

Items

%Exact %(Exact+

Adjacent) QWK

Average Min Max Conventions 19 68.80 57.14 75.19 99.44 0.62

3 Evid/Elab 19 72.31 57.39 80.17 99.14 0.68 Org/Purp 19 72.01 58.77 78.63 99.10 0.68 Conventions 22 66.80 56.88 75.73 98.72 0.60




7 Evid/Elab 24 66.94 55.74 76.56 98.99 0.68 Org/Purp 24 67.07 54.10 76.69 98.86 0.68

8

Conventions 25 75.86 67.69 85.71 98.93 0.57

Evid/Elab 25 68.28 57.46 76.30 98.96 0.72

Org/Purp 25 68.18 58.96 75.61 99.12 0.72

Conventions 28 75.48 66.43 82.01 99.13 0.65

9–11 Evid/Elab 28 72.43 63.77 79.14 99.39 0.76

Org/Purp 28 72.38 64.29 80.58 99.52 0.75

Legend: Evid/Elab: Evidence/Elaboration; Org/Purp: Organization/Purpose




Table 90. Mathematics Rater Agreements (Grades 3–11)

Grade Score

Points

# of

Items

%Exact %(Exact+

Adjacent) QWK

Average Min Max

3 1 10 91.23 82.72 97.96 100 0.80

4 1 11 87.52 80.54 96.62 100 0.69

5 1 5 90.90 87.70 97.31 100 0.64

6 1 12 98.22 96.49 100 100 0.90

7 1 8 96.98 93.12 99.28 100 0.78

8 1 15 92.06 84.62 100 100 0.81

9–11 1 16 92.16 84.88 100 100 0.71

3 2 29 89.93 68.62 98.91 100 0.91

4 2 41 88.59 71.81 99.33 100 0.88

5 2 51 88.50 77.01 97.83 100 0.87

6 2 41 89.67 77.92 97.78 100 0.88

7 2 25 89.90 80.63 95.14 100 0.86

8 2 26 90.13 82.11 100 100 0.88

9–11 2 22 91.57 75.88 100 100 0.88

3 3 6 92.61 89.45 95.54 100 0.97

4 3 4 88.74 87.76 89.33 100 0.95

5 3 8 87.26 82.20 97.80 100 0.90

7 3 1 78.61 78.61 78.61 100 0.82

9–11 3 7 87.79 75.81 91.20 100 0.91




8. REPORTING AND INTERPRETING SCORES

The Online Reporting System (ORS) generates a set of online score reports that includes the information

describing student performance for students, parents, educators, and other stakeholders. The online score

reports are produced immediately after students complete tests and the tests are handscored. Because the

score reports on students’ performance are updated each time that students complete tests and they are

handscored, authorized users (e.g., school principals, teachers) can have quickly available information on

students’ performance on the tests and use it to improve student learning. In addition to individual students

‘score reports, the ORS also produces aggregate score reports by class, school, district, and state. The timely

accessibility of aggregate score reports could help users monitor students’ performance in each subject by

grade area, evaluate the effectiveness of instructional strategies, and inform the adoption of strategies to

improve student learning and teaching during the school year. Additionally, the ORS provides participation

data that help monitor student participation rates.

This section contains a description of the types of scores reported in the ORS and a description of the ways

to interpret and use these scores in detail.

8.1 ONLINE REPORTING SYSTEM FOR STUDENTS AND EDUCATORS

8.1.1 Types of Online Score Reports

The ORS is designed to help educators and students answer questions about how well students have

performed on ELA/L and mathematics assessments. The ORS is the online tool that provides educators and

other stakeholders with timely, relevant score reports. The ORS for the Smarter Balanced assessments has

been designed with stakeholders, who are not technical measurement experts, in mind in order to make

score reports easy to read. This is achieved by using simple language so that users can quickly understand

assessment results and make inferences about student achievement. The ORS is also designed to present

student performance in a uniform format. For example, similar colors are used for groups of similar

elements, such as achievement levels, throughout the design. This design strategy allows readers to compare

similar elements and to avoid comparing dissimilar elements.

Once authorized users log in to the ORS and select “Score Reports,” the online score reports are presented

hierarchically. The ORS starts by presenting summaries on student performance by subject and grade at a

selected aggregate level. To view student performance for a specific aggregate unit, users can select the

specific aggregate unit from a drop-down list of aggregate units (e.g., schools within a district, or teachers

within a school). For more detailed student assessment results for a school, a teacher, or a roster, users can

select the subject and grade on the online score reports. Additionally, when authorized state-level users log

in to the ORS and select “State at a Glance,” the ORS generates a summary of student performance data

for a test across the entire state.

Generally, the ORS provides two categories of online score reports: (1) aggregate score reports, and (2)

student score reports. Table 91 summarizes the types of online score reports available at the aggregate level

and the individual student level. Detailed information about the online score reports and instructions on

how to navigate the online score reporting system can be found in the Online Reporting System User Guide,

located via a help button on the ORS.




Table 91. Types of Online Score Reports by Level of Aggregation

Level of

Aggregation Types of Online Score Reports

State

District

School

Teacher

Roster

• Number of students tested and percentage of students with Level 3 or 4 (for overall

students and by subgroup)

• Average scale score and standard error of average scale score (for overall students and

by subgroup)

• Percentage of students at each achievement level on the overall test and by claims (for

overall students and by subgroup)

• Achievement performance level in each target (for overall students)1

• Participation rate (for overall students)2

• On-demand student roster report

Student

• Total scale score and standard error of measurement

• Achievement level on overall and claim scores with achievement-level descriptors

• Average scale scores and standard errors of average scale scores for student’s school,

district, and state

• Student growth in scale score and achievement level over time

• Writing performance descriptors and scores by dimensions

Note: 1: Performance category in each target is provided for all aggregate levels except for state. 2: Participation rate reports are provided at the state, district, and school levels.

Aggregate score reports at a selected aggregate level are provided for overall students and by subgroup.

Users can see student assessment results by any of the subgroups. Table 92 presents the types of subgroup

and subgroup categories provided in ORS.

Table 92. Types of Subgroups

Subgroup Subgroup Category

Gender Female

Male

Special Education Status Yes

No

Limited English Proficiency

(LEP) Status

Yes

No

LEP Category* L1, LE, EW, X1, X2, X3, X4, FL, SO

Section 504 Status Yes

No

Race/Ethnicity

American Indian or Alaska Native

Asian

Black or African American

Hispanic or Latino Ethnicity

Native Hawaiian or Other Pacific Islander

White




8.1.2 Online Reporting System

8.1.2.1 Home Page

When users log onto the ORS and select “Score Reports,” the first page displayed contains summaries of

students’ performance across grades and subjects. State personnel see state summaries, district personnel

see district summaries, school personnel see school summaries, and teachers see summaries of their

students. Using a drop-down menu with a list of aggregate units, users can see a summary of students’

performance for the lower aggregate unit, as well. For example, the state personnel can see a summary of

students’ performance for district as well as state.

The home page summarizes students’ performance including (1) number of students tested, and (2)

percentage proficient. Exhibits 1 and 2 present a sample of home pages at the state level and the district

level, respectively.

Exhibit 1. Home Page: State Level




Exhibit 2. Home Page: District Level

8.1.2.2 Subject Detail Page

More detailed summaries of student performance in each grade on a subject area for a selected aggregate

level are presented when users select a grade within a subject on the home page. On each aggregate report,

the summary report presents the summary results for the selected aggregate unit as well as the summary

results for the state and the aggregate unit above the selected aggregate. For example, if a school is selected

on the subject detail page, the summary results of the state and the district of the school are provided above

the school summary results, as well, so that the school performance can be compared with the above-

aggregate levels.

The subject detail page provides the aggregate summaries on a specific subject area including: (1) number

of students tested, (2) average scale score and standard error associated with the average scale score, (3)

percentage of students at Level 3 or above, and (4) percentage of students in each achievement level. The

summaries are also presented for overall students and by subgroup. Exhibit 3 presents an example of subject

detail pages for ELA/L at the district level when a user selects a subgroup of gender.




Exhibit 3. Subject Detail Page for ELA/L by Gender: District Level

8.1.2.3 Claim Detail Page

The claim detail page provides the aggregate summaries on student performance in each claim for a

particular grade and subject. The aggregate summaries on the claim detail page include: (1) number of

students tested, (2) average scale score and standard error associated with the average scale score, (3)

percentage proficient, and (4) percentage of students in each claim performance category.

As with the subject detail page, the summary report presents the summary results for the selected aggregate

unit as well as the summary results for the state and the aggregate unit above the selected aggregate. Also,

the summaries on claim-level performance can be presented for overall students and by subgroup. Exhibit

4 presents an example of claim detail page for mathematics at the district level when a user selects a

subgroup of LEP status.




Exhibit 4. Claim Detail Page for Mathematics by LEP Status: District Level

8.1.2.4 Target Detail Page

The target detail page provides the aggregate summaries on student performance in each target. The target

detail page provides: (1) average scale scores and standard errors of average scale scores for the selected

aggregate unit and the aggregate unit above the selected aggregate, and (2) strength or weakness indicators

in each target that are computed in two ways (i.e., performance relative to proficiency, performance relative

to the test as a whole). It should be noted that the summaries on target-level student performance are

generated for overall students only. That is, the summaries on target-level student performance are not

generated by subgroup. Exhibits 5–8 present examples of target detail pages for ELA/L and mathematics

at the school and teacher levels.




Exhibit 5. Target Detail Page for ELA/L: School Level




Exhibit 6. Target Detail Page for ELA/L: Teacher Level




Exhibit 7. Target Detail Page for Mathematics: School Level




Exhibit 8. Target Detail Page for Mathematics: Teacher Level




8.1.2.5 Trend Report Page

The trend (i.e., longitudinal) page provides the trend of student performance for an aggregate (e.g., state,

district, and school) over time. The trend report can be set to plot either average scale scores or percentages

of proficient students on the graph for the selected aggregate unit. In addition, the trend report can be plotted

by demographic subgroup. Exhibit 9 presents an example of trend report pages for mathematics at the

district level.

Exhibit 9. Trend Report for Mathematics: District Level




8.1.2.6 Student Detail Page

When a student completes a test and the test is handscored, an online score report appears in the student

detail page in ORS. The student detail page provides individual student performance on the test. In each

subject area, the student detail page provides: (1) scale score and standard error of measurement (SEM),

(2) achievement level for overall test, (3) performance category in each claim, (4) average scale scores for

student’s state, district, and school, and (5) writing performance descriptors in each dimension (ELA/L

only).

Exhibits 10 and 11 present examples of student detail pages for ELA/L and mathematics.

Specifically, at the top of the page, the student’s name, scale score with SEM, and achievement level are

presented. On the left middle section, the student’s performance is described in detail using a barrel chart.

In the barrel chart, the student’s scale score is presented with SEM using a “±” sign. SEM represents the

precision of the scale score, or the range in which the student would likely score if a similar test was

administered multiple times. Further, in the barrel chart, achievement-level descriptors with cut scores at

each achievement level are provided, which defines the content area knowledge, skills, and processes that

test takers at the achievement level are expected to possess. On the right middle section, average scale

scores and standard errors of the average scale scores for state, district, and school are displayed so that the

student achievement can be compared with the above-aggregate levels. It should be noted that the “±” next

to the student’s scale score is the SEM of the scale score, whereas the “±” next to the average scale scores

for aggregate levels represents the standard error of the average scale scores. Under the barrel chart, the

trend of student performance over time is displayed. At the bottom of the page, student performance on

each claim and writing dimension scores (ELA/L only) is displayed alongside with a description of his or

her performance on each claim and on each writing dimension.




Exhibit 10. Student Detail Page for ELA/L




Exhibit 11. Student Detail Page for Mathematics




8.1.2.6 Participation Rate

In addition to online score reports, the ORS provides participation rate reports for the districts and schools

to help monitor student participation rate. Participation data are updated each time a student completes tests

and the test is handscored. Included in the participation table are: (1) the number and percentage of students

who are tested and not tested, and (2) the percentage proficient. Exhibit 12 presents a sample of the

participation rate report at the district level.

Exhibit 12. Participation Rate Report at District Level




8.1.2.7 State-Level Summary

The ORS provides a State at a Glance page for authorized state-level users to track student performance for

a test across the entire state. Users can specify the test and administration year to display in the report.

Exhibit 13 presents a sample of state-level summary for ELA/L.

Exhibit 13. State at a Glance ELA/L




8.2 INTERPRETATION OF REPORTED SCORES

A student’s performance on a test is reported on a scale score, an achievement level for the overall test, and

an achievement level for each claim. Students’ scores and achievement levels are also summarized at the

aggregate levels. The next section provides a description about how to interpret these scores.

8.2.1 Scale Score

A scale score is used to describe how well a student performed on a test and can be interpreted as an estimate

of the students’ knowledge and skills measured. The scale score is the transformed score from a theta score,

which is estimated based on mathematical models. Low scale scores can be interpreted to mean that the

student does not possess sufficient knowledge and skills measured by the test. Conversely, high scale scores

can be interpreted to mean that the student has proficient knowledge and skills measured by the test. Scale

scores can be used to measure student growth across school years. Interpretation of scale scores is more

meaningful when the scale scores are used along with achievement levels and achievement-level

descriptors.

8.2.2 Standard Error of Measurement

A scale score (observed score on any test) is an estimate of the true score. If a student takes a similar test

multiple times, the resulting scale score will vary across administrations, sometimes being a little higher, a

little lower, or the same. The standard error of measurement (SEM) represents the precision of the scale

score, or the range in which the student would likely score if a similar test was administered multiple times.

When interpreting scale scores, it is recommended to consider the range of scale scores incorporating the

SEM of the scale score.

The “±” next to the student’s scale score provides information about the certainty, or confidence, of the

score’s interpretation. The boundaries of the score band are one SEM above and below the student’s

observed scale score, representing a range of score values that is likely to contain the true score. For

example, 2680 ± 10 indicates that if a student was tested again, it is likely that the student would receive a

score between 2670 and 2690. The SEM can be different for the same scale score, depending on how closely

the administered items match the student’s ability.

8.2.3 Achievement Level

Achievement levels are proficiency categories on a test that students fall into based on their scale scores.

For the Smarter Balanced assessments, scale scores are mapped into four achievement levels (i.e., Level 1,

Level 2, Level 3, and Level 4) using three achievement standards (i.e., cut scores). Achievement-level

descriptors are a description of content area knowledge and skills that test takers at each achievement level

are expected to possess. Thus, achievement levels can be interpreted based on achievement-level

descriptors. For the achievement level in ELA/L, for instance, achievement-level descriptors are described

for grade 6 Level 3 as: “The student has met the achievement standard and demonstrates progress toward

mastery of the knowledge and skills in English language arts/literacy needed for likely success in entry-

level credit-bearing college coursework after high school.” Generally, students performing at Levels 3 and

4 on Smarter Balanced tests are considered to be on track to demonstrate progress toward mastery of the

knowledge and skills necessary for college and career readiness.




8.2.4 Performance Category for Claims

Students’ performance on each claim is reported in three categories: (1) Below Standard, (2) At/Near

Standard, and (3) Above Standard. Unlike the achievement level for overall test, student performance on

each of claims is evaluated with respect to the “Meets Standard” achievement standard. For students

performing at either “Below Standard” or “Above Standard,” this can be interpreted to mean that their

performance is clearly below or above the “Meets Standard” cut score for a specific claim. For students

performing at “At/Near Standard,” this can be interpreted to mean that the students’ performance does not

provide enough information to tell whether students reached the “Meets Standard” mark for the specific

claim.

8.2.5 Performance Category for Targets

In addition to the claim level reports, teachers and educators ask for additional reports on student

performance for instructional needs. Target-level reports are produced for the aggregate units only, not for

individual students, because each student is administered with too few items in a target to produce a reliable

score for each target.

Target reports are produced for each target within a claim. AIR reports two types of relative strength and

weakness scores for each target within a claim. The strengths and weaknesses reports are generated for

aggregate units of classroom, school, and district and provide information about how a group of students in

a class, school, or district performed on each target, either relative to their performance on the test as a

whole or relative to the proficiency cut set by Smarter Balanced. Specifically, for target performance

relative to the test as a whole, students’ observed performance on items within the reporting element is

compared with expected performance based on the overall ability estimate. At the aggregate level, when

observed performance within a target is greater than expected performance, then the reporting unit (e.g.,

roster, teacher, school, or district) shows a relative strength in that target. Conversely, when observed

performance within a target is below the level expected based on overall achievement, then the reporting

unit shows a relative weakness in that target.

For target performance relative to proficiency, students’ observed performance on items within the

reporting element is compared with proficiency cut (e.g., Achievement Level 3 cut). At the aggregate level,

when observed performance within a target is greater than the proficiency cut, the reporting unit shows a

relative strength in that target. Conversely, when observed performance within a target is below the

proficiency cut, the reporting unit shows a relative weakness in that target.

The performance on target shows how a group of students performed on each target either relative to their

overall subject performance on a test or relative to proficiency standard. The performance on target is

mapped into three performance categories: (1) better than performance on the test as a whole (higher than

expected) or relative to proficiency standard, (2) similar to performance on the test as a whole or relative to

proficiency standard, and (3) worse than performance on the test as a whole (lower than expected) or relative

to proficiency standard. “Worse than performance on the test as a whole” does not imply a lack of

achievement. Instead, it can be interpreted to mean that student performance on that target was below their

performance across all other targets put together. Although performance categories for targets provide some

evidence to help address students’ strengths and weaknesses, they should not be over-interpreted because

student performance on each target is based on relatively few items, especially for a small group.




8.2.6 Aggregated Score

Students’ scale scores are aggregated at roster, teacher, school, district, and state levels to represent how a

group of students performs on a test. When students’ scale scores are aggregated, the aggregated scale

scores can be interpreted as an estimate of knowledge and skills that a group of students possesses. Given

that student scale scores are estimates, the aggregated scale scores are also estimates and are subject to

measures of uncertainty. In addition to the aggregated scale scores, the percentage of students in each

achievement level for overall and by claim are reported at the aggregate level to represent how well a group

of students performs for overall and by claim.

8.3 APPROPRIATE USES FOR SCORES AND REPORTS

Assessment results can be used to provide information on individual students’ achievement on the test.

Overall, assessment results show what students know and are able to do in certain subject areas. Further,

they give information on whether students are on track to demonstrate the knowledge and skills necessary

for college and their careers. Additionally, assessment results can be used to identify students’ relative

strengths and weaknesses in certain content areas. For example, performance categories for claims can be

used to identify an individual student’s relative strengths and weaknesses among claims within a content

area.

Assessment results on student achievement on the test can be used to help teachers or schools make

decisions on how to support students’ learning. Aggregate score reports for teacher and school level provide

information regarding the strengths and weaknesses of their students and can be utilized to improve teaching

and student learning. For example, a group of students performed very well in overall, but it could be

possible that they would not perform as well in several targets compared to their overall performance. In

this case, teachers or schools can identify strengths and weaknesses of their students through the group

performance by claim and target and promote instruction on specific claim or target areas that the group

performance is below their overall performance. Further, by narrowing down the student performance result

by subgroup, teachers and schools can determine what strategies may need to be implemented to improve

teaching and student learning particularly for students from disadvantaged subgroups. For example,

teachers can see student assessment results by LEP status and observe that LEP students are struggling with

literary response and analysis in reading. Teachers can then provide additional instructions for these

students to enhance their achievement in a specific target in a claim.

In addition, assessment results can be used to compare students’ performance among different students and

among different groups. Teachers can evaluate how their students perform compared with other students in

schools and districts for overall and by claim. Although all students are administered different sets of items

in each computer-adaptive test, scale scores are comparable across students. Furthermore, scale scores can

be used to measure the growth of individual students over time if data are available. In the Smarter Balanced

assessments, the scale scores across grades are on the same scale because the scores are vertically linked

across grades. Therefore, scale scores from one grade can be compared with the next grade (i.e., measuring

the growth).

While assessment results provide valuable information to understand students’ performance, these scores

and reports should be used with caution. It is important to note that scale scores reported are estimates of

true scores and hence do not represent the precise measure for student performance. A student’s scale score

is associated with measurement error and thus users need to consider measurement error when using student

scores to make decisions about student achievement. Moreover, although student scores may be used to




help make important decisions about students’ placement and retention, or teachers’ instructional planning

and implementation, the assessment results should not be used as the only source of information. Given

that assessment results measured by a test provide limited information, other sources on student

achievement such as classroom assessment and teacher evaluation should be considered when making

decisions on student learning. Finally, when student performance is compared across groups, users need to

take into account the group size. The smaller the group size, the larger the measurement error related to

these aggregate data, thus requiring interpretation with more caution.




9. QUALITY CONTROL PROCEDURE

Quality assurance (QA) procedures are enforced through all stages of the Smarter Balanced test

development, administration, and scoring and reporting of results. AIR implements a series of quality

control steps to ensure error-free production of score reports in both online and paper formats. The quality

of the information produced in the test delivery system (TDS) is tested thoroughly before, during, and after

the testing window opens.

9.1 ADAPTIVE TEST CONFIGURATION

For the CAT, a test configuration file is the key file that contains all specifications for the item selection

algorithm and the scoring algorithm, such as the test blueprint specification, slopes and intercepts for theta-

to-scale score transformation, cut scores, and the item information (e.g., answer keys, item attributes, item

parameters, and passage information). The accuracy of the information in the configuration file is checked

and confirmed numerous times independently by multiple staff members before the testing window opens.

To verify the accuracy of the scoring engine, we use simulated test administrations. The simulator generates

a sample of students with an ability distribution that matches that of the population (Smarter Balanced

Assessment Consortium states). The ability of each simulated student is used to generate a sequence of item

response scores consistent with the underlying ability distribution. These simulations provide a rigorous

test of the adaptive algorithm for adaptively administered tests and also provide a check of form

distributions (if administering multiple test forms) and test scores in fixed-form tests.

Simulations are generated using the production item selection and scoring engine to ensure that verification

of the scoring engine is based on a wide range of student response patterns. The results of simulated test

administrations are used to configure and evaluate the adequacy of the item selection algorithm used to

administer the Smarter Balanced summative assessments. The purpose of the simulations is to configure

the adaptive algorithm to optimize item selection to meet blueprint specifications while targeting test

information to student ability as well as check the score accuracy.

After the adaptive test simulations, another set of simulations for the combined tests (CAT component plus

a fixed-form PT component) are performed to check scores. The simulated data are used to check whether

the scoring specifications were applied accurately. The scores in the simulated data file are checked

independently, following the scoring rules specified in the scoring specifications.

9.1.1 Platform Review

AIR’s TDS supports a variety of item layouts. Each item goes through an extensive platform review on

different operating systems like Windows, Linux, and iOS to ensure that the item looks consistent in all of

them. Some of the layouts have the stimulus and item response options/response area displayed side by

side. In each of these layouts, both stimulus and response options have independent scroll bars.

Platform review is a process in which each item is checked to ensure that it is displayed appropriately on

each tested platform. A platform is a combination of a hardware device and an operating system. In recent

years, the number of platforms has proliferated, and platform review now takes place on various platforms

that are significantly different from one another.




Platform review is conducted by a team. The team leader projects the item as it was web approved in the

Item Tracking System (ITS), and team members, each using a different platform, look at the same item to

see that it is rendered as expected.

9.1.2 User Acceptance Testing and Final Review

Before deployment, the testing system and content are deployed to a staging server where they are subject

to user acceptance testing (UAT). UAT of the TDS serves as both a software evaluation and content

approval role. The UAT period provides Idaho with an opportunity to interact with the exact test that the

students will use.

9.2 QUALITY ASSURANCE IN DOCUMENT PROCESSING

The Smarter Balanced assessments are administered primarily online; however, a few students took paper-

pencil assessments. When test documents are scanned, a quality control sample of documents consisting of

10 test cases per document type (normally between 500 and 600 documents) was created so that all possible

responses and all demographic grids are verified, including various typical errors that required editing via

MI’s Data Inspection, Correction, and Entry (DICE) application program. This structured method of testing

provided exact test parameters and a methodical way of determining that the output received from the

scanner(s) was correct. MI staff carefully compared the documents and the data file created from them to

further ensure that results from the scanner, editing process (validation and data correction), and transfer to

the AIR database are correct.

9.3 QUALITY ASSURANCE IN DATA PREPARATION

AIR’s test delivery system (TDS) has a real-time, built-in quality-monitoring component. After a test is

administered to a student, the TDS passes the resulting data to our Quality Assurance (QA) system. QA

conducts a series of data integrity checks, ensuring, for example, that the record for each test contains

information for each item, keys for multiple-choice items, score points in each item, and total number of

field-test items and operational items. QA ensures that the test record contains no data from items that have

been invalidated.

Data pass directly from the Quality Monitoring System (QMS) to the Database of Record (DoR), which

serves as the repository for all test information and from which all test information for reporting is pulled.

The Data Extract Generator (DEG) is the tool that is used to pull data from the DoR for delivery to the

SDE. AIR staff ensure that data in the extract files match the DoR before delivering to the SDE.

9.4 QUALITY ASSURANCE IN HANDSCORING

9.4.1 Double Scoring Rates, Agreement Rates, Validity Sets, and Ongoing Read-Behinds

MI’s scoring process is designed to employ a high level of quality control. All scoring activities are

conducted anonymously; at no time do scorers have access to the students’ demographic information.

MI’s Virtual Scoring Center (VSC) provides the infrastructure for extensive quality control procedures.

Through the VSC platform, project leadership can perform spot-checks (read-behinds) of each scorer to

evaluate scoring performance; provide feedback and respond to questions; deliver retraining and/or

recalibration items on demand and at regularly scheduled intervals; and prevent scorers from scoring live

responses in the event that they require additional monitoring.




Once scoring is underway, quality results are achieved by consistent monitoring of each scorer. The scoring

director and team leaders read behind each scorer’s performance every day to ensure that he or she is on

target, and they conduct one-on-one retraining sessions when necessary. MI’s QA procedures allow scoring

staff to identify struggling scorers very early and begin retraining immediately.

If through read-behinds (or data monitoring) it becomes apparent that a scorer is experiencing difficulties,

he or she is given interactive feedback and mentoring on the responses that have been scored incorrectly,

and that scorer is expected to change the scores. Retraining is an ongoing process throughout the scoring

effort to ensure more accurate scoring. Daily analyses of the scorer status reports alert management

personnel to individual or group retraining needs.

In addition to using validity responses as a qualification threshold, other validity responses are presented

throughout scoring as ongoing quality checks. Validity responses can be pulled from approved existing

anchor or validity responses, but they also may be generated from live scoring and included in the pool

following review and approval by the Smarter Balanced Assessment Consortium. MI periodically

administers validity sets to each of MI’s scorers to monitor the scorer status. VSC is capable of dynamically

embedding calibration responses in scoring sets as individual items or in sets of whatever number of items

is preferred by the state.

With the VSC program, the way in which the student responses are presented prevents scorers from having

any knowledge about which responses are being single or double read, or which responses are validity set

responses.

9.4.2 Handscoring QA Monitoring Reports

MI generates detailed scorer status reports for each scoring project using a comprehensive system for

collecting and analyzing score data. The scores are validated and processed according to the specifications

set out by Smarter Balanced. This allows MI to manage the quality of the scorers and take any corrective

actions immediately. Updated real-time reports are available that show both daily and cumulative (project-

to-date) data. These reports are available to states 24 hours a day via a secure website. Project leadership

reviews these reports regularly. This mechanism allows project leadership to spot-check scores at any time

and offer feedback to ensure that each scorer is on target.

9.4.3 Monitoring by State Department of Education

The SDE also directly observes MI activities, virtually. MI provides virtual access to the training activities

through the online training interface. The SDE monitors the scoring process through the Client Command

Center (CCC), with access to view and run specific reports during the scoring process.

9.4.4 Identifying, Evaluating, and Informing the State on Alert Responses

MI implements a formal process for informing clients when student responses reflect a possibly dangerous

situation for the test taker. We also flag potential security breaches identified during scoring. For possible

dangerous situations, scoring project management and staff employ a set of alert procedures to notify the

client of responses indicating endangerment, abuse, or psychological and/or emotional difficulties.

This process is also used to notify each Consortium state of possible instances of teacher or proctor

interference or student collusion with others. The alert procedure is habitually explained during scorer

training sessions. Within the VSC system, if a scorer identifies a response which may require an alert, he




or she flags or notes that response as a possible alert and transfers the image to the scoring manager. Scoring

management then decides if the response should be forwarded to the client for any necessary action or

follow-up.

9.5 QUALITY ASSURANCE IN TEST SCORING

To monitor the performance of the TDS during the test administration window, AIR statisticians examine

the delivery demands, including the number of tests to be delivered, the length of the window, and the

historic state-specific behaviors to model the likely peak loads. Using data from the load tests, these

calculations indicate the number of each type of server necessary to provide continuous, responsive service,

and AIR contracts for service in excess of this amount. Once deployed, our servers are monitored at the

hardware, operating system, and software platform levels with monitoring software that alerts our engineers

at the first signs that trouble may be ahead. The applications log not only errors and exceptions but also

item response time information for critical database calls. This information enables us to know instantly

whether the system is performing as designed or if it is starting to slow down or experience a problem. In

addition, item response time data are captured for each assessed student, such as data about how long it

takes to load, view, or respond to an item. All of this information is logged, enabling us to automatically

identify schools or districts experiencing unusual slowdowns, often before they even notice.

A series of quality assurance reports can also be generated at any time during the online assessment window,

such as blueprint match rate, item exposure rate, and item statistics, for early detection of any unexpected

issues. Any deviations from the expected outcome are flagged, investigated, and resolved. In addition to

these statistics, a cheating analysis report is produced to flag any unlikely patterns of behavior in a testing

session, as discussed in Section 2.7.

For example, an item statistics analysis report allows psychometricians to ensure that items are performing

as intended and serves as an empirical key check through the operational testing window. The item statistics

analysis report is used to monitor the performance of test items throughout the testing window and serves

as a key check for the early detection of potential problems with item scoring, including incorrect

designation of a keyed response or other scoring errors, as well as potential breaches of test security that

may be indicated by changes in the difficulty of test items. This report generates classical item analysis

indicators of difficulty and discrimination, including proportion correct and biserial/polyserial correlation.

The report is configurable and can be produced so that only items with statistics falling outside a specified

range are flagged for reporting or to generate reports based on all items in the pool.

For the CAT component, other reports such as blueprint match and item exposure reports allow

psychometricians to verify that test administrations conform to the simulation results. The QA reports can

be generated on any desired schedule. Item analysis and blueprint match reports are evaluated frequently at

the opening of the testing window to ensure that test administrations conform to blueprint, and items are

performing as anticipated.

Table 93 presents an overview of the QA reports.




Table 93. Overview of Quality Assurance Reports

QA Reports Purpose Rationale

Item Statistics To confirm whether items work as

expected

Early detection of errors (key errors for selected-response items and

scoring errors for constructed-

response, performance-, or

technology-enhanced items)

Blueprint Match Rates To monitor unexpectedly low

blueprint match rates

Early detection of unexpected

blueprint match issue

Item Exposure Rates

To monitor unlikely high exposure

rates of items or passages or

unusually low item pool usage (high

unused items/passages)

Early detection of any oversight in the

blueprint specification

Cheating Analysis To monitor testing irregularities Early detection of testing irregularities

9.5.1 Score-Report Quality Check

In the 2018–2019 Smarter Balanced summative assessments, only online score reports were produced in

Idaho.

9.5.1.1 Online Report Quality Assurance

Scores for online assessments are assigned by automated systems in real time. For machine-scored portions

of assessments, the machine rubrics are created and reviewed along with the items, then validated and

finalized during rubric validation following field testing. The review process “locks down” the item and

rubric when the item is approved for web display (Web Approval). During operational testing, actual item

responses are compared with expected item responses (given the item response theory [IRT] parameters),

which can detect mis-keyed items, item score distribution, or other scoring problems. Potential issues are

automatically flagged in reports available to our psychometricians.

The handscoring processes include rigorous training, validity and reliability monitoring, and back-reading

to ensure accurate scoring. Handscored items are paired to the machine-scored items by our Test Integration

System (TIS). The integration is based on identifiers that are never separated from their data and are checked

by our quality assurance (QA) system. The integrated scores are sent to our test-scoring system, a mature,

well-tested real-time system that applies client-specific scoring rules and assigns scores from the calibrated

items, including calculating achievement-level indicators, subscale scores, and other features, which then

pass automatically to the reporting system and Database of Record (DoR). The scoring system is tested

extensively before deployment, including hand checks of scored tests and large-scale simulations to ensure

that point estimates and standard errors are correct.

Every test undergoes a series of validation checks. Once the QA system signs off, data are passed to the

DoR, which serves as the centralized location for all student scores and responses, ensuring that there is

only one place where the “official” record is stored. Only after scores have passed the QA checks and are

uploaded to the DoR are they passed to the Online Reporting System (ORS), which is responsible for

presenting individual-level results and calculating and presenting aggregate results. Absolutely no score is

reported in the ORS until it passes all of the QA system’s validation checks. All of the above processes take

milliseconds to complete; within less than a second of handscores being received by AIR and passing QA

validation checks, the composite score is available in the ORS.




9.5.1.2 Paper Report Quality Assurance

Statistical Programming

The family reports contain custom programming and require rigorous quality assurance processes to ensure

their accuracy. All custom programming is guided by detailed and precise specifications in our reporting

specifications document. Upon approval of the specifications, analytic rules are programmed, and each

program is extensively tested on test decks and real data from other programs. The final programs are

reviewed by two senior statisticians and one senior programmer to ensure that they implement agreed-upon

procedures. Custom programming is implemented independently by two statistical programming teams

working from the specifications. Only when the output from both teams matches exactly are the scripts

released for production.

Much of the statistical processing is repeated, and AIR has implemented a structured software development

process to ensure that the repeated tasks are implemented correctly and identically each time. We write

small programs (called macros) that take specified data as input and produce data sets containing derived

variables as output. Approximately 30 such macros reside in our library for the score reports. Each macro

is extensively tested and stored in a central development server. Once a macro is tested and stored, changes

to the macro must be approved by the director of score reporting and the director of psychometrics, as well

as by the project directors for affected projects.

Each change is followed by a complete retesting with the entire collection of scenarios on which the macro

was originally tested. The main statistical program is mostly made up of calls to various macros, including

macros that verify the data and conversion tables and the macros that do the many complicated calculations.

This program is developed and tested using artificial data generated to test both typical and extreme cases.

In addition, the program goes through a rigorous code review by a senior statistician.

Display Programming

The paper report development process uses graphical programming, which takes place in a Xerox-

developed programming language called VIPP and allows virtually infinite control of the visual appearance

of the reports. After designers at AIR create backgrounds, our VIPP programmers write code that indicates

where to place all variable information (data, graphics, and text) on the reports. The VIPP code is tested

using both artificial and real data. AIR’s data generation utilities can read the output layout specifications

and generate artificial data for direct input into the VIPP programs. This allows the testing of these programs

to begin before the statistical programming is complete. In later stages, artificial data are generated

according to the input layout and run through the psychometric process and the score reporting statistical

programs, and the output is formatted as VIPP input. This enables us to test the entire system. Programmed

output goes through multiple stages of review and revision by graphics editors and the score reporting team

to ensure that design elements are accurately reproduced and data are correctly displayed. Once we receive

final data and VIPP programs, the AIR score reporting team reviews proofs that contain actual data based

on our standard quality assurance documentation.

In addition, we compare data independently calculated by AIR psychometricians with data on the reports.

A large sample of reports is reviewed by several AIR staff members to make sure that all data are correctly

placed on reports. This rigorous review typically is conducted over several days and takes place in a secure

location in the AIR building. All reports containing actual data are stored in a locked storage area. Before

printing the reports, AIR provides a live data file and individual student reports with sample districts for




Idaho staff review. AIR will work closely with SDE to resolve questions and correct any problems. The

reports will not be delivered unless SDE approves the sample reports and data file.




REFERENCES

American Educational Research Association, American Psychological Association, & National Council on

Measurement in Education. (2014). Standards for educational and psychological testing.

Washington, DC: American Educational Research Association.

Cohen, J. & Albright, L. (2014). Smarter Balanced adaptive item selection algorithm design report.

Washington, DC: American Institutes for Research.

Drasgow, F., Levine, M.V., & Williams, E.A. (1985). Appropriateness measurement with polychotomous

item response models and standardized indices. British Journal of Mathematical and Statistical

Psychology, 38(1), 67–86.

Guo, F. (2006). Expected Classification Accuracy Using the Latent Distribution. Practical Assessment,

Research & Evaluation, 11(6).

Huynh, H. (1976). On the reliability of decisions in domain-referenced testing, Journal of Educational

Measurement, 13(4), 253–264.

Livingston, S. A., & Lewis, C. (1995). Estimating the consistency and accuracy of classifications based on

test scores. Journal of Educational Measurement, 32(2), 179–197.

Livingston, S. A., & Wingersky, M. S. (1979). Assessing the reliability of tests used to make pass/fail

decisions. Journal of Educational Measurement, 16(4), 247–260.

Snijders, T. A. B. (2001). Asymptotic null distribution of person fit statistics with estimated person

parameter. Psychometrika 66(3), 331–342.

Sotaridona, L. S., Pornel, J. B., & Vallejo, A. (2003). Some applications of item response theory to testing.

The Philippine Statistician 52(1–4), 81–92.

Subkoviak, M. J. (1976). Estimating reliability from a single administration of a criterion-referenced test.

Journal of Educational Measurement, 13(4), 265–276.

U.S. Department of Education (2015). Peer Review of State Assessment Systems: Non-Regulatory Guidance

for States. Washington, D.C.: https://www2.ed.gov/policy/elsec/guid/assessguid15.pdf

https://www2.ed.gov/policy/elsec/guid/assessguid15.pdf




APPENDICES




Appendix A: Summary of the 2018–2019 Interim Assessments

The Interim Comprehensive Assessments (ICA) were fixed-form tests for each grade and subject. Most

students took the ICA once, but some students took it multiple times. Table A-1 presents the number of

students who took the ICA by the number of attempts. Total number of tests indicate the total ICA tests

taken by the total number of students, counting multiple attempts as multiple tests. For example, if a student

took ICA twice, the number of tests for this student is counted twice. Table A-2 summarizes student

performance on ICA for all tests taken, including the average and the standard deviation of scale scores,

the percentage of students in each achievement level, and the percentage of proficient students.

Table A-1. Number of Students Who Took ICAs (Grades 3–8 and 11)

Grade

Number of Students by Number of Attempts Total

Number of

Tests Taken Once Twice Three

Times Four

Times Five

Times

Total

Number of

Students

ELA/L

3 1,190 222 4 0 0 1,416 1,646

4 784 117 1 0 0 902 1,021

5 1,116 79 0 0 0 1,195 1,274

6 719 34 0 0 0 753 787

7 728 40 0 0 0 768 808

8 735 28 0 0 0 763 791

11 348 28 0 0 0 376 404

Mathematics

3 1,034 96 0 0 0 1,130 1,226

4 818 62 0 0 0 880 942

5 919 87 0 0 0 1,006 1,093

6 1,091 50 1 0 0 1,142 1,194

7 1,248 67 0 0 0 1,315 1,382

8 999 85 0 0 0 1,084 1,169

11 525 2 0 0 0 527 529




Table A-2. ICA ELA/L and Mathematics Percentage of Students in Achievement Levels (Grades 3–8 and

11)

Subject Grade

Total

Number of

Tests Taken

Scale

Score

Mean

Scale

Score

SD

%

Level 1

%

Level 2

%

Level 3

%

Level 4

%

Proficient

ELA/L

3 1,646 2393.66 80.64 39 27 21 13 34

4 1,021 2440.85 93.11 39 21 23 17 40

5 1,274 2501.24 88.76 26 23 30 21 51

6 787 2511.59 91.42 27 30 30 13 43

7 808 2528.29 97.55 33 25 31 12 43

8 791 2553.00 92.07 26 29 35 10 45

11 404 2543.21 87.27 28 39 28 6 34

Mathematics

3 1,226 2391.15 74.01 44 27 22 6 28

4 942 2462.31 81.29 25 33 28 14 42

5 1,093 2522.00 88.65 23 30 20 28 47

6 1,194 2499.72 83.87 33 40 19 7 27

7 1,382 2528.61 83.71 29 39 23 10 32

8 1,169 2526.86 92.51 41 33 16 9 26

11 529 2513.96 108.47 59 27 11 3 14

Note: The percentage of each achievement level may not add up to 100% or %Proficient due to rounding.

For the Interim Assessment Block (IABs), there were seven to nine IABs for ELA/L and six to 10 IABs in

mathematics. Students were allowed to take as many IABs as they wanted. Table A–3 show the total number

of students who took the IABs and the number of students by the number of IABs taken. For example, in

grade 3 ELA/L, a total of 14,419 students took the IABs. Among 14,419 students, 5,338 students took one

IAB, 2,997 students took two IABs, and so on.

Tables A–4 to A–7 disaggregate the number of students in Table A–3 by each individual block. For

example, 5,338 students in grade 3 ELA/L took one IAB only. Among 5,338 students, 341 students took

the Brief Writes IAB, 460 students took the Editing IAB, and so on. Tables A–8 to A–11 show the

percentage of students in each performance category for all students for each IAB.




Table A–3. Number of Students Who Took IABs (Grades 3–8)

Grade Total Number of IABs Taken

1 2 3 4 5 6 7 8 9 10

ELA/L

3 14,419 5,338 2,997 2,440 1,604 813 477 405 257 88

4 15,013 5,565 4,017 1,910 1,533 761 415 335 315 162

5 14,775 5,518 3,407 2,498 1,314 703 526 347 287 175

6 15,482 4,546 3,215 3,979 1,952 807 671 211 86 15

7 13,175 4,100 2,251 3,539 1,620 897 355 313 93 7

8 13,146 4,158 3,227 3,744 1,330 527 78 82

11 14,086 5,340 4,249 2,517 765 312 208 223 253 219

Mathematics

3 18,195 7,347 5,115 2,716 1,688 1,188 141

4 18,677 7,550 5,178 3,381 1,481 1,042 45

5 17,945 7,640 5,142 2,340 1,496 1,202 125

6 17,290 5,796 6,210 3,026 1,168 966 124

7 14,982 4,588 4,848 3,653 908 932 53

8 16,812 7,814 4,824 2,540 870 719 45

11 14,499 6,526 4,175 2,157 670 245 149 147 92 141 197




Table A–4: ELA/L Number of Students Who Took IABs by Block Labels (Grades 3–5)

Grade Block Number of IABs Taken

1 2 3 4 5 6 7 8 9

3

Brief Writes 341 345 473 410 253 286 182 194 88

Editing 460 679 758 807 553 354 377 242 88

Language and Vocabulary Use 1,159 1,120 1,357 1,237 713 430 376 251 88

Listening and Interpretation 159 606 797 972 593 366 358 249 88

Reading Informational Text 1,533 1,290 1,417 1,162 525 383 330 240 88

Reading Literary Text 1,293 1,404 1,464 940 563 365 397 256 88

Research 61 129 556 387 234 226 318 216 88

Revision 57 202 142 247 336 226 332 222 88

Performance Task 275 219 356 254 295 226 165 186 88

4

Brief Writes 690 349 340 496 255 231 191 308 162

Editing 442 863 468 663 518 348 313 305 162




Reading Literary Text 1,024 2,596 1,178 959 555 295 306 311 162

Research 51 305 549 421 277 180 250 287 162

Revision 137 76 167 300 364 234 298 284 162


5

Brief Writes 364 566 1,012 584 253 301 287 282 175

Editing 650 812 888 599 488 377 288 262 175

Language and Vocabulary Use 1,504 1,385 1,392 918 599 474 308 282 175

Listening and Interpretation 199 407 1,020 585 446 451 314 273 175


Reading Literary Text 941 1,317 883 893 448 420 300 261 175

Research 199 259 270 318 384 310 240 271 175

Revision 80 194 368 164 272 273 211 183 175





Table A–5: ELA/L Number of Students Who Took IABs by Block Labels (Grades 6–8, 11)


1 2 3 4 5 6 7 8 9

6

Brief Writes 112 92 232 153 87 69 51 10 15

Editing 550 724 1,717 1,125 471 653 206 86 15



Reading Informational Text 642 1,392 2,569 1,407 656 479 199 86 15

Reading Literary Text 906 1,295 2,223 1,377 591 413 184 86 15

Research 367 703 1,156 983 670 591 145 85 15

Revision 143 208 503 166 154 386 206 85 15


7

Brief Writes 296 268 415 338 107 207 250 40 7

Editing 500 613 1,469 926 851 338 302 91 7

Language and Vocabulary Use 652 920 1,662 782 866 316 313 92 7

Listening and Interpretation 181 388 1,326 402 202 108 157 93 7

Reading Informational Text 960 725 1,888 842 577 248 137 56 7

Reading Literary Text 903 499 1,429 1,094 545 308 312 93 7

Research 392 327 868 1,206 546 336 311 93 7

Revision 38 143 514 790 654 214 247 93 7

Performance Task 178 619 1,046 100 137 55 162 93 7

8

Brief Writes 372 474 522 398 497 66 82

Editing and Revising 1,144 1,575 2,174 1,223 523 75 82

Listening and Interpretation 504 646 1,953 370 144 69 82

Reading Informational Text 625 1,149 1,911 1,208 512 78 82

Reading Literary Text 649 1,479 1,329 1,095 431 73 82

Research 269 782 2,239 991 465 78 82

Performance Task 595 349 1,104 35 63 29 82

11

Brief Writes 263 234 106 255 156 137 188 245 219

Editing 1,299 2,188 986 326 286 190 214 251 219

Language and Vocabulary Use 909 1,255 1,152 507 275 194 211 247 219


Reading Informational Text 634 722 923 328 146 130 175 221 219

Reading Literary Text 455 1,859 1,395 587 212 132 182 242 219

Research 372 1,018 1,176 545 111 124 149 208 219

Revision 498 725 1,460 345 238 167 190 243 219





Table A–6: Mathematics Number of Students Who Took IABs by Block Labels (Grades 3–8)


1 2 3 4 5 6

3

Geometry 125 527 828 960 1,150 141

Measurement and Data 557 861 1,107 1,285 1,186 141

Number and Operations in Base Ten 2,258 2,784 1,954 1,475 1,184 141

Number and Operations – Fractions 405 2,086 1,826 1,409 1,180 141

Operational and Algebraic Thinking 3,998 3,967 2,389 1,548 1,187 141

Performance Task 4 5 44 75 53 141

4

Geometry 436 839 986 1,085 1,038 45

Measurement and Data 165 682 721 734 1,035 45


Number and Operations – Fractions 1,888 2,906 2,599 1,366 1,041 45

Operational and Algebraic Thinking 1,386 2,365 2,661 1,221 1,040 45


5

Geometry 441 706 782 933 1,195 125

Measurement and Data 232 628 978 1,071 1,173 125


Number and Operations – Fractions 2,847 4,069 1,901 1,419 1,202 125

Operations and Algebraic Thinking 373 1,367 1,276 1,062 1,184 125


6

Expressions and Equations 961 2,024 2,147 983 966 124

Geometry 246 852 1,167 656 956 124

Number System 2,888 5,045 2,936 1,120 965 124

Ratios and Proportional

Relationships 1,551 3,638 1,983 1,070 965 124

Statistics and Probability 120 756 321 795 953 124


7

Expressions and Equations 799 2,265 2,439 880 932 53

Geometry 177 517 1,020 473 930 53

Number System 2,142 3,830 3,514 779 932 53

Ratios and Proportional

Relationships 1,336 2,600 2,366 823 931 53

Statistics and Probability 83 355 828 578 921 53


8

Expressions and Equations I 2,364 2,805 2,241 835 719 45

Expressions and Equations II 1,354 1,879 1,327 605 717 45

Functions 2,928 2,629 1,659 733 719 45

Geometry 648 722 1,033 556 707 45

Number System 497 986 1,334 732 708 45





Table A–7: Mathematics Number of Students Who Took IABs by Block Labels (Grade 11)


1 2 3 4 5 6 7 8 9 10

11

Algebra – Linear Functions 3,600 3,249 1,498 516 207 137 136 79 125 197

Algebra – Quadratic Functions 561 1,267 1,328 352 175 123 130 83 129 197

Geometry – Congruence 697 686 471 251 118 63 63 86 140 197

GMD 15 159 160 148 129 81 105 87 137 197

GRTR 341 248 637 332 82 95 114 80 141 197

Interpreting Functions 143 1,215 268 249 181 127 139 68 140 197

Number and Quantity 366 689 886 310 129 102 124 76 133 197

SSE 687 418 684 340 119 98 130 87 136 197

Statistics and Probability 49 333 458 154 56 59 67 67 125 197

Performance Task 67 86 81 28 29 9 21 23 63 197

Note: The percentage of each performance category may not add up to 100% due to rounding. GRTR; Geometry – Measurement and Modeling, GMD; Geometry – Measurement and Modeling, SSE; Seeing Structure in Expressions and Polynomial Expressions




Table A–8: ELA/L Percentage of Students in Performance Categories by IAB Block Labels

(Grades 3–5)

Grade Block Number Tested % Below % At/Near % Above

3

Brief Writes 2,572 37 51 12

Editing 4,318 33 50 17

Language and Vocabulary Use 6,731 30 48 21

Listening and Interpretation 4,188 28 53 20

Reading Informational Text 6,968 25 53 22

Reading Literary Text 6,770 33 41 26

Research 2,215 23 44 33

Revision 1,852 35 44 21

Performance Task 2,064 30 54 16

4


Editing 4,082 31 53 16





Research 2,482 33 48 19

Revision 2,022 36 49 15


5


Editing 4,539 23 47 30





Research 2,426 25 45 31

Revision 1,920 31 42 28


Note: The percentage of each performance category may not add up to 100% due to rounding.




Table A–9: ELA/L Percentage of Students in Performance Categories by IAB Block Labels

(Grades 6–8, 11)


6

Brief Writes 821 14 77 8

Editing 5,547 22 63 15





Research 4,715 21 47 32

Revision 1,866 35 53 12


7


Editing 5,097 13 71 16





Research 4,086 15 56 29

Revision 2,700 22 57 21


8


Editing and Revising 6,796 22 58 21




Research 4,906 25 51 24


11


Editing 5,959 26 59 15





Research 3,922 20 54 25

Revision 4,085 33 53 15






Table A–10: Mathematics Percentage of Students in Performance Categories by IAB Block Labels

(Grades 3–5)


3

Geometry 3,731 30 51 20

Measurement and Data 5,137 31 43 26

Number and Operations in Base Ten 9,796 33 39 28

Number and Operations – Fractions 7,047 22 49 30

Operational and Algebraic Thinking 13,230 40 43 17

Performance Task 322 22 49 29

4

Geometry 4,429 11 69 21




Operational and Algebraic Thinking 8,718 40 45 15


5

Geometry 4,182 35 49 17




Operations and Algebraic Thinking 5,387 29 47 24






Table A–11: Mathematics Percentage of Students in Performance Categories by IAB Block Labels

(Grades 6–8, 11)

Grade Block Number

Tested

% Below % At/Near % Above

6

Expressions and Equations 7,205 30 44 25

Geometry 4,001 28 42 29

Number System 13,078 35 45 20

Ratios and Proportional Relationships 9,331 45 34 21

Statistics and Probability 3,069 18 69 12


7

Expressions and Equations 7,368 26 47 27

Geometry 3,170 21 62 17


Ratios and Proportional Relationships 8,109 22 56 22



8

Expressions and Equations I 9,009 33 52 14

Expressions and Equations II 5,927 35 50 14

Functions 8,713 43 40 17

Geometry 3,711 36 48 16



11

Algebra – Linear Functions 9,744 59 34 7

Algebra – Quadratic Functions 4,345 34 57 9

Geometry – Congruence 2,772 14 70 16

Geometry – Measurement and Modeling 1,218 24 72 4

Geometry – Right Triangles and Trigonometric Ratios 2,267 41 37 22

Interpreting Functions 2,727 40 49 11

Number and Quantity 3,012 42 48 10

Seeing Structure in Expressions and Polynomial Expressions 2,896 60 31 8







Appendix B: Student Performance Across Five Years for All Students and by Subgroup

Table B–1. ELA/L Student Performance Across Five Years (Grades 3 and 4)

Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 3

All Students 22,089 48 2425.3 79.6 22,947 49 2427.4 81.8 23,355 47 2421.6 85.6 22,790 50 2427.5 85.9 22,722 50 2427.8 88.0

Female 10,830 53 2434.0 79.1 11,152 53 2435.3 81.0 11,503 51 2430.0 84.5 11,066 54 2436.2 84.3 11,159 54 2435.4 86.3

Male 11,259 44 2416.8 79.2 11,795 46 2420.0 81.9 11,852 43 2413.3 85.8 11,724 46 2419.4 86.6 11,563 47 2420.4 88.9

African American 177 22 2374.2 75.5 229 30 2387.4 79.5 253 27 2381.5 81.5 201 26 2377.2 85.2 249 35 2390.8 90.6

AmerIndian/Alaskan 299 22 2376.6 74.5 270 25 2372.3 76.9 315 23 2373.3 77.9 253 26 2375.5 82.6 292 29 2379.9 87.0

Asian 238 56 2447.8 86.8 269 58 2444.2 85.3 241 60 2446.6 87.3 231 55 2433.5 96.2 260 60 2445.2 89.2

Hispanic 3,993 27 2387.9 72.0 4,231 29 2389.2 76.1 4,132 27 2382.2 77.5 4,203 32 2392.5 80.6 3,855 32 2389.2 81.4

Pacific Islander 41 51 2424.1 78.2 67 42 2414.6 79.6 77 47 2430.7 77.2 88 38 2403.3 87.0 198 53 2429.6 88.1

White 16,734 54 2435.3 78.1 17,229 55 2437.9 79.8 17,711 52 2432.1 84.4 17,158 55 2437.2 84.5 17,868 55 2437.2 86.6

LEP 1,949 17 2368.8 66.1 1,517 13 2354.7 64.8 1,885 16 2358.0 72.3 1,778 17 2362.9 74.0 2,405 27 2379.1 79.4

Special Education 2,041 16 2351.2 79.0 2,009 15 2350.2 76.0 2,123 15 2343.4 81.8 2,438 17 2349.7 82.0 2,309 17 2349.8 84.2

Section 504 Plan 365 37 2403.0 79.6 328 35 2401.6 78.4 349 37.8 2403.0 84.0 396 36 2408.1 77.7 492 43 2412.8 81.2

Grade 4

All Students 22,226 46 2460.8 85.1 22,471 50 2467.6 87.2 23,398 48 2462.8 89.1 23,633 50 2467.4 92.3 23,337 52 2471.3 93.6

Female 10,762 51 2470.7 83.5 10,995 54 2477.0 86.8 11,398 52 2472.0 87.9 11,679 54 2476.1 91.2 11,311 55 2479.8 91.7

Male 11,464 42 2451.5 85.6 11,476 46 2458.6 86.7 12,000 44 2454.0 89.3 11,954 47 2458.9 92.6 12,026 49 2463.4 94.7



Asian 277 55 2480.8 87.4 232 59 2492.8 94.7 299 58 2488.4 96.8 241 62 2496.2 100.7 254 57 2477.4 100.4

Hispanic 4,044 25 2419.2 77.7 3,997 28 2423.8 80.4 4,220 29 2423.8 83.3 4,437 32 2426.8 86.8 3,948 32 2430.0 86.2


White 16,791 52 2472.1 83.2 17,053 55 2479.2 84.8 17,701 53 2473.4 87.1 17,713 56 2478.8 90.0 18,407 57 2481.7 92.2

LEP 2,107 19 2404.1 74.5 1,314 11 2382.4 70.9 1,539 12 2384.7 74.2 1,693 14 2383.4 76.7 2,067 24 2411.9 85.3


Section 504 Plan 439 38 2444.5 79.2 402 37 2437.7 83.2 439 36.2 2438.3 84.8 530 38 2445.9 86.3 690 43 2455.5 88.9





Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 5

All Students 21,863 52 2501.9 85.0 22,584 54 2505.1 89.0 22,899 54 2504.5 92.3 23,726 55 2508.8 93.2 24,315 57 2512.6 93.9

Female 10,632 58 2514.7 82.9 10,946 60 2518.4 86.6 11,224 59 2517.0 90.0 11,577 60 2520.6 90.5 12,005 61 2523.1 91.2

Male 11,231 47 2489.7 85.1 11,638 48 2492.5 89.4 11,675 49 2492.4 93.0 12,149 50 2497.6 94.3 12,310 53 2502.3 95.3



Asian 252 67 2525.8 87.3 289 65 2531.0 95.2 267 63 2529.2 97.7 279 68 2540.5 92.6 268 66 2541.0 102.1

Hispanic 3,842 32 2464.3 79.6 4,070 33 2463.5 82.4 4,038 33 2461.2 86.3 4,450 37 2469.4 89.0 4,186 36 2469.4 86.8


White 16,665 57 2511.6 83.2 17,007 60 2516.4 86.8 17,377 59 2516.3 89.8 17,776 60 2519.6 90.8 19,030 62 2523.2 92.1

LEP 1,823 24 2446.3 77.3 1,230 14 2417.6 75.0 1,350 13 2414.0 78.5 1,357 11 2410.5 76.4 2,140 27 2452.3 86.4


Section 504 Plan 521 43 2485.5 80.9 425 40 2478.8 83.2 480 40.2 2474.8 86.9 613 41 2484.1 89.6 847 50 2501.5 85.9

Grade 6

All Students 21,559 48 2523.6 84.2 22,180 50 2527.3 86.5 23,050 51 2526.7 88.9 23,191 54 2531.7 92.1 24,367 55 2535.9 93.6

Female 10,464 54 2535.7 82.1 10,775 57 2541.0 84.2 11,190 57 2540.7 86.1 11,362 59 2544.9 89.3 11,855 61 2549.4 89.5

Male 11,095 43 2512.1 84.5 11,405 44 2514.4 86.7 11,860 44 2513.5 89.5 11,829 48 2519.1 92.9 12,512 50 2523.1 95.5



Asian 264 64 2545.3 87.0 255 67 2557.7 92.6 322 66 2560.8 86.7 239 69 2566.5 94.4 302 71 2573.3 94.4

Hispanic 3,668 28 2484.2 79.1 3,820 32 2490.9 82.0 4,069 31 2486.2 84.4 4,291 34 2490.0 89.1 4,171 34 2493.1 89.1


White 16,552 53 2533.2 82.5 16,940 55 2536.7 84.5 17,428 56 2537.4 86.5 17,439 59 2543.0 89.1 19,125 60 2546.2 91.1

LEP 1,682 20 2467.0 76.1 914 12 2439.0 76.6 1,339 13 2444.9 78.2 1,117 9 2425.9 76.3 1,880 25 2469.4 90.3


Section 504 Plan 626 35 2503.7 81.6 562 35 2501.4 83.8 554 37.7 2504.4 84.4 690 38 2501.1 88.5 979 46 2516.3 88.3





Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 7

All Students 21,520 51 2547.0 89.2 21,960 53 2551.7 91.0 22,643 54 2551.5 93.1 23,314 54 2551.7 95.1 23,835 58 2561.1 98.0

Female 10,517 57 2562.0 85.9 10,679 60 2566.2 88.0 11,014 60 2566.6 88.8 11,248 61 2568.8 90.0 11,647 64 2577.2 92.8

Male 11,003 44 2532.6 89.8 11,281 46 2537.9 91.7 11,629 47 2537.2 94.7 12,066 48 2535.8 96.9 12,188 52 2545.8 100.2



Asian 274 64 2575.9 96.3 267 66 2574.4 90.8 287 67 2583.1 94.9 307 68 2586.7 93.1 267 66 2583.6 102.1

Hispanic 3,714 31 2506.8 82.6 3,635 33 2511.7 86.7 3,847 36 2512.7 89.0 4,347 34 2510.3 92.5 4,035 37 2513.9 93.9


White 16,464 56 2557.0 87.5 16,951 58 2561.8 88.9 17,332 58 2561.4 90.9 17,411 60 2563.1 92.2 18,779 63 2572.6 95.0

LEP 1,727 23 2487.7 83.8 787 12 2454.2 76.1 1,005 16 2457.7 88.0 1,156 10 2444.9 80.4 1,718 28 2489.9 97.3


Section 504 Plan 664 39 2529.3 84.0 605 38 2522.8 87.1 690 40.3 2527.0 88.8 739 41 2527.5 87.8 1,059 49 2543.2 89.8

Grade 8

All Students 21,483 52 2565.8 88.8 21,763 54 2569.9 90.9 22,259 52 2566.7 92.7 22,707 54 2569.8 94.4 23,897 54 2570.2 97.0

Female 10,563 59 2582.0 85.2 10,646 61 2586.8 86.6 10,771 59 2582.4 89.1 11,069 62 2588.4 89.9 11,558 61 2588.6 92.9

Male 10,920 45 2550.2 89.3 11,117 47 2553.7 92.0 11,488 46 2551.9 93.7 11,638 47 2552.0 95.2 12,339 46 2553.0 97.6



Asian 272 61 2589.0 97.3 286 69 2599.7 94.6 292 63 2590.2 92.3 265 68 2600.3 96.3 348 67 2607.1 97.7

Hispanic 3,602 34 2530.3 83.1 3,650 36 2533.1 83.9 3,677 34 2529.3 88.8 4,056 37 2534.3 89.8 4,052 33 2526.1 91.4


White 16,538 56 2574.7 87.5 16,698 58 2579.2 89.6 17,197 57 2576.3 90.9 17,203 58 2579.1 92.8 18,591 59 2581.0 94.9

LEP 1,595 25 2510.9 81.6 779 12 2469.8 77.9 916 14 2476.7 83.0 807 7 2454.7 73.8 1,757 24 2501.9 91.6


Section 504 Plan 709 42 2548.7 84.6 596 40 2548.0 87.6 732 38.1 2544.0 85.5 742 42 2546.3 86.8 1,115 46 2556.7 88.1




Table B–4. ELA/L Student Performance Across Five Years (Grade 10)

Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 10

All Students 20,217 60 2597.2 100.7 20,770 61 2598.5 103.6 21,412 59 2591.7 105.2 21,512 59 2594.7 106.4 22,305 59 2592.9 109.8

Female 9,930 68 2615.4 94.6 10,057 68 2615.7 97.8 10,581 65 2608.1 100.5 10,578 66 2611.5 100.7 10,892 65 2608.9 103.2

Male 10,287 53 2579.7 103.2 10,713 55 2582.4 106.2 10,831 53 2575.6 107.2 10,934 53 2578.4 109.2 11,413 53 2577.6 113.6



Asian 293 73 2627.4 116.8 313 71 2618.9 126.2 307 69 2613.8 120.4 290 72 2630.3 118.2 310 71 2627.3 101.0

Hispanic 3,276 42 2553.5 93.9 3,403 43 2554.4 96.2 3,488 40 2545.2 101.0 3,732 40 2547.2 100.6 3,460 39 2544.2 104.6


White 15,697 65 2607.5 98.6 16,031 66 2609.4 101.3 16,470 64 2603.7 102.0 16,396 64 2607.0 103.4 17,599 64 2603.6 107.4

LEP 1,313 31 2524.3 95.7 643 12 2471.1 88.2 795 11 2459.5 88.4 702 6 2444.0 85.3 1,259 27 2509.3 107.3


Section 504 Plan 634 54 2579.2 98.3 533 49 2574.8 99.6 599 47.7 2569.9 102.1 737 53 2581.0 104.6 1,027 54 2582.8 105.0

* Suppressed data due to the small sample size, n < 10.





Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 9

All Students 9,568 52 2569.7 96.9 6,040 54 2573.1 101.8 4,970 55 2575.9 101.1 5,507 57 2583.5 99.5 6,495 56 2580.2 104.7

Female 4,596 59 2585.7 93.4 2,942 62 2593.2 96.6 2,485 62 2592.7 96.0 2,704 64 2601.2 92.7 3,164 63 2598.2 97.8

Male 4,972 45 2555.0 97.7 3,098 46 2554.1 103.0 2,485 48 2559.1 103.4 2,803 51 2566.5 102.9 3,331 48 2563.1 108.2



Asian 87 69 2612.0 103.5 54 57 2579.1 132.0 55 55 2579.6 113.0 54 69 2599.5 96.7 52 73 2614.3 102.2

Hispanic 1,660 34 2530.3 90.5 1,027 39 2541.4 94.2 915 33 2530.7 96.9 862 42 2543.4 94.1 1,074 39 2539.9 99.4

Pacific Islander 2* 48 52 2562.0 96.8 64 44 2563.1 94.1 40 50 2579.0 83.9 38 47 2545.6 113.0

White 7,424 56 2578.9 95.3 4,620 58 2582.6 100.3 3,721 61 2589.5 98.3 4,407 61 2593.0 98.4 5,160 60 2590.4 102.8

LEP 664 30 2515.3 89.6 204 13 2463.8 92.1 253 11 2463.0 84.6 259 26 2500.7 98.3 371 25 2500.5 97.6

Special Education 707 7 2447.4 71.4 505 10 2456.3 85.6 386 8 2451.1 81.5 469 10 2461.5 87.0 536 11 2455.6 86.9

Section 504 Plan 282 39 2550.2 96.3 166 43 2552.9 91.5 120 45.8 2567.9 103.0 209 51 2571.5 93.7 195 50 2566.1 109.1

Grade 11

All Students 580 60 2602.3 103.5 1,048 39 2548.7 109.8 401 45 2558.5 115.9 404 30 2523.7 111.7 388 44 2564.0 122.3

Female 287 71 2628.6 92.9 505 44 2564.9 107.7 196 53 2578.1 112.3 170 36 2543.0 113.6 197 47 2576.9 117.2

Male 293 49 2576.6 107.0 543 34 2533.6 109.7 205 37 2539.7 116.5 234 25 2509.7 108.4 191 41 2550.6 126.2

African American 4* 14 14 2463.7 122.7 9* 10 0 2388.3 58.8 4*

AmerIndian/Alaskan 4* 18 33 2497.2 123.9 7* 5* 8*

Asian 5* 15 33 2509.4 170.2 2* 6* 3*

Hispanic 99 35 2546.1 103.1 203 31 2527.3 93.3 75 19 2500.5 87.4 72 15 2491.2 92.4 60 32 2519.5 123.1

Pacific Islander 0* 4* 2* 3* 5*

White 465 66 2614.7 99.8 767 41 2557.5 110.6 297 53 2575.5 118.4 308 35 2536.0 112.8 308 47 2573.3 120.4

LEP 39 26 2529.6 95.2 49 14 2470.9 95.4 19 5 2453.9 73.2 22 9 2442.1 90.6 14 0 2449.9 81.5


Section 504 Plan 18 56 2585.3 101.9 43 40 2539.2 108.6 15 46.7 2566.5 124.6 17 29 2539.1 95.3 14 43 2552.0 101.1





Table B–6. Mathematics Student Performance Across Five Years (Grades 3 and 4)

Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 3

All Students 22,131 50 2431.1 74.1 22,954 52 2435.4 76.0 23,383 50 2433.6 78.4 22,840 52 2436.8 80.5 22,747 53 2438.4 81.7

Female 10,849 48 2429.0 72.0 11,156 51 2433.2 73.7 11,511 48 2430.8 76.0 11,092 51 2433.7 77.5 11,165 51 2434.6 79.5

Male 11,282 51 2433.2 76.1 11,798 53 2437.5 78.0 11,872 52 2436.2 80.7 11,748 53 2439.8 83.1 11,582 55 2442.1 83.6



Asian 243 59 2453.9 84.5 269 64 2461.0 82.2 251 61 2460.6 91.3 241 55 2441.2 89.6 264 67 2470.6 88.2

Hispanic 4,015 27 2394.5 68.7 4,248 31 2400.0 71.6 4,176 31 2399.2 72.7 4,234 33 2400.9 74.7 3,874 31 2398.5 75.4


White 16,740 56 2441.2 71.7 17,208 58 2445.4 73.4 17,675 56 2443.6 76.3 17,156 58 2447.1 78.5 17,856 58 2448.1 79.7

LEP 1,998 20 2379.7 67.4 1,561 18 2372.3 70.4 1,945 19 2375.6 71.7 1,850 20 2373.7 72.8 2,443 28 2391.8 77.1


Section 504 Plan 366 42 2414.7 80.1 326 36 2412.2 75.7 350 38.9 2412.3 76.0 399 38 2415.2 77.8 501 46 2426.1 75.3

Grade 4

All Students 22,260 43 2470.5 74.9 22,479 47 2477.4 77.8 23,429 47 2475.5 79.7 23,668 48 2478.1 81.7 23,356 50 2481.8 81.9

Female 10,767 42 2468.1 72.1 11,001 45 2474.2 75.2 11,409 45 2472.6 76.7 11,699 45 2473.9 77.7 11,320 48 2477.9 77.8

Male 11,493 45 2472.7 77.3 11,478 49 2480.4 80.1 12,020 48 2478.2 82.4 11,969 51 2482.2 85.2 12,036 52 2485.5 85.5



Asian 286 56 2492.1 81.9 238 58 2506.1 97.3 302 58 2502.3 90.0 252 60 2507.9 90.8 261 54 2492.1 89.9

Hispanic 4,061 21 2432.5 66.3 4,005 25 2436.0 71.2 4,255 26 2437.1 73.1 4,467 29 2439.1 75.1 3,963 28 2441.7 75.4


White 16,787 49 2481.1 73.1 17,037 53 2488.6 75.0 17,680 52 2486.4 77.0 17,697 54 2489.4 79.1 18,395 55 2491.9 79.8

LEP 2,155 15 2418.9 65.3 1,351 10 2401.7 66.7 1,599 13 2404.3 71.8 1,750 13 2403.6 69.8 2,118 22 2426.8 77.0


Section 504 Plan 438 35 2460.8 69.6 403 33 2456.3 71.2 440 33.4 2456.5 77.7 532 37 2461.2 76.7 695 42 2469.4 79.2





Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 5

All Students 21,898 38 2499.2 81.8 22,580 40 2502.3 83.9 22,922 42 2504.5 88.1 23,758 43 2506.9 89.7 24,322 45 2510.8 91.4

Female 10,652 36 2497.3 78.5 10,945 38 2500.1 80.2 11,237 40 2501.6 84.8 11,594 41 2504.3 86.3 12,010 42 2507.7 88.0

Male 11,246 39 2501.1 84.8 11,635 42 2504.4 87.2 11,685 43 2507.3 91.1 12,164 45 2509.4 92.8 12,312 47 2513.9 94.5



Asian 263 57 2525.8 90.4 294 53 2532.2 90.7 274 50 2529.1 99.0 285 57 2545.3 95.7 274 61 2548.7 103.1

Hispanic 3,863 19 2459.1 75.8 4,082 19 2459.4 77.5 4,071 21 2459.9 81.4 4,490 23 2464.3 83.6 4,201 25 2466.6 82.8


White 16,658 43 2509.7 79.4 16,979 45 2514.3 81.0 17,343 47 2517.1 85.0 17,754 49 2518.8 86.9 19,003 50 2521.9 89.2

LEP 1,870 14 2443.8 75.2 1,269 7 2420.5 71.2 1,403 8 2420.9 73.8 1,427 7 2418.4 73.1 2,190 20 2452.8 83.8


Section 504 Plan 524 33 2489.1 78.8 424 28 2483.1 78.1 481 31.2 2481.6 82.8 617 31 2484.8 87.0 851 38 2500.9 85.2

Grade 6

All Students 21,577 36 2515.5 93.3 22,180 39 2521.2 97.4 23,066 40 2522.0 99.7 23,217 44 2528.8 102.1 24,368 43 2526.3 102.6

Female 10,472 36 2517.4 89.0 10,785 40 2523.4 92.6 11,189 41 2525.5 93.4 11,368 44 2530.7 97.2 11,854 43 2528.8 97.2

Male 11,105 36 2513.7 97.2 11,395 39 2519.1 101.6 11,877 39 2518.8 105.1 11,849 43 2526.9 106.6 12,514 42 2523.9 107.3



Asian 270 51 2542.0 98.4 257 57 2557.5 108.5 331 58 2563.3 104.1 245 59 2566.1 110.9 305 64 2574.9 107.3

Hispanic 3,683 17 2467.6 89.8 3,837 19 2473.2 93.0 4,130 19 2473.1 93.7 4,321 23 2477.5 99.4 4,189 22 2473.5 99.3


White 16,543 41 2527.6 90.0 16,908 45 2533.7 93.5 17,374 45 2535.5 96.0 17,423 49 2543.1 97.4 19,097 48 2539.2 98.4

LEP 1,711 12 2452.0 87.6 955 7 2418.7 92.9 1,392 9 2433.5 94.2 1,172 6 2412.8 93.5 1,928 15 2450.5 100.2


Section 504 Plan 625 27 2498.6 90.7 557 29 2500.6 98.3 558 28.1 2497.8 96.8 687 30 2499.9 97.2 980 32 2507.9 94.0





Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 7

All Students 21,552 38 2532.5 96.6 21,952 42 2541.0 98.0 22,632 42 2541.2 102.5 23,321 44 2541.8 105.3 23,826 46 2547.2 107.4

Female 10,543 38 2534.5 92.2 10,673 41 2541.9 93.7 11,006 42 2542.6 97.6 11,253 44 2543.9 100.0 11,643 45 2548.4 102.8

Male 11,009 38 2530.5 100.7 11,279 42 2540.1 101.8 11,626 43 2539.8 106.9 12,068 44 2539.7 110.0 12,183 46 2546.2 111.6



Asian 281 53 2570.7 105.2 274 57 2573.4 103.4 294 63 2585.9 112.6 310 62 2597.0 108.8 267 60 2585.1 119.7

Hispanic 3,729 18 2483.9 90.2 3,655 21 2490.7 95.6 3,903 22 2492.2 96.7 4,383 22 2487.9 102.3 4,055 24 2489.3 103.9


White 16,466 43 2544.8 93.7 16,900 47 2553.8 93.5 17,254 47 2553.8 99.1 17,371 50 2556.6 100.3 18,742 51 2561.5 102.6

LEP 1,771 13 2463.8 93.1 846 8 2431.4 94.7 1,040 9 2437.8 98.0 1,220 6 2425.7 93.0 1,763 18 2466.4 107.9


Section 504 Plan 662 28 2515.2 92.9 602 27 2516.5 95.6 691 30.4 2519.0 97.5 737 34 2520.9 95.6 1,061 37 2533.5 98.3

Grade 8

All Students 21,521 37 2546.1 104.7 21,753 38 2551.1 107.8 22,221 39 2551.1 111.3 22,700 42 2557.2 114.1 23,892 41 2555.5 115.6

Female 10,566 37 2550.8 99.4 10,642 39 2555.4 102.1 10,746 39 2555.1 105.0 11,081 43 2562.6 107.9 11,544 42 2561.7 109.0

Male 10,955 36 2541.7 109.4 11,111 37 2546.9 112.8 11,475 38 2547.4 116.7 11,619 40 2552.0 119.5 12,348 39 2549.8 121.1



Asian 281 54 2583.9 121.6 291 54 2598.7 124.6 290 55 2599.4 117.8 272 62 2609.8 128.7 351 61 2614.4 127.3

Hispanic 3,625 17 2495.4 95.7 3,658 18 2501.0 96.3 3,697 20 2500.1 102.5 4,084 23 2506.2 104.4 4,067 19 2496.3 103.6


White 16,534 41 2558.6 102.5 16,668 43 2563.6 105.8 17,130 43 2564.2 108.8 17,155 47 2570.6 111.6 18,560 46 2570.1 112.7

LEP 1,641 13 2479.8 97.2 812 6 2440.0 91.5 964 9 2449.9 97.1 859 4 2427.6 89.8 1,785 15 2475.5 106.8


Section 504 Plan 707 29 2529.6 103.7 596 26 2524.1 106.9 730 25.8 2524.6 103.7 742 30 2528.1 105.3 1,117 32 2537.5 104.0




Table B–9. Mathematics Student Performance Across Five Years (Grade 10)

Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 10

All Students 20,130 30 2554.7 109.1 20,750 31 2556.9 110.6 21,398 32 2559.0 116.1 21,519 33 2562.3 120.1 22,274 33 2561.6 121.6

Female 9,895 31 2557.9 105.3 10,065 31 2558.6 105.6 10,580 32 2561.1 111.0 10,589 33 2563.8 113.8 10,866 32 2562.1 114.7

Male 10,235 29 2551.7 112.5 10,685 31 2555.3 115.1 10,818 31 2556.8 120.9 10,930 33 2560.8 125.8 11,408 34 2561.2 127.8



Asian 295 51 2610.5 133.6 314 51 2605.6 131.0 313 54 2618.3 131.8 292 53 2625.5 144.1 309 51 2619.6 129.6

Hispanic 3,269 13 2503.9 98.6 3,406 12 2504.0 98.4 3,497 14 2506.2 103.4 3,766 15 2506.2 106.4 3,469 14 2500.2 106.5


White 15,619 34 2566.1 107.0 16,013 35 2569.2 108.9 16,448 36 2571.9 114.3 16,362 37 2576.5 117.6 17,686 37 2575.2 119.4

LEP 1,323 10 2482.7 100.8 661 5 2449.4 97.6 809 5 2444.1 98.4 756 3 2428.1 99.5 1,287 10 2473.3 111.8


Section 504 Plan 632 24 2535.5 109.4 531 22 2530.5 103.7 597 24.8 2538.4 112.2 735 24 2536.3 112.7 1,033 27 2546.9 115.3





Group

2014–2015 2015–2016 2016–2017 2017–2018 2018–2019

N % Prof Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD N % Prof

Scale

Score SD

Grade 9

All Students 9,631 28 2533.9 103.0 5,673 29 2536.1 106.8 4,943 30 2537.5 108.7 5,492 31 2542.4 110.8 6,515 32 2538.3 114.5

Female 4,635 27 2537.0 96.7 2,769 30 2541.9 99.5 2,477 29 2542.2 103.2 2,693 31 2548.0 101.8 3,162 32 2545.4 102.8

Male 4,996 28 2531.2 108.5 2,904 28 2530.6 113.1 2,466 30 2532.9 113.8 2,799 32 2537.0 118.7 3,353 32 2531.5 124.2



Asian 94 40 2556.3 118.9 51 29 2545.4 118.8 56 36 2551.2 101.5 54 39 2552.9 111.4 51 41 2575.3 109.7

Hispanic 1,670 12 2492.5 92.1 1,013 14 2494.0 98.6 905 12 2491.2 98.3 867 15 2495.8 104.6 1,107 15 2490.8 104.9

Pacific Islander 2* 50 16 2509.6 93.6 64 30 2523.6 117.1 40 25 2517.5 102.6 41 32 2517.6 131.3

White 7,456 31 2544.2 102.1 4,300 34 2548.9 105.2 3,685 35 2552.1 106.8 4,382 35 2554.2 108.8 5,132 36 2550.6 112.9

LEP 684 8 2473.3 88.9 222 5 2437.6 97.8 266 2 2432.3 90.2 273 9 2457.0 105.4 404 8 2458.2 101.3


Section 504 Plan 282 22 2518.1 104.5 168 22 2516.6 105.0 120 30.0 2541.3 103.7 206 24 2532.0 103.3 196 29 2532.0 119.1

Grade 11

All Students 589 35 2571.8 117.5 1,088 16 2516.3 111.2 459 19 2525.0 115.0 464 15 2510.0 113.8 421 21 2522.3 128.0

Female 295 39 2580.8 112.7 523 15 2523.5 102.1 229 19 2526.3 105.4 204 18 2522.9 113.0 217 19 2516.9 121.1

Male 294 31 2562.7 121.6 565 17 2509.7 118.6 230 18 2523.7 124.0 260 13 2499.9 113.6 204 23 2528.1 135.0

African American 5* 15 7 2448.4 131.3 10 0 2469.0 59.2 6* 4*

AmerIndian/Alaskan 3* 18 11 2466.6 96.9 7* 5* 8*

Asian 6* 16 31 2525.6 155.3 3* 6* 4*

Hispanic 104 9 2495.6 104.9 206 7 2491.0 99.6 81 7 2480.5 95.2 90 4 2469.4 88.5 66 6 2458.8 97.0

Pacific Islander 0* 5* 2* 3* 5*

White 467 41 2589.2 112.8 801 18 2525.6 110.9 348 22 2540.1 116.5 354 17 2521.8 113.7 334 25 2536.9 129.9

LEP 40 0 2470.6 68.3 53 4 2458.2 103.2 17 0 2437.2 64.5 24 4 2479.4 103.2 17 0 2403.7 51.8


Section 504 Plan 16 25 2544.4 104.8 42 14 2505.8 125.2 23 17.4 2522.9 112.4 20 5 2501.7 77.1 15 7 2517.3 91.7




184

American Institutes for Research

Appendix C: Classification Accuracy and Consistency Index by Subgroup

Table C–1. ELA/L Classification Accuracy and Consistency by Achievement Level (Grades 3–5)

Group N %Accuracy %Consistency

All L1 L2 L3 L4 All L1 L2 L3 L4

Grade 3

All Students 22,722 79 89 71 68 87 71 83 60 57 81

Female 11,159 79 89 71 68 88 70 82 60 57 81

Male 11,563 79 90 70 68 86 71 84 60 57 80

African American 249 80 91 69 69 82 73 87 59 57 74

AmerIndian/Alaskan 292 81 92 70 68 83 74 88 58 57 73

Asian 260 79 88 68 68 88 72 85 55 58 83

Hispanic 3,855 79 90 70 67 84 71 85 60 57 74

Pacific Islander 198 78 89 70 69 86 70 83 58 59 79

White 17,868 79 89 71 68 87 70 82 60 57 81

LEP 2,405 80 91 70 68 84 73 86 60 57 72

Special Education 2,309 85 93 69 68 85 79 91 58 56 76

Section 504 Plan 494 78 87 72 67 85 69 83 60 57 77

Grade 4

All Students 23,337 78 90 63 65 87 70 84 51 55 80

Female 11,311 77 89 63 66 87 69 82 52 55 81

Male 12,026 78 90 63 65 86 70 85 51 55 80



Asian 254 80 93 64 66 87 72 86 51 55 84

Hispanic 3,948 78 91 64 65 83 70 86 52 54 73


White 18,407 77 89 63 65 87 69 83 51 55 81

LEP 2,067 81 92 64 65 84 74 88 53 53 73


Section 504 Plan 690 77 89 63 66 86 69 83 52 55 78

Grade 5

All Students 24,315 79 90 67 74 85 71 83 55 66 78

Female 12,005 79 88 67 74 86 70 81 55 66 79

Male 12,310 79 91 67 74 85 71 85 55 66 77



Asian 268 83 89 69 75 90 75 85 56 64 86

Hispanic 4,186 79 90 66 74 81 71 85 56 65 70


White 19,030 79 89 67 74 86 71 82 55 66 79

LEP 2,140 80 91 67 74 81 73 87 57 63 70


Section 504 Plan 848 78 87 68 75 86 70 82 56 66 78

*The classification index is based on n < 10.



185


Table C–2. ELA/L Classification Accuracy and Consistency by Achievement Level (Grades 6–8)



Grade 6

All Students 24,367 79 89 72 76 83 71 81 62 68 75

Female 11,855 79 87 72 76 84 70 79 61 68 76

Male 12,512 79 90 72 76 83 71 83 62 68 74



Asian 302 81 89 71 77 86 73 81 60 68 81

Hispanic 4,171 79 89 72 75 81 71 83 63 66 68


White 19,125 79 88 72 76 84 70 80 62 68 75

LEP 1,880 82 91 72 75 82 74 86 63 65 68


Section 504 Plan 982 79 89 73 77 83 71 81 63 69 71

Grade 7

All Students 23,835 80 90 70 78 84 72 83 59 71 75

Female 11,647 79 89 70 78 84 71 80 59 71 76

Male 12,188 80 91 70 78 83 72 84 59 71 73



Asian 267 80 90 69 79 85 73 84 59 70 78

Hispanic 4,035 80 91 70 77 80 72 85 61 70 65


White 18,779 79 89 70 78 84 71 81 59 71 75

LEP 1,718 82 92 71 77 79 75 88 61 69 63


Section 504 Plan 1,061 79 87 71 78 84 70 80 60 71 71

Grade 8

All Students 23,897 80 88 73 79 83 72 81 63 72 73

Female 11,558 79 86 73 79 84 71 77 63 72 74

Male 12,339 80 89 72 79 81 72 82 63 72 70



Asian 348 81 90 76 78 84 73 81 66 69 79

Hispanic 4,052 80 88 73 79 80 72 83 64 70 67


White 18,591 79 87 73 79 83 71 79 63 72 73

LEP 1,757 82 91 73 78 81 75 86 64 68 69


Section 504 Plan 1,118 79 86 73 79 83 71 78 64 73 68




186


Table C–3. ELA/L Classification Accuracy and Consistency by Achievement Level (Grade 10)



Grade 10

All Students 22,305 80 88 72 76 85 72 81 61 68 78

Female 10,892 79 86 72 76 85 71 78 61 68 78

Male 11,413 80 89 72 76 85 72 83 61 68 78



Asian 310 80 91 71 76 86 72 81 62 67 79

Hispanic 3,460 80 89 72 76 82 71 83 63 68 71


White 17,599 80 87 72 76 85 72 81 60 69 79

LEP 1,259 82 91 72 76 83 75 87 62 67 73


Section 504 Plan 1,030 79 86 73 76 84 71 80 62 69 77




187


Table C–4. ELA/L Classification Accuracy and Consistency by Achievement Level (Grades 9 and 11)



Grade 9

All Students 6,495 79 87 71 76 84 70 80 61 68 76

Female 3,164 78 85 71 76 85 70 76 60 69 77

Male 3,331 79 88 70 76 84 71 82 61 67 75


AmerIndian/Alaskan 107 82 91 71 81 71* 74 86 59 73 59*

Asian 52 82 89* 75* 78 86 74 83* 56* 70 81

Hispanic 1,074 78 88 70 75 81 70 82 61 66 70

Pacific Islander 38 82 95 64* 80 80* 76 91 53* 71 77*

White 5,160 79 87 71 76 85 70 79 61 68 77

LEP 371 81 90 71 74 79 73 86 60 65 67

Special Education 536 85 92 70 75 76* 80 90 59 64 49*

Section 504 Plan 196 79 87 67 78 82 71 81 56 69 76

Grade 11

All Students 388 81 89 72 77 87 73 83 64 67 77

Female 197 80 86 72 77 86 72 80 63 67 78

Male 191 82 91 72 77 87 75 86 64 68 76

African American 4**

AmerIndian/Alaskan 8**

Asian 3**

Hispanic 60 83 90 72 79 95* 77 88 61 72 75*

Pacific Islander 5**

White 308 80 89 72 76 86 73 82 64 66 77

LEP 14 80 83 72* – – 75 84 62* – –

Special Education 66 84 93 63 59* 88* 80 88 59 37* 80*

Section 504 Plan 15 81 83* 84* 76* 83* 72 79* 66* 68* 82*


**Suppressed data due to the small sample size, n < 10.

Cells with “– “ indicate that no student was observed in the performance level.



188


Table C–5. Mathematics Classification Accuracy and Consistency by Achievement Level (Grades 3–5)



Grade 3

All Students 22,747 83 90 74 79 89 76 85 64 72 84

Female 11,165 82 90 74 79 89 75 84 64 72 83

Male 11,582 83 90 74 79 90 76 85 64 72 85



Asian 264 86 88 71 82 93 80 85 60 73 90

Hispanic 3,874 83 91 74 79 85 76 87 64 70 77


White 17,856 83 90 74 79 90 76 83 64 72 84

LEP 2,443 83 92 73 78 86 77 88 64 68 79


Section 504 Plan 501 82 88 73 80 87 75 83 62 72 82

Grade 4

All Students 23,356 83 89 80 78 89 76 82 73 71 83

Female 11,320 83 89 80 78 88 75 81 73 71 81

Male 12,036 84 89 80 79 89 77 83 72 71 84



Asian 261 85 91 80 82 90 79 84 73 73 86

Hispanic 3,963 83 90 80 78 84 77 85 73 69 75


White 18,395 83 88 80 78 89 76 81 72 71 83

LEP 2,118 85 91 79 78 85 78 87 73 68 77


Section 504 Plan 695 83 88 81 79 88 76 82 74 72 82

Grade 5

All Students 24,322 82 89 77 71 90 75 84 68 61 85

Female 12,010 82 89 77 71 89 75 83 68 61 84

Male 12,312 83 90 77 71 90 76 85 67 61 85



Asian 274 85 88 77 67 92 79 85 66 55 90

Hispanic 4,201 83 91 76 71 86 76 87 67 61 77


White 19,003 82 88 77 71 90 75 83 68 61 85

LEP 2,190 85 92 77 71 87 78 89 66 60 78


Section 504 Plan 851 82 88 78 72 91 75 83 69 62 84




189


Table C–6. Mathematics Classification Accuracy and Consistency by Achievement Level (Grades 6–8)



Grade 6

All Students 24,368 82 91 77 72 89 75 85 69 62 83

Female 11,854 81 90 77 72 88 74 85 69 62 82

Male 12,514 83 91 77 72 89 76 86 69 61 84



Asian 305 84 90 79 72 91 77 84 68 63 87

Hispanic 4,189 84 92 78 72 85 78 89 69 60 76


White 19,097 81 90 77 72 89 74 84 69 62 83

LEP 1,928 86 93 77 71 85 80 90 68 60 73


Section 504 Plan 981 82 91 77 72 88 75 85 69 61 80

Grade 7

All Students 23,826 82 91 76 74 89 75 85 67 65 83

Female 11,643 81 90 76 74 88 74 83 67 65 82

Male 12,183 83 91 76 74 89 76 86 67 65 84



Asian 267 84 89 76 73 92 78 85 64 65 89

Hispanic 4,055 84 92 76 75 87 78 89 67 65 77


White 18,742 81 90 76 74 89 74 82 67 65 84

LEP 1,763 86 93 76 74 89 80 91 67 63 76


Section 504 Plan 1,061 81 88 77 74 88 74 81 68 65 81

Grade 8

All Students 23,892 81 90 72 71 90 74 84 63 60 85

Female 11,544 81 89 72 71 90 73 83 63 60 84

Male 12,348 82 91 72 71 91 75 86 62 60 86



Asian 351 84 89 73 71 95 78 82 63 62 91

Hispanic 4,067 83 92 72 71 87 77 88 63 57 79


White 18,560 81 89 72 71 90 73 83 63 61 85

LEP 1,785 86 93 72 70 88 80 90 62 57 78


Section 504 Plan 1,119 81 89 71 72 89 73 84 62 60 83




190


Table C–7. Mathematics Classification Accuracy and Consistency by Achievement Level (Grade 10)



Grade 10

All Students 22,274 81 89 69 75 90 74 84 59 65 84

Female 10,866 80 89 69 75 89 73 83 59 65 82

Male 11,408 82 90 68 75 91 75 85 58 65 86



Asian 309 81 87 70 75 92 74 80 61 66 88

Hispanic 3,469 84 91 68 75 87 78 88 56 64 78


White 17,686 80 89 69 75 90 73 82 59 65 85

LEP 1,287 87 93 68 76 89 82 91 55 61 84


Section 504 Plan 1,036 81 89 69 76 91 74 84 58 65 84




191


Table C–8. Mathematics Classification Accuracy and Consistency by Achievement Level (Grades 9 and

11)



Grade 9

All Students 6,515 79 89 68 70 85 71 84 58 60 74

Female 3,162 77 88 69 71 83 69 82 59 60 70

Male 3,353 80 91 68 70 87 73 86 57 60 77

African American 76 86 94 71 66* 81* 80 90 62 53* 74*

AmerIndian/Alaskan 108 82 91 68 65 83* 75 88 54 55 67*

Asian 51 78 92 65 68 94* 71 85 59 57 82*

Hispanic 1,107 81 91 67 70 83 74 87 57 56 68

Pacific Islander 41 83 93 63* 75* 89* 77 89 54* 67* 78*

White 5,132 78 89 68 70 85 70 83 58 60 75

LEP 404 85 93 66 70 83* 80 90 55 53 61*

Special Education 528 92 95 67 70 88* 88 95 49 58 66*

Section 504 Plan 197 79 88 67 69 83 71 85 56 58 74

Grade 11

All Students 421 87 95 73 77 88 82 92 64 66 83

Female 217 87 95 73 80 84 82 92 65 68 77

Male 204 87 94 73 73 90 82 92 63 62 86

African American 4**

AmerIndian/Alaskan 8**

Asian 4**

Hispanic 66 91 96 72* 67* 93* 88 95 59* 57* 88*

Pacific Islander 5**

White 334 86 94 73 77 88 80 91 65 67 82

LEP 17 99 99 – – – 98 99 – – –

Special Education 69 95 98 72* 75* – 94 98 54* 71* –

Section 504 Plan 15 84 93* 77* 80* – 76 83* 71* 57* –


**Suppressed data due to the small sample size, n < 10.

Cells with “– “ indicate that no student was observed in the performance level.

idaho standards achievement tests in english language arts and mathematics 2018–2019 technical...

Documents