
STATE INNOVATION PRIORITIES for State Testing Programs

Arizona, Colorado, Florida, Georgia, Maryland, Massachusetts, Michigan, Ohio, Pennsylvania, Tennessee and Texas

Developed by

Contributors: John H. Oswald and Theodor Rebarber. Review and input by Richard W. Cross and S. E. Phillips. Editorial assistance by Thomson W. McFarland.



Executive Summary

Over the last decade, testing has become recognized as a key lever in states' efforts to establish accountability for results in public education. The role of accountability in reform will only be enhanced by the new testing and accountability requirements of the No Child Left Behind Act of 2001 (Public Law 107-110). To address the demand for "better, cheaper, and more" testing, states have chosen to collaborate on priorities for innovation, with the benefit of testing industry input and participation. As the product of this collaboration, the Education Leaders Council and AccountabilityWorks have assisted participating states in developing these State Innovation Priorities for State Testing Programs in order to define clearly state expectations for innovative testing practices that should achieve widespread adoption and implementation. This effort included an intensive Testing Summit held on February 20-21, 2002 in Austin, Texas, attended by state education and testing industry leaders, where participants provided input and suggestions regarding this document. (A list of Testing Summit attendees is on pp. 19-24.) The ELC Summit was made possible by a grant from the U.S. Department of Education to the State Education Policy Network (SEPN).

In a market economy, it is expected that industry will respond to clearly articulated needs with innovative products and services. Some needs, however, may best be addressed through innovative cooperation between interested states. This document presents 25 innovations of both kinds in a table divided into two time periods, with these further divided according to whether the innovations represent a "high" priority. The table includes concise descriptions for each innovation of potential benefits as well as major challenges. Following this table, there is an extended discussion organized according to state needs: Test Content, Technical, Logistics, Interpretation, Application, and Business. It is worth noting that a number of the innovations described herein are already being implemented in some form by diverse state agencies, testing companies, and research organizations, though none are widespread and implemented to their full potential. Development work on many of the short- to mid-term innovations, such as the Common Item Bank, needs to be intensive now if they are to be ready for implementation in the short- to mid-term. The fact that there is a degree of knowledge and experience with many of these innovations encourages us to view them as promising, and therefore worthy of inclusion.

The list of innovations, grouped by timeframe and relative importance, is as follows:

TIME PERIOD 1 INNOVATIONS (IMMEDIATE):

High Priorities
1 Computerized or Web-based Reporting
2 Coordinated Diagnostic Tests


3 Collaborative Resource on State Assessment
4 Reliable System for Linking Test Records to Students
5 Computerized Scoring of Open-ended Responses
6 Integrated Test Readiness Programs

Other Priorities
7 Customized Off-the-shelf Tests
8 Randomized Forms
9 Local Scoring Option
10 Integrate Important Benefits of Norm-referenced Tests into State Standards-based Tests
11 Enhanced Quality Control Procedures
12 Distributed Scoring
13 Valid Program Evaluation Models
14 Standard Procedures for Bias Studies
15 Reports of Teacher Effectiveness

TIME PERIOD 2 INNOVATIONS (SHORT-TERM TO MID-TERM):

High Priorities
1 Common Item Bank
2 Computer-administered Tests
3 Coordinated Multiple Measures

Other Priorities
4 Multi-tiered Continuous Assessment Systems
5 Power Objectives
6 Staff Development in Assessment Development and Use for Teachers
7 Consistent Criterion-referenced Interpretive Methodology for Classroom Assessment
8 Teacher Certification in Assessment Knowledge
9 Individually-administered Clinical Tests to Measure State Academic Standards
10 Improved Early Childhood Tests


This electronic document is the full version of the printed, abridged report.


Introduction

Over the last decade, testing has played an increasingly prominent role in the campaign to improve our nation's public schools. Parents, employers, policymakers, and, increasingly, educators have moved steadily toward a greater focus on results in exchange for every education tax dollar, including better results for groups that historically have not achieved at high levels. Testing is the instrument that has allowed this shift to occur. Currently, nearly all the states and territories have instituted some form of assessment program for accountability.

New challenges, however, have emerged with the increased emphasis on testing. Because educational policy is not nationalized and is generally reserved to the states, the states have proceeded independently of each other in developing and implementing many testing programs that are redundant with testing programs in other states. Given the diversity of the nation and the poor track record of previous educational reform initiatives, decentralization of educational policy has provided the benefits of innovation and avoidance of big mistakes, but it has also resulted in strain for the industry and high costs for states. In many cases, the testing industry has struggled to keep up with state expectations regarding the volume of new test development, scoring, and program delivery. These demands will only increase under the requirements for additional testing in the No Child Left Behind Act of 2001 (Public Law 107-110).

The testing industry and the states are dependent upon each other for the success of state assessment programs and educational improvement. It is therefore appropriate for states to voluntarily engage in collaboration on testing innovation priorities with the benefit of industry input and participation. To identify such priorities, the Education Leaders Council and AccountabilityWorks, Inc., have assisted participating states in developing these State Innovation Priorities for State Testing Programs in order to define clearly their expectations for innovative testing practices that should achieve widespread adoption and implementation. This effort included an intensive Testing Summit held on February 20-21, 2002 in Austin, Texas, attended by state education and testing industry leaders, where participants provided input and suggestions regarding this document. (A list of attendees is included at the end of this document.)

This document presents needs for the future of state testing, especially in light of P.L. 107-110. It is designed as a catalyst for discussion between the leaders of the testing industry and the leaders of statewide education systems. In a free enterprise economy, an important market need will result in innovations from the providers of products and services to meet that need. In this case, an established need of the states should result in the testing industry's development of innovative products and services to improve customer satisfaction. Further, some of the testing innovations described here will not result from new or improved products from industry but from enhanced cooperation between states. After a number of years of experimentation with testing, states are at a point where they can intelligently discuss ways to work together that strengthen their ability to implement accountability systems efficiently while also accelerating reform.

This document is organized in two parts. First, a table summarizes the innovations by A) timeframe for implementation and B) whether each innovation is a high priority. The table also succinctly describes key benefits and challenges for each innovation. Second, an extended discussion of the innovations is organized according to state needs: Test Content, Technical, Logistics, Interpretation, Application, and Business. As the needs are presented, the discussion identifies the innovations that could help meet them. Some innovations (e.g., computerized testing) address multiple sets of needs.

This document presents 25 innovations, divided into two time periods. Innovations in time period 1, labeled 1.01 through 1.15, address pressing needs for which the necessary technology, personnel, or other infrastructure are ready for immediate implementation. Innovations in time period 2, while similarly important, are on a longer timetable due to technological, logistical, or statutory obstacles that must be overcome for their widespread implementation. Development efforts on these short- to mid-term innovations, such as the Common Item Bank (2.01), need to be intensive already or to begin immediately if they are to be ready for implementation in the short- to mid-term. Apart from the division of innovations into those that are "high priorities" and those that are "other priorities," no judgment about the relative importance of the innovations should be inferred from the order in which they are listed. It should be noted that the quality and effectiveness of these innovations, particularly those that are to be implemented now, will improve over time. Thus, innovations that are implemented immediately might only produce a modest effect in the short term, but have a substantially greater effect as the innovation is researched and refined.

It is also worth noting that several testing companies, research organizations, and state agencies are creating or have implemented some of the innovative practices that are described herein. The reader should assume that there is a degree of knowledge and experience with many, if not all, of the innovations described. As a matter of fact, some of these practices have been available for years (e.g., randomized forms and customized norm-referenced tests). Those practices are "innovations" for the purpose of this document only in that they are not yet widely applied and can stand enhancement through either technology or research.

Throughout this document, the terms "testing companies" and "industry" apply generically to refer to all providers of test content, printing, scoring, validation and other testing-related services, whether for-profit or not-for-profit, public or private. The term "state" applies to the educational enterprise of the fifty states, territories, and other appropriate jurisdictions (such as the Department of Defense Dependent Schools) and includes state education agencies (SEAs), state boards of education, and other official educational entities.

Innovations Table

Each entry below gives the innovation's number and name, its category, a description, and its key benefits and challenges.

TIME PERIOD 1: IMMEDIATE

HIGH PRIORITY INNOVATIONS

1.01: Computerized or Web-based Reporting
Category: Information Technology
Description: A paperless system of providing real-time accessibility to reports either on a school computer network or on the Internet. Ideally these reports would be interactive, allowing cross-referencing and detailed investigation and explanation of scores for multiple types of audiences.
Benefits:
• Decreases time and cost of paper report handling, assuming some functions are replaced
• Allows interactive reports that are more informative and tailored to the individual needs of the reader
• Allows more rapid reporting
Challenges:
• Requires computers, networks and infrastructure
• Requires system maintenance
• Requires security controls
• Not accessible to all, especially parents of some subgroups

1.02: Coordinated Diagnostic Tests
Category: New Products
Description: Systems of diagnostic, classroom-based assessments that are keyed to the state standards and assessments. These tests allow teachers to do formative evaluations throughout the year to improve student performance.
Benefits:
• Involves teachers more deeply in state assessment system
• Monitors student progress more frequently
• Provides for practice in test-taking skills
• More timely results for diagnosis and remediation
Challenges:
• Require staff development in their appropriate use
• Need to be integrated with teaching time in a way to add and not detract from instruction
• High cost

1.03: Collaborative Resource on State Assessment
Category: Collaboration
Description: An organization serving as a resource to states on a range of challenges related to assessment: a) the legal and psychometric defensibility of assessments; b) timelines for testing program schedules; c) state/vendor contract templates; d) public defense and communication on state testing; e) "safe harbor" model accommodations classification system for special needs students, with options for combining scores; f) similarly, standard procedures for bias studies; g) voluntary system of describing results using standards-based achievement levels; h) other issues as needed.
Benefits:
• Benefit from the diverse past experiences of states
• Cost savings in solving several of the challenges
• All states benefit from top national experts
• Accelerates implementation
• Greater range of experiences allows for faster and better revisions where needed
• Allows for "united front" in states' efforts to publicly defend testing programs
Challenges:
• Some differences in state law and regulations add complexity
• Who pays?

1.04: Reliable System for Linking Test Records to Students
Category: Information Technology
Description: A set of affordable methods to reliably attach test results to students without unreasonable administrative effort, and even when students are mobile.
Benefits:
• Meets requirements of law
• Allows for student tracking and thus better student instruction when information is properly applied
• Allows for more effective program evaluation
• Allows for more effective and fair teacher evaluation
• Lower administrative cost
Challenges:
• High initial investment in developing system
• Requires cooperation of many agencies
• Possible privacy threats must be dealt with
• May be politically or legally difficult in some states
• Pre-gridding adds to time before the test

1.05: Computerized Scoring of Open-ended Responses
Category: Information Technology
Description: A method by which open-ended questions are scored entirely or partially by computer with little or no human scoring.
Benefits:
• Reduces one of the highest cost components of the assessment system: hand scoring
• Reliability and accuracy has been shown to be equal to or better than human readers in some studies
• Extremely sophisticated logic can be applied to many innovative item types
• Allows the cost-effective development of multiple measures
• Can be used alone or in combination with human reading
• Increased speed of scoring if used alone
Challenges:
• Competing systems use different artificial intelligence models, so a model must be chosen first
• Possible credibility problems with the general public and with teachers
• Possible for students to learn techniques to "fool" some of the systems
• Requires either computer-administered tests, or conversion of handwritten responses into computer-readable text (not just images)
• Works better for some types of open-ended questions (e.g., essays) than others (e.g., short answers)

1.06: Integrated Test Readiness Programs
Category: New Products
Description: Programs that simultaneously teach meaningful content around the state academic standards while improving test-taking skills.
Benefits:
• Protects instructional time while still providing test preparation
• Makes scores more valid
• Improves equity
Challenges:
• More costly than simple test prep
• Requires carefully integrated development of tests and test readiness curricula
• Requires staff development to be used properly

OTHER INNOVATIONS

1.07: Customized Off-the-shelf Tests
Category: Content Sources
Description: Publishers' norm-referenced or criterion-referenced tests that are supplemented with sets of items to round out the measurement of state academic standards.
Benefits:
• Less expensive
• Faster to develop
• Provide normative data for measuring gains and other purposes
• High technical quality of core test
Challenges:
• Measurement of some standards (those on shelf test) is preset to the publisher's item types
• Not as unique to the state as a totally custom test
• Would not allow for release of items
• Some potential psychometric problems with norm-referenced scores if publisher's test is changed enough

1.08: Randomized Forms
Category: Assessment Structure
Description: Tests in many alternate forms administered across students so that students in the same class receive different forms. While this is best done in computer administration, it can be accomplished to some degree in paper-and-pencil administrations.
Benefits:
• Cost-efficient way of improving security, depending upon the difference between forms
• Allows efficient tryout of new items
Challenges:
• Psychometric issues with comparability must be addressed


1.09: Local Scoring Option
Category: New Products
Description: A system whereby LEAs can score tests locally for immediate results prior to sending answer media to the test scoring contractor for official statewide scoring and reporting.
Benefits:
• Allows rapid scoring and generation of reports needed for instructional intervention and program selection (e.g., summer school)
Challenges:
• Requires an investment in local scoring capability, including staffing and training
• Would not help turnaround time with items requiring hand scoring
• Could be more difficult with complex programs requiring coordination of scoring of multiple booklets
• Security problems
• Scannable forms sometimes scan differently the second time because of carbon transfer
• Redundant scoring is not cost effective

1.10: Integrate Important Benefits of Norm-referenced Scores into State Standards-based Assessments
Category: New Products
Description: Data providing comparisons of performance, within-state or national, that add to the informative power of state assessments, as well as key design elements such as vertical equating, without replacing or compromising the criterion-referenced interpretation system.
Benefits:
• Allows better understanding by stakeholders
• Allows for value-added assessment
Challenges:
• Requires overcoming some philosophical objections to comparisons
• For national comparisons, requires some elements of standardized testing or NAEP items


1.11: Enhanced Quality Control Procedures
Category: Information Technology
Description: Multiple improvements in quality control in printing and scoring tests, including: automated methods of monitoring quality control in scoring and detecting errors before they affect reporting; application to the testing industry of established industry standards in manufacturing, project flow, software and communications.
Benefits:
• Efficient methods to improve scoring accuracy without expensive and fallible human intervention
• Applications of accepted industry standards, such as ISO 9000 and others
Challenges:
• Not foolproof and does not eliminate the need for careful planning, execution, and state review
• Certification in industry quality control standards is very expensive for testing companies and costs could be passed on to customers

1.12: Distributed Scoring
Category: Information Technology
Description: A method for scoring open-ended responses more efficiently by either (a) capturing the student's written response using imaging technology, or (b) having the student enter responses directly into a computer, and sending the responses to trained scorers "at large" who can work from their homes or offices on their own schedules. The scorers score the items on their personal computers and relay the scores, which are then reconciled with the second scorer and accepted or referred for a resolution scoring. The same quality control procedures are used as for hand scoring in scoring centers.
Benefits:
• Reduces cost of hand scoring by allowing individuals to score from their homes and offices on their own schedule, which often means being able to pay a lower rate in exchange for the convenience
• Allows the use of highly trained and qualified scorers who otherwise would not be willing to do the work, thus improving quality
• Allows a larger number of scorers in a more flexible manner, thus decreasing turnaround time
Challenges:
• Still requires centralized training
• Greater requirement for monitoring scoring drift and agreement since the scorers are not under the visual supervision of management
• Requires both imaging technology and scanning (adds some cost) or computer administration of the test. (Image scanning might not be an added cost if the documents are scanned anyway for other purposes)
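The two-scorer reconciliation step described in this entry can be sketched roughly as follows. This is a minimal illustration only: the adjacency tolerance, the averaging rule, and the function name are illustrative assumptions, not standards taken from this report.

```python
def reconcile(score_a, score_b, max_discrepancy=1):
    """Combine two independent scores for one open-ended response.

    If the scores agree or are within max_discrepancy of each other,
    accept their average; otherwise refer the response for a third,
    resolution scoring. The tolerance and averaging rule here are
    illustrative assumptions, not a rule stated in the report.
    """
    if abs(score_a - score_b) <= max_discrepancy:
        return {"status": "accepted", "score": (score_a + score_b) / 2}
    return {"status": "resolution", "score": None}

# Example: adjacent scores are averaged; a wide gap goes to resolution.
ok = reconcile(3, 4)    # accepted, score 3.5
flag = reconcile(2, 5)  # referred for resolution scoring
```

Monitoring the rate of "resolution" outcomes per scorer is one plausible way to implement the scoring-drift check the Challenges column calls for.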


1.13: Valid Program Evaluation Models
Category: Collaboration
Description: Testing industry should cooperate with researchers working on models for using state assessment programs for program evaluation.
Benefits:
• Makes test results more valuable for determining what works
• Allows for more valid decisions than simple score observations
Challenges:
• Requires careful research
• Some models require more elaborate, expensive programs with more types of data provided

1.14: Standard Procedures for Bias Studies
Category: Collaboration
Description: Agreement on methodologies for bias studies (i.e., which bias analyses should be performed) that are known and funded by law-making bodies.
Benefits:
• Creates a set of model standards, saving states from original work each time
• More legally defensible
• Funding should be easier
Challenges:
• Requires agreement on which bias analyses should be performed
• Requires sign-off by authorities
• States would be choosing to follow established procedures rather than inventing their own

1.15: Reports of Teacher Effectiveness
Category: New Products
Description: Valid reports of student gains while under the instruction of an individual teacher, with spurious data eliminated. Reports would aggregate data from multiple groups of students taught by the same teacher.
Benefits:
• Provides fairer measure of teacher performance than current simplistic models
• Helps teachers improve their instruction
• Helps administrators work with teaching staff to improve student learning
Challenges:
• Requires good longitudinal tracking and collection of demographic information
• Models need further research
• Impacts early planning of entire assessment system so that the proper data is obtainable
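As a simplified illustration of the aggregation this entry describes (and only the aggregation: the statistical controls, longitudinal linking, and outlier screening a real value-added model requires are omitted, and the function name and data shape are invented), per-student gains can be pooled across the multiple class groups one teacher taught:

```python
from statistics import mean

def teacher_mean_gain(groups):
    """Pool pre/post score gains across all student groups taught by
    one teacher and return the mean gain per student.

    `groups` is a list of class groups, each a list of (pre, post)
    score pairs. Hypothetical sketch: it omits the demographic
    controls and spurious-data screening the report calls for.
    """
    gains = [post - pre for group in groups for pre, post in group]
    return mean(gains) if gains else None

# Example: two class groups taught by the same teacher.
g = teacher_mean_gain([[(410, 430), (400, 405)],
                       [(390, 420), (450, 445)]])  # mean of 20, 5, 30, -5
```

Even this toy version shows why the entry's first challenge matters: the computation is only meaningful if each (pre, post) pair is reliably linked to the same student across years, which is exactly what innovation 1.04 provides.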


TIME PERIOD 2: SHORT-TERM OR MID-TERM


HIGH PRIORITY INNOVATIONS

2.01: Common Item Bank
Category: Content Sources
Description: Sets of calibrated and researched items and test elements tied to a massive set that is the combination of participating state academic standards. Work would begin immediately, with the first products available in three or four years. The items and test elements would be kept in a computerized content management system for efficient and secure access by qualified individuals who could assemble tests as needed.
Benefits:
• More cost-efficient, whether the items are released annually or kept secure and reused
• Less wasteful than developing all new items in each state to measure, substantially, the same objectives
• Higher technical quality of tests by using items with more research
Challenges:
• Requires some entity to monitor access and manage the system
• Effectiveness depends upon the degree to which the same items can be used for multiple states with different standards; increased cooperation at earlier stages, such as development of standards, would increase efficiency and savings

2.02: Computer-administered Tests
Category: Information Technology
Description: State assessments administered to every student on computers without the need for printed tests.
Benefits:
• Lower cost (once initial investment is made)
• Allows many other innovations
• Increases speed of development, scoring and reporting
• Allows for new item types impossible with paper and pencil
Challenges:
• Initial cost is considerable (more computers, networks, development and training)
• Requires technological maintenance in states, LEAs and schools
• Some believe that, like other technology innovations in the classroom, many of the potential benefits will never be realized
• Fairness to minority students who lack equal access to computers at home or during other school hours might create opportunity-to-learn challenges


2.03: Coordinated Multiple Measures
Category: New Products
Description: Measures of academic standards using more than one measurement methodology, e.g., performance assessments or technology assessments, but coordinated with the primary assessment for fairness and comparability. These could include non-assessment data, like grade point average.
Benefits:
• Provides coverage of some standards that are hard to assess
• Meets a requirement of P.L. 107-110
Challenges:
• High cost
• Requires research for fairness


OTHER INNOVATIONS

2.04: Multi-tiered Continuous Assessment Systems
Category: Assessment Structure
Description: Integrated sets of assessments consisting of a standardized, statewide, every-student testing program for high stakes and mid-stakes and/or low-stakes assessments of the same or related standards that are administered by classroom teachers in informal programs. The results are integrated to create a comprehensive picture of student achievement and reported in both integrated and separated forms. The scoring and recording of student performance would be integrated throughout the year into a database.
Benefits:
• Increases content coverage without increasing test size, potentially providing more useful diagnostic information
• Involves teachers more deeply in state assessment system
• Monitors student progress more frequently
• Provides for practice in test-taking skills
• More timely results for diagnosis and remediation
Challenges:
• Not all standards would be covered on end-of-year statewide test, though all major categories of standards would be covered
• Students who are substantially below grade level, or substantially above, may not be studying content in the grade-level standards
• Less consistency in assessment of standards that are measured under non-standardized conditions
• Requires extensive staff development
• To combine scores across assessment tiers, requires complex data integration for which some schools do not have the technology
• Requires either computer administration or complex performance maintenance systems
• Conditions for administration are less standardized
• Tests are less secure for high stakes


2.05: Power Objectives
Category: Assessment Structure
Description: Higher-order objectives that subsume lower-order or enabling objectives. In order for a student to be proficient in a power objective, he or she must be proficient in all of the subordinate objectives.
Benefits:
• Decreases test size by measuring only higher-order objectives
• Supports multi-tiered assessment systems
• Maintains highest standards for statewide assessment
Challenges:
• Requires supplementary classroom assessment system, which adds cost
• Requires identification of power objectives
• Requires complex data integration for which some schools do not have the technology

2.06: Staff Development in Assessment Development and Use for Teachers, Especially Online

Category: New Products

Benefits:
• Better-trained teachers will use the assessments effectively for student learning
• E-learning provides flexibility of time and place
• Lower cost than traditional methods

Challenges:
• High initial cost of development
• Teachers need access to computers and the Internet for the online component
• Some teachers (like some students) would not learn well online

Description: Exemplary programs that train teachers to develop, use, and understand assessments using e-learning techniques, either alone or in concert with instructor-led training.

2.07: Consistent Criterion-referenced Interpretive Methodology for Classroom Assessment

Category: New Products

Benefits:
• Allows better understanding by stakeholders
• Provides consistency between classroom and state assessment interpretation

Challenges:
• Requires agreement among professionals on a system

Description: A uniform, understandable way of grading and reporting on informal assessments that is coordinated with the state assessment system.

2.08: Teacher Certification in Assessment Knowledge

Category: New Products

Benefits:
• Teachers would know what they know and do not know
• Administrators could plan for staff development more effectively
• Portability of certification would help teachers if they move

Challenges:
• Teachers' unions might object to certification requirements if they are imposed on current teachers (rather than limited to new teachers); limiting to new teachers would reduce impact

Description: A portable certification of teacher competence in the development, use and interpretation of assessments.


2.09: Individually-administered Clinical Tests to Measure State Academic Standards

Category: New Products

Benefits:
• Allows makeup tests and more valid tests of special populations
• Results can be integrated with the state assessment program
• Clinically more appropriate than administering the group assessment to a single individual

Challenges:
• Individual administration is costly
• Requires highly trained administrators and pull-out from the classroom
• Possible measurement issues with comparability of scores between group- and individually administered tests, even of the same content

Description: Clinical instruments coordinated with the state assessment program objectives and information system that can be administered to students who are unable to take the standardized group-administered state assessment.

2.10: Improved Early Childhood Tests

Category: New Products

Benefits:
• Allow measurement of important skills in the early grades that inform instruction and program evaluation before the compulsory testing at grades 3-8
• Better prepare young students for testing at the required grades

Challenges:
• Controversial area of testing
• Fear that forcing young children into standardized methods is inappropriate

Description: Tests designed for administration in Pre-Kindergarten through Grade 2 that provide valid and reliable measurement of emerging skills and readiness without the disadvantages that many current forms of standardized tests have at those grades.


DISCUSSION

Discussion of Innovations According to State Needs

TYPES OF NEEDS

The needs of the states fall into six general areas: test content, test technical characteristics, logistics of testing programs, interpretation and dissemination of results, application of results, and the business of running testing programs.

Test Content needs deal with what the tests measure and how it is measured.

Technical needs deal with the tests as measurement instruments: the quality of the data that tests yield and the research that backs up the integrity of the test scores.

Logistics needs deal with test development, dissemination, administration, scoring and reporting.

Interpretation needs deal with making the measurement provided by the tests understandable to the audiences who need to use the results.

Application needs deal with turning the test results into improved education.

Business needs deal with choosing vendors and partners, financing test programs and getting value for the public.

TEST CONTENT

Content validity is often cited as the most critical quality measure for an assessment program. Most states go through painstaking processes to define what the schools should be teaching, and want tests that validly measure the state academic standards. This is an expensive, time-consuming process that often involves significant debate.

Most state testing programs in the past ten years have been standards-based testing programs. The new ESEA authorization (P.L. 107-110) mandates that states implement annual reading and math assessments for grades 3-8, which is the key principle of President Bush's ambitious plan for holding states and local school districts using federal funds accountable for improving students' academic achievement. This will be followed by other mandated assessment grades and subjects. Although states can select and design assessments of their choosing, they must be aligned with state academic standards. Many states test in additional subjects and/or additional grades.

Historically, some states adopted nationally-normed achievement tests from publishers who did rigorous analyses of curriculums and gleaned from their analyses a common core curriculum on which students could be compared, regardless of the state in which the student went to school. Since those national tests from the publishers measured only a core curriculum that was common across the nation, many schools and states supplemented the publishers' tests with criterion-referenced tests of their own that would either be integrated with the norm-referenced test or administered separately. The shift toward standards-based assessment accompanied the movement toward high-stakes tests of content that seek to ensure student mastery of state learning goals.

In general, states need to make sure that the content of their tests is appropriate: tests should cover the state academic standards, allow for comparability from year to year, measure objective knowledge, and avoid assessment of personal or family beliefs and attitudes.

There is a set of needs relative to improving the development of content standards that is not addressed in this document because, in most cases, those needs are preliminary to the involvement of the testing industry. For the purposes of this discussion it is assumed that the state academic content standards have already been developed, and the issue is representation and coverage of those standards in the state assessment program.

Need: Better coverage of standards without adding testing time

The current state of affairs with regard to content coverage in state assessments is that most states struggle with appropriate emphasis, comprehensive coverage, and appropriate types of items that are fair to all students. States must constantly weigh content coverage against cost and testing time, which combine to constitute "test size." For example, if the state academic standards are detailed, providing greater guidance for teachers and curriculum developers, coverage must focus on critical skills, sampling, or a combination of the two.

Numerous methods exist for balancing content coverage with test size. One method is matrix testing, in which each student is presented with a subset of the state content. This method was used in the past by states that emphasized group scores over individual performance scores. Matrix testing fell into disfavor some time ago, and would certainly be difficult to implement appropriately under NCLB, which requires individual student information. Various innovations have been implemented over the past twenty years to blend matrix testing with student census testing, but none have proven practical in use.
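The matrix-testing idea can be sketched in a few lines. This is a toy illustration only; the item pool, form sizes, and assignment scheme below are invented, and real matrix-sampling designs use far more careful balancing:

```python
# Toy sketch of matrix sampling: each student sees only a subset (form)
# of the full item pool, but the pool is covered across the group.
# Item IDs and form sizes are hypothetical.

def build_matrix_forms(item_pool, n_forms):
    """Partition the item pool round-robin into n_forms subsets."""
    forms = [[] for _ in range(n_forms)]
    for idx, item in enumerate(item_pool):
        forms[idx % n_forms].append(item)
    return forms

def assign_forms(students, forms):
    """Rotate forms across students so every form is administered."""
    return {s: forms[i % len(forms)] for i, s in enumerate(students)}

pool = [f"item{n}" for n in range(1, 10)]    # 9 items
forms = build_matrix_forms(pool, 3)          # 3 forms of 3 items each
assignment = assign_forms(["s1", "s2", "s3", "s4"], forms)
# Each student answers 3 of the 9 items; all 9 are covered across students,
# which supports group-level estimates but not full individual scores --
# the weakness that makes matrix testing hard to reconcile with NCLB.
```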

Another method for increasing content coverage without increasing test size is to share the burden of testing all of the academic standards between multiple tiers of assessment, from high-stakes to low-stakes: multi-tiered continuous assessment systems1. For example, the high-stakes accountability test could focus on a smaller set of academic standards while less formal lower-stakes tests could be added into the assessment program via coordinated classroom assessments handled by teachers.

Continuous multi-tiered assessment could be delivered on a teacher-selected schedule, allowing the teacher to choose the order in which instructional components are delivered, but still requiring that all of the continuous assessment components be used within the year. Continuous assessment could reduce the amount of time needed for the state end-of-year tests, since all elements of the standards would be tested in short tests during the year, thus allowing the state test to be shorter by only selecting the core elements for final accountability. Each year, states could choose different core elements for the high-stakes test, eliminating the temptation for teachers to teach only the core elements, since continuous assessments would cover all of the standards and be recorded for diagnostic and evaluation purposes.

1 Innovation 2.04: Multi-tiered Continuous Assessment Systems: Integrated sets of assessments consisting of a standardized, statewide, every-student testing program for high stakes and mid-stakes and/or low-stakes assessments of the same or related standards that are administered by classroom teachers in informal programs. The results are integrated to create a comprehensive picture of student achievement. The scoring and recording of student performance would be integrated throughout the year into a database.

Continuous assessment would also allow for a systematic record of delivery of instruction and incremental achievement that can be helpful to schools and states in demonstrating the presence of opportunity to learn for all students. One caution is that, while multi-tiered continuous assessments may be useful in driving improvements in curriculum and instruction, their value as valid and reliable external evaluations (assessments) of student progress must be carefully examined.

A number of states have implemented these multi-tiered programs. The challenge is to determine which objectives are worthy of placement on the high-stakes test that every student in the state takes under standardized conditions. One method for doing this is the identification of power objectives2. Power objectives are summative in nature and cannot usually be accomplished by students without proficiency in subordinate objectives that are prerequisite to the power objective. For example, for a student to understand how to solve a simple story problem in mathematics that involves subtracting two numbers, the student must know how to subtract two numbers and how to solve simple story problems. Therefore, the inclusion on the test of only the higher-level standard will show proficiency in both of the lower-level standards as well as the ability to integrate them into a higher task. The problem with only testing power objectives is that students who fail to achieve the power objective might still have proficiency in one or more of the relevant subordinate objectives. If the test doesn't measure those lower-level objectives, that diagnostic information is not available to the educators about that student. If the information on attainment of the lower-level standards is essential to some high-stakes testing purpose, it must be included on the high-stakes test. If it is not relevant to the high-stakes interpretation (e.g., accountability or adequate yearly progress evaluations) but it is relevant to the instruction of the student, that need can be filled by a classroom assessment administered by the teacher in a low-stakes setting. The issue of diagnostic assessments coordinated with the state assessment system is addressed later in this document.

In order for this type of multi-tiered testing to meet the needs of the states, there would have to be an established system of coordinated tests that are low-stakes and high-stakes. There would also have to be an acceptable system for determining what the power objectives are and which subordinate objectives are subsumed inside of power objectives. This would require research as well as development of an appropriate assessment system and coordination between the information systems of the high-stakes and low-stakes tests.
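The inferential logic of power objectives, and the diagnostic gap they leave, can be modeled as a small hierarchy. The objective names and the mapping below are invented for illustration, not drawn from any state's standards:

```python
# Illustrative sketch: power objectives subsume subordinate objectives.
# Objective names and the mapping are hypothetical.

POWER_OBJECTIVES = {
    # power objective -> subordinate (enabling) objectives it subsumes
    "solve subtraction story problems": [
        "subtract two numbers",
        "interpret simple story problems",
    ],
}

def inferred_proficiency(power_obj, passed_power_obj):
    """If a student is proficient in a power objective, infer proficiency
    in all of its subordinates; if not, the subordinates remain undiagnosed."""
    subs = POWER_OBJECTIVES[power_obj]
    if passed_power_obj:
        return {s: "proficient" for s in subs}
    # A failed power objective says nothing definite about subordinates:
    # the student may still hold some of them.  This is the diagnostic gap
    # that a coordinated low-stakes classroom assessment would fill.
    return {s: "unknown - assess in classroom tier" for s in subs}

print(inferred_proficiency("solve subtraction story problems", True))
print(inferred_proficiency("solve subtraction story problems", False))
```

The sketch makes the trade-off concrete: passing the power objective certifies the subordinates for free, while failing it forces the lower tier to supply the missing diagnosis.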

Need: More customization of items and subtests

Another need is for more customization of test content. This need arises for those states interested in keeping costs low by avoiding the development of custom tests from scratch. (For the purpose of this document, custom assessments are defined as new tests built entirely from scratch for a specific purpose, such as a state assessment of academic standards; customized assessments are defined as existing tests that are modified to make them appropriate for a specific purpose.)

One innovation would be to have a library of test content from which to choose items and assemble state tests of academic standards, or a common item bank3. Rather than publishers providing an intact test form, they would provide small testlets or even individual items. This would allow states to administer targeted custom tests that measure their state content standards at a lower cost, with acceptable research backing up the tests.

2 Innovation 2.05: Power Objectives: Higher-order objectives that subsume lower-order or enabling objectives. In order for a student to be proficient in a power objective, he or she must be proficient in all of the subordinate objectives.

A common item bank would save the states considerably in the cost of developing tests, but so would using the same test. The issue is to examine the trade-offs between independent development of standards and assessments by the states and the high cost of developing over 50 complete, multi-subject, multigrade assessment systems within the limited resources of the assessment profession. To the extent that states are willing to agree that some or many of their standards are essentially the same, and are willing to agree to similar test item types to measure those standards, a common item bank can be very helpful.

Test items must be constructed to meet strict test specifications and item specifications. Items are not necessarily interchangeable, since an item written in one state may not meet the specifications desired in another. In order for an item bank to be credible and helpful to the states, it must contain information on where the items originated, what objectives they measure, what specifications they match, and the statistical performance of the items.

Modern computerized item banking systems make up for many of the shortfalls of past solutions. They store graphic images and allow the development of print-quality copies of the items and the graphics, thus making it possible to assemble useable test forms.
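A minimal sketch of the metadata such an item-bank record would need to carry is shown below. The field names, values, and screening rule are hypothetical, not taken from any actual item-banking product:

```python
# Hypothetical sketch of a common-item-bank record and a simple filter.
# Field names and data are invented for illustration.
from dataclasses import dataclass

@dataclass
class BankItem:
    item_id: str
    origin_state: str        # where the item originated
    objective: str           # which academic standard/objective it measures
    item_spec: str           # the item specification it was written to
    p_value: float           # proportion answering correctly (difficulty)
    point_biserial: float    # item-total discrimination statistic

def candidate_items(bank, objective, min_discrimination=0.25):
    """Select researched items measuring a given objective, screening on
    the stored item statistics."""
    return [i for i in bank
            if i.objective == objective and i.point_biserial >= min_discrimination]

bank = [
    BankItem("M-001", "AZ", "subtract two numbers", "MC-4choice", 0.72, 0.41),
    BankItem("M-002", "OH", "subtract two numbers", "MC-4choice", 0.55, 0.18),
]
print([i.item_id for i in candidate_items(bank, "subtract two numbers")])  # → ['M-001']
```

Carrying origin, objective, specification, and statistics on every record is what makes items comparable, and hence selectable, across states.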

Another innovation would be to start with an existing test, such as a published norm-referenced test, and customize it to provide accurate and comprehensive measurement of state academic standards4. This need arises for those states interested in phasing in a custom state assessment while starting with an off-the-shelf test, as well as those states interested in saving funds by avoiding ever having to develop full custom tests. One disadvantage of this approach is that it prevents states from releasing the off-the-shelf test items, which are owned by the test developer, substantially increasing test security risks.

When a state sets out to customize an existing test, publishers often put some restrictions on the extent of customization they will permit in order to maintain the integrity of the test's technical data and norms. In one method, the publisher would develop a short form, or abbreviated version, of the standardized test. The short form would be the publisher's best effort at creating the briefest test to which it can attach national performance data. The state would then have a testing company create supplemental forms that would add content coverage to the test. Items from the abbreviated standardized test would be added to items in the supplemental form to create reliable measurement of a particular academic standard. This method has the advantage of high quality on most of the test and the provision of national performance data, which many states desire. It has the disadvantage of not providing states with the exact type of content coverage and item types that they might want in a custom test. The technical quality of the norm-referenced scores is also dependent upon the degree to which the final test approximates the version that was standardized intact (the original form). There are other models for customization as well, each with advantages and disadvantages.

3 Innovation 2.01: Common Item Banks: Sets of calibrated and researched items and test elements tied to a massive set that is the combination of all state academic standards. These items and test elements would be kept in a computerized content management system for efficient and secure access by qualified individuals who could assemble tests as needed. Such common items could be released annually or kept secure for reuse.

4 Innovation 1.07: Customized Off-the-shelf Tests: Publishers' norm-referenced or criterion-referenced tests that are supplemented with sets of items to round out the measurement of state academic standards.

Need: Multiple measures to assess standards

A third need with regard to test content is the need for more multiple measures5 that are comparable and fair. One of the criticisms of current state testing programs is that they do not allow for sufficient coverage of difficult-to-assess standards. For example, a language arts standard may call on students to develop effective debating or presentation skills. Such a standard is generally impractical to assess in a large-scale assessment environment, but needs to be assessed by teachers.

Another often-cited purpose for multiple measures is to allow for variations in learning style and test-taking style across students. For example, some students claim to do better on open-ended questions like essays than on multiple-choice questions, while other students feel the opposite is true. There is little evidence, however, to support the notion that students without disabilities who possess the underlying skills have difficulty demonstrating them on certain item types. Usually, such students lack at least some of the targeted skills; it is thus a policy question for states to determine which skills all students should be able to demonstrate, if any.

Some students who have disabilities, however, might be better tested via different item types and formats. P.L. 107-110 sets aside funds for the creation of "enhanced assessments," which would allow, among other things, the creation of multiple measures of student academic achievement from different sources. This raises the issue of accommodations in test administration, which is addressed below under Logistics.

Multiple measures can create problems with regard to consistency and fairness, and also can add considerable cost to a testing program. To the extent that the testing industry can provide an innovation that would permit multiple measures of state content and still deal with the issues of fairness, it would be helpful to states.

Need: Better instruments for early childhood assessment

Testing in the early grades (Pre-K through Grade 2) has been controversial for years. While most educators and parents see the need for early detection of a child's strengths and weaknesses as well as evaluation of early-grade education programs and pedagogy, the use of standardized tests for very young children is largely frowned upon. Rather than get involved in the debate about whether current testing instrumentation is appropriate for young students, it is important to acknowledge that there is room for improvement. An innovation that would be most welcome would be improved early school test instruments that are developmentally appropriate for the 4- to 7-year-olds in these grades but also provide the kinds of standardized, valuable data that are obtainable on older students6. For example, computerized testing might allow the introduction of multimedia and more game-like experiences for the young students that can still provide excellent standardized test results.

5 Innovation 2.03: Coordinated Multiple Measures: Measures of academic standards using a different measurement methodology, e.g., performance assessments or technology assessments, but coordinated with the primary assessment for fairness and comparability.

6 Innovation 2.10: Improved Early Childhood Tests: Tests designed for administration in Pre-Kindergarten through Grade 2 that provide valid and reliable measurement of emerging skills and readiness without the disadvantages that many current forms of standardized tests have at these grades.

TECHNICAL

The second general category of needs for innovation deals with the technical quality and nature of the information provided by the test. The main purpose of assessment programs is to provide useful information, and the extent to which tests provide scores that lead to instructional improvement enhances their utility for states. Technical needs can be divided into two subcategories: types of scores and technical quality.

Types of Scores

There are two general types of scores provided by tests: criterion-referenced and norm-referenced. The debate has raged for years in the state assessment community about the relative merits of these two approaches. Some feel that norm-referenced scores should be eliminated completely from state assessments and all interpretation should be criterion-referenced. Others feel that norm-referenced information is not only beneficial to public understanding of test results but also allows for interpretive schemas that are not possible with criterion-referenced scores alone. There is a clear need to get past the rhetoric of this debate and provide innovations in developing and using the appropriate test score types. Many do not realize that criterion-referenced scores can be obtained from norm-referenced tests and vice versa. While the test types are specifically structured to maximize the quality of one type of data or the other, many states have designed systems that can obtain the quality of data they need from hybrid examinations.

Need: Acceptable, understandable ways of measuring mastery at individual and group levels

Criterion-referenced scores are necessary for an assessment program to be in compliance with P.L. 107-110.

There are many types of scoring schemes used across the states that have been implementing standards-based assessments. The variety of methods for providing mastery and proficiency scores for individual students and groups causes much confusion among the consumers of testing information. Teachers, school administrators, state policy makers, parents and the public at large all have a need to understand clearly how well schools are doing in educating students and how well students are doing in achieving proficiency or mastery of state academic standards. A useful innovation would be a common system of criterion-referenced information7 that was acceptable and useful in meeting the general needs of test consumers.

In the past, many states have adopted the NAEP scoring scheme of labeling performance as basic, proficient, and advanced, and P.L. 107-110 specifically requires at least these labels, although states are free to add other levels of performance. The methods for determining these proficiency levels, however, have varied considerably from state to state, which makes it confusing for the public. To add even more confusion to interpretation, publishers have attempted to tie their norm-referenced tests to these NAEP proficiencies. When states have tried to reconcile differences between performance on publishers' tests, state custom assessments, and NAEP itself, the analyses have raised more questions than they have answered. The states have a need for a consistent criterion-referenced interpretive scheme to be created by the testing industry, certified as technically appropriate by the psychometric scientific community, and made available to states for all of their assessment programs.

7 Innovation 1.03: Collaborative Resource on State Assessment


Not only does this criterion-referenced interpretive scheme have to be available for the major state assessment programs, it should also be made available for classroom assessments. At the least, a consistent interpretive methodology should be established for classroom assessments8 so that teachers, students, and parents get used to understanding the interpretive scheme. Classroom teachers do not get very much, if any, instruction on classroom assessment when they are in their education college programs. Good in-service training that will improve classroom assessments to make them more academically sound would reduce teachers' current use of what is known as "hodgepodge grading." This is addressed later in this document.

In so many areas of life there is a common language to talk about measurement. For example, there is a common understanding of miles per gallon, degrees Fahrenheit, height in inches, the Dow Jones Industrial Average, and wind chill factors. Most people also generally understand the meaning of some norm-referenced scores, such as percentile ranks. Most parents understand the letter grade system of A, B, C, D and F, and they understand percentage correct, passing scores and honor rolls (notwithstanding the fact that the letter grades might not be comparable across teachers). The problem with the basic/proficient/advanced scheme is that it is inconsistently applied across states, and still debated among experts as to how it should be interpreted.

It is important to point out that it is not possible to accomplish absolute equating of educational standards across different tests in different states. Even if all states agreed to use a common set of labels, there would be inconsistency behind the meaning of the labels in operational terms. Unless the states agreed to the definitions of content for the various tests, the use of common definitions would be meaningless. If one state has a math test that is totally problem solving and another has a test that is totally computation, the terms "proficient" and "basic" have different meanings, even with the same operational definition.

Need: Ways to provide the public with comparisons and other benefits of norm-referenced tests while still using criterion-referenced tests

Norm-referenced scores traditionally have aimed to provide information on student and group performance relative to that of other students and groups. While norm-referenced data in isolation is often not as informative as is needed for educational improvement and accountability, and would not satisfy the requirements of federal law, the public feels comfortable interpreting norm-referenced comparisons and will often create comparison interpretations informally when only criterion-referenced scores are provided. The perception that all aspects of traditional norm-referenced tests are antithetical to state standards-based assessment programs has evolved because of a narrow interpretation of the utility of norm-referenced data.
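Norm-referenced interpretation can be illustrated with the familiar percentile rank. The sketch below, using an invented norming sample, converts a raw score to the percentage of the norm group scoring below it:

```python
# Illustrative percentile-rank computation against a norming sample.
# The norm-group scores are invented.
from bisect import bisect_left

def percentile_rank(norm_scores, score):
    """Percent of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)

norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 33]
print(percentile_rank(norm_group, 25))  # → 60.0
```

The point of the sketch is that a percentile rank says nothing about mastery of any standard; it only locates a score within a comparison group, which is exactly the comparative information the public finds easy to interpret.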

The Tennessee Value-Added Assessment System (TVAAS), for example, which has been hailed as an innovative way to interpret student progress, relies on a norm-referenced test to compare student progress from year to year. This system has also been found to be useful for measuring the effectiveness of teachers and of schools, and a number of states are considering whether to implement such a value-added model. Yet, unlike norm-referenced tests, many standards-based tests lack an important design feature, "vertical equating," that allows for their use in value-added analyses. It is important to note that it is not necessary for a test to be norm-referenced to accomplish the goals of TVAAS or other value-added systems. These systems do require regular testing and vertical scaling across the grades, which is routinely done for norm-referenced achievement batteries, but can also be done for custom state assessments.

8 Innovation 2.07: Consistent Criterion-referenced Interpretive Methodology for Classroom Assessments: A uniform, understandable way of grading and reporting on informal assessments that is coordinated with the state assessment system.

A needed innovation is for the testing industry to develop methods for applying the best features of norm-referenced tests while avoiding misinterpretations9.
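To see why vertical scaling matters for value-added analysis, consider this toy sketch. The scale and scores are invented, and real systems such as TVAAS use far more sophisticated mixed-model statistics; the sketch only shows that once successive grades' scores sit on one vertical scale, year-to-year gains become directly comparable:

```python
# Toy illustration of a value-added gain computation on a vertical scale.
# All scores are hypothetical.

# Vertically scaled scores: a single score scale spanning grades,
# so grade-4 and grade-5 scores are directly comparable.
scores = {
    "student_a": {"grade4": 430, "grade5": 470},
    "student_b": {"grade4": 455, "grade5": 480},
}

def mean_gain(scores, from_year, to_year):
    """Average scale-score growth across students between two testings."""
    gains = [s[to_year] - s[from_year] for s in scores.values()]
    return sum(gains) / len(gains)

print(mean_gain(scores, "grade4", "grade5"))  # → 32.5 scale points of growth
```

Without a vertical scale, the grade-4 and grade-5 scores would live on unrelated metrics and the subtraction above would be meaningless, which is the design gap in many standards-based tests noted in the text.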

Technical Quality

As the stakes for testing become higher, tests will be subjected to increased scrutiny by the public, the education community, the scientific community, and the courts. As a result, the need for proof of technical quality in the tests will continue to increase. In our litigious society, it is likely that any testing program will be challenged by a lawsuit from a school, a parent, a special-interest group, or a student. States have every right to expect that the vendors they choose to work with for testing programs provide the highest technical quality of assessment program that current technology and science permit, and the law now requires it.

The three major components of technical quality are reliability, validity, and fairness.

Reliability

Reliability is the extent to which an assessment provides consistent information over occasions and forms. When states commonly used publishers' norm-referenced assessments, they did not need to be concerned about this because the reliability of published standardized tests that were bought off the shelf was consistently high. A publisher creating only one multilevel testing instrument can take great pains to develop the test carefully over a period of years and thoroughly research the instrument. The significant investment publishers made in each of these off-the-shelf tests (often exceeding $20 million) was deemed worthwhile because millions of copies of the tests would be sold over many states and multiple years. Reliability presents particular challenges, however, with custom-built tests that are different for every state. The timelines for test development are extremely short, and the opportunities to recover the research and development costs over many years and millions of administrations are much more limited, especially in smaller states.

It is also the case that many custom state assessments use innovative item types that have not been subjected to the years of research that more traditional item types have. This also creates reliability challenges for the test developer.

The third challenge to the reliability of custom-developed state assessments stems from their high-stakes nature, which puts additional pressure on security. One of the typical ways to deal with security threats is to use a new form of the test every year. The more forms of a test are developed, the more problematic inter-form reliability becomes. It is widely believed that drift occurs from form to form and year to year. This threatens the meaningfulness of the test results as well as their reliability.

9 Innovation 1.10: Integrate Important Benefits of Norm-referenced Tests for State Standards-based Programs: Data using comparisons of performance and design features such as vertical equating that add to the informative power of state assessments without replacing or compromising the criterion-referenced interpretation system.

STATE INNOVATION PRIORITIES FOR STATE TESTING PROGRAMS • 27

DISCUSSION

Need: Internal consistency and alternate-forms reliability so that results are credible

A priority for the states is the need for internal consistency and consistency across forms so that results are credible and so that good educational decisions are made on the basis of the results, and not on the basis of aberrations in performance that result from the lack of reliability of hastily developed instruments. Innovations that lead to increased reliability without sacrificing security and customizability are needed by the states. It is unlikely that the nature of these innovations will be technical, since most technical improvements in psychometrics that would yield increased reliability have already been made. It is more likely that procedural and technological innovations can help solve this problem. For example, the use of randomized forms10 could allow for more security through scrambling of items across forms. Unfortunately, this is difficult to accomplish cost-effectively with printed tests. The use of randomized forms is already widespread in state assessment systems, because this is a convenient way to field-test multiple new items. However, this use of randomized forms is accompanied by some significant technical challenges. It may or may not be a cost-effective way of improving test security, as that depends on the extent to which the test forms differ. Furthermore, research has shown that changing item placement in a test can affect student performance, thus possibly creating inequalities across the various test forms.
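The scrambling idea itself is simple to sketch; the item identifiers, seed, and form count below are hypothetical, and a real program would also have to address the item-position effects noted above:

```python
import random

def build_forms(item_ids, n_forms, seed=2002):
    """Generate scrambled test forms from one item set, so that
    students seated near each other are unlikely to share a form."""
    rng = random.Random(seed)  # fixed seed -> reproducible form set
    forms = []
    for _ in range(n_forms):
        order = item_ids[:]    # copy, then shuffle the copy
        rng.shuffle(order)
        forms.append(order)
    return forms

forms = build_forms(["Q1", "Q2", "Q3", "Q4", "Q5"], n_forms=3)
for i, f in enumerate(forms, 1):
    print(f"Form {i}: {f}")
```

Every form contains the same items, only in a different order, which is what makes position effects the chief threat to comparability.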

Also, the content innovation of having common pools of items available across states (Innovation 2.01) could provide improved reliability without sacrificing the standards-based nature of most state assessments.

Validity

Validity is the extent to which the test measures what it purports to measure. Assuming that the content objectives of the test are well defined and established, it is important that the test items actually measure that content. This places restrictions on the types of items that may be used. For example, measurement of complex higher-order skills, while desirable and included in many states' academic standards, must be done very carefully because items used to measure these higher-order skills can often be measuring multiple things, some of which are not the state standard. In addition, P.L. 107-110 specifically prohibits the evaluation and assessment of personal or family beliefs and attitudes.

Need: Legal defensibility for high stakes

One of the needs that states have is for greater legal defensibility of high-stakes tests. This goes to the validity question very directly. If the test is valid for this use, it should theoretically be appropriate for high-stakes measurement of academic standards. Right now, states are often left to their own resources to defend the validity of their testing programs and benefit little from each other's similar situations. One innovation desirable to the states would be the creation of a legal entity or agency specifically devoted to the legal defensibility of high-stakes assessments and a body of knowledge and case law that can be shared across states11. Currently this exists only informally.

10 Innovation 1.08: Randomized Forms: Tests in many alternate forms administered across students so that students in the same class receive different forms. While this is best done in computer administration, it can be accomplished to some degree in paper-and-pencil administrations.

11 Innovation 1.03: Collaborative State Resource for Assessment Systems: An organization serving as a resource on the legal defensibility of state assessments, possible timelines for testing programs and schedules, and state/vendor contractual techniques.

28 • EDUCATION LEADERS COUNCIL

Fairness

The third major component of technical quality is fairness. Fairness is a broad category but includes at a minimum the absence of bias in a test and the accessibility of the test to all students. P.L. 107-110 specifically requires that tests be used for all students, including those who are traditionally difficult to test with standardized procedures, specifically mentioning economically disadvantaged students, students with disabilities, students with limited English proficiency, and migrant students.

Need: Proper evidence of bias-free tests

One of the greatest needs that states have is proper evidence that their assessments are free of bias. Because there are so many subgroups to be dealt with and complex analyses of test data, the bias studies must be carefully aimed at the right populations and subpopulations. This might not seem like a legitimate need, since bias study procedures are well established and documented in the technical literature. However, there is remarkable inconsistency from state to state in the degree to which these studies are actually conducted, and they often cost more than the legislature has appropriated. While most state test development efforts deal with straightforward ideas of gender bias and ethnic bias, they often skip the elaborate studies of subgroup bias that would be necessary to prove fairness to all students. Some of these difficulties are cost-related and procedural, and some are technical. An innovation that is needed by the states is an agreed-upon set of procedures for bias studies12 that are used by all states and that are clearly known to legislatures and lawmakers so that they can appropriate the necessary funds to do these studies. The extent to which

the national item bank innovation (Innovation 2.01) is achieved will help meet this need if the item banks contain sufficient bias information about the items.
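One widely used bias-study procedure (offered here only as an illustration, not as the agreed-upon standard this document calls for) is Mantel-Haenszel differential item functioning analysis, which compares two groups of equal ability on a single item. A sketch with hypothetical counts:

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio for one item, reported on the
    ETS delta scale. Each stratum (a matched ability level) supplies
    (ref_right, ref_wrong, focal_right, focal_wrong) counts."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    # Negative delta values indicate the item favors the reference group.
    return -2.35 * math.log(num / den)

# Two ability strata; the focal group does worse at matched ability,
# so the statistic flags the item for review.
delta = mantel_haenszel_dif([(30, 10, 20, 20), (25, 15, 18, 22)])
print(delta < 0)  # → True
```

Agreeing on details like the flagging thresholds for such statistics is precisely what Innovation 1.14 would standardize.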

P.L. 107-110 requires the results in each state, LEA, and school to be disaggregated by gender, each major racial and ethnic group, English proficiency, migrant status, students with disabilities, and by economically disadvantaged students as compared to students who are not economically disadvantaged. Furthermore, P.L. 107-110 requires "evidence from the test publisher or other relevant sources that the assessment used is of adequate technical quality for each purpose for which the assessment is used" [§1111(b)(3)(C)(iv)], which evidence must be consistent with nationally recognized technical and professional standards [§1111(b)(3)(C)(iii)]. The Joint Professional Standards require that the bias analyses done for any group or subgroup give evidence of the suitability of the test for each of its intended purposes. This puts a significant burden on states to meet the requirements of the law, and few states have the budgetary resources to fund this research properly. The innovation of creating standard procedures for bias studies would help if there were clear costs associated with the appropriate studies.
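Mechanically, the required disaggregation is a grouping operation over student records; a minimal sketch (the subgroup labels and scale scores below are invented):

```python
from collections import defaultdict

# Hypothetical student records for one school: (subgroup, scale score).
results = [
    ("econ_disadvantaged", 512), ("not_econ_disadvantaged", 547),
    ("econ_disadvantaged", 498), ("not_econ_disadvantaged", 530),
]

by_group = defaultdict(list)
for group, score in results:
    by_group[group].append(score)

for group, scores in sorted(by_group.items()):
    print(group, sum(scores) / len(scores))  # subgroup mean scale score
```

A real reporting system would repeat this over every required category (gender, race/ethnicity, English proficiency, migrant status, disability status) at the school, LEA, and state levels.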

LOGISTICS

The next major set of needs deals with the logistics of the testing process. This includes test development, distribution, administration, and scoring.

Test Development

Test development represents a high cost for states for two reasons: First of all, the resources to develop high-quality assessments that are suitable and

12 Innovation 1.14: Standard Procedures for Bias Studies: Agreed-upon methodologies for bias studies that are known and funded by law-making bodies.

defensible for high-stakes use are limited. Secondly, the growth of testing in the past few years, which will only intensify with the passage of P.L. 107-110, has increased the demand for test development. Simple economics says that increasing the demand on a limited supply drives up prices. In the past, the term "standardized test" applied to five or six nationally normed off-the-shelf instruments. The testing industry evolved to develop instruments like these every four to seven years, with suitable research. The addition of fifty state assessment systems, some of which are close in size to the nationally normed tests, increases the demand for test development by a factor of ten or more. When consideration is given to the frequency with which the state assessments are expected to be revised, and the commensurate need for classroom-based systems that support and complement the state assessments, it is easy to see the potential for overtaxing the industry's test development capacity.

Need: Ways to develop more and better tests more quickly

States need innovations from the testing industry that increase the quality of state tests without increasing the cost to more than what taxpayers can bear. They also need tests developed more quickly without compromising quality. The faster tests can be developed (i.e., items written and assembled), the more time there is for research that improves and maintains the psychometric quality of the assessments. It is also the case that some state education agencies are challenged by their legislatures to implement testing programs very quickly. Lawmakers sometimes do not allow enough time for states and the testing companies to do the best job.

There have been various proposals to meet this need. One is that states share items informally in a massive item pool. Rather than writing and buying new items, the states can use common items that are also used on other states' assessments. The challenges to this are security and the appropriateness of items from one state to another when the state academic standards differ. Another challenge is the desire some states have to be different from others, and to have complete ownership of their testing systems.

The innovation that would best address this need is a secure common item bank (Innovation 2.01), maintained in a computerized system, accessible only by appropriate personnel with passwords. Items would either be released annually or would be maintained and updated with technical data from administrations in the various states and would be tagged with information about their appropriate use. In the case of a pool that is not released, states could pay the owners of the items a negotiated royalty dependent upon the use of the item in that state so that the item bank could be maintained and improved over time.
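A minimal sketch of the kind of record such a bank might keep, with the ownership tagging and per-use royalty described above (every field, identifier, and rate here is hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class BankItem:
    """One entry in a secure shared item bank (all fields hypothetical)."""
    item_id: str
    owner: str                                      # state or vendor that owns the item
    standards: list = field(default_factory=list)   # tagged content standards
    royalty_per_use: float = 0.0                    # negotiated rate per administration
    uses: int = 0

    def record_use(self, using_state: str) -> float:
        """Log one administration; return the royalty owed to the owner."""
        self.uses += 1
        return 0.0 if using_state == self.owner else self.royalty_per_use

item = BankItem("ALG-0042", owner="TX", standards=["A.2b"], royalty_per_use=0.15)
owed = item.record_use("OH")   # another state draws a Texas-owned item
print(item.uses, owed)         # → 1 0.15
```

Technical data from each administration (difficulty, bias statistics) would be attached to the same record, which is what would let the bank also serve the bias-evidence need discussed earlier.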

There are many obstacles to overcome in implementing this innovation. First of all, the issue would have to be addressed as to who maintains the item bank. Sophisticated computerized test item banking systems have been developed and are on the market, so the technology is only a small challenge. The biggest challenge is logistical. Would all the testing companies share access to the same item bank? Would item banks be proprietary to testing companies, with access limited to the states that worked with that company? Would some trusted entity create and maintain the bank, with testing companies and states voluntarily placing the items they own in the bank and receiving payment when those items were used (as the music industry is attempting to do with online music purchasing)? Should companies or agencies create item banks and share them by making business deals? However this innovation occurs, it is a priority for the states to have faster development of high-quality, properly researched tests that provide multiple forms at an affordable cost.

Innovation 1.07: Customized Off-the-Shelf Tests can also go a long way toward reducing the time and cost of test development. The challenges to overcome in implementing this innovation were discussed earlier.

Test Distribution

Innovations in test distribution would allow tests to be presented to students in a more timely fashion, with the appropriate level of security, without creating logistical challenges for teachers and school personnel.

Need: On-time test delivery

In recent years, some tests have been delivered to schools for administration later than scheduled, causing problems in the timeliness of testing and reporting. In some cases, the late delivery of printed test materials has resulted in major disruptions of the school calendar, issues with comparability of data to past and future years when students are tested at a different point in the instructional process, and resultant late score reports, sometimes causing schools the additional expense of mailing reports home to parents after the school year has ended and problems with setting up registrations for summer school. The testing industry needs to find a way of more consistently delivering tests on time. The challenge relates to the test development and research issues discussed above. If the process for test development takes longer than the budgeted time, it

places a burden on the production and printing of the test.

Part of the challenge remains with the testing industry, which must be able to meet deadlines to which it has agreed. But part of the solution also has to do with the states establishing schedules that can reasonably be accomplished. Since there has been so much experience in developing tests over the years, it should be possible to identify model project timelines so that states can plan better and can inform their lawmakers and policymakers of the minimum and optimum time to allot to test development and distribution. A useful innovation for the states would be a single source to access that would provide suggested schedules for testing programs.13

This is another area in which technology innovations can help. Computer-administered tests14 do not need to be printed and shipped, thus saving a considerable amount of time. Computerization of test development through online item banking (Innovation 2.01) also can improve test development efficiency. The combination of these innovations can increase the likelihood of on-time presentation of the tests to the students.

Of all the innovations presented in this paper, computer-administered testing is the most powerful and most challenging. While it provides very many benefits, it presents significant challenges as well. First of all, the infrastructural demands of giving widely administered tests on a computer are considerable. If the tests require an Internet connection, it must be functional and fast enough not to distract from the testing setting. If the tests depend on a server-based local area

13 Innovation 1.03: Collaborative State Resource for Assessment Systems: An organization serving as a resource on the legal defensibility of state assessments, possible timelines for testing program schedules, and state/vendor contractual techniques. This organization would be available to states and testing companies to assist them in the implementation of their programs.

14 Innovation 2.02: Computer-administered Tests: State assessments administered to every student on computers without the need for printed tests.

network, it must be up and functional for all students. There are opportunity-to-learn challenges that would make it hard to justify the equity of scores between students who have frequent access to a computer at home and students who do not. There are issues about the equating of scores between computer administrations and paper-and-pencil administrations if some students take the test in a different modality from others.

Despite these challenges, computerized testing offers enormous benefits, including instant turnaround of scoring, lively multimedia questions, randomized forms, capture of every keystroke to allow richer scoring algorithms, and the ability to administer adaptive tests that provide more accurate scores with shorter testing times.
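The adaptive idea can be illustrated with a deliberately simplified staircase: step up in difficulty after a correct answer, down after an incorrect one. (Operational adaptive tests select items with IRT models, not this rule; the item bank and simulated student below are hypothetical.)

```python
def adaptive_session(bank, answers_correctly, start=3, n_items=5):
    """Toy adaptive test: harder item after a right answer, easier
    after a wrong one. bank maps difficulty level -> item."""
    level, administered = start, []
    for _ in range(n_items):
        item = bank[level]
        administered.append((item, level))
        if answers_correctly(item):
            level = min(level + 1, max(bank))  # try a harder item next
        else:
            level = max(level - 1, min(bank))  # drop to an easier item
    return administered, level

bank = {1: "Q1", 2: "Q2", 3: "Q3", 4: "Q4", 5: "Q5"}  # hypothetical items
student = lambda item: int(item[1]) < 4  # simulated student: solid below level 4
seq, final = adaptive_session(bank, student)
print([lvl for _, lvl in seq], final)  # → [3, 4, 3, 4, 3] 4
```

Even this crude version shows why adaptive testing shortens sessions: it quickly stops wasting questions that are far too easy or too hard for the student.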

Computerized tests exist today, but the challenges to their widespread application in school testing are significant enough that they will probably not see widespread use in high-stakes K-12 testing for many years. In the interim, they can be effectively used for makeup tests for students who miss the main testing administration window, or for retests on required exams for students who failed an earlier paper-and-pencil administration and can retake the test at their convenience. In any case, continued research into this innovation is warranted because of the many benefits it can provide once the technological challenges are solved.

Need: Better security

Security breaches in the distribution of tests create many problems for the states. First of all, they compromise the integrity of the results, undermining the entire assessment program. If state officials do not know who has seen the test or portions of the test before administration, there is no way to make fair comparisons. A known security breach can also compromise a form of the test, taking it out of use,

not only for that state, but perhaps for other states if they share items and tests.

One of the most effective measures to maintain security is to release the test items every year, though the cost of developing new items each year is substantial. Maintaining the security of test materials can be expensive and logistically complex. For example, some states have their testing companies label every test booklet and packet, and log which school and classroom received those tests. Any tests that are not returned are noted and the security breach is investigated. Another security measure that adds cost is the sealing of booklets in sections. The seal is broken by the student just before a section is administered, increasing the likelihood that no one can see the test content before administration. While these procedures have worked in many states, they are expensive and time consuming.

Nothing can completely protect the security of tests, but it can be improved. Innovations that would improve security are those that would decrease the opportunity to see the test before administration, and those that would increase the pool of items from which the form presented to a particular student could be drawn.

Computerized testing with randomized forms, a combination of Innovations 1.08 and 2.02, is actually a two-edged sword with regard to security. On the one hand, it can be used to customize tests at the time of administration from an item bank, thus improving security. On the other hand, computerized testing can actually exacerbate the security risk if tests are administered over a longer period of time. Students' memory of a few test items, shared with another student who is tested later in the year, can be just enough to invalidate a test. This is certainly an area that merits investigation for procedural improvement through innovation.

Test security breaches can be unintentional or intentional. The most notorious cases of breaches are those where the intent is to see test content in advance in order to unethically improve student scores. However, many test security breaches are actually unintentional and are the result of poor training and/or information provided to teachers and administrators as to what constitutes appropriate test security measures. This issue is addressed in the section below on professional development needs.

Need: Logistical simplification for teachers and school personnel

Testing has always been a logistical burden for educators, and the increased emphasis on tests has increased that burden. Not only does test administration take time, but the logistics of scheduling, distributing tests, collecting and preparing them for return for scoring, and distributing reports to students and parents are time-consuming activities that do not add anything to the learning process. Sparing educators these administrative and logistical chores would be most welcome and would add to instructional time and school instructional management.

One innovation that would help in this regard is computer-administered tests (Innovation 2.02), to the extent that problems with technology and equipment do not make it even more difficult for teachers and administrators. Another innovation would be computerized or web-based reporting15, which might decrease the amount of time that reports are handled and filed.

Test Administration

Once the tests are distributed, they must be administered. There are a number of needs of the states relative

to test administration that could be better met by the testing industry, such as ways to handle absences, special situations, and make-up tests, and ways to eliminate test-taking skills as a factor in test scores.

Need: Better ways to handle absences and special situations

It is a fact of life that some students are absent on the day of standardized testing. These students need make-up tests, which creates logistical and security challenges. The innovation of computer-administered tests (Innovation 2.02) can be ideal for this, provided that there can be assurance that the scores from a computer-administered version of the test have the same meaning as the scores from the paper-and-pencil version. This will require research and development of appropriate tests that can be administered by computer, as well as a network of computers that are capable of administering tests under standardized conditions.

Need: Practice tests and methods to eliminate test-taking skills as a factor

One of the concerns that states have is the extent to which test-taking skills are differentially distributed in the student population, resulting in some students performing less well than they should. Theoretically, the effect of test-taking skills should be unidirectional; that is, obtained scores fall below true scores for students with poor test-taking skills, but obtained scores are not increased above true scores for students with good test-taking skills. Thus, all students should be proficient in taking tests so as to make scores and comparisons valid. States need high-quality test preparation programs that do not artificially inflate scores by

15 Innovation 1.01: Computerized or Web-based Reporting: A paperless system of providing real-time accessibility to reports either on a school computer network or on the Internet. Ideally these reports would be interactive, allowing cross-referencing and detailed investigation and explanation of scores for multiple types of audiences.

teaching an overly targeted curriculum, but bring all students to the same level of proficiency in test taking. An innovation would be an integrated program that improves test-taking skills while also providing legitimate instruction, not just test prep16.

Innovation 2.04, Multi-tiered Continuous Assessment Systems, can also meet this need, as it involves becoming familiar with test formats and structure in the context of instruction, as it should be.

Need: Special forms of tests for students with disabilities

P.L. 107-110 specifically requires that the state assessments provide for participation by all students, with reasonable accommodations for students with disabilities necessary to measure their achievement relative to state content standards [§1111(b)(3)(C)(ix)].

Students who cannot take the test under standardized conditions need to be able to take an appropriate form of the test. Often the group-administered instruments designed for the general population cannot easily be adapted to students with disabilities while maintaining full validity. States need an innovation that would provide tests in different formats, such as individually administered or oral tests17, that measure the same skills as the general test, along with the research that proves this to the satisfaction of the stakeholders in the test. These individually administered tests would be clinical instruments specifically designed for their target population of students. This technique is viewed by school psychologists as more appropriate than simply providing an individual administration

of the same test taken as a group-administered test by the general population of students.

The issue of accommodations has been a difficult one for the states and the testing industry for some time. On the one hand, there is the desire to provide every child the benefit of the state assessment system, and all parents the information they need on their children. On the other hand, efficient testing programs usually present assessment situations in which some children cannot obtain valid scores. For example, a visually impaired student might need a Braille or large-print version of the test. Other children might need the test read aloud to them. Still others might need someone else to fill in their answer documents. These are called "accommodations." Testing condition alterations and modifications compensate for extraneous factors and can change the meaning of test scores. A valuable innovation would be the creation, through collaboration, of a model system of classification of accommodations, and options for appropriately incorporating the scores from modified test administrations and alternate tests into state accountability ratings18.

Note that accommodations and modifications are usually determined on an individual student basis, based on needs. Each state's laws also come into play. For example, some states award full diplomas to students with disabilities if the student accomplishes the IEP written for him or her. Other states label such diplomas as exceptional in some way. This area requires much more discussion and research before these problems are solvable. Some states might even require additional legislation.

16 Innovation 1.06: Integrated Test Readiness Programs: Programs that simultaneously teach meaningful content around the state academic standards while improving test-taking skills.

17 Innovation 2.09: Individually-administered Tests to Measure State Academic Standards: Tests coordinated with the state assessment program objectives and information system that can be administered to students who are unable to take the standardized group-administered state assessment.

18 Innovation 1.03: Collaborative Resource on State Assessment

Test Scoring

The last area of test logistics is test scoring. Since the major purpose of testing is to obtain the information in the score reports, the process of changing the student's raw responses into meaningful information is crucial. Interpretation and the creation of reports are addressed in the next section. The logistical challenges in test scoring are speed and accuracy. In each case the states need more speed and more accuracy.

Need: Faster turnaround of scores

One of the greatest needs of the states is fast turnaround of scores. Even though many states put extreme pressure on their scoring vendors to turn around test results in as little as two weeks, this still sometimes results in scores being returned to schools after school ends. Without the scores, schools have difficulty identifying which students need summer school and other tutoring. The problem is worse when deadlines are missed either in test delivery or in scoring turnaround. Missing documents at the scoring center, make-up testing, and transportation problems create even greater delays. Educators often ask why results cannot be returned the next day.

There are two possible innovations that can help in this regard. The first is computer-administered tests (Innovation 2.02) that can be scored instantaneously. The second is a method for scanning and scoring answer sheets locally within the school or district to provide immediate feedback to the schools19. The documents could then be sent to the scoring vendor for additional scoring and aggregation for summary reports. In order for

this to be feasible, schools would have to invest in ways to score the tests locally without compromising the answer media for the official scoring. States would also have to deal with the increased costs of scoring the tests twice and the security risks that this would create.

Sometimes the testing program design makes it harder to turn around scores quickly. For example, items that need hand-scoring, like open-ended questions and writing samples, add considerable time to the scoring process. One method that can decrease the turnaround time for hand-scoring is computerized scoring of open-ended responses20. There are at least four programs currently available that can score essay questions with accuracy that approaches or surpasses that of human scorers. While these programs have been used in professional examinations and college testing, the extent to which the states and their stakeholders will trust these programs for K-12 statewide assessment will depend upon the research that backs them up and their continued use in actual programs without negative consequences. (A less radical option for those interested in computerized scoring of open-ended responses is to replace only one of the two human raters typically used for each response; this may still result in financial savings, but most if not all of the increased speed is lost.) The value of computerized scoring is different for short-response items and longer essays. Most of the research on computerized scoring has shown that it works well for longer samples of student work, in particular for writing passages. Short responses have not been accurately scored by computer systems. Different testing programs with different purposes may differ in whether computer scoring is a benefit or a detriment.
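The programs referred to above are proprietary, but the general approach fits a statistical model of human ratings to features of the text. A deliberately crude sketch of that idea (the features, weights, and rubric range below are invented and belong to no real product):

```python
def essay_features(text):
    """Two crude surface features of an essay. Real scoring engines use
    far richer linguistic measures; this is only an illustration."""
    words = text.split()
    type_token = len(set(words)) / max(len(words), 1)  # vocabulary variety
    return [len(words), type_token]

def automated_score(text, weights=(0.02, 2.0), bias=0.5):
    """Hypothetical linear model mapping features onto a 0-6 rubric."""
    raw = bias + sum(w * f for w, f in zip(weights, essay_features(text)))
    return max(0, min(6, round(raw)))  # clip to the rubric's score range

print(automated_score("the cat sat on the mat"))  # → 2
```

The weakness noted above for short responses is visible even here: with only a handful of words, surface features carry almost no signal about content.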

19 Innovation 1.09: Local Scoring Option: A system whereby LEAs can score tests locally for immediate results prior to sending answer media to the test scoring contractor for official statewide scoring and reporting.

20 Innovation 1.05: Computerized Scoring of Open-ended Responses: A method by which open-ended questions are scored entirely or partially by computer with little or no human scoring.


Another method that can decrease turnaround and also assist with reducing cost is distributed scoring21. In this method a student's open-ended responses are captured on a computer, either through use of an image scanner and software that isolates the image of the written response, or through computer-based test administration (Innovation 2.02). Once the student's response is isolated or captured in the computer, it is transmitted over the Internet to a scorer who has been trained to score that specific response. That scorer has access on the computer to the rubrics and sample responses. The scorer reads the response, enters a score on the computer and transmits it back to the computer at the scoring center. The program in the computer at the scoring center waits until the second reader's rating is submitted, reconciles the scores and either accepts the score (if it is within the tolerance set by the program) or sends the student's response to someone else for a resolution reading and scoring. The cost advantage of this method comes from the fact that the scorers often can use their own PCs at home or at work, thus saving the cost of renting or owning a scoring center facility that seats hundreds of hand-scorers. The responses are viewed on a computer screen, which saves the cost of making multiple copies of the papers. The ability to work on a flexible schedule from home allows a greater variety of qualified individuals to do the scoring who otherwise would not be willing to do so, since they might have other jobs or responsibilities. It also allows the scorers to live anywhere, not just in the city where the scoring center is located. This flexibility provides two advantages: higher quality scorers and the ability to pay a lower rate (since a person is often willing to accept a lower rate to avoid leaving home). The scorers still need to be trained, and that would often mean coming to a central location for at least the training, although net-meeting technology would also allow that to be done over the Internet. This technique exists today, but is still used in very few instances. Its wider acceptance would be a significant innovation.
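The reconciliation step described above (accept when two independent ratings agree within tolerance, otherwise route the response to a third reader) can be sketched in a few lines. This is an illustrative sketch only: the function name, the one-point tolerance, and the averaging rule are assumptions, since each program defines its own tolerance and combining rules.

```python
def reconcile(first: int, second: int, tolerance: int = 1):
    """Reconcile two independent ratings of one open-ended response.

    Returns a (status, score) pair: the reported score when the two
    ratings agree within the program's tolerance, or a flag sending
    the response to a third reader for resolution scoring.
    (Combining rules vary by program; simple averaging is shown here.)
    """
    if abs(first - second) <= tolerance:
        return ("accepted", (first + second) / 2)
    return ("resolution", None)
```

For example, ratings of 3 and 4 would be accepted and averaged, while ratings of 2 and 5 would be referred for a resolution reading.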

Certain score types, like state norms or comparisons, cannot be generated for reports until a minimum number of students' tests have been scored. P.L. 107-110 requires student score reports to show how students served by the local educational agency perform on the statewide assessment compared to students in the state as a whole [§1111(h)(2)(B)(i)(II)]. It also requires information that shows how a school's students performed on the statewide assessment compared to students in the LEA and the state as a whole [§1111(h)(2)(B)(ii)(II)]. These data for reporting cannot be generated until there is some way of obtaining the state performance data.

There are two ways of dealing with this need. The first is to use last year's data to generate state statistics for a preliminary set of score reports and then to use this year's data for a revised, later set of reports. While this decreases turnaround time for the initial reports, it creates confusion and necessitates double reporting, and might not meet the requirements of P.L. 107-110 (this has not yet been determined). Another possibility is to leave state comparisons off of the initial wave of reports and place them on a later wave of reports. Any innovation that allows state comparisons or statistics to be calculated more rapidly would be welcomed by the states.

21 Innovation 1.12: Distributed Scoring: A method for scoring open-ended responses more efficiently by either (a) capturing the student's written response using imaging technology, or (b) having the student enter responses directly into a computer, and sending the responses to trained scorers "at large" who can work from their homes or offices on their own schedules. The scorers score the items on their personal computers and relay the scores, which are then reconciled with the second scorer's and accepted or referred for a resolution scoring. The same quality control procedures are used as for hand scoring in scoring centers.


Some innovations previously discussed might help in this regard (e.g., Innovation 2.04, Multi-tiered Continuous Assessment Systems). The use of a good continuous assessment program, tied directly to the end-of-year components and the state standards, could make results available for summer school placement, fall class placement, and even as an aid in teacher self-evaluation, without having to wait for the end-of-year state assessment. In this way teachers would know right away after delivering a unit whether the students "got the lesson" or need to repeat it, instead of wondering at the end of the year why the class did not do well in a particular area. It also helps create an incentive for the teacher to actually deliver the state standards and curriculum, as well as a possible record to demonstrate that the opportunity to learn was present in each classroom. A challenge is that requiring a particular form of ongoing assessment might be viewed as overly prescriptive.

Need: Greater accuracy of scoring

Few things are more frustrating to states than errors in scoring that are discovered after reports have been distributed. Such errors were once rare, but the increased burden on the test scoring profession in recent years has created more errors than at any time in the past. Scoring errors have many sources, including incorrect scoring keys or rubrics, incorrect application of tabular data (norms or proficiency cutoffs), and human error in hand scoring. States and school districts have had to issue revised reports because of errors like these, and in some extreme cases students were retained in a grade, denied graduation, or sent to summer school based on incorrect score reports.

It is incorrect to assume that all scoring inaccuracies are the fault of the companies scoring the tests. In some cases, the states themselves or the test developers (sometimes different from the test scoring contractor) signed off on incorrect rubrics or keys. Regardless of the causes, states need to work with the testing companies to improve the accuracy of scoring to as close to 100% as possible.

Logistical changes can improve the accuracy of scoring, but they can create other problems. For example, adding time to the scoring vendor's schedule for additional checks and balances works against the need for faster turnaround. Adding more staff to the process increases costs that many states already feel are straining their budgets.

Some of the innovations already proposed can help with scoring accuracy. For example, human error can be reduced by having computers score open-ended items (Innovation 1.05), but this introduces new challenges in making sure that the computers are properly programmed and the right keys and rubrics are used. Decreasing the novelty of assessment programs by using common items, forms and item banks (Innovation 2.01) can also improve scoring accuracy, because it reduces the need to train scorers and set up new programs for every single state and form. Recently, some scoring companies have developed sophisticated logic that can catch errors before they are reported by applying artificial intelligence and algorithms22. To the extent that this innovation can be expanded, states would be well served.
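As an illustration of the kind of automated error-catching logic described above, a vendor's pipeline might flag any school whose aggregate result shifts implausibly from the prior year for human review before reports are released. The function name and the 15-point threshold below are hypothetical; production systems apply far more sophisticated, proprietary rules.

```python
def flag_anomalies(current_means, prior_means, max_shift=15.0):
    """Flag schools whose mean scale score moved implausibly far from
    last year's mean, so a human can review them before reports ship.

    Inputs are dicts mapping school_id -> mean scale score; the
    threshold is purely illustrative.
    """
    flagged = []
    for school, mean_now in sorted(current_means.items()):
        mean_prior = prior_means.get(school)
        if mean_prior is not None and abs(mean_now - mean_prior) > max_shift:
            flagged.append(school)
    return flagged
```

A school with a 50-point swing in its mean would be flagged for review, while ordinary year-to-year movement would pass through untouched.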

Another way that scoring quality might be improved would be the adoption by the testing companies of quality control standards that are widely accepted in other high-stakes industries. There are numerous standards that might be applicable, such as those of the International Organization for Standardization (ISO) and the Workflow Management Coalition (WMC). The former has promulgated the ISO 9000 standards, generic management system standards that give an organization a model to follow in setting up and operating a management system built on a firm foundation of state-of-the-art practices. There are also specific standards for software, communication and workflow that can be adopted by testing vendors. To the extent that certain companies distinguish themselves by adoption of, and certification in, accepted standards, states will gain confidence in the use of those vendors in their testing programs.

22 Innovation 1.11: Enhanced Quality Control Procedures: Multiple improvements in quality control in printing and scoring tests, including automated methods of monitoring quality control in scoring and detecting errors before they affect reporting, and application to the testing industry of accepted industry standards in manufacturing, project flow, software and communications.

INTERPRETATION

A major category of need for innovation is in the interpretation of tests. Interpretation starts with the score reports that are the main product of the entire assessment system. Innovations in reporting and interpretation are needed for all the uses of testing: understanding student performance, teacher performance, and school performance; evaluating programs; and diagnosing strengths and weaknesses.

Reporting needs fall into two categories: report design (what information the reports contain and how it is organized) and report accessibility (how the reports are presented to their audiences). The states need innovations in both these categories in order to use their assessment programs for improving the education process.

Need: Student tracking regardless of mobility

Longitudinal reports can serve a number of information needs: describing student, teacher and school performance, and program evaluation. Longitudinal reports describe progress from testing occasion to testing occasion, usually year to year and grade to grade. Tracking students and cohorts across years provides valuable information about student learning that cannot be obtained with a snapshot report.

In order to obtain useful longitudinal reports, the states need reliable ways of matching students, teachers and programs across years. This requires the testing program to collect or access demographic information without illegal invasion of privacy.

One of the innovations the states need is an easier way to track students, teachers and programs from year to year without placing unreasonable logistical burdens on teachers to fill in demographic information on answer sheets. Methods like pre-identification services, which match students' answer sheets with preloaded demographic files at the scoring center, have been used for some time. Not all states use these systems, however, because of their high cost, and they still result in many unmatched students. If no student is to be left behind, states need reliably generated longitudinal reports from their testing companies.
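One simple way to reduce unmatched records, sketched below under stated assumptions, is to build a normalized deterministic key from fields already present on the answer document. The function and field choices are illustrative only; real record linkage typically combines such keys with probabilistic matching and a stable state-assigned student identifier.

```python
import unicodedata

def match_key(last_name: str, first_name: str, birth_date: str) -> str:
    """Build a normalized key for linking a student's test records
    across years, tolerant of capitalization, punctuation, accents
    and stray whitespace in the name fields (illustrative only)."""
    def norm(s: str) -> str:
        # Decompose accented characters, then keep only letters/digits.
        s = unicodedata.normalize("NFKD", s)
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return f"{norm(last_name)}|{norm(first_name)}|{birth_date}"
```

Under this scheme "O'Brien" and "OBRIEN", or "José" and "Jose", produce the same key, so records gridded slightly differently in different years can still be matched.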

Innovation is also needed in tracking students regardless of mobility. If a student moves from one school to another in the same district, one district to another in the same state, or even from one state to another, there should be a way to track that student on some set of standards so that those responsible for his or her education can understand the student's progress. Furthermore, a student's progress in future years reflects back on the effectiveness of programs used to teach the student in earlier years, and this information should follow the student wherever he or she goes. This is often difficult because different testing vendors are used. What the states need is an innovative way of tracking the student's programs and performance that stays with the student, but is protected accordingly as private information, similar to medical information23.

Challenges to overcome for this innovation are that creating such a system may be politically or legally difficult in some states, and that any kind of pre-gridding operation creates a time lag between record preparation and testing; students who move in or out of school or between schools during this period make some of the records inaccurate on the day of testing.

Student performance

The fundamental goal of the state assessment program is improving student performance. To that end, the states need excellent reports of student performance that are accurate, insightful and easy to understand. This is also required by P.L. 107-110. The main consumers of student-level reports are the students themselves, the parents, and the teachers. None of these consumers should be expected to be expert in understanding testing data and statistics. Therefore, student-level reports must be simple to understand, without watering them down so they lack the valuable information the program can provide.

Need: Parent reports that make sense for parents

For years, test publishers have attempted to create better parent reports. Innovations such as the narrative report appeared in the late 1970s and have become more sophisticated over time. Narrative reports explain the student's scores in simple language. In the late 1980s, schools were given the opportunity to provide cartoon reports in which the speech balloons of the characters explained the test scores. As time went on and printing technology improved, the reports contained more and more graphics. States with large percentages of students whose parents did not speak English began providing reports in other languages, such as Spanish and Navajo. At first these languages were used only for static interpretive information preprinted on the back of the page, but eventually they were used for dynamic text that changed with the students' scores.

The need for better parent reports continues, especially in light of the more sophisticated assessments states are using today. As new concepts such as proficiency are introduced to parents, reports must provide the information in an easy-to-understand fashion.

An innovation the states would like to see is the creation of interactive, web-based reports (Innovation 1.01) that allow parents to delve as deeply as possible into the information about their child. Presentation of the data with links to interpretive information, graphs and even videos that explain the data would be helpful in drawing parents into the education of their child. The challenges associated with this innovation are accessibility for all parents, since some do not have computers with Internet connections, and protection of the confidentiality of the results. One possible solution to the former challenge is to make the results available on school computers for parents to come in and access. A solution to the latter is a sophisticated system of passwords and registrations similar to what is used today to access bank accounts and financial statements on the web. The interactive student reports can also be useful for teachers to better understand test scores and determine what to do about the results.

23 Innovation 1.04: Reliable System for Linking Test Records to Students: A set of affordable methods to reliably attach test results to students without unreasonable administrative effort, even when students are mobile.


Teacher performance

The decision to use student achievement data as a measure of teacher performance is still somewhat controversial, and not all states have endorsed this approach. However, the states that have done so need innovations in the provision of test information that help accomplish this goal.

Need: Fair and acceptable ways to measure teacher performance based upon results

A chief criticism of the use of student achievement data to measure teacher effectiveness concerns fairness. While it makes intuitive sense that teachers should be evaluated on how much students learn in their classrooms, how do we know which part of a student's achievement is attributable to a particular teacher? Various models have been proposed for using student data to evaluate teachers, and these need further research. Whatever models are adopted by the states, the testing companies should be able to provide the information that supports those models accurately and efficiently24. This might call for the creation of special longitudinal reports that include only students who have been with that teacher for the full school year. It might also include the matching of a teacher's students' data from year to year. Once we are able to accurately measure student achievement gains while under a teacher's instruction, and eliminate the spurious data, we can present that information effectively to educational leaders and teachers.
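A report of this kind might, as a minimal sketch, restrict the computation to full-year students before averaging gains. The record fields and the nine-month enrollment threshold below are assumptions made for illustration, not any particular state's value-added model.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class StudentRecord:
    teacher_id: str
    months_with_teacher: int  # enrollment under this teacher
    prior_score: float        # last year's scale score
    current_score: float      # this year's scale score

def teacher_gain(records, teacher_id, full_year_months=9):
    """Mean score gain for one teacher's students, restricted to those
    enrolled with that teacher for the full school year, so gains
    earned elsewhere are not attributed to the teacher."""
    gains = [r.current_score - r.prior_score
             for r in records
             if r.teacher_id == teacher_id
             and r.months_with_teacher >= full_year_months]
    return mean(gains) if gains else None
```

A student enrolled for only part of the year is simply excluded, which is one crude way of "eliminating spurious data"; real models weight or adjust such cases rather than dropping them.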

School and District performance

State assessments must be used to evaluate school and district performance. Since schools are often the unit of funding and administration for school districts, they are often the focus of rewards and sanctions that provide the drivers for educational improvement. Under P.L. 107-110, the school is the basic unit for which Adequate Yearly Progress is determined, though districts also face increased accountability. This calls for accurate reports of school and district performance from year to year that, like the teacher reports, eliminate spurious data that do not allow valid conclusions about school performance.

Need: Reports that make sense for educational leadership and the general community

Reports must provide the data that educational leaders need to evaluate school effectiveness according to the accepted models for evaluation. Reports that make sense to the general community are also needed, since these data are often presented to the public in newspapers, on websites, and in mailings to the community. Information that is complex and garbled causes the community to distrust the schools. Information that is clear and meaningful causes the community to trust and want to help the schools.

Methods that states have used for reporting include school report cards that are mailed home or placed on websites, community meetings and press releases. Report cards for districts and states are also required under P.L. 107-110 [§1111(h)].

States need more innovations in this area to continue to improve the clarity and timeliness of reports to the community on school performance, as well as district and state performance. Interactive websites are a good way to do this. Many businesses have developed very sophisticated websites that provide excellent information in an interactive, self-paced manner.

24 Innovation 1.15: Reports of Teacher Effectiveness: Valid reports of student gains while under the instruction of an individual teacher, with spurious data eliminated. Reports would aggregate data from multiple groups of students taught by the same teacher.


The e-learning profession has developed excellent interactive tools for individuals in schools and businesses to learn complex content. Assessments can be used to determine what the person already knows so that the materials can be tailored to his or her knowledge level. States need to have access to tools like these for informing their constituency about educational performance.

Program evaluation

If test results are merely used to report the status of schools and students and are not used to improve educational practice, they fall far short of their purpose. Program evaluation involves determining the efficacy of educational programs and activities. Which textbook programs work? Which technology programs are most effective? Which organizational schemes make the most sense?

Need: Reports and schemas that provide actionable information on which programs and practices work

States need reports and schemas (methods for working with the information) that lead to action. If the results are not available in the proper format at the proper time, or the interpretive schema is inappropriate or poorly applied, the results will not be effectively used to improve student learning.

This calls for a close interaction between test developers, scorers, and educational researchers so that the needed actionable data are provided efficiently and effectively. Each type of program evaluation requires the testing program to provide appropriate data. For example, the model promoted by Just For the Kids and the National Center for Educational Accountability requires statewide testing, standards, information on student demographics, exemptions and special education status, student-level data with linked records between databases, and access to the data. The requirements are actually more specific and can be found at http://www.just4kids.org/US/US_other-states.asp.

The Education Value-Added Assessment System, developed by William Sanders of Tennessee, requires annual testing with norm-referenced or criterion-referenced tests that possess certain psychometric features. Information can be found at http://www.sasinschools/evaas/index.shtml.

Another value-added model is one developed by Anthony Bryk of the Consortium on Chicago School Research (http://www.consortium-chicago.org/achvm.html), which also depends upon annual testing at every grade with a standardized test and numerous demographics.

The innovation needed is for testing companies to cooperate with researchers on the continued development of good program evaluation models25 and on reasonable ways to provide the information needed to feed these models.

Diagnostics

Most state assessment programs have provided a mechanism for obtaining diagnostic information, either from the state assessment or from a coordinated classroom assessment system. Many publishers have provided products that diagnose students' strengths and weaknesses on the state academic standards. P.L. 107-110 requires that the state assessments produce individual student diagnostic reports to parents. Furthermore, the assessments must be provided to districts, schools and teachers in a manner that is clear and useful for improving the educational achievement of individual students.

25 Innovation 1.13: Valid Program Evaluation Models: Improved models for using state assessment programs for program evaluation.

Need: Better diagnostic information that is actionable

Regardless of the specific requirements in P.L. 107-110, diagnostic information is required for educational improvement. One of the difficulties that states have faced in the past is the balance between providing detailed diagnostic information on objectives and maintaining test security. The more specific the information is about what the student can and cannot do, the more the items of the test are exposed. Releasing forms yearly drives up costs and puts additional strains on development resources. A more intrinsic difficulty is providing detailed, useful information on the basis of a single end-of-year assessment. Useful information can certainly be gleaned from such assessments, but there are limits.

States need innovative methods to provide detailed, targeted diagnostic information to students on the state academic standards without compromising the state assessment program's validity, security and cost. One possible method is a coordinated system of classroom-based assessments that is tied to the state standards and to the state tested objectives, but not to the state tested items26. Such diagnostic assessment addresses the need for detailed diagnostic information in a timelier manner and more often, when teachers can still intervene and correct instead of merely memorializing failure after the fact.

Numerous systems exist on the market today and are of varying quality and efficacy. Providers of these systems should be required to collect efficacy data on the effectiveness of those systems for improving student achievement.

The innovation of providing common banks of objectives, items and tests (Innovation 2.01) would go a long way toward satisfying the need for diagnosis. The key to the effective use of these tests for diagnosis is the type of data they provide. If those data are actionable and understandable, they would assist greatly in the improvement of student achievement.

APPLICATION

All of the innovations in testing programs (content, information, logistics, and interpretation) are only effective to the degree that the information is applied properly. Many of the aspects of applying testing information to the educational process are beyond the scope of this document, because they go past the testing program per se and move into the areas of curriculum development, staff development and pedagogy. To a certain extent, however, some innovations from the testing community can assist greatly in the appropriate application of assessment results to educational progress.

Improvement in practices

Improvement in educational practice comes from finding out what works well and doing it.

Need: Ways to identify processes and programs that work

In order to identify what works, the assessment data must be carefully targeted and clearly applicable to answering the question. The testing providers must be prepared to respond to knowledge gained from the education community about effective practice (such as research on reading, in particular, and instructional design more generally) and to inform their state clients of this knowledge and of the innovations they are putting into their testing tools and systems to reflect this improved knowledge base.

26 Innovation 1.02: Coordinated Diagnostic Tests: Systems of diagnostic, classroom-based assessments that are keyed to the state standards and assessments. These tests allow teachers to do formative evaluations throughout the year to improve student performance.

Improvement in testing quality

There are two types of information provided by a testing program: test scores and transaction data. Test scores have been addressed throughout this document. Transaction data consist of information on who took the tests, where and when, and how the test itself behaved. This kind of data provides for the improvement of the test instruments over time. Bad items and item types are eliminated, ineffective procedures are replaced, and misunderstood reports are revised.

Need: Ways to continually improve the quality of testing programs by learning

While publishers have long used transaction data to improve their published instruments, there has been less consistent application of transaction data to the improvement of custom state assessments. States need methods to learn from their successes and mistakes to improve the assessment system.

Some of the innovations discussed already can be of use in meeting this need. For example, computer-administered tests (Innovation 2.02) can capture and save every keystroke, and this information can be used to improve the test and the testing experience. Common item banks (Innovation 2.01) can store useful information on items beyond the traditional item statistics. Information websites (Innovation 1.01) can provide anecdotal information in an organized fashion so that state educators learn from each other about what works in assessment. Developing a constant improvement process will enhance the state assessment programs significantly.

Staff Development

Staff development for improving teaching is beyond the scope of this analysis, but a key area of staff development related to assessment is not. Teachers need to be made better makers and consumers of tests. This will only occur if educational leadership in a state develops a consensus that teachers need assessment training and motivates teachers to want it. Currently, teachers do not take available assessment courses in college because there are "too many other program requirements." To be effective, training in assessment may need to be required as part of staff development or re-certification.

Need: Staff development programs that make teachers better makers and consumers of tests

While high-stakes state assessments need to be professionally developed, there are many types of assessments that teachers develop. Teachers vary significantly in their training to develop valid, reliable tests, and many products and services exist to make it easier for teachers to develop or select tests from item banks and test banks. These products need to be tied to the state academic standards, but more importantly, teachers need staff development on how to use these assessments properly.

Staff development is time consuming and expensive, but fortunately new methods exist for effectively training teachers on testing. E-learning systems include synchronous (instructor-led), asynchronous (self-paced) and blended (a combination of both) methods. These programs need not be completely online, but can be a combination of online, video, CD-ROM and face-to-face programs. Testing companies and other entities should develop more comprehensive and more accessible teacher staff development programs using whatever methods work27. A teacher educated about assessment techniques and information use is a prerequisite to educational improvement.

In addition to staff development, a system of learning verification or certification28 would be very useful. Administrators would know that the teachers are properly trained in the things they need to know about assessment. Teachers will benefit from knowing how to use test results effectively. The certification program can be associated with rewards for the teachers who complete it (such as a stipend) and can be accompanied by informal pre-certification tests that the teachers can take on a computer, in private, to prepare for the official certification. This method is widely used in other professions.

BUSINESS

The last set of needs surrounds the business of state assessment programs. States almost always contract with test providers (publishers, testing service companies, training companies, and technology companies) to deliver parts of the state assessment program. States have a responsibility to their taxpayers to conduct their business efficiently and effectively, and providers have similar commitments to their stakeholders. Without customers, businesses cannot survive, and without providers of services, states cannot run testing programs. This interdependence motivates all parties to develop better business systems so that both groups can thrive.

Contractual arrangements

Currently, each state contracts with testing companies to provide products and services. State business laws require differences in contractual provisions and details from state to state, and testing companies are accustomed to this method of doing business and have systems to accommodate it.

Need: Ways to share business arrangements that work, rather than reinventing the past

The decentralized nature of the business arrangements between states and testing companies often results in wasted effort. Each state goes through the same process of negotiating with the testing companies that another state has just completed. Testing companies may be frustrated that they have to explain to each state individually the costs and logistical requirements of running testing programs. An innovative way of doing business that protects the independence and sovereignty of states, but allows them to learn from each other's contractual dealings so as to write fair contracts efficiently, would be very beneficial to the states and to the publishers as well.

An obstacle to this is federal competition law, which does not permit businesses to discuss or set prices with their competitors. One innovation worth pursuing is some way for states to share a body of contractual techniques and information,29 including methods of fair pricing, so that they can concentrate on running testing programs rather than spending taxpayers' money on contract negotiations and legal fees.

27 Innovation 2.06: Staff Development in Assessment Development and Use for Teachers: Exemplary programs that train teachers to develop, use, and understand assessments using e-learning techniques, either alone or in concert with instructor-led training.

28 Innovation 2.08: Teacher Certification in Assessment Knowledge: A portable certification of teacher competence in the development, use and interpretation of assessments.

29 Innovation 1.03: Collaborative State Resource for Assessment Systems: An organization serving as a resource on the legal defensibility of state assessments, possible timelines for testing program schedules, and state/vendor contractual techniques.


Increasing value

The burden of increased testing confronts limited funds at the state level. States need to get more for their money, i.e., greater value for their testing expenditures. This includes reducing costs and increasing utility.

Need: Better ratio of utility to cost

Cost reduction is relative. Increasing the number of grades or subject areas tested, expanding state tests to cover more academic standards, and ordering more score reports will certainly increase total cost, but states need a way to decrease the unit costs of items and of student scores. As the ratio of utility (what the program provides for the state that leads to educational improvement) to cost increases, so does value.
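The unit-cost arithmetic above can be sketched briefly; all dollar figures and score counts below are invented for illustration, not drawn from any actual state program:

```python
# Hypothetical illustration of total cost vs. unit cost (all figures invented).
# Expanding a testing program can raise total cost while lowering the
# cost per student score delivered, which is what improves value.

def unit_cost(total_cost, scores_delivered):
    """Cost per student score delivered."""
    return total_cost / scores_delivered

# Smaller program: fewer grades tested, so fixed costs spread over fewer scores.
small = unit_cost(total_cost=2_000_000, scores_delivered=200_000)   # $10.00 per score

# Expanded program: more grades and subjects spread fixed costs further.
large = unit_cost(total_cost=4_500_000, scores_delivered=600_000)   # $7.50 per score

assert large < small  # total cost rose, but unit cost fell
```

The point of the sketch is simply that "cheaper" should be judged per score, not per program: a program whose total budget more than doubles can still be the better value if it delivers proportionally more usable results.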

Innovations that might increase value include the following:

• Item banks that reduce the amount of novel item and test development for which a state must pay (Innovation 2.01)

• Replacement of paper-and-pencil tests (which require paper purchase, printing, handling, shipping, distribution, collection, scanning and storage or destruction) with computer-administered tests, when computer-administered tests become financially feasible (Innovation 2.02)

• Automated scoring of questions that currently require trained human readers (Innovation 1.05)

• Web-based reporting that saves the printing and distribution costs of reports (Innovation 1.01)

• E-learning programs that provide staff development more efficiently, particularly for remote locations (Innovation 2.06)

• Longitudinal tracking systems that allow more information to follow a migrant student, rather than discarding expensive test results because they cannot be matched (Innovation 1.04)

Each of these innovations was discussed elsewhere in this document and could result in greater cost-effectiveness in testing programs.
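The longitudinal tracking idea in the list above rests on a simple record-linkage step: a stable statewide student identifier lets a receiving district retrieve a migrant student's prior results instead of discarding them. A minimal sketch follows; the identifiers, scores, and function name are all hypothetical:

```python
# Minimal sketch of longitudinal record linkage (all data invented).
# A stable statewide student ID lets a student's new district retrieve
# prior test results rather than discarding unmatched records.

prior_results = {
    "AZ-0001": {"grade": 3, "reading_scale_score": 512},
    "AZ-0002": {"grade": 3, "reading_scale_score": 487},
}

def link_record(state_id, district_roster):
    """Attach a student's prior results to a new district's roster."""
    history = prior_results.get(state_id)
    if history is None:
        return None  # no match: flag for manual resolution, don't discard
    district_roster[state_id] = history
    return history

roster = {}
linked = link_record("AZ-0001", roster)
assert linked == {"grade": 3, "reading_scale_score": 512}
```

The design choice worth noting is the fallback: an unmatched record is flagged rather than thrown away, which is the behavior Innovation 1.04 asks systems to preserve.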

Protection

When assessment programs are high stakes, they create legal exposure for the states, districts and schools, as well as the testing companies. States and testing companies must work together to minimize the exposure to costly lawsuits and judgments.

Need: Better protection of the state education agency from litigation and liability due to errors and omissions of others

It is appropriate for each party to a contract to be responsible for its own errors and omissions, but each should be protected from the errors and omissions of others. Because state assessment efforts are expanding so quickly, new areas of exposure and new types of errors and omissions are constantly being discovered. States, in particular, need to be protected against any error or omission of others. By sharing contractual language regarding liability and protection (included in Innovation 1.03), states and testing companies can concentrate on improving the content and logistics of programs and not waste time and resources dealing with legal issues.


Acknowledgements

We thank the following individuals for their advice and input regarding this document, primarily through their participation at the Education Leaders Council Testing Summit, held February 20-21, 2002 in Austin, Texas. By including their names here, we acknowledge their contributions; this list should not be interpreted as necessarily implying any endorsement of the ideas contained in this document by the individuals or organizations listed herein. This document is a product of the participating states listed on the cover, the Education Leaders Council, and AccountabilityWorks.

Michaele Alcaro
Policy Specialist
Pennsylvania Department of Education

Dr. Arturo Almendarez
Deputy Commissioner for Programs and Instruction
Texas Education Agency

Leslye Arsht
President
StandardsWork

Virginia Belland
Vice-President, Sales, Eastern Division
EdVision

Sondra Bisig
Director of Curriculum Implementation
Lightspan, Inc.

Larry Bosley
Vice-President, Territory Sales
Vantage Learning

Stacey Boyd
President & CEO
Project Achieve, Inc.

Merri Brantley
Research and Policy Advisor, Office of the State Superintendent of Schools
Georgia Department of Education

Benjamin Brown, Ph.D.
Executive Director, Evaluation and Assessment
Tennessee Department of Education

Ron Carriveau
State Assessment Director
Arizona Department of Education

David Chayer
Vice President, Research and Test Development
Data Recognition Corporation

Maye Chen
Vice President of Operations
Project Achieve, Inc.

Mitchell D. Chester
Assistant Superintendent, Office of Assessment
Ohio Department of Education

Dr. Criss Cloudt
Associate Commissioner for Accountability
Texas Education Agency


Wilmer “Bill” Cody
NCES/Westat, Education Policy Studies

William Cox
Standard and Poors

Richard Cross
Director of Research
AccountabilityWorks

Keith Cruse
Managing Director
Texas Education Agency

Joel Dando
Manager, Public Affairs
Renaissance Learning Inc.

Jill Dannemiller
Associate Director of School Accountability
Ohio Department of Education

Alicia Diaz
Education Consultant, Policy Research and Analysis
Ohio Department of Education

Richard Dobbs
CTB McGraw Hill

Chrys Dougherty
Director of Research
National Center for Educational Accountability/Just 4 Kids

Dennis Doyle
Co-founder and Chief Academic Officer
Schoolnet, Inc.

Brad Duggan
Interim President and Executive Director
National Center for Educational Accountability/Just For Kids

Susan Engeleiter
President and Chief Operating Officer
Data Recognition Corporation

Thomas H. Fisher, Ed.D.
Educational Testing & Assessment Administrator
Florida Department of Education

Steve Fleischman
Executive Director
Education Quality Institute

Andrea Fuller
Vice President, Sales & Marketing
SMARTHINKING, Inc.

Matthew Gandal
Executive Vice President
Achieve, Inc.

Mark Gedlinske
Chief Technology Officer
Data Recognition Corporation

Christopher Gergen
President/Chairman and Co-founder
SMARTHINKING, Inc.

Dennis Gormley
ETS K-12 Works

Maureen E. Grazioli
Vice President, Sales and Marketing
Riverside Publishing

Ruth Grimes-Crump
Office of Elementary and Secondary Education
U.S. Department of Education

Dr. David J. Harmon
Director of Research, Evaluation, and Testing
Georgia Department of Education

Carolyn Haug
Policy Assessment
Colorado Department of Education

Dr. Hugh Hayes
Deputy Commissioner for Initiatives and Administration
Texas Education Agency


Mark Heidorn, Ph.D.
Director of Program Operations, Northern Region
CTB/McGraw-Hill

James M. Hill, III
Director, Computer-based Testing Marketing
Harcourt Educational Measurement

Steve Hodas
The Princeton Review

James “Jim” Horne
Secretary, Florida Board of Education

Shannon Housson
Director of Analysis and Reporting
Texas Education Agency

Steven Hoy
Vice President
Lightspan, Inc.

Gary Huggins
Executive Director
ELAC

Jeremy Hughes
Director, Office of Educational Assessment and Michigan Merit Award
Michigan Department of Treasury

Brian Jones
Vice President for Communication and Policy
Education Leaders Council

Debra Jones
Senior Intelligence Advisor
SAS in School

Dr. Margie Jorgensen
Vice President
Harcourt

John Katzman
CEO
The Princeton Review

Lisa Graham Keegan
CEO
Education Leaders Council

Richard Kohr
Measurement and Evaluation Supervisor
Pennsylvania Department of Education, Division of Evaluation and Reports

Carolyn Kostelecky
Assistant Vice President of Educational Services
ACT, Inc.

Steve Kromer
Vice President
NCS Pearson

Stephen Kutno
The Princeton Review

Jackie Lain
Associate Director
Standard and Poors

John E. Laramy, Ph.D.
President
Riverside Publishing

Cait Ngo Lee
Evaluation and Grants Coordinator
Lightspan, Inc.

Lenny Lock
Education Associate
Pennsylvania Department of Education, Division of Evaluation and Reports

Dr. Kathleen Madigan
Executive Director
National Council on Teacher Quality

Gary Mainor
President
NCS Pearson

Dewayne Matthews
Vice President
Education Commission of the States

Patricia McDivitt
Vice President, Educational Measurement & Test Development
ETS K-12 Works


Thomson W. McFarland
Research Associate
AccountabilityWorks

Brian McGee
Regional Vice President, Southern Region
Harcourt Educational Measurement

Georgia McKeown
Chief of Staff to Jim Horne
Florida Board of Education

Dr. Ron McMichael
Deputy Commissioner for Finance and Accountability
Texas Education Agency

Michael McNally
Vice-President of Business Development
Vantage Learning

Chad A. Miller
Policy Analyst
Education Leaders Council

Libbie L. Miller, Ph.D.
Statistical/Research Analyst
Lightspan

Dr. William J. Moloney
Commissioner of Education
Colorado Department of Education

Dr. Mark Moody
Assistant State Superintendent for Planning, Results, and Information Management
Maryland State Department of Education

Judith Morgan
Executive Assistant/Press for Faye Taylor
Tennessee Department of Education

Ina Mullis
Professor and Co-Director of the International Study Center
Lynch School of Education, Boston College

Dean H. Nafziger, Ph.D.
President
Harcourt Educational Measurement

Dr. Beverly Nash
Assistant Division Director, Research, Evaluation, and Testing
Georgia Department of Education

Jeff Nellhaus
Associate Commissioner for Student Assessment
Massachusetts Department of Education

Jim Nelson
Texas Commissioner of Education
Texas Education Agency

Jo O’Brien
Assistant to the Commissioner
Colorado Department of Education

Dr. Billie Orr
President
Education Leaders Council

John Oswald
Consultant
AccountabilityWorks

Shaundra Overmyer
Manager of State Assessment Programs
Data Recognition Corporation

Terrance (Terry) D. Paul
Co-Chairman
Renaissance Learning Inc.

Dr. Ron Peiffer
Assistant State Superintendent of Schools
Maryland State Department of Education

Heidi Perlman
Massachusetts Department of Education

Robert B. Peterson
President
C4SI, Inc.


Linda Pfister
President and CEO
ETS K-12 Works

Susan Phillips
Adviser and Consultant
AccountabilityWorks

Susan Pimentel
StandardsWork

Kelly Powell
ELC Consultant

Debbie Ratcliff
Senior Director of Communications
Texas Education Agency

Theodor Rebarber
President
AccountabilityWorks

Dr. June Rivers
Assistant Manager, Value-Added Assessment and Research
SAS in School

Marilyn Roberts
Michigan Department of Education

Ed Roeber
Vice President for External Relations
Measured Progress

Dr. Bill Sanders
Research Fellow, University of North Carolina
Manager, Value-Added Assessment and Research
SAS in School

Gary A. Schaeffer, Ph.D.
Director of Research Projects
CTB/McGraw-Hill

John Schilling
Chief of Staff
Education Leaders Council

Linda C. Schrenko
State Superintendent of Schools
Georgia Department of Education

Amanda Seals
Director of Media Relations, Office of the State Superintendent of Schools
Georgia Department of Education

Ramsay W. Selden
Vice President & Director, Assessment Program
American Institutes for Research

Anne Smisko
Associate Commissioner for Curriculum, Assessment, and Technology
Texas Education Agency

Malbert Smith
President
MetaMetrics, Inc.

Russell R. Smith
Managing Director
Rothary Landis Group

Larry Snowhite
Vice President, Government Relations
Houghton Mifflin Company

Dr. Lew Solmon
Senior Vice President
Milken Family Foundation

Patricia Spence
Director of Psychometric Services
Data Recognition Corporation

Bernia Stafford
Vice President of School Marketing and Evaluation
Lightspan, Inc.

Jack Stenner
Chairman and CEO
MetaMetrics, Inc.

John Stephens
Executive Director
National Assessment Governing Board

Phyllis Stolp
Director of Development and Administration
Texas Education Agency


Karen Stroup
Office of Management Services
Colorado Department of Education

Circe Stumbo
President
West Wind Enterprises

Leonard Swanson
Vice President, Technology
ETS K-12 Works

Robert W. Sweet, Jr.
Professional Staff Member
Committee on Education and the Workforce

Faye Taylor
Commissioner of Education
Tennessee Department of Education

Blake Thompson
Data Analysis Manager
GreatSchools.net

Cassandra Thompson
ETS K-12 Works

William “Bill” Tudor
Chairman & CEO
EdVision

Cynthia Ward
Vice President, Contract Management and Proposals
ETS K-12 Works

John Winn
Assistant Secretary
Florida Board of Education

Wendy Yen, Ph.D.
Vice President, Research
ETS K-12 Works

Dr. Michael Yoes
NCS Pearson

Philip B. Young, Ph.D.
Director, State Programs
Riverside Publishing

Victoria Young
Director of Instructional Coordination
Texas Education Agency

Jodie Zalk
Assessment Specialist
Massachusetts Department of Education

Molly Zebrowski
Manager of Business Development
NCS Pearson

Charles Zogby
Secretary of Education
Pennsylvania Department of Education