literature synthesis on curriculum-based measurement in ...shinn, 1989). research in reading focused...

THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007/PP. 85–120 85

Curriculum-based measurement (CBM) is a method for mon-itoring student growth in an academic area and evaluating theeffects of instructional programs on that growth (Deno, 1985).CBM was designed to be part of a problem-solving approachto special education whereby the academic difficulties of stu-dents would be viewed as problems to be solved rather thanas immutable characteristics within a child (Deno, 1990). In theproblem-solving approach, teachers were the “problem solvers”who constantly evaluated and modified students’ instructionalprograms. For a problem-solving approach to be effective, itwas necessary for teachers to have a tool that could be usedto evaluate growth in response to instruction. CBM was de-veloped to serve that purpose.

Two separate but related concerns drove the initial re-search into the development of CBM (Deno, 1985). The firstwas the concern for technical adequacy. If teachers were to usethe measures to make instructional decisions, the measureswould have to have demonstrated reliability and validity. Thesecond was the concern for practicality. If teachers were touse the measures on an ongoing and frequent basis to evalu-ate instructional programs, the measures would have to besimple, efficient, easily understood, and inexpensive. These dualconcerns led to the concept of “vital signs,” or indicators ofstudent performance (Deno, 1985). CBM measures were con-ceptualized to be short samples of work that would be indi-cators, or vital signs, of academic performance. The sampleswould need to be valid and reliable with respect to the broaderacademic domain they were representing, but would also needto be designed to be given on a frequent and repeated basis.

In 1989, Marston reviewed the existing research on CBM.At that time, CBM was viewed primarily as a progress-monitoring tool in basic skills for special education studentsat the elementary-school level (although there were discus-sions and instances of its uses more broadly, for example, seeShinn, 1989). Research in reading focused on two measures:word identification and reading aloud. The results of Marston’s

Literature Synthesis on Curriculum-Based Measurement in Reading

Miya Miura Wayman, Teri Wallace, Hilda Ives Wiley, Renáta Tichá, and Christine A. Espin,University of Minnesota

In this article, the authors review the research on curriculum-based measurement (CBM) in readingpublished since the time of Marston’s 1989 review. They focus on the technical adequacy of CBM re-lated to measures, materials, and representation of growth. The authors conclude by discussing issuesto be addressed in future research, and they raise the possibility of the development of a seamless andflexible system of progress monitoring that can be used to monitor students’ progress across students,settings, and purposes.

review provided support for the use of these two measures asindicators of general reading proficiency. In terms of reliabil-ity, results of five studies revealed test–retest reliability coef-ficients ranging from .82 to .97, with most coefficients above.90, and alternate-form reliability coefficients ranging from.84 to .96, with most coefficients above .90. Interrater agree-ment was .99. In terms of validity, 14 studies were reviewed.Criterion-related validity coefficients with published mea-sures of reading ranged from .63 to .90, with most above .80.Criterion-related validity coefficients with basal reading se-ries criterion mastery tests ranged from .57 to .86, with halfabove .80. Reading aloud correlated with teacher judgmentand with various measures of reading comprehension, dis-criminated between lower and higher performing students,and was sensitive to growth.

Since the time of Marston’s (1989) review, the researchon CBM has expanded considerably—especially in the areaof reading—making an updated review timely. Our purposein writing this review is to gather, summarize, and reflect onthe expansive body of literature published over the last 18years on CBM in reading. We focus this review on issues oftechnical adequacy as they relate to measures, materials, andgrowth. Given the vast amount of material and the diversityof topics covered in this review, we insert summaries and dis-cussion points throughout the article. In our final section, wedraw conclusions, raise issues related to future research, anddiscuss the potential development of a seamless and flexiblesystem of progress monitoring that can be used across stu-dents, settings, and purposes.

Method

The first step in our review process was to identify all articlesaddressing CBM in reading, writing, and math in kindergarten(K) to Grade 12. Electronic databases—including ERIC,

Address: Miya Miura Wayman, Department of Educational Psychology, 250D Burton Hall, 178 Pillsbury Dr. SE, University of Minnesota, Minneapolis, Minnesota 55455. E-mail: [email protected]

86 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007

Science Citation Index Expanded, PsycInfo, Digital Disser-tation, and the Expanded Academic Index—were searchedusing the following terms: curriculum based measurement,curriculum-based measurement, curriculum based measure,curriculum-based measure, general outcome measure, andprogress monitoring. This initial search yielded 578 articles,dissertations, and reports related to CBM. Titles and abstractsof these documents were screened to confirm that they wererelated to CBM, and Method sections were screened to iden-tify those that reported results of empirical studies of CBM,yielding 160 documents. These documents were then reviewedby a team of educational psychology graduate students andgrouped by subject area (reading, mathematics, spelling, andwriting). Ninety (56%) of the documents addressed readingmeasures. In addition to documents identified through the lit-erature search, the complete set of technical reports producedby the Institute for Research on Learning Disabilities (IRLD)at the University of Minnesota was accessed.

Given the vast database on reading, and the limitationsof space accorded a review article, we chose to narrow ourreview in several key ways. First, we focused on research pub-lished since the time of Marston’s 1989 review. Second, we fo-cused on studies related to the technical adequacy of readingmeasures. Studies in which teachers used CBM to monitorprogress and to make instructional decisions have been reviewedrecently (Stecker, Fuchs, & Fuchs, 2005). Third, we focusedon research conducted with school-age students and thus didnot include research on the development of early literacy (e.g.,Dynamic Indicators of Basic Early Literacy [DIBELS]; Ka-minski & Good, 1998) or preliteracy measures (e.g., Individ-ual Growth and Development Indicators; McConnell, McEvoy,& Priest, 2002). Finally, we focused on three common CBMreading measures: reading aloud, maze selection, and wordidentification (see Note 1). Before reporting the results of ourreview, we present a brief discussion of the issue of validity,relating it specifically to the approach taken to investigate thevalidity of measures in CBM.

Issues of ValidityMessick (1989a, 1989b) described construct validity as a mul-tifaceted, but unified, concept that takes into account the ev-idential and consequential factors related to test interpretationand test use. This conceptualization is helpful in understand-ing the research on CBM, which has examined the evidencesupporting the interpretation and use of CBM scores, as wellas the consequences associated with that interpretation anduse. In this review, we focus on the evidential basis for valid-ity of CBM measures, although, in keeping with the view ofvalidity as a unified concept, we continuously consider the po-tential consequences associated with interpretation and use ofthe measures.

In a CBM approach, evidence for the validity of a mea-sure is determined by examining the extent to which themeasure serves as a vital sign or an indicator of a broader aca-

demic domain (Deno, 1985). To determine the evidential basisfor the validity of a measure, the pattern of relations—also re-ferred to as the nomological net (see Cronbach & Meehl, 1955;Messick, 1989b)—between the selected measure and manydifferent criterion measures, each reflecting the construct ofinterest, is examined. Criterion measures might include othermeasures of the construct, such as standardized achievementtests, but might also include student age, group membership,and change in performance in response to an intervention. Itis not just the pattern of relations but also the pattern of non-relations, or the discriminative validity of the measure, that isconsidered. For example, a measure of reading would be ex-pected to relate more closely to another reading measure thanto a math measure. After a pattern of relations is establishedfor particular students, settings, or purposes, the generaliz-ability (Messick, 1989b) of the measure, or the extent to whichthe validity of the measure holds across different settings, stu-dents, and purposes, is examined.

Establishing the validity of a measure is an ongoing andrecursive process (Messick, 1989b). Validity is determined notby one study or by one correlation but by the body of evidenceamassed over time. The question arises, then, as to how onedetermines when the data are strong enough to support the useof the measure for the purpose for which it was intended. Inother words, how good is good enough? There is no set stan-dard for determining when a measure is “good enough” to beconsidered valid for the purpose for which it was designed.One approach is to consider the consequences of decisionmaking with and without the measure (Messick, 1989b). Forexample, one might ask whether using a measure improvesthe ability to make decisions over using no measure at all. Inan area in which few measures exist, this question might bewarranted. Under such circumstances, correlations of .30between the selected measure and the criterion measuresmight be considered strong enough to warrant use of an in-strument for decision-making purposes. However, in the areaof reading, where many measures exist, it seems appropriateto apply more stringent criteria. In addition, the early researchon CBM, as illustrated in Marston’s (1989) review, set a rel-atively high standard for validity and reliability, with manycorrelations reaching levels of .70 to .90. Thus, in this article,we adopt the following guidelines in interpreting the strengthof the reliability and validity coefficients: Strong relations arethose that are .70 and above; moderate relations are those thatare .50 to .70; and weak relations are those that are below .50.We remind the reader, however, that these levels are arbitrar-ily chosen and used merely to help the reader interpret thestrength of relations compared to previous research in read-ing in CBM. To evaluate the overall validity of the measures,it is necessary to consider the entire body of research.

Our review is organized into three sections: TechnicalAdequacy of CBM Measures, Effects of Text Materials, andIssues About Measuring Growth. The studies described ineach section are also outlined in Table 1. We begin our review

(text continues on p. 105)

87T

AB

LE

1.C

hara

cter

isti

cs o

f St

udie

s E

xam

inin

g Te

chni

cal F

eatu

res

of C

BM

in R

eadi

ng

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Rea

ding

Alo

ud

Fuch

s,Fu

chs,

&

354–

8SE

Rea

ding

alo

ud1

WR

CSt

anfo

rd A

chie

vem

ent T

est

Max

wel

l (19

88)

Rea

ding

Com

preh

ensi

on s

ubte

st:.

91W

ord

Skill

s su

btes

t:.8

0

Shin

n,G

ood,

Knu

tson

,23

83

& 5

ND

Rea

ding

alo

ud1

WR

CC

onfi

rmat

ory

fact

or a

naly

sis

& T

illy

(199

2)3

Uni

tary

Mod

el

Rea

ding

Com

pete

nce:

.88–

.90

5Tw

o-Fa

ctor

Mod

elD

ecod

ing:

.89–

.90

Com

preh

ensi

on:.

74–.

75

Hos

p &

Fuc

hs (

2005

)31

01–

4N

DR

eadi

ng a

loud

1W

RC

Woo

dcoc

k R

eadi

ng M

aste

ry T

est–

Rev

ised

Test

–ret

est

(WR

MT-

R)

1W

ord

Atta

ck:.

71W

ord

Iden

tific

atio

n:.9

1.9

2–.9

7Pa

ssag

e C

ompr

ehen

sion

:.79

Bas

ic S

kills

:.86

Tota

l Rea

ding

–Sho

rt:.

902

Wor

d A

ttack

:.82

Wor

d Id

entif

icat

ion:

.88

Pass

age

Com

preh

ensi

on:.

83B

asic

Ski

lls:.

89To

tal R

eadi

ng–S

hort

:.91

3W

ord

Atta

ck:.

82W

ord

Iden

tific

atio

n:.8

8Pa

ssag

e C

ompr

ehen

sion

:.84

Bas

ic S

kills

:.87

Tota

l Rea

ding

–Sho

rt:.

914

Wor

d A

ttack

:.72

Wor

d Id

entif

icat

ion:

.73

Pass

age

Com

preh

ensi

on:.

82B

asic

Ski

lls:.

78To

tal R

eadi

ng–S

hort

:.83

Kra

nzle

r,B

row

nell,

574

GE

Rea

ding

alo

ud1

WR

CE

lem

enta

ry C

ogni

tive

Task

s:−.

13&

Mill

er (

1998

)K

aufm

an B

rief

Int

elli

genc

e Te

st

Mat

rice

s:.2

4K

aufm

an T

est

of E

duca

tion

al A

chie

vem

ent

Rea

ding

Com

preh

ensi

on:.

41

Rea

ding

alo

ud e

xpla

ined

11%

of

the

vari

ance

w

hen

cont

rolli

ng f

or g

ener

al c

ogni

tive

abili

ty

and

men

tal s

peed

(tab

le c

onti

nues

)

88(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Mar

kell

& D

eno

423

GE

Rea

ding

alo

ud1

WR

CL

arge

incr

ease

s in

rea

ding

alo

ud s

core

s (1

997)

wer

e as

soci

ated

with

incr

ease

s in

per

form

-an

ce o

n m

aze

sele

ctio

n an

d co

mpr

ehen

sion

qu

estio

ns. A

n in

vers

e re

latio

nshi

p w

as f

ound

be

twee

n di

ffic

ulty

leve

l and

mea

n re

adin

g al

oud

scor

es.

Ham

ilton

& S

hinn

66

3N

DR

eadi

ng a

loud

1W

RC

Wor

d C

alle

rs p

erfo

rmed

low

er th

an d

id S

imi-

(200

3)la

rly

Flue

nt P

eers

on

Rea

ding

Alo

ud,M

aze

Sele

ctio

n,C

ompr

ehen

sion

,Ora

l Que

stio

n A

nsw

erin

g,an

d W

oodc

ock

Rea

ding

Mas

tery

Te

stPa

ssag

e C

ompr

ehen

sion

.

Maz

e Se

lect

ion

Esp

in,D

eno,

Mar

u-2,

604

1–6

ND

Maz

e se

lect

ion

1C

SR

eadi

ng A

loud

Patte

rn o

f gr

owth

yam

a,&

Coh

en (

1989

)fo

und

with

in a

nd

betw

een

grad

es1 2 3

.77

4.8

65

.86

6Fu

chs

& F

uchs

(19

92)

(18

wee

ks,2

tim

espe

r w

eek)

Yea

r 1

635.

12

SEM

aze

sele

ctio

n2.

5C

S.3

1 C

S pe

r w

eek

(ave

rage

gra

de)

Yea

r 2

635.

12

SEM

aze

sele

ctio

n2.

5C

S.2

9 C

S pe

r w

eek

(ave

rage

gra

de)

257

—G

EM

aze

sele

ctio

n2.

5C

S.3

9 C

S pe

r w

eek

Jenk

ins

& J

ewel

l (19

93)

335

2–6

ND

Maz

e se

lect

ion

1C

SG

ates

-Mac

Gin

itie

Tota

l Rea

ding

:.65

–.76

Patte

rn o

f gr

owth

G

ates

-Mac

Gin

itie

Com

preh

ensi

on:.

63–.

75fo

und

with

in a

nd

Met

ropo

lita

n A

chie

vem

ent T

est–

6 (M

AT-

6)

betw

een

grad

esTo

tal R

eadi

ng:.

66–.

76M

AT-

6 C

ompr

ehen

sion

:.60

–.74

Teac

her

judg

men

t:.5

6R

eadi

ng a

loud

1W

RC

Gat

es-M

acG

init

ieTo

tal R

eadi

ng:.

67–.

88G

ates

-Mac

Gin

itie

Com

preh

ensi

on:.

63–.

86M

AT-

6 To

tal R

eadi

ng:.

60–.

87M

AT-

6 C

ompr

ehen

sion

:.58

–.84

Teac

her

judg

men

t:.6

6(t

able

con

tinu

es)

89(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Ard

oin

et a

l. (2

004)

753

GE

Maz

e se

lect

ion

3C

SW

oodc

ock-

John

son–

III

Bro

ad R

eadi

ng:.

50R

eadi

ng F

luen

cy:.

51Pa

ssag

e C

ompr

ehen

sion

:.31

Let

ter

Wor

d Id

entif

icat

ion:

.43

Iow

a Te

st o

f B

asic

Ski

lls

Tota

l Rea

ding

:.46

Voc

abul

ary:

.35

Rea

ding

Com

preh

ensi

on:.

4977

3G

ER

eadi

ng a

loud

1W

RC

Woo

dcoc

k-Jo

hnso

n–II

I(s

ingl

e pa

ssag

e)B

road

Rea

ding

:.70

Rea

ding

Flu

ency

:.74

Pass

age

Com

preh

ensi

on:.

42L

ette

r W

ord

Iden

tific

atio

n:.6

2Io

wa

Test

of

Bas

ic S

kill

sTo

tal R

eadi

ng:.

64V

ocab

ular

y:.3

5R

eadi

ng C

ompr

ehen

sion

:.58

Wor

d Id

entif

icat

ion

Dal

y,W

righ

t,K

elly

,30

1G

EW

ord

1W

RC

Con

curr

ent:

Test

–ret

est

& M

arte

ns (

1997

)id

entif

icat

ion

Woo

dcoc

k-Jo

hnso

n–R

evis

ed

.94

Bro

ad R

eadi

ng:.

40Pr

edic

tive

Wor

d Id

entif

icat

ion:

.71

Rea

ding

Alo

ud:.

73Fu

chs,

Fuch

s,&

15

11

LA

Wor

d 1

WR

CC

oncu

rren

tC

ompt

on (

2004

)id

entif

icat

ion

Woo

dcoc

k R

eadi

ng M

aste

ry T

est–

Rev

ised

(WR

MT-

R)

Wor

d A

ttack

:.52

–.59

WR

MT-

R W

ord

Iden

tific

atio

n:.7

7–.8

2C

ompr

ehen

sive

Rea

ding

Ass

essm

ent

Bat

tery

(CR

AB

) Fl

uenc

y:.9

3C

RA

B C

ompr

ehen

sion

:.73

Pred

ictiv

e W

RM

T-R

Wor

d A

ttack

:.45

Wor

d Id

entif

icat

ion:

.63

CR

AB

Flu

ency

:.80

CR

AB

Com

preh

ensi

on:.

66Sl

ope

(1 y

ear)

WR

MT-

R W

ord

Atta

ck:.

50W

RM

T-R

Wor

d Id

entif

icat

ion:

.79

CR

AB

Flu

ency

:.85

CR

AB

Com

preh

ensi

on:.

66(t

able

con

tinu

es)

90(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Com

pton

,Fuc

hs,

206

1L

AW

ord

1W

RC

Pred

ictiv

e Fu

chs,

& B

ryan

t id

entif

icat

ion

Gra

de 1

(2

006)

5-w

eek

wor

d ID

leve

l:.8

05-

wee

k w

ord

ID s

lope

:.42

Gra

de 2

Unt

imed

Wor

d-L

evel

Rea

ding

:.39

Tim

ed W

ord-

Lev

el R

eadi

ng:.

46R

eadi

ng C

ompr

ehen

sion

:.42

Com

posi

te S

core

:.47

Gen

eral

izab

ility

of

Res

earc

h:St

uden

t Pop

ulat

ions

and

Pur

pose

s

Esp

in &

Den

o 12

110

HA

& L

AR

eadi

ng

1W

RC

(199

3a)

alou

d (E

nglis

h)H

ATe

sts

of A

chie

vem

ent

and

Pro

ficie

ncy

(TA

P) C

ompl

ete

Com

posi

te:.

36G

rade

poi

nt a

vera

ge:−

.02

Eng

lish

stud

y ta

sk:.

34Sc

ienc

e st

udy

task

:.43

LA

TAP–

Com

plet

e C

ompo

site

:.71

Gra

de p

oint

ave

rage

:.39

Eng

lish

stud

y ta

sk:.

54Sc

ienc

e st

udy

task

:.36

Esp

in &

Den

o12

110

ND

Rea

ding

alo

ud1

WR

CA

ltern

ate-

form

.91

(199

3b)

(Eng

lish)

Test

–ret

est .

91

Esp

in &

Den

o 12

010

ND

Rea

ding

alo

ud1

WR

CE

nglis

h st

udy

task

:.41

(199

5)(E

nglis

h)V

ocab

ular

y 10

CM

.44

mat

chin

g (E

nglis

h)R

eadi

ng

1W

RC

Scie

nce

stud

y ta

sk:.

32al

oud

(Sci

ence

)V

ocab

ular

y 10

CM

.40

Mat

chin

g (S

cien

ce)

Few

ster

& M

acM

illan

46

56–

7G

ER

eadi

ng a

loud

1W

RC

8th-

grad

e E

nglis

h m

ark:

.46

(200

2)8t

h-gr

ade

Soci

al S

tudi

es m

ark:

.39

Rea

ding

alo

ud1

WR

C9t

h-gr

ade

Eng

lish

mar

k:.3

89t

h-gr

ade

Soci

al S

tudi

es m

ark:

.36

Rea

ding

alo

ud1

WR

C10

th-g

rade

Eng

lish

mar

k:.3

110

th-g

rade

Soc

ial S

tudi

es m

ark:

.30

(tab

le c

onti

nues

)

91(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Yov

anof

f,D

uesb

ery,

6,01

24–

8N

DR

eadi

ng a

loud

1W

RC

Alo

nzo,

& T

inda

l (2

005)

4V

ocab

ular

y:.3

5–.6

3C

ompr

ehen

sion

:.60

–.65

5V

ocab

ular

y:.6

1–.6

5C

ompr

ehen

sion

:.60

–.62

6V

ocab

ular

y:.6

0–.6

1C

ompr

ehen

sion

; .42

–.52

7V

ocab

ular

y:.4

9–.5

3C

ompr

ehen

sion

; .42

–.48

8V

ocab

ular

y:.5

3–.5

7C

ompr

ehen

sion

:.51

–.52

Rel

ativ

e im

port

ance

of

flue

ncy

decr

ease

s as

gr

ade

incr

ease

s. V

ocab

ular

y is

a s

igni

fica

nt

pred

icto

r ac

ross

gra

des.

Esp

in &

Foe

gen

176

6–8

ND

Rea

ding

alo

ud1

WR

CC

ompr

ehen

sion

:.57

(199

6)A

cqui

sitio

n:.5

4R

eten

tion:

.52

Maz

e se

lect

ion

2C

SC

ompr

ehen

sion

:.56

Acq

uisi

tion:

.59

Ret

entio

n:.6

2V

ocab

ular

y m

atch

ing

5C

MC

ompr

ehen

sion

:.65

Acq

uisi

tion:

.64

Ret

entio

n:.6

2

Hix

son

& M

cGlin

chey

44

24

ND

Rea

ding

alo

ud1

WR

CM

ichi

gan

Edu

cati

onal

Ass

essm

ent

Pro

gram

(200

4)(M

EA

P) R

eadi

ngC

auca

sian

:.54

Afr

ican

Am

eric

an:.

54Pa

id L

unch

:.53

Free

/Red

uced

Lun

ch:.

50M

etro

poli

tan

Ach

ieve

men

t Tes

t–7

(MA

T-7)

Cau

casi

an:.

78A

fric

an A

mer

ican

:.64

Paid

Lun

ch:.

75Fr

ee/R

educ

ed L

unch

:.66

Rea

ding

alo

ud,l

unch

sta

tus,

and

race

eac

h si

gnif

ican

tly c

ontr

ibut

ed to

pre

dict

ing

perf

orm

-an

ce o

n M

EA

P an

d M

AT-

7

(tab

le c

onti

nues

)

92(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Kra

nzle

r,M

iller

,&

326

2–5

GE

Rea

ding

alo

ud1

WR

CA

t Gra

des

4–5,

sign

ific

ant i

nter

cept

bia

ses

Jord

an (

1999

)w

ere

foun

d ba

sed

on r

ace/

ethn

icity

. At

Gra

de 5

,sig

nifi

cant

inte

rcep

t and

slo

pe

bias

was

fou

nd b

ased

on

gend

er. N

o bi

ases

fo

und

at G

rade

s 2–

3.

Hin

tze,

Cal

laha

n,13

62–

5G

ER

eadi

ng a

loud

1W

RC

Age

and

rea

ding

alo

ud e

ach

sign

ific

antly

M

atth

ews,

Will

iam

s,co

ntri

bute

d w

hen

pred

ictin

g pe

rfor

man

ce o

n &

Tob

in (

2002

)W

oodc

ock

John

son–

Rev

ised

Rea

ding

Com

pre-

he

nsio

n. R

ace

and

soci

oeco

nom

ic s

tatu

s di

d

not

cont

ribu

te s

igni

fica

ntly

whe

n ad

ded

to

and

read

ing

alou

d m

odel

.

Bak

er &

Goo

d 76

2G

E/E

LL

Rea

ding

alo

ud1

WR

CC

onve

rgen

t Con

stru

ctE

nglis

h-on

ly(1

0 w

eeks

,(1

995)

2 tim

es p

er w

eek)

Eng

lish-

only

Poin

t .87

Lev

el .9

9Sl

ope

.39

Stan

ford

Dia

gnos

tic

Rea

ding

Tes

t Si

gnif

ican

tly d

if-

(SD

RT

)–To

tal S

core

:.51

fere

nt s

lope

s w

ere

foun

d fo

r E

nglis

h-

only

and

bili

ngua

l st

uden

ts w

ith b

i-

lingu

al s

tude

nts

show

ing

grea

ter

grow

th.

Stan

ford

Rea

ding

Com

preh

ensi

on

subt

est–

pret

est:

.56

Bili

ngua

lPo

int .

92L

evel

.99

Slop

e .4

9Te

ache

r ra

ting:

.82

Bili

ngua

lSD

RT

–Tot

al S

core

:.53

Stan

ford

Rea

ding

Com

preh

ensi

on

subt

est–

pret

est:

.73

Stan

ford

Rea

ding

Com

preh

ensi

on

subt

est–

post

test

:.76

Teac

her

ratin

g of

rea

ding

:.80

Dis

crim

inan

t Con

stru

ctE

nglis

h-on

lyE

nglis

h L

angu

age

Flue

ncy:

.54

Teac

her

ratin

g of

Eng

lish:

.62

(tab

le c

onti

nues

)

93(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Bili

ngua

lE

nglis

h L

angu

age

Flue

ncy:

.44

Teac

her

ratin

g of

Eng

lish:

.62

Lan

guag

e A

sses

smen

t Sca

le–E

nglis

h:.4

7

Wile

y &

Den

o (2

005)

3 &

5G

E &

EL

LM

inne

sota

Com

preh

ensi

ve A

sses

smen

t(M

CA

)36

3G

ER

eadi

ng a

loud

1W

RC

.71

Maz

e se

lect

ion

1C

S.7

3E

LL

Rea

ding

alo

ud.6

1M

aze

sele

ctio

n.5

233

5G

ER

eadi

ng a

loud

.57

Maz

e se

lect

ion

.73

EL

LR

eadi

ng a

loud

.69

Maz

e se

lect

ion

.57

Gra

ves,

Plas

enci

a-77

1E

LL

,R

eadi

ng a

loud

WR

C(6

wee

ks,w

eekl

y)Pe

inad

o,D

eno,

&

HA

,Jo

hnso

n (2

005)

AA

,&

L

AE

LL

/HA

1.40

WR

C

per

wee

kE

LL

/HA

2.30

WR

C

per

wee

kE

LL

/2.

10 W

RC

L

Ape

r w

eek

Kle

in &

Jim

erso

n 4,

000

1–3

GE

Rea

ding

alo

ud1

WR

CSt

anfo

rd A

chie

vem

ent T

est–

9To

tal

(200

5)R

eadi

ng,c

oncu

rren

tH

ispa

nic:

.71–

.82

Cau

casi

an:.

63–.

82Fe

mal

e:.7

5–.8

2M

ale:

.73–

.85

Free

/Red

uced

Lun

ch:.

73–.

82R

egul

ar L

unch

:.64

–.83

Hom

e L

angu

age

Span

ish:

.70–

.82

Hom

e L

angu

age

Eng

lish:

.67–

.82

SAT-

9 To

tal R

eadi

ng,p

redi

ctiv

eH

ispa

nic:

.66

Cau

casi

an:.

58Fe

mal

e:.7

2M

ale:

.62

Free

/Red

uced

Lun

ch:.

62

(tab

le c

onti

nues

)

94(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Reg

ular

Lun

ch:.

70H

ome

Lan

guag

e Sp

anis

h:.6

5H

ome

Lan

guag

e E

nglis

h:.6

3In

terc

ept b

ias

evid

ent w

hen

ethn

icity

and

ho

me

lang

uage

fac

tors

wer

e co

mbi

ned

Mor

gan

& B

radl

ey-

153–

7V

IR

eadi

ng a

loud

WR

CA

ltern

ate

form

John

son

(199

5)1

Dia

gnos

tic

Rea

ding

Sca

les

.88–

.93

Dec

odin

g L

evel

:.60

2.6

6.9

4–.9

83

.65

.94–

.96

Test

–ret

est

1D

iagn

osti

c R

eadi

ng S

cale

s .8

4–.9

4C

ompr

ehen

sion

Lev

el:.

782

.80

.93–

.96

3.8

2.9

3–.9

71

Dia

gnos

tic

Rea

ding

Sca

les

Ove

rall

Rea

ding

Lev

el:.

762

.80

3.8

0

Alli

nder

& E

ccar

ius

36—

D

/HH

Rea

ding

alo

udW

RC

The

Tes

t of

Ear

ly R

eadi

ng A

bili

ty—

Dea

f A

ltern

ate

form

(199

9)an

d H

ard

of H

eari

ng V

ersi

on1

.30

.85

3.2

1.9

4

Cra

wfo

rd,T

inda

l,&

51

2 &

3G

E1

WR

CSt

atew

ide

read

ing

asse

ssm

ent

Stie

ber

(200

1)(m

ultip

le-c

hoic

e)2

Pred

ictiv

e:.6

63

Con

curr

ent:

.60

Goo

d,Si

mm

ons,

&

706

1 &

3G

ER

eadi

ng a

loud

1W

RC

Ben

chm

ark

goal

of

40 W

RC

in 1

min

in

Kam

e’en

ui (

2001

)G

rade

1 p

redi

cted

con

tinue

d re

adin

g su

cces

s in

Gra

de 2

. Ben

chm

ark

goal

of

110

WR

C in

1

min

in G

rade

3 p

redi

cted

suc

cess

on

the

Ore

gon

Stat

ewid

e A

sses

smen

t.

Hin

tze

& S

ilber

glitt

1,

766

1–3

ND

Rea

ding

alo

ud1

WR

CM

inne

sota

Com

preh

ensi

ve A

sses

smen

t A

ltern

ate

form

(200

5)(M

CA

) 1

Pred

ictiv

e:.4

9–.5

8.8

9–.9

12

Pred

ictiv

e:.6

1–.6

8.8

0–.8

53

Con

curr

ent:

.69

.83–

.87

(tab

le c

onti

nues

)

95(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Res

ults

of

disc

rim

inat

ive

anal

ysis

,log

istic

re

gres

sion

,and

Rec

eive

r O

pera

ting

Cha

ract

er-

istic

(R

OC

) cu

rves

indi

cate

d th

at r

eadi

ng a

loud

is

an

effi

cien

t met

hod

for

pred

ictin

g su

cces

s on

th

e M

CA

.

McG

linch

ey &

1,

362

4N

DR

eadi

ng a

loud

1W

RC

Mic

higa

n E

duca

tion

al A

sses

smen

t P

rogr

am

Test

–ret

est

Hix

son

(200

4)(M

EA

P):.

49–.

81.8

7–.9

5Se

nsiti

vity

= 7

5%Sp

ecif

icity

= 7

4%C

orre

ct c

lass

ific

atio

n =

74%

ME

AP

base

fai

lure

rat

e =

54%

ME

AP

base

pas

s ra

te =

46%

Stag

e &

Jac

obso

n 17

34

ND

Rea

ding

alo

ud1

WR

CW

ashi

ngto

n A

sses

smen

t of

Stu

dent

Lea

rnin

g(2

001)

(WA

SL)

Lev

el:.

51Se

nsiti

vity

= 6

6%Sp

ecif

icity

= 7

6%C

orre

ct c

lass

ific

atio

n =

74%

Bas

e fa

ilure

rat

e =

20%

Bas

e pa

ss r

ate

= 8

0%

Silb

ergl

itt &

Hin

tze

2,19

11–

3G

ER

eadi

ng a

loud

1W

RC

Min

neso

ta C

ompr

ehen

sive

Ass

essm

ent

(MC

A)

(200

5)1

Pred

ictiv

e:.5

72

Pred

ictiv

e:.6

73

Con

curr

ent:

.71

Log

istic

reg

ress

ion

and

RO

C c

urve

ana

lysi

s w

ere

the

stro

nges

t met

hods

for

est

ablis

hing

cut

sco

res.

R

OC

cur

ve a

naly

sis

was

the

mos

t fle

xibl

e.

Cur

ricu

lum

Eff

ects

Tin

dal,

Mar

ston

,Den

o,66

01–

6G

ER

eadi

ng a

loud

1W

RC

Dif

fere

nt c

urri

cula

pro

duce

d m

ean

leve

l dif

fer-

& G

erm

ann

(198

2,W

ord

iden

tific

atio

nen

ces

in r

eadi

ng a

loud

and

wor

d id

entif

icat

ion

IRL

D #

93)

scor

es. A

ll cu

rric

ula

show

ed a

cros

s gr

ade

grow

th.

Tin

dal,

Flic

k,&

12

2–5

SER

eadi

ng a

loud

1W

RC

(31

wee

ks,1

–2

Col

e (1

992)

times

per

wee

k);

stud

ent p

erfo

rm-

ance

on

pass

ages

from

inst

ruct

iona

lm

ater

ial w

as s

imi-

lar

to p

erfo

rman

ceon

pas

sage

s fr

omm

ains

trea

m

mat

eria

l.(t

able

con

tinu

es)

96(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Fuch

s &

Den

o 91

1–6

ND

Rea

ding

alo

ud1

WR

CW

oodc

ock

Rea

ding

Mas

tery

Tes

tPa

ssag

e (1

992)

Com

preh

ensi

onG

inn

720

.89–

.93

(pre

prim

er–7

)Sc

ott-

Fore

sman

.9

0–.9

3U

nlim

ited

(pre

prim

er–6

)

Hin

tze,

Shap

iro,

572–

4N

DR

eadi

ng a

loud

1W

RC

The

Deg

rees

of

Rea

ding

Pow

er T

est

Con

te,&

Bas

ile

Aut

hent

ic tr

ade

.64–

.69

(199

7)bo

oks

(Lev

els

1–5)

Lite

ratu

re-b

ased

.6

2–.6

9ba

sal (

Lev

els

1–5)

Har

tman

& F

ulle

r 26

1G

ESt

anfo

rd A

chie

vem

ent T

est

Test

–ret

est

(199

7).9

1–.9

924

2R

eadi

ng a

loud

1W

RC

Wor

d R

eadi

ng S

kills

:.72

Rea

ding

Com

preh

ensi

on:.

89W

ord

iden

tific

atio

n1

WR

CW

ord

Rea

ding

Ski

lls:.

71R

eadi

ng C

ompr

ehen

sion

:.82

453

Rea

ding

alo

ud1

WR

CW

ord

Rea

ding

Ski

lls:.

71R

eadi

ng C

ompr

ehen

sion

:.80

Wor

d id

entif

icat

ion

1W

RC

Wor

d R

eadi

ng S

kills

:.52

Rea

ding

Com

preh

ensi

on:.

56

Bro

wn-

Chi

dsey

,21

5N

DM

aze

sele

ctio

n 2

CS

Con

curr

ent L

itera

ture

-Bas

ed P

assa

ge:.

74–.

92C

ontr

olle

d an

d Jo

hnso

n,&

co

ntro

lled

pass

age

liter

atur

e-ba

sed

pas-

Fern

stro

m (

2005

)sa

ges

show

ed s

igni

fi-

cant

gro

wth

acr

oss

fall,

win

ter,

and

spri

ng.

Mea

n sc

ores

wer

e co

nsis

tent

ly h

ighe

r fo

r co

ntro

lled

pass

ages

.

Bra

dley

-Klu

g,2

& 5

GE

Rea

ding

alo

ud1

WR

CSh

apir

o,L

utz,

&

282

Lite

ratu

re =

1.3

6 D

uPau

l (19

98)

WR

C p

er w

eek

Bas

al =

1.3

2 W

RC

per

wee

k30

5L

itera

ture

= 0

.84

WR

C p

er w

eek

Bas

al =

0.3

2 W

RC

per

wee

k(1

0 w

eeks

,2

times

per

wee

k)(t

able

con

tinu

es)

97(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Hin

tze,

Shap

iro,

&

243

GE

Rea

ding

alo

ud1

WR

CPa

ralle

l for

m(9

wee

ks,2

tim

esL

utz

(199

4)pe

r w

eek)

Lite

ratu

re-b

ased

L

itera

ture

Gro

up =

pa

ssag

e−1

.04

WR

C p

erpe

r w

eek

.56–

.95

Tra

ditio

nal

Gro

up =

-0.

34

WR

C p

er w

eek

Tra

ditio

nal b

asal

L

itera

ture

Gro

up =

pa

ssag

e0.

66 W

RC

per

wee

k.6

0–.9

6Tr

aditi

onal

Gro

up =

1.

72 W

RC

per

w

eek

Hin

tze

& S

hapi

ro

160

2–5

GE

Rea

ding

alo

ud1

WR

CPa

ralle

l for

m(8

wee

ks,

(199

7).6

5–.9

7 2

times

per

wee

k)2

Lite

ratu

re-b

ased

L

itera

ture

Gro

up =

pa

ssag

e0.

56 W

RC

per

w

eek

Tra

ditio

nal

Gro

up =

0.6

8 W

RC

per

wee

kT

radi

tiona

l bas

al

Lite

ratu

re G

roup

=

pass

age

−0.4

6 W

RC

per

w

eek

Tra

ditio

nal

Gro

up =

−0.

26

WR

C p

er w

eek

3L

itera

ture

-bas

ed

Lite

ratu

re G

roup

=

pass

age

1.81

WR

C p

er

wee

kT

radi

tiona

l G

roup

= 1

.58

WR

C p

er w

eek

Tra

ditio

nal b

asal

L

itera

ture

Gro

up =

pa

ssag

e0.

67 W

RC

per

w

eek

Tra

ditio

nal

Gro

up =

0.3

8 W

RC

per

wee

k4

Lite

ratu

re-b

ased

L

itera

ture

Gro

up =

pa

ssag

e2.

69 W

RC

per

w

eek

(tab

le c

onti

nues

)

98(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Tra

ditio

nal

Gro

up =

2.2

9 W

RC

per

wee

kT

radi

tiona

l bas

al

Lite

ratu

re G

roup

=pa

ssag

e1.

72 W

RC

per

w

eek

Tra

ditio

nal

Gro

up =

1.9

0 W

RC

per

wee

k5

Lite

ratu

re-b

ased

L

itera

ture

Gro

up =

pass

age

1.18

WR

C p

er

wee

kT

radi

tiona

l G

roup

= 2

.05

WR

C p

er w

eek

Tra

ditio

nal b

asal

L

itera

ture

Gro

up =

pass

age

1.72

WR

C p

er

wee

kT

radi

tiona

l G

roup

= 2

.82

WR

C p

er w

eek

Pow

ell-

Smith

&

362

LA

/R

eadi

ng a

loud

1W

RC

Bas

al =

2.6

9 W

RC

B

radl

ey K

lug

(200

1)C

h. 1

per

wee

kG

ener

ic =

2.4

1 W

RC

per

wee

k(5

wee

ks,2

tim

es

per

wee

k)

Rile

y-H

elle

r,K

elly

-13

2L

AR

eadi

ng a

loud

1W

RC

.65

WR

C p

er d

ayV

ance

,& S

hriv

er

Gen

eric

=

(200

5)C

urri

culu

m-

depe

nden

t(5

wee

ks,2

tim

es

per

wee

k)

Dif

ficu

lty L

evel

Shin

n,G

leas

on,&

30

3–8

SE/C

h. 1

R

eadi

ng a

loud

1W

RC

(4 w

eeks

,4 ti

mes

Tin

dal (

1989

)pe

r w

eek)

1 be

low

gra

de

leve

l = 4

.30

WR

Cpe

r w

eek

1 ab

ove

grad

e le

vel =

3.7

0 W

RC

per

wee

k(t

able

con

tinu

es)

99(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

2 ab

ove

grad

e le

vel =

4.5

5 W

RC

pe

r w

eek

4 ab

ove

grad

e le

vel =

2.3

5 W

RC

pe

r w

eek

Dun

n &

Eck

ert (

2002

)20

2–3

GE

Rea

ding

alo

ud1

WR

CSi

mila

r m

ater

ial

.65

WR

C p

er w

eek

Cha

lleng

ing

mat

e-ri

al .9

2 W

RC

per

w

eek

(8 w

eeks

,2 ti

mes

pe

r w

eek)

Hin

tze,

Dal

y,&

80

1–4

ND

Rea

ding

alo

ud1

WR

C(1

1 w

eeks

,2Sh

apir

o (1

998)

times

per

wee

k)1

Gra

de L

evel

=

3.29

WR

C p

er w

eek

Goa

l Lev

el =

1.9

5 W

RC

per

wee

k2

Gra

de L

evel

= .7

2 W

RC

per

wee

kG

oal L

evel

= .3

0 W

RC

per

wee

k3

Gra

de L

evel

= .1

6 W

RC

per

wee

kG

oal L

evel

= .0

6 W

RC

per

wee

k4

Gra

de L

evel

=

1.57

WR

C p

er w

eek

Goa

l Lev

el =

1.8

5 W

RC

per

wee

k

Fuch

s,T

inda

l,&

202-

4G

EW

ord

5W

RC

Slop

e (2

wee

ks,

Den

o (1

981,

IRL

D #

48,

(rea

ding

id

entif

icat

ion

.da

ily):

Gra

de-

Stud

y II

I)in

stru

ctio

nal

Spec

ific

Dom

ain

≠le

vel)

Ent

ire

Gra

de

Dom

ain

Ent

ire

Gra

de

Dom

ain

≠ A

cros

s G

rade

Dom

ain

Stan

dard

Err

or o

f E

stim

ates

:

(tab

le c

onti

nues

)

100(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Gra

de-s

peci

fic

dom

ain

≠E

ntir

e gr

ade

dom

ain

Ent

ire

grad

e do

mai

n =

acr

oss

grad

ePo

ncy,

Skin

ner,

&

373

ND

Rea

ding

alo

ud1

WR

CSt

uden

t ski

lls

Axt

ell (

2005

)ac

coun

ted

for

81%

of

the

vari

ance

,and

pa

ssag

e ac

coun

ted

for

10%

of

the

vari

ance

.

Hin

tze

& C

hris

t 99

2–5

ND

Rea

ding

alo

ud1

WR

C(1

1 w

eeks

,(2

004)

2 tim

es p

er w

eek)

2U

ncon

trol

led

=

−.06

WR

C p

er

wee

k C

ontr

olle

d =

.25

WR

C p

er w

eek

3U

ncon

trol

led

=

−.05

WR

C p

er

wee

kC

ontr

olle

d =

.50

WR

C p

er w

eek

4U

ncon

trol

led

=

.48

WR

C p

er

wee

kC

ontr

olle

d =

.24

WR

C p

er w

eek

5U

ncon

trol

led

=

.42

WR

C p

er

wee

kC

ontr

olle

d =

1.0

2 W

RC

per

wee

k

Com

pton

,App

leto

n,24

82

AA

&L

AR

eadi

ng a

loud

1W

RC

Perc

enta

ge o

f hi

gh-f

requ

ency

wor

ds

& H

osp

(200

4)an

d de

coda

ble

wor

ds a

ccou

nted

fo

r 41

% o

f th

e va

rian

ce in

rea

ding

ac

cura

cy a

nd 5

4% o

f th

e va

rian

ce

in r

eadi

ng f

luen

cy.

(tab

le c

onti

nues

)

101(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Rel

iabi

lity

of G

row

th R

ates

Hin

tze,

Ow

en,S

hapi

ro,

& D

aly

(200

0)St

udy

116

02–

5G

ER

eadi

ng a

loud

1W

RC

Indi

vidu

al d

iffe

renc

es

(8 w

eeks

,ac

coun

ted

for

48%

2

times

of th

e va

rian

ce a

nd

per

wee

k)gr

ade

acco

unte

d fo

r 19

% o

f th

e va

rian

ce.

Stud

y 2

801–

4N

DR

eadi

ng a

loud

1W

RC

Indi

vidu

al d

iffe

renc

es

(10

wee

ks,

acco

unte

d fo

r 42

%

2 tim

esof

the

vari

ance

,and

pe

r w

eek)

grad

e ac

coun

ted

for

36%

of

the

vari

ance

.

Bro

wn-

Chi

dsey

,47

65–

8N

DM

aze

sele

ctio

n10

CS

Gra

de le

vel a

ccou

nted

D

avis

,& M

aya

(200

3)fo

r 68

%–7

1% o

f th

e va

rian

ce o

n tw

o pa

ssag

es.

Indi

vidu

al d

iffe

renc

es in

sc

ores

acc

ount

ed f

or 8

4%

of th

e va

rian

ce o

n on

e pa

ssag

e.

Hin

tze

& P

elle

12

3–4

GE

& S

ER

eadi

ng a

loud

1W

RC

Indi

vidu

al d

iffe

renc

es

(8 w

eeks

,Pe

titte

(20

01)

acco

unte

d fo

r 62

%

2 tim

es p

er

of th

e va

rian

ce,a

nd

wee

k)re

adin

g gr

oup

acco

unte

d fo

r 15

% o

f th

e va

rian

ce.

Mar

ston

& D

eno

263

GE

Rea

ding

alo

ud1

WR

CPa

ired

tte

st a

t(1

982,

IRL

D #

106)

Wee

ks 1

and

16

show

ed s

igni

-fi

cant

gro

wth

on

rea

ding

al

oud;

rea

ding

al

oud

show

ed

grea

ter

grow

th

whe

n co

mpa

red

to b

asal

ser

ies

test

s.

(tab

le c

onti

nues

)

102(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

Mar

ston

,Den

o,&

83

3–6

LA

Wor

d id

entif

icat

ion

1W

RC

Pair

ed t

test

at

Tin

dal (

1983

,IR

LD

W

eeks

1 a

nd 1

0 #1

26)

show

ed s

igni

-fi

cant

gro

wth

fo

r al

l wor

d lis

ts (

p <

.001

).

Skib

a,D

eno,

Mar

ston

,67

1–7

SER

eadi

ng a

loud

1W

RC

(6 m

onth

s,3–

5 &

Wes

son

(198

6)tim

es p

er w

eek)

1 21.

78 W

RC

per

w

eek,

SD=

1.2

13

1.62

WR

C p

er

wee

k,SD

= .7

14

1.42

WR

C p

er

wee

k,SD

= .9

25

1.36

WR

C p

er

wee

k,SD

= .6

66 7

Shin

,Den

o,&

Esp

in

432

GE

/M

aze

sele

ctio

n3

CS

Alte

rnat

e 1.

20 C

S pe

r (2

000)

Ch.

1fo

rm .7

5–.9

0m

onth

.91

CS

per

mon

th(9

mon

ths,

mon

thly

)

Spee

ce &

Ritc

hey

276

1H

A/A

A

Rea

ding

alo

ud1

WR

C(2

0 w

eeks

,(2

005)

& L

Aw

eekl

y &

m

onth

ly)

HA

/AA

1.5

WR

C p

er

wee

kL

A.7

7 W

RC

per

w

eek

Stan

dard

Rat

es o

f G

row

th

Mar

ston

,Low

ry,

581–

6G

EA

vera

ge %

of

Den

o,&

Mir

kin

grow

th

(198

1,IR

LD

1

Wor

d id

entif

icat

ion

1W

RC

251

#49)

284

325

(tab

le c

onti

nues

)

103(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

49

514

621

1R

eadi

ng a

loud

1W

RC

150

275

326

424

528

630

Mac

Mill

an (

2000

)1,

691

2–7

GE

Rea

ding

alo

ud1

WR

CSi

gnif

ican

t gr

owth

with

in

grad

es. G

row

th

rate

s de

crea

sed

as g

rade

in-

crea

sed.

Sig

nifi

-ca

nt d

iffe

renc

es

wer

e fo

und

by

gend

er (

fe-

mal

e >

mal

e).

(1 s

choo

l yea

r,3

times

a y

ear)

Fuch

s,Fu

chs,

Ham

lett,

(1 s

choo

l yea

r,W

alz,

& G

erm

ann

wee

kly)

(199

3)St

udy

111

71

ND

Rea

ding

alo

ud1

WR

C2.

10 W

RC

per

w

eek,

SD=

.80

21.

46 W

RC

per

w

eek,

SD=

.69

31.

08 W

RC

per

w

eek,

SD =

.52

40.

84 W

RC

per

w

eek,

SD=

.30

50.

49 W

RC

per

w

eek,

SD=

.28

60.

32 W

RC

per

w

eek,

SD=

.33

Stud

y 2

257

1N

DM

aze

sele

ctio

n2.

5C

S0.

34 C

S pe

r w

eek,

SD=

.39

20.

39 C

S pe

r w

eek,

SD=

.24

30.

47 C

S pe

r w

eek,

SD=

.37

(tab

le c

onti

nues

)

104(T

able

1 c

onti

nued

)

Sam

ple

Rea

ding

mea

sure

Res

ults

Typ

e of

T

ime

Scor

ing

Stud

yN

Gra

de(s

)L

evel

mea

sure

(min

)pr

oced

ure

Val

idit

yR

elia

bilit

y G

row

th

40.

38 C

S pe

r w

eek,

SD=

.32

50.

36 C

S pe

r w

eek,

SD=

.23

60.

27 C

S pe

r w

eek,

SD=

.25

Den

o,Fu

chs,

2,99

91–

6G

E &

SE

Rea

ding

alo

ud1

WR

C(1

sch

ool y

ear,

Mar

ston

,& S

hin

wee

kly

or

(200

1)3

times

a y

ear)

1G

E1.

80 W

RC

per

w

eek,

SE=

.15

SE0.

83 W

RC

per

w

eek,

SE=

.15

2G

E1.

66 W

RC

per

w

eek,

SE=

.09

SE0.

57 W

RC

per

w

eek,

SE=

.09

3G

E1.

18 W

RC

per

w

eek,

SE=

.10

SE0.

58 W

RC

per

w

eek,

SE=

.11

4G

E1.

01 W

RC

per

w

eek,

SE=

.05

SE0.

58 W

RC

per

w

eek,

SE=

.05

5G

E0.

58 W

RC

per

w

eek,

SE=

.05

SE0.

58 W

RC

per

w

eek,

SE=

.08

6G

E0.

66 W

RC

per

w

eek,

SE=

.04

SE0.

62 W

RC

per

w

eek,

SE=

.14

Not

e.A

A =

ave

rage

ach

ievi

ng; C

h. 1

= C

hapt

er 1

; CM

= c

orre

ct m

atch

es; C

S =

cor

rect

sel

ectio

ns; D

/HH

= d

eaf

or h

ard

of h

eari

ng; E

LL

= E

nglis

h la

ngua

ge le

arne

rs; G

E =

gen

eral

edu

catio

n; H

A =

hig

h ac

hiev

ing;

LA

= lo

wac

hiev

ing;

ND

= s

ampl

e in

clud

ed g

ener

al e

duca

tion

and

spec

ial e

duca

tion,

anal

ysis

not

dif

fere

ntia

ted

by g

roup

; SD

= s

tand

ard

devi

atio

n; S

E=

sta

ndar

d er

ror

(of

mea

sure

men

t); S

E =

spe

cial

edu

catio

n; V

I =

vis

ual i

mpa

irm

ent;

WR

C =

wor

ds r

ead

corr

ectly

. If

no s

peci

fic

num

bers

wer

e re

port

ed f

or s

tude

nts

rece

ivin

g sp

ecia

l edu

catio

n se

rvic

es,t

he p

artic

ipan

ts w

ere

assu

med

to b

e in

gen

eral

edu

catio

n. T

ests

:St

anfo

rd A

chie

vem

ent T

est–

7(G

ardn

er e

t al.,

1982

a,19

83);

Woo

dcoc

k R

eadi

ng M

aste

ry T

est

Rev

ised

(Woo

dcoc

k,19

87);

Kau

fman

Bri

ef I

ntel

lige

nce

Test

(Kau

fman

& K

aufm

an,1

990)

; Kau

fman

Tes

t of

Edu

cati

onal

Ach

ieve

men

t(K

aufm

an &

Kau

fman

,198

5); G

ates

-M

acG

initi

e (M

acG

initi

e et

al.,

1978

); M

etro

poli

tan

Ach

ieve

men

t Tes

t–6

(Pre

scot

t et a

l.,19

84);

Woo

dcoc

k-Jo

hnso

n–II

I (M

cGre

w &

Woo

dcoc

k,20

01);

Iow

a Te

st o

f B

asic

Ski

lls

(Hoo

ver

et a

l.,19

96);

Woo

dcoc

k-Jo

hnso

n–R

evis

ed(W

oodc

ock

& J

ohns

on,1

989)

; Com

preh

ensi

ve R

eadi

ng A

sses

smen

t B

atte

ry(F

uchs

,Fuc

hs,&

Ham

lett,

1989

); T

ests

of A

chie

vem

ent

and

Pro

ficie

ncy

(Sca

nnel

l et a

l.,19

86);

Mic

higa

n E

duca

tion

al A

sses

smen

t P

rogr

am(M

ichi

gan

Stat

e B

oard

of

Edu

catio

n,19

99);

Met

ropo

lita

n A

chie

vem

ent T

est–

7(H

arco

urt E

duca

tiona

l Mea

sure

men

t,19

98);

Sta

nfor

d D

iagn

osti

c R

eadi

ng T

est

(Kar

lsen

& G

ardn

er,1

985)

; Min

neso

ta C

ompr

ehen

sive

Ass

essm

ent

(Min

neso

taD

epar

tmen

t of

Chi

ldre

n Fa

mili

es,a

nd L

earn

ing,

1998

-200

2); S

tanf

ord

Ach

ieve

men

t Tes

t–9

(Har

cour

t Bra

ce &

Com

pany

,199

7); D

iagn

osti

c R

eadi

ng S

cale

s(S

pach

e,19

81);

The

Tes

t of

Ear

ly R

eadi

ng A

bili

ty—

Dea

f an

d H

ard

ofH

eari

ng V

ersi

on(R

eid

et a

l.,19

91);

Ore

gon

Stat

ewid

e A

sses

smen

t(O

rego

n D

epar

tmen

t of

Edu

catio

n,20

00);

Was

hing

ton

Ass

essm

ent

of S

tude

nt L

earn

ing

(199

8); T

he D

egre

es o

f R

eadi

ng P

ower

Tes

t (K

oslin

et a

l.,19

89).

THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 105

by examining evidence related to the technical adequacy ofthe various measures used in CBM reading and consider gen-eralizability of the research to other students and purposes.

Technical Adequacy of CBM Measures

CBM Reading Measures

We focus our review on the three measures most commonlyused in CBM reading: reading aloud, maze selection, andword identification. In reading aloud, students read aloudfrom a passage, usually for 1 min, and the number of wordsread correctly is scored (Deno, 1985). Omissions, insertions,substitutions, hesitations, and mispronunciations are markedas errors. In maze selection, students read through a passagein which (typically) every seventh word has been deleted andreplaced with three word choices—one correct choice and twodistracters. (Rules for creating maze passages can be found inD. Fuchs and L. S. Fuchs, 1992.) Students read the passagesilently, usually for 1 to 3 min, making selections as they read.The number of correct selections is scored. In word identi-fication, students read aloud from a list of high-frequencywords, usually for 1 min, and the number of words read cor-rectly is scored (Deno, Mirkin, & Chiang, 1982). Omissions,insertions, substitutions, hesitations, and mispronunciationsare marked as errors. Words are usually selected from wordlists or from the reading curriculum. By far, the majority ofresearch has focused on the reading-aloud measure. Recently,however, interest in maze selection and word identificationhas grown as CBM has been extended to younger and olderstudents and as computerized progress monitoring has be-come a possibility.

Reading Aloud

Despite the early support for the technical adequacy of thereading-aloud measures, practitioners and researchers alikecontinued to express doubts about the relation between thesimple measure of reading aloud from text for 1 minute andreading proficiency, especially proficiency in reading com-prehension (e.g., see Mehrens & Clarizio, 1993; Yell, 1992).Thus, in the 1980s and 1990s, efforts were made to more closelyexamine the nature of the relationship between reading aloudand general reading proficiency, especially reading compre-hension. This research took two different approaches. Thefirst approach sought to clarify the relation between readingaloud and reading comprehension by considering alternativemeasures that might be more closely linked to reading com-prehension and by examining the theoretical underpinnings ofthe relation between reading aloud and reading proficiency.The second approach sought to examine the concomitant re-lation between CBM reading aloud and reading comprehen-sion, with a focus on the individual student.

Clarification of the Relation Between Reading Aloudand Reading Comprehension. Fuchs, Fuchs, and Maxwell(1988) compared the validity of CBM reading-aloud measuresto that of other measures typically used to assess reading com-prehension, including cloze (where every seventh word is de-leted from a text and replaced with a blank), story retell, andquestion-answering measures. Participants were students withmild disabilities in Grades 4 to 8. Results revealed that reading-aloud scores correlated more strongly with scores on the com-prehension and word skills subtests of a standardized achieve-ment test (r = .91 and r = .80, respectively) than did scoresfrom the other “typical” comprehension measures (rs = .76 to.82 for the reading comprehension and .66 to .76 for the wordskills subtests, respectively). Results of the Fuchs et al. studysuggested that reading aloud was more than just a measure offluent decoding, a notion that was supported in subsequent re-search investigating the theoretical nature of the relationshipbetween reading aloud and reading comprehension.

Shinn, Good, Knutson, Tilly, and Collins (1992) usedconfirmatory factor analysis to examine the role of readingaloud as it related to decoding, fluency, and comprehensionskills for students in Grades 3 and 5. A single-factor model of“reading competence” was validated for third-graders, withall reading skills making significant contributions. In con-trast, a two-factor model including decoding and comprehen-sion as two separate but highly related factors was validatedfor fifth graders, with reading aloud loading on the decodingfactor. Hosp and Fuchs (2005) also observed changes in thenature of the relationship between reading aloud and readingproficiency associated with age. Relationships between CBMreading aloud and the Decoding, Word Reading, and Com-prehension subtests of the Woodcock Reading Mastery Test–Revised (WRMT; Woodcock, 1987) were similar in magnitudefor students in Grades 2 and 3 (ranging from .82 to .88), butin Grade 4, lower correlations were observed for the Decod-ing and Word Reading subtests (rs = .72 and .73, respectively)than for the Reading Comprehension subtest (r = .82).

Kranzler, Brownell, and Miller (1998), in a somewhatdifferent approach, posed the hypothesis that the number ofwords read aloud from text in 1 min might merely be a re-flection of general speed of processing. Kranzler et al. exam-ined the roles of general cognitive ability, speed and efficiencyof elemental cognitive processing, and reading aloud in theprediction of reading comprehension for students in Grade 4.Multiple regression analyses revealed a significant relation-ship between reading aloud and reading comprehension mea-sures that could not be explained by general cognitive abilityor speed and efficiency of elemental cognitive processing. Re-sults suggested that reading aloud was not merely an indicatorof general cognitive processing speed. Kranzler et al. noted,however, that although reading aloud had the highest stan-dardized regression coefficient when compared to cognitiveability and mental speed, it explained only 11% of the uniquevariance found in reading comprehension.


Concomitant Change in Reading Aloud and ReadingComprehension. The studies reviewed in the previous sec-tion focused on patterns of results across groups; however, theconcerns raised by practitioners often focus on the nature ofthe relationship between CBM and reading comprehension forthe individual student. One such concern is whether readingaloud and reading comprehension change concomitantly. Mar-kell and Deno (1997) addressed this issue by experimentallymanipulating the difficulty level of reading material. Studentsin Grade 3 read passages that were two levels below, at, andtwo levels above grade level. Students also completed twocomprehension tasks for each passage—question answeringand maze selection. Results revealed that, on average, studentsread significantly fewer words in 1 min on the more difficultpassages, answered fewer questions correct, and selected fewercorrect maze choices, supporting the general relation betweenwords read aloud in 1 min and reading comprehension. How-ever, at the individual level, it appeared that the amount of per-formance change was an important factor to be considered.Results revealed that for only 52% of the students did the rankordering of reading-aloud scores match the rank ordering ofcomprehension scores on all three levels of materials. Takinga more liberal approach, and controlling for a ceiling effecton the comprehension measures, the percentage of studentsfor whom rankings matched on only the highest and lowestlevels were examined. This analysis resulted in an agreementof 100% and 96% for question answering and maze, respec-tively. These results suggested that a relatively large changein the number of words read in 1 min (perhaps as large as15–20 words) was needed to predict with certainty a con-comitant change in reading comprehension.

A second concern raised by practitioners is the existenceof “word callers,” that is, students who can read fluently butdo not comprehend. Hamilton and Shinn (2003) examinedteachers’ ability to identify word callers. Third-grade teach-ers were asked to identify one to two students who were wordcallers (WC) and one to two similarly fluent peers (SFP). Sim-ilarly fluent peers were students whom teachers judged ashaving fluency rates similar to the WC students but withhigher levels of comprehension. Results confirmed differ-ences in comprehension levels between the students in the WCand SFP groups, with SFP students scoring higher on com-prehension tasks. However, results also revealed differencesin reading fluency, with scores for WC students lower thanthose for SFP students, calling into question teachers’ abilityto identify students as word callers. We note that the Hamil-ton and Shinn (2003) study did not address the actual exis-tence of word callers, just teachers’ judgment of word callers.To examine the existence of word callers, one would need toexamine the relative standing of students on reading aloud andreading comprehension measures. Word callers would bethose whose relative standing on the reading aloud measureswere substantially higher than their standing on the readingcomprehension measures.

Maze Selection

Although the number of words read aloud in 1 min demon-strated good technical adequacy in the early IRLD research,the measure was limited in the sense that it had to be admin-istered individually, it lacked face validity, and it was unclearwhether the measure would be appropriate for older studentswho might presumably reach an asymptote in reading aloudperformance. These factors, combined with improvements intechnology and changes in the field of special education lead-ing to larger caseloads, led to consideration of the maze mea-sure. The maze could be administered in groups, appeared tobe more of a reading comprehension measure than a readingaloud measure, could be administered via the computer, andwas considered to be more acceptable for older students. Themaze was not a new measure. An untimed version of the mazehad been studied in 1970s by Guthrie as a measure of read-ing comprehension and was shown to have good stability, tocorrelate with standardized measures of reading proficiency,and to separate readers with and without disabilities (Guthrie,1973; Guthrie, Seifert, Burnham, & Caplan, 1974). The useof the maze as a timed measure within a CBM framework didnot appear until the late 1980s and early 1990s.

In 1989, Espin, Deno, Maruyama, and Cohen reportedon the technical adequacy of a maze measure that was part ofa group-administered screening instrument called the BasicAcademic Skills Samples (BASS; Deno, Maruyama, Espin, &Cohen, 1989). The reading portion of the BASS consisted ofthree 1-min maze selection tasks that were approximately ata first- to second-grade reading level. The BASS was admin-istered to more than 2,000 students in Grades 1 through 6across 31 schools. Correlations between the BASS maze and1-min reading-aloud passages for a random sample of stu-dents from Grades 3, 4, and 5 were .77, .86, and .86, respec-tively. Data from the entire sample revealed a stable patternof increase in maze scores from Grades 1 to 6, as well as fromwinter to spring within each grade.

Fuchs and Fuchs (1992) extended the research on mazeselection in their search for a CBM reading measure that wouldbe suitable for data collection via the computer and that mighthave greater acceptance for teachers than would reading aloud.Technical adequacy and level of teacher acceptance werecompared for several alternative CBM measures, includingquestion answering, story recall, cloze, and maze selection.Maze selection in this study was a 2.5-min measure adminis-tered twice weekly for 18 weeks via computer. Earlier re-search (Fuchs & Fuchs, 1990) had revealed correlations of .83between scores on maze and reading aloud and of .77 betweenscores on maze and the Reading Comprehension subtest ofthe Stanford Achievement Test (SAT; Gardner, Rudman, Karl-sen, & Merwin, 1982). Results of the Fuchs and Fuchs (1992)study revealed that the maze task was sensitive to change inperformance over time and, unlike other measures, had a rel-atively small ratio of slope to standard error of estimate (SEE),

making it easier to detect growth on a graph. In addition,teachers rated their satisfaction with maze highly, reportingthat they believed the maze reflected multiple dimensions ofreading, including decoding, comprehension, and fluency. Fi-nally, students reported that they liked taking the maze.

In a direct comparison of the technical adequacy ofreading aloud and maze selection measures, Jenkins and Jew-ell (1993) examined the validity of the two measures acrossGrades 2 to 6. All students in the study completed three 1-minmaze tasks and three 1-min reading-aloud passages. Passageswere at a first- to second-grade level. Criterion variables werescores on the Gates-MacGinitie Reading Tests (Gates; MacGin-itie, Kamons, Kowalski, MacGinitie, & McKay, 1978) and theMetropolitan Achievement Tests (MAT; Prescott, Balow, Ho-gan, & Farr, 1984). Within-grade correlations were moderate-strong to strong for both measures, ranging from .63 to .88with the Gates and from .58 to .87 with the MAT. In Grades 2through 4, correlations tended to be stronger for reading aloudthan for maze, but in Grades 5 and 6, this pattern of differ-ences disappeared. Looking across grades, correlations be-tween the reading aloud and the criterion measures droppedfrom the .80s in Grades 2 through 4 to .60s to .70s in Grades5 and 6. In contrast, correlations for the maze remained con-sistent across the grade levels, with most between .65 and .75.Finally, both measures revealed increases across Grades 2 to6, and from fall to spring within grade. For the reading aloudmeasures, change was greatest from Grades 2 to 3, after whichit leveled off. Maze, in contrast, reflected more even rates ofchange across the grades. Results suggested that readingaloud might be a better measure than maze for primary-gradestudents, a conclusion supported by Ardoin et al. (2004), whofound that adding a maze task did not add significantly to theprediction of performance on a standardized achievement testin reading for students in third grade.

Word Identification

Although both word identification (word ID) and readingaloud were included as potential indicators of reading profi-ciency in early CBM research (e.g., Marston, Deno, & Tin-dal, 1983; Marston, Lowry, Deno, & Mirkin, 1981; Shinn,Ysseldyke, Deno, & Tindal, 1986; Tindal, Marston, Deno, &Germann, 1982), reading aloud emerged as the more com-monly used measure, perhaps because it appeared to be moreclosely related to the construct of reading than did word ID.Reading aloud, however, proved difficult to use with begin-ning first-graders because many of these students could notread any words from text in the fall, creating a floor effect inthe measure (e.g., see Bain & Garlock, 1992; Fuchs, Fuchs,& Compton, 2004). As interest in early identification and pre-vention grew, interest in CBM word identification reemerged:The words presented in a word ID task could be controlledfor difficulty and were not constrained by the requirement offitting the words into a coherent story.

The technical adequacy of several potential CBM read-ing measures for students in first grade was compared in astudy by Daly, Wright, Kelly, and Martens (1997). Readingmeasures included word ID, letter reading, letter copying,letter-sound production, and letter-sound selection. Word IDprobes were created using words selected from the Harris-Ja-cobson pre-primer word list (Harris & Jacobson, 1972). Re-sults revealed that letter reading and word ID produced thebest technical adequacy data. Test–retest reliabilities for theword ID and letter-reading measures were .94 and .87, re-spectively, compared to .42 to .65 for the other measures. Con-current validity coefficients with the broad reading subtest ofthe Woodcock-Johnson–Revised (Woodcock & Johnson, 1989)were .40 and .35, respectively. Predictive validity coefficientsbetween the two measures and a passage-reading and word-ID task administered 4 months later were .73 (word ID) and.71 (letter reading) with passage reading and .71 (word ID)and .69 (letter reading) with word ID. Predictive validity co-efficients for the other measures ranged from −.09 to .53.

Fuchs, Fuchs, and Compton (2004) compared the valid-ity of a word-ID task and nonsense word fluency (NWF) taskfor first graders at risk in reading. The word-ID task was cre-ated using words from the Dolch preprimer, primer, and first-grade-level lists. The NWF measure was taken from DIBELS(Good, Simmons, & Kame’enui, 2001). Criterion measures in-cluded the word attack and word identification subtests of theWoodcock Reading Mastery Test–Revised (WRMT-R; Wood-cock, 1987) and the Comprehensive Reading Assessment Bat-tery (CRAB; Fuchs, Fuchs, & Hamlett, 1989). Students weretested in the fall and spring of the year on the criterion mea-sures and were tested at least once weekly on both CBM mea-sures. Alternate-form reliability for the word-ID and NWFtasks was .88 and .87, respectively. Validity was examined forboth level and slope of performance for the two CBM mea-sures. In general, correlations with the criterion measures wereconsistently and reliably stronger for word ID than for NWF.Concurrent validity coefficients for word-ID level ranged from.52 to .93, compared to .50 to .80 for NWF. Predictive valid-ity for word-ID level ranged from .45 to .80, compared to .46to .64 for NWF. Finally, the slopes produced by word ID weremore strongly correlated to the criterion variables (with mostcoefficients at .45 or above) than they were for NWF (withonly 2 of 12 coefficients at .45 or above). Results not only lentsupport for the concurrent and predictive validity of word IDas an indicator of performance but also provided informationsupporting the technical adequacy of the growth rates pro-duced by the measure.

In a recent study, Compton, Fuchs, Fuchs, and Bryant(2006) examined the use of a word ID measure within a response-to-intervention (RTI) approach for first-grade stu-dents. Students were administered a prediction battery in thefall of first grade consisting of word ID, rapid naming, phone-mic awareness, and oral vocabulary measures. Students werealso progress-monitored for a 5-week period using the word-



ID task, and the slope of improvement was calculated. Stu-dents were followed until the end of second grade. Comptonet al. examined the use of word-ID level, slope, and a combi-nation of level and slope as a part of a process for predictingperformance of participants at the end of second grade. (Theyalso examined different classification and data analytic ap-proaches, which are not reviewed here.) Results revealed thatadding word-ID level and slope significantly improved clas-sification accuracy for the identification of at-risk studentsover and above the use of phonemic awareness, rapid naming,and oral vocabulary measures.

Generalizability of Research: Student Populations and PurposesRecent work has examined the validity of CBM reading mea-sures—primarily reading aloud—to diverse groups of studentsand for new uses. In terms of generalizability to new studentpopulations, CBM research has been extended to studentsat the secondary-school level and to students with diversebackgrounds and characteristics (racial/ethnic and languagebackgrounds, gender, socioeconomic status, and sensory dis-abilities). In terms of generalizability of uses, CBM researchhas been extended to examine the uses of reading measuresfor predicting performance on state standard tests. Given spacelimitations, and the fairly limited amount of research withineach category, we briefly review these studies in this section.

Student Populations. Extensions of CBM reading re-search to the secondary-school level initially focused onthe use of reading measures to predict content-area perfor-mance, rather than to predict general reading performance(e.g., Espin & Deno, 1993a, 1993b; Espin & Deno, 1995; Few-ster & MacMillan, 2002; Yovanoff, Duesbery, Alonzo, & Tin-dal, 2005). More recent research has focused on predictinggeneral reading performance (Espin & Foegen, 1996; Espin,Wallace, Lembke, Campbell, & Long, 2007; Muyskens &Marston, 2006; Tichá, Espin, & Wayman, 2007). These stud-ies have all been conducted at the middle school level. Resultshave generally shown that both reading aloud and maze ex-hibit strong alternate-form reliability and moderate to strongcriterion-related and predictive validity. However, Espin et al.(2007) and Tichá et al. (2007) also found that whereas read-ing aloud did not reflect change in performance over time,maze selection did. Growth on maze was related to perfor-mance on a state reading test and to changes on a standard-ized achievement test in reading.

Extensions of CBM reading research to students fromdiverse backgrounds have produced mixed results. With re-gard to racial/ethnic backgrounds, Hixson and McGlinchey(2004) and Kranzler, Miller, and Jordan (1999) found thatCBM reading aloud resulted in overestimation of reading per-formance for African American students and underestimationof performance for Caucasian students. (Overestimation of

performance might result in underidentification for services.)However, Hintze, Callahan, Matthews, Williams, and Tobin(2002) found that CBM reading aloud resulted in neither over-nor underestimation of performance for African American orCaucasian students. In terms of language, results generallyhave revealed moderate to strong reliability and criterion-related validity coefficients for reading-aloud measures withEnglish learners (ELs; Baker & Good, 1995; Wiley & Deno,2005). In addition, gains on reading-aloud measures for ELstudents have been found to be similar to gains seen for non–EL students (Graves, Plasencia-Peinado, Deno, & Johnson,2005).

In a comprehensive study of the effects of home lan-guage, gender, ethnicity, and socioeconomic status on the tech-nical adequacy of CBM reading aloud measures, Klein andJimerson (2005) followed three cohorts of students (N = 398)longitudinally from Grades 1 through 3. The first cohort con-sisted of Caucasian students who spoke English as their homelanguage, the second group was Hispanic students who spokeEnglish as their home language, and the third cohort was His-panic students who spoke Spanish as their home language.Results revealed a strong relationship between the CBMreading-aloud and SAT reading scores for all three groups ofstudents at each grade level, with most correlations between.63 and .82. Linear regression analyses revealed that only acombination of ethnicity and home language resulted in biasin the measures. Specifically, the reading proficiency of His-panic students whose home language was Spanish was sys-tematically overpredicted (which could lead to systematicunderidentification for services), whereas the reading profi-ciency for Caucasian students whose home language wasEnglish was systematically underpredicted (which could leadto systematic overidentification).

Only two studies have examined the validity of CBMreading measures with students with sensory disabilities. Thefirst, Morgan and Bradley-Johnson (1995), provided supportfor the validity of the measures for students in Grades 3 to 7with visual impairments. The second, Allinder and Eccarius(1999), did not provide support for the validity of either read-ing aloud or maze selection for students who were deaf andhard of hearing.

Purposes. Recent work has extended the work on CBMreading measures to examine their use for predicting perfor-mance on state standards tests. Early studies focused on es-tablishing benchmark scores that would predict passing orfailing a state reading test (Crawford, Tindal, & Stieber, 2001;Good, Simmons, & Kame’enui, 2001). Subsequent studies thatexamined correlations between CBM reading-aloud measuresand performance on state standards tests reported diagnosticefficiency statistics, including sensitivity, specificity, positivepredictive power, and negative predictive power (Hintze &Silberglitt, 2005; McGlinchey & Hixson, 2004; Silberglitt &Hintze, 2005; Stage & Jacobsen, 2001; see Note 2). Sensitiv-

ity is the percentage of students below a cut score who fail atest. Specificity is the percentage of students above a cut scorewho pass a test. Positive predictive power is the probabilitythat a student with a score below the cut score will truly failthe test, whereas negative predictive power is the probabilitythat a student with a score above the cut score will truly passthe test.

Correlations between scores on the reading-aloud mea-sures and the various state reading tests generally ranged from.60 to .80 across studies. The exception to this pattern was thestudy by Stage and Jacobsen (2001), in which correlations be-tween reading aloud and the Washington state test were .43 to.44. The differences for the Stage and Jacobsen study mighthave been related to the nature of the Washington state read-ing test, which required both short-answer and extended writ-ten responses, and thus involved not only reading but alsowriting. The other state tests did not require written responses.

Diagnostic efficiency statistics across the four studieswere fairly consistent. Sensitivity values ranged from 65% to76%; specificity values ranged from 74% to 82%. Positivepredictive power values ranged from 55% to 77% for all butthe Stage and Jacobsen (2001) study, in which the value was41%. Negative predictive power values ranged from 83% to90% for all but the McGlinchey and Hixson (2004) study, inwhich the value was 46%. Across studies, the use of CBMadded significantly to positive and negative predictive powerabove base rates of prediction.

Summary and Discussion

Research on CBM reading measures has provided support forand clarified the nature of the relationship between readingaloud and general reading proficiency; has examined alter-nates to reading aloud, including maze selection and word ID;and has examined the generalizability of the results to differ-ent student populations and for different uses.

With regard to the reading-aloud measure, results gen-erally replicated earlier research demonstrating a strong rela-tionship between CBM reading aloud and reading proficiency,even when correlations were calculated within grade, ad-dressing a concern raised about the early CBM research (seeMehrens & Clarizio, 1993). Reading aloud was found to be abetter indicator of reading comprehension than were other“typical” comprehension measures, and results revealed thatreading aloud was not just a speed-of-processing measure. Inaddition, research provided insight into the theoretical natureof the relationship between reading aloud and reading profi-ciency for elementary-school students.

However, reading aloud has limitations. First, readingaloud may not be the best choice for very young and olderstudents. For readers at the very beginning stages of reading,reading-aloud measures produced a floor effect. Examinationof an alternative measure, word ID, proved a promising alter-native for very beginning readers. Reliability and validity co-

efficients for word ID were consistently strong for beginningreaders, and research supported the use of word ID as a partof an RTI approach to early identification and prevention. Sec-ond, although correlations between reading-aloud and crite-rion measures remained moderate to strong across elementaryschool grades, they were strongest at the primary grades anddecreased at the intermediate grades. No such decrease wasseen for maze, which remained fairly stable across the grades.Thus, although reading aloud might be the best measure forprimary-grade students, reading aloud and maze selection bothseem to be appropriate for intermediate-grade students. Forsecondary-school students, maze may be the best choice. Al-though research is limited, initial results suggest that readingaloud does not reflect growth for middle school students,whereas maze selection does. Finally, if progress is to be mon-itored across school years, maze might prove to be the bestchoice. It has been shown to have reasonable validity and re-liability for students across Grades 2 through 8, and the growthrates across grades have shown greater consistency thanthose for reading aloud. The reasons for the age-related dif-ferences seen between the measures are not clear and shouldbe examined more closely. Perhaps the teachers from theFuchs and Fuchs (1992) study were correct: Perhaps mazereflects multiple aspects of reading proficiency to a greaterextent than reading aloud does. If true, the separation of read-ing proficiency into decoding and comprehension factors atthe intermediate elementary grades, as seen in Shinn et al.(1992), would not affect the maze correlations but would af-fect reading-aloud correlations. Regardless of the reasons forthe differences, given the moderate to strong reliability andcriterion-related validity coefficients for maze, and given theadvantages offered by maze in terms of group administration,appropriateness for computerized administration, potentialfor cross-grade measurement, and acceptance by teachers, webelieve that in the future more attention should be devoted tomaze selection as a potential CBM reading measure.

The extension of the research to various populations andfor different uses is still in the early stages of development.Research at the secondary-school level is promising but hasfocused primarily on middle school students. Little has beendone at the high school level. With regard to students of di-verse backgrounds, evidence suggests that although the CBMreading measures may have reasonable reliability and crite-rion-related validity for various groups, the measures mayoverestimate the performance of African American studentsand of Hispanic students whose home language is Spanish andunderestimate the performance of Caucasian students. How-ever, results are mixed, and there is a need to further examinethese patterns of relations. We suggest that performance-leveldifferences between groups be factored out in future research.

In terms of extensions for new purposes, the use of CBMreading-aloud measures for predicting performance on statestandards tests has produced positive results, with the mea-sures producing generally strong correlations with state read-



ing tests and fairly good sensitivity, specificity, positive pre-dictive, and negative predictive power. These conclusionsmust be tempered by the fact that the technical adequacy ofCBM reading measures for predicting performance on statetests may be affected by the nature of the state test, as seen inStage and Jacobsen (2001).

Aside from a need for further research on the generaliz-ability of the measures, two other issues have yet to be suffi-ciently addressed in the area of measure development. First,there is a need to further examine the relationship betweenreading aloud and reading comprehension at the individuallevel. Few studies focused on the individual level. One that did,Markell and Deno (1997), implied that large gains in reading-aloud scores might be necessary before gains in reading com-prehension could be assumed. There is a need to replicate thisresearch and to examine the implications of the results for in-dividual progress monitoring. Relatedly, we think research onthe existence of word callers is of interest. It is conceivablethat a small group of students exists whose performance on thereading aloud measures is, relatively speaking, much higherthan their performance on comprehension measures.

A second issue relates to methods for linking measures.If word ID is to be used for beginning readers, reading aloudfor primary-grade readers, and maze for intermediate and sec-ondary-school readers, how might the measures be linked tocreate a picture of growth across school years? The ability tolink different measures across years would contribute to thedevelopment of a seamless and flexible system of progressmonitoring.

Effects of Text Materials

Thus far, we have focused on various measures that can beused in CBM in reading. In this section, we turn our attentionto the materials used to develop those measures. In 1994, Fuchsand Deno raised the question of whether instructionally use-ful performance assessment had to be based in the curricu-lum. The authors noted that although measures selected fromthe curriculum might have face validity, curriculum measuresalso introduced more error into progress measurement be-cause of variations in passage difficulty and student familiar-ity with passages. In addition, passages selected from withinan instructional curriculum might limit generalizability toother reading curricula.

A large number of studies were conducted beginning inthe 1990s to address questions related to the materials used todevelop CBM measures. The research fell into two broad cat-egories. The first addressed the question of curriculum source,that is, whether technical adequacy of the measures differedwith the curriculum used to generate measures. Included inthis category were studies comparing reading passages gen-erated from different curricula and studies comparing pas-sages generated from an instructional versus a “generic” ornoninstructional curriculum. The second category of research

addressed the difficulty level of the materials chosen, specif-ically whether measurement had to be done within the stu-dent’s instructional level.

We note two points before describing the literature inthis section. First, the research on materials has been con-ducted almost exclusively with reading-aloud measures. Fewstudies have examined maze or word ID measures. Second,there are a surprisingly large number of studies addressing thequestion of materials. Given the space limitations accreditedto a review article, and the similarities in pattern of results,we report the results generally, referring to details of specificstudies only when needed.

Curriculum Effects

Three general themes emerged from the research on the ef-fects of curriculum on CBM reading measures. First, level ofperformance differs significantly with curriculum source.Second, although technical adequacy does not vary with cur-riculum source, rates of growth may. Third, it is not necessaryto match instructional and progress-monitoring material.

With respect to the first theme—differences in levels ofperformance—results consistently reveal mean level differ-ences in scores on passages drawn from different curricula,beginning with Tindal, Marston, Deno, and Germann (1982).Studies have shown higher levels of performance on instruc-tional materials than on mainstream materials (Tindal, Flick,& Cole, 1992), on literature-based materials than on authenticmaterials (Hintze, Shapiro, Conte, & Basile, 1997), on basalmaterials than on literature-based materials (Bradley-Klug,Shapiro, Lutz, & DuPaul, 1998), and on generic materials thanon basal materials (Powell-Smith & Bradley-Klug, 2001). Inaddition, higher levels of performance have been found onmaze measures drawn from materials controlled for difficultythan on literature-based materials (Brown-Chidsey, Johnson,& Fernstrom, 2005). Mean level differences are important forprogress monitoring only insofar as they are used to comparestudents across classes, schools, or districts or when compar-ing student performance at one point in time to performanceat a later point in time. For such uses, it is important to keepthe source of material constant. However, performance-leveldifferences do not address the issue of technical adequacy ofthe measures as indicators of reading performance or growth.Measures may produce differences in levels of performancebut be equally good indicators of general reading proficiency.

The second theme to emerge from the curriculum mate-rials research does relate to technical adequacy. Results acrossseveral studies reveal that there are few differences in thetechnical adequacy of reading-aloud measures selected fromdifferent curricula. For example, Fuchs and Deno (1992) com-pared reading-aloud passages drawn from two published basalcurriculum series and found no differences in the magnitudeof correlations with the Woodcock Reading Mastery Test(WRMT; Woodcock, 1973) in Grades 1 through 6. Similarly,Hintze, Shapiro, Conte, and Basile (1997) found no differ-

ences in alternate-form reliability or criterion-related validityfor passages selected from authentic and literature-basedcurricula and no differences in the passages for classifyingstudents into groups. Hartman and Fuller (1997) found thattest–retest reliability and criterion-related validity coefficientsfor passages selected from literature-based materials for stu-dents in Grades 1 through 3 were similar to those reportedin the other research for passages selected from basal-seriestexts. Finally, Brown-Chidsey et al. (2005) found high corre-lations between performances on maze passages selected fromcontrolled and literature-based material.

Similar to the research on performance levels, researchon the developmental growth rates (i.e., changes in scoresacross grade levels) produced by CBM reading measures re-veal few differences related to curriculum source (Fuchs &Deno, 1992; Hintze et al., 1997). However, results from stud-ies of intraindividual growth rates have been mixed. Bradley-Klug, Shapiro, Lutz, and DuPaul (1998) found no differencesin growth rates produced by literature-based and basal-seriescurricula for second-graders. Differences in growth for fifth-graders were not statistically significant, but growth rateswere .84 words per week for literature-based and .32 for basal-series probes. Brown-Chidsey et al. (2005) found that mazeprobes created from controlled and literature-based passagesboth produced positive growth rates from fall to winter tospring measurements. The significance of the difference ingrowth rates was not tested.

Other studies have found significant differences in growthrates. Hintze, Shapiro, and Lutz (1994) examined the growthrates produced by literature-based and basal-series curriculafor two groups of third-grade students who were taught witheither a literature-based or basal-series curriculum. Growth ratesfor the literature-based and basal-series students were −.35and −1 when measured with literature-based probes, but were.66 and 1.70 when measured with basal-series probes. In theHintze et al. (1994) study, passages were not equated for dif-ficulty. Passage difficulty was equated in a subsequent study(Hintze & Shapiro, 1997) in which students in Grades 2 to 5who were instructed with literature-based or basal-series ma-terial were monitored with literature-based and basal-seriespassages. Again, results revealed differences in growth ratesrelated to curriculum source, although the results were in-consistent across grades. Students in Grades 2 through 4achieved greater growth when monitored with the literature-based, rather than with the basal-series, probes (a pattern op-posite that of Hintze et al. [1994], who found greater growthwith literature-based probes than with basal-series probes forthird-graders), whereas students in Grade 5 achieved greatergrowth with the basal-series probes (a pattern opposite thatfound by Bradley-Klug et al. [1998], who found greater growthrates on literature-based probes for fifth-grade students).

It is difficult to determine why there is such inconsis-tency in results regarding growth rates. However, given thegeneral pattern of inconsistency in the results (i.e., literature-based sometimes producing higher and sometimes lower growth

rates), we hypothesize that factors other than curriculum sourcemay be contributing to differences in growth rates. For ex-ample, as will be discussed in a later section, it is quite diffi-cult to determine the equivalency of “parallel probes” used tomonitor progress, even when the probes are drawn from thesame curriculum and matched on readability level. Moreover,slope values can easily be affected by a bunching of particu-larly difficult or easy probes near the beginning or end of aprogress-monitoring session. One way to address these issuesis to remove potential effects of nonequivalence of passagesby counterbalancing the order of passages across students sothat students do not read passages in the same order. Anothermethod is to use techniques other than readability to establishequivalence of the passages (a point we will discuss later). Ineither case, based on current research, we cautiously suggestthat growth rates are not affected by curriculum source; how-ever, as with level of performance differences, we would stillrecommend consistency in progress monitoring with respectto curriculum source.

The third and last theme emerging from the research onmaterials is related to the need to match monitoring materialto the materials used in instruction. The aforementionedHintze et al. (1994) and Hintze and Shapiro (1997) studies re-sulted in no interaction between the growth rates generated bymaterials selected from particular curricula and the curriculaused for instruction. In other words, students taught using aliterature-based series did not grow differently on literature-based probes than on basal-series probes. Other studies haveproduced similar results. Tindal et al. (1992) found no dif-ferences in slope of improvements for 12 special educationstudents in Grades 2 through 5 when monitored on highly con-trolled instructional materials or the general education basalcurriculum. Powell-Smith and Bradley-Klug (2001) and Riley-Heller, Kelly-Vance, and Shriver (2005) found no differencesin growth rates between probes derived from the students’ in-structional material and generic probes for second-grade stu-dents.

Difficulty Level

In terms of difficulty level of material, there are two generalissues to consider. The first is whether students must be mea-sured in instructional-level material or whether they can bemeasured in material outside their instructional level. The sec-ond is the importance of establishing equivalence of passagedifficulty for repeated progress monitoring.

Need to Measure at Instructional Level. Address-ing the issue of whether students must be measured with instructional-level material, Fuchs and Deno (1992) comparedcriterion-related validity and developmental growth rates pro-duced by materials of various difficulty levels. Results revealedno differences related to material difficulty in the magnitudeof the correlations. Average correlations across difficulty lev-els with the WRMT (Woodcock, 1973) were .91, ranging from



.89 to .93. Results also supported the use of a generic (in thiscase, third-grade) passage for tracking growth across gradelevels. Growth rates produced with a common third-gradepassage were relatively stable and linear. However, Fuchs andDeno also found that developmental growth rates decreasedas the difficulty of the material increased; thus, sixth-gradematerial produced growth rates that were less steep than thoseproduced with third-grade material.

Several studies have examined the influence of materialdifficulty on intraindividual growth rates. Shinn, Gleason, andTindal (1989) monitored the reading-aloud progress of mildlyhandicapped students in Grades 3 to 8 who were randomly as-signed to one of two groups: reading material one grade levelbelow and above instructional placement, or two and fourgrade levels above instructional placement. Results revealedcomparable slopes for students monitored one level below(4.3 words per week) and above (3.7 words per week) instruc-tional placement level. Although not statistically significant,slopes for two and four levels above instructional placementdid differ in magnitude (4.55 vs. 2.35 words per week), sup-porting the earlier finding that an increase in difficulty leadsto flatter slopes. Standard error of estimates (SEEs) did notdiffer by difficulty level.

Dunn and Eckert (2002) found no differences in slopesor SEEs for grade-level versus challenging (approximatelyone grade level above instructional level) material for second-or third-grade students. Hintze, Daly, and Shapiro (1998), how-ever, did find differences in the pattern of results for gradelevels. Students in Grades 1 through 4 were monitored withmaterial at and 1 year above grade level. No differences wereseen in alternate-form reliability between materials, nor in theslopes obtained for students in Grades 3 and 4. However, forstudents in Grades 1 and 2, growth rates were higher on grade-level material than they were on material above grade level.Differences were especially notable for first-grade students.The authors surmised that students in the beginning stages ofreading development experienced more fluency problemswhen the text was difficult than did students who were beyondthe beginning stages of reading.

One study examined the effects of difficulty level on theslopes produced by word identification probes. Fuchs, Tindal,and Deno (1981) compared three types of word lists: thosegenerated from grade-level material covered throughout theyear (grade-level, comprehensive), from grade-level materialcovered during the time of the study (grade-level, limited),and across-grade-level material (preprimer to Grade 4). Dif-ferences were found in the magnitudes of the slopes producedby the measures. The steepest slopes were produced by thegrade-level, limited word lists (.49), followed by the grade-level, comprehensive lists (.20), and the across-grade lists(−.07).

Equivalence of Passages. A second issue related to diffi-culty level of CBM materials, and one that has received far lessattention than other materials-related issues have, is the im-

portance of establishing the equivalence of the “parallel” pas-sages used for repeated progress monitoring. As one might ex-pect, CBM reading scores are sensitive to the difficulty level ofthe passages. For example, each of the studies described in thesection above reported mean score differences on the reading-aloud measure for passages of varying difficulty levels (e.g.,Dunn & Eckert, 2002; Fuchs & Deno, 1992; Hintze, et al.,1998; Shinn et al., 1989). In one respect, this is a positive find-ing and demonstrates the validity and sensitivity of the CBMreading-aloud measures. On the other hand, with respect toongoing progress monitoring, the finding is problematic. Itimplies that scores on repeated progress-monitoring measureswill be affected by variation in passage difficulty (a findingconfirmed by studies reviewed in a later section on growth).

The importance of establishing equivalence in passagedifficulty has been illustrated in two studies. In the first, Poncy,Skinner, and Axtell (2005) examined the effects of passagevariability on reading-aloud scores. Participants were third-graders who read 20 grade-level passages with readabilitiesranging from 2.8 to 3.1 grade level. Passages were presentedto students in random order over a period of 4 days. Analysesrevealed that 81% of the variance in students’ scores was dueto student skill, 10% was due to passage variability, and 9%was unaccounted for. By controlling the difficulty of the pas-sages on the basis of students’ average scores, variance due tostudent skill increased and variance to passage difficulty de-creased.

In the second study, Hintze and Christ (2004) comparedgrade-level material controlled for difficulty level with ran-domly selected materials from graded readers for students inGrades 2 to 5. Students were administered both controlled anduncontrolled reading-aloud passages over the course of 11weeks. The results indicated that estimates of both SEE andthe standard error of the slope (SEb) were smaller when pas-sages were controlled for difficulty than when they were not.

The results of Poncy et al. (2005) and of Hintze and Christ(2004) emphasize the need to establish passage equivalence.Yet, creating parallel passages is not as easy as it may seem.The most common method for establishing equivalence of CBMpassages has been to examine the readability levels of the pas-sages via the use of common readability formulas. However,Ardoin, Suldo, Witt, Aldrich, and McDonald (2005) foundonly modest relationships between the reading levels assignedto passages via readability formulas and the number of wordsread correctly (WRC) in 1 min from those passages for third-graders. The readability formula that produced the highestand most reliable mean associations with WRC was Forecast(Sticht, 1973), a formula that has not been used in the CBMresearch. In a component analysis, Ardoin et al. (2005) foundthat the two components significantly related to WRC weresyllables per 100 words and words not on the Dale-Chall listof 3,000 words. Results also revealed inconsistencies in thelevels assigned to passages among various readability formulas.

Others studies have demonstrated low correlations (rang-ing from rs = −.08 to .43) between scores on CBM reading-aloud

and readability formula levels (Bradley-Klug et al., 1998;Compton, Appleton, & Hosp, 2004; Hintze et al., 1994; Powell-Smith & Bradley-Klug, 2001) and low correlations amongvarious readability formulas (r = .28; Compton et al., 2004).In addition, Compton et al. (2004) found that certain passagecomponents were related to WRC in 1 min, including the num-ber of high-frequency and decodable words, the number ofmultisyllabic words (negatively related), and sentence length.

Given the problems with readability formulas for deter-mining passage difficulty, how can one determine the equiv-alence of passages used for CBM progress monitoring? Ardoinet al. (2005) proposed a process whereby a set of passages isselected based on specific components found to relate to read-ing fluency, such as those identified in Ardoin et al. (2005)and Compton et al. (2004). Selected passages are then field-tested with a large number of students. In this system, onlypassages that fall within 1 standard deviation of the meanWRC would be selected for use in CBM progress monitoring.The effects of such an approach on the stability of the growthrates produced by reading-aloud passages have yet to be ex-amined.


The research on curriculum source supports a robustness inthe CBM measures: Even when developed from different cur-riculum sources, the measures seem to function consistently.Moreover, it does not appear necessary to develop CBM probesfrom material in which the student is being instructed. Theseconclusions are good news in terms of the development of aseamless and flexible system of progress monitoring. Theyimply that CBM progress monitoring using reading-aloudmeasures can be used across various curricula and instruc-tional approaches. However, the research related to difficultylevel poses some restrictions to the general robustness ofCBM materials.

With respect to the issue of whether students need to bemeasured with instructional-level material, the literature in-dicates that CBM reading-aloud measures are fairly flexiblewith regard to difficulty level: Students can be measured withmaterial that is easier or more difficult than their instructionallevel, and the technical adequacy of the measures is not af-fected. However, there are limits to this flexibility. Rates ofgrowth may be affected if material is too difficult (e.g., twoto three levels above instructional level). Further, there issome indication that beginning readers may be more affectedby difficulty level than more advanced readers are. In termsof word identification, results reveal that the more closely themeasure is tied to the instruction, the more sensitive it is togrowth. Of course, this sensitivity must be balanced with gen-eralizability of the results. If words are selected that coveronly 1 month of instruction, students are likely to hit a ceil-ing in their performance in a relatively short amount of time.

The issue of difficulty as it relates to intraindividualgrowth monitoring is of greater concern. Generally, results

emphasize the need to establish passage equivalence for CBMprogress monitoring, especially if the measures are to beused as a part of a decision-making process that carries im-portant social consequences. The issue of passage equivalenceis perhaps less of a concern if CBM is used by a classroomteacher to monitor student progress and evaluate instructionalprograms. Given the time-consuming nature of the type ofapproach suggested by Ardoin et al. (2005), and the positivetreatment validity results reported in Stecker et al. (2005), use of passages selected from controlled sources, such as basal-reading series, is probably sufficient for such classroom use.However, if CBM is to be used as part of a school- or dis-trictwide decision-making process, or if it is to be used as partof an eligibility decision-making process, we feel that it isnecessary to establish equivalence of passage difficulty forprogress monitoring by adapting a process similar to that sug-gested by Ardoin et al. (2005).

Issues About Measuring Growth

We already have discussed in part some factors related togrowth. For example, the amount of growth produced byCBM measures may vary with the curriculum source or dif-ficulty level of the passages, and error in the production ofgrowth rates is reduced when the difficulty of the passagesused for progress monitoring is controlled. Two additional is-sues specific to growth have not yet been addressed: (a) Howreliable or trustworthy are the rates of growth produced byCBM measures? (b) Can standards for growth be determined,and do they differ for students by performance or grade level?As we did in the section on the effects of text materials, wecaution the reader that the majority of research conducted inthis area has been with the reading-aloud measure, with littleattention devoted to word ID or to maze selection.

Reliability of Growth Rates

The research on determining the trustworthiness of the growthrates produced by CBM measures includes examination of thedependability of single CBM scores and the reliability oraccuracy of the slopes produced by multiple, repeated scores.With regard to single scores, studies have examined the amountand sources of error surrounding single CBM scores. Christand Silberglitt (in press) calculated typical standard error ofmeasurement (SEM) values across students in Grades 1 to 5,taking into consideration various levels of measurement reli-ability and sample variability. Values were calculated for datacollected in fall, winter, and spring (three measurements perdata-collection period) across a period of 8 years for 8,200students. Results reveal that the median SEM across gradesand conditions was 10 words read correctly, with a range of 5to 15. Reliability, grade, and sample diversity affected SEMs,with smaller SEMs associated with higher levels of reliabil-ity, lower grade levels, and less sample variability. The au-



thors suggested that the use of standard SEMs could aid in theinterpretation of CBM reading aloud data.

Sources of error associated with CBM scores have beenexamined recently via the application of G theory, or gener-alizability theory, to CBM reading aloud. G theory is designedto assess the dependability of behavioral measurements byspecifying portions of error that can be accounted for by var-ious situational variables under which measurements are taken(Cronbach, Gleser, Nanda, & Rajaratnam, 1972). Through Gtheory, “portions of variance that in classic test score theoryare simply attributed to random error” (Hintze, Owen, Sha-piro, & Daly, 2000, p. 53) are identified and explained. Re-sults of a series of studies applying G theory have revealedconsistent support for the dependability of the CBM readingmeasures for inter- and intraindividual decision making at theelementary school level, with the majority of variance in CBMscores accounted for by individual variation and grade level(Hintze et al., 2000) or individual variation and group mem-bership (Brown-Chidsey, Davis, & Maya, 2003; Hintze &Pelle Petitte, 2001).

Although dependability of individual data points con-tributes to the dependability of the slope, it does not guaranteeit. Other factors must be considered with regard to slope.Several approaches have been taken to evaluate the trustwor-thiness or the reliability of the slopes produced by CBM mea-sures. In early CBM studies, the focus was on whether CBMreading measures were sensitive to change in performance.For example, Marston and Deno (1982) and Marston, Deno,and Tindal (1983) compared change in scores on reading-aloudand word ID measures to changes in scores for other readingmeasures, such as standardized achievement tests or basal-series tests. Results revealed that the CBM measures weremore sensitive to growth over a short period of time than werethese other measures.

Later studies focused more closely on the statistical prop-erties of the growth rates produced by repeated CBM data.Skiba, Deno, Marston, and Wesson (1986) examined slopesfor CBM reading-aloud data collected three to five times perweek over a 6-month period for 67 students with disabilities(Grades 1–7). The mean increase in number of words read perweek was 1.55 words per minute and tended to have an in-verse relationship to grade level (a pattern that has been repli-cated in other research that will be reviewed later). The meanSEE across grade levels was 10.17, with a range of 8.45 inGrade 1 to 1.56 in Grade 4.

In 1992, Fuchs and Fuchs examined the ratio of the slopevalue to the standard error of estimate. SEE indexes the de-gree of intraindividual instability in the CBM data. A largeSEE in proportion to the slope makes the graphs difficult tointerpret. Fuchs and Fuchs (1992) found that maze had a rel-atively small SEE in proportion to the slope when comparedto the other measures that might serve as alternatives for read-ing aloud.

Shinn et al. (1989) described a desirable slope as one thatwould be (a) easy to compute and interpret, (b) accurate in the

sense of not producing systematic over- or underpredictionsof performance, and (c) precise in the sense that individual er-rors of prediction are minimized. In a series of studies, thesecharacteristics were compared for different methods for slopecalculation (Good & Shinn, 1990; Parker & Tindal, 1992;Shinn, Good, & Stein, 1989). Results of these studies sup-ported the use of ordinary least squares (OLS) for calculatingslope compared to other methods (see Note 3). OLS was notthe easiest method for calculating slope, but it did not sys-tematically over- or underpredict future performance given arelatively small number (e.g., 10) of data points, and it mini-mized individual errors of prediction. Results of the slopestudies revealed negatively accelerating growth rates withinthe academic year.

Dunn and Eckert (2002) discussed limitations of studiesof slope, stating that the studies compared line-fitting meth-ods to each other rather than to an absolute standard of tech-nical adequacy and focused primarily on predicting later datafrom earlier data in the same data set. Dunn and Eckert pro-posed examination of the correlations between words readcorrectly and time (school day) as an indicator of accuracy ofthe slope. The square of the correlation coefficient would re-flect the amount of variability in the words read correctly dueto time. A higher percentage would indicate a more accurateslope line. Dunn and Eckert compared slopes for grade-leveland above-grade-level material (see description of the studyin the Materials section). Results revealed median correlationcoefficients of .15 and .14, respectively, indicating that verylittle variability in the number of words read correctly couldbe attributable to time.

Both Hintze and Christ (2004) and Christ (2006) con-sidered SEb in their investigations of slope reliability. Hintzeand Christ (2004; reviewed earlier) found that controlling thedifficulty of progress-monitoring material reduced both SEEand SEb. Christ (2006) examined the likely magnitudes ofSEb for different values of SEE and for different durationsof progress monitoring. Results revealed that the longer theprogress-monitoring duration, the smaller the SEE and thesmaller the SEbs (9.19 for 2 weeks compared to .42 for 15weeks). If one assumed a moderate amount of SEE, 9 to 10data points were needed to reduce SEbs to levels below 1. Tento 12 data points reduced SEbs to between .59 and .78. Christ(2006) discussed the importance of controlling testing condi-tions and passage difficulty to reduce SEE.

Most studies of slope have focused on the stability of theslopes, but little attention has been paid to the validity of theslopes generated by CBM data. We use validity in this senseto refer to the degree to which slope values predict perfor-mance on external criterion measures. With the developmentand ease of use of new statistical techniques, investigationsof the validity of slopes have become easier to accomplish.For example, Shin, Deno, and Espin (2000) used Hierarchi-cal Linear Modeling (HLM) to examine growth rates on mazeselection measures for second-grade students over the courseof a year. Results revealed that maze selection sensitively re-


flected improvement in student performance across the yearand reflected interindividual differences in growth rates. Inaddition, growth on maze was positively and significantly re-lated to later performance on a standardized reading achieve-ment test. Tichá et al. (2007) and Espin et al. (2007) usedHLM to examine growth rates produced by maze selection formiddle-school students. They found that growth on a maze se-lection task was significantly related to performance on a statestandards test and to growth on a standardized achievementtest. Finally, Speece and Ritchey (2005) used growth-curveanalysis to examine the growth rates produced by readingaloud for a sample of at-risk first-graders. Results revealedthat rate of growth on the reading-aloud measures in firstgrade predicted rate of growth in second grade, as well as end-of-year performance in second grade.

Standard Rates of Growth

A question often posed by practitioners is how much growthone can expect on CBM measures and whether growth ex-pectations differ by age or performance levels. Several stud-ies have examined these standard rates of growth. Marston,Lowry, Deno, and Mirkin (1981) examined growth rates fromfall to winter to spring for students in Grades 1 through 6 forboth reading aloud and word ID. Fifty-eight randomly se-lected general education students participated. Both word IDand reading-aloud measures reflected growth over time andexhibited steep continual, linear growth. Dramatic changes wereobserved in the earlier grades, with less dramatic changes inthe upper grades, a result seen in other studies (e.g., Deno etal., 1982; MacMillan, 2000). Hasbrouck and Tindal (1992)published CBM norms for reading aloud on the basis of datacollected over a period of 9 years from 7,000 to 9,000 stu-dents in Grades 2 through 5. They reported normative perfor-mance levels by time of year (fall, winter, spring), grade, andpercentile level within grade.

The limitation of grade-level standards is that they donot take into account an individual’s beginning level of per-formance. Students who are low performing may never reachgrade-level standards, even when they are improving (Fuchs,Fuchs, Hamlett, Walz, & Germann, 1993). Fuchs et al. (1993)addressed intraindividual norms for weekly growth rates onCBM reading-aloud and maze measures. Participants were twosamples of students in Grades 1 through 6. During the firstyear of the study, students (N = 117) were measured in grade-level material once each week using reading aloud. During thesecond year, students (N = 257) were measured in grade-levelmaterial at least monthly using a computerized maze selec-tion program. Results revealed that for the majority of stu-dents, a linear model fit the growth data for both reading aloudand maze, but for a proportion of the students, a quadraticmodel fit the data. For most of these cases, a slightly negativepattern of growth was found. Weekly growth rates were cal-culated by grade level. Results revealed differences in growthrates as a function of grade for reading aloud but not for maze.

As with earlier research, the magnitude of slopes was foundto decrease with an increase in grade. However, similar to thefindings of Jenkins and Jewell (1993), no such relationshipwas found for maze. Fuchs et al. (1993) presented what theytermed realistic and ambitious standards for weekly growth.For reading aloud, these standards were, by grade level, 2 and3 words per week (Grade 1), 1.5 and 2.0 (Grade 2), 1.0 and1.5 (Grade 3), .85 and 1.1 (Grade 4), .5 and .8 (Grade 5), and.3 and .65 (Grade 6). For maze, realistic and ambitious stan-dards for growth remained the same across grades and were.39 and .84 word choices per week, respectively.

Deno, Fuchs, Marston, and Shin (2001) established aca-demic growth standards for students in general and special ed-ucation under typical and effective instructional practices.Growth rates under typical conditions were generated usingextant databases from four educational agencies across thecountry. Growth rates under effective instructional conditionswere generated by combining data across studies in whichinstructional practices had been implemented and shown tobe effective. As with previous research, results revealed thegreatest growth in the early grades, with a decrease in growthrates with age. Typical growth rates for general and specialeducation by grade level were 1.8 and .83 (Grade 1), 1.66 and.57 (Grade 2), 1.18 and .58 (Grade 3), 1.01 and .58 (Grade 4),.58 and .58 (Grade 5), and .66 and .62 (Grade 6). Large dif-ferences existed between the growth rates of general and spe-cial education students up until Grade 4. In the second part ofthe study, Deno et al. (2001) examined growth rates for stu-dents with learning disabilities who had received effective in-structional treatments across five different studies. Growthrates for both reading aloud and maze were reported. Growthrates for reading aloud ranged from .83 to 2.10 words perweek. Growth rates for maze ranged from .56 to .70 wordsper week. With regard to the reading-aloud data, the resultswere striking in the sense that under effective instructionalconditions, students with LD exhibited growth rates close tothe typical growth rates for general education students seen inthe first part of the study.


Research on the technical adequacy of the growth rates pro-duced by CBM measures supports the dependability of singlemeasures; however, results are more mixed with regard toslope. Slopes have been found to predict future CBM perfor-mance and have also been found to predict performance andprogress on external measures. However, concern exists aboutthe variability of the data points around the slope and the vari-ability of the slope values themselves. Concern also existsabout establishing equivalence of passages used for progressmonitoring. More research is needed on the technical proper-ties of the slope values produced by various CBM readingmeasures, especially as they relate to the use of CBM withinthe framework of high-stakes decisions, such as those in-volved in eligibility determination. Future research should


also examine the relative effects of exceptionally high or lowdata points at various time points on slope values, and practi-tioners’ abilities to interpret slope values and apply them toinstructional decision making.

Although studies of standard growth rates make an im-portant contribution to the CBM literature, the results must beviewed with caution. First, as acknowledged by the researchers,none of the studies employed a nationally representative dataset. Second, as discussed earlier in this review, the growthrates obtained in these studies may have been affected by thematerials in the studies, specifically the equivalence of thepassages used to establish the growth rates. To illustrate thispoint, perusal of the various studies included in this reviewreveal growth rates for general education students in Grade 4ranging from .24 to 2.69 words per week. Similar diversity ingrowth rates can be seen at the other grade levels. It may beimportant to consider establishing nationally standard growthrates by creating a standard set of passages that are controlledfor difficulty and measuring a nationally representative sam-ple of students on a weekly basis. In addition, following thelead set by Fuchs et al. (1993), it seems important to considerboth typical and ambitious growth standards.

Conclusion

We begin our Conclusion section with a return to the questionposed at the beginning of the article: Are CBM measures“valid enough” for the purposes for which they are used? Webelieve that the data have supported the validity of the CBMreading-aloud measure for use by classroom teachers as an in-dicator of the performance and progress in reading for ele-mentary school students, Grades 2 to 5. The measures havebeen shown to relate to a variety of criterion measures acrossa multitude of studies conducted over many years with dif-ferent participants, methods, materials, and researchers. In ad-dition, the theoretical basis for the strength of reading aloudas an indicator of general reading proficiency has been sup-ported, and the measures have been shown to be dependable.

Despite the positive support for the reading-aloud mea-sures, we offer some words of caution about the CBM readingmeasures in general. First, with respect to reading aloud, themeasures have not always been found to be strongly related tocriterion measures. Although correlations are consistently inthe .70s or above, there are studies that have resulted in corre-lations in the .40s between reading aloud and criterion variables.A closer look at the reasons for these exceptions is in order.Research has suggested that the relationship between readingaloud and criterion variables may decrease as students getolder; however, other factors may also influence the relation-ship, such as the influence of various passage characteristicson the relation between reading aloud and criterion variables.

Second, the overwhelming majority of research in CBMreading has been done with reading aloud, and with samplesof students in Grades 2 through 5. The question of the valid-

ity of the measures for younger (K–Grade 1) and older (Grades6–12) students, and for students of diverse backgrounds, isstill open for examination. With regard to student age, researchsuggests that a word-ID measure may be more appropriatethan reading aloud would be for younger, beginning readers,and that maze may be more appropriate for older, middleschool students. With regard to students with diverse needs,correlations between the CBM measures and criterion vari-ables have generally been positive across various groups, butthere is evidence that CBM reading aloud overestimates theperformance of African American and EL students, whichcould result in underidentification for services. For studentswith sensory disabilities, too few studies have been conductedto draw conclusions.

Finally, we raise the issue of the uses of CBM measures.CBM measures are no longer simply a means for special ed-ucation teachers to evaluate the effects of instructional pro-grams. The use of CBM measures has shifted from that ofmonitoring the progress of students in special education to usein high-stakes decisions that carry important social conse-quences. This shift creates a new set of standards for the mea-sures. The validity of the measures for these new purposes hasyet to be established. There is little known, relatively speak-ing, about the technical characteristics of the slopes producedby CBM measures. For example, what is the best method fordetermining validity and reliability of the slope? How manydata points are needed to obtain a reliable and valid slope?Does this number differ with the age of the student, the ma-terial used, or the equivalency of “parallel” passages? Howdo we establish parallel passages? There is also little knownabout normative or ambitious rates of performance and growth.For example, how much growth should be expected fromstudents at different age and performance levels? Should na-tional norms be developed? If so, must standard sets of mate-rials be developed for this purpose? Finally, little is knownabout teachers’ understanding of CBM progress data and thethinking processes teachers use as they interpret and use datafor decision making. For example, how accurate are teachersat interpreting CBM data? How long does it take them to learnhow to interpret and use CBM data? How easy is it for teach-ers to connect progress-monitoring data to instructional deci-sions, and are there methods to enhance their ability to tie dataand instruction together? We believe that areas must be ex-plored if CBM is to be used as a part of a decision-makingprocess related to determining the need for special educationservices.

Despite these words of caution, we believe that the flex-ibility and durability of CBM in reading across different mea-sures, materials, settings, students, and situations is notable.This flexibility and durability provide the basis for consider-ing the development of a seamless and flexible system ofprogress monitoring that could be used across students of var-ious ages and performance levels. Such a system might allowone to follow the progress of a student from kindergarten toGrade 12, using the same measures and materials or linking

measures and materials. Many of the issues raised above (e.g.,establishment of normative levels and growth rates using stan-dard material) would need to be addressed before such as sys-tem could be realized, but the research on the development ofCBM measures in reading is at a point where development ofsuch a seamless and flexible system is conceivable.

AUTHORS’ NOTES

1. The Research Institute on Progress Monitoring at the Universityof Minnesota is funded by the U.S. Department of Education, Of-fice of Special Education Programs (Award H324H030003) andpartially supported the completion of this work.

2. We wish to thank the Netherlands Institute for Advanced Studyin the Humanities and Social Sciences for its support in the prepa-ration of this manuscript.

3. We wish to thank Stan Deno, Ed Shapiro, Ted Christ, JonghoShin, and John Hintze for input on parts of this article.

NOTES

1. These measures can be found under various names in the litera-ture. Reading aloud is often referred to as oral reading fluencyand oral reading. Maze selection is referred to as maze and multiple-choice cloze. Word identification is also referred to asisolated word reading, word list reading, and word identificationfluency. We have chosen to use reading aloud, maze selection,and word identification consistently throughout the article be-cause these terms represent the behavior that the student performson each CBM measurement task; that is, students read aloud, se-lect maze choices, or identify words.

2. Hintze and Silberglitt (2005) and Silberglitt and Hintze (2005)examined various methods for determining cut scores using theCBM measures. Results supported the use of logistic regressioneither alone or in combination with Receiver Operating Charac-teristics (ROC) curve analysis. Given the fact that using logisticregression alone is easier than using it with ROC and given thesimilarities in the pattern of results between the two approaches,we report results here for logistic regression analyses only.

3. Although the research supported the use of OLS for calculatingslope, this is not a method that can be easily calculated by handby teachers. Parker and Tindal (1992) proposed an alternative,Tukey I, that had good statistical properties and could be calcu-lated by hand. To our knowledge, this method has never caughton in either CBM research or practice. In research, the over-whelming majority of studies calculate slope using OLS.

REFERENCES

Allinder, R. M., & Eccarius, M. A. (1999). Exploring the technical adequacyof curriculum- based measurement in reading for children who use man-ually coded English. Exceptional Children, 65, 271–283.

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Ac-curacy of readability estimates’ predictions of CBM performance.School Psychology Review, 20, 1–22.

Ardoin, S. P., Witt, J. C., Suldo, S. M., Connell, J. E., Koenig, J. L., Resetar,J. L., et al. (2004). Examining the incremental benefits of administeringa maze and three versus one curriculum-based measurement readingprobes when conducting universal screening. School Psychology Re-view, 33, 218–233.

Bain, S. K., & Garlock, J. W. (1992). Cross-validation of criterion-related va-lidity for CBM reading passages. Diagnostique, 17, 202–208.

Baker, S. K., & Good, R. H. (1995). Curriculum-based measurement of Eng-lish reading with bilingual Hispanic students: A validation study withsecond-grade students. School Psychology Review, 24, 561–578.

Bradley-Klug, K. L., Shapiro, E. S., Lutz, J. G., & DuPaul, G. J. (1998). Eval-uation of oral reading rate as a curriculum-based measure within litera-ture-based curriculum. Journal of School Psychology, 36, 183–197.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variation incurriculum-based measures of silent reading. Psychology in the Schools,40, 363–377.

Brown-Chidsey, R., Johnson, P., Jr., & Fernstrom, R. (2005). Comparison ofgrade-level controlled and literature-based maze CBM reading passages.School Psychology Review, 34, 387–394.

Christ, T. J. (2006). Short-term estimates of growth using curriculum-basedmeasurement of oral reading fluency: Estimates of standard error ofslope to construct confidence intervals. School Psychology Review, 35,128–133.

Christ, T. J., & Silberglitt, B. (in press). Curriculum-based measurement oforal reading fluency: The standard error of measurement. School Psy-chology Review.

Compton, D. L., Appleton, A. C., & Hosp, M. K. (2004). Exploring the re-lationship between text-leveling systems and reading accuracy and flu-ency in second-grade students who are average and poor decoders.Learning Disabilities Research & Practice, 19, 176–184.

Compton, D. L., Fuchs, D., Fuchs, L. S., & Bryant, J. D. (2006). Selectingat-risk readers in first grade for early intervention: A two-year longitu-dinal study of decision rules and procedures. Journal of EducationalPsychology, 98, 394–409.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to pre-dict student performance on statewide achievement tests. EducationalAssessment, 7, 303–323.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The de-pendability of behavioral measurements: Theory of generalizability forscores and profiles. New York: Wiley.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychologicaltests. Psychological Bulletin, 52, 281–302.

Daly, E. J., III, Wright, J. A., Kelly, S. Q., & Martens, B. K. (1997). Mea-sures of early academic skills: Reliability and validity with a first gradesample. School Psychology Quarterly, 12, 268–280.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alterna-tive. Exceptional Children, 52, 219–232.

Deno, S. L. (1990). Individual differences and individual difference: The es-sential difference of special education. The Journal of Special Educa-tion, 24, 160–173.

Deno, S. L., Fuchs, L. S., Marston, D., & Shin, J. H. (2001). Using curricu-lum-based measurement to establish growth standards for students withlearning disabilities. School Psychology Review, 30, 507–524.

Deno, S. L., Marston, D., Mirkin, P., Lowry, L., Sindelar, P., & Jenkins, J.(1982). The use of standard tasks to measure achievement in reading,spelling, and written expression: A normative and developmental study.Institute for Research on Learning Disabilities, 87.

Deno, S. L., Maruyama, G., Espin, C. A., & Cohen, C. (1989). The basic aca-demic skills samples (BASS). Minneapolis, MN: University of Min-nesota.

Deno, S. L., Mirkin, P. K., & Chiang, B. (1982). Identifying valid measuresof reading. Exceptional Children, 49, 36–45.

Dunn, E. K., & Eckert, T. L. (2002). Curriculum-based measurement in read-ing: A comparison of similar versus challenging material. School Psy-chology Quarterly, 17, 24–46.

Espin, C. A., & Deno, S. L. (1993a). Content-specific and general readingdisabilities of secondary students: Identification and educational rele-vance. The Journal of Special Education, 27, 321–337.

Espin, C. A., & Deno, S. L. (1993b). Performance in reading from contentarea text as an indicator of achievement. Remedial and Special Educa-tion, 14, 47–59.



Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for sec-ondary students: Utility and task specificity of text-based reading andvocabulary measures for predicting performance on content-area tasks.Diagnostique, 20, 121–142.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989, March). TheBasic Academic Skills Samples (BASS): An instrument for the screeningand identification of children at risk for failure in regular educationclassrooms. Paper presented at the National Convention of the Ameri-can Educational Research Association, San Francisco, CA.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures forpredicting secondary students’ performance on content-area tasks. Ex-ceptional Children, 62, 497–514.

Espin, C., Wallace, T., Lembke, E., Campbell, H., & Long, J. (2007). Creat-ing a progress measurement system in reading for secondary students:Monitoring progress towards meeting high stakes standards. Manuscriptsubmitted for publication.

Fewster, S., & MacMillan, P. D. (2002). School-based evidence for the va-lidity of curriculum-based measurement of reading and writing. Reme-dial and Special Education, 23, 149–156.

Fuchs, L. S., & Deno, S. L. (1992). Effects of curriculum within curriculum-based measurement. Exceptional Children, 58, 232–243.

Fuchs, L. S., & Deno, S. L. (1994). Must instructionally useful performanceassessment be based in curriculum? Exceptional Children, 61, 15–24.

Fuchs, L. S., & Fuchs, D. (1990). [Relations among the Reading Compre-hension subtest of the Stanford Achievement Test and maze, recall, ques-tion answering, and fluency measures]. Unpublished data.

Fuchs, L. S., & Fuchs, D. (1992). Identifying a measure for monitoring stu-dent reading progress. School Psychology Review, 21, 45–58.

Fuchs, L. S., Fuchs, D., & Compton, D. L. (2004). Monitoring early readingdevelopment in first grade: Word identification fluency versus nonsenseword fluency. Exceptional Children, 71, 7–21.

Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989). Effects of instrumental useof curriculum-based measurement to enhance instructional programs.Remedial and Special Education, 10, 43–52.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., Walz, L., & Germann, G. (1993). For-mative evaluation of academic progress: How much growth can we ex-pect? School Psychology Review, 22, 27–48.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal com-prehension measures. Remedial and Special Education, 9, 20–28.

Fuchs, L. S., Tindal, G., & Deno, S. L. (1981). Effects of varying item do-main and sample duration on technical characteristics of daily measuresin reading. Institute for Research on Learning Disabilities, 48.

Gardner, R. F., Rudman, H. C., Karlsen, B., & Merwin, J. C. (1982). Stan-ford achievement test. Iowa City: Harcourt Brace Jovanovich.

Good, R. H., & Shinn, M. R. (1990). Forecasting accuracy of slope estimatesfor reading curriculum-based measurements: Empirical evidence. Be-havioral Assessment, 12, 179–193.

Good, R. H., Simmons, D. C., & Kame’enui, E. J. (2001). The importanceand decision-making utility of a continuum of fluency-based indicatorsof foundational reading skills for third-grade high-stakes outcomes. Sci-entific Studies of Reading, 5, 257–288.

Graves, A. W., Plasencia-Peinado, J., Deno, S. L., & Johnson, J. R. (2005).Formatively evaluating the reading progress of first-grade EnglishLearners in multiple-language classrooms. Remedial and Special Edu-cation, 26, 215–225.

Guthrie, J. T. (1973). Reading comprehension and syntactic responses in goodand poor readers. Journal of Educational Psychology, 65, 294–299.

Guthrie, J. T., Seifert, M., Burnham, N. A., & Caplan, R. I. (1974). The mazetechnique to assess, monitor reading comprehension. The ReadingTeacher, 28, 161–168.

Hamilton, C., & Shinn, M. R. (2003). Characteristics of word callers: An in-vestigation of the accuracy of teachers’ judgments of reading compre-hension and oral reading skills. School Psychology Review, 32, 228–240.

Harcourt Brace & Company. (1997). Stanford Achievement Test Series–Ninthedition: Technical data report. San Antonio, TX: Author.

Harcourt Educational Measurement. (1998). Metropolitan achievement test(7th ed.). San Antonio, TX: Harcourt Assessment.

Harris, A. P., & Jacobson, M. D. (1972). Basic elementary reading vocabu-laries. New York: MacMillan.

Hartman, J. M., & Fuller, M. L. (1997). The development of curriculum-basedmeasurement norms in literature-based classrooms. Journal of SchoolPsychology, 35, 377–389.

Hasbrouck, J. E., & Tindal, G. (1992). Curriculum-based oral reading flu-ency norms for students in grades 2 through 5. Teaching ExceptionalChildren, 24(3), 41–44.

Hintze, J. M., Callahan, J. E., Matthews, W. J., Williams, S. A. S., & Tobin,K. G. (2002). Oral reading fluency and prediction of reading compre-hension in African American and Caucasian elementary school children.School Psychology Review, 31, 540–553.

Hintze, J. M., & Christ, T. J. (2004). An examination of variability as a func-tion of passage variance in CBM progress monitoring. School Psychol-ogy Review, 33, 204–217.

Hintze, J. M., Daly, E. J., & Shapiro, E. S. (1998). An investigation of the ef-fects of passage difficulty level on outcomes of oral reading fluencyprogress monitoring. School Psychology Review, 27, 433–445.

Hintze, J. M., Owen, S. V., Shapiro, E. S., & Daly, E. J. (2000). Generaliz-ability of oral reading fluency measures: Application of G theory to cur-riculum-based measurement. School Psychology Quarterly, 15, 52–68.

Hintze, J. M., & Pelle Petitte, H. A. (2001). The generalizability of CBM oralreading fluency measures across general and special education. Journalof Psychoeducational Assessment, 19, 158–170.

Hintze, J. M., & Shapiro, E. S. (1997). Curriculum-based measurement andliterature-based reading: Is curriculum-based measurement meeting theneeds of changing reading curricula? Journal of School Psychology, 35,351–375.

Hintze, J. M., Shapiro, E. S., Conte, K. L., & Basile, I. M. (1997). Oral read-ing fluency and authentic reading material: Criterion validity of the tech-nical features of CBM survey-level assessment. School PsychologyReview, 26, 535–553.

Hintze, J. M., Shapiro, E. S., & Lutz, J. (1994). The effects of curriculum onthe sensitivity of curriculum-based measurement in reading. The Jour-nal of Special Education, 28, 188–202.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the di-agnostic accuracy and predictive validity of R-CBM and high-stakestesting. School Psychology Review, 34, 372–386.

Hixson, M. D., & McGlinchey, M. T. (2004). The relationship between race,income, and oral reading fluency and performance on two reading com-prehension measures. Journal of Psychoeducational Assessment, 22,351–364.

Hoover, H. D., Hieronymus, A. N., Frisbie, D. A., & Dunbar, S. B. (1996).Iowa test of basic skills. Itasca, IL: Riverside.

Hosp, M. K., & Fuchs, L. S. (2005). Using CBM as an indicator of decod-ing, word reading, and comprehension: Do the relations change withgrade? School Psychology Review, 34, 9–26.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measuresfor formative teaching: Reading aloud and maze. Exceptional Children,59, 421–432.

Kaminski, R. A., & Good, R. H., III. (1998). Assessing early literacy skillsin a problem-solving model: Dynamic Indicators of Basic Early Liter-acy Skills. In M. Shinn (Ed.), Advanced applications of curriculum-based measurement (pp. 113–142). New York: Guilford Press.

Karlsen, B., & Gardner, E. (1985). Stanford diagnostic reading test (3rd ed.).San Antonio, TX: Psychological Corp.

Kaufman, A. S., & Kaufman, N. L. (1985). Kaufman test of educationalachievement (brief form). Circle Pines, MN: American Guidance Service.

Kaufman, A. S., & Kaufman, N. L. (1990). Kaufman brief intelligence test.Circle Pines, MN: American Guidance Service.

Klein, J. R., & Jimerson, S. R. (2005). Examining ethnic, gender, language,and socioeconomic bias in oral reading fluency scores among Caucasianand Hispanic students. School Psychology Quarterly, 20, 23–50.


Koslin, B. L., Koslin, S., Zeno, S. M., & Ovens, S. H. (1989). The Degreesof Reading Power Test: Primary and standard forms. Brewster, NY:Touchstone Applied Science Associates.

Kranzler, J. H., Brownell, M. T., & Miller, M. D. (1998). The construct va-lidity of curriculum-based measurement of reading: An empirical testof a plausible rival hypothesis. Journal of School Psychology, 36, 399–415.

Kranzler, J. H., Miller, M. D., & Jordan, L. (1999). An examination ofracial/ethnic and gender bias on curriculum-based measurement of read-ing. School Psychology Quarterly, 14, 327–342.

MacGinitie, W. H., Kamons, J., Kowalski, R. L., MacGinitie, R. K., &McKay, T. (1978). Gates-MacGinitie Reading Tests (2nd ed.). Chicago:Riverside.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gen-der, and relative-age effects: Many-faceted Rasch applied to CBM read-ing scores. Journal of Applied Measurement, 1, 393–408.

Markell, M. A., & Deno, S. L. (1997). Effects of increasing oral reading: Gen-eralization across reading tasks. The Journal of Special Education, 31,233–250.

Marston, D. (1989). A curriculum-based measurement approach to assessingacademic performance: What it is and why do it. In M. Shinn (Ed.), Cur-riculum-based measurement: Assessing special children (pp.18–78).New York: Guilford Press.

Marston, D., & Deno, S. L. (1982). Implementation of direct and repeated mea-surement in the school setting. Institute for Research on Learning Dis-abilities, 106.

Marston, D., Deno, S. L., & Tindal, G. (1983). A comparison of standardizedachievement tests and direct measurement techniques in measuringpupil progress. Institute for Research on Learning Disabilities, 126.

Marston, D., Lowry, L., Deno, S., & Mirkin, P. (1981). An analysis of learn-ing trends in simple measures of reading, spelling, and written expression:A longitudinal study. Institute for Research on Learning Disabilities, 49.

McConnell, S. R., McEvoy, M. A., & Priest, J. S. (2002). “Growing” mea-sures for monitoring progress in early childhood education: A researchand development process for individual growth and development indi-cators. Assessment for Effective Intervention, 27, 3–14.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based mea-surement to predict performance on state assessments in reading. SchoolPsychology Review, 33, 193–203.

McGrew, K. S., & Woodcock, R. W. (2001). Woodcock-Johnson III techni-cal manual. Itasca, IL: Riverside.

Mehrens, W. A., & Clarizio, H. F. (1993). Curriculum-based measurement:Conceptual and psychometric considerations. Psychology in the Schools,30, 241–254.

Messick, S. (1989a). Meaning and values in test validation: The science andethics of assessment. Educational Researcher, 18, 5–11.

Messick, S. (1989b). Validity. In R. L. Linn (Ed.), Educational measurement(3rd ed., pp. 13–103). New York: Macmillan.

Michigan State Board of Education. (1999). 1999 Update of the EssentialSkills Reading Test Blueprint Michigan Educational Assessment Pro-gram. Lansing: Michigan State Board of Education.

Morgan, S. K., & Bradley-Johnson, S. (1995). Technical adequacy of cur-riculum-based measurement for Braille readers. School Psychology Re-view, 24, 94–103.

Minnesota Department of Children, Families, and Learning. (1998–2002).Test specifications: Minnesota Comprehensive Assessment–Reading. St.Paul, MN: Author.

Muyskens, P., & Marston, D. B. (2006). The relationship between curriculum-based measurement and outcomes on high-stakes tests with secondarystudents. Minneapolis Public Schools. Unpublished manuscript.

Oregon Department of Education. (2000). Statewide assessment results 2000.Retrieved from the World Wide Web: http://www.ode.state.or.us/

Parker, R., & Tindal, G. (1992). Estimating trend in progress monitoring data:A comparison of simple line-fitting methods. School Psychology Re-view, 21, 300–312.

Poncy, B. C., Skinner, C. H., & Axtell, P. K. (2005). An investigation of thereliability and standard error of measurement of words read correctlyper minute using curriculum-based measurement. Journal of Psychoed-ucational Assessment, 23, 326–338.

Powell-Smith, K. A., & Bradley-Klug, K. L. (2001). Another look at the “C” inCBM: Does it really matter if curriculum-based measurement readingprobes are curriculum-based? Psychology in the Schools, 38, 299–312.

Prescott, G. A., Balow, I. H., Hogan, T. P., & Farr, R. C. (1984). Metropoli-tan achievement tests (MAT-6). San Antonio, TX: Psychological Corp.

Reid, D. K., Hresko, W. P., Hammill, D. D., & Wiltshire, S. M. (1991). Testof early reading ability: Special edition for students who are deaf orhard of hearing. Austin, TX: PRO-ED.

Riley-Heller, N., Kelly-Vance, L., & Shriver, M. (2005). Curriculum-basedmeasurement: Generic vs. curriculum-dependent probes. Journal of Ap-plied School Psychology, 21(1), 141–162.

Scannell, D. P., Haugh, O. M., Schild, A. H., & Ulmer, G. (1986). Tests ofachievement and proficiency. Chicago, IL: Riverside.

Shin, J., Deno, S. L., & Espin, C. (2000). Technical adequacy of the mazetask for curriculum-based measurement of reading growth. The Journalof Special Education, 34, 164–172.

Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing spe-cial children. New York: Guilford Press.

Shinn, M. R., Gleason, M. M., & Tindal, G. (1989). Varying the difficulty oftesting materials: Implications for curriculum-based measurement. TheJournal of Special Education, 23, 223–233.

Shinn, M. R., Good, R. H., Knutson, N., Tilly, W. D., & Collins, V. L. (1992).Curriculum-based measurement of oral reading fluency: A confirmatoryanalysis of its relation to reading. School Psychology Review, 21, 459– 479.

Shinn, M. R., Good, R. H., & Stein, S. (1989). Summarizing trend in studentachievement: A comparison of methods. School Psychology Review, 18,356–370.

Shinn, M. R., Ysseldyke, J. E., Deno, S. L., & Tindal, G. A. (1986). A com-parison of differences between students labeled Learning Disabled andLow Achieving on measures of classroom performance. Journal ofLearning Disabilities, 19, 545–551.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cutscores to track progress toward success on state-mandated achievementtests: A comparison of methods. Journal of Psychoeducational Assess-ment, 23, 304–325.

Skiba, R. J., Deno, S. L., Marston, D., & Wesson, C. (1986). Characteristicsof time-series data collected through curriculum-based reading mea-surement. Diagnostique, 12, 3–15.

Spache, G. (1981). Diagnostic reading scales. Monterey, CA: CTB/McGraw-Hill.

Speece, D. L., & Ritchey, K. D. (2005). A longitudinal study of the devel-opment of oral reading fluency in young children at risk for reading fail-ure. Journal of Learning Disabilities, 38, 387–399.

Stage, S. A., & Jacobsen, M. D. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency.School Psychology Review, 30, 407–419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-basedmeasurement to improve student achievement: Review of research. Psy-chology in the Schools, 42, 795–819.

Sticht, T. G. (1973). Research toward the design, development and evalua-tion of a job-functional literacy program for the U.S. Army. LiteracyDiscussions, 4, 339–369.

Tichá, R., Espin, C. A., & Wayman, M. M. (2007). Reading progress moni-toring for secondary-school students: Reliability, validity, and sensitiv-ity to growth of reading aloud and maze selection measures. Manuscriptsubmitted for publication.

Tindal, G., Flick, D., & Cole, C. (1992). The effect of curriculum on inferencesof reading performance and improvement. Diagnostique, 18, 69–84.

Tindal, G., Marston, D., Deno, S. L., & Germann, G. (1982). Curriculum dif-ferences in direct repeated measures of reading. Institute for Researchon Learning Disabilities, 93.


Washington Assessment of Student Learning: Test sample. (1998). Olympia,WA: Office of the Superintendent of Public Instruction.

Woodcock, R. W. (1973). Woodcock reading mastery tests manual. CirclePines, MN: American Guidance Service.

Woodcock, R. W. (1987). Woodcock reading mastery test–Revised. CirclePines, MN: American Guidance Service.

Woodcock, R. W., & Johnson, M. B. (1989). Woodcock-Johnson test ofachievement: Standard and supplemental batteries. Allen, TX: DLMResources.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as pre-dictors of success for English learners on a state standards assessment.Remedial and Special Education, 26, 207–214.

Yell, M. L. (1992). Barriers to implementing curriculum-based measurement.Diagnostique, 18, 99–112.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-levelinvariance of a theoretical causal structure predicting reading compre-hension with vocabulary and oral reading fluency. Educational Mea-surement: Issues and Practice, 24(3), 4–12.

literature synthesis on curriculum-based measurement in ...shinn, 1989). research in reading focused...

Documents