literature synthesis on curriculum-based measurement in ...shinn, 1989). research in reading focused...
TRANSCRIPT
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007/PP. 85–120 85
Curriculum-based measurement (CBM) is a method for mon-itoring student growth in an academic area and evaluating theeffects of instructional programs on that growth (Deno, 1985).CBM was designed to be part of a problem-solving approachto special education whereby the academic difficulties of stu-dents would be viewed as problems to be solved rather thanas immutable characteristics within a child (Deno, 1990). In theproblem-solving approach, teachers were the “problem solvers”who constantly evaluated and modified students’ instructionalprograms. For a problem-solving approach to be effective, itwas necessary for teachers to have a tool that could be usedto evaluate growth in response to instruction. CBM was de-veloped to serve that purpose.
Two separate but related concerns drove the initial re-search into the development of CBM (Deno, 1985). The firstwas the concern for technical adequacy. If teachers were to usethe measures to make instructional decisions, the measureswould have to have demonstrated reliability and validity. Thesecond was the concern for practicality. If teachers were touse the measures on an ongoing and frequent basis to evalu-ate instructional programs, the measures would have to besimple, efficient, easily understood, and inexpensive. These dualconcerns led to the concept of “vital signs,” or indicators ofstudent performance (Deno, 1985). CBM measures were con-ceptualized to be short samples of work that would be indi-cators, or vital signs, of academic performance. The sampleswould need to be valid and reliable with respect to the broaderacademic domain they were representing, but would also needto be designed to be given on a frequent and repeated basis.
In 1989, Marston reviewed the existing research on CBM.At that time, CBM was viewed primarily as a progress-monitoring tool in basic skills for special education studentsat the elementary-school level (although there were discus-sions and instances of its uses more broadly, for example, seeShinn, 1989). Research in reading focused on two measures:word identification and reading aloud. The results of Marston’s
Literature Synthesis on Curriculum-Based Measurement in Reading
Miya Miura Wayman, Teri Wallace, Hilda Ives Wiley, Renáta Tichá, and Christine A. Espin,University of Minnesota
In this article, the authors review the research on curriculum-based measurement (CBM) in readingpublished since the time of Marston’s 1989 review. They focus on the technical adequacy of CBM re-lated to measures, materials, and representation of growth. The authors conclude by discussing issuesto be addressed in future research, and they raise the possibility of the development of a seamless andflexible system of progress monitoring that can be used to monitor students’ progress across students,settings, and purposes.
review provided support for the use of these two measures asindicators of general reading proficiency. In terms of reliabil-ity, results of five studies revealed test–retest reliability coef-ficients ranging from .82 to .97, with most coefficients above.90, and alternate-form reliability coefficients ranging from.84 to .96, with most coefficients above .90. Interrater agree-ment was .99. In terms of validity, 14 studies were reviewed.Criterion-related validity coefficients with published mea-sures of reading ranged from .63 to .90, with most above .80.Criterion-related validity coefficients with basal reading se-ries criterion mastery tests ranged from .57 to .86, with halfabove .80. Reading aloud correlated with teacher judgmentand with various measures of reading comprehension, dis-criminated between lower and higher performing students,and was sensitive to growth.
Since the time of Marston’s (1989) review, the researchon CBM has expanded considerably—especially in the areaof reading—making an updated review timely. Our purposein writing this review is to gather, summarize, and reflect onthe expansive body of literature published over the last 18years on CBM in reading. We focus this review on issues oftechnical adequacy as they relate to measures, materials, andgrowth. Given the vast amount of material and the diversityof topics covered in this review, we insert summaries and dis-cussion points throughout the article. In our final section, wedraw conclusions, raise issues related to future research, anddiscuss the potential development of a seamless and flexiblesystem of progress monitoring that can be used across stu-dents, settings, and purposes.
Method
The first step in our review process was to identify all articlesaddressing CBM in reading, writing, and math in kindergarten(K) to Grade 12. Electronic databases—including ERIC,
Address: Miya Miura Wayman, Department of Educational Psychology, 250D Burton Hall, 178 Pillsbury Dr. SE, University of Minnesota, Minneapolis, Minnesota 55455. E-mail: [email protected]
86 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
Science Citation Index Expanded, PsycInfo, Digital Disser-tation, and the Expanded Academic Index—were searchedusing the following terms: curriculum based measurement,curriculum-based measurement, curriculum based measure,curriculum-based measure, general outcome measure, andprogress monitoring. This initial search yielded 578 articles,dissertations, and reports related to CBM. Titles and abstractsof these documents were screened to confirm that they wererelated to CBM, and Method sections were screened to iden-tify those that reported results of empirical studies of CBM,yielding 160 documents. These documents were then reviewedby a team of educational psychology graduate students andgrouped by subject area (reading, mathematics, spelling, andwriting). Ninety (56%) of the documents addressed readingmeasures. In addition to documents identified through the lit-erature search, the complete set of technical reports producedby the Institute for Research on Learning Disabilities (IRLD)at the University of Minnesota was accessed.
Given the vast database on reading, and the limitationsof space accorded a review article, we chose to narrow ourreview in several key ways. First, we focused on research pub-lished since the time of Marston’s 1989 review. Second, we fo-cused on studies related to the technical adequacy of readingmeasures. Studies in which teachers used CBM to monitorprogress and to make instructional decisions have been reviewedrecently (Stecker, Fuchs, & Fuchs, 2005). Third, we focusedon research conducted with school-age students and thus didnot include research on the development of early literacy (e.g.,Dynamic Indicators of Basic Early Literacy [DIBELS]; Ka-minski & Good, 1998) or preliteracy measures (e.g., Individ-ual Growth and Development Indicators; McConnell, McEvoy,& Priest, 2002). Finally, we focused on three common CBMreading measures: reading aloud, maze selection, and wordidentification (see Note 1). Before reporting the results of ourreview, we present a brief discussion of the issue of validity,relating it specifically to the approach taken to investigate thevalidity of measures in CBM.
Issues of ValidityMessick (1989a, 1989b) described construct validity as a mul-tifaceted, but unified, concept that takes into account the ev-idential and consequential factors related to test interpretationand test use. This conceptualization is helpful in understand-ing the research on CBM, which has examined the evidencesupporting the interpretation and use of CBM scores, as wellas the consequences associated with that interpretation anduse. In this review, we focus on the evidential basis for valid-ity of CBM measures, although, in keeping with the view ofvalidity as a unified concept, we continuously consider the po-tential consequences associated with interpretation and use ofthe measures.
In a CBM approach, evidence for the validity of a mea-sure is determined by examining the extent to which themeasure serves as a vital sign or an indicator of a broader aca-
demic domain (Deno, 1985). To determine the evidential basisfor the validity of a measure, the pattern of relations—also re-ferred to as the nomological net (see Cronbach & Meehl, 1955;Messick, 1989b)—between the selected measure and manydifferent criterion measures, each reflecting the construct ofinterest, is examined. Criterion measures might include othermeasures of the construct, such as standardized achievementtests, but might also include student age, group membership,and change in performance in response to an intervention. Itis not just the pattern of relations but also the pattern of non-relations, or the discriminative validity of the measure, that isconsidered. For example, a measure of reading would be ex-pected to relate more closely to another reading measure thanto a math measure. After a pattern of relations is establishedfor particular students, settings, or purposes, the generaliz-ability (Messick, 1989b) of the measure, or the extent to whichthe validity of the measure holds across different settings, stu-dents, and purposes, is examined.
Establishing the validity of a measure is an ongoing andrecursive process (Messick, 1989b). Validity is determined notby one study or by one correlation but by the body of evidenceamassed over time. The question arises, then, as to how onedetermines when the data are strong enough to support the useof the measure for the purpose for which it was intended. Inother words, how good is good enough? There is no set stan-dard for determining when a measure is “good enough” to beconsidered valid for the purpose for which it was designed.One approach is to consider the consequences of decisionmaking with and without the measure (Messick, 1989b). Forexample, one might ask whether using a measure improvesthe ability to make decisions over using no measure at all. Inan area in which few measures exist, this question might bewarranted. Under such circumstances, correlations of .30between the selected measure and the criterion measuresmight be considered strong enough to warrant use of an in-strument for decision-making purposes. However, in the areaof reading, where many measures exist, it seems appropriateto apply more stringent criteria. In addition, the early researchon CBM, as illustrated in Marston’s (1989) review, set a rel-atively high standard for validity and reliability, with manycorrelations reaching levels of .70 to .90. Thus, in this article,we adopt the following guidelines in interpreting the strengthof the reliability and validity coefficients: Strong relations arethose that are .70 and above; moderate relations are those thatare .50 to .70; and weak relations are those that are below .50.We remind the reader, however, that these levels are arbitrar-ily chosen and used merely to help the reader interpret thestrength of relations compared to previous research in read-ing in CBM. To evaluate the overall validity of the measures,it is necessary to consider the entire body of research.
Our review is organized into three sections: TechnicalAdequacy of CBM Measures, Effects of Text Materials, andIssues About Measuring Growth. The studies described ineach section are also outlined in Table 1. We begin our review
(text continues on p. 105)
87T
AB
LE
1.C
hara
cter
isti
cs o
f St
udie
s E
xam
inin
g Te
chni
cal F
eatu
res
of C
BM
in R
eadi
ng
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Rea
ding
Alo
ud
Fuch
s,Fu
chs,
&
354–
8SE
Rea
ding
alo
ud1
WR
CSt
anfo
rd A
chie
vem
ent T
est
Max
wel
l (19
88)
Rea
ding
Com
preh
ensi
on s
ubte
st:.
91W
ord
Skill
s su
btes
t:.8
0
Shin
n,G
ood,
Knu
tson
,23
83
& 5
ND
Rea
ding
alo
ud1
WR
CC
onfi
rmat
ory
fact
or a
naly
sis
& T
illy
(199
2)3
Uni
tary
Mod
el
Rea
ding
Com
pete
nce:
.88–
.90
5Tw
o-Fa
ctor
Mod
elD
ecod
ing:
.89–
.90
Com
preh
ensi
on:.
74–.
75
Hos
p &
Fuc
hs (
2005
)31
01–
4N
DR
eadi
ng a
loud
1W
RC
Woo
dcoc
k R
eadi
ng M
aste
ry T
est–
Rev
ised
Test
–ret
est
(WR
MT-
R)
1W
ord
Atta
ck:.
71W
ord
Iden
tific
atio
n:.9
1.9
2–.9
7Pa
ssag
e C
ompr
ehen
sion
:.79
Bas
ic S
kills
:.86
Tota
l Rea
ding
–Sho
rt:.
902
Wor
d A
ttack
:.82
Wor
d Id
entif
icat
ion:
.88
Pass
age
Com
preh
ensi
on:.
83B
asic
Ski
lls:.
89To
tal R
eadi
ng–S
hort
:.91
3W
ord
Atta
ck:.
82W
ord
Iden
tific
atio
n:.8
8Pa
ssag
e C
ompr
ehen
sion
:.84
Bas
ic S
kills
:.87
Tota
l Rea
ding
–Sho
rt:.
914
Wor
d A
ttack
:.72
Wor
d Id
entif
icat
ion:
.73
Pass
age
Com
preh
ensi
on:.
82B
asic
Ski
lls:.
78To
tal R
eadi
ng–S
hort
:.83
Kra
nzle
r,B
row
nell,
574
GE
Rea
ding
alo
ud1
WR
CE
lem
enta
ry C
ogni
tive
Task
s:−.
13&
Mill
er (
1998
)K
aufm
an B
rief
Int
elli
genc
e Te
st
Mat
rice
s:.2
4K
aufm
an T
est
of E
duca
tion
al A
chie
vem
ent
Rea
ding
Com
preh
ensi
on:.
41
Rea
ding
alo
ud e
xpla
ined
11%
of
the
vari
ance
w
hen
cont
rolli
ng f
or g
ener
al c
ogni
tive
abili
ty
and
men
tal s
peed
(tab
le c
onti
nues
)
88(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Mar
kell
& D
eno
423
GE
Rea
ding
alo
ud1
WR
CL
arge
incr
ease
s in
rea
ding
alo
ud s
core
s (1
997)
wer
e as
soci
ated
with
incr
ease
s in
per
form
-an
ce o
n m
aze
sele
ctio
n an
d co
mpr
ehen
sion
qu
estio
ns. A
n in
vers
e re
latio
nshi
p w
as f
ound
be
twee
n di
ffic
ulty
leve
l and
mea
n re
adin
g al
oud
scor
es.
Ham
ilton
& S
hinn
66
3N
DR
eadi
ng a
loud
1W
RC
Wor
d C
alle
rs p
erfo
rmed
low
er th
an d
id S
imi-
(200
3)la
rly
Flue
nt P
eers
on
Rea
ding
Alo
ud,M
aze
Sele
ctio
n,C
ompr
ehen
sion
,Ora
l Que
stio
n A
nsw
erin
g,an
d W
oodc
ock
Rea
ding
Mas
tery
Te
stPa
ssag
e C
ompr
ehen
sion
.
Maz
e Se
lect
ion
Esp
in,D
eno,
Mar
u-2,
604
1–6
ND
Maz
e se
lect
ion
1C
SR
eadi
ng A
loud
Patte
rn o
f gr
owth
yam
a,&
Coh
en (
1989
)fo
und
with
in a
nd
betw
een
grad
es1 2 3
.77
4.8
65
.86
6Fu
chs
& F
uchs
(19
92)
(18
wee
ks,2
tim
espe
r w
eek)
Yea
r 1
635.
12
SEM
aze
sele
ctio
n2.
5C
S.3
1 C
S pe
r w
eek
(ave
rage
gra
de)
Yea
r 2
635.
12
SEM
aze
sele
ctio
n2.
5C
S.2
9 C
S pe
r w
eek
(ave
rage
gra
de)
257
—G
EM
aze
sele
ctio
n2.
5C
S.3
9 C
S pe
r w
eek
Jenk
ins
& J
ewel
l (19
93)
335
2–6
ND
Maz
e se
lect
ion
1C
SG
ates
-Mac
Gin
itie
Tota
l Rea
ding
:.65
–.76
Patte
rn o
f gr
owth
G
ates
-Mac
Gin
itie
Com
preh
ensi
on:.
63–.
75fo
und
with
in a
nd
Met
ropo
lita
n A
chie
vem
ent T
est–
6 (M
AT-
6)
betw
een
grad
esTo
tal R
eadi
ng:.
66–.
76M
AT-
6 C
ompr
ehen
sion
:.60
–.74
Teac
her
judg
men
t:.5
6R
eadi
ng a
loud
1W
RC
Gat
es-M
acG
init
ieTo
tal R
eadi
ng:.
67–.
88G
ates
-Mac
Gin
itie
Com
preh
ensi
on:.
63–.
86M
AT-
6 To
tal R
eadi
ng:.
60–.
87M
AT-
6 C
ompr
ehen
sion
:.58
–.84
Teac
her
judg
men
t:.6
6(t
able
con
tinu
es)
89(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Ard
oin
et a
l. (2
004)
753
GE
Maz
e se
lect
ion
3C
SW
oodc
ock-
John
son–
III
Bro
ad R
eadi
ng:.
50R
eadi
ng F
luen
cy:.
51Pa
ssag
e C
ompr
ehen
sion
:.31
Let
ter
Wor
d Id
entif
icat
ion:
.43
Iow
a Te
st o
f B
asic
Ski
lls
Tota
l Rea
ding
:.46
Voc
abul
ary:
.35
Rea
ding
Com
preh
ensi
on:.
4977
3G
ER
eadi
ng a
loud
1W
RC
Woo
dcoc
k-Jo
hnso
n–II
I(s
ingl
e pa
ssag
e)B
road
Rea
ding
:.70
Rea
ding
Flu
ency
:.74
Pass
age
Com
preh
ensi
on:.
42L
ette
r W
ord
Iden
tific
atio
n:.6
2Io
wa
Test
of
Bas
ic S
kill
sTo
tal R
eadi
ng:.
64V
ocab
ular
y:.3
5R
eadi
ng C
ompr
ehen
sion
:.58
Wor
d Id
entif
icat
ion
Dal
y,W
righ
t,K
elly
,30
1G
EW
ord
1W
RC
Con
curr
ent:
Test
–ret
est
& M
arte
ns (
1997
)id
entif
icat
ion
Woo
dcoc
k-Jo
hnso
n–R
evis
ed
.94
Bro
ad R
eadi
ng:.
40Pr
edic
tive
Wor
d Id
entif
icat
ion:
.71
Rea
ding
Alo
ud:.
73Fu
chs,
Fuch
s,&
15
11
LA
Wor
d 1
WR
CC
oncu
rren
tC
ompt
on (
2004
)id
entif
icat
ion
Woo
dcoc
k R
eadi
ng M
aste
ry T
est–
Rev
ised
(WR
MT-
R)
Wor
d A
ttack
:.52
–.59
WR
MT-
R W
ord
Iden
tific
atio
n:.7
7–.8
2C
ompr
ehen
sive
Rea
ding
Ass
essm
ent
Bat
tery
(CR
AB
) Fl
uenc
y:.9
3C
RA
B C
ompr
ehen
sion
:.73
Pred
ictiv
e W
RM
T-R
Wor
d A
ttack
:.45
Wor
d Id
entif
icat
ion:
.63
CR
AB
Flu
ency
:.80
CR
AB
Com
preh
ensi
on:.
66Sl
ope
(1 y
ear)
WR
MT-
R W
ord
Atta
ck:.
50W
RM
T-R
Wor
d Id
entif
icat
ion:
.79
CR
AB
Flu
ency
:.85
CR
AB
Com
preh
ensi
on:.
66(t
able
con
tinu
es)
90(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Com
pton
,Fuc
hs,
206
1L
AW
ord
1W
RC
Pred
ictiv
e Fu
chs,
& B
ryan
t id
entif
icat
ion
Gra
de 1
(2
006)
5-w
eek
wor
d ID
leve
l:.8
05-
wee
k w
ord
ID s
lope
:.42
Gra
de 2
Unt
imed
Wor
d-L
evel
Rea
ding
:.39
Tim
ed W
ord-
Lev
el R
eadi
ng:.
46R
eadi
ng C
ompr
ehen
sion
:.42
Com
posi
te S
core
:.47
Gen
eral
izab
ility
of
Res
earc
h:St
uden
t Pop
ulat
ions
and
Pur
pose
s
Esp
in &
Den
o 12
110
HA
& L
AR
eadi
ng
1W
RC
(199
3a)
alou
d (E
nglis
h)H
ATe
sts
of A
chie
vem
ent
and
Pro
ficie
ncy
(TA
P) C
ompl
ete
Com
posi
te:.
36G
rade
poi
nt a
vera
ge:−
.02
Eng
lish
stud
y ta
sk:.
34Sc
ienc
e st
udy
task
:.43
LA
TAP–
Com
plet
e C
ompo
site
:.71
Gra
de p
oint
ave
rage
:.39
Eng
lish
stud
y ta
sk:.
54Sc
ienc
e st
udy
task
:.36
Esp
in &
Den
o12
110
ND
Rea
ding
alo
ud1
WR
CA
ltern
ate-
form
.91
(199
3b)
(Eng
lish)
Test
–ret
est .
91
Esp
in &
Den
o 12
010
ND
Rea
ding
alo
ud1
WR
CE
nglis
h st
udy
task
:.41
(199
5)(E
nglis
h)V
ocab
ular
y 10
CM
.44
mat
chin
g (E
nglis
h)R
eadi
ng
1W
RC
Scie
nce
stud
y ta
sk:.
32al
oud
(Sci
ence
)V
ocab
ular
y 10
CM
.40
Mat
chin
g (S
cien
ce)
Few
ster
& M
acM
illan
46
56–
7G
ER
eadi
ng a
loud
1W
RC
8th-
grad
e E
nglis
h m
ark:
.46
(200
2)8t
h-gr
ade
Soci
al S
tudi
es m
ark:
.39
Rea
ding
alo
ud1
WR
C9t
h-gr
ade
Eng
lish
mar
k:.3
89t
h-gr
ade
Soci
al S
tudi
es m
ark:
.36
Rea
ding
alo
ud1
WR
C10
th-g
rade
Eng
lish
mar
k:.3
110
th-g
rade
Soc
ial S
tudi
es m
ark:
.30
(tab
le c
onti
nues
)
91(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Yov
anof
f,D
uesb
ery,
6,01
24–
8N
DR
eadi
ng a
loud
1W
RC
Alo
nzo,
& T
inda
l (2
005)
4V
ocab
ular
y:.3
5–.6
3C
ompr
ehen
sion
:.60
–.65
5V
ocab
ular
y:.6
1–.6
5C
ompr
ehen
sion
:.60
–.62
6V
ocab
ular
y:.6
0–.6
1C
ompr
ehen
sion
; .42
–.52
7V
ocab
ular
y:.4
9–.5
3C
ompr
ehen
sion
; .42
–.48
8V
ocab
ular
y:.5
3–.5
7C
ompr
ehen
sion
:.51
–.52
Rel
ativ
e im
port
ance
of
flue
ncy
decr
ease
s as
gr
ade
incr
ease
s. V
ocab
ular
y is
a s
igni
fica
nt
pred
icto
r ac
ross
gra
des.
Esp
in &
Foe
gen
176
6–8
ND
Rea
ding
alo
ud1
WR
CC
ompr
ehen
sion
:.57
(199
6)A
cqui
sitio
n:.5
4R
eten
tion:
.52
Maz
e se
lect
ion
2C
SC
ompr
ehen
sion
:.56
Acq
uisi
tion:
.59
Ret
entio
n:.6
2V
ocab
ular
y m
atch
ing
5C
MC
ompr
ehen
sion
:.65
Acq
uisi
tion:
.64
Ret
entio
n:.6
2
Hix
son
& M
cGlin
chey
44
24
ND
Rea
ding
alo
ud1
WR
CM
ichi
gan
Edu
cati
onal
Ass
essm
ent
Pro
gram
(200
4)(M
EA
P) R
eadi
ngC
auca
sian
:.54
Afr
ican
Am
eric
an:.
54Pa
id L
unch
:.53
Free
/Red
uced
Lun
ch:.
50M
etro
poli
tan
Ach
ieve
men
t Tes
t–7
(MA
T-7)
Cau
casi
an:.
78A
fric
an A
mer
ican
:.64
Paid
Lun
ch:.
75Fr
ee/R
educ
ed L
unch
:.66
Rea
ding
alo
ud,l
unch
sta
tus,
and
race
eac
h si
gnif
ican
tly c
ontr
ibut
ed to
pre
dict
ing
perf
orm
-an
ce o
n M
EA
P an
d M
AT-
7
(tab
le c
onti
nues
)
92(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Kra
nzle
r,M
iller
,&
326
2–5
GE
Rea
ding
alo
ud1
WR
CA
t Gra
des
4–5,
sign
ific
ant i
nter
cept
bia
ses
Jord
an (
1999
)w
ere
foun
d ba
sed
on r
ace/
ethn
icity
. At
Gra
de 5
,sig
nifi
cant
inte
rcep
t and
slo
pe
bias
was
fou
nd b
ased
on
gend
er. N
o bi
ases
fo
und
at G
rade
s 2–
3.
Hin
tze,
Cal
laha
n,13
62–
5G
ER
eadi
ng a
loud
1W
RC
Age
and
rea
ding
alo
ud e
ach
sign
ific
antly
M
atth
ews,
Will
iam
s,co
ntri
bute
d w
hen
pred
ictin
g pe
rfor
man
ce o
n &
Tob
in (
2002
)W
oodc
ock
John
son–
Rev
ised
Rea
ding
Com
pre-
he
nsio
n. R
ace
and
soci
oeco
nom
ic s
tatu
s di
d
not
cont
ribu
te s
igni
fica
ntly
whe
n ad
ded
to
and
read
ing
alou
d m
odel
.
Bak
er &
Goo
d 76
2G
E/E
LL
Rea
ding
alo
ud1
WR
CC
onve
rgen
t Con
stru
ctE
nglis
h-on
ly(1
0 w
eeks
,(1
995)
2 tim
es p
er w
eek)
Eng
lish-
only
Poin
t .87
Lev
el .9
9Sl
ope
.39
Stan
ford
Dia
gnos
tic
Rea
ding
Tes
t Si
gnif
ican
tly d
if-
(SD
RT
)–To
tal S
core
:.51
fere
nt s
lope
s w
ere
foun
d fo
r E
nglis
h-
only
and
bili
ngua
l st
uden
ts w
ith b
i-
lingu
al s
tude
nts
show
ing
grea
ter
grow
th.
Stan
ford
Rea
ding
Com
preh
ensi
on
subt
est–
pret
est:
.56
Bili
ngua
lPo
int .
92L
evel
.99
Slop
e .4
9Te
ache
r ra
ting:
.82
Bili
ngua
lSD
RT
–Tot
al S
core
:.53
Stan
ford
Rea
ding
Com
preh
ensi
on
subt
est–
pret
est:
.73
Stan
ford
Rea
ding
Com
preh
ensi
on
subt
est–
post
test
:.76
Teac
her
ratin
g of
rea
ding
:.80
Dis
crim
inan
t Con
stru
ctE
nglis
h-on
lyE
nglis
h L
angu
age
Flue
ncy:
.54
Teac
her
ratin
g of
Eng
lish:
.62
(tab
le c
onti
nues
)
93(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Bili
ngua
lE
nglis
h L
angu
age
Flue
ncy:
.44
Teac
her
ratin
g of
Eng
lish:
.62
Lan
guag
e A
sses
smen
t Sca
le–E
nglis
h:.4
7
Wile
y &
Den
o (2
005)
3 &
5G
E &
EL
LM
inne
sota
Com
preh
ensi
ve A
sses
smen
t(M
CA
)36
3G
ER
eadi
ng a
loud
1W
RC
.71
Maz
e se
lect
ion
1C
S.7
3E
LL
Rea
ding
alo
ud.6
1M
aze
sele
ctio
n.5
233
5G
ER
eadi
ng a
loud
.57
Maz
e se
lect
ion
.73
EL
LR
eadi
ng a
loud
.69
Maz
e se
lect
ion
.57
Gra
ves,
Plas
enci
a-77
1E
LL
,R
eadi
ng a
loud
WR
C(6
wee
ks,w
eekl
y)Pe
inad
o,D
eno,
&
HA
,Jo
hnso
n (2
005)
AA
,&
L
AE
LL
/HA
1.40
WR
C
per
wee
kE
LL
/HA
2.30
WR
C
per
wee
kE
LL
/2.
10 W
RC
L
Ape
r w
eek
Kle
in &
Jim
erso
n 4,
000
1–3
GE
Rea
ding
alo
ud1
WR
CSt
anfo
rd A
chie
vem
ent T
est–
9To
tal
(200
5)R
eadi
ng,c
oncu
rren
tH
ispa
nic:
.71–
.82
Cau
casi
an:.
63–.
82Fe
mal
e:.7
5–.8
2M
ale:
.73–
.85
Free
/Red
uced
Lun
ch:.
73–.
82R
egul
ar L
unch
:.64
–.83
Hom
e L
angu
age
Span
ish:
.70–
.82
Hom
e L
angu
age
Eng
lish:
.67–
.82
SAT-
9 To
tal R
eadi
ng,p
redi
ctiv
eH
ispa
nic:
.66
Cau
casi
an:.
58Fe
mal
e:.7
2M
ale:
.62
Free
/Red
uced
Lun
ch:.
62
(tab
le c
onti
nues
)
94(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Reg
ular
Lun
ch:.
70H
ome
Lan
guag
e Sp
anis
h:.6
5H
ome
Lan
guag
e E
nglis
h:.6
3In
terc
ept b
ias
evid
ent w
hen
ethn
icity
and
ho
me
lang
uage
fac
tors
wer
e co
mbi
ned
Mor
gan
& B
radl
ey-
153–
7V
IR
eadi
ng a
loud
WR
CA
ltern
ate
form
John
son
(199
5)1
Dia
gnos
tic
Rea
ding
Sca
les
.88–
.93
Dec
odin
g L
evel
:.60
2.6
6.9
4–.9
83
.65
.94–
.96
Test
–ret
est
1D
iagn
osti
c R
eadi
ng S
cale
s .8
4–.9
4C
ompr
ehen
sion
Lev
el:.
782
.80
.93–
.96
3.8
2.9
3–.9
71
Dia
gnos
tic
Rea
ding
Sca
les
Ove
rall
Rea
ding
Lev
el:.
762
.80
3.8
0
Alli
nder
& E
ccar
ius
36—
D
/HH
Rea
ding
alo
udW
RC
The
Tes
t of
Ear
ly R
eadi
ng A
bili
ty—
Dea
f A
ltern
ate
form
(199
9)an
d H
ard
of H
eari
ng V
ersi
on1
.30
.85
3.2
1.9
4
Cra
wfo
rd,T
inda
l,&
51
2 &
3G
E1
WR
CSt
atew
ide
read
ing
asse
ssm
ent
Stie
ber
(200
1)(m
ultip
le-c
hoic
e)2
Pred
ictiv
e:.6
63
Con
curr
ent:
.60
Goo
d,Si
mm
ons,
&
706
1 &
3G
ER
eadi
ng a
loud
1W
RC
Ben
chm
ark
goal
of
40 W
RC
in 1
min
in
Kam
e’en
ui (
2001
)G
rade
1 p
redi
cted
con
tinue
d re
adin
g su
cces
s in
Gra
de 2
. Ben
chm
ark
goal
of
110
WR
C in
1
min
in G
rade
3 p
redi
cted
suc
cess
on
the
Ore
gon
Stat
ewid
e A
sses
smen
t.
Hin
tze
& S
ilber
glitt
1,
766
1–3
ND
Rea
ding
alo
ud1
WR
CM
inne
sota
Com
preh
ensi
ve A
sses
smen
t A
ltern
ate
form
(200
5)(M
CA
) 1
Pred
ictiv
e:.4
9–.5
8.8
9–.9
12
Pred
ictiv
e:.6
1–.6
8.8
0–.8
53
Con
curr
ent:
.69
.83–
.87
(tab
le c
onti
nues
)
95(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Res
ults
of
disc
rim
inat
ive
anal
ysis
,log
istic
re
gres
sion
,and
Rec
eive
r O
pera
ting
Cha
ract
er-
istic
(R
OC
) cu
rves
indi
cate
d th
at r
eadi
ng a
loud
is
an
effi
cien
t met
hod
for
pred
ictin
g su
cces
s on
th
e M
CA
.
McG
linch
ey &
1,
362
4N
DR
eadi
ng a
loud
1W
RC
Mic
higa
n E
duca
tion
al A
sses
smen
t P
rogr
am
Test
–ret
est
Hix
son
(200
4)(M
EA
P):.
49–.
81.8
7–.9
5Se
nsiti
vity
= 7
5%Sp
ecif
icity
= 7
4%C
orre
ct c
lass
ific
atio
n =
74%
ME
AP
base
fai
lure
rat
e =
54%
ME
AP
base
pas
s ra
te =
46%
Stag
e &
Jac
obso
n 17
34
ND
Rea
ding
alo
ud1
WR
CW
ashi
ngto
n A
sses
smen
t of
Stu
dent
Lea
rnin
g(2
001)
(WA
SL)
Lev
el:.
51Se
nsiti
vity
= 6
6%Sp
ecif
icity
= 7
6%C
orre
ct c
lass
ific
atio
n =
74%
Bas
e fa
ilure
rat
e =
20%
Bas
e pa
ss r
ate
= 8
0%
Silb
ergl
itt &
Hin
tze
2,19
11–
3G
ER
eadi
ng a
loud
1W
RC
Min
neso
ta C
ompr
ehen
sive
Ass
essm
ent
(MC
A)
(200
5)1
Pred
ictiv
e:.5
72
Pred
ictiv
e:.6
73
Con
curr
ent:
.71
Log
istic
reg
ress
ion
and
RO
C c
urve
ana
lysi
s w
ere
the
stro
nges
t met
hods
for
est
ablis
hing
cut
sco
res.
R
OC
cur
ve a
naly
sis
was
the
mos
t fle
xibl
e.
Cur
ricu
lum
Eff
ects
Tin
dal,
Mar
ston
,Den
o,66
01–
6G
ER
eadi
ng a
loud
1W
RC
Dif
fere
nt c
urri
cula
pro
duce
d m
ean
leve
l dif
fer-
& G
erm
ann
(198
2,W
ord
iden
tific
atio
nen
ces
in r
eadi
ng a
loud
and
wor
d id
entif
icat
ion
IRL
D #
93)
scor
es. A
ll cu
rric
ula
show
ed a
cros
s gr
ade
grow
th.
Tin
dal,
Flic
k,&
12
2–5
SER
eadi
ng a
loud
1W
RC
(31
wee
ks,1
–2
Col
e (1
992)
times
per
wee
k);
stud
ent p
erfo
rm-
ance
on
pass
ages
from
inst
ruct
iona
lm
ater
ial w
as s
imi-
lar
to p
erfo
rman
ceon
pas
sage
s fr
omm
ains
trea
m
mat
eria
l.(t
able
con
tinu
es)
96(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Fuch
s &
Den
o 91
1–6
ND
Rea
ding
alo
ud1
WR
CW
oodc
ock
Rea
ding
Mas
tery
Tes
tPa
ssag
e (1
992)
Com
preh
ensi
onG
inn
720
.89–
.93
(pre
prim
er–7
)Sc
ott-
Fore
sman
.9
0–.9
3U
nlim
ited
(pre
prim
er–6
)
Hin
tze,
Shap
iro,
572–
4N
DR
eadi
ng a
loud
1W
RC
The
Deg
rees
of
Rea
ding
Pow
er T
est
Con
te,&
Bas
ile
Aut
hent
ic tr
ade
.64–
.69
(199
7)bo
oks
(Lev
els
1–5)
Lite
ratu
re-b
ased
.6
2–.6
9ba
sal (
Lev
els
1–5)
Har
tman
& F
ulle
r 26
1G
ESt
anfo
rd A
chie
vem
ent T
est
Test
–ret
est
(199
7).9
1–.9
924
2R
eadi
ng a
loud
1W
RC
Wor
d R
eadi
ng S
kills
:.72
Rea
ding
Com
preh
ensi
on:.
89W
ord
iden
tific
atio
n1
WR
CW
ord
Rea
ding
Ski
lls:.
71R
eadi
ng C
ompr
ehen
sion
:.82
453
Rea
ding
alo
ud1
WR
CW
ord
Rea
ding
Ski
lls:.
71R
eadi
ng C
ompr
ehen
sion
:.80
Wor
d id
entif
icat
ion
1W
RC
Wor
d R
eadi
ng S
kills
:.52
Rea
ding
Com
preh
ensi
on:.
56
Bro
wn-
Chi
dsey
,21
5N
DM
aze
sele
ctio
n 2
CS
Con
curr
ent L
itera
ture
-Bas
ed P
assa
ge:.
74–.
92C
ontr
olle
d an
d Jo
hnso
n,&
co
ntro
lled
pass
age
liter
atur
e-ba
sed
pas-
Fern
stro
m (
2005
)sa
ges
show
ed s
igni
fi-
cant
gro
wth
acr
oss
fall,
win
ter,
and
spri
ng.
Mea
n sc
ores
wer
e co
nsis
tent
ly h
ighe
r fo
r co
ntro
lled
pass
ages
.
Bra
dley
-Klu
g,2
& 5
GE
Rea
ding
alo
ud1
WR
CSh
apir
o,L
utz,
&
282
Lite
ratu
re =
1.3
6 D
uPau
l (19
98)
WR
C p
er w
eek
Bas
al =
1.3
2 W
RC
per
wee
k30
5L
itera
ture
= 0
.84
WR
C p
er w
eek
Bas
al =
0.3
2 W
RC
per
wee
k(1
0 w
eeks
,2
times
per
wee
k)(t
able
con
tinu
es)
97(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Hin
tze,
Shap
iro,
&
243
GE
Rea
ding
alo
ud1
WR
CPa
ralle
l for
m(9
wee
ks,2
tim
esL
utz
(199
4)pe
r w
eek)
Lite
ratu
re-b
ased
L
itera
ture
Gro
up =
pa
ssag
e−1
.04
WR
C p
erpe
r w
eek
.56–
.95
Tra
ditio
nal
Gro
up =
-0.
34
WR
C p
er w
eek
Tra
ditio
nal b
asal
L
itera
ture
Gro
up =
pa
ssag
e0.
66 W
RC
per
wee
k.6
0–.9
6Tr
aditi
onal
Gro
up =
1.
72 W
RC
per
w
eek
Hin
tze
& S
hapi
ro
160
2–5
GE
Rea
ding
alo
ud1
WR
CPa
ralle
l for
m(8
wee
ks,
(199
7).6
5–.9
7 2
times
per
wee
k)2
Lite
ratu
re-b
ased
L
itera
ture
Gro
up =
pa
ssag
e0.
56 W
RC
per
w
eek
Tra
ditio
nal
Gro
up =
0.6
8 W
RC
per
wee
kT
radi
tiona
l bas
al
Lite
ratu
re G
roup
=
pass
age
−0.4
6 W
RC
per
w
eek
Tra
ditio
nal
Gro
up =
−0.
26
WR
C p
er w
eek
3L
itera
ture
-bas
ed
Lite
ratu
re G
roup
=
pass
age
1.81
WR
C p
er
wee
kT
radi
tiona
l G
roup
= 1
.58
WR
C p
er w
eek
Tra
ditio
nal b
asal
L
itera
ture
Gro
up =
pa
ssag
e0.
67 W
RC
per
w
eek
Tra
ditio
nal
Gro
up =
0.3
8 W
RC
per
wee
k4
Lite
ratu
re-b
ased
L
itera
ture
Gro
up =
pa
ssag
e2.
69 W
RC
per
w
eek
(tab
le c
onti
nues
)
98(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Tra
ditio
nal
Gro
up =
2.2
9 W
RC
per
wee
kT
radi
tiona
l bas
al
Lite
ratu
re G
roup
=pa
ssag
e1.
72 W
RC
per
w
eek
Tra
ditio
nal
Gro
up =
1.9
0 W
RC
per
wee
k5
Lite
ratu
re-b
ased
L
itera
ture
Gro
up =
pass
age
1.18
WR
C p
er
wee
kT
radi
tiona
l G
roup
= 2
.05
WR
C p
er w
eek
Tra
ditio
nal b
asal
L
itera
ture
Gro
up =
pass
age
1.72
WR
C p
er
wee
kT
radi
tiona
l G
roup
= 2
.82
WR
C p
er w
eek
Pow
ell-
Smith
&
362
LA
/R
eadi
ng a
loud
1W
RC
Bas
al =
2.6
9 W
RC
B
radl
ey K
lug
(200
1)C
h. 1
per
wee
kG
ener
ic =
2.4
1 W
RC
per
wee
k(5
wee
ks,2
tim
es
per
wee
k)
Rile
y-H
elle
r,K
elly
-13
2L
AR
eadi
ng a
loud
1W
RC
.65
WR
C p
er d
ayV
ance
,& S
hriv
er
Gen
eric
=
(200
5)C
urri
culu
m-
depe
nden
t(5
wee
ks,2
tim
es
per
wee
k)
Dif
ficu
lty L
evel
Shin
n,G
leas
on,&
30
3–8
SE/C
h. 1
R
eadi
ng a
loud
1W
RC
(4 w
eeks
,4 ti
mes
Tin
dal (
1989
)pe
r w
eek)
1 be
low
gra
de
leve
l = 4
.30
WR
Cpe
r w
eek
1 ab
ove
grad
e le
vel =
3.7
0 W
RC
per
wee
k(t
able
con
tinu
es)
99(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
2 ab
ove
grad
e le
vel =
4.5
5 W
RC
pe
r w
eek
4 ab
ove
grad
e le
vel =
2.3
5 W
RC
pe
r w
eek
Dun
n &
Eck
ert (
2002
)20
2–3
GE
Rea
ding
alo
ud1
WR
CSi
mila
r m
ater
ial
.65
WR
C p
er w
eek
Cha
lleng
ing
mat
e-ri
al .9
2 W
RC
per
w
eek
(8 w
eeks
,2 ti
mes
pe
r w
eek)
Hin
tze,
Dal
y,&
80
1–4
ND
Rea
ding
alo
ud1
WR
C(1
1 w
eeks
,2Sh
apir
o (1
998)
times
per
wee
k)1
Gra
de L
evel
=
3.29
WR
C p
er w
eek
Goa
l Lev
el =
1.9
5 W
RC
per
wee
k2
Gra
de L
evel
= .7
2 W
RC
per
wee
kG
oal L
evel
= .3
0 W
RC
per
wee
k3
Gra
de L
evel
= .1
6 W
RC
per
wee
kG
oal L
evel
= .0
6 W
RC
per
wee
k4
Gra
de L
evel
=
1.57
WR
C p
er w
eek
Goa
l Lev
el =
1.8
5 W
RC
per
wee
k
Fuch
s,T
inda
l,&
202-
4G
EW
ord
5W
RC
Slop
e (2
wee
ks,
Den
o (1
981,
IRL
D #
48,
(rea
ding
id
entif
icat
ion
.da
ily):
Gra
de-
Stud
y II
I)in
stru
ctio
nal
Spec
ific
Dom
ain
≠le
vel)
Ent
ire
Gra
de
Dom
ain
Ent
ire
Gra
de
Dom
ain
≠ A
cros
s G
rade
Dom
ain
Stan
dard
Err
or o
f E
stim
ates
:
(tab
le c
onti
nues
)
100(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Gra
de-s
peci
fic
dom
ain
≠E
ntir
e gr
ade
dom
ain
Ent
ire
grad
e do
mai
n =
acr
oss
grad
ePo
ncy,
Skin
ner,
&
373
ND
Rea
ding
alo
ud1
WR
CSt
uden
t ski
lls
Axt
ell (
2005
)ac
coun
ted
for
81%
of
the
vari
ance
,and
pa
ssag
e ac
coun
ted
for
10%
of
the
vari
ance
.
Hin
tze
& C
hris
t 99
2–5
ND
Rea
ding
alo
ud1
WR
C(1
1 w
eeks
,(2
004)
2 tim
es p
er w
eek)
2U
ncon
trol
led
=
−.06
WR
C p
er
wee
k C
ontr
olle
d =
.25
WR
C p
er w
eek
3U
ncon
trol
led
=
−.05
WR
C p
er
wee
kC
ontr
olle
d =
.50
WR
C p
er w
eek
4U
ncon
trol
led
=
.48
WR
C p
er
wee
kC
ontr
olle
d =
.24
WR
C p
er w
eek
5U
ncon
trol
led
=
.42
WR
C p
er
wee
kC
ontr
olle
d =
1.0
2 W
RC
per
wee
k
Com
pton
,App
leto
n,24
82
AA
&L
AR
eadi
ng a
loud
1W
RC
Perc
enta
ge o
f hi
gh-f
requ
ency
wor
ds
& H
osp
(200
4)an
d de
coda
ble
wor
ds a
ccou
nted
fo
r 41
% o
f th
e va
rian
ce in
rea
ding
ac
cura
cy a
nd 5
4% o
f th
e va
rian
ce
in r
eadi
ng f
luen
cy.
(tab
le c
onti
nues
)
101(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Rel
iabi
lity
of G
row
th R
ates
Hin
tze,
Ow
en,S
hapi
ro,
& D
aly
(200
0)St
udy
116
02–
5G
ER
eadi
ng a
loud
1W
RC
Indi
vidu
al d
iffe
renc
es
(8 w
eeks
,ac
coun
ted
for
48%
2
times
of th
e va
rian
ce a
nd
per
wee
k)gr
ade
acco
unte
d fo
r 19
% o
f th
e va
rian
ce.
Stud
y 2
801–
4N
DR
eadi
ng a
loud
1W
RC
Indi
vidu
al d
iffe
renc
es
(10
wee
ks,
acco
unte
d fo
r 42
%
2 tim
esof
the
vari
ance
,and
pe
r w
eek)
grad
e ac
coun
ted
for
36%
of
the
vari
ance
.
Bro
wn-
Chi
dsey
,47
65–
8N
DM
aze
sele
ctio
n10
CS
Gra
de le
vel a
ccou
nted
D
avis
,& M
aya
(200
3)fo
r 68
%–7
1% o
f th
e va
rian
ce o
n tw
o pa
ssag
es.
Indi
vidu
al d
iffe
renc
es in
sc
ores
acc
ount
ed f
or 8
4%
of th
e va
rian
ce o
n on
e pa
ssag
e.
Hin
tze
& P
elle
12
3–4
GE
& S
ER
eadi
ng a
loud
1W
RC
Indi
vidu
al d
iffe
renc
es
(8 w
eeks
,Pe
titte
(20
01)
acco
unte
d fo
r 62
%
2 tim
es p
er
of th
e va
rian
ce,a
nd
wee
k)re
adin
g gr
oup
acco
unte
d fo
r 15
% o
f th
e va
rian
ce.
Mar
ston
& D
eno
263
GE
Rea
ding
alo
ud1
WR
CPa
ired
tte
st a
t(1
982,
IRL
D #
106)
Wee
ks 1
and
16
show
ed s
igni
-fi
cant
gro
wth
on
rea
ding
al
oud;
rea
ding
al
oud
show
ed
grea
ter
grow
th
whe
n co
mpa
red
to b
asal
ser
ies
test
s.
(tab
le c
onti
nues
)
102(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
Mar
ston
,Den
o,&
83
3–6
LA
Wor
d id
entif
icat
ion
1W
RC
Pair
ed t
test
at
Tin
dal (
1983
,IR
LD
W
eeks
1 a
nd 1
0 #1
26)
show
ed s
igni
-fi
cant
gro
wth
fo
r al
l wor
d lis
ts (
p <
.001
).
Skib
a,D
eno,
Mar
ston
,67
1–7
SER
eadi
ng a
loud
1W
RC
(6 m
onth
s,3–
5 &
Wes
son
(198
6)tim
es p
er w
eek)
1 21.
78 W
RC
per
w
eek,
SD=
1.2
13
1.62
WR
C p
er
wee
k,SD
= .7
14
1.42
WR
C p
er
wee
k,SD
= .9
25
1.36
WR
C p
er
wee
k,SD
= .6
66 7
Shin
,Den
o,&
Esp
in
432
GE
/M
aze
sele
ctio
n3
CS
Alte
rnat
e 1.
20 C
S pe
r (2
000)
Ch.
1fo
rm .7
5–.9
0m
onth
.91
CS
per
mon
th(9
mon
ths,
mon
thly
)
Spee
ce &
Ritc
hey
276
1H
A/A
A
Rea
ding
alo
ud1
WR
C(2
0 w
eeks
,(2
005)
& L
Aw
eekl
y &
m
onth
ly)
HA
/AA
1.5
WR
C p
er
wee
kL
A.7
7 W
RC
per
w
eek
Stan
dard
Rat
es o
f G
row
th
Mar
ston
,Low
ry,
581–
6G
EA
vera
ge %
of
Den
o,&
Mir
kin
grow
th
(198
1,IR
LD
1
Wor
d id
entif
icat
ion
1W
RC
251
#49)
284
325
(tab
le c
onti
nues
)
103(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
49
514
621
1R
eadi
ng a
loud
1W
RC
150
275
326
424
528
630
Mac
Mill
an (
2000
)1,
691
2–7
GE
Rea
ding
alo
ud1
WR
CSi
gnif
ican
t gr
owth
with
in
grad
es. G
row
th
rate
s de
crea
sed
as g
rade
in-
crea
sed.
Sig
nifi
-ca
nt d
iffe
renc
es
wer
e fo
und
by
gend
er (
fe-
mal
e >
mal
e).
(1 s
choo
l yea
r,3
times
a y
ear)
Fuch
s,Fu
chs,
Ham
lett,
(1 s
choo
l yea
r,W
alz,
& G
erm
ann
wee
kly)
(199
3)St
udy
111
71
ND
Rea
ding
alo
ud1
WR
C2.
10 W
RC
per
w
eek,
SD=
.80
21.
46 W
RC
per
w
eek,
SD=
.69
31.
08 W
RC
per
w
eek,
SD =
.52
40.
84 W
RC
per
w
eek,
SD=
.30
50.
49 W
RC
per
w
eek,
SD=
.28
60.
32 W
RC
per
w
eek,
SD=
.33
Stud
y 2
257
1N
DM
aze
sele
ctio
n2.
5C
S0.
34 C
S pe
r w
eek,
SD=
.39
20.
39 C
S pe
r w
eek,
SD=
.24
30.
47 C
S pe
r w
eek,
SD=
.37
(tab
le c
onti
nues
)
104(T
able
1 c
onti
nued
)
Sam
ple
Rea
ding
mea
sure
Res
ults
Typ
e of
T
ime
Scor
ing
Stud
yN
Gra
de(s
)L
evel
mea
sure
(min
)pr
oced
ure
Val
idit
yR
elia
bilit
y G
row
th
40.
38 C
S pe
r w
eek,
SD=
.32
50.
36 C
S pe
r w
eek,
SD=
.23
60.
27 C
S pe
r w
eek,
SD=
.25
Den
o,Fu
chs,
2,99
91–
6G
E &
SE
Rea
ding
alo
ud1
WR
C(1
sch
ool y
ear,
Mar
ston
,& S
hin
wee
kly
or
(200
1)3
times
a y
ear)
1G
E1.
80 W
RC
per
w
eek,
SE=
.15
SE0.
83 W
RC
per
w
eek,
SE=
.15
2G
E1.
66 W
RC
per
w
eek,
SE=
.09
SE0.
57 W
RC
per
w
eek,
SE=
.09
3G
E1.
18 W
RC
per
w
eek,
SE=
.10
SE0.
58 W
RC
per
w
eek,
SE=
.11
4G
E1.
01 W
RC
per
w
eek,
SE=
.05
SE0.
58 W
RC
per
w
eek,
SE=
.05
5G
E0.
58 W
RC
per
w
eek,
SE=
.05
SE0.
58 W
RC
per
w
eek,
SE=
.08
6G
E0.
66 W
RC
per
w
eek,
SE=
.04
SE0.
62 W
RC
per
w
eek,
SE=
.14
Not
e.A
A =
ave
rage
ach
ievi
ng; C
h. 1
= C
hapt
er 1
; CM
= c
orre
ct m
atch
es; C
S =
cor
rect
sel
ectio
ns; D
/HH
= d
eaf
or h
ard
of h
eari
ng; E
LL
= E
nglis
h la
ngua
ge le
arne
rs; G
E =
gen
eral
edu
catio
n; H
A =
hig
h ac
hiev
ing;
LA
= lo
wac
hiev
ing;
ND
= s
ampl
e in
clud
ed g
ener
al e
duca
tion
and
spec
ial e
duca
tion,
anal
ysis
not
dif
fere
ntia
ted
by g
roup
; SD
= s
tand
ard
devi
atio
n; S
E=
sta
ndar
d er
ror
(of
mea
sure
men
t); S
E =
spe
cial
edu
catio
n; V
I =
vis
ual i
mpa
irm
ent;
WR
C =
wor
ds r
ead
corr
ectly
. If
no s
peci
fic
num
bers
wer
e re
port
ed f
or s
tude
nts
rece
ivin
g sp
ecia
l edu
catio
n se
rvic
es,t
he p
artic
ipan
ts w
ere
assu
med
to b
e in
gen
eral
edu
catio
n. T
ests
:St
anfo
rd A
chie
vem
ent T
est–
7(G
ardn
er e
t al.,
1982
a,19
83);
Woo
dcoc
k R
eadi
ng M
aste
ry T
est
Rev
ised
(Woo
dcoc
k,19
87);
Kau
fman
Bri
ef I
ntel
lige
nce
Test
(Kau
fman
& K
aufm
an,1
990)
; Kau
fman
Tes
t of
Edu
cati
onal
Ach
ieve
men
t(K
aufm
an &
Kau
fman
,198
5); G
ates
-M
acG
initi
e (M
acG
initi
e et
al.,
1978
); M
etro
poli
tan
Ach
ieve
men
t Tes
t–6
(Pre
scot
t et a
l.,19
84);
Woo
dcoc
k-Jo
hnso
n–II
I (M
cGre
w &
Woo
dcoc
k,20
01);
Iow
a Te
st o
f B
asic
Ski
lls
(Hoo
ver
et a
l.,19
96);
Woo
dcoc
k-Jo
hnso
n–R
evis
ed(W
oodc
ock
& J
ohns
on,1
989)
; Com
preh
ensi
ve R
eadi
ng A
sses
smen
t B
atte
ry(F
uchs
,Fuc
hs,&
Ham
lett,
1989
); T
ests
of A
chie
vem
ent
and
Pro
ficie
ncy
(Sca
nnel
l et a
l.,19
86);
Mic
higa
n E
duca
tion
al A
sses
smen
t P
rogr
am(M
ichi
gan
Stat
e B
oard
of
Edu
catio
n,19
99);
Met
ropo
lita
n A
chie
vem
ent T
est–
7(H
arco
urt E
duca
tiona
l Mea
sure
men
t,19
98);
Sta
nfor
d D
iagn
osti
c R
eadi
ng T
est
(Kar
lsen
& G
ardn
er,1
985)
; Min
neso
ta C
ompr
ehen
sive
Ass
essm
ent
(Min
neso
taD
epar
tmen
t of
Chi
ldre
n Fa
mili
es,a
nd L
earn
ing,
1998
-200
2); S
tanf
ord
Ach
ieve
men
t Tes
t–9
(Har
cour
t Bra
ce &
Com
pany
,199
7); D
iagn
osti
c R
eadi
ng S
cale
s(S
pach
e,19
81);
The
Tes
t of
Ear
ly R
eadi
ng A
bili
ty—
Dea
f an
d H
ard
ofH
eari
ng V
ersi
on(R
eid
et a
l.,19
91);
Ore
gon
Stat
ewid
e A
sses
smen
t(O
rego
n D
epar
tmen
t of
Edu
catio
n,20
00);
Was
hing
ton
Ass
essm
ent
of S
tude
nt L
earn
ing
(199
8); T
he D
egre
es o
f R
eadi
ng P
ower
Tes
t (K
oslin
et a
l.,19
89).
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 105
by examining evidence related to the technical adequacy ofthe various measures used in CBM reading and consider gen-eralizability of the research to other students and purposes.
Technical Adequacy of CBM Measures
CBM Reading Measures
We focus our review on the three measures most commonlyused in CBM reading: reading aloud, maze selection, andword identification. In reading aloud, students read aloudfrom a passage, usually for 1 min, and the number of wordsread correctly is scored (Deno, 1985). Omissions, insertions,substitutions, hesitations, and mispronunciations are markedas errors. In maze selection, students read through a passagein which (typically) every seventh word has been deleted andreplaced with three word choices—one correct choice and twodistracters. (Rules for creating maze passages can be found inD. Fuchs and L. S. Fuchs, 1992.) Students read the passagesilently, usually for 1 to 3 min, making selections as they read.The number of correct selections is scored. In word identi-fication, students read aloud from a list of high-frequencywords, usually for 1 min, and the number of words read cor-rectly is scored (Deno, Mirkin, & Chiang, 1982). Omissions,insertions, substitutions, hesitations, and mispronunciationsare marked as errors. Words are usually selected from wordlists or from the reading curriculum. By far, the majority ofresearch has focused on the reading-aloud measure. Recently,however, interest in maze selection and word identificationhas grown as CBM has been extended to younger and olderstudents and as computerized progress monitoring has be-come a possibility.
Reading Aloud
Despite the early support for the technical adequacy of thereading-aloud measures, practitioners and researchers alikecontinued to express doubts about the relation between thesimple measure of reading aloud from text for 1 minute andreading proficiency, especially proficiency in reading com-prehension (e.g., see Mehrens & Clarizio, 1993; Yell, 1992).Thus, in the 1980s and 1990s, efforts were made to more closelyexamine the nature of the relationship between reading aloudand general reading proficiency, especially reading compre-hension. This research took two different approaches. Thefirst approach sought to clarify the relation between readingaloud and reading comprehension by considering alternativemeasures that might be more closely linked to reading com-prehension and by examining the theoretical underpinnings ofthe relation between reading aloud and reading proficiency.The second approach sought to examine the concomitant re-lation between CBM reading aloud and reading comprehen-sion, with a focus on the individual student.
Clarification of the Relation Between Reading Aloudand Reading Comprehension. Fuchs, Fuchs, and Maxwell(1988) compared the validity of CBM reading-aloud measuresto that of other measures typically used to assess reading com-prehension, including cloze (where every seventh word is de-leted from a text and replaced with a blank), story retell, andquestion-answering measures. Participants were students withmild disabilities in Grades 4 to 8. Results revealed that reading-aloud scores correlated more strongly with scores on the com-prehension and word skills subtests of a standardized achieve-ment test (r = .91 and r = .80, respectively) than did scoresfrom the other “typical” comprehension measures (rs = .76 to.82 for the reading comprehension and .66 to .76 for the wordskills subtests, respectively). Results of the Fuchs et al. studysuggested that reading aloud was more than just a measure offluent decoding, a notion that was supported in subsequent re-search investigating the theoretical nature of the relationshipbetween reading aloud and reading comprehension.
Shinn, Good, Knutson, Tilly, and Collins (1992) usedconfirmatory factor analysis to examine the role of readingaloud as it related to decoding, fluency, and comprehensionskills for students in Grades 3 and 5. A single-factor model of“reading competence” was validated for third-graders, withall reading skills making significant contributions. In con-trast, a two-factor model including decoding and comprehen-sion as two separate but highly related factors was validatedfor fifth graders, with reading aloud loading on the decodingfactor. Hosp and Fuchs (2005) also observed changes in thenature of the relationship between reading aloud and readingproficiency associated with age. Relationships between CBMreading aloud and the Decoding, Word Reading, and Com-prehension subtests of the Woodcock Reading Mastery Test–Revised (WRMT; Woodcock, 1987) were similar in magnitudefor students in Grades 2 and 3 (ranging from .82 to .88), butin Grade 4, lower correlations were observed for the Decod-ing and Word Reading subtests (rs = .72 and .73, respectively)than for the Reading Comprehension subtest (r = .82).
Kranzler, Brownell, and Miller (1998), in a somewhatdifferent approach, posed the hypothesis that the number ofwords read aloud from text in 1 min might merely be a re-flection of general speed of processing. Kranzler et al. exam-ined the roles of general cognitive ability, speed and efficiencyof elemental cognitive processing, and reading aloud in theprediction of reading comprehension for students in Grade 4.Multiple regression analyses revealed a significant relation-ship between reading aloud and reading comprehension mea-sures that could not be explained by general cognitive abilityor speed and efficiency of elemental cognitive processing. Re-sults suggested that reading aloud was not merely an indicatorof general cognitive processing speed. Kranzler et al. noted,however, that although reading aloud had the highest stan-dardized regression coefficient when compared to cognitiveability and mental speed, it explained only 11% of the uniquevariance found in reading comprehension.
106 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
Concomitant Change in Reading Aloud and ReadingComprehension. The studies reviewed in the previous sec-tion focused on patterns of results across groups; however, theconcerns raised by practitioners often focus on the nature ofthe relationship between CBM and reading comprehension forthe individual student. One such concern is whether readingaloud and reading comprehension change concomitantly. Mar-kell and Deno (1997) addressed this issue by experimentallymanipulating the difficulty level of reading material. Studentsin Grade 3 read passages that were two levels below, at, andtwo levels above grade level. Students also completed twocomprehension tasks for each passage—question answeringand maze selection. Results revealed that, on average, studentsread significantly fewer words in 1 min on the more difficultpassages, answered fewer questions correct, and selected fewercorrect maze choices, supporting the general relation betweenwords read aloud in 1 min and reading comprehension. How-ever, at the individual level, it appeared that the amount of per-formance change was an important factor to be considered.Results revealed that for only 52% of the students did the rankordering of reading-aloud scores match the rank ordering ofcomprehension scores on all three levels of materials. Takinga more liberal approach, and controlling for a ceiling effecton the comprehension measures, the percentage of studentsfor whom rankings matched on only the highest and lowestlevels were examined. This analysis resulted in an agreementof 100% and 96% for question answering and maze, respec-tively. These results suggested that a relatively large changein the number of words read in 1 min (perhaps as large as15–20 words) was needed to predict with certainty a con-comitant change in reading comprehension.
A second concern raised by practitioners is the existenceof “word callers,” that is, students who can read fluently butdo not comprehend. Hamilton and Shinn (2003) examinedteachers’ ability to identify word callers. Third-grade teach-ers were asked to identify one to two students who were wordcallers (WC) and one to two similarly fluent peers (SFP). Sim-ilarly fluent peers were students whom teachers judged ashaving fluency rates similar to the WC students but withhigher levels of comprehension. Results confirmed differ-ences in comprehension levels between the students in the WCand SFP groups, with SFP students scoring higher on com-prehension tasks. However, results also revealed differencesin reading fluency, with scores for WC students lower thanthose for SFP students, calling into question teachers’ abilityto identify students as word callers. We note that the Hamil-ton and Shinn (2003) study did not address the actual exis-tence of word callers, just teachers’ judgment of word callers.To examine the existence of word callers, one would need toexamine the relative standing of students on reading aloud andreading comprehension measures. Word callers would bethose whose relative standing on the reading aloud measureswere substantially higher than their standing on the readingcomprehension measures.
Maze Selection
Although the number of words read aloud in 1 min demon-strated good technical adequacy in the early IRLD research,the measure was limited in the sense that it had to be admin-istered individually, it lacked face validity, and it was unclearwhether the measure would be appropriate for older studentswho might presumably reach an asymptote in reading aloudperformance. These factors, combined with improvements intechnology and changes in the field of special education lead-ing to larger caseloads, led to consideration of the maze mea-sure. The maze could be administered in groups, appeared tobe more of a reading comprehension measure than a readingaloud measure, could be administered via the computer, andwas considered to be more acceptable for older students. Themaze was not a new measure. An untimed version of the mazehad been studied in 1970s by Guthrie as a measure of read-ing comprehension and was shown to have good stability, tocorrelate with standardized measures of reading proficiency,and to separate readers with and without disabilities (Guthrie,1973; Guthrie, Seifert, Burnham, & Caplan, 1974). The useof the maze as a timed measure within a CBM framework didnot appear until the late 1980s and early 1990s.
In 1989, Espin, Deno, Maruyama, and Cohen reportedon the technical adequacy of a maze measure that was part ofa group-administered screening instrument called the BasicAcademic Skills Samples (BASS; Deno, Maruyama, Espin, &Cohen, 1989). The reading portion of the BASS consisted ofthree 1-min maze selection tasks that were approximately ata first- to second-grade reading level. The BASS was admin-istered to more than 2,000 students in Grades 1 through 6across 31 schools. Correlations between the BASS maze and1-min reading-aloud passages for a random sample of stu-dents from Grades 3, 4, and 5 were .77, .86, and .86, respec-tively. Data from the entire sample revealed a stable patternof increase in maze scores from Grades 1 to 6, as well as fromwinter to spring within each grade.
Fuchs and Fuchs (1992) extended the research on mazeselection in their search for a CBM reading measure that wouldbe suitable for data collection via the computer and that mighthave greater acceptance for teachers than would reading aloud.Technical adequacy and level of teacher acceptance werecompared for several alternative CBM measures, includingquestion answering, story recall, cloze, and maze selection.Maze selection in this study was a 2.5-min measure adminis-tered twice weekly for 18 weeks via computer. Earlier re-search (Fuchs & Fuchs, 1990) had revealed correlations of .83between scores on maze and reading aloud and of .77 betweenscores on maze and the Reading Comprehension subtest ofthe Stanford Achievement Test (SAT; Gardner, Rudman, Karl-sen, & Merwin, 1982). Results of the Fuchs and Fuchs (1992)study revealed that the maze task was sensitive to change inperformance over time and, unlike other measures, had a rel-atively small ratio of slope to standard error of estimate (SEE),
making it easier to detect growth on a graph. In addition,teachers rated their satisfaction with maze highly, reportingthat they believed the maze reflected multiple dimensions ofreading, including decoding, comprehension, and fluency. Fi-nally, students reported that they liked taking the maze.
In a direct comparison of the technical adequacy ofreading aloud and maze selection measures, Jenkins and Jew-ell (1993) examined the validity of the two measures acrossGrades 2 to 6. All students in the study completed three 1-minmaze tasks and three 1-min reading-aloud passages. Passageswere at a first- to second-grade level. Criterion variables werescores on the Gates-MacGinitie Reading Tests (Gates; MacGin-itie, Kamons, Kowalski, MacGinitie, & McKay, 1978) and theMetropolitan Achievement Tests (MAT; Prescott, Balow, Ho-gan, & Farr, 1984). Within-grade correlations were moderate-strong to strong for both measures, ranging from .63 to .88with the Gates and from .58 to .87 with the MAT. In Grades 2through 4, correlations tended to be stronger for reading aloudthan for maze, but in Grades 5 and 6, this pattern of differ-ences disappeared. Looking across grades, correlations be-tween the reading aloud and the criterion measures droppedfrom the .80s in Grades 2 through 4 to .60s to .70s in Grades5 and 6. In contrast, correlations for the maze remained con-sistent across the grade levels, with most between .65 and .75.Finally, both measures revealed increases across Grades 2 to6, and from fall to spring within grade. For the reading aloudmeasures, change was greatest from Grades 2 to 3, after whichit leveled off. Maze, in contrast, reflected more even rates ofchange across the grades. Results suggested that readingaloud might be a better measure than maze for primary-gradestudents, a conclusion supported by Ardoin et al. (2004), whofound that adding a maze task did not add significantly to theprediction of performance on a standardized achievement testin reading for students in third grade.
Word Identification
Although both word identification (word ID) and readingaloud were included as potential indicators of reading profi-ciency in early CBM research (e.g., Marston, Deno, & Tin-dal, 1983; Marston, Lowry, Deno, & Mirkin, 1981; Shinn,Ysseldyke, Deno, & Tindal, 1986; Tindal, Marston, Deno, &Germann, 1982), reading aloud emerged as the more com-monly used measure, perhaps because it appeared to be moreclosely related to the construct of reading than did word ID.Reading aloud, however, proved difficult to use with begin-ning first-graders because many of these students could notread any words from text in the fall, creating a floor effect inthe measure (e.g., see Bain & Garlock, 1992; Fuchs, Fuchs,& Compton, 2004). As interest in early identification and pre-vention grew, interest in CBM word identification reemerged:The words presented in a word ID task could be controlledfor difficulty and were not constrained by the requirement offitting the words into a coherent story.
The technical adequacy of several potential CBM read-ing measures for students in first grade was compared in astudy by Daly, Wright, Kelly, and Martens (1997). Readingmeasures included word ID, letter reading, letter copying,letter-sound production, and letter-sound selection. Word IDprobes were created using words selected from the Harris-Ja-cobson pre-primer word list (Harris & Jacobson, 1972). Re-sults revealed that letter reading and word ID produced thebest technical adequacy data. Test–retest reliabilities for theword ID and letter-reading measures were .94 and .87, re-spectively, compared to .42 to .65 for the other measures. Con-current validity coefficients with the broad reading subtest ofthe Woodcock-Johnson–Revised (Woodcock & Johnson, 1989)were .40 and .35, respectively. Predictive validity coefficientsbetween the two measures and a passage-reading and word-ID task administered 4 months later were .73 (word ID) and.71 (letter reading) with passage reading and .71 (word ID)and .69 (letter reading) with word ID. Predictive validity co-efficients for the other measures ranged from −.09 to .53.
Fuchs, Fuchs, and Compton (2004) compared the valid-ity of a word-ID task and nonsense word fluency (NWF) taskfor first graders at risk in reading. The word-ID task was cre-ated using words from the Dolch preprimer, primer, and first-grade-level lists. The NWF measure was taken from DIBELS(Good, Simmons, & Kame’enui, 2001). Criterion measures in-cluded the word attack and word identification subtests of theWoodcock Reading Mastery Test–Revised (WRMT-R; Wood-cock, 1987) and the Comprehensive Reading Assessment Bat-tery (CRAB; Fuchs, Fuchs, & Hamlett, 1989). Students weretested in the fall and spring of the year on the criterion mea-sures and were tested at least once weekly on both CBM mea-sures. Alternate-form reliability for the word-ID and NWFtasks was .88 and .87, respectively. Validity was examined forboth level and slope of performance for the two CBM mea-sures. In general, correlations with the criterion measures wereconsistently and reliably stronger for word ID than for NWF.Concurrent validity coefficients for word-ID level ranged from.52 to .93, compared to .50 to .80 for NWF. Predictive valid-ity for word-ID level ranged from .45 to .80, compared to .46to .64 for NWF. Finally, the slopes produced by word ID weremore strongly correlated to the criterion variables (with mostcoefficients at .45 or above) than they were for NWF (withonly 2 of 12 coefficients at .45 or above). Results not only lentsupport for the concurrent and predictive validity of word IDas an indicator of performance but also provided informationsupporting the technical adequacy of the growth rates pro-duced by the measure.
In a recent study, Compton, Fuchs, Fuchs, and Bryant(2006) examined the use of a word ID measure within a response-to-intervention (RTI) approach for first-grade stu-dents. Students were administered a prediction battery in thefall of first grade consisting of word ID, rapid naming, phone-mic awareness, and oral vocabulary measures. Students werealso progress-monitored for a 5-week period using the word-
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 107
108 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
ID task, and the slope of improvement was calculated. Stu-dents were followed until the end of second grade. Comptonet al. examined the use of word-ID level, slope, and a combi-nation of level and slope as a part of a process for predictingperformance of participants at the end of second grade. (Theyalso examined different classification and data analytic ap-proaches, which are not reviewed here.) Results revealed thatadding word-ID level and slope significantly improved clas-sification accuracy for the identification of at-risk studentsover and above the use of phonemic awareness, rapid naming,and oral vocabulary measures.
Generalizability of Research: Student Populations and PurposesRecent work has examined the validity of CBM reading mea-sures—primarily reading aloud—to diverse groups of studentsand for new uses. In terms of generalizability to new studentpopulations, CBM research has been extended to studentsat the secondary-school level and to students with diversebackgrounds and characteristics (racial/ethnic and languagebackgrounds, gender, socioeconomic status, and sensory dis-abilities). In terms of generalizability of uses, CBM researchhas been extended to examine the uses of reading measuresfor predicting performance on state standard tests. Given spacelimitations, and the fairly limited amount of research withineach category, we briefly review these studies in this section.
Student Populations. Extensions of CBM reading re-search to the secondary-school level initially focused onthe use of reading measures to predict content-area perfor-mance, rather than to predict general reading performance(e.g., Espin & Deno, 1993a, 1993b; Espin & Deno, 1995; Few-ster & MacMillan, 2002; Yovanoff, Duesbery, Alonzo, & Tin-dal, 2005). More recent research has focused on predictinggeneral reading performance (Espin & Foegen, 1996; Espin,Wallace, Lembke, Campbell, & Long, 2007; Muyskens &Marston, 2006; Tichá, Espin, & Wayman, 2007). These stud-ies have all been conducted at the middle school level. Resultshave generally shown that both reading aloud and maze ex-hibit strong alternate-form reliability and moderate to strongcriterion-related and predictive validity. However, Espin et al.(2007) and Tichá et al. (2007) also found that whereas read-ing aloud did not reflect change in performance over time,maze selection did. Growth on maze was related to perfor-mance on a state reading test and to changes on a standard-ized achievement test in reading.
Extensions of CBM reading research to students fromdiverse backgrounds have produced mixed results. With re-gard to racial/ethnic backgrounds, Hixson and McGlinchey(2004) and Kranzler, Miller, and Jordan (1999) found thatCBM reading aloud resulted in overestimation of reading per-formance for African American students and underestimationof performance for Caucasian students. (Overestimation of
performance might result in underidentification for services.)However, Hintze, Callahan, Matthews, Williams, and Tobin(2002) found that CBM reading aloud resulted in neither over-nor underestimation of performance for African American orCaucasian students. In terms of language, results generallyhave revealed moderate to strong reliability and criterion-related validity coefficients for reading-aloud measures withEnglish learners (ELs; Baker & Good, 1995; Wiley & Deno,2005). In addition, gains on reading-aloud measures for ELstudents have been found to be similar to gains seen for non–EL students (Graves, Plasencia-Peinado, Deno, & Johnson,2005).
In a comprehensive study of the effects of home lan-guage, gender, ethnicity, and socioeconomic status on the tech-nical adequacy of CBM reading aloud measures, Klein andJimerson (2005) followed three cohorts of students (N = 398)longitudinally from Grades 1 through 3. The first cohort con-sisted of Caucasian students who spoke English as their homelanguage, the second group was Hispanic students who spokeEnglish as their home language, and the third cohort was His-panic students who spoke Spanish as their home language.Results revealed a strong relationship between the CBMreading-aloud and SAT reading scores for all three groups ofstudents at each grade level, with most correlations between.63 and .82. Linear regression analyses revealed that only acombination of ethnicity and home language resulted in biasin the measures. Specifically, the reading proficiency of His-panic students whose home language was Spanish was sys-tematically overpredicted (which could lead to systematicunderidentification for services), whereas the reading profi-ciency for Caucasian students whose home language wasEnglish was systematically underpredicted (which could leadto systematic overidentification).
Only two studies have examined the validity of CBMreading measures with students with sensory disabilities. Thefirst, Morgan and Bradley-Johnson (1995), provided supportfor the validity of the measures for students in Grades 3 to 7with visual impairments. The second, Allinder and Eccarius(1999), did not provide support for the validity of either read-ing aloud or maze selection for students who were deaf andhard of hearing.
Purposes. Recent work has extended the work on CBMreading measures to examine their use for predicting perfor-mance on state standards tests. Early studies focused on es-tablishing benchmark scores that would predict passing orfailing a state reading test (Crawford, Tindal, & Stieber, 2001;Good, Simmons, & Kame’enui, 2001). Subsequent studies thatexamined correlations between CBM reading-aloud measuresand performance on state standards tests reported diagnosticefficiency statistics, including sensitivity, specificity, positivepredictive power, and negative predictive power (Hintze &Silberglitt, 2005; McGlinchey & Hixson, 2004; Silberglitt &Hintze, 2005; Stage & Jacobsen, 2001; see Note 2). Sensitiv-
ity is the percentage of students below a cut score who fail atest. Specificity is the percentage of students above a cut scorewho pass a test. Positive predictive power is the probabilitythat a student with a score below the cut score will truly failthe test, whereas negative predictive power is the probabilitythat a student with a score above the cut score will truly passthe test.
Correlations between scores on the reading-aloud mea-sures and the various state reading tests generally ranged from.60 to .80 across studies. The exception to this pattern was thestudy by Stage and Jacobsen (2001), in which correlations be-tween reading aloud and the Washington state test were .43 to.44. The differences for the Stage and Jacobsen study mighthave been related to the nature of the Washington state read-ing test, which required both short-answer and extended writ-ten responses, and thus involved not only reading but alsowriting. The other state tests did not require written responses.
Diagnostic efficiency statistics across the four studieswere fairly consistent. Sensitivity values ranged from 65% to76%; specificity values ranged from 74% to 82%. Positivepredictive power values ranged from 55% to 77% for all butthe Stage and Jacobsen (2001) study, in which the value was41%. Negative predictive power values ranged from 83% to90% for all but the McGlinchey and Hixson (2004) study, inwhich the value was 46%. Across studies, the use of CBMadded significantly to positive and negative predictive powerabove base rates of prediction.
Summary and Discussion
Research on CBM reading measures has provided support forand clarified the nature of the relationship between readingaloud and general reading proficiency; has examined alter-nates to reading aloud, including maze selection and word ID;and has examined the generalizability of the results to differ-ent student populations and for different uses.
With regard to the reading-aloud measure, results gen-erally replicated earlier research demonstrating a strong rela-tionship between CBM reading aloud and reading proficiency,even when correlations were calculated within grade, ad-dressing a concern raised about the early CBM research (seeMehrens & Clarizio, 1993). Reading aloud was found to be abetter indicator of reading comprehension than were other“typical” comprehension measures, and results revealed thatreading aloud was not just a speed-of-processing measure. Inaddition, research provided insight into the theoretical natureof the relationship between reading aloud and reading profi-ciency for elementary-school students.
However, reading aloud has limitations. First, readingaloud may not be the best choice for very young and olderstudents. For readers at the very beginning stages of reading,reading-aloud measures produced a floor effect. Examinationof an alternative measure, word ID, proved a promising alter-native for very beginning readers. Reliability and validity co-
efficients for word ID were consistently strong for beginningreaders, and research supported the use of word ID as a partof an RTI approach to early identification and prevention. Sec-ond, although correlations between reading-aloud and crite-rion measures remained moderate to strong across elementaryschool grades, they were strongest at the primary grades anddecreased at the intermediate grades. No such decrease wasseen for maze, which remained fairly stable across the grades.Thus, although reading aloud might be the best measure forprimary-grade students, reading aloud and maze selection bothseem to be appropriate for intermediate-grade students. Forsecondary-school students, maze may be the best choice. Al-though research is limited, initial results suggest that readingaloud does not reflect growth for middle school students,whereas maze selection does. Finally, if progress is to be mon-itored across school years, maze might prove to be the bestchoice. It has been shown to have reasonable validity and re-liability for students across Grades 2 through 8, and the growthrates across grades have shown greater consistency thanthose for reading aloud. The reasons for the age-related dif-ferences seen between the measures are not clear and shouldbe examined more closely. Perhaps the teachers from theFuchs and Fuchs (1992) study were correct: Perhaps mazereflects multiple aspects of reading proficiency to a greaterextent than reading aloud does. If true, the separation of read-ing proficiency into decoding and comprehension factors atthe intermediate elementary grades, as seen in Shinn et al.(1992), would not affect the maze correlations but would af-fect reading-aloud correlations. Regardless of the reasons forthe differences, given the moderate to strong reliability andcriterion-related validity coefficients for maze, and given theadvantages offered by maze in terms of group administration,appropriateness for computerized administration, potentialfor cross-grade measurement, and acceptance by teachers, webelieve that in the future more attention should be devoted tomaze selection as a potential CBM reading measure.
The extension of the research to various populations andfor different uses is still in the early stages of development.Research at the secondary-school level is promising but hasfocused primarily on middle school students. Little has beendone at the high school level. With regard to students of di-verse backgrounds, evidence suggests that although the CBMreading measures may have reasonable reliability and crite-rion-related validity for various groups, the measures mayoverestimate the performance of African American studentsand of Hispanic students whose home language is Spanish andunderestimate the performance of Caucasian students. How-ever, results are mixed, and there is a need to further examinethese patterns of relations. We suggest that performance-leveldifferences between groups be factored out in future research.
In terms of extensions for new purposes, the use of CBMreading-aloud measures for predicting performance on statestandards tests has produced positive results, with the mea-sures producing generally strong correlations with state read-
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 109
110 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
ing tests and fairly good sensitivity, specificity, positive pre-dictive, and negative predictive power. These conclusionsmust be tempered by the fact that the technical adequacy ofCBM reading measures for predicting performance on statetests may be affected by the nature of the state test, as seen inStage and Jacobsen (2001).
Aside from a need for further research on the generaliz-ability of the measures, two other issues have yet to be suffi-ciently addressed in the area of measure development. First,there is a need to further examine the relationship betweenreading aloud and reading comprehension at the individuallevel. Few studies focused on the individual level. One that did,Markell and Deno (1997), implied that large gains in reading-aloud scores might be necessary before gains in reading com-prehension could be assumed. There is a need to replicate thisresearch and to examine the implications of the results for in-dividual progress monitoring. Relatedly, we think research onthe existence of word callers is of interest. It is conceivablethat a small group of students exists whose performance on thereading aloud measures is, relatively speaking, much higherthan their performance on comprehension measures.
A second issue relates to methods for linking measures.If word ID is to be used for beginning readers, reading aloudfor primary-grade readers, and maze for intermediate and sec-ondary-school readers, how might the measures be linked tocreate a picture of growth across school years? The ability tolink different measures across years would contribute to thedevelopment of a seamless and flexible system of progressmonitoring.
Effects of Text Materials
Thus far, we have focused on various measures that can beused in CBM in reading. In this section, we turn our attentionto the materials used to develop those measures. In 1994, Fuchsand Deno raised the question of whether instructionally use-ful performance assessment had to be based in the curricu-lum. The authors noted that although measures selected fromthe curriculum might have face validity, curriculum measuresalso introduced more error into progress measurement be-cause of variations in passage difficulty and student familiar-ity with passages. In addition, passages selected from withinan instructional curriculum might limit generalizability toother reading curricula.
A large number of studies were conducted beginning inthe 1990s to address questions related to the materials used todevelop CBM measures. The research fell into two broad cat-egories. The first addressed the question of curriculum source,that is, whether technical adequacy of the measures differedwith the curriculum used to generate measures. Included inthis category were studies comparing reading passages gen-erated from different curricula and studies comparing pas-sages generated from an instructional versus a “generic” ornoninstructional curriculum. The second category of research
addressed the difficulty level of the materials chosen, specif-ically whether measurement had to be done within the stu-dent’s instructional level.
We note two points before describing the literature inthis section. First, the research on materials has been con-ducted almost exclusively with reading-aloud measures. Fewstudies have examined maze or word ID measures. Second,there are a surprisingly large number of studies addressing thequestion of materials. Given the space limitations accreditedto a review article, and the similarities in pattern of results,we report the results generally, referring to details of specificstudies only when needed.
Curriculum Effects
Three general themes emerged from the research on the ef-fects of curriculum on CBM reading measures. First, level ofperformance differs significantly with curriculum source.Second, although technical adequacy does not vary with cur-riculum source, rates of growth may. Third, it is not necessaryto match instructional and progress-monitoring material.
With respect to the first theme—differences in levels ofperformance—results consistently reveal mean level differ-ences in scores on passages drawn from different curricula,beginning with Tindal, Marston, Deno, and Germann (1982).Studies have shown higher levels of performance on instruc-tional materials than on mainstream materials (Tindal, Flick,& Cole, 1992), on literature-based materials than on authenticmaterials (Hintze, Shapiro, Conte, & Basile, 1997), on basalmaterials than on literature-based materials (Bradley-Klug,Shapiro, Lutz, & DuPaul, 1998), and on generic materials thanon basal materials (Powell-Smith & Bradley-Klug, 2001). Inaddition, higher levels of performance have been found onmaze measures drawn from materials controlled for difficultythan on literature-based materials (Brown-Chidsey, Johnson,& Fernstrom, 2005). Mean level differences are important forprogress monitoring only insofar as they are used to comparestudents across classes, schools, or districts or when compar-ing student performance at one point in time to performanceat a later point in time. For such uses, it is important to keepthe source of material constant. However, performance-leveldifferences do not address the issue of technical adequacy ofthe measures as indicators of reading performance or growth.Measures may produce differences in levels of performancebut be equally good indicators of general reading proficiency.
The second theme to emerge from the curriculum mate-rials research does relate to technical adequacy. Results acrossseveral studies reveal that there are few differences in thetechnical adequacy of reading-aloud measures selected fromdifferent curricula. For example, Fuchs and Deno (1992) com-pared reading-aloud passages drawn from two published basalcurriculum series and found no differences in the magnitudeof correlations with the Woodcock Reading Mastery Test(WRMT; Woodcock, 1973) in Grades 1 through 6. Similarly,Hintze, Shapiro, Conte, and Basile (1997) found no differ-
ences in alternate-form reliability or criterion-related validityfor passages selected from authentic and literature-basedcurricula and no differences in the passages for classifyingstudents into groups. Hartman and Fuller (1997) found thattest–retest reliability and criterion-related validity coefficientsfor passages selected from literature-based materials for stu-dents in Grades 1 through 3 were similar to those reportedin the other research for passages selected from basal-seriestexts. Finally, Brown-Chidsey et al. (2005) found high corre-lations between performances on maze passages selected fromcontrolled and literature-based material.
Similar to the research on performance levels, researchon the developmental growth rates (i.e., changes in scoresacross grade levels) produced by CBM reading measures re-veal few differences related to curriculum source (Fuchs &Deno, 1992; Hintze et al., 1997). However, results from stud-ies of intraindividual growth rates have been mixed. Bradley-Klug, Shapiro, Lutz, and DuPaul (1998) found no differencesin growth rates produced by literature-based and basal-seriescurricula for second-graders. Differences in growth for fifth-graders were not statistically significant, but growth rateswere .84 words per week for literature-based and .32 for basal-series probes. Brown-Chidsey et al. (2005) found that mazeprobes created from controlled and literature-based passagesboth produced positive growth rates from fall to winter tospring measurements. The significance of the difference ingrowth rates was not tested.
Other studies have found significant differences in growthrates. Hintze, Shapiro, and Lutz (1994) examined the growthrates produced by literature-based and basal-series curriculafor two groups of third-grade students who were taught witheither a literature-based or basal-series curriculum. Growth ratesfor the literature-based and basal-series students were −.35and −1 when measured with literature-based probes, but were.66 and 1.70 when measured with basal-series probes. In theHintze et al. (1994) study, passages were not equated for dif-ficulty. Passage difficulty was equated in a subsequent study(Hintze & Shapiro, 1997) in which students in Grades 2 to 5who were instructed with literature-based or basal-series ma-terial were monitored with literature-based and basal-seriespassages. Again, results revealed differences in growth ratesrelated to curriculum source, although the results were in-consistent across grades. Students in Grades 2 through 4achieved greater growth when monitored with the literature-based, rather than with the basal-series, probes (a pattern op-posite that of Hintze et al. [1994], who found greater growthwith literature-based probes than with basal-series probes forthird-graders), whereas students in Grade 5 achieved greatergrowth with the basal-series probes (a pattern opposite thatfound by Bradley-Klug et al. [1998], who found greater growthrates on literature-based probes for fifth-grade students).
It is difficult to determine why there is such inconsis-tency in results regarding growth rates. However, given thegeneral pattern of inconsistency in the results (i.e., literature-based sometimes producing higher and sometimes lower growth
rates), we hypothesize that factors other than curriculum sourcemay be contributing to differences in growth rates. For ex-ample, as will be discussed in a later section, it is quite diffi-cult to determine the equivalency of “parallel probes” used tomonitor progress, even when the probes are drawn from thesame curriculum and matched on readability level. Moreover,slope values can easily be affected by a bunching of particu-larly difficult or easy probes near the beginning or end of aprogress-monitoring session. One way to address these issuesis to remove potential effects of nonequivalence of passagesby counterbalancing the order of passages across students sothat students do not read passages in the same order. Anothermethod is to use techniques other than readability to establishequivalence of the passages (a point we will discuss later). Ineither case, based on current research, we cautiously suggestthat growth rates are not affected by curriculum source; how-ever, as with level of performance differences, we would stillrecommend consistency in progress monitoring with respectto curriculum source.
The third and last theme emerging from the research onmaterials is related to the need to match monitoring materialto the materials used in instruction. The aforementionedHintze et al. (1994) and Hintze and Shapiro (1997) studies re-sulted in no interaction between the growth rates generated bymaterials selected from particular curricula and the curriculaused for instruction. In other words, students taught using aliterature-based series did not grow differently on literature-based probes than on basal-series probes. Other studies haveproduced similar results. Tindal et al. (1992) found no dif-ferences in slope of improvements for 12 special educationstudents in Grades 2 through 5 when monitored on highly con-trolled instructional materials or the general education basalcurriculum. Powell-Smith and Bradley-Klug (2001) and Riley-Heller, Kelly-Vance, and Shriver (2005) found no differencesin growth rates between probes derived from the students’ in-structional material and generic probes for second-grade stu-dents.
Difficulty Level
In terms of difficulty level of material, there are two generalissues to consider. The first is whether students must be mea-sured in instructional-level material or whether they can bemeasured in material outside their instructional level. The sec-ond is the importance of establishing equivalence of passagedifficulty for repeated progress monitoring.
Need to Measure at Instructional Level. Address-ing the issue of whether students must be measured with instructional-level material, Fuchs and Deno (1992) comparedcriterion-related validity and developmental growth rates pro-duced by materials of various difficulty levels. Results revealedno differences related to material difficulty in the magnitudeof the correlations. Average correlations across difficulty lev-els with the WRMT (Woodcock, 1973) were .91, ranging from
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 111
112 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
.89 to .93. Results also supported the use of a generic (in thiscase, third-grade) passage for tracking growth across gradelevels. Growth rates produced with a common third-gradepassage were relatively stable and linear. However, Fuchs andDeno also found that developmental growth rates decreasedas the difficulty of the material increased; thus, sixth-gradematerial produced growth rates that were less steep than thoseproduced with third-grade material.
Several studies have examined the influence of materialdifficulty on intraindividual growth rates. Shinn, Gleason, andTindal (1989) monitored the reading-aloud progress of mildlyhandicapped students in Grades 3 to 8 who were randomly as-signed to one of two groups: reading material one grade levelbelow and above instructional placement, or two and fourgrade levels above instructional placement. Results revealedcomparable slopes for students monitored one level below(4.3 words per week) and above (3.7 words per week) instruc-tional placement level. Although not statistically significant,slopes for two and four levels above instructional placementdid differ in magnitude (4.55 vs. 2.35 words per week), sup-porting the earlier finding that an increase in difficulty leadsto flatter slopes. Standard error of estimates (SEEs) did notdiffer by difficulty level.
Dunn and Eckert (2002) found no differences in slopesor SEEs for grade-level versus challenging (approximatelyone grade level above instructional level) material for second-or third-grade students. Hintze, Daly, and Shapiro (1998), how-ever, did find differences in the pattern of results for gradelevels. Students in Grades 1 through 4 were monitored withmaterial at and 1 year above grade level. No differences wereseen in alternate-form reliability between materials, nor in theslopes obtained for students in Grades 3 and 4. However, forstudents in Grades 1 and 2, growth rates were higher on grade-level material than they were on material above grade level.Differences were especially notable for first-grade students.The authors surmised that students in the beginning stages ofreading development experienced more fluency problemswhen the text was difficult than did students who were beyondthe beginning stages of reading.
One study examined the effects of difficulty level on theslopes produced by word identification probes. Fuchs, Tindal,and Deno (1981) compared three types of word lists: thosegenerated from grade-level material covered throughout theyear (grade-level, comprehensive), from grade-level materialcovered during the time of the study (grade-level, limited),and across-grade-level material (preprimer to Grade 4). Dif-ferences were found in the magnitudes of the slopes producedby the measures. The steepest slopes were produced by thegrade-level, limited word lists (.49), followed by the grade-level, comprehensive lists (.20), and the across-grade lists(−.07).
Equivalence of Passages. A second issue related to diffi-culty level of CBM materials, and one that has received far lessattention than other materials-related issues have, is the im-
portance of establishing the equivalence of the “parallel” pas-sages used for repeated progress monitoring. As one might ex-pect, CBM reading scores are sensitive to the difficulty level ofthe passages. For example, each of the studies described in thesection above reported mean score differences on the reading-aloud measure for passages of varying difficulty levels (e.g.,Dunn & Eckert, 2002; Fuchs & Deno, 1992; Hintze, et al.,1998; Shinn et al., 1989). In one respect, this is a positive find-ing and demonstrates the validity and sensitivity of the CBMreading-aloud measures. On the other hand, with respect toongoing progress monitoring, the finding is problematic. Itimplies that scores on repeated progress-monitoring measureswill be affected by variation in passage difficulty (a findingconfirmed by studies reviewed in a later section on growth).
The importance of establishing equivalence in passagedifficulty has been illustrated in two studies. In the first, Poncy,Skinner, and Axtell (2005) examined the effects of passagevariability on reading-aloud scores. Participants were third-graders who read 20 grade-level passages with readabilitiesranging from 2.8 to 3.1 grade level. Passages were presentedto students in random order over a period of 4 days. Analysesrevealed that 81% of the variance in students’ scores was dueto student skill, 10% was due to passage variability, and 9%was unaccounted for. By controlling the difficulty of the pas-sages on the basis of students’ average scores, variance due tostudent skill increased and variance to passage difficulty de-creased.
In the second study, Hintze and Christ (2004) comparedgrade-level material controlled for difficulty level with ran-domly selected materials from graded readers for students inGrades 2 to 5. Students were administered both controlled anduncontrolled reading-aloud passages over the course of 11weeks. The results indicated that estimates of both SEE andthe standard error of the slope (SEb) were smaller when pas-sages were controlled for difficulty than when they were not.
The results of Poncy et al. (2005) and of Hintze and Christ(2004) emphasize the need to establish passage equivalence.Yet, creating parallel passages is not as easy as it may seem.The most common method for establishing equivalence of CBMpassages has been to examine the readability levels of the pas-sages via the use of common readability formulas. However,Ardoin, Suldo, Witt, Aldrich, and McDonald (2005) foundonly modest relationships between the reading levels assignedto passages via readability formulas and the number of wordsread correctly (WRC) in 1 min from those passages for third-graders. The readability formula that produced the highestand most reliable mean associations with WRC was Forecast(Sticht, 1973), a formula that has not been used in the CBMresearch. In a component analysis, Ardoin et al. (2005) foundthat the two components significantly related to WRC weresyllables per 100 words and words not on the Dale-Chall listof 3,000 words. Results also revealed inconsistencies in thelevels assigned to passages among various readability formulas.
Others studies have demonstrated low correlations (rang-ing from rs = −.08 to .43) between scores on CBM reading-aloud
and readability formula levels (Bradley-Klug et al., 1998;Compton, Appleton, & Hosp, 2004; Hintze et al., 1994; Powell-Smith & Bradley-Klug, 2001) and low correlations amongvarious readability formulas (r = .28; Compton et al., 2004).In addition, Compton et al. (2004) found that certain passagecomponents were related to WRC in 1 min, including the num-ber of high-frequency and decodable words, the number ofmultisyllabic words (negatively related), and sentence length.
Given the problems with readability formulas for deter-mining passage difficulty, how can one determine the equiv-alence of passages used for CBM progress monitoring? Ardoinet al. (2005) proposed a process whereby a set of passages isselected based on specific components found to relate to read-ing fluency, such as those identified in Ardoin et al. (2005)and Compton et al. (2004). Selected passages are then field-tested with a large number of students. In this system, onlypassages that fall within 1 standard deviation of the meanWRC would be selected for use in CBM progress monitoring.The effects of such an approach on the stability of the growthrates produced by reading-aloud passages have yet to be ex-amined.
Summary and Discussion
The research on curriculum source supports a robustness inthe CBM measures: Even when developed from different cur-riculum sources, the measures seem to function consistently.Moreover, it does not appear necessary to develop CBM probesfrom material in which the student is being instructed. Theseconclusions are good news in terms of the development of aseamless and flexible system of progress monitoring. Theyimply that CBM progress monitoring using reading-aloudmeasures can be used across various curricula and instruc-tional approaches. However, the research related to difficultylevel poses some restrictions to the general robustness ofCBM materials.
With respect to the issue of whether students need to bemeasured with instructional-level material, the literature in-dicates that CBM reading-aloud measures are fairly flexiblewith regard to difficulty level: Students can be measured withmaterial that is easier or more difficult than their instructionallevel, and the technical adequacy of the measures is not af-fected. However, there are limits to this flexibility. Rates ofgrowth may be affected if material is too difficult (e.g., twoto three levels above instructional level). Further, there issome indication that beginning readers may be more affectedby difficulty level than more advanced readers are. In termsof word identification, results reveal that the more closely themeasure is tied to the instruction, the more sensitive it is togrowth. Of course, this sensitivity must be balanced with gen-eralizability of the results. If words are selected that coveronly 1 month of instruction, students are likely to hit a ceil-ing in their performance in a relatively short amount of time.
The issue of difficulty as it relates to intraindividualgrowth monitoring is of greater concern. Generally, results
emphasize the need to establish passage equivalence for CBMprogress monitoring, especially if the measures are to beused as a part of a decision-making process that carries im-portant social consequences. The issue of passage equivalenceis perhaps less of a concern if CBM is used by a classroomteacher to monitor student progress and evaluate instructionalprograms. Given the time-consuming nature of the type ofapproach suggested by Ardoin et al. (2005), and the positivetreatment validity results reported in Stecker et al. (2005), use of passages selected from controlled sources, such as basal-reading series, is probably sufficient for such classroom use.However, if CBM is to be used as part of a school- or dis-trictwide decision-making process, or if it is to be used as partof an eligibility decision-making process, we feel that it isnecessary to establish equivalence of passage difficulty forprogress monitoring by adapting a process similar to that sug-gested by Ardoin et al. (2005).
Issues About Measuring Growth
We already have discussed in part some factors related togrowth. For example, the amount of growth produced byCBM measures may vary with the curriculum source or dif-ficulty level of the passages, and error in the production ofgrowth rates is reduced when the difficulty of the passagesused for progress monitoring is controlled. Two additional is-sues specific to growth have not yet been addressed: (a) Howreliable or trustworthy are the rates of growth produced byCBM measures? (b) Can standards for growth be determined,and do they differ for students by performance or grade level?As we did in the section on the effects of text materials, wecaution the reader that the majority of research conducted inthis area has been with the reading-aloud measure, with littleattention devoted to word ID or to maze selection.
Reliability of Growth Rates
The research on determining the trustworthiness of the growthrates produced by CBM measures includes examination of thedependability of single CBM scores and the reliability oraccuracy of the slopes produced by multiple, repeated scores.With regard to single scores, studies have examined the amountand sources of error surrounding single CBM scores. Christand Silberglitt (in press) calculated typical standard error ofmeasurement (SEM) values across students in Grades 1 to 5,taking into consideration various levels of measurement reli-ability and sample variability. Values were calculated for datacollected in fall, winter, and spring (three measurements perdata-collection period) across a period of 8 years for 8,200students. Results reveal that the median SEM across gradesand conditions was 10 words read correctly, with a range of 5to 15. Reliability, grade, and sample diversity affected SEMs,with smaller SEMs associated with higher levels of reliabil-ity, lower grade levels, and less sample variability. The au-
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 113
114 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
thors suggested that the use of standard SEMs could aid in theinterpretation of CBM reading aloud data.
Sources of error associated with CBM scores have beenexamined recently via the application of G theory, or gener-alizability theory, to CBM reading aloud. G theory is designedto assess the dependability of behavioral measurements byspecifying portions of error that can be accounted for by var-ious situational variables under which measurements are taken(Cronbach, Gleser, Nanda, & Rajaratnam, 1972). Through Gtheory, “portions of variance that in classic test score theoryare simply attributed to random error” (Hintze, Owen, Sha-piro, & Daly, 2000, p. 53) are identified and explained. Re-sults of a series of studies applying G theory have revealedconsistent support for the dependability of the CBM readingmeasures for inter- and intraindividual decision making at theelementary school level, with the majority of variance in CBMscores accounted for by individual variation and grade level(Hintze et al., 2000) or individual variation and group mem-bership (Brown-Chidsey, Davis, & Maya, 2003; Hintze &Pelle Petitte, 2001).
Although dependability of individual data points con-tributes to the dependability of the slope, it does not guaranteeit. Other factors must be considered with regard to slope.Several approaches have been taken to evaluate the trustwor-thiness or the reliability of the slopes produced by CBM mea-sures. In early CBM studies, the focus was on whether CBMreading measures were sensitive to change in performance.For example, Marston and Deno (1982) and Marston, Deno,and Tindal (1983) compared change in scores on reading-aloudand word ID measures to changes in scores for other readingmeasures, such as standardized achievement tests or basal-series tests. Results revealed that the CBM measures weremore sensitive to growth over a short period of time than werethese other measures.
Later studies focused more closely on the statistical prop-erties of the growth rates produced by repeated CBM data.Skiba, Deno, Marston, and Wesson (1986) examined slopesfor CBM reading-aloud data collected three to five times perweek over a 6-month period for 67 students with disabilities(Grades 1–7). The mean increase in number of words read perweek was 1.55 words per minute and tended to have an in-verse relationship to grade level (a pattern that has been repli-cated in other research that will be reviewed later). The meanSEE across grade levels was 10.17, with a range of 8.45 inGrade 1 to 1.56 in Grade 4.
In 1992, Fuchs and Fuchs examined the ratio of the slopevalue to the standard error of estimate. SEE indexes the de-gree of intraindividual instability in the CBM data. A largeSEE in proportion to the slope makes the graphs difficult tointerpret. Fuchs and Fuchs (1992) found that maze had a rel-atively small SEE in proportion to the slope when comparedto the other measures that might serve as alternatives for read-ing aloud.
Shinn et al. (1989) described a desirable slope as one thatwould be (a) easy to compute and interpret, (b) accurate in the
sense of not producing systematic over- or underpredictionsof performance, and (c) precise in the sense that individual er-rors of prediction are minimized. In a series of studies, thesecharacteristics were compared for different methods for slopecalculation (Good & Shinn, 1990; Parker & Tindal, 1992;Shinn, Good, & Stein, 1989). Results of these studies sup-ported the use of ordinary least squares (OLS) for calculatingslope compared to other methods (see Note 3). OLS was notthe easiest method for calculating slope, but it did not sys-tematically over- or underpredict future performance given arelatively small number (e.g., 10) of data points, and it mini-mized individual errors of prediction. Results of the slopestudies revealed negatively accelerating growth rates withinthe academic year.
Dunn and Eckert (2002) discussed limitations of studiesof slope, stating that the studies compared line-fitting meth-ods to each other rather than to an absolute standard of tech-nical adequacy and focused primarily on predicting later datafrom earlier data in the same data set. Dunn and Eckert pro-posed examination of the correlations between words readcorrectly and time (school day) as an indicator of accuracy ofthe slope. The square of the correlation coefficient would re-flect the amount of variability in the words read correctly dueto time. A higher percentage would indicate a more accurateslope line. Dunn and Eckert compared slopes for grade-leveland above-grade-level material (see description of the studyin the Materials section). Results revealed median correlationcoefficients of .15 and .14, respectively, indicating that verylittle variability in the number of words read correctly couldbe attributable to time.
Both Hintze and Christ (2004) and Christ (2006) con-sidered SEb in their investigations of slope reliability. Hintzeand Christ (2004; reviewed earlier) found that controlling thedifficulty of progress-monitoring material reduced both SEEand SEb. Christ (2006) examined the likely magnitudes ofSEb for different values of SEE and for different durationsof progress monitoring. Results revealed that the longer theprogress-monitoring duration, the smaller the SEE and thesmaller the SEbs (9.19 for 2 weeks compared to .42 for 15weeks). If one assumed a moderate amount of SEE, 9 to 10data points were needed to reduce SEbs to levels below 1. Tento 12 data points reduced SEbs to between .59 and .78. Christ(2006) discussed the importance of controlling testing condi-tions and passage difficulty to reduce SEE.
Most studies of slope have focused on the stability of theslopes, but little attention has been paid to the validity of theslopes generated by CBM data. We use validity in this senseto refer to the degree to which slope values predict perfor-mance on external criterion measures. With the developmentand ease of use of new statistical techniques, investigationsof the validity of slopes have become easier to accomplish.For example, Shin, Deno, and Espin (2000) used Hierarchi-cal Linear Modeling (HLM) to examine growth rates on mazeselection measures for second-grade students over the courseof a year. Results revealed that maze selection sensitively re-
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 115
flected improvement in student performance across the yearand reflected interindividual differences in growth rates. Inaddition, growth on maze was positively and significantly re-lated to later performance on a standardized reading achieve-ment test. Tichá et al. (2007) and Espin et al. (2007) usedHLM to examine growth rates produced by maze selection formiddle-school students. They found that growth on a maze se-lection task was significantly related to performance on a statestandards test and to growth on a standardized achievementtest. Finally, Speece and Ritchey (2005) used growth-curveanalysis to examine the growth rates produced by readingaloud for a sample of at-risk first-graders. Results revealedthat rate of growth on the reading-aloud measures in firstgrade predicted rate of growth in second grade, as well as end-of-year performance in second grade.
Standard Rates of Growth
A question often posed by practitioners is how much growthone can expect on CBM measures and whether growth ex-pectations differ by age or performance levels. Several stud-ies have examined these standard rates of growth. Marston,Lowry, Deno, and Mirkin (1981) examined growth rates fromfall to winter to spring for students in Grades 1 through 6 forboth reading aloud and word ID. Fifty-eight randomly se-lected general education students participated. Both word IDand reading-aloud measures reflected growth over time andexhibited steep continual, linear growth. Dramatic changes wereobserved in the earlier grades, with less dramatic changes inthe upper grades, a result seen in other studies (e.g., Deno etal., 1982; MacMillan, 2000). Hasbrouck and Tindal (1992)published CBM norms for reading aloud on the basis of datacollected over a period of 9 years from 7,000 to 9,000 stu-dents in Grades 2 through 5. They reported normative perfor-mance levels by time of year (fall, winter, spring), grade, andpercentile level within grade.
The limitation of grade-level standards is that they donot take into account an individual’s beginning level of per-formance. Students who are low performing may never reachgrade-level standards, even when they are improving (Fuchs,Fuchs, Hamlett, Walz, & Germann, 1993). Fuchs et al. (1993)addressed intraindividual norms for weekly growth rates onCBM reading-aloud and maze measures. Participants were twosamples of students in Grades 1 through 6. During the firstyear of the study, students (N = 117) were measured in grade-level material once each week using reading aloud. During thesecond year, students (N = 257) were measured in grade-levelmaterial at least monthly using a computerized maze selec-tion program. Results revealed that for the majority of stu-dents, a linear model fit the growth data for both reading aloudand maze, but for a proportion of the students, a quadraticmodel fit the data. For most of these cases, a slightly negativepattern of growth was found. Weekly growth rates were cal-culated by grade level. Results revealed differences in growthrates as a function of grade for reading aloud but not for maze.
As with earlier research, the magnitude of slopes was foundto decrease with an increase in grade. However, similar to thefindings of Jenkins and Jewell (1993), no such relationshipwas found for maze. Fuchs et al. (1993) presented what theytermed realistic and ambitious standards for weekly growth.For reading aloud, these standards were, by grade level, 2 and3 words per week (Grade 1), 1.5 and 2.0 (Grade 2), 1.0 and1.5 (Grade 3), .85 and 1.1 (Grade 4), .5 and .8 (Grade 5), and.3 and .65 (Grade 6). For maze, realistic and ambitious stan-dards for growth remained the same across grades and were.39 and .84 word choices per week, respectively.
Deno, Fuchs, Marston, and Shin (2001) established aca-demic growth standards for students in general and special ed-ucation under typical and effective instructional practices.Growth rates under typical conditions were generated usingextant databases from four educational agencies across thecountry. Growth rates under effective instructional conditionswere generated by combining data across studies in whichinstructional practices had been implemented and shown tobe effective. As with previous research, results revealed thegreatest growth in the early grades, with a decrease in growthrates with age. Typical growth rates for general and specialeducation by grade level were 1.8 and .83 (Grade 1), 1.66 and.57 (Grade 2), 1.18 and .58 (Grade 3), 1.01 and .58 (Grade 4),.58 and .58 (Grade 5), and .66 and .62 (Grade 6). Large dif-ferences existed between the growth rates of general and spe-cial education students up until Grade 4. In the second part ofthe study, Deno et al. (2001) examined growth rates for stu-dents with learning disabilities who had received effective in-structional treatments across five different studies. Growthrates for both reading aloud and maze were reported. Growthrates for reading aloud ranged from .83 to 2.10 words perweek. Growth rates for maze ranged from .56 to .70 wordsper week. With regard to the reading-aloud data, the resultswere striking in the sense that under effective instructionalconditions, students with LD exhibited growth rates close tothe typical growth rates for general education students seen inthe first part of the study.
Summary and Discussion
Research on the technical adequacy of the growth rates pro-duced by CBM measures supports the dependability of singlemeasures; however, results are more mixed with regard toslope. Slopes have been found to predict future CBM perfor-mance and have also been found to predict performance andprogress on external measures. However, concern exists aboutthe variability of the data points around the slope and the vari-ability of the slope values themselves. Concern also existsabout establishing equivalence of passages used for progressmonitoring. More research is needed on the technical proper-ties of the slope values produced by various CBM readingmeasures, especially as they relate to the use of CBM withinthe framework of high-stakes decisions, such as those in-volved in eligibility determination. Future research should
116 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
also examine the relative effects of exceptionally high or lowdata points at various time points on slope values, and practi-tioners’ abilities to interpret slope values and apply them toinstructional decision making.
Although studies of standard growth rates make an im-portant contribution to the CBM literature, the results must beviewed with caution. First, as acknowledged by the researchers,none of the studies employed a nationally representative dataset. Second, as discussed earlier in this review, the growthrates obtained in these studies may have been affected by thematerials in the studies, specifically the equivalence of thepassages used to establish the growth rates. To illustrate thispoint, perusal of the various studies included in this reviewreveal growth rates for general education students in Grade 4ranging from .24 to 2.69 words per week. Similar diversity ingrowth rates can be seen at the other grade levels. It may beimportant to consider establishing nationally standard growthrates by creating a standard set of passages that are controlledfor difficulty and measuring a nationally representative sam-ple of students on a weekly basis. In addition, following thelead set by Fuchs et al. (1993), it seems important to considerboth typical and ambitious growth standards.
Conclusion
We begin our Conclusion section with a return to the questionposed at the beginning of the article: Are CBM measures“valid enough” for the purposes for which they are used? Webelieve that the data have supported the validity of the CBMreading-aloud measure for use by classroom teachers as an in-dicator of the performance and progress in reading for ele-mentary school students, Grades 2 to 5. The measures havebeen shown to relate to a variety of criterion measures acrossa multitude of studies conducted over many years with dif-ferent participants, methods, materials, and researchers. In ad-dition, the theoretical basis for the strength of reading aloudas an indicator of general reading proficiency has been sup-ported, and the measures have been shown to be dependable.
Despite the positive support for the reading-aloud mea-sures, we offer some words of caution about the CBM readingmeasures in general. First, with respect to reading aloud, themeasures have not always been found to be strongly related tocriterion measures. Although correlations are consistently inthe .70s or above, there are studies that have resulted in corre-lations in the .40s between reading aloud and criterion variables.A closer look at the reasons for these exceptions is in order.Research has suggested that the relationship between readingaloud and criterion variables may decrease as students getolder; however, other factors may also influence the relation-ship, such as the influence of various passage characteristicson the relation between reading aloud and criterion variables.
Second, the overwhelming majority of research in CBMreading has been done with reading aloud, and with samplesof students in Grades 2 through 5. The question of the valid-
ity of the measures for younger (K–Grade 1) and older (Grades6–12) students, and for students of diverse backgrounds, isstill open for examination. With regard to student age, researchsuggests that a word-ID measure may be more appropriatethan reading aloud would be for younger, beginning readers,and that maze may be more appropriate for older, middleschool students. With regard to students with diverse needs,correlations between the CBM measures and criterion vari-ables have generally been positive across various groups, butthere is evidence that CBM reading aloud overestimates theperformance of African American and EL students, whichcould result in underidentification for services. For studentswith sensory disabilities, too few studies have been conductedto draw conclusions.
Finally, we raise the issue of the uses of CBM measures.CBM measures are no longer simply a means for special ed-ucation teachers to evaluate the effects of instructional pro-grams. The use of CBM measures has shifted from that ofmonitoring the progress of students in special education to usein high-stakes decisions that carry important social conse-quences. This shift creates a new set of standards for the mea-sures. The validity of the measures for these new purposes hasyet to be established. There is little known, relatively speak-ing, about the technical characteristics of the slopes producedby CBM measures. For example, what is the best method fordetermining validity and reliability of the slope? How manydata points are needed to obtain a reliable and valid slope?Does this number differ with the age of the student, the ma-terial used, or the equivalency of “parallel” passages? Howdo we establish parallel passages? There is also little knownabout normative or ambitious rates of performance and growth.For example, how much growth should be expected fromstudents at different age and performance levels? Should na-tional norms be developed? If so, must standard sets of mate-rials be developed for this purpose? Finally, little is knownabout teachers’ understanding of CBM progress data and thethinking processes teachers use as they interpret and use datafor decision making. For example, how accurate are teachersat interpreting CBM data? How long does it take them to learnhow to interpret and use CBM data? How easy is it for teach-ers to connect progress-monitoring data to instructional deci-sions, and are there methods to enhance their ability to tie dataand instruction together? We believe that areas must be ex-plored if CBM is to be used as a part of a decision-makingprocess related to determining the need for special educationservices.
Despite these words of caution, we believe that the flex-ibility and durability of CBM in reading across different mea-sures, materials, settings, students, and situations is notable.This flexibility and durability provide the basis for consider-ing the development of a seamless and flexible system ofprogress monitoring that could be used across students of var-ious ages and performance levels. Such a system might allowone to follow the progress of a student from kindergarten toGrade 12, using the same measures and materials or linking
measures and materials. Many of the issues raised above (e.g.,establishment of normative levels and growth rates using stan-dard material) would need to be addressed before such as sys-tem could be realized, but the research on the development ofCBM measures in reading is at a point where development ofsuch a seamless and flexible system is conceivable.
AUTHORS’ NOTES
1. The Research Institute on Progress Monitoring at the Universityof Minnesota is funded by the U.S. Department of Education, Of-fice of Special Education Programs (Award H324H030003) andpartially supported the completion of this work.
2. We wish to thank the Netherlands Institute for Advanced Studyin the Humanities and Social Sciences for its support in the prepa-ration of this manuscript.
3. We wish to thank Stan Deno, Ed Shapiro, Ted Christ, JonghoShin, and John Hintze for input on parts of this article.
NOTES
1. These measures can be found under various names in the litera-ture. Reading aloud is often referred to as oral reading fluencyand oral reading. Maze selection is referred to as maze and multiple-choice cloze. Word identification is also referred to asisolated word reading, word list reading, and word identificationfluency. We have chosen to use reading aloud, maze selection,and word identification consistently throughout the article be-cause these terms represent the behavior that the student performson each CBM measurement task; that is, students read aloud, se-lect maze choices, or identify words.
2. Hintze and Silberglitt (2005) and Silberglitt and Hintze (2005)examined various methods for determining cut scores using theCBM measures. Results supported the use of logistic regressioneither alone or in combination with Receiver Operating Charac-teristics (ROC) curve analysis. Given the fact that using logisticregression alone is easier than using it with ROC and given thesimilarities in the pattern of results between the two approaches,we report results here for logistic regression analyses only.
3. Although the research supported the use of OLS for calculatingslope, this is not a method that can be easily calculated by handby teachers. Parker and Tindal (1992) proposed an alternative,Tukey I, that had good statistical properties and could be calcu-lated by hand. To our knowledge, this method has never caughton in either CBM research or practice. In research, the over-whelming majority of studies calculate slope using OLS.
REFERENCES
Allinder, R. M., & Eccarius, M. A. (1999). Exploring the technical adequacyof curriculum- based measurement in reading for children who use man-ually coded English. Exceptional Children, 65, 271–283.
Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Ac-curacy of readability estimates’ predictions of CBM performance.School Psychology Review, 20, 1–22.
Ardoin, S. P., Witt, J. C., Suldo, S. M., Connell, J. E., Koenig, J. L., Resetar,J. L., et al. (2004). Examining the incremental benefits of administeringa maze and three versus one curriculum-based measurement readingprobes when conducting universal screening. School Psychology Re-view, 33, 218–233.
Bain, S. K., & Garlock, J. W. (1992). Cross-validation of criterion-related va-lidity for CBM reading passages. Diagnostique, 17, 202–208.
Baker, S. K., & Good, R. H. (1995). Curriculum-based measurement of Eng-lish reading with bilingual Hispanic students: A validation study withsecond-grade students. School Psychology Review, 24, 561–578.
Bradley-Klug, K. L., Shapiro, E. S., Lutz, J. G., & DuPaul, G. J. (1998). Eval-uation of oral reading rate as a curriculum-based measure within litera-ture-based curriculum. Journal of School Psychology, 36, 183–197.
Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variation incurriculum-based measures of silent reading. Psychology in the Schools,40, 363–377.
Brown-Chidsey, R., Johnson, P., Jr., & Fernstrom, R. (2005). Comparison ofgrade-level controlled and literature-based maze CBM reading passages.School Psychology Review, 34, 387–394.
Christ, T. J. (2006). Short-term estimates of growth using curriculum-basedmeasurement of oral reading fluency: Estimates of standard error ofslope to construct confidence intervals. School Psychology Review, 35,128–133.
Christ, T. J., & Silberglitt, B. (in press). Curriculum-based measurement oforal reading fluency: The standard error of measurement. School Psy-chology Review.
Compton, D. L., Appleton, A. C., & Hosp, M. K. (2004). Exploring the re-lationship between text-leveling systems and reading accuracy and flu-ency in second-grade students who are average and poor decoders.Learning Disabilities Research & Practice, 19, 176–184.
Compton, D. L., Fuchs, D., Fuchs, L. S., & Bryant, J. D. (2006). Selectingat-risk readers in first grade for early intervention: A two-year longitu-dinal study of decision rules and procedures. Journal of EducationalPsychology, 98, 394–409.
Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to pre-dict student performance on statewide achievement tests. EducationalAssessment, 7, 303–323.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The de-pendability of behavioral measurements: Theory of generalizability forscores and profiles. New York: Wiley.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychologicaltests. Psychological Bulletin, 52, 281–302.
Daly, E. J., III, Wright, J. A., Kelly, S. Q., & Martens, B. K. (1997). Mea-sures of early academic skills: Reliability and validity with a first gradesample. School Psychology Quarterly, 12, 268–280.
Deno, S. L. (1985). Curriculum-based measurement: The emerging alterna-tive. Exceptional Children, 52, 219–232.
Deno, S. L. (1990). Individual differences and individual difference: The es-sential difference of special education. The Journal of Special Educa-tion, 24, 160–173.
Deno, S. L., Fuchs, L. S., Marston, D., & Shin, J. H. (2001). Using curricu-lum-based measurement to establish growth standards for students withlearning disabilities. School Psychology Review, 30, 507–524.
Deno, S. L., Marston, D., Mirkin, P., Lowry, L., Sindelar, P., & Jenkins, J.(1982). The use of standard tasks to measure achievement in reading,spelling, and written expression: A normative and developmental study.Institute for Research on Learning Disabilities, 87.
Deno, S. L., Maruyama, G., Espin, C. A., & Cohen, C. (1989). The basic aca-demic skills samples (BASS). Minneapolis, MN: University of Min-nesota.
Deno, S. L., Mirkin, P. K., & Chiang, B. (1982). Identifying valid measuresof reading. Exceptional Children, 49, 36–45.
Dunn, E. K., & Eckert, T. L. (2002). Curriculum-based measurement in read-ing: A comparison of similar versus challenging material. School Psy-chology Quarterly, 17, 24–46.
Espin, C. A., & Deno, S. L. (1993a). Content-specific and general readingdisabilities of secondary students: Identification and educational rele-vance. The Journal of Special Education, 27, 321–337.
Espin, C. A., & Deno, S. L. (1993b). Performance in reading from contentarea text as an indicator of achievement. Remedial and Special Educa-tion, 14, 47–59.
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 117
118 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
Espin, C. A., & Deno, S. L. (1994-95). Curriculum-based measures for sec-ondary students: Utility and task specificity of text-based reading andvocabulary measures for predicting performance on content-area tasks.Diagnostique, 20, 121–142.
Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989, March). TheBasic Academic Skills Samples (BASS): An instrument for the screeningand identification of children at risk for failure in regular educationclassrooms. Paper presented at the National Convention of the Ameri-can Educational Research Association, San Francisco, CA.
Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures forpredicting secondary students’ performance on content-area tasks. Ex-ceptional Children, 62, 497–514.
Espin, C., Wallace, T., Lembke, E., Campbell, H., & Long, J. (2007). Creat-ing a progress measurement system in reading for secondary students:Monitoring progress towards meeting high stakes standards. Manuscriptsubmitted for publication.
Fewster, S., & MacMillan, P. D. (2002). School-based evidence for the va-lidity of curriculum-based measurement of reading and writing. Reme-dial and Special Education, 23, 149–156.
Fuchs, L. S., & Deno, S. L. (1992). Effects of curriculum within curriculum-based measurement. Exceptional Children, 58, 232–243.
Fuchs, L. S., & Deno, S. L. (1994). Must instructionally useful performanceassessment be based in curriculum? Exceptional Children, 61, 15–24.
Fuchs, L. S., & Fuchs, D. (1990). [Relations among the Reading Compre-hension subtest of the Stanford Achievement Test and maze, recall, ques-tion answering, and fluency measures]. Unpublished data.
Fuchs, L. S., & Fuchs, D. (1992). Identifying a measure for monitoring stu-dent reading progress. School Psychology Review, 21, 45–58.
Fuchs, L. S., Fuchs, D., & Compton, D. L. (2004). Monitoring early readingdevelopment in first grade: Word identification fluency versus nonsenseword fluency. Exceptional Children, 71, 7–21.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989). Effects of instrumental useof curriculum-based measurement to enhance instructional programs.Remedial and Special Education, 10, 43–52.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., Walz, L., & Germann, G. (1993). For-mative evaluation of academic progress: How much growth can we ex-pect? School Psychology Review, 22, 27–48.
Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal com-prehension measures. Remedial and Special Education, 9, 20–28.
Fuchs, L. S., Tindal, G., & Deno, S. L. (1981). Effects of varying item do-main and sample duration on technical characteristics of daily measuresin reading. Institute for Research on Learning Disabilities, 48.
Gardner, R. F., Rudman, H. C., Karlsen, B., & Merwin, J. C. (1982). Stan-ford achievement test. Iowa City: Harcourt Brace Jovanovich.
Good, R. H., & Shinn, M. R. (1990). Forecasting accuracy of slope estimatesfor reading curriculum-based measurements: Empirical evidence. Be-havioral Assessment, 12, 179–193.
Good, R. H., Simmons, D. C., & Kame’enui, E. J. (2001). The importanceand decision-making utility of a continuum of fluency-based indicatorsof foundational reading skills for third-grade high-stakes outcomes. Sci-entific Studies of Reading, 5, 257–288.
Graves, A. W., Plasencia-Peinado, J., Deno, S. L., & Johnson, J. R. (2005).Formatively evaluating the reading progress of first-grade EnglishLearners in multiple-language classrooms. Remedial and Special Edu-cation, 26, 215–225.
Guthrie, J. T. (1973). Reading comprehension and syntactic responses in goodand poor readers. Journal of Educational Psychology, 65, 294–299.
Guthrie, J. T., Seifert, M., Burnham, N. A., & Caplan, R. I. (1974). The mazetechnique to assess, monitor reading comprehension. The ReadingTeacher, 28, 161–168.
Hamilton, C., & Shinn, M. R. (2003). Characteristics of word callers: An in-vestigation of the accuracy of teachers’ judgments of reading compre-hension and oral reading skills. School Psychology Review, 32, 228–240.
Harcourt Brace & Company. (1997). Stanford Achievement Test Series–Ninthedition: Technical data report. San Antonio, TX: Author.
Harcourt Educational Measurement. (1998). Metropolitan achievement test(7th ed.). San Antonio, TX: Harcourt Assessment.
Harris, A. P., & Jacobson, M. D. (1972). Basic elementary reading vocabu-laries. New York: MacMillan.
Hartman, J. M., & Fuller, M. L. (1997). The development of curriculum-basedmeasurement norms in literature-based classrooms. Journal of SchoolPsychology, 35, 377–389.
Hasbrouck, J. E., & Tindal, G. (1992). Curriculum-based oral reading flu-ency norms for students in grades 2 through 5. Teaching ExceptionalChildren, 24(3), 41–44.
Hintze, J. M., Callahan, J. E., Matthews, W. J., Williams, S. A. S., & Tobin,K. G. (2002). Oral reading fluency and prediction of reading compre-hension in African American and Caucasian elementary school children.School Psychology Review, 31, 540–553.
Hintze, J. M., & Christ, T. J. (2004). An examination of variability as a func-tion of passage variance in CBM progress monitoring. School Psychol-ogy Review, 33, 204–217.
Hintze, J. M., Daly, E. J., & Shapiro, E. S. (1998). An investigation of the ef-fects of passage difficulty level on outcomes of oral reading fluencyprogress monitoring. School Psychology Review, 27, 433–445.
Hintze, J. M., Owen, S. V., Shapiro, E. S., & Daly, E. J. (2000). Generaliz-ability of oral reading fluency measures: Application of G theory to cur-riculum-based measurement. School Psychology Quarterly, 15, 52–68.
Hintze, J. M., & Pelle Petitte, H. A. (2001). The generalizability of CBM oralreading fluency measures across general and special education. Journalof Psychoeducational Assessment, 19, 158–170.
Hintze, J. M., & Shapiro, E. S. (1997). Curriculum-based measurement andliterature-based reading: Is curriculum-based measurement meeting theneeds of changing reading curricula? Journal of School Psychology, 35,351–375.
Hintze, J. M., Shapiro, E. S., Conte, K. L., & Basile, I. M. (1997). Oral read-ing fluency and authentic reading material: Criterion validity of the tech-nical features of CBM survey-level assessment. School PsychologyReview, 26, 535–553.
Hintze, J. M., Shapiro, E. S., & Lutz, J. (1994). The effects of curriculum onthe sensitivity of curriculum-based measurement in reading. The Jour-nal of Special Education, 28, 188–202.
Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the di-agnostic accuracy and predictive validity of R-CBM and high-stakestesting. School Psychology Review, 34, 372–386.
Hixson, M. D., & McGlinchey, M. T. (2004). The relationship between race,income, and oral reading fluency and performance on two reading com-prehension measures. Journal of Psychoeducational Assessment, 22,351–364.
Hoover, H. D., Hieronymus, A. N., Frisbie, D. A., & Dunbar, S. B. (1996).Iowa test of basic skills. Itasca, IL: Riverside.
Hosp, M. K., & Fuchs, L. S. (2005). Using CBM as an indicator of decod-ing, word reading, and comprehension: Do the relations change withgrade? School Psychology Review, 34, 9–26.
Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measuresfor formative teaching: Reading aloud and maze. Exceptional Children,59, 421–432.
Kaminski, R. A., & Good, R. H., III. (1998). Assessing early literacy skillsin a problem-solving model: Dynamic Indicators of Basic Early Liter-acy Skills. In M. Shinn (Ed.), Advanced applications of curriculum-based measurement (pp. 113–142). New York: Guilford Press.
Karlsen, B., & Gardner, E. (1985). Stanford diagnostic reading test (3rd ed.).San Antonio, TX: Psychological Corp.
Kaufman, A. S., & Kaufman, N. L. (1985). Kaufman test of educationalachievement (brief form). Circle Pines, MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (1990). Kaufman brief intelligence test.Circle Pines, MN: American Guidance Service.
Klein, J. R., & Jimerson, S. R. (2005). Examining ethnic, gender, language,and socioeconomic bias in oral reading fluency scores among Caucasianand Hispanic students. School Psychology Quarterly, 20, 23–50.
THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007 119
Koslin, B. L., Koslin, S., Zeno, S. M., & Ovens, S. H. (1989). The Degreesof Reading Power Test: Primary and standard forms. Brewster, NY:Touchstone Applied Science Associates.
Kranzler, J. H., Brownell, M. T., & Miller, M. D. (1998). The construct va-lidity of curriculum-based measurement of reading: An empirical testof a plausible rival hypothesis. Journal of School Psychology, 36, 399–415.
Kranzler, J. H., Miller, M. D., & Jordan, L. (1999). An examination ofracial/ethnic and gender bias on curriculum-based measurement of read-ing. School Psychology Quarterly, 14, 327–342.
MacGinitie, W. H., Kamons, J., Kowalski, R. L., MacGinitie, R. K., &McKay, T. (1978). Gates-MacGinitie Reading Tests (2nd ed.). Chicago:Riverside.
MacMillan, P. (2000). Simultaneous measurement of reading growth, gen-der, and relative-age effects: Many-faceted Rasch applied to CBM read-ing scores. Journal of Applied Measurement, 1, 393–408.
Markell, M. A., & Deno, S. L. (1997). Effects of increasing oral reading: Gen-eralization across reading tasks. The Journal of Special Education, 31,233–250.
Marston, D. (1989). A curriculum-based measurement approach to assessingacademic performance: What it is and why do it. In M. Shinn (Ed.), Cur-riculum-based measurement: Assessing special children (pp.18–78).New York: Guilford Press.
Marston, D., & Deno, S. L. (1982). Implementation of direct and repeated mea-surement in the school setting. Institute for Research on Learning Dis-abilities, 106.
Marston, D., Deno, S. L., & Tindal, G. (1983). A comparison of standardizedachievement tests and direct measurement techniques in measuringpupil progress. Institute for Research on Learning Disabilities, 126.
Marston, D., Lowry, L., Deno, S., & Mirkin, P. (1981). An analysis of learn-ing trends in simple measures of reading, spelling, and written expression:A longitudinal study. Institute for Research on Learning Disabilities, 49.
McConnell, S. R., McEvoy, M. A., & Priest, J. S. (2002). “Growing” mea-sures for monitoring progress in early childhood education: A researchand development process for individual growth and development indi-cators. Assessment for Effective Intervention, 27, 3–14.
McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based mea-surement to predict performance on state assessments in reading. SchoolPsychology Review, 33, 193–203.
McGrew, K. S., & Woodcock, R. W. (2001). Woodcock-Johnson III techni-cal manual. Itasca, IL: Riverside.
Mehrens, W. A., & Clarizio, H. F. (1993). Curriculum-based measurement:Conceptual and psychometric considerations. Psychology in the Schools,30, 241–254.
Messick, S. (1989a). Meaning and values in test validation: The science andethics of assessment. Educational Researcher, 18, 5–11.
Messick, S. (1989b). Validity. In R. L. Linn (Ed.), Educational measurement(3rd ed., pp. 13–103). New York: Macmillan.
Michigan State Board of Education. (1999). 1999 Update of the EssentialSkills Reading Test Blueprint Michigan Educational Assessment Pro-gram. Lansing: Michigan State Board of Education.
Morgan, S. K., & Bradley-Johnson, S. (1995). Technical adequacy of cur-riculum-based measurement for Braille readers. School Psychology Re-view, 24, 94–103.
Minnesota Department of Children, Families, and Learning. (1998–2002).Test specifications: Minnesota Comprehensive Assessment–Reading. St.Paul, MN: Author.
Muyskens, P., & Marston, D. B. (2006). The relationship between curriculum-based measurement and outcomes on high-stakes tests with secondarystudents. Minneapolis Public Schools. Unpublished manuscript.
Oregon Department of Education. (2000). Statewide assessment results 2000.Retrieved from the World Wide Web: http://www.ode.state.or.us/
Parker, R., & Tindal, G. (1992). Estimating trend in progress monitoring data:A comparison of simple line-fitting methods. School Psychology Re-view, 21, 300–312.
Poncy, B. C., Skinner, C. H., & Axtell, P. K. (2005). An investigation of thereliability and standard error of measurement of words read correctlyper minute using curriculum-based measurement. Journal of Psychoed-ucational Assessment, 23, 326–338.
Powell-Smith, K. A., & Bradley-Klug, K. L. (2001). Another look at the “C” inCBM: Does it really matter if curriculum-based measurement readingprobes are curriculum-based? Psychology in the Schools, 38, 299–312.
Prescott, G. A., Balow, I. H., Hogan, T. P., & Farr, R. C. (1984). Metropoli-tan achievement tests (MAT-6). San Antonio, TX: Psychological Corp.
Reid, D. K., Hresko, W. P., Hammill, D. D., & Wiltshire, S. M. (1991). Testof early reading ability: Special edition for students who are deaf orhard of hearing. Austin, TX: PRO-ED.
Riley-Heller, N., Kelly-Vance, L., & Shriver, M. (2005). Curriculum-basedmeasurement: Generic vs. curriculum-dependent probes. Journal of Ap-plied School Psychology, 21(1), 141–162.
Scannell, D. P., Haugh, O. M., Schild, A. H., & Ulmer, G. (1986). Tests ofachievement and proficiency. Chicago, IL: Riverside.
Shin, J., Deno, S. L., & Espin, C. (2000). Technical adequacy of the mazetask for curriculum-based measurement of reading growth. The Journalof Special Education, 34, 164–172.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing spe-cial children. New York: Guilford Press.
Shinn, M. R., Gleason, M. M., & Tindal, G. (1989). Varying the difficulty oftesting materials: Implications for curriculum-based measurement. TheJournal of Special Education, 23, 223–233.
Shinn, M. R., Good, R. H., Knutson, N., Tilly, W. D., & Collins, V. L. (1992).Curriculum-based measurement of oral reading fluency: A confirmatoryanalysis of its relation to reading. School Psychology Review, 21, 459– 479.
Shinn, M. R., Good, R. H., & Stein, S. (1989). Summarizing trend in studentachievement: A comparison of methods. School Psychology Review, 18,356–370.
Shinn, M. R., Ysseldyke, J. E., Deno, S. L., & Tindal, G. A. (1986). A com-parison of differences between students labeled Learning Disabled andLow Achieving on measures of classroom performance. Journal ofLearning Disabilities, 19, 545–551.
Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cutscores to track progress toward success on state-mandated achievementtests: A comparison of methods. Journal of Psychoeducational Assess-ment, 23, 304–325.
Skiba, R. J., Deno, S. L., Marston, D., & Wesson, C. (1986). Characteristicsof time-series data collected through curriculum-based reading mea-surement. Diagnostique, 12, 3–15.
Spache, G. (1981). Diagnostic reading scales. Monterey, CA: CTB/McGraw-Hill.
Speece, D. L., & Ritchey, K. D. (2005). A longitudinal study of the devel-opment of oral reading fluency in young children at risk for reading fail-ure. Journal of Learning Disabilities, 38, 387–399.
Stage, S. A., & Jacobsen, M. D. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency.School Psychology Review, 30, 407–419.
Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-basedmeasurement to improve student achievement: Review of research. Psy-chology in the Schools, 42, 795–819.
Sticht, T. G. (1973). Research toward the design, development and evalua-tion of a job-functional literacy program for the U.S. Army. LiteracyDiscussions, 4, 339–369.
Tichá, R., Espin, C. A., & Wayman, M. M. (2007). Reading progress moni-toring for secondary-school students: Reliability, validity, and sensitiv-ity to growth of reading aloud and maze selection measures. Manuscriptsubmitted for publication.
Tindal, G., Flick, D., & Cole, C. (1992). The effect of curriculum on inferencesof reading performance and improvement. Diagnostique, 18, 69–84.
Tindal, G., Marston, D., Deno, S. L., & Germann, G. (1982). Curriculum dif-ferences in direct repeated measures of reading. Institute for Researchon Learning Disabilities, 93.
120 THE JOURNAL OF SPECIAL EDUCATION VOL. 41/NO. 2/2007
Washington Assessment of Student Learning: Test sample. (1998). Olympia,WA: Office of the Superintendent of Public Instruction.
Woodcock, R. W. (1973). Woodcock reading mastery tests manual. CirclePines, MN: American Guidance Service.
Woodcock, R. W. (1987). Woodcock reading mastery test–Revised. CirclePines, MN: American Guidance Service.
Woodcock, R. W., & Johnson, M. B. (1989). Woodcock-Johnson test ofachievement: Standard and supplemental batteries. Allen, TX: DLMResources.
Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as pre-dictors of success for English learners on a state standards assessment.Remedial and Special Education, 26, 207–214.
Yell, M. L. (1992). Barriers to implementing curriculum-based measurement.Diagnostique, 18, 99–112.
Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-levelinvariance of a theoretical causal structure predicting reading compre-hension with vocabulary and oral reading fluency. Educational Mea-surement: Issues and Practice, 24(3), 4–12.