errs - eric · ed-218 345 document resume. 0. tm 820'386. author, van der linden, wim j..-....
TRANSCRIPT
ED-218 345
DOCUMENT RESUME.
0
TM 820'386
AUTHOR , van der Linden, Wim J. .- .
TITLE Assessing Inconsistenciesin.Standard Setting withthe Angoff or Nedelsky Technique.
PUB DATE Mar 82 .
..\
NOTE 20p.; Presentation of this paper supported by a grantfrom-the Dutch Foundation of Educational Research.
EDRS PRICE MFOl/PC01 Plus 13ostage.DESCRIPTORS Criteridn Referenced Testd; *Cutting Scores; *Error
of Measurement; *Latent Treit theory; Mathematical°Models; Probability; 'Secondary Education; Test Items;irTest N
IDENTIFIERS *Angoff Methods; Inter Rater.Reliabilitk; *NedelskyMethod';. Standard Setting
ABSTRACT'A latent trait. method is presented to investigate the
possibility that Angoff or Nedelsky judges specify inconsistentprobabilities in standard setting-techniques for objectives -basedinstructional programs. It is tuggested that judges frequently.,specify a low probability of success for an easy item but'a largeprobability for a hard item. The.respontet of 156 pupils to a 25-itemtest from a tenth grade, physics course were inspected .bit eight AnOoffand nine'Nedelsky judges. The latent' trait analysis produced 18 itemsshowing a satisfactory fitto tbe Rasch model. Serious errors of
specificatio*ere found and errs were-coesiderably larger for theNedelsky technique. Special difficulties with the Nedelsky judges arediscudted. Applicatiot of the latent trait method are discussed.(Author/CM)
00, s
O
oti Jr
*************************p*********************************************Reproductions supplied EDRS are the best that can be made *
from the original document.******************************* ***'****************.********************
A
YY
v
O
Atsessing Inconsistencies in Standard Setting
with the Angoff or Nedelsky Technique-,
Wim J..van der Linden_
Twente University of Technology'
3
-.
....,
Enschede, The Netherlands. %
1.1 t
. .
cs
U.S. DEPARTMENT OF EDUCATIONNATIONAL INSTITUTE OF EOUCATION
EOLICATIONAL RESOURCES INFORMATION
CENTER (ERIC))1, This document has been reproduced as
received from the Person or organizationongmating itMinor changes have been made to improvereproduction quality
Points ofytew or opinions stated on this docu
merit do not necessanly represent ofttal NIEposinori or policy
"PERMISSION TO REPRODUCE THISMATERIAL. HAS BEEN GRANTED BY
(Al. J. th24,44,( L.;,,44
__TCLIHEIEDUCATIONAL RESOURCESINFORMATION CENTER (ERIC)."
'
Yt *
Paper presented at the Annual,Meeting of the American Educational Research. A
Association, New York., March 19-23, /982. Pr6sentation of this paper was
made possible by &grant from the Dutch Foundation of Educationa4iiResearch.
00 t
r.
6
Inconsistencies in Standand'Seiting
1.
Abstract
It has often been argued that all techniques of .standard setting are arbitrary:,, ....
.
and .likely to,yield different resiilts for different techniques Or persons....
i
.
.. 4,
d This paper -deaJs with Avrelat'ed but hitherto :ignore& aspect, namely the,
1 . ', . , '. _ .
..,...
possibi I) fy Olt Ank)ff,-,or Nedel s4cy judges specify. ihtonsistent ,pf-ababilities,t d .. %, :. 4,,
- :e:g4 ;,a-A Ii015bbllity for an .easy itgliibilf a large proliabillty_ for a h4tIl
.
... . 6, . ,.-
l ' '' ' a 'a. ' 2,' . ,..' $$ .
, .,,item, 'A 1 atent. trail, 'method is prcoposedto `es'timat'e such 'mi sspeei fications..
. .. - - ,
-, 1 . . ,t . a , \worked'''' and eterWibes whether the judge has orked 'Consistently. Results froin 'an
:71*1* , , AI 0 '' 4
01 ' p ' 1
empirical study dre given which indiCate that serious :. errors of specification
,:of ,
can be expected:anci that these are considerably larder for the Nedelsiq than
for-the Angoff technique.
. r
.,4
r;
Inconsistencies in Standard Setting
2.
0
Assessing Inconsistencies in Standard-Setting
' with the Angoff or Nedeisky Technique
This panel- is concerned with'the 'u-se of .standard-setting techniques in.f
4( , \. .
objectives-bas6 inStructional-programs."For such pi-ograms,a .great variety of... ,,.
-._
, ;'.
jechntques hts been prOPoSed(for reviews, see:,.il'ais'; 1978; Hambleton, 1980; 4
4 Hamt?..leton; Powell, <SL EignoeY1979;,,Jaeger,k1979; thepa'r:d, 1980a, 1980b). The ,
emphasis in this Raper. will be 'on the affgoft (1971) and Nedesky (1954) tech-
niques. 'These two techniques, which are based bnan item by item judgment of, .
test content, are among' the most popular techniques in use in obJectives-
.
based instruction.4
It has-been argued that all standard setting is'arbitrary (Glass, 1978;
Shepard,'1979, 1980a, 1980n).,this is correct since standards 'ought to.refl,ect
learning objectives, and these, ultimately rest on valuesand norms. In
addition, the various standard-setting techniques available differ,more or -*4
less, in the conception 6fmastery underlying the,way standards are obtaingd:
Therefore, diffei-ent results can'be expected both for different techniques-----------
and for different persons using. the same 'technique. Thishas been confirmed'
in many experiments- {Andrew & Hecht, 1976; ,Brennan'& EOCkwood, 1980; .
.
Koffler, 1980; Saunders, Ryan, & Huynh, 1981; Skakun & Kling, 1980). In this
paper we do not share the concern with inconsistent results due to differeh-.
.
ces between techniques or,persons. Instead, the interest is in a related'
but hitherto ignored apsect of standard setting, nameh,the possibility.ofi
jLajiAkjijnatit±1a. Inthrudge 4nionsistencY rises when the'judge'.. .
...' -
specifies probabilities of success on the items,that are' incompatible with', f '
4. 0 : t
0
. ,
s \
Inconsistencies in Standard Setting'
3, ,
each other and imply different, conflicting standards. An example of
ntrajUdge.inconiistency is a judge specifying a low probability, of success
.for an easy item but a large probability for a hard item. These two
Ydugments are obviously inconsistent: the former implies a low standard,
whereas the latter indicates that a high standafti should be set. Another
example is.a judge specifying approximatel) gqual probabilities-for highly
discriminating items (of diffgringAifficulties). Gendrally, inconsistencies.
such aS.in these examples are due to a disciepancy beteen the actual proper-
ties of the items and the judge's perception of them.V
,.
Thus far, no attention has,been paid to the possibility of intrdjudge
inconsistency, and results' of the Angoff or Nedelsky-technique are generally
employed without checking the quality of the judge. This may be due to the,.
fact that classical test .theory does not provide satisfactory methods forI
analyzing such'inconsistencies. It is the-purpose of this paper to show ii,o)W
latent trait theory can be used to decide whether Angofj of Nedelsky standards
-have been set consistenly enough for. use in practice and to assess for which.'
c
items inconsistencies have occurred. The second purpose of, the paper is to
present empirical results showing how consistently the Arigoff.and Nedelsky .
techniques were used in a typical educational situation. In, the following9
it is assumed,that the reader is familiair with the elementary concepts
from latent trait theory as-well as the technical aspects of the..Angoff and
le Nedelsky techniques (S'ee the appendix). A fuller description Of the method
. and the empirical results is given in wan der Linen (1981a).1.
Inconsistencies in.Standard Setting
Method
r
4.
p
As mentioned earlier, intrajudge inconsistencies arise when probabili-
ties are specified that .are incompatible with. each other and imply different,
conflicting standards. Figure 1 shows how this can be viewed from latent
-trait theory. In this example, 0, denotes the level of mastery belbngingto
the borderline.studeat whom the judge has in mind. Fromhe item characteris-
tic curve it follows that this borderline student has a probability of,
Success equal to 1).i: However, the Sudgespec-ifies a probability equal to
p.(s). Now, a misspecification occurs if
(s).ei , pi. - pi
is unequal to zero., It 'i's easy to see that a judgeeis only"consisent if no
IlliSSI)edfiCatiOnSOCUW,thatis,ife.-7 0 for all items.:As soon ast.,
misspecificatilns are obtained for some items, the judge is inconsistent'
,.%- °
in the sense that his probabilities Aenof imply the same mastery level and
, .'',/'
,,,,,.
therefore cannot belong to bneperson. Thus,14n order to declare whether a
judges has worked c risistentlx, a method is needed tb assess the mfisspecifica--
,
tions e.- .
re,;The'fol)pwin eps summarize how latent trait theoiy can,be used for
,
tO.s.ptirpo .
A. A latent ,-tfO' model is cbcisen,.its parameters are lestimated, and its
fit is tes ed. 'Suppose that n items fit the model. ,
6
A
; 4
C
. W.
..*
e
.
'prob: of
succes. .
Pi
4
Q
a
_
em
Inconsistencies in.Standard Setting
5;
' 'o
o
a
V
0c
o
mastery level
ee
.
' e
p
a
Inconsistencies in St4pdard Setting
6.
2. For these n items the Angoff or Nedelsky technique is used to spetify
for each item the probability of success p.:(s)
3. Using'equations 1 and 2(appendix), the Angoff or Nedelsky standard, Tc,
is computed
4. The hypothesis to be tested is that the judge has worked consistently,
i.e., has specified correct probabilities of success, Note that under this
hypothesis, the Angoff or Nedelsky standard technically is a true score
'(expected observed score). The true score standard Tc is next transfbrmed
into a standard on the 0-stale of the latent trait model via the estimated
test characteristic curve (appendix,' equation 3). Since the latent trait
standard O` is no explicit function of Tc'
trial values must be sub-
stituted for the forther until the value of the ?atter is obtained. The
task is simplified by the faCt that 0 is monotonically related to T. ,
However, some computer programs standerdly produce the estimated test
characteristic curve, and in that case Oc can simply be read off.
5. Next, substituting 0cand the estimated item parameters into the model,
the estimated probabilities pi are computed,
6.. In order to determine whether the hypothesis of a consistent judge is
tenable, a comparison between the subjective, probabilities, provided by
the judge, p.(s), and the objective probabilities estimated under the
model, pi, must be made. This can be done using the index of consistency
Li (appendix, equation 5. C1 is the degree ta which the average absofute
, 0
.misspecification differs from its maximum possible value, measured on they. .
standard interval CO, 1]. The closer the value of C1to zero; the.less
,
tenable .the hypdtheSis.isthet the judge has worked consistently.
.
..
-
, ,....,
Inconsistencies in Standard Setting
7
T. A special difficulty is associated with the use of the Nedelsky technique.0
This technique can provide only a'limited number of possible probabilities
of success, and inconsistencies may therefore-be attributable to the;
discrete character-0 tie technique rather than the judge's behavior. The
index A (appendix, equation 6) can be used to assess howrlarge a reduction
.of consistency-has occurred because of the disCrete character of the- jo
technique.
8. Finally, the pattern of differences between pi(s) and pi' is analyzed..
Technically, these differences are the "residuals" left over after the
hypothesis of a consistent judge has been fitted to the data. An analysis
of,this pattern can be used, forinstance, to detect items with systematic. .
errors across judges., or items for which the judge needs additional frail-ling.
Results
p.
'An .empirical investigation; was carried out to illustrate the above
method and to compare results for the Nedelsky and Angoff techniques.. Eight0
An4off and nine Nedelsky judges were used.who each- inspected the same 25-
item test belongingo an instructional unit from a physics course introducing
grade ten pupils to elementary mechanical concepts. All items'wereof the,.
three- or four-choice type. A latent trait analysis, based'on the responsesI.
of 156 pupils, produced 18 items showing a satisfactory -fit to the Rasch'
.
model (appendix, equation 4). A moredetailed descriptiOn Of the items and
the design of the study is given in van der tinder', (1981a, 1981b)..
II
Inconsistencies in Standard S%tting
8.Ns.
Table 1
'Res.ults for Nine Judges Using the Nedelsky Technique
Judge E C1'
1
3
4
5
-6
7
8
3 9
Meah
62
0,25 .65 .74 .09
2 4'.30 .63- , ./1 .08
., .25 .65 .76 .11
, -,5 :69 -. .77 .08'. .
,
'-'.20 .75. ,84- .0,
.
.25. .69 .77. .08
.23 .69 -. .78 '.09
:23 .73 78 .05-
.25 .67' .76 : .09
.68 .77 .09
.
-
F
1
4
4.,
Incopsistencies in'St6ndard Setting1/4
9.
c
Tapre 2 .
Estimated Prckabilities of Success 'or -Two Nedelsky Judges-.
Judge 2 Judge 5
Item P e e. e. e.(u)
P(s) (u) (9.)
i 1P
2
3
7 4
5
.50
1.00
1.00
.50
1.06
.73
.11
.93
.50
'..94
.50 84
7 1.00 .8,7
8 .50 .q2
:60 .71
*10. .50 .86
11 .50;
.74;,
12 .50 .16
13 .33 .82
14 1.00 .22
15 .. 50 .26
16 :25 .62
.
17 . 1.00 94
18 .5 .17
.08
.12
.07
.16
.05
33
..33
1.00
.50
1.00
.66
:08
.90
.41
:92
.15,- .50 .79
.12 :50 .83
-.07, 1.60 .89
.05 .33 .63
113, .50 .81.
.01 .50 .67
.08 .50'. .12,.,
.17 1.'00 :J6.
.01 .33 .17
..02 .3 .20,
-12 , :50 .53
I
.06 1.00 .91
.07 .25 ./13
..73
.89
.93
.50 .
.94 -
.84
.8Y-
'.92
.71
.86
.74.
.84
.821 ,
.78
'..74.
.62
..94
.83
.66 .01
.96 .10
..59 .041
.92 . .08
.79. .12
4.83 .16
.89 .11
:63 .04
.81 .14
.67 .08
.88" .12
.75 .01
.83 .08
,
.80 .05
.53 .3
.91 .09
.87 ..12
ti
V,
dot
4,4t 1. - - -1
Inconsistencies inStandard Setting42
'0
k 10. .
, .I.
4
4 Table 3
I
A ,
I
c,
Results for Eight JudgeS Using the Angoff Technique
p
Judge
.1
2
3
441
5
6
7
8
Mean
E .C1
.21 .73
.15 '-c
.16 4
.20 P.75
.16 .80
.17' .78
v.22 .71 Ak.
.1g 4I
t
4
- ra
a
N
Ps"
4
. Inconsfstencies, in Standard Setting
p
Table -4
.,Estimated; .iVrobabilities of Success for Two Angoff-Judges .
tem
1
2
3
4
5:
7
8 .
-Judge 2 Judge 7
I p e(u)
.70 .74 .74
.50 .11 .89
..80 .93
:30 :50 .50. .
.-80 .94' .94' ;-
.90
1.00 .87:' .87i .
9,
10
12
13
14
-5
.60 .92 1.92
.70 .72 -.72 -
.90 .86 .86..
.60 .75- .75
.40 ; .16' .84.4..
.80 ..82.- .82
/ ' '.-'.40 .23 .77
1 , .50 .27 .73
51i.62 .62-
.70 ..94 -. .94
.30 1.18 4 .$2
Pi(s)
p. e(u)
.30 .57 .57
.30 .06 '.94
.90 .87 .87
.70. .34 .66
.70 .89 .89
,..80 .72. .72
60. .76 .76
.30 ,..86
.30 .56 -56-,- - ., -4.
:60- :70 . 6
. .70 ,69,
r*--.30 9 . .:91 .
1..
.50, 30- ..70.
.50 '.13 .87
.:30- .15 . .85,, . .,
. -.50 .46 .54
.80 .8'8 .813
.. .50 .p ,.90
Inconsistencies in Standard Setting
12.
so
Table 1-shows the results for the nine Nedelsky judges. T4(first column/.
A.0.give 'the average absolute errors Of specification (E); the n columns
111? the values for, the, consistency index,(C1) and the reduc ion in consis-
tency due-to the discrete -character of the Nedelsky techni ue (0)-. The mean.
error of Specification,for all nine judges was no less t an .25. The mean
value of 0 was equaj to .09.4
A
TaOle2containsthevaluespfp..Wandp.fpr he least consistenti -V
. as well as the,.most consister judge. The last two 'olumns show the upper(u) (z) (s), .^
.
(e ) and lower (e ) bounds to the estimated m specification (pi
for the individual 'items.41mote the larger varia lity of these specification
errors for the worst judge..
'The results'for the eight Angoff judges are given in Table 3. The mean
(absolute error for all eight judges was equal/to.18 and thus less serious
than for the Nedelsky technique. Corres ndingly, the values for ci are higher
than-the ones in Table 1. Table 4 gives more detailed information about the
resdlts:for the least consistent and most consistentAngoff judges.
The conclusion from te above findings is that when using the Angoff or
,,Nedelsky technique one has to reckon with serious misspecifications of the
probabilities'of success from which the stanstards_are_camplited: Onthe-whole,
these errors are however noticeably less unfavorable for the Angoff than for
the Nedelsky technique, the explanation being the lact that the latter admits
only discrete probabilities and thusalways forces the'judge to be inconsistent
to some extent.
1-A
.
sa
The method propdsed in this paper canbe used for several purposes.
JnconOstenciet in 5tandard Setting
_Discussion-
13.
An obvious possibility is a r outine check of standard setting results beforeoe,
they are use d'in educational practice. Other possibilities'are,for example:
/1) selecting judges meeting predetermined criteria of consistency, (2)
evaluating programs for training judges, (3) assessing consequences of
modifying standard- setting techniques; or (4) item analysis to detect items
w.with systematic errors across judges or techniques.
For all these applications of the method, it is necessary'that.the items
.
fit the latent trait model. However, if some of the items do not satifactorily
fit the model, the method can still be used for the other items4in-the telt.
The only modification necessary is the computation of a new standard skipping
:tile items not.fittingthe model: TIe estimAtion,of the errors of specification
and the-consistency index are based owthe new standard, and these estimates',-
then, still give an impression of how consistently the judge has-worked.
/5/ 1
If
Se.
'Inconsistencies in Standard Setting
t
References
14.
/.
. . r. .
Andrew,- B.J., & Heeht-, J.T. A preliminary investigation of two procedures
for setting examination standards. Educational and Psychological
Measurement;- 1976, 36, 45-50.
.0
Angoff, W.H.- Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.),
to
KA.
Educational Measurement. Washington, D,C.: American Council on" .f
Education, 1.971.
Brennan, 1.0.ockwood, R.E. A comparison,of the Nedelsky and Angoff
cutting score procedures using generalizability theory. ',Applied
Psychological Measurement, 1980, 4,_219-240.
Glass, G.V. Standards and criteria. Journal Of Educational Measurement,
1978, 15, 237-261.
Hambleton, R.K Test score validity and standard-setting methods. In
R.A. Berk°(Ed.), Criterion-referenced measurement: The state of the art.
Baltimore, The Johns Hopkins University Press, 1980.
Hambleton, R.K,Powell, S., & Eignor,°DR. Issues and methods for standard-'
setting. Paper presented at the Annual' Meeting orthe National Council.,
on Measurement in Education, San Francisco, California, April 9-11,
1979.,
Jaeger; R.M. Measurement consequences of selected.standard,setting models.
In M.A.Bunda and J.R. Sanders (Eds.), Lect-LeLanclarolasin
competency -based measurement. Washington D.C.: National Council on
Measunemeht i n Edueation-,- 1979e.
Koffler, S.L. A compari.son of approaches for setting standa;ds. ,ournal
of Educational Measurement, 1980,17, 167 - 1.87,.
h.
a
A
0Inconsistencies in Standard Setting
.
15. -
a.
0
,Nedelsky,:..L, Absolute g ding standards for.objective tests. Educational
andPsYchatogical Measurement, 1954, 14, 3-19... ....._. -.. v
C., v .
Saunders., J.C.,!Ryan, J.P.,..S. Huynh, H. A coniparlson of two approaches to
setting passing scores based on the Nedelsky procedure. Applied
Psychological Measurement, 1981, 5, 209-217.
" Skakun'LE.N, & Kling, S. Coniparability,of methods for setting standards.
.Journal of Educational Measurement, 1980, 17., 229-235.
Shepard, L.A. Setting standards. In M.A. Sunda and J.R. Sanders (Eds.),
Practices and problems in.competencp:based measurement. 1.ashingt6n,
D.C.: National Council on Measurement in Education, 1979.
S Trd, L...Standard setting issues and methods. Applied Psychological
Measurement, 1980; 4, 447-467.. (a)
Shepard, C. Technical issues-.in mimimuMbompetency testin
Berliner-(Ed.), Review of research in =-education
Illinois: F.E. Peacock Publisher's, 1980. (b)
van der Linden, W.J. A latent trait method for determining intrajudge
inconsistency in the Angoff and Nedelsky techniques of standard
. setting.' Submitted for publication..1981. (a)
D.C.
Itasca:
van der Linden, W.J% A latent trait look at pretest-posttest validition of
criterion- referenced test items. Review'of Edutational Research, 1981,
51, 379-402. (b)..
r.
Angoff standard
.111consisenc ies in Standard Setting
Appendix: Main Formulas and Equftions
For a test of length n, the Angoff standard is equal, to
a
(1) -Pi(s)
,
i=1
0
wherepis) 15 the borderline student's probability of success as specified
by the judge.
Nedelslv standard7.
The Nedelsky standard is equal to l).with
(2)(s) -1
p: - fq. - k.]1 '
4 I
44.
where qi is the number of alternatives of item i, and kiis the number Of
alternativessof which the judge indicates that the borderline student knows
they are incorrect..
Test characterigtiC curve
For a student,With 0 it holds thatc
n
= E( u.1 10C) = E(X10c)
.1=1
4
Inconsistencies in,Standard Setting
17.,
where'Pi(+Iuc),Is the probability of a correct response to item E( ) is
the expected value operator, u.1
is the item response variable-(1 = correct,
0 = incorrect), and L
cis the classical test"theory true score.
I
Rasch model
.(4) Pi(+10c) = [1 + exp[--1
- b.).11 .
. ,
where b. ils the difficulty parameter of item i.
Index of consistency
M -(5)
1M
where, .
N.
n
E y
i=1
. - p.j/n;(s)
M E y e(u)
/n;
1=1 '
el Max{p., 1 - p.
445- I
Note that bi) is the maximum value of4. o
maximum value.00f E,
- pig, and that M is the
Reduc ti.n in consistency
Inconsistencies in Standard Setting
18.
For 616.NedeisY technique the. reduetioain consistency due to the discrete-o
character of its probabilities is equal to-.
,where .
r
4,
.07
'andk"sthevallleofiCill(2).COS"SLIchthat.el (° is minimal, Note
thate.PjstheminimUM*valOeofip.(s) pil, and that m the minimum
value of E.
r
.4
I
ee,
.1,