ED 218 345                                   TM 820 386

DOCUMENT RESUME

AUTHOR       van der Linden, Wim J.
TITLE        Assessing Inconsistencies in Standard Setting with the Angoff or Nedelsky Technique.
PUB DATE     Mar 82
NOTE         20p.; Presentation of this paper supported by a grant from the Dutch Foundation of Educational Research.
EDRS PRICE   MF01/PC01 Plus Postage.
DESCRIPTORS  Criterion Referenced Tests; *Cutting Scores; *Error of Measurement; *Latent Trait Theory; Mathematical Models; Probability; Secondary Education; Test Items; Test N
IDENTIFIERS  *Angoff Methods; Inter Rater Reliability; *Nedelsky Method; Standard Setting

ABSTRACT
A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The responses of 156 pupils to a 25-item test from a tenth grade physics course were inspected by eight Angoff and nine Nedelsky judges. The latent trait analysis produced 18 items showing a satisfactory fit to the Rasch model. Serious errors of specification were found and errors were considerably larger for the Nedelsky technique. Special difficulties with the Nedelsky judges are discussed. Applications of the latent trait method are discussed. (Author/CM)


Assessing Inconsistencies in Standard Setting

with the Angoff or Nedelsky Technique

Wim J. van der Linden

Twente University of Technology

Enschede, The Netherlands


Paper presented at the Annual Meeting of the American Educational Research Association, New York, March 19-23, 1982. Presentation of this paper was made possible by a grant from the Dutch Foundation of Educational Research.

Inconsistencies in Standard Setting

Abstract

It has often been argued that all techniques of standard setting are arbitrary and likely to yield different results for different techniques or persons. This paper deals with a related but hitherto ignored aspect, namely the possibility that Angoff or Nedelsky judges specify inconsistent probabilities, e.g., a low probability for an easy item but a large probability for a hard item. A latent trait method is proposed to estimate such misspecifications and to determine whether the judge has worked consistently. Results from an empirical study are given which indicate that serious errors of specification can be expected and that these are considerably larger for the Nedelsky than for the Angoff technique.


Assessing Inconsistencies in Standard Setting

with the Angoff or Nedelsky Technique

This paper is concerned with the use of standard-setting techniques in objectives-based instructional programs. For such programs, a great variety of techniques has been proposed (for reviews, see Glass, 1978; Hambleton, 1980; Hambleton, Powell, & Eignor, 1979; Jaeger, 1979; Shepard, 1980a, 1980b). The emphasis in this paper will be on the Angoff (1971) and Nedelsky (1954) techniques. These two techniques, which are based on an item-by-item judgment of test content, are among the most popular techniques in use in objectives-based instruction.

It has been argued that all standard setting is arbitrary (Glass, 1978; Shepard, 1979, 1980a, 1980b). This is correct since standards ought to reflect learning objectives, and these ultimately rest on values and norms. In addition, the various standard-setting techniques available differ, more or less, in the conception of mastery underlying the way standards are obtained. Therefore, different results can be expected both for different techniques and for different persons using the same technique. This has been confirmed in many experiments (Andrew & Hecht, 1976; Brennan & Lockwood, 1980; Koffler, 1980; Saunders, Ryan, & Huynh, 1981; Skakun & Kling, 1980). In this paper we do not share the concern with inconsistent results due to differences between techniques or persons. Instead, the interest is in a related but hitherto ignored aspect of standard setting, namely the possibility of intrajudge inconsistency. Intrajudge inconsistency arises when the judge specifies probabilities of success on the items that are incompatible with each other and imply different, conflicting standards. An example of intrajudge inconsistency is a judge specifying a low probability of success for an easy item but a large probability for a hard item. These two judgments are obviously inconsistent: the former implies a low standard, whereas the latter indicates that a high standard should be set. Another example is a judge specifying approximately equal probabilities for highly discriminating items (of differing difficulties). Generally, inconsistencies such as in these examples are due to a discrepancy between the actual properties of the items and the judge's perception of them.

Thus far, no attention has been paid to the possibility of intrajudge inconsistency, and results of the Angoff or Nedelsky technique are generally employed without checking the quality of the judge. This may be due to the fact that classical test theory does not provide satisfactory methods for analyzing such inconsistencies. It is the purpose of this paper to show how latent trait theory can be used to decide whether Angoff or Nedelsky standards have been set consistently enough for use in practice and to assess for which items inconsistencies have occurred. The second purpose of the paper is to present empirical results showing how consistently the Angoff and Nedelsky techniques were used in a typical educational situation. In the following, it is assumed that the reader is familiar with the elementary concepts from latent trait theory as well as the technical aspects of the Angoff and Nedelsky techniques (see the appendix). A fuller description of the method and the empirical results is given in van der Linden (1981a).

Method

As mentioned earlier, intrajudge inconsistencies arise when probabilities are specified that are incompatible with each other and imply different, conflicting standards. Figure 1 shows how this can be viewed from latent trait theory. In this example, $\theta_c$ denotes the level of mastery belonging to the borderline student whom the judge has in mind. From the item characteristic curve it follows that this borderline student has a probability of success equal to $p_i$. However, the judge specifies a probability equal to $p_i^{(s)}$. Now, a misspecification occurs if

$$e_i = p_i^{(s)} - p_i$$

is unequal to zero. It is easy to see that a judge is only consistent if no misspecifications occur, that is, if $e_i = 0$ for all items. As soon as misspecifications are obtained for some items, the judge is inconsistent in the sense that his probabilities do not imply the same mastery level and therefore cannot belong to one person. Thus, in order to declare whether a judge has worked consistently, a method is needed to assess the misspecifications $e_i$.

The following steps summarize how latent trait theory can be used for this purpose.

1. A latent trait model is chosen, its parameters are estimated, and its fit is tested. Suppose that n items fit the model.

[Figure 1. Item characteristic curve, with probability of success on the vertical axis and mastery level on the horizontal axis.]

2. For these n items the Angoff or Nedelsky technique is used to specify for each item the probability of success $p_i^{(s)}$.

3. Using equations 1 and 2 (appendix), the Angoff or Nedelsky standard, $\tau_c$, is computed.

4. The hypothesis to be tested is that the judge has worked consistently, i.e., has specified correct probabilities of success. Note that under this hypothesis, the Angoff or Nedelsky standard technically is a true score (expected observed score). The true score standard $\tau_c$ is next transformed into a standard on the $\theta$-scale of the latent trait model via the estimated test characteristic curve (appendix, equation 3). Since the latent trait standard $\theta_c$ is no explicit function of $\tau_c$, trial values must be substituted for the former until the value of the latter is obtained. The task is simplified by the fact that $\theta$ is monotonically related to $\tau$. However, some computer programs standardly produce the estimated test characteristic curve, and in that case $\theta_c$ can simply be read off.

5. Next, substituting $\theta_c$ and the estimated item parameters into the model, the estimated probabilities $p_i$ are computed.

6. In order to determine whether the hypothesis of a consistent judge is tenable, a comparison between the subjective probabilities provided by the judge, $p_i^{(s)}$, and the objective probabilities estimated under the model, $p_i$, must be made. This can be done using the index of consistency $C$ (appendix, equation 5). $C$ is the degree to which the average absolute misspecification differs from its maximum possible value, measured on the standard interval [0, 1]. The closer the value of $C$ to zero, the less tenable the hypothesis is that the judge has worked consistently.

7. A special difficulty is associated with the use of the Nedelsky technique. This technique can provide only a limited number of possible probabilities of success, and inconsistencies may therefore be attributable to the discrete character of the technique rather than the judge's behavior. The index $\Delta$ (appendix, equation 6) can be used to assess how large a reduction of consistency has occurred because of the discrete character of the technique.

8. Finally, the pattern of differences between $p_i^{(s)}$ and $p_i$ is analyzed. Technically, these differences are the "residuals" left over after the hypothesis of a consistent judge has been fitted to the data. An analysis of this pattern can be used, for instance, to detect items with systematic errors across judges, or items for which the judge needs additional training.
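
Steps 3, 4, 5, and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study: the item difficulties and the judge's probabilities are hypothetical, the function names are introduced for this example, the Rasch model is assumed as the latent trait model, and bisection stands in for the trial-value search described in step 4.

```python
import math

def rasch_prob(theta, b):
    """Rasch probability of success for mastery level theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def angoff_standard(judge_probs):
    """Step 3: the standard is the sum of the judge's probabilities (equation 1)."""
    return sum(judge_probs)

def theta_for_true_score(tau, difficulties, lo=-10.0, hi=10.0, tol=1e-10):
    """Step 4: invert the test characteristic curve by bisection.

    The curve is monotonically increasing in theta, so bisection converges.
    """
    def tcc(theta):
        return sum(rasch_prob(theta, b) for b in difficulties)

    if not (tcc(lo) < tau < tcc(hi)):
        raise ValueError("standard lies outside the range of the test characteristic curve")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid) < tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def misspecifications(judge_probs, difficulties):
    """Steps 3-5 and 8: standard, latent trait standard, model probabilities, residuals."""
    tau_c = angoff_standard(judge_probs)
    theta_c = theta_for_true_score(tau_c, difficulties)
    model_probs = [rasch_prob(theta_c, b) for b in difficulties]
    residuals = [ps - p for ps, p in zip(judge_probs, model_probs)]
    return tau_c, theta_c, model_probs, residuals

# Hypothetical data: three items ordered from easy to hard, and a judge who
# gives the easy item a low and the hard item a high probability of success.
difficulties = [-1.0, 0.0, 1.0]
judge_probs = [0.4, 0.6, 0.8]

tau_c, theta_c, model_probs, residuals = misspecifications(judge_probs, difficulties)
for b, ps, p, e in zip(difficulties, judge_probs, model_probs, residuals):
    print(f"b={b:+.1f}  judge={ps:.2f}  model={p:.2f}  residual={e:+.2f}")
```

With these invented numbers the residual is negative for the easy item and positive for the hard item, which is the pattern of intrajudge inconsistency described in the introduction. Because the model probabilities sum to the same standard as the judge's probabilities, the residuals always sum to zero; only their pattern is informative.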

Results

An empirical investigation was carried out to illustrate the above method and to compare results for the Nedelsky and Angoff techniques. Eight Angoff and nine Nedelsky judges were used, who each inspected the same 25-item test belonging to an instructional unit from a physics course introducing grade ten pupils to elementary mechanical concepts. All items were of the three- or four-choice type. A latent trait analysis, based on the responses of 156 pupils, produced 18 items showing a satisfactory fit to the Rasch model (appendix, equation 4). A more detailed description of the items and the design of the study is given in van der Linden (1981a, 1981b).

Table 1

Results for Nine Judges Using the Nedelsky Technique

Judge     E      C      C'     Δ
1        .25    .65    .74    .09
2        .30    .63    .71    .08
3        .25    .65    .76    .11
4        .25    .69    .77    .08
5        .20    .75    .84    .09
6        .25    .69    .77    .08
7        .23    .69    .78    .09
8        .23    .73    .78    .05
9        .25    .67    .76    .09
Mean     .25    .68    .77    .09

Table 2

Estimated Probabilities of Success for Two Nedelsky Judges

                 Judge 2                          Judge 5
Item    p(s)    p      e(u)    e(l)      p(s)    p      e(u)    e(l)
 1      .50    .73    .73     .08        .33    .66    .66     .01
 2     1.00    .11    .89     .12        .33    .08    --      --
 3     1.00    .93    .93     .07       1.00    .90    .90     .10
 4      .50    .50    .50     .16        .50    .41    .59     .04
 5     1.00    .94    .94     .05       1.00    .92    .92     .08
 6      .50    .84    .84     .15        .50    .79    .79     .12
 7     1.00    .87    .87     .12        .50    .83    .83     .16
 8      .50    .92    .92     .07       1.00    .89    .89     .11
 9      .50    .71    .71     .05        .33    .63    .63     .04
10      .50    .86    .86     .13        .50    .81    .81     .14
11      .50    .74    .74     .01        .50    .67    .67     .08
12      .50    .16    .84     .08        .50    .12    .88     .12
13      .33    .82    .82     .17       1.00    .76    .75     .01
14     1.00    .22    .78     .01        .33    .17    .83     .08
15      .50    .26    .74     .02        .33    .20    .80     .05
16      .25    .62    .62     .12        .50    .53    .53     .03
17     1.00    .94    .94     .06       1.00    .91    .91     .09
18      .50    .17    .83     .07        .25    .13    .87     .12

(-- = entry illegible in the source)

Table 3

Results for Eight Judges Using the Angoff Technique

Judge     E      C
1        .21    .73
2        .15    --
3        .16    --
4        .20    .75
5        .16    .80
6        .17    .78
7        .22    .71
8        .19    --
Mean     .18    --

(-- = entry illegible in the source)

Table 4

Estimated Probabilities of Success for Two Angoff Judges

             Judge 2                  Judge 7
Item    p(s)    p      e(u)      p(s)    p      e(u)
 1      .70    .74    .74        .30    .57    .57
 2      .50    .11    .89        .30    .06    .94
 3      .80    .93    --         .90    .87    .87
 4      .30    .50    .50        .70    .34    .66
 5      .80    .94    .94        .70    .89    .89
 6      .90    --     --         .80    .72    .72
 7     1.00    .87    .87        .60    .76    .76
 8      .60    .92    .92        .30    .86    --
 9      .70    .72    .72        .30    .56    .56
10      .90    .86    .86        .60    .70    --
11      .60    .75    .75        .70    --     --
12      .40    .16    .84        .30    .09    .91
13      .80    .82    .82        .50    .30    .70
14      .40    .23    .77        .50    .13    .87
15      .50    .27    .73        .30    .15    .85
16      --     .62    .62        .50    .46    .54
17      .70    .94    .94        .80    .88    .88
18      .30    .18    .82        .50    --     .90

(-- = entry illegible in the source)

Table 1 shows the results for the nine Nedelsky judges. The first column gives the average absolute errors of specification ($E$); the next columns give the values for the consistency index ($C$) and the reduction in consistency due to the discrete character of the Nedelsky technique ($\Delta$). The mean error of specification for all nine judges was no less than .25. The mean value of $\Delta$ was equal to .09.

Table 2 contains the values of $p_i^{(s)}$ and $p_i$ for the least consistent as well as the most consistent judge. The last two columns show the upper ($e_i^{(u)}$) and lower ($e_i^{(l)}$) bounds to the estimated misspecification ($|p_i^{(s)} - p_i|$) for the individual items. Note the larger variability of these specification errors for the worst judge.

The results for the eight Angoff judges are given in Table 3. The mean absolute error for all eight judges was equal to .18 and thus less serious than for the Nedelsky technique. Correspondingly, the values for $C$ are higher than the ones in Table 1. Table 4 gives more detailed information about the results for the least consistent and most consistent Angoff judges.

The conclusion from the above findings is that when using the Angoff or Nedelsky technique one has to reckon with serious misspecifications of the probabilities of success from which the standards are computed. On the whole, these errors are however noticeably less unfavorable for the Angoff than for the Nedelsky technique, the explanation being the fact that the latter admits only discrete probabilities and thus always forces the judge to be inconsistent to some extent.

Discussion

The method proposed in this paper can be used for several purposes. An obvious possibility is a routine check of standard setting results before they are used in educational practice. Other possibilities are, for example: (1) selecting judges meeting predetermined criteria of consistency, (2) evaluating programs for training judges, (3) assessing consequences of modifying standard-setting techniques, or (4) item analysis to detect items with systematic errors across judges or techniques.

For all these applications of the method, it is necessary that the items fit the latent trait model. However, if some of the items do not satisfactorily fit the model, the method can still be used for the other items in the test. The only modification necessary is the computation of a new standard skipping the items not fitting the model. The estimation of the errors of specification and the consistency index are based on the new standard, and these estimates, then, still give an impression of how consistently the judge has worked.

References

Andrew, B.J., & Hecht, J.T. A preliminary investigation of two procedures for setting examination standards. Educational and Psychological Measurement, 1976, 36, 45-50.

Angoff, W.H. Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.), Educational Measurement. Washington, D.C.: American Council on Education, 1971.

Brennan, R.L., & Lockwood, R.E. A comparison of the Nedelsky and Angoff cutting score procedures using generalizability theory. Applied Psychological Measurement, 1980, 4, 219-240.

Glass, G.V. Standards and criteria. Journal of Educational Measurement, 1978, 15, 237-261.

Hambleton, R.K. Test score validity and standard-setting methods. In R.A. Berk (Ed.), Criterion-referenced measurement: The state of the art. Baltimore: The Johns Hopkins University Press, 1980.

Hambleton, R.K., Powell, S., & Eignor, D.R. Issues and methods for standard-setting. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, California, April 9-11, 1979.

Jaeger, R.M. Measurement consequences of selected standard-setting models. In M.A. Bunda and J.R. Sanders (Eds.), Practices and problems in competency-based measurement. Washington, D.C.: National Council on Measurement in Education, 1979.

Koffler, S.L. A comparison of approaches for setting standards. Journal of Educational Measurement, 1980, 17, 167-187.

Nedelsky, L. Absolute grading standards for objective tests. Educational and Psychological Measurement, 1954, 14, 3-19.

Saunders, J.C., Ryan, J.P., & Huynh, H. A comparison of two approaches to setting passing scores based on the Nedelsky procedure. Applied Psychological Measurement, 1981, 5, 209-217.

Skakun, E.N., & Kling, S. Comparability of methods for setting standards. Journal of Educational Measurement, 1980, 17, 229-235.

Shepard, L.A. Setting standards. In M.A. Bunda and J.R. Sanders (Eds.), Practices and problems in competency-based measurement. Washington, D.C.: National Council on Measurement in Education, 1979.

Shepard, L.A. Standard setting issues and methods. Applied Psychological Measurement, 1980, 4, 447-467. (a)

Shepard, L.A. Technical issues in minimum competency testing. In D.C. Berliner (Ed.), Review of research in education. Itasca, Illinois: F.E. Peacock Publishers, 1980. (b)

van der Linden, W.J. A latent trait method for determining intrajudge inconsistency in the Angoff and Nedelsky techniques of standard setting. Submitted for publication, 1981. (a)

van der Linden, W.J. A latent trait look at pretest-posttest validation of criterion-referenced test items. Review of Educational Research, 1981, 51, 379-402. (b)

Appendix: Main Formulas and Equations

Angoff standard

For a test of length n, the Angoff standard is equal to

$$\tau_c = \sum_{i=1}^{n} p_i^{(s)}, \tag{1}$$

where $p_i^{(s)}$ is the borderline student's probability of success as specified by the judge.

Nedelsky standard

The Nedelsky standard is equal to (1) with

$$p_i^{(s)} = [q_i - k_i]^{-1}, \tag{2}$$

where $q_i$ is the number of alternatives of item i, and $k_i$ is the number of alternatives of which the judge indicates that the borderline student knows they are incorrect.
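
As a small illustration of equation 2, the Python sketch below computes Nedelsky probabilities and the resulting standard for a few items. The (q, k) pairs are hypothetical and the function names are introduced only for this example. Listing the feasible values for one item also shows why the technique admits only a handful of discrete probabilities.

```python
from fractions import Fraction

def nedelsky_probability(q, k):
    """Equation 2: success probability when k of the q alternatives are eliminated."""
    if not 0 <= k < q:
        # The correct alternative can never be eliminated, so k is at most q - 1.
        raise ValueError("k must satisfy 0 <= k < q")
    return Fraction(1, q - k)

def nedelsky_standard(items):
    """Equation 1 applied to Nedelsky probabilities; items is a list of (q, k) pairs."""
    return sum(nedelsky_probability(q, k) for q, k in items)

def feasible_probabilities(q):
    """All probabilities a Nedelsky judge can express for an item with q alternatives."""
    return [nedelsky_probability(q, k) for k in range(q)]

# Hypothetical judgments for four items, given as (alternatives, eliminated).
items = [(4, 2), (4, 3), (3, 1), (4, 0)]
standard = nedelsky_standard(items)      # 1/2 + 1 + 1/2 + 1/4
four_choice = feasible_probabilities(4)  # 1/4, 1/3, 1/2, 1

print(standard, [str(p) for p in four_choice])
```

For a four-choice item the judge can therefore express only the values 1/4, 1/3, 1/2, and 1, whatever probability of success the borderline student actually has.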

Test characteristic curve

For a student with $\theta = \theta_c$ it holds that

$$\sum_{i=1}^{n} P_i(+ \mid \theta_c) = \sum_{i=1}^{n} E(u_i \mid \theta_c) = E(X \mid \theta_c) = \tau_c, \tag{3}$$

where $P_i(+ \mid \theta_c)$ is the probability of a correct response to item i, $E(\cdot)$ is the expected value operator, $u_i$ is the item response variable (1 = correct, 0 = incorrect), and $\tau_c$ is the classical test theory true score.

Rasch model

$$P_i(+ \mid \theta_c) = \{1 + \exp[-(\theta_c - b_i)]\}^{-1}, \tag{4}$$

where $b_i$ is the difficulty parameter of item i.

Index of consistency

$$C = \frac{M - E}{M}, \tag{5}$$

where

$$E = \sum_{i=1}^{n} |p_i^{(s)} - p_i| / n;$$

$$M = \sum_{i=1}^{n} e_i^{(u)} / n;$$

$$e_i^{(u)} = \max\{p_i, 1 - p_i\}.$$

Note that $e_i^{(u)}$ is the maximum value of $|p_i^{(s)} - p_i|$, and that $M$ is the maximum value of $E$.
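
The index can be illustrated with a short Python sketch. The probabilities below are hypothetical, the function name `consistency_index` is introduced only for this example, and the form C = (M - E)/M is the reading of equation 5 that the surrounding definitions suggest.

```python
def consistency_index(judge_probs, model_probs):
    """Return (E, M, C) for one judge.

    E is the average absolute misspecification, M its maximum possible value
    given the model probabilities, and C = (M - E) / M lies in [0, 1].
    """
    n = len(judge_probs)
    if n == 0 or n != len(model_probs):
        raise ValueError("both lists must be non-empty and of equal length")
    E = sum(abs(ps - p) for ps, p in zip(judge_probs, model_probs)) / n
    # max{p, 1 - p} is at least 0.5, so M is never zero.
    M = sum(max(p, 1.0 - p) for p in model_probs) / n
    return E, M, (M - E) / M

# Hypothetical model probabilities and judge probabilities for four items.
model_probs = [0.7, 0.2, 0.9, 0.5]
judge_probs = [0.5, 0.5, 1.0, 0.5]

E, M, C = consistency_index(judge_probs, model_probs)
print(f"E = {E:.3f}, M = {M:.3f}, C = {C:.3f}")
```

A judge who reproduces the model probabilities exactly obtains C = 1, and a judge who always chooses the extreme probability farthest from the model value obtains C = 0.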

Reduction in consistency

For the Nedelsky technique the reduction in consistency due to the discrete character of its probabilities is equal to

(6) [equation illegible in the source]

where

$$m = \sum_{i=1}^{n} e_i^{(l)} / n,$$

and $k_i$ is the value of $k_i$ in (2) chosen such that $e_i^{(l)}$ is minimal. Note that $e_i^{(l)}$ is the minimum value of $|p_i^{(s)} - p_i|$, and that $m$ is the minimum value of $E$.
