test development. stages test conceptualization –defining the test test construction –selecting...

Test Development

Upload: frances-turrell

Post on 30-Mar-2015

TRANSCRIPT

Page 1: Test Development. stages Test conceptualization –defining the test Test construction –Selecting a measurement scale –Developing items Test tryout Item

Test Development

Stages

• Test conceptualization

– Defining the test

• Test construction

– Selecting a measurement scale

– Developing items

• Test tryout

• Item analysis

• Revising the test

1. Test conceptualization

• Defining the scope, purpose, and limits of the test.

Initial questions in test construction

• Should the item content be similar or varied?

• Should the range of difficulty be narrow or broad?

– Ceiling effect vs. floor effect

• How many items should be created?

• Which domains should be tapped?

– The test developer may specify content domains and cognitive skills that must be included on the test.

• What kind of test item should be used?

2. Test construction

Selecting a scaling method

levels of measurement

• Nominal

• Ordinal

• Interval

• Ratio

Scaling methods

• Most are rating scales that are summative

• May be unidimensional or multi-dimensional

Method of paired comparisons

• Also known as forced choice

• Test taker is forced to pick one of two items paired together

Comparative scaling

• Test takers sort cards or rank items from “least” to “most”

Categorical scaling

• Test takers sort cards into one of 2 or more categories.

• Stimuli are thought to differ quantitatively, not qualitatively

Likert type scales

• Response choices are ordered on a continuum from one extreme to the other (e.g., strongly agree to strongly disagree).

• Likert scoring assumes an interval scale, although the response options are strictly ordered categories, so this assumption may not hold in practice.

Guttman scales

• Response choices for each item are various statements that lie on a continuum.

• Endorsing the most extreme statement reflects endorsement of milder statements as well.
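
The cumulative property of a Guttman scale can be checked mechanically. The sketch below is only an illustration: it assumes the items are ordered from the mildest to the most extreme statement and coded 1 (endorsed) or 0 (not endorsed), and the function name is hypothetical.

```python
def is_guttman_consistent(responses):
    """Return True if no endorsed item follows a rejected milder item.

    `responses` is a list of 0/1 values ordered from the mildest
    statement to the most extreme one (an assumption of this sketch).
    """
    seen_rejection = False
    for endorsed in responses:
        if endorsed and seen_rejection:
            # A more extreme item is endorsed after a milder one was rejected.
            return False
        if not endorsed:
            seen_rejection = True
    return True


print(is_guttman_consistent([1, 1, 1, 0, 0]))  # True
print(is_guttman_consistent([1, 0, 1, 0, 0]))  # False
```

A pattern that fails this check suggests either a response error or a set of items that does not form a single ordered continuum.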

Method of equal-appearing intervals

• Presumed to be interval

• For knowledge scale:

– Obtain T/F statements

– Experts rate each item

• For attitude scale:

– Judges rate each item on a Likert scale, assuming equal intervals

• For both:

– Total test score for the test taker is based on “weighted” items (determined by averaging the experts' ratings)
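
The weighting step can be sketched in a few lines. This is an illustration under stated assumptions, not the slides' own procedure: the 1 to 11 rating range, the item names, and the choice to score a test taker as the mean weight of the items they endorse are all assumptions made here.

```python
from statistics import mean


def item_weights(judge_ratings):
    """Average the judges' ratings for each item to get its weight."""
    return {item: mean(ratings) for item, ratings in judge_ratings.items()}


def weighted_score(endorsed_items, weights):
    """Score a test taker as the mean weight of the items they endorsed."""
    if not endorsed_items:
        return None
    return mean(weights[item] for item in endorsed_items)


# Hypothetical ratings from three judges on a 1-11 scale.
judge_ratings = {
    "item_a": [2, 3, 1],
    "item_b": [6, 5, 7],
    "item_c": [10, 9, 11],
}
weights = item_weights(judge_ratings)
print(weights["item_a"], weights["item_b"], weights["item_c"])  # 2 6 10
print(weighted_score(["item_b", "item_c"], weights))            # 8
```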

Method of absolute scaling

• Way to determine the difficulty level of items.

– Give items to several age groups, with one age group acting as the anchor.

• Item difficulty is assessed by noting the performance of each age group on each item as compared to the anchor group.

Method of empirical keying

• Based entirely on empirical findings.

• Test developer comes up with several items and then gives these to a group of people who are known to possess the construct and a group who are known not to possess the construct.

• Items are selected based on how well they distinguish one group from the other.
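
One simple way to picture the selection step is to compare endorsement rates between the two groups. The sketch below is a rough illustration only: the data are made up, the function names are hypothetical, and the 0.3 cutoff is an arbitrary choice for the example rather than a recommended value.

```python
def endorsement_rate(responses, item):
    """Proportion of a group that endorsed `item` (responses coded 0/1)."""
    return sum(person[item] for person in responses) / len(responses)


def select_items(criterion_group, comparison_group, items, min_difference=0.3):
    """Keep items whose endorsement rates differ between the groups by at
    least `min_difference` (an arbitrary cutoff chosen for this sketch)."""
    selected = []
    for item in items:
        diff = (endorsement_rate(criterion_group, item)
                - endorsement_rate(comparison_group, item))
        if abs(diff) >= min_difference:
            selected.append(item)
    return selected


# Hypothetical responses: one dict per person, 1 = endorsed.
criterion_group = [
    {"q1": 1, "q2": 1, "q3": 0},
    {"q1": 1, "q2": 0, "q3": 0},
    {"q1": 1, "q2": 1, "q3": 1},
    {"q1": 0, "q2": 1, "q3": 0},
]
comparison_group = [
    {"q1": 0, "q2": 1, "q3": 0},
    {"q1": 0, "q2": 1, "q3": 1},
    {"q1": 1, "q2": 0, "q3": 0},
    {"q1": 0, "q2": 1, "q3": 0},
]
print(select_items(criterion_group, comparison_group, ["q1", "q2", "q3"]))  # ['q1']
```

Only q1 separates the groups here (0.75 vs. 0.25); q2 and q3 are endorsed at the same rate in both groups and are dropped.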

Writing the items

Item format

• Selected response

• Constructed response

Multiple choice

• Pros----

• Cons----

Matching

• Pros----

• Cons----

True/False

• Pros----

• Cons----

• Forced-choice methodology.

Fill in

• Pros----

• Cons----

Short answer objective item

• Pros---

• Cons--- 

Essay

• Pros----

• Cons----

Scoring items

• Cumulative model

• Class/category

• Ipsative

• Correction for guessing
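
Of the scoring approaches listed, the correction for guessing is the easiest to write as a formula. The slides do not spell it out, so the sketch below assumes the commonly used form R - W / (k - 1), where R is the number right, W the number wrong (omitted items are not counted), and k the number of response options per item.

```python
def corrected_score(num_right, num_wrong, num_options):
    """Apply the classic correction for guessing: R - W / (k - 1)."""
    if num_options < 2:
        raise ValueError("an item needs at least two response options")
    return num_right - num_wrong / (num_options - 1)


print(corrected_score(num_right=30, num_wrong=8, num_options=5))  # 28.0
```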

3. Test tryout

• Should be conducted on a group that represents the ultimate group of test takers (those for whom the test is intended)

• Good items

– Reliable

– Valid

– Discriminate well

• Before item analysis, look at the variability of scores within the test

– Floor effect?

– Ceiling effect?

4. Item analysis

• Helps determine which items should be kept, revised, or deleted.

Item-difficulty index

• Proportion of examinees who get the item correct.

• Averaging the item-difficulty indices across items gives the mean item difficulty of the test.
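
A minimal sketch of both calculations, assuming responses are stored as a matrix with one row per examinee and one column per item, coded 1 for correct and 0 for incorrect. The data and function names are made up for illustration.

```python
def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly (p)."""
    return sum(item_scores) / len(item_scores)


def mean_item_difficulty(score_matrix):
    """Return the p value of every item and their average.

    Rows are examinees, columns are items.
    """
    num_items = len(score_matrix[0])
    p_values = [
        item_difficulty([row[i] for row in score_matrix])
        for i in range(num_items)
    ]
    return p_values, sum(p_values) / num_items


# Hypothetical responses: 4 examinees x 3 items, 1 = correct.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
p_values, mean_p = mean_item_difficulty(scores)
print(p_values)  # [0.75, 0.75, 0.25]
print(mean_p)
```

Note that a higher p value means an easier item, since more examinees answered it correctly.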

Ideal item difficulty

• When using multiple-choice items, account for the probability of answering correctly by chance.

– Optimal item difficulty = (1 + g) / 2, where g is the probability of a correct answer by guessing

– The exception to choosing item difficulty around mid-range involves tests of extreme groups.
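
The optimal difficulty is the midpoint between the chance success rate g and 1.0, that is, (1 + g) / 2. As a worked illustration, the sketch below assumes g = 1 / (number of options), which holds only if every option is equally attractive to someone guessing.

```python
def optimal_difficulty(num_options):
    """Midpoint between the chance success rate g and 1.0: (1 + g) / 2."""
    g = 1 / num_options  # assumes all options are equally likely guesses
    return (1 + g) / 2


print(optimal_difficulty(2))  # 0.75  (true/false)
print(optimal_difficulty(4))  # 0.625 (four-option multiple choice)
```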

Item endorsement

• proportion of examinees who endorsed the item.

Item reliability index

• Indication of internal consistency

• Product of the item SD and the correlation between the item and total scale

• Items with low reliability can be eliminated
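
The product described above can be computed directly. This is a sketch under a few assumptions: the population SD (dividing by n) is used, the total score includes the item itself, and the data are invented. It will fail with a division by zero if the item or the total has no variance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def population_sd(x):
    """Standard deviation dividing by n (an assumption of this sketch)."""
    m = sum(x) / len(x)
    return sqrt(sum((a - m) ** 2 for a in x) / len(x))


def item_reliability_index(item_scores, total_scores):
    """Item SD multiplied by the item-total correlation."""
    return population_sd(item_scores) * pearson(item_scores, total_scores)


item = [1, 1, 0, 0]    # hypothetical 0/1 item scores
total = [4, 3, 2, 1]   # hypothetical total test scores
print(round(item_reliability_index(item, total), 4))  # 0.4472
```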

Item validity index

• Correlate item with criterion

– Helps identify predictively useful test items

• Multiply the correlation between the item score and the criterion score by the SD of the item.

– The usefulness of an item also depends on its dispersion or ability to discriminate
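
For an item scored 0/1, the index is the item SD times the item-criterion correlation. The sketch below assumes a dichotomous item, so its SD can be written as the square root of p(1 - p), and uses invented criterion scores; it is an illustration, not a definitive implementation.

```python
from math import sqrt


def item_validity_index(item_scores, criterion_scores):
    """SD of a 0/1 item times its correlation with the criterion."""
    n = len(item_scores)
    p = sum(item_scores) / n                 # proportion passing the item
    item_sd = sqrt(p * (1 - p))              # SD of a dichotomous item
    mean_c = sum(criterion_scores) / n
    cov = sum((i - p) * (c - mean_c)
              for i, c in zip(item_scores, criterion_scores)) / n
    crit_sd = sqrt(sum((c - mean_c) ** 2 for c in criterion_scores) / n)
    r = cov / (item_sd * crit_sd)            # item-criterion correlation
    return item_sd * r


item = [1, 1, 1, 0, 0]        # hypothetical item scores
criterion = [9, 8, 6, 5, 2]   # hypothetical criterion scores
print(round(item_validity_index(item, criterion), 4))  # 0.4082
```

Because the SD is part of the product, an item that nearly everyone passes or fails gets a low index even when its correlation with the criterion is high.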

Item discrimination index

• How well the item discriminates between high scorers and low scorers on the test.

• For each item, compare the performance of those in the upper vs. lower performance ranges.

• Formula: d = (U - L) / N

– U = number of people in the upper range who got it right

– L = number of people in the lower range who got it right

– N = total number of people in the upper OR lower range
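
The formula d = (U - L) / N can be sketched as follows. The slides do not say how large the upper and lower ranges should be; the 27% used here is a common convention and is an assumption of this example, as are the data.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Compute d = (U - L) / N for one item.

    `fraction` sets the size of the upper and lower groups; 27% is a
    common convention, not something the slides specify.
    """
    ranked = sorted(zip(total_scores, item_scores), key=lambda pair: pair[0])
    n = max(1, round(len(ranked) * fraction))      # N: people per group
    lower = sum(item for _, item in ranked[:n])    # L: lowest scorers correct
    upper = sum(item for _, item in ranked[-n:])   # U: highest scorers correct
    return (upper - lower) / n


# Hypothetical data for 10 test takers.
totals = [10, 9, 9, 8, 7, 6, 5, 4, 3, 2]
item = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
print(round(discrimination_index(item, totals), 3))  # 0.667
```

With 10 test takers the groups hold 3 people each: all 3 top scorers answered correctly and 1 of the 3 bottom scorers did, so d = (3 - 1) / 3.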

Interpreting the IDI

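• Can vary from –1 to +1.

• A negative number = more low scorers than high scorers answered the item correctly.

• A 0 indicates that equal numbers of high and low scorers answered correctly, so the item does not discriminate.

• The closer the IDI is to +1, the better the item separates high scorers from low scorers.

• Can also use the IDI approach to examine the pattern of incorrect responses.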

Item characteristic curves

• “Graphic representation of item difficulty and discrimination”

• Horizontal axis = ability

• Vertical axis = probability of a correct response

• Plots the probability of a correct response against the test taker's standing on the entire test.

• If the curve slopes steadily upward or is S-shaped, the item is doing a good job of separating low and high scorers.
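
A rough empirical version of such a curve can be tabulated from raw data. The sketch below uses the total test score as a stand-in for ability, which is an assumption of this example; it does not fit any particular item response model, and the data are invented.

```python
from collections import defaultdict


def empirical_icc(item_scores, total_scores):
    """Proportion answering the item correctly at each total score.

    Returns (total_score, proportion_correct) pairs sorted by total
    score; plotting them gives a rough item characteristic curve.
    """
    groups = defaultdict(list)
    for item, total in zip(item_scores, total_scores):
        groups[total].append(item)
    return [(total, sum(g) / len(g)) for total, g in sorted(groups.items())]


# Hypothetical data for 8 test takers.
totals = [1, 1, 2, 2, 3, 3, 4, 4]
item = [0, 0, 0, 1, 1, 0, 1, 1]
print(empirical_icc(item, totals))
# [(1, 0.0), (2, 0.5), (3, 0.5), (4, 1.0)]
```

The proportions rise with total score here, which is the pattern expected of an item that separates low from high scorers.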

Item fairness

– Items should measure the same thing across groups

– Items should have similar ICCs across groups

– Items should have similar predictive validity across groups

Speed tests

• Items are easy and similar, so nearly everyone who attempts an item gets it correct.

• Measuring response time

• Traditional analyses of items do not apply

Qualitative item analysis

• Test takers' descriptions of the test

• Think-aloud administrations

• Expert panels

5. Revising the test

• Based on the information obtained from the item analysis. New items and additional testing of these items may be required.

Cross validation

• Once the test has been revised, seek new, independent confirmation of the test’s validity.

• The researcher uses a new sample to determine if the test predicts the criterion as well as it did in the original sample.

Validity shrinkage

• Typically, with cross validation, you will find that the test is less accurate in predicting the criterion with this new sample.

Co-validation

• Validating two or more tests at the same time

• Co-norming

• Saves money

• Beneficial for tests that are used together

6. Publishing the test

• Final step, which involves developing a test manual.

Production of testing materials

• Testing materials that are user friendly will be more readily accepted. The layout of the materials should allow for smooth administration.

Technical manual

• Summarizes the technical data and references. Item analyses, scale reliabilities, validation evidence, etc. can be found here.

User’s manual

• provides instruction for administration, scoring, and interpretation.

• The Standards for Educational and Psychological Testing recommend that manuals meet several goals (p 135).

• Two of the most important:

• 1. Describe the rationale and recommended uses of the test

• 2. Provide data on reliability and validity.

Testing is big business