item analysis and validation

ITEM ANALYSIS AND VALIDATION Mark Leonard Tan Verena Gonzales Ann Creia Tupasi Ramil Cabañesas

Upload: kenkenken-tan

Post on 14-May-2015


Page 1: Item analysis and validation

ITEM ANALYSIS AND VALIDATION

Mark Leonard Tan, Verena Gonzales, Ann Creia Tupasi, Ramil Cabañesas

Page 2: Item analysis and validation

Introduction

The teacher normally prepares a draft of the test. Such a draft is subjected to item analysis and validation to ensure that the final version of the test would be useful and functional.

Page 3: Item analysis and validation

Phases of preparing a test:

1. Try-out phase
2. Item analysis phase
3. Item revision phase

Page 4: Item analysis and validation

Item Analysis

There are two important characteristics of an item that will be of interest to the teacher:

1. Item difficulty
2. Discrimination index

Page 5: Item analysis and validation

Item difficulty, or the difficulty of an item, is defined as the number of students who answer the item correctly divided by the total number of students. Thus:

$$\text{Item difficulty} = \frac{\text{number of students with the correct answer}}{\text{total number of students}}$$

The item difficulty is usually expressed as a percentage.

Page 6: Item analysis and validation

Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while 75 answer it correctly?

Here the total number of students is 100; hence, the item difficulty index is 75/100 or 75%.
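The computation in this example can be sketched in a few lines of Python. The function name `item_difficulty` and its argument names are illustrative choices, not something the slides define.

```python
def item_difficulty(num_correct: int, num_students: int) -> float:
    """Proportion of students who answered the item correctly."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    if not 0 <= num_correct <= num_students:
        raise ValueError("num_correct must be between 0 and num_students")
    return num_correct / num_students


# Figures from the example: 75 correct, 25 incorrect.
correct, incorrect = 75, 25
p = item_difficulty(correct, correct + incorrect)
print(f"{p:.0%}")  # 75%
```

Note that with this definition a larger value means more students got the item right.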

Page 7: Item analysis and validation

One problem with this type of difficulty index is that it may not actually indicate that the item is difficult or easy. A student who does not know the subject matter will naturally be unable to answer the item correctly even if the question is easy. How do we decide on the basis of this index whether the item is too difficult or too easy?

Page 8: Item analysis and validation

Range of difficulty index | Interpretation | Action
0 – 0.25 | Difficult | Revise or discard
0.26 – 0.75 | Right difficulty | Retain
0.76 and above | Easy | Revise or discard
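The rule of thumb in this table can be expressed as a small lookup. This is a minimal sketch: the function name `classify_difficulty` is hypothetical, and the handling of values that fall between the listed bands (for example 0.255) is an assumption, since the table does not say where they belong.

```python
def classify_difficulty(index: float) -> tuple[str, str]:
    """Map a difficulty index (proportion correct, 0 to 1) to (interpretation, action)."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("difficulty index must be between 0 and 1")
    if index <= 0.25:
        return ("Difficult", "Revise or discard")
    if index <= 0.75:  # assumption: values between 0.25 and 0.26 count as "right difficulty"
        return ("Right difficulty", "Retain")
    return ("Easy", "Revise or discard")  # assumption: values above 0.75 count as "easy"


print(classify_difficulty(0.75))  # ('Right difficulty', 'Retain')
print(classify_difficulty(0.20))  # ('Difficult', 'Revise or discard')
```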

Page 9: Item analysis and validation

Difficult items tend to discriminate between those who know and those who do not know the answer.

Easy items cannot discriminate between those two groups of students.

We are therefore interested in deriving a measure that will tell us whether an item can discriminate between these two groups of students. Such a measure is called an index of discrimination.

Page 10: Item analysis and validation

An easy way to derive such a measure is to measure how difficult an item is with respect to those in the upper 25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can discriminate properly between these two groups. Thus:

Page 11: Item analysis and validation

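Index of discrimination = DU – DL, where DU is the difficulty index of the item for the upper 25% of the class and DL is the difficulty index for the lower 25%.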

Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while the lower 25% of the class had a difficulty index of 0.20.

Page 12: Item analysis and validation

DU = 0.60 while DL = 0.20, thus index of discrimination = .60 - .20 = .40.
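The same subtraction can be written as a short Python helper. The name `discrimination_index` is an illustrative choice; the range checks simply reflect that DU and DL are proportions.

```python
def discrimination_index(du: float, dl: float) -> float:
    """Difference between the upper-group and lower-group difficulty indices."""
    for name, value in (("du", du), ("dl", dl)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    return du - dl


# Figures from the example: DU = 0.60, DL = 0.20.
d = discrimination_index(0.60, 0.20)
print(f"{d:.2f}")  # 0.40
```

The result is formatted to two decimals because binary floating-point subtraction may not give exactly 0.4.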

Page 13: Item analysis and validation

Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0 (when DU = 1 and DL = 0).

When the index of discrimination is equal to -1, then this means that all of the lower 25% of the students got the correct answer while all of the upper 25% got the wrong answer. In a sense, such an index discriminates correctly between the two groups but the item itself is highly questionable.

Page 14: Item analysis and validation

On the other hand, if the index of discrimination is 1.0, then all of the lower 25% failed to get the correct answer while all of the upper 25% got the correct answer. This is a perfectly discriminating item and is the ideal item that should be included in the test.

As in the case of the difficulty index, we have the following rule of thumb:

Page 15: Item analysis and validation

Index range | Interpretation | Action
-1.0 to -.50 | Can discriminate, but the item is questionable | Discard
-.49 to .45 | Non-discriminating | Revise
.46 to 1.0 | Discriminating item | Include
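A minimal sketch of this rule of thumb as code follows. The function name `classify_discrimination` is hypothetical, and the cut-offs used for values that fall between the listed bands are an assumption: anything at or below -.50 is treated as questionable, and anything above that up to .45 as non-discriminating.

```python
def classify_discrimination(index: float) -> tuple[str, str]:
    """Map an index of discrimination (-1 to 1) to (interpretation, action)."""
    if not -1.0 <= index <= 1.0:
        raise ValueError("index of discrimination must be between -1 and 1")
    if index <= -0.50:
        return ("Can discriminate, but the item is questionable", "Discard")
    if index <= 0.45:  # assumption: in-between values fall into the middle band
        return ("Non-discriminating", "Revise")
    return ("Discriminating item", "Include")


print(classify_discrimination(0.50))  # ('Discriminating item', 'Include')
print(classify_discrimination(0.40))  # ('Non-discriminating', 'Revise')
```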

Page 16: Item analysis and validation

Example: Consider a multiple-choice test for which the following data were obtained for item 1 (* marks the correct option):

Group | A | B* | C | D
Total | 0 | 40 | 20 | 20
Upper 25% | 0 | 15 | 5 | 0
Lower 25% | 0 | 5 | 10 | 5

The correct response is B. Let us compute the difficulty index and the index of discrimination.

Page 17: Item analysis and validation

The option counts sum to 0 + 40 + 20 + 20 = 80 students, so:

$$\text{Difficulty index} = \frac{\text{no. of students getting the correct answer}}{\text{total no. of students}} = \frac{40}{80} = 50\%$$

This is within the range of a “good item”.

Page 18: Item analysis and validation

The discrimination index can be computed similarly:

$$\text{DU} = \frac{\text{no. of students in the upper 25\% with the correct response}}{\text{no. of students in the upper 25\%}} = \frac{15}{20} = .75 \text{ or } 75\%$$

$$\text{DL} = \frac{\text{no. of students in the lower 25\% with the correct response}}{\text{no. of students in the lower 25\%}} = \frac{5}{20} = .25 \text{ or } 25\%$$

Discrimination index = DU – DL = .75 – .25 = .50 or 50%

Thus, the item also has “good discriminating power”.

Page 19: Item analysis and validation

It is also instructive to note that distracter A is not an effective distracter, since no student selected it. Distracters C and D appear to have good appeal as distracters.
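The distracter check described here can be sketched in Python using the option counts from the example. The helper names are hypothetical. The second helper goes slightly beyond the slides: it applies the common heuristic that a working distracter should attract more lower-group than upper-group students, which is an added assumption.

```python
# Option counts for item 1, copied from the worked example; "B" is the key.
counts = {
    "total": {"A": 0, "B": 40, "C": 20, "D": 20},
    "upper": {"A": 0, "B": 15, "C": 5, "D": 0},
    "lower": {"A": 0, "B": 5, "C": 10, "D": 5},
}
key = "B"


def unused_distracters(total_counts: dict, key: str) -> list:
    """Distracters that no student selected."""
    return [opt for opt, n in total_counts.items() if opt != key and n == 0]


def distracter_pull(upper: dict, lower: dict, key: str) -> dict:
    """Lower-group count minus upper-group count for each distracter.

    A positive value means the lower group chose the distracter more often.
    """
    return {opt: lower[opt] - upper[opt] for opt in upper if opt != key}


print(unused_distracters(counts["total"], key))  # ['A']
print(distracter_pull(counts["upper"], counts["lower"], key))  # {'A': 0, 'C': 5, 'D': 5}
```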

Page 20: Item analysis and validation

Basic Item Analysis Statistics

The Michigan State University Measurement and Evaluation Department reports a number of item statistics which aid in evaluating the effectiveness of an item.

Index of Difficulty – the proportion of the total group who got the item wrong. Thus a high index indicates a difficult item and a low index indicates an easy item.

Page 21: Item analysis and validation

Index of Discrimination – is the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right.

Page 22: Item analysis and validation

More Sophisticated Discrimination Index

Item Discrimination refers to the ability of an item to differentiate among students on the basis of how well they know the material being tested.

A good item is one that has good discriminating ability and a suitable level of difficulty (neither too difficult nor too easy).

Page 23: Item analysis and validation

At the end of the item analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test and can be used to identify items which are not performing well and which can perhaps be improved or discarded.

Page 24: Item analysis and validation

The Item-Analysis Procedure for Norm provides the following information:

1. The difficulty of an item
2. The discriminating power of an item
3. The effectiveness of each alternative

Page 25: Item analysis and validation

Benefits derived from Item Analysis

1. It provides useful information for class discussion of the test.
2. It provides data which helps students improve their learning.
3. It provides insights and skills that lead to the preparation of better tests in the future.

Page 26: Item analysis and validation

Index of Difficulty

$$P = \frac{R}{T} \times 100$$

Where:

$P$ – the percentage who answered the item correctly (index of difficulty)

$R$ – the number who answered the item correctly

$T$ – the total number who tried the item

Page 27: Item analysis and validation

Index of Item Discriminating Power

$$D = \frac{R_U - R_L}{\tfrac{1}{2}T}$$

Where:

$R_U$ – the number in the upper group who answered the item correctly

$R_L$ – the number in the lower group who answered the item correctly

$T$ – the total number who tried the item

Page 28: Item analysis and validation

$$P = \frac{R}{T} \times 100 = 40\%$$

The smaller the percentage figure, the more difficult the item.

Estimate the item discriminating power using the formula below:

$$D = \frac{R_U - R_L}{\tfrac{1}{2}T} = .40$$
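As a rough illustration of the upper-group/lower-group procedure, the sketch below assumes the formulas P = R/T × 100 and D = (RU − RL)/(½T), with R taken as RU + RL because only the two groups are analysed. The slide's own substituted numbers did not survive, so the counts used here are hypothetical, chosen only so that the results match the 40% and .40 shown.

```python
def difficulty_percent(r_upper: int, r_lower: int, total_tried: int) -> float:
    """P = R / T * 100, where R = RU + RL (assumes only the two groups are analysed)."""
    if total_tried <= 0:
        raise ValueError("total_tried must be positive")
    return 100 * (r_upper + r_lower) / total_tried


def discriminating_power(r_upper: int, r_lower: int, total_tried: int) -> float:
    """D = (RU - RL) / (T / 2), where T / 2 is the size of one group."""
    if total_tried <= 0:
        raise ValueError("total_tried must be positive")
    return (r_upper - r_lower) / (total_tried / 2)


# Hypothetical counts: 10 students per group (T = 20),
# 6 correct in the upper group and 2 correct in the lower group.
p = difficulty_percent(6, 2, 20)
d = discriminating_power(6, 2, 20)
print(f"P = {p:.0f}%")  # P = 40%
print(f"D = {d:.2f}")   # D = 0.40
```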

Page 29: Item analysis and validation

The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00. Maximum discrimination is usually found at the 50 per cent level of difficulty.

0.00 – 0.20 = very difficult
0.21 – 0.80 = moderately difficult
0.81 – 1.00 = very easy

Page 30: Item analysis and validation

Validation

After performing the item analysis and revising the items which need revision, the next step is to validate the instrument.

The purpose of validation is to determine the characteristics of the whole test itself, namely, the validity and reliability of the test.

Validation is the process of collecting and analysing evidence to support the meaningfulness and usefulness of the test.

Page 31: Item analysis and validation

Validity

is the extent to which a test measures what it purports to measure; it also refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.

Page 32: Item analysis and validation

There are three main types of evidence that may be collected:

1. Content-related evidence of validity
2. Criterion-related evidence of validity
3. Construct-related evidence of validity

Page 33: Item analysis and validation

Content-related evidence of validity

refers to the content and format of the instrument. How appropriate is the content? How comprehensive? Does it logically get at the intended variable? How adequately does the sample of items or questions represent the content to be assessed?

Page 34: Item analysis and validation

Criterion-related evidence of validity

refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion). How strong is this relationship? How well do such scores estimate present performance or predict future performance of a certain type?

Page 35: Item analysis and validation

Construct-related evidence of validity

refers to the nature of the psychological construct or characteristic being measured by the test. How well does a measure of the construct explain differences in the behaviour of individuals or their performance on a certain task?

Page 36: Item analysis and validation

Usual procedure for determining content validity

The teacher writes out the objectives based on the TOS (table of specifications).

The teacher gives the objectives and the TOS to two experts, along with a description of the test takers.

The experts look at the objectives, read over the items in the test, and place a check mark in front of each question or item that they feel does NOT measure one or more objectives.

Page 37: Item analysis and validation

Usual procedure for determining content validity

The experts also place a check mark in front of each objective NOT assessed by any item in the test.

The teacher then rewrites any item so checked and resubmits it to the experts, and/or writes new items to cover those objectives not yet covered by the existing test.

Page 38: Item analysis and validation

Usual procedure for determining content validity

This continues until the experts approve all items and agree that all of the objectives are sufficiently covered by the test.

Page 39: Item analysis and validation

Obtaining evidence for criterion-related validity

The teacher usually compares scores on the test in question with scores on some other independent criterion test which presumably already has high validity (concurrent validity).

Another type of criterion-related validity is predictive validity, wherein the test scores on the instrument are correlated with scores on a later performance of the students.
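One way to quantify the relationship between test scores and criterion scores is a correlation coefficient. The sketch below computes Pearson's r by hand; the choice of Pearson's r, the function name `pearson_r`, and both score lists are assumptions made for illustration, not data or a method taken from the slides.

```python
from math import sqrt


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six students on the teacher's test and on a criterion measure.
test_scores = [35, 42, 28, 45, 38, 30]
criterion_scores = [78, 85, 70, 90, 80, 74]
r = pearson_r(test_scores, criterion_scores)
print(f"r = {r:.2f}")
```

A coefficient close to 1 would count as evidence for criterion-related validity; a value near 0 would not.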

Page 40: Item analysis and validation

Gronlund's Expectancy Table (columns: grade point average)

Test Score | Very Good | Good | Needs Improvement
High | 20 | 10 | 5
Average | 10 | 25 | 5
Low | 1 | 10 | 14

Page 41: Item analysis and validation

The expectancy table shows that 20 students obtained high test scores and were subsequently rated very good in terms of their final grades, and that 14 students obtained low test scores and were later graded as needing improvement.

Page 42: Item analysis and validation

The evidence for this particular test tends to indicate that students with high scores on it would later be graded very good, students with average scores would later be rated good, and students with low scores would later be graded as needing improvement.
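Converting each row of the expectancy table to percentages makes this reading easier to check. The sketch below uses the counts from the table; the function name `row_percentages` is an illustrative choice.

```python
# Counts from the expectancy table: test-score level -> later grade category.
expectancy = {
    "High": {"Very Good": 20, "Good": 10, "Needs Improvement": 5},
    "Average": {"Very Good": 10, "Good": 25, "Needs Improvement": 5},
    "Low": {"Very Good": 1, "Good": 10, "Needs Improvement": 14},
}


def row_percentages(table: dict) -> dict:
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for level, row in table.items():
        total = sum(row.values())
        result[level] = {grade: 100 * n / total for grade, n in row.items()}
    return result


percentages = row_percentages(expectancy)
for level, row in percentages.items():
    summary = ", ".join(f"{grade}: {pct:.0f}%" for grade, pct in row.items())
    print(f"{level}: {summary}")
```

In each row the largest share falls in the matching grade category: about 57% of high scorers were rated very good, 62.5% of average scorers were rated good, and 56% of low scorers were rated as needing improvement.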

Page 43: Item analysis and validation

Reliability

Refers to the consistency of the scores obtained – how consistent they are for each individual from one administration of an instrument to another and from one set of items to another.

Page 44: Item analysis and validation

We already have the formulas for computing the reliability of a test; for internal consistency, for instance, we could use the split-half method or the Kuder-Richardson formulae:

KR-20 or KR-21
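The slides name KR-20 and KR-21 without restating them. As a sketch, the standard textbook forms are KR-20 = k/(k−1) × (1 − Σpq/σ²) and KR-21 = k/(k−1) × (1 − M(k−M)/(kσ²)), where k is the number of items, p and q are the proportions passing and failing each item, M is the mean total score and σ² is the variance of total scores. The code assumes items are scored 0 or 1 and uses the population variance; the response matrix is hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses: list) -> float:
    """KR-20 for a matrix of 0/1 item scores (rows = students, columns = items)."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # assumption: population variance of total scores
    if k < 2 or var_total == 0:
        raise ValueError("need at least 2 items and some variation in total scores")
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion answering item i correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def kr21(responses: list) -> float:
    """KR-21, which needs only the mean and variance of the total scores."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m, var_total = mean(totals), pvariance(totals)
    if k < 2 or var_total == 0:
        raise ValueError("need at least 2 items and some variation in total scores")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))


# Hypothetical data: 5 students, 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(scores):.2f}")  # KR-20 = 0.80
print(f"KR-21 = {kr21(scores):.2f}")  # KR-21 = 0.67
```

KR-21 treats all items as equally difficult, so it comes out lower than KR-20 here, where the item difficulties differ.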

Page 45: Item analysis and validation

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes.

As reliability improves, validity may improve (or may not).

However, if an instrument is shown scientifically to be valid then it is almost certain that it is also reliable.

Page 46: Item analysis and validation

The following table is a standard followed almost universally in educational tests and measurement:

Reliability | Interpretation
.90 and above | Excellent reliability; at the level of the best standardized tests.
.80 – .90 | Very good for a classroom test.
.70 – .80 | Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 – .70 | Somewhat low. This test should be supplemented by other measures (e.g., more tests) for grading.
.50 – .60 | Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 or below | Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
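The table can be turned into a simple lookup. This is a minimal sketch: the function name `interpret_reliability` is hypothetical, the labels are shortened, and because adjacent bands in the table share their boundary values, assigning each boundary to the higher band is an assumption.

```python
def interpret_reliability(coefficient: float) -> str:
    """Return a short interpretation for a reliability coefficient."""
    if coefficient > 1.0:
        raise ValueError("a reliability coefficient cannot exceed 1")
    # Assumption: a boundary value such as .80 belongs to the higher band.
    bands = [
        (0.90, "Excellent reliability"),
        (0.80, "Very good for a classroom test"),
        (0.70, "Good for a classroom test"),
        (0.60, "Somewhat low"),
        (0.50, "Suggests need for revision of the test"),
    ]
    for lower_bound, label in bands:
        if coefficient >= lower_bound:
            return label
    return "Questionable reliability"


print(interpret_reliability(0.85))  # Very good for a classroom test
print(interpret_reliability(0.45))  # Questionable reliability
```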