Reliability Consistency in testing

Uploaded by: kassandra-stabler (posted 14-Dec-2015)


Page 1: Reliability Consistency in testing. Types of variance Meaningful variance –Variance between test takers which reflects differences in the ability or skill

Reliability

Consistency in testing


Types of variance

• Meaningful variance: variance between test takers which reflects differences in the ability or skill being measured

• Error variance: variance between test takers which is caused by factors other than differences in the ability or skill being measured

• Test developers as ‘variance chasers’: they try to maximize meaningful variance and minimize error variance


Sources of error variance

• Measurement error

• Environment

• Administration procedures

• Scoring procedures

• Examinee differences

• Test and items

• Remember, OS = TS + E (observed score = true score + error)
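The identity OS = TS + E can be made concrete with a small simulation. This is a sketch under assumed conditions, not data from the slides: true scores are drawn from a normal distribution (mean 50, SD 10) and error from an independent normal distribution (mean 0, SD 3). Because the error is unrelated to the true score, observed-score variance comes out close to true-score variance plus error variance, and the proportion of observed variance that is true-score variance (here about 100 / 109) is the quantity a reliability coefficient estimates. The function name `simulate_scores` is hypothetical.

```python
import random
import statistics

def simulate_scores(n=100_000, true_sd=10.0, error_sd=3.0, seed=1):
    """Generate hypothetical observed scores as OS = TS + E.

    Assumes normal true scores (mean 50) and independent, mean-zero error.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed

true_scores, errors, observed = simulate_scores()
var_ts = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_os = statistics.pvariance(observed)

# Observed variance is approximately true variance + error variance
print(round(var_os, 1), round(var_ts + var_e, 1))
# Proportion of observed variance that is true-score variance
print(round(var_ts / var_os, 3), "vs. theoretical", round(100 / 109, 3))
```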


Estimating reliability for NRTs (norm-referenced tests)

• Are the test scores reliable over time? Would a student get the same score if tested tomorrow?

• Are the test scores reliable over different forms of the same test? Would the student get the same score if given a different form of the test?

• Is the test internally consistent?


Reliability coefficient (rxx)

• Range: 0.0 (totally unreliable test) to 1.0 (perfectly reliable test)

• Reliability coefficients estimate the proportion of test-score variance that is systematic (meaningful) variance rather than error variance

• Lower reliability coefficient = greater measurement error in the test scores


Test-retest reliability

1. Same students take test twice

2. Calculate reliability (Pearson’s r)

3. Interpret r directly as the reliability estimate (a conservative estimate)

• Problems:

– Logistically difficult

– Learning might take place between tests
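The correlation in step 2 can be sketched in Python as follows. The function name `pearson_r` and the two lists of scores are hypothetical, invented only to show the calculation for the same students tested twice; the same function applies unchanged to the two parallel forms used in equivalent forms reliability.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    ssx = sum((a - mx) ** 2 for a in x)                  # sum of squares, x
    ssy = sum((b - my) ** 2 for b in y)                  # sum of squares, y
    return sp / math.sqrt(ssx * ssy)

# Hypothetical scores for the same five students on two occasions
time1 = [50, 42, 37, 45, 30]
time2 = [48, 44, 35, 47, 33]
print(round(pearson_r(time1, time2), 3))  # interpreted directly as r_xx
```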


Equivalent forms reliability

1. Same students take parallel forms of test

2. Calculate the correlation (Pearson’s r) between the two forms

• Problems:

– Creating parallel forms can be tricky

– Logistical difficulty


University of Michigan English Placement Test

(University of Michigan English Placement Test Examiner’s Manual)


Internal consistency reliability

• Calculating the reliability from a single administration of a test

• Commonly reported:

– Split-half

– Cronbach alpha

– K-R20

– K-R21

• Calculated automatically by many statistical software packages


Split-half reliability

1. The test is split in half (e.g., odd / even) creating “equivalent forms”

2. The two “forms” are correlated with each other

3. The correlation coefficient is adjusted to reflect the entire test length

– Spearman-Brown Prophecy formula
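The Spearman-Brown Prophecy formula predicts the reliability of a test whose length is multiplied by a factor n: r_new = n·r / (1 + (n − 1)·r). For split-half reliability n = 2, because each half is half the length of the full test. A minimal sketch (the function name `spearman_brown` is our own):

```python
def spearman_brown(r, n=2.0):
    """Predicted reliability when test length is multiplied by n.

    n = 2 steps a half-test correlation up to the full test length;
    n < 1 predicts the reliability of a shortened test.
    """
    return n * r / (1 + (n - 1) * r)

print(round(spearman_brown(0.307), 2))  # half-test r from a 1st/2nd-half split
```

Applied to the split-half correlations in the TAP report later in this deck, `spearman_brown(0.307)` gives about 0.470 and `spearman_brown(0.865)` about 0.93, in line with the Spearman-Brown values the report prints.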


Calculating split half reliability

ID  Q1  Q2  Q3  Q4  Q5  Q6  Odd  Even
1   1   0   0   1   1   0   2    1
2   1   1   0   1   0   1   1    3
3   1   1   1   1   1   0   3    2
4   1   0   0   0   1   0   2    0
5   1   1   1   1   0   0   2    2
6   0   0   0   0   1   0   1    0

Odd: Mean 1.83, SD 0.75

Even: Mean 1.33, SD 1.21
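The odd and even totals, means, and SDs in the table can be reproduced from the item responses. This sketch uses the slide's six students; note that `statistics.stdev` is the sample SD (dividing by n − 1), which is what reproduces the 0.75 and 1.21 shown above.

```python
import statistics

# Item responses from the slide (1 = correct, 0 = incorrect), one row per student
responses = [
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]

# Q1, Q3, Q5 sit at Python indices 0, 2, 4; Q2, Q4, Q6 at indices 1, 3, 5
odd = [sum(row[0::2]) for row in responses]
even = [sum(row[1::2]) for row in responses]

print(odd, even)  # [2, 1, 3, 2, 2, 1] [1, 3, 2, 0, 2, 0]
print(round(statistics.mean(odd), 2), round(statistics.stdev(odd), 2))    # 1.83 0.75
print(round(statistics.mean(even), 2), round(statistics.stdev(even), 2))  # 1.33 1.21
```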


Calculating split half reliability (2)

Odd  Mean  Diff   Even  Mean  Diff   Prod.
2    1.83   0.17  1     1.33  -0.33  -0.056
1    1.83  -0.83  3     1.33   1.67  -1.386
3    1.83   1.17  2     1.33   0.67   0.784
2    1.83   0.17  0     1.33  -1.33  -0.226
2    1.83   0.17  2     1.33   0.67   0.114
1    1.83  -0.83  0     1.33  -1.33   1.104

Sum of products = 0.334


Calculating split half

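$$r_{half} = \frac{\sum (X_{odd} - M_{odd})(X_{even} - M_{even})}{(n - 1)\,S_{odd}\,S_{even}} = \frac{0.334}{(6 - 1)(0.75)(1.21)} \approx 0.074$$

(The divisor is n − 1 because the SDs above are sample SDs; dividing by n = 6 with these SDs would understate r.)

Adjust for test length using Spearman-Brown Prophecy formula

$$r_{xx} = \frac{2 \times 0.074}{(2 - 1)(0.074) + 1} \approx 0.14$$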
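The split-half steps can be checked in code using the odd and even totals from the table. This is a sketch, not the deck's own procedure: it computes Pearson's r from the sum of cross-products and divides by (n − 1) because the SDs are sample SDs (dividing by n with population SDs gives the same r). With unrounded intermediate values it gives a half-test r of about 0.073 and a Spearman-Brown full-test estimate of about 0.136.

```python
import math

odd = [2, 1, 3, 2, 2, 1]    # odd-item totals from the table
even = [1, 3, 2, 0, 2, 0]   # even-item totals from the table
n = len(odd)

m_odd, m_even = sum(odd) / n, sum(even) / n
# Sum of cross-products of deviations (the "Prod." column)
sp = sum((o - m_odd) * (e - m_even) for o, e in zip(odd, even))
# Sample SDs (n - 1), matching the SDs reported on the slide
s_odd = math.sqrt(sum((o - m_odd) ** 2 for o in odd) / (n - 1))
s_even = math.sqrt(sum((e - m_even) ** 2 for e in even) / (n - 1))

r_half = sp / ((n - 1) * s_odd * s_even)  # correlation between the two halves
r_full = 2 * r_half / (1 + r_half)        # Spearman-Brown, doubled test length
print(round(sp, 3), round(r_half, 3), round(r_full, 3))
```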


Cronbach alpha

• Similar to split half but easier to calculate

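$$\alpha = 2\left(1 - \frac{S_{odd}^2 + S_{even}^2}{S_{total}^2}\right) = 2\left(1 - \frac{(0.75)^2 + (1.21)^2}{(1.47)^2}\right) = 0.12$$

where S_total = 1.47 is the SD of the students' total scores on the whole test.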
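Cronbach alpha generalizes this two-part calculation: for k parts (items or test halves), alpha = k / (k − 1) × (1 − sum of part variances / total-score variance). With k = 2 this reduces to 2 × (1 − (S_odd² + S_even²) / S_total²). The sketch below implements that definition (the function name `cronbach_alpha` is our own) and applies it to the slide's odd and even totals; it gives about 0.123, which rounds to 0.12. Any consistent variance definition works, since the n or n − 1 cancels in the ratio.

```python
import statistics

def cronbach_alpha(parts):
    """Cronbach's alpha; parts = one score list per item (or per test half)."""
    k = len(parts)
    if k < 2:
        raise ValueError("need at least two parts")
    totals = [sum(scores) for scores in zip(*parts)]  # each student's total
    part_var_sum = sum(statistics.variance(p) for p in parts)
    return k / (k - 1) * (1 - part_var_sum / statistics.variance(totals))

odd = [2, 1, 3, 2, 2, 1]   # odd-item totals from the split-half example
even = [1, 3, 2, 0, 2, 0]  # even-item totals
print(round(cronbach_alpha([odd, even]), 3))
```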


K-R20

• “Rolls-Royce” of internal reliability estimates

• Simulates calculating split-half reliability for every possible combination of items


K-R20 formula

Note that this is variance, not standard deviation

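Sum of item variances = the sum of IF(1 − IF) across all items, where IF (item facility) is the proportion of students who answered the item correctly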

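$$K\text{-}R20 = \frac{k}{k - 1}\left(1 - \frac{\sum S_i^2}{S_t^2}\right)$$

where k = number of items, ΣS_i² = sum of the item variances, and S_t² = variance of the total test scores.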
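K-R20 can be computed directly from a matrix of 0/1 item scores. This sketch assumes population variances (dividing by n) for both the item variances IF(1 − IF) and the total-score variance, so that the two are on the same footing; the slides do not specify which denominator to use. Applied to the six-student, six-item data from the split-half example it gives about 0.406. The slides do not report a K-R20 value for those data, so this number is illustrative only, and the function name `kr20` is our own.

```python
import statistics

def kr20(responses):
    """K-R20 for dichotomous (0/1) items; rows = students, columns = items.

    Assumption: item variances IF * (1 - IF) and the total-score variance
    both use the population form (divide by n).
    """
    n_students = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = statistics.pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have no variance")
    item_var_sum = 0.0
    for j in range(k):
        item_facility = sum(row[j] for row in responses) / n_students
        item_var_sum += item_facility * (1 - item_facility)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Item responses from the split-half example (six students, six items)
responses = [
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
print(round(kr20(responses), 3))
```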


K-R21

• Slightly less accurate than KR-20, but can be calculated with just descriptive statistics

• Tends to underestimate reliability relative to K-R20, because it assumes all items are equally difficult


KR-21 formula

Note that this is variance (standard deviation squared)

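$$K\text{-}R21 = \frac{k}{k - 1}\left(1 - \frac{M(k - M)}{k S^2}\right)$$

where k = number of items, M = mean total score, and S² = variance of the total test scores.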
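K-R21 needs only the number of items, the mean, and the variance of the total scores. A minimal sketch (the function name `kr21` is our own), assuming the population SD as in a population-variance K-R20 calculation, applied to the six-student example (k = 6, mean total 19/6). It gives about 0.206, lower than K-R20 computed on the same data (about 0.406), which is the direction the slide leads us to expect.

```python
import statistics

def kr21(k, mean, sd):
    """K-R21 from the number of items, the mean total score, and the SD of total scores."""
    variance = sd ** 2  # the formula uses variance (SD squared), not SD
    return k / (k - 1) * (1 - mean * (k - mean) / (k * variance))

# Total scores (odd + even) of the six students in the split-half example
totals = [3, 4, 5, 2, 4, 1]
# Assumption: population SD (pstdev), matching a population-variance K-R20
print(round(kr21(6, statistics.mean(totals), statistics.pstdev(totals)), 3))
```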


Test summary report (TAP)

• Number of Items Excluded = 0
• Number of Items Analyzed = 40
• Mean Item Difficulty = 0.597
• Mean Item Discrimination = 0.491
• Mean Point Biserial = 0.417
• Mean Adj. Point Biserial = 0.369
• KR20 (Alpha) = 0.882
• KR21 = 0.870
• SEM (from KR20) = 2.733
• # Potential Problem Items = 9
• High Grp Min Score (n=15) = 31.000
• Low Grp Max Score (n=14) = 17.000

• Split-Half (1st/2nd) Reliability = 0.307 (with Spearman-Brown = 0.470)
• Split-Half (Odd/Even) Reliability = 0.865 (with Spearman-Brown = 0.927)


Standard Error of Measurement

If we give a student the same test repeatedly (test-retest), we would expect to see some variation in the scores

50 49 52 50 51 49 48 50

With enough repetitions (and no learning or fatigue between them), these scores would approximate a normal distribution centered on the student's true score

We would expect the student to score near the center of the distribution the most often
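In classical test theory the SEM is the standard deviation of this hypothetical distribution of one student's retest scores. As a rough illustration only (eight scores are far too few to show a normal shape), the spread of the slide's example scores can be summarized as a mean of about 49.9 and a sample SD of about 1.25:

```python
import statistics

# The slide's example scores for one student on repeated administrations
scores = [50, 49, 52, 50, 51, 49, 48, 50]

mean_score = statistics.mean(scores)  # center of the retest distribution
spread = statistics.stdev(scores)     # sample SD: a crude estimate of the SEM
print(mean_score, round(spread, 2))
```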


Standard Error of Measurement

• The greater the reliability of the test, the smaller the SEM

• We expect the student's retest score to fall within one SEM of the obtained score approximately 68% of the time

• If a student has a score of 50 and the SEM is 3, we expect the student to score between 47 and 53 approximately 68% of the time on a retest


Interpreting the SEM

For a score of 29 (SEM ≈ 3, estimated from K-R21):

26 to 32 is within 1 SEM (about 68% of the time)

23 to 35 is within 2 SEM (about 95%)

20 to 38 is within 3 SEM (about 99.7%)
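The bands above come from adding and subtracting multiples of the SEM, and the percentages come from the normal distribution. A sketch using Python's `statistics.NormalDist`, with the score of 29 and an SEM of 3 (the helper names `sem_band` and `coverage` are our own):

```python
from statistics import NormalDist

def sem_band(score, sem, k=1):
    """Return the score range within k SEMs of an observed score."""
    return score - k * sem, score + k * sem

def coverage(k):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    low, high = sem_band(29, 3, k)
    print(f"within {k} SEM: {low} to {high} (about {coverage(k):.1%} of retests)")
```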


Calculating the SEM

What is the SEM for a test with a reliability of r=.889 and a standard deviation of 8.124?

SEM = 2.7

What if the same test had a reliability of r = .95?

SEM = 1.8

$$SEM = S\sqrt{1 - r_{xx}}$$
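The two worked answers can be checked with SEM = S√(1 − rxx). A minimal sketch (the function name `sem` is our own):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)

print(round(sem(8.124, 0.889), 1))  # 2.7
print(round(sem(8.124, 0.95), 1))   # 1.8
```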


Reliability for performance assessment

Traditional fixed-response assessment: Test-taker → Instrument (test) → Score

Performance assessment (e.g., writing, speaking): Test-taker → Task → Performance → Rater / judge → Scale → Score


Interrater/Intrarater reliability

1. Calculate the correlation between every pair of raters and average these correlations

2. Adjust the average correlation using Spearman-Brown to account for the total number of raters giving the score
