CHAPTER 7
Procedures for Estimating Reliability

Page 1

CHAPTER 7
Procedures for Estimating Reliability

Page 2

TYPES OF RELIABILITY

Test-Retest (2 administrations)
- What it is: a measure of stability.
- How you do it: administer the same test/measure at two different times to the same group of participants.
- What the reliability coefficient looks like: r(test1, test2). Ex.: IQ test.

Parallel/Alternate/Equivalent Forms (2 administrations)
- What it is: a measure of equivalence.
- How you do it: administer two different forms of the same test to the same group of participants.
- What the reliability coefficient looks like: r(testA, testB). Ex.: stats test.

Test-Retest with Alternate Forms (2 administrations)
- What it is: a measure of stability and equivalence.
- How you do it: on Monday, administer form A to the 1st half of the group and form B to the second half; on Friday, administer form B to the 1st half and form A to the 2nd half.
- What the reliability coefficient looks like: r(testA, testB).

Inter-Rater (1 administration)
- What it is: a measure of agreement.
- How you do it: have two raters rate behaviors, then determine the amount of agreement between them.
- What the reliability coefficient looks like: percentage of agreement.

Internal Consistency (1 administration)
- What it is: a measure of how consistently each item measures the same underlying construct.
- How you do it: correlate performance on each item with overall performance across participants.
- What the reliability coefficient looks like: Cronbach's alpha method, Kuder-Richardson method, split-half method, Hoyt's method.

Page 3

Procedures for Estimating/Calculating Reliability

- Procedures requiring 2 test administrations
- Procedures requiring 1 test administration

Page 4

Procedures for Estimating Reliability

Procedures Requiring Two (2) Test Administrations
1. Test-Retest Reliability Method: measures stability.
2. Parallel (Alternate/Equivalent) Forms Reliability Method: measures equivalence.
3. Test-Retest with Alternate Forms: measures stability and equivalence.

Page 5

Procedures Requiring 2 Test Administrations

1. Test-Retest Reliability Method

Administer the same test to the same group of participants; then the two sets of scores are correlated with each other.

The correlation coefficient (r) between the two sets of scores is called the coefficient of stability.

The problem with this method is time sampling: factors related to time become sources of measurement error, e.g., changes in exam conditions such as noise, the weather, illness, fatigue, worry, mood changes, etc.

Page 6

How to Measure Test-Retest Reliability

Class IQ Scores
Student   X (first time)   Y (second time)
John      125              120
Jo        110              112
Mary      130              128
Kathy     122              120
David     115              120

r(first time, second time) = coefficient of stability

Page 7

Procedures Requiring 2 Test Administrations

2. Parallel (Alternate) Forms Reliability Method

Different forms of the same test are given to the same group of participants; then the two sets of scores are correlated. The correlation coefficient (r) between the two sets of scores is called the coefficient of equivalence.

Page 8

How to Measure Parallel Forms Reliability

Class Test Scores
Student   X (Form A)   Y (Form B)
John      95            90
Jo        80            85
Mary      78            82
Kathy     82            88
David     75            72

r(Form A, Form B) = coefficient of equivalence

Page 9

Procedures Requiring 2 Test Administrations

3. Test-Retest with Alternate Forms

It is a combination of the test-retest and alternate forms reliability methods.

On Monday, you administer form A to the 1st half of the group and form B to the second half.

On Friday, you administer form B to the 1st half of the group and form A to the second half.

The correlation coefficient (r) between the two sets of scores is called the coefficient of stability and equivalence.

Page 10

Procedures Requiring 1 Test Administration

A. Internal Consistency Reliability (ICR)

Examines the unidimensional nature of a set of items in a test. It tells us how unified the items are in a test or an assessment.

Ex.: If we administer a 100-item personality test, we want the items to relate to one another and to reflect the same construct (personality). We want them to have item homogeneity.

ICR deals with how unified the items are in a test or an assessment; this is called "item homogeneity."

Page 11

Procedures for Estimating Reliability

Procedures Requiring One (1) Test Administration
A. Internal Consistency Reliability
B. Inter-Rater Reliability

Page 12

A. Internal Consistency Reliability (ICR)

Four different ways to measure ICR:
1. Guttman Split-Half Reliability Method (used with the Spearman-Brown Prophecy Formula)
2. Cronbach's Alpha Method
3. Kuder-Richardson Method
4. Hoyt's Method

They are different statistical procedures for calculating the reliability of a test.

Page 13

Procedures Requiring 1 Test Administration

A. Internal Consistency Reliability (ICR)

1. Guttman Split-Half Reliability Method (most popular): usually used for dichotomously scored exams.

First, administer a test; then divide the test items into 2 subtests (there are four popular methods for doing this); then find the correlation between the 2 subtests and place it in the formula.

Page 14

1. Split-Half Reliability Method

Page 15

1. Split-Half Reliability Method

The 4 popular methods are:
1. Assign all odd-numbered items to form 1 and all even-numbered items to form 2.
2. Rank-order the items in terms of their difficulty levels (p-values) based on the responses of the examinees; then assign items with odd-numbered ranks to form 1 and those with even-numbered ranks to form 2.

Page 16

1. Split-Half Reliability Method

The four popular methods are (continued):
3. Randomly assign items to the two half-test forms.
4. Assign items to half-test forms so that the forms are "matched" in content, e.g., if there are 6 items on reliability, each half will get 3.

Page 17

1. Split-Half Reliability Method

A high split-half reliability coefficient (e.g., > 0.90) indicates a homogeneous test.

Page 18

1. Split-Half Reliability Method

Exercise: Use the split-half reliability method to calculate the reliability estimate of a test with a reliability coefficient (correlation) of 0.25 for the 2 halves of this test.
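The exercise above is answered with the Spearman-Brown prophecy formula, which steps a half-test correlation up to full-test reliability. A minimal sketch in Python (the 0.25 comes from the exercise; the `factor` parameter is a generalization for other test-length changes):

```python
def spearman_brown(r_half, factor=2.0):
    """Spearman-Brown prophecy formula: estimated reliability of a
    test lengthened by `factor`. With factor=2 it corrects a
    half-test correlation to full-test reliability."""
    return factor * r_half / (1 + (factor - 1) * r_half)

# Exercise: the two halves of the test correlate at r = 0.25
print(spearman_brown(0.25))  # 2(0.25) / (1 + 0.25) = 0.4
```

So a half-test correlation of 0.25 corresponds to an estimated full-test reliability of 0.40.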

Page 19

1. Split-Half Reliability Method

Page 20

1. Split-Half Reliability Method: A = X and B = Y

Page 21

Procedures Requiring 1 Test Administration

A. Internal Consistency Reliability (ICR)

2. Cronbach's Alpha Method: used for a wide range of scoring, both non-dichotomously and dichotomously scored exams. Cronbach's alpha (α) is a preferred statistic. (Lee Cronbach)

Page 22

Procedures Requiring 1 Test Administration

Page 23

Cronbach's α for composite tests, where K is the number of tests/subtests:

α = (K / (K − 1)) × (1 − Σσ²ᵢ / σ²x)

Page 24

A. Internal Consistency Reliability (ICR)

2. Cronbach's Alpha Method (coefficient α is a preferred statistic)

Ex.: Suppose that the examinees are tested on 4 essay items and the maximum score for each is 10 points. The variances for the items are as follows: σ²₁ = 9, σ²₂ = 4.8, σ²₃ = 10.2, and σ²₄ = 16. If the total score variance is σ²x = 100, use Cronbach's Alpha Method to calculate the internal consistency of this test. A high α coefficient (e.g., > 0.90) indicates a homogeneous test.
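Plugging the slide's numbers into the standard Cronbach's alpha formula, α = (K / (K − 1)) × (1 − Σσ²ᵢ / σ²x), can be sketched as follows (the variances are the ones given above):

```python
def cronbach_alpha(item_variances, total_variance):
    """Cronbach's alpha from item variances and total-score variance:
    alpha = (K / (K - 1)) * (1 - sum(item variances) / total variance)."""
    k = len(item_variances)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Worked example from the slide: 4 essay items, total variance 100
alpha = cronbach_alpha([9, 4.8, 10.2, 16], 100)
print(alpha)  # (4/3) * (1 - 40/100) = 0.8
```

An α of 0.80 is respectable, though below the > 0.90 benchmark for a highly homogeneous test.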

Page 25

Page 26

Cronbach's Alpha Method

Page 27

Page 28

Procedures Requiring 1 Test Administration

A. Internal Consistency Reliability (ICR)

3. Kuder-Richardson Method

The Kuder-Richardson Formula 20 (KR-20) was first published in 1937. It is a measure of internal consistency reliability for measures with dichotomous choices. It is analogous to Cronbach's α, except that Cronbach's α is also used for non-dichotomous tests. Here pq = σ²ᵢ. A high KR-20 coefficient (e.g., > 0.90) indicates a homogeneous test.
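KR-20 can be sketched the same way. The formula below is the standard KR-20, but the item proportions are hypothetical, invented for illustration (the 4.08 total-score variance is the one mentioned with the book's Table 7.1 example):

```python
def kr20(item_p, total_variance):
    """Kuder-Richardson Formula 20 for dichotomously scored items:
    KR-20 = (K / (K - 1)) * (1 - sum(p*q) / total variance),
    where p is each item's proportion correct and q = 1 - p."""
    k = len(item_p)
    sum_pq = sum(p * (1 - p) for p in item_p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical 5-item test: proportions correct per item, and a
# total-score variance of 4.08 (the value used in the book's example)
p_values = [0.9, 0.8, 0.5, 0.6, 0.7]
print(round(kr20(p_values, 4.08), 2))  # ≈ 0.96
```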

Page 29

Procedures Requiring 1 Test Administration

Page 30

Procedures Requiring 1 Test Administration

Page 31

3. Kuder-Richardson Method (KR-20 and KR-21): see Table 7.1 or the data on p. 136.

Page 32

Variance = square of the standard deviation = 4.08

Page 33

Procedures Requiring 1 Test Administration

A. Internal Consistency Reliability (ICR)

3. Kuder-Richardson Method (KR-21)

It is used only with dichotomously scored items. It does not require computing each item's variance; you do it once for all items, using the total test score variance (σ²x). See Table 7.1 for the standard deviation and variance for all items.

It assumes all items are equal in difficulty.
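Because it assumes equally difficult items, KR-21 needs only the number of items, the test mean, and the total-score variance. A sketch using the standard KR-21 formula (the 10-item count and mean of 7 below are made up for illustration):

```python
def kr21(k, mean, total_variance):
    """Kuder-Richardson Formula 21: assumes all items are equally
    difficult, so only the test mean and total variance are needed.
    KR-21 = (K / (K - 1)) * (1 - M * (K - M) / (K * variance))."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * total_variance))

# Hypothetical 10-item test with mean score 7 and variance 4.08
print(round(kr21(10, 7.0, 4.08), 2))  # ≈ 0.54
```

KR-21 typically runs a bit lower than KR-20 when item difficulties actually vary, which is the price of the equal-difficulty shortcut.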

Page 34

Procedures Requiring 1 Test Administration

Page 35

Procedures Requiring 1 Test Administration

A. Internal Consistency Reliability (ICR)

4. Hoyt's (1941) Method

Hoyt used ANOVA to obtain the variance, or MS (mean square), to calculate Hoyt's coefficient.

MS = σ² = S² = variance
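A common form of Hoyt's coefficient, computed from the ANOVA mean squares just described, is (MS persons − MS residual) / MS persons, which is algebraically equivalent to Cronbach's alpha. A sketch with hypothetical mean squares:

```python
def hoyt_reliability(ms_persons, ms_residual):
    """Hoyt's (1941) ANOVA-based reliability:
    r = (MS_persons - MS_residual) / MS_persons,
    where MS_persons is the between-examinees mean square and
    MS_residual the persons-by-items interaction mean square."""
    return (ms_persons - ms_residual) / ms_persons

# Hypothetical mean squares from a persons x items ANOVA table
print(hoyt_reliability(8.0, 2.0))  # (8 - 2) / 8 = 0.75
```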

Page 36

Procedures Requiring 1 Test Administration

Page 37

4. Hoyt's (1941) Method

MS persons = MS within; MS items = MS between.

MS residual has its own calculation; it is not equal to MS total.

Page 38

Procedures Requiring 1 Test Administration

B. Inter-Rater Reliability

It is a measure of consistency from rater to rater.

It is a measure of agreement between the raters.

Page 39

Procedures Requiring 1 Test Administration

B. Inter-Rater Reliability

Item   Rater 1   Rater 2
1      4         3
2      3         5
3      5         5
4      4         2
5      1         2

First compute r(rater1, rater2); then multiply by 100.

Page 40

Procedures Requiring 1 Test Administration

B. Inter-Rater Reliability, with more than 2 raters (raters 1, 2, and 3):
- Calculate r for raters 1 & 2 = .6
- Calculate r for raters 1 & 3 = .7
- Calculate r for raters 2 & 3 = .8
- μ = .7 × 100 = 70%

Page 41

Factors that Affect Reliability Coefficients

1. Group homogeneity
2. Test length
3. Time limit

Page 42

Factors that Affect Reliability Coefficients

1. Group homogeneity: If a sample of examinees is highly homogeneous on the construct being measured, the reliability estimate will be lower than if the sample were more heterogeneous.

2. Test length: Longer tests are more reliable than shorter tests. The effect of changing test length can be estimated using the Spearman-Brown Prophecy Formula.

3. Time limit: This refers to a test with a rigid time limit. Because some examinees finish but others don't, a rigid time limit artificially inflates the test's reliability coefficient.

Page 43

Reporting Reliability Data

According to the Standards for Educational and Psychological Testing:

1. Results of different reliability studies should be reported to take into account the different sources of measurement error that are most relevant to score use.

2. The standard error of measurement and score bands for different confidence intervals should accompany each reliability estimate.

3. Reliability and standard error estimates should be reported for subtest scores as well as the total test score.

Page 44

Reporting Reliability Data

4. Procedures and samples used in reliability studies should be described in sufficient detail to permit users to determine the similarity between the conditions of the reliability study and their local situations.

5. When a test is normally used for a particular population of examinees (e.g., those within a grade level or those who have a particular handicap), the reliability estimate and standard error of measurement should be reported separately for each such specialized population.

Page 45

Reporting Reliability Data

6. When test scores are used primarily for describing or comparing group performance, the reliability and standard error of measurement for aggregated observations should be reported.

7. If standard errors of measurement are estimated using a model such as the binomial model, this should be clearly indicated, because users will otherwise probably assume that the classical standard error of measurement is being reported. (A binomial model is characterized by trials that end either in success (heads) or failure (tails).)

Page 46

CHAPTER 8
Introduction to Generalizability Theory
Cronbach (1963)

Page 47

CHAPTER 8: Introduction to Generalizability Theory (Cronbach, 1963)

Generalizability is another way to calculate the reliability of a test, by using ANOVA.

Generalizability refers to the degree to which a particular set of measurements of an examinee generalizes to a more extensive set of measurements of that examinee (just like conducting inferential research).

Page 48

Introduction to Generalizability: Generalizability Coefficient

FYI: in Classical True Score Theory, reliability was defined as the ratio of the true score to the observed score:

Reliability = T / (T + E)

Also, an examinee's true score is defined as the average (mean) of a large number of strictly parallel measurements, and the true score variance σ²T is defined as the variance of these averages.

Reliability coefficient: ρ(X1, X2) = σ²T / σ²X

Page 49

Introduction to Generalizability: Generalizability Coefficient

In Generalizability theory, an examinee's universe score is defined as the average (mean) of the measurements in the universe of generalization. (The universe score is the same as the true score in classical test theory.)

Page 50

Introduction to Generalizability: Generalizability Coefficient

The generalizability coefficient, ρ, is defined as the ratio of universe score variance (σ²μ) to expected observed score variance (Eσ²X):

Generalizability coefficient = ρ = σ²μ / Eσ²X

Ex.: if the expected observed score variance Eσ²X = 10 and the universe score variance σ²μ = 5, then the generalizability coefficient is 5/10 = 0.5.

Page 51

Introduction to Generalizability: Key Terms

Universe: a universe is a set of measurement conditions which is more extensive than the conditions under which the sample measurements were obtained.

Ex.: If you took the Test Construction exam here at CAU, then the universe (or generalization) is when you take test construction exams at several other universities:

University   Score
CAU          85
FIU          90
FAU          84
NSU          80
UM           88

μ = 85.40 is called the universe score.

Page 52

Introduction to Generalizability: Key Terms

Universe score: It is the same as the true score in Classical Test Theory. It is the average (mean) of the measurements in the universe of generalization.

Ex.: The mean of your scores on the Test Construction exams you took at other universities is your universe score (see the previous slide).

Page 53

Introduction to Generalizability: Key Terms

Facets: facets are a part or aspect of something; also, a set of measurement conditions. (Example on the next slide.)

Page 54

Introduction to Generalizability

Facets: Example

If two supervisors want to rate the performance of factory workers under three workloads (heavy, medium, and light), how many sets of measurements (facets) will we have? (See the next slide.)

Page 55

(The same example, with the two facets, supervisors and workloads, highlighted.)

Page 56

Introduction to Generalizability

Facets: The two sets of measurement conditions, or the two facets, are:
1. the supervisors (one and two);
2. the workloads (heavy, medium, and light).

(Example 2 on the next slide.)

Page 57

Introduction to Generalizability

Facets: A researcher measures students' compositional writing on four occasions. On each occasion, each student writes compositions on two different topics. All compositions are graded by three different raters. How many facets does this design involve? (See the next slide.)

Page 58

Introduction to Generalizability

Facets: The design involves three facets: the occasions (four), the topics (two), and the raters (three).

Page 59

Introduction to Generalizability: Key Terms

Universe of generalization: the universe of generalization is all of the measurement conditions for the second set of measurements, or "universe," such as fatigue, room temperature, specification, etc.

Ex.: all of the conditions under which you took your test-construction exams at other universities.

Page 60

Introduction to Generalizability

Generalizability theory distinguishes between Generalizability Studies (G-studies) and Decision Studies (D-studies).

G-studies: G-studies are concerned with the extent to which a sample of measurements generalizes to a universe of measurements. It is the study of generalizability procedures.

Page 61

Generalizability Studies (G-studies) and Decision Studies (D-studies)

D-studies: D-studies refer to providing data for making decisions about examinees. It is about the adequacy of measurement. (Example on the next slide.)

Page 62

Generalizability Studies (G-studies) and Decision Studies (D-studies)

Ex.: Suppose we use an achievement test to test 2,000 children from public and 2,000 children from private schools.

If we want to know whether this test is equally reliable for both types of schools, then we are dealing with a G-study (quality of measurement). Ex.: we can generalize a test to these two different school populations, i.e., CAU and FIU doctoral students taking the EPPP exam.

Page 63

Generalizability Studies (G-studies) and Decision Studies (D-studies)

However, if we want to compare the means of these different types of schools (using data) and draw a conclusion about differences in the adequacy of the two educational systems, then we are dealing with a D-study. Ex.: compare the means of CAU and FIU doctoral students who took the EPPP exam.

Page 64

Introduction to Generalizability

Generalizability designs: There are 4 different generalizability designs in generalizability theory.

_ stands for an examinee.
+ stands for a rater or examiner.

Page 65

Generalizability designs:

1. _ _ _ _ _ _ _ _ _ _
   +
   One rater rates each one of the examinees.

2. _ _ _ _ _ _ _ _ _ _
   + + +
   A group of raters rates each one of the examinees.

3. _ _ _ _ _ _ _ _ _ _
   + + + + + + + + + +
   One rater rates only one examinee.

4. _ _ _ _ _ _ _ _ _ _
   +++ +++ +++ +++ +++ +++ +++ +++ +++ +++
   Each examinee is rated by a different group of raters (most expensive).