Introduction to plausible values
National Research Coordinators Meeting Madrid, February 2010
NRCMeetingMadrid
February 2010
Content of presentation
• Rationale for scaling
• Rasch model and possible ability estimates
• Shortcomings of point estimates
• Drawing plausible values
• Computation of measurement error
Rationale for IRT scaling of data
• Summarising data instead of dealing with many single items
• Raw scores or percent correct are sample-dependent
• Makes equating possible and can deal with rotated test forms
The ‘Rasch model’
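• Models the probability of responding correctly to an item as
$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$
• Likewise, the probability of NOT responding correctly is modelled as
$$P(X_{ni} = 0) = \frac{1}{1 + \exp(\theta_n - \delta_i)}$$
where $\theta_n$ is the proficiency of student $n$ and $\delta_i$ is the difficulty of item $i$.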
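A minimal Python sketch of the two Rasch probabilities, writing `theta` for the student's proficiency and `delta` for the item's difficulty, both in logits. The function names are chosen here for illustration and do not come from any scaling software.

```python
import math


def rasch_prob_correct(theta: float, delta: float) -> float:
    """Probability of a correct response: exp(theta - delta) / (1 + exp(theta - delta))."""
    e = math.exp(theta - delta)
    return e / (1.0 + e)


def rasch_prob_incorrect(theta: float, delta: float) -> float:
    """Probability of an incorrect response: 1 / (1 + exp(theta - delta))."""
    return 1.0 / (1.0 + math.exp(theta - delta))


# A student whose proficiency equals the item difficulty has a 50% chance of success.
p_equal = rasch_prob_correct(0.0, 0.0)
# A more able student has a higher chance on the same item.
p_able = rasch_prob_correct(1.0, 0.0)
```

The two probabilities always sum to one, and the probability of success depends only on the difference between proficiency and difficulty.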
IRT curves
[Figure: IRT curves; vertical axis from 0 to 1, horizontal axis from -4 to 4]
How might we impute a reasonable proficiency value?
• Choose the proficiency that makes the score most likely
  – Maximum Likelihood Estimate
  – Weighted Likelihood Estimate
• Choose the most likely proficiency for the score
  – empirical Bayes
• Choose a selection of likely proficiencies for the score
  – Multiple imputations (plausible values)
Maximum Likelihood vs. Raw Score
[Figure: maximum likelihood estimates by raw score; axes labelled Proficiency and Score, ticks from 0 to 5]
The Resulting Proficiency Distribution
[Figure: proficiency distribution with points labelled Score 0 to Score 6; horizontal axis: Proficiency on Logit Scale]
Characteristics of Maximum Likelihood Estimates (MLE)
• Unbiased at the individual level given sufficient information, BUT biased towards the ends of the ability scale.
• Arbitrary treatment of perfect and zero scores required
• Discrete scale & measurement error lead to bias in population parameter estimates
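The discreteness and the problem with perfect and zero scores can be illustrated with a small Python sketch. Under the Rasch model the maximum likelihood estimate for a raw score is the proficiency at which the expected score equals the observed score, so a 6-item test yields only a handful of distinct estimates and none at all for scores of 0 and 6. The item difficulties below are hypothetical, and bisection is used here only as a simple solver.

```python
import math


def p_correct(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def mle_for_score(score, deltas, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximum likelihood proficiency for a raw score under the Rasch model.

    Solves sum_i P_i(theta) = score by bisection. Returns None for zero and
    perfect scores, where the likelihood has no finite maximum.
    """
    if score <= 0 or score >= len(deltas):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(p_correct(mid, d) for d in deltas)
        if expected < score:
            lo = mid   # expected score too low: proficiency must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


deltas = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]  # hypothetical item difficulties (logits)
estimates = {s: mle_for_score(s, deltas) for s in range(len(deltas) + 1)}
```

Every student with the same raw score receives the same estimate, which is why a population of MLEs forms a discrete distribution.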
Characteristics of Weighted Likelihood Estimates
• Less biased than MLE
• Provides estimates for perfect and zero scores
• BUT discrete scale & measurement error lead to bias in population parameter estimates
Plausible Values
• What are plausible values?
• Why do we use them?
• How to analyse plausible values?
Purpose of educational tests
• Measure particular students (minimise measurement error of individual estimates)
• Assess populations (minimise error when generalising to the population)
Posterior distributions for test scores on 6 dichotomous items
Empirical Bayes – Expected A-Posteriori estimates (EAP)
Characteristics of EAPs
• Biased at the individual level, but population means are unbiased (variances are NOT)
• Discrete scale, bias & measurement error lead to bias in population parameter estimates
• Requires assumptions about the distribution of proficiency in the population
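As a rough illustration, an EAP can be computed as the mean of the posterior distribution for a raw score, where the posterior is proportional to the likelihood of the score times an assumed population distribution. The Python sketch below assumes a normal population distribution with mean 0 and standard deviation 1, hypothetical item difficulties, and a simple fixed grid in place of the numerical methods used by operational scaling software.

```python
import math


def p_correct(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def score_likelihood(score, deltas, theta):
    """P(raw score | theta), accumulated item by item."""
    probs = [1.0] + [0.0] * len(deltas)      # probs[s] = P(s items correct so far)
    for delta in deltas:
        p = p_correct(theta, delta)
        for s in range(len(deltas), 0, -1):  # go downwards so probs[s - 1] is still the old value
            probs[s] = probs[s] * (1.0 - p) + probs[s - 1] * p
        probs[0] *= 1.0 - p
    return probs[score]


def eap(score, deltas, mu=0.0, sigma=1.0):
    """Posterior mean of proficiency for a raw score (assumed N(mu, sigma^2) population)."""
    grid = [-6.0 + 0.05 * k for k in range(241)]
    post = [score_likelihood(score, deltas, t)
            * math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in grid]
    return sum(t * w for t, w in zip(grid, post)) / sum(post)


deltas = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]  # hypothetical item difficulties (logits)
eaps = [eap(s, deltas) for s in range(len(deltas) + 1)]
```

Unlike the MLE, the EAP is finite for zero and perfect scores, but every estimate is pulled towards the assumed population mean, which is the source of the individual-level bias and of the underestimated variance.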
Plausible Values
[Figure: distributions labelled Score 0 to Score 6; horizontal axis: Proficiency on Logit Scale]
Characteristics of Plausible Values
• Not fair at the student level
• Produces unbiased population parameter estimates
  – if assumptions of scaling are reasonable
• Requires assumptions about the distribution of proficiency
Estimating percentages below benchmark with Plausible Values
Level One Cutpoint
The proportion of plausible values below the cut-point is a better estimator of the percentage below the benchmark than the proportion of EAP, MLE or WLE values below it
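A minimal Python sketch of this estimator: the (weighted) proportion below the cut-point is computed separately for each plausible value and the results are then averaged. The data, the cut-point and the function names are made up for illustration.

```python
def proportion_below(values, cut, weights=None):
    """Weighted proportion of values strictly below the cut-point."""
    if weights is None:
        weights = [1.0] * len(values)
    below = sum(w for v, w in zip(values, weights) if v < cut)
    return below / sum(weights)


def pv_proportion_below(pv_columns, cut, weights=None):
    """Average, over the plausible values, of the proportion below the cut-point."""
    per_pv = [proportion_below(col, cut, weights) for col in pv_columns]
    return sum(per_pv) / len(per_pv)


# Hypothetical data: five plausible values (one list per PV) for four students.
pv_columns = [
    [-1.2, 0.3, 0.8, -0.4],
    [-0.9, -0.1, 1.1, -0.6],
    [-1.5, 0.2, 0.6, 0.1],
    [-0.7, 0.4, 0.9, -0.2],
    [-1.1, -0.3, 1.3, -0.5],
]
share_below = pv_proportion_below(pv_columns, cut=0.0)
```

The plausible values are never averaged per student before applying the cut-point; averaging happens only at the level of the final statistic.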
Methodology of PVs
• Mathematically computing posterior distributions around test scores
• Drawing 5 random values for each assessed individual from the posterior distribution for that individual
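The two steps can be sketched in Python as follows. This is a simplified illustration, not the procedure of any particular scaling program: it assumes a normal population distribution with mean 0 and standard deviation 1, hypothetical item difficulties, and a posterior evaluated on a fixed grid, so the draws fall on grid points rather than on a continuous scale.

```python
import math
import random


def p_correct(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def pattern_likelihood(responses, deltas, theta):
    """Likelihood of a 0/1 response pattern at a given proficiency."""
    like = 1.0
    for x, delta in zip(responses, deltas):
        p = p_correct(theta, delta)
        like *= p if x == 1 else 1.0 - p
    return like


def draw_plausible_values(responses, deltas, n_draws=5, mu=0.0, sigma=1.0, rng=None):
    """Draw n_draws values from the (gridded) posterior of one student."""
    rng = rng or random.Random()
    grid = [-6.0 + 0.05 * k for k in range(241)]
    # Unnormalised posterior: likelihood times assumed N(mu, sigma^2) density.
    weights = [pattern_likelihood(responses, deltas, t)
               * math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in grid]
    return rng.choices(grid, weights=weights, k=n_draws)


deltas = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]  # hypothetical item difficulties (logits)
responses = [1, 1, 0, 1, 0, 0]              # one student's answers to the six items
pvs = draw_plausible_values(responses, deltas, rng=random.Random(2010))
```

The five values differ from one another because they are random draws; that spread is what later carries the measurement error into the analysis.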
What is conditioning?
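• Assuming normal posterior distribution:
$$\theta \sim N(\mu, \sigma^2)$$
• Model sub-populations:
$$\theta \sim N(\mu + \alpha X, \sigma^2)$$
X=0 for boy, X=1 for girl
• With further conditioning variables:
$$\theta \sim N(\mu + \alpha X + \beta Y + \gamma Z + \ldots, \sigma^2)$$
[Figure: distribution curves; vertical axis from 0 to 0.45, horizontal axis from 1 to 69]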
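To illustrate what conditioning does, the Python sketch below computes the posterior mean for the same response pattern under two sub-population distributions that differ only in their mean, using the slide's coding of X=0 for boys and X=1 for girls. The values of `mu`, `alpha` and `sigma`, the item difficulties and the grid are all hypothetical.

```python
import math


def p_correct(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def posterior_mean(responses, deltas, prior_mean, prior_sd):
    """Posterior mean of proficiency on a fixed grid, given a normal sub-population distribution."""
    grid = [-6.0 + 0.05 * k for k in range(241)]
    post = []
    for t in grid:
        like = 1.0
        for x, delta in zip(responses, deltas):
            p = p_correct(t, delta)
            like *= p if x == 1 else 1.0 - p
        post.append(like * math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2))
    return sum(t * w for t, w in zip(grid, post)) / sum(post)


mu, alpha, sigma = 0.0, 0.4, 1.0            # hypothetical regression parameters
deltas = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]  # hypothetical item difficulties (logits)
responses = [1, 1, 1, 0, 0, 0]

mean_boy = posterior_mean(responses, deltas, mu + alpha * 0, sigma)   # X = 0
mean_girl = posterior_mean(responses, deltas, mu + alpha * 1, sigma)  # X = 1
```

With identical responses, the two posteriors are centred differently because the assumed group means differ; the gap is smaller than `alpha` because the responses themselves also carry information. Plausible values drawn from such posteriors preserve the group difference in the population estimates.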
Conditioning Variables
• Plausible values should only be analysed with data that were included in the conditioning (otherwise, results may be biased)
• Aim: Maximise the information included in the conditioning, that is, use as many variables as possible
• To reduce number of conditioning variables, factor scores from principal component analysis were used in ICCS
• Use of classroom dummies takes between-school variation into account (no inclusion of school or teacher questionnaire data needed)
Plausible values
• A model with conditioning variables improves the precision with which ability is predicted (population estimates ONLY)
• Conditioning provides unbiased estimates for modelled parameters.
• Simulation studies comparing PVs, EAPs and WLEs show that
  – Population means are similar
  – WLEs (or MLEs) tend to overestimate variances
  – EAPs tend to underestimate variances
Calculation of measurement error
• As in TIMSS or PIRLS data files, there are five plausible values for cognitive test scales in ICCS
• Using five plausible values enables researchers to obtain estimates of the measurement error
How to analyse PVs - 1
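• Estimated mean is the AVERAGE of the mean for each PV
$$\hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} \hat{\mu}_i$$
• Sampling variance is the AVERAGE of the sampling variance for each PV
$$\hat{\sigma}^2_{(\hat{\mu})} = \frac{1}{M} \sum_{i=1}^{M} \hat{\sigma}^2_{(\hat{\mu}_i)}$$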
How to analyse PVs - 2
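• Measurement variance computed as (with $M = 5$ plausible values):
$$\hat{\sigma}^2_{PV} = \frac{1}{M-1} \sum_{i=1}^{M} (\hat{\mu}_i - \hat{\mu})^2$$
• Total standard error computed from measurement variance $\hat{\sigma}^2_{PV}$ and sampling variance $\hat{\sigma}^2_{(\hat{\mu})}$ as:
$$SE = \sqrt{\hat{\sigma}^2_{(\hat{\mu})} + \left(1 + \frac{1}{M}\right) \hat{\sigma}^2_{PV}}$$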
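The combination rules from these slides (average of the PV estimates, average of the sampling variances, variance between the PV estimates, and the total standard error with the factor 1 + 1/M) can be sketched in Python. The function name and the numbers are invented for illustration.

```python
import math


def combine_pv_results(estimates, sampling_variances):
    """Combine per-PV estimates and sampling variances into one estimate and standard error."""
    m = len(estimates)
    estimate = sum(estimates) / m                      # average of the M estimates
    sampling_var = sum(sampling_variances) / m         # average sampling variance
    measurement_var = sum((e - estimate) ** 2 for e in estimates) / (m - 1)
    total_var = sampling_var + (1.0 + 1.0 / m) * measurement_var
    return {
        "estimate": estimate,
        "sampling_variance": sampling_var,
        "measurement_variance": measurement_var,
        "standard_error": math.sqrt(total_var),
    }


# Hypothetical means and sampling variances obtained from each of five PVs.
pv_means = [500.0, 502.0, 498.0, 501.0, 499.0]
pv_sampling_variances = [9.0, 9.5, 8.5, 9.2, 8.8]
result = combine_pv_results(pv_means, pv_sampling_variances)
```

In this made-up example the combined mean is 500, the average sampling variance is 9.0 and the measurement variance is 2.5, so the total variance is 9.0 + 1.2 × 2.5 = 12.0 and the standard error is about 3.46.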
How to analyse PVs - 3
The mean $\hat{\mu}$ in the preceding formulas can be replaced by any statistic, for instance:
- SD
- Percentile
- Correlation coefficient
- Regression coefficient
- R-square
- etc.
Steps for estimating both sampling and measurement error
• Compute statistic for each PV for fully weighted sample
• Compute statistics for each PV for 75 replicate samples
• Compute sampling error (based on previous steps)
• Compute measurement error
• Combine error variances to calculate standard error
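These steps can be sketched in Python as below. Two points are assumptions rather than statements from the slides: the sampling variance for each PV is taken as the sum of squared differences between the replicate estimates and the full-sample estimate (the usual jackknife repeated replication form), and the replicate weights are assumed to be supplied as one list of weights per replicate sample. All names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean; any other weighted statistic could be passed in instead."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def pv_estimate_with_se(pv_columns, full_weights, replicate_weights,
                        statistic=weighted_mean):
    """Estimate and total standard error from plausible values and replicate weights.

    pv_columns:        one list of student values per plausible value
    full_weights:      final student weights
    replicate_weights: one list of student weights per replicate sample (75 in ICCS)
    """
    m = len(pv_columns)
    # Step 1: statistic for each PV with the full-sample weights.
    full = [statistic(col, full_weights) for col in pv_columns]
    # Steps 2 and 3: statistic for each PV in every replicate sample, then the
    # sampling variance per PV (assumed form: sum of squared deviations).
    sampling_vars = []
    for col, est in zip(pv_columns, full):
        reps = [statistic(col, w) for w in replicate_weights]
        sampling_vars.append(sum((r - est) ** 2 for r in reps))
    estimate = sum(full) / m
    sampling_var = sum(sampling_vars) / m
    # Step 4: measurement variance between the PV estimates.
    measurement_var = sum((e - estimate) ** 2 for e in full) / (m - 1)
    # Step 5: combine both variances into the standard error.
    se = math.sqrt(sampling_var + (1.0 + 1.0 / m) * measurement_var)
    return estimate, se
```

Passing a different function as `statistic` (for example a weighted percentile or regression coefficient) leaves the rest of the procedure unchanged.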
Questions or comments?