Item Parameter Estimation: Does WinBUGS Do Better Than BILOG-MG?
Bayesian Statistics, Fall 2009
Chunyan Liu & James Gambrell
Introduction
3 Parameter IRT Model
Assigns each item a logistic function with a variable lower asymptote.
Purpose
Compare BILOG-MG and WinBUGS estimation of item parameters under the 3 parameter logistic (3PL) IRT model
Investigate the effect of sample size on the estimation of item parameters
BILOG-MG (Mislevy & Bock, 1985)
Proprietary software
Uses unknown estimation shortcuts
Sometimes gives poor results
“Black Box” program
Very fast estimation
Provides only point estimates and standard errors for model parameters
Estimation method
◦ Marginal Maximum Likelihood
◦ Expectation-Maximization algorithm (Bock and Aitkin, 1981)
WinBUGS
More open-source (related to OpenBUGS)
More widely studied
Might give more robust results
Much more flexible
Provides full posterior densities for model parameters
More output to evaluate convergence
Very slow estimation!
Literature Review
Most researchers have used custom-built MCMC samplers using the Metropolis-Hastings-within-Gibbs algorithm
◦ as recommended by Cowles, 1996!
Patz and Junker (1999a & b)
◦ Wrote MCMC sampler in S-PLUS
◦ Found that their sampler produced estimates identical to BILOG for the 2PL model, but had some trouble with 3PL models.
◦ Found MCMC was superior at handling missing data.
Literature Review
Jones and Nediak (2000)
◦ Developed “commercial grade” sampler in C++
◦ Improved the Patz and Junker algorithm
◦ Compared MCMC results to BILOG using both real and simulated data
◦ Found that item parameters varied substantially, but the ICCs described were close according to the Hellinger deviance criterion
◦ MCMC and BILOG were similar for real data
◦ MCMC was superior for simulated data
◦ Note that MCMC provides much more diagnostic output to assess convergence problems
Literature Review
Proctor, Teo, Hou, and Hsieh (2005 project for this class!)
◦ Compared BILOG to WinBUGS
◦ Fit a 2PL model
◦ Only simulated a single replication
◦ Did not use deviance or RMSE to assess error
Data
Test: 36-item multiple choice
Item parameters (a, b and c) come from Chapter 6 of Equating, Scaling and Linking (Kolen and Brennan)
◦ Treated as true item parameters (see Appendix)
Item responses simulated using the 3PL model
$$p(\theta) = c + \frac{1 - c}{1 + \exp(-1.7\,a\,(\theta - b))}$$
a – slope, b – difficulty, c – guessing, θ – examinee ability
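To make the response function concrete, here is a minimal Python sketch of the 3PL curve as reconstructed from the slide. The function name `p_3pl` and the default argument `D=1.7` are illustrative choices, not part of the original presentation. The example values are item 1 from the appendix table.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    a: slope, b: difficulty, c: guessing (lower asymptote),
    theta: examinee ability, D: scaling constant (1.7 on the slide).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Item 1 from the appendix: a=0.5496, b=-1.796, c=0.1751
for theta in (-3.0, -1.796, 0.0, 3.0):
    print(theta, round(p_3pl(theta, 0.5496, -1.796, 0.1751), 4))
```

When theta equals b, the probability is halfway between c and 1, which is a quick sanity check on any implementation.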
Methods
1. N θ values (N = 200, 500, 1000, 2000) were generated from the N(0,1) distribution.
2. N item responses were simulated based on the θ’s generated in step 1 and the true item parameters using the 3PL model.
3. Item parameters (a, b, c for the 36 items) were estimated using BILOG-MG based on the N item responses.
4. Item parameters (a, b, c for the 36 items) were estimated using WinBUGS based on the N item responses using the same prior as specified by BILOG-MG.
5. Steps 2 through 4 were repeated 100 times, so for each item we have 100 estimated parameter sets from both programs.
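Steps 1 and 2 of the procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the seed, and the use of three items instead of all 36 are hypothetical, and the authors' own simulation code is not shown in the slides.

```python
import math
import random

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 as on the Data slide)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def simulate_responses(n_examinees, items, rng):
    """Step 1: draw abilities from N(0,1). Step 2: draw 0/1 responses."""
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    resp = [
        [1 if rng.random() < p_3pl(t, a, b, c) else 0 for (a, b, c) in items]
        for t in thetas
    ]
    return thetas, resp

# First three (a, b, c) triples from the appendix table; the study used all 36.
items = [
    (0.5496, -1.796, 0.1751),
    (0.7891, -0.4796, 0.1165),
    (0.4551, -0.7101, 0.2087),
]
rng = random.Random(2009)  # arbitrary seed, chosen only for reproducibility
thetas, resp = simulate_responses(200, items, rng)
print(len(resp), "examinees x", len(resp[0]), "items")
```

The resulting 0/1 matrix is what would be handed to each estimation program in steps 3 and 4.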
Priors
a[i] ~ dlnorm(0, 4)
b[i] ~ dnorm(0, 0.25)
c[i] ~ dbeta(5,17)
Same priors used in BILOG and WinBUGS
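In the BUGS language the second argument of dnorm and dlnorm is a precision (1/variance), not a standard deviation. The short Python sketch below converts the priors above to the more familiar standard-deviation scale and computes the mean and standard deviation of the beta prior. The helper name `precision_to_sd` is illustrative.

```python
import math

def precision_to_sd(tau):
    """Convert a BUGS precision (1 / variance) to a standard deviation."""
    return 1.0 / math.sqrt(tau)

sd_log_a = precision_to_sd(4)      # log(a) ~ Normal(0, sd = 0.5)
sd_b = precision_to_sd(0.25)       # b ~ Normal(0, sd = 2)

alpha, beta = 5, 17                # c ~ Beta(5, 17)
mean_c = alpha / (alpha + beta)
sd_c = math.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))

print("sd of log(a):", sd_log_a)
print("sd of b:", sd_b)
print("prior mean and sd of c:", round(mean_c, 4), round(sd_c, 4))
```

Read this way, the prior on c is centered a little above 0.2, and the prior on b is fairly diffuse relative to the N(0,1) ability scale.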
Criterion: Root Mean Square Error (RMSE)
For each item, we computed the RMSE for a, b, and c using the same formula
$$\mathrm{RMSE}(\hat{x}) = \sqrt{\mathrm{Bias}(\hat{x})^2 + \mathrm{sd}(\hat{x})^2}$$
where
$$\mathrm{Bias}(\hat{x}) = E(\hat{x}) - x$$
and
$$E(\hat{x}) = \frac{1}{100}\sum_{i=1}^{100} \hat{x}_i$$
Here $\hat{x}$ could be $\hat{a}$, $\hat{b}$, or $\hat{c}$, and $x$ could be the true value of the parameter a, b, or c.
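The criterion can be computed as in the following Python sketch. It assumes the standard deviation is taken over the replications with divisor R (the population form), which makes Bias² + sd² equal the mean squared error exactly. The slides do not say which divisor was used, and the function name and toy numbers are illustrative only.

```python
import math
import statistics

def rmse_components(estimates, true_value):
    """Return (bias, sd, rmse) for one parameter of one item.

    estimates: the replicated estimates (100 per item in the study).
    Assumption: sd uses divisor R (population SD), so that
    bias**2 + sd**2 equals the mean squared error exactly.
    """
    mean_est = statistics.mean(estimates)
    bias = mean_est - true_value
    sd = statistics.pstdev(estimates)
    rmse = math.sqrt(bias ** 2 + sd ** 2)
    return bias, sd, rmse

# Toy example with three made-up estimates of a parameter whose true value is 0.55
estimates = [0.5, 0.6, 0.7]
bias, sd, rmse = rmse_components(estimates, 0.55)

# Cross-check against the direct definition of RMSE
direct = math.sqrt(sum((e - 0.55) ** 2 for e in estimates) / len(estimates))
print(round(bias, 4), round(sd, 4), round(rmse, 4), round(direct, 4))
```

With the sample standard deviation (divisor R - 1) the two quantities would differ slightly, though the difference is small at 100 replications.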
Results
1. Deciding the number of Burn-in Iterations: History Plots
[Figure: history plots for a[28], b[28], and c[28], chains 1:3, iterations 1 to 6000]
Results-cont.
1. Deciding the number of Burn-in Iterations: Autocorrelation and BGR plots
[Figure: autocorrelation plots (lag 0 to 40) and BGR plots (start-iteration 1051 to 3000) for a[28], b[28], and c[28], chains 1:3]
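For readers unfamiliar with multi-chain convergence checks, the Python sketch below computes the classical variance-based Gelman-Rubin potential scale reduction factor. This is a stand-in for illustration: the BGR plot in WinBUGS is based on interval widths rather than variances, so the numbers are not the same quantity that WinBUGS draws. The function name and the synthetic chains are hypothetical.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Classical potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.mean(ch) for ch in chains]
    # W: average within-chain variance; B: between-chain variance times n
    W = statistics.mean(statistics.variance(ch) for ch in chains)
    B = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * W + B / n
    return math.sqrt(var_hat / W)

rng = random.Random(1)
# Three synthetic chains sampling the same distribution: value close to 1
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three synthetic chains stuck at different locations: value well above 1
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 5.0, 10.0)]

print("mixed:", round(gelman_rubin(mixed), 3))
print("stuck:", round(gelman_rubin(stuck), 3))
```

Values near 1 suggest the chains agree. Values well above 1 suggest a longer burn-in is needed.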
Results-cont.
1. Deciding the number of Burn-in Iterations: Statistics
node  mean      sd       MC error  2.5%      median    97.5%    start  sample
a[1]  0.899     0.1011   0.004938  0.7117    0.8949    1.107    2501   3500
a[2]  1.339     0.1159   0.004132  1.125     1.333     1.58     2501   3500
a[3]  0.7308    0.111    0.005769  0.551     0.717     0.9893   2501   3500
a[4]  2.012     0.2712   0.009897  1.531     1.996     2.59     2501   3500
a[5]  1.766     0.2202   0.009585  1.394     1.745     2.243    2501   3500
b[1]  -1.706    0.2944   0.01793   -2.253    -1.717    -1.1     2501   3500
b[2]  -0.4277   0.1167   0.005916  -0.6571   -0.428    -0.1857  2501   3500
b[3]  -0.7499   0.3967   0.01586   -1.409    -0.7994   0.1348   2501   3500
b[4]  0.4324    0.09295  0.004443  0.2363    0.4384    0.6008   2501   3500
b[5]  -0.05619  0.122    0.006737  -0.3127   -0.05246  0.1657   2501   3500
c[1]  0.2458    0.088    0.004718  0.09253   0.2415    0.4362   2501   3500
c[2]  0.1403    0.04745  0.002158  0.05368   0.139     0.2361   2501   3500
c[3]  0.2538    0.09285  0.005864  0.09991   0.243     0.4557   2501   3500
c[4]  0.2669    0.035    0.001491  0.1911    0.2693    0.3282   2501   3500
c[5]  0.2588    0.05029  0.002589  0.1526    0.261     0.35     2501   3500
Results-cont.
1. Running conditions for WinBUGS
Adaptive phase: 1000 iterations
Burn-in: 1500 iterations
For computing the statistics: 3500 iterations
Using 1 chain
Using the bugs() function to run WinBUGS through R
◦ Needs the BRugs and R2WinBUGS packages
Results-cont.
2. Effect of Sample Size
[Figure: RMSE by item (x-axis: Item, y-axis: RMSE) in six panels: BILOG-MG a, b, c and WinBUGS a, b, c, each with lines for N=200, N=500, and N=1000]
Results-cont.
BILOG-MG vs. WinBUGS – a parameter
[Figure: RMSE by item for the a parameter, BILOG-MG vs. WinBUGS, in four panels: N=200, N=500, N=1000, N=2000]
Results-cont.
BILOG-MG vs. WinBUGS – b parameter
[Figure: RMSE by item for the b parameter, BILOG-MG vs. WinBUGS, in four panels: N=200, N=500, N=1000, N=2000]
Results-cont.
BILOG-MG vs. WinBUGS – c parameter
[Figure: RMSE by item for the c parameter, BILOG-MG vs. WinBUGS, in four panels: N=200, N=500, N=1000, N=2000]
Discussion & Conclusions
Larger sample size decreased RMSE for all parameters under both programs.
For N=200, there was a significant convergence problem for BILOG-MG. No problem with WinBUGS.
Discussion & Conclusions-cont.
Slope parameter “a”
◦ WinBUGS was superior to BILOG when N = 500 or less
◦ More accurately estimated for items without extreme a or b parameters by both programs.
Difficulty parameter “b”
◦ BILOG was superior to WinBUGS when N = 500 or less
◦ Both programs had larger error for items either too difficult or too easy
Guessing parameter “c”
◦ WinBUGS was superior to BILOG at all sample sizes, but especially at N = 1,000 or less
◦ More accurately estimated for difficult items by both programs.
◦ Both programs had larger error for items with shallow slopes.
Limitations
Only one chain was used in the simulation study.
Some of the MC errors are not less than 1/20 of the standard deviation; the MCMC sampler could be run for more iterations.
Simulated data
◦ Conforms to the 3PL model much more closely than real data would
◦ No missing responses
◦ No omit problems
◦ Fewer low scores
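The MC error rule of thumb mentioned above (MC error below 1/20 of the posterior standard deviation) is easy to check mechanically. The Python sketch below applies it to five rows copied from the Statistics slide. The dictionary layout and the function name are illustrative, not part of the original analysis.

```python
# (sd, MC error) pairs copied from the Statistics slide
nodes = {
    "a[1]": (0.1011, 0.004938),
    "a[3]": (0.111, 0.005769),
    "b[1]": (0.2944, 0.01793),
    "b[3]": (0.3967, 0.01586),
    "c[3]": (0.09285, 0.005864),
}

def flag_high_mc_error(nodes, max_ratio=1 / 20):
    """Return the nodes whose MC error is not below max_ratio * sd."""
    return [name for name, (sd, mc) in nodes.items() if mc >= max_ratio * sd]

flagged = flag_high_mc_error(nodes)
for name, (sd, mc) in nodes.items():
    print(name, "ratio =", round(mc / sd, 4), "FLAG" if name in flagged else "ok")
```

For these five rows, a[3], b[1], and c[3] exceed the threshold, which is consistent with the remark that more iterations could be used.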
WinBUGS code for running 3PL
model 3PL;
{
  for (i in 1:N) {                # examinees
    for (j in 1:n) {              # items
      e[i,j] <- exp(a[j]*(theta[i]-b[j]))
      p[i,j] <- c[j]+(1-c[j])*(e[i,j]/(1+e[i,j]))   # 3PL response probability
      resp[i,j] ~ dbern(p[i,j])
    }
    theta[i] ~ dnorm(0,1)         # ability prior
  }
  for (i in 1:n) {                # item parameter priors (dlnorm/dnorm take a precision)
    a[i] ~ dlnorm(0, 4)
    b[i] ~ dnorm(0, 0.25)
    c[i] ~ dbeta(5,17)
  }
}
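The likelihood part of the model above can be mirrored in plain Python, which may help when checking the BUGS code by hand. This is a sketch only: the function name is hypothetical, and it follows the code as transcribed, which does not include the 1.7 scaling constant that appears in the formula on the Data slide.

```python
import math

def log_likelihood(resp, theta, a, b, c):
    """Bernoulli log-likelihood of a 0/1 response matrix under the model above.

    resp[i][j] is the response of examinee i to item j. As in the BUGS code
    as transcribed, no 1.7 scaling constant is applied.
    """
    total = 0.0
    for i, row in enumerate(resp):
        for j, x in enumerate(row):
            e = math.exp(a[j] * (theta[i] - b[j]))
            p = c[j] + (1.0 - c[j]) * (e / (1.0 + e))
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# One examinee at theta = 0, one item with a = 1, b = 0, c = 0.2:
# p = 0.2 + 0.8 * 0.5 = 0.6
print(log_likelihood([[1]], [0.0], [1.0], [0.0], [0.2]))
print(log_likelihood([[0]], [0.0], [1.0], [0.0], [0.2]))
```

Adding the log prior densities of a, b, c, and theta to this quantity would give the unnormalized log posterior that the sampler explores.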
True Item Parameters
item  a       b        c       item  a       b       c
1     0.5496  -1.796   0.1751  19    0.6562  0.3853  0.1201
2     0.7891  -0.4796  0.1165  20    1.0556  0.9481  0.2036
3     0.4551  -0.7101  0.2087  21    0.3479  2.2768  0.1489
4     1.4443  0.4833   0.2826  22    0.8432  1.0601  0.2332
5     0.974   -0.168   0.2625  23    1.1142  0.5826  0.0644
6     0.5839  -0.8567  0.2038  24    1.4579  1.0241  0.2453
7     0.8604  0.4546   0.3224  25    0.5137  1.379   0.1427
8     1.1445  -0.1301  0.2209  26    0.9194  1.0782  0.0879
9     0.7544  0.0212   0.16    27    1.8811  1.4062  0.1992
10    0.917   1.0139   0.3648  28    1.5045  1.5093  0.1642
11    0.9592  0.7218   0.2399  29    0.9664  1.5443  0.1431
12    0.6633  0.0506   0.124   30    0.702   2.2401  0.0853
13    1.2324  0.4167   0.2535  31    1.2651  1.8759  0.2443
14    1.0492  0.7882   0.1569  32    0.8567  1.714   0.0865
15    1.069   0.961    0.2986  33    1.408   1.5556  0.0789
16    0.9193  0.6099   0.2521  34    0.5808  3.4728  0.1399
17    0.8935  0.5128   0.2273  35    0.9257  3.1202  0.109
18    0.9672  0.195    0.0535  36    1.2993  2.1589  0.1075
Acknowledgement
Professor Katie Cowles
Questions?