Optimal Test Design With Rule-Based Item Generation
Hanneke Geerlings
Wim J. van der Linden
Cees A.W. Glas
Item generation and automated test assembly
• Hierarchical item response theory
– Second level for each family of item parameters.
• Family calibration
– A new item generated from a family does not need to be calibrated.
– Only family parameters are known; the “family-information function” replaces the “item-information function”.
– Family information depends on the degree of within-family item-parameter variability across all families.
Three different cases of item generation
• Test assembly from a pool of pregenerated individually calibrated items.
– Baseline
• Test generation on the fly from pools with calibrated item families.
• Test generation on the fly using calibrated radicals that define the item families.
Rule-Based Item Generation
• Radicals
– Systematic use can help to ensure the content validity of the items
• Incidentals
– Do not have any systematic effect on item difficulty
• Item cloning
– Items with identical radicals but different incidentals
Are the features radicals or incidentals?
• Coding existing items on the presence or absence of the specified radicals and performing
– Exploratory
• Developers define their radicals and incidentals; model checking is then used.
– Confirmatory
Modeling Rule-Based Item Generation
• Between-family variation – radicals
• Within-family variation – incidentals
• Multilevel response model
– Item cloning model (Glas & van der Linden, 2003)
– Level 1
– Level 2
• Common Σ: all families are generated by the same set of incidentals, and there is no interaction between radicals and incidentals
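The Level 1 and Level 2 formulas are not preserved in this transcript. The following is a minimal Python sketch of a two-level item cloning model under assumptions that are not stated on the slide: Level 1 is taken to be a three-parameter normal-ogive (3PNO) model, c + (1 − c)Φ(a(θ − b)), and Level 2 draws (log a, b, logit c) from independent normal distributions (a diagonal Σ, chosen only for brevity). The function names `p_correct` and `sample_item` are hypothetical.

```python
import math
import random


def p_correct(theta, a, b, c):
    """Level 1 (assumed 3PNO): c + (1 - c) * Phi(a * (theta - b))."""
    z = a * (theta - b)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return c + (1.0 - c) * cdf


def sample_item(center, var, rng):
    """Level 2 (assumed): draw one item (a, b, c) from a family.

    center = family location (a, b, c) given on the original scale and
             mapped here to the transformed scale (log a, b, logit c)
    var    = variances of (log a, b, logit c); a diagonal Sigma is an assumption
    """
    a0, b0, c0 = center
    log_a = rng.gauss(math.log(a0), math.sqrt(var[0]))
    b = rng.gauss(b0, math.sqrt(var[1]))
    logit_c = rng.gauss(math.log(c0 / (1.0 - c0)), math.sqrt(var[2]))
    a = math.exp(log_a)
    c = 1.0 / (1.0 + math.exp(-logit_c))
    return a, b, c


rng = random.Random(0)
family_center = (1.0, -0.258, 0.2)
# Five clones of the same family: same radicals, different incidentals.
clones = [sample_item(family_center, (0.05, 0.5, 0.0), rng) for _ in range(5)]
probs = [p_correct(0.0, a, b, c) for a, b, c in clones]
```

Setting all three variances to zero makes every clone identical to the family location, which corresponds to the case where within-family variability is absent.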
Modeling Rule-Based Item Generation
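• The mean family difficulty as a weighted sum of the effects of radicals
– $\mu_{b_f} = \sum_{r=1}^{R} d_{fr}\,\beta_r$ for families $f = 1, \dots, F$, that is:
$\mu_{b_1} = d_{11}\beta_1 + d_{12}\beta_2 + \dots + d_{1R}\beta_R$
$\mu_{b_2} = d_{21}\beta_1 + d_{22}\beta_2 + \dots + d_{2R}\beta_R$
$\vdots$
$\mu_{b_F} = d_{F1}\beta_1 + d_{F2}\beta_2 + \dots + d_{FR}\beta_R$
– Linear Item Cloning Model (LICM)
– with Σf : LICM-F
– with Σ : LICM-C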
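The weighted sum of radical effects can be sketched in a few lines of Python. The design matrix below is hypothetical. The β values are the six listed later for the simulation study; treating the first of them as an intercept (the slides mention five radicals) is an assumption.

```python
def family_mean_difficulty(design_row, beta):
    """Mean family difficulty: sum over r of d_fr * beta_r."""
    return sum(d * b for d, b in zip(design_row, beta))


# Radical effects from the simulation slide; first entry assumed to be an intercept.
beta = [-2.0, 1.0, 0.3, 0.9, 0.6, 1.2]

# Hypothetical design matrix: one row per family, 1 = radical present.
design = [
    [1, 0, 0, 0, 0, 0],  # no radicals, intercept only
    [1, 1, 0, 1, 0, 0],  # radicals 1 and 3
    [1, 1, 1, 1, 1, 1],  # all five radicals
]

mu_b = [family_mean_difficulty(row, beta) for row in design]
```

Families that share a row of the design matrix share the same mean difficulty; their items then differ only through the within-family covariance (Σf in LICM-F, common Σ in LICM-C).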
Family-Information Function
• Item information
• Family information
– Expected information about θ in the response to a random item from family f.
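The formulas for both functions are missing from the transcript. As a rough illustration, the sketch below approximates family information by Monte Carlo. It rests on two assumptions: the response model is a 3PNO, and family information is the Fisher information of the family response probability, that is, the item response probability averaged over the within-family distribution. Only b varies within the family here, so no choice of transformation for a and c is needed. All function names are hypothetical.

```python
import math
import random


def prob_and_slope(theta, a, b, c):
    """Return P(theta) and dP/dtheta for an (assumed) 3PNO item."""
    z = a * (theta - b)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return c + (1.0 - c) * cdf, (1.0 - c) * a * pdf


def item_information(theta, a, b, c):
    """Fisher information about theta in the response to one known item."""
    p, dp = prob_and_slope(theta, a, b, c)
    return dp * dp / (p * (1.0 - p))


def family_information(theta, items):
    """Information in the response to a random item from the family.

    items is a list of (a, b, c) draws from the within-family distribution;
    probability and slope are averaged over the draws before forming the ratio.
    """
    p_sum, dp_sum = 0.0, 0.0
    for a, b, c in items:
        p, dp = prob_and_slope(theta, a, b, c)
        p_sum += p
        dp_sum += dp
    p_bar, dp_bar = p_sum / len(items), dp_sum / len(items)
    return dp_bar * dp_bar / (p_bar * (1.0 - p_bar))


rng = random.Random(1)
# Family with mean difficulty -0.258 and within-family variance 0.5 on b only.
family = [(1.0, rng.gauss(-0.258, math.sqrt(0.5)), 0.2) for _ in range(20000)]

info_item = item_information(0.0, 1.0, -0.258, 0.2)   # item at the family means
info_family = family_information(0.0, family)          # random item from the family
```

With these settings `info_family` comes out below `info_item`, which matches the point that plugging family means into the item-information function overstates the information when within-family variance is nonzero.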
Three Cases of Automated Test Design
• Test Assembly From a Pregenerated Item Pool
• Test Generation on the Fly From Calibrated Families
• Test Generation on the Fly Using Calibrated Radicals Only
Effect of Within-Family Item-Parameter Variability on Family Information
• σ² = 0 means all item parameters are equal to their respective family means.
• ρ = .0, .5 or -.5

Family parameter    Mean (μ)                     Variance (σ²)
a                   1                            0 or 0.05
b                   -0.258 (optimal at θ = 0)    0 or 0.5
c                   0.2                          0 or 0.2
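The note that b = -0.258 is optimal at θ = 0 can be checked numerically. The sketch below assumes a 3PNO response model (the slides do not show the model formula) and searches a grid of difficulty values for the one that maximizes item information at θ = 0, with a = 1 and c = 0.2 as in the table. The function name and the grid are hypothetical choices.

```python
import math


def item_information(theta, a, b, c):
    """Fisher information of an (assumed) 3PNO item at ability theta."""
    z = a * (theta - b)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    p = c + (1.0 - c) * cdf
    slope = (1.0 - c) * a * pdf  # dP/dtheta
    return slope * slope / (p * (1.0 - p))


# Grid search over difficulty b in [-1, 1] with step 0.001, at theta = 0.
grid = [i / 1000.0 for i in range(-1000, 1001)]
best_b = max(grid, key=lambda b: item_information(0.0, 1.0, b, 0.2))
```

Under these assumptions `best_b` should land close to the -0.258 shown in the table: with guessing present, the most informative item is somewhat easier than the examinee's ability.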
• σ²_b: decrease in the family information
• σ²_a: decrease at abilities away from the optimal difficulty value
• σ²_c: no effect
• σ²_a = 0.05, σ²_b = 0.5, σ²_c = 0.00
• ρ_ab = .5:
– θ > μ_b: family information increases
– θ < μ_b: family information decreases
– Because higher a goes with higher b, information increases for θ above μ_b
• σ²_a = 0.00, σ²_b = 0.5, σ²_c = 0.2
• ρ_bc = .5: small shift to the left
– Large guessing results in less information
• σ²_a = 0.05, σ²_b = 0.00, σ²_c = 0.2
• ρ_ac = .5: counterbalancing effect, a↑ c↑
• ρ_ac = -.5: higher family information
• If σ² ≠ 0 (ρ ≠ .0), using item information leads to overestimation of the information about θ in the response to a random item from the family.
Simulation Study
• Illustrate Case 1 and Case 2
• The use of family information instead of item information
– Effects of test assembly based on knowledge of item families only.
• β = (-2.0, 1.0, 0.3, 0.9, 0.6, 1.2)
• μa ~ 0.8(0.01)1.7, μc ~ 0.1(0.01)0.2
• Common Σ, W-B ratio: 0.01, 0.05, 0.1 and 0.2
• 10 or 20 items were sampled from the family distribution
• M1: without any constraints on radicals
• M2: with constraints on radicals
• θp = -1, 0, 1. Rp = (1, 1, 1) or (1, 2, 1)
• Number of families l = 10 or 20 (one item in each family)
• In M2, each of the five radicals occurs
– 5~6 times (l = 10) or 10~12 times (l = 20)
• Case 1 (PIP) vs. Case 2 (CF): function (8) vs. (17)
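Functions (8) and (17) are not reproduced in this transcript. The sketch below only illustrates the general idea of assembling a test against relative targets Rp at ability points θp: it picks the subset whose summed information, divided by Rp, has the largest minimum over the points. Exhaustive enumeration is used here as a stand-in for whatever solver the study actually used, and the information values and candidate names are hypothetical. In Case 1 the candidates would be individual items with item information; in Case 2 they would be families with family information.

```python
from itertools import combinations


def maximin_select(info, targets, n_select):
    """Pick n_select candidates maximizing min over p of total_info(theta_p) / R_p.

    info    = dict: candidate name -> tuple of information values at each theta_p
    targets = tuple of relative target values R_p
    Returns (best objective value, chosen names as a sorted tuple).
    """
    best_value, best_subset = -1.0, None
    for subset in combinations(sorted(info), n_select):
        totals = [sum(info[name][p] for name in subset) for p in range(len(targets))]
        value = min(total / r for total, r in zip(totals, targets))
        if value > best_value:
            best_value, best_subset = value, subset
    return best_value, best_subset


# Hypothetical information values at theta = -1, 0, 1 for five candidates.
info = {
    "A": (0.5, 0.1, 0.0),
    "B": (0.0, 0.1, 0.5),
    "C": (0.1, 0.4, 0.1),
    "D": (0.2, 0.2, 0.2),
    "E": (0.3, 0.1, 0.3),
}

uniform_value, uniform_test = maximin_select(info, (1, 1, 1), 2)
peaked_value, peaked_test = maximin_select(info, (1, 2, 1), 2)
```

In this toy pool the uniform target selects C and E, while the peaked target (1, 2, 1) selects C and D, because the middle ability point then needs twice the information of the outer points. Constraints on radicals, as in M2, would be added by skipping subsets that violate them.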
• 10 families and 10 items per family in pool, Rp = (1, 1, 1)
– PIP: W-B↑, information↑; the most informative items are selected
– Family information↓, large uncertainty
– Fam Inf < True Item Inf (large W-B)
– M2 (constrained radicals) < M1 (unconstrained radicals)
• 10 families and 20 items per family in pool, Rp = (1, 1, 1)
– Doubling the pool size per family gives a slight increase
• 20 families and 10 items per family in pool, Rp = (1, 1, 1)
– The difference between PIP and CF increases
– The shape is the same
• Figure 4
• Rp = (1, 2, 1)
– The values are smaller than for the uniform target
• Figure 5: uniform target
• l = 10, lf = 10, M1
– As W-B increases, CF decreases and PIP tends toward the target
• Figure 6. Rp= (1, 2, 1)
Discussion
• Model fit
• Exposure control
• Item uncertainty – capitalization on chance
– Small calibration samples
– Variability of the true item parameters
– Ratio of bank size to test length
• Content constraints can mitigate it
Questions
• Radicals and incidentals are somewhat abstract.
• Certain combinations of radicals and incidentals may result in an invalid item. How do we know that an item is invalid?
• “If the cognitive processes involved in solving the test items are known” is a condition that is nearly impossible to meet.
Questions
• μb = -0.258 (Wolfe, 1981)