Optimal Test Design With Rule-Based Item Generation

Hanneke Geerlings

Wim J. van der Linden

Cees A.W. Glas

Item generation and automated test assembly

• Hierarchical item response theory
– A second-level distribution of the item parameters for each family

• Family calibration
– A new item generated from a calibrated family does not need to be calibrated.
– Because only the family parameters are known, the “family-information function” replaces the “item-information function”.
– The results depend on the degree of within-family item-parameter variability across all families.

Three different cases of item generation

• Test assembly from a pool of pregenerated, individually calibrated items
– Baseline

• Test generation on the fly from pools with calibrated item families.

• Test generation on the fly using calibrated radicals that define the item families.

Rule-Based Item Generation

• Radicals
– Their systematic use can help to ensure the content validity of the items
• Incidentals
– Do not have any systematic effect on item difficulty
• Item cloning
– Generating items with identical radicals but different incidentals

Are the Features Radicals or Incidentals?

• Exploratory: existing items are coded on the presence or absence of the specified radicals and then analyzed.

• Confirmatory: developers define their radicals and incidentals in advance, and model checking is used to verify them.

Modeling Rule-Based Item Generation

• Between-family variation: radicals
• Within-family variation: incidentals
• Multilevel response model
– Item cloning model (Glas & van der Linden, 2003)
– Level 1: response model for the individual items
– Level 2: distribution of the item parameters within each family
• Common Σ: all families are generated with the same set of incidentals, and there is no interaction between radicals and incidentals.
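The two levels can be written out as a sketch. The slides do not show the equations, so the form below is an assumption: a three-parameter normal-ogive model at Level 1 (chosen because it is consistent with the value μb = -0.258 used later) and a multivariate normal distribution at Level 2. The source model may apply transformations to a and c that are not shown here.

```latex
\begin{aligned}
\text{Level 1:}\quad & P(U_{pi} = 1 \mid \theta_p)
    = c_i + (1 - c_i)\,\Phi\bigl(a_i(\theta_p - b_i)\bigr) \\
\text{Level 2:}\quad & \xi_i = (a_i, b_i, c_i)^{\top}
    \sim N(\mu_f, \Sigma_f) \quad \text{for item } i \text{ in family } f
\end{aligned}
```

Under the common-Σ restriction, every Σf in Level 2 is replaced by one shared Σ.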

Modeling Rule-Based Item Generation

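• The mean family difficulty is modeled as a weighted sum of the effects of the radicals:

$$
\begin{aligned}
\mu_{b1} &= d_{11}\beta_1 + d_{12}\beta_2 + \cdots + d_{1R}\beta_R \\
\mu_{b2} &= d_{21}\beta_1 + d_{22}\beta_2 + \cdots + d_{2R}\beta_R \\
&\;\;\vdots \\
\mu_{bF} &= d_{F1}\beta_1 + d_{F2}\beta_2 + \cdots + d_{FR}\beta_R
\end{aligned}
$$

– Here $\beta_r$ is the effect of radical $r$ and $d_{fr}$ is its weight in family $f$.
– Linear Item Cloning Model (LICM)
– with Σf: LICM-F
– with Σ: LICM-C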
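The weighted-sum relation for the mean family difficulties can be illustrated with a few lines of Python. This is only a sketch: the design weights and the effects are hypothetical values chosen for readability, and the first column is assumed to act as an intercept.

```python
# Hypothetical design: 3 families, an assumed intercept column plus 2 radicals.
# Neither the weights nor the effects come from the study.
design = [
    [1, 0, 0],  # family 1: no radical present
    [1, 1, 0],  # family 2: radical 1 present
    [1, 1, 1],  # family 3: radicals 1 and 2 present
]
beta = [-1.0, 0.5, 0.8]  # assumed effects: intercept, radical 1, radical 2


def family_mean_difficulties(design, beta):
    """Return mu_b for each family as the weighted sum of radical effects."""
    return [sum(d * b for d, b in zip(row, beta)) for row in design]


mu_b = family_mean_difficulties(design, beta)
print(mu_b)
```

Each row of `design` plays the role of one line of the equation system, so adding a family means adding a row, and adding a radical means adding a column and one more element of `beta`.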

Family-Information Function

• Item-information

• Family information

– Expected information about θ in the response to a random item from family f.
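The idea of information in the response to a random item from a family can be sketched numerically. The slides do not reproduce the formula, so everything below is an assumption: a three-parameter normal-ogive item, only the difficulty b varying within the family (normally distributed), and family information computed from the response function averaged over that distribution. The function names are hypothetical, and the parameter values are borrowed from the variability example in these slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def item_information(theta, a, b, c):
    """Fisher information of a single 3PNO item at ability theta."""
    z = a * (theta - b)
    p = c + (1 - c) * STD_NORMAL.cdf(z)
    dp = (1 - c) * a * STD_NORMAL.pdf(z)
    return dp ** 2 / (p * (1 - p))


def family_information(theta, a, mu_b, var_b, c, n_points=2001):
    """Information in the response to a random item from a family whose
    difficulties follow N(mu_b, var_b); a and c are held fixed (assumption)."""
    if var_b == 0:
        return item_information(theta, a, mu_b, c)
    dist_b = NormalDist(mu_b, var_b ** 0.5)
    # Equal-probability quantile grid that approximates the integral over b.
    bs = [dist_b.inv_cdf((k + 0.5) / n_points) for k in range(n_points)]
    p = sum(c + (1 - c) * STD_NORMAL.cdf(a * (theta - b)) for b in bs) / n_points
    dp = sum((1 - c) * a * STD_NORMAL.pdf(a * (theta - b)) for b in bs) / n_points
    return dp ** 2 / (p * (1 - p))


at_family_mean = item_information(0.0, 1.0, -0.258, 0.2)
with_variability = family_information(0.0, 1.0, -0.258, 0.5, 0.2)
print(at_family_mean, with_variability)
```

With zero within-family variance the two functions coincide. With a positive variance in b the family value at θ = 0 is lower than the item information evaluated at the family mean.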

Three Cases of Automated Test Design

• Test Assembly From a Pregenerated Item Pool


• Test Generation on the Fly From Calibrated Families


• Test Generation on the Fly Using Calibrated Radicals Only

Effect of Within-Family Item-Parameter Variability on Family Information

• σ² = 0 means that all item parameters are equal to their respective family means.

• ρ = .0, .5, or -.5

Family parameter   Mean (μ)                     Variance (σ²)
a                  1                            0 or 0.05
b                  -0.258 (optimal at θ = 0)    0 or 0.5
c                  0.2                          0 or 0.2
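The choice of -0.258 for the mean difficulty can be checked numerically. The sketch below assumes a three-parameter normal-ogive item with argument a(θ - b); the slides do not state the response model, so this parameterization is an assumption. It searches a grid of θ values for the point of maximum item information.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def item_information(theta, a, b, c):
    """Fisher information of a 3PNO item at ability theta."""
    z = a * (theta - b)
    p = c + (1 - c) * STD_NORMAL.cdf(z)
    dp = (1 - c) * a * STD_NORMAL.pdf(z)
    return dp ** 2 / (p * (1 - p))


# Grid search over theta in [-2, 2] with step 0.001.
grid = [i / 1000 for i in range(-2000, 2001)]
theta_max = max(grid, key=lambda t: item_information(t, 1.0, -0.258, 0.2))
print(theta_max)
```

Under these assumptions the maximum falls very close to θ = 0, which matches the note in the table that this difficulty is optimal at θ = 0 for a = 1 and c = 0.2.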

• σ²b > 0: the family information decreases.
• σ²a > 0: the information decreases at abilities away from the optimal difficulty value.
• σ²c > 0: no effect.

• σ²a = 0.05, σ²b = 0.5, σ²c = 0.00
• ρab = .5:
– For θ > μb, the family information increases.
– For θ < μb, the family information decreases.
– Reason: a and b increase together, so the more difficult items are also the more informative ones.

• σ²a = 0.00, σ²b = 0.5, σ²c = 0.2
• ρbc = .5: the family information shifts slightly to the left.
– Larger guessing parameters result in less information.

• σ²a = 0.05, σ²b = 0.00, σ²c = 0.2
• ρac = .5: counterbalancing effect, because a and c increase together.
• ρac = -.5: higher family information.

• If σ² ≠ 0 (ρ ≠ .0), using the item information leads to an overestimate of the information about θ in a random item from the family.

Simulation Study

• Illustrates Case 1 and Case 2
• The use of family information instead of item information
– Effects of test assembly based on knowledge of the item families only

• β = (-2.0, 1.0, 0.3, 0.9, 0.6, 1.2)
• μa ~ 0.8(0.01)1.7, μc ~ 0.1(0.01)0.2
• Common Σ; within-family to between-family (W-B) variance ratio: 0.01, 0.05, 0.1, and 0.2
• 10 or 20 items were sampled from each family distribution.

• M1: without any constraints on radicals
• M2: with constraints on radicals
• θp = -1, 0, 1; Rp = (1, 1, 1) or (1, 2, 1)
• Number of families l = 10 or 20 (one item in each family)
• In M2, each of the five radicals occurs 5 to 6 times (l = 10) or 10 to 12 times (l = 20).
• Case 1 (pregenerated item pool, PIP) vs. Case 2 (calibrated families, CF): function (8) vs. (17)
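The role of the ability points θp and the relative targets Rp can be illustrated with a small brute-force search. This is a sketch under stated assumptions, not the assembly model from the study: it assumes a maximin criterion (maximize the smallest ratio of test information to Rp over the θp), a three-parameter normal-ogive item, and a hypothetical eight-item pool. Real applications would use a mixed-integer programming solver rather than enumeration.

```python
from itertools import combinations
from statistics import NormalDist

STD_NORMAL = NormalDist()


def item_information(theta, a, b, c):
    """Fisher information of a 3PNO item at ability theta."""
    z = a * (theta - b)
    p = c + (1 - c) * STD_NORMAL.cdf(z)
    dp = (1 - c) * a * STD_NORMAL.pdf(z)
    return dp ** 2 / (p * (1 - p))


def maximin_assembly(pool, test_length, thetas, targets):
    """Enumerate all subsets and keep the one whose information, relative to
    the target weights, has the largest minimum over the ability points."""
    best_value, best_subset = float("-inf"), None
    for subset in combinations(range(len(pool)), test_length):
        value = min(
            sum(item_information(t, *pool[i]) for i in subset) / r
            for t, r in zip(thetas, targets)
        )
        if value > best_value:
            best_value, best_subset = value, subset
    return best_subset, best_value


# Hypothetical pool of (a, b, c) triples; not taken from the study.
pool = [(1.0, -1.5, 0.2), (1.2, -1.0, 0.15), (0.9, -0.5, 0.2), (1.4, 0.0, 0.1),
        (1.1, 0.3, 0.2), (1.3, 0.8, 0.15), (0.8, 1.2, 0.2), (1.5, 1.6, 0.1)]
thetas = (-1.0, 0.0, 1.0)

uniform_subset, uniform_value = maximin_assembly(pool, 4, thetas, (1, 1, 1))
peaked_subset, peaked_value = maximin_assembly(pool, 4, thetas, (1, 2, 1))
print(uniform_subset, uniform_value)
print(peaked_subset, peaked_value)
```

Replacing `item_information` with a family-information function would turn this Case 1 style selection of items into a Case 2 style selection of families. Because the peaked target divides the information at θ = 0 by 2, its criterion value can never exceed the value for the uniform target on the same pool.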

• Pool with 10 families and 10 items per family, Rp = (1, 1, 1)
– PIP: as the W-B ratio increases, the information increases, because the most informative items are selected.
– The family information decreases, reflecting the larger uncertainty.
– Family information < true item information (for a large W-B ratio)
– M2 (constrained radicals) < M1 (unconstrained radicals)

• Pool with 10 families and 20 items per family, Rp = (1, 1, 1)
– Doubling the number of items per family gives only a slight increase.

• Pool with 20 families and 10 items per family, Rp = (1, 1, 1)
– The difference between PIP and CF increases.
– The shape is the same.

• Figure 4
• Rp = (1, 2, 1)
– The values are smaller than for the uniform target.

• Figure 5: uniform target
• l = 10, lf = 10, M1
– As the W-B ratio increases, CF decreases and PIP tends toward the target.

• Figure 6. Rp= (1, 2, 1)

Discussion

• Model fit
• Exposure control
• Item uncertainty: capitalization on chance
– Small calibration samples
– Variability of the true item parameters
– Ratio of bank size to test length
• Content constraints can mitigate it.

Questions

• The concepts of radicals and incidentals are somewhat abstract.
• Certain combinations of radicals and incidentals may result in invalid items. How do we know whether an item is invalid?
• The condition "if the cognitive processes involved in solving the test items are known" is nearly impossible to meet.

Questions

• μb=0.258 (Wolfe, 1981)
