Computerized Adaptive Testing Using Bayesian Networks


Computerized adaptive testing using Bayesian networks

Jiří Vomlel

Academy of Sciences of the Czech Republic

11th July, 2007

J. Vomlel (UTIA AV CR) CAT 11th July, 2007 1 / 25

Educational Testing Service (ETS)

ETS is the world's largest private educational testing organization. It has 2,300 permanent employees.

Numbers of participants in tests in the school year 2000/2001:

3,185,000  SAT I: Reasoning Test and SAT II: Subject Area Tests
2,293,000  PSAT: Preliminary SAT / National Merit Scholarship Qualifying Test
1,421,000  AP: Advanced Placement Program
801,000  The Praxis Series: Professional Assessments for Beginning Teachers and Pre-Professional Skills Tests
787,000  TOEFL: Test of English as a Foreign Language
449,000  GRE: Graduate Record Examinations General Test

ETS has a research unit doing research on adaptive testing using Bayesian networks: http://www.ets.org/research/


Rasch model (G. Rasch, 1960)

Model variables:

Y_{n,i}  binary response variable; its value indicates whether the answer of person n to question i was correct
n = 1, ..., N  person index
i = 1, ..., I  question index

Model parameters:

δ_i  difficulty of question i (a fixed effect)
β_n  ability (knowledge level) of person n (a random effect)


Models for the response variable Y

A deterministic model:

Y_{n,i} = 1 if β_n ≥ δ_i, and 0 otherwise.

The Rasch model replaces this step function by a logistic curve:

P(Y_{n,i} = 1) = exp(β_n − δ_i) / (1 + exp(β_n − δ_i))

[Plot: the logistic curve P(Y_{n,i} = 1 | β_n) as a function of β_n, for δ_i = −2.]

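The logistic response model above is straightforward to evaluate directly; a minimal Python sketch (the function name is ours, not from the slides):

```python
import math

def rasch_response_prob(beta: float, delta: float) -> float:
    """P(Y = 1 | beta, delta) under the Rasch model: logistic in (beta - delta)."""
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))

# A question with difficulty delta = -2 is easy: a student of average
# ability (beta = 0) answers it correctly with high probability.
print(round(rasch_response_prob(0.0, -2.0), 3))  # → 0.881
```

When β_n = δ_i the probability is exactly 0.5, which is one way to read the difficulty parameter: it is the ability level at which the question becomes a coin flip.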

Probability distribution for the random effect β_n

P(β_n) = N(0, σ²),

a normal (Gaussian) distribution with mean zero and variance σ².

[Plot: the density of N(0, 1) on the interval (−3, 3).]


Computations with the Rasch model

We start from the prior N_β(0, 1).

[Plot: the prior density N_β(0, 1).]

After a correct answer to a question with difficulty δ1 = −2, the (unnormalized) posterior becomes

N_β(0, 1) · P(Y = 1 | β, δ1 = −2).

After a further wrong answer to a question with difficulty δ2 = 0:

N_β(0, 1) · P(Y = 1 | β, δ1 = −2) · P(Y = 0 | β, δ2 = 0).

After a wrong answer to a question with difficulty δ3 = +1:

N_β(0, 1) · P(Y = 1 | β, δ1 = −2) · P(Y = 0 | β, δ2 = 0) · P(Y = 0 | β, δ3 = +1).

After a wrong answer to a question with difficulty δ4 = +2:

N_β(0, 1) · P(Y = 1 | β, δ1 = −2) · P(Y = 0 | β, δ2 = 0) · P(Y = 0 | β, δ3 = +1) · P(Y = 0 | β, δ4 = +2).

[Plots: after each answer, the likelihood of the new observation and the updated posterior density over β.]
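The sequential posterior updates above can be reproduced numerically with a simple grid approximation over β; this is a sketch under our own discretization choices (grid range and step), not the method used in the talk:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Grid over beta in [-4, 4], unnormalized N(0, 1) prior.
betas = [i / 100.0 for i in range(-400, 401)]
post = [math.exp(-b * b / 2.0) for b in betas]

# The four observations from the slides: (answer, difficulty).
observations = [(1, -2.0), (0, 0.0), (0, 1.0), (0, 2.0)]
for y, delta in observations:
    for k, b in enumerate(betas):
        p = logistic(b - delta)          # P(Y = 1 | beta, delta)
        post[k] *= p if y == 1 else (1.0 - p)

# Normalize and report the posterior mean of beta.
z = sum(post)
mean = sum(b * w for b, w in zip(betas, post)) / z
print(round(mean, 2))
```

One correct answer to an easy question and three wrong answers to harder ones pull the posterior mean of the ability below zero, as the shrinking posterior plots on the slide suggest.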

Student model and evidence models (R. Almond and R. Mislevy, 1999)

The variables of the models are: (1) skills, abilities, misconceptions, etc., for brevity called skills; the vector of skills is denoted S; and (2) items (questions); the vector of questions is denoted X.

The student model describes the relations between student skills.

The evidence models, one for each item (question), describe the relations of that item to the skills.

[Diagram: a student model over skills S1, ..., S5, and evidence models attaching questions X1, X2, X3 to subsets of those skills.]


Building Bayesian network models

Three basic approaches:

1. Discussions with domain experts: expert knowledge is used to get the structure and parameters of the model.
2. A dataset of records is collected and a machine-learning method is used to construct a model and estimate its parameters.
3. A combination of the previous two: e.g., experts suggest the structure and collected data are used to estimate the parameters.


Example of a simple diagnostic task

We want to diagnose the presence/absence of three skills

S1, S2, S3

using three questions

X1,2, X1,3, X2,3.

The questions depend on the skills, and this dependence is described by the conditional probability distributions

P(X_{i,j} = 1 | S_i = s_i, S_j = s_j) = 1 if (s_i, s_j) = (1, 1), and 0 otherwise.

Assume all answers were wrong, i.e.,

X1,2 = 0, X1,3 = 0, X2,3 = 0.


Reasoning under the assumption of skills' independence

[Diagram: each question X1,2, X1,3, X2,3 is a child of its two skills among S1, S2, S3.]

First, assume the skills are pairwise independent, i.e.,

P(S1, S2, S3) = P(S1) · P(S2) · P(S3),

and, for i = 1, 2, 3 and s_i = 0, 1,

P(S_i = s_i) = 0.5.

Then the conditional probabilities for j = 1, 2, 3 are

P(S_j = 0 | X1,2 = 0, X1,3 = 0, X2,3 = 0) = 0.75,

i.e., we cannot decide with certainty which skills are present and which are absent.

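The value 0.75 can be verified by brute-force enumeration of the eight skill configurations; a small sketch (the variable names are ours):

```python
from itertools import product

# Enumerate all configurations (s1, s2, s3) and keep those consistent
# with all three answers wrong: X_{i,j} = 1 iff S_i = S_j = 1, so
# "all wrong" rules out any pair of skills both present.
consistent = [s for s in product((0, 1), repeat=3)
              if not any(s[i] and s[j] for i, j in ((0, 1), (0, 2), (1, 2)))]

# Under a uniform, independent prior every consistent configuration is
# equally likely, so posteriors are just counts.
p_s1_absent = sum(1 for s in consistent if s[0] == 0) / len(consistent)
print(p_s1_absent)  # → 0.75
```

Four configurations survive (no skill, or exactly one skill present), and S1 is absent in three of them, giving 3/4 = 0.75.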

Modeling dependence between skills

[Diagram: the same network, with the skills now linked to express the hierarchy.]

Now assume there is a deterministic hierarchy among the skills:

S1 ⇒ S2, S2 ⇒ S3.

Then the conditional probabilities are

P(S1 = 0 | X1,2 = 0, X1,3 = 0, X2,3 = 0) = 1
P(S2 = 0 | X1,2 = 0, X1,3 = 0, X2,3 = 0) = 1
P(S3 = 0 | X1,2 = 0, X1,3 = 0, X2,3 = 0) = 0.5

For i = 1, 2, 3: P(S_i | X1,2 = 0, X1,3 = 0, X2,3 = 0) = P(S_i | X2,3 = 0), i.e., X2,3 = 0 alone gives the same information as X1,2 = X1,3 = X2,3 = 0.

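The hierarchical case can be checked by the same enumeration, now restricting the prior to skill states allowed by S1 ⇒ S2 ⇒ S3 (an illustrative sketch, names ours):

```python
from itertools import product

# Under the deterministic hierarchy S1 => S2 => S3, only "staircase"
# states are possible; "all answers wrong" again rules out any pair of
# skills both present.
states = [s for s in product((0, 1), repeat=3)
          if (not s[0] or s[1]) and (not s[1] or s[2])]   # S1=>S2, S2=>S3
consistent = [s for s in states
              if not any(s[i] and s[j] for i, j in ((0, 1), (0, 2), (1, 2)))]

# Posterior probability that each skill is absent, given all answers wrong.
probs = [sum(1 for s in consistent if s[k] == 0) / len(consistent)
         for k in range(3)]
print(probs)  # → [1.0, 1.0, 0.5]
```

Only (0, 0, 0) and (0, 0, 1) survive, which reproduces the three posteriors on the slide: S1 and S2 are certainly absent, while S3 remains a coin flip.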

Fixed test vs. adaptive test

[Diagram: a fixed test asks every student the same sequence Q1, ..., Q10, while an adaptive test branches on each wrong/correct answer, so the next question asked (e.g. Q4, Q7, or Q10 after an answer to Q8) depends on the answers collected so far.]

Computerized Adaptive Testing (CAT)

The goal of CAT is to tailor each test so that it brings the most information about each student.

Two basic steps are repeated:

1. estimation of the knowledge level of the tested student,
2. selection of an appropriate question to ask the student.

Entropy as an information measure:

H(P(S)) = − Σ_s P(S = s) · log P(S = s)

[Plot: the entropy of a binary distribution as a function of the probability, zero at 0 and 1 and maximal at 0.5.]

"The lower the entropy, the more we know."

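The entropy measure on the slide is a one-liner in Python (here in bits, i.e., base-2 logarithm; the slide does not fix the base):

```python
import math

def entropy(dist):
    """Shannon entropy H(P) = -sum_s P(s) * log2 P(s); zero terms are skipped."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

# Entropy is maximal for a uniform (maximally uncertain) distribution.
print(entropy([0.5, 0.5]))  # → 1.0
```

A degenerate distribution such as [1.0, 0.0] has entropy 0: once the skill state is known with certainty, there is nothing left to learn.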

Building strategies using the models

[Diagram: an example strategy tree from a travel-agent domain. The root question "Do you like hot climate?" (X1) leads, for X1 = yes, to "Do you like the sea?" (X3), with leaves "Visit Canary Islands." (X3 = yes) and "Visit Egypt." (X3 = no); for X1 = no, it leads to "Do you like mountains?" (X2), with leaves "Visit Norway." (X2 = yes) and "Visit Denmark." (X2 = no).]

For all terminal nodes (leaves) ℓ ∈ L(s) of a strategy s we define:

- the steps that were performed to get to that node (e.g. questions answered in a certain way), called the collected evidence e_ℓ;
- using the probabilistic model of the domain, we can compute the probability of getting to a terminal node, P(e_ℓ);
- during the process, when we have collected evidence e, we update the probability of getting to a terminal node to the conditional probability P(e_ℓ | e).


Building strategies using the models

For all terminal nodes ℓ ∈ L(s) of a strategy s we also define an evaluation function f : ∪_{s∈S} L(s) → R.

The evaluation function can be, e.g., the information gain. The information gain in a node n of a strategy is

IG(e_n) = H(P(S)) − H(P(S | e_n)).

For each strategy we can compute the expected value of the strategy:

Ef(s) = Σ_{ℓ∈L(s)} P(e_ℓ) · f(e_ℓ).

The goal is to find a strategy that maximizes its expected value. Specifically, we will maximize the expected information gain, which corresponds to minimizing the expected entropy.

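The expected information gain Ef(s) = Σ_ℓ P(e_ℓ) · IG(e_ℓ) can be sketched as follows; the data layout (a list of leaf probabilities paired with posteriors over the skill states) is our own illustrative choice, not taken from the slides:

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

def expected_information_gain(prior, leaves):
    """E[IG] over terminal nodes: sum_l P(e_l) * (H(P(S)) - H(P(S | e_l))).

    `leaves` is a list of (P(e_l), posterior over skill states) pairs,
    one per terminal node of the strategy.
    """
    h0 = entropy(prior)
    return sum(p_leaf * (h0 - entropy(post)) for p_leaf, post in leaves)

# Toy strategy over one binary skill: a single question, answered either
# way with probability 0.5, that fully reveals the skill.
prior = [0.5, 0.5]
leaves = [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])]
print(expected_information_gain(prior, leaves))  # → 1.0
```

Since H(P(S)) is a constant, maximizing the expected information gain is the same as minimizing the expected posterior entropy Σ_ℓ P(e_ℓ) · H(P(S | e_ℓ)), which is exactly the criterion used on the next slide.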

Space of tests of length 2 out of 3 possible questions

[Diagram: all strategy trees of depth 2 that can be built from the questions X1, X2, X3.]

Entropy in a node n:

H(e_n) = H(P(S | e_n)).

Expected entropy of a test t:

EH(t) = Σ_{ℓ∈L(t)} P(e_ℓ) · H(e_ℓ).

Let T be the set of all possible tests (e.g. of a given length). A test t* is optimal iff

t* = arg min_{t∈T} EH(t).

Myopically optimal test: in each step we select a question that minimizes the expected entropy of the test that would terminate after this question (one-step lookahead).

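Myopic (one-step-lookahead) selection can be illustrated on the earlier three-skill example, where X_{i,j} = 1 iff both S_i and S_j are present and the prior over skill states is uniform; the function and variable names below are ours:

```python
import math
from itertools import product

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

PAIRS = {"X12": (0, 1), "X13": (0, 2), "X23": (1, 2)}

def consistent_states(evidence):
    """All skill states (s1, s2, s3) consistent with the answers so far."""
    return [s for s in product((0, 1), repeat=3)
            if all((s[i] and s[j]) == x
                   for q, x in evidence.items()
                   for i, j in [PAIRS[q]])]

def expected_entropy_after(question, evidence):
    """EH = sum_x P(X = x | e) * H(P(S | e, X = x))."""
    states = consistent_states(evidence)
    eh = 0.0
    for x in (0, 1):
        matching = consistent_states({**evidence, question: x})
        if matching:
            p_x = len(matching) / len(states)
            eh += p_x * entropy([1.0 / len(matching)] * len(matching))
    return eh

# Pick the question with the smallest expected entropy; with no evidence
# yet, all three questions are symmetric, so any of them is optimal.
best = min(PAIRS, key=lambda q: expected_entropy_after(q, {}))
print(best, round(expected_entropy_after(best, {}), 3))
```

A full optimal strategy would minimize EH(t) over whole trees; the myopic rule only looks one question ahead, which is cheaper but not guaranteed to find the globally optimal test.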

Adaptive test of basic operations with fractions

Examples of tasks:

T1: (3/4 · 5/6) − 1/8 = 15/24 − 1/8 = 5/8 − 1/8 = 4/8 = 1/2

T2: 1/6 + 1/12 = 2/12 + 1/12 = 3/12 = 1/4

T3: 1/4 · 1 1/2 = 1/4 · 3/2 = 3/8

T4: (1/2 · 1/2) · (1/3 + 1/3) = 1/4 · 2/3 = 2/12 = 1/6
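The four worked solutions can be checked with exact rational arithmetic; note that reading T3 as 1/4 times the mixed number 1 1/2 is our reconstruction of the garbled transcript:

```python
from fractions import Fraction as F

# Verify the four task solutions exactly.
assert (F(3, 4) * F(5, 6)) - F(1, 8) == F(1, 2)               # T1
assert F(1, 6) + F(1, 12) == F(1, 4)                          # T2
assert F(1, 4) * (1 + F(1, 2)) == F(3, 8)                     # T3 (1 1/2 = 3/2)
assert (F(1, 2) * F(1, 2)) * (F(1, 3) + F(1, 3)) == F(1, 6)   # T4
print("all four tasks check out")
```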

Elementary and operational skills

CP   Comparison (common numerator or denominator):  1/2 > 1/3,  2/3 > 1/3
AD   Addition (common denominator):  1/7 + 2/7 = (1+2)/7 = 3/7
SB   Subtraction (common denominator):  2/5 − 1/5 = (2−1)/5 = 1/5
MT   Multiplication:  1/2 · 3/5 = 3/10
CD   Common denominator:  (1/2, 2/3) = (3/6, 4/6)
CL   Canceling out:  4/6 = (2·2)/(2·3) = 2/3
CIM  Conversion to mixed numbers:  7/2 = (3·2+1)/2 = 3 1/2
CMI  Conversion to improper fractions:  3 1/2 = (3·2+1)/2 = 7/2

Misconceptions

Label  Description                  Occurrence
MAD    a/b + c/d = (a+c)/(b+d)      14.8%
MSB    a/b − c/d = (a−c)/(b−d)       9.4%
MMT1   a/b · c/b = (a·c)/b          14.1%
MMT2   a/b · c/b = (a+c)/(b·b)       8.1%
MMT3   a/b · c/d = (a·d)/(b·c)      15.4%
MMT4   a/b · c/d = (a·c)/(b+d)       8.1%
MC     a b/c = (a·b)/c               4.0%

Student model

[Diagram: the student model, a Bayesian network over the elementary skills CP, AD, SB, CMI, CIM, CL, CD, MT, the application skills ACD, ACMI, ACIM, ACL, the misconceptions MAD, MSB, MC, MMT1–MMT4, and two hidden variables HV1, HV2.]

Evidence model for task T1

(3/4 · 5/6) − 1/8 = 15/24 − 1/8 = 5/8 − 1/8 = 4/8 = 1/2

T1 ⇔ MT & CL & ACL & SB & ¬MMT3 & ¬MMT4 & ¬MSB

[Diagram: the evidence model; the answer variable X1 depends on T1 through P(X1 | T1), and T1 depends on the skills MT, CL, ACL, SB and the misconceptions MMT3, MMT4, MSB.]


Evidence model for task T1 connected with the student model

[Diagram: the evidence model for T1 attached to the student model; X1 is connected through T1 to the relevant skills and misconceptions within the full network.]

Skill Prediction Quality

[Plot: skill prediction quality (74–92%) against the number of answered questions (0–20), comparing the adaptive test with fixed tests in descending and ascending order of difficulty.]

Entropy

[Plot: entropy (roughly 7 down to 3) against the number of answered questions (0–20), comparing the adaptive test with fixed tests in descending and ascending order of difficulty.]

Conclusions and References

We have shown how Bayesian networks can be used for the construction of adaptive tests in CAT.

Adaptive tests can substantially reduce the number of questions we need to ask a tested student.

References:

- Howard Wainer, David Thissen, and Robert J. Mislevy. Computerized Adaptive Testing: A Primer. Mahwah, N.J.: Lawrence Erlbaum Associates, 2nd edition, 2000.
- Russell G. Almond and Robert J. Mislevy. Graphical models and computerized adaptive testing. Applied Psychological Measurement, Vol. 23(3), pp. 223–237, 1999.
- Jiří Vomlel. Bayesian networks in educational testing. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 12, Supplementary Issue 1, pp. 83–100, 2004.
