
Page 1: Interactive Learning of Bayesian Networks

Interactive Learning of Bayesian Networks

Andrés Masegosa, Serafín Moral

Department of Computer Science and Artificial Intelligence - University of Granada

[email protected]

Utrecht, July 2012

Page 2: Interactive Learning of Bayesian Networks

Outline

1 Motivation

2 The Bayesian Framework

3 Interactive integration of expert knowledge for model selection

4 Applications:

Learning the parent set of a variable in a BN.

Learning Markov boundaries.

Learning complete Bayesian networks.

5 Conclusions


Page 4: Interactive Learning of Bayesian Networks

Bayesian Networks

Excellent models to graphically represent the dependency structure (Markov condition and d-separation) of the underlying distribution in multivariate domain problems: a very relevant source of knowledge.

Inference tasks: computing marginals, evidence propagation, abductive-inductive reasoning, etc.
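As a brief reminder, the Markov condition means the joint distribution factorizes over the DAG, with Pa(X_i) denoting the parents of X_i:

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```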

Page 5: Interactive Learning of Bayesian Networks

Learning Bayesian Networks from Data

Learning Algorithms

Learning Bayesian networks from data is quite challenging: the DAG space is super-exponential.

There are usually several models that explain the data similarly well.

Bayesian framework: prefer models with high posterior probability given the data.
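To make "super-exponential" concrete: the number a(n) of DAGs over n labelled nodes satisfies Robinson's recurrence (a standard combinatorial result),

```latex
a(n) \;=\; \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k}\, 2^{\,k(n-k)}\, a(n-k), \qquad a(0) = 1,
```

giving a(2) = 3, a(3) = 25, a(4) = 543, and already about 4.2 × 10^18 DAGs for n = 10.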

Page 6: Interactive Learning of Bayesian Networks

Learning Bayesian Networks from Data

Uncertainty in Model Selection

This situation is especially common in problem domains with a high number of variables and low sample sizes.


Page 8: Interactive Learning of Bayesian Networks

Integration of Domain/Expert Knowledge

Find the best statistical model by combining data and expert/domain knowledge.

Emerging approach in gene expression data mining:

There is a growing number of biological knowledge databases.

The model space is usually huge and the number of samples is low (costly to collect).

Many approaches have shifted from being purely data-oriented to trying to include domain knowledge.


Page 10: Interactive Learning of Bayesian Networks

Integration of Expert Knowledge

Previous Works

There have been many attempts to introduce expert knowledge when learning BNs from data.

Via prior distributions: use of specific prior distributions over the possible graph structures to integrate expert knowledge:

The expert assigns higher prior probabilities to the most likely edges.

Via structural restrictions: the expert codifies his/her knowledge as structural restrictions:

The expert defines the existence/absence of arcs and/or edges, and causal ordering restrictions.

The retrieved model should satisfy these restrictions.


Page 13: Interactive Learning of Bayesian Networks

Integration of Domain/Expert Knowledge

Limitations of "prior" expert knowledge:

We are forced to include expert/domain knowledge for each of the elements of the model (e.g. for every possible edge of a BN).

The expert could be biased towards providing the "clearest" knowledge, which may also be the easiest to find in the data.

The system does not help the user to introduce information about the BN structure.


Page 15: Interactive Learning of Bayesian Networks

Integration of Domain/Expert Knowledge

Interactive Integration of Expert Knowledge

The data is analyzed first.

The system only queries the expert about the most uncertain elements, given the information present in the data.

Benefits:

The expert is asked only a small number of questions.

We explicitly show the expert which elements the data do not provide enough evidence about to make a reliable model selection.


Page 18: Interactive Learning of Bayesian Networks

Integration of Domain/Expert Knowledge

Active Interaction with the Expert

Strategy: ask the expert about the presence of the edges that most reduce the model uncertainty.

Method: a framework that allows efficient and effective interaction with the expert.

The expert is only asked about these controversial structural features.


Page 21: Interactive Learning of Bayesian Networks

Notation

Let us denote by X = (X_1, ..., X_n) a vector of random variables and by D a fully observed set of instances x = (x_1, ..., x_n) ∈ Val(X).

Let M be a model and Val(M) the set of all possible models. M may define:

A joint probability distribution P(X|M), in the case of a Bayesian network.

A conditional probability distribution for a target variable, P(T|X, M), in the case of a Markov blanket.


Page 24: Interactive Learning of Bayesian Networks

Examples of Models

Each model M is structured:

It is defined by a vector of elements M = (m_1, ..., m_K), where K is the number of possible components of M.

Examples:

[Figure] A Bayesian network over X_1, ..., X_4, encoded as M = (1, 0, 0, 1, 0, -1).

[Figure] A Markov blanket of a target T over X_1, ..., X_5, encoded as M = (1, 1, 0, 0, 1).


Page 26: Interactive Learning of Bayesian Networks

The Model Selection Problem: Bayesian Framework

Define a prior probability P(M) over the space of alternative models.

It is not the classic uniform prior (the multiplicity problem).

For each model, its Bayesian score is computed:

score(M|D) = P(D|M) P(M)

where P(D|M) = ∫ P(D|θ, M) P(θ|M) dθ is the marginal likelihood of the model.
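For discrete data this marginal likelihood has a closed form under a Dirichlet parameter prior. Below is a minimal sketch of the local term for one variable and one candidate parent set, assuming a symmetric Dirichlet(alpha) prior (a K2-style choice; the prior actually used in the talk is not specified) and data given as a list of dicts:

```python
from math import lgamma
from collections import defaultdict

def log_marginal_likelihood(data, child, parents, alpha=1.0):
    """Local Dirichlet-multinomial term of log P(D|M) for one discrete
    variable `child` and a candidate parent set (symmetric Dirichlet(alpha)
    prior assumed). `data` is a list of dicts mapping variable -> value."""
    child_vals = sorted({row[child] for row in data})
    r = len(child_vals)                              # number of child states
    counts = defaultdict(lambda: defaultdict(int))   # parent config -> child counts
    for row in data:
        pa = tuple(row[p] for p in parents)
        counts[pa][row[child]] += 1

    log_ml = 0.0
    for child_counts in counts.values():
        n_ij = sum(child_counts.values())            # samples with this parent config
        log_ml += lgamma(r * alpha) - lgamma(r * alpha + n_ij)
        for v in child_vals:
            n_ijk = child_counts.get(v, 0)
            log_ml += lgamma(alpha + n_ijk) - lgamma(alpha)
    return log_ml
```

With a decomposable structure prior, log score(M|D) is then the sum of one such local term per variable plus log P(M).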


Page 28: Interactive Learning of Bayesian Networks

Benefits of the Full Bayesian Approach

We assume that we are able to approximate the posterior by means of some Monte Carlo method:

P(M|D) = P(D|M) P(M) / Σ_{M′ ∈ Val(M)} P(D|M′) P(M′)

We can then compute the posterior probability of any of the elements of a model:

P(m_i|D) = Σ_M P(M|D) I_M(m_i)

where I_M(m_i) is the indicator function: 1 if m_i is present in M and 0 otherwise.
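In practice the sum over models is replaced by the Monte Carlo samples. A minimal sketch, assuming each sampled model is represented by its element vector (m_1, ..., m_K) of 0/1 indicators and that the sampler returns (approximate) draws from P(M|D):

```python
from collections import Counter

def element_posteriors(sampled_models):
    """Estimate P(m_i = 1 | D) for every structural element from Monte Carlo
    samples of P(M|D); each sample is a tuple of 0/1 element indicators."""
    totals = Counter()
    for model in sampled_models:
        for i, m_i in enumerate(model):
            totals[i] += m_i
    n = len(sampled_models)
    return {i: totals[i] / n for i in range(len(sampled_models[0]))}

# Hypothetical samples over three candidate edges:
samples = [(1, 0, 1), (1, 1, 1), (1, 0, 0), (1, 1, 1)]
print(element_posteriors(samples))   # {0: 1.0, 1: 0.5, 2: 0.75}
```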


Page 32: Interactive Learning of Bayesian Networks

Examples

Highly probable edges: P(B → X | D) = 1.0 and P(A → X | D) = 0.8.

Improbable edges: P(C → X | D) = 0.0.

Uncertain edges: P(D → X | D) = 0.55.


Page 34: Interactive Learning of Bayesian Networks

Interaction with Expert/Domain Knowledge

Interactive integration of domain/expert knowledge:

Expert/domain knowledge is given for particular elements m_k of the models M:

Whether a variable is present or not in the final variable selection.

Whether there is an edge between two given variables in a BN.

Expert/domain knowledge may not be fully reliable.


Page 37: Interactive Learning of Bayesian Networks

Our Approach for Expert Interaction

[Figure: influence diagram with nodes for the decision Ask_k, the data D, the model M, the element m_k, the expert variable E, and the utility U.]

E represents the expert's reliability: Ω(E) = {Right, Wrong}.

If the expert is wrong, we assume a random answer.

Our goal is to infer the model structure, so the utility of asking is:

U(Ask_k, m_k, M, D) = log P(M | m_k, Ask_k, D)
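Under this answer model (a correct answer when E = Right, a uniformly random answer over the two options otherwise), and writing r = P(E = Right) for the reliability (r is shorthand introduced here, not slide notation), the likelihood of a "present" answer about a binary element m_k and the resulting Bayes update of its posterior are:

```latex
P(\text{present} \mid m_k = 1) = r + \tfrac{1-r}{2}, \qquad
P(\text{present} \mid m_k = 0) = \tfrac{1-r}{2},
\\[6pt]
P(m_k = 1 \mid D, \text{present}) =
\frac{\bigl(r + \tfrac{1-r}{2}\bigr)\, P(m_k = 1 \mid D)}
     {\bigl(r + \tfrac{1-r}{2}\bigr)\, P(m_k = 1 \mid D) + \tfrac{1-r}{2}\, P(m_k = 0 \mid D)}
```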


Page 42: Interactive Learning of Bayesian Networks

Our Approach for Expert Interaction

The expected utilities of asking and of not asking are:

V(Ask_k) = Σ_{m_k} Σ_M P(M|D) P(m_k | M, Ask_k, D) log P(M | m_k, Ask_k, D)

V(¬Ask_k) = Σ_M P(M|D) log P(M|D)

The difference between both actions, V(Ask_k) − V(¬Ask_k), is the information gain:

IG(M : m_k | D) = H(M|D) − Σ_{m_k} P(m_k|D) H(M | m_k, D)

It can be shown that the element with the highest information gain is the one with the highest entropy:

m*_k = argmax_k IG(M : m_k | D) = argmax_k [ H(m_k|D) − H(E_k) ]
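A minimal sketch of this selection step, assuming binary structural elements whose posteriors P(m_k|D) have been estimated as above, and a single constant expert-entropy term H(E_k) for all elements (the helper names and that simplification are assumptions, not the talk's implementation):

```python
import math

def entropy(p):
    """Binary entropy H(p) in bits (0 at p = 0 or p = 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def choose_query(posteriors, expert_entropy=0.0):
    """Return the element k maximizing H(m_k|D) - H(E_k): the most uncertain
    element, discounted by a (here constant) expert-noise term."""
    return max(posteriors, key=lambda k: entropy(posteriors[k]) - expert_entropy)

# With the edge posteriors of Example I (see below):
posts = {"A->X": 0.8, "B->X": 0.455, "C->X": 0.75}
print(choose_query(posts))   # 'B->X' -- the probability closest to 0.5
```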


Page 48: Interactive Learning of Bayesian Networks

Our Approach for Expert Interaction

1 Approximate P(M|D) by means of a Monte Carlo technique.

2 Ask the expert about the element m_k with the highest entropy:

a = a ∪ {a(m_k)}.

Update P(M|D, a), which is equivalent to:

P(M|D, a) ∝ P(D|M) P(M|a)

3 Stop condition: H(m_k | D, a) < λ.

4 Otherwise:

Option 1: go to Step 2 and ask the expert again.

Option 2: go to Step 1 and sample now using P(M|a).

A schematic version of this loop is sketched below.
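A schematic sketch of the loop (Option 2 variant, resampling after each answer); the sampler, entropy computation, expert interface, and answer bookkeeping are hypothetical placeholder callables, not code from the talk:

```python
def interactive_learning(sample_models, entropy_of, query_expert, update_answers,
                         lam=0.1, max_questions=20):
    """Schematic interactive loop. All four helpers are hypothetical placeholders:
      sample_models(answers)    -> Monte Carlo samples from P(M | D, a)
      entropy_of(samples)       -> {element k: H(m_k | D, a)}
      query_expert(k)           -> the expert's answer about element k
      update_answers(a, k, ans) -> a ∪ {a(m_k)}
    """
    answers = []                                   # the answer set a
    for _ in range(max_questions):
        samples = sample_models(answers)           # Step 1: approximate P(M | D, a)
        entropies = entropy_of(samples)
        k = max(entropies, key=entropies.get)      # Step 2: most uncertain element
        if entropies[k] < lam:                     # Step 3: stop condition
            break
        answers = update_answers(answers, k, query_expert(k))
    return sample_models(answers), answers
```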

Page 49: Interactive Learning of Bayesian Networks

Example I

Posterior probabilities of the edges:

P(A → X | D) = 0.8

P(B → X | D) = 0.455

P(C → X | D) = 0.75

Page 50: Interactive Learning of Bayesian Networks

Example II

The expert says that B → X is present in the model:

P(A → X | D, a) = 0.88

P(B → X | D, a) = 1.0

P(C → X | D, a) = 0.77


Page 54: Interactive Learning of Bayesian Networks

Developments

This methodology has been applied to the following model selection problems:

Induce a Bayesian network conditioned on a previously given causal order of the variables.

Induce the Markov blanket of a target variable (feature selection).

Induce Bayesian networks without any restriction.

Page 55: Interactive Learning of Bayesian Networks

Conclusions

We have developed a general method for model selection which allows the inclusion of expert knowledge.

The method is robust even when the expert knowledge is wrong.

The number of interactions is minimized.

It has been successfully applied to different model selection problems.

Page 56: Interactive Learning of Bayesian Networks

Future Work

Develop a new score to measure the impact of the interaction on model selection.

Extend this methodology to other probabilistic graphical models.

Evaluate the impact of the prior over the parameters: the preference among models may change with the parameter prior; detect the problem and let the user choose.

Employ alternative ways to introduce expert knowledge: in a BN we ask about edges (direct causal/probabilistic relationships), but much domain knowledge is about correlations and conditional independencies.