Jeffreys' and BDeu Priors for Model Selection

WITMSE 2016

Helsinki, Finland, September 20

Joe Suzuki (Osaka Univ., Japan)

Goal and Contributions

[Goal] Compare for model selection

• BDeu (Bayesian Dirichlet equivalent uniform)

• Jeffreys prior (T-K estimator)

[Contribution]

Mathematically proves that BDeu violates regularity in model selection

Road Map

1. Bayesian Dirichlet Scores

2. BDeu and Jeffreys Scores

3. A Found Property and its Proof

4. Main Theorem

5. Regularity in Model Selection

6. Summary

Assign a Prob. to each Seq.

Express a Prob. by the product of Cond. Probs.

Simultaneous Probs.

Cond. Probs.
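A minimal sketch of the two views above, assuming a symmetric Dirichlet prior with parameter a on a finite alphabet of size k. The function name and the toy sequence are illustrative and do not come from the slides. The probability of the whole sequence is built as a product of conditional probabilities (c(x) + a) / (i + k a), where c(x) is the number of earlier occurrences of x among the first i symbols:

```python
from fractions import Fraction


def sequential_prob(seq, alphabet_size, a):
    """Probability of `seq` as a product of conditional probabilities.

    Each factor is (count of x so far + a) / (symbols seen so far + alphabet_size * a).
    """
    counts = [0] * alphabet_size
    prob = Fraction(1)
    for i, x in enumerate(seq):
        prob *= (counts[x] + a) / (i + alphabet_size * a)
        counts[x] += 1
    return prob


# a = 1/2 gives the T-K (Krichevsky-Trofimov) estimator for a binary alphabet.
q = sequential_prob([0, 1, 0], alphabet_size=2, a=Fraction(1, 2))
print(q)  # 1/16
```

The product depends only on the symbol counts, so reordering the sequence (for example `[0, 0, 1]`) gives the same probability, and the probabilities of all sequences of a fixed length sum to one.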

BDeu and Jeffreys’ Prior
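For reference, the standard Bayesian Dirichlet form of the conditional score can be written as follows. The notation is an assumption made here, not taken from the slides: c(x, y) is the number of samples with X = x and Y = y, c(x) is the sum of c(x, y) over y, α and β are the numbers of states of X and Y, and δ > 0 is the equivalent sample size.

```latex
Q^n(y^n \mid x^n)
  = \prod_{x} \frac{\Gamma\left(\sum_{y} a(x,y)\right)}{\Gamma\left(c(x) + \sum_{y} a(x,y)\right)}
    \prod_{y} \frac{\Gamma\left(c(x,y) + a(x,y)\right)}{\Gamma\left(a(x,y)\right)},
\qquad
a(x,y) =
\begin{cases}
  1/2 & \text{(Jeffreys')} \\
  \delta / (\alpha\beta) & \text{(BDeu)}
\end{cases}
```

The only difference between the two scores in this form is the hyperparameter a(x, y): it is a constant for Jeffreys' prior, while for BDeu it shrinks as the number of parent states α grows.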

Example 1 : Bayesian Network Structure Learning (BNSL)

Example 2: Independence Testing
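One common Bayesian formulation of independence testing, sketched here under assumptions: compare the score of the independent model, Q(x^n) Q(y^n), with the score of the joint model, Q(x^n, y^n), each weighted by a prior probability. The function names, the prior weight `prior_indep`, and the use of the Jeffreys parameter a = 1/2 throughout are illustrative choices, not details from the slides.

```python
import math
from collections import Counter


def log_marginal(counts, k, a=0.5):
    """Log Dirichlet-multinomial marginal over k symbols with symmetric parameter a.

    `counts` may list only the observed symbols; unobserved ones contribute zero.
    """
    counts = list(counts)
    n = sum(counts)
    out = math.lgamma(k * a) - math.lgamma(n + k * a)
    out += sum(math.lgamma(c + a) - math.lgamma(a) for c in counts)
    return out


def looks_independent(xs, ys, kx, ky, prior_indep=0.5):
    """Return True if the independent model has the larger posterior score."""
    log_indep = (math.log(prior_indep)
                 + log_marginal(Counter(xs).values(), kx)
                 + log_marginal(Counter(ys).values(), ky))
    log_dep = (math.log(1 - prior_indep)
               + log_marginal(Counter(zip(xs, ys)).values(), kx * ky))
    return log_indep >= log_dep


# Toy data: every (x, y) pair appears equally often, versus y being a copy of x.
print(looks_independent([0, 0, 1, 1] * 10, [0, 1, 0, 1] * 10, 2, 2))
print(looks_independent([0, 1] * 20, [0, 1] * 20, 2, 2))
```

Working in log space with `math.lgamma` avoids overflow in the Gamma functions for realistic sample sizes.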

A Motivating Example

A Found Property

Sketch of J(n)>0 for BDeu

Sketch of J(n)≦0 for Jeffreys’

An Intuitive Reasoning

Main Theorem

Examples

more likely

unlikely

Regularity in Model Selection

Fitness + Simplicity → optimal

(-1) x Likelihood + Penalty Term → min

Newton’s Law of Motion

Maxwell’s Equations

If model A is better than model B w.r.t. fitness and simplicity, model A should be chosen (regularity).

Information Criteria, LASSO
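As one concrete instance of "(-1) x Likelihood + Penalty Term", the sketch below uses a BIC/MDL-style penalty of (k/2) log n, where k is the number of free parameters. This particular penalty, the function names, and the toy count tables are assumptions chosen for illustration. The example shows the regular behavior: when an extra parent does not improve the fit, the simpler model gets the smaller score.

```python
import math


def neg_log_likelihood(table):
    """Negative maximized log-likelihood; table[x][y] is the count of Y = y given parent state x."""
    total = 0.0
    for row in table:
        n_row = sum(row)
        for c in row:
            if c > 0:
                total -= c * math.log(c / n_row)
    return total


def penalized_score(table):
    """Fitness term plus a (k / 2) * log(n) penalty, to be minimized."""
    n = sum(sum(row) for row in table)
    k = len(table) * (len(table[0]) - 1)  # free parameters
    return neg_log_likelihood(table) + 0.5 * k * math.log(n)


no_parent = [[20, 20]]                 # Y with no parent
extra_parent = [[10, 10], [10, 10]]    # Y with a binary parent that carries no information

print(penalized_score(no_parent), penalized_score(extra_parent))
```

Both tables have the same likelihood term, so the comparison is decided by the penalty alone and the model without the parent is selected.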

BDeu violates regularity in model selection

[Figure: network structures over the variables X, Y, and Z]
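The following sketch illustrates numerically, on a hypothetical toy dataset, how a larger parent set can win under BDeu even though it fits no better. The data (n = 40, Y = X, and Z a copy of X), the equivalent sample size of 1, and the function names are assumptions made for this example and are not taken from the slides. Both parent sets {X} and {X, Z} determine Y completely, so they have the same fitness and {X} is simpler.

```python
import math


def log_bd_score(table, a):
    """Log Bayesian Dirichlet score; table[u][y] is the count of Y = y under parent state u."""
    score = 0.0
    for row in table:
        k = len(row)
        score += math.lgamma(k * a) - math.lgamma(sum(row) + k * a)
        score += sum(math.lgamma(c + a) - math.lgamma(a) for c in row)
    return score


def jeffreys(table):
    # Constant hyperparameter 1/2 in every cell.
    return log_bd_score(table, 0.5)


def bdeu(table, ess=1.0):
    # Hyperparameter ess / (number of parent states * number of child states).
    return log_bd_score(table, ess / (len(table) * len(table[0])))


# Hypothetical data: n = 40, Y = X, and Z is a copy of X.
parents_x = [[20, 0], [0, 20]]                     # rows: X = 0, 1
parents_xz = [[20, 0], [0, 0], [0, 0], [0, 20]]    # rows: (X, Z) = 00, 01, 10, 11

print("BDeu    :", bdeu(parents_x), bdeu(parents_xz))
print("Jeffreys:", jeffreys(parents_x), jeffreys(parents_xz))
```

On this data the BDeu score is higher for {X, Z} than for {X}, because the BDeu hyperparameter shrinks as the parent set grows, while the Jeffreys score is the same for both parent sets since empty rows contribute nothing.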

B&B for efficient BNSL (Depth First Search)

Those bounds utilize regularity

Campos and Ji (2011) found one such bound (a nice result),

but experiments show that the bound is not efficient.

Designing Pruning rules for BDeu is HARDer.

because regularity cannot be assumed

Bayes Prior

Based on his/her Belief:

Nobody should reject it from a general point of view.

BDeu violates regularity

contradicts Newton, Maxwell, Information Criteria, LASSO, etc.

People might notice that their beliefs have been wrong, after knowing the new result in this paper.

Summary

The prior behind BDeu might have been based on a wrong belief that contradicts regularity in model selection.

Future Work: Consider NML and others in a similar way
