Bayesian model choice as a classification problem

Jean-Michel Marin

University of Montpellier, CNRS
Alexander Grothendieck Montpellier Institute

22 May 2019

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 1 / 36

Thanks

Numerous colleagues contributed to parts of this work:

- Christian Robert (Bayesian model selection and Approximate Bayesian Computation)
- Pierre Pudlo (Approximate Bayesian Computation)
- Arnaud Estoup (population genetics applications)
- Judith, Louis, Mohammed, Natesh ...

Introduction

$M$ Bayesian parametric models in competition:

$$f_m(y \mid \theta_m), \quad \pi_m(\theta_m), \quad m = 1, \dots, M$$

Prior probabilities in the model space: $P(M = m)$

Target: the models’ posterior probabilities

$$P(M = m \mid y) \propto P(M = m) \int f_m(y \mid \theta_m)\, \pi_m(\theta_m)\, d\theta_m$$
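As a minimal sketch of the proportionality above, the following turns log marginal likelihoods and prior model probabilities into posterior model probabilities. The function name and the three log-evidence values are hypothetical, chosen only for illustration; the computation is done on the log scale because evidences are typically far too small to exponentiate directly.

```python
import math

def posterior_model_probabilities(log_evidences, prior_probs):
    """Return P(M = m | y) from log marginal likelihoods and prior model probabilities."""
    # log P(M = m) + log of the integral of f_m(y | theta_m) pi_m(theta_m)
    log_unnorm = [math.log(p) + le for p, le in zip(prior_probs, log_evidences)]
    # Subtract the maximum before exponentiating (log-sum-exp trick) to avoid underflow.
    shift = max(log_unnorm)
    weights = [math.exp(v - shift) for v in log_unnorm]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log evidences for three competing models, with a uniform model prior.
log_evidences = [-1000.0, -1002.0, -1005.0]
probs = posterior_model_probabilities(log_evidences, [1 / 3, 1 / 3, 1 / 3])
print(probs)
```

Only differences between log evidences matter here, which is why the hard part in practice is estimating each marginal likelihood, not normalising them.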

Introduction

A key quantity: the marginal likelihood (the evidence)

$$\int f_m(y \mid \theta_m)\, \pi_m(\theta_m)\, d\theta_m$$

Bayesian inference embodies Occam’s razor

[Figure: marginal likelihood as a function of the data y for Model 0 and Model 1, with region R marked]

A simple model, like Model 0, makes only a limited range of predictions; a more powerful model, like Model 1, is able to predict a greater variety of data sets

If the data set falls in region R, the less powerful model will be the more probable model
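The Occam's razor effect can be illustrated numerically with a toy pair of models that is not taken from the talk: Model 0 fixes y ~ N(0, 1), while Model 1 takes y | θ ~ N(θ, 1) with θ ~ N(0, τ²), so that its evidence is the N(0, 1 + τ²) density. The value τ² = 9 and the two test points are arbitrary assumptions.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def evidence_model0(y):
    # Model 0 has no free parameter: y ~ N(0, 1).
    return normal_pdf(y, 0.0, 1.0)

def evidence_model1(y, tau2=9.0):
    # Model 1: y | theta ~ N(theta, 1), theta ~ N(0, tau2).
    # Integrating theta out gives y ~ N(0, 1 + tau2).
    return normal_pdf(y, 0.0, 1.0 + tau2)

for y in (0.5, 4.0):
    b10 = evidence_model1(y) / evidence_model0(y)
    print(f"y = {y}: B10 = {b10:.3g}")
```

Under these assumed settings the simple model has the larger evidence for the observation near zero, and the more flexible model wins for the observation far from zero, because Model 1 spreads its predictive mass over a wider range of data sets.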

Introduction

The marginal likelihood corresponds to a penalized likelihood

The BIC information criterion comes from an asymptotic Laplace approximation of the marginal likelihood

Drton and Plummer (2017): very nice extensions for singular model selection problems

Bayes factor for models M1 and M0

$$B_{10} = \frac{\int f_1(y \mid \theta_1)\, \pi_1(\theta_1)\, d\theta_1}{\int f_0(y \mid \theta_0)\, \pi_0(\theta_0)\, d\theta_0}$$
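The link between BIC and the marginal likelihood can be sketched as follows, under the usual regularity assumptions (n independent observations, a regular model with a $d_m$-dimensional parameter, and a maximum likelihood estimate $\hat{\theta}_m$ in the interior of the parameter space). The symbols $n$, $d_m$ and $\hat{\theta}_m$ are introduced here for illustration and do not appear on the slide.

```latex
\log \int f_m(y \mid \theta_m)\, \pi_m(\theta_m)\, d\theta_m
  = \log f_m(y \mid \hat{\theta}_m) - \frac{d_m}{2} \log n + O_P(1),
\qquad
\mathrm{BIC}_m = -2 \log f_m(y \mid \hat{\theta}_m) + d_m \log n ,
\qquad
\log B_{10} \approx -\tfrac{1}{2} \left( \mathrm{BIC}_1 - \mathrm{BIC}_0 \right).
```

The term $\frac{d_m}{2} \log n$ is the penalty that makes the marginal likelihood behave like a penalized likelihood. The approximation fails for singular models, which is the setting of the Drton and Plummer (2017) extensions.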

Introduction

Difficulties with the Bayesian model choice paradigm

Prior difficulties

- When we have prior information, how do we choose the prior distributions on the parameters of each model in a compatible way?
- When we do not have any prior information, we cannot easily use improper prior distributions
- What about the prior distribution on the model space?

We do not address these crucial questions in this talk

Introduction

Computational difficulties

- How to approximate the marginal likelihoods?
- When the number of models under consideration is huge, how to explore the model space?

We consider the case of a limited number of models and do not address trans-dimensional sampling solutions, like the reversible jump algorithm

Introduction

We concentrate on the crucial question: how to approximate the marginal likelihood, or find the model that maximises it

Two cases: the likelihood is tractable to compute, or it is not

Goal of this talk: show you that, in each case, these problems can be rewritten as classification problems and that the associated estimation methods can be very effective

Introduction

- Tractable likelihood: use of a logistic regression to estimate the marginal likelihood
- Intractable likelihood: approximate Bayesian model choice using random forests

Tractable likelihood: Standard Monte Carlo approximation

The standard Monte Carlo approximation of

$$m(y) = \int f(y \mid \theta)\, \pi(\theta)\, d\theta = \mathbb{E}_\pi\left[f(y \mid \theta)\right]$$

is given by

$$\frac{1}{N} \sum_{i=1}^{N} f(y \mid \theta_i)$$

where $\theta_1, \dots, \theta_N$ is an N-sample from $\pi(\cdot)$

When the prior is far from the posterior: very high variance
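A minimal sketch of this estimator, assuming a conjugate toy model that is not part of the talk: y | θ ~ N(θ, 1) with θ ~ N(0, 1), for which m(y) is the N(0, 2) density and can be used as a check. The observed value, the number of draws and the seed are arbitrary.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Assumed toy model (not from the slides): y | theta ~ N(theta, 1), theta ~ N(0, 1).
y_obs = 1.5
exact_evidence = normal_pdf(y_obs, 0.0, 2.0)  # closed form: y ~ N(0, 1 + 1)

def prior_monte_carlo_evidence(y, n_draws, rng):
    # Average the likelihood f(y | theta_i) over draws theta_i from the prior.
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)
        total += normal_pdf(y, theta, 1.0)
    return total / n_draws

rng = random.Random(0)
estimate = prior_monte_carlo_evidence(y_obs, 20000, rng)
print(estimate, exact_evidence)
```

The estimator behaves well here only because prior and posterior overlap substantially. With a diffuse prior or many observations, most prior draws would land where the likelihood is negligible, which is the high-variance regime mentioned above.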

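Tractable likelihood: Importance sampling approximation

Let $g(\cdot)$ be a distribution such that $g(\theta) > 0$ when $f(y \mid \theta)\, \pi(\theta) > 0$

The importance sampling approximation of

$$m(y) = \int f(y \mid \theta)\, \pi(\theta)\, d\theta = \mathbb{E}_g\left[\frac{f(y \mid \theta)\, \pi(\theta)}{g(\theta)}\right]$$

is given by

$$\frac{1}{N} \sum_{i=1}^{N} \frac{f(y \mid \theta_i)\, \pi(\theta_i)}{g(\theta_i)}$$

where $\theta_1, \dots, \theta_N$ is an N-sample from $g(\cdot)$

Problem specific and curse of dimensionality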
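A sketch of the importance sampling estimator on an assumed conjugate toy model (y | θ ~ N(θ, 1), θ ~ N(0, 1), not from the talk), whose posterior is N(y/2, 1/2). The proposal g is taken to be a Gaussian centred at the posterior mean with twice the posterior variance, so that its tails are heavier than those of f(y | θ)π(θ). In a real problem the posterior location and scale would have to be approximated, which is the problem-specific part.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Assumed toy model: y | theta ~ N(theta, 1), theta ~ N(0, 1).
y_obs = 1.5
exact_evidence = normal_pdf(y_obs, 0.0, 2.0)

def importance_sampling_evidence(y, n_draws, rng, g_mean, g_var):
    # Average the weights f(y | theta_i) pi(theta_i) / g(theta_i) over draws from g.
    g_sd = math.sqrt(g_var)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(g_mean, g_sd)
        numerator = normal_pdf(y, theta, 1.0) * normal_pdf(theta, 0.0, 1.0)
        total += numerator / normal_pdf(theta, g_mean, g_var)
    return total / n_draws

rng = random.Random(1)
# Proposal: posterior mean y/2 = 0.75, variance 1.0 (twice the posterior variance 0.5).
estimate = importance_sampling_evidence(y_obs, 20000, rng, g_mean=0.75, g_var=1.0)
print(estimate, exact_evidence)
```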

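Tractable likelihood: The harmonic mean estimator

Let $\varphi(\cdot)$ be a distribution such that $\varphi(\theta) = 0$ when $\pi(\theta)\, f(y \mid \theta) = 0$

$$\mathbb{E}\left[\frac{\varphi(\theta)}{\pi(\theta)\, f(y \mid \theta)} \,\middle|\, y\right] = \int \frac{\varphi(\theta)}{\pi(\theta)\, f(y \mid \theta)}\, \frac{\pi(\theta)\, f(y \mid \theta)}{m(y)}\, d\theta = \frac{1}{m(y)}$$

The harmonic mean approximation (Newton and Raftery, 1994) of $m(y)$ is given by

$$1 \Big/ \left[ N^{-1} \sum_{i=1}^{N} \frac{\varphi(\theta_i)}{\pi(\theta_i)\, f(y \mid \theta_i)} \right]$$

where $\theta_1, \dots, \theta_N$ is an N-sample from $\pi(\cdot \mid y)$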
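A sketch of this generalised harmonic mean estimator on an assumed conjugate toy model (y | θ ~ N(θ, 1), θ ~ N(0, 1), not from the talk). The posterior N(y/2, 1/2) is sampled exactly here, whereas in practice the draws would come from an MCMC sampler. The density φ is taken, as an assumption, to be a Gaussian with half the posterior variance, so that the ratio φ / (π f) stays bounded.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Assumed toy model: y | theta ~ N(theta, 1), theta ~ N(0, 1).
y_obs = 1.5
exact_evidence = normal_pdf(y_obs, 0.0, 2.0)
post_mean, post_var = y_obs / 2.0, 0.5  # exact posterior of this conjugate model

def harmonic_mean_evidence(y, post_draws, phi_pdf):
    # Average phi(theta_i) / (pi(theta_i) f(y | theta_i)) over posterior draws,
    # then invert, since that average estimates 1 / m(y).
    total = 0.0
    for theta in post_draws:
        total += phi_pdf(theta) / (normal_pdf(theta, 0.0, 1.0) * normal_pdf(y, theta, 1.0))
    return len(post_draws) / total

rng = random.Random(2)
post_draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(20000)]
# phi: same centre as the posterior, half its variance (lighter tails).
estimate = harmonic_mean_evidence(
    y_obs, post_draws, lambda t: normal_pdf(t, post_mean, post_var / 2.0)
)
print(estimate, exact_evidence)
```

Passing the prior density as `phi_pdf` instead gives the original harmonic mean estimator. For this toy model the ratio then grows like exp((y − θ)² / 2), and its second moment under the posterior is infinite.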

Tractable likelihood: The harmonic mean estimator

As opposed to usual importance sampling constraints, the density $\varphi(\theta)$ must have lighter (rather than fatter) tails than $\pi(\theta)\, f(y \mid \theta)$ for the approximation of the marginal likelihood to enjoy finite variance

Using $\varphi(\theta) = \pi(\theta)$, as in the original harmonic mean approximation, will most usually result in an infinite variance estimator

Very high variance
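The tail condition can be made explicit with a short calculation that uses only the quantities defined for the harmonic mean estimator. The second moment of the averaged ratio under the posterior is

```latex
\mathbb{E}\left[ \left( \frac{\varphi(\theta)}{\pi(\theta)\, f(y \mid \theta)} \right)^{2} \,\middle|\, y \right]
  = \int \frac{\varphi(\theta)^{2}}{\pi(\theta)^{2}\, f(y \mid \theta)^{2}}\,
         \frac{\pi(\theta)\, f(y \mid \theta)}{m(y)}\, d\theta
  = \frac{1}{m(y)} \int \frac{\varphi(\theta)^{2}}{\pi(\theta)\, f(y \mid \theta)}\, d\theta .
```

The variance is therefore finite only if $\varphi^2 / (\pi f)$ is integrable, which requires $\varphi$ to decay faster than $\pi f$ in the tails. With $\varphi = \pi$ the integral becomes $\int \pi(\theta) / f(y \mid \theta)\, d\theta$, which diverges whenever the likelihood decays faster than the prior.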

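Tractable likelihood: Chib’s solution

$$m(y) = \frac{f(y \mid \theta)\, \pi(\theta)}{\pi(\theta \mid y)}, \quad \forall \theta$$

For an arbitrary value $\theta^*$ of $\theta$, Chib’s (1995) approximation to the marginal likelihood is

$$\hat{m}(y) = \frac{f(y \mid \theta^*)\, \pi(\theta^*)}{\hat{\pi}(\theta^* \mid y)}$$

$\hat{\pi}(\theta \mid y)$ may be the Gaussian approximation based on the MLE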
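The identity behind Chib's approach can be checked on an assumed conjugate toy model (y | θ ~ N(θ, 1), θ ~ N(0, 1), not from the talk), where the posterior density N(y/2, 1/2) is known exactly, so that the ratio returns the same log evidence for any θ*. In a real application the posterior density at θ* would be replaced by an approximation, and the quality of that approximation drives the error.

```python
import math

def normal_logpdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

# Assumed toy model: y | theta ~ N(theta, 1), theta ~ N(0, 1).
y_obs = 1.5
exact_log_evidence = normal_logpdf(y_obs, 0.0, 2.0)

def chib_log_evidence(y, theta_star):
    # log m(y) = log f(y | theta*) + log pi(theta*) - log pi(theta* | y)
    log_lik = normal_logpdf(y, theta_star, 1.0)
    log_prior = normal_logpdf(theta_star, 0.0, 1.0)
    log_post = normal_logpdf(theta_star, y / 2.0, 0.5)  # exact posterior density
    return log_lik + log_prior - log_post

chib_values = [chib_log_evidence(y_obs, t) for t in (-1.0, 0.75, 3.0)]
print(chib_values, exact_log_evidence)
```

A point θ* of high posterior density is usually preferred in practice, since the posterior density estimate tends to be most accurate there.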

Tractable likelihood: Chib's solution

A second solution is to use a nonparametric approximation based on a preliminary MCMC sample

In the setting of latent variable models, Chib's approximation can be attractive as there exists a natural approximation to $\pi_k(\theta^*\mid y)$:

$$\hat\pi(\theta^*\mid y) = \frac{1}{T}\sum_{t=1}^{T} \pi(\theta^*\mid y, z^{(t)})$$

where the $z^{(t)}$'s are the latent variables simulated by the MCMC sampler

High variance and curse of dimensionality
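The nonparametric route can be sketched with a kernel density estimate. This is an illustration only, assuming the toy model $y\mid\theta \sim N(\theta, 1)$, $\theta \sim N(0, 1)$, and using exact posterior draws as a stand-in for a preliminary MCMC sample; the sample size and the choice of $\theta^*$ are arbitrary.

```r
# Chib's estimator with a kernel density estimate of pi(theta*|y), in an
# assumed toy model (y | theta ~ N(theta, 1), theta ~ N(0, 1)).
set.seed(1)
y <- -0.5
# Stand-in for a preliminary MCMC sample: exact draws from N(y/2, 1/2)
theta_sample <- rnorm(5000, mean = y / 2, sd = sqrt(1 / 2))
theta_star <- mean(theta_sample)           # a high-density evaluation point
kde <- density(theta_sample)               # Gaussian kernel, default bandwidth
# Interpolate the kernel estimate on its grid to get pi_hat(theta* | y)
post_hat <- approx(kde$x, kde$y, xout = theta_star)$y
m_hat   <- dnorm(y, theta_star, 1) * dnorm(theta_star, 0, 1) / post_hat
m_exact <- dnorm(y, 0, sqrt(2))            # closed form, for comparison
```

A one-dimensional kernel estimate behaves well, but its accuracy degrades quickly as the dimension of $\theta$ grows, which is one way to read the warning about high variance and the curse of dimensionality.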

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 15 / 36

Tractable likelihood: Some other alternatives

A large set of approximations is available for marginal likelihoods or Bayes factors:

- Annealed importance sampling, Neal (2001)
- Bridge sampling techniques, Meng and Wong (1996); Meng and Schilling (2002). Nice R library bridgesampling (Gronau, Singmann, Wagenmakers)
- The Savage–Dickey ratio, Verdinelli and Wasserman (1995), Marin and Robert (2010)
- ...
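As a small illustration of one item in this list, the Savage–Dickey ratio can be checked by hand in a case with no nuisance parameter. The sketch assumes the toy model $y\mid\theta \sim N(\theta, 1)$, $\theta \sim N(0, 1)$ and a hypothetical point null $\theta = 0$; the Bayes factor of the null against the full model is then the posterior density at 0 divided by the prior density at 0.

```r
# Savage-Dickey ratio for a point null theta = 0 in an assumed toy model:
# y | theta ~ N(theta, 1), theta ~ N(0, 1), hence theta | y ~ N(y/2, 1/2).
y <- -0.5
theta0 <- 0

# Ratio of posterior to prior density at the tested value
bf01_savage_dickey <- dnorm(theta0, mean = y / 2, sd = sqrt(1 / 2)) /
  dnorm(theta0, mean = 0, sd = 1)

# Direct Bayes factor: marginal under the null, N(0, 1), over the marginal
# under the full model, N(0, 2)
bf01_direct <- dnorm(y, 0, 1) / dnorm(y, 0, sqrt(2))
```

Both quantities are equal to $\sqrt{2}\exp(-y^2/4)$ in this model, so the two computations agree up to rounding.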

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 16 / 36

Tractable likelihood: Logistic regression approximation

Idea: reduce an estimation problem to a classification problem

Several versions:

- Logistic regression for density estimation: Hastie et al. (2003)
- Intensity estimation: Baddeley et al. (2010)
- Logistic regression for estimation in unnormalised models: Geyer (1994) and Gutmann and Hyvärinen (2012)

The last one is called noise-contrastive estimation by Gutmann and Hyvärinen

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 17 / 36

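Tractable likelihood: Logistic regression approximation

Suppose that

- $\theta_1, \ldots, \theta_N$ is an $N$-sample from $\pi(\cdot\mid y)$
- $u_1, \ldots, u_N$ is an $N$-sample from $\pi(\cdot)$

Let $\zeta = (\theta_1, \ldots, \theta_N, u_1, \ldots, u_N)$

We set $z_i = 1$ if $\zeta_i$ comes from $\pi(\cdot\mid y)$ and $z_i = 0$ if $\zeta_i$ comes from $\pi(\cdot)$:

$$f(\theta\mid z = 1) = \pi(\theta\mid y) = \frac{f(y\mid\theta)\,\pi(\theta)}{m(y)}$$

$$f(\theta\mid z = 0) = \pi(\theta)$$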

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 18 / 36

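Tractable likelihood: Logistic regression approximation

The log-odds ratio is

$$\eta(\theta) = \log\left[\frac{P(z = 1\mid\theta)}{P(z = 0\mid\theta)}\right]$$

From an estimate of $\eta(\theta)$, we can deduce an estimate of $m(y)$

$$\eta(\theta) = \log\left(f(y\mid\theta)\,\pi(\theta)\right) - \log(m(y)) - \log(\pi(\theta))$$

$$\eta(\theta) = c + \log(f(y\mid\theta))$$

where $c = -\log(m(y))$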
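The passage from the log-odds to the likelihood relies on Bayes' rule applied to the label $z$. A short derivation, assuming as in the sampling setup that both samples have the same size $N$, so that $P(z = 1) = P(z = 0) = 1/2$:

$$
\begin{aligned}
\frac{P(z = 1\mid\theta)}{P(z = 0\mid\theta)}
&= \frac{f(\theta\mid z = 1)\,P(z = 1)}{f(\theta\mid z = 0)\,P(z = 0)}
 = \frac{f(\theta\mid z = 1)}{f(\theta\mid z = 0)} \\
&= \frac{f(y\mid\theta)\,\pi(\theta)/m(y)}{\pi(\theta)}
 = \frac{f(y\mid\theta)}{m(y)}
\end{aligned}
$$

Taking logarithms gives $\eta(\theta) = \log f(y\mid\theta) - \log m(y)$. With unequal sample sizes $N_1$ and $N_0$, the same argument would add the constant $\log(N_1/N_0)$ to the intercept.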

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 19 / 36

Tractable likelihood: Logistic regression approximation

$c$ is the intercept of a logistic regression model

$c$ can be estimated from our two simulated samples by maximum likelihood (the estimator is not explicit):

$$\hat c \in \arg\max_c \left( \sum_{i=1}^{N} \left[c + \log f(y\mid\theta_i)\right] - \sum_{i=1}^{N} \log\left(1 + \exp\left(c + \log f(y\mid\theta_i)\right)\right) - \sum_{i=1}^{N} \log\left(1 + \exp\left(c + \log f(y\mid u_i)\right)\right) \right)$$
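The maximisation over the intercept can be sketched with a one-dimensional optimiser. This is a hedged illustration, assuming the toy model $y\mid\theta \sim N(\theta, 1)$, $\theta \sim N(0, 1)$ with exact posterior draws; the helper `log1pexp`, the sample size and the search interval are choices made for this example.

```r
# Direct maximisation of the intercept-only logistic log-likelihood, in an
# assumed toy model: y | theta ~ N(theta, 1), theta ~ N(0, 1).
set.seed(42)
N <- 2000
y <- -0.5
theta <- rnorm(N, y / 2, sqrt(1 / 2))  # draws from the posterior (z = 1)
u     <- rnorm(N, 0, 1)                # draws from the prior     (z = 0)
loglik_theta <- dnorm(y, theta, 1, log = TRUE)  # log f(y | theta_i)
loglik_u     <- dnorm(y, u, 1, log = TRUE)      # log f(y | u_i)

# Numerically stable log(1 + exp(a))
log1pexp <- function(a) ifelse(a > 0, a + log1p(exp(-a)), log1p(exp(a)))

# Logistic log-likelihood as a function of the intercept only
objective <- function(intercept) {
  sum(intercept + loglik_theta) -
    sum(log1pexp(intercept + loglik_theta)) -
    sum(log1pexp(intercept + loglik_u))
}

c_hat <- optimize(objective, interval = c(-20, 20), maximum = TRUE)$maximum
m_hat <- exp(-c_hat)                   # because c = -log m(y)
m_exact <- dnorm(y, 0, sqrt(2))        # closed form, for comparison
```

The objective is concave in the intercept, so a bracketing search is enough; fitting `glm` with the log-likelihood as an offset solves the same problem.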

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 20 / 36

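Tractable likelihood: Logistic regression approximation

A very toy example

$$y\mid\theta \sim \mathcal{N}(\theta, 1)$$

$$\theta \sim \mathcal{N}(0, 1)$$

In such a case,

$$\theta\mid y \sim \mathcal{N}\left(\frac{y}{2}, \frac{1}{2}\right), \qquad m(y) = \frac{1}{\sqrt{2\pi}\,\sqrt{2}} \exp\left(-\frac{y^2}{4}\right)$$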
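The closed form for $m(y)$ can be checked numerically. A minimal sketch, with the value $y = -0.5$ chosen arbitrarily, compares a quadrature of $\int f(y\mid\theta)\,\pi(\theta)\,d\theta$ with the formula:

```r
# Numerical check of m(y) in the toy model y | theta ~ N(theta, 1),
# theta ~ N(0, 1): the marginal of y is N(0, 2).
y <- -0.5
integrand <- function(theta) dnorm(y, theta, 1) * dnorm(theta, 0, 1)
m_numeric <- integrate(integrand, lower = -Inf, upper = Inf)$value
m_closed  <- exp(-y^2 / 4) / (sqrt(2 * pi) * sqrt(2))
m_dnorm   <- dnorm(y, mean = 0, sd = sqrt(2))  # same density via dnorm
```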

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 21 / 36

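# Toy check of the logistic regression estimate of m(y)
# y <- rnorm(1,rnorm(1),1)
y <- -0.5
# exact marginal likelihood: y ~ N(0, 2)
target <- dnorm(y,0,sqrt(2))
# samples from the prior N(0, 1) and from the posterior N(y/2, 1/2)
thetaprior <- rnorm(1000)
thetapost <- rnorm(1000,y/2,sqrt(1/2))
zeta <- c(thetapost,thetaprior)
# labels: 1 for posterior draws, 0 for prior draws
z <- c(rep(1,1000),rep(0,1000))
# log-likelihood log f(y|zeta), used as an offset
x <- dnorm(y,zeta,1,log=TRUE)
df <- data.frame(z=z,x=x)
# intercept-only logistic regression: the intercept estimates c = -log m(y)
model <- glm(z~offset(x),data=df,family=binomial)
# estimate of m(y), to be compared with target
1/exp(as.numeric(model$coefficients))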

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 22 / 36

Tractable likelihood: Logistic regression approximation

With Christian Robert, we are testing this strategy; the first results are impressive (work in progress)

We will also work on promising extensions to

- estimate ratios of normalizing constants
- replace the prior by another distribution
- adapt to MCMC samples from the posterior
- adapt to latent variable models

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 23 / 36

Intractable likelihood: Context

When the likelihood function $f(y\mid\theta)$ is expensive or impossible to calculate, it is extremely difficult to sample from the posterior distribution

$$\pi(\theta\mid y) \propto \pi(\theta)\,f(y\mid\theta)$$

Two typical situations:

- $f(y\mid\theta) = \int f(y, u\mid\theta)\,\mu(du)$, where the calculation of this integral is intractable and the latent vector $u$ takes values in a high-dimensional space (e.g. population genetics models)
- $f(y\mid\theta) = g(y, \theta)/Z(\theta)$, where the calculation of $Z(\theta)$ is intractable (e.g. for Markov random fields)

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 24 / 36

Intractable likelihood: Context

ABC is a technique that only requires being able to sample from the likelihood $f(\cdot\mid\theta)$

This technique stemmed from population genetics models, about 20 years ago, and population geneticists still contribute significantly to the methodological development of ABC

If Christian Robert and I work on ABC methods, we owe it to our biologist colleagues!

When the likelihood functions $f_m(y\mid\theta_m)$ are intractable, it is very challenging to estimate the marginal likelihoods

$$\int f_m(y\mid\theta_m)\,\pi_m(\theta_m)\,d\theta_m$$
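To make the phrase "only requires being able to sample from the likelihood" concrete, here is a sketch of the basic ABC rejection sampler. It is an illustration under assumptions: the toy model $y\mid\theta \sim N(\theta, 1)$, $\theta \sim N(0, 1)$ is used because its posterior $N(y/2, 1/2)$ is known, and the tolerance `eps` and the number of simulations are arbitrary. The likelihood is never evaluated, only simulated from.

```r
# ABC rejection sketch in an assumed toy model where the answer is known:
# y | theta ~ N(theta, 1), theta ~ N(0, 1), so theta | y ~ N(y/2, 1/2).
set.seed(123)
y_obs <- -0.5
N <- 200000
theta <- rnorm(N, 0, 1)               # 1) draw parameters from the prior
z <- rnorm(N, mean = theta, sd = 1)   # 2) simulate pseudo-data from f(.|theta)
eps <- 0.05                           # tolerance (hypothetical choice)
keep <- abs(z - y_obs) < eps          # 3) keep draws whose data are close to y
abc_sample <- theta[keep]             # approximate posterior sample
abc_summary <- c(mean = mean(abc_sample), var = var(abc_sample),
                 accepted = sum(keep))
```

Here the data are one-dimensional, so they can be compared directly; in realistic problems the comparison is made on summary statistics, and the tolerance trades accuracy against the acceptance rate.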

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 25 / 36

Intractable likelihood: Classic ABC model choice procedure

ABC likelihood-free methods for model choice in Gibbs random fields, Grelaud, Robert, Marin, Rodolphe and Taly (2009), Bayesian Analysis

1) For $i = 1, \ldots, N$
   a) Generate $m_i$ from the prior $P(M = m)$
   b) Generate $\theta'_{m_i}$ from the prior $\pi_{m_i}(\cdot)$
   c) Generate $z$ from the model $f_{m_i}(\cdot\mid\theta'_{m_i})$
   d) Calculate $d_i = d(\eta(z), \eta(y))$

2) Order the distances $d_{(1)}, \ldots, d_{(N)}$

3) Select the model by majority rule among the model indices $m_i$ attached to the $k$ smallest distances

This is a standard k-nearest-neighbour classifier
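The procedure can be sketched on a small two-model problem. Everything specific here is an assumption made for illustration: the two competing models (Poisson against geometric), their priors, the summary statistics used for $\eta(\cdot)$, the Euclidean distance, and the values of $N$ and $k$.

```r
# k-nearest-neighbour ABC model choice on an assumed toy problem:
# model 1: y_j ~ Poisson(lambda), lambda ~ Exp(1)
# model 2: y_j ~ Geometric(p),    p ~ Uniform(0, 1)
set.seed(2019)
n_obs <- 50
summaries <- function(x) c(mean(x), log1p(var(x)))  # hypothetical eta(.)

# Draw a parameter from the model's prior, then a data set from the model
simulate_model <- function(m) {
  if (m == 1) rpois(n_obs, rexp(1, rate = 1)) else rgeom(n_obs, runif(1))
}

y_obs <- rpois(n_obs, lambda = 2)      # pseudo-observed data from model 1
eta_obs <- summaries(y_obs)

N <- 20000
m <- sample(1:2, N, replace = TRUE)    # a) model index from a uniform prior
eta_sim <- t(vapply(m, function(mi) summaries(simulate_model(mi)),
                    numeric(2)))       # b)-c) parameter, then pseudo-data
d <- sqrt(rowSums(sweep(eta_sim, 2, eta_obs)^2))   # d) distances
k <- 100
nearest <- order(d)[1:k]               # 2) indices of the k smallest distances
votes <- table(factor(m[nearest], levels = 1:2)) / k  # 3) vote shares
selected <- as.integer(names(which.max(votes)))
```

Since the pseudo-observed data come from model 1, one would expect it to collect most of the votes, although this is not guaranteed for a given seed. The vote shares are the usual ABC approximation to the posterior model probabilities.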

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 26 / 36

Page 103: Bayesian model choice as a classification problem - Applied Bayesian Biostatistics … · 2019-06-12 · Bayesian model choice as a classification problem Jean-Michel Marin University

Intractable likelihoodClassic ABC model choice procedure

ABC likelihood-free methods for model choice in Gibbs randomfields Grelaud, Robert, Marin, Rodolphe and Taly (2009) BayesianAnalysis

1) For i = 1, . . . ,N

a) Generate mi from the prior P(M = m)b) Generate θ ′mi

from the prior πmi(·)

c) Generate z from the model fmi(·|θ ′mi

)d) Calculate di = d(η(z),η(y))

2) Order the distances d(1), . . . ,d(N)

3) Select the model using the majority rule among the k-smallestdistances index set

A standard K-Nearest Neighbor classifier

Jean-Michel Marin (UM, CNRS & IMAG) Bayesian Biostatistics 2019 22 May 2019 26 / 36

Page 104: Bayesian model choice as a classification problem - Applied Bayesian Biostatistics … · 2019-06-12 · Bayesian model choice as a classification problem Jean-Michel Marin University

Intractable likelihood
Classic ABC model choice procedure

If $\eta(y)$ is a sufficient statistic for the model choice problem, this can work pretty well

If not...

Lack of confidence in approximate Bayesian computation model choice. Robert, Cornuet, Marin, Pillai (2011) PNAS

Relevant statistics for Bayesian model choice. Marin, Pillai, Robert, Rousseau (2014) JRSS B


Intractable likelihood
Classic ABC model choice procedure

- intuitive
- simple to implement
- embarrassingly parallelisable
- BUT curse of dimensionality: most of the simulations are at the boundary of the space as the number of summary statistics increases
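The last point can be illustrated numerically. As a hedged sketch, assume (purely for illustration) that points are uniform on the unit hypercube $[0,1]^d$, where $d$ plays the role of the number of summary statistics. The fraction lying within `eps` of the boundary is then $1 - (1 - 2\,\mathrm{eps})^d$, which tends to 1 as $d$ grows. The function names below are hypothetical.

```python
import random

def boundary_fraction_exact(d, eps=0.05):
    """Probability that a uniform point of [0,1]^d has a coordinate within eps of 0 or 1."""
    return 1.0 - (1.0 - 2.0 * eps) ** d

def boundary_fraction_mc(d, eps=0.05, n=20000, seed=0):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        point = [rng.random() for _ in range(d)]
        if any(u < eps or u > 1.0 - eps for u in point):
            hits += 1
    return hits / n

for d in (1, 2, 10, 50):
    print(d, round(boundary_fraction_exact(d), 3), round(boundary_fraction_mc(d), 3))
```

With many summary statistics almost every simulated point is close to the boundary, so the k nearest neighbours of the observed summaries are not truly local.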


Intractable likelihood
DIY-ABC software

Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Cornuet, Santos, Beaumont, Robert, Marin, Balding, Guillemaud, Estoup (2008) Bioinformatics

DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Cornuet, Pudlo, Veyssier, Dehne-Garcia, Gautier, Leblois, Marin, Estoup (2014) Bioinformatics

- Asian ladybug
- European honey bee
- Drosophila suzukii
- Four human populations, to study the out-of-Africa colonization
- Pygmy human populations


Intractable likelihood
Frontline news from population geneticists country

The DIYABC (2014) paper now has around 500 citations

- simulating from the model can be very computationally intensive, so parallelizable algorithms are necessary
- sequential methods are difficult to calibrate and do not give reproducible results
- available techniques to select the summary statistics do not give reproducible results


Intractable likelihood
Frontline news from population geneticists country

Two major difficulties

- to ensure reliability of the method, the number of simulations should be large
- choice of the summary statistics is still a problem


Intractable likelihood
Use modern machine learning tools

Exploiting a large number of summary statistics is not an issue for some machine learning methods

Idea: learn on a huge reference table using random forests

Some theoretical guarantees for sparse problems

Analysis of a random forest model. Biau (2012) JMLR

Consistency of random forests. Scornet, Biau, Vert (2015) The Annals of Statistics

This work stands at the interface between Bayesian inference and machine learning techniques


Intractable likelihood
ABC random forests

Reliable ABC model choice via random forests. Pudlo, Marin, Estoup, Cornuet, Gautier and Robert (2016) Bioinformatics

Input: ABC reference table involving model index and summary statistics

$$
\begin{array}{ccccc}
m^{(1)} & \eta_1(z^{(1)}) & \eta_2(z^{(1)}) & \cdots & \eta_d(z^{(1)}) \\
m^{(2)} & \eta_1(z^{(2)}) & \eta_2(z^{(2)}) & \cdots & \eta_d(z^{(2)}) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
m^{(N)} & \eta_1(z^{(N)}) & \eta_2(z^{(N)}) & \cdots & \eta_d(z^{(N)})
\end{array}
$$

possibly large collection of summary statistics: from scientific theory input to machine-learning alternatives

Output: a random forest classifier to infer model indexes $m(\eta(y))$

abcrf R library
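As a rough illustration of the input, the sketch below builds a small reference table in plain Python: one row per simulation, holding the model index followed by the summary statistics. It does not use the abcrf package. The two toy Gaussian models, the priors, the four summaries and the function names are assumptions made for the example.

```python
import random

def summaries(z):
    """Toy summary statistics eta_1..eta_4 (an assumed choice): mean, variance, min, max."""
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / (n - 1)
    return [mean, var, min(z), max(z)]

def reference_table(n_sim, n_obs, rng):
    """Each row is [m, eta_1(z), ..., eta_d(z)] for one draw from the prior predictive."""
    table = []
    for _ in range(n_sim):
        m = rng.randrange(2)                         # model index from a uniform prior
        theta = rng.gauss(0.0, 1.0)                  # parameter from its (assumed) prior
        sd = 1.0 if m == 0 else 2.0                  # the two toy models differ in scale
        z = [rng.gauss(theta, sd) for _ in range(n_obs)]
        table.append([m] + summaries(z))
    return table

table = reference_table(n_sim=1000, n_obs=30, rng=random.Random(0))
labels = [row[0] for row in table]                   # response: model index
features = [row[1:] for row in table]                # predictors: summary statistics
for row in table[:3]:
    print(row[0], [round(v, 2) for v in row[1:]])
```

The classifier is then trained with `labels` as the response and `features` as the predictors, so adding more summary statistics only adds columns to the table.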


Intractable likelihood
ABC random forests

Random forest predicts a MAP model index, from the observed dataset: the predictor provided by the forest is good enough to select the most likely model but not to derive directly the associated posterior probability

frequency of trees associated with majority model is no proper substitute to the true posterior probability


Intractable likelihood
ABC random forests

Estimate of the posterior probability of the selected model

$$\mathbb{P}[M = m(\eta(y)) \mid \eta(y)]$$

The randomness comes from $M$ (Bayesian)!

$$\mathbb{P}[M = m(\eta(y)) \mid \eta(y)] = 1 - \mathbb{E}[\mathbb{I}(M \neq m(\eta(y))) \mid \eta(y)]$$
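The identity follows in one step from writing the probability as the conditional expectation of an indicator and taking the complement. Here $m(\eta(y))$ is treated as fixed given $\eta(y)$, so that only $M$ is random:

```latex
\begin{align*}
\mathbb{P}[M = m(\eta(y)) \mid \eta(y)]
  &= \mathbb{E}\big[\mathbb{I}(M = m(\eta(y))) \mid \eta(y)\big] \\
  &= \mathbb{E}\big[1 - \mathbb{I}(M \neq m(\eta(y))) \mid \eta(y)\big] \\
  &= 1 - \mathbb{E}\big[\mathbb{I}(M \neq m(\eta(y))) \mid \eta(y)\big].
\end{align*}
```

The posterior probability of the selected model is therefore one minus a conditional expected misclassification indicator, which is a regression target.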


Intractable likelihood
ABC random forests

A second random forest in regression

1) compute the value of $\mathbb{I}(M \neq m(\eta(z)))$ for the trained random forest and for all terms in the ABC reference table using the out-of-bag classifiers;

2) train a RF regression and get $\rho(\eta(z)) = \mathbb{E}[\mathbb{I}(M \neq m(\eta(z))) \mid \eta(z)]$;

3) return $\mathbb{P}[M = m(\eta(y)) \mid \eta(y)] = 1 - \rho(\eta(y))$.

Both forests are trained on the same reference table: the out-of-bag magic trick avoids over-fitting!
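The three steps can be sketched with a deliberately simplified, standard-library-only stand-in for the two forests: bagged trees with random feature subsets and a few randomly drawn candidate thresholds per split. This is not the abcrf implementation. The two toy Gaussian models, the priors, the summaries, the tree settings and all function names are assumptions made for the example.

```python
import random

def summaries(z):
    """Toy summaries (an assumed choice): mean, variance, range."""
    n = len(z)
    mu = sum(z) / n
    return [mu, sum((v - mu) ** 2 for v in z) / (n - 1), max(z) - min(z)]

def reference_table(n_sim, n_obs, rng):
    """Toy models (assumptions): m=0 is N(theta, 1), m=1 is N(theta, 1.5^2)."""
    labels, rows = [], []
    for _ in range(n_sim):
        m = rng.randrange(2)
        theta = rng.gauss(0.0, 1.0)
        z = [rng.gauss(theta, 1.0 if m == 0 else 1.5) for _ in range(n_obs)]
        labels.append(float(m))
        rows.append(summaries(z))
    return labels, rows

def mean(v):
    return sum(v) / len(v)

def sse(v):
    mu = mean(v)
    return sum((a - mu) ** 2 for a in v)

def grow(rows, ys, rng, depth=0, max_depth=4, mtry=2, n_thr=6):
    """Tree grown by variance reduction; for 0/1 targets this ranks splits like Gini.
    Simplification: thresholds are a few random observed values, not an exhaustive search."""
    if depth == max_depth or len(ys) < 10 or min(ys) == max(ys):
        return mean(ys)                                   # leaf: mean target
    best = None
    for j in rng.sample(range(len(rows[0])), mtry):       # random subset of summaries
        for _ in range(n_thr):
            t = rng.choice(rows)[j]
            left = [i for i, r in enumerate(rows) if r[j] <= t]
            right = [i for i, r in enumerate(rows) if r[j] > t]
            if left and right:
                score = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
                if best is None or score < best[0]:
                    best = (score, j, t, left, right)
    if best is None:
        return mean(ys)
    _, j, t, left, right = best
    def child(idx):
        return grow([rows[i] for i in idx], [ys[i] for i in idx],
                    rng, depth + 1, max_depth, mtry, n_thr)
    return (j, t, child(left), child(right))

def predict_tree(tree, x):
    while isinstance(tree, tuple):                        # internal nodes are tuples
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def forest_with_oob(rows, ys, n_trees, rng, vote):
    """Bagged trees plus, per row, the average output of the trees that never saw that row."""
    n = len(rows)
    out = (lambda p: float(p >= 0.5)) if vote else (lambda p: p)
    trees, oob_sum, oob_cnt = [], [0.0] * n, [0] * n
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        tree = grow([rows[i] for i in idx], [ys[i] for i in idx], rng)
        trees.append(tree)
        for i in set(range(n)) - set(idx):                # out-of-bag rows for this tree
            oob_sum[i] += out(predict_tree(tree, rows[i]))
            oob_cnt[i] += 1
    # A row that was never out-of-bag (very unlikely) falls back to 0.5
    oob = [s / c if c else 0.5 for s, c in zip(oob_sum, oob_cnt)]
    return trees, oob

rng = random.Random(0)
labels, rows = reference_table(n_sim=400, n_obs=30, rng=rng)

# First forest (classification): out-of-bag majority vote for every row of the table
clf, oob_vote = forest_with_oob(rows, labels, 30, rng, vote=True)
# Step 1: out-of-bag misclassification indicators I(M != m(eta(z)))
err = [float((v >= 0.5) != (m == 1.0)) for v, m in zip(oob_vote, labels)]
# Step 2: second forest (regression) of the indicators on the summaries
reg, _ = forest_with_oob(rows, err, 30, rng, vote=False)

# Step 3: selected model and estimated posterior probability for an "observed" y
y = [rng.gauss(0.2, 1.5) for _ in range(30)]
eta_y = summaries(y)
m_hat = int(mean([float(predict_tree(t, eta_y) >= 0.5) for t in clf]) >= 0.5)
post_prob = 1.0 - mean([predict_tree(t, eta_y) for t in reg])
print(m_hat, round(post_prob, 3))
```

The indicators in step 1 come only from trees that did not see the corresponding row, so the regression forest is trained on honest error indicators even though both forests share one reference table.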
