icic data analysis workshop 2016 · 2020. 8. 5. · a word on priors in theory or physics driven...



Bayesian model comparison

ICIC Data Analysis Workshop 2016

Ln(a) Sellentin

Imperial College London

& Université de Genève

email: [email protected]


Typical questions


Image credit: Horndeski, Gregory W.


The evidence

● Normalization constant in parameter inference

● The central quantity for model comparison

→ It balances the goodness of fit against the number of parameters ('Occam's razor').

→ It avoids (extreme) overfitting.
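As a minimal numerical sketch of the evidence as a normalization constant, the snippet below assumes a hypothetical one-parameter model (Gaussian data with known noise `sigma` and unknown mean `mu`) with a uniform prior on `[lo, hi]`. It integrates likelihood × prior on a grid and checks the result against the closed form; all helper names are illustrative, not from the lecture.

```python
import math

def gaussian_loglike(mu, data, sigma):
    """ln L(mu) for i.i.d. Gaussian data with known sigma and unknown mean mu."""
    n = len(data)
    chi2 = sum((x - mu) ** 2 for x in data) / sigma**2
    return -0.5 * chi2 - n * math.log(math.sqrt(2 * math.pi) * sigma)

def evidence_uniform_prior(data, sigma, lo, hi, n_grid=20001):
    """Z = integral of L(mu) * pi(mu) dmu for a uniform prior on [lo, hi],
    evaluated with the trapezoidal rule."""
    width = hi - lo
    h = width / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(gaussian_loglike(lo + i * h, data, sigma))
    return total * h / width  # the prior density is 1 / width

def evidence_exact(data, sigma, lo, hi):
    """Closed form: the likelihood is Gaussian in mu around the sample mean."""
    n = len(data)
    xbar = sum(data) / n
    scatter = sum((x - xbar) ** 2 for x in data)
    peak = math.exp(-0.5 * scatter / sigma**2) / (math.sqrt(2 * math.pi) * sigma) ** n
    se = sigma / math.sqrt(n)

    def phi(z):  # standard normal CDF
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    mass = phi((hi - xbar) / se) - phi((lo - xbar) / se)
    return peak * math.sqrt(2 * math.pi) * se * mass / (hi - lo)

data = [0.8, 1.1, 0.9, 1.3, 0.7]  # hypothetical measurements
sigma = 0.5
z_narrow = evidence_uniform_prior(data, sigma, 0.0, 2.0)
z_wide = evidence_uniform_prior(data, sigma, -10.0, 10.0)
print(z_narrow, evidence_exact(data, sigma, 0.0, 2.0))
print(z_wide, evidence_exact(data, sigma, -10.0, 10.0))
print("ln(Z_narrow / Z_wide) =", math.log(z_narrow / z_wide))
```

Both priors contain the likelihood peak, so the best fit is identical, yet the ten-times wider prior lowers the evidence by about a factor of ten: the Occam penalty at work.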


Toy Model

Will always decrease with number of parameters.
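A common textbook toy model makes the Occam penalty explicit (assumed here for illustration): $n$ parameters, each with a flat prior of width $\Delta\theta$, and a likelihood that is Gaussian with width $\sigma_\theta \ll \Delta\theta$ around its peak $\mathcal{L}_{\max}$, lying well inside the prior. Then

```latex
p(d \mid M) = \int \mathcal{L}(\theta)\, \pi(\theta)\, \mathrm{d}^n\theta
\approx \mathcal{L}_{\max} \left( \frac{\sqrt{2\pi}\, \sigma_\theta}{\Delta\theta} \right)^{n}
```

In a nested model, adding a parameter can only raise $\mathcal{L}_{\max}$ (lower the best-fit $\chi^2$), but it also multiplies the evidence by another Occam factor $\sqrt{2\pi}\,\sigma_\theta/\Delta\theta < 1$, so the evidence grows only if the fit improves by more than that factor.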


Polynomial example

Unknown truth
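A minimal sketch of the polynomial example, under assumptions not taken from the slides: a hypothetical quadratic "unknown truth", Gaussian noise with known `sigma`, and independent Gaussian priors of width `prior_std` on all polynomial coefficients. For this linear-Gaussian model the evidence is itself a Gaussian density in the data, so it can be computed exactly with NumPy.

```python
import numpy as np

def design_matrix(x, degree):
    # Columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def best_fit_chi2(x, y, sigma, degree):
    A = design_matrix(x, degree)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coeffs
    return float(resid @ resid) / sigma**2

def log_evidence(x, y, sigma, degree, prior_std):
    # y = A w + noise with w ~ N(0, prior_std^2 I) and noise ~ N(0, sigma^2 I).
    # Marginalizing over w gives y ~ N(0, C) with C = sigma^2 I + prior_std^2 A A^T.
    A = design_matrix(x, degree)
    C = sigma**2 * np.eye(len(y)) + prior_std**2 * (A @ A.T)
    _, logdet = np.linalg.slogdet(C)
    return float(-0.5 * (y @ np.linalg.solve(C, y) + logdet + len(y) * np.log(2 * np.pi)))

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
sigma = 0.1
# Hypothetical "unknown truth": a quadratic plus Gaussian noise
y = 1.0 - 2.0 * x + 3.0 * x**2 + sigma * rng.standard_normal(x.size)

degrees = range(0, 9)
chi2 = [best_fit_chi2(x, y, sigma, d) for d in degrees]
logZ = [log_evidence(x, y, sigma, d, prior_std=5.0) for d in degrees]
for d, c, z in zip(degrees, chi2, logZ):
    print(f"degree {d}: chi2 = {c:9.2f}, ln Z = {z:9.2f}")
```

The best-fit χ² can only fall as the degree grows, because each polynomial model contains the previous one. The log-evidence instead rises sharply until the quadratic term is included and then typically declines, since every extra coefficient costs an Occam factor.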


A word on priors

● Theory or physics driven priors

– e.g. Mass > 0

● Data driven priors & combination of experiments

– Prior = old data

– Likelihood = new data

– Posterior = old and new data

● Subjective & informative priors

– 'Only an unstated prior is a bad prior.'

● Objective & 'uninformative' priors

– Maximize the expected KL divergence between prior and posterior (reference priors)

– Exploit symmetry groups: Haar measures and invariant 'volumes'

– Reparameterization independence (Jeffreys priors)

● Frequentist matching priors
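The "prior = old data" bullet can be checked in the simplest conjugate case. The sketch below assumes Gaussian data with known noise and a Gaussian prior on the mean (all numbers hypothetical): using the posterior from the old data as the prior for the new data gives the same result as analysing old and new data together with the original prior.

```python
def gaussian_update(prior_mean, prior_var, data, sigma):
    """Conjugate update for the mean of Gaussian data with known noise sigma.
    Precisions (inverse variances) add; the posterior mean is the
    precision-weighted average of the prior mean and the data."""
    n = len(data)
    post_prec = 1.0 / prior_var + n / sigma**2
    post_var = 1.0 / post_prec
    post_mean = post_var * (prior_mean / prior_var + sum(data) / sigma**2)
    return post_mean, post_var

old_data = [0.9, 1.2, 1.0]        # hypothetical first experiment
new_data = [1.1, 0.8, 1.3, 1.0]   # hypothetical second experiment
sigma = 0.3

# Experiment 1: broad initial prior, old data -> posterior
m_old, v_old = gaussian_update(0.0, 100.0, old_data, sigma)
# Experiment 2: the old posterior is the prior, the likelihood uses only the new data
m_seq, v_seq = gaussian_update(m_old, v_old, new_data, sigma)
# Joint analysis of old and new data with the original prior
m_joint, v_joint = gaussian_update(0.0, 100.0, old_data + new_data, sigma)

print(f"sequential: {m_seq:.6f} +/- {v_seq ** 0.5:.6f}")
print(f"joint:      {m_joint:.6f} +/- {v_joint ** 0.5:.6f}")
```

The equivalence relies on the two data sets being independent given the parameters; correlated experiments need a joint likelihood instead.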


Model comparison

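Have: data $d$ and two competing models, $M_1$ and $M_2$

Want: the probability of each model given the data, $P(M_i \mid d)$

Bayes' theorem: $P(M_i \mid d) = \frac{p(d \mid M_i)\, P(M_i)}{p(d)}$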


Model comparison

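Get rid of the prior probability of the data, $p(d)$, by taking a ratio:

$\frac{P(M_1 \mid d)}{P(M_2 \mid d)} = B_{12}\, \frac{P(M_1)}{P(M_2)}$

Where:

Bayes factor: $B_{12} = \frac{p(d \mid M_1)}{p(d \mid M_2)}$

$B_{12} > 1$ prefers $M_1$

$B_{12} < 1$ prefers $M_2$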
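The ratio above turns into posterior model probabilities once prior odds are chosen. A minimal sketch, assuming log-evidences are already available (e.g. from a nested sampler) and equal prior model probabilities unless others are passed in; the numbers are hypothetical. Working in logs matters: $e^{-1002}$ underflows to zero in double precision.

```python
import math

def log_bayes_factor(log_z1, log_z2):
    """ln B12 = ln p(d|M1) - ln p(d|M2); ln B12 > 0 (B12 > 1) prefers M1."""
    return log_z1 - log_z2

def posterior_model_probs(log_evidences, prior_probs=None):
    """P(M_i|d) = p(d|M_i) P(M_i) / sum_j p(d|M_j) P(M_j).
    Subtracts the largest log-weight before exponentiating (log-sum-exp trick)."""
    k = len(log_evidences)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k  # equal prior odds (an assumption)
    log_w = [lz + math.log(p) for lz, p in zip(log_evidences, prior_probs)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

log_z = [-1002.3, -1004.8]  # hypothetical ln p(d|M1), ln p(d|M2)
ln_b12 = log_bayes_factor(*log_z)
probs = posterior_model_probs(log_z)
print(f"ln B12 = {ln_b12:.2f}, P(M1|d) = {probs[0]:.3f}, P(M2|d) = {probs[1]:.3f}")
```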


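● Bayes factor = evidence1/evidence2, i.e. $B_{12} = p(d \mid M_1)/p(d \mid M_2)$.

● Without loss of generality: $B_{12} > 1$.

● Then: $B_{21} = 1/B_{12} < 1$.

● Ergo: introduce the logarithm as a measure of decisiveness: $\ln B_{21} = -\ln B_{12}$.

Magnitude of B: in favour of $M_2$ the decisiveness asymptotes to zero ($B_{21} \to 0$), whereas in favour of $M_1$ it grows linearly ($B_{12} \to \infty$).

→ With $\ln B$, $B_{12}$ and $B_{21}$ are now treated equally.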


Calibration on the Jeffreys scale

Example:

● Dark Energy Survey (DES) SV data

● Weak lensing (WL) analysis: flat LCDM vs. LCDM + curvature

● Sellentin & Heavens (2016)
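The decisiveness scale can be sketched as a small function of $\ln B_{12}$. The thresholds below (1, 2.5 and 5 in $|\ln B|$) are one commonly quoted variant of the Jeffreys scale and are an assumption here; the calibration used in the DES example may differ.

```python
def jeffreys_verdict(ln_b12):
    """Classify ln B12 on a Jeffreys-type scale.
    Thresholds 1, 2.5, 5 are one commonly quoted variant (an assumption;
    other calibrations exist). Only |ln B| sets the strength, so
    ln B12 and ln B21 = -ln B12 are treated symmetrically."""
    strength = abs(ln_b12)
    if strength < 1.0:
        return "inconclusive"
    if strength < 2.5:
        label = "weak"
    elif strength < 5.0:
        label = "moderate"
    else:
        label = "strong"
    favoured = "M1" if ln_b12 > 0 else "M2"
    return f"{label} evidence for {favoured}"

for ln_b in (0.4, 1.7, -3.0, 6.2):
    print(ln_b, "->", jeffreys_verdict(ln_b))
```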


Model selection in the CMB

The CMB consists of photons that travel through the gravitational potentials of all particle species.

Sellentin & Durrer (2015)

But are these neutrinos? Or just any relativistic fluid?

Model comparison:

Neutrinos vs. ideal fluid:

Neutrinos vs. viscous fluid:

+ parameter constraints as a side effect


Expected support for models

● Single data realization:

● If the statistical properties of the data are known → calculate the expected likelihood (even without having any real data at all)
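One way to compute expected support before any data exist is to simulate data sets from an assumed fiducial model and average $\ln B$ over them. The sketch below does this by plain Monte Carlo (a different route from the analytic approximation in Heavens et al. 2007) for a hypothetical nested pair: Gaussian data with known `sigma`; $M_0$: mean fixed to 0; $M_1$: mean with prior $\mathcal{N}(0, \tau^2)$. Here $\ln B_{10}$ depends only on the sample mean, so the average can also be checked in closed form.

```python
import numpy as np

def ln_b10(xbar, n, sigma, tau):
    """ln B10 for M1 (mu ~ N(0, tau^2)) against the nested M0 (mu = 0).
    The sample mean is sufficient: it is N(0, s2) under M0 and
    N(0, s2 + tau^2) under M1, with s2 = sigma^2 / n."""
    s2 = sigma**2 / n
    return -0.5 * np.log((s2 + tau**2) / s2) + 0.5 * xbar**2 * (1.0 / s2 - 1.0 / (s2 + tau**2))

def expected_ln_b10_mc(mu_true, n, sigma, tau, n_sims=200_000, seed=0):
    """Average ln B10 over data sets simulated from the fiducial mean mu_true."""
    rng = np.random.default_rng(seed)
    # Only the sample mean matters; for n Gaussian points it is N(mu_true, sigma^2 / n).
    xbar = rng.normal(mu_true, sigma / np.sqrt(n), size=n_sims)
    return float(np.mean(ln_b10(xbar, n, sigma, tau)))

def expected_ln_b10_exact(mu_true, n, sigma, tau):
    """Same average in closed form, using E[xbar^2] = mu_true^2 + sigma^2 / n."""
    s2 = sigma**2 / n
    return float(-0.5 * np.log((s2 + tau**2) / s2)
                 + 0.5 * (mu_true**2 + s2) * (1.0 / s2 - 1.0 / (s2 + tau**2)))

for n in (10, 100, 1000):
    mc = expected_ln_b10_mc(0.2, n, sigma=1.0, tau=1.0)
    exact = expected_ln_b10_exact(0.2, n, sigma=1.0, tau=1.0)
    print(f"n = {n:5d}: <ln B10> = {mc:7.3f} (MC), {exact:7.3f} (exact)")
```

For small samples the expected $\ln B_{10}$ is negative (the simpler $M_0$ is favoured); it turns positive only once the data are precise enough to detect the assumed offset `mu_true = 0.2`.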


Expected support for models

Heavens et al. (2007)

M0:

M1:


Nested Models

● Imagine M1 uses all parameters of M0 but introduces some extra parameters

● Nested model: for particular values of the extra parameters, M1 → M0

● Examples:

– wCDM → LambdaCDM for w = -1

– Curved LambdaCDM → flat LambdaCDM for k = 0

– Rainy day → sunny day for rain = 0


Savage-Dickey Density Ratio

● SDDR is an approximate Bayes factor for nested models

● The full Bayes factor is

● For nested models:

● Insert into Bayes factor:

● Now need to care about the priors.


Savage-Dickey Density Ratio

● Bayes factor:

● Make extra assumption for priors:

● Insert into Bayes factor:

● Leading to the Savage-Dickey Density Ratio:
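Written out under assumed notation (not taken from the slides): let $M_1$ have parameters $(\psi, \omega)$, where $\omega$ are the extra parameters, and let $M_0$ be recovered at $\omega = \omega_\star$. The extra prior assumption is taken to be the usual separability condition $\pi(\psi \mid \omega_\star, M_1) = \pi(\psi \mid M_0)$. A sketch of the derivation:

```latex
\begin{align}
B_{01} &= \frac{p(d \mid M_0)}{p(d \mid M_1)}
        = \frac{\int p(d \mid \psi, M_0)\, \pi(\psi \mid M_0)\, \mathrm{d}\psi}{p(d \mid M_1)} \\
       &= \frac{\int p(d \mid \psi, \omega_\star, M_1)\, \pi(\psi \mid \omega_\star, M_1)\, \mathrm{d}\psi}{p(d \mid M_1)}
        && \text{(nesting + prior assumption)} \\
       &= \frac{p(d \mid \omega_\star, M_1)}{p(d \mid M_1)}
        = \frac{p(\omega_\star \mid d, M_1)}{\pi(\omega_\star \mid M_1)}
        && \text{(Bayes' theorem for } \omega\text{)}
\end{align}
```

Only the marginal posterior of $\omega$ under $M_1$, evaluated at $\omega_\star$, is needed, which is why an MCMC run on the larger model suffices.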

→ Plan ahead: use Nested Sampling rather than MCMC to obtain B and the parameter constraints together.

→ If it is too late for that: approximate B with MCMC + SDDR + importance sampling (exercise).
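The SDDR can be checked against the full Bayes factor in a case where both are available exactly. The sketch assumes a hypothetical model with Gaussian data of known `sigma`; $M_1$: mean $\mu \sim \mathcal{N}(0, \tau^2)$; nested $M_0$: $\mu = 0$. There are no common parameters, so the prior assumption holds trivially. With MCMC, `post_at_0` would instead have to be estimated from the samples, e.g. with a kernel density estimate.

```python
import numpy as np

def log_gauss_mvn(y, cov):
    """Log density of N(0, cov) evaluated at y."""
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (y @ np.linalg.solve(cov, y) + logdet + len(y) * np.log(2 * np.pi)))

def exact_b01(x, sigma, tau):
    """B01 = p(d|M0) / p(d|M1) from the two evidences directly.
    M0: mu = 0, so x ~ N(0, sigma^2 I).
    M1: mu ~ N(0, tau^2), so marginally x ~ N(0, sigma^2 I + tau^2 * ones)."""
    n = len(x)
    cov0 = sigma**2 * np.eye(n)
    cov1 = cov0 + tau**2 * np.ones((n, n))
    return float(np.exp(log_gauss_mvn(x, cov0) - log_gauss_mvn(x, cov1)))

def sddr_b01(x, sigma, tau):
    """Savage-Dickey: B01 = p(mu = 0 | d, M1) / pi(mu = 0 | M1).
    With a conjugate Gaussian prior, the M1 posterior for mu is Gaussian."""
    n = len(x)
    post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
    post_mean = post_var * np.sum(x) / sigma**2
    post_at_0 = np.exp(-0.5 * post_mean**2 / post_var) / np.sqrt(2 * np.pi * post_var)
    prior_at_0 = 1.0 / np.sqrt(2 * np.pi * tau**2)
    return float(post_at_0 / prior_at_0)

rng = np.random.default_rng(42)
x = rng.normal(0.3, 1.0, size=20)  # hypothetical data set
b_exact = exact_b01(x, sigma=1.0, tau=1.0)
b_sddr = sddr_b01(x, sigma=1.0, tau=1.0)
print(f"B01 exact = {b_exact:.6f}, SDDR = {b_sddr:.6f}")
```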

Example from Dirian et al. (2016):


Model averaging

Figure: nuisance parameters and physics parameters under models M1 and M2

● Imagine two models explain the same effect. Neither is 'better' than the other, as judged by B.

● Weak lensing: Intrinsic alignment model?

● Structure formation: Press-Schechter mass function or Sheth-Tormen or Jenkins et al. or...?

● Model averaging folds the model uncertainty into the parameter uncertainty.
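Model averaging weights each model's posterior by its posterior model probability, $p(\theta \mid d) = \sum_i P(M_i \mid d)\, p(\theta \mid d, M_i)$. The sketch below assumes each per-model posterior is approximately Gaussian and uses hypothetical numbers for two competing intrinsic-alignment models; the averaged spread contains both the within-model widths and the shift between the models.

```python
import math

def model_averaged_gaussian(means, stds, log_evidences, prior_probs=None):
    """Bayesian model averaging of a shared parameter theta.
    Per-model posteriors are approximated as Gaussians (an assumption).
    Returns the mixture mean, mixture standard deviation and model weights."""
    k = len(means)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k  # equal prior model probabilities
    log_w = [lz + math.log(p) for lz, p in zip(log_evidences, prior_probs)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]  # posterior model probabilities P(M_i|d)
    mean = sum(wi * mu for wi, mu in zip(w, means))
    # Law of total variance: within-model variance + scatter of the model means
    second_moment = sum(wi * (s**2 + mu**2) for wi, mu, s in zip(w, means, stds))
    return mean, math.sqrt(second_moment - mean**2), w

# Hypothetical posteriors for one physics parameter under two IA models
mean, std, weights = model_averaged_gaussian([0.80, 0.84], [0.02, 0.02], [0.0, 0.0])
print(f"B = 1:      theta = {mean:.3f} +/- {std:.4f}")
mean2, std2, weights2 = model_averaged_gaussian([0.80, 0.84], [0.02, 0.02], [0.0, -2.0])
print(f"ln B12 = 2: theta = {mean2:.3f} +/- {std2:.4f}, weights = {weights2}")
```

With equal evidences the averaged error bar (about 0.028) is wider than either model's own 0.02, which is the model uncertainty showing up in the parameter uncertainty.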


Summary

● Bayesians compare models by evidence ratios

● Balance goodness of fit against number of parameters

● Samplers exist that give parameter constraints and evidences (→ JP's lecture)

● Savage-Dickey Density Ratio may or may not be of relevance to you in case of nested models...

● … depending on your attitude towards priors (subjective/objective).

● Model comparison is prior dependent.