TRANSCRIPT
J. Daunizeau
Institute of Empirical Research in Economics, Zurich, Switzerland
Brain and Spine Institute, Paris, France
Bayesian inference
Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods
2.1 Sampling methods
2.2 Variational methods (ReML, EM, VB)
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
Degree of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)
Bayesian paradigm: probability theory basics
• normalization: ∫ p(x) dx = 1
• marginalization: p(x) = ∫ p(x, y) dy
• conditioning (Bayes rule): p(x|y) = p(x, y) / p(y) = p(y|x) p(x) / p(y)
Bayesian paradigm: deriving the likelihood function
- Model of data with unknown parameters: y = f(θ); e.g., GLM: f(θ) = Xθ
- But data are noisy: y = f(θ) + ε
- Assume noise/residuals are 'small': p(ε) ∝ exp(−ε²/(2σ²)), i.e., large deviations are rare (e.g., P(|ε| > 4) ≈ 0.05 for σ = 2)
→ Distribution of data, given fixed parameters (the likelihood): p(y|θ) ∝ exp(−(y − f(θ))²/(2σ²))
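As a minimal numerical sketch of the point above (a toy example assumed for illustration, not SPM code): under Gaussian noise, the likelihood of a GLM is just a squared-residual score, so the true parameters score higher than a wrong guess.

```python
import numpy as np

# Toy GLM y = X theta + eps with Gaussian noise (assumed example).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), np.linspace(0.0, 1.0, 50)])  # design matrix
theta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ theta_true + sigma * rng.normal(size=50)               # noisy data

def log_likelihood(theta, y, X, sigma):
    """Gaussian log-likelihood ln p(y|theta), up to an additive constant."""
    resid = y - X @ theta
    return -0.5 * np.sum(resid**2) / sigma**2

# The generating parameters explain the data better than theta = 0.
print(log_likelihood(theta_true, y, X, sigma) >
      log_likelihood(np.array([0.0, 0.0]), y, X, sigma))       # True
```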
Bayesian paradigm: likelihood, priors and the model evidence
Likelihood: p(y|θ, m)
Prior: p(θ|m)
Bayes rule: p(θ|y, m) = p(y|θ, m) p(θ|m) / p(y|m)
→ together, likelihood and prior specify the generative model m
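Bayes rule can be carried out numerically on a grid: multiply likelihood by prior and normalize by the evidence. A minimal sketch, assuming a toy conjugate example (one observation y = θ + noise, Gaussian prior) whose exact posterior mean is known:

```python
import numpy as np

# Bayes rule on a grid: posterior = likelihood * prior / evidence.
theta = np.linspace(-5.0, 5.0, 1001)            # grid over the parameter
dtheta = theta[1] - theta[0]

prior = np.exp(-0.5 * theta**2 / 4.0)           # Gaussian prior, variance 4
prior /= prior.sum() * dtheta                   # normalize so that sum p(theta) dtheta = 1

y, sigma = 1.5, 1.0                             # one observation, y = theta + noise
likelihood = np.exp(-0.5 * (y - theta)**2 / sigma**2)

evidence = (likelihood * prior).sum() * dtheta  # p(y|m): marginalize theta out
posterior = likelihood * prior / evidence       # Bayes rule

post_mean = (theta * posterior).sum() * dtheta  # analytic answer: y * 4/(4+1) = 1.2
print(round(post_mean, 2))                      # 1.2
```

The evidence p(y|m) falls out of the same computation as the normalizing constant, which is why it is available "for free" whenever the posterior is computed exactly.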
Bayesian paradigm: forward and inverse problems
forward problem: the likelihood p(y|θ, m)
inverse problem: the posterior distribution p(θ|y, m)
Bayesian paradigm: model comparison
Principle of parsimony: « plurality should not be assumed without necessity »
Model evidence: p(y|m) = ∫ p(y|θ, m) p(θ|m) dθ
"Occam's razor": [figure: the model evidence p(y|m) plotted over the space of all data sets; a flexible model spreads its evidence over many possible data sets, so a simpler model that fits the observed data attains the higher evidence]
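The Occam's razor effect can be seen directly in the evidence integral. A sketch, assuming a toy setup (one data point, model flexibility encoded as prior width): when the data are simple, the wider (more flexible) prior dilutes the evidence.

```python
import numpy as np

# Model evidence p(y|m) by numerical marginalization over theta.
theta = np.linspace(-20.0, 20.0, 4001)
dt = theta[1] - theta[0]
sigma = 1.0
y = 0.5                             # a "simple" observation near zero

def evidence(prior_sd):
    """p(y|m) for a model whose flexibility is its prior sd over theta."""
    prior = np.exp(-0.5 * (theta / prior_sd)**2)
    prior /= prior.sum() * dt
    lik = np.exp(-0.5 * (y - theta)**2 / sigma**2) / np.sqrt(2*np.pi*sigma**2)
    return (lik * prior).sum() * dt

# The narrow-prior (parsimonious) model wins on simple data.
print(evidence(prior_sd=1.0) > evidence(prior_sd=10.0))  # True
```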
Hierarchical models: principle
[figure: a hierarchy of hidden causes generating the data; arrows encode causality]
Hierarchical models: directed acyclic graphs (DAGs)
Hierarchical models: univariate linear hierarchical model
[figure: prior densities and posterior densities at each level of the hierarchy]
Frequentist versus Bayesian inference: a (quick) note on hypothesis testing

classical SPM:
• define the null, e.g.: H0: θ = 0
• estimate parameters (obtain test stat.): t = t(Y)
• apply decision rule, i.e.: if P(t > t*|H0) ≤ α then reject H0 (using the null distribution p(t|H0))

Bayesian PPM:
• define the null, e.g.: H0: θ > 0
• invert model (obtain posterior pdf p(θ|y))
• apply decision rule, i.e.: if P(H0|y) ≥ α then accept H0
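The two decision rules can be put side by side on the same data summary. A sketch with assumed numbers (known noise sd, flat prior on θ), in which case the two quantities are exactly complementary:

```python
import math

# One sample of size n from N(theta, sigma^2); both analyses use ybar.
n, sigma = 10, 1.0
ybar = 0.8                          # sample mean (assumed value)
se = sigma / math.sqrt(n)           # standard error

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# classical test of H0: theta = 0
t = ybar / se
p_value = 1.0 - Phi(t)              # P(t >= observed | H0)

# Bayesian statement under a flat prior: theta | y ~ N(ybar, se^2)
p_positive = 1.0 - Phi((0.0 - ybar) / se)   # P(theta > 0 | y)

# With a flat prior: P(theta > 0 | y) = 1 - p_value, exactly.
print(round(p_value + p_positive, 6))       # 1.0
```

With informative priors the two numbers come apart, which is the point of the bilateral-test discussion that follows.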
Frequentist versus Bayesian inference: what about bilateral tests?
• define the null and the alternative hypothesis in terms of priors, e.g.:
H0: p(θ|H0) = 1 if θ = 0, 0 otherwise (a point mass at zero)
H1: p(θ|H1) = N(0, Σ)
• apply decision rule, i.e.: if P(H1|y) / P(H0|y) ≥ 1 then reject H0
[figure: the densities p(Y|H0) and p(Y|H1) over the space of all datasets]
• Savage-Dickey ratios (nested models, i.i.d. priors):
p(y|H0) / p(y|H1) = p(θ = 0|y, H1) / p(θ = 0|H1)
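The Savage-Dickey identity can be checked numerically. A sketch assuming the simplest Gaussian case (y ~ N(θ, σ²), prior θ ~ N(0, s²) under H1, θ = 0 under H0), where both sides of the identity have closed forms:

```python
import math

def npdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean)**2 / var) / math.sqrt(2 * math.pi * var)

y, sigma2, s2 = 1.3, 1.0, 4.0       # assumed values

# Left-hand side: ratio of model evidences (marginal under H1 is N(0, sigma2+s2)).
bf01_evidence = npdf(y, 0.0, sigma2) / npdf(y, 0.0, sigma2 + s2)

# Right-hand side: posterior vs prior density of theta at 0, under H1.
post_mean = y * s2 / (s2 + sigma2)
post_var = s2 * sigma2 / (s2 + sigma2)
bf01_savage_dickey = npdf(0.0, post_mean, post_var) / npdf(0.0, 0.0, s2)

print(abs(bf01_evidence - bf01_savage_dickey) < 1e-12)  # True
```

The identity makes the Bayes factor of a nested model available from the full model's posterior alone, with no separate inversion of H0.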
Sampling methods: MCMC example (Gibbs sampling)
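A minimal Gibbs sampler, using the standard textbook target (a bivariate Gaussian with correlation ρ, assumed here for illustration): each coordinate is redrawn from its full conditional in turn, and the resulting chain reproduces the joint distribution.

```python
import numpy as np

# Gibbs sampling from a bivariate Gaussian with unit variances and corr rho.
rng = np.random.default_rng(1)
rho = 0.8
n_samples = 20000

x = np.zeros(2)
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    # full conditionals: x0 | x1 ~ N(rho*x1, 1-rho^2), and symmetrically
    x[0] = rng.normal(rho * x[1], np.sqrt(1 - rho**2))
    x[1] = rng.normal(rho * x[0], np.sqrt(1 - rho**2))
    samples[i] = x

# After discarding burn-in, the sample correlation is close to rho = 0.8.
corr = np.corrcoef(samples[2000:].T)[0, 1]
print(corr)
```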
Variational methods: VB / EM / ReML
→ VB: maximize the free energy F(q) w.r.t. the "variational" posterior q(θ) under some (e.g., mean field, Laplace) approximation
[figure: the mean-field marginals q(θ1) and q(θ2) approximate the exact marginals p(θ1|y, m) and p(θ2|y, m) of the joint posterior p(θ1, θ2|y, m)]
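The mean-field idea can be made concrete with the classic example of a Gaussian with unknown mean μ and precision τ (this is the standard textbook VB exercise, assumed here for illustration; it is not SPM's implementation). Factorize q(μ, τ) = q(μ) q(τ) and iterate the coordinate updates that maximize F(q):

```python
import numpy as np

# Mean-field VB for x_i ~ N(mu, 1/tau), conjugate priors:
# mu | tau ~ N(mu0, 1/(lam0*tau)),  tau ~ Gamma(a0, b0).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)   # data: true mu=2, tau=1
n, xbar = x.size, x.mean()

mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3      # vague priors

E_tau = 1.0                                    # initial guess
for _ in range(50):
    # q(mu) = N(muN, 1/lamN)
    muN = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lamN = (lam0 + n) * E_tau
    # q(tau) = Gamma(aN, bN), using expectations under q(mu)
    aN = a0 + 0.5 * (n + 1)
    bN = b0 + 0.5 * (np.sum((x - muN) ** 2) + n / lamN
                     + lam0 * ((muN - mu0) ** 2 + 1.0 / lamN))
    E_tau = aN / bN                            # feeds back into q(mu)

print(muN, 1.0 / E_tau)   # close to the sample mean and variance
```

Each update is analytic because of conjugacy; this is the same coordinate-ascent structure that EM and ReML instantiate as special cases.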
[figure: the SPM analysis pipeline: realignment, smoothing, normalisation (to a template), general linear model, and statistical inference via Gaussian field theory (p < 0.05); Bayesian counterparts: segmentation and normalisation, posterior probability maps (PPMs), multivariate decoding, dynamic causal modelling]
aMRI segmentation: mixture of Gaussians (MoG) model
[figure: tissue classes: grey matter, white matter, CSF]
The i-th voxel value yi is drawn from the class indexed by its label ci ∈ {1, …, k}; the model parameters are the class frequencies λ1, …, λk, the class means μ1, …, μk and the class variances σ1², …, σk².
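A 1-D toy analogue of this model (two assumed classes standing in for tissue types; SPM's actual segmentation is richer) can be fit with EM: responsibilities in the E-step, then re-estimation of frequencies, means and variances in the M-step.

```python
import numpy as np

# Toy "voxel intensities" from two Gaussian classes.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 300),    # class 1
                    rng.normal(5.0, 1.0, 700)])   # class 2

k = 2
lam = np.full(k, 1.0 / k)          # class frequencies
mu = np.array([-1.0, 1.0])         # class means (rough init)
var = np.ones(k)                   # class variances

for _ in range(100):
    # E-step: responsibilities p(c_i = j | y_i)
    dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = lam * dens
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate frequencies, means and variances
    nj = r.sum(axis=0)
    lam = nj / y.size
    mu = (r * y[:, None]).sum(axis=0) / nj
    var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nj

print(np.sort(mu))   # close to the true class means (0 and 5)
```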
Decoding of brain images: recognizing brain states from fMRI
[figure: paradigm: fixation cross (+), paced response (>>)]
[figure: log-evidence of X-Y sparse mappings: effect of lateralization; log-evidence of X-Y bilateral mappings: effect of spatial deployment]
fMRI time series analysis: spatial priors and model comparison
[figure: hierarchical model for the fMRI time series: GLM coefficients with a prior variance, a prior variance of the data noise, and AR coefficients (correlated noise); two design matrices (X): short-term memory and long-term memory]
PPM: regions best explained by the short-term memory model
PPM: regions best explained by the long-term memory model
Dynamic Causal Modelling: network structure identification
[figure: four candidate models m1–m4; each is a network of V1, V5 and PPC, driven by stim and modulated by attention]
[figure: the models' marginal likelihoods ln p(y|m) (bar plot, scale 0–15)]
[figure: estimated effective synaptic strengths for the best model (m4): 1.25, 0.13, 0.46, 0.39, 0.26, 0.26, 0.10]
DCMs and DAGs: a note on causality
[figure: a cyclic network of three coupled nodes, unfolded over time steps t0, t0 + Δt, … , becomes a directed acyclic graph; inputs u drive the nodes]
Evolution equation: ẋ = f(x, u, θ)
Dynamic Causal Modelling: model comparison for group studies
[figure: differences in log-model evidences ln p(y|m1) − ln p(y|m2) across subjects]
fixed effect: assume all subjects correspond to the same model
random effect: assume different subjects might correspond to different models
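Under the fixed-effect assumption the group-level comparison is simple: subjects are conditionally independent, so the group log Bayes factor is the sum of subject-level log-evidence differences. A sketch with assumed (made-up) log-evidence values:

```python
import numpy as np

# ln p(y_s | m) for 5 subjects (rows) x 2 models (columns); assumed values.
log_ev = np.array([
    [-120.0, -123.1],
    [-98.5,  -97.9],
    [-110.2, -113.0],
    [-105.7, -108.4],
    [-99.9,  -101.2],
])

# Fixed effect: all subjects share one model, so log Bayes factors add.
group_log_bf = (log_ev[:, 0] - log_ev[:, 1]).sum()
print(round(group_log_bf, 1))  # 9.3, favouring m1
```

A random-effect analysis would instead treat the model identity as varying across subjects (e.g., estimating model frequencies in the population), which is more robust to outlier subjects.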
I thank you for your attention.
A note on statistical significance: lessons from the Neyman-Pearson lemma
• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test
p(y|H1) / p(y|H0) ≥ u
is the most powerful test of size α = P( p(y|H1)/p(y|H0) ≥ u | H0 ) for testing the null.
[figure: ROC analysis (error I rate versus 1 − error II rate): MVB (Bayes factor), u = 1.09, power = 56%; CCA (F-statistic), F = 2.20, power = 20%]
• what is the threshold u above which the Bayes factor test yields an error I rate of 5%?
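That question can be answered by simulating the test statistic under the null. A sketch, assuming the simplest Gaussian case (H0: y ~ N(0,1); H1: θ ~ N(0, s²), so marginally y ~ N(0, 1 + s²)): calibrate u on null simulations, then read off the power under H1.

```python
import numpy as np

rng = np.random.default_rng(0)
s2 = 4.0

def log_bf10(y):
    """ln [ p(y|H1) / p(y|H0) ] for the assumed Gaussian example."""
    return (-0.5 * y**2 / (1 + s2) - 0.5 * np.log(2*np.pi*(1 + s2))) \
         - (-0.5 * y**2 - 0.5 * np.log(2*np.pi))

# Calibrate u: 95th percentile of the statistic under H0 (error I rate 5%).
y0 = rng.normal(0.0, 1.0, size=200_000)
u = np.quantile(log_bf10(y0), 0.95)

# Check the achieved error I rate, and estimate the power under H1.
err1 = np.mean(log_bf10(y0) > u)
y1 = rng.normal(0.0, np.sqrt(1 + s2), size=200_000)
power = np.mean(log_bf10(y1) > u)
print(err1, power)   # error I rate close to 0.05 by construction
```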