
Massively Univariate Inference for fMRI

Thomas Nichols, Ph.D., Assistant Professor

Department of Biostatistics, University of Michigan

http://www.sph.umich.edu/~nichols

ISBI, April 15, 2004

Introduction & Overview

• Inference in fMRI
– Where’s the blob!?

• Overview

I. Statistics Background

II. Assessing Statistic Images

III. Thresholding Statistic Images

I. Statistics Background

• Hypothesis Testing

• Fixed vs Random Effects

• Nonparametric/resampling inference


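Hypothesis Testing

• Null Hypothesis $H_0$
• Test statistic T
– t observed realization of T
• $\alpha$ level
– Acceptable false positive rate
– $P(T > u_\alpha \mid H_0) = \alpha$
• P-value
– Assessment of t assuming $H_0$
– $P(T > t \mid H_0)$
• Prob. of obtaining stat. as large or larger in a new experiment
– P(Data|Null) not P(Null|Data)

[Figure: null distribution of T, with threshold $u_\alpha$ cutting off upper-tail area $\alpha$]

[Figure: null distribution of T, with observed t cutting off upper-tail area equal to the P-value]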
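As a rough illustration of the threshold and the P-value defined on this slide, the sketch below assumes T is standard normal under the null. That distribution is an assumption made only for this example (the slide does not fix one), and the function names are hypothetical.

```python
from statistics import NormalDist

# Assumption for illustration only: T ~ N(0, 1) under the null hypothesis.
null = NormalDist(mu=0.0, sigma=1.0)

def threshold(alpha):
    """u_alpha such that P(T > u_alpha | H0) = alpha."""
    return null.inv_cdf(1.0 - alpha)

def p_value(t):
    """P(T > t | H0): upper-tail probability of the observed statistic."""
    return 1.0 - null.cdf(t)

u = threshold(0.05)
p = p_value(2.5)
print(f"u_0.05 = {u:.3f}, P-value for t = 2.5: {p:.4f}")
```

Rejecting when t exceeds the threshold is the same decision as rejecting when the P-value falls below the chosen level.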


Random Effects Models

• GLM has only one source of randomness
– Residual error
• But people are another source of error
– Everyone activates somewhat differently…

$Y = X\beta + \epsilon$


Fixed vs. Random Effects

• Fixed Effects
– Intra-subject variation suggests all these subjects different from zero
• Random Effects
– Inter-subject variation suggests population not very different from zero

[Figure: sampling distributions of each subject’s effect (Subj. 1 to Subj. 6) relative to 0, together with the population distribution of effect]


Random Effects for fMRI

• Summary Statistic Approach
– Easy
• Create contrast images for each subject
• Analyze contrast images with one-sample t
– Limited
• Only allows one scan per subject
• Assumes balanced designs and homogeneous meas. error.
• Full Mixed Effects Analysis
– Hard
• Requires iterative fitting
• REML to estimate inter- and intra-subject variance
– SPM2 & FSL implement this, very differently
– Very flexible
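The summary statistic approach reduces to a one-sample t test on one contrast value per subject at each voxel. A minimal sketch for a single voxel follows; the contrast values and the function name are hypothetical, not taken from the talk.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(contrasts):
    """One-sample t statistic and df for per-subject contrast values at one voxel."""
    n = len(contrasts)
    # stdev uses the n-1 denominator, so the standard error reflects
    # subject-to-subject variability of the contrast values
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t, n - 1

# Hypothetical contrast estimates, one per subject
contrasts = [1.0, 2.0, 3.0, 4.0, 5.0]
t, df = one_sample_t(contrasts)
print(f"t = {t:.3f} on {df} df")
```

Because only one number per subject enters the test, the variance estimate is driven by between-subject spread, which is what makes the inference a random effects one.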


Nonparametric Inference

• Parametric methods
– Assume distribution of statistic under null hypothesis
– Needed to find P-values, $u_\alpha$
• Nonparametric methods
– Use data to find distribution of statistic under null hypothesis
– Any statistic!

[Figure: parametric null distribution with 5% upper tail; nonparametric null distribution with 5% upper tail]


Nonparametric Inference: Permutation Test

• Assumptions
– Null Hypothesis Exchangeability
• Method
– Compute statistic t
– Resample data (without replacement), compute t*
– {t*} permutation distribution of test statistic
– P-value = #{ t* ≥ t } / #{ t* }
• Theory
– Given data and $H_0$, each t* has equal probability
– Still can assume data randomly drawn from population


Permutation Test: Toy Example

• Data from V1 voxel in visual stim. experiment
A: Active, flashing checkerboard
B: Baseline, fixation
6 blocks, ABABAB
Just consider block averages...
• Null hypothesis $H_0$
– No experimental effect, A & B labels arbitrary
• Statistic
– Mean difference

A       B      A      B      A      B
103.00  90.48  99.93  87.83  99.76  96.06



• Under $H_0$
– Consider all equivalent relabelings
– Compute all possible statistic values
– Find 95%ile of permutation distribution

Relabeling  Stat.    Relabeling  Stat.    Relabeling  Stat.    Relabeling  Stat.
AAABBB       4.82    ABABAB       9.45    BAAABB      -1.48    BABBAA      -6.86
AABABB      -3.25    ABABBA       6.97    BAABAB       1.10    BBAAAB       3.15
AABBAB      -0.67    ABBAAB       1.38    BAABBA      -1.38    BBAABA       0.67
AABBBA      -3.15    ABBABA      -1.10    BABAAB      -6.97    BBABAA       3.25
ABAABB       6.86    ABBBAA       1.48    BABABA      -9.45    BBBAAA      -4.82

[Figure: histogram of the permutation distribution, horizontal axis from -8 to 8]
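The toy example can be reproduced by enumerating all 20 ways of labeling three of the six blocks as A. The sketch below uses the rounded block averages printed on the slide, so its values need not match the slide’s table entry for entry; the observed ABABAB labeling gives about 9.44 here. Variable and function names are hypothetical.

```python
from itertools import combinations

# Block averages from the slide, acquired in the order A B A B A B
data = [103.00, 90.48, 99.93, 87.83, 99.76, 96.06]
observed_a = (0, 2, 4)   # indices labeled A in the actual experiment

def mean_diff(a_idx):
    """Mean of blocks labeled A minus mean of blocks labeled B."""
    a = [data[i] for i in a_idx]
    b = [data[i] for i in range(len(data)) if i not in a_idx]
    return sum(a) / len(a) - sum(b) / len(b)

t_obs = mean_diff(observed_a)
# Permutation distribution: one statistic per relabeling (observed one included)
perm_dist = [mean_diff(idx) for idx in combinations(range(6), 3)]
# P-value: fraction of relabelings with a statistic at least as large as observed
p = sum(t >= t_obs for t in perm_dist) / len(perm_dist)
print(f"observed = {t_obs:.2f}, relabelings = {len(perm_dist)}, P = {p:.2f}")
```

The observed labeling puts the three largest block averages in A, so it yields the largest statistic and P = 1/20 = 0.05, the smallest P-value attainable with only 20 relabelings.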


Permutation Test: Strengths

• Requires only assumption of exchangeability
– Under $H_0$, distribution unperturbed by permutation
– Allows us to build permutation distribution
• Subjects are exchangeable
– Under $H_0$, each subject’s A/B labels can be flipped
• fMRI scans not exchangeable under $H_0$
– Due to temporal autocorrelation


Permutation Test: Limitations

• Computational Intensity
– Analysis repeated for each relabeling
– Not so bad on modern hardware
• No analysis discussed below took more than 3 hours
• Implementation Generality
– Each experimental design type needs unique code to generate permutations
• Not so bad for population inference with t-tests


Nonparametric Inference: The Bootstrap

• Theoretical differences
– Independence instead of exchangeability
– Asymptotically valid
• For each design, finite sample size properties should be evaluated
• Practical differences
– Resample residuals with replacement
– Residuals, not data, resampled
• (1) Estimate model parameter b or statistic t
• (2) Create residuals
• (3) Resample residuals e*
• (4) Add model fit to e*, constitute Bootstrap dataset y*
• (5) Estimate b* or t* using y*
• (6) Repeat (3)-(5) to build Bootstrap distribution of b* or t*
• Many important details
– See books: Efron & Tibshirani (1993), Davison & Hinkley (1997)
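Steps (1) to (6) can be sketched for a simple straight-line model. This is a bare-bones illustration under stated assumptions: the model y = a + b x, the data, and all function names are hypothetical, and none of the refinements covered in the cited books (residual rescaling, studentization, and so on) are included.

```python
import random

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residual_bootstrap(x, y, n_boot=1000, seed=0):
    rng = random.Random(seed)
    a, b = fit_line(x, y)                                     # (1) estimate b
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]            # (2) residuals
    boot_b = []
    for _ in range(n_boot):                                   # (6) repeat (3)-(5)
        e_star = rng.choices(resid, k=len(resid))             # (3) resample, with replacement
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]  # (4) bootstrap dataset y*
        boot_b.append(fit_line(x, y_star)[1])                 # (5) estimate b*
    return b, boot_b

# Hypothetical data
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8]
b_hat, boot_b = residual_bootstrap(x, y)
boot_sorted = sorted(boot_b)
print(f"b = {b_hat:.3f}, rough 95% percentile interval: "
      f"({boot_sorted[24]:.3f}, {boot_sorted[974]:.3f})")
```

Note that the residuals, not the raw observations, are resampled, so the design (the x values) stays fixed across bootstrap datasets.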

II. Assessing Statistic Images

Where’s the signal?

• High Threshold (t > 5.5)
– Good Specificity
– Poor Power (risk of false negatives)
• Med. Threshold (t > 3.5)
• Low Threshold (t > 0.5)
– Poor Specificity (risk of false positives)
– Good Power

...but why threshold?!


Blue-sky inference: What we’d like

• Don’t threshold, model!
– Signal location?
• Estimates and CI’s on (x,y,z) location
– Signal intensity?
• CI’s on % change
– Spatial extent?
• Estimates and CI’s on activation volume
• Robust to choice of cluster definition
• ...but this requires an explicit spatial model


Blue-sky inference... a spatial model?

• No routine spatial modeling methods exist
– High-dimensional mixture modeling problem
– Activations don’t look like Gaussian blobs
– Need realistic shapes, sparse representation
• One initial attempt:
– Hartvig, N. and Jensen, J. (2000). HBM, 11(4):233-248.


Blue-sky inference: What we get

• Signal location
– Local maximum (no inference)
– Center-of-mass (no inference)
• Sensitive to blob-defining-threshold
• Signal intensity
– Local maximum intensity (P-value, CI’s)
• Spatial extent
– Volume above blob-defining-threshold (P-value, no CI’s)
• Sensitive to blob-defining-threshold


Voxel-wise Tests

• Retain voxels above $\alpha$-level threshold $u_\alpha$
• Gives best spatial specificity
– The null hyp. at a single voxel can be rejected

[Figure: statistic profile over space with threshold $u_\alpha$; voxels above it are significant, the rest are not]


Cluster-wise tests

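• Two step-process
– Define clusters by arbitrary threshold $u_{clus}$
– Retain clusters larger than $\alpha$-level threshold $k_\alpha$

[Figure: statistic profile over space thresholded at $u_{clus}$; a cluster with extent below $k_\alpha$ is not significant, a cluster with extent above $k_\alpha$ is significant]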
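The two-step process can be sketched on a one-dimensional statistic profile. The profile, the cluster-forming threshold and the extent threshold k below are illustrative values only; in practice k would come from random field theory or a permutation test. Function names are hypothetical.

```python
def clusters_1d(stat, u_clus):
    """Return (start, end) index pairs of contiguous runs with stat > u_clus."""
    runs, start = [], None
    for i, value in enumerate(stat):
        if value > u_clus and start is None:
            start = i                      # a new cluster begins
        elif value <= u_clus and start is not None:
            runs.append((start, i))        # cluster ends just before i
            start = None
    if start is not None:                  # cluster running to the last voxel
        runs.append((start, len(stat)))
    return runs

def significant_clusters(stat, u_clus, k):
    """Step 2: keep clusters whose extent (voxel count) is larger than k."""
    return [(s, e) for s, e in clusters_1d(stat, u_clus) if e - s > k]

# Hypothetical 1-D statistic profile
stat = [0.2, 2.6, 2.9, 0.1, 2.7, 3.1, 3.4, 2.8, 0.3, 0.0]
print(clusters_1d(stat, u_clus=2.5))                # [(1, 3), (4, 8)]
print(significant_clusters(stat, u_clus=2.5, k=3))  # [(4, 8)]
```

The two-voxel cluster is dropped even though its voxels exceed the cluster-forming threshold, which is the sense in which the inference is on cluster extent rather than on individual voxels.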


Inference on Images: Cluster-wise tests

• Typically better sensitivity
• Worse spatial specificity
– The null hyp. of entire cluster is rejected
– Only means that one or more of voxels in cluster active

[Figure: statistic profile over space thresholded at $u_{clus}$; a cluster with extent below $k_\alpha$ is not significant, a cluster with extent above $k_\alpha$ is significant]


Voxel-Cluster Inference

• Joint Inference
– Combine both voxel height T & cluster size C
– Poline et al used bivariate probability
• JCBFM (1994) 14:639-642.
• $P(\max_{x \in \mathrm{Cluster}} T(x) > t,\; C > c)$
• Only for Gaussian images, Gaussian ACF
– Bullmore et al use “cluster mass”
• IEEE-TMI (1999) 18(1):32-42.
• Test statistic = $\int_{x \in \mathrm{Cluster}} (T(x) - u_{clus})\,dx$
• No RFT result, uses permutation test


Voxel-Cluster Inference

• Not a new “level” of inference
– cf. Friston et al (1996), Detecting Activations in PET and fMRI: Levels of Inference and Power. NI 4:223-235
• Joint null is intersection of nulls
– Joint null = {Voxel null} $\cap$ {Cluster null}
– If joint null rejected, only know one or other is false
– Can’t point to any voxel as active
– Joint voxel-cluster inference really just “Intensity-informed cluster-wise inference”


Multiple Comparisons Problem

• Which of 100,000 voxels are sig.?
– $\alpha = 0.05$ $\Rightarrow$ 5,000 false positive voxels
• Which of (random number, say) 100 clusters significant?
– $\alpha = 0.05$ $\Rightarrow$ 5 false positive clusters

[Figure: statistic image thresholded at t > 0.5, t > 1.5, t > 2.5, t > 3.5, t > 4.5, t > 5.5, t > 6.5]
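The arithmetic behind these counts is simply the number of tests times the per-test level when every null is true. The sketch below also shows the chance of one or more false positives under the added assumption of independent tests, which real voxels do not satisfy; it is included only to show how quickly the problem grows. Function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha

def fwer_if_independent(n_tests, alpha):
    """P(one or more false positives), assuming independent tests (illustrative)."""
    return 1.0 - (1.0 - alpha) ** n_tests

print(round(expected_false_positives(100_000, 0.05)))  # voxels
print(round(expected_false_positives(100, 0.05)))      # clusters
print(round(fwer_if_independent(100, 0.05), 3))
```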


MCP Solutions: Measuring False Positives

• Familywise Error Rate (FWER)
– Familywise Error
• Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
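For a single simulated image where the truth is known, R, V, the realized false discovery rate and the occurrence of a familywise error can be tallied directly. The ten-voxel example and the function name are hypothetical; setting V/R to 0 when nothing is rejected is a common convention adopted here as an assumption.

```python
def error_summary(is_null, rejected):
    """Count rejections R and false rejections V; report realized FDR and FWE."""
    R = sum(rejected)
    V = sum(1 for null, rej in zip(is_null, rejected) if null and rej)
    realized_fdr = V / R if R > 0 else 0.0   # convention: V/R = 0 when R = 0
    familywise_error = V >= 1                # one or more false positives
    return R, V, realized_fdr, familywise_error

# Hypothetical 10-voxel image: first 4 voxels carry signal, last 6 are null
is_null  = [False, False, False, False, True, True, True, True, True, True]
rejected = [True,  True,  True,  False, True, False, False, False, False, False]
print(error_summary(is_null, rejected))  # (4, 1, 0.25, True)
```

FWER and FDR are the averages of the last two quantities over repeated experiments: FWER is the probability that the error flag is true, FDR the expectation of V/R.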


Measuring False Positives: FWER vs FDR

[Figure: Signal, Noise, and Signal+Noise images]

Control of Per Comparison Rate at 10%
– Percentage of Null Pixels that are False Positives:
11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5%

Control of Familywise Error Rate at 10%
– Occurrence of Familywise Error: realizations with an error are marked “FWE”

Control of False Discovery Rate at 10%
– Percentage of Activated Pixels that are False Positives:
6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7%

III. Thresholding Statistic Images

FWER MCP Solutions

• Bonferroni
• Maximum Distribution Methods
– Random Field Theory
– Permutation


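FWER MCP Solutions: Bonferroni

• For a statistic image T...
– $T_i$: ith voxel of statistic image T
• ...use $\alpha = \alpha_0/V$
– $\alpha_0$: FWER level (e.g. 0.05)
– V: number of voxels
– $u_\alpha$: $\alpha$-level statistic threshold, $P(T_i \ge u_\alpha) = \alpha$
• By Bonferroni inequality...

$\mathrm{FWER} = P(\mathrm{FWE}) = P\left(\bigcup_i \{T_i \ge u_\alpha\} \mid H_0\right) \le \sum_i P(T_i \ge u_\alpha \mid H_0) = \sum_i \alpha = \sum_i \alpha_0/V = \alpha_0$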
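A minimal sketch of the Bonferroni threshold follows. It assumes each voxel statistic is standard normal under the null, which is an assumption of this example rather than of the slide (a t image would need the t distribution instead); the function name is hypothetical.

```python
from statistics import NormalDist

def bonferroni_threshold(alpha0, n_voxels):
    """Per-voxel level alpha0/V and the matching threshold for N(0, 1) nulls."""
    alpha = alpha0 / n_voxels
    u = NormalDist().inv_cdf(1.0 - alpha)   # P(T_i >= u) = alpha under H0
    return alpha, u

alpha, u = bonferroni_threshold(0.05, 100_000)
print(f"per-voxel alpha = {alpha:.1e}, threshold u = {u:.2f}")
```

Summing the per-voxel level over all V voxels returns the familywise level, which is exactly the bound in the inequality above.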



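FWER MCP Solutions: Controlling FWER w/ Max

• FWER & distribution of maximum

$\mathrm{FWER} = P(\mathrm{FWE}) = P\left(\bigcup_i \{T_i \ge u\} \mid H_0\right) = P(\max_i T_i \ge u \mid H_0)$

• 100(1-$\alpha$)%ile of max distn controls FWER

$\mathrm{FWER} = P(\max_i T_i \ge u_\alpha \mid H_0) = \alpha$

– where $u_\alpha = F_{max}^{-1}(1 - \alpha)$

[Figure: distribution of the maximum, with threshold $u_\alpha$ cutting off upper-tail area $\alpha$]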
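The link between FWER and the distribution of the maximum can be checked by simulation. The sketch below assumes independent standard normal voxels, purely so that the exact percentile of the maximum is available for comparison; real images are spatially correlated, which is why random field theory or permutation is used instead. Names are hypothetical.

```python
import random
from statistics import NormalDist

def max_null_threshold(n_voxels, alpha, n_sim=2000, seed=0):
    """Monte Carlo estimate of the 100(1-alpha)%ile of the null maximum."""
    rng = random.Random(seed)
    # One maximum per simulated null image of independent N(0, 1) voxels
    maxima = sorted(
        max(rng.gauss(0.0, 1.0) for _ in range(n_voxels))
        for _ in range(n_sim)
    )
    index = min(n_sim - 1, int((1.0 - alpha) * n_sim))
    return maxima[index]

V, alpha = 100, 0.05
u_mc = max_null_threshold(V, alpha)
# With independent voxels F_max(u) = Phi(u)^V, so the exact answer is known
u_exact = NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / V))
print(f"Monte Carlo u = {u_mc:.2f}, exact u = {u_exact:.2f}")
```

Thresholding every voxel at this value yields a false positive anywhere in a null image only when the maximum exceeds it, which happens with probability close to the chosen level.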


FWER MCP Solutions: Random Field Theory

• Euler Characteristic $\chi_u$
– Topological Measure
• #blobs - #holes
– At high thresholds, just counts blobs
– FWER = $P(\text{Max voxel} \ge u \mid H_0)$
= $P(\text{One or more blobs} \mid H_0)$
$\approx P(\chi_u \ge 1 \mid H_0)$ (no holes)
$\approx E(\chi_u \mid H_0)$ (never more than 1 blob)

[Figure: random field, threshold, and the resulting suprathreshold sets]


RFT Details: Expected Euler Characteristic

$E(\chi_u) \approx \lambda(\Omega)\,|\Lambda|^{1/2}\,(u^2 - 1)\,\exp(-u^2/2) / (2\pi)^2$

– $\Omega$: search region, $\Omega \subset \mathbb{R}^3$
– $\lambda(\Omega)$: volume
– $|\Lambda|^{1/2}$: roughness
• Assumptions
– Multivariate Normal
– Stationary*
– ACF twice differentiable at 0

* Stationary
– Results valid w/out stationarity
– More accurate when stat. holds

Only very upper tail approximates $1 - F_{max}(u)$
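The expected Euler characteristic formula for a three-dimensional Gaussian field can be evaluated directly. The sketch assumes the roughness is expressed through the FWHM as $|\Lambda|^{1/2} = (4 \ln 2)^{3/2} / (FWHM_x\,FWHM_y\,FWHM_z)$, the usual Gaussian-kernel parameterization; the search volume and smoothness values are hypothetical, and the function name is not from any package.

```python
from math import exp, log, pi

def expected_ec_gaussian_3d(u, volume, fwhm):
    """E(chi_u) ~ volume * |Lambda|^(1/2) * (u^2 - 1) * exp(-u^2/2) / (2*pi)^2."""
    fx, fy, fz = fwhm
    # Assumed FWHM parameterization of the roughness (same units as volume)
    roughness = (4.0 * log(2.0)) ** 1.5 / (fx * fy * fz)
    return volume * roughness * (u * u - 1.0) * exp(-u * u / 2.0) / (2.0 * pi) ** 2

# Hypothetical search region: 100,000 voxels, 3 x 3 x 3 voxel FWHM
for u in (4.0, 4.5, 5.0):
    ec = expected_ec_gaussian_3d(u, volume=100_000, fwhm=(3.0, 3.0, 3.0))
    print(f"u = {u:.1f}: E(chi_u) ~ {ec:.4f}")
```

At high thresholds the value can be read as an approximate FWE-corrected P-value; it grows with search volume and with roughness, and falls as u increases.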


Random Field Theory: Smoothness Parameterization

• $E(\chi_u)$ depends on $|\Lambda|^{1/2}$
– $\Lambda$ roughness matrix
• Smoothness parameterized as Full Width at Half Maximum
– FWHM of Gaussian kernel needed to smooth a white noise random field to roughness $\Lambda$

[Figure: autocorrelation function with its FWHM marked]


Random Field Theory: Smoothness Parameterization

• RESELS
– Resolution Elements
– 1 RESEL = $FWHM_x \times FWHM_y \times FWHM_z$
– RESEL Count R
• $R = \lambda(\Omega) / (FWHM_x \times FWHM_y \times FWHM_z)$, proportional to $\lambda(\Omega)\,|\Lambda|^{1/2}$
• Volume of search region in units of smoothness
• Eg: 10 voxels, 2.5 FWHM, 4 RESELS
• Wrong RESEL interpretation
– “Number of independent ‘things’ in the image”
• See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.


Random Field Intuition

• Corrected P-value for voxel value t

$P_c = P(\max T > t) \approx E(\chi_t) \propto \lambda(\Omega)\,|\Lambda|^{1/2}\,t^2\,\exp(-t^2/2)$

• Statistic value t increases
– $P_c$ decreases (but only for large t)
• Search volume increases
– $P_c$ increases (more severe MCP)
• Roughness increases (Smoothness decreases)
– $P_c$ increases (more severe MCP)


RFT Details: Super General Formula

• General form for expected Euler characteristic
• $\chi^2$, F, & t fields • restricted search regions • D dimensions

$E[\chi_u(\Omega)] = \sum_d R_d(\Omega)\,\rho_d(u)$

$R_d(\Omega)$: d-dimensional Minkowski functional of $\Omega$
– function of dimension, space $\Omega$ and smoothness:
$R_0(\Omega) = \chi(\Omega)$, Euler characteristic of $\Omega$
$R_1(\Omega)$ = resel diameter
$R_2(\Omega)$ = resel surface area
$R_3(\Omega)$ = resel volume

$\rho_d(u)$: d-dimensional EC density of Z(x)
– function of dimension and threshold, specific for RF type:
E.g. Gaussian RF:
$\rho_0(u) = 1 - \Phi(u)$
$\rho_1(u) = (4 \ln 2)^{1/2} \exp(-u^2/2) / (2\pi)$
$\rho_2(u) = (4 \ln 2)\,u\,\exp(-u^2/2) / (2\pi)^{3/2}$
$\rho_3(u) = (4 \ln 2)^{3/2} (u^2 - 1) \exp(-u^2/2) / (2\pi)^2$
$\rho_4(u) = (4 \ln 2)^2 (u^3 - 3u) \exp(-u^2/2) / (2\pi)^{5/2}$


Random Field Theory: Cluster Size Tests

• Expected Cluster Size
– E(S) = E(N)/E(L)
– S cluster size
– N suprathreshold volume, $\lambda(\{T > u_{clus}\})$
– L number of clusters
• $E(N) = \lambda(\Omega)\,P(T > u_{clus})$
• $E(L) \approx E(\chi_u)$
– Assuming no holes

[Figure: example images at 5mm, 10mm and 15mm FWHM]


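Random Field Theory: Cluster Size Distribution

• Gaussian Random Fields (Nosko, 1969)

$S^{2/D} \sim \mathrm{Exp}\left( \left[ \frac{E(N)}{\Gamma(D/2 + 1)\,E(L)} \right]^{-2/D} \right)$ (rate parameterization, so that E(S) = E(N) / E(L))

– D: Dimension of RF
• t Random Fields (Cao, 1999)

[Equation: distribution of S as a function of c, B and the U’s]

– B: Beta distn
– U’s: $\chi^2$’s
– c chosen s.t. E(S) = E(N) / E(L)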

Random Field Theory: Cluster Size Corrected P-Values

• Previous results give uncorrected P-value
• Corrected P-value
– Bonferroni
• Correct for expected number of clusters
• Corrected $P_c = E(L)\,P_{uncorr}$
– Poisson Clumping Heuristic (Adler, 1980)
• Corrected $P_c = 1 - \exp(-E(L)\,P_{uncorr})$
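The two corrections are one-liners and can be compared numerically. In this sketch the expected number of clusters and the uncorrected P-value are hypothetical inputs, and capping the Bonferroni value at 1 is an addition of the example, not something stated on the slide.

```python
from math import exp

def corrected_p_bonferroni(p_uncorr, expected_clusters):
    """P_c = E(L) * P_uncorr, capped at 1 (the cap is an assumption of this sketch)."""
    return min(1.0, expected_clusters * p_uncorr)

def corrected_p_poisson(p_uncorr, expected_clusters):
    """Poisson clumping heuristic: P_c = 1 - exp(-E(L) * P_uncorr)."""
    return 1.0 - exp(-expected_clusters * p_uncorr)

# Hypothetical values: 10 clusters expected, uncorrected cluster P = 0.001
p_bonf = corrected_p_bonferroni(0.001, 10.0)
p_pois = corrected_p_poisson(0.001, 10.0)
print(f"Bonferroni: {p_bonf:.5f}, Poisson clumping: {p_pois:.5f}")
```

For small values of E(L) times the uncorrected P-value the two agree closely, with the Poisson form always slightly smaller.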


Random Field Theory Limitations

• Sufficient smoothness
– FWHM smoothness 3-4 times voxel size
– More like ~10 times for low-df data
• Smoothness estimation
– Estimate is biased when images not sufficiently smooth
• Multivariate normality
– Virtually impossible to check
• Several layers of approximations

[Figure: lattice image data vs. continuous random field]


Real Data

• fMRI Study of Working Memory
– 12 subjects, block design
– Marshuetz et al (2000)
– Item Recognition
• Active: View five letters, 2s pause, view probe letter, respond
• Baseline: View XXXXX, 2s pause, view Y or N, respond
• Second Level RFX
– Difference image, A-B constructed for each subject
– One sample t test

[Figure: example trials. Active: UBKDA ... D ... yes. Baseline: XXXXX ... N ... no.]


Real Data: RFT Result

• Threshold
– S = 110,776
– 2 × 2 × 2 voxels
– 5.1 × 5.8 × 6.9 mm FWHM
– u = 9.870
• Result
– 5 voxels above the threshold
– 0.0063 minimum FWE-corrected p-value

[Figure: $-\log_{10}$ p-value image]



Controlling FWER: Permutation Test

• Parametric methods
– Assume distribution of max statistic under null hypothesis
• Nonparametric methods
– Use data to find distribution of max statistic under null hypothesis
– Any max statistic!

[Figure: parametric null max distribution with 5% upper tail; nonparametric null max distribution with 5% upper tail]


Permutation Test: Other Statistics

• Nonparametric allows arbitrary statistics
– Standard ones are usually best, but...
• Consider smoothed variance t statistic
– To regularize low-df variance estimate


Permutation Test: Smoothed Variance t

• Consider smoothed variance t statistic

[Figure: t-statistic image formed from the mean difference image and the variance image]

[Figure: smoothed variance t-statistic image formed from the mean difference image and the smoothed variance image]

Real Data: Permutation Result

• Permute!
– $2^{12}$ = 4,096 ways to flip 12 A/B labels
– For each, note maximum of t image.

[Figure: permutation distribution of maximum t; orthogonal slice overlay of thresholded t]
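The sign-flipping procedure can be sketched on a tiny synthetic data set: 12 subjects, 10 voxels, with a real effect placed at voxel 0. Everything here (the data, the function names, the effect size) is hypothetical, and the threshold rule, taking the largest value that at most a fraction 0.05 of the maxima exceed, is one common convention stated as an assumption.

```python
from itertools import product
from math import sqrt
import random

def t_image(diffs):
    """One-sample t at each voxel; diffs[s][v] is the A-B difference for subject s."""
    n, n_vox = len(diffs), len(diffs[0])
    t = []
    for v in range(n_vox):
        col = [row[v] for row in diffs]
        m = sum(col) / n
        var = sum((c - m) ** 2 for c in col) / (n - 1)
        t.append(m / sqrt(var / n))
    return t

def max_t_threshold(diffs, alpha=0.05):
    """FWER threshold from the permutation distribution of the maximum t."""
    max_dist = []
    for signs in product((1, -1), repeat=len(diffs)):   # every A/B flip pattern
        flipped = [[s * d for d in row] for s, row in zip(signs, diffs)]
        max_dist.append(max(t_image(flipped)))           # note maximum of t image
    max_dist.sort()
    c = int(alpha * len(max_dist))     # number of maxima allowed above the threshold
    return max_dist[len(max_dist) - c - 1], len(max_dist)

# Hypothetical data: 12 subjects, 10 voxels, real effect only at voxel 0
rng = random.Random(1)
diffs = [[rng.gauss(0.0, 1.0) + (4.0 if v == 0 else 0.0) for v in range(10)]
         for _ in range(12)]
t_obs = t_image(diffs)
u, n_perm = max_t_threshold(diffs)
significant = [v for v, t in enumerate(t_obs) if t > u]
print(f"{n_perm} relabelings, threshold u = {u:.2f}, significant voxels: {significant}")
```

Because each relabeling contributes only the maximum over the whole image, the resulting threshold controls the familywise error rate without any assumption about the spatial correlation of the voxels.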


$t_{11}$ Statistic, RF & Bonf. Threshold
– $u_{RF}$ = 9.87
– $u_{Bonf}$ = 9.80
– 5 sig. vox.

$t_{11}$ Statistic, Nonparametric Threshold
– $u_{Perm}$ = 7.67
– 58 sig. vox.

Smoothed Variance t Statistic, Nonparametric Threshold
– 378 sig. vox.

[Figure: Test Level vs. $t_{11}$ Threshold]


Does this Generalize? RFT vs Bonf. vs Perm.

t Threshold (0.05 Corrected)

Study                 df  Voxel FWHM  RESEL       RF   Bonf   Perm
Verbal Fluency         4         5.4    400  4701.32  42.59  10.14
Location Switching     9         5.7    197    11.17   9.07   5.83
Task Switching         9         6.2    162    10.79  10.35   5.10
Faces: Main Effect    11         4.2    713    10.43   9.07   7.92
Faces: Interaction    11         3.9    870    10.70   9.07   8.26
Item Recognition      11         6.3    463     9.87   9.80   7.67
Visual Motion         11         3.6   1158    11.07   8.92   8.40
Emotional Pictures    12         5.3    295     8.48   8.41   7.15
Pain: Warning         22         4.4    289     5.93   6.05   4.99
Pain: Anticipation    22         4.6    253     5.87   6.05   5.05


RFT vs Bonf. vs Perm.

No. Significant Voxels (0.05 Corrected)

Study                 df  t: RF  t: Bonf  t: Perm  SmVar t: Perm
Verbal Fluency         4      0        0        0              0
Location Switching     9      0        0      158            354
Task Switching         9      4        6     2241           3447
Faces: Main Effect    11    127      371      917           4088
Faces: Interaction    11      0        0        0              0
Item Recognition      11      5        5       58            378
Visual Motion         11    626     1260     1480           4064
Emotional Pictures    12      0        0        0              7
Pain: Warning         22    127      116      221            347
Pain: Anticipation    22     74       55      182            402


Controlling FDR: Benjamini & Hochberg

• Select desired limit q on E(FDR)
• Order p-values, $p_{(1)} \le p_{(2)} \le \dots \le p_{(V)}$
• Let r be largest i such that $p_{(i)} \le (i/V)\,q$
• Reject all hypotheses corresponding to $p_{(1)}, \dots, p_{(r)}$.

[Figure: ordered p-values $p_{(i)}$ plotted against i/V, both from 0 to 1, with the line $(i/V)\,q$]
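The procedure is short enough to write out in full. The sketch below follows the steps on the slide; the ten p-values are hypothetical and chosen so that the second smallest misses its own cutoff while later ones pass, which shows why r is the largest qualifying index rather than the first failure.

```python
def benjamini_hochberg(p_values, q):
    """Return indices of hypotheses rejected by the Benjamini & Hochberg procedure."""
    V = len(p_values)
    order = sorted(range(V), key=lambda i: p_values[i])   # ranks the p-values
    r = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / V * q:
            r = rank                      # keep the largest rank that qualifies
    return sorted(order[:r])              # reject p_(1), ..., p_(r)

# Hypothetical p-values for 10 tests
p_values = [0.5, 0.001, 0.019, 0.012, 0.9, 0.014, 0.3, 0.7, 0.2, 0.6]
print(benjamini_hochberg(p_values, q=0.05))  # [1, 2, 3, 5]
```

Here the second smallest p-value (0.012) exceeds its cutoff of 0.010, yet it is still rejected because the fourth smallest (0.019) falls below its cutoff of 0.020.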


Benjamini & Hochberg: Key Properties

• FDR is controlled
– $E(\mathrm{obsFDR}) \le q\,m_0/V$, where $m_0$ is the number of true nulls
– Conservative, if large fraction of nulls false
• Adaptive
– Threshold depends on amount of signal
• More signal, More small p-values, More $p_{(i)}$ less than $(i/V)\,q/c(V)$


Adaptiveness of Benjamini & Hochberg FDR

• P-value threshold when no signal: $\alpha/V$
• P-value threshold when all signal: $\alpha$

[Figure: ordered p-values $p_{(i)}$ against fractional index i/V, both axes from 0 to 1]


Benjamini & Hochberg Procedure

• c(V) = 1
– Positive Regression Dependency on Subsets
$P(X_1 \ge c_1, X_2 \ge c_2, \dots, X_k \ge c_k \mid X_i = x_i)$ is non-decreasing in $x_i$
• Only required of test statistics for which null true
• Special cases include
– Independence
– Multivariate Normal with all positive correlations
– Same, but studentized with common std. err.
• $c(V) = \sum_{i=1}^{V} 1/i \approx \log(V) + 0.5772$
– Arbitrary covariance structure

Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188
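The constant for arbitrary covariance is a harmonic sum, and the logarithmic approximation is easy to check. The image size below is a hypothetical example, as is the function name.

```python
from math import log

def c_arbitrary(V):
    """c(V) = sum of 1/i for i = 1..V, used under arbitrary covariance structure."""
    return sum(1.0 / i for i in range(1, V + 1))

V = 100_000                      # hypothetical number of voxels
exact = c_arbitrary(V)
approx = log(V) + 0.5772
print(f"c(V) = {exact:.4f}, log(V) + 0.5772 = {approx:.4f}")
print(f"effective rate q/c(V) for q = 0.05: {0.05 / exact:.5f}")
```

For an image of this size, dropping the dependence assumption costs roughly a factor of 12 in the rate actually applied to the ordered p-values.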


Benjamini & Hochberg: Dependence Intuition

• Illustrating BH under dependence
– Extreme example of positive dependence

[Figure: ordered p-values $p_{(i)}$ against i/V with the line $(i/V)\,q/c(V)$, shown for an 8 voxel image and for a 32 voxel image (interpolated from 8 voxel image)]


Other FDR Methods

• John Storey JRSS-B (2002) 64:479-498
– pFDR “Positive FDR”
• FDR conditional on one or more rejections
• Critical threshold is fixed, not estimated
• pFDR and Bayes: pFDR(t) = P( H=0 | T > t )
– Asymptotically valid under “clumpy” dependence
• James Troendle JSPI (2000) 84:139-158
– Normal theory FDR
• More powerful than BH FDR
• Requires numerical integration to obtain thresholds
– Exactly valid if whole correlation matrix known

Real Data: FDR Example

• Threshold
– Indep/PosDep: u = 3.83
– Arb Cov: u = 13.15
• Result
– 3,073 voxels above Indep/PosDep u
– <0.0001 minimum FDR-corrected p-value

[Figure: FWER Perm. Thresh. = 7.67, 58 voxels]

[Figure: FDR Threshold = 3.83, 3,073 voxels]

Conclusions

• Must account for multiplicity
– Otherwise have a fishing expedition
• FWER
– Very specific, not very sensitive
• FDR
– Less specific, more sensitive
– Sociological calibration still underway
• Intuition is not everything
– Lots of details not covered here
– See references, and reference’s references

References

• Most of this talk covered in these papers

TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5): 419-446, 2003.

TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.

CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.
