Massively Univariate Inference for fMRI
Thomas Nichols, Ph.D.
Assistant Professor
Department of Biostatistics, University of Michigan
http://www.sph.umich.edu/~nichols
ISBI, April 15, 2004

Introduction & Overview
• Inference in fMRI
  – Where’s the blob!?
• Overview
I. Statistics Background
II. Assessing Statistic Images
III. Thresholding Statistic Images
I. Statistics Background
• Hypothesis Testing
• Fixed vs Random Effects
• Nonparametric/resampling inference

Hypothesis Testing
• Null Hypothesis H0
• Test statistic T
  – t: observed realization of T
• $\alpha$ level
  – Acceptable false positive rate
  – Threshold $u_\alpha$ set so that $P(T > u_\alpha \mid H_0) = \alpha$
• P-value
  – Assessment of t assuming H0
  – $P(T > t \mid H_0)$
    • Prob. of obtaining a stat. as large or larger in a new experiment
  – P(Data|Null), not P(Null|Data)
[Figure: null distribution of T, with the tail area $\alpha$ above $u_\alpha$ and the P-value as the tail area above the observed t]

Random Effects Models
• GLM has only one source of randomness
  – Residual error
  – $Y = X\beta + \epsilon$
• But people are another source of error
  – Everyone activates somewhat differently…

Fixed vs. Random Effects
• Fixed Effects
  – Intra-subject variation suggests all these subjects different from zero
• Random Effects
  – Inter-subject variation suggests population not very different from zero
[Figure: sampling distributions of each subject’s effect (Subj. 1 to Subj. 6) and the population distribution of the effect, plotted relative to 0]

Random Effects for fMRI
• Summary Statistic Approach
  – Easy
    • Create contrast images for each subject
    • Analyze contrast images with one-sample t
  – Limited
    • Only allows one scan per subject
    • Assumes balanced designs and homogeneous meas. error
• Full Mixed Effects Analysis
  – Hard
    • Requires iterative fitting
    • REML to estimate inter- and intra-subject variance
  – SPM2 & FSL implement this, very differently
  – Very flexible

Nonparametric Inference
• Parametric methods
  – Assume distribution of statistic under null hypothesis
  – Needed to find P-values, $u_\alpha$
• Nonparametric methods
  – Use data to find distribution of statistic under null hypothesis
  – Any statistic!
[Figures: parametric null distribution and nonparametric null distribution, each with the upper 5% tail marked]

Nonparametric Inference: Permutation Test
• Assumptions
  – Exchangeability under the null hypothesis
• Method
  – Compute statistic t
  – Resample data (without replacement), compute t*
  – {t*}: permutation distribution of test statistic
  – P-value = #{ t* ≥ t } / #{ t* }
• Theory
  – Given data and H0, each t* has equal probability
  – Still can assume data randomly drawn from population

Permutation Test: Toy Example
• Data from V1 voxel in visual stim. experiment
  – A: Active, flashing checkerboard
  – B: Baseline, fixation
  – 6 blocks, ABABAB
  – Just consider block averages...
• Null hypothesis H0
  – No experimental effect, A & B labels arbitrary
• Statistic
  – Mean difference

  A       B      A      B      A      B
  103.00  90.48  99.93  87.83  99.76  96.06

Permutation Test: Toy Example
• Under H0
  – Consider all equivalent relabelings
  – Compute all possible statistic values
  – Find 95%ile of permutation distribution

  AAABBB  4.82   ABABAB  9.45   BAAABB -1.48   BABBAA -6.86
  AABABB -3.25   ABABBA  6.97   BAABAB  1.10   BBAAAB  3.15
  AABBAB -0.67   ABBAAB  1.38   BAABBA -1.38   BBAABA  0.67
  AABBBA -3.15   ABBABA -1.10   BABAAB -6.97   BBABAA  3.25
  ABAABB  6.86   ABBBAA  1.48   BABABA -9.45   BBBAAA -4.82
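
The relabeling procedure for the toy example can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the talk: the helper name `mean_difference` is made up here, and the statistic values are recomputed from the rounded block averages, so individual entries may differ slightly from the tabulated ones.

```python
from itertools import combinations

# Block averages from the toy example, in acquisition order (A B A B A B)
data = [103.00, 90.48, 99.93, 87.83, 99.76, 96.06]
observed_a = (0, 2, 4)  # positions labelled A in the actual experiment


def mean_difference(values, a_positions):
    """Mean of the A-labelled blocks minus mean of the B-labelled blocks."""
    a = [values[i] for i in a_positions]
    b = [values[i] for i in range(len(values)) if i not in a_positions]
    return sum(a) / len(a) - sum(b) / len(b)


# Every way of labelling 3 of the 6 blocks as A: C(6,3) = 20 relabelings
relabelings = list(combinations(range(len(data)), 3))
perm_dist = [mean_difference(data, labels) for labels in relabelings]

t_obs = mean_difference(data, observed_a)
# One-sided P-value; the observed labelling is itself one of the relabelings
p_value = sum(t >= t_obs for t in perm_dist) / len(perm_dist)

print(f"observed mean difference: {t_obs:.2f}")
print(f"P-value: {p_value:.2f}")
```

Because the observed labeling puts the three largest block averages in condition A, its mean difference is the largest of the 20 values, and the one-sided P-value is 1/20 = 0.05.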
[Figure: histogram of the permutation distribution of the mean difference, horizontal axis from -8 to 8]

Permutation Test: Strengths
• Requires only assumption of exchangeability
  – Under H0, distribution unperturbed by permutation
  – Allows us to build permutation distribution
• Subjects are exchangeable
  – Under H0, each subject’s A/B labels can be flipped
• fMRI scans not exchangeable under H0
  – Due to temporal autocorrelation

Permutation Test: Limitations
• Computational Intensity
  – Analysis repeated for each relabeling
  – Not so bad on modern hardware
    • No analysis discussed below took more than 3 hours
• Implementation Generality
  – Each experimental design type needs unique code to generate permutations
    • Not so bad for population inference with t-tests

Nonparametric Inference: The Bootstrap
• Theoretical differences
  – Independence instead of exchangeability
  – Asymptotically valid
    • For each design, finite sample size properties should be evaluated
• Practical differences
  – Resample residuals with replacement
  – Residuals, not data, resampled
    • (1) Estimate model parameter b or statistic t
    • (2) Create residuals
    • (3) Resample residuals e*
    • (4) Add model fit to e*, constitute Bootstrap dataset y*
    • (5) Estimate b* or t* using y*
    • (6) Repeat (3)-(5) to build Bootstrap distribution of b* or t*
• Many important details
  – See books: Efron & Tibshirani (1993), Davison & Hinkley (1997)
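
Steps (1)-(6) can be sketched for the simplest possible model, a straight-line fit. This is only an illustration under stated assumptions: the data are hypothetical, the helper names `fit_line` and `residual_bootstrap` are made up here, the residuals are treated as independent, and the percentile interval at the end is one of several ways to summarize the bootstrap distribution.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def residual_bootstrap(x, y, n_boot=1000, seed=0):
    """Bootstrap distribution of the slope b1 obtained by resampling residuals."""
    rng = random.Random(seed)
    b0, b1 = fit_line(x, y)                                    # (1) estimate
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]             # (2) residuals
    slopes = []
    for _ in range(n_boot):                                    # (6) repeat (3)-(5)
        e_star = rng.choices(resid, k=len(resid))              # (3) resample with replacement
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]   # (4) bootstrap dataset
        slopes.append(fit_line(x, y_star)[1])                  # (5) re-estimate
    return b1, slopes


# Hypothetical data: block regressor (0 = baseline, 1 = active) and a signal
x = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y = [90.5, 103.0, 87.8, 99.9, 96.1, 99.8, 91.2, 101.4, 89.9, 100.6]
b1, slopes = residual_bootstrap(x, y)
slopes.sort()
ci = (slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes)) - 1])
print(f"slope = {b1:.2f}, 95% percentile interval = ({ci[0]:.2f}, {ci[1]:.2f})")
```

For fMRI time series the independence assumption would need care, since temporal autocorrelation makes raw residuals dependent.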

II. Assessing Statistic Images
Where’s the signal?
• High Threshold (t > 5.5)
  – Good specificity
  – Poor power (risk of false negatives)
• Med. Threshold (t > 3.5)
• Low Threshold (t > 0.5)
  – Poor specificity (risk of false positives)
  – Good power
...but why threshold?!

Blue-sky inference: What we’d like
• Don’t threshold, model!
  – Signal location?
    • Estimates and CI’s on (x,y,z) location
  – Signal intensity?
    • CI’s on % change
  – Spatial extent?
    • Estimates and CI’s on activation volume
    • Robust to choice of cluster definition
• ...but this requires an explicit spatial model

Blue-sky inference... a spatial model?
• No routine spatial modeling methods exist
  – High-dimensional mixture modeling problem
  – Activations don’t look like Gaussian blobs
  – Need realistic shapes, sparse representation
• One initial attempt:
  – Hartvig, N. and Jensen, J. (2000). HBM, 11(4):233-248.

Blue-sky inference: What we get
• Signal location
  – Local maximum (no inference)
  – Center-of-mass (no inference)
    • Sensitive to blob-defining-threshold
• Signal intensity
  – Local maximum intensity (P-value, CI’s)
• Spatial extent
  – Volume above blob-defining-threshold (P-value, no CI’s)
    • Sensitive to blob-defining-threshold

Voxel-wise Tests
• Retain voxels above $\alpha$-level threshold $u_\alpha$
• Gives best spatial specificity
  – The null hyp. at a single voxel can be rejected
[Figure: statistic profile over space with threshold $u_\alpha$; one panel with significant voxels above the threshold, one with no significant voxels]

Cluster-wise tests
• Two-step process
  – Define clusters by arbitrary threshold $u_{clus}$
  – Retain clusters larger than $\alpha$-level threshold $k_\alpha$
[Figure: statistic profile over space thresholded at $u_{clus}$; a cluster smaller than $k_\alpha$ is not significant, a cluster larger than $k_\alpha$ is significant]

Inference on Images: Cluster-wise tests
• Typically better sensitivity
• Worse spatial specificity
  – The null hyp. of entire cluster is rejected
  – Only means that one or more of voxels in cluster active
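
The two-step cluster-wise procedure can be illustrated on a one-dimensional statistic profile. This is a sketch only: the profile is hypothetical, the function names are made up for illustration, and "at least k voxels" is assumed as the retention rule. How the critical size k is obtained (RFT or permutation) is not shown.

```python
def find_clusters(stat, u_clus):
    """Return (start, end) index pairs of contiguous runs with stat > u_clus."""
    clusters, start = [], None
    for i, value in enumerate(stat):
        if value > u_clus and start is None:
            start = i                      # a new cluster begins
        elif value <= u_clus and start is not None:
            clusters.append((start, i))    # cluster ended at the previous voxel
            start = None
    if start is not None:
        clusters.append((start, len(stat)))
    return clusters


def significant_clusters(stat, u_clus, k):
    """Keep clusters whose extent (number of voxels) is at least k."""
    return [(s, e) for s, e in find_clusters(stat, u_clus) if e - s >= k]


# Hypothetical 1-D statistic profile along "space"
stat = [0.2, 2.9, 3.4, 0.5, 1.1, 3.1, 3.8, 4.2, 3.3, 0.9, 2.7, 0.1]
all_clusters = find_clusters(stat, u_clus=2.5)
kept = significant_clusters(stat, u_clus=2.5, k=3)
print(all_clusters)  # [(1, 3), (5, 9), (10, 11)]
print(kept)          # [(5, 9)]
```

Only the four-voxel cluster survives; nothing is said about which of its voxels are active, which is the loss of spatial specificity noted above.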

Voxel-Cluster Inference
• Joint Inference
  – Combine both voxel height T & cluster size C
  – Poline et al used bivariate probability
    • JCBFM (1994) 14:639-642.
    • $P\left( \max_{x \in \text{Cluster}} T(x) > t,\ C > c \right)$
    • Only for Gaussian images, Gaussian ACF
  – Bullmore et al use “cluster mass”
    • IEEE-TMI (1999) 18(1):32-42.
    • Test statistic $= \int_{x \in \text{Cluster}} \left( T(x) - u_{clus} \right) dx$
    • No RFT result, uses permutation test

Voxel-Cluster Inference
• Not a new “level” of inference
  – cf. Friston et al (1996), Detecting Activations in PET and fMRI: Levels of Inference and Power. NI 4:223-235
• Joint null is intersection of nulls
  – Joint null = {Voxel null} $\cap$ {Cluster null}
  – If joint null rejected, only know one or other is false
  – Can’t point to any voxel as active
  – Joint voxel-cluster inference really just “Intensity-informed cluster-wise inference”

Multiple Comparisons Problem
• Which of 100,000 voxels are sig.?
  – $\alpha = 0.05 \Rightarrow$ 5,000 false positive voxels expected
• Which of (random number, say) 100 clusters significant?
  – $\alpha = 0.05 \Rightarrow$ 5 false positive clusters expected
[Figure: statistic image thresholded at t > 0.5, 1.5, 2.5, 3.5, 4.5, 5.5 and 6.5]

MCP Solutions: Measuring False Positives
• Familywise Error Rate (FWER)
  – Familywise Error
    • Existence of one or more false positives
  – FWER is probability of familywise error
• False Discovery Rate (FDR)
  – FDR = E(V/R)
  – R voxels declared active, V falsely so
    • Realized false discovery rate: V/R
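
The two error measures can be made concrete for a single thresholded image when the truth is known, as in a simulation. The sketch below uses hypothetical truth and detection vectors and a made-up function name; taking V/R as 0 when nothing is declared active is a convention assumed here.

```python
def error_summary(is_null, declared_active):
    """Realized false discovery proportion V/R and familywise-error indicator."""
    r = sum(declared_active)                                      # R: voxels declared active
    v = sum(n and d for n, d in zip(is_null, declared_active))    # V: falsely declared
    fdp = v / r if r > 0 else 0.0                                 # V/R, taken as 0 when R = 0
    return fdp, v >= 1                                            # familywise error if V >= 1


# Hypothetical truth and detections for 10 voxels
is_null = [True, True, True, True, True, True, False, False, False, False]
declared_active = [False, True, False, False, False, False, True, True, True, False]
summary = error_summary(is_null, declared_active)
print(summary)  # (0.25, True)
```

FWER and FDR are the averages of these two quantities over repeated experiments: FWER is the probability that the indicator is true, FDR is the expectation of V/R.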

Measuring False Positives: FWER vs FDR
[Figure: simulated images of signal, noise, and signal+noise, with ten thresholded realizations under each criterion]
• Control of Per Comparison Rate at 10%
  – Percentage of null pixels that are false positives: 11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%
• Control of Familywise Error Rate at 10%
  – Occurrence of familywise error marked “FWE”
• Control of False Discovery Rate at 10%
  – Percentage of activated pixels that are false positives: 6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%

III. Thresholding Statistic Images
FWER MCP Solutions
• Bonferroni
• Maximum Distribution Methods
  – Random Field Theory
  – Permutation
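
FWER MCP Solutions: Bonferroni
• For a statistic image T...
  – $T_i$: $i$th voxel of statistic image $T$
• ...use $\alpha = \alpha_0 / V$
  – $\alpha_0$: FWER level (e.g. 0.05)
  – $V$: number of voxels
  – $u_\alpha$: $\alpha$-level statistic threshold, $P(T_i \ge u_\alpha) = \alpha$
• By Bonferroni inequality...
  $\mathrm{FWER} = P(\mathrm{FWE}) = P\left( \bigcup_i \{ T_i \ge u_\alpha \} \mid H_0 \right) \le \sum_i P( T_i \ge u_\alpha \mid H_0 ) = \sum_i \alpha = \sum_i \alpha_0 / V = \alpha_0$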
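
As a rough numerical illustration of the Bonferroni threshold, the sketch below assumes a Gaussian (Z) statistic image so that the standard library suffices; a t image, as in the real-data example later in the talk, would need t quantiles instead. The function name and the voxel count are chosen for illustration.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha0, n_voxels):
    """Per-voxel level alpha0 / V and the matching one-sided Gaussian threshold."""
    alpha = alpha0 / n_voxels
    u = NormalDist().inv_cdf(1.0 - alpha)   # P(Z >= u) = alpha for a standard normal
    return alpha, u


alpha, u = bonferroni_threshold(alpha0=0.05, n_voxels=100_000)
print(f"per-voxel alpha = {alpha:.1e}, threshold u = {u:.2f}")

# Bonferroni bound on the FWER: V * alpha equals alpha0
fwer_bound = 100_000 * alpha
```

With 100,000 voxels the per-voxel level is 5e-7, which corresponds to a Z threshold a little below 4.9.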
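
FWER MCP Solutions: Controlling FWER w/ Max
• FWER & distribution of maximum
  $\mathrm{FWER} = P(\mathrm{FWE}) = P\left( \bigcup_i \{ T_i \ge u \} \mid H_0 \right) = P\left( \max_i T_i \ge u \mid H_0 \right)$
• $100(1-\alpha)$%ile of max distn controls FWER
  $\mathrm{FWER} = P\left( \max_i T_i \ge u_\alpha \mid H_0 \right) = \alpha$
  – where $u_\alpha = F^{-1}_{\max}(1-\alpha)$
[Figure: distribution of the maximum with threshold $u_\alpha$ cutting off the upper tail area $\alpha$]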

FWER MCP Solutions: Random Field Theory
• Euler Characteristic $\chi_u$
  – Topological measure
    • #blobs - #holes
  – At high thresholds, just counts blobs
  – $\mathrm{FWER} = P(\text{Max voxel} \ge u \mid H_0) = P(\text{One or more blobs} \mid H_0) \approx P(\chi_u \ge 1 \mid H_0) \approx E(\chi_u \mid H_0)$
[Figure: random field, threshold, and suprathreshold sets; at high thresholds there are no holes and never more than 1 blob]

RFT Details: Expected Euler Characteristic
$E(\chi_u) \approx \lambda(\Omega)\, |\Lambda|^{1/2}\, (u^2 - 1) \exp(-u^2/2) / (2\pi)^2$
  – $\Omega$: search region, $\Omega \subset \mathbb{R}^3$
  – $\lambda(\Omega)$: volume
  – $|\Lambda|^{1/2}$: roughness
• Assumptions
  – Multivariate Normal
  – Stationary*
  – ACF twice differentiable at 0
* Stationary
  – Results valid w/out stationarity
  – More accurate when stat. holds
• Only very upper tail approximates $1 - F_{\max}(u)$

Random Field Theory: Smoothness Parameterization
• $E(\chi_u)$ depends on $|\Lambda|^{1/2}$
  – $\Lambda$: roughness matrix
• Smoothness parameterized as Full Width at Half Maximum
  – FWHM of Gaussian kernel needed to smooth a white noise random field to roughness $\Lambda$
[Figure: autocorrelation function with its FWHM marked]

Random Field Theory: Smoothness Parameterization
• RESELS
  – Resolution Elements
  – 1 RESEL = $\text{FWHM}_x \times \text{FWHM}_y \times \text{FWHM}_z$
  – RESEL Count R
    • $R = \lambda(\Omega)\, |\Lambda|^{1/2} / (4 \ln 2)^{3/2} = \lambda(\Omega) / (\text{FWHM}_x\, \text{FWHM}_y\, \text{FWHM}_z)$
    • Volume of search region in units of smoothness
    • E.g. 10 voxels, 2.5 voxel FWHM: 10 / 2.5 = 4 RESELS
• Wrong RESEL interpretation
  – “Number of independent ‘things’ in the image”
    • See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.

Random Field Intuition
• Corrected P-value for voxel value t
  – $P^c = P(\max T > t) \approx E(\chi_t) \propto \lambda(\Omega)\, |\Lambda|^{1/2}\, t^2 \exp(-t^2/2)$ for large $t$
• Statistic value t increases
  – $P^c$ decreases (but only for large t)
• Search volume increases
  – $P^c$ increases (more severe MCP)
• Roughness increases (Smoothness decreases)
  – $P^c$ increases (more severe MCP)

RFT Details: Super General Formula
• General form for expected Euler characteristic
  – $\chi^2$, F, & t fields
  – Restricted search regions
  – D dimensions
$E[\chi_u(\Omega)] = \sum_d R_d(\Omega)\, \rho_d(u)$
• $R_d(\Omega)$: d-dimensional Minkowski functional of $\Omega$
  – Function of dimension, space $\Omega$ and smoothness:
    • $R_0(\Omega) = \chi(\Omega)$, Euler characteristic of $\Omega$
    • $R_1(\Omega)$ = resel diameter
    • $R_2(\Omega)$ = resel surface area
    • $R_3(\Omega)$ = resel volume
• $\rho_d(u)$: d-dimensional EC density of $Z(x)$
  – Function of dimension and threshold, specific for RF type
  – E.g. Gaussian RF:
    • $\rho_0(u) = 1 - \Phi(u)$
    • $\rho_1(u) = (4 \ln 2)^{1/2} \exp(-u^2/2) / (2\pi)$
    • $\rho_2(u) = (4 \ln 2)\, u \exp(-u^2/2) / (2\pi)^{3/2}$
    • $\rho_3(u) = (4 \ln 2)^{3/2} (u^2 - 1) \exp(-u^2/2) / (2\pi)^2$
    • $\rho_4(u) = (4 \ln 2)^2 (u^3 - 3u) \exp(-u^2/2) / (2\pi)^{5/2}$
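
The general formula lends itself to a small numerical sketch for a Gaussian random field in up to three dimensions. Everything specific here is an assumption for illustration: the resel counts are hypothetical, the function names are made up, the 2-D density is written with its factor of u as in the standard Gaussian result, and the threshold is found by simple bisection on the expected Euler characteristic.

```python
import math
from statistics import NormalDist


def ec_density(d, u):
    """EC density rho_d(u) of a Gaussian random field, d = 0..3 (resel units)."""
    g = math.exp(-u * u / 2.0)
    c = 4.0 * math.log(2.0)
    if d == 0:
        return 1.0 - NormalDist().cdf(u)
    if d == 1:
        return math.sqrt(c) * g / (2.0 * math.pi)
    if d == 2:
        return c * u * g / (2.0 * math.pi) ** 1.5
    if d == 3:
        return c ** 1.5 * (u * u - 1.0) * g / (2.0 * math.pi) ** 2
    raise ValueError("d must be 0, 1, 2 or 3")


def expected_ec(resels, u):
    """E[chi_u] = sum_d R_d * rho_d(u) for resel counts (R_0, R_1, R_2, R_3)."""
    return sum(r * ec_density(d, u) for d, r in enumerate(resels))


def rft_threshold(resels, alpha=0.05, lo=2.0, hi=10.0):
    """Bisect for the u at which the expected EC equals alpha.

    Every rho_d is decreasing in u on [2, 10], so bisection is valid there.
    """
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_ec(resels, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical resel counts (R_0, R_1, R_2, R_3) for a search region
resels = (1.0, 30.0, 300.0, 500.0)
u = rft_threshold(resels)
print(f"u = {u:.2f}, expected EC = {expected_ec(resels, u):.4f}")
```

Increasing any resel count raises the expected Euler characteristic at a given u and so pushes the threshold up, which matches the intuition that a larger or rougher search volume means a more severe multiple comparisons problem.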

Random Field Theory: Cluster Size Tests
• Expected Cluster Size
  – $E(S) = E(N) / E(L)$
  – $S$: cluster size
  – $N$: suprathreshold volume, $\lambda(\{T > u_{clus}\})$
  – $L$: number of clusters
• $E(N) = \lambda(\Omega)\, P(T > u_{clus})$
• $E(L) \approx E(\chi_u)$
  – Assuming no holes
[Figure: panels for 5mm, 10mm and 15mm FWHM]

Random Field Theory: Cluster Size Distribution
• Gaussian Random Fields (Nosko, 1969)
  $S^{2/D} \sim \mathrm{Exp}\left( \left( \frac{E(N)}{\Gamma(D/2+1)\, E(L)} \right)^{-2/D} \right)$
  – $D$: dimension of RF
  – Exp(·) written in terms of its rate
• t Random Fields (Cao, 1999)
  $S \sim c\, B^{1/2}\, U_0^{D/2} \Big/ \prod_{b=0}^{D} U_b^{1/2}$
  – $B$: Beta distn
  – $U$’s: $\chi^2$’s
  – $c$ chosen s.t. $E(S) = E(N) / E(L)$

Random Field Theory: Cluster Size Corrected P-Values
• Previous results give uncorrected P-value
• Corrected P-value
  – Bonferroni
    • Correct for expected number of clusters
    • Corrected $P^c = E(L)\, P^{uncorr}$
  – Poisson Clumping Heuristic (Adler, 1980)
    • Corrected $P^c = 1 - \exp( -E(L)\, P^{uncorr} )$
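
The two corrections are easy to compare numerically. In this sketch the expected number of clusters and the uncorrected P-value are hypothetical inputs, the function name is made up, and capping the Bonferroni value at 1 is an addition made here so the result stays a probability.

```python
import math


def corrected_cluster_p(expected_clusters, p_uncorr):
    """Bonferroni and Poisson-clumping corrected cluster P-values."""
    bonferroni = min(1.0, expected_clusters * p_uncorr)          # capped at 1 (assumption)
    poisson = 1.0 - math.exp(-expected_clusters * p_uncorr)
    return bonferroni, poisson


# Hypothetical inputs: E(L) = 20 clusters expected, uncorrected P = 0.001
bonf, pois = corrected_cluster_p(expected_clusters=20, p_uncorr=0.001)
print(f"Bonferroni: {bonf:.4f}, Poisson clumping: {pois:.4f}")
```

Since 1 - exp(-x) ≤ x, the Poisson clumping value never exceeds the Bonferroni value; the two nearly coincide when E(L) times the uncorrected P-value is small.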

Random Field Theory Limitations
• Sufficient smoothness
  – FWHM smoothness 3-4 times voxel size
  – More like ~10 times for low-df data
• Smoothness estimation
  – Estimate is biased when images not sufficiently smooth
• Multivariate normality
  – Virtually impossible to check
• Several layers of approximations
[Figure: lattice image data versus continuous random field]

Real Data
• fMRI Study of Working Memory
  – 12 subjects, block design
  – Marshuetz et al (2000)
  – Item Recognition
    • Active: View five letters, 2s pause, view probe letter, respond
    • Baseline: View XXXXX, 2s pause, view Y or N, respond
• Second Level RFX
  – Difference image, A-B constructed for each subject
  – One sample t test
[Figure: trial schematic. Active: “UBKDA”, then probe “D”, response “yes”. Baseline: “XXXXX”, then “N”, response “no”.]

Real Data: RFT Result
• Threshold
  – S = 110,776
  – 2 × 2 × 2 mm voxels
  – 5.1 × 5.8 × 6.9 mm FWHM
  – u = 9.870
• Result
  – 5 voxels above the threshold
  – 0.0063 minimum FWE-corrected p-value
[Figure: $-\log_{10}$ p-value image]

Controlling FWER: Permutation Test
• Parametric methods
  – Assume distribution of max statistic under null hypothesis
• Nonparametric methods
  – Use data to find distribution of max statistic under null hypothesis
  – Any max statistic!
[Figures: parametric null max distribution and nonparametric null max distribution, each with the upper 5% tail marked]

Permutation Test: Other Statistics
• Nonparametric allows arbitrary statistics
  – Standard ones are usually best, but...
• Consider smoothed variance t statistic
  – To regularize low-df variance estimate
[Figure, Smoothed Variance t: t-statistic image formed from the mean difference and variance images]
[Figure, Smoothed Variance t: smoothed variance t-statistic image formed from the mean difference and smoothed variance images]

Real Data: Permutation Result
• Permute!
  – $2^{12} = 4{,}096$ ways to flip 12 A/B labels
  – For each, note maximum of t image
[Figures: permutation distribution of the maximum t; orthogonal slice overlay of the thresholded t image]
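
The label-flipping scheme can be sketched on a small synthetic data set. This is an illustration under stated assumptions, not the analysis behind the slides: the data are simulated (8 subjects, 20 voxels, so 2^8 = 256 flips), the function names are made up, and each subject's A-B difference image is flipped by multiplying it by +1 or -1.

```python
import math
import random
from itertools import product


def t_image(diffs):
    """One-sample t statistic at every voxel; diffs[s][v] = subject s, voxel v."""
    n, n_vox = len(diffs), len(diffs[0])
    t = []
    for v in range(n_vox):
        col = [diffs[s][v] for s in range(n)]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        t.append(mean / math.sqrt(var / n))
    return t


def max_t_permutation(diffs, alpha=0.05):
    """FWER threshold and corrected P-values from the sign-flip max-t distribution."""
    n = len(diffs)
    t_obs = t_image(diffs)
    max_dist = []
    for signs in product((1, -1), repeat=n):           # all 2^n A/B label flips
        flipped = [[sg * x for x in row] for sg, row in zip(signs, diffs)]
        max_dist.append(max(t_image(flipped)))         # note the maximum of the t image
    max_dist.sort()
    # Value with at most a fraction alpha of the max distribution strictly above it
    u_perm = max_dist[math.ceil((1 - alpha) * len(max_dist)) - 1]
    p_corr = [sum(m >= t for m in max_dist) / len(max_dist) for t in t_obs]
    return u_perm, p_corr


# Hypothetical data: 8 subjects, 20 voxels of noise, true effect at voxel 0 only
rng = random.Random(1)
diffs = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(8)]
for row in diffs:
    row[0] += 4.0
u_perm, p_corr = max_t_permutation(diffs)
print(f"u_perm = {u_perm:.2f}, corrected P at voxel 0 = {p_corr[0]:.3f}")
```

Because the unflipped labeling is one of the 256 relabelings, no corrected P-value can fall below 1/256; with 12 subjects the floor would be 1/4,096.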

Real Data: Thresholds Compared
• t11 statistic, RF & Bonf. threshold
  – uRF = 9.87, uBonf = 9.80
  – 5 sig. vox.
• t11 statistic, nonparametric threshold
  – uPerm = 7.67
  – 58 sig. vox.
• Smoothed variance t statistic, nonparametric threshold
  – 378 sig. vox.
[Figure: test level vs. t11 threshold]

Does this Generalize? RFT vs Bonf. vs Perm.
t Threshold (0.05 Corrected)
Columns for each study: df, Voxel FWHM, RESEL, then t threshold by RF, Bonf and Perm
Verbal Fluency 4 5.4 400 4701.32 42.59 10.14 Location Switching 9 5.7 197 11.17 9.07 5.83 Task Switching 9 6.2 162 10.79 10.35 5.10 Faces: Main Effect 11 4.2 713 10.43 9.07 7.92 Faces: Interaction 11 3.9 870 10.70 9.07 8.26 Item Recognition 11 6.3 463 9.87 9.80 7.67 Visual Motion 11 3.6 1158 11.07 8.92 8.40 Emotional Pictures 12 5.3 295 8.48 8.41 7.15 Pain: Warning 22 4.4 289 5.93 6.05 4.99 Pain: Anticipation 22 4.6 253 5.87 6.05 5.05

RFT vs Bonf. vs Perm.
No. Significant Voxels (0.05 Corrected)

  Study                df   t: RF  t: Bonf  t: Perm  SmVar t: Perm
  Verbal Fluency        4       0        0        0              0
  Location Switching    9       0        0      158            354
  Task Switching        9       4        6     2241           3447
  Faces: Main Effect   11     127      371      917           4088
  Faces: Interaction   11       0        0        0              0
  Item Recognition     11       5        5       58            378
  Visual Motion        11     626     1260     1480           4064
  Emotional Pictures   12       0        0        0              7
  Pain: Warning        22     127      116      221            347
  Pain: Anticipation   22      74       55      182            402

Controlling FDR: Benjamini & Hochberg
• Select desired limit $q$ on FDR
• Order p-values, $p_{(1)} \le p_{(2)} \le \dots \le p_{(V)}$
• Let $r$ be largest $i$ such that $p_{(i)} \le (i/V)\, q$
• Reject all hypotheses corresponding to $p_{(1)}, \dots, p_{(r)}$
[Figure: ordered p-values $p_{(i)}$ plotted against $i/V$ on the unit square, with the line $(i/V)\, q$]
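
The step-up rule can be sketched as follows. The p-values are hypothetical and the function name is made up; the optional argument `c` stands for the dependence constant c(V) discussed on the later slides and defaults to 1.

```python
def benjamini_hochberg(p_values, q=0.05, c=1.0):
    """Return (p-value threshold, indices of rejected hypotheses) for BH at level q."""
    v = len(p_values)
    order = sorted(range(v), key=lambda i: p_values[i])   # indices by increasing p
    r = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / v * q / c:
            r = rank                                      # largest i with p(i) <= (i/V) q / c
    if r == 0:
        return 0.0, []                                    # nothing rejected
    threshold = p_values[order[r - 1]]
    return threshold, sorted(order[:r])


# Hypothetical p-values for V = 10 voxels
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
result = benjamini_hochberg(p, q=0.05)
print(result)  # (0.008, [0, 1])
```

Here p(1) = 0.001 and p(2) = 0.008 fall under the line (i/V) q at i = 1 and 2, and no larger i does, so r = 2. Note that the rule takes the largest such i, so a p-value above the line can still be rejected if a later one falls below it.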

Benjamini & Hochberg: Key Properties
• FDR is controlled
  – $E(\text{obsFDR}) \le q\, m_0 / V$, with $m_0$ the number of true nulls
  – Conservative, if large fraction of nulls false
• Adaptive
  – Threshold depends on amount of signal
    • More signal, more small p-values, more $p_{(i)}$ less than $(i/V)\, q / c(V)$

Adaptiveness of Benjamini & Hochberg FDR
• P-value threshold when no signal: $\alpha / V$
• P-value threshold when all signal: $\alpha$
[Figure: ordered p-values $p_{(i)}$ against fractional index $i/V$, both axes from 0 to 1]

Benjamini & Hochberg Procedure
• $c(V) = 1$
  – Positive Regression Dependency on Subsets
    • $P(X_1 \ge c_1, X_2 \ge c_2, \dots, X_k \ge c_k \mid X_i = x_i)$ is non-decreasing in $x_i$
    • Only required of test statistics for which null true
    • Special cases include
      – Independence
      – Multivariate Normal with all positive correlations
      – Same, but studentized with common std. err.
• $c(V) = \sum_{i=1}^{V} 1/i \approx \log(V) + 0.5772$
  – Arbitrary covariance structure
• Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188
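
The size of the penalty for arbitrary dependence is easy to check numerically. The sketch below compares the exact harmonic sum with the log approximation for V = 110,776, the voxel count quoted for the real-data example; the function name is made up for illustration.

```python
import math


def c_arbitrary(v):
    """c(V) = sum_{i=1}^{V} 1/i, the constant for arbitrary covariance structure."""
    return sum(1.0 / i for i in range(1, v + 1))


v = 110_776
exact = c_arbitrary(v)
approx = math.log(v) + 0.5772
print(f"c(V) exact = {exact:.4f}, log(V) + 0.5772 = {approx:.4f}")
```

For an image of this size c(V) is about 12.2, so the arbitrary-covariance procedure compares each ordered p-value with a line roughly twelve times lower than the independence or positive-dependence version.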

Benjamini & Hochberg: Dependence Intuition
• Illustrating BH under dependence
  – Extreme example of positive dependence
[Figure: ordered p-values $p_{(i)}$ against $i/V$ with the line $(i/V)\, q / c(V)$, for an 8 voxel image and a 32 voxel image (interpolated from 8 voxel image)]

Other FDR Methods
• John Storey JRSS-B (2002) 64:479-498
  – pFDR “Positive FDR”
    • FDR conditional on one or more rejections
    • Critical threshold is fixed, not estimated
    • pFDR and Bayes: pFDR(t) = P( H=0 | T > t )
  – Asymptotically valid under “clumpy” dependence
• James Troendle JSPI (2000) 84:139-158
  – Normal theory FDR
    • More powerful than BH FDR
    • Requires numerical integration to obtain thresholds
  – Exactly valid if whole correlation matrix known

Real Data: FDR Example
• Threshold
  – Indep/PosDep: u = 3.83
  – Arb Cov: u = 13.15
• Result
  – 3,073 voxels above Indep/PosDep u
  – <0.0001 minimum FDR-corrected p-value
[Figures: FWER Perm. Thresh. = 7.67, 58 voxels; FDR Threshold = 3.83, 3,073 voxels]

Conclusions
• Must account for multiplicity
  – Otherwise have a fishing expedition
• FWER
  – Very specific, not very sensitive
• FDR
  – Less specific, more sensitive
  – Sociological calibration still underway
• Intuition is not everything
  – Lots of details not covered here
  – See references, and reference’s references

References
• Most of this talk covered in these papers
  – TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5):419-446, 2003.
  – TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.
  – CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.