Data Modeling: General Linear Model & Statistical Inference
Thomas Nichols, Ph.D., Assistant Professor, Department of Biostatistics. http://www.sph.umich.edu/~nichols. Brain Function and fMRI, ISMRM Educational Course, July 11, 2002.
1
Data Modeling: General Linear Model & Statistical Inference
Thomas Nichols, Ph.D.
Assistant Professor, Department of Biostatistics
http://www.sph.umich.edu/~nichols
Brain Function and fMRI
ISMRM Educational Course
July 11, 2002
2
Motivations
• Data Modeling
– Characterize Signal
– Characterize Noise
• Statistical Inference
– Detect signal
– Localization (Where’s the blob?)
3
Outline
• Data Modeling
– General Linear Model
– Linear Model Predictors
– Temporal Autocorrelation
– Random Effects Models
• Statistical Inference
– Statistic Images & Hypothesis Testing
– Multiple Testing Problem
4
Basic fMRI Example
• Data at one voxel
– Rest vs. passive word listening
• Is there an effect?
5
A Linear Model
$\text{Intensity} = \beta_1 x_1 + \beta_2 x_2 + \text{error}$ (intensity plotted against time)
• “Linear” in parameters $\beta_1$ & $\beta_2$
6
Linear model, in image form…
$Y = \beta_1 x_1 + \beta_2 x_2 + \epsilon$
7
Linear model, in image form…
$Y = \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\epsilon$ (Estimated)
8
… in image matrix form…
$Y = X\hat\beta + \hat\epsilon$, where $X = [x_1\ x_2]$ and $\hat\beta = (\hat\beta_1, \hat\beta_2)'$
9
… in matrix form.
$Y = X\beta + \epsilon$
$Y$: $N \times 1$, $X$: $N \times p$, $\beta$: $p \times 1$, $\epsilon$: $N \times 1$
N: Number of scans, p: Number of regressors
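For reference, a sketch of the usual least-squares estimates for this model. The pseudoinverse notation $(X'X)^{+}$ matches the statistic-image slides later in the talk; the residual-variance symbol $s^2$ and the divisor $N - \operatorname{rank}(X)$ are the standard textbook choices, assumed here rather than taken from this slide.

```latex
\hat\beta = (X'X)^{+} X'Y, \qquad
\hat\epsilon = Y - X\hat\beta, \qquad
s^2 = \frac{\hat\epsilon'\hat\epsilon}{N - \operatorname{rank}(X)}
```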
10
Linear Model Predictors
• Signal Predictors
– Block designs
– Event-related responses
• Nuisance Predictors
– Drift
– Regression parameters
11
Signal Predictors
• Linear Time-Invariant system
• LTI specified solely by
– Stimulus function of experiment (Blocks, Events)
– Hemodynamic Response Function (HRF)
  • Response to instantaneous impulse
12
Convolution Examples
[Figure: for a Block Design and an Event-Related design, the Experimental Stimulus Function convolved with the Hemodynamic Response Function gives the Predicted Response]
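A minimal sketch of how a predicted response can be built by convolving a stimulus function with an HRF. The single-gamma HRF, its shape and scale values, the 1-second sampling interval, and the onset times are illustrative assumptions, not the HRF or design used in the slides.

```python
import math

def gamma_hrf(t, shape=6.0, scale=1.0):
    """Single-gamma impulse response at time t (seconds); an assumed, illustrative HRF."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def convolve(stimulus, kernel):
    """Discrete convolution, truncated to the length of the stimulus."""
    out = []
    for n in range(len(stimulus)):
        acc = 0.0
        for k in range(min(n + 1, len(kernel))):
            acc += stimulus[n - k] * kernel[k]
        out.append(acc)
    return out

TR = 1.0  # assumed sampling interval in seconds
hrf = [gamma_hrf(i * TR) for i in range(30)]

# Block design: 10 samples off, 10 samples on, repeated.
block = [1.0 if (i // 10) % 2 == 1 else 0.0 for i in range(80)]
# Event-related design: brief events at assumed onsets.
events = [1.0 if i in (5, 30, 55) else 0.0 for i in range(80)]

block_pred = convolve(block, hrf)   # predicted response, block design
event_pred = convolve(events, hrf)  # predicted response, event-related design
```

Either predicted response would then serve as a signal column of the design matrix X.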
13
HRF Models
• Canonical HRF
– Most sensitive if it is correct
– If wrong, leads to bias and/or poor fit
  • E.g. True response may be faster/slower
  • E.g. True response may have smaller/bigger undershoot
[Figure: SPM’s HRF]
14
HRF Models
• Smooth Basis HRFs
– More flexible
– Less interpretable
  • No one parameter explains the response
– Less sensitive relative to canonical (only if canonical is correct)
[Figure: Gamma Basis, Fourier Basis]
15
HRF Models
• Deconvolution
– Most flexible
  • Allows any shape
  • Even bizarre, nonsensical ones
– Least sensitive relative to canonical (again, if canonical is correct)
[Figure: Deconvolution Basis]
16
Drift Models
• Drift
– Slowly varying
– Nuisance variability
• Models
– Linear, quadratic
– Discrete Cosine Transform
[Figure: Discrete Cosine Transform Basis]
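A short sketch of low-frequency cosine regressors that could serve as drift columns in X. This is a generic, unnormalized DCT-II basis; the function name, the number of scans, and the number of components are assumptions for illustration, and no particular package's scaling or cutoff rule is implied.

```python
import math

def dct_basis(n_scans, n_components):
    """Return low-frequency DCT-II regressors as a list of columns, each of length n_scans."""
    columns = []
    for k in range(1, n_components + 1):  # k = 0 would be the constant term
        columns.append([math.cos(math.pi * k * (2 * t + 1) / (2 * n_scans))
                        for t in range(n_scans)])
    return columns

# Three slowly varying nuisance regressors for a 100-scan series (illustrative sizes).
drift = dct_basis(n_scans=100, n_components=3)
```

The columns are mutually orthogonal and orthogonal to a constant term, so they can be added to the design matrix alongside an intercept.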
17
General Linear Model: Recap
• Fits data Y as linear combination of predictor columns of X
• Very “General”
– Correlation, ANOVA, ANCOVA, …
• Only as good as your X matrix
$Y = X\beta + \epsilon$
18
Temporal Autocorrelation
• Standard statistical methods assume independent errors
– Error $\epsilon_i$ tells you nothing about $\epsilon_j$, $i \neq j$
• fMRI errors not independent
– Autocorrelation due to
  • Physiological effects
  • Scanner instability
19
Temporal Autocorrelation: In Brief
• Independence
• Precoloring
• Prewhitening
20
Autocorrelation: Independence Model
• Ignore autocorrelation
• Leads to
– Under-estimation of variance
– Over-estimation of significance
– Too many false positives
21
Autocorrelation: Precoloring
• Temporally blur, smooth your data
– This induces more dependence!
– But we know exactly the form of the dependence induced
– Assume that intrinsic autocorrelation is negligible relative to smoothing
• Then we know autocorrelation exactly
• Correct GLM inferences based on “known” autocorrelation
[Friston, et al., “To smooth or not to smooth…” NI 12:196-208 2000]
22
Autocorrelation: Prewhitening
• Statistically optimal solution
• If we know the true autocorrelation exactly, we can undo the dependence
– De-correlate your data, your model
– Then proceed as with independent data
• Problem is obtaining accurate estimates of autocorrelation
– Some sort of regularization is required
  • Spatial smoothing of some sort
23
Autocorrelation Redux
Method       Advantage               Disadvantage                      Software
Indep.       Simple                  Inflated significance             All
Precoloring  Avoids autocorr. est.   Statistically inefficient         SPM99
Whitening    Statistically optimal   Requires precise autocorr. est.   FSL, SPM2
24
Autocorrelation: Models
• Autoregressive
– Error is fraction of previous error plus “new” error
– AR(1): $\epsilon_i = \rho\,\epsilon_{i-1} + \xi_i$
  • Software: fmristat, SPM99
• AR + White Noise or ARMA(1,1)
– AR plus an independent WN series
  • Software: SPM2
• Arbitrary autocorrelation function $\rho_k = \mathrm{corr}(\epsilon_i, \epsilon_{i-k})$
  • Software: FSL’s FEAT
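A minimal sketch of prewhitening under the AR(1) model, assuming the AR coefficient (called `rho` here) is known exactly. In practice it has to be estimated, which is the difficulty the prewhitening slide points to. The function names, the seed, and the value 0.4 are illustrative assumptions, not the method of any package listed above.

```python
import math
import random

def simulate_ar1(n, rho, seed=0):
    """Simulate AR(1) errors e[i] = rho * e[i-1] + xi[i] with unit-variance white xi."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e = [xi[0] / math.sqrt(1.0 - rho ** 2)]  # start from the stationary distribution
    for i in range(1, n):
        e.append(rho * e[i - 1] + xi[i])
    return e, xi

def whiten_ar1(series, rho):
    """Undo AR(1) dependence; the same filter is applied to the data and to every column of X."""
    out = [math.sqrt(1.0 - rho ** 2) * series[0]]
    out += [series[i] - rho * series[i - 1] for i in range(1, len(series))]
    return out

errors, innovations = simulate_ar1(n=200, rho=0.4)
recovered = whiten_ar1(errors, rho=0.4)  # matches the white "new" errors up to rounding
```

With the correct `rho`, the filtered errors are independent again, so the usual GLM machinery applies to the filtered data and filtered design.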
25
Statistic Images & Hypothesis Testing
• For each voxel
– Fit GLM, estimate betas
  • Write b for estimate of $\beta$
– But usually not interested in all betas
  • Recall $\beta$ is a length-p vector
$Y = X\beta + \epsilon$
26
Building Statistic Images
$Y = X\beta + \epsilon$
[Figure: design matrix X with the predictor of interest marked]
27
Building Statistic Images
• Contrast
– A linear combination of parameters
– $c'\beta$
$T = \dfrac{\text{contrast of estimated parameters}}{\sqrt{\text{variance estimate}}} = \dfrac{c'b}{\sqrt{s^2\, c'(X'X)^{+} c}}$
$c' = (1\ 0\ 0\ 0\ 0\ 0\ 0\ 0)$, $b = (b_1\ b_2\ b_3\ b_4\ b_5\ \ldots)'$
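A small worked sketch of the T statistic for a two-regressor design (a rest/listening boxcar plus a constant). It assumes X has full column rank, so the pseudoinverse reduces to an ordinary 2x2 inverse; the function name `glm_t` and the data values are made up for illustration.

```python
import math

def glm_t(y, x1, x2, c):
    """OLS fit of y = b1*x1 + b2*x2 + error and t statistic for contrast c (two regressors only)."""
    n = len(y)
    # Entries of X'X and X'y for the two-column design X = [x1 x2].
    a11 = sum(u * u for u in x1)
    a12 = sum(u * v for u, v in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    g1 = sum(u * w for u, w in zip(x1, y))
    g2 = sum(v * w for v, w in zip(x2, y))
    det = a11 * a22 - a12 * a12  # nonzero only if X has full column rank (assumed)
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    b = [inv[0][0] * g1 + inv[0][1] * g2, inv[1][0] * g1 + inv[1][1] * g2]
    resid = [w - b[0] * u - b[1] * v for u, v, w in zip(x1, x2, y)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance on N - p degrees of freedom
    cvc = sum(c[i] * inv[i][j] * c[j] for i in range(2) for j in range(2))
    t = (c[0] * b[0] + c[1] * b[1]) / math.sqrt(s2 * cvc)
    return b, t

x1 = ([0.0] * 5 + [1.0] * 5) * 2  # boxcar: rest, then listening (illustrative)
x2 = [1.0] * 20                   # constant term
y = [100.0 + 2.0 * u + 0.5 * (-1) ** i for i, u in enumerate(x1)]  # made-up data
b, t = glm_t(y, x1, x2, c=[1.0, 0.0])  # contrast picks out the boxcar effect
```

The contrast `[1.0, 0.0]` ignores the constant term, mirroring the slide's `c' = 1 0 0 ...` that selects only the predictor of interest.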
28
Hypothesis Test
• So now have a value T for our statistic
• How big is big?
– Is T=2 big? T=20?
29
Hypothesis Testing
• Assume Null Hypothesis of no signal
• Given that there is no signal, how likely is our measured T?
• P-value measures this
– Probability of obtaining T as large or larger
• $\alpha$ level
– Acceptable false positive rate
[Figure: null distribution of T, with the P-val marked]
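A rough sketch of turning a statistic into a one-sided P-value. It assumes a standard normal null distribution, which approximates the t distribution only when the degrees of freedom are large; an exact answer would use the t distribution with the model's degrees of freedom. The function name is hypothetical.

```python
import math

def upper_tail_p(t_value):
    """One-sided P(T >= t_value) under a standard normal null (large-sample approximation)."""
    return 0.5 * math.erfc(t_value / math.sqrt(2.0))

alpha = 0.05                  # acceptable false positive rate
p_small = upper_tail_p(2.0)   # "Is T=2 big?"
p_large = upper_tail_p(20.0)  # "T=20?"
reject_small = p_small <= alpha
reject_large = p_large <= alpha
```

Under this approximation both values fall below a 0.05 level at a single voxel, though T=2 only marginally so.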
30
Random Effects Models
• GLM has only one source of randomness
– Residual error
• But people are another source of error
– Everyone activates somewhat differently…
$Y = X\beta + \epsilon$
31
Fixed vs. Random Effects
• Fixed Effects
– Intra-subject variation suggests all these subjects different from zero
• Random Effects
– Intersubject variation suggests population not very different from zero
[Figure: distribution of each subject’s effect, Subj. 1 through Subj. 6, plotted relative to 0]
32
Random Effects for fMRI
• Summary Statistic Approach
– Easy
  • Create contrast images for each subject
  • Analyze contrast images with one-sample t
– Limited
  • Only allows one scan per subject
  • Assumes balanced designs and homogeneous meas. error.
• Full Mixed Effects Analysis
– Hard
  • Requires iterative fitting
  • REML to estimate inter- and intra-subject variance
– SPM2 & FSL implement this, very differently
– Very flexible
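A minimal sketch of the summary statistic approach at a single voxel: one contrast value per subject, analyzed with a one-sample t. The six values are made-up numbers, and the function name is hypothetical.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic for H0: population mean effect is zero; degrees of freedom = n - 1."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD across subjects
    return mean / (sd / math.sqrt(n)), n - 1

# One contrast value per subject at a single voxel (made-up numbers).
subject_effects = [0.8, 1.1, -0.2, 0.5, 1.4, 0.3]
t_stat, dof = one_sample_t(subject_effects)
```

Because the standard deviation is taken across subjects, the test reflects between-subject variability, which is what makes the inference apply to the population rather than to this cohort alone.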
33
Random Effects for fMRI: Random vs. Fixed
• Fixed isn’t “wrong”, just usually isn’t of interest
• If it is sufficient to say “I can see this effect in this cohort” then fixed effects are OK
• If need to say “If I were to sample a new cohort from the population I would get the same result” then random effects are needed
34
Multiple Testing Problem
• Inference on statistic images
– Fit GLM at each voxel
– Create statistic images of effect
• Which of 100,000 voxels are significant?
– $\alpha = 0.05$ ⇒ 5,000 false positives!
[Figure: statistic image thresholded at t > 0.5, t > 1.5, t > 2.5, t > 3.5, t > 4.5, t > 5.5, t > 6.5]
35
MCP Solutions: Measuring False Positives
• Familywise Error Rate (FWER)
– Familywise Error
  • Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– R voxels declared active, V falsely so
  • Observed false discovery rate: V/R
– FDR = E(V/R)
36
FWER MCP Solutions
• Bonferroni
• Maximum Distribution Methods– Random Field Theory– Permutation
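Bonferroni is the simplest of these to write down: test each voxel at the desired level divided by the number of tests. A small sketch using the 100,000-voxel, 0.05-level figures from the multiple testing slide; the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if every test is null and thresholded at alpha."""
    return n_tests * alpha

def bonferroni_threshold(n_tests, alpha):
    """Per-test p-value threshold that keeps FWER at or below alpha."""
    return alpha / n_tests

n_voxels = 100_000
uncorrected = expected_false_positives(n_voxels, 0.05)  # the 5,000 from the earlier slide
per_voxel_alpha = bonferroni_threshold(n_voxels, 0.05)  # much stricter per-voxel level
```

The bound holds under any dependence between voxels, but it becomes conservative when neighboring voxels are strongly correlated, which is one motivation for the maximum distribution methods.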
38
FWER MCP Solutions: Controlling FWER w/ Max
• FWER & distribution of maximum
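$\text{FWER} = P(\text{FWE}) = P(\text{One or more voxels} \ge u \mid H_o) = P(\text{Max voxel} \ge u \mid H_o)$
• $100(1-\alpha)$%ile of max distribution controls FWER
$\text{FWER} = P(\text{Max voxel} \ge u_\alpha \mid H_o) \le \alpha$
[Figure: distribution of the maximum, with threshold $u_\alpha$]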
39
FWER MCP Solutions: Random Field Theory
• Euler Characteristic $\chi_u$
– Topological Measure
  • #blobs - #holes
– At high thresholds, just counts blobs
– $\text{FWER} = P(\text{Max voxel} \ge u \mid H_o) = P(\text{One or more blobs} \mid H_o) \approx P(\chi_u \ge 1 \mid H_o) \approx E(\chi_u \mid H_o)$
[Figure: Random Field, Threshold, Suprathreshold Sets]
40
Controlling FWER: Permutation Test
• Parametric methods
– Assume distribution of max statistic under null hypothesis
• Nonparametric methods
– Use data to find distribution of max statistic under null hypothesis
– Any max statistic!
[Figure: Parametric Null Max Distribution and Nonparametric Null Max Distribution, each with the upper 5% tail marked]
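A toy sketch of a nonparametric max-statistic threshold for a one-sample (group) analysis. It assumes errors are symmetric about zero under the null, so each subject's sign can be flipped. It uses the voxel-wise mean as the statistic and enumerates all sign flips, which is feasible only for small groups. The data values and function name are made up for illustration.

```python
import itertools
import math

def max_null_threshold(data, alpha=0.05):
    """data[s][v]: effect for subject s at voxel v. Returns the (1 - alpha) quantile
    of the sign-flipping null distribution of the maximum voxel-wise mean."""
    n_subj, n_vox = len(data), len(data[0])
    max_null = []
    for signs in itertools.product((1, -1), repeat=n_subj):
        means = [sum(sg * row[v] for sg, row in zip(signs, data)) / n_subj
                 for v in range(n_vox)]
        max_null.append(max(means))  # record only the maximum over voxels
    max_null.sort()
    k = math.ceil((1 - alpha) * len(max_null)) - 1
    return max_null[k]

# Six subjects, three voxels (made-up numbers); only the first voxel has a clear effect.
data = [
    [2.1, 0.3, -0.4],
    [1.8, -0.2, 0.1],
    [2.5, 0.1, 0.3],
    [1.9, -0.4, -0.2],
    [2.2, 0.2, 0.0],
    [2.0, -0.1, 0.2],
]
threshold = max_null_threshold(data, alpha=0.05)
observed = [sum(row[v] for row in data) / len(data) for v in range(3)]
significant = [v for v, m in enumerate(observed) if m > threshold]
```

Because the threshold comes from the distribution of the maximum over all voxels, comparing every voxel against it controls FWER without a parametric assumption about that distribution.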
41
Measuring False Positives
• Familywise Error Rate (FWER)
– Familywise Error
  • Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– R voxels declared active, V falsely so
  • Observed false discovery rate: V/R
– FDR = E(V/R)
42
Measuring False Positives: FWER vs FDR
[Figure: Signal, Noise, and Signal+Noise images]
43
[Figure: ten simulated realizations under each of three error-control schemes]
• Control of Per Comparison Rate at 10%
– Percentage of Null Pixels that are False Positives: 11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5%
• Control of Familywise Error Rate at 10%
– Occurrence of Familywise Error (realizations marked FWE)
• Control of False Discovery Rate at 10%
– Percentage of Activated Pixels that are False Positives: 6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7%
44
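Controlling FDR: Benjamini & Hochberg
• Select desired limit q on FDR
• Order p-values, $p_{(1)} \le p_{(2)} \le \ldots \le p_{(V)}$
• Let r be largest i such that $p_{(i)} \le (i/V) \times q$
• Reject all hypotheses corresponding to $p_{(1)}, \ldots, p_{(r)}$.
[Figure: ordered p-values $p_{(i)}$ plotted against $i/V$, both axes from 0 to 1, with the line $(i/V) \times q$]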
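The procedure above can be sketched in a few lines. The function name and the ten p-values are illustrative; the number of tests is written `n` in the code to avoid confusion with the V used for false positives on the earlier FDR slide.

```python
def benjamini_hochberg(p_values, q):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p-value
    r = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * q:
            r = rank  # keep the largest rank that satisfies the bound
    return sorted(order[:r])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, q=0.05)
```

Note the step-up behavior: every hypothesis ranked at or below r is rejected, even if some intermediate p-value lies above its own bound.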
45
Conclusions
• Analyzing fMRI Data
– Need linear regression basics
– Lots of disk space, and time
– Watch for MTP (no fishing!)
46
Thanks
• Slide help
– Stefan Kiebel, Rik Henson, JB Poline, Andrew Holmes