A (very) brief introduction to multivoxel analysis “stuff”
Jo Etzel, Social Brain Lab
Mass univariate (spm) Test every voxel
separately Fit a linear model to each
voxel Look for brain structures
where the blobs occur Obtain a p-value for each
voxel Use parametric (or
permutation) statistics to evaluate significance
Multivariate Test groups of voxels
(ROIs) at once Use machine learning
algorithms Structures first (no blobs)
Obtain a classification accuracy for each ROI
Evaluate significance of accuracy by permutation testing and perhaps subsetting (parametric statistics also possible)
Classification
Beard?
man deep voice?
man flat chest?
man woman
yes
yes
no
no
no
yes
from www.cac.science.ru.nl/people/ustun/index.html
What sort of flower is this?
Is this example type 1 or type 2?
from http://spect.yale.edu/displaying_results.html
listening to hand action sounds listening to mouth action sounds
Can the activity in the right premotor cortex classify the volumes into hand and mouth action sounds?
- or -Is the brain activity in this ROI different while listening to hand and mouth action sounds?
subject stimulus example voxel 1 voxel 2
1 mouth 1 0.8464 -0.8932
1 mouth 2 0.5776 0.0494
1 hand 1 1.4955 0.2002
1 hand 2 -0.3844 -0.3322
2 mouth 1 0.5432 -0.9855
www.cs.sunysb.edu/~mueller/research/brainMiner/
0.3543 -0.1734 2.3444 0.3832 -0.9844 0.9864 0.0033for each ROI in each volume (or summary volume)
temporal compression; extract GLM parameter estimates
to classifiers
subject
stimulus example voxel 1 voxel 2
1 mouth 1 0.8464 -0.8932
1 mouth 2 0.5776 0.0494
1 hand 1 1.4955 0.2002
1 hand 2 -0.3844
-0.3322
1 hand 3 0.5432 -0.9855
subject
stimulus example voxel 1 voxel 2
1 mouth 1 0.8464 -0.8932
1 mouth 2 0.5776 0.0494
1 hand 1 1.4955 0.2002
1 hand 2 -0.332 2.1134
subject
stimulus example voxel 1 voxel 2
1 hand 5 1.4955 0.2002
1 hand 6 -0.3844
-0.3322
1 mouth 3 0.5432 -0.9855
1 hand 4 2.3145 1.3342
testing set
training setclassifier
accuracy (from test set)
1st train: mouth vs. hand?
2nd test: mouth vs. hand?
classify each subject separately, average accuracy across subjects subjects can have their own patterns
classify subjects together, test left-out subject subjects need the same patterns
all data
preM
L
preM
R
M1
L
M1
R
S1
L
S1
R
S2
L
S2
R
aud
L
aud
R
othe
r L
othe
r R
mea
n ac
cura
cy
0.45
0.50
0.55
0.60
0.65 *
*
*
**
*
ROIs
* P < 0. 0042, permutation test (Bonferroni correction) of 0.05 for 12 ROIs.
So, have which ROIs allowed significant classification accuracy for separating mouth and hand action sounds: preML, preMR, M1L, S2R, audL, audR.
Permutation TestWhat?
null hypothesis is arbitrary labeling If labeling is arbitrary
(random), then there is no relationship between the labels and the activity, so classification should not be possible.
real data classified about the same as arbitrary data: no evidence for a relationship between the labels and the activity: not significant.
real data is classified better than the arbitrary data: it is “unusual:” significant.
How? classify the real data make many randomized-label
data sets (arbitrary or “fake” data).
classify each fake data set in the same way as the real data
proportion of fake data sets classified better than the real data is the p-value
1. Analyze the real data (get accuracy). M1 L = 0.6000
2. Make lots of permuted-label data sets (all if possible, at least 1,000).
0.4944 0.510 0.499 0.5211 0.480 0.5002 0.498 0.519 0.5720 0.4789 …
3. Analyze the fake data sets (get accuracies).
4. Count how many of the fake data sets were classified more accurately than the real data.
Of 1000 fake data sets, none had accuracy higher than 0.60.
5. Divide and get the p-value.
1/1001 = 0.000999, soM1 L p = 0.001
(50/1001 = 0.0499)
Done!
Permutation test results, showing true results with uncorrected p-value cutoff lines. True mean accuracy lines that fall above the range of the permuted lines are highly significant (p = 0.002).
0.40
0.45
0.50
0.55
0.60
0.65
0.70
mouth vs. handm
ean
accu
racy
audL audR S2L S2R M1L M1R S1L S1R preML preMR
0.40
0.45
0.50
0.55
0.60
0.65
0.70
env vs. hand
mea
n ac
cura
cy
audL audR S2L S2R M1L M1R S1L S1R preML preMR
0.40
0.45
0.50
0.55
0.60
0.65
0.70
Smouth vs. Shand
mea
n ac
cura
cy
audL audR S2L S2R M1L M1R S1L S1R preML preMR
all voxels in ROI (mean)permutation (mean)permutation, uncorr p=0.002 (max, min)permutation, uncorr p=0.005 (0.5%)permutation, uncorr p=0.05 (5%)
all voxels in ROI (mean)permutation (mean)permutation, uncorr p=0.002 (max, min)permutation, uncorr p=0.005 (0.5%)permutation, uncorr p=0.05 (5%)
all voxels in ROI (mean)permutation (mean)permutation, uncorr p=0.002 (max, min)permutation, uncorr p=0.005 (0.5%)permutation, uncorr p=0.05 (5%)
Black box?
Random Forests (RFs) first fMRI use, I think algorithm makes 10,000
CARTs, all try to classify, majority vote final class
algorithm makes random variable selections for the “forest” of “trees”
support vector machines (svms)
previously used with fMRI data algorithm converts data to a
higher dimension to try to find a linear separating line (hyperplane) between the classes.
classifierBeard?
man deep voice?
man flat chest?
man woman
yes
yes
no
no
no
yes
Beard?
man deep voice?
man flat chest?
man woman
yes
yes
no
no
no
yesBeard?
man deep voice?
man flat chest?
man woman
yes
yes
no
no
no
yes
k-nearest neighbor Linear discriminant analysis Gaussian Naïve Bayes Hidden Markov Models Partial Least Squares
For more info/review: Mitchell, Machine Learning,
2004, 145-175; O’toole, Journal of Cognitive
Neuroscience, 2007, 1735-1753.
Other classification-type algorithms
classifier
Neural Networks A generalization of linear regression
functions; many variations. Each node calculates a weighted
sum, and does a binary input to the next layer, so simple networks can be expressed as (long) equations.
Training the network for each person involves setting the weights on each voxel.
Idea: brain activity pattern during remembering an item should be similar to the pattern when learning the item.
Subjects learned 3 lists of 10 items each; items were labeled pictures of famous people, famous locations, and everyday objects.
After learning, subjects tried to remember all of the items; saying them aloud as they remembered them.
2: testingvolumes when recalling
1: trainingvolumes when learning items in each category classifier
(neural network)3: accuracy
(sort of)
“For each brain scan, the classifier produced an estimate of the match between the current testing pattern and each of the three study contexts.”
“However, follow-up analyses indicated that voxels outside of peak category-selective areas are also important for establishing this result.”
Take-Home Message
Classification (multivariate methods) can answer questions and find patterns not available with spm-type methods. spm is still useful, though!
Method is (for me) ROI-focused: start with hypotheses about which ROIs can classify (or not) which stimuli.
Differences in activation to stimuli can be restated as classification predictions.
That’s It!
Permutation vs. t-test p-values
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
mouth_hand, p=0, r2=0.9933
t-test p
perm
utat
ion
p
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
env_hand, p=1e-05, r2=0.873
t-test p
perm
utat
ion
p0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Smouth_Shand, p=0, r2=0.9739
t-test p
perm
utat
ion
p
The permutation and t-test p-values are very highly correlated, though not always identical.
Calculated correlation line in solid, perfect in dotted.
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
mouth_hand, p=0, r2=0.9933
t-test p
perm
utat
ion
p
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
env_hand, p=1e-05, r2=0.873
t-test p
perm
utat
ion
p
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Smouth_Shand, p=0, r2=0.9739
t-test p
perm
utat
ion
p