once size does not fit all: regressor and subject specific techniques for predicting experience in...
![Page 1: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/1.jpg)
Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments
Denis Chigirev, Chris Moore, Greg Stephens & The Princeton EBC Team
![Page 2: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/2.jpg)
How do we learn in a very high dimensional setting (~35K voxels)?
LINEAR
Look for linear projection(s): linear regression, ridge regression, linear SVM
How to control for complexity?
Loss function: quadratic; linear, hinge
Prior (regularization)
NONLINEAR
Create a “look-up table”: nonlinear kernel methods, kernel ridge regression, RKHS, GP, nonlinear SVM
Need similarity measure between brain states (i.e. kernel) & regularization
Assumes “clustering” of similar states
Advantage: pools together many weak signals
Assumes regressor continuity along paths of data points
[Figure labels: weights; similarity measure]
![Page 3: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/3.jpg)
How do we learn in a very high dimensional setting (~35K voxels)?
LOCAL
Focus on informative areas: choose voxels by correlation thresholding, searchlight
Advantage: ignore areas that are mostly noise
Assumes that information is localized and that the feature selection method is stable
GLOBAL
Look for global modes: whole brain, PCA, Euclidean distance kernel, searchlight kernel without thresholding
Advantage: improves stability by pooling over larger areas
Disadvantage: correlated noisy areas that do not carry any information may bias the predictor
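As an illustration of the "local" option, here is a minimal sketch of voxel selection by correlation thresholding. The function name `correlation_threshold_mask`, the threshold value, and the synthetic data are assumptions made for this example; they are not taken from the slides.

```python
import numpy as np

def correlation_threshold_mask(X, y, threshold):
    """Select voxels whose |Pearson r| with the regressor exceeds threshold.

    X: (n_trs, n_voxels) brain states; y: (n_trs,) regressor.
    Returns the boolean mask and the per-voxel correlations.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant voxels get r = 0 instead of a division by zero.
    r = (Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.abs(r) > threshold, r

# Synthetic data (hypothetical): only the first 10 voxels carry signal.
rng = np.random.default_rng(0)
n_trs, n_voxels = 200, 500
y = rng.standard_normal(n_trs)
X = rng.standard_normal((n_trs, n_voxels))
X[:, :10] += 2.0 * y[:, None]

mask, r = correlation_threshold_mask(X, y, threshold=0.5)
X_selected = X[:, mask]  # reduced data passed on to the learner
```

In a real analysis the mask would have to be computed on training data only, otherwise the selection step leaks information about the test regressor.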
![Page 4: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/4.jpg)
Different methods emphasize different aspects of the learning problem

| | Linear | Nonlinear |
| --- | --- | --- |
| Local | Corr. thresh & ridge, searchlight & ridge | Searchlight RKHS |
| Global | PCA & ridge | Euclidean RKHS |
![Page 5: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/5.jpg)
Ridge Regression using ALL voxels
Difference of means (centroids): $w = \langle x \rangle_y$
Linear regression solution: $w = C^{-1} \langle x \rangle_y$
Ridge regression solution: $w = (C + \lambda I)^{-1} \langle x \rangle_y$
• Regularization makes it possible to use all ~30K voxels
• Centroids are well estimated (1st order statistic), but the covariance matrix is a 2nd order statistic and therefore requires regularization
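The ridge solution can be computed without ever forming the voxel-by-voxel covariance matrix. The sketch below uses a standard matrix identity to do this. It assumes centered data, $C = X^T X / N_{TR}$, and reads the right-hand-side vector of the formulas above as the $y$-weighted average $X^T y / N_{TR}$; that reading, the function name `ridge_weights_dual`, and the toy sizes are assumptions, not something stated on the slide.

```python
import numpy as np

def ridge_weights_dual(X, y, lam):
    """Ridge weights (C + lam*I)^{-1} X^T y / n_trs via an n_trs x n_trs solve.

    Uses the identity (X^T X + a I)^{-1} X^T = X^T (X X^T + a I)^{-1},
    with a = n_trs * lam. X and y are assumed to be centered.
    """
    n_trs = X.shape[0]
    gram = X @ X.T  # small when n_trs << n_voxels
    alpha = np.linalg.solve(gram + n_trs * lam * np.eye(n_trs), y)
    return X.T @ alpha

# Toy problem (hypothetical sizes, small enough to check against the primal form).
rng = np.random.default_rng(0)
n_trs, n_voxels, lam = 40, 300, 0.5
X = rng.standard_normal((n_trs, n_voxels))
X -= X.mean(axis=0)
y = rng.standard_normal(n_trs)
y -= y.mean()

w_centroid = X.T @ y / n_trs                      # difference-of-means direction
C = X.T @ X / n_trs                               # explicit covariance, toy sizes only
w_primal = np.linalg.solve(C + lam * np.eye(n_voxels), w_centroid)
w_dual = ridge_weights_dual(X, y, lam)            # same weights, no n_voxels^2 matrix
```

With ~30K voxels the explicit `C` would have roughly 10^9 entries, while the Gram matrix stays at `n_trs x n_trs`.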
![Page 6: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/6.jpg)
Whole Brain Ridge Regression
Keeping only the large eigenvalues of the covariance matrix (i.e. PCA-type complexity control) is MUCH LESS effective than ridge regularization.
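The two kinds of complexity control can be written side by side from a single SVD of the data matrix. The sketch below only shows how the two weight vectors differ: ridge shrinks every direction smoothly, while PCA-type control keeps the top `k` directions and drops the rest. It does not reproduce the performance comparison reported here. The function name `ridge_and_pca_weights`, the value of `k`, and the synthetic data are assumptions.

```python
import numpy as np

def ridge_and_pca_weights(X, y, lam, k):
    """Weights under ridge shrinkage and under top-k PCA truncation.

    X: centered (n_trs, n_voxels) data; y: centered (n_trs,) regressor.
    The covariance convention C = X^T X / n_trs is assumed.
    """
    n_trs = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    proj = U.T @ y
    # Ridge: each singular direction is scaled by s / (s^2 + n_trs * lam).
    w_ridge = Vt.T @ (s / (s**2 + n_trs * lam) * proj)
    # PCA-type control: plain least squares restricted to the k largest directions.
    w_pca = Vt[:k].T @ (proj[:k] / s[:k])
    return w_ridge, w_pca

rng = np.random.default_rng(0)
n_trs, n_voxels, lam, k = 40, 300, 0.5, 10
X = rng.standard_normal((n_trs, n_voxels))
X -= X.mean(axis=0)
y = rng.standard_normal(n_trs)
y -= y.mean()

w_ridge, w_pca = ridge_and_pca_weights(X, y, lam, k)
```

Both `lam` and `k` would normally be chosen by cross-validation on held-out data.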
![Page 7: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/7.jpg)
Reproducing Kernel Hilbert Space (RKHS) T. Poggio
Instead of looking for linear projections (ridge regression, SVM w/ linear kernel), use the measure of similarity between brain states to project the new brain state onto existing ones in feature space.
$$y(x) = \sum_i c_i K(x_i, x)$$
where $i = 1..N_{TR}$ ($N_{TR}$ is the number of TRs).
Learn the “support” coefficients $c$ by solving
$$(N_{TR} \gamma I + K) c = y$$
where $\gamma$ represents regularization in feature space.
(aka Kernel Ridge Regression; with a Gaussian kernel this recovers the mean of the GP solution)
We choose $K(x_i, x_j) = e^{-d_{ij}^2 / 2\sigma^2}$, where $d_{ij}$ is the distance between brain states. We use Euclidean distance and searchlight distance.
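A minimal sketch of the fit and prediction steps described on this slide, using the Euclidean distance in the Gaussian kernel. The function names, the values of `sigma` and `gamma`, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np

def squared_euclidean(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding

def fit_kernel_ridge(X, y, sigma, gamma):
    """Solve (n_trs * gamma * I + K) c = y for the support coefficients c."""
    n_trs = X.shape[0]
    K = np.exp(-squared_euclidean(X, X) / (2.0 * sigma**2))
    return np.linalg.solve(n_trs * gamma * np.eye(n_trs) + K, y)

def predict_kernel_ridge(X_train, c, X_new, sigma):
    """Evaluate y(x) = sum_i c_i K(x_i, x) for each row x of X_new."""
    K_new = np.exp(-squared_euclidean(X_new, X_train) / (2.0 * sigma**2))
    return K_new @ c

# Synthetic brain states (hypothetical): the regressor depends on voxel 0.
rng = np.random.default_rng(0)
n_trs, n_voxels, sigma, gamma = 50, 100, 10.0, 1e-3
X = rng.standard_normal((n_trs, n_voxels))
y = X[:, 0] + 0.1 * rng.standard_normal(n_trs)

c = fit_kernel_ridge(X, y, sigma, gamma)
X_new = rng.standard_normal((5, n_voxels))
y_hat = predict_kernel_ridge(X, c, X_new, sigma)
```

Swapping in a different similarity measure only requires replacing `squared_euclidean` with another function that returns squared distances between brain states.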
![Page 8: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/8.jpg)
This framework allows similarity measures between brain states to be tested for their usefulness in prediction.
data → How similar are the brain states? (Euclidean distance, Mahalanobis, searchlight, earth mover's?) → Learning algorithm (SVM, RKHS, etc.; choice of regularization and loss) → prediction
$K(x_i, x_j) = e^{-d_{ij}^2 / 2\sigma^2}$, $y(x) = \sum_i c_i K(x_i, x)$
This makes it possible to assess independently the quality of the brain state similarity measure and the quality of the learning procedure.
In practice, the Euclidean measure (default) performs relatively well.
![Page 9: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/9.jpg)
Basics of Searchlight
Which pair of brain states is further apart?
[Figure labels: more different; less different]
Mahalanobis distance:
$$d_{ij}^2 = (x_i - x_j)^T C^{-1} (x_i - x_j)$$
Problem: amplifies poorly estimated dimensions for whole brain states.
Solution: apply locally to each 3x3x3 supervoxel and then sum the individual contributions:
$$(d_{ij}^\alpha)^2 = (x_i^\alpha - x_j^\alpha)^T C_\alpha^{-1} (x_i^\alpha - x_j^\alpha)$$
Here $x_i^\alpha$, $\alpha = 1..N_{vox}$, is a 3x3x3 “supervoxel”. Then the distance between brain states can be computed as a weighted average:
$$d_{ij}^2 = \sum_{\alpha=1}^{N_{vox}} b_\alpha (d_{ij}^\alpha)^2$$
Using $b_\alpha = 1$, we found that this solution is self-regularizing, i.e. one can take the complexity penalty to zero.
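The searchlight distance can be sketched as a sum of local Mahalanobis distances over 3x3x3 neighborhoods with unit weights. This is an illustration under stated assumptions, not the authors' implementation: the data are a 4D array `(n_trs, nx, ny, nz)`, only interior voxels are used as supervoxel centers, each local covariance is estimated from the same TRs, and a small jitter `eps` is added for numerical safety. The function name and the toy volume are hypothetical.

```python
import numpy as np

def searchlight_sq_distances(vol, eps=1e-6):
    """Squared searchlight distance between all pairs of TRs (unit weights).

    vol: (n_trs, nx, ny, nz) array. Returns an (n_trs, n_trs) matrix holding
    the sum over supervoxels of the local squared Mahalanobis distances.
    """
    n_trs, nx, ny, nz = vol.shape
    d2 = np.zeros((n_trs, n_trs))
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                patch = vol[:, i - 1:i + 2, j - 1:j + 2, k - 1:k + 2].reshape(n_trs, 27)
                patch = patch - patch.mean(axis=0)
                C = patch.T @ patch / n_trs + eps * np.eye(27)
                # Whitening with the Cholesky factor turns the local Mahalanobis
                # distance into a plain Euclidean distance between rows of Z.
                L = np.linalg.cholesky(C)
                Z = np.linalg.solve(L, patch.T).T
                sq = (Z**2).sum(axis=1)
                d2 += np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return d2

# Toy volume (hypothetical): 60 TRs of a 5x5x5 grid, so 27 interior centers.
rng = np.random.default_rng(0)
n_trs = 60
vol = rng.standard_normal((n_trs, 5, 5, 5))
d2 = searchlight_sq_distances(vol)
```

The resulting matrix can be passed through $e^{-d_{ij}^2 / 2\sigma^2}$ to obtain a searchlight kernel. Each local covariance is only 27x27, so it is well estimated once the number of TRs is comfortably above 27.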
![Page 10: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/10.jpg)
Why might searchlight help? (hint: stability!)
[Figure panels, axes labeled m1 and m2: voxel correlation with feature (movie1 & movie2); Threshold voxel correlation with feature (movie1 & movie2); searchlight correlation with feature (movie1 & movie2); Threshold searchlight corr with feature (movie1 & movie2)]
The projection learned by linear ridge is only as good as the stability of the underlying voxel correlations with the regressor.
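One way to quantify this kind of stability is to compute the voxel-by-feature correlation map separately on two movies and compare the maps. The sketch below reports the correlation between the two maps and the Jaccard overlap of the thresholded voxel sets. The choice of these two summary numbers, the function names, the threshold, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def voxel_correlations(X, y):
    """Pearson correlation of each voxel (column of X) with the regressor y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return (Xc.T @ yc) / np.where(denom > 0, denom, np.inf)

def map_stability(r1, r2, threshold):
    """Correlation between two maps and Jaccard overlap of their thresholded sets."""
    m1, m2 = np.abs(r1) > threshold, np.abs(r2) > threshold
    union = np.logical_or(m1, m2).sum()
    jaccard = np.logical_and(m1, m2).sum() / union if union else 0.0
    return np.corrcoef(r1, r2)[0, 1], jaccard

# Two simulated "movies" (hypothetical) that share the same 20 signal voxels.
rng = np.random.default_rng(1)
n_trs, n_voxels, n_signal = 200, 500, 20

def simulate_movie():
    y = rng.standard_normal(n_trs)
    X = rng.standard_normal((n_trs, n_voxels))
    X[:, :n_signal] += y[:, None]
    return X, y

X1, y1 = simulate_movie()
X2, y2 = simulate_movie()
map_corr, jaccard = map_stability(
    voxel_correlations(X1, y1), voxel_correlations(X2, y2), threshold=0.4
)
```

The same comparison could be run on searchlight-based maps by replacing `voxel_correlations` with a function that scores each supervoxel.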
Searchlight distance versus Euclidean distance, tested in RKHS
![Page 11: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/11.jpg)
Different methods emphasize different aspects of the learning problem

| | Linear | Nonlinear |
| --- | --- | --- |
| Local | Correlation thresholding, ridge complexity control (Chigirev et al. PBAIC 2006, implemented as part of a public MVPA Matlab toolbox) | Weighted searchlight RKHS allows zooming in on areas of interest (future work!) |
| Global | SVD trick allows computing the 30k x 30k covariance matrix; ridge regularization outperforms PCA as complexity control. | Euclidean RKHS (Kernel Ridge) may be slightly improved by using the global searchlight kernel as similarity measure, which has a remarkable self-regularization property. |
![Page 12: Once Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments Denis Chigirev, Chris Moore, Greg](https://reader036.vdocuments.us/reader036/viewer/2022062909/5a4d1bba7f8b9ab0599d0870/html5/thumbnails/12.jpg)
I would like to thank my collaborators: Chris Moore*, Greg Stephens, Greg Detre, Michael Bannert
as well as Ken Norman and Jon Cohen for supporting the Princeton EBC Team.