large scale functional connectivity for brain decoding · 2014. 5. 25. · are explained. third...

8
LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING Orhan Firat 1 , Itir Onal 1 , Emre Aksan 1 , Burak Velioglu 1 , Ilke Oztekin 2 and Fatos T. Yarman Vural 1 1 Department of Computer Engineering, Middle East Technical University, Turkey, 2 Department of Psychology, Koc University, Turkey, Emails:{orhan.firat, itir, emre.aksan, velioglu, vural}@ceng.metu.edu.tr , [email protected] ABSTRACT Functional Magnetic Resonance Imaging (fMRI) data con- sists of time series for each voxel recorded during a cog- nitive task. In order to extract useful information from this noisy and redundant data, techniques are proposed to select the voxels that are relevant to the underlying cog- nitive task. We propose a simple and efficient algorithm for decoding the brain states by modelling the correlation patterns between the voxel time series. For each stimulus during the experiment, a separate functional connectivity matrix is computed in voxel level. The elements in con- nectivity matrices are then filtered out by making use of a minimum spanning tree formed using a global connectivity matrix for the entire experiment in order to reduce dimen- sionality. For a recognition memory experiment with nine subjects, functional connectivity matrices are computed for encoding and retrieval phases. The class labels of the re- trieval samples are predicted within a k-nearest neighbour space constructed by the traversed entries in the functional connectivity matrices for encoding samples. The proposed method is also adapted to large scale functional connectiv- ity tasks by making use of graphics boards. Classification performance in ten categories is comparable and even bet- ter compared to both classical and enhanced methods of multi-voxel pattern analysis techniques. KEY WORDS Data mining and Machine Learning, Magnetic Resonance Imaging, Medical Image Processing, Brain State Decoding, Functional Connectivity, MVPA. 1 Introduction Understanding and modelling the human brain is one of the greatest challenges of our century across many fields in neuro-science, cognitive science and computer science. Neuro-imaging mediums, in particular, functional Mag- netic Resonance Imaging (fMRI) is the premier choice for the analysis of the brain in order to reach this goal. How- ever, the bulk amount of incomplete, noisy and yet redun- dant data requires intensive image processing and machine learning techniques to extract useful information about the nature of the brain. The techniques employed for analysing and learning the patterns of brain activity in fMRI data are called Multi-voxel pattern analysis (MVPA). Under the umbrella of MVPA techniques, many problems, such as hypothesis validation [1], diagnosing disorders [2, 3] and recently brain state decoding, also known as mind read- ing [4, 5], are studied to be resolved and expected to have a huge impact on understanding the intrinsics of the brain. Brain decoding tasks are challenging in nature be- cause of the scarcity of the labelled data compared to the high dimensional feature spaces. Therefore, researchers try to reduce the dimensionality of the data by selecting the relevant voxels in order to improve significance. An alternative to this approach is to incorporate various infor- mation sources and modalities to improve the accuracy and the generalization performances. One way to extract usefull information from the fMRI data is to employ all the temporal measurements and de- velop a model for the temporal structure between the voxel pairs. There are a variety of techniques to represent the re- lationship between the time series data. A popular model which is widely used in fMRI data is known as functional connectivity [6,7]. Functional connectivity is defined as the statistical dependency between the neural elements or re- gions across time, and widely used for decoding problems, recently [8–10]. Extracting the functional connectivity pat- terns from the fMRI measurements is not a well defined task and requires many assumptions about the data. First of all, the number of time samples which fall into a cognitive task is very few. Therefore, it is not easy to estimate the correlations other then zero-lag in voxel level. Secondly, due to the many unexplained characteristics of the brain it is not clear which similarity metric suits to model the time series between the voxel airs. There are several approaches proposed in the litera- ture to model the functional connectivity among the voxels for brain decoding. In [9, 10] region level functional con- nectivity matrices for two classes are used for classification by subtracting them to reveal discriminative relations. In [11] functional connectivity is used to select a neighbour- hood for a mesh model (MAD) [12] around each voxel to extract features that compress neighbouring structures for further classification. All of the above mentioned methods utilize functional connectivities among the brain regions, but not among the voxels, resulting a far fewer number of elements in connectivity matrix calculation steps. The ma- jor drawback of these inter regional connectivity analysis is the smoothing effects introduced by averaging all the voxel intensity values within a region. Also, in most cases, the re-

Upload: others

Post on 16-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODINGOrhan Firat1, Itir Onal1, Emre Aksan1, Burak Velioglu1, Ilke Oztekin2 and Fatos T. Yarman Vural1

1Department of Computer Engineering, Middle East Technical University, Turkey,2Department of Psychology, Koc University, Turkey,

Emails:{orhan.firat, itir, emre.aksan, velioglu, vural}@ceng.metu.edu.tr , [email protected]

ABSTRACTFunctional Magnetic Resonance Imaging (fMRI) data con-sists of time series for each voxel recorded during a cog-nitive task. In order to extract useful information fromthis noisy and redundant data, techniques are proposed toselect the voxels that are relevant to the underlying cog-nitive task. We propose a simple and efficient algorithmfor decoding the brain states by modelling the correlationpatterns between the voxel time series. For each stimulusduring the experiment, a separate functional connectivitymatrix is computed in voxel level. The elements in con-nectivity matrices are then filtered out by making use of aminimum spanning tree formed using a global connectivitymatrix for the entire experiment in order to reduce dimen-sionality. For a recognition memory experiment with ninesubjects, functional connectivity matrices are computed forencoding and retrieval phases. The class labels of the re-trieval samples are predicted within a k-nearest neighbourspace constructed by the traversed entries in the functionalconnectivity matrices for encoding samples. The proposedmethod is also adapted to large scale functional connectiv-ity tasks by making use of graphics boards. Classificationperformance in ten categories is comparable and even bet-ter compared to both classical and enhanced methods ofmulti-voxel pattern analysis techniques.

KEY WORDSData mining and Machine Learning, Magnetic ResonanceImaging, Medical Image Processing, Brain State Decoding,Functional Connectivity, MVPA.

1 IntroductionUnderstanding and modelling the human brain is one ofthe greatest challenges of our century across many fieldsin neuro-science, cognitive science and computer science.Neuro-imaging mediums, in particular, functional Mag-netic Resonance Imaging (fMRI) is the premier choice forthe analysis of the brain in order to reach this goal. How-ever, the bulk amount of incomplete, noisy and yet redun-dant data requires intensive image processing and machinelearning techniques to extract useful information about thenature of the brain. The techniques employed for analysingand learning the patterns of brain activity in fMRI dataare called Multi-voxel pattern analysis (MVPA). Under theumbrella of MVPA techniques, many problems, such as

hypothesis validation [1], diagnosing disorders [2, 3] andrecently brain state decoding, also known as mind read-ing [4,5], are studied to be resolved and expected to have ahuge impact on understanding the intrinsics of the brain.

Brain decoding tasks are challenging in nature be-cause of the scarcity of the labelled data compared to thehigh dimensional feature spaces. Therefore, researcherstry to reduce the dimensionality of the data by selectingthe relevant voxels in order to improve significance. Analternative to this approach is to incorporate various infor-mation sources and modalities to improve the accuracy andthe generalization performances.

One way to extract usefull information from the fMRIdata is to employ all the temporal measurements and de-velop a model for the temporal structure between the voxelpairs. There are a variety of techniques to represent the re-lationship between the time series data. A popular modelwhich is widely used in fMRI data is known as functionalconnectivity [6,7]. Functional connectivity is defined as thestatistical dependency between the neural elements or re-gions across time, and widely used for decoding problems,recently [8–10]. Extracting the functional connectivity pat-terns from the fMRI measurements is not a well definedtask and requires many assumptions about the data. First ofall, the number of time samples which fall into a cognitivetask is very few. Therefore, it is not easy to estimate thecorrelations other then zero-lag in voxel level. Secondly,due to the many unexplained characteristics of the brain itis not clear which similarity metric suits to model the timeseries between the voxel airs.

There are several approaches proposed in the litera-ture to model the functional connectivity among the voxelsfor brain decoding. In [9, 10] region level functional con-nectivity matrices for two classes are used for classificationby subtracting them to reveal discriminative relations. In[11] functional connectivity is used to select a neighbour-hood for a mesh model (MAD) [12] around each voxel toextract features that compress neighbouring structures forfurther classification. All of the above mentioned methodsutilize functional connectivities among the brain regions,but not among the voxels, resulting a far fewer number ofelements in connectivity matrix calculation steps. The ma-jor drawback of these inter regional connectivity analysis isthe smoothing effects introduced by averaging all the voxelintensity values within a region. Also, in most cases, the re-

Page 2: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

gion priors from anatomical atlases are quite coarse or notdetailed enough for the specific activation patterns causedby the cognitive process in voxel level. These downsidesmotivates our research to further explore voxel level con-nectivity patterns resulting large connectivity matrices thatare difficult to compute and store. However, this holisticapproach may provide a fine and informative representationfor a cognitive process. An example of using whole brainconnectivity can be found in [13] where whole brain func-tional connectivity matrices are used but along with MADfor functional neighbour selection in a two class classifica-tion task again for linear-spatial feature extraction.

Another soft spot on fMRI brain decoding tasks asmentioned above, is the scantiness of the labelled samplescompared to the high dimensional feature spaces. Suchregimes are known to be prone to severe deficiencies forclassification problems such as over-fitting, insignificanceand curse of dimensionality. In order to overcome theseproblems, dimensionality reduction and/or voxel selec-tion methods are employed by researchers such as Princi-pal Component Analysis [14–16], Independent ComponentAnalysis [17, 18] and Low-Dimensional Embeddings [19].The commonality between these methods is the transfor-mation of original feature space into a low dimensionalfeature space. Experimental priors are hard to inject inthese approaches and further voxel level interpretationstransformed feature space is not trivial. A convenient andemerging tool to attack dimensionality reduction problemis network coding where the network structure regardingto cognitive states are compressed using graph theoreti-cal approaches [20–22], as also employed in our proposedmethod.

In this study, we employ voxel-level functional con-nectivity matrices directly to classify cognitive processes.The difference of our approach with [11,13] is that we dis-carded any complicated feature extraction steps after cal-culating the connectivity matrices and used them directlyfor matching the corresponding class label. Furthermore,we calculated the pair-wise correlation measures from sig-nal windows adjusted according to the hemo-dynamic re-sponse function. We also employed graphics boards to fur-ther improve the capability of calculating whole brain func-tional connectivity matrices. The quadratic size of the con-nectivity matrices that form our feature space is reduced bytaking the pairwise correlation measures that are spannedby a template Minimum Spanning Tree (MST). TemplateMST is formed using a separate functional connectivitymatrix that is calculated by the pairwise correlations ac-counting all the time points this time not within small sig-nal windows, which further reduced dimensionality to thenumber of voxels in the experiment.

This paper is organised as follows, in the next sectionthe fMRI experiment design and pre-processing processesare explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and classifica-tion results are analysed in fourth section followed by thediscussion and conclusion in the fifth and sixth section.

Figure 1. A sample trial for conducted fMRI recognitionmemory experiment. In each trial participant is exposedto a separate study list belonging to one of ten categories(animal category is shown). Each trial proceeded with thepresentation of 5-word study list for 2 seconds in encodingphase, followed by a delay period of math problem solving.In the retrieval phase a test probe is presented and indicatedwhether the word was a member of the current study list.

2 Dataset and fMRI Experiment2.1 fMRI ExperimentIn the current study, fMRI recording was conducted dur-ing a recognition memory task for nine healthy subjects ineight runs per subject. Each participant is shown a list ofwords belonging to a specified category in the encodingphase (e.g. fruits or tools) for 2 seconds as a study list.In the study list, 5 words belonging to the same categoryis presented sequentially for 400 ms each. Following thestudy list, participants solved math problems 3500 ms eachas a delay period. Employing a delay period for 14 sec-onds, allows independent assessment of encoding related(i.e. study list period) brain activation from retrieval re-lated (i.e. during the test probe) activity patterns. Finally atest probe is presented as the last step, and the participantexecutes a yes/no response indicating whether the word be-longs to the current study list for 2 seconds (e.g., see Figure1). Recording was conducted using a 3T Siemens scannerwith a 2 seconds TR, meaning that we obtained a sepa-rate brain volume each 2 seconds. For the MVPA analy-ses we focused on the lateral temporal cortex region having8142 voxels total. fMRI data consists of 2400 time pointsin eight runs with 240 class labels for the encoding phase,and 240 class labels for the retrieval phase (for each of theten classes, 24 samples are obtained for both encoding andretrieval). The encoding samples are used for training anddecoding samples are used for testing a classifier further inour method.

2.2 fMRI Data Pre-processingImage processing and data analysis were performed us-ing SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). Following

Page 3: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

Figure 2. Example of a typical fMRI experiment for brain state decoding. 4D fMRI data consist of several volumes across time,some of which have assigned to a class labels(indicated by vertical orange lines in time axis). For each class either encoding orretrieval, a separate functional connectivity matrix is formed by considering a suitable time window that encapsulates indicatedclass labels (indicated by highlighted intervals with different colors for classes)

quality assurance procedures to assess outliers or artefactsin volume and slice-to-slice variance in the global sig-nal, functional images were corrected for differences inslice acquisition timing by re-sampling all slices in time tomatch the first slice, followed by motion correction acrossall runs (using sinc interpolation). Functional data werethen normalized based on MNI stereotaxic space using a12-parameter affine transformation along with a nonlin-ear transformation using cosine basis functions. Imageswere resampled into 2-mm cubic voxels and then spatiallysmoothed with an 8-mm FWHM isotropic Gaussian kernel.Further detrending and normalization was avoided becausewe employed connectivity measures within small time win-dows.

3 MethodfMRI data consists of 3-dimensional brain volumes acrosstime {ti}ni=1, where each 3D volume is formed by stackingseveral 2D scans (slices). Each pixel in these 2D imagesrepresents the intensity of a small volume of brain tissue(voxel) at a time instant ti. We represent the voxel inten-sity of a voxel at location sj in a time instant ti as υ(ti, sj),note that sj is a three dimensional vector indicating the po-sition of a voxel in the volume. We indicate the time signalvector for a voxel at location sj as υ(tp:q, sj) where timepoints are delimited starting from p and ending at q. Atypical fMRI experiment consists of several runs, where ineach run the subject is exposed to some task specific stim-ulus {cj}Sj=1 at the predefined time instants, where S is thetotal number of semantic classes in the experiment and nis the total length of the experiment across runs (see Fig-ure 2). Within a large amount of samples across time, only

few have assigned class labels (orange vertical lines in Fig-ure 2). The rest of the samples that are not having a classlabel ti/cj are generally discarded in MVPA tasks or a pre-determined number of time points are averaged accordingto the prior knowledge of the peaks of hemo-dynamic re-sponse function (e.g. 2-3 time points after the stimulus).

3.1 Functional Connectivity for Feature Representa-tion

In this study we utilize the entire temporal structure ofvoxels for each stimulus and none of the samples are dis-carded. For this purpose, first we determine a time win-dow delimited by p and q for each separate stimulus. Itis well known that, Hemo-dynamic Response Function(HRF) peaks around 4-6 seconds after a stimulus and re-turns to baseline after 10-12 seconds [23]. In order to cap-ture the temporal structure, we specified time windows thatspan 6 samples in the fMRI recordings (see Figure 2). Con-sidering 2 seconds TR as the experimental protocol, 6 sam-ples span a full hemo-dynamic response function with 12seconds. After determining a window size in time, we ex-tract a distinct functional connectivity matrix for each timewindow in voxel-level (blue and green highlighted bars inFigure 2). Note that starting from a time window adjustedfor a single stimulus delimited by p and q we obtain a singleconnectivity matrix for each stimulus, as we expand the de-limiters to the experimental limits where p = 1 and q = n,we obtain a single connectivity matrix for the whole ex-periment. The elements in the functional connectivity ma-trix are represented by symmetric dependence measures,in the time domain. It has been suggested that correlationbased measures are well suited for functional connectivityanalysis [24]. Consequently, we use zero-order correlation

Page 4: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

(cross-correlation) to measure the functional similarity be-tween time-series. The zero-order correlation coefficientρjk, between two voxels at location sj and sk, within thetime window started at time point p and ended at time pointq is calculated by,

ρjk =covjk

(υ(tp:q, sj), υ(tp:q, sk)

)√varj

(υ(tp:q, sj)

)· vark

(υ(tp:q, sk)

) , (1)

where covjk is the covariance of the signals measured attwo voxels within the time window, and varj is the vari-ance of the signals measured at a voxel υ(tp:q, sj) andρjk ∈ [−1, 1]. Note that, p is set to the beginning of a stim-ulus, q is the end point of the window as p+ 6 and the timewindow is indicated by p : q. Here voxel intensities withina time window p : q comprise a vector of intensity values υand used to calculate ρjk. Each stimulus corresponds eitherto encoding (training) or retrieval (test) phase and accom-modates a class label. By calculating connectivity matricesFCp:q for all stimulus, we obtain a training and test set forour further classification task. Considering the nature ofzero-order correlation which is symmetric, resulting func-tional connectivity matrices are also symmetric. Thereforeit is sufficient to take into account only the lower diagonalsof each connectivity matrix. In this study we constructedour feature space for classification, by extracting lower di-agonals of connectivity matrices and converting them intocolumn vectors for further comparison. Each connectivitymatrix is converted into a column vector and represented inthe feature space by these vectors where we finally used ak-nearest neighbours method to determine the class labelsfor test samples.

3.2 Large Scale Connectivity and ParallelismThe functional connectivity matrices grow quadratically aswe increase the number of voxels considered in the exper-iment. As an example, a single connectivity matrix for 8kvoxels has 64M pairwise elements and is time and spaceconsuming to calculate. For 80k voxels, 6.4B entries re-serve around 50GB of memory which is just for one con-nectivity matrix. The practical solution for time and spacecomplexity can be achieved by making use of graphicsboards to speed-up computation time and storing resultingmatrix into disk as chunks for space complexity. We at-tacked this problem by first storing a portion of the timesignals data on the texture memory of GPU then comput-ing only the corresponding chunk of connectivity matrixand storing result to disk in binary format to further spacegain and speed. By iterating over different time signal por-tions repeatedly, lower diagonal chunks of resulting func-tional connectivity matrix is calculated and stored to disk.For speeding up the computations, correlation measure isexpanded to exploit parallelism. Let us simplify the nota-tion of correlation and denote x and y as our time windowintensity vectors for two distinct voxels. The correlationmeasure in (1) can be re-written as follows with a slightabuse of notation,

ρxy =

∑xiyi −m

(∑xi

m

)(∑yi

m

)√(∑

x2i −m(∑

xi

m

)2)(∑y2i −m

(∑yi

m

)2) ,(2)

where m is the temporal window size, which is 6 in ourcase.

The parallelism can be achieved easily by convert-ing a complex problem into a matrix-vector multiplica-tion problem which is very efficient to calculate on GPUs.Analysing the re-arranged formula (2), we observe that∑xi,∑yi,∑x2i and

∑y2i can be calculated with a single

matrix-vector multiplication. Only the summation∑xiyi

needs to be calculated in a kernel as the last step. We canachieve further computation enhancements by storing othersummation results in texture memory of GPU. The overallalgorithm to compute a large scale functional connectivitymatrix is given as follows:

1. Load data chunk to device memory (time series)

2. Calculate∑xi,∑yi using CUBLAS, save resulting

vectors on device memory

3. Square data matrix with a transform routine on Thrustand calculate

∑x2i ,∑y2i using CUBLAS, save re-

sulting vectors on device memory

4. Load source and destination time-series into texturememory (pairwise cliques)

5. Calculate pairwise correlation on GPU, save resultingmatrix chunk on disk

6. Return to step 4

In our CUDA implementation for large scale func-tional connectivity calculation, we employed CUDA BasicLinear Algebra Subroutines (CUBLAS) and CUDA Thrustlibraries for speed up and simplicity.

3.3 Dimensionality Reduction Using Minimum Span-ning Trees

The quadratic growth of the connectivity matrices drasti-cally increases the dimension of feature space when weuse all the correlation measures directly in classification.As an example 8k voxels results in a connectivity matrixthat has 64M pairwise elements and a feature vector having32M dimensions (considering only lower diagonals). Dueto the scarcity of the labelled samples which is 240/240 fortraining/test, we need to employ some dimension reductiontechniques. In such large scales, conventional dimensional-ity reduction approaches become obsolete because of theirtime and space complexities (even inverting a big matrixor singular value decomposition is costly). Therefore oneneeds an efficient approach to resolve this problem. Graphtheory is a well structured and developed are field in com-puter science where it becomes handy in large scale net-work analysis problems. A functional connectivity matrix

Page 5: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

Figure 3. Template Minimum Spanning Tree calculation process. Each pairwise relations within the entire experiment is usedto calculate a global connectivity matrix FC1:n with a time window equal to the length of experiment. Template MST is thenformed by converting connectivity matrix into an intermediate distance matrix G.

is an affinity matrix which can be converted to a distancematrix easily. For each pair-wise element jk of a connec-tivity matrix FC, we can convert a correlation measure ρjkinto a distance measure djk by calculating,

djk = 1− |ρjk| , for all pairs jk, (3)

where both positive and negative correlation are assumedto have equal information, and zero correlation indicatinglargest distance.

Figure 4. Algorithm flow of proposed method. Input fMRIdata is used simultaneously to compute functional connec-tivity matrices for each stimulus within small time windowstp:q and calculating the template MST within entire timedomain t1:n. Resulting template MST is then used to filterout elements of each FCp:q , constructing training and testdata further fed to classifier.

After defining a distance measure, we can transformour dimensionality reduction problem into a graph traversalproblem using a Minimum Spanning Tree (MST), whichspans all the nodes of the graph (voxels) with minimum to-tal distance (regarding to maximum correlation after ourtransformation). The suggested approach enables us to

compress the information encoded in a correlation matrixby taking into account only the most informative (highlypositive or negative correlations) pair-wise relations, span-ning entire region of interest in the brain for cognitive pro-cesses. Furthermore, we use the path traversed by the MSTto filter out elements in any other correlation matrix byconsidering only the pairwise relations reside on the path.Note that, in a graph with V nodes, MST of the graph hasV − 1 edges, meaning that we can transform a functionalconnectivity matrix as a distance matrix for 8k voxels hav-ing 64M elements then representing it with a spanning treehaving only 8k-1 edges (elements of the distance matrix)out of 64M elements. The reduction of the dimensionalityis colossal and easy to compute with iterative MST rou-tines [25].

Extraction of a unique MST from a set of functionalconnectivity matrices is not a trivial problem; because inour model, for each stimulus, a different connectivity ma-trix is calculated and correlations are extracted within thetime interval of that stimulus. If we extract a different MSTfor each connectivity matrix, then each MST will span adifferent feature space. A practical solution to overcomethis problem is to calculate a unique MST that is sharedamong all the connectivity matrices extracted from distinctstimulus windows. This global template MST should en-capsulate all the shared structure among different stimulusand must be shared to extract comparable features that arelocalized with its unique path. As mentioned before, wecan extend the time window of a stimulus into the limitsand obtain a single connectivity matrix for the whole exper-iment by setting delimiters p = 1 and q = n. This yields asingle connectivity matrix which can also be considered ascorrelation pattern of the entire experiment. By calculatinga unique connectivity matrix for the entire experiment, wecan extract a template MST to be used as a reference pathto filter out not-spanned edges in each of the connectivitymatrices calculated for each stimulus.

Page 6: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

Method Subject-1 Subject-2 Subject-3 Subject-4 Subject-5 Subject-6 Subject-7 Subject-8 Subject-9MVPA 0.39 0.54 0.46 0.47 0.49 0.47 0.44 0.52 0.43MAD [12] 0.51 0.65 0.60 0.64 0.67 0.58 0.56 0.63 0.55FC-Euc 0.64 0.68 0.63 0.60 0.63 0.67 0.61 0.61 0.64FC-Cos 0.63 0.68 0.62 0.60 0.62 0.64 0.61 0.60 0.64FC-Cor 0.65 0.68 0.64 0.62 0.61 0.68 0.62 0.63 0.64FC-Abs 0.64 0.68 0.63 0.60 0.62 0.66 0.61 0.61 0.64FC-MST-Euc 0.54 0.61 0.58 0.55 0.56 0.61 0.57 0.56 0.62FC-MST-Cos 0.50 0.59 0.54 0.54 0.54 0.60 0.55 0.54 0.59FC-MST-Cor 0.66 0.67 0.64 0.62 0.61 0.65 0.63 0.64 0.64FC-MST-Abs 0.57 0.63 0.59 0.58 0.57 0.63 0.57 0.58 0.62Max - FC 0.65 0.68 0.64 0.62 0.63 0.68 0.62 0.63 0.64Max - FC-MST 0.66 0.67 0.64 0.62 0.61 0.65 0.63 0.64 0.64Impr.wrt MAD 0.15 0.03 0.03 -0.01 -0.04 0.10 0.07 0.01 0.09Impr.wrt MVPA 0.26 0.14 0.18 0.15 0.14 0.21 0.19 0.12 0.21

Table 1. Classification performances of proposed method compared with classical methods for fMRI machine learning tasks(MVPA) and a local-linear feature extraction method (MAD). Proposed method is indicated by FC and followed by the distancemetric used.

Formally speaking, a minimum spanning tree is con-nected sub-graph of an undirected weighted graph, con-taining all the nodes in the original graph with no cycles.Such a sub-graph with no cycles is a spanning tree and withthe minimum total weights it is called minimum spanningtree. We define a fully connected intermediate graph G asG = (V,E) where the nodes of the graph sj ∈ V , rep-resent the voxel coordinates and edge weights are definedas djk calculated using (1) by setting p = 1 and q = nthen transforming into distances using (3). Note that inthe time interval t1:n, sample windows that belong to re-trieval phases (test samples) are excluded at all steps forfairness. Computation of the template MST for whole timeseries MSTFC1:n

, is straightforward after nodes and arcweights are defined, using Kruskal’s method. ResultingMSTFC1:n

has V − 1 edges that are further used to fil-ter each connectivity matrix extracted using small windowsof stimulus. This process is illustrated in Figure 3, wherefunctional connectivity matrix is calculated using all voxelpairs including entire time points this time. Resulting tem-plate minimum spanning tree MSTFC1:n

of connectivitymatrix FC1:n is then computed with the help of interme-diate graph G. In Figure 3, template MST is visualisedonly with the edges marked in lower diagonal since it isundirected. The marked elements in the template MST isfurther used to filter each connectivity matrices to reducethe dimension of the feature vector. The algorithm flow isillustrated in Figure 4. In the last step of the algorithm, theselected elements in the lower diagonals by template MSTof each stimulus FC matrix, are used to form training andtest data, depending on the records obtained at the encodingor retrieval phase of the experiment.

4 ExperimentsIn our experiments on fMRI brain decoding task, we em-ploy a region of interest having 8142 voxels for 9 sub-

ject where each subject has 8 experimental runs. Theclassification performance is measured for each run sep-arately and then averaged for each subject. For a k-nearest neighbour classification regime, we need a dis-tance metric for our feature space. In this study, we em-ploy four different distance metrics to measure the distancebetween training and test sample vectors. These metricsare standard Euclidean distance (FC/FCMST-Euc), cosinebetween two vectors (FC/FCMST-Cos), 1-absolute corre-lation (FC/FCMST-Cor) and sum of absolute differences(FC/FCMST-Abs). We compare our method with a stan-dard MVPA approach where voxel intensity values are di-rectly fed into the classifiers. As a second comparisonwe used Mesh Learning approach (MAD) to extract local-linear features from voxel intensity values, which exploitspatial structure among the voxels. The performance re-sults are illustrated in Table 1. We observe a significant im-provement up to 25% compared to classical MVPA methodin all of the subjects. This improvement is expected as weincorporated temporal information into classification pro-cess. Calculating correlation and using coefficients as fea-tures smooths out noise in voxel intensities.

MAD approach exploits spatial structure of the voxelintensities for each time instant separately and enhancesthe classification performance compared to the classicalMVPA approach. We observed a better performance upto 14% compared to MAD approach. For two out of ninesubjects, MAD approach give better results. For the otherseven subjects proposed method substantially outperformsthe MAD approach.

5 DiscussionThe suggested approach is distinguished from the othercomparable methods in the way of accounting the fMRIdata modalities. MAD exploits spatial modality of fMRIdata whereas our proposed method exploits temporal

Page 7: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

modality. Considering the noisy spatial acquisition pro-cess of fMRI data, MAD approach is more prone to noisewith respect to proposed method which slightly smooths-out noise component by correlation analysis. Another ad-vantage of the proposed method is its simplicity. The modelorder selection or neighbour selection problems of MADapproach are simply avoided by incorporating correlationstructures. However, the problem of selecting the informa-tive pairwise relations is emerged. This problem is resolvedby calculating a template MST to span the most informa-tive pair-wise relations and accounting only the ones thatare spanned by template MST. Final step reduced the di-mensionality drastically down to 8141 from 32M with aminor performance drawback in three subjects but equalor better performance in other subjects. The performancegain observed by reduced dimensionality is also expected.Avoiding the curse of dimensionality and overfitting by re-duced dimensions results a higher generalization perfor-mance with a better significance. The dimensionality ofthe feature spaces for each method is also illustrated in Ta-ble 2. by analysing Table 2 we observe that improved per-formance for enhanced methods are encouraged by higherdimensions in the feature spaces. The baseline MVPAmethod employs around 8k feature dimensions. With suchenhanced methods (MAD) in order to achieve better per-formance the trade off reveals it self as increased featuredimensions. The proposed method aids this problem byretracting the high number of dimensions with improvedclassification accuracy.

Another observation regarding to the proposedmethod is the structure of the calculated template MST. InFigure 3, illustrated MST has the majority of the connec-tions near diagonals meaning that the most correlated andinformative relations reside in a close vicinity of a voxelwhich is known from the capillary structure the brain andpoint spread function of the fMRI medium. But the inter-esting connections also observed off the diagonals whichare remote informative relations. Further studies should re-veal these connectivities and examine the hub and cluster-ing structures.

Method Dimension of Feature SpaceMVPA 8.142MAD 16.284-154.698FC 33 MFC-MST 8.141

Table 2. Dimensionality of the feature spaces for differentmethod employed and proposed method.

In our experiments we also employed a stress testfor correlation analysis on GPU, by calculating a pairwisefunctional connectivity matrix for 80.000 and 220.000 vox-els. We observed that the developed approach is capable ofcalculating and storing in such large-scales which is justnot possible with a straight-forward approach or by using

tools such as SPM, Functional Connectivity Toolbox [26]or FSL [27]. The developed approach implemented withCUDA is available on http://ceng.metu.edu.tr/˜e1697481/SCT/.

6 ConclusionIn this study we introduced a simple approach for brain de-coding using temporal structures of voxel activations. Inthe proposed method, voxels are represented in the fea-ture space by their pairwise correlation coefficients with allother voxels within a time window determined by hemo-dynamic response function. Proposed method is tested ina recognition memory experiment having 10 classes with9 subjects and results are compared with standard methodsand methods that exploit spatial structure. Experimentalresults suggest that the proposed method is more robust tonoise by incorporating temporal structure and comparablewith the current methods for MVPA.

Further studies should focus on selecting the informa-tive relations encoded in the functional connectivity ma-trices in order to discriminate semantic classes for higherclassification accuracy. For a better significance, dimen-sionality of the feature representations should be furtherreduced by either model order selection methods or byanalysing between class connectivity matrices. As a futurework we take into consideration to combine the proposedmethod with methods exploiting spatial structures.

AcknowledgmentThis work is supported by TUBITAK under the grant num-ber 112E315.

References

[1] I. Oztekin and D. Badre, “Distributed Patterns ofBrain Activity that Lead to Forgetting.” Frontiers inhuman neuroscience, vol. 5, 2011.

[2] T. Schmah, G. E. Hinton, R. S. Zemel, S. L. Small,and S. C. Strother, “Generative versus discriminativetraining of rbms for classification of fmri images.” inNIPS, 2008.

[3] M. N. Coutanche, S. L. Thompson-Schill, and R. T.Schultz, “Multi-voxel pattern analysis of fMRI datapredicts clinical symptom severity.” NeuroImage,vol. 57, 2011.

[4] K. A. Norman, S. M. Polyn, G. J. Detre, and J. V.Haxby, “Beyond mind-reading: multi-voxel patternanalysis of fMRI data.” Trends in cognitive sciences,vol. 10, 2006.

[5] J.-D. Haynes and G. Rees, “Decoding mental statesfrom brain activity in humans.” Nature reviews. Neu-roscience, vol. 7, 2006.

[6] K. J. Friston, “Functional and effective connectivity:a review,” Brain connectivity, vol. 1, no. 1, pp. 13–36,2011.

Page 8: LARGE SCALE FUNCTIONAL CONNECTIVITY FOR BRAIN DECODING · 2014. 5. 25. · are explained. Third section clarifies the details of the em-ployed method. Experiments on fMRI data and

[7] O. Sporns, “Networks of the Brain,” Nov. 2010.

[8] B. Chai, D. B. Walther, D. M. Beck, and F.-F. L., “Ex-ploring Functional Connectivity of the Human Brainusing Multivariate Information Analysis,” Advancesin neural information processing systems, 2009.

[9] J. Richiardi, H. Eryilmaz, S. Schwartz, P. Vuilleumier,and D. Van De Ville, “Decoding brain states fromfMRI connectivity graphs.” NeuroImage, vol. 56,no. 2, pp. 616–26, May 2011.

[10] W. R. Shirer, S. Ryali, E. Rykhlevskaia, V. Menon,and M. D. Greicius, “Decoding subject-driven cog-nitive states with whole-brain connectivity patterns.”Cerebral cortex (New York, N.Y. : 1991), vol. 22,no. 1, pp. 158–65, Jan. 2012.

[11] O. Firat, M. Ozay, I. Onal, I. Oztekin, and F. T.Yarman-Vural, “Functional mesh learning for patternanalysis of cognitive processes,” in IEEE 12th In-ternational Conference on Cognitive Informatics andCognitive Computing, ICCI*CC, 2013, pp. 161–167.

[12] M. Ozay, I. Oztekin, U. Oztekin, and F. T. Y. Vu-ral, “Mesh Learning for Classifying Cognitive Pro-cesses,” Arxiv:1205.2382, 2012.

[13] O. Ekmekci, O. Firat, M. Ozay, I. Oztekin, F. T.Yarman-Vural, and U. Oztekin, “Mesh learning forobject classification using fmri measurements,” inIEEE International Conference on Image Processing,ICIP, 2013, pp. 2631–2634.

[14] L. K. Hansen, J. Larsen, F. A. Nielsen, S. C. Strother,E. Rostrup, R. Savoy, N. Lange, J. Sidtis, C. Svarer,and O. B. Paulson, “Generalizable patterns in neu-roimaging: How many principal components?” Neu-roImage, vol. 9, no. 5, pp. 534–544, 1999.

[15] R. Viviani, G. Gron, and M. Spitzer, “Functional prin-cipal component analysis of fmri data,” Human brainmapping, vol. 24, no. 2, pp. 109–129, 2005.

[16] G. S. Sidhu, N. Asgarian, R. Greiner, and M. R. G.Brown, “Kernel principal component analysis for di-mensionality reduction in fmri-based diagnosis ofadhd,” Frontiers in Systems Neuroscience, vol. 6,no. 74, 2012.

[17] V. D. Calhoun, T. Adali, L. K. Hansen, J. Larsen, andJ. J. Pekar, “Ica of functional mri data: An overview,”in in Proceedings of the International Workshop onIndependent Component Analysis and Blind SignalSeparation, 2003, pp. 281–288.

[18] F. Pereira, T. Mitchell, and M. Botvinick, “Machinelearning classifiers and fmri: a tutorial overview,”Neuroimage, vol. 45, no. 1, pp. S199–S209, 2009.

[19] P. Mannfolk, R. Wirestam, M. Nilsson, F. Stahlberg,and J. Olsrud, “Dimensionality reduction of fmri timeseries data using locally linear embedding,” Mag-netic Resonance Materials in Physics, Biology andMedicine, vol. 23, no. 5-6, pp. 327–338, 2010.

[20] S. L. Bressler and V. Menon, “Large-scale brain net-works in cognition: emerging methods and princi-ples,” Trends in cognitive sciences, vol. 14, no. 6, pp.277–290, 2010.

[21] A. F. Alexander-Bloch, N. Gogtay, D. Meunier,R. Birn, L. Clasen, F. Lalonde, R. Lenroot, J. Giedd,E. T. Bullmore, A. Alexander-Bloch et al., “Disruptedmodularity and local connectivity of brain functionalnetworks in childhood-onset schizophrenia,” Fron-tiers in systems neuroscience, vol. 4, p. 147, 2010.

[22] Y. He and A. Evans, “Graph theoretical modelingof brain connectivity,” Current opinion in neurology,vol. 23, no. 4, pp. 341–350, 2010.

[23] S. A. Huettel, A. W. Song, and G. McCarthy, Func-tional magnetic resonance imaging. Sinauer Asso-ciates Sunderland, MA, 2004, vol. 1.

[24] S. M. Smith, K. L. Miller, G. Salimi-Khorshidi,M. Webster, C. F. Beckmann, T. E. Nichols, J. D.Ramsey, and M. W. Woolrich, “Network modellingmethods for FMRI.” NeuroImage, vol. 54, no. 2, pp.875–91, Jan. 2011.

[25] J. B. Kruskal, “On the Shortest Spanning Subtreeof a Graph and the Traveling Salesman Problem,”Proceedings of the American Mathematical Society,vol. 7, no. 1, pp. 48–50, Feb. 1956.

[26] D. Zhou, W. K. Thompson, and G. Siegle, “MAT-LAB toolbox for functional connectivity.” NeuroIm-age, vol. 47, no. 4, pp. 1590–607, Oct. 2009.

[27] M. Jenkinson, C. F. Beckmann, T. E. Behrens, M. W.Woolrich, and S. M. Smith, “Fsl,” Neuroimage,vol. 62, no. 2, pp. 782–790, 2012.