
Proceedings of International Joint Conference on Neural Networks, Atlanta, Georgia, USA, June 14-19, 2009

Extracting activated regions of fMRI data using unsupervised learning

Heydar Davoudi1, Ali Taalimi1, and Emad Fatemizadeh1

1Biomedical Signal and Image Processing Laboratory (BiSIPL), Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran.

Abstract- Clustering approaches can efficiently define the activated regions of the brain in fMRI studies. However, choosing an appropriate clustering algorithm and defining the optimal number of clusters are still key problems of these methods. In this paper, we apply an improved version of the Growing Neural Gas algorithm, which automatically operates on the optimal number of clusters. The decision criterion for creating new clusters, at the heart of this online clustering, is taken from the MB cluster validity index. Comparison with other clustering methods for fMRI data analysis shows that the proposed algorithm outperforms them on both artificial and real datasets.

I. INTRODUCTION

CLUSTER analysis is an important exploratory data analysis approach for finding the activated regions in fMRI data [1-8]. Many clustering methods described in the literature attempt to partition the brain time series into several temporal patterns according to the similarity among them. Furthermore, extracting appropriate features from fMRI time series is considered a challenging research topic [7].

Wismuller et al. [5], in an fMRI study, compared the performance of three well-known clustering algorithms, namely Neural Gas [9], Kohonen's self-organizing map [10], and a fuzzy clustering scheme based on deterministic annealing [11]. All three algorithms require the number of clusters as an input, and choosing the true number of clusters is a challenging task. They used the temporal pattern of each pixel (Pixel Time Course, or PTC) as its feature vector. In addition, they defined two performance measures: the correlation coefficient between the task-related cluster and the activation paradigm, and the corresponding quantization error. They showed that Neural Gas outperforms the other two algorithms, achieving the minimum quantization error and the maximum correlation with the activation paradigm.
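To make the correlation measure of [5] concrete, the sketch below treats each PTC as a feature vector, averages the PTCs of one cluster into a centroid, and computes Pearson's correlation coefficient between that centroid and a boxcar activation paradigm. The 25-off/25-on boxcar mirrors the paradigm of the artificial dataset described in Section III; the function names and the toy data are hypothetical, and this is an illustration of the measure rather than the exact procedure of [5].

```python
import numpy as np

def boxcar_paradigm(n_samples, off=25, on=25):
    """Boxcar reference: `off` zeros followed by `on` ones, repeated and truncated."""
    period = np.r_[np.zeros(off), np.ones(on)]
    reps = int(np.ceil(n_samples / period.size))
    return np.tile(period, reps)[:n_samples]

def cluster_paradigm_correlation(ptcs, labels, cluster, paradigm):
    """Pearson correlation between one cluster's mean PTC and the activation paradigm.

    ptcs:   (n_pixels, n_timepoints) array, one PTC (feature vector) per row
    labels: (n_pixels,) cluster index assigned to each pixel
    """
    centroid = ptcs[labels == cluster].mean(axis=0)
    return float(np.corrcoef(centroid, paradigm)[0, 1])

# Hypothetical toy data: 10 pixels follow the paradigm plus noise, 10 are noise only.
rng = np.random.default_rng(0)
ref = boxcar_paradigm(248)
active = ref + 0.3 * rng.standard_normal((10, 248))
silent = 0.3 * rng.standard_normal((10, 248))
ptcs = np.vstack([active, silent])
labels = np.r_[np.zeros(10, dtype=int), np.ones(10, dtype=int)]

r_task = cluster_paradigm_correlation(ptcs, labels, 0, ref)    # task-related cluster
r_other = cluster_paradigm_correlation(ptcs, labels, 1, ref)   # non-activated cluster
```

The task-related cluster is the one whose centroid correlates most strongly with the paradigm; the noise-only cluster stays near zero.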

In a similar but more comprehensive work [6], different types of clustering approaches, including hierarchical, crisp (neural gas, self-organizing maps, hard competitive learning, k-means, maximum-distance, CLARA), and fuzzy (c-means, fuzzy competitive learning) algorithms, were studied on both artificial and real fMRI data. All of these algorithms must know the number of clusters a priori. As in [5], the whole PTC of each pixel is used as its feature vector. To evaluate the performance of these algorithms, two measures were proposed, namely the correlation coefficient and the weighted Jaccard coefficient. The study found that Neural Gas is the best algorithm, yielding a cluster with the maximum number of true positive (TP) and the minimum number of false positive (FP) pixels.

978-1-4244-3553-1/09/$25.00 ©2009 IEEE

Because of the low SNR and high dimensionality of the raw temporal pattern of each pixel, it is better to extract features from it that carry more information for clustering purposes. Lange and Zeger [12] used some parameters of a Gamma function for the cross-correlation as features. In [13], the hemodynamic response in fMRI is modeled with an FIR filter, and features such as group delay and standard deviation can be extracted from the filter coefficients. Goutte et al. [7] extracted activation strength and delay from the preprocessed cross-correlation function. In addition, they created feature vectors from the parameters produced by several activation-map computation methods for a meta-clustering approach. Another contribution of their study is the use of information criteria or cluster validity indices, such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Integrated Classification Likelihood (ICL), to find the optimal number of clusters. However, this requires running the clustering algorithm over a wide range of cluster numbers to find where the cluster validity index reaches its optimal value.
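A minimal sketch of strength and delay features, assuming (as a simplification of [7]) that activation strength is the peak Pearson correlation between a PTC and the paradigm over a small range of non-negative lags, and that delay is the lag at which that peak occurs. The lag range, the use of plain Pearson correlation, and the function name are our assumptions, not the exact preprocessing of [7].

```python
import numpy as np

def strength_and_delay(ptc, paradigm, max_lag=10):
    """Return (strength, delay) for one PTC (simplified, assumed definition).

    strength: largest correlation between the PTC and the paradigm shifted by
              0..max_lag samples (the hemodynamic response lags the stimulus)
    delay:    the lag, in samples, at which that correlation peaks
    """
    best_r, best_lag = -np.inf, 0
    for lag in range(max_lag + 1):
        # Compare ptc[t] with paradigm[t - lag]; the first `lag` samples are dropped.
        x = ptc[lag:]
        y = paradigm[: len(paradigm) - lag]
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return float(best_r), best_lag

# Hypothetical example: a boxcar response delayed by 3 samples plus noise.
rng = np.random.default_rng(1)
paradigm = np.tile(np.r_[np.zeros(25), np.ones(25)], 5)[:248]
ptc = np.roll(paradigm, 3) + 0.2 * rng.standard_normal(248)
feature = strength_and_delay(ptc, paradigm)   # 2-D feature vector for clustering
```

Each pixel is thereby reduced from a 248-sample PTC to a two-dimensional (strength, delay) feature vector.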

As noted, one of the key problems in clustering is deciding the number of clusters. However, the fMRI time-series literature has mostly ignored this issue: most fMRI clustering algorithms require the number of clusters a priori. To overcome this problem, cluster validity indices have been introduced to determine the optimal number of clusters. In addition to AIC, BIC, and ICL, many other cluster validity indices have been proposed, such as Bezdek's partition coefficient (PC) [14], the Davies-Bouldin (DB) index [15], and the recently proposed Maulik-Bandyopadhyay (MB) index (also known as Index I) [16]. The MB index has been reported to outperform other well-known indices in conjunction with different clustering algorithms [16].
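The run-and-compare strategy that validity indices require can be sketched as follows: cluster the features for every candidate number of clusters and keep the one with the best score. Purely for illustration, this sketch swaps in a plain Lloyd k-means with deterministic farthest-point seeding and the Davies-Bouldin index [15] (lower is better); the helper names, the candidate range 2..6, and the synthetic blobs are assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point seeding; returns (centroids, labels)."""
    C = [X[0]]
    for _ in range(1, k):
        # Next seed: the sample farthest from all seeds chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in C], axis=0)
        C.append(X[d.argmax()])
    C = np.array(C, dtype=float)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):          # leave an empty cluster's centroid in place
                C[i] = X[labels == i].mean(axis=0)
    return C, labels

def davies_bouldin(X, C, labels):
    """Davies-Bouldin index: mean over i of max_j (S_i + S_j) / ||c_i - c_j||; lower is better."""
    k = len(C)
    S = np.array([np.linalg.norm(X[labels == i] - C[i], axis=1).mean() if np.any(labels == i) else 0.0
                  for i in range(k)])
    R = [max((S[i] + S[j]) / np.linalg.norm(C[i] - C[j]) for j in range(k) if j != i)
         for i in range(k)]
    return float(np.mean(R))

# Hypothetical feature vectors: three well-separated 2-D blobs.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.4 * rng.standard_normal((60, 2)) for c in centers])

# Re-run the clustering for each candidate number of clusters; keep the best score.
scores = {k: davies_bouldin(X, *kmeans(X, k)) for k in range(2, 7)}
best_k = min(scores, key=scores.get)
```

The cost of this approach is one full clustering run per candidate value, which is what the online method proposed below avoids.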

In this paper, we apply a novel clustering algorithm [17] to fMRI data analysis. This algorithm unifies a growing competitive clustering algorithm with a cluster validity index. We selected Growing Neural Gas (GNG) [18] as the growing clustering algorithm and the MB index [16] as the cluster-insertion policy (stopping criterion) at the heart of the GNG algorithm.

The rest of the paper is organized as follows. Our clustering algorithm is proposed in Section II. Section III presents the experiments. Finally, Section IV gives some concluding remarks.

II. PROPOSED METHOD

A. GNG algorithm and MB index

GNG is a growing soft competitive learning algorithm proposed in [18] that has many applications in data clustering. The algorithm starts with very few clusters, and new clusters are inserted successively and periodically, each near the cluster with the largest accumulated error.

One might argue that if the clustering error of the GNG algorithm were used as a stopping criterion, the optimal number of clusters would be reached. However, if this error is driven too low, each sample tends to become a separate cluster. Alternatively, the network size can be used as a stopping criterion, but this means that the number of clusters must be known a priori.
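The argument above can be checked numerically: for nested prototype sets, the quantization error can only shrink as prototypes are added, and it reaches zero once every sample is its own prototype. The sketch below deliberately uses the first k samples as prototypes (a naive, hypothetical choice made only to keep the sets nested), so an error threshold alone gives no principled stopping point.

```python
import numpy as np

def quantization_error(X, W):
    """Sum over samples of the squared distance to the nearest prototype."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))   # hypothetical samples
# Nested prototype sets (assumption for illustration): prototypes are the first k samples.
errors = [quantization_error(X, X[:k]) for k in range(1, len(X) + 1)]
```

The error sequence never increases and ends at exactly zero when k equals the number of samples.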

As mentioned earlier, many clustering algorithms require the number of clusters as an input. Cluster validity indices, in addition to checking the quality of clustering results, can determine the optimal number of clusters. One recently developed index, known as the MB index or Index I, is defined as follows [16]:

$$\mathrm{MB}(N_c) = \left( \frac{1}{N_c} \cdot \frac{E(1)}{E(N_c)} \cdot D(N_c) \right)^{p} \qquad (1)$$

where $N_c$ is the number of clusters and $E(N_c)$ is the intra-cluster distance, defined by

$$E(N_c) = \sum_{i=1}^{N_c} \sum_{j=1}^{N} u_{ij} \, \lVert x_j - w_i \rVert \qquad (2)$$

where $N$ is the number of samples, $w_i$ is the prototype of cluster $C_i$, and $u_{ij}$ is the partition matrix, defined as

$$u_{ij} = \begin{cases} 1, & x_j \in C_i \\ 0, & x_j \notin C_i \end{cases} \qquad (3)$$

and the inter-cluster distance is defined by

$$D(N_c) = \max_{i,j = 1,\dots,N_c} \lVert w_i - w_j \rVert \qquad (4)$$

This distance is the maximum separation between two clusters over all possible pairs of clusters, and it cannot exceed the maximum distance between any two data samples.

The optimal number of clusters is taken as the value of $N_c$ that maximizes Equation (1). As can be seen from this equation, the MB index is composed of three terms, namely $1/N_c$, $E(1)/E(N_c)$, and $D(N_c)$. As $N_c$ increases, the first term decreases, while the second and third terms increase. In fact, the first term is a penalty term that prevents the number of clusters from becoming too large. These three terms compete with each other and strike a balance that determines the optimal number of clusters. The power $p$ controls the contrast of the index. We use $p = 1$ in this paper.
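A direct batch implementation of Eqs. (1)-(4) for crisp assignments is sketched below with p = 1. Computing E(1) with the data mean as the single prototype is our assumption (it is Eq. (2) evaluated for one cluster); the function names and the three candidate partitions of the toy data are hypothetical.

```python
import numpy as np

def mb_index(X, W, labels, p=1.0):
    """Maulik-Bandyopadhyay index I of Eq. (1); larger values indicate a better partition.

    X: (N, d) samples, W: (Nc, d) prototypes, labels: (N,) cluster index of each sample.
    """
    n_c = len(W)
    # Eq. (2) with the crisp memberships of Eq. (3): summed (unsquared) distances.
    e_nc = np.linalg.norm(X - W[labels], axis=1).sum()
    # E(1): a single cluster whose prototype is the data mean (assumption).
    e_1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    # Eq. (4): largest distance between any two prototypes.
    d_nc = max(np.linalg.norm(W[i] - W[j]) for i in range(n_c) for j in range(i + 1, n_c))
    return float(((1.0 / n_c) * (e_1 / e_nc) * d_nc) ** p)

def prototypes(X, labels):
    """Cluster means ordered by label value (labels assumed to be 0..Nc-1)."""
    return np.array([X[labels == i].mean(axis=0) for i in range(labels.max() + 1)])

# Hypothetical data: three 2-D blobs, scored under three candidate partitions.
rng = np.random.default_rng(7)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
truth = np.repeat(np.arange(3), 50)
merged = np.where(truth == 2, 1, truth)      # Nc = 2: blobs 1 and 2 lumped together
split = truth.copy()
split[(truth == 0) & (X[:, 0] > 0)] = 3      # Nc = 4: blob 0 cut in half

mb = {name: mb_index(X, prototypes(X, lab), lab)
      for name, lab in [("Nc=2", merged), ("Nc=3", truth), ("Nc=4", split)]}
```

Merging clusters is punished by a large $E(N_c)$ and a small $D(N_c)$, while splitting a compact cluster is punished by the $1/N_c$ factor, so the three-cluster partition scores highest here.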

B. Proposed Clustering Algorithm

By modifying both the growing neural gas algorithm and the MB cluster validity index, and combining them efficiently, we obtain a novel clustering algorithm that arrives at the optimal number of clusters. We chose the GNG algorithm because its parameters are constant over time and it has a growing structure.

In GNG, a new cluster is inserted after every λ input samples (λ here is the insertion period of the original GNG [18], not the forgetting factor used below). However, there is no guarantee that the insertion is necessary for these newly arrived samples. Another possibility is that these λ samples contain more than one underlying cluster. Thus, it may be more plausible to make this "periodic insertion policy" more adaptive and intelligent. Our idea is to check a performance criterion, namely a modified MB index, before each insertion, so that the resulting cluster insertions are more reasonable.

Since the MB index is originally computed after a clustering algorithm has finished, it needs the whole dataset. To make our algorithm online, so that it does not require the whole dataset, we compute the clustering error recursively. Furthermore, instead of decreasing the accumulated error around each cluster at every step, a forgetting factor is applied to the error of each cluster. These accumulated errors are then used to compute the MB index; that is, a modified version of the MB index that includes the forgetting factor is used.
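The online bookkeeping can be sketched as follows. The winner's error is discounted by the forgetting factor λ before the new squared distance is added (as in step 5 of the algorithm that follows), and with p = 1 the common factor E(1) cancels when the modified MB index of a network with one extra cluster is divided by that of the current network (step 9). The comparison therefore needs only the total accumulated errors, the cluster counts, and the maximum prototype distances. This is our reading of those steps; the function names are hypothetical.

```python
import numpy as np

def accumulate_error(errors, winner, x, w_winner, lam=0.9):
    """Recursive update E_s1 <- lam * E_s1 + ||x - w_s1||^2 (lam = forgetting factor)."""
    sq_dist = float(np.sum((np.asarray(x, float) - np.asarray(w_winner, float)) ** 2))
    errors[winner] = lam * errors[winner] + sq_dist
    return errors

def max_proto_distance(W):
    """D of Eq. (4): largest distance between any two prototypes."""
    W = np.asarray(W, dtype=float)
    diff = W[:, None, :] - W[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=2)).max())

def insertion_ratio(err1, W1, err2, W2):
    """a = I'/I for p = 1: E(1) cancels, leaving AE * Nc * D' / (AE' * Nc' * D)."""
    return (sum(err1) * len(W1) * max_proto_distance(W2)) / (
        sum(err2) * len(W2) * max_proto_distance(W1))

# Hypothetical state: Network 2 holds Network 1's two prototypes plus one more.
W1 = [[0.0, 0.0], [5.0, 5.0]]
W2 = [[0.0, 0.0], [5.0, 5.0], [5.0, 0.0]]
a = insertion_ratio([4.0, 6.0], W1, [1.0, 1.0, 1.0], W2)   # 20/9 > 1, so insert
errs = accumulate_error([0.0, 2.0], 1, [1.0, 2.0], [0.0, 0.0])
```

Because only running sums are kept, no sample has to be stored, which is what makes the index usable online.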

The complete proposed algorithm is as follows [17]:

1. a. Create Network 1 by initializing the cluster set $A$ to contain two clusters $c_1$ and $c_2$ with prototypes $w_1$ and $w_2$ randomly chosen from the data: $A = \{c_1, c_2\}$. Initialize the connection set of Network 1, $P \subset A \times A$, to the empty set.
   b. Create Network 2 by initializing the cluster set $A'$ to contain three clusters $c'_1$, $c'_2$, and $c'_3$ with the previously chosen prototypes $w'_1 = w_1$ and $w'_2 = w_2$ and a newly chosen prototype $w'_3$: $A' = \{c'_1, c'_2, c'_3\}$. Initialize the connection set of Network 2, $P' \subset A' \times A'$, to the empty set.
2. Get an input instance $x_i$ (the feature vector of the $i$th pixel).
   a. Determine the winner $s_1$ and the second-nearest cluster $s_2$ in set $A$.
   b. Determine the winner $s'_1$ and the second-nearest cluster $s'_2$ in set $A'$.

3. a. If the connection between $s_1$ and $s_2$ does not already exist, create it: $P = P \cup \{(s_1, s_2)\}$.
   b. If the connection between $s'_1$ and $s'_2$ does not already exist, create it: $P' = P' \cup \{(s'_1, s'_2)\}$.
4. a. Set the age of the connection between $s_1$ and $s_2$ to zero.
   b. Set the age of the connection between $s'_1$ and $s'_2$ to zero.

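5. Add the squared distance between the input signal and the nearest cluster in input space to a local error variable, for both networks:
$$E_{s_1}^{\mathrm{new}} = \lambda E_{s_1}^{\mathrm{old}} + \lVert x - w_{s_1} \rVert^2 \quad \text{for Network 1}$$
$$E'^{\,\mathrm{new}}_{s'_1} = \lambda E'^{\,\mathrm{old}}_{s'_1} + \lVert x - w'_{s'_1} \rVert^2 \quad \text{for Network 2}$$
where $\lambda$ is the forgetting factor, chosen close to one.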

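6. a. Adapt $w_{s_1}$ and its direct topological neighbors as follows:
$$\Delta w_{s_1} = \varepsilon_b \lambda_{s_1} (x - w_{s_1})$$
$$\Delta w_i = \varepsilon_n \lambda_i (x - w_i) \quad \text{for all direct neighbors } i \text{ of } s_1$$
   b. Adapt $w'_{s'_1}$ and its direct topological neighbors as follows:
$$\Delta w'_{s'_1} = \varepsilon_b \lambda'_{s'_1} (x - w'_{s'_1})$$
$$\Delta w'_i = \varepsilon_n \lambda'_i (x - w'_i) \quad \text{for all direct neighbors } i \text{ of } s'_1$$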

7. a. Increment the age of all edges emanating from $s_1$.
   b. Increment the age of all edges emanating from $s'_1$.
8. Remove the edges of both networks with an age larger than $a_{\max}$. If this results in units with no emanating edges in Network 1, remove them. Then remove the units in Network 2 with the minimum accumulated error. Note that Network 2 must always have one more cluster than Network 1.

9. Compute the ratio $a = I'/I$ of the modified MB indices of Network 2 and Network 1 (with $p = 1$, the common factor $E(1)$ cancels):
$$a = \frac{I'}{I} = \frac{AE(N_c)\, N_c\, D'}{AE'(N'_c)\, N'_c\, D}$$
where $AE$ is the total accumulated error, computed as
$$AE(N_c) = \sum_{i=1}^{N_c} E_i ,$$
and $D$ ($D'$) is the maximum distance between two clusters of Network 1 (Network 2).
10. If $a > 1$, insert a new cluster as follows:
   a. Let Network 1 be equal to Network 2.
   b. Grow Network 2 by inserting a new cluster as follows. Determine the cluster $q'$ with the maximum accumulated error. Among the neighbors of $q'$, determine the cluster $f'$ with the maximum accumulated error. Add a new cluster $r'$ to Network 2 and interpolate its prototype from $q'$ and $f'$:
$$A' = A' \cup \{r'\}, \qquad w'_{r'} = \frac{w'_{q'} + w'_{f'}}{2}$$
      Insert edges connecting the new cluster $r'$ with clusters $q'$ and $f'$, and remove the original edge between $q'$ and $f'$. Decrease the error variables of $q'$ and $f'$ by a fraction $\alpha$:
$$\Delta E'_{q'} = -\alpha E'_{q'}, \qquad \Delta E'_{f'} = -\alpha E'_{f'}$$
      Interpolate the error variable of $r'$ from $q'$ and $f'$:
$$E'_{r'} = \frac{E'_{q'} + E'_{f'}}{2}$$
11. If any samples remain, go to step 2.

We used the following parameters, chosen empirically: $\varepsilon_b = 0.05$, $\varepsilon_n = 0.0006$, $\alpha = 0.5$, $\beta = 0.0005$, $a_{\max} = 50$, $\lambda = 0.9$.
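A compact sketch of one network of this scheme (steps 2-8 for a single network, plus the insertion of step 10b), followed by a simplified driver that runs Network 1 and Network 2 side by side and uses the ratio of step 9 as the insertion test. Several simplifications are our own assumptions, not the authors' method: the additional per-unit factor that multiplies $\varepsilon_b$ and $\varepsilon_n$ in step 6 is taken as 1 (plain GNG adaptation), units are never removed in step 8 (only old edges are), and if $q'$ has no edges the insertion falls back to the unit with the next-largest error. Class and function names are hypothetical.

```python
import copy
import numpy as np

class GNGNet:
    """One network of the two-network scheme: steps 2-8 and step 10b (simplified sketch)."""

    def __init__(self, protos, eps_b=0.05, eps_n=0.0006, alpha=0.5, a_max=50, lam=0.9):
        self.w = [np.array(p, dtype=float) for p in protos]   # prototypes (copied)
        self.err = [0.0] * len(self.w)                        # accumulated errors E_i
        self.age = {}                                         # edge (i, j) with i < j -> age
        self.eps_b, self.eps_n, self.alpha = eps_b, eps_n, alpha
        self.a_max, self.lam = a_max, lam

    def neighbors(self, i):
        return [b if a == i else a for (a, b) in self.age if i in (a, b)]

    def step(self, x):
        d2 = [float(np.sum((x - w) ** 2)) for w in self.w]
        s1, s2 = (int(k) for k in np.argsort(d2)[:2])         # step 2: winner and runner-up
        self.age[(min(s1, s2), max(s1, s2))] = 0              # steps 3-4: create/reset edge
        self.err[s1] = self.lam * self.err[s1] + d2[s1]       # step 5: forgetting factor
        self.w[s1] += self.eps_b * (x - self.w[s1])           # step 6 (extra factor assumed 1)
        for n in self.neighbors(s1):
            self.w[n] += self.eps_n * (x - self.w[n])
        for e in self.age:                                    # step 7: age the edges of s1
            if s1 in e:
                self.age[e] += 1
        # Step 8, edges only: drop edges older than a_max (unit removal omitted, assumption).
        self.age = {e: a for e, a in self.age.items() if a <= self.a_max}

    def total_error(self):                                    # AE(Nc) of step 9
        return sum(self.err)

    def diameter(self):                                       # D of Eq. (4)
        W = np.array(self.w)
        diff = W[:, None, :] - W[None, :, :]
        return float(np.sqrt((diff ** 2).sum(axis=2)).max())

    def insert(self):
        """Step 10b: add r' halfway between q' (largest error) and its worst neighbour f'."""
        q = int(np.argmax(self.err))
        # Fallback if q' has no edges (assumption): consider every other unit.
        cands = self.neighbors(q) or [i for i in range(len(self.w)) if i != q]
        f = max(cands, key=lambda i: self.err[i])
        r = len(self.w)
        self.w.append((self.w[q] + self.w[f]) / 2)
        self.age.pop((min(q, f), max(q, f)), None)
        self.age[(q, r)] = 0
        self.age[(f, r)] = 0
        self.err[q] -= self.alpha * self.err[q]
        self.err[f] -= self.alpha * self.err[f]
        self.err.append((self.err[q] + self.err[f]) / 2)

def improved_gng(X, seed=0, **params):
    """Run Network 1 and Network 2 side by side over X; return Network 1's prototypes."""
    rng = np.random.default_rng(seed)
    init = X[rng.choice(len(X), size=3, replace=False)]
    net1 = GNGNet(init[:2], **params)                         # step 1a
    net2 = GNGNet(init, **params)                             # step 1b: w'1 = w1, w'2 = w2, new w'3
    for x in X:                                               # one pass over the samples
        net1.step(x)
        net2.step(x)
        ae1, ae2 = net1.total_error(), net2.total_error()
        if ae1 > 0 and ae2 > 0:
            # Step 9: a = I'/I = AE * Nc * D' / (AE' * Nc' * D).
            a = (ae1 * len(net1.w) * net2.diameter()) / (ae2 * len(net2.w) * net1.diameter())
            if a > 1:                                         # step 10
                net1 = copy.deepcopy(net2)                    # 10a
                net2.insert()                                 # 10b
    return np.array(net1.w)

# Hypothetical usage on three 2-D blobs presented in random order.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((100, 2)) for c in centers])
X = X[rng.permutation(len(X))]
W = improved_gng(X)
```

Keeping two networks that differ by exactly one cluster is what turns the batch MB index into a per-sample insertion test; no number of clusters is passed to `improved_gng`.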

Our proposed clustering algorithm can be regarded as an intelligent system that can do the following [19]:

1) Learn from a large amount of data.
2) Adapt incrementally in an online mode.
3) Create new clusters automatically.
4) Memorize prototypes and some data that can be used at a later stage for further refinement.
5) Interact continuously with the environment in a "life-long" learning manner.
6) Adequately represent space and time using parameters that represent short-term and long-term memory.

Having these characteristics, this algorithm can be regarded as an evolving connectionist system (ECOS) [19]. Evolving connectionist systems are systems that evolve their structure and functionality over time through interaction with the environment (incoming data).

III. EXPERIMENTS

A. fMRI Datasets

Quantitative performance assessment was performed on both simulated and real datasets. The simulated dataset was a fully artificial mathematical fMRI phantom. It consisted of a time series (248 instances) of a transversal brain slice (64×64 pixels) with three regions of activation (28 pixels; see Fig. 1(a)). During activation, the signal intensity increases in a boxcar-like pattern (25 instances off, 25 instances on).

The real dataset is auditory fMRI activation data from the Wellcome Trust Centre for Neuroimaging at University College London (UCL), available at http://www.fil.ion.ucl.ac.uk/spm/data/auditory. This smoothed, warped, and realigned fMRI dataset consists of a time series of 96 images with a 3-D matrix size of 53×63×46 pixels.

Figure 2(a) shows a 53×63 gray-level activation map and its binary image for the z = 24 scan. Note how the temporal lobes (auditory cortex) are activated. Furthermore, the activated volumes in the 3-D brain (for all 46 scans) are illustrated in Figure 2(b).

B. Preprocessing

Removing background pixels is the first step in fMRI analysis. Next, the temporal pattern of each pixel must be processed to better represent its underlying information. Fig. 1(b) shows the PTC of an activated pixel in the artificial dataset. After this signal is normalized, its outliers are removed, and it is passed through a smoother, it becomes more compatible with the activation paradigm (see Fig. 1(c)).
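A minimal sketch of this per-pixel preprocessing, assuming z-score normalization, outlier removal by clipping at ±3 standard deviations, and a centered moving-average smoother of width 5. The paper does not specify these choices, so the threshold, the window, and the function name are placeholders.

```python
import numpy as np

def preprocess_ptc(ptc, clip=3.0, width=5):
    """Normalize, clip outliers, and smooth one pixel time course (PTC)."""
    x = np.asarray(ptc, dtype=float)
    sd = x.std() or 1.0                           # guard against a constant PTC
    x = (x - x.mean()) / sd                       # z-score normalization
    x = np.clip(x, -clip, clip)                   # outlier removal by clipping (assumed rule)
    kernel = np.ones(width) / width               # moving-average smoother (odd width assumed)
    padded = np.pad(x, width // 2, mode="edge")   # repeat end values so the length is kept
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical noisy PTC: boxcar response on a baseline of 100, unit noise, one spike.
rng = np.random.default_rng(5)
paradigm = np.tile(np.r_[np.zeros(25), np.ones(25)], 5)[:248]
raw = 100 + 2 * paradigm + rng.standard_normal(248)
raw[40] += 30
clean = preprocess_ptc(raw)
r_raw = np.corrcoef(raw, paradigm)[0, 1]
r_clean = np.corrcoef(clean, paradigm)[0, 1]
```

Clipping before smoothing keeps a single spike from being smeared over its neighbours, and the correlation with the paradigm rises accordingly.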

For the real dataset, in addition to normalization, outlier removal, and smoothing, we performed a first-order detrending using the first six and the last six values of each PTC. Fig. 3 illustrates how preprocessing improves an activated PTC from the real dataset.
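One plausible reading of this detrending step, given only as a sketch: fit a straight line through the mean of the first six samples (placed at their mean time index) and the mean of the last six samples, then subtract it. The exact formulation is not given in the paper, so this is an assumption; it also implicitly assumes that the first and last samples fall in comparable (for example, rest) conditions.

```python
import numpy as np

def detrend_first_order(ptc, m=6):
    """Remove a linear trend estimated from the first m and last m samples (assumed rule)."""
    x = np.asarray(ptc, dtype=float)
    t = np.arange(len(x))
    t0, t1 = t[:m].mean(), t[-m:].mean()     # centre times of the two windows
    y0, y1 = x[:m].mean(), x[-m:].mean()     # mean level in each window
    slope = (y1 - y0) / (t1 - t0)
    trend = y0 + slope * (t - t0)
    return x - trend

# Hypothetical 96-image PTC with a pure linear drift: detrending leaves zeros.
t = np.arange(96)
residual = detrend_first_order(3 + 0.5 * t)
```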

C. Results

We mainly considered the features introduced by Goutte et al. [7]. Of these, two features, activation strength and delay, led to acceptable results, while the other features had a negligible effect on overall performance.

Furthermore, we chose the Jaccard coefficient (JC) [6] as our performance measure, which is defined as follows:

$$JC = \frac{TP}{TP + FP + FN} \qquad (5)$$

where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively, providing a quantitative measure of the quality of the activation cluster.
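Eq. (5) can be computed directly from the binary activation map produced by the clustering and a ground-truth mask. The sketch assumes both are boolean arrays of the same shape; the 64×64 toy slice and its 28-pixel active region are hypothetical.

```python
import numpy as np

def jaccard(detected, truth):
    """JC = TP / (TP + FP + FN) for boolean activation maps, Eq. (5)."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(detected & truth)
    fp = np.sum(detected & ~truth)
    fn = np.sum(~detected & truth)
    return float(tp / (tp + fp + fn))

# Hypothetical 64x64 slice: 28 truly active pixels; the cluster finds 26 plus 2 false alarms.
truth = np.zeros((64, 64), dtype=bool)
truth[10:14, 10:17] = True          # 4 x 7 = 28 active pixels
detected = truth.copy()
detected[10, 10:12] = False         # 2 false negatives
detected[30, 30:32] = True          # 2 false positives
jc = jaccard(detected, truth)       # 26 / (26 + 2 + 2)
```

Unlike plain accuracy, JC ignores the (very numerous) true-negative background pixels, so it is not inflated by the small size of the activated region.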

We compared the Jaccard coefficient of the proposed algorithm with that of several well-known clustering algorithms, namely k-means, Neural Gas (NG), Growing Neural Gas (GNG), and Fuzzy C-means (FCM), for different numbers of clusters. Note that, in contrast with these clustering algorithms, which require the number of clusters as an input, the proposed method automatically stops at the optimal number of clusters and gives it as an output.


According to Table I, the proposed method outperforms the other clustering algorithms, achieving a high Jaccard coefficient.

Fig. 3. a) An activated PTC (48,28,24) from the real auditory dataset. b) Preprocessed PTC (solid line) compared with the activation paradigm. Note how preprocessing significantly improves the compatibility of the activated PTC with the activation paradigm.


Fig. 1. a) Three regions of activity in the artificial activation map. b) An activated PTC (25,25). c) Preprocessed PTC (solid line) compared with the activation paradigm. Note how preprocessing improves the compatibility of the activated PTC with the activation paradigm.


TABLE I
COMPARISON OF THE PERFORMANCE (JACCARD COEFFICIENT) OF THE PROPOSED METHOD WITH FOUR WELL-KNOWN CLUSTERING ALGORITHMS

                                  Artificial Dataset         UCL Auditory Dataset
Method                            Nc=4    Nc=10   Nc=15      Nc=5    Nc=7    Nc=15
k-means                           0.90    ?       ?          0.90    0.88    0.82
Neural Gas (NG)                   ?       0.98    0.91       0.94    0.91    0.85
Growing Neural Gas (GNG)          0.97    0.96    0.96       0.92    0.87    0.79
Fuzzy C-means (FCM)               0.9?    0.95    0.92       0.86    0.84    0.73
Proposed method (improved GNG)    0.99 (Nc = ?)              0.93 (Nc = ?)

IV. CONCLUSION

Clustering techniques are important model-free data analysis approaches for finding the activated regions in fMRI data. In this paper, an online, growing, competitive clustering algorithm is applied to find these regions. This method automatically and efficiently reaches the optimal number of underlying clusters in both artificial and real datasets.

We modified both the Growing Neural Gas algorithm [18] and the MB cluster validity index [16], and combined them efficiently. In effect, the insertion condition of the growing neural gas algorithm is replaced by a comparison of the modified MB indices for two consecutive numbers of clusters, evaluated for each feature vector. This enables the learning system to work with the optimal number of underlying clusters in fMRI data. We expect that focusing on this novel family of clustering methods, which require no prior information about the topology of the data, will yield a robust unsupervised learning approach for finding activated regions in a variety of datasets.

Fig. 2. a) Gray activation map (left) of the real auditory data for the z = 24 scan, and its binary image (right). b) Activated volumes in the 3-D brain (for all 46 scans). Note how the temporal lobes (auditory cortex) are activated.

REFERENCES

[1] K. H. Chuang, M. J. Chiu, C. C. Lin, and J. H. Chen, "Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy C-means," IEEE Trans. Med. Imag., vol. 18, pp. 1117-1128, Dec. 1999.

[2] G. Scarth, M. McIntyre, B. Wowk, and R. Somorjai, "Detection of novelty in functional imaging using fuzzy clustering," in Proceedings SMR 3rd Annual Meeting, 1995, pp. 238-242.

[3] S. Ngan and X. Hu, "Analysis of fMRI imaging data using self-organizing mapping with spatial connectivity," Magn. Reson. Med., vol. 41, pp. 939-946, 1999.

[5] A. Wismuller, A. Meyer-Base, O. Lange, D. Auer, M. F. Reiser, and D. Sumners, "Model-free functional MRI analysis based on unsupervised clustering," Journal of Biomedical Informatics, vol. 37, pp. 10-18, 2004.

[6] E. Dimitriadou, M. Barth, C. Windischberger, K. Hornik, and E. Moser, "A quantitative comparison of functional MRI cluster analysis," Artificial Intelligence in Medicine, vol. 31, pp. 57-71, 2004.

[7] C. Goutte, L. K. Hansen, M. G. Liptrot, and E. Rostrup, "Feature space clustering for fMRI meta-analysis," IMM, Tech. Rep. IMM-REP-1999-13, 1999.

[8] C. Goutte, P. Toft, E. Rostrup, F. A. Nielsen, and L. K. Hansen, "On clustering fMRI time series," NeuroImage, vol. 9, no. 3, pp. 298-310, 1999.

[9] T. M. Martinetz, S. G. Berkovich, and K. J. Schulten, "Neural-gas network for vector quantization and its application to time-series prediction," IEEE Trans. Neural Networks, vol. 4, pp. 560-569, 1993.

[10] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, pp. 59-69, 1982.

[11] K. Rose, E. Gurewitz, and G. Fox, "Vector quantization by deterministic annealing," IEEE Trans. Inf. Theory, vol. 38, pp. 1249-1257, 1992.

[12] N. Lange and S. L. Zeger, "Non-linear Fourier time series analysis for human brain mapping by functional magnetic resonance imaging," J. Roy. Statistical Soc., ser. C, Appl. Stat., vol. 46, no. 1, pp. 1-30, 1997.

[13] C. Goutte, F. A. Nielsen, and L. K. Hansen, "Modeling the haemodynamic response in fMRI using smooth FIR filters," IEEE Trans. Med. Imaging, vol. 19, no. 12, Dec. 2000.

[14] J. C. Bezdek, "Numerical taxonomy with fuzzy sets," Journal of Mathematical Biology, vol. 1, pp. 57-71, 1974.

[15] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 1, pp. 224-227, 1979.

[16] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650-1654, 2002.

[17] N. Sadati, H. Bayati, and H. Davoudi, "An improvement of growing neural gas algorithm with the aim of finding the optimal number of clusters," submitted to Neurocomputing (under revision).

[18] B. Fritzke, "A growing neural gas network learns topologies," in Advances in Neural Information Processing Systems 7, pp. 625-632, MIT Press, Cambridge, MA, 1995.

[19] N. Kasabov, "On-line learning, reasoning, rule extraction and aggregation in locally optimized evolving fuzzy neural networks," Neurocomputing, vol. 41, pp. 25-45, 2001.
