EURASIP Journal on Applied Signal Processing

Trends in Brain Computer Interfaces

Guest Editors: Jean-Marc Vesin and Touradj Ebrahimi



Copyright © 2005 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2005 of "EURASIP Journal on Applied Signal Processing." All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Editor-in-Chief: Marc Moonen, Belgium

Senior Advisory Editor: K. J. Ray Liu, College Park, USA

Associate Editors: Gonzalo Arce, USA; Jaakko Astola, Finland; Kenneth Barner, USA; Mauro Barni, Italy; Jacob Benesty, Canada; Kostas Berberidis, Greece; Helmut Bölcskei, Switzerland; Joe Chen, USA; Chong-Yung Chi, Taiwan; Satya Dharanipragada, USA; Petar M. Djurić, USA; Jean-Luc Dugelay, France; Frank Ehlers, Germany; Moncef Gabbouj, Finland; Sharon Gannot, Israel; Fulvio Gini, Italy; A. Gorokhov, The Netherlands; Peter Handel, Sweden; Ulrich Heute, Germany; John Homer, Australia; Arden Huang, USA; Jiri Jan, Czech Republic; Søren Holdt Jensen, Denmark; Mark Kahrs, USA; Thomas Kaiser, Germany; Moon Gi Kang, Korea; Aggelos Katsaggelos, USA; Walter Kellermann, Germany; Lisimachos P. Kondi, USA; Alex Kot, Singapore; C.-C. Jay Kuo, USA; Geert Leus, The Netherlands; Bernard C. Levy, USA; Mark Liao, Taiwan; Yuan-Pei Lin, Taiwan; Shoji Makino, Japan; Stephen Marshall, UK; C. Mecklenbräuker, Austria; Gloria Menegaz, Italy; Bernie Mulgrew, UK; King N. Ngan, Hong Kong; Douglas O'Shaughnessy, Canada; Antonio Ortega, USA; Montse Pardas, Spain; Wilfried Philips, Belgium; Vincent Poor, USA; Phillip Regalia, France; Markus Rupp, Austria; Hideaki Sakai, Japan; Bill Sandham, UK; Dirk Slock, France; Piet Sommen, The Netherlands; Dimitrios Tzovaras, Greece; Hugo Van hamme, Belgium; Jacques Verly, Belgium; Xiaodong Wang, USA; Douglas Williams, USA; Roger Woods, UK; Jar-Ferr Yang, Taiwan.


Contents

Editorial, Jean-Marc Vesin and Touradj Ebrahimi
Volume 2005 (2005), Issue 19, Pages 3087-3088

Clustering of Dependent Components: A New Paradigm for fMRI Signal Detection, Anke Meyer-Bäse, Monica K. Hurdal, Oliver Lange, and Helge Ritter
Volume 2005 (2005), Issue 19, Pages 3089-3102

Robust EEG Channel Selection across Subjects for Brain-Computer Interfaces, Michael Schröder, Thomas Navin Lal, Thilo Hinterberger, Martin Bogdan, N. Jeremy Hill, Niels Birbaumer, Wolfgang Rosenstiel, and Bernhard Schölkopf
Volume 2005 (2005), Issue 19, Pages 3103-3112

Determining Patterns in Neural Activity for Reaching Movements Using Nonnegative Matrix Factorization, Sung-Phil Kim, Yadunandana N. Rao, Deniz Erdogmus, Justin C. Sanchez, Miguel A. L. Nicolelis, and Jose C. Principe
Volume 2005 (2005), Issue 19, Pages 3113-3121

Finding Significant Correlates of Conscious Activity in Rhythmic EEG, Piotr J. Durka
Volume 2005 (2005), Issue 19, Pages 3122-3127

Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface, David A. Peterson, James N. Knight, Michael J. Kirby, Charles W. Anderson, and Michael H. Thaut
Volume 2005 (2005), Issue 19, Pages 3128-3140

A Time-Frequency Approach to Feature Extraction for a Brain-Computer Interface with a Comparative Analysis of Performance Measures, Damien Coyle, Girijesh Prasad, and T. M. McGinnity
Volume 2005 (2005), Issue 19, Pages 3141-3151

EEG-Based Asynchronous BCI Controls Functional Electrical Stimulation in a Tetraplegic Patient, Gert Pfurtscheller, Gernot R. Müller-Putz, Jörg Pfurtscheller, and Rüdiger Rupp
Volume 2005 (2005), Issue 19, Pages 3152-3155

Steady-State VEP-Based Brain-Computer Interface Control in an Immersive 3D Gaming Environment, E. C. Lalor, S. P. Kelly, C. Finucane, R. Burke, R. Smith, R. B. Reilly, and G. McDarby
Volume 2005 (2005), Issue 19, Pages 3156-3164

Estimating Driving Performance Based on EEG Spectrum Analysis, Chin-Teng Lin, Ruei-Cheng Wu, Tzyy-Ping Jung, Sheng-Fu Liang, and Teng-Yi Huang
Volume 2005 (2005), Issue 19, Pages 3165-3174


EURASIP Journal on Applied Signal Processing 2005:19, 3087–3088
© 2005 Hindawi Publishing Corporation

Editorial

Jean-Marc Vesin
Signal Processing Institute, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland
Email: [email protected]

Touradj Ebrahimi
Signal Processing Institute, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland
Email: [email protected]

Brain-computer interfaces (BCI), an emerging domain in the field of man-machine interaction, have attracted increasing attention in the last few years. Among the reasons for such an interest, one may cite the expansion of the neurosciences, the development of powerful information processing and machine learning techniques, as well as the mere fascination for control of the physical world with human thoughts.

BCI pose significant challenges, at both the biomedical and the data processing levels. Brain processes are not fully understood yet. Also, the information on the dynamics of these processes, up to now gathered mainly with electroencephalographic (EEG) or functional magnetic resonance imaging (fMRI) systems, is incomplete and, more often than not, noisy. It is therefore important for BCI applications to determine how, physically, the maximum amount of information can be extracted, and to design efficient tools both to process the data and to classify the results.

This special issue presents nine papers exhibiting a rather balanced state of research and development in BCI. Three papers deal with information extraction, three with signal processing aspects, and three present applications. Moreover, while most current efforts concentrate on continuous EEG-based techniques, fMRI, implanted microwire electrode, and evoked potential-based techniques are also presented.

In the first batch of three papers on "information extraction," A. Meyer-Bäse et al. study independent component analysis (ICA) and unsupervised clustering techniques and combine them to produce task-related activation maps for fMRI datasets. M. Schröder et al. explore the problem of EEG channel selection for BCI tasks, and S.-P. Kim et al. propose a nonnegative matrix factorization to identify local spatiotemporal patterns of neural activity in microwire electrode signals from monkey motor cortical regions.

The second batch of papers, devoted to "signal processing" aspects of EEG signals, brings new insights to this field by making use of advanced signal processing techniques and by evaluating their performance. The paper by P. J. Durka presents a methodology for the time-frequency analysis of event-related changes in EEG signals. D. A. Peterson et al. investigate the potential of blind source separation (BSS) and support vector machine (SVM)-based classification to discriminate two cognitive tasks. Finally, D. Coyle et al. deal with the extraction of time-frequency features to discriminate two imagined movements.

The last batch concentrates on three exciting applications of BCI. The paper by G. Pfurtscheller et al. describes a BCI approach for the control of a grasping device using functional electrical stimulation by a tetraplegic patient. E. Lalor et al. present a BCI-based 3D video game using steady-state visual evoked potentials, and C.-T. Lin et al. propose an EEG-based car-driver drowsiness estimation device.

We would like to thank the authors of this special issue for their valuable submissions and the reviewers for their high-quality evaluation. We hope the contributions made here will serve to further encourage and stimulate progress in this new and exciting field. Last but not least, we would like to thank the editorial team of EURASIP JASP for their continuous support and patience.

Jean-Marc Vesin
Touradj Ebrahimi

Jean-Marc Vesin graduated from the Ecole Nationale Superieure d'Ingenieurs Electriciens de Grenoble (ENSIEG, Grenoble, France) in 1980. He received his M.S. degree from Laval University, Quebec, Canada, in 1984, where he spent four years on research projects. After two years in industry, he joined the Swiss Federal Institute of Technology, Lausanne, Switzerland, where he obtained his Ph.D. degree in 1992. He is now a Senior Researcher in the Signal Processing Institute of EPFL.



His research work is focused on the analysis of biomedical signals and the computer modeling of biological systems, with an emphasis on cardiovascular and neuronal applications. He is the author of more than 150 journal and conference papers.

Touradj Ebrahimi is currently a Professor at EPFL, involved in research and teaching of multimedia signal processing. He has been the recipient of various distinctions such as the IEEE and Swiss National ASE Award, the SNF-PROFILE grant for advanced researchers, three ISO certificates for key contributions to MPEG-4 and JPEG 2000, and the Best Paper Award of the IEEE Transactions on Consumer Electronics. His research interests include still, moving, and 3D image processing and coding, visual information security (rights protection, watermarking, authentication, data integrity, steganography), new media, and human-computer interfaces (smart vision, brain-computer interface). He is the author or coauthor of more than 150 research publications, and holds 10 patents.


EURASIP Journal on Applied Signal Processing 2005:19, 3089–3102
© 2005 Hindawi Publishing Corporation

Clustering of Dependent Components: A New Paradigm for fMRI Signal Detection

Anke Meyer-Bäse
Department of Electrical and Computer Engineering, Florida State University, Tallahassee, FL 32310-6046, USA
Email: [email protected]

Monica K. Hurdal
Department of Mathematics, Florida State University, Tallahassee, FL 32306-4510, USA
Email: [email protected]

Oliver Lange
Department of Electrical and Computer Engineering, Florida State University, Tallahassee, FL 32310-6046, USA
Email: [email protected]

Helge Ritter
Neuroinformatics Group, Faculty of Technology, University of Bielefeld, 33501 Bielefeld, Germany
Email: [email protected]

Received 1 February 2004

Exploratory data-driven methods such as unsupervised clustering and independent component analysis (ICA) are considered to be hypothesis-generating procedures and are complementary to the hypothesis-led statistical inferential methods in functional magnetic resonance imaging (fMRI). Recently, a new paradigm emerged in ICA, that of finding "clusters" of dependent components. This intriguing idea found its implementation in two new ICA algorithms: tree-dependent and topographic ICA. For fMRI, this represents the unifying paradigm of combining two powerful exploratory data analysis methods, ICA and unsupervised clustering techniques. For the fMRI data, a comparative quantitative evaluation between the two methods, tree-dependent and topographic ICA, was performed. The comparative results were evaluated by (1) task-related activation maps, (2) associated time courses, and (3) an ROC study. The most important findings in this paper are that (1) both tree-dependent and topographic ICA are able to identify signal components with high correlation to the fMRI stimulus, and that (2) topographic ICA outperforms all other ICA methods, including tree-dependent ICA, for 8 and 9 ICs. However, for 16 ICs, topographic ICA is outperformed by tree-dependent ICA (KGV), which uses the kernel generalized variance as an approximation of the mutual information. The applicability of the new algorithms is demonstrated on experimental data.

Keywords and phrases: dependent component analysis, topographic ICA, tree-dependent ICA, fMRI.

1. INTRODUCTION

Functional magnetic resonance imaging with high temporal and spatial resolution represents a powerful technique for visualizing rapid and fine activation patterns of the human brain [1, 2, 3, 4, 5]. As is known from both theoretical estimations and experimental results [4, 6, 7], an activated signal variation appears very low on a clinical scanner. This motivates the application of analysis methods to determine the response waveforms and associated activated regions. Generally, these techniques can be divided into two groups: model-based techniques require prior knowledge about activation patterns, whereas model-free techniques do not. However, model-based analysis methods impose some limitations on data analysis under complicated experimental conditions. Therefore, analysis methods that do not rely on any assumed model of functional response are considered more powerful and relevant. We distinguish two kinds of model-free methods: transformation-based and clustering-based. The first kind, principal component analysis (PCA) [8, 9] or independent component analysis (ICA) [10, 11, 12, 13], transforms original data into a high-dimensional vector space to separate functional response and various noise sources from each other. The second kind, fuzzy clustering analysis [14, 15, 16, 17] or self-organizing maps [17, 18, 19],


attempts to classify time signals of the brain into several patterns according to the temporal similarity among these signals.

Among the data-driven techniques, ICA has been shown to provide a powerful method for the exploratory analysis of fMRI data [11, 13]. ICA is an information-theoretic approach which enables the recovery of underlying signals, or independent components (ICs), from linear data mixtures. It is therefore an excellent method for the spatial localization and temporal characterization of the sources of BOLD activation. ICA can be applied to fMRI both temporally and spatially. Spatial ICA has dominated fMRI applications so far because the spatial dimension is much larger than the temporal dimension in fMRI. However, recent literature results have suggested that temporal and spatial ICA yield similar results for experiments where two predictable task-related components are present.

A new methodology has attracted a lot of attention in the ICA community during the last two years: the idea of finding "clusters" of independent components. Two leading papers implemented this new paradigm in a striking way. Clusters are defined as connected components of a graphical model (a lattice in [20], tree-structured in [21]). Both models attempt a decomposition of the source variables such that they are dependent within a cluster and independent between the clusters. This idea emerged from multidimensional ICA, where the sources are not assumed to be all mutually independent [22]. Instead, it is assumed that they can be grouped in n-tuples, such that within these tuples they are dependent on each other, but are independent outside.

The two paradigms differ in terms of topology and the knowledge of the number and sizes of the components.

In [20], the components are arranged on a two-dimensional grid or lattice, as is typical in topographic models. The goal is to define a statistical model where topographic proximity reflects the statistical dependencies between components. The components (simple cells) are placed on the grid such that any two cells that are close to each other model dependent components, whereas cells that are far from each other model independent components. The measure of dependency is based on the correlation of energies; energy in this context means the squaring operation. Nonlinear correlations are of importance since they cannot be easily set to zero by standard whitening procedures. Translated to our model, this means that energies are strongly positively correlated for neighboring components. The topology of the model is fixed. This model also requires that the number and sizes of the components be fixed in advance. Learning is based on the maximization of the likelihood.

A totally different concept is employed in [21]. Here, the topology of the dependency structure is not fixed in advance. However, it is assumed that it has the structure of a tree. The goal of the learning is to identify a maximum-weight spanning tree connecting the given sources in such a manner that no other tree expresses the dependency structure of the given distribution better. It is interesting to point out that in traditional ICA the graphical model has no edges, meaning that the random variables are mutually independent.

We have seen that both clustering methods and ICA techniques have their particular strengths in fMRI signal detection. Therefore, it is natural to look for a unifying technique that combines those two processing mechanisms and applies this combination to fMRI. The topographic and the tree-dependent ICA, as previously described, have the computational advantages associated with both techniques.

In this paper, we perform a detailed comparative study for fMRI of the tree-dependent and topographic ICA against standard ICA techniques. In a systematic manner, we will compare and evaluate the results obtained with each technique and present the benefits associated with each paradigm.

2. EXPLORATORY DATA ANALYSIS METHODS

Functional organization of the brain is based on two complementary principles, localization and connectionism. Localization means that each visual function is performed mainly by a small set of cortical neurons. Connectionism, on the other hand, expresses that the brain regions involved in a certain visual cortical function are widely distributed, and thus the brain activity necessary to perform a given task may be the functional integration of activity in distinct brain systems. It is important to stress that in neurobiology the term "connectionism" is used in a different sense than in neural network terminology.

The following sections are dedicated to presenting the algorithms and evaluating the discriminatory power of the two main groups of exploratory data analysis methods.

2.1. The basic ICA algorithms

According to the principle of functional organization of the brain, it was suggested for the first time in [11] that the multifocal brain areas activated by the performance of a visual task should be unrelated to the brain areas whose signals are affected by artifacts of physiological nature, head movements, or scanner noise related to fMRI experiments. Every such signal can be described by one or more spatially independent components, each associated with a single voxel time course and a component map. It is assumed that the component maps, each described by a spatial distribution of fixed values, represent overlapping, multifocal brain areas of statistically dependent fMRI signals. This aspect is visualized in Figure 1. In addition, it is considered that the distributions of the component maps are spatially independent and, in this sense, uniquely specified. Mathematically, this means that if p_k(C_k) specifies the probability distribution of the voxel values C_k in the kth component map, then the joint probability distribution of all n components yields

p(C_1, \dots, C_n) = \prod_{k=1}^{n} p_k(C_k),    (1)

where each of the component maps C_k is a vector (C_{ki}, i = 1, 2, \dots, M), with M the number of voxels. Independence is a stronger condition than uncorrelatedness.


Figure 1: Visualization of ICA applied to fMRI data. (a) Scheme of fMRI data decomposed into independent components, and (b) fMRI data as a mixture of independent components, where the mixing matrix A specifies the relative contribution of each component at each time point [11].

It was shown in [11] that these maps are independent if the active voxels in the maps are sparse and mostly nonoverlapping. Additionally, it is assumed that the observed fMRI signals are the superposition of the individual component processes at each voxel. Based on these assumptions, ICA can be applied to fMRI time series to spatially localize and temporally characterize the sources of BOLD activation.

Different methods for performing ICA decompositions have been proposed, employing different objective functions together with different criteria for optimizing these functions, and it is assumed that they can produce different results.

2.2. Models of spatial ICA in fMRI

In the following we will assume that X is a T × M matrix of observed voxel time courses (the fMRI signal data matrix), C is the N × M random matrix of component map values, and A is a T × N mixing matrix containing in its columns the associated time courses of the N components. Furthermore, T corresponds to the number of scans, and M is the number of voxels included in the analysis.

The spatial ICA (sICA) problem is given by the following linear combination model for the data:

X = AC,    (2)

where no assumptions are made about the mixing matrix A, and the rows C_i are required to be mutually statistically independent.

Then the ICA decomposition of X can be defined as an invertible transformation

C = WX,    (3)

where W is an unmixing matrix providing a linear decomposition of the data, and A is the pseudoinverse of W.

The employed ICA algorithms are TDSEP, JADE, the FastICA approach based on the minimization of mutual information but using negentropy as a measure of non-Gaussianity [23], and topographic ICA, which combines topographic mapping with ICA [20].
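As a concrete illustration of (2) and (3), here is a minimal spatial-ICA sketch using scikit-learn's FastICA; the shapes and the random stand-in data are illustrative assumptions, not the actual pipeline used in the paper.

```python
# Minimal sketch of the sICA decomposition X = AC / C = WX
# (illustrative shapes and random stand-in data only).
import numpy as np
from sklearn.decomposition import FastICA

T, M, N = 100, 5000, 8                 # scans, voxels, number of components
rng = np.random.default_rng(0)
X = rng.standard_normal((T, M))        # stand-in for the fMRI data matrix

ica = FastICA(n_components=N, random_state=0)
# Treating voxels as samples makes the recovered sources spatial maps (sICA).
C = ica.fit_transform(X.T).T           # N x M component maps, C = W X
A = ica.mixing_                        # T x N associated time courses

# X is recovered (up to the removed mean) as the rank-N mixture A C, cf. (2).
X_approx = A @ C + ica.mean_[:, None]
```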

2.3. Tree-dependent component analysis model

The paradigm of TCA is derived from the theory of tree-structured graphical models. In [24] a strategy was shown to optimally approximate an n-dimensional discrete probability distribution by a product of second-order distributions, or the distribution of first-order tree dependence. A tree is an undirected graph with at most a single edge between two nodes. This tree concept can be easily interpreted with respect to ICA. A graph with no edges means that the random variables are mutually independent, and this pertains to ICA. On the other hand, if no assumptions are made about independence, then the corresponding family of probability distributions represents the set of all distributions.

A probability distribution can be approximated in several ways. Here, we look into approximations based on a product of n − 1 second-order component distributions; in [24], a strategy was developed for the best approximation of an nth-order distribution by such a product:

P_t(x) = \prod_{i=1}^{n} P(x_{m_i} \mid x_{m_{j(i)}}), \quad 0 \le j(i) < i,    (4)

where P(x) is a joint probability distribution of n discrete variables with x = (x_1, \dots, x_n) being a vector, (m_1, \dots, m_n) is an unknown permutation of the integers 1, 2, \dots, n, and P(x_i \mid x_0) is by definition equal to P(x_i). The probability distribution introduced above is named a probability distribution of first-order tree dependence.


To determine the goodness of an approximation, it is necessary to define a closeness measure

I(P, P_a) = \sum_{x} P(x) \log \frac{P(x)}{P_a(x)},    (5)

where P(x) and P_a(x) are two probability distributions of the n random variables x. The quantity I(P, P_a) has the property I(P, P_a) \ge 0.

Translated to random variables, the above definition is named mutual information and is always nonnegative:

I(x_i, x_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)}.    (6)

In the following, we will state the solution to the approximation of the probability distribution. We are searching for a distribution of tree dependence P_\tau(x_1, \dots, x_n) such that I(P, P_\tau) \le I(P, P_t) for all t \in T_n, where T_n represents the set of all possible first-order dependence trees. Thus, the solution \tau is defined as the optimal first-order dependence tree.

In the parlance of graph theory, every branch of the dependence tree is assigned a branch weight I(x_i, x_{j(i)}). Given a dependence tree t, the sum of all branch weights thus becomes a useful quantity.

In [24] it was shown that a maximum-weight dependence tree is a dependence tree t such that, for all t' in T_n,

\sum_{i=1}^{n} I(x_i, x_{j(i)}) \ge \sum_{i=1}^{n} I(x_i, x_{j'(i)}).    (7)

In other words, a probability distribution of tree dependence P_t(x) is an optimum approximation to P(x) if and only if its dependence tree t has maximum weight: minimizing the closeness measure I(P, P_t) is equivalent to maximizing the total branch weight.
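This construction is straightforward to implement. The sketch below (illustrative data; the histogram plug-in estimator of the mutual information is an assumed choice, not one of the contrasts used later in the paper) computes all pairwise branch weights and extracts the maximum-weight dependence tree by negating the weights for a minimum-spanning-tree routine.

```python
# Chow-Liu-style sketch: pairwise mutual information + maximum-weight tree.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(x, y) from a 2D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
s = rng.standard_normal((4, 1000))
s[1] += 0.8 * s[0]                       # make components 0 and 1 dependent

n = s.shape[0]
w = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        w[i, j] = mutual_information(s[i], s[j])

# scipy provides a *minimum* spanning tree, so negate the branch weights
# to obtain the maximum-weight dependence tree of (7).
tree = minimum_spanning_tree(-w)
edges = np.transpose(np.nonzero(tree.toarray()))
print(edges)   # pairs (i, j) forming the optimal first-order tree
```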

The idea of approximating discrete probability distributions with dependence trees, described above and adapted from [24], can be easily translated to ICA [21].

In classic ICA, we want to minimize the mutual information of the estimated components s = Wx. Thus, the result derived in [24] can be easily extended and becomes tree-dependent ICA.

The objective function for TCA is given by J(x, W, t) and includes the demixing matrix W. Thus, the mutual information for TCA becomes

J(x, W, t) = I_t(s) = I(s_1, \dots, s_m) - \sum_{(u,v) \in t} I(s_u, s_v),    (8)

where s factorizes in a tree t. In TCA as in ICA, the density p(x) is not known, and the estimation criteria have to be substituted by empirical contrast functions. As described in [21], we will employ three types of contrast functions: (i) approximation of the entropies being part of (8) via kernel density estimation (KDE), (ii) approximation of the mutual information based on the kernel generalized variance (KGV), and (iii) approximation based on cumulants using Gram-Charlier expansions (CUM).

2.4. Topographical independent component analysis

Topographic independent component analysis [20] represents a unifying model which combines topographic mapping with ICA.

Achieved by a slight modification of the ICA model, it can at the same time be used to define a topographic order between the components and thus has the usual computational advantages associated with topographic maps.

The paradigm of topographic ICA has its roots in [25], where a combination of invariant feature subspaces [26] and independent subspaces [22] is proposed. In the following, we describe these two parts, which substantially reflect the concept of topographic ICA [27].

2.4.1. Invariant feature subspaces

The principle of invariant feature subspaces was developed by Kohonen [26] with the intention of representing features with some invariances. This principle states that an invariant feature is given by a linear subspace in a feature space. The value of the invariant feature is given by the squared norm of the projection of the given data point onto that subspace.

A feature subspace can be described by a set of orthogonal basis vectors w_j, j = 1, \dots, n, where n is the dimension of the subspace. Then the value G(x) of the feature G for the input vector x is given by

G(x) = \sum_{j=1}^{n} \langle w_j, x \rangle^2.    (9)

In other words, G(x) is the squared norm of the projection of x onto the feature subspace, and hence measures how closely the input vector x matches a general linear combination of the basis vectors w_j of the feature subspace [26].
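A quick numeric check of (9), assuming an arbitrary 10-dimensional feature space and a 3-dimensional subspace:

```python
# The invariant feature value equals the squared norm of the projection
# of x onto the subspace spanned by orthonormal basis vectors w_j.
import numpy as np

rng = np.random.default_rng(2)
W, _ = np.linalg.qr(rng.standard_normal((10, 3)))  # 3 orthonormal basis vectors
x = rng.standard_normal(10)

G = np.sum((W.T @ x) ** 2)          # sum_j <w_j, x>^2, eq. (9)
proj = W @ (W.T @ x)                # projection of x onto the subspace
assert np.isclose(G, proj @ proj)   # equals the squared projection norm
```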

2.4.2. Independent subspaces

Traditional ICA works under the assumption that the observed signals x_i(t) (i = 1, \dots, n) are generated by a linear weighting of a set of n statistically independent random sources s_j(t) with time-independent coefficients a_{ij}. In matrix form, this can be expressed as

x(t) = As(t), (10)

where x(t) = [x_1(t), \dots, x_n(t)]^T, s(t) = [s_1(t), \dots, s_n(t)]^T, and A = [a_{ij}].

In multidimensional ICA [22], the sources s_i are not assumed to be all mutually independent. Instead, it is assumed that they can be grouped in n-tuples, such that within these tuples they are dependent on each other, but are independent outside. This newly introduced assumption was observed to hold in several image processing applications. Each n-tuple of sources s_i corresponds to n basis vectors given by the rows


of matrix A. A subspace spanned by a set of n such basis vectors is defined as an independent subspace. In [22] two simplifying assumptions are made: (1) although the s_i are not at all independent, they are chosen to be uncorrelated and of unit variance, and (2) the data are preprocessed by whitening (sphering). This means the w_j are orthonormal.
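Whitening is a standard preprocessing step; a minimal sketch on synthetic data:

```python
# Minimal whitening (sphering) sketch: after the transform the data have
# identity covariance, so the subsequent w_j can be taken orthonormal.
import numpy as np

rng = np.random.default_rng(3)
mix = rng.standard_normal((5, 5))
x = mix @ rng.standard_normal((5, 2000))        # correlated zero-mean data

d, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(d ** -0.5) @ E.T                # whitening matrix C^(-1/2)
z = V @ (x - x.mean(axis=1, keepdims=True))

assert np.allclose(np.cov(z), np.eye(5), atol=1e-8)
```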

Let J be the number of independent feature subspaces and S_j, j = 1, \dots, J, the set of indices that belong to the subspace of index j. Assume that we have T given observations x(t), t = 1, \dots, T. Then the likelihood L of the data based on the model is given by

L(w_i, i = 1, \dots, n) = \prod_{t=1}^{T} \Big[ |\det W| \prod_{j=1}^{J} p_j(\langle w_i, x(t) \rangle, i \in S_j) \Big],    (11)

with p_j(\cdot) being the probability density inside the jth n-tuple of s_i. The factor |\det W| is due to the linear transformation of the pdf. As always with ICA, the p_j(\cdot) need not be known in advance.

2.4.3. Fusion of invariant feature and independent subspaces

In [25] it is shown that a fusion between the concepts of invariant and independent subspaces can be achieved by considering probability distributions for the n-tuples of s_i that are spherically symmetric, that is, depend only on the norm. In other words, the pdf p_j(\cdot) has to be expressed as a function of the sum of the squares of the s_i, i \in S_j, only. Additionally, it is assumed that the pdfs are equal for all subspaces.

The log likelihood of this new data model is given by

\log L(w_i, i = 1, \dots, n) = \sum_{t=1}^{T} \sum_{j=1}^{J} \log p\Big( \sum_{i \in S_j} \langle w_i, x(t) \rangle^2 \Big) + T \log |\det W|,    (12)

where p(\sum_{i \in S_j} s_i^2) = p_j(s_i, i \in S_j) gives the pdf inside the jth n-tuple of s_i. Based on the prewhitening, we have \log |\det W| = 0.

For computational simplification, set

G\Big( \sum_{i \in S_j} s_i^2 \Big) = \log p\Big( \sum_{i \in S_j} \langle w_i, x(t) \rangle^2 \Big).    (13)

Since it is known that the projection of visual data on any subspace has a super-Gaussian distribution, the pdf has to be chosen to be sparse. Thus, we choose G(u) = \alpha \sqrt{u} + \beta, yielding a multidimensional version of an exponential distribution; \alpha and \beta are constants which ensure that s_i is of unit variance.


Figure 2: Topographic ICA model [20]. The variance-generating variables u_i are randomly generated and mixed linearly within their topographic neighborhoods. This forms the input to the nonlinearity \phi, thus giving the local variance \sigma_i. Components s_i are generated with variances \sigma_i. The observed variables x_i are obtained as in standard ICA from the linear mixture of the components s_i.

2.4.4. The topographic ICA architecture

Based on the concepts introduced in the preceding subsections, this section describes topographic ICA.

To introduce a topographic representation in the ICA model, it is necessary to relax the assumption of independence among neighboring components s_i. This makes it necessary to adopt an idea from self-organized neural networks, that of a lattice. It was shown in [20] that a representation which models topographic correlation of energies is an adequate approach for introducing dependencies between neighboring components.

In other words, the variances corresponding to neighboring components are positively correlated while the other variances are, in a broad sense, independent. The architecture of this new approach is shown in Figure 2.

This idea leads to the following representation of the source signals:

s_i = \sigma_i z_i,    (14)

where z_i is a random variable having the same distribution as s_i would have if the variance \sigma_i were fixed to unity.

The variance σi is further modeled by a nonlinearity:

\sigma_i = \phi\Big( \sum_{k=1}^{n} h(i, k) u_k \Big),    (15)

where the u_i are the higher-order independent components used to generate the variances, while \phi describes some nonlinearity. The neighborhood function h(i, k) can either describe a two-dimensional grid or have a ring-like structure. Furthermore, the u_i and z_i are all mutually independent.
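The generative model in (14)-(15) is easy to simulate; the sketch below uses a ring-shaped neighborhood and \phi(u) = \sqrt{u}, both illustrative choices, and shows that neighboring components end up with positively correlated energies:

```python
# Generative sketch of the topographic ICA model (14)-(15).
import numpy as np

rng = np.random.default_rng(4)
n, T = 16, 5000

# Ring neighborhood: h(i, k) = 1 if components i and k are within distance 1.
idx = np.arange(n)
dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                  n - np.abs(idx[:, None] - idx[None, :]))
h = (dist <= 1).astype(float)

u = rng.standard_normal((n, T)) ** 2      # nonnegative variance generators u_i
sigma = np.sqrt(h @ u)                    # phi(.) chosen as the square root
z = rng.standard_normal((n, T))
s = sigma * z                             # s_i = sigma_i * z_i, eq. (14)

A = rng.standard_normal((n, n))           # mixing matrix
x = A @ s                                 # observed signals, as in (10)

# Nearby components now have positively correlated energies:
energy = s ** 2
print(np.corrcoef(energy[0], energy[1])[0, 1])   # neighbors: clearly positive
print(np.corrcoef(energy[0], energy[8])[0, 1])   # far apart: near zero
```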

The learning rule is based on the maximization of the likelihood. First, it is assumed that the data are preprocessed by whitening and that the estimates of the components are uncorrelated. The log likelihood is given by

\log L(w_i, i = 1, \dots, n) = \sum_{t=1}^{T} \sum_{j=1}^{n} G\Big( \sum_{i=1}^{n} h(i, j) (w_i^T x(t))^2 \Big) + T \log |\det W|.    (16)



Figure 3: Results of the comparison between tree-dependent ICA, topographic ICA, JADE, FastICA, TDSEP, and PCA on fMRI data. Spatial accuracy of the ICA maps is assessed by ROC analysis using a correlation map with a chosen threshold of 0.4. The number of chosen independent components (ICs) for all techniques is N = 8 in (a), N = 9 in (b), and N = 16 in (c).

The update rule for the weight vector w_i is derived from a gradient algorithm based on the log likelihood, assuming \log |\det W| = 0:

\Delta w_i \propto E\big\{ x (w_i^T x) r_i \big\},    (17)

where

r_i = \sum_{k=1}^{n} h(i, k)\, g\Big( \sum_{j=1}^{n} h(k, j) (w_j^T x)^2 \Big).    (18)

The function g is the derivative of G(u) = -\alpha_1 \sqrt{u} + \beta_1. After every iteration, the vectors w_i in (17) are normalized to unit variance and orthogonalized. Equation (17) represents a modulated learning rule, where the learning term is modulated by the term r_i.

Classic ICA results from topographic ICA by setting h(i, j) = \delta_{ij}.
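Put together, one batch iteration of the learning rule (17)-(18) can be sketched as follows (random stand-in data; the learning rate, the nonlinearity constants, and the ring neighborhood are illustrative choices):

```python
# One batch update of the modulated learning rule (17)-(18) on whitened data.
import numpy as np

def sym_orth(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def topo_ica_step(W, X, h, lr=0.1):
    """W: n x n unmixing matrix (rows w_i), X: n x T whitened data."""
    g = lambda u: -0.5 / np.sqrt(u + 1e-5)   # derivative of G(u) = -sqrt(u + eps)
    Y = W @ X                                # y_i(t) = w_i^T x(t)
    R = h @ g(h @ Y ** 2)                    # r_i(t), eq. (18)
    W = W + lr * (Y * R) @ X.T / X.shape[1]  # batch average of eq. (17)
    return sym_orth(W)                       # keep the w_i normalized/orthogonal

# Illustrative run on random stand-in "whitened" data, ring neighborhood.
rng = np.random.default_rng(5)
n, T = 16, 2000
idx = np.arange(n)
dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                  n - np.abs(idx[:, None] - idx[None, :]))
h = (dist <= 1).astype(float)
W = sym_orth(rng.standard_normal((n, n)))
X = rng.standard_normal((n, T))
for _ in range(100):
    W = topo_ica_step(W, X, h)
```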

3. RESULTS AND DISCUSSION

fMRI data were recorded from six subjects (3 female, 3 male, age 20–37) performing a visual task. In five subjects, five slices with 100 images (TR/TE = 3000/60 ms) were acquired, alternating five periods of rest and five periods of photic stimulation. Stimulation and rest periods comprised 10 repetitions each, that is, 30 seconds. Resolution was 3 × 3 × 4 mm. The slices were oriented parallel to the calcarine fissure. Photic stimulation was performed using an 8 Hz alternating checkerboard stimulus with a central fixation point; during the control periods, only the central fixation point on a dark background was shown [17]. The first scans were discarded because of remaining saturation effects. Motion artifacts were compensated by automatic image alignment (AIR, [28]).

The clustering results were evaluated by (1) task-related activation maps, (2) associated time courses, and (3) ROC curves.

3.1. Estimation of the ICA model

To decide to what extent spatial ICA of fMRI time series depends on the employed algorithm, we first have to look at the optimal number of principal components selected by PCA and used in the ICA decomposition. ICA is a generalization of PCA. In case no ICA is performed, the number of independent components equals zero, and this means that no PCA decomposition is performed.

In the following we give the parameters that were set. For PCA, no parameters had to be set. For FastICA we chose



Figure 4: Cluster assignment maps for cluster analysis based on the tree-dependent ICA (CUM) of a visual stimulation fMRI experiment, obtained for 16 ICs.

(1) \epsilon = 10^{-6}, (2) 10^5 as the maximal number of iterations, and (3) the nonlinearity g(u) = \tanh u. Finally, for topographic ICA we set the following: (1) the stopping criterion is fulfilled if the difference in synaptic weights between two consecutive iterations is less than 10^{-5} \times the number of ICs, (2) the function g(u) = u, and (3) 10^4 as the maximal number of iterations.

It is significant to find a fixed number of ICs that can theoretically predict new observations under the same conditions, assuming the basic ICA model actually holds. To do so, we compared the six proposed algorithms for 8, 9, and 16 components in terms of an ROC analysis using a correlation map with a chosen threshold of 0.4. The obtained results are plotted in Figure 3. It can be seen that topographic ICA outperforms all other ICA methods for 8 and 9 ICs. However, for 16 ICs topographic ICA is outperformed by tree-dependent ICA (KGV), which uses the kernel generalized variance as an approximation of the mutual information.
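The ROC evaluation used here can be sketched as follows (synthetic data; the exact labeling protocol of the study may differ): voxels whose time course correlates with the stimulus reference above the 0.4 threshold are treated as truly active, and component map magnitudes are scored against that label.

```python
# Sketch of ROC scoring of an IC map against a thresholded correlation map.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
T, M = 100, 2000
ref = np.sin(2 * np.pi * np.arange(T) / 20)      # stand-in stimulus reference
X = rng.standard_normal((T, M))
X[:, :200] += 0.8 * ref[:, None]                 # a block of activated voxels

# Correlation map: Pearson correlation of each voxel time course with ref.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
r = (ref - ref.mean()) / ref.std()
corr = (Xc * r[:, None]).mean(axis=0)
label = np.abs(corr) > 0.4                       # "truly active", threshold 0.4

ic_map = np.abs(corr + 0.1 * rng.standard_normal(M))  # stand-in IC map values
print(roc_auc_score(label, ic_map))              # ROC area, cf. Figure 3
```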

The clustering results for the two methods, tree-dependent ICA (CUM and KGV) and topographic ICA, are shown in Figures 4–9.


[Figure 5: sixteen codebook-vector plots, panels (a)–(p), with correlation coefficients cc = 0.08, −0.05, 0.19, 0.02, −0.08, −0.03, 0.04, −0.05, 0.21, 0.07, −0.05, 0.20, −0.23, 0.00, −0.09, −0.92.]

Figure 5: Associated codebook vectors for the tree-dependent ICA (CUM) as shown in Figure 4. Assignment of the codebook vectors corresponds to the order of the assignment maps shown in Figure 4.

Figures 4, 6, and 8 illustrate the so-called assignment maps, where all the pixels belonging to a specific cluster are highlighted. The assignment between a pixel and a specific cluster is given by the minimum distance between the pixel and an IC from the established codebook. On the other hand, each IC shown in Figures 5, 7, and 9 can be viewed as the cluster-specific weighted average of all pixel time courses.
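This assignment step amounts to a nearest-codebook-vector search; a minimal sketch with hypothetical shapes:

```python
# Build assignment maps: each voxel time course is assigned to the closest
# codebook vector, and assignments are reshaped into per-cluster maps.
import numpy as np

rng = np.random.default_rng(7)
T, side, N = 100, 32, 16
voxels = rng.standard_normal((side * side, T))   # voxel time courses
codebook = rng.standard_normal((N, T))           # IC-derived codebook vectors

# Euclidean distance from every voxel to every codebook vector.
d = np.linalg.norm(voxels[:, None, :] - codebook[None, :, :], axis=2)
assign = d.argmin(axis=1)                        # winning cluster per voxel

# One binary assignment map per cluster, as in Figures 4, 6, and 8.
maps = [(assign == k).reshape(side, side) for k in range(N)]
```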

We can immediately see a topographic representation in Figure 9 by looking at the last row (ICs 15 and 16): the two time courses with the highest absolute-valued correlation are grouped together. Thus, the advantage of tree-dependent ICA (KGV) becomes immediately evident: it groups signals together according to their dependence content. This effect can be observed neither with topographic nor with tree-dependent ICA (CUM).

3.2. Characterization of task-related effects

For all subjects and runs, unique task-related activation maps and associated time courses were obtained by the



Figure 6: Cluster assignment maps for cluster analysis based on the topographic ICA of a visual stimulation fMRI experiment, obtained for 16 ICs.

tree-dependent and topographic ICA techniques. The correlation of the component time course most closely associated with the visual task is shown for these two techniques in Table 1 for IC = 8, 9, and 16. This time course can serve as an estimate of the stimulus reference function used in the fMRI experiment, as identified by the specific dependent component technique. From Table 1, we see for tree-dependent ICA (CUM) a continuous increase of the correlation coefficient, while for topographic ICA this correlation coefficient decreases for IC = 16, and for tree-dependent ICA (KGV) it decreases already for IC = 9.

3.3. Exploratory analysis of ancillary findings

From Figures 4–9, we can also obtain some insight into the type of artifactual components. Cluster 12 in Figure 4 and cluster 16 in Figure 6 may be assigned to a coactivation of the frontal eye fields induced by stimulus onset. No such findings can be reported from Figure 8. In Figure 4, there may be some physiological relatedness between cluster 12 on the one hand and cluster 16, which shows high correlation with the stimulus function, on the other. The same holds for cluster 16 and cluster 8 in Figure 6. Interestingly, Figure 8 reveals two


[Figure 7: sixteen codebook-vector plots, panels (a)–(p), with correlation coefficients cc = −0.02, −0.10, 0.07, −0.02, −0.05, 0.06, −0.09, −0.086, 0.14, −0.05, 0.12, 0.12, −0.11, −0.26, 0.01, 0.18.]

Figure 7: Associated codebook vectors for the topographic ICA as shown in Figure 6. Assignment of the codebook vectors corresponds to the order of the assignment maps shown in Figure 6.

ICs showing a high correlation with the stimulus function. However, this connection is not revealed by the feature-space metric and thus is not supported by clustering approaches based on this metric.

An additional benefit of unsupervised clustering techniques is the ability to identify data highly indicative of artifacts, for example, ventricular pulsation or through-plane motion. Cluster 6 in Figure 4 and cluster 3 in Figure 6, for example, show the region of the inner ventricles. It is important to mention that these effects could not have been detected by model-based approaches.

4. CONCLUSION

In the present paper, we have experimentally compared four standard ICA algorithms already adopted in the fMRI literature with two new algorithms, tree-dependent and topographic ICA. The goal of the paper was to determine the robustness and reliability of extracting task-related activation maps and time courses from fMRI data sets. The success of ICA methods is based on the condition that the spatial distribution of brain areas activated by task performance must be spatially independent of the distributions of areas affected



Figure 8: Cluster assignment maps for cluster analysis based on the tree-dependent ICA (KGV) of a visual stimulation fMRI experiment, obtained for 16 ICs.

by artifacts. The obtained results revealed the structure of the data set extremely well.

It can be seen that topographic ICA outperforms all other ICA methods for 8 and 9 ICs. However, for 16 ICs topographic ICA is outperformed by tree-dependent ICA (KGV), which uses the kernel generalized variance as an approximation of the mutual information. All dependent component techniques can be employed to identify interesting ancillary findings that cannot be detected by model-based approaches. The applicability of the new algorithms is demonstrated on experimental data. We conjecture that the method can serve as a multipurpose exploratory data analysis strategy for image time-series analysis and provide good visualization for many fields ranging from basic biomedical research to the clinical assessment of patient data. In particular, beyond the application to fMRI data analysis discussed in this paper, the method exhibits a specific potential to serve in applications referring to dynamic contrast-enhanced perfusion MRI for the diagnosis of cerebrovascular disease, or magnetic resonance mammography for the analysis of suspicious lesions in patients with breast cancer. In addition, it could yield a visualization of large trees in a hyperbolic space by employing a hyperbolic self-organized map [29].


[Figure 9: sixteen codebook-vector plots, panels (a)–(p), with correlation coefficients cc = −0.16, 0.08, 0.31, −0.11, −0.00, 0.02, 0.00, −0.04, 0.22, 0.15, 0.01, 0.04, 0.24, 0.19, 0.82, 0.66.]

Figure 9: Associated codebook vectors for the tree-dependent ICA (KGV) as shown in Figure 8. Assignment of the codebook vectors corresponds to the order of the assignment maps shown in Figure 8.

Table 1: Comparison of the correlations of the component time course most closely associated with the visual task for tree-dependent (tree ICA) and topographic ICA (topo ICA) for IC = 8, 9, and 16.

No. of ICs   Tree ICA (KDE)   Tree ICA (KGV)   Tree ICA (CUM)   Topo ICA
IC = 8       0.78             0.74             0.78             0.85
IC = 9       0.79             0.66             0.91             0.87
IC = 16      —                0.82             0.92             0.86


ACKNOWLEDGMENTS

The authors would like to thank Dr. Dorothee Auer from the Max Planck Institute of Psychiatry in Munich, Germany, for providing the fMRI data. We are grateful for the financial support of the Humboldt Foundation.

REFERENCES

[1] P. A. Bandettini, E. C. Wong, R. S. Hinks, R. S. Tikofsky, and J. S. Hyde, "Time course EPI of human brain function during task activation," Magnetic Resonance in Medicine, vol. 25, no. 2, pp. 390–397, 1992.

[2] J. Frahm, K. D. Merboldt, and W. Hanicke, "Functional MRI of human brain activation at high spatial resolution," Magnetic Resonance in Medicine, vol. 29, no. 1, pp. 139–144, 1993.

[3] K. Kwong, "Functional magnetic-resonance-imaging with echo-planar imaging," Magnetic Resonance Quarterly, vol. 11, no. 1, pp. 1–20, 1995.

[4] K. Kwong, J. Belliveau, D. Chesler, et al., "Dynamic magnetic resonance imaging of human brain activity during primary sensor stimulation," Proceedings of the National Academy of Sciences, vol. 89, no. 12, pp. 5675–5679, 1992.

[5] S. Ogawa, D. Tank, R. Menon, et al., "Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging," Proceedings of the National Academy of Sciences, vol. 89, no. 13, pp. 5951–5955, 1992.

[6] J. Boxerman, P. A. Bandettini, K. Kwong, et al., "The intravascular contribution to fMRI signal change: Monte Carlo modeling and diffusion-weighted studies in vivo," Magnetic Resonance in Medicine, vol. 34, no. 1, pp. 4–10, 1995.

[7] S. Ogawa, T. Lee, and B. Barrere, "The sensitivity of magnetic resonance image signals of a rat brain to changes in the cerebral venous blood oxygenation activation," Magnetic Resonance in Medicine, vol. 29, no. 2, pp. 205–210, 1993.

[8] J. J. Sychra, P. A. Bandettini, N. Bhattacharya, and Q. Lin, "Synthetic images by subspace transforms I. Principal components images and related filters," Medical Physics, vol. 21, no. 2, pp. 193–201, 1994.

[9] W. Backfrieder, R. Baumgartner, M. Samal, E. Moser, and H. Bergmann, "Quantification of intensity variations in functional MR images using rotated principal components," Physics in Medicine and Biology, vol. 41, no. 8, pp. 1425–1438, 1996.

[10] M. J. McKeown, T.-P. Jung, S. Makeig, et al., "Spatially independent activity patterns in functional MRI data during the Stroop color-naming task," Proceedings of the National Academy of Sciences, vol. 95, no. 3, pp. 803–810, 1998.

[11] M. J. McKeown, S. Makeig, G. G. Brown, et al., "Analysis of fMRI data by blind separation into independent spatial components," Human Brain Mapping, vol. 6, no. 3, pp. 160–188, 1998.

[12] F. Esposito, E. Formisano, E. Seifritz, et al., "Spatial independent component analysis of functional MRI time-series: To what extent do results depend on the algorithm used?" Human Brain Mapping, vol. 16, no. 3, pp. 146–157, 2002.

[13] K. Arfanakis, D. Cordes, V. M. Haughton, C. H. Moritz, M. A. Quigley, and M. E. Meyerand, "Combining independent component analysis and correlation analysis to probe interregional connectivity in fMRI task activation datasets," Magnetic Resonance Imaging, vol. 18, no. 8, pp. 921–930, 2000.

[14] G. Scarth, M. McIntyre, B. Wowk, and R. Somorjai, "Detection of novelty in functional images using fuzzy clustering," in Proc. 3rd Scientific Meeting of the International Society for Magnetic Resonance in Medicine, vol. 95, pp. 238–242, Nice, France, August 1995.

[15] K.-H. Chuang, M.-J. Chiu, C.-C. Lin, and J.-H. Chen, "Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy C-means," IEEE Trans. Med. Imag., vol. 18, no. 12, pp. 1117–1128, 1999.

[16] R. Baumgartner, L. Ryner, W. Richter, R. Summers, M. Jarmasz, and R. Somorjai, "Comparison of two exploratory data analysis methods for fMRI: fuzzy clustering vs. principal component analysis," Magnetic Resonance Imaging, vol. 18, no. 1, pp. 89–94, 2000.

[17] A. Wismüller, O. Lange, D. R. Dersch, et al., "Cluster analysis of biomedical image time-series," International Journal of Computer Vision, vol. 46, no. 2, pp. 103–128, 2002.

[18] H. Fischer and J. Hennig, "Clustering of functional MR data," in Proc. 4th Annual Meeting of the International Society for Magnetic Resonance in Medicine (ISMRM '96), pp. 1179–1183, New York, NY, USA, April 1996.

[19] S. C. Ngan and X. Hu, "Analysis of functional magnetic resonance imaging data using self-organizing mapping with spatial connectivity," Magnetic Resonance in Medicine, vol. 41, no. 5, pp. 939–946, 1999.

[20] A. Hyvärinen, P. Hoyer, and M. Inki, "Topographic independent component analysis," Neural Computation, vol. 13, no. 7, pp. 1527–1558, 2001.

[21] F. R. Bach and M. I. Jordan, "Beyond independent components: trees and clusters," Journal of Machine Learning Research, vol. 4, pp. 1205–1233, December 2003.

[22] J.-F. Cardoso, "Multidimensional independent component analysis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), vol. 4, pp. 1941–1944, Seattle, Wash, USA, May 1998.

[23] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.

[24] C. K. Chow and C. N. Liu, "Approximating discrete probability distributions with dependence trees," IEEE Trans. Inform. Theory, vol. 14, no. 3, pp. 462–467, 1968.

[25] A. Hyvärinen and P. Hoyer, "Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces," Neural Computation, vol. 12, no. 7, pp. 1705–1720, 2000.

[26] T. Kohonen, "Emergence of invariant-feature detectors in the adaptive-subspace self-organizing map," Biological Cybernetics, vol. 75, no. 4, pp. 281–291, 1996.

[27] A. Meyer-Bäse, Pattern Recognition for Medical Imaging, Academic Press, Boston, Mass, USA, 2003.

[28] R. P. Woods, S. R. Cherry, and J. C. Mazziotta, "Rapid automated algorithm for aligning and reslicing PET images," Journal of Computer Assisted Tomography, vol. 16, no. 4, pp. 620–633, 1992.

[29] H. Ritter, "Self-organizing maps on non-Euclidean spaces," in Kohonen Maps, pp. 97–108, Springer, Berlin, Germany, 1999.

Anke Meyer-Bäse is with the Department of Electrical and Computer Engineering at Florida State University. Her research areas include the theory and application of neural networks, medical image processing, pattern recognition, and parallel processing. She was awarded the Lise Meitner Prize in 1997. She has published over 100 papers in several areas including intelligent systems, medical image processing, speech recognition, and neural networks. She is the author of the book Pattern Recognition for Medical Imaging, which appeared with Elsevier/Academic Press in 2003.


Monica K. Hurdal is an Assistant Professor of Biomedical Mathematics at Florida State University in Tallahassee, Florida. She was awarded her Ph.D. degree in 1999 from Queensland University of Technology, Australia, in applied mathematics. Subsequently, Dr. Hurdal was a Postdoctoral Research Associate for two years at Florida State University (FSU) in mathematics and also computer science, working on conformal flat mapping of the human brain. She continued her research at Johns Hopkins University in the Center for Imaging Science as a Research Scientist, followed by her current position in 2001 in Biomedical Mathematics at FSU. Her research interests include applying topology, geometry, and conformal methods to the analysis and modeling of neuroscientific data from the human brain. She is investigating topology issues associated with constructing cortical surfaces from MRI data, computing conformal maps of the brain, and applying topological and conformal invariants to characterize disease in MRI studies.

Oliver Lange studied information technologies engineering at the TU Munich. After finishing his diploma in 1999, he was a Ph.D. student in biomedical engineering at the Institute of Clinical Radiology at the University of Munich. When he finished his Ph.D. in 2003, Oliver Lange was a Consultant for the Department of Engineering at Florida State University. Since July 2004, he has been working as a Research Engineer in the field of biomedical signal processing.

Helge Ritter studied physics and mathematics at the Universities of Bayreuth, Heidelberg, and Munich. After a Ph.D. degree in physics at the Technical University of Munich in 1988, he visited the Laboratory of Computer Science at Helsinki University of Technology and the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Since 1990 he has been the Head of the Neuroinformatics Group at the Faculty of Technology, Bielefeld University. His main interests are principles of neural computation and their application to building intelligent systems. In 1999, Helge Ritter was awarded the SEL Alcatel Research Prize and in 2001 the Leibniz Prize of the German Research Foundation (DFG).


EURASIP Journal on Applied Signal Processing 2005:19, 3103–3112
© 2005 Hindawi Publishing Corporation

Robust EEG Channel Selection across Subjects for Brain-Computer Interfaces

Michael Schröder,¹ Thomas Navin Lal,² Thilo Hinterberger,³ Martin Bogdan,¹ N. Jeremy Hill,² Niels Birbaumer,³ Wolfgang Rosenstiel,¹ and Bernhard Schölkopf²

¹ Department of Computer Engineering, Eberhard-Karls University Tübingen, Sand 13, 72076 Tübingen, Germany. Emails: [email protected], [email protected], [email protected]

² Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany. Emails: [email protected], [email protected], [email protected]

³ Institute of Medical Psychology and Behavioral Neurobiology, Eberhard-Karls University Tübingen, Gartenstrasse 29, 72074 Tübingen, Germany. Emails: [email protected], [email protected]

Received 11 February 2004; Revised 22 September 2004

Most EEG-based brain-computer interface (BCI) paradigms come along with specific electrode positions; for example, for a visual-based BCI, electrode positions close to the primary visual cortex are used. For new BCI paradigms it is usually not known where task-relevant activity can be measured from the scalp. For individual subjects, Lal et al. in 2004 showed that recording positions can be found without the use of prior knowledge about the paradigm used. However, it remains unclear to what extent their method of recursive channel elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were consistently selected, whereas task-irrelevant channels were reliably disregarded.

Keywords and phrases: brain-computer interface, channel selection, feature selection, recursive channel elimination, supportvector machine, electroencephalography.

1. INTRODUCTION

Brain-computer interface (BCI) systems are designed to distinguish two or more mental states during the performance of mental tasks (e.g., motor imagery tasks). Many BCI systems for humans try to classify those states on the basis of electroencephalographic (EEG) signals using machine learning algorithms.

The input for classification methods is a set of training examples. In the case of BCI, one example might consist of EEG data (possibly containing several channels) of one trial and a label marking the class of the trial. Classification methods pursue the objective to find structure in the data and as a result provide a mapping from EEG data to mental states.

For some tasks the relevant EEG recording positions that lead to good classification results are known, especially when the tasks involve motor imagery (e.g., the imagination of limb movements) or the overall activity of large parts of the cortex (so-called slow cortical potentials, SCP) that occurs during intentions or states of preparation and relaxation.

For the development of new paradigms the neural correlates might not be known in detail, and finding optimal recording positions for use in BCIs is challenging. Such new paradigms can become necessary when motor cortex areas show lesions, when the information rate of BCI systems is to be increased, or for robust multiclass BCIs.

Algorithms for channel selection (CS) can identify suitable recording sites for individual subjects even in the absence of prior knowledge about the mental task. In this case it is possible to reduce the number of EEG electrodes necessary for the classification of brain signals without losing substantial classification performance.

In addition the CS results1 can help to understand which part of the brain generates the class-relevant activity and even simplifies the detection of artifact channels.2

1If an ordered list of channels is given by the CS algorithm that represents the importance of each channel for classification, this result is also called a ranking.

[Figure: two plots of test error against the best n remaining channels; left: subjects A to E, right: Average RFE and Average motor 17.]

Figure 1: Test error of the channel selection method RCE for five subjects (A to E) on 39 EEG channels. The left graph shows the development of the test error against the best n remaining channels determined by RCE. For some subjects, the test error can be decreased by selecting fewer than 39 channels. The right graph shows the test error of RCE averaged over the five subjects. On average, good performance can be obtained with fewer than 10 channels. The average test error for a set of 17 EEG channels over or close to the motor cortex is added as a baseline for comparison.

In [2], different channel selection algorithms have been compared for a motor imagery task. Figure 1 shows an example of the change in classification error that is observed when applying the winning method, recursive channel elimination (RCE), to the data of five individuals.

If data from several subjects are available, the questions arise whether a set of channels selected for one subject is also useful for other subjects, and whether generalized conclusions can be drawn about channels relevant for the classification of a certain mental task across subjects.

The paper is organized as follows. Section 2 contains the experimental setup, a description of the mental task, and the basic data preprocessing. In Section 3 the channel selection method and the classification algorithm are described. Results of cross-subject channel selection compared to average individual channel selection are given in Section 4, while the final section concludes.

2. DATA ACQUISITION

2.1. Experimental setup and mental task

We recorded EEG signals from eight untrained right-handed male subjects using 39 silver chloride electrodes (see Figure 2). The reference electrodes were positioned at TP9 and TP10. The electrode Fp2 and one electrode 1 cm lateral of the right eye (EOG) were used to record possible EOG artifacts and eye blinks, while two frontotemporal and two occipital electrodes were positioned to detect possible muscle activity during the experiment. Before sampling the data at 256 Hz, an analog bandpass filter with cutoff frequencies 0.1 Hz and 40 Hz was applied.

2Some subjects unintentionally use muscle activity that influences the recorded signals when trained in a BCI system, especially if feedback is provided.

The subjects were seated in an armchair at 1 m distance in front of a computer screen. Following the experimental setup of [3], the subjects were asked to imagine left versus right hand movements during each trial. With every subject, we recorded 400 trials during one single session. The total length of each trial was 9 seconds. Additional intertrial intervals for relaxation varied randomly between 2 and 4 seconds. No outlier detection was performed and no trials were removed during the data processing at any stage.

Each trial started with a blank screen. A small fixation cross was displayed in the center of the screen from second 2 to 9. A cue in the form of a small arrow pointing to the right or left side was visible for half a second starting with second 3. In order to avoid event-related signals in later processing stages, only data from seconds 4 to 9 of each trial were considered for further analysis. Feedback was not provided at any time.

2.2. Preanalysis

As Pfurtscheller and Lopes da Silva have reported [4], movement-related desynchronization of the µ-rhythm (8–12 Hz) is not equally strong in all subjects and might even fail for some subjects due to various reasons (e.g., because of too short intertrial intervals that prevent a proper resynchronization). Therefore we performed a pre-analysis in order to identify and exclude subjects that did not show significant µ-activity at all.

For seven of the eight subjects, the µ-band was only slightly different from the 8–12 Hz usually given in the EEG literature. Only one subject showed scarcely any activity in this frequency range but instead a recognizable movement-related desynchronization in the 16–20 Hz band.

Restricted to only the 17 EEG channels that were located over or close to the motor cortex, we calculated the maximum energy of the µ-band using the Welch method [5] for each subject. This feature extraction resulted in one parameter per trial and channel and explicitly incorporated prior knowledge about the task.
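A minimal sketch (not the authors' code) of this feature using SciPy's Welch estimator; `trial` is a hypothetical (channels x samples) array sampled at 256 Hz, and the band argument can be shifted (e.g., to 16–20 Hz) for the atypical subject mentioned above:

```python
import numpy as np
from scipy.signal import welch

def mu_band_peak_power(trial, fs=256, band=(8.0, 12.0)):
    """trial: (n_channels, n_samples) EEG of one trial; returns one feature per channel."""
    f, pxx = welch(trial, fs=fs, nperseg=256)       # one PSD estimate per channel
    mask = (f >= band[0]) & (f <= band[1])          # frequency bins inside the mu-band
    return pxx[:, mask].max(axis=1)                 # peak mu-band power per channel
```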

The eight datasets consisting of the Welch features were classified with linear SVMs (see below), including individual model selection for each subject. Generalization errors were estimated by 10-fold cross-validation. For three subjects the pre-analysis showed very poor error rates close to chance level, and their datasets were excluded from further analysis.

2.3. Data preprocessing

For the remaining five subjects the 5 s windows recorded from each trial resulted in a time series of 1280 sample points per channel. We fitted an autoregressive (AR) model of order 3 to the time series3 of all 39 channels using forward-backward linear prediction [6]. The three resulting AR coefficients per channel and trial formed the new representation of the data.
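A minimal sketch of an order-3 AR fit by forward-backward least squares; the sign convention and the exact estimator (cf. [6]) are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def ar_coeffs_fb(x, order=3):
    """Least-squares AR fit using forward and backward prediction errors.

    Returns coefficients a with x[t] ~ a[0]*x[t-1] + ... + a[p-1]*x[t-p]
    (sign convention chosen for illustration; libraries may differ).
    """
    x = np.asarray(x, dtype=float)
    p = order
    # forward rows: predict x[t] from x[t-1], ..., x[t-p]
    Xf = np.column_stack([x[p - 1 - k : len(x) - 1 - k] for k in range(p)])
    yf = x[p:]
    # backward rows: predict x[t] from x[t+1], ..., x[t+p]
    Xb = np.column_stack([x[1 + k : len(x) - p + 1 + k] for k in range(p)])
    yb = x[: len(x) - p]
    a, *_ = np.linalg.lstsq(np.vstack([Xf, Xb]),
                            np.concatenate([yf, yb]), rcond=None)
    return a

# one 117-dimensional feature vector per trial: 3 coefficients x 39 channels
# (`trial` is a hypothetical (39, 1280) array of one trial's samples)
# features = np.concatenate([ar_coeffs_fb(trial[ch]) for ch in range(39)])
```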

The extraction of the features did not explicitly incorporate prior knowledge, although autoregressive models have successfully been used for motor-related tasks (e.g., [3]). However, they are not directly linked to the µ-rhythm.

Before AR datasets from several subjects were combined for cross-subject channel selection, an additional centering and linear scaling of the data was performed. This was done individually for each subject and trial in order to maintain the proportion of corresponding AR coefficients within a trial.
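The exact centering and scaling transform is not fully specified in the text; a minimal sketch, under the assumption that each trial's 117 AR coefficients are jointly shifted and rescaled (which preserves their relative proportions within a trial):

```python
import numpy as np

def center_scale_trials(F):
    """F: (n_trials, 117) AR-feature matrix of one subject; returns a scaled copy."""
    F = F - F.mean(axis=1, keepdims=True)            # center each trial jointly
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    return F / np.where(norms > 0, norms, 1.0)       # scale each trial jointly
```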

2.4. Notation

Let n denote the number of training vectors (trials) of the datasets (n = 400 for each of the five datasets) and let d denote the data dimension (d = 3 · 39 = 117 for all five datasets). The training data for a classifier is denoted as X = (x(1), ..., x(n)) ∈ R^{n×d} with labels Y = (y_1, ..., y_n) ∈ {−1, 1}^n. For the task used in this paper, y = −1 denotes imagined left hand movement and y = 1 denotes imagined right hand movement. The terms dimension and feature are used synonymously.

3For comparison reasons this choice of the model order is the same as in [2]. For this work different model orders had been compared in the following way. For a given order we fitted an AR model to each EEG sequence. After proper model selection a support vector machine with 10-fold cross-validation (CV) was trained on the AR coefficients. Model order 3 resulted in the best mean CV error.

[Figure: scalp map of the 39 electrode positions.]

Figure 2: The positions of 39 EEG electrodes used for data acquisition are marked by black circles. The two referencing electrodes are marked by dotted circles. Eight electrodes over or close to the motor cortex are shown in bold circles (positions C1, C2, C3, C4, FC3, FC4, CP3, and CP4).


3. CHANNEL SELECTION AND CLASSIFICATION METHODS

Channel selection algorithms as well as feature selection algorithms can be characterized as either filter or wrapper methods [7]. They select or omit dimensions of the data that correspond to one EEG channel depending on a performance measure.

The problem of how to rate the relevance of a channel if nonlinear interactions between channels are present is not trivial, especially since the overall accuracy might not be monotonic in the number of features used. Some methods try to overcome this problem by optimizing the selection for feature subsets of fixed sizes (plus-l take-away-r search) or by implementing floating strategies (e.g., floating forward search) [7]. Only few algorithms, like genetic algorithms, can choose subgroups of arbitrary size during the selection process. They have successfully been used for the selection of spatial features [8] in BCI applications but are computationally demanding.

For the application of EEG channel selection, it is necessary to treat certain groups of features homogeneously: numerical values belonging to one and the same EEG channel have to be dealt with in a congeneric way so that a spatial interpretation of the solution becomes possible.

In [2] three state-of-the-art algorithms were compared for the problem of channel selection in BCI. As the method of recursive channel elimination (RCE), which is closely related to support vector machines (SVMs), performed best among the compared methods, we will use RCE for the cross-subject channel selection experiments described in this paper.


[Figure: sketch of a separating hyperplane H with margin γ, slack variables ξ_i, and support vectors.]

Figure 3: Linear SVM. For nonseparable datasets, slack variables ξ_i are introduced. The bold points on the dashed lines are called support vectors (SVs). The solution for the hyperplane H can be written in terms of the SVs. For more detail see Section 3.1.

3.1. Support vector machines

The support vector machine is a relatively new classification technique developed by Vapnik [9] which has been shown to perform strongly in a number of real-world problems, including BCI [10]. The central idea is to separate data X ⊂ R^d from two classes by finding a weight vector w ∈ R^d and an offset b ∈ R of a hyperplane

H : R^d → {−1, 1},  x ↦ sign(w · x + b)    (1)

with the largest possible margin,4 which apart from being an intuitive idea has been shown to provide theoretical guarantees in terms of generalization ability [9]. One variant of the algorithm consists of solving the following optimization problem:

min_{w ∈ R^d}  ‖w‖₂² + C Σ_{i=1}^{n} ξ_i²
s.t.  y_i (w · x(i) + b) ≥ 1 − ξ_i  (i = 1, ..., n).    (2)

The parameters ξ_i are called slack variables and ensure that the problem has a solution in case the data are not linearly separable5 (see Figure 3). The margin is defined as γ(X, Y, C) = 1/‖w‖₂. In practice one has to trade off between a low training error, for example Σ ξ_i², and a large margin γ. This trade-off is controlled by the regularization parameter C. Finding a good value for C is part of the model selection procedure. If no prior knowledge is available, C has to be estimated from the training data, for example by using cross-validation. The value 2/C is also referred to as the ridge. For a detailed discussion please refer to [11].

4If X is linearly separable, the margin of a hyperplane is proportional to the distance of the hyperplane to the closest point x ∈ X.

5If the data are linearly separable, the slack variables can improve the generalization ability of the solutions.
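A minimal sketch (not the authors' code) of training such a classifier: problem (2) uses squared slack variables, which corresponds, up to constants, to scikit-learn's LinearSVC with the squared-hinge loss. The feature matrix and labels below are random placeholders standing in for the AR features of Section 2.3:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 117))    # placeholder for the AR feature matrix
y = rng.choice([-1, 1], size=400)      # placeholder labels

# squared hinge loss matches the squared slack variables in (2)
svm = LinearSVC(loss="squared_hinge")
# model selection for C via 10-fold cross-validation, as in the paper
search = GridSearchCV(svm, {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}, cv=10)
search.fit(X, y)
w = search.best_estimator_.coef_.ravel()   # weight vector w of the hyperplane
```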

3.2. Recursive channel elimination

This channel selection method is derived from the recursive feature elimination method proposed by Guyon et al. [12]. It is based on the concept of margin maximization. The importance of a channel is determined by the influence it has on the margin of a trained SVM. Let W be the inverse of the margin:

W(X, Y, C) := 1/γ(X, Y, C) = ‖w‖₂.    (3)

Let X_{−j} be the data with feature j removed and Y_{−j} the corresponding labels. In the original version one SVM is trained during each iteration and the features j which minimize |W(X, Y, C) − W(X_{−j}, Y_{−j}, C)| are removed (typically one feature only); this is equivalent to removing the dimensions j that correspond to the smallest |w_j|. For channel selection this method was adapted in the following way.

Let F_k ⊂ {1, ..., d} denote the features from channel k. For each channel k we define the score s_k := (1/|F_k|) Σ_{l ∈ F_k} |w_l|. At each iteration we remove the channels with the lowest score. If no prior knowledge is available, the parameter C has to be estimated from the training data.

3.3. Generalization error estimation

For model selection purposes we estimated the generalization error of classifiers via 10-fold cross-validation.

If the generalization error of a channel selection method had to be estimated, a somewhat more elaborate procedure was used. An illustration of this procedure is given in Figure 4.

The whole dataset is split up into 10 folds (F1 to F10) as for usual cross-validation. In each fold F, the channel selection (CS in Figure 4) is performed based on the training set of F only, leading to a specific ranking of the 39 EEG channels. For each fold F, 39 classifiers C_F^h, h = 1, ..., 39, are trained as follows: C_F^h is trained on the h best6 channels of the training set of F and tested on the corresponding channels of the test set of F. For each fold, this results in 39 test errors (E_F^1 to E_F^39).

During the last step, the corresponding test errors are averaged over all folds. This leads to an estimate of the generalization error for every number of selected channels.
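The procedure of Figure 4 might be sketched as follows, reusing the hypothetical rce_ranking helper from Section 3.2; the fold errors are averaged to give one estimate per channel-set size h:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

def cs_generalization_error(X, y, n_channels=39, feats_per_ch=3):
    """Mean test error for every size h of the selected channel set."""
    errors = np.zeros((10, n_channels))
    for f, (tr, te) in enumerate(StratifiedKFold(n_splits=10).split(X, y)):
        ranking = rce_ranking(X[tr], y[tr])        # CS on the training set only
        for h in range(1, n_channels + 1):
            cols = [c * feats_per_ch + k for c in ranking[:h]
                    for k in range(feats_per_ch)]
            clf = LinearSVC(loss="squared_hinge").fit(X[tr][:, cols], y[tr])
            errors[f, h - 1] = 1.0 - clf.score(X[te][:, cols], y[te])
    return errors.mean(axis=0)                     # E^1 ... E^39 averaged over folds
```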

4. EXPERIMENTS AND RESULTS

The successful transfer of EEG channel rankings of one subject to another can be difficult for several reasons.

(i) The head shapes might vary between subjects. This limits the comparability of electrode positions and channel selection outcomes.

(ii) Subjects might use different mental representations for a task, even if they are instructed carefully.

6In this context, best means according to the ranking calculated for that fold.


[Figure: flowchart of the 10-fold procedure; in each fold F, channel selection (CS) on the training set yields a ranking of the 39 channels, classifiers C_F^1 to C_F^39 are trained on the h best channels, the test errors E_F^1 to E_F^39 are computed, and the errors are averaged over the 10 folds.]

Figure 4: Illustration of the procedure for channel selection and error estimation using cross-validation.

(iii) Cortex areas important for the mental task are probably organized slightly differently between subjects. This limits the comparability of localized activity patterns.

Fortunately, motor imagery tasks involve a comparably big part of the cortex. As a result, small dislocations of EEG electrodes (e.g., around the typical motor positions C3 and C4, see Section 2) usually do not lead to a profound error increase for the classification of brain activity.

Nevertheless it is very important to investigate the reliability of cross-subject channel selection: on the one hand, even a slightly increased classification error leads to a large drop in the information rate of a BCI system [13]; on the other hand, mental tasks that do not show the advantages of motor imagery will increasingly become the focus of BCI research, in order to expand existing systems to multiclass BCIs or to increase the information rate for patients whose motor areas are not intact.

The following subsections show results for the recursive channel elimination method on cross-subject data. In Section 4.1 RCE is applied to combined data of all five subjects. Results are compared with the individual channel rankings obtained from the five subjects. In Section 4.2 the transfer of rankings is investigated: RCE calculates rankings on data combined from 4 subjects before these rankings are tested on the corresponding remaining unseen dataset of the last subject.

4.1. Channel selection on combined data

We applied the channel selection method of recursive channel elimination (RCE) introduced in Section 3 to a training dataset that was combined from the five AR datasets.

The estimation of the average generalization error for all 39 stages of the channel selection process with RCE was carried out using linear SVMs as classifiers with the parameter C previously determined by 10-fold cross-validation.7 Details about the 10-fold cross-validation process for channel selection are described in Section 3.3 and Figure 4. Figure 5 shows the development of the estimated classification error for all 39 steps of the RCE.

For this combined dataset the test error was minimal (26.9%) when using data from 32 or more EEG channels, but a further reduction down to 24 channels increased the test error only marginally. Reducing the number of channels to fewer than the best 17 channels leads to a strong increase of the test error.

Throughout the ranking in the table of Figure 5, artifact or task-irrelevant channels appear only in the last ranks (e.g., EOG, occipital channels, FT9, FT10, etc.). A direct comparison between Figures 1 and 5 reveals that the curve in Figure 1 shows smaller error rates. The performance of a classifier trained on the RCE channels of combined data is worse than the average performance of classifiers trained on the individual RCE channels of single-subject data.

4.2. Transfer of channel selection outcomes to new subjects

In this section we analyze whether there exists a generally good subgroup of EEG channels (i.e., a subgroup of channels that performs well for all subjects) for a fixed mental task, and whether this subgroup can be determined by the RCE method. We describe different methods to obtain channel rankings, some of which include the data of more than one subject. However, these rankings are always tested on the data of one subject only. Table 1 provides an overview of all ranking modes.

Cross-subject modes

We iterate the following process. One subject is removed from the combined database. We perform RCE on the remaining data, which leads to a channel ranking.

We use this ranking in two different ways to obtain test errors via 10-fold cross-validation on the data of the removed subject.

(i) Best 8 (cross). The channel subset used for testing consists of the eight best-ranked channels. The resulting 8 best channels are plotted in Figure 6.

(ii) Best n (cross). The channel subset used for testing consists of the n best-ranked channels. The number n is chosen such that the expected cross-validation error on the four subjects is minimized. Note that this choice does not depend on the data of the fifth test subject.

7Estimating the parameter for each number of channels in the process of channel selection might improve the accuracy but was not performed.


[Figure: test error against the best n remaining channels, plus a surface map of the ranking. Channel ranking on the combined data, best to worst: CP2, CP1, FC2, FCz, F1, C4, FC4, C2, F2, C1, C3, CPz, FC3, FT7, FC1, C6, CP4, P6, C5, O2, POz, F6, AFz, TP8, Cz, P1, CP3, P2, FT9, P5, FT10, TP7, FT8, Fp2, F5, O1, O9, EOG, O10.]

Figure 5: RCE results for a combined dataset of all 5 subjects. The graph shows a test error estimation for the n best channels. The error values were estimated by 10-fold cross-validation. The table on the right shows the channel ranking performed on the combined data. Eight channels which are located over or close to the motor cortex (see Figure 2) are printed with grey background. The surface map visualizes this ranking. The 24 best-ranked electrodes were mapped to grey-scale values. Bright areas of the surface map correspond to relevant channels (according to RCE) whereas dark areas show less-relevant electrodes.

Table 1: Ranking modes overview: explanation of the ranking modes used for the comparison shown in Figure 7. The rankings were calculated on different kinds of datasets: on data from single subjects or (for cross-subject tests) on combined datasets (4-fold cross-validation). Testing of the ranking modes was always performed on the data of one single subject.

Mode            | Ranking method     | Ranking based on | Description
----------------|--------------------|------------------|----------------------------------------------------
Motor 8         | A priori knowledge | Single subject   | 8 channels over or close to motor cortex
Random 8        | (Random)           | Single subject   | 8 channels
Best n (single) | RCE                | Single subject   | n channels with highest rank that minimize CV error
Best 8 (single) | RCE                | Single subject   | 8 channels with highest rank
Best n (cross)  | RCE                | Four subjects    | n channels with highest rank that minimize CV error
Best 8 (cross)  | RCE                | Four subjects    | 8 channels with highest rank


As this process is repeated for every subject that was left out, we can average the error values of the modes Best 8 (cross) and Best n (cross) over five repetitions.

For comparison: single-subject modes

For the fixed mental task of motor activity and imagery, the EEG literature suggests the channels CP3, CP4, and adjacent electrodes (e.g., [3]). Our guess at a generally good subgroup of EEG channels is thus the electrode set FC3, FC4, C1, C2, C3, C4, CP3, CP4 (see the electrodes marked in boldface in Figure 2). The corresponding test mode is referred to as Motor 8.

If no prior knowledge of a task and no channel selection were available, a random choice of channels would be the only option. For comparison reasons we include the mode Random 8. Its test error is the average of ten repetitions of choosing eight random channels, optimizing the regularization parameter C, and testing this random subset via 10-fold cross-validation on the data of one subject.

For the two modes Best 8 (single) and Best n (single), the RCE method was applied to the individual data of single subjects only. These modes used subgroups of the eight best channels and the n best channels (see above) for calculating the test error via 10-fold cross-validation. It can be expected that the ranking for data from single subjects leads to more accurate classification results and can reveal task-related artifact channels [2] that might not be present in data from other subjects.


[Figure: six scalp maps of the 39 electrode positions, titled "Without subject A" to "Without subject E" and "With all subjects"; in each map the 8 best-ranked electrodes are marked in bold.]

Figure 6: The database consists of data from 5 subjects. The channels were ranked 5 times using the channel selection method recursive channel elimination (RCE), each time using the data of four subjects only. The electrode positions marked in bold are the 8 best-ranked ones and are consistently located over or close to the motor cortex, although the method was not provided with prior knowledge about the motor imagery task. This type of ranking is referred to as Best 8 (cross).


[Figure: bar chart of test errors per test subject (A to E) and averaged, for the modes Motor 8, Random 8, Best n (single), Best 8 (single), Best n (cross), and Best 8 (cross).]

Figure 7: Comparison of the test errors of six different ranking modes for single subjects (A to E) and the test errors of these modes averaged over the five subjects (Average). For each mode and subject, the regularization parameter C was estimated separately. All test errors were obtained using 10-fold CV. The first mode Motor 8 tests the classification error for 8 channels over or close to the motor cortex, whereas Random 8 is based on 8 randomly chosen channels. Modes Best n (single) and Best 8 (single) test channel sets whose rankings were calculated based on the specific subject only. Modes Best n (cross) and Best 8 (cross) test channel sets whose rankings were calculated based on all other subjects' data but did not incorporate data from the test subject.


Figure 7 shows the results for the 6 modes. The rightmost block contains an average taken over subjects for each of the modes. From the average results we observe the following.

(i) The 8 motor channels are not optimal: Best 8 (single) performs much better.8

(ii) Mode Best 8 (cross) performs almost as well as the motor channel mode. Although we conclude that the RCE method fails to find an optimal channel subset, the results suggest that when transferring channel positions across subjects, the expected performance is not much worse than the one obtained using prior knowledge.

(iii) The subset of 8 random channels performs surprisingly well. This finding suggests that the structure of the data can successfully be captured by the SVM even if only a few channels close to the motor cortex are contained in the channel subset. However, all other modes show better error estimates.

8In Figure 1 the choice of motor channels results in a lower classification error than the error from the RCE method. This is due to the fact that the regularization parameter C (or ridge) was not optimized for a specific ranking as was done in this study.

(iv) The performance of the Best n (cross) mode is comparable to the results of the Best 8 (single) mode (23%); nevertheless this comparison is unfair since on average 27 channels were used. The cross-validation error averaged over the five subjects is 26% for the choice of 27 random channels (not plotted in Figure 7).

(v) The best performing mode is Best n (single). On average it only uses n = 14 channels and yields an error as low as 21.8%.

5. CONCLUSION

The recursive channel elimination (RCE) method was applied to EEG channel selection in the context of signal classification for a brain-computer interface (BCI) system.

All experiments were based on data from five subjects recorded during a motor imagery task comprising imagined left and right hand movement.

For individual subjects we analyzed the performance of three different types of rankings: (i) a ranking including channels over the motor cortex only, (ii) a ranking obtained by RCE from the data of that subject, and (iii) a ranking obtained by RCE from the data of the other four subjects.

We obtained the best results with RCE rankings from single subjects. A comparison reveals that they outperform motor rankings (which include prior knowledge about the task) by about 5% absolute error.

The transfer of RCE rankings from the data of multiple subjects to a new subject leads to a small decrease in performance. The difference from the performance of motor rankings turns out to be less than 2% on average.

We conclude that individual channel ranking is preferable to cross-subject ranking for the experimental paradigm investigated here.

However, for the first time it could be shown that RCE can not only successfully be used to select channels for individual subjects, but that RCE rankings on the combined data of multiple subjects are consistently in agreement with the EEG literature on motor imagery tasks and can still yield error rates as low as 17% on unseen subjects.

ACKNOWLEDGMENTS

The authors would like to thank Bernd Battes and Professor Dr. Kuno Kirschfeld for their help with the EEG recordings. Special thanks to Dr. Jason Weston for his help on feature selection topics. This work has been supported in part by DFG (AUMEX RO 1030/12), NIH, and the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. Thomas Navin Lal was supported by a grant from the Studienstiftung des deutschen Volkes.

REFERENCES

[1] E. E. Sutter, "The brain response interface: communication through visually-induced electrical brain responses," Journal of Microcomputer Applications, vol. 15, no. 1, pp. 31–45, 1992.


[2] T. N. Lal, M. Schröder, T. Hinterberger, et al., "Support vector channel selection in BCI," IEEE Trans. Biomed. Engineering, vol. 51, no. 6, pp. 1003–1010, 2004.

[3] G. Pfurtscheller, C. Neuper, A. Schlögl, and K. Lugger, "Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters," IEEE Trans. Rehab. Eng., vol. 6, no. 3, pp. 316–325, 1998.

[4] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, no. 11, pp. 1842–1857, 1999.

[5] P. D. Welch, "The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms," IEEE Trans. Audio Electroacoust., vol. 15, no. 2, pp. 70–73, 1967.

[6] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Upper Saddle River, NJ, USA, 1996.

[7] P. Pudil, F. J. Ferri, J. Novovicova, and J. Kittler, "Floating search methods for feature selection with nonmonotonic criterion functions," in Proc. 12th International Conference on Pattern Recognition (ICPR '94), vol. 2, pp. 279–283, Jerusalem, Israel, October 1994.

[8] M. Schröder, M. Bogdan, W. Rosenstiel, T. Hinterberger, and N. Birbaumer, "Automated EEG feature selection for brain computer interfaces," in Proc. 1st International IEEE EMBS Conference on Neural Engineering, pp. 626–629, Capri, Italy, March 2003.

[9] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.

[10] B. Blankertz, G. Curio, and K. Müller, "Classifying single trial EEG: towards brain computer interfacing," in Advances in Neural Information Processing Systems, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., vol. 14, MIT Press, Cambridge, Mass, USA, 2001.

[11] B. Schölkopf and A. Smola, Learning with Kernels, MIT Press, Cambridge, Mass, USA, 2002.

[12] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 389–422, 2002.

[13] A. Schlögl, C. Keinrath, R. Scherer, and G. Pfurtscheller, "Information transfer of an EEG-based brain-computer interface," in Proc. 1st International IEEE EMBS Conference on Neural Engineering, pp. 641–644, Capri, Italy, March 2003.

Michael Schröder received his Diploma in computer science in 2000. Currently he is a Ph.D. student at the Department of Computer Engineering (Professor Rosenstiel) at the Eberhard-Karls-Universität Tübingen in Germany. His research interests include machine learning, brain-computer interface systems, and signal processing.

Thomas Navin Lal received his Diploma in mathematics in 2001 and spent one year with the machine learning group of Professor Dr. Thomas Hofmann at Brown University, Providence, RI. He is currently a Ph.D. student of Professor Dr. Bernhard Schölkopf at the Max Planck Institute for Biological Cybernetics, Tübingen, Germany. He is a researcher in the PASCAL Network of Excellence and is currently supported by a grant from the Studienstiftung des deutschen Volkes.

Thilo Hinterberger received his Diploma in physics from the University of Ulm, Germany, and received his Ph.D. degree in physics from the University of Tübingen, Germany, in 1999, on the development of a brain-computer interface called the "Thought Translation Device." He is currently a Research Associate with the Institute of Medical Psychology and Behavioral Neurobiology at the University of Tübingen, Germany. His primary research interests focus on the further development of brain-computer interfaces and their applications, and also on the development of EEG classification methods and the investigation of neuropsychological mechanisms during the operation of a BCI using functional MRI. He is a Member of the Society for Psychophysiological Research and the Deutsche Physikalische Gesellschaft (DPG).

Martin Bogdan received the Engineer Diploma in signal engineering from the Fachhochschule Offenburg, Germany, in 1993, and the Engineer Diploma in industrial informatics and instrumentation from the Université Joseph Fourier Grenoble, France, in 1993. In 1998, he received the Ph.D. degree in computer science (computer engineering) from the University of Tübingen, Germany. In 1994, he joined the Department of Computer Engineering at the University of Tübingen, where, since 2000, he has headed the research group NeuroTeam. This research group deals mainly with signal processing based on artificial neural nets and machine learning focused on, but not limited to, biomedical applications.

N. Jeremy Hill graduated in experimental psychology at the University of Oxford, UK, in 1995. Until 2001 he was a Research Assistant, Programmer, and finally a doctoral student in the psychophysics laboratory of Dr. Bruce Henning in Oxford. He received the Ph.D. degree in 2002, for a doctoral thesis on psychophysical statistics entitled "Testing hypotheses about psychometric functions." Since then he has been part of Professor Bernhard Schölkopf's Department for Empirical Inference for Machine Learning and Perception at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, and now he focuses on brain-computer interface research.

Niels Birbaumer was born in 1945. He received his Ph.D. degree in 1969, in biological psychology, art history, and statistics, from the University of Vienna, Austria. From 1975 to 1993, he was a Full Professor of clinical and physiological psychology, University of Tübingen, Germany. From 1986 to 1988, he was a Full Professor of psychology, Pennsylvania State University, USA. Since 1993, he has been a Professor of medical psychology and behavioral neurobiology at the Faculty of Medicine, the University of Tübingen, and Professor of clinical psychophysiology, University of Padova, Italy. Since 2002, he has been the Director of the Center of Cognitive Neuroscience, University of Trento, Italy.


Wolfgang Rosenstiel is Professor at the University of Tübingen and is the Chair of Computer Engineering. He is also the Managing Director of the Wilhelm Schickard Institute at Tübingen University, and the Director of the Department for System Design in Microelectronics at the Computer Science Research Centre (FZI). He is on the Executive Board of the German Edacentrum. His research areas include artificial neural networks, signal processing, embedded systems, and computer architecture.

Bernhard Schölkopf received an M.S. degree in mathematics (University of London, 1992), a Diploma in physics (Eberhard-Karls-Universität Tübingen, 1994), and a Ph.D. degree in computer science (Technical University Berlin, 1997). He won the Lionel Cooper Memorial Prize of the University of London, the Annual Dissertation Prize of the German Association for Computer Science (GI), and the Prize for the Best Scientific Project at the German National Research Center for Computer Science (GMD). He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge, UK. He has taught at the Humboldt University and the Technical University Berlin. In July 2001, he was elected Scientific Member of the Max Planck Society and Director at the MPI for Biological Cybernetics. In October 2002, he was appointed Honorary Professor for Machine Learning at the Technical University Berlin.


EURASIP Journal on Applied Signal Processing 2005:19, 3113–3121
© 2005 Hindawi Publishing Corporation

Determining Patterns in Neural Activity for Reaching Movements Using Nonnegative Matrix Factorization

Sung-Phil Kim
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
Email: [email protected]

Yadunandana N. Rao
Motorola Inc., FL, USA
Email: [email protected]

Deniz Erdogmus
Department of Computer Science and Biomedical Engineering, Oregon Health & Science University, Beaverton, OR 97006, USA
Email: [email protected]

Justin C. Sanchez
Department of Pediatrics, Division of Neurology, University of Florida, Gainesville, FL 32611, USA
Email: [email protected]

Miguel A. L. Nicolelis
Department of Neurobiology, Center for Neuroengineering, Duke University, Durham, NC 27710, USA
Emails: [email protected]; [email protected]

Jose C. Principe
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
Email: [email protected]

Received 31 January 2004; Revised 23 March 2005

We propose the use of nonnegative matrix factorization (NMF) as a model-independent methodology to analyze neural activity. We demonstrate that, using this technique, it is possible to identify local spatiotemporal patterns of neural activity in the form of sparse basis vectors. In addition, the sparseness of these bases can help infer correlations between cortical firing patterns and behavior. We demonstrate the utility of this approach using neural recordings collected in a brain-machine interface (BMI) setting. The results indicate that, using the NMF analysis, it is possible to improve the performance of BMI models through appropriate pruning of inputs.

Keywords and phrases: brain-machine interfaces, nonnegative matrix factorization, spatiotemporal patterns, neural firing activity.

1. INTRODUCTION

Brain-machine interfaces (BMIs) are an emerging field that aims at directly transferring the subject's intent of movement to an external machine. Our goal is to engineer devices that are able to interpret neural activity originating in the motor cortex and generate accurate predictions of hand position. In the BMI experimental paradigm, hundreds of microelectrodes are implanted in the premotor, motor, and posterior parietal areas, and the corresponding neural activity is recorded synchronously with behavior (hand reaching and grasping movements). Spike detection and sorting algorithms are used to determine the firing times of single neurons. Typically, the spike-time information is summarized into bin counts using short windows (100 milliseconds in this paper). A number of laboratories, including our own, have demonstrated that linear and nonlinear adaptive system identification approaches using the bin count input can lead to BMIs that effectively predict the hand position and grasping force of primates for different movement tasks [1, 2, 3, 4, 5, 6, 7, 8]. The adaptive methods studied thus far include moving average models, time-delay neural networks (TDNNs), the Kalman filter and extensions, recursive multilayer perceptrons (RMLPs), and mixtures of linear experts gated by hidden Markov models (HMMs).

BMIs open up an important avenue to study the spatiotemporal organization of spike trains and their relationships with behavior. Recently, our laboratory has investigated the sensitivity of neurons and cortical areas based on their role in the mapping learned by the RMLP and the Wiener filter [7]. We examined how each neuron contributes to the output of the models, and found consistent relationships between cortical regions and segments of the hand trajectory in a reaching movement. This analysis indicated that, during each reaching action, specific neurons from the posterior parietal, the premotor dorsal, and the primary motor regions sequentially became dominant in controlling the output of the models. However, this approach relies on determining a suitable model, because it explicitly uses the learned model to infer the dependencies.

In this paper, we propose a model-independent methodology to study spatiotemporal patterns between neuronal spikes and behavior utilizing nonnegative matrix factorization (NMF) [9, 10]. In its original applications, NMF was mainly used to provide an alternative method for determining sparse representations of images to improve recognition performance [10, 11]. d'Avella and Tresch have also proposed an extension of NMF to extract time-varying muscle synergies for the analysis of behavior patterns of a frog [12]. The nonnegativity constraints in NMF result in the unsupervised selection of sparse bases that can be linearly combined (encoded) to reconstruct the original data. Our hypothesis is that NMF can similarly yield sparse bases for analyzing neural firing activity, because of the intrinsic nonnegativity of the bin counts and the sparseness of spike trains.

The application of NMF to extract local features of neural spike counts follows the method of obtaining sparse bases to describe the local features of face images. The basis vectors provided by NMF and their temporal encoding patterns are examined to determine how the activities of specific neurons localize to each segment of the reaching trajectory. We will show that the results from this model-independent analysis of the neuronal activity are consistent with the previous observations from the model-based analysis.

2. NONNEGATIVE MATRIX FACTORIZATION

NMF is a procedure to decompose a nonnegative data matrix into the product of two nonnegative matrices: bases and encoding coefficients. The nonnegativity constraint leads to a parts-based representation, since only additive, not subtractive, combinations of the bases are allowed. An n × m nonnegative data matrix X, where each column is a sample vector, can be approximated by NMF as

X = WH + E,    (1)

where E is the error and W and H have dimensions n × r and r × m, respectively. W consists of a set of r basis vectors, while each column of H contains the encoding coefficients for every basis for the corresponding sample. The number of bases is selected to satisfy r(n + m) < nm so that the number of equations exceeds that of the unknowns.

This factorization can be described in terms of columns as

x_j ≈ W h_j, for j = 1, ..., m,    (2)

where x_j is the jth column of X and h_j is the jth column of H. Thus, each sample vector is a linear combination of basis vectors in W weighted by h_j. The nonnegativity constraints on W and H allow only additive combinations of basis vectors to approximate x_j. This constraint allows the visualization of the basis vectors as "parts" of the original sample [10]. This is contrary to factorization by PCA, where negative basis vectors are allowed.

The decomposition of X into W and H can be determined by optimizing an error function between the original data matrix and the decomposition. Two possible cost functions used in the literature are the Frobenius norm of the error matrix ‖X − WH‖²_F and the Kullback-Leibler divergence D_KL(X‖WH). The nonnegativity constraint can be satisfied by using the multiplicative update rules discussed in [10] to minimize these cost functions. In this paper, we will employ the Frobenius norm measure, for which the multiplicative update rules that converge to a local minimum are given below:

H_{µj}(k + 1) = H_{µj}(k) (WᵀX)_{µj} / (WᵀWH)_{µj},
W_{iµ}(k + 1) = W_{iµ}(k) (XHᵀ)_{iµ} / (WHHᵀ)_{iµ}.    (3)

A_{ab} denotes the element of a matrix A at the ath row and bth column. It has been proven in [9] that the Frobenius norm cost function is nonincreasing under these update rules.
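A minimal sketch of the updates in (3), with a small constant added to the denominators to avoid division by zero (an implementation detail the paper does not specify), run here on a random nonnegative placeholder matrix:

```python
import numpy as np

def nmf(X, r, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative-update NMF minimizing the Frobenius cost."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update rule for H in (3)
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update rule for W in (3)
    return W, H

X = np.random.default_rng(1).random((100, 50))  # placeholder nonnegative data
W, H = nmf(X, r=5)
print(np.linalg.norm(X - W @ H, "fro"))         # monotonically nonincreasing cost
```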

3. FACTORIZATION OF THE NEURONAL ACTIVITY MATRIX

We will now apply the multiplicative update rules in (3) to the neuronal bin-count matrix (created from real neural recordings of a behaving primate). The goal is to determine nonnegative sparse bases for the neural activity, from which we wish to deduce the local spatial structure of the neural population firing activity. These bases also point out common population firing patterns corresponding to the specific behavior. In addition, the resulting factorization yields a temporal encoding matrix that indicates how the instantaneous neural activity is optimally constructed from these localized representations. Since we are interested in the relationship between the neural activity and behavior, we would like to study the coupling between this temporal encoding pattern and the movement of the primate, as well as the contribution of the specific basis vectors, which represent neural populations.


Table 1: Distribution of neurons and cortical regions.

         |                 Monkey-1                  |    Monkey-2
Regions  | PP    | M1(area 1) | PMd   | M1(area 2)   | M1    | PMd
Neurons  | 1–33  | 34–54      | 55–81 | 82–104       | 1–37  | 38–54

3.1. Data preparation

Synchronous, multichannel neuronal spike trains were collected at Duke University using two female owl monkeys (Aotus trivirgatus): Belle (monkey-1) and Carmen (monkey-2).1 Microwire electrodes were implanted in cortical regions where motor associations are known [1, 13]. During the neural recording process, up to sixty-four electrodes were implanted in posterior parietal (PP)-area 1, primary motor (M1)-area 2, area 4, and premotor dorsal (PMd)-area 3, each receiving sixteen electrodes. From each electrode, one to four neurons can be discriminated. The firing times of individual neurons were determined using spike detection and sorting algorithms [14] and were recorded while the primate performed a 3D reaching task that consists of a reach to food followed by eating. The primate's hand position was also recorded using multiple fiber optic sensors (with a shared time clock) and digitized with a 200 Hz sampling rate [1]. These sensors were contained in a plastic strip whose bending and twisting modified the transmission of light through the sensors, allowing positions in 3D space to be recorded accurately. The neuronal firing times were binned in nonoverlapping windows of 100 milliseconds, representing the local firing rate for each neuron. In this recording session of approximately 20 minutes (12 000 bins), 104 neurons for monkey-1 and 54 neurons for monkey-2 could be discriminated (whose distribution over cortical regions is provided in Table 1, from [13]), and there were 71 reaching actions for monkey-1 and 65 for monkey-2, respectively. These reaching movements consist of three natural segments shown in Figure 1.

Based on the analysis of Wessberg et al. [1], the instantaneous movement is correlated with the current and the past neural data up to 1 second (10 bins). Therefore, for each time instant, we form a bin-count vector by concatenating 10 bins of firing counts (corresponding to a 10-tap delay line in a linear filter) from every neuron. Hence, if x_j(i) represents the ith bin of neuron j, where i ∈ {1, ..., 12 000}, the bin-count vector at time instance i is x(i) = [x_1(i), x_1(i−1), ..., x_1(i−9), x_2(i), ..., x_n(i−9)]ᵀ, where n is the number of neurons. Since we are interested in determining repeated spatiotemporal firing patterns during the reaching movements, only the bin counts from time instances where the primate's arm is moving are considered. There is a possibility that in the selected training set some neurons never fire. The rows corresponding to these neurons must be removed from the bin-count matrix, since they tend to cause instability in the NMF algorithm.

1All experimental procedures conformed to the National Academy Press Guide for the Care and Use of Laboratory Animals and were approved by the Duke University Animal Care and Use Committee.

[Figure: x, y, and z hand-position coordinates against time, with the three movement segments marked.]

Figure 1: Segmentation of the reaching trajectories: reach from rest to food, reach from food to mouth, and reach from mouth to rest positions (taken from [7]).

In addition, to prevent the error criterion from focusing too much on neurons that simply fire frequently (although the temporal structure of their activity might not be significant for the task), the bin counts in each row (i.e., for each neuron) of the data matrix are normalized to unit length in the two-norm. In general, if n neurons are considered for a total of m time instances, the data matrix X has dimension (10n) × m. Since the entries of the data matrix are bin counts, they are guaranteed to be nonnegative. Accounting for the 71 or 65 movements, there are m = 2143 time instances for monkey-1 and m = 2521 for monkey-2.
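A minimal sketch of assembling the (10n) × m matrix X under the stated conventions; `bins` is a hypothetical (n_neurons x n_bins) array of 100-millisecond spike counts, and `moving` a hypothetical boolean mask of bins during which the arm is moving:

```python
import numpy as np

def build_data_matrix(bins, moving, lags=10):
    """Columns are bin-count vectors x(i) = [x_1(i), ..., x_1(i-9), x_2(i), ...]^T."""
    n_neurons, n_bins = bins.shape
    cols = [bins[:, i - lags + 1 : i + 1][:, ::-1].ravel()   # lags i, i-1, ..., i-9
            for i in range(lags - 1, n_bins) if moving[i]]
    X = np.array(cols).T                                     # (lags*n_neurons, m)
    # remove all rows of neurons that never fire in the training set
    fired = (bins.sum(axis=1) > 0).repeat(lags)
    X = X[fired]
    # normalize each row to unit two-norm so frequent firers do not dominate
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return X
```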

3.2. Analysis of factorization process

In the application of NMF to a given neural firing matrix, there are several important issues that must be addressed: the selection of the number of bases, the uniqueness of the NMF solution, and understanding how NMF can find local structures of neural firing activity.

The problem of the choice of the number of bases can be addressed in the framework of model selection. A number of model selection techniques (e.g., cross-validation) can be utilized for finding the optimal number of bases. In this paper, we choose to adopt a selection criterion that has been recently developed for clustering. The criterion is called the index I, which has been used to indicate cluster validity [15]. This index has shown consistent performance in selecting the true number of clusters for various experimental settings. The index I is composed of three factors as

I(r) = ((1/r) · (E_1/E_r) · D_r)^p,    (4)

where E_r is the approximation error (Frobenius norm) for r bases, and D_r is the maximum Euclidean distance between bases such that

D_r = max_{i,j=1,...,r} ‖w_i − w_j‖.    (5)

The optimal r is the one that maximizes I(r). We will utilize this index to determine the optimal r for NMF with p = 1.
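A minimal sketch of selecting r by the index I with p = 1, reusing the nmf() helper from Section 2; E_1 is the error of a one-basis factorization, per (4):

```python
import numpy as np

def index_I(X, r_values):
    """Returns the r maximizing I(r) and the scores, per (4)-(5) with p = 1."""
    W1, H1 = nmf(X, 1)
    E1 = np.linalg.norm(X - W1 @ H1, "fro")      # approximation error for r = 1
    scores = {}
    for r in r_values:
        W, H = nmf(X, r)
        Er = np.linalg.norm(X - W @ H, "fro")
        Dr = max(np.linalg.norm(W[:, i] - W[:, j])  # largest pairwise basis distance
                 for i in range(r) for j in range(r))
        scores[r] = (1.0 / r) * (E1 / Er) * Dr
    return max(scores, key=scores.get), scores

# e.g., best_r, scores = index_I(X, range(2, 11))
```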

Donoho and Stodden have shown that a unique solution of NMF is possible under certain conditions [16]. They have shown through a geometrical interpretation of NMF that if the data are not strictly positive, there can be only one set of nonnegative bases which spans the data in the positive orthant. With an articulated set of images obeying three rules (a generative model, linear independence of generators, and factorial sampling), they showed that NMF identifies the generators or "parts" of images. If we consider our neuronal bin-count matrix, each row contains many zero entries (zero bin counts) even after removing nonfiring neurons, since most neurons do not fire continuously once in every 100-millisecond window during the entire training set. Therefore, our neuronal data are not strictly positive. This implies that the existence of a unique set of nonnegative bases for the neuronal bin-count matrix is warranted. The question still remains whether the NMF basis vectors can find the generative firing patterns for the neural population by meeting the three conditions mentioned above. Here, we discuss the neuronal bin-count data with respect to these conditions.

As stated previously, we have demonstrated through sensitivity analysis that specific neuronal subsets from the PP, PMd, and M1 regions were sequentially involved in deriving the output of the predictive models during reaching movements [7]. Hence, the bin-count data for the reaching movement will contain increasing firing activity of the specific neuronal subset on local partitions of the trajectory. Due to binning, it is possible that more than one firing pattern is associated with a single data sample. This analysis leads to a generative model for the binned data in which data samples are generated by a linear combination of the specific firing patterns with nonnegative coefficients. Also, these firing patterns will be linearly independent, since the neuronal subset in each firing pattern tends to modulate firing rates only for the local part of the trajectory. The third condition of factorial sampling can be approximately satisfied by the repetition of movements, in which the variability of a particular firing pattern is observed during the entire data set. However, a more rigorous analysis is necessary to support the argument that the set of firing patterns is complete in factorial terms. Therefore, we expect that the NMF solutions may be slightly variable, reflecting the ambiguity in the completeness of factorial sampling. This might be overcome by collecting more data for reaching movements, and will be pursued in future studies.

3.3. Case studies

The NMF algorithm is applied to the described neuronal data matrix prepared using ten taps, with n = 91 neurons for monkey-1 (after eliminating the neurons that do not fire through the entire training set) and n = 52 neurons for monkey-2. The NMF algorithm with 100 independent runs results in r = 5 bases for both the monkey-1 and monkey-2 datasets, for which the index I is maximized. The means and standard deviations of the normalized cost (Frobenius norm of the error between the approximation and the given data matrix, divided by the Frobenius norm of the data) over the 100 runs are 0.8399 ± 0.001 for the monkey-1 data and 0.7348 ± 0.002 for the monkey-2 data. This implies that the algorithm approximately converges to the same solution with different initial conditions (although this is not sufficient).
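The repeated-runs check might look as follows, again reusing the hypothetical nmf() helper; X is assumed to be the data matrix prepared as in Section 3.1:

```python
import numpy as np

# spread of the normalized Frobenius cost over 100 random restarts indicates
# whether the multiplicative updates converge to similar solutions
costs = []
for seed in range(100):
    W, H = nmf(X, r=5, seed=seed)
    costs.append(np.linalg.norm(X - W @ H, "fro") / np.linalg.norm(X, "fro"))
print(f"normalized cost: {np.mean(costs):.4f} +/- {np.std(costs):.4f}")
```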

In Figure 2, we show the resulting basis vectors (columns of W) for the bin counts (presented in matrix form where columns are different neurons and rows are different delays), as well as their corresponding time-varying encoding coefficients (rows of H) superimposed on the reaching trajectory coordinates of three consecutive movements. Based on the assumption that the neuronal bin-count data approximately satisfy the three conditions for the identification of the generators, the NMF basis vectors determine the sequence of spatiotemporal firing patterns representing the firing modulation of the specific neuronal subsets during the course of the reaching movement. Alternatively, we can say that NMF discovers these latent firing patterns of the neural population by optimal linear approximation of the data with few bases [9]. For example, from the two basis vectors each corresponding to the two primates in the left panel of Figure 2, we observe that firings of the neurons in group-b are followed by firings of the neurons in group-a (the bright activity denoted by b occurs earlier in time than the activity denoted by a, since increasing values on the vertical axis of each basis indicate going further back in time). Thus, NMF effectively determines and summarizes this sparse firing pattern that involves a group of neurons firing sequentially. Their relative average activity is also indicated by the relative magnitudes of the entries of this particular basis.

Using these time-synchronized neural activity and hand trajectory recordings, it is also possible to discover relationships between firing patterns and certain aspects of the movement. We can assess the repeatability of a certain firing pattern summarized by a basis vector by observing the time-varying activity of the corresponding encoding signal (the corresponding row of H) in time. An increase in this coefficient corresponds to a larger emphasis on that basis in reconstructing the original neural activity data. In the right panel of Figure 2, we observe that all bases are activated regularly in time by their corresponding encoding signals (at different time instances and at different amplitudes).


[Figure 2 graphics: for each monkey, five basis matrices plotted as lag (vertical, increasing values going further back in time) versus neuron index (horizontal), with salient neuron groups marked a and b; alongside them, the encoding signals plotted against time. See the caption below.]

Figure 2: (a) The five bases for monkey-1 (top) and monkey-2 (bottom). (b) Their corresponding encoding signals (thick solid line) overlaid on the 3-dimensional coordinates of the reaching trajectory (dotted lines) for three consecutive representative reaching tasks (separated by the dashed lines). Note that the encoding signals are scaled to the same order of magnitude as the reaching trajectory for visualization purposes.

For example, the first basis for monkey-1 is periodically activated to the same amplitude, whereas the activation amplitude of the third basis varies in every movement, which might indicate a change in the role of the corresponding neuronal firing pattern in executing that particular movement. The periodic activation of the encodings also indicates the bursting nature of the spatiotemporal repetitive patterns. Hence, the NMF bases tend to encode synchronous and bursting spatiotemporal patterns of neural firing activity.

From the NMF decomposition, we observe certain associations between the activities of neurons from different cortical regions and different segments of the reaching trajectory. In particular, an analysis of the monkey-1 data based on Figure 2 indicates that neurons in PP and M1 (array 1) repeat similar firing patterns during the reach from rest to food. This assessment is based on the observation that bases three, four, and five, which involve firing activities from neurons in these regions, are repeatedly activated by the increased amplitude of their respective encoding coefficients. Similarly, neurons in M1 (array 2) are repeatedly activated during the reach to and from the mouth (bases one and two). These observations are consistent with our previous analyses, which were conducted through trained input-output models (such as the Wiener filter and RMLP) [7]. Table 2 compares the neurons that were observed to have the highest sensitivity in the trained models with the neurons that have the largest magnitudes in each NMF basis. This comparison is based on the monkey-1 dataset. We can see that the neurons found by NMF are a subset of the neurons obtained from the sensitivity analysis.


Table 2: Comparison of important neurons (examined in the monkey-1 dataset).

Regions                                   PP                    M1 (area 1)   PMd    M1 (area 2)
Highest-sensitivity neurons (via RMLP)    4, 5, 7, 22, 26, 29   38, 45        None   93, 94
Largest-magnitude neurons in NMF bases    7, 29                 45            None   93, 94

Table 3: Performance evaluation of the Wiener filter and the mixture of multiple models based on NMF.

                         CC (x)    CC (y)    CC (z)    MSE (x)   MSE (y)   MSE (z)
Monkey-1
  Wiener filter          0.5772    0.6712    0.7574    0.4855    0.3468    0.2460
  NMF mixture            0.7147    0.7078    0.8076    0.2711    0.2786    0.1627
Monkey-2
  Wiener filter          0.3737    0.4304    0.6192    0.3050    0.7405    0.2882
  NMF mixture            0.4974    0.5041    0.6916    0.2354    0.5400    0.2112

It is also worth stating that an NMF basis provides more information than the model-based sensitivity analysis, since it determines synchronous spatiotemporal patterns while the sensitivity analysis only determines individual important neurons. Finally, we would like to reiterate that the analysis presented here is based solely on the data, which means that it does not require training a specific model to investigate the neural population organization.

3.4. Modeling improvement for BMI using NMF

We will demonstrate a simple example showing improved BMI performance in predicting hand positions by utilizing NMF. We will compare the performance of two systems: the Wiener filter directly applied to the original spike count data, and a mixture of multiple linear filters based on the NMF bases and encodings.

The Wiener filter is applied directly to the neural firing data to estimate the three coordinates of the primate's hand position. The Wiener filter has been a standard model for BMIs, and many other approaches have been compared with it [19]. With nine delays, the input dimensionality of the filter is 910 for monkey-1 and 510 for monkey-2 (discarding inactive (no firing) neural channels). We then add a bias to each input vector to estimate the y-intercept. The weights of the filter are estimated by the Wiener-Hopf equation as

\[ \mathbf{W} = \mathbf{R}^{-1}\mathbf{P}, \tag{6} \]

where R is a 911 × 911 (or 511 × 511 for monkey-2) input correlation matrix, and P is a 911 × 3 (or 511 × 3 for monkey-2) input-output cross-correlation matrix.
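A minimal sketch of solving (6) follows (our illustration, not the authors' code; the variable names, the appended bias column, and the small ridge term are assumptions):

```python
# Sketch of the Wiener-Hopf solution W = R^{-1} P from (6).
# Assumed shapes: X is an N x 911 matrix of delayed bin counts with a bias
# column appended; D is an N x 3 matrix of desired hand positions.
import numpy as np

def wiener_weights(X, D, ridge=1e-6):
    R = X.T @ X / len(X)     # input correlation matrix
    P = X.T @ D / len(X)     # input-output cross-correlation matrix
    # small ridge term guards against an ill-conditioned R (our addition)
    return np.linalg.solve(R + ridge * np.eye(R.shape[0]), P)
```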

The mixture of multiple models employs the NMF encodings as mixing coefficients. An NMF basis is used as a window function for the corresponding local model. Therefore, each model sees a given input vector through a different window and uses the windowed input vector to produce its output. The NMF encodings are then used to combine the models' outputs to produce the final estimate of the desired hand position vector. This is described by the following equation:

\[ d_c(n) = \sum_{k=1}^{K} h_k(n) \left( \mathbf{z}_k(n)^T \mathbf{g}_{k,c} + b_{k,c} \right), \tag{7} \]

where h_k(n) is the NMF encoding coefficient for the kth basis at the nth column (i.e., time index), g_{k,c} is the weight vector of the kth model for the cth coordinate (c ∈ {x, y, z}), and b_{k,c} is the y-intercept of the kth model for the cth coordinate. z_k(n) is the input vector windowed by the kth NMF basis; its ith element is given by

\[ z_{k,i}(n) = x_i(n) \cdot w_{k,i}. \tag{8} \]

Here, x_i(n) is the normalized firing count of neuron i at time instance n, and w_{k,i} is the ith element of the kth NMF basis. g_{k,c} and b_{k,c} can be estimated under the MSE criterion using a stochastic gradient algorithm such as normalized least mean squares (NLMS). The NLMS weight update rule for each model is then given by

\[ \begin{aligned} \mathbf{g}_{k,c}(n+1) &= \mathbf{g}_{k,c}(n) + \frac{\eta}{\beta + \left\| \mathbf{z}_k(n) \right\|^2}\, h_k(n)\, e_c(n)\, \mathbf{z}_k(n), \\ b_{k,c}(n+1) &= b_{k,c}(n) + \frac{\eta}{\beta + \left\| \mathbf{z}_k(n) \right\|^2}\, h_k(n)\, e_c(n), \end{aligned} \tag{9} \]

where η is the learning rate, β is the normalization factor, and e_c(n) is the error between the cth coordinate of the desired response and the model output.
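The following sketch (our reconstruction, not the authors' code) puts (7)-(9) together for one time step; the array shapes are assumptions (Wb holds the K NMF basis vectors as rows, G and b the per-model weights and intercepts):

```python
# One NLMS update of the K local models; returns the mixture output.
import numpy as np

def nlms_mixture_step(x, h, Wb, G, b, d, eta=0.01, beta=1.0):
    K, dim = Wb.shape
    z = Wb * x                        # z_k = x windowed by the kth basis, (8)
    y = np.zeros(3)
    for k in range(K):
        y += h[k] * (z[k] @ G[k] + b[k])           # mixture output, (7)
    e = d - y                                      # per-coordinate error
    for k in range(K):
        step = eta / (beta + z[k] @ z[k])
        G[k] += step * h[k] * np.outer(z[k], e)    # weight update, (9)
        b[k] += step * h[k] * e                    # bias update, (9)
    return y
```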

In the experiment, we divided the data samples into 1771 training samples and 372 test samples for the monkey-1 dataset, and 1739 and 782, respectively, for the monkey-2 dataset. The parameters were set as (η, β, K) = (0.01, 1, 5). The entire training data set was presented 60 times, which was sufficient for the weights to converge. The performance of the model is evaluated on the test set by two measures: the correlation coefficient (CC) between the desired hand trajectory and the model output trajectory, and the mean squared error (MSE) normalized by the variance of the desired response. Table 3 presents the performance of the two systems for both the monkey-1 and monkey-2 datasets. It shows a significant improvement in generalization performance with the mixture of models based on NMF factorization.
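For concreteness, the two test-set measures can be computed as in this sketch (our illustration; desired and predicted are assumed N × 3 trajectory arrays):

```python
# Correlation coefficient and variance-normalized MSE per coordinate.
import numpy as np

def evaluate(desired, predicted):
    cc = [np.corrcoef(desired[:, c], predicted[:, c])[0, 1] for c in range(3)]
    nmse = [np.mean((desired[:, c] - predicted[:, c]) ** 2) / np.var(desired[:, c])
            for c in range(3)]
    return cc, nmse
```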

Note that the general performance of the models for the monkey-2 dataset is worse than that for the monkey-1 dataset. The reasons may lie in many experimental variables. One of them may be the number of electrodes and the corresponding cortical areas: as shown in Table 1, only 32 electrodes were implanted in two areas for monkey-2, while 64 electrodes were implanted in four areas for monkey-1.

To quantify the performance difference between the Wiener filter and the mixture of multiple models, we can apply a statistical test based on the mean squared error (MSE) performance metric [17]. By modeling the performance difference in terms of the MSE over short-time windows as a normal random variable, one can apply the t-test to quantify significance. This t-test was applied to both modeling outputs for monkey-1 and monkey-2 with α = 0.01 or α = 0.05. For both datasets, the null hypothesis was rejected at both significance levels, with p-values of 0.0023 for monkey-1 and 0.0007 for monkey-2. Therefore, the statistical test of the performance difference demonstrates that the mixture of multiple models based on NMF significantly improves performance compared to the standard Wiener filter.
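A hedged sketch of such a test (assuming the per-window MSE series for the two models have already been computed, as in [17]):

```python
# Paired t-test on the windowed MSE difference between the two models.
import numpy as np
from scipy import stats

def mse_difference_test(mse_wiener, mse_mixture):
    diff = np.asarray(mse_wiener) - np.asarray(mse_mixture)
    # H0: zero mean difference between the two models' windowed MSEs
    t_stat, p_value = stats.ttest_1samp(diff, popmean=0.0)
    return t_stat, p_value
```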

3.5. Discussions

The results presented in the previous case study are a representative example of a broader set of NMF experiments performed on this recording. The selection of the number of taps and the number of bases (r) depends on the particular stimulus or behavior associated with the neural data. Although we have used a model selection method originally developed for clustering, and did not provide full justification that this index is suitable for NMF, the main motivation is to demonstrate that the problem of selecting the number of bases can be addressed in the context of model selection. This will be pursued in future research.

The number of patterns that can be distinctly represented by NMF is limited by the number of bases. A very small number of bases will lead to the combination of multiple patterns into a single nonsparse basis vector. At the other extreme, a very large number of bases will result in the splitting of a pattern into two or more bases, which have similar encoding coefficient signals in time. In these situations, the bases under consideration can be combined into one basis.

It is intriguing that the mixture of models based on NMF generalizes better than the Wiener filter despite the fact that the mixture contains many more model parameters. However, each model in the mixture receives inputs processed by a sparse basis vector. Therefore, each model learns the mapping between only a particular subset of neurons and the hand trajectories, and the effective number of parameters for each model is much less than the total number of input variables. Moreover, further overfitting is avoided by combining the outputs of the local models through the sparse encodings of NMF.

4. CONCLUSIONS

Nonnegative matrix factorization is a relatively new tool for analyzing data structure when nonnegativity constraints are imposed. In BMIs the neural inputs are processed by grouping the firings into bin counts.

Since the bin counts are always positive, we hypothesized that NMF would be appropriate for analyzing the neural activity. The experimental results and the analysis presented in this paper showed that we could find repeated patterns in neuronal activity that occurred in synchrony with the reaching behavior and were automatically and efficiently represented in a set of sparse bases. The sparseness of the bases indicates that only a small number of neurons exhibit repeated firing patterns that are influential in reconstructing the original neural activity matrix.

As presented in [10], NMF provides local bases of the objects, while principal component analysis (PCA) provides global bases. In our preliminary PCA experiments on the same data, we observed that PCA only found the most frequently firing neurons, which may not be related to the behavior. Therefore, NMF can find a local representation of the neural firing data, and this property can make NMF more effective than PCA for BMIs, where firing activities from different cortical areas are collected.

Lee and Seung have claimed that the statistical independence among the encodings of independent component analysis (ICA) forces the basis to be holistic [10]. And, if local parts of the neural activity occur together at the same time, the complicated dependencies between the encodings would not be captured by the ICA algorithm. However, we have observed that the NMF encodings seem to be uncorrelated over the entire movement. Hence, ICA with nonnegative constraints (e.g., nonnegative ICA [18], the ICA model with a nonnegative basis [19], and nonnegative sparse coding [20]) may yield interesting encodings of the neural firing activities. Further studies will present a comparison between NMF and these constrained ICA algorithms applied to BMIs.

While NMF is found to be a useful tool for analyzing neural data to find repeatable activity patterns, several issues remain when using NMF for neural data analysis. Firstly, the method only detects patterns of activity, but it is known that the inactivity of a neuron can often indicate a response to a stimulus or cause a behavior. An analysis based on NMF will fail to identify such neurons. Secondly, the nonstationary characteristics of neural activities make it difficult for NMF to find fixed spatiotemporal firing patterns. Since the neural ensemble function tends to change over neuronal space and time, such that different spatiotemporal firing patterns may be involved in the same behavioral output, we may have to continuously adapt the NMF factors to track those changes. This motivates us to consider a recursive NMF algorithm, which will enable us to adapt the NMF factors online. It will be covered in a future study.

In our application of NMF, we demonstrated that the NMF learning algorithm resulted in a similar Frobenius norm of the error matrix over 100 runs obtained with different initial conditions. However, this does not necessarily mean that the resulting factors are similar, with small variance. Therefore, we need to quantify the similarity of the NMF results under different initializations. An alternative is to employ other methods to obtain a global solution, such as genetic or simulated annealing algorithms. This will be presented in a follow-up report.


ACKNOWLEDGMENTS

The authors would like to thank Johan Wessberg for collecting the data used in this paper. This work was supported by DARPA project no. N66001-02-C-8022.

REFERENCES

[1] J. Wessberg, C. R. Stambaugh, J. D. Kralik, et al., "Real-time prediction of hand trajectory by ensembles of cortical neurons in primates," Nature, vol. 408, no. 6810, pp. 361–365, 2000.

[2] D. W. Moran and A. B. Schwartz, "Motor cortical activity during drawing movements: population representation during spiral tracing," Journal of Neurophysiology, vol. 82, no. 5, pp. 2693–2704, 1999.

[3] J. K. Chapin, K. A. Moxon, R. S. Markowitz, and M. A. L. Nicolelis, "Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex," Nature Neuroscience, vol. 2, no. 7, pp. 664–670, 1999.

[4] M. D. Serruya, N. G. Hatsopoulos, L. Paninski, M. R. Fellows, and J. P. Donoghue, "Brain-machine interface: instant neural control of a movement signal," Nature, vol. 416, no. 6877, pp. 141–142, 2002.

[5] J. C. Sanchez, S.-P. Kim, D. Erdogmus, et al., "Input-output mapping performance of linear and nonlinear models for estimating hand trajectories from cortical neuronal firing patterns," in Proc. 12th IEEE International Workshop on Neural Networks for Signal Processing, pp. 139–148, Martigny, Switzerland, September 2002.

[6] S. Darmanjian, S.-P. Kim, M. C. Nechyba, et al., "Bimodal brain-machine interface for motor control of robotic prosthetic," in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '03), vol. 4, pp. 3612–3617, Las Vegas, Nev, USA, October 2003.

[7] J. C. Sanchez, D. Erdogmus, Y. N. Rao, et al., "Interpreting neural activity through linear and nonlinear models for brain machine interfaces," in Proc. 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 3, pp. 2160–2163, Cancun, Mexico, September 2003.

[8] J. M. Carmena, M. A. Lebedev, R. E. Crist, et al., "Learning to control a brain-machine interface for reaching and grasping by primates," PLoS Biology, vol. 1, no. 2, pp. 1–16, 2003.

[9] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., pp. 556–562, MIT Press, Cambridge, Mass, USA, 2001.

[10] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.

[11] D. Guillamet, M. Bressan, and J. Vitria, "A weighted non-negative matrix factorization for local representations," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), vol. 1, pp. 942–947, Kauai, Hawaii, USA, December 2001.

[12] A. d'Avella and M. C. Tresch, "Modularity in the motor system: decomposition of muscle patterns as combinations of time-varying synergies," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., pp. 629–632, MIT Press, Cambridge, Mass, USA, 2002.

[13] J. C. Sanchez, From cortical neural spike trains to behavior: modeling and analysis, Ph.D. dissertation, Department of Biomedical Engineering, University of Florida, Gainesville, Fla, USA, 2004.

[14] M. A. L. Nicolelis, A. A. Ghazanfar, B. M. Faggin, S. Votaw, and L. M. Oliveira, "Reconstructing the engram: simultaneous, multisite, many single neuron recordings," Neuron, vol. 18, no. 4, pp. 529–537, 1997.

[15] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 12, pp. 1650–1654, 2002.

[16] D. Donoho and V. Stodden, "When does non-negative matrix factorization give a correct decomposition into parts?" in Advances in Neural Information Processing Systems 16, S. Thrun, L. K. Saul, and B. Scholkopf, Eds., pp. 1141–1148, MIT Press, Cambridge, Mass, USA, 2004.

[17] S.-P. Kim, J. C. Sanchez, Y. N. Rao, et al., "A comparison of optimal MIMO linear and nonlinear models for brain-machine interfaces," submitted to Neural Computation, 2004.

[18] M. Plumbley, "Conditions for nonnegative independent component analysis," IEEE Signal Processing Lett., vol. 9, no. 6, pp. 177–180, 2002.

[19] L. Parra, C. Spence, P. Sajda, A. Ziehe, and K.-R. Muller, "Unmixing hyperspectral data," in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Muller, Eds., pp. 942–948, MIT Press, Cambridge, Mass, USA, 2000.

[20] P. O. Hoyer, "Non-negative sparse coding," in Proc. 12th IEEE International Workshop on Neural Networks for Signal Processing, pp. 557–565, Martigny, Switzerland, September 2002.

Sung-Phil Kim was born in Seoul, South Korea. He received a B.S. degree from the Department of Nuclear Engineering, Seoul National University, Seoul, South Korea, in 1994. In 1998, he entered the Department of Electrical and Computer Engineering, University of Florida, in pursuit of a Master of Science degree. He joined the Computational NeuroEngineering Laboratory as a Research Assistant in 2000. He received an M.S. degree in December 2000 from the Department of Electrical and Computer Engineering, University of Florida. From 2001, he continued to pursue a Ph.D. degree in the Department of Electrical and Computer Engineering, University of Florida, under the supervision of Dr. Jose C. Principe. In the Computational NeuroEngineering Laboratory, he has investigated decoding models and analytical methods for brain-machine interfaces.

Yadunandana N. Rao was born in Mysore, India. He received his B.E. degree in electronics and communication engineering from the University of Mysore, India, in August 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Florida, Gainesville, Fla, in 2000 and 2004, respectively. From May 2000 to January 2001, he worked as a Design Engineer at GE Medical Systems, Wis. Currently he is a Senior Engineer at Motorola, Fla. His research interests include adaptive signal processing theory, algorithms and analysis, neural networks for signal processing, and biomedical applications.


Deniz Erdogmus received his B.S. degrees in electrical engineering and mathematics in 1997, and his M.S. degree in electrical engineering, with emphasis on systems and control, in 1999, all from the Middle East Technical University, Turkey. He received his Ph.D. degree in electrical engineering from the University of Florida, Gainesville, in 2002. Since 1999, he has been with the Computational NeuroEngineering Laboratory, University of Florida, working with Jose Principe. His current research interests include information-theoretic aspects of adaptive signal processing and machine learning, as well as their applications to problems in communications, biomedical signal processing, and controls. He is the recipient of the IEEE SPS 2003 Young Author Award, and is a Member of IEEE, Tau Beta Pi, and Eta Kappa Nu.

Justin C. Sanchez received a B.S. degree with highest honors in engineering science, along with a minor in biomechanics, from the University of Florida in 2000. From 1998 to 2000, he was a Research Assistant in the Department of Anesthesiology, University of Florida. In 2000, he joined the Department of Biomedical Engineering and the Computational NeuroEngineering Laboratory, University of Florida. In the spring of 2004, he completed both his M.E. and Ph.D. degrees in biomedical signal processing, working on the development of modeling and analysis tools for brain-machine interfaces. He is currently a Research Assistant Professor in the Department of Pediatrics, Division of Neurology, University of Florida. His neural engineering electrophysiology laboratory is currently developing neuroprosthetics for use in research and clinical settings.

Miguel A. L. Nicolelis was born in Sao Paulo, Brazil, in 1961. He received his M.D. and Ph.D. degrees from the University of Sao Paulo, Brazil, in 1984 and 1988, respectively. After postdoctoral work at Hahnemann University, Philadelphia, he joined Duke University, where he now codirects the Center for Neuroengineering and is a Professor of neurobiology, biomedical engineering, and psychological and brain sciences. His laboratory is interested in understanding the general computational principles underlying the dynamic interactions between populations of cortical and subcortical neurons involved in motor control and tactile perception.

Jose C. Principe is a Distinguished Professor of electrical and computer engineering and biomedical engineering at the University of Florida, where he teaches advanced signal processing, machine learning, and artificial neural networks (ANNs) modeling. He is a BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL). His primary area of interest is the processing of time-varying signals with adaptive neural models. The CNEL has been studying signal and pattern recognition principles based on information-theoretic criteria (entropy and mutual information). He is an IEEE Fellow. He is a Member of the ADCOM of the IEEE Signal Processing Society, a Member of the Board of Governors of the International Neural Network Society, and the Editor-in-Chief of the IEEE Transactions on Biomedical Engineering. He is a Member of the Advisory Board of the University of Florida Brain Institute. He has more than 90 publications in refereed journals, 10 book chapters, and 200 conference papers. He has directed 35 Ph.D. dissertations and 45 Master's theses. He recently wrote an interactive electronic book entitled Neural and Adaptive Systems: Fundamentals Through Simulation, published by John Wiley and Sons.


EURASIP Journal on Applied Signal Processing 2005:19, 3122–3127
© 2005 Hindawi Publishing Corporation

Finding Significant Correlates of Conscious Activity in Rhythmic EEG

Piotr J. Durka
Laboratory of Medical Physics, Institute of Experimental Physics, Warsaw University, ul. Hoza 69, 00-681 Warsaw, Poland
Email: [email protected]

Received 28 January 2004; Revised 27 July 2004

One of the important issues in designing an EEG-based brain-computer interface is an exact delineation of the rhythms related to the intended or performed action. Traditionally, related bands were found by trial-and-error procedures seeking maximum reactivity. Even then, large values of ERD/ERS did not imply the statistical significance of the results. This paper presents a complete methodology, allowing for a high-resolution presentation of the whole time-frequency picture of event-related changes in the energy density of signals, revealing the microstructure of rhythms, and for determination of the time-frequency regions of energy changes that are related to the intentions in a statistically significant way.

Keywords and phrases: time-frequency, adaptive approximations, matching pursuit, ERD, ERS, multiple comparisons.

1. INTRODUCTION

Thinking of a "brain-computer interface" (BCI), one can imagine a device which would directly process all the brain's output, as in a perfect virtual reality machine [1]. Today's attempts are much more humble: we are basically at the level of controlling simple left/right motions. On the other hand, these approaches are more ambitious than direct connections to the peripheral nerves: we are trying to guess the intention of an action directly from the activity of the brain's cortex, recorded from the scalp (EEG).

Contemporary EEG-based BCI systems are based upon various phenomena, for example, visual or P300 evoked potentials, slow cortical potentials, or sensorimotor cortex rhythms [2]. The most attractive path leads towards the detection of "natural" EEG features; for example, a normal intention of moving the right hand (or rather its reflection in the EEG) would move the cursor to the right. Determination of such features in the EEG is more difficult than using evoked or specially trained responses. Desynchronization of the µ rhythm is an example of a feature correlated not only with the actual movement, but also with its mere imagination.

All these approaches encounter obstacles common in the neurosciences: great intersubject variability and poor understanding of the underlying processes. Significant improvement can be brought by coherent basic research on the EEG representation of conscious actions. This paper presents two methodological aspects of such research.

(i) High-resolution parameterization and feature extraction from the EEG time series. Scalp electrodes gather signal from many neural populations, so the rhythms of interest are buried in a strong background. Owing to the high temporal resolution of EEG and the oscillatory character of most of its features, we can look for the relevant activities in the time-frequency plane.

(ii) Determination of significant correlates of conscious activities requires a dedicated statistical framework. Until recently, reporting the significance of changes in the time-frequency plane presented a serious problem.

2. TIME-FREQUENCY ENERGY DENSITY OF SIGNALS

Among the parameters used in today's BCI systems (like those designed at the Graz University of Technology [3]), event-related desynchronization and synchronization (ERD/ERS) phenomena play an important role. ERD and ERS are defined as the percentage of change of the average (across repetitions) power of a given rhythm, usually µ/α, β, and γ [4]. Estimation of the time course of the rhythm energy is crucial for the sensitivity of these parameters. But due to intersubject variability, we cannot expect the rhythms to appear at the same frequencies for all subjects.

Therefore, a classical procedure was developed to find the reactive rhythms [4]. For each subject, the frequency range of interest was divided into 1 Hz intervals; in each of them the single trials (repetitions) were bandpass filtered, squared, and averaged, to obtain an estimate of the average band energy. Among these fixed bands, those revealing the largest changes related to the event were chosen.



Figure 1: Top: Wigner distribution ((A.4); vertical: frequency, horizontal: time) of a signal simulated as two short sines (bottom). We observe the autoterms a² and b², corresponding to the time and frequency spans of the sines, and the cross-term 2ab at time coordinates where no activity occurs in the signal.

This naturally limits the frequency resolution to 1 Hz, not taking into account the accuracy of bandpass filtering of finite sequences.

The whole problem is naturally embedded in the time-frequency space. Time-frequency density of signal energy, averaged across trials, provides all the information about the rhythms and the time course of their energy in one clear picture (Figure 2).

2.1. Time-frequency distributions of energy density

Because of the uncertainty principle, there are many alternative estimates of the time-frequency density of a signal's energy. Actually, the same problem (nonunique estimates) is present also in calculating the spectral power or bandpass filtering finite sequences, but in the quadratic time-frequency distributions we may say that the relevance of the problem is "squared." Fluctuations of power spectra, appearing at high resolutions, take in the time-frequency distributions the form of cross-terms. These false peaks occur in between the autoterms (which correspond to the actual signal structures) and significantly blur the energy estimates (Figure 1). Their presence stems from the equation (a + b)² = a² + b² + 2ab. A quadratic representation of an unknown signal s, composed of two structures a and b, contains autoterms corresponding to these structures (a² and b²) as well as the cross-term 2ab. For a signal more complex than a sum of two clear and separate structures (like the simplistic simulation in Figure 1), cross-terms are indistinguishable from the autoterms. Advanced mathematical methods are being developed for the reduction of this drawback [5]. While some of them give impressive results for particular signals, in general we are confronted with a tradeoff: higher resolution versus a more reliable (suppressed cross-terms) estimate.

2.2. Adaptive approximations

If we knew exactly the structures (a and b) of which the signal is composed, we might explicitly omit the cross-term 2ab, thus obtaining a clear time-frequency picture. In practice, this would require a reasonably sparse approximation of the signal of the form

\[ s \approx \sum_{i=1}^{M} w_i g_i, \tag{1} \]

where the g_i are known functions fitting well the actual signal structures. This may be achieved only by choosing the functions g_i for each analyzed signal separately.¹ The criterion of their choice is usually aimed at explaining the maximum part of signal energy in a given number of iterations (M). However, the problem of choosing the optimal set of functions g_i is intractable.² A suboptimal solution can be found by means of the matching pursuit (MP) algorithm [7]. But even this suboptimal solution is still quite computer-intensive,³ so the first practical applications were not possible before the mid-nineties [8]. The MP algorithm, and the construction of an estimate of the signal's time-frequency energy density which is free of cross-terms, are described in the appendix. The functions g_i are chosen from large and redundant collections of Gabor functions (sine-modulated Gaussians).

The advantages of this estimator in the context of event-related desynchronization and synchronization were discussed in [9, 10].

3. MICROSTRUCTURE OF THE EEG RHYTHMS

3.1. Experimental data

To present the advantages of the proposed methodology, the classical ERD/ERS experimental setup was modified to obtain relatively long epochs of EEG between the events.

A thirty-one-year-old right-handed subject was half lying in a dim room with open eyes. A movement of the thumb, detected by a microswitch, was performed approximately 5 seconds (at the subject's choice) after a quiet sound generated approximately every 20 seconds. The experiment was divided into 15-minute sessions, and the recorded EEG into 20-second-long epochs. After artifact rejection, 124 epochs were left for the analysis. EEG was registered from electrodes at positions selected from the 10–20 system. Figures 2–4 present results for the C4 electrode (contralateral to the hand performing movements) in the local average reference. The signal was down-sampled offline from 250 Hz to 125 Hz.

Figure 5 presents data from another subject, collected in a standard ERD/ERS experiment.

¹Contrary to most of the approaches, where all the signals are represented via products with the same set of functions (e.g., a basis).

²Finding the subset of M functions which explains the largest ratio of the signal energy among all the other M-subsets of the highly redundant set requires checking all the possible M-subsets, which leads to a combinatorial explosion even for moderate sets of candidate functions. Problems of such computational complexity are termed NP-hard [6].

³Recent results indicate possibilities of a significant decrease in the computation times of bias-free MP decompositions.



Figure 2: Average time-frequency energy density of 124 trials (Section 3.1; energy cut above 2%, sqrt scale); darker areas mark higher values of the energy density. Horizontal scale in seconds, vertical in Hz. Finger movement in the 12th second.


Figure 3: ERD/ERS map corresponding to the time between 3 and 19 seconds (vertical lines in Figure 2). Shades of gray are proportional to the percentage of change relative to the reference epoch (between 1 and 3 seconds in Figure 2).

3.2. High-resolution picture of energy density

Time-frequency estimates of the signal energy density, including the MP estimate given by (A.5), contain no phase information, so they can be summed across the trials to give the average time-frequency density of energy.⁴ Figure 2 presents such an average for 124 repetitions of EEG synchronized to the finger movement, occurring in the 12th second. We easily observe that the α rhythm concentrates around 12 Hz. We may also notice its decrease (desynchronization) around the time when the finger movement occurred, as well as some increased activity in 15–30 Hz near 12–13 seconds (β synchronization).

In another experiment (Figure 5), the high-resolution estimate clearly revealed two very close but separate components of the µ rhythm with different time courses, an effect elusive to the previously applied methods.

3.3. High-resolution ERD and ERS

Speaking of the decrease in the α rhythm in the previous section, we compared the activity near the 12th second (Figure 2) to the average level of the α rhythm energy, or, more correctly, to a period before the movement which should not be related to the event. To quantify this procedure, we must define the reference period to which the energy changes will be related. It should be distant enough from the onset of the event, to avoid incorporating premovement correlates into the reference. To avoid border problems of the estimates, it should also be removed from the very start of the analyzed epoch.

⁴Note that the average of the energy densities is in general different from the energy density of the averaged signal. The latter (averaged signal) reveals phase-locked phenomena like, for example, the classical evoked potential.

In Figure 2 it was chosen between the 1st and the 3rd second.

Classically, for each selected band, ERD/ERS were calculated as the percentage of power relative to the reference epoch (ERD corresponding to a decrease and ERS to an increase). Owing to the high-resolution estimate of the whole picture of energy density, we may calculate it for the whole relevant time-frequency region with maximum resolution. The ERD/ERS map in Figure 3 was obtained as the ratio of each point's energy to the average energy of the reference epoch at the same frequency. In this plot we observe, as in Figure 2, a darker area (increase) corresponding to the β postmovement synchronization, and a white spot around the time of the movement, corresponding to the α desynchronization. However, in the long premovement period there are still a lot of fluctuations, which naturally raises a question about the statistical significance of the observed changes.
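A minimal sketch of this map construction (our illustration; E is an assumed frequency × time array of average energy density, and ref_cols are the reference-epoch columns):

```python
# ERD/ERS map: percentage change of each time-frequency point relative to
# the mean energy of the reference epoch at the same frequency.
import numpy as np

def erd_ers_map(E, ref_cols):
    ref = E[:, ref_cols].mean(axis=1, keepdims=True)  # per-frequency reference
    return 100.0 * (E - ref) / ref                    # % change (ERD<0, ERS>0)
```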

4. STATISTICAL SIGNIFICANCE

The following steps constitute a fully automatic (hence objective and repeatable) and statistically correct procedure, which delineates and presents with high resolution the time-frequency regions of significant changes in the average energy density.

(1) Divide the time-frequency plane into resels (from resolution elements), for which the statistics are calculated (Section 4.1).

(2) Calculate pseudo-t statistics and p-values for the null hypothesis of no change in the given resel compared to the reference epoch at the same frequency (Section 4.2).

(3) Select a threshold for the null hypothesis, corrected for multiple comparisons (Section 4.3).

(4) Display the energy changes calculated at maximum resolution (Section 3.2) in windows corresponding to the resels which indicated statistically significant changes.



Figure 4: ERD/ERS from Figure 3 displayed in regions revealing statistically significant changes in resampling pseudo-t tests (Section 4.2), corrected by a 5% false discovery rate (Section 4.3).


Figure 5: Average time-frequency energy density (2) of 57 trials from the C1 electrode (average reference), constructed for g_{γi} longer than 250 milliseconds; presented from 5 to 15 Hz, with the finger movement in the 5th second. We observe two very close, but separate, µ rhythms with different time courses. The faster rhythm desynchronizes about 1.5 seconds before the movement, while the slower one lasts until its very onset and desynchronizes in the 5th second.

These steps are described in the following sections. Further details can be found in [10].

4.1. Integration of MP maps in resels

In choosing the dimensions of a resel suitable for the statistical analyses, we turn to the theory of periodogram sampling [11]. For a statistically optimal sampling of the periodogram, the product of the frequency interval and the signal length equals 1/2. This value was taken as the product of the resel's widths in time and frequency, their ratio being a free parameter.
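As a concrete illustration of this sizing rule (our arithmetic, not from the paper): with the product fixed at 1/2, a resel 1 Hz wide in frequency must be 0.5 s wide in time,
\[ \Delta t \cdot \Delta f = \frac{1}{2} \quad\Longrightarrow\quad \Delta f = 1\ \mathrm{Hz} \;\Rightarrow\; \Delta t = 0.5\ \mathrm{s}. \]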

Calculating the amount of energy in such relatively large resels simply as the value of the distribution (A.5) at its center, that is,

\[ E_{\mathrm{point}}\left(t_i, \omega_i\right) = \sum_{n} \left| \left\langle R^n f, g_{\gamma_n} \right\rangle \right|^2 W_{g_{\gamma_n}}\left(t_i, \omega_i\right), \tag{2} \]

may not be representative of the amount of energy contained in a given resel. In such a case⁵ we use the exact solution:

\[ E_{\mathrm{int}}\left(t_i, \omega_i\right) = \sum_{n} \left| \left\langle R^n f, g_{\gamma_n} \right\rangle \right|^2 \int_{t_i - \Delta t/2}^{t_i + \Delta t/2} \int_{\omega_i - \Delta\omega/2}^{\omega_i + \Delta\omega/2} W_{g_{\gamma_n}}(t, \omega)\, dt\, d\omega. \tag{3} \]

4.2. Resampling the pseudo-t statistics

The values of energy of all the N repetitions (trials) in each questioned resel will be compared to the energies of resels within the corresponding frequency of the reference epoch. We denote the time indices t_i of resels belonging to the reference epoch as t_i, i ∈ ref, and their number contained in each frequency slice as N_ref. For each resel at coordinates (t_i, ω_i) we compare its energy, averaged over the N repetitions, with the energy averaged over repetitions in resels from the reference epoch at the same frequency. Their difference can be written as

\[ \begin{aligned} \Delta E\left(t_i, \omega_i\right) &= \frac{1}{N} \sum_{k=1}^{N} E_{\mathrm{int}}^{k}\left(t_i, \omega_i\right) - \frac{1}{N \cdot N_{\mathrm{ref}}} \sum_{k=1}^{N} \sum_{j \in \mathrm{ref}} E_{\mathrm{int}}^{k}\left(t_j, \omega_i\right) \\ &= \overline{E}\left(t_i, \omega_i\right) - \overline{E}\left(t_{\mathrm{ref}}, \omega_i\right), \end{aligned} \tag{4} \]

where the superscript k denotes the kth repetition (out of N).

However, we also want to account for the different variances of E^k, revealing the variability of the N repetitions. Therefore we replace the simple difference of means (4) by the pseudo-t statistic:

\[ t = \frac{\Delta E\left(t_i, \omega_i\right)}{s_{\Delta}}, \tag{5} \]

where ΔE is defined as in (4), and s_Δ is the pooled variance of the reference epoch and the investigated resel.

⁵The difference between (2) and (3) is most significant for structures narrow in time or frequency relative to the dimensions of the resels.


In spite of the central limit theorem, this magnitude tends to have a nonnormal distribution [10]. Therefore, we use resampling methods.

We estimate the distribution of t from (5), under the null hypothesis of no significant change, from the data in the reference epoch (for each frequency, N · N_ref values) by drawing with replacement two samples of sizes N and N · N_ref and calculating statistics (5) for each such replication. This distribution is approximated once for each frequency. Then for each resel the actual value of (5) is compared to this distribution, yielding p for the null hypothesis.
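A sketch of this resampling scheme follows (our reconstruction; the exact pooled-variance form of the pseudo-t statistic is an assumption, since the paper does not spell out s_Δ):

```python
# Build the null distribution of the pseudo-t statistic (5) for one
# frequency slice. ref_energies holds the N * Nref reference-epoch resel
# energies at that frequency.
import numpy as np

def pseudo_t(sample_a, sample_b):
    diff = sample_a.mean() - sample_b.mean()
    pooled = np.sqrt(sample_a.var(ddof=1) / len(sample_a)
                     + sample_b.var(ddof=1) / len(sample_b))
    return diff / pooled

def null_distribution(ref_energies, N, n_rep=2000, rng=np.random.default_rng(0)):
    stats = np.empty(n_rep)
    for i in range(n_rep):
        a = rng.choice(ref_energies, size=N, replace=True)
        b = rng.choice(ref_energies, size=len(ref_energies), replace=True)
        stats[i] = pseudo_t(a, b)
    return stats   # compare each resel's observed statistic against this
```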

The number of permutations giving values of (5) exceeding the observed value has a binomial distribution for N_rep repetitions with probability α.⁶ Its variance equals N_rep α(1 − α). The relative error of α will then be (cf. [12])

\[ \frac{\sigma_{\alpha}}{\alpha} = \sqrt{\frac{1 - \alpha}{\alpha N_{\mathrm{rep}}}}. \tag{6} \]

To keep this relative error at 10% for a significance level α = 5%, N_rep = 2000 is enough. Unfortunately, due to the problem of multiple comparisons discussed in Section 4.3, we need to work with much smaller values of α. In this study N_rep was set to 2 · 10⁶, which resulted in relatively large computation times.
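As a quick check of the N_rep = 2000 figure (our arithmetic), substituting α = 0.05 into (6) gives
\[ \frac{\sigma_\alpha}{\alpha} = \sqrt{\frac{0.95}{0.05 \cdot 2000}} = \sqrt{0.0095} \approx 0.097 \approx 10\%. \]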

4.3. Adjustment for multiplicity

In the preceding section, we estimated the achieved significance levels p for the null hypotheses of no change of the average energy in each resel, compared to the reference region at the same frequency. Adjusting the results for multiplicity is a very important issue in the case of such a large number of potentially correlated tests. As proposed in [10], it can be effectively achieved using the false discovery rate (FDR, [13]). It controls the ratio q of the number of true null hypotheses rejected to all the rejected hypotheses. In our case this is the ratio of the number of resels to which significant changes may be wrongly attributed, to the total number of resels revealing changes.

Let us denote the total number of performed tests, equal to the number of questioned resels, as m. If for m₀ of them the null hypothesis of no change is true, then [13] proves that the following procedure controls the FDR at the level q(m₀/m) ≤ q.

(1) Order the achieved significance levels p_i, approximated in the previous section for all the resels separately, in an ascending series: p₁ ≤ p₂ ≤ ··· ≤ p_m.

(2) Find
\[ k = \max\left\{ i : p_i \le \frac{i}{m \sum_{j=1}^{m} (1/j)}\, q \right\}. \tag{7} \]

(3) Reject all hypotheses for which p ≤ p_k.

⁶For brevity we omit the distinction between the exact value of α, which would be estimated from all possible repetitions, and the actually calculated value.
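The step-up procedure above translates directly into code; this is a minimal sketch (our illustration) operating on the vector of resel p-values:

```python
# Benjamini-Yekutieli FDR step-up procedure; returns a boolean mask of
# rejected null hypotheses (resels with significant changes).
import numpy as np

def fdr_by(p_values, q=0.05):
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))         # correction for dependency
    thresh = (np.arange(1, m + 1) / (m * c_m)) * q  # i*q / (m * sum(1/j)), (7)
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])            # largest i satisfying (7)
        reject[order[:k + 1]] = True                # reject all p <= p_k
    return reject
```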

4.4. Display of the statistically significant ERD/ERS

Figure 4 gives the final picture of statistically significant changes in the time-frequency plane. It is constructed by displaying the high-resolution ERD/ERS map (Figure 3) only in the areas corresponding to the resels which revealed statistical significance in the procedure from Section 4. Desynchronization of the 12-Hz α rhythm occurs around the time of the movement (12th second). Synchronization of 18–30 Hz β, occurring just after the movement, is divided in half by the harmonic of α (24 Hz). In the long premovement epoch no significant changes are detected, which suggests the robustness and reliability of the whole procedure.

5. CONCLUSIONS

The presented procedure gives high-resolution, cross-term-free estimates of the average time-frequency energy density of event-related EEG, revealing the microstructure of rhythms. Time-frequency areas of significant changes are assessed via objective statistical procedures. This allows, for example, investigating the minimum number of repetitions required to delineate the reactive rhythms. Application of this methodology may bring a significant improvement in basic research on the event-related changes of EEG rhythms, as well as in "per subject" customization of ERD/ERS-based BCI.

6. REPRODUCIBLE RESEARCH

Software for calculating the MP decomposition (appendix), with complete source code in C and executables for GNU/Linux and MS Windows, plus an interactive display and averaging of the time-frequency maps of energy (in Java), is available at http://brain.fuw.edu.pl/∼durka/software/mp. Datasets used in Figures 2–5 and Matlab code for calculating maps and statistics as in Figures 2–4 are available at http://brain.fuw.edu.pl/∼durka/tfstat/.

APPENDIX

MATCHING PURSUIT ALGORITHM

In each of the steps, a waveform g_{γn} from the redundant dictionary D is matched to the signal Rⁿf, which is the residual left after subtracting the results of previous iterations:

\[ \begin{aligned} R^0 f &= f, \\ R^n f &= \left\langle R^n f, g_{\gamma_n} \right\rangle g_{\gamma_n} + R^{n+1} f, \\ g_{\gamma_n} &= \arg\max_{g_{\gamma_i} \in D} \left| \left\langle R^n f, g_{\gamma_i} \right\rangle \right|, \end{aligned} \tag{A.1} \]

where arg max_{g_{γi} ∈ D} denotes the g_{γi} giving the largest value of the product |⟨Rⁿf, g_{γi}⟩|.


Dictionaries (D) for time-frequency analysis of real signals are constructed from real Gabor functions:

\[ g_\gamma(t) = K(\gamma)\, e^{-\pi\left((t-u)/s\right)^2} \sin\!\left( 2\pi \frac{\omega}{N}(t-u) + \phi \right). \tag{A.2} \]

N is the size of the signal, K(γ) is such that ‖g_γ‖ = 1, and γ = (u, ω, s, φ) denotes the parameters of the dictionary's functions. For these parameters no particular sampling is a priori defined. In practical implementations we use subsets of the infinite space of possible dictionary functions. However, any fixed scheme of subsampling this space introduces a statistical bias into the resulting parameterization. A bias-free solution using stochastic dictionaries, where the parameters of the dictionary's functions are randomized before each decomposition, was proposed in [14].

For a complete dictionary the procedure converges to f (in theory, in an infinite number of steps [7]), but in practice we use finite sums:

\[ f \approx \sum_{n=0}^{M} \left\langle R^n f, g_{\gamma_n} \right\rangle g_{\gamma_n}. \tag{A.3} \]

From this decomposition we can derive an estimate E_f(t, ω) of the time-frequency energy density of the signal f by choosing only the autoterms from the Wigner distribution

\[ W f(t,\omega) = \int f\left(t + \frac{\tau}{2}\right) f\left(t - \frac{\tau}{2}\right) e^{-i\omega\tau}\, d\tau, \tag{A.4} \]

calculated for the expansion (A.3). This representation is a priori free of cross-terms:

\[ E_f(t,\omega) = \sum_{n=0}^{M} \left| \left\langle R^n f, g_{\gamma_n} \right\rangle \right|^2 W_{g_{\gamma_n}}(t,\omega). \tag{A.5} \]

ACKNOWLEDGMENTS

Thanks to J. Zygierewicz and J. Ginter for the example datasets. This work was supported by Grant 4T11E02823 of the Committee for Scientific Research (Poland).

REFERENCES

[1] S. Lem, Summa Technologiae, Wydawnictwo Literackie, Krakow, Poland, 2nd edition, 1966.

[2] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, et al., "Brain-computer interface technology: a review of the first international meeting," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 164–173, 2000.

[3] G. Pfurtscheller, C. Neuper, C. Guger, et al., "Current trends in Graz brain-computer interface (BCI) research," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 216–219, 2000.

[4] G. Pfurtscheller, "EEG event-related desynchronization (ERD) and event-related electroencephalogram synchronization (ERS)," in Electroencephalography: Basic Principles, Clinical Applications and Related Fields, E. Niedermayer and F. Lopes Da Silva, Eds., pp. 958–965, Williams & Wilkins, Baltimore, Md, USA, 4th edition, 1999.

[5] W. J. Williams, "Recent advances in time-frequency representations: some theoretical foundations," in Time Frequency and Wavelets in Biomedical Signal Processing, M. Akay, Ed., IEEE Press Series in Biomedical Engineering, pp. 3–43, IEEE Press, Piscataway, NJ, USA, 1997.

[6] D. Harel, Algorithmics: The Spirit of Computing, Addison-Wesley, Reading, Mass, USA, 2nd edition, 1992.

[7] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.

[8] P. J. Durka and K. J. Blinowska, "Analysis of EEG transients by means of matching pursuit," Annals of Biomedical Engineering, vol. 23, no. 5, pp. 608–611, 1995.

[9] P. J. Durka, D. Ircha, C. Neuper, and G. Pfurtscheller, "Time-frequency microstructure of event-related electroencephalogram desynchronization and synchronization," Medical & Biological Engineering & Computing, vol. 39, no. 3, pp. 315–321, 2001.

[10] P. J. Durka, J. Zygierewicz, H. Klekowicz, J. Ginter, and K. J. Blinowska, "On the statistical significance of event-related EEG desynchronization and synchronization in the time-frequency plane," IEEE Trans. Biomed. Eng., vol. 51, no. 7, pp. 1167–1175, 2004.

[11] M. B. Priestley, Spectral Analysis and Time Series, Academic Press, New York, NY, USA, 1981.

[12] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, NY, USA, 1993.

[13] Y. Benjamini and D. Yekutieli, "The control of the false discovery rate in multiple testing under dependency," Ann. Statist., vol. 29, no. 4, pp. 1165–1188, 2001.

[14] P. J. Durka, D. Ircha, and K. J. Blinowska, "Stochastic time-frequency dictionaries for matching pursuit," IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 507–510, 2001.

Piotr J. Durka received his M.S. and Ph.D. degrees in medical physics from Warsaw University, where he is currently an Assistant Professor. His research relates to the methodology of EEG analysis, mainly time-frequency signal processing. He introduced adaptive approximations (the MP algorithm) to EEG analysis; after a decade of successful applications, he aims at the unification of advanced signal processing and the traditional, visual analysis of EEG.


EURASIP Journal on Applied Signal Processing 2005:19, 3128–3140
© 2005 David A. Peterson et al.

Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

David A. Peterson
Department of Computer Science, Center for Biomedical Research in Music, Molecular, Cellular, and Integrative Neurosciences Program, and Department of Psychology, Colorado State University, Fort Collins, CO 80523, USA
Email: [email protected]

James N. Knight
Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
Email: [email protected]

Michael J. Kirby
Department of Mathematics, Colorado State University, Fort Collins, CO 80523, USA
Email: [email protected]

Charles W. Anderson
Department of Computer Science and Molecular, Cellular, and Integrative Neurosciences Program, Colorado State University, Fort Collins, CO 80523, USA
Email: [email protected]

Michael H. Thaut
Center for Biomedical Research in Music and Molecular, Cellular, and Integrative Neurosciences Program, Colorado State University, Fort Collins, CO 80523, USA
Email: [email protected]

Received 1 February 2004; Revised 14 March 2005

Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as "yes" or "no," or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct "yes"/"no" BCI from a single session. Blind source separation (BSS) and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA) wrapped around a support vector machine (SVM) classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a "direct," single-session BCI.

Keywords and phrases: electroencephalogram, brain-computer interface, feature selection, independent components analysis,support vector machine, genetic algorithm.

1. INTRODUCTION

1.1. EEG-based brain-computer interfaces

There is a fast-growing research and development effort underway to implement brain-computer interfaces (BCI) using the electroencephalogram (EEG) [52].

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The overall goal is to provide people with a new channel for communication with the external environment. This is particularly important for patients who are in a "locked-in" state in which conventional motor output channels are compromised.

One simple, desirable BCI function would allow individuals without motor function to respond to questions with simple "yes" or "no" responses [35]. Yet most BCI research has used experiments that require an indirect mapping between what the subject does and the effect on an external system.


For example, subjects may be required to imagine left- or right-hand movement in order to use the BCI [3, 37, 39]. If they want to use the BCI to respond yes/no to questions, they have to remember that left-hand imagined movement corresponds to "yes," and right-hand imagined movement corresponds to "no." Other BCI research requires extensive subject biofeedback training in order for the subject to gain some degree of voluntary influence over EEG features such as slow cortical potentials [5] or 8–12 Hz rhythms [53]. For both the imagined movement and biofeedback scenarios, the mapping between what the subject does and the effect on the BCI is indirect. In the latter case, a single session is insufficient and the subject must undergo many weeks or months of training sessions.

A more direct approach would simply have the subject imagine "yes" or "no" and would not require extensive biofeedback training. While imagined movement and bidirectional influence over time- and frequency-domain amplitude can be readily detected and used as control signals in a BCI, the EEG activity associated with complex cognitive tasks, such as imagining different words, is much more poorly understood. Can advances in signal processing and pattern recognition methods enable us to distinguish whether a subject is imagining "yes" or "no" from the simultaneously recorded EEG? Furthermore, can that distinction be learned in a single recording session?

1.2. The EEG feature space

The EEG measures the scalp-projected electrical activity of the brain with millisecond resolution at up to over 200 electrode locations. Although most EEG-based BCI research uses far fewer electrodes, research into the role of the specific topographic distribution of the electrodes [54] suggests that dense electrode arrays may standardize and enhance the system's performance. Furthermore, advances in electrode and cap technology have made the time required to apply over 200 electrodes reasonable even for BCI patients. EEG analyses, including much of the EEG-based BCI research, make extensive use of the signals' corresponding frequency spectrum. The spectrum is usually divided into five canonical frequency bands. Thus, if one considers the power in each of these bands for each of 200 electrodes, each trial is described by 1000 "features." If interelectrode features such as cross-correlation or coherence are considered, this number grows combinatorially. As in many such problems, a subset of features will often lead to better dissociation between trial types than the full set of features. However, the number of unique feature subsets for N features is 2^N, a space that cannot be exhaustively explored for N greater than about 25. This is but one reason why most EEG research uses only a very small number of features. A significant number of features are discarded, including features that might significantly improve the accuracy with which the signals can be classified.

1.3. Blind source separation of EEG

Given a set of observations, in our case a set of time series, blind source separation (BSS) methods such as independent component analysis (ICA) [22] attempt to find a (usually linear) transformation of the observations that results in a set of independent observations. Infomax [4] is an implementation of ICA that searches for a transformation that maximizes the information between the observations and the transformed signals. Bell and Sejnowski showed that a transformation maximizing the information is, in many cases, a good approximation to the transformation resulting in independent signals. ICA has been used extensively in analyses of brain imaging data, including EEG [26, 34], magnetoencephalogram (MEG) [47, 49], and functional magnetic resonance imaging (fMRI) [26]. Assumptions about how independent brain sources are mixed and map to the recorded scalp electrodes, and the corresponding relevance for BSS methods, are discussed extensively in [27].

Maximum noise fraction (MNF) is an alternative BSS approach for transforming the raw EEG data. It was initially introduced in the context of denoising multispectral satellite data [14]. Subsequently it has been extended to the denoising of time series [1], and it has been compared to principal components analysis and canonical correlation analysis in a BCI [2]. The basis of the MNF subspace approach is to construct a set of basis vectors that optimize the amount of noise (or, equivalently, signal) captured. Specifically, the maximum noise fraction basis maximizes the noise-to-signal (as well as the signal-to-noise) ratio of the transformed signal. Thus, the optimization criterion is based on the ratio of second-order statistical quantities. Furthermore, unlike ICA, the basis vectors have a natural ordering based on the signal-to-noise ratio. MNF is similar to the second-order blind identification (SOBI) algorithm and requires that the signals have different autocovariance structures. The requirement exists because of the second-order nature of the algorithm.
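Concretely, an MNF basis can be obtained from a generalized eigendecomposition of an estimated noise covariance against the data covariance. A minimal sketch follows, assuming the noise covariance is estimated from first differences of the channels (one common heuristic; the papers cited above treat the estimation problem in more depth):

```python
# Minimal MNF sketch. Assumption: first differences of the channels
# approximate the noise process, so Sigma_n below is only an estimate.
import numpy as np
from scipy.linalg import eigh

def mnf_basis(X):
    """X: channels x samples. Returns basis vectors (columns) ordered
    from highest to lowest estimated signal-to-noise ratio."""
    X = X - X.mean(axis=1, keepdims=True)
    N = np.diff(X, axis=1)                  # crude noise estimate
    Sigma_x = X @ X.T / X.shape[1]          # data (signal + noise) covariance
    Sigma_n = N @ N.T / N.shape[1]          # noise covariance estimate
    # Generalized eigenproblem Sigma_n w = lambda Sigma_x w; the eigenvalue
    # is the noise fraction along w, so ascending order = descending SNR.
    noise_fraction, W = eigh(Sigma_n, Sigma_x)
    return W, noise_fraction

# Components S = W.T @ X then carry the natural SNR ordering noted above,
# which plain ICA does not provide.
```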

The relationship of MNF to ICA is a consequence of the fact that they both provide methods for solving the BSS problem [1, 21]. Initial results for the application of MNF to the analysis of EEG time series demonstrated that MNF was simultaneously effective at eliminating noise and at extracting what appeared to be observable phenomena such as eye blinks and line noise [28, 29]. It is interesting that ICA and MNF perform similarly given their disparate formulations. This suggests that under appropriate assumptions (see [1, 21, 28]) the mutual information criterion and the signal-to-noise ratio can be related quantities. However, in the instance that signals of interest are mixed such that they share the same subspace, the MNF approach provides a representation for the mixed and unmixed subspaces.

1.4. Classification and the feature selection problem

The support vector machine (SVM) classifier [45, 48] learns a hyperplane that provides a maximal soft margin between the data classes in a higher-dimensional transform space determined by a choice of kernel function. Although SVMs can fail in problems with many nuisance features [19], they have demonstrated competitive classification performance in difficult domains as diverse as DNA microarray data [8], text categorization [25], and image classification [40]. They have


also been successfully employed in EEG-based BCI research [6, 12, 32, 56]. In contrast to competing nonlinear classifiers such as multilayer perceptrons, SVMs often exhibit higher classification accuracy, are not susceptible to local optima, and can be trained much faster. Because we seek feature subsets that maximize classification accuracy, the feature subset search needs to be driven by how well the data can be classified using the corresponding feature subsets, the so-called "wrapper" approach to feature selection [30]. Thus the speed of SVM training is particularly important, because we will train and test the classifiers for every feature subset we evaluate.

Our prior research with EEG datasets from a cognitive BCI [2] and a movement prediction BCI [12] demonstrated the benefit of feature selection for small and large feature spaces, respectively. There are many ways to implement the feature selection search [7, 16, 42]. One logical choice is a genetic algorithm (GA) [13, 20]. GAs provide a stochastic global search of the feature subset space, evaluating many points in the space in parallel. A population of feature subsets is evolved using crossover and mutation operations akin to natural selection. The evolution is guided by how well feature subsets can classify the trials. GAs have been successfully employed for feature selection in a wide variety of applications [15, 51, 55], including EEG-based BCI research [12, 56]. GAs often exhibit superior performance in domains with many features [46], do not get trapped in local optima as gradient techniques do, and make no assumptions about feature interactions or the lack thereof.

In summary, this paper evaluates a feature selection system for classifying trials in a novel, challenging BCI using spectral features from the original, and two BSS transformations of, scalp-recorded EEG. We hypothesized (1) that classification accuracy would be higher for the feature subsets found by the GA than for full feature sets and random feature subsets, and (2) that the power spectra of the BSS transformations would provide feature subsets with higher classification accuracy than the power spectra of the original signals.

2. METHODS

2.1. Subjects

The subjects were 34 healthy, right-handed, fully informed consenting volunteers with no history of neurological or psychiatric conditions. The present paper is based on data from eight of the subjects who met certain criteria for behavioral measures and details of the EEG recording procedure. Specifically, we selected the eight subjects who wore caps with physically linked mastoids for the reference. Other subjects wore a cap with mastoids digitally linked for the reference. Although the difference between physically and digitally linked mastoid references is minor, it can be nontrivial depending on the relative impedances at the two mastoid electrodes [36]. Thus, to eliminate the possibility that the slight difference in caps could influence the questions at hand, we elected to consider only those subjects wearing the cap with physically linked mastoids. We also considered only those subjects

[Figure 1: visual display ("yes"/"no" word, then <visualize> period) and EEG timelines over 0–3 s, repeated for 100 trials.]

Figure 1: BCI task timeline. Subjects were asked to visualize the most recently presented word until the next word is displayed. The period of simultaneously recorded EEG used for subsequent analysis was 1000 milliseconds long, beginning 750 milliseconds after display offset and 500 milliseconds before the next display onset.

that exhibited reasonable inter-response intervals and a reasonably even distribution of "yes"/"no" responses in a separate, voluntarily decided premotor visualization version of the task (described in a separate forthcoming manuscript). The subjects were selected on these criteria only, before their EEG data were reviewed. The eight subjects were 19 ± 1 years of age and included five females.

2.2. BCI experiment procedure

On each of 100 trials, subjects were shown one of the words "yes" or "no" on a computer display for 750 milliseconds and were instructed to visualize the word until the next word was displayed (see Figure 1). There were 50 "yes" trials and 50 "no" trials presented in random order, with a maximum of three of the same stimulus in a row. Because in subsequent analyses we planned to ignore the first two trials due to experiment start-up transients, the first two trials were required to include exactly one of each type.

2.3. EEG recording and feature composition

The EEG was continuously recorded with a 32-electrode cap (QuikCap, Neuroscan, Inc.), a pass band of 1–100 Hz, and a sampling rate of 1 kHz. Although much higher than the 200 Hz required by Nyquist, we typically sample at 1 kHz for the mere convenience that, in subsequent time-domain analyses and plots, samples are equivalent to milliseconds. Electrodes FC4 and FCZ were excluded because of sporadic technical problems with the corresponding channels in the amplifier. The remaining 30 electrodes used in subsequent analysis included bipolar VEOG and HEOG electrodes commonly used to monitor blinks and eye movement artifacts. All other electrodes were referenced to physically linked mastoids. We did not employ any artifact removal or mitigation in the present study, as we sought to measure performance without the added help or complexity of artifact mitigation techniques.

The BSS methods were applied to the continuously recorded EEG data from the beginning of the first epoch to the end of the last. The majority of the continuous record represented task-related activity because the intertrial period was only approximately 30 milliseconds. We used the Matlab implementation of Infomax available as part of the EEGLAB


software¹ [10]. The EEGLAB software first spheres the data, which decorrelates the channels. This simplifies the ICA procedure to finding a rotation matrix, which has fewer degrees of freedom [23]. Except for the convergence criteria, all of the default parameter values for EEGLAB's Infomax algorithm were used. Initially, extended Infomax, which allows for sub-Gaussian as well as super-Gaussian source distributions, was used. No sub-Gaussian sources were extracted for the first two subjects, so the standard Infomax approach was used on all of the subject data. An initial transformation matrix was found with a tolerance of 0.1. The algorithm was then rerun with this transformation matrix and a tolerance of 0.001.
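For readers without EEGLAB, the whiten-then-rotate structure of this step can be sketched in a few lines. The sketch below uses scikit-learn's FastICA, a different ICA algorithm than Infomax, purely as a stand-in (and assumes scikit-learn ≥ 1.1 for the whiten argument):

```python
# Stand-in ICA sketch; FastICA is NOT Infomax, but both whiten the data
# and then search for a rotation yielding maximally independent sources.
import numpy as np
from sklearn.decomposition import FastICA

def ica_components(X, tol=1e-3):
    """X: channels x samples. Returns sources (components x samples)
    and the unmixing matrix."""
    ica = FastICA(n_components=X.shape[0], whiten="unit-variance",
                  tol=tol, max_iter=1000, random_state=0)
    S = ica.fit_transform(X.T).T   # scikit-learn expects samples x features
    return S, ica.components_
```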

To investigate whether comparing Infomax ICA and the MNF method would be of empirical value, a simple test was performed on the data sets of several subjects. Both transforms were applied to each subject's data and the resulting components were compared. The cross-correlation for all Infomax-MNF component pairs was computed, and the optimal matching was found. This matching paired the components so that the maximal cross-correlation was achieved. Had the components produced been the same, the cross-correlation measure would have been 100%. Cross-correlations of 60–70% were found in the tests performed, and so we decided the two transforms were sufficiently dissimilar to warrant the evaluation of both in the study.
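The optimal pairing step can be cast as a linear assignment problem over the absolute cross-correlation matrix. A minimal sketch, assuming both component sets have the same count and sample length:

```python
# Sketch of the Infomax-vs-MNF component matching test described above.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(S_a, S_b):
    """S_a, S_b: components x samples. Returns the one-to-one pairing
    maximizing total absolute correlation, plus the paired correlations."""
    A = (S_a - S_a.mean(1, keepdims=True)) / S_a.std(1, keepdims=True)
    B = (S_b - S_b.mean(1, keepdims=True)) / S_b.std(1, keepdims=True)
    C = np.abs(A @ B.T) / A.shape[1]        # |correlation| for every pair
    rows, cols = linear_sum_assignment(-C)  # negate: maximize, not minimize
    return list(zip(rows, cols)), C[rows, cols]
```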

Each of the original, Infomax-, and MNF-transformed data sets was "epoched" such that the one-second period beginning 750 milliseconds after stimulus offset was used for subsequent analysis. Because iconic memory is generally thought to last about 500 milliseconds, this choice of temporal window should minimize the influence of iconic memory and place relatively more weight on active visualization processes. We then computed spectral power for each channel (component) and each trial (epoch) using Welch's periodogram method, which averages the spectra from overlapping windows of the epoch. We computed averaged spectral power in the delta (2–4 Hz), theta (4–8 Hz), lower alpha (8–10 Hz), upper alpha (10–12 Hz), beta (12–35 Hz), and gamma (35–50 Hz) frequency bands. Thus, the full feature set contains 30 electrodes × 6 spectral bands each, for a total of 180 features. The first and second trials were excluded to reduce the transient effects of the start of the task. Thus, all subsequent analyses use 49 trials of each type ("yes," "no") for each subject. All reported results are for individual subjects.
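As an illustration of this feature composition step, the following sketch computes the 180-dimensional feature vector for one epoch with SciPy's Welch estimator (the window length nperseg=256 is an assumption; the paper does not state its Welch parameters):

```python
# Band-power feature sketch for one 30-channel, 1000-sample (1 s at 1 kHz) epoch.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (2, 4), "theta": (4, 8), "lower_alpha": (8, 10),
         "upper_alpha": (10, 12), "beta": (12, 35), "gamma": (35, 50)}

def band_power_features(epoch, fs=1000, nperseg=256):
    """epoch: channels x samples. Returns 6 averaged band powers per channel,
    concatenated (30 channels -> 180 features)."""
    f, Pxx = welch(epoch, fs=fs, nperseg=nperseg)  # averaged overlapping windows
    feats = [Pxx[:, (f >= lo) & (f < hi)].mean(axis=1)
             for lo, hi in BANDS.values()]
    return np.concatenate(feats)
```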

2.4. Classification

In the present report, we sought subsets from a very large feature set that would maximize our ability to distinguish "yes" from "no" trials. The distinction was tested with a support vector machine (SVM) classifier and an oversampled variant of 10-fold cross-validation.

As discussed in the introduction, we chose a support vector machine (SVM) classifier because of its record of very

¹ Available from the Swartz Center for Computational Neuroscience, University of California, San Diego, http://www.sccn.ucsd.edu/eeglab/index.html.

good classification performance in challenging problem domains and its speed of training. We used a soft-margin SVM² with a radial basis function (RBF) kernel with $\gamma = 0.1$. The SVM was trained with regularization parameter $\nu = 0.8$, which places an upper bound on the fraction of error examples and a lower bound on the fraction of support vectors [44]. Given $m$ training examples $X = \{x_1, \ldots, x_m\} \subseteq \mathbb{R}^N$ and their corresponding class labels $Y = \{y_1, \ldots, y_m\} \subseteq \{-1, 1\}$, the SVM training produces nonnegative Lagrange multipliers $\alpha_i$ that form a linear decision boundary

$$f(x) = \sum_{i=1}^{m} y_i \alpha_i k(x, x_i) \tag{1}$$

in the feature space³ defined by the Gaussian kernel (of width inversely proportional to $\gamma$):

$$k(x, x_i) = \exp\bigl(-\gamma \|x - x_i\|^2\bigr). \tag{2}$$

On each feature subset evaluation, we trained and tested the SVM on one full run of stratified 10-fold cross-validation, randomly selecting with replacement 10% of the trials on each fold for testing.

2.5. Feature selection

We used a genetic algorithm (GA) to search the space of feature subsets in a "wrapper" fashion (see Figure 2). Individuals in the GA were simply bit strings of length 180, with a 1 indicating that the feature was included in the subset and a 0 indicating that it was not. Our Matlab GA implementation was based on Goldberg's original simple GA [13], using roulette-wheel selection and 1-point crossover. We used conventional values for the probability of crossover (0.6) and that of mutation (1/(4D), where D is the number of features, giving 0.0014). We evolved a population of 200 individuals over 50 generations. Each individual's "fitness" measure was determined by the corresponding subset's mean classification accuracy.
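A compact sketch of this GA loop, using the parameter values just listed and the evaluate_subset routine sketched above (the pairing and bookkeeping details are illustrative, not the authors' Matlab code):

```python
# Simple-GA sketch: roulette-wheel selection, 1-point crossover, bit-flip mutation.
import numpy as np

def run_ga(X, y, D=180, pop_size=200, gens=50, pc=0.6, seed=0):
    rng = np.random.default_rng(seed)
    pm = 1.0 / (4 * D)                          # mutation probability ~ 0.0014
    pop = rng.random((pop_size, D)) < 0.5       # random initial subsets
    best, best_fit = None, -np.inf
    for g in range(gens):
        fit = np.array([evaluate_subset(X, y, ind, seed=g) for ind in pop])
        if fit.max() > best_fit:
            best, best_fit = pop[fit.argmax()].copy(), fit.max()
        # Roulette-wheel selection: pick parents proportionally to fitness.
        parents = pop[rng.choice(pop_size, pop_size, p=fit / fit.sum())]
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):     # 1-point crossover on pairs
            if rng.random() < pc:
                cut = rng.integers(1, D)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        children ^= rng.random((pop_size, D)) < pm   # bit-flip mutation
        pop = children
    return best, best_fit
```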

We instrumented the GA with a mechanism for maintaining information about the cumulative population, that is, all individuals evaluated thus far. Thus, individuals that were evaluated more than once develop a list of evaluation measures (classification accuracies). This took advantage of the inherent "resampling" that occurs in the GA because relatively "fit" individuals are more likely to live on and be reevaluated in later generations than "unfit" individuals. Such resampling, with different partitions of the trials into training/test sets on each new evaluation, reduces the risk of overfitting due to selection bias. The empirical effect of this oversampled variant of cross-validation and its role in feature selection search is illustrated in the first part of Section 3. All

² The SVM was implemented with version 3.00 of the OSU SVM Toolbox for Matlab [33], which is based on version 2.33 of Dr. Chih-Jen Lin's LIBSVM.

3Here “feature space” refers to the space induced by the RBF kernel, notto be confused with the feature space, and implicit space of feature subsets,referred to elsewhere in the manuscript.


[Figure 2 diagram: original data → {Infomax, MNF} → power spectra → features → GA feature subset selection ↔ SVM classification accuracy → feature subsets with highest classification accuracy → dissociating features.]

Figure 2: Feature selection system architecture. Three feature "families" were composed with parallel and/or series execution of signal transformations. Feature subsets are then evaluated with a support vector machine (SVM) classifier, and the space of possible feature subsets is searched by a genetic algorithm (GA) guided by the classification accuracy of the feature subsets. (a) Feature composition. (b) Feature selection. (Adapted from [12, Figure 1].)

subsequent reports of classification accuracy use the mean of the 10 best feature subsets that were subjected to at least five "sample evaluations" each.
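The cumulative-population bookkeeping described above amounts to keying every evaluation by the subset's bit string and averaging. A minimal sketch (function names are illustrative):

```python
# Cumulative-population sketch: subsets reevaluated across generations
# accumulate independent CV "samples"; fitness is their running mean.
from collections import defaultdict
import numpy as np

history = defaultdict(list)          # bit-string key -> list of CV accuracies

def sampled_fitness(ind, X, y, seed):
    key = ind.tobytes()
    history[key].append(evaluate_subset(X, y, ind, seed=seed))
    return float(np.mean(history[key]))

def report_best(k=10, min_samples=5):
    """Mean accuracy of the k best subsets with at least min_samples samples."""
    rows = [(float(np.mean(v)), key) for key, v in history.items()
            if len(v) >= min_samples]
    rows.sort(reverse=True)
    return rows[:k]
```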

3. RESULTS

3.1. Fitness evolution and overfitting at the feature selection level

Figure 3 shows how the fitness of feature subsets evolves over generations of the GA. In these and subsequent figures, the "chance" level of classification accuracy (50%) is shown with a dotted line. Note that even at the first generation of randomly selected feature subsets, the average performance of the population is slightly above chance, at 54%. This suggests that, on average, randomly chosen feature subsets provide some small discriminatory information to the classifier. The approximately 70% maximum mean fitness in the first generation of the GA represents a single "sampling" of the 10-fold cross-validation. Thus, there exists a set of 10 randomly chosen training/test trial partitions for which one of the 200 initial, randomly chosen feature subsets gave 70% classification accuracy. However, such results need to be assessed with caution, as illustrated in the right panel of Figure 3. Further "sampling" for a given feature subset (i.e., repetitions of a full 10-fold cross-validation) gives a more accurate picture of that feature subset's ability to dissociate the "yes" and "no" trials.

3.2. The benefit of feature selection

Figure 4 shows how classification accuracy is improved when comparing feature subsets selected by the GA with full feature sets. For every signal representation (original, Infomax, and MNF), every subject's "yes"/"no" visualizations are better distinguished with feature subsets than with the whole feature set.

3.3. The benefit of BSS transformations

Figure 5 shows, for each subject, how the classification accuracies compare for the original signals and the two BSS transformations. For every subject, at least one of the BSS transformations leads to better classification accuracy than the original signals. Spectra of the Infomax and MNF transformations performed statistically significantly better than the spectra of the original signals for every subject except subject 1 (and except MNF for subject 5; Wilcoxon rank-sum test, alpha = 0.05). The relative performance of the three transformations does not appear to be an artifact of random processes in the GA, because it holds across two entirely separate runs of the GA.

3.4. Intersubject variability in good feature subsets

Figure 6 shows the features selected for the feature subsets that provided the highest classification accuracy. For both subjects, the features include a diverse mix of electrodes and frequency bands. Although spatial trends emerge (e.g., the full power spectrum was included for electrodes FC3 and CZ), no single frequency band was included across all electrodes. Also, there appears to be some consistency between subjects in terms of the selected features. Subject 1's best feature subset included 106 features and subject 6's best feature subset included 91 features. The two subjects' best subsets had 57 features in common, including broadband features from central and left frontocentral scalp regions.

3.5. Feature values corresponding to the "yes" and "no" trials

Figure 7 shows the median values of the features across the 49 trials of each type for subject 6. Although a spatiospectral pattern of differences is shown in the lower part of the figure, none of the individual features exhibited significant differences between the two conditions after adjusting for multiple comparisons; a few were significant at the uncorrected p < 0.05 level (p = 0.02–0.03). Some of the features with notable differences between "yes" and "no" were included in subject 6's best feature subset (e.g., multiple bands from CZ, FZ, and FC3). However, a number of such features were not included in subject 6's best feature subset (e.g., delta band power in P3, F7, FP2, and F8; see Figures 6a and 7c).


[Figure 3 plots: classification accuracy (%) versus (a) generation, with "max" and "avg" traces, and (b) number of samples.]

Figure 3: Feature subset evolution and overfitting. (a) Mean fitness of all individuals in the cumulative population as of that generation; "avg" is the average and "max" the maximum mean fitness. Data shown are for subject 6, Infomax transformation. Note that the maximum mean fitness in the cumulative population does not monotonically increase, because repeated sampling of a particularly fit individual may reduce that individual's mean fitness value (see (b)). (b) Mean fitness of the best individual in the population for each of several different "sampling" values. Each "sample" is the mean classification accuracy from a full 10-fold cross-validation run, which uses 10 randomly selected train/test partitions of the trials for that subject. The generally decreasing function reflects overfitting at the feature selection level, whereby so many feature subset evaluations occur that the system finds train/test partitions of the trials that lead to higher-than-average fitness for a specific feature subset. Additional sampling of how well that feature subset classifies the data increases confidence that the oversampled result is not simply due to 10 fortuitous partitions of the trials.

4. DISCUSSION

4.1. Feature selection in the EEG-based BCI

We implemented a feature selection system for optimizing classification in a novel, "direct" EEG-based BCI. For all three representations of the signals (original, Infomax, and MNF) and for all subjects, the GA-based search of the feature subset space leads to higher classification rates than both the full feature sets and randomly selected subsets. This indicates that choosing feature subsets can improve corresponding classification in an EEG-based BCI. It also indicates that it is not simply smaller feature sets that lead to improved classification, but the selection of specific "good" feature subsets. Also, classification accuracy improves over generations of the GA's feature subset search, indicating that the GA's iterative search process leads to improved solutions. We ran the GA for over 700 generations on one subject's Infomax data, and the resultant feature subsets demonstrated more than a 14% increase in classification accuracy over that obtained after just 50 generations. Although this suggests that an extensive search of the feature subset space may be beneficial, the roughly one week of additional computational time may be inappropriate for some BCI research settings.

Note that, as mentioned in the introduction, there are many ways to conduct the feature subset search, and the GA is only one family of such search methods. Sequential forward (or backward) search (SFS) methods add features one at a time but can suffer from nesting, wherein optimal subsets are missed because previously "good" features are no longer jointly "good" with other, newer features and cannot be removed. The same limitation applies to backward versions of SFS that subtract single features from a full feature set. Floating versions of these methods, sequential forward floating search (SFFS) and sequential backward floating search


[Figure 4 plots: classification accuracy (%) for "All" versus "Subset," panels (a)-(c).]

Figure 4: Feature subsets outperform the whole feature set across feature classes and subjects. "All" refers to the full set of all features, and "subset" refers to the feature subsets found by the GA. Each line connects the mean classification accuracies for both cases for a single subject, for each of the (a) "original," (b) "Infomax," and (c) "MNF" transformations.

(SBFS) [41], mitigate the nesting problem by variably adding and removing previously added features. In principle, both GAs and the floating methods allow for complex feature-feature interactions. However, their migration through the subset space can differ substantially. Depending on how they are implemented, sequential methods can implicitly assume a certain ordering of the features, whereas GAs do not make that assumption. Similarly, SFFS/SBFS are not as "global" in their search as a GA. The floating search methods cannot "jump" from one subset to a very different subset in a single step, as is inherent in typical GA implementations. Whether or to what extent these differences affect the efficacy of the search methods depends on the problem domain and needs to be evaluated empirically. A few investigators have compared the floating search methods SFFS/SBFS to GAs for feature selection [11, 24, 31]. Kudo and Sklansky have demonstrated that GAs outperform SFFS and SBFS when the number of features is greater than about 50 [31]. Another class of feature selection methods is known as "embedded" methods. In the embedded approach, the process of selecting features is embedded in the use of the classifier. One example is recursive feature elimination (RFE) [17, 50], which has recently been used in an EEG-based BCI [32]. RFE takes advantage of the feature ranking inherent in using a linear SVM. However, as with other embedded approaches to feature selection, it lacks the flexibility of wrapper methods because, by definition, the feature subset search cannot be separated from the choice of classifier. Feature selection research with EEG has only recently begun, and a comparison of feature selection methods on EEG needs to be conducted.
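For contrast with the wrapper approach, here is a minimal sketch of the embedded RFE alternative mentioned above, using scikit-learn's RFE driven by a linear SVM's weights (the step size and number of retained features are arbitrary choices for illustration):

```python
# Embedded feature selection sketch: recursive feature elimination (RFE)
# repeatedly trains a linear SVM and drops the lowest-weight features.
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

def rfe_subset(X, y, n_keep=30):
    """X: trials x 180 features; y: +/-1 labels. Returns a boolean mask."""
    selector = RFE(LinearSVC(C=1.0, max_iter=5000),
                   n_features_to_select=n_keep, step=5)
    selector.fit(X, y)
    return selector.support_
```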

We also demonstrated and addressed the issue of overfitting at the level of feature selection. The sensitivity of any single feature subset's performance to the specific set of 10 train/test trial partitions is a testament to the well-known but often overlooked trial-to-trial variability of the EEG. It is also an empirical illustration of overfitting resulting from extensive search of the feature subset space, also known as "selection bias" [43]. Our feature subset search conducts many feature subset evaluations (e.g., 200 individuals over 50 generations = 10,000 evaluations), and there are many ways to randomly choose a partition of training/test trials. Thus, there exist 10 random training/test partitions of the trials for which specific feature subsets will do much better than average if evaluated over other sets of 10 random train/test partitions. Fundamentally, as more points in the feature subset space are tested, the risk of finding fortuitous sets of train/test partitions increases, so greater partition sampling is required. In the case of a GA-based feature selection algorithm, we could make the partition sampling dynamic by, for example, increasing the amount of resampling as the GA progresses through generations of evolution. However, increasing the data partition sampling over the course of the feature subset search would of course slow down the system as the search progresses. Nevertheless, the GA's inherent resampling and the


[Figure 5 plots: classification accuracy (%) by (a) subject 1–8 and (b) GA run, for original, Infomax, and MNF features.]

Figure 5: The benefit of the BSS transformations and the replicability of their relative value between GA runs. (a) Mean classification accuracy of the 10 best feature subsets with at least 5 "sample evaluations." (b) The performance results for the three transformations for subject 5 over two separate runs of the GA.

[Figure 6 grids: electrodes (FP1–O2) by frequency bands (δ, θ, lower α, upper α, β, γ), panels (a) and (b).]

Figure 6: Features selected in a "good" subset of the original spectral features and their overlap between two subjects. (a) Subject 6, (b) subject 1. White indicates the feature was not selected, grey indicates that the feature was selected for that subject only, and black indicates that the feature was selected for both subjects.

ease with which such resampling can be implemented in a GA provide yet another reason to use a GA for the feature subset search in extremely noisy domains such as EEG.

How best to address the overfitting issue remains an active line of research. There are numerous data partitioning and resampling methods, such as leave-one-out or the bootstrap. Although we partially mitigated the issue by using an oversampled variant of cross-validation, a more principled approach needs to be developed for highly noisy, underdetermined problem domains. Although one should use as test data trials unseen during the feature subset search [43], this further exacerbates the problem of having so few trials, as is typically the case with single-session EEG experiments. The current experiment had roughly 50 trials per condition per subject. Although experimental sessions with many more trials per condition raise concerns about habituation and arousal, the benefits for evaluating classifiers and associated feature selection may outweigh the disadvantages. In cases such as the present study, with a limited number of trials, oversampling methods such as the bootstrap or the resampling GA variant we used may provide a reasonable alternative to the full, nested cross-validation implied by separate classifier model selection and feature subset search.

4.2. The classifier and subset search parameter space

We used only nonlinear SVMs in this study. A theoretical advantage over linear SVMs is that they can capture nonlinear relationships between features and the classes. However,


[Figure 7 grids: electrodes by frequency bands (δ, θ, lower α, upper α, β, γ); color scales 0–4 for (a) and (b), −0.5–0.5 for (c).]

Figure 7: Median feature values for the two kinds of trials. (a) "Yes," (b) "no," and (c) difference values for subject 6, original spectral features. Bars on the right show normalized spectral power (or power difference, for "yes" − "no").

nonlinear classifiers have the disadvantage that the classifier's weights do not provide a simple proxy measure of an input feature's importance, as is the case with the linear SVM formulation. We also used only one setting of SVM parameters in this study. The optimal width of the Gaussian SVM kernel, $\gamma$, in particular, is known to be sensitive to the classifier's input dimensionality (number of features). Although we could have varied $\gamma$ as a function of the subset size, we explicitly chose not to. If we had varied $\gamma$ in a principled way (e.g., larger for fewer features), the exact formulation would be arbitrary. Had we instead conducted SVM model selection and optimized $\gamma$ empirically, it would have introduced another loop of cross-validation in addition to that used to train and test the SVM for every subset evaluation. This would not only be substantially more computationally demanding, but would also exacerbate the risk of overfitting or reduce the number of trials available for training/testing. In either case, allowing $\gamma$ to vary would introduce another variable, and we would not know whether differences in performance between feature subsets should be attributed to the subsets themselves or to their correspondingly tuned classifier parameters. Although the relative performance of the full versus partial feature subsets is sensitive to $\gamma$, we expect that the relationship found in the present study would remain, because feature selection usually improves classification accuracy in EEG-based BCIs. Note also that the relative performance of feature selection using the original versus BSS-based features was based on a consistent application of $\gamma$, and the subsets contained roughly equivalent numbers of features.

We also used only one setting of GA parameters in this study. In general, one would expect that classification accuracy and the feature selection process are sensitive to the parameters used in the SVM and GA. In fact, especially in wrapper approaches to feature selection, the classifier's optimal parameters and the optimal feature selection search algorithm parameters will not be independent. In other words, the optimal SVM model parameters will be sensitive to the specific feature subset, and vice versa. Thus, it may be suboptimal to conduct the model selection separately from the feature selection. Instead, the SVM model selection process and the feature subset search should be conducted simultaneously rather than sequentially. We have recently demonstrated this empirically with DNA microarray data [38], a domain with noise characteristics and input dimensionality not unlike those of EEG features. Although the SVM parameters could be encoded into a bit string and optimized with a GA in conjunction with the feature subset, the two optimization problems are qualitatively different and should probably be conducted with separate mechanisms. This remains a question for further research.

4.3. BSS in EEG-based BCI

Our results showed that the power spectra of the BSS transformations provided feature subsets with higher classification accuracy than the power spectra of the original EEG signals. This improvement held for seven out of eight subjects and was consistent across independent runs of the GA. The results suggest that BSS transformations of EEG signals provide features with stronger dissociating power than features based on spectral power of the original EEG signals. Infomax and MNF differed only slightly, but both provided a marked improvement in classification accuracy over spectral transformations of the original signals. This suggests that the use of a BSS method may be more important than the choice of specific BSS method, although further tests with other BSS methods and other datasets would be required to substantiate that interpretation.

In some EEG research using ICA, the investigator evaluates independent components manually. This can be considered a manual form of feature selection. However, as with the "filter" approach to feature selection, the features are


not selected based on their impact on the accuracy of the final classifier in which they are used. Rather, they are selected based on characteristics such as their scalp topography, the morphology of their time course, or the variance of the original signal for which they account. In some cases, the decision about which features to keep is subjective. In the present study we explicitly chose not to take this approach. Instead, we used the wrapper approach to search the full feature set based exclusively upon the components' contribution to classification. Of course, this does not preclude the possibility that preceding automated feature selection with a manual, filter-style feature selection step would improve overall performance. Many domains benefit from the joint application of manual and automated approaches, including methods that do and do not leverage domain-specific knowledge.

4.3.1. “Good” feature subsets

Subjects’ best feature subsets included many features fromthe full feature set. We believe that this may be at least par-tially the result of crossover in the GA, whereby new individ-uals will tend toward having approximately half of the fea-tures selected. The fitness function used by the GA to searchthe space of feature subsets used only those subsets’ classi-fication accuracy. We did not use selective pressure to re-duce the number of features in selected subsets. However,this could be easily implemented by simply biasing the fit-ness function with a term that weights the cardinality of thesubsets under consideration. If there exist many feature sub-sets of low cardinality that perform roughly as well as subsetswith higher cardinality, then one would generally prefer thelow-cardinality solutions because subsets with fewer featureswould, in general, be easier to analyze and interpret.

Good feature subsets included a disproportionately high representation of left frontocentral electrodes. This topography is consistent with a role for language production, including subvocal verbal rehearsal. It suggests that the cortical networks involved in rehearsing words may exhibit dissociable patterns of activity for different words. The spatial information in the EEG scalp topography is insufficient to determine whether the networks used for rehearsing the two words had differentiable anatomical substrates. However, such differences may be detectable with dipole analysis of high-density EEG and/or functional neuroimaging.

We compared subjects’ good subsets of spectral powerbased on original EEG signals. Of the two subjects whose bestfeature subsets we analyzed, approximately 60% of the in-cluded features were common to both subjects. The commonfeatures included several spectral bands in left frontocentralelectrodes. We did not compare subjects’ good subsets usingBSS-transformed EEG. One disadvantage of the BSS meth-ods is that, because they are usually used to transform fullcontinuous EEG recordings on a per-subject basis, there isno immediately apparent way to match one subject’s com-ponents with another subject’s components. Although thiscan be attempted manually, the process can be subjective andproblematic. Often only some of the components have sim-ilar topographies and/or time courses between subjects, and

the degree of similarity can be quite variable. Thus it may bedifficult to compare selected features among different sub-jects when the features are based on BSS transformations ofthe original EEG signals.

The pattern of actual feature values was very similar for the "yes" and "no" trials. Because both conditions involved the same type of task, it is reasonable to assume that the associated brain activity would be similar at the level of scalp-recorded EEG. None of the individual features differed significantly between the two conditions. Although some of the features with the highest amplitude differences between "yes" and "no" were included in the best (most dissociating) feature subsets, other such features were not. At the current point in this research, we cannot conclude whether this is because certain features were not considered in the GA-based search, or because the interactions of certain features do better than those single features. Evidence for or against the former interpretation could be obtained by adding a simple per-feature test to the GA's search of the feature subset space. Note that single features can have identical means (indeed, even identical distributions) for "yes" and "no" trials, yet contribute to a feature subset's ability to dissociate the two trial types because of class-conditional interdependencies between the features. Per-feature statistical tests, and some feature selection methods, for that matter, assume the features are independent, ignoring any interactions among the features. Such assumptions are generally too limiting for complex, high-dimensional domains such as EEG. Besides, even when the features are independent, there are cases in which the d best features are not the same as the best d features [9, 18].
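The class-conditional interdependency point has a classic worked example: two binary features whose class-conditional marginals are identical, yet whose combination (an XOR pattern) separates the classes perfectly. A tiny illustration:

```python
# XOR illustration: each feature alone is useless, the pair is perfect.
import numpy as np

X = np.array([[0, 0], [1, 1],    # class +1 trials
              [0, 1], [1, 0]])   # class -1 trials
y = np.array([1, 1, -1, -1])

for j in (0, 1):                 # per-feature class means are identical (0.5)
    print(j, X[y == 1, j].mean(), X[y == -1, j].mean())

print(np.where(X[:, 0] == X[:, 1], 1, -1))  # joint rule reproduces y exactly
```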

4.4. BCI application relevance

Our BCI task design provides a native interface for a patient without any motor control to respond "yes" or "no" to questions directly [35]. The paradigm provides a good model for a BCI setting in which the caregiver initiates dialog with the patient. Furthermore, it avoids the indirect mappings and extensive biofeedback training required in other BCI designs. However, this "direct" task design has some clear limitations. First, we do not have any control over what the subject is doing when they are supposed to be visualizing the word. The subjects could have been daydreaming on some trials or, perhaps even worse, still visualizing the word from an earlier trial. Of course this would degrade classification accuracy, and it may be a more severe problem for neurologically impaired patients compared to the healthy, albeit perhaps less motivated, volunteers we used. Second, even if subjects are performing the task as instructed, different subjects may use different cognitive processes with correspondingly different neural substrates. For example, subjects who maintain visualizations close to the original percept will recruit relatively more early visual system activity (e.g., in occipital/temporal areas), whereas subjects who maintain the word in a form of working memory will probably recruit the frontotemporal components of the phonological loop. These two systems involve cortico-cortical and thalamocortical loops producing different changes in oscillatory electrophysiology, usually manifest as changes in the gamma and theta/alpha bands,


respectively. Thus, the spectral and topographic features that best distinguish the yes/no responses will most likely vary per subject. Indeed, this is one of the biggest motivations for taking a feature selection approach to EEG-based BCIs and conducting the feature selection search on a strictly per-subject basis, as we did in the present study. Third, and perhaps most notably, the classification accuracy is far below that obtained in studies using "indirect" approaches. Nothing about our approach precludes having more than one session, and therefore many more trials, with which to learn good feature subsets and improve classification accuracy. Also, although indirect approaches will probably continue to provide high classification accuracy (and therefore a generally higher bit rate) for the near future, advances in basic cognitive psychology and cognitive neuroscience may provide more clues about what might be good EEG features for distinguishing direct commands such as visualizing or imagining yes/no or on/off responses. In the meantime, BSS transformations and feature selection may provide moderate classification performance in "direct" BCIs and even help inform basic scientists about the EEG features on which to focus their research.

Our approach to feature selection is amenable to the development of on-line BCI applications. One could use the full system, including the GA, to learn off-line the best feature subset for a given subject and task, then use the trained SVM with that feature subset, and without the GA, in an on-line setting. Dynamic adjustments to the optimal feature subset can be continuously identified off-line and reincorporated into the on-line system. Also, as suggested in the results, the best feature subset may include features from only a small subset of electrodes. The potentially much smaller number of electrodes could be applied to the subject, reducing application time and the risk of problematic electrodes for easier on-line use of the BCI. Although we intentionally used a design without biofeedback, one could supplement this design with feedback. Other groups have found that incorporating feedback can increase classification accuracy. Feature selection could provide guidance on which features are most significant for dissociating classes of EEG trials, and therefore one source of guidance in choosing the information to use in the feedback signals provided to the subject.

5. CONCLUSION

Signal processing and machine learning can be used to enhance classification accuracy in BCIs where a priori information about dissociable brain activity patterns does not exist. In particular, blind source separation of the EEG signals prior to their spectral power transformation leads to increased classification accuracy. Also, even sophisticated classifiers like the support vector machine can benefit from the use of specific feature subsets rather than the full set of possible features. Although the search for feature subsets exacerbates the risk that the classifier will overfit the trials used to train the BCI, a variety of methods exist for mitigating that risk and can be assessed over the course of the feature subset search. Feature selection is a particularly promising line of investigation for signal processing in BCIs because it can be used off-line to find the subject-specific features that yield optimal on-line performance.

ACKNOWLEDGMENTS

The authors thank three anonymous reviewers for many helpful comments on the original manuscript, Dr. Carol Seger for use of Psychology Department EEG Laboratory resources, and Darcie Moore for assistance with data collection. Partial support was provided by a Colorado Commission on Higher Education Center of Excellence Grant to Michael H. Thaut and National Science Foundation Grant 0208958 to Charles W. Anderson and Michael J. Kirby.

REFERENCES

[1] M. G. Anderle and M. J. Kirby, "An application of the maximum noise fraction method to filtering noisy time-series," in Proc. 5th International Conference on Mathematics in Signal Processing, University of Warwick, Coventry, UK, 2001.

[2] C. W. Anderson and M. J. Kirby, "EEG subspace representations and feature selection for brain-computer interfaces," in Proc. 1st IEEE Conference on Computer Vision and Pattern Recognition Workshop for Human Computer Interaction (CVPRHCI '03), vol. 5, Madison, Wis, USA, June 2003.

[3] F. Babiloni, F. Cincotti, L. Lazzarini, et al., "Linear classification of low-resolution EEG patterns produced by imagined hand movements," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 186–188, 2000.

[4] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.

[5] N. Birbaumer, A. Kubler, N. Ghanayim, et al., "The thought translation device (TTD) for completely paralyzed patients," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 190–193, 2000.

[6] B. Blankertz, G. Curio, and K.-R. Muller, "Classifying single trial EEG: towards brain computer interfacing," in Neural Information Processing Systems (NIPS '01), T. G. Diettrich, S. Becker, and Z. Ghahramani, Eds., vol. 14, pp. 157–164, Vancouver, BC, Canada, December 2001.

[7] A. L. Blum and P. Langley, "Selection of relevant features and examples in machine learning," Artificial Intelligence, vol. 97, no. 1-2, pp. 245–271, 1997.

[8] M. P. S. Brown, W. N. Grundy, D. Lin, et al., "Knowledge-based analysis of microarray gene expression data by using support vector machines," Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 1, pp. 262–267, 2000.

[9] T. M. Cover, "The best two independent measurements are not the two best," IEEE Trans. Syst., Man, Cybern., vol. 4, no. 1, pp. 116–117, 1974.

[10] A. Delorme and S. Makeig, "EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis," Journal of Neuroscience Methods, vol. 134, no. 1, pp. 9–21, 2004.

[11] F. Ferri, P. Pudil, M. Hatef, and J. Kittler, "Comparative study of techniques for large-scale feature selection," in Pattern Recognition in Practice IV: Multiple Paradigms, Comparative Studies, and Hybrid Systems, E. S. Gelsema and L. N. Kanal, Eds., pp. 403–413, Vlieland, The Netherlands, June 1994.


[12] D. Garrett, D. A. Peterson, C. W. Anderson, and M. H. Thaut, "Comparison of linear, nonlinear, and feature selection methods for EEG signal classification," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, no. 2, pp. 141–144, 2003.

[13] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison Wesley, Reading, Mass, USA, 1989.

[14] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, "A transformation for ordering multispectral data in terms of image quality with implications for noise removal," IEEE Trans. Geosci. Remote Sensing, vol. 26, no. 1, pp. 65–74, 1988.

[15] C. Guerra-Salcedo and D. Whitley, "Genetic approach to feature selection for ensemble creation," in Proc. Genetic and Evolutionary Computation Conference (GECCO '99), pp. 236–243, Orlando, Fla, USA, July 1999.

[16] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, no. 7-8, pp. 1157–1182, 2003.

[17] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 389–422, 2002.

[18] D. J. Hand, Discrimination and Classification, John Wiley & Sons, New York, NY, USA, 1981.

[19] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2001.

[20] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.

[21] D. R. Hundley, M. J. Kirby, and M. Anderle, "Blind source separation using the maximum signal fraction approach," Signal Processing, vol. 82, no. 10, pp. 1505–1508, 2002.

[22] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, NY, USA, 2001.

[23] A. Hyvarinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.

[24] A. Jain and D. Zongker, "Feature selection: evaluation, application, and small sample performance," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, no. 2, pp. 153–158, 1997.

[25] T. Joachims, "Text categorization with support vector machines," in Proc. 10th European Conference on Machine Learning (ECML '98), pp. 137–142, Chemnitz, Germany, April 1998.

[26] T.-P. Jung, S. Makeig, M. J. McKeown, A. J. Bell, T.-W. Lee, and T. J. Sejnowski, "Imaging brain dynamics using independent component analysis," Proc. IEEE, vol. 89, no. 7, pp. 1107–1122, 2001.

[27] T.-P. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T. J. Sejnowski, "Analysis and visualization of single-trial event-related potentials," Human Brain Mapping, vol. 14, no. 3, pp. 166–185, 2001.

[28] M. J. Kirby and C. W. Anderson, "Geometric analysis for the characterization of nonstationary time-series," in Perspectives and Problems in Nonlinear Science: A Celebratory Volume in Honor of Larry Sirovich, E. Kaplan, J. Marsden, and K. R. Sreenivasan, Eds., chapter 8, pp. 263–292, Springer Applied Mathematical Sciences Series, Springer, New York, NY, USA, March 2003.

[29] J. N. Knight, Signal Fraction Analysis and Artifact Removal in EEG, Department of Computer Science, Colorado State University, Fort Collins, Colo, USA, 2003.

[30] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[31] M. Kudo and J. Sklansky, "Comparison of algorithms that select features for pattern classifiers," Pattern Recognition, vol. 33, no. 1, pp. 25–41, 2000.

[32] T. N. Lal, M. Schroder, T. Hinterberger, et al., "Support vector channel selection in BCI," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1003–1010, 2004.

[33] J. Ma, Y. Zhao, and S. Ahalt, OSU SVM Classifier Matlab Toolbox, Ohio State University, Columbus, Ohio, USA, 2002.

[34] S. Makeig, M. Westerfield, T.-P. Jung, et al., "Functionally independent components of the late positive event-related potential during visual spatial attention," The Journal of Neuroscience, vol. 19, no. 7, pp. 2665–2680, 1999.

[35] L. A. Miner, D. J. McFarland, and J. R. Wolpaw, "Answering questions with an electroencephalogram-based brain-computer interface," Archives of Physical Medicine and Rehabilitation, vol. 79, no. 9, pp. 1029–1033, 1998.

[36] P. L. Nunez, R. Srinivasan, A. F. Westdorp, et al., "EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales," Electroencephalography and Clinical Neurophysiology, vol. 103, no. 5, pp. 499–515, 1997.

[37] W. D. Penny, S. J. Roberts, E. A. Curran, and M. J. Stokes, "EEG-based communication: a pattern recognition approach," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 214–215, 2000.

[38] D. A. Peterson and M. H. Thaut, "Model and feature selection in microarray classification," in Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '04), pp. 56–60, La Jolla, Calif, USA, October 2004.

[39] G. Pfurtscheller, C. Neuper, C. Guger, et al., "Current trends in Graz brain-computer interface (BCI) research," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 216–219, 2000.

[40] M. Pontil and A. Verri, "Support vector machines for 3D object recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 6, pp. 637–646, 1998.

[41] P. Pudil, J. Novovicova, and J. Kittler, "Floating search methods in feature selection," Pattern Recognition Letters, vol. 15, no. 11, pp. 1119–1125, 1994.

[42] B. Raman and T. R. Ioerger, "Enhancing learning using feature and example selection," Tech. Rep., Department of Computer Science, Texas A&M University, College Station, Tex, USA.

[43] J. Reunanen, "Overfitting in making comparisons between variable selection methods," Journal of Machine Learning Research, vol. 3, no. 7-8, pp. 1371–1382, 2003.

[44] A. J. Smola, B. Scholkopf, R. C. Williamson, and P. L. Bartlett, "New support vector algorithms," Neural Computation, vol. 12, no. 5, pp. 1207–1245, 2000.

[45] B. Scholkopf, C. J. C. Burges, and A. J. Smola, Eds., Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, Mass, USA, 1999.

[46] W. Siedlecki and J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognition Letters, vol. 10, no. 5, pp. 335–347, 1989.

[47] A. C. Tang and B. A. Pearlmutter, "Independent components of magnetoencephalography: localization and single-trial response onset detection," in Magnetic Source Imaging of the Human Brain, L. Kaufman and Z. L. Lu, Eds., pp. 159–201, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2003.

[48] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.

[49] R. Vigario, J. Sarela, V. Jousmaki, M. Hamalainen, and E. Oja, "Independent component approach to the analysis of EEG and MEG recordings," IEEE Trans. Biomed. Eng., vol. 47, no. 5, pp. 589–593, 2000.

[50] J. Weston, A. Elisseeff, B. Scholkopf, and M. E. Tipping, "Use of the zero-norm with linear models and kernel methods," Journal of Machine Learning Research, vol. 3, no. 7-8, pp. 1439–1461, 2003.

[51] L. D. Whitley, J. R. Beveridge, C. Guerra-Salcedo, and C. R. Graves, "Messy genetic algorithms for subset feature selection," in Proc. 7th International Conference on Genetic Algorithms (ICGA '97), T. Baeck, Ed., pp. 568–575, Morgan Kaufmann, East Lansing, Mich, USA, July 1997.

[52] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.

[53] J. R. Wolpaw, D. J. McFarland, G. W. Neat, and C. A. Forneris, "An EEG-based brain-computer interface for cursor control," Electroencephalography and Clinical Neurophysiology, vol. 78, no. 3, pp. 252–259, 1991.

[54] J. R. Wolpaw, D. J. McFarland, and T. M. Vaughan, "Brain-computer interface research at the Wadsworth Center," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 222–226, 2000.

[55] J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," in Feature Extraction, Construction and Selection: A Data Mining Perspective, H. Liu and H. Motoda, Eds., pp. 117–136, Kluwer Academic, Boston, Mass, USA, 1998.

[56] E. Yom-Tov and G. F. Inbar, "Feature selection for the classification of movements from single movement-related potentials," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 10, no. 3, pp. 170–177, 2002.

David A. Peterson is a Ph.D. candidate in the Computer Science Department at Colorado State University (CSU) and part of the Cognitive Neuroscience Group affiliated with CSU's Center for Biomedical Research in Music. He received a B.S. degree in electrical engineering and a B.S. degree in finance from the University of Colorado at Boulder. He did business data network consulting for Accenture (previously Andersen Consulting) prior to returning to academia. His research is on biomedical applications of machine learning, with an emphasis on classification and feature selection. He has published research in areas as diverse as mammalian taste coding, brain oscillations associated with working memory, and the interaction of model and feature selection in microarray classification. His current interests are in cognitive, EEG-based brain-computer interfaces and the influence of rhythmic musical structure on the electrophysiology of verbal learning.

James N. Knight is currently a Ph.D. student at Colorado State University. He received his M.S. degree in computer science from Colorado State University and his B.S. degree in math and computer science from Oklahoma State University. His research areas include signal processing, reinforcement learning, high-dimensional data modeling, and the application of Markov chain Monte Carlo methods to problems in surface chemistry.

Michael J. Kirby received the B.S. degree in mathematics from MIT (1984), and the M.S. degree (1986) and Ph.D. degree (1988), both from the Division of Applied Mathematics, Brown University. He joined Colorado State University in 1989, where he is currently a Professor of mathematics and computer science. He was an Alexander von Humboldt Fellow (1989–1991) at the Institute for Information Processing, University of Tuebingen, Germany, and received an Engineering and Physical Sciences Research Council (EPSRC) Visiting Research Fellowship (1996). He received an IBM Faculty Award (2002) and the Colorado State University, College of Natural Sciences Award for Graduate Student Education (2002). His interests are in the area of geometric methods for modeling large data sets, including algorithms for the representation of data on manifolds and data-driven dimension estimation. He has published widely in this area, including the textbook Geometric Data Analysis (Wiley & Sons, 2001).

Charles W. Anderson received the B.S. degree in computer science in 1978 from the University of Nebraska, and the M.S. and Ph.D. degrees in computer science in 1982 and 1986, respectively, from the University of Massachusetts, Amherst. From 1986 through 1990, he was a Senior Member of Technical Staff at GTE Labs in Waltham, Mass. He is now an Associate Professor in the Department of Computer Science at Colorado State University in Fort Collins, Colo. His research interests are in neural networks for signal processing and control. Specifically, he is currently working with medical signals and images and with reinforcement learning methods for the control of heating and cooling systems. Additional information can be found at http://www.cs.colostate.edu/∼anderson.

Michael H. Thaut is a Professor of neurosciences and the Chair of the Department of Music, Theatre, and Dance at Colorado State University. He is also the Head of the Center for Biomedical Research in Music. His research focuses on rhythm perception and production and its application to movement rehabilitation in trauma, stroke, and Parkinson's patients. Recent expansion of his research agenda includes applications of the rhythmic structure of music to cognitive rehabilitation in multiple sclerosis. He received his Ph.D. degree in music from Michigan State University and holds degrees in music from the Mozarteum in Salzburg, Austria, and in psychology from Muenster University in Germany. He has served as a Visiting Professor of kinesiology at the University of Michigan, a Visiting Scientist at Duesseldorf University Medical School, and a Visiting Professor at Heidelberg University. The author and coauthor of primary textbooks in music therapy, his works have appeared in English, German, Italian, Spanish, Korean, and Japanese.


EURASIP Journal on Applied Signal Processing 2005:19, 3141–3151
© 2005 Hindawi Publishing Corporation

A Time-Frequency Approach to Feature Extraction for a Brain-Computer Interface with a Comparative Analysis of Performance Measures

Damien Coyle
Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Faculty of Engineering, University of Ulster, Magee Campus, Derry BT48 7JL, UK
Email: [email protected]

Girijesh Prasad
Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Faculty of Engineering, University of Ulster, Magee Campus, Derry BT48 7JL, UK
Email: [email protected]

T. M. McGinnity
Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Faculty of Engineering, University of Ulster, Magee Campus, Derry BT48 7JL, UK
Email: [email protected]

Received 2 February 2004; Revised 4 October 2004

The paper presents an investigation into a time-frequency (TF) method for extracting features from the electroencephalogram (EEG) recorded from subjects performing imagination of left- and right-hand movements. The feature extraction procedure (FEP) extracts frequency domain information to form features whilst time-frequency resolution is attained by localising the fast Fourier transformations (FFTs) of the signals to specific windows localised in time. All features are extracted at the rate of the signal sampling interval from a main feature extraction (FE) window through which all data passes. Subject-specific frequency bands are selected for optimal feature extraction and intraclass variations are reduced by smoothing the spectra for each signal by an interpolation (IP) process. The TF features are classified using linear discriminant analysis (LDA). The FE window has potential advantages for the FEP to be applied in an online brain-computer interface (BCI). The approach achieves good performance when quantified by classification accuracy (CA) rate, information transfer (IT) rate, and mutual information (MI). The information that these performance measures provide about a BCI system is analysed and the importance of this is demonstrated through the results.

Keywords and phrases: brain-computer interface, neuromuscular disorders, electroencephalogram, time-frequency methods, linear classification.

1. INTRODUCTION

Nearly two million people in the United States [1] are affected by neuromuscular disorders. A conservative estimate of the overall prevalence is that 1 in 3500 of the world's population may be expected to have a disabling inherited neuromuscular disorder presenting in childhood or in later life [2]. In many cases those affected may have no control over the muscles that would normally be used for communication. BCI technology is still developing, but it has the potential to improve living standards for these people by offering an alternative communication channel which does not depend on peripheral nerves or muscles [3]. A BCI replaces the use of nerves and muscles, and the movements they produce, with electrophysiological signals in conjunction with the hardware and software that translate those signals into actions [1].

A BCI involves extracting information from the highly complex EEG. This is usually achieved by extracting features from EEG signals recorded from subjects performing specific mental tasks. A class of features for each mental task is usually obtained from signals prerecorded whilst a subject performs a number of repetitions of each mental task. Subsequently, a classifier is trained to learn which features belong to which class. This ultimately leads to the development of a BCI system that can determine which mental tasks are related to specific EEG signals [4] and associate those EEG signals with the user's intended communication.



This work demonstrates the use of the short-time Fourier transform (STFT) to extract reliable features from EEG signals altered by imagined right/left-hand movements. EEG data was recorded from two recording sites on the scalp positioned at C3 and C4 [5] over the motor cortex. The STFT is used to calculate frequency spectra from a window (i.e., the STFT window) which slides along the data contained within another window (i.e., the feature extraction (FE) window). All EEG data recorded from each recording site is passed through the FE window. The spectra are smoothed using an interpolation (IP) process. Features are obtained from each interpolated spectrum by calculating the norm of the power in predetermined subject-specific frequency bands. Linear discriminant analysis (LDA) is used for classification and system performance is quantified based on three performance measures. The measurement of BCI performance is very important for comparing different systems and measuring improvements in systems. There are a number of techniques used to quantify the effectiveness and performance of a BCI system. These include measuring the classification accuracy (CA) and/or measuring the information transfer (IT) rate. The latter performance quantifier takes into consideration the CA and the time (CT) required to perform classification of each mental task. A third and relatively new quantifier of performance for a BCI system is the mutual information (MI), which is a measure of the average amount of information a classifier output contains about the input signal [6, 7]. A critical analysis of the performance measures, illustrating the advantages of utilising each one for evaluating a BCI system, is provided.

The performance of the system is dependent upon the choice of parameter combinations. It is shown that the width of the main FE window, the number of STFT windows, the width and length of the STFT windows, and the amount of overlap between consecutive STFT windows all have significant effects on the performance of the system. An interpolation process for smoothing the frequency spectra improves the features and helps increase CA rates. The importance of each parameter is analysed. The results demonstrate that, to obtain the best performance, the parameter combinations have to be optimised individually for each subject. However, a number of parameters converge to similar values; therefore there may exist a particular parameter combination that would generalise well to all subjects and thus potentially simplify the application of the system to each individual subject. Details on these aspects of the system, along with a comparison to other BCI systems, are discussed.

The paper is organised in 11 sections. Section 2 describes the data acquisition procedure. Section 3 introduces the STFT and the FEP and Section 4 provides an analysis of the EEG used in this work. Sections 5 and 6 describe the FEP and the classification procedures, respectively. Section 7 briefly describes three methods for quantifying the performance of a BCI system. Section 8 outlines the system optimisation procedure. Sections 9 and 10 document and discuss the results. Section 11 concludes the paper.

2. DATA ACQUISITION

The EEG data used to demonstrate this approach was recorded by the Graz BCI research group (see acknowledgement) [8, 9, 10, 11]. The Graz group has developed a BCI which uses µ (8–12 Hz) and central β (18–25 Hz) EEG rhythms recorded over the motor cortex. Several factors suggest that µ and/or β rhythms may be good signal features for EEG-based communication. These signals are associated with those cortical areas most directly connected to the brain's normal motor output channels [1]. The data was recorded from 3 subjects (S1, S2, and S3) over two sessions, in a timed experimental recording procedure. Each trial was 8 s in length. The first 2 s were quiet; at t = 2 s an acoustic stimulus signified the beginning of a trial and a cross "+" was displayed for 1 s; then at t = 3 s, an arrow (left or right) was displayed as the cue. At the same time the subject was asked to move a bar in the direction of the cue by imagining moving the left or right hand. The feedback (bar movement) can help the user learn to control their EEG better for specific tasks. For subject S1 a total of 280 trials were recorded (140 trials of each type of movement imagery). For subject S2 there were 320 trials (160 trials of each type of movement imagery). The recording was made using a g.tec amplifier (http://www.gtec.at/) and Ag/AgCl electrodes. All signals were sampled at 128 Hz and filtered between 0.5 and 30 Hz. Two bipolar EEG channels were measured using two electrodes positioned 2.5 cm posterior ("−") and anterior ("+") to positions C3 and C4 according to the international standard (10/20 system) electrode positioning nomenclature. In bipolar recording the recorded voltage is the voltage difference between the anterior and posterior electrode at each recording site. A detailed description of similar experimental setups for recording these EEG signals is available in [6, 8, 9, 10, 11, 12].
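To make the timing scheme concrete, here is a minimal Python sketch of how one trial might be sliced out of such a recording. The array layout and names (`recording`, `trial_start`) are hypothetical; the Graz data may be distributed trial-by-trial rather than as one continuous record.

```python
import numpy as np

FS = 128                  # sampling rate (Hz)
TRIAL_SAMPLES = 8 * FS    # each trial lasts 8 s
CUE_SAMPLE = 3 * FS       # arrow cue appears at t = 3 s

def slice_trial(recording, trial_start):
    """Return one (TRIAL_SAMPLES, 2) trial of the two bipolar channels
    (C3, C4) from a continuous (n_samples, 2) recording array, plus the
    post-cue portion that carries the motor imagery."""
    trial = recording[trial_start : trial_start + TRIAL_SAMPLES, :]
    imagery = trial[CUE_SAMPLE:, :]   # samples after the cue
    return trial, imagery
```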

3. THE FE WINDOW AND THE STFT WINDOW

In this investigation there are two windows utilised: the FE window and the STFT window. EEG signals (or data) are fed through the FE window, and within the FE window the frequency components of the EEG signal are obtained using a fast Fourier transform (FFT). Within the FE window a temporal resolution is attained by sliding the STFT window along the data sequence with a certain overlap. This windowed signal processing technique is often referred to as the Gabor transform after Gabor (1946). STFT analysis of a nonstationary signal assumes stationarity over the selected signal segment (the STFT window). The inherent assumption of stationarity over the STFT window can lead to smearing in the frequency domain and decreased frequency resolution when analysing EEG signals with fast-changing spectral content [13]. The temporal resolution can be made as high as possible by sliding the STFT window along the FE window with a large overlap. This maximises the potential for identifying short events that occur within the FE window [14].



(Figure: the incoming C3 and C4 signals pass through the main FE window of length M; within it, STFT windows of length N slide along the data with overlap ovl, and the oldest processed samples drop out.)

Figure 1: Illustration of FE window and STFT window in the FEP.

To localise the Fourier transform of the signal at a time instant τ which falls within the main FE window, the STFT window function is peaked around τ and falls off, thus emphasising the signal in the vicinity of time τ and suppressing it for distant times [15]. There are a number of windows which can be used for achieving these characteristics. Gabor proposed the use of a Gaussian window formulated as follows:

$w(t) = e^{-\frac{1}{2}\left(\frac{\alpha(t - N/2)}{N/2}\right)^{2}}$, (1)

where 0 ≤ t < N and α is the reciprocal of the standard deviation. The width of the window is inversely related to the value of α; a larger α produces a narrower window. The window has length N. These constant parameters denote the length of the window and the degree of localisation in the time domain, respectively [15]. The tuning of these parameters is very important for the extraction of features used in this approach, and this is made apparent in the results section.
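For concreteness, a minimal Python sketch of the window in equation (1); the function name is ours, not from the paper:

```python
import numpy as np

def gabor_window(N, alpha):
    """Gaussian (Gabor) window of equation (1): length N, with alpha the
    reciprocal of the standard deviation, so a larger alpha gives a
    narrower window."""
    t = np.arange(N)
    return np.exp(-0.5 * (alpha * (t - N / 2) / (N / 2)) ** 2)

# e.g., the two window shapes appearing in Table 1: a wide, almost uniform
# window (alpha = 0.68) and a clearly Gaussian one (alpha = 3.68)
wide, narrow = gabor_window(50, 0.68), gabor_window(100, 3.68)
```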

The ordinary Fourier transform (FT) is based on comparing the signal with complex sinusoids that extend through the whole time domain; its main disadvantage is the lack of information about the time evolution of the frequencies. In this case, if an alteration occurs at some time boundary, the whole Fourier spectrum will be affected [15]. The FT requires stationarity of the signal, which is a disadvantage in EEG analysis, the EEG signal being highly nonstationary. The STFT helps to overcome many of these disadvantages and is formulated as follows:

$Y_k(f, \tau) = \sum_{t=\tau-N/2}^{\tau+N/2} w^{*}(t-\tau)\, y_k(t)\, e^{-j(2\pi/N) f t}$, (2)

where f = 0, 1, ..., N_f − 1 and N_f is the number of frequency points or Fourier transforms. Y_k(f, τ) contains the frequency spectrum for each STFT window centred at τ. y_k is the input EEG signal (i.e., either C3 (k = 1) or C4 (k = 2)) contained within the main FE window. The number of STFT windows used to analyse the data contained in the FE window depends on the length of the FE window M, the STFT window length N, and the amount of overlap, ovl, between adjacent STFT windows. M must always be larger than N. Y_k is a matrix with N_f rows and E = (M − ovl)/(N − ovl) columns (i.e., the rows contain the power of the signal for each harmonic and E is the number of STFT windows that are produced within the FE window).

This analysis was carried out offline although, to approximate the online capabilities, all features are extracted within the FE window so that features can be extracted at the rate of the sampling interval as data passes through the window. As each new signal sample enters the FE window, the oldest sample is removed and the STFT window slides along the signal within the FE window (this process is repeated as each new sample enters the FE window). A frequency spectrum is calculated for each STFT window centred at τ. An illustration of the FE window and STFT window is shown in Figure 1. This illustration shows two STFT windows contained within the FE window for each signal (C3 or C4).
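A compact sketch of equation (2) applied across one FE window (Python; `gabor_window` is the function sketched above, and the one-sided magnitude FFT stands in for the paper's power spectrum):

```python
import numpy as np

def stft_spectra(y, N, alpha, ovl):
    """Spectra of equation (2) for one channel y held in the FE window
    (M = len(y)). Returns a matrix whose E = (M - ovl)/(N - ovl) columns
    are the magnitude spectra of successive STFT windows."""
    M = len(y)
    w = gabor_window(N, alpha)           # Gaussian window of eq. (1)
    step = N - ovl                       # hop between adjacent windows
    E = (M - ovl) // step                # number of STFT windows
    cols = [np.abs(np.fft.rfft(y[e * step : e * step + N] * w))
            for e in range(E)]
    return np.column_stack(cols)         # rows: frequency bins; columns: windows
```

With the example values discussed in Section 10 (M = 360, N = 50, ovl = 45), this yields E = 63 columns per channel.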

4. SPECTRAL ANALYSIS AND ERD/ERS

The spectra of signals recorded from recording sites C3 and C4 when subjects perform imagination of hand movements usually show an increase and decrease in the intensity of frequencies in the µ (8–12 Hz) and central β (18–25 Hz) ranges, depending on the recording location and the imagined hand movement (left or right). When certain cortical areas, such as the sensorimotor area, become activated during the course of information processing, amplitude attenuation occurs in the oscillations of the µ and central β rhythms. This is known as an event-related desynchronisation (ERD). An amplitude enhancement or event-related synchronisation (ERS) can be observed in cortical areas that are not specifically engaged in a given mode of activity at a certain moment of time [9, 11]. The location and frequency ranges of ERS/ERD are subject-specific.



(Figure: raw amplitude spectra, 0–70 Hz, for STFT windows 1 and 2.)

Figure 2: C3 Left (windows 1 & 2).

(Figure: raw amplitude spectra, 0–70 Hz, for STFT windows 1 and 2.)

Figure 3: C4 Left (windows 1 & 2).

Figures 2, 3, 4, and 5 show a typical set of frequency spectra. Figures 2 and 3 are obtained by calculating the STFT from EEG signals recorded during imagination of left-hand movement. Figures 4 and 5 were obtained from signals recorded during imagination of right-hand movement. For this analysis only two windows were used for each signal. The top graph in each figure is the spectrum of the first window and the bottom is the spectrum calculated for the second window. The dominant frequency components can be observed from each graph.

(Figure: raw amplitude spectra, 0–70 Hz, for STFT windows 1 and 2.)

Figure 4: C3 Right (windows 1 & 2).

(Figure: raw amplitude spectra, 0–70 Hz, for STFT windows 1 and 2.)

Figure 5: C4 Right (windows 1 & 2).

Both spectra in Figure 5 (C4 recording, right imagery) show strong evidence of the µ (8–12 Hz) rhythm. This is not observable in Figure 4 (C3 recording, right imagery), which suggests that there is an ERD of the µ rhythm on the contralateral side (the side opposite to the imagined hand movement). ERD can be interpreted as an electrophysiological correlate of activated cortical areas involved in the processing of sensory or cognitive information or the production of motor behaviour (see [16]). A small peak can be observed within the central β (between 18 and 20 Hz) rhythm on the C4 spectral plots, which suggests that there is an ERS in the central β rhythm on the ipsilateral hemisphere. The large peak in the µ rhythm at the C4 electrode is an electrophysiological correlate of cortical areas at rest, or of the cooperative or synchronised behaviour of a large number of neurons. Similar contralateral-ipsilateral differences occur during the imagination of left-hand movement, except the differences are symmetrically reversed. To determine that events are truly event related, an experiment described in [16], which involves averaging spectra, is the standard approach for distinguishing ERS/ERD in EEG signals.

The µ rhythm and the central β rhythm were selected as the most reactive frequency bands from which to extract features, for all subjects analysed. There are subtle differences in the main peaks of the upper and lower graphs in each figure, indicating that throughout the imagination there is a change in the amplitude and degree of ERS/ERD in the signals. The evolution of the frequency over time within the FE window can be observed more closely by using an increased number of STFT windows with smaller length. Also, motor imagery data becomes most separable at specific (subject-specific) segments [10]; therefore, if the FE window length M is selected properly, the segments of data that produce maximum feature separability are captured as they pass through the window. The best FE window width M is subject-specific and must be selected empirically for each subject. If the STFT window parameters are selected properly, then feature separability can be maximised within the FE window.

5. INTERPOLATION-BASED FEATURE EXTRACTION PROCEDURE

The extracted spectra contain quite a lot of detail on the frequencies that are not as prominent as those in and around the µ and central β ranges. Smoothing the spectra can reduce feature quality degradation caused by irregular frequency components introduced by noise and help compensate for missing information. The spectrum shape can be smoothed by decreasing the width of the STFT window (i.e., increasing α of (1)). If the window is too narrow, the frequency resolution will be poor, and if the window is too wide, the time localisation will not be so precise. This means that sharp localisations in time and frequency are mutually exclusive because a frequency cannot be calculated instantaneously [15]. Depending on the application and the quantity of information required about the frequency components, the choice of window and window parameters must be adjusted to obtain the desired resolution. For this approach a good frequency resolution is important, especially in the µ and β ranges, but the objective is to obtain features which provide maximum separability between the two classes (left and right). In this respect the appearance of each spectrum was not of major importance. The reactive bands in the spectra are similar among most of the signals within each class, but there are usually discrepancies in the upper and lower frequencies of each band as well as in the peak amplitude of each band. To reduce the possibility of these frequency components having a negative effect on the identification of features within each class, an interpolation process is performed to extract the gross shape of the spectrum.

The interpolation process can smooth the spectra, and thus the differences between spectra within each class can be minimised. In this way some of the larger peaks may be lost, but the interpolation plays a role in compensating for missing information which ought to contribute to the discrimination [17] and can help reduce the intraclass variance, a fundamental goal of most FEPs. The formula for the interpolation process is as follows:

$Y^{ip}_{kl}(u) = \sum_{i=IPS}^{IPE} \frac{Y_k(i, l)}{IPE - IPS + 1}, \quad l = 1, \ldots, E$, (3)

$IPS = \begin{cases} 0 & \text{if } u - ip < 0, \\ u - ip & \text{otherwise}, \end{cases}$ (4)

$IPE = \begin{cases} N_f - 1 & \text{if } u + ip > N_f - 1, \\ u + ip & \text{otherwise}. \end{cases}$ (5)

Equation (2) is used to calculate Y_k, E is the number of spectra, and u indexes the interpolated spectrum at each frequency point (harmonic); therefore u = 0, 1, ..., N_f − 1, where N_f is the number of frequency points or Fourier transforms in the spectra. The value of ip determines the number of interpolation points, which in turn determines the degree of smoothing. A feature, f^k_l, is obtained by taking the l2-norm (i.e., the square root of the sum of the squared components) of the interpolated spectrum between the preselected reactive frequency bands. If E is the number of spectra (i.e., the number of windows for one spectrogram) and L is the number of signals, then

m = LE, (6)

$f^k_l = \left\| Y^{ip}_{kl} \right\|, \quad l = 1, \ldots, E$, (7)

where f^k_l is a feature obtained from the reactive frequency bands of the lth interpolated spectrum of the signal recorded at the kth recording site. According to (6), if there are 3 spectra (i.e., 3 STFT windows) within the FE window for each signal, then E = 3, L = 2 (2 signals), and m = 6; thus each feature vector would contain six features. To recapitulate, the number of features depends on the number of STFT windows, which depends on the FE window length, the STFT window length, and the amount of overlap between each STFT window. If a large number of spectra are produced, choosing a number of specific interpolated spectra for feature extraction will reduce the feature vector dimensionality and thus maintain/improve computational efficiency; however, this may cause performance degradation. The feature vector is

$fv = \left( f^1_1, f^1_2, \ldots, f^1_E, f^2_1, f^2_2, \ldots, f^2_E \right)$. (8)
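Putting equations (3)–(8) together, a hedged Python sketch of the smoothing and band-norm feature extraction (`stft_spectra` is the sketch from Section 3; function and variable names are ours):

```python
import numpy as np

def interpolate_spectra(Yk, ip):
    """Moving-average smoothing of equations (3)-(5): each frequency point
    u is replaced by the mean of the points within +/- ip bins, with the
    averaging range clipped at the spectrum edges (IPS, IPE)."""
    Nf, E = Yk.shape
    Yip = np.empty_like(Yk)
    for u in range(Nf):
        ips = max(u - ip, 0)            # eq. (4)
        ipe = min(u + ip, Nf - 1)       # eq. (5)
        Yip[u, :] = Yk[ips:ipe + 1, :].mean(axis=0)
    return Yip

def band_norm_features(Yip, bands, freqs):
    """Equation (7): one feature per STFT window, the l2-norm of the
    smoothed spectrum over the subject-specific reactive bands. `bands`
    is a list of (lo, hi) tuples in Hz; `freqs` gives each bin's frequency."""
    mask = np.zeros(len(freqs), dtype=bool)
    for lo, hi in bands:
        mask |= (freqs >= lo) & (freqs <= hi)
    return np.linalg.norm(Yip[mask, :], axis=0)     # length-E feature row

# Feature vector of eq. (8): E features from C3 followed by E from C4,
# giving m = L*E features (illustrated with subject S1's bands from Table 1;
# Y_c3, Y_c4, and freqs would come from the STFT step).
# fv = np.concatenate([
#     band_norm_features(interpolate_spectra(Y_c3, ip=4), [(8, 13), (18, 19.5)], freqs),
#     band_norm_features(interpolate_spectra(Y_c4, ip=4), [(8, 13), (18, 19.5)], freqs)])
```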



6. CLASSIFICATION

After feature extraction, classification is performed using linear discriminant analysis (LDA), a classifier that works on the assumption that different classes of features can be separated linearly. Linear classifiers are generally more robust than their nonlinear counterparts, since they have only limited flexibility (fewer free parameters to tune) and are less prone to overfitting [18]. Experimentation involved extraction and classification of features at every time point in a trial. The classes were labelled −1 for left and +1 for right. This resulted in a classifier which provides a time-varying signed distance (TSD) as described in [6, 11]. The sign of the classification indicates the class and the magnitude (or TSD) indicates the confidence in the classification. The time evolution of the CA rates and the TSD can be used to determine when the signals are most separable. The TSD is described in the following section.
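A minimal sketch of this step, assuming scikit-learn's LDA and synthetic stand-in features (the paper itself used MATLAB, so this is illustrative only):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stand-in data: rows are trials, columns the m features of eq. (8);
# labels are -1 (left) and +1 (right) as in the paper.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 6))
y_train = np.where(rng.random(120) < 0.5, -1, 1)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Signed distance to the separating hyperplane: the sign gives the class,
# the magnitude the confidence; evaluated at every sample instant this is
# the time-varying signed distance (TSD).
X_test = rng.normal(size=(40, 6))
tsd = lda.decision_function(X_test)
predicted = np.where(tsd >= 0, 1, -1)
```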

7. PERFORMANCE QUANTIFICATION

The performance of the proposed BCI system is quantified by the CA, the IT rate, and the MI. The CA is the percentage of trials that are classified correctly. The capacity of a communication system is given by its IT rate, normally measured in bits/min (bpm). Capacity is often measured by the accuracy and the speed of the system in a specified application [19]. For systems that rely on accuracy and speed, the main objective is to maximise the number of bits that can be communicated with high accuracy in a specific time window. In present BCI systems, increasing the speed and accuracy is one of the main objectives. For example, the BCI systems in [4, 8, 19, 20, 21, 22] must be able to accurately decipher the EEG signals and respond correctly to their interpretation of the user's command as quickly as possible. The IT rate was first used to quantify the performance of a BCI system by Wolpaw et al. [19] and the calculation was derived in [23, 24]. A relatively new quantifier of performance for a BCI system is the MI, which is a measure of the average amount of information a classifier output contains about the signal. This performance measure was first used by Schlögl et al. [6, 7]. To estimate the MI the classifier should produce a distance value, D, where the sign of D indicates the class (in a two-class system) and the value expresses the distance to the separating hyperplane. A greater distance from the hyperplane indicates a higher signal-to-noise ratio (SNR). D is referred to as the time-varying signed distance (TSD) when estimated at the rate of the sampling interval. The D value at a specific time point t (i.e., D(t)) over all trials is used to estimate the MI. The MI between the TSD and the class relationship is the entropy difference of the TSD with and without the class information. The system described in this work allows features to be extracted with a time resolution as high as the sampling rate very easily; therefore the TSD is estimated at every time instant t, although there must be M samples within the FE window before feature extraction begins.
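To make the rate measures concrete, here is a sketch of the Wolpaw bits-per-trial formula from [19, 23, 24] and, more tentatively, a TSD-based MI estimate; the function names are ours, and the MI estimator in particular is our reading of the entropy-difference formulation in [6], not code from the paper:

```python
import math
import numpy as np

def bits_per_trial(p, n=2):
    """Wolpaw et al. [19]: B = log2 n + p*log2 p + (1-p)*log2((1-p)/(n-1))
    for an n-class BCI with classification accuracy p."""
    if p >= 1.0:
        return math.log2(n)
    return (math.log2(n) + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n - 1)))

def it_rate_bpm(p, ct_seconds, n=2):
    """IT rate in bits/min, given accuracy p and classification time CT (s)."""
    return bits_per_trial(p, n) * 60.0 / ct_seconds

def mutual_information(tsd, labels):
    """Entropy-difference MI of the TSD under a Gaussian assumption,
    following our reading of [6]: 0.5*log2(total var / within-class var)."""
    total = np.var(tsd)
    within = np.mean([np.var(tsd[labels == c]) for c in np.unique(labels)])
    return 0.5 * math.log2(total / within)

# e.g., subject S3 in Table 1: CA = 91%, CT = 2.98 s
print(round(it_rate_bpm(0.91, 2.98), 2))   # ~11.35 bits/min, matching Table 1
```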

8. SYSTEM OPTIMISATION

Due to the nature of this FEP, there are a number of parameters that must be tuned, and the values of these parameters can have a significant effect on the performance of the system. These parameters are listed as follows:

(i) width of the subject-specific frequency band(s),
(ii) FE window length, M,
(iii) STFT window length, N,
(iv) window width, α,
(v) overlap between STFT windows, ovl,
(vi) interpolation interval, ip.

Firstly, the most reactive frequency bands are selected. It is known from Pfurtscheller's work [16] and the Graz research group's [10] theoretical and meticulous work on EEG signals recorded during the imagination of left- and right-hand movement, as well as from the analysis of the spectral graphs showing the ERD/ERS phenomenon for subject S1 (cf. Section 4), that the most reactive bands usually occur in the µ (8–12 Hz) and central β (18–25 Hz) ranges. Further adjustments of the selected bands were carried out during the performance evaluation, and it was observed that CA could be increased by adjusting the range of the selected bands. In this investigation an empirical selection of the most reactive frequency bands was performed by increasing or decreasing the µ and central β bands in steps of 0.25 Hz. The data set for each subject was partitioned into three subsets: a training set (Tr), a validation set (V), and a testing set (Ts). The training sets consisted of 100 trials for subject S1, and 120 for subjects S2 and S3. The validation set for each subject consisted of 40 trials. The test (Ts) set consisted of 100 trials for subject S1, and 120 for subjects S2 and S3. The best subject-specific frequency bands and all other parameters were chosen by testing the system on the validation data set and choosing the band widths that provided the highest CA rates.

To begin the parameter selection procedure, the FE window length, M, was chosen first. The value of M had to be large enough that the window contained enough signal to extract reliable features; however, a window that is too large may result in degraded performance. For example, if a window length M = 500 is chosen, the minimum classification time is 500 × 128⁻¹ s ≈ 3.9 s, and if M = 300 the minimum classification time is 2.34 s; therefore the IT rate can be significantly influenced by the choice of M. Six different window sizes ranging between 100 and 450 were tested. The window size which provided the best features was selected for further tests. To tune the remaining STFT parameters, 3 values of α were chosen, and subsequently tests were run with N = 50 : 50 : 300 (i.e., N was set to all multiples of 50 up to 300) whilst ip and ovl were set to 1. It was assumed that by observing results at 6 different STFT window lengths, for each of the three different values of α, a sufficient indication of good combinations of these parameters for each subject could be attained. The highest CA rates on the training data were used to indicate the best combinations of all parameters. Up to eight different values of ovl were then selected, ranging from 1 to 100 in specific multiples of 5 for small N and 10 for larger values of N. The value of ovl must be less than N. At each value of ovl and the chosen best values of N and α, obtained from the first selection procedure, another set of tests was run with ip = 3 : 3 : 18. Again CA rates were used to choose the best combination of all four parameters. It was observed that the CA rates are sensitive to small changes in ip, so another set of tests was carried out where the best ip values from the previously described tests were decremented and incremented by 1 and then 2. In certain cases additional variations of the parameters were introduced for exhaustive tests. In a minority of situations the CA rates for two or more combinations were equal, and in this case the IT rate was used to decide the best choice. This parameter selection technique only covers a small percentage of the possible combinations; therefore a more meticulous analysis may produce better results. An automated method could be used to search the parameter space for optimisation of the system.
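The staged search above could be automated; a rough sketch under our own assumptions (the α grid beyond the two values appearing in Table 1 is hypothetical, and `evaluate_ca` stands in for the full extract-features/train-LDA/validation-CA pipeline):

```python
import itertools

def grid_search(evaluate_ca):
    """Exhaustive variant of the Section 8 search: scan STFT length N,
    window width alpha, overlap ovl, and interpolation interval ip,
    keeping the combination with the highest validation CA."""
    best_ca, best_pc = 0.0, None
    for N, alpha in itertools.product(range(50, 301, 50), [0.68, 2.0, 3.68]):
        for ovl in range(1, N, 5):            # ovl must stay below N
            for ip in range(3, 19, 3):
                ca = evaluate_ca(N=N, alpha=alpha, ovl=ovl, ip=ip)
                if ca > best_ca:
                    best_ca, best_pc = ca, (N, alpha, ovl, ip)
    return best_ca, best_pc
```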

9. RESULTS

All parameter selection was done by analysing how well the system performed on the validation data (40 trials for subject S1 and 60 trials for subjects S2 and S3). To test the generalisation abilities of the system, further tests were performed on the unseen testing data, which consisted of 100 trials for each of the subjects. All performance quantifiers are estimated at the rate of the sampling interval (i.e., the performance is averaged over all trials at each time point; therefore, after each new sample is enveloped in the main FE window, the oldest sample is removed and a new set of features is extracted and classified). The results at the best time points (determined by the point at which CA is maximal) are presented. Table 1 shows the results obtained based on the parameters selected using the approach described in the previous section. Columns 1 and 2 indicate the subject and the selected subject-specific frequency bands (2 frequency bands for each subject), respectively. The best parameter combinations (PCs), and the corresponding results, are shown for each subject. Column 3 specifies the PC for each subject for ease of reference. Columns 4–8 specify the FE window length M, the STFT window length N, the window width α, the overlap between STFT windows ovl, and the interpolation interval ip, respectively. Column 9 specifies the number of features, m, which is calculated using (6). The CA rates for the validation data are specified in column 10. The CA rates, the times at which CA is maximal (CT), the corresponding IT rates, and the maximum MI for the test data are specified in columns 11–14, respectively. All simulations were performed using MATLAB (http://www.mathworks.com). Functions from various toolboxes were utilised and all data manipulation and iterative software routines were developed using MATLAB source code.

9.1. Subject S1

From Table 1 it can be seen that the most reactive frequency bands and feature extraction parameters differ among subjects. For subject S1 the most reactive bands are the entire µ range and a small band (18–19.5 Hz) within the central β range. When selecting the FE window size for subject S1, the CA rates for two different windows were equal; therefore, the STFT window parameters were selected for each of these windows and the results were compared. The best STFT window parameters differed for the two FE windows. The CA rates on the test data were less than those achieved on the validation data, indicating that overfitting occurred. PC2 achieved a higher CA rate on the validation data and also generalised best to the test data. Also, the highest IT rates are not correlated with the highest CA rates, although the MI for PC2 is highest. As can be seen, the test CA rates for PC2 are only 1% higher than those obtained using PC1, but the IT rates are circa 3 bits/min lower, a substantial difference in IT rate. This is due to CT being much lower for PC1. The classification time is considered as the time interval (CT) beginning at the moment the user initiates the communication signal (i.e., second 3 of the timing scheme [8]) and ending at the point where classification is performed. In an offline analysis, the IT rate is calculated at the point where CA is maximal, thus providing an estimate of the maximum IT that the system is capable of achieving. The FE window size is significantly smaller for PC1 than for PC2 and, as mentioned in Section 4, this can affect the IT rate (i.e., the minimum CT is always ≥ M × 128⁻¹ s). This is possibly the reason for the significant differences in IT rates and indicates the importance of selecting the best FE window size.

9.2. Subject S2

The most reactive frequency bands for subject S2 were selected to be the upper half of the µ band (10.75–13 Hz) and a band spanning the upper end of the lower β band and the central β band (17–22.5 Hz). In this case the CA rates on the test data are significantly higher than those on the validation data; however, the PC for this subject was chosen as the best, and the results indicate that this PC generalises well to the test data. The difference in the CA rates may be due to the fact that the validation set is much smaller than the test set and may contain a larger percentage of trials which are more difficult to classify. The IT rate is significant at almost 9 bits/min. The MI for this subject is high, indicating that the SNR is high and that this subject may be able to perform modulated control of a cursor more comfortably than subject S1.

9.3. Subject S3

The most reactive bands for subject S3 appeared to be between the upper end of the µ band and the lower end of the central β band, as well as in the upper β band. The upper β band is a fairly uncommon reactive band, but the selection method described in Section 8 resulted in this band being chosen. For this subject the CA rates are, again, higher for the test data than for the validation data, possibly for the same reasons described for subject S2. The IT rate is significant at almost 12 bits/min. It can be seen that the CT is approximately 0.5 s less than that of subject S1 (PC2), but there is a large difference in IT rates. This is due to the CT and the CA rates for each subject being substantially different. The MI for this subject is similar to that of subject S2.



Table 1: FEP parameter combinations for the three subjects and a comparative analysis of results (Val = validation data, Test = test data).

Subject | Freq. bands (Hz)    | PC | M   | N   | α    | ovl | ip | m  | Val CA (%) | Test CA (%) | CT (s) | IT (b/m) | MI (bits)
S1      | 8–13, 18–19.5       | 1  | 200 | 50  | 0.68 | 1   | 4  | 8  | 90         | 85          | 2.18   | 10.7     | 0.47
S1      | 8–13, 18–19.5       | 2  | 360 | 100 | 3.68 | 15  | 2  | 8  | 91.3       | 86          | 3.42   | 7.28     | 0.52
S2      | 10.75–13, 17–22.5   | 1  | 360 | 100 | 0.68 | 1   | 4  | 6  | 86.3       | 91.7        | 4.11   | 8.56     | 0.65
S3      | 11.5–19.5, 27.25–30 | 1  | 360 | 50  | 0.68 | 5   | 4  | 14 | 87.5       | 91          | 2.98   | 11.33    | 0.63

10. DISCUSSION

10.1. System comparison

Results from this work show that the proposed FEP compares well to existing approaches. Performance results vary depending on the parameter choices. CA rates of 92% are achieved on unseen data without using cross-validation. Results ranging from 70% to 95% are reported for experiments carried out on similar EEG recordings [8, 9, 10]. Many of these results are subject-specific and in some cases are based on a 10 × 10 cross-validation, the results of which provide a more general view of the classification ability [8]. In [10] it is shown that features derived from the power of the frequencies are most reliable for online feature extraction, where results are obtained from 4 subjects over a number of sessions. In the first few sessions the CA rates range between 73% and 85%, and for later sessions the results range from 83% to 90%. The results in this work are based on recordings made in the first few sessions at early stages of training, and they range between 85% and 92%. Results are reported on tests across different sessions, indicating that the approach is fairly stable and robust for all subjects. Robustness appears to be an advantage of this approach; however, an analysis of multiple subjects over multiple sessions is necessary to clarify this. Current BCIs have maximum IT rates of up to 25 bits/min [25]. In [26] it is shown that IT rates ranging between 12 and 18 bpm are achieved using left/right motor imagery data, although some of these results are based on a 10 × 10 fold cross-validation. In this investigation IT rates between 8 and 12 bits/min are achieved.

10.2. FEP parameters

Due to the considerably large number of possible FEP parameter combinations, not all possible combinations were tested. A more efficient way to find the optimum parameter settings would be to develop a fitness function which incorporates the three performance measures and the CT, and to use an automated search algorithm to optimise the PC. Criteria for limiting the optimisation to prevent overfitting may also be necessary. This would require a substantial amount of development and simulation time but would probably result in improved performance. For this analysis the results obtained were sufficient and compare well to results reported in the BCI literature utilising similar data.

The selection of subject-specific frequency bands did significantly influence the results. The most reactive frequency bands were initially selected based on visual inspection and then adjusted to obtain optimal performance (cf. Section 8). In [10, 27, 28] the most reactive subject-specific frequency bands were selected by a technique known as distinction sensitive learning vector quantisation (DSLVQ), and it is shown that optimal electrode positions and frequency bands are strongly dependent on the subject and that subject-specific frequency component selection is very important in BCI systems. In [28] DSLVQ is applied to spectral data in a 1 s time window starting after cue presentation, whereas in this work the most reactive frequency bands were selected by analysing the time course of the CA rate. It is known that the frequency components may evolve during the course of the motor imagery tasks, so it is possible that the most relevant bands vary during this period also. The empirical approach to frequency band selection employed in this work was used to find a general set of frequency bands for each subject so that CA could be maximised during the course of performing the mental task. Also, the bands were adjusted in steps of 0.25 Hz, whereas in [28] the analysis was performed on 1 Hz bands ranging between 9 and 28 Hz. The approach carried out in this work was not overly time consuming and converged to a good set of relevant frequency bands for each subject. Although the approach described in this work is manual, it may account for the evolving relevance of the frequency bands more than the DSLVQ approach, which is more automated but might have been more time consuming for an analysis such as the one described in this work. In [28] it is suggested that, due to the relevance of frequency bands changing over the course of the trial, the DSLVQ algorithm may need dynamic adaptation to maintain optimal band selection. Future work will involve experimentation with DSLVQ to determine its potential for dynamically selecting the relevant frequency bands from EEG signals as they evolve during the course of the motor imagery tasks. This may enhance the accuracy and autonomy of the feature extraction procedure.

The FE window length can significantly influence the time course of the CA rates and the CT. The best FE window for all subjects appeared to be between 200 and 360 samples (i.e., between circa 1.56 s and 2.81 s long). None of the CTs equalled the window length, M, indicating that some data was removed (i.e., forgotten) from the FE window before the data within the window became most separable. Therefore proper selection of the FE window can substantially improve performance by capturing only the signal sequences which are most separable and forgetting data that may contribute to performance degradation.

The STFT window parameters (N, α, and ovl) are also crucially important for this approach. Most CA rates were maximised by using short but wide (small α) windows with small amounts of overlap. As detailed in Section 3, if the window is too narrow, the frequency resolution will be poor, and if the window is too wide, the time localisation will not be so precise. The temporal resolution can be made as high as possible by sliding the STFT window along the FE window with a large overlap. A short and wide STFT window (N = 50) can localise the frequency components in time whilst, at the same time, obtaining a good frequency resolution. The window function utilised in this work becomes more like a uniform window with a parabolic top (i.e., less Gaussian) as α is decreased below 2. Therefore, most of the best PCs chosen cause the frequency components within each STFT window to be emphasised more than a Gaussian window (α > 2) would allow. The temporal resolution is achieved by sliding the STFT window along the data with a certain overlap. Results from additional tests suggest that if the temporal resolution is too high (i.e., a large overlap), feature overfitting may occur. N was set to 100 in the best PC for subject S2, indicating that the time localisation did not have to be as precise.

The interpolation process also plays an important role in the improvement of CA. The degree of smoothing is proportional to the value of ip. If ip is zero, then no interpolation is performed. As can be seen from Table 1, for the best PCs for all subjects, some degree of smoothing was found to improve the CA rate. The improvement was, in some cases, only slight (approximately 2%), but nevertheless this is significant. The feature separability is very sensitive to the value of ip, and increasing ip too much can cause performance degradation. As outlined in [19], a small increase in CA can significantly improve the IT rate; therefore the performance enhancement that the interpolation process can provide is very important in BCI systems. As mentioned, most of the PCs provided good time-frequency resolution, but if the frequency resolution is too precise, the intraclass variation will increase due to irregular frequency components. The interpolation process reduces the negative effects of irregular frequencies by smoothing the spectra and thus reducing the intraclass variance. Even increasing ip to 2 can reduce the intraclass variance and produce better CA and MI rates; however, in some cases, the interpolation process can also reduce the interclass variance.

Overall, the parameters for each subject (apart from the subject-specific frequency bands and the FE window size) show some coherence. Therefore it may be appropriate to select a standard set for all subjects. This would allow fast application of the system to each individual subject. It is also possible that, by optimising the parameter combinations for each subject using an automated search algorithm, improved performance could be achieved, although the training times may be costly. Parameters M and N do not have to be very finely tuned to obtain the best performance. Parameters α, ip, and ovl are critical parameters and cannot be varied much from the selected best values without significant degradation in performance. In additional experimentation, parameters were chosen arbitrarily with a small STFT window (N = 50), and high CA rates were achieved on the validation data, but the results on the testing data were unsatisfactory. This occurred when ovl was large. For example, when the overlap was set equal to 45 (i.e., 90%), a large number of spectra were produced for each signal. Assuming the FE window size M = 360, the number of spectra (i.e., STFT windows) is E = (M − ovl)/(N − ovl) = 63 and, from (6), m = 126 (cf. Section 3). This large number of features is almost half the number of data samples in the window, and this can result in overfitted features. Thus the linear classifier begins to overfit. Parameter combinations that produced lower numbers of features (i.e., < 30) produced classifiers which generalised best to the unseen test data.

10.3. The performance quantifying methods

The three performance measures have advantages and disadvantages, and based on each, different conclusions can be drawn about the system. All three provide different information: the classification accuracy rate simply provides the accuracy, and other information such as sensitivity and specificity can be obtained from it. Even though these measures provide information about how well the system can distinguish between different sets of features extracted from the input space, they do not provide any information about the time required to do so. Timing is critical in any communication system, and in most cases communicating in real time, or as close as possible to real time, is desirable. So, if a two-class system achieves 100% accuracy but requires 20 seconds to perform the classification, then the advantage gained by the high accuracy is diminished by the fact that the classification required so much time.

As can be seen from Table 1, differences in CA and CT have significant effects on the IT rate, a performance measure which quantifies the performance of the system based on both the CT and the CA. The challenge is to find the optimal trade-off between accuracy and speed. In some cases the optimum can be obtained by accepting an FEP or classifier that has a reduced accuracy but a fairly rapid response. This will produce significantly faster IT rates but will result in a system where the probability of misclassifications occurring is much higher. This can be observed in the results of subject S1, where there is a slight difference in CA (1%) but a large difference in IT. The PC with the highest CA did not obtain the highest IT; therefore care must be taken when choosing the best PC.

The MI calculation does not consider the accuracy or the time of classification, but it does quantify the average amount of information that can be obtained from the classifier output about the signal. This may be very important if it is intended to use the classifier output to control an application which requires proportional control. For example, the control of a cursor may be performed by adjusting the cursor in proportion to the magnitude (TSD) of the classifier output and/or using the cursor to select from more than two choices on a one-dimensional scale. A person's ability to vary the MI would provide potential for the system to increase the possible IT rate to more than one bit for a two-class problem [6]. The MI can quantify how well a system may perform these types of tasks but does not provide much information about accuracy and time, and therefore is not a better quantifier than the IT rate, although the MI does provide information about the system that the IT rate does not. Overall, maximising the CA rate is the most important objective, although more useful information about the system performance is contained in the IT rate.

11. CONCLUSION

To the best of the authors' knowledge, this type of TF-based FEP has not been used for feature extraction in EEG-based communication before. Although TF-based FEPs have been reported for application in BCIs, a process which involves a main FE window and an interpolation process is a novel procedure and, as the results demonstrate, significantly enhances the FEP and overall system performance. Analysing the time evolution of the frequencies and the values of the performance quantifiers can determine the best FE window size and also provides information about the signal segments which are most separable. The FE-window-based approach can be used for continuous feature extraction and thus has the potential to be used in an online system.

As the calculation of the IT rate utilises knowledge of the CA and the duration of classification, the IT rate provides significantly more knowledge about the system than the CA rate or the MI alone. However, classification accuracy is the most important in BCI applications, and IT rates could be deceiving if the CA and CT are not also reported. Therefore, it is concluded that, although the IT rate is the best performance quantifier, all three quantifiers can provide information on different and important aspects of a BCI system. It is suggested that the results of each performance quantifier should be analysed and reported.

Further work will involve developing automated procedures for selecting the most reactive subject-specific frequency bands and an automated parameter optimisation procedure which can search the parameter space to find the optimum subject-specific parameters. Although an empirical selection procedure can be used to select good subject-specific parameter combinations, it is anticipated that the full potential of the proposed approach will be realised only by developing a more intuitive parameter selection procedure.

ACKNOWLEDGMENTS

The authors would like to acknowledge the Institute of Human-Computer Interfaces, University of Technology, and Guger Technologies (G.Tec), Graz, Austria, for providing the EEG. The first author of this work is funded by a William Flynn scholarship.

REFERENCES

[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002, invited review.

[2] A. E. H. Emery, "Population frequencies of inherited neuromuscular diseases—a world survey," Neuromuscular Disorders, vol. 1, no. 1, pp. 19–29, 1991.

[3] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, et al., "Brain-computer interface technology: a review of the first international meeting," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 164–173, 2000.

[4] D. Coyle, G. Prasad, and T. M. McGinnity, "EEG-based communication: a time-series prediction approach," in Proc. IEEE Cybernetics Intelligence—Challenges and Advances (CICA '03), pp. 142–147, Reading, UK, September 2003.

[5] B. J. Fisch, Fisch and Spehlmann's EEG Primer: Basic Principles of Digital and Analog EEG, Elsevier, New York, NY, USA, 1999.

[6] A. Schlögl, C. Neuper, and G. Pfurtscheller, "Estimating the mutual information of an EEG-based brain-computer interface," Biomedizinische Technik, vol. 47, no. 1-2, pp. 3–8, 2002.

[7] A. Schlögl, C. Keinrath, R. Scherer, and G. Pfurtscheller, "Information transfer of an EEG-based brain computer interface," in Proc. 1st International IEEE EMBS Conference on Neural Engineering, pp. 641–644, Capri Island, Italy, March 2003.

[8] C. Guger, A. Schlögl, C. Neuper, D. Walterspacher, T. Strein, and G. Pfurtscheller, "Rapid prototyping of an EEG-based brain-computer interface (BCI)," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 9, no. 1, pp. 49–58, 2001.

[9] E. Haselsteiner and G. Pfurtscheller, "Using time-dependent neural networks for EEG classification," IEEE Trans. Rehab. Eng., vol. 8, no. 4, pp. 457–463, 2000.

[10] G. Pfurtscheller, C. Neuper, A. Schlögl, and K. Lugger, "Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters," IEEE Trans. Rehab. Eng., vol. 6, no. 3, pp. 316–325, 1998.

[11] G. Pfurtscheller, C. Guger, G. Müller, G. Krausz, and C. Neuper, "Brain oscillations control hand orthosis in a tetraplegic," Neuroscience Letters, vol. 292, no. 3, pp. 211–214, 2000.

[12] A. Schlögl, D. Flotzinger, and G. Pfurtscheller, "Adaptive autoregressive modeling used for single-trial EEG classification," Biomedizinische Technik, vol. 42, no. 6, pp. 162–167, 1997.

[13] M. Roessgen, M. Deriche, and B. Boashash, "A comparative study of spectral estimation techniques for noisy nonstationary signals with application to EEG data," in Proc. Conference Record of the 27th Asilomar Conference on Signals, Systems, and Computers, vol. 2, pp. 1157–1161, Pacific Grove, Calif, USA, November 1993.

[14] S. V. Notley and S. J. Elliott, "Efficient estimation of a time-varying dimension parameter and its application to EEG analysis," IEEE Trans. Biomed. Eng., vol. 50, no. 5, pp. 594–602, 2003.

[15] R. Q. Rodrigo, Quantitative Analysis of EEG Signals: Time-Frequency Methods and Chaos Theory, Ph.D. thesis, Medical University of Lübeck, Lübeck, Germany, 1998.

[16] G. Pfurtscheller, Electroencephalography: Basic Principles, Clinical Applications and Related Fields, E. Niedermeyer and F. L. Da Silva, Eds., Williams and Wilkins, Baltimore, Md, USA, 4th edition, 1998.

[17] D. Nishikawa, W. Yu, H. Yokoi, and Y. Kakazu, "On-line learning method for EMG prosthetic hand control," Electronics and Communications in Japan (Part III: Fundamental Electronic Science), vol. 84, no. 10, pp. 35–46, 2001, Scripta Technica.

Page 71: Jean-Marc Vesin and Touradj Ebrahimi- Trends in Brain Computer Interfaces

A Time-Frequency Approach to Feature Extraction for a BCI 3151

[18] K.-R. Muller, C. W. Anderson, and G. E. Birch, “Linearand nonlinear methods for brain-computer interfaces,” IEEETransactions on Neural Systems and Rehabilitation Engineer-ing, vol. 11, no. 2, pp. 165–169, 2003.

[19] J. R. Wolpaw, H. Ramoser, D. J. McFarland, and G.Pfurtscheller, “EEG-based communication: improved accu-racy by response verification,” IEEE Trans. Rehab. Eng., vol. 6,no. 3, pp. 326–333, 1998.

[20] E. Donchin, K. M. Spencer, and R. Wijesinghe, “The men-tal prosthesis: assessing the speed of a P300-based brain-computer interface,” IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp.174–179, 2000.

[21] K. A. Moxon, “Brain-control interfaces for sensory and motorprosthetic devices,” in Proc. IEEE International Conference onAcoustics, Speech, and Signal Processing (ICASSP ’01), vol. 6,pp. 3445–3448, Salt Lake City, Utah, USA, May 2001.

[22] A. Kostov and M. Polak, “Parallel man-machine training indevelopment of EEG-based cursor control,” IEEE Trans. Re-hab. Eng., vol. 8, no. 2, pp. 203–205, 2000.

[23] C. E. Shannon and W. Weaver, The Mathematical Theory ofCommunication, University of Illinois Press, Urbana, Ill, USA,1963.

[24] J. R. Pierce, An Introduction to Information Theory, Dover,New York, NY, USA, 1980.

[25] T. M. Vaughan, “Guest editorial brain-computer interfacetechnology: a review of the second international meeting,”IEEE Transactions on Neural Systems and Rehabilitation En-gineering, vol. 11, no. 2, pp. 94–109, 2003.

[26] P. Sykacek, S. Roberts, M. Stokes, E. Curran, M. Gibbs, andL. Pickup, “Probabilistic methods in BCI research,” IEEETransactions on Neural Systems and Rehabilitation Engineer-ing, vol. 11, no. 2, pp. 192–194, 2003.

[27] M. Pregenzer and G. Pfurtscheller, “Distinction sensitivelearning vector quantization (DSLVQ) application as a clas-sifier based feature selection method for a Brain ComputerInterface,” in Proc. 4th International Conference on ArtificialNeural Networks (ICANN ’95), no. 409, pp. 433–436, Cam-bridge, UK, June 1995.

[28] M. Pregenzer and G. Pfurtscheller, “Frequency component se-lection for an EEG-based brain to computer interface,” IEEETrans. Rehab. Eng., vol. 7, no. 4, pp. 413–419, 1999.

Damien Coyle was born in 1980 in the Republic of Ireland. He graduated from the University of Ulster in 2002 with a First-Class Honours degree in electronics and computing engineering. He is currently undertaking research as a Ph.D. student in the Intelligent Systems Engineering Laboratory (ISEL) at the University of Ulster. His research interests include nonlinear signal processing, biomedical signal processing, chaos theory, information theory, and neural and adaptive systems. Coyle is a Member of the IEE and IEEE.

Girijesh Prasad was born in 1964 in India. He has a First-Class Honours degree in electrical engineering and a First-Class Master's degree in computer science and technology, and received a Doctorate from the Queen's University of Belfast in 1997. Currently he holds the post of Lecturer and is a member of the Intelligent Systems Engineering Laboratory (ISEL) research group of the School of Computing and Intelligent Systems at the University of Ulster. His research interests include computational intelligence, predictive modelling and control of complex nonlinear systems, performance monitoring and optimisation, thermal power plants, brain-computer interfaces, and medical packaging processes. He is a Member of the IEE and the IEEE and is a Chartered Engineer.

T. M. McGinnity has been a member of the University of Ulster academic staff since 1992, and holds the post of Professor of Intelligent Systems Engineering within the Faculty of Engineering. He has a First-Class Honours degree in physics and a Doctorate from the University of Durham, and is a Fellow of the IEE, a Member of the IEEE, and a Chartered Engineer. He has 25 years of experience in teaching and research in electronic engineering, leads the research activities of the Intelligent Systems Engineering Laboratory at the Magee campus of the university, and is Head of the School of Computing and Intelligent Systems. His current research interests relate to the creation of intelligent computational systems in general, particularly in relation to hardware and software implementations of neural networks, fuzzy systems, genetic algorithms, embedded intelligent systems utilizing re-configurable logic devices, and bio-inspired intelligent systems.


EURASIP Journal on Applied Signal Processing 2005:19, 3152–3155
© 2005 Hindawi Publishing Corporation

EEG-Based Asynchronous BCI Controls Functional Electrical Stimulation in a Tetraplegic Patient

Gert Pfurtscheller
Laboratory of Brain-Computer Interfaces, Institute of Computer Graphics and Vision, and Ludwig Boltzmann-Institute for Medical Informatics and Neuroinformatics, Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria
Email: [email protected]

Gernot R. Müller-Putz
Laboratory of Brain-Computer Interfaces, Institute of Computer Graphics and Vision, Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria
Email: [email protected]

Jörg Pfurtscheller
Department of Traumatology, Hospital Villach, Nikolaigasse 43, 9400 Villach, Austria
Email: [email protected]

Rüdiger Rupp
Department II, Orthopedic Hospital of Heidelberg University, Schlierbacher Landstraße 200a, 69118 Heidelberg, Germany
Email: [email protected]

Received 29 January 2004

The present study reports on the use of an EEG-based asynchronous (uncued, user-driven) brain-computer interface (BCI) for the control of functional electrical stimulation (FES). By the application of FES, noninvasive restoration of hand grasp function in a tetraplegic patient was achieved. The patient was able to induce bursts of beta oscillations by imagination of foot movement. These beta oscillations were recorded in a one-EEG-channel configuration, bandpass filtered, and squared. When this beta activity exceeded a predefined threshold, a trigger for the FES was generated. Whenever a trigger was detected, the system switched to the next phase of a grasp sequence composed of 4 phases. The patient was able to grasp a glass with the paralyzed hand completely on his own, without additional help or other technical aids.

Keywords and phrases: beta oscillations, motor imagery, functional electrical stimulation, brain-computer interface, spinal cord injury, neuroprosthesis.

1. INTRODUCTION

The idea of direct brain control of functional electrical stimulation (FES) seems to be a realistic concept for restoration of the hand grasp function in patients with a high spinal cord injury. Today, electrical brain activity recorded either from the intact scalp (EEG) or with subdural electrodes (ECoG) can be classified and transformed into signals for control of an FES system (neuroprosthesis). Nowadays both implantable systems [1, 2] and devices using surface electrodes [3] are available for clinical use. For the transformation of mental commands reflected in changes of the brain signal into control signals for FES devices, an asynchronous, user-driven brain-computer interface (BCI) is necessary [4]. Such an asynchronous BCI analyses the EEG (ECoG) continuously and uses no cue stimuli.

For the realization of a reliable and easy-to-apply BCI, only one signal channel (one recording with two electrodes) should be used. Further, it is necessary to have an established mental strategy to produce short increases or bursts in the EEG (ECoG) amplitude, and to detect the increase with a simple threshold comparator.

We report for the first time on restoration of a hand grasp function composed of 4 phases by electrical stimulation of hand muscles with surface electrodes, with the stimulation controlled by a one-channel EEG recording.

2. MATERIALS AND METHODS

2.1. Subject

The tetraplegic patient we report on is a 29-year-old man who has suffered from a traumatic spinal cord injury since April 1998.


He is affected by a complete motor and sensory lesion below C5 and an incomplete lesion below C4. As a preparation for the experiment, the patient performed an individual stimulation program until he achieved a strong and fatigue-resistant contraction of the paralyzed muscles of the hand and forearm. The residual volitional muscle activation of his left upper extremity is as follows.

Shoulder: active abduction and flexion up to 90°; grade 3/5 before and grade 4/5 after training; full rotational range of motion (ROM); full passive ROM.

Elbow: active flexion grade 3/5 before and grade 4/5 after training; no active extension (triceps grade 0/5); pro- and supination possible (partly trick movement); full passive ROM.

Forearm, hand, and fingers: M. extensor carpi radialis (ECR) showed a palpable active contraction (grade 1/5) without change over training; all other muscles grade 0/5; almost full passive ROM in the finger joints; full wrist, thumb, and forearm ROM.

2.2. Functional electrical stimulation

Our aim was to find a functional grasp pattern that would bring the most benefit for our patient, and to find a practical way to generate it by use of surface stimulation electrodes. A kind of fine manipulating grasp, providing the ability to pick up objects from a table, for example, food or a glass, seemed to be most suitable. This grasp is generated by flexion in the metacarpophalangeal (MCP) joints of the extended fingers against the thumb, so that small objects are held between the ball of the end phalanx of the fingers and the thumb, while larger objects are held between the palmar side of the whole fingers and the thumb.

As a precondition for a functional hand grasp pattern, the wrist needs to be dorsally flexed and held stable in this position during flexion of the fingers. Due to the lack of an adequate active wrist extension and a partial denervation (lesion of peripheral nerve fibers) of the wrist extensor muscle (M. extensor carpi radialis, grade 1/5) in our patient, it was not possible to achieve a stable dorsal flexion of the wrist by stimulation, forcing us to use a mechanical orthosis fixing the wrist in a dorsally flexed position.

An opening of the hand (phases 1 and 4, Figure 1) by extension of all finger joints and the thumb could be achieved by stimulation of the finger extensors (M. extensor digitorum communis) and the thumb extensor muscle (M. extensor pollicis longus) with electrodes on the radial side of the proximal forearm.

For the actual grasping (phase 2, Figure 1), we simultaneously stimulated the finger flexors (M. flexor digitorum superficialis, less the M. flexor digitorum profundus) with one pair of electrodes on the ulnar side of the proximal forearm, and the intrinsic hand muscles with two further electrodes on the dorsal side of the hand. The application of the orthosis for dorsal flexion of the wrist leads to a lightly flexed position of the thumb sufficient for serving as a stable counterpart to the flexing fingers. Therefore, no additional stabilization of the thumb via surface stimulation was necessary.

Figure 1: Example of a bipolar EEG recording from the vertex (upper trace), the bandpass filtered (15–19 Hz) EEG signal (middle trace), and the band power time course (lower trace, arbitrary units) over a time interval of 50 seconds. Threshold and trigger pulse generation for FES operation and the grasp phases are indicated. Shots of the grasping are shown in the lower part.

For the external stimulation, we used a stimulator (Microstim8, Krauth & Timmermann, Germany) with biphasic, rectangular constant-current pulses. The stimulation frequency was set to 18 Hz; the current was set for each pair of electrodes on an individual level. Due to the integrated microcontroller, we were able to implement different stimulation patterns for a grasp sequence directly into the device.

The output of the BCI was then used as a trigger signal for switching between the different grasp phases (phase 0: no stimulation, phase 1: opening hand, phase 2: grasping, phase 3: releasing, phase 4 = phase 0; see Figure 1), as sketched below.
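The switching logic can be illustrated with a minimal Python sketch (the `GraspSequencer` class and its phase labels are our own illustrative names, not part of the Microstim8 firmware):

```python
class GraspSequencer:
    """Cycle through the grasp phases on successive BCI trigger pulses:
    phase 0 (no stimulation) -> 1 (opening hand) -> 2 (grasping)
    -> 3 (releasing) -> back to phase 0."""

    PHASES = ("no stimulation", "opening hand", "grasping", "releasing")

    def __init__(self):
        self.phase = 0

    def on_trigger(self):
        # Each detected beta burst advances the sequence by one phase.
        self.phase = (self.phase + 1) % len(self.PHASES)
        return self.phase, self.PHASES[self.phase]


sequencer = GraspSequencer()
for _ in range(5):  # five triggers: phases 1, 2, 3, 0, 1
    print(sequencer.on_trigger())
```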

2.3. EEG recording and processing

The EEG was recorded bipolarly from 2 gold electrodes fixed at a distance of 5 cm in an anterior-posterior orientation on the vertex (Cz according to the international 10–20 system). The EEG signal was amplified (sensitivity 50 µV) between 0.5 and 30 Hz with a bipolar EEG amplifier (Raich, Graz) and sampled at 128 Hz. The signal was processed online by bandpass filtering (15–19 Hz), squaring, averaging over 128 samples, and logarithmizing. After passing a threshold detector, a trigger pulse was generated, followed by a refractory period of 3 seconds. The threshold was selected empirically by comparing the band power values obtained from resting and imagery periods.
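A minimal sketch of this online processing chain in Python (the Butterworth filter order and the moving-average implementation of the 128-sample averaging are our assumptions; the paper does not specify them):

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 128  # sampling rate (Hz)

def beta_log_power(eeg, band=(15.0, 19.0), avg_len=128):
    """Bandpass filter (15-19 Hz), square, average over 128 samples,
    and logarithmize a single-channel EEG signal."""
    b, a = butter(4, [band[0] / (FS / 2), band[1] / (FS / 2)], btype="band")
    filtered = lfilter(b, a, eeg)
    smoothed = np.convolve(filtered ** 2, np.ones(avg_len) / avg_len, mode="same")
    return np.log(smoothed + 1e-12)  # small offset avoids log(0)

def detect_triggers(log_power, threshold, refractory_s=3.0):
    """Threshold detector with a 3-second refractory period."""
    triggers, last = [], -np.inf
    for i, value in enumerate(log_power):
        if value > threshold and (i - last) / FS >= refractory_s:
            triggers.append(i)
            last = i
    return triggers
```

In practice, the threshold would be placed between the log band power values observed during rest and during foot motor imagery, mirroring the empirical selection described above.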

2.4. Mental strategy

Our patient participated in a number of BCI training sessions with the goal of developing a mental strategy to induce movement-specific EEG patterns and to transform these patterns into a binary control signal. During the training, several types of imagination were used in order to increase the classification accuracy. Imaginations of left versus right hand movements were carried out first. Then single-foot motor imagery versus relaxing or hand movement imagination increased the accuracy. Finally, after 55 training sessions, the best results were achieved by the imagination of both feet versus right hand imagery. These two patterns were discriminable online with 100% accuracy.

3. RESULTS

As a first result of the BCI training, the patient was able to control the opening and closing of an electromechanical hand orthosis by a 2-channel EEG recording [5]. Inspection of EEG patterns induced by motor imagery showed that hand motor imagery was accompanied by a weak EEG desynchronization [6], whereas foot motor imagery induced large bursts of beta oscillations with frequencies around 17 Hz. It was therefore quite logical to use only one mental state, namely the state inducing beta oscillations, for control purposes. By the end of the training, the patient had learned to voluntarily induce beta bursts. An example of an EEG signal recorded bipolarly on the vertex close to the foot representation area is shown in Figure 1. The EEG signal is disturbed by large artifacts from eye movements, because the patient watched his hand. Bandpass filtering of the EEG in the beta band (15–19 Hz) reveals 4 bursts of beta activity, each with a duration of about five seconds, within the 50-second recording period. The beta power increase was used for generation of a trigger pulse whenever the power exceeded the predefined threshold. With a refractory period of 3 seconds, a maximum of 20 switches can theoretically be achieved per minute.

The use of only 2 electrodes placed close to the vertex and the recording of one bipolar EEG channel minimizes the possibility that muscle activity contributes to the control signal. Calculating the power spectra and computing the power in the 20–60 Hz band (part of the EMG activity band) showed a band power close to zero.

The patient was able to trigger the FES grasp phases by the induction of beta bursts on his own. Using this setting, our patient was able, for the first time after the accident, to drink from a glass without any help and without the use of a straw.

4. DISCUSSION

It is interesting to note that the motor imagery induced beta burst is a relatively stable phenomenon in our patient, with a constant frequency around 17 Hz. Since this time, about 3 years ago, foot motor imagery has always been able to induce beta bursts with constant frequency components. The generating network of these beta bursts is very likely in the foot representation area and/or the supplementary motor area (SMA). In a foot motor imagery task, both the primary sensorimotor area and the SMA play an important role, whereby the SMA is located in the medial portion of Brodmann's area 6, in front of the foot representation area. From scalp recordings, however, we cannot expect to differentiate between the two sources, because of the proximity of the SMA and the foot representation area [7].

There is strong evidence from EEG, MEG, and ECoG recordings that different motor tasks, including imagery, can generate beta oscillations between 20 and 35 Hz in the SMA and/or the foot representation area of able-bodied subjects [8, 9, 10, 11, 12]. Common to all these reports on induced beta oscillations close to the vertex are their strict localization to the midcentral area and their dominant frequency between 20 and 35 Hz.

There is, however, one important difference between the observed beta oscillations associated with a motor task in able-bodied subjects and the induced beta oscillations in our tetraplegic patient: the former are generated after termination of the motor task, the latter during execution of the motor task. Whether in both cases the same or similar networks in the SMA and/or foot representation area are involved needs further research. What is important is that the beta oscillations on the vertex induced by the reported patient are a robust and reliable phenomenon that can be generated at "will".

For an EEG-based control of a neuroprosthesis in daily life under real-world conditions, the performance of the BCI has to be maximized using a minimum number of electrodes. Using more than one single bipolar derivation is likely to help identify more than a binary switch, enabling the realization of a more complex EEG-based control in the future.

ACKNOWLEDGEMENT

This project was supported by the Austrian Federal Ministry of Transport, Innovation and Technology, project GZ140.587/2, the "Lorenz-Boehler Gesellschaft," and the "Allgemeine Unfallversicherungsanstalt" (AUVA).

REFERENCES

[1] B. Fromm, R. Rupp, and H. J. Gerner, "The Freehand system: an implantable neuroprosthesis for functional electrostimulation of the upper extremity," Handchirurgie, Mikrochirurgie, Plastische Chirurgie, vol. 33, no. 3, pp. 149–152, 2001.

[2] P. H. Peckham, M. W. Keith, K. L. Kilgore, et al., "Efficacy of an implanted neuroprosthesis for restoring hand grasp in tetraplegia: a multicenter study," Archives of Physical Medicine and Rehabilitation, vol. 82, no. 10, pp. 1380–1388, 2001.

[3] M. R. Popovic, D. B. Popovic, and T. Keller, "Neuroprostheses for grasping," Neurological Research, vol. 24, no. 5, pp. 443–452, 2002.

[4] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication," Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.

[5] G. Pfurtscheller, C. Guger, G. R. Müller, G. Krausz, and C. Neuper, "Brain oscillations control hand orthosis in a tetraplegic," Neuroscience Letters, vol. 292, no. 3, pp. 211–214, 2000.

[6] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, no. 11, pp. 1842–1857, 1999.

[7] A. Ikeda, H. O. Lüders, R. C. Burgess, and H. Shibasaki, "Movement-related potentials recorded from supplementary motor area and primary motor area: role of supplementary motor area in voluntary movements," Brain, vol. 115, no. 4, pp. 1017–1043, 1992.

[8] C. Neuper and G. Pfurtscheller, "Motor imagery and ERD," in Event-Related Desynchronization, Handbook of Electroencephalography and Clinical Neurophysiology, Revised Edition, G. Pfurtscheller and F. H. Lopes da Silva, Eds., vol. 6, pp. 303–325, Elsevier, Amsterdam, the Netherlands, 1999.

[9] G. Pfurtscheller, M. Woertz, G. Supp, and F. H. Lopes da Silva, "Early onset of post-movement beta electroencephalogram synchronization in the supplementary motor area during self-paced finger movement in man," Neuroscience Letters, vol. 339, no. 2, pp. 111–114, 2003.

[10] S. Ohara, A. Ikeda, T. Kunieda, et al., "Movement-related change of electrocorticographic activity in human supplementary motor area proper," Brain, vol. 123, no. 6, pp. 1203–1215, 2000.

[11] J. Kaiser, W. Lutzenberger, H. Preissl, D. Mosshammer, and N. Birbaumer, "Statistical probability mapping reveals high-frequency magnetoencephalographic activity in supplementary motor area during self-paced finger movements," Neuroscience Letters, vol. 283, no. 1, pp. 81–84, 2000.

[12] R. Salmelin, M. Hämäläinen, M. Kajola, and R. Hari, "Functional segregation of movement-related rhythmic activity in the human brain," NeuroImage, vol. 2, no. 4, pp. 237–243, 1995.

Gert Pfurtscheller received the M.S. and Ph.D. degrees in electrical engineering from the Graz University of Technology, Graz, Austria. He is a Professor of medical informatics, Director of the Laboratory of Brain-Computer Interfaces, Graz University of Technology, and Director of the Ludwig Boltzmann-Institute for Medical Informatics and Neuroinformatics. His research interests include functional brain topography, the design of brain-computer communication systems, and navigation in virtual environments by a brain-computer interface.

Gernot R. Müller-Putz received the M.S. degree in biomedical engineering from the Graz University of Technology, Graz, Austria, in May 2000. He received the Ph.D. degree in electrical engineering in August 2004 from the same university. His research interests include brain-computer communication systems, the human somatosensory system, rehabilitation engineering, and assistive technology.

Jörg Pfurtscheller received the M.D. degree from the University of Graz in 1995. Currently he is with the Department of Traumatology, Hospital Villach, in the final stage of becoming a trauma surgeon. His research interests include rehabilitation after spinal cord injury and applications of functional electrical stimulation.

Rüdiger Rupp received the M.S. degree in electrical engineering with a focus on biomedical engineering from the Technical University of Karlsruhe, Germany, in 1994, after which he worked at the Institute for Biomedical Engineering and Biocybernetics (Professor G. Vossius). Since 1996, he has been with the Orthopaedic University Hospital II in Heidelberg (Professor H. J. Gerner), Germany, where he holds the position of Research Group Manager. He is now finishing his Ph.D. work. His main research interests are in the field of rehabilitation engineering, especially for spinal cord injured patients. This includes neuroprosthetics, mainly of the upper extremity; application of functional electrical stimulation for therapeutic purposes; development and clinical validation of novel methods and devices for locomotion therapy; gait analysis in incomplete spinal cord injury; and realization of software projects for standardized documentation of rehabilitation outcome. He is a Member of IEEE, IFESS, and VDE.


EURASIP Journal on Applied Signal Processing 2005:19, 3156–3164
© 2005 Hindawi Publishing Corporation

Steady-State VEP-Based Brain-Computer Interface Control in an Immersive 3D Gaming Environment

E. C. Lalor,1 S. P. Kelly,1,2 C. Finucane,3 R. Burke,4 R. Smith,1 R. B. Reilly,1 and G. McDarby1

1 School of Electrical, Electronic and Mechanical Engineering, University College Dublin, Belfield, Dublin 4, Ireland
Emails: [email protected], [email protected], [email protected], [email protected]

2 The Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research, Orangeburg, NY 10962, USA
Email: [email protected]

3 Medical Physics and Bioengineering, St. James's Hospital, P.O. Box 580, Dublin 8, Ireland
Email: [email protected]

4 EOC Operations Center, Microsoft Corporation, Sandyford Industrial Estate, Dublin 18, Ireland
Email: [email protected]

Received 2 February 2004; Revised 19 October 2004

This paper presents the application of an effective EEG-based brain-computer interface design for binary control in a visually elaborate immersive 3D game. The BCI uses the steady-state visual evoked potential (SSVEP) generated in response to phase-reversing checkerboard patterns. Two power-spectrum estimation methods were employed for feature extraction in a series of offline classification tests. Both methods were also implemented during real-time game play. The performance of the BCI was found to be robust to distracting visual stimulation in the game and relatively consistent across six subjects, with 41 of 48 games successfully completed. For the best performing feature extraction method, the average real-time control accuracy across subjects was 89%. The feasibility of obtaining reliable control in such a visually rich environment using SSVEPs is thus demonstrated, and the impact of this result is discussed.

Keywords and phrases: EEG, BCI, SSVEP, online classification, overt attention.

1. INTRODUCTION

The concept of a brain-computer interface (BCI) stems from a need for alternative, augmentative communication and control options for individuals with severe disabilities (e.g., amyotrophic lateral sclerosis), though its potential uses extend to rehabilitation of neurological disorders, brain-state monitoring, and gaming [1]. The most practical and widely applicable BCI solutions are those based on noninvasive electroencephalogram (EEG) measurements recorded from the scalp. These generally utilize either event-related potentials (ERPs) such as the P300 [2] and visual evoked potential (VEP) measures [3], or self-regulatory activity such as slow cortical potentials [4] and changes in cortical rhythms [5, 6, 7]. The former design, being reliant on natural involuntary responses, has the advantage of requiring no training, whereas the latter normally demonstrates effectiveness only after periods of biofeedback training, wherein the subject learns to regulate the relevant activity in a controlled way.

Performance of a BCI is normally assessed in terms of information transfer rate, which incorporates both speed and accuracy. One BCI solution that has seen considerable success in optimizing this performance measure relies on steady-state visual evoked potentials (SSVEPs), a periodic response elicited by the repetitive presentation of a visual stimulus at a rate of 6–8 Hz or more [8]. SSVEPs have been successfully utilized in both above-mentioned BCI designs: gaze direction within a matrix of flickering stimuli is uniquely manifest in the evoked SSVEP through its matched periodicity [3, 9], and the self-regulation of SSVEP amplitude has also been reported as feasible with appropriate feedback [10].

The effectiveness of SSVEP-based BCI designs is due to several factors. The signal itself is measurable in as large a population as the transient VEP; very few fail to exhibit this type of response [8, 11]. The task of feature extraction is reduced to simple frequency component extraction, as there are only a certain number of separate target frequencies, usually one for each choice offered in the BCI. High signal-to-noise ratios are obtainable when analyzing the SSVEP at sufficiently high frequency resolution [8]. Finally, SSVEPs are resilient to artifacts, as blink, movement, and electrocardiographic artifacts are confined mostly to the lower EEG frequencies [11]. Moreover, the source of ocular artifacts (blinks, eye movements) is located on the opposite side of the head to the visual cortex over which the SSVEP is measured. Though these characteristics are well affirmed by the success of current SSVEP-based BCIs [3, 9], it is not known to what degree performance may be compromised by concurrent unrelated visual stimulation, where an individual's visual resources are divided, as in a video gaming environment.

In this paper, the authors address a novel application of the SSVEP-based BCI design within a real-time gaming framework. The video game involves the movement of an animated character within a virtual environment. Both the character and the environment have been modelled as 3D volumes. The lighting and virtual camera position change in response to the character's movements within the environment. Overall, the result is a very visually engaging video game.

The SSVEP response constitutes only a portion of the overall set of visual processes manifest in the ongoing EEG during game play. In this study, we address the challenge of extracting and processing SSVEP measures from a signal of such indeterminate complexity in real time for BCI control.

The design of the SSVEP-based BCI was split into two parts. First, a preliminary offline analysis was conducted to determine the most favourable signal processing methodology and to choose suitable frequencies. Once satisfactory offline analysis results were obtained, the full real-time game was implemented. Performance of the real-time BCI game when played by six normal subjects is presented.

2. PRELIMINARY ANALYSIS

2.1. Methods

(A) Subjects

Five male subjects, aged between 23 and 27, participated in the preliminary study. All subjects had normal or corrected-to-normal vision.

(B) Experimental setup

Subjects were seated 70 cm from a 43 cm (17-inch) computer monitor. EEG was acquired in a shielded room from two Ag-AgCl scalp electrodes placed at sites O1 and O2, according to the 10–20 international electrode-positioning standard [12], situated over the left and right hemispheres of the primary visual cortex, respectively. Skin-electrode junction impedances were maintained below 5 kΩ. Each channel, referenced to the right ear lobe on bipolar leads, was amplified (gain 20 K), 50 Hz line filtered, and bandpass filtered over the range 0.01–100 Hz by Grass Telefactor P511 rack amplifiers. Assuming that eye movement and blink artifacts did not threaten signal integrity at the frequencies of interest, neither horizontal nor vertical EOG signals were recorded. Subjects were monitored visually throughout for continued compliance. Signals were digitized at a sampling frequency of 256 Hz.

Initial testing of the experimental setup involved acquiring data from two subjects while gazing at either a circular yellow flicker stimulus on a black background or a similarly sized rectangular black-and-white checkerboard pattern, modulated at several test frequencies between 6 Hz and 25 Hz. On visual inspection of the power spectra, it was found that the checkerboard pattern produced a more pronounced SSVEP than a flicker stimulus modulated at the same frequency. Furthermore, it has been found that to elicit an SSVEP at a certain frequency, a flicker stimulus must be modulated at that frequency, while a checkerboard pattern need only be modulated at half that frequency, as the SSVEP is produced at the pattern's phase-reversal or alternation rate [13]. This is an important consideration when using a standard monitor with a refresh rate of 100 Hz. Hence, checkerboard patterns were chosen as stimuli in the following preliminary tests and BCI game. From this point, checkerboard frequencies will be given in terms of alternation rate, equivalent to the frequency of the SSVEP produced.

Twenty-five seconds of eyes-closed data were first acquired for each subject to accurately locate the alpha frequency. Testing then proceeded with several 25-second trials during which the subject viewed a full-screen checkerboard pattern at frequencies between 6 Hz and 25 Hz, excluding the individual's alpha band [9]. The power spectra for these data were examined and the two frequencies eliciting the largest SSVEPs were identified. The subject then underwent 25-second trials in which he viewed each one of two bilateral checkerboard patterns phase-reversing at the two selected frequencies, and this was repeated with positions reversed, giving a total of 4 trials. Each 4 × 4 checkerboard pattern's medial edge was situated 4.9° bilateral to a central cross, centered on the horizontal meridian, and subtended a visual angle of 6.5° vertically and 7.2° horizontally. These dimensions were determined empirically.
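A hedged sketch of this frequency-selection step (the Welch PSD estimate and the hard-coded alpha band are our interpretation; the paper states only that the two frequencies eliciting the largest SSVEPs outside the individual alpha band were chosen):

```python
import numpy as np
from scipy.signal import welch

def pick_stimulus_freqs(trials, fs=256, alpha_band=(8.0, 13.0)):
    """Return the two tested frequencies with the largest SSVEP power,
    excluding any inside the subject's alpha band.

    trials: dict mapping stimulus frequency (Hz) -> 25-second EEG array.
    """
    scores = {}
    for f, x in trials.items():
        if alpha_band[0] <= f <= alpha_band[1]:
            continue                                   # skip alpha-band frequencies
        freqs, psd = welch(x, fs=fs, nperseg=4 * fs)   # 0.25 Hz resolution
        scores[f] = psd[np.argmin(np.abs(freqs - f))]  # power at the stimulus frequency
    return sorted(scores, key=scores.get, reverse=True)[:2]
```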

(C) Feature extraction

Two feature extraction methods were employed for comparison on the preliminary data. Each was aimed at utilizing the separable aspects of the SSVEP signals. For both methods, each 25-second trial was divided into approximately 50 overlapping segments, each of which counts as a single case for which the feature(s) is derived. Both 1-second and 2-second segments were used for comparison, with a view to assessing the speed achievable by using each method in real time.

Method 1: squared 4-second FFT

In this method, each one- or two-second segment was extracted using a Hamming window, zero-padded to 1024 samples (4 s), and the fast Fourier transform (FFT) was calculated and squared. A single feature was extracted for each segment:

$$F_1(n) = \log\left(\frac{X_n(f_1)}{X_n(f_2)}\right), \tag{1}$$

where

$$X_n = \mathrm{mean}^2\left(\left|\mathrm{FFT}\left(x_n(t)\right)\right|_{O1},\ \left|\mathrm{FFT}\left(x_n(t)\right)\right|_{O2}\right), \tag{2}$$

that is, the square of the FFT magnitude averaged over electrode sites O1 and O2 of the nth segment $x_n(t)$; $f_1$ and $f_2$ are the chosen checkerboard frequencies.
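A minimal sketch of this feature computation (segment boundaries and the exact normalization are our assumptions):

```python
import numpy as np

FS = 256      # sampling rate (Hz)
NFFT = 1024   # zero-padded FFT length: 4 s -> 0.25 Hz bins

def method1_feature(seg_o1, seg_o2, f1=17.0, f2=20.0):
    """F1(n): log ratio of squared-FFT power at f1 and f2, with the FFT
    magnitude averaged over electrodes O1 and O2 (cf. (1)-(2))."""
    freqs = np.fft.rfftfreq(NFFT, d=1.0 / FS)

    def spectrum(seg):
        windowed = seg * np.hamming(len(seg))          # Hamming window
        return np.abs(np.fft.rfft(windowed, n=NFFT))   # zero-pad to 1024 samples

    X = ((spectrum(seg_o1) + spectrum(seg_o2)) / 2.0) ** 2  # mean^2 over O1, O2
    i1 = np.argmin(np.abs(freqs - f1))
    i2 = np.argmin(np.abs(freqs - f2))
    return np.log(X[i1] / X[i2])
```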


Figure 1: Power spectra for full-screen checkerboard trials at 17 and 20 Hz for subject 1. Spectra were calculated using the squared FFT method averaged across the entire 25-second trial. (Axes: frequency in Hz versus normalized PSD.)

Method 2: FFT of autocorrelation

This method is similar in that it also corresponds to calculating a PSD estimate. In this case, the autocorrelation function is calculated for each segment, followed by the FFT:

$$F_2(n) = \log\left(\frac{Y_n(f_1)}{Y_n(f_2)}\right), \tag{3}$$

where

$$Y_n = \mathrm{mean}\left(\left|\mathrm{FFT}\left(R_{xx}^{n}\right)\right|_{O1},\ \left|\mathrm{FFT}\left(R_{xx}^{n}\right)\right|_{O2}\right), \qquad R_{xx}^{n}(t) = E\left[x_n(t_0)\, x_n(t_0 - t)\right], \tag{4}$$

where the second formula in (4) is the autocorrelation function of the nth segment $x_n(t)$.

This method of PSD estimation is more resilient to noise, due to the fact that the autocorrelation of white noise is zero at all nonzero latencies.
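A corresponding sketch for Method 2 (the biased sample autocorrelation via `np.correlate` and the mean removal are our choices; the paper does not specify them):

```python
import numpy as np

FS, NFFT = 256, 1024  # sampling rate (Hz) and zero-padded FFT length

def method2_feature(seg_o1, seg_o2, f1=17.0, f2=20.0):
    """F2(n): log power ratio at f1 and f2 from the FFT of the
    autocorrelation function, averaged over O1 and O2 (cf. (3)-(4))."""
    freqs = np.fft.rfftfreq(NFFT, d=1.0 / FS)

    def acorr_spectrum(seg):
        seg = seg - seg.mean()
        rxx = np.correlate(seg, seg, mode="full") / len(seg)  # autocorrelation
        return np.abs(np.fft.rfft(rxx, n=NFFT))

    Y = (acorr_spectrum(seg_o1) + acorr_spectrum(seg_o2)) / 2.0
    i1 = np.argmin(np.abs(freqs - f1))
    i2 = np.argmin(np.abs(freqs - f2))
    return np.log(Y[i1] / Y[i2])
```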

(D) Classification

Linear discriminants were used as the classifier model for this study, providing a parametric approximation to Bayes' rule [14]. In the case of both feature extraction methods, this corresponds to calculating a threshold in one dimension. Optimization of the linear discriminant model is achieved through direct calculation and is very efficient, thus lending itself well to real-time applications.

Performance of the LDA classifier was assessed on the preliminary data using 10-fold cross-validation [14]. This scheme randomly divides the available data into 10 approximately equal-sized, mutually exclusive "folds." For a 10-fold cross-validation run, 10 classifiers are trained, with a different fold used each time as the testing set while the other 9 folds are used as the training data. Cross-validation estimates are generally pessimistically biased, as training is performed using a subsample of the available data.
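For a single scalar feature, the linear discriminant reduces to a one-dimensional threshold; a minimal sketch of the training and cross-validation procedure using scikit-learn (the library choice and the synthetic stand-in data are ours):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: one F1/F2 value per segment; labels 0 = left, 1 = right.
rng = np.random.default_rng(0)
features = np.concatenate([rng.normal(-1.0, 1.0, 50),
                           rng.normal(1.0, 1.0, 50)]).reshape(-1, 1)
labels = np.repeat([0, 1], 50)

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, features, labels, cv=10)  # 10-fold cross-validation
print(f"estimated accuracy: {scores.mean():.1%}")
```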

Results

All subjects during preliminary testing were reported to be fully compliant in following the given directions. Analysis of power spectra during full-screen checkerboard trials resulted in the selection of 17 Hz and 20 Hz as the bilateral checkerboard frequencies. These frequencies were employed in each of the four test trials for all subjects. Power spectra for full-screen checkerboard trials for a representative subject are shown in Figure 1.

Note that peaks exist at both the frequency of modulation of each constituent square of the checkerboard (henceforth referred to as the first harmonic) and the alternation rate (second harmonic). Both the flicker-stimulus and checkerboard SSVEP frequency effects described above are exhibited in the spectrum due to the large size of the constituent squares of the full-screen checkerboard pattern. As expected, the second harmonic was more dominant once the checkerboards were made smaller, such that the pattern as a whole could be viewed in the subjects' foveal vision.

The power spectra for left and right gaze directions for a representative subject are shown in Figure 2. It can be seen that for this subject, the magnitude of the SSVEP response to the 17 Hz stimulus is greater than that to the 20 Hz stimulus, which demonstrates the need for classifier training to determine a decision threshold. Each subject's alpha rhythm caused little contamination of the spectra, being of low amplitude during testing: rapid stimulus presentation results in very little cortical idling in the visual cortex, and the short trial length prevents arousal effects known also to affect alpha [15].

The classification accuracies for all five subjects using the two feature extraction methods are listed in Tables 1 and 2. Performance was assessed using both 1- and 2-second segments, and the question of whether inclusion (by averaging) of the first harmonic in the feature had any effect was addressed.


Figure 2: Power spectra for left and right gaze directions for subject 4. Spectra were calculated using the squared FFT method averaged across the entire 25-second trial. (Axes: frequency in Hz versus normalized PSD.)

Table 1: Offline performance for Method 1 averaged over two checkerboard configurations.

                          2nd harmonic only         1st + 2nd harmonic
Subject                   1s window   2s window     1s window   2s window
Subject 1                 88.4%       92.2%         79.3%       88.2%
Subject 2                 72.2%       79.0%         70.4%       74.3%
Subject 3                 58.7%       62.0%         62.4%       69.3%
Subject 4                 75.7%       81.4%         67.4%       72.9%
Subject 5                 57.0%       54.2%         52.1%       50.8%
Average across subjects   70.4%       74.4%         66.3%       71.1%

This results in the augmented feature

$$F_1'(n) = \log\left(\frac{\mathrm{mean}\left(X_n(f_1),\ X_n(f_1/2)\right)}{\mathrm{mean}\left(X_n(f_2),\ X_n(f_2/2)\right)}\right) \tag{5}$$

for Method 1, and similarly for Method 2.

For both methods, analysis using 2-second segments is shown to perform better than analysis using 1-second segments. It can also be seen that inclusion of the first harmonic in the augmented feature in fact degraded performance slightly. The performance of the two methods was comparable, with the more noise-resilient autocorrelation method performing marginally better, as expected.

3. REAL-TIME BCI GAME

3.1. Methods

(A) MindBalance—the game

The object of the MindBalance game is to gain 1D control of the balance of an animated character on a tightrope using only the player's EEG. As mentioned in Section 1, the game involves the movement of the animated character within a virtual environment, with both the character and the environment modelled as 3D volumes. The lighting and virtual camera position change in response to the character's movements within the environment. During the game, a musical soundtrack as well as spoken comments by the character are also played over speakers to make the game more engaging.

A checkerboard is positioned on either side of the character. These checkerboards are phase-reversed at 17 and 20 Hz. A game begins with a brief classifier training period. This requires the subject to attend to the left and right checkerboards, as indicated by arrows, for a period of 15 seconds each. This process is repeated three times (Figure 3). During this training period, audio feedback is continually presented using speakers located behind the subject. The audio feedback is in the form of a looped double-click sound, the play speed of which is linearly related to the feature (F1 in the case of Method 1 or F2 in the case of Method 2). Feedback is presented in order to ensure compliance during the critical training period.

In the game, the tightrope-walking character walks towards the player and stumbles every 1.5–5.0 seconds to one side, chosen randomly. The player must intervene to shift the character's balance so that it remains stable on the tightrope. To do this, the player must direct his gaze and focus on the checkerboard on the opposite side of the screen to that toward which the character is losing balance (Figure 4).


Table 2: Offline performance for Method 2 averaged over two checkerboard configurations.

                          2nd harmonic only         1st + 2nd harmonic
Subject                   1s window   2s window     1s window   2s window
Subject 1                 89.8%       96.1%         82.9%       89.1%
Subject 2                 71.0%       80.4%         73.2%       80.8%
Subject 3                 61.7%       65.8%         62.7%       73.5%
Subject 4                 80.1%       82.3%         71.9%       78.9%
Subject 5                 59.5%       62.0%         59.4%       55.0%
Average across subjects   72.4%       77.3%         70.0%       75.5%

Figure 3: The training sequence.

The character's off-balance animation lasts for 3 seconds. This duration was chosen to give the player time to realize which checkerboard required fixation to elicit the required SSVEPs and help the character regain his balance. At the end of the 3-second animation, a decision based on the most recent 1 or 2 seconds of EEG is obtained. To allow for better game play, a second, more pronounced 3-second off-balance animation was used in order to give the player a second chance in the case where an incorrect decision was obtained from the EEG. There was also an optional play mode in which an EEG feature value within a certain range of the decision threshold, when detected at the end of the off-balance animation, resulted in no decision being taken and the original 3-second off-balance animation simply being replayed. This dead zone was removed during our online tests.
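The optional dead-zone logic can be sketched as follows (an illustrative reconstruction; the threshold, margin, and mapping of feature sign to side are placeholders, since they depend on the trained classifier and on which frequency is assigned to which side):

```python
def classify_with_dead_zone(feature, threshold, margin=0.0):
    """Return 'left', 'right', or None (no decision: replay the
    off-balance animation). A margin of 0 disables the dead zone,
    as in the online tests reported here."""
    if abs(feature - threshold) < margin:
        return None  # within the dead zone: take no decision
    return "left" if feature > threshold else "right"
```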

(B) Signal processing and the C# engine

The overall processing system is shown in Figure 5. In order to carry out this study, a programming engine and platform were required, capable of rendering detailed 3D graphics while at the same time processing continuous EEG data to control a sprite within the game. This was accomplished using a combined graphics, signal processing, and network communications engine implemented in C#.1

1Implemented by the MindGames Group at Media Lab Europe.

Figure 4: The character loses balance during the game.

One machine is dedicated to the rendering of the 3D graphics, while a second machine is dedicated to the real-time data acquisition and signal processing of the EEG data. This signal processing engine allows selection of signal processing functions and parameters as objects to be included in a chain of signal processing blocks to perform the required processing. Whenever a decision on the fate of the animated character is required, a request in the form of a UDP packet is sent over the local area network to the signal processing machine, which sends back a decision based on the most recent feature extracted from the EEG.
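The request/reply exchange might look like the following sketch, written in Python rather than the C# of the actual engine (the port number and one-byte payloads are invented for illustration):

```python
import socket

def serve_decisions(get_latest_decision, port=9000):
    """Signal-processing side: answer each UDP request from the game
    engine with the most recent classification (e.g., b'L' or b'R')."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        _, addr = sock.recvfrom(64)               # request packet from game engine
        sock.sendto(get_latest_decision(), addr)  # reply with the decision
```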

(C) Interface equipment of game control

The setup for the real-time BCI game was similar to that used in the preliminary offline analysis. One difference was the amplification stage, in which the Grass Telefactor P511 rack amplifiers were replaced by Biopac biopotential amplifiers.

The subject was seated in front of a large screen onto which a 140 × 110 cm image was projected. Within the game pictured in Figures 3 and 4, each 4 × 4 checkerboard pattern's medial edge was situated 8.5° bilateral to the tightrope, centered on the horizontal meridian, and subtended a visual angle of 11.4° vertically and 11.8° horizontally.

(D) Subjects and test protocol

Six male subjects, aged between 24 and 34, participated in the following test procedure to assess the performance of the real-time BCI game. All subjects had normal or corrected-to-normal vision.


Figure 5: Flowchart of signal processing stages employed in the real-time BCI game. The O1 and O2 channels pass through amplifiers and filters into a data buffer (1 or 2 s at 256 Hz), followed by autocorrelation, Hamming windowing and zero-padding, a 1024-point FFT, and PSD estimation; feature selection (frequency power ratio) and feature translation (log) then feed the 3D graphics game engine, which provides audio/visual feedback to the subject.

Table 3: Percentage of correct decisions in real-time game play, using Method 1 with the second SSVEP harmonic only.

Subject                   1s window   2s window
Subject 1                 75.0%       100%
Subject 2                 72.7%       100%
Subject 3                 75.0%       70.6%
Subject 4                 69.2%       100%
Subject 5                 87.5%       78.2%
Subject 6                 100%        88.2%
Average across subjects   79.9%       89.5%

Each subject was asked to play the game eight times. Four of the games were played with the EEG analyzed by the FFT method described above as Method 1 for the offline data. In two of these games, the decision on the fate of the tightrope-walking character was based on a 1-second window of EEG data, and in the other two games, the decision was based on a 2-second window. The other four games were played using EEG analyzed by Method 2, the autocorrelation-followed-by-FFT method. Again, two games used 1-second segments of EEG data and two games used 2-second segments.

On average, there were eight trials per game. This varied from game to game as a result of the random number of steps taken by the character between losses of balance, and the fact that in seven of the 48 games played, two consecutive errors occurred, resulting in the character falling from the tightrope and the end of the game.

Results

Tables 3 and 4 list the percentage of correct decisions resulting in the desired regaining of balance on the tightrope. In seven of the 48 games played, two consecutive errors occurred, resulting in the character falling from the tightrope and the game ending. Three of the six subjects did not allow the character to fall off the tightrope in any of their eight games.

One objective measure of BCI performance is the bit rate, as defined by Wolpaw [16]. If, for a trial with N possible symbols, each symbol is equally probable, the probability (P) that the desired symbol will be selected is the same for each symbol, and each error has the same probability, then the bit rate can be calculated as follows:

Table 4: Percentage of correct decisions in real-time game play, using Method 2 with the second SSVEP harmonic only.

Subject                   1s window   2s window
Subject 1                 87.5%       91.7%
Subject 2                 50.0%       58.3%
Subject 3                 85.7%       46.2%
Subject 4                 85.7%       75.0%
Subject 5                 63.6%       100%
Subject 6                 87.5%       92.3%
Average across subjects   76.7%       77.3%

$$\text{Bits per symbol} = \log_2 N + P \log_2 P + (1 - P)\log_2 \frac{1 - P}{N - 1},$$

$$\text{Bit rate} = \text{bits per symbol} \times \text{symbols per minute}. \tag{6}$$

In the case of the present study, one symbol is sent per trial. Using this definition of bit rate, and given that each trial lasts for 3 seconds and the peak accuracy for the real-time system is 89.5%, the bit rate is 10.3 bits/min.
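Equation (6) is easy to check numerically; a small helper reproducing the quoted figures:

```python
import math

def wolpaw_bit_rate(accuracy, n_symbols, trial_seconds):
    """Bits per minute according to Wolpaw's definition in (6)."""
    p, n = accuracy, n_symbols
    bits = (math.log2(n) + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n - 1)))
    return bits * (60.0 / trial_seconds)

print(wolpaw_bit_rate(0.895, 2, 3.0))  # ~10.3 bits/min (3-second trials)
print(wolpaw_bit_rate(0.895, 2, 2.0))  # ~15.5 bits/min (2-second window only)
print(wolpaw_bit_rate(0.799, 2, 1.0))  # ~16.6 bits/min (1-second window only)
```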

4. DISCUSSION

The results from this study indicate that the distinct SSVEP responses elicited by phase-reversing checkerboard patterns can be successfully used to make binary decisions in real time in a BCI-controlled game involving a visually elaborate environment.

The two feature extraction methods can be directly compared for the offline data, given that the methods were used to classify the same data set. The results for both methods are comparable, with Method 2 performing marginally better than Method 1. This may be due to the resilience of Method 2 to uncorrelated noise.

In the real-time gaming situation, Method 1 and Method 2 were employed during separate games. Therefore, classification for the two methods was performed on different data sets. For this reason, and because each subject undertook a relatively small number of trials, a direct comparison between the methods in the real-time tests is not as meaningful. The fact that Method 1 performs better than Method 2 may be attributable more to the anomalous performance of subjects 2 and 3 during the games played using Method 2 than to the feature extraction method itself.

In both online and offline testing, classification based on 2-second windows exceeded that based on 1-second windows for all features. This is to be expected, as a 2-second window gives higher frequency resolution and allows more accurate extraction of the SSVEP peak amplitudes. As mentioned earlier, a bit rate of 10.3 bits/min is achievable using the full trial length of 3 seconds, allowing for the time taken for the subject to respond to the loss of balance of the character in the game and for the elicitation of the SSVEP. It is also useful to calculate theoretical bit rate maxima based purely on the 1- and 2-second EEG windows. This gives a peak bit rate of 15.5 bits/min for the 2-second window and 16.6 bits/min for the 1-second window. It is worth noting that the bit rate defined in (6) is designed to encourage accuracy over speed; as a result, the penalty incurred by the drop in accuracy almost negates the doubling of the number of symbols per minute achieved using the 1-second window.

The decrease in performance obtained by the inclusion of the first harmonic in the offline testing may be attributed to noise added to the first harmonic by activity in the alpha band. It was for this reason that the frequencies of the stimuli were originally chosen outside the alpha range and only the second harmonic was used in the real-time testing.

Two additional interesting observations were made during the offline and online testing. Firstly, the two investigators who themselves participated as subjects in the study achieved better performance, both in terms of accuracy in the offline analysis and in terms of success in completing the game. This implies that either practice or a more motivated approach to stimulus fixation results in a more pronounced visual response. This may be thought of in terms of visual attention. Endogenous modulation of the SSVEP response has been reported as possible in relation to both foveally fixated stimuli [10] and covertly attended stimuli in peripheral vision [17]. The improved discriminability of the SSVEP with increased "conscious effort" may be related to the ability of the subject to focus selective attention on the fixated stimulus, as well as the ability to inhibit processing of distractors in the peripheral visual field.

Secondly, in post-experiment debriefing, subjects reported that audio feedback during training aided in the successful sustained fixation on a particular stimulus and the inhibition of responses to distractions. Also, in the case of an error causing the character to drop to the second level of imbalance, subjects found it possible to adjust their fixation strategy, most notably by observing the checkerboard as a whole rather than specifically fixating on any individual elements, or by allowing perception of the phase reversal as a moving pattern. These adjustments in fixation strategy, prompted by the discrete presentation of biofeedback during the game, in conjunction with the motivation to succeed in the task evoked by the immersive environment, may be the reason for the better average performance during the real-time sessions (peak 89.5%) when compared with the offline results (peak 75.5%).

A possible explanation for the high performance of this BCI design in spite of continuous distracting stimulation may be offered by considering the underlying physiology. The topographic organization of the primary visual cortex is such that a disproportionately wide cortical area is devoted to the processing of information from the central or foveal region of the visual field; thus, directing one's gaze at a desired repetitive stimulus produces an SSVEP response to which all other responses to competing stimuli are small in comparison.

The SSVEP BCI design has not been actively employed in alternative or augmentative communication (AAC) for the disabled. This is partly due to the fact that, for successful operation, the subject's ocular motor control must be fully intact to make selections by shifting gaze direction. Given the range of accessibility options available for the disabled, it is only in very extreme cases, such as those where reliable eye movement is not possible, that a communication medium driven by EEG generated by the brain itself is applicable.

While the need for reliable ocular motor control is a prerequisite for using the BCI described in this paper, we speculate that the use of the BCI to control a character in an engaging game such as that described may prove a useful tool in assisting with motivational issues pertaining to ALS patients. As BCI systems take considerable training to master, typically several months, this system may serve to encourage patients to train for a greater length of time. It may also be possible that, through continued and regular playing of the game, an ALS patient may be able to retain an acceptable level of control even after ocular motor control has deteriorated to the point where eye-tracking systems are no longer feasible. This would involve detection of changes in the amplitudes of the SSVEP as modulated by attention to stimuli in one's peripheral vision. In order to explore this idea, the authors are currently extending this study to covert visual attention, in which subjects direct attention to one of two bilateral stimuli without eye movement.

Also worthy of investigation is the presentation of more stimuli in order to give multidimensional control in the 3D environment.

5. CONCLUSION

This paper presented the application of an effective EEG-based brain-computer interface design for binary control in a visually elaborate immersive 3D game. Results of the study indicate that successful binary control using steady-state visual evoked potentials is possible in an uncontrolled environment and is resilient to any ill effects potentially incurred by a rich, detailed visual environment. All six subjects demonstrated reliable control, achieving an average of 89.5% correct selections for one of the methods investigated, corresponding to a bit rate of 10.3 bits/min.


ACKNOWLEDGMENTS

We wish to acknowledge Phil McDarby for his assistance in designing the gaming environment. We would also like to thank the subjects who participated in the experimental sessions.

REFERENCES

[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.

[2] L. A. Farwell and E. Donchin, "Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials," Electroencephalography and Clinical Neurophysiology, vol. 70, no. 6, pp. 510–523, 1988.

[3] E. E. Sutter, "The visual evoked response as a communication channel," in Proc. IEEE/NSF Symposium on Biosensors, pp. 95–100, Los Angeles, Calif, USA, September 1984.

[4] N. Birbaumer, A. Kubler, N. Ghanayim, et al., "The thought translation device (TTD) for completely paralyzed patients," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 190–193, 2000.

[5] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, no. 11, pp. 1842–1857, 1999.

[6] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication," Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.

[7] J. R. Wolpaw, D. J. McFarland, G. W. Neat, and C. A. Forneris, "An EEG-based brain-computer interface for cursor control," Electroencephalography and Clinical Neurophysiology, vol. 78, no. 3, pp. 252–259, 1991.

[8] D. Regan, Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine, Elsevier, New York, NY, USA, 1989.

[9] M. Cheng, X. Gao, S. Gao, and D. Xu, "Design and implementation of a brain-computer interface with high transfer rates," IEEE Trans. Biomed. Eng., vol. 49, no. 10, pp. 1181–1186, 2002.

[10] M. Middendorf, G. R. McMillan, G. L. Calhoun, and K. S. Jones, "Brain-computer interfaces based on the steady-state visual-evoked response," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 211–214, 2000.

[11] K. E. Misulis, Spehlmann's Evoked Potential Primer, Butterworth-Heinemann, Boston, Mass, USA, 1994.

[12] F. Sharbrough, G.-E. Chatrian, R. P. Lesser, H. Luders, M. Nuwer, and T. W. Picton, "American Electroencephalographic Society guidelines for standard electrode position nomenclature," Clinical Neurophysiology, vol. 8, no. 2, pp. 200–202, 1991.

[13] G. R. Burkitt, R. B. Silberstein, P. J. Cadusch, and A. W. Wood, "Steady-state visual evoked potentials and travelling waves," Clinical Neurophysiology, vol. 111, no. 2, pp. 246–258, 2000.

[14] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996.

[15] W. Klimesch, "EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis," Brain Research Reviews, vol. 29, no. 2-3, pp. 169–195, 1999.

[16] J. R. Wolpaw, H. Ramoser, D. J. McFarland, and G. Pfurtscheller, "EEG-based communication: improved accuracy by response verification," IEEE Trans. Rehab. Eng., vol. 6, no. 3, pp. 326–333, 1998.

[17] S. T. Morgan, J. C. Hansen, and S. A. Hillyard, "Selective attention to stimulus location modulates the steady-state visual evoked potential," Proceedings of the National Academy of Sciences of the United States of America, vol. 93, no. 10, pp. 4770–4774, 1996.

E. C. Lalor received the B.E. degree in electronic engineering from University College Dublin, Ireland, in 1998 and the M.S. degree in electrical engineering from the University of Southern California in 1999. He is currently working towards the Ph.D. degree in the Department of Electronic and Electrical Engineering in University College Dublin, Ireland. From 2002 to 2005, he worked as a Research Associate with Media Lab Europe, the European research partner of the MIT Media Lab. His current interests include brain-computer interfaces and signal processing applications in neuroscience.

S. P. Kelly received the B.E. degree in electronic engineering and the Ph.D. degree in biochemical engineering from University College Dublin, Ireland, in 2001 and 2005, respectively. He is currently a Postdoctoral Research Fellow in the Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research in New York. His current research interests include the neurophysiology of selective attention and multisensory integration in humans, and EEG-based brain-computer interfacing for alternative communication and control.

C. Finucane was born in Dublin in 1979. He graduated from University College Dublin (UCD) in 2001 with a B.S. degree in electronic engineering. He subsequently completed an M. Eng. Sc. degree at UCD and the National Rehabilitation Hospital for work entitled "EEG-based brain-computer interfaces for the disabled" in 2003 before joining the Department of Medical Physics, St. James's Hospital, Dublin, where he currently works as a Medical Physicist. Finucane's research interests include the development of novel brain-computer interfaces, neurophysiological signal analysis, biomedical applications of multimedia, wireless and Internet technologies, and biological systems modelling.

R. Burke received the B.S. Eng. degree in mathematics and engineering from Queen's University, Kingston, Canada, in 1999, and an S.M. degree in media arts and science from the Massachusetts Institute of Technology in 2001. From 2002 to 2004, he worked as a Research Associate with the MindGames Group at Media Lab Europe, Dublin. He is currently a Member of the Developer and Platform Group at Microsoft.

R. Smith obtained the B.E. degree in electronic engineering from University College Dublin (UCD), Ireland, in 2002. He subsequently completed an M. Eng. Sc. at UCD for research that focused on neurophysiological signal processing and the development of brain-computer interfaces. His research primarily focuses on EEG-based BCIs and their possibilities in the field of neurological rehabilitation.

R. B. Reilly received his B.E., M. Eng. Sc., and Ph.D. degrees in 1987, 1989, and 1992, all in electronic engineering, from the National University of Ireland. Since 1996, he has been on the academic staff of the Department of Electronic and Electrical Engineering at University College Dublin. He is currently a Senior Lecturer and researches neurological signal processing and multimodal signal processing. He was the 1999/2001 Silvanus P. Thompson International Lecturer for the IEE. In 2004, he was awarded a US Fulbright Award for research collaboration into multisensory integration with the Nathan S. Kline Institute for Psychiatric Research, New York. He is a reviewer for the Journal of Applied Signal Processing and was Guest Editor for the mini issue on multimedia human-computer interfaces, September 2004. He is the Republic of Ireland Representative on the Executive Committee of the IEEE United Kingdom and Republic of Ireland Section. He is an Associate Editor for IEEE Transactions on Multimedia and also a reviewer for IEEE Transactions on Biomedical Engineering, IEEE Transactions on Neural Systems and Rehabilitation Engineering, IEEE Transactions on Industrial Electronics, Signal Processing, and IEE Proceedings Vision, Image & Signal Processing.

G. McDarby obtained the B.E. and M.S. degrees in electronic engineering from University College Dublin, Ireland, in 1988 and 1995, respectively. He received the Ph.D. degree in biomedical signal processing in 2000 from the University of New South Wales, Sydney. Since 2000, he has worked as a Principal Research Scientist in Media Lab Europe, leading a multidisciplinary group called MindGames. His research is focused on combining sensory immersion (augmented reality), game play, novel biometric interfaces, and intelligent biofeedback to constructively affect the state of the human mind. He is strongly committed to finding ways in which technology can be a transformational tool for people marginalized in society and is heavily involved with the Intel Computer Clubhouse Programme. He is a much sought-after speaker on technology and philosophy and has recently been nominated to the European Academy of Sciences for contributions to human progress.

EURASIP Journal on Applied Signal Processing 2005:19, 3165–3174
© 2005 Hindawi Publishing Corporation

Estimating Driving Performance Based on EEG Spectrum Analysis

Chin-Teng Lin
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: [email protected]

Ruei-Cheng Wu
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: [email protected]

Tzyy-Ping Jung
Institute for Neural Computation, University of California, San Diego, La Jolla, CA 92093-0523, USA
Email: [email protected]

Sheng-Fu Liang
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Biological Science and Technology, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: [email protected]

Teng-Yi Huang
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: [email protected]

Received 12 February 2004; Revised 14 March 2005

The growing number of traffic accidents in recent years has become a serious concern to society. Accidents caused by drivers' drowsiness behind the steering wheel have a high fatality rate because of the marked decline in the driver's abilities of perception, recognition, and vehicle control while sleepy. Preventing such accidents is highly desirable but requires techniques for continuously detecting, estimating, and predicting the level of alertness of drivers and delivering effective feedback to maintain their maximum performance. This paper proposes an EEG-based drowsiness estimation system that combines electroencephalogram (EEG) log subband power spectra, correlation analysis, principal component analysis, and linear regression models to indirectly estimate the driver's drowsiness level in a virtual-reality-based driving simulator. Our results demonstrate that it is feasible to accurately and quantitatively estimate driving performance, expressed as the deviation between the center of the vehicle and the center of the cruising lane, in a realistic driving simulator.

Keywords and phrases: drowsiness, EEG, power spectrum, correlation analysis, linear regression model.

1. INTRODUCTION

Driving safety has received increasing attention due to the growing number of traffic accidents in recent years. Driver fatigue has been implicated as a causal factor in many accidents. The National Transportation Safety Board found that 58 percent of 107 single-vehicle roadway departure crashes in 1995, in which the truck driver survived and no other vehicle was involved, were fatigue-related. Accidents caused by drowsiness at the wheel have a high fatality rate because of the marked decline in the driver's abilities of perception, recognition, and vehicle control while sleepy. Preventing such accidents is thus a major focus of efforts in the field of active safety research [1, 2, 3, 4, 5, 6]. A well-designed active safety system might effectively avoid accidents caused by drowsiness at the wheel.

Many factors could contribute to drowsiness or fatigue, such as long working hours, lack of sleep, or the use of medication. Another important factor is the nature of the task itself, such as monotonous driving on highways. The continued construction of highways and improvements in vehicle equipment have made it effortless for drivers to maneuver and operate their vehicles on the road for hours. An examination of the situations in which drowsiness occurred shows that most of these accidents happened on highways [4].

A number of methods have been proposed in the past to detect vigilance changes. These methods can be categorized into two main approaches. The first approach focuses on physical changes during fatigue, such as the inclination of the driver's head, sagging posture, and decline in gripping force on the steering wheel [7, 8, 9, 10, 11, 12]. These methods can be further classified as either contact or noncontact types in terms of the ways physical changes are measured. The contact type involves the detection of the driver's movement by direct sensor contact, such as using a cap or eyeglasses or attaching sensors to the driver's body. The noncontact type makes use of optical sensors or video cameras to detect vigilance changes. These methods monitor driving behavior or vehicle operation to detect driver fatigue. Driving behavior includes the steering wheel, accelerator, brake pedal, and transmission shift lever, while vehicle operation includes the vehicle speed, lateral acceleration, yaw rate, and lateral displacement. Since these parameters vary with vehicle type and driving conditions, it would be necessary to devise different detection logic for different types of vehicles.

The second approach focuses on measuring physiological changes in drivers, such as eye activity measures, heart beat rate, skin electric potential, and, particularly, electroencephalographic (EEG) activities, as a means of detecting cognitive states [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Stern et al. [22, 23] reported that eye blink duration and blink rate typically increase while blink amplitude decreases as a function of cumulative time on task. Other electrooculographic (EOG) studies have found that saccade frequencies and velocities decline as time on task increases [24, 25]. Recently, Van Orden et al. [14] further compared eye-activity-based methods to EEG-based methods for alertness estimation in a compensatory visual tracking task. Although these eye-activity variables are well correlated with subject performance, eye-activity-based methods require a relatively long moving-average window to track slow changes in vigilance, whereas EEG-based methods can use a shorter moving-average window to track second-to-second fluctuations in subject performance in a visual compensatory task [14, 15, 16, 17, 18].

While approaches based on EEG signals have advantages for making accurate and quantitative judgments of alertness levels, most recent psychophysiological studies have used the same estimator for all subjects [21, 26, 27]. These methods did not account for the large individual variability in EEG dynamics accompanying loss of alertness, and thus could not accurately estimate or predict individual changes in alertness and performance. In contrast, Makeig and Inlow used individualized multiple linear regression models to estimate operators' changing levels of alertness [18]. Jung et al. further applied a neural network model to the EEG power spectrum in an auditory monitoring task and showed that a continuous, accurate, noninvasive, and near real-time estimation of an operator's global level of alertness is feasible [15, 16].

The scope of the current study is to examine neural activity correlates of fatigue/drowsiness in a realistic working environment. Our research investigates the feasibility of using multichannel EEG data to noninvasively estimate and predict continuous fluctuations in human global-level alertness, indexed indirectly by the driver's driving performance, expressed as the deviation between the center of the vehicle and the center of the cruising lane, in a very realistic driving task. To investigate the relationship of minute-scale fluctuations in performance to concurrent changes in the EEG spectrum, we first computed the correlations between changes in the EEG power spectrum and the fluctuations in driving performance. We then built an individualized linear regression model for each subject, applied to principal components of the EEG spectra, to assess the EEG dynamics accompanying loss of alertness for each operator. This approach can be used to construct and test a portable embedded system for real-time alertness monitoring.

This paper is organized as follows. Section 2 gives detailed descriptions of the EEG-based drowsiness experimental setup, including the virtual-reality-based highway scene, subject instructions, physiological data collection, and alertness measurement. Detailed signal analysis of the collected data is given in Section 3. In Section 4, we explore the relationship between the alertness level, expressed as driving performance, and the EEG power spectrum. Behavioral data are used to evaluate the estimation performance of our alertness-monitoring model. Finally, we conclude our findings in Section 5.

2. EXPERIMENTAL SETUP

2.1. Virtual-reality-based highway driving simulator

In this study, we developed a VR-based 3D interactive highway scene using the high-fidelity emulation software Coryphaeus, running on a high-performance SGI workstation. First, we created models of the various objects in the scene (such as cars, roads, and trees) and set up the corresponding positions, attitudes, and other relative parameters between objects. Then, we developed the dynamic models among these virtual objects and built a complete, fully functional simulated highway scene with the aid of a high-level C-based API program. Figure 1 shows the VR-based highway scene displayed on a color XVGA 15″ monitor (304.1 mm wide and 228.1 mm high), including four lanes from left to right, separated by a median strip, to simulate the view of the driver. The distance from the left-hand side to the right-hand side of the road is evenly divided into 256 parts (digitized into values 0–255). The highway scene changes interactively as the driver/subject drives the car at a fixed velocity of 100 km/h on the highway. The car is constantly and randomly drifted away from the center of the cruising lane, mimicking the consequences of a nonideal road surface. The highway scene was connected to a 36-channel physiological measuring system, with which the EEG, EOG, ECG, and the subject's performance, the deviation between the center of the vehicle and the center of the cruising (third) lane, were continuously and simultaneously measured and recorded.

Figure 1: VR-based highway scene used in our experiments. The distance from the left side to the right side of the road is evenly divided into 256 parts (digitized into values 0–255). The width of each lane is 60 units. The width of the car is 32 units. The refresh rate of the highway scene was set appropriately to emulate a car driving at a fixed speed of 100 km/h on the highway.

2.2. Subjects

Statistical reports [4] showed that the drowsiest times occur from late night to early morning and during the early afternoon hours. During these periods, drowsiness often occurs within one hour of continuous driving, indicating that drowsiness is not necessarily caused by long driving hours. Thus, the best time for conducting the highway-drowsiness simulation is the early afternoon hours after lunch, because drivers usually get drowsy within an hour of continuous driving. A total of ten subjects (ages 20 to 40 years) participated in the VR-based highway driving experiments. Each subject completed simulated driving sessions on two separate days. On the first day, the participants were told the general features of the driving task, completed the necessary informed consent material, and then started with a 15–45 minute practice session, keeping the car at the center of the cruising lane by maneuvering it with the steering wheel. Subjects reported this amount of practice to be sufficient to reach asymptotic performance on the task. After practicing, participants were prepared with 33 EEG (including 2 EOG) electrodes referenced to the right earlobe, based on a modified international 10–20 system, and 2 ECG electrodes placed on the chest. After a brief calibration procedure, subjects began a ~45 minute lane-keeping driving task, and their EEG signals and driving performance, defined as the deviation of the center of the car from the center of the third lane of the road, were measured and recorded simultaneously. Participants returned on a different day to complete the other ~45 minute driving session. Participants who demonstrated waves of drowsiness involving two or more microsleeps in both sessions were selected for further analysis. Based on these criteria, five participants (10 sessions) were selected for further modeling and cross-session testing.

2.3. Data collection

During each driving session, 33 EEG/EOG channels (using sintered Ag/AgCl electrodes), 2 ECG channels (bipolar connection), and the deviation between the center of the vehicle and the center of the cruising lane were simultaneously recorded by the Scan NuAmps Express system (Compumedics Ltd., VIC, Australia). Before data acquisition, the contact impedance between the EEG electrodes and the scalp was calibrated to be less than 5 kΩ. The EEG data were recorded with a 16-bit quantization level at a sampling rate of 500 Hz and then resampled to 250 Hz to simplify data processing.

2.4. Alertness measurement

To find the relationship between the measured EEG signals and the subject's cognitive state, and to quantify the level of the subject's alertness, we defined the subject's driving performance index as the deviation between the center of the vehicle and the center of the cruising lane. When the subject is drowsy (verified from video recordings), the value of the driving performance index increases, and vice versa. The recorded driving performance time series were then smoothed using a causal 90-second square moving-average filter advancing at 2-second steps to eliminate variance at cycle lengths shorter than 1–2 minutes, since fluctuations in drowsiness level generally have cycle lengths longer than 4 minutes [15, 16].
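A minimal sketch of this smoothing step is given below, assuming the performance index is sampled at the 2-second steps described above; the variable names and the synthetic input are our own illustration, not the authors' code.

import numpy as np

def causal_moving_average(x, window_samples):
    """Causal square (boxcar) moving average: each output sample is the
    mean of the current and up to window_samples - 1 preceding samples."""
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out = np.empty_like(x)
    for n in range(len(x)):
        start = max(0, n - window_samples + 1)  # window shrinks at the start
        out[n] = (csum[n + 1] - csum[start]) / (n + 1 - start)
    return out

# Deviation index sampled every 2 s; a 90 s window therefore spans 45 samples.
step_s = 2.0
deviation = np.abs(np.random.randn(1350))  # placeholder for recorded deviations
smoothed = causal_moving_average(deviation, window_samples=int(90 / step_s))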

3. DATA ANALYSIS

The flowchart of data analysis for estimating the level of alertness based on the EEG power spectrum is shown in Figure 2. For each subject, after collecting the 33-channel EEG signals and driving deviations in a 45-minute simulated driving session, the EEG data were first preprocessed using a simple lowpass filter with a cutoff frequency of 50 Hz to remove line noise and other high-frequency noise. Then, we calculated the moving-averaged log power spectra of all 33 EEG channels. The correlation coefficients between the smoothed driving performance and the log power spectra of all EEG channels at each frequency band were further evaluated to form a correlation spectrum. The log power spectra of the 2 EEG channels with the highest correlation coefficients were further decomposed using the principal component analysis (PCA) algorithm to reduce the feature dimensions. Then the first 50 representative PCA components with the highest eigenvalues were selected as the input vectors of the linear regression model to estimate the individual subject's driving performance. Detailed analyses are described in the following subsections.

Figure 2: Flowchart for processing the EEG signals. (1) A lowpass filter was used to remove the line noise and higher-frequency (> 50 Hz) noise. (2) Moving-averaged spectral analysis was used to calculate the EEG log power spectrum of each channel, advancing at 2-second steps. (3) The two EEG channels with the highest correlation coefficients between the subject's driving performance and the EEG log power spectrum were selected. (4) Principal component analysis was trained and used to decompose the selected features and extract the representative PCA components as the input vectors for the linear regression models. (5) The linear regression models were trained on one session and used to continuously estimate and predict the individual subject's driving performance in the testing session.

3.1. Moving-averaged power spectral analysis

Moving-averaged spectral analysis of the EEG data, as shown in Figure 3, was first accomplished using a 750-point Hanning window with 250-point overlap. Windowed 750-point epochs were further subdivided into several 125-point subwindows using the Hanning window again with 25-point steps, each extended to 256 points by zero padding for a 256-point FFT. A moving median filter was then used to average and minimize the presence of artifacts in the EEG records of all subwindows. The moving-averaged EEG power spectra were further converted into a logarithmic scale for spectral correlation and driving performance estimation [28, 29]. Thus, the time series of the EEG log power spectrum for each session consisted of the 33-channel EEG power spectrum estimated at 40 frequencies (from 1 to 40 Hz), stepping at 2-second (500-point, one epoch) time intervals.
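The subwindowed spectral estimator can be sketched as follows, assuming the 250 Hz sampling rate from Section 2.3; for simplicity this sketch treats the outer 750-point window as plain segmentation and applies only the subwindow Hanning taper, so it is an illustration of the scheme rather than the authors' implementation.

import numpy as np

FS = 250  # sampling rate (Hz) after downsampling

def epoch_log_spectrum(epoch):
    """Log power spectrum of one 750-point epoch: 125-point Hanning
    subwindows at 25-point steps, zero-padded to 256 points for the FFT,
    with a median across subwindows to suppress artifacts (our reading
    of the paper's moving median step)."""
    win = np.hanning(125)
    spectra = []
    for start in range(0, len(epoch) - 125 + 1, 25):
        seg = epoch[start:start + 125] * win
        spectra.append(np.abs(np.fft.rfft(seg, n=256)) ** 2)  # zero-padded FFT
    med = np.median(spectra, axis=0)
    freqs = np.fft.rfftfreq(256, d=1.0 / FS)
    keep = (freqs >= 1) & (freqs <= 40)  # roughly the 40 bins from 1 to 40 Hz
    return np.log10(med[keep])

def session_log_spectra(x):
    """750-point epochs (3 s) advancing by 500 points (2 s),
    i.e., with 250-point overlap."""
    return np.array([epoch_log_spectrum(x[s:s + 750])
                     for s in range(0, len(x) - 750 + 1, 500)])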

3.2. Correlation analysis

Since the alertness level fluctuates with cycle lengths longer than 4 minutes [15, 16], we smoothed the EEG power and driving performance time series using a causal 90-second square moving-average filter to eliminate variance at cycle lengths shorter than 1–2 minutes. To investigate the relationship of minute-scale fluctuations in continuous driving performance with concurrent changes in the 33-channel EEG power spectrum over time and subjects, we measured correlations between changes in the EEG log power spectrum and driving performance, forming a correlation spectrum, by computing Pearson's correlation coefficient between the two time series at each EEG frequency, expressed as
$$\mathrm{Corr}_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}.$$
The channels with the highest correlation coefficients between the EEG log power spectrum and the subject's driving performance were then selected (see Section 4.1), and the dimensions of the selected EEG power spectra of these channels were reduced using the principal component analysis (PCA) algorithm.

Figure 3: Block diagram for moving-averaged spectral analysis. The EEG data were first divided using a 750-point Hanning window with 250-point overlap. The 750-point epochs were further divided into several 125-point frames using Hanning windows again with a 25-point step size, and a 256-point FFT was applied to each frame by zero padding. The subwindow power spectra were then averaged and converted to a logarithmic scale to form a log power spectrum.
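A sketch of the correlation-spectrum computation and channel selection, under our assumption that the log spectra are stored as an epochs × channels × frequencies array, might look like this (illustrative, not the authors' code):

import numpy as np

def correlation_spectrum(log_spectra, performance):
    """Pearson correlation between driving performance and EEG log power,
    computed independently for every (channel, frequency) pair.

    log_spectra: array of shape (epochs, channels, freqs)
    performance: array of shape (epochs,)
    returns:     array of shape (channels, freqs)
    """
    y = performance - performance.mean()
    x = log_spectra - log_spectra.mean(axis=0)
    num = np.tensordot(y, x, axes=(0, 0))            # sum over epochs
    den = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return num / den

# Channel selection: keep the two channels with the largest peak correlation.
# corr = correlation_spectrum(spectra, perf)
# best_two = np.argsort(corr.max(axis=1))[-2:]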

3.3. Feature extraction

In this study, we use a multivariate linear regression model [30] to estimate/predict the subject's driving performance based on the information available in the EEG log power spectrum at sites Cz and Pz (as suggested in Section 4.1). The EEG power spectrum time series for each session consisted of 1350 EEG power estimates (one per 750-point epoch) at 40 frequencies (from 1 to 40 Hz) at 2-second time intervals. We then applied Karhunen-Loeve principal component analysis (PCA) to the full EEG log spectrum to decompose the EEG log power spectrum time series and extract the directions of the largest variance for each session. PCA is a linear transformation that finds the principal coordinate axes of the samples such that, along the new axes, the sample variances are extreme (maximal and minimal) and uncorrelated. Using a cutoff on the spread along each axis, a sample may thus be reduced in its dimensionality [31]. The principal axes and the variance along each of them are given by the eigenvectors and associated eigenvalues of the dispersion matrix. In our study, the projections onto the PCA components accounting for the largest 50 eigenvalues were used as inputs to train the individual linear regression models for each subject, which used a 50-order linear polynomial with a least-square-error cost function to estimate the time course of the driving performance. Each model was trained using only the features extracted in the training session and tested on a separate testing session of the same subject, for each of the five selected subjects. The PCA parameters (eigenvectors) from the training sessions were used to project the features in the testing sessions, so that all data were processed in the same way for the same subject before being fed to the estimation models.
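This feature-extraction and estimation stage could be sketched as below, where X_train and X_test denote the (epochs × features) log-spectral feature matrices of the training and testing sessions and y_train the smoothed deviation series; these names, and the use of numpy's eigendecomposition and least squares, are our illustrative assumptions.

import numpy as np

def fit_pca(X_train, n_components=50):
    """Eigendecomposition of the training dispersion (covariance) matrix;
    returns the training mean and the eigenvectors belonging to the
    n_components largest eigenvalues."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train - mu, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)            # eigenvalues ascending
    order = np.argsort(evals)[::-1][:n_components]
    return mu, evecs[:, order]

def fit_linear_model(Z, y):
    """Least-squares fit of y = sum_i a_i z_i + a0 on the PCA scores Z."""
    A = np.hstack([Z, np.ones((len(Z), 1))])      # append the constant term
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(Z, coef):
    return np.hstack([Z, np.ones((len(Z), 1))]) @ coef

# mu, W = fit_pca(X_train)                  # training-session eigenvectors only
# coef = fit_linear_model((X_train - mu) @ W, y_train)
# y_hat = predict((X_test - mu) @ W, coef)  # cross-session estimate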

4. RESULTS AND DISCUSSION

4.1. Relationship between the EEG spectrum and subject alertness

To investigate the relationship of minute-scale fluctuations in driving performance to concurrent changes in the EEG spectrum, we measured correlations between changes in the EEG power spectrum and driving performance by computing the correlation coefficients between the two time series at each EEG frequency. We refer to the results as forming a correlation spectrum. For each EEG site and frequency, we computed spectral correlations for each session separately and averaged the results across all 10 sessions from the five subjects. Figure 4a shows the results for 40 frequencies between 1 and 40 Hz. Note that the mean correlation between performance and EEG power is predominantly positive at all EEG channels below 20 Hz. We also investigated the spatial distributions of these positive correlations by plotting the correlations between the EEG power spectrum and driving performance, computed separately at the dominant frequency bins 7, 12, 16, and 20 Hz (cf. Figure 4a), on the scalp (Figure 4b). As the results in Figure 4b show, the correlation coefficients plotted on the scalp maps are predominantly positive. The correlations are particularly strong at central and posterior channels, similar to the results of previous driving studies [21, 26, 27]. The relatively high correlation of the EEG log power spectrum with driving performance suggests that the EEG log power spectrum may be suitable for drowsiness (microsleep) estimation, where the subject's cognitive state might fall into stage one of non-rapid eye movement (NREM) sleep. To be practical for routine use during driving or in other occupations, EEG-based cognitive assessment systems should use as few EEG sensors as possible, to reduce both the preparation time for wiring drivers and the computational load for continuously estimating the level of alertness in near real time. According to the correlations shown in Figure 4b, we believe it is adequate to use the EEG signals at sites Cz and Pz to continuously assess the alertness level of drivers.

Next, we compared the correlation spectra of individual sessions to examine the stability of this relationship over time and subjects. Figures 5 and 6 plot the correlation spectra at sites Fz, Cz, Pz, and Oz of two separate driving sessions for the extreme cases, Subjects A (best) and B (worst), respectively. The relationship between the EEG power spectrum and driving performance is stable within subjects, especially below 20 Hz. However, the relationship varies from subject to subject (compare Figures 5 and 6). The time intervals between the training and testing sessions of the lane-keeping experiments ranged from one day to one week for the five selected subjects.

Figure 4: Correlation spectra. Correlations between EEG power and driving performance, computed separately for 40 EEG frequencies between 1 and 40 Hz. (a) Grand mean correlation spectra for 10 sessions on 5 subjects. (b) Scalp topographies of the correlations at the dominant frequencies 7, 12, 16, and 20 Hz.

The above analyses provide strong and converging evidence that changes in a subject's alertness level, indexed by driving performance during a driving task, are strongly correlated with changes in the EEG power spectrum at several frequencies at central and posterior sites. This relationship is relatively variable between subjects, but stable within subjects, consistent with the findings from a simple auditory target detection task reported in [15, 16]. These findings suggest that the information available in the EEG can be used for real-time estimation of changes in the alertness of human operators performing monitoring tasks. However, for maximal accuracy, the estimation algorithm should be capable of adapting to individual differences in the mapping between EEG and alertness.

Figure 5: Correlation spectra between the EEG power spectrum and the driving performance at the (a) Fz, (b) Cz, (c) Pz, and (d) Oz channels in two separate driving sessions (Run 1 and Run 2) for Subject A (best case). Note that the relationship between the EEG power spectrum and the driving performance is stable within this subject.

4.2. EEG-based driving performance estimation/prediction

In order to estimate/predict the subject's driving performance based on the information available in the EEG power spectrum at sites Cz and Pz, a 50-order linear regression model
$$y = \sum_{i=1}^{N} a_i x_i + a_0, \quad N = 50,$$
with a least-square-error cost function is used, where $y$ is the desired output, $x_i$ are the input features, $N$ is the model order, $a_i$ are the parameters, and $a_0$ is the constant (bias) term. We used only the two EEG channels (Cz and Pz) that showed the highest correlation between the EEG power spectrum and the driving performance, because using all 33 channels may introduce more unexpected noise. Figure 7 plots the estimated and actual driving performance of a session of Subject A. The linear regression model in this figure is trained with and tested against the same session, that is, within-session testing. As can be seen, the estimated driving performance matched the actual driving performance extremely well (r = 0.88). When the model was tested against a separate test session of the same subject, as shown in Figure 8, the correlation between the actual and estimated driving performance, though decreased, remained high (r = 0.7). Across the ten sessions, the mean correlation coefficient between the actual driving performance time series and the within-session estimates is 0.90 ± 0.034, whereas the mean correlation coefficient between the actual driving performance and the cross-session estimates is 0.53 ± 0.116. These results suggest that continuous EEG-based driving

Figure 6: Correlation spectra between the EEG power spectrum and the driving performance at the (a) Fz, (b) Cz, (c) Pz, and (d) Oz channels in two separate driving sessions (Run 1 and Run 2) for Subject B (worst case). Note that the relationship between the EEG power spectrum and the driving performance is stable within this subject, especially below 20 Hz. However, the relationship varies from subject to subject (compare Figures 5 and 6).

performance estimation using a small number of data channels is feasible and can give accurate information about minute-to-minute changes in operator alertness.
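The within- and cross-session figures quoted above are plain Pearson correlations between the estimated and actual deviation time series; with hypothetical arrays y_actual and y_hat, the check is a one-liner:

import numpy as np
# correlation between actual and estimated driving performance
r = np.corrcoef(y_actual, y_hat)[0, 1]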

Figure 7: Driving performance estimates for a session of Subject A, based on a linear regression (dashed line) of PCA-reduced EEG log spectra at two scalp sites, overplotted against the actual driving performance time series for the session (solid line). The correlation coefficient between the two time series is r = 0.88.

Figure 8: Driving performance estimates for a test session, based on a linear regression (dashed line) of PCA-reduced EEG log spectra from a separate training session of the same subject, overplotted against the actual driving performance time series of the test session (solid line). The correlation coefficient between the two time series is r = 0.7. Note that the training and testing data in this study were completely disjoint.

5. CONCLUSIONS

In this study, we demonstrated a close relationship between minute-scale changes in driving performance and the EEG power spectrum. This relationship appears stable within individuals across sessions, but is somewhat variable between subjects. We also combined EEG power spectrum estimation, correlation analysis, PCA, and linear regression to continuously estimate/predict fluctuations in the human alertness level indexed by a driving performance measurement, the deviation between the center of the vehicle and the center of the cruising lane. Our results demonstrated that it is feasible to accurately estimate driving errors based on multichannel EEG power spectrum estimation and the principal component analysis algorithm. The computational methods we employed in this study are well within the capabilities of modern real-time embedded digital signal processing hardware, operating in real time on one or more channels of EEG data. Once an estimator has been developed for each driver, based on limited pilot testing, the method uses only spontaneous EEG signals from the individual and does not require further collection or analysis of operator performance. The proposed methods thus might be used to construct and test a portable embedded system for real-time alertness monitoring.

ACKNOWLEDGMENTS

The authors would like to thank Jeng-Ren Duann, Chun-Fei Hsu, Wen-Hung Chao, Yu-Chieh Chen, Kuan-Chih Huang, Shih-Cheng Guo, and Yu-Jie Chen for their great help in developing and operating the experiments. This work was supported in part by the Ministry of Education, Taiwan, under Grant EX-91-E-FAOE-4-4, and the Ministry of Economic Affairs, Taiwan, under Grant 93-17-A-02-S1-032, to C. T. Lin and associates, and by a grant from the Swartz Foundation to T. P. Jung.

REFERENCES

[1] J. French, “A model to predict fatigue degraded performance,” in Proc. IEEE 7th Conference on Human Factors and Power Plants, vol. 4, pp. 6–9, Scottsdale, Ariz, USA, September 2002.

[2] W. W. Wierwille, S. S. Wreggit, and R. R. Knipling, “Development of improved algorithms for on-line detection of driver drowsiness,” in Proc. Convergence ’94, International Congress on Transportation Electronics, SAE (Society of Automotive Engineers), pp. 331–340, Detroit, Mich, USA, October 1994.

[3] A. Amditis, A. Polychronopoulos, E. Bekiaris, and P. C. Antonello, “System architecture of a driver’s monitoring and hypovigilance warning system,” in Proc. IEEE Intelligent Vehicle Symposium (IV ’02), vol. 2, pp. 527–532, Versailles, France, June 2002.

[4] H. Ueno, M. Kaneda, and M. Tsukino, “Development of drowsiness detection system,” in Proc. Vehicle Navigation and Information Systems Conference (VNIS ’94), pp. 15–20, Yokohama, Japan, August–September 1994.

[5] R. Grace, V. E. Byrne, D. M. Bierman, et al., “A drowsy driver detection system for heavy vehicles,” in Proc. AIAA/IEEE/SAE 17th Conference on Digital Avionics Systems (DASC ’98), vol. 2, pp. I36/1–I36/8, Bellevue, Wash, USA, October–November 1998.

[6] T. Pilutti and A. G. Ulsoy, “Identification of driver state for lane-keeping tasks,” IEEE Trans. Syst., Man, Cybern. A, vol. 29, no. 5, pp. 486–502, 1999.

[7] P. Smith, M. Shah, and N. da Vitoria Lobo, “Monitoring head/eye motion for driver alertness with one camera,” in Proc. 15th International Conference on Pattern Recognition (ICPR ’00), vol. 4, pp. 636–642, Barcelona, Spain, September 2000.

[8] C. M. Frederick-Recascino and M. Hilscher, “Monitoring automated displays: effects of and solutions for boredom,” in Proc. 20th Conference of Digital Avionics Systems (DASC ’01), vol. 1, pp. 5D3/1–5D3/5, Daytona Beach, Fla, USA, October 2001.

[9] G. Kaefer, G. Prochart, and R. Weiss, “Wearable alertness monitoring for industrial applications,” in Proc. 7th IEEE International Symposium on Wearable Computers (ISWC ’03), pp. 254–255, White Plains, NY, USA, October 2003.

[10] K. B. Khalifa, M. H. Bedoui, R. Raytchev, and M. Dogui, “A portable device for alertness detection,” in Proc. 1st Annual International IEEE-EMBS Special Topic Conference on Microtechnologies in Medicine & Biology, pp. 584–586, Lyon, France, October 2000.

[11] C. A. Perez, A. Palma, C. A. Holzmann, and C. Pena, “Face and eye tracking algorithm based on digital image processing,” in Proc. IEEE International Conference on Systems, Man, and Cybernetics (SMC ’01), vol. 2, pp. 1178–1183, Tucson, Ariz, USA, October 2001.

[12] J. C. Popieul, P. Simon, and P. Loslever, “Using driver’s head movements evolution as a drowsiness indicator,” in Proc. IEEE International Intelligent Vehicles Symposium (IV ’03), pp. 616–621, Columbus, Ohio, USA, June 2003.

[13] T. L. Morris and J. C. Miller, “Electrooculographic and performance indices of fatigue during simulated flight,” Biological Psychology, vol. 42, no. 3, pp. 343–360, 1996.

[14] K. Van Orden, W. Limbert, S. Makeig, and T.-P. Jung, “Eye activity correlates of workload during a visuospatial memory task,” Human Factors, vol. 43, no. 1, pp. 111–121, 2001.

[15] T.-P. Jung, S. Makeig, M. Stensmo, and T. J. Sejnowski, “Estimating alertness from the EEG power spectrum,” IEEE Trans. Biomed. Eng., vol. 44, no. 1, pp. 60–69, 1997.

[16] S. Makeig and T.-P. Jung, “Changes in alertness are a principal component of variance in the EEG spectrum,” Neuroreport, vol. 7, no. 1, pp. 213–216, 1995.

[17] M. Matousek and I. Petersen, “A method for assessing alertness fluctuations from EEG spectra,” Electroencephalography and Clinical Neurophysiology, vol. 55, no. 1, pp. 108–113, 1983.

[18] S. Makeig and M. Inlow, “Lapses in alertness: coherence of fluctuations in performance and EEG spectrum,” Electroencephalography and Clinical Neurophysiology, vol. 86, no. 1, pp. 23–35, 1993.

[19] J. Qiang, Z. Zhiwei, and P. Lan, “Real-time nonintrusive monitoring and prediction of driver fatigue,” IEEE Trans. Veh. Technol., vol. 53, no. 4, pp. 1052–1068, 2004.

[20] S. Makeig and T.-P. Jung, “Tonic, phasic, and transient EEG correlates of auditory awareness in drowsiness,” Cognitive Brain Research, vol. 4, no. 1, pp. 15–25, 1996.

[21] S. Roberts, I. Rezek, R. Everson, H. Stone, S. Wilson, and C. Alford, “Automated assessment of vigilance using committees of radial basis function analysers,” IEE Proceedings – Science, Measurement & Technology, vol. 147, no. 6, pp. 333–338, 2000.

[22] J. A. Stern, D. Boyer, and D. Schroeder, “Blink rate: a possible measure of fatigue,” Human Factors, vol. 36, no. 2, pp. 285–297, 1994.

[23] J. A. Stern, L. C. Walrath, and R. Goldstein, “The endogenous eyeblink,” Psychophysiology, vol. 21, no. 1, pp. 22–33, 1984.

[24] D. Schmidt, L. A. Abel, L. F. Dell’Osso, and R. B. Daroff, “Saccadic velocity characteristics: intrinsic variability and fatigue,” Aviation, Space and Environmental Medicine, vol. 50, no. 4, pp. 393–395, 1979.

[25] D. K. McGregor and J. A. Stern, “Time on task and blink effects on saccade duration,” Ergonomics, vol. 39, no. 4, pp. 649–660, 1996.

[26] B. J. Wilson and T. D. Bracewell, “Alertness monitor using neural networks for EEG analysis,” in Proc. IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing X, vol. 2, pp. 814–820, Sydney, NSW, Australia, December 2000.

[27] P. Parikh and E. Micheli-Tzanakou, “Detecting drowsiness while driving using wavelet transform,” in Proc. IEEE 30th Annual Northeast Bioengineering Conference, pp. 79–80, Boston, Mass, USA, April 2004.

[28] M. Steriade, “Central core modulation of spontaneous oscillations and sensory transmission in thalamocortical systems,” Current Opinion in Neurobiology, vol. 3, no. 4, pp. 619–625, 1993.

[29] M. Treisman, “Temporal rhythms and cerebral rhythms,” in Timing and Time Perception, J. Gibbon and L. Allen, Eds., vol. 423, pp. 542–565, New York Academy of Sciences, New York, NY, USA, 1984.

[30] S. Chatterjee and A. S. Hadi, “Influential observations, high leverage points, and outliers in linear regression,” Statistical Science, vol. 1, no. 3, pp. 379–416, 1986.

[31] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.

Chin-Teng Lin received the B.S. degree from the National Chiao-Tung University (NCTU), Taiwan, in 1986, and the Ph.D. degree in electrical engineering from Purdue University, USA, in 1992. He is currently the Chair Professor and Associate Dean of the College of Electrical Engineering and Computer Science, and Director of the Brain Research Center at NCTU. He is the author of Neural Fuzzy Systems (Prentice Hall). He has published about 90 journal papers, including over 65 IEEE journal papers. He is an IEEE Fellow for his contributions to biologically inspired information systems. He currently serves on the Boards of Governors of the IEEE CAS and SMC Societies. He has been the President of the Asia Pacific Neural Network Assembly since 2004. He has received the Outstanding Research Award granted by the National Science Council, Taiwan, from 1997 to the present, the Outstanding Engineering Professor Award granted by the Chinese Institute of Engineering (CIE) in 2000, and the 2002 Taiwan Outstanding Information-Technology Expert Award. He was also elected one of the 38th Ten Outstanding Rising Stars in Taiwan (2000). He currently serves as an Associate Editor of the IEEE Transactions on Circuits and Systems, Part I & Part II, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Fuzzy Systems, and so forth.

Ruei-Cheng Wu received the B.S. degree in nuclear engineering from the National Tsing-Hua University, Taiwan, in 1995, and the M.S. degree in control engineering from the National Chiao-Tung University, Taiwan, in 1997. He is currently pursuing the Ph.D. degree in electrical and control engineering at the National Chiao-Tung University, Taiwan. His current research interests are biomedical signal processing, multimedia signal processing, fuzzy neural networks, and linear control.

Tzyy-Ping Jung received the B.S. degree in electronics engineering from the National Chiao Tung University, Taiwan, in 1984, and the M.S. and Ph.D. degrees in electrical engineering from The Ohio State University in 1989 and 1993, respectively. He was a Research Associate at the National Research Council of the National Academy of Sciences and at the Computational Neurobiology Laboratory, The Salk Institute, San Diego, California. He is currently an Associate Research Professor at the Institute for Neural Computation of the University of California, San Diego. He is also the Associate Director of the Swartz Center for Computational Neuroscience at UCSD. His research interests are in the areas of biomedical signal processing, cognitive neuroscience, artificial neural networks, time-frequency analysis of human EEG, functional neuroimaging, and the development of neural human-system interfaces.

Sheng-Fu Liang was born in Tainan, Taiwan, in 1971. He received the B.S. and M.S. degrees in control engineering from the National Chiao-Tung University (NCTU), Taiwan, in 1994 and 1996, respectively. He received the Ph.D. degree in electrical and control engineering from NCTU in 2000. From 2001 to 2005, he was a Research Assistant Professor in electrical and control engineering, NCTU. In 2005, he joined the Department of Biological Science and Technology, NCTU, where he serves as an Assistant Professor. He has also served as the Chief Executive of the Brain Research Center, NCTU Branch, University System of Taiwan, since September 2003. His current research interests are biomedical engineering, biomedical signal/image processing, machine learning, fuzzy neural networks (FNN), the development of brain-computer interfaces (BCI), and multimedia signal processing.

Teng-Yi Huang received the B.S. degree in electrical engineering from the National Central University, Taiwan, in 2002, and the M.S. degree in electrical and control engineering from the National Chiao-Tung University, Taiwan, in 2004. He is currently pursuing the Ph.D. degree at the National Chiao-Tung University, Taiwan. His research interests are in the areas of biomedical signal processing, biofeedback control, and virtual reality technology.