
A Weighted Approach for Sparse Signal Support Estimation with Application to EEG Source Localization

Ahmed Al Hilli, Laleh Najafizadeh and Athina Petropulu
Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854
E-mail: [email protected], [email protected], [email protected]

Abstract— In sparse signal recovery problems, ℓ1-norm minimization is typically used as an alternative to the more complex ℓ0-norm minimization. The Range Space Property (RSP) provides the conditions under which the least ℓ1-norm solution is equal to at most one of the least ℓ0-norm solutions. These conditions depend on the sensing matrix and the support of the underlying sparse solution. In this paper, we address the problem of recovering sparse signals by weighting the corresponding sensing matrix with a diagonal matrix. We show that by appropriately choosing the weights, we can formulate an ℓ1-norm minimization problem that satisfies the RSP, even if the original problem does not. By solving the weighted problem we can obtain the support of the original problem. We provide the conditions which the weights must satisfy, for both the noise-free and the noisy cases. Although the precise conditions involve information about the support of the sparse vector, the class of good weights is very wide, and in most cases encompasses an estimate of the underlying vector obtained via a conventional method, i.e., a method that does not encourage sparsity. The proposed approach is a good candidate for Electroencephalography (EEG) sparse source localization, where the corresponding sensing matrix has high coherence. The performance of the proposed approach is evaluated both via simulations and via experiments based on localizing active sources in the brain corresponding to an auditory task from EEG recordings of a human subject.

    I. INTRODUCTION

In sparse signal recovery problems we wish to describe the observation, y, using the smallest possible basis from a dictionary matrix, A, or equivalently, we seek the sparsest solution to the problem y = Ax. In order to avoid the complexity of the underlying ℓ0-norm minimization problem, one typically employs ℓ1-norm minimization, hoping that the two problems are strictly equivalent [1], i.e., there is a unique least ℓ0-norm solution (a unique sparsest solution), which coincides with the least ℓ1-norm solution. Conditions for strict equivalence include the mutual coherence [2], the Restricted Isometry Property (RIP) [3], and the Null Space Property [4]. When ℓ1-norm minimization algorithms are used for sparse signal recovery [2], [3], [5], it is implicitly assumed that the conditions for strict equivalence hold. However, in most cases, either the conditions do not hold, or their validity cannot be easily confirmed. In fact, only a few dictionary matrices have been proven to satisfy the strict equivalence conditions [3], [6]. In real world scenarios, strict equivalence conditions may not be satisfied, in which case the least ℓ1-norm solution may not be related to the sparse signal of interest.

Work supported in part by NSF under grant NSF ECCS 1408437. Preliminary results of this work were presented at CAMSAP 2015.


The recently introduced Range Space Property (RSP) and full rank property [1] address the case in which the least ℓ0-norm solution is not unique, and provide the conditions for the least ℓ1-norm solution to be equal to at most one of the sparsest solutions; when this happens, the ℓ1-norm and ℓ0-norm minimization problems are called equivalent. The RSP conditions depend on the sensing matrix and the support of the underlying sparse solution.

Another approach for estimating a sparse vector encompasses re-weighted recursive methods, most notably the FOcal Underdetermined System Solver (FOCUSS) method [7] and re-weighted ℓ1-norm minimization [8]. In FOCUSS, a low resolution estimate of the sparse vector is used as an initial weight. In an iterative fashion, FOCUSS solves a weighted minimum least-squares problem, using weights that are inversely proportional to the solution obtained during the previous iteration. Although the iteration has been shown to converge, due to multiple local minima it may converge to a sub-optimal solution [9]. Also, FOCUSS performance degrades at low Signal-to-Noise Ratio (SNR) [10], [11]. The re-weighted ℓ1-norm approach [8] is also iterative; in each iteration, a sparse signal estimate is obtained by solving a weighted ℓ1-norm minimization problem subject to various constraints, with the weights taken to be inversely proportional to the absolute values of the signal estimate obtained in the previous iteration. In addition to problems with local minima, there is no guarantee that the re-weighted ℓ1-norm approach will converge as the number of iterations increases.
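To make the re-weighted iteration concrete, the following is a minimal sketch in the spirit of [8], not the authors' exact algorithm: it assumes a generic convex solver (cvxpy), and the damping constant eps and the iteration count are illustrative choices, not values prescribed in [8].

import numpy as np
import cvxpy as cp

def reweighted_l1(y, A, n_iter=5, eps=1e-3):
    """Re-weighted l1 minimization: each pass solves a weighted l1 problem
    whose weights are inversely proportional to the previous estimate
    (eps avoids division by zero)."""
    n = A.shape[1]
    w = np.ones(n)                          # first pass is plain l1 minimization
    for _ in range(n_iter):
        x = cp.Variable(n)
        cp.Problem(cp.Minimize(cp.norm1(cp.multiply(w, x))),
                   [A @ x == y]).solve()
        w = 1.0 / (np.abs(x.value) + eps)   # small entries get large weights
    return x.value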

Greedy algorithms have also been proposed for sparse vector recovery, such as Matching Pursuit (MP) [12], Orthogonal MP (OMP) [13], and Order Recursive MP (ORMP) [14]. The MP method constructs a basis that best represents the signal by selecting columns out of the overcomplete basis matrix, A. The basis is constructed one vector at a time. In the i-th iteration, the column that is most correlated with the residual vector is selected and added to the basis; the residual vector is initially set equal to the observation, and in each iteration is updated by subtracting from it the contribution of the column selected in that iteration. In OMP [13], the estimate of the sparse vector is updated by projecting the observation vector onto the subspace of the selected columns. This avoids the selection of the same column in different iterations, which can happen in MP; a sketch of OMP is given below. A variant of OMP is the Order Recursive Matching Pursuit (ORMP) [12]. Since in all greedy algorithms column selection depends on the column correlation with the residual, performance suffers in cases of highly correlated sources. For instance, in [14], the authors show degraded performance for data consisting of closely spaced frequencies and a dictionary matrix A constructed from the rows of a high-resolution Discrete Fourier Transform.
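For reference, a minimal OMP sketch in the sense of [13] follows (plain numpy; the stopping rule based on a known sparsity level k is an illustrative simplification).

import numpy as np

def omp(y, A, k):
    """Orthogonal Matching Pursuit: greedily select the column most
    correlated with the residual, then re-fit by least squares on the
    subspace of all selected columns."""
    r, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ r)))      # most correlated column
        support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x_s              # residual after projection
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x, sorted(support)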

In this paper, we address the problem of recovering sparse signals by weighting the corresponding sensing matrix with a diagonal matrix. We show that by appropriately choosing the weights, we can formulate an ℓ1-norm minimization problem that satisfies the RSP, even if the original problem does not. By solving the weighted problem we obtain the support of the original solution. We provide the conditions which the weights must satisfy, for both the noise-free and the noisy cases. We should emphasize that, like all existing weighted methods [7], [8], the proposed method does not provide a precise construction of the weights. Instead, our paper shows analytically that there is a large class of functions that qualify as good weights; these functions are related to a low resolution estimate of the underlying sparse vector, i.e., an estimate obtained via a standard method that does not induce sparsity. It is worth mentioning that after estimating the support, one can easily recover the signal by solving an overdetermined system based on the observations and a matrix composed of the columns of the dictionary matrix associated with the support. Simulation results show that in practical scenarios, in which one does not know whether the RSP is met, using the proposed approach results in significantly improved localization of the sparse signal samples. Simulation results also show that the proposed method performs better than [7], and is less sensitive than greedy algorithms [12]–[14] in scenarios with correlated sources. This paper expands on our preliminary results reported in [15] by analyzing the effect of using a coarse signal estimate to construct the weights, and extends the results to noisy environments.

As an example application of the proposed approach, we focus on the Electroencephalography (EEG) source localization problem, in which measurements obtained by sensors placed on the head are used to localize activations inside the brain. Assuming sparse brain activity in response to simple tasks, the problem of source localization can be formulated as a sparse signal recovery problem, with the support of the sparse vector directly related to the coordinates of the sources inside the brain. The challenge, however, is that the corresponding basis matrix, referred to as the lead field matrix, has high mutual coherence, based on which there is no guarantee that the corresponding least ℓ1-norm solution will lead to the actual sources. The proposed method, with weights equal to the MUSIC estimate of the brain activity, is used in an experiment eliciting auditory evoked potentials, and is shown to correctly localize brain activations.

The paper is organized as follows. Section II reviews background theory on sparse vector recovery. Section III presents the proposed approach. Section IV shows the simulation and experimental results of the application of the proposed approach to EEG source localization, and Section V provides concluding remarks.

Notation: Throughout the paper we use boldface capital letters to denote matrices, boldface small letters to denote column vectors, and normal small letters to denote scalars. We use ≺ to represent element-wise less than, ≻ to represent element-wise greater than, (·)† to represent the Moore-Penrose pseudoinverse, and N(A) to represent the null space of matrix A. x_i is used to represent the i-th element of the vector x, a_i the i-th column of a matrix A, and w_ij the element at row i and column j of a matrix W. ‖·‖_p denotes the p-norm of a vector, while |x| denotes a vector whose elements are the absolute values of the corresponding elements of x.

II. BACKGROUND THEORY ON SPARSE SIGNAL RECOVERY

    The sparsest solution of the underdetermined problem

    y = Ax (1)

can be obtained by solving the ℓ0-norm minimization problem, i.e.,

minimize_x ‖x‖_0  subject to  y = Ax.    (2)

However, since this is an NP-hard problem, its convex ℓ1-norm relaxation is used instead [2], [3], [5], i.e.,

minimize_x ‖x‖_1  subject to  y = Ax.    (3)
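For reference, (3) can be handed directly to a generic convex solver; a minimal sketch, assuming cvxpy as the (illustrative) tool:

import cvxpy as cp

def basis_pursuit(y, A):
    """Solve (3): minimize ||x||_1 subject to y = Ax."""
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
    return x.value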

Under certain conditions, the problems (2) and (3) are strictly equivalent, i.e., there is a unique least ℓ0-norm solution that coincides with the least ℓ1-norm solution [7], [2]–[4]. Sometimes, (2) has several sparsest solutions, i.e., solutions with the same number of non-zero elements but different support. When one of those sparsest solutions coincides with a solution of (3), then problems (3) and (2) are referred to as equivalent [1]. The work in [1] provides a set of conditions for the least ℓ1-norm solution to be equal to the sparsest solution, and also shows that under those conditions there is at most one sparsest solution that satisfies the RSP conditions. The conditions are given in the following theorem.

Theorem 1: Let x ∈ R^n be a sparsest solution to (1). x is the least ℓ1-norm solution if and only if x satisfies both of the following conditions:

(i) Range Space Property (RSP): There exists a vector u such that:
(a) u ∈ R(A^T)
(b) u_i = 1 if x_i > 0
(c) u_i = −1 if x_i < 0
(d) |u_i| < 1 if x_i = 0

(ii) Full Rank Property: The matrix [A_J+ A_J−] has full column rank, where A_J+ and A_J− are matrices containing the columns of A associated with the positive and negative elements of x, respectively [1].

Corollary 1 [1]: For any given underdetermined linear system, there exists at most one sparsest solution satisfying the RSP.

III. THE PROPOSED APPROACH

    A. Noise-Free Sparse Vectors

As already mentioned, if the sensing matrix A in (1) does not satisfy the strong equivalence conditions, there is no guarantee that the least ℓ1-norm solution will be equal to the underlying sparse vector x. In the following, we show that by appropriately weighting the sensing matrix, we can formulate an ℓ1-norm minimization problem that satisfies the RSP, thus guaranteeing that its solution has the same support as at most one of the sparsest solutions. By selecting the weights based on an estimate obtained via a conventional method that does not encourage sparsity, we bias the ℓ1-norm minimization problem to produce a sparsest solution that has the same support as the signal of interest.

A tool in showing the aforementioned result is a sufficient condition for Theorem 1 to hold, in the context of the system of (1). The sufficient condition is given in the following theorem.

Theorem 2: If for a sparsest solution x it holds that |A_J0^T (A_Js^T)† u_s| ≺ 1, then x equals the least ℓ1-norm solution, where 1 is a vector of 1's of appropriate size, A_Js ≜ [A_J+ A_J−], A_J0 contains the columns of A associated with zero values in x, and u_s = [u_p^T, u_n^T]^T, where u_p and u_n are vectors of 1's and −1's corresponding to the positive and negative x_i's, respectively.

Proof: To prove Theorem 2 we will prove its contrapositive, i.e., we will show that if a solution does not satisfy Theorem 1, then |A_J0^T (A_Js^T)† u_s| ⊀ 1, i.e., at least one of its elements is greater than or equal to 1.

Let x satisfy RSP conditions (a), (b), and (c), but not condition (d) of Theorem 1(i). On defining v such that A^T v = u_s, the aforementioned conditions can be rewritten as

A_J+^T v = 1,  A_J−^T v = −1,  |A_J0^T v| ⊀ 1.    (4)

Let us group together the first and second conditions in (4) to form the system A_s^T v = u_s, where A_s = [A_J+ A_J−] and u_s = [1^T −1^T]^T. This is an underdetermined system, whose least-squares solution equals v_ls = (A_s^T)† u_s. Substituting v_ls into |A_J0^T v| ⊀ 1, we get |A_J0^T (A_s^T)† u_s| ⊀ 1, which completes the proof.
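Once the support and sign pattern of a candidate solution are known, the sufficient condition of Theorem 2 is directly computable; a sketch (numpy; J_plus and J_minus are index arrays for the positive and negative entries of x, names chosen here for illustration):

import numpy as np

def theorem2_holds(A, J_plus, J_minus):
    """Check |A_J0^T (A_Js^T)^dagger u_s| < 1 elementwise (Theorem 2)."""
    Js = np.concatenate([J_plus, J_minus])
    J0 = np.setdiff1d(np.arange(A.shape[1]), Js)
    u_s = np.concatenate([np.ones(len(J_plus)), -np.ones(len(J_minus))])
    t = A[:, J0].T @ np.linalg.pinv(A[:, Js].T) @ u_s
    return bool(np.all(np.abs(t) < 1))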

Let W be a diagonal matrix, which is nonzero over the support of x, such that x = Wq. Then (2) can be written as

minimize_q ‖Wq‖_0  subject to  y = AWq.    (5)

Since ‖Wq‖_0 = ‖q‖_0, we can write (5) as

minimize_q ‖q‖_0  subject to  y = AWq.    (6)

The corresponding ℓ1-norm minimization problem becomes

minimize_q ‖q‖_1  subject to  y = AWq.    (7)

By solving the problem of (7), we will be able to determine the support of x, as x and q have the same support. For the discussion below, we will assume that the observation is a linear combination of independent columns of A.
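In code, (7) only requires scaling the columns of A by the diagonal of W before the ℓ1 solve; a minimal sketch (cvxpy assumed, with a hypothetical threshold tol for reading off the support):

import numpy as np
import cvxpy as cp

def weighted_l1_support(y, A, w, tol=1e-6):
    """Solve (7): minimize ||q||_1 s.t. y = A W q, with W = diag(w),
    and return the estimated support of x = W q."""
    AW = A * w                          # scales column j of A by w[j]
    q = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm1(q)), [AW @ q == y]).solve()
    return np.flatnonzero(np.abs(q.value) > tol)

With w set to a low-resolution estimate of the sparse vector, the recovered support can then be used to solve the small overdetermined least-squares problem mentioned in Section I.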

Suppose that the problem of (3) does not satisfy the RSP. When formulating the problem of (7), we can appropriately select W such that the solution of (7) satisfies the condition in Theorem 2, and as such, the problem of (7) satisfies the RSP. With the right W, and since based on Corollary 1 there is at most one sparsest solution satisfying the RSP, the solution of (7) will have the same support as the least ℓ0-norm solution of (1).

Substituting AW for A in the condition of Theorem 2, we get the following sufficient condition on W:

|W_J0 A_J0^T (A_Js^T)† W_Js^{-1} u_s| ≺ 1    (8)

where W_J0 and W_Js denote the elements of W associated with the zero and non-zero elements of x, respectively.

Proposition 1: If we choose W such that (8) is satisfied for a sparsest solution q*, then q* is the only sparsest solution that is equal to the solution of (7). The proof follows directly from Corollary 1.

We should emphasize that the weighting matrix does not improve the coherence of the sensing matrix, since multiplication with a diagonal matrix does not change the matrix coherence. Instead, we constrain the weighting matrix so that the weighted problem satisfies the Range Space Property. By satisfying the RSP, the least ℓ1-norm solution will have the same support as at most one of the sparsest solutions of the original ℓ0-norm problem.

There are infinitely many choices of W that satisfy the condition in (8). In this paper, we use as weight vector an estimate of the sparse vector obtained through a method that does not encourage sparsity. We should note that the ℓ1-norm solution would not be a good weight vector, because most of its entries are zero, and important components of the estimated vector may have been lost. Suppose that a rough estimate of x can be approximated as x̂, given as

x̂_j = Σ_{i∈S} x_i h_i(j − i),    (9)

where h_i(j) is a Gaussian kernel with zero mean and variance σ_i, and S is the set of indices in the support of the sparse vector. Let us set w_j = x̂_j. Then, (8) can be written as

|W_J0 A_J0^T (A_Js^T)† U w_{−J}| ≺ 1    (10)

where U is a diagonal matrix with diagonal entries equal to u_s, and w_{−J} is a column vector with entries equal to the reciprocals of the diagonal entries of W_Js. Via the identity |a^T b| ≤ ‖a‖_∞ ‖b‖_1, applied to row j of (10) with b = w_{−J}, we can rewrite (10) as

‖(A_{J0,j}^T (A_Js^T)† U)^T‖_∞ · w_{J0,j} · Σ_{k=1}^K w_{−J,k} < 1,
‖(A_{J0,j}^T (A_Js^T)† U)^T‖_∞ · Σ_{i∈S} w_i < 1 / Σ_{k=1}^K w_{−J,k}    (11)

for all j ∉ S, where K represents the number of non-zero entries in x.

The term ‖(A_{J0,j}^T (A_Js^T)† U)^T‖_∞ in (11) can be written as ‖(A_{J0,j}^T A_Js (A_Js^T A_Js)^{−1} U)^T‖_∞, which is directly related to the coherence of the matrix A. Without the weights (σ_i = 0), high coherence might prevent (11) from being valid. However, if σ ≠ 0, there are weights that will counteract the high coherence and make (11) valid.

    B. A More Relaxed Condition on W

Eq. (8) provides a sufficient condition on W for the RSP to be satisfied. In fact, there is a wider class of admissible weights, which can be found by exploiting a more relaxed condition on W. To derive the relaxed condition, we first rewrite the RSP conditions in an equivalent form, and then provide an upper limit on the weights such that the RSP is satisfied.

Let x0 be a sparsest solution to (1). Based on the RSP, there should be a vector v such that

(a) A_Js^T v = u_s
(b) |A_J0^T v| ≺ 1    (12)

with A_Js, A_J0, and u_s as defined in Theorem 2. If we assume that the number of non-zero entries in the sparse vector x0 is less than the number of rows in A, (12-(a)) is an underdetermined system, and its solution can be written as

v = (A_Js^T)† u_s + α    (13)

where α is a vector that belongs to the null space of A_Js^T. Substituting (13) into (12), we get

(a) |A_J0^T (A_Js^T)† u_s + A_J0^T α| ≺ 1
(b) α ∈ N(A_Js^T)    (14)

Note that the conditions in (14) are necessary and sufficient for x0 to be equal to the least ℓ1-norm solution. In other words, x0 equals the least ℓ1-norm solution if and only if the intersection of the set of vectors α that satisfy (14-(a)) with the null space of A_Js^T is a non-empty set. Finding such an α can be done by solving the following convex problem

minimize_{α,ε} ε  subject to  |A_J0^T (A_Js^T)† u_s + A_J0^T α| ≤ ε·1,  A_Js^T α = 0.    (15)

If the minimum ε < 1, then the conditions in (14) are satisfied. Otherwise, the conditions in (14) are not satisfied (i.e., there is no α ∈ N(A_Js^T) that satisfies (14-(a))). For the weighted approach, the conditions of (14) can be rewritten as

(a) |W_J0 A_J0^T (A_Js^T)† W_Js^{-1} u_s + W_J0 A_J0^T α| ≺ 1
(b) α ∈ N(A_Js^T)    (16)
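Both the feasibility check (15) and its weighted counterpart built from (16) are small convex programs; a minimal sketch (cvxpy assumed; J_plus/J_minus hold the signed support, and passing w = None reproduces the non-weighted case (15)):

import numpy as np
import cvxpy as cp

def rsp_gap(A, J_plus, J_minus, w=None):
    """Minimize eps s.t. |c + B @ alpha| <= eps and A_Js^T alpha = 0;
    a returned eps < 1 certifies the conditions in (14) / (16)."""
    Js = np.concatenate([J_plus, J_minus])
    J0 = np.setdiff1d(np.arange(A.shape[1]), Js)
    u_s = np.concatenate([np.ones(len(J_plus)), -np.ones(len(J_minus))])
    pinv = np.linalg.pinv(A[:, Js].T)
    if w is None:                                  # Eq. (15)
        c, B = A[:, J0].T @ pinv @ u_s, A[:, J0].T
    else:                                          # weighted case, Eq. (16)
        c = w[J0] * (A[:, J0].T @ pinv @ (u_s / w[Js]))
        B = w[J0][:, None] * A[:, J0].T
    alpha, eps = cp.Variable(A.shape[0]), cp.Variable()
    cp.Problem(cp.Minimize(eps),
               [cp.abs(c + B @ alpha) <= eps,
                A[:, Js].T @ alpha == 0]).solve()
    return eps.value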

If the sufficient condition of Theorem 2 is not satisfied for the non-weighted ℓ1-norm problem (i.e., some of the elements of the vector |A_J0^T (A_Js^T)† u_s| are larger than or equal to 1), x0 can still be a solution to the ℓ1-norm problem if there is a vector in the null space of A_Js^T that meets (14-(a)). However, due to the high coherence between the columns of A, many elements of the vector |A_J0^T (A_Js^T)† u_s| may be greater than 1, and it could be impossible to find such an α (i.e., the intersection between the two sets defined in (14-(a)) and (14-(b)) is empty). However, we can ensure that the intersection of the sets defined in (16-(a)) and (16-(b)) is non-empty by choosing W such that (16-(a)) is satisfied for a vector α ∈ N(A_Js^T). Next, we find an upper limit on W following the model in (9), such that the conditions in (16) are satisfied.

Let us assume that the weights follow the model defined in (9). The problem can then be recast as selecting the maximum σ such that the sets defined in (16) intersect. This can be done by solving the following optimization problem

maximize_{α,σ} σ
subject to |W_J0 A_J0^T (A_Js^T)† W_Js^{-1} u_s + W_J0 A_J0^T α| ≺ 1,
           A_Js^T α = 0,
           w_{J0,j} = Σ_{i∈S} x_i h_i(j − i), ∀ j ∉ S    (17)

where h_i(j) is a Gaussian kernel with zero mean and variance σ². We can solve (17) via an iterative approach. By first choosing a large σ, we find the vector α for which the distance between the two sets defined by the first and second constraints in (17) is minimum. Using this α, we construct the maximum weights such that the first constraint in (17) is satisfied. The maximum weights can then be used to estimate σ such that the weights in the third constraint of (17) are less than the maximum weights. The process is repeated by finding another α with minimum distance between the two sets defined by the first and second constraints in (17) using the new weights, and is stopped when the distance is zero.
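The exact scheme above requires knowledge of the true support, so it is an analysis tool rather than a practical algorithm. As a simplified, conservative stand-in for (17), one can sweep σ over a grid and keep the largest value for which the weighted sufficient condition (8) still holds; a sketch (numpy; the grid and the use of |x| in the blur of (9) are illustrative choices, not the authors' procedure):

import numpy as np

def max_sigma(A, x, sigmas=np.arange(0.5, 8.5, 0.5)):
    """Largest sigma in the grid whose blurred-weight vector still
    satisfies the sufficient condition (8)."""
    n = A.shape[1]
    S = np.flatnonzero(x)
    J0 = np.setdiff1d(np.arange(n), S)
    u_s = np.sign(x[S])
    idx = np.arange(n)[:, None]
    best = None
    for sigma in sigmas:
        blur = np.exp(-(idx - S[None, :]) ** 2 / (2 * sigma ** 2))
        w = (np.abs(x[S])[None, :] * blur).sum(axis=1)   # Eq. (9) on |x|
        t = w[J0] * (A[:, J0].T @ np.linalg.pinv(A[:, S].T) @ (u_s / w[S]))
        if np.all(np.abs(t) < 1):
            best = sigma
    return best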

Remark: The above approach does not provide a practical way to construct the weights, as it uses information about the sparse signal support. However, by finding the limit on the weights, one can see that the class of weights satisfying the conditions in (16) is wide. A low-resolution estimate of the true sparse vector would likely fall in that class, and thus make the weighted ℓ1-norm problem satisfy the RSP, even if the original non-weighted ℓ1-norm problem does not. This is illustrated in the following simulations.

Matrix A (64 × 2889) was constructed using a realistic head model obtained from the BrainStorm software [16], which represents the relationship between the EEG channel readings and the source distribution inside the brain. The number of non-zero entries in the sparse vector was set to 4. The non-zero indices were assigned randomly, and the corresponding non-zero elements were drawn from a Gaussian distribution with mean 1 and standard deviation 5. In order to simulate a weighting matrix obtained from a blurred version of the sparse vector, the diagonal weighting matrix was constructed according to (9) with σ_i = σ. Two cases were considered: σ = 1, 2, ..., 7 and σ = 8. Figs. 1-a, 1-b, and 1-d show the true sparse signal, the least ℓ1-norm estimate, and the weighted least ℓ1-norm estimate using one of the weights shown in Fig. 1-c, respectively. Fig. 1-e shows the upper bound on the weights such that the weighted least ℓ1-norm solution coincides with the true solution (obtained by solving (17) iteratively), while Fig. 1-f is the weighted least ℓ1-norm estimate for the weights shown in Fig. 1-e (i.e., σ = 8). One can see from Fig. 1-d that the weighted solution correctly estimates the support of the original vector for a large class of weights (Fig. 1-c), as long as the weights fall below the upper limits shown in Fig. 1-e (blue curve). Of course, at some point, when the support mismatch is large, i.e., σ = 8, the estimate deteriorates (Fig. 1-f).

    C. Noisy Sparse Vector

To account for noise in the observations, we minimize an objective function that is a tradeoff between ‖q‖_1 and the fitting error ‖y − AWq‖_2, i.e.,

minimize_q h‖q‖_1 + ‖y − AWq‖_2    (18)

where h is the regularization parameter. Let us first consider the non-weighted problem corresponding to (18), i.e.,

minimize_x h‖x‖_1 + ‖y − Ax‖_2    (19)
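Both (18) and (19) are unconstrained second-order cone programs; a minimal sketch of (18) (cvxpy assumed; note the fitting term is the ℓ2 norm itself, not its square):

import numpy as np
import cvxpy as cp

def weighted_noisy_l1(y, A, w, h):
    """Solve (18): minimize h*||q||_1 + ||y - A W q||_2, with W = diag(w)."""
    AW = A * w                              # scales column j of A by w[j]
    q = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(h * cp.norm1(q) + cp.norm(y - AW @ q, 2))).solve()
    return q.value                          # pass w = ones for problem (19)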

Assuming that Slater's condition and strict complementary slackness hold, we give the following theorem.

Theorem 3: x* is a solution to the problem of (19) if and only if there is a u such that
(a) u_i = h if x*_i > 0
(b) u_i = −h if x*_i < 0
(c) |u_i| < h if x*_i = 0
(d) u = A^T (y − Ax*) / ‖y − Ax*‖_2

Proof: First, we prove the necessary condition of the theorem. Define x*_p and x*_n to be vectors containing the positive and negative entries of x*, respectively, i.e.,

x*_{p,i} = x*_i if x*_i > 0 and 0 otherwise,   x*_{n,i} = −x*_i if x*_i < 0 and 0 otherwise.

Then x* = x*_p − x*_n and ‖x*‖_1 = 1^T(x*_p + x*_n). The problem of (19) can be rewritten as

minimize_{x_p, x_n, v, t}  h 1^T x_p + h 1^T x_n + t
subject to  x_p ≥ 0,  x_n ≥ 0,  t ≥ ‖v‖_2,  v = y − A x_p + A x_n    (20)

with solution x* = x*_p − x*_n. The Lagrangian of (20) is

L = h 1^T x_p + h 1^T x_n + t − λ_1^T x_p − λ_2^T x_n − α(t − ‖v‖_2) + λ_3^T (y − A x_p + A x_n − v)    (21)

and the corresponding dual problem is

maximize_{λ_1, λ_2, λ_3}  y^T λ_3
subject to  A^T λ_3 + λ_1 = h1  (C1)
            A^T λ_3 − λ_2 = −h1  (C2)
            λ_1 ≥ 0,  λ_2 ≥ 0,  ‖λ_3‖_2 ≤ 1    (22)

Denoting by v*, λ*_1, λ*_2, and λ*_3 the solution to (22), and based on strict complementary slackness, we have λ*_{1,i} = 0 when x*_{p,i} > 0, and λ*_{2,i} = 0 when x*_{n,i} > 0. From complementary slackness and the constraints (C1) and (C2) in (22), we have:

a_{p,i}^T λ_3 = h when x*_i > 0,   a_{n,i}^T λ_3 = −h when x*_i < 0    (23)

where a_{p,i} and a_{n,i} are the columns of A associated with the positive and negative elements of x*, respectively.

By adding and subtracting (C1) and (C2) in (22), we get

A^T λ*_3 = (λ_2 − λ_1)/2  (C3),   λ_1 + λ_2 = 2h1  (C4)    (24)

For zero x*_{p,i} and x*_{n,i}, λ_{1,i} and λ_{2,i} are non-zero, and from (C4) in (24) we have 0 < λ_{1,i} < 2h and 0 < λ_{2,i} < 2h, which implies

−2h < λ_{2,i} − λ_{1,i} < 2h when x*_i = 0    (25)

Combining (25) and (C3) in (24), we get

−h < ā_i^T λ_3 < h when x*_i = 0    (26)

where ā_i is any column of A associated with a zero entry in x*. For proving statement (d) of this theorem, we use Slater's condition, i.e.,

y^T λ*_3 = h 1^T x*_p + h 1^T x*_n + ‖v*‖_2    (27)

Since y = A x*_p − A x*_n + v*, we can rewrite (27) as follows

(A^T λ*_3 − h1)^T x*_p − (A^T λ*_3 + h1)^T x*_n + λ*_3^T v* = ‖v*‖_2    (28)

The first two terms in (28) are always zero. This is because, from (23), the entries of A^T λ*_3 − h1 and A^T λ*_3 + h1 corresponding to the non-zero entries of x*_p and x*_n are zero, and these terms are also zero when x*_p and x*_n are zero. This implies that

λ*_3^T v* = ‖v*‖_2  ⇒  λ*_3 = v*/‖v*‖_2 = (y − Ax*)/‖y − Ax*‖_2    (29)

So, indeed, there is a u = A^T λ*_3 ∈ R(A^T) such that when x* is a solution to (19), we have
(a) u_i = h if x*_i > 0
(b) u_i = −h if x*_i < 0
(c) |u_i| < h if x*_i = 0
(d) u = A^T (y − Ax*)/‖y − Ax*‖_2.
Now, we prove the sufficient condition, i.e., if for a given x* we have λ*_3 such that

    given x?, we have λ?3 such that

    λ?3 =y −Ax?

    ‖y −Ax?‖2,

    ui = a

    Ti λ3 = h if x

    ?i > 0

    ui = aTi λ3 = −h if x?i < 0

    |aTi λ3| < h if x?i = 0(30)

then x* is a solution to (19). Assume that x̂ ≠ x* is the solution to (19). Then, it should hold that h‖x̂‖_1 + ‖y − Ax̂‖_2 < h‖x*‖_1 + ‖y − Ax*‖_2. Also, from the necessary condition of this theorem, we should have

a_i^T λ̂_3 = h if x̂_i > 0,  a_i^T λ̂_3 = −h if x̂_i < 0,  |a_i^T λ̂_3| < h if x̂_i = 0,  λ̂_3 = (y − Ax̂)/‖y − Ax̂‖_2    (31)

Following the assumption of Slater's condition, and since the dual problem should attain its maximum at λ̂_3 and v̂ = y − Ax̂, we should have

y^T v̂/‖v̂‖_2 > y^T v*/‖v*‖_2
⇒ (Ax̂ + v̂)^T v̂/‖v̂‖_2 > (Ax* + v*)^T v*/‖v*‖_2
⇒ h‖x̂‖_1 + ‖y − Ax̂‖_2 > h‖x*‖_1 + ‖y − Ax*‖_2    (32)

Fig. 1. (a) The original source vector. (b) Least ℓ1-norm solution. (c) Weights corresponding to σ = 1, ..., 7. (d) Least ℓ1-norm solution of the weighted problem obtained with any of the weights shown in (c). (e) Weights for σ = 8 (green); the blue line shows the upper bound on the weights based on Eq. (9). (f) Weighted least ℓ1-norm solution with the weights equal to the green curve in (e); these weights exceed the upper bound estimated by solving (17).

which contradicts the initial assumption that h‖x̂‖_1 + ‖y − Ax̂‖_2 < h‖x*‖_1 + ‖y − Ax*‖_2. This implies that x̂ is not the solution to the problem of (19), which completes the proof.
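Conditions (a)-(d) of Theorem 3 give a directly computable optimality certificate for a candidate solution of (19); a numerical sketch (numpy; the tolerance tol is a hypothetical allowance for solver inaccuracy):

import numpy as np

def certifies_theorem3(y, A, x, h, tol=1e-6):
    """Check conditions (a)-(d) of Theorem 3 for a candidate x."""
    r = y - A @ x
    u = A.T @ (r / np.linalg.norm(r))       # condition (d)
    on = np.abs(x) > tol                    # support of x
    on_ok = np.all(np.abs(u[on] - h * np.sign(x[on])) < tol)   # (a), (b)
    off_ok = np.all(np.abs(u[~on]) < h + tol)                  # (c)
    return bool(on_ok and off_ok)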

With Theorem 3 in place, we can now state the effect of the weighting matrix W on the solution in the following theorem.

Theorem 4: In the context of the problem of (18), let w_ii be the weight associated with entry q_i. If we choose w_ii < h/‖a_i‖_2, then q_i = 0, where a_i is the column of A associated with q_i.

Proof: From condition (c) of Theorem 3, if we choose w_ii such that |w_ii a_i^T λ*_3| < h, then q_i = 0. Starting from condition (c) of Theorem 3, we have

w_ii |a_i^T λ*_3| < h  ⇒  w_ii < h / |a_i^T λ*_3|    (33)

and since |a_i^T λ*_3| ≤ ‖a_i‖_2 ‖λ*_3‖_2 = ‖a_i‖_2, we get

w_ii < h/‖a_i‖_2  ⇒  q_i = 0    (34)

The importance of Theorem 4 is that, instead of solving the original problem, one can solve a reduced-dimensional problem by setting to zero the entries of x whose weights satisfy the condition of Theorem 4, and solving only for the entries that violate it; this reduces the amount of memory and computation required to solve the problem in (18).
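In code, this screening step is one line of column-norm arithmetic; a sketch (numpy; names are illustrative):

import numpy as np

def screen_by_theorem4(A, w, h):
    """Indices that survive the Theorem 4 test: entries with
    w_i < h / ||a_i||_2 are guaranteed zero and can be dropped."""
    keep = np.flatnonzero(w >= h / np.linalg.norm(A, axis=0))
    return keep   # solve (18) over the reduced A[:, keep] and w[keep]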

Another interpretation of the condition in Theorem 4 is that for x*_i = 0, λ*_3 should lie strictly between the two hyperplanes defined by a_i^T λ*_3 = h and a_i^T λ*_3 = −h, while for |x*_i| > 0, λ*_3 should lie on the one of these hyperplanes whose sign matches that of x_i.

Now, we study the effect of the weighting matrix W on the solution of (18). The constraints of the dual problem of (18) can be rewritten as

|w_ii a_i^T λ*_3| ≤ h for all i,   ‖λ*_3‖_2 ≤ 1    (35)

By assigning small values to w_ii, the distance between the two hyperplanes increases, and in this case there is no intersection of the two hyperplanes with the unit ℓ2-norm ball of λ*_3. Conversely, when we assign large values to w_ii, the distance between the corresponding hyperplanes decreases, making these hyperplanes intersect the unit ℓ2-norm ball of λ*_3.

The above discussion is illustrated in Fig. 2 for the non-weighted case, and in Fig. 3 for the weighted case. In this example, we chose

A = [0.5377 −2.2588 0.3188; 1.8339 0.8622 −1.3077],   x0 = [1 0 0]^T.

White noise was added to the output with signal-to-noise ratio (SNR) equal to 15 dB, and h was set to 0.5. In Fig. 2, the two hyperplanes associated with each column of A are plotted, with the solid line representing the hyperplane associated with positive h, and the dashed line the hyperplane associated with negative h. The red, blue, and green lines are the hyperplanes associated with the first, second, and third columns of A, respectively. The blue circle represents the unit ℓ2-norm ball of λ*_3. As shown in Fig. 2, the value of λ*_3 is the point of intersection of the green and red lines with the unit-norm circle of λ*_3.

Fig. 2. The solution space of λ_3 ∈ R² in the non-weighted problem. Solid and dashed lines represent the hyperplanes associated with positive and negative h, respectively. The red, blue, and green lines are the hyperplanes associated with the first, second, and third columns of A, respectively.

Fig. 3. The solution space of λ_3 ∈ R² for the weighted problem. Solid and dashed lines represent the hyperplanes associated with positive and negative h, respectively. The red, blue, and green lines are the hyperplanes associated with the first, second, and third columns of A, respectively.

The solution for λ*_3 in this case indicates that x*_3 will be zero, while x*_1 and x*_2 will not. Indeed, the optimal solution is x* = [1.0209 0.0511 0]^T. Fig. 3 represents the solution of problem (18) with the same settings as in the above example, and W = diag([3 0.2 0.2]). As shown in Fig. 3, the solution for λ*_3 is the intersection of the red hyperplanes with the unit-norm circle. The solution in this case indicates that the second and third entries of x* will be zero, and only the first entry will be non-zero. Thus, x* = [0.34 0 0]^T. The weighted approach is better in the sense that it recovers the true support of the actual sparse vector, while the non-weighted solution shows two active entries.

The distance between the two hyperplanes w_ii a_i^T λ_3 = h and w_ii a_i^T λ_3 = −h equals 2h/(w_ii ‖a_i‖_2). Based on the above discussion, by making the distance between these hyperplanes larger than two, and because ‖λ_3‖_2 ≤ 1, there will be no intersection between the unit-norm ball and the above hyperplanes, which coincides with the result of Theorem 4.

The main conclusion from the weighted noisy case is that by assigning large values to the w_ii corresponding to the x_i that are expected to be non-zero, and small values to the w_ii corresponding to the x_i that are expected to be zero, the solution to (18) will favor the solution q that has the same support as the underlying vector x0.

IV. APPLICATION TO EEG SPARSE SOURCE LOCALIZATION

In this section, we apply the proposed approach to the problem of EEG source localization. EEG is a relatively inexpensive, non-invasive neuroimaging technique, offering a window into human brain function by measuring electrical potentials over the scalp that are reflective of the underlying neural activity. Compared to other non-invasive neuroimaging techniques, such as Functional Magnetic Resonance Imaging (fMRI), EEG offers superior temporal resolution, and hence continues to be an attractive imaging tool in several domains, including basic neuroscience research [17]–[19], clinical neuroscience [20]–[22], and brain computer interfaces (BCIs) [23]–[26]. EEG, however, suffers from poor spatial resolution due to the volume conduction effect [27]. The signals recorded by EEG electrodes on the scalp surface represent a weighted sum of the electrical activity of the underlying neurons. As such, EEG recordings do not directly identify the location of sources in the brain.

Developing reliable EEG source localization techniques, because of their potential in enabling an imaging tool with high accuracy in both the temporal and spatial domains, has been of great interest to the neuroscience and clinical communities. For example, EEG source localization could have important applications in noninvasive BCIs. The majority of existing EEG-based BCIs use information from the scalp-recorded signals (e.g., event-related desynchronization/synchronization (ERD/ERS)) for extracting features to distinguish actions [28]. Due to the poor spatial resolution of EEG, the degrees of freedom of these BCIs have been limited to discriminating a small number (e.g., 6) of very “distinct” classes of actions [26], [29], [30], and as the number of classes increases, the classification performance degrades. This issue has been a major challenge for EEG-based BCIs, as in a realistic setting more than a few degrees of freedom are needed. Recent work [31], demonstrating improved accuracy in discriminating several actions based on source-domain as compared to sensor-domain information, further motivates the application of source localization in EEG-based BCIs.

To date, several algorithms with different a priori constraints on the sources have been proposed to solve the ill-posed inverse problem of EEG source localization [32], [33]. In MUltiple SIgnal Classification (MUSIC) [34], [35], Minimum Variance Beamforming (MVB) [36], [37], and the Linearly Constrained Minimum Variance (LCMV) methods [38], [39], second-order statistics are used to localize the sources. However, it is hard to obtain good estimates of those statistics due to the limited number of snapshots. For instance, in MVB, the number of statistically independent snapshots should be at least three times the number of channels to achieve stable source localization estimates [39].

Another class of source localization methods fits the observations to a linear system model. The small number of recordings obtained at a given time, as compared to the internal mesh size of the brain [40], makes the source estimation problem underdetermined, with infinitely many solutions [32]. Assuming that at a given time a small number of sources inside the brain are active above a certain threshold, the source localization problem can be formulated as a sparse signal recovery problem. EEG source localization methods that exploit the sparsity of x include ℓ1-norm minimization [41]–[44], FOCUSS [7], MP [12], ORMP [12], and Source Deflated Matching Pursuit (SDMP) [45].

Fig. 4. Histogram of normalized cross correlation of the columns of a realistic lead field matrix.

As already mentioned in Section II, strong equivalence conditions should be satisfied in order to use ℓ1-norm minimization for sparse EEG source localization. However, when a realistic head model is used to construct the lead field matrix, these conditions may not be met, thus compromising the solution of ℓ1-norm based methods. On the other hand, the low SNR associated with EEG source localization and the local minima problem can affect the ability of FOCUSS to correctly estimate the locations of sources. Also, in all the greedy algorithms (i.e., MP, ORMP, and SDMP), the sparsity level should be known in advance. Because of the highly correlated columns of the lead field matrix (as discussed below), degraded performance is expected for the greedy algorithms.

    A. Model Description

The current-based dipole model [46], in which the active sources are modeled as dipoles, was adopted here. By segmenting the cortex into m nodes, a dipole vector, called the lead field vector (LFV), is assigned to each node. With n electrodes on the scalp, at a given time instant t, the EEG model can be described as

y(t) = Ax(t) + n_o(t)    (36)

where y(t) ∈ R^n represents the electrode readings at time instant t, x(t) ∈ R^{3m} denotes the dipole source vector at time instant t, A ∈ R^{n×3m} is the lead field matrix, and n_o(t) ∈ R^n denotes the noise vector. It is clear that the system described by (36) is underdetermined, i.e., for the same electrode readings y(t), infinitely many solutions for x(t) can be obtained.
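A minimal sketch of generating synthetic observations from (36) follows (numpy; the SNR scaling and the Gaussian amplitudes, matching the Section III-B example, are illustrative choices):

import numpy as np

def simulate_snapshot(A, k, snr_db, rng=None):
    """Draw a k-sparse dipole vector and form y = A x + n_o at a target SNR."""
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[1]
    x = np.zeros(m)
    support = rng.choice(m, size=k, replace=False)
    x[support] = rng.normal(1.0, 5.0, size=k)   # amplitudes as in Sec. III-B
    y = A @ x
    n_o = rng.standard_normal(A.shape[0])
    n_o *= np.linalg.norm(y) / (np.linalg.norm(n_o) * 10 ** (snr_db / 20))
    return y + n_o, x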

The coherence histogram of the lead field matrix, obtained via the BrainStorm toolbox [16], is shown in Fig. 4 (the diagonal elements are excluded). From the figure, one can see that approximately 15% of the column pairs exhibit a correlation factor greater than 0.7, which indicates that there is no guarantee that the least ℓ1-norm solution will coincide with the sparsest source vector. Also, from Fig. 4, one can see that there are columns with a correlation factor greater than 0.8, which violates the RIP condition [3]; the RIP requires that all subsets of k columns behave like orthonormal columns, with k representing the number of non-zero entries of the sparsest solution. Following the above discussion, it is clear that the lead field matrix does not satisfy the strong equivalence conditions described in Section I.
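The quantity histogrammed in Fig. 4 can be computed from any lead field matrix as the normalized inner products of its columns; a short sketch:

import numpy as np

def column_cross_correlations(A):
    """Off-diagonal entries of the Gram matrix of unit-norm columns of A,
    i.e., the normalized cross correlations histogrammed in Fig. 4."""
    An = A / np.linalg.norm(A, axis=0)
    G = An.T @ An
    return G[~np.eye(G.shape[0], dtype=bool)]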

    B. Simulation Setup

A 3-shell realistically-shaped head model (Colin27) [47], provided by the Brainstorm software package [16], was used to represent the geometry of the brain. The electrical conductivity for the cortex, skull, and scalp was set to 1.0 S/m, 1/80 S/m, and 1.0 S/m, respectively.

The EEG signal is believed to be mainly generated by the inhibitory and excitatory post-synaptic potentials of cortical pyramidal nerve cells, which are spatially aligned perpendicular to the cortical surface [48]. Therefore, in this paper, we only considered the cortex as the source space. While activity from deep brain sources is generally believed to be poorly represented in EEG signals, if, similar to [11], the model for localizing deep sources can be approximated as linear and instantaneous, the proposed approach can also localize them, provided appropriate weights satisfying the conditions mentioned in the paper are used.

The cortex was divided into 966 grid points such that the mean distance between two grid points is 5 mm. 64 electrodes, following the International 10-10 system of EEG sensor placement, were positioned in the sensor space. The lead field matrix was accordingly constructed [49]. This head model was used both in the construction of the forward model (calculating the electrode potentials in the sensor space from active dipoles in the source space), and in solving the inverse problem (reconstructing activity in the source space from electrode potentials in the sensor space). To simulate active sources, dipoles were modeled as sinusoidal signals with frequencies in the range of [6-30] Hz. White noise was added to the observations obtained from the forward model. Our results and simulations are obtained on a single snapshot, so they do not depend on whether the sources are temporally independent of each other. However, obtaining the weighting matrix via the MUSIC method does exploit time correlations. Of course, one could use other methods to obtain the weighting matrix.

The inverse problem was solved using the proposed approach, the non-weighted approach (i.e., (19)), FOCUSS, ORMP, and MUSIC. To construct the weighting matrix W for the proposed approach, and also to initialize FOCUSS, MUSIC was employed. Unless otherwise stated, 50 snapshots were used to estimate this matrix; this number of snapshots resulted in comparable performance between MUSIC, FOCUSS, and the proposed approach (see Fig. 9) for the case of 2 sources, 15 dB SNR, and 4 cm minimum distance between sources. Various conditions were considered (e.g., variable SNR, different numbers of sources). For each condition, 1000 Monte Carlo trials were performed, where in each trial the locations of the sources were chosen at random. For multiple-source conditions, the distance between active sources was kept greater than a predefined value, to ensure there is no overlap between estimated sources.
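The paper does not prescribe a specific construction for turning the MUSIC estimate into W; one common construction, sketched below, uses the MUSIC pseudospectrum computed from the snapshot covariance (numpy; the normalization to [0, 1] is an illustrative choice, not the authors' exact recipe):

import numpy as np

def music_weights(Y, A, k):
    """Diagonal weights from the MUSIC pseudospectrum. Y holds n x T
    snapshots, k is the assumed number of sources; columns of A nearly
    orthogonal to the noise subspace receive large weights."""
    R = Y @ Y.T / Y.shape[1]                 # sample covariance
    _, V = np.linalg.eigh(R)                 # eigenvalues in ascending order
    En = V[:, :-k]                           # noise subspace
    An = A / np.linalg.norm(A, axis=0)
    w = 1.0 / np.linalg.norm(En.T @ An, axis=0) ** 2
    return w / w.max()                       # scale to [0, 1]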

To evaluate the performance of the different algorithms, the “success rate” was used as the performance metric. For k sources, we declared an estimate successful if the distance between the locations of the k largest estimated sources and the locations of the actual sources was less than a predefined threshold value (d). The success rate was defined as the ratio of the number of properly estimated source locations to the total number of sources. The selection of values for d was done along the lines of [45]. Considering that in our head model the mean distance between two grid points is 0.5 cm, setting d to 1 cm allows estimates located one grid point away from the true source to be considered successful. Note that if d is increased, the success rate of all algorithms is expected to increase, since a larger area around the true sources would qualify as correct estimation.
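In code, this metric reduces to matching the k largest estimated sources to the true ones within distance d; a sketch (numpy; pos, a map from grid indices to 3-D coordinates in cm, is an assumed input):

import numpy as np

def success_rate(x_hat, true_idx, pos, d=1.0):
    """Fraction of true sources with one of the k largest estimated
    sources within distance d (same units as the grid coordinates)."""
    k = len(true_idx)
    est_idx = np.argsort(np.abs(x_hat))[-k:]          # k largest entries
    hits = sum(np.min(np.linalg.norm(pos[est_idx] - pos[t], axis=1)) < d
               for t in true_idx)
    return hits / k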

    C. Simulation Results

In this section, we present simulation results to demonstrate the performance of the proposed method and compare it against existing sparse signal recovery methods. Note that, given our grid size and d = 1 cm, for any two sources to be resolvable, the distance between them should be larger than 2 cm. In our simulations, we considered an additional 2 cm of inter-source distance to ensure that there is no interference between sources.

1) Single Trial Simulation Results: The purpose of this simulation is to demonstrate the local minima problem and the residual interference associated with FOCUSS and ORMP, respectively. In single trial simulations, two sources were considered at random locations with 4 cm minimum distance between them. The SNR was set to 25 dB. A high SNR was selected here in order to reduce the effect of noise on the performance of the FOCUSS estimation. Fig. 5 shows the results. As can be seen from Fig. 5-b, the estimate obtained by the non-weighted approach contains additional sources that do not correspond to real sources (Fig. 5-a). The estimate of ORMP (Fig. 5-d) suffers from residual interference [45]. Figs. 5-e and 5-f show, respectively, the vectors estimated by FOCUSS and the proposed approach, using the weights shown in Fig. 5-c. Although the weights assign high values to the support of the underlying sparse vector, the FOCUSS estimate shows false active sources, due to the local minima problem. Fig. 5-f shows the estimate obtained via the proposed approach, which, as compared to ORMP and FOCUSS, localizes the sources more accurately.

2) Monte-Carlo Simulation Results: The purpose of the Monte Carlo simulations is to compare the average performance of all the algorithms under different scenarios. Fig. 6 shows the performance of the proposed approach, FOCUSS, ORMP, the non-weighted approach, and MUSIC versus SNR for the case of two sources, where the minimum distance between the sources was set to 4 cm. One can see that the proposed approach is more robust than the other methods. At an SNR of 0 dB, the success rate for the proposed method is 80%, while for FOCUSS it is 30%. The low success rate of FOCUSS at low SNR is because its iterative procedure involves finding the inverse of AW; this process is sensitive to noise due to the ill-posed lead field matrix A [11].

Fig. 5. (a) The original source vector (2 sources, 25 dB SNR, and 4 cm minimum distance). (b) Estimate obtained from the non-weighted approach. (c) Estimate obtained from MUSIC, shown in log scale. (d) Estimate obtained from ORMP. (e) Estimate obtained from FOCUSS. (f) Estimate obtained from the proposed approach.

Fig. 6. Success rate of the proposed approach, FOCUSS, ORMP, the non-weighted approach, and MUSIC for the case of two sources as a function of SNR. The minimum distance between sources was set to 4 cm.


Figs. 7 and 8 compare the performance of all five algorithms for different numbers of sources at an SNR of 10 dB, with the minimum distance between sources set to 4 cm and 8 cm, respectively. While the proposed approach offers superior performance compared to the others when the number of sources is smaller than 5, as the number of sources increases, the performance of all algorithms degrades. Comparing Figs. 7 and 8, it can be seen that as the distance between sources increases, the performance improves. This could be due to the fact that distant sources correspond to columns of the lead field matrix that are less correlated.

Fig. 9 shows the success rate of the proposed approach, FOCUSS, and MUSIC for the case of two sources and an SNR of 15 dB, when different numbers of snapshots are used to construct the weighting matrix. One can see that as the number of snapshots decreases, the performance of all approaches degrades, with the proposed approach performing better than FOCUSS. The degradation is due to the degradation of the MUSIC estimate, which is used to construct the weighting matrix.

In the case of a sensing matrix with low coherence, the performance of the proposed approach is similar to that of the other methods. To demonstrate this, we performed a simulation for two sources, SNR = 10 dB, and a Gaussian matrix (mutual coherence 0.578) as the dictionary matrix. The results are summarized in Table I, third column.

In our simulation setup we considered as signals dipoles with frequencies in the range of [6-30] Hz. Here, we also considered a scenario where the frequency range for signal generation is [0.5-50] Hz. The results, summarized in Table I, last column, are similar to those of Fig. 6 for an SNR of 10 dB, suggesting that the choice of frequency range for generating signals does not affect the performance of the localization algorithms.

To compare the computational complexity of the different approaches in terms of average processing time per trial, 1000 Monte Carlo trials for the case of 2 sources and SNR = 10 dB were conducted, and the results are shown in Table I.

Fig. 7. Success rate of the proposed approach, FOCUSS, ORMP, the non-weighted approach, and MUSIC as a function of the number of sources. The SNR was set to 10 dB, and the minimum distance between sources was set to 4 cm.

Fig. 8. Success rate of the proposed approach, FOCUSS, ORMP, the non-weighted approach, and MUSIC as a function of the number of sources. The SNR was set to 10 dB, and the minimum distance between sources was set to 8 cm.

Fig. 9. Success rate of the proposed approach, FOCUSS, and MUSIC for the case of two sources and an SNR of 15 dB, for different numbers of snapshots used to construct the weighting matrix.

TABLE I
PERFORMANCE COMPARISON OF THE PROPOSED APPROACH WITH OTHER METHODS. FOR EACH SCENARIO, 1000 MONTE CARLO TRIALS FOR 2 SOURCES AT SNR = 10 dB ARE CONSIDERED.

Method              Average Processing   Success Rate (%)      Success Rate (%) for Signal Generation
                    Time (sec)           for Low-Coherence A   in the Range (0.5-50) Hz
MUSIC               0.0725               100                   88.78
FOCUSS              1.4296               100                   86.84
Proposed Approach   1.991                100                   97.24
ORMP                0.1229               100                   75.5
Non-weighted        1.9270               93.27                 31.9

The ℓ1-norm-based approaches (both weighted and non-weighted) are, as expected, more computationally intensive than FOCUSS, MUSIC, and ORMP.
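The timing protocol behind Table I can be sketched as a simple Monte Carlo loop over randomly generated problems; solver and make_problem below are hypothetical placeholders for any of the compared methods and for the 2-source, SNR = 10 dB scenario generator.

```python
import time

def average_runtime(solver, make_problem, trials=1000):
    """Average per-trial wall-clock time of `solver` over Monte Carlo trials."""
    total = 0.0
    for seed in range(trials):
        A, y = make_problem(seed)          # fresh random sources and noise per trial
        t0 = time.perf_counter()
        solver(A, y)                       # only the solve itself is timed
        total += time.perf_counter() - t0
    return total / trials
```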

    D. Experimental Results

We also examined the performance of the proposed approach in solving the source localization problem using real EEG data. An EEG experiment eliciting auditory evoked potentials (AEPs) [45] was conducted with one volunteer, who provided his written informed consent. The stimulus was a 1000 Hz pure tone with a duration of 40 ms, presented to the left ear of the participant. The paradigm consisted of 1092 trials with an inter-stimulus interval (ISI) of 760 ms. Brain activity was recorded using a 64-channel EEG system (Brain Products, Germany) with a 1 kHz sampling rate.

Preprocessing was performed using EEGLAB [50]. Recorded signals were first downsampled to 256 Hz to reduce the processing time when performing independent component analysis (ICA) for the artifact removal step. EEG recordings were bandpass filtered between 0.5 Hz and 100 Hz, with a notch filter at 60 Hz. Bad channels were identified (five channels: FP1, FP2, AF7, FT9, and FT7), and their signals were replaced with the average of the signals from their neighboring channels. ICA was then employed to remove artifacts (e.g., eye blinks). The covariance matrix was estimated using the samples corresponding to the 50 ms before the stimuli, and was used for whitening the data. Epochs were extracted and then averaged across trials.
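Although the preprocessing above was performed in EEGLAB, the non-ICA steps are easy to reproduce; the following is a minimal NumPy/SciPy sketch of the downsampling, filtering, whitening, and trial-averaging chain, under assumed filter parameters (filter order, notch Q) and with the ICA and bad-channel steps omitted.

```python
import numpy as np
from scipy import signal
from scipy.linalg import sqrtm

FS, FS_NEW = 1000, 256                    # original and downsampled rates (Hz)

def preprocess(eeg, events, pre_ms=50, post_ms=300):
    """Sketch of the preprocessing chain (ICA and bad-channel repair omitted).

    eeg    : (channels, samples) raw recording at FS Hz
    events : stimulus-onset sample indices at the downsampled rate
    """
    # 1) Downsample 1 kHz -> 256 Hz (resample_poly applies anti-alias filtering).
    x = signal.resample_poly(eeg, FS_NEW, FS, axis=1)
    # 2) Band-pass 0.5-100 Hz and notch at 60 Hz (zero-phase filtering).
    b, a = signal.butter(4, [0.5, 100.0], btype="bandpass", fs=FS_NEW)
    x = signal.filtfilt(b, a, x, axis=1)
    bn, an = signal.iirnotch(60.0, Q=30.0, fs=FS_NEW)
    x = signal.filtfilt(bn, an, x, axis=1)
    # 3) Whiten with the covariance of the 50 ms pre-stimulus baseline.
    npre = int(pre_ms * FS_NEW / 1000)
    baseline = np.concatenate([x[:, e - npre:e] for e in events], axis=1)
    W = np.linalg.inv(sqrtm(np.cov(baseline)).real)
    x = W @ x
    # 4) Epoch around each stimulus and average across trials -> ERP.
    npost = int(post_ms * FS_NEW / 1000)
    epochs = np.stack([x[:, e:e + npost] for e in events])
    return epochs.mean(axis=0)
```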

It has been reported that the most relevant components associated with auditory experiments are P50 and N100 [51]. The ERP waveforms for all channels are shown in Fig. 10 (low-pass filtered to 30 Hz for display), where both the P50 and N100 components can be identified. This result is aligned with what has been reported in other EEG studies [52]-[54]. While the results for N100 are presented here, the localization can also be performed for P50 or other ERPs of interest.

To estimate the location of the activity related to N100, the segment [100-132] ms of the ERP was selected. Since the exact locations of the sources are unknown, to compare the performance of the different localization methods we take a qualitative approach [55], with reference to the existing knowledge about the active regions expected for N100. As reported in [45], [56], for this task and at N100, activations in both the left and right primary auditory cortices are expected to occur.

To construct the weighting matrix W for the proposed approach, and to initialize FOCUSS, MUSIC with 8 snapshots (about 32 ms) was used.

Fig. 10. ERP in response to an auditory task for all 64 channels (potential, in µV, versus time in ms).

The number of sources (sparsity level) for ORMP was restricted to 10. For each localization method, the inverse problem was solved at each sample over the [100-132] ms window, and the average of the estimated sources across all time samples was obtained. Top, right, and left views of the cortex, indicating the estimated locations of active sources based on the proposed approach, FOCUSS, and ORMP, are shown in Figs. 11, 12, and 13, respectively. The proximity of the primary auditory cortices is also indicated in Figs. 11-13 via black circles. The proposed approach identifies active sources in both the left and right auditory cortices. This result is aligned with previous fMRI studies [57]. A few active sources are also identified in other regions of the brain; brain activations related to N100, located outside the auditory cortex, have also been reported in previous studies [45], [58]. FOCUSS shows activations near the temporal lobes, as well as in several other regions of the brain. The deviation of FOCUSS from the expected active regions could be due to its local-minima problem or to the low SNR [40]. Previous studies have shown variations in the features of auditory ERPs (e.g., the amplitude of N100) across individuals [52]; the observed low SNR could therefore be due to the fact that the ERP was obtained from a single subject. Due to the high coherence of the realistic lead field matrix, the performance of ORMP is expected to degrade. This is indeed observed in Fig. 13, where ORMP fails to localize active sources in the expected regions.
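For concreteness, the per-sample inverse solve and time-averaging just described can be sketched as follows, using cvxpy as a generic convex solver for the weighted ℓ1 problem; the residual bound eps and all function and variable names are illustrative assumptions rather than the exact implementation.

```python
import numpy as np
import cvxpy as cp

def localize_segment(A, W, Y_seg, eps):
    """Solve the weighted l1 problem at each time sample and average.

    A     : (m, n) lead field matrix
    W     : (n, n) diagonal weighting matrix (e.g., from the MUSIC scan)
    Y_seg : (m, T) whitened ERP samples in the selected window
    eps   : assumed bound on the residual noise norm
    """
    AW = A @ W
    estimates = []
    for t in range(Y_seg.shape[1]):
        z = cp.Variable(A.shape[1])
        prob = cp.Problem(cp.Minimize(cp.norm1(z)),
                          [cp.norm(AW @ z - Y_seg[:, t], 2) <= eps])
        prob.solve()
        estimates.append(np.abs(W @ z.value))   # map back through the weights
    return np.mean(estimates, axis=0)           # average activity per source location
```

Large diagonal entries of W make the corresponding entries of z cheap to activate, so the support of the weighted solution concentrates on the locations favored by the MUSIC-based weights.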

V. CONCLUSIONS

A weighting approach for sparse signal support estimation has been presented. We have shown that by appropriately selecting the weights, we can formulate an ℓ1-norm minimization problem that satisfies the RSP, even if the original problem does not.


    Fig. 11. Estimated location of sources in the cortex via the proposed approach, a) top view, b) right view, and c) left view.


    Fig. 12. Estimated location of sources in the cortex via FOCUSS, a) top view, b) right view, and c) left view.


    Fig. 13. Estimated location of sources in the cortex via ORMP, a) top view, b) right view, and c) left view.

Conditions on the weights for both the noise-free and noisy cases have been provided. Although those conditions involve information about the support of the sparse vector, the class of good weights is very wide, and in most cases encompasses an estimate obtained via a conventional method, i.e., one that does not encourage sparsity. Simulation results have shown that in practical scenarios, using the proposed approach with weights constructed from such a conventional, non-sparsity-promoting estimate results in significantly improved localization of the sparse signal samples compared to directly applying ℓ1-norm minimization. As an application example, we applied the proposed approach to the EEG source localization problem. Simulated and real EEG data have been considered, and the proposed approach was applied using a MUSIC estimate to construct the weights. The impact of the SNR and of the number of active sources on the performance of the proposed approach has been studied through Monte Carlo simulations, and the results have been compared with those of FOCUSS, ORMP, and the non-weighted approach. The proposed approach appears

to be more robust with respect to SNR than FOCUSS, and with respect to both SNR and the number of sources than ORMP and the non-weighted approach. Using EEG data, we have also qualitatively evaluated the performance of the proposed approach in localizing active sources within the primary auditory cortices that are responsible for the auditory N100. While for the purpose of validation here we had experimental data from one subject, our future work will involve the inclusion of more subjects.

    REFERENCES

[1] Y.-B. Zhao, “RSP-Based Analysis for Sparsest and Least-Norm Solutions to Underdetermined Linear Systems,” IEEE Transactions on Signal Processing, vol. 61, no. 22, pp. 5777–5788, 2013.

[2] D. L. Donoho and M. Elad, “Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization,” Proceedings of the National Academy of Sciences, vol. 100, no. 5, pp. 2197–2202, 2003.

[3] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[4] A. Cohen, W. Dahmen, and R. DeVore, “Compressed sensing and best k-term approximation,” Journal of the American Mathematical Society, vol. 22, no. 1, pp. 211–231, 2009.

[5] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.

[6] E. J. Candès et al., “Compressive sampling,” in Proceedings of the International Congress of Mathematicians, vol. 3, Madrid, Spain, 2006, pp. 1433–1452.

[7] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm,” IEEE Transactions on Signal Processing, vol. 45, no. 3, pp. 600–616, 1997.

[8] E. J. Candes, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted ℓ1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5-6, pp. 877–905, 2008.

[9] D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2153–2164, 2004.

[10] B. D. Rao, K. Engan, S. F. Cotter, J. Palmer, and K. Kreutz-Delgado, “Subset selection in noise based on diversity measure minimization,” IEEE Transactions on Signal Processing, vol. 51, no. 3, pp. 760–770, 2003.

[11] P. Xu, Y. Tian, H. Chen, and D. Yao, “Lp norm iterative sparse solution for EEG source localization,” IEEE Transactions on Biomedical Engineering, vol. 54, no. 3, pp. 400–409, 2007.

[12] S. F. Cotter, R. Adler, R. Rao, and K. Kreutz-Delgado, “Forward sequential algorithms for best basis selection,” IEE Proceedings - Vision, Image and Signal Processing, vol. 146, no. 5, pp. 235–244, 1999.

[13] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.

[14] J. Adler, B. D. Rao, and E. Kreutz-Delgado, “Comparison of basis selection methods,” in Conference Record of the Thirtieth Asilomar Conference on Signals, Systems and Computers, vol. 1. IEEE, 1996, pp. 252–257.

[15] A. Al Hilli, L. Najafizadeh, and A. Petropulu, “EEG sparse source localization via range space rotation,” in 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Cancun, Mexico, Dec. 2015, pp. 269–272.

[16] F. Tadel, S. Baillet, J. C. Mosher, D. Pantazis, and R. M. Leahy, “Brainstorm: a user-friendly application for MEG/EEG analysis,” Computational Intelligence and Neuroscience, vol. 2011, p. 8, 2011.

[17] C. S. Herrmann, M. H. Munk, and A. K. Engel, “Cognitive functions of gamma-band activity: memory match and utilization,” Trends in Cognitive Sciences, vol. 8, no. 8, pp. 347–355, 2004.

[18] S. Ponten, A. Daffertshofer, A. Hillebrand, and C. J. Stam, “The relationship between structural and functional connectivity: graph theoretical analysis of an EEG neural mass model,” NeuroImage, vol. 52, no. 3, pp. 985–994, 2010.

[19] N. Karamzadeh, A. Medvedev, A. Azari, A. Gandjbakhche, and L. Najafizadeh, “Capturing dynamic patterns of task-based functional connectivity with EEG,” NeuroImage, vol. 66, pp. 311–317, 2013.

[20] B. J. Roach and D. H. Mathalon, “Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia,” Schizophrenia Bulletin, vol. 34, no. 5, pp. 907–926, 2008.

[21] L. Yang, C. Wilke, B. Brinkmann, G. A. Worrell, and B. He, “Dynamic imaging of ictal oscillations using non-invasive high-resolution EEG,” NeuroImage, vol. 56, no. 4, pp. 1908–1917, 2011.

[22] D. Moretti, D. Paternicò, G. Binetti, O. Zanetti, and G. B. Frisoni, “EEG markers are associated to gray matter changes in thalamus and basal ganglia in subjects with mild cognitive impairment,” NeuroImage, vol. 60, no. 1, pp. 489–496, 2012.

[23] L. Qin, L. Ding, and B. He, “Motor imagery classification by means of source analysis for brain-computer interface applications,” Journal of Neural Engineering, vol. 1, no. 3, p. 135, 2004.

[24] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Müller, V. Kunzmann, F. Losch, and G. Curio, “The Berlin Brain-Computer Interface: EEG-based communication without subject training,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 14, no. 2, pp. 147–152, 2006.

[25] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi, “A review of classification algorithms for EEG-based brain-computer interfaces,” Journal of Neural Engineering, vol. 4, no. 2, p. R1, 2007.

[26] R. Xu, N. Jiang, C. Lin, N. Mrachacz-Kersting, K. Dremstrup, and D. Farina, “Enhanced low-latency detection of motor intention from EEG for closed-loop brain-computer interface applications,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 2, pp. 288–296, 2014.

[27] P. L. Nunez, R. Srinivasan, A. F. Westdorp, R. S. Wijesinghe, D. M. Tucker, R. B. Silberstein, and P. J. Cadusch, “EEG coherency: I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales,” Electroencephalography and Clinical Neurophysiology, vol. 103, no. 5, pp. 499–515, 1997.

[28] G. Pfurtscheller, C. Brunner, A. Schlögl, and F. L. Da Silva, “Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks,” NeuroImage, vol. 31, no. 1, pp. 153–159, 2006.

[29] S. Ge, R. Wang, and D. Yu, “Classification of four-class motor imagery employing single-channel electroencephalography,” PLoS ONE, vol. 9, no. 6, p. e98019, 2014.

[30] F. Shiman, E. López-Larraz, A. Sarasola-Sanz, N. Irastorza-Landa, M. Spueler, N. Birbaumer, and A. Ramos-Murguialday, “Classification of different reaching movements from the same limb using EEG,” Journal of Neural Engineering, 2017.

[31] B. J. Edelman, B. Baxter, and B. He, “EEG Source Imaging Enhances the Decoding of Complex Right-Hand Motor Imagery Tasks,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 1, pp. 4–14, 2016.

[32] R. Grech, T. Cassar, J. Muscat, K. P. Camilleri, S. G. Fabri, M. Zervakis, P. Xanthopoulos, V. Sakkalis, and B. Vanrumste, “Review on solving the inverse problem in EEG source analysis,” Journal of Neuroengineering and Rehabilitation, vol. 5, no. 1, p. 1, 2008.

[33] R. D. Pascual-Marqui, “Review of methods for solving the EEG inverse problem,” International Journal of Bioelectromagnetism, vol. 1, no. 1, pp. 75–86, 1999.

[34] R. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276–280, 1986.

[35] J. C. Mosher, P. S. Lewis, and R. M. Leahy, “Multiple dipole modeling and localization from spatio-temporal MEG data,” IEEE Transactions on Biomedical Engineering, vol. 39, no. 6, pp. 541–557, 1992.

[36] J. Capon, “High-resolution frequency-wavenumber spectrum analysis,” Proceedings of the IEEE, vol. 57, no. 8, pp. 1408–1418, 1969.

[37] K. Sekihara, S. S. Nagarajan, D. Poeppel, A. Marantz, and Y. Miyashita, “Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique,” IEEE Transactions on Biomedical Engineering, vol. 48, no. 7, pp. 760–771, 2001.

[38] O. L. Frost, “An algorithm for linearly constrained adaptive array processing,” Proceedings of the IEEE, vol. 60, no. 8, pp. 926–935, 1972.

[39] B. D. Van Veen, W. Van Drongelen, M. Yuchtman, and A. Suzuki, “Localization of brain electrical activity via linearly constrained minimum variance spatial filtering,” IEEE Transactions on Biomedical Engineering, vol. 44, no. 9, pp. 867–880, 1997.

[40] A. Rodríguez-Rivera, B. V. Baryshnikov, B. D. Van Veen, and R. T. Wakai, “MEG and EEG source localization in beamspace,” IEEE Transactions on Biomedical Engineering, vol. 53, no. 3, pp. 430–441, 2006.

[41] K. Matsuura and Y. Okabe, “Selective minimum-norm solution of the biomagnetic inverse problem,” IEEE Transactions on Biomedical Engineering, vol. 42, no. 6, pp. 608–615, 1995.

[42] ——, “A robust reconstruction of sparse biomagnetic sources,” IEEE Transactions on Biomedical Engineering, vol. 44, no. 8, pp. 720–726, 1997.

[43] L. Ding and B. He, “Sparse Source Imaging in EEG,” in Joint Meeting of the 6th International Symposium on Noninvasive Functional Source Imaging of the Brain and Heart and the International Conference on Functional Biomedical Imaging, 2007, pp. 20–23.

[44] K. Uutela, M. Hämäläinen, and E. Somersalo, “Visualization of magnetoencephalographic data using minimum current estimates,” NeuroImage, vol. 10, no. 2, pp. 173–180, 1999.

[45] S. C. Wu and A. L. Swindlehurst, “Matching pursuit and source deflation for sparse EEG/MEG dipole moment estimation,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 8, pp. 2280–2288, 2013.

[46] H. Hallez, B. Vanrumste, R. Grech, J. Muscat, W. De Clercq, A. Vergult, Y. D’Asseler, K. P. Camilleri, S. G. Fabri, S. Van Huffel et al., “Review on solving the forward problem in EEG source analysis,” Journal of Neuroengineering and Rehabilitation, vol. 4, no. 1, p. 1, 2007.

[47] “http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach.”

[48] B. He, L. Yang, C. Wilke, and H. Yuan, “Electrophysiological imaging of brain activity and connectivity - challenges and opportunities,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 7, pp. 1918–1931, 2011.

[49] M. Fuchs, R. Drenckhahn, H. Wischmann, and M. Wagner, “An improved boundary element method for realistic volume-conductor modeling,” IEEE Transactions on Biomedical Engineering, vol. 45, no. 8, pp. 980–997, 1998.

[50] A. Delorme and S. Makeig, “EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis,” Journal of Neuroscience Methods, vol. 134, no. 1, pp. 9–21, 2004.

[51] K. Sekihara, K. E. Hild, and S. S. Nagarajan, “A novel adaptive beamformer for MEG source reconstruction effective when large background brain activities exist,” IEEE Transactions on Biomedical Engineering, vol. 53, no. 9, pp. 1755–1764, 2006.

[52] G. P. Jacobson and M. B. Fitzgerald, “Auditory evoked gamma band potential in normal subjects,” Journal of the American Academy of Audiology, vol. 8, pp. 44–52, 1997.

[53] J. Schadow, D. Lenz, S. Thaerig, N. A. Busch, I. Fründ, and C. S. Herrmann, “Stimulus intensity affects early sensory processing: sound intensity modulates auditory evoked gamma-band activity in human EEG,” International Journal of Psychophysiology, vol. 65, no. 2, pp. 152–161, 2007.

[54] K. Whittingstall, G. Stroink, and B. Dick, “Dipole localization accuracy using grand-average EEG data sets,” Clinical Neurophysiology, vol. 115, no. 9, pp. 2108–2112, 2004.

[55] J. Yao and J. P. Dewald, “Evaluation of different cortical source localization methods using simulated and experimental EEG data,” NeuroImage, vol. 25, no. 2, pp. 369–382, 2005.

[56] S. D. Mayhew, S. G. Dirckx, R. K. Niazy, G. D. Iannetti, and R. G. Wise, “EEG signatures of auditory activity correlate with simultaneously recorded fMRI responses in humans,” NeuroImage, vol. 49, no. 1, pp. 849–864, 2010.

[57] L. Jäncke, N. Shah, S. Posse, M. Grosse-Ryuken, and H.-W. Müller-Gärtner, “Intensity coding of auditory stimuli: an fMRI study,” Neuropsychologia, vol. 36, no. 9, pp. 875–883, 1998.

[58] Y.-T. Zhang, Z.-J. Geng, Q. Zhang, W. Li, and J. Zhang, “Auditory cortical responses evoked by pure tones in healthy and sensorineural hearing loss subjects: functional MRI and magnetoencephalography,” Chinese Medical Journal, vol. 119, no. 18, pp. 1548–1554, 2006.

Ahmed Al Hilli received his B.S. degree in Electrical Engineering from Babylon University, Babylon, Iraq, in 2002, and his M.Sc. degree from the University of Technology, Baghdad, Iraq, in 2005. He is currently pursuing the Ph.D. degree in Electrical and Computer Engineering at Rutgers University, The State University of New Jersey, NJ, USA.

His research interests include digital signal processing and sparse signal recovery.

Laleh Najafizadeh (S'02, M'10) received her B.Sc. from Isfahan University of Technology, Isfahan, Iran, her M.Sc. from the University of Alberta, Edmonton, AB, Canada, and her Ph.D. from the Georgia Institute of Technology, Atlanta, Georgia, USA, all in Electrical Engineering. From 2003 to 2004, she was with the iCORE Wireless Communications Laboratory at the University of Alberta, and from 2010 to 2012 she was a postdoctoral fellow at the National Institutes of Health (NIH), MD, USA. She is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Rutgers University, Piscataway, NJ. Dr. Najafizadeh has co-authored two book chapters and more than 80 peer-reviewed papers in premier journals and conference proceedings. She is the recipient of the Texas Instruments Leadership Fellowship, the Delta Kappa Gamma World Fellowship, and competitive scholarships from the Alberta Ingenuity Fund and the Alberta Informatics Circle of Research Excellence. Together with her students she received the Best Student Paper Award (Runner-Up) at the 2014 IEEE ISCAS.

Athina P. Petropulu received her undergraduate degree from the National Technical University of Athens, Greece, and the M.Sc. and Ph.D. degrees from Northeastern University, Boston, MA, all in Electrical and Computer Engineering. She is a Distinguished Professor in the Electrical and Computer Engineering (ECE) Department at Rutgers, having served as chair of the department during 2010-2016. Before joining Rutgers in 2010, she was a faculty member at Drexel University. She has held Visiting Scholar appointments at SUPELEC, Universite Paris Sud, Princeton University, and the University of Southern California. Dr. Petropulu's research interests span the areas of statistical signal processing, wireless communications, signal processing in networking, physical layer security, and radar signal processing. Her research has been funded by various government and industry sponsors, including the National Science Foundation, the Office of Naval Research, the US Army, the National Institutes of Health, the Whitaker Foundation, and Lockheed Martin.

Dr. Petropulu is a Fellow of the IEEE and recipient of the 1995 Presidential Faculty Fellow Award given by NSF and the White House. She has served as Editor-in-Chief of the IEEE Transactions on Signal Processing (2009-2011), IEEE Signal Processing Society Vice President-Conferences (2006-2008), and member-at-large of the IEEE Signal Processing Society Board of Governors. She was the General Chair of the 2005 International Conference on Acoustics, Speech and Signal Processing (ICASSP-05), Philadelphia, PA. In 2005 she received the IEEE Signal Processing Magazine Best Paper Award, and in 2012 the IEEE Signal Processing Society Meritorious Service Award for "exemplary service in technical leadership capacities." She was selected as an IEEE Distinguished Lecturer for the Signal Processing Society for 2017-2018.

    More info on her work can be found at www.ece.rutgers.edu/ cspl