
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON NEURAL NETWORKS 1

Fast Independent Component Analysis Algorithm for Quaternion Valued Signals

Soroush Javidi, Student Member, IEEE, Clive Cheong Took, Member, IEEE, and Danilo P. Mandic, Senior Member, IEEE

Abstract— An extension of the fast independent component analysis algorithm is proposed for the blind separation of both Q-proper and Q-improper quaternion-valued signals. This is achieved by maximizing a negentropy-based cost function, and is derived rigorously using the recently developed HR calculus in order to implement Newton optimization in the augmented quaternion statistics framework. It is shown that the use of augmented statistics and the associated widely linear modeling provides theoretical and practical advantages when dealing with general quaternion signals with noncircular (rotation-dependent) distributions. Simulations using both benchmark and real-world quaternion-valued signals support the approach.

Index Terms— Augmented quaternion statistics, independent component analysis, quaternion blind source separation, quaternion noncircularity, quaternion widely linear modeling.

I. INTRODUCTION

THE independent component analysis (ICA) methodology is a popular framework for the separation of latent sources from an observed mixture, based on the assumption of statistical independence of the sources [1]. In its standard form, the latent sources are assumed to be linearly mixed, and subject to additive noise. For this scenario, algorithms proposed in the past two decades include those based on the second- and higher-order statistics (second order blind identification and joint approximate diagonalization of eigenmatrices algorithms [2], [3]), and those based on information theoretic criteria such as the maximization of likelihood and minimization of mutual information [4].

The fast independent component analysis (FastICA) algorithm [5], a fast converging algorithm based on the maximization of non-Gaussianity and implemented using an approximative Newton optimization method, has become a standard for the separation of both sub- and super-Gaussian sources. The algorithm has been shown to exhibit cubic convergence when in the deflationary mode (using the kurtosis cost function) and local quadratic convergence in the symmetric orthogonalization approach [6]. The FastICA algorithm was subsequently extended to allow for the separation of complex

Manuscript received June 8, 2010; accepted September 28, 2011. This work was supported in part by the Engineering and Physical Sciences Research Council, U.K., under Grant EP/H026266/1.

The authors are with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNN.2011.2171362

circular sources in [7], and more recently it was generalized to enable separation of complex noncircular sources [8]; this was achieved based on the fourth-order moments and through the strong uncorrelating transform [9]. A FastICA algorithm for the separation of both circular and noncircular sources based on a negentropy-based criterion was addressed in [10].

The recent progress in supervised and unsupervised adaptive signal processing algorithms in the complex domain [11] has been made possible through advances in the so-called augmented statistics [12] and the framework for analysis of non-analytic functions called the CR calculus (also known as Wirtinger calculus) [13].

On the other hand, the advances in multidimensional sensor technology (3-D anemometers, 3-D inertial body sensors, robotics) have highlighted the need for adaptive signal processing algorithms in the quaternion domain, which is a natural domain for the processing of 3-D and 4-D signals. The literature on quaternion-valued signal processing includes the algebraic [14], [15] as well as statistical approaches [16], [17]. More recent developments include the analysis of quaternion-valued random variables via augmented quaternion statistics [18], the unitary diagonalization of quaternion matrices [19], and the so-called HR calculus, which is a unified framework for the analysis of non-analytic quaternion functions [20].

These advances have been exploited through widely linear modeling of quaternion signals [18], [21], allowing the incorporation of the full second-order information and leading to the class of widely linear quaternion least mean square algorithms [22]. For nonlinear signal models, both split and fully quaternion nonlinear models have been successfully implemented [23], [24], whereas for unsupervised adaptive algorithms a quaternion ICA algorithm based on likelihood maximization and the concept of Infomax was proposed by Le Bihan and Buchholz in [25]. In their study, it was concluded that a fully quaternion nonlinearity results in better separation performance. Moreover, real-valued and complex-valued ICA models are shown to be special instances of the quaternion-valued ICA model (all being division algebras), which highlights the generality of quaternions. In other words, quaternion-valued ICA can solve both the real-valued and complex-valued blind source separation paradigm, but the opposite does not necessarily hold. For instance, inter-channel correlation of a quadrivariate independent component is accounted for in a quaternion model, whereas complex-valued ICA models do not allow for the modeling of

1045–9227/$26.00 © 2011 IEEE



correlation beyond two dimensions. Therefore, the complex-valued ICA model is not likely to reconstruct the quadrivariate sources, as shown in Section IV-D.

In this paper, we propose a FastICA algorithm suitable for the separation of both Q-proper and Q-improper (second-order noncircular) quaternion-valued signals from an observed linear mixture. This is achieved by means of the recently introduced augmented quaternion statistics [18], [21], widely linear modeling [18], [21], [22], and HR calculus [20]. An augmented Newton method is proposed, whereby at a cost of additional complexity we are able to capture the complete statistical properties of the mixing signals and thus ensure successful separation of latent sources. The performance is validated using synthetic Q-proper and Q-improper signals in both deflationary and simultaneous separation scenarios, together with real-world case studies of electroencephalogram (EEG) artifact extraction and in the context of Alamouti coding in communications.

II. PRELIMINARIES ON QUATERNION SIGNALS

A. Quaternion Algebra

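Consider the quaternion variable $q = q_a + \imath q_b + \jmath q_c + \kappa q_d \in \mathbb{H}$, where $q_a$, $q_b$, $q_c$, and $q_d$ are real-valued scalars, and $\imath$, $\jmath$, and $\kappa$ are both orthogonal unit vectors and imaginary units, such that

$$
\begin{aligned}
\imath\jmath &= -\jmath\imath = \kappa, \qquad \jmath\kappa = -\kappa\jmath = \imath, \qquad \kappa\imath = -\imath\kappa = \jmath \\
\imath\jmath\kappa &= \imath^2 = \jmath^2 = \kappa^2 = -1.
\end{aligned}
\tag{1}
$$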

These identities illustrate the noncommutative property of quaternion products, whereby $q_1 q_2 \neq q_2 q_1$. The variable $q$ can also be written in terms of its real (scalar) part $\Re\{q\} = q_a$ and its vector part $\Im\{q\} = \imath\Im_\imath\{q\} + \jmath\Im_\jmath\{q\} + \kappa\Im_\kappa\{q\}$, such that $q = \Re\{q\} + \Im\{q\}$. Alternatively, by adopting the Cayley–Dickson notation, $q$ can be constructed from a pair of complex quantities $z_1 = q_a + \imath q_b$ and $z_2 = q_c + \imath q_d$, such that $q = z_1 + z_2\jmath$; however, in this paper direct quaternionic notation will be used.

We next consider three self-inverse mappings¹ or involutions [26] about the $\imath$, $\jmath$, and $\kappa$ axes, given by

$$
\begin{aligned}
q^{\imath} &= -\imath q \imath = q_a + \imath q_b - \jmath q_c - \kappa q_d \\
q^{\jmath} &= -\jmath q \jmath = q_a - \imath q_b + \jmath q_c - \kappa q_d \\
q^{\kappa} &= -\kappa q \kappa = q_a - \imath q_b - \jmath q_c + \kappa q_d
\end{aligned}
\tag{2}
$$

which form the bases for augmented quaternion statistics [18], [21]. Intuitively, an involution represents a rotation along a single imaginary axis, while the conjugate operator $(\cdot)^*$, also an involution, rotates along all the three imaginary directions, that is

$$
q^{*} = q_a - \imath q_b - \jmath q_c - \kappa q_d. \tag{3}
$$

The involutions have the property that $(q_1 q_2)^{\alpha} = q_1^{\alpha} q_2^{\alpha}$, $\alpha \in \{\imath, \jmath, \kappa\}$, while $(q_1 q_2)^{*} = q_2^{*} q_1^{*}$. Finally, the norm (modulus) of a quaternion variable $q$ is defined by

$$
\|q\|_2 = \sqrt{q q^{*}} = \sqrt{q^{*} q} = \sqrt{q_a^2 + q_b^2 + q_c^2 + q_d^2} \tag{4}
$$

while for a vector $\mathbf{q}$ in a quaternion Hilbert space [16], the 2-norm is defined as $\|\mathbf{q}\|_2 = \sqrt{\mathbf{q}^H \mathbf{q}}$.

¹A self-inverse mapping operator sinv(·) is such that sinv(sinv(q)) = q.
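The identities in (1) to (4) can be checked numerically. The following is a minimal sketch, assuming a quaternion is stored as a length-4 NumPy array `[a, b, c, d]`; the helper names `qmul`, `qconj`, `involution`, and `qnorm` are illustrative choices and do not come from the paper.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    pa, pb, pc, pd = p
    qa, qb, qc, qd = q
    return np.array([
        pa*qa - pb*qb - pc*qc - pd*qd,   # real part
        pa*qb + pb*qa + pc*qd - pd*qc,   # i part
        pa*qc - pb*qd + pc*qa + pd*qb,   # j part
        pa*qd + pb*qc - pc*qb + pd*qa,   # k part
    ])

def qconj(q):
    """Quaternion conjugate, as in (3)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def involution(q, eta):
    """Involution q^eta = -eta q eta about a unit pure quaternion eta, as in (2)."""
    return -qmul(qmul(eta, q), eta)

def qnorm(q):
    """Modulus sqrt(q q*), as in (4)."""
    return np.sqrt(qmul(q, qconj(q))[0])

I, J, K = np.eye(4)[1:]                    # imaginary units as 4-vectors
rng = np.random.default_rng(0)
q1, q2 = rng.standard_normal((2, 4))       # two arbitrary test quaternions

print(qmul(I, J), qmul(J, I))              # k and -k: the product is noncommutative
print(involution(q1, I), q1)               # the j and k parts change sign
print(qnorm(q1), np.linalg.norm(q1))       # both expressions give the modulus
```

Storing the four real components explicitly keeps the sign pattern of each involution visible, which is the property the augmented statistics rely on.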

B. Augmented Statistics and Widely Linear Modeling

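For a random vector $\mathbf{q} = \mathbf{q}_a + \imath\mathbf{q}_b + \jmath\mathbf{q}_c + \kappa\mathbf{q}_d \in \mathbb{H}^N$, the probability density function (pdf) is defined in terms of the joint pdf of its components, such that $p_Q(\mathbf{q}) \triangleq p_{Q_a,Q_b,Q_c,Q_d}(\mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d)$. Its mean is then calculated in terms of each respective component as $E\{\mathbf{q}\} = E\{\mathbf{q}_a\} + \imath E\{\mathbf{q}_b\} + \jmath E\{\mathbf{q}_c\} + \kappa E\{\mathbf{q}_d\}$. The covariance matrix of quadrivariate real-valued component vectors $\mathbf{C}^r_{qq} = E\{\mathbf{q}^r \mathbf{q}^{rT}\} \in \mathbb{R}^{4N \times 4N}$ describes the second-order relationship between the respective components of $\mathbf{q}$, where $\mathbf{q}^r = [\mathbf{q}_a^T, \mathbf{q}_b^T, \mathbf{q}_c^T, \mathbf{q}_d^T]^T$.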

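Representing the components of $\mathbf{C}^r_{qq}$ by their equivalent quaternion-valued counterparts allows for the complete second-order statistical information to be captured directly in $\mathbb{H}$ [18]. This is achieved by expressing the components of the quaternion random vector $\mathbf{q}$ through its involutions (2), such that

$$
\begin{aligned}
\mathbf{q}_a &= \frac{1}{4}\left(\mathbf{q} + \mathbf{q}^{\imath} + \mathbf{q}^{\jmath} + \mathbf{q}^{\kappa}\right), &
\mathbf{q}_b &= \frac{1}{4\imath}\left(\mathbf{q} + \mathbf{q}^{\imath} - \mathbf{q}^{\jmath} - \mathbf{q}^{\kappa}\right) \\
\mathbf{q}_c &= \frac{1}{4\jmath}\left(\mathbf{q} - \mathbf{q}^{\imath} + \mathbf{q}^{\jmath} - \mathbf{q}^{\kappa}\right), &
\mathbf{q}_d &= \frac{1}{4\kappa}\left(\mathbf{q} - \mathbf{q}^{\imath} - \mathbf{q}^{\jmath} + \mathbf{q}^{\kappa}\right).
\end{aligned}
\tag{5}
$$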
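The relations in (5) follow directly from the sign patterns in (2): each signed sum of involutions isolates one component multiplied by its imaginary unit, so dividing by that unit returns the real-valued component. A minimal numerical sketch, assuming quaternions are stored as `[a, b, c, d]` along the last array axis (the names `qmul`, `INV`, and `UNIT_INV` are illustrative, not from the paper):

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; quaternions are stored as [a, b, c, d] on the last axis."""
    pa, pb, pc, pd = np.moveaxis(p, -1, 0)
    qa, qb, qc, qd = np.moveaxis(q, -1, 0)
    return np.stack([
        pa*qa - pb*qb - pc*qc - pd*qd,
        pa*qb + pb*qa + pc*qd - pd*qc,
        pa*qc - pb*qd + pc*qa + pd*qb,
        pa*qd + pb*qc - pc*qb + pd*qa,
    ], axis=-1)

# Sign patterns of the three involutions in (2).
INV = {"i": np.array([1.0, 1.0, -1.0, -1.0]),
       "j": np.array([1.0, -1.0, 1.0, -1.0]),
       "k": np.array([1.0, -1.0, -1.0, 1.0])}
# Inverse of each imaginary unit (the inverse of i is -i, and so on).
UNIT_INV = {"i": np.array([0.0, -1.0, 0.0, 0.0]),
            "j": np.array([0.0, 0.0, -1.0, 0.0]),
            "k": np.array([0.0, 0.0, 0.0, -1.0])}

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 4))                 # a random vector in H^5
qi, qj, qk = (q * INV[a] for a in "ijk")

qa = (q + qi + qj + qk) / 4
qb = qmul(UNIT_INV["i"], (q + qi - qj - qk) / 4)
qc = qmul(UNIT_INV["j"], (q - qi + qj - qk) / 4)
qd = qmul(UNIT_INV["k"], (q - qi - qj + qk) / 4)

# Each result is purely real and equals the matching component of q.
print(np.allclose(qb[:, 0], q[:, 1]), np.allclose(qb[:, 1:], 0.0))
```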

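In analogy to the complex domain² where both $z$ and $z^*$ are used to define the augmented statistics [27], [28], it can be shown that the bases $\mathbf{q}$, $\mathbf{q}^{\imath}$, $\mathbf{q}^{\jmath}$, and $\mathbf{q}^{\kappa}$ provide a suitable means to define the quaternion augmented statistics [18]. This way, the augmented random vector $\mathbf{q}^a = [\mathbf{q}^T, \mathbf{q}^{\imath T}, \mathbf{q}^{\jmath T}, \mathbf{q}^{\kappa T}]^T$ is used to define the augmented covariance matrix

$$
\mathbf{C}^a_q = E\{\mathbf{q}^a \mathbf{q}^{aH}\} =
\begin{bmatrix}
\mathbf{C}_{qq} & \mathbf{C}_{q^{\imath}} & \mathbf{C}_{q^{\jmath}} & \mathbf{C}_{q^{\kappa}} \\
\mathbf{C}^H_{q^{\imath}} & \mathbf{C}_{q^{\imath} q^{\imath}} & \mathbf{C}_{q^{\imath} q^{\jmath}} & \mathbf{C}_{q^{\imath} q^{\kappa}} \\
\mathbf{C}^H_{q^{\jmath}} & \mathbf{C}_{q^{\jmath} q^{\imath}} & \mathbf{C}_{q^{\jmath} q^{\jmath}} & \mathbf{C}_{q^{\jmath} q^{\kappa}} \\
\mathbf{C}^H_{q^{\kappa}} & \mathbf{C}_{q^{\kappa} q^{\imath}} & \mathbf{C}_{q^{\kappa} q^{\jmath}} & \mathbf{C}_{q^{\kappa} q^{\kappa}}
\end{bmatrix}
\in \mathbb{H}^{4N \times 4N} \tag{6}
$$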

which describes the complete second-order information available. The block matrices in (6), $\mathbf{C}_{q^{\imath}}$, $\mathbf{C}_{q^{\jmath}}$, $\mathbf{C}_{q^{\kappa}}$, are respectively termed the $\imath$-, $\jmath$-, and $\kappa$-covariance matrices $E\{\mathbf{q}\mathbf{q}^{\alpha H}\}$, $\alpha \in \{\imath, \jmath, \kappa\}$, while $\mathbf{C}_{qq} = E\{\mathbf{q}\mathbf{q}^H\}$ is the standard covariance matrix. The $\imath$-, $\jmath$-, and $\kappa$-covariance matrices are also called the complementary or pseudo-covariance matrices [28].

In the domain of second-order statistics, the concept of properness³ has been extended from the complex to the quaternion domain and has been discussed in [16] and [17]. Following on the involution-based augmented bases in [18] and [21], a random vector is considered Q-proper if it is not correlated with its involutions, or equivalently, $\mathbf{C}_{q^{\imath}} = \mathbf{C}_{q^{\jmath}} = \mathbf{C}_{q^{\kappa}} = \mathbf{0}$, that is, all the cross-covariance matrices vanish; it is otherwise termed Q-improper [18]. Note that,

²In the complex domain, the real and imaginary components can be represented in terms of the conjugate coordinates $z$ and $z^*$ as $z_r = (1/2)(z + z^*)$ and $z_i = (1/2j)(z - z^*)$.

³Properness refers to second-order circularity (rotation invariant pdfs), that is, equal power in all the components, $q_a$, $q_b$, $q_c$, and $q_d$, of a quaternion random variable $q = q_a + \imath q_b + \jmath q_c + \kappa q_d$.



for a Q-proper random vector, the augmented covariance matrix in (6) has a block-diagonal structure. More restricted definitions of properness can also be given, whereby one or more pseudo-covariances are nonzero (C-proper) [17]. This can be intuitively understood as rotation invariance along one or more of the quaternion axes; Q-properness thus reflects rotation invariance along all the three imaginary axes.

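Recall that the solution to the mean square error (MSE) estimator of a real-valued signal $y \in \mathbb{R}$ in terms of an observation $\mathbf{x}$, that is

$$
\hat{y} = E\{y \mid \mathbf{x}\} \tag{7}
$$

is given by $\hat{y} = \mathbf{h}^T \mathbf{x}$, where $\mathbf{h}$ is a coefficient vector and $\mathbf{x}$ the regressor vector. As a generalization, the MSE estimator for a quaternion-valued signal $y \in \mathbb{H}$ can be written in terms of the MSE estimators of its respective components, given by

$$
\begin{aligned}
\hat{y}_a &= E\{y_a \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\}, &
\hat{y}_b &= E\{y_b \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\} \\
\hat{y}_c &= E\{y_c \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\}, &
\hat{y}_d &= E\{y_d \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\}
\end{aligned}
\tag{8}
$$

such that

$$
\begin{aligned}
\hat{y} &= \hat{y}_a + \imath\hat{y}_b + \jmath\hat{y}_c + \kappa\hat{y}_d \\
&= E\{y_a \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\} + \imath E\{y_b \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\} \\
&\quad + \jmath E\{y_c \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\} + \kappa E\{y_d \mid \mathbf{q}_a, \mathbf{q}_b, \mathbf{q}_c, \mathbf{q}_d\}.
\end{aligned}
\tag{9}
$$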

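Observe that by using the relations (5), the MSE estimator of $y$ can be equivalently written as

$$
\begin{aligned}
\hat{y} &= E\{y \mid \mathbf{q}, \mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}\} + \imath E\{y^{\imath} \mid \mathbf{q}, \mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}\} \\
&\quad + \jmath E\{y^{\jmath} \mid \mathbf{q}, \mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}\} + \kappa E\{y^{\kappa} \mid \mathbf{q}, \mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}\}
\end{aligned}
\tag{10}
$$

which for jointly normal independent components results in the widely linear estimator [18], [22]

$$
\hat{y} = \mathbf{h}^H \mathbf{q} + \mathbf{g}^H \mathbf{q}^{\imath} + \mathbf{u}^H \mathbf{q}^{\jmath} + \mathbf{v}^H \mathbf{q}^{\kappa} = \boldsymbol{\omega}^{aH} \mathbf{q}^a \tag{11}
$$

where the augmented weight vector $\boldsymbol{\omega}^a = [\mathbf{h}^T, \mathbf{g}^T, \mathbf{u}^T, \mathbf{v}^T]^T$. Thus, the widely linear estimator in (11) is the optimal estimator for the generality of quaternion-valued signals, both proper and improper.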

C. Overview of the HR Calculus

A real-valued cost function, typically the error power, is a common performance criterion across statistical signal processing problems. However, the ways to calculate gradients of a real function of quaternion variables are only just being addressed. In a similar fashion to the complex CR calculus framework, where a real-valued function is defined based on the conjugate coordinates $z$ and $z^*$ [11], [13], [29], in the context of HR calculus [20] the mapping $f(\mathbf{q}): \mathbb{H}^N \mapsto \mathbb{R}$ can be considered as a function of the orthogonal quaternion basis vectors $\mathbf{q}$, $\mathbf{q}^{\imath}$, $\mathbf{q}^{\jmath}$, and $\mathbf{q}^{\kappa}$, such that

$$
f(\mathbf{q}, \mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}): \mathbb{H}^N \times \mathbb{H}^N \times \mathbb{H}^N \times \mathbb{H}^N \mapsto \mathbb{R}. \tag{12}
$$

Likewise, the duality between a quaternion function $f$ and its real-valued equivalent $g$ (vectors in $\mathbb{R}^4$) can be expressed as

$$
\begin{aligned}
f(q) &= f(q, q^{\imath}, q^{\jmath}, q^{\kappa}) \\
&= f_a(q_a, q_b, q_c, q_d) + \imath f_b(q_a, q_b, q_c, q_d) + \jmath f_c(q_a, q_b, q_c, q_d) + \kappa f_d(q_a, q_b, q_c, q_d) \\
&= g(q_a, q_b, q_c, q_d).
\end{aligned}
\tag{13}
$$

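Then, based on the isomorphism between $\mathbb{R}^4$ and $\mathbb{H}$ and by considering the components of the quaternion variable $q$ in the orthogonal bases in (5), a relation can be established between the derivatives of $g \in \mathbb{R}^4$ and those taken directly with respect to the quaternion basis variables. Within the HR calculus, the HR derivatives are given by [20]

$$
\begin{aligned}
\frac{\partial f}{\partial q} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} - \imath\frac{\partial f}{\partial q_b} - \jmath\frac{\partial f}{\partial q_c} - \kappa\frac{\partial f}{\partial q_d}\right) \\
\frac{\partial f}{\partial q^{\imath}} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} - \imath\frac{\partial f}{\partial q_b} + \jmath\frac{\partial f}{\partial q_c} + \kappa\frac{\partial f}{\partial q_d}\right) \\
\frac{\partial f}{\partial q^{\jmath}} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} + \imath\frac{\partial f}{\partial q_b} - \jmath\frac{\partial f}{\partial q_c} + \kappa\frac{\partial f}{\partial q_d}\right) \\
\frac{\partial f}{\partial q^{\kappa}} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} + \imath\frac{\partial f}{\partial q_b} + \jmath\frac{\partial f}{\partial q_c} - \kappa\frac{\partial f}{\partial q_d}\right)
\end{aligned}
\tag{14}
$$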

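and the HR* derivatives can then readily be written from (14) by using the property that, for real functions of quaternion variables, $(\partial f / \partial q)^{*} = \partial f / \partial q^{*}$, leading to

$$
\begin{aligned}
\frac{\partial f}{\partial q^{*}} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} + \imath\frac{\partial f}{\partial q_b} + \jmath\frac{\partial f}{\partial q_c} + \kappa\frac{\partial f}{\partial q_d}\right) \\
\frac{\partial f}{\partial q^{\imath *}} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} + \imath\frac{\partial f}{\partial q_b} - \jmath\frac{\partial f}{\partial q_c} - \kappa\frac{\partial f}{\partial q_d}\right) \\
\frac{\partial f}{\partial q^{\jmath *}} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} - \imath\frac{\partial f}{\partial q_b} + \jmath\frac{\partial f}{\partial q_c} - \kappa\frac{\partial f}{\partial q_d}\right) \\
\frac{\partial f}{\partial q^{\kappa *}} &= \frac{1}{4}\left(\frac{\partial f}{\partial q_a} - \imath\frac{\partial f}{\partial q_b} - \jmath\frac{\partial f}{\partial q_c} + \kappa\frac{\partial f}{\partial q_d}\right).
\end{aligned}
\tag{15}
$$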

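Similar to the conjugate derivatives in (15), involution-wise derivatives are also applicable to real-valued functions of quaternion variables, and are given by

$$
\left(\frac{\partial f}{\partial q}\right)^{\alpha} = \frac{\partial f}{\partial q^{\alpha}}, \qquad \alpha \in \{\imath, \jmath, \kappa\}. \tag{16}
$$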

Recently, in [20], it has been shown that in the quaternion domain, the direction of steepest descent (maximum rate of change of $f(q)$) is given by the derivative with respect to $q^{*}$, i.e., $\partial f / \partial q^{*}$. This can be seen as a natural generalization of Brandwood's result for functions of complex variables [30]. Finally, note that, while we have considered real-valued functions of quaternion variables in the above discussion, the HR calculus framework can be equally utilized for the analysis of general quaternion-valued functions.
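As a brief worked illustration, which is not part of the original derivation, consider the real-valued function $f(q) = |q|^2 = q_a^2 + q_b^2 + q_c^2 + q_d^2$, whose partial derivatives with respect to the real components are $2q_a$, $2q_b$, $2q_c$, and $2q_d$. Substituting these into (14) and (15) gives:

```latex
\begin{aligned}
\frac{\partial f}{\partial q}
  &= \frac{1}{4}\left(2q_a - 2\imath q_b - 2\jmath q_c - 2\kappa q_d\right)
   = \frac{1}{2}\, q^{*}, \\
\frac{\partial f}{\partial q^{*}}
  &= \frac{1}{4}\left(2q_a + 2\imath q_b + 2\jmath q_c + 2\kappa q_d\right)
   = \frac{1}{2}\, q, \\
\frac{\partial f}{\partial q^{\imath}}
  &= \frac{1}{4}\left(2q_a - 2\imath q_b + 2\jmath q_c + 2\kappa q_d\right)
   = \frac{1}{2}\left(q^{*}\right)^{\imath}.
\end{aligned}
```

The first two lines satisfy $(\partial f/\partial q)^{*} = \partial f/\partial q^{*}$, and the third agrees with (16), since applying the $\imath$-involution to $\tfrac{1}{2}q^{*}$ flips the signs of its $\jmath$ and $\kappa$ parts. In this example the steepest-descent direction $\partial f/\partial q^{*}$ is proportional to $q$ itself, as would be expected for a squared modulus.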

III. QUATERNION FASTICA (Q-FASTICA) ALGORITHM

Consider the standard ICA model

$$
\mathbf{x} = \mathbf{A}\mathbf{s} \tag{17}
$$

where the observed mixtures $\mathbf{x} \in \mathbb{H}^N$ are a weighted sum of $N_s$ latent sources $\mathbf{s} \in \mathbb{H}^{N_s}$ in a noise-free environment, and the rows of $\mathbf{A} \in \mathbb{H}^{N \times N_s}$ form the respective mixing parameters. While no knowledge of the mixing process is available, the sources are assumed statistically independent; for convenience, they have zero mean and unit variance, and for generality no assumption is made regarding the $\imath$-, $\jmath$-, and $\kappa$-variances. The mixing matrix $\mathbf{A}$ is assumed square ($N = N_s$), well-conditioned, and invertible.

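We shall now show that for a quaternion random vector $\mathbf{q} \in \mathbb{H}^N$, its whitening matrix $\mathbf{V}$ is given by

$$
\mathbf{V} = \boldsymbol{\Lambda}^{-\frac{1}{2}} \mathbf{E}^H \tag{18}
$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix of right eigenvalues and $\mathbf{E}$ is the matrix of the corresponding eigenvectors of the covariance matrix of $\mathbf{q}$.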

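Indeed, by expressing the covariance matrix in terms of the quaternion right eigenvalue decomposition $\mathbf{C}_{qq} = E\{\mathbf{q}\mathbf{q}^H\} = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^H$ [19], [31], the covariance matrix of the whitened random vector $\mathbf{p} = \mathbf{V}\mathbf{q}$ then becomes

$$
E\{\mathbf{p}\mathbf{p}^H\} = \mathbf{V} E\{\mathbf{q}\mathbf{q}^H\} \mathbf{V}^H = \boldsymbol{\Lambda}^{-\frac{1}{2}} \mathbf{E}^H \left(\mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^H\right) \mathbf{E} \boldsymbol{\Lambda}^{-\frac{1}{2}} = \mathbf{I} \tag{19}
$$

where $\mathbf{I}$ is the identity matrix.

Using the above result as a preprocessing step in ICA algorithms, the quaternion mixture $\mathbf{x}$ is whitened such that⁴

$$
E\{\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H\} = \mathbf{M} E\{\mathbf{s}\mathbf{s}^H\} \mathbf{M}^H = \mathbf{I} \tag{20}
$$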

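where $\tilde{\mathbf{x}} = \mathbf{V}\mathbf{x} = \mathbf{V}\mathbf{A}\mathbf{s}$ and $\mathbf{M} \triangleq \mathbf{V}\mathbf{A}$ is a new unitary mixing matrix containing the whitening matrix $\mathbf{V}$, given in (18). We aim to obtain a demixing matrix $\mathbf{W}$ such that $\mathbf{W}^H \tilde{\mathbf{x}}$ is an estimate of the original sources, albeit subject to the scaling, phase, and permutation ambiguities. Then for the $n$th source estimate, we have

$$
y_n = \mathbf{w}_n^H \tilde{\mathbf{x}} = \mathbf{w}_n^H \mathbf{M}\mathbf{s} = \mathbf{u}^H \mathbf{s} = e^{\xi\varphi} s_m \tag{21}
$$

where $\mathbf{w}_n$ is the $n$th column of the demixing matrix $\mathbf{W}$, $\mathbf{u}$ is a vector with a single nonzero value given by $e^{\xi\varphi}$ at the $n$th entry signifying an arbitrary direction within $\mathbb{H}$, $\varphi$ is an unknown and arbitrary angle, and $\xi = (\imath q_b + \jmath q_c + \kappa q_d)/\sqrt{q_b^2 + q_c^2 + q_d^2}$ is the unit pure quaternion vector.⁵ Finally, note that, by constraining the demixing vector $\mathbf{w}_n$ to unit norm, the estimated source $y_n$ is of unit variance

$$
E\{y_n y_n^{*}\} = \mathbf{w}_n^H E\{\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H\} \mathbf{w}_n = \mathbf{w}_n^H \mathbf{w}_n = 1 \tag{22}
$$

while the matrix $\mathbf{W}$ becomes unitary.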
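The whitening step can be prototyped with standard complex linear algebra. The sketch below is one possible route, not the paper's implementation: it assumes the well-known complex adjoint representation of a quaternion matrix written in Cayley–Dickson form, $\mathbf{Q} = \mathbf{Z}_1 + \mathbf{Z}_2\jmath$, and it uses the symmetric inverse square root $\mathbf{E}\boldsymbol{\Lambda}^{-1/2}\mathbf{E}^H$ rather than $\boldsymbol{\Lambda}^{-1/2}\mathbf{E}^H$ from (18); the two differ only by a unitary factor, and the symmetric form keeps the adjoint structure intact. The toy data and the names `adjoint` and `crandn` are illustrative.

```python
import numpy as np

def adjoint(z1, z2):
    """Complex adjoint of the quaternion matrix Q = z1 + z2 j."""
    return np.block([[z1, z2], [-z2.conj(), z1.conj()]])

def crandn(rng, shape):
    """Complex Gaussian array, used only to build toy data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(1)
N, T = 3, 4000
s1, s2 = crandn(rng, (N, T)), crandn(rng, (N, T))   # toy sources  s = s1 + s2 j
a1, a2 = crandn(rng, (N, N)), crandn(rng, (N, N))   # toy mixing   A = a1 + a2 j

# Quaternion product x = A s expressed through its Cayley-Dickson components.
x1 = a1 @ s1 - a2 @ s2.conj()
x2 = a1 @ s2 + a2 @ s1.conj()
x1 -= x1.mean(axis=1, keepdims=True)                # remove the quaternion mean
x2 -= x2.mean(axis=1, keepdims=True)

X = adjoint(x1, x2)                                 # shape (2N, 2T)
C = X @ X.conj().T / T                              # adjoint of the sample covariance
lam, E = np.linalg.eigh(C)                          # Hermitian eigendecomposition
V = (E / np.sqrt(lam)) @ E.conj().T                 # symmetric inverse square root
P = V @ X                                           # adjoint of the whitened mixture
whitened_cov = P @ P.conj().T / T                   # should be close to the identity

print(np.round(np.abs(whitened_cov), 3))
```

Because the adjoint map preserves products and Hermitian transposes, the top-left and top-right $N \times N$ blocks of `V` give the Cayley–Dickson components of a quaternion whitening matrix.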

A. Newton-Update-Based ICA Algorithm

The proposed q-FastICA algorithm is based on the maximization of negentropy of the separated sources, following on the previous implementations of the FastICA algorithm in the real and complex domains [1], [7], [10]. This is achieved by choosing an appropriate nonlinear function $G(y)$, so as to make a suitable approximation of the negentropy function.

⁴The result in [19] can be seen as the Takagi factorization of quaternion matrices.

⁵A pure imaginary quaternion is referred to as the imaginary or vector part of a quaternion variable.

In [25], three such distinct quaternion nonlinearities were identified: a nonlinear function applied component-wise to $y$ (split-quaternion function), applied to the components of the Cayley–Dickson form of $y$ (split-complex function), or applied directly to $y$ (full-quaternion function). It was also shown that the full-quaternion nonlinearity resulted in the best separation performance. The difficulty in applying a full quaternion nonlinearity in this paper stems from the fact that under the stringent analyticity conditions of the Cauchy–Riemann–Fueter [32] equations, the only analytic function in $\mathbb{H}$ is a constant. As an alternative, local analyticity conditions may be considered in the calculation of the derivatives [24], [33]. However, the assumptions therein may not be valid for general nonlinear functions. For continuity, following the existing complex FastICA (c-FastICA) algorithms which all use a real-valued nonlinear function, we shall also employ a real-valued smooth and even nonlinearity $G: \mathbb{R} \mapsto \mathbb{R}$, while using the HR calculus and widely linear modeling to implement an augmented Newton method so as to employ the full information available within general Q-improper mixtures.

The real-valued q-FastICA cost function, which approximates negentropy [1], is then defined as

$$
\mathcal{J}(\mathbf{w}, \mathbf{w}^{\imath}, \mathbf{w}^{\jmath}, \mathbf{w}^{\kappa}) = E\{G(|\mathbf{w}^H \mathbf{x}|^2)\} \tag{23}
$$

where the cost function $\mathcal{J}$ is written in terms of the four basis vectors, to emphasize the widely linear model. Based on (11), the optimization problem based on (23) can be stated as

$$
\mathbf{w}_{\mathrm{opt}} = \arg\max_{\|\mathbf{w}\|_2^2 = 1} \mathcal{J}(\mathbf{w}, \mathbf{w}^{\imath}, \mathbf{w}^{\jmath}, \mathbf{w}^{\kappa}) \tag{24}
$$

where the demixing vector is normalized to avoid very small values of $\mathbf{w}$, while keeping the unit variances of the extracted sources.

The solution of this constrained optimization problem is found through the method of Lagrange multipliers and by utilizing the Newton method to perform a fast iterative search for the optimal value $\mathbf{w}_{\mathrm{opt}}$. In summary, the q-FastICA algorithm for the estimation of one possibly noncircular source is expressed in its augmented form as

$$
\begin{aligned}
\mathbf{w}^a(k+1) &= \mathbf{w}^a(k) - (\mathbf{H}^a_{ww})^{-1} \nabla_{\mathbf{w}^{a*}} \mathcal{L} \\
\lambda(k+1) &= \lambda(k) + \mu \nabla_{\mathbf{w}^{a*}} \mathcal{L} \\
\mathbf{w}(k+1) &\leftarrow \frac{\mathbf{w}(k+1)}{\|\mathbf{w}(k+1)\|_2}
\end{aligned}
\tag{25}
$$

where the augmented demixing vector $\mathbf{w}^a = [\mathbf{w}, \mathbf{w}^{\imath}, \mathbf{w}^{\jmath}, \mathbf{w}^{\kappa}]^T$, $\mathcal{L}$ is the Lagrangian function and $\lambda$ is the Lagrange parameter updated via a gradient ascent method with step size $\mu$. The vector $\nabla_{\mathbf{w}^{a*}} \mathcal{L}$ and matrix $\mathbf{H}^a_{ww}$ are, respectively, the augmented gradient vector and Hessian matrix of the Lagrangian function. The algorithm is summarized in Algorithm 1 and the full derivation is provided in the Appendix.

The estimation of multiple sources can be performed one by one through a deflationary procedure, where the $n$th estimated source is recovered either sequentially, by the following Gram–Schmidt orthogonalization procedure or, simultaneously, via a



Algorithm 1: The q-FastICA algorithm

1: Preprocessing: Decorrelate the input data $\mathbf{x}$, which may be achieved using the whitening matrix $\mathbf{V}$ (18), and center the data by removing their mean;
2: Initialize the demixing vector $\mathbf{w}_1$ by assigning random values or the identity matrix;
3: Calculate the augmented gradient vector $\nabla_{\mathbf{w}^{a*}} \mathcal{L}$ and Hessian matrix $\mathbf{H}^a_{ww}$ of the Lagrangian according to (41);
4: Update the augmented demixing vector as $\mathbf{w}^a \leftarrow \mathbf{w}^a - (\mathbf{H}^a_{ww})^{-1} \nabla_{\mathbf{w}^{a*}} \mathcal{L}$;
5: Update the Lagrange parameter as $\lambda \leftarrow \lambda + \mu \nabla_{\mathbf{w}^{a*}} \mathcal{L}$;
6: Normalize the updated demixing vector $\mathbf{w} \leftarrow \mathbf{w} / \|\mathbf{w}\|_2$;
7: If the algorithm has not converged, go to step 3.

symmetric orthogonalization method. Orthogonalization procedures in the quaternion domain follow from the already established results [34].

The computational complexity of the proposed q-FastICA is approximately $O(4160N^3 + 32N^2T)$, whereas that of c-FastICA in the long vector mode is approximately $O(32N^3 + 48N^2T)$. The symbols $N = N_s$ and $T$ denote, respectively, the number of mixtures/sources and the number of samples for each source.
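For a rough feel of how the two expressions compare, the sketch below simply evaluates them at face value as operation-count estimates, using the same $N$ in both. This is an interpretive assumption: the constants are those quoted above, and the function names are illustrative.

```python
def q_fastica_ops(N, T):
    """Quoted estimate for q-FastICA: 4160 N^3 + 32 N^2 T."""
    return 4160 * N**3 + 32 * N**2 * T

def c_fastica_ops(N, T):
    """Quoted estimate for c-FastICA (long vector mode): 32 N^3 + 48 N^2 T."""
    return 32 * N**3 + 48 * N**2 * T

def crossover_samples(N):
    """T at which both expressions are equal: 4128 N^3 = 16 N^2 T, so T = 258 N."""
    return 258 * N

N, T = 4, 5000   # sizes matching the benchmark setting of four sources and 5000 samples
print(q_fastica_ops(N, T), c_fastica_ops(N, T), crossover_samples(N))
```

Under this reading, the cubic term dominates the q-FastICA estimate only for short records; once $T$ exceeds roughly $258N$ samples, the smaller coefficient on the $N^2T$ term makes the quoted q-FastICA estimate the lower of the two.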

IV. SIMULATIONS AND DISCUSSION

A. Benchmark Simulations

The performance of the q-FastICA algorithm is first assessed through simulations using synthetic 4-D signal codes located on the edges of geometric polytopes [35] with varying degree of Q-improperness. To assess the degree of Q-improperness of the generated sources, we define a circularity measure based on the ratio of the complementary variances to the standard variance, expressed as [18]

$$
r = \frac{\left|E\{q q^{\imath *}\}\right| + \left|E\{q q^{\jmath *}\}\right| + \left|E\{q q^{\kappa *}\}\right|}{3 E\{q q^{*}\}}, \qquad r \in [0, 1]. \tag{26}
$$

This way, a measure of $r = 0$ indicates a Q-proper source, while for a highly Q-improper source $r = 1$.
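The measure in (26) is straightforward to estimate from samples. The following sketch assumes zero-mean quaternion samples stored as rows `[a, b, c, d]`, with expectations replaced by sample means; the two test signals are synthetic illustrations (an i.i.d. Gaussian quaternion and a purely real two-point signal), not the polytope sources used in the paper.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; quaternions are stored as [a, b, c, d] on the last axis."""
    pa, pb, pc, pd = np.moveaxis(p, -1, 0)
    qa, qb, qc, qd = np.moveaxis(q, -1, 0)
    return np.stack([
        pa*qa - pb*qb - pc*qc - pd*qd,
        pa*qb + pb*qa + pc*qd - pd*qc,
        pa*qc - pb*qd + pc*qa + pd*qb,
        pa*qd + pb*qc - pc*qb + pd*qa,
    ], axis=-1)

CONJ = np.array([1.0, -1.0, -1.0, -1.0])
# Sign patterns of the i-, j-, and k-involutions from (2).
INVOLUTIONS = np.array([[1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]], dtype=float)

def improperness(q):
    """Sample estimate of r in (26) for zero-mean samples q of shape (T, 4)."""
    # |E{q (q^alpha)^*}| for each involution: modulus of the mean quaternion product.
    num = sum(np.linalg.norm(qmul(q, q * s * CONJ).mean(axis=0)) for s in INVOLUTIONS)
    den = 3.0 * np.mean(np.sum(q ** 2, axis=-1))       # 3 E{q q*}
    return num / den

rng = np.random.default_rng(3)
proper = rng.standard_normal((20000, 4))               # equal power, uncorrelated parts
improper = np.zeros((20000, 4))
improper[:, 0] = rng.choice([-1.0, 1.0], size=20000)   # all power in the real part

print(improperness(proper), improperness(improper))
```

The Gaussian test signal yields a value near zero, limited only by the finite sample size, while the purely real signal attains the upper bound of the measure.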

For the q-FastICA algorithm with symmetric orthogonalization, the standard performance index (PI) measure was used, given by

$$
\mathrm{PI} = 10\log_{10}\left(\frac{1}{N}\sum_{i=1}^{N}\left(\sum_{j=1}^{N}\frac{|b_{ij}|}{\max\{|b_{i1}|, \ldots, |b_{iN}|\}} - 1\right) + \frac{1}{N}\sum_{j=1}^{N}\left(\sum_{i=1}^{N}\frac{|b_{ij}|}{\max\{|b_{1j}|, \ldots, |b_{Nj}|\}} - 1\right)\right) \tag{27}
$$

where $\mathbf{B}^H = \mathbf{W}^H \mathbf{V}\mathbf{A}$ and $b_{ij} = (\mathbf{B})_{ij}$, and a PI of less than −10 dB signifies good separation performance.
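A direct transcription of (27) is given below as a minimal sketch. It assumes the global matrix is supplied as an array of shape `(N, N, 4)` holding the real components of each quaternion entry, so that $|b_{ij}|$ is the modulus along the last axis; the function name and the test matrix are illustrative.

```python
import numpy as np

def performance_index(B):
    """PI in dB, following (27); B has shape (N, N, 4) with quaternion entries."""
    mag = np.linalg.norm(B, axis=-1)                                   # |b_ij|
    rows = (mag / mag.max(axis=1, keepdims=True)).sum(axis=1) - 1.0    # one term per row i
    cols = (mag / mag.max(axis=0, keepdims=True)).sum(axis=0) - 1.0    # one term per column j
    return 10.0 * np.log10(rows.mean() + cols.mean())

N = 4
# Nearly perfect separation: unit diagonal with small residual cross-talk of 0.01.
mag = np.eye(N) + 0.01 * (1.0 - np.eye(N))
B = np.zeros((N, N, 4))
B[..., 0] = mag

print(performance_index(B))
```

The index is unchanged by permuting the rows of `B` or by rescaling a row, which matches the permutation and scaling ambiguities of ICA. An exact permutation matrix drives the argument of the logarithm to zero, so in practice the measure is evaluated on estimated, non-ideal global matrices.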

For simulations, 5000 samples of four polytope sources were mixed using a randomly generated quaternion-valued

Fig. 1. Performance of the q-FastICA algorithm for the separation of four sources using a symmetric orthogonalization procedure. (a) Scatter plot of the quaternion sources, properties given in Table I. (b) Scatter plot of the estimated sources, with nonlinearity $G_1(y)$. (c) PI at each iteration of the ICA procedure for nonlinearities $G_1(y) = \log\cosh(y)$, $G_2(y) = \sqrt{0.1 + y}$ and $G_3(y) = \log(0.1 + y)$.

[Figure 1 panels: (a) and (b) show scatter plots of the sources $s_1$ to $s_4$ and the estimates $y_1$ to $y_4$ in the ℜ−ℑı, ℜ−ℑj, ℜ−ℑκ, ℑı−ℑj, ℑı−ℑκ, and ℑj−ℑκ planes, with each axis spanning −1 to 1; (c) plots PI (dB) against iteration (1 to 6) for $G_1$, $G_2$, and $G_3$.]

4 × 4 mixing matrix. The observed mixtures were then decorrelated and processed using the q-FastICA algorithm (25) with symmetric orthogonalization, to estimate the sources simultaneously. Table I describes the source properties, while scatter plots of the sources are given in Fig. 1(a). Sources $s_1(k)$ to $s_4(k)$ were, respectively, generated from cubic, 5-point dicyclic, 2-point cyclic, and 3-point cyclic groups; source $s_3(k)$ had a high degree of Q-improperness, $r = 1$, the source $s_4(k)$ had the value of $r = 0.3351$, and the other two sources were Q-proper. For performance comparison,



TABLE I

SOURCE PROPERTIES FOR SIMULATIONS ON BENCHMARK SIGNALS BASED ON SYMMETRIC ORTHOGONALIZATION

Source    Polytope              Q-improperness measure (r)
s1(k)     Cubic                 0.0104
s2(k)     Dicyclic (5 point)    0.0089
s3(k)     Cyclic (2 point)      1.0000
s4(k)     Cyclic (3 point)      0.3351

TABLE II

SEPARATION OF SOURCES WITH DIFFERENT MEASURES OF Q-IMPROPERNESS. PI (27) IS GIVEN IN DECIBELS

                              Quaternion Infomax ICA
Case      r_s1     r_s2       Full-H      Split-C      q-FastICA
Case 1    0.01     0.01       −5.57       −3.11        −16.69
Case 2    0.34     0.02       −4.57       −2.51        −18.16
Case 3    0.35     1.00       −3.11       −2.21        −17.35

the nonlinearity $G$ was chosen as in [7], with $G_1(y) = \log\cosh(|y|^2)$, $G_2(y) = \sqrt{0.1 + |y|^2}$ and $G_3(y) = \log(0.1 + |y|^2)$. The demixing matrix $\mathbf{W}$ was initialized randomly, and the step sizes were set as $\mu_1 = 1$, $\mu_2 = 0.1$, $\mu_3 = 0.5$, and $\lambda = 5$ for the gradient ascent update algorithm. As shown in Fig. 1(c), the algorithm successfully separated all the four sources with the respective PI values of −17.87, −15.8086, and −19.4882 dB. Fig. 1(b) illustrates the scatter plots of the normalized estimated sources for the case with nonlinearity $G_1$; note that sources were estimated in a random order.

B. Effect of Q-Improperness on Performance

In the next stage, the effect of the Q-impropriety of sources on the performance of the algorithm when separating using a symmetric orthogonalization method was assessed over three cases for the mixture of 5000 samples of a Q-proper and Q-improper polytope signal, given in Table II and also shown in Fig. 2. For comparison, the quaternion Infomax ICA algorithm [25] was utilized with both full-quaternion and split-complex nonlinearity, where $G(y) = \log\cosh(y)$. Observe that the q-FastICA algorithm had a consistent performance in all three cases with an average PI of −17.40 dB, outperforming the quaternion Infomax ICA whose performance further deteriorated for Q-improper sources, having an average PI of −4.42 dB and −2.61 dB, respectively, for the full-quaternion and split-complex nonlinearity.

C. EEG Artifact Extraction

In an EEG recording session, each channel comprises a mixture of an EEG signal corresponding to the neural activity, and electrical activity pertaining to artifacts such as movement of the head, line noise, and eye blinks. In the modeling of the EEG signal, the artifacts, both external and biological, were considered statistically independent from pure EEG [36]–[38]; and the usefulness of the real-valued FastICA algorithm in the extraction of eye blink artifacts was studied in [39].

In our experimental setup, data was sampled at 4.8 kHz for 30 s from 12 electrodes placed symmetrically on the scalp

Fig. 2. Performance of the proposed q-FastICA algorithm (25) and the quaternion Infomax ICA in separating sources with varying degrees of Q-improperness, whose properties are given in Table II.

[Figure 2: bar chart of PI (dB), on a scale from −20 to 0, for Case 1, Case 2, and Case 3, comparing Infomax (Full-H), Infomax (Split-C), and q-FastICA.]

according to the 10–20 system, as shown in Fig. 3(f), with the reference and ground electrodes placed, respectively, on the right earlobe and forehead. The electrodes used were AF7, AF8, AF3, AF4, ML, MR, C3, C4, PO7, PO8, PO3, and PO4, where the ML and MR electrodes were placed, respectively, on the left and right mastoid. In addition, the voltage difference between the two pairs of electrodes placed above and to the side of the eye sockets measured the electrooculogram (EOG), i.e., the electrical activity due to eye blinks and eye movement. The left and right EOG was combined into a quadrivariate signal as a reference to assess the performance of the q-FastICA algorithm in artifact removal from EEG.

The three quaternion-valued EEG signals were formed from four symmetric electrodes from the frontal (AF7, AF8, AF3, AF4), central (ML, MR, C3, C4) and occipital (PO7, PO8, PO3, PO4) regions of the head. The so-constructed Q-improper quaternion signals were

$$
\begin{aligned}
x_1(k) &= \mathrm{AF8}(k) + \imath\,\mathrm{AF4}(k) + \jmath\,\mathrm{AF7}(k) + \kappa\,\mathrm{AF3}(k) \\
x_2(k) &= \mathrm{MR}(k) + \imath\,\mathrm{C3}(k) + \jmath\,\mathrm{ML}(k) + \kappa\,\mathrm{C4}(k) \\
x_3(k) &= \mathrm{PO8}(k) + \imath\,\mathrm{PO4}(k) + \jmath\,\mathrm{PO3}(k) + \kappa\,\mathrm{PO7}(k)
\end{aligned}
\tag{28}
$$

and the observed EEG mixtures were then represented as $\mathbf{x} = [x_1(k), x_2(k), x_3(k)]^T$. The corresponding degrees of Q-impropriety were, respectively, 0.8902, 0.6824, and 0.8932, measured according to (26).

In this scheme, the q-FastICA algorithm (25) was first utilized to estimate the source signals, with the step size $\mu = 1$ and initial Lagrange parameter $\lambda = 5$ chosen empirically, while following our earlier analysis, the nonlinearity was chosen to be $G(y) = \log\cosh(y)$. Next, the estimated source pertaining to the EOG artifact was selected through the examination of the kurtosis values of the components of the separated sources. Pure EEG signals typically have near-zero kurtosis values, while those belonging to EOG artifacts have super-Gaussian distributions and thus large kurtosis values [40], this being attributed to the sparse nature of eye blinks.
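The kurtosis-based selection step can be sketched as follows. The example uses synthetic surrogate data only, with two Gaussian "EEG-like" sources and one sparse, spike-like source standing in for a blink artifact; it is not the recorded data, and the selection rule (largest mean excess kurtosis across the four components) is an illustrative assumption rather than the exact criterion of the paper.

```python
import numpy as np

def excess_kurtosis(x, axis=0):
    """Excess kurtosis E{(x - m)^4} / var^2 - 3 along the given axis."""
    xc = x - x.mean(axis=axis, keepdims=True)
    m2 = np.mean(xc ** 2, axis=axis)
    m4 = np.mean(xc ** 4, axis=axis)
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(4)
T = 20000
eeg_like = rng.standard_normal((2, T, 4))          # two Gaussian quaternion sources
blink = np.zeros((1, T, 4))
idx = rng.choice(T, size=50, replace=False)        # 50 sparse "blink" instants
blink[0, idx] = 10.0 * rng.standard_normal((50, 4))

sources = np.concatenate([eeg_like, blink])        # shape (3 sources, T, 4 components)
kurt = excess_kurtosis(sources, axis=1)            # shape (3, 4): per-component kurtosis
artifact = int(np.argmax(kurt.mean(axis=1)))       # source with the largest kurtosis

print(np.round(kurt, 2), artifact)
```

Sparse activity concentrates the signal energy in a few samples, which inflates the fourth moment relative to the squared variance; this is why the spike-like source stands out against the near-zero kurtosis of the Gaussian ones.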



Fig. 3. Removal of EOG artifact from an EEG recording using the proposed q-FastICA algorithm. (a) Recorded EEG and EOG channels. (b) Components of the estimated sources. (c) Top: Kurtosis values of the recorded EEG channels, bottom: Kurtosis values of each component of the estimates. (d) Original and reconstructed EOG signals, along with the residual estimation error. (e) Original recorded EEG (thick gray line) and clean EEG mixture after artifact removal (thin black line), shown between 6 s and 9 s. (f) Placement of the EEG recording electrodes.

[Figure 3 panels: (a) channels AF8, AF4, AF7, AF3, C4, MR, C3, ML, PO8, PO4, PO3, PO7, vEOG, and hEOG against time (s); (b) the components ℜ, ℑı, ℑj, and ℑκ of $y_1$, $y_2$, and $y_3$ against time (s); (c) kurtosis per recorded channel and per component of $y_1(k)$, $y_2(k)$, and $y_3(k)$; (d) extracted EOG and EOG residue; (e) the 12 EEG channels against time (s) between 6 s and 9 s; (f) electrode positions.]

A waveform of the original recorded channels and the components of the quaternion-valued separated sources are depicted, respectively, in Fig. 3(a) and (b). The occurrences of eye blinks can be seen at the beginning of the recording, then at around 7, 15, and 22 s, with the effect of the EOG artifact more prominent on the frontal lobe channel and less severe in the central and occipital channels. By inspection, the separated EOG artifact can be seen in the components of the third extracted source $y_3(k)$, that is $\Re\{y_3(k)\}$, $\Im_\imath\{y_3(k)\}$, $\Im_\jmath\{y_3(k)\}$, and $\Im_\kappa\{y_3(k)\}$, and is verified



−0.1 −0.05 0 0.05 0.1−2−1

012

s1a

−0.2 −0.1 0 0.1 0.2−2−1

012

s2a

−0.1 −0.05 0 0.05 0.1−2−1

012

s1b

−0.2 −0.1 0 0.1 0.2−2−1

012

s2b

(a)

−2 −1 0 1 2−1

−0.50

0.51

cY1

−1 −0.5 0 0.5 1−2−1

012

cY3

(b)

−2 −1 0 1 2

−2 −1 0 1 2

−0.4−0.2

00.20.4

cY1

−2−1

012

cY3

−2 −1 0 1 2−1

−0.50

0.51

cY2

−1 −0.5 0 0.5 1−2−1

012

cY4

−2 −1 0 1 2

−2 −1 0 1 2

−0.4−0.2

00.20.4

cY2

−2−1

012

cY4

(c)

−1 −0.5 0 0.5 1−1

−0.50

0.51

Y1a

−0.1 0 0.1−2−1

012

Y2a

−0.05 0 0.05

0.05

0

0.05

Y1b

−2−1

012

Y2b

−0.1 −0.05 0 0.05 0.1

(d)

Fig. 4. Communication codes in quaternion mixing. (a) Communicationsources. (b) c-FastICA estimates (long vector). (c) c-FastICA estimates.(d) q-FastICA estimates.

through comparison of the kurtosis values of each componentFig. 3(c). While most of the estimated sources had a near-zero measure of kurtosis, the real and imaginary compo-nents of y3(k) exhibited, in comparison, very large kurtosisvalues.
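The kurtosis-based identification of the artifact source described for Fig. 3(c) can be sketched as follows. This is a minimal illustration on synthetic data, not the EEG recording used in the experiment; the array layout `(n_sources, 4, n_samples)` and the function names `excess_kurtosis`, `component_kurtosis`, and `pick_artifact_source` are assumptions made for this example.

```python
import numpy as np


def excess_kurtosis(v):
    """Excess kurtosis of a real 1-D signal (close to zero for Gaussian data)."""
    v = np.asarray(v, dtype=float)
    centered = v - v.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0


def component_kurtosis(sources):
    """Kurtosis of every real component of every quaternion source.

    `sources` has shape (n_sources, 4, n_samples); axis 1 holds the real part
    and the three imaginary parts of each estimated source (assumed layout).
    """
    return np.array([[excess_kurtosis(comp) for comp in src] for src in sources])


def pick_artifact_source(sources):
    """Index of the source whose components have the largest mean |kurtosis|."""
    kurt = component_kurtosis(sources)
    return int(np.argmax(np.abs(kurt).mean(axis=1))), kurt


# Synthetic stand-in data: two near-Gaussian sources and one spiky source.
rng = np.random.default_rng(0)
n_samples = 5000
sources = rng.standard_normal((3, 4, n_samples))
blink = np.zeros(n_samples)
blink[[500, 1800, 3200, 4400]] = 25.0      # sparse, large "eye blink" spikes
sources[2] += blink                         # contaminate all four components

artifact_index, kurt = pick_artifact_source(sources)
print(artifact_index)
print(np.round(kurt, 2))
```

In this toy setting the spiky source stands out by its kurtosis in all four components, which mirrors the qualitative behavior reported for $y_3(k)$.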

Fig. 5. Communication codes in quaternion mixing when the constellation signals $s_{1a} \neq s_{1b}$. (a) Communication sources. (b) c-FastICA estimates (long vector). (c) c-FastICA estimates. (d) q-FastICA estimates.

To study the effectiveness of the algorithm in removing the artifact, the EOG artifact was reconstructed by averaging the four components of $y_3(k)$ and was then compared to the original quadrivariate EOG recording (which is also the average of the two pairs of electrodes placed near the eye sockets). Fig. 3(d) depicts the EOG extraction performance, along with the residual error of the estimation process, which had a small MSE of $1.21 \times 10^{-4}$. By excluding the artifact components contained in $y_3(k)$, the clean EEG mixture was reconstructed. A 3-s window between 6 and 9 s for each reconstructed channel is shown in Fig. 3(e), where the effect of the EOG present at 7 s was diminished in all the channels.
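The idea of rebuilding the mixture after excluding the artifact source, and of scoring the result with an MSE, can be illustrated with a simplified real-valued stand-in for the quaternion model. This is only a sketch under strong assumptions: the mixing matrix is taken as known, the data are synthetic, and the names `remove_source` and `mse` are hypothetical helpers rather than part of the method described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 2000

# Hypothetical stand-ins: two background sources and one spiky artifact.
background = rng.standard_normal((2, n_samples))
artifact = np.zeros(n_samples)
artifact[::400] = 10.0
sources = np.vstack([background, artifact])        # shape (3, n_samples)

mixing = rng.standard_normal((4, 3))                # assumed known in this sketch
recorded = mixing @ sources                         # contaminated channels


def remove_source(mixing, sources, index):
    """Rebuild the mixture after zeroing the source at `index`."""
    kept = sources.copy()
    kept[index] = 0.0
    return mixing @ kept


def mse(a, b):
    """Mean square error between two equally shaped arrays."""
    return float(np.mean((a - b) ** 2))


cleaned = remove_source(mixing, sources, index=2)
reference = mixing[:, :2] @ background              # artifact-free mixture

print(mse(recorded, reference))   # error before removal
print(mse(cleaned, reference))    # error after removal
```

With estimated (rather than known) sources and mixing, the residual after removal would not vanish, which is why a small but nonzero MSE is the relevant figure of merit.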



Fig. 6. Communication codes in complex mixing. (a) Communication sources. (b) c-FastICA estimates (long vector). (c) c-FastICA estimates. (d) q-FastICA estimates.

D. Alamouti-Based Communication Systems

We next present the application of q-FastICA in the context of a practical communication problem, Alamouti coding, which was shown to enhance the reliability of data communications. For a single user, the model with two transmit antennas and one receive antenna can be expressed as [41]

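$$\begin{bmatrix} x_a \\ x_b \end{bmatrix} = \begin{bmatrix} s_a & -s_b^* \\ s_b & s_a^* \end{bmatrix} \begin{bmatrix} a_\alpha \\ a_\beta \end{bmatrix} \in \mathbb{C} \tag{29}$$

$$x = as \in \mathbb{H} \tag{30}$$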

where $x_a$ and $x_b$ are two consecutive observed signals at the receiver, $s_a$ and $s_b$ denote the transmitted complex symbols to be recovered, and $a_\alpha$ and $a_\beta$ denote the channel responses between the two transmit antennas and the receiver. The complex mixing problem in (29) can be formulated as the quaternion source separation problem in (30), where $x = x_a + x_b\jmath$ and $s = s_a + s_b\jmath$ [42]. The single-user quaternion model can be generalized to a multiuser communication system as in (17), where each mixing matrix entry $a_{ij\varepsilon}$ represents the channel between the receiver of the $i$th user and the $\varepsilon$th transmit antenna of the $j$th user, $\varepsilon \in \{\alpha, \beta\}$.
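The equivalence between the complex model (29) and the quaternion model (30) can be checked numerically. The sketch below assumes that the quaternion channel is formed in the same way as the signals, that is, $a = a_\alpha + a_\beta\jmath$, and stores a quaternion as the real vector of its components on $\{1, \imath, \jmath, \kappa\}$; the helper names `quat_from_complex_pair` and `quat_mul` are hypothetical.

```python
import numpy as np


def quat_from_complex_pair(z1, z2):
    """Quaternion z1 + z2*j stored as real components on [1, i, j, k]."""
    return np.array([z1.real, z1.imag, z2.real, z2.imag])


def quat_mul(p, q):
    """Hamilton product of two quaternions stored as [1, i, j, k]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])


rng = np.random.default_rng(2)
s_a, s_b, a_alpha, a_beta = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Complex-valued Alamouti model (29).
coding = np.array([[s_a, -np.conj(s_b)],
                   [s_b, np.conj(s_a)]])
x_a, x_b = coding @ np.array([a_alpha, a_beta])

# Quaternion model (30), assuming a = a_alpha + a_beta*j.
x = quat_mul(quat_from_complex_pair(a_alpha, a_beta),
             quat_from_complex_pair(s_a, s_b))

print(np.allclose(x, quat_from_complex_pair(x_a, x_b)))
```

The check relies on the identity $\jmath z = z^*\jmath$ for a complex number $z$ in the span of $\{1, \imath\}$, which is what produces the conjugated symbols in the coding matrix of (29).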

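For clarity, a $2 \times 2$ source separation problem in $\mathbb{H}$ (31) can be expressed in $\mathbb{C}$ (32) as

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \in \mathbb{H} \tag{31}$$

corresponding to

$$\begin{aligned}
\begin{bmatrix} x_{1a} \\ x_{1b} \end{bmatrix} &= \begin{bmatrix} s_{1a} & -s_{1b}^* \\ s_{1b} & s_{1a}^* \end{bmatrix} \begin{bmatrix} a_{11\alpha} \\ a_{11\beta} \end{bmatrix} + \begin{bmatrix} s_{2a} & -s_{2b}^* \\ s_{2b} & s_{2a}^* \end{bmatrix} \begin{bmatrix} a_{12\alpha} \\ a_{12\beta} \end{bmatrix} \in \mathbb{C} \\
\begin{bmatrix} x_{2a} \\ x_{2b} \end{bmatrix} &= \begin{bmatrix} s_{2a} & -s_{2b}^* \\ s_{2b} & s_{2a}^* \end{bmatrix} \begin{bmatrix} a_{22\alpha} \\ a_{22\beta} \end{bmatrix} + \begin{bmatrix} s_{1a} & -s_{1b}^* \\ s_{1b} & s_{1a}^* \end{bmatrix} \begin{bmatrix} a_{21\alpha} \\ a_{21\beta} \end{bmatrix} \in \mathbb{C}.
\end{aligned} \tag{32}$$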

We considered three scenarios: 1) the pair of symbols $s_{ia} = s_{ib}$ of the $i$th user were equal; 2) the pair of symbols $s_{1a} \neq s_{1b}$ of the first user were not equal; and 3) the mixing matrix $\mathbf{A}$ in (31) was complex valued. The sources were communication constellations such as binary phase shift keying (BPSK), quadrature PSK (QPSK), and 16-quadrature amplitude modulation (QAM) for these three scenarios, as shown in Figs. 4(a), 5(a), and 6(a). For all the scenarios, q-FastICA was compared with c-FastICA [10] in solving this special complex-valued BSS problem. Two methodologies were considered for using c-FastICA: 1) the long vector approach, in which the $i$th quaternion observation $x_i = x_{ia} + x_{ib}\jmath$ was split into two complex observations $x_{ia}$ and $x_{ib}$, which were then concatenated into a "long" complex-valued vector; and 2) the conventional approach, whereby the two quaternion observations were split into four complex observations, i.e., $\{x_1, x_2\} \in \mathbb{H} \equiv \{x_{1a}, x_{1b}, x_{2a}, x_{2b}\} \in \mathbb{C}$.

For the first scenario in Fig. 4, observe that the estimates of the long vector method differ from those of the conventional method. However, both sets of estimates were far from the original complex sources, whereas q-FastICA gave reasonable estimates of the sources. The dependence between $s_{ia}$ and $s_{ib}$ explains why c-FastICA failed to recover the sources. Moreover, the Alamouti-based communication BSS model in (29) is not well suited to the conventional ICA model.

As for the second scenario in Fig. 5, the c-FastICA estimates were expected to improve, as there are now three independent sources instead of just two. Again, the long-vector c-FastICA technique did not successfully estimate any of the three sources; conventional c-FastICA, however, recovered the QPSK source. In practice, it is highly unlikely that $s_{ia}$ and $s_{ib}$ are different, because that would mean the user is alternating between BPSK and QPSK for each consecutive symbol period.

For the sake of comparison, we have also considered complex-valued mixing in the context of Alamouti-based communication systems in the third scenario in Fig. 6. This scenario is also unlikely, because it requires the channel response $a_{ib}$ of the second transmit antenna of each $i$th user to be zero. Both long vector and conventional c-FastICA recovered an improved version of the QPSK source (vis-a-vis their estimates in scenario 2). However, they did not succeed in reconstructing the 16-QAM source, in contrast to q-FastICA.
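As an illustration of how sources of the kind used in these scenarios could be generated, the sketch below draws BPSK, QPSK, and 16-QAM symbols and stacks a symbol pair into a quaternion source $s = s_a + s_b\jmath$. The unit-average-power normalization, the sample count, and the helper names `constellation`, `draw_symbols`, and `quaternion_source` are assumptions of this example and are not taken from the experimental setup.

```python
import numpy as np


def constellation(name):
    """Constellation points scaled to unit average power (assumed scaling)."""
    if name == "BPSK":
        return np.array([-1.0, 1.0], dtype=complex)
    if name == "QPSK":
        return np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    if name == "16-QAM":
        levels = np.array([-3.0, -1.0, 1.0, 3.0])
        grid = levels[:, None] + 1j * levels[None, :]
        return grid.ravel() / np.sqrt(10)
    raise ValueError(f"unknown constellation: {name}")


def draw_symbols(name, n, rng):
    """Draw n symbols uniformly at random from the named constellation."""
    return rng.choice(constellation(name), size=n)


def quaternion_source(s_a, s_b):
    """Stack s = s_a + s_b*j as rows [1, i, j, k] of a real (4, n) array."""
    return np.vstack([s_a.real, s_a.imag, s_b.real, s_b.imag])


rng = np.random.default_rng(3)

# Scenario 1: both symbols of the pair are equal, s_1a = s_1b.
s_1a = draw_symbols("QPSK", 1000, rng)
s_1 = quaternion_source(s_1a, s_1a)

# Scenario 2: the pair of the first user differs, here BPSK and QPSK.
s_1_mixed = quaternion_source(draw_symbols("BPSK", 1000, rng),
                              draw_symbols("QPSK", 1000, rng))
print(s_1.shape, s_1_mixed.shape)
```

In the first case the two halves of the quaternion source are fully dependent, which is the situation the text identifies as problematic for c-FastICA.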

V. CONCLUSION

An ICA algorithm suitable for the blind separation of both Q-proper and Q-improper sources has been introduced. The well-known negentropy-based cost function has been utilized to estimate independent quaternion-valued sources, while an augmented Newton method implementation has allowed the extension of the FastICA methodology to the quaternion domain. The performance of the proposed q-FastICA algorithm in both deflationary and simultaneous separation, using benchmark quaternion polytope signals, has been illustrated. The algorithm has also been shown to be effective in the removal of ocular artifacts from the EEG and in Alamouti-based communication systems.

APPENDIX I

SOME BACKGROUND RESULTS FROM HR CALCULUS

Some results used in the derivation of the q-FastICA algorithm (25) are discussed below.

A. Chain Rule in the HR Calculus

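For a composite quaternion function $F \circ G = F(G(q)) : \mathbb{H} \mapsto \mathbb{H}$, the chain rule is expressed as

$$\frac{\partial F}{\partial \xi} = \frac{\partial F}{\partial G}\frac{\partial G}{\partial \xi} + \frac{\partial F}{\partial G^{\imath}}\frac{\partial G^{\imath}}{\partial \xi} + \frac{\partial F}{\partial G^{\jmath}}\frac{\partial G^{\jmath}}{\partial \xi} + \frac{\partial F}{\partial G^{\kappa}}\frac{\partial G^{\kappa}}{\partial \xi} \tag{33}$$

where $\xi \in \{q, q^{\imath}, q^{\jmath}, q^{\kappa}\}$. To show this, the total differential of $F(\tilde{q})$ can be written as [20]

$$dF = \frac{\partial F}{\partial \tilde{q}}\,d\tilde{q} + \frac{\partial F}{\partial \tilde{q}^{\imath}}\,d\tilde{q}^{\imath} + \frac{\partial F}{\partial \tilde{q}^{\jmath}}\,d\tilde{q}^{\jmath} + \frac{\partial F}{\partial \tilde{q}^{\kappa}}\,d\tilde{q}^{\kappa} \tag{34}$$

where the dummy variable $\tilde{q} \triangleq G(q)$. Likewise, the total differential of $G(q)$ is given by

$$dG = \frac{\partial G}{\partial q}\,dq + \frac{\partial G}{\partial q^{\imath}}\,dq^{\imath} + \frac{\partial G}{\partial q^{\jmath}}\,dq^{\jmath} + \frac{\partial G}{\partial q^{\kappa}}\,dq^{\kappa}. \tag{35}$$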

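By substituting (35) into (34), and after rearranging the expressions, we obtain the total differential of $F$ with respect to $q$ as

$$\begin{aligned}
dF ={}& \left(\frac{\partial F}{\partial G}\frac{\partial G}{\partial q} + \frac{\partial F}{\partial G^{\imath}}\frac{\partial G^{\imath}}{\partial q} + \frac{\partial F}{\partial G^{\jmath}}\frac{\partial G^{\jmath}}{\partial q} + \frac{\partial F}{\partial G^{\kappa}}\frac{\partial G^{\kappa}}{\partial q}\right) dq \\
&+ \left(\frac{\partial F}{\partial G}\frac{\partial G}{\partial q^{\imath}} + \frac{\partial F}{\partial G^{\imath}}\frac{\partial G^{\imath}}{\partial q^{\imath}} + \frac{\partial F}{\partial G^{\jmath}}\frac{\partial G^{\jmath}}{\partial q^{\imath}} + \frac{\partial F}{\partial G^{\kappa}}\frac{\partial G^{\kappa}}{\partial q^{\imath}}\right) dq^{\imath} \\
&+ \left(\frac{\partial F}{\partial G}\frac{\partial G}{\partial q^{\jmath}} + \frac{\partial F}{\partial G^{\imath}}\frac{\partial G^{\imath}}{\partial q^{\jmath}} + \frac{\partial F}{\partial G^{\jmath}}\frac{\partial G^{\jmath}}{\partial q^{\jmath}} + \frac{\partial F}{\partial G^{\kappa}}\frac{\partial G^{\kappa}}{\partial q^{\jmath}}\right) dq^{\jmath} \\
&+ \left(\frac{\partial F}{\partial G}\frac{\partial G}{\partial q^{\kappa}} + \frac{\partial F}{\partial G^{\imath}}\frac{\partial G^{\imath}}{\partial q^{\kappa}} + \frac{\partial F}{\partial G^{\jmath}}\frac{\partial G^{\jmath}}{\partial q^{\kappa}} + \frac{\partial F}{\partial G^{\kappa}}\frac{\partial G^{\kappa}}{\partial q^{\kappa}}\right) dq^{\kappa}
\end{aligned}$$

where the derivatives $\partial F/\partial \xi$ are given by the terms within the brackets. The corresponding chain rule for the HR* derivatives can be obtained similarly, and the result of (33) can be extended to vector-valued functions to form a generalized chain rule for the derivatives.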

B. Augmented Quaternion Newton Method

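The isomorphism between $\mathbb{R}^4$ and $\mathbb{H}$ allows for consideration of the duality between the derivatives in the two domains. This methodology was previously considered in [43] and resulted in the derivation of the augmented complex Newton method. The extension of this approach to the quaternion domain, based on the involution bases, was detailed in [20]; a short summary is presented below. For a function $f(\mathbf{q}) : \mathbb{H}^N \mapsto \mathbb{R}$, the augmented gradient is $\nabla_{\mathbf{q}^{a*}} f = \partial f/\partial \mathbf{q}^{a*}$ and the augmented Hessian is $\mathbf{H}^a_{\mathbf{qq}} = (\partial/\partial \mathbf{q}^{a*})(\partial f/\partial \mathbf{q}^{a*})^T$, where the augmented vector is $\mathbf{q}^a = [\mathbf{q}^T, \mathbf{q}^{\imath T}, \mathbf{q}^{\jmath T}, \mathbf{q}^{\kappa T}]^T$. The augmented Newton update can then be written as

$$\Delta\mathbf{q}^a = -\left(\mathbf{H}^a_{\mathbf{qq}}\right)^{-1} \nabla_{\mathbf{q}^{a*}} f \tag{36}$$

where $\Delta\mathbf{q}^a = \mathbf{q}^a(k+1) - \mathbf{q}^a(k)$ denotes the change in $\mathbf{q}^a$ between consecutive updates.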
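The duality between $\mathbb{R}^4$ and the involution basis that underlies the augmented formulation can be illustrated numerically: the augmented vector carries exactly the same information as the four real component vectors. The sketch below assumes the usual sign patterns of the three involutions on the components $\{1, \imath, \jmath, \kappa\}$; the array layout and the names `SIGNS`, `augment`, and `real_components` are hypothetical.

```python
import numpy as np

# Sign patterns of the identity and the three involutions on [1, i, j, k]
# (assumed definition: the involution about eta keeps the real part and the
# eta component, and negates the other two imaginary components).
SIGNS = {
    "id": np.array([1.0, 1.0, 1.0, 1.0]),
    "i": np.array([1.0, 1.0, -1.0, -1.0]),
    "j": np.array([1.0, -1.0, 1.0, -1.0]),
    "k": np.array([1.0, -1.0, -1.0, 1.0]),
}


def augment(q):
    """Stack q and its three involutions; q has shape (N, 4)."""
    return np.concatenate([q * SIGNS[name] for name in ("id", "i", "j", "k")])


def real_components(q_aug):
    """Recover the real components from the augmented vector of shape (4N, 4)."""
    q, qi, qj, qk = np.split(q_aug, 4)
    real = (q + qi + qj + qk)[:, 0] / 4
    imag_i = (q + qi - qj - qk)[:, 1] / 4
    imag_j = (q - qi + qj - qk)[:, 2] / 4
    imag_k = (q - qi - qj + qk)[:, 3] / 4
    return np.stack([real, imag_i, imag_j, imag_k], axis=1)


rng = np.random.default_rng(4)
q = rng.standard_normal((3, 4))        # a vector of N = 3 quaternions
q_aug = augment(q)

print(q_aug.shape)
print(np.allclose(real_components(q_aug), q))
```

Because the mapping between the real components and the augmented vector is invertible, an update computed in the augmented quaternion domain has a counterpart in the real domain, which is the duality the text refers to.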

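Finally, observe that the elements of the augmented Hessian matrix

$$\mathbf{H}^a_{\mathbf{qq}} = \begin{bmatrix}
\mathbf{H}_{q^{*}q^{*}} & \mathbf{H}_{q^{\imath*}q^{*}} & \mathbf{H}_{q^{\jmath*}q^{*}} & \mathbf{H}_{q^{\kappa*}q^{*}} \\
\mathbf{H}_{q^{*}q^{\imath*}} & \mathbf{H}_{q^{\imath*}q^{\imath*}} & \mathbf{H}_{q^{\jmath*}q^{\imath*}} & \mathbf{H}_{q^{\kappa*}q^{\imath*}} \\
\mathbf{H}_{q^{*}q^{\jmath*}} & \mathbf{H}_{q^{\imath*}q^{\jmath*}} & \mathbf{H}_{q^{\jmath*}q^{\jmath*}} & \mathbf{H}_{q^{\kappa*}q^{\jmath*}} \\
\mathbf{H}_{q^{*}q^{\kappa*}} & \mathbf{H}_{q^{\imath*}q^{\kappa*}} & \mathbf{H}_{q^{\jmath*}q^{\kappa*}} & \mathbf{H}_{q^{\kappa*}q^{\kappa*}}
\end{bmatrix} \tag{37}$$

can be written in terms of its first row by utilizing the involution property in (16) and noting that $((\cdot)^{\alpha})^{\beta} = (\cdot)^{\gamma}$, $\alpha \neq \beta \neq \gamma \in \{\imath, \jmath, \kappa\}$.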
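The composition rule $((\cdot)^{\alpha})^{\beta} = (\cdot)^{\gamma}$ can be verified numerically. The sketch below assumes the standard definition of a quaternion involution, $q^{\eta} = -\eta q \eta$ for $\eta \in \{\imath, \jmath, \kappa\}$ (see [26]), and represents a quaternion by its real components on $\{1, \imath, \jmath, \kappa\}$; the names `quat_mul`, `UNITS`, and `involution` are hypothetical.

```python
import numpy as np


def quat_mul(p, q):
    """Hamilton product of two quaternions stored as [1, i, j, k]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])


UNITS = {
    "i": np.array([0.0, 1.0, 0.0, 0.0]),
    "j": np.array([0.0, 0.0, 1.0, 0.0]),
    "k": np.array([0.0, 0.0, 0.0, 1.0]),
}


def involution(q, axis):
    """Return -eta * q * eta for the imaginary unit eta named by `axis`."""
    eta = UNITS[axis]
    return -quat_mul(quat_mul(eta, q), eta)


q = np.array([0.5, -1.0, 2.0, 3.0])

# Applying two different involutions in sequence gives the third one.
print(np.allclose(involution(involution(q, "i"), "j"), involution(q, "k")))
print(np.allclose(involution(involution(q, "j"), "k"), involution(q, "i")))
print(np.allclose(involution(involution(q, "k"), "i"), involution(q, "j")))
```

This closure under composition is what allows the remaining rows of (37) to be generated from the first row by applying involutions.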

APPENDIX II

DERIVATION OF THE AUGMENTED Q-FASTICA UPDATE ALGORITHM

A. First and Second Derivatives of the Cost Function J (w)

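First, by using the product rule, the derivatives of the involutions of $|y|^2 = yy^* = |\mathbf{w}^H\mathbf{x}|^2$ with respect to the conjugate demixing vector $\mathbf{w}^*$ are calculated as

$$\begin{aligned}
\frac{\partial\, yy^*}{\partial \mathbf{w}^*} &= \frac{\partial y}{\partial \mathbf{w}^*}\,y^* + y\,\frac{\partial y^*}{\partial \mathbf{w}^*} = \mathbf{x}y^* - \frac{1}{2}\,y\mathbf{x}^* \\
\frac{\partial (yy^*)^{\imath}}{\partial \mathbf{w}^*} &= \frac{\partial y^{\imath}}{\partial \mathbf{w}^*}\,y^{\imath*} + y^{\imath}\,\frac{\partial y^{\imath*}}{\partial \mathbf{w}^*} = \frac{1}{2}\,y^{\imath}\mathbf{x}^{\imath*} \\
\frac{\partial (yy^*)^{\jmath}}{\partial \mathbf{w}^*} &= \frac{\partial y^{\jmath}}{\partial \mathbf{w}^*}\,y^{\jmath*} + y^{\jmath}\,\frac{\partial y^{\jmath*}}{\partial \mathbf{w}^*} = \frac{1}{2}\,y^{\jmath}\mathbf{x}^{\jmath*} \\
\frac{\partial (yy^*)^{\kappa}}{\partial \mathbf{w}^*} &= \frac{\partial y^{\kappa}}{\partial \mathbf{w}^*}\,y^{\kappa*} + y^{\kappa}\,\frac{\partial y^{\kappa*}}{\partial \mathbf{w}^*} = \frac{1}{2}\,y^{\kappa}\mathbf{x}^{\kappa*}.
\end{aligned}$$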

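Then, by using the chain rule (33) and after simplification, we obtain the gradients of the cost function as

$$\begin{aligned}
\nabla_{\mathbf{w}^{*}}\mathcal{J} &= E\{2g(|y|^2)\,\mathbf{x}y^{*}\} \\
\nabla_{\mathbf{w}^{\imath*}}\mathcal{J} &= E\{2g(|y|^2)\,\mathbf{x}y^{\imath*}\} \\
\nabla_{\mathbf{w}^{\jmath*}}\mathcal{J} &= E\{2g(|y|^2)\,\mathbf{x}y^{\jmath*}\} \\
\nabla_{\mathbf{w}^{\kappa*}}\mathcal{J} &= E\{2g(|y|^2)\,\mathbf{x}y^{\kappa*}\}
\end{aligned} \tag{38}$$

where $g$ is the first derivative of $G$. This result can also be interpreted based on the involution property (16). After simplifications and considering the whiteness of $\mathbf{x}$, the second derivatives of the cost function $\mathcal{J}$ can then be calculated as

$$\begin{aligned}
\frac{\partial}{\partial \mathbf{w}^{*}}\left(\frac{\partial \mathcal{J}}{\partial \mathbf{w}^{*}}\right)^{T} &= E\{4g'(|y|^2)\,\mathbf{x}y^{*}\mathbf{x}^{T}y^{*} - g(|y|^2)\mathbf{I}\} \\
\frac{\partial}{\partial \mathbf{w}^{\imath*}}\left(\frac{\partial \mathcal{J}}{\partial \mathbf{w}^{*}}\right)^{T} &= E\{2g'(|y|^2)\,(\mathbf{x}y^{*})^{\imath}(\mathbf{x}^{T}y^{*}) + g(|y|^2)\mathbf{I}\} \\
\frac{\partial}{\partial \mathbf{w}^{\jmath*}}\left(\frac{\partial \mathcal{J}}{\partial \mathbf{w}^{*}}\right)^{T} &= E\{2g'(|y|^2)\,(\mathbf{x}y^{*})^{\jmath}(\mathbf{x}^{T}y^{*}) + g(|y|^2)\mathbf{I}\} \\
\frac{\partial}{\partial \mathbf{w}^{\kappa*}}\left(\frac{\partial \mathcal{J}}{\partial \mathbf{w}^{*}}\right)^{T} &= E\{2g'(|y|^2)\,(\mathbf{x}y^{*})^{\kappa}(\mathbf{x}^{T}y^{*}) + g(|y|^2)\mathbf{I}\}
\end{aligned} \tag{39}$$

where $g'$ is the second derivative of $G$ and the calculations of the remaining derivatives follow from property (16). Notice that the non-commutativity of the quaternion product prohibits further simplification of the derivatives in (39).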
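A sample-average estimate of the first gradient in (38), $E\{2g(|y|^2)\mathbf{x}y^*\}$ with $y = \mathbf{w}^H\mathbf{x}$, can be sketched as below. The array layout (quaternions as four real components on the last axis), the helper names `qmul`, `qconj`, and `gradient_estimate`, and the constant nonlinearity used in the demonstration are all assumptions of this example; the constant $g$ is chosen only because it makes the expected result easy to check and is not the nonlinearity of the algorithm.

```python
import numpy as np


def qmul(p, q):
    """Hamilton product along the last axis (components [1, i, j, k])."""
    a1, b1, c1, d1 = np.moveaxis(p, -1, 0)
    a2, b2, c2, d2 = np.moveaxis(q, -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ], axis=-1)


def qconj(q):
    """Quaternion conjugate along the last axis."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])


def gradient_estimate(w, x, g):
    """Sample estimate of E{2 g(|y|^2) x y*} with y = w^H x.

    w: (N, 4) demixing vector, x: (T, N, 4) whitened observations,
    g: callable applied elementwise to |y|^2 (hypothetical nonlinearity).
    """
    y = qmul(qconj(w)[None, :, :], x).sum(axis=1)        # (T, 4)
    weight = 2.0 * g(np.sum(y ** 2, axis=-1))            # (T,)
    xy = qmul(x, qconj(y)[:, None, :])                   # (T, N, 4)
    return np.mean(weight[:, None, None] * xy, axis=0)   # (N, 4)


rng = np.random.default_rng(5)
n_samples, n_channels = 20000, 3
x = rng.standard_normal((n_samples, n_channels, 4)) / 2.0   # E{|x_n|^2} = 1
w = rng.standard_normal((n_channels, 4))
w /= np.linalg.norm(w)

# With g = 1/2 the estimate reduces to E{x y*}, which equals w for white x.
grad = gradient_estimate(w, x, lambda u: 0.5 * np.ones_like(u))
print(np.round(grad - w, 3))
```

For white observations and this constant weighting the estimate should approach $\mathbf{w}$ as the number of samples grows, which gives a simple sanity check on the quaternion arithmetic before a genuine nonlinearity is substituted.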

B. Augmented Newton Update

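The Lagrangian function $\mathcal{L}$ for the optimization problem in (24) is given by

$$\mathcal{L}(\mathbf{w}, \lambda) = \mathcal{J}(\mathbf{w}) + \underbrace{\lambda(\mathbf{w}^{H}\mathbf{w} - 1)}_{\triangleq\, c} \tag{40}$$

where $\lambda \in \mathbb{R}$ is the Lagrange parameter. We can use the Newton method (36) to find the extrema of (40), where

$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial \mathbf{w}^{a*}} &= \frac{\partial \mathcal{J}}{\partial \mathbf{w}^{a*}} + \frac{\partial c}{\partial \mathbf{w}^{a*}} \\
\frac{\partial}{\partial \mathbf{w}^{a*}}\left(\frac{\partial \mathcal{L}}{\partial \mathbf{w}^{a*}}\right)^{T} &= \mathbf{H}^{a}_{\mathbf{ww}} + \frac{\partial}{\partial \mathbf{w}^{a*}}\left(\frac{\partial c}{\partial \mathbf{w}^{a*}}\right)^{T}.
\end{aligned} \tag{41}$$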

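The augmented gradient and Hessian of $\mathcal{J}$ are then obtained using (38) and (39). The gradients of $c$ are given by

$$\frac{\partial c}{\partial \mathbf{w}^{*}} = \lambda\left(\mathbf{w} - \frac{1}{2}\mathbf{w}^{*}\right), \qquad
\frac{\partial c}{\partial \mathbf{w}^{\imath*}} = \frac{\lambda}{2}\mathbf{w}^{*}, \qquad
\frac{\partial c}{\partial \mathbf{w}^{\jmath*}} = \frac{\lambda}{2}\mathbf{w}^{*}, \qquad
\frac{\partial c}{\partial \mathbf{w}^{\kappa*}} = \frac{\lambda}{2}\mathbf{w}^{*}$$

and the Hessian can be calculated from

$$\begin{aligned}
\frac{\partial}{\partial \mathbf{w}^{*}}\left(\frac{\partial c}{\partial \mathbf{w}^{*}}\right)^{T} &= -\lambda\mathbf{I}, &
\frac{\partial}{\partial \mathbf{w}^{\imath*}}\left(\frac{\partial c}{\partial \mathbf{w}^{*}}\right)^{T} &= -\frac{\lambda}{2}\mathbf{I}, \\
\frac{\partial}{\partial \mathbf{w}^{\jmath*}}\left(\frac{\partial c}{\partial \mathbf{w}^{*}}\right)^{T} &= -\frac{\lambda}{2}\mathbf{I}, &
\frac{\partial}{\partial \mathbf{w}^{\kappa*}}\left(\frac{\partial c}{\partial \mathbf{w}^{*}}\right)^{T} &= -\frac{\lambda}{2}\mathbf{I}.
\end{aligned}$$

The Newton update is obtained by substituting these results in (36). Finally, the Lagrange parameter $\lambda$ is updated using a gradient ascent method, where at each iteration the demixing vector $\mathbf{w}$ is first updated via the augmented Newton method, followed by the update of $\lambda$ using the current value of $\mathbf{w}$ and a normalization of the demixing vector [44], as in (25).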

ACKNOWLEDGMENT

The authors would like to thank N. Le Bihan for providing the script to generate the class of polytope signal codes and also C. Jahanchahi for discussions on the HR calculus.

REFERENCES

[1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.

[2] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A blind source separation technique using second-order statistics,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, Feb. 1997.

[3] J.-F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” IEE Proc. F Radar Signal Process., vol. 140, no. 6, pp. 362–370, Dec. 1993.

[4] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Comput., vol. 7, no. 6, pp. 1129–1159, Nov. 1995.

[5] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 626–634, May 1999.

[6] E. Oja and Z. Yuan, “The FastICA algorithm revisited: Convergence analysis,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1370–1381, Nov. 2006.

[7] E. Bingham and A. Hyvärinen, “A fast fixed-point algorithm for independent component analysis of complex valued signals,” J. Neural Syst., vol. 10, no. 1, pp. 1–8, Feb. 2000.

[8] S. C. Douglas, “Fixed-point algorithms for the blind separation of arbitrary complex-valued non-Gaussian signal mixtures,” EURASIP J. Appl. Signal Process., vol. 2007, no. 1, pp. 83–98, Jan. 2007.

[9] J. Eriksson and V. Koivunen, “Complex random vectors and ICA models: Identifiability, uniqueness, and separability,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1017–1029, Mar. 2006.

[10] M. Novey and T. Adali, “On extending the complex FastICA algorithm to noncircular sources,” IEEE Trans. Signal Process., vol. 56, no. 5, pp. 2148–2154, May 2008.

[11] D. P. Mandic and S. L. Goh, Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models. New York: Wiley, 2009.

[12] P. J. Schreier and L. L. Scharf, Statistical Signal Processing of Complex-Valued Data. Cambridge, U.K.: Cambridge Univ. Press, 2010.

[13] K. Kreutz-Delgado, “The complex gradient operator and the CR-calculus,” Dept. Electr. Comput. Eng., Univ. California, San Diego, Tech. Rep. ECE275A, 2006, pp. 1–74.

[14] J. P. Ward, Quaternions and Cayley Numbers. Norwell, MA: Kluwer, 1997.

[15] S. Sangwine and N. Le Bihan, “Quaternion polar representation with a complex modulus and complex argument inspired by the Cayley-Dickson form,” Adv. Appl. Clifford Algebras, vol. 20, no. 1, pp. 111–120, Mar. 2010.

[16] N. N. Vakhania, “Random vectors with values in quaternion Hilbert spaces,” Theory Probab. Appl., vol. 43, no. 1, pp. 99–115, Jan. 1999.

[17] N. Le Bihan and P. O. Amblard, “Detection and estimation of Gaussian proper quaternion valued random processes,” in Proc. 7th IMA Conf. Math. Signal Process., Cirencester, U.K., 2006, pp. 1–3.

[18] C. C. Took and D. P. Mandic, “Augmented second-order statistics of quaternion random signals,” Signal Process., vol. 91, no. 2, pp. 214–224, Feb. 2011.

[19] C. C. Took, D. Mandic, and F. Zhang, “On the unitary diagonalisation of a special class of quaternion matrices,” Appl. Math. Lett., vol. 24, no. 11, pp. 1806–1809, Nov. 2011.

[20] D. P. Mandic, C. Jahanchahi, and C. C. Took, “A quaternion gradient operator and its applications,” IEEE Signal Process. Lett., vol. 18, no. 1, pp. 47–50, Jan. 2011.

[21] J. Vía, D. Ramírez, and I. Santamaría, “Properness and widely linear processing of quaternion random vectors,” IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3502–3515, Jul. 2010.

[22] C. C. Took and D. P. Mandic, “A quaternion widely linear adaptive filter,” IEEE Trans. Signal Process., vol. 58, no. 8, pp. 4427–4431, Aug. 2010.

[23] B. C. Ujang, C. C. Took, and D. P. Mandic, “Split quaternion nonlinear adaptive filtering,” Neural Netw., vol. 23, no. 3, pp. 426–434, Apr. 2010.

[24] B. C. Ujang, C. C. Took, and D. P. Mandic, “Quaternion-valued nonlinear adaptive filtering,” IEEE Trans. Neural Netw., vol. 22, no. 8, pp. 1193–1206, Aug. 2011.

[25] N. Le Bihan and S. Buchholz, “Quaternionic independent component analysis using hypercomplex nonlinearities,” in Proc. 7th IMA Conf. Math. Signal Process., 2006, pp. 1–3.

[26] T. A. Ell and S. J. Sangwine, “Quaternion involutions and anti-involutions,” Comput. & Math. Appl., vol. 53, no. 1, pp. 137–143, Jan. 2007.

[27] B. Picinbono, “Second-order complex random vectors and normal distributions,” IEEE Trans. Signal Process., vol. 44, no. 10, pp. 2637–2640, Oct. 1996.

[28] P. J. Schreier and L. L. Scharf, “Second-order analysis of improper complex random vectors and processes,” IEEE Trans. Signal Process., vol. 51, no. 3, pp. 714–725, Mar. 2003.

[29] W. Wirtinger, “Zur formalen theorie der funktionen von mehr komplexen veränderlichen,” Math. Annalen, vol. 97, no. 1, pp. 357–375, Dec. 1927.

[30] D. H. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proc. F: Commun., Radar Signal Process., vol. 130, no. 1, pp. 11–16, Feb. 1983.

[31] F. Zhang, “Quaternions and matrices of quaternions,” Linear Algebra Appl., vol. 251, pp. 21–57, Jan. 1997.

[32] A. Sudbery, “Quaternionic analysis,” Math. Proc. Cambridge Philos. Soc., vol. 85, no. 2, pp. 199–225, 1979.

[33] S. De Leo and P. P. Rotelli, “Quaternionic analyticity,” Appl. Math. Lett., vol. 16, no. 7, pp. 1077–1081, Oct. 2003.

[34] A. T. Erdogan, “On the convergence of ICA algorithms with symmetric orthogonalization,” IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2209–2221, Jun. 2009.

[35] L. H. Zetterberg and H. Brändström, “Codes for combined phase and amplitude modulated signals in a 4-D space,” IEEE Trans. Commun., vol. 25, no. 9, pp. 943–950, Sep. 1977.

[36] M. K. I. Molla, T. Tanaka, T. M. Rutkowski, and A. Cichocki, “Separation of EOG artifacts from EEG signals using bivariate EMD,” in Proc. Int. Conf. Acoust. Speech Signal Process., Dallas, TX, Mar. 2010, pp. 562–565.

[37] A. Greco, N. Mammone, F. C. Morabito, and M. Versaci, “Semi-automatic artifact rejection procedure based on kurtosis, Renyi’s entropy and independent component scalp maps,” in Proc. Int. Enform. Conf., 2005, pp. 22–26.

[38] P. S. Kumar, R. Arumuganathan, K. Sivakumar, and C. Vimal, “An adaptive method to remove ocular artifacts from EEG signals using wavelet transform,” J. Appl. Sci. Res., vol. 5, no. 7, pp. 711–745, 2009.

[39] R. N. Vigário, “Extraction of ocular artefacts from EEG using independent component analysis,” Electroencephal. Clinical Neurophysiol., vol. 103, no. 3, pp. 395–404, Sep. 1997.

[40] A. Delorme, S. Makeig, and T. Sejnowski, “Automatic artifact rejection for EEG data using high-order statistics and independent component analysis,” in Proc. Int. Workshop ICA, 2001, pp. 457–462.

[41] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1451–1458, Oct. 1998.

[42] J. Vía, D. P. Palomar, L. Vielva, and I. Santamaría, “Quaternion ICA from second-order statistics,” IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1586–1600, Apr. 2011.

[43] A. Bos, “Complex gradient and Hessian,” IEE Proc. Vis., Image Signal Process., vol. 141, no. 6, pp. 380–383, Dec. 1994.

[44] S. Zhang and A. G. Constantinides, “Lagrange programming neural networks,” IEEE Trans. Circuits Syst. II: Analog Digital Signal Process., vol. 39, no. 7, pp. 441–452, Jul. 1992.

Soroush Javidi (S’09) received the M.Eng. and Ph.D. degrees in electrical and electronic engineering from Imperial College London, London, U.K., in 2006 and 2010, respectively.

His research has focused on the use of augmented complex statistics and widely linear models in practical applications, such as electroencephalograms and wind modeling. His current research interests include statistical signal processing and complex-valued blind source separation.

Clive Cheong Took (S’06–M’07) received the B.S. degree in telecommunication engineering from Kings College, London University, London, U.K., where he was the top departmental graduate in 2004, and the Ph.D. degree in blind signal processing from Cardiff University, Wales, U.K., in 2007.

He is currently with Imperial College London, London. His current research interests include adaptive, blind, and multidimensional signal processing with applications in biomedicine and renewable energy.

Danilo P. Mandic (M’99–SM’03) received the Ph.D. degree in electronic engineering from Imperial College London, London, U.K., in 1999.

He is a Professor of signal processing with Imperial College London and is working in the area of adaptive signal processing and multiscale modeling. He has been a Guest Professor with Katholieke Universiteit Leuven, Leuven, Belgium, and a Frontier Researcher at RIKEN, Tokyo, Japan. His publication record includes two research monographs entitled Recurrent Neural Networks for Prediction (1st ed., Aug. 2001) and Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models (1st ed., Wiley, Apr. 2009), an edited book entitled Signal Processing for Information Fusion (Springer, 2008), and more than 200 publications on signal processing.

Prof. Mandic is a member of the IEEE Technical Committee on Theory and Methods for Signal Processing and an Associate Editor for the IEEE TRANSACTIONS ON NEURAL NETWORKS, and has been on the editorial boards of the IEEE TRANSACTIONS ON SIGNAL PROCESSING and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II. He has produced award winning papers and products resulting from his collaboration with industry.
