A Convex Analysis Framework for Blind Separation of Non-Negative Sources
Tsung-Han Chan†, Wing-Kin Ma∗, Chong-Yung Chi‡, and Yue Wang§
Abstract
This paper presents a new framework for blind source separation (BSS) of non-negative source signals. The proposed framework, referred to herein as convex analysis of mixtures of non-negative sources (CAMNS), is deterministic, requiring no source independence assumption, the entrenched premise in many existing (usually statistical) BSS frameworks. The development is based on a special assumption called local dominance. It is a good assumption for source signals exhibiting sparsity or high contrast, and is therefore considered realistic for many real-world problems such as multichannel biomedical imaging. Under local dominance and several standard assumptions, we apply convex analysis to establish a new BSS criterion, which states that the source signals can be perfectly identified (in a blind fashion) by finding the extreme points of a polyhedral set constructed from the observations. Algorithms for fulfilling the CAMNS criterion are also derived, using either linear programming or simplex geometry. Simulation results on several data sets are presented to demonstrate the efficacy of the proposed method over several other reported BSS methods.
Index Terms
Blind separation, Non-negative sources, Convex analysis criterion, Linear program, Simplex geometry, Convex optimization
EDICS: SSP-SSEP (Signal separation), SAM-APPL (Applications of sensor & array multichannel processing), BIO-APPL (Applications of biomedical signal processing)
This work was supported in part by the National Science Council (R.O.C.) under Grants NSC 95-2221-E-007-122 and NSC 95-2221-E-007-036, and by the US National Institutes of Health under Grants EB000830 and CA109872. This work was partly presented at the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, Hawaii, USA, April 15-20, 2007.
†Tsung-Han Chan is with the Institute of Communications Engineering & Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan, R.O.C. E-mail: [email protected], Tel: +886-3-5715131X34033, Fax: +886-3-5751787.
∗Wing-Kin Ma is the corresponding author. Address: Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. E-mail: [email protected], Tel: +852-31634350, Fax: +852-26035558.
‡Chong-Yung Chi is with the Institute of Communications Engineering & Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan, R.O.C. E-mail: [email protected], Tel: +886-3-5731156, Fax: +886-3-5751787.
§Yue Wang is with the Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA. E-mail: [email protected], Tel: 703-387-6056, Fax: 703-528-5543.
December 27, 2007 DRAFT
I. INTRODUCTION
Recently there has been much interest in blind separation of non-negative source signals [1], [2], referred to herein as non-negative blind source separation (nBSS). There are many applications where the sources to be separated are non-negative by nature; for example, in analytical chemistry [3], [4], hyperspectral imaging [5], and biomedical imaging [6]. How to cleverly exploit the non-negative signal characteristic in nBSS has been an intriguing subject, leading to numerous nBSS alternatives being proposed [7]–[11].
One major class of nBSS methods utilizes the statistical property that the sources are mutually uncorrelated or independent, supposing that the sources do satisfy that property. Methods falling in this class include second-order blind identification (SOBI) [12], the fast fixed-point algorithm for independent component analysis (ICA) [13], non-negative ICA (nICA) [7], stochastic non-negative ICA (SNICA) [8], and Bayesian positive source separation (BPSS) [9], to name a few. SOBI and fast ICA were originally developed for more general blind source separation (BSS) problems where signals can be negative. The two methods can be directly applied to nBSS, but it has been reported that their separated signals may have negative values, especially in the presence of finite sample effects [4], [14]. nICA takes source non-negativity into account, and is shown to provide perfect separation when the sources have non-vanishing density around zero (which is also called the well-grounded condition). SNICA uses a simulated annealing algorithm for extracting non-negative sources under the minimum mutual information criterion. BPSS also uses source non-negativity. It applies Bayesian estimation with both the sources and the mixing matrix being assigned Gamma distribution priors.
Another class of nBSS methods is deterministic, requiring no assumption of source independence or zero correlation. Roughly speaking, these methods explicitly exploit source non-negativity or even mixing matrix non-negativity, with an attempt to achieve some kind of least-squares criterion. Alternating least squares (ALS) [10], [15] deals with a sequence of least-squares problems where non-negativity constraints on either the sources or the mixing matrix are imposed. Non-negative matrix factorization (NMF) [11] decomposes the observation matrix into a product of two non-negative matrices, one serving as the estimate of the sources and the other as the mixing matrix. NMF is not a unique decomposition, which may result in indeterminacy of the sources and mixing matrix. To overcome this problem, a sparseness constraint on the sources has been proposed [16].
In this paper we propose a new nBSS framework, known as convex analysis of mixtures of non-negative sources (CAMNS). Convex analysis and optimization techniques have drawn considerable attention in signal processing, serving as powerful tools for various topics such as communications [17]–[22], array signal processing [23], and sensor networks [24]. Apart from using source non-negativity, CAMNS adopts a special deterministic assumption called local dominance. This assumption was initially proposed to capture the sparse characteristics of biomedical images [25], [26], but we found it a good assumption or approximation for high-contrast images such as human portraits as well. Under the local dominance assumption and some standard nBSS assumptions, we show using convex analysis that the true source signals serve as the extreme points of a polyhedral set constructed from the observations. This geometrical discovery is not only surprising but important, since it provides a novel nBSS criterion that guarantees perfect blind separation. To practically realize CAMNS, we derive extreme-point finding algorithms using either linear programming or simplex geometry. As we will see by simulations, the blind separation performance of the CAMNS algorithms is promising even in the presence of strongly correlated sources.
We should stress that the proposed CAMNS framework is deterministic, but it is conceptually different from the other existing deterministic frameworks such as NMF. In particular, the idea of using convex analysis to establish an nBSS criterion cannot be found in the other frameworks. Moreover, CAMNS does not require mixing matrix non-negativity, whereas NMF does. The closest match to CAMNS would be non-negative least-correlated component analysis (nLCA) [25], [26], a concurrent development that also exploits the local dominance assumption. Nevertheless, convex analysis is not involved in nLCA.
CAMNS works by exploiting signal geometry, using convex analysis and optimization. It is interesting to note that in BSS of magnitude-bounded sources (BSS-MBS) [27]–[29], signal geometry is also utilized. For example, in [27] a key step is to use a neural learning algorithm to determine the vertices of an observation space generated by magnitude-bounded source signals. But we should also emphasize that the underlying assumptions and techniques employed in CAMNS are substantially different from those in BSS-MBS.
The paper is organized as follows. In Section II, the problem statement is given. In Section III, we review some key concepts of convex analysis that will be useful for understanding the mathematical derivations that follow. The new BSS criterion for separating non-negative sources is developed in Section IV. We show how to use linear programs (LPs) to practically achieve the new criterion in Section V. Section VI studies a geometric alternative to the LP method. Finally, in Section VII, we use simulations to evaluate the performance of the proposed methods as well as some other existing nBSS methods.
II. PROBLEM STATEMENT AND ASSUMPTIONS
For ease of later use, let us define the following notations:

R, R^N, R^{M×N}          Set of real numbers, N-vectors, M × N matrices
R_+, R^N_+, R^{M×N}_+    Set of non-negative real numbers, N-vectors, M × N matrices
1                        All-one vector
I_N                      N × N identity matrix
e_i                      Unit vector with the ith entry equal to 1
⪰                        Componentwise inequality
‖·‖                      Euclidean norm
N(µ, Σ)                  Gaussian distribution with mean µ and covariance Σ
The scenario under consideration is that of linear instantaneous mixtures. The signal model is given by

x[n] = As[n],  n = 1, . . . , L,  (1)

where s[n] = [s_1[n], . . . , s_N[n]]^T is the input or source vector sequence with N denoting the input dimension, x[n] = [x_1[n], . . . , x_M[n]]^T is the output or observation vector sequence with M denoting the output dimension, A ∈ R^{M×N} is the mixing matrix describing the input-output relation, and L is the sequence (or data) length, where we assume L ≫ max{M, N}. Note that (1) can alternatively be expressed as

x_i = Σ_{j=1}^{N} a_ij s_j,  i = 1, . . . , M,  (2)

where a_ij is the (i, j)th element of A, s_j = [s_j[1], . . . , s_j[L]]^T is a vector representing the jth source signal, and x_i = [x_i[1], . . . , x_i[L]]^T is a vector representing the ith observed signal.
In BSS, the problem is to extract s[n] without information of A. The BSS framework to be proposed is based on the following assumptions:

(A1) All s_j are componentwise non-negative; i.e., for each j, s_j ∈ R^L_+.

(A2) Each source signal vector is locally dominant, in the following sense: For each i ∈ {1, . . . , N}, there exists an (unknown) index ℓ_i such that s_i[ℓ_i] > 0 and s_j[ℓ_i] = 0, ∀j ≠ i. (This means that for each source there is at least one n at which that source dominates.)

(A3) The mixing matrix has unit row sum; i.e., for all i = 1, . . . , M,

Σ_{j=1}^{N} a_ij = 1.  (3)
(A4) M ≥ N and A is of full column rank.

Let us discuss the practicality of (A1)–(A4). Assumption (A1) is true in image analysis [5], [6], where image intensities are often represented by non-negative numbers. Assumption (A2) is special and instrumental to the development that ensues. It may be completely satisfied, or serve as a good approximation, when the source signals are sparse (or contain many zeros). In brain magnetic resonance imaging (MRI), for instance, the non-overlapping region of the spatial distributions of a fast-perfusion and a slow-perfusion source image [6] can be higher than 95%. It may also be appropriate to assume (A2) when the source signals exhibit high contrast. Assumption (A4) is a standard assumption in BSS. Assumption (A3) is automatically satisfied in MRI due to the partial volume effect [26], and in hyperspectral images due to the full additivity condition [5]. When (A3) is not satisfied, the following idea [26] can be used.
Suppose that x_i^T 1 ≠ 0 and s_j^T 1 ≠ 0 for all i and j, and consider the following normalized observation vectors:

x̄_i = x_i / (x_i^T 1) = Σ_{j=1}^{N} ( a_ij s_j^T 1 / x_i^T 1 ) ( s_j / s_j^T 1 ).  (4)

By letting ā_ij = a_ij s_j^T 1 / x_i^T 1 and s̄_j = s_j / s_j^T 1, we obtain a BSS problem formulation x̄_i = Σ_{j=1}^{N} ā_ij s̄_j, which is in the same form as the original signal model in (2). It is easy to show that the new mixing matrix, denoted by Ā, has unit row sum. In addition, the rank of Ā is the same as that of A. To show this, we notice that

Ā = D_1^{-1} A D_2,  (5)

where D_1 = diag(x_1^T 1, . . . , x_M^T 1) and D_2 = diag(s_1^T 1, . . . , s_N^T 1). Since D_1 and D_2 are of full rank, we have rank(Ā) = rank(A).
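The normalization in (4)–(5) can be sketched in a few lines of Python. The sources, mixing matrix, and function name below are illustrative choices, not taken from the paper; the check at the end verifies that the implied mixing matrix Ā = D_1^{-1} A D_2 indeed has unit row sum.

```python
import numpy as np

def normalize_rows_to_unit_sum(X):
    """Rescale each observed signal x_i by 1/(x_i^T 1), as in Eq. (4).

    X : (M, L) array, each row an observed signal x_i^T.
    Assumes every row sum is nonzero.
    """
    row_sums = X.sum(axis=1, keepdims=True)  # x_i^T 1 for each i
    return X / row_sums

# Toy check with a mixing matrix whose row sums are arbitrary (not 1):
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(2, 100))            # two non-negative sources
A = np.array([[0.3, 0.9], [1.2, 0.4], [0.5, 0.5]])  # M = 3, N = 2
X = A @ S
Xn = normalize_rows_to_unit_sum(X)

# The implied new mixing matrix A_bar = D1^{-1} A D2 has unit row sum:
D1 = np.diag(X.sum(axis=1))
D2 = np.diag(S.sum(axis=1))
A_bar = np.linalg.inv(D1) @ A @ D2
print(np.allclose(A_bar.sum(axis=1), 1.0))  # True
```

Since D_1 and D_2 are diagonal with nonzero entries, the rank of A is preserved, as claimed after (5).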
III. SOME BASIC CONCEPTS OF CONVEX ANALYSIS

We review some convex analysis concepts that will play an important role in the ensuing development. For detailed explanations of these concepts, readers are referred to [30], [31], [32], which are excellent references on convex analysis.
Given a set of vectors {s_1, . . . , s_N} ⊂ R^L, the affine hull is defined as

aff{s_1, . . . , s_N} = { x = Σ_{i=1}^{N} θ_i s_i | θ ∈ R^N, Σ_{i=1}^{N} θ_i = 1 }.  (6)

An affine hull can always be represented by an affine set:

aff{s_1, . . . , s_N} = { x = Cα + d | α ∈ R^P }  (7)
for some (non-unique) d ∈ R^L and C ∈ R^{L×P}. Here, C is assumed to be of full column rank, and P is the affine dimension, which must be less than N. For example, if {s_1, . . . , s_N} is linearly independent, then P = N − 1. In that case, a legitimate (C, d) is given by C = [s_1 − s_N, s_2 − s_N, . . . , s_{N−1} − s_N] and d = s_N. To give some insight, Fig. 1 pictorially illustrates an affine hull for N = 3. We can see that it is a plane passing through s_1, s_2, and s_3.
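The construction of a legitimate (C, d) mentioned above can be checked numerically. This is a sketch under the stated linear independence assumption; the random test vectors are illustrative.

```python
import numpy as np

# A legitimate (C, d) for aff{s_1, ..., s_N} when the s_i are linearly
# independent: C = [s_1 - s_N, ..., s_{N-1} - s_N] and d = s_N.
rng = np.random.default_rng(1)
L, N = 6, 3
S = rng.uniform(0, 1, size=(L, N))  # columns are s_1, ..., s_N
C = S[:, :-1] - S[:, [-1]]          # differences s_i - s_N
d = S[:, -1]                        # d = s_N

# Every s_i lies in {C @ alpha + d}: s_i = C @ e_i + d for i < N,
# and s_N = C @ 0 + d.
for i in range(N - 1):
    alpha = np.zeros(N - 1)
    alpha[i] = 1.0
    assert np.allclose(C @ alpha + d, S[:, i])
assert np.allclose(C @ np.zeros(N - 1) + d, S[:, -1])
print("all N vectors lie in the affine set")
```

Conversely, any affine combination Σ θ_i s_i with Σ θ_i = 1 maps to α = (θ_1, . . . , θ_{N−1}), so the two representations in (6) and (7) coincide.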
Given a set of vectors {s_1, . . . , s_N} ⊂ R^L, the convex hull is defined as

conv{s_1, . . . , s_N} = { x = Σ_{i=1}^{N} θ_i s_i | θ ∈ R^N_+, Σ_{i=1}^{N} θ_i = 1 }.  (8)
For N = 3, a convex hull is a triangle with vertices s_1, s_2, and s_3. Geometrically, s_1, . . . , s_N would be the 'corner points' of their convex hull, defined formally as the extreme points. A point x ∈ conv{s_1, . . . , s_N} is an extreme point of conv{s_1, . . . , s_N} if it cannot be a nontrivial convex combination of s_1, . . . , s_N; i.e.,

x ≠ Σ_{i=1}^{N} θ_i s_i  (9)

for all θ ∈ R^N_+ with Σ_{i=1}^{N} θ_i = 1 and θ ≠ e_i for any i. The set of extreme points of conv{s_1, . . . , s_N} must be either the full set or a subset of {s_1, . . . , s_N}. In addition, if {s_1, . . . , s_N} is an affinely independent set (or {s_1 − s_N, . . . , s_{N−1} − s_N} is a linearly independent set), then the set of extreme points of conv{s_1, . . . , s_N} is exactly {s_1, . . . , s_N}. Moreover, the boundary of conv{s_1, . . . , s_N} is entirely constituted by all its faces, defined by conv{s_{i_1}, . . . , s_{i_K}} where {i_1, . . . , i_K} ⊂ {1, . . . , N}, i_k ≠ i_l for k ≠ l, and K < N. A facet is a face with K = N − 1. For example, in Fig. 1, we can see that the facets are the line segments conv{s_1, s_2}, conv{s_2, s_3}, and conv{s_1, s_3}.
Fig. 1. Example of 3-dimensional signal space geometry for N = 3: the affine hull aff{s_1, s_2, s_3} = {x = Cα + d | α ∈ R^2}, the convex hull conv{s_1, s_2, s_3}, and its facets.
A simplest simplex of affine dimension N − 1, or (N − 1)-simplex, is defined as the convex hull of N affinely independent vectors {s_1, . . . , s_N} ⊂ R^{N−1}. An (N − 1)-simplex is formed by N extreme points, N facets, and a number of faces. For N = 1, 2, and 3, these simplest simplexes are, respectively, a point, a line segment, and a triangle.
IV. NEW nBSS CRITERION BY CONVEX ANALYSIS
Now, consider the BSS problem formulation in (2) and the assumptions (A1)–(A4). Under (A3), we see that every observation x_i is an affine combination of the true source signals {s_1, . . . , s_N}; that is,

x_i ∈ aff{s_1, . . . , s_N}  (10)

for all i = 1, . . . , M. This leads to an interesting question: whether the observations x_1, . . . , x_M provide sufficient information to construct the source signal affine hull aff{s_1, . . . , s_N}. This is indeed possible, as described in the following lemma:

Lemma 1. Under (A3) and (A4), we have

aff{s_1, . . . , s_N} = aff{x_1, . . . , x_M}.  (11)
The proof of Lemma 1 is given in Appendix A. Figure 2(a) demonstrates geometrically the validity of Lemma 1 for the special case of N = 2, where aff{s_1, . . . , s_N} is a line. Now let us focus on the characterization of aff{s_1, . . . , s_N}. It can easily be shown from (A2) that {s_1, . . . , s_N} is linearly independent. Hence, aff{s_1, . . . , s_N} has dimension N − 1 and admits a representation

aff{s_1, . . . , s_N} = { x = Cα + d | α ∈ R^{N−1} }  (12)

for some (C, d) ∈ R^{L×(N−1)} × R^L such that rank(C) = N − 1. Note that (C, d) is non-unique. Without loss of generality, we can restrict C to be a semi-orthogonal matrix, i.e., C^T C = I. If M = N, it is easy to obtain (C, d) from the observations x_1, . . . , x_M; see the review in Section III. For the more general case of M ≥ N, (C, d) may be found by solving the following minimization problem:

(C, d) = arg min_{C, d : C^T C = I} Σ_{i=1}^{M} e_{A(C,d)}(x_i),  (13)

where e_A(x) is the projection error of x onto A, defined as

e_A(x) = min_{x̃ ∈ A} ‖x − x̃‖²,  (14)

and

A(C, d) = { x = Cα + d | α ∈ R^{N−1} }  (15)
is an affine set parameterized by (C, d). The objective of (13) is to find an (N − 1)-dimensional affine set that has the minimum projection error with respect to the observations. At first glance, (13) is a nonconvex optimization problem, but in fact it can be solved analytically, as stated in the following proposition:

Proposition 1. The affine set fitting problem in (13) has a closed-form solution

d = (1/M) Σ_{i=1}^{M} x_i,  (16)

C = [ q_1(UU^T), q_2(UU^T), . . . , q_{N−1}(UU^T) ],  (17)

where U = [x_1 − d, . . . , x_M − d] ∈ R^{L×M}, and the notation q_i(R) denotes the eigenvector associated with the ith principal eigenvalue of the input matrix R.

The proof of Proposition 1 is given in Appendix B. Since affine set fitting provides the best affine set in the sense of minimizing the data fitting error, it has a noise mitigation advantage in the presence of noise.
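The closed-form solution of Proposition 1 can be sketched as follows. This is an illustrative implementation (function name and toy data are ours, not the paper's); it uses the fact that the N − 1 principal eigenvectors of UU^T are the N − 1 principal left singular vectors of U.

```python
import numpy as np

def affine_set_fitting(X, N):
    """Closed-form affine set fitting of Proposition 1 (a sketch).

    X : (L, M) array whose columns are the observations x_1, ..., x_M.
    Returns (C, d) with C in R^{L x (N-1)} semi-orthogonal, d in R^L.
    """
    d = X.mean(axis=1)                             # Eq. (16)
    U = X - d[:, None]                             # [x_1 - d, ..., x_M - d]
    Q = np.linalg.svd(U, full_matrices=False)[0]   # left singular vectors
    C = Q[:, :N - 1]                               # Eq. (17)
    return C, d

# Noiseless toy check: every x_i must lie exactly in the fitted affine set.
rng = np.random.default_rng(2)
L, N, M = 50, 3, 4
S = rng.uniform(0, 1, (L, N))                      # columns are sources
A = rng.uniform(0, 1, (M, N))
A /= A.sum(axis=1, keepdims=True)                  # unit row sum, (A3)
X = S @ A.T                                        # columns are x_i
C, d = affine_set_fitting(X, N)
resid = (X - d[:, None]) - C @ (C.T @ (X - d[:, None]))
print(np.allclose(resid, 0))  # True: zero projection error without noise
```

With noisy observations the same code returns the least-squares optimal affine set, which is the noise mitigation advantage noted above.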
Fig. 2. Geometric illustrations of CAMNS for the special case of N = 2, M = 3, and L = 3. (a) Affine set geometry indicated by Lemma 1 and Proposition 1: aff{s_1, s_2} = aff{x_1, x_2, x_3}. (b) Convex hull geometry suggested in Lemma 2: S = A(C, d) ∩ R^L_+ = conv{s_1, s_2}.
Recall that the source signals are non-negative. Hence, we have s_i ∈ aff{s_1, . . . , s_N} ∩ R^L_+ for any i. Let us define

S = aff{s_1, . . . , s_N} ∩ R^L_+ = A(C, d) ∩ R^L_+  (18)
  = { x | x = Cα + d, x ⪰ 0, α ∈ R^{N−1} },  (19)
which is a polyhedral set. We can show that

Lemma 2. Under (A1) and (A2), we have

S = conv{s_1, . . . , s_N}.  (20)

The proof of Lemma 2 is given in Appendix C. Following the simple illustrative example in Figure 2(a), in Figure 2(b) we verify geometrically that S is equivalent to conv{s_1, . . . , s_N} for N = 2. From Lemma 2, we notice an important consequence: the set of extreme points of S, or conv{s_1, . . . , s_N}, is {s_1, . . . , s_N}, since {s_1, . . . , s_N} is linearly independent [due to (A2)]. The extremal property of {s_1, . . . , s_N} can be seen in the illustration in Figure 2(b).
From the derivations above, we conclude that

Theorem 1. (nBSS criterion by CAMNS) Under (A1) to (A4), the polyhedral set

S = { x ∈ R^L | x = Cα + d ⪰ 0, α ∈ R^{N−1} },  (21)

where (C, d) is obtained from the observation set {x_1, . . . , x_M} by the affine set fitting procedure in Proposition 1, has N extreme points given by the true source vectors s_1, . . . , s_N.

Proof: Theorem 1 is a direct consequence of Lemma 1, Proposition 1, Lemma 2, and the basic result that the extreme points of conv{s_1, . . . , s_N} are s_1, . . . , s_N.
The theoretical implication of Theorem 1 is profound: It suggests that the true source vectors can be perfectly identified by finding all the extreme points of S. This provides new opportunities in nBSS that, to the best of our knowledge, cannot be found in the presently available literature. Exploring these opportunities in practice is the subject of the following sections.
V. LINEAR PROGRAMMING METHOD FOR CAMNS

This section, as well as the next, is dedicated to the practical implementation of CAMNS. In this section, we propose an approach that uses linear programs (LPs) to systematically fulfil the CAMNS criterion. The next section will describe a geometric approach as an alternative to the LP.
Our problem, as indicated in Theorem 1, is to find all the extreme points of the polyhedral set S in (19). In the optimization literature this problem is known as vertex enumeration; see [33]–[35] and the references therein. The available extreme-point finding methods are sophisticated, requiring no assumption on the extreme points. However, the complexity of those methods increases exponentially with the number of inequalities L (note that L is also the data length in our problem, which is often large in practice), except for a few special cases not applicable to this work. The notable difference of the development here is that we exploit the characteristic that the extreme points s_1, . . . , s_N are linearly independent in the CAMNS problem [recall that this property is a direct consequence of (A2)]. By doing so, we will establish an extreme-point finding method (for CAMNS) whose complexity is polynomial in L.
We first concentrate on identifying one extreme point of S. Consider the following linear minimization problem:

p* = min_s r^T s  subject to (s.t.)  s ∈ S  (22)

for some arbitrarily chosen direction r ∈ R^L, where p* denotes the optimal objective value of (22). Using the polyhedral structure of S in (19), problem (22) can be equivalently represented by an LP

p* = min_α r^T (Cα + d)  s.t.  Cα + d ⪰ 0,  (23)

which can be solved by readily available algorithms such as the polynomial-time interior-point methods [36], [37]. Problem (23) is the problem we solve in practice, but (22) leads to important implications for extreme-point search.
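For concreteness, the LP in (23) can be solved with an off-the-shelf solver. The toy N = 2 instance below is illustrative (sources and mixing matrix are ours); it satisfies (A1)–(A4), and minimizing and maximizing r^T(Cα + d) along a random direction recovers the two sources, anticipating the development that follows.

```python
import numpy as np
from scipy.optimize import linprog

# Toy sources satisfying local dominance (A2): each has a sample where
# the other is zero.
s1 = np.array([1.0, 0.0, 0.5, 0.2, 0.3])
s2 = np.array([0.0, 1.0, 0.1, 0.6, 0.4])
A = np.array([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])  # unit row sums, (A3)
X = np.column_stack([s1, s2]) @ A.T                 # columns x_1, ..., x_3

# Affine set fitting (Proposition 1) for N = 2:
d = X.mean(axis=1)
C = np.linalg.svd(X - d[:, None], full_matrices=False)[0][:, :1]

rng = np.random.default_rng(3)
r = rng.standard_normal(len(d))

def lp_extreme_point(sense):
    # min/max of r^T (C a + d)  s.t.  C a + d >= 0,  i.e.  -C a <= d
    res = linprog(sense * (C.T @ r), A_ub=-C, b_ub=d,
                  bounds=[(None, None)], method="highs")
    assert res.status == 0
    return C @ res.x + d

p_min = lp_extreme_point(+1.0)   # minimizer of (23)
p_max = lp_extreme_point(-1.0)   # maximizer, as in (24) below
err = min(np.linalg.norm(p_min - s1) + np.linalg.norm(p_max - s2),
          np.linalg.norm(p_min - s2) + np.linalg.norm(p_max - s1))
print("recovery error:", err)    # essentially zero
```

The constant term r^T d does not affect the minimizer, so only C^T r is passed as the LP objective.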
A fundamental result in LP theory is that r^T s, the objective function of (22), attains its minimum at a point on the boundary of S. To provide more insight, some geometric illustrations are given in Fig. 3. We can see that the solution of (22) may be uniquely given by one of the extreme points s_i [Fig. 3(a)], or it may be any point on a face [Fig. 3(b)]. The latter case poses a problem for our task of identifying s_i, but it is arguably not a usual situation. For instance, in the demonstration in Fig. 3(b), r must be normal to s_2 − s_3, which is unlikely to happen for a randomly picked r. With this intuition in mind, we prove in Appendix D that

Lemma 3. Suppose that r is randomly generated following a distribution N(0, I_L). Then, with probability 1, the solution of (22) is uniquely given by s_i for some i ∈ {1, . . . , N}.

The idea behind Lemma 3 is to show that undesired cases, such as that in Fig. 3(b), happen with probability zero.
We may find another extreme point by solving the maximization counterpart of (22):

q* = max_α r^T (Cα + d)  s.t.  Cα + d ⪰ 0.  (24)
Fig. 3. Geometric interpretation of an LP over S: (a) the optimal point is uniquely an extreme point; (b) the set of optimal points is a face.
Using the same development as above, we can show the following: Under the premise of Lemma 3, the solution of (24) is, with probability 1, uniquely given by an extreme point s_i different from that in (22).

Suppose that we have identified l extreme points, say, without loss of generality, {s_1, . . . , s_l}. Our interest is in refining the above LP extreme-point finding procedure such that the search space is restricted to {s_{l+1}, . . . , s_N}. To do so, consider a thin QR decomposition [38] of [s_1, . . . , s_l]:

[s_1, . . . , s_l] = Q_1 R_1,  (25)

where Q_1 ∈ R^{L×l} is semi-unitary and R_1 ∈ R^{l×l} is upper triangular. Let

B = I_L − Q_1 Q_1^T.  (26)
We assume that r takes the form

r = Bw  (27)

for some w ∈ R^L, and consider solving (23) and (24) with such an r. Since r is orthogonal to the old extreme points s_1, . . . , s_l, the intuitive expectation is that (23) and (24) should both lead to new extreme points. Interestingly, we found theoretically that this expectation is not true, but close. Consider the following lemma, which is proven in Appendix E:

Lemma 4. Suppose that r = Bw, where B ∈ R^{L×L} is given by (26) and w is randomly generated following a distribution N(0, I_L). Then, with probability 1, at least one of the optimal solutions of (23) and (24) is a new extreme point; i.e., s_i for some i ∈ {l + 1, . . . , N}. The certificate of finding a new extreme point is |p*| ≠ 0 for the case of (23), and |q*| ≠ 0 for (24).

By repeating the above described procedures, we can identify all the extreme points s_1, . . . , s_N. The resultant CAMNS-LP method is summarized in Table I.
TABLE I
A SUMMARY OF THE CAMNS-LP METHOD

CAMNS-LP method
Given an affine set characterization 2-tuple (C, d).
Step 1. Set l = 0 and B = I_L.
Step 2. Randomly generate a vector w ~ N(0, I_L), and set r := Bw.
Step 3. Solve the LPs
          p* = min_{α : Cα+d ⪰ 0} r^T (Cα + d),
          q* = max_{α : Cα+d ⪰ 0} r^T (Cα + d),
        and obtain their optimal solutions, denoted by α*_1 and α*_2, respectively.
Step 4. If l = 0,
          S = [ Cα*_1 + d, Cα*_2 + d ];
        else
          if |p*| ≠ 0 then S := [ S, Cα*_1 + d ];
          if |q*| ≠ 0 then S := [ S, Cα*_2 + d ].
Step 5. Update l to be the number of columns of S.
Step 6. Apply the QR decomposition S = Q_1 R_1, where Q_1 ∈ R^{L×l} and R_1 ∈ R^{l×l}. Update B := I_L − Q_1 Q_1^T.
Step 7. Repeat from Step 2 until l = N.
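The steps of Table I can be sketched as follows. This is an illustrative implementation (function name, tolerance, and toy data are ours, not the paper's); in exact arithmetic the certificate is |p*| ≠ 0, so a small numerical tolerance stands in for the strict inequality.

```python
import numpy as np
from scipy.optimize import linprog

def camns_lp(C, d, N, rng, tol=1e-6):
    """Sketch of the CAMNS-LP method of Table I.

    C : (L, N-1) semi-orthogonal basis, d : (L,) offset (Proposition 1).
    Returns an (L, N) matrix whose columns are the recovered sources.
    """
    L = C.shape[0]
    S = np.zeros((L, 0))
    B = np.eye(L)
    while S.shape[1] < N:
        r = B @ rng.standard_normal(L)              # Step 2
        sols = []
        for sense in (+1.0, -1.0):                  # Step 3: the two LPs
            res = linprog(sense * (C.T @ r), A_ub=-C, b_ub=d,
                          bounds=[(None, None)] * C.shape[1],
                          method="highs")
            assert res.status == 0
            s = C @ res.x + d
            sols.append((r @ s, s))                 # objective value r^T s
        if S.shape[1] == 0:                         # Step 4, first pass
            S = np.column_stack([s for _, s in sols])
        else:                                       # certificate |p*|, |q*|
            for val, s in sols:
                if abs(val) > tol:
                    S = np.column_stack([S, s])
        Q1 = np.linalg.qr(S)[0]                     # Steps 5-6
        B = np.eye(L) - Q1 @ Q1.T
    return S

# Toy N = 3 instance satisfying (A1)-(A4):
S_true = np.array([[1, 0, 0, .2, .4, .1],
                   [0, 1, 0, .5, .1, .3],
                   [0, 0, 1, .1, .2, .4]], dtype=float).T   # (L=6, N=3)
A = np.array([[.5, .3, .2], [.2, .5, .3], [.3, .2, .5]])    # unit row sums
X = S_true @ A.T
d = X.mean(axis=1)
C = np.linalg.svd(X - d[:, None], full_matrices=False)[0][:, :2]
S_hat = camns_lp(C, d, N=3, rng=np.random.default_rng(7))
err = sum(min(np.linalg.norm(S_hat[:, j] - S_true[:, i]) for j in range(3))
          for i in range(3))
print("total recovery error:", err)  # essentially zero
```

By Lemma 4, each pass through the loop adds at least one new extreme point with probability 1, so the loop terminates after at most N − 1 passes.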
The CAMNS-LP method in Table I is not only straightforward to apply systematically, it is also efficient due to the maturity of convex optimization algorithms. Using a primal-dual interior-point method, each LP problem [i.e., the problem in (23) or (24)] can be solved with a worst-case complexity of O(L^{0.5}(L(N − 1) + (N − 1)³)) ≃ O(L^{1.5}(N − 1)) for L ≫ N [37]. Since the algorithm solves 2(N − 1) LP problems in the worst case, we infer that its worst-case complexity is O(L^{1.5}(N − 1)²).
Based on Theorem 1, Lemma 3, Lemma 4, and the above complexity discussion, we assert that

Proposition 2. Under (A1)–(A4), the CAMNS-LP method in Table I finds all the true source vectors s_1, . . . , s_N with probability 1. It does so with a worst-case complexity of O(L^{1.5}(N − 1)²).

We have provided a MATLAB implementation of the CAMNS-LP method at http://www.ee.nthu.edu.tw/~wkma/CAMNS/CAMNS.htm. Readers are encouraged to test the code and give us feedback.
VI. GEOMETRIC METHOD FOR CAMNS

The CAMNS-LP method developed in the last section effectively uses numerical optimization to achieve the CAMNS criterion. In this section we develop analytical or semi-analytical alternatives to achieving CAMNS that are simple and more efficient than CAMNS-LP. The methods to be proposed are for the cases of two and three sources only, where the relatively simple geometric structures of the two cases are utilized. To facilitate the development, in Section VI-A we provide an alternate form of the CAMNS criterion. Then, Sections VI-B and VI-C describe the geometric algorithms for two and three sources, respectively.
A. An Alternate nBSS Criterion
Let us consider the pre-image of S under the mapping s = Cα + d, denoted by

F = { α ∈ R^{N−1} | Cα + d ⪰ 0 }
  = { α ∈ R^{N−1} | c_n^T α + d_n ≥ 0, n = 1, . . . , L },  (28)

where c_n^T is the nth row of C. There is a direct correspondence between the extreme points of S and F, as described in the following lemma:
Lemma 5. The polyhedral set F in (28) is equivalent to an (N − 1)-simplex

F = conv{α_1, . . . , α_N},  (29)

where each α_i ∈ R^{N−1} satisfies

Cα_i + d = s_i.  (30)
The proof of Lemma 5 is given in Appendix F. Hence, as an alternative to our previously proposed approach, where we find {s_1, . . . , s_N} by identifying the extreme points of S, we can achieve perfect blind separation by identifying the extreme points of F. In this alternative, we are aided by a useful convex analysis result presented as follows:
Lemma 6. (Extreme point validation for polyhedra [31]) For a polyhedral set in the form of (28), a point α ∈ F is an extreme point of F if and only if the collection of vectors

C_α = { c_n ∈ R^{N−1} | c_n^T α = −d_n, n = 1, . . . , L }  (31)

contains N − 1 linearly independent vectors.
In summary, an alternate version of the BSS criterion in Theorem 1 is given as follows:

Theorem 2. (Alternate nBSS criterion by CAMNS) Under (A1) to (A4), the set

F = { α ∈ R^{N−1} | Cα + d ⪰ 0 },  (32)

where (C, d) is obtained from the observations x_1, . . . , x_M via the affine set fitting solution in Proposition 1, has N extreme points α_1, . . . , α_N. Each extreme point corresponds to a source signal through the relationship Cα_i + d = s_i. The extreme points of F may be validated by the procedure in Lemma 6.

Proof: Theorem 2 directly follows from Theorem 1, Lemma 5, and Lemma 6.

Based on Theorem 2, we propose geometry-based methods for the cases of two and three sources in the following two subsections.
B. Two Sources: A Simple Closed-Form Solution

For the case of N = 2, the set F is simply

F = { α ∈ R | c_n α + d_n ≥ 0, n = 1, . . . , L }.  (33)

By Lemma 6, α is an extreme point if and only if

α = −d_n / c_n,  c_n ≠ 0,  α ∈ F  (34)

for some n ∈ {1, . . . , L}. From (33) we see that α ∈ F implies the following two conditions:

α ≥ −d_n / c_n  for all n such that c_n > 0,  (35)
α ≤ −d_n / c_n  for all n such that c_n < 0.  (36)

We therefore conclude from (34), (35), and (36) that the two extreme points, which satisfy α_1 > α_2, are given by

α_1 = min{ −d_n / c_n | c_n < 0, n = 1, 2, . . . , L },  (37)
α_2 = max{ −d_n / c_n | c_n > 0, n = 1, 2, . . . , L },  (38)

which are simple closed-form expressions.
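The closed forms (37)–(38) translate directly into code. The sketch below builds a toy M = N = 2 instance satisfying (A1)–(A4) (sources, mixing matrix, and function name are illustrative), fits (C, d) by Proposition 1, and checks that Cα_1 + d and Cα_2 + d recover the two sources up to ordering.

```python
import numpy as np

def two_source_extremes(c, d):
    """Closed-form extreme points of F for N = 2, Eqs. (37)-(38).
    c, d : (L,) arrays; F = {alpha : c_n * alpha + d_n >= 0 for all n}."""
    alpha1 = (-d[c < 0] / c[c < 0]).min()   # Eq. (37): upper end of F
    alpha2 = (-d[c > 0] / c[c > 0]).max()   # Eq. (38): lower end of F
    return alpha1, alpha2

# Toy instance: locally dominant non-negative sources, unit-row-sum A.
s1 = np.array([1.0, 0.0, 0.5, 0.2])
s2 = np.array([0.0, 1.0, 0.1, 0.6])
A = np.array([[0.4, 0.6], [0.7, 0.3]])
X = np.column_stack([s1, s2]) @ A.T                 # columns x_1, x_2
d = X.mean(axis=1)                                  # Proposition 1
c = np.linalg.svd(X - d[:, None], full_matrices=False)[0][:, 0]

a1, a2 = two_source_extremes(c, d)
r1, r2 = c * a1 + d, c * a2 + d                     # candidate sources
err = min(np.linalg.norm(r1 - s1) + np.linalg.norm(r2 - s2),
          np.linalg.norm(r1 - s2) + np.linalg.norm(r2 - s1))
print("recovery error:", err)                       # essentially zero
```

Here F is the interval [α_2, α_1]; its two endpoints are the extreme points, which map back to the sources through (30).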
C. Three Sources: A Semi-Analytic Solution

Fig. 4. Geometric illustration of F for N = 3: the triangle conv{α_1, α_2, α_3} with facet-forming lines H_{k1}, H_{k2}, and H_{k3}.
For N = 3, the set F is a triangle in R². By exploiting the relatively simple geometry of F, we can locate the extreme points very effectively. Figure 4 shows the geometry of F in this case. We see that the boundary of F is entirely constituted by the three facets conv{α_1, α_2}, conv{α_1, α_3}, and conv{α_2, α_3}. They may be represented by the following polyhedral expressions:

conv{α_1, α_2} = F ∩ H_{k1},  (39)
conv{α_1, α_3} = F ∩ H_{k2},  (40)
conv{α_2, α_3} = F ∩ H_{k3},  (41)

for some k_1, k_2, k_3 ∈ {1, . . . , L}, where

H_k = { α ∈ R² | c_k^T α = −d_k }.  (42)
Suppose that (k_1, k_2, k_3) is known. Then, by Lemma 6, the three extreme points are given in closed form by

α_1 = −[ c_{k1}^T ; c_{k2}^T ]^{−1} [ d_{k1} ; d_{k2} ],  (43a)
α_2 = −[ c_{k1}^T ; c_{k3}^T ]^{−1} [ d_{k1} ; d_{k3} ],  (43b)
α_3 = −[ c_{k2}^T ; c_{k3}^T ]^{−1} [ d_{k2} ; d_{k3} ],  (43c)

where [ c_k^T ; c_l^T ] denotes the 2 × 2 matrix with rows c_k^T and c_l^T. A geometric interpretation of (43) is as follows: an extreme point is the intersection of any two of the facet-forming lines H_{k1}, H_{k2}, and H_{k3}. This can also be seen in Figure 4.
Inspired by the fact that finding all the facets is equivalent to finding all the extreme points, we propose an extreme-point finding heuristic in Table II. The idea behind it is illustrated in Figure 5. In the first stage [Figure 5(a)], we move from some interior point α′ along a direction r until we reach the boundary. This process helps identify one facet line, say H_{k1}. Similarly, by travelling in the direction opposite to r, we may locate another facet line, say H_{k2}. The intersection of H_{k1} and H_{k2} results in finding an extreme point α_1. In the second stage [Figure 5(b)], we use the same idea to locate the last facet line H_{k3}, by travelling along the direction α′ − α_1.
Fig. 5. Illustration of the operations of the geometric extreme-point finding algorithm for N = 3: (a) boundary points ψ_1 and ψ_2 found by travelling from α′ along r and its opposite direction; (b) boundary point ψ_3 found by travelling along α′ − α_1.
The proposed algorithm requires an interior point α′ as the starting point for the facet search. Sometimes, the problem nature allows us to determine such a point easily. For instance, if the mixing matrix is componentwise non-negative, or a_ij ≥ 0 for all i, j, then it can be verified that α′ = 0 is interior to F. When an interior point is not known, we can find one numerically by solving the LP

max_{α, β} β  s.t.  c_n^T α + d_n ≥ β,  n = 1, . . . , L.  (44)

(This is known as the phase I method in optimization [30].)
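The phase-I LP (44) can be sketched as follows (function name and the unit-simplex toy F are ours). Stacking the variables as z = [α; β], the constraint c_n^T α + d_n ≥ β becomes −c_n^T α + β ≤ d_n, and maximizing β is minimizing −β.

```python
import numpy as np
from scipy.optimize import linprog

def interior_point(C, d):
    """Phase-I LP of Eq. (44) (sketch). Returns (alpha, beta); alpha is
    strictly interior to F whenever the optimal beta is positive."""
    L, P = C.shape
    A_ub = np.hstack([-C, np.ones((L, 1))])   # -c_n^T alpha + beta <= d_n
    c_obj = np.zeros(P + 1)
    c_obj[-1] = -1.0                          # maximize beta
    res = linprog(c_obj, A_ub=A_ub, b_ub=d,
                  bounds=[(None, None)] * (P + 1), method="highs")
    assert res.status == 0
    return res.x[:P], res.x[-1]

# Toy F: the unit simplex {a1 >= 0, a2 >= 0, 1 - a1 - a2 >= 0} in R^2.
C = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
d = np.array([0.0, 0.0, 1.0])
alpha, beta = interior_point(C, d)
print(alpha, beta)   # about [1/3, 1/3] and 1/3: the deepest interior point
```

For this toy F the optimum is the point maximizing the minimum slack, the centroid (1/3, 1/3) with β = 1/3, comfortably interior as required by Table II.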
TABLE II
A SUMMARY OF THE GEOMETRIC EXTREME-POINT FINDING ALGORITHM

Geometric Extreme-Point Finding Algorithm for N = 3

Given an affine set characterization 2-tuple (C, d), and a vector α′ interior to F.
Step 1. Randomly generate a direction r ∼ N(0, I_2).
Step 2. Locate a boundary point ψ1 = α′ + t1 r, where
    t1 = sup{ t | α′ + t r ∈ F }
       = min{ −(c_n^T α′ + d_n)/(c_n^T r) | c_n^T r < 0, n = 1, ..., L }.
Step 3. Find the index set
    K1 = { n | c_n^T ψ1 = −d_n, n = 1, ..., L }.
    If {c_k | k ∈ K1} contains 2 linearly independent vectors (i.e., ψ1 is an extreme point
    and there is an indeterminacy in that H_k ≠ H_l for some k, l ∈ K1), then go to Step 1.
Step 4. Locate a boundary point ψ2 = α′ + t2 r, where
    t2 = inf{ t | α′ + t r ∈ F }
       = max{ −(c_n^T α′ + d_n)/(c_n^T r) | c_n^T r > 0, n = 1, ..., L }.
Step 5. Find the index set
    K2 = { n | c_n^T ψ2 = −d_n, n = 1, ..., L }.
    If {c_k | k ∈ K2} contains 2 linearly independent vectors, then go to Step 1.
Step 6. Determine
    α1 = − [ c_{k1}^T ; c_{k2}^T ]^{−1} [ d_{k1} ; d_{k2} ],
    for arbitrary k1 ∈ K1 and k2 ∈ K2.
Step 7. Set r = α′ − α1, and locate a boundary point ψ3 = α′ + t3 r, where
    t3 = sup{ t | α′ + t r ∈ F }
       = min{ −(c_n^T α′ + d_n)/(c_n^T r) | c_n^T r < 0, n = 1, ..., L }.
Step 8. Find the index set
    K3 = { n | c_n^T ψ3 = −d_n, n = 1, ..., L }.
Step 9. Determine
    α2 = − [ c_{k1}^T ; c_{k3}^T ]^{−1} [ d_{k1} ; d_{k3} ],
    α3 = − [ c_{k2}^T ; c_{k3}^T ]^{−1} [ d_{k2} ; d_{k3} ],
    for arbitrary k1 ∈ K1, k2 ∈ K2, and k3 ∈ K3.
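To make Step 2 of the table concrete, the boundary-point computation t1 = min{−(c_n^T α′ + d_n)/(c_n^T r) | c_n^T r < 0} can be sketched in a few lines of numpy. The function and variable names are our own, and the active-set tolerances are assumptions:

```python
import numpy as np

def boundary_point(C_ineq, d_ineq, alpha0, r, tol=1e-9):
    """Move from the interior point alpha0 along direction r until the boundary
    of F = {alpha | C_ineq @ alpha + d_ineq >= 0} is hit (Step 2 of Table II).
    Returns the boundary point psi and the active index set K."""
    num = C_ineq @ alpha0 + d_ineq       # c_n^T alpha0 + d_n (positive inside F)
    den = C_ineq @ r                     # c_n^T r
    hit = den < -tol                     # only these facets can be reached along r
    t = np.min(-num[hit] / den[hit])     # smallest positive step to a facet
    psi = alpha0 + t * r
    active = np.where(np.abs(C_ineq @ psi + d_ineq) <= 1e-6)[0]
    return psi, active
```

Step 4 is the same computation with r replaced by −r.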
VII. SIMULATIONS

To demonstrate the efficacy of the CAMNS-based algorithms, four simulation results are presented here.
Section VII-A is an X-ray image example where our task is to distinguish bone structures from soft tissue.
Section VII-B considers a benchmark problem [2] in which the sources are faces of three different
persons. Section VII-C focuses on a challenging scenario reminiscent of ghosting effects in photography.
Section VII-D uses Monte Carlo simulation to evaluate the performance of the CAMNS-based algorithms
under noisy conditions. For performance comparison, we also test three existing nBSS algorithms, namely
non-negative least-correlated component analysis (nLCA) [26], non-negative matrix factorization (NMF)
[11], and non-negative independent component analysis (nICA) [7].
The performance measure used in this paper is described as follows. Let S = [s1, . . . , sN] be the
true multi-source signal matrix, and Ŝ = [ŝ1, . . . , ŝN] be the multi-source output of a BSS algorithm.
It is well known that a BSS algorithm is inherently subject to permutation and scaling ambiguities. We
propose a sum square error (SSE) measure for S and Ŝ [39], [40], given as follows:

    e(S, Ŝ) = min_{π ∈ Π_N}  Σ_{i=1}^{N} ‖ si − (‖si‖/‖ŝ_{πi}‖) ŝ_{πi} ‖²    (45)

where π = (π1, . . . , πN), and Π_N = { π | πi ∈ {1, 2, . . . , N}, πi ≠ πj for i ≠ j } is the set
of all permutations of {1, 2, ..., N}. The optimization in (45) adjusts the permutation π to yield the
best match between the true and estimated signals, while the factor ‖si‖/‖ŝ_{πi}‖ fixes the
scaling ambiguity. Problem (45) is an optimal assignment problem, which can be efficiently solved by the
Hungarian algorithm [41] (a Matlab implementation is available at http://si.utia.cas.cz/Tichavsky.html).
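As a sketch, the SSE in (45) can be evaluated with scipy's Hungarian-method solver linear_sum_assignment in place of the cited Matlab code; the function name sse and the column-wise data layout are our own assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sse(S_true, S_est):
    """SSE measure of (45): S_true and S_est are L-by-N, one source per column.
    Minimizes, over permutations pi, the sum of
    ||s_i - (||s_i||/||shat_pi||) shat_pi||^2."""
    N = S_true.shape[1]
    cost = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            scale = np.linalg.norm(S_true[:, i]) / np.linalg.norm(S_est[:, j])
            cost[i, j] = np.sum((S_true[:, i] - scale * S_est[:, j]) ** 2)
    rows, cols = linear_sum_assignment(cost)  # Hungarian method on the N x N cost
    return cost[rows, cols].sum()
```

A permuted and rescaled copy of the sources gives an SSE of zero, confirming that the measure removes both ambiguities.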
A. Example of N = M = 2: Dual-Energy Chest X-ray Imaging

Dual-energy chest x-ray imaging is clinically used for detecting calcified granuloma, a symptom of
lung nodules [42]. The diagnostic images are acquired from two stacked detectors separated by a copper
filter, through which x-rays at two different energies are passed. For visualizing the symptom of calcified
granuloma, it is necessary to separate bone structures and soft tissue from the diagnostic images.
In this simulation we have two 164 × 164 source images, one representing bone structure and the other
soft tissue. The two images can be found in [43] and are displayed in Figure 6(a). Each image
is represented by a source vector si ∈ R^L, by scanning the image vertically from top left to bottom
right (thereby L = 164² = 26896). We found that the two source signals satisfy the local dominance
assumption [or (A2)] perfectly, by numerical inspection. The observation vectors, or the diagnostic images,
are synthetically generated using the mixing matrix

    A = [ 0.55  0.45
          0.63  0.37 ].    (46)
The mixed images are shown in Figure 6(b). The separated images of the various nBSS methods are
illustrated in Figure 6(c)-(g). By visual inspection, the CAMNS-based methods, nICA, and nLCA appear
to work equally well. In Table IV the various methods are quantitatively compared using the SSE in (45).
The table suggests that the CAMNS-based methods, along with nLCA, achieve perfect separation.
B. Example of N = M = 3: Human Face Separation

Three 128 × 128 human face images, taken from the benchmarks in [2], are used to generate three
observations. The mixing matrix is

    A = [ 0.20  0.62  0.18
          0.35  0.37  0.28
          0.40  0.40  0.20 ].    (47)
In this example, the local dominance assumption is not perfectly satisfied. To shed some light on this,
we propose a measure called the local dominance proximity factor (LDPF) of the ith source, defined as
follows:

    κi = max_{n=1,...,L}  si[n] / Σ_{j≠i} sj[n].    (48)

When κi = ∞, the ith source satisfies the local dominance assumption perfectly. The values of
κi in this example are shown in Table III, where we see that the LDPFs of the three sources are large
but not infinite.
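The LDPF in (48) is straightforward to compute numerically; a minimal sketch (our own helper, with the sources stored column-wise) is:

```python
import numpy as np

def ldpf(S, i):
    """LDPF of the i-th source: kappa_i = max_n s_i[n] / sum_{j != i} s_j[n].
    S is L-by-N with one source per column. A zero denominator with a positive
    numerator yields inf, i.e. local dominance holds exactly at that sample."""
    others = S.sum(axis=1) - S[:, i]   # sum_{j != i} s_j[n] for every n
    with np.errstate(divide='ignore', invalid='ignore'):
        ratios = S[:, i] / others
    return np.nanmax(ratios)
```

For a source that is the only nonzero one at some sample, ldpf returns inf, matching the κi = ∞ case above.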
Figure 7 shows the separated images of the various nBSS methods. We see that the CAMNS-based
methods and nLCA provide good separation, despite the fact that the local dominance assumption is not
perfectly satisfied. This result indicates that the CAMNS-based methods have some robustness against
violation of local dominance. Moreover, nICA works poorly due to the violation of the assumption
of uncorrelated sources. The SSE performance of the various methods is given in Table IV, where we
have two observations. First, the CAMNS-LP method yields the best performance among all the methods
under test. Second, the performance of the CAMNS-LP method is better than that of the CAMNS-geometric
method. The latter suggests that the LP method is more robust than the geometric method when local
dominance is not exactly satisfied. This result will be further confirmed in the Monte Carlo simulation
in Section VII-D.
Fig. 6. Dual-energy chest x-ray imaging: (a) the sources, (b) the observations, and the extracted sources obtained by (c) the
CAMNS-LP method, (d) the CAMNS-geometric method, (e) nLCA, (f) NMF, and (g) nICA.
C. Example of M = N = 4: Ghosting Effect

We take a 285 × 285 Lena image from [2] as one source and then shift it diagonally to create three
more sources; see Figure 8(a). Apparently, these sources are strongly correlated. Even worse, their LDPFs,
Fig. 7. Human face separation: (a) the sources, (b) the observations, and the extracted sources obtained by (c) the CAMNS-LP
method, (d) the CAMNS-geometric method, (e) nLCA, (f) NMF, and (g) nICA.
shown in Table III, are not as satisfactory as in the previous two examples. The mixing matrix is

    A = [ 0.02  0.37  0.31  0.30
          0.31  0.21  0.26  0.22
          0.05  0.38  0.28  0.29
          0.33  0.23  0.21  0.23 ].    (49)
Figure 8(b) displays the observations, where the mixing effect is reminiscent of the ghosting effect
in analog television. The image separation results are illustrated in Figure 8(c)-(f). Clearly, only the
CAMNS-LP method and nLCA provide sufficiently good mitigation of the “ghosts”. This result once
again suggests that the CAMNS-based methods (as well as nLCA) are not too sensitive to the effect of
local dominance violation. Comparing the CAMNS-LP method and nLCA, one may find, upon very
careful visual inspection, that the nLCA separated images have some ghosting residuals. For the proposed
method, we argue that the residuals are harder to notice. Moreover, our numerical inspection found that
the nLCA signal outputs sometimes have negative values. For this reason, we see in Table IV that the
SSE of nLCA is about 10 dB larger than that of the CAMNS-LP method.
TABLE III
LOCAL DOMINANCE PROXIMITY FACTORS κi IN THE THREE SCENARIOS

                           source 1   source 2   source 3   source 4
    Dual-energy X-ray         ∞          ∞          -          -
    Human face separation  15.625      6.172     10.000        -
    Ghosting reduction      2.133      2.385      2.384      2.080
TABLE IV
THE SSES e(S, Ŝ) (IN dB) OF THE VARIOUS NBSS METHODS IN THE THREE SCENARIOS

                           CAMNS-LP   CAMNS-Geometric      nLCA       NMF      nICA
    Dual-energy X-ray     -252.2154        -247.2876  -259.0132   30.8372   24.4208
    Human face separation    9.4991          18.2349    19.6589   32.0085   38.5425
    Ghosting reduction      20.7535             -       31.3767   38.6202   41.8963
Fig. 8. Ghosting reduction: (a) the sources, (b) the observations, and the extracted sources obtained by (c) the CAMNS-LP
method, (d) nLCA, (e) NMF, and (f) nICA.
D. Example of M = 6, N = 3: Noisy Environment

We use Monte Carlo simulation to test the performance of the various methods when noise is present.
The three face images in Figure 7(a) were used to generate six noisy observations. The noise is in-
dependently and identically distributed (i.i.d.), following a Gaussian distribution with zero mean and
variance σ². To maintain non-negativity of the observations in the simulation, we force the negative
noisy observations to zero. We performed 100 independent runs. At each run the mixing matrix was i.i.d.
uniformly generated on [0, 1], and then each row was normalized to unit sum to maintain (A3). The
average errors e(S, Ŝ) for different SNRs (defined here as SNR = Σ_{i=1}^{N} ‖si‖²/(LNσ²)) are shown
in Figure 9. One can see that the CAMNS-LP method performs better than the other methods.
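The data generation for each Monte Carlo run described above can be sketched as follows; the function name and matrix layout are our own assumptions, with S being L-by-N, one source per column:

```python
import numpy as np

def make_noisy_observations(S, M, sigma, rng):
    """Mix N sources into M observations with i.i.d. Gaussian noise.
    The mixing matrix is uniform on [0, 1] with rows normalized to unit
    sum [to maintain (A3)]; negative noisy entries are forced to zero."""
    L, N = S.shape
    A = rng.uniform(0.0, 1.0, size=(M, N))
    A /= A.sum(axis=1, keepdims=True)                 # unit row sums
    X = S @ A.T + sigma * rng.standard_normal((L, M))
    return np.maximum(X, 0.0), A                      # clip negatives to zero
```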
Fig. 9. Performance evaluation of the CAMNS-based methods, nLCA, NMF, and nICA for the human face images experiment
under noisy conditions.
VIII. CONCLUSION

We have developed a convex analysis based framework for non-negative blind source separation.
The core of the framework is a new nBSS criterion, which guarantees perfect separation under some
assumptions [see (A1)-(A4)] that are realistic in many applications such as multichannel biomedical
imaging. To practically realize this result, we have proposed a systematic LP-based method for fulfilling
the criterion. A side benefit is that the LP method deals with linear optimization, which can be solved
efficiently and does not suffer from local minima. Moreover, we have used simplex geometry to
establish a computationally very cheap alternative to the LP method. Our current development has led to
two simple geometric algorithms, for two and three sources respectively. Future work should consider
extending the geometric approach to four sources and beyond. We anticipate that the extension would
become increasingly complex in a combinatorial manner. By contrast, the comparatively more expensive
LP method does not have such a problem per se, and is applicable to any number of sources.
We have also performed extensive simulations to evaluate the separation performance of the CAMNS-
based methods, under several scenarios such as x-ray imaging, human portraits, and ghosting. The results
indicate that the LP method offers the best performance among the various methods under test.
APPENDIX
A. Proof of Lemma 1
Any x ∈ aff{x1, ..., xM} can be represented by

    x = Σ_{i=1}^{M} θi xi,    (50)

where θ ∈ R^M, θ^T 1 = 1. Substituting (2) into (50), we get

    x = Σ_{j=1}^{N} βj sj,    (51)

where βj = Σ_{i=1}^{M} θi aij for j = 1, ..., N, or equivalently

    β = A^T θ.    (52)

Since A has unit row sum [(A3)], we have

    β^T 1 = θ^T (A 1) = θ^T 1 = 1.    (53)

This implies that β^T 1 = 1, and as a result it follows from (51) that x ∈ aff{s1, ..., sN}.
On the other hand, any x ∈ aff{s1, ..., sN} can be represented by (51) with β^T 1 = 1. Since A has full
column rank [(A4)], there always exists a θ such that (52) holds. Substituting (52) into (51) yields (50).
Since (53) implies that θ^T 1 = 1, we conclude that x ∈ aff{x1, ..., xM}.
B. Proof of Proposition 1

As a basic result in least squares, each projection error in (13)

    e_{A(C,d)}(xi) = min_{α ∈ R^{N−1}} ‖Cα + d − xi‖²    (54)

has the closed form

    e_{A(C,d)}(xi) = (xi − d)^T P_C^⊥ (xi − d),    (55)

where P_C^⊥ is the orthogonal complement projector of C. Using (55), we can therefore rewrite the affine
set fitting problem [in (13)] as

    min_{C^T C = I} { min_d J(C, d) }    (56)

where

    J(C, d) = Σ_{i=1}^{M} (xi − d)^T P_C^⊥ (xi − d).    (57)

The inner minimization problem in (56) is an unconstrained convex quadratic program, and it can be easily
verified that d = (1/M) Σ_{i=1}^{M} xi is an optimal solution to the inner minimization problem. By substituting
this optimal d into (56) and by letting U = [x1 − d, ..., xM − d], problem (56) can be reduced to

    min_{C^T C = I_{N−1}} Trace{U^T P_C^⊥ U}.    (58)

When C^T C = I_{N−1}, the projection matrix P_C^⊥ simplifies to I_L − C C^T. Subsequently (58) can
be further reduced to

    max_{C^T C = I_{N−1}} Trace{U^T C C^T U}.    (59)

An optimal solution of (59) is known to be the N − 1 principal eigenvector matrix [44].
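The solution proved above is directly implementable: d is the sample mean, and C collects the N − 1 principal eigenvectors of U U^T, which are the principal left singular vectors of U. A minimal sketch (the function name and column-wise data layout are our own assumptions):

```python
import numpy as np

def affine_set_fit(X, N):
    """X is L-by-M with the observations x_1, ..., x_M as columns.
    Returns (C, d) solving the affine set fitting problem in (13)."""
    d = X.mean(axis=1)                 # optimal d: the sample mean
    U = X - d[:, None]                 # U = [x_1 - d, ..., x_M - d]
    # Principal left singular vectors of U = principal eigenvectors of U U^T.
    left, _, _ = np.linalg.svd(U, full_matrices=False)
    C = left[:, :N - 1]                # satisfies C^T C = I_{N-1}
    return C, d
```

For noiseless data lying exactly on an (N − 1)-dimensional affine set, the fitted (C, d) reproduces that set with zero residual.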
C. Proof of Lemma 2

Assume that z ∈ aff{s1, ..., sN} ∩ R^L_+; that is,

    z = Σ_{i=1}^{N} θi si ⪰ 0,    1^T θ = 1.

From (A2), it follows that z[ℓi] = θi si[ℓi] ≥ 0 for all i. Since si[ℓi] > 0, we must have θi ≥ 0 for all i.
Therefore, z lies in conv{s1, ..., sN}. On the other hand, assume that z ∈ conv{s1, ..., sN}, i.e.,

    z = Σ_{i=1}^{N} θi si,    1^T θ = 1,    θ ⪰ 0,

implying that z ∈ aff{s1, ..., sN}. From (A1), we have si ⪰ 0 for all i and subsequently z ⪰ 0. This
completes the proof of (20).
D. Proof of Lemma 3

Any point in S = conv{s1, ..., sN} can be equivalently represented by s = Σ_{i=1}^{N} θi si, where θ ⪰ 0
and θ^T 1 = 1. Applying this result to (22), problem (22) can be reformulated as

    min_{θ ∈ R^N}  Σ_{i=1}^{N} θi ρi
    s.t.  θ^T 1 = 1,  θ ⪰ 0,    (60)

where ρi = r^T si. We assume without loss of generality that ρ1 ≤ ρ2 ≤ · · · ≤ ρN. If ρ1 < ρ2 < · · · < ρN,
then it is easy to verify that the optimal solution to (60) is uniquely given by θ⋆ = e1. In its counterpart
in (22), this translates into s⋆ = s1. But when ρ1 = ρ2 = · · · = ρP and ρP < ρP+1 ≤ · · · ≤ ρN for
some P, the solution of (60) is not unique. In essence, the latter case can be shown to have the solution
set

    Θ = { θ | θ^T 1 = 1, θ ⪰ 0, θ_{P+1} = ... = θN = 0 }.    (61)
We now prove that the non-unique solution case happens with probability zero. Suppose that ρi = ρj
for some i ≠ j, which means that

    (si − sj)^T r = 0.    (62)

Let v = (si − sj)^T r. Apparently, v follows the distribution N(0, ‖si − sj‖²). Since si ≠ sj, the probability
Pr[ρi = ρj] = Pr[v = 0] is of measure zero. This in turn implies that ρ1 < ρ2 < · · · < ρN holds with
probability 1.
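The key step of the proof, that minimizing r^T s over conv{s1, ..., sN} reduces to picking the vertex with the smallest ρi, is easy to verify numerically; the sizes L = 5, N = 3 below are arbitrary assumptions for the check:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.random((5, 3))           # columns are the vertices s_1, s_2, s_3
r = rng.standard_normal(5)
rho = S.T @ r                    # rho_i = r^T s_i

# Evaluate r^T s over many random points of the simplex conv{s_1, s_2, s_3}.
Theta = rng.random((3, 10000))
Theta /= Theta.sum(axis=0)       # columns are feasible theta: theta >= 0, 1^T theta = 1
vals = r @ (S @ Theta)

# No convex combination does better than the best vertex.
assert vals.min() >= rho.min() - 1e-12
```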
E. Proof of Lemma 4

The approach to proving Lemma 4 is similar to that for Lemma 3. Let

    ρi = r^T si = (Bw)^T si,    (63)

for which we have ρi = 0 for i = 1, ..., l. It can be shown that

    ρ_{l+1} < ρ_{l+2} < · · · < ρN    (64)

holds with probability 1, as long as {s1, . . . , sN} is linearly independent. Problems (22) and (24) are
respectively equivalent to

    p⋆ = min_{θ ∈ R^N}  Σ_{i=l+1}^{N} θi ρi   s.t.  θ ⪰ 0,  θ^T 1 = 1,    (65)

    q⋆ = max_{θ ∈ R^N}  Σ_{i=l+1}^{N} θi ρi   s.t.  θ ⪰ 0,  θ^T 1 = 1.    (66)

Assuming (64), we have three distinct cases to consider:

    (C1) ρ_{l+1} < 0, ρN < 0,
    (C2) ρ_{l+1} < 0, ρN > 0,
    (C3) ρ_{l+1} > 0, ρN > 0.

For (C2), we can see the following: Problem (65) has the unique optimal solution θ⋆ = e_{l+1} [and s⋆ = s_{l+1}
in its counterpart in (22)], attaining the optimal value p⋆ = ρ_{l+1} < 0. Problem (66) has the unique optimal
solution θ⋆ = eN [and s⋆ = sN in its counterpart in (24)], attaining the optimal value q⋆ = ρN > 0. In
other words, both (65) and (66) lead to the finding of new extreme points. For (C1), it is still true that (65)
finds a new extreme point with p⋆ < 0. However, problem (66) can be shown to have the solution set

    Θ = { θ | θ^T 1 = 1, θ ⪰ 0, θ_{l+1} = · · · = θN = 0 },    (67)

which contains convex combinations of the old extreme points, and the optimal value is q⋆ = 0. A similar
situation happens with (C3), where (66) finds a new extreme point with q⋆ > 0 while (65) does not, with
p⋆ = 0.
F. Proof of Lemma 5

Equation (28) can also be expressed as

    F = { α ∈ R^{N−1} | Cα + d ∈ conv{s1, ..., sN} }.

Thus, every α ∈ F satisfies

    Cα + d = Σ_{i=1}^{N} θi si    (68)

for some θ ⪰ 0, θ^T 1 = 1. Since C has full column rank, (68) can be re-expressed as

    α = Σ_{i=1}^{N} θi αi,    (69)

where αi = (C^T C)^{−1} C^T (si − d) (or Cαi + d = si). Equation (69) implies that F = conv{α1, ..., αN}.
Now, assume that {α1, ..., αN} are affinely dependent; then there must exist an extreme point, say
αN = Σ_{i=1}^{N−1} γi αi with Σ_{i=1}^{N−1} γi = 1. One then has sN = CαN + d = Σ_{i=1}^{N−1} γi si
with Σ_{i=1}^{N−1} γi = 1, which implies that {s1, ..., sN} are affinely dependent (a contradiction). Thus,
the set F is an (N − 1)-simplex.
REFERENCES

[1] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: John Wiley, 2001.
[2] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. John Wiley and Sons, Inc., 2002.
[3] E. R. Malinowski, Factor Analysis in Chemistry. New York: John Wiley, 2002.
[4] D. Nuzillard and J.-M. Nuzillard, “Application of blind source separation to 1-D and 2-D nuclear magnetic resonance spectroscopy,” IEEE Signal Process. Lett., vol. 5, no. 8, pp. 209–211, Feb. 1998.
[5] J. M. P. Nascimento and J. M. B. Dias, “Does independent component analysis play a role in unmixing hyperspectral data?” IEEE Trans. Geosci. Remote Sensing, vol. 43, no. 1, pp. 175–187, Jan. 2005.
[6] Y. Wang, J. Xuan, R. Srikanchana, and P. L. Choyke, “Modeling and reconstruction of mixed functional and molecular patterns,” International Journal of Biomedical Imaging, vol. 2006, pp. 1–9, 2006.
[7] M. D. Plumbley, “Algorithms for non-negative independent component analysis,” IEEE Trans. Neural Netw., vol. 14, no. 3, pp. 534–543, 2003.
[8] S. A. Astakhov, H. Stogbauer, A. Kraskov, and P. Grassberger, “Monte Carlo algorithm for least dependent non-negative mixture decomposition,” Analytical Chemistry, vol. 78, no. 5, pp. 1620–1627, 2006.
[9] S. Moussaoui, D. Brie, A. Mohammad-Djafari, and C. Carteret, “Separation of non-negative mixture of non-negative sources using a Bayesian approach and MCMC sampling,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4133–4145, Nov. 2006.
[10] R. Tauler and B. Kowalski, “Multivariate curve resolution applied to spectral data from multiple runs of an industrial process,” Anal. Chem., vol. 65, pp. 2040–2047, 1993.
[11] D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, pp. 788–791, Oct. 1999.
[12] A. Belouchrani, K. Meraim, J.-F. Cardoso, and E. Moulines, “A blind source separation technique using second order statistics,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, Feb. 1997.
[13] A. Hyvarinen and E. Oja, “A fixed-point algorithm for independent component analysis,” Neur. Comput., vol. 9, pp. 1483–1492, 1997.
[14] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, “Face recognition by independent component analysis,” IEEE Trans. Neural Netw., vol. 13, no. 6, pp. 1450–1464, Nov. 2002.
[15] C. Lawson and R. J. Hanson, Solving Least-Squares Problems. New Jersey: Prentice-Hall, 1974.
[16] P. O. Hoyer, “Non-negative matrix factorization with sparseness constraints,” Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[17] W.-K. Ma, T. Davidson, K. Wong, Z. Luo, and P. Ching, “Quasi-maximum-likelihood multiuser detection using semidefinite relaxation with application to synchronous CDMA,” IEEE Trans. Signal Process., vol. 50, no. 4, pp. 912–922, April 2002.
[18] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint Tx-Rx beamforming design for multicarrier MIMO channels: A unified framework for convex optimization,” IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2381–2401, 2003.
[19] Z.-Q. Luo, T. N. Davidson, G. B. Giannakis, and K. M. Wong, “Transceiver optimization for block-based multiple access through ISI channels,” IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1037–1052, 2004.
[20] Y. Ding, T. N. Davidson, Z.-Q. Luo, and K. M. Wong, “Minimum BER block precoders for zero-forcing equalization,” IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2410–2423, 2003.
[21] A. Wiesel, Y. C. Eldar, and S. Shamai, “Linear precoding via conic optimization for fixed MIMO receivers,” IEEE Trans. Signal Process., vol. 54, no. 1, pp. 161–176, 2006.
[22] N. D. Sidiropoulos, T. N. Davidson, and Z.-Q. Luo, “Transmit beamforming for physical-layer multicasting,” IEEE Trans. Signal Process., vol. 54, no. 6, pp. 2239–2251, 2006.
[23] S. A. Vorobyov, A. B. Gershman, and Z.-Q. Luo, “Robust adaptive beamforming using worst-case performance optimization: A solution to the signal mismatch problem,” IEEE Trans. Signal Process., vol. 51, no. 2, pp. 313–324, 2003.
[24] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semidefinite programming based algorithms for sensor network localization,” ACM Trans. Sensor Networks, vol. 2, no. 2, pp. 188–220, 2006.
[25] F.-Y. Wang, Y. Wang, T.-H. Chan, and C.-Y. Chi, “Blind separation of multichannel biomedical image patterns by non-negative least-correlated component analysis,” in Lecture Notes in Bioinformatics (Proc. PRIB’06), Springer-Verlag, vol. 4146, Berlin, Dec. 9-14, 2006, pp. 151–162.
[26] F.-Y. Wang, C.-Y. Chi, T.-H. Chan, and Y. Wang, “Blind separation of positive dependent sources by non-negative least-correlated component analysis,” in IEEE International Workshop on Machine Learning for Signal Processing (MLSP’06), Maynooth, Ireland, Sept. 6-8, 2006, pp. 73–78.
[27] A. Prieto, C. G. Puntonet, and B. Prieto, “A neural learning algorithm for blind separation of sources based on geometric properties,” Signal Processing, vol. 64, pp. 315–331, 1998.
[28] A. T. Erdogan, “A simple geometric blind source separation method for bounded magnitude sources,” IEEE Trans. Signal Process., vol. 54, no. 2, pp. 438–449, 2006.
[29] F. Vrins, J. A. Lee, and M. Verleysen, “A minimum-range approach to blind extraction of bounded sources,” IEEE Trans. Neural Netw., vol. 18, no. 3, pp. 809–822, 2006.
[30] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
[31] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar, Convex Analysis and Optimization. Athena Scientific, 2003.
[32] B. Grunbaum, Convex Polytopes. Springer, 2003.
[33] M. E. Dyer, “The complexity of vertex enumeration methods,” Mathematics of Operations Research, vol. 8, no. 3, pp. 381–402, 1983.
[34] K. G. Murty and S.-J. Chung, “Extreme point enumeration,” College of Engineering, University of Michigan, Technical Report 92-21, 1992. Available online: http://deepblue.lib.umich.edu/handle/2027.42/6731.
[35] K. Fukuda, T. M. Liebling, and F. Margot, “Analysis of backtrack algorithms for listing all vertices and all faces of a convex polyhedron,” Computational Geometry: Theory and Applications, vol. 8, no. 1, pp. 1–12, 1997.
[36] J. F. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optimization Methods and Software, vol. 11-12, pp. 625–653, 1999.
[37] I. J. Lustig, R. E. Marsten, and D. F. Shanno, “Interior point methods for linear programming: Computational state of the art,” ORSA Journal on Computing, vol. 6, no. 1, pp. 1–14, 1994.
[38] G. H. Golub and C. F. Van Loan, Matrix Computations. The Johns Hopkins University Press, 1996.
[39] P. Tichavsky and Z. Koldovsky, “Optimal pairing of signal components separated by blind techniques,” IEEE Signal Process. Lett., vol. 11, no. 2, pp. 119–122, 2004.
[40] J. R. Hoffman and R. P. S. Mahler, “Multitarget miss distance via optimal assignment,” IEEE Trans. Systems, Man, and Cybernetics, vol. 34, no. 3, pp. 327–336, May 2004.
[41] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, pp. 83–97, 1955.
[42] S. G. Armato, “Enhanced visualization and quantification of lung cancers and other diseases of the chest,” Experimental Lung Res., vol. 30, no. 30, pp. 72–77, 2004.
[43] K. Suzuki, R. Engelmann, H. MacMahon, and K. Doi, “Virtual dual-energy radiography: improved chest radiographs by means of rib suppression based on a massive training artificial neural network (MTANN),” Radiology, vol. 238. Available: http://suzukilab.uchicago.edu/research.htm
[44] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1985.