

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012 2921

Blind Separation of Image Sources via Adaptive Dictionary Learning

Vahid Abolghasemi, Member, IEEE, Saideh Ferdowsi, Student Member, IEEE, and Saeid Sanei, Senior Member, IEEE

Abstract—Sparsity has been shown to be very useful in source separation of multichannel observations. However, in most cases, the sources of interest are not sparse in their current domain, and one needs to sparsify them using a known transform or dictionary. If such a priori knowledge about the underlying sparse domain of the sources is not available, then the current algorithms will fail to successfully recover the sources. In this paper, we address this problem and attempt to give a solution by fusing dictionary learning into the source separation. We first define a cost function based on this idea and propose an extension of the denoising method in the work of Elad and Aharon to minimize it. Due to the impracticality of such a direct extension, we then propose a feasible approach. In the proposed hierarchical method, a local dictionary is adaptively learned for each source along with the separation. This process improves the quality of source separation even in noisy situations. In another part of this paper, we explore the possibility of adding global priors to the proposed method. The results of our experiments are promising and confirm the strength of the proposed approach.

Index Terms—Blind source separation (BSS), dictionary learning, image denoising, morphological component analysis (MCA), sparsity.

I. INTRODUCTION

IN SIGNAL and image processing, there are many instances where a set of observations is available and we wish to recover the sources generating these observations. This problem, which is known as blind source separation (BSS), can be presented by the following linear mixture model:

X = A S + V    (1)

where X ∈ R^(m×N) is the observation matrix, S ∈ R^(n×N) is the source matrix, and A ∈ R^(m×n) is the mixing matrix (throughout the paper, we assume that the mixing matrix is column normalized). The additive noise term V of size

Manuscript received July 16, 2011; revised November 04, 2011 and January 08, 2012; accepted February 04, 2012. Date of publication February 13, 2012; date of current version May 11, 2012. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Bulent Sankur.

V. Abolghasemi is with the School of Engineering and Design, Brunel University, Uxbridge UB8 3PH, U.K., and also with the Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, U.K. (e-mail: [email protected]).

S. Ferdowsi and S. Sanei are with the Department of Computing, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, U.K.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2012.2187530

m × N represents the instrumental noise or the imperfection of the model. In BSS, the aim is to estimate both A and S from the observations. This problem does not have a unique solution in general. However, one can find a solution for (1) by imposing some constraints on the separation process to make the sources distinguishable. Independent component analysis (ICA) is one of the well-established methods; it exploits the statistical independence of the sources [2]. It also assumes that the sources are non-Gaussian and attempts to separate them by minimizing the mutual information.

Nonnegativity is another constraint that has been shown to be useful for source separation. In nonnegative matrix factorization, a nonnegative matrix is decomposed into a product of two nonnegative matrices. The nonnegativity constraint allows only additive combinations; hence, it can produce a parts-based representation of the data [3].

In image separation applications in particular, several Bayesian approaches have been proposed [4]–[8]. In [4], Kayabol et al. adopted a Markov random field (MRF) model to preserve the spatial dependence of neighboring pixels (e.g., sharpness of edges) in a 2-D model. They provided a full Bayesian solution to the image separation using Gibbs sampling. Another MRF-based method has been proposed by Tonazzini et al. in [7]. They used a maximum a posteriori (MAP) estimation method for recovering both the mixing matrix and the sources, based on alternating maximization in a simulated annealing scheme. Their experimental results have shown that this scheme is able to increase the robustness against noise. Later, Tonazzini et al. [8] proposed to apply a mean field approximation to the MRF and used the expectation maximization (EM) algorithm to estimate the parameters of both the mixing matrix and the sources. This method has been successfully used in document processing applications. In addition, in [9], the images have been modeled by hidden Markov fields with unknown parameters and a Bayesian formulation has been considered. A fast Markov chain Monte Carlo (MCMC) algorithm has been proposed in that paper for joint separation and segmentation of mixed images. Kuruoglu et al. [10] have studied the detection of the cosmic microwave background (CMB) using image separation techniques. They exploited the spatial information in the images by modeling the correlation and interactions between pixels with MRFs. They proposed a MAP algorithm for estimation of the sources and the mixing matrix parameters.

Different from the MRF-based approaches for CMB detection in astrophysical images is the spectral matching ICA (SMICA) method proposed by Cardoso et al. [5]. In SMICA, the spectral diversity of the sources is exploited. This enables SMICA

1057-7149/$31.00 © 2012 IEEE


to separate the sources even if they are Gaussian and noisy, provided that they have different power spectra. It is shown in [5] that this idea leads to an efficient EM algorithm, much faster than previous algorithms with the non-Gaussianity assumption. An extension of this method to the wavelet domain, called wSMICA, has been proposed by Moudden et al. [6].

Wavelet-domain analysis has also been shown to be advantageous for the image separation problem in some other works. In [11], Ichir and Mohammad-Djafari assumed three different models of the wavelet coefficients, i.e., an independent Gaussian mixture model, a hidden Markov tree model, and a contextual hidden Markov field model. They then proposed an MCMC algorithm for joint blind separation of the sources and estimation of the mixing matrix and the hyperparameters. The work in [12] also uses the wavelet domain for source separation. In another work, Bronstein et al. [13] studied the problem of recovering a scene recorded through a semi-reflecting medium and proposed a technique called sparse ICA (SPICA). They proposed to apply the wavelet packet transform (WPT) to the mixtures prior to applying Infomax ICA. Their algorithm is suitable for cases where all the sources can be sparsely represented in the wavelet domain.

Sparsity-based approaches to the BSS problem have received much attention recently. The term sparse refers to signals or images with a small number of nonzeros with respect to some representation basis. In sparse component analysis (SCA), the assumption is that the sources can be sparsely represented using a known common basis or dictionary. For instance, the aforementioned SPICA method [13] considers the wavelet domain for this purpose. In addition, the methods proposed in [14] and [15] use sparse dictionaries for separation of speech mixtures. Nevertheless, there are many cases where each source is sparse in a different domain, which makes it difficult to directly apply SCA methods to their mixtures. Multichannel morphological component analysis (MMCA) [16] and generalized morphological component analysis (GMCA) [17] have recently been proposed to address this problem. The main assumption in MMCA is that each source can be sparsely represented in a specific known transform domain. In GMCA, each source is modeled as a linear combination of a number of morphological components, where each component is sparse in a specific basis. MMCA and GMCA are extensions of the previously proposed morphological component analysis (MCA) [18]–[21] to the multichannel case. In MCA, a given signal/image is decomposed into different morphological components subject to sparsity of each component in a known basis (or dictionary).

MMCA performs well where prior knowledge about the sparse domain of each individual source is available; however, what if such prior knowledge is not available, or what if the sources are not sparsely representable using existing transforms? One may answer these questions by referring to the dictionary learning framework, where the aim is to find an overcomplete dictionary that can sparsely represent a given set of images or signals. Utilizing learned dictionaries in MCA has been shown to yield promising results [22], [23]. However, taking advantage of learned dictionaries in MMCA is still an open issue. One possible approach is to learn a specific dictionary for each source from a set of exemplar images. This, however, is rarely possible, since in most BSS problems such training samples are not available.

In this paper, we adapt MMCA to cases where the sparsifying dictionaries/transforms are not available. The proposed algorithm is designed to adaptively learn the dictionaries from the mixed images within the source separation process. This method is motivated by the idea of image denoising using a dictionary learned from the corrupted image itself [1]. Other extensions of the work in [1] have also been reported in [22], [24], and [25] for the purpose of image separation from a single mixture. In this paper, the multichannel case with more observations than sources is considered. We start by theoretically extending the denoising problem to BSS. Then, a practical algorithm is proposed for BSS without any prior knowledge about the sparse domains of the sources. The results indicate that adaptive dictionary learning, one dictionary for each source, enhances the separability of the sources.

A. Notation

In this paper, all parameters have real values, although this is not explicitly mentioned. We use small and capital boldface characters to represent vectors and matrices, respectively. All vectors are represented as column vectors by default. For instance, the jth column and the ith row of a matrix B are represented by the column vectors b_j and b^i, respectively. The vectorized version of a matrix B is denoted vec(B). The matrix Frobenius norm is indicated by ||·||_F. The transpose operation is denoted by (·)^T. The ℓ0-norm, which counts the number of nonzeros, and the ℓ1-norm, which sums the absolute values of all elements, are denoted by ||·||_0 and ||·||_1, respectively.
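As a quick NumPy illustration of this notation (the array names below are ours, chosen for the example):

```python
import numpy as np

B = np.array([[1.0, -2.0],
              [0.0,  3.0]])

b_1 = B[:, 0]                    # first column, a column vector
vecB = B.flatten(order="F")      # vec(B): column-major stacking
fro = np.linalg.norm(B, "fro")   # Frobenius norm
l0 = np.count_nonzero(B)         # l0 "norm": number of nonzeros
l1 = np.abs(B).sum()             # l1 norm: sum of absolute values
```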

B. Organization of the Paper

In the next section, the motivation behind the proposed method is stated. We start by describing the denoising problem and its relation to image separation. Then, the extension of single-mixture separation to the multichannel observation scenario is discussed. In Section III, a practical approach for this purpose is proposed. Section IV is devoted to discussing the possibility of adding global sparsity priors to the proposed method. Some practical issues are discussed in Section V. The numerical results are given in Section VI. Finally, this paper is concluded in Section VII.

II. FROM IMAGE DENOISING TO SOURCE SEPARATION

A. Image Denoising

Consider a noisy image corrupted by additive noise. Elad and Aharon [1] showed that, if knowledge about the noise power is available, it is possible to denoise the image by learning a local dictionary from the noisy image itself. In order to deal with large images, they used small (overlapped) patches to learn such a dictionary. The obtained dictionary is considered local since it describes the features extracted from small patches. Let us represent the noisy image by a vector y of length N. The


unknown denoised image is also vectorized and represented as x. The ith patch of x is a vector of b pixels. For notational simplicity, the ith patch is expressed as the explicit multiplication of an operator R_i (a binary matrix) by x.¹ The overall denoising problem is expressed as

{x̂, D̂, α̂_i} = argmin_{x, D, α_i} λ ||x − y||_2^2 + Σ_i μ_i ||α_i||_0 + Σ_i ||D α_i − R_i x||_2^2    (2)

where the scalars λ and μ_i control the noise power and the sparsity degree, respectively. In addition, D ∈ R^(b×K) is the sparsifying dictionary that contains normalized columns (also called atoms), and the α_i are sparse coefficient vectors of length K.

In the algorithm proposed by Elad and Aharon [1], x and D are respectively initialized with y and an overcomplete discrete cosine transform (DCT) dictionary. The minimization of (2) starts with extracting and rearranging all the patches of x. The patches are then processed by K-SVD [26], which updates D and estimates the sparse coefficients α_i. Afterward, D and α_i are assumed fixed and x is estimated by computing

x̂ = (λ I + Σ_i R_i^T R_i)^(−1) (λ y + Σ_i R_i^T D α_i)    (3)

where I is the identity matrix and x̂ is the refined version of y. Again, D and α_i are updated by K-SVD, but this time using the patches from x̂, which are less noisy. Such conjoined denoising and dictionary adaptation is repeated to minimize (2). In practice, (3) is computationally easy to obtain, since the matrix λ I + Σ_i R_i^T R_i is diagonal and the expression can be calculated in a pixelwise fashion.

It is shown that (3) is a kind of averaging using both noisy and denoised patches, which, if repeated along with the updating of the other parameters, denoises the entire image [1]. However, in the sequel, we will try to find out whether this strategy is extendable to cases where the noise is added to mixtures of more than one image.
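The pixelwise form of (3) can be sketched as follows. This is a minimal NumPy illustration (function and variable names are ours, not the authors'): because Σ_i R_i^T R_i is diagonal and simply counts how many patches cover each pixel, the solve reduces to a weighted average of the noisy pixels and the overlapping patch reconstructions.

```python
import numpy as np

def denoise_update(y, patch_recons, patch_ids, lam):
    # Eq. (3): x = (lam*I + sum_i R_i^T R_i)^(-1) (lam*y + sum_i R_i^T D alpha_i)
    # sum_i R_i^T R_i is diagonal (patch coverage count per pixel),
    # so the update is a pixelwise weighted average.
    num = lam * y.astype(float).copy()
    den = lam * np.ones_like(y, dtype=float)
    for rec, idx in zip(patch_recons, patch_ids):
        num[idx] += rec        # accumulate R_i^T (D alpha_i)
        den[idx] += 1.0        # accumulate the diagonal of R_i^T R_i
    return num / den

# Toy 1-D "image" with three overlapping length-2 patches:
y = np.array([1.0, 2.0, 3.0, 4.0])
ids = [np.array([0, 1]), np.array([1, 2]), np.array([2, 3])]
recons = [y[i] for i in ids]           # perfect patch reconstructions
x_hat = denoise_update(y, recons, ids, lam=1.0)
```

With perfect patch reconstructions, the average leaves the image unchanged; in practice, the D α_i terms come from the sparse coding step and pull the estimate toward the denoised patches.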

B. Image Separation

Image separation is a more complicated case of image de-noising where more than one image are to be recovered from asingle observation. Consider a single linear mixture of two tex-tures with additive noise: (or ).The authors of [25] attempt to recover and using the priorknowledge about two sparsifying dictionaries and . Theyuse a minimum mean-squared-error (MSE) estimator for thispurpose. In contrast, the recent work in [24] does not assumeany prior knowledge about the dictionaries. It rather attempts tolearn a single dictionary from the mixture and then applies a de-cision criterion to the dictionary atoms to separate the images.In another recent work, Peyre et al. [22] presented an adaptiveMCA scheme by learning the morphologies of image layers.They proposed to use both adaptive local dictionaries and fixed

¹Practically, we apply a nonlinear operation to extract the patches from the image by sliding a mask of appropriate size over the entire image, similar to [1] and [22].

global transforms (e.g., wavelet and curvelet) for image separation from a single mixture. Their simulation results show the effect of adaptive dictionaries on the separation of complex texture patterns from natural images.

All the related studies have demonstrated the advantages that adaptive dictionary learning can bring to the separation task. However, one piece is still missing, namely such adaptivity for multichannel mixtures. Applying the idea of learning local dictionaries within the source separation of multichannel observations obviously has many benefits in different applications. Next, we extend the denoising method in [1] for this purpose.

C. Multichannel Source Separation

In this section, the aim is to extend the denoising problem (2) to multichannel source separation. Consider the BSS model introduced in Section I, and assume further that the sources of interest are 2-D grayscale images. The BSS model for 2-D sources can be represented by vectorizing all images to vectors of length N and then stacking them to form s = vec(S) of length nN. The BSS model (1) cannot be directly incorporated into (2), as it requires both the sources and the observations to be single vectors. Hence, we use the vectorized versions of these matrices in model (1), which can be obtained using the properties of the Kronecker product²:

x = (I_N ⊗ A) s + v    (4)

In the above expression, x and s are column vectors of lengths mN and nN, respectively. In addition, v is a column vector of length mN. I_N ⊗ A is a block diagonal matrix of size mN × nN, and ⊗ is the Kronecker product symbol. We consider the noiseless setting and modify (2) to

{ŝ, Â, D̂, α̂_i} = argmin_{s, A, D, α_i} λ ||(I_N ⊗ A) s − x||_2^2 + Σ_i μ_i ||α_i||_0 + Σ_i ||D α_i − R_i s||_2^2    (5)

It is clearly seen that the above expression is similar to (2), but with an extra mixing matrix A to be estimated. In addition, the vectors s and x are much lengthier than x and y in (2), as they represent vectorized versions of multiple images and mixtures.

The problem (5) can be minimized in an alternating scheme by keeping all but one unknown fixed at a time. The estimation of D and α_i can be achieved using K-SVD, similar to [1]. In K-SVD, the dictionary is updated columnwise using singular value decomposition (SVD); the sparse coefficients are estimated using one of the common sparse coding techniques. However, the estimation of s is slightly different from that of x in (3) and needs more attention. It has a closed-form solution,

²The matrix multiplication AS can be expressed as a linear transformation on matrices. In particular, vec(AXB) = (B^T ⊗ A) vec(X) for three matrices A, X, and B. In our case, vec(AS) = vec(A S I_N) = (I_N ⊗ A) vec(S).
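The Kronecker identity of footnote 2 is easy to verify numerically. The following NumPy sketch (with hypothetical small sizes) confirms that I_N ⊗ A is block diagonal of size mN × nN and that (I_N ⊗ A) vec(S) matches vec(AS):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 3, 2, 5                     # mixtures, sources, pixels per image
A = rng.standard_normal((m, n))
S = rng.standard_normal((n, N))

vec = lambda M: M.flatten(order="F")  # column-major stacking, as in vec()

H = np.kron(np.eye(N), A)             # I_N (x) A: block diagonal, mN x nN
x = H @ vec(S)

assert H.shape == (m * N, n * N)
assert np.allclose(x, vec(A @ S))     # vec(AS) = (I_N (x) A) vec(S)
```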


which is obtained by taking the gradient of (5) and setting it to zero

λ (I_N ⊗ A)^T ((I_N ⊗ A) s − x) + Σ_i R_i^T (R_i s − D α_i) = 0    (6)

leading to

ŝ = (λ (I_N ⊗ A)^T (I_N ⊗ A) + Σ_i R_i^T R_i)^(−1) (λ (I_N ⊗ A)^T x + Σ_i R_i^T D α_i)    (7)

In order to estimate A, we consider all unknowns, except A, fixed and simplify (5) by converting the first quadratic term into an ordinary matrix product, obtaining

Â = argmin_A ||A S − X||_F^2    (8)

The above minimization problem is easily solved using the pseudoinverse of S as

Â = X S^T (S S^T)^(−1)    (9)

The above steps (i.e., the estimation of D, α_i, s, and A) should be alternately repeated to minimize (5). However, the long expression (7) is not practically computable, particularly if the number of sources and observations is large. This is because it involves a huge matrix of size mN × nN. In addition, in contrast to the aforementioned denoising problem, the matrix to be inverted in (7) is not diagonal; therefore, the estimation of s cannot be calculated pixel by pixel, which makes the situation more difficult to handle. In the next section, a practical approach is proposed to solve this problem.
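The mixing matrix update (9) is a plain least squares fit. A minimal NumPy check (synthetic noiseless data, names ours) shows that X S^T (S S^T)^(−1) equals X times the pseudoinverse of S and, when S has full row rank, recovers the true mixing matrix exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, N = 4, 2, 100
A_true = rng.standard_normal((m, n))
S = rng.standard_normal((n, N))
X = A_true @ S                                   # noiseless observations

# Eq. (9): A = X S^T (S S^T)^(-1), i.e. X times the pseudoinverse of S
A_hat = X @ S.T @ np.linalg.inv(S @ S.T)

assert np.allclose(A_hat, X @ np.linalg.pinv(S))
assert np.allclose(A_hat, A_true)                # exact recovery, noiseless case
```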

III. ALGORITHM

In order to find a practical solution to (5), we use a hierarchical scheme such as the one in MMCA [16]. To do this, the BSS model (1) is broken into rank-1 multiplications, and the following minimization problem is defined:

{ŝ_k, â_k, D̂_k, α̂_{i,k}} = argmin λ ||a_k s_k^T − E_k||_F^2 + Σ_i μ_i ||α_{i,k}||_0 + Σ_i ||D_k α_{i,k} − R_i s_k||_2^2    (10)

where D_k denotes the dictionary corresponding to the kth source s_k, and E_k is the kth residual, expressed as

E_k = X − Σ_{j≠k} a_j s_j^T    (11)

It is important to note that, with this new formulation, we have to learn n dictionaries, one for each source, in contrast to (5). The advantage of this scheme is learning adaptive source-specific dictionaries, which improves source diversity, as also seen in MMCA using known transform domains [21]. In addition, the cost function in (10) does not require dealing with cumbersome matrices and vectors [as in (5)] and allows calculating the image sources in a pixelwise fashion.

The solution of (10) is achieved using an alternating scheme. The minimization process for the kth level of the hierarchy can be expressed as follows. The image patches are extracted from s_k and then processed by K-SVD for learning D_k and the sparse coefficients α_{i,k}, whereas the other parameters are kept fixed. Then, the gradient of (10) with respect to s_k is calculated and set to zero

λ (a_k^T a_k s_k − E_k^T a_k) + Σ_i R_i^T (R_i s_k − D_k α_{i,k}) = 0    (12)

Finally, after some manipulations and simplifications in (12), and using a_k^T a_k = 1 (column-normalized mixing matrix), the estimate of the kth source is obtained as

ŝ_k = (λ I + Σ_i R_i^T R_i)^(−1) (λ E_k^T a_k + Σ_i R_i^T D_k α_{i,k})    (13)

It is interesting to notice that the matrix to be inverted in the above expression is the same as that in (3) for the denoising problem. Thus, as aforementioned, this calculation can be performed pixelwise. Next, in order to update a_k, a simple least squares linear regression such as the one in [16] gives the following solution:

â_k = E_k s_k / ||s_k||_2^2    (14)

However, normalization of a_k is necessary after each update to preserve the column norms of the mixing matrix. The above steps for updating all variables are executed for all k from 1 to n. Moreover, the entire procedure should be repeated to minimize (10). A pseudocode of the proposed algorithm is given in Algorithm 1.

Algorithm 1 The proposed algorithm.

Input: Observation matrix X, patch size b, number of dictionary atoms K, number of sources n, noise standard deviation σ, noise gain c, regularization parameter λ, and total number of iterations L.

Output: Dictionaries D_k, sparse coefficients α_{i,k}, source matrix S, and mixing matrix A.

1 begin
2 Initialization:
3 Set each D_k to the overcomplete DCT;
4 Set A to a random column-normalized matrix;
5 Ŝ ← A⁺ X;
6 Choose σ₀ to be multiple times of σ;
7 ℓ ← 0;
8 repeat
9 ℓ ← ℓ + 1;
10 for k ← 1 to n do
11 Extract all the patches from ŝ_k;
12 Solve the sparse recovery problem: min_{α_{i,k}} Σ_i ||α_{i,k}||_0 s.t. ||D_k α_{i,k} − R_i ŝ_k||_2 ≤ c σ₀;
13 Update D_k using K-SVD;
14 Calculate the residual: E_k = X − Σ_{j≠k} â_j ŝ_j^T;
15 Compute ŝ_k using (13);
16 â_k ← E_k ŝ_k / ||ŝ_k||_2^2;
17 â_k ← â_k / ||â_k||_2;
18 end
19 Decrease σ₀ toward σ;
20 until stopping criterion is met;
21 end
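As a structural sketch only, the main separation loop of Algorithm 1 might look as follows in NumPy. Note that the K-SVD sparse coding/dictionary update pair (lines 11–13) is replaced here by a trivial placeholder (the previous source estimate), so this illustrates the alternating rank-1 updates (11), (13), and (14) rather than the full method; all names are ours.

```python
import numpy as np

def separate(X, n_sources, lam=1.0, n_iter=10):
    """Alternating rank-1 separation loop in the spirit of Algorithm 1.
    The K-SVD patch-domain prior is replaced by a trivial placeholder,
    so only the separation structure is illustrated."""
    m, N = X.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n_sources))
    A /= np.linalg.norm(A, axis=0)                 # column-normalized init
    S = np.linalg.pinv(A) @ X                      # initial source estimates
    for _ in range(n_iter):
        for k in range(n_sources):
            # Residual E_k = X - sum_{j != k} a_j s_j^T  (Eq. 11)
            E = X - A @ S + np.outer(A[:, k], S[k])
            prior = S[k]                           # stand-in for the patch term
            S[k] = (lam * (E.T @ A[:, k]) + prior) / (lam + 1.0)  # cf. Eq. (13)
            a = E @ S[k]                           # least squares fit (Eq. 14)
            A[:, k] = a / np.linalg.norm(a)        # keep columns unit-norm
    return A, S

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 50))
A_hat, S_hat = separate(X, n_sources=2)
```

In the actual algorithm, `prior` would be the accumulated patch reconstructions D_k α_{i,k} produced by K-SVD, and the denominator would count patch coverage per pixel as in (13).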

As already implied, the motivation behind the proposed algorithm is to learn source-specific dictionaries offering sparse representations. Such dictionary learning, when embedded into the source separation process, improves both the separation and the dictionary learning quality. In other words, in the first few iterations of Algorithm 1, each estimated source includes portions of the other sources. However, the dictionaries gradually learn the dominant components and reject the weak portions caused by the other sources. Using these dictionaries, the estimated sources, which are used for dictionary learning in the next iteration, will contain fewer irrelevant components. This adaptive process is repeated until most of the irrelevant portions are rejected and the dictionaries become source specific. The entire above procedure is carried out to minimize the cost function in (10). It is also worth mentioning that (10) is not jointly convex in all parameters. Therefore, the alternating estimation of these parameters, using the proposed method, does not necessarily lead to an optimal solution. This is the case for other previous alternating minimization problems too [16], [24]. However, the obtained parameters are local minimizers of (10) and, hence, are considered approximate solutions.

IV. GLOBAL DICTIONARIES VERSUS LOCAL DICTIONARIES

The proposed method takes advantage of fully local dictionaries that are learned within the separation task. We call these dictionaries local since they capture the structure of small image patches to generate the dictionary atoms. In contrast, global dictionaries are generally applied to the entire image or signal and capture global features. Incorporating fixed global dictionaries into the proposed method can be advantageous where prior knowledge about the structure of the sources is available. Such combined local and global dictionaries have been used in [22] for single-mixture separation. Here, we consider the multichannel case and extend the method proposed in Section III for this purpose.

Consider a known global unitary basis Φ_k for each source. The minimization problem (10) can be modified as follows to capture both the global and local structures of the sources:

{ŝ_k, â_k, D̂_k, α̂_{i,k}} = argmin λ ||a_k s_k^T − E_k||_F^2 + Σ_i μ_i ||α_{i,k}||_0 + Σ_i ||D_k α_{i,k} − R_i s_k||_2^2 + η ||Φ_k s_k||_1    (15)

Note that the term η ||Φ_k s_k||_1 is exactly similar to what was used in the original MMCA [16]. All variables in the above expression can be estimated as before using Algorithm 1, except the actual sources s_k. In order to find ŝ_k, the gradient of (15) with respect to s_k is set to zero, leading to

λ (s_k − E_k^T a_k) + Σ_i R_i^T (R_i s_k − D_k α_{i,k}) + (η/2) Φ_k^T sgn(Φ_k s_k) = 0    (16)

where sgn(·) is a componentwise signum function. The above expression amounts to soft thresholding [16] due to the signum function, and hence, the estimate of s_k can be obtained by the following steps:
• Soft thresholding of Φ_k s̃_k, where s̃_k is the intermediate estimate (13), with threshold η/2, attaining z_k;
• Reconstructing ŝ_k by

ŝ_k = Φ_k^T z_k    (17)

Note that, since Φ_k is a unitary and known matrix, it is not explicitly stored but implicitly applied as a forward or inverse transform where applicable. Similar to the previous section, the above expression can be executed pixelwise and is not computationally expensive.
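The two-step threshold-and-reconstruct update can be sketched as follows. Here a random orthogonal matrix stands in for the unitary basis Φ_k (an assumption for illustration; in practice Φ_k would be applied implicitly as a forward/inverse transform such as an orthonormal wavelet):

```python
import numpy as np

def soft(t, tau):
    # Componentwise soft thresholding: sign(t) * max(|t| - tau, 0)
    return np.sign(t) * np.maximum(np.abs(t) - tau, 0.0)

rng = np.random.default_rng(2)
Phi, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # stand-in unitary basis
s_tilde = rng.standard_normal(8)                    # intermediate estimate

z = soft(Phi @ s_tilde, 0.3)   # threshold the global transform coefficients
s_hat = Phi.T @ z              # Eq. (17): inverse transform (= transpose)
```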

V. PRACTICAL ISSUES

Implementation of the proposed methods needs some carefulconsiderations, which are addressed here.

A. Noise

Similar to the denoising method in [1], the proposed method also requires knowledge about the noise power. This is utilized in the sparse coding step of the K-SVD algorithm for solving the following problem:

min_{α_{i,k}} ||α_{i,k}||_0   s.t.   ||D_k α_{i,k} − R_i s_k||_2 ≤ c σ    (18)

where c is a constant and σ is the noise standard deviation. The above well-known sparse recovery problem can be solved using orthogonal matching pursuit (OMP) [27]. The FOCal Underdetermined System Solver (FOCUSS) [28] method can also be used to solve (18) by replacing the ℓ0-norm with the ℓ1-norm.
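A textbook greedy OMP for (18) can be sketched in a few lines of NumPy. This is a generic implementation for illustration, not the authors' code; [27] and production libraries use faster batch variants:

```python
import numpy as np

def omp(D, y, eps):
    """Greedy OMP for Eq. (18): min ||alpha||_0 s.t. ||D alpha - y||_2 <= eps.
    D is assumed to have unit-norm columns (atoms)."""
    support = []
    alpha = np.zeros(D.shape[1])
    r = y.astype(float).copy()
    while np.linalg.norm(r) > eps and len(support) < D.shape[1]:
        support.append(int(np.argmax(np.abs(D.T @ r))))   # best-matching atom
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coef           # least squares fit on the support
        r = y - D @ alpha               # update the residual
    return alpha

D = np.eye(4)                           # trivial dictionary for illustration
alpha = omp(D, np.array([0.0, 3.0, 0.0, 0.0]), eps=1e-8)
```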


Furthermore, incorporating the noise power in solving (18) has an important advantage in the dictionary update stage: it ensures that the dictionary does not learn the noise present in the patches. Consequently, the image estimated using this dictionary becomes cleaner, which in turn refines the dictionary atoms in the next iteration. This progressive denoising loop is repeated until a clean image is achieved.

In the proposed method, however, "noise" means the portions of the other sources that are mixed with the source of interest. Hence, we initially consider a high value for σ, since the portions of the other sources might be strong even though the noise is zero, and then gradually reduce it to zero as the iterations evolve. Nevertheless, if the observations themselves are noisy, then we should start from a higher bound and decrease it toward the actual noise power as the iterations evolve.
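The decreasing noise-level strategy can be implemented as a simple schedule; a linear decay is one plausible choice (the exact schedule is an assumption of ours, not specified in the text):

```python
import numpy as np

def sigma_schedule(sigma0, sigma_final, n_iter):
    # Start from a deliberately high "noise" level (leakage from the other
    # sources) and decay linearly toward the actual noise power.
    return np.linspace(sigma0, sigma_final, n_iter)

sched = sigma_schedule(10.0, 0.0, 5)   # noiseless case: decay to zero
```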

B. Dictionary and Patch Sizes

Unfortunately, not much can be theoretically said about choosing the optimum dictionary size (or redundancy factor). Highly overcomplete dictionaries allow higher sparsity of the coefficients, although they are computationally expensive. However, it may not be necessary to choose a very high redundancy; the choice depends on the data type and the number of training signals [24]. There have been some reported works allowing a variable dictionary size to find the optimum redundancy factor, such as [29] and [30]. Such analysis is out of the scope of this paper, and we choose a fixed redundancy factor in our simulations.

The patch size is normally chosen depending on the entire image size and also on the knowledge (if available) about the patterns of the actual sources. However, adopting very large patches should be avoided, as they lead to large dictionaries and also provide few training signals for the dictionary learning stage. We have chosen 8 × 8 patches for all the experiments, which seems to have become a standard in the literature.

Furthermore, we normally use overlapping patches, as this has been shown to give better results in our simulations and also in the corresponding literature [1], [25]. However, we have empirically observed that the percentage of overlap can be adjusted based on the noise level. In noiseless settings, a lower percentage of overlap (e.g., 50%) is sufficient, whereas in noisy situations, fully overlapped patches yield the desired performance. Fully overlapped patches are obtained by one-pixel shifting of the patch-extraction operator, in both vertical and horizontal directions, over the entire image.
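The patch-extraction step above can be sketched as follows: a shift of one pixel gives fully overlapped patches, while a shift of half the patch width gives 50% overlap. The function name and interface are ours, for illustration only.

```python
import numpy as np

def extract_patches(img, patch=8, step=1):
    """Extract patch-by-patch blocks from a 2-D image with a given
    shift. step=1 gives fully overlapped patches; step=patch//2
    (e.g., 4 for 8x8) gives 50% overlap. Returns a matrix whose
    columns are the vectorised patches."""
    h, w = img.shape
    out = []
    for i in range(0, h - patch + 1, step):
        for j in range(0, w - patch + 1, step):
            out.append(img[i:i + patch, j:j + patch].ravel())
    return np.array(out).T
```

Note the trade-off the text describes: for a 16 × 16 image, full overlap yields 81 training patches while 50% overlap yields only 9.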

C. Complexity Analysis

The proposed algorithm is more computationally expensive than standard MMCA. It includes K-SVD, which imposes two extra steps of sparse coding and dictionary update. However, the complexity of computing the source estimate is almost the same as in MMCA since (13) can be computed pixelwise. The detailed analysis of the complexity of the proposed method per iteration is as follows. Consider lines 12 and 13 in Algorithm 1, and assume further that we use approximate K-SVD [31]. Based on [31], these two lines cost , where , for each . The complexity of computing the residual (line 14) is . Lines 15 and 16 respectively cost and , where denotes the number of extracted patches. Finally, the normalization in line 17 costs . Since all these calculations are executed for each source (i.e., ), the total computation cost per iteration of the algorithm would be . It is seen that, due to learning one dictionary for each individual source, the proposed algorithm would be computationally demanding for large-scale problems.

A possible approach to speed up the algorithm is to learn one dictionary for all the sources and then choose relevant atoms for estimating each source. This strategy, however, has not been completed yet and is under development. In addition, considering faster dictionary learning algorithms rather than K-SVD can alleviate the problem.
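The shared-dictionary idea mentioned above is explicitly described as incomplete, so the following is only a speculative sketch of what atom selection could look like: rank the atoms of a single learned dictionary by how strongly one source's patch codes use them, and keep the top few as that source's sub-dictionary. All names here are ours, not the paper's.

```python
import numpy as np

def select_atoms(D, coeffs, k):
    """Speculative sketch: from a shared dictionary D (atoms as
    columns) and the sparse codes `coeffs` (atoms x patches) of one
    source's patches, keep the k atoms with the largest total
    activation as that source's sub-dictionary."""
    usage = np.abs(coeffs).sum(axis=1)        # total activation per atom
    top = np.sort(np.argsort(usage)[::-1][:k])  # k most active atoms
    return D[:, top], top
```

A faster dictionary learning algorithm (the alternative the text mentions) would sidestep this selection step entirely.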

VI. RESULTS

In the first experiment, we illustrate the results of applying Algorithm 1 to the mixtures of four sources. A severe case of image sources with different morphologies was chosen to examine the performance of the proposed method. A 6 × 4 full-rank random column-normalized matrix was used as the mixing matrix. Five hundred iterations were selected as the stopping criterion. The mixtures were noiseless. However, we selected a nonzero assumed noise level (see below for the reasoning of this choice) and used a decreasing schedule starting from 10 and reaching 0.01 at the end of the iterations. The patches had 50% overlap. Other parameters were chosen as follows: , , , and . In addition, in order to show the advantages of learning adaptive local dictionaries over fixed local dictionaries, we applied Algorithm 1 to the same mixtures while ignoring the dictionary learning part. In this way, source separation is performed using local fixed DCT dictionaries for all sources. We also applied the following methods for comparison purposes: GMCA [21] (available at http://md.cosmostat.org/Home.html), based on the wavelet transform; SPICA [13] (available at http://visl.technion.ac.il/bron/spica/), based on the WPT; and SMICA [5], based on the Fourier transform. Fig. 1 illustrates the results of applying all these methods together with the corresponding MSEs. The MSE is calculated as the mean squared error between the original and recovered images. Fig. 1 (bottom row) shows that the proposed method could successfully recover the image sources via adaptive learning of the sparsifying dictionaries. The achieved MSEs, given in Fig. 1, are the lowest for the proposed method for all image sources except the bricklike texture. In addition, the learned dictionary atoms, as shown in Fig. 2, indicate good adaptation to the corresponding sources. Other methods do not perform as well as the proposed method. For instance, as shown in Fig. 1, SPICA has a problem in the separation of the noiselike texture and Barbara. GMCA has separated all image sources with some interference from other sources. SMICA has perfectly recovered the bricklike texture with the lowest MSE among all methods. However, it has some difficulties in recovering the Barbara and cartoon boy images.

As another measure of the performance of the proposed method, the reconstruction error as a function of the number of iterations is shown in Fig. 3. It is shown in this figure that


Fig. 1. Results of applying different methods to six noiseless mixtures.

a monotonic decrease in the value of the separation error is achieved and the asymptotic error is almost zero.

In the next experiment, the performance of the proposed method in noisy situations is evaluated. For this purpose, we generated four mixtures from two image sources, namely, Lena and boat. Then, Gaussian noise with a standard deviation of 15 was added to the mixtures so that the peak signal-to-noise ratios (PSNRs) of the mixtures are equal to 20 dB. (Note that the image size is 128 × 128.) The noisy mixtures are shown in Fig. 4(a). We applied the proposed algorithm to these mixtures, starting with a high assumed noise level and gradually decreasing it as the iterations evolved. We used fully overlapped patches and 200 iterations, with the rest of the parameters similar to those in the first experiment. One of the considerable advantages of the proposed method is the ability to denoise the sources during the separation. This is because the core idea of the proposed method comes from a denoising task. In contrast, in most conventional BSS methods, denoising should be carried out either before or after the separation, which is not ideal and may

Fig. 2. Obtained dictionaries as a result of applying the proposed method to the mixtures shown in Fig. 1.

Fig. 3. MSE as a function of the number of iterations. Note that the elements of the image sources have amplitudes in the range [0, 255].
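The MSE and PSNR figures reported in these experiments can be computed with the standard per-pixel definitions for images with amplitudes in [0, 255]; the exact normalisation used in the paper's plots is garbled in this copy, so the conventional form is assumed here.

```python
import numpy as np

def mse(s, s_hat):
    """Per-pixel mean squared error between the original image s
    and the recovered image s_hat (standard definition)."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return np.mean((s - s_hat) ** 2)

def psnr(s, s_hat, peak=255.0):
    """PSNR in dB for images with amplitudes in [0, peak]."""
    return 10.0 * np.log10(peak ** 2 / mse(s, s_hat))
```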

lead the algorithm to fail. Among the existing related methods, GMCA [21] uses a thresholding scheme to denoise the image sources during the separation. Hence, we applied this algorithm to the same mixtures to compare its performance with that of the proposed method. We applied SPICA and SMICA to the same mixtures too. The results of this experiment are demonstrated in Fig. 4. It is shown in Fig. 4(f) that both the separation and denoising tasks have been successful in the proposed method. The corresponding learned dictionaries, given in Fig. 5(a), confirm these observations. In addition, the proposed method is superior to GMCA, SPICA, and SMICA, and it denoises the sources with higher PSNRs, as seen by comparing Fig. 4(c)–(f). Although GMCA, SPICA, and SMICA have separated the image sources, they are not successful in denoising them.

In addition, the sparsity rate of the coefficient vectors, obtained by averaging over the supports of all patch codes, for both sources is given in Fig. 5(b). Note that a 4 × 2 mixing matrix was used in this experiment. This graph shows the percentage of atoms used by the sparse coefficients,


Fig. 4. (a) Noisy mixtures. (b) Original sources. Separated sources using (c) GMCA, (d) SPICA, (e) SMICA, and (f) the proposed method.

Fig. 5. (a) Obtained dictionaries as a result of applying the proposed method to the mixtures shown in Fig. 4(a). (b) Percentage of the average number of nonzeros of the sparse coefficients, equivalent to the occurrence rate of the dictionary atoms, sorted in descending order.
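The occurrence rate plotted in Fig. 5(b) can be computed from the sparse codes as the fraction of patches whose code uses each atom, sorted in descending order. This is a plausible reading of the figure's quantity; the function name is ours.

```python
import numpy as np

def atom_usage_percent(coeffs):
    """Occurrence rate of each dictionary atom across all patch
    codes: the percentage of patches whose sparse code uses the atom,
    sorted in descending order (the quantity plotted in Fig. 5(b),
    as interpreted here). `coeffs` is atoms x patches."""
    n_patches = coeffs.shape[1]
    used = (coeffs != 0).sum(axis=1) / n_patches * 100.0
    return np.sort(used)[::-1]
```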

sorted in descending order, for both dictionaries corresponding to Lena and boat. As found from Fig. 5(b), the most frequently used atom shows a small percentage of appearance (about 17%), and the other dictionary atoms have been used less frequently. This means that the learned dictionaries are so efficient that a few linear combinations of their atoms are sufficient to generate the image sources of interest.

In another experiment, we evaluated and compared the influence of adding a global sparsity prior to the proposed algorithm. The experiment settings are as follows. Two images, namely, texture and Lena, were mixed together using a 4 × 2 random mixing matrix. Algorithm 1, which only takes advantage of locally learned dictionaries, was applied to the mixtures. The proposed algorithm in Section IV (local and global) was also applied to the same mixtures. We considered the discrete wavelet transform and the DCT as global sparsifying bases for Lena and texture, respectively. The methods FastICA [32] (available at http://research.ics.tkk.fi/ica/fastica/), JADE [2], and SMICA were also used in this experiment for comparison. We varied the noise standard deviation from 0 to 20 to investigate the performance of all these methods in dealing with different noise levels. Similar to [16] and [17], we calculated the mixing matrix criterion (also called the Amari criterion [33]), which measures how far the estimated mixing matrix is from the true one up to a scaling and permutation matrix. (Note that the proposed method only causes signed permutations since it deals with column-normalized mixing matrices.) The mixing matrix criterion curves are depicted in Fig. 6 as a function of the noise standard deviation. As shown in Fig. 6, in low/moderate noise situations, the best performance is achieved when both global and local dictionaries are used. However, the performance of the proposed method, using either only local or both local and global dictionaries, decreases in high noise. Moreover, the recovery results of JADE and FastICA are less accurate than those of the other methods. The performance of SMICA lies between that of JADE and that of the proposed method.

Next, in order to find an optimal value of the constant in the error bound of (18), we investigated the effects of different choices on the recovery quality. As expected and also shown in [1], this selection depends on the


TABLE I: DETAILED SIMULATION TIME OF ALGORITHM 1. "TOTAL" MEANS THE TOTAL TIME (IN SECONDS) ELAPSED PER ONE ITERATION

Fig. 6. Mixing matrix criterion as a function of the noise standard deviation for different algorithms. Note that the "Global" method is actually equivalent to the original MMCA in [16].
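The exact formula of the mixing matrix (Amari-type) criterion is garbled in this copy, so the sketch below implements one common variant used in the MMCA/GMCA literature: it checks how far pinv(A_est) @ A_true is from a scaled (signed) permutation matrix, returning 0 for perfect recovery. The function name and the precise normalisation are ours.

```python
import numpy as np

def mixing_criterion(A_true, A_est):
    """Illustrative mixing matrix criterion: P = pinv(A_est) @ A_true
    should be a scaled, signed permutation for a perfect separation.
    After normalising each row of |P| by its largest entry, a perfect
    recovery leaves each row an indicator vector and the criterion 0."""
    P = np.abs(np.linalg.pinv(A_est) @ A_true)
    P = P / P.max(axis=1, keepdims=True)  # normalise rows to peak 1
    # excess mass beyond one unit entry per row measures the deviation
    return np.sum(P) - P.shape[0]
```

Consistent with footnote-style remarks in the text, sign flips and column permutations do not affect this score, since only the magnitude pattern of P is examined.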

Fig. 7. Mixing matrix criterion as a function of the constant in the error bound of (18).

noise standard deviation. However, our case is slightly different due to the source separation task. (Indeed, we empirically observed better performance by starting with a higher value of the assumed noise level for solving (18) and decreasing it toward the true noise standard deviation as the iterations proceed.) In our simulations, Gaussian noise with different power values was added to the mixtures, and the mixing matrix criterion was calculated while varying the constant. Fig. 7 presents the achieved results for this experiment. In this figure, it can be seen that all the curves achieve nearly the same recovery quality over a range of values of the constant, so any fixed choice in that range is reasonable. Interestingly, these results and the range of appropriate choices are very similar to those achieved in [1].

As already mentioned, it is expected that the computation time of the proposed method increases for a large number of sources. A set of simulations was conducted to numerically demonstrate this effect and also to evaluate the effects of changing other dimensions, such as the patch size and dictionary size, on the simulation time. In Table I, we give these results per one iteration of Algorithm 1, where the image patches had 50% overlap. A desktop computer with a 3-GHz Core 2 Duo CPU and 2 GB of RAM was used for this experiment. It is interesting to note that, as shown in Table I, much of the computational burden is incurred by the dictionary update stage (using K-SVD). In addition, the computation of the source estimate is far less complicated than both the "sparse coding" and "dictionary update" stages due to its pixelwise operation. This implies that further effort is required to speed up the dictionary learning part of the proposed algorithm.
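Per-stage timings like those in Table I can be gathered with a minimal wall-clock harness; this generic sketch (names ours) times any stage function and reports the best of a few repeats to reduce scheduler noise.

```python
import time

def time_stage(fn, *args, repeats=5):
    """Report the best wall-clock time (seconds) of `fn(*args)` over
    a few repeats, in the spirit of the per-stage timings in Table I."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best
```

In a real profiling run, `fn` would be the sparse coding, dictionary update, or source-estimate stage of Algorithm 1.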

VII. CONCLUSION

In this paper, the BSS problem has been addressed. The aim has been to take advantage of sparsifying dictionaries for this purpose. Unlike the existing sparsity-based methods, we assumed no prior knowledge about the underlying sparsity domain of the sources. Instead, we have proposed to fuse the learning of adaptive sparsifying dictionaries for each individual source into the separation process. Motivated by the idea of image denoising via a dictionary learned from the patches of the corrupted image in [1], we proposed a hierarchical approach for this purpose. Our simulation results on both noisy and noiseless mixtures have been encouraging and confirm the effectiveness of the proposed approach. However, further work is required to speed up the algorithm and make it suitable for large-scale problems. In addition, applying the proposed method to underdetermined mixtures (fewer observations than sources) is part of our future plans.

REFERENCES

[1] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.

[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley-Interscience, May 2001.

[3] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, Oct. 1999. [Online]. Available: http://dx.doi.org/10.1038/44565

[4] K. Kayabol, E. E. Kuruoglu, and B. Sankur, "Bayesian separation of images modeled with MRFs using MCMC," IEEE Trans. Image Process., vol. 18, no. 5, pp. 982–994, May 2009.

[5] J.-F. Cardoso, H. Snoussi, J. Delabrouille, and G. Patanchon, "Blind separation of noisy Gaussian stationary sources. Application to cosmic microwave background imaging," in Proc. 11th EUSIPCO, Toulouse, France, Sep. 2002, pp. 561–564.


[6] Y. Moudden, J.-F. Cardoso, J.-L. Starck, and J. Delabrouille, "Blind component separation in wavelet space: Application to CMB analysis," EURASIP J. Appl. Signal Process., vol. 2005, pp. 2437–2454, 2005.

[7] A. Tonazzini, L. Bedini, E. E. Kuruoglu, and E. Salerno, "Blind separation of auto-correlated images from noisy mixtures using MRF models," in Proc. Int. Symp. ICA, Nara, Japan, Apr. 2003, pp. 675–680.

[8] A. Tonazzini, L. Bedini, and E. Salerno, "A Markov model for blind image separation by a mean-field EM algorithm," IEEE Trans. Image Process., vol. 15, no. 2, pp. 473–482, Feb. 2006.

[9] H. Snoussi and A. Mohammad-Djafari, "Fast joint separation and segmentation of mixed images," J. Electron. Imaging, vol. 13, no. 2, pp. 349–361, Apr. 2004.

[10] E. Kuruoglu, A. Tonazzini, and L. Bianchi, "Source separation in noisy astrophysical images modelled by Markov random fields," in Proc. ICIP, Oct. 2004, vol. 4, pp. 2701–2704.

[11] M. M. Ichir and A. Mohammad-Djafari, "Hidden Markov models for wavelet-based blind source separation," IEEE Trans. Image Process., vol. 15, no. 7, pp. 1887–1899, Jul. 2006.

[12] W. Addison and S. Roberts, "Blind source separation with non-stationary mixing using wavelets," in Proc. Int. Conf. ICA, Charleston, SC, 2006.

[13] A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, and Y. Y. Zeevi, "Sparse ICA for blind separation of transmitted and reflected images," Int. J. Imaging Syst. Technol., vol. 15, no. 1, pp. 84–91, 2005.

[14] M. G. Jafari and M. D. Plumbley, "Separation of stereo speech signals based on a sparse dictionary algorithm," in Proc. 16th EUSIPCO, Lausanne, Switzerland, Aug. 2008, pp. 25–29.

[15] M. G. Jafari, M. D. Plumbley, and M. E. Davies, "Speech separation using an adaptive sparse dictionary algorithm," in Proc. Joint Workshop HSCMA, Trento, Italy, May 2008, pp. 25–28.

[16] J. Bobin, Y. Moudden, J. Starck, and M. Elad, "Morphological diversity and source separation," IEEE Signal Process. Lett., vol. 13, no. 7, pp. 409–412, Jul. 2006.

[17] J. Bobin, J. Starck, J. Fadili, and Y. Moudden, "Sparsity and morphological diversity in blind source separation," IEEE Trans. Image Process., vol. 16, no. 11, pp. 2662–2674, Nov. 2007.

[18] J. Starck, M. Elad, and D. Donoho, "Redundant multiscale transforms and their application for morphological component separation," Adv. Imaging Electron Phys., vol. 132, pp. 287–348, 2004.

[19] J.-L. Starck, M. Elad, and D. Donoho, "Image decomposition via the combination of sparse representations and a variational approach," IEEE Trans. Image Process., vol. 14, no. 10, pp. 1570–1582, Oct. 2005.

[20] M. Elad, J. Starck, P. Querre, and D. Donoho, "Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA)," Appl. Comput. Harmonic Anal., vol. 19, no. 3, pp. 340–358, Nov. 2005.

[21] J. Bobin, J.-L. Starck, J. M. Fadili, Y. Moudden, and D. L. Donoho, "Morphological component analysis: An adaptive thresholding strategy," IEEE Trans. Image Process., vol. 16, no. 11, pp. 2675–2681, Nov. 2007.

[22] G. Peyre, J. Fadili, and J. Starck, "Learning the morphological diversity," SIAM J. Imaging Sci., vol. 3, no. 3, pp. 646–669, Jul. 2010.

[23] G. Peyre, J. Fadili, and J.-L. Starck, "Learning adapted dictionaries for geometry and texture separation," in Proc. SPIE, 2007, p. 67011T.

[24] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. New York: Springer-Verlag, 2010.

[25] N. Shoham and M. Elad, "Algorithms for signal separation exploiting sparse representations, with application to texture image separation," in Proc. IEEEI 25th Conv. Elect. Electron. Eng. Israel, Dec. 2008, pp. 538–542.

[26] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.

[27] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[28] I. Gorodnitsky and B. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, Mar. 1997.

[29] M. Yaghoobi, T. Blumensath, and M. Davies, "Parsimonious dictionary learning," in Proc. IEEE ICASSP, Apr. 2009, pp. 2869–2872.

[30] S. Tjoa, M. Stamm, W. Lin, and K. Liu, "Harmonic variable-size dictionary learning for music source separation," in Proc. IEEE ICASSP, Mar. 2010, pp. 413–416.

[31] R. Rubinstein, M. Zibulevsky, and M. Elad, "Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit," Technion, Haifa, Israel, Tech. Rep., 2008.

[32] A. Hyvarinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 626–634, May 1999.

[33] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1996, pp. 757–763.

Vahid Abolghasemi (S'08–M'12) received the Ph.D. degree in signal and image processing from the University of Surrey, Guildford, U.K., in 2011.

He is currently with the School of Engineering and Design, Brunel University, Uxbridge, U.K. His main research interests include sparse signal and image analysis, dictionary learning, compressed sensing, and blind source separation.

Saideh Ferdowsi (S'09) received the B.Sc. and M.Sc. degrees in electronic engineering from Shahrood University of Technology, Shahrood, Iran, in 2005 and 2007, respectively. She is currently working toward the Ph.D. degree in the Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, U.K.

Her main research interests include blind source separation and biomedical signal and image processing.

Saeid Sanei (SM'05) received the Ph.D. degree in biomedical signal and image processing from Imperial College London, London, U.K., in 1991.

He is currently a Reader in Neurocomputing at the Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, U.K. His main area of research is signal processing and its application in biomedical engineering.

Dr. Sanei is an associate editor of IEEE SIGNAL PROCESSING LETTERS and EURASIP journals.