
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 4, APRIL 2004 817

Spline Neural Networks for Blind Separation of Post-Nonlinear-Linear Mixtures

Mirko Solazzi and Aurelio Uncini, Member, IEEE

Abstract—In this paper, a novel paradigm for blind source separation in the presence of nonlinear mixtures is presented. In particular, the paper addresses the problem of post-nonlinear mixing followed by another instantaneous mixing system. This model is called here the post-nonlinear-linear model. The method is based on the use of the recently introduced flexible activation function whose control points are adaptively changed: a neural model based on adaptive B-spline functions is employed. The signal separation is achieved through an information maximization criterion. Experimental results and comparison with existing solutions confirm the effectiveness of the proposed architecture.

Index Terms—Blind signal processing, flexible activation function, neural networks, nonlinear mixtures, post-nonlinear-linear (PNL-L) model, source separation, unsupervised adaptive algorithms.

I. INTRODUCTION

IN THE LAST few years, blind signal processing has attracted great interest. There are many potential applications of blind signal processing in science and technology, for example, problems of image enhancement and recognition such as face and fingerprint recognition, signal equalization and reconstruction in digital communications, medical diagnosis such as EEG and ECG, geophysical analysis, etc. Other applications are in the acoustic area, such as the so-called "cocktail-party" problem, which consists of separating several mixed sound signals produced by several speakers or music sources. In general, blind signal separation (BSS) or source separation consists of recovering unobserved signals, or sources, from several observed mixtures. Typically, the observations are collected at the outputs of a set of sensors, where each sensor receives a different combination of the source signals, depending on the nature of the application.

The separation of instantaneous linear mixtures has been extensively studied, and a number of algorithms for blind signal processing have been proposed [1]–[14], [18]–[20]. In particular, several researchers have employed information-maximization (INFOMAX) techniques, implemented in neural-network-like architectures, for the problem of blind source separation and deconvolution of independent sources. Bell and Sejnowski, in [1], propose the use of a one-layer neural network in order to separate linear mixtures of signals. The architecture is composed of

Manuscript received December 2, 2002; revised January 20, 2003. This paper was recommended by Associate Editor P. Comon.

M. Solazzi is with the Dipartimento di Elettronica e Automatica, University of Ancona, 60131 Ancona, Italy.

A. Uncini is with the Dipartimento INFOCOM, University of Rome "La Sapienza," 00184 Rome, Italy (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCSI.2004.826210

an invertible linear transformation followed by a bounded, monotonically increasing nonlinear function applied to each output separately. The adaptation (or learning, in the neural-network context) is carried out by maximizing the output entropy. In this case, if the probability density functions (pdfs) of the sources are known, the fixed nonlinearities should be taken equal to the cumulative density functions (cdfs) of the sources. In [2], the authors, giving a new explanation of [1], underline the relevance of the output information and reinterpret the Bell and Sejnowski approach in the more general context of pdf estimation.

Usually, however, the cdfs of the sources are not known. Although simulations exhibit good results in some cases even for nonlinearities that do not exactly match the signals, in the general case it is important to estimate the exact nonlinear functions more accurately.

Several approaches have been proposed to obtain adaptive nonlinearities (see, for example, [3]–[14]) for use in the blind separation problem. These functions, however, can present some limitations due to their representation capability and/or computational complexity.

In [5], in order to flatten the pdf of a signal, the authors propose a polynomial functional-link approach. The monotonically increasing characteristic of the curve is ensured by certain polynomial constraints. The polynomial shape adaptation, as discussed in [6], suffers from the "forgetting problem": changing each polynomial coefficient modifies the entire curve shape, and this can cancel the information of the previous training patterns.

Recently, in order to reduce the computational burden and improve the generalization capabilities, an adaptive spline neural network (ASNN) has been proposed [6], [7]. This architecture has been shown to be suitable for signal-processing applications [8], being based on a flexible Catmull–Rom spline activation function. In [16], a new nonlinear architecture is proposed and applied to the problem of flattening the pdf of a random process. It is based on B-spline functions, which require only constraints on the control parameters in order to ensure the needed monotonically increasing characteristic. It was successfully applied to the problem of blind source separation of instantaneous linear mixtures [17].

In general, a nonlinear mixing model is more realistic and accurate than a linear model and appropriate for representing many practical applications. Blind separation of signals in nonlinear mixtures has been addressed by only a few authors because of the nonuniqueness of the solution [9]–[15], [21]. Taleb and Jutten [13] studied the separability of nonlinear mixtures of statistically independent sources and developed algorithms for particular models called post-nonlinear (PNL) models. More recently, Tan et al. [14] proposed a novel radial basis function

1057-7122/04$20.00 © 2004 IEEE


(RBF) neural network approach to separate independent signals from their nonlinear mixtures without prior knowledge of the source signals and mixing channels.

In [15], a method is presented that can be seen as an extension of the INFOMAX criterion, based on the minimization of the mutual information of the estimated components (MISEP). This method allows the analysis of linear and nonlinear mixtures and automates the estimation of the optimal nonlinearities to be used at the outputs during learning.

In this paper, a novel neural network model for blind demixing of PNL-linear (PNL-L) mixtures, without assumptions on the source pdfs or the mixing channel, is proposed.

We address the use of the ASNN recently introduced for supervised and unsupervised neural networks. The basic scheme of the ASNN is very similar to classical neural structures, but with improved, flexible B-spline nonlinear activation functions. These functions can change their shapes by adapting a few control points through the learning algorithm. The flexible activation functions used in ASNNs have several interesting features. They: 1) are easy to adapt; 2) have the necessary smoothing characteristics; and 3) are easy to implement both in hardware and in software. In particular, a suitable architecture composed of two layers of flexible nonlinear functions for the separation of nonlinear mixtures is proposed, and a gradient-ascent algorithm that maximizes the output entropy is derived.

II. NONLINEAR MIXING-DEMIXING MODEL

Most works on blind source separation mainly address the case of instantaneous linear mixtures. This model assumes the existence of N mutually independent signals s_1(t), ..., s_N(t) (i.e., different people speaking, noise, music, ...) and the observation of M mixtures x_1(t), ..., x_M(t), these mixtures being linear and instantaneous. Let A be a real or complex rectangular M × N matrix; the data model for the linear mixture can be expressed in matrix form as

x = As (1)

where s = [s_1, ..., s_N]^T represents the column vector of statistically independent sources and x = [x_1, ..., x_M]^T is the array containing the observed mixtures.
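As a minimal sketch of this data model (Python/NumPy; the matrix entries and source signals below are illustrative assumptions, not taken from the paper), the linear mixture of (1) is just a matrix product applied to the source vector at every time instant:

```python
import numpy as np

rng = np.random.default_rng(0)

n_sources, n_samples = 2, 1000
# Statistically independent sources (illustrative uniform signals)
s = rng.uniform(-1.0, 1.0, size=(n_sources, n_samples))

# Illustrative mixing matrix (not from the paper)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

# Observed instantaneous linear mixtures, cf. (1)
x = A @ s
```

Each column of `x` is one observation vector; separation then amounts to recovering `s` from `x` alone, without knowledge of `A`.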

For real-world situations, however, the basic linear mixing model (1) is too simple to describe the observed data. In many applications, we can consider a PNL mixing model (see, e.g., [12]). PNL mixing is more realistic and accurate than the linear model, although it is a particular case of general nonlinear mixing.

For instantaneous mixtures, a nonlinear data model can have the form

x = F(s) (2)

where F represents an unknown differentiable bijective mapping from a subset of R^N into a subset of R^M. This model provides the observation x, which is the unknown nonlinear mixture of the unknown statistically independent sources s. In the particular case of PNL mixing, the model has channel nonlinearity only, and the observed mixtures can be defined as

x_i = f_i( Σ_j a_ij s_j ) (3)

where f_i represents the invertible and differentiable nonlinear functions and a_ij denotes the scalar entries of the regular mixing matrix A. In PNL mixing, the vector f = [f_1, ..., f_N]^T denotes the component-wise nonlinear channel transfer function, which may be different for each channel. This model has application in many scenarios, such as the nonlinear characteristic introduced by the preamplifiers of the receiving sensors in sensor arrays, under the assumption of linear mixing behavior of the environment.

As demonstrated by Taleb and Jutten [13], the PNL model has a favorable separability property. This means that the sources estimated using an INFOMAX criterion on the outputs are equal to the unknown sources, with the same alterations noted in linear memoryless mixtures

y = PΛs + t (4)

where P and Λ are permutation and diagonal matrices, respectively, and t is a constant translation vector.

An extension to a more general nonlinear model can be made by introducing a subsequent stage that generates cross-correlation between the channels. This more general model, called PNL-L, introduces additional problems, and in many cases no solution can be found due to the many local minima of the cost function. The observed mixtures are defined as

x = B f(As). (5)

In synthesis, the unknown mixing system is modeled as a cascade of an instantaneous linear mixing, a nonlinear function transformation, and an additional linear mixing stage that generates cross-correlation (see Fig. 1). It is therefore natural to consider the corresponding separating system as the inverse transformation g of the nonlinear part, followed by a demixing matrix W suitably determined during an adaptation process. The observed signals are first combined with a matrix C to eliminate or reduce the cross-correlation; in the particular case of PNL mixing, the matrix C is the identity matrix I. This separating system is illustrated in Fig. 2.
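The PNL-L cascade and its mirror-image separating structure of Fig. 2 can be sketched as follows (Python/NumPy; the matrices, the tanh channel nonlinearity, and their exact inverses are illustrative assumptions — in blind operation the three stages are adapted from the data, not known in advance):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-1.0, 1.0, size=(2, 500))   # independent sources

A = np.array([[1.0, 0.5],                   # first linear mixing stage
              [0.3, 1.0]])
B = np.array([[1.0, 0.2],                   # second linear stage, creating
              [0.1, 1.0]])                  # cross-correlation between channels

f = np.tanh                                 # component-wise channel nonlinearity
x = B @ f(A @ s)                            # PNL-L observations, cf. (5)

# Separating cascade, built here from the known model purely as a structural
# sanity check of Fig. 2: one matrix removes the cross-correlation, a
# component-wise function inverts the channel nonlinearity, and a final
# demixing matrix inverts the remaining linear stage.
C = np.linalg.inv(B)
g = np.arctanh
W = np.linalg.inv(A)
y = W @ g(C @ x)
```

With the exact inverses, `y` recovers `s` up to numerical precision; the adaptive algorithm of the following sections has to find all three stages from the statistics of `x` alone.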

III. INFORMATION-MAXIMIZATION FOR NONLINEAR BLIND SEPARATION

It is well known that separation of independent sources is possible using concepts derived from information theory. Two or more random variables y_1, ..., y_N are stochastically independent if knowledge of the values of one of them tells us nothing about the values of the others. More generally, a set of signals is independent if their joint pdf can be decomposed as

p(y) = Π_i p_i(y_i) (6)


Fig. 1. PNL-L mixing model.

Fig. 2. Nonlinear blind separation structure.

where p_i is the pdf of the ith source signal. Let y be the estimated source-signal vector and let p_y(y) be its pdf. In order to measure the degree of independence, we use an adequately chosen independent probability distribution q(y) = Π_i q_i(y_i) and consider the Kullback–Leibler (KL) divergence between the two probability distributions p_y and q

KL(p_y || q) = ∫ p_y(y) log [ p_y(y) / q(y) ] dy (7)

This measure is nonnegative and vanishes (reaching its minimum value) if and only if p_y = q; in other words, when the vector y is mutually independent component-wise. Minimizing the KL divergence can make the estimated source signals independent; it is also equivalent to minimizing Shannon's mutual information of the estimated signals

I(y) = Σ_i H(y_i) − H(y) (8)

where H(y) = −E[log p_y(y)] is the entropy of y and H(y_i) is the entropy of its ith component.
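The quantity in (8) can be checked on a toy discrete example (Python; my own illustration, not from the paper), where the entropies are estimated from empirical frequencies: independent components give I(y) ≈ 0, while a duplicated component gives I(y) ≈ H(y_1):

```python
import numpy as np
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy (in bits) of a sequence of outcomes."""
    counts = Counter(samples)
    n = len(samples)
    p = np.array([c / n for c in counts.values()])
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(2)
a = rng.integers(0, 2, 10000)
b = rng.integers(0, 2, 10000)

# Independent bits: I = H(y1) + H(y2) - H(y1, y2) is close to zero
I_indep = entropy_bits(a) + entropy_bits(b) - entropy_bits(list(zip(a, b)))

# Fully dependent components (y2 = y1): I is close to one bit
I_dep = entropy_bits(a) + entropy_bits(a) - entropy_bits(list(zip(a, a)))
```

Driving I(y) toward zero is exactly what the INFOMAX-based adaptation of the following sections does, except that there the signals are continuous and the entropies cannot be evaluated directly.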

Another important property of the KL divergence is its invariance under an invertible nonlinear transformation g(·) of the data samples

KL(p_x || q_x) = KL(p_g(x) || q_g(x)). (9)

As a consequence, mutual independence will not be affected by any invertible nonlinear function transformation. If we represent the overall separating system as a nonlinear differentiable mapping y = g(x), the relation between the input and output joint distributions of this mapping is given by

p_y(y) = p_x(x) / |det J| (10)

where det J is the determinant of the Jacobian matrix of the transformation

J = [ ∂y_i / ∂x_j ],  i, j = 1, ..., N. (11)

The entropy of the transformed output signal is given by

H(y) = E[ log |det J| ] + H(x). (12)

The optimal solution of the information-theoretic criterion is obtained when the maximum entropy of y is reached, which happens when p_y is a uniform distribution [1]; in this case, we have I(y) = 0. Since the second term in (12) does not contain any model parameter, maximizing only the first term with respect to the model parameter set performs the maximization of H(y). Using the gradient-ascent learning algorithm, we have to consider the


Fig. 3. Locally adaptive spline/B-spline functions, each defined inside a single span.

derivative of the entropy function H(y) with respect to the model parameters (the proof is reported in the Appendix)

(13)
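The entropy relation in (12), which underlies the gradient computation, can be verified numerically for a scalar invertible map (Python; the map y = x³ + x and the histogram entropy estimator are my own illustration):

```python
import numpy as np

def diff_entropy(samples, bins=100):
    """Histogram estimate of differential entropy, in nats."""
    hist, edges = np.histogram(samples, bins=bins, density=True)
    widths = np.diff(edges)
    mask = hist > 0
    return float(-(hist[mask] * np.log(hist[mask]) * widths[mask]).sum())

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 200000)

y = x**3 + x                      # invertible, differentiable mapping
jac = 3.0 * x**2 + 1.0            # |dy/dx| > 0 everywhere

lhs = diff_entropy(y)                          # H(y)
rhs = np.mean(np.log(jac)) + diff_entropy(x)   # E[log|dy/dx|] + H(x), cf. (12)
```

The two estimates agree up to the histogram discretization error; maximizing H(y) therefore involves only the Jacobian term, since the entropy of the input is fixed.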

IV. ADAPTIVE SPLINE NETWORK

A. Adaptive Spline Flexible Activation Function

Spline activation functions are smooth parametric curves, divided into multiple tracts (spans), each controlled by four control points. Let h(x) be the nonlinear function to be reproduced; then, the spline activation function can be expressed as

y = h(x) ≈ F_i(u) (14)

i.e., as a composition of spans F_i(u) (where N is the total number of control points), each depending on a local variable u and controlled by the four control points Q_i, Q_{i+1}, Q_{i+2}, Q_{i+3}, as shown in Fig. 3. The parameters i and u can be derived through a dummy variable z that shifts and normalizes the input

(15)

z = 0 if z < 0;  z = N − 3 if z > N − 3;  z unchanged otherwise (16)

where Δx is the fixed distance between two adjacent control points. The constraints imposed by (16) are necessary to keep the input within the active region that encloses the control points. Separating z into its integer and fractional parts using the floor operator ⌊·⌋, we finally get

i = ⌊z⌋ and u = z − ⌊z⌋. (17)

Fig. 4. Adaptive spline blind separation system.

In matrix form, the output can be expressed as

y = F_i(u) = u^T M Q_i (18)

where

u = [u³ u² u 1]^T (19)

M = (1/2) [−1 3 −3 1; 2 −5 4 −1; −1 0 1 0; 0 2 0 0] (Catmull–Rom spline) or
M = (1/6) [−1 3 −3 1; 3 −6 3 0; −3 0 3 0; 1 4 1 0] (B-spline) (20)

Q_i = [Q_i Q_{i+1} Q_{i+2} Q_{i+3}]^T (21)

with 0 ≤ u ≤ 1; M is the coefficient matrix of the spline/B-spline interpolation. In order to ensure the monotonically increasing characteristic of the function, the following additional constraint must be imposed:

Q_{i+1} > Q_i,  i = 1, ..., N − 1. (22)
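A single span of such a spline activation can be evaluated in a few lines (Python/NumPy). The Catmull–Rom coefficient matrix below is the standard one; the input shift that selects the span index and local abscissa follows one possible convention, since the exact offset of (15) is not reproduced in this transcription:

```python
import numpy as np

# Standard Catmull-Rom coefficient matrix, cf. (20)
M_CR = 0.5 * np.array([[-1.0,  3.0, -3.0,  1.0],
                       [ 2.0, -5.0,  4.0, -1.0],
                       [-1.0,  0.0,  1.0,  0.0],
                       [ 0.0,  2.0,  0.0,  0.0]])

def spline_eval(x, q, dx):
    """Evaluate a Catmull-Rom spline activation at x.

    q  : ordinates of the N control points, assumed uniformly spaced dx
         apart on the abscissa and centred on the origin (assumed convention).
    dx : fixed abscissa distance between two adjacent control points.
    """
    n = len(q)
    z = x / dx + (n - 3) / 2          # shift/normalize the input (assumed offset)
    z = min(max(z, 0.0), n - 3.0)     # clamp to the active region, cf. (16)
    i = min(int(np.floor(z)), n - 4)  # span index ...
    u = z - i                         # ... and local abscissa, cf. (17)
    U = np.array([u**3, u**2, u, 1.0])
    return float(U @ M_CR @ q[i:i + 4])   # y = u^T M Q_i, cf. (18)

# Monotonically increasing control points (constraint (22)); equally spaced
# ordinates initialize the activation as the identity on its active region.
q = np.linspace(-2.0, 2.0, 21)
```

Since Catmull–Rom splines reproduce straight lines exactly, equally spaced collinear control points make the function the identity; adapting individual ordinates then bends the curve only locally, which is the local-adaptation property exploited by the learning algorithm.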

B. Structure of the Spline Neural Separating System

We employed the spline neuron structure to implement the blind separation system. Both the output nonlinearities and the input nonlinearities are spline-based functions. The structure of the nonlinear BSS system is depicted in Fig. 4. The parameter set for this model includes the elements of the demixing matrices, the spline control points of each input nonlinearity, and the spline control points of each output nonlinearity.

C. Learning Algorithm Using INFOMAX Criteria

We can derive the learning algorithm for the parameters of the nonlinear separating model using a gradient-ascent method, on the basis of the entropy-maximization criterion. According to (13), we have (see the Appendix for details)

(23)


(24)

(25)

(26)

where w_j denotes the jth column of the demixing matrix, and

(27)

(28)

(29)

(30)

(31)

(32)

is a diagonal matrix. We can also adjust the demixing matrices using the natural gradient method proposed by Amari and Cichocki [19], which simplifies the learning rule, avoids their inversion, and accelerates the convergence of the learning process

ΔW ∝ (I − φ(y)yᵀ)W (33)

(34)

where I denotes the identity matrix. Let k be the time-index step and η the learning rate; the adaptation rules are

(35)

(36)

(37)

(38)
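For the demixing matrix, the natural-gradient rule takes the well-known form ΔW = η(I − φ(y)yᵀ)W. The batch sketch below (Python/NumPy; the sources, mixing matrix, tanh score nonlinearity, and learning rate are all illustrative choices, not the paper's) separates a linear mixture of two super-Gaussian sources:

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 2, 10000

# Super-Gaussian (Laplacian) independent sources
s = rng.laplace(0.0, 1.0, size=(n, T))
A = np.array([[1.0, 0.7],
              [0.5, 1.0]])
x = A @ s

W = np.eye(n)
eta = 0.05
for _ in range(1000):
    y = W @ x
    phi = np.tanh(y)                               # illustrative score nonlinearity
    W += eta * (np.eye(n) - (phi @ y.T) / T) @ W   # natural-gradient step, cf. (33)

P = W @ A   # global mixing/demixing system: ideally a scaled permutation
```

After convergence, each row of P = WA is dominated by a single entry, i.e., each output carries one source up to scale and order — exactly the indeterminacies of (4).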

D. Discussion

In Section III, we have seen that minimizing the KL divergence can make the estimated source signals independent; this measure of independence is also equal to Shannon's mutual information between the components of the output vector

I(y) = Σ_i H(y_i) − H(y) (39)

where

H(y_i) = −E[log p_i(y_i)]. (40)

The employment of mutual information as a cost function, as defined above, is not easy because the marginal entropies H(y_i) strictly depend on the marginal pdfs of the outputs, which are unknown and vary during the adaptation of the model parameters. Let us refer to the model of Fig. 4.

The minimization of I(y) requires the computation of its gradient with respect to the model parameters. In detail, to estimate the demixing matrix coefficients, we must compute

(41)

In [13], the authors introduced and demonstrated the following lemma.

Lemma 1: Let y be a random variable, and let g(y, θ) be a function of y, differentiable with respect to the nonrandom parameter θ and such that g(y, θ) accepts a differentiable pdf. Then

(42)

where φ(·) = p′(·)/p(·) is called the score function (SF) of the corresponding variable. Using Lemma 1, we have

(43)

where the SF vector is as defined above. The derivative of the mutual information with respect to the parameters of the spline nonlinear functions is

(44)


Using Lemma 1 again, the first term of (44) becomes

(45)

while the second term becomes

(46)

Combining the two terms, we obtain

(47)

Finally, we consider the input linear stage: to estimate the matrix coefficients, we have to compute

(48)

rewriting the first term, we have

(49)

and for the second term

(50)

Combining the two terms, we have

(51)

Comparing (43), (47), and (51) with (25), (24), and (26), respectively, we can note that the two approaches are equivalent except for the definition of the SF vector. The derivatives of the mutual information with respect to the model parameters point out the relevance of the SFs. In the entropy-maximization approach, a layer of nonlinear functions is applied to the outputs and employed to update the internal model parameters. In the mutual-information-minimization approach, SF estimation is required to compute the adaptations of the parameters. The difference between the two approaches consists only in the estimation of the SFs. In the first case, they are defined as the ratio between the second derivative and the first derivative of the output spline nonlinear function; in the other case, they can be directly estimated [13].

E. Direct Estimation of Spline-Based SF

Following the approach introduced by Taleb and Jutten in [13], we can redefine the output nonlinearities as the SFs employed by the learning algorithm. We put

(52)

and then introduce a nonlinear parametric function ψ(y, θ) to approximate each SF φ(y). The parameter vector θ is estimated by minimizing the mean square error

E(θ) = E[(ψ(y, θ) − φ(y))²] (53)

according to a gradient-descent algorithm

(54)

The gradient of E(θ) with respect to the parameters θ is given by

(55)

This expression still contains the unknown SFs φ(y). Another lemma is necessary to overcome this problem (see [13] for the proof).

Lemma 2: Let y be a random variable, and let φ_y(y) be its SF. If h(y) is a differentiable function satisfying lim_{|y|→∞} h(y) p_y(y) = 0, then

E[φ_y(y) h(y)] = −E[h′(y)]. (56)

Applying this lemma to (55), we have

(57)

where the unknown terms φ(y) are no longer present. We propose to implement the functions ψ(y, θ) using the spline approximation scheme defined above. Each nonlinear function is locally defined as

(58)


Fig. 5. Experiment 1. (a) Two source signals: a modulated signal and a sinusoid. (b) Nonlinear mixtures of source signals. (c) Separated signals.

(59)

(60)

(61)

The advantage of this second approach is that each element of the SF vector is not a ratio between two functions, but an approximating spline function. This avoids the instability introduced by singular points generated by the ratios.
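The effect of Lemma 2 can be seen on a toy parametric SF model (Python; the linear model ψ(y; a) = a·y and the Gaussian data are my own illustration, standing in for the paper's spline parametrization). The MSE gradient of (55) loses the unknown φ(y) and becomes computable from samples alone:

```python
import numpy as np

rng = np.random.default_rng(5)
sigma = 2.0
y = sigma * rng.standard_normal(100000)

# Toy parametric SF model: psi(y; a) = a * y.
# Applying Lemma 2 to the MSE gradient replaces the term containing the
# unknown SF phi(y) by a derivative of psi, giving
#   dE/da = E[psi * dpsi/da] + E[d(dpsi/da)/dy] = a * E[y^2] + 1
a = 0.0
mu = 0.05
for _ in range(500):
    grad = a * np.mean(y**2) + 1.0
    a -= mu * grad

# For Gaussian data the true SF is p'(y)/p(y) = -y / sigma^2,
# so a should converge to about -1/sigma^2 = -0.25.
```

Note that the loop never evaluates the true SF; only sample moments of `y` and derivatives of the parametric model appear, which is exactly what makes the direct-estimation approach practical.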

V. EXPERIMENTAL RESULTS

In this section, we present two experiments proposed in [14], in order to compare the performance of our algorithm with the results of the RBF-network separating system, and, in addition, a real audio application. In particular, because of the stability property discussed above, we use the direct SF approach in all the experiments.

A number of simulations have been performed on a Pentium II 300-MHz workstation to fully evaluate the validity of this approach.


TABLE I. AVERAGED INDEX S FOR EXPERIMENT 2. THE MEAN VALUE ⟨S⟩ HAS BEEN COMPUTED USING FIVE SIGNALS (OBTAINED USING FIVE DIFFERENT NETWORK INITIAL CONDITIONS). EACH TEST HAS BEEN REPEATED USING A DIFFERENT MIXING MATRIX A WITH A DIFFERENT det(A)

Experiment 1: Consider a two-channel nonlinear mixture with cubic nonlinearity

(62)

where the two mixing matrices are defined as

The source vector is given by an amplitude-modulated signal and a sinusoidal signal

(63)

Fig. 5(a)–(c) shows, respectively, the input signals, the nonlinear mixtures, and the separated signals for this example. For this experiment, we employed a network structure with two neurons in the input and output layers, with spline nonlinearities defined by 31 control points. The overall learning process takes less than 1 min with small learning-rate values. As shown in Fig. 5(c), the proposed algorithm can give a clear separation of the nonlinear mixtures, without any knowledge of the original source signals.

Experiment 2: Consider a four-channel nonlinear mixture with bipolar sigmoid nonlinearity

(64)

where the mixing matrices are nonsingular and chosen as

We assume that the source vector consists of a binary signal, a sinusoid, a saw-toothed wave (ramp), and a high-frequency carrier [Fig. 6(a)]

(65)

In our experiments, we adopted a network structure with four neurons in the input and output layers; both layers were characterized by spline nonlinearities with 51 control points. The spline functions were initialized as the identity function for the input layer and as the zero function for the output layer, and small learning-rate values were fixed.

In order to evaluate the separation success of the proposed algorithm, a measure of the separation quality has been employed. According to the method proposed in [22], we use the following index:

(66)

where y_{i,j} is the ith output of the cascade mixing/unmixing system when only the jth source is active.
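As a concrete reading of this measurement protocol (Python/NumPy; the exact formula of (66) follows [22] and is not reproduced in this transcription, so the dB-ratio form below is an assumption), one feeds the cascade with one source at a time and compares the power each output receives from its own source with the power leaking in from the others:

```python
import numpy as np

def separation_index_db(system, sources):
    """Per-output signal-to-interference index, in dB (an assumed form of (66)).

    `system` maps an (n, T) source matrix to the (n, T) output of the
    cascade mixing/unmixing system.  Output i is scored by the power it
    receives when only source i is active versus the total power it
    receives when each other source is active alone.
    """
    n, _ = sources.shape
    power = np.zeros((n, n))        # power[i, j]: output i, only source j active
    for j in range(n):
        solo = np.zeros_like(sources)
        solo[j] = sources[j]
        out = system(solo)
        power[:, j] = np.mean(out**2, axis=1)
    own = np.diag(power)
    leak = power.sum(axis=1) - own
    return 10.0 * np.log10(own / leak)

# Sanity check on a near-diagonal global (mixing/unmixing) linear map:
rng = np.random.default_rng(6)
s = rng.uniform(-1.0, 1.0, size=(2, 1000))
G = np.array([[1.0, 0.05],
              [0.05, 1.0]])         # 5% residual crosstalk (illustrative)
idx = separation_index_db(lambda z: G @ z, s)
```

A perfectly separated channel gives a large index; the 5% residual crosstalk above yields roughly 26 dB per channel.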

In order to experimentally estimate the convergence capabilities of the proposed method, the index has been statistically evaluated over five tests. In Table I, we report the statistical results of the indexes averaged over five tests, obtained using different network initial conditions and different mixing matrices

(such that the determinant of the mixing matrix belongs to a specific range). Note that theoretical separability properties for PNL-L mixtures are not available, i.e., after achieving output independence, there is no guarantee of source separation. Although there is no mathematical proof, the simulation results reported in Table I indicate a successful source separation in all the performed tests.

The overall learning process with the proposed algorithm takes about 80 s (on a Pentium II 300 MHz) and generally reaches convergence after 100 000 learning steps (or input points). The separated signals are shown in Fig. 6(c), and the input nonlinear mixtures are depicted in Fig. 6(b).

Experiment 3: The last experiment is a real audio application: consider a four-channel post-nonlinear mixture with different nonlinearities

(67)


Fig. 6. Experiment 2. (a) Four source signals: a binary signal, a sinusoid, a ramp, and a carrier. (b) Nonlinear mixtures of the source signals. (c) Separated signals.


Fig. 7. Experiment 3. Separation of four mixed audio sources: a male voice (1), two music tracks (2), (4), and noise (3).

where the mixing matrix is nonsingular and chosen as

The four audio sources are a male voice, two music tracks, and a noisy channel. Fig. 7 shows the separation results for this example. The network structure adopted was the same as in the previous experiment. The overall learning process takes about 2 min with small learning-rate values.

VI. CONCLUSION

A model for blind demixing of PNL-L mixtures, based on a flexible B-spline activation-function neuron, has been proposed.

Based on a gradient-ascent method according to the INFOMAX criterion with SF estimation, a suitable learning algorithm for the parameters of the nonlinear separating model has been derived.

We have defined two adaptive approaches. In the first, we use spline functions to compensate the channel nonlinearities and demixing matrices to invert the linear part of the channel. We update the spline control-point vectors and the demixing matrices by computing the ratios between the second and first derivatives of these spline functions (as required by the adaptation algorithm). In the second approach, we use splines to represent those derivative ratios directly. We update the demixing matrices using a vector whose elements are not ratios between two functions but a single approximating spline function, called SF. This avoids the instability at singular points caused by the ratio computation.
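The derivative ratio driving the first approach can be sketched for a single uniform spline span; the Catmull-Rom basis below is a common interpolating choice in spline-neuron work and is our assumption, not necessarily the paper's exact B-spline basis:

```python
import numpy as np

def spline_ratio(u, Q):
    """Evaluate F''(u) / F'(u) on one uniform Catmull-Rom spline span.

    u : local abscissa in [0, 1)
    Q : the four control-point ordinates of the active span
    """
    # Catmull-Rom basis matrix; rows weight u^3, u^2, u, 1.
    M = 0.5 * np.array([[-1.0,  3.0, -3.0,  1.0],
                        [ 2.0, -5.0,  4.0, -1.0],
                        [-1.0,  0.0,  1.0,  0.0],
                        [ 0.0,  2.0,  0.0,  0.0]])
    c = M @ np.asarray(Q, dtype=float)     # cubic coefficients, highest first
    d1 = np.polyval(np.polyder(c), u)      # first derivative F'(u)
    d2 = np.polyval(np.polyder(c, 2), u)   # second derivative F''(u)
    return d2 / d1                         # unstable wherever F'(u) -> 0

# Equispaced control points give a locally linear span, so the ratio is 0.
print(spline_ratio(0.5, [0.0, 1.0, 2.0, 3.0]))  # 0.0
```

The division by F'(u) is exactly the singular-point hazard that the SF of the second approach avoids, by adapting a spline that represents the ratio itself.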

Due to the local adaptation capability of the spline nonlinear functions, this model is characterized by a fast learning convergence rate and can be applied to real-time signal separation.

Although theoretical separability properties for PNL-L mixtures are not available, the reported experiments (see, e.g., Table I) indicate that source separation can be obtained under different initial network conditions. In other words, although output independence has not been proved to be theoretically sufficient for such mixtures, the statistical independence of the outputs has, in practice, yielded separation in our model.

Moreover, a relevant feature of the proposed method is that separation of strongly nonlinear mixtures is possible without any knowledge of the original source signals.

APPENDIX

Here, we give a detailed explanation of the learning-algorithm formulas. Adopting an INFOMAX criterion, the entropy of the transformed output signal is given by

H(y) = H(x) + E{ln |det J|} (68)

where J is the Jacobian matrix of the transformation, and its determinant is

(69)

Referring to the adaptive spline separation structure depicted in Fig. 3, we have

(70)

and

(71)

In matrix form, the determinant can be rewritten as (72), shown at the bottom of the page. Using the gradient-ascent learning algorithm, we have to consider the derivative of the entropy function with respect to the model parameters. For a generic model parameter, we have

(73)

In particular, the updating of the output spline function is performed by the computation of

(74)

(75)

(76)

with , , , and

(77)

where is the th column of the matrix; finally, we have

(78)

(72)



Updating the input spline function is quite a bit more complicated, because we have to take into account the backpropagation of the entropy function through the output functions and matrix

(79)

(80)

(81)

where

(82)

and indicates the th column of the matrix. Let us now derive the algorithm for the demixing matrix as follows:

(83)

(84)

(85)
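The explicit demixing-matrix update (83)-(85) is not recoverable from this text; under the INFOMAX criterion, a standard Bell-Sejnowski entropy-gradient step takes the following form, where `phi` stands in for the derivative ratio that the paper's SF approximates with a spline (this is a sketch, not the authors' exact rule):

```python
import numpy as np

def infomax_step(W, x, phi, lr=0.01):
    """One stochastic gradient-ascent step on the output entropy for the
    linear demixing stage (Bell-Sejnowski form; illustrative).

    phi(y) plays the role of the ratio g''/g' of the output nonlinearity.
    """
    y = W @ x
    # Gradient of E{ln|det J|}: the inverse-transpose term plus the
    # outer product of the score function with the input.
    grad = np.linalg.inv(W).T + np.outer(phi(y), x)
    return W + lr * grad

# Toy usage: phi(y) = -tanh(y), a common choice for super-Gaussian sources.
rng = np.random.default_rng(1)
W = np.eye(2)
for _ in range(100):
    x = rng.standard_normal(2)
    W = infomax_step(W, x, lambda y: -np.tanh(y))
print(W.shape)  # (2, 2)
```

In the proposed architecture this update is evaluated with the SF spline output in place of the analytic ratio, which is what removes the instability near points where the first derivative vanishes.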

and for the input matrix

(86)

(87)



(88)

(89)

where

(90)

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.

REFERENCES

[1] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Comput., vol. 7, pp. 1129–1159, 1995.

[2] J. Dehaene and N. Twun-Danso, "Local adaptive algorithm for information maximization in neural networks, and application to source separation," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Munich, Germany, Apr. 1997, pp. 59–62.

[3] C. C. Cheung and L. Xu, "Separation of two independent sources by the information-theoretic approach with cubic nonlinearity," in Proc. IEEE Int. Conf. Neural Networks, vol. 4, Houston, TX, June 1997, pp. 2239–2244.

[4] L. Xu, C. C. Cheung, H. H. Yang, and S. Amari, "Independent component analysis by the information theoretic approach with mixture of densities," in Proc. IEEE Int. Conf. Neural Networks, vol. 3, Houston, TX, June 1997, pp. 1821–1826.

[5] S. Fiori, P. Bucciarelli, and F. Piazza, "Blind signal flatting using warping neural models," in Proc. Int. Joint Conf. Neural Networks, vol. 3, 1998, pp. 2312–2317.

[6] S. Guarnieri, F. Piazza, and A. Uncini, "Multilayer feedforward networks with adaptive spline activation function," IEEE Trans. Neural Networks, vol. 10, pp. 672–683, May 1999.

[7] L. Vecci, F. Piazza, and A. Uncini, "Learning and approximation capabilities of adaptive spline activation function neural networks," Neural Networks, vol. 11, pp. 259–270, 1998.

[8] A. Uncini, L. Vecci, P. Campolucci, and F. Piazza, "Complex-valued neural networks with adaptive spline activation function for digital radio links nonlinear equalization," IEEE Trans. Signal Processing, vol. 47, pp. 505–514, Feb. 1999.

[9] G. Deco and W. Brauer, "Nonlinear higher-order statistical decorrelation by volume-conserving neural architectures," Neural Networks, vol. 8, pp. 525–535, 1995.

[10] P. Pajunen, A. Hyvarinen, and J. Karhunen, "Nonlinear blind source separation by self-organizing maps," in Proc. Int. Conf. Neural Information Processing Systems, vol. 2, Hong Kong, Sept. 1996, pp. 1207–1210.

[11] H. H. Yang, S. I. Amari, and A. Cichocki, "Information back-propagation for blind separation of sources from nonlinear mixtures," in Proc. Int. Conf. Neural Networks, vol. 4, Houston, TX, 1996, pp. 2141–2146.

[12] A. Taleb, C. Jutten, and S. Olympieff, "Source separation in post-nonlinear mixtures: An entropy-based algorithm," in Proc. Eur. Symp. Artificial Neural Networks, 1998, pp. 2089–2092.

[13] A. Taleb and C. Jutten, "Source separation in post-nonlinear mixtures," IEEE Trans. Signal Processing, vol. 47, pp. 2807–2820, Oct. 1999.

[14] Y. Tan, J. Wang, and J. M. Zurada, "Nonlinear blind source separation using a radial basis function," IEEE Trans. Neural Networks, vol. 12, pp. 124–134, Jan. 2001.

[15] L. B. Almeida, "MISEP—An ICA method for linear and nonlinear mixtures, based on mutual information," in Proc. WCCI Int. Joint Conf. Neural Networks, May 2002, pp. 442–447.

[16] M. Solazzi, F. Piazza, and A. Uncini, "An adaptive spline nonlinear function for blind signal processing," in Proc. IEEE Workshop Neural Networks for Signal Processing, vol. 1, Sydney, Australia, Dec. 11–13, 2000, pp. 396–404.

[17] A. Pierani, F. Piazza, M. Solazzi, and A. Uncini, "Low complexity adaptive nonlinear function for blind signal separation," in Proc. IEEE Int. Joint Conf. Neural Networks, vol. 3, Como, Italy, July 24–27, 2000, pp. 333–338.

[18] J. F. Cardoso, "Blind signal separation: Statistical principles," Proc. IEEE, vol. 86, pp. 2009–2025, Oct. 1998.

[19] S. I. Amari and A. Cichocki, "Adaptive blind signal processing: Neural network approaches," Proc. IEEE, vol. 86, pp. 2026–2048, Oct. 1998.

[20] L. Q. Zhang, A. Cichocki, and S. Amari, "Natural gradient algorithm for blind separation of overdetermined mixture with additive noise," IEEE Signal Processing Lett., vol. 6, pp. 293–295, Nov. 1999.

[21] T.-W. Lee, B. Koehler, and R. Orglmeister, "Blind source separation of nonlinear mixing models," in Neural Networks for Signal Processing VII, Sept. 1997, pp. 406–415.

[22] D. Schobben, K. Torkkola, and P. Smaragdis, "Evaluation of blind signal separation methods," in Proc. Int. Workshop Independent Component Analysis Blind Signal Separation, Aussois, France, Jan. 11–15, 1999.

Mirko Solazzi was born in Jesi, Italy, in 1973. He received the laurea degree in electronic engineering (with honors) and the Ph.D. degree in electronic engineering from the University of Ancona, Ancona, Italy, and the Politecnico di Bari, Bari, Italy, in 1998 and 2002, respectively.

His research interests include adaptive neural networks, digital signal processing, and nonlinear blind source separation.

Aurelio Uncini (M'88) was born in Cupra Montana, Italy, in 1958. He received the laurea degree in electronic engineering and the Ph.D. degree in electrical engineering from the University of Ancona, Ancona, Italy, and the University of Bologna, Bologna, Italy, in 1983 and 1994, respectively.

From 1984 to 1986, he was with the "Ugo Bordoni" Foundation, Rome, Italy, engaged in research on digital processing of speech signals. From 1986 to 1987, he was with the Italian Ministry of Communication. From 1987 to 1993, he was an independent researcher affiliated with the Department of Electronics and Automatics, University of Ancona, where, from 1994 to 1998, he was an Assistant Professor. Since November 1998, he has been an Associate Professor in the INFOCOM Department, University of Rome "La Sapienza," Rome, Italy, where he teaches circuit theory, audio signal processing, and neural networks. He is the author of more than 100 papers in the fields of circuit theory, optimization algorithms for circuit design, neural networks, and signal processing. His present research interests also include adaptive filters, audio processing, neural networks for signal processing, and blind signal processing.

Prof. Uncini is a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society, the Technical Committee on Blind Signal Processing of the IEEE Circuits and Systems Society, the Associazione Elettrotecnica ed Elettronica Italiana (AEI), the International Neural Networks Society (INNS), and the Società Italiana Reti Neuroniche (SIREN).