
CHAPTER 3

ARTIFICIAL NEURAL NETWORKS AND LEARNING ALGORITHM

3.1 ARTIFICIAL NEURAL NETWORKS

3.1.1 Introduction

The notion of computing takes many forms. Historically, the term 'computing' has been dominated by the concept of 'programmed computing' rather than by neural computing. In programmed computing, algorithms are designed and subsequently implemented using the currently dominant architecture, whereas in neural computing, learning replaces a priori program development. Neural computing offers a potential solution to many problems that remain unsolved in conventional computing. Artificial neural networks provide new tools and new foundations for solving practical problems in prediction, decision and control, signal separation, state estimation, pattern recognition, data mining, etc. Traditional statistics tries to collect a huge library of different methods for different tasks, but the brain is living proof that one system can do it all, given data. It shows that a system can manage millions of variables without becoming confused. Nowadays, engineers and scientists are trying to develop intelligent machines. Artificial Neural Systems (ANS) are examples of such machines that have great potential to further improve the quality of human life.

Artificial neural networks are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. They are capable of developing their behavior through learning, and they learn from experience much like the human brain. They are dynamical systems in the learning/training phase of their operation, and convergence is an essential feature. There are different types of ANNs. Some of the popular models include the Backpropagation Network (BPN), which is generally trained with the Generalized Delta Rule (GDR), Learning Vector Quantization (LVQ), Radial Basis Function (RBF), Hopfield, Adaptive Resonance Theory (ART) and Kohonen's Self-Organizing Feature Map (SOM) networks. Some ANNs are classified as feedforward while others are recurrent (i.e., they implement feedback), depending on how data is processed through the network.

The synaptic weight update of ANNs can be carried out by supervised methods, by unsupervised methods, or by fixed weight association network methods. In the supervised methods, both inputs and outputs are used, whereas in the unsupervised methods only the inputs are used. In the fixed weight association network methods, inputs and outputs are used along with pre-computed and pre-stored weights. Artificial neural networks are used where

• a conventional process is not suitable.

• the conventional method cannot be delivered easily.

• the conventional method cannot fully capture the complexity in the data and the stochastic behavior is important.

• an explanation of the network's decision is not required.

The function of a single neuron in an ANN is shown in Figure 3.1.

• Each neuron receives inputs from other neurons and/or from external sources.

• Like a real neuron, the processing element has many inputs but only a single output, which is connected to many other processing elements in the network.

• Each processing element is numbered. It applies a linear/nonlinear function to its net input to compute the output.

Figure 3.1 Function of a single neuron

In the above figure, the input pattern is represented by

$X = [x_1, x_2, \ldots, x_n]^T$ (3.1)

Each artificial neuron's input has an associated weight which indicates the fraction (or amount) of the transfer of one neuron's output to another neuron's input. The inputs for an artificial neuron are from external sources or from other neurons. The notation $w_{ij}$ is used to represent the weight on the interconnection link from neuron 'j' to neuron 'i'. The weight vector 'W' is represented as

$W = [w_1, w_2, \ldots, w_n]^T$ (3.2)

The net input value of a single neuron is determined by the weighted sum of its inputs, as given by

$net(i) = net_i = w_{i1} x_1 + w_{i2} x_2 + \cdots + w_{in} x_n$ (3.3)

$= \sum_{k=1}^{n} w_{ik} x_k$ (3.4)

where 'n' is the number of inputs. A neuron (or unit) fires if the sum of its inputs exceeds some threshold value. If it fires, it produces an output, which is sent to the neurons of the next layer.

In vector notation, the net input value is given by

$net_i = W^T X$ (3.5)

The output value of a single neuron is obtained as

$o(i) = f(net_i)$ (3.6)

The objective of neural network design is to determine an optimal set of weights $w^*$. Therefore, the artificial neurons involve two important processes:

(i) determining the net input value by combining the inputs, and

(ii) mapping the net input value into the neuron's output. This mapping may be as simple as using the identity function or as complex as using a nonlinear function.
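As a concrete illustration of these two processes, the following MATLAB sketch evaluates Equations (3.3) to (3.6) for one neuron with three inputs. The numerical values and the choice of a sigmoid for the mapping f are assumptions made only for this example.

    % Net input and output of a single neuron, Equations (3.3)-(3.6).
    X = [0.2; 0.5; 0.9];              % input pattern X = [x1, x2, x3]^T
    W = [0.4; -0.1; 0.7];             % weight vector W = [w1, w2, w3]^T
    net_i = W' * X;                   % weighted sum of inputs
    f = @(v) 1 ./ (1 + exp(-v));      % example nonlinear mapping (sigmoid)
    o_i = f(net_i);                   % neuron output o(i) = f(net_i)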

3.1.2 Backpropagation Neural Network

The separation of speech signals and the recovery of the original signals from a mixed signal is a challenging domain for information and system processing. Attempts to apply artificial neural networks to speech signal separation continue (Cichocki and Unbehauen 1996; Amari and Cichocki 1998; Meyer et al 2006). These studies have demonstrated the separation of extremely weak or badly scaled stationary signals, as well as successful separation even when the mixing matrix is very ill-conditioned.

The Backpropagation (BPN) neural network is a multilayer supervised neural network which uses the steepest descent method to update its weights. Its architecture, given in Figure 3.2, has an input layer with 'I' nodes, one hidden layer with 'J' nodes and an output layer with 'K' nodes. Each neuron in the hidden layer has its own input weights, and the output of a neuron in a layer goes to all neurons in the next layer.

Figure 3.2 Architecture of backpropagation neural network

The initial weight values are randomly generated by the Matlab function rand(). The input layer does not process the mixture input; it simply distributes the input samples to all the neurons in the hidden layer. The output of each hidden layer neuron is obtained by applying the sigmoid function to its net input value. Each hidden neuron output is fed to all the neurons in the output layer. Each neuron in the output layer first calculates its net input value and then applies a nonlinear function to that net input to produce an output value 'm'.
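The forward computation described above can be sketched as follows. The layer sizes, the random input pattern and the use of a sigmoid at the output layer are assumptions made for illustration; R and Z follow the weight-matrix notation introduced in Section 3.1.3.

    % One forward pass through the I-J-K network of Figure 3.2.
    I = 4; J = 15; K = 2;                 % input, hidden and output layer sizes
    R = rand(J, I); Z = rand(K, J);       % randomly initialised weight matrices
    x = rand(I, 1);                       % one mixture input pattern
    sigmoid = @(v) 1 ./ (1 + exp(-v));
    h = sigmoid(R * x);                   % hidden layer outputs (sigmoid of net input)
    m = sigmoid(Z * h);                   % output layer values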

In the earlier stage of this work, an attempt has been made to extract the individual speech signals from an artificially mixed speech signal using the Backpropagation neural network, and its performance is compared with that of the RBF neural network. To compare their performance, the speech waveforms of only two persons are recorded in a closed environment for two seconds at the rate of 8 kHz. The schematic diagram of speech signal recovery by the BPN neural network is shown in Figure 3.3.

Figure 3.3 Schematic diagram of speech signal recovery (training and testing phases)

4000 samples of each speech signal shown in Figure 3.4 are mixed and used for training the BPN network.

Figure 3.4 Two speech signals and the mixed signal: (a) speech signals of two persons; (b) mixed speech signal

Figure 3.5 Recovery of speech signals by the BPN neural network: (a) recovery of speech 1; (b) recovery of speech 2

Figure 3.6 Recovery of speech signals by the RBF neural network: (a) recovery of speech 1; (b) recovery of speech 2

From the simulation result shown in Figure 3.5, it has been observed that the signals recovered by the BPN are distorted, with poor speech quality, for a limited number of iterations (the number at which the RBF neural network converges). The BPN network, which has 15 hidden neurons, takes 65 minutes to reduce the error to 0.01. Its computational load is higher because both the weights between the input and hidden layers and the weights between the hidden and output layers are updated.

The major limitations of the backpropagation neural network are its slow convergence (i.e., long training time) and its tendency to end up in local minima. Moreover, there is no proof of convergence, although it seems to perform well in practice. Because of stochastic gradient descent on a nonlinear error surface, the result is likely to converge to some local minimum of the error surface most of the time. Another major limitation is the problem of scaling: when the complexity of the problem increases, there is no guarantee of good generalization, so the network size, such as the number of hidden layers, has to be increased. This places a heavy burden on the computation and increases the network complexity. The RBF neural network is therefore trained with the same data, and from Figure 3.6 it is observed that the signals recovered by the RBF network are obtained without distortion. The RBF network, which has two hidden neurons, takes 52 seconds to reduce the error to 0.01. Its computational load is lower because only the weights between the hidden and output layers are updated.

3.1.3 Training Strategy of Backpropagation Neural Network

For the network to learn patterns, the weight updating algorithm, the Unsupervised Stochastic Gradient Descent Algorithm (USGDA), has been used. The present work involves modification of the weights to extract the independent signals from mixed signals. The function of the network is based on an unsupervised learning strategy.

The inputs of a pattern are presented, the output of the network at the output layer is computed, and the weights in the output layer and hidden layer are updated by the weight update equations and compared with their previous values. The total error, i.e., the difference between the previous weight values and the current weight values, is determined. The total error over all presented patterns is calculated, and if this total error is greater than zero, the learning rate parameter is varied by Equation (4.1) and the weights are updated by the weight update Equations (3.7) and (3.8). At each iteration, this process decreases the total error of the network over all the presented patterns. To drive the total error towards zero, the network is presented with all the training patterns many times. This procedure is repeated until the error becomes 0.01.

As given in Figure 3.2, 'R' is the weight matrix between the input and hidden layers and 'Z' is the weight matrix between the hidden and output layers. The weight value $z_{kj}$ on the interconnection from neuron 'j' to neuron 'k' is updated by the weight update equation

$z_{kj}(t+1) = z_{kj}(t) + lrp \times \left( d_1(t) - d_2(t) \right) \times O_p^T + \varepsilon$ (3.7)

where

't' - time step,

lrp - learning rate parameter, = 0.99,

$\varepsilon$ - constant parameter, = 0.05,

$d_1(t) = \mathrm{inv}(\det(Z(t)))$,

$d_2(t) = 3 m_k^3 - 4 m_k^5 + 2.92 m_k^7 - 5 m_k^9 + 3.417 m_k^{11} - 0.78 m_k^{13} + 0.056 m_k^{15}$,

$O_p^T$ - transpose of mixture signal 'p'.

Similarly, all the remaining weights between the hidden and output layers are updated by Equation (3.7). The weight value $r_{ji}$ on the interconnection from neuron 'i' to neuron 'j' is updated by the weight update equation

$r_{ji}(t+1) = z_{ji}(t) + lrp \times \left( d_1(t) - d_2(t) \right) \times O_p^T + \varepsilon \times z_{kj}(t) \times \left( 1 + \exp(-net_{pj}) \right)^{-1}$ (3.8)

Similarly, all the remaining weights between the input and hidden layers are updated by Equation (3.8).
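A minimal MATLAB sketch of the output-layer update of Equation (3.7) is given below. The toy dimensions, the random initial values, the square shape of Z (so that det(Z) in d1 is defined) and the reading of the term $O_p^T$ as the row vector holding mixture pattern 'p' are assumptions made only for illustration.

    % One application of the weight update of Equation (3.7).
    K = 2;                                % number of output (and, here, hidden) neurons
    Z = rand(K, K);                       % hidden-to-output weights, kept square (assumption)
    Op = rand(1, K);                      % mixture pattern p as a row vector (assumption)
    m  = rand(K, 1);                      % current network outputs
    lrp = 0.99; epsilon = 0.05;           % parameters quoted with Equation (3.7)
    d1 = inv(det(Z));                     % d1(t) = inv(det(Z(t)))
    for k = 1:K
        d2 = 3*m(k)^3 - 4*m(k)^5 + 2.92*m(k)^7 - 5*m(k)^9 ...
             + 3.417*m(k)^11 - 0.78*m(k)^13 + 0.056*m(k)^15;   % d2(t)
        Z(k,:) = Z(k,:) + lrp * (d1 - d2) * Op + epsilon;       % Equation (3.7)
    end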

To implement this algorithm, the speeches of two persons are recorded in a closed environment for two seconds at the rate of 4 kHz. In this way, 25 speeches of males and 25 speeches of females are recorded and stored as .wav files. Fifty combinations (one male and one female; two males; two females) of different speech waveforms (S1 and S2) are mixed artificially by multiplying the speech signals with various coefficients, as given by

$O_1 = 0.3 \times S_1 + 0.7 \times S_2$ (3.9)
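The mixing of Equation (3.9), together with the zero-to-one normalisation described in the next paragraph, can be sketched in MATLAB as follows. The file names and the min-max form of the normalisation are assumptions made for illustration.

    % Artificial mixing of two recorded speeches, Equation (3.9), and normalisation.
    S1 = wavread('male1.wav', 500);              % first 500 samples of speech 1
    S2 = wavread('female1.wav', 500);            % first 500 samples of speech 2
    O1 = 0.3*S1 + 0.7*S2;                        % mixture signal, Equation (3.9)
    O1 = (O1 - min(O1)) / (max(O1) - min(O1));   % scale the inputs to [0, 1]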

500 samples of each mixture signal are preprocessed by normalization so that the inputs to the nodes of the input layer lie between zero and one. 40 mixture signals are used for training both the BPN and RBF neural networks by the unsupervised stochastic gradient descent algorithm, and 10 mixture signals are used for testing the networks. From Figure 3.7, it is found that after a certain number of iterations (2223 iterations, at which the RBF network converges), the BPN network is able to converge only to some extent. The network is found to converge slowly due to local minima and slow training. When the number of nodes in the hidden layer is 15, the BPN network takes 13 hours and 5 minutes to reduce the error to 0.01. So, it has been observed that the signals recovered by the BPN are distorted for a limited number of iterations (the number at which the ASN-RBF neural network converges); the contents of the speeches are retained, but with distortion in the speech quality.

Figure 3.7 Graph of error versus number of iterations for the BPN and ASN-RBF networks, sample size = 500 (50 mixture signals), MSE = 0.01 and η = 0.99

When the number of nodes in the hidden layer is 2, the RBF neural network takes 4 hours and 33 minutes to reduce the error to 0.01 in 2223 iterations. So, the training time of the RBF network is 8 hours and 32 minutes less than that of the BPN network. Thus, the performance of the RBF network is found to be much superior to that of the BPN in terms of recovering the original signals with less training time.

3.1.4 Adaptive Self-Normalized Radial Basis Function (ASN-RBF) Neural Network

In recent years, there has been increasing interest in using the Radial Basis Function neural network for many problems. Like the Backpropagation and Counterpropagation neural networks, it is a feedforward neural network that is capable of modelling a nonlinear relationship between the input and output vector spaces. RBF and BPN networks are both universal approximators, i.e., when they are designed with enough hidden layer neurons, they can approximate any continuous function with arbitrary accuracy (Girosi and Poggio 1989; Hartman et al 1990). This is a property they share with other feedforward networks having one hidden layer of nonlinear neurons. Hornik et al (1989) have shown that the nonlinearity need not be sigmoid and can be any of a wide range of functions. It is therefore not surprising that there always exists an RBF network capable of accurately mimicking a specified BPN, or vice versa.

The RBF network is found to be suitable for the BSS problem since it has the following characteristics.

1. It has a faster learning capability and is good at handling nonlinear data.

2. It finds the input-to-output mapping using local approximators, which require fewer training samples.

3. It provides smaller interpolation errors, higher reliability and a more well-developed theoretical analysis than the BPN.

As shown in Figure 3.8, the ASN-RBF neural network consists of three layers: an input layer with 500 neurons, a single layer of nonlinear processing neurons and an output layer with 2, 3 or 4 neurons depending on the number of sources. In the Backpropagation neural network, the weights between the hidden and output layers and also the weights between the input and hidden layers are updated during training; in the RBF neural network, only the weights between the hidden and output layers are updated. The RBF network does not end up in local minima. The outputs of the hidden layer neurons are calculated by

$p_i = f_i(o) = \sum_{k=1}^{m} u_{ik}\, \varphi_k(o, c_k) = \sum_{k=1}^{m} u_{ik}\, \varphi_k\left( \left\| o - c_k \right\| \right)$ for $i = 1, 2, \ldots, N$ (3.10)

and the outputs of the output layer neurons are calculated by

$m_j = \beta \sum_{i=1}^{N} w_{ji}\, p_i$, where $\beta = \dfrac{1}{\alpha}$ (3.11)

where $p_i$ is the output of hidden neuron 'i', 'α' is the convergence parameter used in the network, $o \in R^{n+1}$ is an input vector and $\varphi_k(.)$ is a radial basis function given by $\exp\left( -D_i^2 / (2\lambda^2) \right)$, where $D_i^2 = (O - W_{ji})^T (O - W_{ji})$ and 'λ' is the spread factor which controls the width of the radial basis function. $U_{ik}$ is the weight matrix between the input and hidden layers, $W_{ji}$ is the weight matrix between the hidden and output layers, 'N' is the number of neurons in the hidden layer and $c_k \in R^{N \times 1}$ are the RBF centers in the input vector space. For each neuron in the hidden layer, the Euclidean distance between its associated center and the input to the network is computed. The convergence parameter 'α' is used in the network for faster convergence of the proposed learning algorithm. During training, if 'α' is very low, the total error becomes NaN (Not a Number) and the network does not converge. So, the convergence parameter is gradually increased from a low value such that the network does not encounter NaN and converges at a particular value. The total error is thereby reduced to the tolerance value after a finite number of iterations.
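A MATLAB sketch of the forward pass of Equations (3.10) and (3.11) is given below. The layer sizes, the random centres and weights, and the numerical values of λ and α are assumptions for illustration; the input-to-hidden weights $u_{ik}$ are taken as unity here, so each hidden output reduces to a single Gaussian basis function, matching the implementation listed in Section 3.2.2.

    % ASN-RBF forward pass, Equations (3.10) and (3.11).
    num_in = 500; N = 20; num_out = 2;    % input, hidden and output layer sizes
    o = rand(num_in, 1);                  % one normalised mixture pattern
    C = rand(num_in, N);                  % centres c_k, one column per hidden neuron
    W = rand(num_out, N);                 % hidden-to-output weight matrix W_ji
    lambda = 0.01; alpha = 15.5;          % spread factor and convergence parameter
    p = zeros(N, 1);
    for i = 1:N
        D2   = sum((o - C(:,i)).^2);      % squared Euclidean distance to centre i
        p(i) = exp(-D2 / (2*lambda^2));   % Gaussian radial basis, Equation (3.10)
    end
    beta = 1 / alpha;                     % scaling parameter of Equation (3.11)
    m = beta * (W * p);                   % output layer values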

Figure 3.8 Topology of ASN-RBF neural network (m = 500; N = 20-2; n = 2, 3, 4)

The convergence parameters used for the different experimental set-ups are given in Table 3.1. The ASN-RBF neural network architecture is capable of modelling a nonlinear relationship between the input and output vector spaces. The scaling parameter 'β' is used for post-processing to obtain the correct output data. The centers $c_k$ are assumed to perform an adequate sampling of the input vector space; they are usually chosen as a subset of the input data. The weight vector $W_{ik}$ determines the value of 'O' which produces the maximum output from the neuron. The response at other values of 'O' drops quickly as 'O' deviates from 'W', becoming negligible in value when 'O' is far from 'W'.

Table 3.1 Convergence parameters used for experimental set-ups

Source signals                                      No. of samples    Convergence parameter (α)    Max. iterations
crow2.wav, song1.wav                                500               100.5                        2087
crow2.wav, ssong1.wav, wsong2.wav                   500               25.5                         752
male1.wav, female1.wav, male2.wav                   500               15.5                         1349
male1.wav, female1.wav, male2.wav, female2.wav      500               15.5                         720

3.1.5 Training Strategy of ASN-RBF Neural Network

There are two sets of parameters governing the mapping properties of the RBF neural network: the weights $W_{ik}$ in the output layer and the centers $c_k$ of the radial basis functions. The ASN-RBF neural network is trained with fixed centers, which are chosen in a random manner as a subset of the input data set. After the network has been trained, some of the centers are removed in a systematic manner without significant degradation of the system performance. The location of the centers of the receptive fields is a critical issue and there are many alternatives for their determination. In the learning algorithm, a center and the corresponding hidden layer neuron are located at each input vector in the training set. The diameter of the receptive region, determined by the value of the spread factor 'λ' (set at 0.01), has a profound effect upon the accuracy of the system. The objective is to cover the input space with receptive fields as uniformly as possible. If the spacing between centers is not uniform, it is necessary for each hidden neuron to have its own value of 'λ'.
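The random selection of fixed centres described above can be sketched as follows; the data matrix dimensions and the number of centres are assumptions made for illustration.

    % Choose fixed RBF centres as a random subset of the training data.
    X_train = rand(80, 500);              % 80 normalised training patterns (assumed)
    N = 20;                               % number of hidden neurons / centres (assumed)
    idx = randperm(size(X_train, 1));     % random permutation of the pattern indices
    centres = X_train(idx(1:N), :);       % each centre is one training pattern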

100 inputs (bird voices) are used for the proposed network. Of these, 80 inputs are used for training the ASN-RBF neural network and 20 inputs are used for testing it. Each input corresponds to a different combination of bird voices downloaded from the website http://www.nhl.nl/~ribot/english/sounds1.html. Different combinations of the bird voices are artificially mixed and preprocessed by normalization so that the inputs to the nodes of the input layer lie between zero and one. The preprocessed input is fed to the network in the form of 500 samples, corresponding to the 500 neurons in the input layer. The number of source signals is varied from two to four. The ASN-RBF neural network is also trained and tested for the separation of speech signals, which are recorded for about 2 seconds in a closed environment at the rate of 8 kHz. The proposed ASN-RBF neural network and learning algorithm perform well under non-stationary environments and when the number of source signals is unknown and changes dynamically.

3.2 UNSUPERVISED STOCHASTIC GRADIENT DESCENT LEARNING ALGORITHM

To separate independent components from the observed signal, an objective function is required. The objective function is chosen such that it yields the original signals when it is minimized. During training of the ASN-RBF neural network, one set of free parameters, namely the interconnection weights between the hidden and output layers, is adjusted to minimize the objective function. In signal processing (Taleb and Jutten 1998), when the components of the output vector become independent, its joint probability density function factorizes into marginal pdfs, given by

$f_M(M, W) = \prod_{i=1}^{k} f_{M_i}(m_i, W)$ (3.12)

where $f_{M_i}(m_i, W)$ is the marginal pdf of $M_i$, $m_i$ is the $i$th component of the output signal 'M' and 'k' is the number of source signals. Equation (3.12) is a constraint imposed on the learning algorithm. The joint pdf of 'M' parameterized by 'W' is written as

$f_M(M, W) = \dfrac{f_o(O)}{\left| B \right|}$ (3.13)

where $\left| B \right|$ is the determinant of the Jacobian matrix 'B'. It is defined as

$B = \begin{bmatrix} \dfrac{\partial m_1}{\partial o_1} & \dfrac{\partial m_1}{\partial o_2} & \cdots & \dfrac{\partial m_1}{\partial o_k} \\ \dfrac{\partial m_2}{\partial o_1} & \dfrac{\partial m_2}{\partial o_2} & \cdots & \dfrac{\partial m_2}{\partial o_k} \\ \cdots & \cdots & \cdots & \cdots \\ \dfrac{\partial m_k}{\partial o_1} & \dfrac{\partial m_k}{\partial o_2} & \cdots & \dfrac{\partial m_k}{\partial o_k} \end{bmatrix}$ (3.14)

Referring to Equation (1.4), each element in Equation (3.14) is represented in terms of 'w' as

$w_{ij} = \dfrac{\partial m_i}{\partial o_j}$ (3.15)

Therefore, Equation (3.14) is written as

$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1k} \\ w_{21} & w_{22} & \cdots & w_{2k} \\ \cdots & \cdots & \cdots & \cdots \\ w_{k1} & w_{k2} & \cdots & w_{kk} \end{bmatrix}$ (3.16)

Now, Equation (3.13) is written as

$f_M(M, W) = \dfrac{f_o(O)}{\left| W \right|}$ (3.17)

To extract independent components from the observed signal, the difference between the joint pdf and the product of the marginal pdfs is determined. When the components become independent, this difference becomes zero (i.e., the joint pdf becomes equal to the product of the marginal pdfs of the separated signals). It is expressed as

$f_M(M, W) - \prod_{i=1}^{k} f_{M_i}(m_i, W) = 0$ (3.18)

Since the logarithm provides computational simplicity, it is applied to the equality of the joint pdf and the product of marginals implied by Equation (3.18), giving

$\log f_M(M, W) = \sum_{i=1}^{k} \log f_{M_i}(m_i, W)$ (3.19)

Substituting the value of $f_M(M, W)$ from Equation (3.17) into Equation (3.19),

$\log \dfrac{f_o(O)}{\left| W \right|} = \sum_{i=1}^{k} \log f_{M_i}(m_i, W)$ (3.20)

Because the pdf of the input vector is independent of the parameter vector 'W', the objective function for optimization becomes

$\Phi(W) = -\log \left| W \right| - \sum_{i=1}^{k} \log f_{M_i}(m_i, W)$ (3.21)

Now, the Edgeworth series is used to expand the second term in Equation (3.21). The Edgeworth series expansion of the random variable 'M' about the Gaussian approximation α(m) is given by

$\dfrac{f_{M_i}(m)}{\alpha(m)} = 1 + \dfrac{k_3}{3!} H_3(m) + \dfrac{k_4}{4!} H_4(m) + \dfrac{10 k_3^2}{6!} H_6(m) + \cdots$ (3.22)

where α(m) denotes the probability density function of a random variable 'M', normalized to zero mean and unit variance, $k_i$ denotes the cumulant of order 'i' of the standardized scalar random variable 'M' and $H_i$ denotes the Hermite polynomial of order 'i'. The third and fourth order cumulants and Hermite polynomials are given by

$k_3 = c_3$ (3.23)

$H_3(m) = m^3 - 3m$ (3.24)

$H_4(m) = m^4 - 6m^2 + 3$ (3.25)

$k_4 = c_4 - 3c_2^2$ (3.26)

The cumulants are expressed in terms of moments. The rth order moment of 'm' is given by

$c_{i,r} = E\left[ m_i^r \right]$ (3.27)

$= E\left[ \left( \sum_{r=1}^{n} w_{ir} O_r \right)^{r} \right]$ (3.28)

Substituting the values of the cumulants and Hermite polynomials from Equations (3.23), (3.24), (3.25) and (3.26) into Equation (3.22) and taking the logarithm on both sides, it becomes

$\log \dfrac{f_{M_i}(m)}{\alpha(m)} = -0.75 m^4 + 0.67 m^6 - 0.365 m^8 + 0.5 m^{10} - 0.285 m^{12} + 0.06 m^{14} + 0.0035 m^{16}$ (3.29)

Differentiating Equation (3.29) with respect to $w_{ik}$,

$\dfrac{\partial}{\partial w_{ik}} \left[ \log \dfrac{f_{M_i}(m_i)}{\alpha(m_i)} \right] = \left[ -3 m_i^3 + 4 m_i^5 - 2.92 m_i^7 + 5 m_i^9 - 3.417 m_i^{11} + 0.78 m_i^{13} - 0.056 m_i^{15} \right] O_k$ (3.30)

Therefore, the optimization function becomes

$\Psi\left( m_i(t) \right) = 3 m_i^3 - 4 m_i^5 + 2.92 m_i^7 - 5 m_i^9 + 3.417 m_i^{11} - 0.78 m_i^{13} + 0.056 m_i^{15}$ (3.31)

After simplification, the gradient descent of Equation (3.21) now becomes

$\dfrac{\partial \Phi(W)}{\partial W} = -\left[ W^{-T} - \Psi(m)\, O^T \right]$ (3.32)

The stochastic gradient descent algorithm for the weight update is written as

$W(t+1) = W(t) - \eta \dfrac{\partial \Phi(W)}{\partial W}$ (3.33)

Substituting the gradient of the cost function from Equation (3.32) into Equation (3.33), the weight update rule is written as

$W(t+1) = W(t) + \eta(t) \left[ W^{-T} - \Psi\left( m(t) \right) O^T(t) \right]$ (3.34)
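A minimal MATLAB sketch of one step of the update in Equation (3.34), with Ψ taken from Equation (3.31), is given below; the toy dimensions and the random data are assumptions made for illustration.

    % One stochastic gradient descent step, Equation (3.34).
    k = 2;                                % number of outputs / sources
    W = eye(k) + 0.1*rand(k);             % current separating weight matrix
    O = rand(k, 1);                       % one observed mixture vector
    eta = 0.99;                           % learning rate
    m = W * O;                            % current output vector
    Psi = 3*m.^3 - 4*m.^5 + 2.92*m.^7 - 5*m.^9 ...
          + 3.417*m.^11 - 0.78*m.^13 + 0.056*m.^15;   % Equation (3.31), element-wise
    W = W + eta * (inv(W') - Psi * O');   % Equation (3.34)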

The Edgeworth series is used for the approximation of probability density functions because its coefficients decrease uniformly and the error is controlled, so that it is a true asymptotic expansion. The terms of the Gram-Charlier expansion, on the other hand, do not tend uniformly to zero from the viewpoint of numerical errors; i.e., in general, no term is negligible compared to a preceding term.

3.2.1 Algorithm Description

Once the centers and spread factors have been chosen, the output layer weight matrix 'W' is optimized by unsupervised learning using the stochastic gradient descent technique. The training process consists of the following sequence of steps, as given in Figure 3.9.

Step 1: Initialize the parameters.
    a) Assign weights between the input and hidden layers.
    b) Assign weights between the hidden and output layers.
    c) Set η = 0.99, λ = 0.09, ε = 0.05 and set M(t) = O(t).

Step 2: Read the input signals.

Step 3: Generate the mixing matrix A.

Step 4: Obtain the observed mixture signal 'O'.

Step 5: Preprocess (i.e., normalize) the mixture signals.

Step 6: Recover the source signals.
    (i) Apply the observed signal to the input layer neurons.
        % Forward operation
        % For each pattern in the training set
        a) Find the hidden layer output.
        b) Find the inputs to the nodes in the output layer.
        c) Compute the actual output of the output layer neurons.
        d) Determine the weight change ∆W.
        e) Update the weights between the hidden and output layers.
        f) If the difference between the previous weight value and the current weight value is not equal to zero, then go to Step 5, else stop training.

Step 7: Postprocess (i.e., denormalize) the output data.

Step 8: Store and display the separated signals.

Thus, the development of this algorithm involves the maximization of the statistical independence between the components of the output vector 'M'. This is equivalent to minimizing the divergence between the two distributions:

(i) the joint probability density function $f_M(m, W)$ parameterized by W, and

(ii) the product of the marginal density functions of $M_i$, $\prod_{i=1}^{n} f_{M_i}(m_i, W)$.

Figure 3.9 Block diagram of source signal recovery
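One standard way to write this divergence, not stated explicitly in the text, is the Kullback-Leibler divergence between the joint density and the product of the marginal densities; it is non-negative and vanishes exactly when the outputs are independent:

    D\!\left( f_M \,\Big\|\, \prod_{i} f_{M_i} \right)
      = \int f_M(m, W)\,
        \log \frac{f_M(m, W)}{\prod_{i=1}^{n} f_{M_i}(m_i, W)}\, dm \;\ge\; 0,

with equality if and only if $f_M(m, W) = \prod_{i=1}^{n} f_{M_i}(m_i, W)$.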

3.2.2 Implementation of USGDA

Step 1: Get Source 1.
    % wavread returns the samples, the sample rate in Hertz and the number
    % of bits per sample (NBITS) used to encode the data in the file.
    [s1, srate, no_bits] = wavread('nukeanthem.wav');
    % Re-read, keeping only the first num_samples samples.
    s1 = wavread('nukeanthem.wav', num_samples);
    sources(1,:) = s1;

Step 2: Get Source 2.
    % Returns only the first num_samples samples from each channel in the file.
    s2 = wavread('dspafxf.wav', num_samples);
    sources(2,:) = s2;

Step 3: Get Source 3.
    s3 = wavread('UTOPIA.wav', num_samples);
    sources(3,:) = s3;

Step 4: Initialize the parameters.
    a) Assign weights between the input and hidden layers.
    b) Assign weights between the hidden and output layers.
    c) Set η = 0.99, λ = 0.09, ε = 0.05 and set M(t) = O(t).

Step 5: Generate the mixing matrix A.
    A = rand(num_mixtures, num_sources);

Step 6: Obtain the observed mixture signal 'O' by
    % A premultiplies the source matrix so that the dimensions agree:
    % (num_mixtures x num_sources) * (num_sources x num_samples).
    O = A * sources;

Step 7: Preprocess the mixture signal.

Step 8: Recover the source signals.
    % Forward operation
    % For each training pattern i in the training set
    (i) Find h (the hidden layer output).
        for j = 1:p                            % p = number of hidden neurons
            dis = 0;
            for k = 1:il                       % il = number of input layer nodes
                x = O(i,k);                    % k-th component of the current pattern
                c = center(j,k);               % k-th component of the j-th centre
                diff = x - c;
                dis = dis + diff.^2;           % accumulate squared Euclidean distance
            end
            phi(j) = exp((-dis)/(2*sig^2));    % sig = spread factor
            output_of_hiddenn(j) = phi(j);
        end
    (ii) Find the inputs to the nodes in the output layer.
        input_to_outputn = output_of_hiddenn * hou;   % hou = hidden-to-output weights
    (iii) Compute the actual output of the output layer neurons.
        for b = 1:ol                           % ol = number of output layer nodes
            output_of_outputn(b) = input_to_outputn(b) / alpha;
        end
    (iv) Find the difference between the previous weight value and the current weight value.
    (v) If the difference is equal to zero, stop training; else find the weight update value, i.e., delta:
        d1 = inv(det(hou_old));
        for k = 1:output_neurons
            d2(k) = 3*m(k)^3 - 4*m(k)^5 + 2.92*m(k)^7 - 5*m(k)^9 ...
                    + 3.417*m(k)^11 - 0.78*m(k)^13 + 0.056*m(k)^15;
        end
        deloutput = (d1 - d2) * O';
    (vi) Update the weights between the hidden and output layers by the equation
        W(t+1) = W(t) + lrp × deloutput(t) + ε;

Step 9: Evaluate M(t) and postprocess it to obtain the original signals.

Step 10: Repeat Steps 8 and 9 while the difference between the previous weight value and the current weight value is not equal to zero.

The radial basis function $\varphi = \exp\left( -d_i^2 / (2\lambda^2) \right)$ was evaluated over the interval -1 < x < 1 and -1 < y < 1, as shown in the graph in Figure 3.10.

Figure 3.10 Evaluation of the radial basis function over the interval -1 < x < 1 and -1 < y < 1
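The evaluation behind Figure 3.10 can be reproduced with the short MATLAB sketch below; the grid resolution and the spread value used here are assumptions for illustration.

    % Gaussian radial basis function over -1 < x < 1 and -1 < y < 1 (cf. Figure 3.10).
    lambda = 0.25;                              % illustrative spread factor
    [x, y] = meshgrid(-1:0.05:1, -1:0.05:1);    % 2-D evaluation grid
    d2  = x.^2 + y.^2;                          % squared distance from the centre (0, 0)
    phi = exp(-d2 ./ (2*lambda^2));             % radial basis function
    surf(x, y, phi); xlabel('x'); ylabel('y'); zlabel('\phi');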

Thus, the implemented algorithm, which enforces the independence condition by varying the weights of the RBF neural network, successfully separates the signals from the mixed input signal. The ASN-RBF neural network and the proposed learning algorithm perform well under non-stationary environments and when the number of source signals is unknown and changes dynamically.