

International Journal of Digital Information and Wireless Communications (IJDIWC) 1(1): 14-31. The Society of Digital Information and Wireless Communications, 2011 (ISSN 2225-658X)

CONNECTIONIST PROBABILITY ESTIMATORS IN HMM

USING GENETIC CLUSTERING: APPLICATION FOR SPEECH RECOGNITION

AND MEDICAL DIAGNOSIS

Lilia Lazli1, Boukadoum Mounir2, Abdennasser Chebira3, Kurosh Madani3 and Mohamed Tayeb Laskri1

1 Laboratory of Research in Computer Science (LRI/GRIA), Badji Mokhtar University, B.P. 12 Sidi Amar, 23000 Annaba, Algeria.

[email protected], [email protected], http://www.univ-annaba.org

2 Université du Québec à Montréal (UQAM), Canada

[email protected]

3 Images, Signals and Intelligent Systems Laboratory (LISSI / EA 3956), PARIS XII University, Senart-Fontainebleau Institute of Technology,

Bat. A, Av. Pierre Point, F-77127 Lieusaint, France,

{a.chebira, kmadani}@univ-paris12.fr, http://www.univ-paris12.fr

    ABSTRACT

The main goal of this paper is to compare the performance which can be achieved by five different approaches, analyzing their application potential on real-world paradigms. We compare the performance obtained with (1) a multi-network RBF/LVQ structure, (2) discrete Hidden Markov Models (HMM), (3) a hybrid HMM/MLP system using a Multi-Layer Perceptron (MLP) to estimate the HMM emission probabilities and the K-means algorithm for pattern clustering, (4) a hybrid HMM/MLP system using the Fuzzy C-Means (FCM) algorithm for fuzzy pattern clustering, and (5) a hybrid HMM/MLP system using a Genetic Algorithm (GA) for genetic clustering. Experimental results on an Arabic speech

vocabulary and on biomedical signals show significant decreases in the error rates of hybrid HMM/MLP/GA pattern recognition in comparison with those of the other approaches. Three types of features (PLP, log-RASTA PLP, J-RASTA PLP) were used to test the robustness of our hybrid recognizer in the presence of convolutional and additive noise.

    KEYWORDS

Speech and medical recognition, J-RASTA PLP, fuzzy clustering, Genetic Algorithm, HMM/MLP models.

    1 INTRODUCTION

In many target (or pattern) classification problems the availability of multiple looks at an object can substantially improve robustness and reliability in decision making. The use of several aspects is motivated by the difficulty in distinguishing between different classes from a single view of an object [9]. It occurs frequently that returns from two different objects at certain orientations are so similar that they may easily be confused. Consequently, a more reliable decision about the presence and type of an object can be made based upon observations of the received signals or patterns at multiple aspect angles. This allows more information to accumulate about the size, shape, composition and orientation of the objects, which in turn yields more accurate discrimination. Moreover, when the feature space undergoes changes, owing to different operating and environmental conditions, multi-aspect classification is almost a necessity in order to maintain the



performance of the pattern recognition system.

In this paper we present the Hidden Markov Model (HMM) and apply it to complex pattern recognition problems. We attempt to illustrate some applications of HMM theory to real problems involving complex patterns, such as those related to biomedical diagnosis or those linked to social behavior modeling. We introduce the theory and foundations of Hidden Markov Models (HMM). In the pattern recognition domain, and particularly in speech recognition, HMM techniques hold an important place [7]. There are two reasons for the success of HMM. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. However, standard HMM require the assumption that adjacent feature vectors are statistically independent and identically distributed. These assumptions can be relaxed by introducing Neural Networks (NN) into the HMM framework.

Significant advances have been made in recent years in the area of speaker-independent speech recognition. Over the last few years, connectionist models, and the Multi-Layer Perceptron (MLP) in particular, have been widely studied as potentially powerful approaches to speech recognition. These neural networks estimate the posterior probabilities used by the HMM. Among these, the hybrid approach using the MLP to estimate HMM emission probabilities has been shown to be particularly efficient, for example for French speech [1] and American English speech [14].

We then propose a hybrid HMM/MLP model for speech recognition and biomedical diagnosis which makes it possible to combine the discriminating capacity and noise resistance of the MLP with the flexibility of HMM, in order to obtain better performance than traditional HMM. We also develop a method based on fuzzy logic concepts for the clustering and classification of vectors, the Fuzzy C-Means (FCM) algorithm, and demonstrate its effectiveness compared with the traditional K-Means (KM) algorithm, owing to the fact that the KM algorithm provides a hard (non-probabilistic) decision about the observations in the HMM states.

Regarding clustering algorithms, we specifically considered methods involving partitional clustering and chose one as the basis for our work. This approach requires the choice of a measure, used in our application, that scores the quality of a partition; we are thus reduced to an optimization problem.

The properties of this algorithm do not guarantee convergence to a global optimum, which is why we are interested in a type of heuristic, Genetic Algorithms (GA), less likely to be trapped in local minima and now widely used in optimization problems.

Genetic Algorithms (GA) are search procedures based on the mechanics of genetics and natural selection. Numerous techniques and algorithms have been developed to assist the engineer in the creation of optimum development strategies. These procedures quickly converge to optimal solutions after examining only a small fraction of the search space and have been successfully applied to complex engineering optimisation problems.

    15

  • 8/12/2019 CONNECTIONIST PROPABILITY ESTIMATORS IN HMM USING GENETIC CLUSTERING APPLICATION FOR SPEECH RECO

    3/18

    International Journal of Digital Information and Wireless Communications (IJDIWC) 1(1): 14-31The Society of Digital Information and Wireless Communications, 2011 (ISSN 2225-658X)

    2 HMM Advantages and Drawbacks

Standard HMM procedures, as defined above, have been very useful for pattern recognition; HMM can deal efficiently with the temporal aspect of patterns (including temporal distortion or time warping) as well as with frequency distortion. There are powerful training and decoding algorithms that permit efficient training on very large databases, for example for recognition of isolated words as well as continuous speech. Given their flexible topology, HMM can easily be extended to include phonological rules (e.g., building word models from phone models) or syntactic rules. For training, only a lexical transcription is necessary (assuming a dictionary of phonological models); explicit segmentation of the training material is not required. An example of a simple HMM is given in Figure 1; this could be the model of a word assumed to be composed of three stationary states.

p(xi|q1)  p(xi|q2)  p(xi|q3)

Figure 1. Example of a three-state Hidden Markov Model, where 1) {q1, q2, q3} are the HMM states; 2) p(qi|qj) is the transition probability from state qi to state qj (i, j = 1..3); 3) p(xi|qj) is the emission probability of observation xi from state qj (i = 1...number of frames, j = 1..3).

However, the assumptions that permit HMM optimization and improve their efficiency also, in practice, limit their generality. As a consequence, although the theory of HMM can accommodate significant extensions (e.g., correlation of acoustic vectors, discriminant training, ...), practical considerations such as the number of parameters and trainability limit their implementations to simple systems usually suffering from several drawbacks [2], including:

- Poor discrimination, due to training algorithms that maximize likelihoods instead of a posteriori probabilities (i.e., the HMM associated with each pattern unit is trained independently of the other models). Discriminant learning algorithms do exist for HMM, but in general they have not scaled well to large problems.

- A priori choice of model topology and statistical distributions, e.g., assuming that the Probability Density Functions (PDF) are multivariate Gaussian densities or mixtures of multivariate Gaussian densities, each with a diagonal-only covariance matrix (i.e., possible correlation between the components of the acoustic vectors is disregarded).

- Assumption that the state sequences are first-order Markov chains.

- Typically, very limited acoustic context is used, so that possible correlation between successive acoustic vectors is not modeled very well.

Much Artificial Neural Network (ANN) based automatic pattern recognition research has been motivated by these problems.

3 ESTIMATING HMM LIKELIHOODS WITH ANN

ANN can be used to classify pattern classes, such as speech units. For statistical



recognition systems, the role of the local estimator is to approximate probabilities or PDF. Practically, given the basic HMM equations, we would like to estimate something like p(xn|qk), the value of the probability density function of the observed data vector given the hypothesized HMM state. An ANN, in particular the MLP, can be trained to produce the posterior probability p(qk|xn) of the HMM state given the acoustic data. This can be converted to emission PDF values using Bayes' rule.

Several authors [1-4, 7-10] have shown for speech recognition that ANN can be trained to estimate a posteriori probabilities of output classes conditioned on the input pattern. Recently, this property has been successfully used in HMM systems, referred to as hybrid HMM/ANN systems, in which ANN are trained to estimate local probabilities p(qk|xn) of HMM states given the acoustic data.

Since the network outputs approximate Bayesian probabilities, gk(xn) is an estimate of:

p(qk|xn) = p(xn|qk) p(qk) / p(xn)    (1)

which implicitly contains the a priori class probability p(qk). It is thus possible to vary the class priors during classification without retraining, since these probabilities occur only as multiplicative terms in producing the network outputs. As a result, class probabilities can be adjusted during use of a classifier to compensate for training data with class probabilities that are not representative of actual use or test conditions [18], [19].

Thus, scaled likelihoods p(xn|qk) for use as emission probabilities in standard HMM can be obtained by dividing the network outputs gk(xn) by the class priors p(qk) estimated on the training set, which gives us an estimate of:

p(xn|qk) / p(xn)    (2)

During recognition, the scaling factor p(xn) is a constant for all classes and will not change the classification. It could be argued that, when dividing by the priors, we are using a scaled likelihood, which is no longer a discriminant criterion. However, this need not be true, since the discriminant training has affected the parametric optimization for the system that is used during recognition. Thus, this permits use of the standard HMM formalism, while taking advantage of ANN characteristics.

Figure 2 shows that the equilibrium reached in the ideal case does in fact correspond to the network output g being equal to the posterior probability p. Consider an area in feature space around a training pattern xn, and assume that we consider only two classes: the target class for this pattern, and the class of all patterns that do not belong to the first class. Vectors in the selected area will undergo an upward force corresponding to the gradient of the error term for all patterns with a target of 1; the quadratic error term in this case is (1-g)^2, with a derivative of (1-g), and this case will occur a fraction of the time given by probability p. Therefore, the upward force (average gradient term) applied to the region is equal to p(1-g). The downward force is similarly defined, and will balance it in the equilibrated case. This will only occur when the network output is equal to the posterior probability.
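The force balance described above can be written out in one line:

```latex
\underbrace{p\,(1-g)}_{\text{upward force}} \;=\; \underbrace{(1-p)\,g}_{\text{downward force}}
\quad\Longrightarrow\quad p - pg \;=\; g - pg
\quad\Longrightarrow\quad g \;=\; p .
```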



Figure 2. Geometric illustration of the proof that an ANN will (ideally) produce posterior probabilities. Equilibrium is reached when the upward force (due to the average deviation of the ANN output from 1 for a target that should be high) is equal to the downward force (due to the average deviation from 0 for a target that should be low).

4 WHY IS THIS GOOD?

Since we ultimately derive essentially the same probability with an MLP as we would with a conventional (e.g., Gaussian mixture) estimator, what is the point? There are at least two potential advantages that we and others have observed [4, 9, 10, 14]:

1) The standard statistical recognizers require strong assumptions about the statistical character of the input, such as parameterizing the input densities as mixtures of Gaussian densities with no correlation between features, or as the product of discrete densities for different features that are assumed to be statistically independent. This type of assumption is not required with an MLP estimator, which is an advantage particularly when a mixture of feature types is used, e.g., binary and continuous. Specifically, standard HMM approaches require the assumption that successive acoustic vectors are uncorrelated. For the MLP estimator, multiple inputs can be used from a range of time steps, and the network will learn something about the correlation between the acoustic inputs. Note that the use of such a network will lead to a more general emission probability that may also be used in the global decoding. That is, if c + d + 1 frames of acoustic vectors

X(n-c..n+d) = x(n-c), ..., x(n), ..., x(n+d)

are used as input to provide contextual information to the network, the output values of the MLP will estimate

p(qk | X(n-c..n+d)), k = 1, ..., K.

This provides a simple mechanism for incorporating acoustic context into the statistical formulation.

2) MLP are a good match to discriminative objective functions (e.g., Mean Squared Error: MSE), and so the probabilities will be optimized to maximize discrimination between sound classes, rather than to most closely match the distributions within each class. It can be argued that such training is conservative of parameters, since the parameters of the estimate are trained to split the space between classes, rather than to represent the volumes that compose the division of space constituting each class.
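A small sketch of how such a contextual input can be assembled by stacking c left and d right neighbouring frames (the function, padding choice and sizes are our own illustrative assumptions):

```python
import numpy as np

def stack_context(features, c=4, d=4):
    """Stack c left and d right context frames around each frame, giving
    the MLP a (c+d+1)*dim input per frame; sequence edges are padded by
    repeating the first/last frame."""
    n, dim = features.shape
    padded = np.vstack([np.repeat(features[:1], c, axis=0),
                        features,
                        np.repeat(features[-1:], d, axis=0)])
    # column block i holds the frame at relative offset i - c
    return np.hstack([padded[i:i + n] for i in range(c + d + 1)])

# Hypothetical 26-dimensional frames with a 9-frame context window (c = d = 4).
X = np.random.randn(100, 26)
print(stack_context(X).shape)  # (100, 234)
```

The centre block of each stacked row is the original frame itself, so the network sees x(n) together with its acoustic neighbourhood.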

    5 J-RASTA PLP ANALYSIS

Developing speech recognizers that are robust under realistic acoustic environments has been and still is a major problem for the speech community. Speech distortion can be categorized into two types. There is linear spectral distortion (i.e., convolutional noise), which is typically introduced by microphones and the telephone channel. There is also



additive noise such as stationary or slowly varying background noise.

Three types of features (PLP, log-RASTA PLP and J-RASTA PLP) were used to test the robustness of our hybrid recognizer in the presence of convolutional and additive noise. Perceptual Linear Predictive (PLP) analysis is an extension of linear predictive analysis that takes into account some aspects of human sound perception [20, 21]. Log-RASTA PLP (RelAtive SpecTrAl processing) is based on PLP but also aims at reducing the effect of linear spectral distortion. In addition to the PLP and log-RASTA PLP analyses already used in our past works [22], we used here J-RASTA PLP, which tries to handle both linear spectral distortion and additive noise simultaneously [23].

J-RASTA-PLP [23] is used as the acoustic pre-processor for the experiments on both databases. Each frame of the feature vector represents 25 ms of speech, with 12.5 ms overlap of consecutive frames. J-RASTA PLP was chosen for its robustness to the linear spectral distortions in speech signals that are often introduced by communication channels.

J-RASTA-PLP attempts to make recognition more robust and invariant to acoustic environment variables. In some sense it performs speech enhancement, but for the purposes of feature extraction. The speech enhancement process added was designed to improve the intelligibility of noisy speech for human listeners. Addition of this enhancement may improve recognition scores further.

Speech recordings were sampled over the microphone at 11 kHz. After pre-emphasis (factor 0.95) and application of a Hamming window, the J-RASTA PLP features were computed every 10 ms on analysis windows of 30 ms.

Each frame is represented by 12 components plus energy (J-RASTA PLP + E). The values of the 13 coefficients are standardized by their standard deviation measured on the training frames. The feature set for our hybrid HMM/MLP system was based on a 26-dimensional vector composed of the cepstral parameters (J-RASTA PLP parameters), the delta cepstral parameters, the energy and the delta energy.

RASTA was configured to incorporate high-pass filtering and slight spectral subtraction. A constant J of 1e-6 was used for training. Multiple-regression J mapping was used during testing.

Nine frames of contextual information were used at the input of the MLP (9 frames of context being known to usually yield the best recognition performance).
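The front-end framing described above (11 kHz sampling, 0.95 pre-emphasis, 30 ms Hamming windows computed every 10 ms) can be sketched as follows; `frame_signal` is a hypothetical helper, and the RASTA filtering itself is deliberately omitted:

```python
import numpy as np

def frame_signal(signal, fs=11000, win_ms=30, step_ms=10, preemph=0.95):
    """Pre-emphasise a speech signal (factor 0.95), then cut it into
    30 ms Hamming-windowed analysis frames every 10 ms."""
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    win = int(fs * win_ms / 1000)    # 330 samples at 11 kHz
    step = int(fs * step_ms / 1000)  # 110 samples at 11 kHz
    n_frames = 1 + (len(emphasized) - win) // step
    frames = np.stack([emphasized[i * step:i * step + win]
                       for i in range(n_frames)])
    return frames * np.hamming(win)

# One second of hypothetical audio at 11 kHz.
x = np.random.randn(11000)
print(frame_signal(x).shape)  # (98, 330)
```

Each windowed frame would then be passed to the PLP / J-RASTA PLP analysis to produce the 12 cepstral coefficients plus energy per frame.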

    6 CLUSTERING PROCEDURE

One of the basic problems that arises in a great variety of fields, including pattern recognition, machine learning and statistics, is the so-called clustering problem [5, 13, 15, 22]. The fundamental data clustering problem may be defined as discovering groups in data, or grouping similar objects together. Each of these groups is called a cluster: a region in which the density of objects is locally higher than in other regions.

In this paper, data clustering is viewed as a data partitioning problem. Several approaches to find groups in a given database have been developed [5, 13, 15], but we focus on the K-Means (KM) algorithm as it is one of the most used iterative partitional clustering algorithms and because it may also be used to initialize more expensive clustering algorithms (e.g., the EM algorithm).
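For reference in the discussion that follows, a bare-bones KM loop might look like this (a generic textbook version, not the authors' implementation):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-Means: assign each vector to its nearest centroid, then
    recompute the centroids, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # random initial clusters
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # hard assignment: each vector belongs to exactly one cluster
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        new = np.stack([X[labels == i].mean(0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

The hard `argmin` assignment is exactly the "hard and fixed decision" criticized in the next subsection: each vector carries no information about how close it was to the other clusters.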



6.1 Drawbacks of the K-Means Algorithm

Despite being used in a wide array of applications, the KM algorithm is not exempt from drawbacks. Some of these drawbacks have been extensively reported in the literature [5, 13, 15, 22]. The most important are listed below:

- Like many clustering methods, the KM algorithm assumes that the number of clusters k in the database is known beforehand which, obviously, is not necessarily true in real-world applications.

- As an iterative technique, the KM algorithm is especially sensitive to initial starting conditions (initial clusters and instance order).

- The KM algorithm converges finitely to a local minimum. The running of the algorithm defines a deterministic mapping from the initial solution to the final one.

- Vector quantization and, in particular, the KM algorithm provide a hard and fixed (non-probabilistic) decision which does not transmit enough information about the real observations.

Many clustering algorithms based on fuzzy logic concepts have been motivated by this last problem.

    6.2 Fuzzy C-Means Algorithm

In general, and for example in the speech context, a purely acoustic segmentation of the speech cannot suitably detect the basic units of the vocal signal. One of the causes is that the borders between these units are not acoustically defined. For this reason, we were interested in using automatic classification methods based on fuzzy logic in order to segment the data vectors. Among the adapted algorithms, we have chosen the FCM algorithm, which has already been successfully used in various fields and especially in image processing [13], [15].

FCM is a clustering method which allows one piece of data to belong to two or more clusters, using the measured data considered in the spectral domain only. This method searches for some general regularity in the collocation of patterns, focused on finding a certain class of geometrical shapes favored by the particular objective function [5]. The FCM algorithm is based on minimization of the following objective function, with respect to U, a fuzzy c-partition of the data set, and to V, a set of c prototypes [5]:

Jm(U, V) = Sum(j=1..n) Sum(i=1..c) (uij)^m ||Xj - Vi||^2    (3)

where m is any real number greater than 1, uij is the degree of membership of Xj in the cluster i, Xj is the jth d-dimensional measured data vector, Vi is the d-dimensional center of the cluster, and ||*|| is any norm expressing the similarity between the measured data and the center.

Fuzzy partitioning is carried out through an iterative optimization of (3), with the update of the memberships uij and the cluster centers Vi given by [5]:

uij = 1 / Sum(k=1..c) (dij / dkj)^(2/(m-1))    (4)

Vi = Sum(j=1..n) (uij)^m Xj / Sum(j=1..n) (uij)^m    (5)



The iteration stops when max(ij) |uij(k+1) - uij(k)| < eps, where eps is a termination criterion between 0 and 1.

As a part of our main objective, we aim to find the best and the worst set of initial starting conditions to approach the extremes of the probability distributions of the square-error values. Due to the computational expense of performing an exhaustive search, we tackle the problem using Genetic Algorithms.
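Putting updates (4) and (5) together, an FCM iteration can be sketched in a few lines of numpy (a generic sketch under our own conventions, with U stored as a c-by-n membership matrix):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-4, iters=100, seed=0):
    """Fuzzy C-Means sketch: alternate the membership update (4) and the
    centre update (5) until the largest membership change is below eps."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                 # each column is a fuzzy membership vector
    V = X[:c].astype(float)
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)        # eq. (5)
        d = np.linalg.norm(X[None, :] - V[:, None], axis=2) + 1e-12
        U_new = d ** (-2.0 / (m - 1))                       # eq. (4), vectorized
        U_new /= U_new.sum(axis=0)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return U, V
```

Unlike the hard KM assignment, every vector keeps a graded membership in every cluster, which is exactly the extra information passed on to the HMM states.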

    6.3 Genetic Algorithms

Roughly speaking, we can say that Genetic Algorithms (GA) are a kind of evolutionary algorithm, that is, probabilistic search algorithms which simulate natural evolution [24, 25, 26]. GA are used to solve combinatorial optimization problems following the rules of natural selection and natural genetics. They are based upon the survival of the fittest among string structures, together with a structured yet randomized information exchange. Working in this way and under certain conditions, GA evolve to the global optimum with probability arbitrarily close to 1.

When dealing with GA, the search space of a problem is represented as a collection of individuals. The individuals are represented by character strings. Each individual encodes a solution to the problem. In addition, each individual has an associated fitness measure. The part of the space to be examined is called the population. The purpose of the use of a GA is to find the individual from the search space with the best genetic material.

Figure 3 shows the pseudo-code of the GA that we use. First, the initial population is chosen and the fitness of each of its individuals is determined. In our case, the result of FCM clustering is considered as the initial population. Next, in every iteration, two parents are selected from the population. This parental couple produces children which, with a probability near zero, are mutated, i.e., hereditary distinctions are changed. After the evaluation of the children, the worst individual of the population is replaced by the fittest of the children. This process is iterated until a convergence criterion is satisfied.

The operators which define the child production process and the mutation process are the crossover operator and the mutation operator, respectively. Both operators are applied with different probabilities and play different roles in the GA. Mutation is needed to explore new areas of the search space and helps the algorithm avoid local optima. Crossover aims to increase the average quality of the population. By choosing adequate crossover and mutation operators, as well as an appropriate reduction mechanism, the probability that the GA reaches a near-optimal solution in a reasonable number of iterations increases.

Figure 3. The pseudo-code of the Genetic Algorithm

Begin GA
  Choose initial population at random
  Evaluate initial population
  While not convergence criterion do
  Begin
    Select two parents from the current population
    Produce children by the selected parents
    Mutate the children
    Evaluate the children
    Replace the worst individual of the population by the best child
  End
  Output the best individual found
End GA.
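A loose executable sketch of this steady-state loop, for real-valued individuals and an arbitrary fitness function (the operators and parameters are our own illustrative choices, not the paper's configuration):

```python
import numpy as np

def genetic_search(fitness, init_pop, n_iter=200, p_mut=0.05, seed=0):
    """Steady-state GA in the spirit of Figure 3: pick two parents, apply
    one-point crossover, occasionally mutate the child, and let a better
    child replace the worst individual of the population."""
    rng = np.random.default_rng(seed)
    pop = [np.array(ind, dtype=float) for ind in init_pop]
    for _ in range(n_iter):
        scores = np.array([fitness(ind) for ind in pop])
        i, j = rng.choice(len(pop), 2, replace=False)   # select two parents
        a, b = pop[i], pop[j]
        cut = rng.integers(1, len(a))
        child = np.concatenate([a[:cut], b[cut:]])      # one-point crossover
        if rng.random() < p_mut:                        # rare mutation
            child[rng.integers(len(child))] += rng.normal()
        if fitness(child) > scores.min():
            pop[int(scores.argmin())] = child           # replace the worst
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(scores.argmax())]
```

Because only the worst individual is ever replaced, the best fitness in the population never decreases, which is the elitist behaviour the pseudo-code relies on.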



7 VALIDATION ON SPEECH AND BIOMEDICAL SIGNAL CLASSIFICATION PARADIGM

Assume that for each class in the vocabulary we have a training set of k occurrences (instances) of each class, where each instance of the categories constitutes an observation sequence. In order to build our tool, we perform the following operations.

1. For each class v in the vocabulary, we estimate the model parameters (A, B, pi) that optimize the likelihood of the training set for the vth category.

2. For each unknown category to be recognized, the processing of Fig. 4 is carried out: measurement of the observation sequence O = {o1, o2, ..., oT} via a feature analysis of the signal corresponding to the class; computation of the model likelihoods for all possible models, P(O|v), 1 <= v <= V; and finally selection of the category with the highest likelihood.
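Step 2, scoring an observation sequence against every class model and keeping the arg max, can be sketched with a log-domain forward recursion (the per-class model layout below is our own assumption; in the hybrid system the emission terms would be the scaled MLP likelihoods):

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def log_forward(log_pi, log_A, log_B):
    """Forward pass: log_B[t, i] = log p(o_t | q_i); returns log P(O | model)."""
    alpha = log_pi + log_B[0]
    for t in range(1, len(log_B)):
        alpha = log_B[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return float(logsumexp(alpha))

def classify(models):
    """models: class label -> (log_pi, log_A, log_B); pick highest likelihood."""
    return max(models, key=lambda v: log_forward(*models[v]))

# Toy example: two one-state models scoring a 3-frame sequence.
one_state = (np.array([0.0]), np.array([[0.0]]))   # log_pi, log_A (prob. 1)
models = {'good': (*one_state, np.log(np.full((3, 1), 0.9))),
          'bad': (*one_state, np.log(np.full((3, 1), 0.1)))}
print(classify(models))  # good
```

Working in the log domain avoids the numerical underflow that direct products of per-frame probabilities would cause on long sequences.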

    7.1 Speech Databases

Three speech databases have been used in this work:

1) The first one, referred to as DB1, the isolated digits task, has 13 words in the vocabulary: 1, 2, 3, 4, 5, 6, 7, 8, 9, zero, oh, yes, no. They are spoken by 30 speakers, producing a total of 3900 utterances (each word is spoken 10 times). The digits database has about 30,000 frames of training data. This first corpus consists of isolated digits collected over the microphone. This database was used to conduct extensive testing of the robustness of various signal processing front ends (PLP, log-RASTA PLP, J-RASTA PLP) in the presence of both convolutional and additive noise.

2) The second database, referred to as DB2, contained about 50 speakers saying their last name, first name, city of birth and city of residence. Each word is spoken 10 times. The training set used in the following experiments consists of 2000 sounds.

    Figure 4. Block diagram of a speech and biomedical database HMM recognizer



3) The third database, referred to as DB3, contained 13 control words (e.g., view/new, save/save as/save all), with each speaker pronouncing each control word 10 times. The training set used in the following experiments consists of 3900 sounds spoken by 30 speakers.

For each database, 12 speakers were used for training (a non-overlapping subset of these was used for cross-validation to adapt the learning rate of the MLP), while the remaining 8 speakers were used for testing.

The acoustic features were quantized into independent codebooks according to the FCM algorithm, respectively:

- 128 clusters for the J-RASTA PLP vectors.

- 128 clusters for the first time derivative of the cepstral vectors.

- 32 clusters for the first time derivative of the energy.

- 32 clusters for the second time derivative of the energy.

    7.2 Biomedical Database

    For the biomedical database recognition task using the HMM/MLP model, the objective of this study is the classification of an electric signal produced by a medical test [16]; the experimental device is described in Fig. 5. The signals used are called auditory evoked potentials (PEA); examples of PEA signals are illustrated in Fig. 6. Indeed, functional otoneurological exploration provides a technique for the objective study of nervous conduction along the auditory pathways. Classification of the PEA signals is a first step in the development of a diagnostic aid tool. The main difficulty of this classification lies in the resemblance of signals corresponding to different pathologies, but also in the disparity of the signals within the same class: the results of the medical test can indeed differ between two measurements on the same patient.

    The PEA signals resulting from the examination and their associated pathologies are stored in a database containing the files of 11185 patients. We chose 3 categories of patients (3 classes) according to the type of their disorder:
    1) Normal (N): the patients of this category have normal audition (normal class).
    2) Endocochlear (E): these patients suffer from disorders affecting the part of the ear situated before the cochlea (endocochlear class).
    3) Retrocochlear (R): these patients suffer from disorders affecting the part of the ear situated at the level of the cochlea or after it (retrocochlear class).

    We selected 213 signals (each corresponding to a patient), where every signal contains 128 parameters. Among the 213 signals, 92 belong to class N, 83 to class E, and 38 to class R. To build our training base, we chose the signals corresponding to pathologies labeled as certain by the physician. All PEA signals come from the same experimental system. To evaluate the realized HMM/MLP system and for performance comparison purposes, we applied the same conditions already used in the work of the group described in [17], which uses a heterogeneous multi-network structure based on RBF and LVQ networks, and in the work described in [11], [12], which uses a discrete HMM. For this reason, the training base contains 24 signals, of which 11 correspond to class R, 6 to class E, and 7 to class N.


    After the training phase, when unlearned signals are presented to the HMM, the corresponding class must be designated. When we wish to use an HMM with a discrete observation symbol density, rather than the continuous vectors above, a vector quantizer (VQ) is required to map each continuous observation vector to a discrete codebook index. Once the codebook of vectors has been obtained, the mapping between continuous vectors and codebook indices becomes a simple nearest-neighbor computation: the continuous vector is assigned the index of the nearest codebook vector. Thus, the major issue in VQ is the design of an appropriate codebook for quantization.

    A codebook size of M = 64 vectors was used in the biomedical database recognition experiments with the HMM/MLP model.
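    The nearest-neighbor mapping described above can be sketched as follows. This is an illustrative NumPy snippet, not the authors' code; the function name `quantize` is ours, and Euclidean distance is assumed as the distortion measure.

    ```python
    import numpy as np

    def quantize(vectors, codebook):
        """Map each continuous vector to the index of its nearest codebook entry."""
        # pairwise Euclidean distances, shape (n_vectors, n_codes)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        return d.argmin(axis=1)
    ```

    With an M = 64 codebook, each observation vector is thus replaced by a single integer in [0, 63] before being fed to the discrete HMM.
    
    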

    Figure 5. PEA generation experiment

    Figure 6. (A) PEA signal for a normal patient, (B) patient with an auditory disorder

    7.3 Discrete MLP with inputs provided by the FCM algorithm

    For this case, we compare the performance of the baseline hybrid model with that of a hybrid HMM/MLP model whose network input is an acoustic vector of real values obtained by applying the FCM algorithm, with 2880 real components corresponding to the membership degrees of the acoustic vectors to the classes of the codebook. We represented each cepstral parameter (J-RASTA PLP, ΔJ-RASTA PLP, ΔE, ΔΔE) by a real vector whose components define the membership degrees of the parameter to the various classes of the codebook. For the speech databases, we report as an example the values used for DB2: a 10-state, strictly left-to-right word HMM with emission probabilities computed by an MLP fed with 9 frames of quantized acoustic vectors at the input, i.e., the current acoustic vector preceded by the 4 acoustic vectors of the left context and followed by the 4 acoustic vectors of the right context. The input layer is thus made up of a real vector with 2880 components (9 frames × (128 + 128 + 32 + 32) codebook classes) corresponding to the membership degrees of the acoustic vectors to the classes of the codebook.

    Thus, an MLP with a single hidden layer was trained, with 2880 input neurons, 30 hidden neurons, and 10 output neurons. The three-layer MLP is trained using stochastic gradient descent, with relative entropy as the error criterion. A sigmoid function is applied to the hidden-layer units, and a softmax (the exponential of each unit's weighted sum, normalized by the sum of exponentials over the entire layer) is used as the output nonlinearity. The number of hidden neurons was chosen empirically.
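    The forward pass of such a network (sigmoid hidden layer, softmax output) can be sketched as follows. This is a generic illustration with random weights, not the trained network from the paper; the layer sizes are those quoted in the text, and the function name `mlp_forward` is ours.

    ```python
    import numpy as np

    def mlp_forward(x, W1, b1, W2, b2):
        """One forward pass: sigmoid hidden layer, softmax output (class posteriors)."""
        h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))    # sigmoid hidden activations
        z = W2 @ h + b2
        z -= z.max()                                 # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()                           # softmax: outputs sum to 1

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 2880, 30, 10                # sizes from the text
    W1, b1 = rng.normal(scale=0.01, size=(n_hid, n_in)), np.zeros(n_hid)
    W2, b2 = rng.normal(scale=0.01, size=(n_out, n_hid)), np.zeros(n_out)
    p = mlp_forward(rng.random(n_in), W1, b1, W2, b2)
    ```

    The softmax outputs are the per-class posterior estimates that replace the discrete emission probabilities of the HMM after division by the class priors.
    
    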


    For the PEA signals, an MLP with 64 input neurons, 18 hidden neurons, and 5 output neurons was trained.

    7.4 Discrete MLP with inputs provided by the GA

    When applying the GA, an initial population is chosen and the fitness of each of its individuals is determined. In our case, the result of FCM clustering is taken as the initial population.

    For clarity, we report below some results of the clustering of DB1.

    7.4.1 Coding of individuals

    The key idea is that the encoding of an individual must enable "effective" sampling of the search space. We chose a representation based on class indexing: an individual is encoded by a string of integers whose ordinal value (index) represents the number of the object, and whose cardinal value corresponds to the number of the class to which the object belongs.

    Practical consideration: the following partition of objects (acoustic vectors) into 10 classes (the 10 digits):
    class 0 = {V1, V2, ... V21, V22}, class 1 = {V23, V24, ... V34, V35}, class 2 = {V36, V37, ...}, ..., class 9 = {}
    will be represented by the following chromosome:

    000000... 111111 ... 888888 999999

    Figure 7. Example of coding of individuals
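    The index-based encoding can be illustrated with a small sketch; the helper name `encode` and the toy partition are ours.

    ```python
    def encode(partition, n_objects):
        """Chromosome: position i (ordinal value) holds the class number
        (cardinal value) of object i."""
        chrom = [None] * n_objects
        for label, members in partition.items():
            for i in members:
                chrom[i] = label
        return chrom

    # hypothetical toy partition of 6 objects into 3 classes
    partition = {0: [0, 1, 2], 1: [3, 4], 2: [5]}
    chrom = encode(partition, 6)
    ```

    Reading the chromosome left to right recovers the class of each object, which is exactly the string-of-digits representation of Figure 7.
    
    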

    7.4.2 Population size

    This is a fundamental parameter of the GA, for which there is no general criterion determining the optimum size. We tried to adapt the size of the population to the size of the problem to be treated.

    7.4.3 Merit function

    An individual, or genotype, represents, let us recall, a partitioning of the k observations into classes. We use the following representation function:

    $$ g_l = \frac{1}{|C_l|} \sum_{x_i \in C_l} x_i \qquad (6) $$

    with $g_l$ the centre of gravity of class $C_l$, $l \in [1, M]$, where $M$ is the number of classes (the digits in our case); this allows the $M$ representatives to be inferred. The within-class criterion, a measure of the quality of a partition whose expression is given in (7), is our objective cost function, whose value we seek to minimize. Transforming this objective function (criterion) yields the merit function that we use in our algorithm to measure the quality of an individual.

    $$ W = \sum_{l=1}^{M} \sum_{x_i \in C_l} p_i \, d^2(x_i, g_l) \qquad (7) $$

    where $p_i$ is the weight of the $i$-th object.
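    A direct NumPy transcription of the criterion (7), with the class centres computed as in (6), might look as follows; the function name `merit` and the default unit weights are our assumptions.

    ```python
    import numpy as np

    def merit(chrom, X, p=None):
        """Within-class criterion W = sum_l sum_{x_i in C_l} p_i * d^2(x_i, g_l),
        where g_l is the centre of gravity of class C_l (Eq. 6). Smaller is better."""
        chrom = np.asarray(chrom)
        p = np.ones(len(X)) if p is None else np.asarray(p)
        W = 0.0
        for label in np.unique(chrom):
            mask = chrom == label
            g = X[mask].mean(axis=0)                          # centre of gravity
            W += (p[mask] * ((X[mask] - g) ** 2).sum(axis=1)).sum()
        return W
    ```

    A compact partition (objects close to their class centre) yields a small W, which is why the GA below minimizes this value.
    
    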

    7.4.4 Reproduction

    In this phase, a new population is created at each iteration by applying the genetic operators: selection, crossover, and mutation. The idea is to select the fittest individuals in the population, in the sense of the merit function, and reproduce them in the next generation.

    In our GA we used the so-called "steady-state genetic algorithm" variant, which replaces a certain percentage of the population each


    generation (the size of the population remains constant over the course of the GA). The evolution of the population is ensured by the selection, crossover, and mutation operators, which allow chromosomes to be combined and modified. We summarize the genetic operators used below.

    - Selection: there are many selection strategies; we briefly present two. In roulette-wheel selection, the probability of selecting an individual is proportional to its adaptation value. Tournament selection consists of selecting a subset of the population and retaining the best individual of this subgroup. Since in our case the problem is one of fitness minimization, the best individual is the one with the smallest fitness value.

    Practical consideration: according to the fitness of each chromosome in the population, the best individuals are selected.

    Table 1. Selection of the best individuals

    Before sorting:
    Parent   Fitness
    1        6.4520
    2        6.8843
    3        7.3374
    4        6.2424
    5        6.7076
    6        7.1452
    7        6.4746
    8        6.8331

    After sorting:
    Parent   Fitness
    1        6.2424
    2        6.4520
    3        6.4746
    4        6.7076
    5        6.8331
    6        6.8843
    7        7.1452
    8        7.3374
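    Tournament selection under fitness minimization can be sketched as follows; the helper name is ours, and the test reuses the fitness values of Table 1.

    ```python
    import random

    def tournament_select(population, fitnesses, k=3, rng=None):
        """Draw k individuals at random and return the fittest one,
        i.e. the one with the SMALLEST fitness (we minimize Eq. 7)."""
        rng = rng or random.Random(0)
        idx = rng.sample(range(len(population)), k)
        return population[min(idx, key=lambda i: fitnesses[i])]
    ```

    With k equal to the population size the global best is always returned; smaller k values trade selection pressure for diversity.
    
    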

    - Crossover: this creates two individuals (children) by combining genes from the two parents obtained in the previous step. This operator allows the average quality of the population to be increased. We used the one-point variant, with the cut point chosen randomly from the possible cut points (for a chain of length l there are l - 1). However, this may lead to a non-viable child (or even two), which is why different cut points are eventually tried. This search can optionally be restricted by setting a number of iterations a priori. If at the end of this check a child remains non-viable, one of the parents is taken instead. A child is said to be non-viable if it does not have the same number of classes as its parents.

    Practical consideration: consider the two following parents selected from our application.

    Parent 1:
    00000000 11110111 ... 33633333 44744477
    55556655 66665566 88338822 ... 99999990

    Parent 2:
    00000000 11111115 ... 33333333 44444444
    55556565 66666666 88888833 99999999

    Figure 8. The two parents

    A cut point of 102 is obtained. The two children obtained are the following:

    Child 1

    00000000 11110111 33633333 44444444

    55556565 66666666 88888833 99999999

    Child 2

    00000000 11111115 33333333 44744477

    55556655 66665566 88338822 99999990

    Figure 9. The two children
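    A sketch of the one-point crossover with the viability test and parent fallback described above; the helper names and the bound `max_tries` are our assumptions.

    ```python
    import random

    def one_point_crossover(p1, p2, max_tries=10, rng=None):
        """One-point crossover (l-1 possible cut points for a chain of length l).
        A child is viable only if it keeps the same number of classes as the
        parents; after max_tries failed cut points, the parent is kept instead."""
        rng = rng or random.Random(0)
        n_classes = len(set(p1))
        children = []
        for a, b in ((p1, p2), (p2, p1)):
            child = None
            for _ in range(max_tries):
                cut = rng.randrange(1, len(a))        # one of the l-1 cut points
                cand = a[:cut] + b[cut:]
                if len(set(cand)) == n_classes:       # viability test
                    child = cand
                    break
            children.append(child if child is not None else a[:])
        return children[0], children[1]
    ```
    
    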

    Regarding the value of the crossover probability, we followed the recommendations of Goldberg [24] who, citing the work of De Jong, proposes a high crossover probability (≥ 0.5).

    - Mutation: this stage introduces random variation into the genotype of an individual (here, the children). This operator therefore allows new areas of the search space to be explored, decreasing the risk of convergence towards local minima. Each gene may thus change with a given


    probability; it is also possible to choose a portion of the genes at random. The mutation may lead to a non-viable individual; this possibility is therefore taken into account in the coding of this operator.

    Practical consideration: consider the genes of Child 1 presented above.

    Child before mutation:
    00000000 11110111 33633333 44444444 55556565 66666666 88888833 99999999

    Child after mutation:
    00000000 11110111 33633333 44444444 55556565 66666666 88888833 97999999
    (note the changed gene in the last block)

    Figure 10. Example of mutation
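    Gene-wise mutation can be sketched as follows (illustrative helper; the name `mutate` is ours).

    ```python
    import random

    def mutate(chrom, n_classes, p_mut, rng=None):
        """Change each gene, with probability p_mut, to a random class number."""
        rng = rng or random.Random(0)
        return [rng.randrange(n_classes) if rng.random() < p_mut else g
                for g in chrom]
    ```
    
    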

    Here again, we followed the recommendations of Goldberg [24] for the mutation probability, adopting a value inversely proportional to the size of the population.

    7.4.5 Replacement of the new population

    We replace the worst solutions. For this purpose, we rank all solutions (parents and children) according to their adaptation and keep only the N first individuals, i.e., those with the smallest fitness values, obtaining a new population consisting of the N best adaptations. If the number of generations has been reached, the best solution is extracted and we move to the next step; otherwise, a new reproduction iteration takes place.

    Practical consideration: the population of parents and children obtained is presented in the first table below. To replace the worst individuals, we ranked all the parents and their children by increasing fitness; only the N first individuals are kept in the population. The new population is presented in the second table.

    Table 2. Best population obtained

    Child   Fitness    Parent   Fitness
    1       4.2582     1        3.1490
    2       11.5489    2        3.1803
    3       7.2761     3        3.2824
    4       9.1245     4        3.3785
    5       13.5487    5        3.5947
    6       5.0214     6        3.8692
    7       3.1421     7        3.8761
    8       3.8692     8        3.8815
    9       3.5103
    10      7.6815
    11      7.2584
    12      3.1436

    New population   Fitness
    1: Child 7       3.1421
    2: Child 12      3.1436
    3: Parent 1      3.1490
    4: Parent 2      3.1803
    5: Parent 3      3.2824
    6: Parent 4      3.3785
    7: Child 9       3.5103
    8: Parent 5      3.5947
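    The replacement step can be sketched as follows (illustrative helper; the name `replace_worst` is ours, and the test reuses fitness values from Table 2).

    ```python
    def replace_worst(parents, children, fitness_of):
        """Steady-state replacement: rank parents + children by increasing
        fitness and keep only the N best, N being the population size."""
        pool = sorted(parents + children, key=fitness_of)
        return pool[:len(parents)]
    ```
    
    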

    7.4.6 Stopping criterion

    There is no stopping criterion that guarantees the convergence of the GA to an optimal solution. It is customary to fix the number of iterations a priori; when this number is reached, the best chromosome obtained is taken as the optimal solution. We used this method in our experiments.

    For the GA, we set a maximum number of iterations, which constitutes the stopping test.

    Figure 11. Convergence of the classification process for the Arabic digits database (0-9).


    7.5 Results and Discussion

    Five types of models were compared in the framework of speech and medical database recognition. The experimental results are reported in Tables 3 and 4. Table 3 summarizes the results obtained by the models using the acoustic parameters provided by the Log-RASTA PLP and J-RASTA PLP analyses. For the speech databases, the tests of the five models were carried out first on DB1, then on DB2, and finally on DB3.

    For the majority of cases, we can draw preliminary conclusions from the results reported in Tables 3 and 4 for speech recognition and PEA signal diagnosis. We also report here the results described in [22].
    - For the speech databases recorded in real conditions and noisy environments, in the majority of cases the result with the J-RASTA PLP analysis is better than with Log-RASTA PLP, which confirms the theory.
    - Compared with the classification rate of the system using the multi-network structure (RBF/LVQ) described in [17], the rate of the discrete HMM is better.
    - The hybrid discrete HMM/MLP approaches always outperform the standard discrete HMM.
    - The hybrid discrete HMM/MLP system using GA clustering gave the best results and outperforms the hybrid discrete HMM/MLP system using FCM clustering, whose rate is in turn better than that of the hybrid model using KM clustering.

    Table 3. Recognition rates for the three speech databases (DB1, DB2, and DB3) with the five different types of models (M1, ..., M5) and application of the Log-RASTA PLP and J-RASTA PLP acoustic analyses. M1: RBF/LVQ, M2: discrete HMM, M3: hybrid HMM/MLP with KM inputs, M4: hybrid HMM/MLP with FCM inputs, M5: hybrid HMM/MLP with GA inputs.

              DB1                     DB2                     DB3
          Log-RASTA   J-RASTA    Log-RASTA   J-RASTA    Log-RASTA   J-RASTA
          PLP         PLP        PLP         PLP        PLP         PLP
    M1    80.33       79.30      85.12       85.70      72.84       71.78
    M2    85.75       87.45      89.68       90.23      75.31       76.48
    M3    90.42       92.96      92.37       93.65      75.84       80.35
    M4    97.28       97.46      96.16       96.89      73.23       82.62
    M5    96.32       98.37      96.82       99.32      75.22       82.95

    Table 4. Recognition rates for the biomedical database with the five different types of models (M1, ..., M5).

    Model   Recognition rate
    M1      80.66
    M2      84.48
    M3      90.12
    M4      93.76
    M5      95.23

    8 CONCLUSION AND FUTURE WORK

    In this paper, we presented tests of five types of models in the framework of speech recognition and medical diagnosis. A discriminative training algorithm for the hybrid HMM/MLP system based on genetic clustering was described. Our results on isolated-speech and biomedical signal recognition tasks show an increase in the estimates of the posterior probabilities of the correct class after training, and significant decreases in error rates, across the five systems compared: 1) multi-network RBF/LVQ structure, 2) discrete HMM, 3) HMM/MLP approach with KM clustering, 4) HMM/MLP approach


    with FCM clustering, and 5) HMM/MLP approach with GA clustering.

    These preliminary experiments have set a baseline performance for our hybrid GA/HMM/MLP system, and better recognition rates were observed. From the point of view of model effectiveness, it seems clear that the hybrid models are more powerful than the discrete HMM or the multi-network RBF/LVQ structure for both PEA signal diagnosis and the speech databases.

    We thus envisage improving the performance of the proposed system along the following lines:

    - It would be important to use other speech parameter extraction techniques and to compare the system's recognition rate with that obtained using the Log-RASTA PLP and J-RASTA PLP analyses. We are considering LDA (Linear Discriminant Analysis) and CMS (Cepstral Mean Subtraction), since these representations are currently considered among the most powerful in ASR.

    - It also appears interesting to use continuous HMMs with multi-Gaussian distributions and to compare the performance of the system with that of the discrete HMM.

    - In addition, for an extended speech vocabulary, it is interesting to use phoneme models instead of word models, which facilitates training with relatively small databases.

    - For PEA signal recognition, the main idea is to define a fusion scheme: cooperation of the HMM with the multi-network RBF/LVQ structure so as to arrive at a hybrid model whose performance can be compared with that of the HMM/MLP/GA model proposed in this paper.

    9 REFERENCES

    1. O. DEROO, C. RIIS, F. MALFRERE, H. LEICH, S. DUPONT, V. FONTAINE and J-M. BOITE. "Hybrid HMM/ANN system for speaker independent continuous speech recognition in French". Thesis, Faculté polytechnique de Mons, TCTS, BELGIUM, (1997).

    2. H. BOURLARD, S. DUPONT. "Sub-band-based speech recognition". In Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 1251-1254, MUNICH, (1997).
    3. F. BERTHOMMIER, H. GLOTIN. "A new SNR-feature mapping for robust multi-stream speech recognition". In Berkeley University of California, editor, Proceedings of the International Congress on Phonetic Sciences (ICPhS), vol. 1 of XIV, pp. 711-715, SAN FRANCISCO, (1999).

    4. J-M. BOITE, H. BOURLARD, B. D'HOORE, S. ACCAINO, J. VANTIEGHEM. "Task independent and dependent training: performance comparison of HMM and hybrid HMM/MLP approaches". IEEE, vol. I, pp. 617-620, (1994).
    5. J.C. BEZDEK, J. KELLER, R. KRISHNAPURAM and N.R. PAL. "Fuzzy models and algorithms for pattern recognition and image processing". KLUWER, Boston, LONDON, (1999).

    6. H. HERMANSKY, N. MORGAN. "RASTA Processing of speech". IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, (1994).

    7. A. HAGEN, A. MORRIS. "Comparison of HMM experts with MLP experts in the full combination multi-band approach to robust ASR". In International Conference on Spoken Language Processing, BEIJING, (2000).
    8. A. HAGEN, A. MORRIS. "From multi-band full combination to multi-stream full combination processing in robust ASR". In ISCA Tutorial Research Workshop ASR2000, Paris, FRANCE, (2000).

    9. L. LAZLI, M. SELLAMI. "Hybrid HMM-MLP system based on fuzzy logic for Arabic speech recognition". PRIS 2003, The Third International Workshop on Pattern Recognition in


    Information Systems, pp. 150-155, April 22-23, Angers, FRANCE, (2003).

    10. L. LAZLI, M. SELLAMI. "Connectionist Probability Estimators in HMM Speech Recognition using Fuzzy Logic". MLDM 2003: The 3rd International Conference on Machine Learning and Data Mining in Pattern Recognition, LNAI 2734, Springer-Verlag, pp. 379-388, July 5-7, Leipzig, GERMANY, (2003).

    11. L. LAZLI, A-N. CHEBIRA, K. MADANI. "Hidden Markov Models for Complex Pattern Classification". Ninth International Conference on Pattern Recognition and Information Processing, PRIP'07, http://uiip.bas-net.by/conf/prip2007/prip2007.php-id=200.htm, May 22-24, Minsk, BELARUS, (2007).
    12. L. LAZLI, A-N. CHEBIRA, M-T. LASKRI, K. MADANI. "Using hidden Markov models for classification of potentials evoked auditory". Conférence maghrébine sur les technologies de l'information, MCSEAI'08, pp. 477-480, April 28-30, USTMB, Oran, ALGERIA, (2008).
    13. D-L. PHAM, J-L. PRINCE. "An

    Adaptive Fuzzy C-means algorithm for Image Segmentation in the presence of Intensity Inhomogeneities". Pattern Recognition Letters, 20(1), pp. 57-68, (1999).

    14. S-K. RIIS, A. KROGH. "Hidden Neural Networks: A framework for HMM-NN hybrids". In Proc. ICASSP-97, April 21-24, Munich, GERMANY, (1997).

    15. H. TIMM. "Fuzzy Cluster Analysis of Classified Data". IFSA/NAFIPS, VANCOUVER, (2001).

    16. J-F. MOTSCH. "La dynamique temporelle du tronc cérébral: recueil, extraction et analyse optimale des potentiels évoqués auditifs du tronc cérébral". Thesis, University of Créteil, PARIS XII, (1987).


    17. A-S. DUJARDIN. "Pertinence d'une approche hybride multi-neuronale dans la résolution de problèmes liés au diagnostic industriel ou médical". Internal report, I2S laboratory, IUT of Sénart-Fontainebleau, University of Paris XII, Avenue Pierre Point, 77127 Lieusaint, FRANCE, (2006).

    18. A. MORRIS, A. HAGEN, H. GLOTIN, H. BOURLARD. "Multi-stream adaptive evidence combination for noise robust ASR". Accepted for publication in Speech Communication, (2000).

    19. A. MORRIS, A. HAGEN, H. BOURLARD. "MAP combination of multi-stream HMM or HMM/ANN experts". Accepted for publication in Eurospeech 2001, Special Event Noise Robust Recognition, Aalborg, DENMARK, (2001).

    20. H. HERMANSKY. "Perceptual Linear Predictive (PLP) Analysis for Speech". J. Acoust. Soc. Am., pp. 1738-1752, (1990).

    21. H. HERMANSKY. "RASTA Processing of speech". IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, (1994).

    22. L. LAZLI, A. CHEBIRA, M-T. LASKRI, K. MADANI. "Hybrid HMM/ANN system using fuzzy clustering for speech and medical pattern recognition". DICTAP 2011, Part II, CCIS 167, Springer-Verlag Berlin Heidelberg, pp. 557-570, (2011).
    23. J. KOEHLER, N. MORGAN, H. HERMANSKY, H.G. HIRSCH, G. TONG. "Integrating RASTA-PLP into Speech Recognition". Proceedings of the Int. Conf. on Acoustics, Speech and Signal Processing, Adelaide, AUSTRALIA, (1994).

    24. D. E. GOLDBERG. "Algorithmes génétiques - Exploration, optimisation et apprentissage automatique". Addison-Wesley, (1994).

    25. X. YIN and N. GERMAY. "A Fast Genetic Algorithm with sharing scheme using cluster analysis methods in Multimodal function optimization". In Proceedings of Artificial Neural Nets and Genetic Algorithms, (1993).

    26. D. GOLDBERG. "Genetic Algorithms". Addison-Wesley, editor.
