International Journal of Digital Information and Wireless Communications (IJDIWC) 1(1): 14-31The Society of Digital Information and Wireless Communications, 2011 (ISSN 2225-658X)
CONNECTIONIST PROBABILITY ESTIMATORS IN HMM USING GENETIC CLUSTERING: APPLICATION FOR SPEECH RECOGNITION AND MEDICAL DIAGNOSIS
Lilia Lazli1, Boukadoum Mounir2, Abdennasser Chebira3, Kurosh Madani3 and Mohamed Tayeb Laskri1

1 Laboratory of Research in Computer Science (LRI/GRIA), Badji Mokhtar University, B.P. 12 Sidi Amar, 23000 Annaba, Algeria.
[email protected], [email protected], http://www.univ-annaba.org
2 Université du Québec À Montréal (UQAM), Canada
[email protected]
3 Images, Signals and Intelligent Systems Laboratory (LISSI / EA 3956), PARIS XII University, Senart-Fontainebleau Institute of Technology, Bat. A, Av. Pierre Point, F-77127 Lieusaint, France
{chebira, kmadani}@univ-paris12.fr, http://www.univ-paris12.fr
ABSTRACT
The main goal of this paper is to compare the performance achieved by five different approaches and to analyze their application potential on real-world paradigms. We compare the performance obtained with (1) a multi-network RBF/LVQ structure, (2) discrete Hidden Markov Models (HMM), (3) a hybrid HMM/MLP system using a Multi-Layer Perceptron (MLP) to estimate the HMM emission probabilities and the K-Means algorithm for pattern clustering, (4) a hybrid HMM/MLP system using the Fuzzy C-Means (FCM) algorithm for fuzzy pattern clustering, and (5) a hybrid HMM/MLP system using a Genetic Algorithm (GA) for genetic clustering. Experimental results on an Arabic speech vocabulary and on biomedical signals show significant decreases in the error rates of hybrid HMM/MLP/GA pattern recognition in comparison with those of other research experiments. Three types of features (PLP, log-RASTA PLP, J-RASTA PLP) were used to test the robustness of our hybrid recognizer in the presence of convolutional and additive noise.
KEYWORDS
Speech and medical recognition, J-RASTA PLP, fuzzy clustering, Genetic Algorithm, HMM/MLP models.
1 INTRODUCTION
In many target (or pattern) classification problems, the availability of multiple looks at an object can substantially improve robustness and reliability in decision making. The use of several aspects is motivated by the difficulty of distinguishing between different classes from a single view of an object [9]. It occurs frequently that returns from two different objects at certain orientations are so similar that they may easily be confused. Consequently, a more reliable decision about the presence and type of an object can be made based upon observations of the received signals or patterns at multiple aspect angles. This allows more information to accumulate about the size, shape, composition and orientation of the objects, which in turn yields more accurate discrimination. Moreover, when the feature space undergoes changes owing to different operating and environmental conditions, multi-aspect classification is almost a necessity in order to maintain the
performance of the pattern recognition system.
In this paper we present Hidden Markov Models (HMM) and apply them to complex pattern recognition problems. We attempt to illustrate some applications of HMM theory to real problems involving complex patterns, such as those related to biomedical diagnosis or those linked to social behavior modeling. We introduce the theory and foundations of Hidden Markov Models. In the pattern recognition domain, and particularly in speech recognition, HMM techniques hold an important place [7]. There are two reasons for the popularity of HMMs. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. However, standard HMMs require the assumption that adjacent feature vectors are statistically independent and identically distributed. These assumptions can be relaxed by introducing Neural Networks (NN) into the HMM framework.
Significant advances have been made in recent years in the area of speaker-independent speech recognition. Over the last few years, connectionist models, and the Multi-Layer Perceptron (MLP) in particular, have been widely studied as potentially powerful approaches to speech recognition. These neural networks estimate the posterior probabilities used by the HMM. Among these, the hybrid approach using an MLP to estimate HMM emission probabilities has recently been shown to be particularly efficient, for example for French speech [1] and American English speech [14].
We then propose a hybrid HMM/MLP model for speech recognition and biomedical diagnosis which makes it possible to combine the discriminating capacity and noise resistance of the MLP with the flexibility of HMMs, in order to obtain better performance than traditional HMMs. We also develop a method based on fuzzy logic concepts for the clustering and classification of vectors, the Fuzzy C-Means (FCM) algorithm, and demonstrate its effectiveness compared with the traditional K-Means (KM) algorithm, owing to the fact that the KM algorithm provides a hard, non-probabilistic decision about the observations in the HMM states.
Regarding the clustering algorithm itself, we specifically considered methods involving supervised clustering by partition, and we chose one as the basis for our work. This solution requires choosing a quality measure to use in our application: an algorithm achieves a "good" score in a test that measures the quality of a partition. We are thus reduced to an optimization problem. The properties of this algorithm do not guarantee convergence to a global optimum, which is why we are interested in a type of heuristic, Genetic Algorithms (GA), that is less likely to be trapped by local minima and is now widely used in optimization problems.
Genetic Algorithms (GA) are search procedures based on the mechanics of genetics and natural selection. Numerous techniques and algorithms have been developed to assist the engineer in the creation of optimum development strategies. These procedures quickly converge to optimal solutions after examining only a small fraction of the search space and have been successfully applied to complex engineering optimisation problems.
2 HMM Advantages and Drawbacks

Standard HMM procedures, as defined above, have been very useful for pattern recognition; HMMs can deal efficiently with the temporal aspect of patterns (including temporal distortion or time warping) as well as with frequency distortion. There are powerful training and decoding algorithms that permit efficient training on very large databases, for example for the recognition of isolated words as well as continuous speech. Given their flexible topology, HMMs can easily be extended to include phonological rules (e.g., building word models from phone models) or syntactic rules. For training, only a lexical transcription is necessary (assuming a dictionary of phonological models); explicit segmentation of the training material is not required. An example of a simple HMM is given in Figure 1; this could be the model of a word assumed to be composed of three stationary states.
Figure 1. Example of a three-state Hidden Markov Model, where 1) {q1, q2, q3} are the HMM states; 2) p(q_i|q_j) is the transition probability from state q_i to state q_j (i, j = 1..3); 3) p(x_i|q_j) is the emission probability of observation x_i from state q_j (i = 1..number of frames, j = 1..3).
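To make concrete how such a model assigns a likelihood to an observation sequence, the following sketch implements the standard forward algorithm for a three-state left-to-right topology like that of Figure 1. The transition matrix A, the per-frame emission values B and the initial distribution pi are illustrative assumptions, not values taken from our experiments:

```python
import numpy as np

def forward_likelihood(A, B, pi):
    """Forward algorithm: P(O | HMM), where row t of B holds the
    per-state emission probabilities p(x_t|q_j) for frame t."""
    alpha = pi * B[0]                  # initialization
    for b in B[1:]:
        alpha = b * (alpha @ A)        # induction step
    return alpha.sum()                 # termination

# Hypothetical 3-state left-to-right model, as in Figure 1:
A = np.array([[0.6, 0.4, 0.0],        # transition probabilities
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])        # always start in q1
B = np.array([[0.9, 0.1, 0.2],        # emissions for 3 frames
              [0.5, 0.6, 0.1],
              [0.1, 0.8, 0.7]])
print(forward_likelihood(A, B, pi))   # likelihood of about 0.26892
```

The same routine is what the recognizer of Section 7 evaluates once per class model.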
However, the assumptions that permit HMM optimization and improve their efficiency also, in practice, limit their generality. As a consequence, although the theory of HMMs can accommodate significant extensions (e.g., correlation of acoustic vectors, discriminative training), practical considerations such as the number of parameters and trainability limit their implementations to simple systems usually suffering from several drawbacks [2], including:
- Poor discrimination, due to training algorithms that maximize likelihoods instead of a posteriori probabilities (i.e., the HMM associated with each pattern unit is trained independently of the other models). Discriminative learning algorithms do exist for HMMs, but in general they have not scaled well to large problems.
- A priori choice of model topology and statistical distributions, e.g., assuming the associated Probability Density Functions (PDF) to be multivariate Gaussian densities or mixtures of multivariate Gaussian densities, each with a diagonal-only covariance matrix (i.e., possible correlation between the components of the acoustic vectors is disregarded).
- The assumption that the state sequences are first-order Markov chains.
- Typically, very limited acoustic context is used, so that possible correlation between successive acoustic vectors is not modeled very well.
Much Artificial Neural Network (ANN) based automatic pattern recognition research has been motivated by these problems.
3 ESTIMATING HMM LIKELIHOODS WITH ANN

ANNs can be used to classify pattern classes, such as speech units. For statistical
recognition systems, the role of the local estimator is to approximate probabilities or PDFs. Practically, given the basic HMM equations, we would like to estimate something like p(x_n|q_k), the value of the probability density function of the observed data vector given the hypothesized HMM state. An ANN, in particular an MLP, can be trained to produce the posterior probability p(q_k|x_n) of the HMM state given the acoustic data. This can be converted to emission PDF values using Bayes' rule.
Several authors [1-4, 7-10] have shown for speech recognition that ANNs can be trained to estimate a posteriori probabilities of output classes conditioned on the input pattern. Recently, this property has been successfully used in HMM systems, referred to as hybrid HMM/ANN systems, in which ANNs are trained to estimate the local probabilities p(q_k|x_n) of HMM states given the acoustic data.
Since the network outputs approximate Bayesian probabilities, g_k(x_n, Θ) is an estimate of:

p(q_k | x_n) = p(x_n | q_k) p(q_k) / p(x_n)    (1)
which implicitly contains the a priori class probability p(q_k). It is thus possible to vary the class priors during classification without retraining, since these probabilities occur only as multiplicative terms in producing the network outputs. As a result, class probabilities can be adjusted during use of a classifier to compensate for training data with class probabilities that are not representative of actual use or test conditions [18], [19].
Thus, scaled likelihoods p(x_n|q_k) for use as emission probabilities in standard HMMs can be obtained by dividing the network outputs g_k(x_n) by the class priors p(q_k) estimated on the training set, which gives us an estimate of:

p(x_n | q_k) / p(x_n)    (2)
During recognition, the scaling factor p(x_n) is a constant for all classes and will not change the classification. It could be argued that, when dividing by the priors, we are using a scaled likelihood, which is no longer a discriminant criterion. However, this need not be true, since the discriminative training has affected the parametric optimization of the system that is used during recognition. Thus, this permits use of the standard HMM formalism, while taking advantage of ANN characteristics.
Figure 2 shows that the equilibrium reached in the ideal case does in fact correspond to the network output g being equal to the posterior probability p. Consider an area in feature space around a training pattern x_n, and assume that we consider only two classes: the target class for this pattern, and the class of all patterns that do not belong to the first class. Vectors in the selected area will undergo an upward force corresponding to the gradient of the error term for all patterns with a target of 1; the quadratic error term in this case is (1-g)^2, with a derivative proportional to (1-g), and this case will occur a fraction of the time given by probability p. Therefore, the upward force (average gradient term) applied to the region is equal to p(1-g). The downward force is similarly defined as (1-p)g, and the two will balance in the equilibrated case; setting p(1-g) = (1-p)g gives g = p. This will only occur when the network output is equal to the posterior probability.
Figure 2. Geometric illustration of the proof that an ANN will (ideally) produce posterior probabilities. Equilibrium is reached when the upward force (due to the average deviation of the ANN output from 1 for a target that should be high) is equal to the downward force (due to the average deviation from 0 for a target that should be low).

4 WHY IS THIS GOOD?
Since we ultimately derive essentially the same probability with an MLP as we would with a conventional (e.g., Gaussian mixture) estimator, what is the point? There are at least two potential advantages that we and others have observed [4, 9, 10, 14]:
1) The standard statistical recognizers require strong assumptions about the statistical character of the input, such as parameterizing the input densities as mixtures of Gaussian densities with no correlation between features, or as the product of discrete densities for different features that are assumed to be statistically independent. This type of assumption is not required with an MLP estimator, which is an advantage particularly when a mixture of feature types is used, e.g., binary and continuous. Specifically, standard HMM approaches require the assumption that successive acoustic vectors are uncorrelated. For the MLP estimator, multiple inputs can be used from a range of time steps, and the network will learn something about the correlation between the acoustic inputs. Note that the use of such a network will lead to a more general emission probability that may also be used in the global decoding. That is, if c + d + 1 frames of acoustic vectors

X_{n-c}^{n+d} = x_{n-c}, ..., x_n, ..., x_{n+d}

are used as input to provide contextual information to the network, the output values of the MLP will estimate p(q_k | X_{n-c}^{n+d}), k = 1, ..., K. This provides a simple mechanism for incorporating acoustic context into the statistical formulation.
2) MLPs are a good match to discriminative objective functions (e.g., Mean Squared Error, MSE), and so the probabilities will be optimized to maximize discrimination between sound classes, rather than to most closely match the distributions within each class. It can be argued that such training is conservative of parameters, since the parameters of the estimator are trained to split the space between classes, rather than to represent the volumes that compose the division of space constituting each class.
5 J-RASTA PLP ANALYSIS
Developing speech recognizers that are robust under realistic acoustic environments has been and still is a major problem for the speech community. Speech distortion can be categorized into two types. There is linear spectral distortion (i.e., convolutional noise), which is typically introduced by microphones and the telephone channel. There is also
additive noise, such as stationary or slowly varying background noise.
Three types of features (PLP, log-RASTA PLP and J-RASTA PLP) were used to test the robustness of our hybrid recognizer in the presence of convolutional and additive noise. Perceptual Linear Predictive (PLP) analysis is an extension of linear predictive analysis that takes into account some aspects of human sound perception [20, 21]. Log-RASTA PLP (RelAtive SpecTrAl processing) is based on PLP but also aims at reducing the effect of linear spectral distortion. In addition to the PLP and log-RASTA PLP analyses already used in our past works [22], we use here J-RASTA PLP, which tries to handle both linear spectral distortion and additive noise simultaneously [23].
J-RASTA-PLP [23] is used as the acoustic pre-processor for both database experiments. Each frame of the feature vector represents 25 ms of speech, with 12.5 ms overlap between consecutive frames. J-RASTA PLP was chosen for its robustness to the linear spectral distortions in speech signals that are often introduced by communication channels.
J-RASTA-PLP attempts to make recognition more robust and invariant to acoustic environment variables. In some sense it performs speech enhancement, but for the purposes of feature extraction. The speech enhancement process added was designed to improve the intelligibility of noisy speech for human listeners. Addition of this enhancement may improve recognition scores further.
Speech recordings were sampled over the microphone at 11 kHz. After pre-emphasis (factor 0.95) and application of a Hamming window, the J-RASTA PLP features were computed every 10 ms on analysis windows of 30 ms.
Each frame is represented by 12 components plus energy (J-RASTA PLP + E). The values of the 13 coefficients are standardized by their standard deviation measured on the training frames. The feature set for our hybrid HMM/MLP system was based on a 26-dimensional vector composed of the cepstral parameters (J-RASTA PLP parameters), the Δ cepstral parameters, the energy and the Δ energy.
RASTA was configured to incorporate high-pass filtering and slight spectral subtraction. A constant J of 1e-6 was used for training. Multiple-regression J mapping was used during testing.
Nine frames of contextual information were used at the input of the MLP (9 frames of context usually being known to yield the best recognition performance).
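The assembly of the MLP input described above (13 base coefficients, their time derivatives, and a 9-frame context window) can be sketched as follows. The frames here are random stand-ins, and the simple two-frame difference for Δ features is an illustrative assumption, not the exact RASTA configuration:

```python
import numpy as np

def delta(feats):
    """First-order time derivative, approximated by a symmetric
    two-frame difference (edge frames repeated)."""
    padded = np.vstack([feats[:1], feats, feats[-1:]])
    return (padded[2:] - padded[:-2]) / 2.0

def stack_context(feats, left=4, right=4):
    """Concatenate each frame with its 4 left and 4 right
    neighbours (9 frames total), repeating edge frames."""
    n = len(feats)
    idx = np.clip(np.arange(-left, right + 1)[None, :] +
                  np.arange(n)[:, None], 0, n - 1)
    return feats[idx].reshape(n, -1)

# Hypothetical utterance: 12 cepstra + energy = 13 dims per frame.
frames = np.random.randn(100, 13)
full = np.hstack([frames, delta(frames)])   # 26-dim vectors
mlp_input = stack_context(full)             # 9 x 26 = 234 dims per frame
print(mlp_input.shape)
```

The resulting 234-dimensional vectors correspond to the contextual window X_{n-4}^{n+4} of Section 4.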
6 CLUSTERING PROCEDURE
One of the basic problems that arises in a great variety of fields, including pattern recognition, machine learning and statistics, is the so-called clustering problem [5, 13, 15, 22]. The fundamental data clustering problem may be defined as discovering groups in data, or grouping similar objects together. Each of these groups is called a cluster: a region in which the density of objects is locally higher than in other regions.
In this paper, data clustering is viewed as a data partitioning problem. Several approaches to find groups in a given database have been developed [5, 13, 15], but we focus on the K-Means (KM) algorithm, as it is one of the most used iterative partitional clustering algorithms and because it may also be used to initialize more expensive clustering algorithms (e.g., the EM algorithm).
6.1 Drawbacks of the K-Means Algorithm

Despite being used in a wide array of applications, the KM algorithm is not exempt from drawbacks. Some of these drawbacks have been extensively reported in the literature [5, 13, 15, 22]. The most important are listed below:
- Like many clustering methods, the KM algorithm assumes that the number of clusters k in the database is known beforehand, which, obviously, is not necessarily true in real-world applications.
- As an iterative technique, the KM algorithm is especially sensitive to initial starting conditions (initial clusters and instance order).
- The KM algorithm converges finitely to a local minimum. A run of the algorithm defines a deterministic mapping from the initial solution to the final one.
- Vector quantization, and the KM algorithm in particular, provides a hard, fixed, non-probabilistic decision that does not transmit enough information about the real observations.
Many clustering algorithms based on fuzzy logic concepts have been motivated by this last problem.
6.2 Fuzzy C-Means Algorithm
In general, and for example in the speech context, a purely acoustic segmentation of speech cannot suitably detect the basic units of the vocal signal. One of the causes is that the borders between these units are not acoustically defined. For this reason, we were interested in using automatic classification methods based on fuzzy logic in order to segment the data vectors. Among the suitable algorithms, we chose the FCM algorithm, which has already been used successfully in various fields, especially in image processing [13], [15].
The FCM algorithm is a clustering method that allows one piece of data to belong to two or more clusters. The measured data are used to take note of the pattern data by considering the spectral domain only. However, this method is applied for searching some general regularity in the collocation of patterns, focused on finding a certain class of geometrical shapes favored by the particular objective function [5]. The FCM algorithm is based on the minimization of the following objective function, with respect to U, a fuzzy c-partition of the data set, and to V, a set of K prototypes [5]:
J_m(U, V) = Σ_{j=1}^{n} Σ_{i=1}^{c} (u_ij)^m ||x_j - V_i||^2    (3)
where m is any real number greater than 1, u_ij is the degree of membership of x_j in cluster i, x_j is the j-th d-dimensional measured data point, V_i is the d-dimensional center of cluster i, and ||*|| is any norm expressing the similarity between measured data and a center.
Fuzzy partitioning is carried out through an iterative optimization of (3), with the update of the memberships u_ij and the cluster centers V_i given by [5]:
u_ij = 1 / Σ_{k=1}^{c} (d_ij / d_kj)^{2/(m-1)}    (4)

V_i = Σ_{j=1}^{n} (u_ij)^m x_j / Σ_{j=1}^{n} (u_ij)^m    (5)

where d_ij = ||x_j - V_i||.
The iteration stops when max_ij |u_ij^(t+1) - u_ij^(t)| < ε, where ε is a termination criterion between 0 and 1.
As a part of our main objective, we aim to find the best and the worst sets of initial starting conditions, to approach the extremes of the probability distributions of the square-error values. Due to the computational expense of performing an exhaustive search, we tackle the problem using Genetic Algorithms.
6.3 Genetic Algorithms
Roughly speaking, we can say that Genetic Algorithms (GA) are kinds of evolutionary algorithms, that is, probabilistic search algorithms which simulate natural evolution [24, 25, 26]. GAs are used to solve combinatorial optimization problems following the rules of natural selection and natural genetics. They are based upon the survival of the fittest among string structures, together with a structured yet randomized information exchange. Working in this way, and under certain conditions, GAs evolve to the global optimum with probability arbitrarily close to 1.
When dealing with GAs, the search space of a problem is represented as a collection of individuals, which are represented by character strings. Each individual encodes a solution to the problem. In addition, each individual has an associated fitness measure. The part of the space to be examined is called the population. The purpose of using a GA is to find the individual from the search space with the best genetic material.
Figure 3 shows the pseudo-code of the GA that we use. First, the initial population is chosen and the fitness of each of its individuals is determined. In our case, the result of FCM clustering is considered as the initial population. Next, in every iteration, two parents are selected from the population. This parental couple produces children which, with a probability near zero, are mutated, i.e., hereditary distinctions are changed. After the evaluation of the children, the worst individual of the population is replaced by the fittest of the children. This process is iterated until a convergence criterion is satisfied.
The operators which define the child production process and the mutation process are the crossover operator and the mutation operator, respectively. Both operators are applied with different probabilities and play different roles in the GA. Mutation is needed to explore new areas of the search space and helps the algorithm avoid local optima. Crossover aims to increase the average quality of the population. By choosing adequate crossover and mutation operators, as well as an appropriate reduction mechanism, the probability that the GA reaches a near-optimal solution in a reasonable number of iterations increases.
Figure 3. The pseudo-code of the Genetic Algorithm

Begin GA
  Choose initial population at random
  Evaluate initial population
  While not convergence criterion do
  Begin
    Select two parents from the current population
    Produce children by the selected parents
    Mutate the children
    Evaluate the children
    Replace the worst individual of the population by the best child
  End
  Output the best individual found
End GA
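The pseudo-code of Figure 3 can be sketched as a runnable steady-state GA. The bit-string encoding, the one-max fitness, and the elitist variant of the replacement step (the worst individual is replaced only when the best child improves on it) are illustrative assumptions, not the exact clustering encoding of our system:

```python
import random

def steady_state_ga(fitness, n_bits=20, pop_size=10,
                    p_mut=0.02, iters=2000, seed=0):
    """Steady-state GA following Figure 3: select two parents,
    produce children by one-point crossover, mutate them with a
    probability near zero per bit, and replace the worst
    individual of the population with the best child."""
    rnd = random.Random(seed)
    pop = [[rnd.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(iters):
        p1, p2 = rnd.sample(pop, 2)            # parent selection
        cut = rnd.randrange(1, n_bits)         # one-point crossover
        children = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        for ch in children:                    # mutation
            for i in range(n_bits):
                if rnd.random() < p_mut:
                    ch[i] = 1 - ch[i]
        best_child = max(children, key=fitness)
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(best_child) > fitness(pop[worst]):
            pop[worst] = best_child            # replace the worst
    return max(pop, key=fitness)               # best individual found

# Toy fitness (one-max): number of 1-bits; the optimum is all ones.
best = steady_state_ga(sum)
print(sum(best))
```

In our system the individuals would encode cluster assignments and the fitness would be the partition-quality score of Section 1, with the FCM result seeding the initial population.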
7 VALIDATION ON SPEECH AND BIOMEDICAL SIGNAL CLASSIFICATION PARADIGMS

Further assume that for each class in the vocabulary we have a training set of k occurrences (instances) of the class, where each instance constitutes an observation sequence. In order to build our tool, we perform the following operations.
1. For each class v in the vocabulary, we estimate the model parameters λ_v = (A, B, π) that optimize the likelihood of the training set for the v-th category.
2. For each unknown category to be recognized, the processing of Fig. 4 is carried out: measurement of the observation sequence O = {o_1, o_2, ..., o_T} via a feature analysis of the signal corresponding to the class; computation of the model likelihoods P(O|λ_v), 1 ≤ v ≤ V, for all possible models; and finally selection of the category with the highest likelihood.
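The recognition step above reduces to an argmax over per-model likelihoods. The sketch below shows that decision rule in isolation; the stand-in scoring function and the two-word "vocabulary" are hypothetical placeholders for the trained HMM likelihoods P(O|λ_v):

```python
import numpy as np

def classify(O, models, likelihood):
    """Pick the class v maximizing P(O | lambda_v) over all
    trained models, as in the recognition step of Fig. 4."""
    scores = {v: likelihood(O, lam) for v, lam in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical stand-in likelihood: a Gaussian-style score of the
# sequence mean against a per-class template (NOT a real HMM).
def toy_likelihood(O, lam):
    return float(np.exp(-np.sum((np.mean(O, axis=0) - lam) ** 2)))

models = {"yes": np.array([1.0, 0.0]), "no": np.array([0.0, 1.0])}
O = np.array([[0.9, 0.1], [1.1, -0.1]])    # close to the "yes" template
label, scores = classify(O, models, toy_likelihood)
print(label)   # -> yes
```

In the full system, `toy_likelihood` would be replaced by the forward-pass likelihood of each word HMM.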
7.1 Speech Databases
Three speech databases have been used in this work:
1) The first one, referred to as DB1, the isolated-digits task, has 13 words in the vocabulary: 1, 2, 3, 4, 5, 6, 7, 8, 9, zero, oh, yes, no. They are spoken by 30 speakers, producing a total of 3900 utterances (each word being uttered 10 times). The digits database has about 30,000 frames of training data. This first corpus consists of isolated digits collected over the microphone. This database was used to conduct extensive tests of the robustness of various signal-processing front ends (PLP, log-RASTA PLP, J-RASTA PLP) in the presence of both convolutional and additive noise.
2) The second database, referred to as DB2, contains about 50 speakers saying their last name, first name, city of birth and city of residence. Each word was uttered 10 times. The training set used in the following experiments consists of 2000 sounds.
Figure 4. Block diagram of a speech and biomedical database HMM recognizer
3) The third database, referred to as DB3, contains the 13 control words (e.g., view/new, save/save as/save all), each speaker pronouncing each control word 10 times. The training set used in the following experiments consists of 3900 sounds spoken by 30 speakers.
For each database, 12 speakers were used for training (a non-overlapping subset of these was used for cross-validation, to adapt the learning rate of the MLP), while the remaining 8 speakers were used for testing.
The acoustic features were quantized into independent codebooks according to the FCM algorithm, respectively:
- 128 clusters for the J-RASTA PLP vectors.
- 128 clusters for the first time derivative of the cepstral vectors.
- 32 clusters for the first time derivative of the energy.
- 32 clusters for the second time derivative of the energy.
7.2 Biomedical Database
For the task of biomedical database recognition using the HMM/MLP model, the object of this survey is the classification of an electrical signal coming from a medical test [16]; the experimental device is described in Fig. 5. The signals used are Auditory Evoked Potentials (PEA); examples of PEA signals are illustrated in Fig. 6. Indeed, functional exploration in otoneurology possesses a technique permitting the objective study of nervous conduction along the auditory pathways. The classification of PEA signals is a first step in the development of a diagnosis aid tool. The main difficulty of this classification resides in the resemblance of signals corresponding to different pathologies, but also in the disparity of the signals within a same class. The results of the medical test can indeed differ between two measurements for the same patient.
The PEA signals resulting from the examination and their associated pathology are defined in a database containing the files of 11185 patients. We chose 3 categories of patients (3 classes) according to the type of their trouble. The categories of patients are:
1) Normal (N): the patients of this category have normal audition (normal class).
2) Endocochlear (E): these patients suffer from disorders that affect the part of the ear situated before the cochlea (endocochlear class).
3) Retrocochlear (R): these patients suffer from disorders that affect the part of the ear situated at the level of the cochlea or after the cochlea (retrocochlear class).
We selected 213 signals (corresponding to patients), such that every process (signal) contains 128 parameters. Among the 213 signals, 92 belong to class N, 83 to class E and 38 to class R. To construct our training basis, we chose the signals corresponding to pathologies indicated as certain by the physician. All PEA signals come from the same experimental system. In order to evaluate the realized HMM/MLP system, and for performance comparison purposes, we applied the same conditions already respected in the work of the group described in [17], which uses a heterogeneous multi-network structure based on RBF and LVQ networks, and in the works described in [11], [12], which use a discrete HMM. For this reason, the training basis contains 24 signals, of which 11 correspond to class R, 6 to class E and 7 to class N.
International Journal of Digital Information and Wireless Communications (IJDIWC) 1(1): 14-31The Society of Digital Information and Wireless Communications, 2011 (ISSN 2225-658X)
After the training phase, when unlearned signals are presented to the HMM, the corresponding class must be designated. When we wish to use an HMM with discrete observation symbol densities rather than the continuous vectors above, a vector quantizer (VQ) is required to map each continuous observation vector to a discrete codebook index. Once the codebook of vectors has been obtained, the mapping between continuous vectors and codebook indices becomes a simple nearest-neighbor computation, i.e., the continuous vector is assigned the index of the nearest codebook vector. Thus the major issue in VQ is the design of an appropriate codebook for quantization. A codebook size of M = 64 vectors was used in the biomedical database recognition experiments with the HMM/MLP model.
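The nearest-neighbor codebook lookup described above can be sketched in a few lines; this sketch is ours, not the authors' implementation, and the small 2-D codebook is purely illustrative (the paper uses M = 64 codewords).

```python
import numpy as np

def quantize(frames, codebook):
    """Map each continuous observation vector to the index of the
    nearest codebook vector (Euclidean nearest-neighbor rule)."""
    # Pairwise distances, shape (n_frames, n_codewords)
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

# Toy example: codebook of M = 4 two-dimensional codewords
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
frames = np.array([[0.1, 0.1], [0.9, 1.1]])
print(quantize(frames, codebook))  # -> [0 1]
```
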
Figure 5. PEA generation experiment
Figure 6. (A) PEA signal for a normal patient, (B) Patient with an auditory disorder
7.3 Discrete MLP with entries provided by the FCM algorithm
For this case, we compare the performance of the base hybrid model with that of a hybrid HMM/MLP model whose network input is an acoustic vector of real values obtained by applying the FCM algorithm, with 2880 real components corresponding to the membership degrees of the acoustic vectors to the classes of the codebook. We represented each cepstral parameter (J-RASTA PLP, ΔJ-RASTA PLP, E, ΔE) by a real vector whose components define the membership degrees of the parameter to the various classes of the codebook. For the speech databases, we report for example the values used for DB2: 10-state, strictly left-to-right word HMMs with emission probabilities computed from an MLP taking 9 frames of quantized acoustic vectors at its input, i.e., the current acoustic vector preceded by the 4 acoustic vectors of the left context and followed by the 4 acoustic vectors of the right context. The input layer is thus a real vector with 2880 components corresponding to the membership degrees of the acoustic vectors to the classes of the codebook.
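The FCM membership degrees that make up such an input vector follow the standard fuzzy c-means membership formula. The following is a hedged sketch (not the authors' code); the fuzzifier m = 2 and the toy centers are assumptions.

```python
import numpy as np

def fcm_memberships(x, centers, m=2.0):
    """Membership degrees of vector x to each cluster center,
    standard FCM formula with fuzzifier m; the degrees sum to 1."""
    d = np.linalg.norm(centers - x, axis=1)
    if np.any(d == 0):                 # x coincides with a center
        u = (d == 0).astype(float)
        return u / u.sum()
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

# Toy codebook of 3 centers in 2-D (illustrative only)
centers = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]])
u = fcm_memberships(np.array([0.5, 0.5]), centers)
print(u)        # highest membership for the nearest center
print(u.sum())  # 1.0
```
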
Thus an MLP with a single hidden layer was trained, with 2880 input neurons, 30 hidden neurons and 10 output neurons. The three-layer MLP is trained by stochastic gradient descent with relative entropy as the error criterion. A sigmoid function is applied to the hidden-layer units, and a softmax (exponential of the unit's weighted sum normalized by the sum of exponentials over the entire layer) is used as the output nonlinearity. The number of hidden-layer neurons was chosen empirically.
For the PEA signals, an MLP with 64 input neurons, 18 hidden neurons and 5 output neurons was trained.
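The forward pass of such an MLP (sigmoid hidden layer, softmax output) can be sketched in NumPy. The 2880-30-10 dimensions follow the text; the random weights are placeholders standing in for trained ones, so this only illustrates the architecture, not the trained estimator.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # numerically stabilized softmax
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 2880, 30, 10    # speech configuration from the text
W1 = rng.normal(0, 0.01, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.01, (n_out, n_hid)); b2 = np.zeros(n_out)

x = rng.random(n_in)                 # one frame of FCM membership degrees
h = sigmoid(W1 @ x + b1)             # sigmoid hidden layer
p = softmax(W2 @ h + b2)             # class posteriors P(class | frame)
print(p.sum())                       # 1.0
```

The posteriors p are what the hybrid system divides by the class priors to obtain scaled HMM emission likelihoods.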
7.4 Discrete MLP with entries provided by the GA algorithm
When applying the GA, an initial population is chosen and the fitness of each of its individuals is determined. In our case, the result of FCM clustering is taken as the initial population. For clarification, we report below some results of the clustering of DB1.
7.4.1 Coding of individuals
The key idea is that the encoding of an individual must enable effective sampling of the search space. We chose a representation based on class indexing: an individual is encoded by a string of integers whose ordinal value (= index) represents the number of the object and whose cardinal value corresponds to the number of the class to which the object belongs.
Practical consideration: the following partition of the objects (acoustic vectors) into 10 classes (the 10 digits), class 0 = {V1, V2, ... V21, V22}, class 1 = {V23, V24, ... V34, V35}, class 2 = {V36, V37, ...}, ..., class 9 = {...}, will be represented by the following chromosome:
000000... 111111 ... 888888 999999
Figure 7. Example of coding of individuals
7.4.2 Population size
This is a fundamental parameter of the GA, for which there is no general criterion determining what the optimum size should be. We tried to adapt the population size to the size of the problem being treated.
7.4.3 Merit function
An individual, or genotype, represents, let us recall, a partitioning of the k observations into classes. We use the following representation function:

$g_l = \frac{1}{|C_l|} \sum_{x_i \in C_l} x_i$   (6)
where $g_l$ is the centre of gravity of class $C_l$, $l \in [1, M]$, with $M$ the number of classes (the digits in our case), which allows the $M$ representatives to be inferred. The intra-class criterion, a measurement of the quality of a partition whose expression is given in (7), is our objective (cost) function, whose value we seek to minimize. A transformation of this objective function (criterion) yields the merit function that we use in our algorithm to measure the quality of an individual.
$W = \sum_{l=1}^{M} \sum_{x_i \in C_l} p_i \, d^2(x_i, g_l)$   (7)
where $p_i$ denotes the weight of the i-th object.
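Criteria (6) and (7) can be computed directly from a chromosome; the following sketch uses toy 1-D data with all weights $p_i = 1$ (our illustration, not the paper's data).

```python
import numpy as np

def fitness(chrom, data, weights=None):
    """Weighted within-class inertia W = sum_l sum_{x_i in C_l} p_i d^2(x_i, g_l),
    where g_l is the centre of gravity of class l (Eqs. (6) and (7))."""
    chrom = np.asarray(chrom)
    if weights is None:
        weights = np.ones(len(data))
    w = 0.0
    for label in np.unique(chrom):
        members = data[chrom == label]
        g = members.mean(axis=0)                          # Eq. (6)
        w += (weights[chrom == label]
              * ((members - g) ** 2).sum(axis=1)).sum()   # Eq. (7)
    return w

data = np.array([[0.0], [1.0], [10.0], [11.0]])
print(fitness([0, 0, 1, 1], data))  # -> 1.0   (two tight clusters)
print(fitness([0, 1, 0, 1], data))  # -> 100.0 (bad partition)
```

Smaller values mean a better partition, which is why the GA below minimizes this fitness.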
7.4.4 Reproduction
In this phase, a new population is created at each iteration by applying the genetic operators: selection, crossover and mutation. The idea is to select the individuals of the population that are fittest in the sense of the merit function, following a given method, and to reproduce them as such in the next generation. In our GA we used the so-called "steady-state genetic algorithm" variant, which replaces only a certain percentage of the population at each
generation (the population size remains constant over the course of the GA). The evolution of the population is ensured through the selection, crossover and mutation operators, which combine and modify chromosomes. We summarize the genetic operators used.
- Selection: there are many selection strategies; we briefly present just two. In roulette-wheel selection, the probability of selecting an individual is proportional to its adaptation value. Tournament selection consists of selecting a subset of the population and retaining the best individual of this sub-group. Since in our case the problem is one of fitness minimization, the best individual is the one with the smallest fitness value.
Practical consideration: according to the fitness of each chromosome of the population, the best individuals are selected.
Table 1. Selection of the best individuals

Parent (unsorted)  fitness   |  Parent (sorted)  fitness
1                  6.4520    |  1                6.2424
2                  6.8843    |  2                6.4520
3                  7.3374    |  3                6.4746
4                  6.2424    |  4                6.7076
5                  6.7076    |  5                6.8331
6                  7.1452    |  6                6.8843
7                  6.4746    |  7                7.1452
8                  6.8331    |  8                7.3374
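Since fitness is minimized, ranking the parents as in Table 1 is a simple sort; tournament selection can be sketched alongside it. This sketch is ours; only the eight fitness values come from Table 1.

```python
import random

def tournament_select(population, fitnesses, k=3, rng=random.Random(0)):
    """Pick k individuals at random and return the one with the SMALLEST
    fitness, since we minimize the within-class inertia."""
    idx = rng.sample(range(len(population)), k)
    best = min(idx, key=lambda i: fitnesses[i])
    return population[best]

# Fitness values of the eight parents from Table 1
fitnesses = [6.4520, 6.8843, 7.3374, 6.2424, 6.7076, 7.1452, 6.4746, 6.8331]
parents = list(range(1, 9))          # parent labels 1..8

ranked = sorted(parents, key=lambda p: fitnesses[p - 1])
print(ranked)                        # -> [4, 1, 7, 5, 8, 2, 6, 3], as in Table 1

winner = tournament_select(parents, fitnesses)
print(winner)                        # one of the fitter parents
```
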
- Crossover: this operator creates two individuals (children) by combining genes from the two parents obtained in the previous step, and allows the average quality of the population to be increased. We used the variant with a randomly chosen crossover point among the possible cut points (for a chain of length l there are l - 1). However, this solution may lead to a non-viable child (or even two), which is why different cut points are tried if necessary. This search can optionally be restricted by fixing a priori a number of iterations; if at the end of this check the child is still non-viable, one of the parents is taken instead. A child is said to be non-viable if it does not have the same number of classes as its parents.
Practical consideration: consider the two following parents selected in our application.

Parent 1 (with cut point)
00000000 11110111 ... 33633333 44744477
55556655 66665566 88338822 ... 99999990

Parent 2
00000000 11111115 ... 33333333 44444444 ...
55556565 66666666 88888833 99999999 ...

Figure 8. The two parents
A cut point of 102 is obtained. The two children obtained are the following:
Child 1
00000000 11110111 33633333 44444444
55556565 66666666 88888833 99999999

Child 2
00000000 11111115 33333333 44744477
55556655 66665566 88338822 99999990

Figure 9. The two children
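A sketch of the one-point crossover with the viability test described above (a child is viable only if it keeps the same number of classes as its parents). The short 10-gene parents are toy data, not the 102-gene cut of Figure 8.

```python
import random

def crossover(p1, p2, max_tries=20, rng=random.Random(1)):
    """One-point crossover; retry with new cut points until both children
    are viable (same number of classes as the parents), else keep parents."""
    n_classes = len(set(p1) | set(p2))
    for _ in range(max_tries):
        cut = rng.randrange(1, len(p1))      # l - 1 possible cut points
        c1 = p1[:cut] + p2[cut:]
        c2 = p2[:cut] + p1[cut:]
        if len(set(c1)) == n_classes and len(set(c2)) == n_classes:
            return c1, c2
    return p1, p2                            # give up: keep the parents

p1 = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
p2 = [0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
c1, c2 = crossover(p1, p2)
print(c1, c2)   # both children keep all 4 classes
```
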
Regarding the value of the crossover probability, we followed the recommendations proposed by Goldberg [24], citing the work of De Jong, who suggests a high crossover probability (≈ 0.5).
- Mutation: this stage introduces random variation into the genotype of an individual (here, the children). This operator therefore allows new areas of the search space to be explored, thus decreasing the risk of converging towards local minima. Each gene is therefore liable to change with a given
probability; it is also possible to choose a portion of the genes at random. The mutation may lead to a non-viable individual; this possibility is therefore taken into account in the coding of this operator.
Practical consideration: consider the genes of Child 1 presented above.

Child before mutation
00000000 11110111 33633333 44444444
55556565 66666666 88888833 99999999

Child after mutation
00000000 11110111 33633333 44444444
55556565 66666666 88888833 97999999
(changed gene: the second digit of the last block, 9 → 7)

Figure 10. Example of mutation
Here again, we followed Goldberg's recommendations [24] for the mutation probability, adopting a value inversely proportional to the population size.
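A sketch of mutation with probability inversely proportional to the population size, per the recommendation above. Discarding a mutant that loses a class is one possible way to handle non-viable individuals; the toy chromosome is ours.

```python
import random

def mutate(chrom, n_classes, pop_size, rng=random.Random(2)):
    """Flip each gene to a random class with probability 1/pop_size;
    if the mutant loses a class (non-viable), keep the original."""
    p_mut = 1.0 / pop_size
    mutant = [rng.randrange(n_classes) if rng.random() < p_mut else g
              for g in chrom]
    return mutant if len(set(mutant)) == len(set(chrom)) else chrom

chrom = [0, 0, 1, 1, 2, 2, 3, 3]
mutant = mutate(chrom, n_classes=4, pop_size=8)
print(mutant)   # at most a few genes changed, all 4 classes preserved
```
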
7.4.5 Replacement of the new population
We replaced the worst solutions. For this purpose, we ranked all solutions (parents and children) according to their adaptation and kept in the population only the N first individuals, i.e. the smallest fitness values, obtaining a new population consisting of the N best adaptations. If the number of generations has been reached, the best solution is extracted and we move on to the next step; otherwise a new reproduction iteration is performed.
Practical consideration: the population of parents and children obtained is presented in the left-hand table. To replace the worst individuals, we ranked all the parents and their children by increasing fitness; only the N first individuals are kept in the population. The new population is presented in the right-hand table.
Table 2. Best population obtained

Child   fitness   |  Parent  fitness
1       4.2582    |  1       3.1490
2       11.5489   |  2       3.1803
3       7.2761    |  3       3.2824
4       9.1245    |  4       3.3785
5       13.5487   |  5       3.5947
6       5.0214    |  6       3.8692
7       3.1421    |  7       3.8761
8       3.8692    |  8       3.8815
9       3.5103    |
10      7.6815    |
11      7.2584    |
12      3.1436    |

New population   fitness
1: Child 7       3.1421
2: Child 12      3.1436
3: Parent 1      3.1490
4: Parent 2      3.1803
5: Parent 3      3.2824
6: Parent 4      3.3785
7: Child 9       3.5103
8: Parent 5      3.5947
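The replacement step is a sort over the merged parent and child populations, keeping the N smallest fitness values. The sketch below is ours; the fitness values are those of Table 2, and the printed order reproduces the table's new population.

```python
def replace(parents, children, n):
    """Merge parents and children, rank by increasing fitness, keep N best.
    Each individual is a (label, fitness) pair."""
    merged = parents + children
    return sorted(merged, key=lambda ind: ind[1])[:n]

parents = [("Parent 1", 3.1490), ("Parent 2", 3.1803), ("Parent 3", 3.2824),
           ("Parent 4", 3.3785), ("Parent 5", 3.5947), ("Parent 6", 3.8692),
           ("Parent 7", 3.8761), ("Parent 8", 3.8815)]
children = [("Child 1", 4.2582), ("Child 2", 11.5489), ("Child 3", 7.2761),
            ("Child 4", 9.1245), ("Child 5", 13.5487), ("Child 6", 5.0214),
            ("Child 7", 3.1421), ("Child 8", 3.8692), ("Child 9", 3.5103),
            ("Child 10", 7.6815), ("Child 11", 7.2584), ("Child 12", 3.1436)]

new_pop = replace(parents, children, n=8)
for label, fit in new_pop:
    print(label, fit)
# -> Child 7, Child 12, Parent 1, Parent 2, Parent 3, Parent 4, Child 9, Parent 5
```
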
7.4.6 Stopping test

There is no stopping criterion that guarantees convergence of the GA to an optimal solution. It is customary to fix the number of iterations a priori; when this number is reached, the best chromosome obtained is taken as the optimal solution. We used this method in our experiments: for the GA, we set a maximum iteration number that serves as the stopping test.
Figure 11. Convergence of the classification process for the base of Arabic numerals (0-9).
7.5 Results and Discussion
Five types of models were compared in the framework of speech and medical database recognition. The experimental results are reported in Tables 3 and 4. Table 3 summarizes the results obtained by the models using the acoustic parameters provided by Log-RASTA PLP and J-RASTA PLP analysis. For the speech databases, all tests of the five models were carried out first on DB1, then on DB2 and finally on DB3. For the majority of cases, we can draw a preliminary conclusion from the results reported in Tables 3 and 4 for speech recognition and PEA signal diagnosis. We report here the results described in [22].
- For the speech databases recorded in real conditions and noisy environments, in the majority of cases the result with J-RASTA PLP analysis is better than with Log-RASTA PLP, which confirms the theory.
- Compared with the classification rate of the system using the multi-network (RBF/LVQ) structure described in [17], the rate of the discrete HMM is better.
- The hybrid discrete HMM/MLP approaches always outperform the standard discrete HMM.
- The hybrid discrete HMM/MLP system using GA clustering gave the best results, outperforming the hybrid discrete HMM/MLP system using FCM clustering, whose rate is in turn better than that of the hybrid model using KM clustering.
Table 3. Recognition rates (%) for the three speech databases (DB1, DB2 and DB3) with the five different types of models (M1, ..., M5) and application of Log-RASTA PLP and J-RASTA PLP acoustic analysis. M1: RBF/LVQ, M2: Discrete HMM, M3: Hybrid HMM/MLP with KM entries, M4: Hybrid HMM/MLP with FCM entries and M5: Hybrid HMM/MLP with GA entries.

        DB1                     DB2                     DB3
        Log-RASTA   J-RASTA     Log-RASTA   J-RASTA     Log-RASTA   J-RASTA
        PLP         PLP         PLP         PLP         PLP         PLP
M1      80.33       79.30       85.12       85.70       72.84       71.78
M2      85.75       87.45       89.68       90.23       75.31       76.48
M3      90.42       92.96       92.37       93.65       75.84       80.35
M4      97.28       97.46       96.16       96.89       73.23       82.62
M5      96.32       98.37       96.82       99.32       75.22       82.95
Table 4. Recognition rates (%) for the biomedical database with the five different types of models (M1, ..., M5).

Models   Recognition rate
M1       80.66
M2       84.48
M3       90.12
M4       93.76
M5       95.23
8 CONCLUSION AND FUTURE WORK

In this paper, we presented tests of five types of models in the framework of speech recognition and medical diagnosis. A discriminative training algorithm for the hybrid HMM/MLP system based on genetic clustering was described. Our results on isolated-speech and biomedical signal recognition tasks show an increase in the estimates of the posterior probabilities of the correct class after training, and significant decreases in error rates when comparing the five systems: 1) multi-network RBF/LVQ structure, 2) discrete HMM, 3) HMM/MLP approach with KM clustering, 4) HMM/MLP approach
with FCM clustering and 5) HMM/MLP approach with GA clustering.
These preliminary experiments have set a baseline performance for our hybrid GA/HMM/MLP system, and better recognition rates were observed. From the standpoint of model effectiveness, it seems clear that the hybrid models are more powerful than the discrete HMM or the multi-network RBF/LVQ structure for PEA signal diagnosis and the speech databases. We thus envisage improving the performance of the proposed system along the following lines:
- It would be important to use other speech parameter extraction techniques and to compare the recognition rate of the system with that obtained using Log-RASTA PLP and J-RASTA PLP analysis. We are considering LDA (Linear Discriminant Analysis) and CMS (Cepstral Mean Subtraction), since these representations are currently considered among the most powerful in ASR.
- It also appears interesting to use a continuous HMM with a multi-Gaussian distribution and to compare the performance of the system with that of the discrete HMM.
- In addition, for an extended speech vocabulary, it would be interesting to use phoneme models instead of word models, which facilitates training with relatively small databases.
- For PEA signal recognition, the main idea is to define a fusion scheme: cooperation of the HMM with the multi-network RBF/LVQ structure so as to arrive at a hybrid model, and to compare its performance with that of the HMM/MLP/GA model proposed in this paper.
9 REFERENCES
1. O. DEROO, C. RIIS, F. MALFRERE, H. LEICH, S. DUPONT, V. FONTAINE and J-M. BOITE. "Hybrid HMM/ANN system for speaker independent continuous speech recognition in French". Thesis, Faculté Polytechnique de Mons TCTS, BELGIUM, (1997).
2. H. BOURLARD, S. DUPONT. "Sub-band-based speech recognition". In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1251-1254, MUNICH, (1997).
3. F. BERTHOMMIER, H. GLOTIN. "A new SNR-feature mapping for robust multi-stream speech recognition". In Berkeley University of California, editor, Proceedings of the International Congress on Phonetic Sciences (ICPhS), Vol. 1 of XIV, pp. 711-715, SAN FRANCISCO, (1999).
4. J-M. BOITE, H. BOURLARD, B. DHOORE, S. ACCAINO, J. VANTIEGHEM. "Task independent and dependent training: performance comparison of HMM and hybrid HMM/MLP approaches". IEEE, vol. I, pp. 617-620, (1994).
5. J.C. BEZDEK, J. KELLER, R. KRISHNAPURAM and N.R. PAL. "Fuzzy models and algorithms for pattern recognition and image processing". Kluwer, Boston, LONDON, (1999).
6. H. HERMANSKY, N. MORGAN. "RASTA Processing of Speech". IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, (1994).
7. A. HAGEN, A. MORRIS. "Comparison of HMM experts with MLP experts in the full combination multi-band approach to robust ASR". International Conference on Spoken Language Processing, BEIJING, (2000).
8. A. HAGEN, A. MORRIS. "From multi-band full combination to multi-stream full combination processing in robust ASR". ISCA Tutorial Research Workshop ASR2000, Paris, FRANCE, (2000).
9. L. LAZLI, M. SELLAMI. "Hybrid HMM-MLP system based on fuzzy logic for Arabic speech recognition". PRIS 2003, Third International Workshop on Pattern Recognition in Information Systems, pp. 150-155, April 22-23, Angers, FRANCE, (2003).
10. L. LAZLI, M. SELLAMI. "Connectionist Probability Estimators in HMM Speech Recognition using Fuzzy Logic". MLDM 2003, 3rd International Conference on Machine Learning & Data Mining in Pattern Recognition, LNAI 2734, Springer-Verlag, pp. 379-388, July 5-7, Leipzig, GERMANY, (2003).
11. L. LAZLI, A-N. CHEBIRA, K. MADANI. "Hidden Markov Models for Complex Pattern Classification". Ninth International Conference on Pattern Recognition and Information Processing, PRIP'07, http://uiip.bas-net.by/conf/prip2007/prip2007.php-id=200.htm, May 22-24, Minsk, BELARUS, (2007).
12. L. LAZLI, A-N. CHEBIRA, M-T. LASKRI, K. MADANI. "Using hidden Markov models for classification of auditory evoked potentials". Conférence Maghrébine sur les Technologies de l'Information, MCSEAI'08, pp. 477-480, April 28-30, USTMB, Oran, ALGERIA, (2008).
13. D-L. PHAM, J-L. PRINCE. "An Adaptive Fuzzy C-Means Algorithm for Image Segmentation in the Presence of Intensity Inhomogeneities". Pattern Recognition Letters, 20(1), pp. 57-68, (1999).
14. S-K. RIIS, A. KROGH. "Hidden Neural Networks: A framework for HMM-NN hybrids". In Proc. ICASSP-97, April 21-24, Munich, GERMANY, (1997).
15. H. TIMM. "Fuzzy Cluster Analysis of Classified Data". IFSA/NAFIPS, VANCOUVER, (2001).
16. J-F. MOTSCH. "La dynamique temporelle du tronc cérébral: recueil, extraction et analyse optimale des potentiels évoqués auditifs du tronc cérébral" (The temporal dynamics of the brainstem: collection, extraction and optimal analysis of brainstem auditory evoked potentials). Thesis, University of Créteil, PARIS XII, (1987).
17. A-S. DUJARDIN. "Pertinence d'une approche hybride multi-neuronale dans la résolution de problèmes liés au diagnostic industriel ou médical" (Relevance of a hybrid multi-neural approach in solving problems related to industrial or medical diagnosis). Internal report, I2S laboratory, IUT of Sénart-Fontainebleau, University of Paris XII, Avenue Pierre Point, 77127 Lieusaint, FRANCE, (2006).
18. A. MORRIS, A. HAGEN, H. GLOTIN, H. BOURLARD. "Multi-stream adaptive evidence combination for noise robust ASR". Speech Communication, (2000).
19. A. MORRIS, A. HAGEN, H. BOURLARD. "MAP combination of multi-stream HMM or HMM/ANN experts". Eurospeech 2001, Special Event Noise Robust Recognition, Aalborg, DENMARK, (2001).
20. H. HERMANSKY. "Perceptual Linear Predictive (PLP) Analysis for Speech". J. Acoust. Soc. Am., pp. 1738-1752, (1990).
21. H. HERMANSKY. "RASTA Processing of Speech". IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, (1994).
22. L. LAZLI, A. CHEBIRA, M-T. LASKRI, K. MADANI. "Hybrid HMM/ANN system using fuzzy clustering for speech and medical pattern recognition". DICTAP 2011, Part II, CCIS 167, Springer-Verlag Berlin Heidelberg, pp. 557-570, (2011).
23. J. KOEHLER, N. MORGAN, H. HERMANSKY, H-G. HIRSCH, G. TONG. "Integrating RASTA-PLP into Speech Recognition". Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Adelaide, AUSTRALIA, (1994).
24. D.E. GOLDBERG. "Algorithmes génétiques - Exploration, optimisation et apprentissage automatique". Addison-Wesley, (1994).
25. X. YIN and N. GERMAY. "A Fast Genetic Algorithm with Sharing Scheme Using Cluster Analysis Methods in Multimodal Function Optimization". In Proceedings of Artificial Neural Nets and Genetic Algorithms, (1993).
26. D. GOLDBERG. "Genetic Algorithms". Addison-Wesley.