

International Journal of Digital Information and Wireless Communications (IJDIWC) 1(1): 14-31. The Society of Digital Information and Wireless Communications, 2011 (ISSN 2225-658X)

CONNECTIONIST PROBABILITY ESTIMATORS IN HMM

USING GENETIC CLUSTERING: APPLICATION FOR SPEECH RECOGNITION

AND MEDICAL DIAGNOSIS

Lilia Lazli1, Boukadoum Mounir2, Abdennasser Chebira3, Kurosh Madani3 and Mohamed Tayeb Laskri1

1 Laboratory of Research in Computer Science (LRI/GRIA), Badji Mokhtar University, B.P. 12 Sidi Amar, 23000 Annaba, Algeria.

[email protected], [email protected], http://www.univ-annaba.org

2 Université du Québec à Montréal (UQAM), Canada

[email protected]

3 Images, Signals and Intelligent Systems Laboratory (LISSI / EA 3956), PARIS XII University, Senart-Fontainebleau Institute of Technology,

Bat. A, Av. Pierre Point, F-77127 Lieusaint, France,

{a.chebira, kmadani}@univ-paris12.fr, http://www.univ-paris12.fr

    ABSTRACT

The main goal of this paper is to compare the performance which can be achieved by five different approaches, analyzing their application potential on real-world paradigms. We compare the performance obtained with (1) a multi-network RBF/LVQ structure, (2) discrete Hidden Markov Models (HMM), (3) a hybrid HMM/MLP system using a Multi-Layer Perceptron (MLP) to estimate the HMM emission probabilities and the K-means algorithm for pattern clustering, (4) a hybrid HMM/MLP system using the Fuzzy C-Means (FCM) algorithm for fuzzy pattern clustering, and (5) a hybrid HMM/MLP system using a Genetic Algorithm (GA) for genetic clustering. Experimental results on an Arabic speech

vocabulary and on biomedical signals show significant decreases in the error rates of hybrid HMM/MLP/GA pattern recognition in comparison with those of the other approaches. Three types of features (PLP, log-RASTA PLP, J-RASTA PLP) were used to test the robustness of our hybrid recognizer in the presence of convolutional and additive noise.

    KEYWORDS

Speech and medical recognition, J-RASTA PLP, fuzzy clustering, Genetic Algorithm, HMM/MLP models.

    1 INTRODUCTION

In many target (or pattern) classification problems the availability of multiple looks at an object can substantially improve robustness and reliability in decision making. The use of several aspects is motivated by the difficulty in distinguishing between different classes from a single view of an object [9]. It occurs frequently that returns from two different objects at certain orientations are so similar that they may easily be confused. Consequently, a more reliable decision about the presence and type of an object can be made based upon observations of the received signals or patterns at multiple aspect angles. This allows more information to accumulate about the size, shape, composition and orientation of the objects, which in turn yields more accurate discrimination. Moreover, when the feature space undergoes changes, owing to different operating and environmental conditions, multi-aspect classification is almost a necessity in order to maintain the



performance of the pattern recognition system.

In this paper we present the Hidden Markov Model (HMM) and apply it to complex pattern recognition problems. We attempt to illustrate some applications of HMM theory to real problems involving complex patterns, such as those related to biomedical diagnosis or those linked to social behavior modeling. We introduce the theory and foundations of Hidden Markov Models (HMM). In the pattern recognition domain, and particularly in speech recognition, HMM techniques hold an important place [7]. There are two reasons for the success of HMM. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. However, standard HMM require the assumption that adjacent feature vectors are statistically independent and identically distributed. These assumptions can be relaxed by introducing Neural Networks (NN) into the HMM framework.

Significant advances have been made in recent years in the area of speaker-independent speech recognition. Over the last few years, connectionist models, and the Multi-Layer Perceptron (MLP) in particular, have been widely studied as potentially powerful approaches to speech recognition. These neural networks estimate the posterior probabilities used by the HMM. Among these, the hybrid approach using the MLP to estimate HMM emission probabilities has been shown to be particularly efficient, for example for French speech [1] and American English speech [14].

We then propose a hybrid HMM/MLP model for speech recognition and biomedical diagnosis which makes it possible to combine the discriminating capacity and noise resistance of the MLP with the flexibility of HMM, in order to obtain better performance than traditional HMM. We also develop a method based on fuzzy logic concepts for the clustering and classification of vectors, the Fuzzy C-Means (FCM) algorithm, and demonstrate its effectiveness compared with the traditional K-Means (KM) algorithm, owing to the fact that the KM algorithm provides a hard (non-probabilistic) decision about the observations in the HMM states.

Regarding clustering algorithms, we specifically considered methods involving partitional clustering and chose one as the basis for our work. This approach requires the choice of a measure, used in our application, that scores the quality of a partition; we are thus reduced to an optimization problem.

The properties of this algorithm do not guarantee convergence to a global optimum, which is why we are interested in a type of heuristic, Genetic Algorithms (GA), less likely to be trapped in local minima and now widely used in optimization problems.

Genetic Algorithms (GA) are search procedures based on the mechanics of genetics and natural selection. Numerous techniques and algorithms have been developed to assist the engineer in the creation of optimum development strategies. These procedures quickly converge to optimal solutions after examining only a small fraction of the search space and have been successfully applied to complex engineering optimisation problems.

    15

  • 8/12/2019 CONNECTIONIST PROPABILITY ESTIMATORS IN HMM USING GENETIC CLUSTERING APPLICATION FOR SPEECH RECO

    3/18

    International Journal of Digital Information and Wireless Communications (IJDIWC) 1(1): 14-31The Society of Digital Information and Wireless Communications, 2011 (ISSN 2225-658X)

    2 HMM Advantages and Drawbacks

Standard HMM procedures, as defined above, have been very useful for pattern recognition; HMM can deal efficiently with the temporal aspect of patterns (including temporal distortion or time warping) as well as with frequency distortion. There are powerful training and decoding algorithms that permit efficient training on very large databases, for example for recognition of isolated words as well as continuous speech. Given their flexible topology, HMM can easily be extended to include phonological rules (e.g., building word models from phone models) or syntactic rules. For training, only a lexical transcription is necessary (assuming a dictionary of phonological models); explicit segmentation of the training material is not required. An example of a simple HMM is given in Figure 1; this could be the model of a word assumed to be composed of three stationary states.

p(xi|q1)  p(xi|q2)  p(xi|q3)

Figure 1. Example of a three-state Hidden Markov Model, where 1) {q1, q2, q3} are the HMM states; 2) p(qi|qj) is the transition probability from state qi to state qj (i, j = 1..3); 3) p(xi|qj) is the emission probability of observation xi from state qj (i = 1...number of frames, j = 1..3).

However, the assumptions that permit HMM optimization and improve their efficiency also, in practice, limit their generality. As a consequence, although the theory of HMM can accommodate significant extensions (e.g., correlation of acoustic vectors, discriminant training, ...), practical considerations such as the number of parameters and trainability limit their implementations to simple systems usually suffering from several drawbacks [2], including:

- Poor discrimination, due to training algorithms that maximize likelihoods instead of a posteriori probabilities (i.e., the HMM associated with each pattern unit is trained independently of the other models). Discriminant learning algorithms do exist for HMM, but in general they have not scaled well to large problems.

- A priori choice of model topology and statistical distributions, e.g., assuming that the Probability Density Functions (PDF) are multivariate Gaussian densities or mixtures of multivariate Gaussian densities, each with a diagonal-only covariance matrix (i.e., possible correlation between the components of the acoustic vectors is disregarded).

- Assumption that the state sequences are first-order Markov chains.

- Typically, very limited acoustic context is used, so that possible correlation between successive acoustic vectors is not modeled very well.

Much Artificial Neural Network (ANN) based automatic pattern recognition research has been motivated by these problems.

3 ESTIMATING HMM LIKELIHOODS WITH ANN

ANN can be used to classify pattern classes, such as speech units. For statistical



recognition systems, the role of the local estimator is to approximate probabilities or PDF. Practically, given the basic HMM equations, we would like to estimate something like p(xn|qk), the value of the probability density function of the observed data vector given the hypothesized HMM state. An ANN, in particular the MLP, can be trained to produce the posterior probability p(qk|xn) of the HMM state given the acoustic data. This can be converted to emission PDF values using Bayes' rule.

Several authors [1-4, 7-10] have shown for speech recognition that ANN can be trained to estimate a posteriori probabilities of output classes conditioned on the input pattern. Recently, this property has been successfully used in HMM systems, referred to as hybrid HMM/ANN systems, in which ANN are trained to estimate local probabilities p(qk|xn) of HMM states given the acoustic data.

Since the network outputs approximate Bayesian probabilities, gk(xn) is an estimate of:

p(qk|xn) = p(xn|qk) p(qk) / p(xn)    (1)

which implicitly contains the a priori class probability p(qk). It is thus possible to vary the class priors during classification without retraining, since these probabilities occur only as multiplicative terms in producing the network outputs. As a result, class probabilities can be adjusted during use of a classifier to compensate for training data with class probabilities that are not representative of actual use or test conditions [18], [19].

Thus, scaled likelihoods p(xn|qk) for use as emission probabilities in standard HMM can be obtained by dividing the network outputs gk(xn) by the class priors p(qk) estimated on the training set, which gives us an estimate of:

p(xn|qk) / p(xn)    (2)

During recognition, the scaling factor p(xn) is a constant for all classes and will not change the classification. It could be argued that, when dividing by the priors, we are using a scaled likelihood, which is no longer a discriminant criterion. However, this need not be true, since the discriminant training has affected the parametric optimization for the system that is used during recognition. Thus, this permits use of the standard HMM formalism, while taking advantage of ANN characteristics.

Figure 2 shows that the equilibrium reached in the ideal case does in fact correspond to the network output g being equal to the posterior probability p. Consider an area in feature space around a training pattern xn, and assume that we consider only two classes: the target class for this pattern, and the class of all patterns that do not belong to the first class. Vectors in the selected area will undergo an upward force corresponding to the gradient of the error term for all patterns with a target of 1; the quadratic error term in this case is (1-g)^2, with a derivative of (1-g), and this case will occur a fraction of the time given by probability p. Therefore, the upward force (average gradient term) applied to the region is equal to p(1-g). The downward force is similarly defined, and will balance it in the equilibrated case. This will only occur when the network output is equal to the posterior probability.
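The force balance described above can be written out in one line:

```latex
\underbrace{p\,(1-g)}_{\text{upward force}} \;=\; \underbrace{(1-p)\,g}_{\text{downward force}}
\quad\Longrightarrow\quad p - pg \;=\; g - pg
\quad\Longrightarrow\quad g \;=\; p .
```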



Figure 2. Geometric illustration of the proof that an ANN will (ideally) produce posterior probabilities. Equilibrium is reached when the upward force (due to the average deviation of the ANN output from 1 for a target that should be high) is equal to the downward force (due to the average deviation from 0 for a target that should be low).

4 WHY IS THIS GOOD?

Since we ultimately derive essentially the same probability with an MLP as we would with a conventional (e.g., Gaussian mixture) estimator, what is the point? There are at least two potential advantages that we and others have observed [4, 9, 10, 14]:

1) The standard statistical recognizers require strong assumptions about the statistical character of the input, such as parameterizing the input densities as mixtures of Gaussian densities with no correlation between features, or as the product of discrete densities for different features that are assumed to be statistically independent. This type of assumption is not required with an MLP estimator, which is an advantage particularly when a mixture of feature types is used, e.g., binary and continuous. Specifically, standard HMM approaches require the assumption that successive acoustic vectors are uncorrelated. For the MLP estimator, multiple inputs can be used from a range of time steps, and the network will learn something about the correlation between the acoustic inputs. Note that the use of such a network will lead to a more general emission probability that may also be used in the global decoding. That is, if c + d + 1 frames of acoustic vectors

X(n-c..n+d) = x(n-c), ..., x(n), ..., x(n+d)

are used as input to provide contextual information to the network, the output values of the MLP will estimate

p(qk | X(n-c..n+d)), k = 1, ..., K.

This provides a simple mechanism for incorporating acoustic context into the statistical formulation.

2) MLP are a good match to discriminative objective functions (e.g., Mean Squared Error: MSE), and so the probabilities will be optimized to maximize discrimination between sound classes, rather than to most closely match the distributions within each class. It can be argued that such training is conservative of parameters, since the parameters of the estimate are trained to split the space between classes, rather than to represent the volumes that compose the division of space constituting each class.
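A small sketch of how such a contextual input can be assembled by stacking c left and d right neighbouring frames (the function, padding choice and sizes are our own illustrative assumptions):

```python
import numpy as np

def stack_context(features, c=4, d=4):
    """Stack c left and d right context frames around each frame, giving
    the MLP a (c+d+1)*dim input per frame; sequence edges are padded by
    repeating the first/last frame."""
    n, dim = features.shape
    padded = np.vstack([np.repeat(features[:1], c, axis=0),
                        features,
                        np.repeat(features[-1:], d, axis=0)])
    # column block i holds the frame at relative offset i - c
    return np.hstack([padded[i:i + n] for i in range(c + d + 1)])

# Hypothetical 26-dimensional frames with a 9-frame context window (c = d = 4).
X = np.random.randn(100, 26)
print(stack_context(X).shape)  # (100, 234)
```

The centre block of each stacked row is the original frame itself, so the network sees x(n) together with its acoustic neighbourhood.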

    5 J-RASTA PLP ANALYSIS

Developing speech recognizers that are robust under realistic acoustic environments has been and still is a major problem for the speech community. Speech distortion can be categorized into two types. There is linear spectral distortion (i.e., convolutional noise), which is typically introduced by microphones and the telephone channel. There is also



additive noise such as stationary or slowly varying background noise.

Three types of features (PLP, log-RASTA PLP and J-RASTA PLP) were used to test the robustness of our hybrid recognizer in the presence of convolutional and additive noise. Perceptual Linear Predictive (PLP) analysis is an extension of linear predictive analysis that takes into account some aspects of human sound perception [20, 21]. Log-RASTA PLP (RelAtive SpecTrAl processing) is based on PLP but also aims at reducing the effect of linear spectral distortion. In addition to the PLP and log-RASTA PLP analyses already used in our past works [22], we used here J-RASTA PLP, which tries to handle both linear spectral distortion and additive noise simultaneously [23].

J-RASTA-PLP [23] is used as the acoustic pre-processor for the experiments on both databases. Each frame of the feature vector represents 25 ms of speech, with 12.5 ms overlap of consecutive frames. J-RASTA PLP was chosen for its robustness to the linear spectral distortions in speech signals that are often introduced by communication channels.

J-RASTA-PLP attempts to make recognition more robust and invariant to acoustic environment variables. In some sense it performs speech enhancement, but for the purposes of feature extraction. The speech enhancement process added was designed to improve the intelligibility of noisy speech for human listeners. Addition of this enhancement may improve recognition scores further.

Speech recordings were sampled over the microphone at 11 kHz. After pre-emphasis (factor 0.95) and application of a Hamming window, the J-RASTA PLP features were computed every 10 ms on analysis windows of 30 ms.

Each frame is represented by 12 components plus energy (J-RASTA PLP + E). The values of the 13 coefficients are standardized by their standard deviation measured on the training frames. The feature set for our hybrid HMM/MLP system was based on a 26-dimensional vector composed of the cepstral parameters (J-RASTA PLP parameters), the delta cepstral parameters, the energy and the delta energy.

RASTA was configured to incorporate high-pass filtering and slight spectral subtraction. A constant J of 1e-6 was used for training. Multiple-regression J mapping was used during testing.

Nine frames of contextual information were used at the input of the MLP (9 frames of context being known to usually yield the best recognition performance).
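The front-end framing described above (11 kHz sampling, 0.95 pre-emphasis, 30 ms Hamming windows computed every 10 ms) can be sketched as follows; `frame_signal` is a hypothetical helper, and the RASTA filtering itself is deliberately omitted:

```python
import numpy as np

def frame_signal(signal, fs=11000, win_ms=30, step_ms=10, preemph=0.95):
    """Pre-emphasise a speech signal (factor 0.95), then cut it into
    30 ms Hamming-windowed analysis frames every 10 ms."""
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    win = int(fs * win_ms / 1000)    # 330 samples at 11 kHz
    step = int(fs * step_ms / 1000)  # 110 samples at 11 kHz
    n_frames = 1 + (len(emphasized) - win) // step
    frames = np.stack([emphasized[i * step:i * step + win]
                       for i in range(n_frames)])
    return frames * np.hamming(win)

# One second of hypothetical audio at 11 kHz.
x = np.random.randn(11000)
print(frame_signal(x).shape)  # (98, 330)
```

Each windowed frame would then be passed to the PLP / J-RASTA PLP analysis to produce the 12 cepstral coefficients plus energy per frame.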

    6 CLUSTERING PROCEDURE

One of the basic problems that arises in a great variety of fields, including pattern recognition, machine learning and statistics, is the so-called clustering problem [5, 13, 15, 22]. The fundamental data clustering problem may be defined as discovering groups in data, or grouping similar objects together. Each of these groups is called a cluster: a region in which the density of objects is locally higher than in other regions.

In this paper, data clustering is viewed as a data partitioning problem. Several approaches to find groups in a given database have been developed [5, 13, 15], but we focus on the K-Means (KM) algorithm as it is one of the most used iterative partitional clustering algorithms and because it may also be used to initialize more expensive clustering algorithms (e.g., the EM algorithm).
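For reference in the discussion that follows, a bare-bones KM loop might look like this (a generic textbook version, not the authors' implementation):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-Means: assign each vector to its nearest centroid, then
    recompute the centroids, until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # random initial clusters
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # hard assignment: each vector belongs to exactly one cluster
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        new = np.stack([X[labels == i].mean(0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

The hard `argmin` assignment is exactly the "hard and fixed decision" criticized in the next subsection: each vector carries no information about how close it was to the other clusters.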



6.1 Drawbacks of the K-Means Algorithm

Despite being used in a wide array of applications, the KM algorithm is not exempt from drawbacks. Some of these drawbacks have been extensively reported in the literature [5, 13, 15, 22]. The most important are listed below:

- Like many clustering methods, the KM algorithm assumes that the number of clusters k in the database is known beforehand which, obviously, is not necessarily true in real-world applications.

- As an iterative technique, the KM algorithm is especially sensitive to initial starting conditions (initial clusters and instance order).

- The KM algorithm converges finitely to a local minimum. The running of the algorithm defines a deterministic mapping from the initial solution to the final one.

- Vector quantization and, in particular, the KM algorithm provide a hard and fixed (non-probabilistic) decision which does not transmit enough information about the real observations.

Many clustering algorithms based on fuzzy logic concepts have been motivated by this last problem.

    6.2 Fuzzy C-Means Algorithm

In general, and for example in the speech context, a purely acoustic segmentation of the speech cannot suitably detect the basic units of the vocal signal. One of the causes is that the borders between these units are not acoustically defined. For this reason, we were interested in using automatic classification methods based on fuzzy logic in order to segment the data vectors. Among the adapted algorithms, we have chosen the FCM algorithm, which has already been successfully used in various fields and especially in image processing [13], [15].

FCM is a clustering method which allows one piece of data to belong to two or more clusters, using the measured data considered in the spectral domain only. This method searches for some general regularity in the collocation of patterns, focused on finding a certain class of geometrical shapes favored by the particular objective function [5]. The FCM algorithm is based on minimization of the following objective function, with respect to U, a fuzzy c-partition of the data set, and to V, a set of c prototypes [5]:

Jm(U, V) = Sum(j=1..n) Sum(i=1..c) (uij)^m ||Xj - Vi||^2    (3)

where m is any real number greater than 1, uij is the degree of membership of Xj in the cluster i, Xj is the jth d-dimensional measured data vector, Vi is the d-dimensional center of the cluster, and ||*|| is any norm expressing the similarity between the measured data and the center.

Fuzzy partitioning is carried out through an iterative optimization of (3), with the update of the memberships uij and the cluster centers Vi given by [5]:

uij = 1 / Sum(k=1..c) (dij / dkj)^(2/(m-1))    (4)

Vi = Sum(j=1..n) (uij)^m Xj / Sum(j=1..n) (uij)^m    (5)



The iteration stops when max(ij) |uij(k+1) - uij(k)| < eps, where eps is a termination criterion between 0 and 1.

As a part of our main objective, we aim to find the best and the worst set of initial starting conditions to approach the extremes of the probability distributions of the square-error values. Due to the computational expense of performing an exhaustive search, we tackle the problem using Genetic Algorithms.
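Putting updates (4) and (5) together, an FCM iteration can be sketched in a few lines of numpy (a generic sketch under our own conventions, with U stored as a c-by-n membership matrix):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-4, iters=100, seed=0):
    """Fuzzy C-Means sketch: alternate the membership update (4) and the
    centre update (5) until the largest membership change is below eps."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                 # each column is a fuzzy membership vector
    V = X[:c].astype(float)
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)        # eq. (5)
        d = np.linalg.norm(X[None, :] - V[:, None], axis=2) + 1e-12
        U_new = d ** (-2.0 / (m - 1))                       # eq. (4), vectorized
        U_new /= U_new.sum(axis=0)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return U, V
```

Unlike the hard KM assignment, every vector keeps a graded membership in every cluster, which is exactly the extra information passed on to the HMM states.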

    6.3 Genetic Algorithms

Roughly speaking, we can say that Genetic Algorithms (GA) are a kind of evolutionary algorithm, that is, probabilistic search algorithms which simulate natural evolution [24, 25, 26]. GA are used to solve combinatorial optimization problems following the rules of natural selection and natural genetics. They are based upon the survival of the fittest among string structures, together with a structured yet randomized information exchange. Working in this way and under certain conditions, GA evolve to the global optimum with probability arbitrarily close to 1.

When dealing with GA, the search space of a problem is represented as a collection of individuals. The individuals are represented by character strings. Each individual encodes a solution to the problem. In addition, each individual has an associated fitness measure. The part of the space to be examined is called the population. The purpose of the use of a GA is to find the individual from the search space with the best genetic material.

Figure 3 shows the pseudo-code of the GA that we use. First, the initial population is chosen and the fitness of each of its individuals is determined. In our case, the result of FCM clustering is considered as the initial population. Next, in every iteration, two parents are selected from the population. This parental couple produces children which, with a probability near zero, are mutated, i.e., hereditary distinctions are changed. After the evaluation of the children, the worst individual of the population is replaced by the fittest of the children. This process is iterated until a convergence criterion is satisfied.

The operators which define the child production process and the mutation process are the crossover operator and the mutation operator, respectively. Both operators are applied with different probabilities and play different roles in the GA. Mutation is needed to explore new areas of the search space and helps the algorithm avoid local optima. Crossover aims to increase the average quality of the population. By choosing adequate crossover and mutation operators, as well as an appropriate reduction mechanism, the probability that the GA reaches a near-optimal solution in a reasonable number of iterations increases.

Figure 3. The pseudo-code of the Genetic Algorithm

Begin GA
  Choose initial population at random
  Evaluate initial population
  While not convergence criterion do
  Begin
    Select two parents from the current population
    Produce children by the selected parents
    Mutate the children
    Evaluate the children
    Replace the worst individual of the population by the best child
  End
  Output the best individual found
End GA.
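A loose executable sketch of this steady-state loop, for real-valued individuals and an arbitrary fitness function (the operators and parameters are our own illustrative choices, not the paper's configuration):

```python
import numpy as np

def genetic_search(fitness, init_pop, n_iter=200, p_mut=0.05, seed=0):
    """Steady-state GA in the spirit of Figure 3: pick two parents, apply
    one-point crossover, occasionally mutate the child, and let a better
    child replace the worst individual of the population."""
    rng = np.random.default_rng(seed)
    pop = [np.array(ind, dtype=float) for ind in init_pop]
    for _ in range(n_iter):
        scores = np.array([fitness(ind) for ind in pop])
        i, j = rng.choice(len(pop), 2, replace=False)   # select two parents
        a, b = pop[i], pop[j]
        cut = rng.integers(1, len(a))
        child = np.concatenate([a[:cut], b[cut:]])      # one-point crossover
        if rng.random() < p_mut:                        # rare mutation
            child[rng.integers(len(child))] += rng.normal()
        if fitness(child) > scores.min():
            pop[int(scores.argmin())] = child           # replace the worst
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(scores.argmax())]
```

Because only the worst individual is ever replaced, the best fitness in the population never decreases, which is the elitist behaviour the pseudo-code relies on.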



7 VALIDATION ON SPEECH AND BIOMEDICAL SIGNAL CLASSIFICATION PARADIGM

Assume that for each class in the vocabulary we have a training set of k occurrences (instances) of each class, where each instance of the categories constitutes an observation sequence. In order to build our tool, we perform the following operations.

1. For each class v in the vocabulary, we estimate the model parameters (A, B, pi) that optimize the likelihood of the training set for the vth category.

2. For each unknown category to be recognized, the processing of Fig. 4 is carried out: measurement of the observation sequence O = {o1, o2, ..., oT} via a feature analysis of the signal corresponding to the class; computation of the model likelihoods for all possible models, P(O|v), 1 <= v <= V; and finally selection of the category with the highest likelihood.
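Step 2, scoring an observation sequence against every class model and keeping the arg max, can be sketched with a log-domain forward recursion (the per-class model layout below is our own assumption; in the hybrid system the emission terms would be the scaled MLP likelihoods):

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def log_forward(log_pi, log_A, log_B):
    """Forward pass: log_B[t, i] = log p(o_t | q_i); returns log P(O | model)."""
    alpha = log_pi + log_B[0]
    for t in range(1, len(log_B)):
        alpha = log_B[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return float(logsumexp(alpha))

def classify(models):
    """models: class label -> (log_pi, log_A, log_B); pick highest likelihood."""
    return max(models, key=lambda v: log_forward(*models[v]))

# Toy example: two one-state models scoring a 3-frame sequence.
one_state = (np.array([0.0]), np.array([[0.0]]))   # log_pi, log_A (prob. 1)
models = {'good': (*one_state, np.log(np.full((3, 1), 0.9))),
          'bad': (*one_state, np.log(np.full((3, 1), 0.1)))}
print(classify(models))  # good
```

Working in the log domain avoids the numerical underflow that direct products of per-frame probabilities would cause on long sequences.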

    7.1 Speech Databases

Three speech databases have been used in this work:

1) The first one, referred to as DB1, the isolated digits task, has 13 words in the vocabulary: 1, 2, 3, 4, 5, 6, 7, 8, 9, zero, oh, yes, no. They are spoken by 30 speakers, producing a total of 3900 utterances (each word is spoken 10 times). The digits database has about 30,000 frames of training data. This first corpus consists of isolated digits collected over the microphone. This database was used to conduct extensive testing of the robustness of various signal processing front ends (PLP, log-RASTA PLP, J-RASTA PLP) in the presence of both convolutional and additive noise.

2) The second database, referred to as DB2, contained about 50 speakers saying their last name, first name, city of birth and city of residence. Each word is spoken 10 times. The training set used in the following experiments consists of 2000 sounds.

    Figure 4. Block diagram of a speech and biomedical database HMM recognizer



3) The third database, referred to as DB3, contained 13 control words (e.g., view/new, save/save as/save all), with each speaker pronouncing each control word 10 times. The training set used in the following experiments consists of 3900 sounds spoken by 30 speakers.

For each database, 12 speakers were used for training (a non-overlapping subset of these was used for cross-validation to adapt the learning rate of the MLP), while the remaining 8 speakers were used for testing.

The acoustic features were quantized into independent codebooks according to the FCM algorithm, respectively:

- 128 clusters for the J-RASTA PLP vectors.

- 128 clusters for the first time derivative of the cepstral vectors.

- 32 clusters for the first time derivative of the energy.

- 32 clusters for the second time derivative of the energy.

    7.2 Biomedical Database

    For the biomedical database recognition task using the HMM/MLP model, the objective of this study is the classification of an electric signal produced by a medical test [16]; the experimental device is described in Fig. 5. The signals used are called auditory evoked potentials (PEA); examples of PEA signals are illustrated in Fig. 6. Indeed, functional otoneurological exploration provides a technique for the objective study of nervous conduction along the auditory pathways. Classification of the PEA signals is a first step in the development of a diagnostic aid tool. The main difficulty of this classification lies in the resemblance of signals corresponding to different pathologies, but also in the disparity of the signals within the same class: the results of the medical test can indeed differ between two measurements on the same patient.

    The PEA signals resulting from the examination and their associated pathologies are stored in a database containing the files of 11185 patients. We chose 3 categories of patients (3 classes) according to the type of their disorder:
    1) Normal (N): the patients of this category have normal audition (normal class).
    2) Endocochlear (E): these patients suffer from disorders affecting the part of the ear situated before the cochlea (endocochlear class).
    3) Retrocochlear (R): these patients suffer from disorders affecting the part of the ear situated at the level of the cochlea or after it (retrocochlear class).

    We selected 213 signals (each corresponding to a patient), where every signal contains 128 parameters. Among the 213 signals, 92 belong to class N, 83 to class E, and 38 to class R. To build our training base, we chose the signals corresponding to pathologies labeled as certain by the physician. All PEA signals come from the same experimental system. To evaluate the realized HMM/MLP system and for performance comparison purposes, we applied the same conditions already used in the work of the group described in [17], which uses a heterogeneous multi-network structure based on RBF and LVQ networks, and in the work described in [11], [12], which uses a discrete HMM. For this reason, the training base contains 24 signals, of which 11 correspond to class R, 6 to class E, and 7 to class N.


    After the training phase, when unlearned signals are presented to the HMM, the corresponding class must be designated. When we wish to use an HMM with a discrete observation symbol density, rather than the continuous vectors above, a vector quantizer (VQ) is required to map each continuous observation vector to a discrete codebook index. Once the codebook of vectors has been obtained, the mapping between continuous vectors and codebook indices becomes a simple nearest-neighbor computation: the continuous vector is assigned the index of the nearest codebook vector. Thus, the major issue in VQ is the design of an appropriate codebook for quantization.

    A codebook size of M = 64 vectors was used in the biomedical database recognition experiments with the HMM/MLP model.
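    The nearest-neighbor mapping described above can be sketched as follows. This is an illustrative NumPy snippet, not the authors' code; the function name `quantize` is ours, and Euclidean distance is assumed as the distortion measure.

    ```python
    import numpy as np

    def quantize(vectors, codebook):
        """Map each continuous vector to the index of its nearest codebook entry."""
        # pairwise Euclidean distances, shape (n_vectors, n_codes)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        return d.argmin(axis=1)
    ```

    With an M = 64 codebook, each observation vector is thus replaced by a single integer in [0, 63] before being fed to the discrete HMM.
    
    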

    Figure 5. PEA generation experiment

    Figure 6. (A) PEA signal for a normal patient, (B) patient with an auditory disorder

    7.3 Discrete MLP with inputs provided by the FCM algorithm

    For this case, we compare the performance of the baseline hybrid model with that of a hybrid HMM/MLP model whose network input is an acoustic vector of real values obtained by applying the FCM algorithm, with 2880 real components corresponding to the membership degrees of the acoustic vectors to the classes of the codebook. We represented each cepstral parameter (J-RASTA PLP, ΔJ-RASTA PLP, ΔE, ΔΔE) by a real vector whose components define the membership degrees of the parameter to the various classes of the codebook. For the speech databases, we report as an example the values used for DB2: a 10-state, strictly left-to-right word HMM with emission probabilities computed by an MLP fed with 9 frames of quantized acoustic vectors at the input, i.e., the current acoustic vector preceded by the 4 acoustic vectors of the left context and followed by the 4 acoustic vectors of the right context. The input layer is thus made up of a real vector with 2880 components (9 frames × (128 + 128 + 32 + 32) codebook classes) corresponding to the membership degrees of the acoustic vectors to the classes of the codebook.

    Thus, an MLP with a single hidden layer was trained, with 2880 input neurons, 30 hidden neurons, and 10 output neurons. The three-layer MLP is trained using stochastic gradient descent, with relative entropy as the error criterion. A sigmoid function is applied to the hidden-layer units, and a softmax (the exponential of each unit's weighted sum, normalized by the sum of exponentials over the entire layer) is used as the output nonlinearity. The number of hidden neurons was chosen empirically.
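    The forward pass of such a network (sigmoid hidden layer, softmax output) can be sketched as follows. This is a generic illustration with random weights, not the trained network from the paper; the layer sizes are those quoted in the text, and the function name `mlp_forward` is ours.

    ```python
    import numpy as np

    def mlp_forward(x, W1, b1, W2, b2):
        """One forward pass: sigmoid hidden layer, softmax output (class posteriors)."""
        h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))    # sigmoid hidden activations
        z = W2 @ h + b2
        z -= z.max()                                 # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()                           # softmax: outputs sum to 1

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 2880, 30, 10                # sizes from the text
    W1, b1 = rng.normal(scale=0.01, size=(n_hid, n_in)), np.zeros(n_hid)
    W2, b2 = rng.normal(scale=0.01, size=(n_out, n_hid)), np.zeros(n_out)
    p = mlp_forward(rng.random(n_in), W1, b1, W2, b2)
    ```

    The softmax outputs are the per-class posterior estimates that replace the discrete emission probabilities of the HMM after division by the class priors.
    
    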


    For the PEA signals, an MLP with 64 input neurons, 18 hidden neurons, and 5 output neurons was trained.

    7.4 Discrete MLP with inputs provided by the GA

    When applying the GA, an initial population is chosen and the fitness of each of its individuals is determined. In our case, the result of FCM clustering is taken as the initial population.

    For clarity, we report below some results of the clustering of DB1.

    7.4.1 Coding of individuals

    The key idea is that the encoding of an individual must enable "effective" sampling of the search space. We chose a representation based on class indexing: an individual is encoded by a string of integers whose ordinal value (index) represents the number of the object, and whose cardinal value corresponds to the number of the class to which the object belongs.

    Practical consideration: the following partition of objects (acoustic vectors) into 10 classes (the 10 digits):
    class 0 = {V1, V2, ... V21, V22}, class 1 = {V23, V24, ... V34, V35}, class 2 = {V36, V37, ...}, ..., class 9 = {}
    will be represented by the following chromosome:

    000000... 111111 ... 888888 999999

    Figure 7. Example of coding of individuals
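    The index-based encoding can be illustrated with a small sketch; the helper name `encode` and the toy partition are ours.

    ```python
    def encode(partition, n_objects):
        """Chromosome: position i (ordinal value) holds the class number
        (cardinal value) of object i."""
        chrom = [None] * n_objects
        for label, members in partition.items():
            for i in members:
                chrom[i] = label
        return chrom

    # hypothetical toy partition of 6 objects into 3 classes
    partition = {0: [0, 1, 2], 1: [3, 4], 2: [5]}
    chrom = encode(partition, 6)
    ```

    Reading the chromosome left to right recovers the class of each object, which is exactly the string-of-digits representation of Figure 7.
    
    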

    7.4.2 Population size

    This is a fundamental parameter of the GA, for which there is no general criterion determining the optimum size. We tried to adapt the size of the population to the size of the problem to be treated.

    7.4.3 Merit function

    An individual, or genotype, represents, let us recall, a partitioning of the k observations into classes. We use the following representation function:

    $$ g_l = \frac{1}{|C_l|} \sum_{x_i \in C_l} x_i \qquad (6) $$

    with $g_l$ the centre of gravity of class $C_l$, $l \in [1, M]$, where $M$ is the number of classes (the digits in our case); this allows the $M$ representatives to be inferred. The within-class criterion, a measure of the quality of a partition whose expression is given in (7), is our objective cost function, whose value we seek to minimize. Transforming this objective function (criterion) yields the merit function that we use in our algorithm to measure the quality of an individual.

    $$ W = \sum_{l=1}^{M} \sum_{x_i \in C_l} p_i \, d^2(x_i, g_l) \qquad (7) $$

    where $p_i$ is the weight of the $i$-th object.
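    A direct NumPy transcription of the criterion (7), with the class centres computed as in (6), might look as follows; the function name `merit` and the default unit weights are our assumptions.

    ```python
    import numpy as np

    def merit(chrom, X, p=None):
        """Within-class criterion W = sum_l sum_{x_i in C_l} p_i * d^2(x_i, g_l),
        where g_l is the centre of gravity of class C_l (Eq. 6). Smaller is better."""
        chrom = np.asarray(chrom)
        p = np.ones(len(X)) if p is None else np.asarray(p)
        W = 0.0
        for label in np.unique(chrom):
            mask = chrom == label
            g = X[mask].mean(axis=0)                          # centre of gravity
            W += (p[mask] * ((X[mask] - g) ** 2).sum(axis=1)).sum()
        return W
    ```

    A compact partition (objects close to their class centre) yields a small W, which is why the GA below minimizes this value.
    
    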

    7.4.4 Reproduction

    In this phase, a new population is created at each iteration by applying the genetic operators: selection, crossover, and mutation. The idea is to select the fittest individuals in the population, in the sense of the merit function, and reproduce them in the next generation.

    In our GA we used the so-called "steady-state genetic algorithm" variant, which replaces a certain percentage of the population each


    generation (the size of the population remains constant over the course of the GA). The evolution of the population is ensured by the selection, crossover, and mutation operators, which allow chromosomes to be combined and modified. We summarize the genetic operators used below.

    - Selection: there are many selection strategies; we briefly present two. In roulette-wheel selection, the probability of selecting an individual is proportional to its adaptation value. Tournament selection consists of selecting a subset of the population and retaining the best individual of this subgroup. Since in our case the problem is one of fitness minimization, the best individual is the one with the smallest fitness value.

    Practical consideration: according to the fitness of each chromosome in the population, the best individuals are selected.

    Table 1. Selection of the best individuals

    Before sorting:
    Parent   Fitness
    1        6.4520
    2        6.8843
    3        7.3374
    4        6.2424
    5        6.7076
    6        7.1452
    7        6.4746
    8        6.8331

    After sorting:
    Parent   Fitness
    1        6.2424
    2        6.4520
    3        6.4746
    4        6.7076
    5        6.8331
    6        6.8843
    7        7.1452
    8        7.3374
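    Tournament selection under fitness minimization can be sketched as follows; the helper name is ours, and the test reuses the fitness values of Table 1.

    ```python
    import random

    def tournament_select(population, fitnesses, k=3, rng=None):
        """Draw k individuals at random and return the fittest one,
        i.e. the one with the SMALLEST fitness (we minimize Eq. 7)."""
        rng = rng or random.Random(0)
        idx = rng.sample(range(len(population)), k)
        return population[min(idx, key=lambda i: fitnesses[i])]
    ```

    With k equal to the population size the global best is always returned; smaller k values trade selection pressure for diversity.
    
    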

    - Crossover: this creates two individuals (children) by combining genes from the two parents obtained in the previous step. This operator allows the average quality of the population to be increased. We used the one-point variant, with the cut point chosen randomly from the possible cut points (for a chain of length l there are l - 1). However, this may lead to a non-viable child (or even two), which is why different cut points are eventually tried. This search can optionally be restricted by setting a number of iterations a priori. If at the end of this check a child remains non-viable, one of the parents is taken instead. A child is said to be non-viable if it does not have the same number of classes as its parents.

    Practical consideration: consider the two following parents selected from our application.

    Parent 1:
    00000000 11110111 ... 33633333 44744477
    55556655 66665566 88338822 ... 99999990

    Parent 2:
    00000000 11111115 ... 33333333 44444444
    55556565 66666666 88888833 99999999

    Figure 8. The two parents

    A cut point of 102 is obtained. The two children obtained are the following:

    Child 1

    00000000 11110111 33633333 44444444

    55556565 66666666 88888833 99999999

    Child 2

    00000000 11111115 33333333 44744477

    55556655 66665566 88338822 99999990

    Figure 9. The two children
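    A sketch of the one-point crossover with the viability test and parent fallback described above; the helper names and the bound `max_tries` are our assumptions.

    ```python
    import random

    def one_point_crossover(p1, p2, max_tries=10, rng=None):
        """One-point crossover (l-1 possible cut points for a chain of length l).
        A child is viable only if it keeps the same number of classes as the
        parents; after max_tries failed cut points, the parent is kept instead."""
        rng = rng or random.Random(0)
        n_classes = len(set(p1))
        children = []
        for a, b in ((p1, p2), (p2, p1)):
            child = None
            for _ in range(max_tries):
                cut = rng.randrange(1, len(a))        # one of the l-1 cut points
                cand = a[:cut] + b[cut:]
                if len(set(cand)) == n_classes:       # viability test
                    child = cand
                    break
            children.append(child if child is not None else a[:])
        return children[0], children[1]
    ```
    
    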

    Regarding the value of the crossover probability, we followed the recommendations of Goldberg [24] who, citing the work of De Jong, proposes a high crossover probability (≥ 0.5).

    - Mutation: this stage introduces random variation into the genotype of an individual (here, the children). This operator therefore allows new areas of the search space to be explored, decreasing the risk of convergence towards local minima. Each gene may thus change with a given


    probability; it is also possible to choose a portion of the genes at random. The mutation may lead to a non-viable individual; this possibility is therefore taken into account in the coding of this operator.

    Practical consideration: consider the genes of Child 1 presented above.

    Child before mutation:
    00000000 11110111 33633333 44444444 55556565 66666666 88888833 99999999

    Child after mutation:
    00000000 11110111 33633333 44444444 55556565 66666666 88888833 97999999
    (note the changed gene in the last block)

    Figure 10. Example of mutation
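    Gene-wise mutation can be sketched as follows (illustrative helper; the name `mutate` is ours).

    ```python
    import random

    def mutate(chrom, n_classes, p_mut, rng=None):
        """Change each gene, with probability p_mut, to a random class number."""
        rng = rng or random.Random(0)
        return [rng.randrange(n_classes) if rng.random() < p_mut else g
                for g in chrom]
    ```
    
    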

    Here again, we followed the recommendations of Goldberg [24] for the mutation probability, adopting a value inversely proportional to the size of the population.

    7.4.5 Replacement of the new population

    We replace the worst solutions. For this purpose, we rank all solutions (parents and children) according to their adaptation and keep only the N first individuals, i.e., those with the smallest fitness values, obtaining a new population consisting of the N best adaptations. If the number of generations has been reached, the best solution is extracted and we move to the next step; otherwise, a new reproduction iteration takes place.

    Practical consideration: the population of parents and children obtained is presented in the first table below. To replace the worst individuals, we ranked all the parents and their children by increasing fitness; only the N first individuals are kept in the population. The new population is presented in the second table.

    Table 2. Best population obtained

    Child   Fitness    Parent   Fitness
    1       4.2582     1        3.1490
    2       11.5489    2        3.1803
    3       7.2761     3        3.2824
    4       9.1245     4        3.3785
    5       13.5487    5        3.5947
    6       5.0214     6        3.8692
    7       3.1421     7        3.8761
    8       3.8692     8        3.8815
    9       3.5103
    10      7.6815
    11      7.2584
    12      3.1436

    New population   Fitness
    1: Child 7       3.1421
    2: Child 12      3.1436
    3: Parent 1      3.1490
    4: Parent 2      3.1803
    5: Parent 3      3.2824
    6: Parent 4      3.3785
    7: Child 9       3.5103
    8: Parent 5      3.5947
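    The replacement step can be sketched as follows (illustrative helper; the name `replace_worst` is ours, and the test reuses fitness values from Table 2).

    ```python
    def replace_worst(parents, children, fitness_of):
        """Steady-state replacement: rank parents + children by increasing
        fitness and keep only the N best, N being the population size."""
        pool = sorted(parents + children, key=fitness_of)
        return pool[:len(parents)]
    ```
    
    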

    7.4.6 Stopping criterion

    There is no stopping criterion that guarantees the convergence of the GA to an optimal solution. It is customary to fix the number of iterations a priori; when this number is reached, the best chromosome obtained is taken as the optimal solution. We used this method in our experiments.

    For the GA, we set a maximum number of iterations, which constitutes the stopping test.

    Figure 11. Convergence of the classification process for the Arabic digits database (0-9).


    7.5 Results and Discussion

    Five types of models were compared in the framework of speech and medical database recognition. The experimental results are reported in Tables 3 and 4. Table 3 summarizes the results obtained by the models using the acoustic parameters provided by the Log-RASTA PLP and J-RASTA PLP analyses. For the speech databases, the tests of the five models were carried out first on DB1, then on DB2, and finally on DB3.

    For the majority of cases, we can draw preliminary conclusions from the results reported in Tables 3 and 4 for speech recognition and PEA signal diagnosis. We also report here the results described in [22].
    - For the speech databases recorded in real conditions and noisy environments, in the majority of cases the result with the J-RASTA PLP analysis is better than with Log-RASTA PLP, which confirms the theory.
    - Compared with the classification rate of the system using the multi-network structure (RBF/LVQ) described in [17], the rate of the discrete HMM is better.
    - The hybrid discrete HMM/MLP approaches always outperform the standard discrete HMM.
    - The hybrid discrete HMM/MLP system using GA clustering gave the best results and outperforms the hybrid discrete HMM/MLP system using FCM clustering, whose rate is in turn better than that of the hybrid model using KM clustering.

    Table 3. Recognition rates for the three speech databases (DB1, DB2, and DB3) with the five different types of models (M1, ..., M5) and application of the Log-RASTA PLP and J-RASTA PLP acoustic analyses. M1: RBF/LVQ, M2: discrete HMM, M3: hybrid HMM/MLP with KM inputs, M4: hybrid HMM/MLP with FCM inputs, M5: hybrid HMM/MLP with GA inputs.

              DB1                     DB2                     DB3
          Log-RASTA   J-RASTA    Log-RASTA   J-RASTA    Log-RASTA   J-RASTA
          PLP         PLP        PLP         PLP        PLP         PLP
    M1    80.33       79.30      85.12       85.70      72.84       71.78
    M2    85.75       87.45      89.68       90.23      75.31       76.48
    M3    90.42       92.96      92.37       93.65      75.84       80.35
    M4    97.28       97.46      96.16       96.89      73.23       82.62
    M5    96.32       98.37      96.82       99.32      75.22       82.95

    Table 4. Recognition rates for the biomedical database with the five different types of models (M1, ..., M5).

    Model   Recognition rate
    M1      80.66
    M2      84.48
    M3      90.12
    M4      93.76
    M5      95.23

    8 CONCLUSION AND FUTURE WORK

    In this paper, we presented tests of five types of models in the framework of speech recognition and medical diagnosis. A discriminative training algorithm for the hybrid HMM/MLP system based on genetic clustering was described. Our results on isolated-speech and biomedical signal recognition tasks show an increase in the estimates of the posterior probabilities of the correct class after training, and significant decreases in error rates, across the five systems compared: 1) multi-network RBF/LVQ structure, 2) discrete HMM, 3) HMM/MLP approach with KM clustering, 4) HMM/MLP approach


    with FCM clustering, and 5) HMM/MLP approach with GA clustering.

    These preliminary experiments have set a baseline performance for our hybrid GA/HMM/MLP system, and better recognition rates were observed. From the point of view of model effectiveness, it seems clear that the hybrid models are more powerful than the discrete HMM or the multi-network RBF/LVQ structure for both PEA signal diagnosis and the speech databases.

    We thus envisage improving the performance of the proposed system along the following lines:

    - It would be important to use other speech parameter extraction techniques and to compare the system's recognition rate with that obtained using the Log-RASTA PLP and J-RASTA PLP analyses. We are considering LDA (Linear Discriminant Analysis) and CMS (Cepstral Mean Subtraction), since these representations are currently considered among the most powerful in ASR.

    - It also appears interesting to use continuous HMMs with multi-Gaussian distributions and to compare the performance of the system with that of the discrete HMM.

    - In addition, for an extended speech vocabulary, it is interesting to use phoneme models instead of word models, which facilitates training with relatively small databases.

    - For PEA signal recognition, the main idea is to define a fusion scheme: cooperation of the HMM with the multi-network RBF/LVQ structure so as to arrive at a hybrid model whose performance can be compared with that of the HMM/MLP/GA model proposed in this paper.

    9 REFERENCES

    1. O. DEROO, C. RIIS, F. MALFRERE, H. LEICH, S. DUPONT, V. FONTAINE and J-M. BOITE. "Hybrid HMM/ANN system for speaker independent continuous speech recognition in French". Thesis, Faculté polytechnique de Mons, TCTS, BELGIUM, (1997).

    2. H. BOURLARD, S. DUPONT. "Sub-band-based speech recognition". In Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 1251-1254, MUNICH, (1997).
    3. F. BERTHOMMIER, H. GLOTIN. "A new SNR-feature mapping for robust multi-stream speech recognition". In Berkeley University of California, editor, Proceedings of the International Congress on Phonetic Sciences (ICPhS), vol. 1 of XIV, pp. 711-715, SAN FRANCISCO, (1999).

    4. J-M. BOITE, H. BOURLARD, B. D'HOORE, S. ACCAINO, J. VANTIEGHEM. "Task independent and dependent training: performance comparison of HMM and hybrid HMM/MLP approaches". IEEE, vol. I, pp. 617-620, (1994).
    5. J.C. BEZDEK, J. KELLER, R. KRISHNAPURAM and N.R. PAL. "Fuzzy models and algorithms for pattern recognition and image processing". KLUWER, Boston, LONDON, (1999).

    6. H. HERMANSKY, N. MORGAN. "RASTA Processing of speech". IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, (1994).

    7. A. HAGEN, A. MORRIS. "Comparison of HMM experts with MLP experts in the full combination multi-band approach to robust ASR". In International Conference on Spoken Language Processing, BEIJING, (2000).
    8. A. HAGEN, A. MORRIS. "From multi-band full combination to multi-stream full combination processing in robust ASR". In ISCA Tutorial Research Workshop ASR2000, Paris, FRANCE, (2000).

    9. L. LAZLI, M. SELLAMI. "Hybrid HMM-MLP system based on fuzzy logic for Arabic speech recognition". PRIS 2003, The Third International Workshop on Pattern Recognition in


    Information Systems, pp. 150-155, April 22-23, Angers, FRANCE, (2003).

    10. L. LAZLI, M. SELLAMI. "Connectionist Probability Estimators in HMM Speech Recognition using Fuzzy Logic". MLDM 2003: The 3rd International Conference on Machine Learning and Data Mining in Pattern Recognition, LNAI 2734, Springer-Verlag, pp. 379-388, July 5-7, Leipzig, GERMANY, (2003).

    11. L. LAZLI, A-N. CHEBIRA, K. MADANI. "Hidden Markov Models for Complex Pattern Classification". Ninth International Conference on Pattern Recognition and Information Processing, PRIP'07, http://uiip.bas-net.by/conf/prip2007/prip2007.php-id=200.htm, May 22-24, Minsk, BELARUS, (2007).
    12. L. LAZLI, A-N. CHEBIRA, M-T. LASKRI, K. MADANI. "Using hidden Markov models for classification of potentials evoked auditory". Conférence maghrébine sur les technologies de l'information, MCSEAI'08, pp. 477-480, April 28-30, USTMB, Oran, ALGERIA, (2008).
    13. D-L. PHAM, J-L. PRINCE. "An

    Adaptive Fuzzy C-means algorithm for Image Segmentation in the presence of Intensity Inhomogeneities". Pattern Recognition Letters, 20(1), pp. 57-68, (1999).

    14. S-K. RIIS, A. KROGH. "Hidden Neural Networks: A framework for HMM-NN hybrids". In Proc. ICASSP-97, April 21-24, Munich, GERMANY, (1997).

    15. H. TIMM. "Fuzzy Cluster Analysis of Classified Data". IFSA/NAFIPS, VANCOUVER, (2001).

    16. J-F. MOTSCH. "La dynamique temporelle du tronc cérébral: recueil, extraction et analyse optimale des potentiels évoqués auditifs du tronc cérébral". Thesis, University of Créteil, PARIS XII, (1987).


    17. A-S. DUJARDIN. "Pertinence d'une approche hybride multi-neuronale dans la résolution de problèmes liés au diagnostic industriel ou médical". Internal report, I2S laboratory, IUT of Sénart-Fontainebleau, University of Paris XII, Avenue Pierre Point, 77127 Lieusaint, FRANCE, (2006).

    18. A. MORRIS, A. HAGEN, H. GLOTIN, H. BOURLARD. "Multi-stream adaptive evidence combination for noise robust ASR". Accepted for publication in Speech Communication, (2000).

    19. A. MORRIS, A. HAGEN, H. BOURLARD. "MAP combination of multi-stream HMM or HMM/ANN experts". Accepted for publication in Eurospeech 2001, Special Event Noise Robust Recognition, Aalborg, DENMARK, (2001).

    20. H. HERMANSKY. "Perceptual Linear Predictive (PLP) Analysis for Speech". J. Acoust. Soc. Am., pp. 1738-1752, (1990).

    21. H. HERMANSKY. "RASTA Processing of speech". IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, (1994).

    22. L. LAZLI, A. CHEBIRA, M-T. LASKRI, K. MADANI. "Hybrid HMM/ANN system using fuzzy clustering for speech and medical pattern recognition". DICTAP 2011, Part II, CCIS 167, Springer-Verlag Berlin Heidelberg, pp. 557-570, (2011).
    23. J. KOEHLER, N. MORGAN, H. HERMANSKY, H.G. HIRSCH, G. TONG. "Integrating RASTA-PLP into Speech Recognition". Proceedings of the Int. Conf. on Acoustics, Speech and Signal Processing, Adelaide, AUSTRALIA, (1994).

    24. D. E. GOLDBERG. "Algorithmes génétiques - Exploration, optimisation et apprentissage automatique". Addison-Wesley, (1994).

    25. X. YIN and N. GERMAY. "A Fast Genetic Algorithm with sharing scheme using cluster analysis methods in Multimodal function optimization". In Proceedings of Artificial Neural Nets and Genetic Algorithms, (1993).

    26. D. GOLDBERG. "Genetic Algorithms". Addison-Wesley, editor.
