

Fuzzy Hidden Markov Models for Indonesian Speech Classification

Paper: jc16-3-5238


Intan Nurma Yulita, Houw Liong The, and Adiwijaya
Graduate Faculty, Telkom Institute of Technology
Jalan Telekomunikasi No.1, DayeuhKolot, Jawa Barat 40257, Indonesia
E-mail: [email protected], [email protected], [email protected]

[Received September 15, 2011; accepted November 15, 2011]

Indonesia has many tribes and hence many dialects. Speech classification is difficult if the database uses speech signals from various people with different characteristics of gender and dialect. These characteristics influence the frequency, intonation, amplitude, and period of the speech, so the system must be trained on many reference templates of the speech signal. Therefore, this study develops Indonesian speech classification. The solution is a new combination of fuzzy logic and hidden Markov models. The results show that the new version of fuzzy hidden Markov models outperforms the hidden Markov model.

Keywords: fuzzy logic, hidden Markov models, speech, classification, clustering

1. Introduction

Over the past several decades, speech classification technology has been widely applied. There are many approaches to speech classification, for example template-based, knowledge-based, and stochastic-based approaches [1]. Among the successful results is the Hidden Markov Model (HMM) [2]. Other results used artificial neural networks [3], support vector machines [4], fuzzy logic [5], and clustering [6]. Speech classification is a "language-dependent" system: since each language has its own phonemes, a classifier built for one language cannot be applied to all languages. Much research has been carried out abroad, but it does not transfer well to Indonesian. English is the most widely developed target of speech recognition systems [2-10]. The number of studies on Indonesian speech classification is still small. They covered Indonesian speech classification based on a speaker adaptation system [11] and the development of a corpus for Indonesian speech classification [12].

Speech classification is difficult because speech has several unique characteristics. At different times, the same word takes different forms even when spoken by the same person. Hence, speech classification is more difficult if the database uses speech signals from various people with different characteristics of gender and dialect. These characteristics influence the frequency, intonation, amplitude, and period of the speech, so the system must be trained on many reference templates of the speech signal. Therefore, a study still needs to be conducted. Hidden Markov models are a common approach to classifying speech, but a method is needed that solves the above problem for Indonesian speech classification. This study designs such a method: the solution combines fuzzy logic with hidden Markov models. Fuzzy logic handles variant forms of speech better than a crisp approach; the more variants there are, the wider the area of each Fuzzy C-Means cluster. Some studies have combined fuzzy logic with HMMs [3, 9, 10, 13], but they were not designed to solve the problem of different dialect characteristics or for Indonesian speech. This study proposes a new design of fuzzy hidden Markov models, consisting of Fuzzy C-Means clustering, which substitutes for the vector quantization process, and a new forward and backward method that handles the membership degree of the data.

2. Materials and Methods

2.1. Obtaining Raw Data

This study used the speech recognition data from the Research and Development Center of Telkom of Indonesia. Data collection was conducted in a soundproof room (so there is no noise in the speech), and 70 speakers were involved (male and female). Experiments were performed on a speech data set with various dialect and gender characteristics. The data included the dialects of several Indonesian tribes, namely Sundanese, Javanese, Batak, Betawi, and Balinese, but there was no information on their proportions. The data set was divided into training data (80% of the data set) and testing data (20% of the data set). The speakers of the training data and testing data were different because our speech classification is a speaker-independent system. Table 1 lists the words used for the training data and shows the very different sounds of each word.

Vol.16 No.3, 2012, Journal of Advanced Computational Intelligence and Intelligent Informatics


Table 1. Training data.

Words        | Sounds                           | Information
Balaysalasa  | 'balaysalasa' and 'baleysalasa'  | 101 files
Lubuklinggaw | 'lubuklinggaw' and 'lubuklinggo' | 101 files
Prabumulih   | 'prabumulih' and 'prabulumeh'    | 101 files
tanjungenim  | 'tanjungenim' and 'tanjungeenim' | 100 files
Tarempa      | 'tarempa' and 'tareempa'         | 98 files

2.2. Preprocessing

The purpose of preprocessing is to make all input signals conform to the required system specifications [7]. The first step is centering, which shifts the discrete amplitude distribution so that its center lies on the axis y = 0; centering thus brings the average amplitude of the signal to zero. The next step is normalization, a process that equalizes the maximum amplitude of the sound signal. Normalization divides each discrete amplitude value by the maximum amplitude value.
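The two preprocessing steps can be sketched as follows (illustrative Python, not the authors' implementation; the sample values are hypothetical):

```python
def preprocess(signal):
    """Center a discrete speech signal to zero mean, then normalize
    its amplitude so the maximum absolute value becomes 1.0."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]   # centering: shift mean onto y = 0
    peak = max(abs(s) for s in centered)
    return [s / peak for s in centered]     # normalization: divide by max amplitude

samples = [0.2, 0.6, 0.4, 0.8]              # hypothetical discrete amplitudes
out = preprocess(samples)
```

After this step the signal has zero average amplitude and a peak amplitude of exactly 1, which is the condition the feature extraction stage below assumes.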

2.3. Feature Extraction

This process obtains the characteristics of the voice signal. In this study, MFCC is implemented for feature extraction. It produces 24 parameter values: 12 cepstral values and 12 first-order derivatives of those cepstral values. The output of this process is that every speech sample is divided into a number of frames, and each frame has 24 feature values.

2.4. Vector Quantization (VQ)

Basically, the output of feature extraction is shorter than the original signal. However, in order to run an HMM, an observation sequence is needed [7]; the observations represent all variations of the existing cepstra. VQ forms the discrete symbols (codebook) from the observation series of the HMM model, estimating a shorter-term vector representation. The VQ process is divided into two stages: codebook formation and codebook index determination. When constructing the codebook, the input feature vectors of the VQ are the whole variety of known voice signals. Using a clustering algorithm, the feature vectors are grouped into clusters; the cluster centers are called the codebook. After the codebook is constructed, VQ proceeds by replacing each feature vector with the codebook vector that has the smallest Euclidean distance. The output of VQ is the input of the hidden Markov models.
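The codebook-index determination stage can be sketched as follows (illustrative Python with a hypothetical two-codeword codebook; squared Euclidean distance gives the same nearest codeword as Euclidean distance):

```python
def quantize(frames, codebook):
    """Replace each feature vector with the index of the nearest
    codeword (smallest Euclidean distance), yielding the discrete
    observation sequence consumed by the HMM."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in frames]

codebook = [[0.0, 0.0], [1.0, 1.0]]      # hypothetical 2-codeword codebook
frames = [[0.1, -0.2], [0.9, 1.2]]       # hypothetical 2-D feature frames
seq = quantize(frames, codebook)         # discrete observation sequence
```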

2.5. Hidden Markov Models (HMM)

An HMM is a Markov chain whose output symbols describe the probabilities of output symbol transitions [2, 8]. The observations for each state are described separately by a probability function or density function (probability density function), defined as the probability of producing a transition between states. Unlike the Observable Markov Model (OMM), an HMM is a doubly stochastic process whose primary process cannot be observed directly (it is hidden); it can only be observed through another set of stochastic processes that produces the sequence of observations.

2.5.1. Basic Elements

An HMM over discrete observation symbols has the following elements [2, 8]:

1. The HMM consists of N states, labeled 1, 2, ..., N; the state at time t is denoted qt. N is a tested parameter in this study.

2. The number of observation symbols (M). An observation symbol is the output being modeled: V = {V1, ..., VM}.

3. The state transition probability distribution (A):

A = {aij}, 1 ≤ i, j ≤ N. . . . . . . (1)

4. The observation probability distribution of the k-th symbol in the j-th state (B):

B = {bj(Vk)}, 1 ≤ j ≤ N, 1 ≤ k ≤ M. . (2)

5. The initial state probability distribution (π):

πi = P(q1 = i), 1 ≤ i ≤ N. . . . . (3)

An HMM thus requires the specification of the two model parameters N and M and the measurement of the three probability measures A, B, and π. The HMM notation is usually written as λ (model) = (A, B, π).

2.5.2. Basic Problems and Solutions

There are three basic problems in HMM to be solved, namely [2, 8]:

1. Given an observation sequence O = (O1, O2, ..., OT) and a model λ, how do we efficiently calculate the probability of the observation series?

2. Given an observation sequence (O1, O2, ..., OT) and a model λ, how do we choose the optimal state sequence that represents the observations?

3. How do we adjust the model parameters λ to maximize the probability P(O|λ)?

The solutions to the problems above are [2, 8]:


1. Evaluation (probability evaluation)
The naive method examines every possible sequence of the N states along the T observations, which is not efficient. A simpler procedure is the forward-backward procedure.

a. Forward procedure
The forward variable αt(i) at time t and state i is defined by αt(i) = P(O1, O2, ..., Ot, qt = i | λ). The forward probability function can be solved for N states and T symbols inductively with the following steps:

i. Initialization:

α1(i) = πi bi(O1), 1 ≤ i ≤ N . . (4)

ii. Induction:

αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(Ot+1), (5)
1 ≤ j ≤ N, 1 ≤ t ≤ T−1

iii. Termination:

P(O|λ) = Σ_{i=1}^{N} αT(i) . . . . . . (6)

The forward probability is calculated on the trellis diagram pattern: there are N points in each time slot, and every possible state sequence merges into the N states.
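The forward procedure of Eqs. (4)-(6) can be sketched as follows (illustrative Python; the two-state model and all numbers are hypothetical, not taken from the paper's experiments):

```python
def forward(pi, A, B, obs):
    """Forward procedure: alpha[t][i] = P(O_1..O_t, q_t = i | model).
    pi[i]: initial probs, A[i][j]: transition probs, B[i][k]: emission probs."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # initialization, Eq. (4)
    for o in obs[1:]:                                    # induction, Eq. (5)
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return sum(alpha[-1])                                # termination, Eq. (6)

# toy 2-state, 2-symbol model (hypothetical values)
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.5], [0.1, 0.9]]
p = forward(pi, A, B, [0, 1])   # P(O | lambda) for the sequence (V_1, V_2)
```

The cost is O(N^2 T), which is why the procedure is preferred over enumerating all N^T state sequences.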

b. Backward procedure
The backward variable βt(i) at time t and state i is defined by βt(i) = P(Ot+1, Ot+2, ..., OT | qt = i, λ). The backward procedure steps are as follows:

i. Initialization:

βT(i) = 1, 1 ≤ i ≤ N . . . . . . (7)

ii. Induction:

βt(i) = Σ_{j=1}^{N} aij bj(Ot+1) βt+1(j), . . (8)
1 ≤ i ≤ N, t = T−1, T−2, ..., 1

To have been in state i at time t and account for the observations from time t+1 on, one considers every possible state j at time t+1, the transition from i to j, the observation Ot+1 emitted in state j, and the remaining observations after t+1 starting from state j.

c. Forward-backward procedure
The combination of the forward and backward procedures can also be used to obtain P(O|λ). The probability of the partial observation sequence up to time t ending in state i is given by the forward variable αt(i), while the backward variable gives the probability of the observation symbol sequence from time t+1 to T. Mathematically, the forward-backward procedure is expressed as:

P(O|λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} αt(i) aij bj(Ot+1) βt+1(j)
       = Σ_{i=1}^{N} αt(i) βt(i). . . . . . (9)

2. Decoding
The second problem is finding the hidden state sequence for a sequence of observations generated by the model. The solution finds the optimal state sequence using the Viterbi algorithm (dynamic programming). The Viterbi algorithm maximizes the probability P(Q|O, λ), producing the optimal state sequence. Based on the Bayes rule, it is expressed as:

P(Q|O, λ) = P(Q, O|λ) / P(O|λ). . . . . . . (10)
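Viterbi decoding can be sketched as follows (illustrative Python; the model values are hypothetical and reuse the toy two-state shape above):

```python
def viterbi(pi, A, B, obs):
    """Viterbi decoding: the state sequence maximizing P(Q, O | model)."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # best path prob ending in i
    back = []                                          # backpointers per time step
    for o in obs[1:]:
        step, d_new = [], []
        for j in range(N):
            i_best = max(range(N), key=lambda i: delta[i] * A[i][j])
            d_new.append(delta[i_best] * A[i_best][j] * B[j][o])
            step.append(i_best)
        delta = d_new
        back.append(step)
    # backtrack from the most probable final state
    q = max(range(N), key=lambda i: delta[i])
    path = [q]
    for step in reversed(back):
        q = step[q]
        path.append(q)
    return path[::-1]

pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.5], [0.1, 0.9]]
path = viterbi(pi, A, B, [0, 1, 1])   # optimal hidden state sequence
```

In practice the products are usually replaced by sums of log-probabilities to avoid underflow on long observation sequences.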

3. The third problem is solved by adjusting (training) the model parameters based on an optimality criterion. The usual method is the Baum-Welch algorithm, an iterative method that finds a local maximum of the probability function. The training process continues until a stopping criterion is met; the resulting model should fit the training data better than the previous model.

2.6. Fuzzy Hidden Markov Models (FHMM)

The proposed FHMM does not implement vector quantization; the substituted process is Fuzzy C-Means clustering. Fuzzy C-Means clustering has two functions. First, it obtains the codebook through the clustering process; the codebook is the set of cluster centers. Second, it converts the feature extraction output into data with a membership degree for each cluster. These data become the input of the fuzzy hidden Markov models.

Figure 1 shows that the system has two paths (training and testing). Both paths have the same stages: preprocessing, feature extraction, and Fuzzy C-Means clustering. The system input is speech. The speech is normalized, and the normalized speech goes through feature extraction. In training, Fuzzy C-Means clustering is run to obtain the codebook. After the codebook is constructed, each feature vector is replaced by a row of frame membership degrees for each cluster. In testing, this replacement is the only quantization step performed. After Fuzzy C-Means clustering, the training path re-estimates the FHMM parameters, and the testing path decides the most similar reference model. The system output is text.

2.6.1. Fuzzy C-Means Clustering

The steps of Fuzzy C-Means clustering are as follows [14]:


Fig. 1. Speech classification using FHMM.

1. Initialize the data input, a matrix X of size n × m (n = number of frames, m = number of features).

2. Determine the parameters:

a. Number of clusters (k)

b. Maximum iterations (MaxIter)

c. The expected smallest error (ξ)

d. The initial iteration (t = 1)

e. The power, or fuzzifier (w)

The number of clusters indicates the variation of recognized sounds: if the number of clusters is 16, there are 16 variations of recognized sound. The power of Fuzzy C-Means clustering indicates the range of each cluster: if the power is 2, the cluster range is wider than if the power is 1.3, which means the membership degrees of the data are higher for a power of 2 than for a power of 1.3.

3. Generate random values for a matrix U of size n × c (number of frames × number of clusters) as the initial partition matrix, then normalize each row so that the memberships of a frame sum to one:

μik = μik / Qi,  Qi = Σ_{k=1}^{c} μik. . . . . . (11)

4. Calculate the k-th cluster center (Vkj):

Vkj = [Σ_{i=1}^{n} (μik)^w Xij] / [Σ_{i=1}^{n} (μik)^w]. . . (12)

5. Calculate the objective function (Pt) at iteration t:

Pt = Σ_{i=1}^{n} Σ_{k=1}^{c} [Σ_{j=1}^{m} (Xij − Vkj)^2] (μik)^w. . . (13)

6. Iterate, updating the partition matrix (μik) at each iteration:

μik = [Σ_{j=1}^{m} (Xij − Vkj)^2]^(−1/(w−1)) / Σ_{k=1}^{c} [Σ_{j=1}^{m} (Xij − Vkj)^2]^(−1/(w−1)). . . (14)

7. Check the stop condition:

a. If the absolute difference between the new and old objective function values is less than the expected error value, or the iteration count exceeds the maximum, i.e., (|Pt − Pt−1| < ξ) or (t > MaxIter), then stop.

b. Otherwise, set t = t + 1 and repeat from step 4.

Fuzzy C-Means clustering is run to obtain the cluster centers (the codebook). After the codebook is constructed, each feature vector is replaced by a row of membership degrees of the frame for each cluster. Once the codebook is obtained, the membership degree of a data frame for each cluster (κxz) is calculated using the following equation:

κxz = [Σ_{y=1}^{m} (Xxy − Vzy)^2]^(−1/(w−1)) / Σ_{z=1}^{c} [Σ_{y=1}^{m} (Xxy − Vzy)^2]^(−1/(w−1)). . . . . (15)

Note:

1. x: frame index of the observation data

2. y: feature index

3. z: cluster index

2.6.2. Fuzzy Forward-Backward

The difference between HMM and FHMM is that in HMM each observation refers to one codebook value for one frame, while in FHMM an observation refers to a frame value that carries membership degrees for all codebook values. Therefore, a new framework for the forward and backward calculation needs to be constructed. In this subsection, another forward-backward formulation from the literature is also shown [13].

Initialization of the forward calculation (t = 1):

1. HMM

α1(i) = πi bi(O1) . . . . . . . . . . (16)

2. The proposed FHMM

α1(i) = πi bi(Θ1) . . . . . . . . . . (17)


3. Other FHMM

α1(i) = πi [Σ_{m=1}^{M} u(m, 1) bi(m)] . . . . (18)

Induction of the forward calculation (t = 2, ..., T):

1. HMM

αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(Ot+1) . . . (19)

2. The proposed FHMM

αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(Θt+1) . . . (20)

3. Other FHMM

αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] [Σ_{m=1}^{M} u(m, t+1) bj(m)] (21)

Induction of the backward calculation (t = T−1, ..., 1):

1. HMM

βt(i) = Σ_{j=1}^{N} aij bj(Ot+1) βt+1(j) . . . . (22)

2. The proposed FHMM

βt(i) = Σ_{j=1}^{N} aij bj(Θt+1) βt+1(j) . . . . (23)

3. Other FHMM

βt(i) = Σ_{j=1}^{N} aij [Σ_{m=1}^{M} u(m, t+1) bj(m)] βt+1(j) (24)

Calculation of the forward-backward procedure:

1. HMM

P(O|λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} αt(i) aij bj(Ot+1) βt+1(j) (25)

2. The proposed FHMM

P(O|λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} αt(i) aij bj(Θt+1) βt+1(j) (26)

3. Other FHMM

P(O|λ) = Σ_{i=1}^{N} Σ_{j=1}^{N} αt(i) aij βt+1(j) ρ . . . (27)

ρ = Σ_{m=1}^{M} u(m, t+1) bj(m) . . . . . . . . (28)

Note:

1. bi(Θx) = Σ_{z=1}^{c} Bzi κxz . . . . . . . . . (29)

This formula means the input is observation data with a membership degree for each cluster, and the output is the observation probability distribution of the x-th symbol in the i-th state (B).

2. u(m, t) = similarity(cb(m), Ot) . . . . (30)

where cb(m) is the cluster center vector with index m.

3. The similarity over the m features is measured using one of the following equations:

a. Cosine similarity:

sim(xi, xj) = [Σ_{k=1}^{m} Xik Xjk] / [√(Σ_{k=1}^{m} (Xik)^2) √(Σ_{k=1}^{m} (Xjk)^2)] . . . (31)

b. Manhattan distance:

d(xi, xj) = Σ_{k=1}^{m} |Xik − Xjk| . . . . . . (32)

c. Euclidean distance:

d(xi, xj) = √(Σ_{k=1}^{m} |Xik − Xjk|^2) . . . . . (33)

4. The four formulas of the proposed forward and backward calculation are changed because every value bj(Ot) now refers to all codebook entries with different degrees of membership.
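The proposed fuzzy forward pass (Eqs. (17), (20), with the emission term of Eq. (29)) can be sketched as follows (illustrative Python; the toy model, the B indexing as B[i][z] for the paper's Bzi, and all numbers are hypothetical):

```python
def fuzzy_forward(pi, A, B, kappa):
    """Proposed FHMM forward: each frame t carries membership degrees
    kappa[t][z] over the c codebook clusters, and the emission term is
    b_i(Theta_t) = sum_z B[i][z] * kappa[t][z], as in Eq. (29)."""
    N, c = len(pi), len(kappa[0])
    def b(i, t):
        return sum(B[i][z] * kappa[t][z] for z in range(c))
    alpha = [pi[i] * b(i, 0) for i in range(N)]              # Eq. (17)
    for t in range(1, len(kappa)):                           # Eq. (20)
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * b(j, t)
                 for j in range(N)]
    return sum(alpha)

pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.5], [0.1, 0.9]]         # per-state symbol probabilities
kappa = [[1.0, 0.0], [0.2, 0.8]]      # a crisp frame, then a fuzzy frame
p = fuzzy_forward(pi, A, B, kappa)
```

When every kappa row is crisp (a single 1.0), the computation reduces exactly to the ordinary HMM forward procedure of Eqs. (4)-(6).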

3. Experimental Results

3.1. Comparison of HMM and FHMM When the Number of Clusters Was Altered

The purpose of this experiment was to obtain the optimal number of clusters. The static variables were the number of states and the power (w): the number of states was 7 (seven) and the power (w) was 1.1.

Table 2 shows that 32 clusters are better than 16 clusters, meaning the system required 32 variations of recognized sound to obtain good accuracy. The experiment did not try more than 32 clusters (for example, 64), since this study had only five recognized words, all with few phonemes; 64 clusters would cause the system to overspecialize.

3.2. Comparison of FHMM for Each Power (w)

The purpose of this experiment was to obtain the optimal power (w) of FHMM. The static variables were the number of states and the number of clusters. In this experiment,


Table 2. Results when the number of clusters was altered.

Number of clusters | HMM     | FHMM
16                 | 66.67%  | 84.17%
32                 | 80%     | 88.33%

Table 3. Results when the power (w) was altered.

w    | Accuracy
1.05 | 92.5%
1.1  | 88.33%
1.3  | 83.33%
1.5  | 65%
1.7  | 46.67%

Fig. 2. The influence of w.

the number of states was 7 (seven) and the number of clusters was 32, the optimal number obtained from the experiment of Table 2.

Table 3 shows the FHMM accuracy for powers (w) ranging from 1.05 to 1.7. The optimal power (w) was 1.05; as w increased, FHMM accuracy decreased. The result is illustrated in Fig. 2.

Figure 2 illustrates that if the power (w) was 1.3, each data point had different degrees of membership for each cluster. In contrast, if the power (w) was 2, the three clusters covered the same region and each data point had the same degree of membership for every cluster; then there is no difference among the observation data, and the system recognizes only one label.

3.3. Comparison of FHMM for Each Number of States

The purpose of this experiment was to obtain the optimal number of states. The static variables were the number of clusters and the power (w): the number of clusters was 32 and the power was 1.05, the optimal values obtained from the experiments of Tables 2 and 3.

Table 4. Results when the number of states was altered.

Number of states | HMM     | FHMM
5                | 80%     | 90%
6                | 86.67%  | 90%
7                | 80%     | 92.5%
8                | 86.67%  | 90%
9                | 89.17%  | 91.6%
10               | 85.83%  | 91.67%

Table 5. HMM and FHMM in their optimal conditions.

Method | Accuracy
HMM    | 89.17%
FHMM   | 92.5%

From Table 4, the optimal number of states for HMM was 9 and for FHMM was 7. The accuracy was not determined by the number of states alone: increasing the number of states did not always increase the accuracy.

3.4. Comparison of HMM and FHMM

The purpose of this experiment was to compare HMM and FHMM in their optimal conditions (best accuracy). The optimal condition of HMM was 32 clusters and 9 states. The optimal condition of FHMM was 32 clusters, a power (w) of 1.05, and 7 states.

From Table 5, FHMM was better than HMM: it improved the HMM accuracy by 3.33%. The improvement is small, but remember that in a real application searching for the optimal condition is not practical, since it means repeatedly altering the number of states, which wastes time. Hence FHMM is a good solution for this problem: run it once and the accuracy is equal to or greater than 90%.

4. Conclusion and Recommendation

4.1. Conclusion

From the analysis of the performance of FHMM using the data in this study, it can be concluded that the new version of FHMM can handle several problems of speech variation due to gender and dialect. It is better than HMM.

4.2. Recommendations for Future Works

Since our method is an effective way to classify Indonesian speech, it is highly recommended to test it on a bigger database and on other languages.


References:
[1] S. D. Shenouda, F. W. Zaki, and A. Goneid, "Hybrid Fuzzy HMM System for Arabic Connectionist Speech Recognition," Proc. of the 5th WSEAS Int. Conf. on Signal Processing, Robotics and Automation, pp. 64-69, 2006.
[2] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol.77, No.2, 1989.
[3] P. Melin, J. Urias, D. Solano, et al., "Voice Recognition with Neural Networks, Type-2 Fuzzy Logic and Genetic Algorithms," Engineering Letters, Vol.13, No.2, 2006.
[4] L. Chen, S. Gunduz, and M. T. Ozsu, "Mixed Type Audio Classification with Support Vector Machine," Proc. of the IEEE Int. Conf. on Multimedia and Expo, 2006.
[5] R. Halavati, S. B. Shouraki, M. Eshraghi, and M. Alemzadeh, "A Novel Fuzzy Approach to Speech Processing," 5th Hybrid Intelligent Systems Conf., 2004.
[6] S. E. Levinson, L. R. Rabiner, A. E. Rosenberg, and J. G. Wilpon, "Interactive Clustering Techniques for Selecting Speaker-Independent Reference Templates for Isolated Word Recognition," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol.ASSP-27, 1979.
[7] B. H. Juang and L. R. Rabiner, "Fundamentals of Speech Recognition," Prentice-Hall, 1993.
[8] B. H. Juang and L. R. Rabiner, "Hidden Markov Models for Speech Recognition," Technometrics, Vol.33, No.3, pp. 251-272, 1991.
[9] J. Zeng and Z.-Q. Liu, "Interval Type-2 Fuzzy Hidden Markov Models," Proc. of Int. Conf. on Fuzzy Systems, Vol.2, pp. 1123-1128, 2004.
[10] J. Zeng and Z.-Q. Liu, "Type-2 Fuzzy Hidden Markov Models to Phoneme Recognition," Proc. of the 17th Int. Conf. on Pattern Recognition, 2004.
[11] H. Riza and O. Riandi, "Toward Asian Speech Translation System: Developing Speech Recognition and Machine Translation for Indonesian Language," Int. Joint Conf. on Natural Language Processing, 2008.
[12] D. P. Lestari, K. Iwano, and S. Furui, "A Large Vocabulary Continuous Speech Recognition System for Indonesian Language," 15th Indonesian Scientific Conf. in Japan Proceedings, 2006.
[13] H. Uguz, A. Ozturk, R. Saracoglu, and A. Arslan, "A Biomedical System Based on Fuzzy Discrete Hidden Markov Model for the Diagnosis of the Brain Diseases," Expert Systems with Applications, Vol.35, pp. 1104-1114, 2008.
[14] S. Kusumadewi and H. Purnomo, "Aplikasi Logika Fuzzy untuk Pendukung Keputusan" (Fuzzy Logic Applications for Decision Support), Penerbit Graha Ilmu, pp. 84-85, 2004.

Name: Intan Nurma Yulita

Affiliation: Lecturer, Faculty of Informatics, Telkom Institute of Technology; Graduate Faculty, Telkom Institute of Technology

Address: Jalan Telekomunikasi No.1, DayeuhKolot, Jawa Barat 40257, Indonesia
Brief Biographical History:
2004-2008 S.T. in Informatics, Telkom Institute of Technology
2009-2011 M.T. in Informatics, Telkom Institute of Technology
Main Works:
• Applying fuzzy logic to data mining
Membership in Academic Societies:
• Soft Computing – Indonesia (SC-INA)
• Data Mining – Indonesia (INDO-DM)

Name: Houw Liong The

Affiliation: Professor, Telkom Institute of Technology; Professor, School of Telematics, Institut Teknologi Harapan Bangsa; Professor and Researcher, Bandung Institute of Technology (ITB); Professor of Artificial Intelligence, ST Inten, Bandung

Address: Jalan Telekomunikasi No.1, DayeuhKolot, Jawa Barat 40257, Indonesia
Brief Biographical History:
1959-1964 Bachelor degree in Physics, Bandung Institute of Technology (ITB)
1964-1968 Ph.D. in Physics, University of Kentucky
Main Works:
• Research on complex systems, climate change, climate predictions, flood predictions, and space science
Membership in Academic Societies:
• Indonesia Global Network
• Climate Change Network
• Interactive Physics

Name: Adiwijaya

Affiliation: Lecturer, Telkom Institute of Technology

Address: Jalan Telekomunikasi No.1, DayeuhKolot, Jawa Barat 40257, Indonesia
Brief Biographical History:
1999 B.Sc. in Mathematics, Bandung Institute of Technology (ITB)
2004 M.Sc. in Mathematics, Bandung Institute of Technology (ITB)
2012 Ph.D. in Mathematics, Bandung Institute of Technology (ITB)
Main Works:
• Graph theory, information theory, and optimization in image processing
Membership in Academic Societies:
• Indonesian Mathematics Society (Indo-MS)
• Indonesian Combinatorial Society (InaCombS)
