author's personal copy · 2010. 4. 27. · dstatcom based on cascaded h-bridge multilevel...


Page 1: Author's personal copy · 2010. 4. 27. · DSTATCOM Based on Cascaded H-bridge Multilevel Inverter original scientific paper Dražen Juriši´c, George S. Moschytz, Neven Mijat 55

The journal is published by KoREMA, the Croatian Society for Communications, Computing, Electronics, Measurement and Control. KoREMA is a member of the International Federation of Automatic Control (IFAC) and of the International Measurement Confederation (IMEKO). The journal is financially supported by the Ministry of Science and Technology of the Republic of Croatia. The journal is published in four issues per year.

PUBLISHER'S COUNCIL
President: Prof. Dr. Hrvoje Babić, FER – Faculty of Electrical Engineering and Computing, Zagreb.
Vice-president: Prof. Dr. Nikola Šerman, FSB – Faculty of Mechanical Engineering and Naval Architecture, Zagreb.
Members: Prof. Dr. Leo Budin, FER – Faculty of Electrical Engineering and Computing, Zagreb; Prof. Dr. Dali Đonlagić, FERI – Faculty of Electrical Engineering, Computing and Informatics, Maribor, Slovenia; Prof. Dr. Bernard Franković, Faculty of Engineering, Rijeka; Prof. Dr. Martin Jadrić, FESB – Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture, Split; Prof. Dr. Drago Matko, FE – Faculty of Electrical Engineering, Ljubljana, Slovenia; Prof. Dr. Anton Ogorelec, FE, Ljubljana, Slovenia; Prof. Dr. Borivoje Rajković, KONČAR – Electrical Engineering Institute, Zagreb; Prof. Dr. Asif Šabanović, Faculty of Electrical Engineering, Sarajevo, Bosnia and Herzegovina.

EDITOR-IN-CHIEF
Prof. Dr. Ivan Petrović

HONORARY EDITOR-IN-CHIEF
Prof. Dr. Borivoje Rajković

EDITORIAL BOARD
Prof. Mogens Blanke, Ph.D., MSc EE, Aalborg University, Denmark; Prof. Dr. Nikola Bogunović, FER, Zagreb; Prof. Dr. Dali Đonlagić, FERI, Maribor, Slovenia; Prof. Dr. Branko Jeren, FER, Zagreb; Univ. Prof. Dipl.-Ing. Dr. tech. DDr. h. c. Branko Katalinić, Technische Universität, Wien, Austria; Prof. em. Dr.-Ing. Dr. h. c. Werner Leonhard, Germany; Prof. Dr. Gabor Matay, Technical University of Budapest, Hungary; Prof. Dr. Božidar Matić, Academy of Sciences and Arts of Bosnia and Herzegovina, Bosnia and Herzegovina; Prof. Sanjit K. Mitra, Ph.D., University of California, Santa Barbara, USA; Prof. Dr. Borivoj Modlic, FER, Zagreb; Prof. Dr. Nedjeljko Perić, FER, Zagreb; Prof. Dr. Ivan Petrović, FER, Zagreb; Prof. Dr. Borivoje Rajković, KONČAR – Electrical Engineering Institute, Zagreb; Prof. Lucio Vegni, Third University of Rome, Italy.

BOARD OF EXECUTIVE EDITORS
Prof. Dr. Borivoj Modlic; Prof. Dr. Nedjeljko Perić; Prof. Dr. Ivan Petrović; Prof. Dr. Borivoje Rajković.

Papers published in AUTOMATIKA are indexed in the following databases: SCI-E (Science Citation Index Expanded), INSPEC (Institution of Electrical Engineers), EBSCO (Computer & Applied Sciences Complete).

Technical editors: Assist. Prof. Dr. Mario Vašak, Ivan Marković
Croatian language editor: Assist. Prof. Dr. Bernardina Petrović
Typesetting: FER, Zagreb
Printing: Europrint, Velika Gorica
Editor of the electronic edition: Prof. Dr. Željko Jakopović

EDITORIAL OFFICE AND JOURNAL ADMINISTRATION
KoREMA (for the journal Automatika), Unska 3, p.p. 473, 10001 Zagreb, telephone ++ (385 1) 6129 869, fax ++ (385 1) 6129 870. E-mail: [email protected]; URL: http://www.korema.hr
Manuscripts, subscriptions, advertisement orders, announcements and other correspondence should be sent to the Editorial Office address. Manuscripts are not returned. Instructions to authors are available on the Internet (see URL).

ANNUAL SUBSCRIPTION:
· US$ 50 for institutions and companies: this subscription includes, for companies, the right to a free advertisement (1” × 6 3/4” in the standard frame) in each issue of the journal and, for institutions, the right to a free yearly work report or similar information (max 300 words).
· US$ 20 for individual subscribers.

Orders from abroad should be sent to Journal Administration.

ANNUAL SUBSCRIPTION (prices in Croatian kuna):
· 300 kn for legal entities: this subscription includes, for companies, the right to a free advertisement (2.5 cm × 17 cm in the standard frame) in each issue of the journal and, for scientific and professional institutions (universities, faculties, institutes, associations, etc.), the right to free publication of an annual work report or similar information (max. 300 words).
· 60 kn for individuals, 20 kn for students and retirees.

Single issue outside the subscription: 20 kn for individuals, 80 kn for legal entities.

ADVERTISEMENT PRICES: information available from Journal Administration.

The journal AUTOMATIKA is registered as a periodical publication with the Ministry of Culture and Education of the Republic of Croatia, Information Sector, under number 397, in accordance with Article 58, paragraph 3 of the Public Information Act. The case was resolved under number 532-03-5/2-93-01/308, class 008-02/93-01-01.

Editing completed: 22 March 2010.

VOL. 51 · ZAGREB, JANUARY–MARCH 2010 · NO. 1 · PP. 1–108

ISSN 0005–1144 · ATKAAF 51(1), 1–108 (2010) · UDK: 007:62/68



JOURNAL FOR CONTROL, MEASUREMENT, ELECTRONICS, COMPUTING AND COMMUNICATIONS

Published by KoREMA, Zagreb, Croatia
Member of IFAC and IMEKO

ISSN 0005–1144 · Volume 51–2010 · No 1
ATKAAF 51(1), 1–108 (2010)
UDK: 007:62/68

CONTENTS

1 Editorial

3 Krešimir Vrdoljak, Nedjeljko Perić, Ivan Petrović: Applying Optimal Sliding Mode Based Load-frequency Control in Power Systems with Controllable Hydro Power Plants (original scientific paper)

19 Matko Orsag, Marina Poropat, Stjepan Bogdan: Hybrid Fly-by-wire Quadrotor Controller (original scientific paper)

33 Karel Jezernik, Asif Šabanović, Miran Rodič: Speed Sensorless Variable Structure Torque Control of Induction Motor (original scientific paper)

41 Lin Xu, Yang Han, Chen Chen, Jun-Min Pan, Gang Yao, Li-Dan Zhou, M. M. Khan: Modeling, Control and Experimental Investigation of a Novel DSTATCOM Based on Cascaded H-bridge Multilevel Inverter (original scientific paper)

55 Dražen Jurišić, George S. Moschytz, Neven Mijat: Low-Sensitivity Active-RC Allpole Filters Using Optimized Biquads (original scientific paper)

71 Marjan Korošec, Jože Duhovnik, Janez Kopač: Neural Network Based Quality Increase of Surface Roughness Results in Free Form Machining (original scientific paper)

79 Branimir Dropuljić, Davor Petrinović: Development of Acoustic Model for Croatian Language Using HTK (original scientific paper)

89 Liguo Yu, Alok Mishra: Risk Analysis of Global Software Development and Proposed Solutions (review)

99 Comments and Opinions

102 Meetings and Events


Branimir Dropuljić, Davor Petrinović

Development of Acoustic Model for Croatian Language Using HTK

UDK 004.934’1:811.163.42
IFAC 2.8.3; 5.0
Original scientific paper

This paper presents the development of an acoustic model of the Croatian language for automatic speech recognition (ASR). Continuous speech recognition is performed by means of Hidden Markov Models (HMM) implemented in the HMM Toolkit (HTK). In order to adjust the HTK to the native language, a novel algorithm for Croatian language transcription (CLT) has been developed. It is based on phonetic assimilation rules that are applied within uttered words. Phonetic questions for state tying of different triphone models have also been developed. An automated system for training and evaluation of acoustic models has been developed and integrated with a new graphical user interface (GUI). The targeted applications of this ASR system are stress inoculation training (SIT) and virtual reality exposure therapy (VRET). Adaptability of the model to a closed set of speakers is important for such applications, and this paper investigates the applicability of the HTK tool in typical scenarios. Robustness of the tool to a new language was tested in matched conditions by parallel training of an English model that was used as a baseline. Ten native Croatian speakers participated in the experiments. Encouraging results were achieved with the developed model for the Croatian language.

Key words: Acoustic model, Automatic speech recognition, Croatian language, Hidden Markov models, Phonetic assimilation, Phonetic transcription algorithm, Recognition accuracy

Razvoj akustičkog modela hrvatskog jezika pomoću alata HTK (Development of an acoustic model of the Croatian language using the HTK tool). The paper describes the development of an acoustic model of the Croatian language for the needs of an automatic speech recognition system. Recognition of natural connected speech is achieved using hidden Markov models (HMM) within the HTK tool. To adapt this tool to the Croatian language, a new algorithm for automatic phonetic transcription of Croatian words has been developed. It is based on the principle of phonetic assimilation within uttered words. A set of phonetic questions, used for classification when tying triphone models of similar sounds, has also been developed. An automated application for building and evaluating acoustic models has been developed and integrated with a newly developed graphical interface. The applications of this recognition system are stress inoculation training (SIT) and virtual reality exposure therapy (VRET). Adaptability of the acoustic model to a closed set of speakers is very important for such applications, so the paper investigates the applicability of the HTK tool in typical scenarios. Robustness of the tool to a change of language is investigated by paired training and evaluation of an equivalent English model under identical conditions. Ten native Croatian speakers took part in the experiments. The results for the Croatian language presented in the paper show satisfactory properties of the developed acoustic model.

Key words (translated from Croatian): acoustic model, automatic speech recognition, Croatian language, hidden Markov models, phonetic transcription algorithm, phonetic assimilation, recognition accuracy

1 INTRODUCTION

This article describes the early development of an automatic speech recognition (ASR) system for subjects’ verbal responses in the Croatian language. As in conventional ASR, the idea is to extract from the recorded speech signal the cepstral coefficients linked to the characteristics of the vocal tract and to use them as output observations that are modeled by Hidden Markov Models (HMM) [1–3].

Some studies have explored ASR systems for the Croatian language. News and weather forecasts spoken on the radio, as well as weather reports given over the telephone, were collected in the multi-speaker Croatian speech corpus VEPRAD, which was used for building an acoustic system with promising results [4–6]. Many more experiments have been done for the Slovenian language. A large multi-speaker Slovene speech database, GOPOLIS, derived from real-situation dialogs concerning airline timetable information services, was used for estimating model parameters [7]. The Slovenian language model has also been adapted for multilingual speech recognition [8].

ISSN 0005-1144 · ATKAFF 51(1), 79–88 (2010)
AUTOMATIKA 51(2010) 1, 79–88

In spite of significant similarities between the two languages, certain differences exist in the phonemes and in the way they are used in constructing words. Therefore, it is not optimal to use the existing Slovenian models. The Croatian database VEPRAD would be a good starting point, but we wanted to build and test a new speech corpus that can be applied in creating acoustic and language models associated with the expression of emotions and emotional speech. First, it was necessary to build a new acoustic model for the Croatian language. The HTK tool (version 3.3) was chosen for this purpose due to its popularity and the open availability of sophisticated training tools for building a new language model from scratch [9]. In order to customize this tool properly, it was necessary to design and implement the linguistic and phonetic rules of the Croatian language by creating a dictionary of word transcripts and a set of phonetic questions [10].

The targeted applications of the developed ASR system are stress inoculation training (SIT) and virtual reality exposure therapy (VRET). During such training or therapy, a subject is exposed to different levels of stressful situations, and the system must estimate the emotional state of the subject, or his/her level of stress. The response of the subject is measured using different modalities, such as psychophysiological responses (heart rate, blood pressure, skin conductance and temperature), but also through verbal and non-verbal responses. The analysis of vocal features can give important information about the emotional state of a person [11]. Emotions can be expressed non-verbally, through the prosodic structure of an utterance, but also verbally, by directly expressing thoughts and feelings. When it comes to emotions, a verbal or direct form of utterance is often suppressed or held back in order to respect the limits of social acceptability. Provided that a person utters what he or she feels without hesitation in a controlled therapy setting, a very valuable piece of information is obtained that can be used for estimating the emotional state of the person through semantic analysis (e.g. keywords and n-grams) [12,13]. The emotional state estimator is part of an ongoing research project, Adaptive control of virtual reality scenarios in therapy of posttraumatic stress disorder (PTSD), at the University of Zagreb [14,15].

In order to obtain the best recognition results, acoustic models must be designed from, and optimized for, the responses of a closed set of subjects (usually one or a few) who are undergoing the targeted therapy. Therefore, our goal was to design an automated ASR training mechanism that enables trainers or therapists who are not ASR experts to build speaker-dependent acoustic models automatically. Such training must also be robust to the limited size of the database acquired from initial sessions. The experiments presented in this paper are intended to verify the applicability of this concept in the intended scenarios.

A paired model for the English language was also designed from scratch to serve as a baseline for comparison [16]. Training and testing of the two systems were performed in matched conditions in order to verify the suitability of the HTK tool for a new language.

2 SETTING UP ACOUSTIC MODELS

As is usually done for continuous speech recognition with unlimited vocabulary, a large HMM network is used. Hence, sub-word units must be used to model the speech. In our setup we trained triphone, biphone and monophone models. Since the same phoneme is uttered differently in different words, depending on its position in the word and on its neighboring phonemes, the ASR system based on the largest unit offers the greatest modeling accuracy [9]. For triphone transcriptions, words and sentences are transcribed as a chain of models, each of which has one central monophone. In what follows, the left and right contexts are separated from the central monophone by minus and plus symbols, respectively. The following are examples of transcriptions of the sentence Ivo is in school (Cro. Ivo je u školi). Using the word-internal technique, the sentence becomes: ’sil i+v i-v+o v-o sp j+e j-e sp u sp š+k š-k+o k-o+l o-l+i l-i sil’ [9]. Note that this technique uses all three types of sub-word units. Alternatively, if the cross-word transcription technique is used, the sentence is transcribed as: ’sil sil-i+v i-v+o v-o+j sp o-j+e j-e+u sp e-u+š sp u-š+k š-k+o k-o+l o-l+i l-i+sil sil’ [9]. With the second technique, the transcription uses only triphone models besides the ’sil’ and ’sp’ models. The ’sil’ model represents a longer pause at the beginning and at the end of each sentence, whilst a short pause is represented by the ’sp’ model. The described triphone transcriptions represent the final form of the reference that is used to train the system for continuous speech recognition with unlimited vocabulary. A large number of states in different models are tied together to enhance the quality and robustness of the HMM [9].
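The two transcription styles can be generated mechanically from the phoneme sequence of each word. The following is a minimal sketch in Python, not the authors' tool; the function names and the list-of-lists input format are assumptions made for illustration only.

```python
def word_internal(words):
    """Word-internal expansion: context never crosses a word boundary.

    `words` is a list of words, each given as a list of phoneme symbols.
    Single-phoneme words stay monophones, word edges become biphones.
    """
    out = ["sil"]
    for wi, w in enumerate(words):
        for i, p in enumerate(w):
            left = f"{w[i - 1]}-" if i > 0 else ""
            right = f"+{w[i + 1]}" if i < len(w) - 1 else ""
            out.append(f"{left}{p}{right}")
        if wi < len(words) - 1:
            out.append("sp")  # short pause between words
    out.append("sil")
    return out


def cross_word(words):
    """Cross-word expansion: context spans word boundaries, 'sp' is skipped
    as context, and 'sil' is the context at the sentence edges."""
    flat = [p for w in words for p in w]
    ctx = ["sil"] + flat + ["sil"]
    out = ["sil"]
    k = 1  # index of the current phoneme inside ctx
    for wi, w in enumerate(words):
        for _ in w:
            out.append(f"{ctx[k - 1]}-{ctx[k]}+{ctx[k + 1]}")
            k += 1
        if wi < len(words) - 1:
            out.append("sp")
    out.append("sil")
    return out


# "Ivo je u školi", one phoneme list per word
sentence = [["i", "v", "o"], ["j", "e"], ["u"], ["š", "k", "o", "l", "i"]]
print(" ".join(word_internal(sentence)))
print(" ".join(cross_word(sentence)))
```

Applied to the example sentence, both functions reproduce the transcriptions quoted in the text.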

System training is performed through multiple iterations, in each of which the model parameters are re-estimated using the Baum-Welch algorithm [9], [17]. At the initial stage, 32 monophone models are initialized for the Croatian language by means of the so-called flat start method, where each of these models represents one phoneme of the Croatian language. Each of them is modeled as a state machine that consists of 5 states arranged from left to right without state skipping, as shown in Fig. 1. The first and the last states are non-emitting entry and exit states, whilst the central three are the emitting states of each phoneme. Emitting states have a self-loop that enables the model to remain in the same state for as long as needed. Silence and pause models have the same structure but with only one emitting state. In further training stages, this monophone model is used to train the biphone and triphone models that are used for the described transcription. The structure of these models is the same as for the monophone case (Fig. 1). Thus, the new models are initialized by simply copying the parameters of their central monophone model and by state tying using phonetic questions.
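The topology described above can be written down as a transition matrix. The sketch below is illustrative only: the function name and the initial self-loop probability of 0.6 are assumptions (any initial values are overwritten by Baum-Welch re-estimation), and the paper does not state which initial values were used.

```python
def left_to_right_transitions(n_emitting=3, self_loop=0.6):
    """Build a left-to-right transition matrix without state skipping.

    State 0 is the non-emitting entry state and the last state is the
    non-emitting exit state. Each emitting state may either stay where
    it is (self-loop) or advance to the next state.
    """
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0  # entry state always moves to the first emitting state
    for i in range(1, n - 1):
        a[i][i] = self_loop            # remain in the same state
        a[i][i + 1] = 1.0 - self_loop  # advance, no skipping
    return a  # the exit row stays all-zero


phoneme_hmm = left_to_right_transitions()  # 5 states, as in Fig. 1
pause_hmm = left_to_right_transitions(n_emitting=1)  # one emitting state
for row in phoneme_hmm:
    print(row)
```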

Fig. 1. Phoneme HMM with entry, exit and 3 emitting states

Each emitting state is represented by one Gaussian model with a mean vector and the corresponding covariance matrix. The output observation vectors for each state are 39-dimensional vectors composed of 13 mel-frequency cepstral coefficients (MFCC) [18], 13 delta and 13 acceleration coefficients, which are extracted from the recorded speech utterance.
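To make the structure of the 39-dimensional vector concrete, the sketch below appends delta and acceleration coefficients to a sequence of 13-dimensional static frames. It uses the common regression formula d_t = Σ θ(c_{t+θ} − c_{t−θ}) / (2 Σ θ²) with a window of 2 frames and edge replication; the window size and the function names are assumptions here, since the paper does not list its front-end configuration.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a sequence of feature frames.

    Frames beyond the start or end of the utterance are replaced by the
    first or last frame respectively.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(dim):
            num = 0.0
            for th in range(1, window + 1):
                nxt = frames[min(t + th, n_frames - 1)][d]
                prv = frames[max(t - th, 0)][d]
                num += th * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def observation_vectors(static):
    """Concatenate static, delta and acceleration coefficients per frame."""
    d = deltas(static)
    a = deltas(d)  # acceleration = delta of the delta coefficients
    return [c + dd + aa for c, dd, aa in zip(static, d, a)]


# Placeholder input: 10 frames of 13 "MFCC" values rising linearly in time
static = [[float(t)] * 13 for t in range(10)]
obs = observation_vectors(static)
print(len(obs), len(obs[0]))
```

For real use, the static frames would come from an MFCC front end; the linear ramp above is only a stand-in that makes the derivative easy to check by hand.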

3 ADJUSTING ACOUSTIC MODEL FOR CROATIAN LANGUAGE

In order to develop an acoustic model for the Croatian language, it was necessary to define the forms of word transcription and the different options for phoneme tying.

The Standard Croatian language is dialectally closest to the Western Štokavian dialect. Two other common dialects used in Croatia are Čakavian and Kajkavian. In our experiment, we used only the Standard version, with the Ijekavian reflex of the common Slavic yat vowel.

In this article, phonemes are denoted according to the International Phonetic Alphabet (IPA) system, using the IPA-SAM phonetic font and enclosed in slashes [19].

3.1 Pronouncing Dictionary

An electronic dictionary of word transcripts containing phonetic pronunciations of the entire vocabulary used in the ASR is a necessary prerequisite for working with the HTK. Croatian is mostly a phonemic language, with an almost one-to-one relation between orthography and pronunciation. With only a few exceptions, transcription is possible by means of a finite set of rules [20], some of which were already used in [5] and [6]. Thus, the creation of a dictionary of Croatian pronunciation can be easily automated, unlike for the English language, which requires manual transcription due to complicated pronunciation rules with numerous exceptions. There have been attempts to perform similar automation for the English language as well, but the algorithm is of much higher complexity [21]. The dictionary used in [5] and [6] also includes the stress for each word. Accented and non-accented vowels are differentiated by marking stressed vowels with a colon. This information is not necessary for speech recognition using HTK, but it is useful for natural speech synthesis, which is also presented in [5] and [6].
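As a rough illustration of why the near one-to-one orthography makes automation easy, the sketch below splits a Croatian word into letters, treating the digraphs dž, lj and nj as single units, and formats a dictionary line as a word followed by its space-separated units. This is not the CLT algorithm: it applies no assimilation rules, ignores the rare words in which such letter pairs belong to separate morphemes, and uses letters rather than IPA symbols. All function names are hypothetical.

```python
DIGRAPHS = ("dž", "lj", "nj")  # two-letter units of the Croatian alphabet


def letters(word):
    """Split a word into alphabet units, keeping digraphs together."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(pair)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def dictionary_entry(word):
    """One dictionary line: the word, then its units separated by spaces."""
    return f"{word} {' '.join(letters(word))}"


for w in ["školi", "ljubav", "džep"]:
    print(dictionary_entry(w))
```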

The Croatian language has 32 phonemes, 30 of which correspond to letters of the alphabet (Table 1) [22]. Two further sounds are also considered phonemes: the syllabic form of r, /r̩/, which is most often found between two consonants, where it has the role of a vowel, and the diphthong /ie/, which substitutes for the sequence ije in a word and represents the Ijekavian reflex of the common Slavic yat [20], [23]. The phone set used in [5] is identical to the one presented in Table 1, except for the diphthong /ie/, which was not included as a separate phoneme.

Table 1. [Phonemes of the Croatian language; the table body is not legible in this copy]

Besides these phonemes, the Croatian language has allophones (conditioned realizations of the same phoneme). These are mostly the so-called complementary allophones generated by the phonetic environment, but also free variants due to differences in the pronunciation of the same phoneme in the same context by different speakers with different pronunciation habits and styles. According to [23], several allophones (e.g. [�], [�], [�], [s], [z], [F]) should be easily distinguishable from their base phonemes by trained listeners. For other allophones, differences in pronunciation are considered to be more subtle. Nevertheless, since the natural transitions within a word are covered by concatenated triphone models, it is not necessary to include allophones as separate models, i.e. complementary allophones are directly modeled using individually trained triphone models.

The number of complementary allophones, and thus the number of required triphones, is primarily determined by the number of phonemes of a given language. For example, the English language has 18 to 23 vowels and 24 or 25 consonants (depending on the source and the variety of English) [24]. For the Croatian language, with 7 vowels and 25 consonants and carriers, the total number of allophones is much smaller, and thus the corresponding acoustic model is more compact and easier to train, as will be demonstrated in our experiments.
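A simple upper bound makes the size difference tangible: with N phonemes there are at most N³ distinct triphones (left context, centre, right context), before any state tying and ignoring the silence models. The short sketch below evaluates this bound for the phoneme counts quoted above; the bound itself is an illustration added here, not a figure from the paper, and real corpora contain far fewer triphones than the bound.

```python
def max_triphones(n_phonemes):
    """Upper bound on distinct triphones: every left/centre/right combination."""
    return n_phonemes ** 3


inventories = {
    "Croatian (32 phonemes)": 32,
    "English, low estimate (18 vowels + 24 consonants)": 18 + 24,
    "English, high estimate (23 vowels + 25 consonants)": 23 + 25,
}
for label, n in inventories.items():
    print(f"{label}: at most {max_triphones(n)} triphones")
```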

The orthography of the Croatian language has changed through recent history [20]. Nevertheless, the pronunciation and phonetic rules of the Croatian language tend to preserve the naturalness and fluency of speech. The proposed Croatian Language Transcription (CLT) algorithm originates from the idea of preventing possible differences between the textual forms of training examples and their pronunciation, which can occur due to changes or slight variations in the applied orthography. CLT tries to follow the logic of natural pronunciation by applying phonological and phonetic assimilations irrespective of their inclusion in the actual orthography. The algorithm is fully described by 6 rules that are given in Table 2.

Each rule represents either a positive case (usage/U), a negative case (exception/E), or both, as shown in the third column.

The table also shows that certain assimilation examples can involve two or more different rules. It is therefore recommended to apply the rules in the order in which they are listed in Table 2. The most frequent combinations are rule 2 followed by rule 4, or rule 2 followed by rule 5. For composite application of rules, the last column of the table gives the required sequence, where * denotes the current rule and numbers denote the preceding or succeeding rules.

CLT in its current form can be applied only to words of Croatian origin or to assimilated foreign words that adhere to phonemic transcription (e.g. vikend, softver). It is not intended for foreign words in their original orthography, numerical data, dates, acronyms and similar items, i.e. it must be preceded by a text normalization algorithm. Furthermore, the transcription of compound words (words that are commonly connected in natural pronunciation) is excluded. This is not because CLT would become more complex, since the same rules are applied, but only because such an approach would generate new words that require adaptation of the dictionary and the corresponding word network. In the example I’m passing through a yellow light (Cro. Prolazim kroz žuto svjetlo), the cross-word application of the CLT rules would transcribe the sentence as /prɔlazim krɔʒutɔ sʋjɛtlɔ/ and thus generate a new word. The first solution is to add the newly created word krožuto, and similar words that do not adhere to Croatian orthography, to the dictionary together with the original words kroz and žuto. As an alternative, recognition and network modeling can be reduced to the sub-word level, while the actual word recognition is performed from the sequence of recognized sub-word units.

3.2 Phonetic Questions

An important feature of the HTK toolkit is its ability to tie the states belonging to different models into leaf nodes of a binary tree [9]. This gives a more compact model, but also improves robustness, since several similar triphones are used to train the parameters of the shared states, which are represented by leaf nodes. This enables successful training of acoustic models even with limited training material, including the building of triphone models that are not represented in the training database. Branching is based on specific questions about the left and right context of the central monophone in a triphone. The questions are formed according to the characteristics of pronunciation in the Croatian language and are linked to the manner and place of articulation, the sonority of sounds, etc.
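A phonetic question is essentially a named set of phonemes tested against the left or right context of a triphone. The sketch below formats such a question as an HHEd-style `QS` line with `p-*` (left context) and `*+p` (right context) patterns, and evaluates it for a given triphone. The class name and its members are placeholders chosen for illustration; the questions actually used by the authors are listed in [10] and are not reproduced here.

```python
def question(name, phones, side):
    """Format one context question as a QS line for the left ("L") or
    right ("R") context."""
    if side == "L":
        patterns = [f"{p}-*" for p in phones]
    elif side == "R":
        patterns = [f"*+{p}" for p in phones]
    else:
        raise ValueError("side must be 'L' or 'R'")
    return f'QS "{side}_{name}" {{ {",".join(patterns)} }}'


def answers_yes(triphone, phones, side):
    """Return True if the triphone's left or right context is in `phones`."""
    if side == "L":
        context = triphone.split("-")[0] if "-" in triphone else None
    else:
        context = triphone.split("+")[1] if "+" in triphone else None
    return context in phones


example_class = ["a", "e", "i", "o", "u"]  # illustrative class only
print(question("Vowel", example_class, "L"))
print(question("Vowel", example_class, "R"))
print(answers_yes("i-v+o", example_class, "L"))
```

Triphones that give the same answers along a path of the tree end up sharing a leaf node, and therefore share the tied state parameters.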

Since the phonetic questions are numerous (approx. 3 pages), they are not presented in this paper. The full list can be found in Appendix 6 of [10].

4 TRAINING AUTOMATION AND GUI INTERFACE FOR CROATIAN ACOUSTIC MODEL BUILDING

The HTK toolkit consists of a large number of executable files, which are invoked through a command line interface. Each of these executables accepts a large number of input parameters that must be configured properly [9]. Since the tool is highly configurable, significant effort is needed to understand all the options it offers. Training and testing can be automated by using scripting languages. In order to simplify this to the maximum extent, we have developed a Graphical User Interface (GUI) in Matlab for training, testing and analysis of results, as shown in Fig. 2 [10]. The developed program fully automates all the steps required for training and evaluation of the ASR system. This is very important for end users who are not technical experts in this field. It still allows sufficient customization to a particular task through the GUI (e.g. selection of training and testing databases, HMM models, languages and the most significant ASR parameters). An equivalent GUI environment was developed for the English language as well [16].
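The kind of scripting mentioned above can be sketched as a loop that builds one re-estimation command per iteration. The example assembles `HERest` command lines in the style of the HTK Book tutorial; the file and directory names (`config`, `phones0.mlf`, `train.scp`, `monophones0`, `hmm0`, `hmm1`, ...) and the pruning thresholds are assumptions taken from that tutorial layout, not from the authors' Matlab GUI. Commands are only executed when `run=True`, which requires HTK to be installed.

```python
import subprocess


def herest_command(src_dir, dst_dir, config="config", mlf="phones0.mlf",
                   script="train.scp", hmm_list="monophones0"):
    """Assemble one embedded re-estimation call as an argument list."""
    return [
        "HERest",
        "-C", config,                        # feature/config file
        "-I", mlf,                           # phone-level label file
        "-t", "250.0", "150.0", "1000.0",    # pruning thresholds (assumed)
        "-S", script,                        # list of training files
        "-H", f"{src_dir}/macros",
        "-H", f"{src_dir}/hmmdefs",
        "-M", dst_dir,                       # output directory
        hmm_list,
    ]


def reestimate(n_iterations, run=False):
    """Chain several passes: hmm0 -> hmm1 -> hmm2 -> ..."""
    commands = []
    for i in range(n_iterations):
        cmd = herest_command(f"hmm{i}", f"hmm{i + 1}")
        commands.append(cmd)
        if run:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands


for cmd in reestimate(3):
    print(" ".join(cmd))
```

A GUI front end of the kind described in the text would mainly collect these parameters from the user and hide the command lines.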

5 TESTING AND COMPARISON OF ACOUSTIC MODELS

Training and testing of the Croatian and English models were done in matched conditions. Ten native Croatian speakers (5 men and 5 women), aged 20 to 25, took part in the experiments. All the speakers were fluent in English, so the same 10 speakers also recorded the English material.

Table 2. [CLT transcription rules; the table body is not legible in this copy]

Random texts in Croatian and English were taken from the Internet for training and testing of these models, and 500 sentences were formed and recorded for each language. The sentence length was limited to a maximum of 100 characters. All words and signs that were not anticipated in the references were removed. A bilingual speech corpus was formed, as shown in Table 3, by reading and recording these sentences. The first set was used to train and test a speaker-dependent system customized to only one speaker (BD for Croatian and NM for English). In this case, a single speaker uttered all 500 sentences. The second set was used for speaker-independent models. For this set, all ten speakers uttered 50 sentences each in both languages. Although the minimum number of speakers required for building a true speaker-independent model is around 50, with a proper balance of gender, age and dialect [25], this limited data set was used in the experiment to verify the worst-case ASR performance for a closed set of speakers. The third set for both languages was uttered by the same speakers as the first set, but it consisted of simply structured phone-dialing sentences intended for testing the influence of the grammar.

Fig. 2. GUI interface in Matlab for HTK for Croatian language

The transcriptions of all recorded data sets were verified manually prior to training.

Adjusting an ASR system for a new subject in psychotherapy usually means dealing with scarce data when building a new speaker dependent system from the initial sessions. Hence, a logical path is to iteratively customize the existing speaker independent system to the new speaker until the data sets are large enough to create an individually trained ASR system for this particular speaker. The limited number of speakers in the initial phase of building a new ASR system for psychotherapy can also be a problem. Therefore, our intention was to verify the accuracy of the acoustic models in such resource-constrained conditions with several experiments, which are explained next.

Table 3. Bilingual speech corpus. Croatian data sets: C_1_500, C_10_500, C_1_50_pd; English data sets: E_1_500, E_10_500, E_1_50_pd.

5.1 Speaker Dependent Models

As a baseline model for comparison and testing, a speaker dependent acoustic model was created for both languages in the first experiment. The first set from the speech corpus was used for creating this particular model (C_1_500 and E_1_500). Sentences from these two databases were separated into two sets, for training and for testing, in the ratio 90:10. The word-internal method and the proposed CLT algorithm were applied for the Croatian transcription. For evaluation of the trained model, no language model or grammar was used, so the performance of the ASR was determined solely by the accuracy of the trained acoustic models. This actually corresponds to a unigram language model with equal probability of all words (i.e. the so-called flat model), denoted by the abbreviation Guf in Table 4. Hence, word networks with approximately 2000 words were created from the 500 sentences for both languages.

The results of all performed experiments are summarized in Table 4 in the form of word correctness, (N-D-S)/N * 100%, and word accuracy, (N-D-S-I)/N * 100%, where N is the number of words in the testing set, D is the number of deleted words, S is the number of substituted words and I is the number of inserted words. The obtained word correctness was 90% for Croatian and 75% for English, which is a satisfactory result given that a flat unigram language model and relatively large word networks were used.
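The two scores defined above can be computed directly from the error counts. The following is a minimal sketch; the function name `word_scores` and the example counts are illustrative assumptions and are not taken from Table 4.

```python
def word_scores(n, d, s, i):
    """Return (word correctness, word accuracy) in percent.

    n: number of words in the testing set
    d: deleted words, s: substituted words, i: inserted words
    """
    if n <= 0:
        raise ValueError("n must be positive")
    correctness = 100.0 * (n - d - s) / n
    accuracy = 100.0 * (n - d - s - i) / n
    return correctness, accuracy


# Hypothetical counts chosen only to illustrate the formulas.
corr, acc = word_scores(n=200, d=5, s=15, i=10)
print(f"correctness = {corr:.1f}%, accuracy = {acc:.1f}%")
```

Because insertions are subtracted only in the accuracy score, accuracy can never exceed correctness, and it can become negative when the recognizer inserts many spurious words.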

All experiments were done with the word insertion penalty in HTK HVite (p) set to –60, which significantly affects the word accuracy by varying the number of word insertions.

5.2 Speaker Independent Training and Evaluation

In order to evaluate the loss of accuracy due to speaker independent training, the second experiment was performed using the second set with 10 speakers (C_10_500 and E_10_500), and the results were compared to the results of the first experiment. The model was built and tested 10 times. In each trial, the utterances of one speaker were selected as the testing set and the remaining 90% of the set was used for training, with the same parameters as in the first experiment; this is also known as the leave-one-out cross-validation method. As such, it gives the worst-case performance of the ASR system, when an existing acoustic model built from a closed set of speakers is used for recognition of a newly enrolled subject without any adaptation.
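The leave-one-speaker-out procedure described above can be sketched as follows. The speaker and sentence identifiers are hypothetical placeholders; only the layout of 10 speakers with 50 sentences each follows the text.

```python
def leave_one_speaker_out(utterances):
    """Yield (held_out_speaker, train, test) once for every speaker.

    utterances: list of (speaker_id, utterance_id) pairs.
    """
    speakers = sorted({spk for spk, _ in utterances})
    for held_out in speakers:
        train = [u for u in utterances if u[0] != held_out]
        test = [u for u in utterances if u[0] == held_out]
        yield held_out, train, test


# Hypothetical corpus layout: 10 speakers, 50 sentences each.
corpus = [(f"spk{s:02d}", f"sent{n:03d}") for s in range(10) for n in range(50)]
folds = list(leave_one_speaker_out(corpus))
for speaker, train, test in folds:
    print(speaker, len(train), len(test))
```

Each of the 10 trials trains on 450 utterances and tests on the 50 utterances of the held-out speaker, so no test speaker is ever seen during training.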

Average results, together with standard deviations and the maximum and minimum scores, are shown in Table 4. For the Croatian language, the average word correctness score dropped to 66%, and for English to 44%. Although speaker independent models clearly impair the recognition score in this experiment, such training is expected to outperform speaker dependent training when built using a speech corpus of proper size with a sufficient number of speakers, and when evaluated on a test database with an unlimited number of speakers.

5.3 Grammar Influence

In the third experiment, the grammar influence was tested using the phone dialing sentences collected in the third set of the speech corpus (C_1_50_pd and E_1_50_pd). We created a word network using a phone dialing grammar that strictly defines the possible word-to-word transitions. It was used to test the same acoustic models trained in the first experiment. In this grammar, only names, surnames or phone numbers are allowed, combined with a few dialing commands, as shown in Fig. 3.

Word networks were also much smaller in this case, with approximately 25 words for both languages. The grammar used in this evaluation is abbreviated as Gpd in Table 4. One hundred percent accuracy was obtained for both languages, which shows that a grammar is particularly useful in applications that have a regular and strict word order in a sentence.
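To illustrate how a grammar restricts word-to-word transitions, the sketch below models a word network as a table of allowed successors. The words are those legible in Fig. 3, but the exact structure (a command followed by either a single name or surname, or by a digit sequence) is an assumption made for illustration and may differ from the grammar actually used.

```python
COMMANDS = {"NAZOVI", "ZOVI", "BIRAJ"}
DIGITS = {"NULA", "JEDAN", "DVA", "DEVET"}  # only the digits legible in Fig. 3
NAMES = {"IGOR", "BRANIMIR", "NINA", "ŠUMONJA", "DROPULJIC", "MAREVIC"}


def build_network():
    """Map every word to the set of words allowed to follow it (assumed structure)."""
    net = {"SENT-START": set(COMMANDS)}
    for command in COMMANDS:
        net[command] = DIGITS | NAMES
    for digit in DIGITS:
        net[digit] = DIGITS | {"SENT-END"}
    for name in NAMES:
        net[name] = {"SENT-END"}
    return net


def is_allowed(words, net):
    """Check that every transition in the sentence exists in the network."""
    path = ["SENT-START", *words, "SENT-END"]
    return all(nxt in net.get(cur, set()) for cur, nxt in zip(path, path[1:]))


NETWORK = build_network()
print(is_allowed(["NAZOVI", "NINA"], NETWORK))
print(is_allowed(["BIRAJ", "DVA", "NULA", "JEDAN"], NETWORK))
print(is_allowed(["NINA", "NAZOVI"], NETWORK))
```

With a flat model every word may follow every other word, whereas here each word has only a handful of legal successors, which is why the recognizer has far fewer ways to go wrong.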

5.4 Cross-Word Transcription Influence

There are two ways of transcribing continuous speech. Word-internal transcription is the default method in this article and it was applied in all experiments discussed so far. In our fourth experiment, we performed a new training and evaluation analogous to the first experiment, but using the cross-word transcription method. This means that words are transcribed using cross-word triphones, allowing better modeling of the concatenated words observed in continuous speech pronunciation. The phonetic questions that we used for state tying are also important for cross-word models, since word concatenation might easily introduce new triphones not represented in the training set.

Fig. 3. Phone dialing grammar in Croatian language. Nodes: SENT-START; NAZOVI, ZOVI, BIRAJ; ŠUMONJA, IGOR, DROPULJIC, BRANIMIR, MAREVIC, NINA; NULA, JEDAN, DVA, DEVET; SENT-END.

A slight improvement (+0.5%) in word correctness was observed for the Croatian model with cross-word transcription; otherwise, word accuracy for Croatian, as well as both scores for English, are worse than with the default word-internal method. A probable reason for this behavior is the limited training set size and the significant increase in the complexity of the word networks due to all the additional transitions required by cross-word triphones.
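The difference between the two transcription methods can be sketched by expanding a phone sequence into context-dependent labels. The example uses the common `l-c+r` triphone notation; the one-phone-per-letter transcription of "DVA NULA" is a simplifying assumption, and silence and short-pause models are ignored.

```python
def context_labels(phones):
    """Label each phone with its left and right neighbours as 'l-c+r'."""
    labels = []
    for k, phone in enumerate(phones):
        left = f"{phones[k - 1]}-" if k > 0 else ""
        right = f"+{phones[k + 1]}" if k < len(phones) - 1 else ""
        labels.append(f"{left}{phone}{right}")
    return labels


def word_internal(words):
    """Context never crosses a word boundary."""
    return [label for word in words for label in context_labels(word)]


def cross_word(words):
    """Context extends across word boundaries."""
    return context_labels([phone for word in words for phone in word])


# "DVA NULA", assuming one phone per letter for illustration.
words = [["d", "v", "a"], ["n", "u", "l", "a"]]
print(word_internal(words))
print(cross_word(words))
print(sorted(set(cross_word(words)) - set(word_internal(words))))
```

For this example the cross-word expansion introduces the labels `v-a+n` and `a-n+u`, which do not occur in the word-internal expansion. Units of this kind depend on which words happen to be adjacent, so they are more likely to be missing from a small training set.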

5.5 Croatian Transcription Rules Influence

Transcription rules (Table 2) for the acoustic model of Croatian language were implemented within the developed ASR training application according to the described CLT algorithm. The influence of these rules on the recognition results was analyzed in the fifth experiment. In order to determine how often the CLT rules are applied, we processed two word lists and counted the number of words affected by CLT. For the first list, comprising the 15000 most common words in Croatian language, the percentage of modified words was 16.65%. This list was compiled by the Institute of Croatian Language and Linguistics under the Croatian Language Repository project. The second word list was derived from the sentences of the C_1_500 data set. In this case, the percentage of words affected by CLT was 15.56%. Typically only one or a few phonemes in such words are actually affected (by deletion, substitution or insertion).

In order to evaluate the influence of CLT, the same model as in the first experiment was rebuilt, but without using the CLT rules. In this case, a simple one-to-one mapping between phonemes and graphemes was used. Syllabic r /r"/ and the diphthong /ie/ were excluded, and monophone models were built from 30 phonemes directly corresponding to the 30 graphemes of the Croatian alphabet. Only a small decrease of 0.2% (in both accuracy and correctness) can be observed in Table 4 when the CLT rules are deactivated. Certainly, the trained triphone models are capable of capturing pronunciation changes due to the left and right context that are otherwise explicitly modeled through the CLT rules. A probable reason for the relatively small difference is the fact that, on average, only a small percentage of phonemes is affected by CLT. A specially designed test data set comprising only the words affected by CLT is expected to exhibit a greater difference.
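A one-to-one grapheme-to-phoneme mapping of this kind can be sketched as a tokenizer over the 30 letters of the Croatian alphabet, three of which are digraphs (dž, lj, nj). The function name and the greedy digraph matching are assumptions made for illustration; they are not the authors' implementation.

```python
ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f",
    "g", "h", "i", "j", "k", "l", "lj", "m", "n", "nj",
    "o", "p", "r", "s", "š", "t", "u", "v", "z", "ž",
]
DIGRAPHS = ("dž", "lj", "nj")


def graphemes(word):
    """Split a word into Croatian graphemes, matching digraphs greedily."""
    word = word.lower()
    result = []
    k = 0
    while k < len(word):
        pair = word[k:k + 2]
        if pair in DIGRAPHS:
            result.append(pair)
            k += 2
        elif word[k] in ALPHABET:
            result.append(word[k])
            k += 1
        else:
            raise ValueError(f"character {word[k]!r} is not in the Croatian alphabet")
    return result


print(len(ALPHABET))
print(graphemes("ŠUMONJA"))
print(graphemes("ljubav"))
```

In the one-to-one setup each returned grapheme would simply be used as the phoneme label. Greedy matching is a simplification: in a small number of words, such as "injekcija", the letters n and j are pronounced separately and would be merged incorrectly.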

Nevertheless, although this improvement is not substantial, it is useful, since it can provide better results during on-line recognition while demanding additional effort only in the initial phase, when creating the pronouncing dictionary and building the models and the word network.

Table 4. Results of the performed experiments: word correctness and word accuracy for the Croatian and English acoustic models (speaker dependent, speaker independent, grammar influence, cross-word transcription and, for Croatian, transcription without CLT rules).

6 CONCLUSION AND FUTURE WORK

The results presented in the paper show that an acoustic model for a continuous speech ASR system for Croatian language can be developed easily, with good results. The achieved word correctness was 90.13% for the speaker dependent system and 65.56% for the speaker independent one, with a vocabulary size of 2403 words in both cases. The results for the Croatian model are actually better than the results for the English model, which was trained and tested in an identical setup. This indicates that the HTK tool used in our experiments can indeed be easily customized to a new language. The better result for the Croatian model, compared to the English one, is probably related to the fact that Croatian language is phonetically simpler. It is therefore easier to train and model using an HMM. Another possible reason is the fact that only native Croatian speakers participated in the experiments. The difference was particularly pronounced when the speaker independent data set was used for training. Speakers are expected to make fewer unnatural variations and mistakes when uttering their native language, which may explain this difference. More detailed results are described in [10] and [16].

We have shown that the proposed algorithm for automatic transcription of Croatian words indeed improves the recognition score. We plan to upgrade this basic algorithm by enabling it to transcribe compound words, acronyms, numbers, dates and other words that are not transcribed in accordance with the standard rules of transcription in Croatian language.

In the described experiments we used a recorded speech corpus of read sentences with non-emotional speech. Further experiments will be carried out with a newly developed corpus recorded during actual stress training sessions, in order to verify the effects of emotional speech on the recognition accuracy. So far, our focus has been on the development and verification of the automated tool, using mockup databases. Nevertheless, we have shown that speaker dependent models can be built using limited data sets, which is especially important for the targeted applications. The current version of the ASR system has already been successfully applied to voice control of a robotic arm with two joints and to automated voice entry of exam results at our University.

Additionally, future work includes the development of statistical language models of Croatian language to improve the current results, which rely only on the developed acoustic models. Such language models will be trained for specific applications like the emotional state estimator in stress inoculation training (SIT) and VR exposure therapy (VRET).

ACKNOWLEDGMENT

This work was supported by the Ministry of Science, Education and Sports of the Republic of Croatia under project #036-0000000-2029. The authors would like to thank colleagues at the Dpt. of Electronic Systems and Information Processing for useful discussions and comments during preparation of the manuscript, as well as the anonymous reviewers for their valuable suggestions in preparing the final version.

REFERENCES

[1] L. E. Baum and T. Petrie, “Statistical inference for probabilistic functions of finite state Markov chains,” The Annals of Mathematical Statistics, vol. 37, pp. 1554-1563, 1966.

[2] J. K. Baker, “The dragon system – An overview,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 23, pp. 24-29, Feb. 1975.

[3] F. Jelinek, L. R. Bahl and R. L. Mercer, “Design of a linguistic statistical decoder for the recognition of continuous speech,” IEEE Transactions on Information Theory, vol. 21, pp. 250-256, May 1975.

[4] S. Martincic-Ipšic and I. Ipšic, “Context-Dependent Acoustic Modeling of Croatian Speech,” in Proc. IS-LTC’06, 2006, pp. 251-257.

[5] S. Martincic-Ipšic, R. Slobodan and I. Ipšic, “Acoustic Modelling for Croatian Speech Recognition and Synthesis,” Informatica, vol. 19, pp. 227-254, 2008.

[6] S. Martincic-Ipšic, “Croatian Speech Recognition and Synthesis Based on Context-dependent Hidden Markov Model,” PhD Thesis, in Croatian, University of Zagreb, Croatia, Nov. 2007.

[7] S. Dobrišek, F. Mihelic and N. Pavešic, “Speech Segmentation Aspects of Phone Transition Acoustical Modelling,” Lecture Notes in Computer Science, vol. 1692, pp. 844, Jan. 1999.

[8] A. Zgank, Z. Kacic and B. Horvat, “Comparison of Acoustic Adaptation Methods in Multilingual Speech Recognition Environment,” Lecture Notes in Computer Science, vol. 2807, pp. 245-250, Feb. 2004.

[9] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev and P. Woodland, The HTK Book (for HTK Version 3.3), 8th ed., Cambridge University Engineering Department, 2005.

[10] B. Dropuljic, “Development of Acoustic and Lexical model for Automatic speech recognition for Croatian Language,” Diploma Thesis, in Croatian, University of Zagreb, Croatia, Jan. 2008.

[11] B. Schuller, G. Rigoll and M. Lang, “Hidden Markov model-based speech emotion recognition,” in Proc. ICASSP ‘03, 2003, pp. II - 1-4.

[12] B. Schuller, G. Rigoll and M. Lang, “Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture,” in Proc. ICASSP ‘04, 2004, pp. I - 577-580.

[13] T. S. Polzin, “Verbal and non-verbal cues in the communication of emotions,” in Proc. ICASSP ‘00, 2000, pp. 2429-2432.

[14] D. Kukolja, S. Popovic, B. Dropuljic, M. Horvat and K. Cosic, “Real-time emotional state estimator for adaptive virtual reality stimulation,” Lecture Notes in Computer Science, vol. 5638, pp. 175-184, Jul. 2009.

[15] S. Popovic, M. Horvat, D. Kukolja, B. Dropuljic and K. Cosic, “Stress inoculation training supported by physiology-driven adaptive virtual reality stimulation,” Studies in Health Technology and Informatics, vol. 144, pp. 50-54, 2009.

[16] N. Marevic, “Development of Acoustic and Lexical model for Automatic speech recognition for English Language,” Diploma Thesis, in Croatian, University of Zagreb, Jan. 2008.

[17] L. E. Baum, T. Petrie, G. Soules and N. Weiss, “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” The Annals of Mathematical Statistics, vol. 41, no. 1, pp. 164-171, 1970.

[18] S. B. Davis and P. Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, pp. 357-366, Aug. 1980.

[19] International Phonetic Association, Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet, Cambridge University Press, Massachusetts, 1999.

[20] S. Babic, D. Brozovic, I. Škaric and S. Težak, Sounds and Forms of Croatian Literary Language, A Great Croatian Grammar, in Croatian, Globus, Zagreb, 2007.

[21] B. Byrne, M. Finke, S. Khudanpur, J. McDonough, H. Nock, M. Riley, M. Saraclar, C. Wooters and G. Zavaliagkos, “Pronunciation modeling for conversational speech recognition: A status report from WS97,” in Proc. ASRU ’97, IEEE Workshop on, 1997, pp. 26-33.

[22] D. Petrinovic, Digital speech processing, lecture materials, in Croatian, Faculty of Electrical Engineering and Computing, University of Zagreb, 2002.

[23] E. Baric, M. Loncaric, D. Malic, S. Pavešic, M. Peti, V. Zecevic and M. Znika, Croatian Grammar, in Croatian, Školska knjiga, Zagreb, 1997.

[24] J. C. Wells, Longman Pronunciation Dictionary, 1st ed., Harlow: Longman, 1990.

[25] J. A. C. Badenhorst and M. H. Davel, “Data requirements for speaker independent acoustic models,” in Proc. of the 19th Annual Symposium of the Pattern Recognition Association of South Africa, Nov. 2008, pp. 147-152.


Branimir Dropuljic was born in Varaždin, Croatia in 1983. He received the Dipl.ing. degree in Electrical Engineering from the University of Zagreb Faculty of Electrical Engineering and Computing in 2008. Since 2008, he has been a research assistant at the Department of Electric Machines, Drives and Automation, Faculty of Electrical Engineering and Computing. He is a doctoral student working on the scientific project "Adaptive control of virtual reality scenarios in therapy of posttraumatic stress disorder (PTSD)". His interests and main field of research include emotional state estimation from vocal and psychophysiological responses.

Davor Petrinovic was born in 1965 in Croatia. He received the Dipl.ing. degree in Electrical Engineering from the University of Zagreb Faculty of Electrical Engineering in 1988 (today, Faculty of Electrical Engineering and Computing). He received the M.Sc. and Dr.Sc. degrees in the field of electrical engineering in 1996 and 1999, respectively, from the same institution. He was appointed Associate Professor in 2005 at the Department of Electronic Systems and Information Processing, Faculty of EE&C, Uni. Zagreb. He was a Fulbright postdoctoral scholar in 2000/01 at SCL Laboratory, UC Santa Barbara, USA and a visiting researcher at the Sound and Image Processing Lab, School of EE, KTH, Stockholm, Sweden in 2005/06. His current research interests include speech and audio modeling, processing and coding.

AUTHORS’ ADDRESSES
Branimir Dropuljic, Dipl.ing.
Department of Electric Machines, Drives and Automation,
Prof. Davor Petrinovic, Ph.D.
Department of Electronic Systems and Information Processing,
Faculty of Electrical Engineering and Computing,
University of Zagreb,
Unska 3, HR-10000 Zagreb, Croatia
emails: [email protected], [email protected]

Received: 2010-01-21
Accepted: 2010-03-12
