a framework for bangla text to speech synthesis

Upload: sanjoy-dutta

Post on 25-May-2015

DESCRIPTION

My conference presentation slides for my paper at the 16th ICCIT conference, 2013.

TRANSCRIPT

Page 1: A framework for bangla text to speech synthesis

A Framework for Bangla Text to Speech Synthesis

Authors

K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi

Presented By

Sanjoy Dutta

Department of Computer Science & Engineering

Khulna University of Engineering and Technology, Khulna, Bangladesh.

Page 2

Contents

• Problem Statement

• Factors for Speech Synthesis in Bangla

• Proposed Framework

    • Rules and Structure Development

    • Syllable Parser Development

• Audio File Selection and Normalization

• Experimental Analysis & Results

• Conclusion

Page 3

Problem Statement

• Develop a framework for Bangla Text to Speech Synthesis.

Page 5

Factors for Speech Synthesis in Bangla

• Sequential flow of diphones

A diphone is a pair of adjacent phonemes in which the transition between the two phonemes is modelled, usually from the middle of the first phoneme to the middle of the second phoneme.

A phoneme is a sound, or a group of different sounds, that speakers of the language or dialect in question perceive as having the same function. For example, in English the same /k/ phoneme occurs in "skill" (spelled k) and "school" (spelled c).

• Position vs. Pronunciation

Consonants and vowels occur in three kinds of position:

Consonant Vowel (CV)

Vowel Consonant (VC)

Vowel Consonant Vowel (VCV)
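
A minimal sketch of both factors, assuming words are already available as lists of romanized phoneme symbols: `diphones` pairs each phoneme with its successor (padding the word with a hypothetical silence marker `_`), and `consonant_positions` labels each consonant as CV, VC or VCV from its neighbours. The `VOWELS` set is a simplified placeholder, not the authors' Bangla inventory, and the sketch works at the symbol level only; the mid-phoneme cut points of a real diphone belong to the recorded audio.

```python
# Sketch only: symbol-level diphones and consonant position labels.
# Assumptions: romanized phoneme symbols, a hypothetical simplified VOWELS
# set, and "_" as a hypothetical word-boundary silence marker.

VOWELS = {"a", "i", "u", "e", "o"}  # placeholder, not the Bangla inventory
SILENCE = "_"


def diphones(phonemes: list[str]) -> list[tuple[str, str]]:
    """Pair every phoneme with its successor, padding both ends with silence."""
    padded = [SILENCE] + phonemes + [SILENCE]
    return list(zip(padded, padded[1:]))


def consonant_positions(phonemes: list[str]) -> list[tuple[int, str]]:
    """Label each consonant CV, VC or VCV according to its vowel neighbours."""
    labels = []
    for i, p in enumerate(phonemes):
        if p in VOWELS:
            continue
        vowel_before = i > 0 and phonemes[i - 1] in VOWELS
        vowel_after = i + 1 < len(phonemes) and phonemes[i + 1] in VOWELS
        if vowel_before and vowel_after:
            labels.append((i, "VCV"))
        elif vowel_after:
            labels.append((i, "CV"))
        elif vowel_before:
            labels.append((i, "VC"))
        # consonants with no adjacent vowel (e.g. inside clusters) stay unlabelled
    return labels
```

For the illustrative sequence `k a m o l`, this yields six diphones, from `(_, k)` to `(l, _)`, and the labels CV for `k`, VCV for `m` and VC for `l`.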

Page 7

Proposed Framework Structure and Rules

• Text Normalization:

Transforming text into a single standard form.

Used in text-to-speech conversion to expand numbers, dates, acronyms and abbreviations into words.

Text normalization is also applied to handle position-dependent pronunciation (Position vs. Pronunciation).
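
One way to picture the normalization step, as a sketch only: Unicode NFC is assumed as the "single standard form", ASCII digits are mapped to Bangla digits, and each run of digits is spelled out digit by digit. The `ABBREVIATIONS` table is a hypothetical placeholder, and digit-by-digit reading is a simplification: numbers such as years normally need context-dependent reading rules.

```python
# Sketch only: NFC as the standard form, ASCII digits mapped to Bangla digits,
# and digit runs read out digit by digit (a simplification for real numbers).
import re
import unicodedata

BANGLA_DIGIT_WORDS = {
    "০": "শূন্য", "১": "এক", "২": "দুই", "৩": "তিন", "৪": "চার",
    "৫": "পাঁচ", "৬": "ছয়", "৭": "সাত", "৮": "আট", "৯": "নয়",
}
ASCII_TO_BANGLA = str.maketrans("0123456789", "০১২৩৪৫৬৭৮৯")

# Hypothetical abbreviation lexicon (abbreviation -> full form); left empty here.
ABBREVIATIONS: dict[str, str] = {}


def normalize(text: str) -> str:
    """Return text in one canonical form with digits spelled out as words."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(ASCII_TO_BANGLA)
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)

    def spell_out(match: re.Match) -> str:
        # Replace each Bangla digit in the run with its spoken word.
        return " ".join(BANGLA_DIGIT_WORDS[d] for d in match.group())

    return re.sub("[০-৯]+", spell_out, text)
```

For example, `normalize("১২")` and `normalize("12")` both return "এক দুই".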

Page 8

Normalization rules for ‘ ’

Page 9

Normalization rules for ‘ - - -’

Page 10

Syllable Parser Development

Page 11

Syllable Parser In Action
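
The transcript keeps only the titles of these two slides. As a rough stand-in, not the authors' parser, the sketch below splits a Bangla word into orthographic syllables with a regular expression over code points: either an independent vowel, or a consonant (optionally with nukta) followed by any hasanta-joined consonants and an optional vowel sign or final hasanta, in both cases followed by an optional candrabindu, anusvara or visarga. The character ranges come from the Unicode Bengali block; the grouping rule itself is an assumption.

```python
# Sketch only: a simplified orthographic-syllable splitter for Bangla words.
# Assumption: the grouping rule below is illustrative, not the paper's parser.
import re

CONSONANT = "[\u0995-\u09b9\u09ce\u09dc\u09dd\u09df]"  # includes khanda ta, ড়, ঢ়, য়
NUKTA = "\u09bc?"                                      # optional nukta
HASANTA = "\u09cd"                                     # virama joining consonants
INDEPENDENT_VOWEL = "[\u0985-\u0994]"
VOWEL_SIGN = "[\u09be-\u09cc\u09d7]"                   # dependent vowel signs
MODIFIER = "[\u0981-\u0983]"                           # candrabindu, anusvara, visarga

SYLLABLE = re.compile(
    f"(?:{CONSONANT}{NUKTA}(?:{HASANTA}{CONSONANT}{NUKTA})*"
    f"(?:{VOWEL_SIGN}+|{HASANTA})?"
    f"|{INDEPENDENT_VOWEL}){MODIFIER}?"
)


def syllables(word: str) -> list[str]:
    """Return the orthographic syllables of a single Bangla word."""
    # Characters outside the pattern (spaces, punctuation) are skipped.
    return SYLLABLE.findall(word)
```

For example, `syllables("বাংলা")` gives `["বাং", "লা"]`, and a conjunct such as `ক্ষ` stays a single unit.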

Page 13

Audio File Selection and Normalization

Bangla has 39 consonants and 11 vowels in total.

After reduction:

28 independent consonants

8 vowels (the vowel ‘ ’ is the exception)

Page 14

Audio File Selection and Normalization

This gives 224 (28 × 8) audio files for the syllables.

28 consonants combined with 5 vowels generate 140 (28 × 5) diphones.

In summary, 401 audio files need to be created: 9 vowels, 28 consonants, 224 syllables and 140 diphones.
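
The file counts on these slides can be checked with a few lines of arithmetic. The constant names are hypothetical; the values are the ones stated on the slides (the summary lists 9 vowel files, while 8 vowels are combined with consonants to form syllables).

```python
# Sketch only: recompute the audio-file inventory from the counts on the slides.

CONSONANTS = 28         # independent consonants after reduction
SYLLABLE_VOWELS = 8     # vowels combined with each consonant into syllables
DIPHONE_VOWELS = 5      # vowels combined with each consonant into diphones
STANDALONE_VOWELS = 9   # vowel recordings listed in the summary


def audio_inventory() -> dict[str, int]:
    """Return the number of audio files per category and their total."""
    counts = {
        "vowels": STANDALONE_VOWELS,
        "consonants": CONSONANTS,
        "syllables": CONSONANTS * SYLLABLE_VOWELS,  # 28 * 8 = 224
        "diphones": CONSONANTS * DIPHONE_VOWELS,    # 28 * 5 = 140
    }
    counts["total"] = sum(counts.values())          # 9 + 28 + 224 + 140 = 401
    return counts
```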

Page 16

Experimental Analysis and Results

Strategy of Analysis:

Sample input: news articles from various news portals

Listener selection: anonymous persons chosen at random

Accuracy Analysis:

$$\text{Accuracy} = \frac{\text{Words listeners were able to hear clearly on the 1st attempt}}{\text{Total no. of words in each sample}} \times 100$$
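
The accuracy metric translates directly into code. The function and argument names are hypothetical, and the sketch computes the score for one sample; how scores were combined across listeners is not stated on the slide.

```python
# Sketch only: per-sample listening accuracy as defined on the slide.


def accuracy(words_heard_clearly: int, total_words: int) -> float:
    """Percentage of words heard clearly on the first listening attempt."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= words_heard_clearly <= total_words:
        raise ValueError("words_heard_clearly must be between 0 and total_words")
    # Multiply first so that exact ratios stay exact in floating point.
    return 100 * words_heard_clearly / total_words
```

For a hypothetical sample in which a listener hears 45 of 50 words clearly on the first attempt, `accuracy(45, 50)` returns `90.0`.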

Page 17

Experiment Results

Listening factors:

• Duration synchronization and merging

• Numerical values such as years
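
Duration synchronization and merging concern how separately recorded units are joined into one utterance. The slides do not describe the authors' merging step; as one common option, the sketch below joins units with a short linear crossfade. It assumes mono audio held as lists of float samples at one shared sample rate, and the `overlap` length in samples is a hypothetical tuning parameter (for example, 10 ms at 16 kHz is 160 samples).

```python
# Sketch only: joining recorded units (syllables, diphones) with a linear
# crossfade. Assumes mono float samples at one shared sample rate.
from functools import reduce


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Append b to a, blending the last `overlap` samples of a into b."""
    overlap = min(overlap, len(a), len(b))
    if overlap <= 0:
        return a + b
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises towards 1 across the overlap
        mixed.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return a[:-overlap] + mixed + b[overlap:]


def join_units(units: list[list[float]], overlap: int) -> list[float]:
    """Concatenate units in order, crossfading each join."""
    return reduce(lambda acc, unit: crossfade(acc, unit, overlap), units, [])
```

Each join shortens the output by `overlap` samples, so the overlap length directly affects the timing of the synthesized word.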

Constraints in Sample 1:

‌ , , ,

, , ,

Constraints in Sample 2:

, , , , ,

,

Page 18

Limitations and Future Work

Detecting noun and adjective words, namely

( ) Noun and

( ) Adjective

Both words should follow rule 3(a), but they do not, and their pronunciation differs.

Page 19

CONCLUSION

We believe the proposed framework can be useful for Bangla TTS development, as it can synthesize Bangla words while requiring a minimal number of audio files.

Page 20

Thank You!