Fundamental Frequency Contour Synthesis for Turkish Text to Speech

Erkan Abdullahbeşe

Content:

• TTS systems and prosody

• Turkish Intonation, Stress

• Observations on Collected Data

• Methodology

• Improvements on Methodology

• Discussion

• Conclusion

Introduction to Text to Speech (TTS) Systems

• Text -> speech signal

• Widespread applications

– Message to speech generation

– Man-machine dialogue

– Multimedia applications

– Talking aids for handicapped

CHALLENGE: Machine Accent -> Natural Speech

SOLUTION: Prosody Generation Modules

What is Prosody?

• Properties of speech that cannot be derived from the phoneme sequence

– Modulation of voice pitch

– Rhythm, changes in durations

– Fluctuations of loudness

• Related to domains larger than one phoneme

(supra-segmental properties)

Basic Acoustic Parameters

• Fundamental Frequency F0 (pitch)

• Duration

• Intensity

Prosodic Phenomena

• Modulate the basic acoustic parameters

• Modulation of fundamental frequency

• Intonation

• Stress (accent)

Intonation

• Ensemble of pitch variations

• Perceived as speech melody

Stress

• Modulates all the basic acoustic parameters

• Increase in F0 and intensity (loudness)

• Lengthening in duration

• Three types:

– Word stress

– Phrase stress

– Sentence stress

• Stress on a single syllable

• Phrase and sentence stress coincide with word stress

Prosody Generation Modules in TTS

• Prosodic description

– Prosodic phrasing -> phrase boundaries

– Accent labeling -> accents on syllables

• Prosodic labels -> F0 contour

PROBLEMS

• Complex linguistic processing units (morphology, syntax, semantics)

• Speaker-dependence

• Articulation-related problems: microprosody vs. macroprosody

Basic Intonation Models

• Tone Sequence Models: pitch contour as a sequence of fluctuations generated by local accents

– Pierrehumbert: a sequence of independent H and L tones (orthography)

• Pitch accent -> pitch movements on stressed syllables

• Boundary tone -> at phrase boundaries

• Phrase accent -> between stressed syllable and phrase boundary

• Superposition Models: pitch contour as the superposition of several components with different domains: syllables, words, phrases, sentences, paragraphs, whole text

– Fujisaki: purely mathematical model -> parametric

• A basic F0

• A phrase component (response of a critically damped second-order system to an impulse)

• An accent component (response of a critically damped second-order system to a rectangular input)

• Optimization of parameter values with respect to F0 (Analysis by Synthesis)

– Möbius -> Fujisaki + Linguistics -> German
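For reference, the Fujisaki superposition model is usually written as follows. This is the commonly cited textbook form, not something taken from these slides; the symbols F_b, A_p, A_a, T_0, T_1, T_2, α, β and γ follow the usual convention and do not appear in the source.

```latex
% ln F0 = base value + phrase components + accent components
\ln F_0(t) = \ln F_b
  + \sum_{i} A_{p,i}\, G_p(t - T_{0,i})
  + \sum_{j} A_{a,j}\, \bigl[ G_a(t - T_{1,j}) - G_a(t - T_{2,j}) \bigr]

% Phrase control: impulse response of a critically damped second-order system
G_p(t) = \begin{cases} \alpha^{2}\, t\, e^{-\alpha t}, & t \ge 0 \\ 0, & t < 0 \end{cases}

% Accent control: step response of a critically damped second-order system,
% clipped at the ceiling \gamma
G_a(t) = \begin{cases} \min\!\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \,\bigr], & t \ge 0 \\ 0, & t < 0 \end{cases}
```

The rectangular accent command between onset T_1 and offset T_2 appears as the difference of two step responses.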

Approaches

• Perform an analysis on a speech corpus

• Transcribe the corpus

– Define F0 labels (rise, fall, peak etc.) and boundary labels (minor, major etc.)

– Labeling

• By hand

• Examination -> rules -> automatic

• Automatic learning of: labels -> F0 values (or parametrized)

– Neural Networks

– Stochastic methods

• Intonation pattern dictionary (from natural speech)

– Store pitch values in semitones (ST) and key information (labels) for each pattern

– For the patterns in input sentence -> compare key info -> find closest pattern from dictionary -> apply pitch

Approaches

• For integration into TTS (labeling input sentence from text)

– Complex linguistic processing units

• Morphology

• Syntax

• Semantics

– Stochastic methods

• Syntax -> most probable label sequence

Sentence Intonation Types

• Terminal intonation

– pitch decreases at the end -> message completed

• Interrogative intonation

– pitch slightly increases on the last syllable -> waiting for response

• Progressive intonation

– pitch either increases slightly or does not show any lowering at the end -> message not completed yet

Turkish Intonation

• Classification of sentences

– Type:

• Declaratives(↓)

• wh-questions(↑)

• yes-no questions(↓)

– Structure:

• Simple

• Compound: (↑) at the end of subordinate

– Meşgul olduğundan(↑) bizimle sinemaya gelemedi(↓).

Turkish Intonation

• Tone groups (phrase or segment)

– Division into tone groups

• / Oraya varınca beni arayın. /

• / Oraya varınca / beni arayın. /

– Focus (new information) in each tone group

• / Oraya varınca beni arayın. /

• / Oraya varınca beni arayın. /

• / Oraya varınca beni arayın. /

• / Oraya varınca beni arayın. /

– Pitch variations on focus

Turkish Intonation

• Four levels of pitch: low (1), mid (2), high (3), extra high (4)

– gi2di3yoru1m

– sa2hi4 mi1

• Speech melody <-> musical melody (Nash)

– Hierarchy of intonation units (phrase -> text)

– Each intonation unit -> melody

– Successive intonation units related by motifs -> melody of the upper level

– Music: reiteration of motifs -> musical melody

Turkish Stress

• Fixed (bound) stress vs. free stress (Turkish)

• Stress on a single syllable of a word in Turkish

• Effect of suffixes on stress

– Stress on final syllable of root + stressable suffix

yolcu + -lar → yolcular

– Stress on final syllable of root, with an unstressable suffix involved

oku + -yor → okuyor + -lar → okuyorlar

– Stress on non-final syllable of root

karınca + -lar → karıncalar

• May disappear in sentence

Word Stress

Turkish Stress

• Signals the prominence of the most information-bearing element in a sentence

• Types

– Unmarked (preverbal position)

• Yarın İstanbul’a gidiyorlar.

– Marked (any position)

• Yarın İstanbul’a gidiyorlar.

• Focusing elements

– Precede focus: sadece, daha

• Mehmet daha bugün ödevine başlayabildi.

– Follow focus: -mi, da, bile

• Ayla mı bugün Ankara’dan dönüyor?

Sentence Stress

Turkish Stress

• Phrase: modifier or complement and head

• Phrase stress on modifier in Turkish

• Types

– Phrases used as nouns

• telefon ahizesi

• güzel çiçekler

– Phrases used as verbs

• hızlı koş

• severek yaşa

– Others

• senin için

• yarından sonra

• Preserved in the sentence

Phrase Stress

Motivation

• Nevin bugün menemen yemeli. (template)

N Z F V

Nevin menemen yemeli.

N F V

Bizim Nevin domatesli menemen yemeli.

P N A F V

• Nalan yarın ayna alıyor.

N Z F V

Nalan ayna alıyor.

N F V

Kardeşim Nalan yeni ayna alıyor.

N N A F V

F0 contour plots (template sentence / compared sentence):

• Nevin bugün menemen yemeli. / Nevin menemen yemeli.

• Nevin bugün menemen yemeli. / Bizim Nevin domatesli menemen yemeli.

• Nevin bugün menemen yemeli. / Nalan yarın ayna alıyor.

• Nevin bugün menemen yemeli. / Nalan ayna alıyor.

• Nevin bugün menemen yemeli. / Kardeşim Nalan yeni ayna alıyor.

Sentence Type Positive Negative

Declaratives 25 15

Wh-questions 10 5

Yes-no questions 10 5

Conditionals 6 4

Imperatives 6 4

Exclamations 6 4

Sentences

• 100 database sentences

• 19 close test sentences (add/remove categories)

• 18 random test sentences

• Syllable-based handlabeling

• Pitch extraction

Observations

• Pitch decrease at the end (terminal intonation)

• Division into phrases

• Pitch increase on the phrase-final syllable (progressive intonation)

Declaratives

Nevin/bugün/menemen yemeli.

Evvelki gün/ikimiz de/kuyumcu Ali’ye uğradık.

Observations

• Pitch increase on the last syllable (interrogative intonation)

• Evident pitch increase on the stressed syllable of the wh-word

• No division into phrases

• Word stress often disappears

Wh-questions

Dün neden zamanımı aldın?

Kimler yarın sınıf gezisine katılacaklar?

Observations

• Pitch decrease at the end

• Evident pitch increase on the stressed syllable of the word before -mi

• No division into phrases

• Word stress often disappears

Yes-no questions

Oraları yine eskisi gibi güzel mi?

Mudanya’da bu sene de çok yağmur yağıyor mu?

Observations

• Pitch decrease at the end (terminal intonation)

• Division into phrases

• Pitch increase on the phrase-final syllable (progressive intonation)

• -se always a phrase-final syllable

Conditionals

İnsan azimliyse herşeyi başarabilir.

Babam keyifsizse ona konuyu bu akşam anlatamam.

Observations

• Pitch decrease at the end (terminal intonation)

• Division into phrases

• Pitch increase on the phrase-final syllable (progressive intonation)

Imperatives

F0

Akşam yemeği için çarşıdan birşeyler alsınlar.

Sevgiyi ve mutluluğu yarınlara erteleme.

Observations

• Diverse

• Pitch decrease at the end (terminal intonation)

• Evident pitch increase on the stressed syllable of interjection or of another word

Exclamations

Aman büyüklerine bir saygısızlık yapma!

Haydi bugün hep birlikte pikniğe gidelim!

Local Observations

• At most single stressed syllable excluding phrase-final increase

• Stress within the sentence coincides with the word stress

• Phrase stress preserved

Ekonomik kriz / her kesimden insanı / olumsuz etkiledi.

Evvelki gün / ikimiz de / kuyumcu Ali’ye uğradık.

Local Observations

• Word stress may disappear

Beden sağlığımız için akşamları erken yatmalıyız.

Mehmet daha bugün ödevine başlayabildi.

• Word stress disappears at the end of positives (terminal intonation)

Nevin bugün menemen yemeli.

Merve evine zamanında dönemez.

• Sentence stress (stress on focus)

Nevin bugün menemen yemeli.

Mehmet daha bugün ödevine başlayabildi.

• Effects on neighbouring syllables

– Unstressed + stressed (ne+vin)

– Stressed + stressed (nevin+bu+gün)

Nevin bugün menemen yemeli.

– Stressed + stressed (Partiye+gelmeyeceğim)

Ben akşam partiye gelmeyeceğim.

– Stressed + unstressed (Gece+rüyasında)

Kardeşim beni dün gece rüyasında görmüş.

– Stressed + unstressed (ney+le)

Bu geç vakitte sizin eve neyle döneceğiz?

– Stressed + unstressed (last syllable, terminal intonation) (değil+di)

Akşamki yemek pek güzel değildi.

– Stressed + unstressed (last syllable, terminal intonation) (güzel+mi)

Oraları yine eskisi gibi güzel mi?

Methodology

• Choose best sentence from a sentence database

• Apply its pitch to the matching regions of input sentence

– Compression / Stretching

– Interpolation

• Fit data to remaining regions using interpolation

Overview

Read Files -> Choose Best Sentence -> Generate Regional Durations -> Apply Pitch

Methodology

• Input information used for sentences

– Sentence type (declarative, wh-question, yes-no question, conditional, imperative, exclamation)

– Sentence state (positive or negative)

– Categories of each word

– Number of syllables of each word

– The index of the syllable bearing word stress, for each word (stress in sentence coincides with word stress)

Read Files

Methodology

• Word categories rely mainly on part-of-speech (POS) categories:

Read Files

Category               Example           Gloss
noun                   elma              apple
adjective              güzel             beautiful
pronoun                biz               we
verb                   geliyorum         I’m coming
adverb                 akşamleyin        in the evening
postposition           kadar             as…as
conjunction            fakat             but
interjection           aman
wh-word                hangi             which
question suffix word   almış mı          did he take
conditional            iyiyse            if good
number                 beş               five
auxiliary              şikayet (etti)    (he complained)
component              Ali’nin           Ali’s
focus                  kitap (okuyor)    (he reads) book
comma                  (,)

Methodology

• Search in database to find the best sentence

• Search only the template sentences with the same type and state as the input sentence

• Two different approaches for

– Sentences other than questions

– Question sentences

Choose Best Sentence

• Calculate sentence resemblance scores based on word resemblance scores (WRS)

• Choose the template sentence having the maximum sentence resemblance score

Sentences other than Questions
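The selection step above can be sketched as a filter followed by an arg-max. This is a minimal illustration, not the author's implementation: the dictionary keys (`type`, `state`, `words`, `id`) and the `toy_resemblance` function are hypothetical placeholders; in the method described here the score would be the sentence resemblance score.

```python
def choose_template(input_sentence, database, resemblance):
    """Return the best-scoring template sharing type and state with the input.

    Returns None when no template has the same type and state.
    """
    candidates = [
        t for t in database
        if t["type"] == input_sentence["type"]
        and t["state"] == input_sentence["state"]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda t: resemblance(t, input_sentence))


def toy_resemblance(template, sentence):
    # Hypothetical stand-in score: prefer templates with a similar word count.
    return -abs(len(template["words"]) - len(sentence["words"]))


database = [
    {"id": "d1", "type": "declarative", "state": "positive",
     "words": ["Nevin", "bugün", "menemen", "yemeli"]},
    {"id": "d2", "type": "declarative", "state": "positive",
     "words": ["Nalan", "ayna", "alıyor"]},
    {"id": "q1", "type": "wh-question", "state": "positive",
     "words": ["Dün", "neden", "zamanımı", "aldın"]},
]
query = {"type": "declarative", "state": "positive",
         "words": ["Nevin", "menemen", "yemeli"]}

best = choose_template(query, database, toy_resemblance)
print(best["id"])
```

Passing the score as a function keeps the same selection code usable for both approaches (questions and non-questions), which differ only in how resemblance is computed.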

Word Resemblance Score (WRS)

• Measure of resemblance of two words

• Consists of

– Regional resemblance score (RRS) -> word stress information

– Category match score (CMS) -> word categories

WRS = RRS + CMS

Regional Resemblance Score (RRS)

• Makes use of the four regions defined for every word

– Region before the stressed syllable

– Stressed syllable

– Region after the stressed syllable

– Phrase-final syllable

• Measure of resemblance of any two words in terms of these regions

• Based on number of syllables in each region

• Consists of

– Score of existing regions (ERS)

– Score of lacking regions (LRS)

RRS = 0.9 x ERS + 0.1 x LRS

Calculation of ERS and LRS

score = ERS = LRS = 0   (initialization)
for all regions
    if the region exists in both words
        score = min( 1 , (NSRW1 / NSRW2) )
        ERS = ERS + score
    else
        if the region is lacking in both words
            LRS = LRS + 1
        else
            LRS = LRS - 1
        endif
    endif
endfor

ERS: score of existing regions
LRS: score of lacking regions
NSRW1: number of syllables in the related region for the first word
NSRW2: number of syllables in the related region for the second word

Example: calculation of WRS for the words İstanbul and Ankara:

ERS = 1/1 + 1/2 = 3/2
LRS = -1 + 1 = 0
RRS = 0.9 x 3/2 + 0.1 x 0 = 1.35
CMS = 3.7
WRS = 1.35 + 3.7 = 5.05

Category Match Score (CMS)

• Category match -> CMS

• CMS = 3.7 (maximum possible value of RRS)

Word Region 1 Region 2 Region 3 Region 4

Ankara - An kara -

İstanbul İs tan bul -
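The İstanbul / Ankara calculation can be reproduced with a short sketch of the ERS/LRS procedure. Each word is given as four syllable counts, one per region, with 0 meaning the region is absent. Two points are assumptions rather than statements from the slides: that CMS contributes 0 when the categories differ, and that both words are labelled `"noun"`. Function and variable names are illustrative.

```python
CMS = 3.7  # category match score, value taken from the slides


def regional_resemblance(word1, word2):
    """RRS of two words given as 4-tuples of syllable counts per region."""
    ers = 0.0  # score of existing regions
    lrs = 0.0  # score of lacking regions
    for n1, n2 in zip(word1, word2):
        if n1 > 0 and n2 > 0:
            # Region exists in both words.
            ers += min(1.0, n1 / n2)
        elif n1 == 0 and n2 == 0:
            # Region is lacking in both words.
            lrs += 1
        else:
            # Region exists in only one of the words.
            lrs -= 1
    return 0.9 * ers + 0.1 * lrs


def word_resemblance(word1, cat1, word2, cat2):
    """WRS = RRS + CMS; CMS is assumed to be 0 on a category mismatch."""
    cms = CMS if cat1 == cat2 else 0.0
    return regional_resemblance(word1, word2) + cms


# Regions: (before stress, stressed syllable, after stress, phrase-final)
istanbul = (1, 1, 1, 0)  # İs | tan | bul | -
ankara = (0, 1, 2, 0)    # -  | An  | kara | -

print(round(regional_resemblance(istanbul, ankara), 2))
print(round(word_resemblance(istanbul, "noun", ankara, "noun"), 2))
```

Because the ratio is NSRW1 / NSRW2 capped at 1, the score as written depends on which word is taken first: with Ankara first, the third region would score min(1, 2/1) = 1 instead of 1/2.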

Sentence Resemblance Score

• I1, I2, …,IN : words of the input sentence

• D1, D2, …,DM : words of the template sentence

• S : M x N score matrix with elements Si,j, where Si,j = WRS of the pair (Di, Ij)

• Path : (Da, Ib), (Dc, Id), …, (De, If) with 1 ≤ a < c < … < e ≤ M and 1 ≤ b < d < … < f ≤ N

• Score of the path : sum of WRS’s of its pairs

• TASK: Find the path with the maximum score (maximum score path)

• score of maximum score path = sentence resemblance score

• optimum combination of word pairings preserving order

EXAMPLE:

TEMPLATE: Geçen akşam hepimiz müziğin büyüsüne kapılmıştık.

INPUT: Büyük dayımız Kadıköy’deki evinde senelerdir yalnız oturuyor.

(akşam, Büyük), (müziğin, dayımız), (kapılmıştık, evinde): valid

(hepimiz, dayımız), (geçen, evinde), (büyüsüne, yalnız): invalid (template order is broken: geçen precedes hepimiz)

(akşam, evinde), (müziğin, dayımız), (kapılmıştık, oturuyor): invalid (input order is broken: dayımız precedes evinde)

(geçen, dayımız), (hepimiz, dayımız), (kapılmıştık, oturuyor): invalid (dayımız is paired twice)

Procedure

• MxN MPS : maximum path scores matrix

• MxNx2 CMPS : maximum path scores coordinates matrix

• MPSi,j : contains the score of the maximum score path beginning with the pair (Di, Ij)

• CMPSi,j,k : contains the indices of the next pair in the same path ( for example if the max score path of (Di, Ij) is (Di, Ij), (Dm, In), …, (Dp, Iq), then CMPSi,j,1 = m and CMPSi,j,2 = n )

• Recursive generation of MPS from itself and S

• CMPS generated from MPS

for i = M, M-1, … , 1
    for j = N, N-1, … , 1
        if (i = M) or (j = N)
            MPSi,j = Si,j
            CMPSi,j,1 = CMPSi,j,2 = EMPTY
        else
            MPSi,j = Si,j + value of the max element of { MPSp,q | i+1 ≤ p ≤ M and j+1 ≤ q ≤ N }
            CMPSi,j,1 = first index of the max element of { MPSp,q | i+1 ≤ p ≤ M and j+1 ≤ q ≤ N }
            CMPSi,j,2 = second index of the max element of { MPSp,q | i+1 ≤ p ≤ M and j+1 ≤ q ≤ N }
        endif
    endfor
endfor

Procedure

I1 I2 I3 I4 I5 I6 I7 I8 I9

D1 0 0 0 0 0 0 0 0 0

D2 0 0 1 0 0 0 0 0 0

D3 0 0 0 1 1 1 1 1 1

D4 0 0 0 1 1 1 1 1 1

D5 0 0 0 1 1 1 1 1 1

D6 0 0 0 1 1 1 1 1 1

Finding the maximum score path from MPS and CMPS

• Sentence resemblance score = maxi,j(MPSi,j), attained at some pair, say MPSa,b

• MPSa,b -> max score path begins with (Da, Ib)

• Look up CMPSa,b,1 and CMPSa,b,2 to obtain the second pair of the path

• If, for example, CMPSa,b,1 = c and CMPSa,b,2 = d -> (Dc, Id) is the second pair

• Similarly, look up CMPSc,d,1 and CMPSc,d,2 to obtain the third pair of the path, and so on

• Entire path is obtained
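The MPS / CMPS procedure and the path read-out can be sketched as follows. This is a direct, unoptimised rendering with 0-based indices; the 3 x 3 score matrix is made-up illustration data, not taken from the corpus, and `None` plays the role of EMPTY.

```python
def max_score_path(S):
    """Return (sentence resemblance score, path) for score matrix S.

    S[i][j] is the WRS of template word i and input word j. The path is a
    list of (i, j) pairs with both indices strictly increasing.
    """
    M, N = len(S), len(S[0])
    MPS = [[0.0] * N for _ in range(M)]    # max path score starting at (i, j)
    CMPS = [[None] * N for _ in range(M)]  # next pair on that path

    for i in range(M - 1, -1, -1):
        for j in range(N - 1, -1, -1):
            MPS[i][j] = S[i][j]
            # Best continuation among pairs strictly below and to the right.
            best = None
            for p in range(i + 1, M):
                for q in range(j + 1, N):
                    if best is None or MPS[p][q] > MPS[best[0]][best[1]]:
                        best = (p, q)
            if best is not None:  # None on the last row or last column
                MPS[i][j] += MPS[best[0]][best[1]]
                CMPS[i][j] = best

    # The path starts at the overall maximum of MPS.
    start = max(((i, j) for i in range(M) for j in range(N)),
                key=lambda ij: MPS[ij[0]][ij[1]])
    score = MPS[start[0]][start[1]]

    # Follow the stored coordinates until EMPTY (None) is reached.
    path, current = [], start
    while current is not None:
        path.append(current)
        current = CMPS[current[0]][current[1]]
    return score, path


S = [
    [1, 5, 0],
    [4, 1, 2],
    [0, 3, 6],
]
score, path = max_score_path(S)
print(score, path)
```

As in the pseudocode, a path is always extended when a continuation exists, which is harmless as long as word scores are non-negative. The nested search costs O(M²N²), acceptable for sentence-length inputs.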

We obtained answers to the following questions:

• What is the max resemblance capacity of the template sentence to the input sentence?

– Answer: sentence resemblance score (score of the max score path)

• How to arrive at this max capacity, i.e. how to match the words and choose the pairs?

– Answer: as in max score path

• Pitch curve of a question <-> pitch curve of a word

• Whole question regarded as a word

• Use the same regions defined for words

– Region before the stressed syllable

– Stressed syllable (stressed syllable of the wh-word or question suffix word)

– Region after the stressed syllable

– Phrase-final syllable (exists for wh-questions)

• Use the same procedure that assigns RRS to words to assign a sentence resemblance score to questions

Question Sentences

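EXAMPLE

Sentences:

Ayşe bugün evde hangi yemeği yaptı?

Bu su sesi yukarıdan mı geliyor?

Regions:

Sentence                               Region 1            Region 2   Region 3        Region 4
Ayşe bugün evde hangi yemeği yaptı?    Ayşe bugün evde     han        gi yemeği yap   tı
Bu su sesi yukarıdan mı geliyor?       Bu su sesi yukarı   dan        mı geliyor      -

Number of syllables in each region:

Sentence                               Region 1   Region 2   Region 3   Region 4
Ayşe bugün evde hangi yemeği yaptı?    6          1          5          1
Bu su sesi yukarıdan mı geliyor?       7          1          4          0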
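The region counts in this example can be checked with a small sketch that counts vowels, since every Turkish syllable contains exactly one vowel. The second part scores the wh-question against another wh-question using the word-level RRS procedure; the counts `(4, 1, 6, 1)` for that second question are hypothetical, chosen only to illustrate the call.

```python
# Turkish vowels, including circumflexed forms, in both cases.
VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def syllable_count(text):
    """Number of syllables in a Turkish string (one vowel per syllable)."""
    return sum(ch in VOWELS for ch in text)


def regional_resemblance(counts1, counts2):
    """RRS = 0.9 x ERS + 0.1 x LRS over four region syllable counts."""
    ers = lrs = 0.0
    for n1, n2 in zip(counts1, counts2):
        if n1 > 0 and n2 > 0:
            ers += min(1.0, n1 / n2)
        elif n1 == 0 and n2 == 0:
            lrs += 1
        else:
            lrs -= 1
    return 0.9 * ers + 0.1 * lrs


wh_regions = ["Ayşe bugün evde", "han", "gi yemeği yap", "tı"]
yn_regions = ["Bu su sesi yukarı", "dan", "mı geliyor", ""]

wh_counts = [syllable_count(r) for r in wh_regions]
yn_counts = [syllable_count(r) for r in yn_regions]
print(wh_counts, yn_counts)

# Hypothetical template wh-question, given directly as region counts.
other_wh_counts = (4, 1, 6, 1)
print(round(regional_resemblance(wh_counts, other_wh_counts), 2))
```

Only questions of the same type and state are compared in the method, which is why the yes-no example is counted here but not scored against the wh-question.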

Methodology

• Region -> one or more syllables

• Inputs (related to input and template sentences):

– The label files

– The number of syllables for each word

– The index of the syllable bearing word stress, for each word

– The information whether the last syllable shows a pitch rise or not, for each word (conditional, wh-question)

• Assumes a perfect duration analysis for the input sentence (label file of input sentence)

• Determines the durations of each region: the onset and end, for each word in both sentences

Generate Regional Durations

Methodology

• Inputs:

– Regional durations generated by the previous block

– Pitch contour of the template sentence

– The max score path pertaining to the input and template sentences

• For all pairs of the path, the pitch of the template sentence is applied to the input sentence, for the regions existing in both elements of a pair

• Usage of spline interpolation:

– Stretching / compression in time

– Data fitting for nonexisting regions

Apply Pitch

Improvements

• Problem:

unvoiced regions of template sentence + spline -> distortions

• Example:

– Input: Yıldızlar dünyadan gündüz görülmez

– Template: Zamanımı televizyonun karşısında boş yere harcayamam

• Path: (zamanımı, yıldızlar), (karşısında, dünyadan), (yere, gündüz), (harcayamam, görülmez)

• Problematic pairs: (karşısında, dünyadan) and (yere, gündüz) – unvoiced regions in karşısında (/k/, /ş/ and /s/) and yere

• Solution: discard zero samples (unvoiced) and then apply

Discarding Unvoiced Regions
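A minimal sketch of this idea: drop the zero (unvoiced) samples of a template region, then stretch or compress what remains onto the number of frames of the matching input region. Piecewise-linear interpolation is used here as a dependency-free stand-in for the spline used in the slides, and positions outside the voiced samples are held at the nearest voiced value. Function names and the sample values are illustrative.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, holding the end values outside xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            ratio = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + ratio * (ys[k + 1] - ys[k])


def apply_template_pitch(template_f0, target_len):
    """Map the voiced F0 samples of a template region onto target_len frames."""
    n = len(template_f0)
    # Keep only voiced samples; zeros mark unvoiced frames.
    voiced = [(k, f) for k, f in enumerate(template_f0) if f > 0]
    if not voiced or target_len <= 0:
        return [0.0] * max(target_len, 0)
    # Normalise time to [0, 1] so regions of different duration line up.
    xs = [k / (n - 1) if n > 1 else 0.0 for k, _ in voiced]
    ys = [f for _, f in voiced]
    result = []
    for t in range(target_len):
        x = t / (target_len - 1) if target_len > 1 else 0.0
        result.append(interpolate(x, xs, ys))
    return result


template_region = [0, 100, 0, 0, 120, 0]  # Hz, zeros are unvoiced frames
contour = apply_template_pitch(template_region, 5)
print([round(v, 1) for v in contour])
```

Interpolating through the zeros instead would pull the contour towards 0 Hz around every unvoiced consonant, which is the distortion described above.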

F0

F0

Yıldızlar dünyadan gündüz görülmez.

Improvements

• Problem: poor performance of spline outside the borders of data points to be interpolated

• Example:

– Input: Didem her akşam odasında günlük gazeteleri okur

– Template: Annem bize her zaman çok lezzetli yemekler pişirir

• Problematic pairs: (annem, didem) and (pişirir, okur)

• Solution: applying the value of the outermost data point to the whole region, if the region goes beyond this data point

Word      Region 1   Region 2   Region 3   Region 4
didem     di         dem        -          -
annem     -          an         nem        -
okur      o          kur        -          -
pişirir   pişi       rir        -          -

F0

F0

Didem her akşam odasında günlük gazeteleri okur.

Improvements

• Problem: spline sometimes yields unsatisfactory results within the data points

• Example:

– Input: Çocuklar yazın güneşin altında fazla kalmamalı.

• Problematic region: /zın/ of yazın generated by spline

F0

Çocuklar yazın güneşin altında fazla kalmamalı.

Improvements

• Solution: check spline; spline -> linear interpolation when necessary

– Spline check: linear regression line, upper threshold and lower threshold lines for the pitch of template sentence

• If spline exceeds the threshold lines: spline -> linear

Linear regression and the two threshold lines.
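The spline check can be sketched as a band test around a least-squares line fitted to the voiced template pitch. The slides do not state how far the threshold lines lie from the regression line, so `margin` is a hypothetical parameter; the spline and linear contours are assumed to have been computed elsewhere and are passed in as plain lists.

```python
def fit_line(times, f0_values):
    """Least-squares line through (time, F0) points; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_values))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t


def within_band(times, values, slope, intercept, margin):
    """True if every value lies between the lower and upper threshold lines."""
    return all(abs(v - (slope * t + intercept)) <= margin
               for t, v in zip(times, values))


def choose_contour(times, spline_values, linear_values, slope, intercept, margin):
    """Keep the spline result unless it leaves the band; then fall back to linear."""
    if within_band(times, spline_values, slope, intercept, margin):
        return spline_values
    return linear_values


# Voiced template pitch (Hz) with a gentle declination; illustrative values.
template_times = [0, 1, 2, 3]
template_f0 = [200, 190, 180, 170]
slope, intercept = fit_line(template_times, template_f0)

region_times = [0, 1, 2]
overshooting_spline = [200, 260, 180]  # leaves the band at t = 1
linear_fallback = [200, 190, 180]
chosen = choose_contour(region_times, overshooting_spline, linear_fallback,
                        slope, intercept, margin=30)
print(chosen)
```

Fitting the line to the template pitch rather than to the generated contour means the band follows the template's overall declination.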

F0

F0

Çocuklar yazın güneşin altında fazla kalmamalı.

Discussion

• Good, as expected, since the template is chosen from sentences of the same type and state

• microprosody degrades performance (unvoiced regions of input sentence unknown)

Performance at sentence ends

F0

Kuzenim Nalan Oya’ya yarın alıyor.

F0

Mars’ta hayat var mıdır?

Discussion

• erroneous endings (increase instead of decrease) due to template pitch

Performance at sentence ends

F0

F0

Discussion

limited since

• the method is confined to

– the capacity of the database (same type, state)

– the capacity of the template sentence

• prosodic boundaries (yazın) and accented syllables unknown

Performance at movements (rises and falls)

F0

Çocuklar yazın güneşin altında fazla kalmamalı.

Discussion

limited since

• the slope of the rise or fall may differ in input and template sentences (bizim)

Performance at movements (rises and falls)

F0

Bizim Nevin domatesli menemen yemeli.

Discussion

limited since

• there may be an absolute difference between pitch values of both sentences (gündüz)

Performance at movements (rises and falls)

F0

Yıldızlar genellikle gündüz görülmez.

Discussion

limited since

• microprosodic effects (kardeşim)

Performance at movements (rises and falls)

F0

Kardeşim Nalan yeni ayna alıyor.

Discussion

limited since

• effects of rises and falls on neighbouring syllables are handled partially (only within words)

Example:

Input: Merve bu sefer zamanında dönemez

Template: Akşamki yemek pek güzel değildi

Merve from yemek (/ye/ of yemek affected by /ki/ of akşamki)

Performance at movements (rises and falls)

Word    Region 1   Region 2   Region 3   Region 4
Merve   Mer        ve         -          -
yemek   ye         mek        -          -

F0

Akşamki yemek pek güzel değildi.

Merve bu sefer zamanında dönemez.

Discussion

• High success on questions, due to the simple nature of their pitch contours:

Performance at questions

F0

Niçin sorularıma cevap vermiyorsun?

Önce nereye bilgi verilmeli?

Ona bu güzel kolyeyi satın almayacak mısın?

Discussion

• Pitch -> speech melody, human perception -> ST scale

• distance d in ST between two frequencies f1 and f2 is given as:

d = 12 x log2 (f1 / f2)

• metrics

– mean squared distance between original and synthesized pitch in ST

– proportion of distances < 2 ST

• compare with baseline solution constructed as:

– 6 types x 2 states -> 12 groups of DB sentences

– for each sentence -> median of nonzero pitch

– average of the medians of the sentences in each group -> 12 baselines

Objective Evaluation
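The two metrics and the baseline construction can be sketched as below. One detail is an assumption: frames where either contour is zero (unvoiced) are skipped, since the slides do not say how such frames are treated. Function names and the sample pitch values are illustrative.

```python
import math
from statistics import median


def semitone_distance(f1, f2):
    """d = 12 x log2(f1 / f2), the distance in semitones between two frequencies."""
    return 12 * math.log2(f1 / f2)


def evaluate(original, synthesized, limit=2.0):
    """Return (mean squared distance in ST, proportion of distances < limit ST)."""
    # Assumption: compare only frames that are voiced in both contours.
    pairs = [(o, s) for o, s in zip(original, synthesized) if o > 0 and s > 0]
    distances = [semitone_distance(s, o) for o, s in pairs]
    mean_squared = sum(d * d for d in distances) / len(distances)
    proportion = sum(abs(d) < limit for d in distances) / len(distances)
    return mean_squared, proportion


def baseline_pitch(group):
    """Baseline for one type/state group: mean of per-sentence medians of nonzero pitch."""
    medians = [median([f for f in sentence if f > 0]) for sentence in group]
    return sum(medians) / len(medians)


print(semitone_distance(200, 100))  # one octave

msd, prop = evaluate([100, 100, 0], [200, 100, 150])
print(msd, prop)

group = [[0, 100, 110, 120], [200, 0, 220]]
print(baseline_pitch(group))
```

The baseline is a single flat pitch value per group, so it would be scored with the same `evaluate` call by repeating that value over the length of the original contour.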

                        Average mean square distance in ST    Average proportion of distance < 2 ST
Sentence domain         Method    Baseline   p                Method    Baseline   p
Close test sentences    4.6514    10.5670    2.2160 x 10^-5   0.6573    0.4682     0.0043
Random test sentences   6.9741    8.7683     0.2128           0.6016    0.5928     0.8616
All sentences           5.7814    9.6920     9.6801 x 10^-5   0.6302    0.5288     0.0160
All questions           4.3090    9.9181     0.0026           0.7084    0.4547     0.0081

Discussion

Objective Evaluation

Discussion

Objective Evaluation

Number of sentences for which each approach is better:

                        Mean square distance in ST          Proportion of distance < 2 ST
Sentence domain         Method better   Baseline better     Method better   Baseline better
Close test sentences    15              4                   14              5
Random test sentences   14              4                   11              7
All sentences           29              8                   25              12
All questions           10              1                   10              1

Discussion

Objective Evaluation

Results

• Method better than baseline in general

• Performance at close test sentences > Performance at random test sentences

• best results in questions

• similar results in both metrics

• ANOVA (analysis of variance)

– p = the probability of observing a difference at least this large if the means of the two methods were equal

– p < 0.10, 0.05 or 0.01 -> the difference between the averages is statistically significant at that level

Conclusion

• Intonation and stress -> fundamental frequency

• Analysis of pitch contours

• Method based on syntactic structure in terms of word categories and word stress information

• Automatic generation of these inputs from text is relatively easy.

• Makes use of

– a sentence database (corpus of natural speech)

– interpolation

• Recordings of a single speaker

Future Work

• Inclusion of other speakers

• A further categorization of words instead of POS categories -> subcategories -> more complex syntactic structures -> larger database for efficiency

• Other inputs, and their automatic generation from input text (prosodic description):

– prosodic boundaries

– accented syllables

• Handling microprosody