DFG Project BA 737/10: "Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited."


B. Andreeva & W. Barry

Jacques Koreman

Outline

• Research questions
• Recordings
• Measurements
• Statistical analysis
• Results
• Discussion
• Conclusions and Outlook

Research questions

How do different languages exploit the universal, psycho-acoustically determined means of modifying the prominence of words in an utterance?
• duration
• fundamental frequency
• energy
• spectral properties

Do the different word-phonological requirements of a language affect the degree to which these properties are exploited?
• duration (length opposition; word stress)
• fundamental frequency (tonal word accent)
• spectral properties (phonologized vowel reduction)

Do speakers of a language vary in the strategies they adopt (for production and for perception)?

For further clarification

We have NOT investigated "word stress / word accent" …

… but rather the change in a given word as a result of making it more or less informationally prominent in the utterance.

E.g., the loss of the length distinction in the [o] in German Philosophie vs. Philosoph

or the vowel quality alternation between [ɒ] and [ə] in English philosopher and philosophical

is not the focus of our investigation (though it may have a bearing on our interpretation of the results).

Phrasal (de-)accentuation

Accentuation (phonological) can increase (phonetic) prominence

…. by lengthening,

…. by increasing loudness,

…. by changing the pitch,

and combinations thereof.

De-accentuation can reduce prominence

…. by shortening (including segment elision),

…. by decreasing loudness,

…. by avoiding pitch changes,

…. by reducing spectral distinctiveness.

These properties determine the "rhythm type".

The link to ‘rhythm’?

Speech rhythm (as a regular syllable-based or foot-based "beat") is an appealing myth …

Though we do have a very fine sense of the appropriate temporal patterning of any particular utterance (in any particular situation) …

… in fact we decode it in terms of information weight.

Structural differences between languages are important …

… because they determine the temporal patterns, and they may constrain how words are made prominent.

'Rhythm' = utterance-dependent prominence pattern (not only determined by duration)

Principle of our approach

Comparable production task across languages (different degrees of accentuation on the same words, obtained by eliciting different focus conditions for the same sentence)

Material and elicitation

Short sentences were constructed containing two one- or two-syllable "critical words" (CWs), one early (but not initial) and one late (but not final) in the sentence.

+ iterative versions ([dada]) to support comparisons across languages

Question: Was sagst du? ('What are you saying?') (broad)
Response: Der Mann fuhr den Wagen vor. ('The man drove the car up.')

Question: Wer fuhr den Wagen vor? ('Who drove the car up?') (narrow early)
Response: Der MANN fuhr den Wagen vor.

Question: Was fuhr der Mann vor? ('What did the man drive up?') (narrow late)
Response: Der Mann fuhr den WAGEN vor.

Question: Die DAME fuhr den Wagen vor? ('The LADY drove the car up?') (narrow contrastive early)
Response: Der MANN fuhr den Wagen vor.

Question: Der Mann fuhr die KLAGEN vor? ('The man drove the COMPLAINTS up?') (narrow contrastive late)
Response: Der Mann fuhr den WAGEN vor.

The questions were pre-recorded to accompany a PowerPoint presentation of the responses.

German example (comparable in BG, F, N, RUS)

text dada

Levels of prominence

Narrow early: CW1 (+ stress, + acc., + nucl., + narrow); CW2 (+ stress, - acc., - nucl., + narrow)

Broad: CW1 (+ stress, + acc., - nucl., - narrow); CW2 (+ stress, + acc., + nucl., - narrow)

Narrow late: CW1 (+ stress, - acc., - nucl., + narrow); CW2 (+ stress, + acc., + nucl., + narrow)

Breakdown of analysis

Material:
• 6 sentences
• 6 repetitions
• 3 focus conditions (broad, narrow, narrow contrastive)
• 2 sentence positions (early, late)
• 2 realisational variants (lexical, delexicalised iterative)

Language: Bulgarian, French, German, Norwegian, Russian

Speakers: 6 regionally homogeneous speakers (3 m, 3 f) per language (Sofia, northern standard French, Saarland, south-east Norway, Moscow area)

Analysis total per language: 2160 utterances
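The per-language total can be checked by multiplying out the design. A minimal sketch, assuming that the 3 focus conditions × 2 sentence positions reduce to the 5 question types of the elicitation example (broad focus has no early/late variant); with 5 question types the product matches the stated 2160 utterances, whereas 6 would give 2592.

```python
# Design factors from the breakdown above. Collapsing 3 focus conditions x
# 2 positions into 5 question types is an assumption (broad focus is
# elicited once, covering both critical words).
sentences = 6
repetitions = 6
question_types = 5  # broad, narrow early, narrow late, contr. early, contr. late
variants = 2        # lexical and delexicalised iterative ([dada])
speakers = 6

per_speaker = sentences * repetitions * question_types * variants
per_language = per_speaker * speakers
print(per_speaker, per_language)  # 360 2160
```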

Measurements

Duration: duration (ms) of stressed vowels, stressed syllables, CWs and feet

F0: mean F0 across the stressed vowel of the CW; F0 change (comparison of the stressed vowel in the CW with the preceding/following vowels)

Energy: intensity (dB) of the stressed vowel in the CW; spectral balance = difference between the 70-1000 Hz band and the 1200-5000 Hz band in the stressed vowel of the CW

Normalized: relative to the mean across corresponding units in the sentence

Spectral definition: F1–F3 at the middle of the stressed nucleus of the CW
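The measurement definitions can be sketched in code. This is a minimal illustration under stated assumptions, not the project's own procedure: the Hann window and plain DFT used for the band energies, the RMS-based intensity, the semitone formulation of F0 change (CW vowel against the mean of its neighbours) and the ratio-based normalisation (suited to durations; for dB values a difference would be the analogue) are our assumptions, and all function names are hypothetical. Only the band limits (70-1000 Hz vs. 1200-5000 Hz) come from the slide.

```python
import cmath
import math


def band_energy(samples, fs, lo_hz, hi_hz):
    """Summed squared DFT magnitude of the bins inside [lo_hz, hi_hz].

    Assumption: Hann window + plain O(n^2) DFT, adequate for one short vowel.
    """
    n = len(samples)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    total = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * fs / n <= hi_hz:
            coeff = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                        for i, w in enumerate(windowed))
            total += abs(coeff) ** 2
    return total


def spectral_balance_db(samples, fs, eps=1e-12):
    """Level of the 70-1000 Hz band minus level of the 1200-5000 Hz band (dB)."""
    low = band_energy(samples, fs, 70, 1000)
    high = band_energy(samples, fs, 1200, 5000)
    return 10 * math.log10((low + eps) / (high + eps))


def intensity_db(samples, ref=1.0, eps=1e-12):
    """RMS level in dB relative to an arbitrary reference amplitude."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / ref + eps)


def f0_change_st(cw_f0, neighbour_f0s):
    """Hypothetical F0 change: semitones between CW vowel F0 and neighbours' mean."""
    return 12 * math.log2(cw_f0 / (sum(neighbour_f0s) / len(neighbour_f0s)))


def normalise_to_sentence(value, sentence_values):
    """Value as a ratio to the mean of the corresponding units in the sentence."""
    return value / (sum(sentence_values) / len(sentence_values))


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    tone = [math.sin(2 * math.pi * 500 * i / fs) for i in range(400)]  # 25 ms test tone
    print(round(spectral_balance_db(tone, fs), 1))
```

A positive balance means more energy in the low band; a flatter spectrum (as often claimed for accented vowels) would push the value down.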

Statistical analysis

Repeated Measures ANOVA (one within-subject and one between-subject factor) per parameter, for CW1 and CW2 separately

with dependent variables: duration (syllable, onset, vowel); F0 mean, F0 change; intensity, spectral tilt; F1, F2, F3

with within-subject variable: - prominence (broad, early narrow, late narrow, contr. early narrow, contr. late narrow)

with between-subject variable:- language (BG, D, F, N, RUS)

To see whether the prominence categories are realised differently across languages
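To show what a repeated measures ANOVA computes, here is a minimal sketch of the one-way within-subject case, i.e. prominence as the only factor within a single language. It is a simplification: the analysis above also has language as a between-subject factor, the sketch applies no sphericity correction, and it returns the F ratio without a p-value (which would need the F distribution, e.g. `scipy.stats.f.sf`). The function name and data layout are hypothetical.

```python
def rm_anova_oneway(data):
    """One-way repeated measures ANOVA.

    data: one row per speaker, one column per prominence condition.
    Returns (F, df_conditions, df_error, partial_eta_squared).
    """
    n = len(data)     # speakers
    k = len(data[0])  # prominence conditions
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Residual = condition x speaker interaction, the error term for the within factor
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error, ss_cond / (ss_cond + ss_error)


toy = [[1, 2, 4], [2, 3, 3], [3, 5, 6]]  # invented numbers: 3 speakers x 3 conditions
print(rm_anova_oneway(toy))
```

For this invented matrix the F ratio works out to 333/36 = 9.25 with 2 and 4 degrees of freedom; removing the speaker variance (SS_subjects) from the error term is what distinguishes the repeated measures design from an ordinary between-groups ANOVA.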

Statistical analysis (cont.)

Multivariate ANOVAs (MANOVAs) per language, for CW1 and CW2 separately

with dependent variables: duration (syllable, onset, vowel); F0 mean, F0 change; intensity, spectral tilt; F1, F2, F3

with independent variable:- prominence (broad, early narrow, late narrow, contr. early narrow, contr. late narrow)

To evaluate which parameters are used to distinguish prominence categories in the five languages

Main effects for language; language × prominence interaction

Parameter CW1 CW2

syllable dur., onset dur., vowel dur.

n.s.

n.s.

F0 mean, F0 change

n.s.

intensity, spect. tilt

F1, F2, F3

n.s.

n.s.

n.s.

n.s.

Parameter CW1 CW2

syllable dur., onset dur., vowel dur.

F0 mean, F0 change

n.s.

n.s.

intensity, spect. tilt

n.s.

n.s.

F1, F2, F3

n.s.

n.s.

n.s.

Results: ANOVA with repeated measures

Languages use the acoustic carriers of prominence to different degrees:

* Results given here for CW1 but similar patterns for CW2

η² values are the ratio of the variance due to the conditions (prominence) to the total variance, and thus indicate the proportion of the total variance explained by the focus conditions.
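The η² definition can be written as η² = SS_prominence / SS_total. A minimal sketch using a speaker-by-condition table (hypothetical function name and layout). Note that many statistics packages report partial η², SS_prominence / (SS_prominence + SS_error), for repeated measures designs; the slides do not say which variant is plotted.

```python
def eta_squared(data):
    """Share of total variance explained by the condition factor.

    data: one row per speaker, one column per prominence condition.
    Returns SS_conditions / SS_total.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    return ss_cond / ss_total


toy = [[1, 2, 4], [2, 3, 3], [3, 5, 6]]  # invented numbers: 3 speakers x 3 conditions
print(round(eta_squared(toy), 3))
```

For this invented table η² is 666/1584 ≈ 0.42, whereas partial η² would be 666/810 ≈ 0.82, so the two variants are not interchangeable when comparing values across studies.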

[Figure: η² values for prominence (scale 0 to 1), one panel per language (BG, D, F, N, RUS), for syllable, onset and vowel duration, F1, F2, F3, intensity, spectral tilt, F0 change and F0 mean]

Results: Duration

Syllable duration range from accented to deaccented (from [dada] recordings):

CS1: N > F > D ~ RUS > BG (49%, 30%, 25%, 24%, 15%)

CS2: N > F > RUS > D ~ BG (55%, 37%, 26%, 19%, 16%)

Note: no apparent connection between a vowel-length opposition and the use of duration for accentuation (compare N and D vs. F, RUS and BG)
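The slides do not state the formula behind the percentage ranges. A minimal sketch, assuming the range is the accented-minus-deaccented difference expressed as a percentage of the accented value (dividing by the deaccented value would be an equally plausible choice and gives larger numbers); `percent_range` is a hypothetical name. The ranking step only orders the reported CS1 values; the "~" judgements (e.g. D ~ RUS) presumably rest on more than the percentages and are not reproduced.

```python
def percent_range(accented, deaccented):
    """Reduction from accented to deaccented, in % of the accented value (assumed base)."""
    return 100.0 * (accented - deaccented) / accented


# CS1 syllable-duration ranges (%) as reported above
cs1 = {"N": 49, "F": 30, "D": 25, "RUS": 24, "BG": 15}
ranking = sorted(cs1, key=cs1.get, reverse=True)
print(" > ".join(ranking))  # N > F > D > RUS > BG
```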

Results: Duration

CW1
BG: nc_late < c_early
D: c_late < c_early
F: late, br < br, early
N: late, br < early
RUS: c_late < nc_early

CW2
BG: early < c_late
D: early, br < late
F: early, br < late
N: early < br < late
RUS: early, br < late

(c = contrastive, nc = non-contrastive, br = broad)

[Figure: values per condition: c. late, nc. late, broad, c. early, nc. early]

Results: F0 range

F0 range in % from accented to deaccented (from [dada] recordings):

CS1: F > D > BG ~ RUS ~ N (29%, 23%, 18%, 14%, 13%)

CS2: F ~ D > BG > RUS > N (28%, 27%, 23%, 16%, 7%)

These values have no systematic link to pitch-accent categories, but note Norwegian (lexical tones), with the smallest F0 range for both CS1 and CS2.

Results: F0 change

CW1
BG: -
D: late, br < early
F: late, br < early
N: late < br < early
RUS: late < c_early

CW2
BG: early, br < c_early, c_late < br, late
D: early < br < late
F: early, br < late
N: -
RUS: early, br < br, late


Results: Intensity

Intensity range in dB from deaccented to accented (from [dada] recordings):

CS1: BG > F > D = RUS > N (5.6, 3.4, 2.9, 2.9, 1.5 dB)

CS2: BG > F ~ D > RUS > N (6.1, 5.7, 5.3, 3.9, 2.7 dB)

Note: Larger intensity range for CS2 than CS1 due to greater post-nuclear than pre-nuclear de-accenting.
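To make the dB ranges more tangible, they can be converted into intensity ratios: a level difference of L dB corresponds to an intensity (power) ratio of 10^(L/10), or a pressure-amplitude ratio of 10^(L/20). The sketch applies this standard conversion to the values reported on this slide (`db_to_intensity_ratio` is a hypothetical name) and checks the note above: the CS2 range exceeds the CS1 range in all five languages.

```python
def db_to_intensity_ratio(db):
    """Convert a level difference in dB to a ratio of acoustic intensities (power)."""
    return 10 ** (db / 10)


# Intensity ranges (dB) as reported above
ranges_db = {
    "CS1": {"BG": 5.6, "F": 3.4, "D": 2.9, "RUS": 2.9, "N": 1.5},
    "CS2": {"BG": 6.1, "F": 5.7, "D": 5.3, "RUS": 3.9, "N": 2.7},
}

for position, by_lang in ranges_db.items():
    for lang, db in by_lang.items():
        print(f"{position} {lang}: {db} dB = intensity ratio {db_to_intensity_ratio(db):.2f}")

# CS2 range larger than CS1 range for every language
print(all(ranges_db["CS2"][lang] > ranges_db["CS1"][lang] for lang in ranges_db["CS1"]))  # True
```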

Results: Intensity

CW1
BG: late, br < early
D: late < br < early
F: late < br < early
N: late, br < early
RUS: late, br < br, nc_early < early

CW2
BG: early, br < late
D: early < br < late
F: early < br < late
N: early, br < br, late
RUS: early, br < late


Perception tests

Different values in the production analysis imply differential perceptual judgements …

… therefore pairwise presentation of different conditions (broad, contrastive early, contrastive late, non-contrastive early, non-contrastive late)

Continuous prominence values are preferable for statistical treatment …

… therefore non-categorical judgements (using a graphic interface)

• A mouse click plays the two versions in sequence
• The sequence may be played as often as required
• Both orders are presented during the course of the experiment

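[Figure: listening-test interfaces for the 1st and for the 2nd critical word. Each screen shows the sentence "Der Mann fuhr den Wagen vor." ('The man drove the car up.'), the labels Erster Satz ('first sentence') and Zweiter Satz ('second sentence'), and a rating scale labelled 1. stärker ('1st stronger'), beide gleich stark ('both equally strong') and 2. stärker ('2nd stronger').]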
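The pairwise design and the continuous judgements can be sketched as follows. Assumptions, all hypothetical: trials are all ordered pairs of the five conditions (so that both orders occur, as stated above), shuffled per listener, and the graphic scale is read as a slider position mapped linearly onto -1 (first sentence stronger) through 0 (both equally strong) to +1 (second sentence stronger).

```python
import itertools
import random

# Condition labels from the slide; trial shuffling and slider coding are hypothetical.
CONDITIONS = [
    "broad",
    "contrastive early",
    "contrastive late",
    "non-contrastive early",
    "non-contrastive late",
]


def pair_trials(conditions, seed=None):
    """All ordered pairs of distinct conditions (A-B and B-A), in random order."""
    trials = list(itertools.permutations(conditions, 2))
    random.Random(seed).shuffle(trials)
    return trials


def slider_to_score(position, width):
    """Map a slider position in [0, width] to a continuous score in [-1, 1].

    -1 = first sentence stronger, 0 = both equally strong, +1 = second stronger.
    """
    return 2.0 * position / width - 1.0


trials = pair_trials(CONDITIONS, seed=1)
print(len(trials))  # 20 = 5 * 4 ordered pairs
```

Coding the judgement as a signed score keeps the order of presentation recoverable, so order effects can be tested by comparing A-B with B-A trials.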

Perception tests (cont.)

Signal manipulation: change one parameter at a time to the value of the opposite prominence status (accented → unaccented and vice versa)

Problems: the parameters are not totally independent; durational change affects F0 contours
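The one-parameter-at-a-time manipulation can be expressed as a set of target specifications. A minimal sketch with hypothetical names and invented illustrative values: each stimulus copies the source token and replaces exactly one parameter with the value from the opposite prominence status. The actual resynthesis (for example PSOLA manipulation in Praat) is not shown, and treating each parameter as a single number ignores the problem noted above: when duration is changed, the F0 contour has to be time-scaled along with it.

```python
PARAMETERS = ("f0", "duration", "intensity")


def one_parameter_swaps(source, opposite):
    """One stimulus spec per parameter: that parameter takes the value of the
    opposite prominence status, all other parameters stay as in the source."""
    stimuli = {}
    for name in PARAMETERS:
        spec = dict(source)  # copy so the source token is left untouched
        spec[name] = opposite[name]
        stimuli[name] = spec
    return stimuli


# Illustrative values only (not measurements from the study)
accented = {"f0": 220.0, "duration": 0.25, "intensity": 72.0}
deaccented = {"f0": 180.0, "duration": 0.18, "intensity": 68.0}

for name, spec in one_parameter_swaps(accented, deaccented).items():
    print(name, spec)
```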

Results

Pairwise comparison of natural stimuli: the subjects are well able to distinguish the different levels of prominence.

Perception with parameter-manipulated stimuli: F0 > Duration > Intensity

(Russian subjects are slightly more sensitive to Intensity)

Discussion

Isačenko & Schädlich (1966) and Fry (1958) found the same hierarchy in their perception experiments

but Kochanski et al. (2005) and Tamburini & Wagner (2007) found loudness/intensity to be the main predictor of "prominence" in their production analyses

N.B. Fry and Isačenko & Schädlich worked exclusively with lexical stress;

Kochanski et al. and T&W combined lexical stress and phrasal prominence and worked only on production

Our results (η² values) show a similar importance of intensity in production, but the perception work supports the conclusions of Fry and of Isačenko & Schädlich!

Conclusions and Outlook

The languages differ in the degree to which they exploit duration, F0 and intensity in production, and to some extent in perception

The differences (in production and perception) are not directly linked to structural differences between the languages

None of the results support the "mythological" rhythm typology of stress-timed vs. syllable-timed languages

The complex picture of language differences in production contrasts with an apparently universal perceptual hierarchy (F0 > Duration > Intensity)

All previous rhythm-typology work has concentrated solely on duration. Natural communication combines intonation and segmental structure within an information-structural framework.

Languages will therefore differ rhythmically as a product of duration AND F0 and rhythm measures need to reflect this.
