DFG Project BA 737/10: "Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited."
B. Andreeva & W. Barry
Jacques Koreman
Outline
• Research questions
• Recordings
• Measurements
• Statistical analysis
• Results
• Discussion
• Conclusions and Outlook
Research questions
How do different languages exploit the universal, psycho-acoustically determined means of modifying the prominence of words in an utterance?
• duration
• fundamental frequency
• energy
• spectral properties
Do the different word-phonological requirements of a language affect the degree to which the properties are exploited?
• duration (length opposition; word stress)
• fundamental frequency (tonal word-accent)
• spectral properties (phonologized vowel reduction)
Do speakers of a language vary in the strategies they adopt (for production and for perception)?
For further clarification
We have NOT investigated "word stress / word accent" … but rather the change in a given word as a result of making it more or less informationally prominent in the utterance;
i.e., the loss of length distinction in the [o] in German Philosophie vs. Philosoph
or the vowel quality alternation between [ɒ] and [ə] in English philosopher and philosophical
is not the focus of our investigation (though it may have a bearing on our interpretation of results).
Phrasal (de-)accentuation
Accentuation (phonological) can make prominent (phonetic)
…. by lengthening,
…. by increasing loudness,
…. by changing the pitch
and combinations thereof
De-accentuation can reduce prominence
…. by shortening (including segment elision),
…. by decreasing loudness,
…. by avoiding pitch changes,
…. by reducing spectral distinctiveness.
These properties determine the "rhythm type"
The link to ‘rhythm’?
Speech rhythm (as a regular syllable-based or foot-based "beat") is an appealing myth…..
Though we do have a very fine sense of the appropriate temporal patterning of any particular utterance (in any particular situation) …..
... in fact we decode it in terms of information weight.
Structural differences between languages are important
…. because they determine the temporal patterns, and they may constrain how words are made prominent.
'Rhythm' = utterance-dependent prominence pattern (not only determined by duration)
Principle of our approach
Comparable production task across languages
(different degrees of accentuation on same words by eliciting
different focus conditions for the same sentence)
Material and elicitation
Short sentences were constructed containing two one- or two-syllable "critical words" (CWs), one early (but not initial) and one late (but not final) in the sentence.
+ iterative versions (dada) to support comparisons across languages
Question: Was sagst du? ("What are you saying?") (broad)
Response: Der Mann fuhr den Wagen vor. ("The man drove the car up.")
Question: Wer fuhr den Wagen vor? ("Who drove the car up?") (narrow early)
Response: Der MANN fuhr den Wagen vor.
Question: Was fuhr der Mann vor? ("What did the man drive up?") (narrow late)
Response: Der Mann fuhr den WAGEN vor.
Question: Die DAME fuhr den Wagen vor? ("The LADY drove the car up?") (narrow contr. early)
Response: Der MANN fuhr den Wagen vor.
Question: Der Mann fuhr die KLAGEN vor? ("The man drove the COMPLAINTS up?") (narrow contr. late)
Response: Der Mann fuhr den WAGEN vor.
The questions were pre-recorded to accompany a PowerPoint presentation of the responses.
German example, text and dada versions (comparable in BG, F, N, RUS)
Levels of prominence
Focus    CW     stress   acc.   nucl.   narrow
early    CW1    +        +      +       +
early    CW2    +        -      -       +
broad    CW1    +        +      -       -
broad    CW2    +        +      +       -
late     CW1    +        -      -       +
late     CW2    +        +      +       +
Breakdown of analysis
Material:
• 6 sentences
• 6 repetitions
• 3 focus conditions (broad, narrow, narrow contrastive)
• 2 sentence positions (early, late)
• 2 realisational variants (lexical, delexicalised iterative)
Languages: Bulgarian, French, German, Norwegian, Russian
Speakers: 6 regionally homogeneous speakers (3 m, 3 f) per language (Sofia, northern standard French, Saarland, south-east Norway, Moscow area)
Analysis total per language: 2160 utterances
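The total of 2160 utterances per language can be checked against the design factors. The sketch below assumes that focus condition and sentence position combine into five prominence conditions (broad focus is not split by position), which matches the five prominence levels named in the statistical analysis; all variable names are illustrative.

```python
# Design factors per language, taken from the breakdown above.
sentences = 6
repetitions = 6
speakers = 6
variants = 2  # lexical and delexicalised ("dada") versions

# Assumption: broad focus has no early/late split, so the focus and
# position factors yield 1 + 2 * 2 = 5 prominence conditions.
conditions = ["broad"] + [
    f"{focus} {position}"
    for focus in ("narrow", "narrow contrastive")
    for position in ("early", "late")
]

utterances_per_language = (
    sentences * repetitions * len(conditions) * variants * speakers
)
print(utterances_per_language)  # 2160
```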
Measurements
Duration: duration (ms) of stressed vowels, stressed syllables, CWs, feet
F0: mean F0 across stressed vowel of CW
F0 change (comparison of stressed vowel in CW with preceding/following vowels)
Energy: intensity (dB) of stressed vowel in CW
Spectral balance = difference between 70-1000 Hz band and 1200-5000 Hz band in stressed vowel of CW
Normalized relative to mean across corresponding units in sentence
Spectral definition: F1–F3 at middle of stressed nucleus of CW
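As an illustration of the spectral balance measure, the following minimal sketch computes the level difference between the two frequency bands for a synthetic signal. It assumes that band energy is the summed power of the DFT bins inside each band and that the difference is expressed in dB; the slides do not state the exact procedure, and the function names and the test signal are invented for the example.

```python
import cmath
import math


def band_energy(samples, sample_rate, f_low, f_high):
    """Sum DFT power over bins whose frequency lies in [f_low, f_high] Hz."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_low <= freq <= f_high:
            # Naive DFT bin; adequate for a short vowel segment.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)
            )
            energy += abs(coeff) ** 2
    return energy


def spectral_balance_db(samples, sample_rate):
    """Level difference (dB) between the 70-1000 Hz and 1200-5000 Hz bands."""
    low = band_energy(samples, sample_rate, 70.0, 1000.0)
    high = band_energy(samples, sample_rate, 1200.0, 5000.0)
    return 10.0 * math.log10(low / high)


# Synthetic "vowel": a 500 Hz component plus a 2000 Hz component at one
# tenth of the amplitude, i.e. 20 dB weaker.
sample_rate = 16000
n_samples = 320
signal = [
    math.sin(2 * math.pi * 500 * t / sample_rate)
    + 0.1 * math.sin(2 * math.pi * 2000 * t / sample_rate)
    for t in range(n_samples)
]
balance = spectral_balance_db(signal, sample_rate)
print(round(balance, 2))  # 20.0
```

A larger value means relatively more energy in the low band. Real vowel segments would normally be windowed before the transform, which this sketch omits.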
Statistical analysis
One-way repeated-measures ANOVA per parameter for CW1 and CW2 separately
with dependent variables:
- duration (syllable, onset, vowel); F0 mean, F0 change; intensity, spectral tilt; F1, F2, F3
with within-subject variable:
- prominence (broad, early narrow, late narrow, contr. early narrow, contr. late narrow)
with between-subject variable:
- language (BG, D, F, N, RUS)
To see whether the prominence categories are realised differently across languages
Statistical analysis (cont.)
Multivariate ANOVAs per language for CW1 and CW2 separately
with dependent variables:
- duration (syllable, onset, vowel); F0 mean, F0 change; intensity, spectral tilt; F1, F2, F3
with independent variable:
- prominence (broad, early narrow, late narrow, contr. early narrow, contr. late narrow)
To evaluate which parameters are used to distinguish prominence categories in the five languages
Main effects for language and language x prominence interactions
[Two tables, one per effect: rows = parameters (syllable dur., onset dur., vowel dur.; F0 mean, F0 change; intensity, spect. tilt; F1, F2, F3), columns = CW1 and CW2. Only scattered "n.s." entries survive; their cell positions cannot be recovered.]
Results: ANOVA with Repeated measures
Languages use the acoustic carriers of prominence to different degrees:
* Results given here for CW1 but similar patterns for CW2
η²-values are the ratio of the variance due to the conditions (prominence) to the total variance, and thus indicate the proportion of the total variance explained by the focus conditions.
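The ratio can be made concrete with a small worked example. The sketch below uses a plain one-way partition of the sum of squares and invented duration values; it ignores the repeated-measures structure (speaker variance) of the actual analysis, so it only illustrates the definition and does not reproduce the reported values.

```python
def eta_squared(groups):
    """Ratio of between-condition sum of squares to total sum of squares."""
    values = [v for group in groups.values() for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return ss_between / ss_total


# Invented syllable durations (ms) for three prominence conditions.
durations = {
    "early narrow": [210.0, 230.0],
    "broad": [180.0, 200.0],
    "late narrow": [150.0, 170.0],
}
print(round(eta_squared(durations), 3))  # 0.857
```

Here the condition means (220, 190, 160 ms) account for 3600 of the 4200 units of total squared deviation, so about 86% of the variance is attributed to prominence.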
[Figure: η²-values for prominence (scale 0 to 1), one panel per language (BG, D, F, N, RUS), with bars for syllable, onset, vowel, F1, F2, F3, intensity, spectral tilt, F0 change and F0 mean]
Results: Duration
Syllable duration range from accented to deaccented (from [dada] recordings):
CS1: N (49%) > F (30%) > D (25%) ~ RUS (24%) > BG (15%)
CS2: N (55%) > F (37%) > RUS (26%) > D (19%) ~ BG (16%)
Note: No apparent connection between vowel-length
opposition and use of duration for accentuation (compare N and D vs. F, RUS and BG)
Results: Duration
CW1:
BG: nc_late < c_early
D: c_late < c_early
F: late, br < br, early
N: late, br < early
RUS: c_late < nc_early
CW2:
BG: early < c_late
D: early, br < late
F: early, br < late
N: early < br < late
RUS: early, br < late
Conditions: c. late, nc. late, broad, c. early, nc. early (c. = contrastive, nc. = non-contrastive)
Results: F0 range
F0 range in % from accented to deaccented (from [dada] recordings):
CS1: F (29%) > D (23%) > BG (18%) ~ RUS (14%) ~ N (13%)
CS2: F (28%) ~ D (27%) > BG (23%) > RUS (16%) > N (7%)
These values do not have any systematic link to pitch accent categories, but note Norwegian (lexical tones)
Results: F0 change
CW1:
BG: -
D: late, br < early
F: late, br < early
N: late < br < early
RUS: late < c_early
CW2:
BG: early, br < c_early, c_late < br, late
D: early < br < late
F: early, br < late
N: -
RUS: early, br < br, late
Conditions: c. late, nc. late, broad, c. early, nc. early (c. = contrastive, nc. = non-contrastive)
Results: Intensity
Intensity range in dB from deaccented to accented (from [dada] recordings):
CS1: BG (5.6) > F (3.4) > D (2.9) = RUS (2.9) > N (1.5)
CS2: BG (6.1) > F (5.7) ~ D (5.3) > RUS (3.9) > N (2.7)
Note: Larger intensity range for CS2 than CS1 due to greater post-nuclear than pre-nuclear de-accenting.
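The note can be checked directly against the listed values. The sketch below assumes that each value belongs to the language at the same position in the ranking line; the dictionary layout is illustrative only.

```python
# Intensity ranges (dB) from the [dada] recordings, as listed above.
intensity_range_db = {
    "BG": {"CS1": 5.6, "CS2": 6.1},
    "F": {"CS1": 3.4, "CS2": 5.7},
    "D": {"CS1": 2.9, "CS2": 5.3},
    "RUS": {"CS1": 2.9, "CS2": 3.9},
    "N": {"CS1": 1.5, "CS2": 2.7},
}

# Difference CS2 - CS1 per language; positive means a larger late range.
increase = {
    lang: round(r["CS2"] - r["CS1"], 1)
    for lang, r in intensity_range_db.items()
}
print(increase)
print(all(delta > 0 for delta in increase.values()))  # True
```

The range is larger for CS2 in all five languages, with the smallest increase in Bulgarian (0.5 dB) and the largest in German (2.4 dB).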
Results: Intensity
CW1:
BG: late, br < early
D: late < br < early
F: late < br < early
N: late, br < early
RUS: late, br < br, nc_early < early
CW2:
BG: early, br < late
D: early < br < late
F: early < br < late
N: early, br < br, late
RUS: early, br < late
Conditions: c. late, nc. late, broad, c. early, nc. early (c. = contrastive, nc. = non-contrastive)
Perception tests
Different values in production analysis imply differential perceptual judgements....
....therefore pairwise presentation of different conditions (broad, contrastive early, contrastive late, non-contrastive early, non-contrastive late)
Continuous prominence values preferable for statistical treatment...
....therefore non-categorical judgements (using a graphic interface)
A mouse click plays the two versions in sequence
The sequence may be played as often as required
Both sequences are offered during the course of the experiment
[Figure: graphic interface for the 1st critical word and for the 2nd critical word. Each shows "Der Mann fuhr den Wagen vor." twice, labelled "Erster Satz" (first sentence) and "Zweiter Satz" (second sentence), with the response labels "1. stärker" (1st stronger), "2. stärker" (2nd stronger) and "beide gleich stark" (both equally strong)]
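The size of a pairwise design over the five conditions can be sketched as follows. It is an assumption here that every pair of different conditions was presented and that "both sequences" means both presentation orders of a pair; the slides do not give the exact trial list.

```python
from itertools import combinations, permutations

conditions = [
    "broad",
    "contrastive early",
    "contrastive late",
    "non-contrastive early",
    "non-contrastive late",
]

# Unordered pairs of different conditions.
pairs = list(combinations(conditions, 2))
# Assumption: "both sequences" means each pair is played in both orders.
trials = list(permutations(conditions, 2))

print(len(pairs), len(trials))  # 10 20
```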
Perception tests (cont.)
Signal manipulation: Change one parameter at a time to the value of the opposite prominence status (accented → unaccented and vice versa)
Problems: Parameters are not totally independent: durational change affects F0 contours
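The one-parameter-at-a-time design can be sketched as a set of stimulus specifications. The parameter names and values below are hypothetical and only describe target values; the actual resynthesis of the signal is not shown.

```python
# Hypothetical parameter values for one critical word in two natural
# versions; the numbers are invented for illustration.
accented = {"f0_hz": 220.0, "duration_ms": 240.0, "intensity_db": 72.0}
unaccented = {"f0_hz": 180.0, "duration_ms": 190.0, "intensity_db": 68.0}


def manipulate(base, donor):
    """Return one stimulus spec per parameter, each differing from `base`
    in exactly that parameter, which takes the donor's value."""
    stimuli = {}
    for parameter in base:
        spec = dict(base)
        spec[parameter] = donor[parameter]
        stimuli[parameter] = spec
    return stimuli


# Accented word moved towards unaccented, one parameter at a time ...
for name, spec in manipulate(accented, unaccented).items():
    print(name, spec)
# ... and vice versa.
for name, spec in manipulate(unaccented, accented).items():
    print(name, spec)
```

Treating the parameters as independent dictionary entries is exactly the simplification the slide warns about: in a real signal, changing the duration also stretches or compresses the F0 contour.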
Results
Pairwise comparison of natural stimuli: The subjects are well able to distinguish the different levels of prominence.
Perception with parameter-manipulated stimuli: F0 > Duration > Intensity
(Russian subjects are slightly more sensitive to Intensity)
Discussion
Isačenko & Schädlich 1966, Fry 1958 found the same hierarchy in their perception experiments
but Kochanski et al., 2005, Tamburini & Wagner, 2007 found loudness/intensity to be the main predictor of "prominence" in their production analyses
N.B. Fry and Isačenko & Schädlich worked exclusively with lexical stress;
Kochanski et al. and T&W combine lexical stress and phrasal prominence and worked only on production
Our results (η²-values) show a similar importance of intensity in production, but the perception work supports Fry's and Isačenko & Schädlich's conclusions!
Conclusions and Outlook
The languages differ in the degree to which they exploit duration, F0 and intensity in production and to some extent in perception
The differences (in production and perception) are not directly linked to structural differences between the languages
None of the results support the "mythological" rhythm typology: stress-timed vs. syllable-timed
The complex picture of language differences in production contrasts with an apparently universal perceptual hierarchy (F0 > Duration > Intensity)
All previous rhythm typology work has concentrated solely on duration. Natural communication combines intonation and segmental structure within an information-structural framework.
Languages will therefore differ rhythmically as a product of duration AND F0, and rhythm measures need to reflect this.