Effect and artifact in the perception of stress; a cross-linguistic view
Vincent J. van Heuven
Introduction, terminology
30 April 2008 Stress UAB 3
Introduction: terms
Stress
Abstract linguistic property of a word
Position of strongest syllable in word
Only one head: culminative property
Accent
Phonetic realisation of a stressed syllable
Introduction: terms
Typically, inventory of stressed syllables is larger than that of unstressed syllables
Identity of word is mainly determined by make-up of stressed syllable
Listeners pay more attention when they expect a stress
Word recognition waits for stressed syll.
Introduction: terms
Stress is realised by
More careful (‘clear’, ‘hyper’) articulation
More expanded vowel space
Longer duration
More intensity (decibels)
Flatter spectral tilt (faster adduction)
Resistance to assimilation and coarticulation
Introduction: terms
When word is important in sentence
Stress is additionally signalled by conspicuous pitch movement
Movement is associated (‘aligned’) with the stressed syllable
Sentence stress is sometimes called ‘pitch accent’ [not to be confused with Tokyo Japanese]
Production ~ Perception
Perception
Sentence stress is more prominent than just word stress
A well-aligned pitch movement is always heard as a stress: strongest cue by far
But is not always present
Absent when word has no sentence stress
Therefore: pitch is strong but inconsistent cue
Production ~ Perception
Production
Most consistent cue is relative duration of rhyme portion in syllable
Ratio between stressed and unstressed version of rhyme (in paradigmatic comparison) is the same, whether pitch movement is present or not
Aside
Paradigmatic ~ syntagmatic comparison
(the) IMport ~ (to) imPORT
Do not compare first syll with second syll
You will find that unstressed port is longer and louder (dB) than stressed IM
Compare stressed IM with unstressed im, and stressed PORT with unstressed port
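The difference between the two comparisons can be sketched in a few lines of Python. This is only an illustration: the syllable durations below are hypothetical values chosen to mimic the pattern described in this aside, not measurements, and the function names are made up for the example.

```python
# Hypothetical syllable durations in ms (illustrative values, not measured data).
durations = {
    "noun": {"im": 180, "port": 260},  # IMport: first syllable stressed
    "verb": {"im": 120, "port": 340},  # imPORT: second syllable stressed
}

def syntagmatic_ratio(word):
    """Compare first and second syllable within one word (the misleading comparison)."""
    return durations[word]["im"] / durations[word]["port"]

def paradigmatic_ratio(syllable, stressed_in, unstressed_in):
    """Compare the same syllable in its stressed and its unstressed version."""
    return durations[stressed_in][syllable] / durations[unstressed_in][syllable]

# Within the noun, stressed IM is still shorter than unstressed port (ratio below 1).
print(syntagmatic_ratio("noun"))
# Across words, each stressed syllable is longer than its unstressed counterpart.
print(paradigmatic_ratio("im", "noun", "verb"))
print(paradigmatic_ratio("port", "verb", "noun"))
```

With numbers of this kind the syntagmatic ratio suggests the wrong syllable is stressed, while both paradigmatic ratios come out above 1.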
Functional load hypothesis
Functional load hypothesis
Classical order of importance of stress cues (Fry 1955, 1958, 1965)
Pitch (movement)
Duration
Intensity
Spectral expansion
Based mainly on English stress
Functional load hypothesis
Berinstein (1979)
You can spend your money only once
If language uses a parameter for segmental contrast, it cannot use the same parameter as a stress cue
E.g., if a language has long ~ short vowels, duration is no longer an effective stress cue
Berinstein (1979)
Languages contrasted:
Language      Stress position      Vowel length contrast
English       variable, initial    no (?)
Spanish       variable, prefinal   no
K’ekchi       fixed, final         yes
Cakchiquel    fixed, final         no
Functional load hypothesis
Berinstein (1979)
English has tense (long) and lax (short) vowels
Spanish has neither tenseness nor length as a parameter
Prediction: duration is less effective stress cue in English than in Spanish
Functional load hypothesis
Berinstein (1979)
K’ekchi, fixed final stress, with vowel length contrast
Cakchiquel, fixed final stress, no length contrast
Prediction: duration is less effective stress cue in K’ekchi than in Cakchiquel
Berinstein (1979)
Perception study
Stimuli
/bibibibi/, 100 ms base vowel duration (+ 40 ms for /b/)
test vowel has deviant duration: 70, 100 (control), 120, 140, 160, 200 ms
Listeners
36 native English (mean age 22)
22 monolingual Spanish (mean age 23)
31 K’ekchi (mostly bi-lingual, mean age < 20)
46 Cakchiquel (all bi-lingual, mean age < 20)
Results of perception test
English
clear position bias: more stress judgments as test syllable occurs earlier in the word: 86, 67, 62, 46%
huge effect of duration (lengthening > 50% attracts stress): 34, 44, 89, 94%
overall effect better than 2x chance
Note: I replicated the experiment with Dutch listeners; results identical to English
Spanish
no position bias: 34, 34, 32, 32%
small effect of duration: 28, 26, 39, 39%
overall effects just above chance
K’ekchi
clear position bias: more stress judgments as test syllable occurs later in the word: 19, 23, 31, 44%
no clear effect of duration manipulations: 28, 26, 30, 34%
overall effect hardly above chance
Cakchiquel
tiny effect of duration: only 200-ms vowels attract some stress judgments: 24, 29, 25, 35%
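The "better than 2x chance" and "just above chance" verdicts can be checked against the percentages in the slides. The sketch below rests on three assumptions that the transcript does not state outright: the four values per language are stress judgments for the 120, 140, 160 and 200 ms test vowels, the language labels follow from the matching result bullets, and chance is 25% because the stimulus has four syllables.

```python
# Percent stress judgments per test vowel duration, as listed in the slides.
# Assumption: values are ordered 120, 140, 160, 200 ms and belong to these languages.
by_duration = {
    "English": [34, 44, 89, 94],
    "Spanish": [28, 26, 39, 39],
    "K'ekchi": [28, 26, 30, 34],
    "Cakchiquel": [24, 29, 25, 35],
}

CHANCE = 100 / 4  # assumption: four syllables, so 25% per syllable by chance

def overall(percentages):
    """Mean percentage of stress judgments over the duration steps."""
    return sum(percentages) / len(percentages)

def times_chance(percentages):
    """Overall score expressed as a multiple of the chance level."""
    return overall(percentages) / CHANCE

for language, values in by_duration.items():
    print(language, overall(values), round(times_chance(values), 2))
```

Under these assumptions only English exceeds twice the chance level; the other three languages stay between 1 and 1.5 times chance.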
[Figure: Stress perceived (%) by position of syllable (1, 2, 3, 4); English, Spanish, K’ekchi, Cakchiquel]
[Figure: Stress perceived (%) by vowel duration (120, 140, 160, 200 ms)]
Berinstein (1979)
Summary of observations re. duration
Large effect in English
But English also has length contrast
Small effect in Spanish
Even though Spanish has no length contrast
Small effect in K’ekchi
Even though K’ekchi has vowel length contrast
Same small effect in Cakchiquel
Even though Cakchiquel has no length contrast
Berinstein (1979)
Conclusion re. Berinstein (1979)
Results simply contradict all predictions
Within the European languages Spanish should use duration more than English (but does not)
Within the Mayan languages Cakchiquel should use duration more than K’ekchi (but does not)
Therefore little credibility for functional load hypothesis
Berinstein (1979)
Extra: position bias in Berinstein
Strong initial-stress bias in English
OK, most words have initial stress
Weak final-stress bias in K’ekchi
OK, but why weak?
Weak prefinal stress bias in Cakchiquel
Not predicted
No stress bias at all in Spanish
Why? What is the distribution of stress in Spanish?
Functional load hypothesis
Potisuk, Gandour & Harper (1996)
Thai has five lexical tones
Prediction: pitch cannot be an effective stress cue
Thai contrasts long ~ short vowels
Prediction: duration cannot be an effective stress cue
Acoustic correlates were measured, i.e. NOT a perception study
Potisuk et al. (1996)
Method
two male, three female speakers (read-out speech)
25 sentences with minimal stress pairs (20 with long vowels, 5 with short vowels)
full 5 x 5 matrix of two-tone sequences
Potisuk et al. (1996)
Note: stress pairs are not really minimal
one is a two-word sequence (N-V)
the other is a two-syllable compound
Measurements
only initial syllables were measured (paradigmatic)
F0 curve, in ERB + Z-transform, time-normalised (reduction to mean and SD)
Rhyme duration (re. sentence duration, within-speaker normalisation for inherent segment duration)
Intensity curve (normalised within speakers, reduction to mean and SD through Z-transform)
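The two normalisation steps named above (conversion to ERB, then Z-transform) might look as follows. This is a sketch under assumptions: the slides do not say which ERB formula was used, so the ERB-rate formula of Glasberg & Moore (1990) is assumed here; the population standard deviation is likewise an assumption, and the F0 samples are hypothetical.

```python
import math
from statistics import mean, pstdev

def hz_to_erb_rate(f_hz):
    """ERB-rate scale after Glasberg & Moore (1990).
    Assumption: the slides do not specify which ERB formula was applied."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def z_transform(values):
    """Rescale values to mean 0 and SD 1 (population SD is assumed)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical F0 samples (Hz) from one speaker, for illustration only.
f0_hz = [110.0, 125.0, 140.0, 130.0, 105.0]
f0_z = z_transform([hz_to_erb_rate(f) for f in f0_hz])
print([round(z, 2) for z in f0_z])
```

Applying the Z-transform per speaker removes differences in overall pitch level and range, so that curves from male and female speakers can be pooled.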
Results
Mean F0: no difference between [+stress] and [–stress]
F0 variability: larger for [+stress], stronger for some tones than for others (interaction of stress and tone)
Mean intensity: no difference
Intensity variability: no difference
Duration: [+stress] much longer than [–stress], for all lexical tones (i.e. no stress x tone interaction)
Potisuk et al. (1996)
Acid test: automatic classification by LDA
rhyme duration >> F0-SD >> Intensity SD
99% correct classification with duration alone
Interesting point
five-member lexical tone contrast is fully maintained in [–stress], even though F0 curves are flattened considerably
In other languages lexical-tone contrasts may be neutralised in [–stress] conditions
Potisuk et al. (1996)
Conclusions
Results largely go against functional load hypothesis
Duration is by far the strongest correlate (but should not be)
F0 should not be a correlate and indeed is not in terms of mean F0
But is a good stress cue in terms of F0 range
Multiple sources of variability
Vowel duration is longer (e.g. Klatt, 1974)
in [+long] vowels
before deeper prosodic breaks
in syllables with word stress
in words with sentence stress
in slow speech
before voiced (and esp.) sonorant consonants
Multiple sources of variability
Listeners are able to decompose different sources of variability in a parameter
E.g. Nooteboom (1979) shows that Dutch listeners use duration effectively to make multiple simultaneous contrasts
Long ~ short vowels
Depth of prosodic break
They adjust the long ~ short boundary depending on the depth of the break
Functional load hypothesis
Since simultaneous effects are perceptually decomposed, the functional load hypothesis seems too simple
Results indicate that we can both have our cake and eat it
‘Get two for the price of one’
Original hierarchy still stands
Duration as a stress cue in English
Postnuclear stress contrast?
Beckman & Edwards (1994)
Simple prominence hierarchy in English
Four degrees of prominence
Full vowel > reduced vowel (schwa)
Pitch movement > no pitch accent
Last accent > earlier accents
Postnuclear stress contrast?
Beckman & Edwards (1994)
Predictions
Schwa cannot be stressed unless it is transformed to a full vowel first
No contrast between initial and final stress in postnuclear words with full vowels (Scott 1939, Huss 1978).
Postnuclear stress contrast?
Scott (1939)
One sentence, initial stress only
Noun ~ verb minimal stress pair
11 listeners, forced choice
Response distribution skewed towards initial stress
But not significantly so
Postnuclear stress contrast?
Pilch (1970)
Difference between IMport ~ imPORT is exclusively a matter of intonation
Not carried by stress
If intonation cues are removed (by embedding target in postnuclear position) no difference between noun and verb reading should remain
Postnuclear stress contrast?
Huss (1978)
Used same clever sentences as Scott
Actually, even cleverer
Identical word sequences with different stress pattern on noun~verb pairs in postnuclear position
See examples
(1) It is not true that all nations have always been equally self-sufficient as far as the production of sinks is concerned. The degree of self-sufficiency has changed during the last year: Whereas formerly the Americans used to import sinks, now the Germans import sinks.
Did you say the Germans import sinks?
(2) It is not true that the balance of payment of all nations has always been equally healthy. The amount of net import has changed in different ways for different nations: Whereas formerly the Americans’ import used to sink, now the Germans’ import sinks.
Did you say the Germans’ import sinks?
Huss (1978)
Method
4 different noun~verb pairs
Nuclear~postnuclear target position
Statement~question
7 speakers
3 phonetic expert listeners
4 x 7 x 3 = 84 stress judgments per condition
Huss (1978) perception test: Percent responses
Sentence type   Lexical stress pattern   Perceived as Noun (initial)   Perceived as Verb (final)
Statement       Noun                     25                            75
Statement       Verb                     24                            76
Question        Noun                     24                            76
Question        Verb                     14                            86
Statement: no effect
Question: trend, χ2 = 1.89 (p = 0.167)
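The reported χ2 for the question condition can be approximately reproduced from the percentages. The sketch below makes two assumptions that the slides do not confirm: each row is based on 84 judgments (4 pairs x 7 speakers x 3 listeners), so the percentages are rounded back to counts, and Yates' continuity correction is applied.

```python
import math

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

N_PER_ROW = 84  # assumption: 4 pairs x 7 speakers x 3 listeners per row

def to_count(percent):
    """Convert a reported percentage back to a (rounded) number of judgments."""
    return round(percent / 100 * N_PER_ROW)

# Question condition: nouns 24% vs. verbs 14% perceived with initial stress.
noun_initial, verb_initial = to_count(24), to_count(14)
chi2 = yates_chi_square(noun_initial, N_PER_ROW - noun_initial,
                        verb_initial, N_PER_ROW - verb_initial)
print(round(chi2, 2), round(p_value_df1(chi2), 3))
```

With these reconstructed counts the statistic rounds to 1.89, matching the slide; the p-value comes out close to, though not exactly at, the reported 0.167, which is unsurprising given the rounding of percentages to counts.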
(1) [the GERmans] [import sinks]
(2) [the GERmans’ import] [sinks]
Final lengthening of unstressed syll.
No lengthening of stressed syll.
Huss (1978)
No clear difference between initial and final stress in postnuclear minimal pairs with full vowels only
As predicted by Beckman & Edwards
But stress and phrasing confounded
Let us keep phrasing constant and vary stress only. See Huss (1975)
Huss (1975)
Method
10 minimal stress noun~verb pairs
We FIRST import, he said [Verb, final stress]
His FIRST import, he said [Noun, initial stress]
2 male speakers
Informal listening procedure
Unknown number of listeners (but phonetically trained)
Huss (1975)
Perceptual results
One group of words with stress perceived in conformity with noun~verb contrast, high agreement among listeners
In ‘a few words’ listeners did not agree
In ‘some other words’ listeners did agree but reported stress the wrong way around
Unfortunately no quantitative data
The decisive auditory parameter in the identification of stress in post-nuclear position, i.e. in the absence of a pitch contrast, was the duration ratio between the two syllables; the experimental follow-up study should bear out which acoustic parameters correlate with this auditory impression.
Huss (1975)
Perceptual results
In pairwise comparison of noun~verb pairs vowel duration seemed the clearest correlate
Acoustic measurements of one speaker presented (better speaker)
Second speaker had more perceptual ambiguities (and reversals)
No quantitative data
[Figure: Duration ratio S1 / S2 for nouns (initial stress) and verbs (final stress)]
Duration contrast even more extreme in postnuclear than nuclear stress
Huss (1975)
Conclusion
At least some speakers produce a very reliable contrast between initial and final stress in postnuclear position in words with full vowels only
The correlate is syllable duration
The contrast, when made, is adequately perceived
Postnuclear stress contrast?
Beckman & Edwards seem wrong
English speakers tend to preserve stress contrast in postnuclear position
English listeners are sensitive to the contrast even when there is no pitch movement (duration is effective cue)
Same effects were found for Dutch
Nooteboom (1972), van Katwijk (1974), Sluijter & van Heuven (1996)
Sluijter & van Heuven (1996)
Prenuclear (unaccented) targets
Lexical pair ‘canon~cannon’
Reiterant mimicry
[Figure: Duration S1 / S2 by word stress (Initial, Final) for Lexical and Reiterant speech; panels: Nuclear, Pre-nuclear]
Sluijter & van Heuven (1996)
Results
Duration (ratio S1/S2) very strong stress cue
Equally effective in nuclear and non-nuclear position
Affords 100% correct stress decisions in LDA
(LDA = Linear Discriminant Analysis, an automatic classification algorithm)
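For a single predictor such as the S1/S2 duration ratio, LDA reduces to placing a threshold between the two class means. The sketch below is a simplified stand-in, not the analysis from the study: it assumes equal priors and a shared variance (so the boundary is the midpoint of the class means), it assumes that a larger S1/S2 ratio indicates initial stress, and the ratios are hypothetical.

```python
from statistics import mean

def fit_threshold(ratios_initial, ratios_final):
    """One-dimensional LDA with equal priors and a shared variance:
    the decision boundary is the midpoint between the two class means."""
    return (mean(ratios_initial) + mean(ratios_final)) / 2

def classify(ratio, threshold):
    """Assumption: a larger S1/S2 ratio means a relatively long first syllable,
    hence initial stress."""
    return "initial" if ratio > threshold else "final"

# Hypothetical S1/S2 duration ratios (illustrative, not data from the study).
initial_tokens = [1.10, 1.05, 0.95, 1.20]
final_tokens = [0.55, 0.60, 0.70, 0.50]

threshold = fit_threshold(initial_tokens, final_tokens)
labelled = [(r, "initial") for r in initial_tokens] + [(r, "final") for r in final_tokens]
accuracy = sum(classify(r, threshold) == label for r, label in labelled) / len(labelled)
print(round(threshold, 3), accuracy)
```

When the two distributions do not overlap, as in this toy set, every token falls on the correct side of the threshold, which is what a 100% classification score amounts to.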
Sluijter et al. (1997)
Duration, intensity and loudness as perceptual cues in stress perception in non-nuclear position
Overall result:
Duration is strongest cue
Loudness (intensity > 500 Hz) is second
Intensity is weak cue
Aside: strength of cues
Standard plots
% stress as a function of
X but averaged over all Y steps
Y but averaged over all X steps
Observe difference in psychometric function
Obscures interaction between X and Y
Alternative: quasi 3D plots
Plot quasi 3-D
Determine cross-overs (50%) in X and Y dimensions, by e.g.
Linear interpolation
Probit fitting
Compute linear regression line through points
Determine slope of function
90°: X only cue
0°: Y only cue
45°: equal strength
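The cross-over and slope steps can be sketched as follows. This is an illustration under assumptions: cross-overs are found by linear interpolation (probit fitting is not shown), X and Y are expressed in comparable units such as step numbers, and x is regressed on y so that a vertical boundary stays well defined. Function names are made up for the example.

```python
import math

def crossover(steps, percents, level=50.0):
    """Linearly interpolate the step value at which the response curve crosses `level`."""
    points = list(zip(steps, percents))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if (p0 - level) * (p1 - level) <= 0 and p0 != p1:
            return x0 + (level - p0) * (x1 - x0) / (p1 - p0)
    return None  # the curve never crosses the level

def boundary_angle(crossover_points):
    """Angle in degrees of the least-squares line through (x, y) cross-over points.
    x is regressed on y; 90 means X is the only cue, 0 means Y is the only cue."""
    xs = [x for x, _ in crossover_points]
    ys = [y for _, y in crossover_points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    syy = sum((y - my) ** 2 for y in ys)
    if syy == 0:
        return 0.0  # horizontal boundary: only Y matters
    sxy = sum((x - mx) * (y - my) for x, y in crossover_points)
    slope_x_on_y = sxy / syy
    return math.degrees(math.atan2(1.0, abs(slope_x_on_y)))

# One psychometric function: % stress judgments over four X steps.
print(crossover([1, 2, 3, 4], [20, 40, 60, 80]))
# Cross-over in X is the same at every Y step: X is the only cue.
print(boundary_angle([(3, 1), (3, 2), (3, 3)]))
# Cross-over in X shifts one step per Y step: equal strength.
print(boundary_angle([(4, 1), (3, 2), (2, 3)]))
```

The angle is only interpretable if the X and Y steps are scaled comparably, which is why step numbers rather than physical units are assumed here.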
Last minute results
Dutch minimal stress pair
‘I have yesterday a canon/cannon heard’
Prenuclear
ik heb gisteren een kánon GEHOORD
ik heb gisteren een kanón GEHOORD
Postnuclear
ik heb GISTEREN een kánon gehoord
ik heb GISTEREN een kanón gehoord
Last minute results
Starting from each natural base stimulus
7 manipulations of syllable duration ratio (using Praat PSOLA)
4 repetitions of each type
20 native Dutch listeners
80 responses per data point
Last minute results
Duration ratio is very effective stress cue in Dutch
Also (smaller) effect of base stimulus
Same effects before and after nuclear accent
Same effects are expected for English
Summing up
Duration is very effective stress cue in Dutch, even in non-nuclear position
It should also be so in English
Work in progress at Leiden University
Production and perception of stress in pre- and postnuclear position in Dutch and English.
No results for English at this stage.
Stress bias
Van Heuven & Menert (1996)
Strange difference
Strong initial bias for English (but no fixed initial stress)
Weaker final bias for K’ekchi (although exceptionless fixed stress)
Why the difference?
Bias is partly the result of artifact
Van Heuven & Menert (1996)
Experiment 1
Synthesized Dutch minimal stress pairs
Monotone 100 Hz flat
Declination 100 ... 70 Hz
Inclination 100 ... 130 Hz
Noise source (i.e. no periodicity, whisper)
Manipulated duration ratio S1 / S2
Van Heuven & Menert (1996)
Experiment 1: Results
Large effects of duration manipulation
Strong overall bias for initial stress
Reduction of initial-stress bias:
Declination (85%) > Monotone (80%) > Inclination (60%) > Noise (55%).
Van Heuven & Menert (1996)
Experiment 2: Effect of context
Same stimuli & manipulations as before
Also preceded by short carrier, so that first syllable of target does not appear out of the blue
Van Heuven & Menert (1996)
Experiment 2: Results
Isolated targets: Replicates exp 1.
Preceding context: Bias for initial stress completely gone
Van Heuven & Menert (1996)
Apparently: bias is not inherent but induced by
Presence/absence of a preceding context
Whether (first syllable of) target has pitch
Suggestion:
Bias is induced by virtual pitch jump from assumed/inferred F0 baseline
Van Heuven & Menert (1996)
Inferred baseline is speaker’s bottom pitch (roughly 70 Hz)
Prediction
The higher the level pitch of an isolated target, the larger the virtual F0 jump, the stronger the initial stress bias
No bias when target has 70 Hz pitch
Van Heuven & Menert (1996)
Experiment 3
Same reiterant stimuli
Synthesized at 70, 100, 130 and 160 Hz
We also manipulated formant settings
+20%, –15%, 0% (neutral)
If virtual pitch jump, then initial stress bias should increase with onset F0
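The size of the hypothesised virtual pitch jump for each synthesis level can be worked out directly. The sketch assumes the 70 Hz baseline mentioned in the slides and expresses the jump in semitones; the choice of semitones as the unit is an assumption made for this illustration, not something the slides specify.

```python
import math

BASELINE_HZ = 70.0  # inferred speaker baseline (roughly 70 Hz, per the slides)

def virtual_jump_semitones(onset_hz, baseline_hz=BASELINE_HZ):
    """Interval between the assumed baseline and the onset pitch, in semitones."""
    return 12.0 * math.log2(onset_hz / baseline_hz)

# Synthesis levels used in Experiment 3.
for onset in (70, 100, 130, 160):
    print(onset, round(virtual_jump_semitones(onset), 1))
```

The jump is zero at 70 Hz and grows with onset F0, which is the ordering the prediction relies on: no bias at 70 Hz, increasing bias at 100, 130 and 160 Hz.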
Some initial-stress bias is stimulus induced
Inferred virtual pitch from speaker’s baseline seems justified
Other effects may also play a role
Listeners expect final lengthening in isolated words
Through perceptual compensation, the last syllable in a string of four equal-duration syllables sounds less stressed
Results help to explain why initial stress bias is strong in English and final bias is weaker in Mayan languages K’ekchi and Cakchiquel
Vowel reduction as a stress cue
FRY (1965): DURATION vs. SPECTRAL REDUCTION
4 Minimal stress pairs (noun vs. verb)
CONtract ~ conTRACT
SUBject ~ subJECT
DIgest ~ diGEST
OBject ~ obJECT
3 duration steps (smaller range than in Fry 1955, 1958)
DURATION vs. SPECTRAL REDUCTION
3 degrees of vowel reduction/expansion
for V1 while keeping V2 constant (mid value): f1, f2, f3
for V2 while keeping V1 constant (f4, f5, f6)
Note: reduction of diphthong /ai/ by reduction of glide trajectory (full, halfway, none = endpoint only)
[Figure: quality manipulation by duration manipulation (V1<V2, V1=V2, V1>V2)]
DURATION vs. SPECTRAL REDUCTION
Intensity (V1=V2) and F0 (120 Hz) were kept constant
Problem?
There is a constant 6 dB difference between F1 and F2, i.e., spectral tilt depends on frequency difference between F1 and F2: the larger the distance the flatter the tilt
DURATION vs. SPECTRAL REDUCTION
RESULTS
Effects of duration structure (in spite of restricted duration range) stronger than of spectral reduction
Effects of reduction of V1 stronger than of V2
Van Bergem (1993)
Spectral reduction in Dutch
Production study
Measurement of F1 and F2 at most stable portion during vowel (least spectral change)
Systematic manipulation of stress, focus, and lexical status of words
Manipulation of focus through question/answer pairs:
Test syllable: can
(What did you buy for your mother?)
I bought [CANdy]+F for my mother +C +A +S
(For whom did you buy candy?)
I bought [CANdy]-F for my mother +C -A +S
(Where do they sell beer?)
In our [canTEEN]+F they sell beer +C +A -S
(What do they sell in our canteen?)
In our [canTEEN]-F they sell beer +C -A -S
(What can your sister do for hours?)
My sister can [TALK]+F for hours -C +A
(How long can your sister talk?)
My sister can [TALK]-F for hours -C -A
[CAN]+F (spoken in isolation) ISO
Van Bergem (1993)
Experimental set-up
15 (male) speakers
7 stress/accent/status conditions
33 test syllables
yielding 3465 vowel tokens
Van Bergem (1993)
Selected results
For test syllables with /e:/, /o:/ and /a:/ only
No function words
Spectrally most expanded tokens for isolated words
Marginal reduction for +A+S
Appreciable reduction for -A
Appreciable reduction for -S
Effects of A and S are equal and additive
Van Bergem (1993)
Notes
These are acoustic effects
Proper studies of the cue value of spectral reduction for stress/accent perception have to be carried out yet (for any language whatsoever)
…preferably in relationship with cues to domain-final lengthening
Unified view
Unified view
There is no unified view
I would like to assume that all languages use stress parameters in the same way
Not necessarily in speech production but certainly in speech perception
Although the use of pitch for the marking of sentence stress may differ
Unified view
No room for a functional load hypothesis
Unclear why duration is such a weak cue for Spanish in Berinstein (1979)
But strong cue in Catalan in recent work at UAB
Also in Spanish?
(Much) more research needed
Thanks for bearing with me