III
Klaus J. Kohler
Terminal Intonation Patterns in Single-Accent Utterances of German:
Phonetics, Phonology and Semantics
1. Introduction
1.1 Hypotheses
This contribution deals with Hypotheses (2) and (3) outlined in 2.1.2 and
2.1.3 of Contribution I (Kohler, 1991b), i.e. with the alignment of an F0
peak relative to stressed vowel onset in terminal utterances containing one
accent. Section 2 is concerned with the F0 peak positions in sentences that
have a unique accent placement because they are made up of just one content
word beside several reduced function words. Section 3 looks at F0 peaks in
sentences with alternate accent places due to lexical stress oppositions or
to different sentence focus. It delimits the stress and intonation functions
of F0 peaks and discusses their interactions, also with reference to the
data presented in Contribution IV (Hertrich, 1991a). As this will involve
the perceptual ambiguity between one and two accents, peak sequences will
also have to be considered briefly with reference to Contribution VI
(Hertrich, 1991b).
1.2 Types of phonological structures for perceptual testing
The investigation is perceptual, aiming at the (phonological) categorization
of phonetic F0 peak shift continua across a number of different syllable
structures (long vs. short vowel, syllable-initial lateral vs. glide vs.
glottal stop (creaky voice) vs. voiced fricative, post-nuclear voiced vs.
voiceless consonant) as well as two potential accent positions in words
(prefix or stem stress) and sentences (subject or verb focus).
1.3 Stimulus generation¹
In all cases, several naturally produced tokens of the particular sentence
type under scrutiny were recorded on analogue tape (Revox A77, 19 cm/s) by
the same male speaker (KK, the author) under studio conditions in the Kiel
Phonetics Institute. Although a medial peak position was to be the basis for
stimulus manipulation in most experiments (but see 3.1 for the choice of an
early peak as well), early and late peaks were also collected of each
linguistic item to specify the ranges of F0 peaks from early to late that
would have to be covered by the test series, and in order to provide
information about the shapes of the different peaks to be taken into account
in the synthesis. The recorded data were checked auditorily for successful
rendering of the intended phonetic structures, and, after A/D conversion
(10 kHz, 5 kHz low-pass filter), the acceptable tokens were processed on a
Data General Eclipse S230 computer with the Kiel Phonetics Institute SSP
programme package (as regards the pitch algorithm, see Schäfer-Vincent,
1982, 1983). Obvious F0 analysis errors (octave jumps, missing F0 values in
spite of clear periodicity in the signal) were corrected manually.
¹ The stimuli for 2.1.1.2-5 and 3.1-2 were generated by Michael Weinhold.
Then one token containing an auditorily classified medial (or early) peak
was selected and its peak contour shifted along the time axis to the left
and to the right in a number of steps of fixed duration determined
separately for each utterance, to create new F0 versions. The shift was
effected either as a parallel transposition of both branches of the peak
contour, or the falling branch was time-expanded in left shifts as far as
the original right-hand base point, to approximate natural productions by a
less steep descent and to avoid too long a low F0 stretch in the LPC
synthesis. The two types of left shift do not alter the basic
characteristics of medial to early peak changes; the parallel transposition
of the whole peak configuration simply sounds more final and categorical
than the one with the flattened fall. After the shift, the tail contour was
joined to the new peak position by expansion or compression, similarly the
immediate precursor, and finally F0 was masked in voiceless stretches.
Fig. 1 illustrates the principles of generating F0 peak shift versions.
The original utterances were then synthesized with the LPC analysis values
and the new F0 versions obtained through the peak shift parameter
manipulation.
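The two kinds of shift described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the peak contour is reduced to three hypothetical (time, F0) points (left base, maximum, right base), the numeric values are invented, and the function name `shift_peak` is not part of the SSP package.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time in ms, F0 in Hz)


def shift_peak(contour: List[Point], shift_ms: float,
               expand_fall: bool = False) -> List[Point]:
    """Shift a three-point peak contour (left base, maximum, right base).

    Parallel transposition moves all three points by shift_ms.
    With expand_fall=True and a left shift (shift_ms < 0), the right
    base point stays where it was, so the falling branch is stretched
    in time and becomes less steep.
    """
    left, peak, right = contour
    moved_left = (left[0] + shift_ms, left[1])
    moved_peak = (peak[0] + shift_ms, peak[1])
    if expand_fall and shift_ms < 0:
        moved_right = right  # keep the original right-hand base point
    else:
        moved_right = (right[0] + shift_ms, right[1])
    return [moved_left, moved_peak, moved_right]


# Hypothetical medial peak: values chosen for illustration only.
medial = [(300.0, 90.0), (400.0, 130.0), (500.0, 85.0)]
parallel = shift_peak(medial, -60.0)
flattened = shift_peak(medial, -60.0, expand_fall=True)
```

In the flattened version of this example the fall covers 160 ms instead of 100 ms, which corresponds to the less steep descent mentioned above. Joining the tail contour and the precursor to the new peak position is not modelled here.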
Fig. 1
(a) Speech wave and fundamental frequency (linear scale) of a medial peak in the naturally produced utterance "Sie hat ja gelogen." ("She's been lying."). The end contour (on the final syllable [gŋ]) was added by F0 parameter manipulation because the analysis did not provide it. The time marks A1, A2 delimit the F0 peak contour (coinciding approximately with /o:/), which was shifted left and right. (b) The left- and right-most positions of the shifted F0 peak contour on the same time scale as in (a), approximating the natural productions of early and late peaks, respectively.
[Figure not reproduced; axes: relative time in s, pitch in Hz.]
1.4. Perception experiments
Two types of discrimination and of identification tests were performed:
(1) A quick serial discrimination test, in which listeners were presented
    with the ordered series of peak shift stimuli from left to right or
    right to left and asked four questions on prepared answer sheets; for
    each question they heard the series at least once.
    (a) Do you perceive any changes in the melody of the sentence from one
        stimulus to the next?
        No - one change - several changes.
    (b) At which stimulus in the series has the first change occurred?
        Encircle the relevant number.
    (c) At which stimuli in the series have further changes occurred?
        Encircle the relevant numbers.
    (d) What are the meanings of the original utterance and of utterances
        representing the first and further changes in the series?
    The test tape construction had the following format:
        200-ms bleep
        800-ms pause
        stimulus 1 (or n)
        3-s pause
        stimulus 2 (or n - 1)
        3-s pause
        ...
        stimulus n (or 1).
(2) A formal randomized AX or XA discrimination test, in which all the pairs
    of one or two-step differences, as well as of identical stimuli
    (restricted to uneven rank to limit the test size), from the ordered
    peak shift series were presented for 'same/different' judgements on
    prepared answer sheets. Two test tapes were compiled, one for the
    ascending and one for the descending order of arrangement of stimuli
    within the pairs, and each containing a randomization of 2 repetitions
    of all the different as well as the identical stimulus pairs, with the
    following general format:
        200-ms bleep
        800-ms pause
        stimulus A (or X)
        2-s pause
        stimulus X (or A)
        4-s pause
    and so on for all the stimulus pairs. After each block of 10 test items
    a further 500-ms bleep was added for orientation.
(3) A natural stimuli identification test, in which three different
    naturally spoken contexts were paired with sentences containing each of
    three naturally produced peak positions (early, medial, late) for
    subjects to judge, on prepared answer sheets, whether the context and
    (the melody of) the test item matched or not. The test tapes were
    compiled in a short version of 9 items (3 contexts x 3 peaks) and a long
    one of 90 items, with 10 repetitions of each of the 9 items. In each
    tape the stimuli were randomized and followed the same format as in the
    randomized discrimination test, with the only difference that the pause
    between context and test stimulus was 0.5 s.
(4) A synthesized stimuli identification test, in which one synthesized
    context sentence was paired with each stimulus from an early to medial
    F0 peak shift series, for subjects to judge, on prepared answer sheets,
    whether context and test item matched or not. The test tape contained a
    randomization of 10 repetitions of each context and test stimulus
    combination, following the same format as in the natural stimuli
    identification test.
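The item list of the randomized AX test in (2) can be enumerated as follows. The sketch assumes an 11-step series; the function name, the seed and the use of Python's `random` module are illustrative choices and not the procedure actually used to compile the tapes.

```python
import random
from typing import List, Tuple


def ax_test_items(n_stimuli: int = 11, ascending: bool = True,
                  repetitions: int = 2, seed: int = 0) -> List[Tuple[int, int]]:
    """Build a randomized list of stimulus pairs for one AX test tape."""
    pairs = []
    # All one-step and two-step pairs from the ordered series.
    for step in (1, 2):
        for low in range(1, n_stimuli - step + 1):
            high = low + step
            pairs.append((low, high) if ascending else (high, low))
    # Identical pairs, restricted to uneven (odd) rank numbers.
    pairs += [(k, k) for k in range(1, n_stimuli + 1, 2)]
    items = pairs * repetitions
    random.Random(seed).shuffle(items)  # seeded only for reproducibility
    return items


ascending_tape = ax_test_items(ascending=True)
descending_tape = ax_test_items(ascending=False)
```

For 11 stimuli this gives 10 one-step pairs, 9 two-step pairs and 6 identical pairs, i.e. 25 pairs or 50 items per tape with 2 repetitions. The total is derived from the description above; it is not stated in the text.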
The test files were compiled on the computer and output on analogue tape.
The listening tests (except those in 2.1.2 and 2.1.3; see the separate
descriptions there) took place in the acoustically treated studio of the
Kiel Phonetics Institute. The stimuli were presented via loudspeaker to
variably sized groups of up to 8 persons, who were students of a variety of
subjects including phonetics/linguistics/languages, as well as members of
academic and technical staff, and "naive" outsiders, all with German of a
northern variety as their native language (except for 2.1.2 and 2.1.3; see
the separate descriptions there).²
² Michael Weinhold put together the test tapes, carried out the tests, and compiled the data, for 2.1.1.2-5 and 3.1-2.
1.5 Interactive perceptual testing at the computer
The development of an intonation model for German and its RULSYS TTS
implementation (see Contributions I and VII; Kohler, 1991b, d) have made it
possible to check the perceptual relevance of certain changes in F0
configurations very quickly by generating parametric displays and acoustic
output from orthographic input (supplemented by additional symbolic markers,
such as @ZZ for early or @ZZZ for late peaks) and by modifying the acoustic
output interactively through systematic changes in the graphic parameter
representation. This can be achieved in two ways:
(a) In a graphic display of the type illustrated in Fig. 2, F0 points are
moved, inserted, deleted, or changed in value, and the speech signal is
regenerated with the new parameter specification for auditory evaluation,
also for auditory comparison with the stored original.
(b) A pitch configuration is defined by the use of the free variables X and
Y (for time and frequency) as, for example, in the rule
00.01: <VOK,FSTRESS,TERMIN> → <TF0=TF0+(X-100)/2.5,T2F0=T2F0+(X-100)/2.5,
       T3F0=T3F0+(X-100)/2.5,2F0=Y>,
which means that a (medial) peak pattern <TERMIN> associated with an
accented vowel <VOK,FSTRESS> and defined by three F0 points with the time
values TF0, T2F0 and T3F0 is to be displaced in time by adding or subtracting
the same variable time value X, and/or vertically expanded or compressed by
varying the frequency value of the centre F0 point (2F0). An orthographic
input is then processed by the system up to this rule, when an X-Y plane as
shown in Fig. 3 appears on the screen, representing 250 time frames of 10 ms
along the horizontal and 250 units of 1 Hz along the vertical. A cursor can
now be moved, e.g. in 5-unit steps, to feed the variables X and Y in rule
00.01 with new values for further processing. In rule 00.01, the additive
time constant of -100 resets the zero point, and the factor of 1/2.5
rescales the temporal step size from 5 × 10 ms to 5 × 10/2.5 ms = 20 ms,
allowing parallel shifts of all F0 points by 20 ms with one cursor step
along the horizontal to the right and to the left from the (medial) zero
position. The peak pattern can thus be continually shifted along the time
scale and the auditory consequences tested in a quick succession from
stimulus to stimulus of the same sentence type. Similar changes can be made
in the frequency axis.
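The arithmetic of rule 00.01 can be checked with a short sketch. It assumes that the time values TF0, T2F0 and T3F0 are counted in frames of 10 ms, which is what the step-size calculation in the text implies; the function names are hypothetical and are not RULSYS syntax.

```python
FRAME_MS = 10  # assumed duration of one time unit in the rule


def time_shift_ms(x: float) -> float:
    """Time displacement in ms produced by cursor value X in rule 00.01."""
    return (x - 100) / 2.5 * FRAME_MS


def apply_rule(tf0: float, t2f0: float, t3f0: float, x: float, y: float):
    """Return the three shifted time values (in frames) and the new centre F0."""
    delta = (x - 100) / 2.5  # same displacement for all three F0 points
    return tf0 + delta, t2f0 + delta, t3f0 + delta, y


# One 5-unit cursor step to the right or left of the zero position X = 100.
step_right = time_shift_ms(105)
step_left = time_shift_ms(95)
```

With X = 100 the pattern stays in its medial position; each 5-unit cursor step then moves all three points by 20 ms in parallel, as stated above.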
Both procedures (a) and (b) are very effective for quick hypothesis testing
and quick checking of points left open by the more elaborate perception
experiments, and have been used a good deal in the Kiel Intonation Project
to confirm and expand formal test results as well as to prepare the ground
for new hypotheses and their evaluation in group listening tests.
Fig. 2
RULSYS development system output of the symbolic input "Sie hat ja gelogen @ZZ." with an early F0 peak. F0 (in Hz; square parameter and cosine interpolation between defined F0 points) and phonetic transcription aligned to the time scale (segment and cumulative durations in cs); cursor positioned on the peak value; EO = a.
[Display not reproduced; F0 axis in Hz, segment and cumulative durations in cs along the time scale.]
Fig. 3
X-Y plane for providing variables, defined in a TTS rule (e.g. time and frequency), with new values by moving a cursor along the horizontal and/or the vertical axis.
2. F0 peak alignment
2.1. Phonetics and phonology
2.1.1 Kiel experiments on German
The first question to be asked with regard to F0 peak alignment is how
the acoustic continuum of F0 maximum value position from early (well before
the onset of the stressed vowel with which it is associated) to medial
(around the stressed vowel centre) to late (at the end of the stressed
vowel) is partitioned perceptually. Is the continual change of the temporal
relation of the F0 maximum to stressed vowel onset correlated with a gradual
perceptual change, or are there categorical breaks corresponding to
phonological switches, and how many of these have to be recognized? The
second question, which is closely linked with the first one, relates to
whether the perceptual organization of the physical continuum is dependent
on the segmental structure of the stressed syllable, in particular the
duration of the stressed vowel, the clear acoustic segmentability of
stressed syllable initial consonants (laterals or fricatives vs. glides or
creaky onset) and the presence of post-vocalic voicing. To find answers to
these questions peak shift series were created for the following five
utterances:
(1) "Sie hat ja gelogen." [zi hat ja gə'lo:gŋ] ("She's been lying.")
(2) "Es ist ja gelungen." [ɛs ɪst ja gə'lʊŋən] ("It has worked.")
(3) "Sie hat ja gejodelt." [zi hat ja gə'jo:dəlt] ("She's been yodelling.")
(4) "Sie muß wohl arbeiten." [zi mʊs vol 'ʔaɐbaɪtn] ("She will have to
    work.")
(5) "Er ist ja geritten." [ɛɐ ɪst ja gə'ʁɪtn] ("He's been riding.")
2.1.1.1 "Sie hat ja gelogen."
Taking the medial F0 peak position of the original utterance in Fig. 1 as a
point of departure, the contour A1A2 was moved along the time axis in 6
equal steps of 30 ms each to the left and 4 corresponding steps to the
right. In the transposition to the right, both branches were moved in
parallel; in the one to the left, only the rising branch was, the falling
one being expanded between the new maximum position and the original right
base point. A series with complete parallel shift also to the left was
generated as well, but the LPC synthesis quality was inferior due to the
long low-level F0, sounding rather "metallic", although the pitch pattern
was not unnatural, conveying the meaning of greater finality in the
statement and of less room for argument. Moreover, the natural productions
of early peaks in this sentence showed the same flattened F0 descent. As
informal listening did not suggest a different behaviour with regard to the
perceptual assessment of shifts in the peak position in the two series, the
one with the adjusted falling branch was chosen for the listening
experiments.
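The resulting series of 11 stimuli can be laid out numerically as below. The sketch only restates the step counts and step size given above; the numbering from 1 (left-most) to 11 (right-most) follows the table captions, and the names are illustrative.

```python
STEP_MS = 30      # duration of one shift step
LEFT_STEPS = 6    # shifts towards earlier peak positions
RIGHT_STEPS = 4   # shifts towards later peak positions


def peak_offsets() -> dict:
    """Map stimulus number to F0 peak displacement (ms) from the original."""
    return {
        step + LEFT_STEPS + 1: step * STEP_MS
        for step in range(-LEFT_STEPS, RIGHT_STEPS + 1)
    }


offsets = peak_offsets()
```

Under this numbering the unshifted original is stimulus 7, the left-most stimulus lies 180 ms earlier and the right-most 120 ms later.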
2.1.1.1.1 Discrimination tests
The 11 stimuli entered into both discrimination tests (1) and (2) of 1.4 in
the ascending as well as the descending order.
Results
Table I presents the responses by 60 listeners in the left-right peak
sequence of the serial discrimination test, Table II the responses by 33
listeners in the right-left sequence.
Table I
Frequency distribution of 'change has occurred' responses by 60 listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus                     3    4    5    6    7    8    9   10   11
First change perceived       1    4   39   16    -    -    -    -    -
Further changes perceived    -    -    1    5   11   15   21   22   11
Total                        1    4   40   21   11   15   21   22   11

Table II
Frequency distribution of 'change has occurred' responses by 33 listeners in the right-left sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus                    10    9    8    7    6    5    4    3    2    1
First change perceived       5    -    7   13    4    3    1    -    -    -
Further changes perceived    -    1    4    2    6   16    6   10    5    1
Total                        5    1   11   15   10   19    7   10    5    1
The randomized paired discrimination test in the ascending ordering was
carried out with a group of 39 subjects, in the descending ordering with a
different group of 34 subjects; each of the two tests contained the pairings
of the identical stimuli at the uneven rank numbers in the series of 11 test
items described in 2.1.1.1 and ordered from left-most to right-most F0
peak position. Fig. 4 shows the results.
Fig. 4
Discrimination functions in the randomized paired discrimination test, showing percentage of 'different' judgements for utterance pairs of "Sie hat ja gelogen." with 0-step (a), 1-step (b), or 2-step (c) distances of F0 peak positions, in the ordering left-right (continuous line) or right-left (broken line). The stimulus number refers to the second stimulus in the ascending and to the first stimulus in the descending order. 73 sbs., n = 146 at each data point (a); 39 sbs., n = 78 in the left-to-right, 34 sbs., n = 68 in the right-to-left ordering of (b) and (c).
[Plots not reproduced; axes: % 'different' against stimulus number.]
Discussion
Both types of test converge in demonstrating a major and a minor peak in the
discrimination function, around stimuli 5/6 and 9/10 respectively, but
also a strong order effect. On the one hand, discrimination is sharpest, and
equally so in both orderings of different stimuli, if the 1-step distance is
located between stimuli 5 and 6, or, correspondingly, the 2-step distance
between stimuli 5 and 7 (i.e. for the pairs 5-6, 6-5, 5-7 and 7-5);
on the other hand, the differentiation weakens if the distance is located at
a lower position in the series for the descending sequence (5-4, 5-3) or
at a higher position for the ascending one (6-7, 7-8, 6-8, 7-9).
Stimulus 5 is highly discriminated if it comes second or is spanned in the
pair, i.e. in 4-5, 3-5, 4-6, and this even occurs by way of 'false
alarms' in identical pairings of stimulus 5.
So the question arises as to what there is in the signal that might mark
stimulus 5 as different from all the others. Fig. 5 shows the positions of
the F0 peaks in stimuli 4 and 5 in relation to the speech wave. Stimulus 5
is the first one in the series of 11 from left to right where the F0
contour enters the accented vowel /o:/ on a rising slope; in all the
preceding stimuli in the series, F0 falls throughout the vowel. In stimulus
5 the increase of acoustic energy in the transition from the consonant /l/
to the vowel /o:/ is thus coupled with a rising F0, the rising slope of the
peak contour across /gəlo:/ being intensified over its final 30 ms. In
stimulus 4 this does not happen, but a fall is intensified instead. As the
peak is moved further to the right, the F0 rise becomes progressively more
extensive over a progressively longer increase in acoustic energy up to the
middle of the vowel, i.e. to the F0 peak position in stimulus 7, which
coincides with the original production. In this continuum, distinctivity
between successive stimuli will drop if the increase in the F0 rise has
reached perceptual saturation. This seems to happen after stimulus 6.
A further shift of the F0 peak to the right beyond stimulus 7 results in an
increasing low F0 stretch (see Fig. 1), which receives the intensification,
whereas, at the same time, the end of the rise is linked with a decrease of
acoustic energy. When both parameter changes are large enough, successive
stimuli have their distinctivity raised again. This seems to happen around
stimuli 9 and 10 in the ascending order, but is obviously a much weaker
effect than the change from falling to rising F0 in the stressed vowel,
producing much lower peaks in the response functions.
Fig. 5
F0 peaks in stimuli 4 and 5 of the series of 11 "Sie hat ja gelogen." from left-most to right-most position, in relation to the speech wave. The vertical lines mark the F0 maximum.
These results suggest that there is a maximum of sensitivity in the peak
shift continuum in the area of stimuli 5/6. So any pairings within or
progressing towards this area are discriminated best, viz. 4-5, 5-6,
6-5, 7-6 (and even 8-7); 3-5, 4-6, 5-7, 7-5, 8-6, but not
5-4, 6-7, 7-8, 5-3, 6-8, where the progression is away from the
area of high sensitivity. A second, weaker sensitivity peak is located at
stimuli 9/10, but does not surface in the response functions for the
descending order, because of the displacement to the right of the
discrimination curve associated with stimuli 5/6. The acoustic continuum is
thus perceptually partitioned into two clearly delimited sections with the
boundary occurring between stimuli 4 and 6, and this perceptual division
coincides with an acoustic change from falling to rising F0 across stressed
vowel onset. Around the boundary between these two sections, discrimination
is sharpest, and, as will be seen in 2.1.1.1.2 and 2.2, the two perceptually
determined sections of the acoustic continuum correspond to two intonational
categories related to a semantic differentiation between 'established' and
'new'.
So it appears that we are dealing here with an example of 'categorical
perception' (see Repp, 1984), this time in the domain of pitch (Kohler,
1987a). The data point to an abrupt perceptual change when in the acoustic
continuum the F0 peak is moved into the vowel of the stressed syllable. A
further F0 peak shift along the acoustic continuum results in a more gradual
auditory change, with a minor sensitivity maximum at a point where the
initial stretch of low-level F0 and the final weakening of the rise-fall in
the stressed vowel become large enough. The data thus support Hypothesis (2)
(see Contribution I; Kohler, 1991b) as far as the abrupt vs. gradual changes
in perception are concerned. This means that an early F0 peak must
constitute a phonological category of German intonation, contrasting with a
medial peak, whereas a late peak is less clearly separated, although the
perceptual results may turn out to be different if, in accordance with
natural production, the F0 peak shift to late positions were accompanied by a
similar shift of the acoustic energy maximum to the right (whereas in the
stimulus manipulation the energy profile of the original medial F0 peak
utterance, synchronized with F0 on the vowel centre, was used). The minor
sensitivity maximum in the response function could then easily be boosted
(see 2.1.1.5). In 2.1.1.1.2 and 2.2, further support will be given to the
organization of the semantic functions in parallel with the perceptual and
phonological structuring of F0 peak alignment.
2.1.1.1.2 Identification tests
On the basis of the discrimination test results and of hypotheses concerning
the semantics of early, medial and late peaks, three contexts were
constructed:
(1) "Wer einmal lügt, dem glaubt man nicht, auch wenn er gleich die Wahrheit
    spricht. Das gilt auch für Anna."
    ("Once a liar, always a liar. This also applies to Anne.")
    This context sets the frame for an established fact and the summing up
    of an argument, which is brought to a close.
(2) "Jetzt versteh' ich das erst."
    ("Now I understand.")
    This context presents a new fact and opens up a new argument.
(3) "Oh!"
    This context introduces emphatic surprise.
Each of these contexts was spoken naturally and paired with each of the
three naturally produced peaks in the sentence "Sie hat ja gelogen." to form
a natural stimuli identification test according to 1.4 (3). Furthermore, a
synthesized stimuli identification test (see 1.4 (4)) was performed with
pairings of context (2) ("Jetzt") and each one of the first 8 stimuli in the
continuum (from left to right) of 2.1.1.1.
Results
Table III and Fig. 6 present the results of the two tests.
Table III
Percentages of 'matching' responses for combinations of 3 contexts and early, medial or late F0 peaks in the sentence "Sie hat ja gelogen." in a natural stimuli identification test. 88 subjects

                     Context
Peak position    (1) Wer    (2) Jetzt    (3) Oh
early              87.5       27.3         8.0
medial             26.1       70.5        72.7
late               13.6       67.0        76.1
Fig. 6
Identification function in the synthesized stimuli identification test, showing percentage 'matching' judgements for 8 stimuli "Sie hat ja gelogen." with F0 peak shift from left to right in the context "Jetzt versteh ich das erst." 19 subjects; for each stimulus n = 190.
[Plot not reproduced; axes: % 'matching' against stimulus number 1 to 8.]
Discussion
The results of combining the 3 contexts and 3 F0 peak positions show that
subjects are able to make systematic judgements because the responses are
significantly different from chance, being either more than 66% or less than
30% in favour of 'matching'. This means that the different F0 peak positions
must be perceptually identifiable, and since in all cases the identification
of an early versus a non-early peak is far more clearly differentiated than
that of a medial versus a late one, this identification test reproduces the
categorization of the discrimination tests. It is only in the "Wer" context
that the medial vs. late F0 peaks yield a significant difference in the
response patterns (χ² = 4.31, p = .05). Contrariwise, the early pattern fits
least into the "Oh" context (difference between "Jetzt" and "Oh" contexts
= 31.07, p = .001).
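The medial vs. late comparison in the "Wer" context can be approximately retraced from Table III. The sketch below rests on two assumptions that the text does not state: that the percentages are converted back to counts out of 88 subjects (26.1% ≈ 23, 13.6% ≈ 12), and that the test is a Pearson chi-square on a 2 x 2 table without continuity correction. The function name is illustrative.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


N_SUBJECTS = 88
# Counts reconstructed from the Table III percentages (an assumption).
medial_match = round(0.261 * N_SUBJECTS)  # 23 'matching' responses
late_match = round(0.136 * N_SUBJECTS)    # 12 'matching' responses

chi2_wer = chi_square_2x2(
    medial_match, N_SUBJECTS - medial_match,
    late_match, N_SUBJECTS - late_match,
)
```

Under these assumptions the statistic comes out at about 4.3, close to the reported value and above the 3.84 needed for significance at the .05 level with one degree of freedom.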
The contextualization of the early to medial F0 peak continuum with the
"Jetzt" introduction (Fig. 6) shows an abrupt change from 'matching' to
'non-matching' judgements in spite of the gradual change along the physical
dimension, and thus adds support to the assumption of a categorical
perception advanced in connection with the discrimination tests. Stimuli 1-4
represent one perceptual identification category, stimuli 6-8 a
different one. They may be regarded as two phonological categories, viz.
'early' and 'medial' F0 peaks. The discrimination of stimuli is sharpest
between these identification categories, which is precisely what the theory
of categorical perception postulates.
2.1.1.1.3 "Sie hat gelogen."
As the objection was raised that the responses in the natural stimuli
identification test of 2.1.1.1.2 might have been influenced by the modal
particle "ja" ("after all"; "I see") predetermining the judgement, a new set
of 9 context - peak combinations was generated by excising the signal
portions corresponding to "ja" from the existing ones used in the test of
2.1.1.1.2. This splicing was easy to perform because the word was bounded by
silence (= voiceless occlusions in [t] and [g]). Then two long versions of
the natural stimuli identification test according to 1.4 (3) were generated:
one with the stimuli "Sie hat ja gelogen." and one with "Sie hat gelogen."
These two tests were run at one week's interval with two groups of subjects
in the following sequence:
Group I (17 subjects) did the test with the "ja" stimuli first, the other
test second; for Group II (7 subjects) the order was reversed. Table IV
presents the results.
Table IV
Percentages of 'matching' responses for combinations of 3 contexts and early, medial or late F0 peaks in the sentences "Sie hat (ja) gelogen." in a natural stimuli identification test with 10 repetitions and two groups of subjects (I: 17 sbs; II: 7 sbs); A with, B without "ja"

                 (1) Wer         (2) Jetzt       (3) Oh
                 I       II      I       II      I       II
early    A       82.9    81.4    31.8    11.4    19.4    15.7
         B       86.4    85.7    39.4    38.6    20.6    18.6
medial   A       41.2    28.6    84.1    94.3    65.9    92.9
         B       48.2    57.1    80.0    90.0    64.7    84.3
late     A       25.3    22.9    81.8    98.6    78.2    97.1
         B       31.2    37.1    85.3    87.1    87.6    90.0

As in 2.1.1.1.2 (see Table III), the responses to the "ja" stimuli (= A) are
in all cases either clearly positive or negative, and significantly
different from equal distribution. Again the 'medial' and 'late' peaks
produce more similar judgement patterns than the 'medial' and 'early' ones,
and they are only significantly different for Group I in the "Wer" context
(χ² = 9.66, p = .01) and in the "Oh" context (χ² = 6.44, p = .05). The
strong distinction between 'early' and 'medial' and the much weaker
differentiation for 'medial' and 'late' has thus been confirmed. This
finding once more supports the hypothesis of a categorical switch from
'early' to 'medial' and a gradual change from 'medial' to 'late' peak
positions in the utterance "Sie hat ja gelogen."
For the stimuli without "ja", in principle the same data were obtained. Of
the 18 comparisons of the results for utterances with/without "ja" only four
are statistically significant according to χ² tests, the first one in Group
I, the others in Group II:
(a) the 'late' peak in the "Oh" context; χ² = 5.32, p = .05,
(b) the 'medial' peak in the "Wer" context; χ² = 11.67, p = .001,
(c) the 'early' peak in the "Jetzt" context; χ² = 13.75, p = .001,
(d) the 'late' peak in the "Jetzt" context; χ² = 6.89, p = .01.
In (a), (b), (c) the difference implies an increase in 'matching' answers
for stimuli without "ja", which is contrary to what would have to be
expected if the objection were valid. In the remaining case (d), however,
there is a decrease in 'matching' responses for stimuli without "ja", which
may be taken as an indication of a strengthening through the modal particle
"ja" of the meaning conveyed by intonation. But the results cannot be solely
determined by the particle, all the less so since this pairwise testing
increases the α error and may thus reject the null hypothesis of no
distinction between the two utterance types, although it is correct.
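
The four significant with/without "ja" comparisons (a) - (d) can be retraced
from the percentages of Table IV. The following is a minimal sketch, not the
author's procedure: it assumes that each cell rests on n = subjects x 10
repetitions (170 for Group I, 70 for Group II, as implied by the table
caption), that response counts can be recovered by rounding percentage x n,
and that the statistic is a Pearson χ² on a 2 x 2 table without continuity
correction. The function names are illustrative only.

```python
def count_from_percent(percent, n):
    """Recover an integer response count from a reported percentage (assumed rounding)."""
    return round(percent * n / 100)


def pearson_chi2_2x2(a_match, a_total, b_match, b_total):
    """Pearson chi-square (1 df, no continuity correction) for two proportions."""
    table = [
        [a_match, a_total - a_match],  # condition A: matching / non-matching
        [b_match, b_total - b_match],  # condition B: matching / non-matching
    ]
    row_sums = [a_total, b_total]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = a_total + b_total
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# (label, n per cell, % matching with "ja" (A), % matching without "ja" (B))
comparisons = [
    ("(a) late / Oh / Group I", 170, 78.2, 87.6),
    ("(b) medial / Wer / Group II", 70, 28.6, 57.1),
    ("(c) early / Jetzt / Group II", 70, 11.4, 38.6),
    ("(d) late / Jetzt / Group II", 70, 98.6, 87.1),
]

results = {}
for label, n, pct_a, pct_b in comparisons:
    chi2 = pearson_chi2_2x2(
        count_from_percent(pct_a, n), n, count_from_percent(pct_b, n), n
    )
    results[label] = round(chi2, 2)
    print(f"{label}: chi2 = {chi2:.2f}")
```

Under these assumptions the sketch yields the χ² values reported in (a) - (d),
which also supports the row and column assignment used for Table IV.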
A further objection might be that the order of the two tests had an
influence on the results: if the "ja" stimuli are tested first the pattern
would also be set for the stimuli without "ja". Group II, for which the
order was reversed, should thus produce a significantly smaller number of
'matching' responses for the stimuli without "ja" more frequently than Group
I, but the above data do not support this assumption. Moreover, the stimuli
without "ja" do not show significant distinctions between Groups I and II,
with the one exception of the 'medial' peak in the "Oh" context (χ² = 9.64,
p = .01). In view of the possible increase of the α error, we can thus say
that the test order did not have a significant influence on the response
patterns, which are basically determined by an intonational phonology, i.e.
by 'early' vs. 'non-early' F0 peak positions - less strongly by 'medial' vs.
'late' ones -, and which may be heightened, but not replaced by, other
formal means, such as modal particles.
2.1.1.2 "Es ist ja gelungen."
The question now arises as to whether the perceptually relevant timing
differences between different peak positions relative to stressed vowel
onset are transferable to other syllable structures and in what ways they
may have to be adjusted. The first syllable structure selected was the one
containing a phonologically short vowel, instead of a long one, in an
otherwise comparable segment chain: "Es ist ja gelungen." (see 2.1.1).
Fig. 7 shows the speech wave as well as the energy and F0 contours in the
natural medial-peak token selected for F0 peak shift. The test stimulus
generation followed the procedure of parallel shifts of both branches of the
peak contour (see 1.3). The step size was 30 ms, and one peak was located at
the boundary between the stressed-syllable initial consonant /l/ and the
stressed vowel /u/. Fig. 8 shows the 9 different peak positions used for the
stimulus generation. Only the quick serial discrimination test (see 1.4 (1))
was performed in the left-right sequence with 29 subjects.
Results
Table V presents the results.
Table V
Distribution of 'change has occurred' responses by 29 listeners in the left-right sequence of the serial discrimination test across the 9 stimuli with F0 peak shifts in "Es ist ja gelungen." (1 = left-most, 9 = right-most position)
First change perceived at
Further changes perceived at
4
5
5
19
Stimulus
6
5
Total 21 10
Fig. 7
Speech wave, energy and F0 contours (linear scale) in the natural medial-peak token of "Es ist ja gelungen." selected for F0 peak shift. The time marks indicate on- and offsets of /g/, /ə/, /l/ and /u/. The broken lines mark the left and right base points as well as the maximum of the peak configuration to be shifted.
Discussion
There is again an abrupt change in the response pattern as the F0 peak is
moved into the stressed vowel. The absolute timing of positions 5 and 6
after vowel onset, i.e. 30 ms and 60 ms, respectively, is exactly the same
as in the stimuli "Sie hat ja gelogen." (see 2.1.1.1.1). These data point to
an absolute time span of up to 60 ms into the stressed vowel that is
responsible for a phonological change from 'early' to 'medial' peak,
independent of the phonological vowel quantity and consequently of vowel
duration following the F0 peak, at least in disyllables. This finding means
that the 'medial' F0 peak has a later relative position in a short vowel
than in a long one, viz. closer to its offset, and this ties in with the
production and perception data in Contribution II (Gartenberg &
Panzlaff-Reuter, 1991, 5.2). This means, furthermore, that the series of 9
stimuli did not include a proper 'late' peak: it would have had to be
located well into the unstressed vowel /ə/.

Fig. 8
Speech wave and F0 contour (linear scale) in "Es ist ja gelungen." with time marks indicating the 9 F0 positions for complete-contour shift from left to right.
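
The timing of the peak positions in the "Es ist ja gelungen." continuum can
be laid out explicitly. The sketch below is illustrative only; it assumes
that the peak placed at the /l/-vowel boundary is stimulus 4, which is
inferred from the statement that positions 5 and 6 lie 30 ms and 60 ms after
vowel onset, and the function name is hypothetical.

```python
def peak_offsets_ms(n_positions, step_ms, onset_position):
    """Map 1-based stimulus numbers to peak time relative to vowel onset (ms)."""
    return {
        stimulus: (stimulus - onset_position) * step_ms
        for stimulus in range(1, n_positions + 1)
    }


# 9 positions, 30 ms steps; stimulus 4 assumed to sit at stressed vowel onset
offsets = peak_offsets_ms(n_positions=9, step_ms=30, onset_position=4)
for stimulus, offset in offsets.items():
    print(f"stimulus {stimulus}: {offset:+d} ms relative to vowel onset")
```

On this reading the right-most stimulus lies 150 ms after vowel onset, which
gives a concrete reference for the remark that no proper 'late' peak was
included in the series.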
2.1.1.3 "Sie hat ja gejodelt."
The next syllable structure to be considered contains a long stressed vowel
/o:/, as in "Sie hat ja gelogen.", but a glide /j/ with a much more gradual
articulatory/acoustic transition in the initial position of the stressed
syllable, instead of the more abrupt change associated with the initial
lateral /l/: "Sie hat ja gejodelt." (see 2.1.1). The question is whether
the more gradual spectral transition influences the perception of the F0
transition into the stressed vowel, because the F0 peak position relative to
vowel onset can be less clearly assessed. Fig. 9 shows the speech wave as
well as the energy, F0 and spectrum displays in the natural medial-peak
token selected for F0 peak shift. The test stimulus generation followed the
same procedure as in 2.1.1.2, with a step size of 35 ms, and one peak (nr 5)
being located at the temporal centre of the F2 formant transition in
/əjo:/. Fig. 10 shows the 11 different peak positions used for the stimulus
generation. Only the quick serial discrimination test (see 1.4 (1)) was
performed in the left-right sequence with 24 subjects.

Fig. 9
Speech wave, energy, F0 (linear scale) and spectral displays in the natural medial-peak token of "Sie hat ja gejodelt." selected for F0 peak shift. The time marks indicate the left base point (appr. in the temporal centre of the F2 transition for /əjo:/), the maximum F0 value and the right base point.
Fig. 10
Speech wave and F0 contour (linear scale) in "Sie hat ja gejodelt." with time marks indicating the 11 F0 peak positions for complete-contour shift from left to right.
Results
Table VI presents the results.
Table VI
Distribution of 'change has occurred' responses by 24 listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gejodelt." (1 = left-most, 11 = right-most position)

Stimulus                      4    5    6    7    8    9   10   11
First change perceived        5    9    8    2
Further changes perceived          1    8    2    3    6    1    6
Total                         5   10   16    4    3    6    1    6

Discussion
In this case the first change occurs less abruptly although it is still
clearly marked and coincides with the temporal half-way position of the F0
peak in the F2 transition. Further changes in the perceptual profile also
occur earlier than in the other stimulus types tested so far. All this goes
to show that a glide transition does interfere with the categorization of F0
peaks, but the general pattern of a phonological separation of 'early' and
'medial' peaks and a gradual switch from 'medial' to 'late' stays.
2.1.1.4 "Sie muß wohl arbeiten."
The next syllable structure chosen has creaky voice (the phonetic
realisation of a syllable-initial vowel prefixed by a glottal stop) before a
stressed long vowel: "Sie muß wohl arbeiten." (see 2.1.1). The question is
whether a creaky voice onset has the same effect on F0 peak categorization
as a glide. Fig. 11 shows the speech wave as well as the energy and F0
contours in the token selected for F0 peak shift. The test stimulus
generation followed the same procedure as in 2.1.1.2, with a step size of
35 ms and one peak (nr 5) being located at the onset of the more regular
glottal vibrations at the transition from /l/ to /an/. Fig. 12 shows the 11
different peak positions used for the stimulus generation. Only the quick
serial discrimination test (see 1.4 (1)) was performed in the left-right
sequence with 24 subjects.

Fig. 11
Speech wave, energy and F0 contours (linear scale) in the natural medial-peak token of "Sie muß wohl arbeiten." (with creaky voice transition instead of a glottal stop interruption of voicing) selected for F0 peak shift. The time marks delimit the F0 peak configuration that was shifted (left and right base points, and maximum).

Fig. 12
Speech wave and F0 contour (linear scale) in "Sie muß wohl arbeiten." with time marks indicating the 11 F0 peak positions for complete-contour shift from left to right.
Results
Table VII presents the results.
Table VII
Distribution of 'change has occurred' responses by 24 listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie muß wohl arbeiten." (1 = left-most, 11 = right-most position)

Stimulus                      4    5    6    7    8    9   10   11
First change perceived        3   20    1
Further changes perceived               1    2    5    5    8    2
Total                         3   20    2    2    5    5    8    2

Discussion
The first change occurs very abruptly in stimulus 5, i.e. about 35 ms after
the creaky voice transition. The perception of later changes is spread over
the remainder of the continuum without clear peaks in the response function.
There is a minor maximum at stimulus 10, i.e. at a similar position as in
the continuum across "Sie hat ja gelogen.". In every respect "Sie muß wohl
arbeiten." thus patterns with the latter, rather than with the case of a
glide transition in "Sie hat ja gejodelt.". What seems to be important for
F0 peak perception is the abrupt articulatory change in the transfer
function from /l/ to the stressed vowel (in "gelungen" as well as in "wohl
arbeiten"), not the gradual change in phonation from voice to creak to
voice, superimposed on the articulatory switch.
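
The abruptness of the first change and the minor later maximum can be read
off the tallies of Table VII directly. The following sketch is only a
bookkeeping aid: the assignment of counts to stimulus numbers is the one that
makes 'first' plus 'further' changes add up to the printed totals, and the
variable names are illustrative.

```python
# Counts per stimulus number for "Sie muß wohl arbeiten." (24 listeners)
first_change = {4: 3, 5: 20, 6: 1}
further_changes = {6: 1, 7: 2, 8: 5, 9: 5, 10: 8, 11: 2}
n_listeners = 24


def total_per_stimulus(first, further):
    """Add first and further change responses for every stimulus that has any."""
    stimuli = sorted(set(first) | set(further))
    return {s: first.get(s, 0) + further.get(s, 0) for s in stimuli}


totals = total_per_stimulus(first_change, further_changes)

# Stimulus at which most listeners reported their first change
modal_first = max(first_change, key=first_change.get)
modal_share = first_change[modal_first] / n_listeners

# Stimulus with the largest number of further changes (the minor maximum)
secondary = max(further_changes, key=further_changes.get)

print(f"first change: stimulus {modal_first} ({modal_share:.0%} of listeners)")
print(f"largest number of further changes: stimulus {secondary}")
print("totals:", totals)
```

The same bookkeeping applies to the other serial discrimination tables, where
the first-change counts likewise sum to the number of listeners.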
2.1.1.5 "Er ist ja geritten."
In the stimuli examined so far the course of the peak contour has been
manifested in the observable F0 values. This changes when the post- and/or
pre-vocalic consonant in the stressed syllable associated with the peak is
voiceless. Now part of the contour has to be reconstructed before a peak
shift becomes possible. Fig. 13 shows the speech wave as well as the energy
and F0 contours in a natural medial-peak token of "Er ist ja geritten."
selected for F0 peak shift (see 2.1.1). The test stimulus generation
followed the same procedure as in 2.1.1.2 with a step size of 30 ms and 15
peak positions from left to right starting at the beginning of /ə/. The
quick serial discrimination test (see 1.4 (1)) was performed informally in
the left-right sequence by the experimenter. Again there was an abrupt
change in perception as the peak entered the stressed vowel. But as the peak
was moved into the voiceless section of /t/ it lost its characteristics,
becoming lower and lower in pitch. This proves that the maximum value of a
peak contour must be present in the signal for identification: it is not
reconstructed by a listener from surrounding values of the rising and
falling branches, whereas a low right base point may be missing due to F0
contour truncation before voicelessness in final syllables (see Gartenberg &
Panzlaff-Reuter, 1991, 3.) without detriment to the peak characteristics (on
the contrary, there must be truncation in certain contexts to guarantee
pattern identity).

Fig. 13
Speech wave, energy and F0 contours (linear scale) in the natural medial-peak token of "Er ist ja geritten." selected for F0 peak shift. The time marks indicate on- and offsets of /g/, /ə/, /ʁ/, /ɪ/, /t/, /n/. The dotted line represents the reconstructed F0 interpolation of the right branch of the peak contour. The broken lines mark the left and right base points as well as the maximum of the peak configuration to be shifted.
A further shift of the peak contour maximum to the onset of voicing in the
final /n/ approximates the F0 configuration found in natural productions of
late peaks (see Fig. 14), but the auditory impression is still that of a
medial peak, not of a late one. A comparison of Figs. 13 and 14 shows that
the final nasals in medial and late peaks differ in amplitude and mode of
vocal fold vibration: in medial peaks (and the same would apply to early
ones), the low F0 fall at the end of an utterance is accompanied by a drop
in source amplitude, which weakens unstressed vowels and sonorants
considerably, often reducing them to creaky voice and to irregular breathy
glottal pulses. In late peaks, this decline is moved to the right following
the later F0 fall, thus keeping a high source amplitude at the onset of
unstressed vowels and syllabic sonorants; on the other hand the low F0
stretch in the stressed vowel before the peak gets its intensity reduced. So
there is a natural parallelism in the time courses of F0, source amplitude
and sound intensity for the three peak contours. If it is destroyed, the
perceptual pattern identity may be lost.

Fig. 14
Speech wave, energy and F0 contours (linear scale) in a natural late-peak token of "Er ist ja geritten." The time marks indicate on- and offsets of /g/, /ə/, /ʁ/, /ɪ/, /t/, /n/.

Thus a late peak, positioned at the sonorant voicing onset after a voiceless
obstruent, can only be successfully reconstructed by a listener if the F0
descent to the terminal low level has a high enough source amplitude to
guarantee sufficient intensity in the final sonorant for the high falling F0
contour to be auditorily monitored. But a natural medial peak utterance with
its low final intensity and glottal irregularity lacks these attributes and
cannot be turned into a late peak percept simply by an F0 shift into the
appropriate location. The amplitude and duration of the final sonorant have
to be raised considerably at the same time and the mode of vibration
changed. This can be achieved by transferring the final /n/ from the late
peak stimulus. Contrariwise, with a late peak stimulus as point of departure
a perceptually convincing medial peak pattern can only be generated if in
addition to the peak shift the final sonorant is drastically lowered in
amplitude and shortened. This has also been reproduced in a RULSYS TTS
formant synthesis-by-rule (Kohler, 1991f).
2.1.1.6 "Sie hat ja gestritten."
If a short stressed vowel is not only followed but also preceded by
voiceless obstruents the masking of peak height as the maximum value is
moved into the voiceless section arises syllable-initially as well. Fig. 15
shows the speech wave as well as the energy and F0 contours in a natural
medial-peak token of "Sie hat ja gestritten." [zi hat ja gə ˈʃtʁɪtn] ("She's
been quarrelling."), where F0 sets in higher in the stressed vowel compared
with "geritten" of 2.1.1.5, because the initial cluster /ʃtʁ/ is much longer
than the initial /ʁ/, and F0, rising from the left base point in /ə/, has
thus reached a higher value at vowel onset. In addition, there is a CF0
increase caused by the preceding voiceless fricative (see Gartenberg &
Panzlaff-Reuter, 1991, 3.). "geritten" and "gestritten" converge, however,
in having their F0 maximum close to vowel offset, as is usual for medial
peaks in short stressed vowels before an unstressed syllable (see loc. cit.,
5.2.).

Fig. 15
Speech wave, energy and F0 contours (linear scale) in a natural medial-peak token of "Sie hat ja gestritten." The time marks indicate on- and offsets of /g/, /ə/, /ʃ/, /t/, /ʁ/, /ɪ/, /t/, /n/.

The peak shift in "Sie hat ja gestritten." was tested interactively using
the TTS research tool (see 1.5 (b)) with the rule-generated medial peak
position as a point of departure (see Fig. 16a) and a step size of 20 ms in
complete parallel shift. Significant F0 values in the rule-generated
utterance at /ə/ and /ɪ/ on- and offsets are 84 Hz, 88 Hz, 144 Hz and
140 Hz, respectively. When the peak is located 40 ms before the beginning of
/ɪ/, the peak is clearly 'early'; at /ɪ/ onset (see Fig. 16b) it has changed
to 'medial'. The corresponding significant F0 values in these two positions
are 88 Hz, 104 Hz, 138 Hz, 86 Hz, and 84 Hz, 94 Hz, 148 Hz, 108 Hz. So the
change from 'early' to 'medial' occurs quite abruptly in this syllable
structure as well when the F0 rise across the voiceless cluster becomes more
extensive than the fall, and the F0 offset in the stressed vowel is in the
middle of the F0 range between maximum and minimum values in the utterance.
Thus in this sentence, the switch from 'early' to 'medial' occurs before
there is an initial F0 rise in the stressed vowel, which is different from
all the other syllable structures, with initial voiced consonants, analysed
so far. The reason for this difference lies in the CF0 interference, which
is obviously accounted for in the perception process.
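
A complete parallel shift of a peak contour in 20 ms steps can be pictured
with a small sketch. It is not the RULSYS implementation: the three set
points (left base point, maximum, right base point), their times and their
F0 values are hypothetical, and the raised-cosine formula is merely one
common reading of "cosine interpolation between set F0 points".

```python
import math


def cosine_interpolate(t, t0, f0_start, t1, f0_end):
    """Raised-cosine interpolation between two set F0 points (assumed form)."""
    u = (t - t0) / (t1 - t0)
    weight = (1.0 - math.cos(math.pi * u)) / 2.0
    return f0_start + (f0_end - f0_start) * weight


def f0_at(points, t):
    """Evaluate the contour at time t (ms); flat outside the set points."""
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    for (t0, a), (t1, b) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return cosine_interpolate(t, t0, a, t1, b)


def shift_contour(points, shift_ms):
    """Parallel shift: move every set point by the same amount in time."""
    return [(t + shift_ms, hz) for t, hz in points]


# Hypothetical peak configuration: (time in ms, F0 in Hz)
peak = [(0, 88.0), (120, 144.0), (240, 86.0)]

# Two 20 ms steps to the left, as in the interactive test procedure
for step in range(3):
    shifted = shift_contour(peak, -20 * step)
    maximum_time = shifted[1][0]
    print(f"shift {-20 * step:+d} ms: maximum at {maximum_time} ms, "
          f"F0 at 100 ms = {f0_at(shifted, 100):.1f} Hz")
```

Because all set points move together, the shape of both branches is
preserved and only the alignment with the segmental string changes.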
Fig. 16
(a) RULSYS output of "Sie hat ja gestritten." (= (default) medial peak), (b) 60 ms peak shift to the left (= first clear medial peak position in left-right move). F0 (in Hz; square parameter and cosine interpolation between set F0 points) and phonetic transcription aligned to the time scale (segment and cumulative durations in cs); EO = ə, SH = ʃ.
2.1.1.7 Conclusion
The discrimination and identification tests of 2.1.1.1-5 all point in the
same direction, viz. the perceptual exploitation of different F0 peak
synchronizations with stressed vowel onsets and of the ensuing low (falling)
vs. high (rising) F0 as a psychophonetic basis for phonological
categorization at the level of intonation. For an 'early' peak, F0 is low in
the stressed vowel because it is on its descent at the vowel onset and, in
complete parallel shift, also reaches its low end point early. If there is
voicing before the stressed vowel, the F0 point at vowel onset is preceded
by a higher F0 value so that F0 falls into the accented syllable. If there
is no previous voicing in utterance-initial position of a stressed syllable
beginning with voiceless consonants, F0 at vowel onset has as low a value as
would result from an F0 descent across the stressed syllable periphery to
strengthen the low F0 level in the accented vowel. The 'early' peak is thus
characterized by a high prenuclear F0 - either directly observable or by
extrapolation from the F0 start in the stressed syllable nucleus - and by a
low F0 in the latter.
Contrariwise, the 'medial' peak has a low prenuclear F0, an F0 rise (of at
least 2 semitones from nuclear vowel onset to peak value, according to
interactive testing), and a subsequent descent to a low F0 at a later point
in time than in an 'early' peak. The amount of descent depends on syllable
structures, and the rise may be absent because of CF0 interference,
resulting in a higher F0 starting point at nucleus onset. So in all cases,
the 'medial' peak accentuates a higher F0 level in the stressed vowel than
the 'early' peak. In a 'late' peak the rise is extended because it occurs
later, but it is also prefixed by a stretch of low level F0.
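
The 2-semitone criterion can be made concrete with the standard conversion
between frequency ratios and semitones (12 semitones per octave). The sketch
below is illustrative; the function names and the 100 Hz onset value are
hypothetical and not taken from the experiments.

```python
import math


def semitones(f_start_hz, f_end_hz):
    """Interval from f_start_hz to f_end_hz in semitones (positive = rise)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def peak_for_rise(onset_hz, rise_st=2.0):
    """F0 value reached by a rise of rise_st semitones above onset_hz."""
    return onset_hz * 2.0 ** (rise_st / 12.0)


onset_hz = 100.0  # hypothetical F0 at nuclear vowel onset
needed_peak_hz = peak_for_rise(onset_hz)
print(f"a 2-semitone rise from {onset_hz:.0f} Hz reaches {needed_peak_hz:.1f} Hz")
print(f"check: {semitones(onset_hz, needed_peak_hz):.2f} semitones")
```

Since the measure is a ratio, the required rise in Hz grows with the F0 level
at vowel onset: roughly 12 % of the onset value for 2 semitones.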
Interactive perceptual testing (see 1.5) has further shown that the 'early'
and 'medial' peak patterns do not lose their characteristic auditory
difference if their right base points have the same F0 value at the same
time after nucleus onset (due to a flattening of the F0 descent in the
'early' peak). It is thus the F0 differentiation in the initial part of the
nuclear vowel that counts as the distinctive feature. If in a RULSYS
generated 'early' peak of "Sie hat ja gelogen.", F0 is kept at the peak
maximum value up to a point including the first 3 F0 frames of 10 ms each in
the stressed vowel, instead of having an immediate fall, the auditory
characteristics of the 'early' peak are not lost. On the other hand, if,
starting from a 'medial' peak configuration in the above utterance (with /l/
onset = 88 Hz, /l/ offset = 116 Hz, stressed vowel onset consisting of the
F0 sequence 122 - 128 - 128 - 130 Hz), the /l/ offset and all the vowel
onset frames are raised to 130 Hz, the 'medial' peak is changed in the
direction of an 'early' one although the only difference between the two
patterns now lies in the rise being completed in the /l/ rather than
continuing into the accented vowel, i.e. in the presence or absence of a
nucleus-initial rise. Admittedly, the difference between the 'early' and
'medial' patterns is clearly weakened by this modification, but it shows
that a 'medial' peak needs an F0 rise in the nucleus after a sonorant.
2.1.2 Munich experiments on German
The results of the Kiel experiments naturally prompted the question as to
how widespread the phonological categorization of F0 peak positions is in
the intonation system of German in general. Therefore the serial
discrimination and randomized paired discrimination tests in the ascending
ordering (2.1.1.1.1) as well as the synthesized stimuli identification test
(2.1.1.1.2) were repeated in the Phonetics Institute of Munich University³
with groups of listeners of a Bavarian dialect background. The tests were
performed in the Institute language laboratory and the stimuli were
presented over headphones. Tapes and instructions were identical to the ones
in the Kiel experiments. 11 listeners participated in the serial
discrimination test and in the identification test, 14 in the randomized
paired discrimination test.
Results
Table VIII presents the results of the serial discrimination test.

³ I wish to thank Dr Anton Batliner for organizing the test runs.

Table VIII
Distribution of 'change has occurred' responses of 11 Munich listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most)

Stimulus                      4    5    6    7    8    9   10   11
First change perceived        1    9    1
Further changes perceived                    4    1    3    2    1
Total                         1    9    1    4    1    3    2    1

Fig. 17 shows the results of the randomized paired discrimination test.
Fig. 17
Discrimination functions in the randomized paired discrimination test, showing percentage of 'different' judgements for utterance pairs of "Sie hat ja gelogen." with 0-step (a), 1-step (b), or 2-step (c) distances of F0 peak positions, in the ordering left-right. The stimulus numbers refer to the second stimulus. 14 sbs., n = 28 at each data point in (a), (b), (c).
Fig. 18 shows the results of the synthesized stimuli identification test.

Fig. 18
Identification function in the synthesized stimuli identification test, showing percentage 'matching' judgements for 8 stimuli "Sie hat ja gelogen." with F0 peak shift from left to right in the context "Jetzt versteh' ich das erst." 11 subjects; for each stimulus n = 110.
Discussion
The comparison of Tables I and VIII shows that the Munich group has the same
type of response pattern with a maximum for stimulus 5 and a large scatter
for further changes in the series from stimulus 7 to 11. However, due to the
much smaller number of subjects in the Munich group, the minor peak in the
response curve does not show up so clearly. The results of the serial
discrimination test are supported by those of the randomized paired
discrimination test of Fig. 17 (in comparison with Fig. 4). There is again a
maximum of sensitivity in the F0 peak shift continuum in the area of stimuli
5/6 and a second, weaker sensitivity peak at stimulus 9, but the sensitivity
area is narrower, with the pairings 5 - 6 and 3 - 5 not being included in
the maxima, and there is no peak of 'false alarms' for the 5 - 5 pair. The
identification function of Fig. 18 (in comparison with Fig. 5) points to the
same two perceptual identification categories comprising stimuli 1 - 4, on
the one hand, and stimuli 6 - 8, on the other, but with a lot more noise (an
offset of about 20% - 30%) in the first category and at stimulus 5, the
boundary between the two. We may again associate this partitioning with the
two phonological categories of 'early' and 'medial' F0 peaks. The Munich
results are thus in agreement with the Kiel data and allow the
generalization of a perceptual and phonological categorization of F0 peak
positions relative to stressed vowel onset for the intonation of German
across regional varieties.
2.1.3 Experiments on other languages
What remained an open issue after the very clear results of the experiments
on different varieties of German was whether we are here dealing with a
phonological categorization of German, albeit on a psychophonetic basis, or
whether the phenomenon is more widespread or even a language universal,
based on a specific feature of human speech perception in general. The
hypothesis that such a general psychophonetic principle does operate in the
perception of F0 patterns in human speech, irrespective of the phonological
categorization and the linguistic functions it may serve in any particular
language, leads to the assumption that native speakers of languages other
than German listening to German utterances should be able to detect changes
in F0 peak positions in relation to general human consonant - vowel
sequences, even without knowing any German at all, and therefore without
assessing the stimuli semantically, but simply on the basis of general
phonetic properties of human speech. If the results of such listening tests
were to coincide with the results for German, this would be a strong
indication of a language-independent psychophonetic mechanism. As a first
step in this direction, the serial discrimination test in the ascending
ordering of 2.1.1.1.1 was run with two groups of non-German speakers:
(a) 25 Russian speakers in Leningrad¹, who had no knowledge of German and
who either worked on Russian, English or French phonetics (11) or were
students in their first or second year in the Philological Faculty (14).
A copy, on standard cassette, of the original series of 11 stimuli of
"Sie hat ja gelogen." with F0 peak shifts from left to right was
provided. The subjects listened to the series twice and then had to
cross, on a prepared answer sheet, the number of the stimulus in the
series that they perceived as being most clearly different from the
rest.
¹ I wish to thank Prof. Natalia Svetozarova of Leningrad University for administering the test in her Phonetics Laboratory.
(b) 40 native speakers of 13 different languages attending German language
courses at beginners or advanced level at Kiel University. The original
test tape was presented to them over loudspeaker in four subgroups
(twice 14 and twice 6 listeners) in their relatively quiet but
acoustically non-treated classroom. The answer-sheets and the procedure
were the same as for the corresponding test with German listeners in
2.1.1.1.1. A great deal of time and care was spent on explaining the
test instructions in German.² Table IX provides the background
information about the 40 listeners.
Table IX
Background information about the 40 foreign listeners in the discrimination test
Native language   Native country   Beginners  Advanced  Total
Farsi             Iran             9  1  10
Polish                             4  2  6
Portuguese        Brazil           3  1  4
Korean                             3  1  4
Spanish           Chile            2  1  3
Spanish           Argentina        1  1
English           USA              3  3
English           England          2  2
Arabic            Israel           1  1
Japanese                           1  1
Thai                               1  1
Nepali                             1  1
Chinese                            1  1
Singhalese        (Sri Lanka)      1  1
Swedish                            1  1
Total                              28  12  40
Results
Table X presents the results of the Russian group. Although the instruction
demanded a single response, some subjects indicated more than one stimulus
as being clearly different.
Table X
Frequency distribution of 'clearly different' responses by 25 Russian listeners without any knowledge of German in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus            2   4   5   6   7   8   9  10
Phoneticians            1   1   7   4   1   1   1
Non-phoneticians    1   1   2  11   3   1   1
Total               1   2   3  18   7   2   2   1

Table XI presents the results of the multilanguage group, restricted to the
perception of the first change in the series. The one Chinese, one Farsi
and one Korean speaker did not perceive any change at all, although the
other three Korean speakers did.
Table XI
Frequency distribution of 'first change has occurred' responses by 40 listeners of 13 different languages, in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus                  4   5   6   7   8   9  11
First change perceived    2   9  19   2   2   2   1
(3 listeners perceived no change at all.)

² Robert Gartenberg carried out the tests and compiled the data.
Discussion
Both groups, in spite of their language diversity, converge in having a
clear maximum of the response function for stimulus 6. This is a higher
position than for the German listeners, who favoured stimulus 5, but who
also provided a substantial portion of their answers for stimulus 6. These
results are a very strong indication that the dichotomy between an 'early'
and a 'medial' peak position is indeed a general psychophonetic,
language-independent phenomenon, which may then be incorporated into the
language-specific phonology at different levels.
Thus in Mandarin Chinese (see Gårding, Kratochvil, Svantesson & Zhang, 1985)
it is put to use in the tone system, differentiating between the
continuously (low) falling F0 of tone 3 (e.g. in ma3 'horse') and the (high)
rising-falling F0 of tone 4 (e.g. in ma4 'to curse'). It is worth noting in
this connection how a Chinese speaker (Dr Chilin Shih, research worker at
Bell Labs, Murray Hill in 1986) classified the 11 stimuli of the left-right
series "Sie hat ja gelogen." without any knowledge of German. Without the
slightest doubt she associated stimuli 1 - 4 with tone 3, stimulus 5 with
tone 4; later in the series tone 4 changed to the combined tone 2 - 4; but
whereas the switch from tone 3 to tone 4 occurred abruptly in the succession
of stimuli 4 and 5, the change from tone 4 to tone 2 - 4 was gradual and
could be less easily located (at stimulus 9 the change had definitely taken
place). This informal test shows (a) that tones 3 and 4 in Mandarin Chinese
are differentiated by the F0 maximum relative to the vowel onset, a
prenuclear F0 peak signalling the former, a nuclear F0 peak the latter, and
(b) that these categorizations are possible on the language-independent
basis of human speech perception in general.
A different case of exploiting the perceptual relevance of earlier vs. later
peaks is that of the acute and grave tonal word accents in Norwegian and
Swedish (Gårding, 1979, 1982). And finally, intonation languages like German,
English and French make use of the different peak patterns in their
intonation phonologies and relate them to semantic distinctions along the
'closed/open to argument' dimension (see 2.2). English and French can
differentiate in the same way as German in their corresponding sentences
"She's been lying." and "Elle a menti." It is an interesting research
objective for the future to investigate the different linguistic functions
the dichotomy can be put to in the world's languages.
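The response maxima discussed in 2.1.2 and 2.1.3 can be re-derived from the frequency counts reported for the Munich, Russian and multilanguage groups. The following Python sketch is only an illustration: the dictionary and function names are assumptions introduced here, and the counts are copied from the tables (first-change responses for the Munich and multilanguage groups, total responses for the Russian group).

```python
# Response counts per stimulus number, copied from the tables in 2.1.2 and 2.1.3.
# All variable and function names are illustrative assumptions.
munich_first_change = {4: 1, 5: 9, 6: 1}
russian_total = {2: 1, 4: 2, 5: 3, 6: 18, 7: 7, 8: 2, 9: 2, 10: 1}
multilanguage_first_change = {4: 2, 5: 9, 6: 19, 7: 2, 8: 2, 9: 2, 11: 1}


def modal_stimulus(counts):
    """Return the stimulus number that received the most responses."""
    return max(counts, key=counts.get)


def share(counts, stimulus):
    """Return the proportion of all responses that fall on one stimulus."""
    return counts[stimulus] / sum(counts.values())


groups = {
    "Munich": munich_first_change,
    "Russian": russian_total,
    "Multilanguage": multilanguage_first_change,
}

for name, counts in groups.items():
    peak = modal_stimulus(counts)
    print(f"{name}: maximum at stimulus {peak}, share {share(counts, peak):.2f}")
```

The sketch only locates the maximum of each response function; it makes no claim about statistical significance, which the text does not report.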
2.2 Semantics
The question to be pursued now is what linguistic functions are carried by
the phonological contrasts of 'early' vs. 'medial' vs. 'late' peaks in
German. In particular, it is to be ascertained whether the categorical
change from 'early' to 'medial' and the more gradual change from 'medial'
to 'late' peak positions are mapped onto a semantic space in a congruent
fashion. Some insight was gained from the data obtained through controlled
dialogues (Gartenberg & Hertrich, 1988). Furthermore, in the Kiel and Munich
serial discrimination tests (2.1.1.1 and 2.1.2) with "Sie hat ja gelogen.",
subjects were also asked to paraphrase the meanings of the utterances
corresponding to the three peak positions (see 1.4 (1)). Here are some of
the answers.
Kiel
(a) original utterance (b) first change (c) further change
Statement of a fact or end of an argumentation.
Introductory statement, beginning of argumentation.
As (b), but greater insistence.
Justifying statement, establishing causality relation to what precedes.
Report to a third party that she has been lying; the speaker stresses a fact resulting from the environment.
Slight surprise and reproach over behaviour.
Indignant statement.
Strong surprise.
Surprise statement.
Statement, explanation.
Statement without surprise.
Statement of a fact, e.g. in the context "the punishment is justified."
Surprise, astonishment.
Tendency towards indignation.
Indignation, e.g. in the context "I would not have expected this of her."
Justifying statement at the end of a chain of arguments.
Statement, report.
Matter-of-fact statement.
Declarative, expected, matter-of-fact.
Confirmation of a fact; it is obvious that she's been lying.
Statement of fact that the speaker discovered long ago.
Beginning of an argumentation, slight indignation.
Sudden realisation of lying.
Statement with the expression of indignation.
Greater indignation.
Question.
Statement with the expression of astonishment.
Unexpected, indignant. "I can't believe it."
Surprising fact for the speaker, the lie is unexpected.
Explanation of a fact. Indignation.
Munich
(a) original utterance (b) first change (c) further change
Confirmation of what is already known.
Surprise statement, reproachful undertone ("... I would not have expected that.")
Pure astonishment.
Statement.
Neutral statement.
Astonishment.
Exclamation.
Disappointment.
Exclamation with incredulity.
Simple statement of a fact with which the speaker seems to be familiar.
Surprise, indignation, speaker's reaction to a fact he did not know before.
But we knew that before anyway.
Matter-of-fact statement.
Amazement.
Unforgivable observation.
Surprise, astonishment.
Astonishment.
Matter-of-fact statement, there was not really any doubt about her behaviour.
It was not certain if she would tell the truth or not.
Contrary to expectation she has been lying, comes as a surprise; gradual transition from (b) to (c).
Another paraphrasing experiment was carried out with the following sentences
containing first an 'early' and then a 'late' peak (the underlined word
received the peak accent):
(1) "Wer war das?" ("Who did that?", "Who was that?")
(2) "Mach' bitte das Fenster zu!" ("Shut the window, please.")
(3) "Was ist denn ein Atom?" ("What's an atom?")
(4) "Sehen wir uns also morgen?" ("So we are going to see each other
tomorrow.")
Each pair of 'early/medial' peak utterances was selected from 10 naturally
produced repetitions (speaker KK) and played to listeners as often as they
liked. They had to write down their assessment of the situation or speaker
attitude that fitted each sentence and peak pattern. Here are some of the
answers:
(1) 'early'
- Several people are asked: which of you was it?
- In sense of "who did that?"
- The speaker asks the listener a question, he knows the answer himself and urges the listener to give him the right answer.
- Reproachful question: the person concerned has to expect a reprimand for some mischief.
- Speaker sounds superior, demanding, tries to be distant.
(1) 'medial'
- Speaker A asks speaker B the name of a third person.
- Somebody unknown to the speaker is passing by and the speaker asks a listener the name of the unknown person.
- The speaker wants to know something unknown to him, e.g. he has just seen somebody whose name he does not know.
- Neutral, positive question, e.g. a teacher's question: "Charles the Great, who was that?"
- Speaker asks in a familiar way, is on the same level as the person spoken to.

(2) 'early'
- Speaker, rather annoyed, asks somebody standing at the open window to shut it.
- Order to shut the window at once.
- Tone of a command, slightly threatening, repeated order to the naughty son, object of shutting is self-evident, could be left unmentioned.
(2) 'medial'
- Request to shut the window, not the door.
- Friendly request to shut the window, not the door, because, e.g., otherwise grandmother might catch a cold.
- Opposed to "shut the door please," object has to be defined specially.

(3) 'early'
- Speaker asks the listener for specific information to test him, i.e. the speaker knows the answer to the question himself.
- Speaker knows the answer already, e.g. could be a teacher.
- Teacher to his class, rhetorical.
- Exam question.
(3) 'medial'
- Speaker does not know the answer himself and asks the listener to give him the information.
- Speaker does not know the answer, asks a real question.
- After the teacher has provided the explanation a pupil not having heard it asks what an atom is.
- Continuation in a chain of questions.

(4) 'early'
- Statement, all settled, routine utterance.
- At the end of an ordinary conversation, routine.
- Statement.
(4) 'medial'
- Tomorrow, not today or the day after tomorrow or any other day the speaker might prefer.
- Speaker mentioned tomorrow for the next meeting and then changed it; to make sure he repeats the new arrangement at the end of the conversation.
- Confirmation of tomorrow as against the day after.

The meanings that may be abstracted from the dialogue data and the
paraphrases for the three peaks are:
(a) early: established fact; no room for discussion; final summing up of
argument
(b) medial: new fact; open for discussion; starting a new argument
(c) late: emphasis on a new fact and contrast to what should exist or exists
in the speaker's or hearer's idea.
The F0 peak differences are thus not associated with stress, which remains
the same in all three cases, but with intonation, which is in turn linked to
semantic categories expressing the speaker's evaluation of facts in respect
of expectations. As regards the distinction between 'medial' and 'late'
peaks, similar categorizations have been proposed for English, the 'late'
peak expressing the speaker's incredulity or his uncertainty (Ward &
Hirschberg, 1985; Pierrehumbert & Steele, 1989).
The phonetic differentiation between the three peaks and the associated
changes of meaning point to another instance of what Ohala (1983, 1984) has
called the frequency code: low frequencies signal domination, high ones
submissiveness. Of course, in the case under discussion, this link has been
given linguistic plasticity in two ways:
- the synchronization with the syllable structure, i.e. with human sound
articulation,
- a semantic denotation, rather than an expressive meaning.
But the semantics of 'closed vs. open to argumentation' are intimately
related to 'domination vs. submissiveness'. It is, however, not necessarily
the domination or submissiveness of the speaker that is signalled here; it
may be that of the situation or of other communicative partners setting an
established fact or leaving the door open for change and new things. These
are the basic, underlying meanings of 'early' vs. 'non-early' peaks. The
actual meanings observable on the surface in individual utterances and
contexts depend on the interplay of these basic semantics of intonation
contours with the semantics at the levels of syntactic structures, within
and across sentences, and of the lexicon.
If an early peak is used in questions, whose semantics suggest openness,
then the question gets special connotations in keeping with the semantics of
the early peak intonation: the question is asked with a presumed knowledge
of the answer, as in
- the teacher's question "Wer war das?" ("Who did that?" = I'll find
out anyway; possible threat)
- the exam question "Was ist ein Phonem?" ("What's a phoneme?")
- the résumé asking for confirmation "Das Phonem ist also eine
Lautklasse." ("So the phoneme is a sound class." = Can we keep that in
mind and start from there, moving to the next question?)
If an imperative construction gets an early peak, there is again a
contradiction between the signalling, through intonation, of the expected
completion of an action, and, through syntax, of the order to carry it out.
This contradiction produces the connotation of annoyance and impatience at
the delay of an action. "Mach' bitte das Fenster zu." ("Shut the window,
please.") may become a threat in spite of "bitte". The early peak can also
get the connotation of resignation because nothing can be done to alter the
established facts: "Nun gut. Wie Sie wollen." ("Alright. As you like."). The
resignation is all the greater the earlier the F0 fall and the longer the
low F0 tail on "gut" and "wollen". In either-or questions, an early peak in
second position signals a choice within a closed set of alternatives,
whereas a succession of medial peaks with low F0 in between refers to an
open set of alternatives, which are simply given as possible examples from a
longer list: "Willst du Tee oder Kaffee?" ("Would you like tea or coffee?").
Rising patterns instead of medial peaks convey the same open set but sound
less categorical and more friendly.
In the late peak, the preceding low F0 interferes with the openness
connotation of the rise and introduces the speaker's difference of opinion,
which is rated very high in relation to observable facts. The speaker
stresses the difference between his opinion or way of assessing things and
the opinion of others or facts or beliefs as to how things should be. This
leads to meanings of surprise, incredulity, "that can't be true",
insinuation, talking down, changing in degree according to the amount of
peak shift to the right. Very often the late peak is combined with modal
particles, reinforcing their meanings, such as (word with late peak accent
underlined)
- "ja" in exclamations
"Da steht ja eine Kirche!" ("Oh, there's a church!"), expressing
surprise because reality differs from the speaker's view,
- "doch" in statements and imperatives/requests
"Er ist doch gekommen." ("He's come, what are you going on about."),
"Setzen Sie sich doch." ("Do sit down." = "You are still standing,
it is my opinion that you should be sitting."), expressing
opposition to what the speaker is confronted with,
- "etwa" in questions
"Hast du das etwa gekauft?" ("You did not buy that, did you."),
expressing incredulity, which is all the stronger the greater the
emphasis signalled by peak height.
In these examples the modal particle may be missing, but the presence of a
late peak still conveys the meaning of a contrast between the speaker's
observation and his opinion on it. In utterances such as "Ja." ("Yes."),
"Natürlich." ("Of course.") the speaker stresses his own opinion and rejects
any opinion to the contrary, producing a supercilious, arrogant,
presumptuous undertone. Talking to a child, "Wie heißt du denn?" ("What's
your name?") stresses the distance between the speaker and the addressee and
gives the impression of talking down. In a sentence like "Hast du bei
Christine übernachtet?" ("Did you spend the night with Christine?") there
are indications that the addressee has done just what the speaker suggests,
but should not have because this clashes with moral standards which the
speaker purports to hold, resulting in reproach or insinuation; combined
with a high peak it suggests incredulity.
The important lesson to be learnt from these data is that there is a direct
link between particular F0 contours and specific meanings, but this link is
not one on the surface, but underlies the actual meanings, which are the
result of an interaction of various meaning levels. Social psychologists
(e.g. Scherer, 1985) who have been concerned with these direct
substance/expressive meaning relations have often lacked a detailed insight
into the phonetic and semantic structures of language as a prerequisite to a
successful interpretation. The corollary of the phonetic-semantic
explanations offered for the use of different F0 peaks in intonation is that
these phonological intonation categories, in their association with meanings
relatable in one form or another to the basic ones given, must be at least
widespread in languages, provided the phonological dichotomy has not already
been booked at some other level, e.g. tone or word accent.
2.3 General discussion concerning Hypothesis (2)
The perception experiments of 2.1 and the semantic evaluation derived from
paraphrasing tasks in 2.2 have largely confirmed Hypothesis (2) of
Contribution I (Kohler, 1991b): the shift of an F0 peak in a single-accent
terminal utterance between a prenucleus and a nucleus position results in a
categorical change of perception, which is correlated with an equally
categorical semantic switch along the dimension 'established/new' or
'closed/open to argumentation'; the corresponding realignment to the right
produces a gradual auditory change correlated with a semantic continuum
expressing degrees of distance which the speaker establishes between himself
and the world as it presents itself to him. This degree of distance rather
than the degree of emphasis, as formulated in Hypothesis (2), is the
semantic basis of the 'medial' to 'late' peak positions, emphasis being
correlated with peak height.
3. Intonation and stress
It has already been pointed out that the three F0 peak positions discussed
in Section 2. represent different phonological categories of intonation
associated with the same stressed syllable. So intonation must be
differentiated from stress, through which a syllable in a chain is selected
and marked for an intonation peak (or valley) to be hooked onto. But the
stress feature may be chosen for different syllables in a sequence, and thus
a shift of an F0 peak (or valley) position from one syllable to another can
also change the stress position in a syllable chain, not just the intonation
peak (or valley) associated with it. F0 peaks can therefore become cues to
stress beside being cues to intonation. Then two questions arise:
(a) Under what conditions is an F0 peak shift (without concomitant changes
in sound duration and intensity) sufficient to shift stress to a
different syllable? Two cases have to be distinguished: the stress
pattern changes, but the peak pattern stays, or both change. In
principle, at each stress position three intonation peaks are possible.
(b) How can the stress and intonation functions of F0 peaks be
differentiated, and in what ways do they interact?
These questions relate to the level of lexical stress or of sentence stress
because words in sentences do not all retain their stresses. 3.1 deals with
the former, 3.3 with the latter. In 3.2 the importance of duration for the
signalling of stress, in addition to F0, will be discussed. Finally, 3.4
will deal with the perceptual ambiguity between one and two accents combined
with conflicting intonation patterns, and 3.5 will enquire into the
relevance of intensity for the cuing of stress and intonation.
3.1 Lexical stress
German offers good examples for testing the issues of stress signalled by F0
peak position and of stress and intonation interaction at the lexical level
because it has minimal verb pairs, with either prefix or stem stress, which
can occur in the same natural sentence frame, e.g. "Er wird's wohl
umlagern." [ɛɐ vɪɐts vol ˈʊmlaːgɐn (ʊmˈlaːgɐn)], with stress either on the
prefix "um-", meaning "verlagern" ("He is presumably going to shift it to
another place."), or on "-la-", meaning "belagern" ("He is presumably going
to besiege it.").
Utterances of the above two sentences, (a) with stress on "um-" and a
'medial' intonation peak on this syllable, and (b) with stress on "-la-" and
an 'early' intonation peak, which is actually located on the syllable "um-",
were analysed and Fig. 19 presents the waveforms together with their F0
displays. The F0 peak positions in the two utterances are practically
identical in relation to the syllable structures of "umlagern": they occur
at more or less the same time interval just before the beginning of /l/. The
differences between the two are in the shapes of the F0 peak contours and in
the syllable durations. In the utterance with stem stress in Fig. 19b the
post-peak F0 descent is more gradual, the syllable "um-" shorter (135 ms in
Fig. 19b vs. 222 ms in Fig. 19a) and therefore the F0 rise faster, starting
at a structurally earlier point (beginning of the /l/ in "wohl" rather than
at the "um-" syllable onset, as is the case in the utterance with prefix
stress). The "-la-" syllables in the two utterances, on the other hand, have
very similar durations in the stem and prefix stress words (268 ms in
Fig. 19b vs. 258 ms in Fig. 19a). Two further stimuli were generated from
the two illustrated in Fig. 19 by exchanging the F0 contours (see Fig. 20).
These four stimuli (ST1 - ST4) were the basis for creating four series of F0
peak positions (P1 - P4):
P1 A series of 12: 6 left shifts (parallel transposition of the left branch
and time expansion of the right branch) and 5 complete parallel right
shifts of 30 ms each in the utterance of Fig. 19a.
P2 A series of 9: 8 complete parallel left shifts of 30 ms each in the
utterance of Fig. 19b.
P3 A series of 12 in the utterance of Fig. 20a, following the procedure in
P1.
P4 A series of 9 in the utterance of Fig. 20b, following the procedure in
P2.
P1 and P3 are based on the original prefix stress, P4 and P2 on the original
stem stress utterance, and in each pairing the series form an opposition
between more abruptly and slowly falling F0 peak contours, respectively.
From these four sets of stimuli two tests were compiled: Test I combined
the more sharply falling sets P1 and P4, Test II the slowly falling sets P2
and P3. Subjects were asked to identify the stimuli with the meanings of
either "belagern" (stem stress) or "verlagern" (prefix stress). Further
details about test stimulus generation, test tape construction and test
administration can be found in Kohler (1990c).
Fig. 19
Speech waves and F0 contours (linear scale) of the original (a) prefix stress with 'medial' peak and (b) stem stress with early peak in "Er wird's wohl umlagern." A, B, C mark the F0 base and peak points for peak contour shift.
Fig. 20
As in Fig. 19, but with exchanged F0 contours, adjusted to the different timing of the new utterance.
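The layout of the four peak-position series described above can be sketched numerically. This is only an illustration of the shift grid, not the synthesis procedure: it enumerates peak offsets in ms relative to the original peak, assuming that the unshifted original counts as one stimulus in each series (which the series sizes 12 = 6 + 1 + 5 and 9 = 8 + 1 imply). The function name `peak_shift_offsets` is hypothetical.

```python
def peak_shift_offsets(n_left, n_right, step_ms=30):
    """Return F0 peak offsets (ms) relative to the original peak position.

    Negative values are left shifts, positive values right shifts, 0 is the
    original position. Stimulus number k corresponds to list index k - 1.
    """
    return [step_ms * k for k in range(-n_left, n_right + 1)]


# Series as described in the text (P3 follows P1, P4 follows P2).
series = {
    "P1": peak_shift_offsets(6, 5),
    "P2": peak_shift_offsets(8, 0),
    "P3": peak_shift_offsets(6, 5),
    "P4": peak_shift_offsets(8, 0),
}

for name, offsets in series.items():
    original_nr = offsets.index(0) + 1  # stimulus number of the unshifted peak
    print(name, len(offsets), "stimuli; original peak at nr", original_nr)
```

Under these assumptions the original peak is stimulus 7 in the series of 12 and stimulus 9 in the series of 9, which agrees with the approximate original positions given in the captions of Figs. 21 and 22.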
In P1 and P3 the series of F0 peak positions straddle the syllable
structures where a change from prefix to stem stress is to be expected if F0
is a sufficient cue. The two sets differ in that the peak shape of P3, but
not of P1, approximates the more slowly descending F0 configuration found in
the early peak of the original stem-stress utterance (cf. Fig. 19b). It is
hypothesized, therefore, that if stress is perceptually shifted at all in P1
and P3, there will be a more clear-cut change in P1 because there is a
higher probability in P3 that an F0 peak position on "um-" can not only be
perceived as a 'medial' or 'late' peak prefix stress but also as an 'early'
peak stem stress. Similarly, there would be a greater likelihood in P2 than
in P4 for an 'early' peak stem stress to interfere with a 'medial' or 'late'
peak prefix stress because of the slower F0 descent and its time expansion
in the left shift of P2 as against P4.
Results and Discussion
Figs. 21 and 22 present the data of the two identification tests for the
original prefix and stem stress series, respectively, each with slow and
more sharply falling peak contours.
In the shift of the more sharply falling F0 peak contour through the
original prefix-stress utterance there is a clear change from initial to
stem stress, in spite of the duration of "um-" pointing to the former. F0
can thus override duration, particularly since the duration of the
unstressed "-la-" syllable in the original utterance is very close to its
duration under stress. In stimulus 10, which is the first in the ordering
from 1 to 12 to yield an unequivocal stem-stress categorization with over
80% positive responses, the F0 peak position is 30 ms into the vowel of the
syllable "-la-". This corresponds to the data discussed in Section 2.,
concerning the change from an 'early' to a 'medial' intonation peak on the
stressed syllable. The fact that the change from one stress position to the
other is gradual rather than categorical can be related to a residue of the
duration cue. But we also have to consider some interaction of the stress
and intonation functions of F0 because the F0 peak assumes positions before
the beginning of the syllable nucleus /a:/ of "-la-" which can
simultaneously function as the 'medial' or 'late' intonation peak in
stressed "um-" and as the 'early' intonation peak in stressed "-la-". The
relevance of this intonation interference with stress is confirmed by the
finding that when the more slowly falling F0 peak is substituted the
initial-stress category is not so clearly represented: the interpretation of
an 'early' intonation peak for stem stress is then never completely
precluded.
Fig. 21
Percentage stem-stress responses for "umlagern" (= "belagern", i.e. stem stress) in the series of 12 F0 peak positions (from left to right) combined with the original prefix-stress utterance of "Er wird's wohl umlagern." (nr. 7 appr. original peak position). Broken line = P3, slowly falling peak contour (n = 80 at each data point), continuous line = P1, sharply falling peak contour (n = 185 at each data point), dotted line = P1, sharply falling peak contour, but in Test III of 3.2, see text (n = 170 at each data point). Vertical axis: % "belagern"; horizontal axis: stimulus number 1 - 12.
Fig. 22
Percentage stem-stress responses for "umlagern" (= "belagern", i.e. stem stress) in the series of 9 F0 peak positions (from left to right) combined with the original stem-stress utterance "Er wird's wohl umlagern." (nr. 9 appr. original peak position). Broken line = P2, slowly falling peak contour (n = 80 at each data point), continuous line = P4, sharply falling peak contour (n = 185 at each data point), dotted line = P4', sharply falling peak contour and durations of prefix stress (cf. Test III of 3.2, n = 170 at each data point). Vertical axis: % "belagern"; horizontal axis: stimulus number 1 - 9.
When an F0 peak contour is shifted through the original stem-stress
utterance there is no change between the stress categories (Fig. 22): the
answers remain predominantly in favour of stem stress. In this case, F0
cannot override the duration cue completely because "um-" is too short in
relation to "-la-" to signal initial stress. There is some effect of F0 when
the more sharply falling F0 peak occurs within the syllable "um-". In
stimuli 1 to 5 the F0 peak has been shifted leftward all the way into the
preceding syllable "wohl", whereas in 6 to 8 it has been moved only as far
back as some point within the prefix syllable "um-", and in these stimuli
there are up to 30% judgements of prefix stress. This pattern suggests that
the overriding salience of duration in the original prefix stress stimulus
is checked somewhat when the characteristic sharply falling contour occurs
in the relevant syllable and is more narrowly limited to it, allowing the
interpretation of a 'medial' or 'late' peak on "um-", rather than an 'early'
one on the following "-la-". In the other series, however, the slowly
falling and time-expanded F0 contour reduces the probability of interpreting
the peak as a 'medial' or 'late' peak for a prefix stress, because of the
stronger interference from an 'early' peak interpretation on "-la-", due to
the wider span of the F0 peak descent.
The questions asked initially can now be answered as follows:
(a) An F0 peak shift by itself is sufficient to bring about a clear change
from one stress position to another, provided the duration of the
stressed-syllable-to-be toward which the F0 peak is shifted is not too
short. But even when it is, there is a residual F0 effect.
(b) The intonation function of F0 interferes with its stress function if the
latter is not supported by duration. This finds its expression in a
gradual change from one stress position to another in abutting syllables
where an ambiguity can arise between a 'medial' or 'late' intonation
peak in one stressed syllable and an 'early' intonation peak related to
a subsequent stressed syllable. This interaction is strengthened when
the shape of the F0 peak contour approximates the more slowly falling
one of the 'early' intonation peak of a later stress.
3.2 Duration as a feature in stress perception
It has been shown in 3.1 that although F0 is a strong cue in stress
perception, duration can become an additional distinctive feature when
vowels and postvocalic sonorants are shorter than would be associated with
the production of a stressed syllable. On the other hand, if they are longer
than would be associated with an unstressed syllable, the F0 cue may be
disturbed, but never dominated by the duration cue.
3.2.1 Duration increase for inducing stress perception in F0 peaks
The importance of duration for stress perception was further investigated in
an experiment that repeated Test I of 3.1 by using the peak series P1 and a
modified peak series P4', i.e. the sets of stimuli based on the original
prefix and stem stress utterances, respectively, both combined with the more
sharply falling F0 contour derived from the prefix-stress utterance (see
Figs. 19a and 20b). But this time a new basis stimulus ST4' for a series P4'
was created by adjusting the durations of the syllable "um-" [um] and the
vowel [a:] of the syllable "-la-" in the basis stimulus ST4 to the same
values as in the basis stimulus ST1. By repeating some periods in [um] and
deleting some in [a:], [u] was lengthened from 70 ms to 117 ms, [m] from
65 ms to 105 ms, and [a:] reduced from 210 ms to 189 ms. Then the F0 contour
of the basis stimulus ST1 was transferred - sound segment by sound segment -
to the modified basis stimulus ST4'. The series P4' was generated by
shifting the F0 peak to the left as for P4.
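The size of the duration manipulation can be made explicit with a little arithmetic. The sketch below only restates the segment durations quoted in the text and derives differences and scaling factors from them; the variable and function names are illustrative, and nothing here reproduces the period-splicing procedure itself.

```python
# Segment durations in ms as reported in the text.
st4 = {"u": 70, "m": 65, "a:": 210}        # stem-stress basis stimulus ST4
st4_mod = {"u": 117, "m": 105, "a:": 189}  # ST4', adjusted to the ST1 values


def duration_changes(before, after):
    """Return per-segment (difference in ms, ratio after/before)."""
    return {
        seg: (after[seg] - before[seg], after[seg] / before[seg])
        for seg in before
    }


changes = duration_changes(st4, st4_mod)
for seg, (diff, ratio) in changes.items():
    print(f"[{seg}]: {diff:+d} ms, factor {ratio:.2f}")

# Total duration of the prefix syllable [um] before and after adjustment.
um_before = st4["u"] + st4["m"]
um_after = st4_mod["u"] + st4_mod["m"]
print("[um] total:", um_before, "->", um_after, "ms")
```

The adjusted [um] total of 222 ms is the same figure the text quotes for Fig. 19a, the prefix-stress original.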
Series P1 and P4' were then compiled into a new Test III, which differs
from Test I only in the segment durations of P4' vs. P4. The first 7 stimuli
of P1 and the last 7 of P4' occupy the same ranges of F0 peak positions, have
very similar segment durations (with [um] and [a:] being identical) and
comparable F0 contours, but they differ in the basis stimulus, which is
either the original prefix-stress utterance in P1 or the original
stem-stress utterance in P4', implying spectral and intensity differences.
The hypothesis connected with Test III was that the change of segment
durations in P4' vs. P4 would be sufficient to reverse judgement from stem
stress to prefix stress in all cases of the series, resulting in similar
response functions for stimuli 1 - 7 of P1 and for stimuli 3 - 9 of P4', and
would thus point to the low relevance of spectral and intensity features in
German stress perception. Test III was run with 34 listeners.
Results and Discussion
The dotted lines in Figs. 21 and 22 present the results of identification
Test III. The hypothesis of the complete reversal of judgements has been
confirmed by P4 and P4' in Fig. 22 yielding ca. 80% and 20% "belagern"
responses, respectively. The left shift of the response function for the
identical P1 series in Test III, compared with Test I, may be due to the
test design: the decrease of the number of clear stem-stress cases and the
increase of the number of clear prefix-stress cases by swapping P4' for P4
may have pushed the responses to the more ambivalent cases in P1 in the
direction of stem stress, but there is also more noise in the P1 response
curve of Test III, as is shown by the offset of 10% - 20%.
3.2.2 Duration decrease for eliminating stress perception in F0 peaks
Parallel to generating ST4' from ST4, a new ST1' was generated from ST1 by
shortening [u] and [m] to 70 ms and 65 ms (from 117 ms and 105 ms) and
lengthening [a:] to 210 ms (from 189 ms), applying the same period splicing
procedure. Then the same peak shifts to the left and right were performed as
in P1, resulting in P1' with 12 peak positions and sharply falling F0
contours. Informal listening to the series P1' by phoneticians established
that all the 12 stimuli were unequivocally perceived as stem stressed, even
when the F0 peak position was on "um-". Because of this very clear evidence
no further formal test was run. These results prove again that if the
duration of a stressed-syllable-to-be is too short the F0 cue may not be
sufficient to signal stress.
3.2.3 Conclusion
In German, stress is cued by two features, F0 and duration, which may be
expressed in a distinctive feature notation as ±FSTRESS, ±DSTRESS. The F0
cue clearly dominates if the duration is not too short for stressed
syllables; otherwise longer duration is required to signal stress. Syllables
are thus marked as stressed/unstressed by the two stress features: (1)
-FSTRESS, -DSTRESS = unstressed, (2) -FSTRESS, +DSTRESS = secondary stress,
e.g. in non-initial components of compounds ("Ausfahrt" [ˈaus ˌfa:ɐt]
("exit")), which receive increased duration, but no intonation peak (or
valley), (3) +FSTRESS, +DSTRESS = primary stress, where the intonation
points are hooked. The intonation associated with stressed syllables is,
among other things, defined according to different peak positions, which may
again be expressed in distinctive feature notation taking the primary
dichotomy between 'early' and 'non-early' into account: ±EARLY, and -EARLY
may then be ±LATE.
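The feature notation above can be read as a small lookup. The following is a minimal sketch under two labelled assumptions: the combination +FSTRESS, -DSTRESS is not listed in the text and is therefore returned as `None`, and the 'medial' peak is taken to be the -EARLY, -LATE case, which the text implies but does not state in so many words. The function names are hypothetical.

```python
def stress_category(fstress, dstress):
    """Map (FSTRESS, DSTRESS) feature values to the stress category.

    True stands for '+', False for '-'. Returns None for a combination
    that the text does not list (+FSTRESS, -DSTRESS).
    """
    table = {
        (False, False): "unstressed",
        (False, True): "secondary stress",
        (True, True): "primary stress",
    }
    return table.get((fstress, dstress))


def peak_category(early, late=False):
    """Map (EARLY, LATE) feature values to an intonation peak position.

    LATE is only consulted for -EARLY; treating -EARLY, -LATE as 'medial'
    is an assumption of this sketch.
    """
    if early:
        return "early"
    return "late" if late else "medial"


print(stress_category(False, True))   # second component of "Ausfahrt"
print(peak_category(early=False, late=True))
```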
At each potential stress position +FSTRESS, three intonation peaks are
possible. But since the F0 of these peaks serves to signal the stressed
syllable - as a stress cue - and at the same time the peak position in
relation to such a stressed syllable - as an intonation cue, there may be
interference between the two cue functions leading to ambiguity, if the
temporal distance between successive potential stresses, as in lexical items
of the type "umlagern", is small, particularly because of a lack of
intervening unstressed syllables (e.g. containing /ə/) and even more so in
the case of abutting syllables with short quantity vowels.
3.3 Sentence stress
In sentences not every lexical item gets a +FSTRESS marking for the
association with intonation peaks (and valleys), although at a more abstract
level it has lexical stress, i.e. at least one syllable is phonologically
marked as having the potential of receiving the features +FSTRESS and
+DSTRESS. The rules of grammar and pragmatics determine which lexical-stress
syllables are given the feature combinations +FSTRESS, +DSTRESS or -FSTRESS,
+DSTRESS in sentences. In a sentence such as "Aber der Leo säuft." [abɐ dɐ
ˈle:o: ˈzɔift] ("But Leo drinks.")^ either the subject "Leo" or the verb
"säuft" may be in focus, receiving the features +FSTRESS, +DSTRESS, or both
elements may be so characterized simultaneously. The question to be answered
is whether the findings at the lexical level in 3.1 - 3.2 can be replicated
at the sentence level, viz. whether a switch from one stress position to
another can be brought about simply by F0 peak shift through the sentence.
In this case it will also have to be checked whether at some intermediate
section of the peak shift scale both stresses are realised. And finally,
there is the issue of the perceptual manifestation of different intonation
peaks ('early', 'medial', 'late') at each stress position, in parallel to
what was found in the sentences of Section 2. with only one potential
accent.
3.3.1 Stimulus preparation for perception experiments
A natural production of the utterance "Aber der Leo säuft." with sentence
stress and 'medial' intonation peak on "Leo" was used for stimulus
generation. Fig. 23 shows the speech wave, energy and F0 contours. A series
of 7 left shifts (parallel transposition of the left branch and time
expansion of the right branch) and of 11 complete parallel right shifts of
30 ms each were generated on the basis of the utterance in Fig. 23. An
informal assessment of the series detected a poor quality in the synthesis
of the segment /z/ and too strong a final aspiration; furthermore, the
last stimuli of the series, from 15 to 19, with the accent on "säuft"
sounded too strong at the beginning and husky at the end, obviously due to
the wrong energy contour for a final F0 peak position, i.e. to a
desynchronization of F0 and energy (see 2.1.1.5). To remedy these defects
and to create as natural synthetic versions as possible, almost the entire
[z] was devoiced, the final aspiration reduced by lowering the dB-values,
and the discrepancy between energy and F0 eliminated by lowering the energy
in "Leo" and by raising it around the F0 peaks. The peak series was then
regenerated with these parameter modifications of the stimulus in Fig. 23;
it formed the basis for identification and serial discrimination tests.
^ This sentence played an important role in some experiments of the Munich Intonation Project (see Altmann et al., 1989) and was taken as the basis of further experiments in the Kiel Intonation Project for purposes of cross-reference.
Fig. 23
Speech wave, energy and F0 contours (linear scale) of the utterance "Aber der Leo säuft." with subject stress and 'medial' peak. The time marks indicate the F0 base and peak points for peak contour shift.
Fig. 24
As Fig. 23, but with the 19 peak position points marked.
3.3.2 Identification test
Five repetitions of the 19 stimuli were randomized and presented (in the
format of 1.4 (2) for single stimuli) to 31 listeners with the instruction
to decide whether "Leo" or "säuft" was more strongly stressed.
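The number of judgements behind each data point follows directly from this design. A minimal sketch of the arithmetic, with hypothetical helper names; the conversion from a percentage to a response count is added only to illustrate how the response functions relate to raw counts and is not taken from the source.

```python
def responses_per_point(repetitions, listeners):
    """Number of judgements collected for each stimulus."""
    return repetitions * listeners


def percent_to_count(percent, n):
    """Approximate number of responses corresponding to a percentage of n."""
    return round(percent / 100 * n)


n_leo = responses_per_point(repetitions=5, listeners=31)
print(n_leo)                        # judgements per stimulus
print(percent_to_count(80, n_leo))  # responses needed to reach 80%
```

The value of 155 judgements per stimulus agrees with the n given in the caption of Fig. 25.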
Results and Discussion
Fig. 25 presents the results of the identification test, which demonstrate
very clearly that a simple F0 peak shift causes a change from subject to
verb stress. The transition in the response function between the two stress
positions indicates - as was confirmed in phonetic expert listening - that
as the peak is moved into the fricative [z] and therefore spans both words,
giving a late F0 rise to "Leo" and an early F0 fall to "säuft", the
perception of double stress results, which disappears again when the peak is
located at the beginning of the vowel of the verb and the impression of
focus stress on the latter is created.
Fig. 25
Identification function showing percentage 'subject stress' judgements for 19 stimuli "Aber der Leo säuft." with F0 peak shift from left to right (nr 8 appr. original peak position), n = 155 at each data point. Vertical axis: % 'subject stress'; horizontal axis: stimulus number 1 - 19.
3.3.3 Serial discrimination tests
The series of 19 stimuli was partitioned into two sub-series: (a) stimuli 1
- 10 representing clear instances of the category of subject stress and (b)
stimuli 14 - 19 representing clear instances of the category of verb stress,
according to the results of the identification test. Each set in ascending
(numerical) ordering was presented to 32 subjects for evaluating at which
stimulus in the series the first and further changes in the speech melody
had occurred.
Results and Discussion
Tables XII and XIII present the results of the serial discrimination tests
(a) and (b).
Table XII
Frequency distribution of 'change has occurred' responses of 32 listeners in the left-right sequence of the serial discrimination test across the first 10 stimuli with F0 peak shifts in "Aber der Leo säuft." (1 = left-most, 10 = right-most position)
Stimulus                      2    4    5    6    7    8   10
First change perceived        1    5   12   10    2
Further changes perceived                     4    7
Total                         1    5   12   14    9
(2 listeners perceived no change at all.)
5
5
7
7
7
7
Table XIII
Frequency distribution of 'change has occurred' responses of 32 listeners in the left-right sequence of the serial discrimination test across the last 6 stimuli with F0 peak shifts in "Aber der Leo säuft." (14 = left-most, 19 = right-most position)
Stimulus                     16   17   18   19
First change perceived       16   11    1    1
Further changes perceived          5    2    5
Total                        16   16    3    6
(3 listeners perceived no change at all.)
In both series the first perceptual change has a maximum frequency at the
stimulus in which the F0 peak occupies the first position within the
respective syllable nucleus (nr 5 in (a) and nr 16 in (b)). This result
coincides with the data obtained in the peak alignment test in utterances
containing a single potential accent (cf. 2.1.1). It points to the change
from an 'early' to a 'medial' peak within each stress position.
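The modal stimuli named above can be checked against the 'first change perceived' rows of the two tables. This is a small sketch rather than part of the original analysis. The Table XIII counts are taken as printed; the Table XII counts are read from a damaged table layout and are therefore an assumption that should be verified against the original publication.

```python
# 'First change perceived' counts per stimulus number.
first_change = {
    "a": {2: 1, 4: 5, 5: 12, 6: 10, 7: 2},  # Table XII (reconstructed, assumed)
    "b": {16: 16, 17: 11, 18: 1, 19: 1},    # Table XIII
}
no_change = {"a": 2, "b": 3}  # listeners who perceived no change at all

modal_stimulus = {}
listener_total = {}
for series, counts in first_change.items():
    # Stimulus at which the first change was reported most often.
    modal_stimulus[series] = max(counts, key=counts.get)
    # Every listener either reports one first change or none at all.
    listener_total[series] = sum(counts.values()) + no_change[series]
    print(series, "modal stimulus:", modal_stimulus[series],
          "listeners accounted for:", listener_total[series])
```

With these counts each series accounts for all 32 listeners, which is at least consistent with the reading of Table XII assumed here.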
A corresponding clear-cut switch was not observed in the "umlagern" series
of 3.1.^ The reason for this difference lies in the shorter duration of [um]
vs. [le:o:], which allows less separation of the intonation peak and stress
positions and causes the F0 configuration to straddle both potential accent
syllables, given the width of the shifted peak contour, across a greater
number of stimuli. The more gradual transition from prefix to stem stress in
the response function of Fig. 21, compared with that in Fig. 25, is a
further indication of this stronger stress/intonation interaction across
segment durations that are too short to confine the chosen peak timing to a
single syllable. To achieve a greater separation of the different intonation
peaks within each accent, the peak descent would at least have to be faster
to encroach less on the other peak and stress positions.
3.4 Perceptual ambiguity between single and double accent
In spite of the more adequate temporal structure in "Aber der Leo säuft."
for separating the theoretically possible peak and stress positions, there
is still an ambiguous transition period between the two potential accents,
as shown in Fig. 25. And as was argued in 3.3.2, this ambivalence is not so
much between either subject or verb focus stress, but between subject focus
and double stress. In the latter case, the late rise on "Leo", followed by
an early fall on "säuft", may be interpreted as belonging to two F0 peak
configurations - 'late' followed by 'early' -, without an intervening dip
between the two, or as a single 'late' F0 peak on the subject. In the first
case, two accents are perceived, in the second only one. Because of the
still close temporal proximity between the two potential stress positions,
there must be a stretch along the peak shift scale where the signal is
ambivalent between these two interpretations. That we are here dealing with
a confusion of subject focus stress and double stress is proved by expert
listening to the series of 19 F0 peak shifts in "Aber der Leo säuft.",
establishing stress on "Leo" in stimuli 12 - 14, which may or may not be
accompanied by stress on "säuft". In stimulus 15, however, the change to
focus stress on the verb has taken place: the peak rise is now far enough
away from the potential accent syllable in "Leo" and therefore no longer
associated with the subject, F0 being low during the whole of the word
"Leo".
^ The relevant serial discrimination tests were carried out but are not reported here in detail. The results were negative so that the summarising statement is considered sufficient.
The perceptual ambiguity between a single 'late' peak and a 'late' + 'early'
peak combination is even stronger in cases where two potential accent
syllables abut and the first contains a short vowel, as in "Der Ring
glänzt.", as is shown in Contribution IV (Hertrich, 1991a). Even when in
abutting accents the first vowel is long, or when a short or long vowel in
the first potential accent position is followed by one unstressed vowel (/ə/
or /ɐ/), as in "Die Uhr tickt.", "Die Bremse quietscht.", "Die Maler malen."
(see Hertrich, 1991a), a perceptual confusion between the two categories is
possible. The confusion can be avoided if for the single 'late' peak the
descent is rapid enough not to trespass on the second accent syllable domain,
as was demonstrated for "Die Maler malen." (loc. cit.). So if the temporal
distance between two potential accents is short enough, the F0 peak shift
through the sequence produces perceptual changes from subject focus stress
to dual stress to verb focus stress. And in the transition area between the
two focus stresses, perception may be ambiguous between double and single
first accents. This ambiguity disappears as the distance between potential
accents gets longer, as in "Die Bäcker haben gebacken." or "Die Sekretärin
hat die Briefe geschrieben." (loc. cit.).
In accent sequences at longer distances from each other double stress does
not occur by simple F0 peak shift through the utterance; the peak contour
has to be broadened at the same time to realise a 'medial' or 'late' rise on
one accent syllable and an 'early' fall on the next one. In between these
two intonation turns - rise and fall - associated with two stressed
syllables, there may be an F0 dip of various degrees of extension, to
generate two properly manifested F0 peak contours, or the two peak points
are joined by a plateau or a slight monotone descent/ascent, creating a 'hat
pattern' (cf. Cohen & 't Hart, 1967). Although the 'hat pattern' is
perceptually and semantically different from a succession of complete peaks
(as is shown in Contribution VI, Hertrich, 1991b, see also Contribution VII,
Kohler, 1991d), there are strong arguments in favour of treating a 'hat
pattern' as a succession of two peaks without an F0 dip:
(1) The timing of the initial rise is exactly the same as the rising part in
a 'medial' or 'late' peak. There are rising patterns that are timed more
slowly and have rises up to the beginning of the next stressed syllable
(see Contribution VII, Kohler 1991d). They have to be recognised as
separate entities. So we would have to set up two rising patterns - slow
and fast - but since the latter coincides with the rising part of the
peak pattern it is more economical to have no new units 'fast rises'.
The complementary solution to regard 'medial' or 'late' peaks, too, as
being composed of two tonal entities each - rise and fall - is ruled out
by the fact that they constitute one stress, whereas the 'hat pattern'
rises and falls represent two stresses.
(2) The timing and syllable alignments of the final fall coincide with the
falling section of an 'early' (or 'medial') peak.
(3) 'Hat patterns' can be derived from the corresponding dipped peak
sequences by general phonetic rules changing the prominence
relationships between the first and the second peak as a consequence of
removing phonetic features characteristic of the definitions of the
different F0 peaks. Two cases can be distinguished:
(a) In the sequence 'medial' (or 'late') + 'early' peaks, the
elimination of the F0 dip does not affect the essential feature of the
low falling F0 in the 'early' peak and also preserves the characteristic
(low level +) rise in the 'medial' (or 'late') peak (see 2.1.1.7), but
it modifies the complete manifestation of the latter by removing the
separate F0 descent, thereby reducing its prominence.
(b) In the sequence 'medial' (or 'late') + 'medial' (or 'late') peaks,
the elimination of the F0 dip results in a loss of the 'medial' or
'late' characteristics of the second peak because in a derived 'hat
pattern' it lacks the essential F0 rise in the syllable nucleus (see
2.1.1.7), and since it cannot be associated with an 'early' peak either,
not having the early low fall, it lacks the prominence-lending feature
of the 'medial' peak rise as well as of the 'early' peak fall. But since
on the other hand, the first peak has it, the prominence of the second
one is subordinated. Thus a principled relationship can be established
between 'hat patterns' and peak sequences on the basis of general
phonetic rules modifying the relative prominences of the peaks.
In both cases (3a) and (3b), the generation of a 'hat pattern' from a dipped
peak sequence does not change the number of accents, but only the prominence
relations between them. Thus when the sentence "Die Wählerinnen wählen." is
combined either with a 'hat pattern' consisting of a medial (or late) rise
on "Wählerinnen" plus a medial fall on "wählen", or with a single 'medial'
(or 'late') peak on "Wählerinnen", only the second intonation represents
focus stress on the subject and deaccentuation of the verb (see also
Contribution VI, Hertrich, 1991b).
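The derivation of a 'hat pattern' from a dipped peak sequence in (3) can be
illustrated with a minimal Python sketch. It is not the rule set used in the
experiments: the contour representation as (time, F0) points, the function
name remove_dip and all numerical values are assumptions made for
illustration only, and the sketch covers just the geometric step of
eliminating the F0 dip between two peak maxima, not the resulting change in
prominence relations.

```python
from typing import List, Tuple

# Hypothetical representation: (time in seconds, F0 in Hz), times increasing.
Point = Tuple[float, float]


def remove_dip(contour: List[Point], first_peak: int, second_peak: int) -> List[Point]:
    """Return a copy of the contour with the F0 dip between two peaks removed.

    Points strictly between the two peak indices are replaced by a linear
    interpolation between the two peak values; all other points are kept.
    """
    if not 0 <= first_peak < second_peak < len(contour):
        raise ValueError("need 0 <= first_peak < second_peak < len(contour)")
    t1, f1 = contour[first_peak]
    t2, f2 = contour[second_peak]
    if t2 <= t1:
        raise ValueError("time stamps must be strictly increasing")
    result = list(contour)
    for i in range(first_peak + 1, second_peak):
        t, _ = contour[i]
        fraction = (t - t1) / (t2 - t1)
        result[i] = (t, f1 + fraction * (f2 - f1))
    return result


# Invented dipped two-peak sequence: peaks at indices 1 and 3, dip at index 2.
dipped = [(0.0, 100.0), (0.2, 140.0), (0.4, 105.0), (0.6, 135.0), (0.8, 90.0)]
hat = remove_dip(dipped, 1, 3)
```

In this sketch the rise into the first peak and the fall from the second are
left untouched, which mirrors the statement that only the dip, and with it
the separate F0 descent of the first peak and the F0 rise of the second, is
removed.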
3.5 Intensity in the cuing of stress and intonation
The question now arises as to whether it is possible to change stress
perception simply by varying intensity. Two test cases may be distinguished:
(a) Utterances that are ambiguous between one and two stresses in F0 peak
shifts, such as "Aber der Leo säuft." in 3.4,
(b) 'hat patterns' in which a medial (or late) F0 rise is immediately
followed by a medial F0 fall, reducing the prominence of the second
stress compared with the sequence of two complete peaks (cf. 3.4).
If intensity alone can change stress perception, then it should be possible
in (a) to produce a switch from double to initial focus stress simply by
reducing the intensity in the second accent syllable and by simultaneously
raising it in the first. Similarly in (b), it should be possible to alter
the prominence relation by a comparable intensity adjustment in the two
accent syllables.
The issue has been tested interactively by changing the source amplitude
values accordingly in the RULSYS TTS synthesis-by-rule. The result has been
negative: the focussing, and consequently the number of stresses or the
prominence relation, does not change. It is more the relative loudness that
is affected (see also Kohler, 1991f). This is further support for the
long-established finding that intensity has a low signalling value for stress
compared with F0 and duration (Fry, 1958).
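The intensity manipulation described above amounts to raising the level in
the first accent syllable and lowering it in the second. The following
Python sketch shows one way such an adjustment could be expressed; it does
not reproduce the RULSYS interface, and the frame-wise amplitude list, the
function name apply_gain_db, the frame indices and the gain values of +6 dB
and -6 dB are all hypothetical.

```python
from typing import List


def apply_gain_db(amplitudes: List[float], start: int, end: int,
                  gain_db: float) -> List[float]:
    """Scale the frames start..end-1 by a gain in dB (amplitude scale).

    A gain of g dB corresponds to an amplitude factor of 10 ** (g / 20).
    Frames outside the interval are returned unchanged.
    """
    factor = 10.0 ** (gain_db / 20.0)
    return [a * factor if start <= i < end else a
            for i, a in enumerate(amplitudes)]


# Hypothetical utterance of six frames with equal source amplitude.
frames = [1.0] * 6
# Raise the first accent syllable (frames 0-1), lower the second (frames 4-5).
adjusted = apply_gain_db(apply_gain_db(frames, 0, 2, 6.0), 4, 6, -6.0)
```

According to the result reported above, an adjustment of this kind changes
relative loudness but not the number of stresses or their prominence
relation.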
The situation is different as regards the contribution of intensity to the
perception of intonation. Again two cases may be distinguished:
(a) It has already been discussed in 2.1.1.5 that a late F0 peak pattern
requires a parallel timing of the intensity course to guarantee its
perceptual identity.
(b) It is argued in Contribution V (Kohler & Gartenberg, 1991) that lower
intensities around the F0 peaks in 'early' and 'late' patterns vis-à-vis
'medial' ones have to be offset by higher F0 to provide the same
prominence across the different intonations. On the other hand, the
'early' peak pattern, which accentuates low F0, has its characteristics
strengthened by not having a lower intensity around its prenucleus F0
maximum compensated for in a higher F0 peak value.
Finally, the disruption of the natural parallelism in the time courses of
F0, source amplitude and sound intensity for the three terminal peak
contours, as it is caused by the synthesis of F0 peak shifts across an
original 'medial' peak utterance, may result in a degraded acoustic output
quality. So, when a natural 'medial' peak speech signal of "Sie hat ja
gelogen." is taken as a point of departure for LPC synthesis with a 'late'
peak, the stress and intonation categories are signalled correctly, but the
utterance sounds husky at the end and overloaded in the middle because F0
and intensity diverge in opposite directions in these two places. To improve
the synthesis quality of 'late' peaks, appropriate corrections at these
points in the intensity curve had to be carried out for "Aber der Leo
säuft." in 3.3.1 (see also Kohler, 1991f).
3.6 General discussion concerning Hypothesis (3)
The perception experiments of 3.1-5 have largely confirmed Hypothesis (3)
and its corollaries of Contribution I (Kohler, 1991b). If there is more than
one potential accent in a single-accent terminal utterance - either at the
lexical or at the sentence level - three phonological intonation categories
- 'early', 'medial', 'late' peaks - are distinguished at each stress
position, provided the temporal distance between the accent places allows
the separation of the F0 peak configurations. Furthermore, an F0 peak shift
alters the stress position as well, which can result in an interaction of
stress and intonation if two accent syllables occur at such a short duration
interval that the rising and falling branches of a peak contour can be at
the same time associated with a single peak on the first accent syllable or
with a succession of two peaks on two successive accent syllables, not
separated by an F0 dip. This ambivalence of a stimulus between single and
double stress results in a perceptual ambiguity between, e.g., prefix and
stem word stress at the lexical level, or subject and verb stress at the
sentence level. It is only when the F0 peak is moved out of the joint
domains of both accent syllables in order to be exclusively in that of the
second one that the ambiguity is resolved and second position focus stress
results. A link has thus been established between 'hat patterns' and dipped
F0 peak sequences, based on prominence relationships, as postulated by
Hypothesis (4) in Contribution I. This point will be further discussed in
Contributions VI (Hertrich, 1991b) and VII (Kohler, 1991d).
Duration is a further cue to stress in German, but usually subordinated to
F0, unless it is too short for what is to be expected of stressed vowels.
Intensity and spectral characteristics, on the other hand, do not seem to
play a role in stress perception. Intensity intervenes as an important cue
to intonation identity and to voice (speech) quality when the usually
parallel time courses of F0 and intensity are disrupted, and it is, of
course, the signal attribute of loudness. Finally, the height of an F0 peak
cues prominence at the perceptual and emphasis at the semantic level (see
2.3, and Contributions I, V and VII, Gartenberg & Panzlaff-Reuter, 1991,
Section 6.; Kohler & Gartenberg, 1991; Kohler, 1991d).
4. Conclusions for the Kiel Intonation Model of German (KIM)
The results of the experiments discussed in this Contribution III suggest a
number of points that have to be taken into account in KIM as regards the
intonation peak component of the model.
1. KIM must comprise the phonetic intonation model proper and the syntactic,
semantic and pragmatic environment providing interpretations for symbolic
representations of sentences as input to the model.
2. In particular, this environment must specify the lexical items that are
to receive sentence stress, and it must provide semantic interpretations
along the dimensions 'established/new', 'degree of distance between the
speaker and the world', and 'emphasis'.
3. The basic categories of the phonetic model include
(a) a feature specification of stress with reference to the signal
properties F0 and duration: ±FSTRESS, ±DSTRESS,
(b) a feature specification of intonation with reference to F0 peak
position: ±EARLY, and ±LATE within -EARLY,
(c) the timing of the intonation peaks depending on syllable structures
(mono/polysyllables, long/short vowels, voiced/voiceless consonant
environment),
(d) a numerical scale of peak height with reference to degrees of
prominence,
(e) IF0 and CF0 modifications of the basic peak contours,
(f) intensity adjustments to guarantee parallelism with F0 time course.
4. After the introduction of the peak categories the model has to deal with
their concatenation.
(a) An F0 descent from a peak position can be fast or slow. In the
latter case double accentuation may result, or a main accent
followed by a secondary one, e.g. in "Er hat einen Brief
geschrieben." ("He's written a letter.") the final participle is
deaccented in relation to "Brief", which gets the main nuclear
sentence stress. But the deaccentuation may result in a default
secondary stress, or in no stress at all suggesting a contrast
between, for example, "Brief" and "Karte" ("card"). This is the same
phenomenon as what Kingdon (1965, p. 195) has called 'semantic
partial stress' with reference to compounds of different degrees of
semantic unity, e.g. "butter cup" (cup for butter) with secondary
stress on "cup" vs. "buttercup" (ranunculus) with unstressed "cup".
The phonetic manifestation of this difference is not only one of
duration, but, first and foremost, of different timings of the F0
fall from the F0 peak.
(b) Besides peak sequences various 'hat patterns' have to be generated
and the semantic and pragmatic differences evaluated.
These points will be developed in Contribution VII (Kohler, 1991d),
supplemented by further model components derived from the empirical data
collections in the other contributions and from interactive RULSYS TTS
experimentation.
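The intonation features in point 3(b), ±EARLY and ±LATE within -EARLY,
define three peak categories. A minimal Python sketch of this specification
follows; the class name PeakFeatures and the boolean encoding are
assumptions for illustration, not part of KIM as described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakFeatures:
    """Hypothetical encoding of the peak position features.

    early: True for +EARLY, False for -EARLY.
    late:  True for +LATE, False for -LATE; only defined within -EARLY.
    """
    early: bool
    late: bool = False

    def __post_init__(self) -> None:
        # +LATE is specified only within -EARLY, so +EARLY +LATE is rejected.
        if self.early and self.late:
            raise ValueError("+LATE is only defined within -EARLY")

    def category(self) -> str:
        """Map the feature bundle to 'early', 'medial' or 'late'."""
        if self.early:
            return "early"
        return "late" if self.late else "medial"


early_peak = PeakFeatures(early=True)
medial_peak = PeakFeatures(early=False, late=False)
late_peak = PeakFeatures(early=False, late=True)
```

Making ±LATE dependent on -EARLY in the constructor keeps the encoding to
exactly the three categories distinguished in the perception experiments.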