
III

Klaus J. Kohler

Terminal Intonation Patterns in Single-Accent Utterances of German:

Phonetics, Phonology and Semantics

1. Introduction

1.1 Hypotheses

This contribution deals with Hypotheses (2) and (3) outlined in 2.1.2 and 2.1.3 of Contribution I (Kohler, 1991b), i.e. with the alignment of an F0 peak relative to stressed vowel onset in terminal utterances containing one accent. Section 2 is concerned with the F0 peak positions in sentences that have a unique accent placement because they are made up of just one content word beside several reduced function words. Section 3 looks at F0 peaks in sentences with alternative accent positions due to lexical stress oppositions or to different sentence focus. It delimits the stress and intonation functions of F0 peaks and discusses their interactions, also with reference to the data presented in Contribution IV (Hertrich, 1991a). As this will involve the perceptual ambiguity between one and two accents, peak sequences will also have to be considered briefly with reference to Contribution VI (Hertrich, 1991b).

1.2 Types of phonological structures for perceptual testing

The investigation is perceptual, aiming at the (phonological) categorization of phonetic F0 peak shift continua across a number of different syllable structures (long vs. short vowel; syllable-initial lateral vs. glide vs. glottal stop (creaky voice) vs. voiced fricative; post-nuclear voiced vs. voiceless consonant) as well as two potential accent positions in words (prefix or stem stress) and in sentences (subject or verb focus).

1.3 Stimulus generation¹

In all cases, several naturally produced tokens of the particular sentence type under scrutiny were recorded on analogue tape (Revox A77, 19 cm/s) by the same male speaker (KK, the author) under studio conditions in the Kiel Phonetics Institute. Although a medial peak position was to be the basis for stimulus manipulation in most experiments (but see 3.1 for the choice of an early peak as well), early and late peaks were also collected for each linguistic item, to specify the range of F0 peak positions from early to late that would have to be covered by the test series, and to provide information about the shapes of the different peaks to be taken into account in the synthesis. The recorded data were checked auditorily for successful rendering of the intended phonetic structures and, after A/D conversion (10 kHz, 5 kHz low-pass filter), the acceptable tokens were processed on a Data General Eclipse S230 computer with the Kiel Phonetics Institute SSP programme package (as regards the pitch algorithm, see Schäfer-Vincent, 1982, 1983). Obvious F0 analysis errors (octave jumps, missing F0 values in spite of clear periodicity in the signal) were corrected manually.

¹ The stimuli for 2.1.1.2-5 and 3.1-2 were generated by Michael Weinhold.

Then one token containing an auditorily classified medial (or early) peak was selected, and its peak contour was shifted along the time axis to the left and to the right in a number of steps of fixed duration, determined separately for each utterance, to create new F0 versions. The shift was effected either as a parallel transposition of both branches of the peak contour, or the falling branch was time-expanded in left shifts as far as the original right-hand base point, to approximate natural productions by a less steep descent and to avoid too long a low F0 stretch in the LPC synthesis. The two types of left shift do not alter the basic characteristics of medial to early peak changes; the parallel transposition of the whole peak configuration simply sounds more final and categorical than the one with the flattened fall. After the shift, the tail contour was joined to the new peak position by expansion or compression, and similarly the immediate precursor; finally, F0 was masked in voiceless stretches. Fig. 1 illustrates the principles of generating F0 peak shift versions. The original utterances were then synthesized with the LPC analysis values and the new F0 versions obtained through the peak shift parameter manipulation.
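The two kinds of shift can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SSP programme code: the peak is reduced to three breakpoints (left base, maximum, right base, roughly the span A1 to A2 in Fig. 1), and the names `PeakContour` and `shift_peak` are hypothetical. Joining the tail and precursor and masking F0 in voiceless stretches are left out.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (time in ms, F0 in Hz)


@dataclass(frozen=True)
class PeakContour:
    """Rise-fall F0 peak reduced to three breakpoints (hypothetical representation)."""
    left_base: Point   # start of the rising branch
    peak: Point        # F0 maximum
    right_base: Point  # end of the falling branch


def shift_peak(c: PeakContour, delta_ms: float, expand_fall: bool = True) -> PeakContour:
    """Move the peak by delta_ms (negative = left, positive = right).

    Right shifts, and left shifts with expand_fall=False, transpose all three
    points in parallel. A left shift with expand_fall=True moves only the rising
    branch; the right base point keeps its original time, so the falling branch
    is stretched and its descent becomes less steep.
    """
    def moved(p: Point) -> Point:
        return (p[0] + delta_ms, p[1])

    if delta_ms < 0 and expand_fall:
        return PeakContour(moved(c.left_base), moved(c.peak), c.right_base)
    return PeakContour(moved(c.left_base), moved(c.peak), moved(c.right_base))


# Hypothetical contour: rise from 400 to 500 ms, fall from 500 to 600 ms.
original = PeakContour((400.0, 100.0), (500.0, 150.0), (600.0, 90.0))
left = shift_peak(original, -60.0)
print(left.right_base[0] - left.peak[0])  # fall duration after the shift, in ms
```

With `expand_fall=True`, a 60-ms left shift of this invented contour lengthens the fall from 100 to 160 ms, which corresponds to the less steep descent described above; with `expand_fall=False` the fall keeps its 100-ms duration and the low stretch after it grows instead.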

1.4 Perception experiments

Two types of discrimination test and two types of identification test were performed:

(1) A quick serial discrimination test, in which listeners were presented with the ordered series of peak shift stimuli from left to right or from right to left and asked four questions on prepared answer sheets; for each question they heard the series at least once.

(a) Do you perceive any changes in the melody of the sentence from one stimulus to the next?
No - one change - several changes.

Fig. 1 [panels: speech wave and pitch (Hz) against relative time (s) for "Sie hat ja gelogen."]
(a) Speech wave and fundamental frequency (linear scale) of a medial peak in the naturally produced utterance "Sie hat ja gelogen." ("She's been lying."). The end contour (on the syllable -gen) was added by F0 parameter manipulation because the analysis did not provide it. The time marks A1, A2 delimit the F0 peak contour (coinciding approximately with /oː/), which was shifted left and right. (b) The left- and right-most positions of the shifted F0 peak contour on the same time scale as in (a), approximating the natural productions of early and late peaks, respectively.

(b) At which stimulus in the series has the first change occurred? Encircle the relevant number.
(c) At which stimuli in the series have further changes occurred? Encircle the relevant numbers.
(d) What are the meanings of the original utterance and of the utterances representing the first and further changes in the series?

The test tape construction had the following format:

200-ms bleep
800-ms pause
stimulus 1 (or n)
3-s pause
stimulus 2 (or n - 1)
3-s pause
...
stimulus n (or 1).

(2) A formal randomized AX or XA discrimination test, in which all the pairs with one- or two-step differences, as well as pairs of identical stimuli (restricted to odd-numbered ranks to limit the test size), from the ordered peak shift series were presented for 'same/different' judgements on prepared answer sheets. Two test tapes were compiled, one for the ascending and one for the descending order of the stimuli within the pairs, each containing a randomization of 2 repetitions of all the different as well as the identical stimulus pairs, with the following general format:

200-ms bleep
800-ms pause
stimulus A (or X)
2-s pause
stimulus X (or A)
4-s pause

and so on for all the stimulus pairs. After each block of 10 test items, an additional 500-ms bleep was inserted for orientation.

(3) A natural stimuli identification test, in which three different naturally spoken contexts were paired with sentences containing each of three naturally produced peak positions (early, medial, late), for subjects to judge, on prepared answer sheets, whether the context and (the melody of) the test item matched or not. The test tapes were compiled in a short version of 9 items (3 contexts × 3 peaks) and a long one of 90 items, with 10 repetitions of each of the 9 items. In each tape the stimuli were randomized and followed the same format as in the randomized discrimination test, the only difference being that the pause between context and test stimulus was 0.5 s.

(4) A synthesized stimuli identification test, in which one synthesized context sentence was paired with each stimulus from an early to medial F0 peak shift series, for subjects to judge, on prepared answer sheets, whether context and test item matched or not. The test tape contained a randomization of 10 repetitions of each combination of context and test stimulus, following the same format as the natural stimuli identification test.
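The pair inventory of test (2) can be made concrete with a short Python sketch, assuming 11 stimuli numbered from left-most to right-most peak as in 2.1.1.1. The helpers `pair_set` and `build_tape` are hypothetical, and the seeded shuffle merely stands in for whatever randomization procedure was actually used. For 11 stimuli the rules above give 10 one-step, 9 two-step and 6 identical pairs, i.e. 25 pairs and, with 2 repetitions, 50 items per tape, which divide into 5 blocks of 10.

```python
import random
from typing import Optional


def pair_set(n_stimuli: int, ascending: bool) -> list:
    """All 1- and 2-step pairs plus identical pairs of odd-numbered stimuli.

    In the ascending tape the lower-numbered stimulus comes first,
    in the descending tape the higher-numbered one.
    """
    pairs = []
    for step in (1, 2):
        for low in range(1, n_stimuli - step + 1):
            high = low + step
            pairs.append((low, high) if ascending else (high, low))
    # Identical pairs only at odd ranks, to limit the test size.
    pairs.extend((k, k) for k in range(1, n_stimuli + 1, 2))
    return pairs


def build_tape(n_stimuli: int, ascending: bool, repetitions: int = 2,
               block_size: int = 10, seed: Optional[int] = None) -> list:
    """Randomize `repetitions` copies of the pair set and cut it into blocks.

    Each block boundary corresponds to the extra 500-ms orientation bleep.
    """
    items = pair_set(n_stimuli, ascending) * repetitions
    random.Random(seed).shuffle(items)
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


tape = build_tape(11, ascending=True, seed=1)
print(len(pair_set(11, True)), sum(len(block) for block in tape))  # 25 50
```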

The test files were compiled on the computer and output on analogue tape. The listening tests (except those in 2.1.2 and 2.1.3; see the separate descriptions there) took place in the acoustically treated studio of the Kiel Phonetics Institute. The stimuli were presented via loudspeaker to groups of varying size, up to 8 persons, who were students of a variety of subjects including phonetics/linguistics/languages, as well as members of academic and technical staff and "naive" outsiders, all with a northern variety of German as their native language (except for 2.1.2 and 2.1.3; see the separate descriptions there).²

² Michael Weinhold put together the test tapes, carried out the tests, and compiled the data for 2.1.1.2-5 and 3.1-2.

1.5 Interactive perceptual testing at the computer

The development of an intonation model for German and its RULSYS TTS implementation (see Contributions I and VII; Kohler, 1991b, d) has made it possible to check the perceptual relevance of certain changes in F0 configurations very quickly, by generating parametric displays and acoustic output from orthographic input (supplemented by additional symbolic markers, such as @ZZ for early or @ZZZ for late peaks), and by modifying the acoustic output interactively through systematic changes in the graphic parameter representation. This can be achieved in two ways:

(a) In a graphic display of the type illustrated in Fig. 2, F0 points are moved, inserted, deleted, or changed in value, and the speech signal is regenerated with the new parameter specification for auditory evaluation, including auditory comparison with the stored original.

(b) A pitch configuration is defined by the use of the free variables X and Y (for time and frequency) as, for example, in the rule

    00.01: <VOK,FSTRESS,TERMIN> → <TF0=TF0+(X-100)/2.5,T2F0=T2F0+(X-100)/2.5,
           T3F0=T3F0+(X-100)/2.5,2F0=Y>,

which means that a (medial) peak pattern <TERMIN> associated with an accented vowel <VOK,FSTRESS> and defined by three F0 points with the time values TF0, T2F0 and T3F0 is to be displaced in time by adding or subtracting the same variable time value X, and/or vertically expanded or compressed by varying the frequency value of the centre F0 point (2F0). An orthographic input is then processed by the system up to this rule, whereupon an X-Y plane as shown in Fig. 3 appears on the screen, representing 250 time frames of 10 ms along the horizontal and 250 units of 1 Hz along the vertical. A cursor can now be moved, e.g. in 5-unit steps, to feed the variables X and Y in rule 00.01 with new values for further processing. In rule 00.01, the additive time constant of -100 resets the zero point, and the factor of 1/2.5 rescales the temporal step size from 5 × 10 ms to 5 × 10/2.5 ms = 20 ms, allowing parallel shifts of all F0 points by 20 ms with one cursor step along the horizontal, to the right and to the left of the (medial) zero position. The peak pattern can thus be shifted continually along the time scale and the auditory consequences tested in quick succession from stimulus to stimulus of the same sentence type. Similar changes can be made along the frequency axis.
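The arithmetic of rule 00.01 can be transcribed into Python as below. This is a sketch, not RULSYS code: it assumes that RULSYS time values are counted in units of 10 ms (as the calculation 5 × 10/2.5 ms = 20 ms implies), and the names `time_shift_ms` and `apply_rule_0001` are hypothetical. The cursor value X = 100 corresponds to the unshifted medial peak.

```python
FRAME_MS = 10.0      # assumed: one RULSYS time unit (one X-Y plane frame) = 10 ms
ZERO_X = 100         # additive constant of rule 00.01: X = 100 means no shift
TIME_DIVISOR = 2.5   # rule 00.01 divides (X - 100) by 2.5


def time_shift_ms(x: int) -> float:
    """Shift added to TF0, T2F0 and T3F0 for cursor value X, converted to ms."""
    return (x - ZERO_X) / TIME_DIVISOR * FRAME_MS


def apply_rule_0001(times_ms, x: int, y: int):
    """Mimic rule 00.01: shift all three F0 time points in parallel, set 2F0 = Y (Hz)."""
    shift = time_shift_ms(x)
    return tuple(t + shift for t in times_ms), float(y)


# One 5-unit cursor step to the right of the zero position moves the peak by 20 ms.
print(time_shift_ms(105))  # 20.0
```

Because the same shift is added to all three time points, the peak pattern keeps its shape and only its alignment changes; the vertical variable Y alters just the centre point, so rise and fall are expanded or compressed together.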

Both procedures, (a) and (b), are very effective for quick hypothesis testing and for quick checking of points left open by the more elaborate perception experiments, and they have been used a good deal in the Kiel Intonation Project to confirm and expand formal test results, as well as to prepare the ground for new hypotheses and their evaluation in group listening tests.

Fig. 2 [display: F0 track (Hz scale 25-175, cursor read-out F0 130) above segment and cumulative durations]
RULSYS development system output for the symbolic input "Sie hat ja gelogen @ZZ." with an early F0 peak. F0 (in Hz; square parameter and cosine interpolation between defined F0 points) and phonetic transcription aligned to the time scale (segment and cumulative durations in cs); cursor positioned on the peak value.

Fig. 3
X-Y plane for providing variables defined in a TTS rule (e.g. time and frequency) with new values by moving a cursor along the horizontal and/or the vertical axis.
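The caption of Fig. 2 mentions cosine interpolation between the defined F0 points. A common form of this technique, sketched below in Python, joins two neighbouring points with half a cosine period, so that the contour leaves and reaches each point with zero slope and never overshoots the target values. Whether RULSYS uses exactly this formula is an assumption, the function name `cosine_interpolate` is hypothetical, and the "square parameter" mentioned in the caption is not modelled.

```python
import math
from bisect import bisect_right


def cosine_interpolate(points, t):
    """F0 at time t by half-cosine interpolation between (time, F0) points.

    points must be sorted by strictly increasing time; outside the defined
    range the first or last F0 value is held constant.
    """
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t) - 1          # segment containing t
    (t0, f0), (t1, f1) = points[i], points[i + 1]
    u = (t - t0) / (t1 - t0)                # position within the segment, 0..1
    return f0 + (f1 - f0) * (1 - math.cos(math.pi * u)) / 2


# Hypothetical three-point peak (ms, Hz), sampled every 10 ms.
peak = [(0.0, 100.0), (100.0, 150.0), (200.0, 90.0)]
contour = [round(cosine_interpolate(peak, t), 1) for t in range(0, 201, 10)]
print(contour)
```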

2. F0 peak alignment

2.1 Phonetics and phonology

2.1.1 Kiel experiments on German

The first question to be asked with regard to F0 peak alignment is how the acoustic continuum of F0 maximum positions, from early (well before the onset of the stressed vowel with which it is associated) through medial (around the stressed vowel centre) to late (at the end of the stressed vowel), is partitioned perceptually. Is the continual change of the temporal relation of the F0 maximum to stressed vowel onset correlated with a gradual perceptual change, or are there categorical breaks corresponding to phonological switches, and how many of these have to be recognized? The second question, which is closely linked with the first, is whether the perceptual organization of the physical continuum depends on the segmental structure of the stressed syllable, in particular the duration of the stressed vowel, the clear acoustic segmentability of stressed-syllable initial consonants (laterals or fricatives vs. glides or creaky onset), and the presence of post-vocalic voicing. To find answers to these questions, peak shift series were created for the following five utterances:

(1) "Sie hat ja gelogen." [zi ɦat ja gəˈloːgŋ̍] ("She's been lying.")
(2) "Es ist ja gelungen." [ɛs ɪst ja gəˈlʊŋən] ("It has worked.")
(3) "Sie hat ja gejodelt." [zi ɦat ja gəˈjoːdəlt] ("She's been yodelling.")
(4) "Sie muß wohl arbeiten." [zi mʊs vol ˈʔaɐbaɪtn̩] ("She will have to work.")
(5) "Er ist ja geritten." [ɛʁ ɪst ja gəˈʁɪtn̩] ("He's been riding.")

2.1.1.1 "Sie hat ja gelogen."

Taking the medial F0 peak position of the original utterance in Fig. 1 as a point of departure, the contour A1A2 was moved along the time axis in 6 equal steps of 30 ms each to the left and 4 corresponding steps to the right. In the transposition to the right, both branches were moved in parallel; in the one to the left, only the rising branch was moved, the falling one being expanded between the new maximum position and the original right base point. A series with complete parallel shift to the left was generated as well, but its LPC synthesis quality was inferior because of the long low-level F0, sounding rather "metallic", although the pitch pattern was not unnatural, conveying the meaning of greater finality in the statement and of less room for argument. Moreover, the natural productions of early peaks in this sentence showed the same flattened F0 descent. As informal listening did not suggest a different behaviour with regard to the perceptual assessment of shifts in the peak position in the two series, the one with the adjusted falling branch was chosen for the listening experiments.
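The resulting series can be enumerated explicitly: 6 left steps, the original, and 4 right steps give 11 stimuli, with the original medial peak as stimulus 7 (which is why stimulus 7 is later said to coincide with the original production). The Python sketch below only restates this arithmetic; the function names are hypothetical, and the displacement is measured relative to the original F0 maximum.

```python
STEP_MS = 30   # step size used for "Sie hat ja gelogen."
N_LEFT = 6     # steps to the left of the original (medial) peak
N_RIGHT = 4    # steps to the right


def series_offsets(step_ms=STEP_MS, n_left=N_LEFT, n_right=N_RIGHT):
    """Map stimulus number (1 = left-most) to peak displacement in ms."""
    return {k + 1: (k - n_left) * step_ms for k in range(n_left + n_right + 1)}


def shift_mode(offset_ms):
    """Which manipulation produced a stimulus with this displacement."""
    if offset_ms < 0:
        return "rising branch shifted, falling branch expanded"
    if offset_ms > 0:
        return "both branches shifted in parallel"
    return "original medial peak"


for stimulus, offset in series_offsets().items():
    print(stimulus, offset, shift_mode(offset))
```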

2.1.1.1.1 Discrimination tests

The 11 stimuli entered into both discrimination tests, (1) and (2) of 1.4, in the ascending as well as the descending order.

Results

Table I presents the responses by 60 listeners in the left-right peak sequence of the serial discrimination test, Table II the responses by 33 listeners in the right-left sequence.

Table I
Frequency distribution of 'change has occurred' responses by 60 listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus                     3    4    5    6    7    8    9   10   11
First change perceived       1    4   39   16    -    -    -    -    -
Further changes perceived    -    -    1    5   11   15   21   22   11
Total                        1    4   40   21   11   15   21   22   11

Table II
Frequency distribution of 'change has occurred' responses by 33 listeners in the right-left sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus                    10    9    8    7    6    5    4    3    2    1
First change perceived       5    -    7   13    4    3    1    -    -    -
Further changes perceived    -    1    4    2    6   16    6   10    5    1
Total                        5    1   11   15   10   19    7   10    5    1
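The Total rows of Tables I and II are the sums of the first-change and further-change rows, and in these data the first-change counts add up exactly to the number of listeners (60 and 33), i.e. every listener reported a first change. A small Python check makes both relations explicit; the table values are transcribed by hand from Tables I and II, with empty cells read as 0, and the helper names are hypothetical.

```python
# (first change, further changes) per stimulus, transcribed from Tables I and II.
TABLE_I = {3: (1, 0), 4: (4, 0), 5: (39, 1), 6: (16, 5), 7: (0, 11),
           8: (0, 15), 9: (0, 21), 10: (0, 22), 11: (0, 11)}       # left-right, 60 listeners
TABLE_II = {10: (5, 0), 9: (0, 1), 8: (7, 4), 7: (13, 2), 6: (4, 6),
            5: (3, 16), 4: (1, 6), 3: (0, 10), 2: (0, 5), 1: (0, 1)}  # right-left, 33 listeners


def totals(table):
    """Total row: first-change plus further-change responses per stimulus."""
    return {stim: first + further for stim, (first, further) in table.items()}


def check(table, n_listeners):
    """Each listener names one first change, so that row sums to the group size."""
    return sum(first for first, _ in table.values()) == n_listeners


t = totals(TABLE_I)
print(check(TABLE_I, 60), check(TABLE_II, 33))  # True True
print(max(t, key=t.get))  # 5: the stimulus with the most 'change' responses in Table I
```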

The randomized paired discrimination test in the ascending ordering was carried out with a group of 39 subjects, and in the descending ordering with a different group of 34 subjects; each of the two tests contained the pairings of the identical stimuli at the odd rank numbers in the series of 11 test items described in 2.1.1.1, ordered from left-most to right-most F0 peak position. Fig. 4 shows the results.

Fig. 4 [panels (a)-(c): % 'different' (0-100) against stimulus number]
Discrimination functions in the randomized paired discrimination test, showing the percentage of 'different' judgements for utterance pairs of "Sie hat ja gelogen." with 0-step (a), 1-step (b) or 2-step (c) distances of F0 peak positions, in the ordering left-right (continuous line) or right-left (broken line). The stimulus number refers to the second stimulus in the ascending and to the first stimulus in the descending order. 73 sbs., n = 146 at each data point (a); 39 sbs., n = 78 in the left-to-right and 34 sbs., n = 68 in the right-to-left ordering of (b) and (c).
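The plotting convention of Fig. 4 means that a pair and its reversal (e.g. 4-5 and 5-4) appear at the same abscissa, namely the higher-numbered stimulus. The Python sketch below shows how 'different' percentages could be tabulated under that convention; `plot_index` and `percent_different` are hypothetical helpers, responses are assumed to be already separated by ordering and step size, and the usage lines use invented judgements, not test data. With 2 repetitions per pair, each data point pools twice as many judgements as there are subjects (e.g. 39 × 2 = 78).

```python
from collections import defaultdict


def plot_index(pair):
    """Stimulus number under which an (A, X) pair appears in Fig. 4.

    Ascending pairs (a, b), a < b, are plotted at the second stimulus b;
    descending pairs (b, a) at the first stimulus b. Both reduce to the
    higher-numbered stimulus; identical pairs (k, k) are plotted at k.
    """
    return max(pair)


def percent_different(responses):
    """responses: iterable of ((first, second), judged_different).

    Returns {plot index: percentage of 'different' judgements}.
    """
    counts = defaultdict(lambda: [0, 0])  # index -> [n 'different', n total]
    for pair, different in responses:
        tally = counts[plot_index(pair)]
        tally[0] += int(different)
        tally[1] += 1
    return {k: 100.0 * d / n for k, (d, n) in sorted(counts.items())}


# Invented judgements for illustration only.
example = [((4, 5), True), ((4, 5), False), ((5, 6), True), ((5, 6), True)]
print(percent_different(example))  # {5: 50.0, 6: 100.0}
```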

Discussion

Both types of test converge in demonstrating a major and a minor peak in the discrimination function, around stimuli 5/6 and 9/10 respectively, but also a strong order effect. On the one hand, discrimination is sharpest, and equally so in both orderings of different stimuli, if the 1-step distance is located between stimuli 5 and 6 or, correspondingly, the 2-step distance between stimuli 5 and 7 (i.e. for the pairs 5-6, 6-5, 5-7 and 7-5); on the other hand, the differentiation weakens if the distance is located at a lower position in the series for the descending sequence (5-4, 5-3) or at a higher position for the ascending one (6-7, 7-8, 6-8, 7-9). Stimulus 5 is highly discriminated if it comes second or is spanned in the pair, i.e. in 4-5, 3-5 and 4-6, and this even occurs by way of 'false alarms' in identical pairings of stimulus 5.

So the question arises as to what there is in the signal that might mark stimulus 5 as different from all the others. Fig. 5 shows the positions of the F0 peaks in stimuli 4 and 5 in relation to the speech wave. Stimulus 5 is the first one in the series of 11, from left to right, where the F0 contour enters the accented vowel /oː/ on a rising slope; in all the preceding stimuli in the series, F0 falls throughout the vowel. In stimulus 5 the increase of acoustic energy in the transition from the consonant /l/ to the vowel /oː/ is thus coupled with a rising F0, the rising slope of the peak contour across /gəloː/ being intensified over its final 30 ms. In stimulus 4 this does not happen; instead, a fall is intensified. As the peak is moved further to the right, the F0 rise becomes progressively more extensive over a progressively longer increase in acoustic energy up to the middle of the vowel, i.e. to the F0 peak position in stimulus 7, which coincides with the original production. In this continuum, distinctivity between successive stimuli will drop once the increase in the F0 rise has reached perceptual saturation. This seems to happen after stimulus 6.

A further shift of the F0 peak to the right beyond stimulus 7 results in an increasing low F0 stretch (see Fig. 1), which receives the intensification, whereas, at the same time, the end of the rise is linked with a decrease of acoustic energy. When both parameter changes are large enough, successive stimuli have their distinctivity raised again. This seems to happen around stimuli 9 and 10 in the ascending order, but it is obviously a much weaker effect than the change from falling to rising F0 in the stressed vowel, producing much lower peaks in the response functions.

Fig. 5
F0 peaks in stimuli 4 and 5 of the series of 11 "Sie hat ja gelogen." from left-most to right-most position, in relation to the speech wave. The vertical lines mark the F0 maximum.

These results suggest that there is a maximum of sensitivity in the peak shift continuum in the area of stimuli 5/6. So any pairings within or progressing towards this area are discriminated best, viz. 4-5, 5-6, 6-5, 7-6 (and even 8-7); 3-5, 4-6, 5-7, 7-5, 8-6; but not 5-4, 6-7, 7-8, 5-3, 6-8, where the progression is away from the area of high sensitivity. A second, weaker sensitivity peak is located at stimuli 9/10, but it does not surface in the response functions for the descending order, because of the displacement to the right of the discrimination curve associated with stimuli 5/6. The acoustic continuum is thus perceptually partitioned into two clearly delimited sections, with the boundary occurring between stimuli 4 and 6, and this perceptual division coincides with an acoustic change from falling to rising F0 across stressed vowel onset. Around the boundary between these two sections discrimination is sharpest and, as will be seen in 2.1.1.1.2 and 2.2, the two perceptually determined sections of the acoustic continuum correspond to two intonational categories related to a semantic differentiation between 'established' and 'new'.
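The acoustic criterion can be stated as a simple test: a stimulus falls on the 'early' side if its shifted F0 maximum lies at or before stressed vowel onset, so that F0 only falls within the vowel. The text gives no measurement of where the original peak lies relative to vowel onset, but one can be bracketed: stimulus 5 (offset -60 ms) is the first to enter the vowel on a rising slope and stimulus 4 (offset -90 ms) falls throughout, so with 30-ms steps the original peak must lie more than 60 and at most 90 ms after vowel onset. The Python sketch below assumes a value of 75 ms inside that range; the name `early_stimuli` is hypothetical, and the code models only the acoustic alignment, not perception.

```python
STEP_MS = 30
OFFSETS = {k + 1: (k - 6) * STEP_MS for k in range(11)}  # stimulus 7 = original peak


def falls_throughout_vowel(offset_ms, peak_after_onset_ms):
    """True if the shifted F0 maximum lies at or before stressed vowel onset.

    peak_after_onset_ms is the distance of the ORIGINAL (medial) peak from
    vowel onset; if the shifted maximum is not after onset, F0 only falls
    within the vowel.
    """
    return peak_after_onset_ms + offset_ms <= 0


def early_stimuli(peak_after_onset_ms):
    """Stimuli whose F0 contour does not enter the vowel on a rising slope."""
    return [s for s, off in OFFSETS.items()
            if falls_throughout_vowel(off, peak_after_onset_ms)]


# Assumed value inside the 60-90 ms range implied by the stimulus 4/5 contrast.
print(early_stimuli(75))  # [1, 2, 3, 4]
```

Any assumed value in the bracketed range yields the same split, stimuli 1-4 versus 5-11, which matches the perceptual boundary between stimuli 4 and 6 reported above.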

So it appears that we are dealing here with an example of 'categorical perception' (see Repp, 1984), this time in the domain of pitch (Kohler, 1987a). The data point to an abrupt perceptual change when, in the acoustic continuum, the F0 peak is moved into the vowel of the stressed syllable. A further F0 peak shift along the acoustic continuum results in a more gradual auditory change, with a minor sensitivity maximum at a point where the initial stretch of low-level F0 and the final weakening of the rise-fall in the stressed vowel become large enough. The data thus support Hypothesis (2) (see Contribution I; Kohler, 1991b) as far as the abrupt vs. gradual changes in perception are concerned. This means that an early F0 peak must constitute a phonological category of German intonation, contrasting with a medial peak, whereas a late peak is less clearly separated, although the perceptual results may turn out to be different if, in accordance with natural production, the F0 peak shift to late positions were accompanied by a similar shift of the acoustic energy maximum to the right (whereas in the stimulus manipulation the energy profile of the original medial F0 peak utterance, synchronized with F0 on the vowel centre, was used). The minor sensitivity maximum in the response function could then easily be boosted (see 2.1.1.5). In 2.1.1.1.2 and 2.2, further support will be given to the organization of the semantic functions in parallel with the perceptual and phonological structuring of F0 peak alignment.

2.1.1.1.2 Identification tests

On the basis of the discrimination test results and of hypotheses concerning the semantics of early, medial and late peaks, three contexts were constructed:

(1) "Wer einmal lügt, dem glaubt man nicht, auch wenn er gleich die Wahrheit spricht. Das gilt auch für Anna."
("Once a liar, always a liar. This also applies to Anne.")
This context sets the frame for an established fact and the summing up of an argument, which is brought to a close.

(2) "Jetzt versteh' ich das erst."
("Now I understand.")
This context presents a new fact and opens up a new argument.

(3) "Oh!"
This context introduces emphatic surprise.

Each of these contexts was spoken naturally and paired with each of the three naturally produced peaks in the sentence "Sie hat ja gelogen." to form a natural stimuli identification test according to 1.4 (3). Furthermore, a synthesized stimuli identification test (see 1.4 (4)) was performed with pairings of context (2) ("Jetzt") and each one of the first 8 stimuli in the continuum (from left to right) of 2.1.1.1.

Results

Table III and Fig. 6 present the results of the two tests.

Table III
Percentages of 'matching' responses for combinations of 3 contexts and early, medial or late F0 peaks in the sentence "Sie hat ja gelogen." in a natural stimuli identification test; 88 subjects

                              Context
Peak position     (1) Wer    (2) Jetzt    (3) Oh
early               87.5        27.3        8.0
medial              26.1        70.5       72.7
late                13.6        67.0       76.1

Discussion

The results of combining the 3 contexts and 3 F0 peak positions show that subjects are able to make systematic judgements, because the responses are significantly different from chance, being either more than 66% or less than 30% in favour of 'matching'. This means that the different F0 peak positions must be perceptually identifiable, and since in all cases the identification of an early versus a non-early peak is far more clearly differentiated than that of a medial versus a late one, this identification test reproduces the categorization of the discrimination tests. It is only in the "Wer" context that the medial vs. late F0 peaks yield a significant difference in the response patterns (χ² = 4.31, p = .05). Contrariwise, the early pattern fits least into the "Oh" context (difference between "Jetzt" and "Oh" contexts: χ² = 31.07, p = .001).

Fig. 6 [% 'matching' against stimulus number 1-8]
Identification function in the synthesized stimuli identification test, showing the percentage of 'matching' judgements for 8 stimuli "Sie hat ja gelogen." with F0 peak shift from left to right in the context "Jetzt versteh' ich das erst." 19 subjects; for each stimulus n = 190.
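The first of these χ² values can be reproduced from Table III under two assumptions of ours, which the text does not state: that each of the 88 subjects gave one judgement per context-peak combination (26.1% and 13.6% then correspond to 23 and 12 'matching' responses out of 88), and that a Pearson χ² without continuity correction was computed on the 2 × 2 table of medial vs. late peak by matching vs. non-matching. The helper names below are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, no continuity correction, for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def matching_counts(percent, n_subjects=88):
    """(matching, non-matching) counts, assuming one judgement per subject."""
    matching = round(percent / 100 * n_subjects)
    return matching, n_subjects - matching


medial = matching_counts(26.1)  # "Wer" context, medial peak
late = matching_counts(13.6)    # "Wer" context, late peak
chi2 = chi_square_2x2(*medial, *late)
print(medial, late, round(chi2, 2), round(p_value_df1(chi2), 3))
```

The result, about 4.32 with p just under .04, agrees with the reported χ² = 4.31 up to rounding, which lends some plausibility to the assumed counts; the second value (31.07) is not attempted here, since the comparison it is based on is not fully specified.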

The contextualization of the early to medial F0 peak continuum with the "Jetzt" introduction (Fig. 6) shows an abrupt switch between 'matching' and 'non-matching' judgements in spite of the gradual change along the physical dimension, and thus adds support to the assumption of categorical perception advanced in connection with the discrimination tests. Stimuli 1-4 represent one perceptual identification category, stimuli 6-8 a different one. They may be regarded as two phonological categories, viz. 'early' and 'medial' F0 peaks. The discrimination of stimuli is sharpest between these identification categories, which is precisely what the theory of categorical perception postulates.

2.1.1.1.3 "Sie hat gelogen."

As the objection was raised that the responses in the natural stimuli identification test of 2.1.1.1.2 might have been influenced by the modal particle "ja" ("after all"; "I see") predetermining the judgement, a new set of 9 context-peak combinations was generated. This was done by excising the signal portions corresponding to "ja" from the existing ones used in the test of 2.1.1.1.2. The splicing was easy to perform because the word was bounded by silence (= voiceless occlusions in [t] and [g]). Then two long versions of the natural stimuli identification test according to 1.4(3) were generated: one with the stimuli "Sie hat ja gelogen." and one with "Sie hat gelogen."

These two tests were run at an interval of one week with two groups of subjects in the following sequence. Group I (17 subjects) did the test with the "ja" stimuli first and the other test second; for Group II (7 subjects) the order was reversed. Table IV presents the results.

Table IV

Percentages of 'matching' responses for combinations of 3 contexts and early, medial or late F0 peaks in the sentences "Sie hat (ja) gelogen." in a natural stimuli identification test with 10 repetitions and two groups of subjects (I: 17 sbs; II: 7 sbs); A with, B without "ja"

Peak     Set   (1) Wer        (2) Jetzt      (3) Oh
               I      II      I      II      I      II
early    A     82.9   81.4    31.8   11.4    19.4   15.7
early    B     86.4   85.7    39.4   38.6    20.6   18.6
medial   A     41.2   28.6    84.1   94.3    65.9   92.9
medial   B     48.2   57.1    80.0   90.0    64.7   84.3
late     A     25.3   22.9    81.8   98.6    78.2   97.1
late     B     31.2   37.1    85.3   87.1    87.6   90.0

As in 2.1.1.1.2 (see Table III), the responses to the "ja" stimuli (= A) are in all cases either clearly positive or clearly negative, and significantly different from an equal distribution. Again the 'medial' and 'late' peaks produce more similar judgement patterns than the 'medial' and 'early' ones. They are only significantly different for Group I in the "Wer" context (χ² = 9.66, p = .01) and in the "Oh" context (χ² = 6.44, p = .05). The strong distinction between 'early' and 'medial' and the much weaker differentiation between 'medial' and 'late' have thus been confirmed. This finding once more supports the hypothesis of a categorical switch from 'early' to 'medial' and a gradual change from 'medial' to 'late' peak positions in the utterance "Sie hat ja gelogen."

For the stimuli without "ja", essentially the same results were obtained. Of the 18 comparisons of the results for utterances with and without "ja", only four are statistically significant according to χ² tests. The first is in Group I, the others are in Group II:

(a) the 'late' peak in the "Oh" context: χ² = 5.32, p = .05,

(b) the 'medial' peak in the "Wer" context: χ² = 11.67, p = .001,

(c) the 'early' peak in the "Jetzt" context: χ² = 13.75, p = .001,

(d) the 'late' peak in the "Jetzt" context: χ² = 6.89, p = .01.
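The text does not state how the χ² values were computed. One plausible reading, assumed here, is an uncorrected Pearson χ² (1 df) on a 2 × 2 table of 'matching' vs. 'non-matching' counts for the stimuli with and without "ja". Each count would then be the Table IV percentage applied to subjects × 10 repetitions, i.e. 170 judgements for Group I and 70 for Group II.

Under that assumption the sketch below yields 6.89 for comparison (d) and 13.75 for (c), matching the reported values. This is consistent with, but does not prove, the procedure actually used. The helper names are hypothetical.

```python
def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 df, no continuity correction) for the table
        [[a, b],
         [c, d]]
    with rows = conditions and columns = 'matching' / 'non-matching'."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def counts_from_percent(pct: float, n: int) -> tuple[int, int]:
    """Convert a percentage of 'matching' responses into (matching, non-matching) counts."""
    matching = round(pct / 100 * n)
    return matching, n - matching


# Comparison (d): 'late' peak, "Jetzt" context, Group II (7 subjects x 10 repetitions = 70).
with_ja = counts_from_percent(98.6, 70)      # (69, 1)
without_ja = counts_from_percent(87.1, 70)   # (61, 9)
print(round(chi2_2x2(*with_ja, *without_ja), 2))  # 6.89
```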

In (a), (b) and (c) the difference implies an increase in 'matching' answers for stimuli without "ja", which is contrary to what would be expected if the objection were valid. In the remaining case (d), however, there is a decrease in 'matching' responses for stimuli without "ja". This may be taken as an indication that the modal particle "ja" strengthens the meaning conveyed by intonation. But the results cannot be determined solely by the particle, especially since this pairwise testing increases the α error. It may thus lead to rejecting the null hypothesis of no distinction between the two utterance types even when that hypothesis is correct.
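The size of this α inflation can be estimated with a simple calculation. As a simplification, assume the 18 comparisons are independent tests at α = .05. They are not strictly independent, since the same listeners contribute to several of them. Under this assumption the probability of at least one spurious significant result is 1 − (1 − α)^18, roughly 0.60. A Bonferroni-corrected per-test level, which controls this probability without the independence assumption, would be .05/18 ≈ .0028. The function names are hypothetical.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one false rejection among m independent tests
    that are each run at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error at or below alpha."""
    return alpha / m


print(round(familywise_error(0.05, 18), 3))  # 0.603
print(round(bonferroni_level(0.05, 18), 4))  # 0.0028
```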

A further objection might be that the order of the two tests had an influence on the results: if the "ja" stimuli are tested first, the pattern would also be set for the stimuli without "ja". Group II, for which the order was reversed, should thus produce a significantly smaller number of 'matching' responses for the stimuli without "ja" more frequently than Group I. The above data do not support this assumption. Moreover, the stimuli without "ja" do not show significant distinctions between Groups I and II, with the one exception of the 'medial' peak in the "Oh" context (χ² = 9.64, p = .01). In view of the possible increase of the α error, we can thus say that the test order did not have a significant influence on the response patterns. These patterns are basically determined by an intonational phonology, i.e. by 'early' vs. 'non-early' F0 peak positions (less strongly by 'medial' vs. 'late' ones). They may be heightened, but not replaced, by other formal means, such as modal particles.

2.1.1.2 "Es ist ja gelungen."

The question now arises whether the perceptually relevant timing differences between different peak positions relative to stressed vowel onset are transferable to other syllable structures, and in what ways they may have to be adjusted. The first syllable structure selected was the one containing a phonologically short vowel, instead of a long one, in an otherwise comparable segment chain: "Es ist ja gelungen." (see 2.1.1).

Fig. 7 shows the speech wave as well as the energy and F0 contours in the natural medial-peak token selected for F0 peak shift. The test stimulus generation followed the procedure of parallel shifts of both branches of the peak contour (see 1.3). The step size was 30 ms, and one peak was located at the boundary between the stressed-syllable initial consonant /l/ and the stressed vowel /u/. Fig. 8 shows the 9 different peak positions used for the stimulus generation. Only the quick serial discrimination test (see 1.4(1)) was performed, in the left-right sequence, with 29 subjects.
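The parallel shift can be pictured as moving the three anchor points of the peak configuration (left base point, maximum, right base point) by the same time offset while keeping their F0 values, one step per stimulus. The sketch below is a schematic illustration of that idea, not the resynthesis procedure of 1.3. The class and function names, the anchor values, and the placement of the natural token at position 6 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakConfig:
    """Anchor points of a peak contour as (time_ms, f0_hz) pairs."""
    left_base: tuple[float, float]
    maximum: tuple[float, float]
    right_base: tuple[float, float]

    def shifted(self, delta_ms: float) -> "PeakConfig":
        """Parallel shift: all anchors move by delta_ms and keep their F0,
        so the rising and falling branches keep their shape."""
        def move(point: tuple[float, float]) -> tuple[float, float]:
            return (point[0] + delta_ms, point[1])
        return PeakConfig(move(self.left_base), move(self.maximum), move(self.right_base))


def make_continuum(reference: PeakConfig, ref_position: int,
                   n_positions: int, step_ms: float) -> list[PeakConfig]:
    """Stimuli 1..n_positions, with the natural token at ref_position."""
    return [reference.shifted((k - ref_position) * step_ms)
            for k in range(1, n_positions + 1)]


# Hypothetical anchor values (ms, Hz), not measured from Fig. 7.
token = PeakConfig((300.0, 90.0), (420.0, 140.0), (560.0, 80.0))
stimuli = make_continuum(token, ref_position=6, n_positions=9, step_ms=30.0)
print([s.maximum[0] for s in stimuli])
# [270.0, 300.0, 330.0, 360.0, 390.0, 420.0, 450.0, 480.0, 510.0]
```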

Results

Table V presents the results.

Table V

Distribution of 'change has occurred' responses by 29 listeners in the left-right sequence of the serial discrimination test across the 9 stimuli with F0 peak shifts in "Es ist ja gelungen." (1 = left-most, 9 = right-most position)

First change perceived at

Further changes perceived at

4

5

5

19

Stimulus

6

5

Total 21 10

Fig. 7

Speech wave, energy and F0 contours (linear scale) in the natural medial-peak token of "Es ist ja gelungen." selected for F0 peak shift. The time marks indicate on- and offsets of /g/, /ə/, /l/ and /u/. The broken lines mark the left and right base points as well as the maximum of the peak configuration to be shifted.

Discussion

There is again an abrupt change in the response pattern as the F0 peak is moved into the stressed vowel. The absolute timing of positions 5 and 6 after vowel onset, i.e. 30 ms and 60 ms, respectively, is exactly the same as in the stimuli "Sie hat ja gelogen." (see 2.1.1.1.1). These data point to an absolute time span of up to 60 ms into the stressed vowel that is responsible for a phonological change from 'early' to 'medial' peak. This holds independently of the phonological vowel quantity, and consequently of the vowel duration following the F0 peak, at least in disyllables. This finding means that the 'medial' F0 peak has a later relative position in a short vowel than in a long one, viz. closer to its offset. This ties in with the production and perception data in Contribution II (Gartenberg & Panzlaff-Reuter, 1991, 5.2). It means, furthermore, that the series of 9 stimuli did not include a proper 'late' peak: it would have had to be located well into the unstressed vowel /ə/.

Fig. 8

Speech wave and F0 contour (linear scale) in "Es ist ja gelungen." with time marks indicating the 9 F0 peak positions for complete-contour shift from left to right.
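The arithmetic behind this argument can be spelled out. With 30 ms steps, and positions 5 and 6 lying 30 ms and 60 ms after vowel onset, position 4 must coincide with vowel onset. This fits the statement that one peak was placed at the boundary between /l/ and /u/. The same absolute offset then covers a larger fraction of a short vowel than of a long one. The vowel durations in the example are hypothetical, chosen only to illustrate the point, and the function names are likewise illustrative.

```python
def ms_after_vowel_onset(position: int, onset_position: int, step_ms: float) -> float:
    """Peak time relative to stressed-vowel onset for a continuum position."""
    return (position - onset_position) * step_ms


def relative_position(ms_into_vowel: float, vowel_ms: float) -> float:
    """Fraction of the vowel elapsed at the peak (0 = onset, 1 = offset)."""
    return ms_into_vowel / vowel_ms


# "gelungen": 30 ms steps, position 4 at vowel onset.
print(ms_after_vowel_onset(5, 4, 30), ms_after_vowel_onset(6, 4, 30))  # 30 60

# Hypothetical vowel durations (ms) for a short and a long vowel.
print(relative_position(60, 80))   # 0.75
print(relative_position(60, 150))  # 0.4
```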

2.1.1.3 "Sie hat ja gejodelt."

The next syllable structure to be considered contains a long stressed vowel /o:/, as in "Sie hat ja gelogen.", but with a glide /j/ in the initial position of the stressed syllable: "Sie hat ja gejodelt." (see 2.1.1). The glide has a much more gradual articulatory/acoustic transition than the more abrupt change associated with the initial lateral /l/. The question is whether the more gradual spectral transition influences the perception of the F0 transition into the stressed vowel, because the F0 peak position relative to vowel onset can be assessed less clearly. Fig. 9 shows the speech wave as well as the energy, F0 and spectrum displays in the natural medial-peak token selected for F0 peak shift. The test stimulus generation followed the same procedure as in 2.1.1.2, with a step size of 35 ms. One peak (no. 5) was located at the temporal centre of the F2 formant transition in /əjo:/. Fig. 10 shows the 11 different peak positions used for the stimulus generation. Only the quick serial discrimination test (see 1.4(1)) was performed, in the left-right sequence, with 24 subjects.

Fig. 9

Speech wave, energy, F0 (linear scale) and spectral displays in the natural medial-peak token of "Sie hat ja gejodelt." selected for F0 peak shift. The time marks indicate the left base point (approx. in the temporal centre of the F2 transition for /əjo:/), the maximum F0 value and the right base point.

Fig. 10

Speech wave and F0 contour (linear scale) in "Sie hat ja gejodelt." with time marks indicating the 11 F0 peak positions for complete-contour shift from left to right.

Results

Table VI presents the results.

Table VI

Distribution of 'change has occurred' responses by 24 listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gejodelt." (1 = left-most, 11 = right-most position)

Stimulus                    4    5    6    7    8    9   10   11
First change perceived      5    9    8    2    -    -    -    -
Further changes perceived   -    1    8    2    3    6    1    6
Total                       5   10   16    4    3    6    1    6
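Tables of this kind can be derived mechanically from the listeners' answer sheets. In the left-right sequence, each listener's earliest 'change has occurred' mark counts as the first change and any later marks count as further changes; the Total row is their sum. The sketch below assumes a hypothetical input format (one list of marked stimulus numbers per listener) and uses made-up responses. As a consistency check, the first-change row of Table VI adds up to the 24 listeners (5 + 9 + 8 + 2).

```python
from collections import Counter


def tally(marks_per_listener: list[list[int]], n_stimuli: int):
    """Summarize a left-right serial discrimination test.

    marks_per_listener[k] holds the stimulus numbers at which listener k
    reported 'change has occurred'. The earliest mark is the first change,
    all later marks are further changes. Returns three lists (first,
    further, total) indexed by stimulus 1..n_stimuli.
    """
    first, further = Counter(), Counter()
    for marks in marks_per_listener:
        ordered = sorted(marks)
        if ordered:
            first[ordered[0]] += 1
            further.update(ordered[1:])
    stimuli = range(1, n_stimuli + 1)
    first_row = [first[s] for s in stimuli]
    further_row = [further[s] for s in stimuli]
    total_row = [a + b for a, b in zip(first_row, further_row)]
    return first_row, further_row, total_row


# Three made-up listeners on an 11-step continuum.
f, g, t = tally([[5, 9], [6], [5, 10, 11]], 11)
print(t[3:])  # totals for stimuli 4..11: [0, 2, 1, 0, 0, 1, 1, 1]
```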


Discussion

In this case the first change occurs less abruptly, although it is still clearly marked and coincides with the temporal half-way position of the F0 peak in the F2 transition. Further changes in the perceptual profile also occur earlier than in the other stimulus types tested so far. All this goes to show that a glide transition does interfere with the categorization of F0 peaks. The general pattern nevertheless remains: a phonological separation of 'early' and 'medial' peaks and a gradual switch from 'medial' to 'late'.

2.1.1.4 "Sie muß wohl arbeiten."

The next syllable structure chosen has creaky voice (the phonetic realisation of a syllable-initial vowel prefixed by a glottal stop) before a stressed long vowel: "Sie muß wohl arbeiten." (see 2.1.1). The question is whether a creaky voice onset has the same effect on F0 peak categorization as a glide. Fig. 11 shows the speech wave as well as the energy and F0 contours in the token selected for F0 peak shift. The test stimulus generation followed the same procedure as in 2.1.1.2, with a step size of 35 ms. One peak (no. 5) was located at the onset of the more regular glottal vibrations at the transition from /l/ to the stressed vowel of "arbeiten". Fig. 12 shows the 11 different peak positions used for the stimulus generation. Only the quick serial discrimination test (see 1.4(1)) was performed, in the left-right sequence, with 24 subjects.

Fig. 11

Speech wave, energy and F0 contours (linear scale) in the natural medial-peak token of "Sie muß wohl arbeiten." selected for F0 peak shift. This token has a creaky voice transition instead of a glottal stop interruption of voicing. The time marks delimit the F0 peak configuration that was shifted (left and right base points, and maximum).

Fig. 12

Speech wave and F0 contour (linear scale) in "Sie muß wohl arbeiten." with time marks indicating the 11 F0 peak positions for complete-contour shift from left to right.

Results

Table VII presents the results.

Table VII

Distribution of 'change has occurred' responses by 24 listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie muß wohl arbeiten." (1 = left-most, 11 = right-most position)

Stimulus                    4    5    6    7    8    9   10   11
First change perceived      3   20    1    -    -    -    -    -
Further changes perceived   -    -    1    2    5    5    8    2
Total                       3   20    2    2    5    5    8    2

Discussion

The first change occurs very abruptly in stimulus 5, i.e. about 35 ms after the creaky voice transition. The perception of later changes is spread over the remainder of the continuum without clear peaks in the response function. There is a minor maximum at stimulus 10, i.e. at a position similar to that in the continuum across "Sie hat ja gelogen." In every respect "Sie muß wohl arbeiten." thus patterns with the latter, rather than with the case of a glide transition in "Sie hat ja gejodelt." What seems to be important for F0 peak perception is the abrupt articulatory change in the transfer function from /l/ to the stressed vowel (in "gelungen" as well as in "wohl arbeiten"). The gradual change in phonation from voice to creak to voice, superimposed on the articulatory switch, is not what matters.

2.1.1.5 "Er ist ja geritten."

In the stimuli examined so far, the course of the peak contour has been manifested in the observable F0 values. This changes when the post- and/or pre-vocalic consonant in the stressed syllable associated with the peak is voiceless. Now part of the contour has to be reconstructed before a peak shift becomes possible. Fig. 13 shows the speech wave as well as the energy and F0 contours in a natural medial-peak token of "Er ist ja geritten." selected for F0 peak shift (see 2.1.1). The test stimulus generation followed the same procedure as in 2.1.1.2, with a step size of 30 ms and 15 peak positions from left to right, starting at the beginning of /ə/. The quick serial discrimination test (see 1.4(1)) was performed informally in the left-right sequence by the experimenter.

Again there was an abrupt change in perception as the peak entered the stressed vowel. But as the peak was moved into the voiceless section of /t/, it lost its characteristics, becoming lower and lower in pitch. This proves that the maximum value of a peak contour must be present in the signal for identification: it is not reconstructed by a listener from surrounding values of the rising and falling branches. A low right base point, by contrast, may be missing due to F0 contour truncation before voicelessness in final syllables (see Gartenberg & Panzlaff-Reuter, 1991, 3.) without detriment to the peak characteristics. On the contrary, there must be truncation in certain contexts to guarantee pattern identity.

Fig. 13

Speech wave, energy and F0 contours (linear scale) in the natural medial-peak token of "Er ist ja geritten." selected for F0 peak shift. The time marks indicate on- and offsets of /g/, /ə/, /ʁ/, /i/, /t/, /n/. The dotted line represents the reconstructed F0 interpolation of the right branch of the peak contour. The broken lines mark the left and right base points as well as the maximum of the peak configuration to be shifted.

A further shift of the peak contour maximum to the onset of voicing in the final /n/ approximates the F0 configuration found in natural productions of late peaks (see Fig. 14). The auditory impression, however, is still that of a medial peak, not of a late one. A comparison of Figs. 13 and 14 shows that the final nasals in medial and late peaks differ in amplitude and mode of vocal fold vibration. In medial peaks (and the same would apply to early ones), the F0 fall to a low level at the end of an utterance is accompanied by a drop in source amplitude. This drop weakens unstressed vowels and sonorants considerably, often reducing them to creaky voice and to irregular breathy glottal pulses. In late peaks, this decline is moved to the right, following the later F0 fall, thus keeping a high source amplitude at the onset of unstressed vowels and syllabic sonorants. On the other hand, the low F0 stretch in the stressed vowel before the peak has its intensity reduced. So there is a natural parallelism in the time courses of F0, source amplitude and sound intensity for the three peak contours. If it is destroyed, the perceptual pattern identity may be lost.

Fig. 14

Speech wave, energy and F0 contours (linear scale) in a natural late-peak token of "Er ist ja geritten." The time marks indicate on- and offsets of /g/, /ə/, /ʁ/, /i/, /t/, /n/.

Thus a late peak, positioned at the sonorant voicing onset after a voiceless obstruent, can only be successfully reconstructed by a listener under one condition. The F0 descent to the terminal low level must have a high enough source amplitude to guarantee sufficient intensity in the final sonorant for the high falling F0 contour to be auditorily monitored. A natural medial-peak utterance, with its low final intensity and glottal irregularity, lacks these attributes. It therefore cannot be turned into a late-peak percept simply by an F0 shift into the appropriate location. The amplitude and duration of the final sonorant have to be raised considerably at the same time, and the mode of vibration changed. This can be achieved by transferring the final /n/ from the late-peak stimulus. Contrariwise, with a late-peak stimulus as the point of departure, a perceptually convincing medial-peak pattern can only be generated if the final sonorant is drastically lowered in amplitude and shortened in addition to the peak shift. This has also been reproduced in RULSYS TTS formant synthesis by rule (Kohler, 1991f).

2.1.1.6 "Sie hat ja gestritten."

If a short stressed vowel is not only followed but also preceded by voiceless obstruents, the masking of peak height as the maximum value is moved into the voiceless section arises syllable-initially as well. Fig. 15 shows the speech wave as well as the energy and F0 contours in a natural medial-peak token of "Sie hat ja gestritten." [zi hat ja gəˈʃtʁitn] ("She's been quarrelling."). In this token F0 sets in higher in the stressed vowel than in "geritten" of 2.1.1.5. The reason is that the initial cluster /ʃtʁ/ is much longer than the initial /ʁ/, so F0, rising from the left base point in /ə/, has reached a higher value at vowel onset. In addition, there is a CF0 increase caused by the preceding voiceless fricative (see Gartenberg & Panzlaff-Reuter, 1991, 3.). "geritten" and "gestritten" converge, however, in having their F0 maximum close to vowel offset, as is usual for medial peaks in short stressed vowels before an unstressed syllable (see loc. cit., 5.2.).

The peak shift in "Sie hat ja gestritten." was tested interactively using the TTS research tool (see 1.5(b)). The rule-generated medial-peak position served as the point of departure (see Fig. 16a), with a step size of 20 ms in complete parallel shift. The relevant F0 values in the rule-generated utterance at /ə/ and /i/ on- and offsets are 84 Hz, 88 Hz, 144 Hz and 140 Hz, respectively. When the peak is located 40 ms before the beginning of /i/, it is clearly 'early'; at /i/ onset (see Fig. 16b) it has changed to 'medial'. The corresponding F0 values in these two positions are 88 Hz, 104 Hz, 138 Hz, 86 Hz, and 84 Hz, 94 Hz, 148 Hz, 108 Hz. So the change from 'early' to 'medial' occurs quite abruptly in this syllable structure as well. It happens when the F0 rise across the voiceless cluster becomes more extensive than the fall, and the F0 offset in the stressed vowel is in the middle of the F0 range between the maximum and minimum values in the utterance.

Fig. 15

Speech wave, energy and F0 contours (linear scale) in a natural medial-peak token of "Sie hat ja gestritten." The time marks indicate on- and offsets of /g/, /ə/, /ʃ/, /t/, /ʁ/, /i/, /t/, /n/.

Thus in this sentence the switch from 'early' to 'medial' occurs before there is an initial F0 rise in the stressed vowel. This differs from all the other syllable structures analysed so far, which have initial voiced consonants. The reason for this difference lies in the CF0 interference, which is evidently taken into account in the perception process.
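The criterion stated above can be checked against the two F0 configurations quoted for "Sie hat ja gestritten.". The sketch computes the rise from /ə/ offset to /i/ onset, the fall within /i/, and the relative height of the /i/ offset value.

One simplifying assumption is labelled here: the range is taken from these four values only, whereas the text refers to the F0 range of the whole utterance, which they need not span. With this proxy the medial offset lies at about 0.38 of the range rather than exactly in the middle. The function name is hypothetical.

```python
def rise_fall(schwa_on, schwa_off, vowel_on, vowel_off):
    """Summarize a four-point F0 configuration (Hz) around the stressed vowel.

    Returns (rise across the voiceless cluster, fall within the stressed vowel,
    relative height of the vowel-offset F0 within the range spanned by the
    four values: 0 = minimum, 1 = maximum).
    """
    values = (schwa_on, schwa_off, vowel_on, vowel_off)
    low, high = min(values), max(values)
    rise = vowel_on - schwa_off   # from /ə/ offset to /i/ onset
    fall = vowel_on - vowel_off   # from /i/ onset to /i/ offset
    return rise, fall, (vowel_off - low) / (high - low)


early = rise_fall(88, 104, 138, 86)    # peak 40 ms before /i/ onset
medial = rise_fall(84, 94, 148, 108)   # peak at /i/ onset
print(early)   # (34, 52, 0.0)
print(medial)  # (54, 40, 0.375)
```

In the 'early' configuration the rise is smaller than the fall; in the 'medial' one it is larger, as the text describes.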


Fig. 16

(a) RULSYS output of "Sie hat ja gestritten." (= (default) medial peak), (b) 60 ms peak shift to the left (= first clear medial peak position in the left-right move). F0 (in Hz; square parameter and cosine interpolation between set F0 points) and phonetic transcription aligned to the time scale (segment and cumulative durations in cs); EO = ə, SH = ʃ.
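The caption mentions cosine interpolation between set F0 points in the RULSYS output. The exact RULSYS formula is not given here, so the sketch below shows only a generic half-cosine interpolation, an assumption about the kind of scheme meant. It evaluates 10 ms frames between target points; each transition starts and ends with zero slope. The "square parameter" part is not modelled. Function name and target values are hypothetical.

```python
import math


def cosine_interpolate(targets, frame_ms=10):
    """F0 value every frame_ms from the first to the last target.

    targets: list of (time_ms, f0_hz) pairs with strictly increasing times.
    Neighbouring targets are joined by a half-cosine transition, which starts
    and ends with zero slope.
    """
    start, end = targets[0][0], targets[-1][0]
    n_frames = int((end - start) // frame_ms) + 1
    values = []
    seg = 0
    for k in range(n_frames):
        t = start + k * frame_ms
        # Advance to the segment [targets[seg], targets[seg + 1]] containing t.
        while seg < len(targets) - 2 and t > targets[seg + 1][0]:
            seg += 1
        (t0, f0), (t1, f1) = targets[seg], targets[seg + 1]
        u = (t - t0) / (t1 - t0)
        weight = (1 - math.cos(math.pi * u)) / 2
        values.append(f0 + (f1 - f0) * weight)
    return values


# Hypothetical targets: low start, peak at 60 ms, low end at 140 ms.
contour = cosine_interpolate([(0, 90), (60, 140), (140, 90)])
print(len(contour), max(contour))  # 15 140.0
```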


2.1.1.7 Conclusion

The discrimination and identification tests of 2.1.1.1-5 all point in the same direction. They show the perceptual exploitation of different F0 peak synchronizations with stressed vowel onsets, and of the ensuing low (falling) vs. high (rising) F0, as a psychophonetic basis for phonological categorization at the level of intonation. For an 'early' peak, F0 is low in the stressed vowel because it is already on its descent at vowel onset and, in complete parallel shift, also reaches its low end point early. If there is voicing before the stressed vowel, the F0 point at vowel onset is preceded by a higher F0 value, so that F0 falls into the accented syllable. If there is no previous voicing, as in utterance-initial position of a stressed syllable beginning with voiceless consonants, F0 at vowel onset is as low as it would be after an F0 descent across the stressed syllable periphery. This strengthens the low F0 level in the accented vowel. The 'early' peak is thus characterized by a high prenuclear F0 (either directly observable or extrapolated from the F0 start in the stressed syllable nucleus) and by a low F0 in the latter.

Contrariwise, the 'medial' peak has a low prenuclear F0, an F0 rise (of at least 2 semitones from nuclear vowel onset to peak value, according to interactive testing), and a subsequent descent to a low F0 at a later point in time than in an 'early' peak. The amount of descent depends on syllable structure. The rise may be absent because of CF0 interference, which results in a higher F0 starting point at nucleus onset. So in all cases the 'medial' peak accentuates a higher F0 level in the stressed vowel than the 'early' peak. In a 'late' peak the rise is extended because it occurs later, but it is also prefixed by a stretch of low level F0.
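The 2-semitone threshold can be translated into frequency terms with the standard equal-tempered interval formula, 12 · log2(f2/f1) semitones. A rise of 2 semitones corresponds to a frequency ratio of 2^(2/12) ≈ 1.12, so from a vowel-onset F0 of 100 Hz the peak would have to reach about 112 Hz. The function names below are hypothetical. The sketch only restates the criterion quoted in the text and is not the procedure used in the interactive tests.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Interval between two F0 values in equal-tempered semitones."""
    return 12 * math.log2(f_to_hz / f_from_hz)


def min_peak_for_rise(vowel_onset_hz: float, min_rise_st: float = 2.0) -> float:
    """Lowest peak F0 giving at least min_rise_st semitones above vowel onset."""
    return vowel_onset_hz * 2 ** (min_rise_st / 12)


def meets_medial_rise(vowel_onset_hz: float, peak_hz: float, min_rise_st: float = 2.0) -> bool:
    """Check of the 2-semitone rise criterion for a 'medial' peak."""
    return semitones(vowel_onset_hz, peak_hz) >= min_rise_st


print(semitones(100.0, 200.0))              # 12.0 (one octave)
print(round(min_peak_for_rise(100.0), 1))   # 112.2
print(meets_medial_rise(100.0, 110.0))      # False
```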

Interactive perceptual testing (see 1.5) has further shown that the 'early' and 'medial' peak patterns do not lose their characteristic auditory difference if their right base points have the same F0 value at the same time after nucleus onset (due to a flattening of the F0 descent in the 'early' peak). It is thus the F0 differentiation in the initial part of the nuclear vowel that counts as the distinctive feature.

Consider first a RULSYS-generated 'early' peak of "Sie hat ja gelogen.". If F0 is kept at the peak maximum value up to and including the first 3 F0 frames (of 10 ms each) in the stressed vowel, instead of falling immediately, the auditory characteristics of the 'early' peak are not lost. Consider next a 'medial' peak configuration in the same utterance, with /l/ onset = 88 Hz, /l/ offset = 116 Hz, and a stressed vowel onset consisting of the F0 sequence 122 - 128 - 128 - 130 Hz. If the /l/ offset and all the vowel onset frames are raised to 130 Hz, the 'medial' peak is changed in the direction of an 'early' one. This happens although the only difference between the two patterns now lies in the rise being completed in the /l/ rather than continuing into the accented vowel, i.e. in the presence or absence of a nucleus-initial rise. Admittedly, the difference between the 'early' and 'medial' patterns is clearly weakened by this modification, but it shows that a 'medial' peak needs an F0 rise in the nucleus after a sonorant.
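The manipulation just described can be mirrored on the frame values quoted in the text. The sketch measures a "nucleus-initial rise" simply as the last minus the first vowel-onset frame; this measure and the helper names are assumptions made for illustration. After the modification, the whole rise lies within the /l/ and none remains in the vowel.

```python
def raise_to(frames_hz, target_hz):
    """Raise every F0 frame below target_hz to target_hz; higher frames stay."""
    return [max(f, target_hz) for f in frames_hz]


def rise_hz(frames_hz):
    """Net F0 change from the first to the last frame (positive = rise)."""
    return frames_hz[-1] - frames_hz[0]


# Values quoted in the text for the 'medial' configuration of "Sie hat ja gelogen."
l_onset, l_offset = 88, 116
vowel_onset_frames = [122, 128, 128, 130]

# Manipulation: /l/ offset and all vowel-onset frames set to 130 Hz.
l_offset_mod = max(l_offset, 130)
vowel_mod = raise_to(vowel_onset_frames, 130)

print(l_offset - l_onset, rise_hz(vowel_onset_frames))  # 28 8  (rise continues into the vowel)
print(l_offset_mod - l_onset, rise_hz(vowel_mod))       # 42 0  (rise completed in /l/)
```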

2.1.2 Munich experiments on German

The results of the Kiel experiments naturally prompted the question of how widespread the phonological categorization of F0 peak positions is in the intonation system of German in general. Therefore some of the Kiel tests were repeated in the Phonetics Institute of Munich University³ with groups of listeners from a Bavarian dialect background. These were the serial discrimination and randomized paired discrimination tests in the ascending ordering (2.1.1.1.1) and the synthesized stimuli identification test (2.1.1.1.2). The tests were performed in the Institute's language laboratory, and the stimuli were presented over headphones. Tapes and instructions were identical to those in the Kiel experiments. Eleven listeners participated in the serial discrimination test and in the identification test, and fourteen in the randomized paired discrimination test.

Results

Table VIII presents the results of the serial discrimination test.

³ I wish to thank Dr Anton Batliner for organizing the test runs.

Table VIII

Distribution of 'change has occurred' responses of 11 Munich listeners in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most)

Stimulus                    4    5    6    7    8    9   10   11
First change perceived      1    9    1    -    -    -    -    -
Further changes perceived   -    -    -    4    1    3    2    1
Total                       1    9    1    4    1    3    2    1

Fig. 17 shows the results of the randomized paired discrimination test.


Fig. 17

Discrimination functions in the randomized paired discrimination test, showing percentages of 'different' judgements for utterance pairs of "Sie hat ja gelogen." with 0-step (a), 1-step (b) or 2-step (c) distances of F0 peak positions, in the left-right ordering. The stimulus numbers refer to the second stimulus. 14 sbs., n = 28 at each data point in (a), (b), (c).
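The design behind Fig. 17 can be sketched as follows. Ascending pairs with 0, 1 or 2 steps between peak positions are presented in random order, and the percentage of 'different' answers is collected per step size and second stimulus. With 14 listeners, n = 28 per data point is consistent with each listener judging each pair twice; the repetitions=2 in the example assumes that. The function names and the input format are hypothetical.

```python
import random
from collections import defaultdict


def paired_trials(n_stimuli, steps=(0, 1, 2), repetitions=1, seed=None):
    """Ascending ('left-right') pairs (first, second) with second - first in steps,
    repeated and shuffled into a random presentation order."""
    pairs = [(s - k, s) for k in steps for s in range(1 + k, n_stimuli + 1)]
    trials = pairs * repetitions
    random.Random(seed).shuffle(trials)
    return trials


def percent_different(judgements):
    """judgements: iterable of ((first, second), said_different) items.
    Returns {(step, second_stimulus): percentage of 'different' answers}."""
    counts = defaultdict(lambda: [0, 0])  # [n_different, n_total]
    for (first, second), said_different in judgements:
        cell = counts[(second - first, second)]
        cell[0] += int(said_different)
        cell[1] += 1
    return {key: 100 * d / n for key, (d, n) in counts.items()}


trials = paired_trials(11, repetitions=2, seed=1)
print(len(trials))  # 60 = (11 + 10 + 9) pairs x 2
```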


Fig. 18 shows the results of the synthesized stimuli identification test.


Fig. 18

Identification function in the synthesized stimuli identification test, showing the percentage of 'matching' judgements for 8 stimuli "Sie hat ja gelogen." with F0 peak shift from left to right in the context "Jetzt versteh' ich das erst." 11 subjects; for each stimulus n = 110.
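The percentages in Fig. 18 rest on n = 110 judgements per stimulus, which would correspond to each of the 11 subjects judging each of the 8 stimuli 10 times (an inference from the numbers, not stated in the caption). The sketch below converts counts to percentages and locates a category boundary by linearly interpolating where the identification function crosses 50%. This crossover criterion is a common convention swapped in for illustration; the text itself places the boundary at stimulus 5 by inspection, and the function names are hypothetical.

```python
from typing import Optional


def percent_matching(matching_counts: list[int], n_per_stimulus: int) -> list[float]:
    """Convert raw 'matching' counts per stimulus into percentages."""
    return [100.0 * k / n_per_stimulus for k in matching_counts]


def boundary_50(percentages: list[float], first_stimulus: int = 1) -> Optional[float]:
    """Return the (fractional) stimulus number where the function first crosses 50%.

    Works for rising and falling functions; returns None if 50% is never reached.
    """
    for i in range(len(percentages) - 1):
        y0, y1 = percentages[i], percentages[i + 1]
        if y0 != y1 and (y0 - 50.0) * (y1 - 50.0) <= 0:
            # Linear interpolation between stimulus i and stimulus i + 1.
            return first_stimulus + i + (50.0 - y0) / (y1 - y0)
    return None
```

With the real counts, `boundary_50(percent_matching(counts, 110))` would yield a single number that can be compared across the Kiel and Munich groups.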

Discussion

The comparison of Tables I and VIII shows that the Munich group has the same type of response pattern, with a maximum for stimulus 5 and a large scatter for further changes in the series from stimulus 7 to 11. However, owing to the much smaller number of subjects in the Munich group, the minor peak in the response curve does not show up so clearly. The results of the serial discrimination test are supported by those of the randomized paired discrimination test of Fig. 17 (in comparison with Fig. 4). There is again a maximum of sensitivity in the F0 peak shift continuum in the area of stimuli 5/6 and a second, weaker sensitivity peak at stimulus 9, but the sensitivity area is narrower, with the pairings 5-6 and 3-5 not being included in the maxima, and there is no peak of 'false alarms' for the 5-5 pair. The identification function of Fig. 18 (in comparison with Fig. 5) points to the same two perceptual identification categories, comprising stimuli 1-4 on the one hand and stimuli 6-8 on the other, but with considerably more noise (an offset of about 20-30%) in the first category and at stimulus 5, the boundary between the two. We may again associate this partitioning with the two phonological categories of 'early' and 'medial' F0 peaks. The Munich results are thus in agreement with the Kiel data and allow the generalization of a perceptual and phonological categorization of F0 peak positions relative to stressed vowel onset for the intonation of German across regional varieties.
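The comparison above rests on reading off where each response distribution peaks. A minimal sketch with the Munich counts of Table VIII follows; the dictionary layout and the helper name are illustrative, and a Kiel tally from Table I could be passed to the same helper for a side-by-side check.

```python
from collections import Counter

# Counts from Table VIII (11 Munich listeners), keyed by stimulus number.
FIRST_CHANGE = Counter({4: 1, 5: 9, 6: 1})
FURTHER_CHANGES = Counter({7: 4, 8: 1, 9: 3, 10: 2, 11: 1})


def modal_stimulus(counts: Counter) -> int:
    """Stimulus that attracted the most responses (ties go to the earlier stimulus)."""
    return max(sorted(counts), key=lambda s: counts[s])


total = FIRST_CHANGE + FURTHER_CHANGES  # the 'Total' row of Table VIII

print(modal_stimulus(FIRST_CHANGE))                             # 5
print(sum(FIRST_CHANGE.values()))                               # 11, one first change per listener
print(round(FIRST_CHANGE[5] / sum(FIRST_CHANGE.values()), 2))   # 0.82
```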

2.1.3 Experiments on other languages

What remained an open issue after the very clear results of the experiments on different varieties of German was whether we are here dealing with a phonological categorization of German, albeit on a psychophonetic basis, or whether the phenomenon is more widespread or even a language universal, based on a specific feature of human speech perception in general. The hypothesis that such a general psychophonetic principle operates in the perception of F0 patterns in human speech, irrespective of the phonological categorization and the linguistic functions it may serve in any particular language, leads to the assumption that native speakers of languages other than German listening to German utterances should be able to detect changes in F0 peak positions in relation to general human consonant-vowel sequences, even without knowing any German at all, and therefore without assessing the stimuli semantically, but simply on the basis of general phonetic properties of human speech. If the results of such listening tests were to coincide with the results for German, this would be a strong indication of a language-independent psychophonetic mechanism. As a first step in this direction, the serial discrimination test in the ascending ordering of 2.1.1.1.1 was run with two groups of non-German speakers:

(a) 25 Russian speakers in Leningrad⁴, who had no knowledge of German and who either worked on Russian, English or French phonetics (11) or were students in their first or second year in the Philological Faculty (14). A copy, on standard cassette, of the original series of 11 stimuli of "Sie hat ja gelogen." with F0 peak shifts from left to right was provided. The subjects listened to the series twice and then had to cross, on a prepared answer sheet, the number of the stimulus in the series that they perceived as being most clearly different from the rest.

⁴ I wish to thank Prof. Natalia Svetozarova of Leningrad University for administering the test in her Phonetics Laboratory.

(b) 40 native speakers of 13 different languages attending German language courses at beginners or advanced level at Kiel University. The original test tape was presented to them over loudspeaker in four subgroups (twice 14 and twice 6 listeners) in their relatively quiet but acoustically non-treated classroom. The answer sheets and the procedure were the same as for the corresponding test with German listeners in 2.1.1.1.1. A great deal of time and care was spent on explaining the test instructions in German.⁵ Table IX provides the background information about the 40 listeners.

Table IX

Background information about the 40 foreign listeners in the discrimination test

Native language   Native country   Beginners   Advanced   Total
Farsi             Iran                     9          1      10
Polish                                     4          2       6
Portuguese        Brazil                   3          1       4
Korean                                     3          1       4
Spanish           Chile                    2          1       3
Spanish           Argentina                                   1
English           USA                                         3
English           England                                     2
Arabic            Israel                                      1
Japanese                                                      1
Thai                                                          1
Nepali                                                        1
Chinese                                                       1
Singhalese        Sri Lanka                                   1
Swedish                                                       1
Total                                     28         12      40

Results

Table X presents the results of the Russian group. Although the instruction demanded a single response, some subjects indicated more than one stimulus as being clearly different.

Table X

Frequency distribution of 'clearly different' responses by 25 Russian listeners without any knowledge of German in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus             2   4   5   6   7   8   9  10
Phoneticians             1   1   7   4   1   1   1
Non-phoneticians     1   1   2  11   3   1   1
Total                1   2   3  18   7   2   2   1

Table XI presents the results of the multilanguage group, restricted to the perception of the first change in the series. The one Chinese, one Farsi and one Korean speaker did not perceive any change at all, although the other three Korean speakers did.

Table XI

Frequency distribution of 'first change has occurred' responses by 40 listeners of 13 different languages, in the left-right sequence of the serial discrimination test across the 11 stimuli with F0 peak shifts in "Sie hat ja gelogen." (1 = left-most, 11 = right-most position)

Stimulus                   4   5   6   7   8   9  11
First change perceived     2   9  19   2   2   2   1

(3 listeners perceived no change at all.)

⁵ Robert Gartenberg carried out the tests and compiled the data.


Discussion

Both groups, in spite of their language diversity, converge in having a clear maximum of the response function for stimulus 6. This is a higher position than for the German listeners, who favoured stimulus 5 but who also provided a substantial portion of their answers for stimulus 6. These results are a very strong indication that the dichotomy between an 'early' and a 'medial' peak position is indeed a general psychophonetic, language-independent phenomenon, which may then be incorporated into the language-specific phonology at different levels.
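The convergence on stimulus 6 can be checked directly from the counts in Tables X and XI. In this sketch the Russian subgroups are summed first, because some listeners marked more than one stimulus; the variable and helper names are illustrative.

```python
from collections import Counter

# Table X: 'clearly different' responses of 25 Russian listeners
# (some listeners marked more than one stimulus).
PHONETICIANS = Counter({4: 1, 5: 1, 6: 7, 7: 4, 8: 1, 9: 1, 10: 1})
NON_PHONETICIANS = Counter({2: 1, 4: 1, 5: 2, 6: 11, 7: 3, 8: 1, 9: 1})

# Table XI: 'first change' responses of the multilanguage group
# (37 responses; 3 of the 40 listeners perceived no change at all).
MULTILANGUAGE = Counter({4: 2, 5: 9, 6: 19, 7: 2, 8: 2, 9: 2, 11: 1})


def modal_stimulus(counts: Counter) -> int:
    """Stimulus with the most responses (ties go to the earlier stimulus)."""
    return max(sorted(counts), key=lambda s: counts[s])


def share(counts: Counter, stimulus: int) -> float:
    """Proportion of all responses in `counts` that fell on `stimulus`."""
    return counts[stimulus] / sum(counts.values())


russian = PHONETICIANS + NON_PHONETICIANS  # the 'Total' row of Table X
for name, counts in (("Russian", russian), ("multilanguage", MULTILANGUAGE)):
    print(name, modal_stimulus(counts), round(share(counts, 6), 2))
# Russian 6 0.5
# multilanguage 6 0.51
```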

Thus in Mandarin Chinese (see Gårding, Kratochvil, Svantesson & Zhang, 1985) it is put to use in the tone system, differentiating between the continuously (low) falling F0 of tone 3 (e.g. in ma3 'horse') and the (high) rising-falling F0 of tone 4 (e.g. in ma4 'to curse'). It is worth noting in this connection how a Chinese speaker (Dr Chilin Shih, research worker at Bell Labs, Murray Hill, in 1986) classified the 11 stimuli of the left-right series "Sie hat ja gelogen." without any knowledge of German. Without the slightest doubt she associated stimuli 1-4 with tone 3 and stimulus 5 with tone 4; later in the series tone 4 changed to the combined tone 2-4. But whereas the switch from tone 3 to tone 4 occurred abruptly in the succession of stimuli 4 and 5, the change from tone 4 to tone 2-4 was gradual and could be less easily located (at stimulus 9 the change had definitely taken place). This informal test shows (a) that tones 3 and 4 in Mandarin Chinese are differentiated by the F0 maximum relative to the vowel onset, a prenuclear F0 peak signalling the former, a nuclear F0 peak the latter, and (b) that these categorizations are possible on the language-independent basis of human speech perception in general.

A different case of exploiting the perceptual relevance of earlier vs. later peaks is provided by the acute and grave tonal word accents in Norwegian and Swedish (Gårding, 1979, 1982). And finally, intonation languages like German, English and French make use of the different peak patterns in their intonation phonologies and relate them to semantic distinctions along the 'closed/open to argument' dimension (see 2.2). English and French can differentiate in the same way as German in their corresponding sentences "She's been lying." and "Elle a menti." It is an interesting research objective for the future to investigate the different linguistic functions the dichotomy can be put to in the world's languages.

2.2 Semantics

The question to be pursued now is what linguistic functions are carried by the phonological contrasts of 'early' vs. 'medial' vs. 'late' peaks in German. In particular, it is to be ascertained whether the categorical change from 'early' to 'medial' and the more gradual change from 'medial' to 'late' peak positions are mapped onto a semantic space in a congruent fashion. Some insight was gained from the data obtained through controlled dialogues (Gartenberg & Hertrich, 1988). Furthermore, in the Kiel and Munich serial discrimination tests (2.1.1.1 and 2.1.2) with "Sie hat ja gelogen.", subjects were also asked to paraphrase the meanings of the utterances corresponding to the three peak positions (see 1.4 (1)). Here are some of the answers.

Kiel

(a) original utterance (b) first change (c) further change

Statement of a fact or end of an argumentation.
Introductory statement, beginning of argumentation.
As (b), but greater insistence.
Justifying statement, establishing causality relation to what precedes.
Report to a third party that she has been lying; the speaker stresses a fact resulting from the environment.
Slight surprise and reproach over behaviour.
Indignant statement.
Strong surprise.
Surprise statement.
Statement, explanation.
Statement without surprise.
Statement of a fact, e.g. in the context "the punishment is justified."
Surprise, astonishment.
Tendency towards indignation.
Indignation, e.g. in the context "I would not have expected this of her."
Justifying statement at the end of a chain of arguments.
Statement, report.
Matter-of-fact statement.
Declarative, expected, matter-of-fact.
Confirmation of a fact; it is obvious that she's been lying.
Statement of fact that the speaker discovered long ago.
Beginning of an argumentation, slight indignation.
Sudden realisation of lying.
Statement with the expression of indignation.
Greater indignation.
Question.
Statement with the expression of astonishment.
Unexpected, indignant. "I can't believe it."
Surprising fact for the speaker, the lie is unexpected.
Explanation of a fact. Indignation.

Munich

(a) original utterance (b) first change (c) further change

Confirmation of what is already known.
Surprise statement, reproachful undertone ("... I would not have expected that.")
Pure astonishment.
Statement.
Neutral statement.
Astonishment.
Exclamation.
Disappointment.
Exclamation with incredulity.
Simple statement of a fact with which the speaker seems to be familiar.
Surprise, indignation, speaker's reaction to a fact he did not know before.
But we knew that before anyway.
Matter-of-fact statement.
Amazement.
Unforgivable observation.
Surprise, astonishment.
Astonishment.
Matter-of-fact statement, there was not really any doubt about her behaviour.
It was not certain if she would tell the truth or not.
Contrary to expectation she has been lying, comes as a surprise; gradual transition from (b) to (c).

Another paraphrasing experiment was carried out with the following sentences, containing first an 'early' and then a 'medial' peak (the underlined word received the peak accent):

(1) "Wer war das?" ("Who did that?", "Who was that?")
(2) "Mach' bitte das Fenster zu!" ("Shut the window, please.")
(3) "Was ist denn ein Atom?" ("What's an atom?")
(4) "Sehen wir uns also morgen?" ("So we are going to see each other tomorrow?")

Each pair of 'early'/'medial' peak utterances was selected from 10 naturally produced repetitions (speaker KK) and played to listeners as often as they liked. They had to write down their assessment of the situation or speaker attitude that fitted each sentence and peak pattern. Here are some of the answers:

'early'

(1) Several people are asked: which of you was it?
In the sense of "who did that?"
The speaker asks the listener a question, he knows the answer himself and urges the listener to give him the right answer.
Reproachful question: the person concerned has to expect a reprimand for some mischief.
Speaker sounds superior, demanding, tries to be distant.

(2) Speaker, rather annoyed, asks somebody standing at the open window to shut it.
Order to shut the window at once.
Tone of a command, slightly threatening, repeated order to the naughty son; object of shutting is self-evident, could be left unmentioned.

(3) Speaker asks the listener for specific information to test him, i.e. the speaker knows the answer to the question himself.
Speaker knows the answer already, e.g. could be a teacher.
Teacher to his class, rhetorical.
Exam question.

(4) Statement, all settled, routine utterance.
At the end of an ordinary conversation, routine.
Statement.

'medial'

(1) Speaker A asks speaker B the name of a third person.
Somebody unknown to the speaker is passing by and the speaker asks a listener the name of the unknown person.
The speaker wants to know something unknown to him, e.g. he has just seen somebody whose name he does not know.
Neutral, positive question, e.g. a teacher's question: "Charles the Great, who was that?"
Speaker asks in a familiar way, is on the same level as the person spoken to.

(2) Request to shut the window, not the door.
Friendly request to shut the window, not the door, because, e.g., otherwise grandmother might catch a cold.
Opposed to "shut the door, please"; the object has to be defined specially.

(3) Speaker does not know the answer himself and asks the listener to give him the information.
Speaker does not know the answer, asks a real question.
After the teacher has provided the explanation, a pupil not having heard it asks what an atom is.
Continuation in a chain of questions.

(4) Tomorrow, not today or the day after tomorrow or any other day the speaker might prefer.
Speaker mentioned tomorrow for the next meeting and then changed it; to make sure, he repeats the new arrangement at the end of the conversation.
Confirmation of tomorrow as against the day after.

The meanings that may be abstracted from the dialogue data and the paraphrases for the three peaks are:

(a) early: established fact; no room for discussion; final summing up of argument
(b) medial: new fact; open for discussion; starting a new argument
(c) late: emphasis on a new fact and contrast to what should exist or exists in the speaker's or hearer's idea.

The F0 peak differences are thus not associated with stress, which remains the same in all three cases, but with intonation, which is in turn linked to semantic categories expressing the speaker's evaluation of facts in respect of expectations. As regards the distinction between 'medial' and 'late' peaks, similar categorizations have been proposed for English, the 'late' peak expressing the speaker's incredulity or his uncertainty (Ward & Hirschberg, 1985; Pierrehumbert & Steele, 1989).

The phonetic differentiation between the three peaks and the associated changes of meaning point to another instance of what Ohala (1983, 1984) has called the frequency code: low frequencies signal domination, high ones submissiveness. Of course, in the case under discussion, this link has been given linguistic plasticity in two ways:

- the synchronization with the syllable structure, i.e. with human sound articulation,
- a semantic denotation, rather than an expressive meaning.

But the semantics of 'closed vs. open to argumentation' are intimately related to 'domination vs. submissiveness'. It is, however, not necessarily the domination or submissiveness of the speaker that is signalled here; it may be that of the situation or of other communicative partners setting an established fact or leaving the door open for change and new things. These are the basic, underlying meanings of 'early' vs. 'non-early' peaks. The actual meanings observable on the surface in individual utterances and contexts depend on the interplay of these basic semantics of intonation contours with the semantics at the levels of syntactic structures, within and across sentences, and of the lexicon.

If an early peak is used in questions, whose semantics suggest openness, then the question gets special connotations in keeping with the semantics of the early peak intonation: the question is asked with a presumed knowledge of the answer, as in

- the teacher's question "Wer war das?" ("Who did that?" = I'll find out anyway; possible threat)
- the exam question "Was ist ein Phonem?" ("What's a phoneme?")
- the résumé asking for confirmation "Das Phonem ist also eine Lautklasse." ("So the phoneme is a sound class." = Can we keep that in mind and start from there, moving to the next question?)

If an imperative construction gets an early peak, there is again a contradiction between the signalling, through intonation, of the expected completion of an action, and, through syntax, of the order to carry it out. This contradiction produces the connotation of annoyance and impatience at the delay of an action. "Mach' bitte das Fenster zu." ("Shut the window, please.") may become a threat in spite of "bitte". The early peak can also get the connotation of resignation because nothing can be done to alter the established facts: "Nun gut. Wie Sie wollen." ("All right. As you like."). The resignation is all the greater the earlier the F0 fall and the longer the low F0 tail on "gut" and "wollen". In either-or questions, an early peak in second position signals a choice within a closed set of alternatives, whereas a succession of medial peaks with low F0 in between refers to an open set of alternatives, which are simply given as possible examples from a longer list: "Willst du Tee oder Kaffee?" ("Would you like tea or coffee?"). Rising patterns instead of medial peaks convey the same open set but sound less categorical and more friendly.

In the late peak, the preceding low F0 interferes with the openness connotation of the rise and introduces the speaker's difference of opinion, which is rated very high in relation to observable facts. The speaker stresses the difference between his opinion or way of assessing things and the opinion of others, or facts, or beliefs as to how things should be. This leads to meanings of surprise, incredulity ("that can't be true"), insinuation and talking down, changing in degree according to the amount of peak shift to the right. Very often the late peak is combined with modal particles, reinforcing their meanings, such as (word with late peak accent underlined):

"ja" in exclamations
  "Da steht ja eine Kirche!" ("Oh, there's a church!"), expressing surprise because reality differs from the speaker's view,
"doch" in statements and imperatives/requests
  "Er ist doch gekommen." ("He's come, what are you going on about?"), "Setzen Sie sich doch." ("Do sit down." = "You are still standing, it is my opinion that you should be sitting."), expressing opposition to what the speaker is confronted with,
"etwa" in questions
  "Hast du das etwa gekauft?" ("You did not buy that, did you?"), expressing incredulity, which is all the stronger the greater the emphasis signalled by peak height.

In these examples the modal particle may be missing, but the presence of a late peak still conveys the meaning of a contrast between the speaker's observation and his opinion on it. In utterances such as "Ja." ("Yes.") or "Natürlich." ("Of course.") the speaker stresses his own opinion and rejects any opinion to the contrary, producing a supercilious, arrogant, presumptuous undertone. Talking to a child, "Wie heißt du denn?" ("What's your name?") stresses the distance between the speaker and the addressee and gives the impression of talking down. In a sentence like "Hast du bei Christine übernachtet?" ("Did you spend the night with Christine?") there are indications that the addressee has done just what the speaker suggests, but should not have, because this clashes with moral standards which the speaker purports to hold, resulting in reproach or insinuation; combined with a high peak it suggests incredulity.

The important lesson to be learnt from these data is that there is a direct link between particular F0 contours and specific meanings, but this link is not one on the surface; it underlies the actual meanings, which are the result of an interaction of various meaning levels. Social psychologists (e.g. Scherer, 1985), who have been concerned with these direct substance/expressive meaning relations, have often lacked a detailed insight into the phonetic and semantic structures of language as a prerequisite to a successful interpretation. The corollary of the phonetic-semantic explanations offered for the use of different F0 peaks in intonation is that these phonological intonation categories, in their association with meanings relatable in one form or another to the basic ones given, must be at least widespread in languages, provided the phonological dichotomy has not already been claimed at some other level, e.g. tone or word accent.

2.3 General discussion concerning Hypothesis (2)

The perception experiments of 2.1 and the semantic evaluation derived from paraphrasing tasks in 2.2 have largely confirmed Hypothesis (2) of Contribution I (Kohler, 1991b): the shift of an F0 peak in a single-accent terminal utterance between a prenucleus and a nucleus position results in a categorical change of perception, which is correlated with an equally categorical semantic switch along the dimension 'established/new' or 'closed/open to argumentation'; the corresponding realignment to the right produces a gradual auditory change correlated with a semantic continuum expressing degrees of distance which the speaker establishes between himself and the world as it presents itself to him. This degree of distance, rather than the degree of emphasis as formulated in Hypothesis (2), is the semantic basis of the 'medial' to 'late' peak positions, emphasis being correlated with peak height.

3. Intonation and stress

It has already been pointed out that the three F0 peak positions discussed in Section 2 represent different phonological categories of intonation associated with the same stressed syllable. So intonation must be differentiated from stress, through which a syllable in a chain is selected and marked for an intonation peak (or valley) to be hooked onto. But the stress feature may be chosen for different syllables in a sequence, and thus a shift of an F0 peak (or valley) position from one syllable to another can also change the stress position in a syllable chain, not just the intonation peak (or valley) associated with it. F0 peaks can therefore become cues to stress besides being cues to intonation. Then two questions arise:

(a) Under what conditions is an F0 peak shift (without concomitant changes in sound duration and intensity) sufficient to shift stress to a different syllable? Two cases have to be distinguished: the stress pattern changes but the peak pattern stays, or both change. In principle, at each stress position three intonation peaks are possible.

(b) How can the stress and intonation functions of F0 peaks be differentiated, and in what ways do they interact?

These questions relate to the level of lexical stress or of sentence stress, because words in sentences do not all retain their stresses. 3.1 deals with the former, 3.3 with the latter. In 3.2 the importance of duration for the signalling of stress, in addition to F0, will be discussed. Finally, 3.4 will deal with the perceptual ambiguity between one and two accents combined with conflicting intonation patterns, and 3.5 will enquire into the relevance of intensity for the cuing of stress and intonation.

3.1 Lexical stress

German offers good examples for testing the issues of stress signalled by F0 peak position and of stress and intonation interaction at the lexical level, because it has minimal verb pairs, with either prefix or stem stress, which can occur in the same natural sentence frame, e.g. "Er wird's wohl umlagern." [ɛɐ vɪɐts vol ˈʊmlaːɡɐn (ʊmˈlaːɡɐn)], with stress either on the prefix "um-", meaning "verlagern" ("He is presumably going to shift it to another place."), or on "-la-", meaning "belagern" ("He is presumably going to besiege it.").

Utterances of the above two sentences, (a) with stress on "um-" and a 'medial' intonation peak on this syllable, and (b) with stress on "-la-" and an 'early' intonation peak, which is actually located on the syllable "um-", were analysed, and Fig. 19 presents the waveforms together with their F0 displays. The F0 peak positions in the two utterances are practically identical in relation to the syllable structures of "umlagern": they occur at more or less the same time interval just before the beginning of /l/. The differences between the two lie in the shapes of the F0 peak contours and in the syllable durations. In the utterance with stem stress in Fig. 19b the post-peak F0 descent is more gradual, the syllable "um-" shorter (135 ms in Fig. 19b vs. 222 ms in Fig. 19a) and therefore the F0 rise faster, starting at a structurally earlier point (the beginning of the /l/ in "wohl" rather than the "um-" syllable onset, as is the case in the utterance with prefix stress). The "-la-" syllables in the two utterances, on the other hand, have very similar durations in the stem and prefix stress words (268 ms in Fig. 19b vs. 258 ms in Fig. 19a). Two further stimuli were generated from the two illustrated in Fig. 19 by exchanging the F0 contours (see Fig. 20).

These four stimuli (ST1–ST4) were the basis for creating four series of F0 peak positions (P1–P4):

P1  A series of 12: 6 left shifts (parallel transposition of the left branch and time expansion of the right branch) and 5 complete parallel right shifts of 30 ms each in the utterance of Fig. 19a.

P2  A series of 9: 8 complete parallel left shifts of 30 ms each in the utterance of Fig. 19b.

P3  A series of 12 in the utterance of Fig. 20a, following the procedure in P1.

P4  A series of 9 in the utterance of Fig. 20b, following the procedure in P2.

P1 and P3 are based on the original prefix-stress utterance, P4 and P2 on the original stem-stress utterance, and in each pairing the series form an opposition between more abruptly and more slowly falling F0 peak contours, respectively.
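The construction of the four series can be made concrete with a small sketch. It assumes, as a hypothetical simplification not stated in the text, that each peak contour is reduced to the three points A (base before the rise), B (peak) and C (base after the fall) marked in Fig. 19; a left shift of the P1 type moves A and B by the step and leaves C in place (so the right branch is stretched in time), whereas a complete parallel shift moves all three points. The class and function names and the example point times are illustrative inventions, not measurements; the actual stimuli were produced by resynthesis of the natural utterances.

```python
from dataclasses import dataclass
from typing import List

STEP_MS = 30.0  # shift step stated in the text


@dataclass(frozen=True)
class PeakContour:
    """Hypothetical three-point peak: A = rise onset, B = peak, C = end of fall (ms)."""
    a: float
    b: float
    c: float


def left_shift(p: PeakContour, d: float) -> PeakContour:
    # Left branch A-B transposed in parallel; C stays fixed, so B-C is expanded in time.
    return PeakContour(p.a - d, p.b - d, p.c)


def parallel_shift(p: PeakContour, d: float) -> PeakContour:
    # Complete parallel shift: all three points move by d (negative d = leftward).
    return PeakContour(p.a + d, p.b + d, p.c + d)


def peak_series(orig: PeakContour, n_left: int, n_right: int,
                expand_left: bool, step: float = STEP_MS) -> List[PeakContour]:
    """Stimuli ordered left to right; the unshifted original is stimulus n_left + 1."""
    lefts = [left_shift(orig, i * step) if expand_left
             else parallel_shift(orig, -i * step)
             for i in range(n_left, 0, -1)]
    rights = [parallel_shift(orig, i * step) for i in range(1, n_right + 1)]
    return lefts + [orig] + rights


# Illustrative (invented) point times, not measured from Fig. 19.
orig = PeakContour(a=400.0, b=550.0, c=700.0)
p1_like = peak_series(orig, n_left=6, n_right=5, expand_left=True)   # 12 stimuli
p2_like = peak_series(orig, n_left=8, n_right=0, expand_left=False)  # 9 stimuli
print(len(p1_like), p1_like[0], p1_like[-1])
```

With these counts the original peak falls at stimulus 7 of the 12-member series and at stimulus 9 of the 9-member series, and stimuli 1–7 of the first and 3–9 of the second cover the same peak positions, which is the overlap later exploited in Test III.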

From these four sets of stimuli two tests were compiled: Test I combined

[Figure: panels SPEECH and PITCH (Hz); time axis 0.0000–1.5181 s]

Fig. 19

Speech waves and F0 contours (linear scale) of the original (a) prefix stress with 'medial' peak and (b) stem stress with 'early' peak in "Er wird's wohl umlagern." A, B, C mark the F0 base and peak points for peak contour shift.

[Figure: panels SPEECH and PITCH (Hz); time axis 0.0000–1.5181 s]

Fig. 20

As in Fig. 19, but with exchanged F0 contours, adjusted to the different timing of the new utterance.


the more sharply falling sets P1 and P4, Test II the slowly falling sets P2 and P3. Subjects were asked to identify the stimuli with the meanings of either "belagern" (stem stress) or "verlagern" (prefix stress). Further details about test stimulus generation, test tape construction and test administration can be found in Kohler (1990c).

In P1 and P3 the series of F0 peak positions straddle the syllable structures where a change from prefix to stem stress is to be expected if F0 is a sufficient cue. The two sets differ in that the peak shape of P3, but not of P1, approximates the more slowly descending F0 configuration found in the early peak of the original stem-stress utterance (cf. Fig. 19b). It is hypothesized, therefore, that if stress is perceptually shifted at all in P1 and P3, there will be a more clear-cut change in P1, because there is a higher probability in P3 that an F0 peak position on "um-" can be perceived not only as a 'medial' or 'late' peak with prefix stress but also as an 'early' peak with stem stress. Similarly, there would be a greater likelihood in P2 than in P4 for an 'early' peak with stem stress to interfere with a 'medial' or 'late' peak with prefix stress, because of the slower F0 descent and its time expansion in the left shift of P2 as against P4.

Results and Discussion

Figs. 21 and 22 present the data of the two identification tests for the original prefix- and stem-stress series, respectively, each with slowly and more sharply falling peak contours.

In the shift of the more sharply falling F0 peak contour through the original prefix-stress utterance there is a clear change from initial to stem stress, in spite of the duration of "um-" pointing to the former. F0 can thus override duration, particularly since the duration of the unstressed "-la-" syllable in the original utterance is very close to its duration under stress. In stimulus 10, which is the first in the ordering from 1 to 12 to yield an unequivocal stem-stress categorization with over 80% positive responses, the F0 peak position is 30 ms into the vowel of the syllable "-la-". This corresponds to the data discussed in Section 2 concerning the change from an 'early' to a 'medial' intonation peak on the stressed syllable. The fact that the change from one stress position to the

[Plot: % "belagern" responses against stimulus nr 1–12]

Fig. 21

Percentage stem-stress responses for "umlagern" (= "belagern", i.e. stem stress) in the series of 12 F0 peak positions (from left to right) combined with the original prefix-stress utterance of "Er wird's wohl umlagern." (nr. 7 approx. original peak position). Broken line = P3, slowly falling peak contour (n = 80 at each data point); continuous line = P1, sharply falling peak contour (n = 185 at each data point); dotted line = P1, sharply falling peak contour, but in Test III of 3.2, see text (n = 170 at each data point).

other is gradual rather than categorical can be related to a residue of the duration cue. But we also have to consider some interaction of the stress and intonation functions of F0, because the F0 peak assumes positions before the beginning of the syllable nucleus /a:/ of "-la-" which can simultaneously function as the 'medial' or 'late' intonation peak in stressed "um-" and as the 'early' intonation peak in stressed "-la-". The relevance of this intonation interference with stress is confirmed by the finding that when the more slowly falling F0 peak is substituted, the initial-stress category is not so clearly represented: the interpretation of

[Plot: % "belagern" responses (0–100) against stimulus nr 1–9]

Fig. 22

Percentage stem-stress responses for "umlagern" (= "belagern", i.e. stem stress) in the series of 9 F0 peak positions (from left to right) combined with the original stem-stress utterance "Er wird's wohl umlagern." (nr. 9 approx. original peak position). Broken line = P2, slowly falling peak contour (n = 80 at each data point); continuous line = P4, sharply falling peak contour (n = 185 at each data point); dotted line = P4', sharply falling peak contour and durations of prefix stress (cf. Test III of 3.2, n = 170 at each data point).

an 'early' intonation peak for stem stress is then never completely precluded.
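The 80% criterion used above to locate the category change can be written as a small helper. This is a sketch only: it assumes responses are tallied as counts per stimulus with a common n per data point (n = 185 for the continuous lines in Figs. 21 and 22), and the function names are hypothetical; no response data from the tests are reproduced here.

```python
from typing import List, Optional, Sequence


def percentages(counts: Sequence[int], n: int) -> List[float]:
    """Convert per-stimulus counts of one response category into percentages of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [100.0 * c / n for c in counts]


def first_above(pcts: Sequence[float], threshold: float = 80.0) -> Optional[int]:
    """Return the 1-based number of the first stimulus whose percentage exceeds
    the threshold ("over 80%"), or None if no stimulus does."""
    for k, p in enumerate(pcts, start=1):
        if p > threshold:
            return k
    return None
```

Applied to the continuous P1 line of Fig. 21, the text reports that this criterion first holds at stimulus 10.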

When an F0 peak contour is shifted through the original stem-stress utterance there is no change between the stress categories (Fig. 22): the answers remain predominantly in favour of stem stress. In this case, F0 cannot override the duration cue completely because "um-" is too short in relation to "-la-" to signal initial stress. There is some effect of F0 when the more sharply falling F0 peak occurs within the syllable "um-". In stimuli 1 to 5 the F0 peak has been shifted leftward all the way into the preceding syllable "wohl", whereas in 6 to 8 it has been moved only as far back as some point within the prefix syllable "um-", and in these stimuli there are up to 30% judgements of prefix stress. This pattern suggests that the overriding salience of duration in the original prefix stress stimulus is checked somewhat when the characteristic sharply falling contour occurs in the relevant syllable and is more narrowly limited to it, allowing the interpretation of a 'medial' or 'late' peak on "um-", rather than an 'early' one on the following "-la-". In the other series, however, the slowly falling and time-expanded F0 contour reduces the probability of interpreting the peak as a 'medial' or 'late' peak for a prefix stress, because of the stronger interference from an 'early' peak interpretation on "-la-", due to the wider span of the F0 peak descent.

The questions asked initially can now be answered as follows:

(a) An F0 peak shift by itself is sufficient to bring about a clear change from one stress position to another, provided the duration of the stressed-syllable-to-be toward which the F0 peak is shifted is not too short. But even when it is, there is a residual F0 effect.

(b) The intonation function of F0 interferes with its stress function if the latter is not supported by duration. This finds its expression in a gradual change from one stress position to another in abutting syllables, where an ambiguity can arise between a 'medial' or 'late' intonation peak in one stressed syllable and an 'early' intonation peak related to a subsequent stressed syllable. This interaction is strengthened when the shape of the F0 peak contour approximates the more slowly falling one of the 'early' intonation peak of a later stress.

3.2 Duration as a feature in stress perception

It has been shown in 3.1 that although F0 is a strong cue in stress perception, duration can become an additional distinctive feature when vowels and postvocalic sonorants are shorter than would be associated with the production of a stressed syllable. On the other hand, if they are longer than would be associated with an unstressed syllable, the F0 cue may be disturbed, but never dominated, by the duration cue.

3.2.1 Duration increase for inducing stress perception in F0 peaks

The importance of duration for stress perception was further investigated in an experiment that repeated Test I of 3.1 using the peak series P1 and a modified peak series P4', i.e. the sets of stimuli based on the original prefix- and stem-stress utterances, respectively, both combined with the more sharply falling F0 contour derived from the prefix-stress utterance (see Figs. 19a and 20b). But this time a new basis stimulus ST4' for a series P4' was created by adjusting the durations of the syllable "um-" [um] and the vowel [a:] of the syllable "-la-" in the basis stimulus ST4 to the same values as in the basis stimulus ST1. By repeating some periods in [um] and deleting some in [a:], [u] was lengthened from 70 ms to 117 ms, [m] from 65 ms to 105 ms, and [a:] reduced from 210 ms to 189 ms. Then the F0 contour of the basis stimulus ST1 was transferred, sound segment by sound segment, to the modified basis stimulus ST4'. The series P4' was generated by shifting the F0 peak to the left as for P4.
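Period repetition and deletion change a segment's duration only in whole pitch periods. The sketch below estimates how many periods would have to be repeated (positive) or deleted (negative) to move each segment from its original to its target duration. The F0 value is an assumed placeholder, since the local F0 of each segment is not given in the text, so the achieved durations only approximate the targets; the function names are hypothetical.

```python
def periods_to_splice(current_ms: float, target_ms: float, f0_hz: float) -> int:
    """Whole pitch periods to repeat (>0) or delete (<0) to approach target_ms."""
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    period_ms = 1000.0 / f0_hz
    return round((target_ms - current_ms) / period_ms)


def achieved_ms(current_ms: float, n_periods: int, f0_hz: float) -> float:
    """Duration after splicing n_periods periods, each 1000 / f0_hz ms long."""
    return current_ms + n_periods * 1000.0 / f0_hz


# Original and target durations from the text; 100 Hz is an assumed, not measured, F0.
for seg, cur, tgt in [("u", 70, 117), ("m", 65, 105), ("a:", 210, 189)]:
    n = periods_to_splice(cur, tgt, f0_hz=100.0)
    print(seg, n, achieved_ms(cur, n, 100.0))
```

At an assumed 10 ms period, [u] would need 5 repeated periods (120 ms rather than 117 ms), which shows why the durations reported for real speech depend on the actual period lengths in each segment.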

Series P1 and P4' were then compiled into a new Test III, which differs from Test I only in the segment durations of P4' vs. P4. The first 7 stimuli of P1 and the last 7 of P4' occupy the same ranges of F0 peak positions, have very similar segment durations (with [um] and [a:] being identical) and comparable F0 contours, but they differ in the basis stimulus, which is either the original prefix-stress utterance in P1 or the original stem-stress utterance in P4', implying spectral and intensity differences.

The hypothesis connected with Test III was that the change of segment durations in P4' vs. P4 would be sufficient to reverse judgements from stem stress to prefix stress in all cases of the series, resulting in similar response functions for stimuli 1–7 of P1 and for stimuli 3–9 of P4', and would thus point to the low relevance of spectral and intensity features in German stress perception. Test III was run with 34 listeners.


The dotted lines in Figs. 21 and 22 present the results of identification Test III. The hypothesis of the complete reversal of judgements has been confirmed by P4 and P4' in Fig. 22, yielding ca. 80% and 20% "belagern" responses, respectively. The left shift of the response function for the identical P1 series in Test III, compared with Test I, may be due to the test design: the decrease in the number of clear stem-stress cases and the increase in the number of clear prefix-stress cases by swapping P4' for P4 may have pushed the responses to the more ambivalent cases in P1 in the direction of stem stress; but there is also more noise in the P1 response curve of Test III, as is shown by the offset of 10%–20%.

3.2.2 Duration decrease for eliminating stress perception in F0 peaks

Parallel to generating ST4' from ST4, a new ST1' was generated from ST1 by shortening [u] and [m] to 70 ms and 65 ms (from 117 ms and 105 ms) and lengthening [a:] to 210 ms (from 189 ms), applying the same period splicing procedure. Then the same peak shifts to the left and right were performed as in P1, resulting in P1' with 12 peak positions and sharply falling F0 contours. Informal listening to the series P1' by phoneticians established that all 12 stimuli were unequivocally perceived as stem-stressed, even when the F0 peak position was on "um-". Because of this very clear evidence no further formal test was run. These results prove again that if the duration of a stressed-syllable-to-be is too short, the F0 cue may not be sufficient to signal stress.

3.2.3 Conclusion

In German, stress is cued by two features, F0 and duration, which may be expressed in a distinctive feature notation as ±FSTRESS, ±DSTRESS. The F0 cue clearly dominates if the duration is not too short for stressed syllables; otherwise longer duration is required to signal stress. Syllables are thus marked as stressed/unstressed by the two stress features: (1) -FSTRESS, -DSTRESS = unstressed; (2) -FSTRESS, +DSTRESS = secondary stress, e.g. in non-initial components of compounds ("Ausfahrt" [ˈausˌfaːɐt] ("exit")), which receive increased duration but no intonation peak (or valley); (3) +FSTRESS, +DSTRESS = primary stress, where the intonation points are hooked. The intonation associated with stressed syllables is, among other things, defined according to different peak positions, which may again be expressed in distinctive feature notation taking the primary dichotomy between 'early' and 'non-early' into account: ±EARLY, and -EARLY may then be ±LATE.
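The feature notation can be summarised as two small lookup functions. This is a sketch under stated assumptions: the three stress combinations are exactly those listed above, and +FSTRESS, -DSTRESS, which is not among them, is rejected rather than guessed; for peak position, -EARLY, -LATE is taken to denote the 'medial' peak, an interpretation that follows from the three-way peak distinction but is not spelled out in the notation itself. The function names are hypothetical.

```python
def stress_category(fstress: bool, dstress: bool) -> str:
    """Map the (±FSTRESS, ±DSTRESS) combinations listed in the text to stress levels."""
    table = {
        (False, False): "unstressed",
        (False, True): "secondary",  # e.g. non-initial compound component
        (True, True): "primary",     # syllable to which an intonation peak/valley is hooked
    }
    try:
        return table[(fstress, dstress)]
    except KeyError:
        raise ValueError("+FSTRESS, -DSTRESS is not one of the listed combinations") from None


def peak_category(early: bool, late: bool = False) -> str:
    """±EARLY first; only -EARLY peaks are further specified as ±LATE."""
    if early:
        if late:
            raise ValueError("±LATE applies only to -EARLY peaks")
        return "early"
    return "late" if late else "medial"
```

Keeping stress and peak position in separate functions mirrors the separation of the stress cue and the intonation cue, whose interference is discussed next.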

At each potential stress position +FSTRESS, three intonation peaks are possible. But since the F0 of these peaks serves to signal the stressed syllable (as a stress cue) and at the same time the peak position in relation to such a stressed syllable (as an intonation cue), there may be interference between the two cue functions, leading to ambiguity, if the temporal distance between successive potential stresses is small, as in lexical items of the type "umlagern", particularly because of a lack of intervening unstressed syllables (e.g. containing /ə/), and even more so in the case of abutting syllables with short quantity vowels.

3.3 Sentence stress

In sentences not every lexical item gets a +FSTRESS marking for the association with intonation peaks (and valleys), although at a more abstract level it has lexical stress, i.e. at least one syllable is phonologically marked as having the potential of receiving the features +FSTRESS and +DSTRESS. The rules of grammar and pragmatics determine which lexical-stress syllables are given the feature combinations +FSTRESS, +DSTRESS or -FSTRESS, +DSTRESS in sentences. In a sentence such as "Aber der Leo säuft." [abɐ dɐ ˈleːoː ˈzɔɪft] ("But Leo drinks.")¹ either the subject "Leo" or the verb "säuft" may be in focus, receiving the features +FSTRESS, +DSTRESS, or both elements may be so characterized simultaneously. The question to be answered is whether the findings at the lexical level in 3.1–2 can be replicated at the sentence level, viz. whether a switch from one stress position to another can be brought about simply by F0 peak shift through the sentence. In this case it will also have to be checked whether at some intermediate section of the peak shift scale both stresses are realised. And finally, there is the issue of the perceptual manifestation of different intonation peaks ('early', 'medial', 'late') at each stress position, in parallel to what was found in the sentences of Section 2 with only one potential accent.

3.3.1 Stimulus preparation for perception experiments

A natural production of the utterance "Aber der Leo säuft." with sentence stress and 'medial' intonation peak on "Leo" was used for stimulus generation. Fig. 23 shows the speech wave, energy and F0 contours. A series

¹ This sentence played an important role in some experiments of the Munich Intonation Project (see Altmann et al., 1989) and was taken as the basis of further experiments in the Kiel Intonation Project for purposes of cross-reference.

[Figure: panels SPEECH, ENERGY (dB) and PITCH (Hz); time axis 0.0000–1.4800 s]

Fig. 23

Speech wave, energy and F0 contours (linear scale) of the utterance "Aber der Leo säuft." with subject stress and 'medial' peak. The time marks indicate the F0 base and peak points for peak contour shift.

of 7 left shifts (parallel transposition of the left branch and time expansion of the right branch) and of 11 complete parallel right shifts of 30 ms each were generated on the basis of the utterance in Fig. 23. An informal assessment of the series detected a poor quality in the synthesis of the segment /z/ and too strong a final aspiration; furthermore, the last stimuli of the series, from 15 to 19, with the accent on "säuft", sounded too strong at the beginning and husky at the end, obviously due to the wrong energy contour for a final F0 peak position, i.e. to a desynchronization of F0 and energy (see 2.1.1.5). To remedy these defects and to create synthetic versions that are as natural as possible, almost the entire [z] was devoiced, the final aspiration was reduced by lowering the dB values, and the discrepancy between energy and F0 was eliminated by lowering the energy

Fig. 24

As Fig. 23, but with the 19 peak position points marked.

in "Leo" and by raising it around the F0 peaks. The peak series was then regenerated with these parameter modifications of the stimulus in Fig. 23; it formed the basis for identification and serial discrimination tests.

3.3.2 Identification test

Five repetitions of the 19 stimuli were randomized and presented (in the format of 1.4 (2) for single stimuli) to 31 listeners with the instruction to decide whether "Leo" or "säuft" was more strongly stressed.
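With five randomized repetitions of each stimulus and 31 listeners, every data point of the identification function rests on 5 × 31 = 155 judgements, the n given in the caption of Fig. 25. The sketch below tallies forced-choice trials into per-stimulus percentages of 'subject stress' ("Leo") responses; the trial representation as (stimulus number, chosen word) pairs and the function name are hypothetical, not the format of the original answer sheets.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

N_REPETITIONS = 5
N_LISTENERS = 31
N_PER_POINT = N_REPETITIONS * N_LISTENERS  # 155 judgements per stimulus


def subject_stress_percent(trials: Iterable[Tuple[int, str]]) -> Dict[int, float]:
    """trials: (stimulus number, "Leo" or "säuft") pairs -> % "Leo" per stimulus."""
    total: Counter = Counter()
    leo: Counter = Counter()
    for stim, answer in trials:
        if answer not in ("Leo", "säuft"):
            raise ValueError(f"unexpected answer: {answer!r}")
        total[stim] += 1
        if answer == "Leo":
            leo[stim] += 1
    # Percentages are computed per stimulus from however many trials it received.
    return {s: 100.0 * leo[s] / total[s] for s in sorted(total)}
```

Dividing by the observed count per stimulus, rather than by a fixed 155, keeps the result correct if some answers are missing.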


Fig. 25 presents the results of the identification test, which demonstrate very clearly that a simple F0 peak shift causes a change from subject to verb stress. The transition in the response function between the two stress positions indicates (as was confirmed in phonetic expert listening) that as the peak is moved into the fricative [z] and therefore spans both words, giving a late F0 rise to "Leo" and an early F0 fall to "säuft", the perception of double stress results, which disappears again when the peak is located at the beginning of the vowel of the verb and the impression of focus stress on the latter is created.

[Plot: % 'subject stress' responses (0–100) against stimulus nr 1–19]

Fig. 25

Identification function showing percentage 'subject stress' judgements for 19 stimuli "Aber der Leo säuft." with F0 peak shift from left to right (nr. 8 approx. original peak position), n = 155 at each data point.

3.3.3 Serial discrimination tests

The series of 19 stimuli was partitioned into two sub-series: (a) stimuli 1–10, representing clear instances of the category of subject stress, and (b) stimuli 14–19, representing clear instances of the category of verb stress, according to the results of the identification test. Each set in ascending (numerical) ordering was presented to 32 subjects, who were to evaluate at which stimulus in the series the first and further changes in the speech melody had occurred.


Tables XII and XIII present the results of the serial discrimination tests (a) and (b).

Table XII

Frequency distribution of 'change has occurred' responses of 32 listeners in the left-right sequence of the serial discrimination test across the first 10 stimuli with F0 peak shifts in "Aber der Leo säuft." (1 = left-most, 10 = right-most position)

Stimulus                     2    3    4    5    6    7    8    9   10
First change perceived       1    -    5   12   10    2    -    -    -
Further changes perceived    -    -    -    -    4    7    5    7    7
Total                        1    -    5   12   14    9    5    7    7

(2 listeners perceived no change at all.)

Table XIII

Frequency distribution of 'change has occurred' responses of 32 listeners in the left-right sequence of the serial discrimination test across the last 6 stimuli with F0 peak shifts in "Aber der Leo säuft." (14 = left-most, 19 = right-most position)

Stimulus                    16   17   18   19
First change perceived      16   11    1    1
Further changes perceived    -    5    2    5
Total                       16   16    3    6

(3 listeners perceived no change at all.)

In both series the first perceptual change has a maximum frequency at the stimulus in which the F0 peak occupies the first position within the respective syllable nucleus (nr. 5 in (a) and nr. 16 in (b)). This result coincides with the data obtained in the peak alignment test in utterances containing a single potential accent (cf. 2.1.1). It points to the change from an 'early' to a 'medial' peak within each stress position.
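The counts in Table XIII can be checked in a few lines: each total is the sum of first and further changes for that stimulus, listeners with a first change plus those reporting no change should add up to 32, and the mode of the first-change row gives the stimulus singled out above. The code only re-derives figures already in the table.

```python
# Counts transcribed from Table XIII (stimuli 16-19; no entries are listed for 14-15).
first = {16: 16, 17: 11, 18: 1, 19: 1}
further = {16: 0, 17: 5, 18: 2, 19: 5}
no_change = 3

totals = {s: first[s] + further[s] for s in first}
listeners = sum(first.values()) + no_change
mode_stimulus = max(first, key=first.get)  # stimulus with most first-change responses

print(totals)         # {16: 16, 17: 16, 18: 3, 19: 6}
print(listeners)      # 32
print(mode_stimulus)  # 16
```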

A corresponding clear-cut switch was not observed in the "umlagern" series of 3.1.² The reason for this difference lies in the shorter duration of [um] vs. [leːoː], which allows less separation of the intonation peak and stress positions and, given the width of the shifted peak contour, causes the F0 configuration to straddle both potential accent syllables across a greater number of stimuli. The more gradual transition from prefix to stem stress in the response function of Fig. 21, compared with that in Fig. 25, is a further indication of this stronger stress/intonation interaction across segment durations that are too short to confine the chosen peak timing. To achieve a greater separation of the different intonation peaks within each accent, the peak descent would at least have to be faster, so as to encroach less on the other peak and stress positions.

3.4 Perceptual ambiguity between single and double accent

In spite of the more adequate temporal structure in "Aber der Leo säuft." for separating the theoretically possible peak and stress positions, there is still an ambiguous transition period between the two potential accents, as shown in Fig. 25. And as was argued in 3.3.2, this ambivalence is not so much between subject and verb focus stress as between subject focus and double stress. In the latter case, the late rise on "Leo", followed by an early fall on "säuft", may be interpreted as belonging to two F0 peak configurations ('late' followed by 'early') without an intervening dip between the two, or as a single 'late' F0 peak on the subject. In the first case, two accents are perceived, in the second only one. Because of the still close temporal proximity between the two potential stress positions, there must be a stretch along the peak shift scale where the signal is ambivalent between these two interpretations. That we are here dealing with a confusion of subject focus stress and double stress is proved by expert listening to the series of 19 F0 peak shifts in "Aber der Leo säuft.", establishing stress on "Leo" in stimuli 12–14, which may or may not be

² The relevant serial discrimination tests were carried out but are not reported here in detail. The results were negative, so that the summarising statement is considered sufficient.


accompanied by stress on "säuft". In stimulus 15, however, the change to focus stress on the verb has taken place: the peak rise is now far enough away from the potential accent syllable in "Leo" and therefore no longer associated with the subject, F0 being low during the whole of the word "Leo".

The perceptual ambiguity between a single 'late' peak and a 'late' + 'early' peak combination is even stronger in cases where two potential accent syllables abut and the first contains a short vowel, as in "Der Ring glänzt.", as is shown in Contribution IV (Hertrich, 1991a). Even when in abutting accents the first vowel is long, or when a short or long vowel in the first potential accent position is followed by one unstressed vowel (/ə/ or /ɐ/), as in "Die Uhr tickt.", "Die Bremse quietscht.", "Die Maler malen." (see Hertrich, 1991a), a perceptual confusion between the two categories is possible. The confusion can be avoided if, for the single 'late' peak, the descent is rapid enough not to trespass on the domain of the second accent syllable, as was demonstrated for "Die Maler malen." (loc. cit.). So if the temporal distance between two potential accents is short enough, the F0 peak shift through the sequence produces perceptual changes from subject focus stress to dual stress to verb focus stress. And in the transition area between the two focus stresses, perception may be ambiguous between double and single first accents. This ambiguity disappears as the distance between potential accents gets longer, as in "Die Bäcker haben gebacken." or "Die Sekretärin hat die Briefe geschrieben." (loc. cit.).

In accent sequences at longer distances from each other, double stress does not occur by simple F0 peak shift through the utterance; the peak contour has to be broadened at the same time to realise a 'medial' or 'late' rise on one accent syllable and an 'early' fall on the next one. In between these two intonation turns (rise and fall) associated with two stressed syllables, there may be an F0 dip of various degrees of extension, generating two properly manifested F0 peak contours, or the two peak points are joined by a plateau or a slight monotone descent/ascent, creating a 'hat pattern' (cf. Cohen & 't Hart, 1967). Although the 'hat pattern' is perceptually and semantically different from a succession of complete peaks (as is shown in Contribution VI, Hertrich, 1991b; see also Contribution VII, Kohler, 1991d), there are strong arguments in favour of treating a 'hat pattern' as a succession of two peaks without an F0 dip:

(1) The timing of the initial rise is exactly the same as that of the rising part in a 'medial' or 'late' peak. There are rising patterns that are timed more slowly and have rises up to the beginning of the next stressed syllable (see Contribution VII, Kohler, 1991d). They have to be recognised as separate entities. So we would have to set up two rising patterns, slow and fast, but since the latter coincides with the rising part of the peak pattern, it is more economical not to introduce new units 'fast rises'. The complementary solution of regarding 'medial' or 'late' peaks, too, as being composed of two tonal entities each (rise and fall) is ruled out by the fact that they constitute one stress, whereas the 'hat pattern' rises and falls represent two stresses.

(2) The timing and syllable alignment of the final fall coincide with those of the falling section of an 'early' (or 'medial') peak.

(3) 'Hat patterns' can be derived from the corresponding dipped peak sequences by general phonetic rules that change the prominence relationship between the first and the second peak as a consequence of removing phonetic features characteristic of the definitions of the different F0 peaks. Two cases can be distinguished:

(a) In the sequence 'medial' (or 'late') + 'early' peaks, the elimination of the F0 dip does not affect the essential feature of low falling F0 in the 'early' peak and also preserves the characteristic (low level +) rise in the 'medial' (or 'late') peak (see 2.1.1.7), but it modifies the complete manifestation of the latter by removing its separate F0 descent, thereby reducing its prominence.

(b) In the sequence 'medial' (or 'late') + 'medial' (or 'late') peaks, the elimination of the F0 dip results in a loss of the 'medial' or 'late' characteristics of the second peak, because in a derived 'hat pattern' it lacks the essential F0 rise in the syllable nucleus (see 2.1.1.7); and since it cannot be associated with an 'early' peak either, not having the early low fall, it lacks the prominence-lending feature of the 'medial' peak rise as well as that of the 'early' peak fall. Since the first peak, on the other hand, does have this feature, the prominence of the second one is subordinated. Thus a principled relationship can be established between 'hat patterns' and peak sequences on the basis of general phonetic rules modifying the relative prominences of the peaks.
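The derivation in (3) can be made concrete with a small sketch. It assumes a representation that the text does not prescribe: an F0 contour as a list of (time, F0) target points with linear interpolation between them. The function names, the five-target layout and the numerical values are all hypothetical. `remove_dip` deletes the dip so that the two peak points are joined directly (plateau or slight monotone descent/ascent); `reduced_peak` encodes one reading of cases (3a) and (3b), namely that in (3a) the first ('medial' or 'late') peak loses prominence and in (3b) the second one is subordinated.

```python
from enum import Enum


class Peak(Enum):
    EARLY = "early"
    MEDIAL = "medial"
    LATE = "late"


# Hypothetical contour representation: (time in s, F0 in Hz) target points,
# linearly interpolated between targets.
Contour = list[tuple[float, float]]


def remove_dip(dipped: Contour) -> Contour:
    """Turn a dipped two-peak sequence (low, peak1, dip, peak2, low) into a
    'hat pattern' by deleting the dip target, so that the two peak points
    are joined directly by a plateau or a slight monotone descent/ascent."""
    if len(dipped) != 5:
        raise ValueError("expected targets: low, peak1, dip, peak2, low")
    low1, peak1, dip, peak2, low2 = dipped
    if not dip[1] < min(peak1[1], peak2[1]):
        raise ValueError("middle target is not a dip between the two peaks")
    return [low1, peak1, peak2, low2]


def reduced_peak(first: Peak, second: Peak) -> int:
    """Index (0 = first, 1 = second) of the peak whose prominence is reduced
    when the dip is removed, following cases (3a) and (3b)."""
    if first is Peak.EARLY:
        # Outside (3a)/(3b): a hat pattern starts with a medial or late rise.
        raise ValueError("first peak must be 'medial' or 'late'")
    if second is Peak.EARLY:
        # (3a): the early low fall survives; the first peak loses its separate descent.
        return 0
    # (3b): the second peak loses its nucleus rise and has no early fall either.
    return 1


if __name__ == "__main__":
    dipped = [(0.00, 100.0), (0.20, 140.0), (0.45, 110.0), (0.70, 135.0), (0.95, 90.0)]
    print(remove_dip(dipped))
    print(reduced_peak(Peak.MEDIAL, Peak.MEDIAL))
```

The number of peaks (and hence of accents) is unchanged by `remove_dip`; only the prominence relation, modelled separately by `reduced_peak`, differs between the dipped sequence and the hat pattern.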


In both cases (3a) and (3b), the generation of a 'hat pattern' from a dipped peak sequence does not change the number of accents, but only the prominence relations between them. Thus when the sentence "Die Wählerinnen wählen." is combined either with a 'hat pattern' consisting of a medial (or late) rise on "Wählerinnen" plus a medial fall on "wählen", or with a single 'medial' (or 'late') peak on "Wählerinnen", only the second intonation represents focus stress on the subject and deaccentuation of the verb (see also Contribution VI, Hertrich, 1991b).

3.5 Intensity in the cuing of stress and intonation

The question now arises whether it is possible to change stress perception simply by varying intensity. Two test cases may be distinguished:

(a) Utterances that are ambiguous between one and two stresses in F0 peak shifts, such as "Aber der Leo säuft." in 3.4;
(b) 'hat patterns' in which a medial (or late) F0 rise is immediately followed by a medial F0 fall, reducing the prominence of the second stress compared with the sequence of two complete peaks (cf. 3.4).

If intensity alone can change stress perception, then it should be possible in (a) to produce a switch from double to initial focus stress simply by reducing the intensity in the second accent syllable and simultaneously raising it in the first. Similarly, in (b), it should be possible to alter the prominence relation by a comparable intensity adjustment in the two accent syllables.

The issue has been tested interactively by changing the source amplitude values accordingly in the RULSYS TTS synthesis-by-rule. The result has been negative: the focussing, and consequently the number of stresses or the prominence relation, does not change. Rather, it is the relative loudness that is affected (see also Kohler, 1991f). This is further support for the long-established finding that intensity has a low signalling value for stress compared with F0 and duration (Fry, 1958).
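The manipulation tested above, lowering the intensity of one accent syllable while raising that of the other, can be sketched as a gain applied to a per-frame amplitude track. This is not the RULSYS interface, which is not described here; the function names, the linear per-frame amplitude list and the half-open frame ranges are hypothetical assumptions made only to show what a symmetric dB adjustment amounts to.

```python
import math


def shift_intensity(amplitudes, raise_span, lower_span, delta_db):
    """Return a copy of a per-frame linear amplitude track (hypothetical
    representation) in which the frames of raise_span are boosted by
    delta_db and those of lower_span are cut by delta_db.
    Spans are half-open (start, end) frame-index ranges."""
    (r0, r1), (l0, l1) = raise_span, lower_span
    if r0 < l1 and l0 < r1:
        raise ValueError("the two accent syllables must not overlap")
    up = 10 ** (delta_db / 20)     # amplitude ratio for +delta_db
    down = 10 ** (-delta_db / 20)  # amplitude ratio for -delta_db
    out = list(amplitudes)
    for i in range(r0, r1):
        out[i] *= up
    for i in range(l0, l1):
        out[i] *= down
    return out


def level_difference_db(a, b):
    """Level difference in dB between two linear amplitudes."""
    return 20 * math.log10(a / b)


if __name__ == "__main__":
    track = [1.0] * 10
    # First accent syllable in frames 1-3, second in frames 6-8 (hypothetical).
    shifted = shift_intensity(track, (1, 4), (6, 9), 3.0)
    print(round(level_difference_db(shifted[2], shifted[7]), 2))
```

A change of 3 dB in each direction thus opens a 6 dB level difference between the two accent syllables; according to the result reported above, an adjustment of this kind alters relative loudness but not the focus or the number of stresses.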

The situation is different as regards the contribution of intensity to the perception of intonation. Again, two cases may be distinguished:

(a) It has already been discussed in 2.1.1.5 that a late F0 peak pattern requires a parallel timing of the intensity course to guarantee its perceptual identity.


(b) It is argued in Contribution V (Kohler & Gartenberg, 1991) that lower intensities around the F0 peaks in 'early' and 'late' patterns vis-à-vis 'medial' ones have to be offset by higher F0 to provide the same prominence across the different intonations. On the other hand, the 'early' peak pattern, which accentuates low F0, has its characteristics strengthened by not having the lower intensity around its prenucleus F0 maximum compensated for by a higher F0 peak value.

Finally, the disruption of the natural parallelism in the time courses of F0, source amplitude and sound intensity for the three terminal peak contours, as caused by the synthesis of F0 peak shifts across an original 'medial' peak utterance, may result in a degraded acoustic output quality. So, when a natural 'medial' peak speech signal of "Sie hat ja gelogen." is taken as the point of departure for LPC synthesis with a 'late' peak, the stress and intonation categories are signalled correctly, but the utterance sounds husky at the end and overloaded in the middle because F0 and intensity diverge in opposite directions in these two places. To improve the synthesis quality of 'late' peaks, appropriate corrections at these points in the intensity curve had to be carried out for "Aber der Leo säuft." in 3.3.1 (see also Kohler, 1991f).

3.6 General discussion concerning Hypothesis (3)

The perception experiments of 3.1-5 have largely confirmed Hypothesis (3) and its corollaries in Contribution I (Kohler, 1991b). If there is more than one potential accent in a single-accent terminal utterance, either at the lexical or at the sentence level, three phonological intonation categories ('early', 'medial' and 'late' peaks) are distinguished at each stress position, provided the temporal distance between the accent places allows the separation of the F0 peak configurations. Furthermore, an F0 peak shift alters the stress position as well, which can result in an interaction of stress and intonation if two accent syllables occur at such a short interval that the rising and falling branches of a peak contour can at the same time be associated with a single peak on the first accent syllable or with a succession of two peaks on two successive accent syllables, not separated by an F0 dip. This ambivalence of a stimulus between single and double stress results in a perceptual ambiguity between, e.g., prefix and stem word stress at the lexical level, or subject and verb stress at the sentence level. It is only when the F0 peak is moved out of the joint domains of both accent syllables, so that it lies exclusively in that of the second one, that the ambiguity is resolved and second-position focus stress results. A link has thus been established between 'hat patterns' and dipped F0 peak sequences, based on prominence relationships, as postulated by Hypothesis (4) in Contribution I. This point will be discussed further in Contributions VI (Hertrich, 1991b) and VII (Kohler, 1991d).

Duration is a further cue to stress in German, but it is usually subordinated to F0 unless it is shorter than what is expected of stressed vowels. Intensity and spectral characteristics, on the other hand, do not seem to play a role in stress perception. Intensity intervenes as an important cue to intonation identity and to voice (speech) quality when the usually parallel time courses of F0 and intensity are disrupted, and it is, of course, the signal attribute of loudness. Finally, the height of an F0 peak cues prominence at the perceptual level and emphasis at the semantic level (see 2.3, and Contributions I, V and VII, Gartenberg & Panzlaff-Reuter, 1991, Section 6.; Kohler & Gartenberg, 1991; Kohler, 1991d).

4. Conclusions for the Kiel Intonation Model of German (KIM)

The results of the experiments discussed in this Contribution III suggest a number of points that have to be taken into account in KIM as regards the intonation peak component of the model.

1. KIM must comprise the phonetic intonation model proper and the syntactic, semantic and pragmatic environment providing interpretations for symbolic representations of sentences as input to the model.

2. In particular, this environment must specify the lexical items that are to receive sentence stress, and it must provide semantic interpretations along the dimensions 'established/new', 'degree of distance between the speaker and the world', and 'emphasis'.

3. The basic categories of the phonetic model include
(a) a feature specification of stress with reference to the signal properties F0 and duration: ±FSTRESS, ±DSTRESS,
(b) a feature specification of intonation with reference to F0 peak position: ±EARLY, and ±LATE within -EARLY,
(c) the timing of the intonation peaks depending on syllable structures (mono/polysyllables, long/short vowels, voiced/voiceless consonant environment),
(d) a numerical scale of peak height with reference to degrees of prominence,
(e) IF0 and CF0 modifications of the basic peak contours,
(f) intensity adjustments to guarantee parallelism with the F0 time course.

4. After the introduction of the peak categories, the model has to deal with their concatenation.

(a) An F0 descent from a peak position can be fast or slow. In the latter case, double accentuation may result, or a main accent followed by a secondary one; e.g. in "Er hat einen Brief geschrieben." ("He's written a letter.") the final participle is deaccented in relation to "Brief", which gets the main nuclear sentence stress. But the deaccentuation may result in a default secondary stress, or in no stress at all, suggesting a contrast between, for example, "Brief" and "Karte" ("card"). This is the same phenomenon as the one Kingdon (1965, p. 195) has called 'semantic partial stress' with reference to compounds of different degrees of semantic unity, e.g. "butter cup" (cup for butter) with secondary stress on "cup" vs. "buttercup" (ranunculus) with unstressed "cup". The phonetic manifestation of this difference is not only one of duration but, first and foremost, one of different timings of the F0 fall from the F0 peak.

(b) Besides peak sequences, various 'hat patterns' have to be generated and their semantic and pragmatic differences evaluated.

These points will be developed in Contribution VII (Kohler, 1991d), supplemented by further model components derived from the empirical data collections in the other contributions and from interactive RULSYS TTS experimentation.
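The feature specification in point 3(a) and 3(b) can be illustrated with a minimal encoding. This is a hypothetical sketch, not KIM's actual implementation, which the text does not give: the names `PeakCategory`, `classify` and `AccentSpec` are invented for illustration. It shows how the binary features ±EARLY, and ±LATE defined only within -EARLY, yield exactly the three peak categories 'early', 'medial' and 'late', alongside the two stress features ±FSTRESS and ±DSTRESS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PeakCategory(Enum):
    EARLY = "early"
    MEDIAL = "medial"
    LATE = "late"


def classify(early: bool, late: Optional[bool] = None) -> PeakCategory:
    """Map [±EARLY] and [±LATE] (specified only within [-EARLY]) to a peak category."""
    if early:
        if late is not None:
            raise ValueError("[±LATE] is only specified within [-EARLY]")
        return PeakCategory.EARLY
    if late is None:
        raise ValueError("a [-EARLY] peak needs a [±LATE] value")
    return PeakCategory.LATE if late else PeakCategory.MEDIAL


@dataclass(frozen=True)
class AccentSpec:
    """Hypothetical bundle of the stress features of 3(a) and a peak category of 3(b)."""
    fstress: bool  # [±FSTRESS]: stress feature referring to F0
    dstress: bool  # [±DSTRESS]: stress feature referring to duration
    peak: PeakCategory

    def features(self) -> dict:
        """Binary feature values; LATE is omitted for [+EARLY] peaks."""
        feats = {"FSTRESS": self.fstress, "DSTRESS": self.dstress,
                 "EARLY": self.peak is PeakCategory.EARLY}
        if self.peak is not PeakCategory.EARLY:
            feats["LATE"] = self.peak is PeakCategory.LATE
        return feats


if __name__ == "__main__":
    for cat in PeakCategory:
        print(cat.value, AccentSpec(True, True, cat).features())
```

Nesting ±LATE under -EARLY, rather than using two independent features, rules out the fourth combination (+EARLY, +LATE) that two free binary features would otherwise allow.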

