
On Vowels

Perception of Spectral Features, Related Aspects of Production and Sociophonetic Dimensions

Hartmut Traunmüller

University of Stockholm, 1983

Academic dissertation for the degree of Doctor of Philosophy

Department of Linguistics (Institutionen för lingvistik)

106 91 Stockholm

Abstract

The first and major part of this thesis deals with spectral features of vowels and with the distinction of phonetic information from personal and transmittal information also conveyed to listeners by speech sounds. The results of perceptual experiments with synthetic vowels whose fundamental and first formant were varied in frequency suggested that the smaller tonotopical distances between formants (< 6 Bark) are invariant in phonetically identical vowels. This was also confirmed by an analysis of formant frequency data of vowels produced by male and female speakers of several languages. It is further investigated how partials are resolved in the process of timbre perception. Previous experiments by other researchers suggest an effective bandwidth close to three Bark. In similar experiments, though using different stimuli, this result could not be replicated. A re-analysis of some other experimental results gave, among other details, effective bandwidths roughly proportional to frequency in the range below 600 Hz. Due to contextual effects, the general validity of this result is in question. The non-uniform sex differences in formant frequencies are shown mainly to be consequences of an anatomical development in accord with the perceptual condition of invariant phonetic qualities.

The second part of the thesis, Vocalism in Eastern Central Bavarian, represents a case study of the realization of sociophonetic dimensions in speech. In the chosen group of dialects some phonological rules lead to a richly shadowed vowel system. The application of these rules is investigated with respect to dialectal, sociolectal, speaker age, and speech tempo variation.

© 1983 Hartmut Traunmüller

ISBN 91-7146-324-0 minab/gotab, Stockholm, 1983

PREFACE

Doctoral theses usually treat a carefully delimited subject according to a plan drawn up in advance. I have, however, chosen a different approach. Starting from an experiment on the perception of vowels, I followed the ideas and questions that emerged in the course of the work. These ideas led in part in widely different directions. Thus, besides the main theme, the perception of speech, I also touch on psychoacoustics, speech production, phonology, and sociolinguistics. As a result, the treatment of the subject became less thorough in several respects. I was guided by the conviction that unconstrained research would lead to a greater scientific yield.

In addition to my supervisor, Björn Lindblom, whose valuable comments have influenced the shape of most of the papers included in the thesis, I have drawn on ideas and comments from James Lubker (paper 2), Tore Janson and Astrid Stedje (paper 6), and the anonymous reviewers of papers 2 and 3.

Johan Sundberg and Björn Lindblom placed their vocal tract data at my disposal, and Johan Liljencrants introduced me to his LEA simulation program.

Karin Holmgren, Peter Branderud, and Johan Stark have in various ways eased my dealings with the phonetics laboratory's computer.

Richard Schulman and James Lubker checked my English, and Suzanne Schlyter revised my versions of the summaries in French.

Karin Holmgren, Milly Söderman, and others helped me prepare the manuscripts.

Gunnar Fant and his colleagues at the Department of Speech Communication (Institutionen för talöverföring), KTH, have shown a stimulating interest in my work.

To all of them, and also to those who volunteered as subjects, I extend my heartfelt thanks!

I also wish to thank my wife, Neeltje, who has put up with my sometimes erratic working rhythm and the other strains that my work has entailed.

I wish to express my gratitude to the anonymous reviewers of papers

2 and 3, whose valuable criticism led to a significant improvement of

these papers.

Stockholm, 1983-04-19

Hartmut Traunmüller

The thesis consists of the present summary and the following papers by Hartmut Traunmüller:

(1) Analytical expressions for the tonotopical sensory scale, submitted to Acustica.

(2) Perceptual dimension of openness in vowels, in: The Journal of the Acoustical Society of America, 69, 1981, pp. 1465-1475.

(3) Articulatory and perceptual factors controlling the age- and sex-conditioned variability in formant frequencies of vowels, submitted to Speech Communication.

(4) Perception of timbre: evidence for spectral resolution bandwidth different from critical band?, in: R. Carlson and B. Granström (eds.), The Representation of Speech in the Peripheral Auditory System, Elsevier Biomedical Press, Amsterdam, 1982, pp. 103-108.

(5) Die spektrale Auflösung bei der Wahrnehmung der Klangfarbe von Vokalen, submitted to Acustica.

(6) Der Vokalismus im Ostmittelbairischen, in: Zeitschrift für Dialektologie und Linguistik, 49, 1982, pp. 289-333.

Contents:

1 Spectral features of vowels
1.1 The tonotopical sensory scale
1.2 The acoustic-to-phonetic transformation
1.3 Spectral resolution in timbre perception
1.4 Acoustical consequences of vocal tract growth
2 Vocalism in Eastern Central Bavarian

1 SPECTRAL FEATURES OF VOWELS

1.1 The tonotopical sensory scale

(paper 1)

One of the principal features of the sense of hearing is the frequency-to-place transformation realized in the cochlea. The resulting tonotopical order is known to be maintained also at higher levels in the brain. A psychoacoustical measure of the tonotopical dimension is known as "critical-band rate" or "tonality" ("Tonheit") z. Its unit, one Bark, is equal to the fundamental bandwidth of resolution evidenced in loudness summation and in masking experiments. One Bark corresponds approximately to 25 steps in just noticeable frequency differences and to 150 hair cells and 1.3 mm along the basilar membrane. The Bark scale, see Figure 1, has been published by Zwicker (1960) in the form of a table. Several analytical expressions which approximate this function have been published. In the course of the investigations summarized below, the published equations were found not to be accurate enough in some applications in which either the tonotopical distance between formants was to be assessed or the formants of synthetic vowels were to be shifted a certain distance along the tonotopical scale. In addition, some of the equations were not simple to invert. Therefore a new attempt was made with a rigorous demand for accuracy within the range of vowel formant frequencies up to children's F4. The obtained expression compares with the most accurate ones previously known as follows:

[Figure 1: plot of critical-band rate z in Bark (0 to 24) against frequency f in kHz (0.1 to 20, logarithmic axis); the curve is labelled z/Bark = 26.81/(1 + 1960 Hz/f) - 0.53.]

Figure 1. Critical-band rate z as a function of frequency f. Data points from Zwicker (1960). Frequency f scaled logarithmically. The curve corresponds to equation (4).

Equation and inverse, deviation from the table, and authors (z in Bark, f in Hz):

(1) $z = 7 \ln\left[ (f/650) + \left( (f/650)^2 + 1 \right)^{1/2} \right]$
(2) $f = 650 \sinh(z/7)$
Deviation from table: ±0.13 Bark for f < 4000 Hz. Author: Schroeder.

(3) $z = 13 \arctan(0.00076 f) + 3.5 \arctan\left( (f/7500)^2 \right)$ (no simple inverse)
Deviation from table: ±0.20 Bark. Authors: Zwicker and Terhardt.

(4) $z = 26.81 f / (1960 + f) - 0.53$
(5) $f = 1960 (z + 0.53) / (26.28 - z)$
Deviation from table: ±0.05 Bark for 200 Hz < f < 6700 Hz. Author: Traunmüller.

The subtractive constant in (4) is irrelevant in applications where only distances between formants are to be known. The paper also contains modifications of eqs. (3) and (4) to cover the whole range of auditory perception with high accuracy. These equations are less simple, however. It is also shown how to calculate critical bandwidths.
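The three approximations of the Bark scale listed above can be compared numerically. The following is a minimal sketch in Python that transcribes equations (1) to (5) directly; the function names are hypothetical and chosen only for this illustration, and the extended-range modifications mentioned in the text are not included.

```python
import math


def bark_schroeder(f_hz):
    """Equation (1): z = 7 ln[f/650 + sqrt((f/650)^2 + 1)], which equals 7 asinh(f/650)."""
    return 7.0 * math.asinh(f_hz / 650.0)


def hz_schroeder(z_bark):
    """Equation (2): inverse of equation (1)."""
    return 650.0 * math.sinh(z_bark / 7.0)


def bark_zwicker_terhardt(f_hz):
    """Equation (3): no simple closed-form inverse."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_traunmueller(f_hz):
    """Equation (4)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def hz_traunmueller(z_bark):
    """Equation (5): inverse of equation (4)."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


# Print the three approximations side by side for a few frequencies.
for f in (200.0, 500.0, 1000.0, 2000.0, 4000.0):
    print(f, bark_schroeder(f), bark_zwicker_terhardt(f), bark_traunmueller(f))
```

Because equation (5) is the exact algebraic inverse of equation (4), shifting a formant by a given number of Bark reduces to `hz_traunmueller(bark_traunmueller(f) + shift)`.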

1.2 The acoustic-to-phonetic transformation

(papers 2 and 3)

Speech sounds, as heard by listeners, contain besides the phonetically coded verbal information also information about the speaker (e.g. age, sex, mood) and about his location in relation to the listener. The acoustic properties of the realizations of given phonemes vary widely depending on the physical dimensions of the speaker's vocal tract, on his vocal effort, and on the transmission of the signal (distance, reverberation). Listeners are able to distinguish these phonetic, personal, and transmittal qualities of speech sounds. This raises the question of which decisive properties the perception of phonetic quality is based on. Besides the mentioned factors, the acoustic properties of given speech sounds or, seen the other way, the phonetic quality of a sound with given acoustic properties is also dependent on context. The present study, however, treats only intrinsic factors of vowels.

A perceptual analysis of vowels spoken by men, women, and children, recorded on a gramophone disk and reproduced with several different velocities, led Chiba and Kajiyama (1941) to the conclusion that "a vowel is characterized by its relative formants, provided that the centers of the formants are situated within certain regions fixed for a given vowel". Assuming, as these researchers did, a linear relationship between the logarithm of frequency and the position along the basilar membrane, vowels with the same formant frequency ratios would create equivalent excitation patterns. Potter and Steinberg (1950) posed essentially the same hypothesis: "within limits, a certain spatial pattern of stimulation along the basilar membrane may be identified as a given sound regardless of position along the membrane".

The displacement between the formant frequency patterns of the same vowels spoken by men, women, and children shows indeed a certain degree of uniformity on a tonotopical scale. The female-male differences in F4, F3, and, except for back rounded vowels, in F2 all come close to 1.0 Bark. F2 in back vowels and F1 in closed vowels appear, however, to deviate from this tendency (see Table 1). This is substantiated by fairly uniform published data on vowels from speakers of several languages.

Vowel   F1    F2    F3    k1    k2    k3    d1    d2    d3
u       310   760   2225  1.06  1.01  1.23  0.20  0.05  1.40
o, ʊ    425   815   2375  1.07  1.05  1.17  0.25  0.30  1.05
ɔ       500   840   2470  1.11  1.06  1.13  0.45  0.35  0.80
ɒ, ɑ    670   1045  2510  1.17  1.12  1.15  0.85  0.70  0.90
a       735   1270  2480  1.25  1.15  1.15  1.25  0.90  0.90
æ       650   1670  2425  1.27  1.17  1.18  1.30  1.05  1.10
ɛ       480   1840  2455  1.19  1.18  1.20  0.80  1.10  1.20
e, ɪ    360   2045  2580  1.11  1.22  1.18  0.35  1.35  1.10
i       275   2190  2950  1.07  1.21  1.13  0.20  1.25  0.80
y       265   1835  2225  1.00  1.19  1.17  0.00  1.15  1.05
ø       375   1610  2185  1.05  1.16  1.16  0.20  1.00  1.00
œ       450   1390  2275  1.07  1.18  1.17  0.30  1.10  1.05

Table 1. Formant frequencies F in Hz, in vowels spoken by men; formant frequency ratios k = F_fem/F_male; and critical-band rate differences d = z(F_fem) - z(F_male), in Bark. Mean values from speakers of several languages (Am. English, Swedish, Serbo-Croatian, Danish, Estonian, and Dutch), taken from a study by G. Fant (1975). For more details see paper 3, Tables 1 to 3.
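The d-values of Table 1 can be approximately recomputed from the male formant frequencies and the k-values. The sketch below assumes that equation (4) was the conversion used, which the summary does not state explicitly; since the tabulated values appear to be rounded, only approximate agreement should be expected. The function names are hypothetical.

```python
def bark(f_hz):
    """Critical-band rate according to equation (4)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_difference(f_male_hz, k):
    """d = z(F_fem) - z(F_male), with F_fem = k * F_male."""
    return bark(k * f_male_hz) - bark(f_male_hz)


# (vowel, formant, male frequency in Hz, k, tabulated d) taken from Table 1.
rows = [
    ("u", "F1", 310.0, 1.06, 0.20),
    ("u", "F3", 2225.0, 1.23, 1.40),
    ("a", "F1", 735.0, 1.25, 1.25),
]

for vowel, formant, f_male, k, d_table in rows:
    d_computed = bark_difference(f_male, k)
    print(vowel, formant, round(d_computed, 2), "tabulated:", d_table)
```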

The present investigation of the factors decisive for perceived phonetic quality involved the following steps:

(1) The factors influencing the perceived phonetic quality of synthetic vowels containing only one formant were mapped. The two parameters f0 and F1 were varied over the whole range of frequencies that can be observed for f0 and F1 in natural speech. The vowels were identified by 23 native speakers of a Bavarian dialect in which five degrees of vowel openness occur distinctively (cf. Table 2, page 15).

(2) It was investigated to what extent variables other than f0 and F1 in natural vowels, i.e., the higher formants, influence the perception of features carried by f0 and F1 alone in one-formant vowels. To this end, synthetic versions of natural vowels with F1 and/or f0 systematically displaced in frequency were generated to be identified by those same subjects.

(3) From the results of these experiments it could be inferred that the phonetic interpretation of the cochlear version of speech sound spectra (excitation patterns) is based on an analysis of local configurations no wider than roughly 6 Bark. The shape of these configurations, which depends critically on the distances between neighbouring formants, is more important for the phonetic quality than the absolute position of these configurations. On this basis it was predicted that in vowels sharing the same phonetic quality, any distances between formants will be invariant if they are smaller than 6 Bark. This prediction was then confirmed by an analysis of the data (Table 1) on male and female vowel productions. It can be seen in Figure 2 that the distances between F3 and F2 and between F2 and F1 are almost the same in vowels spoken by men and by women as long as they are smaller than 5 or 6 Bark, while most larger distances happen to be smaller in vowels spoken by men than in those by women. The tonality distance between F1 and the glottal "formant" Fg, just above f0, which shows up as a peak in the spectrum of open vowels, also appears to be invariant and should be so. The data on this matter are not, however, highly reliable.
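The invariance prediction in step (3) can be illustrated with two rows of Table 1. The sketch below is only an illustration under stated assumptions: it uses equation (4) for the Bark conversion, derives the female formants as k times the male ones, and uses hypothetical function names. It is not the analysis procedure of paper 3.

```python
def bark(f_hz):
    """Critical-band rate according to equation (4)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def formant_distances(formants_hz):
    """Tonotopical distances [z2 - z1, z3 - z2] between neighbouring formants."""
    z = [bark(f) for f in formants_hz]
    return [z[i + 1] - z[i] for i in range(len(z) - 1)]


# Male F1..F3 in Hz and female/male ratios k1..k3 from Table 1.
table = {
    "u": ([310.0, 760.0, 2225.0], [1.06, 1.01, 1.23]),
    "a": ([735.0, 1270.0, 2480.0], [1.25, 1.15, 1.15]),
}

results = {}
for vowel, (male, ks) in table.items():
    female = [f * k for f, k in zip(male, ks)]
    results[vowel] = (formant_distances(male), formant_distances(female))
    print(vowel, "male:", results[vowel][0], "female:", results[vowel][1])
```

In this sketch the distances below 6 Bark (both distances in [a], and F2 to F1 in [u]) differ by only a few tenths of a Bark between the male and female versions, whereas the larger F3 to F2 distance in [u] is more than 1 Bark greater in the female version.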

The perception of the phonetic quality of vowels can be seen as a process of tonotopical gestalt recognition, analogous to visual perception of form. Local features play an important role in both cases. This analogy is further corroborated by the transposability of formant patterns on different carriers such as buzz or noise (in whispering). Visual form can also be carried by different underlying structures.

[Figure 2: scatter plot of z2 - z1 in Bark (vertical axis, 0 to 10) against z3 - z2 in Bark (horizontal axis, 0 to 10) for the vowels of Table 1.]

Figure 2. Tonotopical distance, expressed in Bark, between F2 and F1 plotted against that between F3 and F2 in the vowels of Table 1. Crosses: male versions; rings: female versions. These distances are predicted to be invariant if they are below 5 or 6 Bark.

The conclusion that local configurations are more important than the overall shape of the excitation pattern was arrived at by several observations:

** The phonetic quality of the male vowels [i e ɛ] was unchanged when both F1 and f0 were moved up in frequency while their distance was kept constant on a modified tonotopical scale that had the property of leaving the perceived degree of openness invariant (paper 2, exp. 2). On the basis of an overall match, these vowels should all have been heard as rounded due to the decreased distance between F2 and F1. Actually, this happened only when this distance became less than 6 Bark (in the [e]-based stimuli). Thus, the distance between F2 and F1 is not a prevailing cue to roundedness if it is larger than 6 Bark.

** When f0 was moved up in frequency in the vowels [e ɛ œ], these were heard as progressively less open, to end up with [i] or [y] (paper 2, exp. 4). This is in accordance with the results of the one-formant vowel identifications. In the [œ]-based stimuli, however, a large minority of subjects perceived no change in phonetic identity when f0 was moved up. In this vowel, the distance between F2 and F1 was only 5 Bark. Hence, the identification can be based on the configuration shaped by F1, F2, and the higher formants.

** Previous experimentation had shown that one-formant vowels with the formant located above 1.2 kHz were identified on the basis of the position of that formant with little or no dependence on f0. In this case, then, neither the overall shape nor any local configuration provides any cues as to the phonetic identity of the stimulus.

** It might be expected that one-formant vowels would be heard as back vowels since they are dissimilar to front vowels in that they do not have any prominent components at high frequencies, but only one of the subjects perceived them this way. The other 22 heard mostly front vowels, in particular front rounded ones, though also some back vowels, mostly [u] and [o] (besides [a], for which no front/back opposition exists in the subjects' dialect). If local configurations are more important than the overall shape of the spectrum, then the prevailing cue to backness will be that there is a second formant, F2, close to F1, while in front vowels F2 is close to F3. In one-formant vowels there is definitely no second formant close to the first, and this appears often to prevent people from hearing back vowels, although these stimuli lack any kind of positive cue to frontness and are more similar to back vowels in overall spectrum shape. Those back vowels in which F2 is generally most prominent ([ɔ] and [ɒ]) were rarely ever heard.

It is well known that the frequency position of the first formant is correlated with the perceptual dimension of openness in vowels. One-formant vowels can be expected to carry at least some information about this dimension. Figure 3 shows the identifications of one-formant vowels. Phonemes with the same degree of openness are collapsed. It can be seen that the prevailing criterion for perceived openness is not F1 alone, but the tonotopical distance between F1 and f0 (boundaries running horizontally in the figure). Only when F1 is very high is the position of F1 alone decisive (boundary between [a] and less open vowels at f0 > 350 Hz). At f0 = 350 Hz, an abrupt change in response behaviour can be seen. At higher fundamental frequencies, the distance between the first two partials is apparently too large to allow the ear to extract the original formant. The second partial is discerned as the first spectral peak above f0 and apparently interpreted as F1, with f0 as Fg.

The experiments with shifted f0 and/or F1 demonstrated that the relation between F1 and f0 is indeed the prevailing cue to perceived degree of openness in natural, voiced vowels. The higher formants contribute only marginally.

[Figure 3: identification map of one-formant vowels. Horizontal axis: tonality of fundamental [Bark], also labelled with the fundamental frequency from 0.02 to 0.7 kHz.]

Figure 3. Identifications of one-formant vowels by subjects competent in an Eastern Central Bavarian dialect. Horizontally: fundamental frequency, scaled tonotopically; vertically: tonotopical distance between formant (or partial) and fundamental; bisector of the coordinates: tonotopical position of formant. Phonemes with the same degree of openness (or "vowel height") collapsed. Dashed areas: boundary regions with low conformity between subjects. First partials also shown.

At fundamental frequencies below 350 Hz, the distance between F1 and f0 is not strictly invariant in vowels with the same perceived degree of openness. This can be seen more clearly by an analysis of vowels spoken by men, women, and children (see paper 3, figure 9). It reveals that in closed and half-open vowels the tonotopical distance between F1 and f0 is smaller in vowels spoken by women than in those by men and children. Several alternative hypotheses able to explain this observation are discussed. At least in part, this particularity may be due to restricted spectral selectivity, as discussed in the following section.

1.3 Spectral resolution in timbre perception

(papers 4 and 5)

In ordinary vowels, the resonances of the vocal tract, known as formants, are "sampled" by the partials of the glottal voice. This leads to a more or less exact rendition of the formants, depending on f0. In a rough approximation, the formant bandwidth B follows the expression B = 0.05 F + 50 Hz, with F = center frequency of the formant. For low formant frequencies, the interspace between consecutive partials is wider than B even in low-pitched voices (see Figure 4). Despite the consequently uncertain rendition of the peaks of lower formants, listeners are nevertheless capable of extracting a feature (openness) closely associated with the frequency position of F1.

On the perceptual side of this problem, the known frequency selectivity of the ear has to be considered. A frequency band of 1 Bark comprises at any position a range of frequencies wider than the formant bandwidth B.

[Figure 4: curves a and b plotted as frequency f in kHz (vertical axis, 0.5 to 10) against fundamental frequency f0 in Hz (horizontal axis, 200 to 400).]

Figure 4. Frequencies f above which there is always at least one partial within the frequency range f ± 0.025 f ± 25 Hz (formant peaks, curve a) or within f ± 0.5 Δf_G (one-Bark frequency bands, curve b) at a fundamental frequency f0. In the region below curve a, the formant peaks are deficiently reproduced and in that below curve b, single partials can appear as peaks in a diagram of loudness density vs. critical-band rate. The dashed region covers the range of characteristic frequencies of vowels (f0 to F4) at normal phonation.

If two partials are closer than 1 Bark, they will not produce separate peaks in loudness density over critical-band rate. At sufficiently low fundamental frequencies, peaks will appear in the auditory spectrum only where shaped by formants. At higher f0's, the lowest partials will shape their own peaks (see Figure 4). In the frequency range where this occurs, particularly in the speech of women and children, we find F1 and in back rounded vowels also F2. These formants will then be difficult to localize and the second partial might be interpreted as F1. In natural vowels, this might not occur, since the higher formants make the "correct" identification of these vowels possible. In one-formant vowels, the expected consequence can be observed, but it appears only at f0 > 350 Hz (see Figure 3). This could be explained on the basis of an effective bandwidth of resolution or spectral integration close to 3 Bark.
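The sampling argument for formant peaks can be made concrete with a small calculation. The sketch below rests on one reading of curve a in Figure 4, which is an assumption of this example: since partials are spaced f0 apart, a window of width B = 0.05 f + 50 Hz centred on f always contains at least one partial only if B is at least f0. The function names are hypothetical.

```python
def formant_bandwidth_hz(formant_hz):
    """Rough approximation from the text: B = 0.05 F + 50 Hz."""
    return 0.05 * formant_hz + 50.0


def lowest_reliably_sampled_frequency_hz(f0_hz):
    """Lowest frequency f at which a window of width B(f) always holds a partial.

    Assumption: the condition is B(f) >= f0, i.e. 0.05 f + 50 >= f0,
    which rearranges to f >= 20 (f0 - 50).
    """
    return max(0.0, 20.0 * (f0_hz - 50.0))


for f0 in (100.0, 200.0, 300.0, 400.0):
    print(f0, lowest_reliably_sampled_frequency_hz(f0))
```

Under this reading, a voice with f0 = 200 Hz renders formant peaks reliably only above about 3 kHz, which lies well above the usual range of F1.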

Spectral integration over a range of 3 Bark could also explain the fact that the difference in the position of F1 between male and female productions of closed and half-open vowels is smaller than expected on the basis of an invariant tonotopical distance between F1 and f0. In the tonotopical representation of these vowels, the configuration shaped by the partials up to and including F1 can be considered decisive for perceived openness. This configuration is characterized by a certain distance between its low-frequency flank and the peak shaped by F1. This distance, which we would predict to be invariant, would be independent of f0 for f0 < 150 Hz (≈ 1.5 Bark), because for lower f0's, the low-frequency flank of the configuration would coincide with the end of the tonotopical scale.

It has been observed previously that a group of closely spaced vowel formants can be replaced by a single formant in simplified synthetic vowels. The second formant in synthetic two-formant vowels stands not only for F2, but for the whole group of higher formants (F2, F3, F4) in closed and half-open front vowels. In Figure 3, the boundary between [a] and less open vowels, mostly [æ] and [ɶ], does not match the ordinary position of F1. Apparently, the single formant has been matched to some mean of F1 and F2 in [a], where these formants are closer than 3 Bark, while in [æ] and [ɶ], where they are more distant from each other, it has been matched with F1 alone.

In some experiments by Chistovich et al. (1979), subjects had to match vowel-like two-formant sounds with one-formant sounds by adjusting the frequency position of a formant. The results showed that the single formant was placed in the middle between the formants of the two-formant stimulus as long as their tonotopical distance Δz remained below a critical value Δz_c of 3.0 to 3.5 Bark. The preferred position of the single formant could be moved continuously between the two formants by variation of their relative levels. If Δz was increased above Δz_c, this was not feasible any more, and the single formant was placed close to either one of the two formants.

Further evidence for an effective bandwidth larger than 1 Bark was obtained by Benedini (1978), investigating the timbre differences between complex tones consisting of four, five, or six harmonics of 100 Hz. In a model simulation, presupposing Gaussian-shaped spreading of each partial, Benedini arrived at σ = 1 Bark, equivalent to a bandwidth of roughly 2 Bark.

All of these observations might be explained by one and the same, and possibly quite peripheral, feature of the auditory system. The present studies were intended to further illuminate this possibility.

In a perceptual experiment, subjects had to rate the similarity between a speech-unlike two-formant noise and a tone. The tone was equal in frequency either to one of the formants or to their critical-band rate mean, which was at 1.6 kHz in each case. The distance between the two formants was varied between 1 and 5 Bark in steps of 1. In a second experiment, the noises and tones were replaced by buzz-excited two-formant and one-formant sounds.

The essential findings were the following:

** Most subjects ignored F2. Their ratings were quite highly correlated with the tonotopical closeness of F1 to the matched tone or single formant.

** The perceptual saliency of F1 was much higher than that of F2, even when F1 was attenuated (in a variation of the first experiment).

** Subjects appeared to use the same criteria whether matching simple tones or one-formant sounds to two-formant stimuli.

** A minority of subjects gave higher ratings to pairs in which the single formant or tone was at the critical-band rate mean of the formant pairs. This tendency was present up to the maximal distance occurring in the set of stimuli (5 Bark).

These results, which are strikingly different from those obtained by Chistovich et al. (1979) with more vowel-like stimuli, do not support the hypothesis that a 3-Bark band of integration is fundamental to timbre perception in general. The majority of subjects in the present experiments apparently based their judgements on the pitch of the lower formant vs. that of the matched tone or single formant.

A re-analysis of the timbre differences measured by Benedini led to the following findings:

** The perceptual weight of single partials, equal in level, was in that experiment roughly proportional to their frequency (with f ≤ 600 Hz), except for the lowest partial present in the sound.

** The perceptual weight of the lowest partial appeared to be substantially increased. The timbre differences created by removal of that partial probably constitute a dimension (full vs. residual tones) different from that descriptive of the other timbre differences present in that experiment.

** The bandwidth of spectral resolution was also proportional to f. The resolution function can be described by a resonance of the type $a = 1 / (1 + \Omega^2/n)^{n/2}$, where a = attenuation relative to the peak and $\Omega = v/d$, with $v = f_2/f_1 - f_1/f_2$ and d = damping coefficient. The best fit to the data was obtained with n = 2 and d = 0.316 (Q = 3.16).

** The perceived magnitude of timbre differences was intricately dependent on context of presentation.
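The resonance-type resolution function named in the findings above can be evaluated directly. The following is a minimal sketch with a hypothetical function name; it assumes that f1 denotes the frequency on which the resolution function is centred and f2 the frequency of the component being attenuated, which is one plausible reading of the symbols.

```python
def resolution_attenuation(f1_hz, f2_hz, n=2, d=0.316):
    """a = 1 / (1 + Omega^2 / n)^(n/2), Omega = v / d, v = f2/f1 - f1/f2.

    The defaults n = 2 and d = 0.316 are the best-fit values quoted in the text.
    """
    v = f2_hz / f1_hz - f1_hz / f2_hz
    omega = v / d
    return 1.0 / (1.0 + omega ** 2 / n) ** (n / 2.0)


# Attenuation of components at increasing distance from a 500 Hz centre.
for f2 in (500.0, 550.0, 600.0, 750.0, 1000.0):
    print(f2, resolution_attenuation(500.0, f2))
```

Because v depends only on the ratio f2/f1, the function has the same shape on a logarithmic frequency axis at every centre frequency, which corresponds to a bandwidth proportional to f.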

The other experimental results suggesting a bandwidth of roughly 3 Bark would be compatible with the assumption that d is constant over the whole range of auditory perception. A d = 0.316 is equivalent to a bandwidth B ≈ 2.0 Bark in the frequency range 0.8 kHz < f < 4.8 kHz. The higher value of Δz_c obtained in formant matching experiments could be accounted for as being due to the addition of the intrinsic B of the formants involved. At lower frequencies, however, the bandwidth of resolution evidenced here is clearly too narrow to solve the problem illustrated in Figure 4.
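The statement that d = 0.316 corresponds to roughly 2 Bark between 0.8 and 4.8 kHz can be checked approximately. The sketch below makes two assumptions that are not stated in the summary: the bandwidth in Hz is taken as d times f (the relative bandwidth 1/Q of a simple resonance), and it is converted to Bark with equation (4). Function names are hypothetical.

```python
def bark(f_hz):
    """Critical-band rate according to equation (4)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def resolution_bandwidth_bark(f_hz, d=0.316):
    """Width in Bark of a band of d * f Hz centred on f (assumed definition)."""
    half_width_hz = 0.5 * d * f_hz
    return bark(f_hz + half_width_hz) - bark(f_hz - half_width_hz)


for f in (200.0, 800.0, 2000.0, 4800.0):
    print(f, round(resolution_bandwidth_bark(f), 2))
```

Under these assumptions the band stays between about 1.7 and 2.1 Bark from 0.8 to 4.8 kHz and shrinks to well under 1 Bark at 200 Hz, in line with the remark that the resolution is too narrow at low frequencies.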

An analysis of the perceived timbre differences between complex tones, low-pass limited at frequencies f_h between 130 and 1720 Hz, also measured by Benedini (1978), showed that these differences were highly correlated with the tonotopical distances between these limiting frequencies. If the spectral resolution obtained from the preceding experiments is taken into account by replacing f_h with 1.4 f_h, a still higher correlation is obtained (rank order correlation coefficient r_s = 0.993). However, if the frequency dependence of the perceptual weight of the partials is also taken into account, this correlation declines. This indicates that the listeners recognized that the stimuli differed only in f_h, and consequently based their judgements exclusively on the distances between the upper flanks of the compared harmonic tones.

Consequently, it is concluded that the judgement of timbre differences

involves the extraction of certain dimensions or features induced by

context. It is suggested that the limited resolution that appears in

experiments involving vowel quality may be due to a limited resolution

intrinsic to the phonetic templates supposedly stored in memory.


1.4 Acoustical consequences of vocal tract growth

(paper 3)

The speaker category differences in the characteristic frequencies of vowels are, obviously, a consequence of physiological facts. The physical properties of the glottal structures determine the range of fundamental frequencies that can be used comfortably, and the dimensions of the vocal tract confine the range of possible formant frequency variation. Most of the differences in vocal tract shape between men and women are due to the physiological changes affecting boys during puberty. We may, then, ask whether these changes demand a modification of the speaker's articulatory habits in order to keep the phonetic quality of vowels the same as before, or whether, inversely, the differences in the acoustic data can be understood as a result of unchanged articulatory habits in spite of the physiological changes. This last hypothesis does not apply to the earlier development in children, whose articulatory habits are not yet rigidly and precisely established.

To a first order of approximation, the age- and sex-dependent differences in vowel formant frequencies can be understood as a consequence of a proportional re-scaling of all three dimensions of the vocal tract. This would leave the formant frequency ratios invariant. All the k-values in Table 1 would then be the same. By virtue of the observed systematic deviations from the mean k-value, we conclude that for the same vowels of different speaker categories, the vocal tracts are not only different in size, but not even proportional in shape.
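The consequence of proportional re-scaling can be illustrated with the textbook idealization of a uniform tube closed at one end, whose resonances are F_n = (2n - 1) c / (4 L). This model, the speed of sound, the tube length, and the scale factor below are all assumptions of this sketch and are not taken from the thesis, which uses measured area functions instead.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value


def uniform_tube_formants(length_cm, count=3):
    """Resonances of an idealized uniform tube closed at one end."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


male = uniform_tube_formants(17.5)           # hypothetical male tract length
female = uniform_tube_formants(17.5 * 0.87)  # hypothetical proportional scaling
k_values = [f / m for f, m in zip(female, male)]

print("male formants:", male)
print("k-values under proportional scaling:", k_values)
# For comparison, Table 1 lists k1, k2, k3 = 1.06, 1.01, 1.23 for [u].
```

In this idealization every formant is scaled by the same factor, so k1 = k2 = k3. The spread of the observed k-values in Table 1 is what motivates the conclusion that the vocal tracts differ in shape as well as in size.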

The physiological changes affecting boys during puberty include an elongation of the vocal folds, an increase in the cross-sectional area of the pharyngeal tube, and an elongation of that tube due to the descent of the larynx. Further, the back of the tongue is pulled down to some extent as a consequence of larynx descent. Ceteris paribus, the cross-sectional areas of the vocal tract will then increase also in the palato-velar region. In terms of percentage change, this increase will be largest for closed (high) vowels. This effect has not been taken into account in previous studies treating this topic.

The acoustical consequences of these changes were estimated by means of model calculations using a computer program simulating an electrical line analog of the vocal tract. The area functions were taken from data by B. Lindblom and J. Sundberg on a male speaker. They were changed slightly in order to yield approximately the formant frequencies in Table 1. The vocal tract shapes were consequently perturbed in a way rather crudely cancelling the changes occurring during puberty. It was further assumed that these changes account for most of the differences between men and women. The results of the calculations are shown in Figure 5 together with the observed female/male k-values. It can be seen that, for all three formants, the calculated k-values reproduce the observed tendencies to some degree. We may, then, conclude that the changes occurring during puberty in the mean case do not require an active modification of articulatory gestures to any large extent. Only with regard to labial opening in rounded vowels, do the present calculations not provide an unequivocal answer. By and large, the normal combinations of vocal fold length, vertical larynx position, pharyngeal cross-sectional areas and overall openness of the vocal tract appear to leave the phonetic quality of speech sounds invariant. If the physiological changes were not in harmony with the acoustic-to-phonetic transformation, we would expect a predisposition for a steady vowel shift, uniform in all languages.

[Figure 5: three panels plotting k3, k2 and k1 (kn = Fn female / Fn male, axis range about 0.9 to 1.3) against the vowels of Table 1.]

Figure 5. Female-male formant frequency ratios in the vowels of Table 1 shown by diamonds. Connected rings show result of present calculations simulating female-male anatomical differences.
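A line-analog calculation of this kind can be sketched as follows. The sketch is not the program used for the thesis and does not use the Lindblom and Sundberg area functions. It assumes lossless cylindrical sections, a closed glottis and an ideal open lip end. The two-section [i]-like shape, the function names and all numbers are hypothetical. Each section contributes a chain matrix, and the resonances are the zeros of the lower-right element of the matrix product.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s, assumed value


def lip_denominator(sections, freq_hz):
    """Lower-right chain-matrix element D(f); its zeros are the resonances.

    sections: (length in m, area in arbitrary units) pairs, glottis to lips.
    Assumes lossless tubes, a closed glottis and zero pressure at the lips.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    # Running product [[a, j*b], [j*c, d]] with real a, b, c, d (identity first).
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for length, area in sections:
        cs, sn = math.cos(k * length), math.sin(k * length)
        # Section matrix [[cs, j*sn/area], [j*sn*area, cs]]; the factor rho*c
        # is dropped because only area ratios affect the zeros of d.
        a, b, c, d = (a * cs - b * sn * area,
                      a * sn / area + b * cs,
                      c * cs + d * sn * area,
                      d * cs - c * sn / area)
    return d


def formants(sections, n_formants=3, f_max=5000.0, step=1.0):
    """Scan D(f) for sign changes and refine each zero by bisection."""
    found = []
    f_lo, d_lo = step, lip_denominator(sections, step)
    while f_lo < f_max and len(found) < n_formants:
        f_hi = f_lo + step
        d_hi = lip_denominator(sections, f_hi)
        if d_hi == 0.0 or d_lo * d_hi < 0.0:
            lo, hi = f_lo, f_hi
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                if lip_denominator(sections, lo) * lip_denominator(sections, mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found


# Hypothetical [i]-like shape: wide pharynx (9 cm), narrow mouth (6 cm).
male = [(0.09, 8.0), (0.06, 1.0)]
# Crude "female" variant: only the pharyngeal section is shortened.
female = [(0.07, 8.0), (0.06, 1.0)]

k_values = [ff / fm for ff, fm in zip(formants(female), formants(male))]
print(k_values)
```

With only the pharyngeal section shortened, the three ratios differ from one another, which is the qualitative pattern of non-uniform k-values discussed above. The numbers themselves carry no empirical weight.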


2 VOCALISM IN EASTERN CENTRAL BAVARIAN

(paper 6)

The perceptual experiments in which the subjects had to identify synthetic vowels were performed with speakers of the Eastern Central Bavarian dialect (Ostmittelbairisch) of Amstetten, Austria. Speakers of that dialect were chosen because of its large number of different vowels and finely graded distinctions of vowel openness (or vowel "height"), which might contribute to a high reliability and "resolution" in the subjects' responses. The vowels are shown in Table 2.

+nasal T e ..... -- ...... u y F [ tt:] [ 01] ee Je CI! a D 0 oe

-nasal e E CI! a D J 0 u Y (J CI! cr JI 01 ul ie ee )e ue ye tEe

palatal + + + + - + + + + - + + + +

rounded - - - - - + + + + + + + + + + + + + + +

openness 1 2 3 4 5 4 3 2 1 1 2 3 4 3 2 1 1 3 3 1 1 3

Table 2. Feature analysis of the non-reduced vowels in the Eastern Central Bavarian dialect of Amstetten, Austria. In stressed syllables, these vowels are short when followed by a fortis consonant and long otherwise. The nasalized vowels are phonemic only when long. Those in brackets occur as allophones only. In diphthongs, the figure for openness refers to the initial segment.

In the Bavarian dialect region, and most typically in Eastern Central Bavarian, spoken in Austria north of the Alps, several socially stratified forms of language are in use. There is a continuum of speech forms between the rural dialect (or urban jargon) and the regional form of standard German. The use of a certain form of speech is not rigidly linked to the social status of the speakers. The speakers also choose their form of speech depending on several other factors such as topic and environment of the discourse. Loans from higher or lower ranked sociolects fulfil the function of expressing respect or disdain towards somebody or something. There is also a particular relationship between speech tempo and sociolect. The dialectal word forms are more reduced than the corresponding forms of standard German. Therefore, the dialect is preferred in lively speech, while a speech form closer to the standard is preferred in a more deliberate mode of speech. This kind of "dynamic diglossia" is quite different from the static type of diglossia observable in Switzerland, where the standard is used only in literary and ceremonial contexts, or in northern Germany, where the dialect has largely been superseded by the standard.

The vowel systems of seven dialectal varieties are presented. As could be expected, the dialect of Vienna, the capital of the region, is closest to the regional dialectal koine, a form of speech ranked intermediately on the social scale. The dialectal peculiarities increase with increasing communicational distance from the capital. In general, for all their speech forms, speakers stay within the frame of a uniform vowel system common to all the sociolects used in a given region (we exclude here the small fraction of natives with an active proficiency in the "Hochlautung"), though as for the precise phonetic quality of vowels, some differences between age groups can be observed.

Among the phonological rules, those concerning the vocalization of /l/, /r/, and /n/ have particularly profound consequences on the phonetic make-up of these dialects. While these consonants cause some feature(s) to be added to preceding vowels, the conditioning consonants themselves are consequently deleted. The four rounded front vowels [y ø œ ɶ] and the diphthongs [uɪ oɪ ɔɪ] arise by /l/-vocalization. None of these vowels occurs in underlying dialectal forms. The diphthongs [iɐ ɛɐ ɔɐ uɐ yɐ œɐ] - there are more of them in some dialectal varieties - are the products of /r/-vocalization. Although some of these have, diachronically, a different origin in some instances and no /r/ in the standard German equivalents, this can no longer be derived by a synchronic analysis of any one of those dialects alone. The vocalization of /n/ is a more widespread process. It produces all the nasalized vowels (see Table 2). It is, however, particular to the present dialects, except that of Vienna, that nasality is also distinctive in vowels preceding nasals. As compared with the oral vowels, the number of distinctive degrees of openness is reduced by one in the nasal ones. This is explained as being due to the nasal antiformant in the region of F1. The disturbance caused by this antiformant reduces the number of degrees of openness that can be distinguished auditorily.

The application of several rules is dependent on speech tempo. These rules concern vowel reduction and the loss of certain segments. Another case is monophthongization, which produces e.g. [æ], [ɒ], and [ɶ] from underlying /aɪ/, /aʊ/, and /al/ via [ɔʏ]. In lively speech,
this monophthongization can be observed in the whole dialectal area. In deliberate speech, it characterizes the dialects of Vienna and its wider surroundings. This is an innovation that has been traced to the speakers of the Viennese jargon at the end of the 19th century. It is probably true that the speakers of any urban jargon are inclined to exaggerate the particularities of the local dialect. One means to this end is the generalization of rules which ordinarily apply to lively speech alone. This kind of process may also explain certain delayed substratum effects observed in historical documents.

The characteristic frequencies (fundamental and four formants) of the oral monophthongs produced by 12 male speakers of the dialect of Amstetten were measured. The result (see Figure 6) showed some degree of overlap in the pairs [e ɛ], [ø œ], and [o ɔ]. This overlap was confirmed by an auditory analysis resulting in dubious categorizations in 26%, 13%, and 6%, respectively, of these pairs. None of the speakers evidenced any difficulty in perceptually discriminating between the two degrees of openness distinguishing these vowels. The general merger of /e/ with /ɛ/ and of /ø/ with /œ/ is otherwise characteristic of the dialect of Vienna only.

[Figure 6: vowel diagram; horizontal axis F2 [kHz] from 2.5 down to 0.5, vertical axis F1 [kHz] from 0.2 to 0.8.]

Figure 6. F1/F2 diagram of vowels by male speakers of the Eastern Central Bavarian dialect of Amstetten. Tonotopically scaled coordinates. Mean values (rings) and standard deviations (bars) of the formant frequencies. Vowels with the same degree of openness connected by dashed lines.
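For readers who wish to reproduce a tonotopically scaled diagram like Figure 6, frequencies in Hz have to be converted to critical-band rate (Bark). The sketch below uses Traunmüller's later analytical approximation (published in 1990), z = 26.81 f / (1960 + f) - 0.53. This choice is an assumption made for illustration, since the conversion actually used for Figure 6 is not stated in this summary, and the function names are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Critical-band rate in Bark (1990 approximation, assumed here)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_to_hz(z_bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


# Convert a few illustrative frequencies to Bark.
for f in (200.0, 500.0, 1000.0, 2500.0):
    print(f, round(hz_to_bark(f), 2))
```

Plotting F1 and F2 means after this conversion gives axes on which equal distances correspond approximately to equal tonotopic distances, which is the sense in which distances between formants are expressed in Bark elsewhere in this thesis.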

For references see the particular papers.
