Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Upload: leon-dawson

Post on 16-Dec-2015


Page 1: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Speech Science XII

Speech Perception

(acoustic cues)

Version 2007-8

Page 2: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Topics

Psychoacoustics

Psychophonetics – acoustic cues

Reading: BHR, chap. 6, 184-203

(5th ed.) chaps. 9/10, 201 ff.

P.-M., 3.2.2., first part. pp. 158-171 (2nd ed.)

149-162 (1st ed.)

Page 3: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Psychoacoustics 1

• Psychoacoustics investigates the relationship between basic (acoustic) signal properties and basic auditory impressions:
- How loud something sounds.
- How high- or low-pitched something sounds.
- How long something sounds.
- What the timbre (quality) of a sound is.

• The questions asked are:
- Can the signal be heard? (signal strength)
- Can differences between signals be heard? (for all signal properties)

Page 4: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Psychoacoustics 2

• Important: Psychoacoustics relates the objective, measurable signal to subjective impressions. These are two different “worlds”.

• The simplest “model” of psychoacoustic perception would be a linear relationship:
- A change in a signal parameter always has an equivalent change in the auditory impression.

• This is not the case (which makes psychoacoustics very complex ...).

• Some of the non-linearity has direct implications for phonetic understanding ...

Page 5: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

A non-linear relationship: Loudness

[Figure axes: signal strength inside ear; signal strength outside ear]

Page 6: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

The reason for non-linear loudness

Resonance characteristics of the outer ear

Page 7: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Non-linearity above threshold

Phon = dB at 1 kHz

So, e.g.: 80 phons = 80 dB at 1 kHz, but approx. 100 dB at 50 Hz and 70 dB at 3.5 kHz.
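
The three points quoted for the 80-phon contour can be turned into level differences relative to the 1 kHz reference. This is only a minimal sketch: the dictionary holds the slide's approximate values, and the function name is made up for illustration.

```python
# Points on the 80-phon equal-loudness contour as quoted on the slide:
# frequency in Hz -> level in dB needed to sound as loud as 80 dB at 1 kHz.
CONTOUR_80_PHON = {50: 100.0, 1000: 80.0, 3500: 70.0}


def level_offset_re_1khz(contour):
    """Return, per frequency, how many dB more (+) or less (-) than the
    1 kHz reference level is needed for the same loudness."""
    reference = contour[1000]
    return {freq: level - reference for freq, level in contour.items()}


offsets = level_offset_re_1khz(CONTOUR_80_PHON)
for freq in sorted(offsets):
    print(f"{freq:>5} Hz: {offsets[freq]:+.0f} dB relative to 1 kHz")
```

A positive offset means the ear is less sensitive at that frequency than at 1 kHz, a negative offset that it is more sensitive.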

Page 8: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Also, sounds mask one another

If noise is present, a tone has to be stronger to be heard (it has a higher audibility threshold).

The closer the tone is in frequency to the centre frequency of the noise, the stronger it has to be to be heard!

[Figure axes: intensity of pure tone (masked) stimulus (dB); intensity of masking noise]

Page 9: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

“Critical Bands” (Barks & ERBs)

Wide-band noise with a gap still masks a tone in the middle of the gap

... until the gap reaches a critical width.

Then the signal is heard at the same threshold as if there were no noise.

The noise no longer interferes with the part of the hearing mechanism dealing with the tone.

These “critical bands” are narrow at low frequencies and broader at higher frequencies.

[Figure labels: strong masking; no masking]
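
How the band width grows with frequency can be sketched with two widely cited approximation formulas. Neither appears on the slides, so both are assumptions here: the Zwicker and Terhardt expression for critical bandwidth (the Bark scale) and the Glasberg and Moore expression for the equivalent rectangular bandwidth (ERB).

```python
def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz at centre frequency f_hz
    (Zwicker & Terhardt approximation; an assumption, not from the slides)."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz
    (Glasberg & Moore approximation; an assumption, not from the slides)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


for f in (100, 500, 1000, 4000, 8000):
    print(f"{f:>5} Hz: critical band ~{critical_bandwidth_hz(f):6.0f} Hz, "
          f"ERB ~{erb_hz(f):6.0f} Hz")
```

Both estimates give narrow bands at low frequencies and much broader bands at high frequencies, although they differ in absolute width.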

Page 10: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Non-linearity of loudness with duration

• Above approx. 300 ms (exact duration not certain) the perceived loudness of a sound is determined by signal strength (and frequency) independent of its duration.

• Below this duration, a shorter sound is heard as less loud than a longer sound of equal intensity. I.e., it is as if the energy is integrated over time, so that a shorter sound has less energy than a longer one.

• Phonetic importance? Short (unstressed) syllables are perceptually less prominent than longer (stressed) syllables.
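
The energy-integration idea can be illustrated with a deliberately simple toy model, assuming that energy accumulates linearly up to a fixed window. The 300 ms window is the slide's approximate figure; the linear model and the function name are assumptions made for illustration only.

```python
import math

INTEGRATION_WINDOW_MS = 300.0  # approximate value from the slide


def effective_level_db(level_db, duration_ms, window_ms=INTEGRATION_WINDOW_MS):
    """Level of a sound after integrating its energy over its duration.

    Toy model (assumption): below the window, energy is proportional to
    duration, so halving the duration costs about 3 dB. At or above the
    window, duration no longer matters.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    if duration_ms >= window_ms:
        return level_db
    return level_db + 10.0 * math.log10(duration_ms / window_ms)


for duration in (30, 75, 150, 300, 600):
    print(f"{duration:>4} ms at 70 dB -> {effective_level_db(70.0, duration):.1f} dB effective")
```

Under this model a short unstressed syllable ends up with a lower effective level than a longer stressed syllable of the same intensity.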

Page 11: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

“Psychophonetics”

• Used here as a term to parallel “psychoacoustics”. In our definition, psychophonetics is the study of the relationship between the acoustic speech signal and functional aspects of speech – e.g., speech sounds, (stressed/unstressed) syllables, tonal accents, junctural phenomena etc.

• The experimental procedure typically requires changing the analytic properties of the acoustic speech signal in a controlled manner and recording the perceptual effect.

• The properties changed are those of acoustic analysis: duration, intensity, fundamental frequency and spectral structure.

Page 12: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

“Acoustic Cues”

• This term was coined in the 1950s, when synthesis and manipulation of the acoustic speech signal were starting. (Origin: Haskins Laboratories, then in New York, USA)

• The “cues” are those acoustic properties that can be shown to affect the perception of a speech sound (so we have “acoustic cues” for vowels and consonants, and within these categories for, e.g., voicing, manner and place of articulation in consonants; degree of opening, place, rounding etc. in vowels).

Page 13: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Acoustic cues – vowels 1

• Cues: Formants 1 and 2 (to a first approximation)

... and the evidence from formant synthesis:

rounded vowels: lower F2

front vowels: higher F2

open vowels: higher F1
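
The F1/F2 relationships can be sketched as a nearest-neighbour lookup in the formant plane. The reference values below are rough adult-male averages chosen for illustration (an assumption, not taken from the slides). Plain distance in Hz is a crude stand-in for perceptual distance, and the back vowel "u" is also rounded, so its low F2 reflects both properties.

```python
import math

# Illustrative (F1, F2) values in Hz; approximate adult-male averages (assumption).
# "a" stands for an open back vowel of the [ɑ] type.
REFERENCE_FORMANTS = {
    "i": (270.0, 2290.0),  # close front unrounded: low F1, high F2
    "a": (730.0, 1090.0),  # open: high F1
    "u": (300.0, 870.0),   # close back rounded: low F1, low F2
}


def classify_vowel(f1_hz, f2_hz, reference=REFERENCE_FORMANTS):
    """Return the reference vowel closest to (f1_hz, f2_hz) in the F1/F2 plane."""
    return min(
        reference,
        key=lambda v: math.hypot(f1_hz - reference[v][0], f2_hz - reference[v][1]),
    )


print(classify_vowel(280.0, 2200.0))
print(classify_vowel(700.0, 1100.0))
print(classify_vowel(320.0, 900.0))
```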

Page 14: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Acoustic cues - vowels 2

• While monophthongs have a steady state formant structure, diphthongs – e.g. [] – and (vowel glide) approximants – e.g. [] – have changing formants as a “cue” to their identity.

• [] have a more or less fixed formant pattern, determined by the identity of the two vocalic elements which define them.

• [] have a defined starting point, but the degree of formant change is determined by the following vowel. The starting point has a (slightly more damped) formant structure similar to the related vowel: [] []; [] []; [] [] (see acoustics slides)

Page 15: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Acoustic cues – plosives

• Plosives have a temporally complex set of acoustic cues resulting from (i) the closing movement, (ii) the closure phase and (iii) the release of the closure.

• The closure is a period with no energy (voiceless stops) or a weak low frequency periodic signal (voicing in the closure). This introduces a perceptible interruption.

• The release burst is the result of turbulence due to the escaping air from the increased intra-oral pressure built up during the closure. This may be relatively weak (in voiced stops) or strong (in voiceless stops). The different spectral properties of the burst noise signal the different places of articulation.

Page 16: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Release bursts and vowel quality

Page 17: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Vowel formant transitions as consonant cues

• Formant transitions (changing formant values in the vowel preceding and following the stop consonant) reflect the articulator movement towards and away from the closure. The F2 transition is a cue to the consonantal place of articulation; F1 just signals the opening and closing movement.

• The place of the stop determines the F2 formant value from which or towards which the transition moves (called the locus). But the actual shape of the transition is determined by the vowel (as it is with vowel glides).

Page 18: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Locus frequencies – e.g. [d]

F1 rise = opening movement

Page 19: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

What sort of transitions for which place?

• The previous slide showed that the locus for [d] (and – logically – for [t, n, l, s, z]) is fairly constant. The value (for the average adult male vocal tract) is about 1800 Hz.

• For labial consonants, the vowel can be formed independent of the consonant closure (the tongue is free to move). Both F2 and F1 therefore just reflect the opening and closing of the jaw and lips. The “locus” is therefore always low.

• For velar consonants, the consonant closure is very dependent on the vowel (both use the tongue dorsum). The locus is higher than for alveolars both for front and back vowels, but for back vowels it is lower than for front vowels. F2 and F3 transitions often converge with velars.
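
The locus idea for [d] can be sketched as an F2 track that starts at the locus and moves to the F2 of the following vowel. The 1800 Hz locus is the slide's figure. The vowel F2 targets, the straight-line interpolation and the function name are illustrative assumptions, and real transitions are not linear.

```python
D_LOCUS_HZ = 1800.0  # [d] locus quoted on the slide (average adult male vocal tract)


def f2_transition(locus_hz, vowel_f2_hz, steps=5):
    """Return `steps` F2 values moving linearly from the locus to the vowel target.

    Linear interpolation is a simplification (assumption); it only shows that
    the direction and extent of the transition depend on the vowel.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    span = vowel_f2_hz - locus_hz
    return [locus_hz + span * i / (steps - 1) for i in range(steps)]


# Hypothetical vowel targets: a front vowel (high F2) and a back vowel (low F2).
print("[d] + front vowel:", f2_transition(D_LOCUS_HZ, 2300.0))
print("[d] + back vowel: ", f2_transition(D_LOCUS_HZ, 900.0))
```

With the same locus, F2 rises into the front vowel and falls into the back vowel, which is why the transition shape alone does not identify the consonant.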

Page 20: Speech Science XII Speech Perception (acoustic cues) Version 2007-8
Page 21: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

The importance of timing as a cue to the “voicing” distinction

The temporal differences shown here signal the difference between “weak” and “strong” plosives, whether there is closure voicing present or not. It is often claimed that the distinction “fortis-lenis” is better than “voiced-voiceless”.

Page 22: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Acoustic cues - fricatives

• Fricative identity is determined by the spectral distribution of the energy (see also acoustics slides).

[Figure: fricative spectra; symbol labels not recoverable]
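
One common way to summarise the spectral distribution of energy is the spectral centre of gravity, the energy-weighted mean frequency. The slides do not name this measure, so it is an assumption here, and the two band-energy spectra below are invented for illustration.

```python
def spectral_centroid_hz(bands):
    """Energy-weighted mean frequency of (centre_frequency_hz, energy) pairs."""
    total_energy = sum(energy for _, energy in bands)
    if total_energy <= 0:
        raise ValueError("total energy must be positive")
    return sum(freq * energy for freq, energy in bands) / total_energy


# Hypothetical band energies (arbitrary units), not measured data.
HIGH_FREQUENCY_NOISE = [(2000.0, 1.0), (4000.0, 2.0), (6000.0, 5.0), (8000.0, 4.0)]
LOWER_FREQUENCY_NOISE = [(2000.0, 3.0), (4000.0, 5.0), (6000.0, 2.0), (8000.0, 1.0)]

print(spectral_centroid_hz(HIGH_FREQUENCY_NOISE))
print(spectral_centroid_hz(LOWER_FREQUENCY_NOISE))
```

A fricative with its energy concentrated higher in the spectrum gets a higher centre of gravity, so the single number separates the two invented spectra.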

Page 23: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Summary of cues - Manner

Page 24: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Summary of cues - Place

Page 25: Speech Science XII Speech Perception (acoustic cues) Version 2007-8

Summary of cues - Fortis-lenis

voice bar