Speech acoustics
Objectives:
  Describe relative frequency and intensity of phonemes by voice, manner, and formant frequency.
  Describe various phonemic cues.
  Describe speech constraints.
Average speech intensity
  ~65 dB SPL (~45 dB HL)
  30 dB range
  Any vowel has more power than any consonant
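To make the decibel figures above concrete, the sketch below converts the ~65 dB SPL average to sound pressure and expresses the 30 dB range as power and pressure ratios. It assumes the standard 20 micropascal reference for dB SPL; the function names are illustrative and not from the slides.

```python
P_REF_PA = 20e-6  # standard reference pressure for dB SPL (20 micropascals)


def spl_to_pascals(level_db_spl: float) -> float:
    """Convert a level in dB SPL to RMS sound pressure in pascals."""
    return P_REF_PA * 10 ** (level_db_spl / 20)


def db_to_power_ratio(db: float) -> float:
    """Power (intensity) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)


def db_to_pressure_ratio(db: float) -> float:
    """Pressure (amplitude) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)


average_speech_pa = spl_to_pascals(65)          # average conversational speech
range_power_ratio = db_to_power_ratio(30)       # loudest vs. quietest components
range_pressure_ratio = db_to_pressure_ratio(30)

print(f"65 dB SPL is about {average_speech_pa:.4f} Pa")
print(f"A 30 dB range is a power ratio of {range_power_ratio:.0f}")
print(f"A 30 dB range is a pressure ratio of about {range_pressure_ratio:.1f}")
```

In other words, the loudest components of speech carry roughly a thousand times the power of the quietest ones.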
Average speech frequency
  ~50 – 10,000 Hz
  Most energy below 1000 Hz
  Fundamental frequency
    Men: 100 Hz
    Women: 200 Hz
    Children: 300 Hz
    Crying babies: 500 Hz
    Cues for talker identity
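As a small worked example, the sketch below takes the fundamental frequencies listed above and computes the glottal period and the harmonics that fall at or below 1000 Hz, the region where most speech energy lies. The helper names are illustrative, and the harmonic count is plain arithmetic, not a claim about any particular voice.

```python
TYPICAL_F0_HZ = {
    "men": 100,
    "women": 200,
    "children": 300,
    "crying babies": 500,
}


def period_ms(f0_hz: float) -> float:
    """Duration of one cycle of the fundamental, in milliseconds."""
    return 1000 / f0_hz


def harmonics_up_to(f0_hz: int, limit_hz: int) -> list[int]:
    """Integer multiples of the fundamental at or below limit_hz."""
    return [f0_hz * k for k in range(1, limit_hz // f0_hz + 1)]


for talker, f0 in TYPICAL_F0_HZ.items():
    harmonics = harmonics_up_to(f0, 1000)
    print(f"{talker}: period {period_ms(f0):.2f} ms, "
          f"{len(harmonics)} harmonics at or below 1000 Hz")
```

A higher fundamental means wider spacing between harmonics, which is one reason fundamental frequency works as a cue for talker identity.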
Average speech duration
  Vowels: 130 – 360 msec
  Consonants: 20 – 150 msec
  Rate: ~5 syllables/second; ~12 phonemes/second
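The rate figures above imply average durations per unit. The sketch below derives them by simple division; the variable names are illustrative.

```python
SYLLABLES_PER_SECOND = 5
PHONEMES_PER_SECOND = 12

# Average time available per unit at the stated rates.
ms_per_syllable = 1000 / SYLLABLES_PER_SECOND
ms_per_phoneme = 1000 / PHONEMES_PER_SECOND

# Average number of phonemes packed into one syllable.
phonemes_per_syllable = PHONEMES_PER_SECOND / SYLLABLES_PER_SECOND

print(f"{ms_per_syllable:.0f} msec per syllable")
print(f"{ms_per_phoneme:.0f} msec per phoneme")
print(f"{phonemes_per_syllable:.1f} phonemes per syllable")
```

The average of roughly 83 msec per phoneme sits inside the 20 – 150 msec consonant range and below the vowel range, as expected for a mix of short consonants and longer vowels.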
Vowel formants
[Figures, over two slides: vowel chart with quadrants labeled High F1 / Low F2, High F1 / High F2, Low F1 / Low F2, and Low F1 / High F2]
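The quadrant labels of the vowel chart can be sketched as a simple lookup on the first two formants. The cutoffs below (500 Hz for F1, 1500 Hz for F2) are hypothetical round numbers chosen only for illustration, and the example formant values are approximate textbook figures for an adult male voice; neither comes from the slides.

```python
F1_CUTOFF_HZ = 500    # hypothetical boundary between "low" and "high" F1
F2_CUTOFF_HZ = 1500   # hypothetical boundary between "low" and "high" F2


def formant_quadrant(f1_hz: float, f2_hz: float) -> str:
    """Name the chart quadrant for a vowel given its first two formants."""
    f1_label = "High" if f1_hz >= F1_CUTOFF_HZ else "Low"
    f2_label = "High" if f2_hz >= F2_CUTOFF_HZ else "Low"
    return f"{f1_label} F1, {f2_label} F2"


# Approximate illustrative (F1, F2) values in Hz for the four corner vowels.
CORNER_VOWELS = {
    "i (beet)": (270, 2290),
    "ae (bat)": (660, 1720),
    "a (father)": (730, 1090),
    "u (boot)": (300, 870),
}

for vowel, (f1, f2) in CORNER_VOWELS.items():
    print(f"{vowel}: {formant_quadrant(f1, f2)}")
```

Each corner vowel lands in a different quadrant, which is why F1 and F2 together are enough to separate them.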
Consonants: place, manner, voicing
Consonants: energy bands
Consonant  Frequency bands (Hz; up to four)  Intensity
r          600-800, 1000-1500, 1800-2400     46
l          250-400, 2000-3000                43
sh         1500-2000, 4500-5500              41
ng         250-400, 1000-1500, 2000-3000     41
ch         1500-2000, 4000-5000              38
n          250-350, 1000-1500, 2000-3000     37
m          250-350, 1000-1500, 2500-3500     35
th (ð)     250-350, 4500-6000                34
t          2500-3500                         34
h          1500-2000                         32
k          2000-2500                         34
j          200-300, 2000-3000                36
f          4000-5000                         34
g          200-300, 1500-2500                33
s          5000-6000                         32
z          200-300, 4000-5000                31
v          300-400, 3500-4500                31
p          1500-2000                         30
d          300-400, 2500-3000                29
b          300-400, 2000-2500                29
th (θ)     ~6000                             28
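One way to use the table is to ask which consonants carry all of their energy above a given frequency, since those are the sounds that disappear first when high frequencies are inaudible. The sketch below encodes a few rows copied from the table; the dictionary layout and function names are illustrative assumptions.

```python
# A few rows from the table: consonant -> list of (low, high) bands in Hz.
ENERGY_BANDS_HZ = {
    "r": [(600, 800), (1000, 1500), (1800, 2400)],
    "sh": [(1500, 2000), (4500, 5500)],
    "m": [(250, 350), (1000, 1500), (2500, 3500)],
    "t": [(2500, 3500)],
    "f": [(4000, 5000)],
    "s": [(5000, 6000)],
    "z": [(200, 300), (4000, 5000)],
    "b": [(300, 400), (2000, 2500)],
}


def fully_above(cutoff_hz: int) -> list[str]:
    """Consonants whose every energy band starts at or above cutoff_hz."""
    return [
        consonant
        for consonant, bands in ENERGY_BANDS_HZ.items()
        if all(low >= cutoff_hz for low, _high in bands)
    ]


def bands_within(consonant: str, cutoff_hz: int) -> int:
    """Number of a consonant's bands that lie entirely at or below cutoff_hz."""
    return sum(1 for _low, high in ENERGY_BANDS_HZ[consonant] if high <= cutoff_hz)


print(fully_above(2000))   # consonants with no energy band below 2000 Hz
print(fully_above(4000))   # consonants with no energy band below 4000 Hz
print(bands_within("z", 4000), "of", len(ENERGY_BANDS_HZ["z"]), "bands of z")
```

Voiced consonants such as z and b keep a low-frequency band even when the upper band is lost, while voiceless sounds such as f and s have nothing left below the cutoff.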
Phonemic cues - Stops
Closure
  Voiceless stops – silent period
  Voiced stops – low-level energy
Burst
  Wide-band energy, ~40 msec
  Greater intensity for voiceless stops
  Frequency depends on place
Formant transition
  First formant always rising
  Second formant transition depends on place
Phonemic cues - Stops
Voicing is easier to detect than place
For voiced stops
  Voice-onset time is earlier
  Energy present at fundamental frequency
  Burst energy is lower in amplitude
  Vowels are longer in duration before voiced final stops (the same holds for final fricatives: “eyes” v. “ice”)
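The voice-onset-time cue can be sketched as a threshold decision: voicing that starts soon after the burst signals a voiced stop, and voicing that starts late signals a voiceless one. The 30 msec boundary below is a hypothetical illustrative value, not a figure from the slides, and a real listener combines this cue with the others listed above.

```python
VOT_BOUNDARY_MS = 30.0  # hypothetical category boundary, for illustration only


def classify_stop_voicing(vot_ms: float) -> str:
    """Label a stop as voiced or voiceless from its voice-onset time alone."""
    return "voiced" if vot_ms < VOT_BOUNDARY_MS else "voiceless"


# Illustrative measurements: (description, voice-onset time in msec).
examples = [
    ("short lag", 10.0),
    ("long lag", 60.0),
]

for description, vot in examples:
    print(f"{description} ({vot:.0f} msec): {classify_stop_voicing(vot)}")
```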
Phonemic cues - Nasals
Always voiced
Continuant
Nasal resonance
  highest for /m/
  lowest for /n/
Second formant (frequency and transition) gives place information
Phonemic cues - Fricatives
Hissing quality
Voiced fricatives
  Periodic
  Lower frequency
  Lower amplitude
  Greater overall energy (from fundamental)
Sibilants (s, z, sh, zh)
  Higher amplitude than other fricatives
[Figure: panels labeled /f/, /θ/, /s/, /ʃ/]
Suprasegmental cues
Stress
  changes in fundamental frequency, intensity, duration
Intonation
  changes in fundamental frequency, pitch pattern
  expresses attitudes, feeling, meaning (command, request, statement)
Duration
  variations in speech sounds due to context of other sounds
Speech constraints
Syntactic
  S = NP (Aux) VP
  NP = (Det) (AP) N (PP)    “the naughty boy in the daycare…”
  VP = V (NP) (PP) (Adv)    “…took the toy away brusquely”
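The phrase-structure rules above can be read as ordered templates in which parenthesized elements are optional. The sketch below checks whether a sequence of constituent labels fits one of the three rules; the data layout and the `matches` helper are illustrative assumptions, not a full parser.

```python
# Each rule is an ordered list of (label, required) pairs.
RULES = {
    "S": [("NP", True), ("Aux", False), ("VP", True)],
    "NP": [("Det", False), ("AP", False), ("N", True), ("PP", False)],
    "VP": [("V", True), ("NP", False), ("PP", False), ("Adv", False)],
}


def matches(rule_name: str, constituents: list[str]) -> bool:
    """Return True if the labels fit the rule: right order, no required part missing."""
    position = 0
    for label, required in RULES[rule_name]:
        if position < len(constituents) and constituents[position] == label:
            position += 1          # this slot of the rule is filled
        elif required:
            return False           # a required slot is missing or out of order
    return position == len(constituents)  # nothing left over


# "the naughty boy in the daycare" = Det AP N PP
print(matches("NP", ["Det", "AP", "N", "PP"]))
# A bare noun is still a valid noun phrase.
print(matches("NP", ["N"]))
# Wrong order: the adjective phrase cannot precede the determiner.
print(matches("NP", ["AP", "Det", "N"]))
# A sentence needs at least a noun phrase and a verb phrase.
print(matches("S", ["NP", "VP"]))
```

Because only a few label sequences are legal, a listener who misses one word can often still tell what kind of word belongs in the gap.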
Speech constraints
Syntactic
  The question “What should you eat?”
    Answer is a noun phrase
  The question “How should you eat?”
    Answer is an adverbial phrase
Speech constraints
Semantic
  Words in a sentence are related meaningfully
  “Plug the mouse into the computer”
Situational
  Conversation usually refers to the context of the environment
  “I like that oat!”
  Mall vs. Farm
Overlapping cues help protect the signal from noise
Speech predictability helps protect the signal from noise
Noise can come from
  the speaker (poor intelligibility, etc.)
  the environment (distractions, etc.)
  the listener (ESL, etc.)
Effects of hearing loss on speech perception
Objectives:
  Describe the speech characteristics that are lost and those that are preserved for hearing losses of various degrees, types, and configurations.
Auditory Response Area
[Figure, shown on three slides: auditory response area; frequency axis from 20 to 20,000 Hz, level axis from 0 to 160 dB]
Speech audiogram
[Figures, over three slides: speech audiogram; one slide adds thresholds marked with X]
Speech audiogram
[Figures, over two slides: speech audiogram; one slide is annotated “34 dots”]
Correlating SII (Speech Intelligibility Index) to speech
  Adult values (children would be worse)
  Digits: easy
  Words: hard
[Figures, over two slides: one slide shows thresholds marked with X]
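In general terms, the SII is an importance-weighted sum of how audible speech is in each frequency band. The sketch below shows that calculation in its simplest form. The band importance weights are hypothetical round numbers chosen to sum to 1, not values from any standard or from the slides, and the audibility values are made-up inputs.

```python
# Hypothetical band importance weights (must sum to 1.0); illustration only.
BAND_IMPORTANCE = {
    250: 0.10,
    500: 0.15,
    1000: 0.25,
    2000: 0.30,
    4000: 0.20,
}


def speech_intelligibility_index(audibility: dict[int, float]) -> float:
    """Sum of importance x audibility over bands; audibility runs from 0 to 1."""
    return sum(
        BAND_IMPORTANCE[band_hz] * audibility[band_hz]
        for band_hz in BAND_IMPORTANCE
    )


# Made-up listener: low bands fully audible, 1000 Hz half audible, highs inaudible.
sloping_listener = {250: 1.0, 500: 1.0, 1000: 0.5, 2000: 0.0, 4000: 0.0}
full_access = {band_hz: 1.0 for band_hz in BAND_IMPORTANCE}

print(f"Sloping example: SII = {speech_intelligibility_index(sloping_listener):.3f}")
print(f"Full access:     SII = {speech_intelligibility_index(full_access):.3f}")
```

The slides quote SII on a 0 to 100 scale; multiplying the result by 100 gives the same form.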
Deafness
  No access to average speech
Severe
  Access to only the loudest components of speech
  Speech production
    High airflow rate
    Speech initiation at low lung volumes
    Poor velar control (nasality)
    High fundamental frequency
    Slow speech rate
Moderate
  Access to the louder half of speech, or to loud speech
  Speech production
    Substitutions and distortions
    Errors in affricates, fricatives, and blends
Slight to Mild
  Access to all but the quietest components of speech
  Speech production
    Fewer distortions/substitutions
    Good intelligibility
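The idea of "access" to speech can be sketched numerically using the earlier figures of ~45 dB HL average speech and a 30 dB range. The sketch assumes a flat hearing loss and, as a labeled assumption not stated in the slides, that the range extends 12 dB above and 18 dB below the average. It estimates what fraction of the speech range lies above a given threshold.

```python
AVERAGE_SPEECH_DB_HL = 45      # from the slides (~65 dB SPL)
SPEECH_RANGE_DB = 30           # from the slides
PEAKS_ABOVE_AVERAGE_DB = 12    # assumption: how far the loudest parts exceed the average


def audible_fraction(threshold_db_hl: float) -> float:
    """Fraction of the speech level range above a flat threshold (0.0 to 1.0)."""
    loudest_db_hl = AVERAGE_SPEECH_DB_HL + PEAKS_ABOVE_AVERAGE_DB
    fraction = (loudest_db_hl - threshold_db_hl) / SPEECH_RANGE_DB
    return max(0.0, min(1.0, fraction))


# Illustrative flat thresholds in dB HL.
for threshold in (20, 42, 55, 90):
    print(f"Threshold {threshold} dB HL: "
          f"{audible_fraction(threshold):.0%} of the speech range is audible")
```

Under these assumptions a threshold near the average speech level leaves about half of the range audible, and a threshold above the loudest speech components leaves none.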
Rising v. Sloping loss
[Figures, over two slides: audiograms for the two configurations, annotated SII = 64 and SII = 45]