Using Fo and vocal-tract length to attend to one of two talkers.
Chris Darwin
University of Sussex
With thanks to: Rob Hukin, John Culling, John Bird, MRC & EPSRC
1. Review past work on the way that the human auditory system uses differences in Fo to separate two voices;
2. Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages.
Something old, something new, something borrowed, background blue.
Difference in Fo leads to:
1. binaural separation of sound sources
2. increase in intelligibility
3. ability to track a sound source over time.
Three types of experiment:
Broadbent & Ladefoged (1957)
• PAT-generated sentence “What did you say before that?”
F1 F2
• when Fo was the same, 125 Hz (either natural or monotone), listeners heard:
  • one voice only: 16/18
  • in one place: 18/18
• when Fo was different, 125/135 Hz (monotone), listeners heard:
  • two voices: 15/18
  • in two places: 12/18
B & L Conclusion
Common Fo integrates
– broadband frequency regions of a single voice
– coming simultaneously to different ears
into a single voice heard in one position.
Is a common Fo sufficient for fusion?
• Broadbent & Ladefoged's stimuli used formant
resonators with broad low-frequency skirts.
• Sharply-filtered sounds sometimes give impression
of two sound sources even with common Fo.
Dichotic: same Fo
[Diagram: original → PSOLA (Fo -> 0%) on each of two paths → LP filter / HP filter → left ear / right ear]
apologies to Hideki
Dichotic: different Fo
[Diagram: original → PSOLA (Fo -> -4% on one path, +4% on the other) → LP filter / HP filter → left ear / right ear]
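For orientation, the ±4% PSOLA shifts can be expressed in semitones. This is a minimal sketch that assumes only the standard equal-tempered relation (12 semitones per octave); the function name is illustrative and not taken from the talk.

```python
import math

def ratio_to_semitones(ratio: float) -> float:
    """Convert a frequency ratio to equal-tempered semitones."""
    return 12.0 * math.log2(ratio)

# Fo is shifted by -4% on one path and by +4% on the other.
low_ratio, high_ratio = 0.96, 1.04
separation = ratio_to_semitones(high_ratio / low_ratio)
print(f"Fo separation between the two bands: {separation:.2f} semitones")
```

On this reading the two bands differ in Fo by a little under one and a half semitones.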
Complementary LP/HP filters
[Figure: filter responses on a 0 to 1 vertical scale against a 0 to 2000 horizontal scale; curves: 600 LP, 600 HP, 1400 LP, 1400 HP; variable bandwidth]
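One way to picture a complementary filter pair with a variable transition width is sketched here. The linear cross-fade shape is an assumption made for illustration; the slides do not state the filter shape actually used, and the function name is hypothetical.

```python
def complementary_gains(freq_hz, cutoff_hz, transition_hz):
    """Return (lp_gain, hp_gain) for one frequency; the two always sum to 1.

    Assumed shape (not from the talk): a linear cross-fade of width
    transition_hz centred on cutoff_hz.
    """
    if transition_hz <= 0:
        # Zero-width transition: an ideal brick-wall split at the cut-off.
        lp = 1.0 if freq_hz < cutoff_hz else 0.0
    else:
        lower_edge = cutoff_hz - transition_hz / 2.0
        lp = 1.0 - (freq_hz - lower_edge) / transition_hz
        lp = min(1.0, max(0.0, lp))  # clip to the pass and stop bands
    return lp, 1.0 - lp

# Gains at a few frequencies for a 600 Hz cut-off and a 400 Hz transition.
for f in (300, 500, 600, 700, 900):
    print(f, complementary_gains(f, 600, 400))
```

Widening transition_hz increases the frequency range over which both ears receive energy, which is the variable manipulated in the fusion experiments.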
Dichotic Results (female voice)
[Figure: % fused (0 to 100) against filter transition width (0 to 2000); conditions: Same Fo, HP High Fo, HP Low Fo; filter cross-over at 1 kHz]
Higher filter cut-offs need wider bandwidths
[Figure: % fused (0 to 100) against filter transition width (0 to 2000), same Fo; one curve per filter cut-off frequency: 600, 800, 1200, 1400, 2000]
Low-frequency overlap
[Figure: vertical scale -40 to 0 against frequency (0 to 2000 Hz); curves labelled 600, 800, 1200, 1400, 2000]
cf natural ILDs higher for low frequency sounds
Summary
            Fusion at same Fo?              Fusion at different Fo (±4%)?
Dichotic    Low-frequency overlap needed    No
But what about Fo’s ability to separate different voices? (original B & L question)
Difference in Fo leads to:
1. binaural separation of sound sources
2. increase in intelligibility
3. ability to track a sound source over time.
Three types of experiment:
Fo improves identification
[Figure: % correct (0 to 100) against Fo difference in semitones (0 to 12); double vowels (Assmann & Summerfield, 200 ms) and sentences (Brokx & Nooteboom)]
• for double vowels the improvement is over by 1 semitone
• sentences go on improving at larger Fo differences
Mechanisms of Fo improvement
• A. Global: across-formant grouping by Fo (as originally conceived by B & L)
• B. Local: better definition of individual formants, especially F1, where harmonics are resolved
At small ∆Fos, B is more important than A for double vowels (Culling & Darwin, JASA 1993).
Is this also true for sentences?
Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982)
[Figure: Normal condition, vertical scale 0 to 100, against Fo difference (semitones, 0 to 10); 40 subjects, 40 sentence pairs; perfect fourth (~4:3) marked]
Target sentence Fo = 140 Hz
Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones
Two sentences (same talker)
• only voiced consonants (with very few stops)
Task: write down target sentence
Replicates & extends Brokx & Nooteboom
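The masker Fos implied by these semitone offsets can be worked out with the usual equal-tempered conversion. This is a small sketch; the function and constant names are illustrative and not from the talk.

```python
def shift_semitones(f0_hz: float, semitones: float) -> float:
    """Shift a frequency by a signed number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)

TARGET_F0 = 140.0  # Hz, target sentence Fo as given on the slide

for st in (0, 1, 2, 5, 10):
    up = shift_semitones(TARGET_F0, st)
    down = shift_semitones(TARGET_F0, -st)
    print(f"{st:2d} semitones: masker Fo = {down:6.1f} Hz or {up:6.1f} Hz")

# Five semitones is close to, but not exactly, the 4:3 perfect fourth.
print(f"5-semitone ratio = {shift_semitones(1.0, 5):.4f}, 4/3 = {4 / 3:.4f}")
```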
Chimeric sentences (Bird & Darwin, Grantham Meeting 1998)
Paired sentences' Fos: 100-100, 100-106, 100-112, 100-133, 100-178
Conditions (shown for the 100-112 pair; one row per sentence):
                   Low Pass             High Pass
                   (Fo below 800 Hz)    (Fo above 800 Hz)
Normal             100                  100
                   112                  112
Same Fo in High    100                  100
                   112                  100
Same Fo in Low     100                  100
                   100                  112
Swapped            100                  112
(gives wrong       112                  100
 grouping)
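The four conditions can be written out as Fo assignments per sentence and per band. This sketch reads the slide's table as two stacked sentences per condition, each with a low-pass and a high-pass Fo; the function name and the dictionary layout are illustrative assumptions.

```python
def chimeric_conditions(fo_a: float, fo_b: float) -> dict:
    """Map each condition to ((LP Fo, HP Fo) of sentence 1, (LP Fo, HP Fo) of sentence 2)."""
    return {
        "Normal":          ((fo_a, fo_a), (fo_b, fo_b)),
        "Same Fo in High": ((fo_a, fo_a), (fo_b, fo_a)),
        "Same Fo in Low":  ((fo_a, fo_a), (fo_a, fo_b)),
        # Each sentence carries one Fo below 800 Hz and the other Fo above it,
        # so grouping bands by common Fo pairs the wrong halves.
        "Swapped":         ((fo_a, fo_b), (fo_b, fo_a)),
    }

conditions = chimeric_conditions(100, 112)
for name, (sentence_1, sentence_2) in conditions.items():
    print(f"{name:16s} sentence 1 (LP, HP) = {sentence_1}, sentence 2 (LP, HP) = {sentence_2}")
```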
Segregating sentence pairs by Fo
[Figure: conditions Normal, Same Fo in High Pass, Same Fo in Low Pass; vertical scale 0 to 100 against Fo difference (semitones, 0 to 10); 40 subjects, 40 sentence pairs]
• all the action is in the low frequency region (<800 Hz)
• no strong evidence of across-formant grouping
Adding Fo-swapped
[Figure: conditions Normal, Fo-swapped, Same Fo in High Pass, Same Fo in Low Pass; vertical scale 0 to 100 against Fo difference (semitones, 0 to 10); 40 subjects, 40 sentence pairs]
• inappropriate pairing of Fo only detrimental above 4 semitones
Summary of Fo-differences
• Across-formant grouping only significant for large Fo differences (> ~4 semitones)
• Most of the improvement with small Fo differences happens in the F1 frequency region.
Another caveat for autocorrelation
• Improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin)
• Autocorrelation would pull out completely wrong envelopes.
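The reason a grouping-by-Fo stage is misled by such stimuli can be seen by listing which vowel supplies each harmonic. In this sketch the choice of odd versus even harmonic numbers is an assumption for illustration, and the function names are hypothetical; it assumes the two Fos differ.

```python
def alternating_harmonic_vowels(fo_1, fo_2, n_harmonics):
    """Build two vowels as lists of (harmonic number, Fo) with alternating Fos."""
    vowel_a, vowel_b = [], []
    for k in range(1, n_harmonics + 1):
        if k % 2 == 1:  # odd harmonics: A on fo_1, B on fo_2
            vowel_a.append((k, fo_1))
            vowel_b.append((k, fo_2))
        else:           # even harmonics: A on fo_2, B on fo_1
            vowel_a.append((k, fo_2))
            vowel_b.append((k, fo_1))
    return vowel_a, vowel_b

def owners_of_fo(vowel_a, vowel_b, fo):
    """For each harmonic of the given Fo, report which vowel supplied it."""
    owners = {}
    for name, components in (("A", vowel_a), ("B", vowel_b)):
        for k, component_fo in components:
            if component_fo == fo:
                owners[k] = name
    return [owners[k] for k in sorted(owners)]

a, b = alternating_harmonic_vowels(100.0, 106.0, 6)
print(owners_of_fo(a, b, 100.0))
```

Collecting everything on one Fo therefore yields a harmonic series whose amplitudes alternate between the two vowels, so the recovered envelope belongs to neither.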
No simultaneous effect of FM
• Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different frequency modulations of Fo.
• Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).
Summary of Fo effects in separating competing voices
• Intelligibility increased by small Fo differences only in the F1 region (and harmonic alternation tolerated)...
• … but not by Fo differences in only the higher-frequency region.
• Across-formant consistency of Fo only important at larger Fo differences
• FM produces no additional separation
Difference in Fo leads to:
1. binaural separation of sound sources
2. increase in intelligibility
3. ability to track a sound source over time.
Three types of experiment:
CRM task (tracking a sound source) (Bolia et al., 2000)
• 2 simultaneous sentences each of form
Ready (Call Sign) go to (Color) (Number) now.
Same talker (TT); Same Sex (TS); Different sex (TD)
• Target denoted by Call-Sign "Baron"
• 8 Talkers in corpus, 2048 tokens
Listeners responded by selecting the appropriate colored digit with the computer mouse
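The corpus size follows from the sentence frame. In this sketch the four call signs not shown on the slides, the four colors and the digits 1 to 8 are taken from the published description of the corpus as recalled here and should be checked against Bolia et al. (2000); the variable names are illustrative.

```python
from itertools import product

# Arrow, Baron, Eagle and Tiger appear on the slides; the rest are assumed.
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]  # assumed color set
NUMBERS = range(1, 9)                       # assumed digits 1 to 8
N_TALKERS = 8                               # from the slide

def crm_sentence(call_sign: str, color: str, number: int) -> str:
    """Fill the CRM sentence frame given on the slide."""
    return f"Ready {call_sign} go to {color} {number} now."

tokens = [
    (talker, crm_sentence(call_sign, color, number))
    for talker, call_sign, color, number in product(
        range(N_TALKERS), CALL_SIGNS, COLORS, NUMBERS
    )
]
print(len(tokens))
print(tokens[0][1])
```

With these sets, 8 talkers × 8 call signs × 4 colors × 8 numbers gives the 2048 tokens quoted on the slide.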
CRM task (Bolia et al., 2000)
Fo contours for 2 individuals
[Figure: Fo contours (vertical scale 0 to 200) against time (0 to 2 s) for two individuals; panels: Call Sign Arrow, Call Sign Tiger, Call Sign Eagle, Call Sign Baron]
Individuals with the most constant Fo contours show the most improvement with ∆Fo
[Figure: Av change with Fo (-0.10 to 0.30) against %Fo difference (0 to 20); linear fit y = -0.0262x + 0.4163, R² = 0.9315]
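The fitted line quoted on the slide can be evaluated directly. In this sketch x is whatever quantity the horizontal axis plots (labelled "%Fo difference") and y is the value labelled "Av change with Fo"; only the slope and intercept come from the slide, and the names are illustrative.

```python
SLOPE = -0.0262     # from the slide
INTERCEPT = 0.4163  # from the slide

def fitted_change(x: float) -> float:
    """Value of the slide's regression line at horizontal-axis value x."""
    return SLOPE * x + INTERCEPT

# Horizontal-axis value at which the fitted line crosses zero.
zero_crossing = -INTERCEPT / SLOPE

for x in (0, 5, 10, 15, 20):
    print(f"x = {x:2d}: fitted y = {fitted_change(x):+.3f}")
print(f"fitted line crosses zero near x = {zero_crossing:.1f}")
```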
Superadditivity of ∆Fo and ∆VT
[Figure: actual d' against predicted d' (both 0.00 to 1.50); male and female]
∆Fo & ∆VT superadditive
… and still less than real different-sex talkers
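What "superadditive" means for d' can be illustrated as follows. The slides do not say how predicted d' was computed, so the two combination rules below (a linear sum and a Euclidean sum) are textbook assumptions, and the numbers are hypothetical values chosen for illustration, not data from the talk.

```python
import math

def predict_linear(d_fo: float, d_vt: float) -> float:
    """Assumed rule 1: the single-cue d' values simply add."""
    return d_fo + d_vt

def predict_euclidean(d_fo: float, d_vt: float) -> float:
    """Assumed rule 2: independent cues combine as a root sum of squares."""
    return math.hypot(d_fo, d_vt)

def is_superadditive(actual: float, predicted: float) -> bool:
    """True when the combined-cue d' exceeds the prediction."""
    return actual > predicted

# Hypothetical d' values (NOT measurements from the talk).
d_fo, d_vt, d_both = 0.3, 0.4, 0.9
for rule in (predict_linear, predict_euclidean):
    predicted = rule(d_fo, d_vt)
    print(rule.__name__, round(predicted, 3), is_superadditive(d_both, predicted))
```

Points lying above the diagonal in a plot of actual against predicted d' correspond to is_superadditive returning True.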