Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin University of Sussex With thanks to : Rob Hukin John Culling John Bird MRC & EPSRC

Post on 20-Dec-2015

Using Fo and vocal-tract length to attend to one of two talkers.

Chris Darwin

University of Sussex

With thanks to:
• Rob Hukin
• John Culling
• John Bird
• MRC & EPSRC

1. Review past work on the way that the human auditory system uses differences in Fo to separate two voices;

2. Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages.

Something old, something new, something borrowed, background blue.

Difference in Fo leads to:

1. binaural separation of sound sources

2. increase in intelligibility

3. ability to track a sound source over time.

Three types of experiment:

Broadbent & Ladefoged (1957)

• PAT-generated sentence “What did you say before that?”

[Figure labels: F1, F2]

• When Fo was the same (125 Hz, either natural or monotone), listeners heard:
  – one voice only: 16/18
  – in one place: 18/18

• When Fo was different (125 vs 135 Hz, monotone), listeners heard:
  – two voices: 15/18
  – in two places: 12/18

B & L Conclusion

Common Fo integrates
– broadband frequency regions of a single voice
– coming simultaneously to different ears
into a single voice heard in one position.

Is a common Fo sufficient for fusion?

• Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts.

• Sharply-filtered sounds sometimes give the impression of two sound sources even with a common Fo.

Formant T(f) & abs difference

[Figure: x-axis frequency, 0 to 2000; y-axis dB, -30 to 30]

Dichotic: same Fo

[Diagram: original → PSOLA (Fo -> 0%) in both channels → LP filter / HP filter → left ear / right ear]

apologies to Hideki

Dichotic: different Fo

[Diagram: original → PSOLA (Fo -> -4% / Fo -> +4%) → LP filter / HP filter → left ear / right ear]

Complementary LP/HP filters

[Figure: filter gain (0 to 1) against frequency (0 to 2000), variable bandwidth. Curves: 600 LP, 600 HP, 1400 LP, 1400 HP]
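A complementary low-pass/high-pass pair with a variable transition width can be pictured with a small sketch. The linear cross-fade shape, the function name `complementary_gains`, and its parameter names are assumptions for illustration only; the slides give just the cut-off frequencies (600 and 1400 in the figure) and say that the bandwidth was varied.

```python
def complementary_gains(freq_hz, cutoff_hz, transition_hz):
    """Return (lp_gain, hp_gain) at freq_hz; the two gains always sum to 1.

    Assumption: a linear amplitude cross-fade centred on cutoff_hz and
    spanning transition_hz. The filter shapes used in the talk are not
    specified in the slides.
    """
    if transition_hz <= 0:
        # Zero width: an ideal brick-wall split at the cut-off.
        lp = 1.0 if freq_hz < cutoff_hz else 0.0
    else:
        lp = 0.5 - (freq_hz - cutoff_hz) / transition_hz
        lp = min(1.0, max(0.0, lp))  # clamp to the 0..1 gain range
    return lp, 1.0 - lp


# Gains of a 600 Hz cross-over with a 400 Hz wide transition region.
for f in (200, 400, 600, 800, 1000):
    lp, hp = complementary_gains(f, cutoff_hz=600, transition_hz=400)
    print(f"{f:5d} Hz  LP={lp:.2f}  HP={hp:.2f}")
```

Widening `transition_hz` increases the frequency range over which both ears receive energy, which is the quantity varied on the "filter transition width" axis of the results.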

Dichotic Results (female voice)

[Figure: % fused (0 to 100) against filter transition width (0 to 2000), filter cross-over at 1 kHz. Curves: Same Fo, HP High Fo, HP Low Fo]

Higher filter cut-offs need wider bandwidths

[Figure: % fused (0 to 100) against filter transition width (0 to 2000), same Fo. One curve per filter cut-off frequency: 600, 800, 1200, 1400, 2000]

Low-frequency overlap

[Figure: y-axis -40 to 0 against frequency (Hz), 0 to 2000. Curves labelled 600, 800, 1200, 1400, 2000]

cf natural ILDs higher for low frequency sounds

Summary

            Fusion at same Fo?              Fusion at different Fo (±4%)?
Dichotic    Low-frequency overlap needed    No

But what about Fo’s ability to separate different voices? (original B & L question)

Difference in Fo leads to:

1. binaural separation of sound sources

2. increase in intelligibility

3. ability to track a sound source over time.

Three types of experiment:

Fo improves identification

[Figure: % correct (0 to 100) against Fo difference in semitones (0 to 12). Curves: double vowels (Assmann & Summerfield, 200 ms); sentences (Brokx & Nooteboom)]

• Double vowels: improvement is over by 1 semitone

• Sentences: improvement continues over larger Fo differences

Mechanisms of Fo improvement

• A. Global: Across formant grouping by Fo (as originally conceived by B & L)

• B. Local: Better definition of individual formants - especially F1 where harmonics resolved

At small ∆Fos B more important than A for double vowels (Culling & Darwin, JASA 1993).

Also true for sentences?

∆Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982)

[Figure: y-axis 0 to 100 against Fo difference (semitones), 0 to 10. Curve: Normal. 40 subjects, 40 sentence pairs. Annotation: perfect fourth ~4:3]

• Target sentence Fo = 140 Hz

• Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones

• Two sentences (same talker), with only voiced consonants (and very few stops)

• Task: write down target sentence

Replicates & extends Brokx & Nooteboom
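The masker Fo values implied by the semitone steps can be worked out as below. This is a minimal sketch assuming equal-tempered semitones (a frequency ratio of 2^(1/12) per semitone); the function and constant names are illustrative and not from the talk.

```python
import math

TARGET_FO_HZ = 140.0
SEMITONE_STEPS = [0, 1, 2, 5, 10]  # Fo differences listed on the slide


def shift_semitones(fo_hz, semitones):
    """Shift fo_hz by a number of equal-tempered semitones (may be negative)."""
    return fo_hz * 2.0 ** (semitones / 12.0)


def ratio_to_semitones(ratio):
    """Convert a frequency ratio to a distance in semitones."""
    return 12.0 * math.log2(ratio)


for n in SEMITONE_STEPS:
    down = shift_semitones(TARGET_FO_HZ, -n)
    up = shift_semitones(TARGET_FO_HZ, n)
    print(f"{n:>2} semitones: masker Fo = {down:6.1f} Hz or {up:6.1f} Hz")

# The perfect fourth (about 4:3) marked on the figure is close to 5 semitones.
print(f"4:3 ratio = {ratio_to_semitones(4 / 3):.2f} semitones")
```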

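Chimeric sentences (Bird & Darwin, Grantham Meeting 1998)

Paired sentences' Fos (Hz): 100-100, 100-106, 100-112, 100-133, 100-178

Fo assignment in each condition, shown for the 100-112 pair:

Condition                        Sentence   Fo below 800 Hz (Low Pass)   Fo above 800 Hz (High Pass)
Normal                           first      100                          100
                                 second     112                          112
Same Fo in High                  first      100                          100
                                 second     112                          100
Same Fo in Low                   first      100                          100
                                 second     100                          112
Swapped (gives wrong grouping)   first      100                          112
                                 second     112                          100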
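The four chimeric conditions listed above can be written out as a small lookup. This is a sketch of the Fo assignment only, not of the signal processing; the function name and the tuple layout are assumptions made for illustration.

```python
def chimeric_conditions(fo_a_hz, fo_b_hz):
    """Return Fo assignments per condition for a pair of sentences.

    Each value is ((first_low, first_high), (second_low, second_high)):
    the Fo used below and above 800 Hz for the first and second sentence.
    """
    return {
        "Normal":          ((fo_a_hz, fo_a_hz), (fo_b_hz, fo_b_hz)),
        "Same Fo in High": ((fo_a_hz, fo_a_hz), (fo_b_hz, fo_a_hz)),
        "Same Fo in Low":  ((fo_a_hz, fo_a_hz), (fo_a_hz, fo_b_hz)),
        # Swapped: each sentence changes Fo at 800 Hz, so grouping by
        # common Fo would pair the wrong low and high regions.
        "Swapped":         ((fo_a_hz, fo_b_hz), (fo_b_hz, fo_a_hz)),
    }


for name, (first, second) in chimeric_conditions(100, 112).items():
    print(f"{name:16s} first: low={first[0]} high={first[1]}   "
          f"second: low={second[0]} high={second[1]}")
```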

Segregating sentence pairs by Fo

[Figure: y-axis 0 to 100 against Fo difference (semitones), 0 to 10. Curves: Normal, Same Fo in High Pass, Same Fo in Low Pass. 40 subjects, 40 sentence pairs]

• all the action is in the low frequency region (<800 Hz)

• no strong evidence of across-formant grouping

Adding Fo-swapped

[Figure: y-axis 0 to 100 against Fo difference (semitones), 0 to 10. Curves: Normal, Fo-swapped, Same Fo in High Pass, Same Fo in Low Pass. 40 subjects, 40 sentence pairs]

• Inappropriate pairing of Fo only detrimental above 4 semitones

Summary of Fo-differences

• Across-formant grouping only significant for large Fo differences (> ~4 semitones)

• Most of the improvement with small Fo differences happens in the F1 frequency region.

Another caveat for autocorrelation

• Improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin)

• Autocorrelation would pull out completely wrong envelopes.
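Why grouping by periodicity goes wrong for such stimuli can be seen by listing the component frequencies. The sketch below assumes that "alternating harmonics" means odd-numbered harmonics from one Fo and even-numbered harmonics from the other; that allocation, the example Fos (100 and 106 Hz) and both function names are assumptions, not details given in the slides.

```python
def alternating_harmonics(fo1_hz, fo2_hz, n_harmonics):
    """Build two component lists, each alternating between the two Fos.

    Vowel A takes odd harmonics of fo1 and even harmonics of fo2;
    vowel B takes the complementary set.
    """
    vowel_a, vowel_b = [], []
    for n in range(1, n_harmonics + 1):
        h1, h2 = n * fo1_hz, n * fo2_hz
        if n % 2 == 1:
            vowel_a.append(h1)
            vowel_b.append(h2)
        else:
            vowel_a.append(h2)
            vowel_b.append(h1)
    return vowel_a, vowel_b


def components_on_fo(components, fo_hz):
    """Select the components that are exact harmonics of fo_hz."""
    return sorted(f for f in components if f % fo_hz == 0)


vowel_a, vowel_b = alternating_harmonics(100, 106, 4)
print("vowel A:", vowel_a)
print("vowel B:", vowel_b)

# Grouping the mixture by a common Fo of 100 Hz collects a full harmonic
# series, but half of its members belong to vowel A and half to vowel B.
print("grouped on 100 Hz:", components_on_fo(vowel_a + vowel_b, 100))
```

A grouping scheme based purely on common periodicity would therefore hand each vowel a spectral envelope sampled from both vowels, which is the sense in which the envelopes come out wrong.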

No simultaneous effect of FM

• Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different frequency modulations of Fo.

• Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).

Summary of Fo effects in separating competing voices

• Intelligibility increased by small ∆Fo only in F1 region (and harmonic alternation tolerated)...

• … but not by ∆Fo in only the higher frequency region.

• Across-formant consistency of Fo only important at larger ∆Fo

• FM produces no additional separation

Difference in Fo leads to:

1. binaural separation of sound sources

2. increase in intelligibility

3. ability to track a sound source over time.

Three types of experiment:

CRM task (tracking a sound source) (Bolia et al., 2000)

• 2 simultaneous sentences, each of the form "Ready (Call Sign) go to (Color) (Number) now."

• Same talker (TT); Same sex (TS); Different sex (TD)

• Target denoted by Call Sign "Baron"

• 8 talkers in corpus, 2048 tokens

• Listeners responded by selecting the appropriate colored digit with the computer mouse
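The figure of 2048 tokens follows from the sentence frame. The sketch below enumerates the corpus under the published CRM design of 8 call signs, 4 colors and 8 numbers per talker; only the 8 talkers, the 2048 total and four of the call signs appear in these slides, so the remaining word lists are taken from the corpus description and the variable names are illustrative.

```python
from itertools import product

# Word lists from the published CRM corpus description (assumption: the
# slides themselves name only Arrow, Baron, Eagle and Tiger).
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = range(1, 9)
N_TALKERS = 8
TARGET_CALL_SIGN = "Baron"


def crm_sentence(call_sign, color, number):
    """Fill the CRM sentence frame."""
    return f"Ready {call_sign} go to {color} {number} now."


# One token per talker and per call sign / color / number combination.
tokens = [
    (talker, call_sign, crm_sentence(call_sign, color, number))
    for talker, call_sign, color, number
    in product(range(N_TALKERS), CALL_SIGNS, COLORS, NUMBERS)
]
targets = [t for t in tokens if t[1] == TARGET_CALL_SIGN]

print(len(tokens))   # 8 talkers x 8 call signs x 4 colors x 8 numbers = 2048
print(len(targets))  # tokens that could serve as the target message: 256
print(targets[0][2])
```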

CRM task (Bolia et al., 2000)

CRM task results (Brungart et al)

Effect of change in Fo

Effect of change in Fo

Fo contours for 2 individuals

[Figure: Fo contours (y-axis 0 to 200) against time (0 to 2 s), one panel per call sign: Arrow, Tiger, Eagle, Baron]

Individuals with the most constant Fo contours show most improvement with ∆Fo

[Figure: scatter plot with regression line y = -0.0262x + 0.4163, R² = 0.9315. Axis labels: %Fo difference; Av change with Fo. Tick ranges: 0 to 20 and -0.10 to 0.30]

Effect of change of VT

Effect of joint change of Fo and VT

Original: male

Effect of joint change of Fo and VT

Original: female

Superadditivity of ∆Fo and ∆VT

[Figure: actual d' against predicted d', both 0 to 1.5. Series: male, female]

• ∆Fo & ∆VT superadditive

• … and still less than real different-sex talkers
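The comparison of predicted and actual d' can be illustrated as follows. The slides do not say how the predicted d' was computed, so this sketch shows two common combination rules (a simple sum, and the independent-cue rule that adds d' values in quadrature) purely as assumptions; the d' values in the usage lines are made up for illustration and are not data from the talk.

```python
import math


def predicted_additive(d_fo, d_vt):
    """Prediction if the two d' values simply add (assumed rule)."""
    return d_fo + d_vt


def predicted_independent(d_fo, d_vt):
    """Prediction for optimally combined independent cues (assumed rule)."""
    return math.hypot(d_fo, d_vt)


def is_superadditive(actual, d_fo, d_vt, rule=predicted_additive):
    """True when the measured joint d' exceeds the chosen prediction."""
    return actual > rule(d_fo, d_vt)


# Hypothetical values: d' for ∆Fo alone, for ∆VT alone, and for both together.
d_fo, d_vt, d_joint = 0.4, 0.5, 1.2
print(predicted_additive(d_fo, d_vt))
print(predicted_independent(d_fo, d_vt))
print(is_superadditive(d_joint, d_fo, d_vt))
```

The simple sum is the stricter of the two criteria here, since for positive d' values it is never smaller than the quadrature prediction.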

Conclusions

• Same Fo is not a sufficient condition for dichotic fusion of complementarily filtered speech.

• Intelligibility increase for small ∆Fo is confined to the F1 region. Across-formant grouping matters only for larger ∆Fo.

• Fo & VT-size are useful for tracking sources across time. Superadditive.