-
Tutorial on Auditory Scene Analysis
Perception and Physiology
Shihab Shamma
Institute for Systems Research
Electrical and Computer Engineering, University of Maryland, College Park
With additional contribution from
Christophe Micheyl
Research Laboratory of Electronics
Massachusetts Institute of Technology
-
An auditory scene
[Figure: spectrogram of an auditory scene, frequency vs. time]
-
[Figure: frequency vs. time, with the singing voice labeled]
-
Two classes of ASA processes
• Simultaneous processes
• Sequential processes
[Figure: frequency vs. time]
-
Schema-based (top-down): under attentional control; dependent upon learning
Primitive (bottom-up): automatic; not dependent on learning
[Figure labels: stimulus, objects, schemas]
-
Outline
Part I: Psychoacoustics of ASA
Part II: Neural Correlates of Two-Tone Streaming
Two excellent references:
• A. Bregman's book (1990), Auditory Scene Analysis
• B.C.J. Moore & H. Gockel, a recent review (2002), "Factors influencing sequential stream segregation", Acta Acustica 88, 320-332
-
Outline of Part I
Sequential ASA processes: streaming
• (What is it?) The perceptual phenomenon
• (How does it work?) Theories and computational models
• (How does it really work?) Neural mechanisms
• (What's it good for?) Relationships with other aspects of perception
Simultaneous ASA processes: hearing out concurrent sounds
• The identification of concurrent vowels
• Concurrent harmonic complexes: the role of frequency selectivity
-
Auditory streaming
What is it?
Description and demonstration of the phenomenon
-
Miller & Heise (1950), Bregman & Campbell (1971), … Bregman (1990), …
[Figure: alternating A and B tones (A B A B …), frequency vs. time; dF marks the frequency separation between A and B]
-
[Figure: A B A B … sequence, frequency vs. time]
“1 stream of sounds jumping up and down in pitch”
-
[Figure: A B A B … sequence, frequency vs. time; dF marks the A-B frequency separation]
-
[Figure: A B A B … sequence, frequency vs. time]
“2 streams, one high, one low”
Note: you can only attend to one stream at a time.
-
[Figure: A B A _ A B A _ … sequence, frequency vs. time]
-
[Figure: A B A _ A B A _ … sequence, frequency vs. time]
“1 stream with a galloping rhythm”
-
“2 streams, one high and slow, the other low and fast”
[Figure: frequency vs. time; the B tones form a high, slow stream (B B …) and the A tones a low, fast stream (A A A A …)]
Note: when streamed, the relative timing between A and B tones becomes less important.
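The ABA_ stimulus can be written down as a list of tone events, which makes the "high and slow" versus "low and fast" description concrete: A occurs twice per triplet, B once. This is a minimal sketch, not the stimulus code used in the studies cited here; the names `Tone`, `semitones_up`, and `aba_sequence`, the slot duration `dt_s`, and the example values (A = 500 Hz, dF = 6 semitones, 125-ms slots) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tone:
    onset_s: float  # tone onset time (s)
    freq_hz: float  # tone frequency (Hz)
    label: str      # "A" or "B"


def semitones_up(freq_hz: float, semitones: float) -> float:
    """Frequency lying `semitones` above freq_hz (12 equal semitones per octave)."""
    return freq_hz * 2.0 ** (semitones / 12.0)


def aba_sequence(f_a_hz: float, df_semitones: float, dt_s: float, n_triplets: int) -> list[Tone]:
    """Build an ABA_ ("galloping") sequence as a list of Tone events.

    Assumption: each triplet fills four slots of dt_s seconds: A, B, A, silence.
    B is placed df_semitones above A, as in the figure (B high, A low).
    """
    f_b_hz = semitones_up(f_a_hz, df_semitones)
    tones = []
    for k in range(n_triplets):
        t0 = 4 * k * dt_s
        tones.append(Tone(t0, f_a_hz, "A"))
        tones.append(Tone(t0 + dt_s, f_b_hz, "B"))
        tones.append(Tone(t0 + 2 * dt_s, f_a_hz, "A"))
        # the fourth slot, t0 + 3 * dt_s, stays silent (the "_")
    return tones


if __name__ == "__main__":
    # Hypothetical parameters: A = 500 Hz, dF = 6 semitones, 125-ms slots.
    seq = aba_sequence(500.0, 6.0, 0.125, n_triplets=8)
    n_a = sum(t.label == "A" for t in seq)
    n_b = sum(t.label == "B" for t in seq)
    print(n_a, n_b)  # 16 8
```

When the sequence splits into two streams, each stream is simply the subsequence with one label: the A tones recur every 2·dt on average and the B tones every 4·dt, which is why the low stream sounds twice as fast as the high one.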
-
Streaming also depends on temporal parameters
[Figure: A B A _ sequences, frequency vs. time, at a slow and a fast tone rate; dt marks the time interval between tones]
-
Streaming also depends on connectedness (continuation)
[Figure: two A/B tone sequences, frequency vs. time]
-
Dependence of streaming on stimulus parameters
ABA_ stimulus spectrogram
[Figure: frequency separation dF vs. dT (tone repetition rate, fast to slow). Above the temporal coherence boundary: always 2 streams. Between the temporal coherence boundary and the fission boundary: 1 or 2 streams. Below the fission boundary: always 1 stream.]
After: van Noorden (1975)
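The three regions of the diagram amount to a simple decision rule once the two boundaries are known. The sketch below classifies a (dF, repetition time) point, assuming the boundaries are supplied as functions of tone repetition time (TRT, in ms; a faster rate means a shorter TRT). The boundary shapes `illustrative_fission` and `illustrative_coherence` are hypothetical placeholders chosen only so the example runs; they are not van Noorden's (1975) measured values.

```python
from typing import Callable

# A boundary maps tone repetition time (ms) to a frequency separation (semitones).
Boundary = Callable[[float], float]


def streaming_region(df_semitones: float, trt_ms: float,
                     fission: Boundary, coherence: Boundary) -> str:
    """Classify a (dF, TRT) point into the three regions of the van Noorden diagram."""
    fb = fission(trt_ms)     # below this: always 1 stream
    tcb = coherence(trt_ms)  # above this: always 2 streams
    if fb > tcb:
        raise ValueError("fission boundary must not exceed the temporal coherence boundary")
    if df_semitones > tcb:
        return "always 2 streams"
    if df_semitones < fb:
        return "always 1 stream"
    return "1 or 2 streams"


# HYPOTHETICAL boundary shapes for illustration only (not measured data):
# a flat fission boundary and a coherence boundary that rises as the rate slows.
def illustrative_fission(trt_ms: float) -> float:
    return 4.0


def illustrative_coherence(trt_ms: float) -> float:
    return 4.0 + 0.08 * trt_ms


if __name__ == "__main__":
    for df in (2.0, 8.0, 15.0):
        print(df, streaming_region(df, 100.0, illustrative_fission, illustrative_coherence))
```

With these placeholder shapes, shortening the TRT lowers the coherence boundary, so the same dF is more likely to fall in the "always 2 streams" region at fast rates, which is the qualitative pattern the diagram describes.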
-
Streaming
How does it work?
Theories and computational models
-
The channeling theory
Hartmann and Johnson (1991) Music Percept.
[Figure: peripheral auditory filters, level vs. frequency]
-
The channeling theory
Hartmann and Johnson (1991) Music Percept.
[Figure: tones A and B close together in frequency, level vs. frequency]
“1 stream”
-
The channeling theory
Hartmann and Johnson (1991) Music Percept.
[Figure: tones A and B far apart in frequency, level vs. frequency]
“2 streams”
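One way to make "same peripheral channels" quantitative is to express the A-B separation in units of auditory-filter bandwidth. The sketch below uses the ERB formula and ERB-number scale of Glasberg & Moore (1990); the decision rule `channeling_prediction` and its `threshold_erbs` criterion are hypothetical simplifications of the channeling idea, not Hartmann and Johnson's model.

```python
import math


def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth of the auditory filter (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_number(f_hz: float) -> float:
    """Position on the ERB-number scale (Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def channel_separation(f_a_hz: float, f_b_hz: float) -> float:
    """Distance between two tones in ERB units (how many filters apart they are)."""
    return abs(erb_number(f_b_hz) - erb_number(f_a_hz))


def channeling_prediction(f_a_hz: float, f_b_hz: float, threshold_erbs: float = 1.0) -> str:
    """Toy channeling rule: tones exciting largely the same filters -> "1 stream".

    threshold_erbs is a HYPOTHETICAL criterion, not a published value.
    """
    return "2 streams" if channel_separation(f_a_hz, f_b_hz) > threshold_erbs else "1 stream"


if __name__ == "__main__":
    for f_b in (1060.0, 1500.0):
        print(f_b, round(channel_separation(1000.0, f_b), 2), channeling_prediction(1000.0, f_b))
```

Measuring separation on the ERB-number scale rather than in Hz reflects the premise of the theory: what matters is overlap of peripheral excitation, and filters widen with center frequency.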
-
Beauvois & Meddis’s model
Beauvois and Meddis (1996) J. Acoust. Soc. Am.
Computer simulation of auditory stream segregation in alternating-tone sequences
-
McCabe & Denham’s model
McCabe and Denham (1997) J. Acoust. Soc. Am.
A model of auditory streaming
-
Is peripheral channeling the whole story?
-
Sounds that excite the same peripheral channels can yield streaming
Vliegen & Oxenham (1999); Vliegen, Moore, Oxenham (1999)
Grimault, Micheyl, Carlyon et al. (2001); Grimault, Bacon, Micheyl (2002); Roberts, Glasberg, Moore (2002)
...
-
Streaming with complex tones
[Figure: amplitude spectra of two harmonic complexes: F0 = 400 Hz (components at 400, 800, 1200 Hz, …) and F0 = 150 Hz (components at 150, 300, 450 Hz, …)]
-
Spectral Grouping or “Fusion” of Harmonics
Mistuning a harmonic
• Fusion is found alike in humans and many animals
• Fusion also breaks down with onset mismatches
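The two manipulations named on this slide, mistuning one harmonic and giving it a mismatched onset, can be sketched as a single signal generator. This is a minimal illustration, assuming equal-amplitude, sine-phase components and a plain list of samples; the function name, its parameters, and the example values (F0 = 200 Hz, 4 % mistuning of the 3rd harmonic) are hypothetical.

```python
import math


def harmonic_complex(f0_hz, n_harmonics, fs_hz, dur_s,
                     target_harmonic=None, mistuning_pct=0.0, onset_delay_s=0.0):
    """Equal-amplitude, sine-phase harmonic complex.

    Optionally the target harmonic is mistuned by mistuning_pct percent and/or
    starts onset_delay_s seconds after the other components.
    Returns (component frequencies in Hz, list of samples).
    """
    freqs = []
    for n in range(1, n_harmonics + 1):
        f = n * f0_hz
        if n == target_harmonic:
            f *= 1.0 + mistuning_pct / 100.0  # shift away from the harmonic series
        freqs.append(f)

    n_samples = int(round(fs_hz * dur_s))
    delay_samples = int(round(onset_delay_s * fs_hz))
    signal = []
    for i in range(n_samples):
        t = i / fs_hz
        x = 0.0
        for n, f in enumerate(freqs, start=1):
            if n == target_harmonic and i < delay_samples:
                continue  # target harmonic has not started yet (onset mismatch)
            x += math.sin(2.0 * math.pi * f * t)
        signal.append(x)
    return freqs, signal


if __name__ == "__main__":
    # Hypothetical example: 3rd harmonic of a 200-Hz complex raised by 4 %.
    freqs, x = harmonic_complex(200.0, 10, 16000, 0.2, target_harmonic=3, mistuning_pct=4.0)
    print([round(f, 1) for f in freqs[:4]])
```

Keeping both manipulations in one function makes it easy to compare a mistuned component, an asynchronous one, or both, against the same in-tune, synchronous reference.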
-
Streaming based on F0 differences
[Figure: A B A _ sequences of complex tones, shown as frequency vs. time and as F0 vs. time]
Musical melodies also stream (Telemann)
-
Auditory spectral excitation pattern evoked by bandpass-filtered harmonic complex
[Figure: excitation level (dB, 5 to 45) vs. frequency (1000 to 6000 Hz); F0 = 400 Hz]
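Whether the harmonics of a bandpass-filtered complex show up as separate peaks in the excitation pattern depends on how the harmonic spacing (F0) compares with the auditory-filter bandwidth. The sketch below labels each harmonic as resolved or unresolved under an assumed rule of thumb (spacing greater than one ERB at the harmonic's frequency; published criteria vary). The 1000-6000 Hz passband is also an assumption, taken from the range of the plotted frequency axis rather than from stated filter edges.

```python
import math


def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth of the auditory filter (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def harmonics_in_band(f0_hz: float, lo_hz: float, hi_hz: float) -> list[int]:
    """Harmonic numbers n with lo_hz <= n * f0_hz <= hi_hz."""
    n_min = max(1, math.ceil(lo_hz / f0_hz))
    n_max = math.floor(hi_hz / f0_hz)
    return list(range(n_min, n_max + 1))


def is_resolved(f0_hz: float, n: int) -> bool:
    """ASSUMED rule of thumb: harmonic n counts as resolved when the harmonic
    spacing (F0) exceeds one ERB at the harmonic's frequency."""
    return f0_hz > erb_hz(n * f0_hz)


if __name__ == "__main__":
    band = harmonics_in_band(400.0, 1000.0, 6000.0)  # hypothetical passband
    print("resolved:", [n for n in band if is_resolved(400.0, n)])
    print("unresolved:", [n for n in band if not is_resolved(400.0, n)])
```

Because the ERB grows with frequency while the spacing stays fixed at F0, this rule always yields a cutoff harmonic number: low-numbered harmonics are resolved and higher ones merge within single filters, the distinction used in the F0-streaming studies that follow.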
-
Auditory spectral excitation pattern evoked by bandpass-filtered harmonic complex
F0A = 100 Hz; F0B = F0A × 2^1.5 (1.5 octaves above F0A) ≈ 283 Hz
[Figure panels: small ΔF_AB; large ΔF_AB]
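Octave and semitone differences are ratios, so "1.5 octaves above 100 Hz" means multiplying by 2^1.5, not adding. The short sketch below checks the 283-Hz value on this slide and converts it to semitones, the unit used on the x-axis of the Grimault et al. (2000) data. The helper names are hypothetical.

```python
import math


def shift_octaves(f_hz: float, octaves: float) -> float:
    """Frequency `octaves` octaves above f_hz (octaves add on a log scale)."""
    return f_hz * 2.0 ** octaves


def semitone_difference(f1_hz: float, f2_hz: float) -> float:
    """Signed distance from f1_hz to f2_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f2_hz / f1_hz)


if __name__ == "__main__":
    f0_b = shift_octaves(100.0, 1.5)
    print(round(f0_b, 1))                               # 282.8
    print(round(semitone_difference(100.0, f0_b), 6))   # 18.0
```

So a 1.5-octave F0 difference equals 18 semitones, the upper end of the F0-difference range plotted in the next figure.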
-
F0-based streaming with unresolved harmonics is possible
Vliegen & Oxenham; Vliegen, Moore, Oxenham (1999); Grimault, Micheyl, Carlyon et al. (2000)
but the effect is weaker than with resolved harmonics
Grimault, Micheyl, Carlyon et al. (2000)
[Figure: probability of a "2 streams" response (0 to 1) vs. F0 difference (-6 to 18 semitones), F0(A) = 250 Hz, for a low and a high spectral region; curves labeled resolved and unresolved]
From: Grimault et al. (2000) JASA 108, 263-
-
Phase-based streaming
Roberts, Glasberg, Moore (2002)
• Harmonics in sine phase: φ(n) = 0
• Harmonics in alternating phase: φ(n) = 0 for odd n, φ(n) = 90° for even n
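The two phase schemes can be generated from the same set of harmonics; only the starting phases differ, so the amplitude spectra (and hence the overall power) are identical while the waveforms differ. This is a minimal sketch; the function names and the example values (F0 = 100 Hz, harmonics 10-20, one period at a 20-kHz sampling rate) are hypothetical choices, not the parameters of Roberts, Glasberg, and Moore (2002).

```python
import math


def phase_of(n: int, scheme: str) -> float:
    """Starting phase (radians) of harmonic n under the two schemes on the slide."""
    if scheme == "sine":
        return 0.0  # phi(n) = 0 for all n
    if scheme == "alternating":
        return 0.0 if n % 2 == 1 else math.pi / 2  # 0 for odd n, 90 degrees for even n
    raise ValueError(f"unknown phase scheme: {scheme!r}")


def complex_tone(f0_hz, harmonics, fs_hz, n_samples, scheme="sine"):
    """Equal-amplitude harmonic complex: sum of sin(2*pi*n*f0*t + phi(n))."""
    return [
        sum(math.sin(2.0 * math.pi * n * f0_hz * i / fs_hz + phase_of(n, scheme))
            for n in harmonics)
        for i in range(n_samples)
    ]


def rms(x):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    # Hypothetical: F0 = 100 Hz, harmonics 10-20, one period (200 samples at 20 kHz).
    h = range(10, 21)
    sine = complex_tone(100.0, h, 20000.0, 200, "sine")
    alt = complex_tone(100.0, h, 20000.0, 200, "alternating")
    print(rms(sine), rms(alt))  # equal power, different waveforms
```

Since the two complexes share F0 and amplitude spectrum, any streaming between them has to be driven by the waveform (envelope) difference that the phase manipulation creates.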
-
Streaming Based on Timbre
[Figure: ripple A and ripple B, two different spectral envelopes, presented as an A-B-A sequence]
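Two "ripple" spectral envelopes that differ only in the position of their peaks give sounds with the same spectral extent but different timbre. The sketch below assumes a sinusoidal ripple on a log2-frequency axis and produces ripple B by shifting the ripple phase by half a cycle, so its peaks fall where ripple A has troughs. The ripple form, the function name, and all parameter values (reference 250 Hz, 1 cycle/octave, 10 dB depth) are assumptions for illustration; the slide does not specify the envelopes used.

```python
import math


def ripple_levels_db(freqs_hz, f_ref_hz, density_cyc_per_oct, depth_db, phase_rad):
    """Level (dB re: a flat spectrum) of each component under a sinusoidal
    spectral ripple on a log2-frequency axis (ASSUMED ripple form)."""
    return [
        depth_db * math.sin(2.0 * math.pi * density_cyc_per_oct * math.log2(f / f_ref_hz) + phase_rad)
        for f in freqs_hz
    ]


if __name__ == "__main__":
    # Hypothetical carrier: components 1/8 octave apart over 2 octaves from 250 Hz.
    freqs = [250.0 * 2.0 ** (k / 8) for k in range(17)]
    ripple_a = ripple_levels_db(freqs, 250.0, 1.0, 10.0, 0.0)
    ripple_b = ripple_levels_db(freqs, 250.0, 1.0, 10.0, math.pi)  # peaks and troughs swapped
    for f, a, b in zip(freqs, ripple_a, ripple_b):
        print(f"{f:7.1f} Hz  A {a:+6.1f} dB  B {b:+6.1f} dB")
```

Alternating the two envelopes in an A-B-A pattern then changes spectral shape, not F0 or overall frequency range, which is the cue this slide attributes the streaming to.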
-
What is it good for?
Organizing auditory scenes into different sources:
• foreground-background
• parsing speakers and speech
• ignoring distractions
Harmonic segregation; FM harmonics; Continuity illusion; Ignoring distractions
-
Interim Summary
• The formation of auditory streams is determined partly by peripheral frequency selectivity
• Streaming may be produced by sounds that excite the same peripheral channels
• What matters is the perceptual difference between the streamed sounds
• Perceptual difference is created by simultaneous (primitive) processes: harmonicity; onset and offset detection; analysis of spectral shape
• Curiously, sound localization (e.g., ITD) does not behave as a primitive process
-
Neural Correlates of Two-Tone Streaming
-
A basic prerequisite for any neural correlate of streaming: it must depend on both dF and dT
[Figure: dF vs. tone repetition rate (dT): always 2 streams; 1 or 2 streams; always 1 stream; fission boundary; temporal coherence boundary]
-
Single/few/multi-unit intracortical recordings
Monkeys: Fishman et al. (2001) Hear. Res. 151, 167-187
Bats: Kanwal, Medvedev, Micheyl (2003) Neural Networks
At low repetition rates, units respond to both on- and off-BF (best frequency) tones.
At high repetition rates, only the on-BF tone response is visible.
-
Maybe, but:
• That neural responses in auditory cortex depend on both ∆F and ∆T is hardly a surprise.
• This is insufficient evidence that streaming is reflected in neural responses in the auditory cortex.
• A much more convincing correlate of streaming would be obtained if neural responses were shown to co-vary with the percept while the physical stimulus remains unchanged.
...
-
Ambiguous stimuli with bistable percepts (Necker’s cube, Rubin’s vase-faces) have been used successfully in the past to demonstrate single-unit correlates of visual percepts (not just of stimulus parameters), e.g., Logothetis & Schall (1989) Science; Leopold & Logothetis (1996) Nature
[Figure: Necker's cube]
-
The build-up of auditory streaming: a systematic change in the auditory percept over time during prolonged listening to repeating sequences
[Figure: probability of a '2 streams' response (0 to 1) vs. time (0 to 9 s) for A-B separations of 1, 3, 6 and 9 ST (ST: semitone)]
-
The break-down of apparent motion
Wertheimer (1912), Anstis et al. (1985)
[Figure: dots A and B lit in alternation]
• fast rates or large distances: two dots lit alternately
• slow rates & small distances: one dot moving
• intermediate parameters: apparent movement at first, then steady dots
-
Explanations for perceptual breakdown/buildup effects
• Neurophysiological explanation: neural adaptation of coherence/pitch-motion detectors (Anstis & Saida, 1985)
• « Cognitive » explanation: the default is integration (1 stream); the brain needs to accumulate evidence that there is more than 1 stream before declaring « 2 streams » (Bregman, 1978, 1990, …)
• Other explanations coming up …
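The « cognitive » account can be caricatured as an evidence accumulator: the percept starts as "1 stream" and switches once enough evidence for two sources has built up, so larger frequency separations switch sooner. The sketch below is entirely hypothetical (evidence per tone proportional to dF, a fixed threshold, no noise) and is neither Bregman's formulation nor a fit to the build-up data shown earlier; those data are response probabilities across trials, whereas this deterministic toy yields a single switch time. It does not model the adaptation account of Anstis & Saida.

```python
def switch_time_s(df_semitones, tone_interval_s, gain_per_semitone, threshold, max_tones=1000):
    """HYPOTHETICAL accumulator: start from "1 stream"; each tone adds evidence
    gain_per_semitone * df_semitones; return the time at which the running total
    first reaches threshold ("2 streams"), or None if it never does."""
    evidence = 0.0
    for k in range(1, max_tones + 1):
        evidence += gain_per_semitone * df_semitones
        if evidence >= threshold:
            return k * tone_interval_s
    return None  # evidence never sufficed: the percept stays "1 stream"


if __name__ == "__main__":
    # Hypothetical parameters: a tone every 125 ms, gain 0.1 per semitone, threshold 5.
    for df in (1.0, 3.0, 6.0, 9.0):
        print(df, switch_time_s(df, 0.125, 0.1, 5.0))
```

In this toy, the switch time shrinks as dF grows, which mirrors the qualitative ordering of the build-up curves (faster build-up at larger semitone separations).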
-
Alternate Models & Experimental Paradigms
• It is essential that neural recordings and perceptual judgments be obtained simultaneously
• Human fMRI and MEG studies are valuable, up to a point!
• Animal studies are physiologically versatile, but introspective behavioral measures are not an option!
Therefore, we critically need:
1. Cortical representations of perception that integrate both spectral and dynamic features, to account for all perceptual distances
2. Objective psychoacoustic measures to facilitate animal experimentation
3. Characterization of adaptive processes during perception
-
Spectro-Temporal Models of Streaming
-
Acknowledgment
Cortical Physiology and Auditory Computations: Jonathan Fritz, Didier Depireux, David Klein, Jonathan Simon
Auditory Speech and Music Processing: Taishih Chi, Mounya Elhilali, Powen Ru, Nima Mesgarani
Supported by:
• MURI # N00014-97-1-0501 from the Office of Naval Research
• NIDCD T32 DC00046-01 from the NIDCD
• NSFD CD8803012 from the National Science Foundation