auditory perception classes: critical bands/auditory filters
Post on 31-Jan-2016
Auditory perception classes: critical bands/auditory filters
David Poeppel
Questions/complaints/concerns: david.poeppel@nyu.edu
Coronal slice illustrating auditory pathway from ear to auditory cortex
Coronal slice (structural MRI) illustrating localized activation in superior auditory cortex (upper bank of superior temporal gyrus) to sinusoidal tones of different frequencies.
Auditory system
Kandel 2000
[Figure: levels of the auditory pathway and their subdivisions]
Contralateral cochlear n.: AVCN, PVCN, DCN
Sup. olivary complex: LSO, MSO, MNTB
Lateral lemniscus: DNLL, VNLL
Midbrain: ICc, ICp, ICdc, ICx
Thalamus: MGv, MGd, MGm, Sg/Lim, PM
Core: A1, R, RT
Belt: CL, CM, ML, RTM, RTL, AL, RM
Parabelt: CPB, RPB
Temporal lobe: TS1, 2; Tpt; STS
Frontal lobe: 10, orb, 12vl, 46d, 8a
Kaas & Hackett , PNAS 97, 11793 - 9 (2000)
Hall & Garcia, 2012
[Figure: ascending auditory pathway on both sides, from bottom to top: left and right cochlea, cochlear nucleus, superior olivary complex, lateral lemniscus, inferior colliculus, medial geniculate body, auditory cortex]
Chandrasekaran & Kraus 2009
Hickok & Poeppel, 2007, Nat Rev Neurosci
Functional anatomy of speech sound processing
A few reminders about the characteristics
• Frequency range of human auditory system
– 20 Hz to 20,000 Hz (textbook); 50 Hz to 10,000 Hz (really); most psychophysics is done between 100 and 5,000 Hz (because that is the range in which one obtains interpretable data).
• Intensity range
– Extends over many orders of magnitude (depending on frequency); at the ‘sweet spot’ (~1000-3000 Hz) about a 120 dB dynamic range
• Sensitivity
– JNDs (just-noticeable differences) for frequency: ~0.2% (e.g., at 1000 Hz base frequency, listeners can distinguish 1000 Hz from 1002 Hz; impressive!)
– JNDs for loudness discrimination: ~1 dB
– Sensitivity to timing differences: a few microseconds in spatial hearing (JNDs for azimuthal localization ~1 deg); 2 milliseconds (gap thresholds); 25 milliseconds (order threshold). Really impressive …
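The figures in this list translate into simple arithmetic. A minimal sketch, assuming the ~0.2% frequency JND holds as a constant fraction of the base frequency; the function names are illustrative and not from the slides:

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a power (intensity) ratio."""
    return 10 ** (db / 10)

def frequency_jnd_hz(base_hz, weber_fraction=0.002):
    """Smallest detectable frequency change, assuming a constant ~0.2% JND."""
    return base_hz * weber_fraction

# A 120 dB dynamic range expressed as a ratio of powers.
print(db_to_power_ratio(120))
# Frequency JND at a 1000 Hz base frequency, in Hz.
print(frequency_jnd_hz(1000))
```

The first value shows why the intensity range is described as spanning many orders of magnitude; the second reproduces the 1000 Hz vs. 1002 Hz example.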
Absolute threshold of hearing in quiet
audiograms
Perceptual attributes of sounds
• Spatial location
– Binaural hearing (inter-aural time and intensity differences), head-related transfer function.
• Loudness
– Signal amplitude (ASA Demos 8-11, compare 6/3/1 dB steps)
• Pitch
– Sound frequency, fundamental frequency of complex periodic signals, or inter-harmonic spacing
• Timbre
– Distribution of energy across frequency, shape of the spectrum
The frequency resolution/processing of the system underlies the construction of perceptual attributes.
Pure vs. complex tones (all A440) - pitch, timbre, phase
[Figure: waveforms plotted against time t; period T = 2.27 ms]
pitch is (largely) phase invariant
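Phase invariance can be illustrated numerically: two harmonic complexes on A440 with the same component amplitudes but different component phases have different waveforms and identical magnitude spectra. This is a sketch only; the sample rate, the number of harmonics, and the phase values are assumptions chosen for convenience.

```python
import cmath
import math

FS = 44000          # sample rate in Hz, chosen so one 440 Hz period is 100 samples
F0 = 440            # fundamental frequency (A440)
N = FS // F0        # samples per period

def complex_tone(phases):
    """One period of a harmonic complex; phases[k] is the phase of harmonic k+1."""
    return [
        sum(math.sin(2 * math.pi * (k + 1) * F0 * n / FS + ph)
            for k, ph in enumerate(phases))
        for n in range(N)
    ]

def harmonic_magnitudes(x, n_harmonics):
    """DFT magnitude per harmonic (bin k of a one-period DFT is harmonic k)."""
    mags = []
    for k in range(1, n_harmonics + 1):
        X = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        mags.append(abs(X) * 2 / N)
    return mags

sine_phase = complex_tone([0.0, 0.0, 0.0])
mixed_phase = complex_tone([0.0, math.pi / 2, math.pi])

print(1000 / F0)                                  # period in ms
print(harmonic_magnitudes(sine_phase, 3))         # component amplitudes
print(harmonic_magnitudes(mixed_phase, 3))        # same amplitudes, other phases
print(max(abs(a - b) for a, b in zip(sine_phase, mixed_phase)))  # waveforms differ
```

Both tones share the same period, and therefore the same fundamental, even though their waveform shapes differ.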
The auditory periphery
Human cochlea: 3-D reconstruction
Cochlear animation, the Hudspeth version
http://www.rockefeller.edu/labheads/hudspeth/movie06_popup.html
Masking
• The interference one sound causes in the reception of another sound
– Peripheral component/cause: overlapping excitation pattern
– Central component/cause: uncertainty - “informational masking”
• Masking experiments have been used extensively to investigate spectral and temporal aspects of hearing
– Masking to study frequency selectivity: the critical band
– Forward and backward masking (temporal and spectral constraints)
– Comodulation masking release CMR (‘unmasking’ of sub-threshold signal by comodulated signal in different regime)
Classic experiment: Fletcher 1940
Bandlimited noise stimuli
Schooneveldt & Moore 1989
Determine threshold of sinusoidal signal in noise. Noise always centered at signal frequency.
[Figure: schematic spectrum of masker and signal; x-axis: frequency (Hz), y-axis: sound level (dB)]
• ASA Demos 2-6: count tones in noise, as a function of bandwidth.
• Increases in noise bandwidth result in more noise passing through a given filter, yielding more masking. However, once the noise bandwidth exceeds the filter bandwidth, the threshold no longer changes. The bandwidth beyond which further increases yield no further threshold increase is the critical band.
• Starting with Fletcher, masking studies have been used to evaluate the frequency selectivity of the auditory system.
• Interpretation of masking data: the auditory periphery can be described as a set of contiguous bandpass filters with overlapping passbands. These “auditory filters” comprise the first stage in the spectro-temporal analysis of all sounds.
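The band-widening logic can be sketched in a few lines. This assumes an idealized rectangular auditory filter and flat noise; the critical bandwidth of 160 Hz and the noise spectrum level are hypothetical values for illustration, not measurements from the slides.

```python
import math

def masked_threshold_db(noise_bw_hz, cb_hz, n0_db=40.0, k=1.0):
    """Signal threshold (dB) for noise of a given bandwidth centered on the signal.

    Assumes a rectangular filter of width cb_hz and flat noise with spectrum
    level n0_db (dB per Hz). Only noise inside the filter contributes to masking.
    """
    effective_bw = min(noise_bw_hz, cb_hz)
    return n0_db + 10 * math.log10(k * effective_bw)

CB = 160.0  # hypothetical critical bandwidth in Hz
for bw in (25, 50, 100, 200, 400, 800):
    print(bw, round(masked_threshold_db(bw, CB), 1))
```

Below the critical bandwidth, each doubling of noise bandwidth raises the threshold by about 3 dB; above it, the threshold stays flat.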
Critical bands by loudness comparison
[Figure: loudness plotted against frequency]
Reference noise band compared to test noise band with increasing bandwidth (constant power). When the bandwidth of the test noise exceeds the critical bandwidth, the loudness begins to increase. (ASA Demo 7)
Zwicker & Feldtkeller 1967; Scharf 1970; Rossing 1982
Model of masking: Power spectrum model
1. The (peripheral) auditory system contains an array of linear overlapping bandpass filters.
2. When detecting signal in noise, listener makes use of just one filter, centered close to the signal frequency. This filter will pass the signal but remove a great deal of the noise.
3. Only the noise components passing through the filter will mask the signal.
4. The threshold is determined by the amount of noise passing through the filter. The threshold corresponds to some signal-to-noise ratio K at the output of the filter.
• Simplifying assumption made by Fletcher: rectangular filters, ‘flat top’, width of the filter is CB.
• Estimate value of CB indirectly by measuring the power of a sinusoidal signal, $P_s$, required for detection in broadband white noise of power density $N_0$.
Noise falling within CB is $N_0 \times \mathrm{CB}$. Following 4 above,
$$\frac{P_s}{N_0 \times \mathrm{CB}} = K$$
$$\mathrm{CB} = \frac{P_s}{N_0 \times K}$$
By measuring $P_s$ and $N_0$ and estimating K, the value of the critical band can be determined. (Fletcher estimated K = 1; Scharf, 1970, revised that to about 0.4.) $P_s/N_0$ is called the ‘critical ratio’.
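A worked example of the critical-ratio estimate, CB = Ps/(N0 x K), with levels given in dB. The threshold and noise levels below are hypothetical inputs chosen for illustration; only the two values of K come from the text.

```python
def critical_band_hz(ps_db, n0_db, k=1.0):
    """CB = Ps / (N0 * K), with Ps in dB and N0 as a spectrum level in dB per Hz."""
    critical_ratio = 10 ** ((ps_db - n0_db) / 10)   # Ps / N0, in Hz
    return critical_ratio / k

# Hypothetical measurement: tone threshold 18 dB above the noise spectrum level.
cb_fletcher = critical_band_hz(58.0, 40.0, k=1.0)   # Fletcher's K = 1
cb_scharf = critical_band_hz(58.0, 40.0, k=0.4)     # Scharf's revised K of about 0.4
print(round(cb_fletcher, 1), round(cb_scharf, 1))
```

Because CB scales as 1/K, lowering K from 1 to 0.4 makes the estimated critical band 2.5 times wider for the same measured critical ratio.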
Estimating the shape of the auditory filter based on power-spectrum model:
$$P_s = K \int_0^{\infty} N(f)\, W(f)\, df$$
• Masker is represented by its long-term power spectrum N(f)
• Weighting function, or auditory filter, is W(f)
• $P_s$ is the power of the signal at threshold
To assess auditory filter shape, Roy Patterson developed a new masking approach. The signal is fixed in frequency and the masker is noise with a bandstop, the width of which is varied. Threshold corresponds to a constant signal-to-masker ratio at the output of the filter.
Patterson, R.D. (1976). J. Acoust. Soc. Am., 59, 640-659.
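The notched-noise logic can be sketched by evaluating the power-spectrum integral for a masker with a spectral notch around the signal. The filter shape used here, a rounded exponential W(g) = (1 + pg)exp(-pg) with g the normalized distance from the center frequency, is an assumption made for illustration, as are the sharpness value p and the outer edge of the noise.

```python
import math

def roex(g, p):
    """Assumed filter weighting; g = |f - fc| / fc, p sets the sharpness."""
    return (1 + p * g) * math.exp(-p * g)

def noise_through_filter(notch_half_width, p, n0=1.0, g_max=0.8, steps=4000):
    """Integrate N(f) W(f) over the two noise bands flanking the notch.

    The masker is flat (level n0) from the notch edge out to g_max on each
    side of the signal. Frequency is in normalized units g, so the result
    is in arbitrary units.
    """
    dg = (g_max - notch_half_width) / steps
    total = 0.0
    for i in range(steps):
        g = notch_half_width + (i + 0.5) * dg    # midpoint rule
        total += roex(g, p) * dg
    return 2 * n0 * total                        # two symmetric flanking bands

def threshold_db(notch_half_width, p, k=1.0):
    """Ps = K * integral of N(f) W(f) df, in dB re an arbitrary reference."""
    return 10 * math.log10(k * noise_through_filter(notch_half_width, p))

P = 30.0  # hypothetical sharpness parameter
for width in (0.0, 0.05, 0.1, 0.2, 0.3):
    print(width, round(threshold_db(width, P), 1))
```

As the notch widens, less noise passes through the filter and the predicted threshold falls; the rate at which it falls reflects the filter shape, which is what the method exploits.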
Patterson’s ‘notched noise’ method
Patterson, R.D. (1974). J. Acoust. Soc. Am., 55, 802-809.
Shape of auditory filter from notched noise
The auditory filter, unsurprisingly, is unlike a simple rectangular filter. This filter cannot be specified with a single number … However, some sort of summary statistic is useful. Common measure: bandwidth of the filter at the point at which the power has fallen by a factor of 2 -- i.e. by 3 dB. (Other measure: equivalent rectangular bandwidth (ERB)).
Other approaches to characterizing the auditory filter:
• psychophysical tuning curves (e.g. Vogten 1974)
• rippled noise method (Glasberg, Moore, Nimmo-Smith 1984)
Typical values of auditory filter bandwidth, based on notched-noise approach: 10-15% of center frequency.
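The two summary measures, 3-dB bandwidth and ERB, can be compared on a concrete filter. This sketch assumes a rounded-exponential filter shape W(g) = (1 + pg)exp(-pg), with g the normalized distance from the center frequency; the shape and the sharpness value p = 30 are illustrative assumptions, not values from the slides.

```python
import math

def roex(g, p):
    """Assumed filter weighting; g = |f - fc| / fc, peak value 1 at g = 0."""
    return (1 + p * g) * math.exp(-p * g)

def erb_hz(fc, p, g_max=2.0, steps=20000):
    """ERB: width of a rectangular filter with the same peak and the same area."""
    dg = g_max / steps
    area = sum(roex((i + 0.5) * dg, p) for i in range(steps)) * dg
    return 2 * area * fc                      # both sides of the center frequency

def bandwidth_3db_hz(fc, p):
    """Width between the two points where the power weighting has fallen to 0.5."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                       # bisection on one side of the filter
        mid = (lo + hi) / 2
        if roex(mid, p) > 0.5:
            lo = mid
        else:
            hi = mid
    return 2 * lo * fc

FC, P = 1000.0, 30.0   # P is a hypothetical sharpness value
print(round(erb_hz(FC, P), 1), round(bandwidth_3db_hz(FC, P), 1))
```

For this assumed shape the ERB comes out wider than the 3-dB bandwidth, and with p = 30 it falls within the 10-15% range quoted for a 1 kHz center frequency.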
Psychophysical tuning curves
• A critical band CB is understood as a spectral window over which energy is integrated for certain tasks.
• Spectral band filters, as observed in psychoacoustics and in behavioral experiments, are sliding spectral windows.
• CB estimates in humans are ~0.14 - 0.23 octaves
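To relate the octave figures to bandwidths in Hz, a band of a given width in octaves can be placed around a center frequency. The sketch below assumes the band is geometrically centered on that frequency (an assumption about how the width is measured) and uses 1 kHz as an example.

```python
def octave_band_edges(fc, width_octaves):
    """Edges of a band of the given width in octaves, geometrically centered on fc."""
    half = width_octaves / 2
    return fc * 2 ** (-half), fc * 2 ** half

for width in (0.14, 0.23):
    lo, hi = octave_band_edges(1000.0, width)
    print(width, round(lo, 1), round(hi, 1), round(hi - lo, 1))
```

At 1 kHz these widths correspond to roughly 10% and 16% of the center frequency, which is close to the 10-15% range obtained with the notched-noise approach.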
Four approaches to study critical bands
1. Masking with noise on target
2. Critical bands by loudness
3. Notched noise method
4. Psychophysical tuning curves
The width of the critical band (auditory filter) changes with center frequency
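One widely used description of this dependence is the equivalent rectangular bandwidth formula of Glasberg & Moore (1990), ERB = 24.7(4.37F/1000 + 1) with F in Hz. It is not taken from these slides and is included here only as an illustrative sketch of how bandwidth grows with center frequency.

```python
def erb_hz(fc_hz):
    """ERB in Hz for a center frequency in Hz (Glasberg & Moore, 1990 formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

for fc in (250, 500, 1000, 2000, 4000):
    bw = erb_hz(fc)
    print(fc, round(bw, 1), f"{100 * bw / fc:.1f}%")
```

The absolute bandwidth grows with center frequency, while the bandwidth relative to center frequency shrinks toward a roughly constant fraction at high frequencies.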
The shape of the critical band (auditory filter) changes with signal amplitude
Relation between auditory filters and excitation pattern
Top: 1-kHz sinusoid as ‘represented’ by five auditory filters, centered at different frequencies.
Bottom: calculated excitation pattern of auditory system
Moore & Glasberg 1983
Summary 1: The filter-bank model of hearing
• The basilar membrane 'filters' parts of a sound into the auditory-nerve fibers, so that the output of the cochlea is like the output of a bank of filters that transmit information in parallel.
• Each of these 'auditory' filters is centered at a different frequency, and responds to only a narrow range of frequencies. The center frequencies of adjacent filters are very close, so that their frequency ranges overlap considerably.
• There are around 25,000 nerve fibers in the auditory nerve, so the center frequencies of filters are, effectively, continuously distributed over the ear's frequency range.
Hall & Garcia, 2012
Summary 2: critical bands/auditory filters
1. Human auditory perceptual analysis is quantized into < 30 “critical bands”
2. of perceptually near-identical frequency analysis classes
3. corresponding to approximately equal length bands of cochlear tissue (receptor surface)
[Figure: cochlea from basal end to apical end]
Co-modulation masking release (CMR)
Task: Detection of a pure tone embedded in noise
Stimuli: Target & 2 types of noise
1) Gaussian white noise (unmodulated, UM)
2) Amplitude-modulated noise (co-modulated, CM) (Gaussian white noise x sinusoid)
[Figure: amplitude vs. time; co-modulated temporal fluctuations across different frequency bands]
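The two masker types can be sketched as follows. The CM masker follows the description literally (Gaussian white noise multiplied by a sinusoid); the sample rate, duration, modulation rate, and random seed are assumptions for illustration, and no band-limiting is applied here.

```python
import math
import random

FS = 16000        # sample rate in Hz (assumed)
DUR_S = 1.0       # masker duration in seconds (assumed)
MOD_HZ = 10.0     # modulation rate in Hz (assumed)

def unmodulated_noise(n, rng):
    """UM masker: Gaussian white noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def comodulated_noise(n, rng, fs=FS, mod_hz=MOD_HZ):
    """CM masker: Gaussian white noise multiplied by a sinusoid.

    The same modulator multiplies the whole wideband noise, so every
    frequency band of the result shares the same temporal fluctuation.
    """
    return [rng.gauss(0.0, 1.0) * math.sin(2 * math.pi * mod_hz * i / fs)
            for i in range(n)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def window(x, center_s, half_width_s=0.002, fs=FS):
    """Samples within +/- half_width_s of center_s."""
    return x[int((center_s - half_width_s) * fs):int((center_s + half_width_s) * fs)]

rng = random.Random(1)
n = int(FS * DUR_S)
um = unmodulated_noise(n, rng)
cm = comodulated_noise(n, rng)

# Masker level near a modulator peak (25 ms) and near a modulator zero (50 ms).
print(rms(window(cm, 0.025)), rms(window(cm, 0.050)))
print(rms(window(um, 0.025)), rms(window(um, 0.050)))
```

The CM masker has brief low-level epochs around each zero of the modulator, whereas the UM masker has roughly constant level throughout.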
Auditory CMR Stimuli presentation timeline & summary
[Figure: trial timeline with target; ISI = 500 ms, mask = 1000 ms, target = 750 ms]
x 100 trials per condition per session x 3...
Task: Detection of a pure tone embedded in noise
Stimuli: Target masked by either UM or CM noise
Each type of noise masker is filtered to have different bandwidths, typically 50, 100, 200, 400, 1000, 2000 Hz, centered at a given frequency (e.g., 1, 2 or 4 kHz). Six bandwidths x two masker types = 12 blocks.
Measure: thresholds for detection of target in each noise condition using interleaved staircases (2x 1-up, 2-down), 50 trials per staircase
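A minimal sketch of an interleaved 1-up, 2-down procedure, assuming "2x" means two independent tracks that are randomly interleaved within a block. The step size, starting levels, threshold estimate (mean of the last six reversals), and the deterministic simulated listener are all hypothetical choices made for illustration.

```python
import random

class Staircase:
    """1-up, 2-down track: level drops after two correct in a row, rises after a miss."""

    def __init__(self, start_db, step_db=2.0, n_trials=50):
        self.level = start_db
        self.step = step_db
        self.remaining = n_trials
        self.correct_run = 0
        self.direction = 0          # -1 going down, +1 going up, 0 before first move
        self.reversals = []

    def update(self, correct):
        self.remaining -= 1
        if correct:
            self.correct_run += 1
            if self.correct_run < 2:
                return              # need two correct responses before stepping down
            move = -1
        else:
            move = +1
        self.correct_run = 0
        if self.direction and move != self.direction:
            self.reversals.append(self.level)   # direction changed: a reversal
        self.direction = move
        self.level += move * self.step

def run_interleaved(listener, starts=(70.0, 50.0), seed=0):
    """Randomly interleave the tracks; return one threshold estimate per track."""
    rng = random.Random(seed)
    tracks = [Staircase(s) for s in starts]
    while any(t.remaining > 0 for t in tracks):
        track = rng.choice([t for t in tracks if t.remaining > 0])
        track.update(listener(track.level))
    estimates = []
    for t in tracks:
        last = t.reversals[-6:] or [t.level]    # fall back if no reversals occurred
        estimates.append(sum(last) / len(last))
    return estimates

# Hypothetical listener: always detects the target at or above 60 dB.
estimates = run_interleaved(lambda level_db: level_db >= 60.0)
print(estimates)
```

With a real, probabilistic listener, a 1-up, 2-down rule converges on the level yielding about 70.7% correct; starting the two tracks from above and below the expected threshold helps check that both converge to the same region.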
Trials blocked by condition type
Co-modulation masking release (CMR)
[Figure: UM and CM conditions; critical band width for 1 kHz center frequency]
First reported in: Hall, J.W., Haggard, M.P. & Fernandez, M.A. (1984). Detection by spectro-temporal pattern analysis. J. Acoust. Soc. Am. 76: 50-56.
Co-modulation masking release (CMR)
• Target detection is easier when the noise masker is amplitude co-modulated (CM) compared to the reference (UM) white noise condition.
• The CMR effect increases with increasing bandwidth (counterintuitive...adding more noise reduces threshold!)
• Proposed mechanisms (many): e.g., “the dip-listening hypothesis”
Summary
Co-modulation masking release (CMR)
• The classical auditory CMR effect (typically run at 50 Hz) can also be obtained at lower modulation frequencies (extensive piloting was done at 4 Hz, 5 Hz and 10 Hz modulation frequencies). The above results are also reported in the literature: Bacon et al. (1997). Masking by modulated and unmodulated noise: effects of bandwidth, modulation rate, signal frequency, and masker level.
Spectral resolution is great. But do you always need it?
Shannon 1995