durai slides.pdf
SOUND PROPAGATION
[Figure: a sound source radiating toward a listener along the x axis; the wave travels at speed $c$ with wavelength $\lambda$ and frequency $f$ (Hz).]
$c = f \lambda$
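As a quick numerical check of the relation c = f λ, the sketch below computes wavelengths at a few audio frequencies. The speed of sound (343 m/s, roughly air at 20 °C) and the helper name `wavelength` are assumptions for illustration and do not come from the slides.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Return the wavelength in metres, from c = f * lambda."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / frequency_hz


# Wavelengths at the edges and middle of the audio band.
for f in (20.0, 1000.0, 20000.0):
    print(f"{f:8.0f} Hz -> {wavelength(f):.4f} m")
```

Under these assumptions, audible wavelengths run from about 17 m down to under 2 cm, which spans sizes both much larger and much smaller than the head and pinna.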
AN INTRODUCTION TO HUMAN SPATIAL HEARING
Richard O. Duda
CIPIC Interface Laboratory
UC Davis
http://phosphor.cipic.ucdavis.edu
October 12, 2000
OVERVIEW
Physics of sound
Acoustic cues for sound localization: azimuth, elevation, range
Head-related transfer functions (HRTFs)
Approaches to synthesizing spatial sound
Opportunities and challenges
MULTIPATH PROPAGATION
Reflection
Refraction
Scattering
AXIOM I
The sound pressure at the two eardrums is a sufficient stimulus.
Producing the same sound pressure will produce the same auditory perception.
Caveats: bone conduction, adaptation, conflicting visual cues, conflicting expectations.
AXIOM II
Exact reproduction of the sound pressure is not necessary for producing the same auditory perception.
The limitations of neural responses allow different (and simpler) stimuli to produce the same response.
Examples:
Bandwidth (20 Hz to 20 kHz)
Amplitude (1-dB resolution)
Monaural phase (2-ms resolution)
Latency (10-ms resolution)
Spectral fine structure (critical bands, Q = 8)
AXIOM III
Although it is not necessary to reproduce all of the cues exactly, conflicting cues degrade perception.
Key engineering challenge -- find the most cost-effective approximation.
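VERTICAL-POLAR COORDINATES
[Figure: vertical-polar coordinate system with azimuth $\theta$, elevation $\phi$, and range $r$, showing the sound source, the plane of constant azimuth, the cone of constant elevation, the median plane, and the horizontal plane.]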
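INTERAURAL-POLAR COORDINATES
[Figure: interaural-polar coordinate system with azimuth $\theta$, elevation $\phi$, and range $r$, showing the sound source, the interaural axis, the plane of constant elevation, the cone of constant azimuth, the median plane, and the horizontal plane.]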
AZIMUTH CUES
[Figure: sound source at azimuth $\theta$.]
ITD (Interaural Time Difference)
ILD (Interaural Level Difference)
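WOODWORTH'S FORMULA
[Figure: head of radius $a$ with a sound source at azimuth $\theta$, showing the paths to the ipsilateral ear and to the contralateral ear.]
Ipsilateral ear: $\Delta T_{ips} = -\frac{a \sin\theta}{c}$
Contralateral ear: $\Delta T_{con} = \frac{a\,\theta}{c}$
$ITD = \frac{a}{c}(\theta + \sin\theta)$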
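The arrival-time expressions on this slide can be evaluated directly. The sketch below computes the ipsilateral and contralateral delays and the resulting ITD for a spherical head. The head radius of 0.0875 m, the speed of sound of 343 m/s, and the function names are illustrative assumptions, and the azimuth is restricted to 0 to 90 degrees, where this form of the formula applies.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed head radius a, in metres
SPEED_OF_SOUND = 343.0   # assumed speed of sound c, in m/s


def arrival_times(theta_rad, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Return (dT_ips, dT_con) in seconds, relative to the head centre."""
    if not 0.0 <= theta_rad <= math.pi / 2:
        raise ValueError("azimuth must lie between 0 and pi/2 radians")
    dt_ips = -a * math.sin(theta_rad) / c  # near ear: wave arrives early
    dt_con = a * theta_rad / c             # far ear: wave travels around the head
    return dt_ips, dt_con


def woodworth_itd(theta_rad, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """ITD = (a / c) * (theta + sin(theta)), the difference of the two delays."""
    dt_ips, dt_con = arrival_times(theta_rad, a, c)
    return dt_con - dt_ips


for deg in (0, 30, 60, 90):
    itd_ms = 1000.0 * woodworth_itd(math.radians(deg))
    print(f"azimuth {deg:2d} deg -> ITD {itd_ms:.3f} ms")
```

With these assumed values the ITD grows from zero straight ahead to roughly two thirds of a millisecond at 90 degrees.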
ARRIVAL TIME
[Figure: arrival time (ms) versus angle of incidence (deg), comparing Rayleigh's solution (20% rise time) with Woodworth's formula.]
ELEVATION CUES
[Figure: sound source at elevation $\phi$.]
Pinna reflections and resonances
Torso and shoulder reflections
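TORSO REFLECTION
[Figure: sound source at elevation $\phi$ with height $h$ marked; a second panel shows the range $\phi_{min} \le \phi \le 90^\circ$.]
$\Delta T_T \le \frac{2h}{c}$
[Figure: $|H(f)|$ versus $f$, with the frequency axis marked at $\frac{1}{2\Delta T_T}$, $\frac{3}{2\Delta T_T}$, $\frac{5}{2\Delta T_T}$, and $\frac{7}{2\Delta T_T}$.]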
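The |H(f)| sketch on the torso-reflection slide marks the frequency axis at odd multiples of 1/(2 ΔT_T). One way to see why is to model the torso reflection as a single delayed echo added to the direct sound, H(f) = 1 + ρ e^{-j 2π f ΔT}. That single-echo model, the reflection coefficient of 0.5, and the 0.5 ms delay below are assumptions for illustration, not values from the slides.

```python
import cmath
import math


def notch_frequencies(delay_s, count=4):
    """Frequencies (Hz) of the first `count` magnitude minima: (2n + 1) / (2 * delay)."""
    if delay_s <= 0:
        raise ValueError("delay must be positive")
    return [(2 * n + 1) / (2.0 * delay_s) for n in range(count)]


def echo_magnitude(frequency_hz, delay_s, rho):
    """|H(f)| for the assumed model H(f) = 1 + rho * exp(-j * 2 * pi * f * delay)."""
    return abs(1.0 + rho * cmath.exp(-2j * math.pi * frequency_hz * delay_s))


delay = 0.0005  # 0.5 ms, an assumed torso reflection delay
rho = 0.5       # assumed reflection coefficient
for f in notch_frequencies(delay):
    print(f"notch near {f:6.0f} Hz, |H| = {echo_magnitude(f, delay, rho):.2f}")
```

In this model the notch depth is set by ρ and the notch spacing by the delay, so a delay that changes with elevation moves the whole comb pattern.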
THE PINNA
Cavum concha
Cymba concha
Helix
Crus helicis
Triangular fossa
Scaphoid fossa
Lobule
Intertragal incisure
Antihelix
External auditory meatus
Tragus
Antitragus
PINNA PHENOMENA
Pinna reflections (Batteau)
Pinna resonances (Shaw)
PINNAE
RANGE CUES
Loudness (for familiar sources)
Excess ILD (for close sources)
Direct/reverberant (for distant sources)
HEAD-MOTION CUES AND FRONT/BACK CONFUSION
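HEAD-MOTION CUES AND ELEVATION MAGNITUDE
[Figure: three panels showing a head of radius $a$ rotated through angle $\alpha$, with the source in the horizontal plane, at elevation $\phi$, and overhead.]
Source in the horizontal plane: $ITD = \alpha \frac{2a}{c}$
Source at elevation $\phi$: $ITD = \alpha \cos\phi \, \frac{2a}{c}$
Source overhead: $ITD = 0$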
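Reading the relations on this slide as ITD = (2a/c) α cos φ for a small head rotation α (with ITD = 0 for a source overhead), the ITD produced by a known rotation can be inverted to estimate the magnitude of the elevation. The sketch below does this. The head radius, speed of sound, and function names are assumptions, and because cos φ is even the sign of the elevation is not recovered.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed head radius a, in metres
SPEED_OF_SOUND = 343.0   # assumed speed of sound c, in m/s


def itd_for_rotation(alpha_rad, elevation_rad, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """ITD (s) after rotating the head by alpha, for a source at the given elevation."""
    return (2.0 * a / c) * alpha_rad * math.cos(elevation_rad)


def elevation_magnitude(itd_s, alpha_rad, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Invert the relation above; returns |elevation| in radians."""
    if alpha_rad == 0:
        raise ValueError("a non-zero head rotation is required")
    ratio = itd_s * c / (2.0 * a * alpha_rad)
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside [-1, 1]
    return math.acos(ratio)


alpha = math.radians(5.0)  # assumed 5-degree head turn
for elev_deg in (0.0, 45.0, 90.0):
    itd = itd_for_rotation(alpha, math.radians(elev_deg))
    est = math.degrees(elevation_magnitude(itd, alpha))
    print(f"elevation {elev_deg:4.1f} deg -> ITD {itd * 1e6:6.2f} us -> estimate {est:4.1f} deg")
```

A source 30 degrees above the horizontal plane and one 30 degrees below it give the same ITD here, which is why the slide speaks of elevation magnitude.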
OTHER CUES
Visual cues: synchronized motion, absence
Knowledge of source
Knowledge of environment
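FREE-FIELD RADIATION FROM A SPHERICAL SOURCE
[Figure: sound source $X(f)$ with distance $r_0$ marked, and the free-field pressure $X_{ff}(f)$ at range $r$.]
$X(f)$ = Fourier transform of source pressure
$X_{ff}(f)$ = Free-field pressure at head center
$X_{ff} = H_{ff} X$
$H_{ff}(f) = \frac{r_0}{r} e^{-jkr}$, $k = \frac{2\pi f}{c}$
Inverse range: $\frac{r_0}{r}$. Propagation delay: $e^{-jkr}$.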
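The free-field transfer function on this slide combines an inverse-range gain with a pure propagation delay. A minimal sketch of H_ff(f) = (r0/r) e^{-jkr}, k = 2πf/c follows. The default r0 of 1 m, the speed of sound of 343 m/s, and the function name are assumptions for illustration.

```python
import cmath
import math


def free_field_tf(frequency_hz, r, r0=1.0, c=343.0):
    """H_ff(f) = (r0 / r) * exp(-j * k * r), with wavenumber k = 2 * pi * f / c."""
    if r <= 0:
        raise ValueError("range r must be positive")
    k = 2.0 * math.pi * frequency_hz / c
    return (r0 / r) * cmath.exp(-1j * k * r)


# Doubling the range halves the magnitude and adds propagation delay (phase lag).
for r in (1.0, 2.0, 4.0):
    h = free_field_tf(1000.0, r)
    print(f"r = {r:.1f} m: |H| = {abs(h):.3f}, delay = {1000.0 * r / 343.0:.2f} ms")
```

The magnitude depends only on r0/r and the phase only on k r, which matches the slide's split into an inverse-range term and a propagation-delay term.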
THE HEAD-RELATED TRANSFER FUNCTION
[Figure: sound source $X(f)$ reaching the left and right ears through $H_L(f)$ and $H_R(f)$, giving $X_L(f)$ and $X_R(f)$.]
$X(f)$ = Fourier transform of source pressure
$X_L(f)$ = Fourier transform of left ear pressure
$X_R(f)$ = Fourier transform of right ear pressure
$X_{ff}(f)$ = Free-field pressure at the origin
$X_L(f) = H_L(f) X_{ff}(f)$
$X_R(f) = H_R(f) X_{ff}(f)$
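THE HEAD-RELATED IMPULSE RESPONSE
[Figure: impulsive sound source $\delta(t)$ reaching the left and right ears through $h_L(t)$ and $h_R(t)$, giving $x_L(t)$ and $x_R(t)$.]
$x_L(t)$ = Left ear pressure
$x_R(t)$ = Right ear pressure
$x_{ff}(t)$ = Free-field pressure at the origin
$x_L(t) = \int_{-\infty}^{\infty} h_L(\tau) x_{ff}(t-\tau) d\tau$
$x_R(t) = \int_{-\infty}^{\infty} h_R(\tau) x_{ff}(t-\tau) d\tau$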
HRIR SOUND SYNTHESIS
[Figure: the sound signal $x(t)$ of a virtual source at azimuth $\theta$, elevation $\phi$, and range $r$ feeds two convolvers loaded with the head-related impulse responses $h_L(t)$ and $h_R(t)$, producing $x_L(t)$ and $x_R(t)$.]
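The two-convolver scheme on this slide can be sketched in discrete time: the same source signal is convolved with a left and a right impulse response. The direct-form convolution below is a minimal standard-library illustration; the toy impulse responses are invented for the example and are not measured HRIRs, and a practical system would normally use FFT-based convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample lists."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_synthesis(x, h_left, h_right):
    """Return (x_left, x_right): the source signal filtered by each ear's HRIR."""
    return convolve(x, h_left), convolve(x, h_right)


# Toy data (hypothetical): the right ear hears the source later and quieter.
x = [1.0, 0.0, -1.0]
h_left = [1.0, 0.5]
h_right = [0.0, 0.0, 0.6, 0.3]

x_left, x_right = binaural_synthesis(x, h_left, h_right)
print("left :", x_left)
print("right:", x_right)
```

To move the virtual source, a system of this kind would swap in the impulse-response pair for the new azimuth, elevation, and range.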
A STRUCTURAL MODEL
[Figure: block diagram in which the sound signal $x(t)$ of a virtual source passes, for each ear, through Head, Torso, and Room blocks, a summing node, and a Pinna block, producing $x_L(t)$ and $x_R(t)$.]
COMPUTING HRTFs BY BOUNDARY ELEMENT METHODS
Digitize with a 3-D scanner
Solve wave equation numerically
* See Kahana et al.
THE KEMAR ACOUSTIC MANIKIN
[Figure: the KEMAR manikin with the interaural axis, azimuth $\theta$, and elevation $\phi$ marked.]
ACOUSTIC HRTF MEASUREMENT
KEMAR HRIR
Azimuth = -45°, Elevation = 0°
[Figure: left-ear and right-ear impulse responses versus time, 0 to 2 ms.]
KEMAR HRTF
Azimuth = -45°, Elevation = 0°
[Figure: left-ear and right-ear response (dB, -30 to 30) versus frequency (0.1 to 20 kHz).]
RIGHT-EAR HRTF FOR KEMAR (Horizontal Plane)
[Figure, two panels: response (dB, -25 to 20) versus frequency (100 Hz to 10 kHz). FRONT panel: azimuth = 0°, 90°, -90°. BACK panel: azimuth = 90°, 180°, 270°.]
HRTF FOR KEMAR, NO PINNA (Horizontal Plane)
[Figure, two panels: response (dB, -25 to 10) versus frequency (100 Hz to 10 kHz). FRONT panel: azimuth = 0°, 90°, -90°. BACK panel: azimuth = 90°, 180°, 270°.]
HRTF ELEVATION DEPENDENCE
[Figure: HRTF magnitude (dB, -15 to 15) as a function of elevation (0 to 200 deg) and frequency (2 to 16 kHz).]
HRTF WITHOUT PINNA
[Figure: HRTF magnitude (dB, -15 to 15) as a function of elevation (0 to 200 deg) and frequency (2 to 16 kHz).]
A PINNA ON A PLANE
HRTF FOR ISOLATED PINNA
[Figure: HRTF magnitude (dB, -15 to 15) as a function of elevation (0 to 200 deg) and frequency (2 to 16 kHz).]
CONTRIBUTIONS TO THE HRTF
[Figure, three panels (Full HRTF; Head and torso; Pinna): magnitude (dB, -15 to 15) as a function of elevation (0 to 200 deg) and frequency (2 to 16 kHz).]
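THE SPHERICAL-HEAD MODEL
[Figure: block diagram in which the sound signal $x(t)$ of a virtual source at azimuth $\theta$ passes through a delay $\Delta T_L(\theta)$ and a filter $H_{HS}^{L}(\theta)$ to give $x_L(t)$, and through a delay $\Delta T_R(\theta)$ and a filter $H_{HS}^{R}(\theta)$ to give $x_R(t)$.]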
ASSESSING THE SPHERICAL HEAD MODEL
Only one parameter -- easily customized
Well focused
Good left/right position
No up/down control -- image elevated
With a head tracker: moderately externalized, little front/back confusion
Without a head tracker: internalized, usually seems to be in back
ELLIPSOIDAL-TORSO MODEL
[Figure: sound source at elevation $\phi$; block diagram with two Head Model blocks, a gain $\rho_T$, and a delay $\Delta T_T$.]
$\rho_T$ = torso reflection coefficient
$\Delta T_T$ = torso reflection delay
ASSESSING THE ELLIPSOIDAL TORSO MODEL
Five parameters; still easily customized
Provides an elevation cue: significant below 3 kHz, ineffective in median plane
Only one component of a full model
STRUCTURAL HRTF MODEL
[Figure: block diagram combining a Head Model (delay $\Delta T_H(\theta)$, filter $H_{HS}(\theta)$), a Torso Model (reflection coefficient $\rho_T$, delay $\Delta T_T(\theta,\phi)$), and a Pinna Model.]
SIMPLIFIED PINNA MODEL
[Figure: block diagram for each ear with a gain $k_P(\phi)$, a delay $\Delta T_P(\phi)$, and a fixed-pole resonator.]
SPATIAL SOUND SYSTEMS
Multichannel
Two-channel: headphones
Two-channel: crosstalk-canceled loudspeakers
MULTICHANNEL SYSTEMS
Pros: works with a large audience; no customization needed; conceptually simple
Cons: speakers must be distant; many channels needed for full 3-D; space consuming, expensive
TWO-CHANNEL: HEADPHONES
Pros: can reproduce full 3-D with only 2 channels; private and non-interfering; conceptually simple
Cons: uncomfortable for extended use; clumsy for a large audience; requires customization for full 3-D; difficult to achieve frontal externalization
[Figure: headphones driven by $x_L(t)$ and $x_R(t)$.]
TWO-CHANNEL: CROSSTALK-CANCELED LOUDSPEAKERS
Pros: can reproduce full 3-D with only 2 channels; unencumbered listening
Cons: small "sweet spot"; cannot be used with a large audience; requires customization for full 3-D; difficult to get near or rear locations
[Figure: $x_L(t)$ and $x_R(t)$ pass through inverse HRTFs before driving the two loudspeakers.]
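The "Inverse HRTFs" block on this slide can be illustrated, at a single frequency, by inverting the 2x2 matrix of speaker-to-ear transfer functions so that each ear receives only its intended binaural signal. This is a generic sketch of that idea rather than the specific design used in the talk. The path values are invented, and a practical canceller would be built per frequency bin with regularization where the matrix is nearly singular.

```python
def invert_paths(c_ll, c_lr, c_rl, c_rr):
    """Invert C = [[c_ll, c_lr], [c_rl, c_rr]], where c_xy is the transfer
    function from speaker y to ear x at one frequency."""
    det = c_ll * c_rr - c_lr * c_rl
    if abs(det) < 1e-12:
        raise ValueError("path matrix is (nearly) singular at this frequency")
    return ((c_rr / det, -c_lr / det), (-c_rl / det, c_ll / det))


def speaker_signals(x_left, x_right, paths):
    """Speaker feeds (s_left, s_right) that deliver x_left, x_right at the ears."""
    inv = invert_paths(*paths)
    s_left = inv[0][0] * x_left + inv[0][1] * x_right
    s_right = inv[1][0] * x_left + inv[1][1] * x_right
    return s_left, s_right


# Hypothetical symmetric setup: direct paths 1.0, crosstalk paths 0.4 - 0.2j.
paths = (1.0 + 0j, 0.4 - 0.2j, 0.4 - 0.2j, 1.0 + 0j)
s_l, s_r = speaker_signals(1.0 + 0j, 0j, paths)

# Check: propagate the speaker feeds back through the acoustic paths.
ear_l = paths[0] * s_l + paths[1] * s_r
ear_r = paths[2] * s_l + paths[3] * s_r
print(f"left ear: {abs(ear_l):.3f}, right ear: {abs(ear_r):.3f}")
```

Because the inverse depends on the exact paths from each speaker to each ear, it is valid only near the assumed head position, which is consistent with the small "sweet spot" listed among the cons.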
APPROACHES TO CUSTOMIZATION
Measure exact HRTF for each person: acoustic, computational
Nearest-neighbor: trial and error, anthropometry
Scale a standard HRTF: global, pinna/head/torso components
Use an adaptive model: match to anthropometry, match to exact HRTF
CHALLENGES AND OPPORTUNITIES
Frequency range (combining partial HRTFs)
Elevation perception: front/back confusion, low elevations
Range perception. Headphones: externalization (median plane, frontal). Speakers: back locations
Transducers: headphone compensation, loudspeaker "sweet spot"
Latency in dynamic systems
Room acoustics