durai slides.pdf

jh_propagation.ai

SOUND PROPAGATION

listenersound source

speed c

x

lwavelength

c = f l

frequency f (Hz)

AN INTRODUCTION TOHUMAN SPATIAL HEARING

Richard O. DudaCIPIC Interface Laboratory

UC Davis

http://phosphor.cipic.ucdavis.edu

October 12, 2000

umd00_title.ai

umd00_overview.ai

OVERVIEW

Physics of sound

Acoustic cues for sound localization Azimuth Elevation Range

Head-related transfer functions (HRTFs)

Approaches to synthesizing spatial sound

Opportunities and challenges

jh_paths.ai

MULTIPATH PROPAGATION

Reflection

Refraction

Scattering

umd00_axiom_1.ai

AXIOM I

The sound pressure at the twoear drums is a sufficient stimulus.

Producing the same sound pressure willproduce the same auditory perception.

Bone conductionAdaptationConflicting visual cuesConflicting expectations

Caveats:

umd00_axiom_2.ai

AXIOM II

Exact reproduction of the sound pressureis not necessary for producing the sameauditory perception.

The limitations of neural responsesallow different (and simpler) stimulito produce the same response.

Bandwidth (20 Hz to 20 kHz) Amplitude (1-dB resolution) Monaural phase (2-ms resolution) Latency (10-ms resolution) Spectral fine structure(critical bands, Q = 8)

Examples:

umd00_axiom_3.ai

AXIOM III

Although it is not necessary to reproduceall of the cues exactly, conflicting cuesdegrade perception.

Key engineering challenge -- find themost cost-effective approximation.

ubc_vp_coords.ai

VERTICAL-POLARCOORDINATES

qf

Plane

of

const

ant

azimu

th

r

Cone ofconstantelevation

MedianPlane

Sound source

Horizontal plane

q

ubc_ip_coords.ai

INTERAURAL-POLARCOORDINATES

f

q f

Plane of consta

nt elevation

rInterau

ral axis

Cone ofconstantazimuth

MedianPlaneHorizontal plane

Sound source

jh_azimuth_cues.ai

AZIMUTH CUES

sound source

q

ITD (Interaural Time Difference)

ILD (Interaural Level Difference)

WOODWORTH'S FORMULA

ubc_delay.ai

Contralateral Ear Ipsilateral Ear

Sound Source

a q

qa

aq

q

a sin q

DTips =- a sin q

c

ITD = a q + sin qc

DTcon =a qc

ARRIVAL TIME

ubc_delay_curve.ai

Rayleigh's solution (20% rise time)Woodworth's formula

Angle of Incidence (deg)

Arr

iva

l tim

e

(ms)

0 50 100 150 200 250 300 350 400

-0.3

-0.2

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

-0.4

jh_elevation_cues.ai

ELEVATION CUES

soundsource

f

Pinna reflections and resonances

Torso and shoulder reflections

umd00_torso_refl1.ai

TORSO REFLECTIONsoundsource

f

h

soundsource

fmin

ffmin 90o

DTT2hc

|H(f)|

f12DTT

32DTT

52DTT

72DTT ubc_pinna_nomenclature.ai

THE PINNA

Cavum concha

Cymba concha

Helix

Crus helias

Triangular fossaScaphoid fossa

LobuleIntertragal incisure

Antihelix

External auditory meatusTragus

Antitragus

ubc_pinna_modes.ai

PINNA PHENOMENA

Pinna reflections (Batteau)

Pinna resonances (Shaw)

+ +

+

++

PINNAE

ubc_pinnae.ai jh_elevation_cues.ai

RANGE CUES

Loudness (for familiar sources)

Excess ILD (for close sources)

Direct/reverberant (for distant sources)

sound source

soundsource

umd00_dynamnic_cues1.ai

HEAD-MOTION CUES ANDFRONT/BACK CONFUSION

?

?

umd00_dynamnic_cues2.ai

HEAD-MOTION CUES ANDELEVATION MAGNITUDE

aa

aa

aa

f

ITD = a2ac

ITD = a cos f2acITD = 0

umd00_other_cues.ai

OTHER CUES

Visual cues Synchronized motion Absence

Knowledge of source

Knowedge of environment

jh_ff.ai

FREE-FIELD RADIATION FROM ASPHERICAL SOURCE

X(f) = Fourier transform of source pressureXff(f)= Free-field pressure at head center

Xff = Hff X

Hff(f)= e- j k r , k =

r0r

Inverse range Propagation delay

2 p fc

Sound Source

X(f)

r0r0

r

Xff(f)

ubc_HRTF_def.ai

THE HEAD-RELATEDTRANSFER FUNCTION

X(f) = Fourier transform of source pressureXL(f)= Fourier transform of left ear pressureXR(f)= Fourier transform of right ear pressureXff(f)= Free-field pressure at the origin

XL(f)= HL(f) Xff(f) XR(f)= HR(f) Xff(f)

HR(f)

Sound Source

X(f)

XR(f)

XL(f)

HL(f)

ubc_HRIR_def.ai

THE HEAD-RELATEDIMPULSE RESPONSE

hR(t)

Sound Source

d(t)

xR(t)

xL(t)

hL(t)

xL(t) = Left ear pressurexR(t) = Right ear pressurexff(t) = Free-field pressure at the origin

xL(t) = hL(t) xff(t-t) dt xR(t) = hR(t) xff(t-t) dt- 8

8

- 88

HRIR SOUND SYNTHESIS

jh_synthesis.ai

xR(t)xL(t)

Convolver Convolver

Head-RelatedImpulse Responses

Sound SignalhL(t)

hR(t)

Azimuth q Elevation f Range r

VirtualSource

x(t)

jh_structural_model.ai

A STRUCTURAL MODEL

VirtualSource

xR(t)xL(t)

x(t)

+ +

Head Torso Room Head Torso Room

Pinna Pinna

Sound Signal

COMPUTING HRTFs BYBOUNDARY ELEMENT METHODS

Digitize with a 3-D scannerSolve wave equation numerically

ubc_bem.ai

* See Kahana et al.

THE KEMARACOUSTIC MANIKIN

ubc_kemar.ai

f

q

Interaural

Axis

Elevation

Az

imuth

umd00_hoop.ai

ACOUSTICHRTF MEASUREMENT

jh_kemar_hrir_m45.ai

KEMAR HRIR

Azimuth = -45o, Elevation = 0o

0 0.5 1 1.5 2

Left ear

Right ear

Time (ms)

jh_kemar_hrtf_m45.ai

KEMAR HRTF

Azimuth = -45o, Elevation = 0o

Frequency (kHz)

Re

spo

nse

(d

B)

-30

-20

-10

0

10

20

30

0.1 1 1020.2 20

Left ear

Right ear

ubc_ke_freq.ai

RIGHT-EAR HRTF FOR KEMAR(Horizontal Plane)

100 1000 10000Frequency (Hz)

FRONT

Re

spo

nse

(d

B)

-25

-20

-15

-10

-5

0

5

10

15

20

AZIMUTH = 0o

AZIMUTH = 90o

AZIMUTH = -90o

100 1000 10000

Re

spo

nse

(d

B)

BACK

-25

-20

-15

-10

-5

0

5

10

15

20

Frequency (Hz)

AZIMUTH = 90o

AZIMUTH = 180o

AZIMUTH = 270o

ubc_ke_np_freq.ai

HRTF FOR KEMAR, NO PINNA(Horizontal Plane)

100 1000 10000

-25

-20

-15

-10

-5

0

5

10

Frequency (Hz)

Re

spo

nse

(d

B)

FRONTAZIMUTH = 90oAZIMUTH = 0o

AZIMUTH = -90o

BACK

Frequency (Hz)

Re

spo

nse

(d

B)

100 1000 10000

-25

-20

-15

-10

-5

0

5

10AZIMUTH = 90o

AZIMUTH = 180o

AZIMUTH = 270o

umd00_full_HRTF.ai

HRTF ELEVATION DEPENDENCE

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

Elevation (deg)0 100 200

-15

-10

-5

0

5

10

15

dB

umd00_HRTF_nopinna.ai

HRTF WITHOUT PINNA

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

dBElevation (deg)0 100 200

umd00_pinplane.ai

A PINNA ON A PLANE

umd00_HRTF_pinna.ai

HRTF FOR ISOLATED PINNA

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

dBElevation (deg)0 100 200

-15

-10

-5

0

5

10

15

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

Elevation (deg)0 100 200

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

Fre

quency

(k

Hz)

2

4

6

8

10

12

14

16

-15

-10

-5

0

5

10

15

dB

Full HRTF

Head and torso

Pinna

umd00_HRTF_contributions.ai

CONTRIBUTIONS TO THE HRTF

jh_structural_model.ai

A STRUCTURAL MODEL

VirtualSource

xR(t)xL(t)

x(t)

+ +

Head Torso Room Head Torso Room

Pinna Pinna

Sound Signal

ubc_sphere_model.ai

THE SPHERICAL-HEAD MODEL

VirtualSource

q

xR(t)xL(t)

x(t)

DTL(q)

HHsL(q)

DTR(q)

HHsR(q)

jh_sphere_assess.ai

ASSESSING THESPHERICAL HEAD MODEL

Only one parameter -- easily customized

Well focused

Good left/right position

No up/done control -- image elevated

With a head tracker: Moderately externalized Little front/back confusion

Without a head tracker: Internalized Usually seems to be in back

jh_torso_reflections.ai

ELLIPSOIDAL-TORSO MODEL

soundsource

f

HeadModel

HeadModel

rT

DTT

rT

DTT

= torso reflection coefficient

= torso reflection delay

jh_ellipsoid_assess.ai

ASSESSING THEELLIPSOIDAL TORSO MODEL

Five parameters; still easily customized

Provides an elevation cue Significant below 3 kHz Ineffective in median plane

Only one component of a full model

jh_structural_model_2.ai

STRUCTURAL HRTF MODEL

HeadModel

HeadModel

TorsoModel

PinnaModel

DTH(q)

HHS

(q)

Head Model

rT

DTT(q,f)

Torso Model

jh_structural_model_3.ai

SIMPLIFIED PINNA MODEL

kP(f)

DTP(f)

Fixed-poleresonator

kP(f)

DTP(f)

Fixed-poleresonator

umd00_systems.ai

SPATIAL SOUND SYSTEMS

Multichannel

Two-channel: headphones

Two-channel: crosstalk-canceled loud speakers

umd00_systems2.ai

MULTICHANNEL SYSTEMS

Pros Works with a large audience No customization needed Conceptually simple

Cons Speakers must be distant Many channels needed for full 3-D Space consuming, expensive

umd00_systems3.ai

TWO-CHANNEL: HEADPHONES

Pros Can reproduce full 3-D with only 2 channels Private and non-interfering Conceptually simple

Cons Uncomfortable for extended use Clumsy for a large audience Requires customization for full 3-D Difficult to achieve frontal externalization

xL(t) xR(t)

umd00_systems4.ai

TWO-CHANNEL: CROSSTALK-CANCELED

LOUD SPEAKERS

Pros Can reproduce full 3-D with only 2 channels Unencumbered listening

Cons Small "sweet spot" Cannot be used with a large audience Requires customization for full 3-D Difficult to get near or rear locations

xL(t) xR(t)

Inverse HRTFs

umd00_customization.ai

APPROACHES TOCUSTOMIZATION

Measure exact HRTF for each person Acoustic Computational

Nearest-neighbor Trial and error Anthropometry

Scale a standard HRTF Global Pinna/head/torso components

Use an adaptive model Match to anthropometry Match to exact HRTF

umd00_problems.ai

CHALLENGESAND

OPPORTUNITIES

Frequency range (combining partial HRTFs)

Elevation perception Front/back confusion Low elevations

Range perception Headphones: externalization Median plane Frontal Speakers: back locations

Transducers Headphone compensation Loudspeaker "sweet spot"

Latency in dynamic systems

Room acoustics

durai slides.pdf

Documents