durai slides.pdf

Upload: srishti-agarwal

Post on 12-Mar-2016


TRANSCRIPT

  • SOUND PROPAGATION

    [Figure: a sound source and a listener separated by distance x; the wave travels at speed c with wavelength λ]

    $c = f\lambda$

    frequency f (Hz)
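
    As a rough illustration of c = f λ, the sketch below computes the wavelength at a few frequencies. The speed of sound (343 m/s, roughly air at room temperature) and the function name are assumptions for the example, not values from the slides.

```python
def wavelength_m(frequency_hz, speed_m_s=343.0):
    """Wavelength from c = f * lambda; 343 m/s is an assumed speed of sound in air."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz


# Wavelengths at the edges and middle of the audible band.
for f in (20.0, 1000.0, 20000.0):
    print(f"{f:8.0f} Hz -> {wavelength_m(f):.4f} m")
```

    Under this assumption the audible band spans wavelengths from several meters down to a couple of centimeters, which brackets the size of the head, torso, and pinna.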

    AN INTRODUCTION TO HUMAN SPATIAL HEARING

    Richard O. Duda
    CIPIC Interface Laboratory

    UC Davis

    http://phosphor.cipic.ucdavis.edu

    October 12, 2000

    OVERVIEW

    Physics of sound

    Acoustic cues for sound localization
      Azimuth
      Elevation
      Range

    Head-related transfer functions (HRTFs)

    Approaches to synthesizing spatial sound

    Opportunities and challenges

    MULTIPATH PROPAGATION

    Reflection

    Refraction

    Scattering

    AXIOM I

    The sound pressure at the two eardrums is a sufficient stimulus.

    Producing the same sound pressure will produce the same auditory perception.

    Caveats:
      Bone conduction
      Adaptation
      Conflicting visual cues
      Conflicting expectations

    AXIOM II

    Exact reproduction of the sound pressure is not necessary for producing the same auditory perception.

    The limitations of neural responses allow different (and simpler) stimuli to produce the same response.

    Examples:
      Bandwidth (20 Hz to 20 kHz)
      Amplitude (1-dB resolution)
      Monaural phase (2-ms resolution)
      Latency (10-ms resolution)
      Spectral fine structure (critical bands, Q = 8)

    AXIOM III

    Although it is not necessary to reproduce all of the cues exactly, conflicting cues degrade perception.

    Key engineering challenge -- find the most cost-effective approximation.

    VERTICAL-POLAR COORDINATES

    [Figure: sound source at range r, azimuth θ, and elevation φ. Labeled surfaces: plane of constant azimuth, cone of constant elevation, median plane, horizontal plane.]

    INTERAURAL-POLAR COORDINATES

    [Figure: sound source at range r, azimuth θ, and elevation φ. Labeled elements: interaural axis, plane of constant elevation, cone of constant azimuth, median plane, horizontal plane.]
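
    The two coordinate systems describe the same direction with different angle pairs. A minimal sketch of one way to convert between them follows; the axis convention (x forward, y along the interaural axis, z up), the sign of azimuth, and the function name are assumptions made for the example, since the slides only show the figures.

```python
import math


def vertical_to_interaural(az_deg, el_deg):
    """Convert vertical-polar (azimuth, elevation) to interaural-polar angles in degrees.

    Assumed axes: x forward, y toward one ear along the interaural axis, z up.
    """
    az, el = math.radians(az_deg), math.radians(el_deg)
    # Unit direction vector in the assumed Cartesian frame.
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    ip_az = math.asin(max(-1.0, min(1.0, y)))  # angle away from the median plane
    ip_el = math.atan2(z, x)                   # rotation about the interaural axis
    return math.degrees(ip_az), math.degrees(ip_el)


print(vertical_to_interaural(180.0, 0.0))  # a source directly behind the listener
```

    With these conventions a source directly behind has interaural-polar azimuth near 0 and elevation 180, so front and back differ only in elevation, while the azimuth alone fixes the cone of constant azimuth.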

  • AZIMUTH CUES

    [Figure: sound source at azimuth θ]

    ITD (Interaural Time Difference)

    ILD (Interaural Level Difference)

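    WOODWORTH'S FORMULA

    [Figure: head of radius a with a sound source at azimuth θ. Labels: contralateral ear, ipsilateral ear, arc length aθ, path difference a sin θ.]

    $\Delta T_{ips} = -\dfrac{a \sin\theta}{c}$

    $\Delta T_{con} = \dfrac{a\,\theta}{c}$

    $ITD = \Delta T_{con} - \Delta T_{ips} = \dfrac{a}{c}\,(\theta + \sin\theta)$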
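
    Woodworth's formula can be evaluated directly. The sketch below assumes a head radius of 8.75 cm and a sound speed of 343 m/s, neither of which is stated on the slide, and restricts the azimuth to the range where the arc-plus-chord geometry applies.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """ITD in seconds from Woodworth's formula, ITD = (a / c) * (theta + sin(theta)).

    head_radius_m and c are assumed example values; valid for |azimuth| <= 90 degrees.
    """
    if abs(azimuth_deg) > 90.0:
        raise ValueError("formula applies for azimuths between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return head_radius_m / c * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e3:.3f} ms")
```

    With these assumed values the ITD grows from zero straight ahead to roughly two thirds of a millisecond for a source on the interaural axis.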

    ARRIVAL TIME

    [Figure: arrival time (ms) versus angle of incidence (deg), comparing Rayleigh's solution (20% rise time) with Woodworth's formula]

    ELEVATION CUES

    [Figure: sound source at elevation φ]

    Pinna reflections and resonances

    Torso and shoulder reflections

    TORSO REFLECTION

    [Figure: a sound source at elevation φ above the torso. Labels: h, φ_min, φ, 90°, and the torso-reflection delay ΔT_T alongside 2h/c. A plot of |H(f)| against f has ticks at 1/(2ΔT_T), 3/(2ΔT_T), 5/(2ΔT_T), and 7/(2ΔT_T).]
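
    The tick marks on the |H(f)| plot sit at odd multiples of 1/(2 ΔT_T), which is where a single delayed reflection arrives half a cycle late and partially cancels the direct sound. The sketch below illustrates this with a one-reflection comb filter; the frequency-independent reflection coefficient and the 1 ms delay are assumptions chosen for the example, not values from the slides.

```python
import numpy as np


def torso_comb_response(freqs_hz, delay_s, reflection=0.5):
    """|H(f)| for direct sound plus one reflection: H(f) = 1 + rho * exp(-j 2 pi f dT)."""
    f = np.asarray(freqs_hz, dtype=float)
    return np.abs(1.0 + reflection * np.exp(-2j * np.pi * f * delay_s))


def notch_frequencies(delay_s, count=4):
    """Odd multiples of 1 / (2 dT), where the reflection opposes the direct sound."""
    return (2 * np.arange(count) + 1) / (2.0 * delay_s)


delay = 1.0e-3  # assumed example delay in seconds
notches = notch_frequencies(delay)
print(notches)
print(torso_comb_response(notches, delay))
```

    At each notch the magnitude drops to 1 minus the reflection coefficient, and midway between notches it rises to 1 plus the coefficient, so a change in delay shifts the whole notch pattern in frequency.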

    THE PINNA

    Cavum concha

    Cymba concha

    Helix

    Crus helias

    Triangular fossa

    Scaphoid fossa

    Lobule

    Intertragal incisure

    Antihelix

    External auditory meatus

    Tragus

    Antitragus

    PINNA PHENOMENA

    Pinna reflections (Batteau)

    Pinna resonances (Shaw)

    PINNAE

    RANGE CUES

    Loudness (for familiar sources)

    Excess ILD (for close sources)

    Direct/reverberant (for distant sources)

  • HEAD-MOTION CUES AND FRONT/BACK CONFUSION

    [Figure: two candidate source locations, each marked "?"]

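    HEAD-MOTION CUES AND ELEVATION MAGNITUDE

    [Figure: three panels, each marking a distance a on either side of the head center, with a source at elevation φ. Panel formulas: ITD = 2a/c, ITD = (2a/c) cos φ, ITD = 0.]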

    OTHER CUES

    Visual cues
      Synchronized motion
      Absence

    Knowledge of source

    Knowledge of environment

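    FREE-FIELD RADIATION FROM A SPHERICAL SOURCE

    $X(f)$ = Fourier transform of source pressure
    $X_{ff}(f)$ = Free-field pressure at head center

    $X_{ff} = H_{ff}\, X$

    $H_{ff}(f) = \dfrac{r_0}{r}\, e^{-jkr}, \qquad k = \dfrac{2\pi f}{c}$

    Inverse range: $r_0/r$. Propagation delay: $e^{-jkr}$.

    [Figure: sound source X(f), with r0 marked at the source, at distance r from the point where Xff(f) is observed]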
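
    The free-field transfer function combines an inverse-range gain with a pure propagation delay. The sketch below evaluates it at a single frequency; treating r0 as a 1 m reference distance and using 343 m/s for c are assumptions for the example.

```python
import cmath
import math


def free_field_tf(f_hz, r_m, r0_m=1.0, c=343.0):
    """H_ff(f) = (r0 / r) * exp(-j k r) with k = 2 pi f / c.

    r0_m (reference distance) and c are assumed example values.
    """
    if r_m <= 0:
        raise ValueError("range must be positive")
    k = 2.0 * math.pi * f_hz / c
    return (r0_m / r_m) * cmath.exp(-1j * k * r_m)


h = free_field_tf(1000.0, 2.0)
print(abs(h), cmath.phase(h))
```

    Doubling the range halves the magnitude, while the phase advances linearly with frequency, which is the frequency-domain signature of a delay of r/c seconds.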

    THE HEAD-RELATED TRANSFER FUNCTION

    $X(f)$ = Fourier transform of source pressure
    $X_L(f)$ = Fourier transform of left ear pressure
    $X_R(f)$ = Fourier transform of right ear pressure
    $X_{ff}(f)$ = Free-field pressure at the origin

    $X_L(f) = H_L(f)\, X_{ff}(f)$
    $X_R(f) = H_R(f)\, X_{ff}(f)$

    [Figure: sound source X(f) reaching the left and right ears through H_L(f) and H_R(f), producing X_L(f) and X_R(f)]

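    THE HEAD-RELATED IMPULSE RESPONSE

    [Figure: impulsive sound source δ(t) reaching the left and right ears through h_L(t) and h_R(t), producing x_L(t) and x_R(t)]

    $x_L(t)$ = Left ear pressure
    $x_R(t)$ = Right ear pressure
    $x_{ff}(t)$ = Free-field pressure at the origin

    $x_L(t) = \int_{-\infty}^{\infty} h_L(\tau)\, x_{ff}(t-\tau)\, d\tau$
    $x_R(t) = \int_{-\infty}^{\infty} h_R(\tau)\, x_{ff}(t-\tau)\, d\tau$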

    HRIR SOUND SYNTHESIS

    [Figure: the sound signal x(t) of a virtual source feeds two convolvers. The azimuth θ, elevation φ, and range r of the virtual source select the head-related impulse responses h_L(t) and h_R(t), and the convolver outputs are x_L(t) and x_R(t).]
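
    The synthesis scheme amounts to convolving the source signal with a left and a right HRIR. The sketch below shows the discrete-time version with NumPy; the two toy impulse responses are invented for illustration (a pure impulse and a delayed, attenuated impulse) and are not measured HRIRs.

```python
import numpy as np


def render_binaural(x, h_left, h_right):
    """Convolve a mono signal with left and right HRIRs; returns an (N, 2) array."""
    h_left = np.asarray(h_left, dtype=float)
    h_right = np.asarray(h_right, dtype=float)
    if h_left.shape != h_right.shape:
        raise ValueError("HRIRs must have the same length")
    x = np.asarray(x, dtype=float)
    x_left = np.convolve(x, h_left)    # discrete form of the convolution integral
    x_right = np.convolve(x, h_right)
    return np.stack([x_left, x_right], axis=1)


# Toy example: the right ear hears the click first, the left ear later and quieter.
click = [1.0, 0.0, 0.0]
out = render_binaural(click, h_left=[0.0, 0.0, 0.5], h_right=[1.0, 0.0, 0.0])
print(out)
```

    In a real system the HRIR pair would be looked up, or interpolated, from the requested azimuth, elevation, and range before convolving.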

    A STRUCTURAL MODEL

    [Figure: block diagram from the sound signal x(t) of a virtual source to x_L(t) and x_R(t). Blocks: Head, Torso, and Room (one set per ear), two summing nodes, and one Pinna block per ear.]

    COMPUTING HRTFs BY BOUNDARY ELEMENT METHODS

    Digitize with a 3-D scanner
    Solve wave equation numerically

    * See Kahana et al.

  • THE KEMAR ACOUSTIC MANIKIN

    [Figure: KEMAR manikin with the interaural axis, azimuth θ, and elevation φ marked]

    ACOUSTIC HRTF MEASUREMENT

    KEMAR HRIR

    Azimuth = -45°, Elevation = 0°

    [Figure: left-ear and right-ear impulse responses versus time (ms), from 0 to 2 ms]

    KEMAR HRTF

    Azimuth = -45°, Elevation = 0°

    [Figure: left-ear and right-ear response (dB), from -30 to 30 dB, versus frequency (kHz), from 0.1 to 20 kHz]

    RIGHT-EAR HRTF FOR KEMAR (Horizontal Plane)

    [Figure: two panels of response (dB), from -25 to 20 dB, versus frequency (Hz), from 100 to 10000 Hz. FRONT panel: azimuth = 0°, 90°, -90°. BACK panel: azimuth = 90°, 180°, 270°.]

    HRTF FOR KEMAR, NO PINNA (Horizontal Plane)

    [Figure: two panels of response (dB), from -25 to 10 dB, versus frequency (Hz), from 100 to 10000 Hz. FRONT panel: azimuth = 90°, 0°, -90°. BACK panel: azimuth = 90°, 180°, 270°.]

    HRTF ELEVATION DEPENDENCE

    [Figure: HRTF magnitude (dB, scale from -15 to 15) versus elevation (deg, ticks at 0, 100, 200) and frequency (2 to 16 kHz)]

    HRTF WITHOUT PINNA

    [Figure: HRTF magnitude (dB, scale from -15 to 15) versus elevation (deg, ticks at 0, 100, 200) and frequency (2 to 16 kHz)]

    A PINNA ON A PLANE

  • HRTF FOR ISOLATED PINNA

    [Figure: HRTF magnitude (dB, scale from -15 to 15) versus elevation (deg, ticks at 0, 100, 200) and frequency (2 to 16 kHz)]

    CONTRIBUTIONS TO THE HRTF

    [Figure: three panels labeled Full HRTF, Head and torso, and Pinna, each showing magnitude (dB, scale from -15 to 15) versus elevation (deg) and frequency (2 to 16 kHz)]

    A STRUCTURAL MODEL

    [Figure: block diagram from the sound signal x(t) of a virtual source to x_L(t) and x_R(t). Blocks: Head, Torso, and Room (one set per ear), two summing nodes, and one Pinna block per ear.]

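    THE SPHERICAL-HEAD MODEL

    [Figure: block diagram. The signal x(t) of a virtual source at azimuth θ passes, for each ear, through a delay ΔT_L(θ) or ΔT_R(θ) and a filter H_HS,L(θ) or H_HS,R(θ) to give x_L(t) and x_R(t).]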

    ASSESSING THE SPHERICAL HEAD MODEL

    Only one parameter -- easily customized

    Well focused

    Good left/right position

    No up/down control -- image elevated

    With a head tracker:
      Moderately externalized
      Little front/back confusion

    Without a head tracker:
      Internalized
      Usually seems to be in back

    ELLIPSOIDAL-TORSO MODEL

    [Figure: a sound source at elevation φ feeds two head-model blocks; one path is labeled with the torso terms ρ_T and ΔT_T]

    $\rho_T$ = torso reflection coefficient

    $\Delta T_T$ = torso reflection delay

    ASSESSING THE ELLIPSOIDAL TORSO MODEL

    Five parameters; still easily customized

    Provides an elevation cue
      Significant below 3 kHz
      Ineffective in median plane

    Only one component of a full model

    STRUCTURAL HRTF MODEL

    [Figure: block diagram combining two head-model blocks, a torso model, and a pinna model. Head model: ΔT_H(θ) and H_HS(θ). Torso model: ρ_T and ΔT_T(θ, φ).]

    SIMPLIFIED PINNA MODEL

    [Figure: two sets of blocks, each labeled k_P(φ), ΔT_P(φ), and fixed-pole resonator]

  • SPATIAL SOUND SYSTEMS

    Multichannel

    Two-channel: headphones

    Two-channel: crosstalk-canceled loudspeakers

    MULTICHANNEL SYSTEMS

    Pros
      Works with a large audience
      No customization needed
      Conceptually simple

    Cons
      Speakers must be distant
      Many channels needed for full 3-D
      Space consuming, expensive

    TWO-CHANNEL: HEADPHONES

    Pros
      Can reproduce full 3-D with only 2 channels
      Private and non-interfering
      Conceptually simple

    Cons
      Uncomfortable for extended use
      Clumsy for a large audience
      Requires customization for full 3-D
      Difficult to achieve frontal externalization

    [Figure: x_L(t) and x_R(t) delivered to the left and right earphones]

    TWO-CHANNEL: CROSSTALK-CANCELED LOUDSPEAKERS

    Pros
      Can reproduce full 3-D with only 2 channels
      Unencumbered listening

    Cons
      Small "sweet spot"
      Cannot be used with a large audience
      Requires customization for full 3-D
      Difficult to get near or rear locations

    [Figure: x_L(t) and x_R(t) pass through a block labeled Inverse HRTFs]
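
    One common way to realize an "inverse HRTFs" block is to invert, at each frequency, the 2x2 matrix of transfer functions from the two loudspeakers to the two ears. The slide does not say how the inverse is formed, so the sketch below is only an assumed textbook formulation with made-up numbers; it has no regularization, and the function names are hypothetical.

```python
import numpy as np


def crosstalk_canceller(H):
    """Per-frequency inverse of 2x2 speaker-to-ear transfer matrices.

    H has shape (F, 2, 2); H[f, i, j] is the path from loudspeaker j to ear i.
    """
    H = np.asarray(H, dtype=complex)
    return np.linalg.inv(H)  # raises LinAlgError if any matrix is singular


def speaker_spectra(C, ear_targets):
    """Apply the canceller to desired ear spectra of shape (F, 2)."""
    return np.einsum("fij,fj->fi", C, np.asarray(ear_targets, dtype=complex))


# Toy single-frequency example: each ear hears the far speaker at half amplitude.
H = np.array([[[1.0, 0.5], [0.5, 1.0]]])
targets = np.array([[1.0, 0.0]])           # signal wanted at the left ear only
s = speaker_spectra(crosstalk_canceller(H), targets)
print(s)
print(np.einsum("fij,fj->fi", H, s))       # pressure that actually reaches the ears
```

    Because the inverse depends on the exact paths from speakers to ears, small head movements change H and break the cancellation, which is consistent with the small "sweet spot" listed among the cons.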

    APPROACHES TO CUSTOMIZATION

    Measure exact HRTF for each person
      Acoustic
      Computational

    Nearest-neighbor
      Trial and error
      Anthropometry

    Scale a standard HRTF
      Global
      Pinna/head/torso components

    Use an adaptive model
      Match to anthropometry
      Match to exact HRTF

    CHALLENGES AND OPPORTUNITIES

    Frequency range (combining partial HRTFs)

    Elevation perception
      Front/back confusion
      Low elevations

    Range perception
      Headphones: externalization
        Median plane
        Frontal
      Speakers: back locations

    Transducers
      Headphone compensation
      Loudspeaker "sweet spot"

    Latency in dynamic systems

    Room acoustics