cosc 6326/psych6750x audition and auditory displays

Cosc 6326/Psych6750X

Audition and Auditory Displays

Use of auditory displays

Sound in information display

• speech provides a high bandwidth communication channel

• audition is a long distance sense without field of view restrictions

• Sound is useful for information display (Cohen & Wenzel 1995)

– when origin of message is a sound (voice, music)

– when message is simple and short (e.g. event markers)

– when message will not be referred to later (e.g. time)

– when message deals with events in time

– warnings or prompts (hearing is always on, no field of view issues)

– continuously changing information (e.g. countdown)

– when other systems (e.g. vision) are overloaded

– when verbal response is required (compatibility)

– when illumination or disability prevents vision (e.g. alarm clock, limited field of view, blindness)

– when the user moves from place to place (sound as a ubiquitous I/O channel)

Sonification

• In ‘visualization’ situations, ‘sonification’ of data can assist in the exploration of complex datasets

• In these applications ‘realism’ is typically not a major issue

• Sound can help interpret complex or multidimensional data; can provide an independent display dimension

• In addition to information display, in immersive displays sound contributes to:

– realism, situational awareness and presence

– ambience and emotive context

– cueing visual attention

– natural communication

– space perception

Realism and ambience

• High quality sound improves perceived ‘quality’ of visual displays

• Sounds in the environment provide vital information that contributes to situational awareness

• Persistence of sounds of objects out of field of view may help maintain object permanence

• Sound is believed to be vital for conveying emotion and ambience in movies

• Ambient sounds can be realistic or abstract (e.g. music to set mood)

• Absence of appropriate sound degrades realism

• If background sounds are not well matched to visuals, the participant may feel detached and ‘presence’ may be degraded

• Relation between presence and realism is not straightforward (later lecture)

• Audition is an omni-directional sense, so sound may help the user feel immersed in the VE

• Auditory collision cues may help navigating a VE (especially with HMDs)

Audition

Sound

• Sound is “mechanical vibrations and waves of an elastic medium, particularly in the frequency range of human hearing (16 Hz to 20 kHz)”

• Normally, the medium is air. Sound is an air pressure wave.

• Sound is usually used to describe the physical stimulus.

• Audition refers to perception.

• An auditory event is usually elicited by a sound event.

• A sinusoidal pressure wave is known as a pure tone.

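[Figure: sinusoid x(t) plotted against t, with period T0 = 1/f0 marked]

• Sinusoid: $x(t) = A \cos(2\pi f_0 t + \phi)$

– $A$ is amplitude

– $f_0$ is frequency

– $\phi$ is phase; it is related to the time shift of the peak

– $T_0 = 1/f_0$ is period

• Wavelength: $\lambda = c / f$, where $c$ is the speed of sound in the medium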
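As a minimal sketch of the pure-tone definition above, the following Python samples a sinusoid and computes its period and wavelength. The function names, the 8000 Hz sample rate and the speed of sound (343 m/s, roughly air at 20 °C) are assumptions chosen for illustration and do not come from the lecture.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def pure_tone(amplitude, f0, phase, duration, sample_rate=8000):
    """Sample x(t) = A cos(2*pi*f0*t + phase) for `duration` seconds."""
    n_samples = int(round(duration * sample_rate))
    return [amplitude * math.cos(2 * math.pi * f0 * n / sample_rate + phase)
            for n in range(n_samples)]


def period(f0):
    """Period T0 = 1/f0 in seconds."""
    return 1.0 / f0


def wavelength(f, c=SPEED_OF_SOUND):
    """Wavelength lambda = c/f in metres."""
    return c / f


# A 10 ms burst of a 440 Hz tone with zero phase starts at its peak value A.
tone = pure_tone(amplitude=1.0, f0=440.0, phase=0.0, duration=0.01)
print(len(tone), period(440.0), wavelength(440.0))
```

With these assumed values, the audible range of 16 Hz to 20 kHz corresponds to wavelengths from roughly 21 m down to under 2 cm, which is why head and pinna size matter so differently at low and high frequencies.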

Dimensions of sound

• Harmonic content: pitch, melody, harmony, waveshape, timbre, vibrato

• Timing: duration, tempo, rhythm

• Loudness, envelope

• Spatial: azimuth, elevation, distance

• Ambience: resonance, reverberation, spaciousness

• Representation: literal, auditory icons, abstract

• Perceptual and physical dimensions are analogous but distinct

– pitch and frequency (directly related for pure tones)

– loudness and intensity

– timbre and complexity

Matlin and Foley, Sensation and Perception

Kandel et al, Principles of Neural Science

Physiology and psychophysics

• Cochlea performs mechanical spectral analysis of sound signal

• Pure tone induces traveling wave in basilar membrane

– maximum mechanical displacement along membrane is function of frequency (place coding)

• Displacement of basilar membrane changes with compression and rarefaction (frequency coding)

Matlin and Foley, Sensation and Perception

Kandel et al, Principles of Neural Science

Perception of pitch

• Along the basilar membrane, hair cell response is tuned to frequency

– each neuron in the auditory nerve responds to acoustic energy near its preferred frequency

– preferred frequency is place coded along the cochlea. Frequency coding is believed to have a role at lower frequencies

• Higher auditory centers maintain frequency selectivity and are ‘tonotopically mapped’

• Pitch is related to frequency for pure tones.

• For periodic or quasi-periodic sounds the pitch typically corresponds to inverse of period

• Some sounds have no perceptible pitch (e.g. clicks, noise)

• Sounds can have the same pitch but differ in spectral content and temporal envelope; this difference is heard as timbre
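The point that pitch tends to follow the inverse of the period can be illustrated with a tone complex whose fundamental is missing. The sketch below uses autocorrelation to find the repetition period; this is a convenient signal-processing stand-in chosen for illustration, not a claim about how the auditory system computes pitch. The function names, sample rate and search range are all assumptions.

```python
import math


def harmonic_complex(freqs, duration, sample_rate):
    """Sum of unit-amplitude cosines at the given frequencies (Hz)."""
    n_samples = int(round(duration * sample_rate))
    return [sum(math.cos(2 * math.pi * f * n / sample_rate) for f in freqs)
            for n in range(n_samples)]


def estimate_pitch(signal, sample_rate, f_min=50.0, f_max=1000.0):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag.
        value = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Harmonics 2, 3 and 4 of 100 Hz: there is no energy at 100 Hz itself,
# but the waveform still repeats every 10 ms.
fs = 8000
x = harmonic_complex([200.0, 300.0, 400.0], duration=0.1, sample_rate=fs)
pitch = estimate_pitch(x, fs)
print(pitch)
```

The estimate lands on the repetition rate of the waveform rather than on any of the component frequencies, which matches the inverse-of-period rule stated above.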

Perception of loudness

• Intensity is measured on a logarithmic scale in decibels

• Range from threshold to pain is about 120 dB-SPL

• Loudness is related to intensity but also depends on many other factors (attention, frequency, harmonics, …)
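As a small worked example of the decibel scale, the sketch below converts between RMS sound pressure and dB SPL. It assumes the conventional reference pressure of 20 µPa for sound in air; the function names are illustrative only.

```python
import math

P_REF = 20e-6  # Pa, conventional reference pressure for dB SPL in air


def spl_db(pressure_rms):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(pressure_rms / P_REF)


def pressure_from_spl(level_db):
    """RMS pressure in pascals for a level given in dB SPL."""
    return P_REF * 10.0 ** (level_db / 20.0)


# The roughly 120 dB range from threshold to pain spans a factor of
# one million in pressure (and 10**12 in intensity).
print(spl_db(P_REF), pressure_from_spl(120.0))
```

This gives only the physical level. As the last bullet notes, perceived loudness also depends on frequency and other factors that this calculation does not capture.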

Spatial hearing

• Auditory events can be perceived in all directions from observer

• Auditory events can be localized internally or externally at various distances

• Audition also supports motion perception

– change in direction

– Doppler shift

• Ability to localize depends on sound source and environment

– a tone in reverberant room is difficult to locate in time and space

– a click in an anechoic chamber, on the other hand, is precisely located and time limited
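The Doppler shift mentioned above as a motion cue can be sketched for the simplest case: a source moving directly towards or away from a stationary listener in still air. The function name and the speed of sound value are assumptions for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def observed_frequency(source_frequency, radial_speed, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener.

    radial_speed is the source speed along the line to the listener in m/s:
    positive when approaching, negative when receding. It must be below c.
    """
    if radial_speed >= c:
        raise ValueError("radial_speed must be less than the speed of sound")
    return source_frequency * c / (c - radial_speed)


approaching = observed_frequency(1000.0, 20.0)   # frequency is raised
receding = observed_frequency(1000.0, -20.0)     # frequency is lowered
print(approaching, receding)
```

A source passing the listener therefore produces a downward glide in frequency, which is the cue a renderer would need to reproduce for moving objects.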

Auditory Scene Analysis

• Process of separating out the different sources present in the environment

• Detection and segregation of distinct sources

• Grouping of sounds in spatial and temporal proximity into single streams

Cocktail party effect

• In environments with many sound sources it is easier to process auditory streams if they are separated spatially

• Spatial sound techniques can help in sound discrimination, detection and speech comprehension in busy immersive environments

Spatial Auditory Cues

• Two basic types of head-centric direction cues

– binaural cues

– spectral cues

Binaural Directional Cues

• When a source is located eccentrically it is closer to one ear than the other

– sound arrives later and weaker at the far ear

– head ‘shadow’ also weakens sound arriving at the opposite ear

• Binaural cues are robust but ambiguous

http://headwize.com/tech/aureal1_tech.htm

• Interaural time differences (ITD)

– ITD increases with directional deviation from the median plane. It is about 600 µs for a source located directly to one side.

– Humans are sensitive to as little as 10 µs ITD. Sensitivity decreases with ITD.

– For a given ITD, phase difference is a linear function of frequency

– For pure tones, phase based ITD is ambiguous

– At low to moderate frequencies phase difference can be detected. At high frequencies the ITD in the signal envelope can be used.

– ITD cues appear to be integrated over a window of 100-200 ms (binaural sluggishness, Kollmeier & Gilkey, 1990)

• Interaural intensity differences (IID)

– With lateral sources head shadow reduces intensity at opposite ear

– Effect of head shadow most pronounced for high frequencies.

– IID cues are most effective above about 2000 Hz

– IIDs of less than 1 dB are detectable. At 4000 Hz a source located at 90° gives about 30 dB IID (Matlin and Foley, 1993)
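To make the ITD figures above concrete, the sketch below uses Woodworth's rigid-sphere approximation, ITD = (a/c)(θ + sin θ). The lecture does not name this model; it is swapped in here as a common textbook approximation, and the head radius (8.75 cm) and speed of sound (343 m/s) are assumed values. The second and third functions illustrate why phase-based ITD is linear in frequency and becomes ambiguous for pure tones.

```python
import math

HEAD_RADIUS = 0.0875    # m, assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def woodworth_itd(azimuth_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds for a distant source on a rigid spherical head.

    azimuth_deg is measured from the median plane, 0 to 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (a / c) * (theta + math.sin(theta))


def interaural_phase(itd, frequency):
    """Interaural phase difference in radians: 2*pi*f*ITD, linear in f."""
    return 2 * math.pi * frequency * itd


def ambiguity_frequency(itd):
    """Frequency at which the phase difference reaches pi (half a cycle).

    Above this, a pure tone's phase no longer says which ear leads.
    """
    return 1.0 / (2.0 * itd)


itd_side = woodworth_itd(90.0)
print(itd_side * 1e6, ambiguity_frequency(itd_side))
```

Under these assumptions the ITD for a source directly to one side comes out a little above 600 µs, in line with the figure quoted above, and the phase cue for that source becomes ambiguous somewhat below 1 kHz.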

Goldstein, Sensation and Perception

Ambiguity and Lateralization

• These binaural cues are ambiguous. The same ITD/IID can arise from sources anywhere along a ‘cone of confusion’

• Spectral cues and changes in ITD/IID with observer/object motion can help disambiguate

• When directional cues are used in headphone systems, sounds are lateralised left versus right but seem to emanate from inside the head (not localised)

• Also, for near sources (less than 1 m) there is significant IID due to differences in distance to each ear, even at lower frequencies (Shinn-Cunningham et al 2000)

• Intersection of these ‘near field’ IID curves with cones of confusion constrains them to toroids of confusion

Spectral Cues

• The pinnae (outer ears) and head shadow each ear and create frequency-dependent attenuation of sounds that depends on the direction of the source

• Because the pinnae are relatively small, spectral cues are effective predominantly at higher frequencies (i.e. above 6000 Hz)

• Direction estimation requires separation of spectrum of sound source from spectral shaping by the pinnae

• Shape of the pinnae shows large individual differences, which are reflected in differences in spectral cues

Distance Cues

• anechoic

– intensity decreases with distance

– attenuation is higher at high frequency

– confound with spectrum and intensity of source

• Near field IID

http://headwize.com/tech/aureal1_tech.htm

• reverberation

– ratio of direct to reverberant energy indicates distance wrt environment

– reverberation pattern indicates ‘spaciousness’ of the environment

– reverberation is more realistic but can degrade localisation, speech recognition …
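The intensity and direct-to-reverberant cues above can be sketched numerically. The first function assumes a point source in a free field (inverse-square law), which the lecture does not state explicitly; the second simply expresses an energy ratio in decibels. Names and example values are illustrative.

```python
import math


def level_change_db(r_ref, r):
    """Change in direct-sound level (dB) when the source moves from
    distance r_ref to distance r, assuming inverse-square spreading."""
    return -20.0 * math.log10(r / r_ref)


def direct_to_reverberant_db(direct_energy, reverberant_energy):
    """Direct-to-reverberant energy ratio in dB."""
    return 10.0 * math.log10(direct_energy / reverberant_energy)


# Each doubling of distance lowers the direct sound by about 6 dB.
print(level_change_db(1.0, 2.0), level_change_db(1.0, 10.0))

# If the reverberant energy is taken as roughly constant across the room
# (an assumption), the ratio falls as the direct energy falls with distance.
print(direct_to_reverberant_db(1.0, 0.25), direct_to_reverberant_db(0.25, 0.25))
```

The level cue alone is confounded with how loud the source is, as the list notes, whereas the ratio does not depend on source level, which is one reason reverberation helps distance judgments.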

Visual-Auditory Interactions

• Auditory cues associated with visual targets can cue visual attention

• Latency for audition is less than vision

• A sound associated with visual target

– can speed visual search

– can reduce response times

– can facilitate saccadic eye movements

– can cue attention outside the field of view

• Ventriloquism and visual capture

– When a visual and auditory source are grouped, the sound is usually perceived in the direction of the visual target

Auditory/Aural Displays

• Headphone displays

– Precise independent control of inputs to each ear.

– Individual display.

– Closed ear type can exclude external sounds. Reduces interference from external sources; simplifies AR systems.

– Entail an encumbrance.

– Diotic, dichotic (stereo) and spatialised displays

– Head fixed frame of reference. Display needs to be head tracked to register with virtual world.

• Speaker systems

– Simpler, less encumbrance, multi-user

– Cannot ‘occlude’ real world sounds but can sometimes mask

– Complication with echoes and cross-coupling between channels

– Interference from/with visual displays

– World frame of reference.

– Subwoofer allows for deep bass. Could augment headphones

Spatialised audio

• simple ITD, IID cues in a display lateralize a sound. Sound is not ‘externalized’

• spatialised audio: generate most of the spatial cues in real world environment using signal processing

• with appropriate modeling of sound sources and user tracking can provide a compelling illusion of spatial sound in a VE
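The simple ITD/IID case in the first bullet can be sketched as a delay and a gain applied to one channel of a mono signal. This is a deliberately bare illustration: the function name, the whole-sample delay and the choice to place the source on the left are assumptions, and over headphones the result would be lateralized rather than externalized, as noted above.

```python
def lateralize(mono, itd_samples, iid_db):
    """Return (left, right) for a source on the listener's left.

    The right (far) ear signal is delayed by itd_samples and attenuated
    by iid_db decibels; both channels are zero-padded to the same length.
    """
    gain = 10.0 ** (-iid_db / 20.0)
    padding = [0.0] * itd_samples
    left = list(mono) + padding                      # near ear: direct copy
    right = padding + [gain * s for s in mono]       # far ear: later, weaker
    return left, right


# At 44.1 kHz, a 600 microsecond ITD is about 26 samples.
left, right = lateralize([1.0, 0.5, 0.25], itd_samples=26, iid_db=6.0)
print(len(left), len(right), right[26])
```

Spatialised audio replaces this single delay and gain with direction-dependent filtering, which is what the HRTF approach provides.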

http://www.engr.sjsu.edu/~knapp/HCIROD3D/3D_sys1/binaural.htm

Binaural recording







• Head related transfer function (HRTF)

– describes how sound at a given location is transformed (by pinnae etc.) as it travels to the ear, as a function of frequency

– function of source direction and distance and frequency (4D)

– equivalent to the Fourier transform of the response to an impulse source at the desired position

– IID and ITD as well as spectral cues are incorporated (interaural differences in HRTF)

[Figure: panels for source distances of 0.15 m and 1.0 m (Shilling & Shinn-Cunningham 2001)]

• To simulate a source at a given location

– correct HRTF for response of the speaker system

– convolve source with impulse response corresponding to corrected HRTF.

– multiple sources possible by adding up HRTF transformed signals
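The convolve-and-sum procedure above can be sketched in a few lines. The impulse responses used here are toy placeholders (a pure delay and gain per ear), not measured HRTF data, and the function names are assumptions. Correction for the playback system's response is omitted.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(sources):
    """Mix several sources into one (left, right) pair.

    sources is a list of (signal, hrir_left, hrir_right) tuples, where the
    two impulse responses correspond to that source's position.
    """
    rendered = [(convolve(sig, hl), convolve(sig, hr))
                for sig, hl, hr in sources]
    length = max(max(len(l), len(r)) for l, r in rendered)
    left, right = [0.0] * length, [0.0] * length
    for l, r in rendered:
        for n, v in enumerate(l):
            left[n] += v
        for n, v in enumerate(r):
            right[n] += v
    return left, right


# Toy impulse responses: the far ear hears a delayed, weaker copy.
click = [1.0]
near, far = [1.0, 0.0, 0.0], [0.0, 0.0, 0.5]
left, right = render_binaural([(click, near, far),    # source on the left
                               (click, far, near)])   # source on the right
print(left, right)
```

A real-time system would typically use FFT-based convolution and swap impulse responses as the tracked head moves; the direct form is shown only because it makes the operation explicit.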

• To measure HRTF

– place microphones in ear canals

– measure microphone response to short clicks at various locations

– correct for response characteristics of microphones

• Lengthy, painstaking process.

• Storage requirements for dense sampling

Cohen and Wenzel, 1995

• Limitations in practice:

– sampling: often one distance and limited number of directions

– interpolated for other locations

– generic versus individualized HRTFs (front/back confusion and elevation errors)

– HRTF is a characteristic of the user and does not model effects of environment.

– need to track head position. Delay can be problematic.

HRTF measurement using model head (KEMAR)

Room Modeling

• Can model the effects of reverberation, echoes etc. with a room transfer function

– varies with listener and source position

– can have very long response

– combinatorially impractical

• There has been effort to develop efficient methods for acoustic modeling of rooms

• Improves realism and distance estimation but difficult for real-time immersive VEs

Shilling & Shinn-Cunningham 2001

Speaker Systems

• Spatialised audio complicated by fact that both ears hear each speaker and that reverberation will occur

• Effectiveness is sensitive to speaker placement

• Stereo speakers: sound seems to be localised between the speakers

• increasing number of speakers increases ability to localise sounds (e.g. 5.1 surround sound systems)

• more complex schemes are possible using DSP but very challenging (‘ambisonics’)

– cancel interaural cross-talk based on HRTF corresponding to speaker location

– computations are complex, not robust and must be done in real time if head tracked

Auditory Rendering

• Auditory modeling/rendering of VEs

– sampling

– synthesis of complex sounds: spectral, physical models, granular synthesis

– Filtering: HRTFs, reverberation, room modeling

– Object occlusion, air absorption, Doppler motion
