Virtualized Audio as a Distributed Interactive Application
Peter A. Dinda, Northwestern University
Access Grid Retreat, 1/30/01
Overview
• Audio systems are pathetic and stagnant
• We can do better: Virtualized Audio (VA)
• VA can exploit distributed environments
• VA demands interactive response
[Margin note: "What I believe / Why I care"]
Traditional Audio (TA) System
[Diagram: Performer in a Performance Room produces Sound Field 1, captured by Microphones → Mixer → Amp → Loudspeakers (producing Sound Field 2 in a Listening Room) or Headphones → Listener]
TA Mixing and Filtering
[Diagram: signal chain. Performer → Performance Room Filter → Microphone Sampling → Mixing (reduction) → Amp Filter → either (a) Loudspeaker Filter → Listening Room Filter → Listener's Location and HRTF → Perception of Loudspeaker-Reproduced Sound, or (b) Headphones → Perception of Headphone-Reproduced Sound. Compare with: Performer → Performance Room Filter → Listener's Location and HRTF → Perception of Real Sound]
Virtualized Audio (VA) System
[Diagram: Performer in a Performance Room produces Sound Field 1, captured by Microphones → Separation → Virtual Performers (virtual speakers) in a Virtual Listening Room with a Listener at a Virtual Location (Virtual Sound Field 3) → Auralization → either Headphones + HRTF, or Amp + HSS into a Real Listening Room (Sound Field 2) → Listener]
VA: Filtering, Separation, and Auralization
[Diagram: Performer → Performance Room Filter → Microphone Sampling → Sound Separation (the "VA Reverse Problem") → Auralization (the "VA Forward Problem") → either (a) Headphones + listener's HRTF, or (b) Amp Filter + HSS → Loudspeaker Filter → Listening Room Filter → Listener's Location and HRTF. Both paths end in a Perception of Virtualized Audio that is nearly identical to the Perception of Real Sound]
The Reverse Problem: Source Separation
[Diagram: microphone signals, microphone positions, and other inputs feed Recovery Algorithms in the human space; outputs are sound source signals, sound source positions, and room geometry and properties]
• Microphone signals result from the sound source signals and positions, the microphone positions, and the geometry and material properties of the room.
• We seek to recover these underlying producers of the microphone signals.
The Reverse Problem
• Blind source separation and deconvolution
• A statistical estimation problem
• Can "unblind" the problem in various ways:
  – Large numbers of microphones
  – Tracking of performers
  – Separating room deconvolution from source localization
  – Directional microphones
  – Phased arrays
• Potential to trade off computational requirements against specialized equipment
• Much existing research to be exploited
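The separation idea can be sketched in miniature. The talk does not specify an algorithm; this is a hypothetical FastICA-style demonstration under strong simplifying assumptions: instantaneous (non-convolutive) mixing rather than room reverberation, two synthetic sources, and a made-up mixing matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Two independent, non-Gaussian sources (stand-ins for performers).
s1 = np.sign(np.sin(2 * np.pi * 3 * np.linspace(0, 1, n)))  # square wave
s2 = rng.uniform(-1, 1, n)                                  # uniform noise
S = np.vstack([s1, s2])

# Instantaneous mixing: each "microphone" hears a weighted sum of sources.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
X = A @ S

# Whiten the microphone signals (zero mean, identity covariance).
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

# FastICA fixed-point iteration (kurtosis contrast), with deflation.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wz = w @ Z
        w_new = (Z * wz ** 3).mean(axis=1) - 3 * w
        for j in range(i):                      # decorrelate from found rows
            w_new -= (w_new @ W[j]) * W[j]
        w_new /= np.linalg.norm(w_new)
        done = abs(abs(w_new @ w) - 1) < 1e-10
        w = w_new
        if done:
            break
    W[i] = w

Y = W @ Z  # recovered sources (up to order and sign)

# Each recovered signal should correlate strongly with one true source.
corr = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(corr.round(2))
```

Real rooms convolve rather than just mix, which is why the slide lists deconvolution, microphone arrays, and tracking as ways to make the problem tractable.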
Transducer Beaming
[Diagram: a single transducer emitting a wave, with markings at the wavelength scale λ illustrating how a transducer that is large relative to λ produces a directional beam]
Phased Arrays of Transducers
[Diagram: a phased array of many small transducers spaced at the wavelength scale λ, alongside its physical equivalent: a single large transducer]
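The beaming effect the two slides above illustrate can be shown numerically with delay-and-sum steering. All parameters here (element count, spacing, tone frequency) are illustrative choices, not values from the talk.

```python
import numpy as np

c = 343.0      # speed of sound in air (m/s)
f = 1000.0     # test tone frequency (Hz); wavelength ~0.34 m
n_elem = 9     # transducers in a line array
spacing = 0.1  # element spacing (m), below half a wavelength
fs = 48000.0
t = np.arange(0, 0.05, 1 / fs)  # exactly 50 cycles of the tone

# Element positions along the array axis, centered at the origin.
positions = spacing * (np.arange(n_elem) - (n_elem - 1) / 2)

def array_response(steer_deg, arrive_deg):
    """Mean output power of a delay-and-sum array steered to steer_deg
    for a plane wave arriving from arrive_deg."""
    steer = np.deg2rad(steer_deg)
    arrive = np.deg2rad(arrive_deg)
    out = np.zeros_like(t)
    for x in positions:
        # Physical wavefront delay at this element, minus the
        # compensating steering delay applied electronically.
        delay = x * (np.sin(arrive) - np.sin(steer)) / c
        out += np.sin(2 * np.pi * f * (t - delay))
    out /= n_elem
    return np.mean(out ** 2)

on_axis = array_response(30, 30)    # steered at the wave: coherent sum
off_axis = array_response(30, -40)  # steered away: near cancellation
print(on_axis, off_axis)            # ~0.5 on-axis, far smaller off-axis
```

Steering is purely electronic: changing the per-element delays moves the beam with no moving parts, which is the "physical equivalent" the slide refers to.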
The Forward Problem: Auralization
[Diagram: sound source signals, sound source positions, room geometry/properties, and listener positions feed Auralization Algorithms, which output listener signals for a listener wearing headphones (or an HSS scheme)]
• In general, all inputs are a function of time
• Auralization must proceed in real time
Ray-Based Approaches to Auralization
• For each sound source, cast some number of rays, then collect the rays that intersect listener positions
  – Geometrical simplification for rectangular spaces and specular reflections
• Problems
  – Non-specular reflection requires exponential growth in the number of rays simulated
  – Most interesting spaces are not rectangular
Wave Propagation Approach
• Captures all propagation properties except absorption
• Absorption adds first-order partial-derivative terms

∂²p/∂t² = c² (∂²p/∂x² + ∂²p/∂y² + ∂²p/∂z²)
Method of Finite Differences
• Replace differentials with differences
• Solve on a regular grid
• Simple stencil computation (2D example in Fortran)
• Do it really fast

  pdo i=2,Y-1
    pdo j=2,X-1
      workarray(m0,j,i) = (.99) * (
     $     R*temparray(j+1,i)
     $   + 2.0*(1-2.0*R)*temparray(j,i)
     $   + R*temparray(j-1,i)
     $   + R*temparray(j,i+1)
     $   + R*temparray(j,i-1)
     $   - workarray(m1,j,i) )
    endpdo
  endpdo
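The same leapfrog stencil, p_new = R·(four neighbors) + 2(1−2R)·p − p_old with a 0.99 damping factor, can be transliterated into numpy. Grid size, step count, R, and the impulse initial condition are illustrative choices.

```python
import numpy as np

nx = ny = 64
R = 0.25  # (c*dt/dx)^2; 2D stability requires R <= 0.5
p_old = np.zeros((ny, nx))
p = np.zeros((ny, nx))
p[ny // 2, nx // 2] = 1.0  # impulse at the grid center

for step in range(100):
    p_new = np.zeros_like(p)  # boundaries stay zero (rigid walls)
    # Vectorized form of the doubly nested stencil loop.
    p_new[1:-1, 1:-1] = 0.99 * (
        R * (p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2])
        + 2.0 * (1.0 - 2.0 * R) * p[1:-1, 1:-1]
        - p_old[1:-1, 1:-1]
    )
    p_old, p = p, p_new  # rotate time levels (the m0/m1 trick)

print(float(np.abs(p).max()))  # impulse has spread into a decaying ring
```

The two-level m0/m1 indexing in the Fortran version avoids copying arrays between time steps; the tuple swap above does the same.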
How Fast Is Really Fast?
• O(xyz(kf)^4 / c^3) stencil operations per second are necessary
  – f = maximum frequency to be resolved
  – x, y, z = dimensions of the simulated space
  – k = grid points per wavelength (2 to 10 typical)
  – c = speed of sound in the medium
• For air, with k = 2, f = 20 kHz, and x = y = z = 4 m, we need to perform 4.1 x 10^12 stencil operations per second (~30 floating-point operations each)
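The quoted figure checks out directly. The formula follows from a grid spacing of c/(kf) (so each dimension has x·kf/c points) and a time step proportional to 1/(kf); assuming c = 343 m/s for air:

```python
# Back-of-the-envelope check of the stencil-rate formula.
k = 2            # grid points per wavelength
f = 20e3         # maximum frequency to resolve (Hz)
x = y = z = 4.0  # room dimensions (m)
c = 343.0        # speed of sound in air (m/s)

ops_per_sec = x * y * z * (k * f) ** 4 / c ** 3
print(f"{ops_per_sec:.2e}")  # ~4.1e12, matching the slide
```

At ~30 floating-point operations per stencil, that is on the order of 10^14 FLOPS, which motivates the remote-supercomputer architecture later in the talk.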
LTI Simplification
• Consider the system as LTI: Linear and Time-Invariant
• We can characterize an LTI system by its impulse response h(t)
• In particular, for this system there is an impulse response from each sound source i to each listener j: h(i,j,t)
• Then for sound sources s_i(t), the output m_j(t) that listener j hears is m_j(t) = Σ_i h(i,j,t) * s_i(t), where * is the convolution operator
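The mixing formula m_j(t) = Σ_i h(i,j,t) * s_i(t) is a direct sum of convolutions. A sketch for one listener and two sources, using toy two-tap echo responses rather than simulated room responses:

```python
import numpy as np

fs = 8000
n = fs // 2  # half a second
t = np.arange(n) / fs
s = [np.sin(2 * np.pi * 440 * t),   # source 1 signal s_1(t)
     np.sin(2 * np.pi * 660 * t)]   # source 2 signal s_2(t)

# Hypothetical impulse responses h(i, j, t) from each source to listener j:
# a direct-path tap plus one echo tap each.
h = [np.zeros(400), np.zeros(400)]
h[0][0], h[0][250] = 1.0, 0.4
h[1][30], h[1][300] = 0.8, 0.3

# m_j(t) = sum over sources of h(i,j,t) * s_i(t)  (* = convolution)
m = sum(np.convolve(s_i, h_i)[:n] for s_i, h_i in zip(s, h))
print(m.shape)
```

The per-sample cost is one FIR filter per (source, listener) pair, which is vastly cheaper than re-running the wave simulation, as the next slides exploit.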
LTI Complications
• h(i,j,t) must be recomputed whenever space properties or sound source positions change
• The system is not really LTI
  – A moving sound source produces no Doppler effect
• Provided that sound source and listener movements, and space property changes, are slow, the approximation should be close, though
• Possible "virtual source" extension
Where Do the h(i,j,t)'s Come From?
• Instead of using input signals as boundary conditions to the wave propagation simulation, use impulses (Dirac deltas)
• Only run the simulation when an h(i,j,t) needs to be recomputed due to movement or a change in space properties
Exploiting a Remote Supercomputer or the Grid
[Diagram: on the remote supercomputer or the Grid, a finite difference simulation of the wave equation takes a room model plus source and listener positions, produces impulse responses, and feeds FIR/IIR filter estimation. On the client workstation, the resulting room filters (one per source) process the streams from sources 1 and 2, followed by the listener's HRTF, for playback over headphones via an amp]
Interactivity in the Forward Problem
[Diagram: the auralization pipeline again (sound source signals, sound source positions, room geometry/properties, and listener positions into Auralization Algorithms, listener signals out to a listener wearing headphones), highlighting that the listener's movements close an interactive feedback loop through the inputs]
Full Example of Virtualized Audio
[Diagram: three separate human spaces, each with its own microphones and recovery algorithms solving the "Reverse Problem" (microphone signals, microphone positions, and other inputs in; sound source signals, sound source positions, and room geometry and properties out). The three sets of outputs are combined and fed as sound source signals, sound source positions, and room geometry/properties into the Auralization Algorithms]
VA as a Distributed Interactive Application
• Disparate resource requirements
  – Low-latency audio input/output
  – Massive computation requirements
• Low-latency control loop with a human in the loop
• Response time must be bounded
• Adaptation mechanisms
  – Choice between full simulation and the LTI simplification
    • Number of listeners
  – Frequency limiting versus delay
  – Truncation of impulse responses
  – Spatial resolution of impulse response functions
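One of the adaptation knobs above, impulse-response truncation, trades audible accuracy for computation: a shorter h means a shorter convolution per output sample. A sketch with a synthetic exponentially decaying response standing in for a simulated one:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 8000

# Synthetic room response: noise with a ~0.1 s exponential decay.
h = rng.normal(size=fs) * np.exp(-np.arange(fs) / (0.1 * fs))
s = rng.normal(size=2 * fs)  # stand-in source signal

full = np.convolve(s, h)
short = np.convolve(s, h[:fs // 8])  # keep only the first 1/8 of h

# Relative error from dropping the late reverberant tail.
ref = full[:len(s)]
err = np.linalg.norm(ref - short[:len(s)]) / np.linalg.norm(ref)
print(round(err, 3))
```

Here the truncated filter costs an eighth of the multiply-adds while its error is bounded by the energy remaining in the discarded tail, which is the kind of quantifiable tradeoff an adaptive runtime can steer.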
Conclusion
• We can and should do better than the current state of audio
• Lots of existing research to exploit
  – The basis of virtualized audio
• Trade off computation against specialized hardware
• VA is a distributed interactive application

The VA forward problem is currently being implemented at Northwestern.