audio engineering society convention papernjb/research/aes129_balloon.pdf · audio engineering...

12
Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4–7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42 nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society. Estimating Room Impulse Responses from Recorded Balloon Pops Jonathan S. Abel 1 , Nicholas J. Bryan 1 , Patty P. Huang 1 , Miriam A. Kolar 1 , and Bissera V. Pentcheva 2 1 Center for Computer Research in Music and Acoustics (CCRMA), Department of Music, Stanford University, Stanford, CA, 94305, USA 2 Department of Art & Art History, Stanford University, Stanford, CA, 94305, USA Correspondence should be addressed to Jonathan S. Abel ([email protected]) ABSTRACT Balloon pops are convenient for probing the acoustics of a space, as they generate relatively uniform radiation patterns and consistent “N-wave” waveforms. However, the N-wave spectrum contains nulls which impart an undesired comb-filter-like quality when the recorded balloon pop is convolved with audio. Here, a method for converting recorded balloon pops into full audio bandwidth impulse responses is presented. Rather than directly processing the balloon pop recording, an impulse response is synthesized according to the echo density and frequency band energies estimated in running windows over the balloon pop. Informal listening tests show good perceptual agreement between measured room impulse responses using a loudspeaker source and a swept sine technique, and those derived from recorded balloon pops. 1. INTRODUCTION The impulse response of a room can be measured in many ways and with varying degrees of accu- racy. A preferred method, for its precision and high signal-to-noise ratio, is the swept-sine tech- nique [1, 2], which requires a suite of recording and playback equipment, including a powerful loud- speaker capable of output over a wide bandwidth. Even when using a handheld recorder to capture the loudspeaker response, the material and logistical re- quirements of this technique are significant. To min- imize equipment requirements, the excitation signal can be manually generated using a starter pistol or an orchestral whip; however, a starter pistol is in- appropriate or prohibited in some contexts, and a whip has an undesirable radiation pattern. Both are difficult to trigger remotely, away from reflect- ing objects.

Upload: others

Post on 14-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Audio Engineering Society

Convention PaperPresented at the 129th Convention

2010 November 4–7 San Francisco, CA, USA

The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that havebeen peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced fromthe author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takesno responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio

Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rightsreserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from theJournal of the Audio Engineering Society.

Estimating Room Impulse Responses fromRecorded Balloon Pops

Jonathan S. Abel1, Nicholas J. Bryan1, Patty P. Huang1, Miriam A. Kolar1, and Bissera V. Pentcheva2

1Center for Computer Research in Music and Acoustics (CCRMA), Department of Music, Stanford University,Stanford, CA, 94305, USA

2Department of Art & Art History, Stanford University, Stanford, CA, 94305, USA

Correspondence should be addressed to Jonathan S. Abel ([email protected])

ABSTRACTBalloon pops are convenient for probing the acoustics of a space, as they generate relatively uniform radiationpatterns and consistent “N-wave” waveforms. However, the N-wave spectrum contains nulls which impartan undesired comb-filter-like quality when the recorded balloon pop is convolved with audio. Here, a methodfor converting recorded balloon pops into full audio bandwidth impulse responses is presented. Rather thandirectly processing the balloon pop recording, an impulse response is synthesized according to the echodensity and frequency band energies estimated in running windows over the balloon pop. Informal listeningtests show good perceptual agreement between measured room impulse responses using a loudspeaker sourceand a swept sine technique, and those derived from recorded balloon pops.

1. INTRODUCTION

The impulse response of a room can be measuredin many ways and with varying degrees of accu-racy. A preferred method, for its precision andhigh signal-to-noise ratio, is the swept-sine tech-nique [1, 2], which requires a suite of recordingand playback equipment, including a powerful loud-speaker capable of output over a wide bandwidth.Even when using a handheld recorder to capture the

loudspeaker response, the material and logistical re-quirements of this technique are significant. To min-imize equipment requirements, the excitation signalcan be manually generated using a starter pistol oran orchestral whip; however, a starter pistol is in-appropriate or prohibited in some contexts, and awhip has an undesirable radiation pattern. Bothare difficult to trigger remotely, away from reflect-ing objects.

Page 2: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

A logistically simple alternative to these acousticmeasurement scenarios is to use a balloon pop asthe excitation source and a handheld recorder asthe recording device. While not as repeatable assome other methods, this system provides a numberof advantages: it is physically compact and easilyportable. Its power requirements are minimal, theradiation pattern of the popped balloon is relativelyuniform, and the “N-wave” time response of the bal-loon pop is known (determined by the balloon diam-eter) [3].

Deriving an accurate room impulse response froma recorded balloon pop is somewhat problematic,however. While a number of important room acous-tic features can be inferred from the raw response,convolving the balloon pop response with other au-dio material—the process used to impart the mea-sured room characteristics on outside sound sources,for auralization purposes, for example—produces anundesirable comb-filter-like quality. This sonic dis-tortion arises from the pattern of nulls present inthe balloon pop waveform N-wave Fourier transform.The nulls are not part of the spatial impulse responsebeing measured, but an artifact of the excitationmechanism. Because the nulls are deep and locatedat frequencies dependent on the balloon pop radia-tion direction, it is not possible to invert the filteringimposed by the N-wave, and other means must beemployed to produce an impulse response suitablefor use in a convolutional reverberator.

In this paper we present a method for convertingrecorded balloon pops into full audio bandwidth im-pulse responses that can be convolved with audiomaterial to create auralizations of the acoustic effectof the measured space that do not contain artifactsof the balloon pop excitation. Rather than process-ing the balloon pop recording to remove its N-wavecharacteristics, the approach we take is to analyzethe recorded balloon response and then synthesizean artifact-free impulse response by generating andfiltering a pattern of echoes perceptually similar tothose seen in the raw balloon response.

As an example application, we present our conver-sion of balloon pop impulse responses and a con-volutional reverberation auralization using the es-timated response as implemented for the Icons ofSound Project, an interdisciplinary exploration ofthe Byzantine-era perceptual experience within the

religious center Hagia Sophia. In our application,balloon pops recorded at this 1,500 year-old WorldHeritage Site [4] are converted to synthesized im-pulse responses, which are then used to create anauralization of the Cherubikon, a hymn performedthere in the 6th century, in order to reconstruct ahistorical performance scenario. It is worth notingthat in this work, the application necessitated themeasurement method; Hagia Sophia is exemplaryof a venue whose public context fundamentally con-strains the acoustic measurement method. Otherhistorical, archaeological and naturally formed sitessimilarly require a rapid measurement process withlittle equipment. Thus, the balloon pop/handheldrecorder scenario is mandated, and a functional im-pulse response recovery method necessary.

This paper is organized as follows: §2 explains thedynamics of balloon pops. §3 presents methods forestimating the time evolution of the echo densityover the course of the impulse response and dis-cusses methods for synthesizing that echo densityprofile. §4 examines the spatial character of the bal-loon pop response, describing methods for estimat-ing and synthesizing the time behavior of the inter-channel cross-correlation (synthesizing stereo pulsesequences matching the measured balloon recordingcross-correlation as it evolves over time). §5 discussesthe estimation of the balloon pop energy as a func-tion of time and frequency, and methods for imprint-ing that energy on the different frequency bands ofthe synthesized echo sequence. §6 give results includ-ing synthesis of an impulse response using data froma balloon pop recording made in Memorial Churchat Stanford University as it compares to an impulseresponse measured in the same location using a loud-speaker and swept-sine technique. Finally, balloonpops recorded in Hagia Sophia in Istanbul are con-verted into impulse responses having reverberationtime and other acoustical parameters consistent withthose reported in the literature [5, 6, 7].

2. BALLOON POP DYNAMICS

Fundamental to our study is the nature of the bal-loon pop acoustic disturbance. Bursting balloonsgenerate N-wave type pressure waveforms, as putforth by Deihl and Carlson in [3]. This can be seenby considering the geometry shown in Fig. 1 (a). Aspherical balloon of radius ρ bursts in the presence

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 2 of 12

Page 3: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

Fig. 1: Balloon Geometry and N-Wave. (a) Spheri-cal balloon of radius ρ inflated to pressure p and mi-crophone positioned a distance R from the ballooncenter. (b) An N-wave waveform 2ρ/c in durationradiated from the bursting balloon.

of a microphone positioned a distance R from theballoon center. Assuming the balloon skin instantlyvanishes at time t = 0, a spherical region havingpressure p remains.

Over time, half of the pressure p/2 radiates outwardfrom the origin while the other half propagates in-ward, producing an inverting reflection from the bal-loon center. The pressure along an arc initially atradius x ≤ ρ inside the balloon will arrive at a mi-crophone a radius R from the balloon center at time(R − x)/c, where c is the speed of sound in air,and with pressure px/2R due to spherical spread-ing. Note that the pressure seen at the microphonevaries linearly with time, and this trend is contin-ued with the arrival of the portion of the pressureinitially traveling inward toward the balloon center.The pressure waveform seen at the microphone isthen

n(t) =

{ pρ

2R(R − ct), ct ∈ [R − ρ,R + ρ],

0, otherwise,(1)

as plotted in Fig. 1 (b).

As an experimental example, a balloon pop recordedin Memorial Church is plotted on a logarithmic time

100

101

102

103

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

time (ms)

ampl

itude

Fig. 2: Balloon Pop Recording, Memorial Church,Stanford University. The recorded waveform is plot-ted on a logarithmic time axis to show details of theearly response. The direct path and floor reflectionare visible as N-wave waveforms.

scale in Fig. 2. The recorded pressure waveform, de-noted b(t), reveals individual arrivals of the balloonpop response onset, giving way to a diffuse late-fieldresponse. The direct path arrival and reflection fromthe marble floor have the character of the idealizedN-wave response, but with a prolonged onset andrelease likely caused by the finite time taken for theballoon to burst. Note that the leading edges of thedirect path and floor reflection arrivals are somewhatpeaked. This feature may be the result of the nonlin-ear propagation mechanism responsible for the for-mation of shock waves in which high-pressure wave-form regions travel faster than low-pressure wave-form regions [8].

Of interest in using balloon pop recordings as im-pulse responses is the spectrum N(ω) of the balloonpop waveform n(t). The magnitude transform ofthe direct path arrival of the balloon pop is shownin Fig. 3. The N-wave structure of the balloon pophas spectral nulls at dc and a harmonic series of fre-quencies at the inverse balloon diameter (expressedin frequency units),

N(ω) =pc

2R

ν sin(ν) − cos(ν)jν2

, (2)

where ν = ωc/ρ is the frequency ω scaled by halfthe sound propagation time across the balloon. It isthese nulls which impart an unwanted comb-filter-like character. The overall spectrum of the balloonpop response in an acoustic space does not containthis precise series of nulls, as different radiation di-rections from the balloon are associated with differ-

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 3 of 12

Page 4: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

10−1

100

101

−50

−40

−30

−20

−10

0

10

frequency (kHz)

mag

nitu

de (

dB)

Fig. 3: Measured N-wave Transform Magnitude.The transform magnitude of the direct path ar-rival for the Memorial Church balloon pop responseof Fig. 2.

ent diameters (except in the case of spherical bal-loons). Because of this, the nulls do not overlap,except at dc, and disallow any form of linear inversefiltering.

3. ROOM ECHO DENSITY

We now examine the measured balloon pop record-ing to produce a sequence of full audio bandwidthechoes. The echo sequence is generated in such a wayas to create the same psychoacoustic impression asthe original sequence of arrivals.

3.1. Balloon Pop Recording Echo Density

For a pattern of echoes of at least a modest echodensity, the psychoacoustic impression created doesnot depend on the precise timing of the arrivals, butrather the overall echo density [9]. The approach wetake here is not to identify and replace each of theN-wave arrivals present in the recorded balloon re-sponse with full bandwidth pulses, but rather to syn-thesize a sequence of pulses having the same evolvingdensity as that of the measured balloon pop responseb(t).

The normalized echo density (NED), introducedin [10], measures the echo density of a sequence ona scale ranging from 0 to around 1 by comparingthe percentage of sequence outliers to that expectedfor Gaussian noise. Over a sliding reverberation im-pulse response window, the NED η(t) is the fractionof impulse response taps which lie outside the win-

dow standard deviation:

η(t) =1/erfc(1/

√2)

2∆ + 1

t+∆∑τ=t−∆

1{h2(τ) > σ2(t)}, (3)

where h(t) is the reverberation impulse response (as-sumed to be zero mean), 2∆+1 is the window lengthin samples, σ(t) is the window standard deviation,

σ2(t) =1

2∆ + 1

t+∆∑τ=t−∆

h2(τ), (4)

1{·} is the indicator function (returning one whenits argument is true and zero otherwise), anderfc(1/

√2) .= 0.3173 is the expected fraction of sam-

ples lying outside a standard deviation from themean for a Gaussian distribution. In the presenceof a few energetic arrivals, relatively few responsesamples will lie outside the window standard devia-tion and the NED will be small, near zero. Gaussiannoise, on the other hand, produces NED values near1. It turns out that NED is a reliable indicator ofperceived echo density [9] and is used here to esti-mate the perceived echo density along the recordedballoon pop response b(t).

3.2. Echo Sequence Synthesis

In [11], an expression is presented relating the nor-malized echo density η(t) to the absolute echo den-sity (AED) e(t) measured in echoes per second,

η(t) =e(t)

e(t) + 1/δ, (5)

where δ represents the typical echo duration.

To generate a sequence of echoes, the balloon popnormalized echo density ηb(t) is estimated and con-verted to absolute echo density. Rearranging (5), wehave

e(t) =ηb(t) c/2ρ

1 − ηb(t). (6)

Note that the balloon pop response echoes are N-wave doublets, not simple pulses. However, by inte-grating the response b(t), the N-wave functions areconverted into parabolic pulses as seen in the idealcase in Fig. 4,∫ t

0

n(τ)dτ = max{

0,pρ

2c

[1 −

(ct − R

ρ

)2]}, (7)

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 4 of 12

Page 5: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

Fig. 4: Integrated N-wave waveform. The time inte-gral of an N-wave is a parabolic pulse 2ρ/c in width.

19.5 20 20.5 21 21.5−0.5

0

0.5

time (ms)

ampl

itude

Fig. 5: Recorded Balloon Pop N-wave (solid), Inte-grated N-wave (dashed).

and in the measured response in Fig. 5.

Taking δ as duration of a full bandwidth pulse, andfixing e(t), ??etab) can be used to relate the balloonresponse NED ηb(t) to the “impulse response” NED,ηh, by

ηh =ηb

(1 − ηb) · 2ρ/δc + ηb(8)

where 2ρ/cδ is the ratio of the balloon and full band-width pulse durations, which are on the order of10−20 for the balloons and loudspeakers we’ve used.The relationship between the balloon and full band-width NEDs is shown in Fig. 7. When ηb is near zero,indicating a sparse echo sequence, so is ηh. When ηb

is close to one, indicating Gaussian statistics, so isηh. For intermediate values of ηb, ηh is smaller thanηb.

The integrated balloon response NED profile ηb(t)is shown in Fig. 6, along with the NED profile es-timated for a full bandwidth pulse sequence, de-noted ηh(t). (Note that the computed full band-width NED was held fixed at ηh = 0.995 after firstobtaining that value.) The balloon in full bandwidthprofiles take on their typical character for room re-

101

102

103

0

0.5

1

1.5

NE

D

time (ms)

Fig. 6: Recorded, Estimated Normalized EchoDensity Profiles. The NED of the integratedrecorded (Memorial Church) balloon pop is shown(solid) along with the NED estimated for a 20kHz-bandwidth impulse response (dashed).

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.2

0.4

0.6

0.8

1

normalized echo density

norm

aliz

ed e

cho

dens

ity

Fig. 7: Normalized Echo Density, Echo DurationRelationship. The NED of a sequence of short-duraction echoes is plotted as a function of theNED of the same sequence but with longer-durationechoes. Curves for echo duration ratios of 1, 2, 5, 10,and 20 (top to bottom) are shown.

sponses, starting near zero at the room response on-set, and increasing to near 1 as the room becomesmixed. In this way, the perceived echo density of aloudspeaker-generated impulse response is expectedto be less than that of the balloon response until theroom is fully mixed with an NED near 1.

Once the AED profile e(t) is known, the process de-scribed in [10] is used to synthesize a pattern of ar-rivals. Pulses are generated at Poisson-distributedtime intervals according to the estimated echo den-sity profile e(t). Rather than generating an echo pat-tern covering the entire balloon response, the firstfew clear arrivals (typically the direct path and floorreflection) are placed by hand. The NED of the re-maining portion of the room response is used to gen-erate the echo sequence, denoted p(t). At this stageof the processing, the pulse amplitudes are drawn

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 5 of 12

Page 6: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

from a Gaussian distribution and scaled accordingto the local echo density so as to give a roughly con-stant energy profile. Fig. 8 shows an example syn-thesized pattern of full bandwidth echoes p(t) basedon the balloon response b(t) of Fig. 2.

100

101

102

103

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

ampl

itude

time (ms)

Fig. 8: Synthesized Memorial Church Echo Pattern.

4. ROOM SPATIAL CHARACTER

If the balloon pop response is recorded with stereoor binaural microphones, the perceived spatial char-acter should be retained in the synthesized stereoimpulse response. A simple method for doing so isdescribed below, while more sophisticated and accu-rate methods are available in [12].

The cross-correlation between the left and rightchannels in the mid and high frequencies is indica-tive of the spatial nature of the room response [5, 15].Small cross-correlation values near zero indicate en-ergy independently arriving at the listener’s ears,creating a sense of envelopment. By contrast, cross-correlations near 1 are produced in the presence ofenergy arriving coherently from a specific direction.In this work, we estimate the inter-channel cross-correlation in a running window over the recordedresponse and impose that correlation on synthesizedstereo echo sequences.

The inter-channel cross-correlation evaluated at lagl over a running unit-sum window w(t) is

C(t) =

t+∆∑τ=t−∆

w(τ) b1(τ) b2(τ + l)

[t+∆∑

τ=t−∆

w(τ) b21(τ)

] 12[

t+∆∑τ=t−∆

w(τ) b22(τ)

] 12,

(9)

101

102

103

−0.5

0

0.5

1

time (ms)

corr

elat

ion

101

102

103

−1

−0.5

0

0.5

1

time (ms)

ampl

itude

Fig. 9: Memorial Church Cross-Correlation. Thetime evolution of the cross-correlation computedover a 50ms running window along the stereo record-ing of a balloon pop in Memorial Church is shown(lower) along with the left and right channel impulseresponses overlayed (upper).

Typically, the cross-correlation is evaluated at lagsin the range l ∈ [−1, 1]ms, and the maximum overthat range is taken as the cross-corelation coefficient.

To synthesize a stereo echo pattern having the mea-sured time evolution of the correlation coefficient(and correlation maximizing lag), three statisticallyindependent sequences may be generated accordingto the NED profile. Two sequences are first as-signed to the two channels. The third sequenceis added to the first channel in proportion to thecross-correlation square root. It is then delayed,according to the maximizing correlation lag, scaledand added to the second channel. While this proce-dure generates the desired correlation at the desiredlag, it is computationally cumbersome, and we sug-gest simply imprinting the measured zero-lag cross-correlation on the synthesized response.

A stereo balloon pop response was recorded inMemorial Church using an ear-width spaced pair ofomnidirectional microphones aligned parallel to thefront and back walls of the church. The balloon waspositioned on a line between and perpendicular tothe microphones.

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 6 of 12

Page 7: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

Fig. 9 shows the stereo balloon response on a loga-rithmic time axis; note the direct path, initial floorreflection, and increasing arrival density. Also shownis the zero-lag cross-correlation C(t). The cross-correlation at the arrival time of the balloon popexcitation is near 1.0, consistent with the nearlyoverlapping direct path and initial floor reflectionchannel responses. After this portion of the balloonresponse, a sizable fraction of the energy is returnedfrom the sides of the church including from the bal-conies, and various wooden pew surfaces. The cross-correlation decreases accordingly, fluctuating arounda value of 0.2 where it remains throughout the late-field. In many rooms, one would expect the impulseresponse cross-correlation to decrease to near zeroin the late-field. In this example, however, a pat-tern of energy reflecting between the front and backsurfaces of the church arrives at both microphonesnearly simultaneously, producing similar waveformsand a positive correlation.

In contrast, a balloon pop recorded in Hagia Sophia,via two omnidirectional microphones (worn by therecording engineer just above the ears) has the cross-correlation profile shown in Fig. 10. While the directpath and first reflection are highly correlated, aftera short time the correlation coefficient decreases tonear zero where it remains.

To synthesize a stereo impulse response having aprescribed correlation, we start with the stereo im-pulse pattern generated according to the echo den-sity above. The channels of the echo sequence aregenerated independently, and are expected to have acorrelation near zero throughout their duration. Togenerate a stereo pair having the prescribed cross-correlation, each pair of independent samples is mul-tiplied by the matrix M,

M =

[cos θ sin θ

sin θ cos θ

]. (10)

By setting θ = 0, M becomes the identity ma-trix, and the signals are unchanged. By settingθ = π/4, M = 11>/

√2 (1 being a column of 1s),

a 2-channel monophonic signal is created. A littlealgebra shows that to achieve a cross-correlation ofC(t), θ(t) should be set to

θ(t) =arcsinC(t)

2. (11)

101

102

103

−0.5

0

0.5

1

time (ms)

corr

elat

ion

101

102

103

−1

−0.5

0

0.5

1

time (ms)

ampl

itude

Fig. 10: Hagia Sophia Cross-Correlation. Thetime evolution of the correlation coefficient com-puted over a 50ms running window along the stereorecording of a balloon pop in Memorial Church isshown (lower) along with the left and right channelimpulse responses overlayed (upper).

5. TIME-FREQUENCY RESPONSE

Starting with echo sequences having the proper echodensity and cross-correlation profiles, we next im-print the spectral behavior of the recorded balloonresponse. This is done by splitting the balloon re-sponse into a set of third-octave frequency bands us-ing a perfect amplitude reconstruction zero-phase fil-ter bank. Band energies are estimated over time forthe balloon response and synthesized echo sequence.The balloon response energy profiles are applied astime windows to the echo sequence bands. Finally,the windowed echo sequence bands are equalized sothat the direct arrival is white and then summed.

5.1. Filter Bank

A perfect amplitude reconstruction zero-phase fil-ter bank is used to split the balloon pop responseb(t) and synthesized echo sequence p(t) into fre-quency bands bk(t) and pk(t) respectively. The filterbank is constructed via a cascade of band-splittingsquared Butterworth filters as shown in Fig. 11 [13].The band filter magnitude transfer functions areshown in Fig. 12. By using 3th-order Butterworthband splitting filters applied forward and backward

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 7 of 12

Page 8: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

in time, zero-phase responses and transitions of60dB/octave are achieved.

Fig. 11: Filter Bank Signal Flow Architecture.

Fig. 12: Filter Bank Magnitude Response. Themagnitude responses of third-octave band analy-sis/synthesis filters are shown.

5.2. Band Energy Profile Analysis

Once separated into bands, frequency-dependent en-ergy profiles for the balloon response β2

k(t) and echosequence ν2

k(t) can be computed by smoothing thesquare band signals b2

k(t) and p2k(t) over running win-

dows,β2

k(t) = bk(t)2 ∗ w(t), (12)

ν2k(t) = pk(t)2 ∗ w(t), (13)

where w(t) is a positive smoothing window havingunit sum,

∑t w(t) = 1. A 10ms-long Hanning win-

dow is used so as to reveal short-duration featuresin the original balloon pop band energies.

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5−60

−50

−40

−30

−20

−10

0

mag

nitu

de (

dB)

time (seconds)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5−60

−50

−40

−30

−20

−10

0

mag

nitu

de (

dB)

time (seconds)

Fig. 13: Measured, Extrapolated Band Energies.The dB band energies, smoothed over 50ms Han-ning windows, are shown for the balloon responserecorded in Memorial Church (upper). The bandenergies extrapolated to below the estimated noisefloor (lower).

As an example, consider the Memorial Church bal-loon pop of Fig. 2 (a); its spectrogram appearsin Fig. 14. The balloon pop smoothed band energiesβ2

k(t) are shown in Fig. 13. The high frequenciesdecay relatively quickly, whereas the low frequenciespersist.

Note that the band energies do not decay fully, butarrive at a frequency-dependent noise floor (around -40dB). So as to provide a more natural decay beyondthe measurement noise floor, the band energies β2

k(t)are extrapolated using the method described in [14].Fig. 15 and Fig. 13 (b) show the extended spectro-gram and band energy profiles respectively.

5.3. Band Energy Profile Synthesis

The different filter bank bands of the echo sequencepk(t) are then imprinted with the band amplitudesby first windowing with the balloon pop amplitudeprofiles normalized by those of the noise sequence,

γk(t) = βk(t) / νk(t). (14)

Then windowed band echo sequences pk(t)γk(t) aresummed to form a synthesis b̃(t) of the ballon pop

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 8 of 12

Page 9: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

Fig. 14: Balloon Pop Spectrogram. The spectro-gram above was generated from an unprocessed bal-loon pop recorded in Memorial Church.

Fig. 15: Extended Balloon Pop Spectrogram. Theballoon pop of Fig. 14 was extended to below itsnoise floor using the technique described in [14].

recording,b̃(t) =

∑k

pk(t) γk(t). (15)

This synthesis b̃(t) is expected to be perceptuallysimilar to the original recording b(t), save for the ef-fects of having full bandwidth arrivals. To generatethe final impulse response estimate h̃(t), the win-dowed band echo sequences are equalized to whitenthe direct path arrival, and summed:

h̃(t) =∑

k

pk(t) γk(t) αk, (16)

where αk are the inverse direct path arrival bandgain estimates.

6. METHOD EXAMPLES AND RESULTS

6.1. Memorial Church Measurement

Balloon pop responses were recorded in MemorialChurch using a pair of ear-spaced omnidirectionalmicrophones and balloons attached to 3-meter-longsticks held roughly 2.5 meters above the ground andapproximately 6 meters from the microphone pair.The spectrogram of the recorded, extended balloonresponse b(t) is shown in Fig. 16 (upper). A syn-thesized balloon response b̃(t)was created; its spec-trogram appears in Fig. 16 (lower). The spectro-grams of the recorded and synthesized balloon popsare similar, and informal listening tests confirm theirperceptual equivalence, including during the psy-choacoustically important balloon pop response on-set.

A baseline room impulse response h(t) was measuredusing a swept-sine technique and a Mackie HR824loudspeaker positioned near the ballon burst loca-tion. The measured impulse response h(t) spec-trogram is shown in Fig. 17 (upper). A synthe-sized impulse response h̃(t) was created by equal-izing the balloon response estimate b̃(t); its spectro-gram also appears in Fig. 17 (lower). The spectro-grams show similar behavior. However, since theloudspeaker energy is radiated mainly out of thefront of the speaker, and the balloon pop radiatesits energy rather uniformly, the direct path arrivalis relatively more powerful in the case of the loud-speaker impulse response measurement. Future im-pulse response measurements using a dodecahedralloudspeaker source would provide a better basis for

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 9 of 12

Page 10: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

comparison with the omnidirectional balloon pop ra-diation pattern.

Fig. 16: Memorial Church Measured and EstimatedBalloon Response Spectrograms. The extended bal-loon pop response (upper) and its full bandwidthestimate (lower) are shown.

Fig. 17: Measured and Estimated Impulse Re-sponse Spectrograms. The impulse response spec-trogram measured using a loudspeaker and sweptsinusoid (upper) is shown along with an estimatedimpulse response (lower) based on a balloon poprecording from a similar location.

101

102

103

0

0.5

1

1.5

NE

D

time (ms)

Fig. 18: Measured and Estimated NED Profiles.The NED profile of the integrated Memorial Churchballoon pop response (solid) is shown along with theNED of the synthesized echo sequence designed tohave the same arrival density (dashed-dot), and theNED profile of the loudspeaker-generated impulseresponse (dashed).

0 50 100 150 200 250 300 350 400−1

−0.5

0

0.5

1

time (ms)

ampl

itude

0 50 100 150 200 250 300 350 400

−0.1

−0.05

0

0.05

0.1

0.15

time (ms)

ampl

itude

Fig. 19: Recorded Balloon Pop and Loudspeaker-Generated Impulse Response Onsets. The recordedballoon pop onset (upper) shows a number of spec-ular reflections not present in the loudspeaker-generated impulse response (lower) Note the differ-ent amplitude scales—both impulse responses arenormalized to have unit maximum amplitude.

NED profiles for the impulse response measurementh(t) and balloon pop recording b(t) are shown alongwith the NED profile of the synthesized pulse se-quence p(t) in Fig. 18. The NED profiles of themeasured and estimated impulse responses closelytrack each other, and sit slightly lower than the bal-loon pop NED until the late-field start, as expected.It’s worth examining the onsets of the measured im-pulse response h(t) and the recorded balloon pop

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 10 of 12

Page 11: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

b(t), shown in Fig. 19. Note that while impulse re-sponse h(t) contains a few specular reflections afterthe late-field onset in the 100–170 ms time inter-val, the balloon pop recording has a large number ofspecular reflections in the same time interval. Thisseems to be a result of the balloon’s uniform radia-tion pattern generating acoustic paths involving thechurch dome and floor, which are only weakly ex-cited by the loudspeaker. There arrivals likely affectthe local spectrum, but as they appear in a high echodensity background (with NED values near 1.0) theirpresence does not affect the perceived echo density.

6.2. Hagia Sophia Measurement

Fig. 20 shows a spectrogram of a balloon poprecorded in Hagia Sophia by one of the authors,Pentcheva, in May 2010, with omnidirectional mi-crophones affixed to her hair, just above her ears. Aballoon was popped by an assistant, roughly 3 me-ters in front of Pentcheva. In the spectrogram andballoon response, the direct path arrival and severalearly reflections are visible, as is a very long decayassociated with the voluminous space of 70×80 me-ters and nearly 50 meters in height [4]. Stray noisesmade in the building were noted near 3 seconds, 6seconds, and after 8 seconds, corrupting the recordedballoon response.

The balloon response b(t) was processed using ouranalysis/synthesis method into an impulse responseh̃(t) and its spectrogram is shown in Fig. 21. Asthis impulse response was synthesized, neither theunwanted background noises nor the handheldrecorder noise floor are present. Fig. 22 showsestimated T30 decay times for the partially-filledmuseum; these are consistent with those publishedby Gade et. al. [15]. Estimates of C80 not presentedhere are also consistent with that described in [15].

7. SUMMARY

Balloon pop impulse response recordings offer cer-tain advantages over other acoustic measurementmethods, and in notable contexts are the onlymeans; however, the pattern of nulls present in theballoon pop waveform “N-wave” Fourier transformcreates undesired effects when this response is con-volved with other material. Our method for convert-ing recorded balloon pops into full audio bandwidth

Fig. 20: Hagia Sophia Balloon Pop Spectrogram.The spectrogram of a balloon pop recorded in Ha-gia Sophia in the presence of background noise andsound events.

Fig. 21: Hagia Sophia Estimated Impulse Re-sponse. The spectrogram of the impulse responseestimated from the balloon pop shown in Fig. 20.

impulse responses yields an accurate and clarifiedroom impulse response that can be convolved withsample audio material to create auralizations of theacoustic effect of the measured space, free of distor-tions from the excitation process.

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 11 of 12

Page 12: Audio Engineering Society Convention Papernjb/research/AES129_Balloon.pdf · Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4{7 San Francisco,

Abel et al. RIR Estimation from Balloon Pops

10−1

100

101

10−1

100

101

frequency (kHz)

60 d

B d

ecay

tim

e (s

econ

ds)

Fig. 22: Hagia Sophia (Partially Occupied) Esti-mated 60 dB Decay Time.

8. ACKNOWLEDGMENTS

This research was enabled by the Stanford Presiden-tial Fund for Innovation in the Humanities, grantedfor “Icons of Sound: Architectural Psychoacousticsin Byzantium”, and the Stanford Institute for Cre-ativity and the Arts (SiCa).

9. REFERENCES

[1] A. Farina, “Simultaneous measurement of im-pulse response and distortion with a swept-sinetechnique,” Convention Paper 5093. Presentedat the 108 Convention, Paris, February 19-22,2000.

[2] A. Farina, R. Ayalon.“Recording concert hallacoustics for posterity,” In the Proceedings ofthe 24th AES Conference on Multichannel Au-dio, Banff, Canada, June 26-28 2003.

[3] D. T. Deihl and F. R. Carlson Jr. “N Wavesfrom Bursting Balloons,” American Journal ofPhysics, Vol. 36, No. 5. May, 1968.

[4] B. Pentcheva. Icons of Sound: Hagia Sophiaand the Byzantine Choros. Chapter 2 in “TheSensual Icon: Space, Ritual, and the Senses inByzantium,” Penn State Press, 2010.

[5] C. A. Weitze, J. H. Rindel, C. L. Chris-tensen, and A. C. Gade. “The AcousticalHistory of Hagia Sophia revived throughComputer Simulation,” Rome, 2001. Re-trieved from http://www.odeon.dk/pdf/ForumAcousticum2002.pdf, April 7, 2009.

[6] C. A. Weitze, C. L. Christensen, J. H.Rindel. “Comparison between In-situ record-ings and Auralizations for Mosques andByzantine Churches,” May 25, 2010. Re-trieved from http://www.odeon.dk/pdf/NAM\%202002\%20paper.pdf.

[7] Odeon A/S. “Acoustics in an-cient church, Hagia Sofia,” Re-trieved from http://www.odeon.dk/acoustics-ancient-church-hagia-sofia,April 26, 2010.

[8] Mark F. Hamilton, David T. Blackstock. Non-linear Acoustics: Theory and Applications.Academic Press, San Diego, 1998.

[9] P. Huang, J. S. Abel, H. Terasawa, and J.Berger. “Reverberation Echo Density Psychoa-coustics,” Convention Paper 7583. Presented atthe 125th Convention, New York, October 2-5,2008.

[10] J. S. Abel and P. Huang. “A simple, robustmeasure of reverberation echo density,” Con-vention Paper 6985. Presented at the 121stConvention, San Francisco, October 5-8, 2006.

[11] P. Huang and J. S. Abel. “Aspects of Reverber-ation Echo Density,” Convention Paper 7163.Presented at the 123rd Convention, New York,October 5-8, 2007.

[12] Y. Hur, J. S. Abel, Y. Park, D. H.Youn.“Microphone Array Synthetic Reconfig-uration,” Presented at the 127th Audio Engi-neering Society Convention, New York, October9-12, 2010.

[13] P. P. Vaidyanathan. Multirate systems and filterbanks. Prentice Hall, Englewood Cliffs, 1993.

[14] N. J. Bryan, J. S. Abel. “Methods For Ex-tending Room Impulse Responses Beyond TheirNoise Floor.” Presented at the 129th Audio En-gineering Society Convention, San Francisco,November 4-7, 2010.

[15] A. C. Gade, “Acoustics in Halls for Speech andMusic,” Springer Handbook of Acoustics, Ed. T.D. Rossing, 301-350 (2007).

AES 129th Convention, San Francisco, CA, USA, 2010 November 4–7

Page 12 of 12