
Audio Engineering Society

Convention Paper
Presented at the 131st Convention
2011 October 20-23 New York, USA

This paper was peer-reviewed as a complete manuscript for presentation at this Convention. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Beamforming regularization, scaling matrices and inverse problems for sound field extrapolation and characterization: Part I - Theory

Philippe-Aubert Gauthier 1,2, Eric Chambatte 1,2, Cedric Camier 1,2, Yann Pasco 1,2, and Alain Berry 1,2

1 Groupe d'Acoustique de l'Université de Sherbrooke, Univ. de Sherbrooke, Sherbrooke, J1K 2R1 Canada

2 Centre for Interdisciplinary Research in Music, Media and Technology, McGill Univ., Montreal, H3A 1E3 Canada

Correspondence should be addressed to Philippe-Aubert Gauthier

([email protected])

ABSTRACT

Sound field extrapolation (SFE) is aimed at the prediction of a sound field in an extrapolation region using a microphone array in a measurement region. For sound environment reproduction purposes, sound field characterization (SFC) aims at a more generic or parametric description of a measured or extrapolated sound field using different physical or subjective metrics. In this paper, a recently introduced SFE method is presented and further developed. The method is based on an inverse problem formulation combined with a beamforming matrix in the discrete smoothing norm of the cost function. The results obtained from the SFE method are applied to SFC for subsequent sound environment reproduction. A set of classification criteria is proposed to distinguish simple types of sound fields on the basis of two simple scalar metrics. A companion paper presents the experimental verifications of the theory presented in this paper.

1. INTRODUCTION

For spatial sound reproduction technologies based on physical simulation such as Wave Field Synthesis (WFS) [1, 2], the underlying hypothesis is that the immersion of a listener in a physical reconstruction of a target sound field will lead to an appropriate sound perception over a large listening area. In this area, the localization cues (interaural level difference, interaural time difference and spectral modifications) are naturally derived from the interaction of the listener's body and external ears with the recreated sound field. To reproduce or recreate a real sound field or real sound environment, WFS and other physical reproduction techniques require a complete physical description of the target sound field. Sound field extrapolation using microphone array technologies is appropriate for this purpose. In this paper, a sound field extrapolation and characterization methodology is presented. The experimental tests of the method are reported in a companion paper.

This work is part of a larger project which involves the entire sound field reproduction of an airplane cabin in a full-scale mock-up. The objective of the reported theory and experiments is to get preliminary insights about the efficiency and validity of the sound field extrapolation (SFE) and sound field characterization (SFC) methods in a practical situation. Preliminary experiments in laboratory conditions ensure the validation of the method before the realization of actual on-site measurements, SFE and SFC for subsequent sound environment reproduction in a mock-up of an airplane cabin.

Sound field extrapolation (SFE) finds many applications in various domains: acoustic imaging, source localization, sound field reproduction, etc. SFE relies primarily on the measurement of a sound field using a microphone array placed in a measurement region. Among the most common techniques, one finds inverse problems [3, 4] and spatial transform methods (such as nearfield acoustical holography [5]). In this paper, we consider an inverse method since this method can easily deal with any microphone array configuration, regular or not. However, the typically large condition number of the matrix that must be inverted signals that matrix-form inverse problems are sensitive to measurement noise [6]. Therefore, regularization of the inverse problem is mandatory. Usually, with conventional regularization methods, this comes at some expense: reduced spatial resolution and supplementary regularization errors. In a recent paper, a new measurement-data-dependent regularization method that suffers less from the aforementioned issues was introduced [7].

The novelty of the method is that it applies a beamforming regularization matrix in the discrete smoothing norm of the cost function used to solve the inverse problem in the least-mean-square sense [3]. The advantages of this method are an increased spatial resolution of the solution and a reduced sensitivity to measurement noise. In the inverse problem, the beamforming regularization matrix simply penalizes more strongly the sources for which an a priori delay-and-sum beamformer gives a weaker amplitude. Recently, an experimental validation of the SFE method was reported [8]. The validation was based on the direct comparison of an extrapolated sound field with the exact sound field in an extrapolation region different from the measurement region. It was shown that the proposed SFE method is effective. In this paper, new theoretical developments for the interpretation of the beamforming regularization matrix are introduced on the basis of the transformation of the general-form inverse problem with the beamforming regularization matrix into a standard-form inverse problem [3]. A companion paper (Part II) discusses the results of a complete experimental verification of this recently developed method for SFE. This companion paper presents experiments in a hemi-anechoic room and in a reverberant chamber.

For sound environment and sound field reproduction purposes, SFE results can readily be applied to the derivation of multichannel signals using sound field reproduction technologies such as Wave Field Synthesis or Ambisonics [8, 10]. However, in some practical applications such as sound environment reproduction in vehicle mock-ups, the entire sound environment tends to be made of mostly stationary signals, at least for a finite period of time (corresponding to cruise speed, fixed altitude, stationary road condition, etc.). This very specific yet simplified nature of the sound environment encountered in most vehicles allows for the fragmentation of the sound environment into sound components or sound environment atoms [11, 12]. For such components of the entire sound environment, it is sometimes more useful to summarize the spatial properties of the sound component by a few simple metrics using a general sound field characterization (SFC) method. For example, this is the hypothesis behind the Directional Audio Coding (DirAC) [13, 14, 15, 16] approach by Pulkki and coworkers, in which the spatial sound properties at a point in space are summarized as impinging directions and diffuseness as functions of frequency. In this paper, we develop these ideas further and apply them to the typical SFE results obtained by the proposed method for an extended spatial area. Moreover, we propose a simple classification method to distinguish simple and generic types of sound field. This classification is deduced from direct observation of the metrics' efficiency to distinguish these generic types of sound field. Supplementary methods for virtual acoustics and simulations from microphone array measurement are also possible and discussed in [8].

AES 131st Convention, New York, USA, 2011 October 20-23


Fig. 1: Illustration of the coordinate systems. Microphones are located at $\mathbf{x}_m$. Points that belong to the inverse problem equivalent source distribution are denoted by $\mathbf{y}$. Any field point is denoted by $\mathbf{x}$. The real sound sources are confined to $V_s$.

1.1. Paper structure

Section 2 presents the general theory behind SFE using the inverse problem approach and the beamforming regularization matrix as initially introduced in [7]. In this paper, the beamforming regularization theory introduced in [7] is further developed. The SFC metrics and methods are presented in Sec. 3 where several of these metrics are discussed and compared on the basis of simple yet archetypical theoretical cases. Based on the evaluation of these metrics, a sound field components classification tree is also proposed in Sec. 3. A short discussion and a conclusion gather the main concluding remarks.

2. SOUND FIELD EXTRAPOLATION

The generic microphone array and coordinate systems are shown in Fig. 1. The array includes $M$ microphones. For a given frequency, a sound pressure field measurement is stored in a complex vector $\mathbf{p}(\mathbf{x}_m) \in \mathbb{C}^{M}$. Although the method is developed in the frequency domain, it is possible to derive the resulting time-domain quantities using the inverse Fourier transform as long as the equivalence between circular and linear convolution is respected with proper zero-padding of the input data [9].

2.1. Direct problem

The discrete direct sound radiation problem is written in matrix form:

$$\mathbf{p}(\mathbf{x}_m) = \mathbf{G}(\mathbf{x}_m, \mathbf{y}_l)\,\mathbf{q}(\mathbf{y}_l), \tag{1}$$

with

$$\mathbf{p} \in \mathbb{C}^{M}, \quad \mathbf{G} \in \mathbb{C}^{M \times L} \quad \text{and} \quad \mathbf{q} \in \mathbb{C}^{L}, \tag{2}$$

where $\mathbf{q}$ is the source strength vector for sources located at $\mathbf{y}_l$, $\mathbf{G}$ is the transfer matrix that represents sound radiation and $\mathbf{p}$ is the resulting sound pressure vector at the microphone locations $\mathbf{x}_m$. In this paper, a simple model of the direct problem is used: $\mathbf{q}$ are amplitudes of elemental plane waves propagating in different directions $\theta$ and $\phi$ in Fig. 1. Therefore, we let $R \to \infty$. Then, in this more specific case: $G_{ml} = e^{i\mathbf{k}_l \cdot \mathbf{x}_m}$ with $\mathbf{k}_l$ being the wave vector for the $l$-th plane wave ($\mathbf{k}_l = k\mathbf{n}_l$, $k = \omega/c$, $\omega$ is the angular frequency [rad/s], $c$ is the sound speed [m/s], $\mathbf{n}_l$ is a unit vector aligned with $\mathbf{k}_l$, $\theta_l$ and $\phi_l$ are the propagation azimuth and elevation).

Many other types of sources or idealized waves could be used in the direct problem definition. Indeed, one may object that spherical or cylindrical harmonic waves could be more suitable for the inverse problem. This is only the case when the microphone array does not include the origin of the coordinate system. Indeed, the linear combination of spherical harmonics or cylindrical harmonics tends to numerically diverge in the immediate vicinity of the coordinate system origin. In our case, the microphone array a priori covers an extended area (as opposed to compact arrays such as the first-order Ambisonics Sound Field microphone [17]) and typically includes the origin, hence our interest for plane waves in the discrete problem. In all cases, it is possible to convert plane waves into spherical harmonics in a subsequent step.
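As a minimal sketch of the plane-wave direct problem described above, the following NumPy code builds a transfer matrix with entries exp(i k n_l . x_m) and evaluates Eq. (1). The function names, the 8-microphone line array and the 16 horizontal directions are illustrative assumptions, not the configuration used in this paper; the direction cosines follow the convention shown in Fig. 2.

```python
import numpy as np

def plane_wave_directions(azimuths, elevations):
    """Unit propagation vectors n_l from azimuth/elevation angles [rad]."""
    az = np.asarray(azimuths, dtype=float)
    el = np.asarray(elevations, dtype=float)
    return np.stack([np.cos(az) * np.cos(el),
                     np.sin(az) * np.cos(el),
                     np.sin(el)], axis=-1)              # shape (L, 3)

def transfer_matrix(mic_positions, directions, k):
    """G[m, l] = exp(i k n_l . x_m) for M microphones and L plane waves."""
    x = np.asarray(mic_positions, dtype=float)          # shape (M, 3)
    return np.exp(1j * k * x @ directions.T)            # shape (M, L)

# Hypothetical example: 8 microphones on a line, 16 horizontal plane waves.
wavelength = 1.0
k = 2.0 * np.pi / wavelength
mics = np.column_stack([np.linspace(-1.0, 1.0, 8), np.zeros(8), np.zeros(8)])
az = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
n = plane_wave_directions(az, np.zeros_like(az))
G = transfer_matrix(mics, n, k)

q = np.zeros(16, dtype=complex)
q[0] = 1.0                                              # one plane wave along +x1
p = G @ q                                               # Eq. (1)
```

Because every entry of G has unit modulus, the plane-wave model carries only phase information; amplitude differences between microphones arise solely from the superposition of several waves.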

2.2. Sound field extrapolation outside the microphone array

The extrapolated sound pressure field [Pa] at any location $\mathbf{x}$ is then computed using a linear combination of plane waves

$$p(\mathbf{x}) = \sum_{l=1}^{L} e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l, \tag{3}$$

where the complex plane wave distribution is centered around the coordinate system origin $\mathbf{x} = \mathbf{0}$. Indeed, one notes that the sound pressure at the origin is the direct linear combination of the plane wave complex amplitudes

$$p(\mathbf{0}) = \sum_{l=1}^{L} q_l. \tag{4}$$

For any field point $\mathbf{x}$ that excludes the array origin, it would be interesting for notational purposes to obtain an expression similar to Eq. (4) where a new complex plane wave distribution $\mathbf{q}(\mathbf{x})$ would be centered around the field point $\mathbf{x}$. This corresponds to a simple translation of the coordinate system origin. This is expressed as follows

$$p(\mathbf{x}) = \sum_{l=1}^{L} q_l(\mathbf{x}), \tag{5}$$

with

$$q_l(\mathbf{x}) = e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l. \tag{6}$$
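The extrapolation of Eq. (3) and the origin translation of Eqs. (5) and (6) can be checked numerically with a short sketch. The function names and the random plane wave distribution below are hypothetical and serve only to illustrate that both expressions give the same pressure.

```python
import numpy as np

def extrapolate(points, directions, k, q):
    """Eq. (3): p(x) = sum_l exp(i k n_l . x) q_l for each row of `points`."""
    x = np.atleast_2d(np.asarray(points, dtype=float))   # shape (N, 3)
    return np.exp(1j * k * x @ directions.T) @ q         # shape (N,)

def translate(point, directions, k, q):
    """Eq. (6): plane wave amplitudes re-centered on the field point."""
    x = np.asarray(point, dtype=float)                   # shape (3,)
    return np.exp(1j * k * directions @ x) * q           # shape (L,)

# Hypothetical plane wave distribution: random directions and amplitudes.
rng = np.random.default_rng(0)
num_waves = 12
az = rng.uniform(0.0, 2.0 * np.pi, num_waves)
el = rng.uniform(-0.5 * np.pi, 0.5 * np.pi, num_waves)
n = np.stack([np.cos(az) * np.cos(el),
              np.sin(az) * np.cos(el),
              np.sin(el)], axis=1)
q = rng.standard_normal(num_waves) + 1j * rng.standard_normal(num_waves)
k = 2.0 * np.pi                                          # unit wavelength

x = np.array([0.3, -0.2, 0.1])
p_direct = extrapolate(x, n, k, q)[0]                    # Eq. (3)
p_translated = translate(x, n, k, q).sum()               # Eq. (5)
p_origin = extrapolate(np.zeros(3), n, k, q)[0]          # Eq. (4)
```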

2.3. Inverse problem: general-form and standard-form of Tikhonov regularization

For SFE, the goal of the inverse problem is to estimate the source amplitudes $\mathbf{q}$ that best predict the measured sound field $\mathbf{p}$, knowing the propagation operator $\mathbf{G}$. Put simply, as for most classical inverse problems in acoustics, we ask for the causes, i.e. the source amplitudes $\mathbf{q}$, that created the effect, i.e. the measurement data $\mathbf{p}$, for a given known system model $\mathbf{G}$, i.e. for an imposed geometry of the source distribution. Note that for practical applications, the source geometry is imposed and specified, but for now, we keep an unspecified definition of the source distribution $\mathbf{y}_l$ to propose a general view of the method.

A typical approach to that problem is to cast it as a minimization problem with Tikhonov regularization [3]:

$$\mathbf{q}_\lambda = \operatorname{argmin}\left\{ \|\mathbf{p} - \mathbf{G}\mathbf{q}\|_2^2 + \lambda^2\, \Omega(\mathbf{q})^2 \right\}. \tag{7}$$

In Eq. (7), $\|\cdot\|_2$ represents the vector 2-norm ($\|\mathbf{x}\|_2^2 = \mathbf{x}^H\mathbf{x}$, superscript $H$ denotes Hermitian transpose), $\lambda$ is the penalization parameter and $\Omega(\cdot)$ is a discrete smoothing norm [3]. The function $\Omega(\cdot)$ is termed a discrete smoothing norm because it smoothly regularizes the unknown solution $\mathbf{q}$. In classical Tikhonov regularization as reported in many papers [4, 6, 18, 19, 20], the discrete smoothing norm is the solution vector 2-norm [3]: $\Omega(\mathbf{q}) = \|\mathbf{q}\|_2$. The inverse problem solution $\mathbf{q}_\lambda$ should approach the real sound source distribution or should, at least, be able to achieve SFE according to Eq. (3) within an extrapolation region for which the prediction error would be below a given threshold.

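In this paper, we will assume that the discrete smoothing norm is of the more general form

$$\Omega(\mathbf{q}) = \|\mathbf{L}\mathbf{q}\|_2, \tag{8}$$

where $\mathbf{L} \in \mathbb{R}^{N \times L}$ is a rectangular or square weighting matrix. Then, one writes the general-form inverse problem as

$$\mathbf{q}_\lambda = \operatorname{argmin}\left\{ \|\mathbf{p} - \mathbf{G}\mathbf{q}\|_2^2 + \lambda^2 \|\mathbf{L}\mathbf{q}\|_2^2 \right\}, \tag{9}$$

and the standard-form inverse problem as [3]

$$\bar{\mathbf{q}}_\lambda = \operatorname{argmin}\left\{ \|\bar{\mathbf{p}} - \bar{\mathbf{G}}\bar{\mathbf{q}}\|_2^2 + \lambda^2 \|\bar{\mathbf{q}}\|_2^2 \right\}. \tag{10}$$

In this equation, the new standard-form matrices and vectors $\bar{\mathbf{G}}$ and $\bar{\mathbf{p}}$ must be computed from $\mathbf{G}$ and $\mathbf{p}$ which, depending on the weighting matrix $\mathbf{L}$, might not be a trivial task. This must often be achieved using numerical methods [3].

However, when the weighting matrix is square ($\mathbf{L} \in \mathbb{R}^{L \times L}$) and when its inverse exists, one directly obtains the standard-form transformed quantities

$$\bar{\mathbf{G}} = \mathbf{G}\mathbf{L}^{-1}, \quad \bar{\mathbf{p}} = \mathbf{p} \quad \text{and} \quad \mathbf{q}_\lambda = \mathbf{L}^{-1}\bar{\mathbf{q}}_\lambda. \tag{11}$$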

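When $\mathbf{L} = \mathbf{I}$, as often reported in the literature, there is no difference between the general-form and standard-form problems. The optimal solution of the general-form problem Eq. (9) is [3, 4, 6]

$$\mathbf{q}_\lambda = \left(\mathbf{G}^H\mathbf{G} + \lambda^2 \mathbf{L}^H\mathbf{L}\right)^{-1}\mathbf{G}^H\mathbf{p}. \tag{12}$$

The solution of the standard-form problem Eq. (10) is [3, 4, 6]

$$\bar{\mathbf{q}}_\lambda = \left(\bar{\mathbf{G}}^H\bar{\mathbf{G}} + \lambda^2 \mathbf{I}\right)^{-1}\bar{\mathbf{G}}^H\bar{\mathbf{p}}. \tag{13}$$

For the specific case of Eq. (11) (i.e. $\mathbf{L} \in \mathbb{R}^{L \times L}$ and $\det(\mathbf{L}) \neq 0$), Eq. (13) can be directly expressed as a function of $\mathbf{L}$

$$\mathbf{q}_\lambda = \mathbf{L}^{-1}\bar{\mathbf{q}}_\lambda = \mathbf{L}^{-1}\left([\mathbf{L}^{-1}]^T\mathbf{G}^H\mathbf{G}\mathbf{L}^{-1} + \lambda^2\mathbf{I}\right)^{-1}[\mathbf{L}^{-1}]^T\mathbf{G}^H\mathbf{p} \tag{14}$$

with superscript $T$ denoting matrix transposition. Therefore, for the specific case of Eq. (11), it is possible to interpret the problem from two equivalent vantage points: a general-form problem with a regularization matrix $\mathbf{L}$ (Eq. (12)) or a standard-form problem with a transformation matrix $\mathbf{L}^{-1}$ that transforms the propagation operator $\mathbf{G}$ (Eq. (13)). In the even more specific case of a diagonal matrix $\mathbf{L}$, the regularization matrix $\mathbf{L}$ puts weights on the individual solution components $q_l$ and the transformation matrix is a diagonal matrix $\mathbf{L}^{-1}$ that scales the columns of the propagation operator $\mathbf{G}$.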
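The equivalence between the general-form solution of Eq. (12) and the back-transformed standard-form solution of Eqs. (11) and (13) can be verified numerically. The sketch below assumes a random complex system matrix and a real, invertible diagonal weighting matrix; all names and sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def tikhonov_general(G, p, L, lam):
    """Eq. (12): solve (G^H G + lam^2 L^H L) q = G^H p."""
    A = G.conj().T @ G + lam**2 * (L.conj().T @ L)
    return np.linalg.solve(A, G.conj().T @ p)

def tikhonov_standard(G_bar, p_bar, lam):
    """Eq. (13): solve (G_bar^H G_bar + lam^2 I) q_bar = G_bar^H p_bar."""
    A = G_bar.conj().T @ G_bar + lam**2 * np.eye(G_bar.shape[1])
    return np.linalg.solve(A, G_bar.conj().T @ p_bar)

# Hypothetical problem: 6 microphones, 10 sources, random complex data.
rng = np.random.default_rng(1)
num_mics, num_sources = 6, 10
G = rng.standard_normal((num_mics, num_sources)) \
    + 1j * rng.standard_normal((num_mics, num_sources))
p = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)

weights = rng.uniform(0.5, 2.0, num_sources)   # real, nonzero diagonal of L
L_mat = np.diag(weights)
L_inv = np.diag(1.0 / weights)
lam = 0.1

q_general = tikhonov_general(G, p, L_mat, lam)          # Eq. (12)
q_bar = tikhonov_standard(G @ L_inv, p, lam)            # Eqs. (11) and (13)
q_standard = L_inv @ q_bar                              # back-transformation
```

Solving the linear system with `np.linalg.solve` rather than forming an explicit inverse is a numerical design choice of this sketch, not something prescribed by the theory.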




in reference [7]. Moreover, initial experiments with the beamforming regularization matrix in a hemi-anechoic room, with a 96-microphone array and with a single omnidirectional source showed the practicability and validity of the method [8].

To illustrate the validity of the method, a simple theoretical example is given in Figs. 2, 3 and 4. The direct problem source distribution involves 642 plane waves coming from $4\pi$ steradians. The microphone array is shown in Fig. 3: it is made of 256 microphones. The original sound field is created by a dipole in $\mathbf{x} = [1.8\lambda_k, 2\lambda_k, 0]^T$ where $\lambda_k$ is the acoustic wavelength. Real parts of the original and extrapolated sound fields are shown in Fig. 4. The SFE was based on Eq. (20) with $\lambda = 0.01$. Clearly, SFE using the proposed method is effective.

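In the following, we illustrate the equivalence of the beamforming regularization matrix and the corresponding scaling matrix $\mathbf{L}'$ (see Eqs. (11) to (13))

$$\mathbf{L}' = \mathbf{L}^{-1} = \operatorname{diag}\left(|\mathbf{G}^H\mathbf{p}| / \|\mathbf{G}^H\mathbf{p}\|_\infty\right) \in \mathbb{R}^{L \times L}. \tag{21}$$

Then, Eq. (14) gives

$$\mathbf{q}_{BF} = \left(\operatorname{diag}\left(|\mathbf{Q}_{BF}| / \|\mathbf{Q}_{BF}\|_\infty\right)^2\mathbf{G}^H\mathbf{G} + \lambda^2\mathbf{I}\right)^{-1}\operatorname{diag}\left(|\mathbf{Q}_{BF}| / \|\mathbf{Q}_{BF}\|_\infty\right)^2\mathbf{G}^H\mathbf{p} \tag{22}$$

since $\mathbf{L}'$ is diagonal. This solution is equivalent to Eq. (20). Therefore, it is possible to interpret the original beamforming regularization matrix as a standard-form problem using a data-dependent scaled system matrix $\bar{\mathbf{G}}$

$$\bar{\mathbf{G}} = \mathbf{G}\mathbf{L}' = \mathbf{G}\operatorname{diag}\left(|\mathbf{G}^H\mathbf{p}| / \|\mathbf{G}^H\mathbf{p}\|_\infty\right), \tag{23}$$

with $\mathbf{q}_{BF} = \mathbf{L}'\bar{\mathbf{q}}_{BF}$. Matrix $\mathbf{L}'$ will be called the scaling matrix.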
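The scaling-matrix interpretation can be sketched numerically under stated assumptions: the scaling matrix is taken as the diagonal of normalized delay-and-sum beamformer magnitudes of Eq. (21), the regularization matrix as its inverse, and no beamformer output is exactly zero (otherwise the inverse does not exist). The array geometry, noise level and function names below are hypothetical.

```python
import numpy as np

def beamforming_scaling(G, p):
    """Diagonal of the scaling matrix of Eq. (21): |G^H p| / max|G^H p|."""
    Q_bf = G.conj().T @ p                      # delay-and-sum beamformer output
    return np.abs(Q_bf) / np.max(np.abs(Q_bf))

def solve_scaled_standard_form(G, p, lam):
    """Standard-form Tikhonov solution with the scaled matrix of Eq. (23)."""
    s = beamforming_scaling(G, p)
    G_bar = G * s                              # column l of G multiplied by s[l]
    A = G_bar.conj().T @ G_bar + lam**2 * np.eye(G.shape[1])
    q_bar = np.linalg.solve(A, G_bar.conj().T @ p)
    return s * q_bar                           # back-transformation with the scaling matrix

def solve_general_form(G, p, lam):
    """General-form solution, Eq. (12), with L = diag(1 / s) (assumes s > 0)."""
    s = beamforming_scaling(G, p)
    A = G.conj().T @ G + lam**2 * np.diag(1.0 / s**2)
    return np.linalg.solve(A, G.conj().T @ p)

# Hypothetical case: 8 randomly placed microphones, 24 horizontal plane waves.
rng = np.random.default_rng(2)
k = 2.0 * np.pi
mics = rng.uniform(-1.0, 1.0, size=(8, 3))
az = np.linspace(0.0, np.pi, 24)
n = np.stack([np.cos(az), np.sin(az), np.zeros_like(az)], axis=1)
G = np.exp(1j * k * mics @ n.T)
noise = 0.01 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
p = G[:, 12] + noise                           # one incident plane wave plus noise

s = beamforming_scaling(G, p)
q_scaled = solve_scaled_standard_form(G, p, lam=0.1)
q_general = solve_general_form(G, p, lam=0.1)
```

Directions with a weak beamformer output receive a small scaling factor, which is the same as a strong penalty in the general-form problem.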

2.6. Spatial resolution: equivalence between the beamforming regularization matrix and beamforming scaling matrix

As discussed in [7], since the beamforming regularization matrix involves a general-form inverse problem, one must rely on the generalized singular value decomposition (GSVD) of the matrix pair $\mathbf{G}$ and $\mathbf{L}$ to evaluate the possible spatial resolution of the problem. This is typically evaluated on the basis of the generalized singular vectors. In the case of the beamforming scaling matrix, the problem is written in standard form and the spatial resolution of the problem can be evaluated on the basis of the singular value decomposition (SVD) of the scaled system matrix $\bar{\mathbf{G}}$. In this paper, the equivalence of the two formulations in terms of spatial resolution will be illustrated. Furthermore, this demonstration will also illustrate the better spatial resolution obtained by the beamforming regularization or scaling matrices.

Fig. 2: Spherical distribution of $L = 642$ incoming plane waves. Each propagating direction $\theta_l$ and $\phi_l$ is shown as a black dot on the sphere with the corresponding direction cosines (axes: $\cos(\theta_l)\cos(\phi_l)$, $\sin(\theta_l)\cos(\phi_l)$ and $\sin(\phi_l)$).

Fig. 3: Theoretical 256-microphone array geometry. Microphones are horizontally aligned with a uniform rectangular grid (shown in grey) and they are randomly positioned along $x_3$ on the basis of a two-layer geometry (with 0.1 wavelength as the vertical separation distance). Microphone acoustic centers are shown as black dots. The problem is dimensionless and the array spans two wavelengths along $x_1$ and $x_2$.

Fig. 4: (a): Real part of the original sound field created by a dipole sound source in $\mathbf{x} = [1.8\lambda_k, 2\lambda_k, 0]^T$ in the direction $\pi/2.25$ radians. Field positions are normalized by the acoustical wavelength $\lambda_k$. (b): Real part of the extrapolated sound field using a plane wave source distribution obtained from Eq. (20) with $\lambda = 0.01$. The dipole is marked as a black and white dot. The dipole main and orthogonal axes are highlighted by dashed black and white lines. The white contour line indicates the region of 0.001 of local quadratic SFE error and the black contour line indicates the region of 0.1 of local quadratic SFE error ($\|p(\mathbf{x}) - \hat{p}(\mathbf{x})\|_2^2$).

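The SVD of $\bar{\mathbf{G}}$ is given by

$$\bar{\mathbf{G}} = \bar{\mathbf{U}}\bar{\boldsymbol{\Sigma}}\bar{\mathbf{V}}^H = \sum_{i=1}^{M} \bar{\mathbf{u}}_i \bar{\sigma}_i \bar{\mathbf{v}}_i^H, \tag{24}$$

with unitary matrices $\bar{\mathbf{U}} \in \mathbb{C}^{M \times M}$ and $\bar{\mathbf{V}} \in \mathbb{C}^{L \times L}$ ($\bar{\mathbf{U}}^H\bar{\mathbf{U}} = \mathbf{I}$ and $\bar{\mathbf{V}}^H\bar{\mathbf{V}} = \mathbf{I}$). In Eq. (24), the vectors $\bar{\mathbf{u}}_i$ and $\bar{\mathbf{v}}_i$ are the left and right singular vectors, respectively. They correspond to the columns of $\bar{\mathbf{U}}$ and $\bar{\mathbf{V}}$. Each singular vector pair corresponds to a singular value $\bar{\sigma}_i$ stored on the main diagonal of $\bar{\boldsymbol{\Sigma}} \in \mathbb{R}^{M \times L}$. It is assumed that the number of microphones is smaller than the number of unknown sources. The singular values are ordered in decreasing order ($\bar{\sigma}_1 \geq \bar{\sigma}_2 \geq \cdots > 0$). On the basis of this SVD, the solution of the standard form is written

$$\mathbf{q}_{BF} = \mathbf{L}' \sum_{i=1}^{M} \bar{f}_i\, \frac{\bar{\mathbf{u}}_i^H \mathbf{p}}{\bar{\sigma}_i}\, \bar{\mathbf{v}}_i \tag{25}$$

where the filter factors $\bar{f}_i = \bar{\sigma}_i^2 / (\bar{\sigma}_i^2 + \lambda^2)$ represent the regularization effect.

The GSVD of $\mathbf{G}$ and $\mathbf{L}$ is given by [3, 21] with $\mathbf{U} \in \mathbb{C}^{M \times M}$, $\mathbf{V} \in \mathbb{C}^{L \times L}$, $\mathbf{C} \in \mathbb{R}^{M \times L}$, $\mathbf{M} \in \mathbb{R}^{L \times L}$ and $\mathbf{Z} \in \mathbb{C}^{L \times L}$. The columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal ($\mathbf{U}^H\mathbf{U} = \mathbf{I}$ and $\mathbf{V}^H\mathbf{V} = \mathbf{I}$) and $\mathbf{Z}$ is nonsingular. The columns of $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{Z}$ ($\mathbf{u}_i$, $\mathbf{v}_i$ and $\mathbf{z}_i$, respectively) form a new set of singular vectors that are used as independent basis vectors. The columns of $\mathbf{U}$ are used as basis vectors for the acoustic pressure $\mathbf{p}$ while the columns of $\mathbf{Z}$ are used as basis vectors for the source distribution $\mathbf{q}$. Note that $\mathbf{U}$ and $\mathbf{V}$ are not equal to those found from the standard SVD (namely, $\bar{\mathbf{U}}$ and $\bar{\mathbf{V}}$). Matrices $\mathbf{C}$ and $\mathbf{M}$ have their coefficients $c_i$ and $m_i$ stored in increasing order on their main diagonals. The generalized singular values $\gamma_i$ are given by

$$\gamma_i = c_i / m_i. \tag{26}$$

On the basis of this GSVD, Eq. (12) is written

$$\mathbf{q}_{BF} = \sum_{i=1}^{M} f_i\, \frac{\mathbf{u}_i^H \mathbf{p}}{c_i}\, \mathbf{z}_i \tag{27}$$

where $f_i = \gamma_i^2 / (\gamma_i^2 + \lambda^2)$ represents the regularization effect on the solution.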
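The filter-factor expansion of the standard-form solution, Eq. (25), can be illustrated with the SVD routine of NumPy. The sketch below uses a hypothetical random matrix in place of the scaled system matrix and omits the final multiplication by the scaling matrix; it only checks that the SVD expansion matches the direct solution of Eq. (13). The GSVD expansion of Eq. (27) is not sketched because NumPy does not provide a GSVD routine.

```python
import numpy as np

def tikhonov_svd(G_bar, p, lam):
    """SVD expansion: sum_i f_i (u_i^H p / sigma_i) v_i, with filter factors f_i."""
    U, sigma, Vh = np.linalg.svd(G_bar, full_matrices=False)
    f = sigma**2 / (sigma**2 + lam**2)           # filter factors
    coeffs = f * (U.conj().T @ p) / sigma
    return Vh.conj().T @ coeffs, f

# Hypothetical standard-form problem with fewer microphones than sources.
rng = np.random.default_rng(3)
num_mics, num_sources = 6, 15
G_bar = rng.standard_normal((num_mics, num_sources)) \
    + 1j * rng.standard_normal((num_mics, num_sources))
p = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)
lam = 0.5

q_svd, f = tikhonov_svd(G_bar, p, lam)
A = G_bar.conj().T @ G_bar + lam**2 * np.eye(num_sources)
q_direct = np.linalg.solve(A, G_bar.conj().T @ p)    # Eq. (13)
```

The filter factors lie between 0 and 1 and decrease with the singular values, so the components associated with small singular values, which amplify measurement noise the most, are attenuated the most.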




An example is now introduced to highlight the increased spatial resolution of the inverse problem approach with the beamforming regularization matrix in comparison with the inverse problem with the identity matrix as the weighting matrix. Moreover, this example will illustrate the fact that the spatial resolution obtained with the beamforming regularization matrix $\mathbf{L}$ written in general form is equal to the resolution obtained with the beamforming scaled system matrix $\bar{\mathbf{G}}$ written in standard form. To illustrate this property, we simply rely on the comparison of the spatial resolution of the scaled singular vectors $\mathbf{L}'\bar{\mathbf{v}}_i$ and the generalized singular vectors $\mathbf{z}_i$ since they form the bases of the solutions of the standard form Eq. (25) and the general form Eq. (27), respectively. The $\mathbf{L}'\bar{\mathbf{v}}_i$ and $\mathbf{z}_i$ are presented in Figs. 5(a) and 5(b), respectively. For comparison purposes, the right singular vectors of the matrix $\mathbf{G}$ are shown.

The example involves a linear microphone array of $M = 32$ microphones spanning 4 acoustical wavelengths $\lambda_k$ with a plane wave distribution of $L = 256$ plane waves ($\theta_l = 0, \ldots, \pi$ and $\phi_l = 0$) and for a plane wave incident from $\theta = \pi/2$. First note the effect of the beamforming regularization matrix and the corresponding scaling matrix by comparison with the standard system matrix $\mathbf{G}$: they provide a locally increased spatial resolution in the vicinity of the impinging sound wave. Moreover, the comparison of Fig. 5(a) and Fig. 5(b) illustrates the exact correspondence of the vector bases used for the standard form Eq. (25) and the general form Eq. (27). Therefore, on the basis of this example and Eqs. (20) and (22), the increased spatial resolution property associated with the beamforming regularization matrix method (as originally presented on the basis of the GSVD in [7]) is equivalent to the increased spatial resolution property for the scaled system matrix.

3. SOUND FIELD CHARACTERIZATION

In this section, several metrics and quantifiers are presented to characterize the measured and extrapolated sound fields for a given frequency on the basis of the inverse problem solutions $\mathbf{q}_\lambda$, $\mathbf{q}_{BF}$ or $\mathbf{q}(\theta, \phi)$, i.e. the plane wave distributions. In some cases, the metrics are computed directly from the inverse problem solution and in some other cases the metrics are computed from the SFE result, namely the sound pressure or the particle velocity. The presented metrics are either objective or subjective predictors. A distinction is also introduced between local and global metrics. It is known from the literature that some metrics are more effective at predicting the listener's sound localization in different frequency bands [24]; therefore, several metrics are presented and discussed before being exemplified using the SFE method presented earlier.

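3.1. Sound intensity and direction-of-arrival fields and averages

The extrapolated sound pressure field [Pa] as a function of $\mathbf{x}$ is given by the algebraic superposition of the $L$ harmonic plane waves used in the direct problem as expressed in Eq. (3).

The acoustic velocity field $\mathbf{u}(\mathbf{x})$ [m/s] is computed using the linearized Euler equation [25]

$$\mathbf{u}(\mathbf{x}) = \frac{\nabla p(\mathbf{x})}{i\rho\omega}, \tag{28}$$

with $\rho$ being the air density [kg/m$^3$], $\omega$ the angular frequency [rad/s] and $\nabla$ the gradient operator given by

$$\nabla = \frac{\partial}{\partial x_1}\mathbf{e}_1 + \frac{\partial}{\partial x_2}\mathbf{e}_2 + \frac{\partial}{\partial x_3}\mathbf{e}_3 \tag{29}$$

where $\mathbf{e}_i$ is a canonical vector [21] pointing in the $x_i$ direction. Accordingly, for the problem at hand, one finds

$$\nabla p(\mathbf{x}) = \sum_{l=1}^{L} i\mathbf{k}_l\, e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l, \tag{30}$$

and

$$\mathbf{u}(\mathbf{x}) = \sum_{l=1}^{L} \frac{\mathbf{n}_l}{\rho c}\, e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l, \tag{31}$$

with $\mathbf{n}_l = \mathbf{k}_l / \|\mathbf{k}_l\|_2$ being a unit vector collinear with $\mathbf{k}_l$. For a given harmonic sound field, the time-averaged acoustic intensity $\mathbf{I}(\mathbf{x})$ [W/m$^2$] is given by [25]

$$\mathbf{I}(\mathbf{x}) = \frac{1}{2}\Re\left[p(\mathbf{x})^{*}\,\mathbf{u}(\mathbf{x})\right], \tag{32}$$

where $\Re$ denotes the real part and $^{*}$ complex conjugation, which gives, in our specific case

$$\mathbf{I}(\mathbf{x}) = \frac{1}{2}\Re\left[\left(\sum_{l=1}^{L} e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l\right)^{*} \sum_{l=1}^{L} \frac{\mathbf{n}_l}{\rho c}\, e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l\right]. \tag{33}$$

Many metrics presented in the sequel are derived from the sound pressure, velocity and intensity fields.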


Fig. 5: Absolute value of the beamforming output $\mathbf{Q}_{BF}$ (Eq. (15)) (top) and the first six (from top) singular vectors $\mathbf{L}'\bar{\mathbf{v}}_1$ to $\mathbf{L}'\bar{\mathbf{v}}_6$ (black lines) (a) and generalized singular vectors $\mathbf{z}_1$ to $\mathbf{z}_6$ (black lines) (b), plotted against $\theta_l/\pi$, for a linear microphone array of 32 microphones spanning 4 acoustical wavelengths $\lambda_k$ with a plane wave distribution of 256 plane waves ($\theta_l = 0, \ldots, \pi$ and $\phi_l = 0$) and for a plane wave incident from $\theta = \pi/2$. The real parts of the vectors are shown as continuous lines and the imaginary parts of the vectors as dashed lines. For comparison purposes, the right singular vectors of the matrix $\mathbf{G}$ are shown in grey.


The direction of the sound intensity $\mathbf{I}(\mathbf{x})$ can also be used to predict a local indication of the DOA (direction-of-arrival). Indeed, the DOA is a unit vector in the opposite direction of the sound intensity vector. Then the DOA vector $\mathbf{n}_{DOA}(\mathbf{x})$ is given by

$$\mathbf{n}_{DOA}(\mathbf{x}) = -\frac{\mathbf{I}(\mathbf{x})}{\|\mathbf{I}(\mathbf{x})\|_2}. \tag{34}$$

For a set of $N$ SFE points $\mathbf{x}_n$, the average intensity vector is introduced for a spatially discrete extrapolation region:

$$\langle \mathbf{I} \rangle_N = \frac{1}{N}\sum_{n=1}^{N} \mathbf{I}(\mathbf{x}_n). \tag{35}$$

This averaging operation can also be computed for all the subsequent metrics, including the DOA. It will be systematically denoted by $\langle \cdot \rangle_N$.

Intensity and DOA fields are both objective and subjective metrics: they represent a directional transport of acoustical energy, but they are also sometimes used as indicators of sound localization by human hearing, especially the DOA. However, the sound intensity solely expresses the net flow of energy; it does not indicate the direction of particular simultaneous arrivals, and the same holds for the DOA.
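The pressure, velocity, intensity and DOA expressions of Eqs. (3), (31), (32), (34) and (35) can be sketched for a plane wave distribution as follows. The air density and sound speed values, the function names and the two test fields are assumptions made for illustration only.

```python
import numpy as np

RHO = 1.2      # air density [kg/m^3] (assumed value)
C = 343.0      # sound speed [m/s] (assumed value)

def pressure(x, n, k, q):
    """Eq. (3): complex pressure at a single point x."""
    return np.sum(np.exp(1j * k * n @ x) * q)

def velocity(x, n, k, q):
    """Eq. (31): complex particle velocity vector at x."""
    a = np.exp(1j * k * n @ x) * q                 # shape (L,)
    return (a[:, None] * n).sum(axis=0) / (RHO * C)

def intensity(x, n, k, q):
    """Eq. (32): time-averaged intensity, 0.5 * Re[conj(p) u]."""
    return 0.5 * np.real(np.conj(pressure(x, n, k, q)) * velocity(x, n, k, q))

def doa(I):
    """Eq. (34): unit vector opposite to the intensity vector."""
    return -I / np.linalg.norm(I)

k = 2.0 * np.pi
x = np.array([0.2, 0.1, 0.0])

# Single plane wave of amplitude 2 Pa travelling along +x1.
n_single = np.array([[1.0, 0.0, 0.0]])
q_single = np.array([2.0 + 0.0j])
I_single = intensity(x, n_single, k, q_single)
doa_single = doa(I_single)

# Eq. (35): spatial average over a few hypothetical extrapolation points.
points = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])
I_avg = np.mean([intensity(pt, n_single, k, q_single) for pt in points], axis=0)

# Two opposite plane waves of equal amplitude: no net energy flow.
n_pair = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
I_pair = intensity(x, n_pair, k, np.array([1.0 + 0.0j, 1.0 + 0.0j]))
```

The second case illustrates the limitation noted above: two simultaneous arrivals from opposite directions produce a vanishing net intensity, so neither the intensity nor the DOA reveals the individual arrival directions.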

3.2. Energy density field and average energy

The local time-averaged energy density field $E(\mathbf{x})$ of a harmonic acoustic sound field is a combination of kinetic and potential energy density fields, $E_c(\mathbf{x})$ and $E_p(\mathbf{x})$ [J/m$^3$], respectively [25]:

$$E(\mathbf{x}) = E_c(\mathbf{x}) + E_p(\mathbf{x}) = \frac{\rho}{4}\left[\|\mathbf{u}(\mathbf{x})\|_2^2 + \frac{|p(\mathbf{x})|^2}{(\rho c)^2}\right]. \tag{36}$$

According to Eqs. (3) and (31), one obtains for the problem at hand

$$E(\mathbf{x}) = \frac{\rho}{4}\left[\left\|\sum_{l=1}^{L} \frac{\mathbf{n}_l}{\rho c}\, e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l\right\|_2^2 + \frac{\left|\sum_{l=1}^{L} e^{i\mathbf{k}_l \cdot \mathbf{x}}\, q_l\right|^2}{(\rho c)^2}\right]. \tag{37}$$

The energy density field can provide some interesting insights about a measured sound field. For a completely diffuse sound field, the local spatial average energy density field $\langle E(\mathbf{x}) \rangle_N$ (with $N$ neighboring points of $\mathbf{x}$) should be constant in space [25]. For a harmonic sound field, a local spatial average is an average over a volume with dimensions larger than the wavelength [25]. However, in practical situations, the local energy density field $E(\mathbf{x})$ is not spatially uniform and this introduces some issues. This will be discussed in Secs. 3.6 and 3.7.

In this paper, we also introduce the normalized standard deviation of the local energy density with respect to the average energy density ($\langle E \rangle_N$)

$$\sigma_N = \frac{1}{N \langle E \rangle_N}\sum_{n=1}^{N} \left|E(\mathbf{x}_n) - \langle E \rangle_N\right| \times 100\,\%, \tag{38}$$

in % of $\langle E \rangle_N$. A small $\sigma_N$ will suggest a uniform distribution of the energy density while a large $\sigma_N$ suggests a heterogeneous distribution of the energy density.
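A minimal sketch of the energy density of Eq. (36) and of the normalized deviation of Eq. (38) is given below for two simple plane wave fields. The physical constants, the randomly sampled extrapolation points and all function names are assumptions introduced for illustration.

```python
import numpy as np

RHO = 1.2      # air density [kg/m^3] (assumed value)
C = 343.0      # sound speed [m/s] (assumed value)

def fields(points, n, k, q):
    """Pressure (N,) and particle velocity (N, 3) from Eqs. (3) and (31)."""
    a = np.exp(1j * k * points @ n.T) * q            # shape (N, L)
    return a.sum(axis=1), a @ n / (RHO * C)

def energy_density(p, u):
    """Eq. (36): rho/4 * (||u||^2 + |p|^2 / (rho c)^2)."""
    return 0.25 * RHO * (np.sum(np.abs(u)**2, axis=-1)
                         + np.abs(p)**2 / (RHO * C)**2)

def normalized_deviation(E):
    """Eq. (38): mean absolute deviation of E in % of its spatial average."""
    E_avg = np.mean(E)
    return 100.0 * np.mean(np.abs(E - E_avg)) / E_avg

rng = np.random.default_rng(4)
k = 2.0 * np.pi                                      # unit wavelength
points = rng.uniform(-1.0, 1.0, size=(200, 3))       # hypothetical SFE points

# Single plane wave: the energy density is spatially uniform.
n_one = np.array([[1.0, 0.0, 0.0]])
E_one = energy_density(*fields(points, n_one, k, np.array([1.0 + 0.0j])))

# Two plane waves at right angles: interference makes E vary in space.
n_two = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
E_two = energy_density(*fields(points, n_two, k,
                               np.array([1.0 + 0.0j, 1.0 + 0.0j])))

dev_one = normalized_deviation(E_one)
dev_two = normalized_deviation(E_two)
```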

3.3. Directional pressure, energy density and diffusion

Most of the previously introduced SFC metrics and quantifiers (Secs. 3.1 and 3.2) rely on the computation of SFE and local quantities before being actually averaged over the SFE sampled region. It is possible to introduce classical metrics on the basis of the plane wave source distribution $\mathbf{q}$ without actual SFE.

3.3.1. Directional pressure

For the proposed SFE method, the output is a plane wave amplitude vector $\mathbf{q}$ which directly gives the directional pressure: $p_l(\theta_l + \pi, -\phi_l) = q_l(\theta_l, \phi_l)$. Indeed, if a localization algorithm could be designed to listen to a single direction $(\theta_l + \pi, -\phi_l)$ from the SFE results, it would only detect a sound pressure wave with $q_l$ as its complex amplitude. Therefore, the passage from $q_l$ to $p_l$ is direct. However, one should keep in mind the reversal of the propagation directions $(\theta_l, \phi_l)$ to the corresponding listening directions $(\theta_l + \pi, -\phi_l)$.

3.3.2. Directional energy density

Since, for a single harmonic plane wave $q_l(\theta_l, \phi_l)$, the velocity field is related to the pressure field through the characteristic impedance $\rho c$, one can directly write the directional energy density [25] on the basis of Eq. (36) and the directional pressure $p_l$

$$E_l = \frac{|p_l|^2}{2\rho c^2} = \frac{|q_l|^2}{2\rho c^2}. \tag{39}$$

The directional energy density $E_l$, since it is based on the directional pressure $p_l$, represents the energy density that comes from the listening direction $(\theta_l + \pi, -\phi_l)$. The average directional energy is given by

$$\langle E_l \rangle_L = \frac{1}{L}\sum_{l=1}^{L} E_l = \frac{1}{2\rho c^2 L}\sum_{l=1}^{L} |q_l|^2. \tag{40}$$

3.3.3. Directional diffusion

The previous metrics lead to the definition of directional diffusion. The directional diffusion in % is defined as follows [26]

$$d = \left(1 - \mu/\mu_o\right) \times 100\,\% \tag{41}$$

where $\mu$ is the average of the absolute difference between the directional energy density and the spatial average of the directional energy density and $\mu_o$ is the value of $\mu$ for a single impinging plane wave. Therefore, $d = 100\,\%$ for a perfectly diffuse sound field and $d = 0\,\%$ in anechoic conditions. In this paper, we follow the proposition of Gover et al. [26] and use the following definition for $\mu$

$$\mu = \frac{1}{\langle E_l \rangle_L}\sum_{l=1}^{L} \left|E_l - \langle E_l \rangle_L\right|. \tag{42}$$

However, for the evaluation of $\mu_o$, we rely on an average of $\mu$ (according to Eq. (42)) over all the possible plane wave directions:

$$\mu_o = \frac{1}{L}\sum_{l'=1}^{L} \frac{1}{\langle E_l^{(l')} \rangle_L}\sum_{l=1}^{L} \left|E_l^{(l')} - \langle E_l^{(l')} \rangle_L\right|. \tag{43}$$

That is, the inverse problem is theoretically computed $L$ times for all the possible harmonic plane wave directions (index $l'$ in the previous equation). The resulting solutions $q_l^{(l')}$ lead to the directional energy densities $E_l^{(l')}$ used in this definition of $\mu_o$. Indeed, the heterogeneous nature of the inverse problem solution $\mathbf{q}$ as a function of sound wave direction due to array geometry requires the computation of a direction-averaged $\mu_o$ as shown in Eq. (43). Note that since the inverse problem solution $\mathbf{q}$ is obtained with regularization, we do not expect that the directional diffusion will reach 0 % in practical situations. Indeed, the regularization introduces some spherical spreading of the solution, even for a single incoming plane wave.
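The directional energy density and directional diffusion of Eqs. (39) to (42) can be sketched as follows. As a simplifying assumption, the single-plane-wave reference value is computed here from an ideal, unregularized plane wave amplitude vector rather than from the direction-averaged inverse problem solutions of Eq. (43); the constants and function names are hypothetical.

```python
import numpy as np

RHO = 1.2      # air density [kg/m^3] (assumed value)
C = 343.0      # sound speed [m/s] (assumed value)

def directional_energy(q):
    """Eq. (39): energy density carried by each plane wave component."""
    return np.abs(q)**2 / (2.0 * RHO * C**2)

def energy_deviation(q):
    """Eq. (42): summed absolute deviation of E_l, normalized by its mean."""
    E = directional_energy(q)
    E_avg = np.mean(E)
    return np.sum(np.abs(E - E_avg)) / E_avg

def directional_diffusion(q, deviation_ref):
    """Eq. (41): diffusion in %, given the single-plane-wave reference value."""
    return (1.0 - energy_deviation(q) / deviation_ref) * 100.0

num_waves = 64
q_single = np.zeros(num_waves, dtype=complex)
q_single[5] = 1.0                                # ideal single plane wave
q_uniform = np.ones(num_waves, dtype=complex)    # equal energy from all directions

# Idealized reference (assumption): not the regularized average of Eq. (43).
deviation_ref = energy_deviation(q_single)

d_single = directional_diffusion(q_single, deviation_ref)
d_uniform = directional_diffusion(q_uniform, deviation_ref)
```

For an ideal single plane wave among L directions, the deviation of Eq. (42) evaluates to 2(L - 1), which is the reference value this sketch uses; a regularized solution would give a smaller reference because of the spreading mentioned above.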

3.4. Incident directivity factor

Assuming that the plane wave distribution $q_l(\theta_l, \phi_l)$ uniformly covers $4\pi$ steradians, it is possible to quantify the directivity of the source distribution. Inspired by the definition of the directivity factor of sound sources, an incident directivity factor is accordingly introduced

$$Q = \frac{\|\mathbf{q}\|_\infty^2}{\|\mathbf{q}\|_2^2}, \tag{44}$$

with $\mathbf{q} = \mathbf{q}_\lambda$ or $\mathbf{q} = \mathbf{q}_{BF}$. The corresponding incident directivity index [dB ref 1] is

$$DI = 10\log_{10}(Q). \tag{45}$$

This type of incident directivity factor was also introduced by Gover [26] for the analysis of transient sound fields in rooms. As for the directional diffusion, the directivity index is an averaged parameter that expresses the anisotropic character of a sound field.

3.5. Sound localization: Velocity and energyvectors, interaural time difference

Both the velocity and energy vectors originate from the audio engineering field, where researchers look for predictors of human sound localization in the presence of stereophonic sound systems.

The velocity vector was proposed as a sound localization predictor at low frequencies, i.e. typically below 700 Hz, where the interaural phase difference is a dominant localization cue and where head diffraction is minimal [23, 24, 27]. It was originally defined as the normalized particle velocity at the center of the reproduction region, where the listener stands. More recently, the velocity vector definition was extended to the entire sound field, and it is now given by Daniel et al. [27]:

$$\mathbf{V}(\mathbf{x}) = \rho c \, \frac{\mathbf{u}(\mathbf{x})}{p(\mathbf{x})}. \qquad (46)$$

One notes that the velocity vector is the particle velocity vector $\mathbf{u}(\mathbf{x})$ normalized by the particle velocity amplitude $p(\mathbf{x})/\rho c$ that would be obtained for a purely propagating plane wave of sound pressure amplitude equal to the local sound pressure $p(\mathbf{x})$. Therefore, the velocity vector is a dimensionless metric. Note that, by contrast with the intensity and DOA vectors, the velocity vector is a complex quantity. The real part of the velocity vector is in the opposite direction of the DOA vector $\mathbf{n}_{DOA}(\mathbf{x})$ (see Eq. (34)) and is associated with precise sound localization [24] and active sound intensity. It is also generally accepted that the imaginary part of the velocity vector is associated with image broadening or perceived phasiness [24]. Typically, the imaginary part of the velocity vector is also related to reactive sound intensity. In [27], it is stressed that the velocity vector can also be used to predict the interaural time difference (ITD). In accordance with the information conveyed by the velocity vector, we introduce an equivalent expression of the velocity vector

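$$\mathbf{V}(\mathbf{x}) = \mathbf{V}_R(\mathbf{x}) + i\,\mathbf{V}_I(\mathbf{x}), \qquad (47)$$

with $\mathbf{V}_R(\mathbf{x}) = \Re(\mathbf{V}(\mathbf{x}))$ and $\mathbf{V}_I(\mathbf{x}) = \Im(\mathbf{V}(\mathbf{x}))$. The magnitude, azimuth and elevation of $\mathbf{V}_R$, in the spherical coordinates shown in Fig. 1, are denoted $V_R$, $\theta_{V_R}$ and $\phi_{V_R}$, respectively. Since it is assumed that $\mathbf{V}_R$ is associated with sound localization, we derive the ITD as follows:

$$\mathrm{ITD}(\mathbf{x}, \mathbf{V}_R, \theta_H) = \frac{\Delta_H}{c} \sin(\theta_{V_R} - \theta_H) \cos(\phi_{V_R}), \qquad (48)$$

where it is assumed that the listener's head is oriented towards $\theta_H$ and $\phi_H$ without any roll movement of the listener's head, i.e. the two ears are always in the same horizontal plane. In Eq. (48), the listener's ear separation is $\Delta_H$ [m].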
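The roles of the real and imaginary parts of the velocity vector can be illustrated with a one-dimensional sketch. It assumes plane waves travelling along a single axis, an exp(+i omega t) time convention, and arbitrary values for the air density and speed of sound. None of these choices is taken from the paper.

```python
import cmath
import math

RHO = 1.2   # air density [kg/m^3] (assumed value)
C = 343.0   # speed of sound [m/s] (assumed value)


def velocity_vector_1d(waves, k, x):
    """Velocity vector along one axis, V = rho*c*u/p as in Eq. (46).

    waves: list of (complex amplitude, direction) with direction +1 or -1.
    Assumes exp(+i*omega*t), so exp(-i*k*s*x) propagates towards s*x.
    """
    p = sum(a * cmath.exp(-1j * k * s * x) for a, s in waves)
    u = sum(s * a * cmath.exp(-1j * k * s * x) for a, s in waves) / (RHO * C)
    return RHO * C * u / p


k = 2.0 * math.pi  # wavenumber for a unit wavelength
x = 0.1            # observation point, in wavelengths

v_travelling = velocity_vector_1d([(1.0, +1)], k, x)
v_standing = velocity_vector_1d([(1.0, +1), (1.0, -1)], k, x)
```

For the single travelling wave, the velocity vector is real with unit magnitude. For two opposite waves of equal amplitude, it is purely imaginary, which is consistent with the association between the imaginary part and reactive intensity.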

For the higher frequency range, head diffraction has a strong effect, which makes the interaural level difference (ILD) one of the dominant localization cues [27]. Accordingly, the following energy vector is a more relevant predictor of sound localization above 700 Hz [27]:

$$\mathbf{E} = \frac{\sum_{l=1}^{L} \mathbf{n}_l |q_l|^2}{\sum_{l=1}^{L} |q_l|^2}, \qquad (49)$$

or

$$\mathbf{E} = R_E \left[ \cos(\theta_E)\cos(\phi_E)\,\mathbf{e}_1 + \sin(\theta_E)\cos(\phi_E)\,\mathbf{e}_2 + \sin(\phi_E)\,\mathbf{e}_3 \right], \qquad (50)$$

where $0 \le R_E \le 1$ and $\theta_E$, $\phi_E$ are the spherical components (azimuth and elevation) of the energy vector. Note that the energy vector as defined in Eq. (49) can only predict sound localization at the origin of the coordinate system.
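A minimal sketch of Eqs. (49) and (50) follows. The amplitudes and unit direction vectors are hypothetical inputs chosen to show the limiting behaviour of the energy vector radius.

```python
import math


def energy_vector(amplitudes, directions):
    """Energy vector of Eq. (49) from amplitudes q_l and unit vectors n_l."""
    weights = [abs(q) ** 2 for q in amplitudes]
    total = sum(weights)
    return tuple(
        sum(w * n[axis] for w, n in zip(weights, directions)) / total
        for axis in range(3)
    )


def spherical_components(e):
    """Radius, azimuth and elevation [rad] following the form of Eq. (50)."""
    radius = math.sqrt(sum(c * c for c in e))
    azimuth = math.atan2(e[1], e[0])
    if radius > 0.0:
        elevation = math.asin(max(-1.0, min(1.0, e[2] / radius)))
    else:
        elevation = 0.0  # direction undefined for a null vector
    return radius, azimuth, elevation


front = (1.0, 0.0, 0.0)
back = (-1.0, 0.0, 0.0)
left = (0.0, 1.0, 0.0)

r_single, az_single, _ = spherical_components(energy_vector([1.0], [front]))
r_pair, _, _ = spherical_components(energy_vector([1.0, 1.0], [front, back]))
r_mix, az_mix, _ = spherical_components(energy_vector([1.0, 1.0], [front, left]))
```

A single plane wave gives a unit radius, two opposite waves of equal amplitude cancel, and two orthogonal waves of equal amplitude give an intermediate radius with an azimuth halfway between them.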

3.6. Diffuseness field and average diffuseness

By combining the intensity field (Eq. (32)) and the energy density field (Eq. (36)), Merimaa and Pulkki [13, 14, 15, 16] introduced the definition of diffuseness for a single frequency and a single point in space. For any point $\mathbf{x}$, the diffuseness field can be written as follows:

$$\psi(\mathbf{x}) = 1 - \frac{\|\mathbf{I}(\mathbf{x})/c\|_2}{E(\mathbf{x})}. \qquad (51)$$

The diffuseness varies between zero and unity. Theoretically, $\psi(\mathbf{x}) = 1$ is expected in a completely diffuse field, while $\psi(\mathbf{x}) = 0$ is expected in a purely propagative field. One of the issues that arise with the diffuseness $\psi(\mathbf{x})$ is that it mostly depends on the sound intensity. Therefore, any situation that leads to a null intensity will be detected as a diffuse sound field. This can arise for two plane waves of identical amplitudes propagating in opposite directions, which produce a standing wave. In this case, the net energy flow is zero and the intensity field is null. This situation is easily generalized to any sound field made of plane waves propagating in opposite directions with similar amplitudes. Therefore, the diffuseness, as defined above, cannot distinguish between a standing wave pattern and a diffuse sound field. This will be illustrated in Sec. 3.7.
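The limitation discussed above can be checked numerically. The sketch below assumes the usual time-averaged definitions for harmonic fields, I = Re(p u*)/2 and E = (rho |u|^2 + |p|^2/(rho c^2))/4, which may differ in detail from Eqs. (32) and (36). It is restricted to plane waves along one axis, with assumed values for rho and c.

```python
import cmath
import math

RHO = 1.2   # air density [kg/m^3] (assumed value)
C = 343.0   # speed of sound [m/s] (assumed value)


def diffuseness_1d(waves, k, x):
    """Diffuseness of Eq. (51) for plane waves along one axis.

    waves: list of (complex amplitude, direction) with direction +1 or -1.
    """
    p = sum(a * cmath.exp(-1j * k * s * x) for a, s in waves)
    u = sum(s * a * cmath.exp(-1j * k * s * x) for a, s in waves) / (RHO * C)
    intensity = 0.5 * (p * u.conjugate()).real                        # active intensity
    energy = 0.25 * (RHO * abs(u) ** 2 + abs(p) ** 2 / (RHO * C ** 2))
    return 1.0 - abs(intensity) / (C * energy)


k = 2.0 * math.pi
x = 0.1

psi_travelling = diffuseness_1d([(1.0, +1)], k, x)
psi_standing = diffuseness_1d([(1.0, +1), (1.0, -1)], k, x)
psi_partial = diffuseness_1d([(1.0, +1), (0.5, -1)], k, x)
```

The perfect standing wave reaches a diffuseness of one although only two plane waves are present, while an imbalance between the two waves gives an intermediate value.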

Another limitation of this definition of the diffuseness field is that it may not be appropriate for a diffuse sound field, since the local energy density $E(\mathbf{x})$ tends to vary with position for a harmonic diffuse sound field (see Sec. 3.2). This strong variation of the local energy density, in marked contrast with the spatially uniform local average energy density of the theoretical harmonic diffuse sound field [25], makes it difficult to use the local diffuseness $\psi(\mathbf{x})$ as a quantifier of the overall anisotropy or diffusion of the sound field over the SFE area.

Again, for the discrete set of points $\mathbf{x}_n$, we can also introduce a discrete average:

$$\langle \psi(\mathbf{x}) \rangle_N = \frac{1}{N} \sum_{n=1}^{N} \psi(\mathbf{x}_n). \qquad (52)$$

In subjective terms, the diffuseness is often related to listener envelopment, i.e. the sensation of surrounding and enveloping sound.

3.7. Theoretical test cases

In order to evaluate the capability of the previous characterization metrics (computed on the basis of SFE) to distinguish or characterize several types of sound fields, several test cases are reported: a single source in free field, multiple sources in free field, a standing wave and a diffuse sound field. As the test cases are presented, the relevance of these metrics is discussed for the characterization of sound environments in vehicles or other situations. Finally, the quantifiers most able to distinguish between these archetypical test cases are identified. As will be shown, two simple scalar metrics are the most appropriate for a classification of such sound field components. It is also made clear that other field metrics are useful to visualize and understand the behavior of the sound field over a large area.

3.7.1. Single source in free field

For this first test case, the SFE results reported in Fig. 4 are used. This corresponds to a single dipole of strength 0.1 [5] in free field. The intensity field computed at $N = 625$ locations in the horizontal plane is shown in Fig. 6(a). Clearly, the intensity field is stronger in the vicinity of the exact dipole position. Also note that the average intensity $\langle \mathbf{I}(\mathbf{x}) \rangle_N$ (computed from the $N$ SFE points, see Eq. (35)) is correctly oriented. The DOA field is shown in Fig. 6(b). This result clearly highlights the effectiveness of the SFE DOA vector as a predictor of perceived sound localization in a free-field situation over the entire SFE region. The average DOA orientation, although slightly different from the average intensity orientation, is also correctly aligned. This slight orientation difference arises because all the $N$ points contribute equally to the average DOA, whereas the contribution of each of the $N$ points to the average intensity is proportional to the local intensity magnitude. Also, the DOA more efficiently predicts the perceived direction of the incoming sound. Note that proper intensity and DOA results are not expected outside the effective SFE region shown by the contour lines in Figs. 6(a) and 6(b).
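The difference between the two averages can be reproduced with two hypothetical intensity samples of unequal magnitude. The sketch assumes that the DOA vector is the unit vector opposite to the intensity, i.e. -I/|I|. Eq. (34) is not restated here, so this form is an assumption.

```python
import math


def mean_vector(vectors):
    """Component-wise arithmetic mean of a list of 2-D vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(2))


def doa(intensity):
    """Unit DOA vector, assumed here to be -I/|I|."""
    norm = math.hypot(intensity[0], intensity[1])
    return (-intensity[0] / norm, -intensity[1] / norm)


# Two hypothetical SFE points: a strong and a weak intensity vector.
intensities = [(10.0, 0.0), (0.0, 1.0)]

mean_intensity = mean_vector(intensities)
mean_doa = mean_vector([doa(i) for i in intensities])

angle_intensity = math.degrees(math.atan2(mean_intensity[1], mean_intensity[0]))
# The mean DOA is flipped so that both angles refer to the propagation direction.
angle_doa = math.degrees(math.atan2(-mean_doa[1], -mean_doa[0]))
```

The average intensity is dominated by the strong sample (about 5.7 degrees), whereas the average DOA weights both samples equally (45 degrees).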

The corresponding energy density $E(\mathbf{x})$ and diffuseness $\psi(\mathbf{x})$ fields are shown in Figs. 7(a) and 7(b). The energy density is confined to the vicinity of the true dipole position. Moreover, one notes that the local diffuseness is zero nearly everywhere in the SFE region except along the dipole null-axis. This is expected since the null pressure observed on the dipole null-axis makes the intensity null in this region. Therefore, the diffuseness approaches unity. This result explains the observed DOA in that region. Indeed, a closer look at the DOA (Fig. 6(b)) along the dipole null-axis shows that the DOA strongly varies with the position in that area. This variation is an artifact: since the intensity approaches zero in that region, the DOA is not defined ($\mathbf{n}_{DOA} \to 0/0$). Interestingly, the DOA clearly goes through strong or erroneous variations when it crosses the boundary of the effective SFE region illustrated by the contour lines in Fig. 6(b). The local variation of the DOA also suggests a potential local sensation of diffuseness, which also corresponds to the actual perception on the null-axis of a dipole in free field. This will be confirmed by the velocity vectors.

[Fig. 6 panel annotations: (a) I(x) [W/m^2], Max(|I(x)|) = 0.087649 W/m^2, <I(x)>_N = 0.0067485; (b) n_DOA(x) and <n_DOA>_N. Axes: x1/k and x2/k.]

Fig. 6: (a): Intensity field $\mathbf{I}(\mathbf{x})$ (Eq. (32)) [W/m^2], average intensity vector $\langle \mathbf{I}(\mathbf{x}) \rangle_N$ [W/m^2], (b): direction-of-arrival $\mathbf{n}_{DOA}(\mathbf{x})$ (Eq. (34)) and average DOA $\langle \mathbf{n}_{DOA} \rangle_N$ ($N = 625$) for a single dipole in free field for the SFE shown in Fig. 4. The microphone array is shown in light grey. The average vectors (computed for the SFE points) are centered at the origin and shown as large arrows. Local SFE errors are shown as contour lines (see Fig. 4 for more details).

The directional pressure $p_l$ and the energy vector $\mathbf{E}$ obtained from SFE are shown in Fig. 8(a) as a spherical plot in linear scale (radius) and logarithmic scale (color) [dB ref 1]. Clearly, the directional pressure, which is supposed to predict the perceived sound direction above 700 Hz, is precise and well aligned with the energy vector, which points (from the origin of the coordinate system) towards the dipole. The directional energy and the energy vector are shown in Fig. 8(b). As expected from the directional energy definition (Eq. (39)), the directional energy is much more precise than the directional pressure.

The velocity vector $\mathbf{V}(\mathbf{x})$, the real part of which is a predictor of sound source localization below 700 Hz, is reported in Fig. 9. As expected from the definition of the velocity vector (see Eq. (46)), the orientation of the real part of the velocity vector matches the orientation of the DOA vector (see Eq. (34)). In addition, the imaginary part of the velocity vector highlights the regions where the sound is perceived as diffuse or not localized. Clearly, for the reported test case, the imaginary part of the velocity vector is non-negligible on the dipole null-axis, the region where the diffuseness $\psi(\mathbf{x})$ approaches one (see Fig. 7(b)). Therefore, as shown by this example, the velocity vector is an interesting metric since it combines the information carried by the DOA vector $\mathbf{n}_{DOA}(\mathbf{x})$ and the diffuseness $\psi(\mathbf{x})$. The ITD predicted from the real part of the velocity vector is shown in Fig. 10, where one clearly notes the transition from negative to positive ITD when the listener passes from one side of the sound source to the other while the head azimuth is fixed to the angle of the energy vector. Therefore, SFE seems to correctly predict the velocity vector and the ITD in the SFE region where the local SFE errors are low.

The scalar metrics related to this test case are reported in Tab. 1. We recall that these scalar metrics are directly derived from the plane wave amplitudes obtained from the inverse problem solution: these scalar metrics are representative of the sound field as a whole. Interestingly, even if they are intuitively understood as origin-centered, they are in fact the same for any SFE point $\mathbf{x}$. Indeed, all the scalar metrics, except the directional diffusion $p_l$, are based on absolute and squared values of the plane wave amplitudes $\mathbf{q}$. Therefore, the phase shift of the solution for a translation of the origin (Eq. (6)) does not affect the computed scalar metrics. This is an important property of these scalar metrics.

Fig. 7: (a): Energy density $E(\mathbf{x})$ (Eq. (36)) [J/m^3] 104, average energy $\langle E(\mathbf{x}) \rangle_N$ [J/m^3], (b): diffuseness $\psi(\mathbf{x})$ (Eq. (51)) and average diffuseness $\langle \psi(\mathbf{x}) \rangle_N$ ($N = 14400$) for a single dipole in free field for the extrapolated sound field shown in Fig. 4.

Fig. 8: Spherical plots of the (a): directional pressure $p_l(\theta_l, \phi_l)$ (linear (radius) and dB ref 1 (color) scales), (b): the directional energy density $E_l$ (linear (radius) and dB ref 1 (color) scales) and the energy vector $\mathbf{E}$ (shown as a large arrow) for a single dipole in free field for the extrapolated sound field shown in Fig. 4.

[Fig. 9 panel titles: Re[V(x)] and Im[V(x)], Max(|V(x)|) = 36.289. Axes: x1/k and x2/k.]

Fig. 9: Real (top) and imaginary (bottom) parts of the velocity vector $\mathbf{V}(\mathbf{x})$ (Eq. (46)) for a single dipole in free field for the extrapolated sound field shown in Fig. 4. Local SFE errors are shown as contour lines (see Fig. 4 for more details).

Metrics                                     Values
Deviation of E(x) ($\sigma_N$)              110.3190 %
Directional diffusion ($d$)                 13.6152 %
Directivity factor ($Q$)                    0.2920
Directivity index (DI)                      -5.3462 dB ref 1
Energy vector azimuth ($\theta_E$)          1.0985 rad
Energy vector elevation ($\phi_E$)          0.0014 rad
Energy vector radius ($R_E$)                0.9174

Table 1: Scalar metrics for the single dipole in free field (see Fig. 4) with $N = 14400$.

Fig. 10: Predicted ITD [ms] (Eq. (48)) for a single dipole in free field for the extrapolated sound field shown in Fig. 4. The listener head orientation is fixed to $\theta_H = \theta_E$ over the entire SFE region. The head orientation is shown as a large black arrow. Local SFE errors are shown as contour lines (see Fig. 4 for more details).

For this reported test case, the values of the directivity factor and directivity index suggest a moderately directive sound field. This is not really the case; therefore, these two quantifiers might not be the most appropriate or should, at least, be modified for the case of a single source in free field. For comparison purposes, with $L = 642$ plane waves in the source distribution of the direct problem, a single plane wave would give $Q = 1/1 = 1$ and DI = 0 dB ref 1, while a totally diffuse sound field with $L = 642$ equal-amplitude plane waves would give $Q = 1/642 = 0.0016$ and DI = -28.0754 dB ref 1. However, as will be shown in the following, more immersive sound field situations, the $Q$ and DI values reported in Tab. 1 are in the highest observable range.

Comparison of these $Q$ and DI directivity metrics with the directional diffusion $d$ suggests that the latter is a more effective directivity metric. Indeed, $d$ is much closer to its lowest value (0 %) than $Q$ (or DI) is to its highest value. Moreover, a high standard deviation of the energy density ($\sigma_N$) suggests a heterogeneous distribution of the energy density through space, which supports the idea that the sound field is anything but diffuse.

From the scalar metrics, one also notes that the energy vector magnitude $R_E = 0.9174$ is relatively high. This is a direct consequence of the definition of the energy vector $\mathbf{E}$ (Eq. (49)), which implies that the energy vector magnitude is high if and only if the directional pressure shows a strong spherical polarity. Indeed, in the case of a spherically symmetrical $q_l$, the vector sum in Eq. (49) is null. This will be further discussed for the upcoming test cases.

3.7.2. Two sources in free field

This test case corresponds to a free-field two-channel stereophonic sound reproduction situation. As shown in Fig. 11, two in-phase monopole sources are located at $x_1/k = \pm 1$ and $x_2/k = 2$. The monopole amplitude [25] of each source is 0.5.

For this test case, the SFE results are presented in Fig. 11. The intensity field computed at $N = 625$ points in the horizontal plane is shown in Fig. 12(a). Clearly, the intensity field is stronger in the vicinity of the monopole positions. Also note that the average intensity is correctly oriented in terms of stereophonic sound perception. The DOA field is shown in Fig. 12(b). This result clearly highlights the effectiveness of the SFE-based DOA vector as a predictor of perceived sound localization in a stereophonic free-field situation (two coherent sound sources). This prediction is valid over the entire effective SFE region. Indeed, for the extended central sweet spot ($x_1/k = 0$, $x_2 \le 2$), the reported test case exactly corresponds to stereophonic listening with a centered phantom image created by level-difference stereophony. The average DOA is also correctly oriented in that sweet spot. For an off-axis listening position, the phantom image predicted by the DOA and SFE deviates towards the closest sound source.

The energy density field $E(\mathbf{x})$ and the diffuseness field $\psi(\mathbf{x})$ for the same extrapolated sound field are shown in Figs. 13(a) and 13(b). Again, the energy density is well localized in the vicinity of the exact source positions. One notes that the local diffuseness is zero nearly everywhere in the SFE region except in the region where a wrong or imprecise phantom image position is expected from two-channel stereophonic systems (for a given frequency). Source localization cues from the local diffuseness $\psi(\mathbf{x})$ agree with the perceived sound direction predicted from the DOA in these regions (see Fig. 12(b)). Indeed, a closer look at the DOA (Fig. 12(b)) in this region reveals a DOA that strongly varies with the position in that area.

The directional pressure, directional energy and energy vectors are presented in Figs. 14(a) and 14(b). One notes that the energy vector is a good predictor of sound localization for a listener at the center of the array, while the directional pressure and energy density reveal both the presence of the two real sources and the presence of the perceived central sound image. The complex velocity vector field is shown in Fig. 16. Again, we note that the real part of the velocity vector predicts both the sound localization created by the stereophonic image and the diffuse curved regions. The ITD predicted from the real part of the velocity vector is shown in Fig. 15, where one clearly notes that the ITD is zero in the central region. Moreover, one can observe the expected passage from negative to positive ITD on the left and on the right sides of the central position. Again, SFE seems to correctly predict the velocity vector and the ITD for the reported test case.

The scalar metrics for this test case are presented in Tab. 2. The directivity factor $Q$ and the corresponding directivity index DI are lower than for the case of a single source in free field. These values suggest a less directive sound field, which is the case. The directional diffusion $d$ also gives a higher value. The deviation of the energy density is very similar to the value obtained for the single dipole test case. Therefore, it seems that this deviation $\sigma_N$ might be a good indicator of free-field situations with localized sound sources. Again, the energy vector azimuth and elevation angles agree with the expected sound perception.

Fig. 11: (a): Real part of the original sound field created by two monopole sound sources. (b): Real part of the extrapolated sound field using a plane wave source distribution obtained from Eq. (20) with a regularization parameter of 0.001. The monopoles are marked as black and white dots. The white contour line indicates the region of 0.001 of local quadratic SFE error and the black contour line indicates the region of 0.1 of local quadratic SFE error.

[Fig. 12 panel annotations: (a) I(x) [W/m^2], Max(|I(x)|) = 0.15636 W/m^2, <I(x)>_N = 0.011408; (b) n_DOA(x) and <n_DOA>_N. Axes: x1/k and x2/k.]

Fig. 12: (a): Intensity field $\mathbf{I}(\mathbf{x})$ (Eq. (32)) [W/m^2], average intensity vector $\langle \mathbf{I}(\mathbf{x}) \rangle_N$ [W/m^2], (b): direction-of-arrival $\mathbf{n}_{DOA}(\mathbf{x})$ (Eq. (34)) and average DOA vector $\langle \mathbf{n}_{DOA}(\mathbf{x}) \rangle_N$ ($N = 625$) for two monopoles in free field for the SFE shown in Fig. 11. The microphone array is shown in light grey. The average vectors (computed for the SFE points) are centered at the origin and shown as large arrows. Local SFE errors are shown as contour lines (see Fig. 11 for more details).

Fig. 13: (a): Energy density $E(\mathbf{x})$ [J/m^3] 104 (Eq. (36)), average energy $\langle E(\mathbf{x}) \rangle_N$ [J/m^3], (b): diffuseness $\psi(\mathbf{x})$ (Eq. (51)) and average diffuseness $\langle \psi(\mathbf{x}) \rangle_N$ ($N = 14400$) for two monopoles in free field for the extrapolated sound field shown in Fig. 11.

Fig. 14: Spherical plots of the (a): directional pressure $p_l$ (linear (radius) and dB ref 1 (color) scales), (b): directional energy density $E_l$ (linear (radius) and dB ref 1 (color) scales) and energy vector $\mathbf{E}$ for the two monopole sources in free field (see Fig. 11). The energy vector is shown as a large arrow (aligned with $x_2$ and fused with the main lobe).

Fig. 15: Predicted ITD [ms] (Eq. (48)) for the two monopole sources in free field (see Fig. 11). The listener head orientation is fixed to $\theta_H = \theta_E$ over the entire SFE region. The head orientation is shown as a large black arrow.

Metrics                                     Values
Deviation of E(x) ($\sigma_N$)              108.2139 %
Directional diffusion ($d$)                 22.9553 %

Table 2: Scalar metrics for the two monopole sources in free field (see Fig. 11) with $N = 14400$.

3.7.3. Standing wave in rectangular coordinates

[Fig. 16 panel titles: Re[V(x)] and Im[V(x)], Max(|V(x)|) = 35.0474. Axes: x1/k and x2/k.]

Fig. 16: Velocity vector $\mathbf{V}(\mathbf{x})$ (Eq. (46)) for the two monopole sources in free field (see Fig. 11). The monopoles are marked as black and white dots. The microphone array is shown in light grey. Local SFE errors are shown as contour lines (see Fig. 11 for more details).

In this case, the sound field is a standing wave in rectangular coordinates created by eight propagating waves in three-dimensional space. Low-frequency standing wave patterns can be found in small closed spaces such as vehicle cabins. Although not very often investigated by the spatial audio community, the identification, characterization and subsequent reproduction of standing wave patterns represent a specific challenge encountered in sound environment reproduction of closed spaces. The reported theoretical case corresponds to an oblique mode of a rigid-walled rectangular cavity. For this test case, the sound field is given by

$$p(\mathbf{x}) = \cos(k_{x_1} x_1)\cos(k_{x_2} x_2)\cos(k_{x_3} x_3), \qquad (53)$$

with $k_{x_1} = 2\pi\cos(\theta_s)\cos(\phi_s)$, $k_{x_2} = 2\pi\sin(\theta_s)\cos(\phi_s)$ and $k_{x_3} = 2\pi\sin(\phi_s)$. For the reported case, the standing wave angles were set to $\theta_s = \pi/7$ and $\phi_s = \pi/6$. The comparison of the original sound field and the SFE results with a regularization parameter of 0.0001 is shown in Fig. 17: SFE is effective over a large region. This is perhaps because the plane wave model used in the direct problem definition is more appropriate for that type of sound field.
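The statement that the standing wave of Eq. (53) is made of eight propagating waves can be verified numerically. The sketch below expands the product of cosines into eight plane waves of amplitude 1/8, one per sign combination of the wavenumber components. The assignment of pi/7 to the azimuth and pi/6 to the elevation is an assumption, as is the evaluation point.

```python
import cmath
import itertools
import math


def wavenumber_components(azimuth, elevation, k=2.0 * math.pi):
    """Cartesian wavenumber components for a unit wavelength (k = 2*pi)."""
    return (
        k * math.cos(azimuth) * math.cos(elevation),
        k * math.sin(azimuth) * math.cos(elevation),
        k * math.sin(elevation),
    )


def standing_wave(kvec, x):
    """Eq. (53): product of three cosines."""
    return math.prod(math.cos(kc * xc) for kc, xc in zip(kvec, x))


def eight_plane_waves(kvec, x):
    """Same field as a sum of eight plane waves, one per sign combination."""
    total = 0j
    for signs in itertools.product((+1, -1), repeat=3):
        phase = sum(s * kc * xc for s, kc, xc in zip(signs, kvec, x))
        total += cmath.exp(1j * phase) / 8.0
    return total


kvec = wavenumber_components(math.pi / 7.0, math.pi / 6.0)
point = (0.3, -0.2, 0.1)  # arbitrary position, in wavelengths

p_product = standing_wave(kvec, point)
p_sum = eight_plane_waves(kvec, point)
```

Each of the eight waves has the same wavenumber magnitude, and each has an opposite-direction partner of equal amplitude. This is why the net intensity, and therefore the energy vector, vanishes for this field.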

As expected, the corresponding intensity field (not shown here) is numerically null over the entire region. The energy density field $E(\mathbf{x})$ and the diffuseness field $\psi(\mathbf{x})$ for the extrapolated sound field are shown in Figs. 18(a) and 18(b). Clearly, the energy distribution over the entire SFE region corresponds to the modal pattern. However, the local diffuseness $\psi(\mathbf{x})$ is one nearly everywhere. On the basis of the definition of the diffuseness $\psi(\mathbf{x})$ (see Eq. (51)), this was to be expected. Indeed, according to that definition, this metric will attribute full diffuseness to a standing wave field since its net intensity is null over the entire SFE domain.

The directional pressure and energy density are shown in Figs. 19(a) and 19(b). In the second of these two figures, the identification of the eight propagating waves that create the three-dimensional standing wave is clear.

For this standing wave test case, the scalar metrics are reported in Tab. 3. The directivity factor $Q$ (Eq. (44)) is 0.2530 and the corresponding directivity index DI (Eq. (45)) is -5.9682 dB ref 1. These values suggest a moderately directive incident sound field. This is not really the case; therefore, these two quantifiers might not be the most appropriate or should, at least, be modified to detect standing wave patterns. Some indication of the standing-wave nature of the sound field is provided by the fact that the directional diffusion $d$ is very low: 12.3839 %, which is the lowest of all the observed directional diffusion values for the four reported test cases. Moreover, the energy vector radius $R_E$ is very small: 0.0379, in comparison with $R_E$ close to 1 for the single dipole and two monopole test cases. Notably, this test case illustrates a very interesting property of the energy vector magnitude. Indeed, as soon as two plane waves of opposite directions share a similar amplitude $|q_l|$, they tend to cancel each other in the computation of the energy vector. Most interesting is the fact that this happens for plane wave distributions such as the one shown in Fig. 19(a) for an oblique standing wave, but also for any other stationary waves such as cylindrical or spherical harmonics. Moreover, this same cancellation also arises for a diffuse sound field, where sound energy travels in all directions. Therefore, the energy vector magnitude seems to be a good predictor of directive (few sources in free space) or non-directive (standing waves or partly diffuse sound fields) sound fields. This will be further discussed in the case of the diffuse sound field.

Fig. 17: (a): Real part of the original sound field defined as a standing wave in rectangular coordinates. The dimensionless problem is normalized by the acoustical wavelength k. (b): Real part of the extrapolated sound field using a plane wave source distribution obtained from Eq. (20) with a regularization parameter of 0.0001. The white contour line indicates the region of 0.001 of local quadratic SFE error. The nodal lines are shown as dashed black lines.

Fig. 18: (a): Energy density $E(\mathbf{x})$ (Eq. (36)) [J/m^3] 106, average energy $\langle E(\mathbf{x}) \rangle_N$ [J/m^3], (b): diffuseness $\psi(\mathbf{x})$ (Eq. (51)) and average diffuseness $\langle \psi(\mathbf{x}) \rangle_N$ ($N = 14400$) for a standing wave and for the extrapolated sound field shown in Fig. 17. The nodal lines are shown as dashed lines.

Fig. 19: (a): Directional pressure $p_l$ (linear (radius) and dB ref 1 (color) scales) and (b): directional energy density $E_l$ (linear (radius) and dB ref 1 (color) scales) for a standing wave for the extrapolated sound field shown in Fig. 17.

Metrics Values

Table 3: Scalar metrics for the stationary wave (see Fig. 17) with $N = 14400$.

3.7.4. Diffuse sound field

For this test case, a harmonic diffuse sound field is created using a limited set of 642 plane waves coming from random directions covering a solid angle of $4\pi$ steradians. Both the amplitude and the phase of the plane waves were random. For more details about the definition and properties of harmonic diffuse sound fields, the reader is referred to [25]. The original diffuse sound field and the corresponding SFE result are shown in Fig. 20 for a regularization parameter of 0.0001. Again, the SFE method performs very well over a large effective area, even for the specific case of a diffuse sound field. The sound field characterization metrics, namely sound intensity, DOA, energy density, diffuseness, directional pressure and directional energy density, are shown in Figs. 21(a) to 23(b).
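A random plane wave set of this kind can be sketched as follows. The sampling choices (directions uniform on the sphere, amplitudes uniform in [0, 1), phases uniform in [0, 2 pi), fixed seed) are assumptions made for illustration. The paper only states that directions, amplitudes and phases were random.

```python
import cmath
import math
import random


def random_plane_waves(count, seed=0):
    """Return a list of (complex amplitude, unit direction vector) pairs."""
    rng = random.Random(seed)
    waves = []
    for _ in range(count):
        z = rng.uniform(-1.0, 1.0)                   # uniform in z gives a uniform sphere
        azimuth = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        direction = (r * math.cos(azimuth), r * math.sin(azimuth), z)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        amplitude = rng.random() * cmath.exp(1j * phase)
        waves.append((amplitude, direction))
    return waves


def energy_vector_radius(waves):
    """Magnitude of the energy vector of Eq. (49) for the given plane waves."""
    weights = [abs(a) ** 2 for a, _ in waves]
    total = sum(weights)
    components = [
        sum(w * n[i] for w, (_, n) in zip(weights, waves)) / total
        for i in range(3)
    ]
    return math.sqrt(sum(c * c for c in components))


waves = random_plane_waves(642)
r_e = energy_vector_radius(waves)
```

Because the contributions of the many random directions largely cancel in the vector sum, the resulting energy vector radius is small. It is not exactly zero for a finite set of waves.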

Both the sound intensity and the DOA fields shown in Figs. 21(a) and 21(b) suggest a diffuse situation. Indeed, the sound intensity average is very low and the DOA and sound intensity spatial variations are large. This diffuse character will be supported by the corresponding scalar metrics.

The acoustical energy density $E(\mathbf{x})$ and the diffuseness $\psi(\mathbf{x})$ are shown in Figs. 22(a) and 22(b), respectively.


Fig. 20: (a): Real part of the original diffuse sound field. The dimensionless problem is normalized by the acoustical wavelength k. (b): Real part of the extrapolated sound field using a plane wave source distribution obtained from Eq. (20) with a regularization parameter of 0.0001. The white contour line indicates the region of 0.001 of local quadratic SFE error and the black contour line indicates the region of 0.1 of local quadratic SFE error.

[Fig. 21 panel annotations: (a) I(x) [W/m^2], Max(|I(x)|) = 0.00085613 W/m^2, <I(x)>_N = 2.7693e-005; (b) n_DOA(x) and <n_DOA>_N. Axes: x1/k and x2/k.]

Fig. 21: (a): Intensity field $\mathbf{I}(\mathbf{x})$ [W/m^2] (Eq. (32)), average intensity vector $\langle \mathbf{I}(\mathbf{x}) \rangle_N$ [W/m^2], (b): DOA $\mathbf{n}_{DOA}(\mathbf{x})$ (Eq. (34)) and average DOA vector $\langle \mathbf{n}_{DOA}(\mathbf{x}) \rangle_N$ ($N = 625$) for a diffuse sound field (SFE shown in Fig. 20). The microphone array is shown in light grey. The average vectors are centered at the origin and shown as large arrows. Local SFE errors are shown as contour lines (see Fig. 20 for more details).

Fig. 22: (a): Energy density $E(\mathbf{x})$ [J/m^3] 106 (Eq. (36)), average energy $\langle E(\mathbf{x}) \rangle_N$ [J/m^3], (b): diffuseness $\psi(\mathbf{x})$ (Eq. (51)) and average diffuseness $\langle \psi(\mathbf{x}) \rangle_N$ ($N = 14400$) for a diffuse sound field (SFE shown in Fig. 20).

Metrics Values

Table 4: Scalar metrics for the diffuse sound field (see Fig. 20) with $N = 14400$.

The energy density distribution is not homogeneous and the diffuseness goes through strong spatial variations from zero to unity. By itself, the diffuseness average $\langle \psi(\mathbf{x}) \rangle_N = 0.43666$ suggests a moderately diffuse sound field, which is not the case. Since the diffuseness is $\psi(\mathbf{x}) \approx 1 \; \forall \mathbf{x}$ for the stationary wave (see Fig. 18(b)) but much less than unity for the true diffuse sound field, the diffuseness might not be the most appropriate and illustrative quantifier to distinguish a harmonic standing wave from a harmonic diffuse sound field.

The directional pressure and energy density are shown in Figs. 23(a) and 23(b). Since the energy vector $\mathbf{E}$ is very small, it is not shown in these two figures. By comparison with the previously reported test cases, these directional quantifiers show a distribution that covers the $4\pi$ steradians solid angle more uniformly. However, one can observe that the directional energy shows some sort of principal directions, something that would not be the case for a true description of a diffuse sound field. This heterogeneity explains the fact that the scalar metrics reported in the subsequent paragraphs do not reach the expected ideal theoretical extreme values. In fact, due to the finite size and geometry of the array, it might not be possible to obtain an entirely filled plane wave distribution for a true diffuse sound field.

The scalar metrics for the diffuse sound field test case are very relevant. They are shown in Tab. 4. Attention will be directed to the two most relevant scalar metrics. First, one notes that the directional diffusion $d$ is higher than for all the other test cases. Second, the energy vector radius $R_E$ is, as expected, very low, hence suggesting a poorly perceived sound source position. As will be shown in the next section, these two scalar metrics can be used to derive a classification tree that might be able to distinguish between the archetypical situations of a precise sound source, a standing wave pattern and a diffuse sound field.

Fig. 23: (a): Directional pressure $p_l$ (linear (radius) and dB ref 1 (color) scales) and (b): directional energy density $E_l$ (linear (radius) and dB ref 1 (color) scales) for a diffuse sound field (SFE shown in Fig. 20).

3.8. Transition between test cases

The previous sections and subsections highlighted the effectiveness of the field and scalar metrics to predict various characteristics of the sound field computed by SFE. Although the physical metrics such as the sound intensity field, DOA field, diffuseness field and expected ITD are very useful, they provide a very detailed description of the sound field. Sometimes, it is interesting to classify or characterize the sound field in broader categories, atoms or terms so that the most appropriate sound reproduction technique can be selected for that sound environment component. In this section, we present the transition between the archetypical test cases reported earlier: it will be shown that a simple classification could be based on two scalar metrics, namely the energy vector magnitude $R_E$ and the directional diffusion $d$.

Four transitions were computed, each for 20 interpolation points: the single dipole to the two monopoles (hereafter named #1 to #2), the single dipole to the standing wave (#1 to #3), the standing wave to the diffuse sound field (#3 to #4) and the single dipole to the diffuse sound field (#1 to #4). For the interpolation, the original test cases were first scaled to ensure a similar vector 2-norm of the measured pressure $p(x_m)$ at the microphone array. Note that the transitions between the two monopoles in free field and the standing wave or diffuse sound field are not reported. Indeed, it was sufficient to keep only a single free-field case, namely the single dipole, for the transition between free field and standing wave or diffuse sound field. Otherwise, the transition graphics would have been too dense. Next, for the interpolation, a linear amplitude fade is applied between the two limiting test cases and the inverse problem is solved for each of the interpolation points. The energy vector and the directional diffusion are then directly computed as above. Note that for the interpolations the regularization parameter in Eq. 22 is set to 0.01 for #1 to #2 and to 0.0001 for #1 to #3, #1 to #4 and #3 to #4.
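The scaling and fading steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the pressure vectors are random placeholders rather than the simulated test cases, "similar 2-norm" is implemented as unit 2-norm, and the 20 interpolation points are assumed to include both end points. The inverse problem of Eq. 22 would then be solved for each faded pressure vector, which is not shown here.

```python
import numpy as np

def scale_to_unit_norm(p):
    """Scale a complex pressure vector to unit vector 2-norm (assumed scaling)."""
    return p / np.linalg.norm(p)

def fade_between(p_a, p_b, n_points=20):
    """Return n_points pressure vectors fading linearly from p_a to p_b."""
    p_a = scale_to_unit_norm(np.asarray(p_a, dtype=complex))
    p_b = scale_to_unit_norm(np.asarray(p_b, dtype=complex))
    # Fade weights from 0 % to 100 % of the transition, end points included.
    weights = np.linspace(0.0, 1.0, n_points)
    return [(1.0 - w) * p_a + w * p_b for w in weights]

# Hypothetical microphone pressures for two limiting test cases (8 microphones).
rng = np.random.default_rng(0)
p_case_1 = rng.standard_normal(8) + 1j * rng.standard_normal(8)
p_case_3 = 5.0 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))

steps = fade_between(p_case_1, p_case_3)
```

Scaling before fading keeps the louder test case from dominating the intermediate points, so that the 50 % point is an even mixture of the two pressure vectors.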

The results, in terms of energy vector magnitude and directional diffusion, are shown in Fig. 24.

Let us first examine the transition #1 to #2, which is from the single dipole to the two monopoles. As one could expect, both cases occupy a similar region of the $d$-$R_E$ plane and, most notably, they share a very high $R_E$, which indicates a very directive sound field.

Next, consider the transition #1 to #3, which is from the single dipole to the standing wave. The major difference between the two limiting points of this transition is along $R_E$: the standing wave involves, as explained earlier, a very low $R_E$. When the transition fade is at 50 %, $R_E \approx 0.6$. The curvature (along axis $d$) of this transition curve is easily explained by the fact that the stand-




simply assume that a point that tends toward this limiting corner will approach a single propagating wave in free field. The other limiting case occurs at $R_E = 0$, $d/100 = 1$ (shown as a thick + in Fig. 24). This case is even more difficult to reach, both physically and as a solution of the inverse problem. Indeed, it is only possible if the plane wave amplitudes are all exactly equal in magnitude. Given the imperfections of the array and of the SFE algorithm, such an ideal case cannot be reached. This explains why the directional diffusion seems to be limited at 0.6 for the reported cases. The last possible limiting case is at $R_E = 0$, $d/100 = 0$ (shown as a thick marker in Fig. 24). This could only occur if two plane waves of exactly the same amplitude were exactly facing each other. That would be the indication of an axial mode. The other corner of the classification plot, $R_E = 1$, $d/100 = 1$, corresponds to an impossible case of both spherically-polarized and spherically-uniform plane wave amplitudes. Therefore, one should not expect any points in the top right triangular part of the plot.
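The $R_E$ coordinate of these limiting corners can be checked numerically with a small sketch. It assumes the usual energy-vector form, that is, the sum of the plane wave directions weighted by squared amplitude magnitudes and normalized by the total; the weighting defined earlier in the paper may differ in detail. The function name and the six axis-aligned directions are hypothetical choices for illustration only. The directional diffusion $d$ is not reproduced here since its definition belongs to an earlier section.

```python
import numpy as np

def energy_vector_magnitude(amplitudes, directions):
    """Magnitude of the energy vector for a set of plane waves (assumed form)."""
    a2 = np.abs(np.asarray(amplitudes, dtype=complex)) ** 2
    u = np.asarray(directions, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)  # unit direction vectors
    # Energy-weighted mean direction, normalized by the total energy.
    return float(np.linalg.norm(a2 @ u) / a2.sum())

# Hypothetical direction set: plane waves arriving along +/- x, y and z.
axes = np.array([[1, 0, 0], [-1, 0, 0],
                 [0, 1, 0], [0, -1, 0],
                 [0, 0, 1], [0, 0, -1]])

r_single = energy_vector_magnitude([1, 0, 0, 0, 0, 0], axes)    # one plane wave
r_opposed = energy_vector_magnitude([1, 1, 0, 0, 0, 0], axes)   # two facing waves
r_uniform = energy_vector_magnitude([1, 1, 1, 1, 1, 1], axes)   # equal amplitudes
```

Under this assumed definition, the single plane wave gives a magnitude of 1, while both the pair of facing waves and the uniform set give 0. The last two cases therefore share the same $R_E$ and can only be separated along the $d$ axis, which is consistent with the use of two metrics in the classification.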

Before actually proceeding to the experimental valida-

tion of the proposed SFE and SFC methods, the sug-

gested classification criteria must be discussed.

First, one should be aware that this is a preliminary proposal and that it could be refined. Indeed, it is easy to object that the methodology behind the definition of this classification is a kind of manual multi-dimensional analysis. We note that a systematic multi-dimensional analysis could be performed. However, the simple classification proposed in this paper has the great advantage of involving two simple metrics rather than a linear combination of metrics. Consequently, the classification is easy to understand and interpret.

Second, it could be objected that the proposed transition values of $R_E$ and $d$ that circumscribe the directive, non-directive, stationary and diffuse regions are derived for specific test cases, for a specific microphone array and for dimensionless simulations. Further verifications for various cases should be performed. In fact, we suggest that these transition values should be verified for other microphone arrays. In all cases, the experimental results will show that the proposed classification and transition values are convenient for the reported experiments.

3.10. Sound-field type scores

In real applications, a sharp classification tree might not

always be the most appropriate approach to quantify the

measured sound field. Therefore, we propose the intro-

duction of sound-field type scores. The free-field score

is given by

$$S_{ff} = R_E. \qquad (54)$$

The modal or standing-wave score is given by

$$S_m = (1 - R_E)(1 - d/60)^2, \qquad (55)$$

and the diffuse-field score is given by

$$S_d = (1 - R_E)(d/60)^2. \qquad (56)$$

As one notes, the division of the directional diffusion by 60 is inspired by the observations that stem from the transition plot shown in Fig. 24. These scores could be refined further. The advantage of the scores over a sharp classification scheme is that they can deal with intermediate cases. Moreover, for broadband noise or signals, it would be possible to plot the scores as a function of frequency. This is the topic of current verifications.
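As a minimal sketch, the three scores can be evaluated as follows, reading Eqs. (54) to (56) as $S_{ff} = R_E$, $S_m = (1 - R_E)(1 - d/60)^2$ and $S_d = (1 - R_E)(d/60)^2$, with $d$ expressed in percent. The function names and the dictionary keys are hypothetical, and selecting the highest score is only one possible way to use the scores.

```python
def sound_field_scores(r_e, d):
    """Sound-field type scores; r_e in [0, 1], d is the directional diffusion in percent."""
    s_ff = r_e                                   # free-field score, Eq. (54)
    s_m = (1.0 - r_e) * (1.0 - d / 60.0) ** 2    # modal (standing-wave) score, Eq. (55)
    s_d = (1.0 - r_e) * (d / 60.0) ** 2          # diffuse-field score, Eq. (56)
    return {"free-field": s_ff, "modal": s_m, "diffuse": s_d}

def dominant_type(r_e, d):
    """Return the label of the highest score (hypothetical helper)."""
    scores = sound_field_scores(r_e, d)
    return max(scores, key=scores.get)

# Evaluate the three idealized corners of the classification plot.
corner_free_field = sound_field_scores(1.0, 0.0)
corner_modal = dominant_type(0.0, 0.0)
corner_diffuse = dominant_type(0.0, 60.0)
```

Because the scores vary continuously with $R_E$ and $d$, an intermediate sound field receives non-zero values for several types instead of being forced into a single class.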

To illustrate the capability of these sound-field scores to deal with the reported test cases, Fig. 26 shows the scores of the four transitions already reported in Fig. 24. In Fig. 26(a), the transition from the single dipole to the two monopoles systematically gives $S_{ff}$ as the highest score. For the transition from the single dipole to the diffuse sound field, the scores shown in Fig. 26(b) are able to distinguish the free-field and diffuse-field situations. For the two other transitions, reported in Figs. 26(c) and (d), the scores are also good detectors of the standing waves and the diffuse sound field.

4. CONCLUSION

The aim of this paper was twofold: 1) to develop and describe a method of spatial sound field extrapolation based on microphone array measurements of arbitrary geometry and 2) to develop and define a sound field characterization method and a sound field classification based on known objective and subjective metrics.

To achieve SFE, a recently developed method was proposed and further analyzed. This method is based on the combination of classical least-squares inverse problems in matrix form with a beamforming regularization matrix used as a discrete smoothing norm in the regularization. In this paper, we extended the analysis of this regularization method, which was compared to the application of a beamforming scaling matrix in the inverse problem with classical Tikhonov regularization. On the basis of the generalized singular value decomposition of the transfer matrix and beamforming regularization matrix pair and the singular value decomposition of the transfer matrix, it was shown that the beamforming regularization matrix approach is equivalent to an inverse problem where the original transfer matrix is scaled by a beamforming scaling matrix. This is a new result that opens up the understanding of the original proposal on the use of a beamforming regularization matrix [7].

Fig. 26: Sound type scores ($S_{ff}$, $S_m$, $S_d$) for the four transitions shown in Fig. 24: (a) transition from #1 to #2, (b) transition from #1 to #4, (c) transition from #3 to #4 and (d) transition from #1 to #3. The scores are only shown for 0 %, 50 % and 100 % (from left to right) of the transitions; these points correspond to the markers in Fig. 24. The highest scores are highlighted as black bars.

Once SFE was achieved, the inverse problem solution was investigated to evaluate different sound field metrics, namely: energy density, sound intensity, direction of arrival, diffuseness, velocity vector, energy vector, directional energy, interaural time difference, incident directivity factor, incident directivity index and directional diffusion. Using theoretical simulations, these metrics were compared in terms of their capability to simply characterize archetypical sound field types: a small number of sources in free-field situations, standing wave patterns
