Acoustic feedback suppression

in audio mixer for PA applications

Mattias Ekström

Master’s Thesis in Engineering Physics, Department of Physics, Umeå University, 2017

Department of Physics

Linnaeus väg 20 901 87 Umeå Sweden www.physics.umu.se

Department of Physics, Umeå University, June 19, 2017

Mattias Ekström (maek0025@student.umu.se)

Master’s thesis, engineering physics, spring 2017, 30 credits

Supervisor: Christian Schüld, Limes Audio

Examiner: Ove Andersson, Department of physics

Abstract

When a speaker is addressing an audience, a PA system consisting of a microphone and a loudspeaker is often used. If the microphone picks up too much of the loudspeaker energy, acoustic feedback in the form of an unwanted characteristic howling can occur. Limes Audio is a software company that specializes in improving sound quality in digital communications, mainly conference telephony, and has developed a reference product, the Magneto mixer, to demonstrate the capability of its software TrueVoice. The company now wishes to expand the field of usage for the Magneto mixer so that it can work as a microphone mixer in PA scenarios, and for this, a feedback suppression feature is needed. This master’s thesis aims at surveying the market and the literature in the field and specifying the requirements for a feedback suppression feature. Three methods for suppressing howling feedback are evaluated through simulations and compared in terms of maximum stable gain (MSG) and subjective listening experience. The method that performed best on these criteria was acoustic feedback cancellation with a 5 Hz frequency shift on the loudspeaker signal. This method uses an adaptive filter to model the acoustic feedback path and to remove the feedback component from the microphone signal. In the simulations, the method was able to increase the stable gain by approximately 10 dB while maintaining good sound quality.

Feedback reduction in audio mixer for application in PA systems

Summary

When a speaker addresses an audience, a PA system consisting of a microphone and a loudspeaker is often used. If the microphone picks up too much of the sound from the loudspeaker, there is an imminent risk of acoustic feedback in the form of a characteristic unwanted howl. Limes Audio is a company that develops software for improving sound quality in digital communication, mainly in conference telephony. The company has developed a demonstration product, the Magneto mixer, which can be used as a conference telephone to demonstrate its software TrueVoice. The company now wishes to develop the Magneto mixer so that it also works as an audio mixer for PA scenarios, or for conference telephony where internal sound reinforcement in the room is needed, and this requires a feature for removing any feedback. The aim of this thesis is to lay the foundation for such a feature in the Magneto mixer by surveying the market and the literature in the field. Three methods for eliminating feedback are evaluated in simulations and compared with respect to maximum stable gain (MSG) and subjective sound quality. The method "acoustic feedback cancellation", combined with a 5 Hz frequency shift of the loudspeaker signal, gave the highest MSG and the best sound quality. The method uses an adaptive filter to approximate the acoustic feedback path between loudspeaker and microphone and removes feedback components from the microphone signal. In the simulations, the method was able to increase the maximum stable gain by up to 10 dB while maintaining good speech quality.

List of abbreviations

AEC Acoustic Echo Cancellation

AEQ Automatic Equalization

AFC Acoustic Feedback Cancellation

FFT Fast Fourier Transform

FIR Finite Impulse Response

IIR Infinite Impulse Response

IMSD Interframe Magnitude Slope Deviation

LTI Linear Time-Invariant

MSG Maximum Stable Gain

NFS Notch filter based Feedback Suppression

NLMS Normalized Least Mean Square

PA Public Address

PHPR Peak-to-Harmonic Power Ratio

PNPR Peak-to-Neighbouring Power Ratio

RIR Room Impulse Response

Contents

1 Introduction
    1.1 Background
    1.2 Motivation
    1.3 Objective
    1.4 Disposition
2 Theory
    2.1 Basics of signals and systems
        2.1.1 Linear systems
        2.1.2 Digital filters
    2.2 The feedback phenomenon
    2.3 Stability analysis
3 Methods used in feedback suppression
4 Description of algorithms
    4.1 Frequency shifting
        4.1.1 Analytic signal
    4.2 Two-stage notch filtering
        4.2.1 Detection stage
        4.2.2 Suppression stage
    4.3 Acoustic feedback cancellation
        4.3.1 NLMS
5 Method for testing
    5.1 MATLAB simulation and evaluation
6 Results
    6.1 Feedback suppression
    6.2 Maximum stable gain
        6.2.1 Frequency shifting
        6.2.2 Notch filters
        6.2.3 Acoustic feedback suppression
    6.3 Subjective listening experience
7 Discussion, conclusion and future work
References

1 Introduction

1.1 Background

Whenever a speaker addresses an audience through a Public Address (PA) system consisting of a microphone and a loudspeaker, the entire performance is at risk of being ruined by feedback, perceived as "howling" at a certain frequency. Feedback howling is not only unpleasant for the audience, but also puts the PA equipment at risk of being damaged. Feedback occurs when the microphone picks up too much of the loudspeaker's energy (see chapter 2), which causes unstable oscillations at problematic frequencies. These oscillations are perceived as the howling that is probably familiar to the reader. Throughout the history of PA systems, feedback has been a recurring phenomenon, and different measures have been taken to prevent it. Since the 1960s, when the first feedback suppression methods were presented [1], [2], novel methods and algorithms have been proposed, and since the dramatic increase in the use of digital computers from the 1980s onward, more powerful and efficient algorithms have been developed as software implementations on digital signal processors (DSPs). Today, many consider the best way to avoid howling feedback to be a careful and well-planned setup of the microphone and loudspeakers, together with an experienced sound technician who optimizes the equalization of the PA system for the specific room and decreases the gain at potentially problematic frequencies [3]. In many applications, though, there is a need for a plug-and-play solution without a sound technician present. For these scenarios, the processes usually performed by a sound technician must be automated, or other measures need to be taken, in order to avoid howling feedback.

1.2 Motivation

Limes Audio AB is a company owned by Google that develops audio solutions for enterprise applications. Its main product, TrueVoice, has been developed to remove echoes, noise and other sonic artefacts in conference telephony and other applications that make use of a communication system with a loudspeaker and a microphone situated in the same unit. Limes Audio has designed a reference product called the Magneto mixer, which has the TrueVoice software embedded and can be used together with a computer as a plug-and-play conference mixing unit. The company now wishes to look into the possibility of expanding the field of usage for the Magneto mixer, from working as a conference telephony mixing unit to also working as a plug-and-play mixer unit in a PA system and in other teleconferencing scenarios where internal sound reinforcement is necessary. For this, the software in the Magneto mixer needs to be adapted to the PA case, which has a different problem formulation than the teleconference case.

1.3 Objective

For the Magneto mixer to work properly in the PA case, a feedback suppression feature is needed. This work has two main objectives. The first is to survey the literature on the subject, as well as competitors' solutions to the feedback problem, and to document the findings. The second is to specify the requirements for a feedback suppression feature in the Magneto mixer, to develop MATLAB code demonstrating the performance of some chosen methods, and to evaluate which method Limes Audio should aim to include in its future work on integrating a feedback suppressor in the Magneto mixer.

1.4 Disposition

Chapter 2 describes the mathematical theory of the feedback problem and the conditions required for howling feedback to occur. Chapter 3 briefly describes the available methods on the subject and provides arguments for my choice of methods for the following chapter. Chapter 4 describes the chosen feedback suppression algorithms in detail, and chapter 5 describes the methods used for testing the implementations and simulating the PA setup in MATLAB. Chapter 6 presents the results from the evaluation procedures, and chapter 7 concludes the report with a discussion of the findings and suggestions for future work.

2 Theory

This chapter describes the theoretical foundation upon which all feedback suppression algorithms are based, starting from the fundamentals of signals and systems. The mathematical formulation of the feedback problem is presented, and the conditions required for howling feedback to occur are explained.

2.1 Basics of signals and systems

2.1.1 Linear systems

A system $H$ is an operator that takes an input $x(t)$ and produces an output $y(t)$:

$$y(t) = H\{x(t)\}. \tag{2.1}$$

$H$ is said to be linear if it satisfies the superposition principle: if several inputs $x_1(t), x_2(t), \ldots, x_i(t)$ produce the outputs

$$y_1(t) = H\{x_1(t)\} \tag{2.2}$$

$$y_2(t) = H\{x_2(t)\} \tag{2.3}$$

$$\vdots \tag{2.4}$$

$$y_i(t) = H\{x_i(t)\}, \tag{2.5}$$

then the output obtained when the inputs are scaled by factors $\alpha_i$ and added together satisfies

$$\alpha_1 y_1(t) + \ldots + \alpha_i y_i(t) = H\{\alpha_1 x_1(t) + \ldots + \alpha_i x_i(t)\}. \tag{2.6}$$

A system is furthermore said to be time-invariant if a time shift $T$ in the input only results in a corresponding time shift in the output:

$$y(t - T) = H\{x(t - T)\}. \tag{2.7}$$

A Linear Time-Invariant (LTI) system can be described by its impulse response $h(t)$ in the time domain and by its frequency response $H(\omega)$ in the frequency domain. The impulse response is the output of an LTI system excited with an impulse at time $t = 0$. In the discrete domain, this impulse is represented by the Kronecker delta impulse

$$d_i = \begin{cases} 0 & \text{if } i \neq 0 \\ 1 & \text{if } i = 0. \end{cases} \tag{2.8}$$

The corresponding impulse in the continuous domain is the Dirac delta function. If the impulse response is known, one can, for any input $x(t)$, determine the output $y(t)$ of the system with the convolution operator $*$:

$$y(t) = h(t) * x(t). \tag{2.9}$$

The frequency response $H(\omega)$ is obtained by computing the Fourier transform of the impulse response $h(t)$, and describes the frequency spectrum of the output of the LTI system when the input is one of the impulse functions described above:

$$H(\omega) = \mathcal{F}\{h(t)\}, \tag{2.10}$$

where $\mathcal{F}$ is the Fourier transform operator. A property of interest for the convolution operator is the convolution theorem, which is obtained by computing the Fourier transform of both sides of eq. (2.9):

$$Y(\omega) = \mathcal{F}\{h(t) * x(t)\} = \mathcal{F}\{h(t)\}\,\mathcal{F}\{x(t)\} = H(\omega)X(\omega), \tag{2.11}$$

where $Y(\omega)$ and $X(\omega)$ are the Fourier transforms of the corresponding signals [4].

From eq. (2.11), it follows that the total frequency response of a system can be found by dividing the Fourier transform of the output signal by the Fourier transform of the input signal:

$$H(\omega) = \frac{Y(\omega)}{X(\omega)}. \tag{2.12}$$

For real-valued signals, the corresponding Fourier transforms are complex and Hermitian [4]. From the complex-valued frequency response $H(\omega)$, the magnitude response $|H(\omega)|$ and the phase response $\angle H(\omega)$ can be computed. These quantities describe how the system scales the magnitude and shifts the phase of the frequency components in the output signal.
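The convolution theorem in eq. (2.11) can be checked numerically for discrete signals. The following is a minimal sketch, not part of the MATLAB code developed in this work: the impulse response `h` and the input `x` are arbitrary illustrative values, and a naive discrete Fourier transform is used so that only the Python standard library is needed. Both sequences are zero-padded to the length of the convolution result, since the product of two discrete transforms otherwise corresponds to a circular convolution.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def convolve(h, x):
    """Linear convolution y = h * x, the discrete counterpart of eq. (2.9)."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, h_i in enumerate(h):
        for j, x_j in enumerate(x):
            y[i + j] += h_i * x_j
    return y


h = [1.0, 0.5, 0.25]        # illustrative impulse response (assumption)
x = [1.0, -2.0, 3.0, 0.5]   # illustrative input signal (assumption)
y = convolve(h, x)

# Zero-pad h and x to the length of y before transforming.
n = len(y)
H = dft(h + [0.0] * (n - len(h)))
X = dft(x + [0.0] * (n - len(x)))
Y = dft(y)

# Largest deviation between Y and the product H X over all frequency bins.
max_error = max(abs(Y[m] - H[m] * X[m]) for m in range(n))
```

Up to rounding errors, `max_error` is zero, which is the discrete form of $Y(\omega) = H(\omega)X(\omega)$.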

2.1.2 Digital filters

A digital filter is a system that manipulates an input signal in a desired way to produce a specific output. Examples are band-pass, low-pass and high-pass filters. Digital filters have either a Finite Impulse Response (FIR) or an Infinite Impulse Response (IIR). As the names suggest, the impulse response of an FIR filter has finite length, whereas that of an IIR filter is infinitely long. Since FIR filters have finite impulse responses, they are always stable, but they can be computationally demanding. IIR filters, in contrast, can be unstable, but are in general less computationally demanding than FIR filters [4].
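The difference between the two filter types can be illustrated with a small sketch. The filter coefficients below are arbitrary assumptions chosen for illustration. The FIR filter returns exactly zero once the impulse has passed through its three taps, whereas the one-pole IIR filter responds forever: its response decays when the feedback coefficient has a magnitude below one and grows without bound otherwise.

```python
def fir_filter(b, x):
    """FIR filter: y[n] = sum_k b[k] * x[n - k]."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]


def one_pole_iir(a, x):
    """One-pole IIR filter: y[n] = x[n] + a * y[n - 1]."""
    y, previous = [], 0.0
    for sample in x:
        previous = sample + a * previous
        y.append(previous)
    return y


impulse = [1.0] + [0.0] * 19

fir_response = fir_filter([0.5, 0.3, 0.2], impulse)  # zero after three samples
stable_iir = one_pole_iir(0.5, impulse)              # decays as 0.5 ** n
unstable_iir = one_pole_iir(1.1, impulse)            # grows as 1.1 ** n
```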

2.2 The feedback phenomenon

In situations where a speaker is addressing an audience located in the same room, a PA system consisting of a microphone and loudspeakers is often used. Because the microphone and loudspeakers are situated in the same room, there is a significant risk of feedback from the loudspeakers to the microphone, which can sometimes be heard as a characteristic "howling" of tones at frequencies that are problematic for the specific enclosure. Howling occurs when the microphone picks up too much of the loudspeaker energy. It is undesired, since it is unpleasant for the audience and risks damaging the PA equipment.

The scenario can be described by the model shown in fig. 2.1. Throughout the work, we will assume that the source signal $u(t)$ contains speech only; background noise will not be considered. Furthermore, the speech is assumed to have been sampled to the discrete domain at 16 kHz, which according to the Nyquist sampling theorem means that all signal components up to 8 kHz are sampled without aliasing [4]. The vast majority of human speech is contained within this bandwidth. It is therefore assumed that the continuous source signal is band-limited to 8 kHz, so that it can be sampled at 16 kHz and perfectly reconstructed from the samples.

Figure 2.1 – A model of the scenario, here including one microphone and one loudspeaker (single-channel system). [Block diagram with the signals $u(t)$, $y(t)$ and $x(t)$ and the blocks $G$ and $F$.]

In fig. 2.1, a speaker produces speech into a microphone, resulting in a source signal $u(t)$. The signal is then processed in the electro-acoustic forward path, here denoted $G$. This processing includes the amplifier gain and possibly digital audio effects such as compression and equalization. One of the simplest types of processing in the electro-acoustic forward path is a broadband gain, which is simply the ratio of the output signal amplitude to the input signal amplitude. A broadband gain $G(t)$ can be expressed in dB as

$$\text{Gain} = 20\log_{10}\left(\frac{x(t)}{y(t)}\right) \ \text{[dB]}, \tag{2.13}$$

and is the only processing in the electro-acoustic forward path considered in this work.
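As a small worked example of eq. (2.13), the conversion between a linear broadband gain and its value in dB can be sketched as follows. The function names are illustrative assumptions. Note that the factor 20 applies to a ratio of signal amplitudes; a ratio of signal powers would use a factor of 10.

```python
import math


def gain_db(amplitude_ratio):
    """Gain in dB for a given ratio of output to input amplitude."""
    return 20.0 * math.log10(amplitude_ratio)


def amplitude_ratio(gain_in_db):
    """Inverse conversion, from gain in dB to an amplitude ratio."""
    return 10.0 ** (gain_in_db / 20.0)


doubling = gain_db(2.0)     # doubling the amplitude is roughly 6.02 dB
tenfold = gain_db(10.0)     # a factor of ten is 20 dB
```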

The amplified output signal $x(t)$ is then transmitted to the loudspeaker. The output from the loudspeaker propagates through the room in which the PA system is set up and interacts with the environment in a way described by the acoustic feedback path $F$. The acoustic feedback path is modelled as a linear system with input signal $x(t)$. According to eq. (2.9), the output of that system, which is the feedback signal going back into the microphone, can be computed by convolving the loudspeaker signal with the impulse response of the acoustic feedback path $F(t)$, also called the Room Impulse Response (RIR). The signal is fed back into the microphone, forming a closed-loop system described by

$$\begin{aligned} y(t) &= F(t) * x(t) + u(t) \\ x(t) &= G(t) * y(t), \end{aligned} \tag{2.14}$$

where $F(t)$ and $G(t)$ are the impulse responses of the acoustic feedback path and the electro-acoustic forward path, respectively. Computing the Fourier transform of both sides of eq. (2.14) and making use of the convolution theorem in eq. (2.11), one obtains

$$Y(\omega) = F(\omega)X(\omega) + U(\omega) \tag{2.15}$$

$$X(\omega) = G(\omega)Y(\omega), \tag{2.16}$$

where $F(\omega)$ and $G(\omega)$ are the frequency responses of the corresponding systems, and $X(\omega)$, $U(\omega)$ and $Y(\omega)$ are the spectra of the corresponding signals. From this, one can compute the total frequency response from the source $u(t)$ to the output $x(t)$ by using the property described in eq. (2.12):

$$H(\omega) = \frac{X(\omega)}{U(\omega)} = \frac{G(\omega)Y(\omega)}{Y(\omega) - F(\omega)X(\omega)} = \frac{G(\omega)}{1 - F(\omega)G(\omega)}. \tag{2.17}$$

The term $F(\omega)G(\omega)$ is referred to as the loop response of the system. The related magnitude response $|F(\omega)G(\omega)|$ is called the loop gain, and the phase response $\angle F(\omega)G(\omega)$ is called the loop phase. The system described by the transfer function in eq. (2.17) is assumed to be a linear, time-dependent, finite-order system (see section 2.1.1 for linear systems). These assumptions are justified in [3], where the authors argue that linearity follows from the fact that a sound wave's interaction with the environment can be considered level independent: the nature of the reflections does not depend on the sound pressure level. The time dependency is evident, since the feedback path depends on all movements and changes in the room, including changes in the position of the microphone or loudspeaker. Finally, although RIRs are in general infinitely long, they decay exponentially over time, as shown in fig. 2.2. It is therefore reasonable to truncate the RIR at a certain length and to treat the system as being of finite order.

Figure 2.2 – The impulse response of a typical room, truncated at 2001 samples. [Plot of $f(t)$ against sample number.]
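To illustrate the closed-loop response in eq. (2.17), the following sketch evaluates $H(\omega)$ for a deliberately simple feedback path. The feedback path is assumed to be a single attenuated delay, $F(\omega) = a e^{-i\omega D}$, which is a toy model and not one of the room responses used in this work; the values of the attenuation, the delay and the gain are likewise assumptions. With a loop gain of 0.75, the closed-loop magnitude is amplified where the loop phase is a multiple of $2\pi$ and attenuated where the loop phase is an odd multiple of $\pi$.

```python
import cmath
import math

ATTENUATION = 0.5   # assumed attenuation a of the toy feedback path
DELAY = 8           # assumed delay D in samples


def feedback_path(omega):
    """Toy feedback path F(omega) = a * exp(-i * omega * D), omega in rad/sample."""
    return ATTENUATION * cmath.exp(-1j * omega * DELAY)


def closed_loop_response(omega, gain):
    """Closed-loop response H = G / (1 - F G) for a broadband gain G."""
    return gain / (1.0 - feedback_path(omega) * gain)


gain = 1.5  # loop gain |F G| = 0.75, below the stability limit

# Loop phase equal to -2*pi: the fed-back signal adds in phase.
peak = abs(closed_loop_response(2.0 * math.pi / DELAY, gain))   # 1.5 / 0.25 = 6.0

# Loop phase equal to -pi: the fed-back signal adds in anti-phase.
dip = abs(closed_loop_response(math.pi / DELAY, gain))          # 1.5 / 1.75
```

As the loop gain approaches one, the denominator at the in-phase frequencies approaches zero and the peak grows without bound, which is the behaviour analysed in the next section.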

2.3 Stability analysis

Even though the system $H(\omega)$ is indeed time-varying due to changes in the RIR, it is common practice in the field of feedback suppression to carry out the stability analysis for a time-invariant system [3]. This is the reason that the expressions in eqs. (2.15) to (2.17) do not depend on time. The stability analysis originates from the paper "Regeneration theory" by Harry Nyquist [5], which can be consulted for further reading. The system described in eq. (2.17) can become unstable only at frequencies where $|F(\omega)G(\omega)| \geq 1$, that is, where the loop gain is at least 0 dB. For the signal to diverge due to feedback, the components at the problematic frequencies from each pass through the loop need to add constructively over time. For this to occur, the frequency components need to be in phase, which requires the loop phase to be a multiple of $2\pi$. This condition for instability is summarized in the Nyquist stability criterion: if there exists an angular frequency $\omega$ for which the loop gain is greater than or equal to unity and the loop phase is a multiple of $2\pi$, then the system is unstable:

$$|F(\omega)G(\omega)| \geq 1 \tag{2.18}$$

$$\angle F(\omega)G(\omega) = m2\pi, \quad m \in \mathbb{Z}. \tag{2.19}$$

Figure 2.3 – The characteristics of a typical room: (a) the magnitude response $|F(f)|$ [dB] and (b) the phase response $\angle F(f)$ [rad], both plotted against frequency $f$ from 0 to 8000 Hz.

The corresponding frequency $f = \omega/2\pi$ will, if present in the source signal, cause unstable oscillations in the system that are perceived as a howling sound. It should be pointed out that the assumption of time invariance is not necessarily fulfilled; in fact, it is virtually never fulfilled in a real PA scenario. However, under the assumption that the RIR changes slowly over time, the Nyquist stability criterion applies. This assumption can cause problems when the RIR changes rapidly, such as when the speaker walks around the room holding a portable microphone, as explained in chapter 4. For this reason, it is important to be aware of this assumption.
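The instability described by the Nyquist criterion can also be observed directly in the time domain. The sketch below simulates the closed loop of eq. (2.14) sample by sample under simplifying assumptions: the feedback path is a toy model consisting of a single delay of 8 samples with an attenuation of 0.5, the forward path is a broadband gain, and the source is a single impulse. None of these values come from the simulations in this work. With a loop gain of 0.75 the loudspeaker signal dies out, whereas with a loop gain of 1.25 it grows with every pass through the loop.

```python
def simulate_loop(source, gain, attenuation=0.5, delay=8):
    """Simulate y[n] = u[n] + a * x[n - D] and x[n] = G * y[n]."""
    loudspeaker = []
    for n, u in enumerate(source):
        fed_back = attenuation * loudspeaker[n - delay] if n >= delay else 0.0
        microphone = u + fed_back
        loudspeaker.append(gain * microphone)
    return loudspeaker


source = [1.0] + [0.0] * 399   # a single impulse as source signal

stable = simulate_loop(source, gain=1.5)     # loop gain 0.75
unstable = simulate_loop(source, gain=2.5)   # loop gain 1.25

# Largest loudspeaker amplitude during the last pass through the loop.
stable_tail = max(abs(v) for v in stable[-8:])
unstable_tail = max(abs(v) for v in unstable[-8:])
```

In this toy model the loudspeaker amplitude is multiplied by the loop gain once per delay period, so `stable_tail` is of the order of $10^{-6}$ while `unstable_tail` exceeds $10^{5}$ after 400 samples.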

Any given room with RIR $F(t)$ has a specific Maximum Stable Gain (MSG), which can be found from the frequency response $F(\omega)$. The initial MSG is computed by finding the peak with the largest magnitude in the frequency response among the frequencies that fulfil the phase condition in eq. (2.19), and calculating how far that peak is from 0 dB. Expressed in dB, the initial MSG is

$$-20\log_{10}\left(\max_{\omega}|F(\omega)|\right) \quad \forall \omega : \angle F(\omega) = m2\pi, \quad m \in \mathbb{Z}. \tag{2.20}$$

The magnitude and phase responses of a typical room, which is also one of the rooms used in the simulations in this work, are shown in fig. 2.3. The MSG of the RIR shown in fig. 2.3 is 3.087 dB.
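The computation in eq. (2.20) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used to obtain the value above: the impulse response is a toy feedback path (a delay of 8 samples with an attenuation of 0.5), the frequency response is evaluated on a discrete grid, and the phase condition is tested with a tolerance `phase_tol`, since it is rarely met exactly at a grid frequency. For a measured RIR, one would more likely have to locate the phase crossings between grid points.

```python
import cmath
import math


def frequency_response(impulse_response, n_points):
    """DFT of the impulse response, zero-padded to n_points frequency bins."""
    return [sum(c * cmath.exp(-2j * math.pi * m * k / n_points)
                for k, c in enumerate(impulse_response))
            for m in range(n_points)]


def initial_msg_db(impulse_response, n_points=64, phase_tol=1e-6):
    """Initial MSG in dB, restricted to bins whose phase is a multiple of 2*pi."""
    candidates = []
    for value in frequency_response(impulse_response, n_points):
        # cmath.phase returns the principal value in [-pi, pi], so a phase that
        # is a multiple of 2*pi shows up as a principal value close to zero.
        if abs(value) > 0.0 and abs(cmath.phase(value)) < phase_tol:
            candidates.append(abs(value))
    if not candidates:
        raise ValueError("no frequency bin fulfils the phase condition")
    return -20.0 * math.log10(max(candidates))


toy_rir = [0.0] * 8 + [0.5]       # assumed toy feedback path, not a real room
msg = initial_msg_db(toy_rir)     # -20 * log10(0.5), roughly 6.02 dB
```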

The main objective of feedback suppression is to manipulate the total transfer function by introducing additional sub-systems that alter the total frequency response in order to increase the MSG, preferably without distorting the source signal. The following chapters look into different methods of achieving this.

3 Methods used in feedback suppression

This chapter summarizes the history of, and the available literature in, the field of feedback suppression. Acoustic feedback suppression is a well-studied subject, and several methods have been proposed to solve the howling problem. There are four main categories of feedback suppression methods, namely

• Periodic modulation methods

• Gain reduction methods

• Room modelling methods

• Spatial filtering methods (beamforming)

The first methods to address the issue of howling feedback, developed in the 1960s [1], [2], belong to the first category. Implemented with electronic components, these methods manipulate the microphone signal before amplification, either by altering the phase of the signal by a small value $\phi$ or by shifting the frequency of the signal by a small $\Delta f$. In [2], an increase in maximum stable gain of 14 dB was reported, but the effects on the sound quality were too severe to be considered acceptable. Frequency shifting is used in some commercial products today. One such method, a frequency shift of 5 Hz, is evaluated in this work and is explained in depth in section 4.1.

The second category, gain reduction methods, can be divided into three subcategories, depending on the frequency range in which the gain is reduced. Early works applied a full-band gain reduction upon detecting howling [6]. This method does not increase the maximum stable gain, but merely brings an unstable system back to a stable state. Full-band gain reduction was later refined into Automatic Equalization (AEQ), which divides the input signal into frequency bands and performs feedback detection on every sub-band. If a howling frequency is detected, the gain is reduced only in the sub-band where the critical frequency resides, leaving the rest of the signal intact. The AEQ methods can be described as an attempt to automate the work of an audio engineer, who often works with sub-band equalization to reduce feedback. The AEQ method was further refined into Notch filter based Feedback Suppression (NFS), where notch filters are used to suppress problematic frequencies at which howling has been detected. Notch filters are band-stop filters with a very narrow stop band (called a "notch"), which severely reduce the gain in that particular frequency band and thus remove those frequencies from the signal. Because the notch can be made very narrow, only a very small frequency band of the signal is suppressed, namely the band where the howling occurs. Notch filters can be implemented as both FIR and IIR filters, but a very narrow notch requires a high filter order, which is why the less computationally demanding IIR filters are often preferred. To suppress several frequencies in a signal, a number of notch filters centred at different frequencies can be applied, either by applying several filters in series or by designing one filter with two or more notches.
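A narrow IIR notch of the kind described above can be sketched with a second-order (biquad) section. The coefficient formulas below follow the widely used audio equalizer cookbook design for a notch filter, which is an assumption made for illustration and not necessarily the design used in this work. The sample rate of 16 kHz matches the one assumed in chapter 2, while the notch frequency, the Q factor and the test tones are arbitrary.

```python
import math


def notch_coefficients(center_hz, sample_rate_hz, q_factor):
    """Biquad notch coefficients (b, a), normalized so that a[0] = 1."""
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q_factor)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(b, a, x):
    """Direct form I filtering of the sequence x with a second-order section."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in x:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y


def tone(freq_hz, sample_rate_hz, n_samples):
    """Unit-amplitude sinusoid used as a test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


SAMPLE_RATE_HZ = 16000.0
b, a = notch_coefficients(1000.0, SAMPLE_RATE_HZ, q_factor=30.0)

# A tone at the notch frequency is removed once the filter transient has
# decayed, while a tone far from the notch passes almost unchanged.
howl_residual = max(abs(v) for v in
                    biquad(b, a, tone(1000.0, SAMPLE_RATE_HZ, 8000))[-1000:])
passed_level = max(abs(v) for v in
                   biquad(b, a, tone(3000.0, SAMPLE_RATE_HZ, 8000))[-1000:])
```

Several howling frequencies can be handled by running the signal through several such sections in series, one per detected frequency.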

The NFS methods are by far the most used in commercial products today. All NFS methods include a detection phase and a suppression phase [3], and are divided into one-stage and two-stage methods. In one-stage methods, detection and suppression are performed in the same step. In [7], the authors use adaptive notch filters to detect and suppress howling in the same stage. The paper concludes that the adaptive notch filters used did not produce sufficient feedback suppression over the entire frequency range. The most commonly used methods in the NFS category are the so-called two-stage methods, where detection and suppression are separated. The detection stage evaluates the frequency spectra of segments of the signal, often computed with the Fast Fourier Transform (FFT). Each frequency spectrum is scanned with a peak-picking algorithm to find the frequencies that have the most power, and the frequencies corresponding to these peaks are tested against certain criteria to determine whether they are indeed howling frequencies or just tonal components in the signal. If the detection algorithm finds a howling frequency, the suppression stage receives information about the frequency at which howling occurs and applies a notch filter at that specific frequency to suppress the howling.
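The peak-picking step of the detection stage can be sketched as follows. This is only the first half of a detector: the criteria that separate howling from tonal speech components are deliberately left out. The frame length, the test frame and the function names are assumptions for illustration, and a naive discrete Fourier transform stands in for the FFT to keep the example self-contained.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of the non-negative frequency bins of a real-valued frame."""
    n = len(frame)
    return [abs(sum(frame[k] * cmath.exp(-2j * math.pi * m * k / n)
                    for k in range(n))) ** 2
            for m in range(n // 2 + 1)]


def pick_peaks(frame, sample_rate_hz, max_peaks=3):
    """Frequencies in Hz of the strongest local maxima, strongest first."""
    power = power_spectrum(frame)
    local_maxima = [m for m in range(1, len(power) - 1)
                    if power[m] > power[m - 1] and power[m] >= power[m + 1]]
    strongest = sorted(local_maxima, key=lambda m: power[m], reverse=True)
    return [m * sample_rate_hz / len(frame) for m in strongest[:max_peaks]]


SAMPLE_RATE_HZ = 16000.0
FRAME_LENGTH = 256   # bin spacing of 62.5 Hz

# Test frame: a strong component at 1000 Hz and a weaker one at 2500 Hz.
frame = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE_HZ)
         + 0.2 * math.sin(2.0 * math.pi * 2500.0 * n / SAMPLE_RATE_HZ)
         for n in range(FRAME_LENGTH)]

candidates = pick_peaks(frame, SAMPLE_RATE_HZ, max_peaks=2)
```

Each candidate frequency would then be passed on to the howling criteria before a notch filter is applied.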

There are several spectral (frequency-based) and temporal (time-based) features that a howling component has but a tonal component does not. In practice, one of these features, or a combination of them, can be used to determine whether a peak in the frequency spectrum corresponds to a howling frequency. A number of criteria that can be used to decide whether a signal component is a howling component or a tonal component are evaluated in [8]. A two-stage notch filter based method is evaluated in this work and is explained in depth in section 4.2.

The third category, room modelling methods, sometimes also called Acoustic Feedback Cancellation (AFC), resembles the methods used in Acoustic Echo Cancellation (AEC), a feature used in conference telephony and other applications where a speaker communicates with another speaker at a distant location using a conference telephone. In these cases, the far-end speaker's voice is output from a loudspeaker and picked up by a microphone, resulting in an echo back to the far-end speaker if no measures are taken. A common approach in AEC is to use an adaptive filter $\hat{F}$ to approximate the RIR $F$, filter the loudspeaker signal with $\hat{F}$ in order to model the feedback, and remove the approximated feedback component from the microphone signal. If the adaptive filter approximates the RIR perfectly, no feedback component remains in the microphone signal.
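The idea of approximating the feedback path with an adaptive filter can be sketched with a Normalized Least Mean Square (NLMS) update, one common choice of adaptation rule. The example below is a simplified open-loop illustration under several assumptions: the true path is a hypothetical three-tap filter, the loudspeaker signal is white noise, and the microphone picks up only the fed-back signal, with no source signal present. The step size and regularization constant are arbitrary. It therefore corresponds to the favourable echo cancellation situation, not to the closed-loop PA case.

```python
import random


def nlms_identify(loudspeaker, microphone, n_taps, step=0.5, eps=1e-8):
    """Estimate an FIR path with NLMS; returns the weights and the residual."""
    weights = [0.0] * n_taps
    residual = []
    for n in range(len(loudspeaker)):
        # The n_taps most recent loudspeaker samples, newest first.
        recent = [loudspeaker[n - k] if n - k >= 0 else 0.0
                  for k in range(n_taps)]
        estimate = sum(w * s for w, s in zip(weights, recent))
        error = microphone[n] - estimate   # microphone minus modelled feedback
        norm = sum(s * s for s in recent) + eps
        weights = [w + step * error * s / norm
                   for w, s in zip(weights, recent)]
        residual.append(error)
    return weights, residual


random.seed(0)
true_path = [0.4, -0.2, 0.1]   # hypothetical feedback path (assumption)
loudspeaker = [random.uniform(-1.0, 1.0) for _ in range(2000)]
microphone = [sum(true_path[k] * loudspeaker[n - k]
                  for k in range(len(true_path)) if n - k >= 0)
              for n in range(len(loudspeaker))]

estimated_path, residual = nlms_identify(loudspeaker, microphone, n_taps=3)
```

Under these idealized conditions the estimated weights converge to the true path and the residual vanishes. When a source signal that is correlated with the loudspeaker signal is present, the estimate becomes biased.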

The main difference between the AFC case and the AEC case is that the loudspeaker signal is highly correlated with the microphone signal in AFC, which is not the case in AEC [9]. When there is high correlation between the loudspeaker and microphone signals, which occurs during "double-talk" scenarios (when the near-end speaker and the far-end speaker speak simultaneously), the AEC algorithms are known to perform poorly in adapting the filters. This makes the AEC methods unsuitable for the AFC case, which can be described as the AEC case with constant double-talk. In order to use adaptive filters to remove the unwanted feedback, one needs to use decorrelation methods to decrease the correlation between the loudspeaker and microphone signals [3], [10], [11]. Different methods for decorrelation have been suggested, such as noise injection on the loudspeaker signal, frequency shifting or phase shifting of the loudspeaker signal, non-linear processing, introduction of a delay in the forward path, and decorrelating pre-filters [9]. The method of using adaptive filters to remove unwanted signal content, and the AFC method evaluated in this work, are further explained in section 4.3.

The fourth category, also known as beamforming, consists of using special microphone and/or loudspeaker arrays to reduce the signal transport between the microphone and the loudspeaker, by modifying the directivity pattern of the array so that its null points in the direction of the other unit. These methods require additional hardware and will for that reason not be considered in this work, which is limited to software implementation.

In [3], the authors conclude that the most promising method in terms of achievable increase in MSG and subjective sound quality is the AFC approach. For this reason, one of these methods is included in the MATLAB evaluation. A survey of the market shows that the two-stage NFS methods are by far the most common in feedback suppression products. For this reason, one of these methods is also implemented and evaluated in MATLAB. These methods have the disadvantage of being reactive, in the sense that the howling needs to be detected, and thus is often heard, before it is suppressed. AFC, on the other hand, is a proactive suppression method that removes feedback and echoes continuously, which makes it slightly more interesting than the NFS approach. As explained above, the AFC methods need a routine for decorrelating the loudspeaker signal from the microphone signal. A 5 Hz frequency shift was chosen for this, mainly for its simplicity, but also because frequency shifting is itself a feedback suppression method, which can then be included as a stand-alone method for comparison. The algorithms by which these three methods operate are presented in detail in the following chapter.

4 Description of algorithms

In this chapter, the three chosen methods, frequency shifting, notch filter based feedback suppression and acoustic feedback cancellation, are described in detail and related to the nature of howling.

4.1 Frequency shifting

The frequency shifting method, as the name suggests, manipulates the microphone signal by shifting all frequency components by a predetermined value ∆f. The aim of the shift is to circumvent the magnitude condition eq. (2.18): signal components at the critical frequency fc are not allowed to build up with every pass through the loop, but are instead moved, loop by loop, to frequencies at which the loop gain is below unity, which stabilizes the system. A frequency shift can be performed in software by manipulating the so-called discrete-time analytic signal

$$ y_a(t) = y(t) + i\hat{y}(t), \tag{4.1} $$

where $\hat{y}(t)$ is the Hilbert transform of the original signal $y(t)$ and $i$ is the imaginary unit. The analytic signal is the original signal with its negative frequency content set to zero. The negative frequencies can be discarded because audio signals are real, and the frequency spectrum of a real signal is Hermitian, meaning that the negative frequencies do not provide any information that cannot be found in the positive frequency content [4]. One can perform frequency shifting by multiplying the analytic signal with a complex exponential

$$ S_{mod}(t) = e^{i\omega_s t}, \tag{4.2} $$

where $\omega_s = 2\pi\Delta f$ is the modulation frequency. The output of the frequency shifter is then obtained by taking the real part of the resulting complex-valued signal:

$$ d(t) = \mathrm{Re}\big(y_a(t)\,S_{mod}(t)\big) = y(t)\cos\varphi(t) - \hat{y}(t)\sin\varphi(t), \qquad \varphi(t) = 2\pi\Delta f\,t, \tag{4.3} $$

where $d(t)$ is the frequency-shifted output signal. The modulation can be described by the system in fig. 4.1.

Figure 4.1 – The system with a frequency shift of the microphone signal in the electro-acoustic forward path. [Block diagram with the blocks G, F and FS and the signals u(t), y(t), d(t) and x(t).]

4.1.1 Analytic signal

The analytic signal can be obtained by computing the Fourier transform Y(ω) of a segment of the input signal and then computing the inverse Fourier transform of the single-sided spectrum, with the negative frequencies set to 0 [12]. The result of the inverse transform is an approximation of the analytic signal. Since the spectrum of the approximated analytic signal is single-sided, the signal is complex-valued and can be expressed according to eq. (4.1). The nature of the Fourier transform requires that the input samples are framed with a frame size of M samples, which introduces a delay of M samples in the processing. In this work, an alternative method was used, which obtains an approximation of the analytic signal with a modulated low-pass filter [13]. To remove the negative frequency components, an FIR low-pass filter of order 256 with a cut-off frequency of fs/4 was used. This filter was modulated with the frequency fs/4, resulting in a complex-valued band-pass filter with a pass band covering the entire positive frequency range and a stop band covering the entire negative frequency range. The magnitude response of the filter is shown in fig. 4.2.

Figure 4.2 – The magnitude response of the modulated low-pass filter, with a pass band covering the entire positive frequency range and a stop band covering the negative frequency range. [Plot: magnitude in dB against normalized frequency in units of π rad/sample.]

The input samples were buffered into a delay vector of the same length as the complex modulated low-pass filter (256). For each sample, the dot product between the delay vector and the filter was computed in order to obtain the current analytic signal sample, which is an approximation of eq. (4.1) at time t. Equation (4.3) was then applied to the approximated analytic signal sample in order to obtain the frequency-shifted output signal sample d(t).
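The sample-by-sample procedure above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the thesis. The Hamming-windowed sinc design, the 257 taps (order 256) and the function names `analytic_filter` and `frequency_shift` are assumptions made for the example; the actual filter design used in the thesis is not specified here.

```python
import cmath
import math
from collections import deque


def analytic_filter(num_taps=257):
    """Complex FIR taps approximating y + i*Hilbert(y), eq. (4.1).

    A Hamming-windowed low-pass filter with cut-off fs/4 is modulated by fs/4,
    which moves its pass band onto the positive frequencies. num_taps is
    assumed to be odd (an assumption of this sketch).
    """
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 0.5 if k == 0 else math.sin(0.5 * math.pi * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        # The factor 2 restores the amplitude lost with the negative frequencies.
        taps.append(2 * ideal * window * cmath.exp(0.5j * math.pi * k))
    return taps


def frequency_shift(samples, shift_hz, fs, taps):
    """Shift every frequency component of a real signal by shift_hz, eq. (4.3)."""
    delay = deque([0.0] * len(taps), maxlen=len(taps))  # newest sample first
    shifted = []
    for n, sample in enumerate(samples):
        delay.appendleft(sample)
        # Dot product of delay vector and filter: one analytic signal sample.
        analytic = sum(h * x for h, x in zip(taps, delay))
        phase = 2 * math.pi * shift_hz * n / fs
        shifted.append((analytic * cmath.exp(1j * phase)).real)
    return shifted
```

With these assumptions, a call such as `frequency_shift(signal, 5.0, 16000, analytic_filter())` would apply a 5 Hz shift. Note that the linear-phase filter delays the signal by half the filter length.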

4.2 Two-stage notch filtering

The two-stage notch filtering method makes use of information about the frequency

spectrum of the incoming signal in the detection stage, and applies notch filters to the

signal in the suppression stage, based on the findings in the detection stage. This section

describes the two-stage algorithm used in this work.

4.2.1 Detection stage

The incoming signal was divided into frames of M = 4096 samples, with an overlap of M/2 samples between consecutive frames to reduce the detection time. When a frame had been filled with M samples, it was multiplied with a Blackman window to reduce spectral leakage. The frequency spectrum Y(ω) of the windowed frame was then computed with the Fourier transform. Because the input signal is real-valued, it is sufficient to consider only the single-sided frequency spectrum, in which the 10 largest peaks were located with a peak-picking algorithm; in MATLAB, the function findpeaks was used for this. Evaluating 10 peaks gives a satisfactory level of confidence that a howling frequency is detected, since howling frequencies almost never arise at the same level of applied gain, but rather one at a time as the applied gain is increased. The frequencies corresponding to these peaks were considered “possible howling frequencies”, {ωi}, 1 ≤ i ≤ 10. Each possible howling frequency was then evaluated with three criteria to determine whether it was a howling frequency or a tonal component of the input signal: the two spectral criteria Peak-to-Harmonic Power Ratio (PHPR) and Peak-to-Neighbouring Power Ratio (PNPR), and the temporal criterion Interframe Magnitude Slope Deviation (IMSD) [8].
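The framing, windowing and peak-picking steps can be sketched as below. This is an illustrative Python version, not the thesis code. The function names are hypothetical, the peak picker is a plain local-maximum search standing in for MATLAB's findpeaks, and a direct DFT is used for clarity where a real implementation would presumably use an FFT.

```python
import cmath
import math


def blackman(m):
    """Symmetric Blackman window of length m."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (m - 1))
            + 0.08 * math.cos(4 * math.pi * n / (m - 1)) for n in range(m)]


def frames(signal, m, hop):
    """Yield successive frames of m samples, advancing hop samples each time."""
    for start in range(0, len(signal) - m + 1, hop):
        yield signal[start:start + m]


def single_sided_magnitude(frame):
    """Magnitudes of bins 0..m/2 of the Blackman-windowed frame (direct DFT)."""
    m = len(frame)
    windowed = [s * w for s, w in zip(frame, blackman(m))]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / m)
                    for n, x in enumerate(windowed)))
            for k in range(m // 2 + 1)]


def largest_peaks(magnitude, count):
    """Bin indices of the count largest local maxima, strongest first."""
    peaks = [k for k in range(1, len(magnitude) - 1)
             if magnitude[k - 1] < magnitude[k] >= magnitude[k + 1]]
    return sorted(peaks, key=lambda k: magnitude[k], reverse=True)[:count]
```

In the setting described above, the frame size would be 4096 with a hop of 2048 samples and `count` set to 10; smaller frames keep the direct DFT tractable when experimenting.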

4.2.1.1 Peak-To-Harmonic Power Ratio (PHPR)

Tonal components in speech often include harmonics, which are located at integer multiples of the frequency of the component. This is not the case for a howling component, which consists of a very narrow peak without significant harmonics. The PHPR is obtained by dividing the power at the possible howling frequency by the power at one of its harmonics. It is computed for each candidate howling frequency $\omega_i$ and for the $m$'th harmonic as

$$ \mathrm{PHPR}(\omega_i, m) = 10\log_{10}\frac{|Y(\omega_i)|^2}{|Y(m\omega_i)|^2}. \tag{4.4} $$

4.2.1.2 Peak-To-Neighbouring Power Ratio (PNPR)

In speech, tonal components have a broader bandwidth than a single sinusoidal frequency component. In the frequency domain, this bandwidth shows up as the power of the tonal component being spread over several neighbouring frequency bins, centered around a peak. A howling component, on the other hand, does not share power with the neighbouring frequency bins. By dividing the power at a possible howling frequency by the power in neighbouring frequency bins, one can assess whether the component is a tonal component or a howling component. The PNPR for the possible howling frequency $\omega_i$ and the $m$'th neighbouring frequency bin is computed as

$$ \mathrm{PNPR}(\omega_i, m) = 10\log_{10}\frac{|Y(\omega_i)|^2}{|Y(\omega_i + 2\pi m/M)|^2}. \tag{4.5} $$

The values computed in eqs. (4.4) and (4.5) are then compared to the predetermined thresholds $T_{PHPR}$ and $T_{PNPR}$. If the computed values are higher than the threshold values for the frequency $\omega_i$, it is considered to be a howling frequency according to these two criteria.
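The two spectral criteria of eqs. (4.4) and (4.5) can be evaluated directly on a list of single-sided bin magnitudes, as in the Python sketch below. The function names, the small constant guarding the logarithm and the treatment of harmonics that fall above the analysed band are assumptions of this example, not details taken from the thesis.

```python
import math


def power_ratio_db(magnitude, peak_bin, other_bin):
    """10*log10 of the power at peak_bin over the power at other_bin."""
    tiny = 1e-12  # guards against log of zero; an assumption of this sketch
    return 10 * math.log10((magnitude[peak_bin] ** 2 + tiny)
                           / (magnitude[other_bin] ** 2 + tiny))


def phpr(magnitude, peak_bin, harmonic):
    """Eq. (4.4): peak power relative to the power at harmonic * peak_bin."""
    harmonic_bin = harmonic * peak_bin
    if harmonic_bin >= len(magnitude):
        return math.inf  # harmonic above the analysed band: assumed absent
    return power_ratio_db(magnitude, peak_bin, harmonic_bin)


def pnpr(magnitude, peak_bin, offset):
    """Eq. (4.5): peak power relative to the bin offset bins away."""
    return power_ratio_db(magnitude, peak_bin, peak_bin + offset)
```

A narrow howling peak gives large values for both ratios, whereas a voiced speech component with strong harmonics or a broad peak gives small ones.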

4.2.1.3 Interframe Magnitude Slope Deviation (IMSD)

This feature uses the fact that howling has been observed to increase exponentially in energy over time, which corresponds to a linear increase on a dB scale. Such an increase is not observed in tonal components. For the possible howling component $\omega_i$, the IMSD measures the deviation from a linear increase by forming differences between the energy at $\omega_i$ in older frames and in more recent frames. A large deviation from linearity, that is to say a large IMSD, suggests that the candidate is not a howling frequency, whereas for small deviations the candidate is considered a howling frequency. The IMSD is computed by

$$ \mathrm{IMSD}(\omega_i, t) = \frac{1}{M_F}\sum_{m=1}^{M_F-1}\left[\frac{1}{M_F}\sum_{j=0}^{M_F-1}\frac{1}{M_F-j}\Big(20\log_{10}|Y(\omega_i, t-jP)| - 20\log_{10}|Y(\omega_i, t-M_FP)|\Big) - \frac{1}{m}\sum_{j=0}^{m-1}\frac{1}{m-j}\Big(20\log_{10}|Y(\omega_i, t-jP)| - 20\log_{10}|Y(\omega_i, t-mP)|\Big)\right]. \tag{4.6} $$

The IMSD for each candidate howling component is compared to the threshold value $T_{IMSD}$, and if $\mathrm{IMSD}(\omega_i, t) < T_{IMSD}$, the frequency $\omega_i$ is considered to be a howling frequency.
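Equation (4.6) can be evaluated from a short history of dB magnitudes for one frequency bin, as in the Python sketch below. It is assumed here that $P$ denotes the hop between frames, so that the history simply holds one value per frame, newest first, and that $M_F + 1$ values are available. The function name `imsd` and this storage layout are choices made for the example.

```python
def imsd(level_db):
    """IMSD of one frequency bin, following eq. (4.6).

    level_db[j] is 20*log10|Y(w_i, t - j*P)|, newest frame first, and must
    hold M_F + 1 values (frames t, t - P, ..., t - M_F*P).
    """
    mf = len(level_db) - 1

    def mean_slope(span):
        # (1/span) * sum over j of (level[j] - level[span]) / (span - j)
        return sum((level_db[j] - level_db[span]) / (span - j)
                   for j in range(span)) / span

    reference = mean_slope(mf)  # slope estimate over the full history
    return sum(reference - mean_slope(m) for m in range(1, mf)) / mf
```

For a magnitude that rises by the same number of dB in every frame, all slope estimates coincide and the result is zero, which is the behaviour the criterion relies on.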

Table 4.1

Threshold    Value [dB]
T_PHPR       10
T_PNPR       30
T_IMSD       1

4.2.1.4 Final assessment

The thresholds used in the three evaluations are presented in table 4.1.

For the PHPR, the 2nd and 3rd harmonics were included in the evaluation, and howling was said to be detected if the threshold was exceeded for both harmonics. For the PNPR, the six closest neighbours, three above and three below, were included, and howling was said to be detected if the ratio exceeded the threshold for all neighbours. The IMSD stored the frequency contents of the last 16 frames, and thus evaluated the slope for all possible howling frequency components over 16 frames. These numbers were inspired by [8], where the authors evaluated a number of spectral and temporal criteria for howling detection and found the combination above to be robust, with a small false-alarm percentage¹. The final threshold values were tuned by trial and error until reasonable howling detection was obtained.

The total assessment of the possible howling frequencies in each frame combined the PHPR, PNPR and IMSD criteria. A frequency ωi was considered a howling frequency only if all three conditions were fulfilled, in which case action was taken to suppress it.
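The combined decision can be written compactly, as in the following Python sketch. The threshold values are those of table 4.1; the function name `is_howling` and the way the individual criterion values are passed in are assumptions made for illustration.

```python
T_PHPR_DB = 10.0  # thresholds from table 4.1
T_PNPR_DB = 30.0
T_IMSD_DB = 1.0


def is_howling(phpr_db, pnpr_db, imsd_db):
    """Combine the three criteria for one candidate frequency.

    phpr_db: PHPR values for the 2nd and 3rd harmonics.
    pnpr_db: PNPR values for the six nearest neighbouring bins.
    imsd_db: IMSD value over the stored frames.
    """
    return (all(v > T_PHPR_DB for v in phpr_db)
            and all(v > T_PNPR_DB for v in pnpr_db)
            and imsd_db < T_IMSD_DB)
```

Requiring all three conditions at once trades some detection speed for a lower risk of notching out tonal speech components.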

¹ The false-alarm percentage is the ratio of occurrences of erroneously detected frequencies to the total number of detected frequencies.

4.2.2 Suppression stage

Upon detecting howling at frequency $\omega_i$, the suppression stage applied a notch filter centered at $\omega_i$ in the electro-acoustic forward path. A maximum of 20 notch filters was set, in order to prevent the source signal from being overly distorted. To make the filters as narrow as possible, biquadratic IIR filters were used in series, where the output $y_i[n]$ from filter $i$ with input $x_i[n]$ can be computed from the difference equation

$$ y_i[n] = \frac{1}{a_0}\big(b_0x_i[n] + b_1x_i[n-1] + b_2x_i[n-2] - a_1y_i[n-1] - a_2y_i[n-2]\big), \tag{4.7} $$

where $n$ is the sample number and $a_0, \dots, a_2, b_0, \dots, b_2$ are the filter coefficients. For each of the 20 filters, the two latest output samples $y[n-1]$ and $y[n-2]$ and the three latest input samples $x[n]$, $x[n-1]$ and $x[n-2]$ are required. These samples were stored in a 3×21 matrix $Y_{del}$, where the input samples to the $i$'th filter were stored in the $i$'th column and the output samples in the $(i+1)$'th column:

$$ Y_{del} = \begin{bmatrix} x_1[n-2] & y_1[n-2] = x_2[n-2] & \cdots & y_C[n-2] \\ x_1[n-1] & y_1[n-1] = x_2[n-1] & \cdots & y_C[n-1] \\ x_1[n] & y_1[n] = x_2[n] & \cdots & y_C[n] \end{bmatrix} \tag{4.8} $$

where $C$ is the number of active notch filters. Since the filters were applied in series, the output samples from the $i$'th filter are the same as the input samples to the $(i+1)$'th filter.

The filter design itself is not considered in depth in this work. The filters do not need to be designed in real time upon detection, since the frequency resolution of the Fourier transform is known a priori. The size of the Fourier transform frames used in the detection stage was 4096 samples, which results in 2048 samples in the one-sided frequency spectrum. Since the highest possible frequency was 8 kHz, the frequency resolution was 8000/2048 = 3.9063 Hz per frequency bin. Knowing the frequency resolution, notch filters can be designed offline for all available frequencies and then stored, to save computational effort in the real-time implementation. Upon detecting howling at a specific frequency, a look-up table can then be used to activate the correct filter. In this work, however, the filters were designed upon detection with the MATLAB function iirnotch, which returned the filter coefficients that were stored in a 6×20 matrix. All notch filters were designed to have a Q-factor of 35. With $C$ active notch filters, the output sample $d(t)$ is the last element of the $(C+1)$'th column of the matrix $Y_{del}$. Recall that the total number of notch filters allowed was 20, which makes the last element of the 21st column the final output sample if all notch filters are active.
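A series connection of biquad notch filters along the lines of eq. (4.7) might look as follows in Python. Two things are assumptions of this sketch and not the thesis implementation: the coefficients come from the widely used "cookbook" second-order notch design, which is not guaranteed to match MATLAB's iirnotch, and each filter keeps its own short input and output history instead of sharing the matrix $Y_{del}$. The class and function names are hypothetical.

```python
import math


def notch_coefficients(f0, fs, q):
    """Second-order notch at f0 Hz (cookbook biquad, not MATLAB's iirnotch)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1.0, -2 * math.cos(w0), 1.0)
    a = (1 + alpha, -2 * math.cos(w0), 1 - alpha)
    return b, a


class NotchCascade:
    """Biquad notch filters applied in series, one sample at a time."""

    def __init__(self, max_filters=20):
        self.max_filters = max_filters
        self.sections = []  # each entry: [b, a, x_history, y_history]

    def add_notch(self, f0, fs, q=35.0):
        """Activate a notch at f0 unless the maximum number is reached."""
        if len(self.sections) < self.max_filters:
            b, a = notch_coefficients(f0, fs, q)
            self.sections.append([b, a, [0.0, 0.0], [0.0, 0.0]])

    def process(self, sample):
        for b, a, xh, yh in self.sections:
            # Eq. (4.7) with xh = [x[n-1], x[n-2]] and yh = [y[n-1], y[n-2]]
            out = (b[0] * sample + b[1] * xh[0] + b[2] * xh[1]
                   - a[1] * yh[0] - a[2] * yh[1]) / a[0]
            xh[1], xh[0] = xh[0], sample
            yh[1], yh[0] = yh[0], out
            sample = out  # output of filter i is the input of filter i + 1
        return sample
```

With no active notches, `process` returns the sample unchanged, which corresponds to reading the output from the first column of $Y_{del}$.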

4.3 Acoustic feedback cancellation

The method of using adaptive filters to cancel out unwanted components from the microphone signal is widely used in teleconference applications. Acoustic feedback cancellation is similar to the teleconference case, but instead of a far-end speaker's signal being output from the loudspeaker, it is the near-end speaker's voice. The AFC system is described in fig. 4.3.

Figure 4.3 – The AFC situation, where the impulse response $F(t)$ is approximated with an adaptive filter $\hat{F}(t)$. [Block diagram: the output of $\hat{F}$ is subtracted from the microphone signal $y(t)$ to give $d(t)$.]

$\hat{F}$ is an adaptive filter which is designed and adapted to resemble the real RIR $F$. The loudspeaker signal $x(t)$ is filtered with $\hat{F}$ in order to estimate the feedback component of the microphone signal. There are several algorithms for this, and the one utilized in this work is the Normalized Least Mean Square (NLMS) algorithm [14]. This is a common algorithm in echo cancellation, and is generally a good trade-off between computational complexity and convergence speed [15]. The NLMS algorithm is described as follows.

4.3.1 NLMS

In each iteration, the output from the adaptive filter is computed as

$$ d[n] = y[n] - \hat{F}^T[n]x[n], \tag{4.9} $$

where $\hat{F}$ is the adaptive filter of size $N$, and $x$ is a delay vector containing the $N$ latest loudspeaker output samples. The term $\hat{F}^T[n]x[n]$ is thus the approximated feedback component in the microphone signal.

The adaptive filter $\hat{F}$ is then updated according to

$$ \hat{F}[n+1] = \hat{F}[n] + \mu\frac{d^*[n]x[n]}{x^H[n]x[n]}, \tag{4.10} $$

where $\mu$ is the step size and the term $x^H[n]x[n]$ is the energy content of the loudspeaker output delay vector. The division by the energy term, which is the difference between NLMS and LMS, makes the algorithm insensitive to the scaling of the loudspeaker vector $x$. If the filter converges perfectly so that $\hat{F} = F$, all feedback components will be removed from the microphone signal, so that $d[n] = u[n]$, leaving only speech.

The choice of the step size is of great importance to the convergence of the adaptive filter. If the step size is too small, the adaptive filter will converge slowly and respond slowly to changes in the RIR, resulting in an erroneous filter in non-stationary conditions. If, on the other hand, the step size is too large, the convergence speed will increase, but problems with stability might occur. For speech applications, a step size between 0.01 and 0.04 has been recommended in the literature [3]. In this work, a fixed step size of 0.01 was used, which was found to be a reasonable trade-off between convergence speed and stability. To prevent the filter from updating when the loudspeaker signal was not strong enough, a threshold $T_{energy}$ was introduced, and the condition $x^H[n]x[n] > T_{energy}$ was set as a requirement for allowing the filter to update.

As previously mentioned, the NLMS algorithm performs poorly when there is a high correlation between the loudspeaker and microphone signals. For this reason, the loudspeaker signal was decorrelated from the microphone signal by frequency shifting the output signal $d[n]$ by 5 Hz before amplification, using the algorithm described in section 4.1. Since the actual RIR is known in the simulations, the performance of the adaptive filter can be evaluated by computing the filter misadjustment in each iteration:

$$ \mathrm{FMA} = \frac{\sum_{i=0}^{N-1}(\hat{F}_i - F_i)^2}{\sum_{i=0}^{N-1}F_i^2}. \tag{4.11} $$
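The update equations (4.9) to (4.11) can be tried out on a small identification problem, as in the Python sketch below. Several simplifications are assumptions of the example and differ from the thesis setup: the loop is open (the loudspeaker signal is white noise rather than the amplified, frequency-shifted output), there is no near-end speech, the signals are real so the conjugate in eq. (4.10) has no effect, and the misadjustment is reported in dB. The function name and default parameter values are hypothetical.

```python
import math
import random


def nlms_identify(path, mu=0.5, iterations=2000, energy_threshold=1e-6, seed=0):
    """Adapt an FIR estimate of `path` from white noise with NLMS.

    Returns the estimate and the misadjustment of eq. (4.11) in dB.
    """
    rng = random.Random(seed)
    n_taps = len(path)
    estimate = [0.0] * n_taps
    x = [0.0] * n_taps  # delay vector, newest loudspeaker sample first
    for _ in range(iterations):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        y = sum(f * s for f, s in zip(path, x))            # microphone: feedback only
        d = y - sum(f * s for f, s in zip(estimate, x))    # eq. (4.9)
        energy = sum(s * s for s in x)
        if energy > energy_threshold:                      # update only with enough signal
            step = mu * d / energy
            estimate = [f + step * s for f, s in zip(estimate, x)]  # eq. (4.10)
    error = sum((e - f) ** 2 for e, f in zip(estimate, path))
    reference = sum(f * f for f in path)
    return estimate, 10 * math.log10(error / reference + 1e-300)
```

The large default step size only keeps the example short; with the step size of 0.01 used in this work, many more iterations are needed before the misadjustment becomes small.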

5 Method for testing

In order to properly evaluate the tested methods, a theoretical measure of the maximum stable gain was needed. To obtain it, a PA system was simulated in MATLAB and set up so that the methods could be evaluated both in terms of maximum achievable stable gain and in terms of the subjective listening experience, that is, how good the methods sound.

5.1 MATLAB simulation and evaluation

Methods from the DSP toolbox were used in order to read audio data from the source

file. The source file that was used in the simulations was a 35 second section from a radio

essay by Johan Norberg called ”Johan Norberg om den exploderande lyckotrenden”[16],

resampled to 16 kHz. The file was read in blocks of 1024 samples at a time, and a loop

through the samples of the blocks simulated single input, single output processing. A

simple user interface was created, to be used in the ”live” mode, in order to subjectively

evaluate the methods. The user could choose between the three evaluated methods, and

also set the applied gain in real-time. The user also had the option to disable all feedback

suppression to evaluate the system without any processing.

Once a sample had been processed, it was put in an output buffer of the same length

as the RIR used in the simulations, namely 2001 samples. The dot product of the full

2001 samples of the output buffer and the RIR was computed to obtain the feedback

component of the microphone signal. A new feedback component was computed for

every new input sample, and added to the input sample to obtain the microphone signal,

consisting of both the source signal and the feedback component. Every 1024’th iteration,

the 1024 newest samples were output to the loudspeaker. This process successfully

simulated the loudspeaker signal’s interaction with the room, and the feedback into the

microphone. Howling could clearly be heard in the simulations, upon adjusting the gain

to a level over the initial maximum stable gain.
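The closed loop described above can be reduced to a few lines, as in the Python sketch below. It omits the block-wise file reading and any feedback suppression, and the function name `simulate_pa` is hypothetical. In the simulations of this work the RIR had 2001 samples; any short list of coefficients can be passed in to experiment.

```python
import math


def simulate_pa(source, rir, gain_db):
    """Sample-by-sample PA loop without feedback suppression.

    Returns the loudspeaker signal. The RIR is applied to past loudspeaker
    samples only, so the loop contains at least one sample of delay.
    """
    gain = 10 ** (gain_db / 20)
    out_buffer = [0.0] * len(rir)  # newest loudspeaker sample first
    loudspeaker = []
    for u in source:
        feedback = sum(h * s for h, s in zip(rir, out_buffer))
        mic = u + feedback  # source signal plus feedback component
        x = gain * mic      # forward path: plain gain, no processing
        out_buffer = [x] + out_buffer[:-1]
        loudspeaker.append(x)
    return loudspeaker
```

For a hypothetical feedback path consisting of a single tap of 0.5, the loop gain reaches unity at about 6 dB of applied gain: an impulse dies out below that level and grows without bound above it.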

The simulation of the PA system could also be run in a “test” mode, where measurements of the maximum stable gain were taken and stored. Since the three methods differ in the way they affect the signal, different expressions had to be used to calculate the maximum stable gain. This could have been done in several ways. For instance, the gain could be raised automatically in small steps in order to induce howling feedback, and the gain level at which howling occurs could be noted. This approach is sub-optimal, since an instability does not immediately produce audible howling, which means that the howling can be missed if the measurements are too short, resulting in an overestimation of the maximum stable gain.

In the simulations, the maximum stable gain was instead computed from the known RIR used to simulate the acoustic feedback path. By considering the RIR without feedback suppression, one can determine the initial maximum stable gain simply by observing the frequency response and finding the MSG using eq. (2.20).

The maximum stable gain for the different methods was calculated by applying the filters corresponding to the methods to the RIR, obtaining a modified RIR for each method. For the frequency shifting method, a time-varying filter corresponding to the 5 Hz frequency shift was applied to the RIR, which resulted in a maximum stable gain that oscillated over time. For the notch filter method, notch filters were applied to the RIR when a howling frequency was detected in the simulations, and a new maximum stable gain was computed from the modified RIR, with the detected howling frequencies suppressed. For the frequency shifting method and the NFS method, the MSG was computed as

$$ \mathrm{MSG}_{NFS,FS} = -20\log_{10}\Big(\max_{\omega}|H(t,\omega)F(\omega)|\Big) \quad \forall\omega : \angle F(\omega) = m2\pi,\ m \in \mathbb{Z}, \tag{5.1} $$

where the filter $H(t,\omega)$ is a time-dependent 5 Hz frequency shift or the cascade of active notch filters, depending on which method is being tested. For the AFC method, the maximum stable gain was calculated by finding the highest peak in the difference $|F(\omega) - \hat{F}(\omega)|$ that fulfils the phase condition, according to

$$ \mathrm{MSG}_{AFC} = -20\log_{10}\Big(\max_{\omega}|F(\omega) - \hat{F}(\omega)|\Big) \quad \forall\omega : \angle F(\omega) = m2\pi,\ m \in \mathbb{Z}. \tag{5.2} $$
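A numerical evaluation of an MSG expression of this form could be sketched as follows in Python. The approach shown, evaluating the frequency response on a uniform grid and locating the phase condition through sign changes of the imaginary part while the real part is positive, is an assumption of this example and not necessarily how the computation was done in the thesis. The function name `msg_db`, the grid size and the rounding tolerance are likewise hypothetical.

```python
import cmath
import math


def msg_db(rir, grid=4096):
    """MSG in dB from an impulse response, following the form of eq. (5.1).

    Evaluates F(w) on a grid over 0..pi and keeps only the grid intervals in
    which the phase passes through a multiple of 2*pi (the imaginary part
    changes sign while the real part stays positive). Returns None if no
    such frequency is found.
    """
    response = []
    for k in range(grid + 1):
        w = math.pi * k / grid
        value = sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(rir))
        # Snap rounding noise to zero so that exact crossings are detected.
        imag = 0.0 if abs(value.imag) < 1e-12 else value.imag
        response.append(complex(value.real, imag))
    peak = 0.0
    for lo, hi in zip(response, response[1:]):
        crosses = lo.imag * hi.imag <= 0.0 and lo.real > 0.0 and hi.real > 0.0
        if crosses:
            peak = max(peak, abs(lo), abs(hi))
    return -20 * math.log10(peak) if peak > 0.0 else None
```

To evaluate a suppression method in this way, the impulse response passed in would be the modified one, for example the RIR filtered by the active notch filters, or the difference between the RIR and the adaptive filter for the AFC case.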

The 35 s speech segment was divided into four sections of approximately 9 seconds each. In the first section, the applied gain in the electro-acoustic forward path was set to 0 dB, which was approximately 3 dB below the initial maximum stable gain. The gain was increased dB-linearly in the second section, until reaching its final level of 8 dB at the beginning of section 3. At the beginning of section 4, the RIR was changed, corresponding to a 1 meter displacement of the microphone. The applied gain and the altered RIR were kept constant during the fourth section. This test method, found in [3], is a theoretical evaluation of the maximum stable gain and of how it is affected by the gain level and by changes in the RIR. Both RIRs can be found in [17]. Since a real-time scenario will involve a more rapidly changing RIR, there is no guarantee that these results can be reproduced in a real-time setup. The test serves as an initial assessment of the methods.

6 Results

In this section, the results of the simulations are presented. The three methods were

evaluated in terms of maximum stable gain and subjective listening experience.

6.1 Feedback suppression

Figure 6.1 – The spectrogram of the loudspeaker signal, 0 dB applied gain. [Spectrogram titled “Loudspeaker signal”: time (s) on the horizontal axis, frequency (kHz) on the vertical axis, colour scale power/frequency (dB/Hz).]


Figure 6.1 shows the spectrogram of the loudspeaker signal when the applied gain was 0 dB. To illustrate the feedback phenomenon, fig. 6.2 shows a spectrogram of the same signal, but with the applied gain manually raised to induce howling. On three occasions, a frequency around 500 Hz shows a divergence in power, which suggests that feedback has occurred at this frequency. The applied gain when the feedback occurred was 4 dB, which is slightly above the initial MSG. When howling feedback was clearly heard, the gain was manually decreased to 0 dB.

Figure 6.2 – Spectrogram of the loudspeaker signal with howling feedback present, 4 dB applied gain. [Spectrogram titled “Loudspeaker signal”: time (s) on the horizontal axis, frequency (kHz) on the vertical axis, colour scale power/frequency (dB/Hz).]

To illustrate the performance of the feedback suppressor algorithms, the gain was set to

6 dB upon which the feedback suppression algorithms were activated. The spectrograms

for the three methods are shown in figs. 6.3 to 6.5.

In fig. 6.3, one can observe the oscillating nature of the frequency shifting method. There are indeed frequencies that have an increased power compared to the case with no howling feedback, but they are shifted up, keeping the system stable. Figure 6.4 shows that the notch filter method was successful at suppressing feedback at the specified gain setting. At 27 seconds, an increased power can be observed briefly in the low-frequency range, indicating that a howling frequency was audible before being detected and suppressed. The spectrogram for the AFC method, shown in fig. 6.5, shows no such increase in power for any frequency, meaning that this method had successfully suppressed all howling feedback.

Figure 6.3 – Spectrogram for the frequency shifting method, 6 dB applied gain.

Figure 6.4 – Spectrogram for the NFS method, 6 dB applied gain.

Figure 6.5 – Spectrogram for the AFC method, 6 dB applied gain.

[Figures 6.3 to 6.5 are spectrograms titled “Loudspeaker signal”: time (s) on the horizontal axis, frequency (kHz) on the vertical axis, colour scale power/frequency (dB/Hz).]


6.2 Maximum stable gain

The results from the maximum stable gain calculations are shown in fig. 6.6, where the

different sections are marked with vertical dashed lines. The gain applied in the electro-

acoustic forward path is shown as a bold dashed line, and the maximum stable gain

curves for all methods are included.

6.2.1 Frequency shifting

It can be seen that the frequency shifting method oscillates around 6 dB MSG, meaning

that this method theoretically raises the MSG by approximately 3 dB compared to the

case with no feedback suppressor. Upon changing the RIR, the MSG decreased to a

slightly lower level. From around 15s into the simulations, the MSG of the frequency

shifting method is below the actual applied gain level, meaning that we can expect

howling or ringing sounds from 15s and forward.

6.2.2 Notch filters

For the notch filter method, the points where a notch filter was applied are clearly visible as vertical jumps in the curve. During the parts of the simulation where the MSG of the notch filter method was above the actual applied gain, the algorithm should not detect any howling frequencies. In fig. 6.6, this is true for the first ∼13 seconds, where no howling was detected and no notch filter was activated. When the applied gain increased to the level of the MSG for the notch filter method, a howling frequency was detected and a notch filter was activated, removing the problematic frequency and thus increasing the maximum stable gain. Around 17 seconds into the simulation, the gain level was raised above the MSG level of the notch filter method, which means that the algorithm failed to detect a howling frequency. During the time interval 17-27 seconds, we should, according to this theoretical measurement, experience some howling or ringing tones. When the RIR changed, the algorithm successfully suppressed the problematic frequency or frequencies, raising the MSG to a stable level. When all 20 notch filters were active, which occurred at around 28 seconds into the simulation, the expected MSG was just below 10 dB, which was an increase of around 7 dB compared to the case where no feedback suppressor was used. The number of active notch filters over time is illustrated in fig. 6.7, where it can be seen that no notch filters were active until the gain started to increase, upon which a rapid increase in the number of notch filters is observed. Changing the RIR almost instantly resulted in 5 new notch filters, indicating that a change in the RIR does indeed affect the frequencies for which the Nyquist stability criterion is fulfilled.

Figure 6.6 – Maximum stable gain over time for all methods. The MSG curve for the frequency shifting method has been smoothed for better visualization. [Plot: gain in dB against time in s; curves: applied gain, frequency shift, notch filters, acoustic feedback cancellation.]

6.2.3 Acoustic feedback suppression

The curve for the AFC method fluctuates heavily throughout the simulations, reflecting the updates of the adaptive filter $\hat{F}$. With the algorithm used, a basic NLMS method in which the only requirement for the filter to update is the energy threshold, there is no guarantee that the updated filter $\hat{F}[n+1]$ will perform better than the previous filter $\hat{F}[n]$, which is why the MSG level sometimes drops. The MSG level is mostly above the actual applied gain, meaning that we should experience little or no howling throughout the simulation. A brief drop is observed at around 28 seconds, which could result in a brief howling or ringing sound at that time. The AFC method is observed to perform better at higher applied gain, and the final MSG is expected to be around 11 dB, which is an increase of around 8 dB. The maximum MSG value for this method occurred before the change in RIR and was around 13 dB, which means that the potential MSG increase is 10 dB.

In fig. 6.8, the filter misadjustment is shown. Initially, when the applied gain was 0 dB, the misadjustment was high, meaning that the filter was a poor approximation of the RIR. This was expected, since at low gain there was not enough information in the loudspeaker signal to adapt the filter correctly. Upon increasing the gain, the misadjustment decreased and the filter converged. The change in the RIR resulted in an increase of the misadjustment by approximately 3 dB, upon which the misadjustment again decreased, indicating a converging filter.

Figure 6.7 – The number of notch filters over time for the NFS method. [Plot: number of notch filters, from 0 to 20, against time in s.]

6.3 Subjective listening experience

It is difficult to objectively evaluate the quality of processed speech. For this reason, the listening experience is described in words, and the sound quality of the methods is compared. For the frequency shifting method, it should first of all be noted that a 5 Hz frequency shift on ordinary speech did not affect the sound quality in a way that was noticeable to me. Since the method does not prevent howling from arising, ringing sounds were heard at gain levels above the initial MSG level. The howling that occurred was then frequency shifted in each loop, resulting in a brief sweeping sound for each howling frequency being up-shifted. As the gain level increased, more howling frequencies were heard as brief up-shifted sweeps, making the total sound quality unacceptable for live applications. The system did not, however, show divergent behaviour, even at the highest applied gain levels.

Figure 6.8 – The filter misadjustment for the NLMS filter. [Plot: filter misadjustment in dB against time in s.]

The notch filter method was, as expected, reactive, meaning that howling was heard

before the frequencies were suppressed. The level of the howling did not reach disturbing

levels before they were suppressed, however, making the listening experience decent

throughout the simulations. When a small number (0-5) of notch filters were active,

there were no audible artefacts, but the more notch filters that were activated, the more

the total sound quality was affected. By the time that close to all, or all notch filters

(15-20) were active, the sound was notably distorted, but the listening experience was

still deemed acceptable, especially compared to the frequency shifting method.

The AFC method resulted in the best listening experience, with no or very few disturbing

audible artefacts. The dip observed at 28 seconds in fig. 6.6 was not heard. There were

at times small echoes and noises in the background, which are assumed to be related to

an erroneously adapted filter. These small artefacts were not deemed disturbing, and

might blend in with the echo and reverberation that is present in all live PA scenarios.

7 Discussion, conclusion and future work

This thesis has surveyed the market and the literature of feedback suppression. A brief

summary of the available methods and the history on the subject has been presented,

and three methods have been chosen for further evaluation in MATLAB simulations. A

measurement of the maximum stable gain of these three methods has been presented,

and the subjective listening experience has been commented on. After working with the

simulations, and listening to the methods, my recommendation is that the AFC method

with a 5 Hz frequency shift on the loudspeaker signal should be used by Limes Audio.

The method has the highest expected maximum stable gain, and the subjective listening

experience was the best amongst the evaluated methods. There are a few things about

this method that need attention. First, the method has only been evaluated successfully

in simulations. By the end of the project, attempts were made to make the MATLAB

simulations read audio streams from a microphone instead of from a file, enabling a real-

time setup and a more realistic evaluation of the methods. Even though the attempts

were successful, the simple task of reading audio from a microphone into MATLAB and

outputting it from a loudspeaker had an inherent time delay of approximately 300 ms,

which is unacceptable for live PA applications. Due to time limitations of the project,

no work was put into fixing this issue, but the algorithms were tested with the time delay

included, in order to test basic functionality. All algorithms could be run without any

additional time delay or lag, however. The algorithms were successful in suppressing

feedback, but no MSG value could be fully determined.

As for the requirements on the feedback suppressor, an increased MSG of 8–10 dB

should be possible. This value is highly dependent upon the setup of the PA system and

the room in which it resides, and for this reason, there is no way to guarantee a certain

level of increased MSG. The MSG depends on the peaks in the magnitude response of

the RIR, an example of which can be seen in fig. 2.3. Even if the 10 or 20 largest peaks

are removed, there is a mean magnitude level in the magnitude response, and if the gain

is set so that this mean level reaches 0 dB, virtually all frequencies will howl, and there

is no way to stabilize that system and still maintain the speech signal.
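
The relation between the peaks, the mean level and the MSG can be illustrated with a small sketch. It is not taken from the MATLAB simulations in this thesis: the function names, the toy feedback path and the idealized "remove the n largest bins" step are assumptions made for illustration only. The estimate ignores the phase condition of the Nyquist criterion, so it should be read as a conservative bound rather than an exact MSG.

```python
import cmath
import math
import random

def magnitude_response(h, n_bins=256):
    """Magnitude of the frequency response of impulse response h,
    evaluated at n_bins frequencies evenly spaced in [0, pi)."""
    mags = []
    for k in range(n_bins):
        w = math.pi * k / n_bins
        acc = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))
        mags.append(abs(acc))
    return mags

def msg_db(mags):
    """Gain (dB) that brings the largest magnitude peak up to 0 dB."""
    return -20.0 * math.log10(max(mags))

def msg_after_removing_peaks(mags, n_peaks):
    """Same estimate if the n_peaks largest bins were removed perfectly
    (an idealization of notch filtering, assumed for illustration)."""
    if not 0 <= n_peaks < len(mags):
        raise ValueError("n_peaks must be smaller than the number of bins")
    remaining = sorted(mags)[:len(mags) - n_peaks]
    return -20.0 * math.log10(max(remaining))

def mean_level_gain_db(mags):
    """Gain (dB) at which the mean magnitude level reaches 0 dB."""
    return -20.0 * math.log10(sum(mags) / len(mags))

# Toy feedback path: exponentially decaying random taps (hypothetical data).
rng = random.Random(0)
toy_path = [rng.uniform(-1.0, 1.0) * 0.3 * math.exp(-n / 20.0) for n in range(64)]
mags = magnitude_response(toy_path)
print("MSG estimate (dB):", round(msg_db(mags), 1))
print("after removing 10 peaks (dB):", round(msg_after_removing_peaks(mags, 10), 1))
print("mean level reaches 0 dB at (dB):", round(mean_level_gain_db(mags), 1))
```

Removing more peaks raises the estimate, but the remaining bins still sit around the mean level, which is why the achievable increase in MSG depends so strongly on the room and the setup.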

The main concern with an AFC approach is changes in the room impulse response, due

to changes in the relative position of the microphone/loudspeaker. Even though the

simulations suggested that a rapid change of the position of the microphone by 1 m

does not result in howling or temporary instability, real-time testing is necessary to

evaluate how well the algorithm performs when the RIR changes. If changes in

the RIR turn out to be a problem, a potential solution could be to combine the AFC

method with an AEQ or NFS method, which performs detection in a way similar to that

described in section 4.2. Upon detecting howling that is due to a changing RIR, the gain

in that particular sub-band could be temporarily lowered until the filter F has had the

time to adapt. The gain could then be slowly raised back to its initial value, if feedback

is no longer detected. Another idea would be to constantly monitor the changes in the

adaptive filter, and compare the updated filter to the old. If the difference between the

filters is “big”, indicating a significant change in the RIR, then the step size could be

temporarily raised to speed up the convergence. When the difference is again “small”,

indicating small changes in the RIR and more stationary conditions, the step size could

again be lowered to its initial value to ensure stability.
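
The step-size idea can be sketched as follows. This is only an illustration of the principle, not a tested part of the thesis simulations: the class name, the relative squared difference used as the measure of "big" and "small", and the values of mu_base, mu_fast and threshold are all assumptions that would need tuning in real-time tests.

```python
def relative_change(new, old, eps=1e-12):
    """Squared distance between two coefficient vectors,
    normalized by the energy of the old vector."""
    num = sum((a - b) ** 2 for a, b in zip(new, old))
    den = sum(b * b for b in old) + eps
    return num / den

class StepSizeController:
    """Chooses the adaptive-filter step size from how much the filter
    has changed since the previous call (hypothetical helper)."""

    def __init__(self, mu_base=0.01, mu_fast=0.1, threshold=0.05):
        self.mu_base = mu_base      # step size for stationary conditions
        self.mu_fast = mu_fast      # temporarily raised step size
        self.threshold = threshold  # relative change regarded as "big"
        self.snapshot = None        # coefficients seen at the previous call

    def update(self, coeffs):
        """Return the step size to use until the next call."""
        if self.snapshot is None:
            change = 0.0
        else:
            change = relative_change(coeffs, self.snapshot)
        self.snapshot = list(coeffs)
        return self.mu_fast if change > self.threshold else self.mu_base

controller = StepSizeController()
for coeffs in ([1.0, 0.0], [1.0, 0.01], [0.2, 0.8], [0.2, 0.8]):
    print(controller.update(coeffs))
```

In practice the controller would be called once per block rather than per sample, so that the comparison reflects a change in the RIR and not the ordinary sample-to-sample fluctuation of the adaptive filter.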

As for future work, the next step is to implement the AFC algorithm described in this

work in the Magneto mixer by porting the MATLAB code to C. Real-time testing is

necessary to evaluate if there is a need for a “fail-safe” sub-band equalization feature as

described above, and parameters such as the step size should be examined. In order to

ensure that the filter only updates when a better filter is available, future work should

evaluate if there is a suitable way to implement parallel adaptive filters. One filter F1 is

used to filter the output data and remove it from the microphone signal, described by

eq. (4.9). Another filter F2 can then be used in the update routine in eq. (4.10). An

evaluation algorithm could then determine if the filter F2 is better than the filter F1, and

if that is the case, we make the update F1 = F2. In AEC, echo return loss enhancement

(ERLE) is used for this, but it is yet to be determined if ERLE can be used in the AFC

case.
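
A minimal sketch of such a parallel-filter scheme is given below. Since eqs. (4.9) and (4.10) are not repeated in this chapter, a standard NLMS update is assumed for F2, and a block-wise ERLE comparison with a margin is assumed as the evaluation algorithm. The function name and the values of block and margin_db are hypothetical. The test signal is white noise through a known path in open loop, so the sketch says nothing about whether ERLE is a reliable criterion in the closed-loop AFC case.

```python
import math
import random

def two_path_nlms(x, d, n_taps=8, mu=0.5, block=64, margin_db=1.0, eps=1e-8):
    """x: loudspeaker samples, d: microphone samples.
    F2 adapts with NLMS; F1 is overwritten by F2 only when F2 shows a
    higher ERLE over the last block. Returns (f1, number_of_copies)."""
    f1 = [0.0] * n_taps   # foreground filter, produces the actual output
    f2 = [0.0] * n_taps   # background filter, the only one that adapts
    buf = [0.0] * n_taps  # most recent loudspeaker samples, newest first
    p_d = p_e1 = p_e2 = 0.0
    copies = 0
    for n, (xn, dn) in enumerate(zip(x, d), start=1):
        buf = [xn] + buf[:-1]
        e1 = dn - sum(f * b for f, b in zip(f1, buf))
        e2 = dn - sum(f * b for f, b in zip(f2, buf))
        norm = sum(b * b for b in buf) + eps
        f2 = [f + mu * e2 * b / norm for f, b in zip(f2, buf)]
        p_d += dn * dn
        p_e1 += e1 * e1
        p_e2 += e2 * e2
        if n % block == 0:
            erle1 = 10.0 * math.log10((p_d + eps) / (p_e1 + eps))
            erle2 = 10.0 * math.log10((p_d + eps) / (p_e2 + eps))
            if erle2 > erle1 + margin_db:
                f1 = list(f2)  # the update F1 = F2
                copies += 1
            p_d = p_e1 = p_e2 = 0.0
    return f1, copies

# Open-loop toy data: white noise through a short, known path.
rng = random.Random(1)
path = [0.5, -0.3, 0.2]
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h * x[n - k] for k, h in enumerate(path) if n - k >= 0)
     for n in range(len(x))]
f1, copies = two_path_nlms(x, d)
print([round(c, 3) for c in f1[:3]], copies)
```

The margin prevents F1 from being replaced on insignificant differences. How large it should be, and whether ERLE is the right measure, is part of the future work described above.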

References

[1] M. R. Schroeder, “Improvement of acoustic-feedback stability by frequency shift-

ing”, The Journal of The Acoustical Society of America, vol. 36, no. 9, pp. 1718–

1724, 1964.

[2] M. Schroeder, “Improvement of acoustic feedback stability in public address systems”,

Proc. 3rd Int. Congr. Acoust., 1959.

[3] T. van Waterschoot and M. Moonen, “Fifty years of acoustic feedback control:

state of the art and future challenges”, Proc. IEEE, vol. 99, no. 2, pp. 288–327,

2011.

[4] M. Mandal and A. Asif, Continuous and discrete time signals and systems. Cam-

bridge University Press, 2007, isbn: 9780521854559.

[5] H. Nyquist, “Regeneration theory”, Bell System Technical Journal, vol. 11, no. 1,

pp. 126–147, Jan. 1932.

[6] E. T. Patronis Jr., “Electronic detection of acoustic feedback and automatic sound

system gain control”, J. Audio Eng. Soc., vol. 26, no. 2, 1978.

[7] P. Gil-Cacho, T. van Waterschoot, M. Moonen, and S. H. Jensen, “Regularized

adaptive notch filters for acoustic howling suppression”, in Proc. 17th European

Signal Process. Conf. (EUSIPCO ’09), 2009.

[8] T. van Waterschoot and M. Moonen, “Comparative evaluation of howling detection

criteria in notch-filter-based howling suppression”, Journal of the Audio Engineering

Society, vol. 58, no. 11, pp. 923–940, 2010.

[9] ——, “Assessing the acoustic feedback control performance of adaptive feedback

cancellation in sound reinforcement systems”, in Proc. 17th European Signal Pro-

cess. Conf. (EUSIPCO ’09), Glasgow, Scotland, UK, Aug. 2009, pp. 1997–2001.

[10] M. Guo, S. Jensen, J. Jensen, and S. L. Grant, “On the use of phase modulation

method for decorrelation in acoustic feedback cancellation”, in Proc. 20th European

Signal Process. Conf. (EUSIPCO ’12), Bucharest, Romania, Aug. 2012.

[11] T. van Waterschoot, G. Rombouts, and M. Moonen, “On the performance of decorrelation

by prefiltering for adaptive feedback cancellation in public address systems”, in Proc.

4th IEEE Benelux Signal Process. Symp. (SPS ’04), Hilvarenbeek, The Netherlands,

Apr. 2004, pp. 167–170.

[12] S. L. Marple Jr., “Computing the discrete-time “analytic” signal via FFT”, in Signals,

Systems & Computers, 1997. Conference Record of the 31st Asilomar Conference on,

Nov. 1997, pp. 1322–1325.

[13] A. Reilly, G. Frazer, and B. Boashash, “Analytic signal generation - tips and traps”,

IEEE Transactions on Signal Processing, vol. 42, no. 11, pp. 3241–3245, Nov. 1994.

[14] E. Hänsler and G. Schmidt, Acoustic Echo and Noise Control: A Practical Approach.

Wiley-Interscience, 2004, isbn: 9780471453468.

[15] J. Dhiman, S. Ahmad, and K. Gulia, “Comparison between adaptive filter algorithms

(LMS, NLMS and RLS)”, International Journal of Science, Engineering and

Technology Research (IJSETR), vol. 2, no. 5, May 2013.

[16] J. Norberg, Johan Norberg om den exploderande lyckotrenden,

http://sverigesradio.se/sida/avsnitt/50019?programid=503, [Online, accessed February 2017],

Feb. 2012.

[17] T. van Waterschoot and M. Moonen, Ftp,

ftp://ftp.esat.kuleuven.be/sista/vanwaterschoot/abstracts/08-13.html,

[Online, accessed February 2017], Aug. 2013.
