Acoustic feedback suppression
in audio mixer for PA applications
Mattias Ekström
Master’s Thesis in Engineering Physics, Department of Physics, Umeå University, 2017
Department of Physics
Linnaeus väg 20 901 87 Umeå Sweden www.physics.umu.se
Department of Physics, Umeå University
June 19, 2017
Mattias Ekström (maek0025@student.umu.se)
Master’s thesis, engineering physics, spring 2017, 30 credits
Supervisor: Christian Schüld, Limes Audio
Examiner: Ove Andersson, Department of Physics
Abstract
When a speaker is addressing an audience, a PA system consisting of a microphone
and a loudspeaker is often used. If the microphone picks up too much of the
loudspeaker energy, acoustic feedback in the form of an unwanted characteristic howling
can occur. Limes Audio is a software company that specializes in improving sound
quality in digital communications, mainly conference telephony, and has developed
a reference product, the Magneto mixer, to demonstrate the capability of their
software TrueVoice. The company now wishes to expand the field of usage for the
Magneto mixer to enable it to work as a microphone mixer in PA scenarios, and for
this, a feedback suppression feature is needed. This master’s thesis aims at
surveying the market and the literature in the field and specifying the requirements for
a feedback suppression feature. Three methods for suppressing howling feedback
are evaluated through simulations and compared in terms of maximum stable gain
(MSG) and subjective listening experience. The method that performed the best
based on these criteria was acoustic feedback cancellation with a 5 Hz frequency
shift on the loudspeaker signal. This method makes use of an adaptive filter to
model the acoustic feedback path and to remove the feedback component from the
microphone signal. In the simulations, the method was able to increase the stable
gain by approximately 10 dB while maintaining a good sound quality.
Feedback reduction in audio mixer for PA system applications
Summary
When a speaker addresses an audience, a PA system consisting of a microphone and a
loudspeaker is often used. If the microphone picks up too much of the sound from the
loudspeaker, there is an imminent risk of acoustic feedback in the form of a
characteristic, unwanted howl. Limes Audio is a company that develops software for
improving sound quality in digital communication, primarily in conference telephony.
They have developed a demonstration product, the Magneto mixer, which can be used as a
conference telephone to demonstrate their software TrueVoice. The company now wishes to
develop the Magneto mixer so that it also works as an audio mixer for PA scenarios, or
for conference telephony where internal sound reinforcement in the room is needed, and
this requires a function for removing any feedback. The goal of this thesis is to lay
the foundation for such a function in the Magneto mixer by surveying the market and the
literature in the field. Three methods for eliminating feedback are evaluated in
simulations and compared with respect to maximum stable gain (MSG) and subjective sound
quality. The method ”Acoustic feedback cancellation” combined with a 5 Hz frequency
shift on the loudspeaker signal gave the highest MSG and the best sound quality. The
method uses an adaptive filter to approximate the acoustic feedback path between
loudspeaker and microphone and removes feedback components from the microphone signal.
In the simulations, the method was able to increase the maximum stable gain by up to
10 dB while good speech quality was maintained.
Acoustic feedback suppression in audio mixer for PA applications, June 19, 2017
List of abbreviations
AEC Acoustic Echo Cancellation
AEQ Automatic Equalization
AFC Acoustic Feedback Cancellation
FFT Fast Fourier Transform
FIR Finite Impulse Response
IIR Infinite Impulse Response
IMSD Interframe Magnitude Slope Deviation
LTI Linear Time-Invariant
MSG Maximum Stable Gain
NFS Notch filter based Feedback Suppression
NLMS Normalized Least Mean Square
PA Public Address
PHPR Peak-to-Harmonic Power Ratio
PNPR Peak-to-Neighbouring Power Ratio
RIR Room Impulse Response
Contents
1 Introduction
1.1 Background
1.2 Motivation
1.3 Objective
1.4 Disposition
2 Theory
2.1 Basics of signals and systems
2.1.1 Linear systems
2.1.2 Digital filters
2.2 The feedback phenomenon
2.3 Stability analysis
3 Methods used in feedback suppression
4 Description of algorithms
4.1 Frequency shifting
4.1.1 Analytic signal
4.2 Two-stage notch filtering
4.2.1 Detection stage
4.2.2 Suppression stage
4.3 Acoustic feedback cancellation
4.3.1 NLMS
5 Method for testing
5.1 MATLAB simulation and evaluation
6 Results
6.1 Feedback suppression
6.2 Maximum stable gain
6.2.1 Frequency shifting
6.2.2 Notch filters
6.2.3 Acoustic feedback suppression
6.3 Subjective listening experience
7 Discussion, conclusion and future work
References
1 Introduction
1.1 Background
In any situation where a speaker addresses an audience using a Public Address (PA)
system, consisting of a microphone and a loudspeaker, the entire performance is at
risk of being ruined by feedback, perceived as ”howling” at a certain frequency.
Feedback howling is not only an unpleasant experience for the audience, but also puts
the PA equipment at risk of being damaged. Feedback occurs when the microphone picks
up too much of the loudspeaker’s energy (see chapter 2), which causes unstable
oscillations at problematic frequencies. These oscillations are perceived as the
howling that is probably familiar to the reader. Throughout the history of PA systems,
feedback has been a recurring phenomenon and different measures have been taken to
prevent it. Since the 1960s, when the first feedback suppression methods were
presented [1], [2], novel methods and algorithms have been proposed, and with the
dramatic increase in the use of digital computers from the 1980s onwards, more powerful
and efficient algorithms have been developed through software implementations in
digital signal processors (DSP). Today, many consider the best way to avoid howling
feedback to be a careful and well-planned setup of the microphone and loudspeakers,
along with an experienced sound technician who optimizes the equalization of the PA
system for the specific room and decreases the gain at potentially problematic
frequencies [3]. In many applications, though, there is a need for a plug-and-play
solution without the presence of a sound technician. For these scenarios, the processes
usually performed by a sound technician must be automated, or other measures need to be
taken, in order to avoid howling feedback.
1.2 Motivation
Limes Audio AB is a company owned by Google that develops audio solutions for
enterprise applications. Their main product, TrueVoice, has been developed to remove
echoes, noise and other sonic artefacts in conference telephony and other applications
that make use of a communication system with a loudspeaker and microphone situated
in the same unit. Limes Audio has designed a reference product called the Magneto
mixer, which has the TrueVoice software embedded and can be used as a plug-and-play
conference mixing unit together with a computer. The company now wishes to look into
the possibility of expanding the field of usage for the Magneto mixer, from working as a
conference telephony mixing unit to also working as a plug-and-play mixer unit in a PA
system and in other teleconferencing scenarios where internal sound reinforcement
is necessary. For this, the software in the Magneto mixer needs to be adapted for the
PA case, which has a different problem formulation than the teleconference case.
1.3 Objective
For the Magneto mixer to work properly in the PA case, there is a need for a feedback
suppression feature. There are two main objectives for this work. The first objective
is to survey the literature on the subject as well as the competitors’ solutions to the
feedback problem, and to document the findings. The second objective is to specify the
requirements for a feedback suppression feature in the Magneto mixer, to develop MATLAB
code demonstrating the performance of some chosen methods, and to evaluate which method
Limes Audio should aim at including in their future work of integrating a feedback
suppressor in the Magneto mixer.
1.4 Disposition
Chapter 2 describes the mathematical theory of the feedback problem and the conditions
required for howling feedback to occur. Chapter 3 briefly describes the available methods
on the subject, and provides arguments for my choice of methods for the next section.
Chapter 4 describes the chosen feedback suppression algorithms in detail, and chapter 5
describes the methods used for testing the implementations and simulating the PA-setup
in MATLAB. Chapter 6 presents the results from the evaluation procedures and chapter 7
concludes the report with a discussion of the findings in the work, and suggestions for
future work.
2 Theory
This chapter describes the theoretical foundation upon which all feedback suppression
algorithms are based, starting from the fundamentals in signals and systems. The math-
ematical formulation of the feedback problem is presented, and the conditions required
for howling feedback to occur are explained.
2.1 Basics of signals and systems
2.1.1 Linear systems
A system H is an operator that takes an input x(t) and produces an output y(t):
y(t) = H{x(t)}. (2.1)
H is said to be linear if it satisfies the superposition principle: if several inputs
x1(t), x2(t), ..., xi(t) produce outputs
$$ y_1(t) = H\{x_1(t)\} \tag{2.2} $$
$$ y_2(t) = H\{x_2(t)\} \tag{2.3} $$
$$ \vdots \tag{2.4} $$
$$ y_i(t) = H\{x_i(t)\}, \tag{2.5} $$
then the output obtained when the inputs are scaled by factors αi and added together
satisfies
$$ \alpha_1 y_1(t) + \dots + \alpha_i y_i(t) = H\{\alpha_1 x_1(t) + \dots + \alpha_i x_i(t)\}. \tag{2.6} $$
A system is furthermore said to be time-invariant if a time shift T in the input only
results in a corresponding time shift in the output:
$$ y(t - T) = H\{x(t - T)\}. \tag{2.7} $$
A Linear Time-Invariant (LTI) system can be described by its impulse response h(t)
in the time domain and by its frequency response H(ω) in the frequency domain. The
impulse response is the output from an LTI system being excited with an impulse at
time t = 0. In the discrete domain, this impulse is represented by the Kronecker delta
impulse
$$ d_i = \begin{cases} 0 & \text{if } i \neq 0 \\ 1 & \text{if } i = 0. \end{cases} \tag{2.8} $$
The corresponding impulse in the continuous domain is the Dirac delta function. If the
impulse response is known, one can, for any input x(t), determine the output y(t) of the
system with the convolution operator ∗:
y(t) = h(t) ∗ x(t). (2.9)
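As a minimal illustration of eq. (2.9) in the discrete domain, the following Python sketch convolves an input sequence with an impulse response. The function name `convolve` and the example values are assumptions made for illustration only; they are not taken from the MATLAB code of this work.

```python
def convolve(h, x):
    """Discrete convolution y[n] = sum_k h[k] * x[n - k] for finite sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for n_h, h_val in enumerate(h):
        for n_x, x_val in enumerate(x):
            y[n_h + n_x] += h_val * x_val
    return y


# Hypothetical impulse response: a direct path followed by a weaker echo.
h = [1.0, 0.0, 0.5]
# Exciting the system with the Kronecker delta returns the impulse response
# (padded with zeros up to the full convolution length).
delta = [1.0, 0.0, 0.0]
impulse_output = convolve(h, delta)
```

Feeding the Kronecker delta through the system reproduces h, which is exactly the definition of the impulse response given above.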
The frequency response, H(ω) is obtained by computing the Fourier transform of the
impulse response h(t), and describes the frequency spectrum of the output of the LTI
system, when the input is one of the above described impulse functions:
H(ω) = F{h(t)}, (2.10)
where F is the Fourier transform operator. A property of interest for the convolution
operator is the convolution theorem, which states that, upon computing the Fourier
transform of both sides of eq. (2.9):
Y (ω) = F{h(t) ∗ x(t)} = F{h(t)}F{x(t)} = H(ω)X(ω), (2.11)
where Y(ω) and X(ω) are the Fourier transforms of the corresponding signals [4].
From eq. (2.11), it follows that the total frequency response of a system can be found
by dividing the Fourier transform of the output signal by the Fourier transform of the
input signal:
$$ H(\omega) = \frac{Y(\omega)}{X(\omega)}. \tag{2.12} $$
For real-valued signals, the corresponding Fourier transforms are complex and
Hermitian [4]. From the complex-valued frequency response H(ω), the magnitude response
|H(ω)| and the phase response ∠H(ω) can be computed. These quantities describe how the
system changes the magnitude and phase of each frequency component of the input signal.
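The convolution theorem can be checked numerically. The sketch below uses a naive discrete Fourier transform (DFT) written with the Python standard library; for the DFT, the theorem holds for circular convolution, which is what the helper `circular_convolve` computes. All function names and the example sequences are illustrative assumptions.

```python
import cmath


def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def circular_convolve(h, x):
    """Circular convolution of two sequences of equal length."""
    n_points = len(x)
    return [
        sum(h[k] * x[(n - k) % n_points] for k in range(n_points))
        for n in range(n_points)
    ]


h = [1.0, 0.5, 0.25, 0.0]   # hypothetical impulse response
x = [1.0, -1.0, 2.0, 0.0]   # hypothetical input signal

# Transform of the convolution versus product of the transforms, eq. (2.11).
Y_time = dft(circular_convolve(h, x))
Y_freq = [h_k * x_k for h_k, x_k in zip(dft(h), dft(x))]

# Magnitude and phase response of the system described by h.
H = dft(h)
magnitude = [abs(value) for value in H]
phase = [cmath.phase(value) for value in H]   # radians, wrapped to (-pi, pi]
```

The two spectra `Y_time` and `Y_freq` agree up to rounding errors. In practice an FFT routine would replace the naive `dft`, which is only used here to keep the example self-contained.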
2.1.2 Digital filters
A digital filter is a system that manipulates an input signal in a desired way to produce
a specific output. Examples are band-pass, low-pass and high-pass filters. Digital
filters can be either Finite Impulse Response (FIR) or Infinite Impulse Response (IIR).
As the names suggest, the impulse response of an FIR filter has finite length, whereas
that of an IIR filter is infinitely long. Since FIR filters have finite impulse
responses, they are always stable, but they can be computationally demanding. IIR
filters, on the other hand, can be unstable, but are in general less computationally
demanding than FIR filters [4].
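The difference between the two filter types can be seen directly in their impulse responses. The following sketch is an illustration under assumed coefficients: a three-tap FIR filter and a first-order IIR filter with a single feedback coefficient a. The function names are hypothetical.

```python
def fir_filter(b, x):
    """FIR filter: each output is a weighted sum of current and past inputs."""
    return [
        sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        for n in range(len(x))
    ]


def one_pole_iir(a, x):
    """First-order IIR filter y[n] = x[n] + a * y[n - 1]."""
    y = []
    previous = 0.0
    for sample in x:
        previous = sample + a * previous
        y.append(previous)
    return y


impulse = [1.0] + [0.0] * 9

fir_response = fir_filter([0.5, 0.3, 0.2], impulse)   # exactly zero after three samples
stable_iir = one_pole_iir(0.5, impulse)               # decays geometrically, |a| < 1
unstable_iir = one_pole_iir(1.5, impulse)             # grows without bound, |a| > 1
```

The FIR response ends after as many samples as there are coefficients, while the IIR response continues indefinitely and is only bounded when the feedback coefficient has magnitude below one.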
2.2 The feedback phenomenon
In situations where a speaker is addressing an audience located in the same room, a PA
system, consisting of a microphone and loudspeakers, is often used. Due to the fact that
the microphone and loudspeaker are situated in the same room, there is a significant risk
of feedback from the loudspeakers to the microphone, which sometimes can be heard as a
characteristic ”howling” of tones with problematic frequencies for the specific enclosure.
Howling occurs when the microphone takes up too much of the loudspeaker energy and is
undesired, resulting in an unpleasant experience for the audience and a risk of damaging
the PA equipment.
The scenario can be described by the model shown in fig. 2.1. Throughout the work,
we will assume that the source signal u(t) contains speech only; background noise
will not be considered. Furthermore, the speech is assumed to have been sampled to the
discrete domain at 16 kHz, which according to the Nyquist sampling theorem means
that all signal components up to 8 kHz are sampled without aliasing [4]. The
vast majority of human speech is contained within this bandwidth, and therefore it
is assumed that the continuous source signal is band limited to 8 kHz and thus can be
sampled at 16 kHz and perfectly reconstructed from the samples.

Figure 2.1 – A model of the scenario, here including one microphone and one
loudspeaker (single-channel system). Diagram labels: source signal u(t), microphone
signal y(t), forward path G, loudspeaker signal x(t), feedback path F.
In fig. 2.1, a speaker produces speech into a microphone, resulting in a source signal
u(t). The signal is then processed in the electro-acoustic forward path, here denoted
G. This processing includes the amplifier gain and possibly digital audio effects such as
compression and equalization. One of the simplest types of processing in the
electro-acoustic forward path is a broadband gain, which is simply the ratio of the
output signal amplitude to the input signal amplitude. A broadband gain G(t) can be
expressed in dB as
$$ \text{Gain} = 20 \log_{10}\left(\frac{x(t)}{y(t)}\right) \ [\text{dB}], \tag{2.13} $$
and is the only processing in the electro-acoustic forward path considered in this work.
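For reference, the conversion in eq. (2.13) and its inverse can be written as two small helper functions. The names `gain_db` and `amplitude_ratio` are assumptions for illustration, and the base-10 logarithm is assumed, as is conventional for decibels.

```python
import math


def gain_db(output_amplitude, input_amplitude):
    """Broadband gain in dB for a given amplitude ratio, following eq. (2.13)."""
    return 20.0 * math.log10(output_amplitude / input_amplitude)


def amplitude_ratio(gain_in_db):
    """Inverse mapping: linear amplitude ratio for a gain given in dB."""
    return 10.0 ** (gain_in_db / 20.0)
```

With these definitions, doubling the amplitude corresponds to roughly 6 dB, and a gain increase of 10 dB corresponds to an amplitude factor of about 3.16.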
The amplified output signal x(t) is then transmitted to the loudspeaker. The output
from the loudspeaker propagates through the room in which the PA system is set up,
and interacts with the environment in a way described by the acoustic feedback path
F. The acoustic feedback path is modelled as a linear system with input signal x(t).
According to eq. (2.9), we can compute the output from that system, which is the
feedback signal going back into the microphone, by convolving the loudspeaker signal
with the impulse response of the acoustic feedback path F(t), also denoted the Room
Impulse Response (RIR). The signal is fed back into the microphone, forming a
closed-loop system described by
$$ \begin{aligned} y(t) &= F(t) * x(t) + u(t) \\ x(t) &= G(t) * y(t), \end{aligned} \tag{2.14} $$
where F(t) and G(t) are the impulse responses of the acoustic feedback path and the
electro-acoustic forward path, respectively. Upon computing the Fourier transform of
both sides of eq. (2.14) and making use of the convolution theorem in eq. (2.11), one
obtains:
$$ Y(\omega) = F(\omega) X(\omega) + U(\omega) \tag{2.15} $$
$$ X(\omega) = G(\omega) Y(\omega), \tag{2.16} $$
where F(ω) and G(ω) are the frequency responses of the corresponding systems, and
X(ω), U(ω) and Y(ω) are the frequency contents of their corresponding signals. From
this, one can compute the total frequency response from the source u(t) to the output
x(t) by using the property described in eq. (2.12):
$$ H(\omega) = \frac{X(\omega)}{U(\omega)} = \frac{G(\omega) Y(\omega)}{Y(\omega) - F(\omega) X(\omega)} = \frac{G(\omega)}{1 - F(\omega) G(\omega)}. \tag{2.17} $$
The term F(ω)G(ω) is referred to as the loop response of the system, and the related
magnitude response |F(ω)G(ω)| is denoted the loop gain, whereas the phase response
∠F(ω)G(ω) is denoted the loop phase. The system described by the transfer function
in eq. (2.17) is assumed to be a linear, time-dependent, finite-order system, in the
terminology of section 2.1.1. These assumptions are justified in [3], where the authors
argue that the linearity follows from the fact that a sound wave’s interaction with the
environment can be considered level independent, meaning that the nature of the
reflections does not depend on the sound pressure level. The time-dependency
assumption is an obvious one, since the feedback path depends on all movements
and changes in the room, including the microphone or loudspeaker changing positions.
Finally, although RIRs are in general infinitely long, they show an exponential decay
over time, as shown in fig. 2.2. From this observation, it is reasonable to truncate
the RIR at a certain length, so that the system can be considered to be of finite order.

Figure 2.2 – The impulse response of a typical room, truncated at 2001 samples
(f(t) plotted against sample number).
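To make the loop quantities concrete, the sketch below evaluates the loop gain, the loop phase and the closed-loop response of eq. (2.17) at a single frequency. It assumes a truncated feedback path given as a short list of taps and a pure broadband gain in the forward path; the tap values, the gain of 2 and the function names are hypothetical.

```python
import cmath


def frequency_response(impulse_response, omega):
    """Evaluate F(w) = sum_n f[n] * exp(-1j*w*n) for a finite impulse response."""
    return sum(
        tap * cmath.exp(-1j * omega * n) for n, tap in enumerate(impulse_response)
    )


def closed_loop_response(feedback_path, forward_gain, omega):
    """Closed-loop response H(w) = G / (1 - F(w) G) for a broadband gain G."""
    loop_response = frequency_response(feedback_path, omega) * forward_gain
    return forward_gain / (1.0 - loop_response)


# Hypothetical truncated feedback path: one reflection, delayed by four samples.
feedback_path = [0.0, 0.0, 0.0, 0.0, 0.4]
omega = cmath.pi / 2.0          # normalised radial frequency in rad/sample

loop = frequency_response(feedback_path, omega) * 2.0
loop_gain = abs(loop)           # |F(w)G(w)|
loop_phase = cmath.phase(loop)  # angle of F(w)G(w), wrapped to (-pi, pi]
H = closed_loop_response(feedback_path, 2.0, omega)
```

At this frequency the loop phase is a multiple of 2π and the loop gain is 0.8, so the fed-back component adds in phase and the closed-loop response is five times larger than the forward gain alone.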
2.3 Stability analysis
Even though the system H(ω) is indeed time varying due to changes in the RIR, it is
common practice in the field of feedback suppression to carry out the stability analysis
for a time-invariant system [3]. This is the reason that the expressions in eqs. (2.15)
to (2.17) do not depend on time. The stability analysis originates from the paper
”Regeneration theory” by Harry Nyquist [5], which can be consulted for further reading.
For the system described in eq. (2.17), instability requires |F(ω)G(ω)| ≥ 1, or 0 dB.
In order for the signal to diverge due to feedback, the components at the problematic
frequencies from each pass through the loop need to superimpose over time. For this to
occur, the frequency components need to be in phase, which requires the loop phase to
be a multiple of 2π. This condition for instability is summarized in the Nyquist
stability criterion: if there exists a radial frequency ω for which the loop gain is
greater than or equal to unity, and for which the loop phase is any multiple of 2π,
then the system is unstable:
$$ |F(\omega) G(\omega)| \geq 1 \tag{2.18} $$
$$ \angle F(\omega) G(\omega) = m 2\pi, \quad m \in \mathbb{Z}. \tag{2.19} $$

Figure 2.3 – The characteristics of a typical room: (a) the magnitude response
|F(f)| [dB] and (b) the phase response ∠F(f) [rad], both plotted against frequency f [Hz].

The corresponding frequency f = ω/2π will, if present in the source signal, cause
unstable oscillations in the system, perceived as a howling sound. It should be pointed
out that the assumption that the system is time-invariant is not necessarily fulfilled.
Actually, it is virtually never fulfilled in any given PA scenario. However, under the
assumption that the RIR is ”slowly changing” over time, the Nyquist stability criterion
applies. It is important to note that this assumption can cause problems when the RIR
is changing rapidly, such as when the speaker is holding a portable microphone and is
walking around in the room, as explained in chapter 4. For this reason it is important
to be aware of this assumption.
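The effect of the criterion can be illustrated with a small time-domain simulation of the closed loop in eq. (2.14). The sketch assumes a time-invariant feedback path consisting of a single delayed reflection and a broadband forward gain; the values and the function name `simulate_loop` are hypothetical and serve only as an illustration.

```python
def simulate_loop(feedback_path, gain, source):
    """Simulate y[n] = u[n] + (f * x)[n], x[n] = gain * y[n], sample by sample.

    feedback_path[0] must be zero so that the loop contains at least one
    sample of delay and can be computed causally.
    """
    x = []
    for n, u in enumerate(source):
        fed_back = sum(
            feedback_path[k] * x[n - k]
            for k in range(1, len(feedback_path))
            if n - k >= 0
        )
        x.append(gain * (u + fed_back))
    return x


feedback_path = [0.0, 0.0, 0.0, 0.0, 0.4]   # hypothetical RIR: one reflection
source = [1.0] + [0.0] * 199                # a single click as the source signal

stable = simulate_loop(feedback_path, 2.0, source)     # loop gain 0.8 < 1
unstable = simulate_loop(feedback_path, 3.0, source)   # loop gain 1.2 > 1
```

With a loop gain of 0.8 the click dies out after a number of passes through the loop, whereas with a loop gain of 1.2 each pass amplifies it and the loudspeaker signal grows without bound, which corresponds to howling.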
Any given room with RIR F(t) has a specific value of Maximum Stable Gain (MSG),
which can be found from the frequency response F(ω). The initial MSG is computed by
finding the peak with the largest magnitude in the frequency response among the
frequencies that fulfil the phase condition eq. (2.19), and calculating how far that
peak is from 0 dB. Expressed in dB, the initial MSG is
$$ \text{MSG} = -20 \log_{10}\Bigl( \max_{\omega} |F(\omega)| \Bigr) \quad \forall \omega : \angle F(\omega) = m 2\pi, \ m \in \mathbb{Z}. \tag{2.20} $$
The magnitude and phase responses of a typical room, which is also one of the room
characteristics used in the simulations in this work, are shown in fig. 2.3. The MSG of
the RIR shown in fig. 2.3 is 3.087 dB.
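A rough numerical version of eq. (2.20) is sketched below. It evaluates F(ω) on a frequency grid and accepts a grid point as fulfilling the phase condition when the wrapped phase is within a small tolerance of zero. The grid size, the tolerance and the function names are assumptions; this is not the procedure used to obtain the value 3.087 dB above.

```python
import cmath
import math


def frequency_response(impulse_response, omega):
    """Evaluate F(w) = sum_n f[n] * exp(-1j*w*n) for a finite impulse response."""
    return sum(
        tap * cmath.exp(-1j * omega * n) for n, tap in enumerate(impulse_response)
    )


def initial_msg_db(impulse_response, n_grid=4096, phase_tolerance=1e-2):
    """Estimate the initial MSG of eq. (2.20) on a frequency grid over [0, pi]."""
    peak = 0.0
    for i in range(n_grid + 1):
        omega = math.pi * i / n_grid
        value = frequency_response(impulse_response, omega)
        # cmath.phase wraps to (-pi, pi], so any multiple of 2*pi appears as ~0.
        if abs(cmath.phase(value)) < phase_tolerance:
            peak = max(peak, abs(value))
    if peak == 0.0:
        raise ValueError("no grid frequency satisfies the phase condition")
    return -20.0 * math.log10(peak)
```

For a hypothetical feedback path `[0.0, 0.0, 0.0, 0.0, 0.4]`, the magnitude is 0.4 at every frequency, so the estimate is about 7.96 dB, corresponding to a maximum broadband gain factor of 2.5.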
The main objective of feedback suppression is to manipulate the total transfer function
by introducing additional sub-systems which alter the total frequency response in order
to increase the MSG, preferably without distorting the source signal. In the following
chapters, we will look into different methods of achieving this.
3 Methods used in feedback suppression
This chapter is a summary of the history and available literature of the field of feedback
suppression. Acoustic feedback suppression is a well-studied subject, and several
methods have been proposed to solve the howling problem. There are four main
categories of feedback suppression, namely
• Periodic modulation methods
• Gain reduction methods
• Room modelling methods
• Spatial filtering methods (beamforming)
The first methods to address the issue of howling feedback, developed in the 1960s [1],
[2], belong to the first category. Implemented with electronic components, these methods
manipulate the microphone signal before amplification, either by altering the phase of
the signal by a small value φ or by shifting the frequency of the signal by a small
∆f. In [2], an increase in maximum stable gain of 14 dB was reported, but the effects
on the sound quality were too severe to be considered acceptable. Frequency shifting
is still used in some commercial products today. One such method, a frequency shift of
5 Hz, is evaluated in this work and is explained in depth in section 4.1.
The second category, gain reduction methods, can be divided into three subcategories,
depending on the frequency range in which the gain is reduced. Early works applied a
full-band gain reduction upon detecting howling [6]. This method obviously does not
increase the maximum stable gain, but merely brings an unstable system back to a stable
state. Full-band gain reduction was later refined into Automatic Equalization (AEQ),
which divides the input signal into frequency bands and performs feedback detection
on every sub-band. If a howling frequency is detected, the gain is reduced only in the
sub-band where the critical frequency resides, thus leaving the rest of the signal intact.
The AEQ methods can be described as an attempt to automate the work of an audio
engineer, who often works with sub-band equalization to reduce feedback. The AEQ
method was further refined into Notch filter based Feedback Suppression (NFS), where
notch filters are used to suppress problematic frequencies at which howling has been
detected. Notch filters are band-stop filters with a very narrow stop band (called a
”notch”), which severely reduces the gain in that particular frequency band and thus
removes those frequencies from the signal. These notch filters can be designed to be very
narrow, thus only suppressing a very small frequency band of the signal, namely where
the howling occurs. Notch filters can be implemented as both FIR and IIR filters, but a
very narrow FIR notch requires a high filter order, which means that IIR filters are
often preferred. To suppress several frequencies in a signal, a number of notch filters
centered at different frequencies can be applied, either by applying several filters in
series or by designing one filter with two or more ”notches”.
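As an illustration of a narrow IIR notch, the sketch below builds a generic second-order notch filter with its zeros on the unit circle at the notch frequency and its poles at the same angle slightly inside the unit circle. This textbook design, the pole radius of 0.98 and all function names are assumptions for illustration; it is not necessarily the filter design used in this work.

```python
import cmath
import math


def notch_coefficients(center_hz, sample_rate_hz, pole_radius=0.98):
    """Second-order IIR notch: zeros on the unit circle, poles just inside it."""
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    b = [1.0, -2.0 * math.cos(w0), 1.0]
    a = [1.0, -2.0 * pole_radius * math.cos(w0), pole_radius ** 2]
    return b, a


def magnitude_response(b, a, freq_hz, sample_rate_hz):
    """|H(f)| of the filter with numerator b and denominator a."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    numerator = sum(coef * z_inv ** k for k, coef in enumerate(b))
    denominator = sum(coef * z_inv ** k for k, coef in enumerate(a))
    return abs(numerator / denominator)


def cascade_magnitude(filters, freq_hz, sample_rate_hz):
    """Magnitude response of several notch filters applied in series."""
    result = 1.0
    for b, a in filters:
        result *= magnitude_response(b, a, freq_hz, sample_rate_hz)
    return result


# Two hypothetical howling frequencies, suppressed by two notches in series.
filters = [notch_coefficients(1000.0, 16000.0), notch_coefficients(2500.0, 16000.0)]
```

A pole radius closer to one gives a narrower notch. Away from the notch frequencies the gain of this simple design stays within a few percent of unity, so the rest of the signal is left largely intact.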
The NFS methods are by far the most used in commercial products today. All NFS
methods include a detection phase and a suppression phase [3], and are divided into
one-stage and two-stage NFS methods. In one-stage methods, detection and suppression
are performed in the same step. In [7], the authors use adaptive notch filters in order
to detect and suppress howling in the same stage. The paper concludes that the adaptive
notch filters used did not produce sufficient feedback suppression over the entire
frequency range. The most commonly used methods in the NFS category are so-called
two-stage methods, where detection and suppression are separated. The frequency spectra
of segments of the signal are evaluated, often using the Fourier transform computed by
the Fast Fourier Transform (FFT). A frequency spectrum is scanned with a peak-picking
algorithm to find the frequencies that have the most power, and the frequencies
corresponding to these peaks are tested against certain criteria to determine whether
they are indeed howling frequencies or just tonal components in the signal. If the
detection algorithm finds a howling frequency, the suppression stage receives
information about the frequency at which howling occurs, and applies a notch filter at
that specific frequency to suppress the howling.
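The peak-picking step of the detection stage can be sketched as follows. The example computes the magnitude spectrum of one signal frame with a naive DFT and returns the largest local maxima as candidate frequencies. The frame length, the test tones and the function names are assumptions, and the subsequent test of each candidate against howling criteria is not shown.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one signal frame."""
    n_points = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                for n in range(n_points)))
        for k in range(n_points // 2 + 1)
    ]


def pick_peaks(spectrum, n_peaks):
    """Return the bin indices of the n_peaks largest local maxima."""
    local_maxima = [
        k for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
    ]
    local_maxima.sort(key=lambda k: spectrum[k], reverse=True)
    return local_maxima[:n_peaks]


sample_rate_hz = 16000.0
frame_length = 128
# Hypothetical frame: a strong 1000 Hz tone and a weaker 3000 Hz tone.
frame = [
    math.sin(2.0 * math.pi * 1000.0 * n / sample_rate_hz)
    + 0.3 * math.sin(2.0 * math.pi * 3000.0 * n / sample_rate_hz)
    for n in range(frame_length)
]
candidate_bins = pick_peaks(magnitude_spectrum(frame), n_peaks=2)
candidate_hz = [k * sample_rate_hz / frame_length for k in candidate_bins]
```

Here both tones fall exactly on DFT bins, so the candidates are returned at 1000 Hz and 3000 Hz, strongest first. A practical implementation would use an FFT and a window function, since real howling frequencies rarely coincide with bin centres.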
There are several spectral (frequency-based) and temporal (time-based) features that a
howling component has, but a tonal component does not. In practice, one of these, or
a combination of them, can be used to determine whether a peak in the frequency spectrum
corresponds to a howling frequency. A number of criteria that can be used to decide
whether a signal component is a howling component or a tonal component are evaluated
in [8]. A two-stage notch filter based method is evaluated in this work and is explained
in depth in section 4.2.
The third category, room modelling methods, sometimes also called Acoustic Feedback
Cancellation (AFC), resembles the methods used in Acoustic Echo Cancellation (AEC),
a feature used in conference telephony and other applications where a speaker
communicates with another speaker at a distant location using a conference telephone.
In these cases, the far-end speaker’s voice is output from a loudspeaker and fed back
into a microphone, resulting in an echo back to the far-end speaker if no measures are
taken. A common approach in AEC is to use an adaptive filter F̂ to approximate the RIR F,
to filter the loudspeaker signal with F̂ in order to model the feedback, and to remove
the approximated feedback component from the microphone signal. If the adaptive filter
approximates the RIR perfectly, no feedback component remains in the microphone
signal.
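The adaptive filtering idea can be sketched with the Normalized Least Mean Square (NLMS) algorithm. The example below is an idealised illustration: the loudspeaker signal is white noise, the microphone picks up only the feedback component, and the feedback path is a hypothetical four-tap filter. The step size, the regularization constant and the function name are assumptions, and the sketch ignores the situation where a source signal is present at the microphone.

```python
import random


def nlms_identify(loudspeaker, microphone, n_taps, step_size=0.5,
                  regularization=1e-8):
    """Estimate a feedback path with the NLMS algorithm.

    Returns the estimated filter taps and the error signal, i.e. the
    microphone signal with the estimated feedback component removed.
    """
    weights = [0.0] * n_taps
    errors = []
    for n in range(len(loudspeaker)):
        # Most recent loudspeaker samples, newest first, zero-padded at the start.
        recent = [loudspeaker[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(w * s for w, s in zip(weights, recent))
        error = microphone[n] - estimate
        power = sum(s * s for s in recent) + regularization
        weights = [
            w + step_size * error * s / power for w, s in zip(weights, recent)
        ]
        errors.append(error)
    return weights, errors


random.seed(0)
true_path = [0.0, 0.4, -0.2, 0.1]            # hypothetical feedback path
loudspeaker = [random.gauss(0.0, 1.0) for _ in range(2000)]
# Idealised microphone signal: feedback only, no source signal.
microphone = [
    sum(true_path[k] * loudspeaker[n - k]
        for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(loudspeaker))
]
estimated_path, residual = nlms_identify(loudspeaker, microphone, n_taps=4)
```

Under these idealised conditions the estimated taps converge to the true path and the residual tends to zero, which corresponds to the perfectly approximated case described above.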
The main difference between the AFC case and the AEC case is that the loudspeaker
signal is highly correlated with the microphone signal in AFC, which is not the case in
AEC [9]. When there is high correlation between the loudspeaker and microphone
signals, which in AEC occurs during ”double-talk” scenarios (when the near-end speaker
and far-end speaker speak simultaneously), the AEC algorithms are known to perform
poorly in adapting the filters. This makes the AEC methods unsuitable for the AFC case,
which can be described as the AEC case with constant double-talk. In order to use
adaptive filters to remove the unwanted feedback, one needs to use decorrelation methods
to decrease the correlation between the loudspeaker and microphone signals [3], [10],
[11]. Different methods for decorrelation have been suggested, such as noise injection
on the loudspeaker signal, frequency shifting or phase shifting of the loudspeaker
signal, non-linear processing, introduction of a delay in the forward path, and
decorrelating pre-filters [9]. The method of using adaptive filters to remove unwanted
signal content, and the AFC method evaluated in this work, are further explained in
section 4.3.
The fourth category, which is also known as beamforming, consists of using special
microphone and/or loudspeaker arrays in order to reduce the signal transport between
the microphone and the loudspeaker, by modifying the directivity pattern of the array
to have a null in the direction of the other unit. These methods require
additional hardware and will for that reason not be considered in this work, which is
limited to software implementation.
In [3], the authors conclude that the most promising method in terms of achievable
increase in MSG and subjective sound quality is the AFC approach. For this reason,
one of these methods will be included in the MATLAB evaluation. Upon surveying the
market, it is obvious that the two-stage NFS methods are by far the most common in
feedback suppression products. For this reason, one of these methods will also be
implemented and evaluated in MATLAB. These methods have the disadvantage of being
reactive, in the sense that a howling sound needs to be detected, and thus is often
heard, before it is suppressed. AFC, on the other hand, is a proactive suppression
method, which removes feedback and echoes continuously, making it slightly more
interesting than the NFS approach. As explained above, the AFC methods need a routine
for decorrelating the loudspeaker signal from the microphone signal. A 5 Hz frequency
shift was chosen for this, mainly due to its simplicity, but also since frequency
shifting is by itself a feedback suppression method, which can then also be included as
a stand-alone method for comparison. The algorithms by which these three methods operate
are presented in detail in the following chapter.
4 Description of algorithms
In this chapter, the three chosen methods, frequency shifting, notch filter-based
feedback suppression and acoustic feedback cancellation, are described in detail, and
the nature of howling is related to them.
4.1 Frequency shifting
The frequency shifting method, as the name suggests, manipulates the microphone signal
by shifting all frequency components by a predetermined value ∆f. By performing
this frequency shift, one aims at circumventing the magnitude condition eq. (2.18):
signal components at the critical frequency fc are not allowed to build up in every loop,
but are instead shifted to frequencies which fulfil the magnitude condition eq. (2.18),
which stabilizes the system. A frequency shift can be performed in software by
manipulating the so-called discrete-time analytic signal
$$y_a(t) = y(t) + i\,\hat{y}(t), \qquad (4.1)$$
where $\hat{y}(t)$ is the Hilbert transform of the original signal y(t) and i is the imaginary unit. The
analytic signal is the original signal with its negative frequency content set to zero. The
negative frequencies can be discarded because audio signals are real-valued,
and the frequency spectrum of a real signal is Hermitian, meaning that
the negative frequencies do not provide any information that cannot be found in the
positive frequency content [4]. One can perform frequency shifting by multiplying the
analytic signal with a complex exponential
$$S_{mod}(t) = e^{i\omega_s t}, \qquad (4.2)$$
where ωs = 2π∆f is the modulation frequency. The output from the frequency
shift is then obtained by taking the real part of the resulting complex-valued signal:
$$d(t) = \mathrm{Re}\big(y_a(t)\,S_{mod}(t)\big) = y(t)\cos(\varphi(t)) - \hat{y}(t)\sin(\varphi(t)), \qquad \varphi(t) = 2\pi\Delta f\,t, \qquad (4.3)$$
where d(t) is the frequency shifted output signal. The modulation can be described by
the system in fig. 4.1.
Figure 4.1 – The system with a frequency shift of the microphone signal in the electro-acoustic
forward path. [Block diagram with the signals x(t), y(t), u(t) and d(t) and the blocks G, F and FS.]
4.1.1 Analytic signal
The analytic signal can be obtained by computing the Fourier transform Y (ω) of a
segment of the input signal, and computing the inverse Fourier transform of the single-sided
spectrum, with the negative frequencies set to 0 [12]. The inverse Fourier transform
is an approximation of the analytic signal. Since the spectrum of the approximated
analytic signal is single-sided, it is complex-valued and can be expressed according to
eq. (4.1). The nature of the Fourier transform requires that the input samples are
framed with frame size M samples which will introduce a delay of M samples in the
processing. In this work, an alternative method was used, which uses a modulated low
pass filter in order to obtain an approximation of the analytic signal [13]. To remove
the negative frequency components, an FIR low-pass filter of order 256 with normalized
cut-off frequency of fs/4 was used. This filter was modulated with the frequency fs/4,
resulting in a complex-valued band pass filter with a pass band covering the entire
positive frequency range, and a stop band covering the entire negative frequency range.
The filter is visualized in fig. 4.2.
Figure 4.2 – The magnitude response of the modulated low-pass filter, with a pass
band covering the entire positive frequency range and a stop band covering the negative
frequency range. [Plot: magnitude (dB) against normalized frequency (× π rad/sample).]
The input samples were buffered into a delay vector of the same length as the complex
modulated low pass filter (256), and for each sample, the dot product between the delay
vector and the filter was computed in order to obtain the current analytic signal sample,
which is an approximation of eq. (4.1) at time t. Equation (4.3) was then applied to the
approximated analytic signal sample in order to obtain the frequency shifted output
signal sample d(t).
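The sample-by-sample procedure above can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB code used in this work: the function names `analytic_fir` and `frequency_shift`, the Hamming window, the choice of 257 taps (order 256) and the factor 2 that restores unit gain in the positive band are all assumptions of the sketch.

```python
import cmath
import math


def analytic_fir(num_taps=257):
    """Complex band-pass FIR: windowed low-pass (cut-off fs/4) modulated by fs/4."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for k in range(num_taps):
        t = k - mid
        # Ideal low-pass impulse response with cut-off pi/2 rad/sample.
        lp = 0.5 if t == 0 else math.sin(math.pi * t / 2.0) / (math.pi * t)
        # Hamming window (an assumed choice) to limit stop-band ripple.
        win = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        # Modulating by fs/4 moves the pass band to the positive frequencies;
        # the factor 2 compensates for the discarded negative half.
        taps.append(2.0 * lp * win * cmath.exp(1j * math.pi * t / 2.0))
    return taps


def frequency_shift(signal, shift_hz, fs, taps):
    """Shift every frequency component of a real signal by shift_hz, eq. (4.3)."""
    delay = [0.0] * len(taps)  # delay[0] holds the newest input sample
    out = []
    for n, sample in enumerate(signal):
        delay = [sample] + delay[:-1]
        # Dot product of filter and delay vector: one analytic signal sample.
        ya = sum(h * x for h, x in zip(taps, delay))
        phi = 2.0 * math.pi * shift_hz * n / fs
        out.append((ya * cmath.exp(1j * phi)).real)
    return out
```

The output is delayed by half the filter length, which follows from the linear-phase design of the low-pass prototype.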
4.2 Two-stage notch filtering
The two-stage notch filtering method makes use of information about the frequency
spectrum of the incoming signal in the detection stage, and applies notch filters to the
signal in the suppression stage, based on the findings in the detection stage. This section
describes the two-stage algorithm used in this work.
4.2.1 Detection stage
The incoming signal was divided into frames of M = 4096 samples, with an overlap of M/2
samples between frames to reduce detection time. When a frame
had been filled with M samples, the frame was multiplied with a Blackman window
to reduce spectral leakage. The frequency spectrum Y(ω) of the windowed signal was
then computed with the Fourier transform. Since the input signal is
real-valued, it is sufficient to consider only the single-sided frequency spectrum, in
which the 10 largest peaks were located through a peak-picking algorithm. In
MATLAB, the function findpeaks was used for this. Evaluating
10 peaks gives a satisfactory level of confidence that a howling frequency is detected,
since howling frequencies almost never arise at the same
level of applied gain, but appear "one at a time" as the applied gain is increased.
The frequencies corresponding
to these peaks were considered "possible howling frequencies", {ωi}, 1 ≤ i ≤ 10.
The set of possible howling frequencies was then evaluated in three different steps to
determine whether each frequency was a howling frequency or a tonal component of the
input signal. This was done with the two spectral evaluations Peak-to-Harmonic Power
Ratio (PHPR) and Peak-to-Neighbouring Power Ratio (PNPR), along with the temporal
evaluation Interframe Magnitude Slope Deviation (IMSD) [8].
4.2.1.1 Peak-To-Harmonic Power Ratio (PHPR)
Tonal components in speech often include harmonics, which are located at integer multiples of the
fundamental frequency. This is not the case for howling, which consists of a
very narrow frequency component without significant harmonics. The PHPR is the power of the
possible howling frequency divided by the power of its m'th harmonic, and it is
computed for each candidate howling frequency ωi:
$$\mathrm{PHPR}(\omega_i, m) = 10\log_{10}\frac{|Y(\omega_i)|^2}{|Y(m\omega_i)|^2}. \qquad (4.4)$$
4.2.1.2 Peak-To-Neighbouring Power Ratio (PNPR)
In speech, tonal components have a broader bandwidth
than a single sinusoidal frequency component. In the frequency domain, this bandwidth
shows up as the power of the tonal component being spread over several neighbouring
frequency bins, centered around a peak. A howling component, on the other hand, does
not share power with the neighbouring frequency bins. By computing the power of a
possible howling frequency and dividing it by the power of neighbouring frequency
bins, one can assess whether the component is a tonal component or
a howling component. The PNPR for the possible howling frequency ωi with the m'th
neighbouring frequency bin is computed as
$$\mathrm{PNPR}(\omega_i, m) = 10\log_{10}\frac{|Y(\omega_i)|^2}{|Y(\omega_i + 2\pi m/M)|^2}. \qquad (4.5)$$
The values computed in eqs. (4.4) to (4.5) are then compared to the predetermined thresholds
T_PHPR and T_PNPR, and if the computed values are higher than the threshold values for
frequency ωi, it is considered to be a howling frequency.
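The two spectral tests can be illustrated with a short sketch that works on a single-sided power spectrum stored as a list of |Y|² values, one per frequency bin. The function names, the small constant `EPS` that guards against empty bins, and the choice to skip harmonics or neighbours that fall outside the spectrum are assumptions made for the example. The default thresholds, harmonics and neighbours are the ones reported in section 4.2.1.4.

```python
import math

EPS = 1e-20  # assumed guard against division by zero in empty bins


def phpr_db(power, k, m):
    """Eq. (4.4): power at bin k relative to its m-th harmonic, in dB."""
    return 10.0 * math.log10((power[k] + EPS) / (power[m * k] + EPS))


def pnpr_db(power, k, m):
    """Eq. (4.5): power at bin k relative to the bin m steps away, in dB."""
    return 10.0 * math.log10((power[k] + EPS) / (power[k + m] + EPS))


def passes_spectral_tests(power, k, t_phpr=10.0, t_pnpr=30.0,
                          harmonics=(2, 3),
                          neighbours=(-3, -2, -1, 1, 2, 3)):
    """True if bin k exceeds both thresholds for all harmonics and neighbours."""
    for m in harmonics:
        if m * k < len(power) and phpr_db(power, k, m) <= t_phpr:
            return False
    for m in neighbours:
        if 0 <= k + m < len(power) and pnpr_db(power, k, m) <= t_pnpr:
            return False
    return True
```

A candidate that passes both tests would still have to pass the temporal IMSD test before it is treated as howling.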
4.2.1.3 Interframe Magnitude Slope Deviation (IMSD)
This feature uses the observation that howling increases exponentially in
energy over time, which means linearly on a dB scale. This increase is not observed in tonal
components. IMSD for the possible howling component ωi is a measurement of
the deviation from linear increase, obtained by comparing the slope of the energy at
ωi over older frames with the slope over more recent frames. A large deviation from linearity, that
is to say a large IMSD, suggests that the candidate is not a howling frequency,
whereas for small deviations, the candidate is considered a howling frequency. The IMSD
is computed by
$$\mathrm{IMSD}(\omega_i, t) = \frac{1}{M_F}\sum_{m=1}^{M_F-1}\left[\frac{1}{M_F}\sum_{j=0}^{M_F-1}\frac{1}{M_F-j}\Big(20\log_{10}|Y(\omega_i, t-jP)| - 20\log_{10}|Y(\omega_i, t-M_F P)|\Big) - \frac{1}{m}\sum_{j=0}^{m-1}\frac{1}{m-j}\Big(20\log_{10}|Y(\omega_i, t-jP)| - 20\log_{10}|Y(\omega_i, t-mP)|\Big)\right]. \qquad (4.6)$$
The IMSD for each candidate howling component is compared to the threshold value
T_IMSD, and if IMSD(ωi, t) < T_IMSD, the frequency ωi is considered to be a howling
frequency.
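As an illustration of eq. (4.6), the sketch below evaluates the IMSD for one candidate frequency from its magnitude history in dB. The function name `imsd` and the list layout (newest frame first, M_F + 1 entries) are assumptions of this example. A perfectly dB-linear history gives a value of zero, which is the behaviour expected of howling.

```python
def imsd(level_db):
    """IMSD of eq. (4.6) for one candidate frequency.

    level_db[j] is 20*log10|Y(w_i, t - j*P)|, so level_db[0] is the newest
    frame and len(level_db) must equal M_F + 1 (with M_F >= 2).
    """
    mf = len(level_db) - 1
    # Mean slope measured against the oldest frame, M_F hops back.
    full = sum((level_db[j] - level_db[mf]) / (mf - j) for j in range(mf)) / mf
    total = 0.0
    for m in range(1, mf):
        # Mean slope measured against the frame m hops back.
        part = sum((level_db[j] - level_db[m]) / (m - j) for j in range(m)) / m
        total += full - part
    return total / mf
```

For example, `imsd([-3.0 * j for j in range(17)])` describes a level that has grown by 3 dB per frame over 16 frames, and every slope term in the sum is then identical.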
Table 4.1
Threshold    Value [dB]
T_PHPR       10
T_PNPR       30
T_IMSD       1
4.2.1.4 Final assessment
The thresholds used in the three evaluations are presented in table 4.1.
For the PHPR, the 2nd and 3rd harmonics were included in the evaluation, and howling
was said to be detected if the threshold was exceeded for all harmonics. In the PNPR,
the six closest neighbours, three above and three below, were included, and howling was
said to be detected if the ratio exceeded the threshold for all neighbours. The IMSD
stored the frequency contents of the last 16 frames, and thus evaluated the slope for all
possible howling frequency components over 16 frames. These numbers were inspired by
[8], where the authors evaluated a number of spectral and temporal criteria for howling
detection, and found the combination above to be robust and to have a small false-alarm
percentage¹. The final threshold values were tweaked and tested until a reasonable
howling detection was obtained.
The total assessment of the possible howling frequencies for each frame consisted of a
combination of PHPR, PNPR and IMSD. Only if all three conditions for howling
were fulfilled for the frequency ωi was it considered to be a howling frequency, and
actions were taken to suppress it.
¹ The false-alarm percentage is the ratio of occurrences of erroneously detected frequencies over the
total number of detected frequencies.
4.2.2 Suppression stage
Upon detecting howling at frequency ωi, the suppression stage applied a notch filter in
the acoustic forward path, centered at frequency ωi. A maximum of 20 notch filters was
set, in order to prevent the source signal from being overly distorted. To make the filters
as narrow as possible, biquadratic IIR filters were used in series, where the output yi[n]
from filter i with input xi[n] can be computed from the difference equation
$$y_i[n] = \frac{1}{a_0}\big(b_0 x_i[n] + b_1 x_i[n-1] + b_2 x_i[n-2] - a_1 y_i[n-1] - a_2 y_i[n-2]\big), \qquad (4.7)$$
where n is the sample number and a0, ..., a2, b0, ..., b2 are the filter coefficients. For each
of the 20 filters, the two latest output samples y[n − 1] and y[n − 2] and the three latest
input samples x[n], x[n − 1] and x[n − 2] are required. These samples were stored in a
3x21 matrix Ydel, where the input samples to the i'th filter were stored in the i'th column,
and the output samples were stored in the (i + 1)'th column:
$$Y_{del} = \begin{bmatrix} x_1[n-2] & y_1[n-2] = x_2[n-2] & \cdots & y_C[n-2] \\ x_1[n-1] & y_1[n-1] = x_2[n-1] & \cdots & y_C[n-1] \\ x_1[n] & y_1[n] = x_2[n] & \cdots & y_C[n] \end{bmatrix} \qquad (4.8)$$
where C is the number of active notch filters. Since the filters were applied in series,
the output samples from the i'th filter are the same as the input samples to the (i + 1)'th
filter. The filter design itself is not considered in depth in this work. The filters
do not need to be designed in real time upon detection, since the frequency resolution
of the Fourier transform is known a priori. The size of the Fourier transform frames
used in the detection phase was 4096 samples, which results in 2048 frequency bins in the
one-sided frequency spectrum. Since the highest possible frequency was 8 kHz, the
frequency resolution was 8000/2048 ≈ 3.9063 Hz per frequency bin. Knowing the frequency
resolution, notch filters can be designed offline for all available frequencies, and then
stored to save computational effort in the real-time implementation. Upon detecting
howling at a specific frequency, a look-up table can be used to activate the correct
filter. In this work, however, the filters were designed upon detection with the MATLAB
function iirnotch, which returned the filter coefficients that were stored in a 6x20
matrix. All notch filters were designed to have a Q-factor of 35. With C
active notch filters, the output sample d(t) is the last element of the (C + 1)'th column
of the matrix Ydel. Recall that the total number of notch filters allowed was 20, which
makes the last element of the 21st column the final output sample if all notch filters are
active.
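The series connection of biquads can be sketched as below. This is an illustration under stated assumptions rather than the implementation used in this work: the coefficients follow the widely used Audio EQ Cookbook notch design instead of MATLAB's iirnotch (the two define bandwidth slightly differently), and the class `NotchCascade` keeps per-filter state in a list instead of the matrix Ydel.

```python
import math


def notch_coefficients(f0, fs, q=35.0):
    """Biquad notch centred at f0 (Audio EQ Cookbook form), returned as (b, a)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0, -2.0 * math.cos(w0), 1.0)
    a = (1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha)
    return b, a


class NotchCascade:
    """Series connection of biquads, each evaluated with eq. (4.7)."""

    def __init__(self, max_filters=20):
        self.max_filters = max_filters
        self.sections = []  # each entry: [b, a, x1, x2, y1, y2]

    def add_notch(self, f0, fs, q=35.0):
        """Activate a notch at f0; returns False when the limit is reached."""
        if len(self.sections) >= self.max_filters:
            return False
        b, a = notch_coefficients(f0, fs, q)
        self.sections.append([b, a, 0.0, 0.0, 0.0, 0.0])
        return True

    def process(self, sample):
        """Filter one sample through all active notch filters."""
        x = sample
        for s in self.sections:
            b, a, x1, x2, y1, y2 = s
            y = (b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
            s[2:] = [x, x1, y, y1]  # shift the stored input and output samples
            x = y  # the output of filter i is the input of filter i + 1
        return x
```

With no active filters, `process` returns its input unchanged, which corresponds to the case C = 0.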
4.3 Acoustic feedback cancellation
The method of using adaptive filters to cancel out unwanted components from the microphone
signal is widely used in teleconference applications. Acoustic feedback cancellation
is similar to the teleconference case, but instead of a far-end speaker signal being output
from the loudspeaker, it is the near-end speaker's voice. The AFC system is described
in fig. 4.3.
Figure 4.3 – The AFC situation, where the impulse response F(t) is approximated with
an adaptive filter $\hat{F}(t)$. [Block diagram with the signals x(t), y(t), $\hat{y}(t)$, u(t) and d(t) and the blocks G, F and $\hat{F}$.]
$\hat{F}$ is an adaptive filter which is designed and adapted to resemble the real RIR F.
The loudspeaker signal x(t) is then filtered with $\hat{F}$ in order to estimate the feedback
component of the microphone signal. Several algorithms exist for this adaptation, and
the one used in this work is the Normalized Least Mean Square (NLMS) algorithm [14].
This is a common algorithm in echo cancellation, and is generally a good trade-off
between computational complexity and convergence speed [15]. The NLMS algorithm is
described as follows.
4.3.1 NLMS
In each iteration, the output from the adaptive filter is computed as
$$d[n] = y[n] - \hat{F}^T[n]\,x[n], \qquad (4.9)$$
where $\hat{F}$ is the adaptive filter of size N, and x is a delay vector containing the N latest
loudspeaker output samples. The term $\hat{F}^T[n]x[n]$ is thus the approximated feedback
component in the microphone signal.
The adaptive filter $\hat{F}$ is then updated according to
$$\hat{F}[n+1] = \hat{F}[n] + \mu\,\frac{d^*[n]\,x[n]}{x^H[n]\,x[n]}, \qquad (4.10)$$
where µ is the step size and the term x^H[n]x[n] is the energy of the loudspeaker
output delay vector. The division by the energy term, which is the difference between
NLMS and LMS, makes the algorithm insensitive to the scaling of the
loudspeaker vector x. If the filter converges perfectly so that $\hat{F}$ = F, all feedback
components will be removed from the microphone signal, so that d[n] = u[n], leaving only
the source speech. The choice of the step size parameter is of great
importance to the convergence of the adaptive filter. If the step size is too small, the
adaptive filter will converge slowly and respond slowly to changes in the RIR, resulting
in an erroneous filter in non-stationary conditions. On the other hand, if the step size is
too large, the convergence speed will increase, but problems with stability might occur.
For speech applications, a step size between 0.01 and 0.04 has been recommended
in the literature [3]. In this work, a fixed step size of 0.01 was used, which was found to
be a reasonable trade-off between convergence speed and stability. To prevent the
filter from updating when the loudspeaker signal was too weak, a threshold Tenergy
was introduced, and the condition x^H[n]x[n] > Tenergy was set as a requirement for
allowing the filter to update. As previously mentioned, the NLMS algorithm performs
poorly when there is a high correlation between the loudspeaker and microphone signals.
For this reason, the loudspeaker signal was decorrelated from the microphone signal by
frequency shifting the output signal d[n] by 5 Hz before amplification, using the algorithm
described in section 4.1. Since the actual RIR is known in the simulations, we can
evaluate the performance of the adaptive filter by computing the filter misadjustment in
each iteration:
$$\mathrm{FMA} = \frac{\sum_{i=0}^{N-1}(F_i - \hat{F}_i)^2}{\sum_{i=0}^{N-1}F_i^2}. \qquad (4.11)$$
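Equations (4.9) to (4.11) can be sketched as follows for real-valued signals, where the complex conjugate in eq. (4.10) has no effect. The sketch is deliberately simplified and should not be read as the code used in this work: it runs in open loop on a hypothetical 4-tap feedback path driven by white noise, it omits the 5 Hz frequency shift, and it uses a step size of 0.5 in the usage example so that convergence is quick. The function names and the default value of `t_energy` are assumptions.

```python
import random


def nlms_identify(x, y, n_taps, mu=0.01, t_energy=1e-6):
    """Run NLMS over loudspeaker samples x and microphone samples y.

    Returns the compensated signal d and the final filter estimate.
    """
    f_hat = [0.0] * n_taps
    delay = [0.0] * n_taps  # delay[0] holds the newest loudspeaker sample
    d = []
    for xn, yn in zip(x, y):
        delay = [xn] + delay[:-1]
        err = yn - sum(f * v for f, v in zip(f_hat, delay))    # eq. (4.9)
        energy = sum(v * v for v in delay)
        if energy > t_energy:  # only adapt when the loudspeaker is active
            g = mu * err / energy
            f_hat = [f + g * v for f, v in zip(f_hat, delay)]  # eq. (4.10)
        d.append(err)
    return d, f_hat


def misadjustment(f_true, f_hat):
    """Eq. (4.11): normalised squared error between true and estimated RIR."""
    num = sum((a - b) ** 2 for a, b in zip(f_true, f_hat))
    return num / sum(a * a for a in f_true)


# Usage: identify a hypothetical 4-tap feedback path from white noise.
random.seed(0)
f_true = [0.5, -0.3, 0.2, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(3000)]
y = [sum(f_true[k] * x[n - k] for k in range(4) if n >= k) for n in range(3000)]
d, f_hat = nlms_identify(x, y, n_taps=4, mu=0.5)
print(misadjustment(f_true, f_hat))
```

In the closed-loop PA setting the loudspeaker signal is correlated with the source, which is why the decorrelating frequency shift described above is needed there.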
5 Method for testing
In order to properly evaluate the tested methods, a theoretical measure of the maximum
stable gain was needed. A PA system was therefore simulated in MATLAB and set up so
that the methods could be evaluated both in terms of maximum achievable
stable gain and subjective listening experience, that is, how good the methods sound.
5.1 MATLAB simulation and evaluation
Methods from the DSP toolbox were used in order to read audio data from the source
file. The source file that was used in the simulations was a 35 second section from a radio
essay by Johan Norberg called ”Johan Norberg om den exploderande lyckotrenden”[16],
resampled to 16 kHz. The file was read in blocks of 1024 samples at a time, and a loop
through the samples of the blocks simulated single input, single output processing. A
simple user interface was created, to be used in the ”live” mode, in order to subjectively
evaluate the methods. The user could choose between the three evaluated methods, and
also set the applied gain in real-time. The user also had the option to disable all feedback
suppression to evaluate the system without any processing.
Once a sample had been processed, it was put in an output buffer of the same length
as the RIR used in the simulations, namely 2001 samples. The dot product of the full
2001 samples of the output buffer and the RIR was computed to obtain the feedback
component of the microphone signal. A new feedback component was computed for
every new input sample, and added to the input sample to obtain the microphone signal,
consisting of both the source signal and the feedback component. Every 1024’th iteration,
the 1024 newest samples were output to the loudspeaker. This process successfully
simulated the loudspeaker signal’s interaction with the room, and the feedback into the
microphone. Howling could clearly be heard in the simulations, upon adjusting the gain
to a level over the initial maximum stable gain.
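The closed loop described above can be sketched in a few lines. The function `simulate_pa`, its arguments and the toy two-tap RIR used in the usage note are assumptions for illustration; block-wise file reading and audio output are left out. Because the feedback at each step is computed from previously output samples, the first RIR tap in this sketch acts on the loudspeaker sample from one step earlier.

```python
def simulate_pa(source, rir, gain_db, process=lambda s: s):
    """Simulate a PA loop sample by sample and return the loudspeaker signal.

    source  : source signal samples u[n]
    rir     : impulse response of the acoustic feedback path
    gain_db : gain applied in the electro-acoustic forward path
    process : optional feedback suppressor applied to each microphone sample
    """
    gain = 10.0 ** (gain_db / 20.0)
    out_buf = [0.0] * len(rir)  # out_buf[0] holds the newest loudspeaker sample
    loudspeaker = []
    for u in source:
        # Feedback component: dot product of the RIR and the output buffer.
        feedback = sum(h * v for h, v in zip(rir, out_buf))
        mic = u + feedback         # microphone signal: source plus feedback
        out = gain * process(mic)  # forward path, with optional suppression
        out_buf = [out] + out_buf[:-1]
        loudspeaker.append(out)
    return loudspeaker
```

With the toy RIR `[0.0, 0.5]`, the loop gain is half the applied linear gain, so an impulse decays at 0 dB applied gain and grows without bound at 12 dB.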
The simulation of the PA-system could also be run in a ”test” mode, where measurements
of the maximum stable gain were taken and stored. Since the three methods differ in
the way they affect the signal, different expressions had to be used in order to calculate
the maximum stable gain. This could have been done in several ways. For instance, the
gain could be automatically raised in small steps in order to induce howling feedback,
upon which the gain level at which howling occurs could be noted. This approach
is sub-optimal, since an instability does not immediately induce howling, which
means that the howling can be missed if the measurements are too short, resulting in
an overestimation of the maximum stable gain.
In the simulations, the maximum stable gain was measured from the known RIR used
to simulate the acoustic feedback path. By considering the RIR without feedback suppression,
one can determine the initial maximum stable gain, simply by observing the
frequency response, and finding the MSG using eq. (2.20).
The maximum stable gain for the different methods was calculated by applying the
filters corresponding to the methods to the RIR, obtaining a modified RIR for each
method. For the frequency shifting method, a time-varying filter corresponding to the
5 Hz frequency shift was applied to the RIR, which resulted in a maximum stable gain
that oscillated over time. For the notch filter methods, notch filters were applied to the
RIR when a howling frequency was detected in the simulations, and a new maximum
stable gain was computed from the modified RIR, with the detected howling frequencies
suppressed. For the frequency shifting method and the NFS method, the MSG was
computed as
$$\mathrm{MSG}_{NFS,FS} = -20\log_{10}\big(\max|H(t,\omega)F(\omega)|\big) \quad \forall\omega : \angle F(\omega) = m2\pi,\; m \in \mathbb{Z}, \qquad (5.1)$$
where the filter H(t, ω) is a time dependent 5 Hz frequency shift or the cascade of
active notch filters, depending on which method is being tested. For the AFC method,
the maximum stable gain was calculated by finding the highest peak in the difference
$|F(\omega) - \hat{F}(\omega)|$ that fulfils the phase condition according to
$$\mathrm{MSG}_{AFC} = -20\log_{10}\big(\max|F(\omega) - \hat{F}(\omega)|\big) \quad \forall\omega : \angle F(\omega) = m2\pi,\; m \in \mathbb{Z}. \qquad (5.2)$$
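A simplified version of this calculation is sketched below. Note the deliberate simplification: the phase condition is ignored and the maximum is taken over all sampled frequencies, so the result is a conservative lower bound on the MSG of eqs. (5.1) and (5.2) and not the exact value. The function names, the frequency grid and the toy impulse responses are assumptions of the example.

```python
import cmath
import math


def frequency_response(h, n_freqs=512):
    """Sample the frequency response of impulse response h on [0, pi]."""
    resp = []
    for k in range(n_freqs + 1):
        w = math.pi * k / n_freqs
        resp.append(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))
    return resp


def msg_bound_db(loop_response):
    """-20*log10 of the largest loop magnitude, ignoring the phase condition."""
    return -20.0 * math.log10(max(abs(v) for v in loop_response))


# Usage with a toy feedback path and a hypothetical adaptive filter estimate.
F = frequency_response([0.0, 0.5])
F_hat = frequency_response([0.0, 0.4])
msg_initial = msg_bound_db(F)
msg_afc = msg_bound_db([f - fh for f, fh in zip(F, F_hat)])
```

For the notch filter method, each element of the RIR response would first be multiplied by the response of the active notch cascade at the same frequency before calling `msg_bound_db`.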
The 35 s speech segment was divided into four sections of approximately 9 seconds each.
In the first section, the applied gain in the electro-acoustic forward path was set to 0
dB, which was approximately 3 dB below the initial maximum stable gain. The gain
was increased dB-linearly in the second section, until reaching its final level of 8 dB
at the beginning of section 3. At the beginning of section 4, the RIR was changed,
corresponding to a 1 meter displacement of the microphone. The applied gain and the
altered RIR were kept constant during the fourth section. This test method, found in [3],
is a theoretical evaluation of the maximum stable gain, and how it is affected by the gain
level and changes in the RIR. Both RIRs can be found in [17]. Since a real-time scenario
will result in a more rapidly changing RIR, there is no guarantee that one will be able
to reproduce these results in a real-time setup. The test serves as an initial assessment
of the methods.
6 Results
In this section, the results of the simulations are presented. The three methods were
evaluated in terms of maximum stable gain and subjective listening experience.
6.1 Feedback suppression
Figure 6.1 – The spectrogram of the loudspeaker signal, 0 dB applied gain.
[Spectrogram: frequency (kHz) against time (s), colour scale power/frequency (dB/Hz).]
Figure 6.1 shows the spectrogram of the loudspeaker signal when the applied gain was
0 dB. To illustrate the feedback phenomenon, fig. 6.2 shows a spectrogram of the same
signal, but with the applied gain manually raised to induce howling. On three occasions,
a frequency around 500 Hz shows a divergence in power, which suggests that feedback
has occurred at this frequency. The applied gain when the feedback occurred was 4 dB,
which is slightly above the initial MSG. When howling feedback was clearly heard, the
gain was manually decreased to 0 dB.
Figure 6.2 – Spectrogram of the loudspeaker signal with howling feedback present, 4 dB
applied gain. [Spectrogram: frequency (kHz) against time (s), colour scale power/frequency (dB/Hz).]
To illustrate the performance of the feedback suppressor algorithms, the gain was set to
6 dB upon which the feedback suppression algorithms were activated. The spectrograms
for the three methods are shown in figs. 6.3 to 6.5.
In fig. 6.3, one can observe the oscillating nature of the frequency shifting method. There
are indeed frequencies that have an increased power compared to the case with no howling
feedback, but they are shifted up, keeping the system stable. Figure 6.4 shows that the
notch filter method at the specified gain setting was successful at suppressing feedback.
At 27 seconds, an increased power can be observed briefly in the low-frequency range,
indicating that a howling frequency was audible before being detected and suppressed.
The spectrogram for the AFC method, shown in fig. 6.5, shows no such increase in power
for any frequency, meaning that this method had successfully suppressed all howling
feedback.
Figure 6.3 – Spectrogram for the frequency shifting method, 6 dB applied gain.
[Spectrogram: frequency (kHz) against time (s), colour scale power/frequency (dB/Hz).]
Figure 6.4 – Spectrogram for the NFS method, 6 dB applied gain.
[Spectrogram: frequency (kHz) against time (s), colour scale power/frequency (dB/Hz).]
Figure 6.5 – Spectrogram for the AFC method, 6 dB applied gain.
[Spectrogram: frequency (kHz) against time (s), colour scale power/frequency (dB/Hz).]
6.2 Maximum stable gain
The results from the maximum stable gain calculations are shown in fig. 6.6, where the
different sections are marked with vertical dashed lines. The gain applied in the electro-
acoustic forward path is shown as a bold dashed line, and the maximum stable gain
curves for all methods are included.
6.2.1 Frequency shifting
It can be seen that the frequency shifting method oscillates around 6 dB MSG, meaning
that this method theoretically raises the MSG by approximately 3 dB compared to the
case with no feedback suppressor. Upon changing the RIR, the MSG decreased to a
slightly lower level. From around 15 s into the simulations, the MSG of the frequency
shifting method is below the actual applied gain level, meaning that we can expect
howling or ringing sounds from 15 s onwards.
6.2.2 Notch filters
For the notch filter method, the points where a notch filter was applied are clearly
visible as vertical jumps in the curve. During the parts of the simulation where
the MSG of the notch filter method was above the actual applied gain, the algorithm
should not detect any howling frequencies. In fig. 6.6, this is true for the first ∼13 seconds,
where no howling was detected and no notch filter was activated. When the applied
gain increased to the level of the MSG for the notch filter method, a howling frequency
was detected, and a notch filter was activated, removing the problematic frequency and
thus increasing the maximum stable gain. Around 17 seconds into the simulation, the
gain level was raised above the MSG level of the notch filter method, which means that
the algorithm failed to detect a howling frequency. During the time interval 17-27 seconds,
we should, according to this theoretical measurement, experience some howling
or ringing tones. When the RIR changed, the algorithm successfully suppressed the
problematic frequency or frequencies, raising the MSG to a stable level. When all 20
notch filters were active, which occurred at around 28 seconds into the simulations, the
expected MSG was just below 10 dB, which was an increase of around 7 dB compared
to the case where no feedback suppressor was used. The number of active notch filters
over time is illustrated in fig. 6.7, where it can be seen that no notch filters were active
until the gain started to increase, upon which a rapid increase in the number of notch
filters is observed. Changing the RIR almost instantly resulted in 5 new notch filters,
indicating that a change in the RIR does indeed affect the frequencies for which the
Nyquist stability criterion is fulfilled.
Figure 6.6 – Maximum stable gain over time for all methods. The MSG curve for the
frequency shifting method has been smoothed for better visualization.
[Plot: gain (dB) against time (s), with curves for the applied gain, frequency shift, notch filters and acoustic feedback cancellation.]
6.2.3 Acoustic feedback suppression
The curve for the AFC method fluctuates heavily throughout the simulations, reflecting
the updates of the adaptive filter $\hat{F}$. With the algorithm used, a basic NLMS
method in which the only requirement for the filter to update is the energy threshold,
there is no guarantee that the updated filter $\hat{F}[n + 1]$ will perform better than the
previous filter $\hat{F}[n]$, and this is the reason that the MSG level can sometimes drop.
The MSG level is mostly above the actual applied gain, meaning that throughout the
simulation, we should experience no or very little howling. A brief drop is
observed at around 28 seconds, which could result in a brief howling or ringing sound
at that time. The AFC method is observed to perform better at higher applied gain,
and the final MSG is expected to be around 11 dB, which is an increase of around 8 dB.
The maximum MSG value for this method occurred before the change in RIR and was
around 13 dB, which means that the potential MSG increase is 10 dB. In fig. 6.8, the filter
misadjustment is shown. Initially, when the applied gain was 0 dB, the error was high
and the filter was thus a poor approximation. This was expected, since at low gain
there was not enough information in the loudspeaker signal to correctly adapt the
filter. Upon increasing the gain, the misadjustment decreased, and the filter converged.
The change in the RIR resulted in an increase of the misadjustment by approximately
3 dB, upon which the misadjustment again decreased, indicating a converging filter.
Figure 6.7 – The number of notch filters over time for the NFS method.
[Plot: number of notch filters against time (s).]
6.3 Subjective listening experience
It is difficult to objectively evaluate the quality of processed speech. For this reason, the
listening experience is described in words, and the sound quality of the methods
is compared. For the frequency shifting method, it should first of
all be noted that a 5 Hz frequency shift on ordinary speech did not affect the
sound quality in a way that was noticeable to me. Since the method does not prevent
howling from arising, ringing sounds were heard at gain levels above the initial MSG level.
The howling that occurred was then frequency shifted each loop, resulting in a brief
sweeping sound for each howling frequency being up-shifted. As the gain level increased,
more howling frequencies were heard as brief up-shifted sweeps, making the total sound
quality unacceptable for live applications. The system did not, however, show divergent
behaviour, even at the highest applied gain levels.
Figure 6.8 – The filter misadjustment for the NLMS filter.
[Plot: filter misadjustment (dB) against time (s).]
The notch filter method was, as expected, reactive, meaning that howling was heard
before the frequencies were suppressed. The howling did not, however, reach disturbing
levels before it was suppressed, making the listening experience decent
throughout the simulations. When a small number (0-5) of notch filters were active,
there were no audible artefacts, but the more notch filters that were activated, the more
the total sound quality was affected. By the time that all or nearly all notch filters
(15-20) were active, the sound was notably distorted, but the listening experience was
still deemed acceptable, especially compared to the frequency shifting method.
The AFC method resulted in the best listening experience, with no or very few disturbing
audible artefacts. The dip observed at 28 seconds in fig. 6.6 was not heard. There were
at times small echoes and noises in the background, which are assumed to be related to
an erroneously adapted filter. These small artefacts were not deemed disturbing, and
might blend in with the echo and reverberation that is present in all live PA scenarios.
7 Discussion, conclusion and future work
This thesis has surveyed the market and the literature of feedback suppression. A brief
summary of the available methods and the history on the subject has been presented,
and three methods have been chosen for further evaluation in MATLAB simulations. A
measurement of the maximum stable gain of these three methods has been presented,
and the subjective listening experience has been commented on. After working with the
simulations, and listening to the methods, my recommendation is that the AFC method
with a 5 Hz frequency shift on the loudspeaker signal should be used by Limes Audio.
The method has the highest expected maximum stable gain, and the subjective listening
experience was the best amongst the evaluated methods. There are a few things about
this method that need attention. First, the method has only been evaluated successfully
in simulations. Towards the end of the project, attempts were made to make the MATLAB
simulations read audio streams from a microphone instead of from a file, enabling a real-
time setup and a more realistic evaluation of the methods. Even though the attempts
were successful, the simple task of reading audio from a microphone into MATLAB and
outputting it from a loudspeaker had an inherent time delay of approximately 300 ms,
which is unacceptable for live PA applications. Due to the time limitations of the project,
no work was put into fixing this issue, but the algorithms were tested with the time delay
included, in order to test basic functionality. All algorithms could be run without
adding any further time delay or lag. The algorithms were successful in suppressing
feedback, but no MSG value could be fully determined.
As for the requirements on the feedback suppressor, an increased MSG of 8–10 dB
should be possible. This value is highly dependent upon the setup of the PA system and
the room in which it resides, and for this reason, there is no way to guarantee a certain
level of increased MSG. The MSG depends on the peaks in the magnitude response of
the RIR, an example of which can be seen in fig. 2.3. Even if the 10 or 20 largest peaks
are removed, there is a mean magnitude level in the magnitude response, and if the gain
is set so that this mean level reaches 0 dB, virtually all frequencies will howl, and there
is no way to stabilize that system and still maintain the speech signal.
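The argument above can be illustrated numerically. The sketch below is a simplified Python illustration, not the thesis's MSG measurement: it assumes the magnitude-only stability condition (the loop gain must stay below 0 dB at every frequency), models an ideal notch as simply removing a frequency bin, and uses a hypothetical two-tap impulse response in the usage note. All function names are assumptions made for this example.

```python
import cmath
import math


def magnitude_response_db(h, n_bins):
    """Magnitude response of impulse response h in dB, evaluated at
    n_bins frequencies from 0 up to (but excluding) the Nyquist frequency."""
    out = []
    for m in range(n_bins):
        w = math.pi * m / n_bins
        H = sum(h[k] * cmath.exp(-1j * w * k) for k in range(len(h)))
        out.append(20 * math.log10(max(abs(H), 1e-12)))
    return out


def max_stable_gain_db(mag_db, notched=0):
    """Broadband gain at which the largest remaining peak reaches 0 dB.

    notched is the number of largest bins assumed to be perfectly
    suppressed, a crude stand-in for ideal notch filters."""
    remaining = sorted(mag_db, reverse=True)[notched:]
    return -remaining[0]


def mean_level_gain_db(mag_db):
    """Gain at which the mean magnitude level (in dB) reaches 0 dB."""
    return -sum(mag_db) / len(mag_db)
```

For the hypothetical path `h = [0, 0, 0, 0.5, 0, 0, 0, 0.3]` evaluated at 64 bins, the largest peak is 0.8, so the MSG is about 1.9 dB. Removing the ten largest bins raises it by only a few tenths of a dB, and it stays well below the gain at which the mean level reaches 0 dB (about 6 dB for this path). Suppressing peaks gives diminishing returns as the remaining peaks approach the mean level.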
The main concern with an AFC approach is changes in the room impulse response due
to changes in the relative position of the microphone and the loudspeaker. Even though the
simulations suggested that a rapid 1 m change in the position of the microphone
does not result in howling or temporary instability, real-time testing is necessary to
evaluate how well the algorithm performs when the RIR changes. If changes in
the RIR turn out to be a problem, a potential solution could be to combine the AFC
method with an AEQ or NFS method that performs detection in a way similar to that
described in section 4.2. Upon detecting howling that is due to a changing RIR, the gain
in that particular sub-band could be temporarily lowered until the filter F has had
time to adapt. The gain could then be slowly raised back to its initial value, if feedback
is no longer detected. Another idea would be to constantly monitor the changes in the
adaptive filter, and compare the updated filter to the old. If the difference between the
filters is “big”, indicating a significant change in the RIR, then the step size could be
temporarily raised to speed up the convergence. When the difference is again “small”,
indicating small changes in the RIR and more stationary conditions, the step size could
again be lowered to its initial value to ensure stability.
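Both fallback ideas can be sketched in a few lines of Python. This is only an illustration of the control logic: the function names, the relative-change measure and every numerical value (threshold, step sizes, cut and recovery amounts) are hypothetical and would have to be tuned in real-time tests.

```python
def filter_change(f_new, f_old):
    """Relative squared difference between two coefficient vectors."""
    num = sum((a - b) ** 2 for a, b in zip(f_new, f_old))
    den = sum(b * b for b in f_old) + 1e-12  # guard against an all-zero filter
    return num / den


def choose_step_size(f_new, f_old, mu_base=0.01, mu_fast=0.1, threshold=0.05):
    """Use the larger step size while the filter is changing a lot
    (taken as a sign that the RIR has changed), otherwise the base value."""
    if filter_change(f_new, f_old) > threshold:
        return mu_fast
    return mu_base


def update_subband_gain(gain_db, howling, nominal_db=0.0, cut_db=6.0,
                        recover_db=0.5, floor_db=-12.0):
    """Lower the sub-band gain quickly while howling is detected and
    raise it slowly back towards the nominal value once it is not."""
    if howling:
        return max(gain_db - cut_db, floor_db)
    return min(gain_db + recover_db, nominal_db)
```

For example, `choose_step_size([0.2, 0.9], [1.0, 0.5])` returns the fast step size because the coefficients have changed substantially, and `update_subband_gain(0.0, True)` returns -6.0, after which repeated calls with `howling=False` raise the gain by 0.5 dB per call until it is back at 0 dB. A practical version would probably need hysteresis so that the step size does not toggle on every comparison.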
As for future work, the next step is to implement the AFC algorithm described in this
work in the Magneto mixer by porting the MATLAB code to C. Real-time testing is
necessary to evaluate if there is a need for a “fail-safe” sub-band equalization feature as
described above, and parameters such as the step size should be examined. In order to
ensure that the filter only updates when a better filter is available, future work should
evaluate if there is a suitable way to implement parallel adaptive filters. One filter F1 is
used to filter the output data and remove it from the microphone signal, described by
eq. (4.9). Another filter F2 can then be used in the update routine in eq. (4.10). An
evaluation algorithm could then determine if the filter F2 is better than the filter F1, and
if that is the case, we make the update F1 = F2. In AEC, echo return loss enhancement
(ERLE) is used for this, but it is yet to be determined if ERLE can be used in the AFC
case.
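The parallel-filter idea can be sketched as follows in Python. This is a minimal illustration, not the thesis's implementation: eq. (4.9) and eq. (4.10) are not reproduced here, so a standard NLMS update is assumed for F2, and the evaluation algorithm is replaced by a simple block-wise comparison of error energy. That comparison is a hypothetical stand-in, since whether such a measure is meaningful when the speaker's own voice is present in the microphone signal is exactly the open question raised above. All names, the block length and the test path in `demo()` are assumptions.

```python
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def nlms_update(f, x_buf, err, mu=0.5, eps=1e-6):
    """One normalized LMS step on coefficient vector f."""
    norm = dot(x_buf, x_buf) + eps
    return [fi + mu * err * xi / norm for fi, xi in zip(f, x_buf)]


def run_two_filters(x, d, taps=4, block=32, mu=0.5):
    """x: loudspeaker signal, d: microphone signal.

    F1 (foreground) produces the output and is never adapted directly.
    F2 (background) adapts on every sample. After each block, F2 is copied
    into F1 only if it gave a lower error energy over that block."""
    f1 = [0.0] * taps
    f2 = [0.0] * taps
    e1_energy = e2_energy = 0.0
    out = []
    for n in range(len(x)):
        x_buf = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        e1 = d[n] - dot(f1, x_buf)  # output: feedback estimate removed by F1
        e2 = d[n] - dot(f2, x_buf)  # error that drives the F2 update
        out.append(e1)
        f2 = nlms_update(f2, x_buf, e2, mu)
        e1_energy += e1 * e1
        e2_energy += e2 * e2
        if (n + 1) % block == 0:
            if e2_energy < e1_energy:
                f1 = list(f2)  # the update F1 = F2
            e1_energy = e2_energy = 0.0
    return f1, out


def demo(n_samples=2000, seed=0):
    """Noise-free toy case: white-noise input through a fixed 4-tap path."""
    random.seed(seed)
    true_path = [0.5, -0.3, 0.2, 0.1]
    x = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
    d = [sum(true_path[k] * x[n - k] for k in range(4) if n - k >= 0)
         for n in range(n_samples)]
    f1, residual = run_two_filters(x, d, taps=4)
    return f1, true_path, residual
```

In the noise-free `demo()` case F1 ends up matching the true path closely, because F2 always performs at least as well as the older F1. The benefit of the structure appears when F2 temporarily diverges: F1 then keeps the last coefficients that passed the comparison.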
References
[1] M. R. Schroeder, “Improvement of acoustic-feedback stability by frequency shifting”,
The Journal of the Acoustical Society of America, vol. 36, no. 9, pp. 1718–1724, 1964.
[2] M. Schroeder, “Improvement of acoustic feedback stability in public address systems”,
Proc. 3rd Int. Congr. Acoust., 1959.
[3] T. van Waterschoot and M. Moonen, “Fifty years of acoustic feedback control:
state of the art and future challenges”, Proc. IEEE, vol. 99, no. 2, pp. 288–327,
2011.
[4] M. Mandal and A. Asif, Continuous and discrete time signals and systems.
Cambridge University Press, 2007, ISBN: 9780521854559.
[5] H. Nyquist, “Regeneration theory”, Bell System Technical Journal, vol. 11, no. 1,
pp. 126–147, Jan. 1932.
[6] J. E. T. Patronis, “Electronic detection of acoustic feedback and automatic sound
system gain control”, J. Audio Eng. Soc., vol. 26, no. 2, 1978.
[7] P. Gil-Cacho, T. van Waterschoot, M. Moonen, and S. H. Jensen, “Regularized
adaptive notch filters for acoustic howling suppression”, 17th Eur. Sig. Proc. Conf.
(EUSIPCO 2009), 2009.
[8] T. van Waterschoot and M. Moonen, “Comparative evaluation of howling detection
criteria in notch-filter-based howling suppression”, Journal of the Audio Engineering
Society, vol. 58, no. 11, pp. 923–940, 2010.
[9] T. van Waterschoot and M. Moonen, “Assessing the acoustic feedback control
performance of adaptive feedback cancellation in sound reinforcement systems”, in
Proc. 17th European Signal Process. Conf. (EUSIPCO ’09), Glasgow, Scotland, UK,
Aug. 2009, pp. 1997–2001.
[10] M. Guo, S. Jensen, J. Jensen, and S. L. Grant, “On the use of phase modulation
method for decorrelation in acoustic feedback cancellation”, in Proc. 20th European
Signal Process. Conf. (EUSIPCO ’12), Bucharest, Romania, Aug. 2012.
[11] T. van Waterschoot, G. Rombouts, and M. Moonen, “On the performance of
decorrelation by prefiltering for adaptive feedback cancellation in public address
systems”, in Proc. 4th IEEE Benelux Signal Process. Symp. (SPS ’04), Hilvarenbeek,
The Netherlands, Apr. 2004, pp. 167–170.
[12] J. S. Marple, “Computing the discrete-time “analytic” signal via FFT”, in Signals,
Systems & Computers, 1997. Conference Record of the 31st Asilomar Conference on,
Nov. 1997, pp. 1322–1325.
[13] A. Reilly, G. Frazer, and B. Boashash, “Analytic signal generation - tips and traps”,
IEEE Transactions on Signal Processing, vol. 42, no. 11, pp. 3241–3245, Nov. 1994.
[14] E. Hansler and G. Schmidt, Acoustic Echo and Noise Control: A Practical
Approach. Wiley-Interscience, 2004, ISBN: 9780471453468.
[15] J. Dhiman, S. Ahmad, and K. Gulia, “Comparison between adaptive filter
algorithms (LMS, NLMS and RLS)”, International Journal of Science, Engineering and
Technology Research (IJSETR), vol. 2, no. 5, May 2013.
[16] J. Norberg, Johan Norberg om den exploderande lyckotrenden,
http://sverigesradio.se/sida/avsnitt/50019?programid=503, [Online, accessed February 2017],
Feb. 2012.
[17] T. van Waterschoot and M. Moonen, Ftp,
ftp://ftp.esat.kuleuven.be/sista/vanwaterschoot/abstracts/08-13.html,
[Online, accessed February 2017], Aug. 2013.