stereophonic acoustic echo cancellation using blind...
TRANSCRIPT
STEREOPHONIC ACOUSTIC ECHO CANCELLATION
USING BLIND SOURCE SEPARATION AS POST-PROCESSING
Takatoshi Okuno and M. O. Tokhi
Department of Automatic Control and Systems Engineering,
The University of Sheffield,
Mappin Street, Sheffield, S1 3JD, UK.
{cop98to, o.tokhi}@sheffield.ac.uk
ABSTRACT
Stereophonic or multi-channel acoustic echo cancella-tion systems suffer from cross-talk [1, 2]. This problemis also common to other multi-channel sound systems.Almost all attempts made to solve this problem arebased on pre-processing techniques, which decorrelatethe multi-channel input signals [3, 4]. However, pre-processing of these signals leads to a degradation of theaudio quality, and further attempts need to be madeto improve the audio quality [5]. Conversely, post-processing has the potential of avoiding these problemsand producing a natural sound.
This paper investigates post-processing in stereo-phonic acoustic echo cancellation (SAEC) using theblind source separation (BSS) technique. It is demon-strated through computer simulation that reasonableresults are obtained with this proposed method.
1. INTRODUCTION
This paper presents a new concept for SAEC usingpost-processing techniques. It is noted in the literaturethat almost all previously reported research has usedpre-processing to decorrelate stereo input signals. Thismeans that the stereo sound emitted from loudspeak-ers in the near-end is degraded by this pre-processing.Post-processing is a more natural approach than pre-processing. Considering post-processing, the problemcan be regarded as a signal separation problem, suchas BSS.
Figure 1 shows the configuration of the proposedSAEC system, where h11, h21, h12 and h22 representthe four acoustic paths of the stereophonic communi-cation system. Using these notations, the relationshipsbetween the input signal vectors u1, u2 and the desiredsignal vectors d1, d2 can be obtained as
d1 = h11 ∗ u1 + h21 ∗ u2 (1)
d2 = h12 ∗ u1 + h22 ∗ u2 (2)
The authors thank Mr. E. Begic for fruitful comments in
this research.
Near-end
Input signal
1e
2u
1u
1d 11d
Error signal 2d
12d
Adaptive Filter
2e
22d
21d Post
processing
22h
12h
11h
21h 11e
21e
12e
22e
Adaptive Filter
Figure 1: The configuration of a proposed SAEC sys-tem. (adaptive filters are depicted only for input signalu1.)
where ∗ denotes the convolution operation between twovectors. Equations (1) and (2) can be collected into asingle equation in the frequency domain using a matrixexpression as
[
D1
D2
]
=
[
H11 H21
H12 H22
] [
U1
U2
]
. (3)
It is easily noted that there are four unknown valiables,namely H11, H21, H12 and H22, and only one deficientset of simultaneous equations.
2. THE PROPOSED POST-PROCESSING
Let the four ideal desired signal vectors d11, d21, d12
and d22 in Figure 1 be expressed, using equations (1)and (2), as
d11 = h11 ∗ u1 + α (4)
d21 = h21 ∗ u2 + β (5)
d12 = h12 ∗ u1 + γ (6)
d22 = h22 ∗ u2 + δ (7)
where α, β, γ and δ are defined as the correspondingseparation errors. In this case, conventional adaptive
filtering (for example, LMS, RLS) could be utilised toestimate the impulse responses h11, h21, h12 and h22.
To obtain these ideal desired signals separately, con-sider a BSS system as shown in Figure 2. This has beenreported by Torkkola [6, 7] for the blind separation ofconvolved sources and is based on the concept of infor-mation maximisation (Infomax) [8].
+
+
11W
22W
12W
1d
2d
Mixtures Separated signals
11d
Maximise entropy
Update unmixing matrices
y
12d
21d
22d
Activation function
21W 0
1
g
Figure 2: The configuration of the feedback architec-ture of BSS using Infomax as post-processing.
As shown in Figure 2, the relationships between theinput and output are described through the feedbackarchitecture as
d11(n) =
N−1∑
k=0
w1k1d1(n − k) +
N−1∑
k=1
w2k1d22(n − k), (8)
d21(n) = −N−1∑
k=1
w2k1d22(n − k), (9)
d12(n) = −
N−1∑
k=1
w1k2d11(n − k), (10)
d22(n) =
N−1∑
k=0
w2k2d2(n − k) +
N−1∑
k=1
w1k2d11(n − k), (11)
where N denotes the number of filter coefficients andwikj is the k-th filter coefficient of the filter Wij . Al-though d21 and d12 are not in general utilised in appli-cations employing BSS, these signals are used here asoutput signals as well as to cancel cross-talk.
The stochastic adaptation rule for coefficients of fil-ters Wij on the basis of the Infomax criterion [6], canbe derived as follows
wi0i(n + 1) = wi0i(n) + η
(
yidi +1
wi0i(n)
)
, (12)
wiki(n + 1) = wiki(n) + η yidi(t − k), (13)
wikj(n + 1) = wikj(n) + η yidjj(t − k), (14)
where η is defined as the learning rate and yi is definedas
yi =∂
∂yi
∂yi
∂dii
where yi = g(dii), with g denoting an activation func-tion, such as a sigmoid function, as
yi = g(dii) =1
1 + exp (−dii). (15)
The Infomax criterion has a side effect; it temporar-ily whitens the outputs, which should be removed inthis application [7]. If W11 and W22 are forced to bescaling coefficients, instead of using equations (12) and(13), this effect can be avoided by
W11(z) = W22(z) = 1. (16)
Hence, W12 and W21 will ideally converge to
W21 = −H21(z)H22(z)−1 (17)
W12 = −H12(z)H11(z)−1. (18)
Finally, the outputs of the system can be obtained asin equations (4)-(7), and then conventional adaptive fil-tering algorithm can be used to cancel acoustic echoes.
After adaptive filtering is performed, two error sig-nals that are transmitted to the far-end are calculatedas
e1(n) = e11(n) + e21(n), (19)
e2(n) = e12(n) + e22(n). (20)
3. COMPUTER SIMULATION
Computer simulations were performed on the basis ofthe proposed methodology, using the parameters in Ta-ble 1.
Table 1: Parameters used in the computer simulationof BSS and adaptive filtering.
Parameters
Sampling frequency 8kHz
Step size µ in N-LMS 0.01
Learning rate η in BSS 0.001
Filter length for adaptive filters 16 samples
Filter length for BSS 16 samples
Two uncorrelated speech signals discretised at 8kHz sampling frequency, were utilised as stereo inputsignals (female voice and male voice). Although us-ing these uncorrelated speech signals is not realistic, itseems worthy to use these as a first step, as such input
signals have not been used previously to perform BSS.It means that there would be possibilities to use theseinput signals for modifying the separation algorithm,such as cross correlation function, even if these inputsignals are highly correlated with each other. Fourroom impulse responses, which will be estimated, wereartificially produced as [9]
h11(k) = 0.9 + 0.5k−1 + 0.3k−2,
h21(k) = −0.7k−5 − 0.3k−6 − 0.2k−7,
h12(k) = 0.5k−5 + 0.3k−6 + 0.2k−7,
h22(k) = 0.8 − 0.1k−1.
These room impulse responses characterise a minimumphase response. Note that due to the inverse matrix inequations (17) and (18), precise estimation of the roomimpulse response with non-minimum phase character-istic is not easy.
Figures 3 and 4 show examples of the BSS simula-tion. It is remarkable to note that almost perfect sep-aration has been achieved, even at periods when thereis no signal on one of the channels.
0
(a) Mixed signal (d1)
0
(b) Separated signal (d11)
0
(c) Separated signal (d21)
time (s)8.4 sec
Figure 3: The simulation result of BSS using Infomaxcriterion, (a): mixed signal d1, (b): separated signald11 and (c): separated signal d21.
Figure 5 indicates the convergence behaviour of thecross filter w21, using the coefficient error between trueand estimated values as
10 log10
(
∑N−1
k=0|g(k) − g(k)|2
|g(0) − g(0)|2
)
(21)
where g(k) is true value of the coefficients, g(k) is theestimate of the filter coefficient. The convergence isquite fast because the length of the room impulse re-sponse is very short. However, a steady-state value of
0
(a) Mixed signal (d2)
0
(b) Separated signal (d12)
0
(c) Separated signal (d22)
time (s) 8.4 sec
Figure 4: The simulation result of BSS using Infomaxcriterion, (a): mixed signal d2, (b): separated signald12 and (c): separated signal d22.
about -15 dB would be evaluated as a good result.
0 1 2 3 4 5 6 7
x 104
−20
−15
−10
−5
0Coefficient error
Iteration
dB
w21
Figure 5: Coefficient error between real and estimatedvalues for w21.
After applying BSS, the output signals were calcu-lated using converged cross filters w21, w12 and forcedfilters w11, w22 to evaluate equations (4)-(7). Then,normalised LMS algorithm was used to identify indi-vidually each room impulse response, h11, h21, h12 andh22 [10]. Figure 6 shows an example of the acousticecho cancellation with normalised LMS algorithm ford11 only. The coefficient error converged to around -15 dB only because there was a separation error as inequation (4). Although this separation error might berecognised as double-talk signal, it must be surpressed.Unless a perfect solution by BSS is obtained, this sep-aration error can not be neglected.
Figure 7 shows two error signals e11, e21 and thefinal error e1 in Figure 1, which is sent to the far-endafter adaptive filtering.
0
(a) Separated (desired) signal & Error signal
d11
0
time (s)
e11
8.4 sec
0 1 2 3 4 5 6
x 104
−10
−5
0
(b) Coefficient error
Iteration
dB
NLMS
Figure 6: Normalised LMS adaptive filtering to cancelout the acoustic echo d11 after applying BSS, (a): sep-arated signal d11 and error signal e11, (b): coefficienterror.
0
(a) Error signal e11
0
(b) Error signal e21
0
(c) Error signal e1
time (s) 8.4 sec
Figure 7: Two error signals (a): e11, (b): e21 and (c):the final output signal e1.
4. CONCLUSIONS
SAEC using BSS has been investigated from the viewof post-processing. It has been demonstrated with sim-ulated exercises that the proposed method has a greatdeal of potential with BSS for SAEC. It has been shownthat BSS using the Infomax criterion is a powerfulmethod for separating mixed signals. The next stepin this research would be to consider room impulse re-sponses with non-minimum phase characteristics andhighly correlated input signals. Moreover, this researchwill investigate the convergence speeds of BSS and adap-tive filters and the effect of BSS on the adaptive filters.
5. REFERENCES
[1] S. L. Gay and J. Benesty, Ed.: Acoustic
Signal Processing for Telecommunication. KluwerAcademic Publishers, Boston 2000.
[2] M. M. Sondhi, D. R. Morgan, and J. L.
Hall: Stereophonic acoustic echo cancellation -
an overview of the fundamental problem. IEEESignal Processing Letter, 2, pp. 148-151. 1995.
[3] S. Shimauchi, Y. Haneda, S. Makino, and
Y. Kaneda: New configuration for a stereo echo
canceller with nonlinear pre-processing. IEEEProceedings of ICASSP 98, pp. 3685-3688. 1998.
[4] J. Benesty, D. R. Morgan, and M. M.
Sondhi: A better understanding and an im-
proved solution to the specific problems of stereo-
phonic acoustic echo cancellation. IEEE Transac-tions on Speech Audio Processing, 6, pp. 156-165.1998.
[5] A. Gilloire, and V. Turbin: Using audi-
tory properties to improve the behaviour of stereo-
phonic acoustic echo cancellers. IEEE Proceed-ings of ICASSP 98, pp. 3681-3684. 1998.
[6] K. Torkkola: Blind separation of convolved
sources based information maximization. IEEEWorkshop on Neural Networks for Signal Pro-cessing, pp. 423-432. 1996.
[7] S. Haykin, Ed.: Unsupervised adaptive filtering,
Vol. I. Chapter 8, pp. 321-375, John Wiley &Sons Inc. 2000.
[8] A. J. Bell and T. J. Sejnowski: An
information-maximisation approach to blind sep-
aration and blind deconvolution. Neural Compu-tation, 7(6), pp. 1129-1159. 1995.
[9] T. W. Lee: Independent Component Analysis.Kluwer Academic Publishers, Boston 1998.
[10] S. Haykin: Adaptive filter theory. Prentice-Hall,NJ 1996.