stereophonic acoustic echo cancellation using blind...

STEREOPHONIC ACOUSTIC ECHO CANCELLATION

USING BLIND SOURCE SEPARATION AS POST-PROCESSING

Takatoshi Okuno and M. O. Tokhi

Department of Automatic Control and Systems Engineering,

The University of Sheffield,

Mappin Street, Sheffield, S1 3JD, UK.

{cop98to, o.tokhi}@sheffield.ac.uk

ABSTRACT

Stereophonic or multi-channel acoustic echo cancella-tion systems suffer from cross-talk [1, 2]. This problemis also common to other multi-channel sound systems.Almost all attempts made to solve this problem arebased on pre-processing techniques, which decorrelatethe multi-channel input signals [3, 4]. However, pre-processing of these signals leads to a degradation of theaudio quality, and further attempts need to be madeto improve the audio quality [5]. Conversely, post-processing has the potential of avoiding these problemsand producing a natural sound.

This paper investigates post-processing in stereo-phonic acoustic echo cancellation (SAEC) using theblind source separation (BSS) technique. It is demon-strated through computer simulation that reasonableresults are obtained with this proposed method.

1. INTRODUCTION

This paper presents a new concept for SAEC usingpost-processing techniques. It is noted in the literaturethat almost all previously reported research has usedpre-processing to decorrelate stereo input signals. Thismeans that the stereo sound emitted from loudspeak-ers in the near-end is degraded by this pre-processing.Post-processing is a more natural approach than pre-processing. Considering post-processing, the problemcan be regarded as a signal separation problem, suchas BSS.

Figure 1 shows the configuration of the proposedSAEC system, where h11, h21, h12 and h22 representthe four acoustic paths of the stereophonic communi-cation system. Using these notations, the relationshipsbetween the input signal vectors u1, u2 and the desiredsignal vectors d1, d2 can be obtained as

d1 = h11 ∗ u1 + h21 ∗ u2 (1)

d2 = h12 ∗ u1 + h22 ∗ u2 (2)

The authors thank Mr. E. Begic for fruitful comments in

this research.

Near-end

Input signal

1e

2u

1u

1d 11d

Error signal 2d

12d

Adaptive Filter

2e

22d

21d Post

processing

22h

12h

11h

21h 11e

21e

12e

22e

Adaptive Filter

Figure 1: The configuration of a proposed SAEC sys-tem. (adaptive filters are depicted only for input signalu1.)

where ∗ denotes the convolution operation between twovectors. Equations (1) and (2) can be collected into asingle equation in the frequency domain using a matrixexpression as

[

D1

D2

]

=

[

H11 H21

H12 H22

] [

U1

U2

]

. (3)

It is easily noted that there are four unknown valiables,namely H11, H21, H12 and H22, and only one deficientset of simultaneous equations.

2. THE PROPOSED POST-PROCESSING

Let the four ideal desired signal vectors d11, d21, d12

and d22 in Figure 1 be expressed, using equations (1)and (2), as

d11 = h11 ∗ u1 + α (4)

d21 = h21 ∗ u2 + β (5)

d12 = h12 ∗ u1 + γ (6)

d22 = h22 ∗ u2 + δ (7)

where α, β, γ and δ are defined as the correspondingseparation errors. In this case, conventional adaptive

filtering (for example, LMS, RLS) could be utilised toestimate the impulse responses h11, h21, h12 and h22.

To obtain these ideal desired signals separately, con-sider a BSS system as shown in Figure 2. This has beenreported by Torkkola [6, 7] for the blind separation ofconvolved sources and is based on the concept of infor-mation maximisation (Infomax) [8].

+

+

11W

22W

12W

1d

2d

Mixtures Separated signals

11d

Maximise entropy

Update unmixing matrices

y

12d

21d

22d

Activation function

21W 0

1

g

Figure 2: The configuration of the feedback architec-ture of BSS using Infomax as post-processing.

As shown in Figure 2, the relationships between theinput and output are described through the feedbackarchitecture as

d11(n) =

N−1∑

k=0

w1k1d1(n − k) +

N−1∑

k=1

w2k1d22(n − k), (8)

d21(n) = −N−1∑

k=1

w2k1d22(n − k), (9)

d12(n) = −

N−1∑

k=1

w1k2d11(n − k), (10)

d22(n) =

N−1∑

k=0

w2k2d2(n − k) +

N−1∑

k=1

w1k2d11(n − k), (11)

where N denotes the number of filter coefficients andwikj is the k-th filter coefficient of the filter Wij . Al-though d21 and d12 are not in general utilised in appli-cations employing BSS, these signals are used here asoutput signals as well as to cancel cross-talk.

The stochastic adaptation rule for coefficients of fil-ters Wij on the basis of the Infomax criterion [6], canbe derived as follows

wi0i(n + 1) = wi0i(n) + η

(

yidi +1

wi0i(n)

)

, (12)

wiki(n + 1) = wiki(n) + η yidi(t − k), (13)

wikj(n + 1) = wikj(n) + η yidjj(t − k), (14)

where η is defined as the learning rate and yi is definedas

yi =∂

∂yi

∂yi

∂dii

where yi = g(dii), with g denoting an activation func-tion, such as a sigmoid function, as

yi = g(dii) =1

1 + exp (−dii). (15)

The Infomax criterion has a side effect; it temporar-ily whitens the outputs, which should be removed inthis application [7]. If W11 and W22 are forced to bescaling coefficients, instead of using equations (12) and(13), this effect can be avoided by

W11(z) = W22(z) = 1. (16)

Hence, W12 and W21 will ideally converge to

W21 = −H21(z)H22(z)−1 (17)

W12 = −H12(z)H11(z)−1. (18)

Finally, the outputs of the system can be obtained asin equations (4)-(7), and then conventional adaptive fil-tering algorithm can be used to cancel acoustic echoes.

After adaptive filtering is performed, two error sig-nals that are transmitted to the far-end are calculatedas

e1(n) = e11(n) + e21(n), (19)

e2(n) = e12(n) + e22(n). (20)

3. COMPUTER SIMULATION

Computer simulations were performed on the basis ofthe proposed methodology, using the parameters in Ta-ble 1.

Table 1: Parameters used in the computer simulationof BSS and adaptive filtering.

Parameters

Sampling frequency 8kHz

Step size µ in N-LMS 0.01

Learning rate η in BSS 0.001

Filter length for adaptive filters 16 samples

Filter length for BSS 16 samples

Two uncorrelated speech signals discretised at 8kHz sampling frequency, were utilised as stereo inputsignals (female voice and male voice). Although us-ing these uncorrelated speech signals is not realistic, itseems worthy to use these as a first step, as such input

signals have not been used previously to perform BSS.It means that there would be possibilities to use theseinput signals for modifying the separation algorithm,such as cross correlation function, even if these inputsignals are highly correlated with each other. Fourroom impulse responses, which will be estimated, wereartificially produced as [9]

h11(k) = 0.9 + 0.5k−1 + 0.3k−2,

h21(k) = −0.7k−5 − 0.3k−6 − 0.2k−7,

h12(k) = 0.5k−5 + 0.3k−6 + 0.2k−7,

h22(k) = 0.8 − 0.1k−1.

These room impulse responses characterise a minimumphase response. Note that due to the inverse matrix inequations (17) and (18), precise estimation of the roomimpulse response with non-minimum phase character-istic is not easy.

Figures 3 and 4 show examples of the BSS simula-tion. It is remarkable to note that almost perfect sep-aration has been achieved, even at periods when thereis no signal on one of the channels.

0

(a) Mixed signal (d1)

0

(b) Separated signal (d11)

0

(c) Separated signal (d21)

time (s)8.4 sec

Figure 3: The simulation result of BSS using Infomaxcriterion, (a): mixed signal d1, (b): separated signald11 and (c): separated signal d21.

Figure 5 indicates the convergence behaviour of thecross filter w21, using the coefficient error between trueand estimated values as

10 log10

(

∑N−1

k=0|g(k) − g(k)|2

|g(0) − g(0)|2

)

(21)

where g(k) is true value of the coefficients, g(k) is theestimate of the filter coefficient. The convergence isquite fast because the length of the room impulse re-sponse is very short. However, a steady-state value of

0

(a) Mixed signal (d2)

0

(b) Separated signal (d12)

0

(c) Separated signal (d22)

time (s) 8.4 sec

Figure 4: The simulation result of BSS using Infomaxcriterion, (a): mixed signal d2, (b): separated signald12 and (c): separated signal d22.

about -15 dB would be evaluated as a good result.

0 1 2 3 4 5 6 7

x 104

−20

−15

−10

−5

0Coefficient error

Iteration

dB

w21

Figure 5: Coefficient error between real and estimatedvalues for w21.

After applying BSS, the output signals were calcu-lated using converged cross filters w21, w12 and forcedfilters w11, w22 to evaluate equations (4)-(7). Then,normalised LMS algorithm was used to identify indi-vidually each room impulse response, h11, h21, h12 andh22 [10]. Figure 6 shows an example of the acousticecho cancellation with normalised LMS algorithm ford11 only. The coefficient error converged to around -15 dB only because there was a separation error as inequation (4). Although this separation error might berecognised as double-talk signal, it must be surpressed.Unless a perfect solution by BSS is obtained, this sep-aration error can not be neglected.

Figure 7 shows two error signals e11, e21 and thefinal error e1 in Figure 1, which is sent to the far-endafter adaptive filtering.

0

(a) Separated (desired) signal & Error signal

d11

0

time (s)

e11

8.4 sec

0 1 2 3 4 5 6

x 104

−10

−5

0

(b) Coefficient error

Iteration

dB

NLMS

Figure 6: Normalised LMS adaptive filtering to cancelout the acoustic echo d11 after applying BSS, (a): sep-arated signal d11 and error signal e11, (b): coefficienterror.

0

(a) Error signal e11

0

(b) Error signal e21

0

(c) Error signal e1

time (s) 8.4 sec

Figure 7: Two error signals (a): e11, (b): e21 and (c):the final output signal e1.

4. CONCLUSIONS

SAEC using BSS has been investigated from the viewof post-processing. It has been demonstrated with sim-ulated exercises that the proposed method has a greatdeal of potential with BSS for SAEC. It has been shownthat BSS using the Infomax criterion is a powerfulmethod for separating mixed signals. The next stepin this research would be to consider room impulse re-sponses with non-minimum phase characteristics andhighly correlated input signals. Moreover, this researchwill investigate the convergence speeds of BSS and adap-tive filters and the effect of BSS on the adaptive filters.

5. REFERENCES

[1] S. L. Gay and J. Benesty, Ed.: Acoustic

Signal Processing for Telecommunication. KluwerAcademic Publishers, Boston 2000.

[2] M. M. Sondhi, D. R. Morgan, and J. L.

Hall: Stereophonic acoustic echo cancellation -

an overview of the fundamental problem. IEEESignal Processing Letter, 2, pp. 148-151. 1995.

[3] S. Shimauchi, Y. Haneda, S. Makino, and

Y. Kaneda: New configuration for a stereo echo

canceller with nonlinear pre-processing. IEEEProceedings of ICASSP 98, pp. 3685-3688. 1998.

[4] J. Benesty, D. R. Morgan, and M. M.

Sondhi: A better understanding and an im-

proved solution to the specific problems of stereo-

phonic acoustic echo cancellation. IEEE Transac-tions on Speech Audio Processing, 6, pp. 156-165.1998.

[5] A. Gilloire, and V. Turbin: Using audi-

tory properties to improve the behaviour of stereo-

phonic acoustic echo cancellers. IEEE Proceed-ings of ICASSP 98, pp. 3681-3684. 1998.

[6] K. Torkkola: Blind separation of convolved

sources based information maximization. IEEEWorkshop on Neural Networks for Signal Pro-cessing, pp. 423-432. 1996.

[7] S. Haykin, Ed.: Unsupervised adaptive filtering,

Vol. I. Chapter 8, pp. 321-375, John Wiley &Sons Inc. 2000.

[8] A. J. Bell and T. J. Sejnowski: An

information-maximisation approach to blind sep-

aration and blind deconvolution. Neural Compu-tation, 7(6), pp. 1129-1159. 1995.

[9] T. W. Lee: Independent Component Analysis.Kluwer Academic Publishers, Boston 1998.

[10] S. Haykin: Adaptive filter theory. Prentice-Hall,NJ 1996.

stereophonic acoustic echo cancellation using blind...

Documents