Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis

Hiroshi SARUWATARI, Tsuyoki NISHIKAWA, and Kiyohiro SHIKANO

Graduate School of Information Science

Nara Institute of Science and Technology

Contents

• Background of blind source separation (BSS)
• Overview of Independent Component Analysis (ICA)
  • Time-domain ICA (TDICA)
  • Frequency-domain ICA (FDICA)
• Disadvantages of TDICA and FDICA
• Proposal of Multistage ICA
• Experimental results under real acoustic conditions
• Conclusion

Background

Hands-free speech recognition system

[Figure: target speech ("Is it fine tomorrow?") and interference both arrive at the microphone that feeds the speech recognition system.]

Interference is also observed at the microphone, so speech recognition performance degrades significantly.

Background (Cont'd)

Microphone array
• Receiver which consists of multiple elements
• To enhance target speech or reduce interference

Problem of microphone array processing: a priori information is required.
• Directions of arrival of the sound sources
• Breaks of target speech for filter adaptation

Realization of a high-quality hands-free speech interface system

Problem of Microphone Array

• Delay and Sum: to produce a narrow beam pattern
  • The target direction must be set.
  • High sidelobes pick up noise.
• Adaptive: to update filters to reduce the noise
  • It is necessary to observe only noise for filter learning.

Blind Source Separation (BSS)

• Approach to estimate the source signals only from the observed mixed signals.
• No information about source directions or acoustic conditions is required.
• Independent Component Analysis (ICA) is mainly used.

Previous works on ICA
• J. Cardoso, 1989
• C. Jutten, 1990 (higher-order decorrelation)
• P. Comon, 1994 (defined the term "ICA")
• A. Bell et al., 1995 (infomax)

ICA-Based BSS

[Figure: Speaker 1 and Speaker 2 (saying "Hello!" and "Good Morning!") produce Source 1 and Source 2, which are mutually independent. Microphone 1 and Microphone 2 record Observed signal 1 and Observed signal 2, which are known.]

• To estimate the source signals
• No a priori information (unsupervised adaptive filtering)

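BSS for Instantaneous Mixture

Linear mixing process (observed = mixing matrix × source):

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t)$$

Separation process (separated = unmixing matrix × observed):

$$\mathbf{y}(t) = \mathbf{W}\,\mathbf{x}(t)$$

The unmixing matrix $\mathbf{W}$ is optimized with a cost function that measures whether the separated signals are independent.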
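The mixing and separation processes can be sketched numerically. This is a toy illustration only: the mixing matrix values, the Laplacian sources, and the variable names `A`, `W`, `s`, `x`, `y` are assumptions, and `W` is computed from the known `A`, which a blind method cannot do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent sources (Laplacian samples stand in for speech; an assumption).
s = rng.laplace(size=(2, 1000))

# Hypothetical real-valued mixing matrix for the instantaneous model.
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])

x = A @ s                 # observed signals at the two microphones
W = np.linalg.inv(A)      # ideal unmixing matrix, available only in this toy setting
y = W @ x                 # separated signals
```

In actual BSS, `A` is unknown, so `W` has to be found by optimizing an independence criterion on `y` alone.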

Various Criteria for ICA

Separated signal: $\mathbf{y}(t) = [y_1(t), \ldots, y_K(t)]^{\mathrm T}$

• Decorrelation: to minimize correlation among signals in multiple time durations

$$\mathrm{E}\left[\mathbf{y}(t)\,\mathbf{y}(t)^{\mathrm T}\right] \to \mathrm{diag}$$

• Nonlinear function 1: to minimize higher-order correlation

$$\mathrm{E}\left[\mathbf{y}^{3}(t)\,\mathbf{y}(t)^{\mathrm T}\right] \to \mathrm{diag}$$

• Nonlinear function 2: to assume the p.d.f. of the sources

$$\mathrm{E}\left[\Phi(\mathbf{y}(t))\,\mathbf{y}(t)^{\mathrm T}\right] \to \mathrm{diag}$$

$\Phi$: sigmoid function
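As a rough check of the three criteria, the sketch below estimates each matrix by time averaging and measures how far it is from diagonal, once for independent signals and once for a mixture. The helper names `offdiag_ratio` and `criteria`, the Laplacian test signals, and the mixing matrix are assumptions made for illustration.

```python
import numpy as np

def offdiag_ratio(M):
    """Off-diagonal energy relative to diagonal energy; 0 means M is diagonal."""
    off = M - np.diag(np.diag(M))
    return np.linalg.norm(off) / np.linalg.norm(np.diag(M))

def criteria(y):
    """Off-diagonal ratio of the three time-averaged matrices for signals y (K x T)."""
    n = y.shape[1]
    phi = 1.0 / (1.0 + np.exp(-y))          # sigmoid nonlinearity
    return {
        "decorrelation": offdiag_ratio(y @ y.T / n),
        "nonlinear_1": offdiag_ratio((y ** 3) @ y.T / n),
        "nonlinear_2": offdiag_ratio(phi @ y.T / n),
    }

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000))            # independent, zero-mean test sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])      # hypothetical mixing matrix

independent = criteria(s)                   # all three ratios should be near 0
mixed = criteria(A @ s)                     # all three ratios should be clearly larger
```

Each criterion is close to zero for the independent signals and grows once the signals are mixed, which is what makes it usable as a cost for learning the unmixing matrix.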

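Cost Function for Nonlinear Function 2

Kullback-Leibler (KL) divergence between $p(y_1, \ldots, y_K)$ and $\prod_{k=1}^{K} p(y_k)$:

$$\mathrm{KL}(\mathbf{W}) = \int p(\mathbf{y}) \log \frac{p(\mathbf{y})}{\prod_{k=1}^{K} p(y_k)}\, d\mathbf{y} = -H(\mathbf{Y}; \mathbf{W}) + \sum_{k=1}^{K} H(Y_k; \mathbf{W})$$

1. $H(\mathbf{Y}; \mathbf{W})$: joint entropy of $\mathbf{y}$
2. $\sum_{k} H(Y_k; \mathbf{W})$: sum of the marginal entropies of $y_k$

• Minimized when the $y_k$ are mutually independent.
• This can be achieved by maximization of $H(\mathbf{Y}; \mathbf{W})$ because $\sum_{k} H(Y_k; \mathbf{W})$ hardly changes.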

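Derivation for Nonlinear Function 2

• $\mathbf{W}$ is updated along the negative gradient of the cost function.
• Nonlinear function 2 ⇒ $\mathrm{E}\left[\Phi(\mathbf{y})\,\mathbf{y}^{\mathrm T}\right]$ is to be diagonalized.
• The nonlinear function is defined through the derivatives of $\log p(y_k)$, $k = 1, \ldots, K$. For speech signals it can be approximated by a sigmoid function.

[Equations not recoverable from the source: gradient of the cost with respect to $\mathbf{W}$ and the exact definition of the nonlinear function.]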
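A gradient-type learning rule of this kind can be sketched as follows. This is a minimal sketch, not the slide's own derivation: it uses the widely known natural-gradient form $\Delta\mathbf{W} = \eta\,(\mathbf{I} - \mathrm{E}[\varphi(\mathbf{y})\mathbf{y}^{\mathrm T}])\,\mathbf{W}$ with the sigmoid-shaped nonlinearity $\varphi(y) = 2\,\mathrm{sigmoid}(y) - 1 = \tanh(y/2)$. The function name, step size, iteration count, and test data are assumptions.

```python
import numpy as np

def natural_gradient_ica(x, eta=0.1, n_iter=1000):
    """Batch natural-gradient ICA for a real-valued instantaneous mixture x (K x T)."""
    k, n = x.shape
    W = np.eye(k)
    for _ in range(n_iter):
        y = W @ x
        phi = np.tanh(y / 2.0)                       # equals 2 * sigmoid(y) - 1
        # Drive E[phi(y) y^T] toward the identity matrix (hence toward diagonal).
        W = W + eta * (np.eye(k) - phi @ y.T / n) @ W
    return W

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))                      # super-Gaussian sources, like speech
A = np.array([[1.0, 0.6], [0.5, 1.0]])               # hypothetical mixing matrix
W = natural_gradient_ica(A @ s)

# If separation succeeded, W @ A is close to a scaled permutation matrix.
P = W @ A
```

The product `P` shows the ordering and scaling ambiguity directly: each row has one dominant entry, but which column it sits in and how large it is are not fixed by the independence criterion.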

Why Instantaneous Mixture Model?

• ICA for the instantaneous mixture model
  → The mixing matrix is represented as real-valued.
  → No time delay among microphones and no room reflections are assumed.

It is a useful example to show how ICA can work, but it is only a mathematical "toy model".

Is it applicable to sound signals in a real acoustic environment?

ICA for Convolutive Mixture Model (1)

• In application to a microphone array,
  • each received signal has a time delay which corresponds to the sound direction and the position of each element.
  • the mixing matrix A is not a simple scalar-valued coefficient, but is represented as a "convolution filter".

Mixing Process in Real Environment

[Equation not recoverable from the source: observed signals = mixing matrix (FIR filter) convolved with the source signals.]

⇒ We should use an FIR filter for the separation process.

ICA for Convolutive Mixture Model (2)

Mixing Process in Frequency Domain

[Equation not recoverable from the source: observed signals = complex-valued mixing matrix × source signals, in each subband.]

⇒ We only solve the complex-valued instantaneous mixture in each subband independently.

To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform
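The reduction from convolution to per-frequency multiplication can be verified numerically. In this sketch the filter length, signal length, and array names are assumptions, and one FFT long enough to hold the whole convolved signal is used so that the identity is exact. With short analysis frames, as in practical FDICA, it holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_mic, n_taps, n_samp = 2, 2, 8, 64
s = rng.standard_normal((n_src, n_samp))            # source signals
a = rng.standard_normal((n_mic, n_src, n_taps))     # hypothetical FIR mixing filters

# Time domain: each microphone signal is a sum of convolutions.
x = np.zeros((n_mic, n_samp + n_taps - 1))
for m in range(n_mic):
    for k in range(n_src):
        x[m] += np.convolve(a[m, k], s[k])

# Frequency domain: one complex matrix-vector product per frequency bin.
n_fft = n_samp + n_taps - 1
S = np.fft.rfft(s, n_fft, axis=1)                   # (n_src, n_bins)
A = np.fft.rfft(a, n_fft, axis=2)                   # (n_mic, n_src, n_bins)
X = np.einsum("mkf,kf->mf", A, S)                   # instantaneous mixture in each bin
x_from_freq = np.fft.irfft(X, n_fft, axis=1)
```

Both routes give the same microphone signals, so separation can be posed as an independent complex-valued instantaneous problem in each bin.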

Permutation Problem in FDICA

Permutation and Gain Indeterminacy

• ICA is conducted in each subband independently.
• Ordering and scaling of the outputs are arbitrary in ICA.

[Figure: ICA at 1 kHz and ICA at 2 kHz produce outputs such as S1×0.5, S2×0.4, and S1×1.1, so the source order and the gain differ from subband to subband.]

Solutions for Permutation Problem

• To use correlation among output waveforms (Murata et al., 1998)
• To use the directivity pattern of W (Kurita and Saruwatari, 2000)
• To use correlation among unmixing matrices in neighboring frequency bins (Parra et al., 2000; Asano et al., 2001)
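The first idea in the list, correlation among output waveforms, can be sketched on synthetic data. This is a simplified illustration in the spirit of that approach, not the algorithm of the cited papers: the function `align_permutations`, the running-reference heuristic, and the synthetic envelopes are all assumptions.

```python
import itertools
import numpy as np

def align_permutations(env):
    """env: (n_bins, n_src, n_frames) amplitude envelopes of the separated outputs.
    Returns perms (n_bins, n_src) so that env[f, perms[f]] has a consistent source order."""
    n_bins, n_src, _ = env.shape
    perms = np.zeros((n_bins, n_src), dtype=int)
    perms[0] = np.arange(n_src)
    ref = env[0].copy()                          # running reference envelope per source
    for f in range(1, n_bins):
        best, best_score = None, -np.inf
        for p in itertools.permutations(range(n_src)):
            cand = env[f, list(p)]
            score = sum(np.corrcoef(ref[k], cand[k])[0, 1] for k in range(n_src))
            if score > best_score:
                best, best_score = p, score
        perms[f] = best
        ref += env[f, list(best)]                # accumulate the aligned envelopes
    return perms

# Synthetic test: two source envelopes, randomly swapped and scaled in each bin.
rng = np.random.default_rng(0)
base = np.abs(rng.standard_normal((2, 200)))
order = np.array([rng.permutation(2) for _ in range(16)])
gains = rng.uniform(0.4, 1.1, size=(16, 2, 1))
env = gains * base[order] + 0.05 * rng.random((16, 2, 200))
perms = align_permutations(env)
```

Because correlation is insensitive to gain, the arbitrary scaling in each bin does not disturb the alignment. The scaling itself still has to be resolved separately.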

Problem in ICA-Based BSS

• Separation performance under reverberant conditions significantly degrades.

Why?

Reverberation in a typical room = 2400-tap FIR filter at 8 kHz sampling

⇒ The number of parameters is too large.

It is necessary to achieve a BSS method that is robust against reverberation.
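The tap count follows from simple arithmetic if the filter is assumed to span the 300 ms reverberation time used in the experiments. The symbols $T_{\mathrm{rev}}$, $f_s$, and $N_{\mathrm{taps}}$ are introduced here for illustration, and the total assumes one filter per source-microphone pair in a two-source, two-microphone setup.

```latex
N_{\mathrm{taps}} = T_{\mathrm{rev}} \times f_s = 0.3\,\mathrm{s} \times 8000\,\mathrm{Hz} = 2400,
\qquad
2 \times 2 \times N_{\mathrm{taps}} = 9600 \ \text{filter coefficients}
```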

Conventional Approaches

• Frequency-Domain ICA (FDICA): to estimate the separation coefficients for every frequency bin in the frequency domain
• Time-Domain ICA (TDICA): to estimate the separation FIR filter in the time domain

FDICA-Based BSS

<Advantages>
• To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform
• Easy to converge the separation filter in iterative ICA learning with high stability

Performance of FDICA

In conventional dereverberation processing, dereverberation performance is improved as the number of subbands (filter length) is enlarged.

In FDICA, as the number of subbands is enlarged, is source-separation performance also improved?

⇒ To investigate the relation between the number of subbands and source-separation performance

Experimental Setup

• Sound sources: speech of two male and two female speakers from the ASJ corpus (12 combinations)
• Evaluation: Noise Reduction Rate = Output SNR [dB] − Input SNR [dB]
• Interelement spacing is 4 cm.
• Reverberation time is 300 ms.
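The evaluation score can be computed as in the sketch below. It assumes that the target and interference components are available separately at the input and at the output of the separator, which the slides do not spell out. The function names and the toy separator are hypothetical.

```python
import numpy as np

def snr_db(target, interference):
    """Signal-to-noise ratio in dB from separately available components."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

def noise_reduction_rate(in_target, in_interf, out_target, out_interf):
    """Noise reduction rate = output SNR [dB] - input SNR [dB]."""
    return snr_db(out_target, out_interf) - snr_db(in_target, in_interf)

rng = np.random.default_rng(0)
target = rng.standard_normal(8000)       # 1 s at 8 kHz, stands in for target speech
interf = rng.standard_normal(8000)       # stands in for the interfering speaker

# Hypothetical separator: keeps the target intact, attenuates interference amplitude by 10x.
nrr = noise_reduction_rate(target, interf, target, 0.1 * interf)
```

A tenfold amplitude attenuation of the interference corresponds to a 20 dB noise reduction rate, regardless of the input SNR.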

FDICA Algorithm

Iterative learning algorithm (Amari, 1996):

[Equation not recoverable from the source: iterative update of the unmixing matrix.]

• $Y^{(R)}$: real part of $Y$; $Y^{(I)}$: imaginary part of $Y$
• η: step-size parameter

Relation between Number of Subbands and Separation Performance

[Figure: noise reduction rate [dB] versus number of subbands (32, 64, 128, 256, 512, 1024, 2048, 4096). Separation performance significantly degrades when the number of subbands becomes large.]

<Disadvantages>
• When the number of subbands becomes too large, the independence assumption of narrow-band signals collapses (Nishikawa, Araki, et al., 2001).
• Separation performance in FDICA is saturated before reaching a sufficient performance.

Relation between Number of Subbands and Correlation

[Figure: correlation between narrow-band signals versus number of subbands (32, 64, 128, 256, 512, 1024, 2048, 4096). The correlation is higher for larger numbers of subbands.]

If we increase the number of subbands too much, the correlation between narrow-band signals increases.
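One plausible contributor to this trend can be reproduced with synthetic data: for a fixed signal length, more subbands leave fewer frames per subband, so the sample correlation between two truly independent signals grows. The sketch below uses white noise instead of speech and non-overlapping rectangular frames, both simplifying assumptions. It is not the measurement shown on the slide.

```python
import numpy as np

def mean_subband_correlation(s1, s2, n_fft):
    """Mean magnitude of the normalized cross-correlation between two signals,
    averaged over frequency bins, using non-overlapping frames of length n_fft."""
    n_frames = len(s1) // n_fft
    F1 = np.fft.rfft(s1[: n_frames * n_fft].reshape(n_frames, n_fft), axis=1)
    F2 = np.fft.rfft(s2[: n_frames * n_fft].reshape(n_frames, n_fft), axis=1)
    num = np.abs(np.sum(F1 * np.conj(F2), axis=0))
    den = np.sqrt(np.sum(np.abs(F1) ** 2, axis=0) * np.sum(np.abs(F2) ** 2, axis=0))
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
s1 = rng.standard_normal(24000)          # 3 s at 8 kHz; white noise stands in for speech
s2 = rng.standard_normal(24000)          # independent of s1 by construction

corr = {n: mean_subband_correlation(s1, s2, n) for n in (64, 256, 1024, 4096)}
```

Even though the two signals are independent by construction, the measured narrow-band correlation rises as the frame length grows, so an independence-based criterion has less to work with in each subband.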

Trade-off Relation between Independence and Robustness against Reverberation

• Small number of subbands: high independence, but poor against reverberation
• Large number of subbands: robust against reverberation, but low independence

TDICA-Based BSS

<Advantages>
• To treat the fullband signals, where the independence assumption of sources usually holds
• High convergence possibility near the optimal point

Performance of TDICA

In conventional dereverberation processing, dereverberation performance is improved as the filter length is lengthened.

In TDICA, as the filter length is lengthened, is source-separation performance also improved?

⇒ To investigate the relation between the filter length and source-separation performance

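TDICA Algorithm

Cost function (to be minimized):

$$Q(\mathbf{w}(z)) = \frac{1}{2B} \sum_{b=1}^{B} \left\{ \log\det \operatorname{diag} \mathbf{R}_y^{(b)}(0) - \log\det \mathbf{R}_y^{(b)}(0) \right\}$$

where

$$\mathbf{R}_y^{(b)}(n) = \left\langle \mathbf{y}^{(b)}(t)\, \mathbf{y}^{(b)}(t-n)^{\mathrm T} \right\rangle_t$$

$$\mathbf{y}^{(b)}(t) = \sum_{k=0}^{K-1} \mathbf{w}(k)\, \mathbf{x}^{(b)}(t-k) = \mathbf{W}(z)\, \mathbf{x}^{(b)}(t)$$

$$\mathbf{W}(z) = \sum_{k=0}^{K-1} \mathbf{w}(k)\, z^{-k}, \qquad z^{-k}\, \mathbf{x}^{(b)}(t) = \mathbf{x}^{(b)}(t-k)$$

Here $b$ is the block index and $B$ is the number of blocks.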
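A cost of this type, which compares the log-determinant of the diagonal part of each block's zero-lag correlation matrix with the log-determinant of the full matrix, can be evaluated as in the sketch below. The function name, the block powers, and the instantaneous test mixture are assumptions. By Hadamard's inequality each block term is non-negative and vanishes only when the block's correlation matrix is diagonal.

```python
import numpy as np

def nonstationary_cost(y, n_blocks):
    """Q = 1/(2B) * sum_b [log det diag R_b(0) - log det R_b(0)] over B time blocks."""
    blocks = np.array_split(y, n_blocks, axis=1)
    total = 0.0
    for yb in blocks:
        R = yb @ yb.T / yb.shape[1]                  # zero-lag correlation matrix of block b
        total += np.sum(np.log(np.diag(R))) - np.linalg.slogdet(R)[1]
    return total / (2 * n_blocks)

rng = np.random.default_rng(0)
n, B = 9000, 3

# Nonstationary sources: the power of each source changes from block to block
# (hypothetical values), which is what makes block-wise decorrelation informative.
power = np.repeat(np.array([[1.0, 0.2, 2.0], [0.3, 1.5, 0.5]]), n // B, axis=1)
s = np.sqrt(power) * rng.standard_normal((2, n))
A = np.array([[1.0, 0.6], [0.5, 1.0]])               # hypothetical mixing matrix

q_sources = nonstationary_cost(s, B)                 # near zero for independent signals
q_mixed = nonstationary_cost(A @ s, B)               # clearly positive for mixed signals
```

Minimizing such a cost over the separation filter pushes every block's correlation matrix toward diagonal at once, which is only possible when the outputs are separated.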

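TDICA Algorithm (Cont'd)

Iterative equation of the separation filter

• The update $\Delta\mathbf{w}_i(k)$ is obtained from the natural gradient of $Q(\mathbf{w}(z))$.
• The filter $\mathbf{w}_i(k)$ at iteration $i$ is updated to $\mathbf{w}_{i+1}(k)$ using $\Delta\mathbf{w}_i(k)$, where α is the step-size parameter.

[Equations not recoverable from the source: explicit form of the natural gradient and of the update.]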

Proposed Iterative Equation of TDICA

Iterative equation of the separation filter (TDICA1)

[Equation not recoverable from the source.]

• Evaluates only the correlation at the same time.

Iterative equation of the separation filter (TDICA2)

[Equation not recoverable from the source.]

• Expanded to evaluate the correlation at different times.

Results of TDICA1 and TDICA2

[Figure: separation performance versus filter length [taps] (10, 20, 50, 100, 200, 500, 1000, 2000) for TDICA1 (only correlation of the same time) and TDICA2 (correlation of different times).]

• It is necessary to evaluate the correlation at different times to achieve a superior performance.
• Source separation using a long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.

Problems and Solution

FDICA
• Advantages
  • To simplify the convolutive mixture down to instantaneous mixtures
  • Easy to converge the separation filter with high stability
• Disadvantages
  • Independence assumption collapses in each narrow band.
  • Separation performance is saturated before reaching a sufficient performance.

TDICA
• Advantages
  • To treat the fullband speech signals, where the independence assumption of sources usually holds
  • High convergence possibility near the optimal point
• Disadvantages
  • Iterative rule for filter learning is complicated.
  • Convergence degrades under reverberant conditions.

FDICA and TDICA complement each other: FDICA is used as the first stage and TDICA as the second stage.

To use the advantages of FDICA and TDICA together

⇒ Multistage ICA (MSICA) combining FDICA and TDICA

Separation Procedure of MSICA

[Figure: mixing system → FDICA → TDICA]

• Separated signals of FDICA are regarded as the input signals for TDICA.
• Residual cross-talk components of FDICA can be removed by TDICA.

Comparison among Each ICA

• To investigate the relation between the filter length and source-separation performance of the TDICA part in MSICA
• Comparison among separation performances of TDICA, FDICA, and MSICA

Relation between Filter Length and Separation Performance

[Figure: noise reduction rate [dB] versus filter length [taps] (10, 20, 50, 100, 200, 500, 1000, 2000) for TDICA, MSICA, and FDICA (9.4 dB).]

• Separation performance of MSICA is improved even with the long filter.
• TDICA is still useful near the optimal point.

Comparison Results

[Figure: noise reduction rate [dB] for each of the 12 combinations of speakers, for TDICA, FDICA, and MSICA.]

Average: TDICA: 5.9 dB, FDICA: 9.4 dB, MSICA: 12.1 dB

• Separation performance of MSICA is superior to those of TDICA and FDICA.
• Combination of FDICA and TDICA is inherently effective for improving the separation performance.

Spectral Distortion (TDICA)

10 taps: no whitening; 1000 taps: whitening

Spectral Distortion (FDICA, MSICA)

FDICA: no whitening; MSICA: no whitening

Sound Demonstration of MSICA

Reverberation time: 300 ms

• Mixed speech (Female + Male)
• Separated speech (Female)
• Separated speech (Male)

Conclusion

Disadvantages of FDICA and TDICA
• Separation performance of FDICA is saturated before reaching a sufficient performance.
• Source separation using a long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.

MSICA combining FDICA and TDICA
• In the TDICA part of MSICA, separation performance is improved even with the long filter.
• Separation performance of MSICA is superior to those of TDICA and FDICA.

Future Work

• Further evaluation in real environments
  • Robustness under reverberant conditions
  • Larger array with more than two elements
• To apply BSS to speech recognition systems
  • Improvement of convergence speed
  • On-line and real-time algorithm

ICA and BSS: Where do we go?

We should go to NARA-city!

4th International Symposium on ICA and BSS (ICA2003), in NARA

• Date: April 1-4, 2003
• Place: Nara, JAPAN
• Scientific areas:
  • ICA and factor analysis, PCA, etc.
  • Blind source separation
  • Blind and semi-blind equalization and deconvolution
  • Blind identification
  • Any signal processing application related to ICA
• URL: http://ica2003.jp/

Analysis Conditions

FDICA, FDICA part in MSICA
• Number of subbands: 1024 points
• Frame shift: 16 points
• ICA iterations: 30

TDICA, TDICA part in MSICA
• Filter length: 10 (TDICA), 1000 (MSICA)
• Number of blocks B: 3 (TDICA), 9 (MSICA)
• ICA iterations: 500
