blind source separation of acoustic signals based on multistage independent component analysis...
Post on 19-Dec-2015
Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis
Hiroshi SARUWATARI, Tsuyoki NISHIKAWA, and Kiyohiro SHIKANO
Graduate School of Information Science
Nara Institute of Science and Technology
Contents
• Background of blind source separation (BSS)
• Overview of Independent Component Analysis (ICA): Time-domain ICA (TDICA) and Frequency-domain ICA (FDICA)
• Disadvantages of TDICA and FDICA
• Proposal of Multistage ICA
• Experimental results under real acoustic conditions
• Conclusion
Background
Hands-free speech recognition system
[Figure: target speech ("Is it fine tomorrow?") and interference both arrive at the microphone of a speech recognition system.]
Interference is also observed at the microphone, so speech recognition performance significantly degrades.
Background (Cont’d)
Goal: Realization of a high-quality hands-free speech interface system
Microphone array
• Receiver which consists of multiple elements
• To enhance target speech or reduce interference
Problem of microphone array processing: a priori information is required.
• Directions of arrival of the sound sources
• Breaks of target speech for filter adaptation
Problem of Microphone Array
• Delay and Sum: to produce a narrow beam pattern
• Adaptive: to update filters to reduce the noise
[Figure: beam patterns versus direction θ. Labels: "Target", "Null", "Set target direction.", "High sidelobe gains noise.", "It is necessary to observe only noise for filter learning."]
Blind Source Separation (BSS)
An approach to estimate source signals only from the observed mixed signals. No information about source directions or acoustic conditions is required. Independent Component Analysis (ICA) is mainly used.
Previous works on ICA
• J. Cardoso, 1989
• C. Jutten, 1990 (higher-order decorrelation)
• P. Comon, 1994 (defined the term "ICA")
• A. Bell et al., 1995 (infomax)
ICA-Based BSS
[Figure: Speaker 1 and Speaker 2 ("Hello!", "Good Morning!") are Source 1 and Source 2, assumed mutually independent. Microphone 1 and Microphone 2 record Observed signal 1 and Observed signal 2, which are known.]
To estimate source signals with no a priori information (unsupervised adaptive filtering)
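BSS for Instantaneous Mixture
Linearly Mixing Process (observed = mixing matrix × source):
$$
\begin{bmatrix} x_1(t) \\ \vdots \\ x_K(t) \end{bmatrix}
=
\begin{bmatrix} A_{11} & \cdots & A_{1L} \\ \vdots & \ddots & \vdots \\ A_{K1} & \cdots & A_{KL} \end{bmatrix}
\begin{bmatrix} s_1(t) \\ \vdots \\ s_L(t) \end{bmatrix}
$$
Separation Process (separated = unmixing matrix × observed):
$$
\begin{bmatrix} y_1(t) \\ \vdots \\ y_L(t) \end{bmatrix}
=
\begin{bmatrix} W_{11} & \cdots & W_{1K} \\ \vdots & \ddots & \vdots \\ W_{L1} & \cdots & W_{LK} \end{bmatrix}
\begin{bmatrix} x_1(t) \\ \vdots \\ x_K(t) \end{bmatrix}
$$
Are the separated signals independent? A cost function evaluates this, and the unmixing matrix is optimized accordingly.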
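The instantaneous mixing and separation model can be illustrated with a small numerical sketch. The mixing matrix values, the Laplacian sources, and the variable names below are assumptions chosen only for illustration. The ideal unmixing matrix is available here only because the mixing matrix was chosen by hand; a BSS method has to estimate it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent sources (L = 2); Laplacian samples stand in for speech.
s = rng.laplace(size=(2, 1000))

# Hypothetical real-valued mixing matrix (K = 2 microphones).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s                      # observed signals x(t) = A s(t)

# Ideal unmixing matrix: the inverse of A.
W = np.linalg.inv(A)
y = W @ x                      # separated signals y(t) = W x(t)

# A scaled and permuted unmixing matrix separates the sources just as well,
# only with a different ordering and gain of the outputs.
P = np.array([[0.0, 2.0],
              [0.5, 0.0]])
y_perm = (P @ W) @ x
```

Here `y` reproduces `s` exactly, while `y_perm` contains the same sources swapped and rescaled, which is the ordering and scaling ambiguity that any ICA-based method has to live with.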
Various Criteria for ICA
Separated signal: $\mathbf{y}(t) = [y_1(t), y_2(t), \ldots]^{\mathrm{T}}$
• Decorrelation: to minimize correlation among signals in multiple time durations
  $\mathrm{E}[\mathbf{y}(t)\mathbf{y}(t)^{\mathrm{T}}] \to \mathrm{diag}$
• Nonlinear function 1: to minimize higher-order correlation
  $\mathrm{E}[\mathbf{y}^3(t)\mathbf{y}(t)^{\mathrm{T}}] \to \mathrm{diag}$
• Nonlinear function 2: to assume the p.d.f. of the sources
  $\mathrm{E}[\Phi(\mathbf{y}(t))\mathbf{y}(t)^{\mathrm{T}}] \to \mathrm{diag}$, where $\Phi$ is a sigmoid function
Cost Function for Nonlinear Function 2
Kullback-Leibler (KL) divergence between $p(y_1, \ldots, y_K)$ and $\prod_{k=1}^{K} p(y_k)$:
$$
KL(W) = \int p(\mathbf{y}) \log \frac{p(\mathbf{y})}{\prod_{k=1}^{K} p(y_k)} \, d\mathbf{y} = -H(Y; W) + \sum_{k=1}^{K} H(Y_k; W)
$$
1. $H(Y; W)$: joint entropy of $\mathbf{y}$
2. $\sum_{k} H(Y_k; W)$: sum of marginal entropies of $y_k$
・ $KL(W)$ is minimized when the $y_k$ are mutually independent.
・ This can be achieved by maximization of $H(Y; W)$ because $H(Y_k; W)$ hardly changes.
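Derivation for Nonlinear Function 2
To update $W$ along the negative gradient of $KL(W)$:
$$
\begin{aligned}
\frac{\partial KL(W)}{\partial W}
&= -(W^{\mathrm{T}})^{-1} + \int p(\mathbf{x})\, \Phi(\mathbf{y})\, \mathbf{x}^{\mathrm{T}} \, d\mathbf{x} \\
&= -(W^{\mathrm{T}})^{-1} + \mathrm{E}[\Phi(\mathbf{y})\mathbf{x}^{\mathrm{T}}] \\
&= -\left( I - \mathrm{E}[\Phi(\mathbf{y})\mathbf{y}^{\mathrm{T}}] \right) (W^{\mathrm{T}})^{-1}
\end{aligned}
$$
Nonlinear Function 2 ⇒ $\mathrm{E}[\Phi(\mathbf{y})\mathbf{y}^{\mathrm{T}}]$ is to be diagonalized, where
$$
\Phi(\mathbf{y}) = -\left[ \frac{\partial \log p(y_1)}{\partial y_1}, \ldots, \frac{\partial \log p(y_K)}{\partial y_K} \right]^{\mathrm{T}}
$$
This can be approximated by a sigmoid function for speech signals.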
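A minimal sketch of this kind of gradient learning for an instantaneous mixture follows. It assumes the widely used natural-gradient form $\Delta W = \eta\,(I - \mathrm{E}[\Phi(\mathbf{y})\mathbf{y}^{\mathrm{T}}])\,W$ (attributed to Amari), which results from multiplying the negative gradient of $KL(W)$ by $W^{\mathrm{T}}W$; this step is not spelled out on the slide. The choice of `tanh` as the sigmoid-shaped nonlinearity, the step size, the iteration count, and the test signals are all assumptions for illustration.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, eta=0.1):
    """Estimate an unmixing matrix W for the instantaneous mixture x = A s.

    x has shape (n_channels, n_samples). tanh plays the role of the
    sigmoid-shaped nonlinearity Phi (an assumption made for this sketch).
    """
    n_ch, n_samples = x.shape
    W = np.eye(n_ch)
    for _ in range(n_iter):
        y = W @ x
        phi = np.tanh(y)
        # Sample estimate of E[Phi(y) y^T], then the natural-gradient update.
        corr = (phi @ y.T) / n_samples
        W = W + eta * (np.eye(n_ch) - corr) @ W
    return W

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000))          # super-Gaussian stand-ins for speech
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])                # hypothetical mixing matrix
x = A @ s

W = natural_gradient_ica(x)
G = W @ A                                 # overall system: ideally a scaled permutation
```

If separation succeeds, each row of `G` has a single dominant entry, meaning each output is mostly one source up to gain and ordering.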
Why Instantaneous Mixture Model?
• ICA for the instantaneous mixture model
→ The mixing matrix is represented as real-valued.
→ Time delays among microphones and room reflections are not modeled.
It is a useful example to show how ICA can work, but it is only a mathematical "toy model".
Is it applicable to sound signals in a real acoustic environment?
NO!
ICA for Convolutive Mixture Model (1)
• In application to a microphone array,
– each received signal has a time delay which corresponds to the sound direction and the position of each element.
– the mixing matrix A is not a simple scalar-valued coefficient matrix, but is represented as a "convolution filter".
Mixing Process in Real Environment (observed = mixing matrix (FIR filter) ∗ source, where ∗ denotes convolution):
$$
\begin{bmatrix} x_1(t) \\ \vdots \\ x_K(t) \end{bmatrix}
=
\begin{bmatrix} A_{11}(t) & \cdots & A_{1L}(t) \\ \vdots & \ddots & \vdots \\ A_{K1}(t) & \cdots & A_{KL}(t) \end{bmatrix}
*
\begin{bmatrix} s_1(t) \\ \vdots \\ s_L(t) \end{bmatrix}
$$
⇒ We should use FIR filters for the separation process.
ICA for Convolutive Mixture Model (2)
To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform
Mixing Process in Frequency Domain (observed = complex-valued mixing matrix × source):
$$
\begin{bmatrix} X_1(f) \\ \vdots \\ X_K(f) \end{bmatrix}
=
\begin{bmatrix} A_{11}(f) & \cdots & A_{1L}(f) \\ \vdots & \ddots & \vdots \\ A_{K1}(f) & \cdots & A_{KL}(f) \end{bmatrix}
\begin{bmatrix} S_1(f) \\ \vdots \\ S_L(f) \end{bmatrix}
$$
⇒ We only solve the complex-valued instantaneous mixture in each subband independently.
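The reduction from a convolutive mixture to per-frequency instantaneous mixtures can be checked numerically. The sketch below assumes circular convolution over one frame, for which the relation holds exactly under the DFT; with real short-time frames it only holds approximately. Signal lengths, filter lengths, and random filters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, taps = 256, 16

s = rng.standard_normal((2, n))            # two source frames
a = rng.standard_normal((2, 2, taps))      # a[k, l]: FIR filter from source l to microphone k

# Time domain: circular convolutive mixture x_k(t) = sum_l sum_tau a_kl(tau) s_l(t - tau).
x = np.zeros((2, n))
for k in range(2):
    for l in range(2):
        for tau in range(taps):
            x[k] += a[k, l, tau] * np.roll(s[l], tau)

# Frequency domain: one complex-valued 2 x 2 mixing matrix per frequency bin.
S = np.fft.fft(s, axis=1)                  # shape (2, n)
A = np.fft.fft(a, n=n, axis=2)             # shape (2, 2, n)
X = np.einsum('klf,lf->kf', A, S)          # X(f) = A(f) S(f) in every bin

X_from_time = np.fft.fft(x, axis=1)
```

`X` and `X_from_time` agree to numerical precision, so each bin can be treated as its own complex-valued instantaneous mixture.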
Permutation Problem in FDICA
Permutation and Gain Indeterminacy
・ ICA is conducted in each subband independently.
・ Ordering and scaling of outputs are arbitrary in ICA.
[Figure: FDICA applies a separate ICA to sources S1 and S2 in each frequency bin (for example at 1 kHz and at 2 kHz). One bin outputs S1×0.5 and S2×1, while another outputs S2×0.4 and S1×1.1, so both order and gain differ between bins.]
Solutions for Permutation Problem
・ To use correlation among output waveforms
(Murata et al., 1998)
・ To use the directivity pattern of W
(Kurita and Saruwatari, 2000)
・ To use correlation among unmixing matrices in neighboring frequency bins
(Parra et al., 2000; Asano et al., 2001)
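As a rough illustration of the first idea (correlation among output waveforms), the sketch below aligns the output order across bins by correlating amplitude envelopes against a running reference. It is a simplified toy for two outputs, not the algorithm of Murata et al.; the function name `align_permutations`, the synthetic envelopes, and the gains are assumptions.

```python
import numpy as np

def align_permutations(env):
    """env[f, c, t]: amplitude envelope of output c in bin f at frame t (two outputs).

    Returns a boolean array that is True where the two outputs of a bin
    should be swapped to match the ordering of bin 0.
    """
    n_bins = env.shape[0]
    # Remove the arbitrary per-bin gain before comparing envelopes.
    norm = env / env.mean(axis=2, keepdims=True)
    swap = np.zeros(n_bins, dtype=bool)
    ref = norm[0].copy()                   # running reference; bin 0 fixes the order
    for f in range(1, n_bins):
        c = np.corrcoef(np.vstack([ref, norm[f]]))   # 4 x 4 correlation matrix
        keep_score = c[0, 2] + c[1, 3]
        swap_score = c[0, 3] + c[1, 2]
        swap[f] = swap_score > keep_score
        ref += norm[f][::-1] if swap[f] else norm[f]
    return swap

# Synthetic check: two independent source envelopes, random gain and order per bin.
rng = np.random.default_rng(1)
n_bins, n_frames = 32, 400
src_env = rng.gamma(2.0, size=(2, n_frames))
true_swap = rng.random(n_bins) < 0.5
true_swap[0] = False
gains = rng.uniform(0.3, 1.5, size=(n_bins, 2, 1))
env = np.stack([src_env[::-1] if sw else src_env for sw in true_swap]) * gains

est_swap = align_permutations(env)
```

In this idealized setting the envelopes of the same source are perfectly correlated across bins, so the estimated swaps match the true ones; real speech envelopes are only partially correlated across frequency.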
Problem in ICA-Based BSS
・ Separation performance under reverberant conditions significantly degrades.
Why? Reverberation in a typical room = 2400-tap FIR filter at 8 kHz sampling
⇒ The number of parameters is too large.
It is necessary to achieve a BSS method that is robust against reverberation.
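As a worked check of the figure above: a 2400-tap filter at 8 kHz sampling spans 0.3 s of impulse response. The coefficient count below additionally assumes a configuration with 2 sources and 2 microphones, which is an assumption made only for this example.

$$
\frac{2400\ \text{taps}}{8000\ \text{samples/s}} = 0.3\ \text{s},
\qquad
2 \times 2 \times 2400 = 9600\ \text{filter coefficients}
$$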
Conventional Approaches
• Frequency-Domain ICA (FDICA): to estimate the separation coefficients in every frequency bin in the frequency domain
• Time-Domain ICA (TDICA): to estimate the separation FIR filter in the time domain
FDICA-Based BSS
<Advantages>
• To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform
• Easy to converge the separation filter in iterative ICA learning with high stability
Performance of FDICA
In conventional dereverberation processing, dereverberation performance is improved as the number of subbands (filter length) is enlarged.
<Speculation>
In FDICA, as the number of subbands is enlarged, is source-separation performance also improved?
To investigate the relation between the number of subbands and source-separation performance
Experimental Setup
• Sound source: speech of 2 male and 2 female speakers from the ASJ corpus (12 combinations)
• Evaluation: Noise Reduction Rate = Output SNR [dB] – Input SNR [dB]
• Interelement spacing is 4 cm.
• Reverberation time is 300 ms.
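The noise reduction rate defined above can be computed as in the following sketch. It assumes the target and interference components are available separately at both the input and the output, which is possible in a simulation but is an assumption here; the function names and test signals are hypothetical.

```python
import numpy as np

def snr_db(target, interference):
    """Signal-to-noise ratio in dB from separate target and interference components."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

def noise_reduction_rate(target_in, interf_in, target_out, interf_out):
    """Noise Reduction Rate = output SNR [dB] - input SNR [dB]."""
    return snr_db(target_out, interf_out) - snr_db(target_in, interf_in)

rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
interf = rng.standard_normal(8000)

# A system that leaves the target untouched and scales the interference
# amplitude by 0.1 reduces interference power by a factor of 100.
nrr = noise_reduction_rate(target, interf, target, 0.1 * interf)
```

For this idealized case `nrr` equals 20 dB, since the interference power drops by a factor of 100 while the target is unchanged.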
FDICA Algorithm
Iterative learning algorithm (Amari, 1996):
where
$Y^{(\mathrm{R})}$: real part of $Y$, $Y^{(\mathrm{I})}$: imaginary part of $Y$
η: Step size parameter
Relation between Number of Subbands and Separation Performance
[Figure: noise reduction rate [dB] versus number of subbands (32, 64, 128, 256, 512, 1024, 2048, 4096). Bar labels as extracted: 6.1, 6.6, 7.2, 7.6, 8.5, 7.4, 9.4, 3.0. Annotation: separation performance significantly degrades.]
<Disadvantages>
• When the number of subbands becomes too large, the independence assumption of narrow-band signals collapses (Nishikawa, Araki, et al., 2001).
• Separation performance in FDICA is saturated before reaching a sufficient performance.
Relation between Number of Subbands and Correlation
[Figure: correlation among signals (axis from 0 to 0.1) versus number of subbands (32, 64, 128, 256, 512, 1024, 2048, 4096). Annotation: higher correlation.]
If we increase the number of subbands too much, the correlation between narrow-band signals increases.
Trade-off Relation between Independence and Robustness against Reverberation
[Figure: noise reduction rate versus number of subbands, from small to large.]
• Small number of subbands: high independence, but poor against reverberation
• Large number of subbands: robust against reverberation, but low independence
TDICA-Based BSS
<Advantages>
• To treat the fullband signals where the independence assumption of sources usually holds
• High-convergence possibility near the optimal point
Performance of TDICA
In conventional dereverberation processing, dereverberation performance is improved as the filter length is lengthened.
<Speculation>
In TDICA, as the filter length is lengthened, is source-separation performance also improved?
To investigate the relation between the filter length and source-separation performance
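TDICA Algorithm
Cost function (to be minimized):
$$
Q(\mathbf{w}(z)) = \frac{1}{2B} \sum_{b=1}^{B} \left\{ \log \det \operatorname{diag} R_y^{(b)}(0) - \log \det R_y^{(b)}(0) \right\}
$$
where
$$
R_y^{(b)}(n) = \left\langle \mathbf{y}^{(b)}(t)\, \mathbf{y}^{(b)}(t-n)^{\mathrm{T}} \right\rangle_t
$$
$$
\mathbf{y}^{(b)}(t) = \sum_{k=0}^{K-1} \mathbf{w}(k)\, \mathbf{x}^{(b)}(t-k) = W(z)\, \mathbf{x}^{(b)}(t)
$$
$$
W(z) = \sum_{k=0}^{K-1} \mathbf{w}(k)\, z^{-k}, \qquad z^{-k}\, \mathbf{x}^{(b)}(t) = \mathbf{x}^{(b)}(t-k)
$$
Here B is the number of blocks and K is the filter length.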
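The cost can be evaluated numerically as in the sketch below, which reads $Q$ as the block average of $\log\det\operatorname{diag}R_y^{(b)}(0) - \log\det R_y^{(b)}(0)$ divided by two. The function name `tdica_cost`, the equal-length block split, and the test signals are assumptions for illustration. By Hadamard's inequality each block term is non-negative and is zero only when the zero-lag correlation matrix of that block is diagonal.

```python
import numpy as np

def tdica_cost(y, n_blocks):
    """Decorrelation cost Q for output signals y of shape (n_channels, n_samples).

    The signal is split into n_blocks consecutive blocks; each block
    contributes log det diag R(0) - log det R(0), with R(0) the zero-lag
    correlation matrix of that block.
    """
    total = 0.0
    for yb in np.array_split(y, n_blocks, axis=1):
        R0 = yb @ yb.T / yb.shape[1]                 # time average of y(t) y(t)^T
        _, logdet = np.linalg.slogdet(R0)
        total += np.sum(np.log(np.diag(R0))) - logdet
    return total / (2 * n_blocks)

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 30000))                     # independent stand-in sources
x = np.array([[1.0, 0.8],
              [0.7, 1.0]]) @ s                       # hypothetical mixture

q_separated = tdica_cost(s, n_blocks=3)
q_mixed = tdica_cost(x, n_blocks=3)
```

`q_separated` is close to zero while `q_mixed` is clearly positive, so minimizing the cost pushes the outputs toward being uncorrelated in every block.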
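TDICA Algorithm (Cont’d)
Iterative equation of separation filter
Natural gradient of $Q(\mathbf{w}(z))$:
$$
\Delta \mathbf{w}_i(k) = -\frac{\partial Q(\mathbf{w}_i(z))}{\partial \mathbf{w}_i(k)}\, W_i(z^{-1})^{\mathrm{T}}\, W_i(z)
$$
$$
\mathbf{w}_{i+1}(k) = \mathbf{w}_i(k) + \alpha\, \Delta \mathbf{w}_i(k)
$$
where α is the step-size parameter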
Proposed Iterative Equation of TDICA
Iterative equation of separation filter (TDICA1): evaluates only the correlation of the same time
[Equation garbled in the transcript. Recoverable terms: $\mathbf{w}_{i+1}(k)$, $\mathbf{w}_i(k)$, $W_i(z)$, $R_y^{(b)}(0)$, $\operatorname{diag} R_y^{(b)}(0)$, and an average $\frac{1}{B}\sum_{b=1}^{B}$ over the blocks.]
Iterative equation of separation filter (TDICA2): expanded to evaluate the correlation of different times
[Equation garbled in the transcript. Recoverable terms: the same symbols as in TDICA1, with additional $\operatorname{diag}$ terms.]
Results of TDICA1 and TDICA2
[Figure: noise reduction rate [dB] versus filter length [taps] (10, 20, 50, 100, 200, 500, 1000, 2000) for TDICA1 (only correlation of the same time) and TDICA2 (correlation of different times). Bar labels as extracted: 5.8, 4.4, 2.8, 0.9, 0.3, 0.1, 0.3, 0.4, 0.3, 0.2, 0.6, 0.2, 7.8, 1.7, 0.4, 0.3.]
<Discussions>
• It is necessary to evaluate the correlation of different times to achieve a superior performance.
• Source separation using a long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.
Problems and Solution
FDICA
Advantages
• To simplify the convolutive mixture down to instantaneous mixtures
• Easy to converge the separation filter with high stability
Disadvantages
• Independence assumption collapses in each narrow band.
• Separation performance is saturated before reaching a sufficient performance.
TDICA
Advantages
• To treat the fullband speech signals where the independence assumption of sources usually holds
• High-convergence possibility near the optimal point
Disadvantages
• Iterative rule for filter learning is complicated.
• Convergence degrades under reverberant conditions.
FDICA (first stage) and TDICA (second stage) complement each other.
To use advantages of FDICA and TDICA together ⇒ Multistage ICA (MSICA) combining FDICA and TDICA
Separation Procedure of MSICA
[Diagram: Mixing system → FDICA → TDICA]
• Separated signals of FDICA are regarded as the input signals for TDICA.
• Residual cross-talk components of FDICA can be removed by TDICA.
Comparison among Each ICA
• To investigate the relation between the filter length and source-separation performance of the TDICA part in MSICA
• Comparison among separation performances of TDICA, FDICA, and MSICA
Relation between Filter Length and Separation Performance
[Figure: noise reduction rate [dB] versus filter length [taps] (10, 20, 50, 100, 200, 500, 1000, 2000) for TDICA, MSICA, and FDICA. Bar labels as extracted: 10.2, 10.1, 10.4, 10.6, 12.5, 12.7, 0.9, 2.8, 4.4, 5.8, 7.8, 1.7, 0.4, 0.3, 10.0, 11.0, and 9.4.]
<Discussions>
• Separation performance of MSICA is improved even with the long filter.
• TDICA is still useful near the optimal point.
Comparison Results
[Figure: noise reduction rate [dB] for each combination of speakers (1 to 12), for TDICA, FDICA, and MSICA.]
Average: TDICA: 5.9 dB, FDICA: 9.4 dB, MSICA: 12.1 dB
<Discussions>
• Separation performance of MSICA is superior to those of TDICA and FDICA.
• Combination of FDICA and TDICA is inherently effective for improving the separation performance.
Spectral Distortion (TDICA)
10 taps: No whitening, 1000 taps: whitening
Spectral Distortion (FDICA, MSICA)
FDICA: No whitening, MSICA: No whitening
Sound Demonstration of MSICA
Reverberation time: 300 ms
Source directions: Female at 40°, Male at -30°
• Mixed speech (Female+Male)
• Separated speech (Female)
• Separated speech (Male)
Conclusion
Disadvantages of FDICA and TDICA
• Separation performance of FDICA is saturated before reaching a sufficient performance.
• Source separation using a long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.
MSICA combining FDICA and TDICA
• In the TDICA part in MSICA, separation performance is improved even with the long filter.
• Separation performance of MSICA is superior to those of TDICA and FDICA.
Future Work
Further evaluation in real environment
• Robustness under reverberant conditions
• Larger array with more than 2 elements
To apply BSS to speech recognition system
• Improvement of convergence speed
• On-line and real-time algorithm
ICA and BSS: Where do we go?
We should go to NARA-city!
Why?
4th International Symposium on ICA and BSS (ICA2003), in NARA
Date: April 1-4, 2003
Place: Nara, JAPAN
Scientific Areas:
• ICA and Factor Analysis, PCA etc.
• Blind source separation
• Blind and semi-blind equalization and deconvolution
• Blind identification
• Any signal processing application related with ICA
URL: http://ica2003.jp/
Analysis Conditions
FDICA, FDICA part in MSICA
• Number of subbands: 1024 points
• Frame shift: 16 points
• ICA iterations: 30
TDICA, TDICA part in MSICA
• Filter length: 10 (TDICA), 1000 (MSICA)
• Number of blocks B: 3 (TDICA), 9 (MSICA)
• ICA iterations: 500