Full-rank Gaussian modeling of convolutive audio mixtures applied
to source separation
Ngoc Q. K. Duong, Supervisor: R. Gribonval and E. Vincent
METISS project team, INRIA, Centre de Rennes - Bretagne Atlantique, France
Nov. 2010.
Table of contents
Problem introduction and motivation
Considered framework and contributions
Estimation of model parameters
Conclusion and perspective
Under-determined source separation
Use $I$ recorded mixture signals $\mathbf{x}(t) = [x_1(t), \dots, x_I(t)]^T$ to separate $J$ sources $s_j(t)$, where $I < J$
Convolutive mixing model:
$\mathbf{x}(t) = \sum_{j=1}^{J} \mathbf{c}_j(t), \qquad \mathbf{c}_j(t) = \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t - \tau)$
where $\mathbf{c}_j(t)$ denotes the source images, i.e. the contribution of source $j$ to all microphones, $\mathbf{x}(t)$ is the vector of mixture signals, and $\mathbf{h}_j(\tau) = [h_{1j}(\tau), \dots, h_{Ij}(\tau)]^T$ is the vector of mixing filters from source $j$ to the microphone array
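A minimal NumPy sketch of the convolutive mixing model above. It assumes random, exponentially decaying filters as stand-ins for real room impulse responses, and the function name `convolutive_mix` is hypothetical. Each source image $\mathbf{c}_j(t)$ is the convolution of $s_j(t)$ with the filters $\mathbf{h}_j$, and the mixture is the sum of the images.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Simulate x(t) = sum_j c_j(t) with c_j(t) = sum_tau h_j(tau) s_j(t - tau).

    sources: (J, T) array of source signals s_j(t)
    filters: (J, I, L) array of mixing filters h_ij(tau)
    Returns the source images c (J, I, T + L - 1) and the mixture x (I, T + L - 1).
    """
    J, T = sources.shape
    _, I, L = filters.shape
    images = np.zeros((J, I, T + L - 1))
    for j in range(J):
        for i in range(I):
            # full linear convolution of source j with its filter to microphone i
            images[j, i] = np.convolve(filters[j, i], sources[j])
    mixture = images.sum(axis=0)  # each microphone records the sum of all source images
    return images, mixture

rng = np.random.default_rng(0)
J, I, T, L = 3, 2, 16000, 256  # 3 sources, 2 microphones (I < J), 1 s at 16 kHz
s = rng.standard_normal((J, T))
h = rng.standard_normal((J, I, L)) * np.exp(-np.arange(L) / 50.0)  # toy decaying filters
c, x = convolutive_mix(s, h)
print(c.shape, x.shape)  # (3, 2, 16255) (2, 16255)
```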
Baseline approaches
Sparsity assumption: only FEW sources are active at each time-frequency point
Binary masking (DUET): only ONE source is active at each time-frequency point
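L1-norm minimization: for all $n, f$,
$\hat{\mathbf{s}}(n,f) = \arg\min_{\mathbf{s}(n,f)} \sum_{j=1}^{J} |s_j(n,f)| \quad \text{s.t.} \quad \mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{h}_j(f)\, s_j(n,f)$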
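STFT with narrowband approximation
Time domain: $\mathbf{x}(t) = \sum_{j=1}^{J} \mathbf{c}_j(t), \qquad \mathbf{c}_j(t) = \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t - \tau)$
STFT domain: $\mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{c}_j(n,f), \qquad \mathbf{c}_j(n,f) \approx \mathbf{h}_j(f)\, s_j(n,f)$
where $\mathbf{h}_j(f)$ is the frequency response of the mixing filters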
These techniques remain limited in realistic reverberant environments, where the mixing filters are longer than the STFT window and the narrowband approximation no longer holds
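To illustrate the "only ONE source per time-frequency point" assumption, the sketch below performs binary masking in a single frequency bin under the narrowband model $\mathbf{x}(n,f) \approx \sum_j \mathbf{h}_j(f) s_j(n,f)$. It assumes the mixing vectors $\mathbf{h}_j(f)$ are already known; DUET itself estimates attenuation and delay parameters from a histogram, which is not reproduced here. The function name `binary_mask_separation` is hypothetical.

```python
import numpy as np

def binary_mask_separation(X, H):
    """Binary-masking separation in one frequency bin (narrowband model).

    X: (N, I) complex mixture STFT coefficients x(n, f) for N frames
    H: (J, I) complex mixing vectors h_j(f), assumed known here
    Returns S_hat: (N, J) with a single nonzero source per frame.
    """
    norms = np.linalg.norm(H, axis=1)          # ||h_j(f)||
    corr = X @ H.conj().T                      # (N, J): h_j^H x(n, f)
    score = np.abs(corr) / norms               # how well x(n, f) aligns with each h_j(f)
    active = np.argmax(score, axis=1)          # keep only the best-matching source
    rows = np.arange(X.shape[0])
    S_hat = np.zeros((X.shape[0], H.shape[0]), dtype=complex)
    # least-squares estimate of the single active source: h^H x / ||h||^2
    S_hat[rows, active] = corr[rows, active] / norms[active] ** 2
    return S_hat

rng = np.random.default_rng(1)
H = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))  # J = 3, I = 2
s = rng.standard_normal(5) + 1j * rng.standard_normal(5)
X = np.outer(s, H[1])          # only source 1 is active in these 5 frames
S_hat = binary_mask_separation(X, H)
```

By Cauchy-Schwarz, the score is maximal for the source whose mixing vector is collinear with $\mathbf{x}(n,f)$, so the example recovers source 1 exactly; with several active sources per point the estimate degrades, which is the limitation the slide points out.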
Considered framework
Model the STFT coefficients of the source images as zero-mean multivariate Gaussian random variables, i.e.
$\mathbf{c}_j(n,f) \sim \mathcal{N}_c\big(\mathbf{0}, \mathbf{R}_{\mathbf{c}_j}(n,f)\big), \qquad \mathbf{R}_{\mathbf{c}_j}(n,f) = v_j(n,f)\, \mathbf{R}_j(f)$
- $v_j(n,f)$: scalar source variances encoding the spectro-temporal power of the sources
- $\mathbf{R}_j(f)$: $I \times I$ spatial covariance matrices encoding the spatial position and spatial spread of the sources
Spatial covariance models
- Rank-1 model (given by the narrowband assumption): $\mathbf{R}_j(f) = \mathbf{h}_j(f)\, \mathbf{h}_j^H(f)$
- Full-rank unconstrained model: the coefficients of $\mathbf{R}_j(f)$ are unrelated a priori. This is the most general possible model, which allows more flexible modeling of the mixing process
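To make the two spatial covariance models concrete, this sketch builds a rank-1 $\mathbf{R}_j(f)$ from a mixing vector and a generic full-rank Hermitian positive-definite $\mathbf{R}_j(f)$, then draws samples of $\mathbf{c}_j(n,f) \sim \mathcal{N}_c(\mathbf{0}, v_j(n,f)\mathbf{R}_j(f))$ through a Cholesky factor. The random matrices and the helper names are assumptions standing in for estimated quantities.

```python
import numpy as np

def rank1_covariance(h):
    """Rank-1 spatial covariance R_j(f) = h_j(f) h_j(f)^H (narrowband model)."""
    return np.outer(h, h.conj())

def random_fullrank_covariance(I, rng):
    """A generic Hermitian positive-definite I x I matrix (full-rank unconstrained model)."""
    A = rng.standard_normal((I, I)) + 1j * rng.standard_normal((I, I))
    return A @ A.conj().T + 1e-3 * np.eye(I)

rng = np.random.default_rng(0)
I = 2
h = rng.standard_normal(I) + 1j * rng.standard_normal(I)
R_rank1 = rank1_covariance(h)
R_full = random_fullrank_covariance(I, rng)
v = 0.7                     # source variance v_j(n, f), a toy value
R_c = v * R_full            # covariance of the source image c_j(n, f)

# draw c_j(n, f) ~ N_c(0, R_c): c = L z with L L^H = R_c and E[z z^H] = Id
Lc = np.linalg.cholesky(R_c)
n_samples = 100000
z = (rng.standard_normal((I, n_samples)) + 1j * rng.standard_normal((I, n_samples))) / np.sqrt(2)
c = Lc @ z
R_emp = c @ c.conj().T / n_samples   # empirical covariance, close to R_c

print(np.linalg.matrix_rank(R_rank1), np.linalg.matrix_rank(R_full))  # 1 2
```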
Considered framework
Source separation can be achieved in two steps:
1. Model parameters are estimated in the maximum likelihood (ML) sense
- The Expectation-Maximization (EM) algorithm is a well-known and appropriate choice for ML estimation of this Gaussian model
2. Source separation by multichannel Wiener filtering:
$\hat{\mathbf{c}}_j(n,f) = v_j(n,f)\, \mathbf{R}_j(f)\, \mathbf{R}_{\mathbf{x}}^{-1}(n,f)\, \mathbf{x}(n,f), \qquad \mathbf{R}_{\mathbf{x}}(n,f) = \sum_{j=1}^{J} v_j(n,f)\, \mathbf{R}_j(f)$
Raised issues:
- Parameter initialization for EM
- Permutation alignment (well known in frequency-domain BSS)
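A minimal sketch of the multichannel Wiener filter at one time-frequency point, assuming the parameters $v_j(n,f)$ and $\mathbf{R}_j(f)$ are given (here random toy values; `wiener_filter` is a hypothetical name). It solves a linear system instead of forming $\mathbf{R}_{\mathbf{x}}^{-1}$ explicitly.

```python
import numpy as np

def wiener_filter(x, v, R):
    """Multichannel Wiener filtering at one time-frequency point.

    x: (I,) mixture STFT coefficient x(n, f)
    v: (J,) source variances v_j(n, f)
    R: (J, I, I) spatial covariance matrices R_j(f)
    Returns c_hat: (J, I), the estimated source images c_j(n, f).
    """
    Rx = np.einsum('j,jab->ab', v, R)          # R_x(n, f) = sum_j v_j(n, f) R_j(f)
    y = np.linalg.solve(Rx, x)                 # R_x^{-1} x without forming the inverse
    return np.einsum('j,jab,b->ja', v, R, y)   # v_j R_j R_x^{-1} x for every source j

rng = np.random.default_rng(0)
J, I = 3, 2
A = rng.standard_normal((J, I, I)) + 1j * rng.standard_normal((J, I, I))
R = A @ np.conj(np.transpose(A, (0, 2, 1)))    # Hermitian positive (semi)definite R_j(f)
v = rng.uniform(0.1, 1.0, J)
x = rng.standard_normal(I) + 1j * rng.standard_normal(I)
c_hat = wiener_filter(x, v, R)
print(np.allclose(c_hat.sum(axis=0), x))       # True
```

The estimated source images always add up to the mixture, since $\sum_j v_j \mathbf{R}_j \mathbf{R}_{\mathbf{x}}^{-1} = \mathbf{I}$; this holds for both the rank-1 and the full-rank model.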
Proposed algorithm
Flow of the BSS algorithm
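$\mathbf{x}(t)$ → STFT → $\mathbf{x}(n,f)$ → Initialization by hierarchical clustering → $\mathbf{h}_j^{\text{init}}(f), \mathbf{R}_j^{\text{init}}(f)$ → Model parameter estimation by EM → $\mathbf{h}_j(f), \mathbf{R}_j(f), v_j(n,f)$ → Permutation alignment → Wiener filtering → $\hat{\mathbf{s}}(n,f)$ → ISTFT → $\hat{\mathbf{s}}(t)$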
In each step, we adapt existing methods designed for the rank-1 model to the proposed full-rank unconstrained model
Parameter initialization [S. Winter et al., EURASIP, vol. 2007]
Principle: perform hierarchical clustering of the mixture STFT coefficients $\mathbf{x}(n,f)$ in each frequency bin after a proper phase and amplitude normalization
Adaptations to our algorithm:
1. $\mathbf{h}_j^{\text{init}}(f)$ and $\mathbf{R}_j^{\text{init}}(f)$ are computed from the phase-normalized STFT coefficients instead of from both phase- and amplitude-normalized coefficients
2. We define the distance between clusters as the average distance between their samples instead of the minimum distance between them
Source variance initialization: $v_j(n,f) = 1, \; \forall j, n, f$
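The initialization can be sketched for one frequency bin as follows. This is a simplified sketch under stated assumptions, not Winter et al.'s exact procedure: phase normalization makes the first channel real and non-negative, amplitude normalization scales each vector to unit norm, clustering uses average linkage (adaptation 2), and $\mathbf{R}_j^{\text{init}}(f)$ is the empirical covariance of the phase-normalized vectors in the $J$ most populated of $K$ clusters (adaptation 1). All function names are hypothetical, and the naive $O(N^3)$ clustering is for illustration only. The experiments use $K = 30$; the toy usage below uses $K = 2$.

```python
import numpy as np

def normalize(X):
    """Phase normalization (first channel real and non-negative), then amplitude normalization."""
    Xp = X * np.exp(-1j * np.angle(X[:, :1]))
    return Xp, Xp / np.linalg.norm(Xp, axis=1, keepdims=True)

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering where the distance between two clusters
    is the average distance between their samples (average linkage)."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        merged = clusters.pop(b)          # b > a, so index a is still valid
        clusters[a].extend(merged)
    return clusters

def init_spatial_covariances(X, J, K=30):
    """R_j^init(f) from the J most populated of K clusters in one frequency bin."""
    Xp, Xn = normalize(X)
    clusters = sorted(average_linkage(Xn, K), key=len, reverse=True)[:J]
    # empirical covariance of the phase-normalized vectors of each selected cluster
    R = np.stack([Xp[idx].T @ Xp[idx].conj() / len(idx) for idx in clusters])
    return R, clusters

rng = np.random.default_rng(0)
H = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))   # two toy mixing vectors
labels = np.repeat([0, 1], 20)                                        # 20 frames per source
s = rng.standard_normal(40) + 1j * rng.standard_normal(40)
noise = 1e-6 * (rng.standard_normal((40, 2)) + 1j * rng.standard_normal((40, 2)))
X = s[:, None] * H[labels] + noise
R_init, clusters = init_spatial_covariances(X, J=2, K=2)
v_init = np.ones((2, X.shape[0]))                                     # v_j(n, f) = 1
```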
EM algorithm
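EM for rank-1 model [C. Fevotte and J.-F. Cardoso, WASPAA 2005]
- The mixing model must include a noise component:
$\mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{h}_j(f)\, s_j(n,f) + \mathbf{b}(n,f)$
Adaptations to the full-rank model
- Apply EM directly to the noiseless mixing model, i.e.
$\mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{c}_j(n,f)$
- Derive alternating parameter update rules (M-step) by maximizing the likelihood of the complete data $\{\mathbf{c}_j(n,f)\}_{j,n}$:
$v_j(n,f) = \frac{1}{I} \operatorname{tr}\big(\mathbf{R}_j^{-1}(f)\, \hat{\mathbf{R}}_{\mathbf{c}_j}(n,f)\big)$
$\mathbf{R}_j(f) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{v_j(n,f)}\, \hat{\mathbf{R}}_{\mathbf{c}_j}(n,f)$
where $\hat{\mathbf{R}}_{\mathbf{c}_j}(n,f)$ is the posterior second-order moment of $\mathbf{c}_j(n,f)$ computed in the E-step and $N$ is the number of time frames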
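A sketch of one EM iteration for the full-rank model in a single frequency bin, following the M-step formulas of this slide. The E-step is written as an assumption since the slide does not spell it out: it uses the standard Gaussian posterior, $\hat{\mathbf{c}}_j = \mathbf{W}_j \mathbf{x}$ with $\mathbf{W}_j = v_j \mathbf{R}_j \mathbf{R}_{\mathbf{x}}^{-1}$ and $\hat{\mathbf{R}}_{\mathbf{c}_j} = \hat{\mathbf{c}}_j \hat{\mathbf{c}}_j^H + (\mathbf{I} - \mathbf{W}_j) v_j \mathbf{R}_j$. The update order ($v_j$ first, then $\mathbf{R}_j$), the toy data, and the function names are also assumptions.

```python
import numpy as np

def em_iteration(X, v, R):
    """One EM iteration of the full-rank model in a single frequency bin.

    X: (N, I) mixture coefficients x(n, f)
    v: (J, N) source variances v_j(n, f)
    R: (J, I, I) spatial covariance matrices R_j(f)
    """
    I = X.shape[1]
    Rc = v[:, :, None, None] * R[:, None]            # (J, N, I, I): v_j(n, f) R_j(f)
    Rx_inv = np.linalg.inv(Rc.sum(axis=0))           # (N, I, I): R_x(n, f)^{-1}
    W = Rc @ Rx_inv                                  # Wiener gains W_j(n, f)
    c_hat = np.einsum('jnab,nb->jna', W, X)          # posterior means of c_j(n, f)
    # E-step: posterior second moment c_hat c_hat^H + (Id - W_j) v_j R_j
    R_hat = (c_hat[..., :, None] * c_hat[..., None, :].conj()
             + (np.eye(I) - W) @ Rc)
    # M-step: v_j(n, f) with the current R_j(f), then R_j(f) with the new v_j(n, f)
    R_inv = np.linalg.inv(R)
    v_new = np.real(np.einsum('jab,jnba->jn', R_inv, R_hat)) / I
    R_new = (R_hat / v_new[:, :, None, None]).mean(axis=1)
    return v_new, R_new

def log_likelihood(X, v, R):
    """Sum over frames of log p(x(n, f)) with x(n, f) ~ N_c(0, sum_j v_j R_j)."""
    Rx = np.einsum('jn,jab->nab', v, R)
    _, logdet = np.linalg.slogdet(Rx)
    quad = np.real(np.einsum('na,nab,nb->n', X.conj(), np.linalg.inv(Rx), X))
    return -np.sum(logdet + quad + X.shape[1] * np.log(np.pi))

rng = np.random.default_rng(0)
J, I, N = 3, 2, 100
X = rng.standard_normal((N, I)) + 1j * rng.standard_normal((N, I))   # toy data, one bin
A = rng.standard_normal((J, I, I)) + 1j * rng.standard_normal((J, I, I))
R = A @ A.conj().transpose(0, 2, 1) + 0.1 * np.eye(I)   # random full-rank initialization
v = np.ones((J, N))                                     # v_j(n, f) = 1
lls = [log_likelihood(X, v, R)]
for _ in range(10):                                     # 10 EM iterations
    v, R = em_iteration(X, v, R)
    lls.append(log_likelihood(X, v, R))
```

Each update maximizes the expected complete-data log-likelihood with the other parameter fixed, so `lls` should be non-decreasing, which is a convenient sanity check for an implementation.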
Permutation alignment [H. Sawada et al., ICASSP 2006]
[Figure: phase of $\mathbf{w}_j(f)$ before and after permutation alignment, $T_{60} = 250$ ms]
Principle: permute the source order based on the estimated source DoAs and the clustered phase-normalized mixing vectors.
Adaptation to the full-rank model: compute the first principal component $\mathbf{w}_j(f)$ of $\mathbf{R}_j(f)$ by PCA, then apply the algorithm to this “equivalent” mixing vector $\mathbf{w}_j(f)$
The order of $v_j(n,f)$ is permuted identically to that of $\mathbf{R}_j(f)$
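A simplified sketch of the full-rank adaptation for two microphones: take the principal eigenvector $\mathbf{w}_j(f)$ of each $\mathbf{R}_j(f)$, estimate a far-field DoA from its inter-microphone phase difference, and sort the sources by DoA, permuting $v_j(n,f)$ in the same way. It only sorts by DoA; the clustering refinement of Sawada et al. is not reproduced. The speed of sound, the steering-vector convention, the absence of spatial aliasing ($f < c/(2d) \approx 3430$ Hz for $d = 0.05$ m), and all function names are assumptions.

```python
import numpy as np

SOUND_SPEED = 343.0  # speed of sound in m/s (assumed)

def principal_vector(R):
    """First principal component of a Hermitian matrix: eigenvector of its largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(R)      # eigenvalues come in ascending order
    return eigvecs[:, -1]

def doa_degrees(w, f, d):
    """Far-field DoA (degrees) from the phase difference of w = [w_1, w_2].

    Assumes h(f) = [1, exp(-2j*pi*f*d*cos(theta)/c)] and no spatial aliasing.
    """
    phase = np.angle(w[1] / w[0])       # the arbitrary global phase of w cancels here
    cos_theta = np.clip(-phase * SOUND_SPEED / (2 * np.pi * f * d), -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def align_bin(R, v, f, d):
    """Sort the sources of one frequency bin by increasing estimated DoA.

    R: (J, 2, 2) spatial covariances R_j(f); v: (J, N) variances v_j(n, f).
    """
    doas = np.array([doa_degrees(principal_vector(Rj), f, d) for Rj in R])
    order = np.argsort(doas)
    return R[order], v[order], doas[order]   # v_j(n, f) follows the order of R_j(f)

def steering(theta_deg, f, d):
    """Toy two-microphone mixing vector for a source at theta_deg (hypothetical helper)."""
    tau = d * np.cos(np.radians(theta_deg)) / SOUND_SPEED
    return np.array([1.0, np.exp(-2j * np.pi * f * tau)])

f, d = 1000.0, 0.05                       # 1 kHz bin, d = 0.05 m
thetas = [120.0, 40.0, 80.0]              # source order scrambled in this bin
R = np.stack([np.outer(steering(t, f, d), steering(t, f, d).conj()) + 0.01 * np.eye(2)
              for t in thetas])
v = np.arange(3.0)[:, None] * np.ones((3, 4))   # row j holds the value j
R_aligned, v_aligned, doas = align_bin(R, v, f, d)
```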
Experiment setup
Geometry setting
[Figure: sources s1, s2, s3 at distance r = 0.5 m from microphones m1 and m2; other labeled dimensions: 1.8 m, 1.5 m]
Room dimensions: 4.45 x 3.35 x 2.5 m
Source and microphone height: 1.4 m
Microphone distance: d = 0.05 m
Reverberation time: 50, 130, 250, 500 ms
Parameter and program settings
Number of stereo mixtures      3
Speech length                  8 s
Sampling rate                  16 kHz
STFT window type               Sine
Window length                  1024 samples (64 ms)
Number of EM iterations        10
Number of clusters K           30
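The STFT front end with these settings can be sketched in NumPy as below. The sine window and the 1024-sample window length come from the table; the hop size of 512 samples (50% overlap) and the function names are assumptions. With a sine window $w[n] = \sin(\pi(n + 0.5)/N)$ used for both analysis and synthesis at 50% overlap, $w^2[n] + w^2[n + N/2] = 1$, so overlap-add reconstructs the signal exactly.

```python
import numpy as np

def sine_window(N):
    """Sine window w[n] = sin(pi (n + 0.5) / N)."""
    return np.sin(np.pi * (np.arange(N) + 0.5) / N)

def stft(x, N=1024):
    """STFT with a sine window and hop N/2 (assumed). Returns (frames, N//2 + 1)."""
    hop, w = N // 2, sine_window(N)
    # pad so that every sample of x is covered by exactly two frames
    pad = np.concatenate([np.zeros(hop), x, np.zeros(hop + (-len(x)) % hop)])
    n_frames = (len(pad) - N) // hop + 1
    frames = np.stack([pad[k * hop:k * hop + N] * w for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, length, N=1024):
    """Inverse STFT by overlap-add with the same sine window."""
    hop, w = N // 2, sine_window(N)
    frames = np.fft.irfft(X, n=N, axis=1) * w
    out = np.zeros(hop * (len(frames) - 1) + N)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + N] += frame
    return out[hop:hop + length]   # drop the leading padding

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)     # 1 s of toy signal at 16 kHz
X = stft(x)
x_rec = istft(X, len(x))
print(X.shape)                     # (33, 513)
```

At 16 kHz, 1024-sample frames give 513 frequency bins spaced 15.625 Hz apart.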
Experimental results
The full-rank model outperforms both the rank-1 model and the baseline approaches in realistic reverberant environments
Conclusion & future work
Contributions:
- Proposed to model the convolutive mixing process by full-rank unconstrained spatial covariance matrices
- Designed model parameter estimation algorithms for the full-rank model by adapting existing estimation methods for the rank-1 model
- Showed that the proposed algorithm using the full-rank unconstrained spatial covariance model outperforms the rank-1 model and baseline approaches in realistic reverberant environments
Current results (in collaboration with S. Arberet and A. Ozerov)
- Combined the proposed full-rank unconstrained covariance model with a nonnegative matrix factorization (NMF) model of the source spectra (to appear in ISSPA, May 2010)
Future work:
- Consider the full-rank unconstrained model in the context of source localization
Thank you for your attention!
Questions and comments?