
Page 1

Full-rank Gaussian modeling of convolutive audio mixtures applied to source separation

Ngoc Q. K. Duong, Supervisors: R. Gribonval and E. Vincent

METISS project team, INRIA, Centre de Rennes - Bretagne Atlantique, France

Nov. 2010.

Page 2

Table of contents


Problem introduction and motivation

Considered framework and contributions

Estimation of model parameters

Conclusion and perspectives

Page 3

Under-determined source separation


Use the I recorded mixture signals $\mathbf{x}(t)$ to separate the J sources $s_j(t)$, $j = 1, \dots, J$, where $I < J$ (fewer microphones than sources).

Convolutive mixing model:

$$\mathbf{x}(t) = \sum_{j=1}^{J} \mathbf{c}_j(t), \qquad \mathbf{c}_j(t) = \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t-\tau)$$

where $\mathbf{x}(t) = [x_1(t), \dots, x_I(t)]^T$ is the vector of mixture signals, $\mathbf{c}_j(t)$ denotes the image of source j, i.e. its contribution to all microphones, and $\mathbf{h}_j(t) = [h_{1j}(t), \dots, h_{Ij}(t)]^T$ is the vector of mixing filters from source j to the microphone array.
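To make the convolutive model concrete, here is a minimal NumPy sketch that builds the source images $\mathbf{c}_j(t)$ and the mixture $\mathbf{x}(t)$ by direct convolution. The function name `convolutive_mixture`, the array layouts and the random, exponentially decaying filters are illustrative assumptions and are not taken from the original work.

```python
import numpy as np

def convolutive_mixture(sources, filters):
    """Simulate x(t) = sum_j c_j(t), with c_ij(t) = sum_tau h_ij(tau) s_j(t - tau).

    sources: (J, T) array of source signals s_j(t)
    filters: (J, I, L) array of mixing filters h_ij(t) from source j to microphone i
    Returns (images, mixture) with shapes (J, I, T) and (I, T).
    """
    J, T = sources.shape
    _, I, _ = filters.shape
    images = np.zeros((J, I, T))
    for j in range(J):
        for i in range(I):
            # Full linear convolution, truncated to the length of the sources
            images[j, i] = np.convolve(filters[j, i], sources[j])[:T]
    mixture = images.sum(axis=0)  # each microphone sums the contributions of all sources
    return images, mixture

# Under-determined example: J = 3 sources, I = 2 microphones, random 64-tap filters
rng = np.random.default_rng(0)
s = rng.standard_normal((3, 16000))
h = rng.standard_normal((3, 2, 64)) * np.exp(-np.arange(64) / 10.0)
c, x = convolutive_mixture(s, h)
print(c.shape, x.shape)  # (3, 2, 16000) (2, 16000)
```

Truncating each convolution to the source length keeps every image time-aligned with $\mathbf{x}(t)$. A realistic simulation would use measured or simulated room impulse responses instead of random filters.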

Page 4

Baseline approaches

Sparsity assumption: only FEW sources are active at each time-frequency point

Binary masking (DUET): only ONE source is active at each time-frequency point

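L1-norm minimization:

$$\hat{\mathbf{s}}(n,f) = \arg\min_{\mathbf{s}(n,f)} \sum_{j=1}^{J} |s_j(n,f)| \quad \text{s.t.} \quad \sum_{j=1}^{J} \mathbf{h}_j(f)\, s_j(n,f) = \mathbf{x}(n,f), \qquad \forall n, f$$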


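STFT with narrowband approximation:

$$\mathbf{x}(t) = \sum_{j=1}^{J} \mathbf{c}_j(t), \quad \mathbf{c}_j(t) = \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t-\tau) \quad \Longrightarrow \quad \mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{c}_j(n,f), \quad \mathbf{c}_j(n,f) \approx \mathbf{h}_j(f)\, s_j(n,f)$$

where n and f index time frames and frequency bins, and $\mathbf{h}_j(f)$ is the frequency response of the mixing filters.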

These techniques remain limited in realistic reverberant environments, where the narrowband approximation does not hold because the mixing filters are longer than the STFT window.
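Binary masking can be illustrated with a minimal NumPy sketch under the narrowband approximation. The sketch assumes the mixing vectors $\mathbf{h}_j(f)$ are known. DUET instead estimates them by clustering inter-channel level and phase differences. At each time-frequency point, the sketch keeps only the source whose single-source least-squares fit explains $\mathbf{x}(n,f)$ best and sets all other sources to zero. The function name and array layouts are hypothetical.

```python
import numpy as np

def binary_masking(X, H):
    """X: (F, N, I) mixture STFT; H: (F, I, J) known mixing vectors h_j(f).

    Returns S_hat: (J, F, N) with exactly one nonzero source per TF point."""
    J = H.shape[2]
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)  # unit-norm mixing vectors
    # |h_j^H x| / ||h_j||: a larger value means a smaller single-source residual
    proj = np.einsum('fij,fni->jfn', Hn.conj(), X)
    active = np.argmax(np.abs(proj), axis=0)  # (F, N): dominant source per TF point
    # Least-squares estimate if source j alone were active: h_j^H x / ||h_j||^2
    power = np.sum(np.abs(H) ** 2, axis=1).T[:, :, None]  # (J, F, 1)
    S_ls = np.einsum('fij,fni->jfn', H.conj(), X) / power
    mask = active[None, :, :] == np.arange(J)[:, None, None]
    return np.where(mask, S_ls, 0.0)
```

Maximizing the normalized projection is equivalent to minimizing the residual $\|\mathbf{x}\|^2 - |\mathbf{h}_j^H\mathbf{x}|^2/\|\mathbf{h}_j\|^2$. Separation is therefore exact only when a single source is truly active and the narrowband model holds.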

Page 5

Considered framework

Model the STFT coefficients of the source images as zero-mean multivariate Gaussian random variables, i.e.

$$\mathbf{c}_j(n,f) \sim \mathcal{N}\big(\mathbf{0}, \mathbf{R}_{\mathbf{c}_j}(n,f)\big), \qquad \mathbf{R}_{\mathbf{c}_j}(n,f) = v_j(n,f)\, \mathbf{R}_j(f)$$

where:

- $v_j(n,f)$ are scalar source variances encoding the spectro-temporal power of the sources

- $\mathbf{R}_j(f)$ are $I \times I$ spatial covariance matrices encoding the spatial position and spatial spread of the sources

Spatial covariance models

Rank-1 model (given by the narrowband assumption):

$$\mathbf{R}_j(f) = \mathbf{h}_j(f)\, \mathbf{h}_j^H(f)$$

Full-rank unconstrained model: the coefficients of $\mathbf{R}_j(f)$ are unrelated a priori. This is the most general possible model and allows more flexible modeling of the mixing process.
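If the sources are also assumed to be independent, the mixture coefficients are zero-mean Gaussian with covariance $\mathbf{R}_{\mathbf{x}}(n,f) = \sum_j v_j(n,f)\,\mathbf{R}_j(f)$. The hypothetical helpers below are a NumPy sketch, not the authors' code. They build a rank-1 spatial covariance, form $\mathbf{R}_{\mathbf{x}}$ at one time-frequency point, and evaluate the circular complex Gaussian log-density $\log \mathcal{N}_c(\mathbf{x}; \mathbf{0}, \mathbf{R}_{\mathbf{x}}) = -I\log\pi - \log\det\mathbf{R}_{\mathbf{x}} - \mathbf{x}^H\mathbf{R}_{\mathbf{x}}^{-1}\mathbf{x}$.

```python
import numpy as np

def rank1_covariance(h):
    """Rank-1 spatial covariance R_j(f) = h_j(f) h_j(f)^H."""
    return np.outer(h, h.conj())

def mixture_covariance(v, R):
    """R_x(n,f) = sum_j v_j(n,f) R_j(f) at one TF point; v: (J,), R: (J, I, I)."""
    return np.einsum('j,jab->ab', v, R)

def complex_gaussian_loglik(x, Rx):
    """log N_c(x; 0, Rx) = -I log(pi) - log det(Rx) - x^H Rx^{-1} x."""
    I = x.shape[0]
    _, logdet = np.linalg.slogdet(Rx)  # logdet is real for Hermitian PD Rx
    quad = np.real(x.conj() @ np.linalg.solve(Rx, x))
    return -I * np.log(np.pi) - logdet - quad

# Example: J = 3 rank-1 sources in I = 2 channels still give an invertible R_x
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
R = np.stack([rank1_covariance(h) for h in H])
v = np.array([1.0, 0.5, 2.0])
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
print(np.linalg.matrix_rank(R[0]), np.linalg.matrix_rank(mixture_covariance(v, R)))  # 1 2
print(complex_gaussian_loglik(x, mixture_covariance(v, R)))
```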

Page 6

Considered framework


Source separation can be achieved in two steps:

1. Model parameters are estimated in the maximum likelihood (ML) sense.

- The Expectation-Maximization (EM) algorithm is a well-known, appropriate choice for ML estimation of this Gaussian mixing model.

2. Source separation by multichannel Wiener filtering:

$$\hat{\mathbf{c}}_j(n,f) = v_j(n,f)\, \mathbf{R}_j(f)\, \mathbf{R}_{\mathbf{x}}^{-1}(n,f)\, \mathbf{x}(n,f), \qquad \mathbf{R}_{\mathbf{x}}(n,f) = \sum_{j=1}^{J} v_j(n,f)\, \mathbf{R}_j(f)$$

Raised issues:

- Parameter initialization for EM

- Permutation alignment (a well-known problem in frequency-domain BSS)
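The multichannel Wiener filter for a single time-frequency point can be sketched in NumPy as follows. The function name and argument layout are hypothetical. The sketch solves a linear system rather than forming $\mathbf{R}_{\mathbf{x}}^{-1}$ explicitly, which is a common numerical choice.

```python
import numpy as np

def wiener_source_images(x, v, R):
    """Multichannel Wiener filter at one TF point.

    x: (I,) mixture STFT; v: (J,) source variances; R: (J, I, I) spatial covariances.
    Returns c_hat: (J, I) with c_hat_j = v_j R_j R_x^{-1} x."""
    Rx = np.einsum('j,jab->ab', v, R)   # mixture covariance R_x = sum_j v_j R_j
    y = np.linalg.solve(Rx, x)          # R_x^{-1} x without an explicit inverse
    return np.einsum('j,jab,b->ja', v, R, y)
```

Because $\sum_j v_j\mathbf{R}_j\mathbf{R}_{\mathbf{x}}^{-1} = \mathbf{I}$, the estimated source images always add up exactly to the observed mixture. This is a convenient sanity check for any implementation.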

Page 7

Proposed algorithm


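Flow of the BSS algorithm

1. STFT: $\mathbf{x}(t) \rightarrow \mathbf{x}(n,f)$

2. Initialization by hierarchical clustering: $\mathbf{h}_j^{\mathrm{init}}(f)$, $\mathbf{R}_j^{\mathrm{init}}(f)$

3. Model parameter estimation by EM: $\mathbf{h}_j(f)$, $\mathbf{R}_j(f)$, $v_j(n,f)$

4. Permutation alignment

5. Wiener filtering: $\hat{\mathbf{s}}(n,f)$

6. ISTFT: $\hat{\mathbf{s}}(n,f) \rightarrow \hat{\mathbf{s}}(t)$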

In each step, we adapt existing methods designed for the rank-1 model to the proposed full-rank unconstrained model.

Page 8

Parameter initialization [S. Winter et al., EURASIP, vol. 2007]


Principle: perform hierarchical clustering of the mixture STFT coefficients $\mathbf{x}(n,f)$ in each frequency bin after proper phase and amplitude normalization.

Adaptations to our algorithm:

1. $\mathbf{h}_j^{\mathrm{init}}(f)$ and $\mathbf{R}_j^{\mathrm{init}}(f)$ are computed from the phase-normalized STFT coefficients instead of from the phase- and amplitude-normalized coefficients.

2. We define the distance between clusters as the average distance between their samples instead of the minimum distance between them.

Source variance initialization:

$$v_j(n,f) = 1, \qquad \forall j, n, f$$
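The initialization for a single frequency bin could be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation of the authors or of Winter et al. The reference microphone used for phase normalization, the rule of keeping the J most populated of the K clusters, and all function names are hypothetical. The sketch does reflect both adaptations listed above. Clusters are merged according to the average distance between their samples, and $\mathbf{R}_j^{\mathrm{init}}(f)$ is computed from the phase-normalized coefficients only.

```python
import numpy as np

def normalize(X):
    """X: (N, I) mixture STFT vectors of one frequency bin.

    Returns the phase-normalized and the phase- and amplitude-normalized vectors."""
    # Phase normalization (assumed reference: first microphone made real, non-negative)
    Xp = X * np.exp(-1j * np.angle(X[:, :1]))
    # Amplitude normalization to unit Euclidean norm
    Xpa = Xp / np.maximum(np.linalg.norm(Xp, axis=1, keepdims=True), 1e-12)
    return Xp, Xpa

def average_linkage(Z, K):
    """Naive agglomerative clustering down to K clusters, using the average
    distance between samples as the distance between clusters."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)  # pairwise distances
    clusters = [[i] for i in range(len(Z))]
    while len(clusters) > K:
        best_d, best_a, best_b = np.inf, 0, 1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        clusters[best_a] += clusters[best_b]  # merge the closest pair (best_a < best_b)
        del clusters[best_b]
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

def init_spatial_covariances(X, J, K):
    """Initial R_j(f) for one frequency bin from the J most populated clusters."""
    Xp, Xpa = normalize(X)
    labels = average_linkage(Xpa, K)  # cluster the fully normalized vectors
    biggest = np.argsort(np.bincount(labels, minlength=K))[::-1][:J]
    # Mean of x x^H over the phase-normalized samples of each selected cluster
    R_init = np.array([Xp[labels == k].T @ Xp[labels == k].conj() / np.sum(labels == k)
                       for k in biggest])
    return R_init, labels
```

The pairwise search has cubic cost in the number of frames. A practical version would use an optimized linkage routine, but the naive loop makes the average-distance criterion explicit.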

Page 9

EM algorithm


EM for rank-1 model [C. Févotte and J.-F. Cardoso, WASPAA 2005]

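- Mixing model: must include a noise component $\mathbf{b}(n,f)$:

$$\mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{h}_j(f)\, s_j(n,f) + \mathbf{b}(n,f)$$

Adaptations to the full-rank model

- Apply EM directly to the noiseless mixing model, i.e.

$$\mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{c}_j(n,f)$$

- Derive an alternating parameter update rule (M-step) by maximizing the likelihood of the complete data $\{\mathbf{c}_j(n,f)\}_{j,n,f}$:

$$v_j(n,f) = \frac{1}{I}\operatorname{tr}\big(\mathbf{R}_j^{-1}(f)\, \hat{\mathbf{R}}_{\mathbf{c}_j}(n,f)\big)$$

$$\mathbf{R}_j(f) = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{v_j(n,f)}\, \hat{\mathbf{R}}_{\mathbf{c}_j}(n,f)$$

where N is the number of time frames and $\hat{\mathbf{R}}_{\mathbf{c}_j}(n,f)$ is the posterior second-order moment of $\mathbf{c}_j(n,f)$ computed in the E-step.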
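One EM iteration for the noiseless full-rank model could be sketched in NumPy as below. The sketch is illustrative, not the authors' code. The E-step is written out from standard Gaussian conditioning because the slide does not show it. With $\mathbf{W}_j = v_j\mathbf{R}_j\mathbf{R}_{\mathbf{x}}^{-1}$, the posterior mean is $\hat{\mathbf{c}}_j = \mathbf{W}_j\mathbf{x}$ and $\hat{\mathbf{R}}_{\mathbf{c}_j} = \hat{\mathbf{c}}_j\hat{\mathbf{c}}_j^H + (\mathbf{I} - \mathbf{W}_j)\,v_j\mathbf{R}_j$. The M-step applies the two update rules above in sequence. Function names and array layouts are hypothetical.

```python
import numpy as np

def log_likelihood(X, v, R):
    """Observed-data log-likelihood sum_{n,f} log N_c(x(n,f); 0, R_x(n,f)).

    X: (F, N, I) mixture STFT, v: (J, F, N) source variances,
    R: (J, F, I, I) spatial covariance matrices."""
    I = X.shape[-1]
    Rx = np.einsum('jfn,jfab->fnab', v, R)
    _, logdet = np.linalg.slogdet(Rx)
    quad = np.einsum('fna,fnab,fnb->fn', X.conj(), np.linalg.inv(Rx), X).real
    return float(np.sum(-I * np.log(np.pi) - logdet - quad))

def em_iteration(X, v, R):
    """One (generalized) EM iteration for the noiseless full-rank model."""
    I = X.shape[-1]
    # E-step: posterior mean and second-order moment of each source image c_j(n,f)
    Rc = np.einsum('jfn,jfab->jfnab', v, R)   # prior covariances v_j R_j
    W = Rc @ np.linalg.inv(Rc.sum(axis=0))    # Wiener gains v_j R_j R_x^{-1}
    c_hat = np.einsum('jfnab,fnb->jfna', W, X)
    Rc_hat = (np.einsum('jfna,jfnb->jfnab', c_hat, c_hat.conj())
              + (np.eye(I) - W) @ Rc)
    # M-step: update v_j(n,f) with R_j(f) fixed, then R_j(f) with the new v_j(n,f)
    v_new = np.einsum('jfab,jfnba->jfn', np.linalg.inv(R), Rc_hat).real / I
    R_new = np.mean(Rc_hat / v_new[..., None, None], axis=2)
    R_new = 0.5 * (R_new + np.conj(np.swapaxes(R_new, -1, -2)))  # remove rounding asymmetry
    return v_new, R_new
```

Each of the two updates maximizes the expected complete-data log-likelihood with the other parameter held fixed. The observed-data log-likelihood therefore should not decrease from one iteration to the next, which is a useful check when debugging.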

Page 10

Permutation alignment [H. Sawada et al., ICASSP 2006]


[Figure: phase of $w_{2j}(f)\, e^{-i \arg(w_{1j}(f))}$ before and after permutation alignment, with $T_{60} = 250$ ms]

Principle: permute the source order in each frequency bin based on the estimated source directions of arrival (DoAs) and the clustered phase-normalized mixing vectors.

Adaptation to the full-rank model: compute the first principal component $\mathbf{w}_j(f)$ of $\mathbf{R}_j(f)$ by PCA, then apply the algorithm to this "equivalent" mixing vector $\mathbf{w}_j(f)$. The order of $v_j(n,f)$ is permuted identically to that of $\mathbf{R}_j(f)$.
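The PCA step and the joint permutation of $\mathbf{R}_j(f)$ and $v_j(n,f)$ can be illustrated with a simplified NumPy sketch. The sketch does not reproduce Sawada et al.'s full method. It assumes two microphones, a far-field source, a speed of sound of 343 m/s, and frequencies below the spatial-aliasing limit $c/(2d)$, which is about 3.4 kHz for $d = 0.05$ m. Under these assumptions it simply sorts the sources by their estimated DoA in each bin. The clustering of normalized vectors in the original method is what handles higher frequencies. All names are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)

def equivalent_mixing_vector(R):
    """First principal component of a Hermitian spatial covariance matrix."""
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    return eigvecs[:, -1]           # eigenvector of the largest eigenvalue

def doa_from_vector(w, f, d):
    """Far-field DoA (radians) from the inter-microphone phase of a 2-element vector.
    Model assumed here: w2 / w1 = exp(-2j * pi * f * d * cos(theta) / C)."""
    phase = np.angle(w[1] / w[0])   # invariant to the arbitrary phase of w
    cos_theta = -phase * C / (2 * np.pi * f * d)
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def align_permutations(R, v, freqs, d):
    """R: (J, F, 2, 2), v: (J, F, N). Sort the sources by estimated DoA in every bin."""
    R_out, v_out = R.copy(), v.copy()
    J, F = R.shape[:2]
    for f in range(F):
        doas = [doa_from_vector(equivalent_mixing_vector(R[j, f]), freqs[f], d)
                for j in range(J)]
        order = np.argsort(doas)
        R_out[:, f] = R[order, f]   # permute the spatial covariances...
        v_out[:, f] = v[order, f]   # ...and the source variances identically
    return R_out, v_out
```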

Page 11

Experiment setup

Geometry setting

[Figure: room layout with sources s1, s2, s3 and microphones m1, m2; labeled distances r = 0.5 m, 1.8 m and 1.5 m]

Source and microphone height: 1.4 m

Room dimensions: 4.45 x 3.35 x 2.5 m

Microphone distance: d = 0.05 m

Reverberation time T60: 50, 130, 250, 500 ms

Parameter and program settings

Number of stereo mixtures: 3

Speech length: 8 s

Sampling rate: 16 kHz

STFT window type: sine

Window length: 1024 samples

Number of EM iterations: 10

Number of clusters K: 30
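A quick calculation shows what these STFT settings imply. The slides do not give the hop size, so the sketch below assumes 50% overlap. Under that assumption, a sine window used for both analysis and synthesis has squared overlapping halves that sum to one, which permits perfect reconstruction. The 64 ms window is also shorter than most of the tested reverberation times, roughly the regime in which the narrowband approximation breaks down.

```python
import numpy as np

fs = 16000      # sampling rate (Hz), from the settings above
N = 1024        # STFT window length (samples), from the settings above
hop = N // 2    # assumption: 50% overlap (not stated in the slides)

n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)  # sine window

window_ms = 1000 * N / fs  # window duration in ms
bin_spacing_hz = fs / N    # frequency resolution in Hz
n_bins = N // 2 + 1        # non-negative frequency bins of a real signal

# w[n + N/2] = cos(pi * (n + 0.5) / N), so the two overlapping squared halves sum to 1
overlap_sum = w[:hop] ** 2 + w[hop:] ** 2
print(window_ms, bin_spacing_hz, n_bins, np.allclose(overlap_sum, 1.0))  # 64.0 15.625 513 True
```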

Page 12

Experimental results

The full-rank model outperforms both the rank-1 model and the baseline approaches in realistic reverberant environments.


Page 13

Conclusion & future work


Contributions

- Proposed to model the convolutive mixing process by full-rank unconstrained spatial covariance matrices

- Designed parameter estimation algorithms for the full-rank model by adapting existing estimation methods for the rank-1 model

- Showed that the proposed algorithm using the full-rank unconstrained spatial covariance model outperforms state-of-the-art approaches in reverberant conditions

Current results (in collaboration with S. Arberet and A. Ozerov)

- Combined the proposed full-rank unconstrained covariance model with a nonnegative matrix factorization (NMF) model of the source spectra (to appear in ISSPA, May 2010)

Future work

- Consider the full-rank unconstrained model in the context of source localization

Page 14

Thanks for your attention!

Your comments?
