International Journal of Mathematics and Computer Applications Research (IJMCAR)
ISSN 2249-6955, Vol. 2, Issue 4, Dec 2012, 11-24
TJPRC Pvt. Ltd.
SPARSE NONNEGATIVE MATRIX BASED ON αβ-DIVERGENCE FOR SINGLE CHANNEL SEPARATION IN COCHLEAGRAM
M. E. ABD EL AZIZ & WAEL KIDER
Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
ABSTRACT
In this paper, a novel family of αβ-divergence based two-dimensional nonnegative matrix factorization methods is
proposed to solve the single channel blind source separation (SCBSS) problem. The cochleagram-based separation system
and the family of αβ-divergence based factorization algorithms have been developed in a principled manner, coupled with
the theoretical support of audio signal separability. The proposed method enjoys at least two significant advantages.
Firstly, the cochleagram rendered by the gammatone filterbank has non-uniform time-frequency resolution, which makes the
mixed signal more separable and improves the efficiency of source tracking. Secondly, the αβ-divergence has the
desirable property of scale invariance, which lets low-energy components in the cochleagram bear the same relative
importance as high-energy ones. We compare our system to the Factorial SC and SNMF2D models; the proposed algorithm
shows superior performance in terms of signal-to-interference ratio. Finally, the low computational requirements of the
algorithm allow close to real-time applications.
KEYWORDS: Blind Signal Separation (BSS), Nonnegative Matrix Factorization (NMF), αβ-Divergence, αβ-NMF,
Single Channel Source Separation (SCSS)
INTRODUCTION
Single channel source separation (SCSS) aims to extract several source signals from a single mixture recording.
Since at least two sources interfere and sound sources may overlap in time, standard source separation methods such as
ICA (Hyvarinen et al 2001) cannot be applied. The standard NMF or SNMF models (Schmidt et al 2006) are only
satisfactory for source separation provided that the spectral frequencies do not change over time. The recent SNMF2D
model (Gao et al 2011) addresses this limitation of SNMF by optimizing the spectral dictionary and the temporal code
under the Kullback-Leibler divergence, relying on the fact that the sources rarely interfere in a time-frequency
representation. This fact has been used in computational auditory scene analysis (Wang et al 2006, Brown 1994), which is
inspired by the human ability to organize the perceived time-frequency representation according to likely sources.
However, SNMF2D has drawbacks that originate from its lack of a generalized criterion for controlling the sparsity.
Roweis (Roweis 2003) introduced the refiltering framework, which uses so-called spectrogram masks to attenuate the
spectrogram parts that do not belong to the desired sources. To estimate these masks, he proposed the factorial-max
vector quantizer (VQ) model, which assumes that the log-magnitude source spectrograms are generated by vector quantizers
plus a noise term. To train speaker-specific codebooks and to estimate the noise variances, he applied k-means to
source-specific spectrograms. Hence, max-VQ explicitly models the sources in a training stage. The factorial-max VQ
model can be extended by replacing the vector quantizers with sparse coders (Peharz 2010). A sparse coder can be seen as
a generalization of a vector quantizer, since it represents data with a linear combination of up to $L$ so-called atoms
($L$ being a parameter to choose), while a vector quantizer uses a single, non-scalable codeword. In order to train
speaker-specific dictionaries,
it uses a non-negative matrix factorization algorithm with ℓ0-sparseness constraints on the coefficient matrix (NMFℓ0).
The sparse coder model suffers from some drawbacks: it is affected by outliers and noise because it uses the Euclidean
distance, and it relies on the STFT, which produces errors especially when complicated transient phenomena, such as the
mixing of speech and music, occur in the analysed signal.
The aim of this work is to remedy these drawbacks. We formulate a single channel NMF model that accounts for
convolutive mixing and can be seen as a generalization of (Peharz 2010): it uses the αβ-NMF algorithm, which is robust
with respect to noise and/or outliers in single channel convolutive mixtures. The source cochleagrams are modeled
through NMF, and the mixing filters serve to identify the elementary components pertaining to each source.
The remainder of this paper is organized as follows. The single channel NMF model is introduced in Section 2.
Section 3 is devoted to the Factorial Sparse Coder algorithm. Section 4 gives the definition of the αβ-divergence.
Section 5 presents the estimation of the spectral basis and temporal code. Section 6 presents the results of applying
our algorithm to source separation in various settings. Conclusions are drawn in Section 7.
SINGLE CHANNEL NMF MODEL
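We consider a sampled signal $x(t)$ generated as a convolutive noisy mixture of $J = 2$ point source signals $s_j(t)$
such that
$$x(t) = \sum_{j=1}^{J} \sum_{\tau} a_j(\tau)\, s_j(t-\tau) + b(t) \tag{1}$$
where $b(t)$ is additive noise. The time-domain mixing given by (1) can be approximated in the short-time Fourier
transform (STFT) domain as:
$$x_{fn} = \sum_{j=1}^{J} a_{j,f}\, s_{j,fn} + b_{fn} \tag{2}$$
where $x_{fn}$, $s_{j,fn}$ and $b_{fn}$ are the complex-valued STFTs of the corresponding time signals,
$f = 1, \ldots, F$ is a frequency bin index and $n = 1, \ldots, N$ is a time frame index. Equation (2) can be rewritten
in matrix form:
$$x_{fn} = \mathbf{a}_f^{T} \mathbf{s}_{fn} + b_{fn} \tag{3}$$
with $\mathbf{a}_f = [a_{1,f}, \ldots, a_{J,f}]^{T}$ and $\mathbf{s}_{fn} = [s_{1,fn}, \ldots, s_{J,fn}]^{T}$.
We used NMF to model the power spectrogram $|S_j|^2 = [\,|s_{j,fn}|^2\,]_{fn}$ of source $j$ as a product of two
nonnegative matrices $W_j$ and $H_j$, such that
$$|S_j|^2 \approx W_j H_j \tag{4}$$
The 3D representation of the matrices $W$ and $H$ is presented in Figure 1.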
Figure 1: (A) Frontal Slice 3D Representation (B) Vertical and Horizontal Slice 3D Representation
FACTORIAL SPARSE CODER ALGORITHM
In this section we describe the Factorial Sparse Coder model (Factorial SC) (Peharz 2010). It trains the dictionaries
for the sparse coders with a method similar to the K-SVD algorithm, which consists of two stages. For the sparse coding
stage it uses non-negative matching pursuit (NMP), a non-negative variant of OMP. In the dictionary update step it
uses several iterations of the nonnegative matrix factorization (NMF) updates proposed by Lee and Seung (Lee et al
2001). The Factorial Sparse Coder model reformulates equation (4) as
$$\mathbf{x} \approx \sum_{j=1}^{J} W_j(:, \mathbf{i}_j)\, \mathbf{h}_j$$
where $W_j$ is a source-specific dictionary, $\mathbf{h}_j$ is the corresponding coefficient vector and $\mathbf{i}_j$
is an index vector indicating the selected atoms. The Factorial SC algorithm is summarized in Algorithm 1.
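A solution is defined as a triplet $(\mathbf{i}, \mathbf{h}, \mathbf{r})$, where $\mathbf{i}$ contains the indices of
the atoms selected out of $W$, $\mathbf{h}$ holds the corresponding coefficients and $\mathbf{r}$ is the residual. The
set of all solutions is denoted as $S$. Starting with a single trivial solution $(\emptyset, \emptyset, \mathbf{x})$,
in every iteration each solution is extended with up to $B$ atoms, selected by the function selectBestAtoms. In
selectBestAtoms, the inner products $\mathbf{c} = W^{T}\mathbf{r}$ are calculated. Atoms with negative values in
$\mathbf{c}$, and atoms that would make the prior probability zero, are discarded, where the prior probabilities are
calculated according to the original dictionaries $W_1$ and $W_2$ as
$$p(i_1, \ldots, i_L) = p(i_1) \prod_{l=2}^{L} p(i_l \mid i_{l-1}) \tag{5}$$
where the factors $p(i)$ and $p(i \mid k)$ can be estimated from the coefficient matrix returned by NMF (Peharz 2010).
When $R$ is the number of remaining atoms, the $\min(R, B)$ atoms with the largest values in $\mathbf{c}$ are selected.
The inner products and the indices of the selected atoms are returned in the vectors $\mathbf{c}$ and $\mathbf{i}$. In
lines 10-12, we perform NMF updates for the coefficient vector $\mathbf{h}$, which approximate the solution of
equation (6).
$$\hat{\mathbf{h}} = \arg\min_{\mathbf{h}} \left\| \mathbf{x} - W(:, \mathbf{i})\, \mathbf{h} \right\|, \quad \text{s.t. } \mathbf{h} \ge 0 \tag{6}$$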
Continuing in this manner, the solution set comprises up to $B^{l}$ solutions in the $l$-th iteration. After $P - 1$
iterations, the algorithm starts to prune the solution set to the best $M$ solutions in every iteration, i.e. it
selects the solutions with the highest posterior (7), where the probabilities $p(\mathbf{i}_1)$ and $p(\mathbf{i}_2)$
are evaluated according to the original dictionaries.
$$p(\mathbf{i}_1, \mathbf{i}_2 \mid \mathbf{x}) \propto p(\mathbf{x} \mid \mathbf{i}_1, \mathbf{i}_2)\, p(\mathbf{i}_1)\, p(\mathbf{i}_2) \tag{7}$$
The scale factors of the Laplacian likelihood can be estimated from the residual error in the training stage. When the
algorithm has stopped, it selects the solution with maximal posterior out of the final solution set and builds the
coefficient matrix $H$, which is split according to the original dictionaries: $H = [H_1; H_2]$. The approximations of
the source spectrograms are then given as $\hat{S}_j = W_j H_j$. It calculates a mask for each source according to
$M_j = \hat{S}_j / (\hat{S}_1 + \hat{S}_2)$, $j = 1, 2$. Finally, approximations of the source signals are given by the
inverse short-time Fourier transform (ISTFT) of the masked mixture: $\hat{s}_j = \mathrm{ISTFT}(M_j \odot X)$, where
$X$ is the original complex mixture spectrogram.
Algorithm 1: Factorial SC
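1. $S \leftarrow \{(\emptyset, \emptyset, \mathbf{x})\}$
2. for l = 1:L
3.     $S' \leftarrow \emptyset$
4.     for s = 1:|S|
5.         $(\mathbf{i}, \mathbf{h}, \mathbf{r}) \leftarrow S(s)$
6.         $[\mathbf{c}, \mathbf{k}] \leftarrow$ selectBestAtoms($W$, $\mathbf{r}$, $\mathbf{i}$, $B$)
7.         for b = 1:|$\mathbf{k}$|
8.             $\mathbf{i}' \leftarrow [\mathbf{i}, k_b]$
9.             $\mathbf{h}' \leftarrow [\mathbf{h}, c_b]$
10.            for j = 1:J
11.                $\mathbf{h}' \leftarrow \mathbf{h}' \odot (W(:, \mathbf{i}')^{T} \mathbf{x}) \oslash (W(:, \mathbf{i}')^{T} W(:, \mathbf{i}')\, \mathbf{h}')$
12.            endfor
13.            $\mathbf{r}' \leftarrow \mathbf{x} - W(:, \mathbf{i}')\, \mathbf{h}'$
14.            $S' \leftarrow S' \cup \{(\mathbf{i}', \mathbf{h}', \mathbf{r}')\}$
15.        endfor
16.    endfor
17.    $S \leftarrow S'$
18.    if $l \ge P$ then
19.        Prune $S$ to the best $M$ solutions
20.    endif
21. endfor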
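The masking and refiltering step described before Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function names `soft_masks` and `apply_masks`, the small constant `eps` and the random toy matrices are assumptions introduced here, and the final ISTFT is left out.

```python
import numpy as np

def soft_masks(source_estimates, eps=1e-12):
    """Ratio masks M_j = S_hat_j / sum_k S_hat_k, computed element-wise."""
    total = np.sum(source_estimates, axis=0) + eps  # eps avoids division by zero
    return [S_hat / total for S_hat in source_estimates]

def apply_masks(X, masks):
    """Refiltering: multiply the complex mixture representation by each mask."""
    return [M * X for M in masks]

# Toy example with random nonnegative factors (illustrative values only).
rng = np.random.default_rng(0)
F, N, K = 6, 5, 3
W1, W2 = rng.random((F, K)), rng.random((F, K))
H1, H2 = rng.random((K, N)), rng.random((K, N))
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

masks = soft_masks([W1 @ H1, W2 @ H2])
Y1, Y2 = apply_masks(X, masks)  # masked mixtures, one per source
```

Because the two masks sum to one in every time-frequency unit, the masked mixtures add back up to the original mixture, so no energy is invented or lost by the refiltering itself.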
Since this algorithm works in the STFT domain, it has some drawbacks. The classical spectrogram computed by the STFT
has equally spaced bandwidth across all frequency channels. Speech signals are highly non-stationary and non-periodic,
whereas music changes continuously; therefore, application of the Fourier transform produces errors, especially when
complicated transient phenomena such as the mixing of speech and music occur in the analysed signal. Unlike the
spectrogram, the log-frequency spectrogram possesses non-uniform TF resolution. However, it does not exactly match the
nonlinear resolution of the cochlea, since its centre frequencies are distributed logarithmically along the frequency
axis and all filters have a constant Q factor (Brown 1991).
On the other hand, the gammatone filters used in the cochlear model are approximately logarithmically spaced
with constant Q for frequencies from $f_s/10$ to $f_s/2$ and approximately linearly spaced for frequencies below
$f_s/10$, where $f_s$ is the sampling frequency. Hence, this characteristic results in selective non-uniform resolution
in the TF representation of the analysed audio signal.
The gammatone filterbank was previously proposed in (Hu et al 2007, Jin et al 2009) as a model of cochlear filtering,
which decomposes the time-domain input into the frequency domain. The impulse response of a gammatone filter centred at
frequency $f$ is given by:
$$g(f, t) = \begin{cases} t^{h-1} e^{-2\pi b t} \cos(2\pi f t), & t \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
where $h$ denotes the order of the filter and $b$ represents the rectangular bandwidth, which increases as the centre
frequency $f$ increases. With regard to a particular filter channel $c$, let $f_c$ be the centre frequency. Then, the
filter output response $x(c, t)$ can be expressed as:
$$x(c, t) = x(t) * g(f_c, t) \tag{9}$$
where $*$ represents convolution. The response is shifted backwards by $(h-1)/(2\pi b)$ to compensate for the filter
delay. The output of each filter channel is divided into time frames with 50% overlap between consecutive frames (Hu
et al 2005). The resulting outputs form the time-frequency spectra, which are then assembled into the cochleagram.
The use of the gammatone filter is consistent with the neurobiological modelling perspective. Figure 2 shows an
example of the frequency response for different types of transform.
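The filtering and framing described above can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the bandwidth is taken as $b = 1.019\,\mathrm{ERB}(f_c)$ with the Glasberg-Moore ERB formula, the impulse response is truncated to 64 ms and peak-normalized, each cochleagram cell holds the frame energy, and the delay compensation is omitted. None of these choices is confirmed by the paper, and the function names are hypothetical.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, an assumption)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, order=4, duration=0.064):
    """Sampled impulse response t^(h-1) exp(-2 pi b t) cos(2 pi fc t) for t >= 0."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)  # assumed bandwidth scaling
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))  # peak normalization (assumption)

def cochleagram(x, fs, centre_freqs, frame_ms=20.0):
    """Frame energies of each gammatone channel, 50% overlap between frames."""
    frame = int(fs * frame_ms / 1000.0)
    hop = frame // 2
    n_frames = 1 + (len(x) - frame) // hop
    C = np.zeros((len(centre_freqs), n_frames))
    for c, fc in enumerate(centre_freqs):
        y = np.convolve(x, gammatone_ir(fc, fs))[:len(x)]  # channel output x(c, t)
        for n in range(n_frames):
            seg = y[n * hop:n * hop + frame]
            C[c, n] = np.sum(seg ** 2)
    return C

# A 1 kHz tone should excite the channel centred at 1 kHz the most.
fs = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(int(0.5 * fs)) / fs)
C = cochleagram(x, fs, centre_freqs=[250.0, 1000.0, 3000.0])
```

A full filterbank would use many more channels (128 in the experiments reported later) with centre frequencies spaced on an ERB-like scale; three channels are used here only to keep the example small.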
Working in the cochleagram domain thus overcomes the limitations of the STFT. In the next section, the αβ-divergence is
introduced to address the sensitivity to outliers and noise caused by the Euclidean distance.
Figure 2: Different Types of Transform (A) Original Source (B) Cochleagram (C) Spectrum (D) Log-Spectrum
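αβ-DIVERGENCE
The αβ-divergence (Cichocki et al 2011) can be defined as:
$$D_{AB}^{(\alpha,\beta)}(P \| Q) = -\frac{1}{\alpha\beta} \sum_{i,t} \left( p_{it}^{\alpha} q_{it}^{\beta} - \frac{\alpha}{\alpha+\beta}\, p_{it}^{\alpha+\beta} - \frac{\beta}{\alpha+\beta}\, q_{it}^{\alpha+\beta} \right), \quad \alpha, \beta, \alpha+\beta \neq 0 \tag{10}$$
By a suitable choice of the $(\alpha, \beta)$ parameters, this divergence simplifies into some existing divergences,
including the well-known Alpha- and Beta-divergences. For example, when $\alpha + \beta = 1$ the αβ-divergence reduces
to the Alpha-divergence (Cichocki et al 2009):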
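$$D_{AB}^{(\alpha,1-\alpha)}(P \| Q) = D_{A}^{(\alpha)}(P \| Q) = \frac{1}{\alpha(\alpha-1)} \sum_{i,t} \left( p_{it}^{\alpha} q_{it}^{1-\alpha} - \alpha\, p_{it} + (\alpha-1)\, q_{it} \right), \quad \alpha \neq 0, 1 \tag{11}$$
On the other hand, when $\alpha = 1$, it reduces to the Beta-divergence (Cichocki et al 2010):
$$D_{AB}^{(1,\beta)}(P \| Q) = D_{B}^{(\beta)}(P \| Q) = -\frac{1}{\beta} \sum_{i,t} \left( p_{it}\, q_{it}^{\beta} - \frac{1}{1+\beta}\, p_{it}^{1+\beta} - \frac{\beta}{1+\beta}\, q_{it}^{1+\beta} \right), \quad \beta, 1+\beta \neq 0 \tag{12}$$
The αβ-divergence also reduces to the standard Itakura-Saito divergence for $\alpha = 1$ and $\beta = -1$ (Lee et al 2001):
$$D_{AB}^{(1,-1)}(P \| Q) = D_{IS}(P \| Q) = \sum_{i,t} \left( \ln\frac{q_{it}}{p_{it}} + \frac{p_{it}}{q_{it}} - 1 \right) \tag{13}$$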
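The definition (10) and its special cases (11)-(13) can be checked numerically. The sketch below is an illustration with hypothetical function names. It evaluates the general formula at $\alpha + \beta = 1$ and at $\alpha = 1$, and approaches the Itakura-Saito case with $\beta$ slightly above $-1$, since (10) itself is singular at $\alpha + \beta = 0$.

```python
import numpy as np

def ab_divergence(P, Q, alpha, beta):
    """AB-divergence of Eq. (10); needs alpha, beta and alpha + beta nonzero."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    s = alpha + beta
    terms = P**alpha * Q**beta - alpha / s * P**s - beta / s * Q**s
    return -np.sum(terms) / (alpha * beta)

def alpha_divergence(P, Q, alpha):
    """Alpha-divergence, Eq. (11)."""
    return np.sum(P**alpha * Q**(1 - alpha) - alpha * P + (alpha - 1) * Q) / (alpha * (alpha - 1))

def beta_divergence(P, Q, beta):
    """Beta-divergence, Eq. (12)."""
    terms = P * Q**beta - P**(1 + beta) / (1 + beta) - beta * Q**(1 + beta) / (1 + beta)
    return -np.sum(terms) / beta

def is_divergence(P, Q):
    """Itakura-Saito divergence, Eq. (13)."""
    return np.sum(np.log(Q / P) + P / Q - 1.0)

rng = np.random.default_rng(0)
P = rng.random(20) + 0.5   # strictly positive toy data
Q = rng.random(20) + 0.5

d_alpha = ab_divergence(P, Q, 0.5, 0.5)          # alpha + beta = 1
d_beta = ab_divergence(P, Q, 1.0, 0.5)           # alpha = 1
d_is = ab_divergence(P, Q, 1.0, -1.0 + 1e-6)     # close to the IS limit
```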
We use the αβ-divergence for several reasons given in (Cichocki et al 2011), which illustrates the role of the
hyper-parameters $\alpha$ and $\beta$ in the robustness of the αβ-divergence with respect to errors and noise, and
compares the behaviour of the αβ-divergence with the standard Kullback-Leibler divergence. In addition, scaling the
arguments of the αβ-divergence by a positive factor $c > 0$ yields the following relation:
$$D_{AB}^{(\alpha,\beta)}(cP \| cQ) = c^{\alpha+\beta}\, D_{AB}^{(\alpha,\beta)}(P \| Q) \tag{14}$$
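The scaling relation (14) follows directly from the fact that every term of the αβ-divergence is homogeneous of degree $\alpha + \beta$. A short numerical check, with illustrative parameter values and a hypothetical helper name, might look like this:

```python
import numpy as np

def ab_divergence(P, Q, alpha, beta):
    """AB-divergence for alpha, beta and alpha + beta all nonzero."""
    s = alpha + beta
    terms = P**alpha * Q**beta - alpha / s * P**s - beta / s * Q**s
    return -np.sum(terms) / (alpha * beta)

rng = np.random.default_rng(1)
P = rng.random(50) + 0.1
Q = rng.random(50) + 0.1
alpha, beta, c = 0.7, 0.8, 3.0   # illustrative values only

lhs = ab_divergence(c * P, c * Q, alpha, beta)
rhs = c ** (alpha + beta) * ab_divergence(P, Q, alpha, beta)
```

The factor $c^{\alpha+\beta}$ tends to 1 as $\alpha + \beta$ approaches 0, which is the regime in which the divergence stops depending on the overall scale of its arguments.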
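These basic properties imply that whenever $\alpha \neq 0$, we can rewrite the αβ-divergence in terms of a
$\beta/\alpha$-order Beta-divergence combined with an $\alpha$-zoom of its arguments as
$$D_{AB}^{(\alpha,\beta)}(P \| Q) = \frac{1}{\alpha^{2}}\, D_{AB}^{(1,\beta/\alpha)}\left( P^{.[\alpha]} \,\|\, Q^{.[\alpha]} \right) \tag{15}$$
where $P^{.[\alpha]}$ denotes the element-wise power of $P$.
Estimation of the Spectral Basis and Temporal Code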
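To use the αβ-divergence, our objective function is:
$$D(V \| \hat{V}) = \sum_{f,n} d(v_{fn} \,\|\, \hat{v}_{fn}) \tag{16}$$
where $d(x \| y)$ is the scalar αβ-divergence, i.e. the summand of (10), and $\hat{V}$ is the structure defined by:
$$\hat{V} = W H \tag{17}$$
Let $\theta$ be a scalar parameter of the set $\{W, H\}$. The derivative of $D$ w.r.t. $\theta$ is:
$$\nabla_{\theta} D(V \| \hat{V}) = \sum_{f,n} \nabla_{\theta} \hat{v}_{fn}\; d'(v_{fn} \,\|\, \hat{v}_{fn}) \tag{18}$$
where $d'(x \| y)$ is the derivative of $d(x \| y)$ w.r.t. $y$, given by
$$d'(x \| y) = -\frac{1}{\alpha} \left( x^{\alpha} y^{\beta-1} - y^{\alpha+\beta-1} \right) \tag{19}$$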
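The gradient of the αβ-divergence can be expressed in a compact form (for any $\alpha$, $\beta$) in terms of the
$(1-\alpha)$ deformed logarithm $\ln_{1-\alpha}(z) = (z^{\alpha} - 1)/\alpha$, since
$d'(x \| y) = -y^{\alpha+\beta-1} \ln_{1-\alpha}(x/y)$. By using (18), we obtain the following derivatives:
$$\frac{\partial D(V \| \hat{V})}{\partial w_{fk}} = -\frac{1}{\alpha} \sum_{n} h_{kn} \left( v_{fn}^{\alpha}\, \hat{v}_{fn}^{\beta-1} - \hat{v}_{fn}^{\alpha+\beta-1} \right) \tag{20}$$
$$\frac{\partial D(V \| \hat{V})}{\partial h_{kn}} = -\frac{1}{\alpha} \sum_{f} w_{fk} \left( v_{fn}^{\alpha}\, \hat{v}_{fn}^{\beta-1} - \hat{v}_{fn}^{\alpha+\beta-1} \right) \tag{21}$$
The previous equations can be written in the following matrix form:
$$\nabla_{W} D(V \| \hat{V}) = -\frac{1}{\alpha} \left( V^{.[\alpha]} \odot \hat{V}^{.[\beta-1]} - \hat{V}^{.[\alpha+\beta-1]} \right) H^{T} \tag{22}$$
$$\nabla_{H} D(V \| \hat{V}) = -\frac{1}{\alpha}\, W^{T} \left( V^{.[\alpha]} \odot \hat{V}^{.[\beta-1]} - \hat{V}^{.[\alpha+\beta-1]} \right) \tag{23}$$
So the update rules for both $W$ and $H$ in matrix form are
$$W \leftarrow W \odot \left( \left( (V^{.[\alpha]} \odot \hat{V}^{.[\beta-1]})\, H^{T} \right) \oslash \left( \hat{V}^{.[\alpha+\beta-1]} H^{T} \right) \right)^{.[1/\alpha]} \tag{24}$$
$$H \leftarrow H \odot \left( \left( W^{T} (V^{.[\alpha]} \odot \hat{V}^{.[\beta-1]}) \right) \oslash \left( W^{T} \hat{V}^{.[\alpha+\beta-1]} \right) \right)^{.[1/\alpha]} \tag{25}$$
where $\odot$ and $\oslash$ denote element-wise multiplication and division and $X^{.[a]}$ denotes the element-wise power.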
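The multiplicative updates for $W$ and $H$ can be sketched as below. This is a sketch under assumptions rather than the authors' implementation: it assumes the plain structure $\hat{V} = WH$, uses the multiplicative rule with the element-wise exponent $1/\alpha$ from (Cichocki et al 2011), and adds a small constant `eps` for numerical safety. The function names, the default parameter values and the toy data are hypothetical.

```python
import numpy as np

def ab_divergence(P, Q, alpha, beta):
    """AB-divergence for alpha, beta and alpha + beta all nonzero."""
    s = alpha + beta
    terms = P**alpha * Q**beta - alpha / s * P**s - beta / s * Q**s
    return -np.sum(terms) / (alpha * beta)

def ab_nmf(V, K, alpha=1.0, beta=0.5, n_iter=200, seed=0, eps=1e-12):
    """Factorize V ~ W @ H with multiplicative AB-divergence updates."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    Va = V ** alpha
    for _ in range(n_iter):
        Vh = W @ H + eps
        num = (Va * Vh ** (beta - 1)) @ H.T
        den = (Vh ** (alpha + beta - 1)) @ H.T + eps
        W *= (num / den) ** (1.0 / alpha)          # update of the spectral basis
        Vh = W @ H + eps
        num = W.T @ (Va * Vh ** (beta - 1))
        den = W.T @ (Vh ** (alpha + beta - 1)) + eps
        H *= (num / den) ** (1.0 / alpha)          # update of the temporal code
    return W, H

# Toy data: a strictly positive, nearly rank-3 matrix.
rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 10)) + 0.1
W_first, H_first = ab_nmf(V, K=3, n_iter=1)
W_fit, H_fit = ab_nmf(V, K=3, n_iter=200)
d_first = ab_divergence(V, W_first @ H_first, 1.0, 0.5)
d_fit = ab_divergence(V, W_fit @ H_fit, 1.0, 0.5)
```

With $\alpha = 1$ the exponent disappears and the rule coincides with the familiar multiplicative update for the Beta-divergence; the defaults above were chosen in that regime because the divergence is then known to decrease from one iteration to the next.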
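Finally, we summarize our algorithm (which we call αβ-FSC) as follows.
Algorithm 2: αβ-FSC
Input: mixture signal $x(t)$
Output: source estimates $\hat{s}_1$, $\hat{s}_2$
1. $X \leftarrow$ cochleagram($x$)
2. $W$, $H \leftarrow$ αβ-NMF(training cochleagrams)
3. Estimate $p(i)$ and $p(i \mid k)$ from the coefficient matrix.
4. $\hat{s}_1$, $\hat{s}_2 \leftarrow$ Factorial SC($X$, $W$, $p(i)$, $p(i \mid k)$)   % replace line 11 by the αβ update (25) of the coefficient vector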
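Algorithm 3: αβ-NMF
1: Initialize $W$ randomly
2: for $i = 1:I$ do
3:     sparsely code $V$ with $W$ using αβ-NMP, giving $H$
4:     for $j = 1:J$ do
5:         $W \leftarrow W \odot \left( \left( (V^{.[\alpha]} \odot \hat{V}^{.[\beta-1]})\, H^{T} \right) \oslash \left( \hat{V}^{.[\alpha+\beta-1]} H^{T} \right) \right)^{.[1/\alpha]}$
6:         $\mathbf{w}_k \leftarrow \mathbf{w}_k / \|\mathbf{w}_k\|$, $k = 1, \ldots, K$
7:         $H \leftarrow H \odot \left( \left( W^{T} (V^{.[\alpha]} \odot \hat{V}^{.[\beta-1]}) \right) \oslash \left( W^{T} \hat{V}^{.[\alpha+\beta-1]} \right) \right)^{.[1/\alpha]}$
8:     end for
9: end for
Algorithm 4: αβ-NMP
1: $\mathbf{r} \leftarrow \mathbf{x}$
2: $\mathbf{h} \leftarrow \emptyset$
3: $\mathbf{i} \leftarrow \emptyset$
4: for $l = 1:L$ do
5:     $\mathbf{c} \leftarrow W^{T} \mathbf{r}$
6:     $k^{*} \leftarrow \arg\max_{k} c_k$
7:     $c^{*} \leftarrow c_{k^{*}}$
8:     if $c^{*} \le 0$ then
9:         Terminate
10:    end if
11:    $\mathbf{i} \leftarrow [\mathbf{i}, k^{*}]$
12:    $\mathbf{h} \leftarrow [\mathbf{h}, c^{*}]$
13:    for $j = 1:J$ do
14:        $\mathbf{h} \leftarrow \mathbf{h} \odot \left( \left( W(:, \mathbf{i})^{T} (\mathbf{x}^{.[\alpha]} \odot \hat{\mathbf{x}}^{.[\beta-1]}) \right) \oslash \left( W(:, \mathbf{i})^{T} \hat{\mathbf{x}}^{.[\alpha+\beta-1]} \right) \right)^{.[1/\alpha]}$, with $\hat{\mathbf{x}} = W(:, \mathbf{i})\, \mathbf{h}$
15:    end for
16:    $\mathbf{r} \leftarrow \mathbf{x} - W(:, \mathbf{i})\, \mathbf{h}$
17: end for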
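A greedy non-negative pursuit of the kind outlined in Algorithm 4 can be sketched as follows. The sketch makes several assumptions that the paper does not confirm: atoms have unit norm, an atom is never selected twice, a new coefficient is initialized with its inner product, and the coefficients are refitted with the multiplicative αβ rule. With the defaults $\alpha = \beta = 1$ that rule reduces to the Euclidean Lee-Seung update. The function name `ab_nmp` and the toy dictionary are hypothetical.

```python
import numpy as np

def ab_nmp(x, W, L, alpha=1.0, beta=1.0, n_inner=50, eps=1e-12):
    """Select up to L atoms greedily and refit nonnegative coefficients."""
    idx = []                    # indices of the selected atoms
    h = np.zeros(0)             # corresponding coefficients
    r = x.astype(float)         # residual starts as the signal itself
    for _ in range(L):
        c = W.T @ r             # inner products with the residual
        if idx:
            c[idx] = -np.inf    # assumption: do not reselect an atom
        k = int(np.argmax(c))
        if c[k] <= 0:
            break               # no atom correlates positively: stop
        idx.append(k)
        h = np.append(h, c[k])
        Wi = W[:, idx]
        for _ in range(n_inner):
            xh = Wi @ h + eps
            num = Wi.T @ (x**alpha * xh ** (beta - 1))
            den = Wi.T @ (xh ** (alpha + beta - 1)) + eps
            h *= (num / den) ** (1.0 / alpha)
        r = x - Wi @ h
    return idx, h

# Toy dictionary: four unit-norm atoms with disjoint supports.
W = np.zeros((6, 4))
W[0:2, 0] = 1.0
W[2, 1] = 1.0
W[3:5, 2] = 1.0
W[5, 3] = 1.0
W /= np.linalg.norm(W, axis=0)
x = 2.0 * W[:, 0] + 3.0 * W[:, 2]

idx, h = ab_nmp(x, W, L=3)
```

In this toy case the pursuit picks the atom with weight 3 first, then the atom with weight 2, and stops early in the third iteration because no remaining atom has a positive inner product with the residual.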
RESULTS AND ANALYSIS
Experiment Setup
The proposed method is tested by separating music and speech sources. Several experimental simulations under
different conditions have been designed to investigate the efficacy of the proposed method. MATLAB is used as the
programming platform. For mixture generation, two speakers, one male and one female, were selected from the TIMIT
speech database (www.ldc.upenn.edu/Catalog/LDC93S1.html) and the music signals were selected from the RWC database
(http://staff.aist.go.jp). Some mixtures are sampled at 16 kHz and others at 8 kHz. We compare our αβ-FSC algorithm
with the MMSS (Li et al 2009), SNMF2D and Factorial SC algorithms. The TF representation for Factorial SC and MMSS is
computed by normalizing the time-domain signal to unit power and computing the STFT using a 1024-point Hamming window
FFT with 50% overlap. For SNMF2D, the frequency axis of the obtained spectrogram is then logarithmically scaled and
grouped into 175 frequency bins in the range of 50 Hz to 8 kHz with 24 bins per octave. For αβ-FSC, the cochleagram is
based on a gammatone filterbank of 128 channels (filter order of 4), and the output is divided into 20-ms time frames
with 50% overlap between consecutive frames. In all cases, the sources are mixed with equal average power over the
duration of the signals. Two types of mixtures are used: mixtures of music and speech, and mixtures of different
kinds of music.
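The mixing and framing conventions described above can be made concrete with a small sketch. It assumes that "equal average power" means each source is scaled to unit mean-square value over the common duration before summation. The function names and the random stand-in signals are hypothetical; real experiments would load the TIMIT and RWC recordings instead.

```python
import numpy as np

def mix_equal_power(s1, s2):
    """Truncate to a common length, scale both sources to unit power, and add."""
    n = min(len(s1), len(s2))
    s1 = np.asarray(s1[:n], dtype=float)
    s2 = np.asarray(s2[:n], dtype=float)
    s1 = s1 / np.sqrt(np.mean(s1 ** 2))
    s2 = s2 / np.sqrt(np.mean(s2 ** 2))
    return s1 + s2, s1, s2

def frame_params(fs, frame_ms=20.0, overlap=0.5):
    """Frame length and hop size in samples for a given sampling rate."""
    frame = int(round(fs * frame_ms / 1000.0))
    hop = int(round(frame * (1.0 - overlap)))
    return frame, hop

# Random stand-ins for two recordings of different length and level.
rng = np.random.default_rng(0)
src_a = 3.0 * rng.standard_normal(1000)
src_b = 0.2 * rng.standard_normal(1200)
mixture, a, b = mix_equal_power(src_a, src_b)

frame_16k = frame_params(16000)   # 20 ms frames at 16 kHz
frame_8k = frame_params(8000)     # 20 ms frames at 8 kHz
```

At 16 kHz a 20-ms frame with 50% overlap corresponds to 320 samples with a hop of 160; at 8 kHz it corresponds to 160 samples with a hop of 80.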
Measure of Performance
We have evaluated our separation performance in terms of the signal-to-distortion ratio (SDR), which is one form
of perceptual measure. It is a global measure that unifies the source-to-interference ratio (SIR), the
source-to-artifacts ratio (SAR) and the source-to-noise ratio (SNR). MATLAB routines for computing these criteria were
obtained from the SiSEC08 webpage (Vincent et al 2008, Vincent et al 2005).
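To illustrate how these criteria relate to each other, the sketch below implements a simplified version of the decomposition behind them. It is not the SiSEC MATLAB routine: it assumes instantaneous projections (no time-invariant distortion filters) and no separate noise term, so the estimate is split only into a target part, an interference part and an artifact part. The function name and the synthetic signals are hypothetical.

```python
import numpy as np

def bss_eval_simple(est, sources, j):
    """Return (SDR, SIR, SAR) in dB for an estimate of source j."""
    S = np.asarray(sources, dtype=float)           # shape (J, T)
    s_j = S[j]
    s_target = (est @ s_j) / (s_j @ s_j) * s_j     # projection onto the true source
    coef, *_ = np.linalg.lstsq(S.T, est, rcond=None)
    proj = S.T @ coef                              # projection onto all sources
    e_interf = proj - s_target
    e_artif = est - proj

    def db(num, den):
        return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Synthetic check: source 0 plus 10% leakage of source 1 plus weak noise.
rng = np.random.default_rng(0)
s0 = rng.standard_normal(1000)
s1 = rng.standard_normal(1000)
est = s0 + 0.1 * s1 + 0.01 * rng.standard_normal(1000)
sdr, sir, sar = bss_eval_simple(est, [s0, s1], j=0)
```

In this decomposition the artifact term is orthogonal to the interference term, so the SDR can never exceed the SIR; here the leakage of 10% in amplitude gives an SIR near 20 dB and the weak noise gives an SAR near 40 dB.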
Analysis of Results
Figure 3 shows the time-domain waveforms of the original male and female speech and of the mixture of the two sources.
Figure 4 shows the cochleagrams of the two sources and of their mixture. Figure 5 further shows the separation results
in the cochleagram. The plots clearly show that the spectral energy of the two audio sources is clustered at different
frequencies in the cochleagram due to their different fundamental frequencies. These prominent features have been
separated using the proposed αβ-FSC algorithm. Figure 6 shows the final recovered time-domain sources.
To further analyse the performance of all the above matrix factorization methods in separating the mixed signal
and capturing the TF patterns of the sources, the cochleagram of each recovered source has been plotted in Figure 5. In
Figure 5, panels (a)-(b), (c)-(d), (e)-(f) and (g)-(h) denote the recovered cochleagrams of the female and male speech
obtained with the Factorial SC, MMSS, SNMF2D and αβ-FSC algorithms, respectively. In particular, panels (c)-(d) show
that the MMSS algorithm does not reconstruct the sources well. SNMF2D gives a better estimation than MMSS. On the
other hand, both the Factorial SC and αβ-FSC algorithms exhibit good reconstruction of the female speech as well as the
male speech. However, the Factorial SC algorithm fails to identify several components, as indicated by the red boxes in
panels (a)-(b). Hence, it estimates the male speech less accurately than the αβ-FSC algorithm, which has successfully
estimated both sources with high accuracy.
Table 1 compares the proposed cochleagram-based algorithm (αβ-FSC) with the MMSS, SNMF2D and Factorial SC
algorithms. MMSS gives poor results, and SNMF2D is better than MMSS but worse than the other algorithms. Both
Factorial SC and αβ-FSC exhibit good reconstruction in terms of SDR, SIR and SAR. However, the resulting factorizations
are not equivalent.
The major reason for the large discrepancy between them is that the spectrogram fails to infer the dominating
source. This leads to a high degree of ambiguity in the TF domain and causes a lack of uniqueness in extracting the
spectral-temporal features of the sources. The cochleagram makes the mixed signal more separable and thereby reduces
the mixing ambiguity between $|S_1|$ and $|S_2|$. This explains why the separation performance for the mixture of music
and female utterance is the highest among all the mixtures: both sources have very distinguishable TF patterns in the
cochleagram.
In summary, the results in Table 1 and Figure 5 consistently show the importance of using the αβ-FSC
factorization algorithm in order to correctly estimate the spectral and temporal features of each source.
Figure 3: (A) Original Female Speech (B) Original Male Speech (C) Mixture of Sources
Figure 4: (A) Cochleagram of Original Female Speech (B) Cochleagram of Original Male Speech (C) Cochleagram of Mixture
Table 1: Comparison between αβ-FSC, Factorial SC, SNMF2D and MMSS (values in dB)

Mixture                            Algorithm       SDR S1     SDR S2     SAR S1     SAR S2     SIR S1     SIR S2
Female1 speech and male speech     αβ-FSC          12.7711    12.4117    13.9310    13.9303    19.2441    17.8848
                                   Factorial SC    12.6270    12.2991    13.8221    13.8214    18.9913    17.7675
                                   SNMF2D         -17.444     16.783    -17.335     42.0723    16.0476    16.7971
                                   MMSS             3.7309     7.5410     4.9557     8.0659    11.0300    17.6080
Female1 speech and female speech   αβ-FSC          13.7165    13.6159    14.7072    14.7966    21.8654    19.9908
                                   Factorial SC    12.8072    11.9564    13.4305    13.8704    20.7392    16.6114
                                   SNMF2D         -19.962     11.9852     9.2016    12.0191   -19.464     33.3438
                                   MMSS             5.1450     5.3637     6.3621     6.7431    13.1413    11.0951
Music and music                    αβ-FSC          17.4555    17.9902    18.3730    18.9706    24.7368    23.6153
                                   Factorial SC    16.4991    17.3420    17.3343    17.6024    23.1495    21.3892
                                   SNMF2D         -24.925     18.2986    15.5392    18.3553   -24.805     37.2287
                                   MMSS            10.7776    -8.1304    -4.8884    11.0298    23.5920     0.7689
Music and female                   αβ-FSC          14.1972    15.3901    15.1863    15.9172    21.2375    19.8083
                                   Factorial SC    14.3782    13.6053    14.4900    14.1660    20.9607    18.9371
                                   SNMF2D         -17.5836     8.8261     9.7609     8.8630   -17.1394    30.0781
                                   MMSS             6.5059     9.3063     7.3610     9.3634    14.7163    28.6164
Figure 5: Separation Results: (a)-(b), (c)-(d), (e)-(f) and (g)-(h) Denote the Recovered Female and Male Speech in the
Cochleagram by Using the Factorial SC, MMSS, SNMF2D and αβ-FSC Algorithms, Respectively
Figure 6: Time-Domain Separation of Sources: (a)-(b) Factorial SC, (c)-(d) MMSS, (e)-(f) SNMF2D, (g)-(h) αβ-FSC
CONCLUSIONS
In this paper we proposed a separation framework using the gammatone filterbank, which produces a non-uniform
TF domain termed the cochleagram, whereby each TF unit has a different resolution, unlike the classical spectrogram,
which deals only with uniform resolution. It is shown that the mixed signal is significantly more separable in the
cochleagram than in the classical spectrogram and the log-frequency spectrogram (constant-Q transform).
In addition, a family of novel αβ-divergence based two-dimensional nonnegative matrix factorization algorithms has been
developed to extract the spectral and temporal features of the sources. The proposed factorizations are scale invariant,
whereby the lower-energy components in the cochleagram can be treated with equal importance as the higher-energy
components. Within the context of SCBSS, this property is highly desirable, as it enables the spectral-temporal features
of the sources, which are usually characterized by a large dynamic range of energy, to be estimated with significantly
higher accuracy. This is in contrast with matrix factorization based on the LS distance and the KL divergence, where
both methods favour the high-energy components but neglect the low-energy components.
In the comparison of the αβ-FSC and NMF2D algorithms, the proposed αβ-FSC obtains the best separation performance.
The impetus behind this work is that the sparseness achieved by the conventional NMF, SNMF, NMF2D and SNMF2D is not
efficient enough; in source separation it is necessary to control the degree of sparseness explicitly for each
temporal code.
REFERENCES
1. Brown, J. C. (1991). Calculation of a constant Q spectral transform, J. Acoust. Soc. Am., vol. 89, no. 1, pp. 425-434.
2. Brown, G. J. and Cooke, M. (1994). Computational auditory scene analysis, Computer Speech and Language, vol. 8,
pp. 297-336.
3. Cichocki, A., Zdunek, R. and Phan, A. H. (2009). Nonnegative Matrix and Tensor Factorizations, John Wiley &
Sons Ltd.: Chichester, UK.
4. Cichocki, A., Cruces, S. and Amari, S. (2011). Generalized Alpha-Beta Divergences and Their Application to
Robust Nonnegative Matrix Factorization, Entropy, 13, 134-170.
5. Cichocki, A. and Amari, S. (2010). Families of Alpha- Beta- and Gamma-divergences: Flexible and robust
measures of similarities, Entropy, 12, pp. 1532-1568.
6. Gao, B., Woo, W. L. and Dlay, S. S. (2011). Single channel source separation using EMD-subband variable
regularized sparse features, IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 961-976.
7. http://staff.aist.go.jp
8. Hu, G. and Wang, D. L. (2007). Auditory segmentation based on onset and offset analysis, IEEE Trans. Audio,
Speech and Language Processing, vol. 15, no. 2, pp. 396-405.
9. Hu, G. and Wang, D. L. (2004). Monaural speech segregation based on pitch tracking and amplitude modulation, IEEE
Trans. Neural Networks, vol. 15, no. 5, pp. 1135-1150.
10. Hyvarinen, A., Karhunen, J. and Oja, E. (2001). Independent Component Analysis. John Wiley & Sons.
11. Jin, Z. and Wang, D. L. (2009). A supervised learning approach to monaural segregation of reverberant speech,
IEEE Trans. on Audio, Speech and Language Processing, vol. 17, pp. 625-638.
12. Lee, D. D. and Seung, H. S. (2001). Algorithms for non-negative matrix factorization, Advances in Neural
Information Processing Systems, vol. 13, pp. 556-562.
13. Li, Y., Woodruff, J. and Wang, D. L. (2009). Monaural musical sound separation based on pitch and common
amplitude modulation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 1361-1371.
14. Peharz, R. (2010). Single channel source separation using dictionary design methods for sparse coders, Master's
thesis, Graz University of Technology.
15. Roweis, S. (2003). Factorial models and refiltering for speech separation and denoising, in EUROSPEECH,
pp. 1009-1012.
16. Schmidt, M. N. and Morup, M. (2006). Nonnegative matrix factor 2-D deconvolution for blind single channel
source separation, in Proc. Int. Conf. Ind. Compon. Anal. Blind Signal Separat. (ICABSS06), Charleston, SC,
vol. 3889, pp. 700-707.
17. Vincent, E. and Araki, S. (2008). Signal Separation Evaluation Campaign (SiSEC 2008). [Online]. Available:
http://sisec.wiki.irisa.fr.
18. Vincent, E., Gribonval, R. and Fevotte, C. (2005). Performance measurement in blind audio source separation,
IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462-1469, Jul.
19. www.ldc.upenn.edu/Catalog/LDC93S1.html
20. Wang, D. and Brown, G. J. (2006). Computational Auditory Scene Analysis: Principles, Algorithms, and
Applications, ser. IEEE Press. J. Wiley and Sons Ltd.