Subband cocktail-party speech separation: CASA vs. BSS
Seungjin Choi
Department of Computer Science and Engineering, POSTECH, Korea
Co-work with Frederic Berthommier, ICP, INPG, France
A large database of binary sentence mixtures (n = 613) was recorded by [Tessier and Berthommier, 1999]. The Numbers95 signals are played through loudspeakers and re-recorded. The temporal overlap between words is about 75% and the relative level is 0 dB. The setup is static. Only 332 mixture sentences, truncated to 1 s, are used in the present study.
Numbers95 Stereo Database
ST-Numbers95 Database, ICP/INP Grenoble. Authors: E. Tessier and F. Berthommier
[Figure: recording setup showing the loudspeaker and microphone locations. Labels: 40 cm, 60 deg, 90 cm (left), 90 cm (right); sources Left (s=1) and Right (s=2). Signal panels: Left source, Right source, Mixture.]
Subband processing
[Figure: filterbank decomposition. Panels plot the gain (0 to 1) of the subband filters against frequency (100 to 4000 Hz). Additional label: Reference.]
The CASA Model
Filterbank decomposition: $X_i(\omega) = F_i(\omega)\,X_{\mathrm{left}}(\omega)$
TDOA estimation and weighting: $Y_{s,i}(\omega) = W_{s,i}(\mathrm{TDOA}_i)\,X_i(\omega)$, with $W_{s',i}(\mathrm{TDOA}_i) = 1 - W_{s,i}(\mathrm{TDOA}_i)$
Resynthesis: $Y_s(\omega) = \sum_{i=1}^{nbsb} Y_{s,i}(\omega)$
[Figure: subband-by-frame maps labelled Left and Right (subbands 1 to 4, frames 5 to 40, frames of 256 bins); weight (0 to 1) against TDOA (-10 to 10 bins); spectrograms (frequency against time) labelled Left source, Left output and Reference.]
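The three stages of the CASA model (filterbank decomposition, TDOA estimation and weighting, resynthesis) can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the rectangular band gains, the cross-correlation TDOA estimator, the sigmoid weight and its `slope` parameter, and all function names are hypothetical and are not taken from the slides.

```python
import numpy as np

def band_gains(n_bins, nbsb):
    """Hypothetical filterbank: nbsb rectangular gains F_i that sum to 1 in every bin."""
    edges = np.linspace(0, n_bins, nbsb + 1).astype(int)
    gains = np.zeros((nbsb, n_bins))
    for i in range(nbsb):
        gains[i, edges[i]:edges[i + 1]] = 1.0
    return gains

def estimate_tdoa(left, right, max_lag=10):
    """Delay (in samples) of the left channel behind the right one,
    taken at the peak of the circular cross-correlation (assumed estimator)."""
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(left, np.roll(right, lag)) for lag in lags]
    return int(lags[int(np.argmax(corr))])

def casa_frame(x_left, x_right, nbsb=4, slope=1.0):
    """Split one frame of the left channel into a left-source and a right-source estimate."""
    n = len(x_left)
    spec_left, spec_right = np.fft.rfft(x_left), np.fft.rfft(x_right)
    y = np.zeros((2, n))  # row 0: left source, row 1: right source
    for gain in band_gains(len(spec_left), nbsb):
        xi_left = np.fft.irfft(gain * spec_left, n)    # X_i = F_i X_left
        xi_right = np.fft.irfft(gain * spec_right, n)  # same band of the right channel
        d = estimate_tdoa(xi_left, xi_right)           # TDOA_i for this subband
        # Assumed sigmoid weight: close to 1 when the left channel leads (d < 0).
        w_left = 1.0 / (1.0 + np.exp(slope * d))
        y[0] += w_left * xi_left                       # Y_{s,i} = W_{s,i} X_i
        y[1] += (1.0 - w_left) * xi_left               # W_{s',i} = 1 - W_{s,i}
    return y
```

Because the two weights of each subband sum to 1 and the band gains sum to 1, the two outputs always add back up to the left input frame, which gives a quick sanity check on any implementation of this scheme.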
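Reconstruction Accuracy
RA (output): $RA_Y^s = 10 \log_{10} \dfrac{\sum_\omega R_s(\omega)^2}{\sum_\omega \left(R_s(\omega) - Y_s(\omega)\right)^2}$
RA (mixture): $RA_X^s = 10 \log_{10} \dfrac{\sum_\omega R_s(\omega)^2}{\sum_\omega \left(R_s(\omega) - X_s(\omega)\right)^2}$
Frames of 1024 bins with half overlap; $\omega / 2\pi \in [100, 4000]$ Hz.
[Figure: panels labelled $R_l$ and $Y_l$.]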
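A per-frame RA computation along these lines might look like the sketch below. The 8 kHz sampling rate, the use of magnitude spectra without a window, and the function name `reconstruction_accuracy` are assumptions made for illustration; only the frame length (1024 bins, half overlap) and the 100 to 4000 Hz range come from the slide.

```python
import numpy as np

def reconstruction_accuracy(ref, est, fs=8000, frame=1024, fmin=100.0, fmax=4000.0):
    """Per-frame RA in dB between a reference signal and an estimate.

    fs is an assumed sampling rate; spectra are compared as magnitudes (assumption).
    """
    hop = frame // 2                                  # half overlap
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)          # keep 100 to 4000 Hz
    ra = []
    for start in range(0, len(ref) - frame + 1, hop):
        R = np.abs(np.fft.rfft(ref[start:start + frame]))[band]
        Y = np.abs(np.fft.rfft(est[start:start + frame]))[band]
        ra.append(10 * np.log10(np.sum(R ** 2) / np.sum((R - Y) ** 2)))
    return np.array(ra)
```

Passing the mixture channel as `est` gives RA (mixture); passing a separated output gives RA (output). An estimate identical to the reference makes the denominator zero, so a real implementation would need to guard that case.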
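Gain of CASA
$Gain_s = RA_Y^s - RA_X^s$, that is, RA(output) - RA(mixture), with s = left or right.
$Gain_s = 10 \log_{10} \dfrac{\sum_\omega R_s(\omega)^2}{\sum_\omega \left(R_s(\omega) - Y_s(\omega)\right)^2} - 10 \log_{10} \dfrac{\sum_\omega R_s(\omega)^2}{\sum_\omega \left(R_s(\omega) - X_s(\omega)\right)^2} = 10 \log_{10} \dfrac{\sum_\omega \left(R_s(\omega) - X_s(\omega)\right)^2}{\sum_\omega \left(R_s(\omega) - Y_s(\omega)\right)^2}$ (SNRI)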
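The gain is a difference of two RA values that share the same reference energy, so that energy cancels and the gain reduces to a ratio of the two error energies. The short check below illustrates this on arbitrary magnitude vectors; the helper names `ra_db`, `gain_db` and `snri_db` are hypothetical.

```python
import numpy as np

def ra_db(R, S):
    """RA in dB of spectrum S against reference R."""
    return 10 * np.log10(np.sum(R ** 2) / np.sum((R - S) ** 2))

def gain_db(R, X, Y):
    """Gain = RA(output) - RA(mixture)."""
    return ra_db(R, Y) - ra_db(R, X)

def snri_db(R, X, Y):
    """Same quantity written as a ratio of error energies."""
    return 10 * np.log10(np.sum((R - X) ** 2) / np.sum((R - Y) ** 2))
```

For example, an output whose error is half the mixture's error in amplitude has a quarter of the error energy, which corresponds to a gain of about 6 dB.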
Gain of CASA: Relative Level
[Figure: RAX, RAY and gain for the left source (dB).]
Subband effect for CASA
[Figure: three panels (RA left, RA right, RA left+right) plotting RA in dB against nbsb; curves for 256 and 512 bins.]
Effect of the number of subbands (nbsb) for the CASA model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 5 and the two curves correspond to durations of 256 and 512 bins. The RA of the mixture, which is subtracted for gain evaluation, is labelled (*).
Effect of nbsb: RA
[Figure: RA (dB) per frame (frames of 1024 bins with half overlap) for the Left and Right sources; curves: Mixt., nbsb=1, nbsb=2, nbsb=4.]
Subband effect for CASA: Gain
[Figure: gain (dB) against relative level (dB, -50 to 50) for the Left and Right sources; curves: nbsb=1 and nbsb=4.]
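The BSS Model
$Y_c^{(i)}(t) = X_c^{(i)}(t) - \sum_{p=0}^{L} W_{cc',p}^{(i)}\,Y_{c'}^{(i)}(t-p)$
$\Delta W_{cc',p}^{(i)}(t) = \eta\,\mathrm{sign}\!\left(Y_c^{(i)}(t)\right)\,Y_{c'}^{(i)}(t-p)$ (gain $\eta$, nonlinear function sign, delayed output $Y_{c'}^{(i)}(t-p)$)
[Figure: network diagram with inputs Xl(t) and Xr(t), cross filters Wlr and Wrl, and outputs Yl(t) and Yr(t); label: nbp. Spectrograms (frequency against time, 1 second) of Yl(t) and Yr(t).]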
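A two-channel feedback network with a sign nonlinearity can be sketched as below for one subband. The sign convention is an assumption: the cross filters are subtracted from the inputs and the update is added, which is one self-consistent pairing. The function name `bss_feedback`, the step size `eta` and the way the instantaneous (p = 0) loop is solved are also choices made for this illustration, not details taken from the slides.

```python
import numpy as np

def bss_feedback(x_left, x_right, n_taps=10, eta=1e-4):
    """Adaptive feedback demixing of two channels (single subband sketch).

    Assumed convention: Y_c(t) = X_c(t) - sum_p W_{cc',p} Y_{c'}(t-p),
    with update  W_{cc',p} += eta * sign(Y_c(t)) * Y_{c'}(t-p).
    """
    T = len(x_left)
    pad = n_taps - 1                      # leading zeros so that t-p never underflows
    y_l = np.zeros(T + pad)
    y_r = np.zeros(T + pad)
    w_lr = np.zeros(n_taps)               # filter feeding the right output into the left one
    w_rl = np.zeros(n_taps)               # filter feeding the left output into the right one
    for t in range(T):
        k = t + pad
        past_l = y_l[k - pad:k][::-1]     # Y_l(t-1), ..., Y_l(t-pad)
        past_r = y_r[k - pad:k][::-1]
        a = x_left[t] - np.dot(w_lr[1:], past_r)
        b = x_right[t] - np.dot(w_rl[1:], past_l)
        # The p = 0 taps couple Y_l(t) and Y_r(t); solve the 2x2 system exactly.
        det = 1.0 - w_lr[0] * w_rl[0]
        y_l[k] = (a - w_lr[0] * b) / det
        y_r[k] = (b - w_rl[0] * a) / det
        # Update every tap from the sign of one output and the delayed other output.
        w_lr += eta * np.sign(y_l[k]) * y_r[k - pad:k + 1][::-1]
        w_rl += eta * np.sign(y_r[k]) * y_l[k - pad:k + 1][::-1]
    return y_l[pad:], y_r[pad:], w_lr, w_rl
```

With `eta=0` the filters stay at zero and the outputs equal the inputs, which is a convenient baseline. In a subband version the same routine would be run independently on each filterbank output before resynthesis.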
Gain of BSS: Relative Level
[Figure: RAX, RAY and gain for the left source (dB).]
Subband effect for BSS
[Figure: three panels (left, right, left+right) plotting RA in dB against nbsb (1 to 4); curves for nbp = 2, 3, 10, 100.]
Effect of the number of subbands (nbsb) for the BSS model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*). In each figure, two points are added at nbsb=1 for the "BSS giv" condition and for the "BSS ori" data.
RA and Gain for BSS
[Figure: RA (dB) per frame (frames of 1024 bins with half overlap) for the Left and Right sources, with the mixture (Mixt.) curve; RAX, RAY and RL (dB) per frame.]
LISA.exe: Speech Separation Program (C++), POSTECH. Authors: S. Choi and H. Hong
Subband effect for BSS: Gain
[Figure: gain of BSS (nbp=100), in dB, against relative level (dB, -50 to 50) for the Left and Right sources; curves: nbsb=1 and nbsb=2.]
Demixing filters
[Figure: demixing filters Wlr and Wrl against time (bin, 20 to 200), with the corresponding plots of Wlr and Wrl against frequency; one set of panels is labelled nbsb=1.]
Coherence spectrograms
$\mathrm{Coh}(Y_c, Y_{c'}, \omega, n) = \dfrac{\left|\sum_{k=n}^{n+1} Y_c(\omega, k)\,Y_{c'}^{*}(\omega, k)\right|^2}{\sum_{k=n}^{n+1} \left|Y_c(\omega, k)\right|^2 \,\sum_{k=n}^{n+1} \left|Y_{c'}(\omega, k)\right|^2}$
Frames of 256 bins with half overlap; each estimate uses two consecutive frames: Yl(n), Yl(n+1) and Yr(n), Yr(n+1).
[Figure: coherence spectrograms (frequency against time) labelled left and right, for NBP=10; Mean(Coh)=0.65.]
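The coherence over pairs of consecutive frames can be computed as in the sketch below. The absence of an analysis window, the small `eps` regulariser and the function name `coherence_spectrogram` are assumptions for illustration; the frame length and half overlap follow the slide.

```python
import numpy as np

def coherence_spectrogram(y_c, y_c2, frame=256, eps=1e-12):
    """Magnitude-squared coherence of two signals, estimated over frames n and n+1.

    Returns an array of shape (n_frames - 1, frame // 2 + 1) with values in [0, 1].
    """
    hop = frame // 2                                   # half overlap
    starts = range(0, len(y_c) - frame + 1, hop)
    A = np.array([np.fft.rfft(y_c[s:s + frame]) for s in starts])
    B = np.array([np.fft.rfft(y_c2[s:s + frame]) for s in starts])
    # Sum cross- and auto-spectra over the two consecutive frames n and n+1.
    cross = A[:-1] * np.conj(B[:-1]) + A[1:] * np.conj(B[1:])
    p_a = np.abs(A[:-1]) ** 2 + np.abs(A[1:]) ** 2
    p_b = np.abs(B[:-1]) ** 2 + np.abs(B[1:]) ** 2
    return np.abs(cross) ** 2 / (p_a * p_b + eps)
```

Taking `.mean()` of the returned array gives a single coherence index in the spirit of the Mean(Coh) value quoted on the slide. Averaging over two frames is what keeps the estimate from being trivially 1, as it would be for a single frame.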
Effect of nbp: Coherence spectrograms
[Figure: Left and Right coherence spectrograms for each value of nbp.]
NBP=3: Coh = 0.60
NBP=10: Coh = 0.65
NBP=100: Coh = 0.68
Coherence statistic
[Figure: left panel, RA left+right (dB) against nbsb (1 to 4); right panel, Coh (0.55 to 0.8) against nbsb (1 to 4); curves for nbp = 2, 3, 10, 100.]
Effect of the number of subbands (nbsb) on the coherence index for the BSS model. Left: average left+right RA over all frames. Right: coherence, defined as the mean of the coherence spectrogram. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*). The coherence CohX between the two mixture channels is labelled (*) in the right figure. In each figure, two points are added at nbsb=1 for the "BSS giv" condition and for the "BSS ori" data.
Summary results (dB)
           RA 1+2   RA 1   RA 2   Gain 1   Gain 2
CASA 4     18.06    8.97   9.09   2.83     2.41
BSS 2      18.62    9.16   9.46   3.02     3.13
BSS ori    16.67    8.44   8.22   2.29     2.13
BSS giv    16.71    8.29   8.42   2.15     2.09
… Hearing
[Labels: REF, CASA, BSS; Left, Right.]
[Figure: gain of CASA 4 (dB) against relative level (dB, -50 to 50); curves: Left, Right, mean.]
[Figure: gain of BSS (dB) against relative level (dB, -50 to 50); curves: Left, Right, mean.]