Subband cocktail-party speech separation: CASA vs. BSS

Seungjin Choi, Department of Computer Science and Engineering, POSTECH, Korea, [email protected]. Co-work with Frederic Berthommier, ICP, INPG, France.


Page 1: Subband cocktail-party speech separation: CASA vs. BSS

Seungjin Choi

Department of Computer Science and Engineering, POSTECH, Korea

[email protected]

Co-work with Frederic Berthommier, ICP, INPG, France

Page 2: Subband cocktail-party speech separation: CASA vs. BSS

Number95 Stereo Database

A large database of binary mixtures of sentences (n=613) was recorded by [Tessier and Berthommier, 1999]. The Numbers95 signals are played through loudspeakers and recorded. The temporal overlap between words is about 75% and the relative level is 0 dB. The setup is static. Only 332 mixture sentences, truncated at 1 s, are used in the present study.

[Figure: recording setup showing the loudspeakers location (Left s=1, Right s=2) and the microphones location (Left, Right). Labelled dimensions: 40 cm, 60 deg, 90 cm, 90 cm. Other labels: Left source, Right source, Mixture.]

Reference: ST-Numbers95 Database, ICP/INP Grenoble. Authors: E. Tessier and F. Berthommier.

Page 3: Subband cocktail-party speech separation: CASA vs. BSS

Subband processing

Filterbank decomposition

[Figure: filterbank decomposition. Each panel plots filter gain (0 to 1) against frequency (100 to 4000 Hz).]

Page 4: Subband cocktail-party speech separation: CASA vs. BSS

The CASA Model

Filterbank decomposition:

$$X_i(\omega) = F_i(\omega)\, X_{\mathrm{left}}(\omega)$$

TDOA estimation and weighting:

$$Y_{s,i}(\omega) = W_{s,i}(\mathrm{TDOA}_i)\, X_i(\omega)$$

$$W_{s,i}(\mathrm{TDOA}_i) = 1 - W_{s',i}(\mathrm{TDOA}_i)$$

Resynthesis:

$$Y_s(\omega) = \sum_{i=1}^{nbsb} Y_{s,i}(\omega)$$

[Figure: subband (1 to 4) by frame (5 to 40, frames of 256 bins) maps for Left and Right, and the weight (0 to 1) plotted against TDOA (in bins, from -10 to 10).]
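The three CASA stages named on this slide (filterbank decomposition, TDOA weighting, resynthesis) can be illustrated with a minimal frequency-domain sketch for a single frame. It is not the authors' implementation: the function name `casa_resynthesis`, the rectangular filter gains and the fixed per-subband weights are assumptions made for illustration, and the TDOA estimation that would produce the weights is not shown.

```python
def casa_resynthesis(x_left, filters, weights):
    """Return [Y_left, Y_right] for one frame.

    x_left  : spectrum of the left mixture channel (list of complex bins)
    filters : filters[i][k] is the gain F_i of subband i at bin k (assumed shape)
    weights : weights[i] is the weight of subband i for the left source;
              the right source receives the complement 1 - weights[i]
    """
    outputs = []
    for s in (0, 1):  # 0 = left source, 1 = right source
        y_s = [0j] * len(x_left)
        for i, f_i in enumerate(filters):
            w = weights[i] if s == 0 else 1.0 - weights[i]
            for k, x in enumerate(x_left):
                # Subband signal X_i = F_i * X_left, weighted and summed over i.
                y_s[k] += w * f_i[k] * x
        outputs.append(y_s)
    return outputs


# Toy frame with four bins and two rectangular subbands (hypothetical values).
x = [1 + 1j, 2 + 0j, -1j, 0.5 + 0j]
F = [[1, 1, 0, 0], [0, 0, 1, 1]]
W_left = [0.8, 0.25]
y_left, y_right = casa_resynthesis(x, F, W_left)
```

Because the two weights of each subband sum to one, and the toy filter gains sum to one at every bin, the two outputs add back up to the input spectrum. That property only holds for a filterbank with this partition-of-unity behaviour.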

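Page 5: Subband cocktail-party speech separation: CASA vs. BSS

Reconstruction Accuracy

RA (output):

$$\mathrm{RAY}_s = 10 \log_{10} \frac{\sum_{\omega} R_s(\omega)^2}{\sum_{\omega} \left(R_s(\omega) - Y_s(\omega)\right)^2}$$

RA (mixture):

$$\mathrm{RAX}_s = 10 \log_{10} \frac{\sum_{\omega} R_s(\omega)^2}{\sum_{\omega} \left(R_s(\omega) - X_s(\omega)\right)^2}$$

Both are computed on frames of 1024 bins with half overlap, with $\omega/2\pi \in [100, 4000]$ Hz.

[Figure: spectrograms (frequency against time) of the reference left source $R_l$ and of the left output $Y_l$, with the RA per frame (frames 2 to 14).]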

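Page 6: Subband cocktail-party speech separation: CASA vs. BSS

Gain of CASA

$$\mathrm{Gain}_s = 10 \log_{10} \frac{\sum_{\omega} R_s(\omega)^2}{\sum_{\omega} \left(R_s(\omega) - Y_s(\omega)\right)^2} - 10 \log_{10} \frac{\sum_{\omega} R_s(\omega)^2}{\sum_{\omega} \left(R_s(\omega) - X_s(\omega)\right)^2} = 10 \log_{10} \frac{\sum_{\omega} \left(R_s(\omega) - X_s(\omega)\right)^2}{\sum_{\omega} \left(R_s(\omega) - Y_s(\omega)\right)^2} \quad (\mathrm{SNRI}_s)$$

$$\mathrm{Gain}_s = \mathrm{RAY}_s - \mathrm{RAX}_s = \mathrm{RA(output)} - \mathrm{RA(mixture)}, \qquad s = \text{left or right}$$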
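As a hedged illustration of the reconstruction accuracy (RA) and of the gain defined as RA(output) minus RA(mixture), the sketch below works on plain lists of real values, for example spectral magnitudes of one frame restricted to the analysis band. The function names and the toy numbers are assumptions; framing and band selection are left out.

```python
import math


def reconstruction_accuracy(ref, est):
    """RA in dB: energy of the reference over energy of the error (ref - est)."""
    ref_energy = sum(r * r for r in ref)
    err_energy = sum((r - e) ** 2 for r, e in zip(ref, est))
    # A perfect reconstruction (err_energy == 0) would need special handling.
    return 10.0 * math.log10(ref_energy / err_energy)


def gain(ref, mixture, output):
    """Gain in dB: RA of the separated output minus RA of the raw mixture."""
    return reconstruction_accuracy(ref, output) - reconstruction_accuracy(ref, mixture)


# Toy values (hypothetical): the output is closer to the reference than the mixture.
R = [1.0, 2.0, 2.0]
X = [2.0, 2.0, 2.0]   # mixture, error energy 1.0
Y = [1.0, 2.0, 1.5]   # separated output, error energy 0.25

rax = reconstruction_accuracy(R, X)
ray = reconstruction_accuracy(R, Y)
g = gain(R, X, Y)
```

The reference energy cancels in the difference, so the gain only depends on the ratio of the two error energies, which is the improvement form given on the slide.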

Page 7: Subband cocktail-party speech separation: CASA vs. BSS

Gain of CASA: Relative Level

[Figure: gain for the left source (dB), with RAX and RAY labelled.]

Page 8: Subband cocktail-party speech separation: CASA vs. BSS

Subband effect for CASA

Effect of the number of subbands (nbsb) for the CASA model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 5 and the two curves correspond to durations of 256 and 512 bins. The RA of the mixture, which is subtracted for gain evaluation, is labelled (*).

[Figure: three panels (RA left, RA right, RA left+right), each plotting RA in dB against nbsb (1 to 5), with curves for 256 and 512 bins.]

Page 9: Subband cocktail-party speech separation: CASA vs. BSS

Effect of nbsb: RA

[Figure: RA (dB) per frame (frames of 1024 bins with half overlap) for Left and Right, with curves for the mixture (Mixt.), nbsb=1, nbsb=2 and nbsb=4.]

Page 10: Subband cocktail-party speech separation: CASA vs. BSS

Subband effect for CASA: Gain

[Figure: gain (dB) against relative level (dB, from -50 to 50) for Left and Right, with curves for nbsb=1 and nbsb=4.]

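Page 11: Subband cocktail-party speech separation: CASA vs. BSS

The BSS Model

$$Y_c^{(i)}(t) = X_c^{(i)}(t) - \sum_{p=0}^{L} W_{cc',p}^{(i)}\, Y_{c'}^{(i)}(t-p)$$

$$\Delta W_{cc',p}^{(i)}(t) = \eta \; \mathrm{sign}\!\left(Y_c^{(i)}(t)\right) Y_{c'}^{(i)}(t-p)$$

The three factors of the update are: gain ($\eta$), nonlinear function ($\mathrm{sign}$), delayed output ($Y_{c'}^{(i)}(t-p)$).

[Figure: two-channel network with inputs Xl(t), Xr(t), outputs Yl(t), Yr(t) and cross filters Wlr, Wrl (label: nbp); spectrograms (frequency against time) of Yl(t) and Yr(t) over 1 second.]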
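The feedback network with cross filters Wlr and Wrl and its sign-based update (gain, nonlinear function, delayed output) can be sketched for one subband as below. This is an illustrative sketch, not the LISA program: the subtraction convention in the network, the zero initial conditions, the per-sample adaptation and the use of `nbp` as the highest tap index are all assumptions, as are the function names.

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def bss_feedback(x_l, x_r, nbp, eta, w_lr=None, w_rl=None):
    """Run the two-channel feedback network over one pair of signals.

    Assumed conventions (not from the slides): taps p = 0..nbp, zero initial
    conditions, and weights adapted in place after every sample.
    """
    w_lr = list(w_lr) if w_lr is not None else [0.0] * (nbp + 1)
    w_rl = list(w_rl) if w_rl is not None else [0.0] * (nbp + 1)
    y_l, y_r = [], []
    for t in range(len(x_l)):
        # Input minus the cross-filtered past outputs (taps p >= 1).
        a = x_l[t] - sum(w_lr[p] * y_r[t - p] for p in range(1, nbp + 1) if t >= p)
        b = x_r[t] - sum(w_rl[p] * y_l[t - p] for p in range(1, nbp + 1) if t >= p)
        # Tap p = 0 couples both outputs at time t: solve the 2x2 system.
        det = 1.0 - w_lr[0] * w_rl[0]
        y_l.append((a - w_lr[0] * b) / det)
        y_r.append((b - w_rl[0] * a) / det)
        # Sign-based update: gain * nonlinear function * delayed output.
        for p in range(min(nbp, t) + 1):
            w_lr[p] += eta * sign(y_l[t]) * y_r[t - p]
            w_rl[p] += eta * sign(y_r[t]) * y_l[t - p]
    return y_l, y_r, w_lr, w_rl


# Toy check (hypothetical data): each source leaks into the other channel with
# a one-sample delay; with matching fixed weights (eta = 0) the network undoes it.
s_l = [1.0, -2.0, 0.5, 3.0, -1.0]
s_r = [0.5, 1.5, -1.0, 2.0, 0.25]
x_l = [s_l[t] + (0.5 * s_r[t - 1] if t else 0.0) for t in range(5)]
x_r = [s_r[t] + (0.3 * s_l[t - 1] if t else 0.0) for t in range(5)]
y_l, y_r, _, _ = bss_feedback(x_l, x_r, nbp=1, eta=0.0, w_lr=[0.0, 0.5], w_rl=[0.0, 0.3])
```

With `eta > 0` and zero initial weights the same function adapts the filters sample by sample. Convergence on real speech depends on the step size and the number of taps, and is not guaranteed by this sketch.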

Page 12: Subband cocktail-party speech separation: CASA vs. BSS

Gain of BSS: Relative Level

[Figure: gain for the left source (dB), with RAX and RAY labelled.]

Page 13: Subband cocktail-party speech separation: CASA vs. BSS

Subband effect for BSS

Effect of the number of subbands (nbsb) for the BSS model on the RA (in dB). From left to right: averaged left source RA, averaged right source RA, averaged left+right RA over all frames. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10 and 100. The RA of the mixture is labelled (*). In each figure, two points are added at nbsb=1 for the "BSS giv" condition and for the "BSS ori" data.

[Figure: three panels (left, right, left+right), each plotting RA in dB against nbsb (1 to 4), with curves for nbp = 2, 3, 10 and 100.]

Page 14: Subband cocktail-party speech separation: CASA vs. BSS

RA and Gain for BSS

[Figure: RA (dB) per frame (frames of 1024 bins with half overlap) for Left and Right, comparing RAX (Mixt.) and RAY, with the sign of the gain marked (+, -); RL (dB) per frame.]

LISA.exe: Speech Separation Program (C++), POSTECH. Authors: S. Choi and H. Hong.

Page 15: Subband cocktail-party speech separation: CASA vs. BSS

Subband effect for BSS: Gain

[Figure: gain of BSS (nbp=100) in dB against relative level (dB, from -50 to 50) for Left and Right, with curves for nbsb=1 and nbsb=2.]

Page 16: Subband cocktail-party speech separation: CASA vs. BSS

Demixing filters

[Figure: demixing filters Wlr and Wrl plotted against time (in bins, 20 to 200), with panels of Wlr and Wrl against frequency; one set of panels is labelled nbsb=1.]

Page 17: Subband cocktail-party speech separation: CASA vs. BSS

Coherence spectrograms

$$\mathrm{Coh}(Y_c, Y_{c'}, \omega, n) = \frac{\left| \sum_{n,\,n+1} Y_c(\omega)\, Y_{c'}^{*}(\omega) \right|^2}{\sum_{n,\,n+1} \left| Y_c(\omega) \right|^2 \; \sum_{n,\,n+1} \left| Y_{c'}(\omega) \right|^2}$$

The sums run over two consecutive frames, Yl(n), Yl(n+1) and Yr(n), Yr(n+1). Frames of 256 bins with half overlap.

NBP=10, Mean(Coh)=0.65

[Figure: left and right coherence spectrograms (frequency against time).]
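The coherence between the two output channels, pooled over two consecutive frames as indicated by the labels Yl(n), Yl(n+1) and Yr(n), Yr(n+1), can be sketched as follows. The function name, the input layout (one list of complex spectral bins per frame) and the zero returned for empty bins are assumptions; computing the frame spectra themselves is not shown.

```python
def coherence(frames_c, frames_cp):
    """Magnitude-squared coherence per frequency bin, pooled over the frames.

    frames_c, frames_cp : lists of frames (e.g. frames n and n+1); each frame
    is a list of complex spectral values, one per frequency bin.
    """
    nbins = len(frames_c[0])
    coh = []
    for k in range(nbins):
        # Cross-spectrum and auto-spectra summed over the frames.
        cross = sum(a[k] * b[k].conjugate() for a, b in zip(frames_c, frames_cp))
        p_c = sum(abs(a[k]) ** 2 for a in frames_c)
        p_cp = sum(abs(b[k]) ** 2 for b in frames_cp)
        denom = p_c * p_cp
        coh.append(abs(cross) ** 2 / denom if denom > 0 else 0.0)
    return coh


# Toy spectra (hypothetical): two frames, two bins per channel.
Yl = [[1 + 0j, 1 + 0j], [1 + 0j, 2j]]
Yr = [[2 + 0j, 1 + 0j], [2 + 0j, -2j]]
coh = coherence(Yl, Yr)
mean_coh = sum(coh) / len(coh)
```

In bin 0 the right channel is a scaled copy of the left one in both frames, so the coherence is 1. In bin 1 the phase relation changes between the frames and the coherence drops. Averaging over all bins and frame pairs gives a single index of the kind reported as Mean(Coh).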

Page 18: Subband cocktail-party speech separation: CASA vs. BSS

Effect of nbp: Coherence spectrograms

NBP     Coh
3       0.60
10      0.65
100     0.68

[Figure: left and right coherence spectrograms for NBP=3, NBP=10 and NBP=100.]

Page 19: Subband cocktail-party speech separation: CASA vs. BSS

Coherence statistic

Effect of the number of subbands (nbsb) on the coherence index for the BSS model. Left: average left+right RA over all frames. Right: coherence, defined as the mean of the coherence spectrogram. The number of subbands varies from 1 to 4 and the curves correspond to nbp = 2, 3, 10 and 100. The RA of the mixture is labelled (*). The CohX coherence between the two mixture channels is labelled (*) in the right figure. In each figure, two points are added at nbsb=1 for the "BSS giv" condition and for the "BSS ori" data.

[Figure: two panels plotted against nbsb (1 to 4): left+right RA in dB (10 to 20) and Coh (0.55 to 0.8), with curves for nbp = 2, 3, 10 and 100.]

Page 20: Subband cocktail-party speech separation: CASA vs. BSS

Summary results

All values in dB.

Model     RA 1+2   RA 1   RA 2   Gain 1   Gain 2
CASA 4    18.06    8.97   9.09   2.83     2.41
BSS 2     18.62    9.16   9.46   3.02     3.13
BSS ori   16.67    8.44   8.22   2.29     2.13
BSS giv   16.71    8.29   8.42   2.15     2.09

[Figure: gain of CASA 4 and gain of BSS, in dB, against relative level (dB, from -50 to 50), with curves for Left, Right and mean. Other labels: "… Hearing", Left, Right, REF, CASA, BSS.]