or morgan, me & pitch - columbia universitydpwe/talks/2015-03-morgan-pitch.pdfrobustness,...

16
Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 1 1. Robustness and Separation 2. An Academic Journey 3. Future Dan Ellis Columbia / ICSI [email protected] http://labrosa.ee.columbia.edu / C OLUMBIA U NIVERSITY IN THE CITY OF NEW YORK or Morgan, Me & Pitch Robustness, Separation & Pitch

Upload: others

Post on 22-Dec-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /161

1. Robustness and Separation2. An Academic Journey3. Future

Dan Ellis Columbia / ICSI

[email protected] http://labrosa.ee.columbia.edu/

COLUMBIA UNIVERSITYIN THE CITY OF NEW YORK

or

Morgan, Me & Pitch

Robustness, Separation & Pitch

Page 2: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

1953: How To Separate Speech?• The “Cocktail Party Problem” [Cherry ’53]

• Spatial information: ATC over a single speaker• Pitch differences via gender differences

• Auditory Scene Analysis [Bregman ’90]

2

• Grouping cues• Onset• Harmonicity• Common Fate• “Schema”

Page 3: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

The Usefulness of Pitch• Common pitch can link energy

from a single source

3

!"#$%&$'()#*"&+$,-"#$'(!$./-#01232'(4-**2&2#52.

1232'(6-**2&2#52(5$#(72'8(-#(.82257(.20&20$,-"#

1-.,2#(*"&(,72(9%-2,2&(,$'/2&:

Brungart et al.’01

Normal mix

“Pitchless”

Page 4: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

1984: Perception-Inspired Separation• Model the periodicity information

in the auditory nerve

4

Weintraub 1985

Lyon 1984

Mix

Female

Page 5: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

1996: An Academic Journey• Dan in 1996: The “weft”

5

5: Results 115

the energy visible in the fu ll signa l qu ite closely; note, however , the holes inthe weft envelopes, par t icu la r ly a round 300 Hz in the second weft ; a t thesepoin ts, the tota l signa l energy is fu lly expla ined by the background noiseelement , and, in the absence of st rong evidence for per iodic energy from theindividua l cor relogram channels, the weft in tensity for these channels hasbeen backed off to zero.

200400100020004000f/Hz "Bad dog"

0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6501002004001000

0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6time/s

−70

−60

−50

−40

dB

200400100020004000f/Hz Wefts1,2

501002004001000

200400100020004000f/Hz Noise1

Click2 Click3

200400100020004000f/Hz Click1

Figu re 5.7: The “bad dog” sound example, represen ted in the top panes by it st ime-frequency in tensity envelope and it s per iodogram (summaryautocor rela t ions for every t ime step). The noise and click elements expla in ing theexample a re displayed as their in tensity envelopes; weft elements addit iona llydisplay their per iod-t rack on axes match ing the per iodogram.

Page 6: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

1999: “Size Matters”• All you need is a BDNN

• .. and the data (and patience) to train it

6

500

1000

2000

4000

9.25

18.5

37

74

32

34

36

38

40

42

44

Hidden layer / unitsTraining set / hours

WER

%

WER for PLP12N nets vs. net size & training data

1 2 5 10 20 50

178 GCUP

356 GCUP

712 GCUP

1.42 TCUP

2.8 TCUP

5.7 TCUP

11.4 TCUP

100 200 50032

34

36

38

40

42

44WER vs. frames/weight

WER

%

frames/weight

Ellis & Morgan’99

Page 7: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

2001: Overlap Remains

• Meeting Recorder Project• natural speech interactions• ~10% of speech frames have overlaps

7

Janin, Baron, Edwards, Ellis, Gelbart, Morgan, Peskin, Pfau, Shriberg, Stolcke, Wooters ’03

120 125 130 135 140 145 150 155 time / secs

speakeractive

level/dB

mr-2000-06-30-1600

Spkr A

Spkr B

Spkr C

Spkr D

Spkr E

Tabletop20

40

breathnoise

crosstalk

backchannel floor seizure

interruptions

speaker Bcedes floor

Page 8: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

2003: EARS• “Pushing the envelope (aside)”

8

Page 9: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

2004: Pitch Based Separation• Literal implementations of the process

described in Bregman 1990:• compute “regularity” cues:

- common onset- gradual change- harmonic patterns- common fate

9

Hu & Wang 2004

Original v3n7

Brown 1992

Ellis 1996

Hu & Wang 2004

Page 10: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

2004: Model-Based Separation

• Data-driven separation• Learn codebooks for individual speakers• Find best combination of sources

• Pitch gives the “grist”

10

Roweis ’01Kristjansson, Attias, Hershey ’04

Page 11: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

2006: Pitch for VAD• Pitch is the most robust perceptual cue to

speech

11

Lee & Ellis’06

Page 12: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

2004-2011: The Epic

12

Processing and Perception of Speech and Music

Speech and Audio Signal Processing

Ben Gold Nelson Morgan Dan Ellis

S E C O N D E D I T I O N

Page 13: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

2012: Project Babel• Noisy speech is a challenge:

• How to disentangle speech and interference?• Energy peaks are speech (spectral subtraction)• Energy troughs are noise (Wiener, log-mmse)• Speech has a known form (Factorial HMM)• Voiced speech is periodic (Pitch-based)

13

freq /

kHz IARPA_BABEL_OP1_204_73990_20130521_162632_inLine

0

1

2

3

4

50 10 15 20 25 time / sec level / dB

-40

-20

0

20

Page 14: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

Classification-based Pitch Tracker• Subband Autocorrelation Classification

(SAcC) Pitch Tracker: • Trained on noisy speech with true pitch targets

• Subband autocorrelation features• PCA to reduce dimensions

14

Lee & Ellis ’12

Cochleafilterbank Pitch

Sound

short-timeautocorrelation

delay line

frequ

ency

chan

nels

Correlogramslice

freqlag

time

BN

B3B2B1

c1

ck

uv60.061.863.665.4

392.1403.6

Neural network classifier

Viterbismoother

Page 15: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

Flat-Pitch Processing• Time-varying filtering is tricky

• if pitch variation and filter impulse response are on a similar time-scale

15

• Solution: Flatten the pitch• use local pitch

estimateto resample

• process constant-pitch

• resampling is (near) invertible

noisyspeech

enhancedspeech

SAcCpitch

tracker

time-varyingresampling

pitch-flattentime map

time-varyingresampling

flat-pitch-domainenhancement

inversetime map

freq

/ Hz

Noisy signal

0

500

1000

Resampled to pitch = 200 Hz

Filtered − comb

time / s

Resampled back to original pitch and mixed with original

0 0.5 1 1.5

pitchpr(vx)

Page 16: or Morgan, Me & Pitch - Columbia Universitydpwe/talks/2015-03-morgan-pitch.pdfRobustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16 2001: Overlap Remains • Meeting Recorder

Robustness, Separation, Pitch - Dan Ellis 2015-03-14 - /16

Conclusions• Pitch is a key feature for speech separation

• “marking” signal against other speech or noise

• Some ideas don’t go away• .. though they can change shape

• Impact from collaboration• you can’t do good work with someone

without the human connection

16