ee269 signal processing for machine learning - cepstrum

EE269 Signal Processing for Machine Learning

Cepstrum

Instructor : Mert Pilanci

Stanford University

October 11 2021

Linear systems and additive noise

I Linear systems, e.g., filters, can easily separate additive noise from useful information when we know the frequency ranges of the noise and the information

y[n] = x[n] + w[n]

I In vector notation, applying a linear filter H gives

Hy = Hx + Hw
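
A minimal sketch of this separation, assuming NumPy and illustrative choices that are not from the lecture (a 5 Hz sinusoid as the information, a 200 Hz sinusoid as the noise, and a 25-tap moving average standing in for the filter H):

```python
import numpy as np

fs = 1000                                    # sampling rate in Hz (assumed)
n = np.arange(1000)
x = np.sin(2 * np.pi * 5 * n / fs)           # low-frequency "information" (assumed)
w = 0.5 * np.sin(2 * np.pi * 200 * n / fs)   # high-frequency additive "noise" (assumed)
y = x + w                                    # y[n] = x[n] + w[n]

h = np.ones(25) / 25                         # impulse response of a moving-average filter

def H(v):
    """Apply the linear filter H (convolution with h, same output length)."""
    return np.convolve(v, h, mode="same")

# Linearity: filtering the sum equals the sum of the filtered parts.
lin_err = np.max(np.abs(H(y) - (H(x) + H(w))))

# Compare what survives of each component, ignoring the edges.
noise_gain = np.max(np.abs(H(w)[50:-50]))
signal_gain = np.max(np.abs(H(x)[50:-50]))
print(lin_err, noise_gain, signal_gain)
```

Because filtering is linear, H(y) equals H(x) + H(w) up to rounding. This particular moving average has a spectral null at 200 Hz, so H(w) is essentially zero away from the edges while the 5 Hz component passes nearly unchanged.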

Multiplicative or convolutive noise

I This is harder if the signal and noise are convolved, e.g., in speech processing

y[n] = x[n] ∗ w[n]

I w[n] is the flowing air (noise source)

I x[n] is the vocal tract (filter)

We can develop an operator that separates convolved components by transforming convolution into addition

Cepstrum

I Developed to separate convolved signals

y[n] = x[n] ∗ w[n]

Discrete Fourier Domain:

Y[k] = X[k] W[k]

I Take logarithms

log Y[k] = log X[k] + log W[k]

I we can apply a linear filter to log Y[k] to separate the two components

I equivalently, we can take the DFT of log Y[k] and process it in the frequency domain

cepstrum is the DFT (or DCT) of the log spectrum
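
The following sketch illustrates how the logarithm turns convolution into addition. It rests on several assumptions that are not from the lecture: the convolution is circular (so that Y[k] = X[k] W[k] holds exactly for the DFT), the log is applied to the magnitude spectrum (the so-called real cepstrum), and the inverse DFT is used, which for a real, even log-magnitude spectrum differs from the forward DFT only by a factor of N. The signals x and w are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.standard_normal(N)      # synthetic broadband signal (assumed)
w = np.zeros(N)
w[0] = 1.0
w[40] = 0.5                     # synthetic echo at a lag of 40 samples (assumed)

# Circular convolution y = x * w, computed through the DFT.
X = np.fft.fft(x)
W = np.fft.fft(w)
y = np.real(np.fft.ifft(X * W))
Y = np.fft.fft(y)

def real_cepstrum(S):
    """Inverse DFT of the log-magnitude spectrum."""
    return np.real(np.fft.ifft(np.log(np.abs(S))))

c_x, c_w, c_y = real_cepstrum(X), real_cepstrum(W), real_cepstrum(Y)

# The cepstrum of the convolution is the sum of the individual cepstra.
err = np.max(np.abs(c_y - (c_x + c_w)))

# The echo shows up as a peak in c_w at the echo lag.
echo_lag = int(np.argmax(np.abs(c_w[1:N // 2]))) + 1
print(err, echo_lag)
```

In this example the echo contributes an isolated peak in the cepstrum, which is what makes it separable from the smoother remainder by a simple linear operation.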

Application: Mel-frequency spectrum

I perceptual scale of pitches

I 1000 mels = 1000 Hz

I a formula to convert f hertz into m mels

$$m = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
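
A small sketch of the conversion, assuming the commonly used form of the formula with 700 Hz in the denominator. The function names hz_to_mel and mel_to_hz are illustrative, and the inverse mapping is obtained by solving the formula for f.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in hertz to mels: m = 2595 log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Invert the formula above: f = 700 (10^(m / 2595) - 1)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

m_1k = float(hz_to_mel(1000.0))              # close to 1000 mels
roundtrip = mel_to_hz(hz_to_mel([100.0, 1000.0, 4000.0]))
print(m_1k, roundtrip)
```

With these constants, 1000 Hz maps to approximately 1000 mels, which matches the anchor point of the scale.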

I weighted DFT magnitude

I mel-frequency spectrum MF [r] is defined as

$$\mathrm{MF}[r] = \sum_{k} \left| V_r[k] X[k] \right|^2$$

I Vr[k] is the triangular weighting function for the rth filter.

I bandwidths are constant for center frequencies < 1 kHz and then increase exponentially

I identical to convolutions with 22 filters
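
The weighted sum can be sketched as follows. This is a simplified illustration, not the lecture's exact filter design: the triangular filters V_r[k] are placed at points equally spaced on the mel scale (a common simplification that is only approximately constant-bandwidth below 1 kHz), and the sampling rate, FFT length, and function names are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Return V of shape (n_filters, n_fft // 2 + 1); row r holds V_r[k]."""
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    mel_edges = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft   # frequency of DFT bin k
    V = np.zeros((n_filters, freqs.size))
    for r in range(n_filters):
        lo, mid, hi = hz_edges[r], hz_edges[r + 1], hz_edges[r + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        V[r] = np.clip(np.minimum(rising, falling), 0.0, None)  # triangle
    return V

def mel_spectrum(frame, V):
    """MF[r] = sum_k |V_r[k] X[k]|^2 over the non-negative frequency bins."""
    n_fft = 2 * (V.shape[1] - 1)
    X = np.fft.rfft(frame, n=n_fft)
    return np.sum(np.abs(V * X) ** 2, axis=1)

fs, n_fft = 16000, 512                        # assumed values
V = mel_filterbank(22, n_fft, fs)
frame = np.sin(2 * np.pi * 1000 * np.arange(n_fft) / fs)   # 1 kHz test tone
MF = mel_spectrum(frame, V)
print(MF.shape, int(np.argmax(MF)))
```

For the 1 kHz test tone, the largest entry of MF belongs to the filter whose triangle has the greatest weight at the 1 kHz bin.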

$$\mathrm{MF}[r] = \sum_{k} \left| V_r[k] X[k] \right|^2$$

I Mel Frequency Cepstral Coefficient (MFCC)

$$\mathrm{MFCC}[m] = \sum_{r=1}^{R} \log(\mathrm{MF}[r]) \cos\left[\frac{2\pi}{R}\left(r + \frac{1}{2}\right) m\right] \tag{1}$$

I i.e., an inner product with cosines: MFCC[m] = ⟨log MF[r], c_m[r]⟩
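
The inner-product view can be sketched in a few lines. The cosine basis used here, c_m[r] = cos((2π/R)(r + 1/2) m) for r = 1, ..., R, is an assumed form; other DCT conventions differ in offset and scaling. The function name, the number of coefficients, and the small constant added before the logarithm are likewise illustrative.

```python
import numpy as np

def mfcc_from_mel(MF, n_coeffs=13):
    """Inner products of log MF[r] with cosines c_m[r], m = 0..n_coeffs-1."""
    MF = np.asarray(MF, dtype=float)
    R = MF.size
    r = np.arange(1, R + 1)                   # filter index r = 1..R
    m = np.arange(n_coeffs)                   # cepstral index
    # C[m, r-1] = c_m[r] (assumed basis, see the text above)
    C = np.cos(2.0 * np.pi / R * np.outer(m, r + 0.5))
    log_mf = np.log(MF + 1e-12)               # small offset avoids log(0)
    return C @ log_mf

# A flat mel spectrum has all of its log energy in coefficient 0.
flat = np.full(22, np.e)                      # log(e) = 1 in every band
coeffs = mfcc_from_mel(flat)
print(coeffs[0], np.max(np.abs(coeffs[1:])))
```

A flat mel spectrum is a quick sanity check: only the m = 0 coefficient is nonzero, because the cosines for m ≥ 1 sum to zero over r.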

Application: Speaker Identification

I train a k-Nearest Neighbor classifier to classify frames

I AN4 dataset (CMU): 5 male and 5 female subjects speaking words and numbers

I collect the training samples into frames of 30 ms with an overlap of 75%

I calculate MFCC

I train a k-Nearest Neighbor classifier on the frames

I for a given test signal, predictions are made every frame

I most frequently occurring label is declared as the speaker
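
The steps above can be sketched end to end with NumPy. This is an illustrative sketch, not the code used in the lecture: the sampling rate, k = 5, the 13-dimensional features, and all function names are assumptions, and random Gaussian clusters stand in for per-frame MFCC vectors so that the example runs without the AN4 data.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=30.0, overlap=0.75):
    """Split x into frames of frame_ms milliseconds with fractional overlap."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def knn_predict(train_X, train_y, test_X, k=5):
    """Label each test row by majority vote among its k nearest training rows."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in train_y[nearest]])

def identify_speaker(frame_labels):
    """Declare the most frequently predicted frame label as the speaker."""
    return int(np.bincount(frame_labels).argmax())

# Framing: one second at an assumed 16 kHz gives 480-sample frames, hop 120.
frames = frame_signal(np.arange(16000.0), fs=16000)

# Stand-in features for two speakers (synthetic, replacing real MFCCs).
rng = np.random.default_rng(0)
train_X = np.vstack([rng.normal(0.0, 1.0, (200, 13)),
                     rng.normal(3.0, 1.0, (200, 13))])
train_y = np.repeat([0, 1], 200)
test_X = rng.normal(3.0, 1.0, (50, 13))       # frames from "speaker 1"

frame_pred = knn_predict(train_X, train_y, test_X, k=5)
speaker = identify_speaker(frame_pred)
print(frames.shape, speaker)
```

Voting over frames makes the final decision more robust than any single frame prediction, since occasional misclassified frames are outvoted.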

Figure: time-domain signals of speaker 1 (blue) and speaker 2 (red)

Figure: frame-based MFCC features

I average accuracy is 92.93%
