cepstral coefficients

Methods and algorithms of speech recognition course, Lecture 5. Nikolay V. Karpov, nkarpov(а)hse.ru

Upload: nikolay-karpov

Post on 10-Jul-2015


Page 1: Cepstral coefficients

Methods and algorithms of speech recognition course

Lecture 5

Nikolay V. Karpov

nkarpov(а)hse.ru

Page 2: Cepstral coefficients

◦ Cepstral Coefficients
   – Relation to pole positions
   – Relation to LPC filter coefficients

◦ Line Spectrum Frequencies
   – Relation to pole positions and to formant frequencies

◦ Summary of LPC parameter sets

Most speech recognisers describe the spectrum of speech sounds using cepstral coefficients. This is because they are good at discriminating between different phonemes, are fairly independent of each other and have approximately Gaussian distributions for a particular phoneme.

Most speech coders describe the spectrum of speech sounds using line spectrum frequencies. This is because they can be quantised to low precision without distorting the spectrum too much.

Page 3: Cepstral coefficients

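Cepstrum: inverse Fourier transform of the log spectrum (periodic spectrum ⇒ discrete cepstrum)

$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big(V(e^{j\omega})\big)\, e^{j\omega n}\, d\omega$$

The coefficients $c_n$ can be obtained directly from the $x_k$ (the poles of $V(z)$).

Define

$$C(z) = \sum_{n=-\infty}^{\infty} c_n z^{-n} \quad\Rightarrow\quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} C(e^{j\omega})\, e^{j\omega n}\, d\omega$$

This is the standard inverse z-transform, derived by taking the inverse Fourier transform of both sides of the first equation. By equating the Fourier transforms of the two expressions for $c_n$, we get

$$C(z) = \log(V(z)) = \log\left(\frac{G}{A(z)}\right) = \log(G) - \log(A(z))$$

$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p}\left(1 - x_k z^{-1}\right)$$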

Page 4: Cepstral coefficients

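By using the Taylor series

$$\log(1-y) = -\sum_{n=1}^{\infty}\frac{y^n}{n} \quad \text{for } |y|<1$$

we get

$$C(z) = \log(G) - \log(A(z)) = \log(G) - \sum_{k=1}^{p}\log\left(1 - x_k z^{-1}\right) = \log(G) + \sum_{k=1}^{p}\sum_{n=1}^{\infty}\frac{x_k^n}{n}\, z^{-n}$$

By collecting all the terms in $z^{-n}$, we can get $c_n$ in terms of $x_k$:

$$c_n = \begin{cases} 0 & \text{for } n<0 \\ \log(G) & \text{for } n=0 \\ \dfrac{1}{n}\displaystyle\sum_{k=1}^{p} x_k^n & \text{for } n>0 \end{cases}$$

Because $|x_k| < 1$, the $c_n$ decrease exponentially with $n$.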
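The pole-position formula, c_0 = log(G) and c_n = (1/n) Σ x_k^n for n > 0, can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `cepstrum_from_poles` and the second-order test filter A(z) = 1 - 0.7 z^-1 + 0.5 z^-2 are illustrative assumptions, not part of the lecture.

```python
import cmath
import math

def cepstrum_from_poles(poles, gain, n_coeffs):
    """Return [c_0, ..., c_{n_coeffs-1}] for V(z) = G / prod_k (1 - x_k z^-1).

    Assumes n_coeffs >= 1 and that complex poles come in conjugate pairs,
    so that the power sums are real.
    """
    if any(abs(x) >= 1.0 for x in poles):
        raise ValueError("all poles must lie strictly inside the unit circle")
    c = [math.log(gain)]                      # c_0 = log(G)
    for n in range(1, n_coeffs):
        # c_n = (1/n) * sum_k x_k^n; imaginary parts cancel for conjugate pairs
        c.append(sum(x ** n for x in poles).real / n)
    return c

# Hypothetical example: poles are the roots of z^2 - 0.7 z + 0.5
d = cmath.sqrt(0.7 ** 2 - 4 * 0.5)
poles = [(0.7 + d) / 2, (0.7 - d) / 2]
c = cepstrum_from_poles(poles, gain=1.0, n_coeffs=6)
```

Here both poles have magnitude sqrt(0.5), so each |c_n| is bounded by 2 (sqrt(0.5))^n / n, which illustrates the exponential decay noted above.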

Page 5: Cepstral coefficients

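Differentiating $C(z) = \log(G) - \log(A(z))$ with respect to $z$:

$$C'(z) = -\frac{A'(z)}{A(z)} \quad\Rightarrow\quad A(z)\,C'(z) = -A'(z) \quad\Rightarrow\quad z\,A(z)\,C'(z) = -z\,A'(z)$$

Substituting the power series for $A(z)$ and $C(z)$ (the $m = 0$ term of $C'(z)$ vanishes, and $c_m = 0$ for $m < 0$):

$$\left(1-\sum_{k=1}^{p} a_k z^{-k}\right) z \sum_{m=1}^{\infty}\left(-m\,c_m z^{-(m+1)}\right) = -z\sum_{n=1}^{p} n\,a_n z^{-(n+1)}$$

$$\left(1-\sum_{k=1}^{p} a_k z^{-k}\right)\sum_{m=1}^{\infty} m\,c_m z^{-m} = \sum_{n=1}^{p} n\,a_n z^{-n}$$

$$\sum_{m=1}^{\infty} m\,c_m z^{-m} - \sum_{k=1}^{p}\sum_{m=1}^{\infty} m\,c_m a_k z^{-(m+k)} = \sum_{n=1}^{p} n\,a_n z^{-n}$$

Replacing $m$ by $n-k$ (to make the $z$ exponent uniform) gives

$$\sum_{n=1}^{\infty} n\,c_n z^{-n} = \sum_{n=1}^{p} n\,a_n z^{-n} + \sum_{k=1}^{p}\sum_{n=k+1}^{\infty}(n-k)\,c_{n-k}\,a_k z^{-n}$$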

Page 6: Cepstral coefficients

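Equating the coefficients of $z^{-n}$ on both sides (with $a_n = 0$ for $n > p$):

$$n\,c_n = n\,a_n + \sum_{k=1}^{\min(p,\,n-1)}(n-k)\,c_{n-k}\,a_k \quad\Rightarrow\quad c_n = a_n + \frac{1}{n}\sum_{k=1}^{\min(p,\,n-1)}(n-k)\,c_{n-k}\,a_k$$

Thus we have a recurrence relation to calculate the $c_n$ from the $a_k$ coefficients:

$$c_1 = a_1$$

$$c_2 = a_2 + \tfrac{1}{2}\,c_1 a_1$$

$$c_3 = a_3 + \tfrac{1}{3}\left(2c_2 a_1 + c_1 a_2\right)$$

$$c_4 = a_4 + \tfrac{1}{4}\left(3c_3 a_1 + 2c_2 a_2 + c_1 a_3\right)$$

$$c_5 = \dots$$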
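The recurrence c_n = a_n + (1/n) Σ_{k=1}^{min(p, n-1)} (n-k) c_{n-k} a_k translates almost line for line into code. The sketch below assumes the sign convention A(z) = 1 - Σ a_k z^-k used in these slides; the function name `lpc_to_cepstrum` and the example coefficients are illustrative choices.

```python
import math

def lpc_to_cepstrum(a, gain, n_coeffs):
    """Return [c_0, ..., c_{n_coeffs-1}] for V(z) = G / (1 - sum_k a[k-1] z^-k).

    `a` holds a_1..a_p; n_coeffs must be at least 1.
    """
    p = len(a)
    c = [0.0] * n_coeffs
    c[0] = math.log(gain)                          # c_0 = log(G)
    for n in range(1, n_coeffs):
        a_n = a[n - 1] if n <= p else 0.0          # a_n = 0 beyond the filter order
        acc = 0.0
        for k in range(1, min(p, n - 1) + 1):
            acc += (n - k) * c[n - k] * a[k - 1]   # (n-k) * c_{n-k} * a_k
        c[n] = a_n + acc / n
    return c

# Hypothetical example: A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, i.e. a_1 = 0.7, a_2 = -0.5
c = lpc_to_cepstrum([0.7, -0.5], gain=1.0, n_coeffs=5)
```

For this example the first terms follow the expansions by hand: c_1 = a_1 = 0.7 and c_2 = a_2 + c_1 a_1 / 2 = -0.255. Note that the recurrence never needs the pole positions, which avoids a polynomial root search.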

Page 7: Cepstral coefficients

These coefficients are called the complex cepstrum coefficients (even though they are real). The cepstrum coefficients use log|V| instead of log(V) and (except for c_0) are half as big.
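The factor of one half can be seen numerically by taking an inverse DFT of log|V| sampled around the unit circle. This is a rough sketch, not a production routine: the function name `real_cepstrum_lpc`, the 512-point grid and the example filter are assumptions made for illustration, and the DFT only approximates the integral (the aliasing error is negligible here because the c_n decay quickly).

```python
import cmath
import math

def real_cepstrum_lpc(a, gain, n_coeffs, n_fft=512):
    """Approximate the first n_coeffs cepstrum values of V(z) = G / A(z),
    where A(z) = 1 - sum_k a[k-1] z^-k, from samples of log|V(e^{jw})|."""
    log_mag = []
    for i in range(n_fft):
        w = 2.0 * math.pi * i / n_fft
        A = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        log_mag.append(math.log(gain) - math.log(abs(A)))
    ceps = []
    for n in range(n_coeffs):
        # log|V| is real and even in w, so the inverse DFT reduces to a cosine sum
        s = sum(lm * math.cos(2.0 * math.pi * i * n / n_fft)
                for i, lm in enumerate(log_mag))
        ceps.append(s / n_fft)
    return ceps

# Hypothetical example: a_1 = 0.7, a_2 = -0.5, G = 2
r = real_cepstrum_lpc([0.7, -0.5], gain=2.0, n_coeffs=3)
```

For this filter the complex cepstrum has c_1 = 0.7 and c_2 = -0.255, so the values in `r` should come out close to log(2), 0.35 and -0.1275.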

Note the cute names: spectrum→cepstrum, frequency→quefrency, filter→lifter, etc

Page 8: Cepstral coefficients

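For the all-pole filter

$$V(z) = \frac{G}{A(z)} = \frac{G}{1 - \sum_{j=1}^{p} a_j z^{-j}} = \frac{G}{1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_p z^{-p}}$$

define

$$P(z) = A(z) + z^{-(p+1)}A^{*}(z^{*-1}) = 1 - (a_1 + a_p)z^{-1} - (a_2 + a_{p-1})z^{-2} - \dots - (a_p + a_1)z^{-p} + z^{-(p+1)}$$

$$Q(z) = A(z) - z^{-(p+1)}A^{*}(z^{*-1}) = 1 - (a_1 - a_p)z^{-1} - (a_2 - a_{p-1})z^{-2} - \dots - (a_p - a_1)z^{-p} - z^{-(p+1)}$$

(the expanded forms are written for real $a_k$).

V(z) is stable if and only if the roots of P(z) and Q(z) all lie on the unit circle and they are interleaved.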
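For real coefficients, P(z) and Q(z) are simply the sum and difference of the coefficient list of A(z) and its mirror image. A minimal sketch, assuming A(z) = 1 - Σ a_k z^-k with real a_k; the helper name `lsf_polynomials` is hypothetical.

```python
def lsf_polynomials(a):
    """Return (P, Q) as coefficient lists in powers z^0, z^-1, ..., z^-(p+1).

    `a` holds the real LPC coefficients a_1..a_p of A(z) = 1 - sum_k a_k z^-k.
    """
    alpha = [1.0] + [-ak for ak in a] + [0.0]   # A(z), padded to degree p+1
    mirrored = alpha[::-1]                      # z^-(p+1) * A(1/z) for real coefficients
    P = [x + y for x, y in zip(alpha, mirrored)]
    Q = [x - y for x, y in zip(alpha, mirrored)]
    return P, Q

# Hypothetical example: a_1 = 0.7, a_2 = -0.5
P, Q = lsf_polynomials([0.7, -0.5])
```

P is always symmetric and Q antisymmetric, which is why z = +1 is always a root of Q(z) and why both sets of roots can be searched for on the unit circle alone.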

Page 9: Cepstral coefficients

If the roots of P(z) are at exp(2πjf_i) for i = 1, 3, … and those of Q(z) are at exp(2πjf_i) for i = 0, 2, …, with f_{i+1} > f_i ≥ 0, then the LSF frequencies are defined as f_1, f_2, …, f_p. Note that the roots at z = +1 and z = –1 are always present, i.e. it is always true that f_0 = 0 and f_{p+1} = 0.5.

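Example with $p = 2$:

$$A(z) = 1 - 0.7z^{-1} + 0.5z^{-2} \qquad\qquad z^{-3}A^{*}(z^{*-1}) = 0.5z^{-1} - 0.7z^{-2} + z^{-3}$$

$$P(z) = 1 - 0.2z^{-1} - 0.2z^{-2} + z^{-3} \qquad\qquad Q(z) = 1 - 1.2z^{-1} + 1.2z^{-2} - z^{-3}$$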
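The LSFs of the example can be located by searching the unit circle for the roots of P(z) and Q(z). The sketch below reads the example as P(z) = 1 - 0.2 z^-1 - 0.2 z^-2 + z^-3 and Q(z) = 1 - 1.2 z^-1 + 1.2 z^-2 - z^-3. It uses a simple sign-change scan with bisection, not a general polynomial root finder; the function name, the grid size and the iteration count are arbitrary illustrative choices, and a root that falls exactly on a grid point would be missed.

```python
import math

def unit_circle_freqs(coeffs, grid=512, iters=60):
    """Frequencies f in (0, 0.5) at which a real symmetric or antisymmetric
    polynomial in z^-1 (coefficients of z^0, z^-1, ...) vanishes at z = exp(2*pi*j*f)."""
    m = (len(coeffs) - 1) / 2.0
    symmetric = coeffs[0] == coeffs[-1]
    trig = math.cos if symmetric else math.sin

    def f(w):
        # z^m * poly(z) on the unit circle is purely real (symmetric case) or
        # purely imaginary (antisymmetric case); return that scalar part.
        return sum(c * trig(w * (m - k)) for k, c in enumerate(coeffs))

    freqs = []
    ws = [math.pi * i / grid for i in range(1, grid)]   # open interval (0, pi)
    for lo, hi in zip(ws, ws[1:]):
        if f(lo) * f(hi) < 0.0:                          # sign change brackets a root
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            freqs.append(0.5 * (lo + hi) / (2.0 * math.pi))
    return freqs

P = [1.0, -0.2, -0.2, 1.0]
Q = [1.0, -1.2, 1.2, -1.0]
lsf = sorted(unit_circle_freqs(P) + unit_circle_freqs(Q))
```

For this example the search returns two frequencies, roughly 0.148 (a root of P) and 0.234 (a root of Q) cycles per sample. Together with the fixed roots at z = +1 (Q) and z = -1 (P) they alternate between the two polynomials, as the stability condition requires.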

Page 10: Cepstral coefficients

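$$P(z) = 0 \quad\Leftrightarrow\quad A(z) = -z^{-(p+1)}A^{*}(z^{*-1}) \quad\Leftrightarrow\quad H(z) = -1$$

$$Q(z) = 0 \quad\Leftrightarrow\quad A(z) = +z^{-(p+1)}A^{*}(z^{*-1}) \quad\Leftrightarrow\quad H(z) = +1$$

where

$$H(z) = \frac{A(z)}{z^{-(p+1)}A^{*}(z^{*-1})} = z\prod_{i=1}^{p}\frac{1 - x_i z^{-1}}{z^{-1}\left(1 - x_i^{*} z\right)} = z\prod_{i=1}^{p}\frac{z - x_i}{1 - x_i^{*} z}$$

Here the $x_i$ are the roots of $A(z) = G\,V^{-1}(z)$, i.e. the poles of $V(z)$.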

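It turns out that, provided all the $x_i$ lie inside the unit circle, the absolute values of the terms making up $H(z)$ are either all > 1 or else all < 1. Taking the modulus of a typical term, $\left|(z - x_i)/(1 - x_i^{*} z)\right|$, shows that it is greater than 1 when $|z| > 1$ and less than 1 when $|z| < 1$. Hence $|H(z)| = 1$, and therefore $P(z) = 0$ or $Q(z) = 0$, is only possible on the unit circle $|z| = 1$.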
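The modulus comparison can be written out explicitly. This is a sketch that assumes the factorisation $H(z) = z\prod_i (z - x_i)/(1 - x_i^{*} z)$, which follows from $A(z) = \prod_i (1 - x_i z^{-1})$; the intermediate steps are filled in here rather than taken from the slides.

```latex
\begin{aligned}
|z - x_i|^2 - |1 - x_i^{*} z|^2
  &= \left(|z|^2 + |x_i|^2 - 2\,\mathrm{Re}(x_i^{*} z)\right)
   - \left(1 + |x_i|^2 |z|^2 - 2\,\mathrm{Re}(x_i^{*} z)\right) \\
  &= \left(|z|^2 - 1\right)\left(1 - |x_i|^2\right)
\end{aligned}
```

With $|x_i| < 1$ the second factor is positive, so the sign of the difference is the sign of $|z|^2 - 1$. Every factor of $H(z)$, including the leading $z$, therefore has modulus above 1 outside the unit circle and below 1 inside it.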

Page 11: Cepstral coefficients

Filter Coefficients: a_i
– Stability check difficult; Sensitive to errors; Cannot interpolate

Pole Positions: x_i
+ Stability check easy; Can interpolate but unordered.
– Hard to calculate; Sensitive to errors near |x_i| = 1

Reflection Coefficients: r_i
+ Stability check easy; Can interpolate
– Sensitive to errors near ±1

Log Area Ratios: g_i
+ Stability guaranteed; Can interpolate

Cepstral Coefficients: c_i
+ Good for speech recognition
– Stability check difficult

Line Spectrum Frequencies: f_i
+ Stability check easy; Can interpolate; Vary smoothly in time; Strongly correlated ⇒ better coding; Related to spectral peaks (formants).
– Awkward to calculate