cepstral coefficients
Methods and algorithms of speech recognition course
Lecture 5
Nikolay V. Karpov
nkarpov(а)hse.ru
◦ Cepstral Coefficients
    Relation to pole positions
    Relation to LPC filter coefficients
◦ Line Spectrum Frequencies
    Relation to pole positions and to formant frequencies
◦ Summary of LPC parameter sets
Most speech recognisers describe the spectrum of speech sounds using cepstral coefficients. This is because they are good at discriminating between different phonemes, are fairly independent of each other and have approximately Gaussian distributions for a particular phoneme.
Most speech coders describe the spectrum of speech sounds using line spectrum frequencies. This is because they can be quantised to low precision without distorting the spectrum too much.
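Cepstrum: inverse Fourier transform of log spectrum (periodic spectrum ⇒ discrete cepstrum)

$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\left(V(e^{j\omega})\right) e^{j\omega n}\, d\omega$$

The coefficients $c_n$ can be obtained directly from the $x_k$. Define

$$C(z) = \sum_{n=-\infty}^{\infty} c_n z^{-n} \quad\Leftrightarrow\quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} C(e^{j\omega})\, e^{j\omega n}\, d\omega$$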
This is the standard inverse z-transform, derived by taking the inverse Fourier transform of both sides of the first equation. By equating the Fourier transforms of the two expressions for $c_n$, we get

$$C(z) = \log(V(z)) = \log\left(\frac{G}{A(z)}\right) = \log(G) - \log(A(z))$$

$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p}\left(1 - x_k z^{-1}\right)$$
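By using the Taylor series

$$\log(1-y) = -\sum_{n=1}^{\infty}\frac{y^n}{n} \quad \text{for } |y| < 1$$

we get

$$C(z) = \log(G) - \log(A(z)) = \log(G) - \sum_{k=1}^{p}\log\left(1 - x_k z^{-1}\right) = \log(G) + \sum_{k=1}^{p}\sum_{n=1}^{\infty}\frac{x_k^n}{n}\, z^{-n}$$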
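By collecting all the terms in $z^{-n}$, we can get $c_n$ in terms of the $x_k$:

$$c_n = \begin{cases} 0 & \text{for } n < 0 \\ \log(G) & \text{for } n = 0 \\ \dfrac{1}{n}\sum_{k=1}^{p} x_k^n & \text{for } n > 0 \end{cases}$$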
Because $|x_k| < 1$, the $c_n$ decrease exponentially with $n$.
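The pole formula for $c_n$ can be compared numerically with the inverse-Fourier-transform definition of the cepstrum. The following is a minimal sketch in Python with NumPy; the function names, the example poles and gain, and the FFT length are illustrative assumptions and not part of the lecture.

```python
import numpy as np

def cepstrum_from_poles(poles, gain, n_max):
    """c_0..c_n_max of V(z) = G / prod_k (1 - x_k z^-1), using c_n = (1/n) sum_k x_k^n."""
    poles = np.asarray(poles, dtype=complex)
    c = np.zeros(n_max + 1)
    c[0] = np.log(gain)
    for n in range(1, n_max + 1):
        # Conjugate pole pairs make the sum real; np.real drops rounding residue.
        c[n] = np.real(np.sum(poles ** n)) / n
    return c

def cepstrum_from_spectrum(poles, gain, n_max, n_fft=4096):
    """Same coefficients from the inverse FFT of log V(e^{jw}) sampled at n_fft points."""
    poles = np.asarray(poles, dtype=complex)
    w = 2 * np.pi * np.arange(n_fft) / n_fft
    z_inv = np.exp(-1j * w)
    # Take the log factor by factor: each (1 - x_k z^-1) has positive real part
    # when |x_k| < 1, so the principal branch matches the Taylor series.
    log_a = np.sum(np.log(1 - np.outer(z_inv, poles)), axis=1)
    log_v = np.log(gain) - log_a
    return np.real(np.fft.ifft(log_v))[: n_max + 1]

# Hypothetical example: one conjugate pole pair plus one real pole, G = 2.
example_poles = [0.8 * np.exp(0.4j), 0.8 * np.exp(-0.4j), 0.5]
c_poles = cepstrum_from_poles(example_poles, 2.0, 20)
c_fft = cepstrum_from_spectrum(example_poles, 2.0, 20)
print(np.max(np.abs(c_poles - c_fft)))
```

The two results should agree to within rounding error, and the magnitude of `c_poles[n]` is bounded by $p\,\max_k|x_k|^n / n$, which illustrates the exponential decay.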
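Differentiating

$$C(z) = \log(G) - \log(A(z))$$

$$C'(z) = -\frac{A'(z)}{A(z)}$$

$$A(z)\,C'(z) = -A'(z) \quad\Rightarrow\quad -z\,C'(z)\,A(z) = z\,A'(z)$$

Substituting the series for $C(z)$ and $A(z)$:

$$\left(1 - \sum_{k=1}^{p} a_k z^{-k}\right)\sum_{m=1}^{\infty} m\,c_m z^{-m} = \sum_{n=1}^{p} n\,a_n z^{-n}$$

$$\sum_{m=1}^{\infty} m\,c_m z^{-m} - \sum_{k=1}^{p}\sum_{m=1}^{\infty} m\,c_m a_k z^{-(m+k)} = \sum_{n=1}^{p} n\,a_n z^{-n}$$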
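Replacing $m$ by $n-k$ (to make the $z$ exponent uniform) gives

$$\sum_{n=1}^{\infty} n\,c_n z^{-n} = \sum_{n=1}^{p} n\,a_n z^{-n} + \sum_{k=1}^{p}\sum_{n=k+1}^{\infty} (n-k)\,c_{n-k}\,a_k z^{-n}$$

Equating the coefficients of $z^{-n}$ on both sides (with $a_n = 0$ for $n > p$):

$$n\,c_n = n\,a_n + \sum_{k=1}^{\min(p,\,n-1)} (n-k)\,c_{n-k}\,a_k$$

$$c_n = a_n + \frac{1}{n}\sum_{k=1}^{\min(p,\,n-1)} (n-k)\,c_{n-k}\,a_k$$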
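Thus we have a recurrence relation to calculate the $c_n$ from the $a_k$ coefficients:

$$c_1 = a_1$$

$$c_2 = a_2 + \tfrac{1}{2}\,c_1 a_1$$

$$c_3 = a_3 + \tfrac{1}{3}\left(2 c_2 a_1 + c_1 a_2\right)$$

$$c_4 = a_4 + \tfrac{1}{4}\left(3 c_3 a_1 + 2 c_2 a_2 + c_1 a_3\right)$$

$$c_5 = \dots$$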
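The recurrence from the $a_k$ to the $c_n$ can be sketched in a few lines of Python with NumPy. This is an illustrative implementation under the sign convention $A(z) = 1 - \sum_k a_k z^{-k}$ used in these slides; the function name `lpc_to_cepstrum` and the example coefficients are assumptions made for the sketch.

```python
import numpy as np

def lpc_to_cepstrum(a, gain, n_max):
    """c_0..c_n_max for V(z) = G / (1 - sum_k a[k-1] z^-k)."""
    p = len(a)
    c = np.zeros(n_max + 1)
    c[0] = np.log(gain)
    for n in range(1, n_max + 1):
        acc = a[n - 1] if n <= p else 0.0          # a_n is zero beyond order p
        for k in range(1, min(p, n - 1) + 1):
            acc += (n - k) * c[n - k] * a[k - 1] / n
        c[n] = acc
    return c

# Hypothetical example: A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, i.e. a_1 = 0.7, a_2 = -0.5, G = 1.
a_example = [0.7, -0.5]
c_rec = lpc_to_cepstrum(a_example, 1.0, 10)

# Cross-check against the pole formula c_n = (1/n) sum_k x_k^n.
x = np.roots([1.0, -0.7, 0.5])
c_ref = np.array([np.real(np.sum(x ** n)) / n for n in range(1, 11)])
print(np.max(np.abs(c_rec[1:] - c_ref)))
```

For this example the first coefficients work out by hand to $c_1 = 0.7$, $c_2 = -0.5 + \tfrac{1}{2}(0.7)(0.7) = -0.255$ and $c_3 = \tfrac{1}{3}\left(2(-0.255)(0.7) + (0.7)(-0.5)\right) \approx -0.2357$, which matches the pole formula.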
These coefficients are called the complex cepstrum coefficients (even though they are real). The cepstrum coefficients use log|V| instead of log(V) and (except for c0) are half as big.
Note the cute names: spectrum→cepstrum, frequency→quefrency, filter→lifter, etc.
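The statement that the cepstrum based on log|V| is half as big as the complex cepstrum (except for c0) can be checked numerically. The sketch below is an assumption-laden illustration in Python with NumPy: the function names, the example filter and the FFT length are not from the lecture.

```python
import numpy as np

def complex_cepstrum(a, gain, n_max):
    """c_0..c_n_max from the poles of V(z) = G / (1 - sum_k a[k-1] z^-k)."""
    poles = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    n = np.arange(1, n_max + 1)
    c_pos = np.real(np.sum(poles[:, None] ** n, axis=0)) / n
    return np.concatenate(([np.log(gain)], c_pos))

def real_cepstrum(a, gain, n_max, n_fft=4096):
    """Inverse FFT of log|V(e^{jw})| sampled at n_fft frequencies."""
    a = np.asarray(a, dtype=float)
    w = 2 * np.pi * np.arange(n_fft) / n_fft
    k = np.arange(1, len(a) + 1)
    A = 1 - np.exp(-1j * np.outer(w, k)) @ a       # A(e^{jw})
    log_mag = np.log(gain) - np.log(np.abs(A))     # log|V|
    return np.real(np.fft.ifft(log_mag))[: n_max + 1]

# Hypothetical example: A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, G = 2.
cc = complex_cepstrum([0.7, -0.5], 2.0, 12)
rc = real_cepstrum([0.7, -0.5], 2.0, 12)
print(rc[0] - cc[0], np.max(np.abs(rc[1:] - 0.5 * cc[1:])))
```

The halving follows because log|V| is the real part of log(V), so its inverse transform is the even part $(c_n + c_{-n})/2$ of the complex cepstrum, and $c_{-n} = 0$ for $n > 0$.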
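$$V(z) = \frac{G}{A(z)}, \qquad A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_p z^{-p} = 1 - \sum_{j=1}^{p} a_j z^{-j}$$

Define the two polynomials (the expansions are for real $a_j$):

$$P(z) = A(z) + z^{-(p+1)} A^*(z^{*-1}) = 1 - (a_1 + a_p) z^{-1} - (a_2 + a_{p-1}) z^{-2} - \dots - (a_p + a_1) z^{-p} + z^{-(p+1)}$$

$$Q(z) = A(z) - z^{-(p+1)} A^*(z^{*-1}) = 1 - (a_1 - a_p) z^{-1} - (a_2 - a_{p-1}) z^{-2} - \dots - (a_p - a_1) z^{-p} - z^{-(p+1)}$$

V(z) is stable if and only if the roots of P(z) and Q(z) all lie on the unit circle and they are interleaved.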
If the roots of P(z) are at exp(2πjf_i) for i = 1, 3, … and those of Q(z) are at exp(2πjf_i) for i = 0, 2, … with f_{i+1} > f_i ≥ 0, then the LSF frequencies are defined as f_1, f_2, …, f_p. Note that the roots with i = 0 and i = p+1 are always at z = +1 and z = –1, i.e. f_0 = 0 and f_{p+1} = 0.5.
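Example:

$$A(z) = 1 - 0.7 z^{-1} + 0.5 z^{-2} \quad\Rightarrow\quad z^{-3} A^*(z^{*-1}) = 0.5 z^{-1} - 0.7 z^{-2} + z^{-3}$$

$$P(z) = 1 - 0.2 z^{-1} - 0.2 z^{-2} + z^{-3}$$

$$Q(z) = 1 - 1.2 z^{-1} + 1.2 z^{-2} - z^{-3}$$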
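The construction of P(z) and Q(z) and the extraction of the line spectrum frequencies from their roots can be sketched as follows. This is a minimal illustration in Python with NumPy for real $a_j$; the function names are assumptions, and finding the roots with `np.roots` is a simple choice for a sketch rather than the method a speech coder would necessarily use.

```python
import numpy as np

def pq_polynomials(a):
    """Coefficients of P(z) and Q(z) in powers z^0, z^-1, ..., z^-(p+1)."""
    a = np.asarray(a, dtype=float)
    A = np.concatenate(([1.0], -a, [0.0]))   # A(z), padded to order p+1
    A_rev = A[::-1]                          # z^-(p+1) A(1/z) for real coefficients
    return A + A_rev, A - A_rev

def lpc_to_lsf(a):
    """Return (f from P, f from Q) in cycles/sample, excluding the fixed roots at z = +1 and -1."""
    p = len(a)
    P, Q = pq_polynomials(a)
    if p % 2 == 0:
        P, _ = np.polydiv(P, [1.0, 1.0])        # remove the root at z = -1
        Q, _ = np.polydiv(Q, [1.0, -1.0])       # remove the root at z = +1
    else:
        Q, _ = np.polydiv(Q, [1.0, 0.0, -1.0])  # remove the roots at z = +1 and z = -1
    f_p = np.angle(np.roots(P)) / (2 * np.pi)
    f_q = np.angle(np.roots(Q)) / (2 * np.pi)
    # Keep one root from each conjugate pair (upper half of the unit circle).
    return np.sort(f_p[f_p > 0]), np.sort(f_q[f_q > 0])

# The worked example: A(z) = 1 - 0.7 z^-1 + 0.5 z^-2, i.e. a_1 = 0.7, a_2 = -0.5.
P_ex, Q_ex = pq_polynomials([0.7, -0.5])
f_p, f_q = lpc_to_lsf([0.7, -0.5])
print(P_ex, Q_ex, f_p, f_q)
```

For the worked example, P(z) factors as $(1 + z^{-1})(1 - 1.2 z^{-1} + z^{-2})$ and Q(z) as $(1 - z^{-1})(1 - 0.2 z^{-1} + z^{-2})$, so $f_1 = \arccos(0.6)/2\pi \approx 0.148$ comes from P(z) and $f_2 = \arccos(0.1)/2\pi \approx 0.234$ comes from Q(z). The roots interleave, as expected for a stable filter.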
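$$P(z) = 0 \quad\Leftrightarrow\quad A(z) = -z^{-(p+1)} A^*(z^{*-1}) \quad\Leftrightarrow\quad H(z) = -1$$

$$Q(z) = 0 \quad\Leftrightarrow\quad A(z) = +z^{-(p+1)} A^*(z^{*-1}) \quad\Leftrightarrow\quad H(z) = +1$$

where

$$H(z) = \frac{A(z)}{z^{-(p+1)} A^*(z^{*-1})} = z \prod_{i=1}^{p} \frac{1 - x_i z^{-1}}{z^{-1}\left(1 - x_i^* z\right)} = z \prod_{i=1}^{p} \frac{z - x_i}{1 - x_i^* z}$$

Here the $x_i$ are the roots of $A(z)$, i.e. the poles of $V(z)$.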
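It turns out that, provided all the $x_i$ lie inside the unit circle, the absolute values of the terms making up H(z) are either all > 1 or else all < 1. Taking the squared modulus of the numerator and denominator of a typical term:

$$|z - x_i|^2 - |1 - x_i^* z|^2 = \left(|z|^2 - 1\right)\left(1 - |x_i|^2\right)$$

so each term has modulus greater than 1 when |z| > 1 and less than 1 when |z| < 1. Hence |H(z)| = 1 is only possible on the unit circle, which is where the roots of P(z) and Q(z) must lie.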
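The claim about the terms making up H(z) can be checked numerically for randomly chosen points. This is a small sketch in Python with NumPy; the function name `h_term_modulus`, the random seed and the test radii are illustrative assumptions.

```python
import numpy as np

def h_term_modulus(z, x):
    """Modulus of one factor (z - x) / (1 - conj(x) z) of H(z)."""
    return np.abs((z - x) / (1 - np.conj(x) * z))

rng = np.random.default_rng(0)
# 200 hypothetical pole positions strictly inside the unit circle.
x = 0.95 * rng.uniform(size=200) * np.exp(2j * np.pi * rng.uniform(size=200))
phase = np.exp(2j * np.pi * rng.uniform(size=200))

outside = h_term_modulus(1.3 * phase, x)   # |z| > 1
on_circle = h_term_modulus(phase, x)       # |z| = 1
inside = h_term_modulus(0.7 * phase, x)    # |z| < 1

print(outside.min(), np.max(np.abs(on_circle - 1)), inside.max())
```

Every factor has modulus above 1 outside the unit circle, exactly 1 on it, and below 1 inside it, so the product H(z) can only equal +1 or –1 on the unit circle.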
Filter Coefficients: a_i
  – Stability check difficult; Sensitive to errors; Cannot interpolate
Pole Positions: x_i
  + Stability check easy; Can interpolate but unordered
  – Hard to calculate; Sensitive to errors near |x_i| = 1
Reflection Coefficients: r_i
  + Stability check easy; Can interpolate
  – Sensitive to errors near ±1
Log Area Ratios: g_i
  + Stability guaranteed; Can interpolate
Cepstral Coefficients: c_i
  + Good for speech recognition
  – Stability check difficult
Line Spectrum Frequencies: f_i
  + Stability check easy; Can interpolate; Vary smoothly in time; Strongly correlated ⇒ better coding; Related to spectral peaks (formants)
  – Awkward to calculate