1 topics covered in this chapter –three basic problems in pattern comparison how to detect the...


Upload: morris-roberts

Post on 26-Dec-2015



Topics covered in this chapter

– Three basic problems in pattern comparison

• How to detect the speech signal in a recording interval (i.e. separate speech from background)

• How to locally compare spectra from two speech utterances (local spectral distortion measure), and

• How to globally align and normalize the distance between two speech patterns (sequences of spectral vectors) which may or may not represent the same linguistic sequence of sounds (word, phrase, sentence, etc.)


Distortion Measures

• Mathematical considerations for measuring the dissimilarity between two feature vectors.

• Let x and y be two vectors defined on a vector space X.

• A metric, or distance function, d on the vector space X is a real-valued function on the Cartesian product X × X that satisfies the following conditions:


Distortion Measures

a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ iff $x = y$ (positive definiteness property)

b) $d(x, y) = d(y, x)$ for $x, y \in X$ (symmetry)

c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$ (triangular inequality condition)

In addition, the distortion function is called invariant if

d) $d(x + z, y + z) = d(x, y)$
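The four properties can be spot-checked numerically for a candidate distance. The following is a minimal sketch, assuming NumPy and using the Euclidean distance as an example of a true metric; the helper name `check_metric_properties` is hypothetical, and a check on a few sample vectors illustrates the properties rather than proving them.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance, used here only as an example of a true metric."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def check_metric_properties(d, x, y, z, tol=1e-12):
    """Report which of properties (a)-(d) hold for d on the sample vectors x, y, z."""
    return {
        "positive_definite": d(x, y) >= 0.0 and d(x, x) <= tol,
        "symmetric": abs(d(x, y) - d(y, x)) <= tol,
        "triangle": d(x, y) <= d(x, z) + d(y, z) + tol,
        "invariant": abs(d(x + z, y + z) - d(x, y)) <= tol,
    }

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 8))  # three random 8-dimensional feature vectors
report = check_metric_properties(euclidean, x, y, z)
print(report)
```

Passing a different function as `d` shows which properties a given distortion measure gives up.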


Distortion Measures

• If a measure of distance d satisfies only the positive definiteness property, it is called a distortion measure, particularly when the vectors are representations of speech spectra.

• In speech recognition, distance means a measure of dissimilarity.

• For speech processing, an important consideration in choosing a measure of distance is its subjective meaningfulness.

• To be useful in speech processing, a mathematical measure of distance should take linguistic characteristics into account.


Distortion Measures

For example, a large difference in the waveform error does not always imply a large subjective difference.


Distortion Measures

• Perceptual considerations: the key to choosing an appropriate measure of spectral dissimilarity is the concept of subjective judgment of sound difference, or phonetic relevance.

• Spectral changes that keep the sound perceptually the same should be associated with small distances.

• Spectral changes that make the sound perceptually different should be associated with large distances.


Distortion Measures

• Consider comparing two spectral representations, S(ω) and S'(ω), using a distance measure d(S, S').

• If the spectral content of the two signals is phonetically the same (same sound), then the distance measure d is ideally very small.


Distortion Measures

• Spectral changes due to large phonetic distance include

– Significant differences in formant locations, i.e. the spectral resonances of S(ω) and S'(ω) occur at very different frequencies.

– Significant differences in formant bandwidths, i.e. the frequency widths of the spectral resonances of S(ω) and S'(ω) are very different.

For each of these cases the sounds are different, so the spectral distance measure d(S, S') is ideally very large.


Distortion Measures

To relate a physical measure of difference to a subjectively perceived measure of difference, it is important to understand auditory sensitivity to changes in the frequencies and bandwidths of the speech spectrum, signal intensity, and fundamental frequency.


Distortion Measures

• This sensitivity is expressed as the just-discriminable change: the smallest change in a physical parameter that the auditory system can reliably detect, as measured in a standard listening test.


Spectral-distortion measures

• Measuring the difference between two speech patterns in terms of average spectral distortion is a reasonable approach, both in terms of mathematical tractability and computational efficiency.

• Perceived sound differences can be interpreted in terms of differences in spectral features.


Log spectral distance

• Consider two spectra S(ω) and S'(ω). The difference between the two spectra on a log magnitude versus frequency scale is defined by

$$V(\omega) = \log S(\omega) - \log S'(\omega) \qquad (1)$$

• A distance or distortion measure between S and S' can be defined by

$$d(S, S')^p = (d_p)^p = \int_{-\pi}^{\pi} |V(\omega)|^p \, \frac{d\omega}{2\pi}$$

This is related to how humans perceive sound differences.


Log spectral distance

• For p = 1, the above equation defines the mean absolute log spectral distortion.

• For p = 2, the equation defines the rms log spectral distortion, which is used in many speech processing systems.

• As p tends to infinity, the equation reduces to the peak log spectral distortion.
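The three cases can be computed with one small function. This is a minimal sketch, assuming NumPy and power spectra sampled on a uniform frequency grid, so that the integral over dω/2π becomes a plain average; the function name `log_spectral_distance` and the two toy spectra are illustrative assumptions.

```python
import numpy as np

def log_spectral_distance(S, S_prime, p=2.0):
    """L_p log spectral distance between two power spectra on a uniform grid over [-pi, pi)."""
    V = np.log(S) - np.log(S_prime)
    if np.isinf(p):
        # Limit p -> infinity: peak log spectral distortion.
        return float(np.max(np.abs(V)))
    # The mean over a uniform grid approximates the integral of |V|^p d(omega)/(2*pi).
    return float(np.mean(np.abs(V) ** p) ** (1.0 / p))

omega = np.linspace(-np.pi, np.pi, 512, endpoint=False)
S = 1.0 + 0.5 * np.cos(omega)        # toy power spectra, strictly positive
S_prime = 1.0 + 0.3 * np.cos(omega)

d1 = log_spectral_distance(S, S_prime, p=1)          # mean absolute
d2 = log_spectral_distance(S, S_prime, p=2)          # rms
d_inf = log_spectral_distance(S, S_prime, p=np.inf)  # peak
print(d1, d2, d_inf)
```

Because the measure dω/2π is normalized, the distance is non-decreasing in p, so the mean absolute value is never larger than the rms value, which is never larger than the peak value.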


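Cepstral distances

• For the cepstral coefficients we use the rms log spectral distance.

The speech signal is the convolution of the vocal tract and excitation components:

$$S(t) = h(t) * x(t)$$

In the frequency domain,

$$S(\omega) = H(\omega)\, X(\omega)$$

Taking the log of $S(\omega)$,

$$\log S(\omega) = \log H(\omega) + \log X(\omega)$$

The log power spectrum has the Fourier series expansion

$$\log |S(\omega)|^2 = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}$$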


Cepstral distances

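The cepstral coefficients can be obtained from the LPC coefficients.

$$d_{ceps}^2(S, S') = \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^2 \frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} (c_n - c_n')^2$$

where $c_n$ and $c_n'$ are the cepstral coefficients of $S(\omega)$ and $S'(\omega)$, respectively.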
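The equality between the squared rms log spectral distance and the sum of squared cepstral differences follows from Parseval's relation, and it can be verified numerically. This is a minimal sketch, assuming NumPy, spectra sampled at the FFT frequencies, and two toy one-pole model spectra (the poles 0.9 and 0.7 are arbitrary choices); the cepstrum is taken here as the inverse FFT of the log spectrum rather than derived from LPC coefficients.

```python
import numpy as np

def cepstrum(S):
    """Cepstral coefficients of a power spectrum sampled at omega_k = 2*pi*k/N (FFT ordering)."""
    return np.fft.ifft(np.log(S)).real

N = 512
omega = 2.0 * np.pi * np.arange(N) / N
S = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * omega)) ** 2        # toy spectrum, pole at 0.9
S_prime = 1.0 / np.abs(1.0 - 0.7 * np.exp(-1j * omega)) ** 2  # toy spectrum, pole at 0.7

c, c_prime = cepstrum(S), cepstrum(S_prime)

# Sum over all cepstral lags versus the frequency-domain average of V^2.
d_ceps_sq = float(np.sum((c - c_prime) ** 2))
d2_sq = float(np.mean((np.log(S) - np.log(S_prime)) ** 2))

# Truncated version over n = 1..L, as used in practice.
L = 16
d_trunc_sq = float(np.sum((c[1:L + 1] - c_prime[1:L + 1]) ** 2))
print(d_ceps_sq, d2_sq, d_trunc_sq)
```

For these one-pole spectra the cepstrum is known in closed form, c_n = a^n / n for n ≥ 1, which gives a quick sanity check on `cepstrum`.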


Weighted cepstral distances and liftering

• Liftering makes the system more robust to noise.

• Liftering is done to equalize the variance of the cepstral coefficients.

• Liftering contributes significantly to improved recognition performance.

• If we incorporate an $n^2$ factor into the cepstral distance to normalize the contribution from each cepstral term, the distance becomes

$$d_{2w}^2 = \sum_{n=-\infty}^{\infty} n^2 (c_n - c_n')^2 = \sum_{n=-\infty}^{\infty} (n c_n - n c_n')^2$$


Weighted cepstral distances and liftering

• The original sharp spectral peaks are highly sensitive to the LPC analysis conditions, and the resulting peakiness creates unnecessary sensitivity in spectral comparison.

• The liftering process tends to reduce this sensitivity without altering the fundamental “formant” structure.

• That is, the undesirable (noiselike) components of the LPC spectrum are reduced or removed, while the essential characteristics of the “formant” structure are retained.


Weighted cepstral distances and liftering

• A useful form of weighted cepstral distance is

$$d_{cw}^2 = \sum_{n=1}^{L} \left( w(n)\, c_n - w(n)\, c_n' \right)^2$$

• where w(n) is any lifter function.
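The weighted distance is a one-line computation once a lifter is chosen. This is a minimal sketch, assuming NumPy and toy cepstra of the form c_n = a^n / n (the cepstrum of a one-pole model); the three lifters shown, and in particular the raised-sine window, are illustrative assumptions and are not taken from the slides.

```python
import numpy as np

def weighted_cepstral_distance_sq(c, c_prime, w):
    """Squared weighted cepstral distance: sum over n = 1..L of (w(n) c_n - w(n) c'_n)^2."""
    c, c_prime, w = map(np.asarray, (c, c_prime, w))
    return float(np.sum((w * c - w * c_prime) ** 2))

L = 12
n = np.arange(1, L + 1)
c = 0.9 ** n / n          # toy cepstra: c_n = a^n / n for a one-pole model with pole a
c_prime = 0.7 ** n / n

w_flat = np.ones(L)                                # w(n) = 1: plain truncated cepstral distance
w_index = n.astype(float)                          # w(n) = n: the n^2-weighted distance
w_sine = 1.0 + (L / 2.0) * np.sin(np.pi * n / L)   # raised-sine lifter (assumed example)

d_flat = weighted_cepstral_distance_sq(c, c_prime, w_flat)
d_index = weighted_cepstral_distance_sq(c, c_prime, w_index)
d_sine = weighted_cepstral_distance_sq(c, c_prime, w_sine)
print(d_flat, d_index, d_sine)
```

With w(n) = n the 1/n decay of the cepstral terms is cancelled, so each of the L terms contributes on a comparable scale.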


Itakura and Saito

• The log spectral difference V(ω), defined by V(ω) = log S(ω) – log S'(ω), is the basis of many distortion measures.

• The distortion measure proposed by Itakura and Saito, in their formulation of linear prediction as an approximate maximum likelihood estimation, is


Itakura and Saito

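$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \left[ e^{V(\omega)} - V(\omega) - 1 \right] \frac{d\omega}{2\pi}$$

$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log \frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$$

where $\sigma_\infty^2$ and $\sigma_\infty'^2$ are the prediction errors of $S(\omega)$ and $S'(\omega)$, respectively, where

$$\sigma_\infty^2 = \exp\left\{ \int_{-\pi}^{\pi} \log S(\omega) \, \frac{d\omega}{2\pi} \right\}$$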
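The first form of the Itakura-Saito distortion is easy to evaluate on a frequency grid, and doing so makes its asymmetry visible. This is a minimal sketch, assuming NumPy and two toy unit-gain one-pole spectra (poles 0.9 and 0.7 are arbitrary); for this pair the forward value can be worked out by hand as 0.04 / 0.19.

```python
import numpy as np

def itakura_saito(S, S_prime):
    """d_IS(S, S') as the average over omega of exp(V) - V - 1, with V = log S - log S'."""
    V = np.log(S) - np.log(S_prime)
    return float(np.mean(np.exp(V) - V - 1.0))

N = 512
omega = 2.0 * np.pi * np.arange(N) / N
S = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * omega)) ** 2        # toy spectrum, pole at 0.9
S_prime = 1.0 / np.abs(1.0 - 0.7 * np.exp(-1j * omega)) ** 2  # toy spectrum, pole at 0.7

d_forward = itakura_saito(S, S_prime)
d_backward = itakura_saito(S_prime, S)
print(d_forward, d_backward)
```

The two directions give clearly different values, which is why this measure is a distortion and not a metric.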


Itakura and Saito

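• The Itakura-Saito distortion measure can be used to illustrate the spectral matching properties by replacing S'(ω) with the pth-order all-pole spectrum $\sigma^2 / |A(e^{j\omega})|^2$:

$$d_{IS}\left(S, \frac{\sigma^2}{|A(e^{j\omega})|^2}\right) = \frac{1}{\sigma^2} \int_{-\pi}^{\pi} S(\omega)\, |A(e^{j\omega})|^2 \frac{d\omega}{2\pi} + \log \sigma^2 - \log \sigma_\infty^2 - 1$$

where $\sigma$ is the gain, and where

$$\int_{-\pi}^{\pi} S(\omega)\, |A(e^{j\omega})|^2 \frac{d\omega}{2\pi}$$

is the residual energy.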


Itakura

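Let us consider the all-pole model spectra $1/|A_p(e^{j\omega})|^2$ and $1/|A(e^{j\omega})|^2$; then the Itakura distortion measure is

$$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log \int_{-\pi}^{\pi} \frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2} \frac{d\omega}{2\pi}$$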


Likelihood Distortions

• The role of the gain terms is not explicit in the Itakura distortion because the signal level essentially makes no difference in the human understanding of speech so long as it is unambiguously heard.

• A gain-independent distortion measure, called the likelihood ratio distortion, can be derived directly from the IS distortion measure.

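$$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log\left[ 1 + d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) \right]$$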


Likelihood Distortions

When the distortion is very small, the Itakura distortion measure is not very different from the likelihood ratio distortion measure.
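Both gain-independent measures can be computed from the inverse filter coefficients alone, which makes the small-distortion behaviour easy to see. This is a minimal sketch, assuming NumPy, a frequency-sampled approximation of the integral, and hypothetical first-order inverse filters; it uses the definitions d_LR = ∫ |A|² / |A_p|² dω/2π − 1 and d_I = log ∫ |A|² / |A_p|² dω/2π.

```python
import numpy as np

def inverse_filter_response_sq(a, n_freq=512):
    """|A(e^{jw})|^2 on a uniform grid, for A(z) = sum_k a[k] z^{-k} with a[0] = 1."""
    return np.abs(np.fft.fft(a, n_freq)) ** 2

def spectral_ratio_mean(a_p, a):
    """Average over omega of |A|^2 / |A_p|^2."""
    return float(np.mean(inverse_filter_response_sq(a) / inverse_filter_response_sq(a_p)))

def likelihood_ratio(a_p, a):
    """Likelihood ratio distortion d_LR."""
    return spectral_ratio_mean(a_p, a) - 1.0

def itakura(a_p, a):
    """Itakura distortion d_I."""
    return float(np.log(spectral_ratio_mean(a_p, a)))

a_p = np.array([1.0, -0.9])       # hypothetical reference inverse filter A_p(z)
a_near = np.array([1.0, -0.88])   # hypothetical test filter close to the reference
a_far = np.array([1.0, -0.7])     # hypothetical test filter far from the reference

for name, a in [("near", a_near), ("far", a_far)]:
    print(name, likelihood_ratio(a_p, a), itakura(a_p, a))
```

For the nearby filter the two values almost coincide, while for the distant one the logarithm makes the Itakura distortion noticeably smaller than the likelihood ratio distortion.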


Variations of likelihood distortions

• Compared to the cepstral distance, likelihood distortions are asymmetric.

• There are two methods to symmetrize the distortion measure:

– COSH distortion

– Weighted likelihood ratio distortion


COSH distortion

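• The COSH distortion is given by

$$d_{COSH} = \int_{-\pi}^{\pi} \left[ \cosh\left( \log \frac{S(\omega)}{S'(\omega)} \right) - 1 \right] \frac{d\omega}{2\pi}$$

• For small distortions, $\cosh V(\omega) - 1 \approx V(\omega)^2 / 2$, so twice the COSH distortion is almost identical to the squared rms log spectral distance $d_2^2$.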
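Two properties of the COSH distortion can be checked numerically: it equals the average of the Itakura-Saito distortion taken in both directions, and for small distortions cosh V − 1 behaves like V² / 2. This is a minimal sketch, assuming NumPy and two nearby toy one-pole spectra (poles 0.9 and 0.88 are arbitrary).

```python
import numpy as np

def itakura_saito(S, S_prime):
    """d_IS(S, S') as the average over omega of exp(V) - V - 1."""
    V = np.log(S) - np.log(S_prime)
    return float(np.mean(np.exp(V) - V - 1.0))

def cosh_distortion(S, S_prime):
    """d_COSH as the average over omega of cosh(V) - 1, with V = log S - log S'."""
    V = np.log(S) - np.log(S_prime)
    return float(np.mean(np.cosh(V) - 1.0))

N = 512
omega = 2.0 * np.pi * np.arange(N) / N
S = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * omega)) ** 2         # toy spectrum
S_prime = 1.0 / np.abs(1.0 - 0.88 * np.exp(-1j * omega)) ** 2  # nearby toy spectrum

d_cosh = cosh_distortion(S, S_prime)
d_sym_is = 0.5 * (itakura_saito(S, S_prime) + itakura_saito(S_prime, S))
d2_sq = float(np.mean((np.log(S) - np.log(S_prime)) ** 2))
print(d_cosh, d_sym_is, 0.5 * d2_sq)
```

Swapping the two arguments of `cosh_distortion` leaves the result unchanged, since cosh is an even function.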


Weighted likelihood ratio distortion

The purpose of weighting is to take the spectral shape into account as a weighting function, so that different spectral components along the frequency axis can be emphasized or de-emphasized to reflect some of the observed perceptual effects.


Weighted likelihood ratio distortion

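$$d_{WLR} = \sum_{n=-\infty}^{\infty} \left[ \frac{\hat{r}(n)}{\sigma^2} - \frac{\hat{r}'(n)}{\sigma'^2} \right] (c_n - c_n')$$

where $c_n$ and $c_n'$ are the cepstral coefficients of $\log \frac{1}{|A|^2}$ and $\log \frac{1}{|A'|^2}$, and $\hat{r}(n)$ and $\hat{r}'(n)$ are the autocorrelation sequences for $\frac{\sigma^2}{|A|^2}$ and $\frac{\sigma'^2}{|A'|^2}$, respectively.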
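By Parseval's relation, the sum over autocorrelation and cepstral differences equals a frequency-domain average of the linear spectral deviation times the log spectral deviation. This is a minimal sketch of that equivalence, assuming NumPy, unit gains (σ² = σ'² = 1, an assumption made to keep the example short), and two toy one-pole model spectra.

```python
import numpy as np

N = 1024
omega = 2.0 * np.pi * np.arange(N) / N

def all_pole_spectrum(pole):
    """Unit-gain one-pole model spectrum 1 / |A(e^{jw})|^2 with A(z) = 1 - pole * z^-1."""
    return 1.0 / np.abs(1.0 - pole * np.exp(-1j * omega)) ** 2

P, P_prime = all_pole_spectrum(0.9), all_pole_spectrum(0.7)

# Frequency-domain form: linear deviation times log deviation, averaged over omega.
d_wlr_freq = float(np.mean((P - P_prime) * (np.log(P) - np.log(P_prime))))

# Sequence form: autocorrelations and cepstra are the inverse transforms of P and log P.
r, r_prime = np.fft.ifft(P).real, np.fft.ifft(P_prime).real
c, c_prime = np.fft.ifft(np.log(P)).real, np.fft.ifft(np.log(P_prime)).real
d_wlr_seq = float(np.sum((r - r_prime) * (c - c_prime)))
print(d_wlr_freq, d_wlr_seq)
```

The frequency-domain form also shows why the measure is symmetric and non-negative: swapping the two spectra flips the sign of both factors.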


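Comparison of $d_{WLR}$ and $d_2^2$

In $d_2^2$ the deviation is the squared log deviation

$$\left( \log \frac{1}{|A|^2} - \log \frac{1}{|A'|^2} \right)^2$$

In $d_{WLR}$ one factor of this is replaced by the linear deviation

$$\left( \frac{1}{|A|^2} - \frac{1}{|A'|^2} \right)$$

which shows heavier emphasis in spectral peak areas than the compressed deviation

$$\left( \log \frac{1}{|A|^2} - \log \frac{1}{|A'|^2} \right)$$

This property is required in applications where extraordinary emphasis of spectral peaks is necessary, such as speech recognition in a noisy environment.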


Weighted slope metric distortion measure

Based on a series of experiments designed to measure the subjective “phonetic” distance between pairs of synthetic vowels and fricatives, it was found that, among controlled variations of several acoustic parameters and spectral distortions (formant frequency, formant amplitude, spectral tilt, and highpass, lowpass, and notch filtering), only formant frequency deviation was phonetically relevant.


Weighted slope metric distortion measure

The weighted slope metric (WSM) attaches a weight to the spectral slope difference near spectral peaks, rather than to the spectral amplitude difference, and takes the overall energy difference explicitly into consideration.

$$d_{WSM}(S, S') = u_E\, |E_S - E_{S'}| + \sum_{i=1}^{K} u(i) \left[ \Delta_S(i) - \Delta_{S'}(i) \right]^2$$

where $u_E$ is the weighting constant for the absolute energy difference $|E_S - E_{S'}|$ between S and S', $u(i)$ is the weighting coefficient for the critical band spectral slope difference $\Delta_S(i) - \Delta_{S'}(i)$ between S and S', and K is the total number of critical bands considered.
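The structure of the WSM formula can be illustrated with a few lines of code. This is a rough sketch only, assuming NumPy; it takes band energies in dB as input, approximates each slope by the first difference between adjacent bands, and uses uniform weights u(i) = 1. These choices, and the function name `weighted_slope_metric`, are assumptions for illustration; the actual weights depend on proximity to spectral peaks and are not given in the slides.

```python
import numpy as np

def weighted_slope_metric(band_db, band_db_prime, u, u_E=1.0):
    """d_WSM = u_E * |E_S - E_S'| + sum_i u(i) * (slope_S(i) - slope_S'(i))^2.

    band_db, band_db_prime: band energies in dB for K + 1 adjacent bands.
    Slopes are first differences between adjacent bands (an assumption).
    """
    band_db = np.asarray(band_db, dtype=float)
    band_db_prime = np.asarray(band_db_prime, dtype=float)
    slope = np.diff(band_db)
    slope_prime = np.diff(band_db_prime)
    # Overall energy in dB, from the summed linear band energies.
    E = 10.0 * np.log10(np.sum(10.0 ** (band_db / 10.0)))
    E_prime = 10.0 * np.log10(np.sum(10.0 ** (band_db_prime / 10.0)))
    slope_term = np.sum(np.asarray(u, dtype=float) * (slope - slope_prime) ** 2)
    return float(u_E * abs(E - E_prime) + slope_term)

bands = np.array([10.0, 20.0, 30.0, 20.0, 10.0])       # one spectral peak in the middle band
louder = bands + 6.0                                    # same shape, 6 dB more energy
moved_peak = np.array([10.0, 10.0, 20.0, 30.0, 20.0])   # peak shifted up by one band
u = np.ones(4)                                          # uniform weights (assumed)

d_gain = weighted_slope_metric(bands, louder, u)
d_shift = weighted_slope_metric(bands, moved_peak, u)
print(d_gain, d_shift)
```

A pure level change only contributes through the energy term, while moving the peak changes the slopes and produces a much larger distance, which is the behaviour the metric is designed to have.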


Summary

• The spectral distortion measures are designed to measure the dissimilarity or distance between two (power) spectra of speech.

• Many of these dissimilarity measures are not metrics because they do not satisfy the symmetry property.

• If an objective speech distortion measure needs to reflect the subjective reality of human perception of sound differences, or even phonetic disparity, the asymmetry seems to be actually desirable.


Summary

• All distortion measures are equally important, because certain distortion measures may be better in a less noisy environment, while others may be more robust when the background is noisier.


Summary

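• Log spectral: the $L_p$ metric requires a large amount of computation, because we need two FFTs to obtain S(ω) and S'(ω), the logarithms of all values of S and S', and an integral.

$$d_p = \left[ \int_{-\pi}^{\pi} \left| \log S(\omega) - \log S'(\omega) \right|^p \frac{d\omega}{2\pi} \right]^{1/p}$$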


Summary

• Truncated and weighted cepstral: requires only L operations, where L is of the order of 12-16; hence fewer calculations are required than for the $L_p$ metric.

$$d_c^2(L) = \sum_{n=1}^{L} (c_n - c_n')^2$$

$$d_{cw}^2 = \sum_{n=1}^{L} \left( w(n)\, c_n - w(n)\, c_n' \right)^2$$


Summary

• The likelihood, Itakura-Saito, Itakura and COSH measures: all require on the order of p operations, where p is the LPC order of the all-pole polynomial (8-12). Hence the computation is about the same as for the cepstral measures.


Summary

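$$d_{IS}(S, S') = \int_{-\pi}^{\pi} \frac{S(\omega)}{S'(\omega)} \frac{d\omega}{2\pi} - \log \frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$$

$$d_I = \log \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi}$$

$$d_{LR} = \int_{-\pi}^{\pi} \frac{|A|^2}{|A_p|^2} \frac{d\omega}{2\pi} - 1$$

$$d_{COSH} = \int_{-\pi}^{\pi} \left[ \cosh\left( \log \frac{S(\omega)}{S'(\omega)} \right) - 1 \right] \frac{d\omega}{2\pi}$$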


Summary

• Weighted likelihood ratio distortion: requires L operations, similar to the cepstral measures.

$$d_{WLR} = \sum_{n=1}^{L} \left[ \frac{\hat{r}(n)}{\sigma^2} - \frac{\hat{r}'(n)}{\sigma'^2} \right] (c_n - c_n')$$


Summary

• Weighted slope metric (WSM): requires K operations, where K is the number of frequency bands used in the computation (32-64).

$$d_{WSM}(S, S') = u_E\, |E_S - E_{S'}| + \sum_{i=1}^{K} u(i) \left[ \Delta_S(i) - \Delta_{S'}(i) \right]^2$$


Summary

• From all these points we can say that all the measures, except the $L_p$ metrics, are both physically reasonable and computationally tractable for speech recognition.

• Hence, in practice, we are going to use all of these measures to study the speech recognition system.