PATTERN COMPARISON TECHNIQUES


Upload: cory-adams

Posted on 02-Jan-2016


TRANSCRIPT

1

PATTERN COMPARISON TECHNIQUES

Test pattern: $T = \{t_1, t_2, t_3, \ldots, t_I\}$

Reference pattern: $R_j = \{t^j_1, t^j_2, \ldots, t^j_{J_j}\}$

2

4.2 SPEECH (ENDPOINT) DETECTION

3

4.3 DISTORTION MEASURES-MATHEMATICAL CONSIDERATIONS

Let x and y be two feature vectors defined on a vector space X. A metric or distance function d has the following properties:

(a) $0 \le d(x, y) < \infty$ for $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$;

(b) $d(x, y) = d(y, x)$ for $x, y \in X$;

(c) $d(x, y) \le d(x, z) + d(y, z)$ for $x, y, z \in X$.

A distance function is called invariant if

(d) $d(x + z, y + z) = d(x, y)$.
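A quick numerical check of properties (a) to (d) for one concrete choice of d, the Euclidean distance between feature vectors. The helper name `euclidean` and the random 12-dimensional vectors are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def euclidean(x, y):
    """L2 distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 12))   # three hypothetical 12-dimensional feature vectors

# (a) non-negativity, and zero distance from a vector to itself
assert euclidean(x, y) > 0 and euclidean(x, x) == 0.0
# (b) symmetry
assert euclidean(x, y) == euclidean(y, x)
# (c) triangle inequality
assert euclidean(x, y) <= euclidean(x, z) + euclidean(y, z)
# (d) invariance to a common translation z
assert np.isclose(euclidean(x + z, y + z), euclidean(x, y))
```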

4

PERCEPTUAL CONSIDERATIONS

Spectral changes that do not fundamentally change the perceived sound include:

5

PERCEPTUAL CONSIDERATIONS

Spectral changes that lead to phonetically different sounds include:

6

PERCEPTUAL CONSIDERATIONS

Just-discriminable change: known as the JND (just-noticeable difference), DL (difference limen), or differential threshold.

7

4.4 DISTORTION MEASURES-PERCEPTUAL CONSIDERATIONS

8

4.4 DISTORTION MEASURES-PERCEPTUAL CONSIDERATIONS

9

Spectral Distortion Measures

Spectral Density

Fourier Coefficients of Spectral Density

Autocorrelation Function

10

Spectral Distortion Measures

Short-term autocorrelation

Then $S(\omega)$ is an energy spectral density.
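As a sketch, assume the short-term autocorrelation of a windowed frame x(m) of N samples is r(k) = Σ_m x(m)x(m+k) (the exact definition on the slide did not survive extraction). Its Fourier transform is then |X(e^{jω})|², the energy spectral density of the frame. The check below samples both on a DFT grid with at least 2N - 1 points, so the circular and linear autocorrelations agree; the frame length, window and FFT size are arbitrary choices.

```python
import numpy as np

def short_term_autocorr(frame):
    """r(k) = sum_m x(m) x(m+k) for k = -(N-1)..N-1 (full, unnormalized)."""
    return np.correlate(frame, frame, mode="full")

rng = np.random.default_rng(1)
frame = rng.normal(size=160) * np.hamming(160)   # hypothetical windowed 20 ms frame at 8 kHz
r = short_term_autocorr(frame)                    # length 2*160 - 1, lag 0 at index 159

nfft = 512                                        # >= 2*160 - 1, so no time aliasing
r_circ = np.zeros(nfft)
r_circ[:160] = r[159:]                            # lags 0..159
r_circ[-159:] = r[:159]                           # lags -159..-1 wrap to the end
S_from_r = np.fft.fft(r_circ).real                # transform of the autocorrelation
S_direct = np.abs(np.fft.fft(frame, nfft)) ** 2   # |X(e^{jw})|^2 on the same grid

print(np.allclose(S_from_r, S_direct))            # expected: True
```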

11

Spectral Distortion Measures

Autocorrelation matrices

12

Spectral Distortion Measures

If $\sigma/A(z)$ is the all-pole model for the speech spectrum, the residual energy resulting from "inverse filtering" the input signal with the all-zero filter $A(z)$ is:
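One way to evaluate this residual energy, assuming A(z) = Σ_{k=0}^{p} a_k z^{-k} with a_0 = 1 and a finite frame whose entire inverse-filter output is kept: the energy of the residual equals the quadratic form a^t R a, where R is the (p+1) by (p+1) Toeplitz matrix of autocorrelation values r(0), ..., r(p). The coefficient values below are placeholders.

```python
import numpy as np

def autocorr(x, maxlag):
    """r(k) = sum_m x(m) x(m+k), k = 0..maxlag."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(maxlag + 1)])

def residual_energy(x, a):
    """Energy of e(n) = sum_k a_k x(n-k), i.e. x filtered by the all-zero A(z)."""
    e = np.convolve(x, a)          # full convolution keeps every output sample
    return float(np.dot(e, e))

rng = np.random.default_rng(2)
x = rng.normal(size=240)
a = np.array([1.0, -1.2, 0.5])     # hypothetical A(z) = 1 - 1.2 z^-1 + 0.5 z^-2 (p = 2)

p = len(a) - 1
r = autocorr(x, p)
R = np.array([[r[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])  # Toeplitz

print(np.isclose(residual_energy(x, a), a @ R @ a))  # expected: True
```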

13

Spectral Distortion Measures

Important properties of all-pole modeling:

The recursive minimization relationship:
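The slide's formulas are not reproduced in this transcript. Assuming the recursive relationship refers to the Levinson-Durbin recursion (the usual way of solving the autocorrelation normal equations), the sketch below shows how each new order i lowers the minimum residual energy by the factor (1 - k_i²). It uses the convention A(z) = 1 + Σ a_k z^{-k}; the sign of the reflection coefficients k_i differs between textbooks.

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion for A(z) = 1 + sum_{k=1}^p a_k z^-k.

    r: autocorrelation values r(0)..r(p).
    Returns (a, k, E): a[0] = 1, reflection coefficients k_1..k_p,
    and minimum residual energies E^(0)..E^(p).
    """
    a = np.zeros(p + 1)
    a[0] = 1.0
    k = np.zeros(p)
    E = np.zeros(p + 1)
    E[0] = r[0]
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])   # r(i) + sum_j a_j r(i-j)
        k_i = -acc / E[i - 1]
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k_i * a_prev[i - j]
        a[i] = k_i
        k[i - 1] = k_i
        E[i] = (1.0 - k_i ** 2) * E[i - 1]            # E^(i) = (1 - k_i^2) E^(i-1)
    return a, k, E

rng = np.random.default_rng(3)
x = rng.normal(size=400)
r = np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(5)])
a, k, E = levinson_durbin(r, 4)
```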

14

LOG SPECTRAL DISTANCE

15

LOG SPECTRAL DISTANCE

16

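CEPSTRAL DISTANCES

The complex cepstrum of a signal is defined as the inverse Fourier transform of the log of the signal spectrum. The Fourier series representation of $\log S(\omega)$ can be expressed as:

$$\log S(\omega) = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}$$

where $c_n = c_{-n}$ are real and referred to as the cepstral coefficients. Note that:

$$c_0 = \int_{-\pi}^{\pi} \log S(\omega)\,\frac{d\omega}{2\pi}$$

For a pair of spectra $S(\omega)$ and $S'(\omega)$, by applying Parseval's theorem, we can relate the $L_2$ cepstral distance of the spectra to the rms log spectral distance:

$$d_2^2 = \int_{-\pi}^{\pi}\left|\log S(\omega) - \log S'(\omega)\right|^2\frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty}(c_n - c'_n)^2$$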
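A discrete sketch of these relations, assuming log S(ω) is sampled on an N-point DFT grid: the cepstrum becomes the inverse DFT of log S, c_0 is the grid average of log S, and the discrete Parseval relation makes the mean squared log-spectral difference equal to the sum of squared cepstral differences over all N indices. The random spectra are placeholders.

```python
import numpy as np

def real_cepstrum(power_spectrum):
    """c_n = inverse DFT of log S(w_k); real because S is real and even."""
    return np.fft.ifft(np.log(power_spectrum)).real

N = 256
rng = np.random.default_rng(4)
S1 = np.abs(np.fft.fft(rng.normal(size=N))) ** 2 + 1e-3   # hypothetical power spectra
S2 = np.abs(np.fft.fft(rng.normal(size=N))) ** 2 + 1e-3   # (offset avoids log(0))

c1, c2 = real_cepstrum(S1), real_cepstrum(S2)
c0_check = np.mean(np.log(S1))                 # grid version of c_0 = ∫ log S dω/2π

log_spectral_d2 = np.mean((np.log(S1) - np.log(S2)) ** 2)   # squared rms log spectral distance
cepstral_d2 = np.sum((c1 - c2) ** 2)                          # sum over all N cepstral indices

print(np.isclose(c1[0], c0_check), np.isclose(log_spectral_d2, cepstral_d2))  # expected: True True
```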

17

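CEPSTRAL DISTANCES

For the all-pole model $\sigma/A(z)$, with $A(z) = \sum_{k=0}^{p} a_k z^{-k}$, consider the Laurent expansion:

$$\log\left[\frac{\sigma}{A(z)}\right] = \log\sigma + \sum_{n=1}^{\infty} c_n z^{-n}$$

Differentiating both sides of the equation with respect to $z$ and equating the coefficients of like powers of $z$, we derive:

$$c_n = -a_n - \sum_{k=1}^{n-1}\frac{k}{n}\,c_k\,a_{n-k} \qquad \text{for } n > 0,$$

where $a_0 = 1$ and $a_k = 0$ for $k > p$.

In terms of the log power spectrum, the series expansion becomes:

$$\log\left[\frac{\sigma^2}{|A(e^{j\omega})|^2}\right] = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega}, \qquad \text{where } c_0 = \log\sigma^2 \text{ and } c_{-n} = c_n.$$

Truncated cepstral distance:

$$d_c^2(L) = \sum_{n=1}^{L}(c_n - c'_n)^2$$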
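A sketch of the recursion and of the truncated distance. Indexing is an implementation choice: the returned array holds c_1, ..., c_L (c_0 depends only on the gain σ). The single-pole test case uses the known expansion log[1/(1 - 0.9 z^{-1})] = Σ_{n≥1} (0.9^n/n) z^{-n}.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_L of sigma/A(z), A(z) = sum_{k=0}^p a_k z^-k with a_0 = 1.

    Implements c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_k = 0 for k > p.
    """
    p = len(a) - 1
    m = min(p, n_ceps)
    a_ext = np.zeros(n_ceps + 1)
    a_ext[: m + 1] = a[: m + 1]
    c = np.zeros(n_ceps + 1)                  # c[0] unused (it depends only on the gain)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a_ext[n - k] for k in range(1, n))
        c[n] = -a_ext[n] - acc / n
    return c[1:]

def truncated_cepstral_distance(c, c_prime, L):
    """d_c^2(L) = sum_{n=1}^{L} (c_n - c'_n)^2, with index 0 holding c_1."""
    diff = np.asarray(c[:L], float) - np.asarray(c_prime[:L], float)
    return float(np.sum(diff ** 2))

# Single pole at z = 0.9: A(z) = 1 - 0.9 z^-1, whose cepstrum is 0.9^n / n.
c = lpc_to_cepstrum(np.array([1.0, -0.9]), 12)
print(np.allclose(c, 0.9 ** np.arange(1, 13) / np.arange(1, 13)))  # expected: True
```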

18

CEPSTRAL DISTANCES

19

CEPSTRAL DISTANCES

20

Weighted Cepstral Distances and Liftering

It can be shown that, under certain regularity conditions, the cepstral coefficients, except $c_0$, have:

1) zero means;
2) variances essentially inversely proportional to the square of the coefficient index:

$$E\{c_n^2\} \sim \frac{1}{n^2}$$

If we normalize the cepstral distance by the inverse of the variance:
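A sketch of the variance normalization, assuming per-coefficient variances are supplied either from training data or from the 1/n² model above. With the model, dividing by the variance is the same as weighting each squared cepstral difference by n². The function name is hypothetical.

```python
import numpy as np

def variance_normalized_cepstral_distance(c, c_prime, variances):
    """sum_n (c_n - c'_n)^2 / var(c_n); all arrays index n = 1..L."""
    diff = np.asarray(c, float) - np.asarray(c_prime, float)
    return float(np.sum(diff ** 2 / np.asarray(variances, float)))

L = 12
n = np.arange(1, L + 1)
model_var = 1.0 / n ** 2                     # E{c_n^2} ~ 1/n^2, constant factor dropped
c = np.random.default_rng(5).normal(size=L) / n   # hypothetical cepstrum
c_prime = np.zeros(L)

# With the 1/n^2 model the normalization is a weight of n^2 on each squared difference.
print(np.isclose(variance_normalized_cepstral_distance(c, c_prime, model_var),
                 np.sum(n ** 2 * c ** 2)))   # expected: True
```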

21

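Weighted Cepstral Distances and Liftering

Differentiating both sides of the Fourier series equation of the spectrum with respect to $\omega$:

$$\frac{\partial\log S(\omega)}{\partial\omega} = \sum_{n=-\infty}^{\infty}(-jn)\,c_n e^{-jn\omega}$$

and applying Parseval's theorem:

$$\int_{-\pi}^{\pi}\left|\frac{\partial\log S(\omega)}{\partial\omega} - \frac{\partial\log S'(\omega)}{\partial\omega}\right|^2\frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty}n^2(c_n - c'_n)^2$$

This is an $L_2$ distance based upon the differences between the spectral slopes.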
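A numerical check of the slope identity for a truncated, symmetric cepstrum (c_{-n} = c_n, n ≤ L), sampled on a uniform grid of N > 2L frequencies. Since both ±n terms appear in the sum, the cepstral side is 2 Σ_{n=1}^{L} n²(c_n - c'_n)²; the random coefficients are placeholders.

```python
import numpy as np

def log_spectrum_slope(c, omega):
    """d/dw of log S(w) = c_0 + 2 sum_{n>=1} c_n cos(nw), i.e. -2 sum n c_n sin(nw)."""
    n = np.arange(1, len(c))
    return -2.0 * np.sin(np.outer(omega, n)) @ (n * c[1:])

L, N = 12, 256                                # N > 2L, so grid averages are exact
omega = 2 * np.pi * np.arange(N) / N
rng = np.random.default_rng(6)
c, c_prime = rng.normal(size=(2, L + 1))      # c[0] = c_0, ..., c[L] = c_L (hypothetical)

slope_d2 = np.mean((log_spectrum_slope(c, omega) - log_spectrum_slope(c_prime, omega)) ** 2)
n = np.arange(1, L + 1)
cepstral_d2 = 2 * np.sum(n ** 2 * (c[1:] - c_prime[1:]) ** 2)   # sum over n = ±1..±L
print(np.isclose(slope_d2, cepstral_d2))      # expected: True
```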

22

Cepstral Weighting or Liftering Procedure

h is usually chosen as L/2, and L is typically 10 to 16.
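The lifter formula itself did not survive in this transcript. A commonly used form consistent with h = L/2 is the raised-sine (bandpass) lifter w(n) = 1 + h sin(nπ/L), n = 1, ..., L; the sketch assumes that form and uses it in a weighted cepstral distance Σ[w(n)(c_n - c'_n)]².

```python
import numpy as np

def raised_sine_lifter(L, h=None):
    """w(n) = 1 + h sin(n*pi/L), n = 1..L; h defaults to L/2 (assumed lifter form)."""
    if h is None:
        h = L / 2.0
    n = np.arange(1, L + 1)
    return 1.0 + h * np.sin(np.pi * n / L)

def liftered_cepstral_distance(c, c_prime, w):
    """sum_{n=1}^{L} [w(n) (c_n - c'_n)]^2, with index 0 holding c_1."""
    diff = np.asarray(c, float) - np.asarray(c_prime, float)
    return float(np.sum((np.asarray(w, float) * diff) ** 2))

w = raised_sine_lifter(12)   # L = 12, inside the typical 10-16 range
```

The weights peak at n = L/2 and fall back to about 1 at n = L, de-emphasizing both the low-order coefficients (sensitive to spectral tilt) and the high-order ones (noisy).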

23

A useful form of weighted cepstral distance:

24

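Likelihood Distortions

Previously defined: the Itakura-Saito distortion measure

$$d_{IS}(S, S') = \int_{-\pi}^{\pi}\frac{S(\omega)}{S'(\omega)}\frac{d\omega}{2\pi} - \log\frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$$

where $\sigma_\infty^2$ and $\sigma_\infty'^2$ are the one-step prediction errors of $S(\omega)$ and $S'(\omega)$, as defined:

$$\sigma_\infty^2 = \exp\left(\int_{-\pi}^{\pi}\log S(\omega)\,\frac{d\omega}{2\pi}\right), \qquad \sigma_\infty'^2 = \exp\left(\int_{-\pi}^{\pi}\log S'(\omega)\,\frac{d\omega}{2\pi}\right)$$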
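A sketch of the Itakura-Saito distortion in the form ∫ S/S' dω/2π - log(σ²_∞/σ'²_∞) - 1, with σ²_∞ = exp(∫ log S dω/2π), evaluated on a uniform frequency grid (grid averages stand in for the integrals). It checks agreement with the equivalent pointwise form ∫[S/S' - log(S/S') - 1] dω/2π, non-negativity, and the lack of symmetry in the two arguments. The two spectra are arbitrary smooth positive functions chosen for illustration.

```python
import numpy as np

def itakura_saito(S, S_prime):
    """Grid version of ∫ [S/S' - log(S/S') - 1] dω/2π."""
    ratio = np.asarray(S, float) / np.asarray(S_prime, float)
    return float(np.mean(ratio - np.log(ratio) - 1.0))

def one_step_prediction_error(S):
    """sigma_inf^2 = exp(∫ log S(ω) dω/2π), approximated by a grid mean."""
    return float(np.exp(np.mean(np.log(S))))

N = 1024
omega = 2 * np.pi * np.arange(N) / N
S = 1.0 + 0.8 * np.cos(omega)              # hypothetical smooth positive spectra
S_prime = 1.5 + 0.3 * np.cos(2 * omega)

alt = (np.mean(S / S_prime)
       - np.log(one_step_prediction_error(S) / one_step_prediction_error(S_prime)) - 1.0)
print(np.isclose(itakura_saito(S, S_prime), alt))   # expected: True
```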

25

26

Likelihood Distortions

The residual energy can be easily evaluated by:

27

By replacing $S(\omega)$ by its optimal p-th order LPC model spectrum:

If we set $\sigma^2$ to match the residual energy $\alpha$:

This is often referred to as the Itakura distortion measure.

Likelihood Distortions

28

Likelihood Distortions

Another way to write the Itakura distortion measure is:

Another gain-independent distortion measure is called the Likelihood Ratio distortion:

29

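4.5.4 Likelihood Distortions

$$d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = d_{IS}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \int_{-\pi}^{\pi}\frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2}\frac{d\omega}{2\pi} - 1 = \frac{\mathbf{a}^t\mathbf{R}_p\mathbf{a}}{\sigma_p^2} - 1$$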

30

4.5.4 Likelihood Distortions

Let $u = \dfrac{\mathbf{a}^t\mathbf{R}_p\mathbf{a}}{\sigma_p^2} \ge 1$. Then

$$d_I\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = \log u \qquad \text{and} \qquad d_{LR}\left(\frac{1}{|A_p|^2}, \frac{1}{|A|^2}\right) = u - 1.$$

Since

$$u = \exp(\log u) = 1 + \log u + \frac{1}{2!}(\log u)^2 + \frac{1}{3!}(\log u)^3 + \cdots,$$

$d_{LR} \approx d_I$ for $u \approx 1$. That is, when the distortion is small, the Itakura distortion measure is not very different from the LR distortion measure.
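A sketch comparing the two measures, assuming autocorrelation-method LPC solved directly with `numpy.linalg.solve` (a Levinson-Durbin recursion would give the same a_p). Perturbing the optimal predictor gives u = a^t R_p a / σ_p² ≥ 1; the printed values show d_I = log u approaching d_LR = u - 1 as the perturbation shrinks. The test signal and perturbation sizes are placeholders.

```python
import numpy as np

def autocorr(x, maxlag):
    """r(k) = sum_m x(m) x(m+k), k = 0..maxlag."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(maxlag + 1)])

def lpc(r, p):
    """Autocorrelation-method LPC: returns a_p = [1, a_1..a_p], sigma_p^2 and R_p."""
    R = np.array([[r[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])
    a = np.concatenate([[1.0], np.linalg.solve(R[1:, 1:], -R[1:, 0])])
    return a, float(R[0] @ a), R

rng = np.random.default_rng(7)
x = np.convolve(rng.normal(size=800), [1.0, 0.7, 0.2])   # hypothetical test frame
p = 8
a_p, sigma_p2, R_p = lpc(autocorr(x, p), p)

results = []
for eps in (0.5, 0.1, 0.01):
    a = a_p + eps * np.r_[0.0, rng.normal(size=p)]        # perturbed predictor, a_0 stays 1
    u = float(a @ R_p @ a) / sigma_p2                      # u >= 1 because a_p is optimal
    d_lr, d_i = u - 1.0, float(np.log(u))
    results.append((eps, d_lr, d_i))
    print(f"eps={eps}: d_LR={d_lr:.5f}  d_I={d_i:.5f}")
```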

31

4.5.4 Likelihood Distortions

In general, $d_{IS}(s, s') \neq d_{IS}(s', s)$: the Itakura-Saito distortion is not symmetric.

32

4.5.4 Likelihood Distortions

Figure: input $x(n)$ with spectrum $S(\omega)$ → linear system $H(z) = A(z)/B(z)$ → output $x'(n)$ with spectrum $S'(\omega)$.

Consider the Itakura-Saito distortion between the input and output of a linear system H(z).

33

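4.5.4 Likelihood Distortions

$$H(z) = \frac{A(z)}{B(z)}, \qquad S'(\omega) = |H(e^{j\omega})|^2 S(\omega)$$

$$V(\omega) = \log S(\omega) - \log S'(\omega) = -\log|H(e^{j\omega})|^2$$

$$d_{IS}(S, S') = \int_{-\pi}^{\pi}\left[e^{V(\omega)} - V(\omega) - 1\right]\frac{d\omega}{2\pi} = \int_{-\pi}^{\pi}\left[\frac{1}{|H(e^{j\omega})|^2} + \log|H(e^{j\omega})|^2 - 1\right]\frac{d\omega}{2\pi}$$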

34

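4.5.4 Likelihood Distortions

Let

$$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}, \qquad B(z) = 1 + \sum_{i=1}^{p} b_i z^{-i}.$$

If both are minimum phase, $\int_{-\pi}^{\pi}\log|A(e^{j\omega})|^2\frac{d\omega}{2\pi} = \int_{-\pi}^{\pi}\log|B(e^{j\omega})|^2\frac{d\omega}{2\pi} = 0$, so the log term integrates to zero and

$$d_{IS}(S, S') = \int_{-\pi}^{\pi}\left[\frac{1}{|H(e^{j\omega})|^2} + \log|H(e^{j\omega})|^2 - 1\right]\frac{d\omega}{2\pi} = \int_{-\pi}^{\pi}\frac{|B(e^{j\omega})|^2}{|A(e^{j\omega})|^2}\frac{d\omega}{2\pi} - 1 = d_{IS}\left(\frac{1}{|A|^2}, \frac{1}{|B|^2}\right)$$

That is, the distortion does not depend on the input spectrum $S(\omega)$.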
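A numerical check that the Itakura-Saito distortion between the input and output of H(z) = A(z)/B(z) equals ∫|B|²/|A|² dω/2π - 1, assuming minimum-phase A(z) and B(z) with hypothetical zeros and two arbitrary positive input spectra. Grid averages stand in for the integrals; with zeros well inside the unit circle the grid is fine enough that the agreement is essentially exact.

```python
import numpy as np

def power_response(coeffs, omega):
    """|C(e^{jw})|^2 for C(z) = sum_k coeffs[k] z^{-k}."""
    k = np.arange(len(coeffs))
    return np.abs(np.exp(-1j * np.outer(omega, k)) @ np.asarray(coeffs, complex)) ** 2

def itakura_saito(S, S_prime):
    """Grid version of ∫ [S/S' - log(S/S') - 1] dω/2π."""
    ratio = S / S_prime
    return float(np.mean(ratio - np.log(ratio) - 1.0))

N = 4096
omega = 2 * np.pi * np.arange(N) / N
a = np.poly([0.9, -0.5])                    # hypothetical minimum-phase A(z)
b = np.poly([0.3 + 0.6j, 0.3 - 0.6j])       # hypothetical minimum-phase B(z)
A2, B2 = power_response(a, omega), power_response(b, omega)

rhs = np.mean(B2 / A2) - 1.0                # ∫ |B|^2/|A|^2 dω/2π - 1
for S in (1.0 + 0.5 * np.cos(omega), 2.0 + np.sin(omega) ** 2):   # two input spectra
    S_out = (A2 / B2) * S                   # S'(ω) = |H|^2 S(ω) with H = A/B
    print(np.isclose(itakura_saito(S, S_out), rhs))               # expected: True
```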

35

4.5.5 Variations of Likelihood Distortions

Symmetric distortion measures:

$$d_x^{(1)}(s, s') = \frac{1}{2}\left[d_x(s, s') + d_x(s', s)\right]$$

for example, with $x = IS$: $d_{IS}^{(1)}(s, s') = \frac{1}{2}\left[d_{IS}(s, s') + d_{IS}(s', s)\right]$.

36

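4.5.5 Variations of Likelihood Distortions

COSH distortion

With $V(\omega) = \log S(\omega) - \log S'(\omega)$:

$$d_{IS}^{(1)}(s, s') = \frac{1}{2}\int_{-\pi}^{\pi}\left[e^{V(\omega)} - V(\omega) - 1 + e^{-V(\omega)} + V(\omega) - 1\right]\frac{d\omega}{2\pi} = \int_{-\pi}^{\pi}\left[\cosh V(\omega) - 1\right]\frac{d\omega}{2\pi} = d_{COSH}(s, s')$$

Since

$$\cosh V = 1 + \frac{V^2}{2!} + \frac{V^4}{4!} + \cdots,$$

for small $V(\omega)$: $d_{COSH}(s, s') \approx \frac{1}{2}d_2^2(s, s')$.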
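A sketch verifying on a uniform frequency grid that the symmetrized Itakura-Saito distortion equals the COSH distance ∫[cosh V(ω) - 1] dω/2π, and that for a small log-spectral difference V(ω) the COSH distance is close to half the squared rms log spectral distance. The spectra and the size of the perturbation are illustrative assumptions.

```python
import numpy as np

def itakura_saito(S, S_prime):
    """Grid version of ∫ [S/S' - log(S/S') - 1] dω/2π."""
    ratio = S / S_prime
    return float(np.mean(ratio - np.log(ratio) - 1.0))

def cosh_distance(S, S_prime):
    """Grid version of ∫ [cosh(log S - log S') - 1] dω/2π."""
    V = np.log(S) - np.log(S_prime)
    return float(np.mean(np.cosh(V) - 1.0))

N = 1024
omega = 2 * np.pi * np.arange(N) / N
S = 1.0 + 0.5 * np.cos(omega)                     # hypothetical spectrum
S_prime = S * (1.0 + 0.05 * np.sin(3 * omega))    # small perturbation, so V(ω) is small

sym_is = 0.5 * (itakura_saito(S, S_prime) + itakura_saito(S_prime, S))
d2_sq = float(np.mean((np.log(S) - np.log(S_prime)) ** 2))
print(np.isclose(sym_is, cosh_distance(S, S_prime)))   # expected: True
print(cosh_distance(S, S_prime) / (0.5 * d2_sq))        # close to 1 for small V
```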

37

4.5.5 Variations of Likelihood Distortions

38

4.5.6 Spectral Distortion Using a Warped Frequency Scale

Psychophysical studies have shown that human perception of the frequency content of sounds does not follow a linear scale. This research has led to the idea of defining the subjective pitch of pure tones.

For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the "mel" scale.

As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.
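The slides define the mel scale only through its 1000-mel reference point. A widely used analytic approximation, assumed here and not taken from the slides, is m = 2595 log10(1 + f/700), which maps 1 kHz to about 1000 mels, is nearly linear below 1 kHz and nearly logarithmic above.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common analytic approximation of the mel scale (assumed, not from the slides)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, float) / 2595.0) - 1.0)

print(float(hz_to_mel(1000.0)))   # about 1000 mels
```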

39

40

4.5.6 Spectral Distortion Using a Warped Frequency Scale

41

4.5.6 Spectral Distortion Using a Warped Frequency Scale

42

4.5.6 Spectral Distortion Using a Warped Frequency Scale

43

Examples of critical bandwidth

Critical band number | Center frequency (Hz) | Critical bandwidth (Hz) | Lower cutoff frequency (Hz) | Upper cutoff frequency (Hz)
1 | 50 | – | – | 100
2 | 150 | 100 | 100 | 200
3 | 250 | 100 | 200 | 300
4 | 350 | 100 | 300 | 400
5 | 450 | 110 | 400 | 510
6 | 570 | 120 | 510 | 630
7 | 700 | 140 | 630 | 770
8 | 840 | 150 | 770 | 920
9 | 1,000 | 160 | 920 | 1,080
10 | 1,170 | 190 | 1,080 | 1,270
11 | 1,370 | 210 | 1,270 | 1,480
12 | 1,600 | 240 | 1,480 | 1,720
13 | 1,850 | 280 | 1,720 | 2,000
14 | 2,150 | 320 | 2,000 | 2,320
15 | 2,500 | 380 | 2,320 | 2,700
16 | 2,900 | 450 | 2,700 | 3,150
17 | 3,400 | 550 | 3,150 | 3,700
18 | 4,000 | 700 | 3,700 | 4,400
19 | 4,800 | 900 | 4,400 | 5,300
20 | 5,800 | 1,100 | 5,300 | 6,400
21 | 7,000 | 1,300 | 6,400 | 7,700
22 | 8,500 | 1,800 | 7,700 | 9,500
23 | 10,500 | 2,500 | 9,500 | 12,000
24 | 13,500 | 3,500 | 12,000 | 15,500
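A small lookup sketch based on the upper cutoff frequencies in the critical-band table. The band 1 lower cutoff is blank in the table, so the sketch assumes band 1 starts at 0 Hz; a frequency exactly on a band edge is assigned to the higher band. The function name is hypothetical.

```python
from bisect import bisect_right

# Upper cutoff frequencies (Hz) of critical bands 1..24, from the table of critical bandwidths.
UPPER_CUTOFFS_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
                    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500]

def critical_band_number(f_hz):
    """Return the critical band (1..24) containing f_hz; a band's lower edge belongs to it."""
    if not 0 <= f_hz < UPPER_CUTOFFS_HZ[-1]:
        raise ValueError("frequency outside the tabulated range")
    return bisect_right(UPPER_CUTOFFS_HZ, f_hz) + 1

print(critical_band_number(1000))   # 9  (920-1080 Hz)
print(critical_band_number(4000))   # 18 (3700-4400 Hz)
```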

44

Warped cepstral distance

$$\tilde d_2^2(s, s') = \int_{-B}^{B}\left|\log S(\theta(b)) - \log S'(\theta(b))\right|^2\frac{db}{2B} = \sum_{i}\sum_{k}(c_i - c'_i)(c_k - c'_k)\,\frac{1}{2B}\int_{-B}^{B}e^{j(k-i)\theta(b)}\,db$$

Here b is the frequency in Barks, $S(\theta(b))$ is the spectrum on a Bark scale, and B is the Nyquist frequency in Barks.
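A numerical sketch of the weights w_ik = (1/2B)∫_{-B}^{B} e^{j(k-i)θ(b)} db, assuming a user-supplied warping θ(b) that maps Barks to radians and is odd in b, so that w_ik is real and equals (1/2B)∫cos((k - i)θ(b)) db. A linear warp θ(b) = πb/B gives w_ik = δ_ik and recovers the ordinary cepstral distance. The sine-shaped warp and the value B = 17 Bark are placeholders, not the slide's Bark mapping.

```python
import numpy as np

def warped_weights(theta, B, L, n_grid=4001):
    """w_ik = (1/2B) ∫_{-B}^{B} cos((k - i) θ(b)) db for i, k = -L..L (trapezoidal rule)."""
    b = np.linspace(-B, B, n_grid)
    idx = np.arange(-L, L + 1)
    diff = idx[None, :] - idx[:, None]                  # entry [i, k] holds k - i
    f = np.cos(diff[:, :, None] * theta(b)[None, None, :])
    integral = np.sum(0.5 * (f[..., 1:] + f[..., :-1]) * np.diff(b), axis=-1)
    return integral / (2.0 * B)

def warped_cepstral_distance(c, c_prime, W):
    """sum_i sum_k (c_i - c'_i)(c_k - c'_k) w_ik, with c indexed n = -L..L."""
    d = np.asarray(c, float) - np.asarray(c_prime, float)
    return float(d @ W @ d)

B, L = 17.0, 4                                          # placeholder Nyquist (Bark) and order
W_lin = warped_weights(lambda b: np.pi * b / B, B, L)
W_sin = warped_weights(lambda b: np.pi * np.sin(0.5 * np.pi * b / B), B, L)  # hypothetical warp
print(np.allclose(W_lin, np.eye(2 * L + 1), atol=1e-6))   # expected: True
```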

45

4.5.6 Spectral Distortion Using a Warped Frequency Scale

$$\tilde d_c^2(L) = \sum_{i=-L}^{L}\sum_{k=-L}^{L}(c_i - c'_i)(c_k - c'_k)\,w_{ik}, \qquad w_{ik} = \frac{1}{2B}\int_{-B}^{B}e^{j(k-i)\theta(b)}\,db$$

where the warping function $\theta(b)$ is defined by:

46

4.5.6 Spectral Distortion Using a Warped Frequency Scale

.13||6)]()([2

1)(

13||10)1000(3333

)()(

6||13

tan76.0

1000

3333)()(

21

10/)776.8(2

1

bforbbb

bforbb

bforb

bb

b

47

4.5.6 Spectral Distortion Using a Warped Frequency Scale

48

4.5.6 Spectral Distortion Using a Warped Frequency Scale

49

4.5.6 Spectral Distortion Using a Warped Frequency Scale

Mel-frequency cepstrum:

$$\tilde c_n = \sum_{k=1}^{K}\left(\log\tilde S_k\right)\cos\left[n\left(k - \frac{1}{2}\right)\frac{\pi}{K}\right], \qquad n = 1, 2, \ldots, L$$

where $\tilde S_k$, $k = 1, \ldots, K$, is the output power of the triangular filters.

Mel-frequency cepstral distance:

$$d_{\tilde c}^2(L) = \sum_{n=1}^{L}(\tilde c_n - \tilde c'_n)^2$$
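A sketch of the mel-frequency cepstrum computed from the K filter output powers, assuming those powers have already been obtained from a triangular filterbank (not shown). Because the cosine basis sums to zero over k for 1 ≤ n < 2K, a change in overall gain leaves c̃_1, ..., c̃_L unchanged. Names and sizes are illustrative.

```python
import numpy as np

def mel_cepstrum(filter_powers, L):
    """c~_n = sum_{k=1}^{K} log(S~_k) cos[n (k - 1/2) pi / K], n = 1..L."""
    S = np.asarray(filter_powers, float)
    K = len(S)
    k = np.arange(1, K + 1)
    n = np.arange(1, L + 1)[:, None]
    return np.cos(n * (k - 0.5) * np.pi / K) @ np.log(S)

def mel_cepstral_distance(c, c_prime):
    """sum_{n=1}^{L} (c~_n - c~'_n)^2."""
    return float(np.sum((np.asarray(c) - np.asarray(c_prime)) ** 2))

K, L = 20, 12                                    # hypothetical filterbank size and order
S_tilde = np.linspace(1.0, 5.0, K)               # placeholder filter output powers
c = mel_cepstrum(S_tilde, L)
c_gain = mel_cepstrum(10.0 * S_tilde, L)         # same spectral shape, 10x the power
print(mel_cepstral_distance(c, c_gain) < 1e-20)  # expected: True
```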

50

4.5.7 Alternative Spectral Representations and Distortion Measures

51

4.5.7 Alternative Spectral Representations and Distortion Measures

Wave reflection occurs at each sectional boundary of the acoustic tube, with reflection coefficients denoted by $k_i$, $i = 1, 2, \ldots, p$. In terms of the cross-sectional areas $A_i$ of the tube sections:

$$g_i = \frac{A_{i+1}}{A_i} = \frac{1 - k_i}{1 + k_i}, \qquad i = 1, 2, \ldots, p$$

$$\log g_i = \log A_{i+1} - \log A_i = \log\frac{1 - k_i}{1 + k_i}$$

52

4.5.7 Alternative Spectral Representations and Distortion Measures

$$d_g^2 = \sum_{i=1}^{p}\left(\log g_i - \log g'_i\right)^2$$
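A sketch of the log area ratios log g_i = log[(1 - k_i)/(1 + k_i)] and the distance d_g² = Σ(log g_i - log g'_i)², assuming that sign convention (some texts use the reciprocal ratio, which only flips the sign of log g_i and leaves d_g unchanged). The reflection coefficients are placeholders and must satisfy |k_i| < 1.

```python
import numpy as np

def log_area_ratios(k):
    """log g_i = log[(1 - k_i) / (1 + k_i)] for reflection coefficients |k_i| < 1."""
    k = np.asarray(k, float)
    if np.any(np.abs(k) >= 1.0):
        raise ValueError("reflection coefficients must satisfy |k_i| < 1")
    return np.log((1.0 - k) / (1.0 + k))

def log_area_ratio_distance(k, k_prime):
    """d_g^2 = sum_i (log g_i - log g'_i)^2."""
    return float(np.sum((log_area_ratios(k) - log_area_ratios(k_prime)) ** 2))

k = np.array([-0.9, 0.5, -0.2, 0.1])     # hypothetical reflection coefficients
print(log_area_ratios(k).round(3))
print(log_area_ratio_distance(k, k))     # 0.0 for identical frames
```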

Another possible parametric representation of the all-pole spectrum is the set of line spectral frequencies (LSFs), defined by the roots (which lie on the unit circle) of the following two polynomials based upon the inverse filter A(z):

$$P(z) = A(z) + z^{-(p+1)}A(z^{-1})$$

$$Q(z) = A(z) - z^{-(p+1)}A(z^{-1})$$

These two polynomials are equivalent to artificially augmenting the p-section nonuniform acoustic tube with an extra section that is either completely closed (area = 0) or completely open (area = ∞). Due to their particular structure, LSF parameters possess properties similar to those of the formant frequencies and bandwidths.
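A sketch that computes LSFs as the angles in (0, π) of the roots of P(z) and Q(z), using `numpy.roots`. It assumes a minimum-phase A(z) with a_0 = 1 and discards the trivial roots at z = ±1. The example A(z) is built from two hypothetical resonances; for such a filter the P and Q frequencies interlace, which the test checks.

```python
import numpy as np

def line_spectral_frequencies(a, tol=1e-6):
    """LSFs in (0, pi) from A(z) = sum_{k=0}^p a_k z^-k (a_0 = 1, minimum phase)."""
    a_ext = np.concatenate([np.asarray(a, float), [0.0]])
    P = a_ext + a_ext[::-1]          # coefficients of A(z) + z^{-(p+1)} A(z^{-1})
    Q = a_ext - a_ext[::-1]          # coefficients of A(z) - z^{-(p+1)} A(z^{-1})

    def angles(poly):
        w = np.angle(np.roots(poly))
        # keep the upper half circle, dropping trivial roots at z = 1 and z = -1
        return np.sort(w[(w > tol) & (w < np.pi - tol)])

    return angles(P), angles(Q)

# Hypothetical 4th-order minimum-phase A(z) with zeros at two resonances.
zeros = [0.95 * np.exp(1j * 0.4), 0.95 * np.exp(-1j * 0.4),
         0.85 * np.exp(1j * 1.3), 0.85 * np.exp(-1j * 1.3)]
a = np.poly(zeros)
wp, wq = line_spectral_frequencies(a)
print(np.sort(np.concatenate([wp, wq])))
```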

53

4.5.7 Alternative Spectral Representations and Distortion Measures

Weighted slope metric proposed by Klatt:

$$d_{WSM}(S, S') = u_E\,|E - E'| + \sum_{i=1}^{K}u(i)\left[S(i) - S'(i)\right]^2$$

where E and E' are the frame energies, S(i) and S'(i) are the spectral slopes in the ith critical band, and

u_E: the weighting constant for the absolute energy difference;
u(i): the weighting coefficient for the critical band spectral slope difference;
K: the total number of critical bands.

In one implementation:

$$u(i) = \frac{u_{LM}(i) + u_{GM}(i)}{2}, \qquad u_{LM}(i) = \frac{\kappa_{LM}}{\kappa_{LM} + V_{LM}(i)}, \qquad u_{GM}(i) = \frac{\kappa_{GM}}{\kappa_{GM} + V_{GM}(i)}$$

54

4.5.7 Alternative Spectral Representations and Distortion Measures

$V_{LM}(i)$ and $V_{GM}(i)$ are the log spectral differences (in dB) between the spectral magnitude at the ith critical band and its nearest local maximum (LM) and the global maximum (GM) spectral peaks, respectively. The coefficients $\kappa_{LM}$ and $\kappa_{GM}$ are used to balance the contributions due to the local and the global spectral characteristics, and to prevent singularity in $u_{LM}(i)$ and $u_{GM}(i)$.
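A sketch of the weighted slope metric with averaged weights u(i) = [u_LM(i) + u_GM(i)]/2. The per-band slopes and the dB distances V_LM(i), V_GM(i) are taken as inputs (computing them from a critical-band spectrum is not shown, and which of the two spectra they come from is left to the caller). The default values of u_E, κ_LM and κ_GM are placeholders, not constants from the slides.

```python
import numpy as np

def weighted_slope_metric(E, E_prime, slopes, slopes_prime, V_lm, V_gm,
                          u_E=1.0, kappa_lm=1.0, kappa_gm=20.0):
    """d_WSM = u_E |E - E'| + sum_i u(i) [S(i) - S'(i)]^2, u(i) = (u_LM(i) + u_GM(i)) / 2.

    slopes, slopes_prime: spectral slopes per critical band (length K).
    V_lm, V_gm: dB distances (>= 0) from each band to its nearest local maximum
    and to the global maximum. Default constants are placeholders.
    """
    u_lm = kappa_lm / (kappa_lm + np.asarray(V_lm, float))
    u_gm = kappa_gm / (kappa_gm + np.asarray(V_gm, float))
    u = 0.5 * (u_lm + u_gm)
    dS = np.asarray(slopes, float) - np.asarray(slopes_prime, float)
    return float(u_E * abs(E - E_prime) + np.sum(u * dS ** 2))

d = weighted_slope_metric(60.0, 58.0, [1, 2, 3, 4], [1, 1, 3, 2], [0, 0, 0, 0], [0, 20, 0, 20])
print(d)   # expected: 5.75
```

Bands far below a spectral peak (large V) get smaller weights, so slope differences near formant peaks dominate the metric.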

55

4.5.7 Alternative Spectral Representations and Distortion Measures

56

Summary of Spectral Distortion Measures

Distortion measure | Notation | Expression | Computation
L_p metric (distance) | $d_p$ | $d_p^p = \int_{-\pi}^{\pi}\left|\log S(\omega) - \log S'(\omega)\right|^p\frac{d\omega}{2\pi}$ | 2 FFTs, logs, integrals
Truncated cepstral distance | $d_c^2(L)$ | $\sum_{n=1}^{L}(c_n - c'_n)^2$ | L multiplications
Weighted (liftered) cepstral distance | $d_{cW}^2$ | $\sum_{n=1}^{L}\left[w(n)(c_n - c'_n)\right]^2$ | L multiplications

57

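Summary of Spectral Distortion Measures

Distortion measure | Notation | Expression | Computation
Itakura-Saito distortion | $d_{IS}$ | $\int_{-\pi}^{\pi}\frac{S(\omega)}{S'(\omega)}\frac{d\omega}{2\pi} - \log\frac{\sigma_\infty^2}{\sigma_\infty'^2} - 1$ | p multiplications
Itakura distortion | $d_I$ | $\log\int_{-\pi}^{\pi}\frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2}\frac{d\omega}{2\pi} = \log\frac{\mathbf{a}^t\mathbf{R}_p\mathbf{a}}{\sigma_p^2}$ | p multiplications
Likelihood ratio distortion | $d_{LR}$ | $\int_{-\pi}^{\pi}\frac{|A(e^{j\omega})|^2}{|A_p(e^{j\omega})|^2}\frac{d\omega}{2\pi} - 1 = \frac{\mathbf{a}^t\mathbf{R}_p\mathbf{a}}{\sigma_p^2} - 1$ | p multiplications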

58

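Summary of Spectral Distortion Measures

Distortion measure | Notation | Expression | Computation
COSH distance | $d_{COSH}$ | $\frac{1}{2}\left[d_{IS}(s, s') + d_{IS}(s', s)\right] = \int_{-\pi}^{\pi}\left[\cosh\left(\log\frac{S(\omega)}{S'(\omega)}\right) - 1\right]\frac{d\omega}{2\pi}$ | 2p multiplications
Weighted likelihood ratio distortion | $d_{WLR}$ | $\sum_{n=1}^{L}\left[r(n) - r'(n)\right]\left[c_n - c'_n\right]$ | L multiplications
Weighted slope metric | $d_{WSM}$ | $u_E|E - E'| + \sum_{i=1}^{K}u(i)\left[S(i) - S'(i)\right]^2$ | K multiplications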

59

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

A first-order differential (log) spectrum is defined by:

$$\frac{\partial\log S(\omega, t)}{\partial t} = \sum_{n=-\infty}^{\infty}\frac{\partial c_n(t)}{\partial t}\,e^{-jn\omega}$$

60

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

Fitting the cepstral trajectory by a second-order polynomial $h_1 + h_2 t + h_3 t^2$, choose $h_1$, $h_2$, $h_3$ such that

$$E = \sum_{t=-M}^{M}\left[c_n(t) - \left(h_1 + h_2 t + h_3 t^2\right)\right]^2$$

is minimized. Differentiating E with respect to $h_1$, $h_2$, and $h_3$ and setting the derivatives to zero results in three equations:

$$\sum_{t=-M}^{M}\left[c_n(t) - h_1 - h_2 t - h_3 t^2\right] = 0$$

$$\sum_{t=-M}^{M}\left[c_n(t)\,t - h_1 t - h_2 t^2 - h_3 t^3\right] = 0$$

$$\sum_{t=-M}^{M}\left[c_n(t)\,t^2 - h_1 t^2 - h_2 t^3 - h_3 t^4\right] = 0$$

61

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

The solutions to these equations are:

$$h_2 = \frac{\sum_{t=-M}^{M} t\,c_n(t)}{T_M}, \qquad T_M = \sum_{t=-M}^{M} t^2$$

$$h_3 = \frac{T_M\sum_{t=-M}^{M} c_n(t) - (2M+1)\sum_{t=-M}^{M} t^2 c_n(t)}{T_M^2 - (2M+1)\sum_{t=-M}^{M} t^4}$$

$$h_1 = \frac{1}{2M+1}\left[\sum_{t=-M}^{M} c_n(t) - h_3 T_M\right]$$

62

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

63

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

The first and second time derivatives of $c_n$ can be obtained by differentiating the fitting curve, giving:

$$\left.\frac{\partial c_n(t)}{\partial t}\right|_{t=0} = h_2 = \frac{\sum_{t=-M}^{M} t\,c_n(t)}{T_M}$$

$$\left.\frac{\partial^2 c_n(t)}{\partial t^2}\right|_{t=0} = 2h_3 = 2\,\frac{T_M\sum_{t=-M}^{M} c_n(t) - (2M+1)\sum_{t=-M}^{M} t^2 c_n(t)}{T_M^2 - (2M+1)\sum_{t=-M}^{M} t^4}$$
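A sketch of the derivative estimates h_2 and 2h_3 from the least-squares quadratic fit, applied to a window of 2M + 1 cepstral vectors centred on the current frame. It assumes M ≥ 1 and one cepstral coefficient's trajectory per column; a trajectory that is exactly quadratic is recovered exactly.

```python
import numpy as np

def polynomial_deltas(c_window):
    """First and second time derivatives of c_n at the window centre.

    c_window has shape (2M+1, n_ceps): cepstral vectors at t = -M..M.
    Returns (h2, 2*h3) from the closed-form least-squares solution.
    """
    c_window = np.asarray(c_window, float)
    M = (c_window.shape[0] - 1) // 2
    t = np.arange(-M, M + 1)[:, None]
    T_M = float(np.sum(t ** 2))
    sum_t4 = float(np.sum(t ** 4))
    h2 = np.sum(t * c_window, axis=0) / T_M
    h3 = ((T_M * np.sum(c_window, axis=0) - (2 * M + 1) * np.sum(t ** 2 * c_window, axis=0))
          / (T_M ** 2 - (2 * M + 1) * sum_t4))
    return h2, 2.0 * h3

# A trajectory that is exactly quadratic, c(t) = 1 + 2t + 3t^2, is recovered exactly.
t = np.arange(-2, 3)
delta, delta2 = polynomial_deltas((1 + 2 * t + 3 * t ** 2)[:, None])
print(delta, delta2)   # expected: [2.] [6.]
```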

64

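4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

A differential spectral distance:

$$d_{(1)}^2 = \int_{-\pi}^{\pi}\left|\frac{\partial\log S(\omega,t)}{\partial t} - \frac{\partial\log S'(\omega,t)}{\partial t}\right|^2\frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty}\left(c_n^{(1)} - c_n'^{(1)}\right)^2$$

A second differential spectral distance:

$$d_{(2)}^2 = \int_{-\pi}^{\pi}\left|\frac{\partial^2\log S(\omega,t)}{\partial t^2} - \frac{\partial^2\log S'(\omega,t)}{\partial t^2}\right|^2\frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty}\left(c_n^{(2)} - c_n'^{(2)}\right)^2$$

where $c_n^{(1)} = \dfrac{\partial c_n(t)}{\partial t}$ and $c_n^{(2)} = \dfrac{\partial^2 c_n(t)}{\partial t^2}$.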

65

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

Combining the first and second differential spectral distances with the cepstral distance results in:

$$d^2 = \gamma_1 d_2^2 + \gamma_2 d_{(1)}^2 + \gamma_3 d_{(2)}^2$$

where usually $\gamma_1 = 1$.

Cepstral weighting or liftering by differentiating:

$$\frac{\partial}{\partial\omega}\left[\frac{\partial\log S(\omega,t)}{\partial t}\right] = \sum_{n=-\infty}^{\infty}(-jn)\,c_n^{(1)}(t)\,e^{-jn\omega}$$
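A sketch of the combined distance d² = γ_1 d_2² + γ_2 d_(1)² + γ_3 d_(2)² for one pair of frames, assuming the cepstra and their first and second time derivatives have already been computed (for example with the polynomial-fit estimates). Sums run over n = 1, ..., L only; the γ values are caller-chosen, with γ_1 = 1 as in the usual choice.

```python
import numpy as np

def combined_cepstral_distance(c, c_prime, dc, dc_prime, ddc, ddc_prime,
                               gammas=(1.0, 1.0, 1.0)):
    """d^2 = g1 d_2^2 + g2 d_(1)^2 + g3 d_(2)^2 over cepstral indices n = 1..L.

    c, dc, ddc: cepstrum and its first and second time derivatives for one frame.
    Only n >= 1 terms are summed; including both n and -n would double every term.
    """
    g1, g2, g3 = gammas

    def sq(x, y):
        return float(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))

    return g1 * sq(c, c_prime) + g2 * sq(dc, dc_prime) + g3 * sq(ddc, ddc_prime)

d2 = combined_cepstral_distance([1, 0], [0, 0], [0, 2], [0, 0], [1, 1], [0, 0],
                                gammas=(1.0, 0.5, 0.25))
print(d2)   # expected: 3.5
```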

66

4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

A weighted differential cepstral distance:

$$d_{w(1)}^2 = \int_{-\pi}^{\pi}\left|\frac{\partial^2\log S(\omega,t)}{\partial\omega\,\partial t} - \frac{\partial^2\log S'(\omega,t)}{\partial\omega\,\partial t}\right|^2\frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty}n^2\left(c_n^{(1)} - c_n'^{(1)}\right)^2$$

67

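4.6 INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

Other operators can be added to produce a combined representation of the spectrum and the differential spectra. As an example:

$$\frac{\partial}{\partial\omega}\left[\log S(\omega,t) + \frac{\partial\log S(\omega,t)}{\partial t}\right] = \sum_{n=-\infty}^{\infty}(-jn)\left[c_n(t) + c_n^{(1)}(t)\right]e^{-jn\omega}$$

Taking the $L_2$ distance:

$$\begin{aligned}
d_{WW}^2 &= \int_{-\pi}^{\pi}\left|\frac{\partial}{\partial\omega}\left[\log S(\omega,t) + \frac{\partial\log S(\omega,t)}{\partial t}\right] - \frac{\partial}{\partial\omega}\left[\log S'(\omega,t) + \frac{\partial\log S'(\omega,t)}{\partial t}\right]\right|^2\frac{d\omega}{2\pi} \\
&= \sum_n n^2\left\{\left[c_n(t) - c'_n(t)\right] + \left[c_n^{(1)}(t) - c_n'^{(1)}(t)\right]\right\}^2 \\
&= \sum_n n^2\left[c_n(t) - c'_n(t)\right]^2 + \sum_n n^2\left[c_n^{(1)}(t) - c_n'^{(1)}(t)\right]^2 + 2\sum_n n^2\left[c_n(t) - c'_n(t)\right]\left[c_n^{(1)}(t) - c_n'^{(1)}(t)\right] \\
&= d_W^2 + d_{w(1)}^2 + 2\sum_n n^2\left[c_n(t) - c'_n(t)\right]\left[c_n^{(1)}(t) - c_n'^{(1)}(t)\right]
\end{aligned}$$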
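A numerical check of the expansion of d_WW² into a slope term, a differential slope term and a cross term, evaluated on a uniform frequency grid and assuming a symmetric cepstrum truncated at L terms. Because the sums over n include both ±n, each term on the cepstral side carries a factor of 2; the coefficients are random placeholders.

```python
import numpy as np

def slope_of_combined(c, c1, omega):
    """d/dw [log S + d(log S)/dt] = -2 sum_{n>=1} n (c_n + c_n^(1)) sin(nw); index 0 unused."""
    n = np.arange(1, len(c))
    return -2.0 * np.sin(np.outer(omega, n)) @ (n * (c[1:] + c1[1:]))

L, N = 10, 256                                   # N > 2L, so grid averages are exact
omega = 2 * np.pi * np.arange(N) / N
rng = np.random.default_rng(8)
c, c1, cp, c1p = rng.normal(size=(4, L + 1))     # hypothetical c_n(t), c_n^(1)(t) for two frames

d_ww = np.mean((slope_of_combined(c, c1, omega) - slope_of_combined(cp, c1p, omega)) ** 2)
n = np.arange(1, L + 1)
dc, dc1 = c[1:] - cp[1:], c1[1:] - c1p[1:]
d_w = 2 * np.sum(n ** 2 * dc ** 2)               # factor 2: n and -n terms
d_w1 = 2 * np.sum(n ** 2 * dc1 ** 2)
cross = 2 * 2 * np.sum(n ** 2 * dc * dc1)
print(np.isclose(d_ww, d_w + d_w1 + cross))      # expected: True
```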