1
Robust HMM classification schemes for speaker recognition
using integral decode
Marie Roch
Florida International University
2
Who am I?
3
Speaker Recognition
• Types of speaker recognition:
– Verification vs. identification
– Text dependent vs. text independent
4
Speaker Recognition
• Why is it hard?
• Minimal training data
• Background noise
• Transducer mismatch
• Channel distortions
• People’s voices change over time and under stress
• Performance
5
Feature Extraction
• Extract speech
• Spectral analysis
• Cepstrum: $c = \mathrm{dft}^{-1}\left(\log\left|\mathrm{dft}(S)\right|\right)$
• Cepstral mean removal
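A minimal numpy sketch of the cepstrum and cepstral mean removal above; framing, windowing, and the exact DFT variant are simplifying assumptions:

```python
import numpy as np

def cepstrum(frames, eps=1e-10):
    """Real cepstrum per frame: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frames, axis=-1)
    log_mag = np.log(np.abs(spectrum) + eps)   # eps guards against log(0)
    return np.fft.irfft(log_mag, axis=-1)

def cepstral_mean_removal(cepstra):
    """Subtract the per-utterance mean of each coefficient to reduce
    stationary channel effects (e.g. transducer/channel mismatch)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# frames: (num_frames, frame_len) windowed speech frames (stand-in data)
frames = np.random.randn(100, 256)
features = cepstral_mean_removal(cepstrum(frames))
```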
6
Hidden Markov Models
• Statistical pattern recognition
• State dependent modeling
– Distribution/state
– Radial basis functions common
• State sequence unobservable
7
HMM
• Efficient decoders: $O(N^2 T)$
• Training
– EM algorithm
– Convergence to a local maximum guaranteed
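A sketch of a log-domain Viterbi decode showing where the $O(N^2 T)$ cost comes from; the array layout is an assumption:

```python
import numpy as np

def viterbi_loglik(log_A, log_pi, log_B):
    """Best-path log likelihood for an N-state HMM over T frames.

    log_A:  (N, N) log transition matrix
    log_pi: (N,)   log initial state probabilities
    log_B:  (T, N) log output probabilities per frame and state

    The loop runs T times, and each iteration performs an N x N
    max-plus step: hence the O(N^2 T) decoder cost.
    """
    T, N = log_B.shape
    delta = log_pi + log_B[0]
    for t in range(1, T):
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[t]
    return delta.max()
```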
8
Recognition
• Model for each speaker
• Maximum a posteriori (MAP) decision rule
[Diagram: features are scored against each speaker model; ArgMax over the scores selects the speaker]
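A sketch of the recognition step under the MAP rule with equal speaker priors; the `log_likelihood` model interface is hypothetical:

```python
def identify(features, models):
    """MAP identification with equal speaker priors: the decision
    reduces to the maximum-likelihood speaker.

    models: dict mapping speaker id -> HMM exposing a
            .log_likelihood(features) method (hypothetical interface).
    """
    scores = {spk: hmm.log_likelihood(features) for spk, hmm in models.items()}
    return max(scores, key=scores.get)
```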
9
The MAP decision rule
• Optimal decision rule provided we have accurate distribution parameters & observations.
• Problem:
– Corruption of feature vectors.
– Distribution known to be inaccurate.
10
A case of mistaken identity
11
Integral decode
• Goal: Include the uncorrupted observation $\hat{o}_t$.
• Problem: $\hat{o}_t$ is unobservable.
• Determine a local neighborhood $\Omega_t$ about $o_t$ and use a priori information to weight the likelihood:

$\Pr(o_t \mid M) \approx \int_{\Omega_t} \Pr(\hat{o}_t \mid M)\,\Pr(\hat{o}_t \mid o_t)\,d\hat{o}_t$
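As a 1-D illustration of the weighted likelihood (assuming a zero-mean normal error model; `model_pdf` is a hypothetical vectorized state output density):

```python
import numpy as np

def weighted_likelihood_1d(o_t, model_pdf, sigma, half_width, n=51):
    """Approximate Pr(o_t|M) by integrating Pr(o_hat|M) * Pr(o_hat|o_t)
    over Omega_t = [o_t - half_width, o_t + half_width], with a
    zero-mean normal error model of standard deviation sigma."""
    o_hat = np.linspace(o_t - half_width, o_t + half_width, n)
    step = o_hat[1] - o_hat[0]
    # Pr(o_hat | o_t): normal density centered on the observed frame
    prior = np.exp(-0.5 * ((o_hat - o_t) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return np.sum(model_pdf(o_hat) * prior) * step
```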
12
Integral decode issues
• Problems approximating the integral
– High frame rate × number of models
– Non-trivial dimensionality
• Selection of the neighborhood
13
Approximating the integral
• Monte Carlo impractical
• Use simplified cubature technique:
$\Pr(o_t \mid M) \approx \sum_{i_1=1}^{C}\cdots\sum_{i_d=1}^{C} \mathrm{area}(i)\,\Pr(\mathrm{step}(i) \mid M)\,\Pr_{\mathrm{prior}}(\mathrm{step}(i) \mid o_t)$

where $\mathrm{step}(i)$ is the midpoint of cell $i$ of a uniform grid with $C$ cells per axis over $\Omega_t$:

$\mathrm{step}_k(i) = \ell_k + \frac{2 i_k - 1}{2}\cdot\frac{u_k - \ell_k}{C}$
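A sketch of the step-function cubature under the midpoint reading above; the $C^d$ growth in evaluation points is exactly the dimensionality problem noted on the previous slide. All names here are illustrative:

```python
import numpy as np

def midpoint_cubature(model_pdf, prior_pdf, lower, upper, C):
    """Step-function cubature over the box [lower, upper] in d dimensions:
    evaluate the integrand at each of the C**d cell midpoints (the step
    points) and weight by the common cell volume (area(i))."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    d = lower.size
    # midpoints along axis k: step_k(i) = lower_k + (2*i - 1)/2 * width_k / C
    axes = [lower[k] + (np.arange(C) + 0.5) * (upper[k] - lower[k]) / C
            for k in range(d)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    volume = np.prod((upper - lower) / C)      # equal cell volume
    return sum(model_pdf(x) * prior_pdf(x) for x in grid) * volume
```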
14
Neighborhood choice
• Choosing an appropriate neighborhood:
– Upper bound difference neighborhoods [Merhav and Lee 93]
– Error source modeling
15
Upper bound difference neighborhoods
• Arbitrary signal pairs with a few general conditions.
• PSD (rational pole/zero form):

$S(e^{j\theta}) = \sigma^2\,\frac{\prod_i \left(1 - \beta_i e^{j\theta}\right)\left(1 - \bar{\beta}_i e^{-j\theta}\right)}{\prod_i \left(1 - \alpha_i e^{j\theta}\right)\left(1 - \bar{\alpha}_i e^{-j\theta}\right)}$

• Cepstra:

$c_i = \frac{1}{i}\left(\sum_k \alpha_k^i - \sum_k \beta_k^i\right), \quad i \ge 1$
16
Taking the upper bound
• Asymptotic difference between cepstral parameters:
$\left|c_i^{(1)} - c_i^{(2)}\right| \le \frac{4k}{i}\,\rho^i, \qquad \rho = \max_j\left\{\left|\alpha_j^{(1)}\right|,\left|\beta_j^{(1)}\right|,\left|\alpha_j^{(2)}\right|,\left|\beta_j^{(2)}\right|\right\}$
17
Error source modeling
• Multiple error sources
• Simplifying assumption of one normal distribution with zero mean
• Use time series analysis to estimate the noise
• Trend
$O_t = o_t + n_t$

with the trend $\hat{o}_t$ estimated from the observations neighboring time $t$, and the noise estimate taken as the detrended residual:

$\hat{n}_t = O_t - \hat{o}_t$
18
Error Source Modeling
• Estimate variance from detrended signal
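A sketch of the detrend-and-estimate step, assuming a centered moving-average trend; the window length is a hypothetical hyperparameter:

```python
import numpy as np

def estimate_noise_variance(x, window=5):
    """Estimate the error variance under O_t = o_t + n_t with n_t
    assumed zero-mean normal: fit the trend with a centered moving
    average, then take the variance of the detrended residual."""
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")  # local trend estimate o_hat_t
    residual = x - trend                         # detrended signal n_hat_t
    return residual.var()                        # edges are slightly biased
```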
19
Error source modeling
• Problem:
– $\Omega_t$ is infinite
• Solution:
– Most of the points are outliers
– Set a percentage of the distribution beyond which points are culled

$\Pr(o_t \mid M) \approx \int_{\Omega_t} \Pr(\hat{o}_t \mid M)\,\Pr(\hat{o}_t \mid o_t)\,d\hat{o}_t$
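A sketch of the culling step: choose integration bounds that drop a fixed fraction of the normal error distribution's mass, using the matching quantile from `scipy.stats.norm.ppf`:

```python
from scipy.stats import norm

def neighborhood_bounds(o_t, sigma, cull=0.05):
    """The normal error model makes Omega_t infinite; truncate it by
    culling the outer `cull` fraction of the distribution's mass.
    Returns integration bounds around the observed frame o_t."""
    z = norm.ppf(1.0 - cull / 2.0)   # e.g. cull=0.05 -> z ~ 1.96
    return o_t - z * sigma, o_t + z * sigma
```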
20
Complexity of integration
• Expensive
• Ways to reduce/cope
– Implemented:
• Top K processing
• Principal Components Analysis
– Possible:
• Gaussian Selection
• Sub-band Models
• SIMD or MIMD parallelism
$O(\underbrace{S}_{\text{Speakers}}\;\underbrace{N^2 T}_{\text{Decoder}}\;\underbrace{M}_{\text{Mixtures}}\;\underbrace{E\,C}_{\text{Integration}})$
21
Top K Processing
$O(S\,N^2 T\,M\,E\,C_{\mathrm{TopK}})$
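One plausible reading of Top K processing, sketched below: rank the mixture components with a cheap point score, then apply the expensive integral evaluation only to the K best, shrinking the integration factor in the cost above. The `expensive_score` interface is hypothetical:

```python
import numpy as np

def top_k_mixture_score(o_t, means, covs, weights, expensive_score, k=4):
    """means, covs: (M, d) component means and diagonal covariances.
    expensive_score: callable(o_t, mean, cov) -> integral-decode score."""
    cheap = -np.sum((o_t - means) ** 2 / covs, axis=1)  # neg. Mahalanobis-like
    top = np.argsort(cheap)[-k:]                        # indices of the K best
    return sum(weights[i] * expensive_score(o_t, means[i], covs[i]) for i in top)
```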
[Figure: Top K performance for 1, 3, and 5 second test segments]
22
Principal Component Analysis
• Choose P most important directions
23
Principal Component Analysis
• Integrate using new basis set for step function
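A sketch of the projection step: estimate the P directions of largest variance and re-express the features in that basis, so the integration grid lives in fewer dimensions:

```python
import numpy as np

def pca_project(features, p):
    """Project (num_frames, d) features onto the top-P principal directions."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :p]          # top-P principal directions
    return centered @ basis, basis
```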
24
Speech Corpus
• King-92
– Used San Diego subset
• 26 male speakers
• Long distance telephone speech
• Quiet room environment
• 5 sessions recorded one week apart
– Sessions 1–3 used for training
– Sessions 4–5 partitioned into test segments
25
Baseline performance
26
Integral decode performance
Test     Baseline   Upper Bound Difference          Error Modeling
Length   Error      Error    Reduction   %          Error    Reduction   %
1 s      0.4420     0.4237   0.0183      4.14       0.4401   0.0019      0.43
3 s      0.1833     0.1554   0.0279      15.22      0.1753   0.0080      4.64
5 s      0.0872     0.0738   0.0134      15.37      0.0638   0.0234      26.83
27
Integral decode with other conditions
• Performance on
– high quality speech
– transducer mismatch
28
Future work
• Extensions to the integral decode
– Automatic parameter selection
– Gaussian selection
– Distributed computation
• Efficient multiple class preclassifiers
29
30
Optimal/utterance hyperparameters – 5 seconds
[Chart: optimal per-utterance hyperparameters for the KingNB, KingWB, SpidreF, and SpidreM test conditions]
31
95% Confidence Intervals
• Caveat:
– Per speaker means
– Large granularity
32
Pattern Recognition
• Long term statistics [Bricker et al 71, Markel et al 77]
• Vector Quantization [Soong et al 87]
• HMM [Rosenberg et al 90, Tishby 91, Matsui & Furui 92, Reynolds et al 95]
• Connectionist frameworks
– Feed forward [Oglesby & Mason 90]
– Learning vector quantization [He et al 99]
33
Pattern Recognition Contd.
• Hybrid/Modified HMMs
– Min Classification Error discriminant [Liu et al 95]
– Tree structured neural classifiers [Liou & Mammone 95]
• Trajectory modeling [Russell et al 85, Liu et al 95, Ostendorf et al 96, He et al 99]
• Sub-band recognition [Besacier & Bonastre 97]