Text-Independent Speaker Verification

Uploaded by cody-ray on 06-May-2015

DESCRIPTION

Presentation slides discussing the theory and empirical results of a text-independent speaker verification system I developed based on classification of MFCCs. Both minimum-distance classification and log-likelihood ratio classification using Gaussian Mixture Models are discussed.

TRANSCRIPT

Page 1: Text-Independent Speaker Verification

Speaker Recognition

Cody A. Ray, ECES 435 Final Project

March 11, 2010

Page 2

• Speaker Recognition
  • Speaker Identification
    – Text-Dependent
    – Text-Independent
  • Speaker Verification
    – Text-Dependent
    – Text-Independent

Page 3

Speaker Recognition System

Figure: Training: training speech → feature extraction → feature vectors → speaker model (target & background). Testing: test speech → feature extraction → feature vectors → matching → score → verification.

• Feature extraction: Cepstrum, LPCC, MFCC, Glottal Flow Derivative
• Deterministic models: Min Distance, DTW
• Stochastic models: GMM, HMM
• Minimum Distance, Maximum-Likelihood, Maximum a posteriori, Minimum-Mean-Squared Error

Page 4

Feature Extraction

• Big surprise here – MFCCs!

Figure: speech signal x[m] → window w[n − m] → DFT X(n, ω) → |·| → mel-scale filter bank E_mel(n, l) → log → DCT → MFCCs

• MFCC: 12 coefficients (0th-order coefficient skipped)
• 256-sample frames, 128-sample increment, Hamming window
• Triangular filters in the mel domain (absolute magnitude)
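A minimal pure-Python sketch of the MFCC pipeline above. It uses the slide's parameters: 256-sample frames, a 128-sample increment, a Hamming window, triangular mel filters applied to the DFT magnitude, and 12 coefficients with c[0] skipped. Several details are assumptions rather than details of the project: the 8 kHz sampling rate, the 20 filters, the 2595·log10(1 + f/700) mel formula, and the DCT-II. All function names are hypothetical.

```python
import math

FRAME_LEN = 256  # samples per frame (from the slide)
HOP = 128        # frame increment in samples (from the slide)
N_FILTERS = 20   # assumption: number of triangular mel filters
N_CEPS = 12      # keep c[1]..c[12]; c[0] is skipped (from the slide)


def hz_to_mel(f):
    # Assumed mel formula (one common variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(fs, n_fft=FRAME_LEN, n_filters=N_FILTERS):
    """Triangular filters with edges equally spaced on the mel scale from 0 to fs/2.

    Returns one list of weights per filter, one weight per DFT bin 0..n_fft//2.
    """
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for f in bin_freqs:
            if lo <= f <= mid:
                weights.append((f - lo) / (mid - lo))    # rising edge
            elif mid < f <= hi:
                weights.append((hi - f) / (hi - mid))    # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(signal, fs):
    """One N_CEPS-long MFCC vector per frame of `signal` (a list of samples)."""
    bank = mel_filterbank(fs)
    n_bins = FRAME_LEN // 2 + 1
    # Symmetric Hamming window
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (FRAME_LEN - 1))
              for n in range(FRAME_LEN)]
    # Twiddle-factor tables for a direct DFT (O(N^2); an FFT would be used in practice)
    cos_t = [math.cos(2 * math.pi * j / FRAME_LEN) for j in range(FRAME_LEN)]
    sin_t = [math.sin(2 * math.pi * j / FRAME_LEN) for j in range(FRAME_LEN)]
    features = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        x = [signal[start + n] * window[n] for n in range(FRAME_LEN)]
        # |X(n, omega)| at bins k = 0..FRAME_LEN/2
        mag = []
        for k in range(n_bins):
            re = sum(x[n] * cos_t[(k * n) % FRAME_LEN] for n in range(FRAME_LEN))
            im = sum(x[n] * sin_t[(k * n) % FRAME_LEN] for n in range(FRAME_LEN))
            mag.append(math.hypot(re, im))
        # Log of the mel filter-bank energies E_mel(n, l); the floor avoids log(0) on silence
        log_e = [math.log(max(sum(w * a for w, a in zip(filt, mag)), 1e-10))
                 for filt in bank]
        # DCT-II of the log energies, keeping orders 1..N_CEPS
        n_filt = len(log_e)
        ceps = [sum(log_e[i] * math.cos(math.pi * q * (i + 0.5) / n_filt)
                    for i in range(n_filt))
                for q in range(1, N_CEPS + 1)]
        features.append(ceps)
    return features


# Usage on a synthetic 440 Hz tone; 8 kHz is an assumed sampling rate
fs = 8000
tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(1024)]
feats = mfcc(tone, fs)
print(len(feats), len(feats[0]))  # 7 frames, 12 coefficients each
```

A 1024-sample signal yields frames starting at 0, 128, ..., 768. That gives seven frames, each reduced to 12 coefficients.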

Page 5

Mel Frequency Bank
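The filter-bank figure on this slide did not survive extraction. Below is one common way to write a mel-spaced triangular filter bank in the notation of Page 4. It is an assumption, because the project's exact mel formula and filter normalization are not stated. The edges ω_{l−1}, ω_l, ω_{l+1} are equally spaced on the mel scale, and ω_k are the DFT bin frequencies.

```latex
\mathrm{mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)

V_l(\omega) =
\begin{cases}
  \dfrac{\omega - \omega_{l-1}}{\omega_l - \omega_{l-1}}, & \omega_{l-1} \le \omega \le \omega_l \\[4pt]
  \dfrac{\omega_{l+1} - \omega}{\omega_{l+1} - \omega_l}, & \omega_l < \omega \le \omega_{l+1} \\[4pt]
  0, & \text{otherwise}
\end{cases}

E_{mel}(n, l) = \sum_{k} V_l(\omega_k) \, \lvert X(n, \omega_k) \rvert
```

Using |X| rather than |X|² follows the slide's "absolute magnitude" note. Any per-filter normalization factor is omitted here.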

Page 6

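System 1: Minimum-Distance

• Average of mel-cepstral features for test and training data

$$C^{ts}_{mel}[n] = \frac{1}{M} \sum_{m=1}^{M} C^{ts}_{mel}[mL, n]$$

$$C^{tr}_{mel}[n] = \frac{1}{M} \sum_{m=1}^{M} C^{tr}_{mel}[mL, n]$$

where C_mel[mL, n] is the n-th mel-cepstral coefficient of frame m, L is the frame increment (128 samples), and M is the number of frames.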
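The frame averaging above is a per-coefficient mean over the M frames. Here is a sketch, assuming each frame is a list of cepstral coefficients such as the output of an MFCC front end. `average_cepstrum` is a hypothetical name.

```python
def average_cepstrum(frames):
    """Per-coefficient mean over M frames: C_mel[n] = (1/M) * sum_m C_mel[mL, n]."""
    if not frames:
        raise ValueError("need at least one frame")
    M = len(frames)
    return [sum(frame[n] for frame in frames) / M for n in range(len(frames[0]))]


# Two frames of two coefficients each
print(average_cepstrum([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

The same function serves both the test and the training data. Each utterance collapses to a single vector, so all frame-to-frame variation is discarded.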

Page 7

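Minimum-Distance Classifier

• Mean-squared difference between the average testing and training feature vectors

$$D = \frac{1}{R-1} \sum_{n=1}^{R-1} \left( C^{ts}_{mel}[n] - C^{tr}_{mel}[n] \right)^2$$

where the sum runs over the R − 1 = 12 retained cepstral coefficients (the 0th-order coefficient is skipped).

If D < T, then the speaker is present (the claim is accepted).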
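The distance and the threshold rule translate directly into code. In this sketch the function names are hypothetical. Each input holds only the retained coefficients c[1]..c[R−1], so Python index 0 corresponds to n = 1.

```python
import math


def min_distance(c_ts, c_tr):
    """D = 1/(R-1) * sum_{n=1}^{R-1} (C_ts[n] - C_tr[n])**2.

    Both lists hold only the retained coefficients c[1]..c[R-1],
    so Python index 0 corresponds to n = 1.
    """
    if len(c_ts) != len(c_tr):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(c_ts, c_tr)) / len(c_ts)


def speaker_present(c_ts, c_tr, T):
    """Decision rule from the slide: accept the claimed speaker if D < T."""
    return min_distance(c_ts, c_tr) < T


d = min_distance([0.0, 0.0], [0.3, 0.4])
print(math.isclose(d, 0.125), speaker_present([0.0, 0.0], [0.3, 0.4], 0.12))  # True False
```

With T = 0.12, a pair at distance 0.125 is rejected. Raising T to 0.13 would accept it.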

Page 8

System 2: Gaussian Mixture Model

Multivariate Normal Distribution
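The slide's equation did not survive extraction. Below is the standard multivariate normal density for a d-dimensional feature vector; d = 12 for the MFCCs here, and the symbol d avoids a clash with the distance D. Its log form for a diagonal covariance is also given. The diagonal form is a common simplification in GMM speaker models, but it is an assumption here.

```latex
\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{d/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top}
    \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)

% Diagonal covariance, \Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2):
\log \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = -\tfrac{1}{2} \sum_{j=1}^{d} \left[ \log\!\left(2\pi\sigma_j^2\right)
    + \frac{(x_j - \mu_j)^2}{\sigma_j^2} \right]
```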

Page 9

Gaussian Mixture Model
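The GMM slide's formula is also missing. The usual formulation is consistent with the parameter set λ = {p_i, μ_i, Σ_i} on Page 10. It gives a K-component mixture density and the log-likelihood of M feature vectors X = {x_1, ..., x_M}. The second line assumes the frames are independent, which is the standard simplification rather than something the slides state.

```latex
p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{K} p_i \,
  \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
  \qquad \sum_{i=1}^{K} p_i = 1, \quad p_i \ge 0

\log p(X \mid \lambda) = \sum_{m=1}^{M} \log p(\mathbf{x}_m \mid \lambda),
  \qquad X = \{\mathbf{x}_1, \dots, \mathbf{x}_M\}
```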

Page 10

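GMM Speaker Recognition System

Figure: feature vectors are scored against a target model and imposter models (Imposter 1, Imposter 2), each a GMM λ = {p_i, μ_i, Σ_i}. The scores are combined at a summing node (+) into Λ(X), which is compared with a threshold θ.

Λ(X) ≥ θ, accept

Λ(X) < θ, reject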

Page 11

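Log-Likelihood Ratio

$$\Lambda(X) = \log p(X \mid \lambda_C) - \log p(X \mid \lambda_{\bar{C}})$$

Λ(X) ≥ θ, accept

Λ(X) < θ, reject

$$\frac{P(\lambda_C \mid X)}{P(\lambda_{\bar{C}} \mid X)} = \frac{p(X \mid \lambda_C)\,P(\lambda_C)/P(X)}{p(X \mid \lambda_{\bar{C}})\,P(\lambda_{\bar{C}})/P(X)}$$

Here $\lambda_C$ is the claimed (target) speaker's model and $\lambda_{\bar{C}}$ is the background (imposter) model. Because P(X) cancels, the posterior ratio equals the likelihood ratio scaled by the prior ratio $P(\lambda_C)/P(\lambda_{\bar{C}})$. Taking logs gives Λ(X) plus a constant that can be absorbed into θ.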
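Below is a minimal sketch of the log-likelihood ratio test with diagonal-covariance GMMs. It assumes the models are already trained (training, for example by EM, is not shown). Each model is represented as a hypothetical `(weights, means, variances)` tuple, and all function names are hypothetical. The slides do not specify how the background model is built from the imposter speakers, so here it is simply a second GMM.

```python
import math


def log_gaussian_diag(x, mu, var):
    """log N(x; mu, diag(var)) for a single feature vector."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xj - mj) ** 2 / v
                      for xj, mj, v in zip(x, mu, var))


def gmm_log_likelihood(X, weights, means, variances):
    """log p(X | lambda), summing log p(x_m | lambda) over frames (independence assumed)."""
    total = 0.0
    for x in X:
        comps = [math.log(p) + log_gaussian_diag(x, mu, var)
                 for p, mu, var in zip(weights, means, variances)]
        top = max(comps)
        # log-sum-exp over mixture components avoids underflow of tiny likelihoods
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total


def llr_verify(X, target, background, theta):
    """Lambda(X) = log p(X | lambda_C) - log p(X | lambda_Cbar); accept if >= theta."""
    llr = gmm_log_likelihood(X, *target) - gmm_log_likelihood(X, *background)
    return llr, llr >= theta


# Toy 1-D, single-component models: (weights, means, variances)
target = ([1.0], [[0.0]], [[1.0]])
background = ([1.0], [[3.0]], [[1.0]])
llr, accept = llr_verify([[0.0]], target, background, theta=0.0)
print(round(llr, 6), accept)  # 4.5 True
```

Take one frame at 0.0, a unit-variance target centered at 0.0, and a background centered at 3.0. Then Λ(X) = 9/2 = 4.5, so the claim is accepted for θ = 0.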

Page 12

Experiments

• 8 speakers (4 male, 4 female)
• 2 sentences each
  – “Don’t ask me to carry an oily rag like that”
  – “She had your dark suit in greasy wash water all year”
• “Rag” used for training, “suit” for testing

Page 13

Results

Minimum-distance D between each training speaker (rows) and test speaker (columns):

        Test1   Test2   Test3   Test4   Test5   Test6   Test7   Test8
Train1  0.1192  0.1945  0.2151  0.2184  0.5364  0.3823  0.4963  0.4538
Train2  0.0724  0.0378  0.0406  0.0783  0.4035  0.3177  0.4125  0.3986
Train3  0.1672  0.1311  0.1042  0.0969  0.3382  0.2121  0.3597  0.2847
Train4  0.1482  0.1412  0.1363  0.0817  0.3268  0.3211  0.3154  0.3282
Train5  0.1882  0.1928  0.2237  0.1466  0.1044  0.0709  0.1382  0.1299
Train6  0.3012  0.3521  0.3208  0.3112  0.3023  0.0958  0.2755  0.2094
Train7  0.2743  0.2973  0.3252  0.2517  0.1618  0.1318  0.0724  0.1427
Train8  0.3589  0.3600  0.3381  0.2186  0.1976  0.1133  0.2487  0.0585

Page 15

Threshold = 0.12, Accuracy = 91%

Page 16

Threshold = 0.11, Accuracy = 91%
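The reported accuracies can be checked against the distance table on Page 13. Each of the 64 train/test pairs counts as one trial, diagonal entries are same-speaker trials, and a trial is accepted when D < T. This counting scheme is an assumption, but it reproduces the slides' figure. The hypothetical `evaluate` helper below also makes the false-positive / false-negative tradeoff explicit.

```python
# Minimum-distance D values from the results table (rows: Train1..Train8, columns: Test1..Test8)
D = [
    [0.1192, 0.1945, 0.2151, 0.2184, 0.5364, 0.3823, 0.4963, 0.4538],
    [0.0724, 0.0378, 0.0406, 0.0783, 0.4035, 0.3177, 0.4125, 0.3986],
    [0.1672, 0.1311, 0.1042, 0.0969, 0.3382, 0.2121, 0.3597, 0.2847],
    [0.1482, 0.1412, 0.1363, 0.0817, 0.3268, 0.3211, 0.3154, 0.3282],
    [0.1882, 0.1928, 0.2237, 0.1466, 0.1044, 0.0709, 0.1382, 0.1299],
    [0.3012, 0.3521, 0.3208, 0.3112, 0.3023, 0.0958, 0.2755, 0.2094],
    [0.2743, 0.2973, 0.3252, 0.2517, 0.1618, 0.1318, 0.0724, 0.1427],
    [0.3589, 0.3600, 0.3381, 0.2186, 0.1976, 0.1133, 0.2487, 0.0585],
]


def evaluate(D, T):
    """Return (accuracy, false accepts, false rejects) when a trial is accepted iff D < T."""
    correct = fa = fr = 0
    for i, row in enumerate(D):
        for j, d in enumerate(row):
            accepted = d < T
            same_speaker = i == j
            if accepted == same_speaker:
                correct += 1   # true accept or true reject
            elif accepted:
                fa += 1        # imposter accepted (false positive)
            else:
                fr += 1        # true speaker rejected (false negative)
    return correct / (len(D) * len(D[0])), fa, fr


for T in (0.11, 0.12):
    acc, fa, fr = evaluate(D, T)
    print(f"T={T}: accuracy={acc:.1%}, false accepts={fa}, false rejects={fr}")
```

At both thresholds, 58 of 64 trials are correct (90.6%, which the slides round to 91%). At T = 0.12 there are 6 false accepts and no false rejects. At T = 0.11 there are 5 false accepts and 1 false reject (Train1 vs. Test1 at 0.1192). Lowering T therefore trades one false positive for one false negative.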

Page 17

Conclusions

• Accuracy (91%) isn’t terrible, but there is room to improve
• Threshold tradeoff
  – false negatives vs. false positives

• DON’T use a Minimum-Distance classifier for text-independent authentication systems

Page 18

Future Work

• Implement LLR Classifier using GMM library
• Repeat experiment with GMM-based system
• Compare Min-Distance and GMM results