Speaker Recognition Using Gaussian Mixture Model


DESCRIPTION

This presentation introduces the Gaussian mixture model (GMM) and its application to speaker identification.

TRANSCRIPT

Page 1: Speaker Recognition using Gaussian Mixture Model


GMM: Gaussian Mixture Models


Saurab Dulal

IOE, Pulchowk Campus

Page 2: Speaker Recognition using Gaussian Mixture Model

Introduction to GMM

• Gaussian: "A Gaussian is a characteristic symmetric 'bell curve' shape that quickly falls off towards 0 (practically)."

• Mixture model: "A mixture model is a probabilistic model which assumes the underlying data belong to a mixture distribution."

Page 3: Speaker Recognition using Gaussian Mixture Model

Introduction to GMM

• Mathematical description of a GMM:

p(x) = w1 p1(x) + w2 p2(x) + w3 p3(x) + … + wn pn(x)

where p(x) is the mixture density, w1, w2, …, wn are the mixture weights (mixture coefficients), and pi(x) are the component density functions.

Fig.: Image showing the best-fit Gaussian curve.

Page 4: Speaker Recognition using Gaussian Mixture Model

Introduction to GMM

"The most common mixture distribution is the Gaussian (normal) density function, in which each of the mixture components is a Gaussian distribution with its own mean and variance parameters."

p(x) = w1 N(x | µ1, Σ1) + w2 N(x | µ2, Σ2) + … + wn N(x | µn, Σn)

where the µi are the means and the Σi are the covariance matrices of the individual component probability density functions.

Fig.: Five Gaussian components G1, …, G5 with mixture weights w1, …, w5.
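As a quick illustration of the formula above (not code from the slides), the following sketch evaluates a Gaussian mixture density with SciPy; the number of components, the weights, the means, and the covariance matrices are made-up illustrative values.

# Minimal sketch of p(x) = sum_i w_i N(x | mu_i, Sigma_i); parameters are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.5, 0.3, 0.2])                       # w_i, must sum to 1
means = [np.array([0.0, 0.0]),
         np.array([3.0, 1.0]),
         np.array([-2.0, 2.0])]                           # mu_i
covs = [np.eye(2), np.diag([2.0, 0.5]), np.eye(2) * 0.3]  # Sigma_i

def mixture_pdf(x):
    """p(x) as the weighted sum of the component Gaussian densities."""
    return sum(w * multivariate_normal(mean=m, cov=c).pdf(x)
               for w, m, c in zip(weights, means, covs))

x = np.array([1.0, 0.5])
print(mixture_pdf(x))   # density of the mixture at x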

Page 5: Speaker Recognition using Gaussian Mixture Model

Fig.: Two Gaussian component densities (Component 1, Component 2) and the resulting mixture model, plotted as p(x) versus x.

Page 6: Speaker Recognition using Gaussian Mixture Model

Fig.: Two Gaussian component densities (Component 1, Component 2) and the resulting mixture model, plotted as p(x) versus x.

Page 7: Speaker Recognition using Gaussian Mixture Model

Fig.: Individual component models and the resulting mixture model, plotted as p(x) versus x.

Page 8: Speaker Recognition using Gaussian Mixture Model

GMM for Speaker Recognition

Motivation
• The interpretation that the Gaussian components represent some general speaker-dependent spectral shapes.
• The capability of Gaussian mixtures to model arbitrary densities.

Page 9: Speaker Recognition using Gaussian Mixture Model

Description of Speaker Recognition (SR) Using GMM

• Speech analysis
• Model description
• Model interpretation
• Maximum-likelihood parameter estimation
• Speaker identification

Page 10: Speaker Recognition using Gaussian Mixture Model

Speech Analysis

• Linear predictive coding (LPC)
• Mel-scale filter bank (to reduce noise)

The analysis ends with the generation of cepstral coefficients x1′, x2′, x3′, …, xn′.

A cepstrum is the result of taking the inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal; in practice the inverse transform is usually realized as a cosine transform.
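As a rough illustration of this cepstrum definition (not the exact LPC/mel front end used in the slides), the sketch below computes real-cepstrum coefficients for one synthetic speech frame with NumPy; the sampling rate, window, and number of retained coefficients are assumed values.

# Minimal sketch: real cepstrum of one (synthetic) speech frame.
# cepstrum = IFT( log |FFT(frame)| ); the Hamming window and the small epsilon
# are common practical choices, not requirements from the slides.
import numpy as np

fs = 16000                                   # assumed sampling rate (Hz)
t = np.arange(0, 0.025, 1.0 / fs)            # one 25 ms frame
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 1200 * t)

windowed = frame * np.hamming(len(frame))
spectrum = np.abs(np.fft.rfft(windowed))     # estimated magnitude spectrum
log_spectrum = np.log(spectrum + 1e-10)      # avoid log(0)
cepstrum = np.fft.irfft(log_spectrum)        # inverse FT of the log spectrum

coeffs = cepstrum[1:13]                      # keep a dozen coefficients (x1', ..., x12')
print(coeffs)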

Page 11: Speaker Recognition using Gaussian Mixture Model

Model Description

Gaussian mixture density:

p(x | λ) = Σ_{i=1}^{M} p_i b_i(x)

where x is a D-dimensional random vector, b_i(x), i = 1, …, M, are the component densities, and p_i, i = 1, …, M, are the mixture weights.

Each component density is a D-variate Gaussian:

b_i(x) = (1 / ((2π)^{D/2} |Σ_i|^{1/2})) exp( −(1/2) (x − µ_i)′ Σ_i^{−1} (x − µ_i) )

with mean vector µ_i and covariance matrix Σ_i. The speaker model is the parameter set

λ = { p_i, µ_i, Σ_i },  i = 1, …, M.

The covariance matrices may be nodal, grand, or global; nodal diagonal covariances are used here.
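Spelling the two formulas out explicitly (rather than calling a library routine), here is a minimal sketch of b_i(x) with diagonal "nodal" covariances and of p(x | λ) as their weighted sum; the parameter values are illustrative placeholders, not a trained speaker model.

# Minimal sketch of b_i(x) with diagonal covariance and p(x | lambda) = sum_i p_i b_i(x).
import numpy as np

def component_density(x, mu, var):
    """b_i(x) for a D-variate Gaussian with diagonal covariance diag(var)."""
    D = x.shape[0]
    norm = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.prod(var)))
    return norm * np.exp(-0.5 * np.sum((x - mu) ** 2 / var))

def mixture_density(x, weights, mus, vars_):
    """p(x | lambda) as the weighted sum of the M component densities."""
    return sum(p * component_density(x, mu, var)
               for p, mu, var in zip(weights, mus, vars_))

# lambda = {p_i, mu_i, Sigma_i}, here with M = 2 components in D = 3 dimensions
weights = np.array([0.6, 0.4])
mus = [np.array([0.0, 1.0, -1.0]), np.array([2.0, 0.0, 0.5])]
vars_ = [np.array([1.0, 0.5, 2.0]), np.array([0.3, 1.5, 1.0])]

x = np.array([0.5, 0.8, -0.2])
print(mixture_density(x, weights, mus, vars_))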

Page 12: Speaker Recognition using Gaussian Mixture Model

Choice of Covariance Matrix

• Nodal covariance: one covariance matrix per Gaussian component.
• Grand covariance: one covariance matrix shared by all Gaussian components of a speaker model.
• Global covariance: a single covariance matrix shared by all speaker models.
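For context (an aside, not from the slides): scikit-learn's GaussianMixture exposes a similar choice via its covariance_type parameter. Roughly, a per-component covariance ("full" or "diag") corresponds to the nodal case and a "tied" covariance to the grand case; a global covariance shared across all speaker models has no built-in equivalent.

# Rough correspondence between the slide's covariance choices and
# scikit-learn's GaussianMixture covariance_type (illustrative only).
from sklearn.mixture import GaussianMixture

nodal_full = GaussianMixture(n_components=8, covariance_type="full")  # one full matrix per component
nodal_diag = GaussianMixture(n_components=8, covariance_type="diag")  # nodal diagonal, as used here
grand = GaussianMixture(n_components=8, covariance_type="tied")       # one matrix shared by all components
# A "global" covariance shared across all speaker models would have to be
# enforced manually; there is no built-in option for it.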

Page 13: Speaker Recognition using Gaussian Mixture Model

Model Interpretation

• Intuitive notion: acoustic classes (vowels, nasals, fricatives) reflect general speaker-dependent vocal-tract configurations that are useful for characterizing speaker identity.

• GMMs have the ability to form smooth approximations to arbitrarily shaped densities.

• They provide not only a smooth approximation but also capture the multimodal nature of the densities.

Page 14: Speaker Recognition using Gaussian Mixture Model

ML Parameter Estimation

Steps:
1. Begin with an initial model λ.
2. Estimate a new model λ̄ such that p(X | λ̄) ≥ p(X | λ), i.e. the likelihood of the training data under the new mixture density does not decrease (maximum-likelihood / EM estimation).
3. Repeat step 2 until a convergence threshold is reached.

Page 15: Speaker Recognition using Gaussian Mixture Model

On each iteration the model parameters are re-estimated as follows.

Mixture weights:
p̄_i = (1/T) Σ_{t=1}^{T} p(i | x_t, λ)

Means:
µ̄_i = ( Σ_{t=1}^{T} p(i | x_t, λ) x_t ) / ( Σ_{t=1}^{T} p(i | x_t, λ) )

Variances:
σ̄_i² = ( Σ_{t=1}^{T} p(i | x_t, λ) x_t² ) / ( Σ_{t=1}^{T} p(i | x_t, λ) ) − µ̄_i²

The a posteriori probability of component i is the ratio of the weighted component density to the mixture density:

p(i | x_t, λ) = p_i b_i(x_t) / Σ_{k=1}^{M} p_k b_k(x_t)

Here σ_i², x_t and µ_i refer to arbitrary elements of the corresponding vectors, so the mean and variance updates apply element-wise (diagonal covariances).
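These re-estimation formulas translate almost line for line into code. Below is a minimal NumPy sketch of the EM loop for a diagonal-covariance GMM; the initialization, variance floor, and stopping threshold are illustrative choices rather than values taken from the slides.

# Minimal EM sketch for a diagonal-covariance GMM, following the
# re-estimation formulas above.
import numpy as np

def em_gmm(X, M=4, n_iter=50, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    T, D = X.shape
    # crude initialization: random frames as means, global variance, uniform weights
    weights = np.full(M, 1.0 / M)
    means = X[rng.choice(T, M, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (M, 1))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: p(i | x_t, lambda) = p_i b_i(x_t) / sum_k p_k b_k(x_t)
        log_b = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                        + (X[:, None, :] - means[None, :, :]) ** 2
                        / variances[None, :, :]).sum(axis=2)          # (T, M)
        log_w = np.log(weights)[None, :]
        log_mix = np.logaddexp.reduce(log_w + log_b, axis=1)          # log p(x_t | lambda)
        post = np.exp(log_w + log_b - log_mix[:, None])               # (T, M)
        # M-step: re-estimation formulas
        Ni = post.sum(axis=0)                                         # sum_t p(i | x_t, lambda)
        weights = Ni / T
        means = (post.T @ X) / Ni[:, None]
        variances = (post.T @ (X ** 2)) / Ni[:, None] - means ** 2
        variances = np.maximum(variances, 1e-6)                       # variance floor
        ll = log_mix.sum()                                            # total log-likelihood
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances

# Example with synthetic 2-D "feature vectors"
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 0.5, (200, 2))])
w, mu, var = em_gmm(X, M=2)
print(w, mu, var, sep="\n")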

Page 16: Speaker Recognition using Gaussian Mixture Model

Fig.: "Anemia patients and controls": scatter plot of red blood cell hemoglobin concentration versus red blood cell volume.

Page 17: Speaker Recognition using Gaussian Mixture Model

Fig.: EM iteration 1 of the Gaussian mixture fitted to the anemia data (red blood cell hemoglobin concentration versus red blood cell volume).

Page 18: Speaker Recognition using Gaussian Mixture Model

Fig.: EM iteration 3 of the fitted Gaussian mixture on the same data.

Page 19: Speaker Recognition using Gaussian Mixture Model

Fig.: EM iteration 5 of the fitted Gaussian mixture on the same data.

Page 20: Speaker Recognition using Gaussian Mixture Model

Fig.: EM iteration 10 of the fitted Gaussian mixture on the same data.

Page 21: Speaker Recognition using Gaussian Mixture Model

Fig.: EM iteration 15 of the fitted Gaussian mixture on the same data.

Page 22: Speaker Recognition using Gaussian Mixture Model

Fig.: EM iteration 25 of the fitted Gaussian mixture on the same data.

Page 23: Speaker Recognition using Gaussian Mixture Model

Fig.: Log-likelihood as a function of EM iterations.

Page 24: Speaker Recognition using Gaussian Mixture Model

Fig.: "Anemia data with labels": the anemia group and the control group plotted on the same axes.

Page 25: Speaker Recognition using Gaussian Mixture Model

Speaker Identification

A group of speakers S = {1, 2, …, S} is represented by the GMMs λ1, λ2, …, λS. The objective is to find the speaker model with the maximum a posteriori probability for a given observation sequence X = {x1, …, xT}:

Ŝ = argmax_{1≤k≤S} Pr(λ_k | X) = argmax_{1≤k≤S} p(X | λ_k) Pr(λ_k) / p(X)

Assuming equally likely speakers and noting that p(X) is the same for all models, this reduces to

Ŝ = argmax_{1≤k≤S} p(X | λ_k)

and, taking logarithms and assuming independence between observations,

Ŝ = argmax_{1≤k≤S} Σ_{t=1}^{T} log p(x_t | λ_k)

in which p(x_t | λ_k) = Σ_i p_i b_i(x_t), the Gaussian mixture density defined earlier.
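A minimal sketch of this decision rule, assuming scikit-learn's GaussianMixture as the speaker model and synthetic features standing in for real cepstral vectors: one GMM is trained per enrolled speaker, and a test utterance is assigned to the model with the largest total log-likelihood Σ_t log p(x_t | λ_k).

# Minimal speaker-identification sketch: pick argmax_k sum_t log p(x_t | lambda_k).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
D = 12                                   # feature dimension (e.g. cepstral order)

# Enrollment: one diagonal-covariance GMM per speaker, trained on that
# speaker's feature vectors (synthetic here).
train = {
    "speaker_A": rng.normal(0.0, 1.0, (500, D)),
    "speaker_B": rng.normal(1.5, 1.2, (500, D)),
}
models = {name: GaussianMixture(n_components=8, covariance_type="diag",
                                random_state=0).fit(feats)
          for name, feats in train.items()}

# Identification: total log-likelihood of the test frames under each model.
test = rng.normal(1.5, 1.2, (200, D))    # frames from an unknown speaker
scores = {name: gmm.score_samples(test).sum() for name, gmm in models.items()}
identified = max(scores, key=scores.get)
print(scores, "->", identified)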

Page 26: Speaker Recognition using Gaussian Mixture Model

References

• D. A. Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models", IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.
• http://en.wikipedia.org/wiki/Probability_density_function
• http://crsouza.blogspot.com/2010/10/gaussian-mixture-models-and-expectation.html
• https://www.ll.mit.edu/mission/communications/ist/publications/0802_Reynolds_Biometrics-GMM.pdf
• http://statweb.stanford.edu/~tibs/stat315a/LECTURES/em.pdf
• http://eprints.pascal-network.org/archive/00008291/01/SoftAssignReconstr_ICIP2011.pdf
• http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html