LECTURE 20: MLLR AND MAP – ANOTHER LOOK

TRANSCRIPT

Page 1: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example

ECE 8443 – Pattern Recognition
ECE 8423 – Adaptive Signal Processing

• Objectives: MLLR for Two Gaussians; Mean and Variance Adaptation; MATLAB Example

• Resources: JLG: MAP Adaptation; Wiki: MAP Estimation

• URL: .../publications/courses/ece_8423/lectures/current/lecture_20.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_20.mp3

LECTURE 20: MLLR AND MAP – ANOTHER LOOK

Page 2: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


MLLR Example (Revisited)

• Let us consider the problem of adapting the parameters associated with two Gaussians:

$$p_i(o) = \frac{1}{\sqrt{2\pi\sigma_i^2}}\,e^{-\frac{(o-\mu_i)^2}{2\sigma_i^2}}, \quad \text{for } i = 0, 1$$

• We choose to adapt the mean parameters using an affine transformation (essentially a regression model):

$$\hat{\mu}_i = w_{i,0} + w_{i,1}\,\mu_i = \mathbf{w}_i^t\,\boldsymbol{\xi}_i, \quad \text{where } \boldsymbol{\xi}_i = [1\;\;\mu_i]^t,\ \mathbf{w}_i = [w_{i,0}\;\;w_{i,1}]^t,\ i = 0, 1$$

• Since each is a one-dimensional Gaussian, the variance will simply be scaled:

$$\hat{\sigma}_i^2 = s_i\,\sigma_i^2$$

• Our goal is to estimate the transformation parameters:

$$\mathbf{W} = [\mathbf{w}_0\;\;\mathbf{w}_1] = \begin{bmatrix} w_{0,0} & w_{1,0} \\ w_{0,1} & w_{1,1} \end{bmatrix}, \qquad \mathbf{s} = \begin{bmatrix} s_0 \\ s_1 \end{bmatrix}$$

• We would like to do this using N points of training data (old data) and M points of adaptation data (new data), where typically N >> M.

• The solution we seek will essentially replace the mean of the old data with the mean of the new data, but it will do this through estimation of a transformation matrix rather than simply using the sample mean.
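To make the model concrete, here is a minimal MATLAB sketch (an editorial illustration, not from the lecture; the transform W and scales s are hypothetical placeholder values, since estimating them is the subject of the following slides) showing how the adaptation acts on the two Gaussians:

% Sketch of the adaptation model with hypothetical numbers.
mu     = [2.0 2.5];            % old means mu_0, mu_1 (MATLAB index i+1)
sigma2 = [1.0 1.5];            % old variances
W      = [0.5 0.3; 1.0 1.0];   % hypothetical transform; column i+1 is w_i
s      = [1.2 0.9];            % hypothetical variance scale factors
mu_hat = zeros(1,2); sigma2_hat = zeros(1,2);
for i = 1:2
  mu_hat(i)     = [1 mu(i)] * W(:,i);  % mu_hat_i = w_{i,0} + w_{i,1}*mu_i
  sigma2_hat(i) = s(i) * sigma2(i);    % sigma_hat_i^2 = s_i * sigma_i^2
end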

Page 3: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Establishing an Auxiliary Function

• The latter approach (reestimation using the sample mean), as we will soon see, can be regarded as a variant of a maximum a posteriori estimate. The drawback of this approach is two-fold:

Reestimation using the sample mean requires the state or event associated with that Gaussian to be observed. In many applications, only a small percentage of the events associated with specific Gaussians are observed in a small adaptation set. The transformation matrix approach allows us to share transformation matrices across states, and hence to adapt the parameters associated with unseen events or states.

Through transformation sharing, we can intelligently cluster and share parameters, based on the overall likelihood of the data given the model, and do this in a data-driven manner. This is extremely powerful and gives much better performance than static or a priori clustering.

• Assume a series of T observations, $\mathbf{o} = \{o_1, o_2, \ldots, o_T\}$, where $o_t$ is simply a scalar value that could have been generated from one of the two Gaussians.

• Define an auxiliary Q-function for parameter reestimation, based on the EM theorem, such that its maximization will produce an ML estimate:

$$Q(\lambda, \hat{\lambda}) = \sum_{i=1}^{2}\sum_{t=1}^{T} P(o_t, s_t = i \mid \lambda)\,\log P(o_t, s_t = i \mid \hat{\lambda})$$

where $s_t = i$ denotes "state $i$ at time $t$."

Page 4: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Establishing an Auxiliary Function

• This auxiliary function is simply the total likelihood. There are T observations, each of which could have been produced by either Gaussian. Hence, there are four possible permutations, and this function sums the "entropy" contribution of each outcome.

• However, note that we use both the current model, $\lambda$, and the new, or adapted, model, $\hat{\lambda}$, in this calculation, which makes it resemble a divergence or cross-entropy. This is crucial because we want the new model to be more probable given the data than the old model.

• Recall two simple but important properties of conditional probabilities:

$$P(A \mid B) = \frac{P(A, B)}{P(B)}, \qquad P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

The latter is known as Bayes' Rule. Conditioning both sides on a third event gives the same identity in conditional form:

$$P(A \mid B, C) = \frac{P(A, B \mid C)}{P(B \mid C)}$$

• Applying these to the likelihood in our auxiliary function:

$$\gamma_i(t) \equiv P(s_t = i \mid o_t, \lambda) = \frac{P(o_t, s_t = i \mid \lambda)}{P(o_t \mid \lambda)}$$

the probability that the $i$th Gaussian occurred at time $t$ (given $o_t$ and $\lambda$).

• Also, we can expand the log:

$$\log P(o_t, s_t = i \mid \hat{\lambda}) = \log\!\left[\frac{1}{\sqrt{2\pi\hat{\sigma}_i^2}}\,e^{-\frac{(o_t - \hat{\mu}_i)^2}{2\hat{\sigma}_i^2}}\right] = \text{const} - \log\hat{\sigma}_i - \frac{(o_t - \hat{\mu}_i)^2}{2\hat{\sigma}_i^2}$$
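The occupancy probabilities $\gamma_i(t)$ can be computed directly from the model. A minimal MATLAB sketch (an editorial illustration; the variable names are mine, and equal state priors $P(s_t = i) = 1/2$ are assumed):

% Compute gamma_i(t) = P(s_t = i | o_t, lambda) for two 1-D Gaussians.
mu = [2.0 2.5]; sigma2 = [1.0 1.5];   % hypothetical model parameters
o  = [1.8 2.2 3.1];                   % hypothetical observations
gamma = zeros(2, length(o));
for i = 1:2
  % Joint likelihood P(o_t, s_t = i | lambda), with P(s_t = i) = 1/2.
  gamma(i,:) = 0.5 * exp(-(o - mu(i)).^2 ./ (2*sigma2(i))) ./ sqrt(2*pi*sigma2(i));
end
gamma = gamma ./ (ones(2,1) * sum(gamma, 1));  % normalize by P(o_t | lambda)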

Page 5: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Minimizing the Auxiliary Function

• Therefore, we can write our auxiliary function as:

$$Q(\lambda, \hat{\lambda}) = \text{const} - \sum_{i=1}^{2}\sum_{t=1}^{T}\gamma_i(t)\,\log\hat{\sigma}_i - \sum_{i=1}^{2}\sum_{t=1}^{T}\gamma_i(t)\,\frac{(o_t - \hat{\mu}_i)^2}{2\hat{\sigma}_i^2}$$

• We can differentiate w.r.t. our transformation coefficients, $w_{i,j}$, and set the result to 0:

$$\frac{dQ}{dw_{i,j}} = \sum_{t=1}^{T}\gamma_i(t)\,\frac{(o_t - \hat{\mu}_i)}{\hat{\sigma}_i^2}\,\frac{d\hat{\mu}_i}{dw_{i,j}} = 0$$

Recall: $\hat{\mu}_i = w_{i,0} + w_{i,1}\,\mu_i$, so $\frac{d\hat{\mu}_i}{dw_{i,0}} = 1$ and $\frac{d\hat{\mu}_i}{dw_{i,1}} = \mu_i$.

Therefore: $\sum_{t=1}^{T}\gamma_i(t)\,(o_t - \hat{\mu}_i)\,\boldsymbol{\xi}_i = \mathbf{0}$

• Solve for the new mean:

$$\hat{\mu}_i = [1\;\;\mu_i]\begin{bmatrix} w_{i,0} \\ w_{i,1} \end{bmatrix} = \frac{\sum_{t=1}^{T}\gamma_i(t)\,o_t}{\sum_{t=1}^{T}\gamma_i(t)}$$

Page 6: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Mean Adaptation

• This is an underdetermined system of equations – one equation and two unknowns.

• Use a pseudoinverse solution, which adds the additional constraint that the parameter vector should have minimum norm. Using "+" to denote this solution:

$$\begin{bmatrix} w_{i,0} \\ w_{i,1} \end{bmatrix} = [1\;\;\mu_i]^{+}\,\frac{\sum_{t=1}^{T}\gamma_i(t)\,o_t}{\sum_{t=1}^{T}\gamma_i(t)}$$

• This pseudoinverse is found using singular value decomposition; among all solutions, it returns the weight vector that has minimum norm.

• Note that if each Gaussian is equally likely, the computation reduces to the sample mean.
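A minimal MATLAB sketch of this minimum-norm solution (my numbers are hypothetical): among all pairs $(w_{i,0}, w_{i,1})$ satisfying $w_{i,0} + w_{i,1}\mu_i = \text{target}$, pinv returns the pair with the smallest norm.

mu_i   = 2.0;                      % hypothetical old mean
target = 2.5;                      % occupancy-weighted mean of the adaptation data
w      = pinv([1 mu_i]) * target;  % w = [w_0; w_1] = [0.5; 1.0], minimum norm
mu_hat = [1 mu_i] * w;             % = 2.5, the desired adapted mean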

Page 7: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Variance Adaptation

• Recall our expression for the auxiliary function:

$$Q(\lambda, \hat{\lambda}) = \text{const} - \sum_{i=1}^{2}\sum_{t=1}^{T}\gamma_i(t)\,\log\hat{\sigma}_i - \sum_{i=1}^{2}\sum_{t=1}^{T}\gamma_i(t)\,\frac{(o_t - \hat{\mu}_i)^2}{2\hat{\sigma}_i^2}$$

• Recall also that:

$$\hat{\sigma}_i^2 = s_i\,\sigma_i^2$$

• We can differentiate Q with respect to $s_i$ (substituting $\hat{\sigma}_i^2 = s_i\sigma_i^2$):

$$\frac{dQ}{ds_i} = -\sum_{t=1}^{T}\gamma_i(t)\left[\frac{1}{2 s_i} - \frac{(o_t - \hat{\mu}_i)^2}{2 s_i^2\,\sigma_i^2}\right]$$

• Equating this to zero gives the ML solution:

$$s_i = \frac{1}{\sigma_i^2}\;\frac{\sum_{t=1}^{T}\gamma_i(t)\,(o_t - \hat{\mu}_i)^2}{\sum_{t=1}^{T}\gamma_i(t)}$$

• Hence:

$$\hat{\sigma}_i^2 = s_i\,\sigma_i^2 = \frac{\sum_{t=1}^{T}\gamma_i(t)\,(o_t - \hat{\mu}_i)^2}{\sum_{t=1}^{T}\gamma_i(t)}$$
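Continuing the earlier sketch (same hypothetical gamma, o, and mu_hat), the corresponding variance update in MATLAB:

i = 1;
sigma2_hat_i = sum(gamma(i,:) .* (o - mu_hat).^2) / sum(gamma(i,:));  % adapted variance
s_i          = sigma2_hat_i / sigma2(i);                              % scale factor s_i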

Page 8: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Example (MATLAB Code)

% Generate N points of training (old) data and M points of adaptation (new) data.
% (x_train itself is not used below; only its parameters are.)
clear;
N = 500; M = 200;
mu_train = 2.0; var_train = 1.0;
x_train = mu_train + randn(1,N)*sqrt(var_train);
mu_test = 2.5; var_test = 1.5;
x_test = mu_test + randn(1,M)*sqrt(var_test);

% Re-estimate the adapted mean and variance as each adaptation point arrives.
mean_hat = []; var_hat = [];
for i = 1:M
  m = mean(x_test(1:i));                % sample mean of the adaptation data so far
  w = pinv([1 mu_train])*m;             % minimum-norm transform weights
  mean_hat = [mean_hat [1 mu_train]*w];
  v = var(x_test(1:i));                 % sample variance of the adaptation data so far
  s = v/var_train;                      % variance scale factor
  var_hat = [var_hat s*var_train];
end;

% Plot convergence of the adapted mean (top) and variance (bottom).
subplot(211);
line([1,M],[mu_train, mu_train],'linestyle','-.'); hold on;
line([1,M],[m, m],'linestyle','-.','color','r');
plot(mean_hat);
text(-32, mu_train, 'train \mu'); text(-32, m, 'adapt \mu');
title('Mean Adaptation Convergence');
subplot(212);
line([1,M],[var_train, var_train],'linestyle','-.'); hold on;
line([1,M],[v, v],'linestyle','-.','color','r');
plot(var_hat);
text(-32, var_train, 'train \sigma^2'); text(-32, v, 'adapt \sigma^2');
title('Variance Adaptation Convergence');
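Note that this script exercises the degenerate case of a single Gaussian, for which $\gamma_i(t) = 1$ for every observation: the pinv-based update then reduces exactly to the running sample mean of the adaptation data, and the variance scale to the ratio of the adaptation sample variance to the training variance, which is why both curves converge to the adaptation statistics.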

Page 9: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Example (Results)
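[Figure: "Mean Adaptation Convergence" (top) and "Variance Adaptation Convergence" (bottom) plots produced by the script on the previous slide.]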

Page 10: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Maximum A Posteriori (MAP) Estimation

• Suppose we are given a set of M samples from a discrete random variable, x(n), for n = 0, 1, …, M−1, and we need to estimate a parameter, θ, from this data.

• We assume the parameter is characterized by a probability density function, f(θ), that is called the a priori density.

• The probability density function f(θ|x) is the probability density of θ conditioned on the observed data, x, and is referred to as the a posteriori density for θ.

• The Maximum A Posteriori (MAP) estimate of θ is the value of θ at which this density peaks.

• The maximum can generally be found by solving:

$$\frac{\partial f(\theta \mid \mathbf{x})}{\partial \theta} = 0$$

We refer to the value of θ that maximizes the posterior as $\theta_{MAP}$.

• We often maximize the log of the posterior instead:

$$\frac{\partial \ln f(\theta \mid \mathbf{x})}{\partial \theta} = 0$$

• Direct determination of f(θ|x) is difficult. However, we can use Bayes' rule to write:

$$f(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta)\,f(\theta)}{f(\mathbf{x})}$$

where f(θ) is the prior density of θ.

• Applying a logarithm, we can write the MAP solution as:

$$\frac{\partial}{\partial \theta}\left[\ln f(\mathbf{x} \mid \theta) + \ln f(\theta)\right] = 0$$

(the ln f(x) term drops out because it does not depend on θ).
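As a worked sketch of these equations (my example; the deck's own example on the next slide may differ), consider M i.i.d. Gaussian samples with a Gaussian prior on the mean:

$$x(n) \sim \mathcal{N}(\theta, \sigma^2), \qquad \theta \sim \mathcal{N}(\mu_\theta, \sigma_\theta^2)$$

$$\ln f(\mathbf{x} \mid \theta) + \ln f(\theta) = \text{const} - \sum_{n=0}^{M-1}\frac{(x(n)-\theta)^2}{2\sigma^2} - \frac{(\theta-\mu_\theta)^2}{2\sigma_\theta^2}$$

Setting the derivative with respect to θ to zero:

$$\sum_{n=0}^{M-1}\frac{x(n)-\theta}{\sigma^2} - \frac{\theta-\mu_\theta}{\sigma_\theta^2} = 0 \quad\Rightarrow\quad \theta_{MAP} = \frac{\sigma_\theta^2\sum_{n=0}^{M-1} x(n) + \sigma^2\,\mu_\theta}{M\,\sigma_\theta^2 + \sigma^2}$$

For large M this approaches the sample mean (the ML estimate); for small M it is pulled toward the prior mean $\mu_\theta$, which is exactly the behavior that makes MAP attractive for adaptation on small data sets, and is the sense in which sample-mean reestimation can be viewed as a variant of MAP.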

Page 11: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


MAP Estimation Example

Page 12: Objectives: MLLR for Two Gaussians, Mean and Variance Adaptation, MATLAB Example


Summary

• Demonstrated MLLR on a simple example involving mean and variance estimation for two Gaussians.

• Reviewed a MATLAB implementation and simulation results.