
Page 1: Machine Learning for Signal Processing: Linear Gaussian Models (mlsp.cs.cmu.edu/courses/fall2014/lectures/slides/old/...)

Machine Learning for Signal Processing

Linear Gaussian Models

Class 21. 12 Nov 2013

Instructor: Bhiksha Raj

12 Nov 2013 11755/18797 1

Page 2:

Administrivia

• HW3 is up


• Projects – please send us an update


Page 3:

Recap: MAP Estimators

• MAP (Maximum A Posteriori): Find a “best guess” for y (statistically), given known x

$$\hat{y} = \arg\max_y P(y \mid x)$$


Page 4:

Recap: MAP estimation

• x and y are jointly Gaussian


• z is Gaussian

$$z = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad \mu_z = E[z] = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \qquad C_{zz} = \mathrm{Var}(z) = \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix}$$

$$C_{xy} = E\big[(x - \mu_x)(y - \mu_y)^T\big]$$

$$P(z) = N(z;\ \mu_z, C_{zz}) = \frac{1}{\sqrt{(2\pi)^d\,|C_{zz}|}}\, \exp\!\big(-0.5\,(z - \mu_z)^T C_{zz}^{-1} (z - \mu_z)\big)$$

Page 5:

MAP estimation: Gaussian PDF


[Figure: joint Gaussian PDF of X and Y]

Page 6:

MAP estimation: The Gaussian at a particular value of X



Page 7:

Conditional Probability of y|x

• The conditional probability of y given x is also Gaussian

– The slice in the figure is Gaussian

• The mean of this Gaussian is a function of x

• The variance of y reduces if x is known

– Uncertainty is reduced


$$P(y|x) = N\!\left(y;\ \mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x),\ \ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}\right)$$

$$E[y|x] = \mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x)$$

$$\mathrm{Var}(y|x) = C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}$$
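These conditional-Gaussian formulas are easy to check numerically. The following is a small numpy sketch with made-up scalar parameters (all numbers are illustrative, not from the lecture):

```python
import numpy as np

# Illustrative joint-Gaussian parameters for scalar x and y (hypothetical values).
mu_x, mu_y = np.array([1.0]), np.array([2.0])
C_xx = np.array([[2.0]])
C_xy = np.array([[0.8]])
C_yy = np.array([[1.0]])

def conditional_gaussian(x0):
    """Return E[y|x0] and Var(y|x) for jointly Gaussian x and y."""
    C_yx = C_xy.T
    K = C_yx @ np.linalg.inv(C_xx)   # C_yx C_xx^-1, the "regression" matrix
    mean = mu_y + K @ (x0 - mu_x)    # mean depends on the observed x0
    var = C_yy - K @ C_xy            # conditional variance does not depend on x0
    return mean, var

mean, var = conditional_gaussian(np.array([3.0]))
# var is smaller than C_yy: knowing x reduces the uncertainty in y
```

Note that the conditional variance never exceeds the prior variance of y, matching the "uncertainty is reduced" bullet above.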

Page 8:


MAP estimation: The Gaussian at a particular value of X


[Figure: the Gaussian sliced at x = x0; the peak of the slice is the most likely value of y]

Page 9:

MAP Estimation of a Gaussian RV


$$\hat{y} = \arg\max_y P(y|x) = E[y|x]$$

Page 10:

It's also a minimum mean-squared error estimate

• Minimize error:

• Differentiating and equating to 0:


$$Err = E\big[(y - \hat{y})^T (y - \hat{y}) \mid x\big] = E\big[y^T y - 2\hat{y}^T y + \hat{y}^T \hat{y} \mid x\big] = E[y^T y \mid x] - 2\hat{y}^T E[y \mid x] + \hat{y}^T \hat{y}$$

$$\frac{d\,Err}{d\hat{y}} = -2E[y \mid x] + 2\hat{y} = 0 \;\;\Rightarrow\;\; \hat{y} = E[y \mid x]$$

The MMSE estimate is the mean of the distribution
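A quick Monte Carlo sanity check of this result (not from the slides): for sampled values of y, no constant estimate beats the sample mean in average squared error, since mse(c) = mse(mean) + (mean - c)^2.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=100_000)  # draws of y (for some fixed x)

def mse(y_hat):
    """Average squared error of a constant estimate y_hat."""
    return np.mean((y - y_hat) ** 2)

# The sample mean achieves the smallest average squared error.
best = y.mean()
for c in [0.0, 2.0, 2.9, 3.1, 4.0]:
    assert mse(best) <= mse(c)
```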

Page 11:

For the Gaussian: MAP = MMSE


The most likely value is also the mean value

This would be true of any symmetric, unimodal distribution

Page 12:

MMSE estimates for mixture distributions


Let P(y|x) be a mixture density

The MMSE estimate of y is given by

Just a weighted combination of the MMSE estimates from the component distributions

$$P(y|x) = \sum_k P(k|x)\, P(y|x,k)$$

$$E[y|x] = \int y\, P(y|x)\, dy = \sum_k P(k|x) \int y\, P(y|x,k)\, dy = \sum_k P(k|x)\, E[y|x,k]$$

Page 13:

MMSE estimates from a Gaussian mixture


P(y|x) is also a Gaussian mixture

Let P(x,y) be a Gaussian Mixture

$$P(x, y) = P(z) = \sum_k P(k)\, N(z;\ \mu_{z,k},\ C_{zz,k}), \qquad z = \begin{bmatrix} x \\ y \end{bmatrix}$$

$$P(y|x) = \frac{P(x,y)}{P(x)} = \frac{\sum_k P(x, y, k)}{P(x)} = \frac{\sum_k P(k)\, P(x|k)\, P(y|x,k)}{P(x)} = \sum_k P(k|x)\, P(y|x,k)$$

Page 14:

MMSE estimates from a Gaussian mixture


P(y|x) is a Gaussian mixture

$$P(y|x) = \sum_k P(k|x)\, P(y|x,k)$$

For each component k, with mean blocks $\mu_{x,k}, \mu_{y,k}$ and covariance blocks $C_{xx,k}, C_{xy,k}, C_{yx,k}, C_{yy,k}$:

$$P(y|x,k) = N\!\left(y;\ \mu_{y,k} + C_{yx,k} C_{xx,k}^{-1}(x - \mu_{x,k}),\ \ C_{yy,k} - C_{yx,k} C_{xx,k}^{-1} C_{xy,k}\right)$$

$$P(y|x) = \sum_k P(k|x)\, N\!\left(y;\ \mu_{y,k} + C_{yx,k} C_{xx,k}^{-1}(x - \mu_{x,k}),\ \ C_{yy,k} - C_{yx,k} C_{xx,k}^{-1} C_{xy,k}\right)$$

Page 15:

MMSE estimates from a Gaussian mixture


E[y|x] is also a mixture

P(y|x) is a mixture Gaussian density

$$E[y|x] = \sum_k P(k|x)\, E[y|x,k] = \sum_k P(k|x)\left(\mu_{y,k} + C_{yx,k} C_{xx,k}^{-1}(x - \mu_{x,k})\right)$$

Page 16:

MMSE estimates from a Gaussian mixture


Weighted combination of MMSE estimates obtained from individual Gaussians!

Weight P(k|x) is easily computed too..

$$E[y|x] = \sum_k P(k|x)\left(\mu_{y,k} + C_{yx,k} C_{xx,k}^{-1}(x - \mu_{x,k})\right)$$

$$P(k|x) = \frac{P(k)\, P(x|k)}{P(x)} = \frac{P(k)\, N(x;\ \mu_{x,k},\ C_{xx,k})}{\sum_{k'} P(k')\, N(x;\ \mu_{x,k'},\ C_{xx,k'})}$$
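Putting the last few slides together, here is a minimal numpy sketch of the GMM-based MMSE estimator, using a hypothetical two-component mixture over scalar x and y (all parameter values are made up for illustration):

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density, vectorized over components."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Hypothetical 2-component joint GMM over scalar x and y (illustrative numbers).
priors = np.array([0.5, 0.5])
mu_x = np.array([0.0, 4.0]);  mu_y = np.array([1.0, -1.0])
C_xx = np.array([1.0, 1.0]);  C_yx = np.array([0.5, -0.5])

def mmse_estimate(x0):
    """E[y|x0] = sum_k P(k|x0) (mu_yk + C_yxk C_xxk^-1 (x0 - mu_xk))."""
    lik = priors * gauss_pdf(x0, mu_x, C_xx)     # P(k) N(x0; mu_xk, C_xxk)
    post = lik / lik.sum()                       # posterior weights P(k | x0)
    comp = mu_y + (C_yx / C_xx) * (x0 - mu_x)    # per-component MMSE estimates
    return post @ comp                           # weighted combination

y_hat = mmse_estimate(0.0)
```

The weights are exactly the posterior component probabilities computed from the x-marginal, as on the slide above.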

Page 17:

MMSE estimates from a Gaussian mixture


A mixture of estimates from individual Gaussians

Page 18:

Voice Morphing

• Align training recordings from both speakers

– Cepstral vector sequence

• Learn a GMM on joint vectors

• Given speech from one speaker, find MMSE estimate of the other

• Synthesize from cepstra


Page 19:

MMSE with GMM: Voice Transformation

- Festvox GMM transformation suite (Toda)

[Table: 4 x 4 grid of conversions among speakers awb, bdl, jmk, slt (audio demos)]


Page 20:

MAP / ML / MMSE

• General statistical estimators

• All used to predict a variable, based on other parameters related to it..

• Most common assumption: Data are Gaussian, all RVs are Gaussian

– Other probability densities may also be used..

• For Gaussians relationships are linear as we saw..


Page 21:

Gaussians and more Gaussians..

• Linear Gaussian Models..

• But first a recap


Page 22:

A Brief Recap

• Principal component analysis: Find the K bases that

best explain the given data

• Find B and C such that the difference between D and

BC is minimum

– While constraining that the columns of B are orthonormal


$$D \approx BC$$

(D: data matrix; B: matrix of K bases; C: weight matrix)
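A minimal sketch of this factorization with numpy (toy data, not from the course): by the Eckart-Young theorem, the truncated SVD gives the rank-K factorization D ~ BC with the smallest squared error, and its left singular vectors supply the orthonormal columns of B.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(10, 200))    # toy data matrix: 10-dim data, 200 points
K = 3

# Truncated SVD: top-K left singular vectors are the orthonormal bases B.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
B = U[:, :K]                      # D x K, orthonormal columns
C = np.diag(s[:K]) @ Vt[:K]       # K x N weight matrix

assert np.allclose(B.T @ B, np.eye(K))       # orthonormality constraint holds
err = np.linalg.norm(D - B @ C) ** 2         # minimal squared error
```

The residual squared error equals the energy in the discarded singular values, which is why keeping the top-K directions is optimal.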

Page 23:

Remember Eigenfaces

• Approximate every face f as $f = w_{f,1} V_1 + w_{f,2} V_2 + w_{f,3} V_3 + \dots + w_{f,k} V_k$

• Estimate V to minimize the squared error

• Error is unexplained by V1.. Vk

• Error is orthogonal to Eigenfaces


Page 24:

Karhunen-Loève vs. PCA

• Eigenvectors of the correlation matrix:

– Principal directions of the tightest ellipse centered on the origin

– Directions that retain maximum energy

Page 25:

Karhunen-Loève vs. PCA

• Eigenvectors of the correlation matrix:

– Principal directions of the tightest ellipse centered on the origin

– Directions that retain maximum energy

• Eigenvectors of the covariance matrix:

– Principal directions of the tightest ellipse centered on the data

– Directions that retain maximum variance


Page 28:

Karhunen Loeve vs. PCA

• If the data are naturally centered at origin, KLT == PCA

• Following slides refer to PCA!

– Assume data centered at origin for simplicity

• Not essential, as we will see..


Page 29:

Remember Eigenfaces

• Approximate every face f as $f = w_{f,1} V_1 + w_{f,2} V_2 + w_{f,3} V_3 + \dots + w_{f,k} V_k$

• Estimate V to minimize the squared error

• Error is unexplained by V1.. Vk

• Error is orthogonal to Eigenfaces


Page 30:

Eigen Representation

• K-dimensional representation

– Error is orthogonal to representation

– Weight and error are specific to data instance


$$x_1 = w_{11} V_1 + e_1$$

[Illustration assuming 3D space]

Page 31:

Representation

• K-dimensional representation

– Error is orthogonal to representation

– Weight and error are specific to data instance


$$x_2 = w_{12} V_1 + e_2$$

[Illustration assuming 3D space; the error is at 90° to the eigenface]

Page 32:

Representation

• K-dimensional representation

– Error is orthogonal to representation


All data with the same representation $w V_1$ lie on a plane orthogonal to $V_1$

Page 33:

With 2 bases

• K-dimensional representation

– Error is orthogonal to representation

– Weight and error are specific to data instance


$$x_1 = w_{11} V_1 + w_{21} V_2 + e_1$$

[Illustration assuming 3D space; the error is at 90° to the eigenfaces]

Page 34:

With 2 bases

• K-dimensional representation

– Error is orthogonal to representation

– Weight and error are specific to data instance


$$x_2 = w_{12} V_1 + w_{22} V_2 + e_2$$

[Illustration assuming 3D space; the error is at 90° to the eigenfaces]

Page 35:

In Vector Form

• K-dimensional representation

– Error is orthogonal to representation

– Weight and error are specific to data instance


$$x_i = w_{1i} V_1 + w_{2i} V_2 + e_i = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \begin{bmatrix} w_{1i} \\ w_{2i} \end{bmatrix} + e_i$$

[Illustration: the error is at 90° to the eigenfaces]

Page 36:

In Vector Form


$$x_i = w_{1i} V_1 + w_{2i} V_2 + e_i \quad\Rightarrow\quad x = Vw + e$$

• K-dimensional representation

• x is a D-dimensional vector

• V is a D × K matrix

• w is a K-dimensional vector

• e is a D-dimensional vector

Page 37:

Learning PCA

• For the given data: find the K-dimensional subspace such that it captures most of the variance in the data

– Variance in the remaining subspace is minimal

Page 38:

Constraints


$$x = Vw + e$$

• $V^T V = I$: eigenvectors are orthogonal to each other

• For every vector, the error is orthogonal to the eigenvectors: $e^T V = 0$

• Over the collection of data:

– Average $w w^T$ is diagonal: eigen representations are uncorrelated

– $e^T e$ is minimized: error variance is minimum

• Mean of error is 0

Page 39:

A Statistical Formulation of PCA

• x is a random variable generated according to a linear relation

• w is drawn from an K-dimensional Gaussian with diagonal

covariance

• e is drawn from a 0-mean (D-K)-rank D-dimensional Gaussian

• Estimate V (and B) given examples of x

$$x = Vw + e, \qquad w \sim N(0, B), \qquad e \sim N(0, E)$$

Page 40:

Linear Gaussian Models!!

• x is a random variable generated according to a linear relation

• w is drawn from a Gaussian

• e is drawn from a 0-mean Gaussian

• Estimate V given examples of x

– In the process also estimate B and E

$$x = Vw + e, \qquad w \sim N(0, B), \qquad e \sim N(0, E)$$


Page 42:

Linear Gaussian Models

• Observations are linear functions of two uncorrelated Gaussian random variables

– A “weight” variable w

– An “error” variable e

– Error not correlated to weight: E[eTw] = 0

• Learning LGMs: Estimate parameters of the model given instances of x

– The problem of learning the distribution of a Gaussian RV

$$x = \mu + Vw + e, \qquad w \sim N(0, B), \qquad e \sim N(0, E)$$

Page 43:

LGMs: Probability Density

• The mean of x:

$$E[x] = E[\mu + Vw + e] = \mu + V\,E[w] + E[e] = \mu$$

• The covariance of x:

$$E\big[(x - E[x])(x - E[x])^T\big] = V B V^T + E$$

Page 44:

The probability of x

• x is a linear function of Gaussians: x is also Gaussian

• Its mean and variance are as given

$$x = \mu + Vw + e, \quad w \sim N(0, B), \quad e \sim N(0, E) \;\;\Rightarrow\;\; x \sim N(\mu,\ VBV^T + E)$$

$$P(x) = \frac{1}{\sqrt{(2\pi)^D\,|VBV^T + E|}}\, \exp\!\Big(-0.5\,(x - \mu)^T \big(VBV^T + E\big)^{-1} (x - \mu)\Big)$$

Page 45:

Estimating the variables of the model

• Estimating the variables of the LGM is equivalent to estimating P(x)

– The variables are μ, V, B and E

$$x = \mu + Vw + e, \qquad w \sim N(0, B), \qquad e \sim N(0, E), \qquad x \sim N(\mu,\ VBV^T + E)$$

Page 46:

Estimating the model

• The model is indeterminate:

– $Vw = VCC^{-1}w = (VC)(C^{-1}w)$

– We need extra constraints to make the solution unique

• Usual constraint : B = I

– Variance of w is an identity matrix

$$x = \mu + Vw + e, \qquad w \sim N(0, B), \qquad e \sim N(0, E), \qquad x \sim N(\mu,\ VBV^T + E)$$
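The indeterminacy is easy to demonstrate numerically; a tiny sketch with arbitrary V, w and any invertible matrix C (all values made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 2))
w = rng.normal(size=2)
C = rng.normal(size=(2, 2)) + 3.0 * np.eye(2)   # some invertible 2 x 2 matrix

# (VC) with weights C^-1 w explains x exactly as well as V with weights w.
assert np.allclose(V @ w, (V @ C) @ (np.linalg.inv(C) @ w))
```

Fixing the covariance of w to I removes the scaling part of this ambiguity; a rotational ambiguity can still remain in such models.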

Page 47:

Estimating the variables of the model

• Estimating the variables of the LGM is equivalent to estimating P(x)

– The variables are μ, V, and E

$$x = \mu + Vw + e, \qquad w \sim N(0, I), \qquad e \sim N(0, E), \qquad x \sim N(\mu,\ VV^T + E)$$

Page 48:

The Maximum Likelihood Estimate

• The ML estimate of μ does not depend on the covariance of the Gaussian

$$x \sim N(\mu,\ VV^T + E)$$

• Given training set x1, x2, .., xN: find μ, V, E

$$\mu = \frac{1}{N} \sum_i x_i$$

Page 49:

Centered Data

• We can safely assume “centered” data

– μ = 0

• If the data are not centered, “center” it

– Estimate mean of data

• Which is the maximum likelihood estimate

– Subtract it from the data

Page 50:

Simplified Model

• Estimating the variables of the LGM is equivalent to estimating P(x)

– The variables are V and E

$$x = Vw + e, \qquad w \sim N(0, I), \qquad e \sim N(0, E), \qquad x \sim N(0,\ VV^T + E)$$

Page 51:

Estimating the model

• Given a collection of xi terms

– x1, x2,..xN

• Estimate V and E

• w is unknown for each x

• But if we assume we know w for each x, what do we get?

$$x = Vw + e, \qquad x \sim N(0,\ VV^T + E)$$

Page 52:

Estimating the Parameters

• We will use a maximum-likelihood estimate

• The log-likelihood of x1..xN, knowing their corresponding w1..wN:

$$x_i = V w_i + e, \qquad P(e) = N(0, E) \;\Rightarrow\; P(x|w) = N(Vw,\ E)$$

$$P(x|w) = \frac{1}{\sqrt{(2\pi)^D\,|E|}}\, \exp\!\big(-0.5\,(x - Vw)^T E^{-1} (x - Vw)\big)$$

$$\log P(x_1..x_N \mid w_1..w_N) = -0.5\,N \log|E| \;-\; 0.5 \sum_i (x_i - V w_i)^T E^{-1} (x_i - V w_i) \;+\; \text{const}$$

Page 53:

Maximizing the log-likelihood

• Differentiating w.r.t. V and setting to 0

12 Nov 2013 11755/18797 53

i

ii

T

ii EENLL )()(5.0||log5.0 11VwxVwx

0)(2 1 T

i

i

iiE wVwx1

i

T

ii

i

T

ii wwwxV

• Differentiating w.r.t. E-1 and setting to 0

i i

T

ii

T

iiN

E xwVxx1
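A quick numpy check of these closed-form estimates on synthetic data in which the w_i are (artificially) treated as known; the dimensions and noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
D, K, N = 4, 2, 5000

V_true = rng.normal(size=(D, K))
W = rng.normal(size=(N, K))                          # pretend every w_i is known
X = W @ V_true.T + 0.1 * rng.normal(size=(N, D))     # x_i = V w_i + e_i

# The two closed-form solutions, with the sums written as matrix products:
V_hat = (X.T @ W) @ np.linalg.inv(W.T @ W)           # V = (sum x_i w_i^T)(sum w_i w_i^T)^-1
E_hat = (X.T @ X - V_hat @ (W.T @ X)) / N            # E = 1/N sum (x_i x_i^T - V w_i x_i^T)

assert np.allclose(V_hat, V_true, atol=0.05)         # recovers the true bases
assert np.allclose(E_hat, 0.01 * np.eye(D), atol=0.005)  # recovers the noise covariance
```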

Page 54:

Estimating LGMs: If we know w

• But in reality we don’t know the w for each x

– So how to deal with this?

• EM..


$$V = \Big(\sum_i x_i w_i^T\Big)\Big(\sum_i w_i w_i^T\Big)^{-1}, \qquad E = \frac{1}{N} \sum_i \big(x_i x_i^T - V\, w_i x_i^T\big)$$

$$x_i = V w_i + e, \qquad P(e) = N(0, E)$$

Page 55:

Recall EM

• We figured out how to compute parameters if we knew the missing information

• Then we “fragmented” the observations according to the posterior probability P(z|x) and counted as usual

• In effect we took the expectation with respect to the a posteriori probability of the missing data: P(z|x)


[Figure: dice example. Each observed number is added to the “blue” or “red” collection when its die is known; when the die is unknown, the observation is fragmented between the two collections in proportion to the posterior P(z|x).]

Page 56:

EM for LGMs

• Replace unseen data terms with expectations taken w.r.t. P(w|xi)

$$x_i = V w_i + e, \qquad P(e) = N(0, E)$$

$$V = \Big(\sum_i x_i\, E_{w|x_i}[w]^T\Big)\Big(\sum_i E_{w|x_i}[w w^T]\Big)^{-1}$$

$$E = \frac{1}{N} \sum_i x_i x_i^T \;-\; \frac{1}{N}\, V \sum_i E_{w|x_i}[w]\, x_i^T$$


Page 58:

Expected Value of w given x

• x and w are jointly Gaussian!

– x is Gaussian

– w is Gaussian

– They are linearly related


$$x = Vw + e, \qquad P(e) = N(0, E), \qquad P(w) = N(0, I), \qquad P(x) = N(0,\ VV^T + E)$$

$$z = \begin{bmatrix} x \\ w \end{bmatrix}, \qquad P(z) = N(\mu_z,\ C_{zz})$$

Page 59:

Expected Value of w given x

• x and w are jointly Gaussian!

$$\mu_z = \begin{bmatrix} \mu_x \\ \mu_w \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad C_{zz} = \begin{bmatrix} C_{xx} & C_{xw} \\ C_{wx} & C_{ww} \end{bmatrix} = \begin{bmatrix} VV^T + E & V \\ V^T & I \end{bmatrix}$$

$$C_{xw} = E\big[(x - \mu_x)\, w^T\big] = E\big[(Vw + e)\, w^T\big] = V$$

Page 60:

The conditional expectation of w given x

• P(w|x) is a Gaussian

$$P(w|x) = N\!\left(w;\ C_{wx} C_{xx}^{-1}\,x,\ \ C_{ww} - C_{wx} C_{xx}^{-1} C_{xw}\right)$$

With $C_{wx} = V^T$, $C_{xx} = VV^T + E$ and $C_{ww} = I$:

$$P(w|x) = N\!\left(w;\ V^T (VV^T + E)^{-1}\,x,\ \ I - V^T (VV^T + E)^{-1} V\right)$$

$$E_{w|x_i}[w] = V^T (VV^T + E)^{-1}\, x_i$$

$$E_{w|x_i}[w w^T] = \mathrm{Var}(w|x_i) + E_{w|x_i}[w]\, E_{w|x_i}[w]^T = I - V^T (VV^T + E)^{-1} V + E_{w|x_i}[w]\, E_{w|x_i}[w]^T$$

Page 61:

LGM: The complete EM algorithm

• Initialize V and E

• E step:

$$E_{w|x_i}[w] = V^T (VV^T + E)^{-1}\, x_i$$

$$E_{w|x_i}[w w^T] = I - V^T (VV^T + E)^{-1} V + E_{w|x_i}[w]\, E_{w|x_i}[w]^T$$

• M step:

$$V = \Big(\sum_i x_i\, E_{w|x_i}[w]^T\Big)\Big(\sum_i E_{w|x_i}[w w^T]\Big)^{-1}$$

$$E = \frac{1}{N} \sum_i x_i x_i^T \;-\; \frac{1}{N}\, V \sum_i E_{w|x_i}[w]\, x_i^T$$

• Iterate the E and M steps until convergence
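The complete algorithm above fits in a few lines of numpy. This is a minimal illustrative sketch (random initialization, fixed iteration count, full unconstrained E as in the slides), not production code:

```python
import numpy as np

def lgm_em(X, K, n_iter=50, seed=0):
    """EM for the LGM x = V w + e, w ~ N(0, I), e ~ N(0, E), on centered data X (N x D)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    V = rng.normal(size=(D, K))
    E = np.eye(D)
    S = X.T @ X / N                                 # sample covariance (data centered)
    for _ in range(n_iter):
        # E step: posterior moments of w for every x_i
        P = V.T @ np.linalg.inv(V @ V.T + E)        # E[w|x_i] = P x_i
        Ew = X @ P.T                                # N x K, row i is E[w|x_i]
        Sww = N * (np.eye(K) - P @ V) + Ew.T @ Ew   # sum_i E[w w^T | x_i]
        # M step: the closed-form updates from the slide
        Sxw = X.T @ Ew                              # sum_i x_i E[w|x_i]^T
        V = Sxw @ np.linalg.inv(Sww)
        E = S - V @ Sxw.T / N
    return V, E

def avg_loglik(X, C):
    """Average Gaussian log-likelihood of the centered rows of X under N(0, C), up to a constant."""
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + np.trace(X @ np.linalg.inv(C) @ X.T) / len(X))

# Toy data from a true LGM; EM must never decrease the likelihood.
rng = np.random.default_rng(0)
V0 = rng.normal(size=(5, 2))
X = rng.multivariate_normal(np.zeros(5), V0 @ V0.T + 0.1 * np.eye(5), size=2000)
V1, E1 = lgm_em(X, K=2, n_iter=1)
V50, E50 = lgm_em(X, K=2, n_iter=50)
assert avg_loglik(X, V50 @ V50.T + E50) >= avg_loglik(X, V1 @ V1.T + E1) - 1e-6
```

The final assertion checks the defining property of EM: each iteration can only increase (or preserve) the likelihood of the data under the model covariance $VV^T + E$.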

Page 62:

So what have we achieved

• Employed a complicated EM algorithm to learn a Gaussian PDF for a variable x

• What have we gained???

• Next class:

– PCA

• Sensible PCA

• EM algorithms for PCA

– Factor Analysis

• FA for feature extraction


Page 63:

LGMs: Application 1: Learning principal components

• Find directions that capture most of the variation in the data

• Error is orthogonal to these variations

$$x = Vw + e, \qquad w \sim N(0, I), \qquad e \sim N(0, E)$$

Page 64:

LGMs: Application 2: Learning with insufficient data

• The full covariance matrix of a Gaussian has $D^2$ terms

• Fully captures the relationships between variables

• Problem: needs a lot of data to estimate robustly

[Figure: full covariance matrix]

Page 65:

To be continued..

• Other applications..

• Next class
