A Unifying Review of Linear Gaussian Models
Summary Presentation 2/15/10 – Dae Il Kim
Department of Computer Science, Graduate Student
Advisor: Erik Sudderth, Ph.D.
Overview
• Introduce the Basic Model
• Discrete Time Linear Dynamical System (Kalman Filter)
• Some nice properties of Gaussian distributions
• Graphical Model: Static Model (Factor Analysis, PCA, SPCA)
• Learning & Inference: Static Model
• Graphical Model: Gaussian Mixture & Vector Quantization
• Learning & Inference: GMMs & Quantization
• Graphical Model: Discrete-State Dynamic Model (HMMs)
• Independent Component Analysis
• Conclusion
The Basic Model
• Basic Model: Discrete Time Linear Dynamical System (Kalman Filter)
Variations of this model produce:
• Factor Analysis
• Principal Component Analysis
• Mixtures of Gaussians
• Vector Quantization
• Independent Component Analysis
• Hidden Markov Models
Generative Model:
$$x_{t+1} = A x_t + w_t, \qquad y_t = C x_t + v_t$$
Additive Gaussian noise: $w_t \sim \mathcal{N}(0, Q)$, $v_t \sim \mathcal{N}(0, R)$
A = k × k state transition matrix
C = p × k observation / generative matrix
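As a concrete illustration, here is a minimal NumPy sketch of sampling from this generative model. The dimensions and the values of A, C, Q, and R below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): k = 2 hidden states, p = 3 observations, T steps.
k, p, T = 2, 3, 100
A = 0.99 * np.eye(k)             # k x k state transition matrix
C = rng.standard_normal((p, k))  # p x k observation / generative matrix
Q, R = np.eye(k), 0.1 * np.eye(p)

x = rng.multivariate_normal(np.zeros(k), Q)  # initial state
ys = []
for t in range(T):
    ys.append(C @ x + rng.multivariate_normal(np.zeros(p), R))  # y_t = C x_t + v_t
    x = A @ x + rng.multivariate_normal(np.zeros(k), Q)         # x_{t+1} = A x_t + w_t
```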
Nice Properties of Gaussians
• Markov Property
$$P(\{x_1,\ldots,x_T\},\{y_1,\ldots,y_T\}) = P(x_1)\prod_{t=1}^{T-1} P(x_{t+1}\mid x_t)\prod_{t=1}^{T} P(y_t\mid x_t)$$
• Inference in these models
$$P(\{x_1,\ldots,x_T\}\mid\{y_1,\ldots,y_T\}) = \frac{P(\{x_1,\ldots,x_T\},\{y_1,\ldots,y_T\})}{P(\{y_1,\ldots,y_T\})}$$
Filtering: $P(x_t \mid \{y_1,\ldots,y_t\})$
Smoothing: $P(x_t \mid \{y_1,\ldots,y_T\})$
• Learning via Expectation Maximization (EM)
E-step: $Q^{k+1}(X) = P(X \mid Y, \theta^k)$
M-step: $\theta^{k+1} = \arg\max_{\theta} \int_X P(X \mid Y, \theta^k)\, \log P(X, Y \mid \theta)\, dX$
• Conditional Independence
$$P(x_{t+1} \mid x_t) = \mathcal{N}(A x_t, Q)\big|_{x_{t+1}}, \qquad P(y_t \mid x_t) = \mathcal{N}(C x_t, R)\big|_{y_t}$$
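To make the filtering recursion concrete, here is a sketch of one predict/update step of the standard Kalman filter for $P(x_t \mid \{y_1,\ldots,y_t\})$. This is the textbook formulation, not code from the paper.

```python
import numpy as np

def kalman_step(mu, V, y, A, C, Q, R):
    """One filtering recursion for P(x_t | y_1..t), given the previous
    posterior N(mu, V) and a new observation y."""
    # Predict: propagate through x_{t+1} = A x_t + w_t.
    mu_p = A @ mu
    V_p = A @ V @ A.T + Q
    # Update: condition on y_t = C x_t + v_t.
    S = C @ V_p @ C.T + R             # innovation covariance
    K = V_p @ C.T @ np.linalg.inv(S)  # Kalman gain
    mu_new = mu_p + K @ (y - C @ mu_p)
    V_new = V_p - K @ C @ V_p
    return mu_new, V_new
```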
Graphical Model for Static Models
Factor Analysis: Q = I & R is diagonal
SPCA: Q = I & R = $\alpha I$
PCA: Q = I & R = $\lim_{\varepsilon \to 0} \varepsilon I$
Generative Model (A = 0):
$$x = w, \qquad y = Cx + v$$
Additive Gaussian noise: $w \sim \mathcal{N}(0, Q)$, $v \sim \mathcal{N}(0, R)$
Example of the generative process for PCA
Figure: a point in the 1-dimensional latent space (Z = latent variable) is mapped into the 2-dimensional observation space (X = observed variable), inducing the marginal distribution p(x). Adapted from Bishop (2006).
Learning & Inference: Static Models
Analytically integrating over the joint, we obtain the marginal distribution of y:
$$y \sim \mathcal{N}(0, CQC^T + R)$$
Note: filtering and smoothing reduce to the same problem in the static model, since the time dependence is gone. We want to find P(x|y), the posterior over a single hidden state given a single observation. Inference can be performed simply by linear matrix projection, and the result is also Gaussian.
We can calculate the posterior using Bayes' rule:
$$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)} = \frac{\mathcal{N}(Cx, R)\big|_y\; \mathcal{N}(0, I)\big|_x}{\mathcal{N}(0, CC^T + R)\big|_y}$$
Our posterior now becomes another Gaussian:
$$P(x \mid y) = \mathcal{N}(\beta y,\; I - \beta C)\big|_x$$
where $\beta$ is equal to:
$$\beta = C^T (CC^T + R)^{-1}$$
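A small sketch of this projection for the static model with Q = I; everything follows directly from the two equations above.

```python
import numpy as np

def static_posterior(y, C, R):
    """P(x | y) = N(beta y, I - beta C)|_x with beta = C^T (C C^T + R)^{-1},
    for the static model with Q = I."""
    k = C.shape[1]
    beta = C.T @ np.linalg.inv(C @ C.T + R)  # k x p projection matrix
    mean = beta @ y                          # posterior mean: a linear projection of y
    cov = np.eye(k) - beta @ C               # posterior covariance (independent of y)
    return mean, cov
```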
Graphical Model: Gaussian Mixture Models & Vector Quantization
Generative Model (A = 0):
$$x = \mathrm{WTA}[w], \qquad y = Cx + v$$
Additive Gaussian noise: $w \sim \mathcal{N}(\mu, Q)$, $v \sim \mathcal{N}(0, R)$
Winner Takes All (WTA): WTA[x] is a new vector with unity in the position of the largest coordinate of the input and zeros in all other positions, e.g. $[0\ 0\ 1]^T$.
Note: each state x is generated independently according to a fixed discrete probability histogram controlled by the mean $\mu$ and covariance $Q$ of $w$.
This model becomes a Vector Quantization model when:
$$R = \lim_{\varepsilon \to 0} \varepsilon I$$
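A minimal sketch of this generative process; the values of $\mu$, Q, C, and R below are assumed toy settings. Sampling w and applying WTA picks a cluster, and the columns of C act as the cluster means.

```python
import numpy as np

rng = np.random.default_rng(0)

def wta(w):
    """WTA[w]: unit vector with 1 at the largest coordinate of w, 0 elsewhere."""
    e = np.zeros_like(w)
    e[np.argmax(w)] = 1.0
    return e

# Toy values (assumed): k = 3 clusters in a p = 2 dimensional observation space.
k, p = 3, 2
mu, Q = np.zeros(k), np.eye(k)   # mean and covariance of w define the cluster priors pi
C = rng.standard_normal((p, k))  # column j of C is the mean of cluster j
R = 0.05 * np.eye(p)

x = wta(rng.multivariate_normal(mu, Q))              # x = WTA[w]: pick a cluster
y = C @ x + rng.multivariate_normal(np.zeros(p), R)  # y = C x + v
```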
Learning & Inference: GMMs & Quantization
Calculating the posterior responsibility for each cluster is analogous to the E-step in this model:
$$\hat{x} = P(x{=}e_j \mid y) = \frac{P(y \mid x{=}e_j)\, P(x{=}e_j)}{P(y)} = \frac{\mathcal{N}(Ce_j, R)\big|_y\; P(x{=}e_j)}{\sum_{i=1}^{k} \mathcal{N}(Ce_i, R)\big|_y\; P(x{=}e_i)} = \frac{\mathcal{N}(Ce_j, R)\big|_y\; \pi_j}{\sum_{i=1}^{k} \mathcal{N}(Ce_i, R)\big|_y\; \pi_i}$$
Computing the likelihood for the data is straightforward:
$$P(y) = \sum_{i=1}^{k} P(x{=}e_i)\, P(y \mid x{=}e_i) = \sum_{i=1}^{k} \mathcal{N}(Ce_i, R)\big|_y\; \pi_i$$
$\pi_j$ is the probability assigned by the Gaussian $\mathcal{N}(\mu, Q)$ to the region of k-space in which the jth coordinate is larger than all the others.
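A sketch of this E-step computation in NumPy; the Gaussian density helper is written out so the snippet stands alone.

```python
import numpy as np

def gauss_pdf(y, mean, R):
    """Density of N(mean, R) evaluated at y."""
    d = y - mean
    return np.exp(-0.5 * d @ np.linalg.solve(R, d)) / np.sqrt(
        (2 * np.pi) ** len(y) * np.linalg.det(R))

def responsibilities(y, C, R, pi):
    """E-step: posterior responsibilities P(x = e_j | y) for all k clusters."""
    k = C.shape[1]
    lik = np.array([gauss_pdf(y, C[:, j], R) for j in range(k)])  # N(Ce_j, R)|_y
    num = lik * pi
    return num / num.sum()  # the denominator is the likelihood P(y)
```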
Gaussian Mixture Models
$$\pi_j = P(x = e_j)$$
$\pi_j$ is the probability assigned by the Gaussian $\mathcal{N}(\mu, Q)$ to the region of k-space in which the jth coordinate is largest.
Figure: joint distribution p(y, x) and marginal distribution p(y).
Graphical Model: Discrete-State Dynamic Models
Generative Model:
$$x_{t+1} = \mathrm{WTA}[A x_t + w_t], \qquad y_t = C x_t + v_t$$
Additive Gaussian noise: $w_t \sim \mathcal{N}(\mu, Q)$, $v_t \sim \mathcal{N}(0, R)$
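A sketch of sampling this discrete-state chain; A, C, $\mu$, Q, and R below are assumed toy values. The WTA nonlinearity keeps the state on the corners $e_j$, so the chain behaves like an HMM state sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def wta(w):
    """Unit vector with 1 at the largest coordinate of w."""
    e = np.zeros_like(w)
    e[np.argmax(w)] = 1.0
    return e

# Toy values (assumed): k = 3 discrete states, p = 2 observations, T steps.
k, p, T = 3, 2, 50
A = rng.standard_normal((k, k))  # implicitly defines the transition probabilities
C = rng.standard_normal((p, k))
mu, Q, R = np.zeros(k), np.eye(k), 0.1 * np.eye(p)

x = wta(rng.multivariate_normal(mu, Q))
ys = []
for t in range(T):
    ys.append(C @ x + rng.multivariate_normal(np.zeros(p), R))  # y_t = C x_t + v_t
    x = wta(A @ x + rng.multivariate_normal(mu, Q))             # x_{t+1} = WTA[A x_t + w_t]
```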
Independent Component Analysis
• ICA can be seen as a linear generative model with non-Gaussian priors for the hidden variables, or as a nonlinear generative model with Gaussian priors for the hidden variables.
The gradient learning rule to increase the likelihood:
$$\Delta W \propto W^{-T} + f(Wy)\, y^T, \qquad f(x) = \frac{d}{dx} \log p_x(x)$$
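A sketch of one such gradient step on a batch of observations, using the common heavy-tailed choice $f(x) = -\tanh(x)$ (the score function of a $1/\cosh$ prior); the learning rate and batching are assumptions.

```python
import numpy as np

def ica_gradient_step(W, Y, lr=0.01):
    """One step of Delta W ~ W^{-T} + f(W y) y^T, averaged over a batch.
    Y holds one observation per column; f(x) = -tanh(x) is the score
    function of a heavy-tailed 1/cosh source prior (an assumed choice)."""
    n = Y.shape[1]
    grad = np.linalg.inv(W.T) + (-np.tanh(W @ Y)) @ Y.T / n
    return W + lr * grad
```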
Generative Model (A = 0):
$$x = g(w), \qquad y = Cx + v$$
Additive Gaussian noise: $w \sim \mathcal{N}(0, Q)$, $v \sim \mathcal{N}(0, R)$
g(·) is a general nonlinearity that is invertible and differentiable, for example:
$$g(w) = \ln\!\left(\tan\!\left(\frac{\pi}{4}\left(1 + \mathrm{erf}\!\left(\frac{w}{\sqrt{2}}\right)\right)\right)\right)$$
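Under the reconstruction above, a quick numerical check that this g maps standard Gaussian samples onto a heavy-tailed variable (density $1/(\pi \cosh x)$):

```python
import numpy as np
from math import erf

def g(w):
    """g(w) = ln(tan(pi/4 * (1 + erf(w / sqrt(2))))): pushes a standard
    Gaussian sample through the Gaussian CDF and then the inverse CDF of
    the heavy-tailed density p(x) = 1 / (pi cosh x)."""
    return np.log(np.tan(np.pi / 4.0 * (1.0 + erf(w / np.sqrt(2.0)))))

w = np.random.default_rng(0).standard_normal(10_000)
x = np.array([g(wi) for wi in w])
print(x.std(), np.abs(x).max())  # noticeably heavier tails than the Gaussian input
```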
Conclusion
Many more potential models!