Measure Independence in Kernel Space
![Page 1: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/1.jpg)
Measure Independence in Kernel Space
Presented by:
Qiang Lou
![Page 2: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/2.jpg)
References
These slides are based on the following papers:
F. Bach and M. Jordan. Kernel Independent Component Analysis. Journal of Machine Learning Research, 2002.
Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, Bernhard Schölkopf. Kernel Methods for Measuring Independence. Journal of Machine Learning Research, 2005.
![Page 3: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/3.jpg)
Outline
-- Introduction
-- Canonical Correlation
-- Kernel Canonical Correlation
-- Application Example
![Page 4: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/4.jpg)
Introduction
What is Independence?
Intuitively, two variables y1, y2 are said to be independent if information on the value of one variable does not give any information on the value of the other variable.
Technically, y1 and y2 are independent if and only if the joint pdf is factorizable in the following way:
p(y1, y2) = p1(y1)*p2(y2)
![Page 5: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/5.jpg)
Introduction
How to measure independence?
-- Can we use correlation?
-- Do uncorrelated variables mean independent variables?
Remark: y1 and y2 being uncorrelated means:
E[y1 y2] – E[y1]E[y2] = 0
![Page 6: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/6.jpg)
Introduction
The answer is “No”.
Fact:
Independence implies uncorrelatedness, but the reverse is not true.
Which means: p(y1, y2) = p1(y1)*p2(y2) → E[y1 y2] – E[y1]E[y2] = 0
but E[y1 y2] – E[y1]E[y2] = 0 ↛ p(y1, y2) = p1(y1)*p2(y2)
This is easy to prove…
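One standard counterexample, not taken from the slides, is y1 uniform on {-1, 0, 1} with y2 = y1². The sketch below checks with exact rational arithmetic that the covariance is zero while the joint distribution does not factorize. The choice of distribution and all names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Assumed toy distribution: y1 uniform on {-1, 0, 1}, y2 = y1**2.
p_y1 = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
joint = {(y1, y1 ** 2): p for y1, p in p_y1.items()}


def expectation(g):
    """Exact expectation of g(y1, y2) under the joint distribution."""
    return sum(p * g(y1, y2) for (y1, y2), p in joint.items())


# Covariance: E[y1 y2] - E[y1] E[y2]
cov = expectation(lambda a, b: a * b) - (
    expectation(lambda a, b: a) * expectation(lambda a, b: b)
)

# Marginal distribution of y2
p_y2 = {}
for (_, y2), p in joint.items():
    p_y2[y2] = p_y2.get(y2, Fraction(0)) + p

# Independence would require joint(a, b) == p_y1(a) * p_y2(b) for every pair.
factorizes = all(
    joint.get((a, b), Fraction(0)) == p_y1[a] * p_y2[b]
    for a, b in product(p_y1, p_y2)
)

print("covariance:", cov)
print("joint factorizes:", factorizes)
```

Here y2 is a deterministic function of y1, so the two variables are as dependent as they can be, yet correlation does not see it.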
![Page 7: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/7.jpg)
Introduction
Now comes the question:
How to measure independence?
![Page 8: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/8.jpg)
Canonical Correlation
Canonical Correlation Analysis (CCA) is concerned with finding a pair of linear transformations such that one component within each set of transformed variables is correlated with a single component in the other set.
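We focus on the first canonical correlation, which is defined as the maximum possible correlation between the two projections $\xi_1^\top x_1$ and $\xi_2^\top x_2$ of x1 and x2:
$$\rho(x_1, x_2) = \max_{\xi_1, \xi_2} \mathrm{corr}(\xi_1^\top x_1, \xi_2^\top x_2) = \max_{\xi_1, \xi_2} \frac{\xi_1^\top C_{12} \xi_2}{(\xi_1^\top C_{11} \xi_1)^{1/2} (\xi_2^\top C_{22} \xi_2)^{1/2}}$$
where $C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$ is the covariance matrix of (x1, x2).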
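As a rough numerical sketch of the first canonical correlation, the code below estimates the covariance blocks from samples, whitens each block with a Cholesky factor, and takes the largest singular value of the whitened cross-covariance. The function name and the whitening route are assumptions made for illustration; the slides instead derive a generalized eigenvalue problem, which yields the same value.

```python
import numpy as np


def first_canonical_correlation(x1, x2):
    """Largest canonical correlation between samples x1 (N, p1) and x2 (N, p2).

    Hypothetical helper; assumes both covariance blocks are non-singular.
    """
    x1 = x1 - x1.mean(axis=0)
    x2 = x2 - x2.mean(axis=0)
    n = x1.shape[0]
    c11 = x1.T @ x1 / n
    c22 = x2.T @ x2 / n
    c12 = x1.T @ x2 / n
    # Cholesky factors: C11 = L1 L1^T, C22 = L2 L2^T
    l1 = np.linalg.cholesky(c11)
    l2 = np.linalg.cholesky(c22)
    # Whitened cross-covariance M = L1^{-1} C12 L2^{-T}
    m = np.linalg.solve(l1, c12)
    m = np.linalg.solve(l2, m.T).T
    # The singular values of M are the canonical correlations.
    return np.linalg.svd(m, compute_uv=False)[0]


rng = np.random.default_rng(0)
a = rng.normal(size=(100, 2))
b = 2.0 * a[:, :1] + 3.0  # exact linear function of the first column of a
rho_linear = first_canonical_correlation(a, b)
print(rho_linear)
```

For two scalar variables this reduces to the absolute value of the ordinary correlation coefficient.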
![Page 9: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/9.jpg)
Canonical Correlation
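Taking derivatives of the correlation with respect to the projection vectors $\xi_1$ and $\xi_2$, where $C_{ij}$ denotes the block of C pairing $x_i$ with $x_j$, we obtain:
$$C_{12} \xi_2 = \frac{\xi_1^\top C_{12} \xi_2}{\xi_1^\top C_{11} \xi_1} C_{11} \xi_1, \qquad C_{21} \xi_1 = \frac{\xi_1^\top C_{12} \xi_2}{\xi_2^\top C_{22} \xi_2} C_{22} \xi_2$$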
![Page 10: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/10.jpg)
Canonical Correlation
![Page 11: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/11.jpg)
Canonical Correlation
![Page 12: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/12.jpg)
Canonical Correlation
This formulation can be extended to more than two sets of variables, where we look for the smallest generalized eigenvalue:
![Page 13: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/13.jpg)
Kernel Canonical Correlation
Kernel trick:
defining a map $\Phi: x \mapsto \Phi(x)$ from X to a feature space F,
such that we can find a kernel satisfying:
$$K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$$
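A small check of the kernel identity, using a kernel whose feature map is finite-dimensional. The homogeneous degree-2 polynomial kernel on R² is an assumed example chosen for illustration; it does not appear in the slides, which later use a Gaussian kernel whose feature space is infinite-dimensional.

```python
import math


def phi(x):
    """Explicit feature map for the kernel (x . y)^2 on R^2 (assumed example)."""
    a, b = x
    return (a * a, math.sqrt(2.0) * a * b, b * b)


def kernel(x, y):
    """Degree-2 homogeneous polynomial kernel, evaluated without using phi."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


xi, xj = (1.0, 2.0), (3.0, -1.0)
lhs = kernel(xi, xj)          # computed in the input space
rhs = dot(phi(xi), phi(xj))   # computed in the feature space
print(lhs, rhs, math.isclose(lhs, rhs))
```

The point of the trick is that the left-hand side never forms Φ(x), so it stays cheap even when the feature space is very large.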
![Page 14: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/14.jpg)
Kernel Canonical Correlation
F-correlation -- canonical correlation between Φ1(x1) and Φ2(x2)
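Written out, and following the definition in the Bach and Jordan paper cited above, the F-correlation is the largest correlation achievable by functions in F. This is a sketch of that definition; the notation $\rho_F$ and the use of the reproducing property $f(x) = \langle \Phi(x), f \rangle$ are taken from that paper rather than from the slide text.

$$\rho_F = \max_{f_1, f_2 \in F} \mathrm{corr}\big(f_1(x_1), f_2(x_2)\big) = \max_{f_1, f_2 \in F} \frac{\mathrm{cov}\big(f_1(x_1), f_2(x_2)\big)}{\big(\mathrm{var}\, f_1(x_1)\big)^{1/2} \big(\mathrm{var}\, f_2(x_2)\big)^{1/2}}$$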
![Page 15: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/15.jpg)
Kernel Canonical Correlation
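Notes: if x1 and x2 are independent, then the F-correlation $\rho_F$ is 0.
Is the converse true?
-- If F is ‘large’ enough, it’s true.
-- In particular, if F is the space corresponding to a Gaussian kernel, which is a positive definite kernel on X = R, then $\rho_F = 0$ implies that x1 and x2 are independent.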
![Page 16: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/16.jpg)
Kernel Canonical Correlation
Estimation of the F-correlation -- kernelized version of canonical correlation
We will show that the empirical F-correlation depends only on the Gram matrices K1 and K2 of the observations; we will use $\hat{\rho}_F(K_1, K_2)$ to denote this canonical correlation.
Suppose the data are centered in feature space (i.e. $\sum_{k=1}^{N} \Phi_1(x_1^k) = \sum_{k=1}^{N} \Phi_2(x_2^k) = 0$).
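In practice the mapped data are rarely centered, but centering can be applied directly to the Gram matrix. The sketch below builds a Gaussian Gram matrix and centers it as H K H with H = I - (1/N) 1 1ᵀ. The function names, the bandwidth parameter `sigma`, and the sample points are assumptions for illustration.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def center_gram(k):
    """Gram matrix of the feature-space-centered points: H K H."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


K = gaussian_gram([0.0, 1.0, 3.0])
K_centered = center_gram(K)
# Rows and columns of a centered Gram matrix sum to (numerically) zero.
print(np.allclose(K_centered.sum(axis=0), 0.0))
```

Centering this way never needs the feature map itself, which matters for the Gaussian kernel.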
![Page 17: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/17.jpg)
Kernel Canonical Correlation
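We want to know:
$$\hat{\rho}_F(K_1, K_2) = \max_{f_1, f_2 \in F} \frac{\widehat{\mathrm{cov}}\big(f_1(x_1), f_2(x_2)\big)}{\big(\widehat{\mathrm{var}}\, f_1(x_1)\big)^{1/2} \big(\widehat{\mathrm{var}}\, f_2(x_2)\big)^{1/2}}$$
Which means we need three quantities: the empirical covariance $\widehat{\mathrm{cov}}(f_1(x_1), f_2(x_2))$ and the two empirical variances $\widehat{\mathrm{var}}\, f_1(x_1)$ and $\widehat{\mathrm{var}}\, f_2(x_2)$.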
![Page 18: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/18.jpg)
Kernel Canonical Correlation
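For fixed f1 and f2, write $f_1 = \sum_{k=1}^{N} \alpha_1^k \Phi_1(x_1^k) + f_1^\perp$ and $f_2 = \sum_{k=1}^{N} \alpha_2^k \Phi_2(x_2^k) + f_2^\perp$, where $f_1^\perp$ and $f_2^\perp$ are orthogonal to the span of the mapped data points. The empirical covariance of the projections in feature space can then be written:
$$\widehat{\mathrm{cov}}\big(f_1(x_1), f_2(x_2)\big) = \frac{1}{N} \alpha_1^\top K_1 K_2 \alpha_2$$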
![Page 19: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/19.jpg)
Kernel Canonical Correlation
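Similarly, with $\alpha_1$ and $\alpha_2$ the coefficient vectors of f1 and f2 on the mapped data points, we can get the empirical variances:
$$\widehat{\mathrm{var}}\, f_1(x_1) = \frac{1}{N} \alpha_1^\top K_1^2 \alpha_1, \qquad \widehat{\mathrm{var}}\, f_2(x_2) = \frac{1}{N} \alpha_2^\top K_2^2 \alpha_2$$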
![Page 20: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/20.jpg)
Kernel Canonical Correlation
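Putting the three expressions together, in terms of the coefficient vectors $\alpha_1$ and $\alpha_2$ of f1 and f2 on the mapped data points, we get:
$$\hat{\rho}_F(K_1, K_2) = \max_{\alpha_1, \alpha_2 \in \mathbb{R}^N} \frac{\alpha_1^\top K_1 K_2 \alpha_2}{(\alpha_1^\top K_1^2 \alpha_1)^{1/2} (\alpha_2^\top K_2^2 \alpha_2)^{1/2}}$$
As in the linear case discussed before, this is equivalent to the following generalized eigenvalue problem:
$$\begin{pmatrix} 0 & K_1 K_2 \\ K_2 K_1 & 0 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \rho \begin{pmatrix} K_1^2 & 0 \\ 0 & K_2^2 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}$$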
![Page 21: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/21.jpg)
Kernel Canonical Correlation
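Problem:
Suppose that the Gram matrices K1 and K2 have full rank. Then the canonical correlation will always be 1, whatever K1 and K2 are. Let V1 and V2 denote the subspaces of $\mathbb{R}^N$ generated by the columns of K1 and K2; then we can rewrite:
$$\hat{\rho}_F(K_1, K_2) = \max_{v_1 \in V_1,\, v_2 \in V_2} \frac{v_1^\top v_2}{\|v_1\| \, \|v_2\|} = \max_{v_1 \in V_1,\, v_2 \in V_2} \cos(v_1, v_2)$$
If K1 and K2 have full rank, V1 and V2 are both equal to $\mathbb{R}^N$, so we can take v1 = v2 and the maximum is 1.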
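The degeneracy can be seen numerically. In the sketch below, two unrelated sets of points give full-rank Gaussian Gram matrices, and for an arbitrary α1 the choice α2 = K2⁻¹ K1 α1 makes the unregularized ratio equal to 1. The data, the bandwidth, and the use of uncentered Gram matrices (so that they really are full rank) are assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gram matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for scalar points."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma ** 2))


# Two sets of observations with no meaningful relationship (assumed toy data).
x1 = np.arange(6.0)
x2 = np.array([3.0, 0.0, 5.0, 1.0, 4.0, 2.0])
K1 = gaussian_gram(x1)
K2 = gaussian_gram(x2)

# Any alpha1 works; alpha2 is chosen so that K2 @ alpha2 == K1 @ alpha1.
alpha1 = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])
alpha2 = np.linalg.solve(K2, K1 @ alpha1)

numerator = alpha1 @ K1 @ K2 @ alpha2
denominator = np.sqrt(alpha1 @ K1 @ K1 @ alpha1) * np.sqrt(alpha2 @ K2 @ K2 @ alpha2)
rho_unregularized = numerator / denominator
print(rho_unregularized)
```

Because the ratio reaches 1 regardless of the data, the unregularized estimate carries no information about dependence.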
![Page 22: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/22.jpg)
Kernel Canonical Correlation
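Solution: regularization by penalizing the norms of f1 and f2, so we get the regularized F-correlation as follows:
$$\rho_F^\kappa = \max_{f_1, f_2 \in F} \frac{\mathrm{cov}\big(f_1(x_1), f_2(x_2)\big)}{\big(\mathrm{var}\, f_1(x_1) + \kappa \|f_1\|_F^2\big)^{1/2} \big(\mathrm{var}\, f_2(x_2) + \kappa \|f_2\|_F^2\big)^{1/2}}$$
where $\kappa$ is a small positive constant.
We expand, with $\alpha_1$ the coefficient vector of f1 on the mapped data points:
$$\widehat{\mathrm{var}}\, f_1(x_1) + \kappa \|f_1\|_F^2 = \frac{1}{N} \alpha_1^\top K_1^2 \alpha_1 + \kappa\, \alpha_1^\top K_1 \alpha_1 \approx \frac{1}{N} \alpha_1^\top \Big(K_1 + \frac{N \kappa}{2} I\Big)^2 \alpha_1$$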
![Page 23: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/23.jpg)
Kernel Canonical Correlation
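Now we can get the regularized KCC:
$$\hat{\rho}_F^\kappa(K_1, K_2) = \max_{\alpha_1, \alpha_2 \in \mathbb{R}^N} \frac{\alpha_1^\top K_1 K_2 \alpha_2}{\big(\alpha_1^\top (K_1 + \frac{N \kappa}{2} I)^2 \alpha_1\big)^{1/2} \big(\alpha_2^\top (K_2 + \frac{N \kappa}{2} I)^2 \alpha_2\big)^{1/2}}$$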
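The regularized estimate can be sketched numerically. Substituting β_i = (K_i + (Nκ/2) I) α_i turns the regularized ratio into a largest-singular-value problem for r(K1) r(K2), where r(K) = (K + (Nκ/2) I)⁻¹ K. The code below uses that route. The function names, the Gaussian bandwidth, the value of κ, and the toy data are all assumptions, and the SVD shortcut is one way to solve the problem rather than necessarily the route taken in the slides.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def center_gram(k):
    """Center the Gram matrix in feature space: H K H."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def regularized_kcc(x1, x2, sigma=1.0, kappa=0.02):
    """Regularized first kernel canonical correlation (hypothetical helper)."""
    n = len(x1)
    k1 = center_gram(gaussian_gram(x1, sigma))
    k2 = center_gram(gaussian_gram(x2, sigma))
    reg = 0.5 * n * kappa * np.eye(n)
    r1 = np.linalg.solve(k1 + reg, k1)  # (K1 + N*kappa/2 I)^{-1} K1
    r2 = np.linalg.solve(k2 + reg, k2)  # (K2 + N*kappa/2 I)^{-1} K2
    # Largest singular value of r1 @ r2 equals the regularized KCC.
    return np.linalg.svd(r1 @ r2, compute_uv=False)[0]


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2_dependent = x1 ** 2 + 0.1 * rng.normal(size=200)  # dependent, nearly uncorrelated
x2_independent = rng.normal(size=200)                # independent of x1

rho_dep = regularized_kcc(x1, x2_dependent)
rho_ind = regularized_kcc(x1, x2_independent)
print("dependent pair:  ", rho_dep)
print("independent pair:", rho_ind)
```

Unlike the unregularized version, this value should differ between dependent and independent data, which is what makes it usable as a contrast function for ICA.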
![Page 24: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/24.jpg)
Kernel Canonical Correlation
Generalizing to more than two sets of variables, it’s equivalent to the generalized eigenvalue problem:
![Page 25: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/25.jpg)
Example Application
Applications: -- ICA (Independent Component Analysis)
-- Feature Selection
See the demo for application in ICA…
![Page 26: Measure Independence in Kernel Space](https://reader034.vdocuments.us/reader034/viewer/2022051623/56815d4a550346895dcb5410/html5/thumbnails/26.jpg)
Thank you!!!
Questions?