![Page 1: Machine Learning to Recognize Phenomena in Large Scale ...bapoczos/other... · Machine Learning to Recognize Phenomena in Large Scale Simulations ... School of Computer Science April](https://reader035.vdocuments.us/reader035/viewer/2022081401/5f068e717e708231d4189362/html5/thumbnails/1.jpg)
Machine Learning to Recognize
Phenomena in Large Scale Simulations
Barnabás Póczos
Carnegie Mellon University
School of Computer Science
April 2, 2012
Jeff Schneider, Liang Xiong, Dougal Sutherland
OUTLINE
1st Contribution: Divergence estimation
• Estimators
2nd Contribution: ML on distributions
• anomaly detection, classification, low-dim embedding
Applications
• Recognize interesting phenomena in computer vision, astronomy, turbulence data
Entropy, Mutual Information, Divergence
C. Shannon
A. Rényi
Fernandes & Gloor: Mutual information is critically dependent on prior assumptions: would the correct estimate of mutual information please identify itself?
Bioinformatics, Vol. 26, no. 9, 2010, pages 1135–1139
How should we estimate them?
Naïve plug-in approach using density estimation
Density: nuisance parameter
Density estimation: difficult, curse of dimensionality!
• histogram
• kernel density estimation
• k-nearest neighbors [D. Loftsgaarden & C. Quesenberry. 1965.]
How can we estimate them directly, without estimating the density?
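For reference, the k-nearest-neighbor density estimate behind the last bullet above is commonly written as below. This is the textbook form, not copied from the slide; the symbols $\rho_k(x)$, $c_d$ and $n$ are notation assumed here, and some references use $k-1$ in place of $k$.

```latex
\hat p_k(x) = \frac{k}{n \, c_d \, \rho_k^d(x)},
\qquad
c_d = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}
```

Here $\rho_k(x)$ is the distance from $x$ to its $k$-th nearest neighbor among the $n$ sample points in $\mathbb{R}^d$, and $c_d$ is the volume of the $d$-dimensional unit ball. A naïve plug-in estimator substitutes $\hat p_k$ into the quantity of interest, for example $\hat H = -\frac{1}{n}\sum_{i=1}^n \log \hat p_k(X_i)$ for the Shannon entropy.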
DIVERGENCE ESTIMATION
without density estimation
Using
Estimate divergence
The Estimator
Póczos & Schneider, AISTATS 2011
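The estimator's formula did not survive in this transcript. As a rough sketch of what a k-NN based Rényi divergence estimator in the style of the cited paper can look like, the Python below estimates $\int p^{\alpha} q^{1-\alpha}$ from k-th nearest-neighbor distances: $\rho_k(i)$ from $X_i$ to the other points of X, and $\nu_k(i)$ from $X_i$ to the points of Y, multiplied by a correction factor $B_{k,\alpha}$. The function names, the defaults for `alpha` and `k`, and the brute-force neighbor search are assumptions made for illustration; the paper should be consulted for the exact estimator and its conditions.

```python
import math
import random

def knn_distance(x, points, k, skip_self=False):
    """Distance from x to its k-th nearest neighbour in `points`."""
    dists = sorted(math.dist(x, y) for y in points)
    # If x is itself a member of `points`, dists[0] is the zero distance
    # to x, so the k-th neighbour among the *other* points is dists[k].
    return dists[k] if skip_self else dists[k - 1]

def renyi_divergence_knn(X, Y, alpha=0.8, k=5):
    """Sketch of a k-NN estimate of the Renyi-alpha divergence D(p || q).

    X: list of d-dim tuples sampled from p; Y: list sampled from q.
    Assumes alpha != 1 and k > |alpha - 1| so the gamma terms are defined.
    """
    n, m, d = len(X), len(Y), len(X[0])
    # Correction factor that removes the asymptotic bias of the raw ratio.
    B = math.gamma(k) ** 2 / (
        math.gamma(k - alpha + 1) * math.gamma(k + alpha - 1)
    )
    total = 0.0
    for x in X:
        rho = knn_distance(x, X, k, skip_self=True)
        nu = knn_distance(x, Y, k)
        total += ((n - 1) * rho ** d / (m * nu ** d)) ** (1 - alpha)
    integral = B * total / n          # estimate of the integral of p^a q^(1-a)
    return math.log(integral) / (alpha - 1)
```

A typical call passes two lists of equal-dimensional tuples, e.g. `renyi_divergence_knn(X, Y, alpha=0.8, k=5)`. No density is ever estimated; only nearest-neighbor distances enter the computation.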
Theoretical Results (asymptotically unbiased, L2 consistent)
Why is this divergence estimator consistent?
Asymptotically Unbiased
We need to prove:
The estimator
Normalized k-NN distances converge to the Erlang distribution
Póczos & Schneider, AISTATS 2011
All we need is
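The Erlang limit of normalized k-NN distances can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the original talk: points are uniform on [0, 1] (so d = 1, the unit-ball volume is c_d = 2, and the density at x = 0.5 is 1), and the normalized quantity is n · c_d · ρ_k. Its limit is then Erlang with shape k and rate 1, which has mean k and variance k. The function names and parameter defaults are hypothetical.

```python
import random
import statistics

def normalized_knn_distance(n, k, rng):
    """Draw n points from Uniform[0, 1]; return n * c_d * rho_k^d at x = 0.5.

    With d = 1 the unit ball is [-1, 1], so c_d = 2, and p(x) = 1.
    """
    x = 0.5
    dists = sorted(abs(rng.random() - x) for _ in range(n))
    rho_k = dists[k - 1]              # k-th nearest-neighbour distance
    return n * 2.0 * rho_k

def simulate(n=500, k=3, trials=2000, seed=0):
    """Repeat the experiment `trials` times with a fixed seed."""
    rng = random.Random(seed)
    return [normalized_knn_distance(n, k, rng) for _ in range(trials)]

samples = simulate()
# For Erlang(k, rate 1) both the mean and the variance equal k.
print(statistics.mean(samples), statistics.variance(samples))
```

With k = 3 both printed values should be close to 3, up to simulation noise.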
A little problem…
Asymptotic uniform integrability…
Solutions:
1 2 3
Appendix of Póczos & Schneider, AISTATS 2011
OUTLINE
1st Contribution: Divergence estimation
• Estimators
2nd Contribution: ML on distributions
• anomaly detection, classification, low-dim embedding
Applications
• computer vision, astronomy, turbulence data
Contribution II
Generalize ML to sets and distributions
Most machine learning algorithms operate on vectorial objects.
The world is complicated. Often
• hand-crafted vectorial features are not good enough
• natural to work with complex inputs directly (sets or distributions...)
Dealing with complex objects
• break into smaller parts, represent the input as a set of smaller parts
• treat the set elements as sample points from some unknown distribution
• do ML on unknown distributions represented by sets
Machine Learning on Distributions
Many ML algorithms only require
• the pairwise distances between the inputs
• the inner products between the inputs
If we can estimate divergences and inner products between distributions, then we can construct ML algorithms that operate on distributions.
• Classification
• Low-dimensional embedding
• Anomaly detection
Support Distribution Machines
Goal: Classify the following distributions (sample sets) using SVMs
[Figure: training data (sample sets labeled + and -) and a test point (labeled ?)]
Differences compared to standard methods on vectors
• The inputs are distributions
• We don’t even know those distributions, only sample sets are available
Distribution Classification
Problems:
Use RKHS based SVM to solve this problem!
Dual form of SVM:
Calculate the Gram matrix
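The dual SVM problem referred to here has the following standard soft-margin form. The notation is assumed rather than taken from the slide: $P_1, \dots, P_n$ are the training distributions (known only through their sample sets), $y_i \in \{-1, +1\}$ their labels, $K$ a kernel between distributions, and $C$ the usual regularization constant.

```latex
\max_{\alpha \in \mathbb{R}^n} \;
  \sum_{i=1}^n \alpha_i
  - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \, y_i y_j \, K(P_i, P_j)
\quad \text{s.t.} \quad
  0 \le \alpha_i \le C, \qquad \sum_{i=1}^n \alpha_i y_i = 0
```

The training data enter only through the Gram matrix $G_{ij} = K(P_i, P_j)$, so estimating kernel values between distributions from their sample sets is enough to train the classifier.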
Kernel Estimation
Linear kernel:
Polynomial kernel:
Gaussian kernel:
Solution: make it symmetric, and project it to the cone of PSD matrices
We already know how!
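The symmetrize-and-project step can be written out as follows. This is the standard Frobenius-norm projection onto the PSD cone; the symbols $\hat G$, $U$ and $\lambda_i$ are notation introduced here, and the slide does not say which projection the authors used.

```latex
\hat G_{\mathrm{sym}} = \tfrac{1}{2}\bigl(\hat G + \hat G^{\top}\bigr),
\qquad
\hat G_{\mathrm{sym}} = U \,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\, U^{\top},
\qquad
\hat G_{\mathrm{PSD}} = U \,\mathrm{diag}\bigl(\max(\lambda_1, 0), \dots, \max(\lambda_n, 0)\bigr)\, U^{\top}
```

The estimated Gram matrix $\hat G$ need not be symmetric or positive semidefinite because its entries are noisy estimates; clipping the negative eigenvalues gives the closest PSD matrix to $\hat G_{\mathrm{sym}}$ in Frobenius norm.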
OUTLINE
1st Contribution: Divergence estimation
• Estimators
2nd Contribution: ML on distributions
• anomaly detection, classification, low-dim embedding
Applications
• computer vision, astronomy, turbulence data
Finding Unusual Galaxy Clusters
B. Póczos, L. Xiong & J. Schneider, UAI, 2011.
Sloan Digital Sky Survey (SDSS)
• 505 galaxy clusters (10-50 galaxies in each)
• 7530 galaxies
• Spectrum, PCA: 4000d ⇒ 2d
What are the most anomalous galaxy clusters?
The most anomalous galaxy cluster is dominated by
• star forming blue galaxies
• irregular galaxies
[Images: blue galaxy, red galaxy. Credits: ESA, NASA]
Understanding Turbulence
Credits: ESA, NASA, PPPL, Wikipedia
Turbulence Data Classification
Simulated fluid flow through time (JHU Turbulence Research Group)
Something interesting happened?
find interesting events, patterns, phenomena
What is interesting?
Goal: find vortices!
[Figure: velocity distributions for positive (vortex) and negative examples]
• 11 positive, 20 negative examples
• Results: leave-one-out cross-validation: 97%
Finding Vortices
Classification probabilities
Find Interesting Phenomena in Turbulence Data
Anomaly scores
Anomaly detection with 1-class SDM
Image Representation with Distributions
Dealing with complex objects
• break into smaller parts,
• represent the object as a sample set of these parts
Image patches: overlapping, non-overlapping
Patch locations: grid points, interesting points, random
Patch sizes: same, different, hierarchy
d-dimensional sample set representation of the image
• Each image patch is represented by PCA compressed SIFT vectors. SIFT = Scale-invariant feature transform. PCA: 128 dim ⇒ d dim
• Each image is represented as a set of these d dim feature vectors.
• Each set is a sample set from some unknown distribution.
Detecting Anomalous Images
B. Póczos, L. Xiong & J. Schneider, UAI, 2011.
50 highway images
5 anomalies
2-dimensional sample set representation of images (128 dim SIFT ⇒ 2 dim)
Anomaly detection: divergences between the distributions of these sample sets
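One simple way to turn pairwise divergences into anomaly scores is to score each sample set by its average divergence to its nearest few neighbors among the other sets. The sketch below assumes a precomputed divergence matrix `D` (for instance from a k-NN divergence estimator). The function name, the choice of `num_neighbors`, and the toy matrix are illustrative and not taken from the cited paper.

```python
def anomaly_scores(D, num_neighbors=2):
    """Score item i by the mean of its smallest divergences to other items.

    D is a square list of lists; D[i][j] is the estimated divergence
    from sample set i to sample set j (it need not be symmetric).
    """
    scores = []
    for i, row in enumerate(D):
        others = sorted(row[j] for j in range(len(row)) if j != i)
        nearest = others[:num_neighbors]
        scores.append(sum(nearest) / len(nearest))
    return scores

# Toy example: sets 0-3 are mutually close, set 4 is far from all of them.
D = [
    [0.0, 0.1, 0.2, 0.1, 2.0],
    [0.1, 0.0, 0.1, 0.2, 2.1],
    [0.2, 0.1, 0.0, 0.1, 1.9],
    [0.1, 0.2, 0.1, 0.0, 2.2],
    [2.0, 2.1, 1.9, 2.2, 0.0],
]
scores = anomaly_scores(D)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous)  # index of the highest-scoring set
```

A set that is far from every other set, in the divergence sense, receives a high score even if each of its individual points looks ordinary.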
Detecting Anomalous Images
[Figure: image panels numbered 1-10 and 51-55]
GMM-5 Density Approximation Failed
[Figure: image panels numbered 1-10 and 51-55]
Object Classification
ETH-80 [Leibe and Schiele, 2003]
8 categories, 400 images, each image is represented by 576 18 dim points
2-fold CV, 16 runs
• BoW: 88.9%
• NPR: 90.1%
Póczos, Xiong, Sutherland, & Schneider, CVPR 2012
Outdoor Scenes Classification [Oliva and Torralba, 2001]
8 categories, 2688 images, each represented by 1815 53 dim points.
Categories: coast, mountain, country, forest, street, highway, city, tall building
10-fold CV, 16 runs
• Best published: 91.57% (Qin and Yung, ICMV 2010)
• NPR: 92.3%
Póczos, Xiong, Sutherland, & Schneider, CVPR 2012
Sport Events Classification [Li and Fei-Fei, 2007]
8 categories, 1040 images, each represented by 295 to 1542 57 dim points.
2-fold CV, 16 runs
• Best published: 86.7% (Zhang et al., CVPR 2011)
• NPR: 87.1%
Póczos, Xiong, Sutherland, & Schneider, CVPR 2012
Computer Vision:
• Goal: best results
• Sophisticated feature construction, complex algorithms, heuristics
SDM approach:
• simple, easy to implement
• standard SIFT features
• 3 datasets, 3 best performances
Noisy USPS Dataset Classification
• Each instance (image) is a set of 500 2d points
• 1000 training and 1000 test instances
• Original (noiseless) USPS dataset is easy: ~97%
[Figure: example image, axes marked 160 and 160]
Results:
• SVM on raw images: 82.1 ± 0.5% accuracy
• SDM on the 2D distributions, Rényi divergence: 96.0 ± 0.3% accuracy
Multidimensional Scaling of USPS Data
10 instances from figures 1, 2, 3, 4.
Calculate pairwise Euclidean distances.
Nonlinear embedding with MDS into 2d.
[Figure panels: raw images using Euclidean distance; estimated Euclidean distance between the distributions]
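For reference, classical (Torgerson) MDS obtains a 2-d embedding from a matrix of pairwise distances as shown below. The slide does not say which MDS variant was used, so this is only the textbook version; $D^{(2)}$, $J$, $B$ and $Z$ are notation assumed here.

```latex
B = -\tfrac{1}{2}\, J D^{(2)} J,
\qquad
J = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},
\qquad
Z = \bigl[\sqrt{\lambda_1}\, u_1, \; \sqrt{\lambda_2}\, u_2\bigr]
```

Here $D^{(2)}$ holds the squared pairwise distances (between raw images, or estimated between distributions), $(\lambda_1, u_1)$ and $(\lambda_2, u_2)$ are the two largest eigenpairs of $B$, and the rows of $Z$ are the embedded points. Only the distance matrix is needed, which is why the same procedure applies to distributions once their distances are estimated.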
Locally Linear Embedding of Distributions
[Figures: 72 rotated COIL froggies; edge-detected COIL froggy]
[Embedding panels: Euclidean distance between images; Euclidean distance between distributions]
Take Me Home!
1st Contribution: Divergence estimation
• Estimators
2nd Contribution: ML on distributions
• classification, regression, clustering, anomaly detection, low-dim embedding
Applications
• computer vision, astronomy, turbulence data
Summary
• Support Distribution Machines
• Outperforms state-of-the-art results in CV benchmarks
• Solves new problems in Astronomy, Turbulence data analysis
Thank You for your attention! ☺