Machine Learning to Recognize Phenomena in Large Scale Simulations Barnabás Póczos Carnegie Mellon University School of Computer Science April 2, 2012 Jeff Schneider Liang Xiong Dougal Sutherland

Page 1: Machine Learning to Recognize Phenomena in Large Scale ...bapoczos/other... · Machine Learning to Recognize Phenomena in Large Scale Simulations ... School of Computer Science April

Machine Learning to Recognize Phenomena in Large Scale Simulations

Barnabás Póczos

Carnegie Mellon University

School of Computer Science

April 2, 2012

Jeff Schneider, Liang Xiong, Dougal Sutherland

Page 2

OUTLINE

1st Contribution: Divergence estimation (estimators)

2nd Contribution: ML on distributions (anomaly detection, classification, low-dim embedding)

Applications: Recognize interesting phenomena in computer vision, astronomy, turbulence data

Page 3

Entropy, Mutual Information, Divergence

C. Shannon

A. Rényi

Fernandes & Gloor: Mutual information is critically dependent on prior assumptions: would the correct estimate of mutual information please identify itself? Bioinformatics, Vol. 26, no. 9, 2010, pages 1135–1139

Page 4

How should we estimate them?

Naïve plug-in approach using density estimation:

• histogram
• kernel density estimation
• k-nearest neighbors [D. Loftsgaarden & C. Quesenberry, 1965]

Density: nuisance parameter

Density estimation: difficult, curse of dimensionality!

How can we estimate them directly, without estimating the density?
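To make the k-nearest-neighbor plug-in route concrete, here is a minimal sketch. It assumes the textbook k-NN density estimate, k / (n · c_d · r_k(x)^d), where r_k(x) is the distance from x to its k-th nearest sample and c_d is the volume of the d-dimensional unit ball. The function names are hypothetical and the neighbor search is brute force.

```python
import math

def unit_ball_volume(d):
    """Volume of the d-dimensional Euclidean unit ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def kth_nn_distance(x, samples, k):
    """Distance from x to its k-th nearest point in samples (brute force)."""
    dists = sorted(math.dist(x, s) for s in samples)
    return dists[k - 1]

def knn_density(x, samples, k):
    """k-NN density estimate at x: k / (n * c_d * r_k(x)^d)."""
    n, d = len(samples), len(x)
    r = kth_nn_distance(x, samples, k)
    return k / (n * unit_ball_volume(d) * r ** d)
```

A plug-in estimator would feed such density values into the entropy or divergence formula. The r^d term hints at why this gets harder as the dimension d grows.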

Page 5

DIVERGENCE ESTIMATION without density estimation

Using: [formula not recovered in transcript]

Estimate divergence: [formula not recovered in transcript]

Page 6

The Estimator

Póczos & Schneider, AISTATS 2011
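The estimator's formula did not survive in this transcript. The sketch below shows what a k-NN based estimator of the Rényi-α divergence can look like, in the spirit of the cited paper. The normalization and the Gamma-function bias-correction constant are assumptions taken from the general k-NN estimation literature and should be checked against the paper. The names `renyi_divergence_knn` and `kth_nn_distance` are hypothetical, α must differ from 1, and the neighbor search is brute force.

```python
import math

def kth_nn_distance(x, points, k, skip_self=False):
    """Distance from x to its k-th nearest neighbor in points.
    With skip_self=True, x is assumed to be in points and its own
    zero distance is ignored."""
    dists = sorted(math.dist(x, p) for p in points)
    if skip_self:
        dists = dists[1:]
    return dists[k - 1]

def renyi_divergence_knn(X, Y, alpha, k=5):
    """Sketch of a k-NN estimate of the Renyi-alpha divergence D_alpha(p || q)
    from samples X ~ p and Y ~ q (lists of equal-length tuples)."""
    n, m, d = len(X), len(Y), len(X[0])
    # Assumed bias correction:
    # B = Gamma(k)^2 / (Gamma(k - alpha + 1) * Gamma(k + alpha - 1))
    log_b = (2 * math.lgamma(k) - math.lgamma(k - alpha + 1)
             - math.lgamma(k + alpha - 1))
    total = 0.0
    for x in X:
        rho = kth_nn_distance(x, X, k, skip_self=True)  # within-sample distance
        nu = kth_nn_distance(x, Y, k)                   # cross-sample distance
        ratio = ((n - 1) * rho ** d) / (m * nu ** d)    # stands in for q(x)/p(x)
        total += ratio ** (1 - alpha)
    # Estimate of the integral of p^alpha * q^(1 - alpha)
    integral = math.exp(log_b) * total / n
    return math.log(integral) / (alpha - 1)
```

No density is ever formed explicitly: only ratios of k-NN distances enter, which is the point of the slide's "without density estimation".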

Page 7

Theoretical Results (asymptotically unbiased, L2 consistent)

Why is this divergence estimator consistent?

Page 8

Asymptotically Unbiased

We need to prove:

The estimator

Normalized k-NN distances converge to the Erlang distribution

Póczos & Schneider, AISTATS 2011

All we need is
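The formulas on this slide were lost in the transcript. As a hedged reminder of the kind of identity an Erlang limit supplies (a generic fact about the distribution, not the paper's exact statement), the moments of an Erlang variable with shape k and unit rate have a closed form. The symbols ξ and γ are introduced here for illustration only.

```latex
% Assumption: \xi \sim \mathrm{Erlang}(k, 1), with density t^{k-1} e^{-t} / \Gamma(k) on t > 0.
\mathbb{E}\big[\xi^{\gamma}\big]
  = \int_0^\infty t^{\gamma}\,\frac{t^{k-1} e^{-t}}{\Gamma(k)}\,dt
  = \frac{\Gamma(k+\gamma)}{\Gamma(k)}, \qquad \gamma > -k .
```

Closed-form moments of this type are what allow powers of k-NN distances to be corrected by a multiplicative constant.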

Page 9

A little problem…

Asymptotic uniform integrability…

Solutions 1, 2, 3: see the Appendix of Póczos & Schneider, AISTATS 2011

Page 10

OUTLINE

1st Contribution: Divergence estimation (estimators)

2nd Contribution: ML on distributions (anomaly detection, classification, low-dim embedding)

Applications: computer vision, astronomy, turbulence data

Page 11

Contribution II: Generalize ML to sets and distributions

Most machine learning algorithms operate on vectorial objects.

The world is complicated. Often

• hand-crafted vectorial features are not good enough
• it is natural to work with complex inputs directly (sets or distributions...)

Dealing with complex objects:

• break into smaller parts, represent the input as a set of smaller parts
• treat the set elements as sample points from some unknown distribution
• do ML on unknown distributions represented by sets

Page 12

Machine Learning on Distributions

Many ML algorithms only require

• the pairwise distances between the inputs
• the inner products between the inputs

If we can estimate divergences and inner products between distributions, then we can construct ML algorithms that operate on distributions:

• Classification
• Low-dimensional embedding
• Anomaly detection
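As a small illustration of an algorithm that needs only pairwise distances, here is a k-nearest-neighbor classifier that never touches feature vectors. It is a generic sketch, not the method of the slides. The function name `knn_classify` is hypothetical, and the distances are assumed to be precomputed (for example, estimated divergences between sample sets).

```python
def knn_classify(dist_to_train, train_labels, k=3):
    """Majority vote among the k training items closest to the query.
    dist_to_train[i] is the (estimated) distance or divergence from the
    query to training item i; no feature vectors are needed."""
    order = sorted(range(len(train_labels)), key=lambda i: dist_to_train[i])
    votes = {}
    for i in order[:k]:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    return max(votes, key=votes.get)
```

For instance, `knn_classify([0.1, 0.9, 0.2, 0.8], ["+", "-", "+", "-"], k=3)` votes among training items 0, 2, and 3.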

Page 13

Support Distribution Machines

Goal: Classify the following distributions (sample sets) using SVMs

[Figure: training data, sample sets labeled + and -, and an unlabeled test point marked ?]

Differences compared to standard methods on vectors:

• The inputs are distributions
• We don’t even know those distributions, only sample sets are available

Page 14

Distribution Classification

Problems:

Use an RKHS-based SVM to solve this problem!

Dual form of SVM:

Calculate the Gram matrix
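The dual formula itself is missing from the transcript. For reference, the standard soft-margin SVM dual can be written in terms of a Gram matrix K over the n training inputs. The symbols α_i, y_i, C, and K_ij are generic textbook notation, not necessarily the slide's.

```latex
\max_{\alpha \in \mathbb{R}^n}\;
  \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K_{ij}
\quad \text{s.t.} \quad
  0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

The inputs enter only through the entries K_ij, so a Gram matrix estimated between distributions is all the optimizer needs.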

Page 15

Kernel Estimation

Linear kernel:

Polynomial kernel:

Gaussian kernel:

We already know how!

The estimated Gram matrix is not necessarily symmetric or positive semidefinite (PSD).

Solution: make it symmetric, and project it to the cone of PSD matrices
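The symmetrize-and-project step can be sketched as follows. This is a minimal pure-Python illustration that assumes the projection is the usual Frobenius-norm one: eigendecompose the symmetrized matrix and clip negative eigenvalues to zero. The function names are hypothetical, and the small Jacobi eigen-solver is included only to keep the sketch self-contained; a linear algebra library routine would normally be used instead.

```python
import math

def symmetrize(K):
    """Return (K + K^T) / 2."""
    n = len(K)
    return [[0.5 * (K[i][j] + K[j][i]) for j in range(n)] for i in range(n)]

def jacobi_eigh(S, sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix,
    computed with cyclic Jacobi rotations."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):  # rotate columns p and q of A and of V
                    for i in range(n):
                        mp, mq = M[i][p], M[i][q]
                        M[i][p], M[i][q] = c * mp - s * mq, s * mp + c * mq
                for j in range(n):  # rotate rows p and q of A
                    ap, aq = A[p][j], A[q][j]
                    A[p][j], A[q][j] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V

def project_psd(K):
    """Symmetrize K, then clip its negative eigenvalues to zero."""
    n = len(K)
    vals, V = jacobi_eigh(symmetrize(K))
    return [[sum(max(vals[k], 0.0) * V[i][k] * V[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]
```

The result of `project_psd` is a valid Gram matrix that can be handed to any kernel SVM solver accepting precomputed kernels.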

Page 16

OUTLINE

1st Contribution: Divergence estimation (estimators)

2nd Contribution: ML on distributions (anomaly detection, classification, low-dim embedding)

Applications: computer vision, astronomy, turbulence data

Page 17

Finding Unusual Galaxy Clusters

B. Póczos, L. Xiong & J. Schneider, UAI, 2011.

Sloan Digital Sky Survey (SDSS)

• 505 galaxy clusters (10-50 galaxies in each)
• 7530 galaxies
• Spectrum, PCA: 4000d ⇒ 2d

What are the most anomalous galaxy clusters?

The most anomalous galaxy cluster is dominated by

• star forming blue galaxies
• irregular galaxies

[Figure: blue galaxy and red galaxy. Credits: ESA, NASA]

Page 18

Understanding Turbulence

Credits: ESA, NASA, PPPL, Wikipedia

Page 19

Turbulence Data Classification

Simulated fluid flow through time (JHU Turbulence Research Group)

Did something interesting happen? Find interesting events, patterns, phenomena.

What is interesting?

Goal: find vortices!

[Figure: velocity distributions for positive (vortex) and negative examples]

• 11 positive, 20 negative examples
• Results: leave-one-out cross-validation: 97%

Page 20

Finding Vortices

[Figure: classification probabilities]

Page 21

Find Interesting Phenomena in Turbulence Data

Anomaly detection with 1-class SDM

[Figure: anomaly scores]

Page 22

Image Representation with Distributions

Dealing with complex objects

• break into smaller parts,
• represent the object as a sample set of these parts

Image patches: overlapping or non-overlapping

Patch locations: grid points, interesting points, or random

Patch sizes: same, different, or hierarchy

d-dimensional sample set representation of the image

• Each image patch is represented by PCA-compressed SIFT vectors. SIFT = Scale-invariant feature transform. PCA: 128 dim ⇒ d dim
• Each image is represented as a set of these d-dim feature vectors.
• Each set is a sample set from some unknown distribution.
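The patch-extraction step can be sketched as below, under simplifying assumptions: square patches of one size, taken at regular grid points. The function name `extract_patches` and its parameters are hypothetical. Computing a descriptor per patch (SIFT followed by PCA in the slides) is a separate step that this sketch omits.

```python
def extract_patches(image, size, stride):
    """Return square patches of side `size` taken at grid points `stride` apart.
    stride < size gives overlapping patches; stride == size gives
    non-overlapping ones. `image` is a list of equal-length rows."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches
```

Mapping each returned patch to a feature vector then yields the sample set that represents the image.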

Page 23

Detecting Anomalous Images

B. Póczos, L. Xiong & J. Schneider, UAI, 2011.

50 highway images

5 anomalies

2-dimensional sample set representation of images (128 dim SIFT ⇒ 2 dim)

Anomaly detection: divergences between the distributions of these sample sets
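One simple way to turn a matrix of pairwise divergences into anomaly scores is sketched below. The scoring rule, the mean of the k smallest divergences to the other sets, is an assumption chosen for illustration and is not necessarily the rule used in the cited paper. The name `knn_anomaly_scores` is hypothetical.

```python
def knn_anomaly_scores(div, k=3):
    """div[i][j]: estimated divergence from sample set i to sample set j.
    Score of set i = mean of its k smallest divergences to the other sets;
    sets that are far from all of their neighbors get high scores."""
    scores = []
    for i, row in enumerate(div):
        others = sorted(v for j, v in enumerate(row) if j != i)
        scores.append(sum(others[:k]) / k)
    return scores
```

Sorting the sample sets by decreasing score gives a ranking from most to least anomalous.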

Page 24

Detecting Anomalous Images

[Figure: images numbered 1-10 and 51-55]

Page 25

GMM-5 Density Approximation Failed

[Figure: images numbered 1-10 and 51-55]

Page 26

Object Classification

ETH-80 [Leibe and Schiele, 2003]

8 categories, 400 images, each image is represented by 576 18-dim points

2-fold CV, 16 runs

• BoW: 88.9%
• NPR: 90.1%

Póczos, Xiong, Sutherland, & Schneider, CVPR 2012

Page 27

Outdoor Scenes Classification [Oliva and Torralba, 2001]

8 categories, 2688 images, each represented by 1815 53-dim points.

Categories: coast, mountain, country, forest, street, highway, city, tall building

10-fold CV, 16 runs

• Best published: 91.57% (Qin and Yung, ICMV 2010)
• NPR: 92.3%

Póczos, Xiong, Sutherland, & Schneider, CVPR 2012

Page 28

Sport Events Classification [Li and Fei-Fei, 2007]

8 categories, 1040 images, each represented by 295 to 1542 57-dim points.

2-fold CV, 16 runs

• Best published: 86.7% (Zhang et al., CVPR 2011)
• NPR: 87.1%

Póczos, Xiong, Sutherland, & Schneider, CVPR 2012

Computer Vision:

• Goal: best results
• Sophisticated feature construction, complex algorithms, heuristics

SDM approach:

• simple, easy to implement
• standard SIFT features
• 3 datasets, 3 best performances

Page 29

Noisy USPS Dataset Classification

• Original (noiseless) USPS dataset is easy: ~97%

Results:

SVM on raw images: 82.1 ± .5% accuracy

SDM on the 2D distributions, Rényi divergence: 96.0 ± .3% accuracy

• Each instance (image) is a set of 500 2d points
• 1000 training and 1000 test instances
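The slides do not say how an image becomes a set of 500 2d points. One plausible construction, shown here purely as an assumption, treats pixel intensities as an unnormalized density and samples pixel coordinates from it with a little jitter. The function name `image_to_point_set` and all of its parameters are hypothetical.

```python
import random

def image_to_point_set(image, n_points=500, jitter=0.5, rng=None):
    """Draw n_points 2-d points from a grayscale image (list of rows of
    non-negative intensities), picking pixels with probability proportional
    to intensity and adding uniform jitter around the pixel coordinates."""
    rng = rng or random.Random(0)
    coords = [(r, c) for r, row in enumerate(image) for c in range(len(row))]
    weights = [image[r][c] for r, c in coords]
    picks = rng.choices(coords, weights=weights, k=n_points)
    return [(r + rng.uniform(-jitter, jitter), c + rng.uniform(-jitter, jitter))
            for r, c in picks]
```

Each returned list plays the role of one sample set, so two images can be compared through an estimated divergence between their point sets instead of pixel by pixel.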

Page 30

Multidimensional Scaling of USPS Data

10 instances from the digits 1, 2, 3, 4.

Calculate pairwise Euclidean distances.

Nonlinear embedding with MDS into 2d.

[Figure panels: raw images using Euclidean distance; estimated Euclidean distance between the distributions]

Page 31

Local Linear Embedding of Distributions

[Figure: 72 rotated COIL froggies; edge-detected COIL froggy]

[Figure panels: Euclidean distance between images; Euclidean distance between distributions]

Page 32

Take Me Home!

1st Contribution: Divergence estimation (estimators)

2nd Contribution: ML on distributions (classification, regression, clustering, anomaly detection, low-dim embedding)

• Support Distribution Machines

Applications: computer vision, astronomy, turbulence data

• Outperforms state-of-the-art results in CV benchmarks
• Solves new problems in Astronomy, Turbulence data analysis

Thank You for your attention! ☺