
Page 1:

Nonparametric information estimation using Rank-based Euclidean Graph Optimization methods

Barnabás Póczos

Department of Computing Science,

University of Alberta, Canada

Machine Learning Lunch

Carnegie Mellon University

Feb 8, 2010

Page 2:

Outline

GOAL: Information estimation

Background on information and entropy

Applications in machine learning

Graph optimization methods

TSP, MST, kNN

Copula transformation

Information estimation

Page 3:

Background on information and entropy

Page 4:

Who cares about dependence?

• Observing physical, biological, chemical systems

– Which observations are dependent / independent?

– Is there dependence between inputs and outputs?

• Neuroscience: dependence between stimuli and neural response

• Stock markets

– which stocks are dependent on each other, and which are independent?

– is there dependence on other events?

Page 5:

What is dependence?

We want to measure dependence

Correlation?

… nope that’s not good enough…

Correlation can be zero when there is dependence between the variables

Isn’t this only an artificial problem?

It is a serious problem…

Page 6:

Correlation?… nope that’s not good enough

Correlation measures only pairwise dependencies.

What can we do when we have more variables?

[Scatter plot of X1 vs. X2: covariance matrix = I, cross-correlation = 0, but there is dependence.]

This is why PCA cannot separate mixed sources, and why we need ICA methods that account for higher-order dependencies, too.
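A minimal NumPy sketch of this phenomenon (my illustration, not the slide's exact figure): X2 = S·X1 with an independent random sign S is uncorrelated with X1, and the pair has covariance matrix ≈ I, yet X2 is fully determined by X1 up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x1 = rng.standard_normal(n)
x2 = rng.choice([-1.0, 1.0], size=n) * x1   # random sign, independent of x1

# Cov(x1, x2) = E[sign] * E[x1^2] = 0, so the covariance matrix is ~I ...
print(np.cov(np.stack([x1, x2])).round(2))
# ... yet |x2| == |x1| exactly: the variables are fully dependent.
print(np.corrcoef(np.abs(x1), np.abs(x2))[0, 1])   # 1.0
```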

Page 7:

What is dependence?

Alfréd Rényi

1961, Hungary

Page 8:

α-divergence

• Rényi’s α-information

• Shannon’s mutual information

Special cases:

Theorem:

1967, Hungary

Imre Csiszár
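The formulas on this slide were lost in extraction; a sketch of the standard definitions it presumably showed: for densities $p$ and $q$,

$$D_\alpha(p\,\|\,q) \;=\; \frac{1}{\alpha-1}\,\log \int p^\alpha(x)\,q^{1-\alpha}(x)\,dx, \qquad \alpha \neq 1,$$

and Rényi's α-information of $X=(X^1,\dots,X^d)$ with joint density $f$ and marginals $f_1,\dots,f_d$ is $I_\alpha(X) = D_\alpha\big(f \,\|\, \prod_{i=1}^d f_i\big)$, the α-divergence between the joint density and the product of the marginals. As $\alpha \to 1$, $D_\alpha$ tends to the KL divergence and $I_\alpha$ to Shannon's mutual information.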

Page 9:

Measuring Dependence and Uncertainty

Shannon mutual information

Measuring uncertainty (Shannon entropy)

1948, Princeton

Claude Shannon
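The displayed formulas did not survive extraction; the standard definitions, which the slide presumably showed:

$$I(X) = \int f(x)\,\log\frac{f(x)}{\prod_{i=1}^d f_i(x^i)}\,dx, \qquad H(X) = -\int f(x)\,\log f(x)\,dx.$$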

Page 10:

Rényi’s information and entropy

Rényi’s entropy

Rényi’s information
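Again filling in the lost formulas with the standard definitions:

$$H_\alpha(X) = \frac{1}{1-\alpha}\,\log \int f^\alpha(x)\,dx, \qquad I_\alpha(X) = \frac{1}{\alpha-1}\,\log \int f^\alpha(x)\,\Big(\prod_{i=1}^d f_i(x^i)\Big)^{1-\alpha}\,dx,$$

both recovering their Shannon counterparts as $\alpha \to 1$.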

Page 11:

Applications of mutual information in machine learning

Page 12:

Feature selection (Peng et al., 2005)

Goal:

• find a set of features that has the largest dependence

with the target class

• find dependent and independent features

X^T ∈ R^{n×d}

labels

Microarray data processing

d = number of genes

n = number of patients

Page 13:

Feature selection (Peng et al., 2005)

X^T ∈ R^{n×d}

labels

d = number of genes

n = number of patients

Max dependency

Max relevance

Min redundancy, Max relevance

Don’t want dependent features
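The slide's criteria were images; the corresponding formulas from the mRMR paper (Peng et al., 2005), for a candidate feature set $S$ and target class $c$:

$$\text{Max-Dependency: } \max_S I(S; c), \qquad \text{Max-Relevance: } \max_S \frac{1}{|S|}\sum_{x_i\in S} I(x_i; c),$$
$$\text{Min-Redundancy: } \min_S \frac{1}{|S|^2}\sum_{x_i,x_j\in S} I(x_i; x_j),$$

combined as $\max_S [D - R]$: keep features informative about the class, but penalize mutually dependent ones.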

Page 14:

Independent Subspace Analysis

Observation

Hidden, independent sources

(subspaces)

Estimate A and S by observing samples from X only

Goal:

Page 15:

Independent Subspace Analysis

In case of perfect separation, WA is a block-permutation matrix.

Objective:
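A sketch of the usual ISA cost the slide presumably displayed: with observation $x = As$ and demixing estimate $y = Wx$ split into subspace components $y^1,\dots,y^M$, minimize

$$J(W) \;=\; I(y^1,\dots,y^M) \;=\; \sum_{m=1}^M H(y^m) \;-\; H(y),$$

and since $H(y) = H(x) + \log|\det W|$ is constant under a whitening/orthogonality constraint on $W$, the objective reduces to minimizing $\sum_m H(y^m)$.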

Page 16:

Mutual Information Applications

• Information theory, Channel capacity

• Information geometry

• Clustering

• Optimal experiment design, active learning

• Prediction of protein structure

• Drug design

• Boosting

• fMRI data processing

• …

Page 17:

The estimation problem

Page 18:

The estimation problem

GOAL:
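The goal, spelled out from the surrounding slides: given i.i.d. samples $X_{1:n} = (X_1,\dots,X_n)$ from an unknown density $f$ on $\mathbb{R}^d$, construct estimators $\hat H_\alpha(X_{1:n}) \to H_\alpha(f)$ and $\hat I_\alpha(X_{1:n}) \to I_\alpha(X)$, without estimating the density $f$ itself.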

Page 19:

How can we estimate these quantities?

Page 20:

Graph optimization methods

TSP, MST, kNN

Page 21:

History

• 1959, Beardwood, Halton, Hammersley

– Given uniform iid samples on [0,1]². TSP length = ?

• 1976, Karp

– randomly distributed cities

– This asymptotic result can be used to develop a deterministic polynomial-time near-optimal (1+ε) TSP algorithm (a.s.)

– First to show that there exists an algorithm for a stochastic version of an NP-complete problem (TSP) that gives a near-optimal solution in polynomial time (a.s.)

John Hammersley

Oxford, UK

Richard Karp

Berkeley, CA

– uniform iid samples on [0,1]^d. TSP length = ?
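The Beardwood–Halton–Hammersley answer, which the slide presumably displayed: if $X_1, X_2, \dots$ are i.i.d. with density $f$ on $[0,1]^d$ and $L(X_{1:n})$ is the length of the shortest tour through the first $n$ points, then

$$\frac{L(X_{1:n})}{n^{(d-1)/d}} \;\longrightarrow\; \beta_d \int f^{(d-1)/d}(x)\,dx \quad \text{a.s.},$$

where $\beta_d$ is a constant depending only on $d$ (for uniform samples the integral is 1, so the limit is just $\beta_d$).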

Page 22:

• 1981 – TSP ⇒ MST, Minimal Matching graphs, conjecture for kNN

• Rényi’s α-entropy estimation?

observations with density f on [0,1]^d

• Beardwood, Halton, Hammersley

Michael Steele, Joseph Yukich, Wansoo Rhee

1999, Hero, Costa & Michel
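Up to notation, the estimator of Hero, Costa & Michel: weight each graph edge $e$ by $|e|^\gamma$ with $0 < \gamma < d$, let $L_\gamma(X_{1:n})$ be the optimal (TSP/MST/kNN) total weight, and set $\alpha = (d-\gamma)/d$; then

$$\hat H_\alpha(X_{1:n}) \;=\; \frac{1}{1-\alpha}\left[\log\frac{L_\gamma(X_{1:n})}{n^\alpha} - \log\beta\right] \;\longrightarrow\; H_\alpha(f) \quad \text{a.s.},$$

where $\beta$ is the BHH-type constant of the chosen graph functional.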

Page 23:

Steele, Yukich theorem for MST, TSP, Minimal Matching

(Fig. taken from J. Costa and A. Hero)

Sensitive to outliers!

Page 24:

A new result: this procedure works on Set-Nearest Neighborhood graphs, too

Calculate:

S = {1,3,7} ⇒ consider the 1st, 3rd, and 7th nearest neighbors of each node

S = {k} ⇒ consider the kth nearest neighbor of each node

S = {1,2,…,k} ⇒ kNN graphs
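A minimal sketch of computing the Set-Nearest-Neighborhood graph length (my illustration; `snn_graph_length`, the SciPy kd-tree, and the edge exponent `p` are implementation choices, not the talk's code):

```python
import numpy as np
from scipy.spatial import cKDTree

def snn_graph_length(X, S, p=1.0):
    """Total edge weight of the Set-Nearest-Neighborhood graph:
    for each point x_i, add ||x_i - x_i(k)||^p for every k in S,
    where x_i(k) is the k-th nearest neighbor of x_i."""
    tree = cKDTree(X)
    kmax = max(S)
    # query kmax+1 neighbors: the closest match is the point itself
    dists, _ = tree.query(X, k=kmax + 1)
    dists = dists[:, 1:]                # drop the self-match (distance 0)
    cols = [k - 1 for k in S]           # the k-th neighbor sits in column k-1
    return float(np.sum(dists[:, cols] ** p))

# S = {1, 3, 7}: 1st, 3rd, 7th nearest neighbors; S = {1, ..., k}: kNN graph
L = snn_graph_length(np.random.rand(500, 2), S=[1, 3, 7], p=1.0)
```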

Page 25:

Rényi’s entropy estimators using Set-Nearest Neighborhood graphs

(Fig. taken from J. Costa and A. Hero)

Page 26:

How can we get information estimators from entropy estimators?

How can we make the estimators more robust?

Page 27:

The invariance trick

Mutual information is preserved under coordinate-wise, strictly monotone transformations.

Page 28:

Transformation to get uniform marginals

The transformation (copula transformation):

Monotone transform ⇒ uniform marginals

• The information estimation problem is reduced to Rényi’s entropy estimation

• A little problem: we don’t know the F_i distribution functions

Monotone transformation leading to uniform marginals?

Prob theory 101:
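The missing formula is the probability integral transform: if $X^i$ has a continuous CDF $F_i$, then $F_i(X^i) \sim \mathrm{Uniform}[0,1]$. So $Z = (F_1(X^1),\dots,F_d(X^d))$ is a coordinate-wise monotone transform of $X$ with uniform marginals, and by the invariance trick

$$I_\alpha(X) = I_\alpha(Z) = -H_\alpha(Z),$$

since the product of the marginal densities of $Z$ is identically 1 on $[0,1]^d$. Information estimation thus reduces to entropy estimation of the copula-transformed variable.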

Page 29:

Copula methods

• Very popular in financial analysis

– Capture dependence between marginal variables

– An alternative form of normalization

• … but not so widespread in machine learning…

Well, in financial analysis they led to the global financial crisis…

Maybe we should use them in ML more often!

Page 30:

The copula transformation

Monotone transform ⇒ uniform marginals

A little problem: we don’t know the F_i distribution functions

Solution:

Use the empirical distributions F_j^n and the empirical copula transform.

We need this in 1D only! ⇒ no curse of dimensionality.

Page 31:

Empirical copula transformation

We don’t know the F_1,…,F_d distribution functions

⇒ estimate them with empirical distribution functions

Page 32:

Empirical copula transformation

“true” copula transformation:

empirical copula transformation:

empirical copula transformation of the observations:
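Reconstructing the three lost formulas from context:

$$\text{true: } Z = (F_1(X^1),\dots,F_d(X^d)), \qquad \text{empirical: } F_j^n(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{X_i^j \le x\},$$

so the empirical copula transform of observation $X_i$ is $\hat Z_i = (F_1^n(X_i^1),\dots,F_d^n(X_i^d))$, i.e. the vector of coordinate-wise ranks of $X_i$ divided by $n$.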

Page 33:

REGO Algorithm: Rank-based Euclidean Graph Optimization
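A minimal end-to-end sketch of the pipeline as the slides describe it (rank/copula transform, then a Euclidean-graph entropy estimate, negated to get information). The kNN graph, the edge exponent `p`, and the Monte Carlo calibration of the constant `beta` are my assumptions, not necessarily the talk's exact choices:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import rankdata

def empirical_copula(X):
    # rank each coordinate separately and rescale to (0, 1]
    return np.apply_along_axis(rankdata, 0, X) / X.shape[0]

def knn_graph_length(Z, k, p):
    # total weight of the kNN graph with edges weighted ||e||^p
    dists, _ = cKDTree(Z).query(Z, k=k + 1)
    return np.sum(dists[:, 1:] ** p)       # drop the self-match

def rego_information(X, k=5, p=1.0, n_mc=30):
    n, d = X.shape
    alpha = (d - p) / d                    # edge exponent p <-> Renyi order alpha
    Z = empirical_copula(X)
    L = knn_graph_length(Z, k, p)
    # ASSUMPTION: calibrate the BHH-type constant beta by Monte Carlo
    # on uniform samples (the talk may use a known/analytic constant).
    rng = np.random.default_rng(0)
    beta = np.mean([knn_graph_length(rng.random((n, d)), k, p) / n**alpha
                    for _ in range(n_mc)])
    H_alpha = (np.log(L / n**alpha) - np.log(beta)) / (1 - alpha)
    # uniform marginals => I_alpha(X) = -H_alpha(copula of X)
    return -H_alpha
```

For d = 2 and p = 1 this targets α = 1/2; the final negation uses the uniform-marginals identity I_α = −H_α from Page 28.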

Page 34:

Theoretical Results

consistency

Page 35:

Is consistency trivial?

Difficulties:

• The true copula transformation is not known… only the empirical

• The transformed samples lie on a grid and do not come from the true copula distribution

[Fig: samples under the true copula transform vs. the empirical copula transform.]

Page 36:

• Empirical copula transformed points are not iid in n

⇒ Steele & Yukich’s theorem cannot be applied

We need different proofs for the two cases:

• 0 < p ≤ 1: we have the triangle inequality for ||·||^p, but it is not Lipschitz

• 1 < p < d/2: ||·||^p is Lipschitz, but there is no triangle inequality

Difficulties:

Is consistency trivial?

Page 37:

Consistency Results

Main theorem:

Note:

• Consistent MI estimator using ranks only

• Ranks only ⇒ robust

• We don’t have theoretical results for other d and α values…

Page 38:

Empirical Results

Page 39:

Empirical results, consistency in 2D

[Plot: estimate vs. sample size n; k-MST graph with k = 0.8n.]

Page 40:

Empirical results, consistency in 10D

Page 41:

Image registration

n = 4900 (70 × 70 pixels), 200 outliers

Page 42:

Independent subspace analysis

Page 43:

Independent subspace analysis in the presence of outliers

Page 44:

Infinitesimal robustness (finite-sample influence function)

The finite-sample influence function measures the amount of change caused by adding one outlier x (with some modification).

It cannot be arbitrarily big!
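The definition the slide presumably used is the standard finite-sample influence function (Tukey's sensitivity curve, possibly with the modification the slide mentions):

$$\mathrm{IF}_n(x) \;=\; (n+1)\,\Big(\hat I\big(X_{1:n}\cup\{x\}\big) - \hat I\big(X_{1:n}\big)\Big),$$

and because the estimator depends on the sample only through ranks, moving the outlier $x$ arbitrarily far cannot change the ranks beyond their extremes, so $\mathrm{IF}_n(x)$ stays bounded.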

Page 45:

Conclusion

The marriage of seemingly unrelated mathematical areas produced a consistent and robust Rényi information estimator:

Graph optimization methods

TSP, MST, kNN

Copula transformation

Information estimation

Page 46:

Thanks for your attention!

“Hey Rege, Róka Rege, Hey REGŐ Rejtem…

I am calling my grandfather, I feel his soul here.

I am listening to his words, they are breaking the silence.

Impart your knowledge to your grandson please,

1000 years have already passed…,

Knowledge equals life, hey!”