Bregman Information Bottleneck

NIPS’03, Whistler, December 2003

Koby Crammer, Hebrew University of Jerusalem

Noam Slonim, Princeton University


Page 1: Bregman Information Bottleneck NIPS’03, Whistler December 2003 Koby Crammer Hebrew University of Jerusalem Noam Slonim Princeton University


Page 2: Motivation

• Extend the IB to a broad family of representations

• Relation to the exponential family

Hello, world

Multinomial distribution

Vectors

Page 3: Outline

• Rate-Distortion Formulation

• Bregman Divergences

• Bregman IB

• Statistical Interpretation

• Summary

Page 4: Information Bottleneck

[Diagram relating X, T and Y]

X is represented by the vector [ p(y=1|X) … p(y=n|X) ]

T is represented by the vector [ p(y=1|T) … p(y=n|T) ]

Page 5: Rate-Distortion Formulation

• Input

• Variables

• Distortion
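The equations behind these three bullets are not present in the transcript. As a hedged reminder, the standard rate-distortion statement of the IB, which this slide presumably follows, can be written as below. The trade-off parameter β and the bracket notation for the expected distortion are assumptions about notation, not text recovered from the slide.

```latex
\begin{align*}
\text{Input:} \quad & p(x, y) \\
\text{Variables:} \quad & p(t \mid x) \\
\text{Distortion:} \quad & d(x, t) = D_{\mathrm{KL}}\big[\, p(y \mid x) \,\|\, p(y \mid t) \,\big] \\
\text{Functional:} \quad & \min_{p(t \mid x)} \; I(X; T) + \beta \, \langle d(x, t) \rangle_{p(x)\, p(t \mid x)}
\end{align*}
```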

Page 6: Self-Consistent Equations

• Boltzmann distribution:

• Markov + Bayes

• Marginal
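The three bullets name equations that are missing here. In the standard IB they take the following form, given as a sketch in assumed notation (Z is the normalizing partition function, β the trade-off parameter):

```latex
\begin{align*}
\text{Boltzmann distribution:} \quad
  p(t \mid x) &= \frac{p(t)}{Z(x, \beta)}
  \exp\!\Big( -\beta \, D_{\mathrm{KL}}\big[\, p(y \mid x) \,\|\, p(y \mid t) \,\big] \Big) \\
\text{Markov + Bayes:} \quad
  p(y \mid t) &= \frac{1}{p(t)} \sum_x p(y \mid x)\, p(t \mid x)\, p(x) \\
\text{Marginal:} \quad
  p(t) &= \sum_x p(t \mid x)\, p(x)
\end{align*}
```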

Page 7: Bregman Divergences

f : S → R

Bf(v||u) = f(v) - (f(u) + f’(u)(v-u))

[Figure: graph of f with the points (u, f(u)) and (v, f(v)), and the point (v, f(u) + f’(u)(v-u)) on the tangent to f at u; Bf(v||u) is the vertical gap at v between f and that tangent]
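The definition can be turned into a few lines of code. The sketch below is an illustration, not the authors' implementation: it assumes vectors are plain Python lists and that the scalar definition is applied to vectors by replacing f’(u)(v-u) with an inner product against the gradient. The function names are hypothetical. The two example functions are the ones listed on the Special Cases slide, applied coordinate-wise.

```python
import math

def bregman_divergence(f, grad_f, v, u):
    """B_f(v||u) = f(v) - (f(u) + <grad f(u), v - u>) for list-valued v and u."""
    linear_term = sum(g * (vi - ui) for g, vi, ui in zip(grad_f(u), v, u))
    return f(v) - (f(u) + linear_term)

# f(x) = (1/2) * sum_i x_i^2, with gradient x
def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)

def half_sq_norm_grad(x):
    return list(x)

# f(x) = sum_i (x_i log x_i - x_i), with gradient log x (needs x_i > 0)
def neg_entropy(x):
    return sum(xi * math.log(xi) - xi for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) for xi in x]
```

For two points on the simplex, the second function gives the Kullback-Leibler divergence, and the first gives half the squared Euclidean distance.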

Page 8: Bregman IB: Rate-Distortion Formulation

• Functional

• Bregman Function

• Input

• Variables

• Distortion

Page 9: Self-Consistent Equations

• Boltzmann distribution:

• Prototypes: convex combination of input vectors

• Marginal
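One way to read the three bullets is as a fixed-point iteration: soft assignments follow a Boltzmann distribution in the divergence, prototypes are assignment-weighted averages of the input vectors, and the cluster marginal is the average assignment. The sketch below is a minimal illustration under those assumptions. The function name, the argument order `div(data_vector, prototype)`, and the use of half the squared Euclidean distance as the example divergence are choices made here, not details recovered from the slide.

```python
import math

def half_sq_distance(v, u):
    """Example divergence: Bregman divergence of f(x) = (1/2)||x||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(v, u))

def bregman_ib_step(data, p_x, prototypes, p_t, beta, div):
    """One pass over the three self-consistent equations.

    Assumes every p_t[t] > 0 and that no cluster loses all of its mass.
    """
    n, k, dim = len(data), len(prototypes), len(data[0])

    # Boltzmann distribution: p(t|x) proportional to p(t) * exp(-beta * div(x, t))
    assign = []
    for v in data:
        logits = [math.log(p_t[t]) - beta * div(v, prototypes[t]) for t in range(k)]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logits]
        z = sum(weights)
        assign.append([w / z for w in weights])

    # Marginal: p(t) = sum_x p(x) p(t|x)
    new_p_t = [sum(p_x[i] * assign[i][t] for i in range(n)) for t in range(k)]

    # Prototypes: convex combination of the input vectors with weights p(x|t)
    new_prototypes = [
        [sum(p_x[i] * assign[i][t] * data[i][d] for i in range(n)) / new_p_t[t]
         for d in range(dim)]
        for t in range(k)
    ]
    return assign, new_prototypes, new_p_t
```

Calling `bregman_ib_step` repeatedly, feeding the returned prototypes and marginal back in, iterates the equations; a different divergence can be passed as `div` without changing the loop.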

Page 10: Special Cases

• Information Bottleneck:

Bregman function: f(x) = x log(x) - x

Domain: Simplex

Divergence: Kullback-Leibler

• Soft k-means:

Bregman function: f(x) = (1/2) x^2

Domain: R^n

Divergence: squared Euclidean distance [Still, Bialek, Bottou, NIPS 2003]
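Both divergences follow directly from the definition of Bf. The short derivation below assumes the scalar functions are applied coordinate-wise and summed over the coordinates of the vector, which the slide does not state explicitly.

```latex
\begin{align*}
f(x) = \sum_i \big( x_i \log x_i - x_i \big): \quad
B_f(v \,\|\, u)
  &= \sum_i \Big( v_i \log \frac{v_i}{u_i} - v_i + u_i \Big)
   = \sum_i v_i \log \frac{v_i}{u_i}
   \quad \text{when } \sum_i v_i = \sum_i u_i = 1 \\
f(x) = \tfrac{1}{2} \|x\|^2: \quad
B_f(v \,\|\, u)
  &= \tfrac{1}{2} \|v\|^2 - \tfrac{1}{2} \|u\|^2 - \langle u, v - u \rangle
   = \tfrac{1}{2} \|v - u\|^2
\end{align*}
```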

Page 11: Bregman IB

[Diagram labels: Information Bottleneck, Bregman Clustering, Rate-Distortion, Exponential Family]

Page 12: Exponential Family

• Expectation parameters:

• Examples (single dimension):

Normal

Poisson
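The formulas for the expectation parameters and for the two examples are missing here. A standard way to write them is sketched below. The natural parameter θ, log-partition function ψ, and base measure p_0 are notation introduced for this sketch, and the Normal example is assumed to have unit variance.

```latex
\begin{align*}
p(x \mid \theta) &= p_0(x)\, \exp\big( \theta x - \psi(\theta) \big),
  & \mu &= \mathbb{E}[x] = \psi'(\theta) \\
\text{Normal (unit variance):} \quad \psi(\theta) &= \tfrac{1}{2} \theta^2,
  & \mu &= \theta \\
\text{Poisson:} \quad \psi(\theta) &= e^{\theta},
  & \mu &= e^{\theta} = \lambda
\end{align*}
```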

Page 13: Exponential Family and Bregman Divergences

• Expectation parameters:

• Properties:
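The properties listed on the slide were not recovered. The well-known identity linking the two objects is sketched here in assumed notation: θ is the natural parameter, ψ the log-partition function, p_0 the base measure, and f the convex conjugate of ψ. It holds wherever f(x) is defined.

```latex
\begin{align*}
f(\mu) &= \sup_{\theta} \big( \langle \theta, \mu \rangle - \psi(\theta) \big) \\
\log p(x \mid \theta) &= -B_f(x \,\|\, \mu) + f(x) + \log p_0(x),
  \qquad \mu = \nabla \psi(\theta)
\end{align*}
```

Only the first term depends on the parameter, so maximizing the likelihood over μ is the same as minimizing the Bregman divergence from the observation to μ.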

Page 14: Illustration

Page 15: Exponential Family and Bregman Divergences

• Expectation parameters:

• Properties:

Page 16: Back to Distributional Clustering

• Distortion:

• Data vectors and prototypes: expectation parameters

• Question: For which exponential-family distribution do we have [equation not recovered]?

Answer: Poisson
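The answer can be checked numerically under one assumption about the missing equation: that it asks which family has, as its Bregman divergence, the divergence generated by f(x) = x log(x) - x. For a Poisson distribution with mean mu, the log-likelihood lost by using mu instead of the observation x itself as the mean equals x log(x/mu) - x + mu. The function names below are hypothetical, and the sketch uses only the standard library.

```python
import math

def poisson_log_pmf(x, mu):
    """log of the Poisson probability of count x under mean mu (x >= 0, mu > 0)."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def generalized_kl(x, mu):
    """Bregman divergence of f(z) = z log z - z between scalars x > 0 and mu > 0."""
    return x * math.log(x / mu) - x + mu

def log_likelihood_gap(x, mu):
    """Log-likelihood lost by modelling count x with mean mu instead of mean x."""
    return poisson_log_pmf(x, x) - poisson_log_pmf(x, mu)
```

For any positive count `x` and mean `mu`, `log_likelihood_gap(x, mu)` and `generalized_kl(x, mu)` agree up to floating-point error. Summed over the coordinates of a count vector, this is the divergence that reduces to Kullback-Leibler on the simplex.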

Page 17: Illustration

[Figure: sample sequence "a a b a a a b a a a"; bar charts over the symbols a and b with values .8 / .2 (axis label Pr) and 60 / 40; panel labels: Multinomial Distribution, Product of Poisson Distributions]

Page 18: Back to Distributional Clustering

• Information Bottleneck: distributional clustering of Poisson distributions

• (Soft) k-means: (soft) clustering of Normal distributions

Page 19: Maximum Likelihood Perspective

• Distortion

• Input: Observations

• Output: Parameters of distribution

• IB functional: EM [Elidan & Friedman, before]

Page 20: Back to Self-Consistent Equations

• Posterior:

• Partition Function:

Weighted β-norm of the likelihood

• β → ∞: the most likely cluster governs

• β → 0: clusters collapse into a single prototype
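The two limits can be illustrated numerically. The sketch below assumes the distortion is the negative log-likelihood up to a constant, so the Boltzmann weight of cluster t is p(t) times its likelihood raised to the power beta, and the partition function is the sum of those weights. The names `prior`, `likelihoods`, and both functions are hypothetical.

```python
import math

def _log_weights(prior, likelihoods, beta):
    # log of p(t) * L_t^beta for every cluster t
    return [math.log(p) + beta * math.log(l) for p, l in zip(prior, likelihoods)]

def posterior(prior, likelihoods, beta):
    """p(t|x) proportional to p(t) * L_t^beta, computed stably in log space."""
    logs = _log_weights(prior, likelihoods, beta)
    top = max(logs)
    weights = [math.exp(s - top) for s in logs]
    z = sum(weights)
    return [w / z for w in weights]

def weighted_beta_norm(prior, likelihoods, beta):
    """(sum_t p(t) * L_t^beta)^(1/beta): the partition function to the power 1/beta."""
    logs = _log_weights(prior, likelihoods, beta)
    top = max(logs)
    log_z = top + math.log(sum(math.exp(s - top) for s in logs))
    return math.exp(log_z / beta)
```

With a large beta the posterior concentrates on the cluster with the highest likelihood and the weighted norm approaches that likelihood. With beta near zero the posterior falls back to the prior p(t) for every input, so every prototype becomes the same average of the data.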

Page 21: Summary

• Bregman Information Bottleneck: clustering/compression for many representations and divergences

• Statistical interpretation: clustering of distributions from the exponential family; EM-like formulation

• Current work: algorithms; characterizing distortion measures which also yield Boltzmann distributions; general distortion measures