Multivariate Information Bottleneck
Noam Slonim, Princeton University, Lewis-Sigler Institute for Integrative Genomics
Nir Friedman, Naftali Tishby, Hebrew University, School of Computer Science and Engineering
2
Multivariate Information Bottleneck - Preview
- A general framework for specifying a new family of clustering problems
- Almost all of these problems are not treated by standard clustering approaches
- Insights into, and demonstrations of, why these problems are important
- A general optimal solution for all these problems, based on a single information-theoretic principle
- Applications to text analysis, gene expression data, and more...
3
Multivariate IB – introduction
- Original IB: compressing one variable, X1, into T1 while preserving the information about some other single variable, X2
[Figure: the input distribution P(X1,X2), and the distribution P(T1,X2) after X1 is compressed into T1.]
4
Multivariate IB – introduction (cont.)
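- However, we could think of other problems, e.g., symmetric compression, in which X1 is compressed into T1 and X2 is compressed into T2:
[Figure: P(X1,X2), with X1 compressed into T1 and X2 compressed into T2.]
Question: How can we formulate and solve all such problems under one unifying principle?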
5
(a few words about …) Bayesian Networks
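- A Bayes net over (X1,…,Xn) is a DAG G whose vertices correspond to the random variables, and which represents the factorization
$$P(X_1,\dots,X_n) = \prod_{i} P\left(X_i \mid \mathrm{Pa}_i^{G}\right)$$
- P(X1,…,Xn) is consistent with G iff each Xi is independent of all its non-descendant variables, given its parents Pa_i
[Figure: an example Bayes net over X1, X2, X3, X4, annotated with a conditional-independence statement Ind(…) involving X1, X2, and X4.]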
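The factorization and the independence property can be checked on a toy example. This is a minimal Python sketch, not taken from the slides: the DAG (X1 → X2, X1 → X3, X2 → X4), the conditional probability tables, and the helper names are all hypothetical. It builds the joint from the local conditionals and confirms that X4 is independent of its non-descendant X1 given its parent X2.

```python
from itertools import product

# Hypothetical DAG over binary variables: X1 -> X2, X1 -> X3, X2 -> X4.
ORDER = ["X1", "X2", "X3", "X4"]
PARENTS = {"X1": (), "X2": ("X1",), "X3": ("X1",), "X4": ("X2",)}

# Hypothetical CPTs: CPT[var][parent values] = P(var = 1 | parents).
CPT = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.5, (1,): 0.9},
    "X4": {(0,): 0.1, (1,): 0.6},
}


def joint_prob(assignment):
    """P(x1, ..., xn) = prod_i P(x_i | pa_i), the Bayes-net factorization."""
    p = 1.0
    for var in ORDER:
        pa_vals = tuple(assignment[pa] for pa in PARENTS[var])
        p_one = CPT[var][pa_vals]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p


# Full joint table, keyed by (x1, x2, x3, x4).
P = {xs: joint_prob(dict(zip(ORDER, xs))) for xs in product((0, 1), repeat=4)}


def prob_x4_given(x4, given):
    """P(X4 = x4 | given), where given maps 0-based variable positions to values."""
    match = [(xs, p) for xs, p in P.items() if all(xs[i] == v for i, v in given.items())]
    return sum(p for xs, p in match if xs[3] == x4) / sum(p for _, p in match)


print(round(sum(P.values()), 12))                 # 1.0
print(round(prob_x4_given(1, {0: 0, 1: 1}), 12))  # 0.6, same as P(X4=1 | X2=1)
print(round(prob_x4_given(1, {0: 1, 1: 1}), 12))  # 0.6, X1 does not matter given X2
```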
6
Multi-information and Bayes nets
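- The information that X1,…,Xn contain about each other is captured by the multi-information:
$$I(X_1,\dots,X_n) = \sum_{x_1,\dots,x_n} P(x_1,\dots,x_n)\,\log\frac{P(x_1,\dots,x_n)}{P(x_1)\cdots P(x_n)}$$
- If P(X1,…,Xn) is consistent with G, then:
$$I(X_1,\dots,X_n) = I^{G} \equiv \sum_{i} I\left(X_i;\mathrm{Pa}_i^{G}\right)$$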
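The decomposition can be checked numerically. The sketch below assumes a hypothetical three-variable chain X1 → X2 → X3 with made-up probabilities; it computes the multi-information directly from its definition and compares it with the sum over the graph's families (X1 has no parents, so it contributes nothing). Logarithms are base 2, so the values are in bits.

```python
from itertools import product
from math import log2


def marginal(P, idx):
    """Marginal over the variables at positions idx, keyed by their value tuple."""
    out = {}
    for xs, p in P.items():
        key = tuple(xs[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def multi_information(P, n):
    """I(X1,...,Xn) = sum_x P(x) log2[P(x) / (P(x1)...P(xn))], in bits."""
    singles = [marginal(P, (i,)) for i in range(n)]
    total = 0.0
    for xs, p in P.items():
        if p > 0:
            indep = 1.0
            for i in range(n):
                indep *= singles[i][(xs[i],)]
            total += p * log2(p / indep)
    return total


def mutual_information(P, a, b):
    """I(X_a; X_b) in bits, where a and b are tuples of variable positions."""
    pab, pa, pb = marginal(P, a + b), marginal(P, a), marginal(P, b)
    return sum(p * log2(p / (pa[k[:len(a)]] * pb[k[len(a):]]))
               for k, p in pab.items() if p > 0)


# Hypothetical chain X1 -> X2 -> X3 over binary variables.
p1 = {0: 0.4, 1: 0.6}
p2_given_1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p3_given_2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}
P = {(a, b, c): p1[a] * p2_given_1[a][b] * p3_given_2[b][c]
     for a, b, c in product((0, 1), repeat=3)}

lhs = multi_information(P, 3)
# I^G = I(X2; X1) + I(X3; X2) for this chain.
rhs = mutual_information(P, (1,), (0,)) + mutual_information(P, (2,), (1,))
print(abs(lhs - rhs) < 1e-9)  # True
```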
7
Original IB through Bayes net formulation
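- Original IB: compress X1 into T1 while preserving the information about X2, i.e., minimize I(T1;X1) - β I(T1;X2)
- New generalized formulation:
$$\text{Minimize } \mathcal{L}[P(T \mid X)] = I^{G_{in}} - \beta\, I^{G_{out}}$$
- G_in (what compresses what): the dependency between X1 and X2, plus T1 compressing X1, so
$$I^{G_{in}} = I(T_1;X_1) + I(X_1;X_2), \quad \text{where } I(X_1;X_2) \text{ is constant}$$
- G_out (what predicts what): T1 predicts X2, so
$$I^{G_{out}} = I(T_1;X_2)$$
- Which in this case means:
$$\text{Minimize } I(T_1;X_1) - \beta\, I(T_1;X_2) + \text{Const}$$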
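A small numerical check of this reduction: for any encoder P(t1|x1), the generalized objective I^{G_in} - β I^{G_out} and the original IB objective differ only by the constant I(X1;X2). The input distribution P12, the two encoders, and the helper names are hypothetical; mutual information is in bits.

```python
from math import log2


def mi(pxy):
    """Mutual information (bits) of a joint distribution given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)


# Hypothetical input distribution P(X1, X2): X1 in {0, 1, 2}, X2 in {0, 1}.
P12 = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2, (2, 0): 0.05, (2, 1): 0.25}


def objectives(q, beta):
    """q[x1][t] = P(t | x1). Returns (I^Gin - beta*I^Gout, I(T1;X1) - beta*I(T1;X2))."""
    p_t_x1, p_t_x2 = {}, {}
    for (x1, x2), p in P12.items():
        for t, qt in q[x1].items():
            # T1 depends on X1 only, so P(t, x1, x2) = P(x1, x2) P(t | x1).
            p_t_x1[(t, x1)] = p_t_x1.get((t, x1), 0.0) + p * qt
            p_t_x2[(t, x2)] = p_t_x2.get((t, x2), 0.0) + p * qt
    i_in = mi(p_t_x1) + mi(P12)  # G_in: I(T1;X1) + I(X1;X2)
    i_out = mi(p_t_x2)           # G_out: I(T1;X2)
    return i_in - beta * i_out, mi(p_t_x1) - beta * mi(p_t_x2)


encoders = [
    {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}, 2: {1: 1.0}},
    {0: {0: 0.8, 1: 0.2}, 1: {1: 1.0}, 2: {0: 0.3, 1: 0.7}},
]
gaps = [gen - ib for gen, ib in (objectives(q, beta=2.0) for q in encoders)]
print(all(abs(g - mi(P12)) < 1e-12 for g in gaps))  # True
```

Because the gap is the same for every encoder, both objectives have the same minimizer.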
8
Alternative formulation: preliminaries
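- For a given DAG G and distribution P, define:
$$\mathrm{KL}(P \,\|\, G) = \min_{Q \text{ consistent with } G} \mathrm{KL}(P \,\|\, Q)$$
- For P that is consistent with G_in:
$$\mathrm{KL}(P \,\|\, G_{out}) = I^{G_{in}} - I^{G_{out}}$$
where I^{G_in} is the real multi-information in P(X,T), and I^{G_out} is the multi-information computed as though P(X,T) were consistent with G_out.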
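The identity can be checked numerically using one standard fact that the slide leaves implicit: the Q consistent with G that is closest to P is Q*(x) = ∏ P(x_i | Pa_i^G), built from P's own conditionals, and KL(P‖Q*) equals the multi-information of P minus I^G for any P. Applied with G = G_out to a P consistent with G_in, this gives the equation above. The joint P and the chain G below are hypothetical, and P is deliberately not consistent with G.

```python
from itertools import product
from math import log2

# Hypothetical joint over three binary variables; it does not factorize along G.
WEIGHTS = [3, 1, 2, 2, 1, 4, 2, 5]
P = {xs: w / sum(WEIGHTS) for xs, w in zip(product((0, 1), repeat=3), WEIGHTS)}
# Hypothetical DAG G: X1 -> X2 -> X3, parents given by variable position.
PARENTS = {0: (), 1: (0,), 2: (1,)}


def marg(idx):
    """Marginal of P over the variable positions in idx."""
    out = {}
    for xs, p in P.items():
        key = tuple(xs[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond(i, xs):
    """P(x_i | pa_i) computed from P itself."""
    pa = PARENTS[i]
    pa_vals = tuple(xs[j] for j in pa)
    return marg(pa + (i,))[pa_vals + (xs[i],)] / marg(pa)[pa_vals]


def kl(p, q):
    """KL divergence in bits between two distributions given as dicts."""
    return sum(pv * log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def mi(a, b):
    """I(X_a; X_b) in bits for tuples of variable positions."""
    pab, pa, pb = marg(a + b), marg(a), marg(b)
    return sum(p * log2(p / (pa[k[:len(a)]] * pb[k[len(a):]])) for k, p in pab.items() if p > 0)


# Closest distribution consistent with G: Q*(x) = prod_i P(x_i | pa_i).
q_star = {xs: cond(0, xs) * cond(1, xs) * cond(2, xs) for xs in P}
# Multi-information of P = KL(P || product of its marginals).
multi_info = kl(P, {xs: marg((0,))[(xs[0],)] * marg((1,))[(xs[1],)] * marg((2,))[(xs[2],)]
                    for xs in P})
multi_info_g = sum(mi((i,), PARENTS[i]) for i in PARENTS if PARENTS[i])

print(abs(kl(P, q_star) - (multi_info - multi_info_g)) < 1e-9)  # True
# Another Q consistent with G (the uniform distribution) is at least as far from P.
print(kl(P, {xs: 1 / 8 for xs in P}) >= kl(P, q_star))  # True
```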
12
Beyond the original IB [Slonim, Friedman, Tishby]
- Input variables: X1, X2, X3, X4, …, Xn
- Compression (bottleneck) variables: T1, T2, T3, …, Tk
- G_in: dependencies to minimize
- G_out: dependencies to maximize
- Parameters: $P(T_j \mid \mathrm{Pa}_j^{G_{in}})$
13
A simple example: Symmetric IB
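- G_in (what compresses what): T1 compresses X1 and T2 compresses X2, so
$$I^{G_{in}} = I(T_1;X_1) + I(T_2;X_2)$$
- G_out (what predicts what): T1 and T2 should be informative about each other, so
$$I^{G_{out}} = I(T_1;T_2)$$
- Minimize:
$$\mathcal{L} = I(T_1;X_1) + I(T_2;X_2) - \beta\, I(T_1;T_2)$$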
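The symmetric IB functional can be evaluated directly for given encoders. In this sketch the input distribution P12 and the encoders P(t1|x1), P(t2|x2) are hypothetical; P(t1,t2) is obtained by summing P(x1,x2) P(t1|x1) P(t2|x2), since in G_in each T depends only on its own X. Mutual information is in bits.

```python
from math import log2


def mi(joint):
    """Mutual information (bits) of a joint given as a 2-D list joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (pa[i] * pb[j]))
               for i, row in enumerate(joint) for j, p in enumerate(row) if p > 0)


def symmetric_ib(P12, q1, q2, beta):
    """L = I(T1;X1) + I(T2;X2) - beta * I(T1;T2).

    P12[x1][x2] = P(x1, x2); q1[x1][t1] = P(t1 | x1); q2[x2][t2] = P(t2 | x2).
    """
    p1 = [sum(row) for row in P12]
    p2 = [sum(col) for col in zip(*P12)]
    joint_t1_x1 = [[q1[x][t] * p1[x] for x in range(len(p1))] for t in range(len(q1[0]))]
    joint_t2_x2 = [[q2[x][t] * p2[x] for x in range(len(p2))] for t in range(len(q2[0]))]
    joint_t1_t2 = [[sum(P12[a][b] * q1[a][t1] * q2[b][t2]
                        for a in range(len(p1)) for b in range(len(p2)))
                    for t2 in range(len(q2[0]))] for t1 in range(len(q1[0]))]
    return mi(joint_t1_x1) + mi(joint_t2_x2) - beta * mi(joint_t1_t2)


P12 = [[0.25, 0.05], [0.05, 0.25], [0.2, 0.2]]  # hypothetical P(X1, X2)
q1_hard = [[1, 0], [0, 1], [1, 0]]              # x1 in {0, 2} -> t1 = 0; x1 = 1 -> t1 = 1
q2_hard = [[1, 0], [0, 1]]                      # T2 = X2
for beta in (1.0, 10.0):
    print(beta, symmetric_ib(P12, q1_hard, q2_hard, beta))
# Uniform encoders carry no information, so every term (and L) is ~0 for any beta.
print(symmetric_ib(P12, [[0.5, 0.5]] * 3, [[0.5, 0.5]] * 2, beta=10.0))
```

Larger β puts more weight on the I(T1;T2) term, so the same hard encoders score lower as β grows.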
14
A multivariate formal optimal solution
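- Minimizing I^{G_in} - β I^{G_out} with respect to $P(t_j \mid \mathrm{Pa}_j^{G_{in}})$ yields the formal solution
$$P\left(t_j \mid \mathrm{Pa}_j^{G_{in}}\right) = \frac{P(t_j)}{Z\left(\mathrm{Pa}_j^{G_{in}},\beta\right)}\, \exp\left(-\beta\, d\left(\mathrm{Pa}_j^{G_{in}}, t_j\right)\right)$$
- Where now d(Pa_j, t_j) is a generalized (KL) distortion measure…
- For example, in symmetric IB:
$$d(x_1,t_1) = D_{KL}\left[P(T_2 \mid x_1)\,\|\,P(T_2 \mid t_1)\right]$$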
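The self-consistent equation can be turned into an iterative update. The sketch below applies it to T1 only, holding a hypothetical encoder P(t2|x2) fixed, with a hypothetical P(X1,X2), initialization, and β = 40 (chosen here so that T1 keeps a non-trivial split); a full algorithm would alternate between T1 and T2 and would typically anneal β. It uses natural logarithms, which only rescales β relative to bits, and `update_t1` is a hypothetical helper name.

```python
from math import exp, log


def kl(p, q):
    """KL divergence in nats between two discrete distributions given as lists."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def update_t1(P12, q1, q2, beta):
    """One update P(t1|x1) <- P(t1) exp(-beta * d(x1, t1)) / Z(x1, beta) for symmetric IB.

    P12[x1][x2] = P(x1, x2); q1[x1][t1] = P(t1 | x1); q2[x2][t2] = P(t2 | x2) (held fixed).
    d(x1, t1) = KL[P(T2 | x1) || P(T2 | t1)]. Encoders must be strictly positive here.
    """
    n1, n2, k1, k2 = len(P12), len(P12[0]), len(q1[0]), len(q2[0])
    p_x1 = [sum(row) for row in P12]
    # P(t2 | x1) = sum_x2 P(x2 | x1) P(t2 | x2)
    p_t2_x1 = [[sum(P12[a][b] / p_x1[a] * q2[b][s] for b in range(n2)) for s in range(k2)]
               for a in range(n1)]
    # P(t1) and P(t2 | t1) = sum_x1 P(x1 | t1) P(t2 | x1)
    p_t1 = [sum(p_x1[a] * q1[a][t] for a in range(n1)) for t in range(k1)]
    p_t2_t1 = [[sum(p_x1[a] * q1[a][t] * p_t2_x1[a][s] for a in range(n1)) / p_t1[t]
                for s in range(k2)] for t in range(k1)]
    new_q1 = []
    for a in range(n1):
        w = [p_t1[t] * exp(-beta * kl(p_t2_x1[a], p_t2_t1[t])) for t in range(k1)]
        z = sum(w)  # the normalization Z(x1, beta)
        new_q1.append([wi / z for wi in w])
    return new_q1


P12 = [[0.25, 0.05], [0.05, 0.25], [0.2, 0.2]]  # hypothetical P(X1, X2)
q1 = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]]       # hypothetical initial P(t1 | x1)
q2 = [[0.7, 0.3], [0.3, 0.7]]                   # hypothetical fixed P(t2 | x2)
for _ in range(20):
    q1 = update_t1(P12, q1, q2, beta=40.0)
print([[round(v, 3) for v in row] for row in q1])
```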
15
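Multivariate IB algorithms – example: aIB [Slonim, Friedman, Tishby, 2002]
[Figure: agglomerative merging, starting from singleton clusters W1, W2, …, WN, merging one pair per step (e.g., W3 and W4), until a single cluster {W1, W2, …, WN} remains.]
- Which pair to merge? The pair with the smallest merger cost:
$$\Delta\mathcal{L}_j\left(t_j^l, t_j^r\right) = \mathcal{L}_{before} - \mathcal{L}_{after} = \left(P(t_j^l) + P(t_j^r)\right)\cdot \tilde{d}_j\left(t_j^l, t_j^r\right)$$
- Where now d̃_j(t_j^l, t_j^r) is a generalized (JS) distortion measure…
- For example, in symmetric aIB:
$$\tilde{d}\left(t_1^l, t_1^r\right) = JS_{\Pi}\left[P(T_2 \mid t_1^l),\, P(T_2 \mid t_1^r)\right]$$
where the weights Π are proportional to P(t_1^l) and P(t_1^r).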
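A minimal sketch of the greedy merge step, using only the JS term shown for symmetric aIB and merging only on the T1 side while T2 is held fixed (a full symmetric aIB would alternate between the two sides). The joint P(T1,T2), with four initial T1 singletons and two T2 values, is hypothetical, as are the helper names.

```python
from itertools import combinations
from math import log


def kl(p, q):
    """KL divergence in nats between two discrete distributions given as lists."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q, w1, w2):
    """Weighted Jensen-Shannon divergence JS_Pi with Pi = (w1, w2), w1 + w2 = 1."""
    m = [w1 * a + w2 * b for a, b in zip(p, q)]
    return w1 * kl(p, m) + w2 * kl(q, m)


def merge_cost(row_l, row_r):
    """Delta L = (P(t_l) + P(t_r)) * JS_Pi[P(T2 | t_l), P(T2 | t_r)], from rows of P(T1, T2)."""
    pl, pr = sum(row_l), sum(row_r)
    cond_l = [v / pl for v in row_l]  # P(T2 | t_l)
    cond_r = [v / pr for v in row_r]  # P(T2 | t_r)
    return (pl + pr) * js(cond_l, cond_r, pl / (pl + pr), pr / (pl + pr))


def agglomerate(joint, n_clusters):
    """Greedy aIB on the T1 side: repeatedly merge the pair with the smallest Delta L."""
    clusters = [([i], list(row)) for i, row in enumerate(joint)]
    while len(clusters) > n_clusters:
        l, r = min(combinations(range(len(clusters)), 2),
                   key=lambda lr: merge_cost(clusters[lr[0]][1], clusters[lr[1]][1]))
        members = clusters[l][0] + clusters[r][0]
        row = [a + b for a, b in zip(clusters[l][1], clusters[r][1])]  # merged P(t, T2)
        clusters = [c for k, c in enumerate(clusters) if k not in (l, r)] + [(members, row)]
    return [sorted(c[0]) for c in clusters]


# Hypothetical P(T1, T2): four initial T1 clusters (rows) and two T2 values (columns).
joint = [[0.20, 0.02], [0.18, 0.03], [0.02, 0.25], [0.03, 0.27]]
print(sorted(agglomerate(joint, 2)))  # [[0, 1], [2, 3]]
```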
16
Symmetric aIB compression: documents, words
- Accuracy of symmetric aIB vs. original aIB over 3 small datasets:
[Figure: bar chart of accuracy (axis from 60 to 90) for original aIB and symmetric aIB on datasets Test2, Test4, and Test5.]
- Word clusters provide a more robust representation…
17
Symmetric IB through Deterministic Annealing
Data: 20,000 messages from 20 different discussion groups [Lang, 95]
- W – a word in the corpus
- C – the class (newsgroup) of the message
P(W=‘bible’, C=‘alt.atheism’): the probability that choosing a random position in the corpus would select the word ‘bible’ in a message of the newsgroup (class) ‘alt.atheism’…
[Figure: log P(C,W) shown as a matrix, words × classes.]
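This definition of P(W,C) amounts to normalized co-occurrence counts over token positions. A minimal sketch, assuming lowercase whitespace tokenization (the slides do not specify the preprocessing) and a hypothetical two-message corpus; `word_class_joint` is a hypothetical helper name.

```python
from collections import Counter


def word_class_joint(messages):
    """Estimate P(W, C) from (class_label, text) pairs.

    Every token position counts once, so P(W=w, C=c) is the fraction of all corpus
    positions that hold word w inside a message of class c. Tokenization here is a
    simple lowercase whitespace split (an assumption, not the original preprocessing).
    """
    counts = Counter()
    for label, text in messages:
        for word in text.lower().split():
            counts[(word, label)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


corpus = [  # hypothetical two-message corpus with 9 token positions in total
    ("alt.atheism", "The bible and faith"),
    ("sci.space", "the orbit of the moon"),
]
P = word_class_joint(corpus)
print(P[("bible", "alt.atheism")])  # 1 of 9 positions
print(P[("the", "sci.space")])      # 2 of 9 positions
```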
19
Symmetric IB through Deterministic Annealing
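[Figure: P(T_C,T_W), newsgroup clusters vs. word clusters. Newsgroup clusters shown: {alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*} and {comp.*, misc.forsale, sci.crypt, sci.electronics}. Word clusters shown include {car, turkish, game, team, jesus, gun, hockey, …} and {xfile, image, encryption, window, dos, mac, …}.]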
25
Symmetric IB through Deterministic Annealing
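[Figure: P(T_C,T_W), newsgroup clusters vs. word clusters, showing the newsgroup cluster {alt.atheism, soc.religion.christian, talk.religion.misc} and the word cluster {atheists, christianity, jesus, bible, sin, faith, …}.]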
26
Symmetric aIB compression: genes, samples
Data: expression of 500 “informative” genes vs. 72 leukemia samples (Golub et al., 1999)
[Figure: the gene expression data shown as a log-probability matrix, genes × samples.]
27
Symmetric aIB compression: genes, samples
Data after symmetric aIB compression:
[Figure: the joint distribution over the compressed variables T_G and T_S, 10 gene clusters × 8 sample clusters (values from 0.1 to 0.7). Sample-cluster labels: ALL B-cell hosp1; ALL B-cell hosp1; ALL T-cell hosp1 Male; BM B-cell; BM B-cell; AML; AML hosp2; AML hosp3. Gene probe IDs shown: X00437_s_at, M12886_at, X76223_s_at, M59807_at, U23852_s_at, D00749_s_at, U89922_s_at, X03934_at, U50743_at, M21624_at, M28826_at, M37271_s_at, X59871_at, X14975_at, M16336_s_at, L05148_at, M28825_at.]
28
Another example: parallel IB
- Consider a document collection with different topics and different writing styles:
[Figure: a collection of documents, each labeled with one of four topics, topic1 to topic4.]
29
Another example: parallel IB (cont.)
- One possible “legitimate” partition is by topic:
[Figure: the documents grouped into four clusters: Topic1, Topic2, Topic3, Topic4.]
30
Another example: parallel IB (cont.)
- And another possible “legitimate” partition is by writing style:
[Figure: the same documents grouped into three clusters: Style1, Style2, Style3.]
There might be more than one “legitimate” partition…
31
Parallel IB: solution
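- Minimize:
$$\mathcal{L} = I(T_1;X_1) + I(T_2;X_1) - \beta\, I(T_1,T_2;X_2)$$
- G_in (minimize dependencies): both T1 and T2 compress X1, so
$$I^{G_{in}} = I(T_1;X_1) + I(T_2;X_1)$$
- G_out (maximize dependencies): T1 and T2 jointly predict X2, so
$$I^{G_{out}} = I(T_1,T_2;X_2)$$
Effective distortion:
$$d(x_1,t_1) = E_{P(T_2 \mid x_1)}\left[D_{KL}\left(P(X_2 \mid x_1, T_2)\,\|\,P(X_2 \mid t_1, T_2)\right)\right]$$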
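The effective distortion can be computed from P(X1,X2) and the two encoders. This sketch assumes that T1 and T2 are conditionally independent given X1 (each depends only on X1 in G_in), so P(x1,t1,t2) = P(x1) P(t1|x1) P(t2|x1) and P(X2 | x1, T2) reduces to P(X2 | x1). The distribution, the encoders, and `parallel_distortion` are hypothetical; logarithms are natural.

```python
from math import log


def kl(p, q):
    """KL divergence in nats between two discrete distributions given as lists."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def parallel_distortion(P12, q1, q2):
    """d(x1, t1) = E_{P(T2|x1)}[KL(P(X2 | x1, T2) || P(X2 | t1, T2))] for parallel IB.

    P12[x1][x2] = P(x1, x2); q1[x1][t1] = P(t1 | x1); q2[x1][t2] = P(t2 | x1).
    Encoders must be strictly positive so every P(t1, t2) is nonzero.
    """
    n1, n2, k1, k2 = len(P12), len(P12[0]), len(q1[0]), len(q2[0])
    p_x1 = [sum(r) for r in P12]
    p_x2_x1 = [[P12[a][b] / p_x1[a] for b in range(n2)] for a in range(n1)]
    # P(x2 | t1, t2) = sum_x1 P(x1, t1, t2) P(x2 | x1) / P(t1, t2)
    p_x2_t = {}
    for t1 in range(k1):
        for t2 in range(k2):
            w = [p_x1[a] * q1[a][t1] * q2[a][t2] for a in range(n1)]
            z = sum(w)
            p_x2_t[(t1, t2)] = [sum(w[a] * p_x2_x1[a][b] for a in range(n1)) / z
                                for b in range(n2)]
    # P(X2 | x1, T2) = P(X2 | x1), because T2 depends on x1 only.
    return [[sum(q2[a][t2] * kl(p_x2_x1[a], p_x2_t[(t1, t2)]) for t2 in range(k2))
             for t1 in range(k1)] for a in range(n1)]


P12 = [[0.2, 0.05, 0.0], [0.05, 0.2, 0.0], [0.0, 0.05, 0.2], [0.0, 0.0, 0.25]]  # hypothetical
q1 = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]  # hypothetical P(t1 | x1)
q2 = [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2], [0.2, 0.8]]  # hypothetical P(t2 | x1)
d = parallel_distortion(P12, q1, q2)
print([[round(v, 3) for v in row] for row in d])
```

Each x1 is closer (smaller d) to the T1 cluster that mostly contains it, which is what the exponential update on the previous slides would reinforce.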
32
Parallel sIB: Text analysis results
- Data: ~1,500 “documents” taken from E. R. Burroughs: The Beasts of Tarzan & The Gods of Mars; R. Kipling: The Jungle Book & Rewards and Fairies
- X1 corresponds to “documents”, X2 corresponds to words
Number of documents from each book in each cluster:

Author      Book                    T1,b  T1,a  T2,b  T2,a
Kipling     Rewards and Fairies      367     0   325    42
Kipling     The Jungle Book          255     0     1   254
Burroughs   The Gods of Mars           0   407   406     1
Burroughs   The Beasts of Tarzan       2   315     2   315
33
Parallel sIB: Gene expression data results
- Data: expression of 500 “informative” genes vs. 72 leukemia samples (Golub et al., 1999)
- X1 corresponds to samples, X2 corresponds to genes
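Number of samples of each type in each cluster, with <PS> per cluster:

Sample type  T1,b  T1,a  T2,b  T2,a  T3,b  T3,a  T4,b  T4,a
T-cell          9     0     9     0     6     3     7     2
B-cell         38     0     1    37    32     6    18    20
ALL            47     0    10    37    38     9    25    22
AML             2    23    11    14    13    12    12    13
<PS>          .72   .64   .66   .71   .76   .53   .69   .70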
34
Another Example: Triplet IB
-Consider the following sequence data:
s(1) s(2) s(3) … s(t-1) s(t) s(t+1) …
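- Can we extract features of the neighboring symbols such that their combination is informative about the symbol between them?
[Figure: three consecutive symbols Xp, Xm, Xn; Xp is compressed into Tp and Xn into Tn.]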
35
Triplet IB: solution
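- Minimize:
$$\mathcal{L} = I(T_p;X_p) + I(T_n;X_n) - \beta\, I(T_p,T_n;X_m)$$
- G_in (minimize dependencies): Tp compresses Xp and Tn compresses Xn, so
$$I^{G_{in}} = I(T_p;X_p) + I(T_n;X_n)$$
- G_out (maximize dependencies): Tp and Tn jointly predict Xm, so
$$I^{G_{out}} = I(T_p,T_n;X_m)$$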
36
Triplet IB Data
(E. R. Burroughs, “Tarzan the Terrible”)
“… As Tarzan ascended the platform his eyes narrowed angrily at the sight which met them… ‘What means this?’ he cried angrily…”
- 1st word in triplet: Xp
- 2nd word in triplet: Xm
- 3rd word in triplet: Xn
Xm = {apemans, apes, eyes, girl, great, jungle, tarzan, time, two, way}
Data: Tarzan and the Jewels of Opar, Tarzan of the Apes, Tarzan the Terrible, Tarzan the Untamed, The Beasts of Tarzan, The Jungle Tales of Tarzan, The Return of Tarzan
Joint distribution P(Xp,Xm,Xn) of dimension 90 × 10 × 233
37
Triplet sIB: Text analysis results
- Given Xp and Xn, two schemes to predict the middle word:
$$\hat{x}_m = \arg\max_{x_m'} P(x_m' \mid t_p, t_n) \qquad \text{vs.} \qquad \hat{x}_m = \arg\max_{x_m'} P(x_m' \mid x_p, x_n)$$
- Test on a NEW sequence, “The Son of Tarzan”:

Xm              Precision (%)    Recall (%)
                Xp,Xn  Tp,Tn     Xp,Xn  Tp,Tn
Apes (78)          14     17        26     43
Eyes (177)         28     32        81     83
Girl (240)          1      5        30     43
Great (219)        48     50        92     92
Jungle (241)       24     27        54     49
Tarzan (48)        25     40        67     41
Time (145)         26     48        82     70
Two (148)           8     11        92     41
Way (101)          21     28        81     60
Average            22     28        55     53
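The two prediction schemes differ only in what the argmax conditions on. The sketch below uses a hypothetical four-triplet training set and hand-picked cluster maps standing in for the Tp, Tn that triplet IB would learn; `predictor` and `precision_recall` are hypothetical helpers. Since the argmax of P(x_m' | key) equals the argmax of the joint P(x_m', key), no normalization is needed. It illustrates how conditioning on clusters can still yield a prediction for an (xp, xn) pair never seen in training.

```python
from collections import defaultdict


def predictor(P, cp=None, cn=None):
    """Return predict(xp, xn) = argmax_xm P(xm | key), with key = (cp(xp), cn(xn)).

    P maps (xp, xm, xn) -> probability. Without cluster maps, key = (xp, xn).
    """
    cp = cp or (lambda x: x)
    cn = cn or (lambda x: x)
    table = defaultdict(lambda: defaultdict(float))
    for (xp, xm, xn), p in P.items():
        table[(cp(xp), cn(xn))][xm] += p  # accumulates P(xm, key)

    def predict(xp, xn):
        scores = table.get((cp(xp), cn(xn)))
        # argmax of P(xm, key) equals argmax of P(xm | key); None if key never seen.
        return max(scores, key=scores.get) if scores else None

    return predict


def precision_recall(predict, test, word):
    """Precision and recall for predicting middle word `word` on (xp, xm, xn) triplets."""
    pairs = [(predict(xp, xn), xm) for xp, xm, xn in test]
    hits = sum(1 for guess, true in pairs if guess == word == true)
    predicted = sum(1 for guess, _ in pairs if guess == word)
    actual = sum(1 for _, true in pairs if true == word)
    return (hits / predicted if predicted else 0.0, hits / actual if actual else 0.0)


# Hypothetical training triplets, each with probability 1/4.
train = [("his", "eyes", "narrowed"), ("her", "eyes", "widened"),
         ("the", "great", "ape"), ("a", "great", "ape")]
P = {t: 1 / len(train) for t in train}
# Hypothetical hard clusters standing in for learned Tp and Tn.
tp = {"his": "PRON", "her": "PRON", "the": "DET", "a": "DET"}.get
tn = {"narrowed": "VERB", "widened": "VERB", "ape": "NOUN"}.get

by_words = predictor(P)
by_clusters = predictor(P, tp, tn)
test = [("her", "eyes", "narrowed"), ("a", "great", "ape")]
print(by_words("her", "narrowed"), by_clusters("her", "narrowed"))  # None eyes
print(precision_recall(by_words, test, "eyes"), precision_recall(by_clusters, test, "eyes"))
```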
38
Summary
- The IB method is a principled framework for extracting “informative” structure out of a joint distribution P(X1,X2).
- The Multivariate IB extends this framework to extract “informative” structure from more complex joint distributions, P(X1,…,Xn), in various ways.
- This enables us to define and solve a new family of optimization problems under a single unifying information-theoretic principle.
- References: www.cs.huji.ac.il/~noamm
- “Clustering” conceals a family of distinct problems that deserve special consideration. The multivariate IB framework enables us to define these sub-problems, solve them, and demonstrate their importance.