Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao1, Feng Liang2, Wei Fan3, Yizhou Sun1, Jiawei Han1
1 CS, UIUC; 2 STAT, UIUC
3 IBM TJ Watson
NIPS’2009
2/46
Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments
3/46
Ensemble
[Diagram: Data feeds into model 1, model 2, ..., model k, which are combined into an ensemble model.]
Applications: classification, clustering, collaborative filtering, anomaly detection, ...
Combine multiple models into one!
4/46
Stories of Success
• Million-dollar prize
  – Improve the baseline movie recommendation approach of Netflix by 10% in accuracy
  – The top submissions all combine several teams and algorithms as an ensemble
• Data mining competitions
  – Classification problems
  – Winning teams employ an ensemble of classifiers
5/46
Why Ensemble Works? (1)
• Intuition
  – Combining diverse, independent opinions in human decision-making acts as a protective mechanism (e.g. a stock portfolio)
• Uncorrelated error reduction
  – Suppose we have 5 completely independent classifiers for majority voting
  – If each has 70% accuracy, the majority vote is correct with probability
    10(.7^3)(.3^2) + 5(.7^4)(.3) + (.7^5) = 83.7%
    (the terms count the ways 3, 4, or 5 of the 5 classifiers can be correct)
  – With 101 such classifiers, majority vote accuracy reaches 99.9%
from T. Holloway, Introduction to Ensemble Learning, 2007.
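To make the arithmetic concrete, here is a small Python check of these numbers (our own sketch, not from the talk); it sums the binomial probability that more than half of n independent classifiers are correct:

```python
from math import comb

def majority_vote_accuracy(n_classifiers, p):
    """Probability that a majority of n independent classifiers,
    each correct with probability p, votes for the right answer."""
    need = n_classifiers // 2 + 1  # smallest majority
    return sum(comb(n_classifiers, k) * p**k * (1 - p)**(n_classifiers - k)
               for k in range(need, n_classifiers + 1))

print(majority_vote_accuracy(5, 0.7))    # ~0.837
print(majority_vote_accuracy(101, 0.7))  # ~0.999
```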
6/46
Why Ensemble Works? (2)
[Figure: models 1-6 each approximate part of some unknown distribution; together, the ensemble gives the global picture!]
from W. Fan, Random Decision Tree.
7/46
Why Ensemble Works? (3)
• Overcome limitations of a single hypothesis
  – The target function may not be implementable by any individual classifier, but may be approximated by model averaging
[Figure: decision boundaries of a single decision tree vs. model averaging.]
from I. Davidson et al., When Efficient Model Averaging Out-Performs Boosting and Bagging, ECML 2006.
8/46
Research Focus
• Base models
  – Improve diversity!
• Combination scheme
  – Consensus (unsupervised)
  – Learn to combine (supervised)
• Tasks
  – Classification (supervised ensemble)
  – Clustering (unsupervised ensemble)
9/46
Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments
10/46
Bagging
• Bootstrap
  – Sampling with replacement
  – Each bootstrap sample contains around 63.2% of the original records
• Ensemble
  – Train a classifier on each bootstrap sample
  – Use majority voting to determine the class label of the ensemble classifier (see the sketch below)
• Discussions
  – Incorporates diversity through the bootstrap samples
  – Unstable base classifiers, such as decision trees, benefit most
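As a concrete illustration of the two steps above, here is a minimal from-scratch sketch in Python (our own illustrative code; it assumes integer class labels and uses scikit-learn decision trees as the unstable base classifier):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_estimators=25, seed=0):
    """Train one tree per bootstrap sample, then majority-vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)   # sampling with replacement
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    votes = np.stack(votes)                # (n_estimators, n_test)
    # majority vote per test record (labels assumed to be 0, 1, 2, ...)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```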
11/46
Boosting
• Principles
  – Boost a set of weak learners into a strong learner
  – Make records that are currently misclassified more important
• AdaBoost
  – Initially, set uniform weights on all the records
  – At each round:
    • Create a bootstrap sample based on the weights
    • Train a classifier on the sample and apply it to the original training set
    • Records that are wrongly classified have their weights increased
    • Records that are classified correctly have their weights decreased
    • If the error rate is higher than 50%, start the round over
  – The final prediction is a weighted vote of all the classifiers, with each weight reflecting that classifier's accuracy on the weighted training set (a sketch of this loop follows)
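The reweighting loop can be sketched as follows (illustrative code, not the talk's; it assumes binary labels in {-1, +1} and uses the standard AdaBoost weight alpha = 0.5 ln((1 - err) / err)):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=20, seed=0):
    """Sketch of AdaBoost with resampling; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                     # uniform initial weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, p=w)        # bootstrap sample by weight
        stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = stump.predict(X)                 # apply to original training set
        err = w[pred != y].sum()
        if err >= 0.5:                          # worse than chance: start over
            continue
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)          # up-weight mistakes, down-weight hits
        w /= w.sum()
        learners.append(stump); alphas.append(alpha)
    def predict(X_new):
        scores = sum(a * m.predict(X_new) for a, m in zip(alphas, learners))
        return np.sign(scores)                  # weighted vote
    return predict
```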
12/46
[Figure: classifications (colors) and record weights (sizes) after 1, 3, and 20 iterations of AdaBoost.]
from J. Elder, From Trees to Forests and Rule Sets: A Unified Overview of Ensemble Methods, 2007.
13/46
Random Forest
• Algorithm
  – For each tree:
    • Choose a training set by sampling N times with replacement from the original training set
    • At each node, randomly choose m < M features and calculate the best split
    • Trees are fully grown and not pruned
  – Use majority voting among all the trees
• Discussions
  – Bagging + random features: improves diversity (a minimal sketch follows)
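Off the shelf, the same recipe looks like this (a minimal scikit-learn sketch on synthetic data; the parameter values are arbitrary choices of ours):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# max_features plays the role of m < M: each split considers only a random
# subset of features; bootstrap=True resamples the training set per tree.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            bootstrap=True, random_state=0).fit(X, y)
print(rf.score(X, y))   # predictions are a majority vote over the 100 trees
```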
14/46
Random Decision Tree
[Figure: a random decision tree over features B1: {0,1}, B2: {0,1}, B3: continuous. Each internal node tests a randomly chosen feature (B1 == 0?, B2 == 0?, B3 < 0.3?, B3 < 0.6?); thresholds on the continuous feature B3 are picked at random (0.3, 0.6).]
from W. Fan, Random Decision Tree.
15/46
Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments
16/46
Clustering Ensemble
• Goal
  – Combine "weak" clusterings into a better one
from A. Topchy et al., Clustering Ensembles: Models of Consensus and Weak Partitions, PAMI 2005.
17/46
Methods
• Base models
  – Bootstrap samples, different subsets of features
  – Different clustering algorithms
  – Random number of clusters
• Combination
  – Find the correspondence between the labels in the partitions and fuse the clusters with the same labels
  – Treat each output as a categorical variable and cluster in the new feature space
18/46
Meta Clustering (1)
• Cluster-based
  – Regard each cluster from a base model as a record
  – Define similarity as the percentage of shared common examples
  – Conduct meta-clustering and assign each record to its associated meta-cluster
• Instance-based
  – Compute the similarity between two records as the percentage of models that put them into the same cluster (a sketch follows the figure)
[Figure: objects v1-v6 and the clusters c1-c10 produced by the base models.]
from A. Gionis et al., Clustering Aggregation, TKDD 2007.
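The instance-based similarity can be computed directly; a small sketch (our own code) that returns, for every pair of records, the fraction of base models placing them in the same cluster:

```python
import numpy as np

def coassociation(labelings):
    """labelings: (k, n) array, one row of cluster ids per base model.
    Returns the (n, n) fraction of models putting each pair together."""
    labelings = np.asarray(labelings)
    k, n = labelings.shape
    sim = np.zeros((n, n))
    for row in labelings:
        sim += (row[:, None] == row[None, :])   # 1 where co-clustered
    return sim / k

# e.g. two clusterings of five records:
print(coassociation([[0, 0, 1, 1, 2],
                     [0, 1, 1, 1, 0]]))
```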
19/46
Meta Clustering (2)
• Probability-based
  – Assume the outputs come from a mixture of models
  – Use the EM algorithm to learn the model
• Spectral clustering
  – Formulate the problem as a bipartite graph between objects and clusters
  – Use spectral clustering to partition the graph (a sketch follows)
[Figure: bipartite graph between objects v1-v6 and clusters c1-c10.]
from A. Gionis et al., Clustering Aggregation, TKDD 2007.
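One way to realize the bipartite formulation with off-the-shelf tools (a sketch under our own choices; the cited work uses its own spectral algorithm, and here we substitute scikit-learn's SpectralCoclustering on the object-cluster incidence matrix):

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

def bipartite_consensus(labelings, n_clusters):
    """Build the 0/1 object-cluster incidence matrix and co-cluster it."""
    labelings = np.asarray(labelings)               # (k, n) cluster ids
    blocks = [np.eye(row.max() + 1)[row] for row in labelings]  # one-hot
    B = np.hstack(blocks)                           # (n, total #clusters)
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=0).fit(B)
    return model.row_labels_                        # consensus object labels

print(bipartite_consensus([[0, 0, 1, 1, 2],
                           [0, 1, 1, 1, 0]], n_clusters=2))
```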
20/46
Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments
21/46
Multiple Source Classification
• Image categorization: images, descriptions, notes, comments, albums, tags, ...
• Like? Dislike? (movie recommendation): movie genres, cast, director, plots, ...; users' viewing history, movie ratings, ...
• Research area: publication and co-authorship network, published papers, ...
Model Combination helps!
[Figure: conferences (SIGMOD, VLDB, EDBT, KDD, ICDM, SDM, ICML, AAAI) linked to researchers (Tom, Jim, Lucy, Mike, Jack, Tracy, Cindy, Bob, Mary, Alice) by several models, some supervised and some unsupervised.]
• Some areas share similar keywords
• People may publish in relevant but different areas
• There may be cross-discipline collaborations
23/46
Problem
24/46
Motivations
• Consensus maximization
  – Combine the outputs of multiple supervised and unsupervised models on a set of objects
  – The predicted labels should agree with the base models as much as possible
• Motivations
  – Unsupervised models provide useful constraints for classification tasks
  – Model diversity improves prediction accuracy and robustness
  – Combination at the output level is needed when raw data cannot be shared (privacy) or model formats are incompatible
25/46
Related Work (1)
• Single models
  – Supervised: SVM, logistic regression, ...
  – Unsupervised: K-means, spectral clustering, ...
  – Semi-supervised learning, transductive learning
• Supervised ensemble
  – Require raw data and labels: bagging, boosting, Bayesian model averaging
  – Require labels: mixture of experts, stacked generalization
  – Majority voting works at the output level and does not require labels
26/46
Related Work (2)
• Unsupervised ensemble
  – Find a consensus clustering from multiple partitionings without accessing the features
• Multi-view learning
  – A joint model is learnt from both labeled and unlabeled data from multiple sources
  – It can be regarded as a semi-supervised ensemble requiring access to the raw data
27/46
Related Work (3)
28/46
Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments
29/46
A Toy Example
[Figure: seven objects x1-x7. Two classifiers assign labels 1, 2, 3 and two clusterings partition the objects into three clusters each, so every object receives four (possibly conflicting) groupings.]
30/46
Groups-Objects
[Figure: the same toy example, with each model's output decomposed into groups g1-g12 (three groups per model); every object belongs to one group from each model.]
31/46
Bipartite Graph
[Figure: a bipartite graph between group nodes (outputs of models M1, M2, M3, M4, ...) and object nodes; classifier groups are tagged with initial labels [1 0 0], [0 1 0], [0 0 1].]
• Each object i has a conditional probability vector $u_i = [u_{i1}, \ldots, u_{ic}]$ over the c classes; each group j has $q_j = [q_{j1}, \ldots, q_{jc}]$
• Adjacency: $a_{ij} = 1$ if object i is connected to group j, and $a_{ij} = 0$ otherwise
• Initial probability: $y_j = [1\ 0 \ldots 0]$ if group $g_j$ is predicted to be class 1, ..., $y_j = [0 \ldots 0\ 1]$ if class c
32/46
Objective
[Figure: the same bipartite graph; classifier groups carry initial labels [1 0 0], [0 1 0], [0 0 1].]
• Minimize disagreement:
$$\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\,\|u_i - q_j\|^2 \;+\; \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2$$
• First term: an object and a group should have similar conditional probability vectors if the object is connected to the group
• Second term: a group should not deviate much from its initial probability
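In matrix form the disagreement is easy to evaluate; a short numpy sketch (our own, with A the n x v adjacency matrix, U and Q the object and group probability matrices, Y the initial labels for the first s groups, and alpha the trade-off weight):

```python
import numpy as np

def bgcm_objective(A, U, Q, Y, s, alpha):
    """sum_ij a_ij ||u_i - q_j||^2 + alpha * sum_{j<=s} ||q_j - y_j||^2."""
    dist = ((U[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)  # (n, v) pairwise
    return (A * dist).sum() + alpha * ((Q[:s] - Y[:s]) ** 2).sum()
```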
33/46
Methodology
[Figure: the same bipartite graph.]
• Iterate until convergence:
  – Update the probability of a group:
$$q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i + \alpha y_j}{\sum_{i=1}^{n} a_{ij} + \alpha} \ \text{(classifier groups)}, \qquad q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i}{\sum_{i=1}^{n} a_{ij}} \ \text{(clustering groups)}$$
  – Update the probability of an object:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij} q_j}{\sum_{j=1}^{v} a_{ij}}$$
(A numpy sketch of this loop follows.)
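A minimal numpy sketch of the alternating loop (our own illustrative code; the first s groups are assumed to come from classifiers and carry one-hot rows of Y, while clustering groups carry zero rows):

```python
import numpy as np

def bgcm(A, Y, s, alpha=2.0, n_iter=50):
    """A: (n, v) 0/1 object-group adjacency. Y: (v, c) initial group
    probabilities (one-hot for the first s classifier groups, zero rows
    for clustering groups). Returns the object probabilities U."""
    Q = Y.astype(float).copy()
    k = np.zeros(A.shape[1]); k[:s] = 1.0   # indicator of classifier groups
    for _ in range(n_iter):
        # objects: weighted average of the groups they belong to
        U = (A @ Q) / A.sum(axis=1, keepdims=True)
        # groups: average of member objects, pulled toward y_j if from a classifier
        Q = (A.T @ U + alpha * k[:, None] * Y) / (
             A.sum(axis=0)[:, None] + alpha * k[:, None])
    return U
```

The consensus prediction for object i is then the class z maximizing u_iz.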
34/46
Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments
35/46
Constrained Embedding
• The consensus objective
$$\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\,\|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2$$
can be read as embedding both objects and groups into the c-dimensional probability space:
$$\min_{Q,U} \sum_{j=1}^{v} \sum_{z=1}^{c} \Big|\, q_{jz} - \frac{\sum_{i=1}^{n} a_{ij} u_{iz}}{\sum_{i=1}^{n} a_{ij}} \,\Big| \qquad \text{s.t. } q_{jz} = 1 \text{ if } g_j\text{'s label is } z$$
where the constraints apply to the groups produced by classification models.
36/46
Ranking on Consensus Structure
[Figure: the same bipartite graph.]
• Stacking the group vectors per class, the iteration can be written as a personalized ranking over the consensus structure:
$$q_{\cdot z} = (D_v + \alpha I)^{-1} A^{T} D_n^{-1} A \, q_{\cdot z} \;+\; \alpha (D_v + \alpha I)^{-1} y_{\cdot z}$$
• $A$: the adjacency matrix; $y_{\cdot z}$: the query; $\alpha (D_v + \alpha I)^{-1}$: personalized damping factors ($D_n$ and $D_v$ are the degree matrices of the object and group nodes; the label term applies to classifier groups)
37/46
Incorporating Labeled Information
[Figure: the same bipartite graph.]
• Objective, with an extra penalty tying the first l objects to their observed label vectors $f_i$:
$$\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\,\|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2 + \beta \sum_{i=1}^{l} \|u_i - f_i\|^2$$
• Update the probability of a group (as before):
$$q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i + \alpha y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}, \qquad q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i}{\sum_{i=1}^{n} a_{ij}}$$
• Update the probability of an object:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij} q_j + \beta f_i}{\sum_{j=1}^{v} a_{ij} + \beta} \ \text{(labeled)}, \qquad u_i = \frac{\sum_{j=1}^{v} a_{ij} q_j}{\sum_{j=1}^{v} a_{ij}} \ \text{(unlabeled)}$$
(A sketch of the modified object update follows.)
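Only the object update changes; a sketch of the modified step (our own code, with F holding one-hot observed labels for the first l objects and beta the penalty weight):

```python
import numpy as np

def update_objects_labeled(A, Q, F, l, beta):
    """Object update with observed labels: for labeled i (i < l),
    u_i = (sum_j a_ij q_j + beta * f_i) / (sum_j a_ij + beta)."""
    m = np.zeros(A.shape[0]); m[:l] = 1.0      # indicator of labeled objects
    return (A @ Q + beta * m[:, None] * F) / (
            A.sum(axis=1, keepdims=True) + beta * m[:, None])
```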
38/46
Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments
39/46
Experiments-Data Sets
• 20 Newsgroups
  – newsgroup message categorization
  – only text information available
• Cora
  – research paper area categorization
  – paper abstracts and citation information available
• DBLP
  – researchers' area prediction
  – publication and co-authorship network, and publication content
  – conferences' areas are known
40/46
Experiments-Baseline Methods (1)
• Single models
  – 20 Newsgroups: logistic regression, SVM, K-means, min-cut
  – Cora: abstracts, citations (with or without a labeled set)
  – DBLP: publication titles, links (with or without labels from conferences)
• Proposed method
  – BGCM
  – BGCM-L: semi-supervised version combining four models
  – 2-L: combining two models; 3-L: combining three models
41/46
Experiments-Baseline Methods (2)
• Ensemble approaches
  – clustering ensemble on all of the four models: MCLA, HBGF
[Table: positioning of methods]
• Unsupervised learning: K-means, Spectral Clustering, ... (single models); Clustering Ensemble (ensemble at output level)
• Supervised learning: SVM, Logistic Regression, ... (single models); Bagging, Boosting, Bayesian model averaging, ... (ensemble at raw data level); Majority Voting, Mixture of Experts, Stacked Generalization (ensemble at output level)
• Semi-supervised learning: Semi-supervised Learning, Collective Inference (single models); Multi-view Learning (ensemble at raw data level); Consensus Maximization (ensemble at output level)
42/46
Accuracy (1)
43/46
Accuracy (2)
44/46
45/46
Conclusions
• Ensemble
  – Combining independent, diversified models improves accuracy
  – With the information explosion and the many learning packages available, base models are easy to obtain
• Consensus maximization
  – Combines the complementary predictive powers of multiple supervised and unsupervised models
  – Propagates label information between group and object nodes iteratively over a bipartite graph
  – Two interpretations: constrained embedding and ranking on consensus structure
• Applications
  – Multiple-source learning, ranking, truth finding, ...
46/46
Thanks!
• Any questions?
http://www.ews.uiuc.edu/~jinggao3/nips09bgcm.htm