Support Cluster Machine
Paper from ICML 2007
Read by Haiqin Yang
2007-10-18
This paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping Fan, and Xiangyang Xue, and was published at ICML 2007.
Outline
Background and Motivation
Support Cluster Machine - SCM
Kernel in SCM
Experiments
An Interesting Application: Privacy-preserving Data Mining
Discussions
Background and Motivation
Large-scale classification problem
Decomposition methods: Osuna et al., 1997; Joachims, 1999; Platt, 1999; Collobert & Bengio, 2001; Keerthi et al., 2001
Incremental algorithms: Cauwenberghs & Poggio, 2000; Fung & Mangasarian, 2002; Laskov et al., 2006
Parallel techniques: Collobert et al., 2001; Graf et al., 2004
Approximate formulations: Fung & Mangasarian, 2001; Lee & Mangasarian, 2001
Choosing representatives: active learning (Schohn & Cohn, 2003); Cluster-Based SVM (Yu et al., 2003); Core Vector Machine (CVM) (Tsang et al., 2005); Clustering SVM (Boley & Cao, 2004)
Support Cluster Machine - SCM
Given training samples: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, $\mathbf{x}_i \in \mathbb{R}^d$, $y_i \in \{+1, -1\}$
Procedure
Cluster the positive and the negative samples separately; model each cluster as a Gaussian $\mathcal{N}(\mu_i, \Sigma_i)$ with a prior, so each class becomes a GMM
Train an SVM-style classifier whose training units are these Gaussians, compared through the probability product kernel (next slides)
Classify test vectors with the resulting decision function
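As a concrete illustration of this procedure, here is a minimal sketch that builds the per-class Gaussian training units with scikit-learn's GaussianMixture; all names and parameters are illustrative, not the authors' code:

```python
# Sketch of the SCM training-unit construction, using scikit-learn's
# GaussianMixture as a stand-in for the paper's clustering step.
# Names (fit_class_gmm, n_clusters, ...) are illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmm(X, n_clusters):
    """Cluster one class's samples into Gaussian components.

    Returns the priors p_i, means mu_i and covariances Sigma_i that
    serve as SCM training units for this class.
    """
    gmm = GaussianMixture(n_components=n_clusters, covariance_type='full')
    gmm.fit(X)
    return gmm.weights_, gmm.means_, gmm.covariances_

# Example: cluster positive and negative samples separately.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, size=(500, 2))
X_neg = rng.normal(loc=-2.0, size=(500, 2))
priors_pos, means_pos, covs_pos = fit_class_gmm(X_pos, n_clusters=5)
priors_neg, means_neg, covs_neg = fit_class_gmm(X_neg, n_clusters=5)
```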
SCM Solution
Dual representation:
$\max_{\alpha}\ \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(p_i, p_j)$, subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$
Decision function:
$f(\mathbf{x}) = \mathrm{sgn}\big( \sum_i \alpha_i y_i K(p_i, \delta_{\mathbf{x}}) + b \big)$, where the test vector $\mathbf{x}$ is treated as the degenerate distribution $\delta_{\mathbf{x}}$
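A hedged sketch of solving this dual with an off-the-shelf solver: since the kernel acts on pairs of clusters, the Gram matrix can be precomputed and handed to a standard SVM. The gauss_ppk helper anticipates the closed form derived on the next slide; all names are illustrative, and this is not the authors' implementation.

```python
# Sketch: solve the SCM dual via sklearn's SVC with a precomputed
# cluster-level Gram matrix. Illustrative names and toy data only.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.svm import SVC

def gauss_ppk(mu_i, cov_i, mu_j, cov_j):
    # Probability product kernel (rho = 1) between two Gaussians:
    # K = N(mu_i | mu_j, cov_i + cov_j); see the kernel slide below.
    return multivariate_normal.pdf(mu_i, mean=mu_j, cov=cov_i + cov_j)

# Toy training units: two Gaussian clusters per class in 2-D.
mus = np.array([[+2., +2.], [+3., +1.], [-2., -2.], [-3., -1.]])
covs = np.array([np.eye(2) * 0.5] * 4)
y = np.array([+1, +1, -1, -1])

n = len(mus)
gram = np.array([[gauss_ppk(mus[i], covs[i], mus[j], covs[j])
                  for j in range(n)] for i in range(n)])

svm = SVC(kernel='precomputed', C=10.0).fit(gram, y)

# Decision value for a test vector x, treated as a degenerate Gaussian
# (covariance -> 0), so K(p_i, delta_x) is the cluster density at x.
x = np.array([1.5, 1.5])
k_test = np.array([[multivariate_normal.pdf(x, mean=mus[i], cov=covs[i])
                    for i in range(n)]])
print(svm.decision_function(k_test))  # > 0 -> positive class
```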
Kernel
Probability product kernel: $K(p, p') = \int p(\mathbf{x})^{\rho}\, p'(\mathbf{x})^{\rho}\, d\mathbf{x}$, with $\rho = 1$ (the expected likelihood kernel)
By Gaussian assumption, i.e., $p_i(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mu_i, \Sigma_i)$
Hence
$K(p_i, p_j) = \mathcal{N}(\mu_i \mid \mu_j, \Sigma_i + \Sigma_j) = (2\pi)^{-d/2}\, |\Sigma_i + \Sigma_j|^{-1/2} \exp\!\big( -\tfrac{1}{2} (\mu_i - \mu_j)^{\top} (\Sigma_i + \Sigma_j)^{-1} (\mu_i - \mu_j) \big)$
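The Gaussian identity above is a standard fact and easy to sanity-check numerically; a one-dimensional sketch:

```python
# Numeric check (1-D) of the Gaussian probability product kernel identity:
# integral of N(x|mu1, s1^2) * N(x|mu2, s2^2) dx == N(mu1 | mu2, s1^2 + s2^2).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu1, s1 = 0.5, 1.2
mu2, s2 = -1.0, 0.8

lhs, _ = quad(lambda x: norm.pdf(x, mu1, s1) * norm.pdf(x, mu2, s2),
              -np.inf, np.inf)
rhs = norm.pdf(mu1, mu2, np.sqrt(s1**2 + s2**2))
print(lhs, rhs)  # both ~ 0.161
```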
Kernel Property I
Letting one unit's covariance shrink to zero turns the kernel into the other unit's density: $\lim_{\Sigma_j \to 0} K(p_i, p_j) = \mathcal{N}(\mu_j \mid \mu_i, \Sigma_i)$
That is, for a test vector treated as $\delta_{\mathbf{x}}$: $K(p_i, \delta_{\mathbf{x}}) = \mathcal{N}(\mathbf{x} \mid \mu_i, \Sigma_i)$
Decision function: $f(\mathbf{x}) = \mathrm{sgn}\big( \sum_i \alpha_i y_i\, \mathcal{N}(\mathbf{x} \mid \mu_i, \Sigma_i) + b \big)$
Property II
With equal isotropic covariances $\Sigma_i = \sigma^2 I$ for all units, the kernel reduces to a Gaussian RBF of the cluster means: $K(p_i, p_j) = (4\pi\sigma^2)^{-d/2} \exp\big( -\|\mu_i - \mu_j\|^2 / (4\sigma^2) \big)$
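Both properties, as reconstructed above, can be checked numerically; a one-dimensional sketch (assuming Property II is the isotropic-covariance reduction stated above):

```python
# Numeric checks of the two kernel properties (1-D for brevity).
import numpy as np
from scipy.stats import norm

mu_i, s_i = 0.0, 1.0
x = 0.7

# Property I: shrinking the test unit's covariance toward zero turns the
# kernel value into the cluster density evaluated at x.
for s_test in [1e-1, 1e-2, 1e-4]:
    k = norm.pdf(mu_i, x, np.sqrt(s_i**2 + s_test**2))
    print(k, '->', norm.pdf(x, mu_i, s_i))

# Property II: with equal isotropic covariances sigma^2, the kernel is a
# Gaussian RBF of the means with width 4*sigma^2, scaled by the constant
# (4*pi*sigma^2)^(-d/2).
sigma, mu_j = 1.0, 1.5
k = norm.pdf(mu_i, mu_j, np.sqrt(2) * sigma)
rbf = (4 * np.pi * sigma**2) ** -0.5 * np.exp(-(mu_i - mu_j)**2 / (4 * sigma**2))
print(k, rbf)  # equal
```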
Experiments
Datasets
Toydata
MNIST: handwritten digit ('0'-'9') classification
Adult: privacy-preserving dataset
Clustering algorithms
Threshold Order Dependent (TOD)
EM algorithm
Classification methods: libSVM, SVMTorch, SVMlight, CVM (Core Vector Machine), SCM
Model selection
CPU: 3.0 GHz
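For concreteness, a hedged sketch of TOD as a basic single-pass threshold scheme (assign each sample, in arrival order, to the nearest existing cluster if within a distance threshold, else start a new cluster); the exact variant used in the paper may differ:

```python
# Hedged sketch of Threshold Order Dependent (TOD) clustering as a basic
# sequential scheme; the paper's exact variant may differ in details.
import numpy as np

def tod(X, threshold):
    """Order-dependent single-pass clustering with a distance threshold."""
    centers, counts = [], []
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        if centers:
            d = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(d))
        if not centers or d[j] > threshold:
            centers.append(x.astype(float)); counts.append(1)
            labels[i] = len(centers) - 1
        else:
            # Update the winning center as a running mean.
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
            labels[i] = j
    return np.array(centers), labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(3, .3, (50, 2))])
centers, labels = tod(X, threshold=1.5)
print(len(centers), 'clusters')
```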
Toydata
Samples: 2,500 samples per class, generated from a mixture of Gaussians
Clustering algorithm: TOD
Clustering results: 25 positive and 25 negative clusters
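A sketch of how such toy data could be generated; the mixture parameters below are illustrative, not the paper's:

```python
# Sketch of a two-class toy set: 2500 samples per class drawn from a
# Gaussian mixture. Mixture parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(means, n):
    """Draw n points from an equal-weight mixture of unit Gaussians."""
    comps = rng.integers(len(means), size=n)
    return means[comps] + rng.normal(size=(n, 2))

means_pos = rng.uniform(0, 10, size=(25, 2))
means_neg = rng.uniform(-10, 0, size=(25, 2))
X_pos = sample_gmm(means_pos, 2500)
X_neg = sample_gmm(means_neg, 2500)
```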
MNIST
Data description
10 classes: handwritten digits '0'-'9'
Training samples: 60,000 (about 6,000 per class)
Testing samples: 10,000
Construct 45 binary classifiers (one per pair of classes: C(10,2) = 45)
Results
25 clusters for the EM algorithm
MNIST
Test results for TOD algorithm
Privacy-preserving Data Mining
Inter-enterprise data mining
Problem: two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases, without revealing any unnecessary information.
Horizontally partitioned: records (users) split across companies. Example: a credit card fraud detection model.
Vertically partitioned: attributes split across companies. Example: associations across websites.
Privacy-preserving Data Mining
Randomization approach
[Figure: original records, e.g. (Age 50, Salary 40K) and (Age 30, Salary 70K), pass through per-attribute Randomizers, producing randomized records, e.g. (Age 65, Salary 20K) and (Age 25, Salary 60K); the distributions of Age and Salary are reconstructed from the randomized data and fed to data mining algorithms to build the model.]
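A hedged sketch of the randomization step; the per-attribute noise width and the reconstruction reference are illustrative assumptions:

```python
# Hedged sketch of the randomization approach: each party perturbs its
# records with additive noise before release; only aggregate distributions
# are meant to be recoverable, not individual values. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

ages = np.array([50, 30, 23, 17, 43, 68, 32, 20])
salaries = np.array([40, 70, 50, 30, 40, 50, 70, 20])  # in K

# Uniform additive noise of known width; a miner who knows this noise
# distribution can reconstruct the original attribute distributions
# (e.g., via the Bayesian iteration of Agrawal & Srikant, 2000), but
# not the raw records.
noise_width = 20
rand_ages = ages + rng.uniform(-noise_width, noise_width, size=ages.shape)
rand_salaries = salaries + rng.uniform(-noise_width, noise_width,
                                       size=salaries.shape)
print(np.round(rand_ages), np.round(rand_salaries))
```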
Classification Example
Age Salary Repeat Visitor?
23 50K Repeat
17 30K Repeat
43 40K Repeat
68 50K Single
32 70K Single
20 20K Repeat
Decision tree:
Age < 25?
  Yes -> Repeat
  No -> Salary < 50K?
    Yes -> Repeat
    No -> Single
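The example tree can be transcribed directly into code and checked against the table (a literal transcription of the slide's tree, not a learned model):

```python
# Literal transcription of the slide's example decision tree.
def predict(age, salary_k):
    if age < 25:
        return 'Repeat'
    return 'Repeat' if salary_k < 50 else 'Single'

rows = [(23, 50, 'Repeat'), (17, 30, 'Repeat'), (43, 40, 'Repeat'),
        (68, 50, 'Single'), (32, 70, 'Single'), (20, 20, 'Repeat')]
assert all(predict(a, s) == label for a, s, label in rows)
```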
Privacy-preserving Dataset: Adult
Data description
Training samples: 30,162
Testing samples: 15,060
Percentage of positive samples: 24.78%
Procedure
Horizontally partition the data into three subsets (parties)
Cluster each party with the TOD algorithm
Obtain three positive and three negative GMMs
Combine them into one positive and one negative GMM with modified priors
Classify with SCM
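A sketch of the combination step; reweighting each party's priors by its share of the samples is an assumed reading of "modified priors", and all names are illustrative:

```python
# Sketch of merging per-party GMMs into one class GMM. Rescaling each
# party's priors by its sample share is an assumption, not the paper's
# stated rule.
import numpy as np

def combine_gmms(gmms, n_samples):
    """gmms: list of (priors, means, covs) per party;
    n_samples: list of party sample counts. Returns one pooled GMM."""
    total = float(sum(n_samples))
    priors, means, covs = [], [], []
    for (p, mu, cov), n in zip(gmms, n_samples):
        priors.append(np.asarray(p) * (n / total))  # rescale within-party priors
        means.append(mu)
        covs.append(cov)
    return (np.concatenate(priors), np.concatenate(means),
            np.concatenate(covs))

party_a = (np.array([0.5, 0.5]), np.zeros((2, 2)), np.stack([np.eye(2)] * 2))
party_b = (np.array([1.0]), np.ones((1, 2)), np.stack([np.eye(2)]))
p, mu, cov = combine_gmms([party_a, party_b], n_samples=[100, 300])
print(p.sum())  # 1.0
```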
Privacy-preserving Dataset: Adult
Partition results
Experimental results
Discussions
Solved problems
Large-scale problems: downsample by clustering + classifier
Privacy-preserving problems: hide individual information
Differences from other methods
Training units are generative models; testing units are vectors
Training units contain complete statistical information
Only one parameter for model selection
Easy implementation
Generalization ability is not clear, whereas for the RBF kernel in SVM a larger width is known to lead to a lower VC dimension
Discussions
Advantages of using priors and covariances
Thank you!