Dual Active Feature and Sample Selection for Graph Classification
Xiangnan Kong 1, Wei Fan 2, Philip S. Yu 1. 1 Department of Computer Science, University of Illinois at Chicago; 2 IBM T. J. Watson Research. KDD 2011.
![Page 1: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/1.jpg)
Dual Active Feature and Sample Selection for Graph Classification
Xiangnan Kong 1, Wei Fan 2, Philip S. Yu 1
1 Department of Computer Science, University of Illinois at Chicago
2 IBM T. J. Watson Research
KDD 2011
![Page 2: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/2.jpg)
Graph Classification
Traditional Classification: feature vector x (input) -> label (output)
Graph Classification: graph object (input) -> label (output)
![Page 3: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/3.jpg)
Cheminformatics: Drug Discovery
Graph object: chemical compound
Label: anti-cancer activity (+/-)
Training data: compounds labeled + or -
Testing data: compounds with unknown labels (?)
[Figure: a chemical compound drawn as a graph of C, H, N and O atoms]
![Page 4: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/4.jpg)
Applications:
XML Documents (label: category)
Program Flows (label: error?)
System Call Graph (label: normal software / virus?)
![Page 5: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/5.jpg)
Graph Classification
Given a set of graph objects with class labels, how do we predict the labels of unlabeled graphs?
Subgraph Feature Mining
Challenge: complex structure, lack of features
![Page 6: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/6.jpg)
Subgraph Features
[Figure: graph objects G1 and G2 are checked against subgraph features F1, F2, F3, ...; each graph becomes a binary feature vector (x1, x2) that is passed to a classifier]
Graph Objects -> Feature Vectors -> Classifiers
How do we extract a set of subgraph features for a graph classification task?
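The mapping from graph objects to binary feature vectors can be sketched as follows. This is a minimal illustration, not the authors' code: graphs and features are reduced to lists of labeled edges, and `contains` is a crude stand-in for a real subgraph-isomorphism test (it only checks that every labeled edge of the feature appears often enough in the graph). The graphs `G1`, `G2` and features `F1`, `F2`, `F3` below are made up for the example.

```python
from collections import Counter

def contains(graph_edges, feature_edges):
    """Crude containment check (assumption: NOT real subgraph isomorphism).

    Returns True if every labeled edge of the feature occurs in the graph
    at least as many times as it occurs in the feature.
    """
    g = Counter(graph_edges)
    f = Counter(feature_edges)
    return all(g[edge] >= count for edge, count in f.items())

def to_feature_vector(graph_edges, features):
    """Binary vector: entry j is 1 if the graph contains feature j."""
    return [1 if contains(graph_edges, f) else 0 for f in features]

# Hypothetical toy data: each edge is a pair of atom labels.
G1 = [("C", "C"), ("C", "C"), ("C", "O"), ("C", "N"), ("H", "N")]
G2 = [("C", "C"), ("C", "C"), ("C", "C"), ("C", "O")]

F1 = [("H", "N")]
F2 = [("C", "C"), ("C", "C"), ("C", "C")]
F3 = [("C", "O")]
features = [F1, F2, F3]

x1 = to_feature_vector(G1, features)
x2 = to_feature_vector(G2, features)
print(x1, x2)
```

Once every graph is a fixed-length vector like `x1` and `x2`, any standard vector-based classifier can be trained on them.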
![Page 7: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/7.jpg)
Subgraph Feature Selection: Existing Methods
Mining discriminative subgraph features for a graph classification task
[Figure: graphs labeled + and - with selected subgraph features F1 and F2]
Focused on supervised settings
![Page 8: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/8.jpg)
Labeling Cost
Supervised settings require a large number of labeled graphs
Labeling cost is high: we can only afford to label a few graph objects
-> Feature selection -> Classification accuracy
![Page 9: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/9.jpg)
Active Sample Selection
Given a set of candidate graph samples, we want to select the most important graph to query for its label
[Figure: a pool of unlabeled graphs (?) next to labeled graphs (+, +, -)]
![Page 11: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/11.jpg)
Two parts of the problem
Active Sample Selection: select the most important graph in the pool to query for its label
Subgraph Feature Selection: select features relevant to the classification task
The two parts are correlated!
![Page 12: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/12.jpg)
Active Sample Selection
No features
Subgraph enumeration is NP-hard
Representative
Informative
![Page 13: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/13.jpg)
Active Sample Selection View
Depends on which subgraph features are used
![Page 14: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/14.jpg)
Example
[Figure: two graphs, G1 and G2, and two subgraph features, F1 and F2]
Under these subgraph features the graphs look very similar
![Page 15: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/15.jpg)
Example
[Figure: the same two graphs, G1 and G2, with a different pair of subgraph features, F1 and F2]
Under these subgraph features the graphs look very different
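A small sketch of why the two examples can disagree: the similarity between two graphs is computed on whichever subgraph features are selected, so changing the selection changes the answer. The vectors `g1`, `g2` and the agreement-based `similarity` measure are assumptions made for illustration, not taken from the slides.

```python
def similarity(x, y, selected):
    """Fraction of the selected features on which two binary vectors agree."""
    agree = sum(1 for j in selected if x[j] == y[j])
    return agree / len(selected)

# Hypothetical indicator vectors over four candidate subgraph features.
g1 = [1, 1, 1, 0]
g2 = [1, 1, 0, 1]

sim_a = similarity(g1, g2, selected=[0, 1])  # features both graphs share
sim_b = similarity(g1, g2, selected=[2, 3])  # features that separate them
print(sim_a, sim_b)
```

With the first selection the graphs are indistinguishable; with the second they have nothing in common, even though the graphs themselves did not change.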
![Page 16: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/16.jpg)
Subgraph Feature Selection
Graph Object
Subgraph Feature
Feature Selection View
Active Sample Selection View
![Page 17: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/17.jpg)
Dual Active Feature and Sample Selection
Perform active sample selection & feature selection simultaneously
[Figure: a loop over labeled graphs (+, -) and unlabeled graphs (?): Active Sample Selection -> Query & Label -> Subgraph Feature Selection]
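The interleaving of the two steps can be sketched as a simple loop. This is only a skeleton under stated assumptions: `select_features`, `select_query` and `oracle` are hypothetical callables supplied by the caller, and the slides do not specify how the real method schedules the two steps.

```python
def dual_active_loop(pool, oracle, select_features, select_query, budget):
    """Alternate feature selection and query selection for `budget` rounds.

    pool            -- iterable of unlabeled graph identifiers
    oracle          -- callable returning the label of a queried graph
    select_features -- callable(labeled, unlabeled) -> current feature set
    select_query    -- callable(labeled, unlabeled, features) -> graph to query
    """
    labeled = {}
    unlabeled = list(pool)
    features = None
    for _ in range(budget):
        if not unlabeled:
            break
        # Re-select features using the labels gathered so far.
        features = select_features(labeled, unlabeled)
        # Pick the next graph to label, given the current features.
        query = select_query(labeled, unlabeled, features)
        labeled[query] = oracle(query)
        unlabeled.remove(query)
    return labeled, features

# Toy stand-ins, purely for demonstration.
labeled, features = dual_active_loop(
    pool=["a", "b", "c"],
    oracle=lambda g: 1 if g == "a" else -1,
    select_features=lambda lab, unl: tuple(sorted(lab)),
    select_query=lambda lab, unl, feats: unl[0],
    budget=2,
)
print(labeled, features)
```

The point of the structure is that each newly acquired label can change which features are selected, and the selected features in turn change which graph looks most worth querying.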
![Page 18: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/18.jpg)
gActive Method
Max-min Active Sample Selection
Maximizing the reward for querying a graph
[Figure: for a candidate query, the worst case is the min. over its possible labels (+ / -); the query is chosen by the max.]
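The max-min rule can be sketched in a few lines: for each candidate graph, take the smallest reward over its possible labels, then query the candidate whose worst case is largest. The `reward` function and the numbers in `toy_reward` are hypothetical; the slides do not give the actual reward definition.

```python
def max_min_query(candidates, labels, reward):
    """Return the candidate with the largest worst-case reward.

    reward -- callable(candidate, label) -> float (assumed, caller-supplied)
    """
    def worst_case(candidate):
        return min(reward(candidate, label) for label in labels)
    return max(candidates, key=worst_case)

# Hypothetical rewards for three candidate graphs under each possible label.
toy_reward = {
    ("g1", +1): 0.9, ("g1", -1): 0.1,
    ("g2", +1): 0.5, ("g2", -1): 0.4,
    ("g3", +1): 0.3, ("g3", -1): 0.8,
}

best = max_min_query(
    candidates=["g1", "g2", "g3"],
    labels=[+1, -1],
    reward=lambda g, y: toy_reward[(g, y)],
)
print(best)
```

Here `g1` has the highest best-case reward but a poor worst case, so the rule prefers `g2`, whose reward is acceptable whichever label the oracle returns.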
![Page 19: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/19.jpg)
gActive Method
Dependence Maximization: graphs' features match with their labels
Informative: query graph far away from labeled graphs
Representative: query graph close to unlabeled graphs
Max-min Active Sample Selection: maximize the reward
Feature Selection: maximize a utility function
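One way to make "features match with their labels" concrete is to score each feature by how strongly its indicator column co-varies with the labels. The sketch below uses a linear-kernel, HSIC-style score, the squared inner product between a feature column and the mean-centered labels. This choice is an assumption: the slides name dependence maximization but do not state the formula, so the paper's utility function may differ.

```python
def dependence_scores(X, y):
    """Score each binary feature column by (f_j . centered(y))^2.

    X -- list of rows, one binary feature vector per labeled graph
    y -- list of labels, e.g. +1 / -1
    Assumed stand-in for the utility function, not the paper's definition.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_centered = [v - y_mean for v in y]
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        dot = sum(f * yc for f, yc in zip(column, y_centered))
        scores.append(dot * dot)
    return scores

def select_top_k(X, y, k):
    """Indices of the k highest-scoring features, in ascending index order."""
    scores = dependence_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])

# Hypothetical data: four labeled graphs, three candidate subgraph features.
X = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
y = [1, 1, -1, -1]

scores = dependence_scores(X, y)
top = select_top_k(X, y, k=2)
print(scores, top)
```

Features 0 and 2 separate the two classes perfectly (one marks positives, the other negatives) and score highest; feature 1 appears equally often in both classes and scores zero.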
![Page 20: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/20.jpg)
Example:
[Figure: example with graphs labeled + and -]
More details in the paper: branch-and-bound subgraph mining (speed-up)
![Page 21: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/21.jpg)
Experiments: Data Sets
Anti-Cancer Activity datasets (NCI & AIDS)
▪ Graph: chemical compounds
▪ Label: anti-cancer activities
Balanced with 500 positive + 500 negative samples
![Page 22: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/22.jpg)
Experiments: Compared Methods
▪ Freq. + Random: unsupervised feature selection + random sampling (frequent subgraphs + random query)
▪ IG + Random: supervised feature selection + random sampling (information gain + random query)
▪ Freq. + Margin: unsupervised feature selection + margin-based sampling (frequent subgraphs + close to margin)
▪ Freq. + TED: unsupervised feature selection + TED (frequent subgraphs + transductive experimental design)
▪ IG + Margin: supervised feature selection + margin-based sampling (information gain + close to margin)
▪ gActive: dual active feature and sample selection (the proposed method in this paper)
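For the IG baselines, the information gain of a binary subgraph feature is the drop in label entropy after splitting the labeled graphs by whether they contain the feature. The sketch below is the textbook definition, not code from the paper, and the toy vectors are made up.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        h -= p * math.log2(p)
    return h

def information_gain(feature, labels):
    """Entropy reduction from splitting on a binary (0/1) feature."""
    n = len(labels)
    present = [y for f, y in zip(feature, labels) if f == 1]
    absent = [y for f, y in zip(feature, labels) if f == 0]
    conditional = (len(present) / n) * entropy(present) \
                + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

# Hypothetical labeled graphs and two candidate subgraph features.
labels = [1, 1, -1, -1]
discriminative = [1, 1, 0, 0]   # present only in positive graphs
uninformative = [1, 0, 1, 0]    # present equally often in both classes

ig_good = information_gain(discriminative, labels)
ig_bad = information_gain(uninformative, labels)
print(ig_good, ig_bad)
```

Because the score depends on labels, it becomes unreliable when only a handful of graphs are labeled, which is the setting the slides target.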
![Page 23: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/23.jpg)
Experiment Results
![Page 24: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/24.jpg)
Experiment Results (NCI-47)
[Figure: accuracy (higher is better) vs. number of queried graphs (#features = 200, NCI-47). Curves: gActive (dual active feature & sample selection), I.G. + Random, Freq. + Margin, Freq. + TED, I.G. + Margin, Freq. + Random. Plot annotations: "Supervised < Unsupervised" and "Supervised > Unsupervised"]
![Page 25: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/25.jpg)
Experiment Results
![Page 26: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/26.jpg)
Experiment Results
gActive wins consistently
![Page 27: Dual Active Feature and Sample Selection for Graph Classification](https://reader035.vdocuments.us/reader035/viewer/2022062520/56816100550346895dd041ec/html5/thumbnails/27.jpg)
Conclusions
Dual Active Feature and Sample Selection for Graph Classification
Perform subgraph feature selection and active sample selection simultaneously
Future work: other data and applications
▪ itemset and sequence data
Thank you!