Post on 20-Dec-2015
COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence
Ning Jin, Calvin Young, Wei Wang
University of North Carolina at Chapel Hill
11/04/2009
What Are Graphs?
Graph:
• a set of nodes connected by a set of edges
• nodes and edges can have labels
• edges can have directions
Graph Classification: Example
Negative set:
Positive set:
Graph Classification Using Frequent Subgraph Patterns
The positive graphs should have some common subgraph patterns that negative graphs don't have
Generate classifiers:
• Frequent subgraph mining in the positive set (frequency >= threshold)
• Feature selection
• High dimensional data points classification
Graph Classification Using Discriminative Subgraph Patterns
Frequent subgraph mining in the positive set and feature selection are merged into one step: mining discriminative/significant subgraph patterns
Scoring function:
Pattern redundancy:
• Pattern 1: found in positive graphs P1, P2 and in negative graphs N1, N2
• Pattern 2: found in positive graphs P1, P2, P3 and in negative graphs N1
Pattern 1 is redundant given pattern 2
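The redundancy example can be checked with plain set comparisons. This is a minimal sketch under one assumed reading of the example: pattern 1 is redundant given pattern 2 when pattern 2 covers every positive graph that pattern 1 covers and occurs only in negative graphs where pattern 1 also occurs. The function name is_redundant is hypothetical.

```python
def is_redundant(p1_pos, p1_neg, p2_pos, p2_neg):
    """Return True if pattern 1 is redundant given pattern 2.

    Assumed reading of the slide's example: pattern 2 covers every positive
    graph that pattern 1 covers, and every negative graph containing
    pattern 2 also contains pattern 1.
    """
    return set(p1_pos) <= set(p2_pos) and set(p2_neg) <= set(p1_neg)


# The example from the slide: (positive support, negative support).
pattern1 = ({"P1", "P2"}, {"N1", "N2"})
pattern2 = ({"P1", "P2", "P3"}, {"N1"})

redundant = is_redundant(*pattern1, *pattern2)  # pattern 1 given pattern 2
reverse = is_redundant(*pattern2, *pattern1)    # pattern 2 given pattern 1
```

Under this reading the relation is not symmetric: pattern 2 is not redundant given pattern 1, because it covers the extra positive graph P3.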
Previous Discriminative Pattern Mining Methods
• Each tree node represents a subgraph pattern
• Each node is a supergraph of its parent node, with one more edge
• One subgraph pattern corresponds to only one node
Pattern redundancy:
• Pattern 1: found in positive graphs G1, G2 and in negative graphs G4, G5
• Pattern 2: found in positive graphs G1, G2, G3 and in negative graphs G4
Pattern 1 is redundant given pattern 2
Scoring function:
1. Heuristic Exploration Order
Pattern 1
Pattern 2
Pattern redundancy:
• Pattern 1: found in positive graphs G1, G2 and in negative graphs G4, G5
• Pattern 2: found in positive graphs G1, G2, G3 and in negative graphs G4
Pattern 1 is redundant given pattern 2
Heuristic Exploration Order: Delta Score
Pattern p
Pattern p’
Delta score of p = score of p – score of p’
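As a small worked example of the delta score, the sketch below makes two assumptions that the slides do not state: p' is the pattern from which p was extended, and the score is positive frequency minus negative frequency (the slides' actual scoring function is not reproduced here). The names score and delta_score are hypothetical.

```python
def score(pos_hits, neg_hits, n_pos, n_neg):
    # Hypothetical scoring function: positive frequency minus negative frequency.
    return pos_hits / n_pos - neg_hits / n_neg


def delta_score(p, p_prime, n_pos, n_neg):
    """Delta score of p = score of p - score of p'.

    p and p_prime are (positive hits, negative hits) pairs.
    """
    return score(*p, n_pos, n_neg) - score(*p_prime, n_pos, n_neg)


# p' occurs in all 4 positive and all 4 negative graphs (score 0.0);
# p occurs in 3 positive graphs and 1 negative graph (score 0.5).
delta = delta_score((3, 1), (4, 4), n_pos=4, n_neg=4)
```

A positive delta means the extension separated the two classes better than the pattern it came from.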
Workflow of Pattern Exploration
• Collect frequent edges in the positive set and insert them into a heap H
• If H is empty, terminate
• Otherwise, pop from H the pattern p with the highest delta score
• Extend pattern p, insert new non-redundant patterns into H, and check H again
A frequency threshold tp is needed
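The workflow can be sketched as a best-first loop over a heap. This is a simplified illustration, not the authors' implementation: each graph is modeled as a set of edge labels, a pattern is a set of edges (connectivity and isomorphism checks are ignored), the score is an assumed positive-minus-negative frequency, and "non-redundant" is reduced to "not generated before". The names support, score, and explore are hypothetical.

```python
import heapq


def support(pattern, graphs):
    """Indices of graphs that contain every edge of the pattern."""
    return {i for i, g in enumerate(graphs) if pattern <= g}


def score(pattern, pos, neg):
    # Hypothetical score: positive frequency minus negative frequency.
    return len(support(pattern, pos)) / len(pos) - len(support(pattern, neg)) / len(neg)


def explore(pos, neg, tp):
    """Pop patterns in order of highest delta score; return them in pop order."""
    min_count = tp * len(pos)
    edges = sorted({e for g in pos for e in g})
    frequent = [e for e in edges if len(support(frozenset([e]), pos)) >= min_count]

    heap, seen, order = [], set(), []
    for e in frequent:
        p = frozenset([e])
        seen.add(p)
        # heapq is a min-heap, so scores are negated. A single edge has no
        # parent here; its delta score is taken to be its own score.
        heapq.heappush(heap, (-score(p, pos, neg), sorted(p)))

    while heap:  # terminate when H is empty
        _, items = heapq.heappop(heap)
        p = frozenset(items)
        order.append(p)
        for e in frequent:  # extend p by one frequent edge
            child = p | {e}
            if child in seen or len(support(child, pos)) < min_count:
                continue
            seen.add(child)
            delta = score(child, pos, neg) - score(p, pos, neg)
            heapq.heappush(heap, (-delta, sorted(child)))
    return order


pos = [frozenset({"A-B", "B-C"}), frozenset({"A-B", "B-C"}), frozenset({"A-B", "B-C", "C-D"})]
neg = [frozenset({"B-C"}), frozenset({"B-C", "C-D"})]
order = explore(pos, neg, tp=0.5)
```

In this toy input the edge A-B is popped first because it appears in every positive graph and in no negative graph, while C-D never enters the heap because it falls below the threshold tp.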
2. Use Co-occurrences of Patterns
[Figure: Graph G and Graph G', with nodes labeled A, B, C, D. Caption: "Can be approximated by co-occurrence"]
When Co-occurrence Is Superior
Separately:
• A-B: N1, N2, P1, P2, P3, P4
• B-C: N3, N4, P1, P2, P3, P4
Co-occurrence of A-B and B-C: P1, P2, P3, P4 (no negative graphs)
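The example above amounts to a set intersection: a graph supports the co-occurrence only if it contains both patterns. A minimal sketch using the graph names from the slide (the variable names are illustrative):

```python
# Graphs in which each pattern occurs, taken from the slide.
a_b = {"N1", "N2", "P1", "P2", "P3", "P4"}
b_c = {"N3", "N4", "P1", "P2", "P3", "P4"}

# Graphs that contain both A-B and B-C.
both = a_b & b_c
negatives = {g for g in both if g.startswith("N")}
```

Each pattern alone matches two negative graphs, but the intersection keeps all four positive graphs and drops every negative one.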
Co-occurrence Generation
Candidate co-occurrences 1, 2, 3, 4, …, n
For each new pattern p:
• Insert pattern p itself as a candidate co-occurrence
• Insert the union of pattern p and candidate co-occurrence k, where k is the candidate whose merge with pattern p improves the score of p most significantly
A co-occurrence is a set of subgraph patterns: {p1, p2, …, pm}
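One way to picture this update step is the sketch below. It assumes that each candidate stores the sets of positive and negative graphs that contain all of its patterns, that the score is positive frequency minus negative frequency, and that the negative set holds exactly the four graphs N1 to N4; none of these details are given in the slides. The function add_pattern is hypothetical.

```python
def score(pos, neg, n_pos, n_neg):
    # Hypothetical score: positive frequency minus negative frequency.
    return len(pos) / n_pos - len(neg) / n_neg


def add_pattern(candidates, name, pos, neg, n_pos, n_neg):
    """Update the candidate list for one new pattern.

    Each candidate is (pattern names, positive support, negative support).
    """
    base = score(pos, neg, n_pos, n_neg)
    best, best_gain = None, 0.0
    for names, c_pos, c_neg in candidates:
        # A graph supports a co-occurrence only if it contains every member pattern.
        merged = (names | {name}, c_pos & pos, c_neg & neg)
        gain = score(merged[1], merged[2], n_pos, n_neg) - base
        if gain > best_gain:
            best, best_gain = merged, gain
    candidates.append((frozenset([name]), pos, neg))  # pattern p itself
    if best is not None:
        candidates.append(best)  # union of p and the best candidate k
    return candidates


positives = {"P1", "P2", "P3", "P4"}
candidates = []
add_pattern(candidates, "A-B", positives, {"N1", "N2"}, n_pos=4, n_neg=4)
add_pattern(candidates, "B-C", positives, {"N3", "N4"}, n_pos=4, n_neg=4)
```

After both patterns are added, the list holds {A-B}, {B-C}, and the co-occurrence {A-B, B-C}, whose negative support is empty.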
3. Use Association Rules to Classify
Association rule: {p1, p2, p3, …, pn} → "positive"
Input of COM (Co-Occurrence rule Miner):
• Positive graph set, negative graph set
• Frequency threshold tp of a classification rule in the positive set; frequency threshold tn in the negative set
Output of COM:
• A set of association rules
Association Rule Generation
Each candidate co-occurrence corresponds to a candidate association rule
If a rule has frequency >= tp in the positive set and frequency <= tn in the negative set, it is a resulting rule
Terminate when each positive graph is covered
Remove redundant rules
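The rule generation steps above can be sketched as a single pass over the candidates. Several details here are assumptions rather than the authors' method: candidates are visited in list order, a rule counts as redundant when it covers no positive graph that earlier rules have not already covered, and a graph is classified as positive when it contains all patterns of at least one rule. The names generate_rules and classify are hypothetical.

```python
def generate_rules(candidates, n_pos, n_neg, tp, tn):
    """candidates: list of (pattern names, positive support, negative support)."""
    rules, covered = [], set()
    for names, pos, neg in candidates:
        if len(covered) == n_pos:
            break  # every positive graph is covered
        if len(pos) / n_pos < tp or len(neg) / n_neg > tn:
            continue  # fails a frequency threshold
        if pos <= covered:
            continue  # assumed redundancy test: covers nothing new
        rules.append(names)
        covered |= pos
    return rules


def classify(graph_patterns, rules):
    """Assumed decision rule: positive if the graph contains all patterns of some rule."""
    return any(rule <= graph_patterns for rule in rules)


positives = {"P1", "P2", "P3", "P4"}
candidates = [
    (frozenset({"A-B"}), positives, {"N1", "N2"}),
    (frozenset({"B-C"}), positives, {"N3", "N4"}),
    (frozenset({"A-B", "B-C"}), positives, set()),
]
rules = generate_rules(candidates, n_pos=4, n_neg=4, tp=0.3, tn=0.0)
label_both = classify({"A-B", "B-C", "C-D"}, rules)
label_single = classify({"A-B"}, rules)
```

With tn = 0 the two single-pattern candidates are rejected because each occurs in negative graphs, and only the co-occurrence {A-B, B-C} becomes a rule.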
Experiments: Datasets
Protein datasets: six SCOP families
Chemical datasets: six PubChem bioassays
Experiments: Parameters & Evaluation
Protein datasets: tp = 30%, tn = 0%
Chemical datasets: tp = 1%, tn = 0.4%
Experimental Results: Protein Datasets
Experimental Results: Chemical Datasets
Conclusions
• Using heuristic pattern exploration order and co-occurrences can improve runtime efficiency of mining discriminative patterns
• Using association rules can achieve competitive classification accuracy
Questions & Suggestions