COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence
Ning Jin, Calvin Young, Wei Wang
University of North Carolina at Chapel Hill
11/04/2009




What Are Graphs?

Graph:
• a set of nodes connected by a set of edges
• nodes and edges can have labels
• edges can have directions



Graph Classification: Example

Negative set:

Positive set:



Graph Classification Using Frequent Subgraph Patterns

The positive graphs should have some common subgraph patterns that negative graphs don’t have.

Generate classifiers:
• Frequent subgraph mining in the positive set (frequency >= threshold)
• Feature selection
• High-dimensional data point classification
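The first and last steps of this pipeline can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes pattern occurrences are already computed (real mining needs subgraph isomorphism tests), it skips feature selection, and the names `occurrences`, `frequent_patterns`, and `feature_vector` are hypothetical.

```python
def frequent_patterns(occurrences, positive_ids, threshold):
    """Keep patterns whose frequency in the positive set is >= threshold.

    occurrences maps a pattern name to the set of graph ids containing it
    (assumed precomputed; this sketch does no subgraph matching itself).
    """
    n_pos = len(positive_ids)
    return sorted(
        p for p, graphs in occurrences.items()
        if len(graphs & positive_ids) / n_pos >= threshold
    )


def feature_vector(graph_id, patterns, occurrences):
    """Binary vector: 1 if the graph contains the pattern, else 0."""
    return [1 if graph_id in occurrences[p] else 0 for p in patterns]


# Toy data (hypothetical): three positive graphs and two negative graphs.
positive_ids = {"P1", "P2", "P3"}
occurrences = {
    "A-B": {"P1", "P2", "P3", "N1"},
    "B-C": {"P1", "P2"},
    "C-D": {"P3", "N1", "N2"},
}
patterns = frequent_patterns(occurrences, positive_ids, threshold=0.6)
vectors = {g: feature_vector(g, patterns, occurrences)
           for g in ["P1", "P2", "P3", "N1", "N2"]}
```

Each graph becomes a point with one dimension per frequent pattern, which is what makes the final step a high-dimensional classification problem.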



Graph Classification Using Discriminative Subgraph Patterns

Merge "Frequent subgraph mining in the positive set" and "Feature selection" into a single step: mining discriminative/significant subgraph patterns.

Scoring function:

Pattern redundancy:
Pattern 1: found in positive graphs P1, P2 and in negative graphs N1, N2
Pattern 2: found in positive graphs P1, P2, P3 and in negative graph N1
Pattern 1 is redundant given pattern 2
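The redundancy example can be checked with plain set comparisons. The criterion below is an assumption inferred from the example (pattern 2 covers at least the same positive graphs and at most the same negative graphs); the slides do not state the formal definition, and `is_redundant` is a hypothetical name.

```python
def is_redundant(p1, p2):
    """Return True if pattern p1 is redundant given pattern p2.

    Assumed criterion: p2 occurs in every positive graph that p1 occurs in,
    and p2 occurs in no negative graph that p1 avoids.
    Each pattern is a (positive_support, negative_support) pair of sets.
    """
    pos1, neg1 = p1
    pos2, neg2 = p2
    return pos1 <= pos2 and neg2 <= neg1


pattern1 = ({"P1", "P2"}, {"N1", "N2"})
pattern2 = ({"P1", "P2", "P3"}, {"N1"})

print(is_redundant(pattern1, pattern2))  # True
print(is_redundant(pattern2, pattern1))  # False
```

Under this criterion two patterns with identical supports are each redundant given the other, so a real miner would need a tie-breaking rule.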


Previous Discriminative Pattern Mining Methods

• Each tree node represents a subgraph pattern
• Each node is a supergraph of its parent node, with one more edge
• One subgraph pattern corresponds to only one node

Pattern redundancy:
Pattern 1: found in positive graphs G1, G2 and in negative graphs G4, G5
Pattern 2: found in positive graphs G1, G2, G3 and in negative graph G4
Pattern 1 is redundant given pattern 2

Scoring function:


1. Heuristic Exploration Order

Pattern 1

Pattern 2

Pattern redundancy:
Pattern 1: found in positive graphs G1, G2 and in negative graphs G4, G5
Pattern 2: found in positive graphs G1, G2, G3 and in negative graph G4
Pattern 1 is redundant given pattern 2


Heuristic Exploration Order: Delta Score

Pattern p

Pattern p’

Delta score of p = score of p – score of p’


Workflow of Pattern Exploration

• Collect frequent edges in the positive set and insert them into a heap H
• If H is empty, terminate
• Otherwise, pop from H the pattern p with the highest delta score
• Extend pattern p, insert the new non-redundant patterns into H, and check H again

A frequency threshold tp is needed
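The loop above can be sketched with Python's `heapq`. Several simplifications here are assumptions, not part of the slides: a graph is a set of labeled edges, a pattern occurs in a graph when its edges are a subset of the graph's edges (no real subgraph isomorphism), the score is positive frequency minus negative frequency, p’ is taken to be the pattern that p was extended from, and "non-redundant" is reduced to "not generated before".

```python
import heapq


def support(pattern, graphs):
    """Ids of graphs whose edge set contains every edge of the pattern."""
    return {gid for gid, edges in graphs.items() if pattern <= edges}


def score(pattern, pos, neg):
    """Hypothetical scoring function: positive minus negative frequency."""
    return (len(support(pattern, pos)) / len(pos)
            - len(support(pattern, neg)) / len(neg))


def explore(pos, neg, tp):
    """Best-first pattern exploration ordered by delta score."""
    def freq(pattern):
        return len(support(pattern, pos)) / len(pos)

    all_edges = set().union(*pos.values())
    frequent_edges = sorted(e for e in all_edges if freq(frozenset([e])) >= tp)

    heap, seen, order = [], set(), []
    for e in frequent_edges:
        p = frozenset([e])
        seen.add(p)
        # heapq is a min-heap, so the delta score is negated. A single edge
        # extends the empty pattern, whose score is 0 under this scoring.
        heapq.heappush(heap, (-score(p, pos, neg), sorted(p)))

    while heap:  # terminate once H is empty
        _, edges = heapq.heappop(heap)
        p = frozenset(edges)
        order.append(p)
        for e in frequent_edges:  # extend p by one frequent edge
            child = p | {e}
            if child in seen or freq(child) < tp:
                continue
            seen.add(child)
            delta = score(child, pos, neg) - score(p, pos, neg)
            heapq.heappush(heap, (-delta, sorted(child)))
    return order


# Toy data (hypothetical).
pos = {"P1": {"A-B", "B-C"}, "P2": {"A-B", "B-C"}, "P3": {"A-B", "C-D"}}
neg = {"N1": {"A-B"}, "N2": {"B-C", "C-D"}}
order = explore(pos, neg, tp=0.6)
```

The threshold tp bounds the search: edge C-D occurs in only one of three positive graphs, so no pattern containing it is ever pushed onto the heap.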


2. Use Co-occurrences of Patterns

[Figure: Graph G and Graph G’, with nodes labeled A, B, C, D]

Can be approximated by co-occurrence


When Co-occurrence Is Superior

Separately:
A-B: N1, N2, P1, P2, P3, P4
B-C: N3, N4, P1, P2, P3, P4

Co-occurrence of A-B and B-C:
P1, P2, P3, P4
No negative graphs
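The example reduces to a set intersection: a graph contains the co-occurrence only if it contains every pattern in it. A small Python check using the slide's graph ids (the function name `cooccurrence_support` is hypothetical):

```python
support = {
    "A-B": {"N1", "N2", "P1", "P2", "P3", "P4"},
    "B-C": {"N3", "N4", "P1", "P2", "P3", "P4"},
}


def cooccurrence_support(patterns, support):
    """Graphs that contain every pattern in the co-occurrence."""
    return set.intersection(*(support[p] for p in patterns))


both = cooccurrence_support(["A-B", "B-C"], support)
print(sorted(both))  # ['P1', 'P2', 'P3', 'P4']
```

Each pattern alone matches two negative graphs, but no negative graph contains both, so the pair separates the two sets perfectly.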


Co-occurrence Generation

A co-occurrence is a set of subgraph patterns: {p1, p2, …, pm}

Candidate co-occurrences: 1, 2, 3, 4, …, n

For each new pattern p:
• Find the candidate co-occurrence k whose merge with pattern p improves the score of p most significantly
• Insert pattern p into the candidate list
• Insert the union of pattern p and candidate co-occurrence k into the candidate list
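A minimal Python sketch of this generation step follows. It assumes a hypothetical score (positive frequency minus negative frequency) and keeps candidates in a plain list; the names `cooc_support`, `score`, and `add_pattern` are illustrative only and do not come from the slides.

```python
def cooc_support(cooc, support):
    """Graphs containing every pattern of the co-occurrence."""
    return set.intersection(*(support[p] for p in cooc))


def score(cooc, support, pos_ids, neg_ids):
    """Hypothetical score: positive frequency minus negative frequency."""
    s = cooc_support(cooc, support)
    return len(s & pos_ids) / len(pos_ids) - len(s & neg_ids) / len(neg_ids)


def add_pattern(p, candidates, support, pos_ids, neg_ids):
    """Insert {p} and, if some merge beats {p} alone, the best union."""
    single = frozenset([p])
    best, best_score = None, score(single, support, pos_ids, neg_ids)
    for cand in candidates:
        merged = cand | single
        s = score(merged, support, pos_ids, neg_ids)
        if s > best_score:  # keep the merge with the largest improvement
            best, best_score = merged, s
    candidates.append(single)
    if best is not None and best not in candidates:
        candidates.append(best)
    return candidates


# Toy data (hypothetical): four positive and four negative graphs.
pos_ids = {"P1", "P2", "P3", "P4"}
neg_ids = {"N1", "N2", "N3", "N4"}
support = {
    "A-B": {"N1", "N2", "P1", "P2", "P3", "P4"},
    "B-C": {"N3", "N4", "P1", "P2", "P3", "P4"},
}
candidates = []
for pattern in ["A-B", "B-C"]:
    add_pattern(pattern, candidates, support, pos_ids, neg_ids)
```

After both patterns are processed, the list holds {A-B}, {B-C}, and their union, because merging raises the toy score from 0.5 to 1.0.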


3. Use Association Rules to Classify

Association rule: {p1, p2, p3, …, pn} → “positive”

Input of COM (Co-Occurrence rule Miner):
• Positive graph set, negative graph set
• Frequency threshold tp of a classification rule in the positive set; frequency threshold tn in the negative set

Output of COM:
• A set of association rules
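Applying such rules to a new graph can be sketched as a subset test. Two assumptions are made here: the set of patterns a graph contains is already known, and a graph that matches no rule is labeled "negative" (the slides do not state the default). The function name `classify` is hypothetical.

```python
def classify(graph_patterns, rules):
    """Label a graph "positive" if it contains all patterns of any rule."""
    for rule in rules:
        if rule <= graph_patterns:
            return "positive"
    return "negative"  # assumed default when no rule fires


rules = [frozenset({"A-B", "B-C"})]
print(classify({"A-B", "B-C", "C-D"}, rules))  # positive
print(classify({"A-B", "C-D"}, rules))         # negative
```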


Association Rule Generation

Each candidate co-occurrence corresponds to a candidate association rule

If a rule's frequency is >= tp in the positive set and <= tn in the negative set, it is a resulting rule

Terminate when each positive graph is covered

Remove redundant rules
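These steps can be sketched in Python under stated assumptions: candidates arrive as (rule, support) pairs in exploration order, and a rule counts as redundant when it covers no positive graph beyond those already covered. That redundancy test is a simplification chosen for illustration, and `select_rules` is a hypothetical name.

```python
def select_rules(candidates, pos_ids, neg_ids, tp, tn):
    """Pick rules with positive frequency >= tp and negative frequency <= tn.

    candidates: list of (rule, support) pairs, visited in the given order.
    Stops early once every positive graph is covered by a chosen rule.
    """
    chosen, covered = [], set()
    for rule, supp in candidates:
        pos_hit, neg_hit = supp & pos_ids, supp & neg_ids
        if len(pos_hit) / len(pos_ids) < tp:
            continue  # too rare in the positive set
        if len(neg_hit) / len(neg_ids) > tn:
            continue  # too common in the negative set
        if pos_hit <= covered:
            continue  # assumed redundancy test: covers nothing new
        chosen.append(rule)
        covered |= pos_hit
        if covered == pos_ids:
            break  # each positive graph is covered
    return chosen


# Toy data (hypothetical).
pos_ids = {"P1", "P2", "P3", "P4"}
neg_ids = {"N1", "N2", "N3", "N4"}
candidates = [
    (frozenset({"A-B"}), {"N1", "N2", "P1", "P2", "P3", "P4"}),
    (frozenset({"A-B", "B-C"}), {"P1", "P2", "P3"}),
    (frozenset({"C-D"}), {"P1", "P2"}),
    (frozenset({"D-E"}), {"P3", "P4"}),
]
rules = select_rules(candidates, pos_ids, neg_ids, tp=0.5, tn=0.0)
```

On this toy input the first candidate fails the tn test, the third adds no new positive graph, and the remaining two together cover all four positive graphs.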


Experiments: Datasets

Protein datasets: six SCOP families

Chemical datasets: six PubChem bioassays


Experiments: Parameters & Evaluation

Protein datasets: tp = 30%, tn = 0%

Chemical datasets: tp = 1%, tn = 0.4%


Experimental Results: Protein Datasets

Page 20: COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence Ning Jin, Calvin Young, Wei Wang University of North Carolina at Chapel

Experimental Results: Chemical Datasets


Conclusions

• Using heuristic pattern exploration order and co-occurrences can improve runtime efficiency of mining discriminative patterns

• Using association rules can achieve competitive classification accuracy


Questions & Suggestions