© 2008 IBM Corporation
Mining Significant Graph Patterns by Leap Search
Xifeng Yan (IBM T. J. Watson); Hong Cheng, Jiawei Han (UIUC); Philip S. Yu (UIC)
IBM T. J. Watson Research Center
Graph Pattern Mining | © 2008 IBM Corporation
Graph Patterns
Interestingness measures / Objective functions
• Frequency: frequent graph pattern
• Discriminative: information gain, Fisher score
• Significance: G-test
• …
Frequent Graph Pattern
Optimal Graph Pattern (this work)
Objective Functions
Challenge: Not Anti-Monotonic
Challenge: Non-Anti-Monotonic
Anti-Monotonic
Non-Monotonic
Non-Monotonic: Enumerate all subgraphs, then check their scores?
Enumerate subgraphs: small-size to large-size
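The contrast between the two kinds of function can be sketched on toy data. The example below is only an illustration under stated assumptions: itemsets stand in for graphs (subset containment replaces subgraph containment), the score is a plain frequency difference rather than any measure from the slides, and all names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: itemsets stand in for graphs, subset for subgraph.
POSITIVE = [{"a", "b"}, {"a", "b"}, {"a"}]
NEGATIVE = [{"a"}, {"a"}, {"b"}]


def frequency(pattern, dataset):
    """Fraction of records in `dataset` that contain `pattern`."""
    return sum(pattern <= record for record in dataset) / len(dataset)


def score(pattern):
    """Illustrative objective (an assumption): |p - q|."""
    return abs(frequency(pattern, POSITIVE) - frequency(pattern, NEGATIVE))


def enumerate_patterns(items):
    """Enumerate all non-empty patterns from small size to large size."""
    for size in range(1, len(items) + 1):
        for combo in combinations(sorted(items), size):
            yield frozenset(combo)


small, large = frozenset("a"), frozenset("ab")
all_data = POSITIVE + NEGATIVE

# Frequency is anti-monotonic: growing the pattern never raises it.
freq_small = frequency(small, all_data)
freq_large = frequency(large, all_data)

# The score is not: here the larger pattern scores higher than the smaller.
score_small = score(small)
score_large = score(large)

# Without a pruning rule, the only safe option is to score every pattern.
best = max(enumerate_patterns({"a", "b"}), key=score)
```

Because the score can rise as a pattern grows, a low-scoring small pattern cannot be discarded on its own score, which is why exhaustive enumeration is the naive fallback.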
Frequent Pattern Based Mining Framework
Exploratory task
Graph clustering
Graph classification
Graph index
(SIGMOD’04, ’05) (ISMB’05, ’07)
Graph Database -> Frequent Patterns -> Optimal Patterns
1. Bottleneck: millions, even billions, of patterns
2. No guarantee of quality
Direct Pattern Mining Framework
Exploratory task
Graph clustering
Graph classification
Graph index
Graph Database -> Optimal Patterns (direct)
How?
Upper-Bound
Upper-Bound: Anti-Monotonic (cont.)
Rule of Thumb: As the gap between a graph pattern's frequency in the positive dataset and its frequency in the negative dataset grows, the pattern becomes more interesting.
We can recycle the existing graph mining algorithms to accommodate non-monotonic functions.
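The rule of thumb and the resulting bound can be sketched with the G-test, one of the measures named earlier. The slides do not spell out a formula, so the code below assumes the standard log-likelihood-ratio form over a frequency pair (p, q), with m the number of positive examples. The clamp `EPS` and the function names are assumptions introduced for the illustration.

```python
import math

EPS = 1e-6  # assumed clamp so that log(p / q) stays finite at 0 and 1


def _clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def g_test(p, q, m=1):
    """G-test score for positive frequency p and negative frequency q.

    Standard form (assumed): 2m * (p ln(p/q) + (1-p) ln((1-p)/(1-q))).
    """
    p, q = _clamp(p), _clamp(q)
    return 2 * m * (p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q)))


def upper_bound(p, q, m=1):
    """Bound on the score of any super-pattern of a pattern with (p, q).

    A super-pattern can only have lower frequencies, and the score is
    largest when one frequency stays put and the other drops to zero.
    """
    return max(g_test(p, 0.0, m), g_test(0.0, q, m))


# Rule of thumb: with q fixed, a wider gap between p and q scores higher.
scores = [g_test(p, 0.2) for p in (0.3, 0.4, 0.6)]

# The bound itself is anti-monotonic, so it can drive branch pruning.
parent_bound = upper_bound(0.5, 0.3)
child_bound = upper_bound(0.4, 0.2)
```

The bound is loose when `EPS` is very small, because the term with a zero frequency grows with ln(1/EPS); a real implementation would pick the smallest meaningful frequency for the dataset at hand.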
Vertical Pruning
Large <- small
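One way to picture vertical pruning is a depth-first search that drops an entire branch once an upper bound shows that no larger pattern below it can beat the best score so far. The sketch below is a simplified illustration, not the authors' algorithm: itemsets stand in for graphs, the score is |p - q|, its bound is max(p, q), and every name and data value is hypothetical.

```python
# Hypothetical toy data: itemsets stand in for graphs.
POSITIVE = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
NEGATIVE = [{"c"}, {"c", "d"}, {"a", "d"}, {"d"}]
ITEMS = sorted(set().union(*POSITIVE, *NEGATIVE))


def freq(pattern, dataset):
    """Fraction of records in `dataset` that contain `pattern`."""
    return sum(pattern <= record for record in dataset) / len(dataset)


def score(p, q):
    """Illustrative objective (an assumption): |p - q|."""
    return abs(p - q)


def bound(p, q):
    """Super-patterns have p' <= p and q' <= q, so |p' - q'| <= max(p, q)."""
    return max(p, q)


def search(prune=True):
    """Grow patterns from small to large, optionally pruning branches."""
    best = {"score": -1.0, "pattern": None, "visited": 0}

    def grow(pattern, start):
        for i in range(start, len(ITEMS)):
            child = pattern | {ITEMS[i]}
            p, q = freq(child, POSITIVE), freq(child, NEGATIVE)
            best["visited"] += 1
            if score(p, q) > best["score"]:
                best["score"], best["pattern"] = score(p, q), child
            if prune and bound(p, q) <= best["score"]:
                continue  # vertical pruning: skip everything below child
            grow(child, i + 1)

    grow(frozenset(), 0)
    return best


pruned = search(prune=True)
exhaustive = search(prune=False)
```

Both runs return the same best score, but the pruned run evaluates fewer patterns because whole subtrees are skipped.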
Horizontal Pruning: Structural Proximity
Structural Proximity: Another Perspective
# of frequent patterns >> # of possible frequency pairs
Many patterns share the same score
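A small counting sketch makes the point concrete. With m positive and n negative records there are at most (m + 1)(n + 1) distinct support pairs, so once the number of patterns exceeds that, some patterns must share a pair and therefore a score. The data and names below are hypothetical, and itemsets again stand in for graphs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: itemsets stand in for graphs.
POSITIVE = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b"}]
NEGATIVE = [{"c", "d"}, {"d"}]


def support(pattern, dataset):
    """Number of records in `dataset` that contain `pattern`."""
    return sum(pattern <= record for record in dataset)


items = sorted(set().union(*POSITIVE, *NEGATIVE))

# Group every non-empty pattern by its (positive, negative) support pair.
groups = defaultdict(list)
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        pattern = frozenset(combo)
        pair = (support(pattern, POSITIVE), support(pattern, NEGATIVE))
        groups[pair].append(pattern)

num_patterns = sum(len(members) for members in groups.values())
max_pairs = (len(POSITIVE) + 1) * (len(NEGATIVE) + 1)
```

Any score that depends only on the frequency pair is identical for all patterns in one group, so a search only needs one representative per group to find the best score.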
Frequency Envelope
Structural Leap Search
Frequency Association
Significant patterns often fall into the high quantiles of frequency
Start with the most frequent patterns
Descending Leap Mine
1. Structural Leap Search with frequency threshold
2. Support-Descending Mining, repeated until F(g*) converges
3. Structural Leap Search
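The support-descending step can be sketched as a loop that lowers the frequency threshold until the best score stops improving. This is a hedged illustration only: the halving schedule, the use of positive-set frequency for the threshold, the |p - q| score, the brute-force miner, and all names are assumptions, and itemsets stand in for graphs.

```python
from itertools import combinations

# Hypothetical toy data: itemsets stand in for graphs.
POSITIVE = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
NEGATIVE = [{"c"}, {"c", "d"}, {"a", "d"}, {"d"}]
ITEMS = sorted(set().union(*POSITIVE, *NEGATIVE))


def frequency(pattern, dataset):
    """Fraction of records in `dataset` that contain `pattern`."""
    return sum(pattern <= record for record in dataset) / len(dataset)


def score(pattern):
    """Illustrative objective (an assumption): |p - q|."""
    return abs(frequency(pattern, POSITIVE) - frequency(pattern, NEGATIVE))


def best_with_threshold(theta):
    """Brute-force stand-in for a miner limited to positive frequency >= theta."""
    best = (None, float("-inf"))
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            pattern = frozenset(combo)
            if frequency(pattern, POSITIVE) >= theta and score(pattern) > best[1]:
                best = (pattern, score(pattern))
    return best


def support_descending_mine(theta=1.0, floor=0.05):
    """Halve the threshold until the best score F(g*) stops improving."""
    best = (None, float("-inf"))
    while theta >= floor:
        candidate = best_with_threshold(theta)
        if best[0] is not None and candidate[1] <= best[1]:
            break  # no improvement at the lower threshold: treat as converged
        if candidate[1] > best[1]:
            best = candidate
        theta /= 2
    return best


pattern, value = support_descending_mine()
```

The stopping rule is a heuristic: a better pattern could still exist below the last threshold tried, which is why a final search pass without the threshold follows in the list above.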
Results: NCI Anti-Cancer Screen Datasets
| Name | # of Compounds | Tumor Description |
|---|---|---|
| MCF-7 | 27,770 | Breast |
| MOLT-4 | 39,765 | Leukemia |
| NCI-H23 | 40,353 | Non-Small Cell Lung |
| OVCAR-8 | 40,516 | Ovarian |
| P388 | 41,472 | Leukemia |
| PC-3 | 27,509 | Prostate |
| SF-295 | 40,271 | Central Nervous System |
| SN12C | 40,004 | Renal |
| SW-620 | 40,532 | Colon |
| UACC257 | 39,988 | Melanoma |
| YEAST | 79,601 | Yeast anti-cancer |
Link: http://pubchem.ncbi.nlm.nih.gov
Chemical Compounds: anti-cancer or not
# of vertices: 10 ~ 200
Efficiency
Vertical Pruning
Horizontal Pruning
Effectiveness (runtime)
frequency descending
frequency descending + leap mine
Effectiveness (accuracy)
slightly different
Graph Classification
| Name | OA Kernel | LEAP | OA Kernel (6x) | LEAP (6x) |
|---|---|---|---|---|
| Average (AUC) | 0.70 | 0.72 | 0.75 | 0.77 |

* OA Kernel: Optimal Assignment Kernel; LEAP: LEAP search
Scalability Means Something!
[Figure: runtime comparison of LEAP, OA, LEAP (6x), and OA (6x). Labeled runtimes: ~20 sec, ~100 sec, ~200 sec, ~8000 sec. Trend annotations: Linear, Quadratic.]
Direct Pattern Mining Framework
Exploratory task
Graph clustering
Graph classification
Graph index
Graph Database -> Optimal Graph Patterns (direct)
Beyond Graph Patterns
Exploratory task
Clustering
Classification
Index
Itemset/Sequence/Tree Database -> Optimal Patterns (direct)
1. Direct mining can be applied to itemsets, sequences, and trees.
2. Existing algorithms can be recycled to mine patterns with sophisticated measures.
3. Pattern-based methods including indexing and classification are competitive.
Thank you
Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree
SIGKDD’08 @ Las Vegas
Graph Classification: Kernel Approach
Kernel-based Graph Classification
Optimal Assignment Kernel (Fröhlich et al. ICML’05)