Our New Progress on Frequent/Sequential Pattern Mining
We develop new frequent/sequential pattern mining methods.
Performance studies on both synthetic and real data sets show that our methods outperform conventional ones by wide margins.
| | Our new methods | Conventional methods |
|---|---|---|
| Frequent pattern mining | FP-growth | Apriori, TreeProjection |
| Sequential pattern mining | PrefixSpan, FreeSpan | GSP |
| Frequent closed pattern mining | CLOSET | A-close, ChARM |
Mining Complete Set of Frequent Patterns on T10I4D100k
[Figure: runtime (seconds) versus support threshold (0.00% to 0.15%) for Apriori, TreeProjection, and FP-growth]
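To make "complete set of frequent patterns" at a given support threshold concrete, here is a minimal sketch of level-wise, candidate-generate-and-test mining in the Apriori style. It is not the code measured in these charts: the function name `frequent_itemsets` and its parameters are hypothetical, and the sketch favors clarity over speed. FP-growth, per the title of its paper in the references, mines the same set without candidate generation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset whose relative support
    is at least min_support (a fraction between 0 and 1)."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the data per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_count}
        result.update(current)
        k += 1
    return result
```

Lowering `min_support` admits more and longer itemsets, so each level produces more candidates to count. That is the regime at the left edge of the runtime charts.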
Mining Complete Set of Frequent Patterns on T25I20D100k
[Figure: runtime (seconds) versus support threshold (0.00% to 1.50%) for Apriori, TreeProjection, and FP-growth]
Mining Complete Set of Frequent Patterns on Connect-4
[Figure: runtime (seconds) versus support threshold (70% to 95%) for Apriori, TreeProjection, and FP-growth]
Mining Sequential Patterns on C10T4S16I4
[Figure: runtime (seconds) versus support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Mining Sequential Patterns on C10T8S8I8
[Figure: runtime (seconds) versus support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Scalability of Mining Sequential Patterns on C10-100T8S8I8
[Figure: runtime (seconds) versus number of sequences (0 to 100,000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Scalability of Mining Sequential Patterns on C10-100T4S16I4
[Figure: runtime (seconds) versus number of sequences (0 to 100,000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]
Why Is PrefixSpan Faster Than GSP?
[Figure: two panels, one for dataset C10T4S16I4 and one for dataset C10T8S8I8. Each plots, on a logarithmic axis from 0.001 to 100, the number of candidates per pattern in GSP and the runtime per projected database in PrefixSpan against support threshold (0.00% to 2.00%)]
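The comparison above sets candidates per pattern in GSP against work per projected database in PrefixSpan. As a rough illustration of what a projected database is, here is a minimal sketch of prefix-projected pattern growth. It assumes a simplification: each sequence is a list of single items, not a list of itemsets. The function name `prefixspan` and its parameters are hypothetical, and this is not the implementation measured in the charts.

```python
def prefixspan(sequences, min_count):
    """Return {pattern: support count}, where each pattern is a tuple of items
    that occurs as a subsequence in at least min_count input sequences."""
    patterns = {}

    def mine(prefix, projected):
        # `projected` holds, for each sequence containing `prefix`,
        # the suffix that follows the first occurrence of `prefix`.
        counts = {}
        for suffix in projected:
            for item in set(suffix):  # count each item once per sequence
                counts[item] = counts.get(item, 0) + 1

        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Project: keep only what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:]
                             for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), [tuple(s) for s in sequences])
    return patterns
```

Each recursive call only scans suffixes that already contain the current prefix, so patterns grow from locally frequent items and no candidate sequences are generated and tested against the whole database.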
Mining Frequent Closed Itemsets on T25I20D100k
[Figure: runtime (seconds) versus support threshold (0.7% to 1.5%) for A-close, CLOSET, and ChARM]
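For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The sketch below only illustrates that definition by filtering an already computed set of frequent itemsets. It is an assumption-laden illustration, not how A-close, CLOSET, or ChARM work, and the function name `closed_itemsets` is hypothetical.

```python
def closed_itemsets(frequent):
    """Keep the itemsets that have no proper superset with equal support.

    `frequent` maps frozenset -> support count and is assumed to contain
    every frequent itemset at the chosen threshold.
    """
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count  # `<` is proper subset
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed
```

The closed itemsets together with their counts are enough to recover the support of every frequent itemset, which is why mining only the closed ones can produce a much smaller result on dense data.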
Mining Frequent Closed Itemsets on Connect-4
[Figure: runtime (seconds, logarithmic axis from 1 to 10,000) versus support threshold (40% to 100%) for A-close, CLOSET, and ChARM]
Mining Frequent Closed Itemsets on Pumsb
[Figure: runtime (seconds) versus support threshold (75% to 95%) for A-close, CLOSET, and ChARM]
References
R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), to appear, 2000.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases, pages 487-499, Santiago, Chile, September 1994.
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3-17, Avignon, France, March 1996.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.
M. J. Zaki and C. Hsiao. ChARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.
DBMiner Version 2.5 (Beta)
DBMiner Technology Inc., B.C., Canada
What we had for DBMiner 2.0…
Association module on data cubes
Classification module on data cubes
Clustering module on data cubes
OLAP browser
3D Cube browser
What we will do in DBMiner 2.5…
Keep the existing association module and classification module from version 2.0
Change the existing clustering module
Add a new visual classification module, both on SQL Server and OLAP
Add new sequential pattern modules on SQL Server using the FP algorithm
What we have done…
We have incorporated the existing association module and added the OLAP browser module
We have added the visual classification module
We have changed the existing clustering module
We have added the sequential pattern module
We are still in the development stage
Association module on data cubes
New sequential pattern module on SQL Server
New visual classification module on data cubes
New clustering module on data cubes