Mining Association Rules from Stars
Department of Information & Computer Education, NTNU. Mining Association Rules from Stars. Eric Ka Ka Ng, Ada Wai-Chee Fu, and Ke Wang, 2002 IEEE International Conference on Data Mining (ICDM'02), December 09 - 12 2002, Maebashi City, Japan. Advisor: Jia-Ling Koh. Speaker: Chen-Yi Lin.
![Page 1: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/1.jpg)
1
Mining Association Rules from Stars
Department of Information & Computer Education, NTNU
Eric Ka Ka Ng, Ada Wai-Chee Fu, and Ke Wang, 2002 IEEE International Conference on Data Mining (ICDM'02), December 09 - 12 2002, Maebashi City, Japan.
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
![Page 2: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/2.jpg)
2
Outline
– Introductions
– Problem Definition
– The Proposed Method
– Experimental Results
– Conclusions
![Page 3: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/3.jpg)
3
Introductions
In real life, a database is typically made up of multiple tables and one important case is where some of the tables form a star schema.
Figure labels: Dimension table; Fact table (FT)
![Page 4: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/4.jpg)
4
Problem Definition (1/2)
A dimension table contains a primary key (tid), some other attributes, and no foreign keys.
– The attributes in the dimension tables are unique.
– The attributes take categorical values.
The fact table (FT) stores the tids from the dimension tables as foreign keys.
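The constraints above can be illustrated with a small in-memory star schema. The following is a minimal sketch in Python; the table names, attribute names, and all values are invented for illustration and do not come from the paper.

```python
# Hypothetical star schema: two dimension tables (A, B) and a fact table FT.
# Each dimension table maps a primary key (tid) to categorical attributes.
dim_A = {
    "a1": {"color": "red", "size": "S"},
    "a2": {"color": "blue", "size": "M"},
    "a3": {"color": "red", "size": "M"},
}
dim_B = {
    "b1": {"region": "north"},
    "b2": {"region": "south"},
}

# The fact table stores only the tids of the dimension tables (foreign keys).
# The same (tid_A, tid_B) combination may occur more than once.
fact_table = [
    ("a1", "b1"),
    ("a1", "b2"),
    ("a3", "b1"),
    ("a1", "b1"),
]


def check_star_schema(fact, dims):
    """Return True if every foreign key in FT refers to an existing dimension tid."""
    return all(key in dim for row in fact for key, dim in zip(row, dims))


def natural_join(fact, dims):
    """Materialize the joined table, one merged record per FT row."""
    joined = []
    for row in fact:
        merged = {}
        for key, dim in zip(row, dims):
            merged.update(dim[key])
        joined.append(merged)
    return joined
```

`natural_join` is included only to show what the joined table would look like; attribute values are repeated once per fact-table row, which is the redundancy a join-free approach tries to avoid.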
![Page 5: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/5.jpg)
5
Problem Definition (2/2)
Dimension table and its binary representation
Figure labels: tid; categorical value
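A binary representation of a dimension table can be sketched as follows, assuming that each distinct (attribute, value) pair becomes one binary item. The table contents and the helper name `to_binary` are hypothetical.

```python
# Hypothetical dimension table: tid -> categorical attribute values.
dim_A = {
    "a1": {"color": "red", "size": "S"},
    "a2": {"color": "blue", "size": "M"},
    "a3": {"color": "red", "size": "M"},
}


def to_binary(dim):
    """Turn each distinct (attribute, value) pair into one binary column."""
    items = sorted({(attr, val) for row in dim.values() for attr, val in row.items()})
    binary = {
        tid: [1 if row.get(attr) == val else 0 for attr, val in items]
        for tid, row in dim.items()
    }
    return items, binary


items, binary = to_binary(dim_A)
```

Here `items` lists the columns in sorted order and `binary[tid]` is the 0/1 row for that tid, so every row has exactly one 1 per original attribute.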
![Page 6: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/6.jpg)
6
The Proposed Method (1/8)
tid_list is an ordered list of elements of the form tid(count).
– $tid_A(x_i)$: e.g. $tid_A(x_3) = \{a_1(5), a_3(2)\}$
– $tid_A(X)$: e.g. $tid_A(x_i x_j) = tid_A(x_i) \cap tid_A(x_j)$
– $B\_key(a_n)$: e.g. $B\_key(a_1) = \{b_3(4), b_5(2)\}$
– $B\_tid(x_i)$
– $B\_tid(X)$
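The intersection of two tid_lists can be sketched as a merge over sorted lists. This is a minimal sketch under one assumption: the count attached to a tid is the number of fact-table rows that reference it, so it is identical in every tid_list of the same table. The list for x_3 follows the slide's example as best it can be read; the list for x_1 is hypothetical.

```python
def intersect(list_x, list_y):
    """Merge-intersect two tid_lists that are sorted by tid.

    Each tid_list is a list of (tid, count) pairs. Under the stated
    assumption a tid carries the same count in both lists, so the count
    from list_x is kept.
    """
    result = []
    i = j = 0
    while i < len(list_x) and j < len(list_y):
        tid_x, count = list_x[i]
        tid_y, _ = list_y[j]
        if tid_x == tid_y:
            result.append((tid_x, count))
            i += 1
            j += 1
        elif tid_x < tid_y:
            i += 1
        else:
            j += 1
    return result


def support(tid_list):
    """Support of an itemset = sum of the counts in its tid_list."""
    return sum(count for _, count in tid_list)


# Tids are written as integers: 1 stands for a_1, 3 for a_3, and so on.
tid_A_x3 = [(1, 5), (3, 2)]
tid_A_x1 = [(1, 5), (2, 4)]  # hypothetical
tid_A_x1x3 = intersect(tid_A_x1, tid_A_x3)
```

Keeping the lists ordered by tid is what allows the intersection to run in a single linear pass.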
![Page 7: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/7.jpg)
7
The Proposed Method (2/8)
Minsup = 5
Figure labels: count = 6; count = 5
Hence the itemset $x_1 x_3 y_1 y_6$ is frequent.
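For an itemset that spans two dimension tables, support is counted over fact-table rows. The sketch below checks that definition directly by scanning FT; it is not the paper's join-free procedure, and the fact table, the tid sets, and the function name are all hypothetical.

```python
def cross_table_support(fact, tids_with_X, tids_with_Y):
    """Count FT rows whose A-tid contains itemset X and whose B-tid contains Y."""
    return sum(1 for a, b in fact if a in tids_with_X and b in tids_with_Y)


# Hypothetical fact table of (tid_A, tid_B) pairs.
fact_table = [
    ("a1", "b3"), ("a1", "b3"), ("a1", "b3"), ("a1", "b3"),
    ("a1", "b5"), ("a3", "b3"), ("a3", "b5"), ("a2", "b3"),
]
tids_with_X = {"a1", "a3"}  # tids of A whose rows contain itemset X
tids_with_Y = {"b3"}        # tids of B whose rows contain itemset Y

minsup = 5
count = cross_table_support(fact_table, tids_with_X, tids_with_Y)
is_frequent = count >= minsup
```

With this data the combined itemset is supported by exactly five rows, so it just meets the threshold.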
![Page 8: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/8.jpg)
8
The Proposed Method (3/8)
Binding multiple Dimension Tables
– (1) To assign each combination of tid from A and tid from B in FT a new tid
– (2) and to set the tid in the tid_lists for items in AB to the corresponding new tid.
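The two binding steps can be sketched as below. This is a minimal sketch, not the authors' implementation: tid_lists are modelled as dictionaries from tid to count, new tids are named t1, t2, ... in sorted order of the combinations, and all data are hypothetical.

```python
def bind(fact, tid_lists_A, tid_lists_B):
    """Bind dimension tables A and B into a virtual table AB.

    fact: list of (tid_A, tid_B) pairs, one per FT row.
    tid_lists_A, tid_lists_B: dict item -> dict tid -> count.
    Returns (new_tid_of, tid_lists_AB).
    """
    # Step (1): give each distinct (tid_A, tid_B) combination in FT a new tid.
    pair_count = {}
    for pair in fact:
        pair_count[pair] = pair_count.get(pair, 0) + 1
    new_tid_of = {
        pair: f"t{n}" for n, pair in enumerate(sorted(pair_count), start=1)
    }

    # Step (2): rewrite every tid_list in terms of the new tids.
    tid_lists_AB = {}
    for side, tid_lists in ((0, tid_lists_A), (1, tid_lists_B)):
        for item, tid_list in tid_lists.items():
            tid_lists_AB[item] = {
                new_tid_of[pair]: count
                for pair, count in sorted(pair_count.items())
                if pair[side] in tid_list
            }
    return new_tid_of, tid_lists_AB


# Hypothetical data: a1 occurs twice in FT, with two different tids of B.
fact_table = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a3", "b1"), ("a4", "b2")]
tid_lists_A = {"x1": {"a1": 2, "a3": 1, "a4": 1}}
tid_lists_B = {"y1": {"b1": 3}}

new_tid_of, tid_lists_AB = bind(fact_table, tid_lists_A, tid_lists_B)
```

After binding, items from A and items from B share one tid space, so their tid_lists can be intersected directly. Note that the entry a1 with count 2 splits into two new tids with count 1 each, and the total count of x1 is unchanged.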
![Page 9: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/9.jpg)
9
The Proposed Method (4/8)
An example of “binding” order (figure labels):
– The set of frequent itemsets with items from table A
– The set of frequent itemsets with items from tables A and/or B
![Page 10: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/10.jpg)
10
The Proposed Method (5/8)
$tid_A(x_1) = \{a_1(2), a_3(1), a_4(1)\}$
$tid_{AB}(x_1) = \{t_1(1), t_2(1), t_4(1), t_5(1)\}$
(1)
(2)
![Page 11: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/11.jpg)
11
The Proposed Method (6/8)
The fact table FT is scanned once and the information is stored in a data structure.
– Prefix Tree
• each node has a label (a tid) and a counter.
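A prefix tree of this kind can be sketched as follows. It is a minimal sketch that assumes each FT row is inserted as a root-to-leaf path in a fixed dimension order (A, then B, then C); the class name, the function name, and the data are hypothetical.

```python
class Node:
    """Prefix-tree node with a label (a tid) and a counter."""

    def __init__(self, label=None):
        self.label = label
        self.count = 0
        self.children = {}


def build_prefix_tree(fact):
    """Scan FT once; each row (tid_A, tid_B, ...) becomes a root-to-leaf path."""
    root = Node()
    for row in fact:
        node = root
        node.count += 1
        for tid in row:
            child = node.children.get(tid)
            if child is None:
                child = node.children[tid] = Node(tid)
            child.count += 1
            node = child
    return root


# Hypothetical fact table over three dimension tables.
fact_table = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
]
root = build_prefix_tree(fact_table)
```

Rows that share a prefix of tids share the corresponding nodes, so the counter of a node gives the number of FT rows that start with that prefix.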
![Page 12: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/12.jpg)
12
The Proposed Method (7/8)
Prefix tree structure representing
Figure labels: tid; counter
![Page 13: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/13.jpg)
13
The Proposed Method (8/8)
Collapsing the prefix tree
![Page 14: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/14.jpg)
14
Experimental Results (1/5)
All experiments were conducted on a SUN Ultra-Enterprise (Generic_106541-18) running SunOS 5.7 with 8192 MB of main memory.
Programs are written in C++.
![Page 15: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/15.jpg)
15
Experimental Results (2/5)
In the first dataset, items in A and B are strongly related, such that frequent itemsets contain items across A and B, while items in C are not involved.
In the second dataset, items in A, B and C are all strongly related, so that maximal frequent itemsets always contain items from all of A, B and C.
![Page 16: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/16.jpg)
16
Experimental Results (3/5)
Running time for (A, B) related and (A, B, C) related datasets
– masl: implementing tid_list as a linked list structure
– masb: implementing tid_list as a fixed-size bitmap and an array of counts
– fpt: the join-before-mine approach with the FP-tree algorithm [HPY00]
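The bitmap variant can be illustrated with a short sketch. This is one plausible reading of the masb description, not the paper's actual memory layout: the bitmap is a Python integer with one bit per tid of a table, and a single shared array holds the count of each tid. All values are hypothetical.

```python
# counts[i] is the number of FT rows that reference the i-th tid (hypothetical).
counts = [5, 4, 2, 0, 1]


def make_bitmap(tid_indexes):
    """Set one bit for every tid position in which the item occurs."""
    bitmap = 0
    for i in tid_indexes:
        bitmap |= 1 << i
    return bitmap


def support(bitmap, counts):
    """Sum the counts of all tids whose bit is set."""
    return sum(c for i, c in enumerate(counts) if (bitmap >> i) & 1)


bm_x1 = make_bitmap([0, 1])  # x1 occurs in tids 0 and 1
bm_x3 = make_bitmap([0, 2])  # x3 occurs in tids 0 and 2
bm_x1x3 = bm_x1 & bm_x3      # intersection is a single bitwise AND
```

Compared with a linked list, the bitmap turns intersection into a bitwise AND at the cost of reserving one bit per tid regardless of how sparse the list is.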
![Page 17: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/17.jpg)
17
Experimental Results (4/5)
Mixture datasets
– 10% of transactions each contain frequent itemsets from only A, only B, and only C, respectively.
– 15% each contain frequent itemsets from AB, BC, and AC, respectively.
– 10% contain frequent itemsets from ABC.
– 15% are random noise.
![Page 18: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/18.jpg)
18
Experimental Results (5/5)
Running time for mixture datasets
![Page 19: Mining Association Rules from Stars](https://reader035.vdocuments.us/reader035/viewer/2022081519/568144fc550346895db1c793/html5/thumbnails/19.jpg)
19
Conclusions
The paper proposes a new algorithm for mining association rules on a star schema without performing the natural join.
The proposed method can be generalized to a snowflake structure.