chapter 7: frequent itemsets and association...
TRANSCRIPT
![Page 1: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/1.jpg)
Information Retrieval & Data MiningUniversität des Saarlandes, SaarbrückenWinter Semester 2011/12
VII.1-
Chapter 7: Frequent Itemsets and Association Rules
1
![Page 2: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/2.jpg)
20 December 2011IR&DM, WS'11/12 VII.1-
Chapter VII: Frequent Itemsets and Association Rules*1. Definitions
• Transaction data, frequent itemsets, closed and maximal itemsets, association rules
2. The Apriori Algorithm• Monotonicity and candidate pruning, mining closed
and maximal itemsets3. Generating Association Rules4. Other measures for Association Rules
• Properties of measures
2
*Zaki & Meira, Chapters 10 and 11; Tan, Steinbach & Kumar, Chapter 6
![Page 3: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/3.jpg)
20 December 2011IR&DM, WS'11/12 VII.1-
Chapter VII.1: Definitions1. The transaction data model
1.1. Data as subsets1.2. Data as binary matrix
2. Itemsets, support, and frequency3. Closed and maximal itemsets4. Association rules and confidence5. Related data mining tasks
3
![Page 4: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/4.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
The transaction data model• Data mining considers larger variety of data types
than typical IR• Methods usually work on any data that can be
expressed in certain type–Graphs, points in metric space, vectors, ...
• The data type used in itemset mining is the transaction data–Data contains transactions over some set of items
4
![Page 5: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/5.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
The market basket data
5
Items are: bread, milk, diapers, beer, and eggsTransactions are: 1:{bread, milk}, 2:{bread, diapers, beer, eggs},3:{milk, diapers, beer}, 4:{bread, milk, diapers, beer}, and5:{bread, milk, diapers}
![Page 6: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/6.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
The market basket data
5
TID Bread Milk Diapers Beer Eggs
1
2
3
4
5
✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔
Items are: bread, milk, diapers, beer, and eggsTransactions are: 1:{bread, milk}, 2:{bread, diapers, beer, eggs},3:{milk, diapers, beer}, 4:{bread, milk, diapers, beer}, and5:{bread, milk, diapers}
![Page 7: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/7.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
The market basket data
5
TID Bread Milk Diapers Beer Eggs
1
2
3
4
5
✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔
Items are: bread, milk, diapers, beer, and eggsTransactions are: 1:{bread, milk}, 2:{bread, diapers, beer, eggs},3:{milk, diapers, beer}, 4:{bread, milk, diapers, beer}, and5:{bread, milk, diapers}
Transaction IDs
![Page 8: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/8.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Transaction data as subsets
6
a: breadb: beerc: milkd: diaperse: eggs
2n subsets of n items. Layer k has subsets.�nk
�
![Page 9: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/9.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Transaction data as subsets
6
a: breadb: beerc: milkd: diaperse: eggs
{bread, milk}
2n subsets of n items. Layer k has subsets.�nk
�
![Page 10: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/10.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Transaction data as subsets
6
a: breadb: beerc: milkd: diaperse: eggs
{bread, milk}
{bread, milk, diapers}{beer, milk, diapers}
{bread, beer, milk, diapers}
{bread, beer, diapers, eggs}
2n subsets of n items. Layer k has subsets.�nk
�
![Page 11: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/11.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Transaction data as binary matrix
7
TID Bread Milk Diapers Beer Eggs
1
2
3
4
5
✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔
![Page 12: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/12.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Transaction data as binary matrix
7
TID Bread Milk Diapers Beer Eggs
1
2
3
4
5
1 1 0 0 0
1 0 1 1 1
0 1 1 1 0
1 1 1 1 0
1 1 1 0 0
![Page 13: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/13.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Transaction data as binary matrix
7
TID Bread Milk Diapers Beer Eggs
1
2
3
4
5
1 1 0 0 0
1 0 1 1 1
0 1 1 1 0
1 1 1 1 0
1 1 1 0 0
Any data that can be expressed as a binary matrix can be used.
![Page 14: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/14.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Itemsets, support, and frequency
8
• An itemset is a set of items–A transaction t is an itemset with associated transaction ID,
t = (tid, I), where I is the set of items of the transaction
• A transaction t = (tid, I) contains itemset X if X ⊆ I• The support of itemset X in database D is the number
of transactions in D that contain it: supp(X, D) = |{t ∈ D : t contains X}|• The frequency of itemset X in database D is its
support relative to the database size, supp(X, D) / |D|• Itemset is frequent if its frequency is above user-
defined threshold minfreq
![Page 15: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/15.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Frequent itemset example
9
TID Bread Milk Diapers Beer Eggs
1
2
3
4
5
1 1 0 0 0
1 0 1 1 1
0 1 1 1 0
1 1 1 1 0
1 1 1 0 0
Itemset {Bread, Milk} has support 3 and frequency 3/5Itemset {Bread, Milk, Eggs} has support and frequency 0For minfreq = 1/2, frequent itemsets are:{Bread}, {Milk}, {Diapers}, {Beer}, {Bread, Milk}, {Bread, Diapers}, {Milk, Diapers}, and {Diapers, Beer}
![Page 16: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/16.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Closed and maximal itemsets
10
• Let F be the set of all frequent itemsets (w.r.t. some minfreq) in data D• Frequent itemset X ∈ F is maximal if it does not have
any frequent supersets–That is, for all Y ⊃ X, Y ∉ F
• Frequent itemset X ∈ F is closed if it has no superset with the same frequency–That is, for all Y ⊃ X, supp(Y, D) < supp(X, D)• It can’t be that supp(Y, D) > supp(X, D). Why?
![Page 17: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/17.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of maximal frequent itemsets
11
![Page 18: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/18.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of maximal frequent itemsets
11
Not maximal because of {a, c, e}
![Page 19: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/19.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of maximal frequent itemsets
11
![Page 20: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/20.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of closed frequent itemsets
12
![Page 21: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/21.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of closed frequent itemsets
12
Itemset {a, b} is contained in
transactions 1 and 2
![Page 22: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/22.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of closed frequent itemsets
12
Closed, but not maximal
![Page 23: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/23.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of closed frequent itemsets
12
Closed and maximal
![Page 24: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/24.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Example of closed frequent itemsets
12
![Page 25: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/25.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Itemset taxonomy
13
![Page 26: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/26.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Association rules and confidence
14
• An association rule is a rule of type X → Y, where X and Y are itemsets– If transaction contains itemset X, it (probably) also contains
itemset Y• The support of rule X → Y in data D is
supp(X → Y, D) = supp(X ∪ Y, D)–Tan et al. (and other authors) divide this value by |D|
• The confidence of rule X → Y in data D is c(X → Y, D) = supp(X ∪ Y, D)/supp(X, D)–The confidence is the empirical conditional probability that
transaction contains Y given that it contains X
![Page 27: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/27.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Association rule examples
15
TID Bread Milk Diapers Beer Eggs
1
2
3
4
5
1 1 0 0 0
1 0 1 1 1
0 1 1 1 0
1 1 1 1 0
1 1 1 0 0
{Bread, Milk} → {Diapers} has support 2 and confidence 2/3{Diapers} → {Bread, Milk} has support 2 and confidence 1/2{Eggs} → {Bread, Diapers, Beer} has support 1 and confidence 1
![Page 28: Chapter 7: Frequent Itemsets and Association Rulesresources.mpi-inf.mpg.de/d5/teaching/ws11_12/irdm/slides/irdm-7-1.… · IR&DM, WS'11/12 20 December 2011 VII.1-Itemsets, support,](https://reader035.vdocuments.us/reader035/viewer/2022062919/5ee27ca1ad6a402d666cec71/html5/thumbnails/28.jpg)
IR&DM, WS'11/12 VII.1-20 December 2011
Related data mining tasks
16
• Frequent itemset mining– Given a database and minfreq, find all frequent itemsets– Often a pre-processing step
• Maximal or closed itemset mining– Given a database and minfreq, find all maximal or closed itemsets– Can provide a succinct presentation of the data•Which items appear often together
• Association rule mining– Given a database and minsupp and minconf, find all confident
and common association rules– Implication analysis: If X is bought/observed, what else will
probably be bought/observed