Association Rule Mining
CIT366: Data Mining & Data Warehousing
Instructor: Bajuna Salehe
The Institute of Finance Management: Computing and IT Dept.
What Is Association Mining?
Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories
Frequent Pattern: A pattern (set of items, sequence, etc.) that occurs frequently in a database
Motivations For Association Mining
Motivation: Finding regularities in data
– What products were often purchased together?
• Beer and nappies!
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?
Motivations For Association Mining (cont…)
Broad applications
– Basket data analysis, cross-marketing, catalog design, sale campaign analysis
– Web log (click stream) analysis, DNA sequence analysis, etc.
Market Basket Analysis
Market basket analysis is a typical example of frequent itemset mining
Customers' buying habits are discovered by finding associations between the different items that customers place in their “shopping baskets”
This information can be used to develop marketing strategies
Application of Association
Association analysis can be used to improve marketing strategy by analysing frequent itemsets.
For instance, as a marketing manager at Company X, you would like to determine which items are frequently purchased together within the same transactions.
Application of Association
An example of such a rule, mined from the X Company transactional database, is
buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]
where X is a variable representing a customer.
A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well.
Application of Association
A 1% support means that 1% of all of the transactions under analysis showed that computer and software were purchased together.
This association rule involves a single attribute or predicate (i.e., buys) that repeats. Association rules that contain a single predicate are referred to as single-dimensional association rules.
Application of Association
In addition to the marketing application, the same sort of question has the following uses:
Baskets = documents; items = words. Words appearing frequently together in documents may represent phrases or linked concepts. This can be used for intelligence gathering.
Application of Association
Baskets = sentences, items = documents. Two documents with many of the same sentences could represent plagiarism or mirror sites on the Web.
Association Rule Basic Concepts
Let I be a set of items {I1, I2, I3, …, Im}
Let D be a database of transactions where each transaction T is a set of items such that T ⊆ I
So, if A is a set of items, a transaction T is said to contain A if and only if A ⊆ T
An association rule is an implication A ⇒ B where A ⊂ I, B ⊂ I, and A ∩ B = ∅
Association Rule Support & Confidence
We say that an association rule A ⇒ B holds in the transaction set D with support, s, and confidence, c
The support of the association rule is given as the percentage of transactions in D that contain both A and B (or A ∪ B)
So, the support can be considered the probability P(A ∪ B)
Association Rule Support & Confidence (cont…)
The confidence of the association rule is given as the percentage of transactions in D containing A that also contain B
So, the confidence can be considered the conditional probability P(B|A)
Association rules that satisfy minimum support and confidence values are said to be strong
Itemsets & Frequent Itemsets
An itemset is a set of items
A k-itemset is an itemset that contains k items
The occurrence frequency of an itemset is the number of transactions that contain the itemset
– This is also known more simply as the frequency, support count or count
An itemset is said to be frequent if the support count satisfies a minimum support count threshold
The set of frequent k-itemsets is denoted Lk
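The definitions above can be sketched in a few lines of Python. This is a brute-force illustration only, not an efficient miner: the four transactions are borrowed from the worked example in this deck, while the threshold `min_count = 2` and the function names `support_count` and `frequent_k_itemsets` are assumptions made for the sketch.

```python
from itertools import combinations

# Four toy transactions; each transaction is a set of items.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def frequent_k_itemsets(transactions, k, min_count):
    """Brute-force Lk: every k-itemset whose support count is >= min_count."""
    items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(items, k):
        count = support_count(combo, transactions)
        if count >= min_count:
            result[frozenset(combo)] = count
    return result


L1 = frequent_k_itemsets(transactions, 1, min_count=2)  # assumed threshold
L2 = frequent_k_itemsets(transactions, 2, min_count=2)

print({"".join(sorted(s)): n for s, n in L1.items()})  # {'A': 3, 'B': 2, 'C': 2}
print({"".join(sorted(s)): n for s, n in L2.items()})  # {'AC': 2}
```

Enumerating every k-combination of items is only workable for tiny inputs, which is the motivation for the Apriori algorithm.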
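Support & Confidence Again
Support and confidence values can be calculated as follows:
$$\mathrm{support}(A \Rightarrow B) = P(A \cup B) = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{count}()}$$
$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)}$$
where count() is the total number of transactions in D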
Mining Association Rules: An Example
Transaction-id Items bought
10 A, B, C
20 A, C
30 A, D
40 B, E, F
Frequent pattern Support
{A} 75%
{B} 50%
{C} 50%
{A, C} 50%
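$$\mathrm{support}(A \Rightarrow C) = \frac{\mathrm{support\_count}(\{A\} \cup \{C\})}{\mathrm{count}()} = 50\%$$
$$\mathrm{confidence}(A \Rightarrow C) = \frac{\mathrm{support\_count}(\{A\} \cup \{C\})}{\mathrm{support\_count}(\{A\})} = 66.7\%$$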
Mining Association Rules: An Example (cont…)
Transaction-id Items bought
10 A, B, C
20 A, C
30 A, D
40 B, E, F
Frequent pattern Support
{A} 75%
{B} 50%
{C} 50%
{A, C} 50%
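$$\mathrm{support}(C \Rightarrow A) = \frac{\mathrm{support\_count}(\{C\} \cup \{A\})}{\mathrm{count}()} = 50\%$$
$$\mathrm{confidence}(C \Rightarrow A) = \frac{\mathrm{support\_count}(\{C\} \cup \{A\})}{\mathrm{support\_count}(\{C\})} = 100\%$$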
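The figures for both rules can be checked with a short Python sketch over the four transactions in the example table. The function names are illustrative choices, not taken from the slides.

```python
# The four transactions from the example table.
transactions = {
    10: {"A", "B", "C"},
    20: {"A", "C"},
    30: {"A", "D"},
    40: {"B", "E", "F"},
}


def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def support(antecedent, consequent):
    """support(A => B) = support_count(A ∪ B) / total number of transactions."""
    return support_count(antecedent | consequent) / len(transactions)


def confidence(antecedent, consequent):
    """confidence(A => B) = support_count(A ∪ B) / support_count(A)."""
    return support_count(antecedent | consequent) / support_count(antecedent)


A, C = {"A"}, {"C"}
print(f"support(A => C)    = {support(A, C):.1%}")     # 50.0%
print(f"confidence(A => C) = {confidence(A, C):.1%}")  # 66.7%
print(f"support(C => A)    = {support(C, A):.1%}")     # 50.0%
print(f"confidence(C => A) = {confidence(C, A):.1%}")  # 100.0%
```

Support is symmetric in A and C because it only depends on A ∪ C, while confidence is not, since the denominator changes with the antecedent.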
Association Rule Mining
So, in general, association rule mining can be reduced to the following two steps:
1. Find all frequent itemsets
• Each itemset will occur at least as frequently as a minimum support count
2. Generate strong association rules from the frequent itemsets
• These rules will satisfy minimum support and confidence measures
Combinatorial Explosion!
A major challenge in mining frequent itemsets is that the number of frequent itemsets generated can be massive
For example, a long frequent itemset will contain a combinatorial number of shorter frequent sub-itemsets
A frequent itemset of length 100 will contain the following number of frequent sub-itemsets:
$$\binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$$
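The size of that sum is easy to confirm numerically. The snippet below is just an arithmetic check using Python's standard `math.comb`.

```python
from math import comb

n = 100

# Sum of the binomial coefficients C(100, 1) + C(100, 2) + ... + C(100, 100).
total = sum(comb(n, k) for k in range(1, n + 1))

# The sum of all binomial coefficients is 2^n; dropping C(n, 0) = 1 gives 2^n - 1.
assert total == 2**n - 1

print(f"{total:.2e}")  # 1.27e+30
```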
The Apriori Algorithm
Any subset of a frequent itemset must be frequent
– If {beer, nappy, nuts} is frequent, so is {beer, nappy}
– Every transaction having {beer, nappy, nuts} also contains {beer, nappy}
Apriori pruning principle: If any itemset is infrequent, its supersets should not be generated or tested!
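The property behind the pruning principle can be seen on a small example. In the sketch below the baskets are invented purely for illustration; the point is only that the support count of an itemset can never exceed the support count of any of its subsets.

```python
# Hypothetical baskets, invented purely for illustration.
baskets = [
    {"beer", "nappy", "nuts"},
    {"beer", "nappy"},
    {"beer", "nappy", "nuts", "milk"},
    {"nuts", "milk"},
]


def support_count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b)


small = {"beer", "nappy"}
large = {"beer", "nappy", "nuts"}

# Every basket containing `large` also contains `small`,
# so the count for `large` cannot be bigger than the count for `small`.
print(support_count(small), support_count(large))  # 3 2
assert support_count(large) <= support_count(small)
```

Read the other way round, this is the pruning rule: if `small` had fallen below the minimum support count, `large` would be below it too, so it would never need to be counted.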
The Apriori Algorithm (cont…)
The Apriori algorithm is known as a candidate generation-and-test approach
Method:
– Generate length (k+1) candidate itemsets from length k frequent itemsets
– Test the candidates against the DB
Performance studies show the algorithm’s efficiency and scalability
The Apriori Algorithm: An Example
Database TDB
Tid Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
C1 (1st scan)
Itemset sup
{A} 2
{B} 3
{C} 3
{D} 1
{E} 3
L1
Itemset sup
{A} 2
{B} 3
{C} 3
{E} 3
C2
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}
C2 (2nd scan)
Itemset sup
{A, B} 1
{A, C} 2
{A, E} 1
{B, C} 2
{B, E} 3
{C, E} 2
L2
Itemset sup
{A, C} 2
{B, C} 2
{B, E} 3
{C, E} 2
C3
Itemset
{B, C, E}
L3 (3rd scan)
Itemset sup
{B, C, E} 2
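The three scans in this example can be reproduced with a compact Python sketch of the generate-and-test loop. Two things here are assumptions rather than content from the slides: the minimum support count of 2 is inferred from which itemsets the tables keep, and the join step is simplified to the union of any two frequent k-itemsets that yields a (k+1)-itemset, followed by the subset-based prune.

```python
from itertools import combinations

# Database TDB from the example.
tdb = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_COUNT = 2  # assumed: inferred from which itemsets survive in the tables


def count_supports(candidates, transactions):
    """One database scan: count the transactions containing each candidate."""
    counts = {c: 0 for c in candidates}
    for items in transactions.values():
        for c in candidates:
            if c <= items:
                counts[c] += 1
    return counts


def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-itemsets, then prune every
    candidate that has a k-subset which is not frequent."""
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    return {
        c for c in joined
        if all(frozenset(s) in frequent_k for s in combinations(c, k))
    }


def apriori(transactions, min_count):
    """Return a list whose entry k-1 maps each frequent k-itemset to its count."""
    items = set().union(*transactions.values())
    candidates = {frozenset([i]) for i in items}
    levels = []
    k = 1
    while candidates:
        counts = count_supports(candidates, transactions)
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        candidates = generate_candidates(set(frequent), k)
        k += 1
    return levels


levels = apriori(tdb, MIN_COUNT)
for k, level in enumerate(levels, start=1):
    print(f"L{k}:", sorted(("".join(sorted(s)), n) for s, n in level.items()))
# L1: [('A', 2), ('B', 3), ('C', 3), ('E', 3)]
# L2: [('AC', 2), ('BC', 2), ('BE', 3), ('CE', 2)]
# L3: [('BCE', 2)]
```

Note that {A, B, C} and {A, C, E} are produced by the simplified join but pruned before any scan, because {A, B} and {A, E} are not in L2; only {B, C, E} reaches the third scan.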
Important Details Of The Apriori Algorithm
There are two crucial questions in implementing the Apriori algorithm:
– How to generate candidates?
– How to count supports of candidates?
Generating Candidates
There are 2 steps to generating candidates:
– Step 1: Self-joining Lk
– Step 2: Pruning
Example of candidate generation
– L3 = {abc, abd, acd, ace, bcd}
– Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
– Pruning:
• acde is removed because ade is not in L3
– C4 = {abcd}
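The two steps can be sketched in Python and run on the L3 above. This is one plausible reading of the self-join, assuming that two k-itemsets are joined when they agree on their first k-1 items in sorted order; the function names `self_join` and `prune` are illustrative.

```python
from itertools import combinations


def self_join(frequent_k):
    """Join pairs of k-itemsets that share their first k-1 items (sorted order)."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    joined = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] == b[:-1]:
                joined.add(a + (b[-1],))
    return joined


def prune(candidates, frequent_k):
    """Drop every candidate that has a k-subset missing from the frequent k-itemsets."""
    frequent = {tuple(sorted(s)) for s in frequent_k}
    k = len(next(iter(frequent)))
    return {
        c for c in candidates
        if all(sub in frequent for sub in combinations(c, k))
    }


# Each itemset is written as a string of single-letter items, as on the slide.
L3 = ["abc", "abd", "acd", "ace", "bcd"]

joined = self_join(L3)
C4 = prune(joined, L3)

print(sorted("".join(c) for c in joined))  # ['abcd', 'acde']
print(sorted("".join(c) for c in C4))      # ['abcd']
```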
How to Count Supports Of Candidates?
Why is counting supports of candidates a problem?
– The total number of candidates can be huge
– One transaction may contain many candidates
Method:
– Candidate itemsets are stored in a hash-tree
– Leaf node of hash-tree contains a list of itemsets and counts
– Interior node contains a hash table
– Subset function: finds all the candidates contained in a transaction
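A full hash-tree is beyond a short example, so the sketch below swaps in a simpler stand-in: a flat hash table (a Python dict) keyed by candidate, with the subset function approximated by enumerating the k-subsets of each transaction. It is not the hash-tree method itself, only an illustration of the counting idea. The data reuses the four TDB transactions and the C2 candidates from the Apriori example in this deck.

```python
from itertools import combinations


def count_supports(candidates, transactions, k):
    """Count support for k-item candidates using a flat hash table.

    For each transaction, enumerate its k-subsets and look each one up in a
    dict keyed by candidate. This stands in for the hash-tree subset function.
    """
    counts = {frozenset(c): 0 for c in candidates}
    for items in transactions:
        for subset in combinations(sorted(items), k):
            key = frozenset(subset)
            if key in counts:
                counts[key] += 1
    return counts


transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
candidates = [
    {"A", "B"}, {"A", "C"}, {"A", "E"},
    {"B", "C"}, {"B", "E"}, {"C", "E"},
]

counts = count_supports(candidates, transactions, k=2)
print({"".join(sorted(c)): n for c, n in counts.items()})
# {'AB': 1, 'AC': 2, 'AE': 1, 'BC': 2, 'BE': 3, 'CE': 2}
```

This avoids comparing every candidate with every transaction, but the number of k-subsets of a long transaction grows quickly, which is the cost a hash-tree is meant to keep under control.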
Generating Association Rules
Once all frequent itemsets have been found, association rules can be generated
Strong association rules from a frequent itemset are generated by calculating the confidence in each possible rule arising from that itemset and testing it against a minimum confidence threshold
Example
TID List of item_IDs
T100 Coke, Crisps, Milk
T200 Crisps, Bread
T300 Crisps, Nappies
T400 Coke, Crisps, Bread
T500 Coke, Nappies
T600 Crisps, Nappies
T700 Coke, Nappies
T800 Coke, Crisps, Nappies, Milk
T900 Coke, Crisps, Nappies
ID Item
I1 Coke
I2 Crisps
I3 Nappies
I4 Bread
I5 Milk
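Rule generation on this data can be sketched as follows. The slides do not state the thresholds for this example, so the choice of the itemset {Coke, Crisps, Milk} (which occurs in two transactions) and the minimum confidence of 70% are assumptions made for illustration, as is the function name `rules_from`.

```python
from itertools import combinations

# The nine transactions from the example table.
transactions = {
    "T100": {"Coke", "Crisps", "Milk"},
    "T200": {"Crisps", "Bread"},
    "T300": {"Crisps", "Nappies"},
    "T400": {"Coke", "Crisps", "Bread"},
    "T500": {"Coke", "Nappies"},
    "T600": {"Crisps", "Nappies"},
    "T700": {"Coke", "Nappies"},
    "T800": {"Coke", "Crisps", "Nappies", "Milk"},
    "T900": {"Coke", "Crisps", "Nappies"},
}


def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def rules_from(itemset, min_conf):
    """Yield (antecedent, consequent, confidence) for each rule A => B with
    A ∪ B equal to `itemset`, A and B non-empty, and confidence >= min_conf."""
    itemset = frozenset(itemset)
    whole = support_count(itemset)
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            conf = whole / support_count(a)
            if conf >= min_conf:
                yield a, itemset - a, conf


MIN_CONF = 0.7  # assumed threshold, not given in the slides

rules = list(rules_from({"Coke", "Crisps", "Milk"}, MIN_CONF))
for a, b, conf in rules:
    print(sorted(a), "=>", sorted(b), f"{conf:.0%}")
# ['Milk'] => ['Coke', 'Crisps'] 100%
# ['Coke', 'Milk'] => ['Crisps'] 100%
# ['Crisps', 'Milk'] => ['Coke'] 100%
```

The three remaining candidate rules fall below the assumed threshold: Coke => {Crisps, Milk} has confidence 2/6, Crisps => {Coke, Milk} has 2/7, and {Coke, Crisps} => Milk has 2/4.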
Challenges Of Frequent Pattern Mining
Challenges
– Multiple scans of transaction database
– Huge number of candidates
– Tedious workload of support counting for candidates
Improving Apriori: general ideas
– Reduce passes of transaction database scans
– Shrink number of candidates
– Facilitate support counting of candidates