Mining High Utility Itemsets without Candidate Generation

Date: 2013/05/13
Authors: Mengchi Liu, Junfeng Qu
Source: CIKM '12
Advisor: Jia-ling Koh
Speaker: I-Chih Chiu


Page 1: Mining High Utility  Itemsets  without Candidate Generation


Page 2: Mining High Utility  Itemsets  without Candidate Generation

Outline
• Introduction
• Problem Definition
• Utility-List Structure
• High Utility Itemset Miner
• Experiment
• Conclusion

Page 3: Mining High Utility  Itemsets  without Candidate Generation

Introduction
• The rapid development of database techniques facilitates the storage and usage of massive data from business corporations, governments, and scientific organizations.
• High utility itemset mining is one of the most important problems derived from the well-known frequent itemset mining problem.

Page 4: Mining High Utility  Itemsets  without Candidate Generation

Introduction
• Traditional frequent itemset mining algorithms cannot evaluate the utility information about itemsets.
• In a supermarket database:
  - Each item has a distinct price/profit.
  - Each item in a transaction is associated with a distinct quantity.
  - An itemset with high support may have low utility.

Ex:
Itemset       Support   Total utility
egg, bread    10        30
beef, pork    5         45

Page 5: Mining High Utility  Itemsets  without Candidate Generation

Motivation
• Recently, a number of high utility itemset mining algorithms have been proposed. They work in two steps:
  - Generate candidate high utility itemsets.
  - Compute the exact utilities of the candidates by scanning the database to identify high utility itemsets.
• However, these algorithms often generate a very large number of candidate itemsets, which leads to:
  - Excessive memory requirements for storing candidate itemsets.
  - A large amount of running time for generating candidates and computing their exact utilities.

Page 6: Mining High Utility  Itemsets  without Candidate Generation

Goal
• A novel structure, called the utility-list, is proposed. It stores:
  - the utility information about an itemset;
  - the heuristic information about whether the itemset should be pruned or not.
• An efficient algorithm, called HUI-Miner (High Utility Itemset Miner), is developed.
  - It does not generate candidate high utility itemsets.
  - It can mine high utility itemsets after constructing the initial utility-lists.

Page 7: Mining High Utility  Itemsets  without Candidate Generation

Diagram
transactions → Construct utility-lists → HUI-Miner → High utility itemsets

Page 8: Mining High Utility  Itemsets  without Candidate Generation

Outline
• Introduction
• Problem Definition
• Utility-List Structure
• High Utility Itemset Miner
• Experiment
• Conclusion

Page 9: Mining High Utility  Itemsets  without Candidate Generation

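Problem Definition
• $I = \{i_1, i_2, \ldots, i_n\}$: a set of items.
• Each transaction ($T$) has a unique identifier ($tid$).

Def. 1. $iu(i, T)$: the internal utility of item $i$ in transaction $T$ is the count value associated with $i$ in $T$ in the transaction table.

Def. 2. $eu(i)$: the external utility of item $i$ is the utility value of $i$ in the utility table.

Def. 3. $u(i, T)$: the utility of item $i$ in transaction $T$ is the product of $iu(i, T)$ and $eu(i)$.

Ex: $u(e, T_5) = iu(e, T_5) \times eu(e)$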

Page 10: Mining High Utility  Itemsets  without Candidate Generation

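Def. 4. $u(X, T)$: the utility of itemset $X$ in transaction $T$ is the sum of the utilities of all the items in $X$ in $T$, where $u(X, T) = \sum_{i \in X \wedge X \subseteq T} u(i, T)$.

Def. 5. $u(X)$: the utility of itemset $X$ is the sum of the utilities of $X$ in all the transactions containing $X$ in $DB$, where $u(X) = \sum_{T \in DB \wedge X \subseteq T} u(X, T)$.

Def. 6. $tu(T)$: the utility of transaction $T$ is the sum of the utilities of all the items in $T$, where $tu(T) = \sum_{i \in T} u(i, T)$.

Ex:
$u(\{ae\}, T_2) = u(a, T_2) + u(e, T_2)$
$u(\{ae\}) = u(\{ae\}, T_2) + u(\{ae\}, T_5)$

Ex: [Table: transaction utility of each transaction]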
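Defs. 1 to 6 can be sketched in a few lines of Python. The database below is a hypothetical stand-in, since the slide tables are not part of this transcript; its values were chosen to agree with the numbers quoted elsewhere in the slides. The names `EU`, `DB`, `u_item`, `u_in_tx`, `u` and `tu` are illustrative assumptions, not the authors' code.

```python
# Hypothetical example data: the original slide tables are not preserved here.
# EU maps item -> external utility (unit profit); DB maps tid -> {item: count}.
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}


def u_item(i, tid):
    """Def. 3: u(i, T) = iu(i, T) * eu(i)."""
    return DB[tid][i] * EU[i]


def u_in_tx(X, tid):
    """Def. 4: utility of itemset X in transaction T (0 if T does not contain X)."""
    if not set(X).issubset(DB[tid]):
        return 0
    return sum(u_item(i, tid) for i in X)


def u(X):
    """Def. 5: utility of itemset X over the whole database."""
    return sum(u_in_tx(X, tid) for tid in DB)


def tu(tid):
    """Def. 6: utility of a transaction."""
    return sum(u_item(i, tid) for i in DB[tid])


print(u_item("e", 5))    # 2 * 4 = 8
print(u_in_tx("ae", 2))  # 4 + 4 = 8
print(u("ae"))           # 8 (T2) + 13 (T5) = 21
print(tu(2))             # 18
```

Itemsets are passed as strings of single-letter items purely for brevity; any iterable of item names would work the same way.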

Page 11: Mining High Utility  Itemsets  without Candidate Generation

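Def. 7. $twu(X)$: the transaction-weighted utility of itemset $X$ in $DB$ is the sum of the utilities of all the transactions containing $X$ in $DB$, where $twu(X) = \sum_{T \in DB \wedge X \subseteq T} tu(T)$.

Property 1. If $twu(X)$ is less than a given minutil, no superset of $X$ is high utility.
Rationale. For any $X' \supseteq X$: $u(X') \le twu(X') \le twu(X) < minutil$.

Ex: $twu(\{f\}) = tu(T_4) + tu(T_6)$

Ex: Assume minutil = 30. According to Property 1, all supersets of $\{f\}$ are not high utility.

[Tables: transaction utility; transaction-weighted utility]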
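A minimal sketch of Def. 7 and the Property 1 filter follows. It assumes a hypothetical database whose values were picked to match the slide examples (the original tables are not available here); `twu`, `tu` and `MINUTIL` are illustrative names.

```python
# Hypothetical example data (slide tables not preserved in this transcript).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}
MINUTIL = 30


def tu(tx):
    """Transaction utility: sum of count * unit profit over the items of tx."""
    return sum(count * EU[i] for i, count in tx.items())


def twu(X):
    """Def. 7: sum of tu(T) over every transaction T that contains itemset X."""
    return sum(tu(tx) for tx in DB.values() if set(X).issubset(tx))


# Property 1: an item whose twu is below minutil cannot appear in any
# high utility itemset, so it can be discarded up front.
pruned = [i for i in EU if twu(i) < MINUTIL]

print(twu("f"))  # tu(T4) + tu(T6) = 9 + 18 = 27
print(pruned)    # ['f', 'g']
```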

Page 12: Mining High Utility  Itemsets  without Candidate Generation

Outline
• Introduction
• Problem Definition
• Utility-List Structure
  - Initial Utility-Lists
  - Utility-Lists of 2-Itemsets
  - Utility-Lists of k-Itemsets (k ≥ 3)
• High Utility Itemset Miner
• Experiment
• Conclusion

Page 13: Mining High Utility  Itemsets  without Candidate Generation

Initial Utility-Lists
Def. 8. A transaction is considered "revised" after:
(1) all the items whose transaction-weighted utilities are less than a given minutil are deleted from the transaction;
(2) the remaining items are sorted in transaction-weighted-utility-ascending order.

Ex: Suppose minutil = 30. The remaining items are sorted: e < c < b < a < d.

[Tables: transaction-weighted utility of each item; all revised transactions]
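The two steps of Def. 8 can be sketched as below, assuming a hypothetical database consistent with the slide examples. The helper name `revise` and the tie-break on the item name are assumptions made for illustration.

```python
# Hypothetical example data (slide tables not preserved in this transcript).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}


def revise(db, eu, minutil):
    """Return (item order, revised transactions) as described in Def. 8."""
    # Transaction-weighted utility of every single item.
    twu = {}
    for tx in db.values():
        tx_utility = sum(count * eu[i] for i, count in tx.items())
        for i in tx:
            twu[i] = twu.get(i, 0) + tx_utility
    # (1) drop items with twu < minutil, (2) sort the rest by ascending twu.
    # Ties are broken by item name, which is an assumption of this sketch.
    order = sorted((i for i in twu if twu[i] >= minutil), key=lambda i: (twu[i], i))
    rank = {i: r for r, i in enumerate(order)}
    revised = {
        tid: sorted((i for i in tx if i in rank), key=rank.get)
        for tid, tx in db.items()
    }
    return order, revised


order, revised = revise(DB, EU, 30)
print(order)       # ['e', 'c', 'b', 'a', 'd']
print(revised[2])  # ['e', 'c', 'b', 'a', 'd']
print(revised[4])  # ['e', 'c']
```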

Page 14: Mining High Utility  Itemsets  without Candidate Generation

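Def. 9. $T/X$: the set of all the items after $X$ in $T$, where $X$ is an itemset, $T$ is a transaction (or itemset), and $X \subseteq T$.

Ex: $T_2/\{eb\} = \{ad\}$, $T_2/\{c\} = \{bad\}$

Def. 10. $ru(X, T)$: the remaining utility of itemset $X$ in transaction $T$ is the sum of the utilities of all the items in $T/X$ in $T$, where $ru(X, T) = \sum_{i \in (T/X)} u(i, T)$.

Initial Utility-Lists
Each element in the utility-list of an itemset $X$ has three fields:
• tid: the identifier of a transaction $T$ containing $X$
• iutil: the utility of $X$ in $T$, i.e., $u(X, T)$
• rutil: the remaining utility of $X$ in $T$, i.e., $ru(X, T)$

Ex: <3, 2, 9> is in the utility-list of {c}.

[Tables: all revised transactions; initial utility-lists]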
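The following sketch builds the initial utility-lists from one pass over the revised transactions. It assumes a hypothetical database that reproduces the element <3, 2, 9> quoted on the slide; `initial_utility_lists` is an illustrative name, and the item order `"ecbad"` is taken from the sorted order stated in the slides.

```python
# Hypothetical example data (slide tables not preserved in this transcript).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}
ORDER = "ecbad"  # twu-ascending order of the remaining items (e < c < b < a < d)


def initial_utility_lists(db, eu, order):
    """Map each item to its utility-list: a list of (tid, iutil, rutil) tuples."""
    rank = {i: r for r, i in enumerate(order)}
    lists = {i: [] for i in order}
    for tid, tx in db.items():
        # Revised transaction: pruned items removed, the rest sorted by `order`.
        items = sorted((i for i in tx if i in rank), key=rank.get)
        remaining = sum(tx[i] * eu[i] for i in items)
        for i in items:
            util = tx[i] * eu[i]
            remaining -= util  # utility of the items after i in this transaction
            lists[i].append((tid, util, remaining))
    return lists


uls = initial_utility_lists(DB, EU, ORDER)
print(uls["c"])  # [(1, 2, 7), (2, 3, 11), (3, 2, 9), (4, 2, 0), (6, 1, 11)]
```

Keeping a running `remaining` total avoids re-summing the tail of each transaction for every item.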

Page 15: Mining High Utility  Itemsets  without Candidate Generation

Utility-Lists of 2-Itemsets
• No need for a database scan.
• The utility-list of a 2-itemset is built by identifying the common transactions in the utility-lists of its two items.

Page 16: Mining High Utility  Itemsets  without Candidate Generation

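Utility-Lists of k-Itemsets
• To construct the utility-list of a k-itemset $\{i_1 \cdots i_{k-1} i_k\}$ ($k \ge 3$), intersect the utility-lists of $\{i_1 \cdots i_{k-2} i_{k-1}\}$ and $\{i_1 \cdots i_{k-2} i_k\}$.

Ex: [Figure: utility-list construction for k = 2 and for k ≥ 3]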
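A sketch of the intersection step, under stated assumptions: the input lists are hard-coded from a hypothetical database consistent with the slide examples, and `construct` is an illustrative name. For k = 2 the iutils of the two items are added; for k ≥ 3 the iutil of the shared prefix is subtracted once because both input lists already include it. In both cases the rutil comes from the later itemset.

```python
# Utility-lists as lists of (tid, iutil, rutil); values derived from a
# hypothetical example database, not copied from the original slides.
UL_E = [(2, 4, 14), (4, 4, 2), (5, 8, 14)]
UL_C = [(1, 2, 7), (2, 3, 11), (3, 2, 9), (4, 2, 0), (6, 1, 11)]
UL_B = [(1, 2, 5), (2, 2, 9), (5, 4, 10), (6, 8, 3)]


def construct(ul_p, ul_px, ul_py):
    """Join the utility-lists of Px and Py on common tids.

    ul_p is the utility-list of the shared prefix P, or None when Px and Py
    are single items (the k = 2 case).
    """
    prefix_iutil = {tid: iu for tid, iu, _ in ul_p} if ul_p else {}
    py = {tid: (iu, ru) for tid, iu, ru in ul_py}
    joined = []
    for tid, iu_x, _ in ul_px:
        if tid in py:  # transaction contains both Px and Py
            iu_y, ru_y = py[tid]
            joined.append((tid, iu_x + iu_y - prefix_iutil.get(tid, 0), ru_y))
    return joined


ul_ec = construct(None, UL_E, UL_C)    # k = 2
ul_eb = construct(None, UL_E, UL_B)    # k = 2
ul_ecb = construct(UL_E, ul_ec, ul_eb)  # k = 3, prefix {e}

print(ul_ec)   # [(2, 7, 11), (4, 6, 0)]
print(ul_eb)   # [(2, 6, 9), (5, 12, 10)]
print(ul_ecb)  # [(2, 9, 9)]
```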

Page 17: Mining High Utility  Itemsets  without Candidate Generation

Outline
• Introduction
• Problem Definition
• Utility-List Structure
• High Utility Itemset Miner
  - Search space
  - Pruning Strategy
  - HUI-Miner Algorithm
• Experiment
• Conclusion

Page 18: Mining High Utility  Itemsets  without Candidate Generation

Search space
• Set-Enumeration Tree

Def. 11. Given a set-enumeration tree, an itemset represented by a node is called an extension of an itemset represented by an ancestor node of that node. For an itemset containing $k$ items, its extension containing $(k+i)$ items is called an $i$-extension of the itemset.

Property 2. If $X'$ is an extension of $X$, then $(X' - X) = (X'/X)$.
Rationale. Any extension of $X$ is a combination of $X$ with the item(s) after $X$.

Ex :: the 1-extension of : the 2-extension of

Def. 9
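To make Def. 11 concrete, here is a small sketch that enumerates the nodes of a set-enumeration tree over the sorted items e < c < b < a < d and tests the extension relation. The function names `enumerate_tree` and `is_extension` are assumptions for illustration.

```python
def enumerate_tree(order, prefix=()):
    """Yield every node of the set-enumeration tree in depth-first order.

    A node is extended only with items that come after its last item.
    """
    for k, item in enumerate(order):
        node = prefix + (item,)
        yield node
        yield from enumerate_tree(order[k + 1:], node)


def is_extension(candidate, itemset):
    """True if `candidate` is a descendant of `itemset` in the tree."""
    n = len(itemset)
    return len(candidate) > n and candidate[:n] == itemset


nodes = list(enumerate_tree("ecbad"))
print(len(nodes))  # 2**5 - 1 = 31 non-empty itemsets
print(nodes[:3])   # [('e',), ('e', 'c'), ('e', 'c', 'b')]
print(is_extension(("e", "c", "b"), ("e", "c")))  # True
print(is_extension(("e", "b"), ("c",)))           # False
```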

Page 19: Mining High Utility  Itemsets  without Candidate Generation

Pruning Strategy
• Exhaustive search → time-consuming

Lemma 1. Given the utility-list of $X$, if the sum of all the iutils and rutils in the utility-list is less than a given minutil, any extension $X'$ of $X$ is not high utility.

Ex: Assume $X = \{ec\}$, $X' = \{ecb\}$.

Page 20: Mining High Utility  Itemsets  without Candidate Generation

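• $id(T)$: the tid of transaction $T$
• $X.tids$: the tid set in the utility-list of $X$
• $X'.tids$: the tid set in the utility-list of $X'$

Ex: $\{ec\} \subset \{ecb\} \Rightarrow \{T_2\} \subseteq \{T_2, T_4\}$

Ex: Suppose minutil = 30. The sum of all the iutils and rutils in the utility-list of {ec} is 7 + 6 + 11 = 24 < 30, so {ecb} is not high utility.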
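The Lemma 1 test reduces to one sum over a utility-list. The sketch below uses the utility-list of {ec} implied by the figures in the example (iutils 7 and 6, rutils 11 and 0); the tuple layout and the name `can_prune` are assumptions.

```python
def can_prune(utility_list, minutil):
    """Lemma 1: if sum(iutil + rutil) < minutil, no extension is high utility."""
    return sum(iutil + rutil for _, iutil, rutil in utility_list) < minutil


# Utility-list of {ec} as (tid, iutil, rutil) tuples.
UL_EC = [(2, 7, 11), (4, 6, 0)]

print(can_prune(UL_EC, 30))  # True: 7 + 11 + 6 + 0 = 24 < 30
print(can_prune(UL_EC, 20))  # False: 24 >= 20, extensions must be explored
```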

Page 21: Mining High Utility  Itemsets  without Candidate Generation

HUI-Miner Algorithm
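The algorithm listing on this slide is not preserved in the transcript. What follows is a minimal sketch of a utility-list based depth-first search assembled from the definitions and Lemma 1 above, not the authors' listing. The database is hypothetical (chosen to agree with the slide examples), and `construct`, `hui_miner` and `search` are illustrative names.

```python
# Hypothetical example data (slide tables not preserved in this transcript).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}


def construct(ul_p, ul_px, ul_py):
    """Join two utility-lists on common tids; ul_p is the prefix list or None."""
    prefix_iutil = {tid: iu for tid, iu, _ in ul_p} if ul_p else {}
    py = {tid: (iu, ru) for tid, iu, ru in ul_py}
    joined = []
    for tid, iu_x, _ in ul_px:
        if tid in py:
            iu_y, ru_y = py[tid]
            # The prefix utility is counted in both lists, so subtract it once.
            joined.append((tid, iu_x + iu_y - prefix_iutil.get(tid, 0), ru_y))
    return joined


def hui_miner(db, eu, minutil):
    """Return {itemset tuple: utility} for every high utility itemset."""
    # Scan 1: transaction utilities and transaction-weighted utilities.
    tu = {tid: sum(c * eu[i] for i, c in tx.items()) for tid, tx in db.items()}
    twu = {}
    for tid, tx in db.items():
        for i in tx:
            twu[i] = twu.get(i, 0) + tu[tid]
    order = sorted((i for i in twu if twu[i] >= minutil), key=lambda i: (twu[i], i))
    rank = {i: r for r, i in enumerate(order)}

    # Scan 2: initial utility-lists built from the revised transactions.
    initial = {i: [] for i in order}
    for tid, tx in db.items():
        items = sorted((i for i in tx if i in rank), key=rank.get)
        remaining = sum(tx[i] * eu[i] for i in items)
        for i in items:
            util = tx[i] * eu[i]
            remaining -= util
            initial[i].append((tid, util, remaining))

    found = {}

    def search(ul_prefix, uls):
        for k, (itemset, ul) in enumerate(uls):
            sum_iutil = sum(iu for _, iu, _ in ul)
            sum_rutil = sum(ru for _, _, ru in ul)
            if sum_iutil >= minutil:
                found[itemset] = sum_iutil
            if sum_iutil + sum_rutil >= minutil:  # otherwise pruned by Lemma 1
                extensions = []
                for other, ul_other in uls[k + 1:]:
                    joined = construct(ul_prefix, ul, ul_other)
                    if joined:
                        extensions.append((itemset + other[-1:], joined))
                search(ul, extensions)

    search(None, [((i,), initial[i]) for i in order])
    return found


result = hui_miner(DB, EU, 30)
print(result)
```

On this hypothetical data with minutil = 30, the sketch reports two itemsets: {e, a, d} with utility 31 and {e, b, a, d} with utility 37. After the two initial scans the database is never read again; every later utility comes from joining utility-lists.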


Page 22: Mining High Utility  Itemsets  without Candidate Generation

Outline
• Introduction
• Problem Definition
• Utility-List Structure
• High Utility Itemset Miner
• Experiment
• Conclusion

Page 23: Mining High Utility  Itemsets  without Candidate Generation

Experimental Setup
• Besides HUI-Miner, the experiments include three algorithms:
  - IHUP$_{TWU}$
  - UP-Growth
  - UP-Growth+
• Eight databases, both real and synthetic.

Page 24: Mining High Utility  Itemsets  without Candidate Generation

• Running Time
  - A mining task was terminated once its running time exceeded 10,000 seconds.
  - For most sparse databases, the performance superiority of HUI-Miner becomes very significant when the minutil decreases.

Page 25: Mining High Utility  Itemsets  without Candidate Generation

• Memory Consumption
  - Except for the Accidents database in (a), HUI-Miner always consumes less memory than the other algorithms.
  - Another observation is that UP-Growth+ consumes more memory than UP-Growth in (b) and (d): UP-Growth+ holds more information than UP-Growth in sparse and large databases.

Page 26: Mining High Utility  Itemsets  without Candidate Generation

Experiment
• Processing Order of Items
  - The processing order of items significantly influences the performance of a high utility itemset mining algorithm.

Page 27: Mining High Utility  Itemsets  without Candidate Generation


Page 28: Mining High Utility  Itemsets  without Candidate Generation

Outline
• Introduction
• Problem Definition
• Utility-List Structure
• High Utility Itemset Miner
• Experiment
• Conclusion

Page 29: Mining High Utility  Itemsets  without Candidate Generation

Conclusion
• Proposed a novel data structure, the utility-list, and developed an efficient algorithm, HUI-Miner, for high utility itemset mining.
• Utility-lists provide not only utility information about itemsets but also important pruning information for HUI-Miner.
• HUI-Miner can mine high utility itemsets without candidate generation, which avoids the costly generation and utility computation of candidates.