UP-Growth: An Efficient Algorithm for High Utility Itemset Mining (SIGKDD 2010)

Upload: vinayaga-moorthy

Post on 21-Jun-2015

TRANSCRIPT

1. Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. SIGKDD 2010. UP-Growth: An Efficient Algorithm for High Utility Itemset Mining. 2010/8/25

2. Outline
Motivation
Problem Definition
Method
  • UP-Tree Structure
  • UP-Growth Method
Experimental Results
Conclusions

3. Motivation
Frequent itemset mining does not take the unit profits and purchased quantities of items into consideration.
The utility of an item expresses how interesting, important, or profitable it is to the user.

4. (Cont.)
The utility of items in a transaction database consists of two aspects:
  • External utility: the importance of distinct items.
  • Internal utility: the importance of the items in the transaction.
The utility of an itemset is defined as the external utility multiplied by the internal utility.
High utility itemset: an itemset whose utility is no less than a user-specified threshold.

5. (Cont.)
Mining high utility itemsets from databases is not an easy task, because the downward closure property used in frequent itemset mining does not hold for utility.
The challenge is to prune the search space effectively while still capturing every high utility itemset.

6. Problem Definition
Here $p(i_p)$ is the external utility (unit profit) of item $i_p$ and $q(i_p, T_d)$ is its internal utility (quantity) in transaction $T_d$.
Utility of an item in a transaction: $u(i_p, T_d) = p(i_p) \times q(i_p, T_d)$. Example: $u(\{A\}, T_1) = 5 \times 1 = 5$.
Utility of an itemset in a transaction: $u(X, T_d) = \sum_{i_p \in X \wedge X \subseteq T_d} u(i_p, T_d)$. Example: $u(\{AC\}, T_1) = u(\{A\}, T_1) + u(\{C\}, T_1) = 5 + 1 = 6$.
Utility of an itemset in the database: $u(X) = \sum_{X \subseteq T_d \wedge T_d \in D} u(X, T_d)$. Example: $u(\{AD\}) = u(\{AD\}, T_1) + u(\{AD\}, T_3) = 7 + 17 = 24$.
Transaction utility: $TU(T_d) = u(T_d, T_d)$. Example: $TU(T_1) = u(\{ACD\}, T_1) = 8$.
Transaction-weighted utilization: $TWU(X) = \sum_{X \subseteq T_d \wedge T_d \in D} TU(T_d)$. Example: $TWU(\{AD\}) = TU(T_1) + TU(T_3) = 8 + 30 = 38$.
An itemset is called a high utility itemset if its utility is no less than min_util.
If $TWU(X)$ is no less than the minimum utility threshold, X is called a high transaction-weighted utilization itemset (abbreviated as HTWUI).
The transaction-weighted downward closure (TWDC): for any itemset X, if X is not an HTWUI, any superset of X is a low utility itemset.

7. Proposed Method
  • Construction of the UP-Tree
  • Generation of potential high utility itemsets (PHUIs) from the UP-Tree by UP-Growth

8. Construction of UP-Tree
The UP-Tree can be constructed with two scans of the original database.
First scan:
  • The TU of each transaction is computed.
  • The TWU of each single item is accumulated.
Discarding global unpromising items:
  • Unpromising items are removed from each transaction, and their utilities are subtracted from the TU of the transaction.
  • The remaining promising items in the transaction are sorted in descending order of TWU.
Second scan:
  • Transactions are inserted into the UP-Tree.

9. (Cont.)
min_util = 40
[Figure: first scan of the example database, marking the unpromising items and listing the remaining items in descending order of TWU]

10. (Cont.)
Second scan

11. (Cont.)
[Figure: UP-Tree during the second scan; surviving node labels: 1, 8]

12. (Cont.)
[Figure: UP-Tree during the second scan; surviving node labels: 1, 8]

13. (Cont.)
[Figure: UP-Tree during the second scan; surviving node labels: 2, 30, 1, 1, 22, 22]

14. (Cont.)
Strategy 1. Discarding global unpromising items (DGU).

15. Generating PHUIs from the global UP-tree
[Figure: {D}'s conditional pattern base ({D}-CPB)]
An item $i_p$ is called a local promising item in $\{a_i\}$-CPB if its path utility $pu(i_p, \{a_i\}\text{-CPB})$ is no smaller than min_util.
{A} is a local unpromising item in {D}-CPB, so no superset of {DA} can be a high utility itemset.

16. (Cont.)
Generating PHUIs from the {D}-Tree: {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53}
The complete set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {BE}:54, {BEC}:54, {BC}:54, {A}:65, {AC}:55, {ACE}:47, {AE}:47, {E}:88, {EC}:76, {C}:96}.

17. Decreasing global node (DGN) utilities in construction of a global UP-Tree
Strategy 2. Decreasing global node utilities (DGN): the utilities of a node's descendants are discarded from the utility of the node during the construction of the global UP-Tree.
[Figure: {B}'s conditional pattern base ({B}-CPB)]

18. (Cont.)

19. (Cont.)
[Figure: global UP-Tree under construction with DGN; surviving node labels: 1, 1]

20. (Cont.)
[Figure: global UP-Tree under construction with DGN; surviving node labels: 1, 1]

21. (Cont.)
$\{C\}.nu = 1 + p(\{C\}) \times q(\{C\}, T_2) = 1 + 1 \times 6 = 7$
[Figure: surviving node labels: 2, 7]

22. (Cont.)
$\{E\}.nu = p(\{C\}) \times q(\{C\}, T_2) + p(\{E\}) \times q(\{E\}, T_2) = 1 \times 6 + 3 \times 2 = 12$
[Figure: surviving node labels: 2, 7; 1, 12]

23. (Cont.)
$\{A\}.nu = p(\{C\}) \times q(\{C\}, T_2) + p(\{E\}) \times q(\{E\}, T_2) + p(\{A\}) \times q(\{A\}, T_2) = 1 \times 6 + 3 \times 2 + 5 \times 2 = 22$
[Figure: surviving node labels: 2, 7; 1, 12; 1, 22]

24. (Cont.)
With DGN, the set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEBC}:45, {DEC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {A}:65, {E}:88, {C}:96}.

25. UP-Growth
UP-Growth generates PHUIs from the global UP-Tree efficiently by means of two strategies:
  • DLU (Discarding local unpromising items)
  • DLN (Decreasing local node utilities)

26. DLU
Because memory space is limited, the exact utility values of the items in a conditional pattern base are not maintained. A minimum item utility table (MIUT) is maintained instead.
Strategy 3. Discarding local unpromising items (DLU): the minimum item utilities of unpromising items are discarded from the path utilities of the paths during the construction of a local UP-Tree.

27. (Cont.)
Path {AC}: $8 - miu(\{A\}) \times \{AC\}.count = 8 - 5 \times 1 = 3$
Path {BAEC}: $25 - miu(\{A\}) \times \{BAEC\}.count = 25 - 5 \times 1 = 20$

28. DLN
Strategy 4. Decreasing local node utilities (DLN): the minimum item utilities of a node's descendants are subtracted from the utility of the node during the construction of a local UP-Tree.
[Figure: local UP-Tree under construction; surviving node labels: 1, 3]

29. DLN
$3 + \{20 - miu(\{B\}) \times 1 - miu(\{E\}) \times 1\} = 3 + 13 = 16$
$20 - miu(\{E\}) \times 1 = 20 - 3 = 17$
[Figure: local UP-Tree under construction; surviving node labels: 2, 16; 1, 17; 1, 20]

30. DLN
$16 + \{20 - miu(\{B\}) \times 1 - miu(\{E\}) \times 1\} = 16 + 13 = 29$
$17 + \{20 - miu(\{E\}) \times 1\} = 17 + 17 = 34$
[Figure: local UP-Tree under construction; surviving node labels: 3, 29; 2, 34; 2, 40]

31. Experimental Results

32. Scalability

33. Conclusions
This paper proposed UP-Growth, an efficient algorithm for mining high utility itemsets.
A UP-Tree structure is proposed for maintaining the information of high utility itemsets.
With the four strategies, mining performance is enhanced significantly, since both the search space and the number of candidates are effectively reduced.
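The definitions on slide 6 (Problem Definition) can be checked with a few lines of Python. This is only a sketch: the transcript does not include the example database, so the `PROFIT` and `DB` tables below are an assumption, reconstructed so that they reproduce every number quoted on the slides. The function names are illustrative and do not come from the slides.

```python
# Toy data. ASSUMPTION: reconstructed to agree with the worked values on the
# slides (e.g. u({A},T1)=5, TU(T1)=8, TWU({AD})=38); the transcript itself
# does not contain the table.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}


def item_utility(item, tid):
    """u(i_p, T_d) = p(i_p) * q(i_p, T_d)."""
    return PROFIT[item] * DB[tid][item]


def itemset_utility_in(itemset, tid):
    """u(X, T_d): sum of item utilities, or 0 if X is not contained in T_d."""
    if not set(itemset).issubset(DB[tid]):
        return 0
    return sum(item_utility(i, tid) for i in itemset)


def itemset_utility(itemset):
    """u(X): utility of X summed over every transaction that contains X."""
    return sum(itemset_utility_in(itemset, tid) for tid in DB)


def transaction_utility(tid):
    """TU(T_d): utility of the whole transaction."""
    return sum(item_utility(i, tid) for i in DB[tid])


def twu(itemset):
    """TWU(X): sum of TU(T_d) over every transaction that contains X."""
    return sum(
        transaction_utility(tid)
        for tid in DB
        if set(itemset).issubset(DB[tid])
    )


if __name__ == "__main__":
    print(itemset_utility({"A", "D"}), twu({"A", "D"}))
```

With min_util = 40, {AD} has TWU 38 and so is not an HTWUI, which by TWDC rules out all of its supersets without computing their exact utilities.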
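The first scan and the DGU strategy described on slides 8, 9, and 14 can be sketched as follows. The database is again an assumption (reconstructed to match the slide values, with min_util = 40 taken from slide 9), and ties in TWU are broken alphabetically, which is a choice made here rather than something the slides specify.

```python
# ASSUMPTION: toy data reconstructed to match the numbers on the slides.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
MIN_UTIL = 40  # threshold shown on slide 9


def first_scan(db, profit):
    """Compute TU of every transaction and accumulate TWU of every item."""
    tu, twu = {}, {}
    for tid, items in db.items():
        tu[tid] = sum(profit[i] * q for i, q in items.items())
        for i in items:
            twu[i] = twu.get(i, 0) + tu[tid]
    return tu, twu


def reorganize(db, profit, twu, min_util):
    """DGU: drop unpromising items, subtract their utility from TU, and
    sort the remaining items by descending TWU (alphabetical tie-break)."""
    promising = {i for i, v in twu.items() if v >= min_util}
    result = {}
    for tid, items in db.items():
        kept = sorted(
            (i for i in items if i in promising),
            key=lambda i: (-twu[i], i),
        )
        # Reorganized transaction utility: only promising items contribute.
        rtu = sum(profit[i] * items[i] for i in kept)
        result[tid] = (kept, rtu)
    return result


tu, twu = first_scan(DB, PROFIT)
reorganized = reorganize(DB, PROFIT, twu, MIN_UTIL)

if __name__ == "__main__":
    for tid, (items, rtu) in reorganized.items():
        print(tid, items, rtu)
```

Under these assumptions F and G fall below the threshold, and T2 is reorganized to C, E, A with utility 22, which agrees with the value 22 that survives in the figure residue on slide 13.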
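The node-utility arithmetic on slides 21 to 23 (DGN) follows from one rule: when a reorganized transaction is inserted, each node on the path receives only the utility of the items from the root down to that node, so the utilities of its descendants are left out. The sketch below illustrates that rule. The `Node` class and `insert_with_dgn` are hypothetical names, and the two transactions use the per-item utilities quoted on slides 6 and 21 to 23 (the utility 2 for D in T1 is inferred from TU(T1) = 8).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One UP-Tree node: item name, support count, and node utility (nu)."""
    item: Optional[str]
    count: int = 0
    nu: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert_with_dgn(root: Node, transaction: List[Tuple[str, int]]) -> None:
    """Insert a reorganized transaction given as (item, utility) pairs in
    descending TWU order. Each node gets the utility of the path prefix
    ending at it, i.e. descendant utilities are discarded (DGN)."""
    node = root
    prefix_utility = 0
    for item, utility in transaction:
        prefix_utility += utility
        child = node.children.get(item)
        if child is None:
            child = Node(item)
            node.children[item] = child
        child.count += 1
        child.nu += prefix_utility
        node = child


# Reorganized T1 and T2 with item utilities taken from the slides.
T1 = [("C", 1), ("A", 5), ("D", 2)]
T2 = [("C", 6), ("E", 6), ("A", 10)]

root = Node(None)
insert_with_dgn(root, T1)
insert_with_dgn(root, T2)

if __name__ == "__main__":
    c = root.children["C"]
    e = c.children["E"]
    print(c.nu, e.nu, e.children["A"].nu)
```

After both insertions the C node holds count 2 and utility 7, its child E holds 12, and the A node below E holds 22, matching the three equations on the slides.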
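The DLU and DLN steps on slides 26 to 30 can be replayed on {D}'s conditional pattern base. This is a sketch under several assumptions: the third path and the root-to-leaf order of the paths are inferred from the figure residue and from the arithmetic on slides 27 to 30, miu({B}) = 4 is inferred from 20 - miu({B}) - 3 = 13, miu({C}) = 1 comes from the reconstructed toy database, and local ties are broken alphabetically so that B precedes E. All function names are illustrative.

```python
MIN_UTIL = 40

# Minimum item utilities. ASSUMPTION: A and E are read off the slides,
# B is inferred from slide 29, C comes from the reconstructed toy database.
MIU = {"A": 5, "B": 4, "C": 1, "E": 3}

# {D}-CPB as (path from root, path utility, count). ASSUMPTION: inferred
# from slides 27 to 30; the slides list paths bottom-up, e.g. {BAEC}.
CPB_D = [
    (["C", "A"], 8, 1),
    (["C", "E", "A", "B"], 25, 1),
    (["C", "E", "B"], 20, 1),
]


def local_path_utilities(cpb):
    """Path utility of each item: sum of utilities of the paths it is on."""
    pu = {}
    for path, utility, _count in cpb:
        for item in path:
            pu[item] = pu.get(item, 0) + utility
    return pu


def apply_dlu(cpb, miu, min_util):
    """DLU: remove local unpromising items from each path and subtract
    miu(item) * count from the path utility; reorder by descending pu."""
    pu = local_path_utilities(cpb)
    promising = {i for i, v in pu.items() if v >= min_util}
    reorganized = []
    for path, utility, count in cpb:
        for item in path:
            if item not in promising:
                utility -= miu[item] * count
        kept = sorted(
            (i for i in path if i in promising),
            key=lambda i: (-pu[i], i),
        )
        reorganized.append((kept, utility, count))
    return reorganized


def build_local_tree_dln(paths, miu):
    """DLN: each node on a path gets the path utility minus the minimum
    item utilities of its descendants. Returns {prefix: [count, nu]}."""
    nodes = {}
    for path, utility, count in paths:
        for k in range(len(path)):
            descendants = path[k + 1:]
            nu = utility - sum(miu[d] for d in descendants) * count
            entry = nodes.setdefault(tuple(path[: k + 1]), [0, 0])
            entry[0] += count
            entry[1] += nu
    return nodes


reorganized_paths = apply_dlu(CPB_D, MIU, MIN_UTIL)
local_tree = build_local_tree_dln(reorganized_paths, MIU)

if __name__ == "__main__":
    for prefix, (count, nu) in local_tree.items():
        print(prefix, count, nu)
```

Under these assumptions the root child C ends with count 3 and utility 29, and the node below it ends with count 2 and utility 34, which are the totals shown on slide 30.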