Adaptive Load Shedding for Mining Frequent Patterns from
Data Streams
Xuan Hong Dang, Wee-Keong Ng, and Kok-Leong Ong
(DaWaK 2006)
2008/3/19 Yi-Chun Chen
Outline
• Motivation
• Objective
• Definition
• Adaptive Load Shedding in Data Streams
• Performance Results
• Conclusion
Motivation
• Finding frequent itemsets plays an important role in analyzing data streams
• Existing algorithms simply assume that the mining system is fast enough to handle all incoming transactions without incurring any unwanted latency
(Cont.)
• The arrival rate of data streams usually exceeds the system capacity
• Algorithms mining from data streams must cope with system overload situations
Objective
• Given a mining system with processing capacity C and a data stream DS with a high arrival rate
• Load(DS): the workload that DS places on the system
• If $Load(DS) > C$, load shedding is invoked
• Guarantee $Load(DS) \le C$ after shedding
• Discover a set of patterns that closely approximates the set of actual frequent itemsets
(Cont.)
• How to determine overload situations?
• How much load to shed?
• How to approximate frequent patterns under the introduction of load shedding?
Definition
• $I = \{a_1, a_2, \ldots, a_m\}$: the set of items
• $DS = \{t_1, t_2, \ldots, t_N, \ldots\}$: the data stream of transactions
• $freq(X)$: the occurrence count of itemset X in DS up to the $N^{th}$ transaction
• $sup(X) = \frac{freq(X)}{N}$: the support of X
• MFIs: maximal frequent itemsets
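As a small illustration of these definitions, the sketch below counts freq(X) and sup(X) over the first N transactions of a toy stream. The function names `freq` and `sup` simply mirror the slide's notation, and the four transactions are made-up data, not taken from the paper.

```python
from typing import FrozenSet, Iterable, List


def freq(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Occurrence count of `itemset` among the transactions seen so far."""
    x = frozenset(itemset)
    # A transaction supports X when X is a subset of it.
    return sum(1 for t in transactions if x <= t)


def sup(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Support = freq(X) / N, where N is the number of transactions seen."""
    n = len(transactions)
    return freq(itemset, transactions) / n if n else 0.0


# Hypothetical stream prefix: N = 4 transactions over the items a..d.
stream = [frozenset("abc"), frozenset("ab"), frozenset("bd"), frozenset("abd")]
print(freq("ab", stream))  # 3
print(sup("ab", stream))   # 0.75
```

Here {a, b} is contained in three of the four transactions, so its support is 0.75.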
Adaptive Load Shedding in Data Streams
• Overload Detection
• Load Shedding by Sampling Transactions
Overload Detection
• To quickly estimate the system workload, we propose an approximate method based on MFIs
– The MFIs implicitly contain all frequent itemsets (every frequent itemset is a subset of some MFI)
– The # of MFIs is smaller than the # of frequent itemsets
– The support of an MFI is always closest to the minimum support threshold
(Cont.)
• Load coefficient of a transaction: $L = \sum_{i=1}^{k} 2^{|X_i|} - \sum_{i,j=1}^{k} 2^{|X_i \cap X_j|}$
– $k$: the # of MFIs in the transaction
– $X_i$: an MFI, where $1 \le i \le k$
• Suppose we measure the above statistics for n transactions over one time unit
– $r$: the current rate of the data stream
– The system is overloaded when $r \times \frac{\sum_{i=1}^{n} L_i}{n} > C$
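The overload test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pairwise term is taken over unordered pairs i < j (the slide writes the sum over i, j = 1..k without saying how pairs are counted), and the names `load_coefficient` and `is_overloaded` are hypothetical, not from the paper.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def load_coefficient(mfis_in_txn: Sequence[FrozenSet[str]]) -> int:
    """L = sum_i 2^|X_i| - sum_{i<j} 2^|X_i & X_j| for the MFIs in one transaction.

    Assumption: the second sum runs over unordered pairs i < j, so subsets
    shared by two MFIs are subtracted once.
    """
    single = sum(2 ** len(x) for x in mfis_in_txn)
    shared = sum(2 ** len(a & b) for a, b in combinations(mfis_in_txn, 2))
    return single - shared


def is_overloaded(coeffs: List[int], rate: float, capacity: float) -> bool:
    """Overload test r * (sum of L_i / n) > C over the n measured transactions."""
    if not coeffs:
        return False
    return rate * (sum(coeffs) / len(coeffs)) > capacity


# Two MFIs {a,b,c} and {b,c,d}: 8 + 8 subsets, minus the 4 subsets of {b,c}.
print(load_coefficient([frozenset("abc"), frozenset("bcd")]))  # 12
print(is_overloaded([12, 8, 4], rate=100, capacity=700))       # True
```

With an average coefficient of 8 and a rate of 100 transactions per time unit, the estimated load is 800, which exceeds the assumed capacity of 700.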
Load Shedding by Sampling Transactions
• In order to estimate how much load to shed
– $P$: a parameter expressing the fraction of incoming transactions to keep (the rest are discarded), chosen so that $P \times r \times \frac{\sum_{i=1}^{n} L_i}{n} \le C$
– If $P < 1$, transactions are discarded by sampling, and the Hoeffding bound is used to approximate the frequent patterns from the sample
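A minimal sketch of how P could be chosen, assuming P is read as the fraction of transactions that is sampled (kept) and is set to the largest value satisfying P × r × (ΣL_i / n) ≤ C. The function name `sampling_fraction` and the numbers in the example are hypothetical.

```python
from typing import Sequence


def sampling_fraction(coeffs: Sequence[float], rate: float, capacity: float) -> float:
    """Largest P in [0, 1] with P * r * (sum of L_i / n) <= C.

    Assumption: P is the fraction of transactions kept, so P = 1 means
    no load shedding is needed.
    """
    if not coeffs or rate <= 0:
        return 1.0
    avg_load = sum(coeffs) / len(coeffs)
    if avg_load <= 0:
        return 1.0
    return min(1.0, capacity / (rate * avg_load))


# Average load coefficient 8, rate 100 per time unit, capacity 400:
# the estimated load is 800, so only half of the transactions can be kept.
print(sampling_fraction([12, 8, 4], rate=100, capacity=400))   # 0.5
print(sampling_fraction([12, 8, 4], rate=100, capacity=1600))  # 1.0
```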
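(Cont.)
• Hoeffding bound:
– $\Pr\left(|r - n_0 p| \ge n_0 \epsilon\right) \le 2e^{-2 n_0 \epsilon^2}$
– $r$: the number of times that X occurs in these $n_0$ transactions
– $sup(X) = p$: the true support of X
– $sup_E(X) = r / n_0$: the estimated support of X
– We want the estimate to be within $\epsilon$ of the true support with probability at least $1 - \delta$, i.e. $2e^{-2 n_0 \epsilon^2} \le \delta$, so the required number of sampled transactions is at least $n_0 \ge \frac{1}{2\epsilon^2} \ln\frac{2}{\delta}$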
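The step from the Hoeffding bound to the sample size can be spelled out as follows. The symbols $\epsilon$ (maximum support error) and $\delta$ (allowed failure probability) are the conventional names and are assumed here, since the slide's own symbols did not survive extraction.

```latex
\begin{align*}
\Pr\bigl(|r - n_0 p| \ge n_0 \epsilon\bigr)
  &= \Pr\bigl(|sup_E(X) - p| \ge \epsilon\bigr)
   \le 2e^{-2 n_0 \epsilon^2} \\
2e^{-2 n_0 \epsilon^2} \le \delta
  &\iff -2 n_0 \epsilon^2 \le \ln\frac{\delta}{2} \\
  &\iff n_0 \ge \frac{1}{2\epsilon^2} \ln\frac{2}{\delta}
\end{align*}
```

The first line divides the event by $n_0$, using $sup_E(X) = r / n_0$; the remaining lines solve the requirement on the failure probability for $n_0$.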
(Cont.)
• Sample batch: each incoming transaction is chosen with probability P until $n_0$ transactions have been sampled
• Local patterns: the freq. itemsets of a sample batch, which are found only within one part of the stream
• Global patterns: the freq. itemsets in the entire stream
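The sample-batch step can be sketched as independent coin flips over the incoming transactions. This is an illustrative reading of the slide, not the authors' implementation; `sample_batch` and its parameters are hypothetical names, and the transactions in the example are just integers standing in for real ones.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_batch(stream: Iterable[T], p: float, n0: int,
                 rng: Optional[random.Random] = None) -> List[T]:
    """Keep each incoming transaction with probability p until n0 are collected.

    Returns fewer than n0 transactions if the stream ends first.
    """
    if n0 <= 0:
        return []
    rng = rng or random.Random()
    batch: List[T] = []
    for txn in stream:
        # rng.random() is uniform on [0, 1), so p = 1 keeps everything.
        if rng.random() < p:
            batch.append(txn)
            if len(batch) >= n0:
                break
    return batch


# Keep roughly 30% of the stream until 50 transactions are sampled.
batch = sample_batch(range(10_000), p=0.3, n0=50, rng=random.Random(7))
print(len(batch))  # 50
```

The frequent itemsets mined from one such batch are the local patterns; they describe only the part of the stream the batch was drawn from.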
(Cont.)
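• Due to the non-uniform distribution of the stream
– False global patterns
– Significant support $\sigma_0$ ($\sigma_0 < \sigma$, where $\sigma$ is the minimum support threshold); $\epsilon$: the max. support error of each pattern
• $sup(X) \ge \sigma$: frequent
• $\sigma_0 \le sup(X) < \sigma$: sub-frequent
• $sup(X) < \sigma_0$: infrequent
• Frequent and sub-frequent itemsets together form the significant patterns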
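The three-way classification can be written as a small function. The sketch assumes a minimum support threshold `sigma` and a lower significant support threshold `sigma0`; both parameter names and the example values are assumptions for illustration, since the slide's symbols were lost in extraction.

```python
def classify(support: float, sigma: float, sigma0: float) -> str:
    """Label an itemset by its support relative to the two thresholds.

    Assumption: sigma is the minimum support threshold and sigma0 <= sigma
    is the significant support threshold.
    """
    if not 0.0 <= sigma0 <= sigma:
        raise ValueError("expected 0 <= sigma0 <= sigma")
    if support >= sigma:
        return "frequent"
    if support >= sigma0:
        return "sub-frequent"
    return "infrequent"


def is_significant(support: float, sigma: float, sigma0: float) -> bool:
    """Frequent and sub-frequent itemsets are kept as significant patterns."""
    return classify(support, sigma, sigma0) != "infrequent"


print(classify(0.050, sigma=0.04, sigma0=0.03))  # frequent
print(classify(0.035, sigma=0.04, sigma0=0.03))  # sub-frequent
print(classify(0.010, sigma=0.04, sigma0=0.03))  # infrequent
```

Keeping sub-frequent itemsets gives a locally borderline pattern the chance to become globally frequent once later batches are merged in.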
(Cont.)
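• The required number of sampled transactions is at least $n_0 \ge \frac{1}{2\epsilon^2} \ln\frac{2}{\delta}$
• If $\epsilon = 0.001$ and $\delta = 0.01$, then $n_0 \approx 2600000$ is too large
• We assume that each itemset of interest appears in more than 0.01% of the transactions; then if $n_0 \ge 10000$, every such itemset is expected to occur in the sample (0.01% × 10000 = 1)
• $n_0 = \max\left\{\ldots;\ \frac{1}{2\epsilon^2} \ln\frac{2}{\delta}\right\}$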
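The sample-size figure quoted on this slide can be checked numerically. The sketch below evaluates the Hoeffding sample size (1 / (2ε²)) · ln(2 / δ); the function name `hoeffding_sample_size` is hypothetical, and ε and δ are assumed to be the support error and the failure probability respectively.

```python
import math


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Smallest integer n0 with n0 >= (1 / (2 * eps^2)) * ln(2 / delta)."""
    if eps <= 0 or not 0 < delta < 1:
        raise ValueError("expected eps > 0 and 0 < delta < 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


# eps = 0.001, delta = 0.01 gives about 2.65 million transactions.
print(hoeffding_sample_size(0.001, 0.01))
# Relaxing eps to 0.01 shrinks the requirement by a factor of 100.
print(hoeffding_sample_size(0.01, 0.01))
```

Because the bound grows with 1/ε², a tenfold looser error tolerance cuts the required batch size a hundredfold, which is why a very small ε is impractical for a fast stream.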
Performance Results
• Accuracy Measurements
• Adaptability
• Recall: # of true freq. patterns found / # of actual true freq. patterns
• Precision: # of true freq. patterns found / total # of freq. patterns found
• Synthetic: T5I3D1000K, T8I4D1000K with 10000 unique items
• Real-life: "BMS-POS" T6.5 D515597 with 1657 distinct items
• Fix $\delta = 0.01$, $\epsilon = 0.01$ ($n_0 = 25\text{K}$); select …
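The two accuracy measures can be computed directly from the set of reported patterns and the set of truly frequent patterns. This is a generic sketch; `precision_recall` and the toy itemsets are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
from typing import FrozenSet, Set, Tuple

Itemset = FrozenSet[str]


def precision_recall(found: Set[Itemset], actual: Set[Itemset]) -> Tuple[float, float]:
    """Return (precision, recall) of the reported patterns.

    precision = true patterns found / all patterns found
    recall    = true patterns found / all actual true patterns
    """
    true_found = len(found & actual)
    precision = true_found / len(found) if found else 0.0
    recall = true_found / len(actual) if actual else 0.0
    return precision, recall


# Toy example: 3 patterns reported, 4 truly frequent, 2 in common.
found = {frozenset("ab"), frozenset("bc"), frozenset("cd")}
actual = {frozenset("ab"), frozenset("bc"), frozenset("ac"), frozenset("bd")}
p, r = precision_recall(found, actual)
print(round(p, 3), r)  # 0.667 0.5
```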
Conclusion
• This work addresses the problem of finding frequent patterns from data streams when the mining system may not keep up with the arrival rate of the stream