Mining top-k frequent closed itemsets

Mining top-k frequent closed itemsets over data streams using the sliding window model. Author: Pauray S.M. Tsai. Publication: ESA 2010. Presenter: Yuan-Chung Chang.


Page 1: Mining top k frequent closed itemsets

Mining top-k frequent closed itemsets over data streams using the sliding window model

Author: Pauray S.M. Tsai

Publication: ESA 2010

Presenter: Yuan-Chung Chang

Page 2: Mining top k frequent closed itemsets


Outline

• Introduction

• Motivation

• Mining top-k frequent closed itemsets

• FCI_max algorithm

• Example for FCI_max algorithm

• Conclusion

Page 3: Mining top k frequent closed itemsets


Introduction

With the emergence of new applications, the data we process are no longer static but arrive as continuous, dynamic data streams.

Because the data in streams arrive at high speed and are continuous and unbounded, data stream mining faces three challenges:

• Each item in a stream can be examined only once.

• Although the data are generated continuously, the available memory space is limited.

• The mining result should be generated as fast as possible.

Page 4: Mining top k frequent closed itemsets


Introduction (cont.)

In the database community, one of the major applications is mining association rules in large transaction databases.

Traditional association rule mining has two problems:

• A minimum support must be specified for mining.

• The mining usually generates a large number of association rules, which makes the results difficult to use in practical applications.

Page 5: Mining top k frequent closed itemsets


Introduction (cont.)

In the data stream environment, the problem of mining frequent itemsets becomes more complicated.

Traditional algorithms for mining frequent itemsets cannot satisfy the requirement of examining each item in a stream only once. How to effectively maintain frequent itemsets over data streams is another important issue.

Because data are generated continuously in data streams, present frequent itemsets may become infrequent, and present infrequent itemsets may become frequent.

Because memory space is limited, we cannot keep all the itemsets and their related information in memory.

Page 6: Mining top k frequent closed itemsets


Introduction (cont.)

The time models for data stream mining mainly include the landmark model (2002), the tilted-time window model (2003) and the sliding window model (2006).

• The landmark model considers all the data from a specified point of time to the current time.

• The tilted-time window model is a variation of the landmark model.

• The sliding window model focuses on the recent data from the current moment back to a specified time point.
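The difference between the landmark and sliding window models can be sketched in a few lines of Python. This is only an illustration: the function names, the `(timestamp, transaction)` stream layout and the boundary conventions (inclusive landmark, half-open sliding window) are assumptions, not something the slides specify.

```python
def landmark_view(stream, landmark_time, now):
    """Landmark model: every transaction from a fixed starting point up to now."""
    return [tx for ts, tx in stream if landmark_time <= ts <= now]


def sliding_view(stream, span, now):
    """Sliding window model: only transactions from the most recent `span` time units."""
    return [tx for ts, tx in stream if now - span < ts <= now]


# Hypothetical stream of (timestamp, transaction) pairs.
stream = [(1, {"a", "b"}), (4, {"a"}), (7, {"b", "c"}), (9, {"a", "c"})]

print(len(landmark_view(stream, 0, 9)))  # 4: everything since time 0
print(len(sliding_view(stream, 5, 9)))   # 2: only timestamps 7 and 9
```

As time advances, the landmark view keeps growing, while the sliding view drops old transactions, which is what keeps its memory footprint bounded.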

Page 7: Mining top k frequent closed itemsets


Motivation

The two problems occurring in traditional association rule mining also exist in the data stream environment: specifying an appropriate minimum support and reducing the number of frequent itemsets.

The idea of mining frequent closed itemsets was first proposed in 1999.

Page 8: Mining top k frequent closed itemsets


Motivation (cont.)

An alternative approach, mining the top-k frequent closed itemsets of length no less than min_l without specifying a minimum support, was proposed in 2005. Its mining result only presents frequent closed itemsets of length no less than min_l, so information about closed itemsets with high support but short length is lost.

In fact, the longer a closed itemset is, the smaller its support will be, because the support of an itemset never exceeds the support of any of its subsets.

In this paper, the author proposes an efficient single-pass algorithm, FCI_max, to discover the top-k frequent closed itemsets of length no more than max_l, using a sliding window technique.

Page 9: Mining top k frequent closed itemsets


Motivation (cont.)

For mining top-k frequent closed itemsets of length no less than min_l (2005):

Case 1: Mining top-3 frequent closed itemsets with min_l = 2.

• The mining result is {ab:7, abc:6, ad:4}. Excluded as shorter than min_l: {a:8}.

Case 2: Mining top-3 frequent closed itemsets with min_l = 3.

• The mining result is {abc:6, abcd:3, abe:2, ace:2}. Excluded as shorter than min_l: {a:8}, {ab:7}, {ad:4}.

Page 10: Mining top k frequent closed itemsets


Motivation (cont.)

For mining top-k frequent closed itemsets of length no more than max_l (2010):

Case 3: Mining top-4 frequent closed itemsets with max_l = 3.

• The mining result is {a:8, ab:7, abc:6, ad:4}.

Case 4: Mining top-4 frequent closed itemsets with max_l = 2.

• The mining result is {a:8, ab:7, ad:4, ae:3}.
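The four cases can be reproduced with a small Python sketch. The table of closed itemsets below is assembled from the supports quoted in the cases, since the slides never list it in one place, and the tie rule (keep every itemset whose support equals the k-th highest) is an assumption chosen because it explains why Case 2 returns four itemsets for k = 3. The name `top_k` is hypothetical.

```python
# Closed itemsets and their supports, assembled from Cases 1 to 4 (assumption).
# Each character of a key is one item, so len(key) is the itemset length.
CLOSED = {"a": 8, "ab": 7, "abc": 6, "ad": 4, "ae": 3, "abcd": 3, "abe": 2, "ace": 2}


def top_k(closed, k, min_l=1, max_l=None):
    """Top-k closed itemsets by support within the length bounds.

    Itemsets tied with the k-th highest support are all kept (assumed rule).
    """
    candidates = [
        (itemset, support)
        for itemset, support in closed.items()
        if len(itemset) >= min_l and (max_l is None or len(itemset) <= max_l)
    ]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    if len(candidates) <= k:
        return dict(candidates)
    threshold = candidates[k - 1][1]
    return {itemset: support for itemset, support in candidates if support >= threshold}


print(top_k(CLOSED, 3, min_l=2))  # Case 1: {'ab': 7, 'abc': 6, 'ad': 4}
print(top_k(CLOSED, 3, min_l=3))  # Case 2: {'abc': 6, 'abcd': 3, 'abe': 2, 'ace': 2}
print(top_k(CLOSED, 4, max_l=3))  # Case 3: {'a': 8, 'ab': 7, 'abc': 6, 'ad': 4}
print(top_k(CLOSED, 4, max_l=2))  # Case 4: {'a': 8, 'ab': 7, 'ad': 4, 'ae': 3}
```

With a lower bound on length, the short but very frequent itemset a:8 never appears; with an upper bound it is always the first result.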

Page 11: Mining top k frequent closed itemsets


Mining top-k frequent closed itemsets

The author uses the sliding window model shown in Fig. 1 for the following discussion.

Page 12: Mining top k frequent closed itemsets


Mining top-k frequent closed itemsets

• The number of windows: n

• The time covered by each window: t

• Items in window: {x1, x2, ..., xm}

• The sliding windows: {Wi1, Wi2, ..., Win}

• The set of identifiers of transactions containing itemset {x1, x2, ..., xm} in window Wij: SPij({x1, x2, ..., xm})

• The union of SPij({x1, x2, ..., xm}) over the windows Wi1, ..., Win: CSi({x1, x2, ..., xm})

• The number of transaction identifiers in CSi({x1, x2, ..., xm}): |CSi({x1, x2, ..., xm})|

• The top-k 1-itemsets by |CSi|: {S1, S2, ..., Sk}

• The current top-k frequent closed itemsets are denoted as a set P. The initial value of P is set to {S1, S2, ..., Sk}.
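As a rough illustration of this notation, the sketch below computes the per-window identifier sets, their union and the initial value of P on a toy example. The function names `sp`, `cs` and `initial_p`, the dictionary layout of a window, the alphabetical tie-break and the sample transactions are all assumptions made for the example; transaction identifiers are assumed to be unique across windows.

```python
def sp(window, itemset):
    """SPij: identifiers of the transactions in one window that contain the itemset."""
    return {tid for tid, items in window.items() if itemset <= items}


def cs(windows, itemset):
    """CSi: union of the per-window identifier sets over all current windows."""
    result = set()
    for window in windows:
        result |= sp(window, itemset)
    return result


def initial_p(windows, k):
    """Initial P: the k single items with the largest |CSi| (ties broken alphabetically)."""
    items = set().union(*(tx for window in windows for tx in window.values()))
    ranked = sorted(items, key=lambda x: (-len(cs(windows, frozenset([x]))), x))
    return [frozenset([x]) for x in ranked[:k]]


# Hypothetical windows: each maps a transaction identifier to its set of items.
windows = [
    {1: {"a", "b"}, 2: {"a", "c"}},
    {3: {"a", "b", "c"}},
    {4: {"b"}},
]

print(sorted(cs(windows, frozenset("ab"))))         # [1, 3]
print([sorted(s) for s in initial_p(windows, 2)])   # [['a'], ['b']]
```

Storing identifier sets rather than plain counts means the support of a longer itemset can be derived by intersecting the sets of its parts, without rescanning the transactions.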

Page 13: Mining top k frequent closed itemsets


Mining top-k frequent closed itemsets

The detailed algorithm for mining top-k frequent closed itemsets with max_l: the FCI_max algorithm.
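The algorithm listing itself does not survive in this transcript, so the following is not FCI_max. It is a brute-force reference sketch, assuming the same toy window layout as a dictionary from transaction identifier to item set, that states what the output should be: the k closed itemsets of length at most max_l with the largest supports in the current windows. All names are hypothetical. An itemset is treated as closed when adding any further item strictly reduces its set of supporting transactions.

```python
from itertools import combinations


def tidset(windows, itemset):
    """Identifiers of all transactions in the current windows that contain the itemset."""
    return {tid for w in windows for tid, items in w.items() if itemset <= items}


def top_k_closed(windows, k, max_l):
    """Brute-force top-k closed itemsets of length <= max_l (reference only)."""
    items = sorted(set().union(*(tx for w in windows for tx in w.values())))
    found = []
    for length in range(1, max_l + 1):
        for combo in combinations(items, length):
            itemset = frozenset(combo)
            tids = tidset(windows, itemset)
            if not tids:
                continue
            # Closed: every one-item extension loses at least one transaction.
            closed = all(
                len(tidset(windows, itemset | {x})) < len(tids)
                for x in items
                if x not in itemset
            )
            if closed:
                found.append(("".join(combo), len(tids)))
    found.sort(key=lambda pair: (-pair[1], pair[0]))
    return found[:k]


# Hypothetical data: two windows, identifiers unique across windows.
windows = [
    {1: {"a", "b"}, 2: {"a", "b", "c"}},
    {3: {"a"}, 4: {"a", "d"}},
]

print(top_k_closed(windows, 3, 2))  # [('a', 4), ('ab', 2), ('ad', 1)]
```

This enumeration is exponential in max_l and recomputes every support from scratch, which is exactly the cost a single-pass stream algorithm has to avoid; it is useful only as a correctness check on small inputs.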

Page 14: Mining top k frequent closed itemsets


Mining top-k frequent closed itemsets

Page 15: Mining top k frequent closed itemsets


Example for FCI_max algorithm

Assume the number of windows is n = 4 and each window covers t = 5 minutes.

Assume the number of requested frequent closed itemsets is k = 5 and the maximum length of frequent closed itemsets is max_l = 4.

Pages 16 to 33: figures only; no text survives in the transcript.

Page 34: Mining top k frequent closed itemsets


Conclusion

In this paper, the author proposes an efficient single-pass algorithm, FCI_max, to discover the top-k frequent closed itemsets of length no more than max_l.

Using a maximum length in place of a minimum support resolves the problem of losing information about itemsets with short length but high support.

The FCI_max algorithm does not need to store the support counts of all itemsets at each time point.

It uses dynamic computation to generate the frequent closed itemsets and their related information, which lets it discover the top-k frequent closed itemsets efficiently in the data stream environment.

Page 35: Mining top k frequent closed itemsets


Thank you for listening

Q & A