Data Mining Techniques Sequential Patterns

Upload: lynn-snow

Post on 28-Dec-2015


Page 1: Data Mining Techniques Sequential Patterns. Sequential Pattern Mining Progress in bar-code technology has made it possible for retail organizations to

Data Mining Techniques Sequential Patterns

Sequential Pattern Mining

• Progress in bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as the basket data

• A record in such data typically consists of the transaction date and the items bought in the transaction

• Very often, data records also contain customer-id, particularly when the purchase has been made using a credit card or a frequent-buyer card

• Catalog companies also collect such data using the orders they receive

Sequential Pattern Mining

• An example of such a pattern is that customers typically rent “Star Wars (星際大戰)”, then “Empire Strikes Back (帝國大反擊)”, and then “Return of the Jedi (絕地大反攻)”

• These rentals need not be consecutive

– Customers who rent some other videos in between also support this sequential pattern

• Elements of a sequential pattern need not be simple items

– “Computer Science and Programming Language”, followed by “Data Structure”, followed by “System Programs and Operating Systems” is an example of a sequential pattern in which the elements are sets of items

Sequential Pattern Mining

• Given Transaction Time, Customer Id, Items Bought

[Figure: Original Database, Answer Set]

Definition

• The length of a sequence is the number of itemsets in the sequence

• A sequence of length k is called a k-sequence

• The support for an itemset i is defined as the fraction of customers who bought the items in i in a single transaction

• The itemset i and the 1-sequence <i> have the same support

• An itemset with minimum support is called a large (frequent) itemset or litemset
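The support definition above can be sketched in a few lines of Python. The five customer sequences are illustrative toy data (the slide's own table did not survive in this transcript), and the names `DB` and `itemset_support` are hypothetical. A customer counts at most once, and only if a single transaction contains every item of the itemset.

```python
# Illustrative toy data (an assumption, not taken from the slides):
# customer id -> time-ordered list of transactions, each a set of items.
DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def itemset_support(db, itemset):
    """Fraction of customers with at least one transaction containing `itemset`."""
    itemset = set(itemset)
    supporters = sum(
        1 for transactions in db.values()
        if any(itemset <= t for t in transactions)
    )
    return supporters / len(db)

print(itemset_support(DB, {40, 70}))  # 0.4: customers 2 and 4
print(itemset_support(DB, {30}))      # 0.8: customers 1, 2, 3 and 4
```

Because the 1-sequence <i> is supported by exactly the customers who bought i in one transaction, the same function also gives the support of <i>.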

AprioriAll Algorithm

• Each itemset in a large sequence must have minimum support

• Any large sequence must be a list of litemsets

• Finding all sequential patterns in five phases

– Sort Phase

– Litemset Phase

– Transformation Phase

– Sequence Phase

– Maximal Phase

AprioriAll Algorithm: Sort Phase

Customer-Sequence Version of the Database
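A minimal sketch of the sort phase, assuming the original database is a list of (customer id, transaction time, items bought) rows. The rows, the ISO-style date strings, and the name `sort_phase` are hypothetical. Sorting with customer id as the major key and transaction time as the minor key, then grouping by customer, yields the customer-sequence version of the database.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (customer_id, transaction_time, items_bought).
ROWS = [
    (2, "1993-06-20", {40, 60, 70}),
    (1, "1993-06-30", {90}),
    (2, "1993-06-10", {10, 20}),
    (1, "1993-06-25", {30}),
    (2, "1993-06-15", {30}),
]

def sort_phase(rows):
    """Sort by (customer id, transaction time) and group into customer sequences."""
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {
        customer: [items for _, _, items in group]
        for customer, group in groupby(ordered, key=itemgetter(0))
    }

print(sort_phase(ROWS))
```

Each value in the result is one customer sequence: the customer's transactions as itemsets, in time order.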

AprioriAll Algorithm: Litemset Phase

Apriori/DHP, FP-Growth

min_sup_count=2
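The litemset phase can be illustrated with a brute-force count. This sketch is not Apriori, DHP, or FP-Growth: it simply enumerates every subset of every transaction, which is only practical for tiny inputs. The toy database and the name `litemsets` are assumptions. The key point it shows is that support is counted per customer, so an itemset bought twice by the same customer still counts once.

```python
from collections import Counter
from itertools import combinations

# Illustrative toy data (an assumption, not taken from the slides).
DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def litemsets(db, min_sup_count):
    """Return all itemsets supported by at least `min_sup_count` customers."""
    counts = Counter()
    for transactions in db.values():
        seen = set()  # itemsets this customer supports, counted once per customer
        for t in transactions:
            items = sorted(t)
            for size in range(1, len(items) + 1):
                seen.update(combinations(items, size))
        counts.update(seen)
    return sorted(s for s, c in counts.items() if c >= min_sup_count)

print(litemsets(DB, 2))
# [(30,), (40,), (40, 70), (70,), (90,)]
```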

AprioriAll Algorithm: Transformation Phase
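A sketch of the transformation phase under the same toy-data assumption: each transaction is replaced by the set of litemsets it contains, with every litemset mapped to an integer id. The id mapping `LITEMSET_IDS` and the name `transform` are hypothetical; any one-to-one mapping would do.

```python
# Illustrative toy data (an assumption, not taken from the slides).
DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

# Litemsets found in the litemset phase, mapped to arbitrary integer ids.
LITEMSET_IDS = {
    frozenset({30}): 1,
    frozenset({40}): 2,
    frozenset({70}): 3,
    frozenset({40, 70}): 4,
    frozenset({90}): 5,
}

def transform(db, litemset_ids):
    """Replace each transaction by the ids of the litemsets it contains."""
    transformed = {}
    for customer, transactions in db.items():
        sequence = []
        for t in transactions:
            ids = {i for litemset, i in litemset_ids.items() if litemset <= t}
            if ids:  # drop transactions that contain no litemset
                sequence.append(ids)
        if sequence:  # drop customers left with an empty sequence
            transformed[customer] = sequence
    return transformed

for customer, sequence in transform(DB, LITEMSET_IDS).items():
    print(customer, sequence)
```

A customer dropped here would still count toward the total number of customers when support is computed, so the support denominator should come from the original database.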

AprioriAll Algorithm: Sequence Phase

[Figure: Customer Sequences, Large 1-Sequences, Large 2-Sequences, Large 3-Sequences, Large 4-Sequences, Maximal Large Sequences]

Sequence Phase: Candidate Generation
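Candidate generation can be sketched as an Apriori-style join followed by a prune, operating on transformed sequences (tuples of litemset ids). The input `L3` is illustrative, and `generate_candidates` is a hypothetical name. The sketch assumes two large (k-1)-sequences are joined when they agree on their first k-2 elements, and that a candidate is pruned if any of its (k-1)-subsequences is not large.

```python
from itertools import combinations

def generate_candidates(large_prev):
    """Build candidate k-sequences from the large (k-1)-sequences."""
    large_prev = set(large_prev)
    if not large_prev:
        return []
    k_minus_1 = len(next(iter(large_prev)))
    # Join step: p and q share their first k-2 elements; append q's last element to p.
    joined = {
        p + (q[-1],)
        for p in large_prev
        for q in large_prev
        if p[:-1] == q[:-1]
    }
    # Prune step: every (k-1)-subsequence (one element deleted) must itself be large.
    return sorted(
        c for c in joined
        if all(sub in large_prev for sub in combinations(c, k_minus_1))
    )

# Illustrative large 3-sequences over litemset ids.
L3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
print(generate_candidates(L3))  # [(1, 2, 3, 4)]
```

Order matters in the join: both `p + last(q)` and `q + last(p)` are produced, since <1 2 3 4> and <1 2 4 3> are different sequences. Only the surviving candidates need to be counted against the database.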

AprioriAll Algorithm: Maximal Phase

• The sequence <(3) (4 5) (8)> is contained in <(7) (3 8) (9) (4 5 6) (8)>, since (3) ⊆ (3 8), (4 5) ⊆ (4 5 6) and (8) ⊆ (8)

• The sequence <(3) (5)> is not contained in <(3 5)> (and vice versa)

– The former represents items 3 and 5 being bought one after the other

– The latter represents items 3 and 5 being bought together.

• In a set of sequences, a sequence s is maximal if s is not contained in any other sequence.
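The containment test and the maximal filter described above can be sketched as follows. Sequences are written as lists of tuples, and the names `is_contained` and `maximal` are hypothetical. The sketch matches each element of the shorter sequence greedily to the earliest later itemset that includes it, which is sufficient for this kind of ordered containment.

```python
def is_contained(a, b):
    """True if sequence `a` is contained in `b` (order preserved, gaps allowed)."""
    pos = 0
    for element in a:
        # Advance to the earliest itemset of `b` that includes this element.
        while pos < len(b) and not set(element) <= set(b[pos]):
            pos += 1
        if pos == len(b):
            return False
        pos += 1  # the next element must match a strictly later itemset
    return True

def maximal(sequences):
    """Keep only sequences not contained in any other sequence of the set."""
    return [
        s for s in sequences
        if not any(s != t and is_contained(s, t) for t in sequences)
    ]

print(is_contained([(3,), (4, 5), (8,)],
                   [(7,), (3, 8), (9,), (4, 5, 6), (8,)]))  # True
print(is_contained([(3,), (5,)], [(3, 5)]))                  # False
print(is_contained([(3, 5)], [(3,), (5,)]))                  # False
print(maximal([[(3,)], [(3,), (5,)], [(3, 5)]]))
# [[(3,), (5,)], [(3, 5)]]
```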

AprioriAll Algorithm

• With minimum support set to 25%, i.e., a minimum support of 2 customers

– <(30) (90)> and <(30) (40 70)> are maximal

– <(10 20) (30)>, which is only supported by customer 2, does not have minimum support

– <(30)>, <(40)>, <(70)>, <(90)>, <(30) (40)>, <(30) (70)> and <(40 70)>, though having minimum support, are not in the answer because they are not maximal.

Answer Set
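The support claims on this slide can be checked with a short sketch. The database table is missing from this transcript, so the five customer sequences below are an assumption chosen to be consistent with the statements above; `supports` and `support_count` are hypothetical names.

```python
import math

# Assumed toy database, consistent with the claims on the slide.
DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def supports(customer_sequence, pattern):
    """True if `pattern` is contained in `customer_sequence` (gaps allowed)."""
    remaining = iter(customer_sequence)
    # Each any() consumes `remaining` up to its first match, so later pattern
    # elements can only match strictly later transactions.
    return all(any(set(element) <= t for t in remaining) for element in pattern)

def support_count(db, pattern):
    """Number of customers whose sequence contains `pattern`."""
    return sum(supports(seq, pattern) for seq in db.values())

min_count = math.ceil(0.25 * len(DB))  # 25% of 5 customers rounds up to 2

for pattern in ([(30,), (90,)], [(30,), (40, 70)], [(10, 20), (30,)]):
    count = support_count(DB, pattern)
    print(pattern, count, count >= min_count)
```

Under this assumed data, the first two patterns are each supported by two customers, and <(10 20) (30)> by customer 2 only.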

Summary

Discussions

• The AprioriAll algorithm generates a huge set of candidate sequences

– If there are 1000 frequent sequences of length 1, the algorithm will generate 1000 × 1000 + (1000 × 999) / 2 = 1,499,500 candidate sequences of length 2

• Mining requires many scans of the database

• Long sequential patterns are difficult to mine
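The candidate count quoted above can be reproduced with a few lines of arithmetic. Reading the first term as ordered pairs (one item followed by another, repeats allowed) and the second as unordered pairs bought in the same transaction is an interpretation of the formula, not something the slide states; `candidate_2_sequences` is a hypothetical name.

```python
def candidate_2_sequences(n):
    """Number of length-2 candidates built from n frequent 1-sequences."""
    ordered_pairs = n * n                 # <a b>, including <a a>
    same_transaction = n * (n - 1) // 2   # <(a b)>, two distinct items together
    return ordered_pairs + same_transaction

print(candidate_2_sequences(1000))  # 1499500
```

The count grows quadratically in the number of frequent items, before any longer candidates are generated.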

Research Topics

• Time-Interval Sequential Patterns

• Time-Gap Sequential Patterns

• Non-redundant Sequential Patterns

• Constrained Sequential Pattern Mining

• Multi-dimensional Sequential Patterns

• Generalized Sequential Patterns

• Incremental Mining of Sequential Patterns

• Data Stream Sequential Pattern Mining

• Interactive Mining of Sequential Patterns

Exercise 6

A Sequence Database (min-sup = 50%)

SID   Customer sequence
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>