![Page 1: Data Mining Techniques Sequential Patterns. Sequential Pattern Mining Progress in bar-code technology has made it possible for retail organizations to](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/1.jpg)
Data Mining Techniques Sequential Patterns
![Page 2](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/2.jpg)
Sequential Pattern Mining
• Progress in bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as basket data
• A record in such data typically consists of the transaction date and the items bought in the transaction
• Very often, data records also contain a customer ID, particularly when the purchase has been made using a credit card or a frequent-buyer card
• Catalog companies also collect such data using the orders they receive
![Page 3](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/3.jpg)
Sequential Pattern Mining
• An example of a sequential pattern is that customers typically rent “Star Wars”, then “Empire Strikes Back”, and then “Return of the Jedi”
• These rentals need not be consecutive
– Customers who rent some other videos in between also support this sequential pattern
• Elements of a sequential pattern need not be simple items
– “Computer Science and Programming Language”, followed by “Data Structure”, followed by “System Programs and Operating Systems” is an example of a sequential pattern in which the elements are sets of items
![Page 4](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/4.jpg)
Sequential Pattern Mining
• Given: Transaction Time, Customer Id, Items Bought
Original Database
Answer Set
![Page 5](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/5.jpg)
Definition
• The length of a sequence is the number of itemsets in the sequence
• A sequence of length k is called a k-sequence
• The support for an itemset i is defined as the fraction of customers who bought the items in i in a single transaction
• The support of a sequence is the fraction of customers whose customer sequence contains it, so the itemset i and the 1-sequence <i> have the same support
• An itemset with minimum support is called a large (frequent) itemset or litemset
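A minimal Python sketch of the itemset-support definition follows. The five customer sequences in `db` are hypothetical. They were chosen to be consistent with the 25% worked example later in these slides. `itemset_support` and `is_litemset` are illustrative names, not part of any library.

```python
# Hypothetical customer sequences: customer id -> transactions (itemsets) in time order.
db = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def itemset_support(db, itemset):
    """Fraction of customers with at least one transaction containing `itemset`."""
    hits = sum(1 for seq in db.values() if any(itemset <= t for t in seq))
    return hits / len(db)

def is_litemset(db, itemset, min_sup):
    """An itemset is large (a litemset) if its support reaches min_sup."""
    return itemset_support(db, itemset) >= min_sup

print(itemset_support(db, {40, 70}))    # 0.4: customers 2 and 4
print(is_litemset(db, {40, 70}, 0.25))  # True
print(is_litemset(db, {10, 20}, 0.25))  # False: only customer 2
```

A customer supports the 1-sequence <i> exactly when one of that customer's transactions contains i. The same function therefore also gives the support of <i>.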
![Page 6](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/6.jpg)
AprioriAll Algorithm
• Each itemset in a large sequence must have minimum support
• Any large sequence must be a list of litemsets
• Finding all sequential patterns in five phases
– Sort Phase
– Litemset Phase
– Transformation Phase
– Sequence Phase
– Maximal Phase
![Page 7](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/7.jpg)
AprioriAll Algorithm: Sort Phase
Customer-Sequence Version of the Database
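The sort phase can be sketched as follows. The sketch assumes raw records of the form (customer id, transaction time, items), and the records and dates below are hypothetical. It sorts by customer id first and transaction time second. It then groups each customer's itemsets into one customer sequence.

```python
from collections import defaultdict

# Hypothetical raw transactions: (customer_id, transaction_time, items bought).
# ISO-format date strings sort correctly as plain strings.
transactions = [
    (2, "1993-06-15", {30}),
    (1, "1993-06-30", {90}),
    (2, "1993-06-10", {10, 20}),
    (3, "1993-06-25", {30, 50, 70}),
    (1, "1993-06-25", {30}),
    (2, "1993-06-20", {40, 60, 70}),
]

def sort_phase(transactions):
    """Sort by customer id (major key) and transaction time (minor key),
    then collect each customer's itemsets into one customer sequence."""
    sequences = defaultdict(list)
    for cid, _time, items in sorted(transactions, key=lambda r: (r[0], r[1])):
        sequences[cid].append(frozenset(items))
    return dict(sequences)

for cid, seq in sort_phase(transactions).items():
    print(cid, [sorted(t) for t in seq])
```

The sort key is restricted to (customer id, time) so that the item sets themselves are never compared.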
![Page 8](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/8.jpg)
AprioriAll Algorithm: Litemset Phase
Apriori / DHP / FP-Growth
min_sup_count = 2
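The slide lists Apriori, DHP, or FP-Growth for finding litemsets. The sketch below swaps in brute-force enumeration instead, which is only practical for toy data. It shows the one change that matters in this phase: support is counted per customer, not per transaction. A customer who buys an itemset several times still adds only 1 to its count. The database is hypothetical and `litemset_phase` is an illustrative name.

```python
from itertools import combinations

# Hypothetical customer sequences produced by the sort phase.
db = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

def litemset_phase(db, min_sup_count):
    """Return all itemsets bought within a single transaction by at least
    min_sup_count distinct customers (brute force, toy data only)."""
    counts = {}
    for seq in db.values():
        seen = set()  # a customer adds at most 1 to each itemset's count
        for t in seq:
            for k in range(1, len(t) + 1):
                for combo in combinations(sorted(t), k):
                    seen.add(frozenset(combo))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    return {s for s, c in counts.items() if c >= min_sup_count}

print(sorted(sorted(s) for s in litemset_phase(db, min_sup_count=2)))
# [[30], [40], [40, 70], [70], [90]]
```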
![Page 9](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/9.jpg)
AprioriAll Algorithm: Transformation Phase
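In the transformation phase, each transaction is replaced by the set of litemsets it contains, and each litemset is renamed to an integer id. The sketch below assumes the hypothetical litemsets {30}, {40}, {70}, {90} and {40, 70}, and the numbering is arbitrary. The sketch drops transactions that contain no litemset, and customers left with an empty sequence. This follows the common AprioriAll description, but treat it as an assumption here.

```python
# Hypothetical inputs: customer sequences and the litemsets found for them.
db = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}
litemsets = [frozenset({30}), frozenset({40}), frozenset({70}),
             frozenset({90}), frozenset({40, 70})]

def transformation_phase(db, litemsets):
    """Replace each transaction by the set of litemset ids it contains.
    Empty transactions and empty customer sequences are dropped (such
    customers should still count toward the total number of customers)."""
    ids = {ls: n for n, ls in enumerate(litemsets, start=1)}  # arbitrary numbering
    transformed = {}
    for cid, seq in db.items():
        new_seq = [frozenset(ids[ls] for ls in litemsets if ls <= t) for t in seq]
        new_seq = [t for t in new_seq if t]
        if new_seq:
            transformed[cid] = new_seq
    return transformed, ids

tdb, ids = transformation_phase(db, litemsets)
for cid, seq in tdb.items():
    print(cid, [sorted(t) for t in seq])
```

Customer 2's first transaction {10, 20} contains no litemset and disappears. Customer 3's single transaction becomes {1, 3}, because it contains both (30) and (70).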
![Page 10](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/10.jpg)
AprioriAll Algorithm: Sequence Phase
Customer Sequences
Large 1-Sequences
Large 2-Sequences
Large 3-Sequences
Large 4-Sequences
Maximal Large Sequences
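The sequence phase is a level-wise search. It starts from the large 1-sequences and generates candidates from the previous level. It then counts how many customers support each candidate and keeps those that meet the minimum support count. In this sketch, the transformed database `tdb` is hypothetical (ids 1 to 5 stand for litemsets), and the join lets a litemset id repeat within a candidate. Function names are illustrative.

```python
# Hypothetical transformed database: customer id -> list of sets of litemset ids.
tdb = {
    1: [{1}, {4}],
    2: [{1}, {2, 3, 5}],
    3: [{1, 3}],
    4: [{1}, {2, 3, 5}, {4}],
    5: [{4}],
}

def supports(cust_seq, cand):
    """True if the ids in `cand` occur, in order, in strictly later transactions."""
    i = 0
    for element in cust_seq:
        if i < len(cand) and cand[i] in element:
            i += 1
    return i == len(cand)

def generate(prev):
    """Join L(k-1) with itself on the first k-2 ids, then prune candidates
    having a (k-1)-subsequence that is not in L(k-1)."""
    joined = {p + (q[-1],) for p in prev for q in prev if p[:-1] == q[:-1]}
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))}

def sequence_phase(tdb, litemset_ids, min_sup_count):
    """Return {k: set of large k-sequences} as tuples of litemset ids."""
    large = {}
    level = {(lid,) for lid in litemset_ids}  # every litemset is a large 1-sequence
    k = 1
    while level:
        large[k] = level
        k += 1
        level = {c for c in generate(level)
                 if sum(supports(s, c) for s in tdb.values()) >= min_sup_count}
    return large

for k, seqs in sequence_phase(tdb, [1, 2, 3, 4, 5], min_sup_count=2).items():
    print(k, sorted(seqs))
```

With these inputs, no 3-candidate survives pruning, so the search stops after level 2.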
![Page 11](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/11.jpg)
Sequence Phase: Candidate Generation
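Candidate generation can be split into a join step and a prune step, in the style of Apriori. The `join`/`prune` names and the large 3-sequences `L3` below are hypothetical. Two (k-1)-sequences are joined when they agree on their first k-2 ids. A candidate is then pruned if deleting any one id yields a sequence that is not large.

```python
def join(prev):
    """Join step: extend p with the last id of q whenever p and q share
    their first k-2 ids (p == q is allowed, so an id may repeat)."""
    return {p + (q[-1],) for p in prev for q in prev if p[:-1] == q[:-1]}

def prune(cands, prev):
    """Prune step: drop any candidate that has a (k-1)-subsequence, obtained
    by deleting one id, that is not a large (k-1)-sequence."""
    return {c for c in cands
            if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))}

# Hypothetical large 3-sequences over litemset ids.
L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
C4 = join(L3)
print(len(C4), sorted(prune(C4, L3)))  # 9 [(1, 2, 3, 4)]
```

The join yields 9 candidates, and only <1 2 3 4> survives pruning. Every other candidate has a 3-subsequence, such as <2 4 3> for <1 2 4 3>, that is missing from L3.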
![Page 12](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/12.jpg)
AprioriAll Algorithm: Maximal Phase
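• A sequence <a1 a2 … an> is contained in another sequence <b1 b2 … bm> if there exist integers i1 < i2 < … < in such that a1 ⊆ b(i1), a2 ⊆ b(i2), …, an ⊆ b(in)
• The sequence <(3) (4 5) (8)> is contained in <(7) (3 8) (9) (4 5 6) (8)>, since (3) ⊆ (3 8), (4 5) ⊆ (4 5 6) and (8) ⊆ (8)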
• The sequence <(3) (5)> is not contained in <(3 5)> (and vice versa)
– The former represents items 3 and 5 being bought one after the other
– The latter represents items 3 and 5 being bought together
• In a set of sequences, a sequence s is maximal if s is not contained in any other sequence.
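The containment test and the maximal filter can be sketched as below, with illustrative function names. The first checks reuse the containment examples above. The list `large` holds the sequences named in the 25%-support example of these slides: the two maximal answers plus the seven non-maximal ones. The filter is quadratic, which is fine for small sets.

```python
def contains(big, small):
    """<a1 ... an> is contained in <b1 ... bm> if a1 ⊆ b(i1), ..., an ⊆ b(in)
    for some i1 < ... < in. Greedy earliest matching is sufficient."""
    i = 0
    for element in big:
        if i < len(small) and small[i] <= element:
            i += 1
    return i == len(small)

def maximal(sequences):
    """Keep only sequences not contained in any other sequence of the set."""
    return [s for s in sequences
            if not any(t != s and contains(t, s) for t in sequences)]

print(contains([{7}, {3, 8}, {9}, {4, 5, 6}, {8}], [{3}, {4, 5}, {8}]))  # True
print(contains([{3, 5}], [{3}, {5}]), contains([{3}, {5}], [{3, 5}]))    # False False

# Large sequences from the 25%-support example in these slides.
large = [[{30}], [{40}], [{70}], [{90}], [{40, 70}],
         [{30}, {40}], [{30}, {70}], [{30}, {90}], [{30}, {40, 70}]]
print(maximal(large))  # keeps <(30) (90)> and <(30) (40 70)>
```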
![Page 13](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/13.jpg)
AprioriAll Algorithm
• With minimum support set to 25%, i.e., a minimum support of 2 customers
– <(30) (90)> and <(30) (40 70)> are maximal
– <(10 20) (30)>, which is supported only by customer 2, does not have minimum support
– <(30)>, <(40)>, <(70)>, <(90)>, <(30) (40)>, <(30) (70)> and <(40 70)>, though having minimum support, are not in the answer because they are not maximal
Answer Set
![Page 14](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/14.jpg)
Summary
![Page 15](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/15.jpg)
Discussions
• The AprioriAll algorithm can generate a huge set of candidate sequences
– If there are 1000 frequent sequences of length 1, the algorithm will generate 1000 × 1000 + (1000 × 999) / 2 = 1,499,500 length-2 candidate sequences: 1000 × 1000 ordered pairs <(a) (b)> plus 1000 × 999 / 2 pairs <(a b)> of items bought together
• Many scans of the database are needed during mining
• Mining long sequential patterns is difficult
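The 1,499,500 figure can be reproduced with a short calculation. The calculation assumes, as the formula suggests, two kinds of length-2 candidates. The first term counts ordered pairs <(a) (b)>, with repetition allowed. The second term counts unordered pairs <(a b)> of distinct items. `length2_candidates` is an illustrative name.

```python
def length2_candidates(n):
    """Number of length-2 candidates built from n frequent items."""
    ordered = n * n              # <(a) (b)> for every ordered pair, including a == b
    together = n * (n - 1) // 2  # <(a b)>: two distinct items bought together
    return ordered + together

print(length2_candidates(1000))  # 1499500
```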
![Page 16](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/16.jpg)
Research Topics
• Time-Interval Sequential Patterns
• Time-Gap Sequential Patterns
• Non-redundant Sequential Patterns
• Constrained Sequential Pattern Mining
• Multi-dimensional Sequential Patterns
• Generalized Sequential Patterns
• Incremental Mining of Sequential Patterns
• Data Stream Sequential Pattern Mining
• Interactive Mining of Sequential Patterns
![Page 17](https://reader036.vdocuments.us/reader036/viewer/2022082817/56649e565503460f94b4e1e0/html5/thumbnails/17.jpg)
Exercise 6
A Sequence Database (min-sup = 50%)

| SID | Customer sequence |
|---|---|
| 10 | <a(abc)(ac)d(cf)> |
| 20 | <(ad)c(bc)(ae)> |
| 30 | <(ef)(ab)(df)cb> |
| 40 | <eg(af)cbc> |
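As a starting point for the exercise, the sketch below parses the sequence notation, where parentheses group items bought together. It then counts how many sequences contain each item. This gives the frequent 1-sequences at min-sup = 50%, i.e., items in at least 2 of the 4 sequences. `parse` is an illustrative helper, and the rest of the exercise is left open.

```python
from collections import Counter

def parse(text):
    """Parse notation like '<a(abc)(ac)d(cf)>' into a list of itemsets:
    parentheses group items bought together; single letters stand alone."""
    body = text.strip().strip("<>")
    elements, i = [], 0
    while i < len(body):
        if body[i] == "(":
            j = body.index(")", i)
            elements.append(set(body[i + 1:j]))
            i = j + 1
        else:
            elements.append({body[i]})
            i += 1
    return elements

database = {
    10: "<a(abc)(ac)d(cf)>",
    20: "<(ad)c(bc)(ae)>",
    30: "<(ef)(ab)(df)cb>",
    40: "<eg(af)cbc>",
}
sequences = {sid: parse(s) for sid, s in database.items()}

# Item support = number of sequences containing the item (each sequence counted once).
counts = Counter(item for seq in sequences.values() for item in set().union(*seq))
min_count = 0.5 * len(sequences)
print(sorted(item for item, c in counts.items() if c >= min_count))
```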