frequent item mining - kent state universityjin/dm08/fim.pdf · 3 definion: frequent itemset •...

Frequent Item Mining

Upload: vodang

Post on 18-Jul-2018


Page 1: Frequent Item Mining - Kent State Universityjin/DM08/FIM.pdf · 3 Definion: Frequent Itemset • Itemset – A collecon of one or more items • Example: {Milk, Bread, Diaper}

Frequent Item Mining


What is data mining?

•  = Pattern Mining?
•  What patterns?
•  Why are they useful?


Definition: Frequent Itemset

•  Itemset
   –  A collection of one or more items
      •  Example: {Milk, Bread, Diaper}
   –  k-itemset
      •  An itemset that contains k items
•  Support count (σ)
   –  Frequency of occurrence of an itemset
   –  E.g. σ({Milk, Bread, Diaper}) = 2
•  Support
   –  Fraction of transactions that contain an itemset
   –  E.g. s({Milk, Bread, Diaper}) = 2/5
•  Frequent Itemset
   –  An itemset whose support is greater than or equal to a minsup threshold


Frequent Itemsets Mining

TID     Transactions
100     { A, B, E }
200     { B, D }
300     { A, B, E }
400     { A, C }
500     { B, C }
600     { A, C }
700     { A, B }
800     { A, B, C, E }
900     { A, B, C }
1000    { A, C, E }

•  Minimum support level 50%
   –  {A}, {B}, {C}, {A, B}, {A, C}
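The support definitions can be checked against the ten-transaction table with a few lines of Python. This is a minimal sketch; the function names `support_count` and `support` are illustrative choices, not names used in the slides.

```python
# Transactions copied from the slide's table (TID -> items).
transactions = {
    100: {"A", "B", "E"},
    200: {"B", "D"},
    300: {"A", "B", "E"},
    400: {"A", "C"},
    500: {"B", "C"},
    600: {"A", "C"},
    700: {"A", "B"},
    800: {"A", "B", "C", "E"},
    900: {"A", "B", "C"},
    1000: {"A", "C", "E"},
}


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= t)


def support(itemset, db):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, db) / len(db)


# A string such as "AB" is read as the itemset {A, B}.
for itemset in ["A", "B", "C", "AB", "AC", "BC"]:
    frequent = support(itemset, transactions) >= 0.5
    print(sorted(itemset), support_count(itemset, transactions), frequent)
```

With a 50% threshold, {A}, {B}, {C}, {A, B} and {A, C} pass the test, while {B, C} (3 of 10 transactions) does not.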


Frequent Pattern Mining

[Figure: graphs with nodes labeled A through F]


Beyond Itemsets

•  Sequence Mining
   –  Finding frequent subsequences from a collection of sequences
•  Graph Mining
   –  Finding frequent (connected) subgraphs from a collection of graphs
•  Tree Mining
   –  Finding frequent (embedded) subtrees from a set of trees/graphs
•  Geometric Structure Mining
   –  Finding frequent substructures from 3-D or 2-D geometric graphs
•  Among others…


Why Is Frequent Pattern Mining So Important?

•  Application Domains
   –  Business, biology, chemistry, WWW, computer/networking security, …
•  Summarizing the underlying datasets, providing key insights
•  Basic tools for other data mining tasks
   –  Association rule mining
   –  Classification
   –  Clustering
   –  Change Detection
   –  etc…


Network motifs: recurring patterns that occur significantly more than in randomized nets

•  Do motifs have specific roles in the network?
•  Many possible distinct subgraphs


The 13 three-node connected subgraphs


199 4-node directed connected subgraphs

And it grows fast for larger subgraphs: 9364 5-node subgraphs, 1,530,843 6-node…


Finding network motifs – an overview

•  Generation of a suitable random ensemble (reference networks)
•  Network motifs detection process:
   –  Count how many times each subgraph appears
   –  Compute statistical significance for each subgraph: probability of appearing in random as much as in real network (P-val or Z-score)


Real = 5    Rand = 0.5 ± 0.6

Z score (# Standard Deviations) = 7.5

Ensemble of networks
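The Z score on this slide follows directly from the three numbers shown. A small sketch, assuming the usual definition (real count minus the mean over the random ensemble, divided by the ensemble's standard deviation); the function name `z_score` is illustrative.

```python
def z_score(real_count, random_mean, random_std):
    """How many standard deviations the real count lies above the random mean."""
    return (real_count - random_mean) / random_std


# Numbers from the slide: Real = 5, Rand = 0.5 ± 0.6.
z = z_score(real_count=5, random_mean=0.5, random_std=0.6)
print(round(z, 2))
```

Here (5 - 0.5) / 0.6 = 7.5, which matches the value quoted on the slide.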


Three Different Views of FIM

•  Transactional Database
   –  How do we store a transactional database?
      •  Horizontal, Vertical, Transaction-Item Pair
•  Binary Matrix
•  Bipartite Graph
•  How is FIM formulated in these different settings?
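The views listed above can be derived from one another. The sketch below uses a hypothetical three-transaction database (not taken from the slides) and shows the horizontal layout, the transaction-item pairs (which are also the edges of the bipartite graph), the vertical layout, and the binary matrix. In the matrix view, the support count of an itemset is the number of rows that have a 1 in every column of the itemset.

```python
# Hypothetical three-transaction database, used only for illustration.
# Horizontal layout: one row per transaction (TID -> items).
horizontal = {1: {"A", "B"}, 2: {"B", "C", "D"}, 3: {"A", "C"}}

# Transaction-item pairs: one (tid, item) row per occurrence.
# Read as edges, these pairs also form the bipartite graph view.
pairs = sorted((tid, item) for tid, items in horizontal.items() for item in items)

# Vertical layout: one row per item (item -> TIDs that contain it).
vertical = {}
for tid, item in pairs:
    vertical.setdefault(item, set()).add(tid)

# Binary matrix: rows are transactions, columns are items.
items = sorted(vertical)
tids = sorted(horizontal)
matrix = [[int(item in horizontal[tid]) for item in items] for tid in tids]


def support_count_matrix(itemset):
    """Count the rows whose columns for `itemset` are all 1."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in matrix)


print(pairs)
print(vertical)
print(matrix)
print(support_count_matrix({"B", "C"}))
```

The same count can be read off the vertical layout as the size of the intersection of the items' TID sets.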


Frequent Itemset Generation

Given d items, there are 2^d possible candidate itemsets


Frequent Itemset Generation

•  Brute-force approach:
   –  Each itemset in the lattice is a candidate frequent itemset
   –  Count the support of each candidate by scanning the database
   –  Match each transaction against every candidate
   –  Complexity ~ O(NMw) for N transactions, M candidates and maximum transaction width w => Expensive since M = 2^d !!!
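The brute-force approach can be sketched as follows, using the ten-transaction table from the earlier slide. The function name `brute_force_frequent` is illustrative. The empty itemset is skipped, so d items give 2^d - 1 candidates.

```python
from itertools import combinations

# Ten-transaction database from the "Frequent Itemsets Mining" slide.
db = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"A", "C", "E"},
]


def brute_force_frequent(db, minsup_count):
    """Enumerate every non-empty itemset and match it against every transaction."""
    items = sorted(set().union(*db))
    frequent = {}
    n_candidates = 0
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            n_candidates += 1
            # One full database scan per candidate.
            count = sum(1 for t in db if set(candidate) <= t)
            if count >= minsup_count:
                frequent[candidate] = count
    return frequent, n_candidates


# 50% of 10 transactions is a support count of 5.
frequent, n_candidates = brute_force_frequent(db, minsup_count=5)
print(n_candidates)
print(frequent)
```

With d = 5 items, all 31 candidates are scanned even though only five of them are frequent, which is the cost the following slides set out to reduce.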


Reducing Number of Candidates

•  Apriori principle:
   –  If an itemset is frequent, then all of its subsets must also be frequent
•  Apriori principle holds due to the following property of the support measure:
   –  Support of an itemset never exceeds the support of its subsets
   –  This is known as the anti-monotone property of support
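Stated as a formula, the anti-monotone property says that X ⊆ Y implies s(X) ≥ s(Y), because every transaction that contains Y also contains X. The sketch below checks this exhaustively on the ten-transaction table from the earlier slide; the helper name `count` is illustrative.

```python
from itertools import combinations

# Ten-transaction database from the "Frequent Itemsets Mining" slide.
db = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"A", "C", "E"},
]


def count(itemset):
    """Support count of `itemset` in db."""
    return sum(1 for t in db if itemset <= t)


items = sorted(set().union(*db))
itemsets = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)]

# Collect every pair (X, Y) with X a proper subset of Y and s(X) < s(Y).
violations = [(x, y) for x in itemsets for y in itemsets
              if x < y and count(x) < count(y)]
print(len(violations))
```

No violations are found, as the property guarantees. This is what allows a whole family of supersets to be discarded once one itemset is found to be infrequent.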


Illustrating Apriori Principle

[Figure labels: Found to be Infrequent; Pruned supersets]


Illustrating Apriori Principle

Items (1-itemsets)

Pairs (2-itemsets)

(No need to generate candidates involving Coke or Eggs)

Triplets (3-itemsets)

Minimum Support = 3

If every subset is considered, 6C1 + 6C2 + 6C3 = 41

With support-based pruning, 6 + 6 + 1 = 13
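The two candidate counts can be reproduced with `math.comb`. The split of the pruned total is partly an assumption: the slide says Coke and Eggs are dropped, so the sketch assumes the 6 pair candidates are the pairs over the 4 remaining items, and takes the single triplet candidate as stated on the slide.

```python
from math import comb

n_items = 6

# Without pruning: every 1-, 2- and 3-itemset is a candidate.
no_pruning = sum(comb(n_items, k) for k in (1, 2, 3))  # 6 + 15 + 20

# With support-based pruning (assumption: 4 items survive the first pass,
# since Coke and Eggs are excluded from the pair candidates).
surviving_items = n_items - 2
pair_candidates = comb(surviving_items, 2)
triplet_candidates = 1  # taken from the slide, not derived here
with_pruning = n_items + pair_candidates + triplet_candidates

print(no_pruning, with_pruning)
```

Under these assumptions the totals are 41 and 13, in agreement with the slide.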


Apriori

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB, 487-499, 1994


How to Generate Candidates?

•  Suppose the items in L_{k-1} are listed in an order
•  Step 1: self-joining L_{k-1}
      insert into C_k
      select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
      from L_{k-1} p, L_{k-1} q
      where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
•  Step 2: pruning
      forall itemsets c in C_k do
         forall (k-1)-subsets s of c do
            if (s is not in L_{k-1}) then delete c from C_k
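The two steps above can be sketched in Python. Itemsets are represented as sorted tuples; the function name `apriori_gen` and the example input are illustrative and do not come from the slides.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    `prev_frequent` is a collection of sorted tuples, all of length k-1.
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Step 1 (self-join): first k-2 items equal, last item of p < last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Step 2 (prune): drop c if any (k-1)-subset is not frequent.
                if all(s in prev_set for s in combinations(c, len(c) - 1)):
                    candidates.append(c)
    return candidates


# Hypothetical L_3 used only to exercise the function.
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d")]
print(apriori_gen(L3))
```

In this example the join produces abcd and acde. The candidate acde is pruned because its subset ade is not in L_3, so only abcd remains.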


Challenges of Frequent Itemset Mining

•  Challenges
   –  Multiple scans of transaction database
   –  Huge number of candidates
   –  Tedious workload of support counting for candidates
•  Improving Apriori: general ideas
   –  Reduce passes of transaction database scans
   –  Shrink number of candidates
   –  Facilitate support counting of candidates


Compact Representation of Frequent Itemsets

•  Some itemsets are redundant because they have identical support as their supersets
•  Number of frequent itemsets
•  Need a compact representation


Maximal Frequent Itemset

An itemset is maximal frequent if none of its immediate supersets is frequent

[Figure labels: Maximal Itemsets; Border; Infrequent Itemsets]


Closed Itemset

•  An itemset is closed if none of its immediate supersets has the same support as the itemset
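Both definitions can be tried out on the ten-transaction table from the "Frequent Itemsets Mining" slide. This is a sketch under an assumed minimum support count of 3, chosen only because it makes the two notions differ on that table; the helper names are illustrative.

```python
from itertools import combinations

# Ten-transaction database from the "Frequent Itemsets Mining" slide.
db = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"A", "C", "E"},
]
MINSUP_COUNT = 3  # assumed threshold for this sketch

items = sorted(set().union(*db))


def count(itemset):
    """Support count of `itemset` in db."""
    return sum(1 for t in db if itemset <= t)


def immediate_supersets(itemset):
    """All itemsets obtained by adding exactly one item."""
    return [itemset | {i} for i in items if i not in itemset]


frequent = {}
for k in range(1, len(items) + 1):
    for c in combinations(items, k):
        s = frozenset(c)
        if count(s) >= MINSUP_COUNT:
            frequent[s] = count(s)

# Maximal frequent: no immediate superset is frequent.
maximal = {x for x in frequent
           if all(count(y) < MINSUP_COUNT for y in immediate_supersets(x))}
# Closed frequent: no immediate superset has the same support.
closed = {x for x, n in frequent.items()
          if all(count(y) < n for y in immediate_supersets(x))}

print(len(frequent), len(closed), len(maximal))
print(sorted("".join(sorted(x)) for x in maximal))
```

On this table there are 10 frequent itemsets, 8 closed frequent itemsets and 3 maximal frequent itemsets ({A, C}, {B, C}, {A, B, E}). {E} is not closed because {A, E} has the same support count of 4, and {B, E} is not closed because {A, B, E} has the same count of 3. Every maximal frequent itemset is also closed.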


Maximal vs Closed Itemsets

[Figure labels: Transaction Ids; Not supported by any transactions]


Maximal vs Closed Frequent Itemsets

Minimum support = 2

# Closed = 9

# Maximal = 4

[Figure labels: Closed and maximal; Closed but not maximal]


Maximal vs Closed Itemsets


Research Questions

•  How to efficiently enumerate Maximal Frequent Itemsets?
•  How about Closed Frequent Itemsets?


Alternative Methods for Frequent Itemset Generation

•  Representation of Database
   –  horizontal vs vertical data layout


ECLAT

•  For each item, store a list of transaction ids (tids)

[Figure label: TID-list]


ECLAT

•  Determine support of any k-itemset by intersecting tid-lists of two of its (k-1)-subsets.
•  3 traversal approaches:
   –  top-down, bottom-up and hybrid
•  Advantage: very fast support counting
•  Disadvantage: intermediate tid-lists may become too large for memory
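The tid-list intersection can be sketched as a depth-first search over the vertical layout. This is a simplified illustration rather than the published ECLAT algorithm in full: it uses plain Python sets as tid-lists, the ten-transaction table from the earlier slide, and an assumed minimum support count of 3.

```python
# Ten-transaction database from the "Frequent Itemsets Mining" slide.
db = {
    100: {"A", "B", "E"}, 200: {"B", "D"}, 300: {"A", "B", "E"}, 400: {"A", "C"},
    500: {"B", "C"}, 600: {"A", "C"}, 700: {"A", "B"}, 800: {"A", "B", "C", "E"},
    900: {"A", "B", "C"}, 1000: {"A", "C", "E"},
}
MINSUP_COUNT = 3  # assumed threshold for this sketch


def eclat(prefix, tidlists, minsup_count, out):
    """Depth-first search; `tidlists` holds (item, tid_set) pairs that extend `prefix`."""
    for i, (item, tids) in enumerate(tidlists):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support count is the tid-list length
        extensions = []
        for other, other_tids in tidlists[i + 1:]:
            # tid-list of itemset + other = intersection of two (k-1)-subset tid-lists.
            common = tids & other_tids
            if len(common) >= minsup_count:
                extensions.append((other, common))
        if extensions:
            eclat(itemset, extensions, minsup_count, out)
    return out


# Build the vertical layout: item -> set of tids.
vertical = {}
for tid, items in db.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)

start = [(i, t) for i, t in sorted(vertical.items()) if len(t) >= MINSUP_COUNT]
frequent = eclat((), start, MINSUP_COUNT, {})
print(frequent)
```

For example, the tid-list of {A, B, E} is obtained by intersecting the tid-lists of {A, B} and {A, E}, without rescanning the transactions.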


FP-growth Algorithm

•  Use a compressed representation of the database using an FP-tree
•  Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets


FP-tree construction

After reading TID=1:
   null → A:1 → B:1

After reading TID=2:
   null → A:1 → B:1
   null → B:1 → C:1 → D:1


FP-Tree Construction

[Figure: FP-tree with header table, built from the transaction database. Node labels: null, A:7, B:5, B:3, C:3, C:1, C:3, D:1 (×5), E:1 (×3)]

Pointers are used to assist frequent itemset generation
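A minimal sketch of FP-tree construction follows. The slide's transaction table did not survive extraction, so the ten transactions below are an assumption, chosen so that the resulting node counts agree with the labels listed for the figure (A:7, B:5, B:3 and so on). The class and function names are illustrative. Items are inserted in alphabetical order, which appears to be what the figure uses; FP-growth implementations commonly order items by descending frequency instead.

```python
class FPNode:
    """One node of the FP-tree: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, order):
    """Insert each transaction as a path from the root, sharing common prefixes."""
    root = FPNode(None, None)
    header = {}  # header table: item -> list of nodes carrying that item
    rank = {item: i for i, item in enumerate(order)}
    for t in transactions:
        node = root
        for item in sorted(t, key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


# Assumed database (not recoverable from the slide text).
db = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"}, {"A", "B", "C"},
    {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "E"},
]
root, header = build_fp_tree(db, order="ABCDE")


def dump(node, depth=0):
    """Print the tree, one node per line, indented by depth."""
    for item in sorted(node.children):
        child = node.children[item]
        print("   " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)


dump(root)
```

The header table keeps, for each item, the list of tree nodes that carry it. These lists play the role of the pointers mentioned on the slide: they let the mining step reach every path that ends in a given item.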


FP-growth

[Figure: FP-tree restricted to the paths that end in D]

Conditional pattern base for D:
   P = {(A:1, B:1, C:1), (A:1, B:1), (A:1, C:1), (A:1), (B:1, C:1)}

Recursively apply FP-growth on P

Frequent itemsets found (with sup > 1): AD, BD, CD, ABD, ACD, BCD
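The itemsets ending in D can be checked by counting directly over the conditional pattern base P given above. The sketch below is a brute-force count over the prefix paths, not a recursive FP-growth; the function name `frequent_with_suffix` is illustrative.

```python
from itertools import combinations

# Conditional pattern base for D, copied from the slide: (prefix path, count).
P = [
    (("A", "B", "C"), 1),
    (("A", "B"), 1),
    (("A", "C"), 1),
    (("A",), 1),
    (("B", "C"), 1),
]


def frequent_with_suffix(pattern_base, suffix, min_count):
    """Count every item combination over the prefix paths and append the suffix item."""
    items = sorted({i for path, _ in pattern_base for i in path})
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            n = sum(c for path, c in pattern_base if set(combo) <= set(path))
            if n >= min_count:
                found["".join(combo) + suffix] = n
    return found


# "sup > 1" on the slide corresponds to a count of at least 2.
result = frequent_with_suffix(P, "D", min_count=2)
print(result)
```

Counting over P gives AD = 4, BD = 3, CD = 3 and ACD = BCD = 2. ABD also reaches a count of 2, through the paths (A:1, B:1, C:1) and (A:1, B:1), while ABCD appears on only one path and is dropped.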