frequent itemset mining - page d'accueil / lirmm.fr / -...

31
ì Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team (PART I) IMAGINA 18/19 Webpage: http://www.lirmm.fr/~lazaar/teaching.html Email: [email protected] 1

Upload: others

Post on 05-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

ì

FrequentItemsetMining

NadjibLAZAAR

LIRMM-UMCOCONUTTeam

(PARTI)

IMAGINA18/19

Webpage:http://www.lirmm.fr/~lazaar/teaching.htmlEmail:[email protected]

�1

Page 2: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

DataMining

ì DataMining(DM)orKnowledgeDiscoveryinDatabases(KDD)revolvesaroundtheinvestigationandcreationofknowledge,processes,algorithms,andthemechanismsforretrievingpotentialknowledgefromdatacollections.

�2

Page 3: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

GameDataMining

ì Dataaboutplayersbehavior,serverperformance,systemfunctionality…

ì Howtoconvertthesedataintosomethingmeaningful?

ì Howtomovefromrawdatatoactionableinsights?

➔ Gamedataminingistheanswer

�3

Page 4: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

FrequentItemsetMining:Motivations

FrequentItemsetMiningisamethodformarketbasketanalysis.

Itaimsatfindingregularitiesintheshoppingbehaviorofcustomersofsupermarkets,mail-ordercompanies,on-lineshopsetc.

ì Morespecifically:Findsetsofproductsthatarefrequentlyboughttogether.

ì Possibleapplicationsoffoundfrequentitemsets:ì Improvearrangementofproductsinshelves,onacatalog’spagesetc.ì Supportcross-selling(suggestionofotherproducts),productbundling.ì Frauddetection,technicaldependenceanalysis,faultlocalization…etc.

ì Oftenfoundpatternsareexpressedasassociationrules,forexample:ì Ifacustomerbuysbreadandwine,thenshe/hewillprobablyalsobuycheese.

�4

Page 5: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

FrequentItemsetMining:Basicnotions

ì Items:

ì Itemset,transaction:

ì Transactionaldataset:

ì Languageofitemsets:

ì Coverofanitemset:

ì (absolute)Frequency:

�5

I = {i1, …, in}

P, T, ⊆ I

D = {T1, …, Tm}

ℒI = 2I

cover(P) = {i |Ti ∈ D ∧ P ⊆ Ti}

freq(P) = |cover(P) |

Page 6: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Absolute/relativefrequency

ì AbsoluteFrequency:

ì RelativeFrequency:

�6

freq(P) = |cover(P) |

freq(P) =1

|D ||cover(P) |

Page 7: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

FrequentItemsetMining:Definition

ì Given:ì Asetofitemsì Atransactionaldatasetì Aminimumsupport

ì Theneed:ì ThesetofitemsetPs.t.:

�7

I = {i1, …, in}D = {T1, …, Tm}

θ

freq(P) ≥ θ

Page 8: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(1)

�8

I = {a, b, c, d, e}, D = {T1, …, T10}

cover(bc) = {2,7,9}

freq(bc) = 3

Page 9: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(1)

�9

Page 10: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(1)

�10

7 3

3 3

3

7 76

64 4 4 4

4

50 1 1

0

0

0 0 0

0 0 1 1 0

0

2

2

Page 11: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(1)

�11

7 3

3 3

3

7 76

64 4 4 4

4

50 1 1

0

0

0 0 0

0 0 1 1 0

0

2

2

Frequentitemset?

Page 12: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(1)

�12

7 3

3 3

3

7 76

64 4 4 4

4

5 1 1

1 1 2

2

Frequentitemsetwithminimumsupportθ=3?

Page 13: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

SearchingforFrequentItemsets

ì Anaïvesearchthatconsistsofenumeratingandtestingthefrequencyofitemsetcandidatesinagivendatasetisusuallyinfeasible.

ì Why?

�13

Numberofitems(n) Searchspace(2n)

10 ≈103

20 ≈106

30 ≈109

100 ≈1030

128 ≈1068(atomsintheuniverse)

1000 ≈10301

Page 14: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Anti-monotonicityproperty

ì GivenatransactiondatabaseDoveritemsIandtwoitemsetsX,Y:

ì Thatis,

�14

X ⊆ Y ⇒ cover(Y ) ⊆ cover(X)

X ⊆ Y ⇒ freq(Y ) ≤ freq(X)

Page 15: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(2)

�15

7 3

3 3

3

7 76

64 4 4 4

4

50 1 1

0

0

0 0 0

0 0 1 1 0

0

2

2

ade

acde

cover(ade) = {1,4,8,10}, freq(ade) = 4

cover(acde) = {4,8}freq(acde) = 2

Page 16: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Aprioriproperty

ì GivenatransactiondatabaseDoveritemsI,aminsupθandtwoitemsetsX,Y:

ì Itfollows:

ì Contraposition:

�16

Allsubsetsofafrequentitemsetarefrequent!

Allsupersetsofaninfrequentitemsetareinfrequent!

X ⊆ Y ⇒ freq(Y ) ≤ freq(X)

X ⊆ Y ⇒ ( freq(Y ) ≥ θ ⇒ freq(X) ≥ θ)

X ⊆ Y ⇒ ( freq(X) < θ ⇒ freq(Y ) < θ)

Page 17: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(3)

�17

7 3

3 3

3

7 76

64 4 4 4

4

50 1 1

0

0

0 0 0

0 0 1 1 0

0

2

2acde

Allsubsetsofafrequentitemsetarefrequent!

cdeadeaceacd

ac ad ae cd ce de

a c d eθ = 2

Page 18: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(3)

�18

7 3

3 3

3

7 76

64 4 4 4

4

50 1 1

0

0

0 0 0

0 0 1 1 0

0

2

2

be

bce bdeabe

bcdeabdeabce

abcde

Allsupersetsofaninfrequentitemsetareinfrequent!

θ = 2

Page 19: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Partiallyorderedsets

ì Apartialorderisabinaryrelationoveraset:

�19

(reflexivity)

(anti-symmetry)

(transitivity)

Page 20: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

ì Comparableitemsets:

ì Incomparableitemsets:

�20

Page 21: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

AprioriAlgorithm[AgrawalandSrikant1994]

ì Determinethesupportoftheone-elementitemsets(i.e.singletons)anddiscardtheinfrequentitems.

ì Formcandidateitemsetswithtwoitems(bothitemsmustbefrequent),determinetheirsupport,anddiscardtheinfrequentitemsets.

ì Formcandidateitemsetswiththreeitems(allcontainedpairsmustbefrequent),determinetheirsupport,anddiscardtheinfrequentitemsets.

ì Andsoon!

�21

Basedoncandidategenerationandpruning

Page 22: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

AprioriAlgorithm[AgrawalandSrikant1994]

�22

Page 23: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Aprioricandidatesgeneration

�23

Page 24: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Improvingcandidatesgeneration

ì Usingapriori-genfunction,anitemofk+1sizecanbegeneratedinajpossibleways:

ì Need:Generateitemsetcandidateatmostonce.

ì How:Assigntoeachitemsetauniqueparentitemset,fromwhichthisitemsetistobegenerated

�24

Page 25: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Improvingcandidatesgeneration

ì Assigninguniqueparentsturnstheposetlatticeintoatree:

�25

Page 26: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Canonicalformforitemsets

ì Anitemsetcanberepresentedasawordoveranalphabet

ì Q:howmanywordsof3itemscanwehave?Of4items?Ofkitems?

ì Anarbitraryorder(e.g.,lexicographyorder)onitemscangiveacanonicalform,auniquerepresentationofitemsetsbybreakingsymmetries.ì Lexonitems:

�26

Page 27: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

RecursiveprocessingwithCanonicalforms

ì ForeachPofagivenlevel,generateallpossibleextensionofPbyoneitemsuchthat:

ì ForeachP’,processitrecursively.

�27

Page 28: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

Example(4)

�28

7 3

3 3

3

7 76

64 4 4 4

4

50 1 1

0

0

0 0 0

0 0 1 1 0

0

2

2

Q:whatarethechildrenof:

Page 29: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

ItemsOrdering

ì Anyordercanbeused,thatis,theorderisarbitrary

ì Thesearchspacediffersconsiderablydependingontheorder

ì Thus,theefficiencyoftheFrequentItemsetMiningalgorithmscandifferconsiderablydependingontheitemorder

ì Advancedmethodsevenadapttheorderoftheitemsduringthesearch:usedifferent,but“compatible”ordersindifferentbranches

�29

Page 30: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

ItemsOrdering(heuristics)

ì Frequentitemsetsconsistoffrequentitemsì Sorttheitemsw.r.t.theirfrequency.(decreasing/increasing)

ì Thesumoftransactionsizes,transactioncontainingagivenitem,whichcapturesimplicitlythefrequencyofpairs,tripletsetc.ì Sortitemsw.r.t.thesumofthesizesofthetransactionsthat

coverthem.

�30

Page 31: Frequent Itemset Mining - Page d'accueil / Lirmm.fr / - lirmmlazaar/imagina/NL-DM-IMAGINA1819-part1.pdf · Items Ordering ì Any order can be used, that is, the order is arbitrary

ì

Tutorials

link:http://www.lirmm.fr/~lazaar/imagina/TD1.pdf

�31