Horizontal Format Data Mining with Extended Bitmaps

Question?

• Is it possible to combine the benefits of vertical data formats with the efficiency of bitmap operations to mine association rules in a distributed environment?

Association Rule Mining?

• Finding interesting relationships between variables in a dataset.

• Finding the itemsets that occur together in at least a chosen minimum number of transactions (the minimum support).

• A form of pattern recognition.

• Commonly explained through market basket analysis.

TID     Item IDs
T100    I1, I2, I5
T200    I2, I4
T300    I1, I2
T400    I2, I5

Sample (toy) data

Apriori

• The fundamental algorithm for association rule mining.

• Mines frequent patterns from data in horizontal format, in which each transaction lists the items it contains.

• The i-th pass identifies all frequent itemsets of size i.

• Two steps in each pass:

  • Candidate generation.

  • Candidate counting.
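The two steps above can be sketched as a minimal, unoptimized Apriori over the toy data. The names `TRANSACTIONS` and `apriori` are illustrative assumptions, not taken from the original work; the point to notice is that every pass after the first rescans the whole horizontal dataset to count its candidates.

```python
from itertools import combinations

# Toy data from the slides in horizontal format: TID -> items.
# Illustrative sketch only; all names here are hypothetical.
TRANSACTIONS = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I1", "I2"},
    "T400": {"I2", "I5"},
}


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    # Pass 1: count single items with one scan of the data.
    counts = {}
    for items in transactions.values():
        for item in items:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets and
        # prune any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Candidate counting: one full scan of the horizontal data per pass.
        counts = {c: 0 for c in candidates}
        for items in transactions.values():
            for c in candidates:
                if c <= items:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

With a minimum support of 2, `apriori(TRANSACTIONS, 2)` keeps I1, I2, I5, I1-I2 and I2-I5; the only 3-item candidate, {I1, I2, I5}, is pruned because {I1, I5} is infrequent.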

Vertical Format

• Each item is listed with the transactions that contain it.

• Vertical format data mining only has to scan the dataset once to build these per-item transaction lists.

• From the 2-itemsets onward, each new itemset is generated from the previous level's itemsets alone.

• This eliminates a full scan of the dataset in every round to count itemset frequencies.

• Generally more efficient than the horizontal format.
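A minimal sketch of the vertical idea, written in the style of Eclat-like TID-list intersection (a named technique used here for illustration, not necessarily the authors' method): one scan builds an item-to-TID-list map, and 2-itemset supports then come from set intersections alone. `to_vertical` and `frequent_pairs` are hypothetical helpers.

```python
from itertools import combinations

# Toy data in horizontal format: TID -> items (hypothetical names throughout).
TRANSACTIONS = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I1", "I2"},
    "T400": {"I2", "I5"},
}


def to_vertical(transactions):
    """Single scan of the horizontal data: item -> set of TIDs containing it."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def frequent_pairs(tidlists, min_support):
    """Frequent 2-itemsets from TID-list intersections alone.

    The original transactions are not scanned again: the support of
    {a, b} is the size of tidlists[a] & tidlists[b].
    """
    items = sorted(i for i, tids in tidlists.items() if len(tids) >= min_support)
    pairs = {}
    for a, b in combinations(items, 2):
        common = tidlists[a] & tidlists[b]
        if len(common) >= min_support:
            pairs[(a, b)] = common
    return pairs
```

Larger itemsets follow the same pattern: the TID list of {I1, I2, I5} is the intersection of the lists for {I1, I2} and {I5}.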

Bitmaps

• Store individual bits compactly.

• Exploit bit-level parallelism effectively.

• Each position holds a 0 or a 1.

• A 1 indicates that the item exists in the corresponding transaction.
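To make this concrete, the sketch below uses Python integers as arbitrary-length bitmaps (an assumption for illustration; a production implementation would more likely use fixed-width machine words or a bitmap library). The support of an itemset is the number of 1 bits left after ANDing its items' bitmaps; `support` is a hypothetical helper.

```python
# Hypothetical bitmaps for the toy data. Bit i stands for transaction i:
# T100 -> bit 0, T200 -> bit 1, T300 -> bit 2, T400 -> bit 3.
I1 = 0b0101  # I1 appears in T100 and T300
I2 = 0b1111  # I2 appears in all four transactions
I5 = 0b1001  # I5 appears in T100 and T400


def support(first, *rest):
    """AND the bitmaps together and count the remaining 1 bits."""
    combined = first
    for bitmap in rest:
        combined &= bitmap
    return bin(combined).count("1")
```

For example, `support(I1, I2)` ANDs 0b0101 with 0b1111, leaving 0b0101, so the pair I1-I2 occurs in 2 transactions.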

Combined?

• The algorithm takes a horizontal dataset as input.

• In a single pass over the dataset, it constructs a bitmap-based data structure.

• The structure is in vertical format.

• The structure facilitates efficient mining of association rules.
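The single-pass construction described above might look like the following sketch. It assumes the structure is simply one bitmap per item, indexed by transaction position; the actual master array and associated bitmaps shown later may be organized differently, and `ROWS` and `build_bitmaps` are hypothetical names.

```python
# Horizontal input: (TID, items) rows, read once in order.
# Hypothetical sketch; not necessarily the authors' master-array layout.
ROWS = [
    ("T100", ["I1", "I2", "I5"]),
    ("T200", ["I2", "I4"]),
    ("T300", ["I1", "I2"]),
    ("T400", ["I2", "I5"]),
]


def build_bitmaps(rows):
    """Single pass over horizontal rows -> (tid_order, item -> bitmap).

    Bit i of an item's bitmap is set when row i contains the item, so the
    result is a vertical view of the data built without a second scan.
    """
    tids = []
    bitmaps = {}
    for position, (tid, items) in enumerate(rows):
        tids.append(tid)
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << position)
    return tids, bitmaps
```

On the toy data, I1's bitmap comes out as 0b0101 (T100 and T300) and I4's as 0b0010 (T200 only); every later support count can work from these bitmaps without touching the rows again.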


Horizontal Format

[Figure: sample data in horizontal format]

Ordered Items

[Figure: ordered items]

Master Array

[Figure: master array]

Associated Bitmap

[Figure: master array with associated bitmaps]

Structure

[Figure: completed structure]

Counting

Frequent Itemsets

No. of Items    Frequent Itemsets
1               I1, I2, I5
2               I1-I2, I2-I5

Minimum Support = 2
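As a worked check of this table, assume each item's bitmap has one bit per transaction in the order T100, T200, T300, T400 (left to right), and write |B| for the number of 1 bits in B. I4 drops out at the first level because its bitmap has a single 1 bit; the candidate pairs built from the remaining frequent items then give:

```latex
\begin{aligned}
&B_{I1} = 1010,\quad B_{I2} = 1111,\quad B_{I4} = 0100,\quad B_{I5} = 1001\\[4pt]
\operatorname{sup}(\{I4\}) &= \lvert B_{I4} \rvert = 1 < 2\\
\operatorname{sup}(\{I1, I2\}) &= \lvert B_{I1} \wedge B_{I2} \rvert = \lvert 1010 \rvert = 2 \ge 2\\
\operatorname{sup}(\{I2, I5\}) &= \lvert B_{I2} \wedge B_{I5} \rvert = \lvert 1001 \rvert = 2 \ge 2\\
\operatorname{sup}(\{I1, I5\}) &= \lvert B_{I1} \wedge B_{I5} \rvert = \lvert 1000 \rvert = 1 < 2
\end{aligned}
```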


Results

Insights

• The algorithm performs better than Apriori in most scenarios.

• Data structure generation dominates the total time in most cases.

• As an aside…

• Can this be made into a distributed mining algorithm?

It turns out this can be done rather easily.

The algorithm lends itself to MapReduce-like distributed processing.

Each master array index is self-contained,

so each can be mined in parallel.

Data structure generation -> Map phase

Result accumulation -> Reduce phase
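One way to picture this split is the sketch below, which is an assumption rather than the authors' design: the transactions are partitioned, each partition builds local item bitmaps and counts its own 1- and 2-itemset supports in a map step, and a reduce step sums the partial counts before the minimum support is applied. It stops at 2-itemsets for brevity. `map_partition` and `distributed_frequent` are hypothetical names, and a thread pool stands in for real distributed workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import combinations

# Toy data in horizontal format. Hypothetical sketch, not the authors' design.
ROWS = [
    ("T100", ["I1", "I2", "I5"]),
    ("T200", ["I2", "I4"]),
    ("T300", ["I1", "I2"]),
    ("T400", ["I2", "I5"]),
]


def map_partition(rows):
    """Map step: build local item bitmaps for one slice of rows, then count
    the support of every 1- and 2-itemset within that slice."""
    bitmaps = {}
    for position, (_tid, items) in enumerate(rows):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << position)
    counts = Counter()
    for item, bitmap in bitmaps.items():
        counts[(item,)] = bin(bitmap).count("1")
    for a, b in combinations(sorted(bitmaps), 2):
        n = bin(bitmaps[a] & bitmaps[b]).count("1")
        if n:
            counts[(a, b)] = n
    return counts


def distributed_frequent(rows, min_support, partitions=2):
    """Split rows, map each slice in a worker, then sum counts (reduce)."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, slices))
    # Supports are additive over disjoint slices, so summing is exact.
    totals = reduce(lambda x, y: x + y, partials, Counter())
    return {k: v for k, v in totals.items() if v >= min_support}
```

The minimum support is applied only after the reduce step, because an itemset that is rare in every slice can still be frequent overall. Since CPython threads share one interpreter lock, this shows the data flow rather than a speed-up; a real deployment would run the map step in separate processes or on a cluster.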

What Does the Future Hold?

• Make this distributed.

• Java is not the best option. Use C so that memory allocation can be controlled the way we want.

• Experiment with bitmap compression techniques.

Summary
