EFFICIENT ITEMSET EXTRACTION USING IMINE INDEX
By U.P.Pushpavalli, II Year ME(CSE)

Upload: eunice-robertson

Post on 12-Jan-2016


Page 1

EFFICIENT ITEMSET EXTRACTION USING IMINE INDEX

By U.P.Pushpavalli

II Year ME(CSE)

Page 2

OBJECTIVE

The main objective is to provide index support for frequent itemset mining.

To provide a compact and complete structure for itemset extraction.

Implemented with FP-based and LCM-based algorithms.

Page 3

A frequent itemset is an itemset whose support is ≥ minsup.

Support: For a rule of the form A => B, support is the percentage of transactions in D that contain A ∪ B.

Confidence: For a rule of the form A => B, confidence is the conditional probability that B is true when A is known to be true:

confidence(A => B) = support(A ∪ B) / support(A)
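The two measures can be checked on a small example. The sketch below is only an illustration: it uses the four-transaction database D from the Apriori example slide, and the function names `support` and `confidence` are chosen here, not taken from the slides. Support is returned as a fraction rather than a percentage.

```python
from typing import FrozenSet, Iterable, List

# Four-transaction database D from the Apriori example slide.
D: List[FrozenSet[str]] = [
    frozenset("acd"),
    frozenset("bce"),
    frozenset("abce"),
    frozenset("be"),
]


def support(itemset: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions in db that contain every item of itemset."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t) / len(db)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               db: List[FrozenSet[str]]) -> float:
    """support(LHS ∪ RHS) / support(LHS) for the rule LHS => RHS."""
    lhs_set, rhs_set = frozenset(lhs), frozenset(rhs)
    return support(lhs_set | rhs_set, db) / support(lhs_set, db)


print(support("be", D))         # 0.75
print(confidence("b", "e", D))  # 1.0
print(confidence("c", "e", D))  # two thirds
```

Note that `confidence` divides by the support of the left-hand side, so it is only defined for rules whose left-hand side occurs in at least one transaction.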

Page 4

Existing: Apriori Algorithm

Uses database scans and pattern matching to collect counts for the candidate itemsets

Any subset of a frequent itemset must be frequent.

Page 5

Apriori: Example

Database D (Min_sup = 2)

TID   Items
10    a, c, d
20    b, c, e
30    a, b, c, e
40    b, e

Scan D: 1-candidates

Itemset   Sup
a         2
b         3
c         3
d         1
e         3

Freq 1-itemsets

Itemset   Sup
a         2
b         3
c         3
e         3

2-candidates

Itemset
ab
ac
ae
bc
be
ce

Scan D: counting

Itemset   Sup
ab        1
ac        2
ae        1
bc        2
be        3
ce        2

Freq 2-itemsets

Itemset   Sup
ac        2
bc        2
be        3
ce        2

3-candidates

Itemset
bce

Scan D: Freq 3-itemsets

Itemset   Sup
bce       2
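The level-wise procedure traced in this example can be sketched in a few lines. This is a minimal illustration rather than an optimized implementation: the function name `apriori` and the in-memory list of transactions are assumptions made here, and each "scan" is simply a pass over that list.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(db: List[FrozenSet[str]], min_sup: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its absolute support count."""
    frequent: Dict[FrozenSet[str], int] = {}
    # 1-candidates: every distinct item in the database.
    candidates = {frozenset([i]) for t in db for i in t}
    k = 1
    while candidates:
        # One database scan per level to count the candidates.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # Join step: unions of frequent k-itemsets that have k+1 items.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


D = [frozenset("acd"), frozenset("bce"), frozenset("abce"), frozenset("be")]
result = apriori(D, min_sup=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), result[itemset])
```

On the database above the prune step discards abc and ace (because ab and ae are infrequent), which leaves bce as the only 3-candidate, as on the slide.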

Page 6

Bottleneck of Apriori:

Huge candidate sets

Multiple scans of the database

Page 7

Mining Frequent Patterns Without Candidate Generation

A large database is compressed into a compact Frequent-Pattern tree (FP-tree) structure

Highly condensed, but complete for frequent pattern mining

Avoids costly database scans

Divide-and-conquer methodology

Avoids candidate generation

Page 8

FP-tree

min_support = 3

TID   Items bought                (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o}             {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

Header Table (the head column links each item to its nodes in the tree)

Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

Tree

{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
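The tree for this transaction table can be rebuilt with two passes over the data. The following is a minimal sketch, not the presenter's code: `FPNode`, `build_fp_tree` and `show` are names chosen here, and the `tie_order` argument is an assumption that fixes the order of items with equal frequency to the one used in the header table (f before c, then a, b, m, p), since ties can be broken arbitrarily.

```python
from collections import Counter
from typing import Dict, List, Optional


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(db: List[List[str]], min_support: int, tie_order: str):
    # Scan 1: count items and keep those reaching min_support.
    counts = Counter(i for t in db for i in set(t))
    frequent = [i for i in counts if counts[i] >= min_support]
    # Descending frequency; ties follow tie_order (assumed, see lead-in).
    frequent.sort(key=lambda i: (-counts[i], tie_order.index(i)))
    rank = {item: r for r, item in enumerate(frequent)}

    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {item: [] for item in frequent}
    # Scan 2: insert each transaction's ordered frequent items as a path.
    for t in db:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def show(node: FPNode, depth: int = 0) -> None:
    """Print the tree with two spaces of indentation per level."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


DB = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
root, header = build_fp_tree(DB, min_support=3, tie_order="fcabmp")
show(root)
```

The `header` dictionary plays the role of the header table: for each frequent item it lists the nodes that carry it, for example two nodes for p with counts 2 and 1.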

Page 9

Drawbacks:

Requires two database scans

The tree must be rebuilt for every support threshold

High memory utilization

Page 10

IMINE: PROPOSED SYSTEM

Covering index.

No constraints are enforced during the index creation phase.

Efficiently exploited by various itemset extraction algorithms.

Physical organization supports efficient data access during itemset extraction.

Supports itemset extraction in large data sets.

Page 11

System Flow Diagram

Creating the I-Tree based on the FP-tree data structure

Creating the I-Btree based on the B+Tree structure

Extraction task: reading selected I-Tree portions

Data access methods: frequent-item based, support-based, and item-based projection

Designing the IMine physical organization to reduce I/O

Itemset mining: implementing FP-based and LCM algorithms

Performance evaluation

Page 12

MODULES:

Implementation of I-Tree

I-Btree

IMine Data Access Methods

IMine Physical Organization

Itemset mining using FP-based and LCM algorithms

Page 13

Index Structure

Characterized by 2 components that provide 2 levels of indexing

I-Tree (Itemset-Tree)
  Prefix tree based on the FP-tree data structure.
  Scans the database once.

I-Btree (Item-Btree)
  Reading selected I-Tree portions during extraction.

Page 14

IMine: I-Tree

Node pointers:
  Parent pointer
  First child pointer
  Right brother pointer

Page 15

IMine: I-Btree

Page 16

I-TREE

I-Tree layers:

Top layer
  Very frequently accessed during the mining process.
  Nodes with high support are stored.

Middle layer
  Quite frequently accessed during the mining process.

Bottom layer
  Rarely accessed during the mining process.
  Nodes with unitary support are stored.

Page 17

Physical organization:

Minimize the cost of reading the data needed for the current extraction process

Correlation types:
  Intratransaction correlation
  I-Tree layers
  Intertransaction correlation
  I-Tree path correlation

Page 18

I/O analysis for index data access:

Through the I-Btree, block 3 is loaded into the buffer cache.

Following the node parent pointer, block 1 is loaded.

[p:3] → [d:5] → [h:7] → [e:7] → [b:10] is in memory.

If the 2 blocks are still in the buffer cache, reading other prefix paths does not require additional disk reads.
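The effect of the buffer cache on disk reads can be illustrated with a toy model. Everything specific here is an assumption made for the illustration: the mapping `BLOCK_OF` places [p:3] in block 3 and its ancestors in block 1 (which is consistent with, but not stated by, the slide), the cache is unbounded, and the second path read is a hypothetical one whose nodes all lie in blocks already loaded.

```python
from typing import Dict, List, Set

# Hypothetical placement of I-Tree nodes in disk blocks (an assumption that is
# merely consistent with the slide: [p:3] in block 3, its ancestors in block 1).
BLOCK_OF: Dict[str, int] = {"p:3": 3, "d:5": 1, "h:7": 1, "e:7": 1, "b:10": 1}


class BufferCache:
    """Unbounded cache that counts how many blocks had to be read from disk."""

    def __init__(self) -> None:
        self.loaded: Set[int] = set()
        self.disk_reads = 0

    def fetch(self, block: int) -> None:
        if block not in self.loaded:
            self.loaded.add(block)
            self.disk_reads += 1


def read_path(path: List[str], cache: BufferCache) -> int:
    """Touch every node of a prefix path; return the disk reads it caused."""
    before = cache.disk_reads
    for node in path:
        cache.fetch(BLOCK_OF[node])
    return cache.disk_reads - before


cache = BufferCache()
first = read_path(["p:3", "d:5", "h:7", "e:7", "b:10"], cache)
second = read_path(["d:5", "h:7", "e:7", "b:10"], cache)  # hypothetical path
print(first, second)  # 2 0
```

The first path costs two disk reads (blocks 3 and 1); the second costs none because both blocks are still cached. A path touching a block that is not yet loaded would still require a read.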

Page 19

IMine data access methods:

Frequent-item based projection
  Supports projection-based algorithms
  FP-growth

Support-based projection
  Supports level-based and array-based algorithms
  Apriori and LCM v.2

Item-based projection
  Load all transactions

Page 20

Loading the frequent-item based projected DB:

Ex: item p appears in 2 nodes, [p:3] and [p:2]

Starting from the I-Btree, 2 prefix paths are read for p:

[p:3 → d:5 → h:7 → e:7 → b:10]

[p:2 → i:2 → h:3 → e:3]
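The lookup for item p can be sketched as follows. This is an illustration under stated assumptions: a plain dictionary (`item_index`) stands in for the I-Btree, the class `Node` and the helpers `add_path` and `prefix_paths` are names invented here, and only the two paths shown on the slide are built.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


class Node:
    def __init__(self, item: str, count: int, parent: Optional["Node"] = None):
        self.item, self.count, self.parent = item, count, parent

    def __repr__(self) -> str:
        return f"{self.item}:{self.count}"


# Stand-in for the I-Btree: a plain dict from item to the nodes holding it.
item_index: DefaultDict[str, List[Node]] = defaultdict(list)


def add_path(labels: List[Tuple[str, int]]) -> None:
    """Create a root-to-leaf chain of nodes and register each in the index."""
    parent = None
    for item, count in labels:
        parent = Node(item, count, parent)
        item_index[item].append(parent)


# The two paths from the slide, listed root first.
add_path([("b", 10), ("e", 7), ("h", 7), ("d", 5), ("p", 3)])
add_path([("e", 3), ("h", 3), ("i", 2), ("p", 2)])


def prefix_paths(item: str) -> List[List[Node]]:
    """Follow parent pointers from every node of `item` up to its root."""
    paths = []
    for start in item_index[item]:
        path, node = [], start
        while node is not None:
            path.append(node)
            node = node.parent
        paths.append(path)
    return paths


for path in prefix_paths("p"):
    print(" -> ".join(map(repr, path)))
```

Only the nodes on the two prefix paths are touched; the rest of the tree is never visited, which is the point of going through the item index first.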

Page 21

Loading the support-based projected DB:

Given the I-Tree, loads the subpaths between the I-Tree roots and the first node with an infrequent item.

Reads a node subtree by means of a top-down depth-first I-Tree visit, exploiting both the node child and brother pointers.
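A top-down visit of this kind can be sketched with first-child and right-brother pointers only. The code is a hedged illustration: `Node`, `add_child` and `load_support_projection` are hypothetical names, the set of frequent items is passed in rather than derived from a support threshold, and the node d:1 under h:3 is invented here solely to exercise the brother pointer.

```python
from typing import List, Optional, Set


class Node:
    def __init__(self, item: str, count: int):
        self.item, self.count = item, count
        self.first_child: Optional["Node"] = None
        self.right_brother: Optional["Node"] = None

    def add_child(self, child: "Node") -> "Node":
        """Append child at the end of this node's brother chain."""
        if self.first_child is None:
            self.first_child = child
        else:
            last = self.first_child
            while last.right_brother is not None:
                last = last.right_brother
            last.right_brother = child
        return child


def visit(node: Optional[Node], frequent: Set[str], out: List[str]) -> None:
    while node is not None:                 # walk the brother chain
        if node.item in frequent:           # infrequent item: cut the subpath
            out.append(f"{node.item}:{node.count}")
            visit(node.first_child, frequent, out)   # depth first
        node = node.right_brother


def load_support_projection(roots: List[Node], frequent: Set[str]) -> List[str]:
    """Labels of the nodes loaded, in top-down depth-first order."""
    out: List[str] = []
    for root in roots:
        visit(root, frequent, out)
    return out


# Two roots with the paths from the frequent-item projection example.
r1 = Node("b", 10)
r1.add_child(Node("e", 7)).add_child(Node("h", 7)) \
  .add_child(Node("d", 5)).add_child(Node("p", 3))
r2 = Node("e", 3)
h2 = r2.add_child(Node("h", 3))
h2.add_child(Node("i", 2)).add_child(Node("p", 2))
h2.add_child(Node("d", 1))                  # hypothetical brother of i:2

loaded = load_support_projection([r1, r2], frequent={"b", "e", "h", "d", "p"})
print(loaded)
```

With i treated as infrequent, the visit stops at [i:2] and never reaches [p:2] below it, while the brother [d:1] is still loaded.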

Page 22

Itemset Mining

Step 1: The needed index data is loaded

Step 2: Itemset extraction takes place on the loaded data

Page 23

I-MINE

Page 24

I_BTree

Page 25

LCM

Page 26

IMINE: Execution Time

Page 27

IMINE: Memory Usage

Page 28

Software Specification

Operating system: Windows XP/Vista

Language: JDK 1.6.1 and above

Back end: SQL Server 2000

Page 29

Conclusion

Provides a complete and compact representation of transactional data

Supports different algorithmic approaches to itemset extraction

Performance is better than the existing FP-growth and LCM v.2 algorithms.

Page 30

Future Enhancements

Compact structure suitable for different data distributions

Incremental update of the index

Page 31

Thank You