![Page 1: A Novel Incremental Approach to Constraint-Based Miningadobra/DaMn/talks/BertinoroDaMn-meo.pdf · Motivations Knowledge Discovery from Databases (KDD) is usually an interactive and](https://reader035.vdocuments.us/reader035/viewer/2022063008/5fbcca5bc293bf0a457dcc83/html5/thumbnails/1.jpg)
October 2005, 24-27 DAmN Workshop - Bertinoro
A Novel Incremental Approach to Constraint-Based Mining
Rosa Meo, University of Torino, Italy
Outline
- Motivations
  - constraint-based mining
  - iterative and interactive mining, inductive databases
- Query evaluation exploiting materializations
- Constraint properties
  - Item dependent
  - Context dependent
- Incremental algorithms
  - performance results
- Conclusions
Motivations
- Knowledge Discovery from Databases (KDD) is usually an interactive and iterative process
- The process consists of a sequence of constraint-based queries, each very often a refinement of previous ones
- The system is a multi-user environment
- Inductive queries are iceberg queries and are executed on-line
Motivations
Problem
- Executing each new query from scratch is not feasible: the workload for the extraction engine would be too heavy
Proposed solution
- Materialize the results of previous queries and adopt an “incremental” engine, that is, one that can derive the result of a query Q by reusing the results of previous, “correlated” queries
Incremental Mining
- The term incremental usually refers to updating the result set of a query when the database has been updated
- Here, instead, we want to update the result of a query when a new (“correlated”) query is submitted and the database has not changed
Purpose: speeding up query evaluation by exploiting the materializations
Optimizing Mining Queries
The new data mining languages need optimizers that can exploit the information available in the database:
- database schema (keys and functional dependencies)
- data value distributions and constraint selectivity
- the indices (mining indices)
- results of previous, correlated queries
A Generic Mining Language
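A very generic constraint-based mining query requests:
- the extraction from a source table
- of a set of items (on some schema)
- satisfying some user-defined constraints (mining constraints)
- from the groups of the database (grouping constraints)
- where the number of such groups must be sufficient (user-defined statistical evaluation measure, such as support)
R = Q(T, G, I, Γ(M), Ξ)
Here T is the source table, G the grouping attribute, I the item schema, Γ(M) the mining constraints and Ξ the constraint on the statistical measure.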
An Example
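Source table purchase:

| transaction | customer | product | date | price | quantity |
|---|---|---|---|---|---|
| 1 | 1001 | ski_pants | 12/7/98 | 140 | 1 |
| 1 | 1001 | hiking_boots | 12/7/98 | 180 | 1 |
| 3 | 1001 | jacket | 13/7/98 | 300 | 1 |
| 2 | 2256 | col_shirt | 13/7/98 | 25 | 2 |
| 2 | 2256 | brown_boots | 13/7/98 | 150 | 1 |
| 2 | 2256 | jacket | 13/7/98 | 300 | 1 |
| 4 | 2256 | col_shirt | 14/7/98 | 25 | 3 |
| 4 | 2256 | jacket | 14/7/98 | 300 | 2 |

Mining Query
R = Q(purchase, customer, product, price > 100, support_count >= 2)

Result R:

| itemset | support_count |
|---|---|
| {jacket} | 2 |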
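
The example query can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function `mine` and its parameter names are hypothetical (they are not part of the talk), the mining constraint is checked row by row, and itemsets are enumerated by brute force, which is only viable for a toy table like this one.

```python
from collections import defaultdict
from itertools import combinations

# Rows of the purchase table from the slide.
COLUMNS = ("transaction", "customer", "product", "date", "price", "quantity")
PURCHASE = [
    (1, 1001, "ski_pants", "12/7/98", 140, 1),
    (1, 1001, "hiking_boots", "12/7/98", 180, 1),
    (3, 1001, "jacket", "13/7/98", 300, 1),
    (2, 2256, "col_shirt", "13/7/98", 25, 2),
    (2, 2256, "brown_boots", "13/7/98", 150, 1),
    (2, 2256, "jacket", "13/7/98", 300, 1),
    (4, 2256, "col_shirt", "14/7/98", 25, 3),
    (4, 2256, "jacket", "14/7/98", 300, 2),
]


def mine(table, group_by, item, mining_constraint, min_support_count):
    """Hypothetical evaluator for R = Q(T, G, I, Gamma(M), Xi)."""
    # Collect, for each group, the items whose rows satisfy the constraint.
    groups = defaultdict(set)
    for row in table:
        record = dict(zip(COLUMNS, row))
        if mining_constraint(record):
            groups[record[group_by]].add(record[item])
    # Count in how many groups each itemset occurs (brute-force enumeration).
    support = defaultdict(int)
    for items in groups.values():
        for size in range(1, len(items) + 1):
            for itemset in combinations(sorted(items), size):
                support[frozenset(itemset)] += 1
    # Keep the itemsets that reach the minimum support count.
    return {s: c for s, c in support.items() if c >= min_support_count}


R = mine(PURCHASE, "customer", "product", lambda r: r["price"] > 100, 2)
print(R)  # {frozenset({'jacket'}): 2}
```

Only {jacket} is bought by both customers at a price above 100, which matches the result shown on the slide.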
Relationships Between Two Queries Q1 and Q2
Query equivalence: R1 = R2
- no computation is needed
Query containment: R2 ⊆ R1
- We need to determine which elements I in R1 also satisfy the (tighter) constraints of Q2
- In the general case (context dependent constraints, CDC) we need to access the database and verify that I satisfies the constraints of Q2
- In some “lucky” cases (item dependent constraints, IDC) we can evaluate the constraints only once for each I, with no need to scan the database
Searching for Equivalence
- In order to allow the optimizer to recognize equivalent queries, we allow query rewriting
- Query rewriting: determination of a relational expression on a set of other queries whose result is equivalent to the result of the rewritten query but is better in terms of execution costs
Query Rewriting
Query rewriting of a query for itemset mining: join of other results
Q = (T, G, I, P1(M1) ∧ P2(M2) ∧ P3(M3), Ξ)
Q1 = (T, G, I, P1(M1), Ξ)
Q2 = (T, G, I, P2(M2), Ξ)
Q3 = (T, G, I, P3(M3), Ξ)
[Figure, shown in four animation steps: the source table T, the queries Q1, Q2, …, Qn with their results R1, R2, …, Rn, and the query Q with its result R.]
Query Rewriting (2)
Query rewriting of a query for itemset mining: union of other results
Q = (T, G, I, P1(M1) ∨ P2(M2) ∨ P3(M3), Ξ)
Q1 = (T, G, I, P1(M1), Ξ)
Q2 = (T, G, I, P2(M2), Ξ)
Q3 = (T, G, I, P3(M3), Ξ)
[Figure, shown in five animation steps: the source table T, the queries Q1, Q2, …, Qn with their results R1, R2, …, Rn, and the query Q with its result R.]
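
The two rewritings (join for a conjunction of predicates, union for a disjunction) can be sketched on materialized results as follows. This is an illustration under assumptions that the slides do not state: each result is a dictionary from itemset to support, each predicate is evaluated on the itemset as a whole, and an itemset has the same support in every result that contains it. The function names and the sample results are hypothetical.

```python
def rewrite_conjunction(*results):
    """Join on the itemset: keep only the itemsets present in every Ri."""
    common = set(results[0])
    for result in results[1:]:
        common &= set(result)
    # Supports are assumed identical across results, so take them from R1.
    return {itemset: results[0][itemset] for itemset in common}


def rewrite_disjunction(*results):
    """Union: keep every itemset present in at least one Ri."""
    merged = {}
    for result in results:
        merged.update(result)
    return merged


# Hypothetical materialized results of two previous queries.
R1 = {frozenset({"a"}): 3, frozenset({"a", "b"}): 2}
R2 = {frozenset({"a"}): 3, frozenset({"c"}): 4}

R_and = rewrite_conjunction(R1, R2)  # result for P1 AND P2
R_or = rewrite_disjunction(R1, R2)   # result for P1 OR P2
```

Neither function touches the source table, which is the point of the rewriting: the cost depends on the size of the stored results, not on the size of T.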
[Figure: star schema]
Fact table
- sales fact: time_key, product_key, store_key, promotion_key, customer_key; qty, dollar_sales, dollar_cost
Dimension tables
- time dimension: time_key, time attributes
- product dimension: product_key, product attributes
- store dimension: store_key, store attributes
- promotion dimension: promotion_key, promotion attributes
- customer dimension: customer_key, customer attributes
Query Rewriting
Can also be used for query containment
Q1 = (T, G, I, P1(M1), Ξ)
Q2 = (T, G, I, P1(M1) ∧ P2(M2), Ξ)
- In the “lucky” cases, we have R2 = R1 ⋈ σC2(DI)
- In the general case, we have R2 = ρ σC2(R1 ⋈ F ⋈ DJ)
[Figure: Q1 yields R1, whose rows pair each itemset I with its support sI; Q2 yields R2 starting from R1.]
IDC vs CDC
Item Dependent Constraints (IDC)
- are functionally dependent on the item extracted
- are satisfied for a given itemset either for all the groups in the database or for none
- if an itemset is common to R1 and R2, it will have the same support: inclusion
Context Dependent Constraints (CDC)
- depend on the transactions in the database
- might be satisfied for a given itemset only for some groups in the database
- an itemset common to R1 and R2 might not have the same support: dominance
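
| transaction | customer | product | date | price | quantity |
|---|---|---|---|---|---|
| 1 | 1001 | ski pants | 12/7/98 | 140 | 1 |
| 1 | 1001 | hiking boots | 12/7/98 | 180 | 1 |
| 2 | 1001 | jacket | 17/7/98 | 300 | 2 |
| 2 | 2256 | col shirt | 12/7/98 | 25 | 2 |
| 2 | 2256 | ski pants | 13/7/98 | 140 | 2 |
| 3 | 2256 | jacket | 13/7/98 | 300 | 1 |
| 4 | 2256 | col shirt | 13/7/98 | 25 | 3 |
| 4 | 2256 | jacket | 20/8/98 | 300 | 2 |

IDC example: price > 150
CDC example: qty > 1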
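
The difference between the two kinds of constraint shows up in the support of a single item. The sketch below uses the rows of the purchase table from the earlier "An Example" slide, reduced to (customer, product, price, quantity), with groups defined by customer. The helper `support_count` is a hypothetical name introduced only for this illustration.

```python
# (customer, product, price, quantity), taken from the earlier purchase example.
ROWS = [
    (1001, "ski_pants", 140, 1),
    (1001, "hiking_boots", 180, 1),
    (1001, "jacket", 300, 1),
    (2256, "col_shirt", 25, 2),
    (2256, "brown_boots", 150, 1),
    (2256, "jacket", 300, 1),
    (2256, "col_shirt", 25, 3),
    (2256, "jacket", 300, 2),
]


def support_count(item, predicate=lambda price, qty: True):
    """Number of customers with at least one row for `item` satisfying `predicate`."""
    return len({
        customer
        for customer, product, price, qty in ROWS
        if product == item and predicate(price, qty)
    })


base = support_count("jacket")                                # no constraint
idc = support_count("jacket", lambda price, qty: price > 150)  # item dependent
cdc = support_count("jacket", lambda price, qty: qty > 1)      # context dependent
print(base, idc, cdc)  # 2 2 1
```

In this data the price is a property of the product, so `price > 150` holds for the jacket in every group and the support is unchanged. The quantity varies from purchase to purchase, so `qty > 1` holds only for customer 2256 and the support drops.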
[Figure: the star schema again, annotated with the labels "Item dependent constraints" and "Context dependent constraints"]
- sales fact: time_key, product_key, store_key, promotion_key, customer_key; qty, units_sales, dollar_cost
- time dimension: time_key, time attributes
- product dimension: product_key, product attributes
- store dimension: store_key, store attributes
- promotion dimension: promotion_key, promotion attributes
- customer dimension: customer_key, customer attributes
Incremental Algorithm for IDC
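Previous query Q1, constraint: price > 5. Its result R1 is kept in memory (rules in memory):

| BODY | HEAD | SUPP | CONF |
|---|---|---|---|
| A | B | 2 | 1 |
| A | C | … | … |
| … | … | … | … |

Item Domain Table:

| item | price | category |
|---|---|---|
| A | 12 | hi-tech |
| B | 14 | hi-tech |
| C | 8 | housing |

Current query Q2, constraint: price > 10. Its result is R2 = σP(R1):

| BODY | HEAD | SUPP | CONF |
|---|---|---|---|
| A | B | 2 | 1 |
| … | … | … | … |

Fail: item C belongs to a row of the Item Domain Table that does not satisfy the new IDC constraint, so all rules containing item C are deleted from R1.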
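
The IDC case can be sketched as a pure selection on the previous result, R2 = σP(R1), with no access to the source table. The data below follows the slide (items A, B, C priced 12, 14, 8; new constraint price > 10). The function name `incremental_idc` is hypothetical, and the measures of the rule A → C are left as `None` because the slide does not give them.

```python
# Item Domain Table from the slide: one row of attributes per item.
ITEM_DOMAIN = {
    "A": {"price": 12, "category": "hi-tech"},
    "B": {"price": 14, "category": "hi-tech"},
    "C": {"price": 8, "category": "housing"},
}

# Previous result R1: (body, head) -> (support, confidence).
R1 = {
    (("A",), ("B",)): (2, 1.0),
    (("A",), ("C",)): (None, None),  # values not given on the slide
}


def incremental_idc(previous_result, item_domain, predicate):
    """Keep the rules of R1 whose items all satisfy the new item dependent constraint."""
    # The constraint is evaluated once per item, never against the transactions.
    valid_items = {item for item, attrs in item_domain.items() if predicate(attrs)}
    return {
        rule: measures
        for rule, measures in previous_result.items()
        if set(rule[0]) | set(rule[1]) <= valid_items
    }


R2 = incremental_idc(R1, ITEM_DOMAIN, lambda attrs: attrs["price"] > 10)
print(R2)  # {(('A',), ('B',)): (2, 1.0)}
```

Supports and confidences are copied unchanged, which is valid only because an item dependent constraint holds for an item in all groups or in none.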
Incremental Algorithm for CDC
Previous query Q1, constraint: qty > 5. Its result R1 is kept in memory (rules in memory: BODY, HEAD, SUPP, CONF).
Current query Q2, constraint: qty > 10. Its result R2 (BODY, HEAD, SUPP, CONF) is obtained in three steps:
1. build the BHF from the rules in R1
2. read the DB and find the groups in which the new constraints are satisfied and which contain items belonging to the BHF
3. update the support counters in the BHF
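
The three steps of the CDC case can be sketched as below. Several simplifications are assumptions of this sketch and not claims about the talk: plain dictionaries of counters stand in for the BHF, the constraint is checked row by row, and a group supports a rule when every item of the rule has at least one satisfying row in that group. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def incremental_cdc(previous_rules, rows, predicate, min_support_count):
    """Recount the rules of R1 under a new context dependent constraint.

    previous_rules: iterable of (body, head) pairs of frozensets.
    rows: iterable of (group_id, item, attrs) tuples from the source table.
    min_support_count: must be at least 1.
    """
    # Step 1: build the counters (a stand-in for the BHF) from R1.
    rule_count = dict.fromkeys(previous_rules, 0)
    body_count = {body: 0 for body, _ in rule_count}
    # Step 2: one scan of the DB, keeping per group the items that satisfy
    # the new constraint.
    groups = defaultdict(set)
    for group_id, item, attrs in rows:
        if predicate(attrs):
            groups[group_id].add(item)
    # Step 3: update the support counters.
    for items in groups.values():
        for body in body_count:
            if body <= items:
                body_count[body] += 1
        for body, head in rule_count:
            if body | head <= items:
                rule_count[(body, head)] += 1
    # Emit the rules that are still frequent, with their new confidence.
    result = {}
    for (body, head), supp in rule_count.items():
        if supp >= min_support_count:
            result[(body, head)] = (supp, supp / body_count[body])
    return result


# Hypothetical data: one rule a -> b kept from R1, three groups.
rule = (frozenset({"a"}), frozenset({"b"}))
rows = [
    (1, "a", {"qty": 12}), (1, "b", {"qty": 11}),
    (2, "a", {"qty": 15}), (2, "b", {"qty": 7}),
    (3, "a", {"qty": 20}), (3, "b", {"qty": 30}),
]
R2 = incremental_cdc([rule], rows, lambda attrs: attrs["qty"] > 10, 2)
```

Only the rules already in R1 are counted, so the scan never generates new candidates. That is where the saving over a from-scratch run is expected to come from.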
Body-Head Forest (BHF)
- the body (head) tree contains itemsets which are candidates for being in the body (head) part of the rule
- an itemset is represented as a single path in the tree and vice versa
- each path in the body (head) tree is associated with a counter representing the body (rule) support
[Figure: a body tree and a head tree; the body path a, f carries the counter 4 and the head path g carries the counter 3.]
Example rule: a f → g, with rule support = 3 and confidence = 3/4
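
A body-head forest can be approximated with two levels of tries, as in the sketch below. The layout is an assumption: here each body path owns its own head tree, which the slide does not spell out, and the class and function names are hypothetical. The example reproduces the rule a f → g with body counter 4 and rule counter 3.

```python
from fractions import Fraction


class Node:
    """Trie node: one item per edge, one counter per path."""

    def __init__(self):
        self.children = {}
        self.count = 0
        self.head_tree = None  # only set on nodes that end a body path


def path(root, itemset, create=False):
    """Follow (or create) the single path that represents `itemset`."""
    node = root
    for item in sorted(itemset):
        if item not in node.children:
            if not create:
                return None
            node.children[item] = Node()
        node = node.children[item]
    return node


class BodyHeadForest:
    def __init__(self):
        self.body_tree = Node()

    def add_rule(self, body, head, body_support, rule_support):
        body_node = path(self.body_tree, body, create=True)
        body_node.count = body_support  # counter on the body path
        if body_node.head_tree is None:
            body_node.head_tree = Node()
        # Counter on the head path holds the support of the whole rule.
        path(body_node.head_tree, head, create=True).count = rule_support

    def measures(self, body, head):
        """Return (rule support, confidence); assumes the rule was added."""
        body_node = path(self.body_tree, body)
        head_node = path(body_node.head_tree, head)
        return head_node.count, Fraction(head_node.count, body_node.count)


bhf = BodyHeadForest()
bhf.add_rule({"a", "f"}, {"g"}, body_support=4, rule_support=3)
support, confidence = bhf.measures({"a", "f"}, {"g"})
print(support, confidence)  # 3 3/4
```

Confidence is the rule counter divided by the body counter, so it never needs to be stored separately.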
Database for the Experiments
Retail database: Gid, item, price, category, discount, qty, cost
- 125 000 rows
- 10 000 transactions (gid)
- 12.5 items (average) for each transaction
- 22 item categories
- Price uniformly distributed: 100 – 20 000
- price ⇔ discount (discount = price * 0.2)
Experiments on some Typical BI Queries
Breakdown of Execution Times for all the Query Evaluation Components
[Figure: bar chart on a logarithmic scale. x axis: Query (lowPriceItems, musicToComputers); y axis: Execution Time (0.1 ms), from 1 to 1000000; series: parse, optimize, prepare, execute. Data labels legible in the source: 9, 13, 25, 68000, 16300, 708500, 119100.]
Experiments on the Optimizer
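[Figure: two plots of the optimizer time. First plot: Time (s), axis ticks .001, 1, 100, against Query size (10, 15, 30). Second plot: Time (s), axis ticks 0.42, 0.45, 0.48, against Query Catalog Size (25, 50, 100).]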
Searching for Equivalence with a Genetic Approach
[Figure, shown in two animation steps: the catalog queries Q1, Q2, Q3, Q4, Q… are combined by union and intersection into candidate rewritings such as Q1∪Q2, Q1∩Q2, Q2∪Q3, Q2∩Q3 and Q1∪Q2∩Q3, up to Qx = Q1∪Q2∪Q4∩Q2∩Q3.]
Searching for Equivalence with a Genetic Approach
Candidate query rewritings for the target query are represented by a bit string (genome). With
- p: number of predicates
- d: number of disjuncts
the genome of a query is a string of p·d bits, 01001 … 10101, made of one block of p bits per disjunct.
Ex. with predicates A, B, C, D: A∧B ∨ C∧D = 1100 0011
Search is guided by the fitness function (Hamming distance between the target query and a candidate query rewriting)
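
The encoding and the fitness function are small enough to sketch directly. The function names `encode` and `hamming` are hypothetical; the encoding assumes the query is in disjunctive normal form, with one block of p bits per disjunct and one bit per predicate, as in the A, B, C, D example.

```python
def encode(disjuncts, predicates):
    """Genome of a DNF query: for each disjunct, one bit per predicate."""
    return "".join(
        "1" if predicate in conjunct else "0"
        for conjunct in disjuncts
        for predicate in predicates
    )


def hamming(genome_a, genome_b):
    """Fitness: number of positions in which two genomes differ."""
    if len(genome_a) != len(genome_b):
        raise ValueError("genomes must have the same length")
    return sum(a != b for a, b in zip(genome_a, genome_b))


PREDICATES = "ABCD"
target = encode([{"A", "B"}, {"C", "D"}], PREDICATES)   # (A and B) or (C and D)
candidate = encode([{"A", "B"}, {"C"}], PREDICATES)     # (A and B) or C
print(target, candidate, hamming(target, candidate))    # 11000011 11000010 1
```

A distance of 0 means the candidate rewriting has the same genome as the target query, so lower values are better.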
Experiments on the Genetic Search for Equivalence
[Figure: three plots, one for each pair of fixed parameters: (q=10, p=10), (q=10, d=1) and (p=10, d=1).]
Experiments on Containment: ID and CD
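[Figure: four plots. ID algorithm: (a) execution time vs constraint selectivity, (b) execution time vs volume of previous result. CD algorithm: (c) execution time vs constraint selectivity, (d) execution time vs volume of previous result.]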
Experiments on Query Inclusion
Queries with Item Dep. Constraints
[Figure: Execution Time (s), from 0.00 to 1.60, against Constraints Selectivity (%) at 96.49, 77.85, 42.49, 37.17, 26.52, 17.89, 6.603 and 0.]
Comparison: Traditional vs Incremental
From Scratch vs Incremental Approach
[Figure: bar chart. x axis: query evaluation components (parse, optimize, prepare, execute); y axis: Execution time (0.1 ms), from 0 to 140000; series: From Scratch, Incremental.]
Evaluation of Incremental Algorithm
From Scratch:
- suitable for both item and context dependent constraints
- works in two steps:
  1. finds the frequent singletons satisfying the constraints
  2. generates the itemsets with a depth-first search (DFS)
INCR:
- the proposed incremental algorithm
- works in two steps:
  1. loads the previous result R1 into memory
  2. checks the new constraints on every I ∈ R1 and updates its support
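
For comparison with INCR, the two steps of the from-scratch baseline can be sketched as follows. The slide names only the steps; counting support by intersecting sets of group identifiers is a technique chosen for this sketch, not necessarily the one used in the experiments. The function name and the sample groups are hypothetical, and the input is assumed to contain only items that already satisfy the constraints.

```python
def mine_from_scratch(groups, min_support_count):
    """groups: dict mapping a group id to the set of items it contains."""
    # Step 1: frequent singletons, each with the ids of the groups holding it.
    group_ids = {}
    for gid, items in groups.items():
        for item in items:
            group_ids.setdefault(item, set()).add(gid)
    frequent = sorted(
        item for item, ids in group_ids.items() if len(ids) >= min_support_count
    )

    result = {}

    # Step 2: depth-first generation of larger itemsets from the singletons.
    def dfs(prefix, prefix_ids, start):
        for index in range(start, len(frequent)):
            item = frequent[index]
            ids = prefix_ids & group_ids[item] if prefix else group_ids[item]
            if len(ids) >= min_support_count:
                itemset = prefix + (item,)
                result[frozenset(itemset)] = len(ids)
                dfs(itemset, ids, index + 1)

    dfs((), set(), 0)
    return result


# Hypothetical groups.
groups = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "c"}, 4: {"b"}}
frequent_itemsets = mine_from_scratch(groups, 2)
```

Unlike INCR, this baseline has to generate and count candidates from the source data, with no help from a previous result.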
Experiments on Incremental Algorithms (1)
Evaluation of Constraints Factor
[Figure: Behavior with respect to Constraints. x axis: Constraint Selectivity, with the constraint ranging over price >= 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1600, 1800, 2000; y axis: Time, from 0 to 1000; series: Scratch, INCR.]
Experiments on Incremental Algorithms (2)
Evaluation of MinSup Factor
[Figure: Behavior With Respect to MinSup. x axis: Minimum Support Threshold (0.0085, 0.009, 0.0095, 0.010, 0.015, 0.020, 0.025, 0.030); y axis: time, from 0 to 1000; series: INCR, Scratch.]
Experiments on Incremental Algorithms (3)
Evaluation of Result Set Cardinality Factor
[Figure: Behavior With Respect to Previous Result Cardinality. x axis: # Elements in Result of Qprec (6482, 7058, 8780, 10728, 17114, 38808, 158336); y axis: Time, from 0 to 350; series: Scratch, INCR.]
Conclusions
- We have proposed query rewriting for itemset mining as a promising strategy
- We defined item and context dependent constraints and studied their role in query rewriting
- Experiments show that this approach is feasible and in many cases necessary to speed up execution times in inductive databases and data mining