Promotion Analysis in Multi-Dimensional Space
VLDB 2009
Tianyi Wu¹, Dong Xin², Qiaozhu Mei², Jiawei Han¹
¹ University of Illinois, Urbana-Champaign, Urbana, IL, USA
² Microsoft Research, Redmond, WA, USA
Presenter: Chun Kit Chui (Kit)
Supervisor: Dr. Ben Kao
Contents
Introduction: the promotion analysis problem
Problem definition: the promotiveness measure
The basic query execution framework, subspace pruning, object pruning and the promotion cube
Experimental evaluations
Conclusion
Introduction
Promotion has been playing a key role in marketing…
Retailer | Category | Readership | Year | #Sales
A | Sci & Tech | College students | 2009 | 20
A | Sci & Tech | University students | 2009 | 5
A | Comedy | University students | 2009 | 9
B | Sci & Tech | College students | 2009 | 20
B | Sci & Tech | University students | 2009 | 7
B | Fiction | University students | 2009 | 5
B | Comedy | Kindergarten | 2009 | 20
B | Comedy | College students | 2009 | 10
C | Sci & Tech | College students | 2009 | 12
… | … | … | … | …
A | Sci & Tech | College students | 2010 | 22
A | Sci & Tech | University students | 2010 | 4
A | Comedy | College students | 2010 | 1
B | Sci & Tech | College students | 2010 | 13
B | Sci & Tech | University students | 2010 | 30
B | Fiction | University students | 2010 | 5
B | Comedy | Kindergarten | 2010 | 20
B | Comedy | College students | 2010 | 10
C | Sci & Tech | College students | 2010 | 16
C | Comedy | Kindergarten | 2010 | 52
Book sales database
Manager of retailer A
What is the rank of our book sales among other retailers?
We are ranked 3rd among all book retailers!
Retailer | #Sales
A | 61
B | 180
C | 80
Global aggregate result
E.g., to compute the aggregate value of the cell for retailer A, we project all tuples with Retailer = "A" and sum up their sales: 20 + 5 + 9 + 22 + 4 + 1 = 61.
Discover the most interesting subspaces where our brand (Retailer A) is highly ranked among its competitors.
Retailer | Category | Readership | #Sales
A | Sci & Tech | College students | 42
B | Sci & Tech | College students | 33
C | Sci & Tech | College students | 28
We are the top-1 bookseller in the {Readership = College students, Category = Sci & Tech} segment!
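To make the two claims concrete, here is a minimal Python sketch, assuming SUM as the aggregation measure. It uses only the rows visible on the slide; the elided "…" rows are unknown, so only retailer A's full-space total (61) can be checked, not B's or C's. The helper name `sales_by_retailer` is hypothetical.

```python
from collections import defaultdict

# Rows visible on the slide (the elided "…" rows are NOT included).
# Columns: retailer, category, readership, year, sales
ROWS = [
    ("A", "Sci & Tech", "College students", 2009, 20),
    ("A", "Sci & Tech", "University students", 2009, 5),
    ("A", "Comedy", "University students", 2009, 9),
    ("B", "Sci & Tech", "College students", 2009, 20),
    ("B", "Sci & Tech", "University students", 2009, 7),
    ("B", "Fiction", "University students", 2009, 5),
    ("B", "Comedy", "Kindergarten", 2009, 20),
    ("B", "Comedy", "College students", 2009, 10),
    ("C", "Sci & Tech", "College students", 2009, 12),
    ("A", "Sci & Tech", "College students", 2010, 22),
    ("A", "Sci & Tech", "University students", 2010, 4),
    ("A", "Comedy", "College students", 2010, 1),
    ("B", "Sci & Tech", "College students", 2010, 13),
    ("B", "Sci & Tech", "University students", 2010, 30),
    ("B", "Fiction", "University students", 2010, 5),
    ("B", "Comedy", "Kindergarten", 2010, 20),
    ("B", "Comedy", "College students", 2010, 10),
    ("C", "Sci & Tech", "College students", 2010, 16),
    ("C", "Comedy", "Kindergarten", 2010, 52),
]


def sales_by_retailer(rows, category=None, readership=None):
    """SUM(sales) per retailer inside a subspace (None means "any value")."""
    totals = defaultdict(int)
    for retailer, cat, reader, _year, sales in rows:
        if category is not None and cat != category:
            continue
        if readership is not None and reader != readership:
            continue
        totals[retailer] += sales
    # Sorting by descending total gives the ranking inside the subspace.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


segment = sales_by_retailer(ROWS, category="Sci & Tech", readership="College students")
print(segment)  # [('A', 42), ('B', 33), ('C', 28)]
```

Calling it without filters gives the full-space (global) aggregation; passing both filters evaluates the single subspace highlighted above.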
Full space | Subspaces (promotion query)
Global rank | Local rank
May not be interesting | A globally low-ranked object may become prominent in some subspaces
Compare with ALL objects in ALL aspects | Compare with objects in a certain area
Low cost: a single SQL query | High cost: a naïve approach computes the rank for ALL possible subspaces and returns the interesting ones
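Roles in the book sales database: Retailer is the object dimension; Category, Readership and Year are the subspace dimensions; #Sales is the score dimension; Retailer A is the target object.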
Introduction
Person promotion: an NBA manager would like to promote Michael Jordan as a superstar.
3rd all-time leading scorer.
Further analysis…
Top scorer in the guard position.
Top scorer on the Chicago Bulls team.
Scoring champion in 10 individual seasons.
Player | Position | Team | Year | Game | … | Score
Michael Jordan | Guard | Chicago Bulls | 1998 | vs N.Y. Knicks | … | 33
Michael Jordan | Guard | Chicago Bulls | 1998 | vs Utah Jazz | … | 15
Scottie Pippen | Small Forward | Chicago Bulls | 1998 | vs Utah Jazz | … | 18
… | … | … | … | … | … | …
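Roles in this table: Player is the object dimension; Position, Team, Year, Game, … are the subspace dimensions; Score is the score dimension; Michael Jordan is the target object. The aim is to find his local rank in some subspaces.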
Introduction
The promotion query problem
Given: an object (e.g., a product or a person)
Goal: discover the most interesting subspaces where the object is highly ranked.
Problem Definition
Promotiveness measure
Problem Definition
Object | Location | Year | Score
T1 | NY | 2008 | 0.5
T1 | WA | 2008 | 0.8
T2 | WA | 2007 | 1.0
T2 | WA | 2008 | 1.0
T3 | NY | 2007 | 0.3
T3 | WA | 2007 | 0.6
T3 | WA | 2008 | 0.7
Query
Target object: T1
Aggregation measure: SUM
Goal: discover the most interesting subspaces where T1 is highly ranked.
Roles: Object is the object dimension; Location and Year are the subspace dimensions; Score is the score dimension.
{*}
{NY} {WA} {2007} {2008}
{NY,2007} {NY,2008} {WA,2007} {WA,2008}
All possible subspaces.
Note that the target object T1 appears only in year 2008; therefore the subspace {2007}, together with {NY,2007} and {WA,2007}, can be pruned.
{*}
{NY} {WA} {2008}
{NY,2008} {WA,2008}
Target subspaces of T1.
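Both lattices can be enumerated directly from the data: a tuple belongs to every subspace obtained by keeping each of its subspace-dimension values or replacing it with '*'. The Python sketch below is illustrative (the names `generalizations` and `subspaces` are hypothetical); restricting the enumeration to T1's own tuples yields exactly the target subspaces, which is why {2007}, {NY,2007} and {WA,2007} drop out.

```python
from itertools import product

# Toy table from the slides: (object, location, year, score)
DATA = [
    ("T1", "NY", 2008, 0.5), ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0), ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3), ("T3", "WA", 2007, 0.6), ("T3", "WA", 2008, 0.7),
]
STAR = "*"  # wildcard: this dimension is unconstrained


def generalizations(values):
    """All subspaces containing a tuple: keep each value or replace it by '*'."""
    return set(product(*[(v, STAR) for v in values]))


def subspaces(data, target=None):
    """Non-empty subspaces of the data (only those containing `target`, if given)."""
    found = set()
    for obj, loc, year, _score in data:
        if target is None or obj == target:
            found |= generalizations((loc, year))
    return found


all_cells = subspaces(DATA)
target_cells = subspaces(DATA, target="T1")
print(len(all_cells), len(target_cells))         # 9 6
print(sorted(all_cells - target_cells, key=str))  # [('*', 2007), ('NY', 2007), ('WA', 2007)]
```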
{*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3
Object | SUM(Score)
T1 | 0.5 + 0.8 = 1.3
T2 | 1 + 1 = 2
T3 | 0.3 + 0.6 + 0.7 = 1.6
For {*}, we project all tuples into this cell, sum up each object's scores, and rank T1 against the other objects.
{2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3
Object | Year | SUM(Score)
T1 | 2008 | 0.5 + 0.8 = 1.3
T2 | 2008 | 1
T3 | 2008 | 0.7
For {2008}, we project all tuples with Year = "2008" into this cell and sum up each object's scores.
{NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1
Object | Location | Year | SUM(Score)
T1 | NY | 2008 | 0.5
T2 | NY | 2008 | no tuples
T3 | NY | 2008 | no tuples
For {NY,2008}, we project all tuples with Location = "NY" and Year = "2008" into this cell and sum up each object's scores; only T1 has tuples here.
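The three cells above follow the same recipe: select the tuples that match the subspace, SUM each object's scores, and count how many objects beat the target. A minimal Python sketch on the toy table, assuming ties are resolved in the target's favour (the slides do not say); `rank_in_subspace` is a hypothetical name.

```python
from collections import defaultdict

# Toy table from the slides: (object, location, year, score)
DATA = [
    ("T1", "NY", 2008, 0.5), ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0), ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3), ("T3", "WA", 2007, 0.6), ("T3", "WA", 2008, 0.7),
]
STAR = "*"  # wildcard: this dimension is unconstrained


def rank_in_subspace(data, target, subspace):
    """Return (SUM of target, rank of target, #objects) in a (location, year) cell.

    Rank = 1 + number of objects with a strictly larger SUM (ties favour the
    target, an assumption of this sketch). Returns None when the target has no
    tuples in the cell.
    """
    loc_q, year_q = subspace
    totals = defaultdict(float)
    for obj, loc, year, score in data:
        if loc_q in (STAR, loc) and year_q in (STAR, year):
            totals[obj] += score  # aggregate each object's tuples in the cell
    if target not in totals:
        return None
    mine = totals[target]
    rank = 1 + sum(1 for s in totals.values() if s > mine)
    return round(mine, 6), rank, len(totals)


for cell in [(STAR, STAR), (STAR, 2008), ("NY", 2008)]:
    print(cell, rank_in_subspace(DATA, "T1", cell))
```

It should print (1.3, 3, 3), (1.3, 1, 3) and (0.5, 1, 1), matching the results for {*}, {2008} and {NY,2008}.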
T1 ranks 1st in both {2008} and {NY,2008}. Which one is more interesting?
Problem Definition
Promotiveness of a subspace S: a class of measures that quantify how well S promotes the target object T. Two factors are combined:
Rank of the target object, Rank(S,T): a higher (numerically smaller) rank means more promotive.
Significance of the subspace, Sig(S): a more significant subspace (e.g., one containing more objects) is more promotive.
$P(S,T) = f(\mathrm{Rank}(S,T)) \cdot g(\mathrm{Sig}(S))$
Examples:
$P(S,T) = \mathrm{Rank}^{-1}(S,T)$
$P(S,T) = \mathrm{Rank}^{-1}(S,T) \cdot \mathrm{ObjectCount}(S)$
$P(S,T) = \mathrm{Rank}^{-1}(S,T) \cdot I(\mathrm{ObjectCount}(S) > \mathrm{MinSig})$, where $I(\cdot)$ is the indicator function
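The three example measures can be written as small functions of the target's rank and the subspace's object count. A sketch, assuming MinSig = 1 (the slides leave it unspecified); the function names are hypothetical. Applied to the two cells from the previous slide, Rank^-1 alone cannot separate {2008} from {NY,2008}, whereas both significance-aware variants prefer {2008}.

```python
def p_inverse_rank(rank, n_objects):
    """P(S,T) = Rank^-1(S,T): only the target's rank matters."""
    return 1 / rank


def p_rank_times_count(rank, n_objects):
    """P(S,T) = Rank^-1(S,T) * ObjectCount(S): rewards subspaces with more objects."""
    return n_objects / rank


def p_rank_min_sig(rank, n_objects, min_sig=1):
    """P(S,T) = Rank^-1(S,T) * I(ObjectCount(S) > MinSig); MinSig = 1 is assumed."""
    return 1 / rank if n_objects > min_sig else 0.0


# (Rank(T1), ObjectCount) for the two subspaces compared on the previous slide
CELLS = {"{2008}": (1, 3), "{NY,2008}": (1, 1)}

for measure in (p_inverse_rank, p_rank_times_count, p_rank_min_sig):
    print(measure.__name__, {s: measure(r, n) for s, (r, n) in CELLS.items()})
```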
Problem Definition
The promotion query problem
Input: a target object T and a top-R parameter
Output: the top-R subspaces with the largest P(S,T) scores
Assume the simple ranking model $P(S,T) = \mathrm{Rank}^{-1}(S,T)$.
Query processing methods
Query execution framework: basic framework
Idea: use a recursive process that partitions and aggregates the data to compute the target object's rank in each subspace.
Object | Location | Year | Score
T1 | NY | 2008 | 0.5
T1 | WA | 2008 | 0.8
T2 | WA | 2007 | 1.0
T2 | WA | 2008 | 1.0
T3 | NY | 2007 | 0.3
T3 | WA | 2007 | 0.6
T3 | WA | 2008 | 0.7
[Diagram: partition and aggregation steps, starting from {*}]
{*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3
Start from the coarsest subspace {*}.
{NY} {WA}
Partition the data on the first dimension (Location) and generate the candidate subspaces {NY} and {WA} by substituting the values of that dimension.
Data after partitioning on Location:
Object | Location | Year | Score
T1 | NY | 2008 | 0.5
T3 | NY | 2007 | 0.3
T2 | WA | 2007 | 1.0
T2 | WA | 2008 | 1.0
T1 | WA | 2008 | 0.8
T3 | WA | 2007 | 0.6
T3 | WA | 2008 | 0.7
{NY}: SUM(T1) = 0.5, Rank(T1) = 1st / 2
Recursively operate on the child subspace {NY} and perform aggregation: T1 ranks 1st among the two objects present (T1 and T3).
{NY,2008} {NY,2007}
Partition the {NY} data on the next dimension (Year) and generate candidate subspaces by substituting the values of that dimension.
{NY,2007}: Pruned!
Recursively operate on the child subspace: the target object T1 does not appear in {NY,2007}, so prune it.
{NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1
Back to {WA}: SUM(T1) = 0.8, Rank(T1) = 3rd / 3
Partition the {WA} data on Year: {WA,2008} {WA,2007}
Data after partitioning {WA} on Year:
Object | Location | Year | Score
T1 | NY | 2008 | 0.5
T3 | NY | 2007 | 0.3
T2 | WA | 2007 | 1.0
T3 | WA | 2007 | 0.6
T1 | WA | 2008 | 0.8
T2 | WA | 2008 | 1.0
T3 | WA | 2008 | 0.7
{WA,2007}: Pruned! (T1 does not appear in it.)
{WA,2008}: SUM(T1) = 0.8, Rank(T1) = 2nd / 3
Back at {*}, partition on the second dimension (Year): {2007} {2008}
Data after partitioning on Year:
Object | Location | Year | Score
T3 | WA | 2007 | 0.6
T3 | NY | 2007 | 0.3
T2 | WA | 2007 | 1.0
T1 | NY | 2008 | 0.5
T1 | WA | 2008 | 0.8
T2 | WA | 2008 | 1.0
T3 | WA | 2008 | 0.7
{2007}: Pruned! (T1 does not appear in year 2007.)
{2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3
Finished: every candidate subspace has been either evaluated or pruned.
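The recursion walked through above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: dimensions are processed in a fixed order (Location, then Year), each partition is only split further on later dimensions, ties favour the target, and a partition with no tuple of the target is skipped together with all of its descendants. `explore` and `aggregate_and_rank` are hypothetical names.

```python
from collections import defaultdict

# Toy table from the slides: (object, location, year, score)
DATA = [
    ("T1", "NY", 2008, 0.5), ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0), ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3), ("T3", "WA", 2007, 0.6), ("T3", "WA", 2008, 0.7),
]
N_DIMS = 2  # subspace dimensions: location (column 1) and year (column 2)


def aggregate_and_rank(rows, target):
    """SUM each object's scores in `rows`; return (target SUM, target rank, #objects)."""
    totals = defaultdict(float)
    for obj, _loc, _year, score in rows:
        totals[obj] += score
    mine = totals[target]
    rank = 1 + sum(1 for s in totals.values() if s > mine)  # ties favour the target
    return round(mine, 6), rank, len(totals)


def explore(rows, target, cell, start_dim, out):
    """Record the target's result for `cell`, then partition on each later dimension."""
    out[cell] = aggregate_and_rank(rows, target)
    for d in range(start_dim, N_DIMS):
        parts = defaultdict(list)
        for row in rows:
            parts[row[1 + d]].append(row)  # partition on dimension d
        for value, part in parts.items():
            if all(row[0] != target for row in part):
                continue  # target absent: prune this child and all its descendants
            child = cell[:d] + (value,) + cell[d + 1:]
            explore(part, target, child, d + 1, out)
    return out


results = explore(DATA, "T1", ("*", "*"), 0, {})
for cell, (total, rank, n) in results.items():
    print(cell, total, f"{rank}/{n}")
```

On the toy table it should visit the six surviving subspaces in the same order as the slides: {*}, {NY}, {NY,2008}, {WA}, {WA,2008}, {2008}.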
Summary of the traversal:
{*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3
{NY}: SUM(T1) = 0.5, Rank(T1) = 1st / 2
{WA}: SUM(T1) = 0.8, Rank(T1) = 3rd / 3
{NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1
{WA,2008}: SUM(T1) = 0.8, Rank(T1) = 2nd / 3
{2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3
Pruned: {NY,2007}, {WA,2007}, {2007}
With $P(S,T) = \mathrm{Rank}^{-1}(S,T)$, return the top-3 subspaces: {NY}, {NY,2008} and {2008}, each with P = 1.
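With $P(S,T) = \mathrm{Rank}^{-1}(S,T) \cdot I(\mathrm{ObjCount}(S) > 1)$, return the top-3 subspaces: {NY} and {2008} (P = 1) and {WA,2008} (P = 1/2). {NY,2008} now scores 0 because it contains only one object.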
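Once every surviving subspace has a rank, answering the promotion query reduces to a top-R selection. A short sketch using the standard-library `heapq.nlargest`, with the (rank, object count) pairs copied from the traversal; the function names are hypothetical, and MinSig = 1 matches the indicator on this slide.

```python
import heapq

# (Rank(T1), #objects) per surviving subspace, as computed in the traversal
RANKS = {
    "{*}": (3, 3), "{NY}": (1, 2), "{WA}": (3, 3),
    "{NY,2008}": (1, 1), "{WA,2008}": (2, 3), "{2008}": (1, 3),
}


def inv_rank(rank, n_objects):
    """P(S,T) = Rank^-1(S,T)."""
    return 1 / rank


def inv_rank_min_sig(rank, n_objects, min_sig=1):
    """P(S,T) = Rank^-1(S,T) * I(ObjCount(S) > min_sig)."""
    return 1 / rank if n_objects > min_sig else 0.0


def top_r(ranks, r, promotiveness):
    """Return the r subspaces with the largest promotiveness score."""
    scores = {s: promotiveness(rank, n) for s, (rank, n) in ranks.items()}
    return heapq.nlargest(r, scores, key=scores.get)


print(sorted(top_r(RANKS, 3, inv_rank)))          # ['{2008}', '{NY,2008}', '{NY}']
print(sorted(top_r(RANKS, 3, inv_rank_min_sig)))  # ['{2008}', '{NY}', '{WA,2008}']
```

The outputs are sorted only to make the comparison order-independent; `nlargest` itself returns the subspaces by descending score.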
Query execution framework
The basic execution framework computes ALL subspaces, so its overall cost can be prohibitive for large datasets.
We therefore develop optimization techniques based on thresholding: subspace pruning and object pruning.
Efficient computation methods: subspace pruning and object pruning
Subspace pruning
Key motivation: if an upper bound on the promotiveness score of an unseen subspace is lower than the current top-R promotiveness score, the subspace can be pruned.
How do we obtain an upper bound on the promotiveness scores of unseen subspaces? With $P(S,T) = \mathrm{Rank}^{-1}(S,T)$, it suffices to obtain a lower bound on the rank: if $\mathrm{Rank}(S,T) \ge k$, then $P(S,T) \le 1/k$.
Subspace pruning
Assumption: the aggregation measure is monotone (e.g., SUM over non-negative scores), and the Sig measure is also monotone.
Key observations
{*}
{A} {B} {C}
{AB} {AC} {BC}
{ABC}
Observation 1: every object in a child subspace must also be a member of its parent subspace.
Observation 2: an object's aggregate score in a child subspace must be smaller than or equal to its aggregate score in the parent subspace.
Subspace pruning
S1={a}
S2={ab} S4={ac}
S3={abc}
Target object: t7. Return the top-1 promotive subspace, with $P(S,t7) = \mathrm{Rank}^{-1}(S,t7)$.
Subspace | Objects in the cuboid and their aggregate scores | Aggregate of target object | Rank | P
S1 = {a} | | 0.7 | |
S2 = {ab} | | 0.6 | |
S3 = {abc} | | 0.3 | |
S4 = {ac} | | 0.4 | |
Initialization step: scan the dataset once to compute the aggregate of the target object t7 in each subspace.
Start: compute the aggregates of all objects in subspace S1 = {a}.
S1 = {a} | t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) | 0.7 | 3 | 1/3
Current top-1 result: S1 (1/3)
S2 = {ab} | t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) | 0.6 | 2 | 1/2
Current top-1 result: S2 (1/2), replacing S1 (1/3)
S3 = {abc} | t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) | 0.3 | 3 | 1/3
Current top-1 result: S2 (1/2); S3 (1/3) does not beat it.
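The Rank and P columns can be reproduced from the listed scores. This sketch assumes, consistently with the table (t7 ties with t1 in S1 and with t3 in S2, yet gets rank 3 and 2), that ties are resolved in the target's favour, and it keeps the running top-1 result as each cuboid is processed; `rank` and `CUBOIDS` are hypothetical names.

```python
# Aggregate scores per object in the cuboids computed so far (from the table above)
CUBOIDS = {
    "S1": {"t6": 1.2, "t3": 1.0, "t7": 0.7, "t1": 0.7, "t4": 0.3, "t5": 0.3, "t2": 0.2},
    "S2": {"t6": 0.7, "t7": 0.6, "t3": 0.6, "t1": 0.4, "t4": 0.3, "t2": 0.2, "t5": 0.2},
    "S3": {"t3": 0.6, "t6": 0.5, "t7": 0.3, "t1": 0.1, "t5": 0.1},
}


def rank(scores, target):
    """1 + number of objects strictly above the target (ties favour the target)."""
    mine = scores[target]
    return 1 + sum(1 for obj, s in scores.items() if obj != target and s > mine)


best_p, best_subspace = 0.0, None  # current top-1 result
for name, scores in CUBOIDS.items():
    p = 1 / rank(scores, "t7")  # P(S, t7) = Rank^-1(S, t7)
    if p > best_p:
        best_p, best_subspace = p, name
    print(name, rank(scores, "t7"), best_subspace)
```

The printed lines should read S1 3 S1, S2 2 S2 and S3 3 S2, mirroring the top-1 updates above.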
Can we deduce a lower bound on the rank of t7 in S4 = {ac}?
Given the tuples in S3, can we deduce some of the members of S4?
Subspace pruning
Subspace Objects in the cuboid and their aggregate scores Aggregate of target object
Rank P
S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3
S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2
S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3
S4 = {ac} t3(?) t6(?) t7(?) t1(?) t5(?) 0.4
S1={a}
S2={ab} S4={ac}
S3={abc}
Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)
Can we deduce a lower bound of Rank of S4?
Can we say something about the scores of these members?
Current top-1 result : S2(1/2)Tuples in the child subspace must also appear in the parent subspaces.
Subspace pruning
Subspace Objects in the cuboid and their aggregate scores Aggregate of target object
Rank P
S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3
S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2
S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3
S4 = {ac} t3(?) t6(?) t7(0.4) t1(?) t5(?) 0.4
S1={a}
S2={ab} S4={ac}
S3={abc}
Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)
Can we deduce a lower bound of Rank of S4?
Can we say something about the scores of these members?
Current top-1 result : S2(1/2)
Subspace pruning
Subspace Objects in the cuboid and their aggregate scores Aggregate of target object
Rank P
S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3
S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2
S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3
S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4
S1={a}
S2={ab} S4={ac}
S3={abc}
Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)
Can we deduce a lower bound of Rank of S4?
Can we say something about the scores of these members?
Current top-1 result : S2(1/2)Aggregate score of a tuple in the child subspace must be smaller or equal to its score in the parent subspace.
Subspace pruning
Subspace Objects in the cuboid and their aggregate scores Aggregate of target object
Rank P
S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3
S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2
S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3
S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4
S1={a}
S2={ab} S4={ac}
S3={abc}
Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)
Can we deduce a lower bound of Rank of S4?Current top-1 result : S2(1/2)
?
Subspace pruning
Subspace Objects in the cuboid and their aggregate scores Aggregate of target object
Rank P
S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3
S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2
S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3
S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4 >= 3
S1={a}
S2={ab} S4={ac}
S3={abc}
Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)
Can we deduce a lower bound of Rank of S4?Current top-1 result : S2(1/2)
Subspace pruning
Subspace Objects in the cuboid and their aggregate scores Aggregate of target object
Rank P
S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3
S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2
S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3
S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4 >= 3 ?
S1={a}
S2={ab} S4={ac}
S3={abc}
Target object : t7 Return top-1 promotive subspace P(S,t7) = Rank-1(S,t7)
S4 Pruned!!!
Current top-1 result : S2(1/2)The promotive score of S4 should be less than or equal to 1/3, which is less than the current top-1 promotive score (1/2), so S4 can be pruned!!!!
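The subspace-pruning test can be sketched as follows. This is a minimal illustration, assuming SUM over non-negative scores as the aggregate (so every object's score in the parent is at least its score in the child) and the strict-inequality Rank used in the table. The function names are hypothetical; the data are the S3 row and t7's S4 aggregate from the slide.

```python
def parent_rank_lower_bound(child_scores, target, target_score_in_parent):
    """Lower bound on the target's rank in a parent subspace, using only a child subspace.

    Every object in the child also appears in the parent, and (for a SUM of
    non-negative scores) its parent score is at least its child score. So any
    object whose child score already beats the target's parent score also beats
    it in the parent.
    """
    beaters = sum(1 for obj, s in child_scores.items()
                  if obj != target and s > target_score_in_parent)
    return 1 + beaters


def can_prune(child_scores, target, target_score_in_parent, best_p):
    """P = 1/Rank, so a lower bound on Rank is an upper bound on P."""
    upper_p = 1.0 / parent_rank_lower_bound(child_scores, target, target_score_in_parent)
    return upper_p < best_p


S3 = {"t3": 0.6, "t6": 0.5, "t7": 0.3, "t1": 0.1, "t5": 0.1}
lb = parent_rank_lower_bound(S3, "t7", target_score_in_parent=0.4)  # t7's aggregate in S4
print(lb, can_prune(S3, "t7", 0.4, best_p=0.5))
```

Only t3 and t6 are guaranteed to beat t7's 0.4 in S4, so the bound is Rank >= 3. Since 1/3 < 1/2, S4 is skipped without aggregating it.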
Subspace pruning
Object pruning
Key motivation: try to prune objects by obtaining an upper bound on the aggregate scores of unseen objects.
Unseen objects whose upper bound is no larger than the smallest aggregate score of the target object can be pruned.
Object pruning
Subspace | Objects in the cuboid and their aggregate scores | Aggregate of target object
S1 = {a} | t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) | 0.7
S2 = {ab} | not yet computed | 0.6
S3 = {abc} | not yet computed | 0.3
S4 = {ac} | not yet computed | 0.4
Subspace lattice: S2={ab} and S4={ac} are children of S1={a}; S3={abc} is a child of both.
Target object: t7. Return the top-1 promotive subspace, where P(S,t7) = 1/Rank(S,t7).
Minimum of the aggregate scores of t7: min{0.7, 0.6, 0.3, 0.4} = 0.3.
Question: can we prune some objects in the subtree of S1?
0.2 is an upper bound on the aggregate score of t2 everywhere in the subtree of S1, i.e. its aggregate score there can only be <= 0.2.
Since the minimum aggregate score of t7 is 0.3, the aggregate scores of t2 cannot affect the rank of t7 in the subtree of S1. t2 pruned!
Similarly, t4 and t5 (upper bound 0.3) can also be pruned: a tie with t7 does not lower its rank, just as t1's tie with t7 in S1 leaves t7 at rank 3.
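Object pruning reduces to a single threshold comparison. The sketch below assumes a monotone aggregate (e.g. SUM of non-negative scores), so an object's score in the root of a subtree bounds its score in every descendant. It also assumes ties do not lower the target's rank, which is why it uses `<=`. The names are hypothetical.

```python
def prunable_objects(root_scores, target, target_scores_in_subtree):
    """Objects whose root-subspace score can never exceed the target's smallest score in the subtree.

    Under a monotone aggregate, an object's score in the root subspace is an
    upper bound on its score in every descendant subspace.
    """
    threshold = min(target_scores_in_subtree)
    # <= rather than <: Rank counts only strictly larger scores, so a tie never pushes the target down.
    return {obj for obj, s in root_scores.items() if obj != target and s <= threshold}


S1 = {"t6": 1.2, "t3": 1.0, "t1": 0.7, "t7": 0.7, "t4": 0.3, "t5": 0.3, "t2": 0.2}
t7_scores = [0.7, 0.6, 0.3, 0.4]  # t7's aggregates in S1, S2, S3, S4
print(sorted(prunable_objects(S1, "t7", t7_scores)))
```

This yields t2, t4 and t5, the objects pruned on the slide. t1 (0.7) survives because it may still outscore t7 somewhere below S1.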
Object pruning
Promotion cube
Promotion cube
Promotion cell: given a subspace S, the promotion cell S.Pcell is defined as the sequence of the top-k largest object aggregate scores in S.
Promotion cube: the promotion cube consists of a set of triples of the form (S, S.Pcell, Sig), where Sig is the significance of the subspace S.
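Building the promotion cells is a top-k selection per subspace. In this sketch, `heapq.nlargest` keeps only the k largest scores and drops the objects. Sig is taken as a caller-supplied input, set to 1 as in the example table, because how significance is computed is not shown here. The per-object scores are the full values listed in the appendix; the function names are hypothetical.

```python
import heapq


def pcell(scores, k):
    """Top-k largest aggregate scores in one subspace, in descending order (objects dropped)."""
    return tuple(heapq.nlargest(k, scores.values()))


def build_promotion_cube(subspaces, k, sig):
    """One (S, S.Pcell, Sig) triple per subspace; sig is supplied by the caller."""
    return [(s, pcell(scores, k), sig[s]) for s, scores in subspaces.items()]


subspaces = {
    "S1": {"t6": 1.2, "t3": 1.0, "t1": 0.7, "t7": 0.7, "t4": 0.3, "t5": 0.3, "t2": 0.2},
    "S2": {"t6": 0.7, "t3": 0.6, "t7": 0.6, "t1": 0.4, "t4": 0.3, "t2": 0.2, "t5": 0.2},
    "S3": {"t3": 0.6, "t6": 0.5, "t7": 0.3, "t1": 0.1, "t5": 0.1},
    "S4": {"t3": 0.8, "t6": 0.7, "t1": 0.6, "t7": 0.4, "t5": 0.2, "t2": 0.1, "t4": 0.1},
}
sig = {s: 1 for s in subspaces}  # the example table lists Sig = 1 for every subspace
cube = build_promotion_cube(subspaces, k=3, sig=sig)
for triple in cube:
    print(triple)
```

The resulting Pcells match the "Top-3 largest aggregate scores" table used in the walkthrough.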
Promotion cube
Subspace lattice: S2={ab} and S4={ac} are children of S1={a}; S3={abc} is a child of both.
Target object: t7. Return the top-1 promotive subspace, where P(S,t7) = 1/Rank(S,t7).
Subspace | Top-3 largest aggregate scores in the subspace | Sig
S1 = {a} | (1.2) (1.0) (0.7) | 1
S2 = {ab} | (0.7) (0.6) (0.6) | 1
S3 = {abc} | (0.6) (0.5) (0.3) | 1
S4 = {ac} | (0.8) (0.7) (0.6) | 1
In the promotion cube, we precompute the top-3 largest aggregate scores (not the objects) in each subspace.
Subspace | Objects in the cuboid and their aggregate scores | Aggregate of target object | Rank | P
S1 = {a} | No need to compute | 0.7 | 3 | 1/3
S2 = {ab} | No need to compute | 0.6 | 2 | 1/2
S3 = {abc} | No need to compute | 0.3 | 3 | 1/3
S4 = {ac} | No need to compute | 0.4 | >= 4 | <= 1/4
Can you tell the exact rank of t7 in S1? Yes: the aggregate score of t7 is 0.7, and only 2 cached scores (1.2 and 1.0) are larger, so exactly 2 other objects have a larger aggregate value than t7 and Rank = 3. Current top-1 result: S1(1/3).
The same lookup gives Rank 2 in S2 (current top-1 result: S2(1/2)) and Rank 3 in S3 (S2 stays on top).
Can you tell the exact rank of t7 in S4? No! The aggregate score of t7 is 0.4, and all 3 cached scores are larger, so at least 3 objects have a larger aggregate value than t7 and Rank >= 4.
S4 pruned! Its promotive score is at most 1/4, which is less than the current top-1 promotive score (1/2).
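The lookup can be sketched as a function that returns either an exact rank or a lower bound from a Pcell alone. The rule assumed here: if the target's score is at least the k-th cached score, every higher-scoring object is inside the Pcell and the rank is exact. Otherwise all k cached scores beat the target and only Rank >= k + 1 is known. The names are hypothetical, and the fallback for non-prunable inexact cases (a full aggregation) is left out.

```python
def rank_from_pcell(pcell, target_score):
    """Return (rank, exact) for the target using only a subspace's Pcell (top-k scores)."""
    larger = sum(1 for s in pcell if s > target_score)
    if larger < len(pcell):
        # target_score >= the k-th largest score, so every higher-scoring object is cached
        return 1 + larger, True
    # all k cached scores beat the target: we only know Rank >= k + 1
    return len(pcell) + 1, False


pcells = {"S1": (1.2, 1.0, 0.7), "S2": (0.7, 0.6, 0.6), "S3": (0.6, 0.5, 0.3), "S4": (0.8, 0.7, 0.6)}
t7 = {"S1": 0.7, "S2": 0.6, "S3": 0.3, "S4": 0.4}

best, best_p, pruned = None, 0.0, []
for s in ["S1", "S2", "S3", "S4"]:
    rank, exact = rank_from_pcell(pcells[s], t7[s])
    if exact:
        if 1.0 / rank > best_p:
            best, best_p = s, 1.0 / rank
    elif 1.0 / rank < best_p:
        pruned.append(s)  # the upper bound on P already loses to the current top-1
    # otherwise the subspace would still need a full aggregation (not shown)
print(best, best_p, pruned)
```

Every decision in the walkthrough becomes an O(k) lookup: S1, S2 and S3 get exact ranks, and S4 is pruned from its bound alone.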
Experimental evaluations
Settings
Implementation: Pentium 3GHz processor, 2GB of memory, 160GB hard disk, WinXP / Microsoft Visual C# 2008 (in-memory)
Datasets: DBLP, TPC-H
Algorithms
PromoRank: the basic query execution framework.
PromoRank++: the basic query execution framework with subspace pruning and object pruning.
PromoCube
DBLP Dataset
Subspace dimensions: Conference (2,506), Year (50), Database (boolean), Data mining (boolean), Information retrieval (boolean), Machine learning (boolean)
Object dimension: Author (450K)
Score dimension: Paper count
Base tuples: 1.76M
DBLP Dataset
The running time increases as R (the number of top promotive subspaces requested) increases, because the pruning threshold is determined by the scores of the current top-R results, and this threshold becomes looser as R becomes larger.
PromoCube performs extremely well when R is small, because in that case it can directly return the result with O(1) lookup time.
DBLP Dataset
[Figure: number of subspace aggregations]
TPC-H Dataset
Subspace dimensions: l_shipdate (2526), l_quantity (50), l_discount (11), l_tax (9), l_linenumber (7), l_returnflag (3)
Object dimension: l_suppkey (10,000)
Score dimension: l_extendedprice (ranges from 901.00 to 104949.50)
Base tuples: 6,001,215
TPC-H
The gap between PromoRank and PromoRank++ is small when the number of dimensions is small, because the total number of target subspaces is itself small, leaving fewer chances for pruning that exploits parent-child relationships.
PromoCube becomes increasingly faster than the other algorithms as the number of tuples grows.
This is because PromoCube saves much more of the actual aggregation and partitioning cost: PromoCube prunes subspaces before any aggregation happens, whereas PromoRank++ prunes subspaces during the aggregation process.
Runtime increases as dimensionality increases, because more dimensions produce more target subspaces.
TPC-H
All algorithms run faster when there are more objects, because each object then has fewer target subspaces: with other parameters unchanged, more objects means each object appears in fewer tuples, and hence in fewer target subspaces.
Both PromoRank++ and PromoCube favor large cardinalities. With other parameters unchanged, larger cardinality implies more subspaces, and with the same number of tuples, the chance of two tuples having the same dimension values becomes lower (the data is sparser).
Therefore, it is more likely that the aggregate scores are equal across parent-child subspaces, which provides a tighter lower bound on Rank.
TPC-H
[Figure: example subspace lattices: {*}, {NY}, {WA}, {2007}, {2008}, {NY,2007}, {NY,2008}, {WA,2007}, {WA,2008}; and {*}, {NY}, {2007}, {2008}, {NY,2007}, {NY,2008}]
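Example with low cardinality (Location takes only the value NY):
Object | Location | Year | Score
T1 | NY | 2007 | 0.6
T2 | NY | 2008 | 0.4
Example with higher cardinality (Location takes NY and WA):
Object | Location | Year | Score
T1 | NY | 2007 | 0.6
T2 | WA | 2008 | 0.4
[Figure: subspace lattice {*}, {NY}, {2007}, {2008}, {NY,2007}, {NY,2008}]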
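The cardinality effect can be checked on the two toy tables. This sketch enumerates every subspace each tuple falls into and sums per-object scores (SUM is an assumption here). It then compares the parent {NY} with its child {NY,2007}. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def subspace_aggregates(rows, dims):
    """Map every subspace (frozenset of (dimension, value) pairs) to {object: SUM of scores}."""
    agg = defaultdict(lambda: defaultdict(float))
    for obj, values, score in rows:
        items = [(d, values[d]) for d in dims]
        # Each tuple contributes to every subspace formed by a subset of its dimension values.
        for r in range(len(items) + 1):
            for combo in combinations(items, r):
                agg[frozenset(combo)][obj] += score
    return {s: dict(objs) for s, objs in agg.items()}


dims = ["Location", "Year"]
low = [("T1", {"Location": "NY", "Year": 2007}, 0.6),
       ("T2", {"Location": "NY", "Year": 2008}, 0.4)]
high = [("T1", {"Location": "NY", "Year": 2007}, 0.6),
        ("T2", {"Location": "WA", "Year": 2008}, 0.4)]

ny = frozenset({("Location", "NY")})
ny_2007 = frozenset({("Location", "NY"), ("Year", 2007)})

low_agg = subspace_aggregates(low, dims)
high_agg = subspace_aggregates(high, dims)
print(len(low_agg), low_agg[ny], low_agg[ny_2007])
print(len(high_agg), high_agg[ny], high_agg[ny_2007])
```

The higher-cardinality table yields more subspaces (7 vs. 6). In that table, {NY} and its child {NY,2007} hold identical per-object scores, so the child's scores bound the parent exactly. In the low-cardinality table, {NY} also contains T2, which the child does not show.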
Conclusion
Introduced the promotion analysis problem.
Presented a basic query execution framework.
Proposed two pruning techniques and the Promotion Cube for efficient query processing.
The End
Appendix
Object pruning
Subspace | Objects in the cuboid and their aggregate scores | Aggregate of target object | Rank | P
S1 = {a} | t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) | 0.7 | 3 | 1/3
S2 = {ab} | t6(0.7) t3(0.6) t7(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) | 0.6 | 2 | 1/2
S3 = {abc} | t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) | 0.3 | 3 | 1/3
S4 = {ac} | t3(0.8) t6(0.7) t1(0.6) t7(0.4) t5(0.2) t2(0.1) t4(0.1) | 0.4 | 4 | 1/4
Subspace lattice: S2={ab} and S4={ac} are children of S1={a}; S3={abc} is a child of both.
Target object: t7. Return the top-1 promotive subspace, where P(S,t7) = 1/Rank(S,t7).
Introduction
Promotion has been playing a key role in marketing…
Manager
Query execution framework
Basic framework: use a recursive process to partition and aggregate the data to compute the target object's rank in each subspace, in a depth-first manner.
[Figure: subspace lattice over dimensions A, B, C, D: {*}; {A} {B} {C} {D}; {AB} {AC} {AD} {BC} {BD} {CD}; {ABC} {ABD} {ACD} {BCD}; {ABCD}]
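A recursive, depth-first partition-and-aggregate pass of the kind described above can be sketched as follows. This is not the paper's implementation. It assumes SUM as the aggregate and the strict-inequality Rank, and it skips partitions that do not contain the target (their descendants cannot contain it either). The toy rows are the low-cardinality example from the TPC-H slides, and all names are hypothetical.

```python
from collections import defaultdict


def promo_rank(rows, dims, target):
    """Depth-first partition-and-aggregate over all subspaces containing the target.

    rows: list of (object, {dimension: value}, score); the aggregate is SUM.
    Returns {subspace: (rank, promotiveness)}, where a subspace is a tuple of
    (dimension, value) pairs in dimension order and () is the full space {*}.
    """
    results = {}

    def visit(subspace, part, start):
        totals = defaultdict(float)
        for obj, _, score in part:
            totals[obj] += score
        if target not in totals:
            return  # no descendant partition can contain the target either
        t = totals[target]
        rank = 1 + sum(1 for obj, s in totals.items() if obj != target and s > t)
        results[subspace] = (rank, 1.0 / rank)
        # Refine only by dimensions from `start` on, so each subspace is visited once.
        for i in range(start, len(dims)):
            groups = defaultdict(list)
            for row in part:
                groups[row[1][dims[i]]].append(row)
            for value, sub_part in groups.items():
                visit(subspace + ((dims[i], value),), sub_part, i + 1)

    visit((), rows, 0)
    return results


rows = [("T1", {"Location": "NY", "Year": 2007}, 0.6),
        ("T2", {"Location": "NY", "Year": 2008}, 0.4)]
result = promo_rank(rows, ["Location", "Year"], target="T2")
best = max(result, key=lambda s: result[s][1])
print(result)
print(best, result[best])
```

Restricting each recursive call to the dimensions after the last one used mirrors a depth-first walk of the lattice in the figure, visiting each subspace at most once.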
TPC-H
Effectiveness of promotion query