Answering Top-k Queries Using Views
Gautam Das (Univ. of Texas),
Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto),
Dimitris Tsirogiannis (Univ. of Toronto)
VLDB '06
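Introduction

Preferences are expressed as scoring functions on the attributes of a relation, e.g. $f_Q = 3X_1 + 2X_2 + 5X_3$.

Relation R:

| tid | X1 | X2 | X3 |
|-----|----|----|----|
| 1   | 82 | 1  | 59 |
| 2   | 53 | 19 | 83 |
| 3   | 29 | 99 | 15 |
| 4   | 80 | 45 | 8  |
| 5   | 28 | 32 | 39 |

Scores under $f_Q$, in descending order:

| tid | Score |
|-----|-------|
| 2   | 612   |
| 1   | 543   |
| 4   | 370   |
| 3   | 360   |
| 5   | 343   |

Top-k: the k tuples with the highest score.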
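The scoring arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the names `score` and `top_k` are hypothetical, and the relation is small enough that a full scan is acceptable.

```python
import heapq

# Relation R from the slide: tid -> (X1, X2, X3).
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}
F_Q = (3, 2, 5)  # weights of f_Q = 3*X1 + 2*X2 + 5*X3


def score(weights, row):
    """Linear additive score of one tuple."""
    return sum(w * x for w, x in zip(weights, row))


def top_k(relation, weights, k):
    """Return the k (tid, score) pairs with the highest score, best first."""
    scored = ((score(weights, row), tid) for tid, row in relation.items())
    return [(tid, s) for s, tid in heapq.nlargest(k, scored)]


print(top_k(R, F_Q, 2))  # [(2, 612), (1, 543)]
```

A full scan like this touches every tuple; the rest of the talk is about avoiding that.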
Related Work

- TA [Fagin et al. '96]: deterministic stopping condition; always returns the correct top-k set.
- PREFER [Hristidis et al. '01]: stores multiple copies of the base relation R; utilizes only one.

We complement existing approaches.
Motivation

- Query answering using views: space-performance tradeoff, improved efficiency.
- Can we exploit the same tradeoffs for top-k query answering?
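Problem Statement

Base relation R:

| tid | X1 | X2 | X3 |
|-----|----|----|----|
| 1   | 82 | 1  | 59 |
| 2   | 53 | 19 | 83 |
| 3   | 29 | 99 | 15 |
| 4   | 80 | 45 | 8  |
| 5   | 28 | 32 | 39 |

Query: $f_Q = 3X_1 + 2X_2 + 5X_3$

View V1, $f_{V_1} = 2X_1 + 5X_2$:

| tid | Score |
|-----|-------|
| 3   | 553   |
| 4   | 385   |
| 5   | 216   |
| 2   | 201   |
| 1   | 169   |

View V2, $f_{V_2} = X_2 + 4X_3$:

| tid | Score |
|-----|-------|
| 2   | 351   |
| 1   | 237   |
| 5   | 188   |
| 3   | 159   |
| 4   | 77    |

Ranking views: materialized results of previously asked top-k queries.

Problem: Can we answer new ad hoc top-k queries efficiently using ranking views?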
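As a rough illustration of what a ranking view holds, the sketch below materializes V1 and V2 from R as lists of (tid, score) pairs sorted by descending score. The helper name `materialize` is hypothetical; the point is that a view keeps only tuple ids and scores, not the attribute values.

```python
# Base relation R from the slide: tid -> (X1, X2, X3).
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}


def materialize(relation, weights):
    """Return a ranking view: (tid, score) pairs, highest score first."""
    scored = [(tid, sum(w * x for w, x in zip(weights, row)))
              for tid, row in relation.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


V1 = materialize(R, (2, 5, 0))  # f_V1 = 2*X1 + 5*X2
V2 = materialize(R, (0, 1, 4))  # f_V2 = X2 + 4*X3

print([tid for tid, _ in V1])  # [3, 4, 5, 2, 1]
print([tid for tid, _ in V2])  # [2, 1, 5, 3, 4]
```

Neither view is ordered like the query result (2, 1, 4, 3, 5), which is why answering Q from the views needs more than a lookup.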
Outline

- LPTA Algorithm
- View Selection Problem
- Cost Estimation Framework
- View Selection Algorithms
- Experimental Evaluation
- Conclusions
LPTA - Setting

- Linear additive scoring functions, e.g. $f_Q = 3X_1 + 2X_2 + 5X_3$.
- Set of views:
  - Each view is the materialized result of a previously executed top-k query.
  - Arbitrary subset of attributes.
  - Sorted access on pairs $(tid, score_Q(tid))$.
- Random access on the base table R.
LPTA - Example

[Figure: views V1 and V2 shown as sorted lists of pairs $(tid_1^j, s_1^j), \dots, (tid_5^j, s_5^j)$ for $j = 1, 2$, with $tid_1^1$ and $tid_1^2$ marked. A plot of R(X1, X2) over the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1) shows the lines for V1, V2 and Q, the top-1 tuple, and the stopping condition.]
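LPTA

Linear Programming adaptation of TA.

Example over R(X1, X2) with views $f_{V_1} = 2X_1 + 5X_2$ and $f_{V_2} = X_1 + 2X_2$, and query Q: $f_Q = 3X_1 + 10X_2$.

At the d-th iteration, sorted access returns the pair $(tid_d^1, s_d^1)$ from V1 and $(tid_d^2, s_d^2)$ from V2. The highest score an unseen tuple can still attain under Q is the optimum of the linear program

$$
\begin{aligned}
unseen_{max} = \max\ & f_Q \\
\text{subject to}\ & 0 \le X_1, X_2 \le 1 \\
& 2X_1 + 5X_2 \le s_d^1 \\
& X_1 + 2X_2 \le s_d^2
\end{aligned}
$$

Stopping condition: $unseen_{max} \le topk_{min}$.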
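The iteration described on this slide can be sketched in Python as follows. This is a minimal two-dimensional illustration, not the authors' implementation: the six-tuple relation is made up, the names `lp_max_2d` and `lpta` are hypothetical, and the linear program is solved by brute-force vertex enumeration (workable only for a handful of 2-d constraints) instead of a real LP solver.

```python
from itertools import combinations

EPS = 1e-9
BOX = [(1, 0, 1), (0, 1, 1), (-1, 0, 0), (0, -1, 0)]  # 0 <= X1, X2 <= 1


def lp_max_2d(c, cons):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= s for each
    (a, b, s) in cons, by testing every vertex of the feasible polygon."""
    best = None
    for (a1, b1, s1), (a2, b2, s2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < EPS:
            continue  # parallel boundaries do not meet in a vertex
        x = (s1 * b2 - s2 * b1) / det
        y = (a1 * s2 - a2 * s1) / det
        if all(a * x + b * y <= s + EPS for a, b, s in cons):
            value = c[0] * x + c[1] * y
            best = value if best is None else max(best, value)
    return best


def lpta(R, f_q, f_views, k):
    """Return (top-k list of (tid, score), number of sorted accesses per view)."""
    score = lambda f, t: f[0] * t[0] + f[1] * t[1]
    # Each view is a list of (score, tid) pairs, highest score first.
    views = [sorted(((score(f, t), tid) for tid, t in R.items()), reverse=True)
             for f in f_views]
    seen = {}
    for d in range(len(R)):
        for view in views:
            tid = view[d][1]                 # sorted access on the view
            seen[tid] = score(f_q, R[tid])   # random access on R
        # Unseen tuples score at most view[d][0] under each view function.
        cons = BOX + [(f[0], f[1], view[d][0]) for f, view in zip(f_views, views)]
        unseen_max = lp_max_2d(f_q, cons)
        topk = sorted(seen.values(), reverse=True)[:k]
        if len(topk) == k and unseen_max <= topk[-1] + EPS:
            break  # stopping condition: unseen_max <= topk_min
    ranked = sorted(seen.items(), key=lambda item: item[1], reverse=True)[:k]
    return ranked, d + 1


# Made-up data, attributes normalized to [0, 1].
R = {1: (0.9, 0.1), 2: (0.2, 0.8), 3: (0.5, 0.5),
     4: (0.7, 0.6), 5: (0.1, 0.3), 6: (0.3, 0.95)}
top, depth = lpta(R, (3, 10), [(2, 5), (1, 2)], k=2)
print([tid for tid, _ in top], depth)  # [6, 2] 4
```

On this toy input the scan stops after four of the six entries of each view, because at that depth the LP bound (7) has dropped below the current second-best score (8.6).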
LPTA - Example (cont'd)

[Figure: the same views V1 and V2 as sorted lists of pairs $(tid_1^j, s_1^j), \dots, (tid_5^j, s_5^j)$ for $j = 1, 2$, now with $tid_1^1$, $tid_1^2$, $tid_2^1$ and $tid_2^2$ marked. The plot of R(X1, X2) over the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1) shows the lines for V1, V2 and Q, the top-1 tuple, and the stopping condition.]
View Selection Problem

Given a collection of views $V = \{V_1, \dots, V_r\}$ and a query Q, determine the most efficient subset $U \subseteq V$ to execute Q on.

Conceptual discussion:

- Two dimensions
- Higher dimensions
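View Selection - 2d

[Figure: unit square with axes X and Y and corners O = (0,0), P = (1,0), R = (1,1), T = (0,1); lines for the query Q and the views V1 and V2; points A, B, A1, B1 and M; the minimum top-k tuple is marked.]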
View Selection - Higher d

Theorem: If $V = \{V_1, \dots, V_r\}$ is a set of views for an $m$-dimensional dataset and Q a query, the optimal execution of LPTA requires a subset of views $U \subseteq V$ such that $|U| \le m$.

Question: How do we select the optimal subset of views?
Cost Estimation Framework

What is the cost of running LPTA when a specific set of views is used to answer a query?

Cost = number of sequential accesses.

[Figure: 2-d example with the query Q, views V1 and V2, points A and B, and the minimum top-k tuple; in this example, Cost = 6 sequential accesses.]

Can we find that cost without actually running LPTA?
Simulation of LPTA on Histograms

HQ approximates the score distribution of the query Q. Each histogram (HQ, HV1, HV2) has b buckets with n/b tuples per bucket.

1. Use HQ to estimate the score of the k-th highest tuple (topkmin).
2. Simulate LPTA on HV1 and HV2 in a bucket-by-bucket lock step to estimate the cost.
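One way to read the two steps above is sketched below for the two-dimensional case. Several things here are assumptions rather than details from the slides: each histogram is represented only by the lowest score in each equi-depth bucket, the histograms are built from exact scores of a made-up relation purely to keep the example self-contained, the estimate of topkmin is the lower boundary of the bucket containing the k-th tuple, and the LP is solved by brute-force vertex enumeration. All function names are hypothetical.

```python
from itertools import combinations

EPS = 1e-9
BOX = [(1, 0, 1), (0, 1, 1), (-1, 0, 0), (0, -1, 0)]  # 0 <= X1, X2 <= 1


def lp_max_2d(c, cons):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= s, via vertices."""
    best = None
    for (a1, b1, s1), (a2, b2, s2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < EPS:
            continue
        x = (s1 * b2 - s2 * b1) / det
        y = (a1 * s2 - a2 * s1) / det
        if all(a * x + b * y <= s + EPS for a, b, s in cons):
            value = c[0] * x + c[1] * y
            best = value if best is None else max(best, value)
    return best


def bucket_lower_bounds(scores, b):
    """Lowest score in each of b equi-depth buckets, highest bucket first."""
    ordered = sorted(scores, reverse=True)
    per = len(ordered) // b
    return [ordered[(i + 1) * per - 1] for i in range(b)]


def estimate_cost(h_q, h_views, f_q, f_views, n, k):
    """Estimated number of sequential accesses LPTA would make."""
    b = len(h_q)
    per = n // b
    # Step 1: estimate topk_min from the bucket that holds the k-th tuple.
    topk_min = h_q[min(b - 1, (k - 1) // per)]
    # Step 2: advance through all view histograms one bucket at a time.
    for i in range(b):
        cons = BOX + [(f[0], f[1], h[i]) for f, h in zip(f_views, h_views)]
        if lp_max_2d(f_q, cons) <= topk_min + EPS:
            break
    return (i + 1) * per * len(h_views)


# Made-up relation with attributes in [0, 1]; histograms built from exact scores.
R = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.7, 0.6), (0.1, 0.3), (0.3, 0.95)]
f_q, f_views = (3, 10), [(2, 5), (1, 2)]
hist = lambda f: bucket_lower_bounds([f[0] * x + f[1] * y for x, y in R], b=3)
cost = estimate_cost(hist(f_q), [hist(f) for f in f_views], f_q, f_views, n=len(R), k=2)
print(cost)  # 8
```

The estimate is only as fine-grained as the buckets: the simulated scan can stop only at a bucket boundary, so more buckets give a tighter estimate at the price of more LP evaluations.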
View Selection Algorithms

- Exhaustive (E): Check all possible $\binom{r}{p}$ subsets of size $p$, $p \le m$.
- Greedy (SV): Keep expanding the set of views to use until the estimated cost stops decreasing.
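The two strategies can be sketched as follows, assuming a cost estimator is available as a callback. Everything named here is hypothetical: `est_cost` stands in for whatever estimate is used, the cost table is invented, and capping the greedy search at `max_views` is an added assumption motivated by the bound of at most m views.

```python
from itertools import combinations


def select_exhaustive(views, m, est_cost):
    """Try every subset of size p <= m and keep the cheapest one."""
    candidates = (frozenset(subset)
                  for p in range(1, m + 1)
                  for subset in combinations(views, p))
    return min(candidates, key=est_cost)


def select_greedy(views, max_views, est_cost):
    """Add the view that lowers the estimated cost most; stop when none does."""
    chosen, best = frozenset(), float("inf")
    while len(chosen) < max_views:
        options = [(est_cost(chosen | {v}), v) for v in views if v not in chosen]
        if not options:
            break
        cost, view = min(options, key=lambda option: option[0])
        if cost >= best:
            break  # estimated cost stopped decreasing
        chosen, best = chosen | {view}, cost
    return chosen


# Invented cost table for three views on a 2-dimensional dataset (m = 2).
COSTS = {
    frozenset({"V1"}): 40, frozenset({"V2"}): 30, frozenset({"V3"}): 50,
    frozenset({"V1", "V2"}): 12, frozenset({"V1", "V3"}): 10,
    frozenset({"V2", "V3"}): 25,
}
views = ["V1", "V2", "V3"]
print(sorted(select_exhaustive(views, 2, COSTS.__getitem__)))  # ['V1', 'V3']
print(sorted(select_greedy(views, 2, COSTS.__getitem__)))      # ['V1', 'V2']
```

With this invented table the greedy search commits to V2 first and ends at cost 12, while the exhaustive search finds the pair with cost 10, which shows the tradeoff between the number of estimates computed and the quality of the chosen subset.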
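Select Views Spherical (SVS)

Requires the solution of a single linear program.

[Figure: 2-d example with corners (0,0), (1,0) and (0,1) and a point T; each view $V_j$ contributes a constraint $f_{V_j} \le s$, and the linear program maximizes $f_Q$; the query Q and the selected views are marked.]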
Select Views By Angle (SVA)

Sort the views by increasing angle with respect to Q.

[Figure: 2-d example with corners (0,0), (1,0) and (0,1); the query Q and views V1, V2, V3, V4 at angles $\phi_1 < \phi_2 < \phi_3 < \phi_4$ from Q; the selected views are marked.]
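The ordering step can be sketched by treating each scoring function as its weight vector and sorting by the angle to the query's vector. The view weights below are made up, the function names are hypothetical, and keeping the first m views of the sorted order is an assumption rather than something the slide states.

```python
import math


def angle(u, v):
    """Angle in radians between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def select_views_by_angle(f_q, views, m):
    """Sort view names by increasing angle to the query and keep the first m."""
    ranked = sorted(views, key=lambda name: angle(f_q, views[name]))
    return ranked[:m]


f_q = (3, 10)
views = {"V1": (2, 5), "V2": (1, 2), "V3": (1, 0), "V4": (0, 1)}
print(select_views_by_angle(f_q, views, m=4))  # ['V1', 'V2', 'V4', 'V3']
print(select_views_by_angle(f_q, views, m=2))  # ['V1', 'V2']
```

This needs no cost estimation at all, only the weight vectors, which makes it the cheapest of the selection strategies to evaluate.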
General Queries and Views

- Views that materialize their top-k tuples: truncate the view histograms.
- Accommodating range conditions:
  - Select the views that cover the range conditions.
  - Truncate each attribute's histogram.
Experiments

- Datasets: Uniform, Zipf, Real
- Experiments:
  - Performance comparison of LPTA, PREFER and TA
  - Accuracy of the cost estimation framework
  - Performance of LPTA using each of the view selection algorithms
  - Scalability of the LPTA algorithm
Performance comparison of LPTA, PREFER and TA
[Figure: two panels: uniform dataset, 3d; real dataset, 2d.]
Cost Estimation Accuracy
[Figure: two panels, 2d: buckets = 0.5% of n; buckets = 1% of n.]
Performance of LPTA using View Selection Algorithms
[Figure: two panels, (2d) and (3d); 500K tuples, top-100.]
Scalability Experiments on LPTA
[Figure: panels labeled (2d, uniform dataset) and (500K tuples, top-100).]
Conclusions

- Using views for top-k query answering
- LPTA: linear programming adaptation of TA
- View selection problem, cost estimation framework, view selection algorithms
- Experimental evaluation
(Thank You!)
Questions?