Best-Effort Top-k Query Processing Under Budgetary Constraints
Michal Shmueli-Scheuer
(IBM Haifa Research Lab and UCI)
Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum

Motivating Example
[Figure: queries are sent to an engine, which returns top-k results.]
• Mobile Applications – Highly impatient users, need fast results.
• Mediation Systems – Achieve high query throughput.
• Online Analytics (e.g. logs) – Achieve high query throughput.

Traditional top-k query
• Pre-computed lists over multiple attributes.
• Combine scores by some monotonic aggregation function.
• Two access modes:
  – sorted access (Cs)
  – random access (Cr)
• Objective: Compute the k objects with the highest scores.
[Figure: m sorted lists over n objects]
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.5, …, c 0.2
Rm: c 0.9, b 0.6, g 0.5, …, a 0.4

NRA algorithm (Fagin et al.)
f = SUM
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.5, …, c 0.2
Top-2 [worst score, best score]: a [0.9, 1.77], d [0.87, 1.77]
Candidates: none yet
Figure labels: highi (last score read in list i), mink (lowest worst score in the top-2)
Stop when mink > best score of candidates.

NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.25, …, c 0.2
Top-2 [worst score, best score]: a [1.75, 1.75], d [0.87, 1.47]
Candidates: b [0.6, 1.45]
Stop when mink > best score of candidates.

NRA algorithm (Fagin et al.)
R1: a 0.9, b 0.6, c 0.5, …, d 0.4
R2: d 0.87, a 0.85, f 0.25, …, c 0.2
Top-2 [worst score, best score]: a [1.75, 1.75], d [0.87, 1.37]
Candidates: b [0.6, 0.85], c [0.5, 0.75], f [0.25, 0.75]
Stop when mink > best score of candidates. Here mink = 0.87 > 0.85, so the algorithm stops.
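The three NRA slides can be replayed with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: it assumes f = SUM, round-robin sorted access, and only the three visible entries of each list (the elided tails are left out, and f = 0.25 is taken from the later slides). The names `nra_bounds` and `nra` are hypothetical.

```python
# Minimal NRA sketch (f = SUM), using only the visible prefixes of R1 and R2.
R1 = [("a", 0.9), ("b", 0.6), ("c", 0.5)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25)]


def nra_bounds(lists, depth):
    """Return ({obj: (worst, best)}, high) after `depth` sorted accesses per list."""
    seen = {}  # obj -> {list index: score read so far}
    for i, lst in enumerate(lists):
        for obj, score in lst[:depth]:
            seen.setdefault(obj, {})[i] = score
    # high_i is the last score read from list i
    high = [lst[min(depth, len(lst)) - 1][1] for lst in lists]
    bounds = {}
    for obj, scores in seen.items():
        worst = sum(scores.values())
        # each list where obj is still unseen can add at most high_i
        best = worst + sum(h for i, h in enumerate(high) if i not in scores)
        bounds[obj] = (worst, best)
    return bounds, high


def nra(lists, k):
    """Deepen round-robin until mink exceeds every other best score."""
    for depth in range(1, max(len(lst) for lst in lists) + 1):
        bounds, high = nra_bounds(lists, depth)
        ranked = sorted(bounds, key=lambda o: bounds[o][0], reverse=True)
        topk, candidates = ranked[:k], ranked[k:]
        if len(topk) < k:
            continue
        mink = bounds[topk[-1]][0]
        # objects not seen anywhere can score at most sum(high)
        best_other = max([bounds[o][1] for o in candidates] + [sum(high)])
        if mink > best_other:
            break
    return topk, depth, bounds


topk, depth, bounds = nra([R1, R2], k=2)
```

Under these assumptions the sketch stops at depth 3 with a and d in the top-2, and the intermediate intervals agree with the [worst score, best score] pairs shown on the slides.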

Top-k with Budget Constraints
f = SUM
R1: s 0.95, u 0.93, t 0.92, d 0.9, x 0.5, y 0.4, z 0.2, …
R2: a 1.0, b 0.9, c 0.85, d 0.8, e 0.7, t 0.6, f 0.4, …
Top-2: d 1.7, t 1.52
Access Costs
• Sorted access cost – Cs
• Random access cost – Cr
• Cs=1, Cr=3
Budget = 10?
• NRA: 12Cs = 12, precision = 0.5
• TA: 7Cs + 7Cr = 28, precision = 0
Given budget B, maximize result quality.
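The cost arithmetic and the NRA precision on this slide can be checked with a small sketch. It assumes that a budget-limited NRA-style run reads the lists round-robin, ranks objects by their worst score (the partial sum seen so far), and reports precision as the overlap with the exact top-2 {d, t}; only the visible list entries are used. The function names are hypothetical.

```python
# Visible entries of the two lists on the budget slide; exact top-2 is {d, t}.
R1 = [("s", 0.95), ("u", 0.93), ("t", 0.92), ("d", 0.9), ("x", 0.5), ("y", 0.4), ("z", 0.2)]
R2 = [("a", 1.0), ("b", 0.9), ("c", 0.85), ("d", 0.8), ("e", 0.7), ("t", 0.6), ("f", 0.4)]
EXACT = {"d", "t"}
CS, CR = 1, 3  # access costs from the slide


def access_cost(n_sorted, n_random, cs=CS, cr=CR):
    """Total cost of a schedule with the given numbers of accesses."""
    return n_sorted * cs + n_random * cr


def round_robin_result(lists, k, budget, cs=CS):
    """Top-k by worst score after spending `budget` on round-robin sorted accesses."""
    n_sa = budget // cs
    m = len(lists)
    depths = [n_sa // m + (1 if i < n_sa % m else 0) for i in range(m)]
    worst = {}
    for lst, depth in zip(lists, depths):
        for obj, score in lst[:depth]:
            worst[obj] = worst.get(obj, 0.0) + score
    return sorted(worst, key=worst.get, reverse=True)[:k]


def precision(result, exact):
    """|R ∩ R_exact| / |R|, or 0 for an empty result."""
    return len(set(result) & set(exact)) / len(result) if result else 0.0


result = round_robin_result([R1, R2], k=2, budget=10)
```

With budget 10 each list is read to depth 5, so t is seen only in R1 and the returned top-2 is d and a, which gives the precision of 0.5 quoted for NRA.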

Contributions
• Sorted Accesses
  – Efficient Plan
  – Solution with Adaptive
• Sorted and Random Accesses
  – Efficient Plan
  – Solution with Adaptive
• Experiments

Results Under Limited Budget
[Figure: the K results for unlimited budget compared with the results for limited budget.]

Efficient Plan – Sorted Accesses
• Assume that we know the k results for unlimited budget (REXACT).
• Plan – {L1,4} {L2,2}
• Interesting positions – where the k objects appear in the lists.
[Figure: Top-2 = {o5, o1}; the interesting positions are marked P1, P2, Q1, Q2 in the lists]
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)

Efficient Plan – Sorted Accesses
• Goal: find plan t, such that:
  $\arg\max_{t \in T,\ |t| \le B} \frac{|R_t \cap R_{exact}|}{|R_t|}$
  Denoted as ROPT
• Plans for B=5
  Plan: {L1,2} {L2,3}
[Figure: interesting positions P1, P2, Q1, Q2 in the lists]
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
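The goal of picking the plan whose result overlaps most with REXACT under budget B can be illustrated by brute force. The sketch below is not the efficient plan from the slides; it simply enumerates every split of the budget. It assumes a plan is a tuple of prefix depths, that the result of a plan is the top-k by partial sum, and it borrows the numeric lists from the budget example slide because L1 and L2 here carry only symbolic scores. All function names are hypothetical.

```python
from itertools import product

# Lists from the budget example slide (visible entries only); exact top-2 is {d, t}.
R1 = [("s", 0.95), ("u", 0.93), ("t", 0.92), ("d", 0.9), ("x", 0.5), ("y", 0.4), ("z", 0.2)]
R2 = [("a", 1.0), ("b", 0.9), ("c", 0.85), ("d", 0.8), ("e", 0.7), ("t", 0.6), ("f", 0.4)]
EXACT = {"d", "t"}


def plan_result(lists, plan, k):
    """Top-k by partial sum after reading plan[i] entries from list i."""
    worst = {}
    for lst, depth in zip(lists, plan):
        for obj, score in lst[:depth]:
            worst[obj] = worst.get(obj, 0.0) + score
    return sorted(worst, key=worst.get, reverse=True)[:k]


def plan_precision(lists, plan, k, exact):
    """|R_t ∩ R_exact| / |R_t|, or 0 when the plan reads nothing."""
    result = plan_result(lists, plan, k)
    return len(set(result) & exact) / len(result) if result else 0.0


def best_plan(lists, k, budget, exact):
    """Exhaustively search all plans t with |t| <= budget (unit sorted-access cost)."""
    ranges = [range(min(len(lst), budget) + 1) for lst in lists]
    feasible = (p for p in product(*ranges) if sum(p) <= budget)
    return max(feasible, key=lambda p: plan_precision(lists, p, k, exact))


plan = best_plan([R1, R2], k=2, budget=10, exact=EXACT)
```

For this data the search returns the plan that stops exactly at the last positions of d and t, which matches the observation that only the interesting positions matter.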

Sorted Accesses
• Observations:
  – Prefer high scores
[Figure: lists L1, L2, L3; O2 appears in all three lists (O2, SL1; O2, SL2; O2, SL3), O1 in two (O1, SL1; O1, SL2).]

Observations – contd.
• Prefer large score reductions
[Figure: scores (0 to 1) in the <title> and <description> lists for the query title=“war” description=“weapon”]

Score Utilities
List (sorted by score): o2 1, o4 0.9, o5 0.8, o3 0.7, o1 0.6
• Score gain: (1 + 0.9 + 0.8) / 3 = 0.9
• Score reduction: 1 − 0.8 = 0.2

Optimization Problem
maximize $\sum_{i=1}^{m} util(L_i, x)$ s.t. $\sum_{i=1}^{m} b_i \le b$
Where m is the number of lists
• Bi-objective optimization problem:
  util(Li,x) = α * gain + (1-α) * reduction
Heuristics:
• Fair Heuristic
• Rank Heuristic

Adaptive
[Figure: the weights α (gain) and (1 − α) (reduction) over time]

Adaptive
• Top-k queue: o1 [ws,bs], o2 [ws,bs], o3 [0.8,bs]
• Candidates queue: o4 [0.6,bs], o6 [ws,bs]
• Probability that a candidate c reaches the top-k (Theobald et al. VLDB04):
  $p_k(c) = P\left[\sum_{i \in E(c)} S_i > \delta(c) \;\middle|\; S_i \le high_i\right]$
• Example: δ(o4) = 0.8 − 0.6 = 0.2
[Figure: lists L1, L2, L3 containing O1 (O1, SL1; O1, SL2; O1, SL3), with the high positions at times t1 and t2]

Adaptive
$\hat{p}_k = \frac{1}{|cand.set|} \sum_{c \in cand.set} p_k(c)$
[Figure: $\hat{p}_k$ for a TREC query, k=100]

Efficient Plan – Random Accesses
• Observations:
  – Random accesses can always be scheduled after all sorted accesses have finished.
schedule 1: {SA……RA……SA….}
schedule 2: {SA……SA……RA….}
precision(schedule1) = precision(schedule2)

Observations – contd.
• Random accesses are only useful to objects in REXACT.
[Figure: list L2 with (o1, SL2), (o2, SL2), (o5, SL2), where o5 is not in REXACT. The top-k and candidates queues (o1 to o5, each with [ws,bs]) are shown for two cases: “Precision remains the same” and “Precision reduced”.]

Random Accesses
• Gathering with Sorted
• Probing with Random
• When to switch from SA to RA?
[Figure: timeline of the switch point, with the two failure cases “Not enough RAs to prune the candidates” and “Not enough good candidates, RA is wasted”]

Random Accesses
• Switch from Sorted to Random:
  R = (1 − α) * S
  S + R > B
  S – total cost of sorted accesses.
  R – total cost for random accesses.
• Which items to access?
  – maximize expected score.

Experimental Data
• TREC Terabyte
  – 25M web pages
  – 50 queries with average length of 3 words.
• IMDB
  – 375,000 movies
  – 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}
• Synthetic data
  – Zipf, #lists = [2,6], #objects = [10000,1000000]
• Aggregate Function: Sum

Evaluation Methods
• percentage of optimal precision: $\frac{precision_{alg}}{precision_{opt}}$
• SME (figure: the result-set pairs Ralg, Ropt and Ropt, Rexact)
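The first measure is a simple ratio and can be written out directly. The sketch below assumes precision is the overlap of a result set with REXACT divided by the result size; the three example sets are illustrative values, not results from the experiments, and the function names are hypothetical.

```python
def precision(result, exact):
    """|R ∩ R_exact| / |R|, or 0 for an empty result."""
    result, exact = set(result), set(exact)
    return len(result & exact) / len(result) if result else 0.0


def percentage_of_optimal(r_alg, r_opt, r_exact):
    """100 * precision_alg / precision_opt, or 0 when precision_opt is 0."""
    p_opt = precision(r_opt, r_exact)
    return 100.0 * precision(r_alg, r_exact) / p_opt if p_opt else 0.0


# Illustrative sets: the algorithm finds one of the two exact results,
# while the best plan within the same budget finds both.
pct = percentage_of_optimal(r_alg={"d", "a"}, r_opt={"d", "t"}, r_exact={"d", "t"})
```

Normalizing by the optimal plan rather than by REXACT keeps the measure meaningful when the budget is too small for any schedule to reach full precision.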

Results – Sorted Accesses
[Figure: percentage of optimal precision (50% to 90%) vs. budget in #SA (500, 1000, 2000, 3000, 4000, 5000) for NRA, KBA, Fair, Ranking; TREC, k=100]
• Less budget, more improvement

Varied k
[Figure: percentage of optimal precision (20% to 90%) vs. k (20, 50, 100) for NRA, KBA, Fair, Ranking; IMDB, B=400]
• Lower k, more improvement.

Number of Lists
[Figure: percentage of optimal precision (40% to 100%) vs. number of lists (2 to 6) for NRA, KBA, Fair, Ranking; Zipf, k=100, B=4000]
• More lists, more improvement.

Results – Random Accesses
[Figures: TREC, k=100, Cr=10 and TREC, k=100, Cr=100]

Related Works
• Minimize budget for optimal results:
  – The algorithm computes the exact results with minimum cost. (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02)
  – Dual problem.
• Anytime top-k:
  – The algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing. (Arai et al. VLDB07)
  – Does not do any optimizations.
• Approximate top-k:
  – Approximate results with probabilistic guarantees. (Theobald et al. VLDB04, Fagin et al. 2001)

Conclusions
• First attempt to deal with budget constraints.
• For SA only, average precision around 70%.
• Tradeoff between RAs and SAs: for relatively low cost of RA, RA schedules are improved.
Thank You !

Top-k query
• Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f
• top-k: a set T of k objects such that f(rj1,…,rjm) ≤ f(ri1,…,rim) for every object Xi in T and every object Xj not in T
• Assumption: The scoring function f is monotone
  – f(r1,…,rm) ≤ f(r1’,…,rm’) if ri ≤ ri’ for all i
• Two access modes:
  – sorted access – Cs
  – random access – Cr
• Objective: Compute top-k with the minimum cost

Sorted Accesses
• Observations:
  – Objects with high scores have a higher potential to be part of the top-k.
  – Objects with “mediocre” scores do not help.
• Prefer high scores
[Figure: lists L1, L2, L3; O1 appears in all three (O1, SL1; O1, SL2; O1, SL3).]

Example
[Figure labels: “uselessQ”, “Wireless zone”]

Applications
• Mobile Applications – Highly impatient users, need fast results.
• Mediation Systems – Achieve high query throughput.
• Online analytics (e.g. logs) – Achieve high query throughput.

Motivating Example
Query throughput
[Figure: user queries arrive at a mediator (engine) that accesses several servers.]
• Given #queries per time unit
• Allocate time for each query

Terminology
1. Sorted Access
2. Random Access
3. highi
4. Top-k queue
5. Candidates queue
6. mink
7. worstScore(d)
8. bestScore(d)

Efficient Offline Solution – Sorted
• Goal: find trace t, such that:
  $\arg\max_{t \in T,\ |t| \le B} \frac{|R_t \cap R_{exact}|}{|R_t|}$
  where the precision of a trace t is $\frac{|R_t \cap R_{exact}|}{|R_t|}$
  Denoted as ROPT
[Figure: interesting positions P1, P2 in each list]
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Traces for B=5 (sorted accesses on L1, L2):
t1: 0, 5
t2: 1, 4
t3: 2, 3
t4: 3, 2
t5: 4, 1
t6: 5, 0

Efficient Offline Solution – Sorted
• Goal: find trace t, such that:
  $\arg\max_{t \in T,\ |t| \le B} \frac{|R_t \cap R_{exact}|}{|R_t|}$
• Feasible for K up to 100, and m up to 10.
[Figure: interesting positions P1, P2 in each list]
L1: (o1, SL1), (o5, SL1), (o8, SL1), (o6, SL1)
L2: (o1, SL2), (o2, SL2), (o5, SL2), (o4, SL2), (o3, SL2)
Traces for B=5 (sorted accesses on L1, L2):
t1: 0, 5
t2: 1, 4
t3: 2, 3
t4: 3, 2
t5: 4, 1
t6: 5, 0

Efficient Offline Solution – Sorted
• Proof (by contradiction):
  – Assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Construct s′ whose entries s′i are the largest position of Pi less than or equal to si.
  – By construction, the score of any object in S is the same as in S′.

Fair Heuristic
• Assume budget = b
  $SA_{L_i} = b \cdot \frac{util(L_i, x)}{\sum_{j=1}^{m} util(L_j, x)}$
  $util(L_i, x) = \alpha \cdot util_{s}(L_i, x) + (1 - \alpha) \cdot util_{sr}(L_i, x)$
• Runs in batches
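The Fair heuristic splits the budget across lists in proportion to their utilities, which a few lines of code make concrete. The sketch assumes one possible reading of the Score Utilities slide: gain is the average score of the next x entries and reduction is the drop from the first to the last of them. The second list, the weight 0.5, and the function names are hypothetical.

```python
def util(scores, pos, x, alpha):
    """Weighted utility of reading the next x entries of a list from position pos."""
    window = scores[pos:pos + x]
    if not window:
        return 0.0
    gain = sum(window) / len(window)    # assumed: average score in the window
    reduction = window[0] - window[-1]  # assumed: score drop across the window
    return alpha * gain + (1 - alpha) * reduction


def fair_allocation(lists, positions, x, alpha, b):
    """Share budget b among the lists in proportion to their utilities."""
    utils = [util(lst, pos, x, alpha) for lst, pos in zip(lists, positions)]
    total = sum(utils)
    if total == 0:
        return [b / len(lists)] * len(lists)  # nothing to prefer: split evenly
    return [b * u / total for u in utils]


L1 = [1.0, 0.9, 0.8, 0.7, 0.6]  # scores from the Score Utilities slide
L2 = [0.5, 0.5, 0.4, 0.4, 0.3]  # hypothetical second list
alloc = fair_allocation([L1, L2], positions=[0, 0], x=3, alpha=0.5, b=100)
```

Since the slide says the heuristic runs in batches, one would presumably repeat this allocation with updated positions after each batch; that loop is left out here.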

Efficient Offline Solution – Random
• Budget for RAs = (B − |t| * Cs)
[Figure: the top-k queue (o1, o4, o2, o3, each with score S) and the candidates queue (o9, o5, o7, o8, …, o10, o14, …), with the objects in Rexact marked. Candidates are labelled with best(o) − mink, where best(o) = worst(o) + RA.]
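The budget left for random accesses translates into a number of probes, which then have to be spent on specific candidates. The sketch below assumes candidates are probed in decreasing order of best(o) − mink and that candidates whose best score cannot exceed mink are skipped; this ordering is an assumption made for illustration, as are the candidate scores and function names.

```python
def random_access_count(budget, n_sorted, cs, cr):
    """Number of random accesses affordable after n_sorted sorted accesses."""
    remaining = budget - n_sorted * cs
    return max(0, remaining // cr)


def choose_probes(best_scores, mink, n_probes):
    """Pick candidates with the largest positive best(o) - mink (assumed ordering)."""
    promising = [o for o, best in best_scores.items() if best > mink]
    promising.sort(key=lambda o: best_scores[o] - mink, reverse=True)
    return promising[:n_probes]


# B = 10, Cs = 1, Cr = 3 as on the budget example slide; 4 sorted accesses assumed.
n_ra = random_access_count(budget=10, n_sorted=4, cs=1, cr=3)
# Hypothetical best scores for three candidates.
probes = choose_probes({"o9": 1.4, "o5": 1.9, "o7": 1.1}, mink=1.2, n_probes=n_ra)
```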

Motivation
• Many applications work in budget-constrained environments. Still, they wish to perform top-k queries.
[Figure: user queries arrive at a mediator (engine) that accesses several servers; budget-aware query processing.]
Future work
• Different access costs for different lists
• Time-aware top-k
• Top-k with budget constraints for P2P