Supporting Top-k join Queries in Relational
DatabasesIhab F. Ilyas, Walid G. Aref, Ahmed
K. Elmagarmid
Presented by: Richa Varshney
IntroductionO Ordered set of join results according to
some provided function.
O Often searches are done on multiple features.
O Each feature produces a different ranking for the query.
O Joining the individual feature rankings to produce a global ranking.
2
Example 1: Ranking in Multimedia Retrieval
Color Histogram
Edge Histogram
Texture
Query
Color Histogram
Edge Histogram
Texture
VideoDatabase
3
Example 2
SELECT h.id , s.nameFROM houses h , schools sWHERE h.location = s.locationORDER BY h.price+10 x s.tuitionSTOP AFTER 4
44
Example 2 (Cont’d)
ID
Location Price
123456
LafayetteW.LafayetteIndianapolisKokomoLafayetteKokomo
……
90,000110,000111,000118,000125,000154,000
ID
Location Tuition
12345678
IndianapolisW.LafayetteLafayetteLafayetteIndianapolisIndianapolisKokomoKokomo
30003500600062007000790082008200
SchoolsHouses
1 3 1500001 4 1520002 2 1450003 1 141000
5
MotivationSELECT A.1,B.2FROM A,B,CWHERE A.1 = B.1 and B.2 = C.2ORDER BY (0.3*A.1+0.7*B.2)STOP AFTER 5;
Problems:-• Sorting is an expensive
operation.• Sorting is a blocking
operator.
66
ContributionO Propose a new Rank-Join algorithmO Analyze the I/O cost of the algorithmO Implement the algorithm O Propose a score-guided and adaptive
join strategyO Evaluate performance
77
Ripple JoinCartesian product L x R
(L1(1,1,5) R1(1,3,5))
88
(L2,R2) {(2,2,4),(2,1,4)}
(L2,R1) {2,2,4), (1,3,5)}
(L1,R2) {(1,1,5), (2,1,4)}
L
L
R
R
Variation Of Ripple Join
Rectangle
Block
Hash Ripple Join: where all the sampled tuples are kept in hash tables in memory
99
Query Model: Top-k JoinOm Relations R1, ….., Rm | Ri has:
O n attributesO score attribute, si (can be an expression over
other attributes)
OA global score for a join result is computed as
F(s1,…., sm)
O A top-k join query is an ordered set of join results according to some provided function that combines the orders on each input.
O An example template:
SELECT some_attributesFROM R1,…..,Rm
WHERE join_conditionORDER BY F(s1,…..,sm) STOP AFTER k 10
10
1) Generate new valid join combinations2) Compute score for each combination3) For each incoming input, calculate the
threshold score:a) The last seen feature value and the
top ranked feature value for all other features in the query.
b) Store the maximum of these as T (threshold)
4) Store top k(maximum combined score) results in priority queue.
5) Halt when lowest value of queue ≥ T11
Rank-Join Algorithm
11
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
Compute a Threshold (T) by
Max {(Last L).B + (First R.B), (First L).B + (Last R).B}
(1). Get a valid combination using any certain algorithm
Ripple Select (L1, R1) => No Result1212
Example
Example--Cont.
(1) Get a valid combination using any certain algorithm
Select (L2, R2)
(L2, R2), (L2, R1), (L1, R2) => (L1, R2)
(2) Compute the score (J) for the result
J1(L1, R2) => L.B + R.B = 5 + 4 = 91313
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
O (3) Compute a Threshold (T) score by Max {(Last L).B + (First R.B), (First L).B + (Last R).B}
Selection (L1, R1) , (L2, R2) => T = Max (L2.B + R1.B, L1.B + R2.B) =Max (4+5, 5+4) = 9
O (4) J1= 9 ,T = 9,J1 >= T,Report J1
Since we need top 3 (k=3), continue until k=3 and Min(J1, J2, …Jk) > T
14
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
Example--Cont.
14
(1) Select (L3, R3)
(L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3)
=> (L3, R3), (L2, R3)
(2) J2(L2, R3) = 4 + 3 = 7
J3(L3, R3) = 3 + 3= 6
Example--Cont.
1515
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
O (3) Calculate T= Max { (Last L).B + (First R).B,(First L).B+ (Last R).B} = Max {L3.B + R1.B , L1.B + R3.B}= Max(3 + 5, 5 + 3) = 8
O (4) J1(L1,R2) = 9(reported),J2( L2, R3) = 7 ,J3(L3, R3) = 6 (Note, J’s are in descending order)
Min (J) = 6 < T Continue16
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
Example--Cont.
16
(1)Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4)
(2) J(L4, R1) = 7, J(L2, R4) = 6, J(L3, R4) = 5
(3) T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7
(4) J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7,J3(L3,R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5Min(J1, J2) = 7 >= T (k = 3)
Example--Cont.
1717
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
Hash Rank Join (HRJN) Operator
O Built on idea of hash ripple join
O Initialized by specifying four parameters:O Two inputs(Can be HRJN operator)O Join condition(general equality
condition/computes valid join)O Combining function(monotone/computes
global scores)
O Maintains highest (first) and lowest (last selected) objects from each relation.
O Results are added to a priority queue
1818
Hash Rank Join (HRJN) Operator: Problems
O Buffer ProblemO Cannot predict how many partial joins will
result
O Local Ranking Problem
1919
HRJN Solutions
O Use Block Ripple Join to solve Local Ranking Problem. (e.g. block size = 2)
2020
HRJN Solutions—Cont.
OHRJN* score-guided join strategyO How to select next (block) tuple
T1 = f(Ltop ,Rbottom ) and T2 = f(Lbottom ,Rtop ), where f is the ranking functionCase 1: T1 >T2 , more inputs should beretrieved from R
Case 2: T1 <T2 , more inputs should beretrieved from L 21
21
An adaptive join strategyO Use input availability as a guide instead of
the aforementioned score-guided strategy
O If both inputs are available, choose the next input to process.
O Otherwise, the available input is processed.
O e.g., a mediator over Web-accessible sources and distributed multimedia repositories
2222
Join Order
O When more than two tables join, the join order matters. (A and C have high similarity)
2323
O Rank-Join order heuristic- Get a ranked sample, top S ranked list from L and R- Calculate the similarity using Footrule
24
Join Order Algorithm
where (i, j ) is a valid join result that joins object i from L with object j from R
2525
Rank Join Order Heuristic
Performance Evaluation
Changing the number of required answers: Selectivity = 0.2 % and m= 4
2626
Performance Evaluation--Cont.
Changing the number of required answers: Selectivity = 0.2 % and m= 4
2727
Performance Evaluation--Cont.
Changing the number of required answers: Selectivity = 0.2 % and m= 4
2828
Performance Evaluation--Cont.
Changing the join selectivity: m =4 and K =502929
Performance Evaluation--Cont.
Changing the join selectivity: m =4 and K =503030
Performance Evaluation--Cont.
Changing the join selectivity: m =4 and K =503131
Performance Evaluation--Cont.
Effect of pipelining: selectivity = 0 . 2% and K =50
3232
Performance Evaluation--Cont.
Effect of pipelining: selectivity = 0 . 2% and K =50
3333
Performance Evaluation--Cont.
Effect of pipelining: selectivity = 0 . 2% and K =50
3434
Thank You
3535