supporting top- k join queries in relational databases
DESCRIPTION
Supporting Top- k join Queries in Relational Databases. Ihab F. Ilyas , Walid G. Aref , Ahmed K. Elmagarmid. Presented by: Richa Varshney. Introduction. O rdered set of join results according to some provided function. Often searches are done on multiple features. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/1.jpg)
Supporting Top-k join Queries in Relational
DatabasesIhab F. Ilyas, Walid G. Aref, Ahmed
K. Elmagarmid
Presented by: Richa Varshney
![Page 2: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/2.jpg)
IntroductionO Ordered set of join results according to
some provided function.
O Often searches are done on multiple features.
O Each feature produces a different ranking for the query.
O Joining the individual feature rankings to produce a global ranking.
2
![Page 3: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/3.jpg)
Example 1: Ranking in Multimedia Retrieval
Color Histogram
Edge Histogram
Texture
Query
Color Histogram
Edge Histogram
Texture
VideoDatabase
3
![Page 4: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/4.jpg)
Example 2
SELECT h.id , s.nameFROM houses h , schools sWHERE h.location = s.locationORDER BY h.price+10 x s.tuitionSTOP AFTER 4
44
![Page 5: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/5.jpg)
Example 2 (Cont’d)
ID
Location Price
123456
LafayetteW.LafayetteIndianapolisKokomoLafayetteKokomo
……
90,000110,000111,000118,000125,000154,000
ID
Location Tuition
12345678
IndianapolisW.LafayetteLafayetteLafayetteIndianapolisIndianapolisKokomoKokomo
30003500600062007000790082008200
SchoolsHouses
1 3 1500001 4 1520002 2 1450003 1 141000
5
![Page 6: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/6.jpg)
MotivationSELECT A.1,B.2FROM A,B,CWHERE A.1 = B.1 and B.2 = C.2ORDER BY (0.3*A.1+0.7*B.2)STOP AFTER 5;
Problems:-• Sorting is an expensive
operation.• Sorting is a blocking
operator.
66
![Page 7: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/7.jpg)
ContributionO Propose a new Rank-Join algorithmO Analyze the I/O cost of the algorithmO Implement the algorithm O Propose a score-guided and adaptive
join strategyO Evaluate performance
77
![Page 8: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/8.jpg)
Ripple JoinCartesian product L x R
(L1(1,1,5) R1(1,3,5))
88
(L2,R2) {(2,2,4),(2,1,4)}
(L2,R1) {2,2,4), (1,3,5)}
(L1,R2) {(1,1,5), (2,1,4)}
L
L
R
R
![Page 9: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/9.jpg)
Variation Of Ripple Join
Rectangle
Block
Hash Ripple Join: where all the sampled tuples are kept in hash tables in memory
99
![Page 10: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/10.jpg)
Query Model: Top-k JoinOm Relations R1, ….., Rm | Ri has:
O n attributesO score attribute, si (can be an expression over
other attributes)
OA global score for a join result is computed as
F(s1,…., sm)
O A top-k join query is an ordered set of join results according to some provided function that combines the orders on each input.
O An example template:
SELECT some_attributesFROM R1,…..,Rm
WHERE join_conditionORDER BY F(s1,…..,sm) STOP AFTER k 10
10
![Page 11: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/11.jpg)
1) Generate new valid join combinations2) Compute score for each combination3) For each incoming input, calculate the
threshold score:a) The last seen feature value and the
top ranked feature value for all other features in the query.
b) Store the maximum of these as T (threshold)
4) Store top k(maximum combined score) results in priority queue.
5) Halt when lowest value of queue ≥ T11
Rank-Join Algorithm
11
![Page 12: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/12.jpg)
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
Compute a Threshold (T) by
Max {(Last L).B + (First R.B), (First L).B + (Last R).B}
(1). Get a valid combination using any certain algorithm
Ripple Select (L1, R1) => No Result1212
Example
![Page 13: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/13.jpg)
Example--Cont.
(1) Get a valid combination using any certain algorithm
Select (L2, R2)
(L2, R2), (L2, R1), (L1, R2) => (L1, R2)
(2) Compute the score (J) for the result
J1(L1, R2) => L.B + R.B = 5 + 4 = 91313
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
![Page 14: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/14.jpg)
O (3) Compute a Threshold (T) score by Max {(Last L).B + (First R.B), (First L).B + (Last R).B}
Selection (L1, R1) , (L2, R2) => T = Max (L2.B + R1.B, L1.B + R2.B) =Max (4+5, 5+4) = 9
O (4) J1= 9 ,T = 9,J1 >= T,Report J1
Since we need top 3 (k=3), continue until k=3 and Min(J1, J2, …Jk) > T
14
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
Example--Cont.
14
![Page 15: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/15.jpg)
(1) Select (L3, R3)
(L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3)
=> (L3, R3), (L2, R3)
(2) J2(L2, R3) = 4 + 3 = 7
J3(L3, R3) = 3 + 3= 6
Example--Cont.
1515
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
![Page 16: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/16.jpg)
O (3) Calculate T= Max { (Last L).B + (First R).B,(First L).B+ (Last R).B} = Max {L3.B + R1.B , L1.B + R3.B}= Max(3 + 5, 5 + 3) = 8
O (4) J1(L1,R2) = 9(reported),J2( L2, R3) = 7 ,J3(L3, R3) = 6 (Note, J’s are in descending order)
Min (J) = 6 < T Continue16
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
Example--Cont.
16
![Page 17: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/17.jpg)
(1)Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4)
(2) J(L4, R1) = 7, J(L2, R4) = 6, J(L3, R4) = 5
(3) T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7
(4) J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7,J3(L3,R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5Min(J1, J2) = 7 >= T (k = 3)
Example--Cont.
1717
Select * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 3
![Page 18: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/18.jpg)
Hash Rank Join (HRJN) Operator
O Built on idea of hash ripple join
O Initialized by specifying four parameters:O Two inputs(Can be HRJN operator)O Join condition(general equality
condition/computes valid join)O Combining function(monotone/computes
global scores)
O Maintains highest (first) and lowest (last selected) objects from each relation.
O Results are added to a priority queue
1818
![Page 19: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/19.jpg)
Hash Rank Join (HRJN) Operator: Problems
O Buffer ProblemO Cannot predict how many partial joins will
result
O Local Ranking Problem
1919
![Page 20: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/20.jpg)
HRJN Solutions
O Use Block Ripple Join to solve Local Ranking Problem. (e.g. block size = 2)
2020
![Page 21: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/21.jpg)
HRJN Solutions—Cont.
OHRJN* score-guided join strategyO How to select next (block) tuple
T1 = f(Ltop ,Rbottom ) and T2 = f(Lbottom ,Rtop ), where f is the ranking functionCase 1: T1 >T2 , more inputs should beretrieved from R
Case 2: T1 <T2 , more inputs should beretrieved from L 21
21
![Page 22: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/22.jpg)
An adaptive join strategyO Use input availability as a guide instead of
the aforementioned score-guided strategy
O If both inputs are available, choose the next input to process.
O Otherwise, the available input is processed.
O e.g., a mediator over Web-accessible sources and distributed multimedia repositories
2222
![Page 23: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/23.jpg)
Join Order
O When more than two tables join, the join order matters. (A and C have high similarity)
2323
![Page 24: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/24.jpg)
O Rank-Join order heuristic- Get a ranked sample, top S ranked list from L and R- Calculate the similarity using Footrule
24
Join Order Algorithm
where (i, j ) is a valid join result that joins object i from L with object j from R
![Page 25: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/25.jpg)
2525
Rank Join Order Heuristic
![Page 26: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/26.jpg)
Performance Evaluation
Changing the number of required answers: Selectivity = 0.2 % and m= 4
2626
![Page 27: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/27.jpg)
Performance Evaluation--Cont.
Changing the number of required answers: Selectivity = 0.2 % and m= 4
2727
![Page 28: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/28.jpg)
Performance Evaluation--Cont.
Changing the number of required answers: Selectivity = 0.2 % and m= 4
2828
![Page 29: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/29.jpg)
Performance Evaluation--Cont.
Changing the join selectivity: m =4 and K =502929
![Page 30: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/30.jpg)
Performance Evaluation--Cont.
Changing the join selectivity: m =4 and K =503030
![Page 31: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/31.jpg)
Performance Evaluation--Cont.
Changing the join selectivity: m =4 and K =503131
![Page 32: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/32.jpg)
Performance Evaluation--Cont.
Effect of pipelining: selectivity = 0 . 2% and K =50
3232
![Page 33: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/33.jpg)
Performance Evaluation--Cont.
Effect of pipelining: selectivity = 0 . 2% and K =50
3333
![Page 34: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/34.jpg)
Performance Evaluation--Cont.
Effect of pipelining: selectivity = 0 . 2% and K =50
3434
![Page 35: Supporting Top- k join Queries in Relational Databases](https://reader037.vdocuments.us/reader037/viewer/2022110405/56813320550346895d99f190/html5/thumbnails/35.jpg)
Thank You
3535