class 14 joins - harvard universitydaslab.seas.harvard.edu/.../cs165/doc/class_slides/...v3, null,...
Post on 04-Jul-2020
3 Views
Preview:
TRANSCRIPT
joinsprof. Stratos Idreos
HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/
class 14
/34CS165, Fall 2017 Stratos Idreos 2
Milestone1: quite involved but easy on the algorithmic side Milestone2: easy once you understand we just batching data Milestone3: not easy, not easy (all concepts/tools needed) Milestone4: better than M3 but still heavy on concepts Milestone5: should be quick
Testing server: will run twice a day as of this week
Remember: limited chances of success if you try to do this alone
Lab marathon: once more in a couple of weeks
/34CS165, Fall 2017 Stratos Idreos 3
FINAL REPORT CONTAINS EXPERIMENTAL ANALYSIS HOW TO DO EXPERIMENTS?
find out what matters, test by changing one thing at a time
say we want to test the select operator, to compare scan vs secondary index
/34CS165, Fall 2017 Stratos Idreos 4
do not listen to youtube while you run experiments!close all apps, recreate the same environment every time
create scripts for everything
ISOLATE PERFORMANCE AS BEST AS POSSIBLE
/34CS165, Fall 2017 Stratos Idreos 5
resp
onse
tim
e (s
ecs)
data size (GB)re
spon
se ti
me
(sec
s)data size (GB)
e.g., to test the select operator
OK not OK
examples for final evaluation
/34CS165, Fall 2017 Stratos Idreos 6
thro
ughp
ut (q
/s)
# of queriesth
roug
hput
(q/s
)# of queries
e.g., to test shared scans
OK not OK
examples for final evaluation
/34CS165, Fall 2017 Stratos Idreos 7
Midterm2: Nov 15 -> Nov 20?
Midterm1: overall great performance! If you did not score 90+
please consider joining more for OH! If you did score 90+
please consider joining more for OH!
all quizzes, all discussions, all “Read” readings extra weekend OH will be announced
/34CS165, Fall 2017 Stratos Idreos 8
so far
database kernel
data data data
algo
rithm
s/op
erat
ors
disk
memory
cpu
/34CS165, Fall 2017 Stratos Idreos 8
so far
database kernel
data data data
algo
rithm
s/op
erat
ors
disk
memory
cpu
columns, rows, hybrids, trees
/34CS165, Fall 2017 Stratos Idreos 8
so far
database kernel
data data data
algo
rithm
s/op
erat
ors
disk
memory
cpu
scan, binary search, tuple reconstruction, min,
max, search b-tree, etc.
columns, rows, hybrids, trees
/34CS165, Fall 2017 Stratos Idreos 8
so far
database kernel
data data data
algo
rithm
s/op
erat
ors
disk
memory
cpu
scan, binary search, tuple reconstruction, min,
max, search b-tree, etc.
columns, rows, hybrids, trees
early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing
selects down, etc
/34CS165, Fall 2017 Stratos Idreos 9
joins(project=m4)
/34CS165, Fall 2017 Stratos Idreos 10
fact table(id1,id2,…)
dimension table 1(id1,…)
dimension table 2(id2,…)
…
…
…
star schema
/34CS165, Fall 2017 Stratos Idreos 10
fact table(id1,id2,…)
dimension table 1(id1,…)
dimension table 2(id2,…)
…
…
…
star schema
avoid duplicates - minimize update cost - but we have to do joins
/34CS165, Fall 2017 Stratos Idreos 11
…
snowflake schema
/34CS165, Fall 2017 Stratos Idreos 12
professor(id,name,…)
course(id,name, profId,…)
student(id,name,…)
database
give me all students enrolled in cs165select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId
enrolled(studentId,
courseId,…) foreign key
/34CS165, Fall 2017 Stratos Idreos 13
table 1 table 2
joinforeign key
referencing table 2 primary key
find all tuples where FK=PKjoin: glue the data back together
/34CS165, Fall 2017 Stratos Idreos 13
table 1 table 2
joinforeign key
referencing table 2 primary key
find all tuples where FK=PKjoin: glue the data back together
equi join
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 14
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
1,d1,e1,f1,a1,b1,c1 1,d1,e1,f1,a5,b5,c5 1,d2,e2,f2,a1,b1,c1 1,d2,e2,f2,a5,b5,c5 2,d3,e3,f3,a2,b2,c2 2.d4.e4,f4,a2,b2,c2 2,d5,e5,f5,a2,b2,c2 3,d6,e6,f6,a3,b3,c3
joinjoin resultkey,payload key,payload
/34CS165, Fall 2017 Stratos Idreos 15
inner join
left outer
right outer
v1, a1 v2, a2
v3, b1 v1, b2
v1, a1, b2
v1, a1, b2 v2, a2, null
v1, a1, b2 v3, null, b1
join
/34CS165, Fall 2017 Stratos Idreos 16
select courses.name=“cs165”
join enrolled.courseid=course.id
students enrolled courses
join student.id=enrolled.studentid
project student.name
good plan
select student.name from students, enrolled, courses where courses.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId
/34CS165, Fall 2017 Stratos Idreos 17
select courses.name=“cs165”
join enrolled.courseid=course.id
students enrolled courses
join student.id=enrolled.studentid
project student.name
select student.name from students, enrolled, courses where courses.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId
pushing selects down
/34CS165, Fall 2017 Stratos Idreos 18
selectR.C
selectS.F
join
fetchR.A
fetchS.A
maxR.D
minS.G
R(A,B,C,D) - S(A,E,F,G)
select max(R.D),min(S.G) from R,S where R.A=S.A and R.C<10 and S.F>30
block operator
access patterns
/34CS165, Fall 2017 Stratos Idreos 19
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2017 Stratos Idreos 19
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2017 Stratos Idreos 20
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2017 Stratos Idreos 20
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
select(Sa,50,65)
/34CS165, Fall 2017 Stratos Idreos 21
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2017 Stratos Idreos 21
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
project m4
/34CS165, Fall 2017 Stratos Idreos 22
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2017 Stratos Idreos 22
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
12347545495897754255
11356244297819812623
Relation R
Ra
RbRelation SSa Sb
select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40
Initial Status
1234532378653321290
Rc
31656911278
411935
Ra 24579
inter1
select(Ra,5,20)
31656911278
411935
24579
inter1 12347545495897754255
3445499742
3445499742
Rb inter2 reconstruct(Rb,inter1)
inter2 inter3 24579
24579
459
select(inter2,30,40)
inter3
459
1234532378653321290
Rc join_input_R
237829
459
join_input_R
237829
459
reconstruct(Rc,inter3)
1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)
17495899643753613250
Sa 3578
10
inter4 select(Sa,55,65)
17495899643753613250
6229198123
inter5reconstruct(Sb,inter4)
3578
10
3578
10
inter4 11356244297819812623
Sb6229198123
inter53578
10
6229198123
join_input_S 3578
10
reverse(inter5)
6229198123
join_input_S 3578
10
49
105
join_res_ R_S
49
105
join(join_input_R,join_input_S)join_res_ R_S
49
inter6voidTail(join_res_R_S)
49
inter6
Ra
31656911278
411935
919
inter7
919
inter7 reconstruct(Ra,inter6)
28
resultsum(inter7)
Query and Query Plan (MAL Algebra)
(1) (2) (4)(3)
(5) (6) (7)
(8) (9)
(10) (11)
select(inter2,40,50)
select(Sa,50,65)
/34CS165, Fall 2017 Stratos Idreos 23
new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
nested loops
L R
for all tuples of one side check all tuples of the other side
/34CS165, Fall 2017 Stratos Idreos 24
outer(L) inner(R)join probe the red side
for better data locality
stream outer pages hold inner pages
Level1
Level2
res
Total footprint=L+R+res (bytes), R.pages<=Level1.pages-2
say red fits in Level1
/34CS165, Fall 2017 Stratos Idreos 25
what if not all data fits in main-memory?
what if not all data fits in L3 cache?
what if not all data fits in L2 cache?
what if not all data fits in L1 cache?
can we utilize >1 cores?
/34CS165, Fall 2017 Stratos Idreos 26
for every L.key = R.key pair
return [L.pos,R.pos]
1,a1,b1,c1 2,a2,b2,c2 3,a3,b3,c3 4,a4,b4,c4 1,a5,b5,c5
1,d1,e1,f1 1,d2,e2,f2 2,d3,e3,f3 2,d4,e4,f4 2,d5,e5,f5 3,d6,e6,f6
joinL: key,payload R: key,payload
1) design a nested loops join algorithm and give its I/O cost2) which column should be the inner and why?3) describe optimizations to minimize Level1 misses
Quickly if there is time think of the following: 4) can we use sorting? 5) how would you use a b-tree to do a join?
level 1
level 2
R.size> Level1.size, L.size>Level1.size, R.size+L.size << L2, Level1 block size = Level 2 block size
CPU
data/results stored one column-at-a-time
/34CS165, Fall 2017 Stratos Idreos 27
comp O(LxR) I/O O(L/lp+Lx(R/rp))
new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
lp=LeftEntriesThatFitInOnePage rp=RightEntriesThatFitInOnePage L= number of values in L column R= number of values in R column
/34CS165, Fall 2017 Stratos Idreos 28
zig zagouter inner
phase 1
123
outer innerphase 2
123
…
A number of pages will still be
in LLC!
/34CS165, Fall 2017 Stratos Idreos 29
comp O(LxR) I/O O(L/lp+Lx(R/rp))
new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
lp=LeftEntriesThatFitInOnePage rp=RightEntriesThatFitInOnePage L= number of values in L column R= number of values in R column
/34CS165, Fall 2017 Stratos Idreos 29
comp O(LxR) I/O O(L/lp+Lx(R/rp))
new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
I/O with zig zag: L/lp + R/rp, if R/rp<=LLC-2
L/lp +Lx(R/rp -(LLC-2)), if R/rp<=2x(LLC-2)
lp=LeftEntriesThatFitInOnePage rp=RightEntriesThatFitInOnePage L= number of values in L column R= number of values in R column
/34CS165, Fall 2017 Stratos Idreos 30
new resL[]; new resR[]; k=0 for (i=0;i<L;i=i+lp) for (j=0;j<R;j=j+rp) for (r=i;r<i+lp;r++) for (m=j;m<j+rp;m++) if L[r]==R[m]
resL[k]=r resR[k++]=m
comp O(LxR) I/O O(L/lp+Lx(R/rp))
comp O(LxR) I/O O(L/lp+(L/lp)x(R/rp))
blocked nested loopsnew resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
/34CS165, Fall 2017 Stratos Idreos 30
new resL[]; new resR[]; k=0 for (i=0;i<L;i=i+lp) for (j=0;j<R;j=j+rp) for (r=i;r<i+lp;r++) for (m=j;m<j+rp;m++) if L[r]==R[m]
resL[k]=r resR[k++]=m
comp O(LxR) I/O O(L/lp+Lx(R/rp))
comp O(LxR) I/O O(L/lp+(L/lp)x(R/rp))
blocked nested loopsnew resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) for (j=0;j<R;j++) if L[i]==R[j] resL[k]=i resR[k++]=j
I/O with zig zag: L/lp + R/rp, if R/rp<=LLC-2
L/lp +(L/lp)x(R/rp -(LLC-2)), if R/rp<=2x(LLC-2)
But if R/rp>2x(LLC-2) zig zag does not work anymore… and so we can flip inner/outer decision
/34CS165, Fall 2017 Stratos Idreos 31
sort
on k
ey
sort
on k
ey
merge
sort merge join
while left and right still have tuples if left.val < right.val left++ else if left.val > right.val right++ else add to result, left++, right++
perfect if data is sorted or needs to be sorted anyway
for the rest of the plan
+ handle duplicates on both sides!
/34CS165, Fall 2017 Stratos Idreos 32
utilize index
new resL[]; new resR[]; k=0 for (i=0;i<L;i=i++) jk=R.btree.probe(L[i]) if (jk!=null) resL[k]=i
resR[k++]=jk.pos
L R
/34CS165, Fall 2017 Stratos Idreos 33
Read: textbook: chapters 4, 14
DATA SYSTEMSprof. Stratos Idreos
class 14
joins
top related