example. bulk nested-loop joins using buffers: e.g. 22 blocks
Post on 26-Dec-2015
221 Views
Preview:
TRANSCRIPT
Example
Bulk Nested-Loop JoinsUsing Buffers: e.g. 22 blocks
Sort Merge Joins
Notes From Winter 2015
VisualizationsSailors
Page 1
Page 2
Page 3
Page 4
VisualizationsSailors
Record 1Record 2Record 3Record 4Record 5
Page 2
Page 3
Page 4
VisualizationsSailors
Record 1Record 2Record 3Record 4Record 5
Page 2
Page 3
Page 4
Reserves
Simple Nested Loops JoinKey idea:
Take each record of S and match it with each record of R.
Steps:1. Get tuple of S.2. Iterate through
each tuple in R.
Sailors Reserves
Simple Nested Loops JoinKey idea:
Take each record of S and match it with each record of R.
Steps:1. Get tuple of S.2. Iterate through
each tuple in R.
Sailors Reserves
(name = Bob, sid = 1)
Simple Nested Loops JoinKey idea:
Take each record of S and match it with each record of R.
Steps:1. Get tuple of S.2. Iterate through
each tuple in R.
Sailors Reserves
(name = Bob, sid = 1) (sid = 3, bid = 6)(sid = 1, bid = 4)
(name = Bob, sid = 1, bid = 4)Output:
Simple Nested Loops JoinKey idea:
Take each record of S and match it with each record of R.
Steps:1. Get tuple of S.2. Iterate through
each tuple in R.
Sailors Reserves
(name = Bob, sid = 1) (sid = 3, bid = 6)(sid = 1, bid = 4)
(name = Bob, sid = 1, bid = 4)Output:
(sid = 1, bid = 7)
(name = Bob, sid = 1, bid = 7)
Sort-Merge JoinKey idea:
Sort S and R, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Sam, sid = 3)(name = Sue, sid = 7)
(name = Jill, sid = 2)(name = Joe, sid = 12)
(name = Sue, sid = 8)
(name = Yue, sid = 4)
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)Output:
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)
Output:
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)
Output:
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)
Output:
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)
Output:
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)
Output:
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)
Output:
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)
Output:
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)
Output:
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
(name = Bob, sid = 1, bid = 4)(name = Bob, sid = 1, bid = 7)(name = Sam, sid = 3, bid = 6)
Output:
. . .
Sort-Merge JoinKey idea:
Sort S and R on join column, then merge them!
Steps:1. Sort S and R.2. “Zip” or merge.I/Os:
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
Optimizing Sort-Merge JoinKey idea:
Internal Sort on both. Perform merge on all runs!
Steps:1. Internal sort S
and R. (Pass 0)2. Merge all runs.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)(name = Yue, sid = 4)(name = Sue, sid = 7)
(name = Sue, sid = 8)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 3, bid = 6)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)(sid = 8, bid = 15)(sid = 12, bid = 1)
. . .
Optimizing Sort-Merge JoinKey idea:
Internal Sort on both. Perform merge on all runs!
Steps:1. Internal sort S
and R. (Pass 0)2. Merge all runs.
Sailors Reserves
(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Yue, sid = 4)(name = Sue, sid = 8)
(name = Jack, sid = 18)
(name = Cat, sid = 22). . .
(name = Sam, sid = 3)(name = Sue, sid = 7)(name = Joe, sid = 12)
. . .
(sid = 1, bid = 4)(sid = 1, bid = 7)(sid = 4, bid = 3)(sid = 8, bid = 1)
(sid = 8, bid = 13)
(sid = 12, bid = 1). . .
(sid = 3, bid = 6)(sid = 8, bid = 15)
. . .
External Sorting
• Classic interview question: how to sort if data don’t fit in memory.
External Sorting: 2-Way Merge Sort
What is a sorted run?(name = Bob, sid = 1)(name = Jill, sid = 2)
(name = Sam, sid = 3)
(name = Sue, sid = 6)(name = Kev, sid = 8)(name = Jack, sid = 9)
(name = Joe, sid = 10)(name = Sid, sid = 12)(name = Sal, sid = 15)
(name = Bit, sid = 1)(name = Bat, sid = 2)(name = Tam, sid = 3)
(name = Foo, sid = 6)(name = Bar, sid = 8)
(name = Bam, sid = 9)
(name = Ke, sid = 10)(name = Kay, sid = 12)(name = Al, sid = 15)
A sorted subset of a table.Above we have pages with tuple size = 3There are two sorted runs both with a length of 3 pages. (We denote the size of a sorted run by how many pages the sorted run spans)
2 Way Merge Sort
• Given N pages of tuples to sort• B = 3 case• Pass 0: Quicksort a single page• 2-way merge sort: how many passes?– 1+ceil(log_2(N)) – 1 for pass 0, ceil(log_2(N)) because pass 0 has N runs
and at each level we merge 2 runs together– ceil of this because the number of runs in pass 0 might
not be a a power of two, but we still need to keep merging until we're down to 1 run,
Generalized Sort Merge
top related