chapter 12 query processing (2) yonsei university 2 nd semester, 2013 sanghyun park
DESCRIPTION
Join Operation Several different algorithms to implement joins Nested-loop join Block nested-loop join Indexed nested-loop join Merge-join Hash-join Choice based on cost estimate Examples use the following information Number of records of student5,000takes10,000 Number of blocks of student100takes400TRANSCRIPT
![Page 1: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/1.jpg)
Chapter 12Query Processing (2)
Yonsei University2nd Semester, 2013
Sanghyun Park
![Page 2: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/2.jpg)
Outline Overview (already covered) Measures of Query Cost (already covered) Selection Operation (already covered) Sorting (already covered) Join Operation Other Operations Evaluation of Expressions
![Page 3: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/3.jpg)
Join Operation Several different algorithms to implement joins
Nested-loop join Block nested-loop join Indexed nested-loop join Merge-join Hash-join
Choice based on cost estimate
Examples use the following information Number of records of student 5,000 takes 10,000 Number of blocks of student 100 takes 400
![Page 4: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/4.jpg)
Nested-Loop Join (1/2) To compute the theta join r s
for each tuple tr in r do beginfor each tuple ts in s do begin test pair (tr, ts) to see if they satisfy the join condition if they do, add tr• ts to the resultend
end
r is called the outer relation and s the inner relation of the join
Requires no indices and can be used with any kind of join condition
Expensive since it examines every pair of tuples in the two relations
![Page 5: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/5.jpg)
Nested-Loop Join (2/2) In the worst case, if the buffer can hold only one block of each relation,
the estimated cost is nr bs + br block accesses
If the smaller relation fits entirely in memory, use that as the inner relation. The cost is reduced to br + bs block accesses
Assuming worst case memory availability, cost estimate is 5000 400 + 100 = 2,000,100 block accesses
with student as outer relation 10000 100 + 400 = 1,000,400 block accesses
with takes as outer relation
If smaller relation (student) fits entirely in memory, the cost estimate will be 500 block accesses
Block nested-loop join algorithm (next slide) is preferable
![Page 6: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/6.jpg)
Block Nested-Loop Join Variant of nested-loop join in which every block of inner relation is paired
with every block of outer relation
for each block Br of r do beginfor each block Bs of s do begin for each tuple tr in Br do begin for each tuple ts in Bs do begin Check if (tr, ts) satisfy the join condition if they do, add tr
• ts to the result end endendend
Worst case estimate: br bs + br block accesses
![Page 7: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/7.jpg)
Indexed Nested-Loop Join Index lookups can replace file scans if
Join is an equi-join or natural join and An index is available on the inner relation’s join attribute
For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr
Cost of the join: br + nr c Where c is the cost of a single selection using the join condition
If indices are available on join attributes of both r and s,use the relation with fewer tuples as the outer relation
![Page 8: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/8.jpg)
Merge-Join (1/2) Sort both relations on their join attribute
(if not already sorted on the join attributes)
Merge the sorted relations to join them
![Page 9: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/9.jpg)
Merge-Join (2/2) Can be used only for equi-joins and natural joins
Each block needs to be read only once
Thus number of block accesses for merge-join isbr + bs + the cost of sorting if relations are unsorted
![Page 10: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/10.jpg)
Hash-Join (1/2) Applicable for equi-joins and natural joins
A hash function h is used to partition tuples of both relations
h maps JoinAttrs values to {0,1,…,n} where JoinAttrs denotes the common attributes of r and s used in the natural join r0, r1, . . ., rn denote partitions of r tuples s0, s1. . ., sn denote partitions of s tuples
r tuples in ri need only to be compared with s tuples in si
![Page 11: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/11.jpg)
Hash-Join (2/2)
![Page 12: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/12.jpg)
Other Operations Duplicate elimination can be implemented via sorting or hashing
On sorting, duplicates will come adjacent to each other Hashing is similar – duplicates will come into the same bucket
Projection is implemented by performing projection on each tuple, which gives a relation that could have duplicate records, and then removing duplicate records
Aggregation can be implemented in a manner similar to duplicate elimination Sorting or hashing can be used to bring tuples in the same group
together, and then the aggregate functions can be applied on each group
![Page 13: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/13.jpg)
Evaluation of Expressions So far, we have studied how individual operations are carried out;
now we consider how to evaluate an expression containing multiple operations
The obvious way is simply to evaluate one operation at a time, in an appropriate order; the result of each evaluation is materialized in a temporary relation for subsequent use
A disadvantage to this method is the need to build the temporary relation, which (unless they are small) must be written to disk
An alternative approach is to evaluate several operations simultaneously in a pipeline with the results of one operation passed on to the next, without the need to store a temporary relation
![Page 14: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/14.jpg)
Materialization Evaluate one operation at a time, starting at the lowest-level;
store intermediate results into temporary relations to evaluatenext-level operations
Consider the expression:∏name(building = “Watson”(department) instructor)
![Page 15: Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park](https://reader033.vdocuments.us/reader033/viewer/2022052305/5a4d1b5f7f8b9ab0599ac8c9/html5/thumbnails/15.jpg)
Pipelining Evaluate several operations simultaneously, passing the results of
one operation to the next
Consider the previous example again: Don’t store result of building=“Watson”(department) Instead, pass tuples directly to the join Similarly, don’t store result of join, pass tuples directly to the projection
Much cheaper than materialization
Pipelining may not always be possible