Bhargav Vadher (208), April 9th, 2008. Submitted to: Dr. T Y Lin, Computer Science Department, San Jose State University



› Introduction
› Multipass sort-based algorithm
› Performance of multipass sort-based algorithm
› Multipass hash-based algorithm
› Performance of multipass hash-based algorithm

So far, most of the algorithms we have seen require two passes.

But what if relation R is so big that more than two passes are required?
› Multipass sort-based algorithm
› Multipass hash-based algorithm

Assume that:
› Number of memory buffers = M
› We have relations R and S

BASIS: if B(R) ≤ M (B(R) = number of blocks of R), then
› Read R into main memory
› Sort R with any main-memory sorting algorithm
› Write the sorted R back to disk

INDUCTION: if B(R) > M, then
› Partition the blocks of R into M groups (R1, R2, …, RM)
› Sort each Ri recursively, for i = 1, 2, …, M
› Merge the M sorted sublists into one
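The BASIS and INDUCTION steps above can be sketched in Python. This is only an illustration under stated assumptions: a relation is modeled as an in-memory list of blocks (each block a list of tuples), and the names `to_blocks`, `multipass_sort`, and `block_size` are hypothetical helpers, not part of the original slides. Real disk reads and writes are not modeled.

```python
import heapq
from itertools import chain


def to_blocks(tuples, block_size):
    """Pack a sequence of tuples into blocks of at most block_size tuples."""
    tuples = list(tuples)
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]


def multipass_sort(blocks, M, block_size):
    """Sort a relation (a list of blocks) using at most M buffers per step."""
    if M < 2:
        raise ValueError("need at least 2 buffers to make progress")
    if len(blocks) <= M:
        # BASIS: the relation fits in memory, so sort it in one pass.
        return to_blocks(sorted(chain.from_iterable(blocks)), block_size)
    # INDUCTION: split the blocks into at most M groups, sort each recursively.
    group_len = -(-len(blocks) // M)  # ceiling division
    groups = [blocks[i:i + group_len] for i in range(0, len(blocks), group_len)]
    sorted_groups = [multipass_sort(g, M, block_size) for g in groups]
    # Merge the sorted sublists (one input stream per group) into one.
    merged = heapq.merge(*(chain.from_iterable(g) for g in sorted_groups))
    return to_blocks(merged, block_size)
```

For example, `multipass_sort(to_blocks([5, 3, 9, 1, 4, 8, 7, 2, 6, 0], 2), M=2, block_size=2)` recurses because 5 blocks do not fit into 2 buffers.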

If we are not just sorting but also want to perform a unary operation
› Just modify the previous algorithm to compute δ (duplicate elimination) or γ (grouping and aggregation)

For δ, output one copy of each distinct tuple and discard the rest.

For γ, sort only on the grouping attributes and combine the tuples that share the same grouping attributes.
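Once the tuples arrive in sorted order, δ and γ reduce to a single scan. The sketch below assumes the input is already sorted (for γ, sorted on the grouping attribute) and uses SUM as the example aggregate; the function names and the choice of SUM are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def distinct_sorted(sorted_tuples):
    """δ: yield one copy of each distinct tuple from a sorted stream."""
    for key, _duplicates in groupby(sorted_tuples):
        yield key


def group_sum_sorted(sorted_tuples, group_index, value_index):
    """γ with SUM: the input must be sorted on the grouping attribute."""
    for key, group in groupby(sorted_tuples, key=itemgetter(group_index)):
        yield key, sum(t[value_index] for t in group)
```

Because equal tuples (or equal grouping values) are adjacent after sorting, each operation needs to remember only the current group.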

Finally, for binary operations on R and S:
› Divide the M buffers between R and S in proportion to the number of blocks each relation occupies
› R gets M × B(R) / (B(R) + B(S)) buffers
› S gets the rest of the available buffers
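The proportional split can be written as a small helper. The rounding rule (round down, but give each relation at least one buffer) is an assumption added here, since the slides give only the proportion; `split_buffers` is a hypothetical name.

```python
def split_buffers(M, blocks_r, blocks_s):
    """Split M buffers between R and S in proportion to B(R) and B(S)."""
    if M < 2:
        raise ValueError("need at least one buffer per relation")
    # R's share: M * B(R) / (B(R) + B(S)), rounded down (assumed rounding rule).
    for_r = M * blocks_r // (blocks_r + blocks_s)
    # Assumption: clamp so that neither relation is left without a buffer.
    for_r = min(max(for_r, 1), M - 1)
    return for_r, M - for_r
```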

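Suppose S(M, k) = maximum size, in blocks, of a relation that can be sorted with M buffers and k passes.

BASIS: if k = 1, only one pass is allowed, so B(R) ≤ M and
› S(M, 1) = M

INDUCTION: if k > 1, multiple passes are allowed
› Partition R into M pieces
› S(M, k) = M × S(M, k-1)
where k-1 is the number of passes available for sorting each piece of R.
Expanding the recurrence gives S(M, k) = M^k, so R can be sorted in k passes provided B(R) ≤ M^k.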
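The recurrence S(M, 1) = M, S(M, k) = M × S(M, k-1) can be evaluated directly, and inverted to find the fewest passes for a given relation. The function names below are hypothetical; the sketch uses integer arithmetic only, to avoid floating-point roots.

```python
def max_sortable_blocks(M, k):
    """S(M, k): largest relation (in blocks) sortable with M buffers in k passes."""
    if k == 1:
        return M  # BASIS: one pass, the relation must fit in memory
    return M * max_sortable_blocks(M, k - 1)  # INDUCTION


def passes_needed(blocks, M):
    """Smallest k with S(M, k) >= blocks."""
    if M < 2:
        raise ValueError("need at least 2 buffers")
    k = 1
    while max_sortable_blocks(M, k) < blocks:
        k += 1
    return k
```

With M = 10 buffers, a relation of 1000 blocks needs 3 passes, since 10 × 10 × 10 = 1000.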

Each pass of the algorithm:
› Reads the data from disk
› Sorts it with the chosen method
› Writes it back to disk

So a k-pass sorting algorithm requires
› 2k B(R) disk I/O operations

And a k-pass sort-based binary operation on R and S requires
› 2 (k-1) (B(R) + B(S)) disk I/O operations to sort the sublists
› plus B(R) + B(S) disk I/O operations to merge the sorted sublists in the final phase
› for a total of (2k-1) (B(R) + B(S)) disk I/O operations
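As a quick arithmetic check of the two cost formulas above, the sketch below computes both counts. The function names are hypothetical, and the binary-operation count assumes, as the formula does, that the final result is not written back to disk.

```python
def sort_io(k, blocks_r):
    """Disk I/Os for a k-pass sort of R: each pass reads and writes every block."""
    return 2 * k * blocks_r


def binary_op_io(k, blocks_r, blocks_s):
    """Disk I/Os for a k-pass sort-based binary operation on R and S."""
    total_blocks = blocks_r + blocks_s
    sort_sublists = 2 * (k - 1) * total_blocks  # k-1 read+write passes
    final_merge = total_blocks                  # one read-only merge pass
    return sort_sublists + final_merge
```

For k = 3, B(R) = 1000 and B(S) = 500, the second formula gives 2 × 2 × 1500 + 1500 = 7500 disk I/Os, which matches (2k-1)(B(R) + B(S)).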

Basics:
› An alternative approach to multipass algorithms
› Hash the relations into M-1 buckets, where M is the number of memory buffers
› For a unary operation, apply the operation to each bucket individually
› For a binary operation, apply the operation to each pair of corresponding buckets

The approach can be described as follows.

BASIS:
For a unary operation, if the relation fits into the M memory blocks
› Read it into memory from disk
› Perform the operation on it

For a binary operation, if either relation fits into M-1 memory blocks
› Read that relation into M-1 blocks of main memory
› Read the second relation one block at a time into the Mth block
› Perform the operation

INDUCTION: if neither relation fits into the main-memory buffers
› Hash each relation into M-1 buckets, using M-1 of the memory buffers for the buckets
› Use the Mth buffer to read the relation being hashed, one block at a time
› Recursively perform the operation on each bucket or on each pair of corresponding buckets
› Accumulate the output from each bucket or pair of buckets
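For the unary case, the recursion can be sketched with δ (duplicate elimination) as the operation. Everything here is illustrative: tuples live in a Python list rather than on disk, `block_size` and `max_level` are hypothetical parameters, and salting the hash with the recursion level is one assumed way of using a different hash function on each pass. A bucket dominated by copies of a single tuple never shrinks, so the sketch stops with an error instead of recursing forever.

```python
def multipass_hash_distinct(tuples, M, block_size, level=0, max_level=32):
    """δ via recursive hashing with M buffers (in-memory illustration only)."""
    if M < 3:
        raise ValueError("need at least 2 buckets (M >= 3) to split the input")
    tuples = list(tuples)
    blocks_needed = -(-len(tuples) // block_size)  # ceiling division
    if blocks_needed <= M:
        # BASIS: the relation fits in memory; remove duplicates in one pass.
        return list(dict.fromkeys(tuples))
    if level >= max_level:
        raise RuntimeError("buckets are not shrinking (too many duplicates?)")
    # INDUCTION: hash into M-1 buckets; copies of a tuple share a bucket.
    buckets = [[] for _ in range(M - 1)]
    for t in tuples:
        buckets[hash((level, t)) % (M - 1)].append(t)
    # Recurse on each bucket and accumulate the output.
    output = []
    for bucket in buckets:
        output.extend(
            multipass_hash_distinct(bucket, M, block_size, level + 1, max_level)
        )
    return output
```

Since all copies of a tuple hash to the same bucket at every level, removing duplicates inside each bucket is enough to remove them globally.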

For unary operations, assume:
› The operation is like δ or γ
› The relation is R
› The number of buffers is M
› u(M, k) = number of blocks in the largest relation that a k-pass hashing algorithm can handle

BASIS: u(M, 1) = M, since R must fit into the M buffers, i.e. B(R) ≤ M

INDUCTION: Assume that the first step divides R into M-1 equal buckets. The buckets passed to the next pass must be small enough to be handled in k-1 passes, so each bucket has size at most u(M, k-1). Since R is divided into M-1 buckets, we have
› u(M, k) = (M-1) u(M, k-1)

Expanding the recurrence above gives u(M, k) = M (M-1)^(k-1), which is approximately M^k. So we can perform a unary operation on relation R in k passes with M buffers
› provided that M ≥ (B(R))^(1/k)
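The recurrence u(M, 1) = M, u(M, k) = (M-1) u(M, k-1) expands to u(M, k) = M (M-1)^(k-1), which is roughly M^k for large M. A minimal sketch that evaluates it (the function name is hypothetical):

```python
def max_unary_blocks(M, k):
    """u(M, k): largest relation (in blocks) a k-pass hash algorithm can handle."""
    if k == 1:
        return M  # BASIS: the whole relation must fit into the M buffers
    return (M - 1) * max_unary_blocks(M, k - 1)  # INDUCTION: M-1 buckets per pass
```

With M = 10 buffers this gives 10, 90 and 810 blocks for k = 1, 2 and 3, slightly below the approximation M^k.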

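For binary operations, let j(M, k) = largest number of blocks in the smaller relation that a k-pass hash join can handle with M buffers.

BASIS: if we use the one-pass algorithm to join, then
› Either R or S must fit into M-1 blocks
› j(M, 1) = M-1

INDUCTION:
› On the first of k passes, divide each relation into M-1 buckets, so each bucket holds 1 / (M-1) of its relation
› So, j(M, k) = (M-1) j(M, k-1), which expands to j(M, k) = (M-1)^k
› So we can join R(X, Y) ⋈ S(Y, Z) using k passes and M buffers, provided (M-1)^k ≥ min(B(R), B(S)), or approximately M^k ≥ min(B(R), B(S))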
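The join recurrence j(M, 1) = M-1, j(M, k) = (M-1) j(M, k-1) can be evaluated the same way, and inverted to estimate how many passes a join needs. The function names are hypothetical, and the pass count uses the exact bound (M-1)^k rather than the approximation M^k.

```python
def max_join_blocks(M, k):
    """j(M, k): largest size (in blocks) of the smaller relation for a k-pass join."""
    if k == 1:
        return M - 1  # BASIS: one relation must fit into M-1 buffers
    return (M - 1) * max_join_blocks(M, k - 1)  # INDUCTION


def join_passes_needed(blocks_r, blocks_s, M):
    """Smallest k with j(M, k) >= min(B(R), B(S))."""
    if M < 3:
        raise ValueError("need at least 2 buckets (M >= 3)")
    smaller = min(blocks_r, blocks_s)
    k = 1
    while max_join_blocks(M, k) < smaller:
        k += 1
    return k
```

With M = 10, a join whose smaller relation has 100 blocks needs 3 passes, because 9 × 9 = 81 is still below 100.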

Q & A

Thank You