response db 3

6
Implementation of Database Exercise 3 Tanmaya Mahapatra Matriculation Number : 340959 [email protected] Bharath Rangaraj Matriculation Number : 340909 [email protected] Manasi Jayapal Matriculation Number : 340892 [email protected] November 24, 2013 1 Exercise 3.1 [Evaluating Relational Operators] : Consider the join R ⋊⋉ R.a=S.b S , given the following information about the relations to be joined. The cost metric is the number of page I/Os unless otherwise noted, and the cost of writing out the result should be uniformly ignored. Relation R contains 10,000 tuples and has 10 tuples per page. Relation S contains 2000 tuples and also has 10 tuples per page. Attribute b of relation S is the primary key for S. Both relations are stored as simple heap files. Neither relation has any indexes built on it. 52 buffer pages are available. 1.1 What is the cost of joining R and S using a page-oriented simple nested loops join ? What is the minimum num- ber of buffer pages required for this cost to remain un- changed ? Solution : For Relation R 1. Number of Tuples in R = 10000 2. Number of Tuples per page = 10 = (P R ) 1

Upload: tanmaya1991

Post on 22-Oct-2015

15 views

Category:

Documents


1 download

DESCRIPTION

Database

TRANSCRIPT

Page 1: Response Db 3

Implementation of Database

Exercise 3

Tanmaya Mahapatra

Matriculation Number : 340959

[email protected]

Bharath Rangaraj

Matriculation Number : 340909

[email protected]

Manasi Jayapal

Matriculation Number : 340892

[email protected]

November 24, 2013

1 Exercise 3.1 [Evaluating Relational Operators] :

Consider the join R ⋊⋉R.a=S.b S, given the following information about therelations to be joined. The cost metric is the number of page I/Os unlessotherwise noted, and the cost of writing out the result should be uniformlyignored.

Relation R contains 10,000 tuples and has 10 tuples per page. Relation Scontains 2000 tuples and also has 10 tuples per page. Attribute b of relationS is the primary key for S. Both relations are stored as simple heap files.Neither relation has any indexes built on it. 52 buffer pages are available.

1.1 What is the cost of joining R and S using a page-orientedsimple nested loops join ? What is the minimum num-ber of buffer pages required for this cost to remain un-changed ?

Solution :

• For Relation R

1. Number of Tuples in R = 10000

2. Number of Tuples per page = 10 = (PR)

1

Page 2: Response Db 3

1 EXERCISE 3.1 [EVALUATING RELATIONAL OPER-ATORS] :

2

3. Number of Pages in R = 10000

10= 1000 = (M)

• For Relation S

1. Number of Tuples in S = 2000

2. Number of Tuples per page = 10 = (PS)

3. Number of Pages in S = 2000

10= 200 = (N)

In simple page-oriented the I/O cost is M +M ×N

Considering the Relation R to be outer relation we have :1000 + 1000× 200= 1000 + (2× 105)= 201000 I/OsBut if we consider relation S to be outer relation we have:200 + 200× 1000= 200 + (2× 105)= 200200 I/OsLet the Minimum Number of Buffers required = BWe leave 2 buffers : 1 for Output and 1 For scanning the inner relation. Weread the outer relation R in B-2 buffer pagesso the cost should remain same :The formula is M +N × ⌈ M

B−2⌉

Considering R to be the outer relation we get :1000 + 200× ⌈ 1000

B−2⌉

This value should be equal to = 1000 + (2× 105)=⇒ 1000 + 200× ⌈ 1000

B−2⌉ = 1000 + (2× 105)

=⇒ 200× ⌈ 1000

B−2⌉ = (2× 105)

=⇒ 200× ⌈ 1000

B−2⌉ = 200× 1000

=⇒ ⌈ 1000

B−2⌉ = 1000

The max. value of ⌈ 1000

B−2⌉ should be 1000 for the equation to hold true.

=⇒ 1000

B−2= 1000

=⇒ B − 2 = 1=⇒ B = 3∴ 3 is the minimum number of buffer pages required for this cost to remainunchanged.

1.2 What is the cost of joining R and S using a block nestedloops join? What is the minimum number of buffer pagesrequired for this cost to remain unchanged ?

Solution :

In block nested loop join the total I/O cost is given by : M + ⌈ M

B−2⌉ ×N

Number of Buffers (B) = 52

1.2 What is the cost of joining R and S using a blocknested loops join? What is the minimum number of bufferpages required for this cost to remain unchanged ?

Page 3: Response Db 3

1 EXERCISE 3.1 [EVALUATING RELATIONAL OPER-ATORS] :

3

Buffers to be used (B-2) = 50Considering Relation R to be outer relation & substituing the values in theequation we get:1000 + ⌈1000

50⌉ × 200

= 1000 + ⌈20⌉ × 200= 1000 + 4000= 5000 I/Os.∴ 5000 I/Os is the cost of joining R and S using a block nested loops join(Ris Outer Relation).Considering Relation S to be outer relation & substituing the values in theequation we get:200 + ⌈200

50⌉ × 1000

= 200 + ⌈4⌉ × 1000= 200 + 4000= 4200 I/Os.∴ 4200 I/Os is the cost of joining R and S using a block nested loops join(S is Outer Relation).Let the Minimum Number of Buffer reuired = BWe leave 2 buffers : 1 for Output and 1 For scanning the inner relation. Weread the outer relation R in B-2 buffer pagesso the cost should remain same :Considering relation R to be the Outer Relation. The formula isM+⌈ M

B−2⌉×

N

1000 + ⌈ 1000

B−2⌉ × 200 = 5000

=⇒ ⌈ 1000

B−2⌉ × 200 = 4000

=⇒ ⌈ 1000

B−2⌉ = 20

=⇒ 1000

20= B − 2

=⇒ B − 2 = 50=⇒ B = 52∴ 52 is the minimum number of buffer pages required for this cost to remainunchanged.

1.3 What is the cost of joining R and S using a sort-mergejoin? What is the minimum number of buffer pages re-quired for this cost to remain unchanged ?

Solution :

In sort Merge Join the total cost equals to the cost of sorting R + the costof sorting S + the cost of merging R and S.Cost for sorting RHere M = 1000 and B = 52Let N2 = ⌈M

B⌉

= ⌈100052

1.3 What is the cost of joining R and S using a sort-mergejoin? What is the minimum number of buffer pages re-quired for this cost to remain unchanged ?

Page 4: Response Db 3

1 EXERCISE 3.1 [EVALUATING RELATIONAL OPER-ATORS] :

4

= ⌈19.230769231⌉=⇒ N2 = 20I/O Cost is given by : 2×M × {⌈logB−1N2⌉+ 1}= 2× 1000× {⌈log51 20⌉+ 1}= 2000× {⌈0.76191890317⌉+ 1}= 2000× 2= 4000 I/OsCost for sorting SHere N = 200 and B = 52Let N3 = ⌈M

B⌉

= ⌈20052

⌉= ⌈3.846153846⌉=⇒ N3 = 4I/O Cost is given by :2×N × {⌈logB−1N3⌉+ 1}= 2× 200× {⌈log51 4⌉+ 1}= 400× {⌈0.35258286878⌉+ 1}= 400× 2= 800 I/OsIn addition, the second phase of the sort-merge join algorithm re-quires an additional scan of both relations.∴ The total cost is 4000 + 800 + 1000 + 200 = 6000 I/Os (1000 and 200 forscanning R & S respectively in second pahse)Calculating Minimum Number of Buffers Required to keep thecost sameWith minimum Number of Buffers the I/O cost for sorting the largest Re-lation should remain same. Then other things would also remain sameautomatically.So calculating B for largest Relation RLet B = x

N1 = 1000

x

=⇒ 2×M × {⌈logx−11000

x⌉+ 1} = 4000

=⇒ ⌈logx−11000

x⌉+ 1 = 2

=⇒ ⌈logx−11000

x⌉ = 1

The value of the ceil function should be 1 for the eqn. to hold true.=⇒ (x− 1) = 1000

x

Solving the quadratic Equation :x2 − x− 1000 = 0x = −b±

√b2−4×a×c

2×a

Here a = 1 , b = -1 and c = -1000

x = 1±√1−4×1×−1000

2×1

x = 1±√4001

2×1

x = 1±63.25

2×1

1.3 What is the cost of joining R and S using a sort-mergejoin? What is the minimum number of buffer pages re-quired for this cost to remain unchanged ?

Page 5: Response Db 3

1 EXERCISE 3.1 [EVALUATING RELATIONAL OPER-ATORS] :

5

Ignoring Negative valuesx = 1+63.25

2

x = 32.12=⇒ x = 33∴ 33 is the minimum number of buffer pages required for this cost to remainunchanged.

1.4 What is the cost of joining R and S using a hash join?What is the minimum number of buffer pages requiredfor this cost to remain unchanged ?

Solution :

In Hash Join : In the partitioning phase we have to scan both R and S onceand write them both out once. The cost of this phase is therefore 2(M + N). In the second phase we scan each partition once, assuming no partitionoverflows, at a cost of M + N I/Os. The total cost is therefore 3(M + N)Here M = 1000 and N = 200. Substituting the values :3× (1000 + 200)= 3× 1200= 3600 I/Os.∴ 3600 I/Os is the cost of joining R and S using a hash join.Calculating Number of Minimum BuffersIn this algo we need B >

√f ×M where f is fudge factor (Considering the

largest Outer Relation R)Since B >

√f ×M

=⇒ B >√f ×

√M

=⇒ B >√f and B >

√M

=⇒ B >√1000

B = 31.62B ≃ 32∴ 32 is the minimum number of buffer pages required for this cost to remainunchanged.

1.5 What would be the lowest possible I/O cost for joiningR and S using any join algorithm, and how much bufferspace would be needed to achieve this cost ? Explainbriefly.

Solution :

In Block Nested Loop Join if enough buffer is available then each rela-tion is scanned just once, for a total I/O cost of M + N , which is optimalin comparison to other methods.∴ The lowest possible I/O cost for joining R and S is 1000 + 200 = 1200

1.4 What is the cost of joining R and S using a hash join?What is the minimum number of buffer pages required forthis cost to remain unchanged ?

Page 6: Response Db 3

1 EXERCISE 3.1 [EVALUATING RELATIONAL OPER-ATORS] :

6

I/Os. (Using Block Nested Loops Join)Calculating Minimum Number of Buffers RequiredLet the Minimum Number of Buffers required = BConsidering Relation R to be outer relation. The total I/O cost is given by:M +N × ⌈ M

B−2⌉ =⇒ 1000 + 200× ⌈ 1000

B−2⌉ = 1200

=⇒ 1000 + 200× ⌈ 1000

B−2⌉ = 1200

=⇒ 200× ⌈ 1000

B−2⌉ = 200

=⇒ ⌈ 1000

B−2⌉ = 1

The ceil value should be 1 for eqn. to hold true. But for that 1000 shouldbe divided by a number which should atleast give 1. =⇒ B − 2 = 1000=⇒ B = 1002Checking for correctness if we choose S to be outer RelationN +M × ⌈ N

B−2⌉

= 200 + 1000× ⌈ 200

1000⌉

= 200 + 1000× ⌈0.2⌉= 200 + 1000× 1= 1200

∴ 1002 is the number of buffer space needed to achieve this cost.

1.5 What would be the lowest possible I/O cost for joiningR and S using any join algorithm, and how much bufferspace would be needed to achieve this cost ? Explain briefly.