1 optimization recap and examples. 2 optimization introduction for every sql expression, there are...

22
1 Optimization Recap and Optimization Recap and examples examples

Post on 20-Dec-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

1

Optimization Recap and Optimization Recap and examplesexamples

Page 2: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

2

Optimization introductionOptimization introduction

• For every SQL expression, there are

many possible ways of implementation.

• The different alternatives could result

in huge run-time differences.

• Our aim is to introduce the basic

hardware used, and optimization

principles

Page 3: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

3

Disk-Memory-CPUDisk-Memory-CPU

Delete from Sailors where

sid=90

DISK

sailors

Reserves

Main Memory

CPU

Page 4: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

4

Hardware RecapHardware Recap• The DB is kept on the Disk.

• The Disk is divided into BLOCKS

• Any processing of the information occurs in the Main Memory.

• Therefore, a block which we want to access has to be brought from the Disk to the memory, and perhaps written back.

• Blocks are read/written from/to the Disk as single units.

• The time of reading/writing a block to/from the disk is an I/O operation, and takes a lot of time.

Page 5: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

5

Hardware RecapHardware Recap

• We assume a constant time for each Disk access,

and that only disk access define the run time. • We do not consider writing to the disk

• Every table in the DB is stored as a File (on the Disk), which is a ‘bunch of Blocks’.

• We will deal with files that are ‘heap-sorted’, i.e., there is no order in the file tuples

• Every block contains many tuples, each of them has a Record ID (RID), which states its location:

(number of block, number of tuple within the block)

Page 6: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

6

SIDSNAM

E

ratin

g

ag

e

192

3

Joe832

332

1

Phil941

133

2

Boe733

122

6

Bill623

144

4

Paul121

111

2

Jim333

144

5

Vicky954

RID

Block 1

Block 2

Block 3

(1,1)

(1,2)

(1,3)

(2,1)

(2,2)

(2,3)

(3,1)

Page 7: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

7

SIDSNAM

E

ratin

g

ag

e

192

3

Joe832

332

1

Phil941

133

2

Boe733

122

6

Bill623

144

4

Paul121

111

2

Jim333

144

5

Vicky954

B blocks

t tuples

Q: What would be the cost of the following queries?

Select * from sailors

Select * from sailors where sname=‘Jim’

Select * from sailors where rating>4

Answer: B

Page 8: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

8

Indexes on filesIndexes on files

• An Index on a table is an additional file

which helps access the data fast.

• An index holds ‘data entries’ to the table file

• The index can have the structure of a B+

Tree, or a hash function.

Page 9: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

9

Tree index Tree index on sname of on sname of

sailorssailors

‘A’->’M’ B1

‘N’->’Z’ B2

‘N’->’T’ L3

‘U’->’Z’ L4

‘A’->’G’ L1

‘H’->’M’ L2

Root block

Leaf blocks

Branch blocks

‘Bill’(2,1)

‘Boe’(1,3)

‘Vicky’ (3,1)

‘Paul’ (2,2)

‘Phil’ (1,2)

‘Jim’(2,3)

‘Joe’(1,1)

B1

L4L3L2L1

B2

SIDSNAM

E

ratin

gage

1923Joe832

3321Phil941

1332Boe733

1226Bill623

1444Paul121

1112Jim333

1445Vicky954

Page 10: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

10

Tree indexTree index

• The tree is kept balanced

• The tree entries are always ordered

• The leaves point to the exact location of

tuples

• Getting to the leaf is typically 2-3 I/O

• Each leaf points to the next/previous leaf

• A Clustered index means that the index and

the table are ordered by the same attribute

Page 11: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

11

Tree index Tree index on sname of on sname of

sailorssailors

‘Bill’(2,1)

‘Boe’(1,3)

‘Phil’ (1,2)

‘Joe’ (2,2)

‘Joe’ (3,1)

‘Jim’(2,3)

‘Joe’(1,1)

B1

L4L3L2L1

B2

SIDSNAM

E

ratin

gage

1923Joe832

3321Phil941

1332Boe733

1226Bill623

1444Joe121

1112Jim333

1445Joe954

How would the following queries be processed?

Select * from sailors where sname=‘Joe’

Select * from sailors

Select * from sailors where sname>’J’

Notice: index Notice: index is not is not

clusteredclustered

Page 12: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

12

Tree index Tree index on sname of on sname of

sailorssailors

‘Bill’(1,1)

‘Boe’(1,2)

‘Phil’ (3,1)

‘Joe’ (2,2)

‘Joe’ (2,3)

‘Jim’(1,1)

‘Joe’(2,1)

B1

L4L3L2L1

B2

SIDSNAM

E

ratin

gage

1226Bill6 23

1332Boe733

1112Jim333

1923Joe832

1444Joe121

1445Joe954

3321Phil941

How would the following queries be processed?

Select * from sailors where sname=‘Joe’

Select * from sailors

Select * from sailors where sname>’J’

Notice: index Notice: index is clusteredis clustered

Page 13: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

13

Hash indexHash index

• Works in a similar way, but using a

hash function instead of a tree

• Works only for equality conditions

• Average of 1.2 I/O to get to the tuple

location

Page 14: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

14

Natural Join Natural Join

• We want to compute

• Naïve algorithm:

SELECT *

FROM Reserves R, Sailors S

WHERE R.sid = S.sid

Foreach tuple r in RForeach tuple s in S

if r.sid=s.sidadd r,s to result

Cost: BR+tR*BS

Running example data

tR=5000

tS=10,000

50 tuples per block

12 buffer pages

= 100+5000*200=1,000,100

Page 15: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

15

Natural Join Natural Join • We want to compute

• We have 4 optional algorithms:

1. Block Nested Loops Join

2. Index Nested Loops Join

3. Sort Merge Join

4. Hash Join

SELECT *

FROM Reserves R, Sailors S

WHERE R.sid = S.sid

This is assuming there is not enough

space in the memory for the smaller of the 2

relations+2

Page 16: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

16

Block Nested Loop JoinBlock Nested Loop Join

• Suppose there are B available blocks in the

memory, BR blocks of relation R, and BS

blocks of relations S, and BR<BS

• Until all blocks of R have been read:

– Read B-2 blocks of R

– Read all blocks of S (one by one), and write the

result

• Run time: BR + BS * ceil(BR /(B-2))= 100+200*100/10=2,100

Page 17: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

17

Index Nested LoopIndex Nested Loop

• Suppose there is an index on sid of Sailors

• Until all blocks of R have been read:– Read a block of R

– For each tuple in the block, use the index of S to locate the matching tuples in S.

• We mark the time it takes to read the tuples in S that match a single tuple in R as X.

• Run time: BR + tR*X

• If the index is clustered, X=2-4

• If it is not clustered, we evaluate X.= 100+5000*3=15,100

Page 18: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

18

• Q: So when would we typically choose

to use an index-nested loop over block-

nested?

• A: Look at the inequality…

Page 19: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

19

Sort-Merge JoinSort-Merge Join

• Sort both relations on the join column

• Join them according to the join algorithm:

sidbiddayagent

2810312/4/96Joe

2810311/3/96Frank

3110110/2/96Joe

3110212/7/96Sam

3110113/7/96Sam

581032/6/96Frank

sidsnameratingage

22dustin745

28yuppy935

31lubber855

36lubber636

44guppy535

58rusty1035

Page 20: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

20

Run time of Sort-MergeRun time of Sort-Merge

• M,N: number of blocks of the relations

• Sorting: MlogM+NlogN

• Merging: N+M if no partition is scanned twice.

• Total: MlogM+NlogN+N+M

• Especially good if one or both of the relations are already sorted.

= 100*7+200*8+100+200=2,600

Page 21: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

21

QuestionQuestion

Suppose:

tuple size= 100 bytes

number of tuples (employees)=3,000

Page size=1000 bytes

You have an unclustered index on Hobby.

You know that 50 employees collect stamps.

Would you use the index?

And for 1,000 stamp-lovers?

SELECT E.dnoFROM Employees EWHERE E.hobby=‘stamps’

Page 22: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different

22

Question 2Question 2

• Length of tuples, Number of tuples– Emp: 20 bytes, 20,000 tuples

– Dept: 40 bytes, 5000 tuples

• Pages contain 4000 bytes; 12 buffer pages

• Which algorithm would you use if there is an unclustered tree index on E.eid? And clustered?

SELECT E.ename FROM Employees E, Departments DWHERE E.eid=D.eid