1 indexing. 2 motivation sells(bar,beer,price )bars(bar,addr ) joe’sbud2.50joe’smaple st....

42
1 Indexing

Upload: ralf-hines

Post on 04-Jan-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

1

Indexing

Page 2: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

2

Motivation

Sells( bar, beer, price ) Bars( bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

Query: Find all locations that sell beer for 2.75.

Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price = 2.75)

Page 3: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

3

Sells Bars

JOIN Sells.bar = Bars.bar

PROJECTBars.addr

SELECTSells.price = 2.75

Sells( bar, beer, price ) Bars(bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors 3.00

Page 4: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

4

Sells Bars

JOIN Sells.bar = Bars.bar

PROJECTBars.addr

SELECTSells.price = 2.75

Can speed upexecution if we have anindex on price

Sells( bar, beer, price ) Bars(bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors 3.00

Page 5: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

5

Example of Using an Index

Page 6: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

6

Index should be on disk; here’s a disk-based hash index example

Page 7: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

The whole table could be organized as an index; here’s a table organized as a hash

7

Page 8: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Summary• Indexes help searching for information far more

efficiently– speed up query execution

• Hash indexes– have to be disk based– can be outside the table, pointing into records in table– or the whole table can be organized as a hash index

• Great for equality search• Not so great for range search

8

Page 9: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

9

Range Search

Sells( bar, beer, price ) Bars( bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

Query: Find all locations that sell beer between 2.75 and 3

Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price >= 2.75) and (Sells.price <= 3)

Page 10: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Hashes are not well suited for range search

10

Page 11: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

A better idea: organize table as a sorted file

11

Page 12: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

We need more than just a sorted file: B+ trees

12

Page 13: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

The B+ tree index can be outside the table

13

Page 14: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

14

A larger B+ tree example; here leaves point to where the data records reside

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 15 18 20 30 40 50 60 65 80 85 90

Note how the leaves are chained together, to help range search

Page 15: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

15

• Parameter d = the degree• Each node has >= d and <= 2d keys (except root)

– if a node has n keys, then it has (n+1) pointers to lower-level nodes

• Each leaf has >=d and <= 2d keys– if a leaf has n keys, then it has n pointers to the data

records with those keys– and also a pointer to the next leaf (to facilitate range

query answering)

B+ Trees Basics

Page 16: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

16

B+ Tree Example

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 15 18 20 30 40 50 60 65 80 85 90

d = 2

20 <= k < 60 60 <= k

pointer to data records

each non-root node must have at least 2 keys

k < 20

data records

Page 17: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

17

B+ Tree Design• How large is d ?• Example:

– Key size = 4 bytes– Pointer size = 8 bytes– Block size = 4096 byes

• 2d x 4 + (2d+1) x 8 <= 4096• d = 170

Page 18: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

18

Searching a B+ Tree• Equality queries (i.e., given exact key values):

– Start at the root– Proceed down, to the leaf

• Range queries:– As above– Then sequential traversal

Select nameFrom peopleWhere age = 18

Select nameFrom peopleWhere 20 <= age and age <= 65

Page 19: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

19

Searching a B+ Tree

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 15 18 20 30 40 50 60 65 80 85 90

Page 20: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

20

Page 21: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

21

B+ Trees in Practice• Typical order: 100. Typical fill-factor: 67%.

– so average fanout = 200*0.67% = 133• Typical capacities:

– Height 4: 1334 = 312,900,700 records– Height 3: 1333 = 2,352,637 records

• Can often hold top levels in buffer pool:– Level 1 = 1 page = 8 Kbytes– Level 2 = 133 pages = 1 Mbyte– Level 3 = 17,689 pages = 133 MBytes

Page 22: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

22

Page 23: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Sample Exam Question• Assume a file which has 950 thousands (that is, 950000)

records. Assume also that we are indexing this file using a B+ tree. In this particular B+ tree, the average page has an occupancy of 100 pointers (that is, the tree's average branching factor is 100).

• Assume further that the amount of memory set aside for storing the index is 150 blocks, and that all 950000 records of the above file reside on disk. Assume no duplicate search key exists, given a search key K, compute the minimal number of disk I/O needed to retrieve the record with that search key. Explain your answer.

23

Page 24: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Solution• Assume each leaf node points to data records

– so leaf nodes do not contain data records• We have 950,000 records, so need 950,000 pointers from

leaf nodes• Each leaf node can point to 100 records, so need 9,500

leaf nodes• So tree has three levels: root at level 0, 100 nodes at

level 1, 10000 nodes at level 2• Have 150 memory pages for index, so can store root

node + all of level 1 and some of level 2• So minimal cost is 1

24

Page 25: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Some Math• Assume tree has fanout of F (that is, each non-

leaf node has F pointers to lower-level nodes)• Assume tree has N leaf pages• Then the number of levels in tree is log_F(N)• So each insert/delete/lookup costs roughly

log_F(N)– assuming the entire tree is on disk

• This assumes that tree will stay height-balanced• Insert/delete must ensure that tree stay height-

balanced25

Page 26: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26

To explain insert/delete, we will use the following B+ tree, with a slightly different notation

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

2* means a pair of (key, pointer to data record)

note how pointers are specified using a different notation here

Page 27: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27

Inserting a Data Entry into a B+ Tree Find correct leaf L.

Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2)

• Redistribute entries: d in L, (d+1) in L2, copy up middle key.

• Insert index entry pointing to L2 into parent of L.

Then if parent of L needs to be split up, do so; this can happen all the way to the root To split a non-leaf node, redistribute entries

evenly, but push up middle key. (Contrast with leaf splits.)

Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at

top.

Page 28: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28

Insert 8*Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

first insert 8* in heresee that node is full; so have to split

2* 3* 5* 7* 8*

5

Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)

s copied up and

Page 29: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 29

2* 3* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*7*5* 8*

17 24 3013

insert 5 in here, but now this node is full

5

appears once in the index. Contrast

5 24 30

17

13

Entry to be inserted in parent node.(Note that 17 is pushed up and only

this with a leaf split.)

Page 30: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 30

Example B+ Tree After Inserting 8*

Notice that root was split, leading to increase in height.

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Page 31: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 31

Note on Inserting 8* into Example B+ Tree

Observe how minimum occupancy is guaranteed in both leaf and index pg splits.

Note difference between copy-up and push-up; be sure you understand the reasons for this.

2* 3* 5* 7* 8*

5

Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)

s copied up and

appears once in the index. Contrast

5 24 30

17

13

Entry to be inserted in parent node.(Note that 17 is pushed up and only

this with a leaf split.)

Page 32: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 32

Deleting a Data Entry from a B+ Tree

Start at root, find leaf L where entry belongs.

Remove the entry. If L is at least half-full, done! If L has only d-1 entries,

• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

• will have to change parent a bit (see example)• If re-distribution fails, merge L and sibling.

If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

Merge could propagate to root, decreasing height.

Page 33: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 33

Deleting 19*

2* 3*

Root

17

24 30

14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Just remove node, doesn’t change tree

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Page 34: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 34

Now Deleting 20*

2* 3*

17

24 30

14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Must borrow from a sibling if possible

2* 3*

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Notice how the middle key is copied up

Page 35: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 35

Now Deleting 24*

2* 3*

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Can’t borrow from any sibling; must merge

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

30135 17

Page 36: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 36

... Deleting 24*

Must merge. Observe `toss’ of

index entry (on right), and `pull down’ of index entry (below).

30

22* 27* 29* 33* 34* 38* 39*

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

Root30135 17

Page 37: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

37

Comparison• Hash• B+ tree

Page 38: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

38

Types of Indexes• clustered vs. non-clustered• primary vs. secondary

Page 39: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

39

A non-clustered index

Page 40: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

A clustered index

40

Page 41: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

A clustered index

41

Page 42: 1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 42

Indexes over composite search keys

Composite Search Keys: Search on a combination of fields. Equality query: Every field

value is equal to a constant value. E.g. wrt <sal,age> index:• age=20 and sal =75

Range query: Some field value is not a constant. E.g.:• age =20; or age=20

and sal > 10

sue 13 75

bob

cal

joe 12

10

20

8011

12

name age sal

<sal, age>

<age, sal> <age>

<sal>

12,20

12,10

11,80

13,75

20,12

10,12

75,13

80,11

11

12

12

13

10

20

75

80

Data recordssorted by name

Data entries in indexsorted by <sal,age>

Data entriessorted by <sal>

Examples of composite keyindexes using lexicographic order.