1 indexing. 2 motivation sells(bar,beer,price )bars(bar,addr ) joe’sbud2.50joe’smaple st....

1

Indexing

2

Motivation

Sells( bar, beer, price ) Bars( bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

Query: Find all locations that sell beer for 2.75.

Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price = 2.75)

3

Sells Bars

JOIN Sells.bar = Bars.bar

PROJECTBars.addr

SELECTSells.price = 2.75

Sells( bar, beer, price ) Bars(bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors 3.00

4

Sells Bars

JOIN Sells.bar = Bars.bar

PROJECTBars.addr

SELECTSells.price = 2.75

Can speed upexecution if we have anindex on price

Sells( bar, beer, price ) Bars(bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors 3.00

5

Example of Using an Index

6

Index should be on disk; here’s a disk-based hash index example

The whole table could be organized as an index; here’s a table organized as a hash

7

Summary• Indexes help searching for information far more

efficiently– speed up query execution

• Hash indexes– have to be disk based– can be outside the table, pointing into records in table– or the whole table can be organized as a hash index

• Great for equality search• Not so great for range search

8

9

Range Search

Sells( bar, beer, price ) Bars( bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00

Query: Find all locations that sell beer between 2.75 and 3

Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price >= 2.75) and (Sells.price <= 3)

Hashes are not well suited for range search

10

A better idea: organize table as a sorted file

11

We need more than just a sorted file: B+ trees

12

The B+ tree index can be outside the table

13

14

A larger B+ tree example; here leaves point to where the data records reside

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 15 18 20 30 40 50 60 65 80 85 90

Note how the leaves are chained together, to help range search

15

• Parameter d = the degree• Each node has >= d and <= 2d keys (except root)

– if a node has n keys, then it has (n+1) pointers to lower-level nodes

• Each leaf has >=d and <= 2d keys– if a leaf has n keys, then it has n pointers to the data

records with those keys– and also a pointer to the next leaf (to facilitate range

query answering)

B+ Trees Basics

16

B+ Tree Example

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 15 18 20 30 40 50 60 65 80 85 90

d = 2

20 <= k < 60 60 <= k

pointer to data records

each non-root node must have at least 2 keys

k < 20

data records

17

B+ Tree Design• How large is d ?• Example:

– Key size = 4 bytes– Pointer size = 8 bytes– Block size = 4096 byes

• 2d x 4 + (2d+1) x 8 <= 4096• d = 170

18

Searching a B+ Tree• Equality queries (i.e., given exact key values):

– Start at the root– Proceed down, to the leaf

• Range queries:– As above– Then sequential traversal

Select nameFrom peopleWhere age = 18

Select nameFrom peopleWhere 20 <= age and age <= 65

19

Searching a B+ Tree

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 15 18 20 30 40 50 60 65 80 85 90

21

B+ Trees in Practice• Typical order: 100. Typical fill-factor: 67%.

– so average fanout = 200*0.67% = 133• Typical capacities:

– Height 4: 1334 = 312,900,700 records– Height 3: 1333 = 2,352,637 records

• Can often hold top levels in buffer pool:– Level 1 = 1 page = 8 Kbytes– Level 2 = 133 pages = 1 Mbyte– Level 3 = 17,689 pages = 133 MBytes

Sample Exam Question• Assume a file which has 950 thousands (that is, 950000)

records. Assume also that we are indexing this file using a B+ tree. In this particular B+ tree, the average page has an occupancy of 100 pointers (that is, the tree's average branching factor is 100).

• Assume further that the amount of memory set aside for storing the index is 150 blocks, and that all 950000 records of the above file reside on disk. Assume no duplicate search key exists, given a search key K, compute the minimal number of disk I/O needed to retrieve the record with that search key. Explain your answer.

23

Solution• Assume each leaf node points to data records

– so leaf nodes do not contain data records• We have 950,000 records, so need 950,000 pointers from

leaf nodes• Each leaf node can point to 100 records, so need 9,500

leaf nodes• So tree has three levels: root at level 0, 100 nodes at

level 1, 10000 nodes at level 2• Have 150 memory pages for index, so can store root

node + all of level 1 and some of level 2• So minimal cost is 1

24

Some Math• Assume tree has fanout of F (that is, each non-

leaf node has F pointers to lower-level nodes)• Assume tree has N leaf pages• Then the number of levels in tree is log_F(N)• So each insert/delete/lookup costs roughly

log_F(N)– assuming the entire tree is on disk

• This assumes that tree will stay height-balanced• Insert/delete must ensure that tree stay height-

balanced25

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26

To explain insert/delete, we will use the following B+ tree, with a slightly different notation

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

2* means a pair of (key, pointer to data record)

note how pointers are specified using a different notation here


Inserting a Data Entry into a B+ Tree Find correct leaf L.

Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2)

• Redistribute entries: d in L, (d+1) in L2, copy up middle key.

• Insert index entry pointing to L2 into parent of L.

Then if parent of L needs to be split up, do so; this can happen all the way to the root To split a non-leaf node, redistribute entries

evenly, but push up middle key. (Contrast with leaf splits.)

Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at

top.


Insert 8*Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

first insert 8* in heresee that node is full; so have to split

2* 3* 5* 7* 8*

5

Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)

s copied up and


2* 3* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*7*5* 8*

17 24 3013

insert 5 in here, but now this node is full

5

appears once in the index. Contrast

5 24 30

17

13

Entry to be inserted in parent node.(Note that 17 is pushed up and only

this with a leaf split.)


Example B+ Tree After Inserting 8*

Notice that root was split, leading to increase in height.

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*


Note on Inserting 8* into Example B+ Tree

Observe how minimum occupancy is guaranteed in both leaf and index pg splits.

Note difference between copy-up and push-up; be sure you understand the reasons for this.

2* 3* 5* 7* 8*

5

Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)

s copied up and

appears once in the index. Contrast

5 24 30

17

13

Entry to be inserted in parent node.(Note that 17 is pushed up and only

this with a leaf split.)


Deleting a Data Entry from a B+ Tree

Start at root, find leaf L where entry belongs.

Remove the entry. If L is at least half-full, done! If L has only d-1 entries,

• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

• will have to change parent a bit (see example)• If re-distribution fails, merge L and sibling.

If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

Merge could propagate to root, decreasing height.


Deleting 19*

2* 3*

Root

17

24 30

14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Just remove node, doesn’t change tree

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*


Now Deleting 20*

2* 3*

17

24 30

14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Must borrow from a sibling if possible

2* 3*

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Notice how the middle key is copied up


Now Deleting 24*

2* 3*

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Can’t borrow from any sibling; must merge

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

30135 17


... Deleting 24*

Must merge. Observe `toss’ of

index entry (on right), and `pull down’ of index entry (below).

30

22* 27* 29* 33* 34* 38* 39*

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

Root30135 17

37

Comparison• Hash• B+ tree

38

Types of Indexes• clustered vs. non-clustered• primary vs. secondary

39

A non-clustered index

A clustered index

40

A clustered index

41


Indexes over composite search keys

Composite Search Keys: Search on a combination of fields. Equality query: Every field

value is equal to a constant value. E.g. wrt <sal,age> index:• age=20 and sal =75

Range query: Some field value is not a constant. E.g.:• age =20; or age=20

and sal > 10

sue 13 75

bob

cal

joe 12

10

20

8011

12

name age sal

<sal, age>

<age, sal> <age>

<sal>

12,20

12,10

11,80

13,75

20,12

10,12

75,13

80,11

11

12

12

13

10

20

75

80

Data recordssorted by name

Data entries in indexsorted by <sal,age>

Data entriessorted by <sal>

Examples of composite keyindexes using lexicographic order.

1 indexing. 2 motivation sells(bar,beer,price )bars(bar,addr ) joe’sbud2.50joe’smaple st....

Documents