1 indexing. 2 motivation sells(bar,beer,price )bars(bar,addr ) joe’sbud2.50joe’smaple st....
TRANSCRIPT
1
Indexing
2
Motivation
Sells( bar, beer, price ) Bars( bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00
Query: Find all locations that sell beer for 2.75.
Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price = 2.75)
3
Sells Bars
JOIN Sells.bar = Bars.bar
PROJECTBars.addr
SELECTSells.price = 2.75
Sells( bar, beer, price ) Bars(bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors 3.00
4
Sells Bars
JOIN Sells.bar = Bars.bar
PROJECTBars.addr
SELECTSells.price = 2.75
Can speed upexecution if we have anindex on price
Sells( bar, beer, price ) Bars(bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors 3.00
5
Example of Using an Index
6
Index should be on disk; here’s a disk-based hash index example
The whole table could be organized as an index; here’s a table organized as a hash
7
Summary• Indexes help searching for information far more
efficiently– speed up query execution
• Hash indexes– have to be disk based– can be outside the table, pointing into records in table– or the whole table can be organized as a hash index
• Great for equality search• Not so great for range search
8
9
Range Search
Sells( bar, beer, price ) Bars( bar, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00
Query: Find all locations that sell beer between 2.75 and 3
Select addr From Sells, Bars Where (Sells.bar = Bars.bar) and (Sells.price >= 2.75) and (Sells.price <= 3)
Hashes are not well suited for range search
10
A better idea: organize table as a sorted file
11
We need more than just a sorted file: B+ trees
12
The B+ tree index can be outside the table
13
14
A larger B+ tree example; here leaves point to where the data records reside
80
20 60 100 120 140
10 15 18 20 30 40 50 60 65 80 85 90
10 15 18 20 30 40 50 60 65 80 85 90
Note how the leaves are chained together, to help range search
15
• Parameter d = the degree• Each node has >= d and <= 2d keys (except root)
– if a node has n keys, then it has (n+1) pointers to lower-level nodes
• Each leaf has >=d and <= 2d keys– if a leaf has n keys, then it has n pointers to the data
records with those keys– and also a pointer to the next leaf (to facilitate range
query answering)
B+ Trees Basics
16
B+ Tree Example
80
20 60 100 120 140
10 15 18 20 30 40 50 60 65 80 85 90
10 15 18 20 30 40 50 60 65 80 85 90
d = 2
20 <= k < 60 60 <= k
pointer to data records
each non-root node must have at least 2 keys
k < 20
data records
17
B+ Tree Design• How large is d ?• Example:
– Key size = 4 bytes– Pointer size = 8 bytes– Block size = 4096 byes
• 2d x 4 + (2d+1) x 8 <= 4096• d = 170
18
Searching a B+ Tree• Equality queries (i.e., given exact key values):
– Start at the root– Proceed down, to the leaf
• Range queries:– As above– Then sequential traversal
Select nameFrom peopleWhere age = 18
Select nameFrom peopleWhere 20 <= age and age <= 65
19
Searching a B+ Tree
80
20 60 100 120 140
10 15 18 20 30 40 50 60 65 80 85 90
10 15 18 20 30 40 50 60 65 80 85 90
20
21
B+ Trees in Practice• Typical order: 100. Typical fill-factor: 67%.
– so average fanout = 200*0.67% = 133• Typical capacities:
– Height 4: 1334 = 312,900,700 records– Height 3: 1333 = 2,352,637 records
• Can often hold top levels in buffer pool:– Level 1 = 1 page = 8 Kbytes– Level 2 = 133 pages = 1 Mbyte– Level 3 = 17,689 pages = 133 MBytes
22
Sample Exam Question• Assume a file which has 950 thousands (that is, 950000)
records. Assume also that we are indexing this file using a B+ tree. In this particular B+ tree, the average page has an occupancy of 100 pointers (that is, the tree's average branching factor is 100).
• Assume further that the amount of memory set aside for storing the index is 150 blocks, and that all 950000 records of the above file reside on disk. Assume no duplicate search key exists, given a search key K, compute the minimal number of disk I/O needed to retrieve the record with that search key. Explain your answer.
23
Solution• Assume each leaf node points to data records
– so leaf nodes do not contain data records• We have 950,000 records, so need 950,000 pointers from
leaf nodes• Each leaf node can point to 100 records, so need 9,500
leaf nodes• So tree has three levels: root at level 0, 100 nodes at
level 1, 10000 nodes at level 2• Have 150 memory pages for index, so can store root
node + all of level 1 and some of level 2• So minimal cost is 1
24
Some Math• Assume tree has fanout of F (that is, each non-
leaf node has F pointers to lower-level nodes)• Assume tree has N leaf pages• Then the number of levels in tree is log_F(N)• So each insert/delete/lookup costs roughly
log_F(N)– assuming the entire tree is on disk
• This assumes that tree will stay height-balanced• Insert/delete must ensure that tree stay height-
balanced25
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26
To explain insert/delete, we will use the following B+ tree, with a slightly different notation
Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13
2* means a pair of (key, pointer to data record)
note how pointers are specified using a different notation here
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27
Inserting a Data Entry into a B+ Tree Find correct leaf L.
Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2)
• Redistribute entries: d in L, (d+1) in L2, copy up middle key.
• Insert index entry pointing to L2 into parent of L.
Then if parent of L needs to be split up, do so; this can happen all the way to the root To split a non-leaf node, redistribute entries
evenly, but push up middle key. (Contrast with leaf splits.)
Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at
top.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28
Insert 8*Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13
first insert 8* in heresee that node is full; so have to split
2* 3* 5* 7* 8*
5
Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)
s copied up and
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 29
2* 3* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*7*5* 8*
17 24 3013
insert 5 in here, but now this node is full
5
appears once in the index. Contrast
5 24 30
17
13
Entry to be inserted in parent node.(Note that 17 is pushed up and only
this with a leaf split.)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 30
Example B+ Tree After Inserting 8*
Notice that root was split, leading to increase in height.
2* 3*
Root
17
24 30
14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 31
Note on Inserting 8* into Example B+ Tree
Observe how minimum occupancy is guaranteed in both leaf and index pg splits.
Note difference between copy-up and push-up; be sure you understand the reasons for this.
2* 3* 5* 7* 8*
5
Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)
s copied up and
appears once in the index. Contrast
5 24 30
17
13
Entry to be inserted in parent node.(Note that 17 is pushed up and only
this with a leaf split.)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 32
Deleting a Data Entry from a B+ Tree
Start at root, find leaf L where entry belongs.
Remove the entry. If L is at least half-full, done! If L has only d-1 entries,
• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
• will have to change parent a bit (see example)• If re-distribution fails, merge L and sibling.
If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
Merge could propagate to root, decreasing height.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 33
Deleting 19*
2* 3*
Root
17
24 30
14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
Just remove node, doesn’t change tree
2* 3*
Root
17
24 30
14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 34
Now Deleting 20*
2* 3*
17
24 30
14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
Must borrow from a sibling if possible
2* 3*
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
Notice how the middle key is copied up
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 35
Now Deleting 24*
2* 3*
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
Can’t borrow from any sibling; must merge
2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*
30135 17
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 36
... Deleting 24*
Must merge. Observe `toss’ of
index entry (on right), and `pull down’ of index entry (below).
30
22* 27* 29* 33* 34* 38* 39*
2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*
Root30135 17
37
Comparison• Hash• B+ tree
38
Types of Indexes• clustered vs. non-clustered• primary vs. secondary
39
A non-clustered index
A clustered index
40
A clustered index
41
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 42
Indexes over composite search keys
Composite Search Keys: Search on a combination of fields. Equality query: Every field
value is equal to a constant value. E.g. wrt <sal,age> index:• age=20 and sal =75
Range query: Some field value is not a constant. E.g.:• age =20; or age=20
and sal > 10
sue 13 75
bob
cal
joe 12
10
20
8011
12
name age sal
<sal, age>
<age, sal> <age>
<sal>
12,20
12,10
11,80
13,75
20,12
10,12
75,13
80,11
11
12
12
13
10
20
75
80
Data recordssorted by name
Data entries in indexsorted by <sal,age>
Data entriessorted by <sal>
Examples of composite keyindexes using lexicographic order.