cs4432: database systems ii
DESCRIPTION
CS4432: Database Systems II. Lecture #10. Professor Elke A. Rundensteiner. 39. 32. 38. 31. 33. 35. 34. 36. overflow area (not sequential). 10. 20. 30. Example Sequential Index continuous free space. 40. 50. 60. 70. 80. 90. Problems … Problems … Problems …. - PowerPoint PPT PresentationTRANSCRIPT
CS 4432 lecture #10 - b+ tree indexing
1
CS4432: Database Systems IILecture #10
Professor Elke A. Rundensteiner
CS 4432 lecture #10 - b+ tree indexing
2
Example Sequential Index
continuous
free space
102030
405060
708090
39313536
323834
33
overflow area(not sequential)
CS 4432 lecture #10 - b+ tree indexing
3
• Without re-organization we get unpredictable performance
• Too much/often re-organization brings too much overhead
• DBA does not know when to reorganize
• DBA does not know how full to loadpages of new index
Problems … Problems … Problems …
CS 4432 lecture #10 - b+ tree indexing
4
Another type of index
• Give up “sequentiality” of index• Predictable performance under
updates• Achieve always balance of “tree” • Automate restructuring under
updates
CS 4432 lecture #10 - b+ tree indexing
5
Root
B+Tree Example n=3
100
120
150
180
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
CS 4432 lecture #10 - b+ tree indexing
6
Sample non-leaf
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95
57
81
95
CS 4432 lecture #10 - b+ tree indexing
7
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.– average fanout = 133
• Typical capacities:– Height 4: 1334 = 312,900,700 records– Height 3: 1333 = 2,352,637 records
• Can often hold top levels in buffer pool:– Level 1 = 1 page = 8 Kbytes– Level 2 = 133 pages = 1 Mbyte– Level 3 = 17,689 pages = 133 MBytes
CS 4432 lecture #10 - b+ tree indexing
8
Sample non-leaf
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95
57
81
95
CS 4432 lecture #10 - b+ tree indexing
9
Sample leaf node:
From non-leaf node
to next leafin
sequence5
7
81
95
To r
eco
rd
wit
h k
ey 5
7
To r
eco
rd
wit
h k
ey 8
1
To r
eco
rd
wit
h k
ey 8
5
CS 4432 lecture #10 - b+ tree indexing
10
In textbook’s notationn=3
Leaf:
Non-leaf:
30
35
30
30 35
30
CS 4432 lecture #10 - b+ tree indexing
11
Size of nodes: n+1 pointersn keys
(fixed)
CS 4432 lecture #10 - b+ tree indexing
13
Full nodemin. node
Non-leaf
Leaf
n=3
12
01
50
18
0
30
3 5 11
30
35
counts
even if
null
Non-leaf: (n+1)/2 pointers
Leaf: (n+1)/2 pointers to data
CS 4432 lecture #10 - b+ tree indexing
14
B+tree rules tree of order n
(1) All leaves at same lowest level(balanced tree)
(2) Pointers in leaves point to records except for “sequence pointer”
CS 4432 lecture #10 - b+ tree indexing
16
Root
B+Tree Example : Searches
100
120
150
180
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
CS 4432 lecture #10 - b+ tree indexing
17
Insert into B+tree
(a) simple case– space available in leaf
(b) leaf overflow(c) non-leaf overflow(d) new root
CS 4432 lecture #10 - b+ tree indexing
18
(a) Insert key = 32 n=33 5 11
30
31
30
100
32
CS 4432 lecture #10 - b+ tree indexing
19
(a) Insert key = 7 n=3
3 5 11
30
31
30
100
3 5
7
7
CS 4432 lecture #10 - b+ tree indexing
20
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
160
18
0
160
179
CS 4432 lecture #10 - b+ tree indexing
21
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
40
30new root
CS 4432 lecture #10 - b+ tree indexing
22
Recap: Insert Data into B+ Tree
• Find correct leaf L. • Put data entry onto L.
– If L has enough space, done!– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key.• Insert index entry pointing to L2 into parent of L.
• This can happen recursively– To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height. – Tree growth: gets wider or one level taller at top.
CS 4432 lecture #10 - b+ tree indexing
23
(a) Simple case (b) Coalesce with neighbor (sibling)
(c) Re-distribute keys(d) Cases (b) or (c) at non-leaf
Deletion from B+tree
CS 4432 lecture #10 - b+ tree indexing
24
(a) Delete key = 11 n=33 5 11
30
31
30
100
CS 4432 lecture #10 - b+ tree indexing
25
(b) Coalesce with sibling– Delete 50
10
40
100
10
20
30
40
50
n=4
40
CS 4432 lecture #10 - b+ tree indexing
26
(c) Redistribute keys– Delete 50
10
40
100
10
20
30
35
40
50
n=4
35
35
CS 4432 lecture #10 - b+ tree indexing
27
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Coalese and Non-leaf coalese– Delete 37
n=4
40
30
25
25
new root
CS 4432 lecture #10 - b+ tree indexing
28
Delete Data from B+ Tree
• Start at root, find leaf L where entry belongs.• Remove the entry.
– If L is at least half-full, done! – If L has only d-1 entries,
• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
• If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing height.
CS 4432 lecture #10 - b+ tree indexing
29
• Concurrency control harder in B-Trees• B-tree consumes more space• B-tree automatically decides :
– when to reorganize– how full to load pages of new index
• Buffering– B-tree: has fixed buffer requirements– Static index: must read several overflow blocks to be efficient (variable size buffers)
Discussion of B-trees (vs. static indexed sequential files)
CS 4432 lecture #10 - b+ tree indexing
30
ComparisonB-tree vs. indexed seq.
file• Less space, so
lookup faster• Inserts managed
by overflow area• Requires
temporary restructuring
• Unpredictable performance
• Consumes more space, so lookup slower
•Each insert/delete potentially restructures
•Build-in restructuring
• Predictable performance
CS 4432 lecture #10 - b+ tree indexing
31
• Speaking of buffering… Is LRU a good policy for B+tree
buffers?Of course not!
Should try to keep root in memory at all times
(and perhaps some nodes from second level)
CS 4432 lecture #10 - b+ tree indexing
33
Interesting problem:
For B+tree, how large should n be?
…
n is number of keys / node
CS 4432 lecture #10 - b+ tree indexing
34
assumptions: n children per node and N records in database
(1) Time to read B-Tree node from disk is (tseek + tread*n) msec.(2) Once in main memory, use binary search to locate key, (a + b log_2 n) msec(3) Need to search (read) log_n (N) tree nodes
(4) t-search = (tseek + tread*n + (a + b*log_2(n)) * log n (N)
CS 4432 lecture #10 - b+ tree indexing
35
Can get: f(n) = time to find a record
f(n)
nopt n
FIND nopt by f’(n) = 0
What happens to nopt as:•Disk gets faster? CPU get faster? …
CS 4432 lecture #10 - b+ tree indexing
36
Bulk Loading of B+ Tree
• For large collection of records, create B+ tree.• Method 1: Repeatedly insert records slow.• Method 2: Bulk Loading more efficient.
CS 4432 lecture #10 - b+ tree indexing
37
Bulk Loading of B+ Tree
• Initialization: – Sort all data entries – Insert pointer to first (leaf) page in new (root) page.
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Sorted pages of data entries; not yet in B+ treeRoot
CS 4432 lecture #10 - b+ tree indexing
38
Bulk Loading (Contd.)
• Index entries for leaf pages always entered into right-most index page
• When this fills up, it splits.
Split may go up right-most path to root.
3* 4* 6* 9* 10*11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44*
Root
Data entry pages
not yet in B+ tree3523126
10 20
3* 4* 6* 9* 10* 11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44*
6
Root
10
12 23
20
35
38
not yet in B+ treeData entry pages
CS 4432 lecture #10 - b+ tree indexing
39
Summary of Bulk Loading
• Method 1: multiple inserts.– Slow.– Does not give sequential storage of leaves.
• Method 2: Bulk Loading – Has advantages for concurrency control.– Fewer I/Os during build.– Leaves will be stored sequentially (and
linked) – Can control “fill factor” on pages.