B-Trees · cs.boisestate.edu/~scutchin/cs321/lectures/B-Trees_ro_apr15.pdf
TRANSCRIPT
B-Trees
CS321 Spring 2019
Steve Cutchin
B-Tree Motivation
• When data is too large to fit in main memory, then
the number of disk accesses becomes important.
• A disk access is unbelievably expensive compared
to a typical computer instruction (mechanical
limitations).
• One disk access is worth about 200,000
instructions.
• The number of disk accesses will dominate the
running time.
Motivation Cont..
• Secondary memory (disk) is divided into equal-
sized blocks (typical sizes are 512, 2048, 4096 or
8192 bytes)
• The basic I/O operation transfers the contents of
one disk block to/from main memory.
• Our goal is to devise a multiway search tree that
will minimize file accesses (by exploiting disk
block read).
m-ary Trees
• A node contains multiple keys.
• Order of subtrees is based on parent node’s keys
• If each node has m children and there are n keys,
then a search visits about $\log_m n$ nodes (the
height of the tree).
[Figure: a node holding keys K1 K2 K3 K4 with subtrees T1, T2, T3, etc. T1 holds keys K < K1; T2 holds keys K1 < K < K2.]
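To get a feel for the $\log_m n$ bound, the sketch below counts how many levels a completely full m-ary search tree needs in order to hold n keys. The function name `levels_needed` and the sample sizes are illustrative assumptions, not part of the slides.

```python
def levels_needed(n: int, m: int) -> int:
    """Smallest number of levels in a full m-ary search tree holding n keys.

    Each node stores m - 1 keys and has m children, so a full tree with
    `levels` levels holds m**levels - 1 keys in total.
    """
    levels = 0
    capacity = 0  # keys held by a full tree with `levels` levels
    while capacity < n:
        levels += 1
        capacity = m ** levels - 1
    return levels


if __name__ == "__main__":
    # Compare a binary tree with wider nodes for one million keys.
    for m in (2, 10, 500):
        print(m, levels_needed(1_000_000, m))
```

Widening the node from 2 to 500 children cuts the number of levels, and so the number of node reads per search, from 20 to 3 for one million keys.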
B Tree Definition
• A B-Tree is a search tree with a root node.
• Each node in a B-Tree can have multiple keys.
• Each node in a B-Tree can have multiple children.
• The number of children is dependent on the
number of keys.
• An internal node in a B-Tree has exactly 1 more
child than it has keys; a leaf has no children.
Layout of a B-Tree
Each node has at most 3 keys and 4 children.
Each node has a minimum of 2 children.
This is a 2-3-4 B-Tree
Important Metrics
• The minimum degree of a B-Tree is defined as:
– Degree = t, t >= 2.
– Every internal node except the root has at least t children.
– Every node except the root has at least t-1 keys.
– Every node has at most 2*t – 1 keys.
• The order of a B-Tree is defined as:
– Order = m
– No node may have more than m children.
• Therefore: Order = 2*degree
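The limits above can be collected into one small helper. This is a minimal sketch; the function name `node_limits` and the dictionary keys are hypothetical names chosen for illustration.

```python
def node_limits(t: int) -> dict:
    """Key and child limits for a B-tree of minimum degree t (t >= 2)."""
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    return {
        "min_keys": t - 1,       # every node except the root
        "max_keys": 2 * t - 1,   # every node
        "min_children": t,       # every internal node except the root
        "max_children": 2 * t,   # this is the order m
    }


if __name__ == "__main__":
    print(node_limits(2))
```

With t = 2 the helper reports at most 3 keys and 4 children per node and at least 2 children, which are the limits of the 2-3-4 tree shown earlier.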
Layout of a B-Tree
What is the degree of this B-Tree?
What is the order of this B-Tree?
Size of B Trees
• All leaves in a tree have the same depth.
• The depth of a B-Tree is uniform and equal to its
height.
• By definition all B-Trees are balanced.
Size of B Trees
• For a given B-Tree with n keys and degree t:
• Height $h \le \log_t \frac{n+1}{2}$
• For a given B-Tree with height h and degree t:
• $n \ge 2t^h - 1$
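The two bounds are the same inequality read in opposite directions, which a few lines of integer arithmetic can confirm. This sketch assumes the root sits at height 0, so a lone root has h = 0 and at least 1 key; the function names are illustrative.

```python
def min_keys_for_height(h: int, t: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can hold."""
    return 2 * t ** h - 1


def max_height(n: int, t: int) -> int:
    """Largest height h that satisfies 2 * t**h - 1 <= n."""
    if n < 1:
        raise ValueError("the tree must hold at least one key")
    h = 0
    while min_keys_for_height(h + 1, t) <= n:
        h += 1
    return h


if __name__ == "__main__":
    print(max_height(1_000_000, 500))
```

Using integer powers rather than a floating-point logarithm avoids rounding trouble when n lands exactly on a boundary such as $2t^h - 1$.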
B-Tree and Block Size
• A B-Tree Node is usually the size of a Disk Page.
• So if a Disk Page = 4096 bytes we want our Node
to be that size:
• Say, 84 bytes overhead for the Node.
• 4 Bytes for each key. 4 Bytes for each child
pointer. 4 bytes for num keys, 4 bytes num
children.
B-Tree and Block Size
• 4096 = 4K + 4C + 4 + 4 + 84.
• C = K+1.
• 4096 = 4K + 4K+4 + 4 + 4 + 84.
• 4096 = 8K + 12 + 84
• 4096 -12 -84 = 8K
• K = 500 Keys per Node for one block.
• C = 501 Children per Node for each block.
• A full tree of height 2 has 251,503 nodes (1 + 501 + 501^2), one per Disk Block.
• A full tree of height 2 has 251,503 × 500 = 125,751,500 Keys.
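The arithmetic above can be checked with a short script. It is a sketch using the slide's byte sizes as defaults; the function and parameter names are assumptions made for illustration.

```python
def keys_per_node(block_size: int, key_size: int = 4, pointer_size: int = 4,
                  counters: int = 8, overhead: int = 84) -> int:
    """Solve block_size = key_size*K + pointer_size*(K + 1) + counters + overhead for K."""
    usable = block_size - overhead - counters - pointer_size
    return usable // (key_size + pointer_size)


def full_tree_size(k: int, height: int) -> tuple:
    """Return (nodes, keys) for a full tree where every node holds k keys."""
    children = k + 1
    nodes = sum(children ** level for level in range(height + 1))
    return nodes, nodes * k


if __name__ == "__main__":
    k = keys_per_node(4096)
    print(k, full_tree_size(k, 2))
```

The `counters` default of 8 bytes covers the two 4-byte fields for the number of keys and the number of children.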
Definition of a B-Tree
• Def: A B-tree of degree t is a tree with the following
properties:
– The root has at least 2 children, unless it is a leaf.
– Every non-root node must have at least t-1 keys.
– Every non-root internal node has at least t children.
– If the tree is non-empty the root has at least one key.
– Every node may have at most 2t-1 keys.
– An internal node may have at most 2t children.
– A full tree occurs when every node has 2t-1 keys.
Components of B-Tree Nodes
• Every node x has the following attributes:
– x.n = the number of keys in x
– x.keys[n] = the actual keys.
– x.leaf = is this a leaf? Can the root be a leaf?
– x.child[n+1] = array of pointers to the children.
• Rule: key[1] <= key[2] <= … <= key[n].
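One possible in-memory shape for such a node is sketched below. The class name `BTreeNode` and the method `is_well_formed` are hypothetical; `n` and `leaf` are derived from the lists rather than stored, which is a design choice of this sketch and not something the slides prescribe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def n(self) -> int:
        """Number of keys currently stored in this node."""
        return len(self.keys)

    @property
    def leaf(self) -> bool:
        """A node with no children is a leaf."""
        return not self.children

    def is_well_formed(self) -> bool:
        """Keys are in non-decreasing order and an internal node has n + 1 children."""
        ordered = all(a <= b for a, b in zip(self.keys, self.keys[1:]))
        child_count_ok = self.leaf or len(self.children) == self.n + 1
        return ordered and child_count_ok


if __name__ == "__main__":
    root = BTreeNode([10, 20], [BTreeNode([1]), BTreeNode([15]), BTreeNode([30])])
    print(root.n, root.leaf, root.is_well_formed())
```

A tree that consists of a single node has a root with no children, so the root can indeed be a leaf.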
Definition of a B-Tree
• Def: B-tree of order m is a tree with the following
properties:
– The root has at least 2 children, unless it is a leaf.
– No node in the tree has more than m children.
– Every node except for the root and the leaves has at
least ⌈m/2⌉ children.
– All leaves appear at the same level.
– An internal node with k children contains exactly k-1
keys.
B-Trees & Efficiency
• Used in Mac, NTFS, and OS/2 file systems for file structure.
• Allow insertion and deletion into a tree structure,
based on the $\log_m n$ property, where m is the order of
the tree.
• The idea is that you leave some key spaces open.
So an insert of a new key is done using available
space (most cases).
– Less dynamic than our typical Binary Tree
– Efficient for disk-based operations.
2-3 Trees
• Root: G
– Left child C, with leaf children A and D | E
– Right child I | M, with leaf children H, J | K and N | O
B Tree Operations (adt)
• Search(key)
• Insert(key)
• Delete(key)
Searching m-ary Trees
A generalized symmetric-order traversal (SOT) will visit all
keys in ascending order.
for (i = 1; i <= m-1; i++) {
    visit subtree to left of k_i
    visit k_i
}
visit subtree to right of k_{m-1}
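The traversal can be written as a short generator. This is a sketch that uses a bare-bones `Node` class (an assumption made for illustration) and 0-based Python lists instead of the 1-based arrays of the pseudocode.

```python
from typing import Iterator, List, Optional


class Node:
    def __init__(self, keys: List[str], children: Optional[List["Node"]] = None):
        self.keys = keys
        self.children = children or []  # an empty list marks a leaf


def in_order(node: Optional[Node]) -> Iterator[str]:
    """Yield every key of the subtree rooted at node in ascending order."""
    if node is None:
        return
    for i, key in enumerate(node.keys):
        if node.children:
            yield from in_order(node.children[i])   # subtree to the left of key i
        yield key
    if node.children:
        yield from in_order(node.children[-1])      # subtree right of the last key


if __name__ == "__main__":
    tree = Node(["G"], [
        Node(["C"], [Node(["A"]), Node(["D", "E"])]),
        Node(["I", "M"], [Node(["H"]), Node(["J", "K"]), Node(["N", "O"])]),
    ])
    print(" ".join(in_order(tree)))
```

The sample tree in the `__main__` block is the 2-3 tree from the earlier slide.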
Basic Recursive Search
Ordered Recursive Search. Arrays are indexed from 1.
Search(T, k)
if (T == null) return NOT_FOUND;
for (i = 1; i <= m-1; i++) {
    if (k == k_i) return (T, i);
    if (k < k_i) return Search(T.child[i], k);
}
return Search(T.child[m], k);
Notice the for loop! O(?)
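The linear scan costs O(m) comparisons per node, so a search makes O(m log_m n) comparisons in total, while the number of nodes read stays at O(log_m n). Because the keys in a node are sorted, the scan can be replaced with a binary search, which lowers the per-node cost to O(log m). The sketch below shows that variant using Python's standard `bisect_left`; the `Node` class and the `search` signature are assumptions made for illustration.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # an empty list marks a leaf


def search(node, key):
    """Return (node, index) where key is stored, or None if it is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)      # first index with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if not node.children:                # reached a leaf without a match
            return None
        node = node.children[i]              # descend into the i-th subtree
    return None


if __name__ == "__main__":
    tree = Node(["G"], [
        Node(["C"], [Node(["A"]), Node(["D", "E"])]),
        Node(["I", "M"], [Node(["H"]), Node(["J", "K"]), Node(["N", "O"])]),
    ])
    hit = search(tree, "K")
    print(hit[0].keys, hit[1])
```

The loop replaces the recursion of the pseudocode; each iteration corresponds to one node read.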
Insertion
• Insert ki into B-tree of order m.
- We find the insertion point (in a leaf) by doing a search.
- If there is room then enter ki.
- Else, promote the middle key to the parent & split the
node into two nodes around the middle key.
• If the splitting backs up to the root, then
– Make a new root containing the middle key.
• Note: keys always enter at a leaf and the tree gains
height only when the root splits, so balance is
always maintained.
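The steps above can be sketched as a bottom-up insert: place the key in a leaf, and whenever a node ends up with m keys, split it and hand the middle key to the parent. The class names `Node` and `BTree` and the private helper `_insert` are assumptions for illustration; this is one way to realise the slide's description, not the course's reference implementation.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list marks a leaf


class BTree:
    def __init__(self, m):
        self.m = m              # order: at most m children, m - 1 keys per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                   # the split backed up to the root
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        """Insert key below node; return (middle_key, new_right_node) if node split."""
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if not node.children:                   # leaf: enter the key here
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is not None:               # child split: absorb the promoted key
                middle, right = split
                node.keys.insert(i, middle)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.m:             # there was room, nothing to split
            return None
        mid = len(node.keys) // 2               # overflow: split around the middle key
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right


if __name__ == "__main__":
    tree = BTree(3)
    for letter in "GCIMADEHJKNO":
        tree.insert(letter)
    print(tree.root.keys)
```

Because a new root is created only when the old root splits, every leaf gains one level at the same moment, which is why all leaves stay at the same depth.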
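Insertion Example
• Starting tree:
– Root: G
– Left child C, with leaf children A and D | E
– Right child I | M, with leaf children H, J | K and N | O
• L is inserted into the above tree. Leaf J | K overflows, so K is promoted and the leaf splits into J and L:
– Node I | K | M, with leaf children H, J, L and N | O
• K is promoted again, this gives the new tree:
– Root: G | K
– Left child C, with leaf children A and D | E
– Middle child I, with leaf children H and J
– Right child M, with leaf children L and N | O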
Splitting Nodes
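• Before the split: node A | B | C with subtrees T1, T2, T3, T4
• After the split: B is the parent of A (subtrees T1, T2) and C (subtrees T3, T4)
• Middle key is promoted
• Creating a new root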
Deletion
• If the entry to be deleted is not in a leaf, swap it
with its successor (or predecessor) under the
natural order of the keys. Then delete the entry
from the leaf.
• If leaf contains more than the minimum number of
entries, then one can be deleted with no further
action.
Deletion Example 1
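• Starting tree: root C, with leaf children A and D | E
• Delete D: root C, with leaf children A and E
• Delete C (from the starting tree): successor D is promoted and C is deleted, giving root D with leaf children A and E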
Deletion Cont...
• If the node contains the minimum number of
entries, consider the two immediate siblings of the
node under the same parent:
• If one of these siblings has more than the
minimum number of entries, then redistribute one
entry from this sibling to the parent node, and
one entry from the parent to the deficient node.
– This is a rotation which balances the nodes
– Note: all nodes must comply with minimum entry
restriction.
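The rotation can be sketched as a three-way move of keys. The code below covers only borrowing from the right sibling (the left case is symmetric); `Node` and `borrow_from_right` are hypothetical names, and the caller is assumed to have checked that the sibling holds more than the minimum number of keys.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list marks a leaf


def borrow_from_right(parent, i):
    """Rotate one key from parent.children[i + 1] through parent into parent.children[i]."""
    node, sibling = parent.children[i], parent.children[i + 1]
    node.keys.append(parent.keys[i])           # separator key moves down
    parent.keys[i] = sibling.keys.pop(0)       # sibling's smallest key moves up
    if sibling.children:                       # internal nodes also hand over a subtree
        node.children.append(sibling.children.pop(0))


if __name__ == "__main__":
    # Root C with leaves A and D | E, after A has been deleted from its leaf.
    parent = Node(["C"], [Node([]), Node(["D", "E"])])
    borrow_from_right(parent, 0)
    print(parent.keys, [child.keys for child in parent.children])
```

Routing the key through the parent, rather than moving it directly between siblings, keeps the separator key between the two children correct.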
Deletion Example 2
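• Starting tree: root C, with leaf children A and D | E
• Delete A: the left leaf is now empty (deficient); its sibling D | E has more than the minimum
• Rotate: C moves down into the deficient leaf and D moves up to the parent, giving root D with leaf children C and E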
Deletion Cont...
• If both immediate siblings have exactly the
minimum number of entries, then merge the
deficient node with one of its immediate sibling
nodes and one entry from the parent node.
• If this leaves the parent node with too few entries,
then the process is propagated upward.
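A merge can be sketched as follows: the deficient node absorbs the separator key from the parent and everything held by its right sibling. `Node` and `merge_with_right` are hypothetical names; the caller is assumed to check afterwards whether the parent has become deficient and, if so, to repeat the repair one level up.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list marks a leaf


def merge_with_right(parent, i):
    """Merge parent.children[i], the separator parent.keys[i] and parent.children[i + 1]."""
    node, sibling = parent.children[i], parent.children[i + 1]
    node.keys.append(parent.keys.pop(i))       # separator key comes down from the parent
    node.keys.extend(sibling.keys)             # then the sibling's keys
    node.children.extend(sibling.children)     # and its subtrees, if any
    del parent.children[i + 1]                 # the sibling no longer exists


if __name__ == "__main__":
    # Node I with leaves H and J, after H has been deleted from its leaf.
    parent = Node(["I"], [Node([]), Node(["J"])])
    merge_with_right(parent, 0)
    print(parent.keys, [child.keys for child in parent.children])
```

After the call in the `__main__` block the parent holds no keys, which is the situation where the repair has to continue at the next level.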
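Deletion Example 3
• Starting tree: root G | K
– Left child C, with leaf children A and D | E
– Middle child I, with leaf children H and J
– Right child M, with leaf children L and N | O
• Delete H: the leaf that held H is now empty. Node is deficient.
• Combine with 1 key from the parent (I) and 1 sibling (J): the leaf becomes I | J, and its parent is left with no keys.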
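Deletion Example 3 Cont..
• After the merge: root G | K
– Left child C, with leaf children A and D | E
– Middle child has no keys and one leaf child I | J. Node is now deficient.
– Right child M, with leaf children L and N | O
• Deficient node is combined with 1 key from its parent (K) and its sibling (M):
– Root: G
– Left child C, with leaf children A and D | E
– Right child K | M, with leaf children I | J, L and N | O
• Node G is legal so propagation up the tree stops.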
Review of Deletions
• All Deletions take place in leaf nodes
– To delete an internal key, swap it with its successor or
predecessor, which is in a leaf.
– Then Delete
• Deficient Nodes are legalized by:
– Rotation with a sibling and parent.
OR
– Combining with key from parent and sibling
– Propagating up the tree until a legal node is
encountered.
End Notes
Studies have shown that on average there are about
$1/(\lceil m/2 \rceil - 1)$ splits per insertion.
– E.g.
• For a 2-3 tree (m = 3) there is 1
• For a 10-ary tree there is 1/4
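The two examples follow from the formula once m/2 is rounded up, which the short check below reproduces. The function name is an illustrative assumption, and the rounding is inferred from the 2-3 tree example rather than stated on the slide.

```python
from fractions import Fraction


def average_splits_per_insertion(m: int) -> Fraction:
    """Approximate average number of splits per insertion for a B-tree of order m."""
    if m < 3:
        raise ValueError("the formula needs an order m of at least 3")
    half_rounded_up = (m + 1) // 2          # integer form of ceil(m / 2)
    return Fraction(1, half_rounded_up - 1)


if __name__ == "__main__":
    for m in (3, 10, 500):
        print(m, average_splits_per_insertion(m))
```

For wide nodes the value becomes very small, so most insertions touch only a single leaf.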
Acknowledgement Dave Bockus
© Dave Bockus
Acknowledgements to:
Dr Frederic Maire Brisbane, Queensland, AUSTRALIA
for some of the material found in this presentation