1 b+-trees (part 1) what is a b+ tree? why b+ trees? searching a b+ tree insertion in a b+ tree...

1 B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree Insertion in a B+ tree NOTE: SOME EXAMPLES IN THIS LECTURE ARE ADOPTED FROM INTERNET SOURCES

Upload: olivia-mcdowell

Post on 15-Jan-2016



B+-Trees (PART 1)

• What is a B+ tree?

• Why B+ trees?

• Searching a B+ tree

• Insertion in a B+ tree

NOTE: SOME EXAMPLES IN THIS LECTURE ARE ADOPTED FROM INTERNET SOURCES


What is a B+ tree?

• A B+-tree of order M ≥ 3 is an M-ary tree with the following properties:

• Leaves contain data items or references to data items
– all are at the same depth
– each leaf has ⌈L/2⌉ to L data items or data references (L may be equal to, less than, or greater than M; but usually L << M)

• Internal nodes contain search keys
– the keys in each node are sorted in increasing order
– each node has at least ⌈M/2⌉ and at most M subtrees
– the number of search keys in each node is one less than the number of subtrees
– key i in an internal node is the smallest key in subtree i+1

• Root
– can be a single leaf, or has 2 to M children

Nodes are at least half-full, so that the tree will not degenerate into a simple binary tree or even a linked list.


The internal node structure of a B+ tree

• Each leaf node stores a key-data pair or a key-dataReference pair. Data or data references are in leaves only.

• Leaves form a doubly-linked list that is sorted in increasing order of keys.

• Each internal node has the following structure:

j | a_1 | k_1 | a_2 | k_2 | a_3 | … | k_j | a_(j+1)

– j == number of keys in the node
– a_i is a reference to a subtree
– k_i == smallest key in subtree a_(i+1) and > largest key in subtree a_i
– k_1 < k_2 < k_3 < . . . < k_j
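The layout above can be sketched in Python as follows. This is only an illustration: the class names `Leaf` and `Internal` and the helper `check_internal` are assumptions made for this sketch, not part of the lecture. The helper checks the stated conditions on j, a_i, and k_i for one internal node.

```python
class Leaf:
    """Leaf node: sorted keys with their data (or data references)."""
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.prev = None   # doubly-linked list of leaves
        self.next = None


class Internal:
    """Internal node: j keys k_1..k_j and j+1 subtree references a_1..a_(j+1)."""
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def smallest_key(node):
    # Follow the leftmost references down to a leaf.
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]


def largest_key(node):
    # Follow the rightmost references down to a leaf.
    while isinstance(node, Internal):
        node = node.children[-1]
    return node.keys[-1]


def check_internal(node):
    j = len(node.keys)
    assert len(node.children) == j + 1                            # j keys, j+1 subtrees
    assert all(a < b for a, b in zip(node.keys, node.keys[1:]))   # k_1 < ... < k_j
    for i, k in enumerate(node.keys):
        assert k == smallest_key(node.children[i + 1])            # k_i vs subtree a_(i+1)
        assert k > largest_key(node.children[i])                  # k_i vs subtree a_i
    return True


left = Leaf([2, 3], ["r2", "r3"])
right = Leaf([5, 7, 8], ["r5", "r7", "r8"])
left.next, right.prev = right, left
node = Internal([5], [left, right])
print(check_internal(node))  # True
```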


What is a B+ tree?

• Example: A B+ tree of order M = 5, L = 5

• Records or references to records are stored at the leaves, but we only show the keys here

• At the internal nodes, only keys (and references to subtrees) are stored

• Note: The index set (i.e., internal nodes) contains distinct keys


What is a B+ tree?

• Example: A B+ tree of order M = 4, L = 4

Note: For simplicity the doubly linked list references that join leaf nodes are omitted


Why B+ trees?

• Like a B-tree, each internal node and leaf node is designed to fit into one I/O block of data. An I/O block usually can hold quite a lot of data. Hence, an internal node can keep a lot of keys, i.e., M is large. This implies that the tree has only a few levels, and only a few disk accesses are needed to accomplish a search, insertion, or deletion.

• The B+-tree is a popular structure used in commercial databases. To further speed up searches, insertions, and deletions, the first one or two levels of the B+-tree are usually kept in main memory.

The reason that B+ trees are used in databases is that, unlike B-trees, B+ trees support both equality and range searches efficiently: all records sit in the linked leaves, so a range search is one descent followed by a sequential scan of the leaf list.

• Example of equality search: Find a student record with key 950000
• Example of range search: Find all student records with exam grade greater than 70 and less than 90
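A minimal sketch of the range search, assuming the first relevant leaf has already been located by an ordinary root-to-leaf descent (omitted here). The `Leaf` class, the `range_scan` name, and the grade values are hypothetical; only the exclusive bounds 70 and 90 come from the example above.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys    # sorted keys stored in this leaf
        self.next = None    # reference to the next leaf in key order


def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves[0]


def range_scan(leaf, low, high):
    """Collect keys k with low < k < high by walking the leaf list."""
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k >= high:          # keys are sorted: nothing further can match
                return result
            if k > low:
                result.append(k)
        leaf = leaf.next
    return result


first = link([Leaf([55, 60]), Leaf([70, 72, 85]), Leaf([90, 95])])
print(range_scan(first, 70, 90))  # [72, 85]
```

The scan never returns to the index: once the starting leaf is known, only the `next` references are followed.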


Why B+ trees? (Cont’d)

A B+ tree supports equality and range searches efficiently

[Figure: the index entries (direct search) sit above the data entries (the "sequence set").]


B+ Trees in Practice

• For a B+ tree of order M and L = M, with h levels of index, where h ≥ 1:
– The maximum number of records stored is n = (M − 1)^h
– The space required to store the tree is O(n)
– Inserting a record requires O(log_M n) operations in the worst case
– Finding a record requires O(log_M n) operations in the worst case
– Removing a (previously located) record requires O(log_M n) operations in the worst case
– Performing a range query with k elements occurring within the range requires O(log_M n + k) operations in the worst case

• Example for a B+ tree of order M = 134 and L = 133:
– A tree with 3 levels stores a maximum of 133^3 = 2,352,637 records
– A tree with 4 levels stores a maximum of 133^4 = 312,900,721 records
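The capacity figures can be checked directly from the slide's formula n = (M − 1)^h. The function name `max_records` is an assumption for this sketch.

```python
def max_records(order_m, index_levels):
    """Maximum record count n = (M - 1)^h, using the formula from the slide."""
    return (order_m - 1) ** index_levels


print(max_records(134, 3))  # 2352637
print(max_records(134, 4))  # 312900721
```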


Searching a B+ Tree

• Searching KEY:

– Start from the root

– If an internal node is reached:

• Search KEY among the keys in that node

– linear search or binary search

• If KEY < smallest key, follow the leftmost child reference down

• If KEY >= largest key, follow the rightmost child reference down

• If Ki <= KEY < Kj for two adjacent keys Ki and Kj, follow the child reference between Ki and Kj

– If a leaf is reached:

• Search KEY among the keys stored in that leaf

– linear search or binary search

• If found, return the corresponding record; otherwise report not found
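The search steps above can be sketched as follows, using binary search inside each node. The `Leaf` and `Internal` classes and the record strings are assumptions for illustration; the key layout mirrors the order-5 example tree used later in the lecture. `bisect_right` counts the keys that are <= KEY, which is exactly the index of the child reference to follow.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.values = [f"record-{k}" for k in keys]  # placeholder records


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def search(node, key):
    """Return the record stored under key, or None if the key is absent."""
    while isinstance(node, Internal):
        # KEY < smallest key -> index 0 (leftmost child)
        # KEY >= largest key -> index j (rightmost child)
        # Ki <= KEY < Kj     -> the child between Ki and Kj
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


root = Internal(
    [13, 17, 24, 30],
    [Leaf([2, 3, 5, 7]), Leaf([14, 15]), Leaf([19, 20, 22]),
     Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])],
)
print(search(root, 20))  # record-20
print(search(root, 13))  # None
```

Key 13 appears in the index but not in any leaf, so the search correctly reports it as not found: only leaves hold records.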


Searching a B+ Tree

• In processing a query, a path is traversed in the tree from the root to some leaf node.

• If there are K search-key values in the file, the path is no longer than ⌈log_⌈m/2⌉(K)⌉.

• With 1 million search-key values and m = 100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
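The bound can be evaluated with integer arithmetic, which avoids floating-point rounding in the logarithm. The function name `max_path_length` is an assumption; it returns the smallest h such that ⌈m/2⌉^h ≥ K.

```python
def max_path_length(num_keys, m):
    """Smallest h with ceil(m/2)**h >= num_keys, i.e. ceil(log_{ceil(m/2)} K)."""
    fanout = (m + 1) // 2          # ceil(m/2), the minimum number of subtrees
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels


print(max_path_length(1_000_000, 100))  # 4
```

Here 50^3 = 125,000 is still below one million while 50^4 = 6,250,000 exceeds it, so four levels suffice.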


Insertion in B+ Trees

Suppose that we want to insert a key K and its associated record into the B+ tree.

• A B+ tree has two OVERFLOW CONDITIONS:
– A leaf node overflows if, after insertion, it contains L + 1 keys
– A root node or an internal node of a B+ tree of order M overflows if, after a key insertion, it contains M keys

• Insertion algorithm:
– Search for the appropriate leaf node x to insert the key. Note: Insertion of a key always starts at a leaf node.
– If the key exists in the leaf node x, report an error; else insert the key in its proper sorted order in the leaf node.
– If the leaf does not overflow (x contains fewer than L + 1 keys after insertion), the insertion is done.
– If the leaf node overflows, split it into two and COPY the smallest key y of the right split node to the parent of the node (records with keys < y go to the left leaf node; records with keys >= y go to the right leaf node).
– If the parent overflows, split the parent into two (keys < middle key go to the left node; keys > middle key go to the right node). The middle key PROPAGATES to the parent of the split parent.
– The process propagates upward until a parent that does not overflow is reached or the root node is reached. If the root node is reached and it overflows, create a new root node.


Insertion in B+ Trees: No overflow

• Insert KEY:
– Search for KEY using the search operation
• If the key is found in a leaf node, report an error
– Insert KEY into that leaf
• If the leaf does not overflow (contains <= L keys), just insert KEY into it
• If the leaf overflows (contains L+1 keys), splitting is necessary

An example of inserting O into a B+ tree of order M = 4, L = 3.

Search for O; this leaf has 2 keys. Insert O and maintain the order.


Insertion in B+ Trees: Splitting a Leaf Node

• If the leaf overflows (contains L+1 keys after insertion), splitting is necessary

• Splitting leaf:
– Split it into 2 new leaves LeftLeaf and RightLeaf
• LeftLeaf has the ⌊(L+1)/2⌋ smallest keys
• RightLeaf has the remaining ⌈(L+1)/2⌉ keys
– Make a copy of the smallest key in RightLeaf, say MinKeyRight, to be the parent of LeftLeaf and RightLeaf [COPY UP]
– Insert MinKeyRight, together with LeftLeaf and RightLeaf, into the original parent node

An example of inserting T into a B+ tree of order M = 4 and L = 3

Search for T; this leaf has 3 keys. Inserting T gives it 4 keys: overflow.


Insertion in B+ Trees: Splitting Leaf (Cont’d)

Split the leaf into xL and xR (xL gets ⌊(L+1)/2⌋ keys, xR gets ⌈(L+1)/2⌉ keys) and take the minimum key in xR to be the parent of xL and xR.

Make S the parent of the two new leaves, and insert S into the parent. Since the parent has only 2 keys (U, Y), we can insert the subtree rooted at S into it.

Insert S into the parent. Maintain the order of keys and child references (DONE).
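The leaf split can be sketched on a plain sorted list of keys. The function name `split_leaf` is an assumption, and the keys below are the ones from the order M = 5, L = 4 example later in the lecture, not from the T/S example whose figure is not reproduced here.

```python
def split_leaf(keys):
    """Split the L+1 sorted keys of an overflowing leaf.

    Returns (left_keys, right_keys, min_key_right). The left leaf keeps the
    floor((L+1)/2) smallest keys; the smallest key of the right leaf is
    COPIED up, so it still appears in the right leaf.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


print(split_leaf([2, 3, 5, 7, 8]))  # ([2, 3], [5, 7, 8], 5)
```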


Insertion in B+ Trees: Splitting Internal Node

• An insertion into a full parent node causes the parent to overflow; in that case, this internal node must be split.

• Splitting internal node:
– Split it into 2 new internal nodes LeftNode and RightNode
• LeftNode has the smallest ⌈M/2⌉ − 1 keys
• RightNode has the largest ⌊M/2⌋ keys
• NumberOfKeysInLeftNode <= NumberOfKeysInRightNode
• Note that the ⌈M/2⌉-th key is not in either node
– Make the ⌈M/2⌉-th key, say “MiddleKey”, to be the parent of LeftNode and RightNode [PROPAGATE UP]
– Insert “MiddleKey”, together with LeftNode and RightNode, into the original parent node

• Splitting root:
– Follow exactly the same procedure as splitting an internal node
– “MiddleKey”, the parent of LeftNode and RightNode, is now set to be the root of the tree
– After splitting the root, the depth of the tree increases by 1
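A sketch of the internal-node split on plain lists. The function name `split_internal` and the child labels `c0`..`c5` are assumptions; the keys are those of the order M = 5 example in this lecture. Unlike a leaf split, the middle key is moved up rather than copied.

```python
def split_internal(keys, children):
    """Split an overflowing internal node holding M keys and M+1 children.

    Returns ((left_keys, left_children), middle_key, (right_keys, right_children)).
    The left node keeps ceil(M/2) - 1 keys, the right node floor(M/2) keys,
    and the ceil(M/2)-th key PROPAGATES up without staying in either node.
    """
    m = len(keys)
    half = (m + 1) // 2                      # ceil(M/2)
    left = (keys[:half - 1], children[:half])
    middle = keys[half - 1]
    right = (keys[half:], children[half:])
    return left, middle, right


keys = [5, 13, 17, 24, 30]
children = ["c0", "c1", "c2", "c3", "c4", "c5"]
left, middle, right = split_internal(keys, children)
print(left)    # ([5, 13], ['c0', 'c1', 'c2'])
print(middle)  # 17
print(right)   # ([24, 30], ['c3', 'c4', 'c5'])
```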


Insertion in B+ Trees

An example of inserting M into a B+ tree of order M = 4 and L = 3

Search for M; this leaf has 3 keys.

Insert M and the B+ tree condition is violated.

Split the leaf and distribute the keys.


Insertion in B+ Trees

Split the leaf and distribute the keys.

Make L the parent of the two new leaves.

Insert L and its child references into the parent.

However, we cannot just insert L into the parent as it is already full.


Insertion in B+ Trees

The key J becomes the parent of the two internal nodes xL and xR. Insert J into the next parent.

Since the parent is not full, we can just insert the subtree rooted at J into the parent. Done.


Insertion in B+ Trees

Insert 16 then 8 in the following B+ tree of order M = 5, L = 4:

Note: A * in a leaf node key indicates a key-dataReference pair

[Figure: root with keys 13 17 24 30; leftmost leaf 2* 3* 5* 7*.]

After inserting 16, its leaf holds 14* 15* 16* (no overflow).

After inserting 8, the leftmost leaf holds 2* 3* 5* 7* 8*: overflow!

One new child (leaf node) generated; must add one more reference to its parent, thus one more key value as well.


Insertion in B+ Trees

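Inserting 8* (cont.)

• Copy up the middle value (leaf split)

[Figure: leaf 2* 3* 5* 7* 8* splits into 2* 3* and 5* 7* 8*.]

Entry to be inserted in parent node: 5. (Note that 5 is copied up and continues to appear in the leaf.)

Parent before the insertion: 13 17 24 30

Parent after the insertion: 5 13 17 24 30 (overflow!)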


Insertion in B+ Trees

We split this node, redistribute entries evenly, and propagate up the middle key.

[Figure: node 5 13 17 24 30 splits into 5 13 and 24 30, with 17 moved up.]

Entry to be inserted in parent node: 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)


Insertion in B+ Trees

Notice that the root was split, leading to an increase in height.

[Figure: final tree.]

Root: 17

Internal nodes: 5 13 | 24 30

Leaves: 2* 3* | 5* 7* 8* | 14* 15* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*


Inserting a Data Entry into a B+ Tree: Summary

• Find correct leaf X. Put data entry onto X.
– If X has enough space, done!
– Else, must split X (into X and a new node X2)
• Redistribute entries evenly, put middle key in X2, copy up middle key.
• Insert reference (index entry) referring to X2 into parent of X.

• This can happen recursively
– To split index node, redistribute entries evenly, but push (propagate) up middle key. (Contrast with leaf splits.)

• Splits “grow” tree; root split increases height.
– Tree growth: gets wider or one level taller at top.
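The whole insertion procedure can be put together in one recursive routine. This is a minimal sketch under stated assumptions, not the lecture's reference implementation: the names `Node`, `BPlusTree`, and `_insert` are hypothetical, leaves store keys only (no records), and the leaf linked list is omitted. The usage at the end replays the "insert 16 then 8" example with M = 5 and L = 4.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # None for a leaf, list of subtrees otherwise


class BPlusTree:
    def __init__(self, m, l, root=None):
        self.m, self.l = m, l                   # order M and leaf capacity L
        self.root = root if root is not None else Node([])

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                   # root split: tree grows one level
            up, right = split
            self.root = Node([up], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (key_for_parent, new_right_node) on a split."""
        if node.children is None:               # leaf
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                raise KeyError(f"duplicate key {key!r}")
            node.keys.insert(i, key)
            if len(node.keys) <= self.l:        # no overflow: done
                return None
            mid = len(node.keys) // 2           # left keeps floor((L+1)/2) keys
            right = Node(node.keys[mid:])
            node.keys = node.keys[:mid]
            return right.keys[0], right         # COPY up smallest key of right leaf
        i = bisect_right(node.keys, key)        # child to descend into
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        up, right = split
        node.keys.insert(i, up)
        node.children.insert(i + 1, right)
        if len(node.keys) < self.m:             # internal overflow only at M keys
            return None
        mid = (self.m + 1) // 2 - 1             # index of the ceil(M/2)-th key
        middle = node.keys[mid]
        new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return middle, new_right                # middle key PROPAGATES up


leaves = [Node([2, 3, 5, 7]), Node([14, 15]), Node([19, 20, 22]),
          Node([24, 27, 29]), Node([33, 34, 38, 39])]
tree = BPlusTree(5, 4, Node([13, 17, 24, 30], leaves))
tree.insert(16)
tree.insert(8)
print(tree.root.keys)                            # [17]
print([c.keys for c in tree.root.children])      # [[5, 13], [24, 30]]
```

Returning the split result to the caller keeps the recursion free of parent references; an implementation that stores parent references in each node would work equally well.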