Upload: paxton

Post on 09-Jan-2016

Arboles B (B-Trees)


7.1 External Search

• The algorithms we have seen so far work well when all data are stored in the primary storage device (RAM), where access is fast(er).

• Big data sets are frequently stored on secondary storage devices (hard disk), where access is slow(er): about 100-1000 times slower.

Access is always to a complete block (page) of data (4096 bytes), which is loaded into RAM.

For efficiency: keep the number of page accesses low!


For external search: a variant of search trees in which 1 node = 1 page:

Multiple-way search trees!


Definition (multiple-way search trees)

An empty tree is a multiple-way search tree with the empty key set {}.

Let $T_0, \dots, T_n$ be multiple-way search trees with keys taken from a common key set S, and let $k_1, \dots, k_n$ be a sequence of keys with $k_1 < \dots < k_n$. Then the sequence

$T_0 \; k_1 \; T_1 \; k_2 \; T_2 \; k_3 \; \dots \; k_n \; T_n$

is a multiple-way search tree if and only if:

• for all keys x in $T_0$: $x < k_1$
• for $i = 1, \dots, n-1$ and all keys x in $T_i$: $k_i < x < k_{i+1}$
• for all keys x in $T_n$: $k_n < x$
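The three ordering conditions can be checked mechanically. The following is a minimal sketch, not taken from the slides: the `Node` class and the function `is_multiway_search_tree` are hypothetical names, and a node with an empty `children` list is assumed to stand for a node whose subtrees are all empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical representation: keys k1 < ... < kn and subtrees T0..Tn.
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # [] means all subtrees are empty


def is_multiway_search_tree(node: Node,
                            lo: Optional[int] = None,
                            hi: Optional[int] = None) -> bool:
    """Check that all keys below `node` lie strictly between lo and hi
    and that the ordering conditions of the definition hold recursively."""
    keys = node.keys
    # k1 < ... < kn inside this node
    if any(not a < b for a, b in zip(keys, keys[1:])):
        return False
    # all keys of this node must respect the bounds inherited from the parent
    if keys and lo is not None and not lo < keys[0]:
        return False
    if keys and hi is not None and not keys[-1] < hi:
        return False
    if not node.children:
        return True
    if len(node.children) != len(keys) + 1:
        return False
    # subtree Ti may only hold keys x with k_i < x < k_{i+1}
    bounds = [lo] + keys + [hi]
    return all(is_multiway_search_tree(child, bounds[i], bounds[i + 1])
               for i, child in enumerate(node.children))
```

For instance, a root with keys 10 and 20 whose middle subtree contains 21 violates the second condition and is rejected.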


B-Tree

Definition

A B-tree of order m is a multiple-way search tree with the following characteristics:

• $1 \le \#(\text{keys in the root}) \le 2m$, and $m \le \#(\text{keys in the node}) \le 2m$ for all other nodes.
• All paths from the root to a leaf are equally long.
• Each internal node (not a leaf) that has s keys has exactly s+1 children.
• 2-3 trees are the special case m = 1.
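The structural conditions (key counts, child counts, equal path lengths) can be verified with a short recursive check. This is a sketch under assumptions: `Node` and `btree_height` are hypothetical names, leaves are represented by an empty `children` list, and the key ordering itself is not checked here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def btree_height(node: Node, m: int, is_root: bool = True) -> Optional[int]:
    """Return the height (edges from this node down to a leaf) if the subtree
    satisfies the order-m conditions, otherwise None."""
    s = len(node.keys)
    lowest = 1 if is_root else m          # the root may hold as few as 1 key
    if not lowest <= s <= 2 * m:
        return None
    if not node.children:                 # leaf
        return 0
    if len(node.children) != s + 1:       # internal node: exactly s+1 children
        return None
    heights = {btree_height(child, m, False) for child in node.children}
    if len(heights) != 1 or None in heights:
        return None                       # invalid subtree or unequal path lengths
    return heights.pop() + 1
```

With m = 2, a root holding one key above two leaves with 2 and 3 keys is accepted with height 1, while a leaf with a single key is rejected.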


Example: a B-tree of order 2:


Assessment of B-trees

The minimal possible number of nodes in a B-tree of order m and height h:

• Number of nodes in each of the two subtrees of the root: $1 + (m+1) + (m+1)^2 + \dots + (m+1)^{h-1} = \frac{(m+1)^h - 1}{m}$.

The root of the minimal tree has only one key and two children; all other nodes have m keys.

Altogether, each subtree contributes $m \cdot \frac{(m+1)^h - 1}{m} = (m+1)^h - 1$ keys and the root contributes one, so the number of keys n in a B-tree of height h satisfies $n \ge 2(m+1)^h - 1$.

Thus the following holds for each B-tree of height h with n keys: $h \le \log_{m+1}\left(\frac{n+1}{2}\right)$.


Example

The following holds for each B-tree of height h with n keys:

$h \le \log_{m+1}\left(\frac{n+1}{2}\right)$.

Example: for
• page size: 1 KByte, and
• each entry plus pointer: 8 bytes,
if we choose m = 63, then for a data set of n = 1 000 000 keys we have

$h \le \log_{64}(500\,000.5) < 4$, and with that $h_{max} = 3$.
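The arithmetic of this example can be reproduced in a few lines. The sketch below assumes the height convention used in the bound (a tree consisting of only the root has height 0); `min_keys` and `max_height` are hypothetical helper names.

```python
import math


def min_keys(m: int, h: int) -> int:
    """Fewest keys a B-tree of order m and height h can hold: 2(m+1)^h - 1."""
    return 2 * (m + 1) ** h - 1


def max_height(m: int, n: int) -> int:
    """Largest height h for which a B-tree of order m with n keys can exist."""
    h = 0
    while min_keys(m, h + 1) <= n:
        h += 1
    return h


m, n = 63, 1_000_000
bound = math.log((n + 1) / 2, m + 1)   # log_64(500000.5), between 3 and 4
h_max = max_height(m, n)               # integer heights only, so h_max = 3
```

The loop avoids floating-point rounding at exact powers of m+1 by comparing integers, while `bound` shows the real-valued logarithm from the slide.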


Algorithms for searching keys in a B-tree

Algorithm search(r, x)
// search for key x in the tree with root node r
// global variable p = pointer to the last node visited
in r, search for the first key y >= x, or until there are no more keys
if y == x {stop search, p = r, found}
else if r is a leaf {stop search, p = r, not found}
else if not past the last key {search(pointer to the node before y, x)}
else {search(last pointer, x)}
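A possible rendering of this search in Python is sketched below. It is an illustration rather than the slides' implementation: `Node` is a hypothetical in-memory stand-in for a page, the global variable p is replaced by a returned pair, and the scan inside a node uses binary search (`bisect_left`) instead of a linear scan.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def search(r: Node, x: int) -> Tuple[Node, bool]:
    """Return (p, found): the last node visited and whether x was found in it."""
    i = bisect_left(r.keys, x)            # index of the first key y >= x, or len(keys)
    if i < len(r.keys) and r.keys[i] == x:
        return r, True                    # y == x: found
    if not r.children:
        return r, False                   # r is a leaf: not found
    # children[i] is the pointer just before y, or the last pointer
    # when every key in r is smaller than x
    return search(r.children[i], x)
```

Each recursive call corresponds to one page access, so the number of calls is at most the height of the tree plus one.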


Algorithms for inserting and deleting of keys in a B-tree

Algorithm insert(r, x)
// insert key x in the tree with root r
search for x in the tree with root r;
if x was not found {
    let p be the leaf where the search stopped;
    insert x at the right position in p;
    if p now has 2m+1 keys {overflow(p)}
}


Algorithm Split (1)

Algorithm overflow(p) = split(p)

Algorithm split(p), first case: p has a parent q.

Divide the overflowed node. The middle key goes to the parent.

Remark: the splitting may propagate up to the root, in which case the height of the tree increases by one.


Algorithm Split (2)

Algorithm split(p), second case: p is the root.

Divide the overflowed node. Open a new level above it, containing a new root with the middle key (the new root has one key).
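Insertion with both split cases can be sketched as follows. This is one possible variant, not the slides' code: instead of parent pointers and a separate overflow(p) call, the recursion returns the promoted middle key and the new right sibling to the caller. `Node`, `insert`, `_insert` and `inorder` are hypothetical names.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def _insert(node: Node, x: int, m: int) -> Optional[Tuple[int, Node]]:
    """Insert x below node. Return (middle key, new right sibling) if node split."""
    i = bisect_left(node.keys, x)
    if i < len(node.keys) and node.keys[i] == x:
        return None                        # x already present: nothing to do
    if not node.children:
        node.keys.insert(i, x)             # leaf: insert at the right position
    else:
        promoted = _insert(node.children[i], x, m)
        if promoted is None:
            return None
        mid, right = promoted              # child split: take its middle key
        node.keys.insert(i, mid)
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * m:
        return None
    # overflow: 2m+1 keys, so split into m keys | middle key | m keys
    mid = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys, node.children = node.keys[:m], node.children[:m + 1]
    return mid, right


def insert(root: Node, x: int, m: int) -> Node:
    """Insert x and return the (possibly new) root."""
    promoted = _insert(root, x, m)
    if promoted is None:
        return root
    mid, right = promoted                  # second case: the root itself was split
    return Node([mid], [root, right])      # new root with one key


def inorder(node: Node) -> List[int]:
    """All keys in ascending order (used here only to inspect the result)."""
    if not node.children:
        return list(node.keys)
    out: List[int] = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out
```

Starting from an empty `Node()` and inserting keys one at a time with m = 1 builds a 2-3 tree; the height grows only when the root splits.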


Algorithm delete(r, x)
// delete key x from the tree with root r
search for x in the tree with root r;
if x was found {
    if x is in an internal node {
        exchange x with the next bigger key x' in the tree
        // if x is in an internal node, there must be
        // at least one bigger key in the tree;
        // this key is in a leaf!
    }
    let p be the leaf containing x;
    erase x from p;
    if p is not the root r {
        if p has m-1 keys {underflow(p)}
    }
}


Algorithm underflow (p)

if p has a neighboring node p' with s > m keys {balance(p, p')}
else {
    // because p cannot be the root, p must have a neighbor with m keys
    let p' be the neighbor with m keys;
    merge(p, p')
}


Algorithm balance (p, p') // balance node p with its neighbor p'

(s > m , r = (m+s)/2 -m )
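One way to realize balancing is to pool the keys of both neighbors together with their separator key in the parent and redistribute them evenly. The sketch below rests on assumptions: `Node` is a hypothetical class, and `balance(q, i)` takes the parent q and the index i of the left node of the pair instead of the two nodes p and p' themselves.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def balance(q: Node, i: int) -> None:
    """Even out q.children[i] and q.children[i+1] through the separator q.keys[i].
    Works whichever of the two neighbors is the underflowed one."""
    left, right = q.children[i], q.children[i + 1]
    keys = left.keys + [q.keys[i]] + right.keys    # all keys in sorted order
    kids = left.children + right.children          # all subtrees in order
    mid = len(keys) // 2
    left.keys = keys[:mid]
    q.keys[i] = keys[mid]                          # new separator goes to the parent
    right.keys = keys[mid + 1:]
    if kids:                                       # internal nodes: move subtrees too
        left.children = kids[:mid + 1]
        right.children = kids[mid + 1:]
```

With m-1 keys on one side and s > m on the other, the pooled sequence has m+s keys, so after redistribution both nodes hold at least m keys and the parent keeps the same number of keys.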


Algorithm merge(p, p') // merge node p with its neighbor p'

Perform the following operation:

Afterwards (q is the parent of p and p'):
if (q <> root) and (q has m-1 keys) {underflow(q)}
else if (q == root) and (q is empty) {free q; let root point to the merged node p}
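A minimal sketch of the merge step, assuming the usual B-tree merge in which the separator key from the parent q moves down between the keys of the two neighbors. `Node`, `merge(q, i)` and `shrink_root` are hypothetical names, and the parent-plus-index signature is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def merge(q: Node, i: int) -> Node:
    """Merge q.children[i+1] and the separator q.keys[i] into q.children[i].
    The parent q loses one key and one child; the merged node is returned."""
    left, right = q.children[i], q.children[i + 1]
    left.keys += [q.keys.pop(i)] + right.keys   # (m-1) + 1 + m = 2m keys
    left.children += right.children
    del q.children[i + 1]
    return left


def shrink_root(root: Node) -> Node:
    """If the root became empty after a merge, its only child is the new root."""
    if not root.keys and root.children:
        return root.children[0]
    return root
```

After `merge`, the caller still has to check q: a non-root q left with m-1 keys underflows in turn, and an empty root is replaced as in `shrink_root`.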


Recursion

If handling an underflow requires a merge, we might have to handle an underflow again one level up.

This process may repeat all the way up to the root.


Example: B-tree of order 2 (m = 2)


Cost

Let m be the order of the B-tree and n the number of keys.

Costs for search, insert and delete: $O(h) = O\left(\log_{m+1}\left(\frac{n+1}{2}\right)\right) = O(\log_{m+1} n)$.


Remark:

B-trees can also be used as an internal (in-memory) storage structure:

Especially B-trees of order 1 (then there are only one or two keys in each node, so no elaborate search inside the nodes is needed).

Cost of search, insert, delete: O(log n).


Remark: storage utilization

Over 50%. Reason: the condition

$\tfrac{1}{2} k \le \#(\text{keys in the node}) \le k$ for all nodes $\ne$ root ($k = 2m$).