1 chapter 6: searching trees and more sorting algorithms 6.1 binnary tree the bin tree class with...

1

Chapter 6: Searching trees and more Sorting Algorithms

6.1 Binnary Tree

The Bin Tree class with traversing methods

6.2 Searching Trees

6.2.1 AVL Trees

6.3 HeapSort and BucketSort

6.3.1 HeapSort

6.3.2 BucketSort

2

Addendum:

A Webseite with animations for AVL-trees:

http://www.seanet.com/users/arsen/avltree.html

A Webseite with animation for Heapsort:

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/heapsort.html

3

W

Z

a

b

c

x

y

W Z

a

b

c

x y

new

new

First rotation

Second rotation

4

6.3 BucketSort

All sorting procedures we have seen so far are based on the comparission of two keys

The general bottom bound for the cost for this kind of procedures is:

O(n log n).

For certain sets of keys:

Sorting without comparing keys and more efficient !

5

Idea: use the keys to calculate the the storing addresses for the elements of the sequence to be sorted (like in Hashing).

Example (ideal situation, not frequent):

Set of n data objects {s0, ... , sn-1} with key values 0, ..., n-1, without duplicates given as an array S.

Sorting algoritm:

for(int i = 0, i < n, i++)

T[S[i].key] = S[i];

cost: O(n).

6

BucketSort

Sets of n data objects {s0, ... , s n-1} with key values 0, ..., m-1, given as array S.

duplicate keys are allowed.

void BucketSort(S) { int i; int j;

for(j=0; j<m; j++) B[j] = null; //the buckets, lists for(i=0; i<n; i++) insert(S[i], B[S[i].key()] ); for(j=0; j<m; j++) output(B[j]);}cost: O(n+m).

7

RadixSort

Sets of n data objects {s0, ... , sn-1} with key values 0, ..., nk -1, given as an array S. Duplicate keys allowed.

The bucketsort for that would take: O(n + nk).

Making it better (RadixSort): • Write the keys on base n. We have numbers of k ciphers• Run k times the BucketSort algorithm sorting the objects

according to each cipher, in order, starting from the less significant cipher (last?)until the most significant one (first?) (e.g. using mod and div).

cost: O(k•n).

8

Example for RadixSort:

n=10, k=2.

Sequence to be sorted:

64, 17, 3, 99, 79, 78, 19, 13, 67, 34.

1. step: insert them in buckets according to the last cipher:

after that, output them in the order they are :

3, 13, 64, 34, 17, 67, 78, 99, 79, 19

0 1 2 3 4 5 6 7 8 9

3

13

64

34

17

67

78 99

79

19

9

Continuation RadixSort

2nd. step: the sequence obtained from the step 1

3, 13, 64, 34, 17, 67, 78, 99, 79, 19

Insert in the buckets according to the penultimate cipher

and output them:

3, 13, 17, 19, 34, 64, 67, 78, 79, 99.

0 1 2 3 4 5 6 7 8 9

3 13

17

19

34 64

67

78

79

99

10

Generalizing:

Ciphers in different possitions can have a different value range.

Example: Date=(year, month, day)

( [0..9999], [1..12], [1..31] )

BucketSort the dates according to day, month and year.

General things about binary trees

• They are recursive structures, this means, many algorithms over them are “better” (shorter, more elegant) expressed in a recursive way

• This means, in most cases it is necessary to execute recursively the algorithm on one or both sub-trees and analyze the root node (the order may vary according to the task)

• (one of) The base case(s) (when there is no recursive call any more) is when the pointer to the root of the

(sub-)tree is null (empty tree)• To improve efficiency we can avoid recursive calls when there

is no child;

11

Example 1 : search

12

Node search(int x, Node y) { //returns a pointer to the node containing y //null if it is not in the tree if (y == null) return null; if (y.key == x) return y if (y.key > x) return search(x, y.left); return search(x, y.right);}

Exmple 2 : count

13

int count(Node y) { //returns the number of nodes in the tree if (y == null) return 0; int a = count(y.right); int b = count(y.left); return a + b + 1; // return count(y.right)+count(y.left)+1;}

Exmple 3 : check if search tree

14

boolean isBST(Node y) { //returns true if the tree is a //binary search tree if (y == null) return true; if (y.right == null && y.left == null) return true; if (!isBST(y.left) || !isBST(y.right)) return false if (y.left != null && max(y.left) < y.key && y.right != null && min(y.right) > y.key) return true; return false;}

Exmple 3 : more effienciently

15

class Resp { int min, max; boolean ok; Resp(int x, int y, boolean z){ min = x; max = y; ok = z; }}

resp isBST(Node y) { if (y == null) return new Resp(0,0,true); Resp a = null, b = null; c= new Resp(y.key, y.key, true); if (y.left != null) a = isBST(y.left); if (y.right != null) b = isBST(y.right); if ( a != null && b != null) { c.min = a.min; c.max = b.max; c.ok = a.ok && b.ok && a.max < y.key && b.min > y.key; } if (a != null && b == null) { c.min = a.min; c.ok = a.ok && a.max < y.key; } if ( a == null && b != null) { c.max = b.max; c.ok = b.ok && b.min > y.key; } return c;}

16

Chapter 7: Selected Algorithms

7.1 External Search

17

7.1 External Search

• The algorithms we have seen so far are good when all data are stored in primary storage device (RAM). Its access is fast(er)

• Big data sets are frequently stored in secondary storage devices (hard disk). Slow(er) access (about 100-1000 times slower)

Access: always to a complete block (page) of data (4096 bytes), which is stored in the RAM

For efficiency: keep the number of accesses to the pages low!

18

For external search: a variant of search trees:

1 node = 1 page

Multiple way search trees!

19

Definition (Multiple way-search trees)

An empty tree is a multiple way search tree with an empty set of keys {} .

Be T0, ..., Tn multiple way-search trees with keys taken from a common key set S, and be k1,...,kn a sequence of keys with k1 < ...< kn. Then is the sequence:

T0 k1 T1 k2 T2 k3 .... kn Tn

a multiple way-search trees only when:

• for all keys x from T0 x < k1 • for i=1,...,n-1, for all keys x in Ti, ki < x < ki+1 • for all keys x from Tn kn < x

20

B-Tree

Definition 7.1.2

A B-Tree of Order m is a multiple way tree with the following characteristics

• 1 #(keys in the root) 2m and m #(keys in the nodes) 2m for all other nodes.• All paths from the root to a leaf are equally long. • Each internal node (not leaf) which has s keys has exactly

s+1 children.

21

Example: a B-tree of order 2:

22

Assessment of B-trees

The minimal possible number of nodes in a B-tree of order m and height h:

• Number of nodes in each sub-tree 1 + (m+1) + (m+1)2 + .... + (m+1)h-1

= ( (m+1)h – 1) / m.

The root of the minimal tree has only one key and two children, all other nodes have m keys.

Altogether: number of keys n in a B-tree of height h: n 2 (m+1)h – 1

Thus the following holds for each B-tree of height h with n keys:

h logm+1 ((n+1)/2) .

23

Example

The following holds for each B-tree of height h with n keys:

h logm+1 ((n+1)/2).

Example: for• Page size: 1 KByte and • each entry plus pointer: 8 bytes, If we chose m=63, and for an ammount of data of n= 1 000 000 We have

h log 64 500 000.5 < 4 and with that hmax = 3.

24

Algorithms for searching keys in a B-tree

Algorithm search(r, x) //search for key x in the tree having as root node r; //global variable p in r, search for the first key y >= x or until no more keys if y == x {stop search, p = r, found} else if r a leaf {stop search, p = r, not found} else if not past last key search(pointer to node before y, x) else search(last pointer, x)

25

Algorithms for inserting and deleting of keys in a B-tree

Algorithm insert (r, x) //insert key x in the tree having root r search for x in tree having root r; if x was not found { be p the leaf where the search stopped; insert x in the right position; if p now has 2m+1 keys {overflow(p)} }

26

Algorithm overflow (p) = split (p)

Algorithm split (p) first case: p has a parent q.

Divide the overflowed node. The key of the middle goes to the parent.

remark: the splitting may go up until the root, in which case the height of the tree is incremented by one.

Algorithm Split (1)

27

Algorithm split (p) second case: p is

the root.

Divide overflowed node. Open a new level above containing a new root with the key of the middle (root has one key).

Algorithm Split (2)

28

//delete key x from tree having root r search for x in the tree with root r; if x found { if x is in an internal node { exchange x with the next bigger key x' in the tree // if x is in an internal node then there must // be at least one bigger number in the tree //this number is in a leaf ! } be p the leaf, containing x; erase x from p; if p is not in the root r { if p has m-1 keys {underflow (p)} } }

Algorithm delete (r,x)

29

Algorithm underflow (p)

if p has a neighboring node with s>m nodes { balance (p,p') }else // because p cannot be the root, p must have a

neighbor with m keys { be p' the neighbor with m keys; merge (p,p')}

30

Algorithm balance (p, p') // balance node p with its neighbor p'

(s > m , r = (m+s)/2 -m )

31

Algorithm merge (p,p') // merge node p with its neighbor perform the following operation:

afterwards:if( q <> root) and (q

has m-1 keys) underflow (q)

else (if(q= root) and (q empty)) {free q let root point to p^}

32

Recursion

If when performing underflow we have to perform merge, we might have to perform underflow again one level up

This process might be repeated until the root.

33

Example:B-Tree of order 2 (m = 2)

34

Cost

Be m the order of the B-tree,

n the number of keys.

Costs for search , insert and delete:

O(h) = O(logm+1 ((n+1)/2) )

= O(logm+1(n)).

35

Remark:

B-trees can also be used as internal storage structure:

Especially: B-trees of order 1 (then only one or 2 keys in each node – no elaborate search inside the nodes).

Cost of search, insert, delete: O(log n).

36

Remark: use of storage memory

Over 50%

reason: the condition:

1/2•k #(keys in the node) k

For nodes root

(k=2m)

37

Even higher usage ratio of memory is possible to achieve with the following condition ( 66%):

2/3•k #(keys in nodes) k

For all nodes and their children

This can be reached by 1) modified balancing also when inserting 2) split only then, when 2 neighbors are full.

Drawback : More frequent reorganization is necessary when inserting and deleting.

.

38

7.2 External Sorting

Problem: Sorting big amount of data, as in external searching, stored in blocks (pages).

efficiency: number of the access to pages should be kept low!

Strategy: Sorting algorithm which processes the data sequentially (no frequent page exchanges): MergeSort!

39

Start: n data in a file g1, divided in pages of size b:

Page 1: s1,…,sb

Page 2: sb+1,…s2b …

Page k: s(k-1)b+1 ,…,sn

( k = [n/b]+ )When sequentially processed: only k page accesses

instead of n.

40

Variation of MergeSort for external sorting

MergeSort: Divide-and-Conquer-Algorithm

for external sorting: without divide-step,

only merge.

Definition: run := ordered subsequence within a file.

Strategy: by merging increasingly generated runs until everything is sorted.

41

Algorithm

1. Step: Generate from the sequence in the input file g1

„starting runs“ and distribute them in two files f1 and f2,

with the same number of runs (1) in each.

(for this there are many strategies, later).

Now: use four files f1, f2, g1, g2.

42

2. Step (main step):

While the number of runs > 1 repeat: {

• Merge each two runs from f1 and f2 to a double sized run alternating to g1 und g2, until there are no more runs in f1 and f2.

• Merge each two runs from g1 and g2 to a double sized run alternating to f1 and f2, until there are no more runs in g1 und g2.

}

Each loop = two phases

1 chapter 6: searching trees and more sorting algorithms 6.1 binnary tree the bin tree class with...

Documents