1 chapter 6: searching trees and more sorting algorithms 6.1 binnary tree the bin tree class with...

42
1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees 6.3 HeapSort and BucketSort 6.3.1 HeapSort 6.3.2 BucketSort

Upload: leslie-evans

Post on 31-Dec-2015

219 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

1

Chapter 6:  Searching trees and more Sorting Algorithms

6.1   Binnary Tree

The Bin Tree class with traversing methods

6.2   Searching Trees

6.2.1   AVL Trees

6.3   HeapSort and BucketSort

6.3.1   HeapSort

6.3.2   BucketSort

Page 2: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

2

Addendum:

A Webseite with animations for AVL-trees:

http://www.seanet.com/users/arsen/avltree.html

A Webseite with animation for Heapsort:

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/heapsort.html

Page 3: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

3

W

Z

a

b

c

x

y

W Z

a

b

c

x y

new

new

First rotation

Second rotation

Page 4: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

4

6.3 BucketSort

All sorting procedures we have seen so far are based on the comparission of two keys

The general bottom bound for the cost for this kind of procedures is:

O(n log n).

For certain sets of keys:

Sorting without comparing keys and more efficient !

Page 5: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

5

Idea: use the keys to calculate the the storing addresses for the elements of the sequence to be sorted (like in Hashing).

Example (ideal situation, not frequent):

Set of n data objects {s0, ... , sn-1} with key values 0, ..., n-1, without duplicates given as an array S.

Sorting algoritm:

for(int i = 0, i < n, i++)

T[S[i].key] = S[i];

cost: O(n).

Page 6: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

6

BucketSort

Sets of n data objects {s0, ... , s n-1} with key values 0, ..., m-1, given as array S.

duplicate keys are allowed.

void BucketSort(S) { int i; int j;

for(j=0; j<m; j++) B[j] = null; //the buckets, lists for(i=0; i<n; i++) insert(S[i], B[S[i].key()] ); for(j=0; j<m; j++) output(B[j]);}cost: O(n+m).

Page 7: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

7

RadixSort

Sets of n data objects {s0, ... , sn-1} with key values 0, ..., nk -1, given as an array S. Duplicate keys allowed.

The bucketsort for that would take: O(n + nk).

Making it better (RadixSort): • Write the keys on base n. We have numbers of k ciphers• Run k times the BucketSort algorithm sorting the objects

according to each cipher, in order, starting from the less significant cipher (last?)until the most significant one (first?) (e.g. using mod and div).

cost: O(k•n).

Page 8: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

8

Example for RadixSort:

n=10, k=2.

Sequence to be sorted:

64, 17, 3, 99, 79, 78, 19, 13, 67, 34.

1. step: insert them in buckets according to the last cipher:

after that, output them in the order they are :

3, 13, 64, 34, 17, 67, 78, 99, 79, 19

0 1 2 3 4 5 6 7 8 9

3

13

64

34

17

67

78 99

79

19

Page 9: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

9

Continuation RadixSort

2nd. step: the sequence obtained from the step 1

3, 13, 64, 34, 17, 67, 78, 99, 79, 19

Insert in the buckets according to the penultimate cipher

and output them:

3, 13, 17, 19, 34, 64, 67, 78, 79, 99.

0 1 2 3 4 5 6 7 8 9

3 13

17

19

34 64

67

78

79

99

Page 10: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

10

Generalizing:

Ciphers in different possitions can have a different value range.

Example: Date=(year, month, day)

( [0..9999], [1..12], [1..31] )

BucketSort the dates according to day, month and year.

Page 11: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

General things about binary trees

• They are recursive structures, this means, many algorithms over them are “better” (shorter, more elegant) expressed in a recursive way

• This means, in most cases it is necessary to execute recursively the algorithm on one or both sub-trees and analyze the root node (the order may vary according to the task)

• (one of) The base case(s) (when there is no recursive call any more) is when the pointer to the root of the

(sub-)tree is null (empty tree)• To improve efficiency we can avoid recursive calls when there

is no child;

11

Page 12: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

Example 1 : search

12

Node search(int x, Node y) { //returns a pointer to the node containing y //null if it is not in the tree if (y == null) return null; if (y.key == x) return y if (y.key > x) return search(x, y.left); return search(x, y.right);}

Page 13: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

Exmple 2 : count

13

int count(Node y) { //returns the number of nodes in the tree if (y == null) return 0; int a = count(y.right); int b = count(y.left); return a + b + 1; // return count(y.right)+count(y.left)+1;}

Page 14: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

Exmple 3 : check if search tree

14

boolean isBST(Node y) { //returns true if the tree is a //binary search tree if (y == null) return true; if (y.right == null && y.left == null) return true; if (!isBST(y.left) || !isBST(y.right)) return false if (y.left != null && max(y.left) < y.key && y.right != null && min(y.right) > y.key) return true; return false;}

Page 15: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

Exmple 3 : more effienciently

15

class Resp { int min, max; boolean ok; Resp(int x, int y, boolean z){ min = x; max = y; ok = z; }}

resp isBST(Node y) { if (y == null) return new Resp(0,0,true); Resp a = null, b = null; c= new Resp(y.key, y.key, true); if (y.left != null) a = isBST(y.left); if (y.right != null) b = isBST(y.right); if ( a != null && b != null) { c.min = a.min; c.max = b.max; c.ok = a.ok && b.ok && a.max < y.key && b.min > y.key; } if (a != null && b == null) { c.min = a.min; c.ok = a.ok && a.max < y.key; } if ( a == null && b != null) { c.max = b.max; c.ok = b.ok && b.min > y.key; } return c;}

Page 16: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

16

Chapter 7:  Selected Algorithms

7.1   External Search

Page 17: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

17

7.1 External Search

• The algorithms we have seen so far are good when all data are stored in primary storage device (RAM). Its access is fast(er)

• Big data sets are frequently stored in secondary storage devices (hard disk). Slow(er) access (about 100-1000 times slower)

Access: always to a complete block (page) of data (4096 bytes), which is stored in the RAM

For efficiency: keep the number of accesses to the pages low!

Page 18: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

18

For external search: a variant of search trees:

1 node = 1 page

Multiple way search trees!

Page 19: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

19

Definition (Multiple way-search trees)

An empty tree is a multiple way search tree with an empty set of keys {} .

Be T0, ..., Tn multiple way-search trees with keys taken from a common key set S, and be k1,...,kn a sequence of keys with k1 < ...< kn. Then is the sequence:

T0 k1 T1 k2 T2 k3 .... kn Tn

a multiple way-search trees only when:

• for all keys x from T0 x < k1 • for i=1,...,n-1, for all keys x in Ti, ki < x < ki+1 • for all keys x from Tn kn < x

Page 20: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

20

B-Tree

Definition 7.1.2

A B-Tree of Order m is a multiple way tree with the following characteristics

• 1 #(keys in the root) 2m and m #(keys in the nodes) 2m for all other nodes.• All paths from the root to a leaf are equally long. • Each internal node (not leaf) which has s keys has exactly

s+1 children.

Page 21: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

21

Example: a B-tree of order 2:

Page 22: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

22

Assessment of B-trees

The minimal possible number of nodes in a B-tree of order m and height h:

• Number of nodes in each sub-tree 1 + (m+1) + (m+1)2 + .... + (m+1)h-1

=  ( (m+1)h – 1) / m.

The root of the minimal tree has only one key and two children, all other nodes have m keys.

Altogether: number of keys n in a B-tree of height h: n 2 (m+1)h – 1

Thus the following holds for each B-tree of height h with n keys:

h logm+1 ((n+1)/2) .

Page 23: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

23

Example

The following holds for each B-tree of height h with n keys:

h logm+1 ((n+1)/2).

Example: for• Page size: 1 KByte and • each entry plus pointer: 8 bytes, If we chose m=63, and for an ammount of data of n= 1 000 000 We have

     h log 64 500 000.5 < 4 and with that hmax = 3.

Page 24: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

24

Algorithms for searching keys in a B-tree

Algorithm search(r, x) //search for key x in the tree having as root node r; //global variable p in r, search for the first key y >= x or until no more keys if y == x {stop search, p = r, found} else if r a leaf {stop search, p = r, not found} else if not past last key search(pointer to node before y, x) else search(last pointer, x)

Page 25: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

25

Algorithms for inserting and deleting of keys in a B-tree

Algorithm insert (r, x) //insert key x in the tree having root r search for x in tree having root r; if x was not found { be p the leaf where the search stopped; insert x in the right position; if p now has 2m+1 keys {overflow(p)} }

Page 26: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

26

Algorithm overflow (p) = split (p)

Algorithm split (p) first case: p has a parent q.

Divide the overflowed node. The key of the middle goes to the parent.

remark: the splitting may go up until the root, in which case the height of the tree is incremented by one.

Algorithm Split (1)

Page 27: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

27

Algorithm split (p) second case: p is

the root.

Divide overflowed node. Open a new level above containing a new root with the key of the middle (root has one key).

Algorithm Split (2)

Page 28: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

28

//delete key x from tree having root r search for x in the tree with root r; if x found { if x is in an internal node { exchange x with the next bigger key x' in the tree // if x is in an internal node then there must // be at least one bigger number in the tree //this number is in a leaf ! } be p the leaf, containing x; erase x from p; if p is not in the root r { if p has m-1 keys {underflow (p)} } }

Algorithm delete (r,x)

Page 29: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

29

Algorithm underflow (p)

if p has a neighboring node with s>m nodes { balance (p,p') }else // because p cannot be the root, p must have a

neighbor with m keys { be p' the neighbor with m keys; merge (p,p')}

Page 30: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

30

Algorithm balance (p, p') // balance node p with its neighbor p'

(s > m , r = (m+s)/2 -m )

Page 31: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

31

Algorithm merge (p,p') // merge node p with its neighbor perform the following operation:

afterwards:if( q <>  root) and (q

has m-1 keys) underflow (q)

else (if(q= root) and (q empty)) {free q let root point to p^}

Page 32: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

32

Recursion

If when performing underflow we have to perform merge, we might have to perform underflow again one level up

This process might be repeated until the root.

Page 33: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

33

Example:B-Tree of order 2 (m = 2)

Page 34: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

34

Cost

Be m the order of the B-tree,

n the number of keys.

Costs for search , insert and delete:

O(h) = O(logm+1 ((n+1)/2) )

= O(logm+1(n)).

Page 35: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

35

Remark:

B-trees can also be used as internal storage structure:

Especially: B-trees of order 1 (then only one or 2 keys in each node – no elaborate search inside the nodes).

Cost of search, insert, delete: O(log n).

Page 36: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

36

Remark: use of storage memory

Over 50%

reason: the condition:

1/2•k #(keys in the node) k

For nodes root

(k=2m)

Page 37: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

37

Even higher usage ratio of memory is possible to achieve with the following condition ( 66%):

2/3•k #(keys in nodes) k

For all nodes and their children

This can be reached by 1) modified balancing also when inserting 2) split only then, when 2 neighbors are full.

Drawback : More frequent reorganization is necessary when inserting and deleting.

.

Page 38: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

38

7.2 External Sorting

Problem: Sorting big amount of data, as in external searching, stored in blocks (pages).

efficiency: number of the access to pages should be kept low!

Strategy: Sorting algorithm which processes the data sequentially (no frequent page exchanges): MergeSort!

Page 39: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

39

Start: n data in a file g1, divided in pages of size b:

Page 1: s1,…,sb

Page 2: sb+1,…s2b …

Page k: s(k-1)b+1 ,…,sn

( k = [n/b]+ )When sequentially processed: only k page accesses

instead of n.

Page 40: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

40

Variation of MergeSort for external sorting

MergeSort: Divide-and-Conquer-Algorithm

for external sorting: without divide-step,

only merge.

Definition: run := ordered subsequence within a file.

Strategy: by merging increasingly generated runs until everything is sorted.

Page 41: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

41

Algorithm

1. Step: Generate from the sequence in the input file g1

„starting runs“ and distribute them in two files f1 and f2,

with the same number of runs (1) in each.

(for this there are many strategies, later).

Now: use four files f1, f2, g1, g2.

Page 42: 1 Chapter 6: Searching trees and more Sorting Algorithms 6.1 Binnary Tree The Bin Tree class with traversing methods 6.2 Searching Trees 6.2.1 AVL Trees

42

2. Step (main step):

While the number of runs > 1 repeat: {

• Merge each two runs from f1 and f2 to a double sized run alternating to g1 und g2, until there are no more runs in f1 and f2.

• Merge each two runs from g1 and g2 to a double sized run alternating to f1 and f2, until there are no more runs in g1 und g2.

}

Each loop = two phases