
Albert Chan, http://www.scs.carleton.ca/~achan

School of Computer Science, Carleton University

COMP 2002/2402 Introduction to Data Structures and Data Types

Version 03.s

Search Trees

• Search Trees are tree structures that are used for (dictionary) search.

• Many kinds of search trees:

– Binary Search Trees

– AVL Trees

– 2-3-4 Trees

– Red-Black Trees

– …


Binary Search Trees

• A Binary Search Tree (BST) is defined as:

– a binary tree;

– each internal node v stores an item (k, e) of a dictionary;

– any subtree of the tree is also a binary search tree;

– keys stored at nodes in the left subtree of v are less than or equal to k;

– keys stored at nodes in the right subtree of v are greater than or equal to k;

– external nodes do not hold elements but serve as place holders.
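The definition above can be turned into a small checker. The following is a minimal Python sketch, not part of the original slides: the `Node` class, the `internal` helper, and `is_bst` are hypothetical names, and external nodes are modelled as nodes with no key and no children.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """A tree node; an external (placeholder) node has no key and no children."""
    key: Any = None
    elem: Any = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_external(self) -> bool:
        return self.left is None and self.right is None


def internal(key, elem=None, left=None, right=None) -> Node:
    """Build an internal node, filling missing children with external placeholders."""
    return Node(key, elem,
                left if left is not None else Node(),
                right if right is not None else Node())


def is_bst(v: Node, lo=None, hi=None) -> bool:
    """Check that every key in the subtree rooted at v lies in [lo, hi].

    A bound of None means "unbounded". Equal keys are allowed on either side,
    matching the "less than or equal" / "greater than or equal" wording above.
    """
    if v.is_external():
        return True
    if lo is not None and v.key < lo:
        return False
    if hi is not None and v.key > hi:
        return False
    return is_bst(v.left, lo, v.key) and is_bst(v.right, v.key, hi)
```

For instance, `is_bst(internal(44, left=internal(17), right=internal(88)))` holds, while a tree that stores 50 in the left subtree of 44 is rejected.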


Binary Search Trees

[Figure: example binary search tree with keys 17, 28, 29, 32, 44, 54, 65, 76, 80, 83, 88, 97]


Searches in BST

• A binary search tree T is a decision tree, where the question asked at an internal node v is whether the search key k is less than, equal to, or greater than the key stored at v.

• We start with the root and keep asking the question until we find the node we are looking for, or until we know that the node we are looking for does not exist.


Pseudo Code

Algorithm TreeSearch (k, v):
    Input: A search key k and a node v of a binary search tree T.
    Output: A node w of the subtree T(v) of T rooted at v, such that either
        w is an internal node storing key k, or w is the external node
        encountered in the inorder traversal of T(v) after all the
        internal nodes with keys smaller than k and before all the
        internal nodes with keys greater than k.

    if v is an external node then
        return v
    if k = key (v) then
        return v
    if k < key (v) then
        return TreeSearch (k, T.leftChild (v))
    else { k > key (v) }
        return TreeSearch (k, T.rightChild (v))
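The pseudocode above translates almost line for line into Python. This is a minimal sketch under assumptions: the `Node` class and `make_internal` helper are hypothetical, external nodes are nodes without children, and the keys in the usage note are illustrative only.

```python
class Node:
    """Node of a binary search tree; external nodes have no key and no children."""

    def __init__(self, key=None, elem=None):
        self.key = key
        self.elem = elem
        self.left = None
        self.right = None

    def is_external(self):
        return self.left is None and self.right is None


def make_internal(key, left=None, right=None):
    """Build an internal node; missing children become external placeholders."""
    node = Node(key)
    node.left = left if left is not None else Node()
    node.right = right if right is not None else Node()
    return node


def tree_search(k, v):
    """Return the internal node storing k, or the external node where k would go."""
    if v.is_external():
        return v
    if k == v.key:
        return v
    if k < v.key:
        return tree_search(k, v.left)
    return tree_search(k, v.right)
```

With a small hypothetical tree rooted at 44 (left child 17 with right child 32, right child 88 with children 65 and 97), `tree_search(65, root)` ends at an internal node, while `tree_search(25, root)` ends at the external left child of 32.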


Example

• Successful: findElement (76)

• A successful search traverses a path starting at the root and ending at an internal node

[Figure: the example binary search tree, illustrating the search for key 76]


Example

• Unsuccessful: findElement (25)

• An unsuccessful search traverses a path starting at the root and ending at an external node

[Figure: the example binary search tree, illustrating the search for key 25]


Insertion

• To perform insertItem(k, e), let w be the node returned by TreeSearch(k, T.root ())

• If w is external, we know that k is not stored in T. We can then call expandExternal (w) on T and store (k, e) in w.


Insertion

• If w is internal, we know another item with key k is stored at w. We call the algorithm recursively starting at T.rightChild (w)
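The two insertion cases can be sketched in Python as follows. This is an illustrative sketch rather than the course's implementation: `Node`, `tree_search`, `expand_external`, `insert_item`, and `inorder` are assumed names, and the recursive retry in the right subtree is written as a loop.

```python
class Node:
    """BST node; a node with no children is an external placeholder."""

    def __init__(self):
        self.key = None
        self.elem = None
        self.left = None
        self.right = None

    def is_external(self):
        return self.left is None


def tree_search(k, v):
    """TreeSearch, written iteratively: stop at a matching or external node."""
    while not v.is_external() and k != v.key:
        v = v.left if k < v.key else v.right
    return v


def expand_external(w):
    """Turn external node w into an internal node with two external children."""
    w.left, w.right = Node(), Node()


def insert_item(root, k, e):
    """Insert item (k, e) below root and return the node that now stores it."""
    w = tree_search(k, root)
    while not w.is_external():        # another item with key k is stored at w
        w = tree_search(k, w.right)   # continue the search in w's right subtree
    expand_external(w)
    w.key, w.elem = k, e
    return w


def inorder(v):
    """List the (key, element) pairs of the subtree at v in inorder."""
    if v.is_external():
        return []
    return inorder(v.left) + [(v.key, v.elem)] + inorder(v.right)
```

An empty tree is a single external node, so `root = Node()` is a valid starting point; a duplicate key ends up in the right subtree of the node that already holds that key.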


Removal

• We locate the node w where the key is stored with algorithm TreeSearch

• If w has an external child z, we remove w and z with removeAboveExternal (z)


Removal

• If w has no external children, we find the internal node y following w in the inorder traversal

• move the item at y into w

• perform the operation removeAboveExternal (x), where x is the left child of y (guaranteed to be external)
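Both removal cases can be sketched in Python. The sketch below is an assumption-laden illustration: the `BST` class, its parent pointers, and the method names are hypothetical, chosen so that `remove_above_external` can splice a node out without searching again.

```python
class Node:
    """BST node with a parent pointer; a node with no children is external."""

    def __init__(self, parent=None):
        self.key = None
        self.elem = None
        self.left = None
        self.right = None
        self.parent = parent

    def is_external(self):
        return self.left is None


class BST:
    def __init__(self):
        self.root = Node()                       # empty tree: one external node

    def search(self, k, v=None):
        v = self.root if v is None else v
        while not v.is_external() and k != v.key:
            v = v.left if k < v.key else v.right
        return v

    def insert(self, k, e=None):
        w = self.search(k)
        while not w.is_external():               # duplicate key: go right
            w = self.search(k, w.right)
        w.key, w.elem = k, e
        w.left, w.right = Node(w), Node(w)

    def remove_above_external(self, z):
        """Remove external node z and its parent w; z's sibling replaces w."""
        w = z.parent
        sibling = w.right if z is w.left else w.left
        sibling.parent = w.parent
        if w.parent is None:
            self.root = sibling
        elif w.parent.left is w:
            w.parent.left = sibling
        else:
            w.parent.right = sibling

    def remove(self, k):
        """Remove one item with key k and return its element (None if absent)."""
        w = self.search(k)
        if w.is_external():
            return None
        elem = w.elem
        if w.left.is_external():
            self.remove_above_external(w.left)
        elif w.right.is_external():
            self.remove_above_external(w.right)
        else:
            y = w.right                          # inorder successor of w:
            while not y.left.is_external():      # leftmost internal node of
                y = y.left                       # w's right subtree
            w.key, w.elem = y.key, y.elem        # move the item at y into w
            self.remove_above_external(y.left)   # y's left child is external
        return elem


def inorder_keys(v):
    if v.is_external():
        return []
    return inorder_keys(v.left) + [v.key] + inorder_keys(v.right)
```

Removing a key whose node has two internal children overwrites that node with its inorder successor and then removes the successor's old position, so the inorder sequence of the remaining keys is unchanged.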


Time Complexity

• A search, insertion, or removal visits the nodes along a root-to-leaf path, plus possibly the siblings of such nodes

• Time O(1) is spent at each node

• The running time of each operation is O(h), where h is the height of the tree


Time Complexity

• The height of a binary search tree is O(n) in the worst case, where the binary search tree looks like a sorted sequence.

• To achieve good running time, we need to keep the tree balanced, i.e., with O(log n) height

[Figure: an unbalanced binary search tree with keys 40, 30, 20, 10]
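The worst case is easy to reproduce. The following is a small Python experiment, offered as a sketch: `Node`, `insert`, `height`, and `build` are hypothetical names, `None` stands in for an external node, and height counts internal nodes on the longest root-to-leaf path.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) BST insertion; None plays the role of an external node."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right


def height(v):
    """Height of the subtree at v, with height(None) = 0."""
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root
```

Inserting 40, 30, 20, 10 in that order gives a chain of height 4, whereas the order 30, 20, 40, 10 gives height 3. In general, n keys inserted in sorted order produce height n.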


AVL Trees

• Named after their inventors, Adel’son-Vel’skii and Landis, AVL trees are special binary search trees that are guaranteed to be nearly balanced.

• Since the trees are always nearly balanced, the worst-case running time of each search, insertion, or removal on an AVL tree is bounded by O(log n).


AVL Trees

• An AVL Tree is defined as:

– a binary search tree;

– each internal node stores an item (k, e) of a dictionary;

– any subtree of an AVL tree is also an AVL tree;

– for every internal node v of T, the heights of the children of v can differ by at most 1.

– external nodes do not hold elements but serve as place holders.


Height of AVL Trees

• Proposition: The height of an AVL tree T storing n keys is O(log n).

• Justification: The easiest way to approach this problem is to try to find the minimum number of internal nodes of an AVL tree of height h: N(h).

• We have the base cases that N(1) = 1 and N(2) = 2.


Induction

• For h ≥ 3, an AVL tree of height h with N(h) minimal contains the root node, one AVL subtree of height h-1, and one AVL subtree of height h-2.

• That is, N(h) = 1 + N(h-1) + N(h-2)

• Knowing N(h-1) > N(h-2), we get N(h) > 2N(h-2), and by repeated substitution:

– N(h) > 2N(h-2)

– N(h) > 4N(h-4)

– …

– N(h) > 2^i N(h-2i)

• Let i = h/2 - 1, we get: N(h) > 2^(h/2-1)

• Taking logarithms: h < 2 log N(h) + 2

• Since n ≥ N(h), the height of an AVL tree storing n keys is O(log n)
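The recurrence and the resulting bounds can be checked numerically. This is a short Python sketch; the function name `min_internal_nodes` is an assumption, and logarithms are taken to base 2.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def min_internal_nodes(h):
    """N(h): minimum number of internal nodes of an AVL tree of height h >= 1."""
    if h == 1:
        return 1
    if h == 2:
        return 2
    return 1 + min_internal_nodes(h - 1) + min_internal_nodes(h - 2)


def bounds_hold(h):
    """Check N(h) > 2^(h/2 - 1) and h < 2 log N(h) + 2 for one height h."""
    n = min_internal_nodes(h)
    return n > 2 ** (h / 2 - 1) and h < 2 * math.log2(n) + 2
```

The first few values are N(1) = 1, N(2) = 2, N(3) = 4, N(4) = 7, N(5) = 12, and `bounds_hold(h)` can be evaluated for any range of heights of interest.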


A More Accurate Estimation

• We can have a more accurate estimation on the height of an AVL tree.

• Before we start, we must look at some mathematical material:


Fibonacci Numbers Revisited

• The Fibonacci numbers are defined as:

– F(0) = 0

– F(1) = 1

– F(n) = F(n-1) + F(n-2) for all n ≥ 2

• The Fibonacci numbers form the sequence:

– 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …

• The Fibonacci numbers F(n) grow exponentially with n. That is, F(n) = O(2^n).


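Fibonacci Numbers and AVL Trees

• Using induction, we can prove that the Fibonacci number F(n) satisfies:

$$F(n) = \frac{\alpha^{n} - \beta^{n}}{\sqrt{5}}, \quad \text{where } \alpha = \frac{1 + \sqrt{5}}{2} \text{ and } \beta = \frac{1 - \sqrt{5}}{2}$$

• We can also prove that the minimum number of internal nodes in an AVL tree of height h is:

$$N(h) = F(h+2) - 1 = \frac{\alpha^{h+2} - \beta^{h+2}}{\sqrt{5}} - 1$$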
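Both identities can be spot-checked numerically. The sketch below is illustrative only: the function names are assumptions, and the closed form is evaluated in floating point and rounded, so it is reliable only for moderate n.

```python
import math

ALPHA = (1 + math.sqrt(5)) / 2
BETA = (1 - math.sqrt(5)) / 2


def fib(n):
    """F(n) from the recursive definition, computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed(n):
    """F(n) from the closed form (alpha^n - beta^n) / sqrt(5), rounded."""
    return round((ALPHA ** n - BETA ** n) / math.sqrt(5))


def min_internal_nodes(h):
    """N(h) from N(1) = 1, N(2) = 2, N(h) = 1 + N(h-1) + N(h-2)."""
    prev, cur = 1, 2
    if h == 1:
        return prev
    for _ in range(h - 2):
        prev, cur = cur, 1 + cur + prev
    return cur
```

For example, N(3) = 4 = F(5) - 1 and N(5) = 12 = F(7) - 1.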


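Height of AVL Trees

• So, now we have (neglecting the $\beta^{h+2}$ term, since $|\beta| < 1$, and with logarithms to base 2):

$$n \ge \frac{\alpha^{h+2}}{\sqrt{5}} - 1$$

$$\alpha^{h+2} \le \sqrt{5}\,(n+1)$$

$$(h+2)\log\alpha \le \log\sqrt{5} + \log(n+1)$$

$$h \le \frac{\log\sqrt{5} + \log(n+1)}{\log\alpha} - 2 \approx 1.44\log(n+1) - 0.328$$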


Searching the AVL Trees

• Since AVL trees are binary search trees, it is no surprise that searching an AVL tree is the same as searching a binary search tree.

• Insertion and removal, on the other hand, are much more involved.


Insertions

• A binary search tree T is called balanced if, for every node v, the heights of v’s children differ by at most one.

• Inserting a node into an AVL tree involves performing an expandExternal (w) on T, which changes the heights of some of the nodes in T.

• If an insertion causes T to become unbalanced, we travel up the tree from the newly created node until we find the first node x such that its grandparent z is an unbalanced node.


Insertions

• Since z became unbalanced by an insertion in the subtree rooted at its child y, height(y) = height(sibling(y)) + 2

• To rebalance the subtree rooted at z, we must perform a restructuring:

– we rename x, y, and z to a, b, and c based on the order of the nodes in an in-order traversal.

– z is replaced by b, whose children are now a and c, whose children, in turn, consist of the four other subtrees that formerly were children of x, y, and z.
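Insertion with rebalancing can be sketched compactly in Python. Note the swap of technique: instead of walking up parent pointers and calling a single restructure operation, this sketch uses the common recursive formulation with stored heights and explicit left and right rotations, which yields the same rebalanced shapes. All names (`Node`, `ht`, `rebalance`, `insert`, and so on) are assumptions, and `None` stands in for an external node.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1          # a single internal node has height 1


def ht(v):
    """Stored height of the subtree at v; an external node (None) has height 0."""
    return v.height if v is not None else 0


def update(v):
    v.height = 1 + max(ht(v.left), ht(v.right))


def rotate_right(z):
    """Single rotation: z's left child y becomes the root of the subtree."""
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


def rotate_left(z):
    """Single rotation: z's right child y becomes the root of the subtree."""
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def rebalance(z):
    """Restore the height-balance property at z; return the new subtree root."""
    update(z)
    if ht(z.left) - ht(z.right) > 1:
        if ht(z.left.left) < ht(z.left.right):    # taller grandchild is inner:
            z.left = rotate_left(z.left)          # double rotation
        return rotate_right(z)
    if ht(z.right) - ht(z.left) > 1:
        if ht(z.right.right) < ht(z.right.left):  # taller grandchild is inner:
            z.right = rotate_right(z.right)       # double rotation
        return rotate_left(z)
    return z


def insert(v, key):
    """Insert key into the AVL subtree rooted at v; return the new root."""
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    return rebalance(v)


def true_height(v):
    return 0 if v is None else 1 + max(true_height(v.left), true_height(v.right))


def is_avl(v):
    """Recompute heights from scratch and check the balance property everywhere."""
    if v is None:
        return True
    return (abs(true_height(v.left) - true_height(v.right)) <= 1
            and is_avl(v.left) and is_avl(v.right))


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)
```

Inserting 3, 1, 2 in that order triggers a double rotation and leaves 2 at the root. Inserting keys in sorted order, the worst case for a plain binary search tree, keeps the height logarithmic.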


Restructuring

• When the AVL tree becomes unbalanced due to insertion and/or removal, we have to rebalance the tree so it is still an AVL tree.

• The rebalancing operation is called restructuring. This operation can be divided into two groups:

– single rotations

– double rotations

• Each of these groups has two different scenarios.


Single Rotation

[Figure: single rotation, before and after, with a = z, b = y, c = x and subtrees T0, T1, T2, T3]


Single Rotation

[Figure: single rotation, before and after, with a = x, b = y, c = z and subtrees T0, T1, T2, T3]


Double Rotation

[Figure: double rotation, before and after, with a = z, b = x, c = y and subtrees T0, T1, T2, T3]


Double Rotation

[Figure: double rotation, before and after, with a = y, b = x, c = z and subtrees T0, T1, T2, T3]


Algorithm restructure (x)

Algorithm restructure (x):
    Input: A node x of a binary search tree T that has both a
        parent y and a grandparent z
    Output: Tree T restructured by a rotation (either single or
        double) involving nodes x, y, and z.

    let (a, b, c) be an inorder listing of the nodes x, y, z
    let (T0, T1, T2, T3) be an inorder listing of the four subtrees
        of x, y, and z not rooted at x, y, or z
    replace the subtree rooted at z with a new subtree rooted at b
    let a be the left child of b
    let T0, T1 be the left and right subtrees of a, respectively
    let c be the right child of b
    let T2, T3 be the left and right subtrees of c, respectively
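The restructure algorithm above handles all four rotation cases with one piece of code. Here is a minimal Python sketch under assumptions: nodes carry parent pointers, `None` stands in for an external node, and the names `Node`, `set_left`, `set_right`, `restructure`, and `inorder` are hypothetical. The function returns b so that the caller can update the tree's root when z was the root.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.parent = None
        self.left = None
        self.right = None
        self.set_left(left)
        self.set_right(right)

    def set_left(self, child):
        self.left = child
        if child is not None:
            child.parent = self

    def set_right(self, child):
        self.right = child
        if child is not None:
            child.parent = self


def restructure(x):
    """Trinode restructuring of x, its parent y, and its grandparent z."""
    y = x.parent
    z = y.parent
    # Inorder listing (a, b, c) of x, y, z and (t0..t3) of their other subtrees.
    if y is z.left:
        if x is y.left:                      # single rotation: x < y < z
            a, b, c = x, y, z
            t0, t1, t2, t3 = x.left, x.right, y.right, z.right
        else:                                # double rotation: y < x < z
            a, b, c = y, x, z
            t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    else:
        if x is y.right:                     # single rotation: z < y < x
            a, b, c = z, y, x
            t0, t1, t2, t3 = z.left, y.left, x.left, x.right
        else:                                # double rotation: z < x < y
            a, b, c = z, x, y
            t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    # Replace the subtree rooted at z with a new subtree rooted at b.
    top = z.parent
    z_was_left = top is not None and top.left is z
    b.parent = top
    if top is not None:
        if z_was_left:
            top.left = b
        else:
            top.right = b
    a.set_left(t0)
    a.set_right(t1)
    c.set_left(t2)
    c.set_right(t3)
    b.set_left(a)
    b.set_right(c)
    return b


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)
```

Because a, b, c and T0 to T3 are taken in inorder, the inorder sequence of keys is the same before and after the call; only the shape of the subtree changes.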


Removal

• Performing a removeAboveExternal (w) can cause T to become unbalanced.

• Let z be the first unbalanced node encountered while travelling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height.

• We can perform operation restructure (x) to restore balance at the subtree rooted at z.

• As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached.


AVL Trees Summary

• Guaranteed O(log n) performance for all operations:

– Search

– Insertion

– Removal

• A bit more complex than plain binary search trees:

– Restructure through rotation

– At most one restructuring for each insertion

– Up to O(h) = O(log n) restructurings for each removal.


Other Search Trees

• 2-3-4 trees

• Red-black trees

• AA trees

• Splay trees

• B trees

• B+ trees

• …

• Many others