trees col 106 - indian institute of technology...

62
1 The tree data structure Trees COL 106 1 Acknowledgement :Many slides are courtesy Douglas Harder, UWaterloo Amit Kumar Shweta Agrawal

Upload: others

Post on 16-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

1

The tree data structure

TreesCOL 106

1

Acknowledgement :Many slides are courtesy

Douglas Harder, UWaterloo

Amit Kumar

Shweta Agrawal

Page 2: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

3

The tree data structure

Trees

A rooted tree data structure stores information in nodes

– Similar to linked lists:

• There is a first node, or root

• Each node has variable number of references to successors

(children)

• Each node, other than the root, has exactly one node as its

predecessor (or parent)

Page 3: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

4

The tree data structure

What are trees suitable for ?

4

Page 4: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

5

The tree data structure

To store hierarchy of people

Page 5: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

6

The tree data structure

To store organization of departments

6

Page 6: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

7

The tree data structure

To capture the evolution of languages

7

Page 7: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

8

The tree data structure

To organize file-systems

Unix file system

Page 8: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

9

The tree data structure

Markup elements in a webpage

9

Page 9: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

10

The tree data structure

To store phylogenetic data

10

This will be our running example. Will illustrate tree

concepts using actual phylogenetic data.

Page 10: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

11

The tree data structure

Terminology

All nodes will have zero or more child nodes or children

– I has three children: J, K and L

For all nodes other than the root node, there is one

parent node

– H is the parent of I

Page 11: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

12

The tree data structure

Terminology

The degree of a node is defined as the number of its

children: deg(I) = 3

Nodes with the same parent are siblings

– J, K, and L are siblings

Page 12: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

13

The tree data structure

Terminology

Phylogenetic trees have nodes with degree 2 or 0:

Page 13: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

14

The tree data structure

Terminology

Nodes with degree zero are also called leaf nodes

All other nodes are said to be internal nodes, that is, they

are internal to the tree

Page 14: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

15

The tree data structure

Terminology

Leaf nodes:

Page 15: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

16

The tree data structure

Terminology

Internal nodes:

Page 16: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

17

The tree data structure

Terminology

These trees are equal if the order of the children is ignored (Unordered trees )

They are different if order is relevant (ordered trees)– We will usually examine ordered trees (linear orders)

– In a hierarchical ordering, order is not relevant

Page 17: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

18

The tree data structure

Terminology

The shape of a rooted tree gives a natural

flow from the root node, or just root

Page 18: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

19

The tree data structure

Terminology

A path is a sequence of nodes

(a0, a1, ..., an)

where ak + 1 is a child of ak is

The length of this path is n

E.g., the path (B, E, G)

has length 2

Page 19: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

20

The tree data structure

Terminology

Paths of length 10 (11 nodes) and 4 (5 nodes)

Start of these paths

End of these paths

Page 20: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

21

The tree data structure

Terminology

For each node in a tree, there exists a unique path from

the root node to that node

The length of this path is the depth of the node, e.g.,

– E has depth 2

– L has depth 3

Page 21: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

22

The tree data structure

Terminology

Nodes of depth up to 17

9

14

17

4

0

Page 22: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

23

The tree data structure

Terminology

The height of a tree is defined as the maximum depth of any node within the tree

The height of a tree with one node is 0

– Just the root node

For convenience, we define the height of the empty tree to be –1

Page 23: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

24

The tree data structure

Terminology

The height of this tree is 17

17

Page 24: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

25

The tree data structure

Terminology

If a path exists from node a to node b:– a is an ancestor of b

– b is a descendent of a

Thus, a node is both an ancestor and a descendant of itself– We can add the adjective strict to exclude

equality: a is a strict descendent of b if a is a descendant of b but a ≠ b

The root node is an ancestor of all nodes

Page 25: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

26

The tree data structure

Terminology

The descendants of node B are B, C, D, E, F, and G:

The ancestors of node I are I, H, and A:

Page 26: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

27

The tree data structure

Terminology

All descendants (including itself) of the indicated node

Page 27: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

28

The tree data structure

Terminology

All ancestors (including itself) of the indicated node

Page 28: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

29

The tree data structure

Terminology

Another approach to a tree is to define the tree recursively:– A degree-0 node is a tree

– A node with degree n is a tree if it has n children and all of its children are disjoint trees (i.e., with no intersecting nodes)

Given any node a within a treewith root r, the collection of a andall of its descendants is said tobe a subtree of the tree withroot a

Page 29: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

30

The tree data structure

Example: XHTML

Consider the following XHTML document

<html>

<head>

<title>Hello World!</title>

</head>

<body>

<h1>This is a <u>Heading</u></h1>

<p>This is a paragraph with some

<u>underlined</u> text.</p>

</body>

</html>

Page 30: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

31

The tree data structure

Example: XHTML

Consider the following XHTML document<html>

<head>

<title>Hello World!</title>

</head>

<body>

<h1>This is a <u>Heading</u></h1>

<p>This is a paragraph with some

<u>underlined</u> text.</p>

</body>

</html>

heading

underlining

paragraph

body of page

title

Page 31: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

32

The tree data structure

Example: XHTML

The nested tags define a tree rooted at the HTML tag<html>

<head>

<title>Hello World!</title>

</head>

<body>

<h1>This is a <u>Heading</u></h1>

<p>This is a paragraph with some

<u>underlined</u> text.</p>

</body>

</html>

Page 32: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

33

The tree data structure

Example: XHTML

Web browsers render this tree as a web page

Page 33: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Iterator ADTMost ADTs in Java can provide an iterator object, used to traverse all the data in any linear ADT.

Page 34: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Iterator Interface

public interface Iterator<E>{

boolean hasNext();

E next();

void remove(); // Optional

}

Page 35: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Getting an Iterator

You get an iterator from an ADT by calling the method iterator();

Iterator<Integer> iter = myList.iterator();

Page 36: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Now a simple while loop can process each data value in the ADT:

while(iter.hasNext()) {

process iter.next()

}

Page 37: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Adding Iterators to SimpleArrayList is easy

First, we add the iterator() method to SimpleArrayList:

public Iterator<E> iterator(){

return new

ArrayListIterator<E>(this);

}

Page 38: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Then we implement the iterator class for Lists:

import java.util.*;

public class ArrayListIterator<E>

implements Iterator<E> {

// *** fields ***

private SimpleArrayList<E> list;

private int curPos;

public ArrayListIterator(

SimpleArrayList<E> list) {

this.list = list;

curPos = 0;

}

Page 39: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

public boolean hasNext() {

return curPos < list.size();

}

public E next() {

if (!hasNext()) throw

new NoSuchElementException();

E result = list.get(curPos);

curPos++;

return result;

}

public void remove() {

throw new UnsupportedOperationException();

}

}

Page 40: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

43

The tree data structure

Position ADT • Say vector contains an element “Delhi”

that we want to keep track of• The index of the element may keep

changing depending on insert/delete operations

• A position s, may be associated with the element Delhi (say at time of insertion)

• s lets us access Delhi via s.element() even if the index of Delhi changes in the container, unless we explicitly remove s

• A position ADT is associated with a particular container.

43

Delhi

s

Page 41: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

44

The tree data structure

Tree ADT

• We use positions to abstract nodes

• Generic methods:

• integer size()

• boolean isEmpty()

• objectIterator elements()

• positionIterator positions()

• Accessor methods:

• position root()

• position parent(p)

• positionIterator children(p)

44

• Query methods:

• boolean isInternal(p)

• boolean isLeaf (p)

• boolean isRoot(p)

• Update methods:

• swapElements(p, q)

• object replaceElement(p, o)

• Additional update methods may

be defined by data structures

implementing the Tree ADT

Page 42: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

45

The tree data structure

0

A Linked Structure for General Trees• A node is represented by

an object storing• Element• Parent node• Sequence of children

nodes

• Node objects implement the Position ADT

45

B

DA

C E

F

B

0 0

A D F

0

C

0

E

Page 43: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

46

The tree data structure

Tree using Array

• Each node contains a field for data and an array of pointers to the children for that node

– Missing child will have null pointer

• Tree is represented by pointer to root

• Allows access to ith child in O(1) time

• Very wasteful in space when only few nodes in tree have many children (most pointers are null)

46

Page 44: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

47

The tree data structure

Tree Traversals

• A traversal visits the nodes of a tree in a systematic manner

• We will see three types of traversals

• Pre-order

• Post-order

• In-order

47

Page 45: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

48

The tree data structure

Flavors of (Depth First) Traversal

• In a preorder traversal, a node is visited before its descendants

• In a postorder traversal, a node is visited after its descendants

• In an inorder traversal a node is visited after its left subtree and before its right subtree

48

Page 46: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

49

The tree data structure

Preorder Traversal

Process the root

Process the nodes in the all subtrees in their order

Lecture 5: Trees

Algorithm preOrder(v)

visit(v)

for each child w of v

preOrder(w)

Page 47: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Preorder Traversal

A

S

A

M

P

L

E

R

T E

E

Preorder traversal: node is visited before its descendants

Page 48: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

51

The tree data structure

Postorder traversal

1. Process the nodes in all subtrees in their order

2. Process the root

51

Algorithm postOrder(v)

for each child w of v

postOrder(w)

visit(v)

Page 49: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

A

S

A

M

P

L

E

R

T E

E

Postorder traversal: node is visited before its descendants

Postorder Traversal

Page 50: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

53

The tree data structure

Inorder traversal

1. Process the nodes in the left subtree

2. Process the root

3. Process the nodes in the right subtree

53

Algorithm InOrder(v)

InOrder(v->left)

visit(v)

InOrder(v->right)

For simplicity, we consider tree having at most 2 children, though it can be generalized.

Page 51: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

A

S

A

M

P

L

E

R

T E

E

Inorder Traversal

Inorder traversal: node is visited after its left subtree

and before its right subtree

Page 52: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

55

The tree data structure

Non-Recursive preorder traversal

1. Start from root.

2. Print the node.

3. Push right child onto to stack.

4. Push left child onto to stack.

5. Pop node from the stack.

6. Repeat Step 2 to 5 till stack is not empty.

55

Page 53: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Computing Height of Tree

58

Can be computed using the following idea:

1. The height of a leaf node is 0

2. The height of a node other than the leaf is the maximum of the height of the left subtree and the height of the right subtree plus 1.

Height(v) = max[height(vleft) + height(vright)] + 1

Details left as exercise.

Page 54: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

59

The tree data structure

Binary Trees

59

Every node has degree up to 2.

Proper binary tree: each internal node has

degree exactly 2.

Page 55: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

60

The tree data structure

Binary Tree

• A binary tree is a tree with the following properties:• Each internal node has two children

• The children of a node are an ordered pair

• We call the children of an internal node left child and right child

• Alternative recursive definition: a binary tree is either• a tree consisting of a single node, or

• a tree whose root has an ordered pair of children, each of which is a disjoint binary tree

• Applications:

• arithmetic expressions

• decision processes

• searching

A

B C

F GD E

H I

60

Page 56: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

61

The tree data structure

Arithmetic Expression Tree

• Binary tree associated with an arithmetic expression• internal nodes: operators• leaves: operands

• Example: arithmetic expression tree for the expression (2 * (a - 1) + (3 * b))

+

**

-2

a 1

3 b

61

Page 57: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

64

How many leaves L does a proper binary tree of height

h have?

The number of leaves at depth d = 2d

If the height of the tree is h it has 2h

leaves.

L = 2h.

Page 58: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

Data Structures and Algorithms 65

What is the height h of a proper binary tree with L

leaves?

leaves = 1 height = 0

leaves = 2 height = 1

leaves = 4 height = 2

leaves = L height = Log2L

Since L = 2h

log2L = log22h

h = log2L

Page 59: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

66

The number of internal nodes of a proper binary

tree of height h is ?

Internal nodes = 0 height = 0

Internal nodes = 1 height = 1

Internal nodes = 1 + 2 height = 2

Internal nodes = 1 + 2 + 4 height = 3

1 + 2 + 22 + . . . + 2 h-1 = 2h -1

Thus, a complete binary tree of height = h has 2h-1 internal

nodes.

Geometric series

Page 60: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

67

The number of nodes n of a proper binary

tree of height h is ?

nodes = 1 height = 0

nodes = 3 height = 1

nodes = 7 height = 2

nodes = 2h+1- 1 height = h

Since L = 2h

and since the number of internal nodes = 2h-1 the

total number of nodes n = 2h+ 2h-1 = 2(2h) – 1 = 2h+1- 1.

Page 61: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

68

If the number of nodes is n then what is the

height?

nodes = 1 height = 0

nodes = 3 height = 1

nodes = 7 height = 2

nodes = n height = Log2(n+1) - 1

Since n = 2h+1-1

n + 1 = 2h+1

Log2(n+1) = Log2 2h+1

Log2(n+1) = h+1

h = Log2(n+1) - 1

Page 62: Trees COL 106 - Indian Institute of Technology Delhimausam/courses/col106/autumn2017/lectures/07-trees.pdf · 29 The tree data structure Terminology Another approach to a tree is

BinaryTree ADT

• The BinaryTree ADT extends the Tree ADT, i.e., it inherits all the methods of the Tree ADT

• Additional methods:

• position leftChild(p)

• position rightChild(p)

• position sibling(p)

• Update methods may be defined by data structures implementing the BinaryTree ADT

69