Section 10.2: Applications of Trees

Trees (Ch. 10.2) Longin Jan Latecki Temple University based on slides by Simon Langley, Shang-Hua Teng, and William Albritton



TRANSCRIPT

Page 1: Section 10.2: Applications of Trees

Trees (Ch. 10.2)

Longin Jan Latecki, Temple University

based on slides by Simon Langley, Shang-Hua Teng, and William Albritton

Page 2: Section 10.2: Applications of Trees

Section 10.2: Applications of Trees

Binary search trees
• A simple data structure for sorted lists

Decision trees
• Minimum comparisons in sorting algorithms

Prefix codes
• Huffman coding

Page 3: Section 10.2: Applications of Trees

Basic Data Structures - Trees

Informal: a tree is a structure that looks like a real tree (upside-down)

Formal: a tree is a connected graph with no cycles.

Page 4: Section 10.2: Applications of Trees

Trees - Terminology

(Figure: a 7-node tree of height 2 with root x and nodes b, e, m, c, d, a; the labels root, leaf, value, subtree, height = 2, and size = 7 point into the drawing.)

Every node must have its value(s).
A non-leaf node has subtree(s).
A non-root node has a single parent node.

Page 5: Section 10.2: Applications of Trees

Types of Tree

Binary tree: each node has at most 2 sub-trees.

m-ary tree: each node has at most m sub-trees.

Page 6: Section 10.2: Applications of Trees

Binary Search Trees

A binary search tree:
• is a binary tree
• if a node has value N, all values in its left sub-tree are less than or equal to N, and all values in its right sub-tree are greater than N.

Page 7: Section 10.2: Applications of Trees

Binary Search Tree Format

Items are stored at individual tree nodes.

We arrange for the tree to always obey this invariant: for every item x,
• every node in x’s left subtree is less than x
• every node in x’s right subtree is greater than x.

Example:

            7
          /   \
         3     12
        / \   /  \
       1   5 9    15
      / \   / \
     0   2 8   11

Page 8: Section 10.2: Applications of Trees

This is NOT a binary search tree

        5
       / \
      4   7
     / \ / \
    3  2 8  9

(2 sits in 4’s right subtree even though 2 < 4, so the search-tree invariant is violated.)

Page 9: Section 10.2: Applications of Trees

This is a binary search tree

Page 10: Section 10.2: Applications of Trees

Searching a binary search tree

search(t, s) {
  if (s == label(t))
    return t;
  if (t is a leaf)
    return null;
  if (s < label(t))
    return search(t’s left subtree, s);
  else
    return search(t’s right subtree, s);
}

Time per level: O(1). Total: O(h), where h is the height of the tree.
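The recursive search above can be sketched in Python; the Node class and its field names (label, left, right) are our own illustration, not part of the slides.

```python
class Node:
    """A binary-search-tree node; field names are our own choice."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def search(t, s):
    """Return the node labeled s, or None if s is not in tree t."""
    if t is None:
        return None
    if s == t.label:
        return t
    if s < t.label:
        return search(t.left, s)   # s can only be in the left subtree
    return search(t.right, s)      # s can only be in the right subtree
```

Each recursive call descends one level, so the running time is O(h), as noted above.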

Page 11: Section 10.2: Applications of Trees

Searching a binary search tree

search(t, s)
{ while (t != null)
  { if (s == label(t)) return t;
    if (s < label(t))
      t = leftSubTree(t);
    else
      t = rightSubTree(t);
  }
  return null;
}

Time per level: O(1). Total: O(h), where h is the height of the tree.

Page 12: Section 10.2: Applications of Trees

Here’s another function that does the same (we search for label s):

TreeSearch(t, s)

while (t != NULL and s != label[t])

if (s < label[t])

t = left[t];

else

t = right[t];

return t;
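TreeSearch translates almost line for line into Python; as before, the Node class with label/left/right fields is our own illustrative representation.

```python
class Node:
    def __init__(self, label, left=None, right=None):
        self.label = label   # the value stored at this node
        self.left = left
        self.right = right

def tree_search(t, s):
    """Iterative search: walk down from the root until s is found
    or we fall off the tree (t becomes None)."""
    while t is not None and s != t.label:
        t = t.left if s < t.label else t.right
    return t   # the matching node, or None
```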

Page 13: Section 10.2: Applications of Trees

Insertion in a binary search tree: we need to search before we insert

        5
       / \
      3   8
     / \ / \
    2  4 7  9

Insert 6: the search for 6 ends at 7, so 6 becomes the left child of 7.
Insert 11: the search for 11 ends at 9, so 11 becomes the right child of 9.
We always insert at a leaf.

Time complexity: O(height_of_tree), which is O(log n) if the tree is balanced, where n = size of the tree.

Page 14: Section 10.2: Applications of Trees

Insertion

insertInOrder(t, s)
{ if (t is an empty tree)  // insert here
    return a new tree node with value s
  else if (s < label(t))
    t.left = insertInOrder(t.left, s)
  else
    t.right = insertInOrder(t.right, s)
  return t
}
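A Python sketch of insertInOrder, again using a hypothetical Node class; returning the subtree root lets the caller re-link the tree exactly as in the pseudocode, and an in-order helper makes the BST invariant easy to check.

```python
class Node:
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def insert_in_order(t, s):
    """Insert s into subtree t and return the (possibly new) root."""
    if t is None:                        # empty tree: insert here, at a leaf
        return Node(s)
    if s < t.label:
        t.left = insert_in_order(t.left, s)
    else:
        t.right = insert_in_order(t.right, s)
    return t

def inorder(t):
    """In-order traversal; for a BST this yields the labels in sorted order."""
    return [] if t is None else inorder(t.left) + [t.label] + inorder(t.right)
```

Inserting 5, 3, 8, 2, 4, 7, 9 builds the tree from the slide, and then inserting 6 places it as the left child of 7.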

Page 15: Section 10.2: Applications of Trees

Recursive Binary Tree Insert

procedure insert(T: binary tree, x: item)
  v := root[T]
  if v = null then begin
    root[T] := x; return “Done”
  end
  else if v = x then return “Already present”
  else if x < v then
    return insert(leftSubtree[T], x)
  else {must be x > v}
    return insert(rightSubtree[T], x)

Page 16: Section 10.2: Applications of Trees

Comparison – Insertion in an ordered list

Insert 6 into the ordered list 2 3 4 5 7 8 9, giving 2 3 4 5 6 7 8 9.

Time complexity? O(n), where n = size of the list.

insertInOrder(list, s)
{ loop1: search from the beginning of the list for an item >= s
  loop2: shift the remaining list to the right, starting from the end of the list
  insert s
}
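The two loops of the list version can be sketched in Python (the function name is ours); list.insert performs the shift, so both the O(n) search and the O(n) shift are present.

```python
def insert_in_order_list(lst, s):
    """Insert s into the sorted list lst in place: O(n) overall."""
    i = 0
    while i < len(lst) and lst[i] < s:   # loop1: find the first item >= s
        i += 1
    lst.insert(i, s)                     # loop2: shifts the tail one slot right
    return lst
```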

Page 17: Section 10.2: Applications of Trees

Try it!!

Build binary search trees for the following input sequences:
• 7, 4, 2, 6, 1, 3, 5, 7

• 7, 1, 2, 3, 4, 5, 6, 7

• 7, 4, 2, 1, 7, 3, 6, 5

• 1, 2, 3, 4, 5, 6, 7, 8

• 8, 7, 6, 5, 4, 3, 2, 1

Page 18: Section 10.2: Applications of Trees

Decision Trees

A decision tree represents a decision-making process.
• Each possible “decision point” or situation is represented by a node.
• Each possible choice that could be made at that decision point is represented by an edge to a child node.

In the extended decision trees used in decision analysis, we also include nodes that represent random events and their outcomes.

Page 19: Section 10.2: Applications of Trees

Coin-Weighing Problem

Imagine you have 8 coins, one of which is a lighter counterfeit, and a free-beam balance.
• No scale or weight markings are required for this problem!

How many weighings are needed to guarantee that the counterfeit coin will be found?

Page 20: Section 10.2: Applications of Trees

As a Decision-Tree Problem

In each situation, we pick two disjoint and equal-size subsets of coins to put on the scale.

The balance then “decides” whether to tip left, tip right, or stay balanced.

A given sequence of weighings thus yields a decision tree with branching factor 3.

Page 21: Section 10.2: Applications of Trees

Applying the Tree Height Theorem

The decision tree must have at least 8 leaf nodes, since there are 8 possible outcomes (in terms of which coin is the counterfeit one).

Recall the tree-height theorem: a tree with branching factor m and ℓ leaves has height h ≥ ⌈log_m ℓ⌉.
• Thus the decision tree must have height h ≥ ⌈log_3 8⌉ = ⌈1.893…⌉ = 2.

Let’s see if we can solve the problem with only 2 weighings…
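The height bound is easy to check in code; min_weighings is our name for the smallest h with 3^h ≥ n (i.e. ⌈log_3 n⌉), computed with integers to avoid floating-point rounding.

```python
def min_weighings(n):
    """Smallest h such that a 3-way decision tree of height h
    has at least n leaves, i.e. ceil(log3(n))."""
    h = 0
    while 3 ** h < n:
        h += 1
    return h
```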

Page 22: Section 10.2: Applications of Trees

General Solution Strategy

The problem is an example of searching for 1 unique particular item from among a list of n otherwise identical items.
• Somewhat analogous to the adage of “searching for a needle in a haystack.”

Armed with our balance, we can attack the problem using a divide-and-conquer strategy, like what’s done in binary search.
• We want to narrow down the set of possible locations where the desired item (coin) could be found from n down to just 1, in a logarithmic fashion.

Each weighing has 3 possible outcomes.
• Thus, we should use it to partition the search space into 3 pieces that are as close to equal-sized as possible.

This strategy will lead to the minimum possible worst-case number of weighings required.

Page 23: Section 10.2: Applications of Trees

Coin Balancing Decision Tree

Here’s what the tree looks like in our case:

First weighing: coins 1, 2, 3 vs. 4, 5, 6.
• Outcome “left” (fake among 1, 2, 3): weigh 1 vs. 2 → L: 1, R: 2, balanced: 3.
• Outcome “balanced” (fake among 7, 8): weigh 7 vs. 8 → L: 7, R: 8.
• Outcome “right” (fake among 4, 5, 6): weigh 4 vs. 5 → L: 4, R: 5, balanced: 6.

Page 24: Section 10.2: Applications of Trees

General Balance Strategy

On each step, put ⌈n/3⌉ of the n coins to be searched on each side of the scale.
• If the scale tips to the left, then the lightweight fake is in the right set of ⌈n/3⌉ ≈ n/3 coins.
• If the scale tips to the right, then the lightweight fake is in the left set of ⌈n/3⌉ ≈ n/3 coins.
• If the scale stays balanced, then the fake is in the remaining set of n − 2⌈n/3⌉ ≈ n/3 coins that were not weighed!

You can prove that this strategy always leads to a balanced 3-ary tree.
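The strategy can be simulated; this sketch (names ours) models the balance by summing weights and narrows a contiguous candidate range by the ⌈n/3⌉ split described above.

```python
def find_fake(coins):
    """Index of the single lighter coin, found by 3-way splitting.
    `coins` is a list of weights, all equal except one lighter entry."""
    lo, hi = 0, len(coins)              # candidates live in [lo, hi)
    while hi - lo > 1:
        third = (hi - lo + 2) // 3      # ceil(n/3) coins per pan
        left = sum(coins[lo:lo + third])
        right = sum(coins[lo + third:lo + 2 * third])
        if left < right:                # left pan lighter: fake is on the left
            hi = lo + third
        elif right < left:              # right pan lighter: fake is on the right
            lo, hi = lo + third, lo + 2 * third
        else:                           # balanced: fake is among the unweighed rest
            lo = lo + 2 * third
    return lo
```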

Page 25: Section 10.2: Applications of Trees

Data Compression

Suppose we have a 3GB character data file that we wish to include in an email.

Suppose the file only contains the 26 letters {a, …, z}. Suppose each letter c in {a, …, z} occurs with frequency f_c. Suppose we encode each letter by a binary code. If we use a fixed-length code, we need 5 bits for each character. The resulting message length is 5(f_a + f_b + ⋯ + f_z) bits.

Can we do better?

Page 26: Section 10.2: Applications of Trees

Data Compression: A Smaller Example

Suppose the file only has 6 letters {a, b, c, d, e, f} with frequencies:

letter                 a     b     c     d     e     f
frequency             .45   .13   .12   .16   .09   .05
fixed-length code     000   001   010   011   100   101
variable-length code  0     101   100   111   1101  1100

Fixed length: 3G = 3,000,000,000 bits.
Variable length: (1·.45 + 3·.13 + 3·.12 + 3·.16 + 4·.09 + 4·.05)G = 2.24G bits.
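The totals can be verified with a few lines of Python; the frequency and code-length values below are taken from the slides' 6-letter example.

```python
# Frequencies and per-letter code lengths from the 6-letter example.
freq = {'a': .45, 'b': .13, 'c': .12, 'd': .16, 'e': .09, 'f': .05}
fixed_len = {c: 3 for c in freq}                            # 3 bits each
var_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}  # variable-length code

def bits_per_char(lengths, freq):
    """Expected bits per character: sum over letters of f(c) * length(c)."""
    return sum(freq[c] * lengths[c] for c in freq)
```

For a file of 10^9 characters the fixed-length code therefore needs 3G bits, while the variable-length code needs about 2.24G bits.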

Page 27: Section 10.2: Applications of Trees

How to decode?

At first it is not obvious how decoding will happen, but it is possible if we use prefix codes.

Page 28: Section 10.2: Applications of Trees

Prefix Codes

No encoding of a character can be the prefix of a longer encoding of another character:
• we could not encode t as 01 and x as 01101, since 01 is a prefix of 01101.

By using a binary tree representation we generate prefix codes with letters as leaves.

(Figure: a code tree with leaves e, a, t, n, s and edges labeled 0/1; reading edge labels from root to leaf gives e = 0, a = 10, t = 110, n = 1110, s = 1111.)

Page 29: Section 10.2: Applications of Trees

Decoding prefix codes

Follow the tree from the root until you reach a leaf, then repeat!

A message can be decoded uniquely!

Page 30: Section 10.2: Applications of Trees

Prefix codes allow easy decoding

(Same code tree as before, with leaves e, a, t, n, s: e = 0, a = 10, t = 110, n = 1110, s = 1111.)

Decode: 11111011100
• 1111 → s; remaining: 1011100
• 10 → a; remaining: 11100
• 1110 → n; remaining: 0
• 0 → e

Result: sane
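The decoding loop is easy to mechanize; this sketch hard-codes the code table read off the tree (e = 0, a = 10, t = 110, n = 1110, s = 1111 — our reconstruction from the example, so treat the table as an assumption).

```python
# Codeword -> letter table; a fuller decoder would walk the tree instead.
CODES = {'0': 'e', '10': 'a', '110': 't', '1110': 'n', '1111': 's'}

def decode(bits):
    """Consume bits left to right; because the code is prefix-free,
    the first codeword that matches is the only possible one."""
    out, cur = [], ''
    for b in bits:
        cur += b
        if cur in CODES:        # reached a leaf of the code tree
            out.append(CODES[cur])
            cur = ''            # restart at the root for the next letter
    return ''.join(out)
```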

Page 31: Section 10.2: Applications of Trees

Some Properties

Prefix codes allow easy decoding.

An optimal code must be a full binary tree (a tree where every internal node has two children).

For C leaves there are C − 1 internal nodes.

The number of bits to encode a file is

B(T) = Σ_{c∈C} f(c) · length_T(c)

where f(c) is the frequency of c and length_T(c) is the tree depth of c, which corresponds to the code length of c.

Page 32: Section 10.2: Applications of Trees

Optimal Prefix Coding Problem

Given is a set of n letters (c_1, …, c_n) with frequencies (f_1, …, f_n).

Construct a full binary tree T to define a prefix code that minimizes the average code length

Average(T) = Σ_{i=1}^{n} f_i · length_T(c_i)

Page 33: Section 10.2: Applications of Trees

Greedy Algorithms

Many optimization problems can be solved using a greedy approach.
• The basic principle is that locally optimal decisions may be used to build an optimal solution.
• But the greedy approach may not always lead to an optimal solution overall for all problems.
• The key is knowing which problems will work with this approach and which will not.

We study:
• The problem of generating Huffman codes

Page 34: Section 10.2: Applications of Trees

Greedy algorithms

A greedy algorithm always makes the choice that looks best at the moment.
• My everyday examples:
• Driving in Los Angeles, NY, or Boston, for that matter
• Playing cards
• Investing in stocks
• Choosing a university
• The hope: a locally optimal choice will lead to a globally optimal solution
• For some problems, it works

Greedy algorithms tend to be easier to code.

Page 35: Section 10.2: Applications of Trees

David Huffman’s idea

A term paper at MIT

Build the tree (code) bottom-up in a greedy fashion

Each tree has a weight in its root and symbols as its leaves.

We start with a forest of one vertex trees representing the input symbols.

We recursively merge two trees whose sum of weights is minimal until we have only one tree.

Page 36: Section 10.2: Applications of Trees

The Huffman Coding Algorithm - History

In 1951, David Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam.

Huffman hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.

In doing so, the student outdid his professor, who had worked with information theory inventor Claude Shannon to develop a similar code.

Huffman built the tree from the bottom up instead of from the top down.

Page 37: Section 10.2: Applications of Trees

Huffman Coding Algorithm

1. Take the two least probable symbols in the alphabet

2. Combine these two symbols into a single symbol, and repeat.
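The two steps can be sketched with Python's heapq; rather than building explicit tree nodes, this version (names ours) just tracks how deep each symbol ends up, which equals its code length.

```python
import heapq

def huffman_code_lengths(freqs):
    """Greedily merge the two least probable subtrees; each merge pushes
    every symbol in them one level deeper. Returns {symbol: code length}."""
    # Heap entries: (weight, tiebreak, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in freqs}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)   # the two least probable subtrees
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            depth[s] += 1                 # merged symbols move one level down
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return depth
```

On the 5-letter example from these slides ({a: 0.25, b: 0.25, c: 0.2, d: 0.15, e: 0.15}) this yields 2-bit codes for a, b, c and 3-bit codes for d, e.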

Page 38: Section 10.2: Applications of Trees

Example

A_x = {a, b, c, d, e}
P_x = {0.25, 0.25, 0.2, 0.15, 0.15}

Merge d (0.15) and e (0.15) → 0.3
Merge c (0.2) and a (0.25) → 0.45
Merge b (0.25) and the d-e subtree (0.3) → 0.55
Merge 0.45 and 0.55 → 1.0 (the root)

Labeling each merge's branches 0 and 1 gives 2-bit codes for a, b, c and 3-bit codes for d, e; the slide lists the codewords 00, 10, 11, 010, 011.

Page 39: Section 10.2: Applications of Trees

Building the Encoding Tree

Pages 40-43: Section 10.2: Applications of Trees

Building the Encoding Tree

(Figure-only slides: step-by-step construction of the Huffman encoding tree.)