csce350 algorithms and data structure lecture 17 jianjun hu department of computer science and...

CSCE350 Algorithms and Data Structure

Lecture 17

Jianjun Hu

Department of Computer Science and Engineering

University of South Carolina

2009.11.

Midterm2

Average 84

>90 12 students

B-Trees

Organize data for fast queries

Index for fast search

For datasets of structured records, B-tree indexing is used

B-Trees 4

Motivation (cont.)

Assume that we use an AVL tree to store about 20 million records

We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so this takes about 0.2 seconds

We know we can’t improve on the log n lower bound on search for a binary tree

But, the solution is to use more branches and thus reduce the height of the tree!

• As branching increases, depth decreases

B-Trees 5

Constructing a B-tree

17

3 8 28 48

1 2 6 7 12 14 16 52 53 55 6825 26 29 45

B-Trees 6

Definition of a B-tree

A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:

1. the number of keys in each non-leaf node is one less than the number of its children and these keys partition the keys in the children in the fashion of a search tree

2. all leaves are on the same level3. all non-leaf nodes except the root have at least m / 2

children4. the root is either a leaf node, or it has from two to m

children5. a leaf node contains no more than m – 1 keys

The number m should always be odd

B-Trees 7

Comparing Trees

Binary trees• Can become unbalanced and lose their good time complexity

(big O)• AVL trees are strict binary trees that overcome the balance

problem• Heaps remain balanced but only prioritise (not order) the keys

Multi-way trees• B-Trees can be m-way, they can have any (odd) number of

children• One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a

permanently balanced binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations

Chapter 8: Dynamic Programming

Dynamic Programming is a general algorithm design technique

Invented by American mathematician Richard Bellman in the 1950s to solve optimization problems

“Programming” here means “planning”

Main idea: • solve several smaller (overlapping) subproblems• record solutions in a table so that each subproblem is only

solved once• final state of the table will be (or contain) solution

Example: Fibonacci numbers

Recall definition of Fibonacci numbers:

f (0) = 0f (1) = 1f (n) = f (n-1) + f (n-2)

Computing the nth Fibonacci number recursively (top-down):

f(n)

f(n-1) + f(n-2)

f(n-2) + f(n-3) f(n-3) + f(n-4)

...

Example: Fibonacci numbers

Computing the nth Fibonacci number using bottom-up iteration: f(0) = 0 f(1) = 1 f(2) = 0+1 = 1 f(3) = 1+1 = 2 f(4) = 1+2 = 3 f(5) = 2+3 = 5 … f(n-2) = f(n-1) = f(n) = f(n-1) + f(n-2)

Examples of Dynamic Programming Algorithms

Computing binomial coefficients

Optimal chain matrix multiplication

Constructing an optimal binary search tree

Warshall’s algorithm for transitive closure

Floyd’s algorithms for all-pairs shortest paths

Some instances of difficult discrete optimization problems:

• travelling salesman• knapsack

The problem: check if all pair have a valid path

a

c d

b

You can run depth first search from each vertex…But can u do better?

Warshall’s Algorithm: Transitive Closure

• Computes the transitive closure of a relationComputes the transitive closure of a relation

• (Alternatively: all paths in a directed graph)(Alternatively: all paths in a directed graph)

• Example of transitive closure:Example of transitive closure:

3

42

1

0 0 1 01 0 0 10 0 0 00 1 0 0

0 0 1 01 1 11 1 10 0 0 011 1 1 11 1

3

42

1

Warshall’s Algorithm

Main idea: a path exists between two vertices i, j, iff•there is an edge from i to j; or•there is a path from i to j going through vertex 1; or•there is a path from i to j going through vertex 1 and/or 2; or•there is a path from i to j going through vertex 1, 2, and/or 3; or•...•there is a path from i to j going through any of the other vertices


3

42

13

42

13

42

13

42

1

R0

0 0 1 01 0 0 10 0 0 00 1 0 0

R1

0 0 1 01 0 11 10 0 0 00 1 0 0

R2

0 0 1 01 0 1 10 0 0 01 1 1 1 11 1

R3

0 0 1 01 0 1 10 0 0 01 1 1 1

R4

0 0 1 01 11 1 10 0 0 01 1 1 1

3

42

1


In the kth stage determine if a path exists between two vertices i, j using just vertices among 1,…,k

R(k-1)[i,j] (path using just 1 ,…,k-1) R(k)[i,j] = or

(R(k-1)[i,k] and R(k-1)[k,j]) (path from i to k and from k to i using just 1 ,…,k-1)

i

j

k

kth stage

{

Pseudocode of Warshall’s Algorithm

)(

)1()1()1()(

)0(

]),[],[(],[],[

1

1

1

])..1,..1[(

n

kkkk

R

jkRkiRjiRjiR

nj

ni

nk

AR

nnAWarshall

return

and or

do to for

do to for

do to for

ALGORITHM

Example

Time efficiency?

a

c d

b

Intuitive understanding of Warshall algo

Can the above Warshall algorithm find all possible paths?

Does Warshall algorithm avoid duplicate updates?

i=6 9 12 3 7 5=j

69

912

123

37

75=j

Floyd’s Algorithm: All pairs shortest paths

In a weighted graph, find shortest paths between every pair of vertices

Same idea: construct solution through series of matrices D(0), D(1), … using an initial subset of the vertices as intermediaries.

2

43

12

3 6

7

1064

1073

022

301

4321

091664

10773

65022

431001

4321

Similar to Warshall’s Algorithm

in is equal to the length of shortest path among all paths from the ith vertex to jth vertex with each intermediate vertex, if any, numbered not higher than k

)(kijd )(kD

v i v j

v k

di k

(k-1)d

k j

(k-1)

di j

(k-1)

ijijk

kjk

ikk

ijk

ij wdkdddd )0()1()1()1()( ,1},min{ for

Pseudocode of Floyd’s Algorithm

The next matrix in sequence can be written over its predecessor

D

jkDkiDjiDjiD

nj

ni

nk

WD

nnWFloyd

return

do to for

do to for

do to for

ALGORITHM

]},[],[],,[min{],[

1

1

1

])..1,..1[(

Example

Time efficiency?

1

3 4

22

37

6

1

Principle of Optimality

An optimal solution to any instance of a optimization problem is composed of optimal solution to its subinstance

e.g.

Shortest path between a and b… then any distance of two points S, T on the path from a to b must also be shortest….

Why?

Otherwise, we can choose another path S’ and T’!

Key issues in DP algorithm design

Derive a recurrence relationship that expresses a solution to an instance of the problem in terms of solutions to its smaller subinstances

Design Algo for Sequence Alignment Problem using DP

ACTGGAGGCCA

AGCTAGAGAAG

Score= Sum(si)match: +3

Mismatch:-2gap-X -1

csce350 algorithms and data structure lecture 17 jianjun hu department of computer science and...

Documents