csce350 algorithms and data structure lecture 17 jianjun hu department of computer science and...
TRANSCRIPT
CSCE350 Algorithms and Data Structure
Lecture 17
Jianjun Hu
Department of Computer Science and Engineering
University of South Carolina
2009.11.
Midterm2
Average 84
>90 12 students
B-Trees
Organize data for fast queries
Index for fast search
For datasets of structured records, B-tree indexing is used
B-Trees 4
Motivation (cont.)
Assume that we use an AVL tree to store about 20 million records
We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so this takes about 0.2 seconds
We know we can’t improve on the log n lower bound on search for a binary tree
But, the solution is to use more branches and thus reduce the height of the tree!
• As branching increases, depth decreases
B-Trees 5
Constructing a B-tree
17
3 8 28 48
1 2 6 7 12 14 16 52 53 55 6825 26 29 45
B-Trees 6
Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:
1. the number of keys in each non-leaf node is one less than the number of its children and these keys partition the keys in the children in the fashion of a search tree
2. all leaves are on the same level3. all non-leaf nodes except the root have at least m / 2
children4. the root is either a leaf node, or it has from two to m
children5. a leaf node contains no more than m – 1 keys
The number m should always be odd
B-Trees 7
Comparing Trees
Binary trees• Can become unbalanced and lose their good time complexity
(big O)• AVL trees are strict binary trees that overcome the balance
problem• Heaps remain balanced but only prioritise (not order) the keys
Multi-way trees• B-Trees can be m-way, they can have any (odd) number of
children• One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a
permanently balanced binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations
Chapter 8: Dynamic Programming
Dynamic Programming is a general algorithm design technique
Invented by American mathematician Richard Bellman in the 1950s to solve optimization problems
“Programming” here means “planning”
Main idea: • solve several smaller (overlapping) subproblems• record solutions in a table so that each subproblem is only
solved once• final state of the table will be (or contain) solution
Example: Fibonacci numbers
Recall definition of Fibonacci numbers:
f (0) = 0f (1) = 1f (n) = f (n-1) + f (n-2)
Computing the nth Fibonacci number recursively (top-down):
f(n)
f(n-1) + f(n-2)
f(n-2) + f(n-3) f(n-3) + f(n-4)
...
Example: Fibonacci numbers
Computing the nth Fibonacci number using bottom-up iteration: f(0) = 0 f(1) = 1 f(2) = 0+1 = 1 f(3) = 1+1 = 2 f(4) = 1+2 = 3 f(5) = 2+3 = 5 … f(n-2) = f(n-1) = f(n) = f(n-1) + f(n-2)
Examples of Dynamic Programming Algorithms
Computing binomial coefficients
Optimal chain matrix multiplication
Constructing an optimal binary search tree
Warshall’s algorithm for transitive closure
Floyd’s algorithms for all-pairs shortest paths
Some instances of difficult discrete optimization problems:
• travelling salesman• knapsack
The problem: check if all pair have a valid path
a
c d
b
You can run depth first search from each vertex…But can u do better?
Warshall’s Algorithm: Transitive Closure
• Computes the transitive closure of a relationComputes the transitive closure of a relation
• (Alternatively: all paths in a directed graph)(Alternatively: all paths in a directed graph)
• Example of transitive closure:Example of transitive closure:
3
42
1
0 0 1 01 0 0 10 0 0 00 1 0 0
0 0 1 01 1 11 1 10 0 0 011 1 1 11 1
3
42
1
Warshall’s Algorithm
Main idea: a path exists between two vertices i, j, iff•there is an edge from i to j; or•there is a path from i to j going through vertex 1; or•there is a path from i to j going through vertex 1 and/or 2; or•there is a path from i to j going through vertex 1, 2, and/or 3; or•...•there is a path from i to j going through any of the other vertices
Warshall’s Algorithm
3
42
13
42
13
42
13
42
1
R0
0 0 1 01 0 0 10 0 0 00 1 0 0
R1
0 0 1 01 0 11 10 0 0 00 1 0 0
R2
0 0 1 01 0 1 10 0 0 01 1 1 1 11 1
R3
0 0 1 01 0 1 10 0 0 01 1 1 1
R4
0 0 1 01 11 1 10 0 0 01 1 1 1
3
42
1
Warshall’s Algorithm
In the kth stage determine if a path exists between two vertices i, j using just vertices among 1,…,k
R(k-1)[i,j] (path using just 1 ,…,k-1) R(k)[i,j] = or
(R(k-1)[i,k] and R(k-1)[k,j]) (path from i to k and from k to i using just 1 ,…,k-1)
i
j
k
kth stage
{
Pseudocode of Warshall’s Algorithm
)(
)1()1()1()(
)0(
]),[],[(],[],[
1
1
1
])..1,..1[(
n
kkkk
R
jkRkiRjiRjiR
nj
ni
nk
AR
nnAWarshall
return
and or
do to for
do to for
do to for
ALGORITHM
Example
Time efficiency?
a
c d
b
Intuitive understanding of Warshall algo
Can the above Warshall algorithm find all possible paths?
Does Warshall algorithm avoid duplicate updates?
i=6 9 12 3 7 5=j
69
912
123
37
75=j
Floyd’s Algorithm: All pairs shortest paths
In a weighted graph, find shortest paths between every pair of vertices
Same idea: construct solution through series of matrices D(0), D(1), … using an initial subset of the vertices as intermediaries.
2
43
12
3 6
7
1064
1073
022
301
4321
091664
10773
65022
431001
4321
Similar to Warshall’s Algorithm
in is equal to the length of shortest path among all paths from the ith vertex to jth vertex with each intermediate vertex, if any, numbered not higher than k
)(kijd )(kD
v i v j
v k
di k
(k-1)d
k j
(k-1)
di j
(k-1)
ijijk
kjk
ikk
ijk
ij wdkdddd )0()1()1()1()( ,1},min{ for
Pseudocode of Floyd’s Algorithm
The next matrix in sequence can be written over its predecessor
D
jkDkiDjiDjiD
nj
ni
nk
WD
nnWFloyd
return
do to for
do to for
do to for
ALGORITHM
]},[],[],,[min{],[
1
1
1
])..1,..1[(
Example
Time efficiency?
1
3 4
22
37
6
1
Principle of Optimality
An optimal solution to any instance of a optimization problem is composed of optimal solution to its subinstance
e.g.
Shortest path between a and b… then any distance of two points S, T on the path from a to b must also be shortest….
Why?
Otherwise, we can choose another path S’ and T’!
Key issues in DP algorithm design
Derive a recurrence relationship that expresses a solution to an instance of the problem in terms of solutions to its smaller subinstances
Design Algo for Sequence Alignment Problem using DP
ACTGGAGGCCA
AGCTAGAGAAG
Score= Sum(si)match: +3
Mismatch:-2gap-X -1