
Backtracking Algorithms

Backtracking

Backtracking is a general algorithm for finding all (or some) solutions to some computational problem,

that incrementally builds candidates to the solutions, and abandons each partial candidate c

("backtracks") as soon as it determines that c cannot possibly be completed to a valid solution.

N Queens Problem

The n-queens problem consists in placing n non-attacking queens on an n-by-n chess board. A queen can

attack another queen vertically, horizontally, or diagonally. E.g. placing a queen on a central square of

the board blocks the row and column where it is placed, as well as the two diagonals (rising and falling)

at whose intersection the queen was placed.

The algorithm to solve this problem uses backtracking, but we will unroll the recursion. The basic idea is

to place queens column by column, starting at the left. New queens must not be attacked by the ones to

the left that have already been placed on the board. We place another queen in the next column if a

consistent position is found. All rows in the current column are checked. We have found a solution if we

placed a queen in the rightmost column. A solution to the four-queens problem is shown.

Figure 1 : Four Queens problem


Figure 2 : Three Queens Problem

Figure 3 : State Space tree for N-Queens


MATLAB Implementation

clc; clear;
n = 4;                              % board size
V = zeros(1,n);                     % V(k) = row of the queen placed in column k
Nqueens(1,n,V);

function [V] = Nqueens(k,n,V)
    for i = 1:n
        if possible(i,k,n,V) == 1
            V(k) = i;               % place a queen in row i of column k
            if k == n
                V                   % a complete placement: display it
                return;
            else
                Nqueens(k+1,n,V);   % try to extend the partial solution
            end
        end
    end
end

function [x,V] = possible(i,k,n,V)
    % is row i of column k safe with respect to the queens in columns 1..k-1?
    x = 1;
    for j = 1:k-1
        if V(j) == i                        % same row
            x = 0;
        end
        if abs(V(j)-i) == abs(j-k)          % same diagonal
            x = 0;
        end
    end
end


Sum of Sub Sets

In the Sum-of-Subsets problem, there are n positive integers (weights) wi and a positive integer W. The

goal is to find all subsets of the integers that sum to W. As mentioned earlier, we usually state our

problems so as to find all solutions.

We create a state space tree. A possible way to structure the tree is:-

The state space tree for n = 4, W = 13, and w1 = 3; w2 = 4; w3 = 5; w4=6;

sumOfSubsets ( i, weightSoFar, totalPossibleLeft )
    if ( promising( i ) ) then
        if ( weightSoFar == W )
            print X
        else
            X[ i + 1 ] = "yes"
            sumOfSubsets ( i + 1, weightSoFar + w[i + 1], totalPossibleLeft - w[i + 1] )
            X[ i + 1 ] = "no"
            sumOfSubsets ( i + 1, weightSoFar, totalPossibleLeft - w[i + 1] )
        end
    end


promising ( i )
    return ( weightSoFar + totalPossibleLeft >= W ) && ( weightSoFar == W || weightSoFar + w[i + 1] <= W )

MATLAB Implementation

clc; clear;
w = [3 4 5 6 9 10];
W = 9;
X = zeros(1,length(w));           % X(k) = w(k) if item k is in the subset, 0 otherwise
w = sort(w);                      % the weights must be in nondecreasing order
total = sum(w);                   % total weight still available
sum_of_subsets(0,0,total,w,W,X);

function sum_of_subsets(i,weight,total,w,W,X)
    if promising(i,weight,total,w,W) == 1
        if weight == W
            X                               % one solution: display it
        else
            X(i+1) = w(i+1);                % include item i+1
            sum_of_subsets(i+1, weight + w(i+1), total - w(i+1), w, W, X);
            X(i+1) = 0;                     % exclude item i+1
            sum_of_subsets(i+1, weight, total - w(i+1), w, W, X);
        end
    end
end

function [x] = promising(i,weight,total,w,W)
    % promising if W is still reachable and either W has been hit exactly
    % or the next (smallest remaining) weight does not overshoot W
    x = (weight + total >= W) && ...
        (weight == W || (i < length(w) && weight + w(i+1) <= W));
end

Graph Coloring

Let G be a graph with no loops. A k-coloring of G is an assignment of k colors to the vertices of G in such a way that adjacent vertices are assigned different colors. If G has a k-coloring, then G is said to be k-colorable. The chromatic number of G, denoted by χ(G), is the smallest number k for which G is k-colorable. For example, see the 3-, 4- and 5-colorings illustrated after the implementation below.

MATLAB Implementation

clc; clear;
A = [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0];   % adjacency matrix of the graph
v = zeros(1,length(A));                      % v(i) = color assigned to vertex i
colors = 3;                                  % number of colors available
mcoloring(0,A,v,colors);


function [x] = graphcolorpromising(i,A,v)
    % vertex i's color does not clash with any earlier adjacent vertex
    x = 1;
    for j = 1:i-1
        if A(i,j) == 1 && v(i) == v(j)
            x = 0;
        end
    end
end

function [v] = mcoloring(i,A,v,colors)
    if graphcolorpromising(i,A,v) == 1
        if i == length(A)
            v                          % a complete, valid coloring: display it
        else
            for color = 1:colors
                v(i+1) = color;        % try each color for the next vertex
                mcoloring(i+1,A,v,colors);
            end
        end
    end
end

3-coloring

4-coloring


5-coloring

Not a permissible coloring, since one of the edges has the color blue at both ends.


Lecture Notes on Dynamic Programming


Dynamic Programming

In mathematics and computer science, dynamic programming is a method of solving complex problems

by breaking them down into simpler steps. It is applicable to problems that exhibit the property of overlapping sub-problems.

Top-down dynamic programming simply means storing the results of certain calculations, which are

then re-used later because the same calculation is a sub-problem in a larger calculation. Bottom-up

dynamic programming involves formulating a complex calculation as a recursive series of simpler

calculations.

Dynamic programming is both a mathematical optimization method, and a computer programming

method. In both contexts, it refers to simplifying a complicated problem by breaking it down into

simpler sub-problems in a recursive manner. If sub-problems can be nested recursively inside larger

problems, so that dynamic programming methods are applicable, then there is a relation between the

value of the larger problem and the values of the sub-problems.

Optimal Binary Search Tree


Given a sequence a1 < a2 < ··· < an of n sorted keys, with a search probability pi for each key ai, we want to build a binary search tree (BST) with minimum expected search cost.

Cost of the BST is

Cost = Σ (i = 1 to n) pi · depth(ai)

where depth(ai) is the level of key ai in the tree, counting the root as depth 1.

Observations:

• Optimal BST may not have smallest height or may not be height balanced.

• Optimal BST may not have highest-probability key at root.

Let C(i, j) denote the cost of an optimal binary search tree containing ai,…,aj .

The cost of the optimal binary search tree with ak as its root:

C(i, j) = min over i ≤ k ≤ j of { C(i, k-1) + C(k+1, j) } + Σ (m = i to j) pm,   with C(i, i-1) = 0.

The MATLAB implementation below computes exactly this recurrence, with Freq(i,j) holding the sum of the frequencies of keys i..j.


MATLAB Implementation

clear; clc;
% each row of A is [key frequency] for one of the n sorted keys
% A = [1 3; 2 4; 3 2; 4 1];
A = [1 76; 2 15; 3 36; 4 43; 5 64];
n = length(A);

% Freq(i,j) = sum of the frequencies of keys i..j
% Cost(i,j) = cost of an optimal BST on keys i..j (initialised here for i == j)
for i = 1:n
    s = 0;
    for j = 1:n
        if i <= j
            s = s + A(j,2);
            Freq(i,j) = s;
            Cost(i,j) = s;
            Root(i,j) = A(j,1);
        end
    end
end

% consider chains of keys of increasing length d
for d = 1:n-1
    for i = 1:n-d
        j = i + d;
        mincost = inf;
        for k = i:j                        % try every key k as the root of a(i..j)
            if k-1 >= i
                left = Cost(i,k-1);
            else
                left = 0;
            end
            if k+1 <= j
                right = Cost(k+1,j);
            else
                right = 0;
            end
            if left + right < mincost
                mincost = left + right;
                root = k;
            end
        end
        Cost(i,j) = Freq(i,j) + mincost;
        Root(i,j) = root;                  % best root for the sub-chain i..j
    end
end


Matrix Chain Multiplication

Let A be an n by m matrix, let B be an m by p matrix, then C = AB is an n by p matrix. C = AB can be

computed in O(nmp) time, using traditional matrix multiplication. Suppose I want to compute A1A2A3A4.

Matrix Multiplication is associative, so I can do the multiplication in several different orders.

Example:

A1 is 10 by 100 matrix ; A2 is 100 by 5 matrix; A3 is 5 by 50 matrix; A4 is 50 by 1 matrix; A1A2A3A4 is a 10

by 1 matrix

5 different orderings = 5 different parenthesizations

(A1(A2(A3A4))) ; ((A1A2)(A3A4)); (((A1A2)A3)A4); ((A1(A2A3))A4); (A1((A2A3)A4));

Each parenthesization results in a different number of scalar multiplications.

Let Aij = Ai · · ·Aj

(A1(A2(A3A4)))

– A34 = A3A4 , 250 mults, result is 5 by 1

– A24 = A2A34 , 500 mults, result is 100 by 1

– A14 = A1A24 , 1000 mults, result is 10 by 1

– Total is 1750

((A1A2)(A3A4))

– A12 = A1A2 , 5000 mults, result is 10 by 5


– A34 = A3A4 , 250 mults, result is 5 by 1

– A14 = A12A34 , 50 mults, result is 10 by 1

– Total is 5300

(((A1A2)A3)A4)

– A12 = A1A2 , 5000 mults, result is 10 by 5

– A13 = A12A3 , 2500 mults, result is 10 by 50

– A14 = A13A4 , 500 mults, result is 10 by 1

– Total is 8000

((A1(A2A3))A4)

– A23 = A2A3 , 25000 mults, result is 100 by 50

– A13 = A1A23 , 50000 mults, result is 10 by 50

– A14 = A13A4 , 500 mults, result is 10 by 1

– Total is 75500

(A1((A2A3)A4))

– A23 = A2A3 , 25000 mults, result is 100 by 50

– A24 = A23A4 , 5000 mults, result is 100 by 1

– A14 = A1A24 , 1000 mults, result is 10 by 1

– Total is 31000

Conclusion: Order of operations makes a huge difference.


To calculate the product of a matrix-chain A1A2...An, n-1 matrix multiplications are needed,

though different orders have different costs.

For matrix-chain A1A2A3, if the three have sizes 10-by-2, 2-by-20, and 20-by-5, respectively, then

the cost of (A1A2)A3 is 10*2*20 + 10*20*5 = 1400, while the cost of A1(A2A3) is 2*20*5 +

10*2*5 = 300.

For matrix-chain Ai...Aj where each Ak has dimensions Pk-1-by-Pk, the minimum cost of the

product m[i,j] corresponds to the best way to cut it into Ai...Ak and Ak+1...Aj:

m[i, j] = 0                                                              if i = j
m[i, j] = min over i <= k < j of { m[i,k] + m[k+1,j] + P(i-1)·P(k)·P(j) }   if i < j

Use dynamic programming to solve this problem: calculating m for sub-chains with increasing

length, and using another matrix s to keep the cutting point k for each m[i,j].

Example: Given the following matrix dimensions:

A1 is 30-by-35

A2 is 35-by-15

A3 is 15-by-5

A4 is 5-by-10

A5 is 10-by-20

A6 is 20-by-25


then the output of the program is

Optimum Sequence is (A1(A2A3))((A4A5)A6)


MATLAB implementation

clear; clc;
% each row of A gives the dimensions (rows, cols) of one matrix in the chain
% A = [5 4; 4 6; 6 2; 2 7];
A = [15 55; 55 9; 9 20; 20 13; 13 16];
n = length(A);                         % number of matrices in the chain
for i = 1:n
    M(i,i) = 0;                        % a single matrix needs no multiplications
end
for L = 2:n                            % L = length of the sub-chain
    for i = 1:n-L+1
        j = i + L - 1;
        M(i,j) = Inf;
        for k = i:j-1                  % try every split point k
            q = M(i,k) + M(k+1,j) + A(i,1)*A(k,2)*A(j,2);
            if q < M(i,j)
                M(i,j) = q;
                S(i,j) = k;            % remember the best split point
            end
        end
    end
end
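The matrix S of split points is enough to recover the optimal parenthesization. Below is a minimal sketch; the helper name printOrder is illustrative and not part of the original notes:

function printOrder(S, i, j)
    % Recursively print the optimal parenthesization of Ai ... Aj,
    % using the split points stored in S by the loop above.
    if i == j
        fprintf('A%d', i);
    else
        fprintf('(');
        printOrder(S, i, S(i,j));          % left sub-chain  Ai ... Ak
        printOrder(S, S(i,j)+1, j);        % right sub-chain Ak+1 ... Aj
        fprintf(')');
    end
end

For instance, printOrder(S, 1, n) prints the best order for the whole chain.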

Example

A1=5x4; A2=4x6; A3=6x2; A4=2x7;

P0 = 5; P1 = 4; P2 = 6; P3 = 2; P4 = 7;


All pairs shortest path - Floyd-Warshall algorithm

In computer science, the Floyd–Warshall algorithm (sometimes known as the WFI Algorithm or Roy–

Floyd algorithm) is a graph analysis algorithm for finding shortest paths in a weighted graph. A single

execution of the algorithm will find the shortest paths between all pairs of vertices. The algorithm is an

example of dynamic programming.

FLOYD-WARSHALL(G, A)

n := length(G)

for i := 1 to n do
    for j := 1 to n do
        if i == j then
            A[i,j] := Inf
        else
            A[i,j] := G[i,j]
        end if
    end for
end for

for k := 1 to n do
    for i := 1 to n do
        for j := 1 to n do
            if A[i,k] + A[k,j] < A[i,j] then
                A[i,j] := A[i,k] + A[k,j]
            end if
        end for
    end for
end for

END
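A minimal MATLAB sketch of the same procedure is given below; it mirrors the pseudocode above, including the Inf initialisation of the diagonal (so A(i,i) ends up as the length of the shortest cycle through vertex i). The function name floydwarshall is illustrative.

function [A] = floydwarshall(G)
    % G is the weight matrix of the graph, with G(i,j) = Inf when
    % there is no edge from i to j.
    n = length(G);
    A = G;
    for i = 1:n
        A(i,i) = Inf;              % as in the pseudocode above
    end
    for k = 1:n                    % allow vertex k as an intermediate vertex
        for i = 1:n
            for j = 1:n
                if A(i,k) + A(k,j) < A(i,j)
                    A(i,j) = A(i,k) + A(k,j);
                end
            end
        end
    end
end

Running it on the example matrices below should reproduce the results shown.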

Example –

G =

   Inf     1   Inf     1
     1   Inf     1     1
   Inf     1   Inf     1
     1     1     1   Inf

A =

     2     1     2     1
     1     2     1     1
     2     1     2     1
     1     1     1     2

Example 2

A =

Inf 10 15 5 20

10 Inf 5 5 10

15 5 Inf 10 15

5 5 10 Inf 15

20 10 15 15 Inf

D =

10 10 15 5 20

10 10 5 5 10

15 5 10 10 15

5 5 10 10 15

20 10 15 15 20


Greedy Algorithms & Divide and Conquer


Greedy Algorithms

A greedy algorithm repeatedly executes a procedure which tries to maximize the return based on examining local conditions, with the hope that these locally optimal choices will lead to a good outcome for the global problem. In some cases such a strategy is guaranteed to offer optimal solutions, and in some other cases it may provide a compromise that produces acceptable approximations.

Typically, greedy algorithms employ strategies that are simple to implement and require a minimal amount of resources.

Minimum Spanning Trees

Given a connected, undirected graph, a spanning tree of that graph is a subgraph which is a tree and connects all the vertices together. A single graph can have many different spanning trees. A minimum spanning tree (MST), or minimum weight spanning tree, is then a spanning tree whose total edge weight is less than or equal to that of every other spanning tree.

One example would be a cable TV company laying cable to a new neighbourhood. If it is constrained to

bury the cable only along certain paths, then there would be a graph representing which points are

connected by those paths. Some of those paths might be more expensive, because they are longer, or

require the cable to be buried deeper; these paths would be represented by edges with larger weights.

A spanning tree for that graph would be a subset of those paths that has no cycles but still connects to

every house. There might be several spanning trees possible. A minimum spanning tree would be one

with the lowest total cost.

Example –

Figure 1: A weighted graph


Figure 2 : MST of graph shown in Fig 1.

Prim’s Algorithm

In computer science, Prim's algorithm is an algorithm that finds a minimum spanning tree for a

connected weighted undirected graph. This means it finds a subset of the edges that forms a tree that

includes every vertex, where the total weight of all the edges in the tree is minimized. Prim's algorithm

is an example of a greedy algorithm.

Prim's algorithm has the property that the edges in the set A always form a single tree. We begin with

some vertex v in a given graph G =(V, E), defining the initial set of vertices A. Then, in each iteration, we

choose a minimum-weight edge (u, v), connecting a vertex v in the set A to the vertex u outside of set A.

Then vertex u is brought in to A. This process is repeated until a spanning tree is formed.


Algorithm Prims(G, A, start)

% G(i,j) = inf when there is no edge between i and j
% A       : adjacency matrix of the spanning tree (initially all zeros)
% visited = [0 0 0 0 0 0 ...]

visited(start) = 1;
for count = 1:length(G)-1                  % a spanning tree has n-1 edges
    Min = inf;
    for i = 1:length(G)
        if visited(i) == 1                 % scan edges leaving the current tree
            for j = 1:length(G)
                if visited(j) == 0 && G(i,j) < Min
                    Min = G(i,j);
                    v = i;                 % tree endpoint of the cheapest edge
                    u = j;                 % new vertex to bring into the tree
                end
            end
        end
    end
    A(v,u) = G(v,u);                       % add the chosen edge to the tree
    visited(u) = 1;
end

Hint : Try this on MATLAB

O(V^2)

Example illustrated below.


Kruskal's algorithm

Kruskal's algorithm is an algorithm in graph theory that finds a minimum spanning tree for a connected

weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex,

where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it

finds a minimum spanning forest (a minimum spanning tree for each connected component). Kruskal's

algorithm is an example of a greedy algorithm.

It proceeds by listing the weights in increasing order, and then choosing those edges having the smallest

weights, with the one restriction that we never want to complete a circuit. In other words, as we go

along the sorted list of weights, we will always select the corresponding edge for our spanning tree

unless that choice completes a circuit.


Algorithm Kruskal(G)

% G(i,j) = inf when there is no edge between i and j.
% visited doubles as the parent array of a simple union-find over the vertices.
visited = 1:length(G);
k = 1;
for i = 1:length(G)                    % collect each edge (i,j), i < j, once
    for j = i+1:length(G)
        if G(i,j) < inf
            B(k,1) = G(i,j);           % edge weight
            B(k,2) = i;
            B(k,3) = j;
            k = k + 1;
        end
    end
end
B = sortrows(B);                       % sort the edges by increasing weight
for i = 1:size(B,1)
    u = parent(B(i,2), visited);
    v = parent(B(i,3), visited);
    if u ~= v                          % the edge joins two different trees
        A(B(i,2),B(i,3)) = B(i,1);     % so accept it into the spanning tree
        if u < v
            visited(v) = u;            % union the two trees
        else
            visited(u) = v;
        end
    end
end

function [y] = parent(y, visited)
    % follow parent pointers up to the root of y's tree
    while visited(y) ~= y
        y = visited(y);
    end
end

Hint : Try this on MATLAB

O(V^2)


Dijkstra's Single Source Shortest Path Algorithm

For a given source vertex (node) in the graph, the algorithm finds the path with lowest cost (i.e. the

shortest path) between that vertex and every other vertex. It can also be used for finding costs of

shortest paths from a single vertex to a single destination vertex by stopping the algorithm once the

shortest path to the destination vertex has been determined. For example, if the vertices of the graph

represent cities and edge path costs represent driving distances between pairs of cities connected by a

direct road, Dijkstra's algorithm can be used to find the shortest route between one city and all other

cities.


Algorithm Dijkstras(G)

% G(i,j) = inf when there is no edge between i and j
visited = zeros(1,length(G));
path = zeros(1,length(G));
start = 1;
visited(start) = 1;
for i = 1:length(G)
    d(i) = G(start,i);                 % tentative distance from start to i
    path(i) = start;                   % predecessor of i on the best path found
end

for i = 1:length(G)-1
    Min = inf;
    u = 0;
    for j = 1:length(G)                % pick the closest unvisited vertex u
        if visited(j) == 0 && d(j) < Min
            Min = d(j);
            u = j;
        end
    end
    visited(u) = 1;
    for v = 1:length(G)                % relax the edges leaving u
        if visited(v) == 0 && d(u) + G(u,v) < d(v)
            d(v) = d(u) + G(u,v);
            path(v) = u;
        end
    end
end

Hint : Try this on MATLAB


O(V^2)

Knapsack Problem

Given a set of items, each with a weight and a value, determine the number of each item to include in a

collection so that the total weight is less than a given limit and the total value is as large as possible. It

derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and

must fill it with the most useful items.

A thief robbing a store finds ‘n’ items; the i-th item is worth vi dollars and weighs wi pounds, where vi

and wi are integers. He wants to take as valuable a load as possible, but he can carry at most W pounds

in his knapsack for some integer W. Which items should he take? This is called the 0-1 knapsack problem

because each item must either be taken or left behind; the thief cannot take a fractional amount of an

item or take an item more than once.

In the fractional knapsack problem, the setup is the same, but the thief can take fractions of items,

rather than having to make a binary (0-1) choice for each item. You can think of an item in the 0-1

knapsack problem as being like a gold ingot, while an item in the fractional knapsack problem is more

like gold dust.

Fractional Knapsack

This is called the fractional knapsack problem because any fraction of each item can be used. Using a

greedy strategy to solve this problem, we pack the items in order of their benefit/weight value.

0/1 Knapsack

This is called the 0-1 knapsack problem because each item must be taken in its entirety. If we use a greedy strategy to solve this problem, we would take the objects in order of their benefit/weight value; unlike the fractional case, however, this greedy strategy is not guaranteed to be optimal. For example, with capacity 50 and items of (weight, value) = (10, 60), (20, 100), (30, 120), greedy-by-ratio picks the first two items for a value of 160, whereas taking the last two items gives 220.


b) 0/1 Knapsack Solution

c) Fractional Solution

Algorithm FractionalKnapsack(P, W)

clear; clc;
P = [25 24 15];                   % profits
W = [18 15 10];                   % weights
m = 20;                           % knapsack capacity
X = [0 0 0];                      % X(j) = fraction of item j that is taken
k = 1;
for j = 1:length(P)
    B(k,1) = P(j)/W(j);           % profit per unit of weight
    B(k,2) = W(j);
    B(k,3) = j;                   % remember the original item index
    k = k + 1;
end
B = sortrows(B,-1);               % best profit/weight ratio first
for i = 1:size(B,1)
    if m == 0
        break;
    end
    if B(i,2) <= m
        X(B(i,3)) = 1;            % the whole item fits: take all of it
        m = m - B(i,2);
    else
        X(B(i,3)) = m / B(i,2);   % take only the fraction that still fits
        m = 0;
    end
end
X                                 % e.g. X = [0 1 0.5] for the data above

Hint : Try this on MATLAB

O(n log n), dominated by sorting the items by profit/weight ratio


Job Scheduling with Deadlines and Profits

We have n jobs, each of which takes unit time, and a processor on which we would like to schedule

them in as profitable a manner as possible. Each job has a profit associated with it, as well as a deadline;

if the job is not scheduled by its deadline, then we don’t get its profit. Because each job takes the same

amount of time, we will think of a schedule S as consisting of a sequence of job “slots” 1, 2, 3, . . ., where

S(t) is the job scheduled in slot t.

Formally, the input to the problem is a sequence of pairs (d1, g1), (d2, g2), . . . , (dn, gn) where gi is a non-

negative real number representing the profit obtainable from job i, and di is the deadline for job i.

A schedule is an array S(1), S(2), . . . , S(d) where d = max di (i.e., the latest deadline, beyond which no jobs can be scheduled), such that if S(t) = i, then job i is scheduled at time t, 1 ≤ t ≤ d. If S(t) = 0, then no job is scheduled at time t.

Algorithm JobSchedule(B)

% B = [profit deadline] for each job, e.g.
% B = [99 2; 67 3; 45 1; 34 4; 23 5; 10 3];
B = sortrows(B,-1);               % consider the most profitable job first
d = max(B(:,2));                  % latest deadline
S = zeros(1,d);                   % S(t) = profit of the job scheduled in slot t
for i = 1:size(B,1)
    for t = B(i,2):-1:1           % latest free slot no later than the deadline
        if S(t) == 0
            S(t) = B(i,1);
            break;
        end
    end
end

Hint: Try in MATLAB

O(n log n) for the sort, plus the cost of placing each job in the latest free slot

We have n jobs to execute, each one of which takes a unit time to process. At any time instant we can do only one job. Doing job i earns a profit pi. The deadline for job i is di. Suppose n = 4; p = [50, 10, 15, 30]; d = [2, 1, 2, 1]. It should be clear that we can process no more than two jobs by their respective deadlines. The feasible job sets and their profits are: {1} = 50, {2} = 10, {3} = 15, {4} = 30, {1,2} = 60, {1,3} = 65, {1,4} = 80, {2,3} = 25, {3,4} = 45. The optimal choice is jobs 1 and 4 (job 4 in slot 1, job 1 in slot 2), with profit 80.


Optimal Storage on Tapes

Consider n programs that are to be stored on a tape of length L. Each program i has length li, where 1 ≤ i ≤ n. All programs can be stored on the tape iff the sum of the lengths of the programs is at most L. It is assumed that, whenever a program is to be retrieved, the tape is initially positioned at the start.

Let tj be the time required to retrieve program ij when the programs are stored in the order I = i1, i2, i3, …, in. Since the tape always starts at the beginning, tj is proportional to the total length stored up to and including program ij, i.e.

tj = Σ (k = 1 to j) lik

and the mean retrieval time (MRT) is the average (1/n) Σ (j = 1 to n) tj.

Now the problem is to store the programs on the tape so that the MRT is minimized. From the above discussion one can observe that the MRT is minimized if the programs are stored in increasing order of length, i.e., l1 ≤ l2 ≤ … ≤ ln. Hence this ordering minimizes the retrieval time.

Assume that 3 sorted files are given. Let the lengths of files A, B and C be 7, 3 and 5 units respectively. All three files are to be stored on a tape S in some sequence that reduces the average retrieval time. The table below shows the retrieval times for all possible orders.

Order      Retrieval times     Total    MRT
A B C      7, 10, 15           32       32/3
A C B      7, 12, 15           34       34/3
B A C      3, 10, 15           28       28/3
B C A      3, 8, 15            26       26/3
C A B      5, 12, 15           32       32/3
C B A      5, 8, 15            28       28/3

The order B, C, A (lengths 3, 5, 7, i.e., increasing order of length) gives the smallest MRT of 26/3.


Huffman Coding

Huffman code is a technique for compressing data. Huffman's greedy algorithm looks at the frequency of occurrence of each character and represents each character as a binary string in an optimal way.

Suppose we have data consisting of 100,000 characters that we want to compress. The characters in the data occur with the following frequencies.

Character    a        b        c        d        e        f
Frequency    45,000   13,000   12,000   16,000   9,000    5,000

Consider the problem of designing a "binary character code" in which each character is represented by a

unique binary string.

Character              a        b        c        d        e        f
Frequency              45,000   13,000   12,000   16,000   9,000    5,000
Fixed-length code      000      001      010      011      100      101
Variable-length code   0        101      100      111      1101     1100

In a fixed-length code each codeword has the same length. In a variable-length code codewords may have different lengths. The table above shows examples of fixed- and variable-length codes for our problem.


The fixed-length code requires 300,000 bits, while the variable-length code requires only 224,000 bits.
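These totals can be checked with a few lines of MATLAB (a sketch; the variable names are illustrative):

freq   = [45000 13000 12000 16000 9000 5000];  % frequencies of a, b, c, d, e, f
fixLen = [3 3 3 3 3 3];                        % every fixed codeword is 3 bits
varLen = [1 3 3 3 4 4];                        % lengths of 0, 101, 100, 111, 1101, 1100
fixedBits    = sum(freq .* fixLen)             % 300000
variableBits = sum(freq .* varLen)             % 224000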


Divide and Conquer

A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-

problems of the same (or related) type, until these become simple enough to be solved directly. The

solutions to the sub-problems are then combined to give a solution to the original problem.

This technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g.,

quicksort, merge sort), multiplying large numbers (e.g. Karatsuba), syntactic analysis (e.g., top-down

parsers), and computing the discrete Fourier transform (FFTs).

1. Divide the problem (instance) into sub-problems.

2. Conquer the sub-problems by solving them recursively.

3. Combine sub-problem solutions.

Finding Max and Min

MIN-MAX (A, n)

    if n = 1 then
        return (min, max) = (A(1), A(1))
    end

    A1 = [ A(1), A(2), …, A(n/2) ];  A2 = [ A(n/2 + 1), …, A(n) ];

    (min1, max1) := MIN-MAX (A1, n/2)
    (min2, max2) := MIN-MAX (A2, n - n/2)

    if min1 < min2 then min := min1 else min := min2 end
    if max1 > max2 then max := max1 else max := max2 end

    return (min, max)

T(n) = 2 T(n/2) + 2
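A minimal MATLAB sketch of the same divide-and-conquer scheme (the function name minmax is illustrative and not part of the notes):

function [mn, mx] = minmax(A)
    % Divide-and-conquer minimum and maximum of a vector A.
    n = length(A);
    if n == 1
        mn = A(1);
        mx = A(1);
    else
        mid = floor(n/2);
        [mn1, mx1] = minmax(A(1:mid));       % conquer the left half
        [mn2, mx2] = minmax(A(mid+1:n));     % conquer the right half
        mn = min(mn1, mn2);                  % combine: two comparisons
        mx = max(mx1, mx2);
    end
end

For example, [mn, mx] = minmax([3 7 1 9 4 6]) returns mn = 1 and mx = 9.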

Multiplying Large Integers

The standard integer multiplication routine of two n-digit numbers involves ‘n’ multiplications of an n-

digit number by a single digit, plus the addition of n numbers, which have at most 2n digits. All in all,


assuming that each addition and multiplication between single digits takes O(1) time, this multiplication takes O(n^2).

Imagine multiplying an n-digit number by another n-digit number, where n is a perfect power of 2. (This

will make the analysis easier.) We can split up each of these numbers into two halves.

Say A and B are the numbers. We can split A into AL x 10^(n/2) + AR and B into BL x 10^(n/2) + BR.

So A x B = AL x BL x 10^n + (AL x BR + AR x BL) x 10^(n/2) + AR x BR

Written in this manner we have broken down the problem of the multiplication of 2 n-digit numbers

into 4 multiplications of n/2- digit numbers plus 3 additions. Thus, we can compute the running time

T(n) = 4T(n/2) + O(n)

This has the solution T(n) = O(n^2) by the Master Theorem.

Now, the question becomes, can we optimize this solution in any way. In particular, is there any way to

reduce the number of multiplications done?

Let P2 = AL x BL, P3 = AR x BR, and P1 = (AL + AR) x (BL + BR) = ALBL + (ALBR + ARBL) + ARBR, so that ALBR + ARBL = P1 - P2 - P3. Then

A x B = P2 x 10^n + (P1 - P2 - P3) x 10^(n/2) + P3

Now, consider the work necessary in computing P1, P2 and P3. Both P2 and P3 are n/2-digit

multiplications. But P1 is a bit more complicated to compute. We do two n/2-digit additions (this takes O(n) time), and then one n/2-digit multiplication (on operands of potentially n/2 + 1 digits).

After that, we do two subtractions, and another two additions, each of which still takes O(n) time. Thus,

our running time T(n) obeys the following recurrence relation:

T(n) = 3T(n/2) + O(n).

The solution to this recurrence is T(n) = Θ(n^(log2 3)), which is approximately O(n^1.585), an improvement.

Although this seems it would be slower initially because of some extra pre-computing before doing the

multiplications, for very large integers, this will save time.

Algorithm IntegerMultiplication(A,B, n)

If n = 1

Return A x B

Else

P = A div 10^(n/2); Q = A mod 10^(n/2);
R = B div 10^(n/2); S = B mod 10^(n/2);

P1 = IntegerMultiplication(P+Q,R+S,n/2)

P2 = IntegerMultiplication(P,R,n/2)


P3 = IntegerMultiplication(Q,S,n/2)

Return P2 x 10^n + (P1 - P2 - P3) x 10^(n/2) + P3

End

Strassen’s Matrix Multiplication

Suppose we want to multiply two matrices of size N x N: for example A x B = C.

C11 = a11b11 + a12b21

C12 = a11b12 + a12b22

C21 = a21b11 + a22b21

C22 = a21b12 + a22b22

2x2 matrix multiplication can be accomplished in 8 multiplications.

So, T(n) = 8 T(n/2) + C = O(n^(log2 8)) = O(n^3)

Strassen showed that 2x2 matrix multiplication can be accomplished in 7 multiplications and 18 additions or subtractions.

T(n) = 7 T(n/2) + C = O(n^(log2 7)) ≈ O(n^2.81)

P1 = (A11+ A22)(B11+B22)

P2 = (A21 + A22) * B11

P3 = A11 * (B12 - B22)

P4 = A22 * (B21 - B11)

P5 = (A11 + A12) * B22

P6 = (A21 - A11) * (B11 + B12)

P7 = (A12 - A22) * (B21 + B22)

C11 = P1 + P4 - P5 + P7

C12 = P3 + P5

C21 = P2 + P4

C22 = P1 + P3 - P2 + P6
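One level of Strassen's scheme on a 2-by-2 example can be checked directly against MATLAB's built-in product. This is only a sketch of the seven products above, not a full recursive implementation:

A = rand(2); B = rand(2);
P1 = (A(1,1)+A(2,2)) * (B(1,1)+B(2,2));
P2 = (A(2,1)+A(2,2)) * B(1,1);
P3 = A(1,1) * (B(1,2)-B(2,2));
P4 = A(2,2) * (B(2,1)-B(1,1));
P5 = (A(1,1)+A(1,2)) * B(2,2);
P6 = (A(2,1)-A(1,1)) * (B(1,1)+B(1,2));
P7 = (A(1,2)-A(2,2)) * (B(2,1)+B(2,2));
C  = [P1+P4-P5+P7, P3+P5; P2+P4, P1+P3-P2+P6];
norm(C - A*B)                % should be zero up to round-off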


Closest Pair of Points

Given a set of n points (x , y) the problem asks what is the distance between the two closest points. A

divide-and-conquer algorithm can sort the points along the x-axis, partition the region into two parts Rleft

and Rright having equal number of points, recursively apply the algorithm on the sub-regions, and then

derive the minimal distance in the original region.

The closest pair resides in the left region, the right region, or across the borderline. The last case needs to deal only with points at distance less than d = min(dleft, dright) from the dividing line, where dleft and dright are the minimal distances for the left and right regions, respectively.

The points in the region around the boundary line are sorted along the y coordinate, and processed in that order. The processing consists of comparing each of these points with the points that are ahead of it by at most d in their y coordinate. Since a window of size d x 2d can contain at most 6 points, at most five distances need to be evaluated for each of these points.

The sorting of the points along the x and y coordinates can be done before applying the recursive divide-

and-conquer algorithm; they require O(n log n) time.

The processing of the points along the boundary line takes O(n) time. Hence, the recurrence equation for the time complexity of the algorithm is T(n) = 2T(n/2) + O(n). The solution of the equation is T(n) = O(n log n).


MATLAB Implementation

function [d] = closestpair(A,i,j,n)
    % A is sorted by x-coordinate; n (assumed a power of two, as in the
    % driver below) is the number of points in A(i:j,:).
    if n == 2
        d = ((A(i,1)-A(j,1))^2 + (A(i,2)-A(j,2))^2)^0.5;
    else
        mid = i + n/2 - 1;                          % last index of the left half
        dleft  = closestpair(A, i,     mid, n/2);
        dright = closestpair(A, mid+1, j,   n/2);
        d = min(dleft, dright);
        for p = i:mid                               % pairs straddling the dividing line
            for q = mid+1:j
                if (A(q,1) - A(p,1) < d) && (abs(A(q,2) - A(p,2)) < d)
                    d1 = ((A(p,1)-A(q,1))^2 + (A(p,2)-A(q,2))^2)^0.5;
                    if d1 < d
                        d = d1                      % new closest pair: display it
                        A(p,:)
                        A(q,:)
                    end
                end
            end
        end
    end
end

clc; clear;
A = [2.1 3; 1 1; 2.2 2.5; 4 4; 4 3; 3 3; 3 2; 3 1];
A = sortrows(A);                                    % sort the points by x-coordinate
closestpair(A, 1, length(A), length(A))

Another implementation -

function closest_pair ( P : point set; n : integer ) return float is

DELTA-LEFT, DELTA-RIGHT : float;

DELTA : float;

begin

if n = 2 then

return distance from p(1) to p(2);

else

P-LEFT := ( p(1), p(2) ,..., p(n/2) );

P-RIGHT := ( p(n/2+1), p(n/2+2) ,..., p(n) );

DELTA-LEFT := closestpair( P-LEFT, n/2 );

DELTA-RIGHT := closestpair( P-RIGHT, n/2 );

DELTA := minimum ( DELTA-LEFT, DELTA-RIGHT );

--*********************************************

Determine whether there are points p(l) in

P-LEFT and p(r) in P-RIGHT with

distance( p(l), p(r) ) < DELTA. If there

are such points, set DELTA to be the smallest

distance.

--**********************************************

return DELTA;

end if;


end closest_pair;

for i in 1..s loop

for j in i+1..s loop

exit when (| x(i) - x(j) | > DELTA or

| y(i) - y(j) | > DELTA);

if distance( q(i), q(j) ) < DELTA then

DELTA := distance ( q(i), q(j) );

end if;

end loop;

end loop;


Searching

Computer systems are often used to store large amounts of data from which individual records must be

retrieved according to some search criterion. Thus the efficient storage of data to facilitate fast

searching is an important issue. In this section, we shall investigate the performance of some searching

algorithms and the data structures which they use.

Sequential Search

If there are n items in our collection - whether it is stored as an array or as a linked list - then it is

obvious that in the worst case, when there is no item in the collection with the desired key, then n

comparisons of the key with keys of the items in the collection will have to be made.

Algorithm LinearSearch(A, Item)
    found = false
    For i = 1 to length(A)
        If A[i] = Item
            Print “Item Found”
            found = true
            Break
        End If
    End For
    If found = false
        Print “Item Not Found”
    End If
End

Binary Search

However, if we place our items in an array and sort them in either ascending or descending order on the

key first, then we can obtain much better performance with an algorithm called binary search.

In binary search, we first compare the key with the item in the middle position of the array. If there's a

match, we can return immediately. If the key is less than the middle key, then the item sought must lie

in the lower half of the array; if it's greater then the item sought must lie in the upper half of the array.

So we repeat the procedure on the lower (or upper) half of the array.

Algorithm BinaryIterativeSearch(A, Item)
// ‘A’ is a sorted array of n items
Low = 1
Hi = n
While (Low <= Hi)
    Mid = floor((Low + Hi)/2)
    If A[Mid] == Item
        Print “Item Found”
        Return
    Elseif Item < A[Mid]
        Hi = Mid - 1
    Else
        Low = Mid + 1
    End If
Wend
Print “Item Not Found”

Each step of the algorithm divides the block of items being searched in half. We can divide a set of n

items in half at most log2 n times.

Thus the running time of a binary search is proportional to log n and we say this is an O(log n) algorithm.
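A minimal MATLAB version of the iterative binary search above (the function name binarysearch is illustrative):

function [idx] = binarysearch(A, item)
    % A must be sorted in ascending order; returns 0 if item is absent.
    low = 1;
    hi  = length(A);
    idx = 0;
    while low <= hi
        mid = floor((low + hi)/2);
        if A(mid) == item
            idx = mid;
            return;
        elseif item < A(mid)
            hi = mid - 1;       % search the lower half
        else
            low = mid + 1;      % search the upper half
        end
    end
end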


Interpolation Search

Interpolation search is an extension of the binary search. The basic idea is similar to leafing through a telephone book, where one doesn't just choose the middle element of the search area but estimates where the item is most likely to be, judging by the range limits.

To that end, the difference between the element to be searched for and the first (smallest) element of the interval is divided by the span of the values in the interval and multiplied by the interval length; added to the lower boundary, this gives the interpolated position at which to probe for the element.

With approximately evenly distributed values, the expected complexity of the Interpolation Search is

O(log log n)

Algorithm interpolationSearch(A, item){

low = 1;

hi = n;

while (A[low] <= item && A[hi] >= item) {

mid = low + ((item - A[low]) * (hi - low)) / (A[hi] - A[low]);

if (A[mid] < item)

low = mid + 1;

else if (A[mid] > item)

hi = mid - 1;

else

return mid;

}

if (A[low] == item)

return low;

else


return -1; // Not found

}

Hashing

Hashing is a method for storing and retrieving records from an array. It lets you insert, delete, and

search for records based on a search key value. When properly implemented, these operations can be

performed in constant time i.e O(1). In fact, a properly tuned hash system typically looks at only one or

two records for each search, insert, or delete operation. This is far better than the O(log n) time required

to do a binary search on a sorted array of n records, or the O(log n) time required to do an operation on

a binary search tree. However, even though hashing is based on a very simple idea, it is surprisingly

difficult to implement properly.

A hash system stores records in an array called a hash table. Hashing works by performing a

computation on a search key K in a way that is intended to identify the position in Hash table that

contains the record with key K. The function that does this calculation is called the hash function, and is

usually denoted by the letter ‘h’. Since hashing schemes place records in the table in whatever order

satisfies the needs of the address calculation, records are not ordered by value. A position in the hash

table is also known as a slot. The number of slots in the hash table will be denoted by the variable M, with slots numbered from 0 to M - 1.

The goal for a hashing system is to arrange things such that, for any key value K and some hash function

h, i = h(K) is a slot in the table such that 0 <= i < M, and we have the key of the record stored at A[i]

equal to K.

Hashing generally takes records whose key values come from a large range and stores those records in a

table with a relatively small number of slots. Collisions occur when two records hash to the same slot in

the table. If we are careful — or lucky — when selecting a hash function, then the actual number of

collisions will be few. Unfortunately, even under the best of circumstances, collisions are nearly

unavoidable.

Thus, hashing implementations must include some form of collision resolution policy. Collision

resolution techniques can be broken into two classes: open hashing (also called separate chaining) and

closed hashing (also called open addressing). The difference between the two has to do with whether

collisions are stored outside the table (open hashing), or whether collisions result in storing one of the

records at another slot in the table (closed hashing).

Open Hashing

The simplest form of open hashing defines each slot in the hash table to be the head of a linked list. All

records that hash to a particular slot are placed on that slot's linked list.

The hash function used in this example is h(K) = K mod Array_Size
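A minimal MATLAB sketch of open hashing with h(K) = mod(K, M), where each slot holds the chain of keys that hash to it (the keys are the ones used in the linear-probing example further below; the variable names are illustrative):

M = 10;
buckets = cell(1, M);                  % one (initially empty) chain per slot
keys = [1001 9050 9877 2037];
for K = keys
    slot = mod(K, M) + 1;              % +1 because MATLAB arrays are 1-based
    buckets{slot} = [buckets{slot} K]; % append the key to that slot's chain
end
celldisp(buckets)                      % the slot for remainder 7 holds both 9877 and 2037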


Linear Probing (Closed Hashing)

During insertion, the goal of collision resolution is to find a free slot in the hash table when the home

position for the record is already occupied. We can view any collision resolution method as generating a

sequence of hash table slots that can potentially hold the record. The first slot in the sequence will be

the home position for the key. If the home position is occupied, then the collision resolution policy goes

to the next slot in the sequence. If this is occupied as well, then another slot must be found, and so on.

This is an example of a technique for collision resolution known as linear probing.

In fact, linear probing is one of the worst collision resolution methods. The main problem is illustrated

by the figure below. Here, we see a hash table of ten slots used to store four-digit numbers. The hash

function used is h(K) = K mod 10. The four values 1001, 9050, 9877, and 2037 are inserted into the table.


In the above example you can see that 9877 occupies slot 7. When the next number, 2037, is inserted, it does not find a free slot at position 7 (its home position) because that slot is already occupied, so it is pushed to the next free slot, i.e. slot 8.
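A minimal MATLAB sketch of insertion with linear probing, h(K) = mod(K, 10), reproducing the example above (the variable names are illustrative):

M = 10;
T = -ones(1, M);                 % -1 marks an empty slot
keys = [1001 9050 9877 2037];
for K = keys
    slot = mod(K, M);            % home position, numbered 0..M-1
    while T(slot + 1) ~= -1      % probe forward until a free slot is found
        slot = mod(slot + 1, M);
    end
    T(slot + 1) = K;             % +1 because MATLAB arrays are 1-based
end
T                                % slots 0..9: 9050 1001 _ _ _ _ _ 9877 2037 _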

This tendency of linear probing to cluster items together is known as primary clustering. Small clusters

tend to merge into big clusters, making the problem worse.

Multiple Hashing or Rehash function

A second function can be used to select a second table location for the new item. If that location is also

in use, the rehash function can be applied again to generate a third location, etc. The rehash function

has to be such that when we use it in the lookup process we again generate the same sequence of

locations. Since the idea in a hash table is to do a lookup by only looking at one location in the table,

rehashing should be minimized as much as possible. In the worst case, when the table is completely

filled, one could rehash over and over until every location in the table has been tried. The running time

would be horrible!
