Sorting and Searching Algorithms
Cutajar & Cutajar
Uploaded by john-cutajar, 06-May-2015

Page 1: Algorithms

Sorting and Searching Algorithms

Cutajar & Cutajar

Page 2: Algorithms

Algorithms 2

Sorting

Sorting means the arrangement of records in a particular order, according to a specified sorting key.

For simplicity we will consider sorting of an array of records, all present in main memory.

Simple sorting methods generally have a complexity of the order of N2, where N is the number of records. This means that in a list of N records, sorting normally involves scanning all the N records about N times.

Sort key: ID Card

Before sorting:            After sorting:

ID Card  Surname           ID Card  Surname
712364   Cutajar           345222   Zammit
345222   Zammit            453211   Abela
778879   Sammut            712364   Cutajar
453211   Abela             778879   Sammut

Page 3: Algorithms

Algorithms 3

Considerations For Choosing A Sorting Method

Number of items to be sorted.

Initial arrangement of data

Complexity of algorithm (hence its implementation)

The relative speed of a given method

Size of data items to be sorted.

Page 4: Algorithms

Algorithms 4

Selection Sort

This is a simple technique, similar to the one performed with pencil and paper.

It consists of repeatedly looking through a data array to find the lowest key (for sorting in ascending order). This element is then written to another array, and the data element is then cancelled from the original array (sometimes a rogue value is simply written over it).

This procedure is repeated until all the records are sorted out of the original data array.

With n = 3, the data array empties into the new array one pass at a time:

Pass 1: data = [234, 320,  - ]   new = [156]
Pass 2: data = [ - , 320,  - ]   new = [156, 234]
Pass 3: data = [ - ,  - ,  - ]   new = [156, 234, 320]

Page 5: Algorithms

Algorithms 5

Selection Sort Algorithm

In this algorithm we use a single array instead of two. We scan the data array, find the smallest item and swap it into the first place. We then repeat the scan from the second item and place the next smallest element in the second place, and so on.

Consider the following data array:

123  213  456  145  431  100   (initial)

Pass 1: 100  213  456  145  431  123   (smallest element 100 swapped with 123)
Pass 2: 100  123  456  145  431  213   (123 swapped with 213)
Pass 3: 100  123  145  456  431  213   (145 swapped with 456)
Pass 4: 100  123  145  213  431  456   (213 swapped with 456)
Pass 5: 100  123  145  213  431  456   (431 already the smallest remaining)

Page 6: Algorithms

Algorithms 6

Pseudo-Code

Program Selection_Sort
use variables: numbers array[n] of type integer
               temp, count, pass, lowest of type integer
for pass := 1 to n-1 do
  lowest := pass
  for count := pass to n-1 do {find smallest in numbers[pass..n]}
    if numbers[count+1] < numbers[lowest] then lowest := count+1
  end for
  temp := numbers[pass] {swap current with smallest}
  numbers[pass] := numbers[lowest]
  numbers[lowest] := temp
end for
end program
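The pseudo-code above translates almost line for line into Python; a minimal sketch, using 0-based list indices where the pseudo-code counts from 1:

```python
def selection_sort(numbers):
    """In-place selection sort: for each position, find the smallest
    remaining element and swap it into that position."""
    n = len(numbers)
    for p in range(n - 1):                  # 'pass' in the pseudo-code
        lowest = p
        for count in range(p + 1, n):       # find smallest in numbers[p..n-1]
            if numbers[count] < numbers[lowest]:
                lowest = count
        # swap current with smallest
        numbers[p], numbers[lowest] = numbers[lowest], numbers[p]
    return numbers

print(selection_sort([123, 213, 456, 145, 431, 100]))
# → [100, 123, 145, 213, 431, 456]
```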

Page 7: Algorithms

Algorithms 7

Insertion Sort

In this algorithm the first two elements of the array are compared and arranged in order. The third element is compared with the first two and inserted in its correct position. This process is repeated until every element in the list has been inserted in its correct position. It is similar to the way we sort playing cards by hand.

Initial array: 21  16  35  47  19  12

Pass 1: 16  21  35  47  19  12   (16 inserted before 21)
Pass 2: 16  21  35  47  19  12   (35 already in position)
Pass 3: 16  21  35  47  19  12   (47 already in position)
Pass 4: 16  19  21  35  47  12   (19 inserted)
Pass 5: 12  16  19  21  35  47   (12 inserted)

Page 8: Algorithms

Algorithms 8

Insertion Sort Flowchart

Flowchart notation:
CP : value of the pointer to the number currently being inserted
PP : value of the pointer to the numbers (in the ordered list) being compared to CP(item)
CP(item) : item in the location pointed at by CP
PP(item) : item in the location pointed at by PP

Flowchart logic (reconstructed as text):

Start ; CP := 1
repeat
  CP := CP+1 ; PP := 1
  while PP <= CP and PP(item) <= CP(item)
    PP := PP+1
  if PP <= CP then
    temp := CP(item)
    exchange consecutive items down to PP+1
    PP(item) := temp
until CP = max
End

Page 9: Algorithms

Algorithms 9

Insertion Sort Algorithm

In this algorithm we use the same array, shifting the data downwards to make space for each insertion:

Initially: 123  213  456  145  431  100

Pass 1: 123  213  456  145  431  100   (213 already in position)
Pass 2: 123  213  456  145  431  100   (456 already in position)
Pass 3: 123  145  213  456  431  100   (213 and 456 shifted down, 145 inserted)
Pass 4: 123  145  213  431  456  100   (456 shifted down, 431 inserted)
Pass 5: 100  123  145  213  431  456   (all elements shifted down, 100 inserted)

Page 10: Algorithms

Algorithms 10

Pseudo Code

Program Insertion_Sort
use variables: numbers array[n] of type integer {indexed 0 to n-1}
               current, pass, position of type integer
for pass := 1 to n-1 do
  current := numbers[pass]
  position := pass
  while (position > 0 AND numbers[position-1] > current) do
    numbers[position] := numbers[position-1] {shift down}
    position := position-1
  end while
  numbers[position] := current {insert current element in the gap}
end for
end program
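The same shifting logic in Python; a minimal sketch following the pseudo-code directly:

```python
def insertion_sort(numbers):
    """In-place insertion sort: shift larger elements down to open
    a gap, then drop the current element into it."""
    for p in range(1, len(numbers)):
        current = numbers[p]
        position = p
        while position > 0 and numbers[position - 1] > current:
            numbers[position] = numbers[position - 1]  # shift down
            position -= 1
        numbers[position] = current                    # insert into the gap
    return numbers

print(insertion_sort([21, 16, 35, 47, 19, 12]))
# → [12, 16, 19, 21, 35, 47]
```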

Page 11: Algorithms

Algorithms 11

Bubble Sort

This is so called because the smallest element rises to the top of the array, then the next smallest 'bubbles' up to the next position, and so on.

On the first pass the last two elements (n) and (n-1) of the array are compared and exchanged if necessary. This process is repeated with the (n-1) and (n-2) elements, then with the (n-2) and (n-3) elements, and so on until the smallest element arrives at the top of the array.

The next pass repeats the same procedure on the whole array except the already sorted elements.

The passes end when the array is completely sorted.

Page 12: Algorithms

Algorithms 12

Bubble Sort Algorithm

Initially: 3  2  0  4  1

Pass 1: 3 2 0 1 4 → 3 0 2 1 4 → 0 3 2 1 4   (0 bubbles to the top)
Pass 2: 0 3 1 2 4 → 0 1 3 2 4               (1 bubbles up to second place)
Pass 3: 0 1 2 3 4                           (2 bubbles up)
Pass 4: 0 1 2 3 4                           (no exchanges needed)

Final: 0 1 2 3 4

Page 13: Algorithms

Algorithms 13

Pseudo-Code

Program bubble_sort
use variables: numbers array[n] of type integer
               pass, current of type integer
for pass := 1 to n-1 do
  for current := n downto pass+1 do
    if numbers[current] < numbers[current-1]
      then swap(numbers[current], numbers[current-1])
  end for
end for
end program

Procedure swap (parameters: variable a, b of type integer)
use variables: temp of type integer
temp := a
a := b
b := temp
end procedure
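A minimal Python sketch of the same bubble sort, comparing neighbours from the bottom up so the smallest element rises first (Python's tuple assignment replaces the swap procedure):

```python
def bubble_sort(numbers):
    """Bubble sort: on each pass, compare adjacent elements from the
    bottom of the list upwards and exchange them if out of order."""
    n = len(numbers)
    for p in range(n - 1):                    # passes 1 .. n-1
        for current in range(n - 1, p, -1):   # n downto pass+1
            if numbers[current] < numbers[current - 1]:
                numbers[current], numbers[current - 1] = \
                    numbers[current - 1], numbers[current]
    return numbers

print(bubble_sort([3, 2, 0, 4, 1]))
# → [0, 1, 2, 3, 4]
```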

Page 14: Algorithms

Algorithms 14

Computational Complexity

Computational Complexity or simply Complexity is the number of steps or arithmetic operations required to solve the posed problem.

The interesting aspect is usually how complexity scales with the size of the input (the "scalability"), where the size of the input is described by some number N.

Thus an algorithm may have computational complexity O(N2) (of the order of the square of the size of the input), in which case if the input doubles in size, the computation will take four times as many steps. The ideal is a constant time algorithm (O(1)) or failing that, O(log2N) or O(N).


Page 15: Algorithms

Algorithms 15

The Big-O Analysis

In complexity analysis, O denotes "in the order of".

It tells us how the running time of an algorithm grows in proportion to N; it is not an exact equality.

Thus if the number of execution steps of an algorithm is given by:

a.N + b we say it is of the order of N, or O(N)

a.(log2N) + b we say it is of the order of log2N, or O(log2N)

a.N.(log2N) + b we say it is of the order of N.log2N, or O(N.log2N)

a.N2 + b we say it is of the order of N2, or O(N2)

Or simply a we say it is constant, or O(1).

If the complexity is of an order higher than polynomial, say exponential in N, the algorithm is considered impractical because of the enormous amount of time required to perform the task.

Page 16: Algorithms

Algorithms 16

Complexity Example

Consider the following codes:

Procedure Example1
  For j := 1 to 2N do {a1 steps}
  End for
  For i := 1 to N do {a2 steps}
  End for
  {b steps}
End Procedure

The execution steps would be b + a.N, where a = 2a1 + a2, and so the complexity is still O(N).

Procedure Example2
  For i := 1 to 2N do
    For j := 1 to N do {a steps}
    End for
  End for
  {b steps}
End Procedure

The execution steps would be b + 2.a.N2, and so the complexity is still O(N2).

Page 17: Algorithms

Algorithms 17

Complexity Classes

Most algorithms fall into one of the following types:

Type                 Complexity       Comment
Logarithmic          O(log2N)         Very Good
Linear               O(N)             Good
Linear-Logarithmic   O(N.log2N)       Fairly Good
Quadratic            O(N2)            OK
Polynomial           O(Nk), k > 2     Poor
Exponential          O(aN), a > 1     Awful

Page 18: Algorithms

Algorithms 18

Complexity of Simple Sorts

To analyse the complexity of the sorting algorithms seen so far, i.e. Insertion Sort, Selection Sort and Bubble Sort, let us analyse the number of comparisons made for a general list of N elements.

In all these algorithms the number of comparisons is:

N-1 for pass 1
N-2 for pass 2
N-3 for pass 3
...
2 for pass N-2
1 for pass N-1

Sum = (N-1) + (N-2) + (N-3) + ... + 2 + 1
or      1   +   2   +   3   + ... + (N-2) + (N-1)

If we add the two lines above term by term we get:

2*Sum = N + N + N + ... + N + N, i.e. N taken (N-1) times.

Thus 2*Sum = N*(N-1), therefore Sum = N*(N-1)/2

which could also have been obtained as the sum of an arithmetic progression. Thus

Cplx = O(N2)
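A quick numerical check of the derivation above (the helper name is illustrative, not from the slides): summing (N-1) + (N-2) + ... + 1 directly always matches the closed form N*(N-1)/2.

```python
def count_comparisons(n):
    """Total comparisons made by a simple sort on n elements:
    (n-1) on pass 1, (n-2) on pass 2, ..., 1 on pass n-1."""
    return sum(n - p for p in range(1, n))

# the direct sum agrees with the arithmetic-progression formula
for n in (2, 6, 10, 100):
    assert count_comparisons(n) == n * (n - 1) // 2

print(count_comparisons(10))
# → 45
```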

Page 19: Algorithms

Algorithms 19

Quick Sort

A more complex routine, used when the list of items to be sorted is large.

Although Quicksort is faster than other methods when sorting large amounts of data, it is often slower (depending on both the implementation and the starting order) with less than about a dozen items. Hence quicksort programs sometimes include a switch to another method whenever the number remaining to be sorted drops below some arbitrary figure.

Additionally, if the unsorted list is already somewhat ordered, the quicksort method becomes inefficient: the worst case for quicksort is an input list which is already in order!

Page 20: Algorithms

Algorithms 20

Algorithm

Select an item, usually the first item, of the unsorted list. This is called the Pivot

Partition the remaining items into TWO sublists.

A LEFT SUBLIST, with data items LESS than the selected item

A RIGHT SUBLIST, with data items GREATER than the selected item.

Place the pivot between these two sublists.

If left sublist contains more than one item

Then Quicksort the left sublist

If right sublist contains more than one item

Then Quicksort the right sublist.

Page 21: Algorithms

Algorithms 21

How to Partition

Pivot Value := Table[First]
Up := First
Down := Last
Repeat
  Increment Up until Table[Up] > Pivot Value
  Decrement Down until Table[Down] <= Pivot Value
  If Up < Down then exchange their values
Until Up >= Down
Exchange Table[First] and Table[Down]
Define Pivot Index as Down

Page 22: Algorithms

Algorithms 22

Partitioning Example

Pivot Value := Table[First] ; Up := First ; Down := Last   (Pivot = 44)

Table: 44  76  23  43  55  12  64  77  33
       First/Up                    Last/Down

Increment Up until Table[Up] > Pivot Value: Up stops at 76.
Decrement Down until Table[Down] <= Pivot Value: Down stops at 33.
Up < Down, so exchange their values:

Table: 44  33  23  43  55  12  64  77  76

Page 23: Algorithms

Algorithms 23

Partitioning (cont…)

Up is < Down, so continue (Pivot = 44).

Table: 44  33  23  43  55  12  64  77  76

Increment Up until Table[Up] > Pivot Value: Up stops at 55.
Decrement Down until Table[Down] <= Pivot Value: Down stops at 12.
Up < Down, so exchange their values:

Table: 44  33  23  43  12  55  64  77  76

Page 24: Algorithms

Algorithms 24

Partitioning (cont…)

Up is < Down, so continue (Pivot = 44).

Table: 44  33  23  43  12  55  64  77  76

Increment Up until Table[Up] > Pivot Value: Up stops at 55.
Decrement Down until Table[Down] <= Pivot Value: Down stops at 12.
Now Down < Up, so exchange the pivot value with Table[Down]:

Table: 12  33  23  43  44  55  64  77  76

Page 25: Algorithms

Algorithms 25

Partitioning (cont…)

Note that all values under the Pivot Index are smaller than the Pivot Value and all values above the Pivot Index are Larger than the Pivot Value

This gives us two sub-arrays to re-partition

Table: 12  33  23  43  44  55  64  77  76   (Pivot = 44)
       Partition 1       ^           Partition 2
       (First1..Last1)   Pivot Index (First2..Last2)

Page 26: Algorithms

Algorithms 26

Quick Sort Algorithm

Procedure QuickSort (use variables First, Last : integer)
use variables PivIndex : integer
If (First < Last) then
  PivIndex := Partition(First, Last)
  QuickSort(First, PivIndex-1)
  QuickSort(PivIndex+1, Last)
Endif
End Procedure
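A runnable Python sketch of the partition-and-recurse scheme above, using the first element as pivot and the Up/Down scan from the partitioning slides (0-based indices; the bounds guard on Up is an added safety detail):

```python
def partition(table, first, last):
    """Partition table[first..last] around table[first] (the pivot):
    Up scans right past elements <= pivot, Down scans left past
    elements > pivot; out-of-place pairs are exchanged."""
    pivot = table[first]
    up, down = first, last
    while True:
        while up < last and table[up] <= pivot:   # increment Up
            up += 1
        while table[down] > pivot:                # decrement Down
            down -= 1
        if up >= down:
            break
        table[up], table[down] = table[down], table[up]
    table[first], table[down] = table[down], table[first]  # pivot into place
    return down                                   # the pivot index

def quick_sort(table, first=0, last=None):
    """Recursive quicksort as on the slides."""
    if last is None:
        last = len(table) - 1
    if first < last:
        piv = partition(table, first, last)
        quick_sort(table, first, piv - 1)
        quick_sort(table, piv + 1, last)
    return table
```

On the slides' example, `partition` leaves 44 at index 4 with smaller values to its left and larger to its right, exactly as pictured.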

Page 27: Algorithms

Algorithms 27

Quick Sort Example

Consider the following list

12 33 23 43 | 44 | 55 64 77 76   (pivot 44 now in place)

Left sublist 12 33 23 43: pivot 12 → sublist 33 23 43; pivot 33 → 23 | 43
Right sublist 55 64 77 76: pivot 55 → sublist 64 77 76; pivot 64 → 77 76; pivot 77 → 76

12 23 33 43 44 55 64 76 77

Page 28: Algorithms

Algorithms 28

Complexity of Quick Sort

Let us analyse the number of comparisons that are made in this algorithm:

N for the first pass where all the elements are compared with the pivot

2*N/2 for the next pair of passes where N/2 elements in each “half” of the original array are compared to their own pivot values.

4*N/4 for the next four passes where N/4 elements in each “quarter” of the original array are compared to their own pivot values.

How Many Partitions Occur ??

Page 29: Algorithms

Algorithms 29

How Many Partitions

It depends on the order of the original array elements: if each partition divides the sub-array approximately in half, there will be only log2N levels of partitioning, and so Quicksort is O(N.log2N).

But if the original array was sorted to begin with, the recursive calls will partition the array into parts of unequal length, with one part empty and the other part containing all the rest of the array except for the pivot value itself. In this case there can be as many as N-1 partitions made, and QuickSort will be O(N2).

Best case: log2(8) = 3 levels of partitioning for 8 elements.
Worst case: 7 partitions for 8 elements.

Page 30: Algorithms

Algorithms 30

Comparison Of Sorting Routines

Relative speeds for sorting random integers, using different methods:

[Chart comparing Bubble Sort, Insertion Sort, Selection Sort and Quick Sort]

Page 31: Algorithms

Algorithms 31

Merge Sort

In MergeSort, the list to be sorted is successively subdivided in two until the number of elements in each sub-list is one or two.

Subsequently they are merged together in order in such a way that after successive merges the whole list is recomposed in the desired sorting order.

This algorithm lends itself well to a recursive method of programming.

Page 32: Algorithms

Algorithms 32

Merge Sort Algorithm

1. If the input sequence has fewer than two elements, return.
2. Partition the input sequence into two halves.
3. Sort the two subsequences using the same algorithm.
4. Merge the two sorted subsequences to form the output sequence.

MergeSort(list, first, last)
  if (first < last)
    middle := (first + last) div 2
    MergeSort(list, first, middle)
    MergeSort(list, middle+1, last)
    Merge(list, first, middle, last)
  endif

list: 44  76  23  43  55  12  64  77  33
      First                        Last

Page 33: Algorithms

Algorithms 33

The Merge Algorithm (Part I)

Procedure Merge (use variables A[] array of integer; f, m, l : integer)
use variables B[SIZE] : array of integer
              first1, last1, first2, last2, index of type integer
first1 := f ; last1 := m ; first2 := m+1 ; last2 := l
index := first1
while (first1 <= last1) AND (first2 <= last2) do
  if A[first1] < A[first2] then
    B[index] := A[first1]
    first1 := first1+1
  else
    B[index] := A[first2]
    first2 := first2+1
  endif
  index := index+1
end while

Example: merging the sorted runs A = 44 76 | 23 43 (positions f..m and m+1..l) gives B = 23 43 44 76.

Page 34: Algorithms

Algorithms 34

The Merge Algorithm (Part II)

while first1 <= last1 do {finish off first sub-array if necessary}
  B[index] := A[first1]
  first1 := first1+1
  index := index+1
end while

while first2 <= last2 do {finish off second sub-array if necessary}
  B[index] := A[first2]
  first2 := first2+1
  index := index+1
end while

for index := f to l do {copy temporary array back to original array}
  A[index] := B[index]
end for
End Procedure
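Both parts of the pseudo-code combine into a short Python sketch: split recursively, then merge the two sorted runs through a temporary list and copy it back, as on the slides.

```python
def merge(a, f, m, l):
    """Merge the sorted runs a[f..m] and a[m+1..l] via a temporary list."""
    b = []
    first1, first2 = f, m + 1
    while first1 <= m and first2 <= l:
        if a[first1] < a[first2]:
            b.append(a[first1]); first1 += 1
        else:
            b.append(a[first2]); first2 += 1
    b.extend(a[first1:m + 1])   # finish off first sub-array if necessary
    b.extend(a[first2:l + 1])   # finish off second sub-array if necessary
    a[f:l + 1] = b              # copy temporary list back to the original

def merge_sort(lst, first=0, last=None):
    """Recursive merge sort: split, sort halves, merge."""
    if last is None:
        last = len(lst) - 1
    if first < last:
        middle = (first + last) // 2
        merge_sort(lst, first, middle)
        merge_sort(lst, middle + 1, last)
        merge(lst, first, middle, last)
    return lst

print(merge_sort([44, 76, 23, 43, 55, 12, 64, 77, 33]))
# → [12, 23, 33, 43, 44, 55, 64, 76, 77]
```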

Page 35: Algorithms

Algorithms 35

Merge Sort Complexity

The entire array can be subdivided into halves only log2N times.

Each time it is subdivided, function Merge is called to re-combine the halves. Function Merge uses a temporary array to store the merged elements.

Merging is O(N) because it compares each element in the sub-arrays.

Copying the elements back from the temporary array to the original array is also O(N).

Thus Merge Sort is O(N.log2N)

Page 36: Algorithms

Algorithms 36

Tree Sort (Tournament Sort)

Algorithm:

Transform the unsorted list into a binary search tree.

In a binary search tree, every node has the following property: all of its left descendants are smaller in value than the node itself, and all of its right descendants are larger than its value.

Traverse the resultant binary search tree in order.

Page 37: Algorithms

Algorithms 37

Tree Sort Example

Consider the following unsorted list of numbers:

27 48 13 50 39 77 82 91 65 19 70 66

Creating the binary search tree for this given list:

27
├── left:  13
│          └── right: 19
└── right: 48
           ├── left:  39
           └── right: 50
                      └── right: 77
                                 ├── left:  65
                                 │          └── right: 70
                                 │                     └── left: 66
                                 └── right: 82
                                            └── right: 91

Traversing this resultant tree in order we get the sorted list:

13 19 27 39 48 50 65 66 70 77 82 91.
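The two steps of the algorithm, build the tree and traverse it in order, can be sketched in Python (nodes stored as nested `[left, value, right]` lists, an illustrative representation, not from the slides):

```python
def _insert(node, v):
    """Insert v into the BST: smaller values go left, others go right."""
    if node is None:
        return [None, v, None]
    if v < node[1]:
        node[0] = _insert(node[0], v)
    else:
        node[2] = _insert(node[2], v)
    return node

def _in_order(node, out):
    """In-order traversal: left subtree, node, right subtree."""
    if node is not None:
        _in_order(node[0], out)
        out.append(node[1])
        _in_order(node[2], out)

def tree_sort(values):
    """Tree sort: build a binary search tree, then read it back in order."""
    root = None
    for v in values:
        root = _insert(root, v)
    out = []
    _in_order(root, out)
    return out

print(tree_sort([27, 48, 13, 50, 39, 77, 82, 91, 65, 19, 70, 66]))
# → [13, 19, 27, 39, 48, 50, 65, 66, 70, 77, 82, 91]
```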

Page 38: Algorithms

Algorithms 38

Tree Sort Complexity

Building of the tree has a complexity of O(N). This is done just one time in the algorithm.

To read the tree back into a sorted array, a descent of the tree must be performed for each of the N elements, with a cost depending on the shape of the tree.

In the best case the tree is perfectly balanced, i.e. the maximum difference in level between the lowest leaf and the highest leaf is 1. Each element then requires log2N steps, leading to a total complexity of O(N.log2N).

In the worst case the tree is totally unbalanced and degenerates into a simple linked list. This leads to an average of N/2 steps to reach each of the N elements, giving an overall complexity of O(N2). This occurs when the original list is already ordered, so that every child is placed to the right of its parent node.

Note that this algorithm requires N extra memory space to build the tree.

Page 39: Algorithms

Algorithms 39

Complexity Summary

Sorting Algorithm   Best Case      Average Case   Worst Case
Selection Sort      O(N2)          O(N2)          O(N2)
Insertion Sort      O(N2)          O(N2)          O(N2)
Bubble Sort         O(N2)          O(N2)          O(N2)
Merge Sort          O(N.log2N)     O(N.log2N)     O(N.log2N)
Quick Sort          O(N.log2N)     O(N.log2N)     O(N2)
Tree Sort           O(N.log2N)     O(N.log2N)     O(N2)

Page 40: Algorithms

Algorithms 40

Comments on Sorting Algorithms

Bubble, Insertion and Selection sort routines are preferable when the list of items to be sorted consists of a few elements.

Bubble sort is the slowest in execution but the easiest method (and the simplest implementation).

Insertion and Selection sorts have approximately the same speed and both are usually marginally faster than a bubble sort; their implementations (programs) are short and simple (an advantage).

Quick Sort and Tree Sort are far faster than the above methods when large quantities of items are to be sorted. Both suffer from the initial-arrangement problem.

Merge Sort doesn't suffer from the initial-arrangement problem. However, Merge Sort and Tree Sort require extra memory space for the temporary array or the binary tree, and the algorithms (and implementations) of the latter three are much more complex.

Page 41: Algorithms

Algorithms 41

Linear Search

This can be a matter of looking through the array, element by element sequentially until the required key is found.

Since the particular key may be absent from the unsorted array, searching the array may require the search through the entire array.

Thus the complexity of this type of searching is of the order of N, O(N).

This is clearly an inefficient method, called direct or linear search, and is only used for small arrays.

Sorting the array on the search key can improve the efficiency: the search can stop as soon as the particular key, or a greater one, is found.

Example: searching for the key 456 in the unsorted array
342 234 123 675 455 664 344 888 645 456
may require scanning all N elements.

Page 42: Algorithms

Algorithms 42

Pseudo-Code

Program direct_search
use variables: numbers array[N] of type integer {indexed 0 to N-1}
               key, item of type integer
               found of type boolean
found := FALSE
item := 0
repeat
  if key = numbers[item] then found := TRUE
  else item := item + 1
until found = TRUE OR item = N
end program
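A minimal Python sketch of the linear search; instead of the pseudo-code's `found` flag it returns the index of the key, or -1 when the key is absent (a common convention, not from the slides):

```python
def direct_search(numbers, key):
    """Linear search: scan the list left to right until the key is
    found or the end of the list is reached."""
    item = 0
    while item < len(numbers):
        if numbers[item] == key:
            return item        # key found at this index
        item += 1
    return -1                  # key absent

print(direct_search([342, 234, 123, 675, 455, 664, 344, 888, 645, 456], 456))
# → 9
```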

Page 43: Algorithms

Algorithms 43

Binary Search

This algorithm, although performing more computation per step, is more efficient because it makes far fewer comparisons.

The method involves finding the center of the array and comparing the search key with that element. If the search key is equal then the element is found.

If the search key is less than the element then the key must be in the lower half of the array so the same procedure is repeated on the lower half.

If the search key is greater than the element then the key must be in the upper half of the array so the same procedure is repeated on the upper half.

An allowance must be made for the case where the key doesn't exist in the array.

The complexity of this algorithm is thus O(log2N).

NOTE: This algorithm works only on sorted arrays !

Page 44: Algorithms

Algorithms 44

Binary Search Flowchart

Flowchart logic (reconstructed as text):

Start: consider the entire input list.
repeat
  Is the considered list empty?
    YES: output 'not found' (S NOT FOUND); End.
  Compare S to the middle item of the considered list.
    If equal: output 'found' (S FOUND); End.
    If S < considered item: take the lower half as the considered list.
    Otherwise: take the upper half as the considered list.

Page 45: Algorithms

Algorithms 45

Binary Search Algorithm

Searching for the key 456 in the sorted array:

123  234  342  344  455  456  645  664  675  888

Pass 1: middle element is 455; 456 > 455, so search the upper half: 456 645 664 675 888
Pass 2: middle element is 664; 456 < 664, so search the lower half: 456 645
Pass 3: middle element is 456: found.

Page 46: Algorithms

Algorithms 46

Pseudo-Code

Program Binary_Search
use variables: numbers array[N] of type integer {sorted, indexed 0 to N-1}
               start, middle, end, key of type integer
               found of type boolean
start := 0
end := N-1
found := FALSE
repeat
  middle := (end + start) div 2
  if key = numbers[middle] then found := TRUE
  else if key > numbers[middle] then start := middle + 1
  else end := middle - 1
until found = TRUE OR start > end
end program
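The same halving logic in Python; a minimal sketch that returns the index of the key, or -1 when the key is absent (the pseudo-code's `found` flag expressed as a return value):

```python
def binary_search(numbers, key):
    """Binary search on a sorted list: repeatedly halve the
    considered range until the key is found or the range is empty."""
    start, end = 0, len(numbers) - 1
    while start <= end:
        middle = (start + end) // 2
        if numbers[middle] == key:
            return middle              # found
        elif key > numbers[middle]:
            start = middle + 1         # search the upper half
        else:
            end = middle - 1           # search the lower half
    return -1                          # key absent

print(binary_search([123, 234, 342, 344, 455, 456, 645, 664, 675, 888], 456))
# → 5
```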