ADSA: Sorting/3 1
241-423 Advanced Data Structures and Algorithms
• Objective: examine popular sorting algorithms, with an emphasis on divide and conquer
Semester 2, 2013-2014
3. Sorting Algorithms
Contents
1. Insertion Sort
2. Divide and Conquer Algorithms
3. Merge Sort
4. Quicksort
5. Comparison of Sorting Algorithms
6. Finding the kth Largest Element
1. Insertion Sort
• Each pass inserts an element (x) into a sorted sublist (sub-array) on the left.
• Items larger than x move to the right to make room for its insertion.
Outline Algorithm
• Assume the first array element is in the right position.
• In the ith pass (1 ≤ i ≤ n-1), the elements in the range 0 to i-1 are already sorted.
• Insert the target (the element at position i) into its correct position j by moving the elements in the range [j, i-1] one place to the right, which frees arr[j].
Simple Insertion Sort
public static void insertion_srt(int arr[]){
    int n = arr.length;
    for (int i = 1; i < n; i++) {
        int j = i;
        int target = arr[i];   // sort ith elem
        while ((j > 0) && (arr[j-1] > target)){
            arr[j] = arr[j-1];   // move right
            j--;
        }
        arr[j] = target;
    }
}
insertionSort()
public static <T extends Comparable<? super T>>
void insertionSort(T[] arr)
{
    int n = arr.length;
    for (int i = 1; i < n; i++) {
        int j = i;
        T target = arr[i];
        while (j > 0 && target.compareTo(arr[j-1]) < 0) {
            arr[j] = arr[j-1];
            j--;
        }
        arr[j] = target;
    }
}  // end of insertionSort()
Insertion Sort Efficiency
• Best case running time is O(n) – when the array is already sorted
• The worst and average case running times are O(n^2).
• Insertion sort is very efficient when the array is "almost sorted".
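The best and worst cases can be checked by counting comparisons. The following is a minimal sketch, not code from the slides: the class name InsertionSortCount and the method sortAndCount() are made up for illustration, and it works on int[] rather than a generic array.

```java
public class InsertionSortCount {
    // Sorts arr in place with insertion sort and returns
    // the number of element comparisons that were made.
    static long sortAndCount(int[] arr) {
        long comparisons = 0;
        for (int i = 1; i < arr.length; i++) {
            int target = arr[i];
            int j = i;
            while (j > 0) {
                comparisons++;                    // compare arr[j-1] with target
                if (arr[j - 1] <= target) break;  // found the insertion point
                arr[j] = arr[j - 1];              // move right
                j--;
            }
            arr[j] = target;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        int[] reversed = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;          // best case: already sorted
            reversed[i] = n - i;    // worst case: reverse order
        }
        System.out.println("sorted input:   " + sortAndCount(sorted) + " comparisons");
        System.out.println("reversed input: " + sortAndCount(reversed) + " comparisons");
    }
}
```

For sorted input each pass needs one comparison, giving n-1 in total. For reversed input pass i needs i comparisons, giving n(n-1)/2.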
2. Divide and Conquer Algorithms
• Divide a problem into smaller versions of the same problem, using recursion.
• Solve the smaller versions.
• Combine the solutions of the smaller versions to get an answer for the big, original problem.
Examples
• Binary search
• Merge sort and quicksort (here)
• Binary tree traversal
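As an illustration of the first example, here is a minimal recursive binary search. It is a sketch under assumptions: the class BinarySearchDemo is hypothetical, the array is assumed to be sorted in ascending order, and the half-open index range [first, last) is borrowed from the sorting code later in these slides. Binary search is a degenerate case of divide and conquer, since only one half is searched and no combine step is needed.

```java
public class BinarySearchDemo {
    // Returns an index of target in the sorted range [first, last) of arr,
    // or -1 if target does not occur in that range.
    static int binarySearch(int[] arr, int first, int last, int target) {
        if (first >= last)
            return -1;                               // empty range: not found
        int mid = first + (last - first) / 2;        // divide
        if (arr[mid] == target)
            return mid;
        else if (target < arr[mid])
            return binarySearch(arr, first, mid, target);      // lower half
        else
            return binarySearch(arr, mid + 1, last, target);   // upper half
    }

    public static void main(String[] args) {
        int[] arr = {2, 5, 8, 13, 21, 34};
        System.out.println(binarySearch(arr, 0, arr.length, 13));
        System.out.println(binarySearch(arr, 0, arr.length, 7));
    }
}
```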
3. Merge Sort
• Sort an array with n elements by splitting it into two halves. Keep splitting in half recursively.
• Stop splitting when the sublists are small (a sublist of one element is already sorted).
• Merge the small sorted sublists back together, pair by pair, into a single sorted array.
General Sort Methods
• Ford and Topp's Arrays class provides two versions of the merge sort algorithm.
– one version takes an Object array arr[] as input;
– the other version is generic and specifies arr[] as an array of type T
• Both methods call msort() to carry out the merge sort.
sort() - with Object Array
public static void sort(Object[] arr)
{
    // create a temporary array
    Object[] tempArr = arr.clone();

    // sort the entire array (the index range [0, arr.length))
    msort(arr, tempArr, 0, arr.length);
}
sort() - Generic Version
public static <T extends Comparable<? super T>>
void sort(T[] arr)
{
    // create a temporary array
    T[] tempArr = (T[])arr.clone();

    msort(arr, tempArr, 0, arr.length);
}
msort()
• Split into two lists by computing the midpoint of the index range:
int midpt = (last + first)/2;
• Call msort() recursively on the index range [first, midpt) and on the index range [midpt, last).
• When the resulting lists are small, start merging them back together into sorted order.
msort()
private static void msort(Object[] arr, Object[] tempArr,
                          int first, int last)
{
    // if sublist has more than 1 elem.
    if ((first + 1) < last) {
        int midpt = (last + first)/2;

        msort(arr, tempArr, first, midpt);
        msort(arr, tempArr, midpt, last);

        // if arr[] is now sorted, finish
        if (((Comparable)arr[midpt-1]).compareTo(arr[midpt]) <= 0)
            return;

        // indexA scans arr[] in range [first, midpt)
        int indexA = first;
        // indexB scans arr[] in range [midpt, last)
        int indexB = midpt;
        int indexC = first;   // for merged temp list

        /* while both sublists are not finished, compare
           arr[indexA] and arr[indexB]; copy the smaller
           into the temp list */
        while (indexA < midpt && indexB < last) {
            if (((Comparable)arr[indexA]).compareTo(arr[indexB]) < 0) {
                tempArr[indexC] = arr[indexA];
                indexA++;
            }
            else {
                tempArr[indexC] = arr[indexB];
                indexB++;
            }
            indexC++;
        }

        // copy over what's left of sublist A
        while (indexA < midpt) {
            tempArr[indexC] = arr[indexA];
            indexA++;
            indexC++;
        }

        // copy over what's left of sublist B
        while (indexB < last) {
            tempArr[indexC] = arr[indexB];
            indexB++;
            indexC++;
        }

        // copy temp array back to arr[]
        for (int i = first; i < last; i++)
            arr[i] = tempArr[i];
    }
}  // end of msort()
msort() Notes
• Continue only as long as first+1 < last, i.e. the sublist has more than one element
• Do not merge if arr[midpt-1] ≤ arr[midpt]: the two sorted halves are then already in order
Efficiency of Merge Sort
• Total number of comparisons = no. of levels * no. of comparisons at a level
• msort() starts with a list of size n
• msort() recurses until the sublist size is 1
• Each level roughly halves the sublist size:
– n, n/2, n/4, ..., 1
– no. of levels = log_2 n (roughly)
• No. of msort() calls at a level:
– at level 0: 1 msort() call
– at level 1: 2 calls
– at level 2: 4 calls
– ...
– at level i: 2^i calls
• No. of comparisons in 1 msort() call at a level:
– at level 0: a msort() call compares n elements
– at level 1: n/2 comparisons
– at level 2: n/4 comparisons
– ...
– at level i: n/2^i comparisons
• Total no. of comparisons at a level:
– no. of calls at a level * comparisons in 1 msort() call
– 2^i * n/2^i = n
• Total number of comparisons = no. of levels * no. of comparisons at a level
= log_2 n * n
• So the worst case running time is O(n log_2 n)
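The level-by-level estimate can be checked by counting comparisons in a small merge sort. This is a sketch under assumptions, not Ford and Topp's code: the class MergeSortCount and its method names are hypothetical, it sorts int[] instead of Object[], and it leaves out the "already sorted" early exit so that every merge is counted.

```java
import java.util.Random;

public class MergeSortCount {
    static long comparisons = 0;

    // Sorts arr[first, last) using temp as scratch space, counting comparisons.
    static void msort(int[] arr, int[] temp, int first, int last) {
        if (first + 1 >= last)
            return;                          // 0 or 1 element: already sorted
        int mid = (first + last) / 2;
        msort(arr, temp, first, mid);
        msort(arr, temp, mid, last);

        int a = first, b = mid, c = first;
        while (a < mid && b < last) {        // merge the two sorted halves
            comparisons++;
            if (arr[a] <= arr[b]) temp[c++] = arr[a++];
            else                  temp[c++] = arr[b++];
        }
        while (a < mid)  temp[c++] = arr[a++];   // what's left of the lower half
        while (b < last) temp[c++] = arr[b++];   // what's left of the upper half
        System.arraycopy(temp, first, arr, first, last - first);
    }

    static long sortAndCount(int[] arr) {
        comparisons = 0;
        msort(arr, new int[arr.length], 0, arr.length);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1024;                        // a power of two, so log_2 n = 10
        int[] arr = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++)
            arr[i] = rnd.nextInt(1000000);
        long count = sortAndCount(arr);
        System.out.println("comparisons = " + count + ", n * log_2 n = " + (n * 10));
    }
}
```

Merging two sublists with p and q elements needs at most p+q-1 comparisons, so the measured count stays below n * log_2 n.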
4. Quicksort
• Uses a divide-and-conquer strategy like merge sort.
• But, unlike merge sort, quicksort is an in-place sorting algorithm
– elements are exchanged within the list without the need for temporary lists/arrays
– space efficient
Quicksort Steps
• Pick an element, called a pivot, from the list.
• Reorder the list so that all elements which are less than the pivot come before the pivot and all elements greater than the pivot come after it.
• Recursively call quicksort on the sublist of lesser elements and the sublist of greater elements.
• The stopping cases for the recursion are lists of size zero or one, which are always sorted.
Partitioning a List
• The pivot is the element at index mid = (first + last)/2.
• Separate the elements of arr[] into two sublists, Sl and Sh.
– Sl contains the elements ≤ pivot (l = low)
– Sh contains the elements ≥ pivot (h = high)
• Exchange arr[first] and arr[mid]
• Scan the list with index range [first+1, last)
– scanUp starts at first+1 and moves up the list, finding elements for Sl.
– scanDown starts at position last-1 and moves down the list, finding elements for Sh.
• When arr[scanUp] ≥ pivot and arr[scanDown] ≤ pivot then the two elements are in the wrong sublists.
• Exchange the elements at the two positions and then resume scanning.
• scanUp and scanDown move toward each other until they meet or pass one another (scanDown ≤ scanUp).
• scanDown is at the place where the pivot should appear
– exchange arr[first] (arr[0] when partitioning the whole array) and arr[scanDown] to correctly position the pivot
pivotIndex()
• The method
public static <T extends Comparable<? super T>>
int pivotIndex(T[] arr, int first, int last)
takes array arr and index range [first, last) and returns the index of the pivot after partitioning arr[].
public static <T extends Comparable<? super T>>
int pivotIndex(T[] arr, int first, int last)
{
    int mid;    // index for the midpoint
    T pivot;

    if (first == last)            // empty sublist
        return last;
    else if (first == (last-1))   // 1-element sublist
        return first;
    else {
        mid = (last + first)/2;
        pivot = arr[mid];

        // exchange pivot and bottom end of range
        arr[mid] = arr[first];
        arr[first] = pivot;

        int scanUp = first + 1;   // scanning indices
        int scanDown = last - 1;

        while (true) {
            /* move up the lower sublist while scanUp is
               less than or equal to scanDown and the
               array value is less than pivot */
            while ((scanUp <= scanDown) &&
                   (arr[scanUp].compareTo(pivot) < 0))
                scanUp++;

            /* move down upper sublist while array value
               is greater than the pivot */
            while (pivot.compareTo(arr[scanDown]) < 0)
                scanDown--;

            /* if indices are not in their sublists,
               partition is complete */
            if (scanUp >= scanDown)
                break;

            // found two elements in wrong sublists; exchange
            T temp = arr[scanUp];
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;

            scanUp++;
            scanDown--;
        }

        // copy pivot to index posn (scanDown) that
        // partitions the sublists
        arr[first] = arr[scanDown];
        arr[scanDown] = pivot;

        return scanDown;
    }
}  // end of pivotIndex()
quicksort()
public static <T extends Comparable<? super T>>
void quicksort(T[] arr)
{
    qsort(arr, 0, arr.length);
}
• quicksort() sorts a generic array arr[] by calling qsort() with the index range [0, arr.length).
qsort()
• Recursively partition the elements in the index range into smaller and smaller sublists, terminating when the size of a list is 0 or 1.
• For efficiency, handle a list of size 2 by comparing the elements and exchanging them if necessary.
• For larger lists, call pivotIndex() to reorder the elements and determine the pivot.
• Make two calls to qsort():
– the first call specifies the index range for the lower sublist
– the second call specifies the index range for the upper sublist
private static <T extends Comparable<? super T>>
void qsort(T[] arr, int first, int last)
{
    // if range is less than two elements
    if ((last - first) <= 1)
        return;
    // if sublist has two elements
    else if ((last - first) == 2) {
        /* compare arr[first] and arr[last-1] and
           exchange if necessary */
        if (arr[last-1].compareTo(arr[first]) < 0) {
            T temp = arr[last-1];
            arr[last-1] = arr[first];
            arr[first] = temp;
        }
        return;
    }
    else {
        int pivotLoc = pivotIndex(arr, first, last);
        qsort(arr, first, pivotLoc);
        qsort(arr, pivotLoc +1, last);
    }
}  // end of qsort()
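To try the partitioning scheme without Ford and Topp's library, the same steps can be written for a plain int[]. This is a self-contained sketch under assumptions: the class QuickSortSketch is hypothetical, the generic compareTo() calls are replaced by int comparisons, and the special case for two-element lists is left out for brevity.

```java
import java.util.Arrays;
import java.util.Random;

public class QuickSortSketch {
    // Partitions arr[first, last) around its midpoint element and
    // returns the final index of the pivot.
    static int pivotIndex(int[] arr, int first, int last) {
        if (first == last) return last;          // empty sublist
        if (first == last - 1) return first;     // 1-element sublist
        int mid = (first + last) / 2;
        int pivot = arr[mid];
        arr[mid] = arr[first];                   // move pivot to bottom of range
        arr[first] = pivot;
        int scanUp = first + 1;
        int scanDown = last - 1;
        while (true) {
            while (scanUp <= scanDown && arr[scanUp] < pivot) scanUp++;
            while (pivot < arr[scanDown]) scanDown--;
            if (scanUp >= scanDown) break;       // scans have met or passed
            int temp = arr[scanUp];              // both elements are in the
            arr[scanUp] = arr[scanDown];         // wrong sublists: exchange
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }
        arr[first] = arr[scanDown];              // put pivot in its final place
        arr[scanDown] = pivot;
        return scanDown;
    }

    static void qsort(int[] arr, int first, int last) {
        if (last - first <= 1) return;           // size 0 or 1: sorted
        int pivotLoc = pivotIndex(arr, first, last);
        qsort(arr, first, pivotLoc);             // lower sublist
        qsort(arr, pivotLoc + 1, last);          // upper sublist
    }

    static void quicksort(int[] arr) {
        qsort(arr, 0, arr.length);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int[] arr = new int[20];
        for (int i = 0; i < arr.length; i++) arr[i] = rnd.nextInt(100);
        quicksort(arr);
        System.out.println(Arrays.toString(arr));
    }
}
```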
Running Time of Quicksort
• The average case running time is O(n log_2 n).
• With the midpoint pivot used here, the best case occurs when the array is already sorted, since every pivot is then the median of its sublist.
• The worst case occurs when the chosen pivot is always the largest or smallest element in its sublist.
– the running time is O(n^2)
– highly unlikely
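The effect of the pivot choice can be seen by counting comparisons on sorted input. The sketch below is an illustration only and differs from the slides' code: the class QuickSortWorstCase is hypothetical, and it uses a simpler single-scan partition instead of the scanUp/scanDown scheme. When the first element is the pivot, sorted input makes every pivot the smallest element of its sublist, which is the worst case.

```java
public class QuickSortWorstCase {
    static long comparisons;

    // Quicksort on arr[first, last). If useMid is true the midpoint element
    // is the pivot, otherwise the first element of the range is the pivot.
    static void qsort(int[] arr, int first, int last, boolean useMid) {
        if (last - first <= 1)
            return;
        if (useMid)
            swap(arr, first, (first + last) / 2);   // bring midpoint to the front
        int pivot = arr[first];
        int boundary = first;      // arr[first+1 .. boundary] holds values < pivot
        for (int j = first + 1; j < last; j++) {
            comparisons++;
            if (arr[j] < pivot)
                swap(arr, ++boundary, j);
        }
        swap(arr, first, boundary);                 // pivot goes between the sublists
        qsort(arr, first, boundary, useMid);
        qsort(arr, boundary + 1, last, useMid);
    }

    static void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    // Counts the comparisons needed to sort the already sorted array 0..n-1.
    static long count(int n, boolean useMid) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++)
            arr[i] = i;
        comparisons = 0;
        qsort(arr, 0, n, useMid);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 200;
        System.out.println("first-element pivot: " + count(n, false));
        System.out.println("midpoint pivot:      " + count(n, true));
    }
}
```

With the first element as pivot, a sublist of size m costs m-1 comparisons and shrinks by only one element, so sorted input of size n needs n(n-1)/2 comparisons in total. The midpoint pivot splits sorted input evenly and needs far fewer.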
5. Comparison of Sorting Algorithms
• An inversion in an array, arr[], is an ordered pair (arr[i], arr[j]), i < j, where arr[i] > arr[j].
• When sorting in ascending order, arr[i] and arr[j] are out of order.
• The O(n^2) sorting algorithms compare adjacent elements, and generally remove one inversion with each iteration
– e.g. selection and insertion sort
• The O(n log_2 n) sorting algorithms compare non-adjacent elements, and generally remove more than one inversion with each iteration.
– e.g. quicksort and merge sort
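The definition of an inversion can be made concrete with a brute-force counter. This is a minimal sketch, and the class InversionCount and the method countInversions() are made-up names. It checks every pair (i, j) with i < j, so it takes O(n^2) time itself.

```java
public class InversionCount {
    // Counts the pairs (i, j) with i < j and arr[i] > arr[j].
    static long countInversions(int[] arr) {
        long inversions = 0;
        for (int i = 0; i < arr.length; i++)
            for (int j = i + 1; j < arr.length; j++)
                if (arr[i] > arr[j])
                    inversions++;
        return inversions;
    }

    public static void main(String[] args) {
        int[] arr = {3, 1, 2};
        // the inversions are (3,1) and (3,2)
        System.out.println(countInversions(arr));
    }
}
```

A sorted array has no inversions, and an array in reverse order has the maximum, n(n-1)/2.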
Timing Sorts
import java.util.Random;
import ds.util.Arrays;
import ds.time.Timing;

public class TimingSorts
{
    public static void main(String[] args)
    {
        final int SIZE = 75000;
        Integer[] arr1 = new Integer[SIZE],
                  arr2 = new Integer[SIZE],
                  arr3 = new Integer[SIZE];
        Random rnd = new Random();

        /* load each array with the same sequence of
           random numbers in the range 0 to 999999 */
        int rndNum;
        for (int i=0; i < SIZE; i++) {
            rndNum = rnd.nextInt(1000000);
            arr1[i] = arr2[i] = arr3[i] = rndNum;
        }

        // call timeSort() for each sort
        timeSort(arr1, 0, "Merge sort");
        timeSort(arr2, 1, "Quick sort");
        timeSort(arr3, 2, "Insertion sort");
    }  // end of main()

    public static <T extends Comparable<? super T>>
    void timeSort(T[] arr, int sortType, String sortName)
    {
        Timing t = new Timing();
        t.start();

        if (sortType == 0)
            Arrays.sort(arr);     // merge sort in F&T
        else if (sortType == 1)
            Arrays.quicksort(arr);
        else
            Arrays.insertionSort(arr);

        double timeRequired = t.stop();

        outputFirst_Last(arr);
        System.out.print(" " + sortName + " time is " +
                                   timeRequired + "\n\n");
    }  // end of timeSort()

    // output first 3 elements and last 3 elements
    public static void outputFirst_Last(Object[] arr)
    {
        for (int i=0; i < 3; i++)
            System.out.print(arr[i] + " ");
        System.out.print(". . . ");
        for (int i=arr.length-3; i < arr.length; i++)
            System.out.print(arr[i] + " ");
        System.out.println();
    }
}
Output
26 38 47 . . . 999980 999984 999984 Merge sort time is 0.109        [O(n log_2 n)]
26 38 47 . . . 999980 999984 999984 Quick sort time is 0.078        [O(n log_2 n)]
26 38 47 . . . 999980 999984 999984 Insertion sort time is 100.611  [O(n^2)]
6. Finding the kth Largest Element
• Sort the array and then access the element at position k.
– running time is O(n log_2 n) if we use quicksort or merge sort
• For a more efficient solution, locate the position of the kth-largest value by partitioning the elements into two sublists.
• The lower sublist contains k elements that are ≤ the kth-largest.
• The upper sublist contains elements that are ≥ the kth-largest.
• The elements in the sublists do not need to be ordered.
[Figure: array layout after partitioning. Indexes 0 ... k-1 hold values ≤ kth-largest, index k holds the kth-largest, and indexes k+1 ... n-1 hold values ≥ kth-largest.]
• Use the pivoting technique from the quicksort algorithm to create a partition.
• The algorithm is recursive:
– index = pivotIndex()
– if index == k, done, return arr[index];
– otherwise, call pivotIndex() with range [first, index) if k < index, or with range [index+1, last) if k > index.
• Only one of the two sublists is examined at each step.
public static <T extends Comparable<? super T>>
T findKth(T[] arr, int first, int last, int k)
{
    if (first >= last)
        return null;    // empty range: nothing to find

    // partition range [first, last) in arr about the
    // pivot arr[index]
    int index = pivotIndex(arr, first, last);

    // if index == k, we are done. kth largest is arr[k]
    if (index == k)
        return arr[index];    // return array value
    else if (k < index)
        // search in lower sublist [first, index)
        return findKth(arr, first, index, k);
    else
        // search in upper sublist [index+1, last)
        return findKth(arr, index+1, last, k);
}
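The same idea can be tried on a plain int[] without the recursion. This is a sketch under assumptions: the class FindKthSketch and its partition() helper are hypothetical, partition() repeats the scanUp/scanDown scheme of pivotIndex() for int values, and k is the 0-based position the value would have in the sorted array, matching the layout described above.

```java
import java.util.Arrays;

public class FindKthSketch {
    // Partitions the non-empty range arr[first, last) around its midpoint
    // element and returns the final index of the pivot.
    static int partition(int[] arr, int first, int last) {
        int mid = (first + last) / 2;
        int pivot = arr[mid];
        arr[mid] = arr[first];
        arr[first] = pivot;
        int scanUp = first + 1;
        int scanDown = last - 1;
        while (true) {
            while (scanUp <= scanDown && arr[scanUp] < pivot) scanUp++;
            while (pivot < arr[scanDown]) scanDown--;
            if (scanUp >= scanDown) break;
            int temp = arr[scanUp];
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }
        arr[first] = arr[scanDown];
        arr[scanDown] = pivot;
        return scanDown;
    }

    // Returns the value that would be at index k if arr were sorted.
    // The elements of arr are reordered as a side effect.
    static int findKth(int[] arr, int k) {
        if (k < 0 || k >= arr.length)
            throw new IllegalArgumentException("k out of range: " + k);
        int first = 0;
        int last = arr.length;
        while (true) {
            int index = partition(arr, first, last);
            if (index == k)
                return arr[index];
            else if (k < index)
                last = index;          // continue in lower sublist
            else
                first = index + 1;     // continue in upper sublist
        }
    }

    public static void main(String[] args) {
        int[] arr = {7, 2, 9, 4, 1, 8};
        System.out.println("k = 3: " + findKth(arr.clone(), 3));
        System.out.println("after sorting: " + Arrays.toString(sortedCopy(arr)));
    }

    static int[] sortedCopy(int[] arr) {
        int[] copy = arr.clone();
        Arrays.sort(copy);
        return copy;
    }
}
```

The loop keeps k inside [first, last) at every step, so the range passed to partition() is never empty.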
Running Time of findKth()
• The average running time is O(n)
– no. of comparisons = n + n/2 + n/4 + n/8 + ... ≤ 2n, assuming each partition roughly halves the range
• This is faster than the O(n log_2 n) cost of sorting the array first
– this is to be expected since findKth() only uses one of its sublists at each recursive call, whereas quicksort and merge sort use both
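The bound on the number of comparisons follows from the geometric series. As a worked step, assuming each partition halves the range exactly:

```latex
\[
  n + \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots
  \;=\; n \sum_{i=0}^{\infty} \frac{1}{2^{i}}
  \;=\; n \cdot \frac{1}{1 - \tfrac{1}{2}}
  \;=\; 2n
\]
```

The real sum stops after about log_2 n terms, so it is strictly less than 2n.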