2IL50 Data Structures, Spring 2015, Lecture 5: QuickSort & Selection
![Page 1: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/1.jpg)
2IL50 Data Structures
Spring 2015
Lecture 5: QuickSort & Selection
![Page 2: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/2.jpg)
QuickSort
One more sorting algorithm …
![Page 3: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/3.jpg)
Sorting algorithms
Input: a sequence of n numbers ‹a_1, a_2, …, a_n›
Output: a permutation ‹a_{i_1}, …, a_{i_n}› of the input such that a_{i_1} ≤ … ≤ a_{i_n}
Important properties of sorting algorithms:
running time: how fast is the algorithm in the worst case
in place: only a constant number of input elements are ever stored outside the input array
![Page 4: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/4.jpg)
Sorting algorithms
| Algorithm | Worst-case running time | In place |
|---|---|---|
| InsertionSort | Θ(n²) | yes |
| MergeSort | Θ(n log n) | no |
| HeapSort | Θ(n log n) | yes |
| QuickSort | Θ(n²) | yes |
![Page 5: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/5.jpg)
QuickSort
Why QuickSort?
1. Expected running time: Θ(n log n) (randomized QuickSort)
2. Constants hidden in Θ(n log n) are small
3. Using linear-time median finding to guarantee a good pivot gives a worst-case running time of Θ(n log n)
![Page 6: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/6.jpg)
QuickSort
QuickSort is a divide-and-conquer algorithm
To sort the subarray A[p..r]:
Divide: Partition A[p..r] into two subarrays A[p..q-1] and A[q+1..r], such that each element in A[p..q-1] is ≤ A[q] and A[q] is < each element in A[q+1..r].
Conquer: Sort the two subarrays by recursive calls to QuickSort.
Combine: No work is needed to combine the subarrays, since they are sorted in place.
Divide using a procedure Partition which returns q.
![Page 7: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/7.jpg)
QuickSort
QuickSort(A, p, r)
    if p < r
        then q = Partition(A, p, r)
             QuickSort(A, p, q-1)
             QuickSort(A, q+1, r)

Partition(A, p, r)
    x = A[r]
    i = p-1
    for j = p to r-1
        do if A[j] ≤ x
            then i = i+1
                 exchange A[i] ↔ A[j]
    exchange A[i+1] ↔ A[r]
    return i+1
Initial call: QuickSort(A, 1, n)
Partition always selects A[r] as the pivot (the element around which to partition)
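The pseudocode above translates almost line by line into Python. The following is a minimal sketch, assuming 0-based list indices instead of the slides' 1-based arrays; the function names `partition` and `quicksort` are chosen here for illustration.

```python
def partition(a, p, r):
    """Partition a[p..r] (inclusive) around the pivot a[r]; return the pivot's final index."""
    x = a[r]                      # pivot: always the last element
    i = p - 1                     # a[p..i] holds the elements <= x seen so far
    for j in range(p, r):         # j = p .. r-1
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # move the pivot between the two regions
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; the initial call quicksort(a) sorts the whole list."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


data = [8, 1, 6, 4, 0, 3, 9, 5]
quicksort(data)
print(data)  # [0, 1, 3, 4, 5, 6, 8, 9]
```

Because of the 0-based indexing, the initial call corresponds to QuickSort(A, 1, n) on the slides.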
![Page 8: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/8.jpg)
Partition
As Partition executes, the array is partitioned into four regions (some may be empty)
Loop invariant
1. all entries in A[p..i] are ≤ pivot
2. all entries in A[i+1..j-1] are > pivot
3. A[r] = pivot
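Partition(A, p, r)
    x = A[r]
    i = p-1
    for j = p to r-1
        do if A[j] ≤ x
            then i = i+1
                 exchange A[i] ↔ A[j]
    exchange A[i+1] ↔ A[r]
    return i+1
Figure: the four regions of A[p..r]: A[p..i] holds entries ≤ x, A[i+1..j-1] holds entries > x, A[j..r-1] is not yet examined (???), and A[r] = x.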
![Page 9: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/9.jpg)
Partition
Example: Partition on A = ‹8, 1, 6, 4, 0, 3, 9, 5› with pivot x = A[r] = 5
8 1 6 4 0 3 9 5    initially: i = p-1, j = p
8 1 6 4 0 3 9 5    8 > 5: no exchange
1 8 6 4 0 3 9 5    1 ≤ 5: i = i+1, exchange 8 ↔ 1
1 8 6 4 0 3 9 5    6 > 5: no exchange
1 4 6 8 0 3 9 5    4 ≤ 5: i = i+1, exchange 8 ↔ 4
1 4 0 8 6 3 9 5    0 ≤ 5: i = i+1, exchange 6 ↔ 0
1 4 0 3 6 8 9 5    3 ≤ 5: i = i+1, exchange 8 ↔ 3
1 4 0 3 6 8 9 5    9 > 5: no exchange, loop ends
1 4 0 3 5 8 9 6    exchange A[i+1] ↔ A[r], return i+1
![Page 10: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/10.jpg)
Partition - Correctness
Loop invariant
1. all entries in A[p..i] are ≤ pivot
2. all entries in A[i+1..j-1] are > pivot
3. A[r] = pivot
Initialization: before the loop starts, all conditions are satisfied, since A[r] is the pivot and the two subarrays A[p..i] and A[i+1..j-1] are empty.
Maintenance: while the loop is running, if A[j] ≤ pivot, then A[j] and A[i+1] are swapped and then i and j are incremented ➨ 1. and 2. hold. If A[j] > pivot, then only j is incremented ➨ 1. and 2. hold.
![Page 11: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/11.jpg)
Partition - Correctness
Loop invariant
1. all entries in A[p..i] are ≤ pivot
2. all entries in A[i+1..j-1] are > pivot
3. A[r] = pivot
Termination: when the loop terminates, j = r, so every element of A[p..r] falls into one of three cases: A[p..i] ≤ pivot, A[i+1..r-1] > pivot, and A[r] = pivot.
Line 7 moves the pivot between the two subarrays and line 8 returns its index.
Partition(A, p, r)
1. x = A[r]
2. i = p-1
3. for j = p to r-1
4.     do if A[j] ≤ x
5.         then i = i+1
6.              exchange A[i] ↔ A[j]
7. exchange A[i+1] ↔ A[r]
8. return i+1
Running time: Θ(n) for an n-element subarray
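The loop invariant can also be checked mechanically. The sketch below is an illustration only, assuming 0-based indices; the name `partition_checked` is hypothetical. It asserts the three invariant conditions at the start of every iteration and the termination condition after the loop.

```python
def partition_checked(a, p, r):
    """Partition a[p..r] around a[r], asserting the loop invariant along the way."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        # Invariant at the start of each iteration:
        assert all(v <= x for v in a[p:i + 1])   # 1. a[p..i] <= pivot
        assert all(v > x for v in a[i + 1:j])    # 2. a[i+1..j-1] > pivot
        assert a[r] == x                         # 3. a[r] is the pivot
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Termination (j = r): three regions a[p..i], a[i+1..r-1], a[r]
    assert all(v <= x for v in a[p:i + 1])
    assert all(v > x for v in a[i + 1:r])
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


values = [8, 1, 6, 4, 0, 3, 9, 5]
q = partition_checked(values, 0, len(values) - 1)
print(q, values)  # 4 [1, 4, 0, 3, 5, 8, 9, 6]
```

The checks themselves cost more than the partitioning, so they are meant for experimenting, not for production use.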
![Page 12: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/12.jpg)
QuickSort: running time
QuickSort(A, p, r)
    if p < r
        then q = Partition(A, p, r)
             QuickSort(A, p, q-1)
             QuickSort(A, q+1, r)
Running time depends on partitioning of subarrays:
if they are balanced, then QuickSort is as fast as MergeSort
if they are unbalanced, then QuickSort can be as slow as InsertionSort
Worst case
subarrays completely unbalanced: 0 elements in one, n-1 in the other
T(n) = T(n-1) + T(0) + Θ(n) = T(n-1) + Θ(n) = Θ(n²)
input: sorted array
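To see the quadratic worst case concretely, one can count the comparisons QuickSort makes on an already sorted array. The sketch below is an assumption-laden illustration: it uses 0-based indices, a hypothetical function name `quicksort_comparisons`, and an explicit stack instead of recursion so that the unbalanced case does not hit Python's recursion limit.

```python
def quicksort_comparisons(a):
    """Sort a in place with last-element-pivot QuickSort; return the number of element comparisons."""
    comparisons = 0
    stack = [(0, len(a) - 1)]          # subarrays a[p..r] still to be sorted
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x, i = a[r], p - 1
        for j in range(p, r):
            comparisons += 1           # one comparison A[j] <= x per loop iteration
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        stack.append((p, i))           # left part  a[p..q-1] with q = i+1
        stack.append((i + 2, r))       # right part a[q+1..r]
    return comparisons


n = 1000
worst = quicksort_comparisons(list(range(n)))
print(worst, n * (n - 1) // 2)  # 499500 499500
```

On sorted input with distinct values every call splits off only the pivot, so the count is (n-1) + (n-2) + … + 1 = n(n-1)/2.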
![Page 13: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/13.jpg)
QuickSort: running time
Best case
subarrays completely balanced: each has ≤ n/2 elements
T(n) = 2T(n/2) + Θ(n) = Θ(n log n)
Average?
![Page 14: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/14.jpg)
QuickSort: running time
Average running time is much closer to best case than to worst case.
Intuition
imagine that Partition always produces a 9-to-1 split:
T(n) = T(9n/10) + T(n/10) + Θ(n)
![Page 15: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/15.jpg)
T(n) = T(9n/10) + T(n/10) + Θ(n)
Remember Section 4.2 (or Lecture 2):
log_10 n full levels, log_{10/9} n non-empty levels
base of log does not matter in asymptotic notation (as long as it is constant)
![Page 16: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/16.jpg)
QuickSort: running time
Average running time is much closer to best case than to worst case.
Intuition
imagine that Partition always produces a 9-to-1 split:
T(n) = T(9n/10) + T(n/10) + Θ(n) = Θ(n log n)
Any split of constant proportionality yields a recursion tree of depth Θ(log n)
But splits will not always be of constant proportionality; there will be a mix of good and bad splits …
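A small calculation makes the depth claim tangible. The sketch below assumes an idealized split in which one side always receives a fixed fraction of the elements; the function name `split_depths` and the parameter `big_fraction` are made up for this illustration.

```python
import math


def split_depths(n, big_fraction=0.9):
    """Approximate depths of the shallowest and deepest leaves of the recursion tree
    when every split sends big_fraction of the elements to one side."""
    shallow = math.log(n) / math.log(1 / (1 - big_fraction))  # log_10 n for a 9-to-1 split
    deep = math.log(n) / math.log(1 / big_fraction)           # log_{10/9} n for a 9-to-1 split
    return shallow, deep


for n in (10**3, 10**6, 10**9):
    shallow, deep = split_depths(n)
    # The deepest path is only a constant factor longer than log2(n).
    print(n, round(shallow, 1), round(deep, 1), round(deep / math.log2(n), 2))
```

For every n the ratio of the deepest level to log₂ n is the same constant, ln 2 / ln(10/9), which is why the 9-to-1 split still gives Θ(n log n).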
![Page 17: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/17.jpg)
QuickSort: running time
Average running time is much closer to best case than to worst case.
More intuition …
mixing good and bad splits does not affect the asymptotic running time
assume levels alternate between best-case and worst case splits
extra levels add only to hidden constant, in both cases O(n log n)
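Figure: two recursion trees, each with Θ(n) partitioning cost at the top. Left: a worst-case split of n into 0 and n-1, followed by a best-case split of the n-1 elements into two parts of about (n-1)/2 elements each. Right: a single best-case split of n into two parts of about (n-1)/2 elements each.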
![Page 18: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/18.jpg)
Randomized QuickSort
pick pivot at random
RandomizedPartition(A, p, r)
    i = Random(p, r)
    exchange A[r] ↔ A[i]
    return Partition(A, p, r)
random pivot results in reasonably balanced split on average ➨ expected running time Θ(n log n)
see book for detailed analysis
alternative: use linear time median finding to find a good pivot ➨ worst case running time Θ(n log n)
price to pay: added complexity
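Randomized QuickSort changes only the choice of the pivot. The following is a minimal sketch, assuming 0-based indices and Python's standard `random` module as the source of randomness; the optional `rng` parameter is an addition for reproducible experiments and is not part of the slides.

```python
import random


def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r, rng=random):
    i = rng.randint(p, r)          # uniform over p..r, both ends inclusive
    a[r], a[i] = a[i], a[r]        # move the random pivot to position r
    return partition(a, p, r)


def randomized_quicksort(a, p=0, r=None, rng=random):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r, rng)
        randomized_quicksort(a, p, q - 1, rng)
        randomized_quicksort(a, q + 1, r, rng)


data = list(range(300))            # sorted input: the worst case for the fixed pivot A[r]
randomized_quicksort(data, rng=random.Random(0))
print(data == list(range(300)))    # True
```

With a random pivot no particular input is bad; the Θ(n²) behaviour can still occur, but only with small probability over the random choices.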
![Page 19: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/19.jpg)
Selection
Medians and Order Statistics
![Page 20: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/20.jpg)
Definitions
ith order statistic: ith smallest of a set of n elements
minimum: 1st order statistic
maximum: nth order statistic
median: “halfway point”
n odd ➨ unique median at i = (n+1)/2
n even ➨ lower median at i = n/2, upper median at i = n/2+1
here: median means lower median
![Page 21: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/21.jpg)
The selection problem
Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
Output: The element x ∈ A that is larger than exactly i-1 other elements in A. (The ith smallest element of A.)
Easy solution:
1. sort the input in Θ(n log n) time
2. return the ith element in the sorted array
This can be done faster … start with minimum and maximum
![Page 22: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/22.jpg)
Minimum and maximum
Find the minimum with n-1 comparisons: examine each element in turn and keep track of the smallest one
Minimum(A, n)
    min = A[1]
    for i = 2 to n
        do if min > A[i]
            then min = A[i]
    return min
Find the maximum by replacing > with <
Is this the best we can do? Yes: each element (except the minimum) must be compared to a smaller element at least once, so n-1 comparisons are necessary.
![Page 23: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/23.jpg)
Simultaneous minimum and maximum
Assume we need to find both the minimum and the maximum
Easy solution: find both separately
➨ 2n-2 comparisons ➨ Θ(n) time
But at most 3⌊n/2⌋ comparisons are needed …
maintain the minimum and maximum seen so far
don’t compare elements to the minimum and maximum separately, process them in pairs
compare the elements of each pair to each other, then compare the larger to the maximum and the smaller to the minimum
➨ 3 comparisons for every 2 elements
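The pairing idea can be sketched as follows. This is an illustrative implementation, not code from the lecture: the function name `min_max` and the explicit comparison counter are assumptions added to make the 3⌊n/2⌋ bound visible.

```python
def min_max(a):
    """Return (minimum, maximum, comparisons) of a non-empty sequence, processing elements in pairs."""
    n = len(a)
    comparisons = 0
    if n % 2 == 1:
        lo = hi = a[0]             # odd n: the first element starts as both min and max
        start = 1
    else:
        comparisons += 1           # even n: one comparison initializes min and max
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for k in range(start, n, 2):   # the remaining elements form complete pairs
        small, large = a[k], a[k + 1]
        comparisons += 1           # compare the pair to each other
        if small > large:
            small, large = large, small
        comparisons += 1           # smaller element against the current minimum
        if small < lo:
            lo = small
        comparisons += 1           # larger element against the current maximum
        if large > hi:
            hi = large
    return lo, hi, comparisons


values = [20, 3, 10, 8, 14, 6, 12, 9, 11, 18, 7, 4, 5, 17, 15, 1, 2, 13]
print(min_max(values))  # (1, 20, 25)
```

For these n = 18 elements the sketch uses 25 comparisons, below both 3⌊n/2⌋ = 27 and the 2n-2 = 34 of the separate approach.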
![Page 24: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/24.jpg)
The selection problem
Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
Output: The element x ∈ A that is larger than exactly i-1 other elements in A. (The ith smallest element of A.)
Theorem: The ith smallest element of A can be found in O(n) time in the worst case.
Idea: partition the input array, recurse on one side of the split
guarantee a good split
use Partition with a designated pivot element
![Page 25: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/25.jpg)
Selection in worst-case linear time
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
20 3 10 8 14 6 12 9 11 18 7 4 5 17 15 1 2 13
1. Divide the n elements into groups of 5 ➨ ⌈n/5⌉ groups
i = 12
![Page 26: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/26.jpg)
Selection in worst-case linear time
1. Divide the n elements into groups of 5 ➨ ⌈n/5⌉ groups
2. Find the median of each of the ⌈n/5⌉ groups (sort each group of 5 elements in constant time and simply pick the median)
3. Find the median x of the ⌈n/5⌉ medians recursively
4. Partition the array around x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
20 3 10 8 14 6 12 9 11 18 7 4 5 17 15 1 2 13
i = 12, x = 7 (the group medians are 10, 11, 7 and 2; their lower median is 7)
![Page 27: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/27.jpg)
Selection in worst-case linear time
index:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
before: 20 3 10 8 14 6 12 9 11 18 7 4 5 17 15 1 2 13
after:  3 6 4 5 1 2 7 12 9 11 18 13 10 8 17 15 14 20
i = 12
➨ x is the kth element after partitioning, with k-1 elements on the low side and n-k elements on the high side (here x = 7 and k = 7)
![Page 28: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/28.jpg)
Selection in worst-case linear time
1. Divide the n elements into groups of 5 ➨ ⌈n/5⌉ groups
2. Find the median of each of the ⌈n/5⌉ groups (sort each group of 5 elements in constant time and simply pick the median)
3. Find the median x of the ⌈n/5⌉ medians recursively
4. Partition the array around x ➨ x is the kth element after partitioning
5. If i = k, return x. If i < k, recursively find the ith smallest element on the low side. If i > k, recursively find the (i-k)th smallest element on the high side.
![Page 29: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/29.jpg)
Selection in worst-case linear time
Example: after partitioning, x = 7 is the kth element with k = 7. Since i = 12 > k, the algorithm recurses on the high side
12 9 11 18 13 10 8 17 15 14 20
with i = 12 - 7 = 5
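The five steps can be sketched compactly in Python. This is an illustration under stated assumptions, not the lecture's implementation: it assumes distinct elements (as in the problem statement), uses a 1-based rank `i`, and partitions by building new lists instead of calling the in-place Partition procedure, which keeps the sketch short but costs extra memory. The name `select` is chosen for this example.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of a list of distinct numbers."""
    if len(a) <= 5:
        return sorted(a)[i - 1]                      # small input: sort directly
    # Steps 1-2: split into groups of 5 and take each group's (lower) median.
    groups = [sorted(a[s:s + 5]) for s in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the (lower) median x of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (not in place in this sketch).
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    k = len(low) + 1                                 # x is the k-th smallest element
    # Step 5: return x or recurse on exactly one side.
    if i == k:
        return x
    if i < k:
        return select(low, i)
    return select(high, i - k)


values = [20, 3, 10, 8, 14, 6, 12, 9, 11, 18, 7, 4, 5, 17, 15, 1, 2, 13]
print(select(values, 12))  # 12
```

On the slide's example the first call picks x = 7 with k = 7 and then recurses on the high side with i = 5, matching the trace above.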
![Page 30: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/30.jpg)
Analysis
How many elements are larger than x?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
3 6 4 5 1 2 7 12 9 11 18 13 10 8 17 15 14 20
![Page 31: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/31.jpg)
Analysis
How many elements are larger than x?
At least half of the medians found in step 2 are ≥ x
The groups of these medians contain 3 elements each which are > x (discounting x’s group and the last group)
➨ at least 3(⌈½⌈n/5⌉⌉ − 2) ≥ 3n/10 − 6 elements are > x
![Page 32: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/32.jpg)
Analysis
Symmetrically, at least 3n/10 − 6 elements are < x
➨ the algorithm recurses on ≤ 7n/10 + 6 elements
![Page 33: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/33.jpg)
Analysis
Running time of each step:
1. Divide the n elements into groups of 5 ➨ ⌈n/5⌉ groups: O(n)
2. Find the median of each of the ⌈n/5⌉ groups (sort each group of 5 elements in constant time and simply pick the median): O(n)
3. Find the median x of the ⌈n/5⌉ medians recursively: T(⌈n/5⌉)
4. Partition the array around x: O(n)
5. If i = k, return x. If i < k, recursively find the ith smallest element on the low side. If i > k, recursively find the (i-k)th smallest element on the high side: ≤ T(7n/10 + 6)
Recurrence (T(n) = O(1) for small n, that is, n < 140):
T(n) = O(1) if n < 140
T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n) if n ≥ 140
![Page 34: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/34.jpg)
Solving the recurrence
Solve by substitution
Inductive hypothesis: T(n) ≤ cn for some constant c and all n > 0
assume that c is large enough such that T(n) ≤ cn for all n < 140
pick constant a such that the O(n) term is ≤ an for all n > 0
T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (-cn/10 + 7c + an)
remains to show: -cn/10 + 7c + an ≤ 0
Recurrence:
T(n) = O(1) if n < 140
T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n) if n ≥ 140
![Page 35: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/35.jpg)
Solving the recurrence
remains to show: -cn/10 + 7c + an ≤ 0
-cn/10 + 7c + an ≤ 0
⟺ cn/10 - 7c ≥ an
⟺ cn - 70c ≥ 10an
⟺ c(n - 70) ≥ 10an
⟺ c ≥ 10a(n/(n-70))    (dividing by n - 70, which is positive for n > 70)
n ≥ 140 ➨ n/(n-70) ≤ 2 ➨ 20a ≥ 10a(n/(n-70))
choose c ≥ 20a ➨ T(n) = O(n) ■
Recurrence:
T(n) = O(1) if n < 140
T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n) if n ≥ 140
Why 140? Any integer > 70 would have worked …
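The bound can be sanity-checked numerically. The sketch below evaluates one concrete instance of the recurrence under assumptions that are not in the slides: the O(n) term is taken to be exactly a·n with a = 1, the base case costs a·n for n < 140, and 7n/10 + 6 is rounded down. With c = 20a the proof predicts T(n) ≤ cn for all n.

```python
from functools import lru_cache

A = 1          # assumed constant of the O(n) term
C = 20 * A     # the constant chosen in the proof: c >= 20a


@lru_cache(maxsize=None)
def T(n):
    """One concrete instance of the recurrence (assumed costs, see the lead-in)."""
    if n < 140:
        return A * n                       # assumed base-case cost, at most C * n
    ceil_fifth = -(-n // 5)                # ceil(n / 5) in integer arithmetic
    return T(ceil_fifth) + T(7 * n // 10 + 6) + A * n


worst_ratio = max(T(n) / n for n in range(1, 20001))
print(worst_ratio <= C)  # True
```

This checks only one instance and a finite range of n; the general statement is what the substitution proof establishes.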
![Page 36: 2IL50 Data Structures Spring 2015 Lecture 5: QuickSort & Selection](https://reader030.vdocuments.us/reader030/viewer/2022032722/56649cf45503460f949c2d77/html5/thumbnails/36.jpg)
Selection
Theorem: The ith smallest element of A can be found in O(n) time in the worst case.
Does not require any assumptions on the input
Is not in conflict with the Ω(n log n) lower bound for sorting, since it does not use sorting
Randomized Selection: pick a pivot at random
Theorem: The ith smallest element of A can be found in O(n) expected time.
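Randomized selection can be sketched by combining a random pivot with recursion on one side only. The version below is an illustration under assumptions: 0-based indices internally, a 1-based rank `i`, an iterative loop instead of recursion, and a copy of the input so that the caller's list is left untouched. The name `randomized_select` and the `rng` parameter are chosen for this example.

```python
import random


def randomized_select(a, i, rng=random):
    """Return the i-th smallest (1-based) element of a, in expected linear time."""
    a = list(a)                            # work on a copy
    p, r = 0, len(a) - 1
    while True:
        if p == r:
            return a[p]
        s = rng.randint(p, r)              # random pivot position, ends inclusive
        a[r], a[s] = a[s], a[r]
        x, m = a[r], p - 1
        for j in range(p, r):              # Partition around x = a[r]
            if a[j] <= x:
                m += 1
                a[m], a[j] = a[j], a[m]
        a[m + 1], a[r] = a[r], a[m + 1]
        q = m + 1
        k = q - p + 1                      # the pivot is the k-th smallest of a[p..r]
        if i == k:
            return a[q]
        if i < k:
            r = q - 1                      # continue on the low side
        else:
            p, i = q + 1, i - k            # continue on the high side


values = [20, 3, 10, 8, 14, 6, 12, 9, 11, 18, 7, 4, 5, 17, 15, 1, 2, 13]
print(randomized_select(values, 12, rng=random.Random(0)))  # 12
```

Unlike the median-of-medians algorithm, this gives no worst-case guarantee, but it avoids the extra work of computing a guaranteed good pivot.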