Chapter 9: Selection and Order Statistics
What is an order statistic? The min, the max, the median, the ith smallest element, etc.
Selection means finding a particular order statistic
Selection by sorting: T(n) = Θ(n lg n)
Selection in linear time: best case, worst case, average case
Min, Max and Median order statistics
Given a set of n elements, the ith order statistic = the ith smallest element
The min is the 1st order statistic; the max is the nth order statistic
The parity of a set is whether n is even or odd
The median is roughly halfway between the min and the max
For a set of odd parity the median is unique: the ith smallest with i = (n+1)/2
Regardless of parity:
lower median means the ith smallest with i = ⌊(n+1)/2⌋
upper median means the ith smallest with i = ⌈(n+1)/2⌉
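A minimal sketch of these definitions, assuming selection by sorting (so Θ(n lg n) rather than linear time). The function names `order_statistic`, `lower_median` and `upper_median` are illustrative and do not come from the slides.

```python
def order_statistic(values, i):
    """Return the i-th smallest element (i is 1-indexed) by sorting."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]


def lower_median(values):
    # Rank floor((n + 1) / 2).
    return order_statistic(values, (len(values) + 1) // 2)


def upper_median(values):
    # Rank ceil((n + 1) / 2), computed as floor((n + 2) / 2).
    return order_statistic(values, (len(values) + 2) // 2)
```

For an odd number of elements the two medians coincide; for an even number they are the two middle elements.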
Selection problem
Find the ith order statistic in a set of n (distinct) elements A = <a1, a2, ..., an> (i.e. find x ∈ A such that x is larger than exactly i − 1 other elements of A)
The selection problem can be solved in O(n lg n) by sorting
Since the min and the max can be found in linear time, we expect that any order statistic can be found in linear time
Analyze a deterministic selection algorithm: O(n) in the worst case
Analyze randomized selection by partition: O(n) in the average case
Selection algorithm with worst-case runtime = O(n)
It is possible to design a deterministic selection algorithm that has a linear worst-case runtime.
Making the pivot an input parameter of Partition lets us guarantee a good split whenever Partition is called.
Processing done before calling Partition determines a good choice of pivot.
Outline of recursive Select with worst-case runtime = O(n):
Step 1: Divide the n-element sequence into ⌈n/5⌉ groups of at most 5 elements (one group may have fewer than 5); cost = Θ(n)
Step 2: Use insertion sort to find the median of each subgroup; cost = constant times number of subgroups = Θ(n)
Step 3: Use Select to find the median of the medians; cost = T(⌈n/5⌉)
Step 4: Partition the input array with pivot = median of medians and calculate the number of elements in the lower sub-array; cost = Θ(n) + constant
Step 5: If the pivot is not the ith smallest element, bound the runtime by the time to Select from the larger sub-array; cost ≤ T(7n/10 + 6)
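The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the textbook's code: the elements are assumed distinct (as in the problem statement), `sorted` stands in for insertion sort on each group of at most 5, and the partition builds new lists instead of rearranging the array in place. The name `select` is illustrative.

```python
def select(values, i):
    """Return the i-th smallest (1-indexed) of distinct values."""
    items = list(values)
    if not 1 <= i <= len(items):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(items) <= 5:
        # Small input: sort directly (constant work).
        return sorted(items)[i - 1]

    # Step 1: split into groups of at most 5 elements.
    groups = [items[j:j + 5] for j in range(0, len(items), 5)]
    # Step 2: take the (lower) median of each group.
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the (lower) median of the medians.
    pivot = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around the pivot; k is the pivot's rank.
    lower = [x for x in items if x < pivot]
    upper = [x for x in items if x > pivot]
    k = len(lower) + 1
    # Step 5: stop, or recurse into the side that holds the answer.
    if i == k:
        return pivot
    if i < k:
        return select(lower, i)
    return select(upper, i - k)
```

Building `lower` and `upper` as new lists keeps the sketch short; an in-place Partition, as on the slides, avoids the extra memory.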
Diagram to help explain cost of Step 5
Dots represent elements of input. Subgroups of 5 occupy columns
Arrows point from larger to smaller elements.
Medians are white. x marks median of medians. Shaded area shows elements greater than x
3 out of 5 are shaded if subgroup is full and does not contain x
At least 3(⌈(1/2)⌈n/5⌉⌉ − 2) ≥ 3n/10 − 6 elements are larger than x (the − 2 discounts the group containing x and the group with fewer than 5 elements)
At most n − (3n/10 − 6) = 7n/10 + 6 elements are less than x
Worst case described by T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + Θ(n)
Solve by the substitution method
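One way to carry out the substitution, sketched under the assumption that the Θ(n) term is bounded by an for some constant a > 0 (the symbol a is introduced here for illustration). Guess T(n) ≤ cn and substitute:

```latex
\begin{align*}
T(n) &\le c\lceil n/5 \rceil + c\,(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &= 9cn/10 + 7c + an \\
     &= cn + (-cn/10 + 7c + an).
\end{align*}
```

The last expression is at most cn whenever −cn/10 + 7c + an ≤ 0, that is, c ≥ 10a · n/(n − 70) for n > 70. For n ≥ 140 we have n/(n − 70) ≤ 2, so any c ≥ 20a works, giving T(n) = O(n).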
CS 450 Spring 2015
[All problems are from Cormen et al., 3rd Edition]
Homework Assignment 8: due 3/13/15
1. ex 9.3-1 p 223 (groups of 7 and 3)
2. ex 9.3-3 p 223
3. ex 9.3-5 p 223
On problems 2 and 3:
Write pseudocode (a variation of the code in the text)
Explain how the code works
Analyze its run time
Select by partition pseudocode
Select-by-Partition(A, p, r, i)
    if p = r then return A[p]    (a single element is the ith smallest by default)
    q ← Partition(A, p, r)    (get upper and lower sub-arrays)
    k ← q − p + 1    (number of elements in the lower sub-array, including the pivot)
    if i = k then
        return A[q]    (pivot is the ith smallest element)
    else if i < k then
        return Select-by-Partition(A, p, q − 1, i)
    else
        return Select-by-Partition(A, q + 1, r, i − k)
Note: the index of the ith order statistic changes in the upper sub-array (i becomes i − k)
With favorable splits, T(n) = O(n)
Why not O(n lg n) as in quicksort?
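A short worked comparison that may help with the question, assuming perfectly balanced splits and a partition cost of at most an (the constant a is introduced here for illustration). Quicksort recurses on both halves, while selection recurses on only one:

```latex
\begin{align*}
\text{Quicksort:} \quad T(n) &= 2\,T(n/2) + \Theta(n) = \Theta(n \lg n) \\
\text{Selection:} \quad T(n) &= T(n/2) + \Theta(n)
  \le an\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) = 2an = O(n)
\end{align*}
```

The partition costs in selection form a decreasing geometric series, so they sum to a constant times n instead of accumulating over lg n levels.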
Randomized-Select lets us analyze the runtime for the average case
Randomized-Select(A, p, r, i)
    if p = r then return A[p]
    q ← Randomized-Partition(A, p, r)
    k ← q − p + 1
    if i = k then
        return A[q]    (pivot is the ith smallest element)
    else if i < k then
        return Randomized-Select(A, p, q − 1, i)
    else
        return Randomized-Select(A, q + 1, r, i − k)
As in Randomized-Quicksort, Randomized-Partition chooses a pivot at random from array elements between p and r
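A runnable sketch of the pseudocode in Python, under a few assumptions: indices p and r are 0-based and inclusive, the rank i stays 1-based within the sub-array, and the partition is a Lomuto-style partition written here for illustration (the slides do not show the body of Partition). The list is rearranged in place.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return its final index."""
    s = random.randint(p, r)  # randint is inclusive on both ends
    a[s], a[r] = a[r], a[s]   # move the chosen pivot to the end
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest element of a[p..r] (i is 1-indexed)."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1  # size of the lower sub-array, including the pivot
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

A call such as `randomized_select(data, 0, len(data) - 1, i)` selects from the whole list; pass a copy if the original order must be preserved.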
Upper bound on the expected value of T(n) for Randomized-Select
Call to Randomized-Partition creates upper and lower sub-arrays
Include the pivot in the lower sub-array A[p..q]
Define indicator random variables X_k = I{sub-array A[p..q] has exactly k elements}, 1 ≤ k ≤ n
All possible values of k are equally likely, so E[X_k] = 1/n
Assume that the desired element always falls in the larger partition
This assumption ensures an upper bound on E[T(n)]
$T(n) \le \sum_{k=1}^{n} X_k \, T(\max(k-1,\, n-k)) + O(n)$
The sum contains only one nonzero term:
T(n) = T(n−1) + O(n) when the lower sub-array has 1 element
T(n) = T(n−2) + O(n) when the lower sub-array has 2 elements
...
T(n) = T(n−2) + O(n) when the lower sub-array has n−1 elements
T(n) = T(n−1) + O(n) when the lower sub-array has n elements
$E[T(n)] \le \sum_{k=1}^{n} E[X_k \, T(\max(k-1,\, n-k))] + O(n)$ (linearity of expected values)
$E[T(n)] \le \sum_{k=1}^{n} E[X_k] \, E[T(\max(k-1,\, n-k))] + O(n)$ (independence of random variables, exercise 9.2-2)
$E[T(n)] \le \frac{1}{n} \sum_{k=1}^{n} E[T(\max(k-1,\, n-k))] + O(n)$ (using E[X_k] = 1/n)
$E[T(n)] \le \frac{1}{n} \sum_{k=1}^{n} E[T(\max(k-1,\, n-k))] + O(n)$
If k > n/2, max(k−1, n−k) = k−1; if k ≤ n/2, max(k−1, n−k) = n−k
For even n, each term from T(n/2) to T(n−1) occurs exactly twice
A similar argument applies for odd n
$E[T(n)] \le \frac{2}{n} \sum_{k=n/2}^{n-1} E[T(k)] + O(n)$ (using the redundancy of the T's)
$E[T(n)] \le \frac{2}{n} \left\{ \sum_{k=1}^{n-1} E[T(k)] - \sum_{k=1}^{n/2-1} E[T(k)] \right\} + O(n)$ (set up to use the arithmetic sum)
Apply the substitution method: assume E[T(k)] = O(k)
Then there exists c > 0 such that E[T(k)] ≤ ck
$E[T(n)] \le \frac{2c}{n} \left\{ \sum_{k=1}^{n-1} k - \sum_{k=1}^{n/2-1} k \right\} + dn$, with d > 0 (dn bounds the O(n) term)
Now use the arithmetic sum
After much algebra (text p219)
E[T(n)] ≤ cn − (cn/4 − c/2 − dn)
Find c and n0
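The algebra skipped here can be sketched as follows. This follows the textbook's derivation, written with ⌊n/2⌋ in place of n/2 so that it also covers odd n (an assumption relative to the slide's notation); c and d are the constants defined on this slide.

```latex
\begin{align*}
E[T(n)] &\le \frac{2c}{n}\left(\sum_{k=1}^{n-1} k
            - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k\right) + dn \\
        &= \frac{2c}{n}\left(\frac{(n-1)n}{2}
            - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2}\right) + dn \\
        &\le \frac{2c}{n}\left(\frac{(n-1)n}{2}
            - \frac{(n/2 - 2)(n/2 - 1)}{2}\right) + dn \\
        &= \frac{c}{n}\left(\frac{3n^2}{4} + \frac{n}{2} - 2\right) + dn \\
        &= c\left(\frac{3n}{4} + \frac{1}{2} - \frac{2}{n}\right) + dn \\
        &\le \frac{3cn}{4} + \frac{c}{2} + dn \\
        &= cn - \left(\frac{cn}{4} - \frac{c}{2} - dn\right)
\end{align*}
```

The second line applies the arithmetic sum 1 + 2 + ... + m = m(m+1)/2 to both sums; the third uses ⌊n/2⌋ ≥ n/2 − 1.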
Simplify (see text p219)
E[T(n)] ≤ cn − (cn/4 − c/2 − dn)
E[T(n)] ≤ cn if cn/4 − c/2 − dn ≥ 0
n(c/4 − d) ≥ c/2
If c > 4d, a sufficiently large n does exist: n ≥ 2c/(c − 4d)
If c = 8d, then n ≥ 4